
A Concise Course in A-Level Statistics - Crawshaw.J

This document is a second edition of 'A Concise Course in A-Level Statistics' aimed at students and teachers of A-level Pure Mathematics with Statistics. It presents statistical theory concisely, supported by worked examples and graded exercises to reinforce learning. The text covers essential topics required by major examining boards and includes permissions for reproduced questions from various educational institutions.

Uploaded by

Juan Gonzalez

A CONCISE COURSE IN

A-LEVEL STATISTICS
Second Edition
Also from the same publisher:

Greer A FIRST COURSE IN STATISTICS


Greer REVISION PRACTICE IN STATISTICS
Francis ADVANCED LEVEL STATISTICS
Montagnon FOUNDATIONS OF STATISTICS
Greer STATISTICS FOR ENGINEERS
White et al TABLES FOR STATISTICIANS
Thomas NOTES AND PROBLEMS IN STATISTICS
A CONCISE COURSE IN
A-LEVEL STATISTICS
With Worked Examples

Second Edition

J CRAWSHAW BSc
Head of Mathematics Department
Clifton High School, Bristol

J CHAMBERS MA
Head of Mathematics Department
Sutton High School GPDST, Surrey

STANLEY THORNES (PUBLISHERS) LTD


© J Crawshaw and J Chambers 1984, 1990
Original line illustrations © Stanley Thornes (Publishers) Ltd 1990

All rights reserved. No part of this publication may be reproduced or transmitted in any form
or by any means, electronic or mechanical, including photocopy, recording, or any information
storage and retrieval system, without permission in writing from the publisher or under licence
from the Copyright Licensing Agency Limited. Further details of such licences (for reprographic
reproduction) may be obtained from the Copyright Licensing Agency Limited, of 90 Tottenham
Court Road, London W1P 9HE.

First published in 1984 by

Stanley Thornes (Publishers) Ltd


Old Station Drive
Leckhampton
CHELTENHAM GL53 0DN
UK

First Edition 1984


Reprinted 1985 (twice)
Reprinted 1986
Reprinted 1987
Reprinted 1988 (twice)
Reprinted 1989
Second Edition 1990
Reprinted 1991

British Library Cataloguing in Publication Data

Crawshaw, J.
A concise course in A-level statistics. With worked examples
2nd ed.
1. Statistics
I. Title II. Chambers, J.
519.5

ISBN 0-7487-0455-8

Typeset by Tech-Set, Gateshead, Tyne & Wear.


Printed and bound in Great Britain at The Bath Press, Avon.
CONTENTS

PREFACE

1 DESCRIPTIVE STATISTICS
Discrete and Continuous Data
Frequency Distribution
Histograms
Circular Diagrams or Pie Diagrams
Frequency Polygons
Frequency Curves
Cumulative Frequency
Cumulative Frequency Curve
The Median
Quartiles, Percentiles, Interquartile Range
The Mode
The Arithmetic Mean
The Mean Deviation from the Mean
The Standard Deviation
The Variance
The Use of Calculators
Scaling Similar Sets of Data
Combining Sets of Numbers
Miscellaneous Worked Examples

2 PROBABILITY
Classical Definition of Probability
Important Results
Mutually Exclusive Events
Exhaustive Events
Conditional Probability
Independent Events
Summary — Probability Laws
Extension of Results to more than Two Events
Probability Trees
Bayes’ Theorem
Some Useful Methods
Arrangements
Permutations of r Objects from n Objects
Combinations of r Objects from n Objects
Summary — Arrangements, Permutations and Combinations
Miscellaneous Worked Examples

3 PROBABILITY DISTRIBUTIONS I — DISCRETE RANDOM VARIABLES
Discrete Random Variable
Probability Density Function
Expectation, E(X)
Expectation of any Function of X, E[g(X)]
Variance, Var(X)
The Cumulative Distribution Function
Two Independent Random Variables
The Distribution of X₁ + X₂
Comparing the Distributions of 2X and X₁ + X₂
Summary — Discrete Random Variables

4 SPECIAL DISCRETE PROBABILITY DISTRIBUTIONS
The Binomial Distribution
Expectation and Variance
Diagrammatic Representation of the Binomial Distribution
Cumulative Binomial Probability Tables
The Recurrence Formula
Fitting a Theoretical Distribution
Worked Example
Summary — Binomial Distribution
The Geometric Distribution
Expectation and Variance
The Poisson Distribution
Expectation and Variance
Uses
Unit Interval
Cumulative Poisson Probability Tables
Diagrammatic Representation of the Poisson Distribution
The Recurrence Formula
Fitting a Theoretical Distribution
The Distribution of Two Independent Poisson Variables
Miscellaneous Worked Examples
Summary — Poisson Distribution

5 PROBABILITY DISTRIBUTIONS II — CONTINUOUS RANDOM VARIABLES
Probability Density Function
Expectation
Variance
The Mode
Cumulative Distribution Function F(x)
Obtaining the p.d.f. from the Cumulative Distribution
The Rectangular Distribution
Expectation and Variance
The Exponential Distribution
Expectation and Variance
The Link Between the Exponential Distribution and the Poisson Distribution
The Normal Distribution
Expectation and Variance
Miscellaneous Worked Examples
Summary — Continuous Random Variables

6 THE NORMAL DISTRIBUTION
Probability Density Function of Normal Variable
The Standard Normal Distribution
The Probability Density Function for Z, φ(z)
The Cumulative Distribution Function for Z, Φ(z)
Use of the Standard Normal Tables using Φ(z)
Use of the Standard Normal Tables for any Normal Distribution
Problems which Involve Finding the Value of μ or σ or both
Miscellaneous Worked Examples
The Normal Approximation to the Binomial Distribution
The Normal Approximation to the Poisson Distribution
When to Use the Different Approximations

7 RANDOM VARIABLES AND RANDOM SAMPLING
Sum and Difference of Two Independent Normal Variables
Extension to More than Two Independent Normal Variables
Multiples of Normal Variables
Distinguishing Between Multiples and Sums of Random Variables
Miscellaneous Worked Examples
Summary — Sums, Differences and Multiples of Independent Normal Variables
The Sample Mean
Sampling Without Replacement
The Distribution of the Sample Mean
    (a) from a normal population
    (b) from any population
The Distribution of the Sample Proportion
Summary — The Sample Mean and the Sample Proportion
Random Sampling
Random Number Tables
Sampling from Given Distributions
    (a) frequency distributions
    (b) probability distributions

8 ESTIMATION OF POPULATION PARAMETERS
Point Estimation — Unbiased Estimator
Consistent Estimator
The Most Efficient Estimator of the Population Mean
The Most Efficient Estimator of the Population Variance
Estimator of Population Proportion
Pooled Estimators from Two Samples
    mean and variance
    proportion
Summary — Point Estimators
Interval Estimation — Confidence Intervals
Confidence Interval for the Population Mean
    (a) σ² known
    (b) (i) σ² unknown, sample size large
The t-distribution
Use of t-distribution Tables
Confidence Interval for the Population Mean
    (b) (ii) σ² unknown, sample size small
Confidence Interval for the Proportion of Successes in a Population
Summary — Confidence Intervals

9 SIGNIFICANCE TESTING
Null and Alternative Hypotheses
Critical Regions and Critical Values
One-tailed and Two-tailed Tests
Testing a Single Sample Value
Testing a Mean
    (1) σ² known
    (2) σ² unknown, sample size large
    (3) σ² unknown, sample size small
Testing the Difference between Means
Testing a Proportion
Testing the Difference between Proportions
Summary — Significance Testing
Tests Involving the Binomial Distribution
Summary — Testing a Binomial Proportion, n not large
Tests Involving the Poisson Distribution
Summary — Testing a Poisson Mean
Type I and Type II Errors

10 THE χ² TEST
The Chi-squared Distribution
The χ² Test
Uniform Distribution
Distribution in a Given Ratio
Goodness of Fit Tests
Binomial Distribution, p known
Binomial Distribution, p unknown
Poisson Distribution
Normal Distribution, μ and σ² known
Normal Distribution, μ and σ² unknown
Use of χ² Test in Contingency Tables
2 × 2 Contingency Tables
h × k Contingency Tables
Summary — χ² Test and Degrees of Freedom

11 REGRESSION AND CORRELATION
Scatter Diagram
Regression Function
Linear Correlation and Regression Lines
Drawing a Regression Line ‘by Eye’
Calculating the Equations of the Least Squares Regression Lines
Covariance
Alternative Method for Least Squares Regression Lines
Minimum Sum of Squares of Residuals
The Product-moment Correlation Coefficient
Alternative Method for the Minimum Sum of Squares of Residuals
Relationship Between Regression Coefficients and r
Using a Method of Coding
Coefficients of Rank Correlation
Spearman’s Coefficient of Rank Correlation
Significance of Spearman’s Rank Correlation Coefficient
Kendall’s Coefficient of Rank Correlation
Significance of Kendall’s Rank Correlation Coefficient
Miscellaneous Worked Examples
Summary — Regression and Correlation

APPENDIX 1
Random Numbers
Cumulative Binomial Probabilities
Cumulative Poisson Probabilities
The Distribution Function Φ(z) of the Normal Distribution N(0, 1)
Upper Quantiles of the Normal Distribution N(0, 1)
Upper Quantiles of the t-Distribution t(ν)
Chi-squared Tables
Tables A, B, C
The Upper Tail Probabilities Q(z) of the Normal Distribution N(0, 1)

APPENDIX 2
Use of the Standard Normal Tables using Q(z)
Use of the Standard Normal Tables for any Normal Distribution
Problems which Involve Finding the Value of σ or μ or both

ANSWERS

INDEX
PREFACE
This text is intended primarily for use by students and teachers of
the statistics section of A-level Pure Mathematics with Statistics, an
increasingly popular course.
Points of theory are presented concisely and illustrated by suitable
worked examples, many taken from previous A-level papers. These
are then supported by very carefully graded exercises which serve to
consolidate the theory, link it with previous work and build up the
confidence of the reader. There are frequent summaries of main
points and miscellaneous exercises containing mainly A-level
questions.
Throughout the text we have aimed to provide the reader with a
mathematical structure and a logical framework within which to
work. We have given special attention to topics which, in our
experience, cause great difficulty. These include probability theory,
the theory of continuous random variables and significance testing.
The text covers the main theory required by all the major examining
boards. We are very grateful to the following for permission to
reproduce questions:
University of Cambridge Local Examinations Syndicate (C)
The Southern Universities’ Joint Board (SUJB)
Joint Matriculation Board (JMB)
University of London (L)
Oxford and Cambridge School Examinations Board (O & C)
incorporating School Mathematics Project (SMP)
Mathematics in Education and Industry (MEI)
The Associated Examining Board (AEB)
Oxford Delegacy of Local Examinations (O)
A-level questions are followed by the name of the board. Questions
from Additional Mathematics papers are indicated by the word
Additional, and (P) indicates a part-question.
We are particularly indebted to The Associated Examining Board
and The Southern Universities’ Joint Board for allowing us to use
some of their questions as worked examples, and would stress that
they are in no way involved in, or responsible for, this working.
We extend our thanks to our families, colleagues and students for
all their encouragement and support, in particular to Audrey
Shepherd and Jane Ziesler.
J Crawshaw
J Chambers

PREFACE TO THE
SECOND EDITION
In order to give a fully comprehensive coverage of the present
A-level syllabuses the following material has been added:
Chapter 4 — The use of binomial and Poisson cumulative
probability tables. The geometric distribution
Chapter 5 — The negative exponential distribution
Chapter 6 — The use of the standard normal cumulative tables
Φ(z) (with the use of tables giving Q(z) retained in
the Appendix)
Chapter 7 — Random sampling and the use of random number
tables
Chapter 9 — Significance testing relating to the binomial and
Poisson distributions
Chapter 11 — A fuller treatment of correlation and linear regression,
including significance testing relating to
Spearman’s and Kendall’s coefficients of correlation.
Numerous recent A-level questions taken from all the major
examining boards have been added, together with worked examples
from the University of London Schools Examination Board which
we would stress is in no way responsible for these solutions.

J Crawshaw
J Chambers
1990
DESCRIPTIVE STATISTICS
DISCRETE DATA
These are the marks obtained by 30 pupils in a test:

[Table of the 30 marks; the individual values are not legible in the source.]
This is an example of discrete raw data.
Discrete data can assume only exact values, for example
the number of cars passing a checkpoint in a certain time,
the shoe sizes of children in a class,
the number of tomatoes on each of the plants in a greenhouse.
The data is ‘raw’ because it has not been ordered in any way.
To illustrate the data more concisely, a frequency distribution can
be formed. We count the number of 0’s, 1’s, 2’s,... , and form a
table:

[Frequency table of marks, total frequency 30; the individual entries are not legible in the source.]


Discrete data can be grouped into ‘classes’, but once this has been
done some of the original information is lost:

[Grouped frequency table of marks, total frequency 30; the individual entries are not legible in the source.]

CONTINUOUS DATA
These are the heights of 20 children in a school. The heights have
been measured correct to the nearest cm.

133 136 120


131 127 141
130 131 125
134 135 137
This is an example of continuous raw data.


Continuous data cannot assume exact values, but can be given only
within a certain range or measured to a certain degree of accuracy,
for example
144 cm (correct to the nearest cm) could have arisen from any
value in the range 143.5 cm ≤ h < 144.5 cm.
Other examples of continuous data are
the speeds of vehicles passing a particular point,
the masses of cooking apples from a tree,
the time taken by each of a class of children to perform a task.

FREQUENCY DISTRIBUTIONS

To form a frequency distribution for the heights of the 20 children
we group the information into ‘classes’ or ‘intervals’:

Height (cm)            (Alternative ways of writing the interval)
119.5 ≤ h < 124.5      119.5-124.5      120-124
124.5 ≤ h < 129.5      124.5-129.5      125-129
129.5 ≤ h < 134.5      129.5-134.5      130-134
134.5 ≤ h < 139.5      134.5-139.5      135-139
139.5 ≤ h < 144.5      139.5-144.5      140-144

The values 119.5, 124.5, 129.5, ..., are called the class boundaries.

NOTE: the upper class boundary (u.c.b.) of one interval is the lower
class boundary (l.c.b.) of the next interval.

The width of an interval = u.c.b. − l.c.b.

Therefore the width of the first interval = 124.5 − 119.5 = 5
In fact, in this example, each of the classes has been chosen so that
the width is 5.
To group the heights into the following classes it helps to use a
‘tally’ column, entering the numbers in the first row, then the
second row, and so on.

Height (cm)
119.5 ≤ h < 124.5
124.5 ≤ h < 129.5
129.5 ≤ h < 134.5
134.5 ≤ h < 139.5
139.5 ≤ h < 144.5
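The final frequency distribution should read:

Height (cm)            Tally    Frequency
119.5 ≤ h < 124.5
124.5 ≤ h < 129.5
129.5 ≤ h < 134.5
134.5 ≤ h < 139.5
139.5 ≤ h < 144.5
[The tally marks and frequencies are not legible in the source; the total frequency is 20.]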
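The tallying procedure can be sketched in Python. This is only an illustration: the function name `tally` is hypothetical, and the data are the twelve heights that remain legible in the table of children's heights (the original table has 20 values), so the counts below are not the book's final frequencies.

```python
from bisect import bisect_right

# The twelve heights (cm) still legible in the source table; the full
# table has 20 values, so these counts are illustrative only.
heights = [133, 136, 120, 131, 127, 141, 130, 131, 125, 134, 135, 137]

# Class boundaries from the text: 119.5, 124.5, ..., 144.5
boundaries = [119.5, 124.5, 129.5, 134.5, 139.5, 144.5]


def tally(values, boundaries):
    """Count how many values fall in each class l.c.b. <= x < u.c.b."""
    counts = [0] * (len(boundaries) - 1)
    for x in values:
        if not boundaries[0] <= x < boundaries[-1]:
            raise ValueError(f"{x} lies outside the class boundaries")
        # bisect_right finds the first boundary strictly greater than x
        counts[bisect_right(boundaries, x) - 1] += 1
    return counts


frequencies = tally(heights, boundaries)
for lower, upper, f in zip(boundaries, boundaries[1:], frequencies):
    print(f"{lower} <= h < {upper}: {'|' * f} ({f})")
```

Each value is entered once, in the order read, which mirrors filling in a tally column row by row.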

Example 1.1 The following table gives the diameters of 40 ball-bearings, each
measured in cm correct to 2 decimal places (d.p.). Form a frequency
distribution by taking classes of width 0.02 cm.

Solution 1.1 The smallest value in the table is 3.93 and the largest value is 4.04.
As measurements have been taken in cm correct to 2 d.p., the lowest
class boundary is 3.925 cm. As the class width is 0.02 cm, the first
interval must have an upper class boundary of 3.945 cm.
So we take as class boundaries 3.925, 3.945, 3.965, ..., 4.045.
The frequency distribution is as follows:

Diameter (cm)
3.925 ≤ d < 3.945
3.945 ≤ d < 3.965
3.965 ≤ d < 3.985
3.985 ≤ d < 4.005
4.005 ≤ d < 4.025
4.025 ≤ d < 4.045

NOTE: The intervals are often written

Diameter (cm)

3.93-3.94
3.95-3.96
3.97-3.98
and so on

Remember to work out the class boundaries.
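As a rough check on Solution 1.1, the class boundaries can be generated from the smallest and largest readings, the precision of measurement and the chosen class width. The helper name `class_boundaries` and its parameters are assumptions made for this sketch; exact fractions are used so that repeated addition of 0.02 does not drift.

```python
from fractions import Fraction


def class_boundaries(smallest, largest, precision, width):
    """Return class boundaries covering every reading from smallest to largest.

    precision: the unit the data were rounded to (0.01 for 2 d.p.)
    width:     the chosen class width
    Arguments are passed as strings so Fraction parses the decimals exactly.
    """
    smallest, largest = Fraction(smallest), Fraction(largest)
    precision, width = Fraction(precision), Fraction(width)
    boundary = smallest - precision / 2   # lowest class boundary
    top = largest + precision / 2         # upper limit of the largest reading
    boundaries = [boundary]
    while boundary < top:
        boundary += width
        boundaries.append(boundary)
    return [float(b) for b in boundaries]


print(class_boundaries("3.93", "4.04", "0.01", "0.02"))
```

With the values from Example 1.1 this gives seven boundaries and therefore six classes.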



The following frequency distributions show some of the ways in
which data may be grouped.

(i) Frequency distribution to show the lengths of 30 rods. Lengths
have been measured to the nearest mm.

Length (mm)    27-31    32-36    37-46    47-51

The interval ‘27-31’ means 26.5 mm ≤ length < 31.5 mm.
The class boundaries are 26.5, 31.5, 36.5, 46.5, 51.5
The class widths are 5, 5, 10, 5

(ii) Frequency distribution to show the marks in a test of 100
students

Mark         30-39    40-49    50-59    60-69    70-79    80-89
Frequency    10       14       26       20       18       12

The class boundaries are 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5
The class widths are 10, 10, 10, 10, 10, 10

(iii) Frequency distribution to show the lengths of 50 telephone
calls

Length of call (min)    0-    3-    6-    9-    12-    18-

The interval ‘3-’ means 3 minutes ≤ time < 6 minutes, so any time
including 3 minutes and up to (but not including) 6 minutes comes
into this interval.
The class boundaries are 0, 3, 6, 9, 12, 18
The class widths are 3, 3, 3, 3, 6

(iv) Frequency distribution to show the masses of 40 packages
brought to a particular counter at a post office

Mass (g)    -100    -250    -500    -800

The interval ‘-250’ means 100 g < mass ≤ 250 g, so any mass over
100 grams up to and including 250 grams comes into this interval.
The class boundaries are 0, 100, 250, 500, 800
The class widths are 100, 150, 250, 300
(v) Frequency distribution to show the speeds of 50 cars passing a
checkpoint

Speed (km/h)    20-30    30-40    40-60    60-80    80-100
Frequency       2        7        20       16       5

The class ‘30-40’ means 30 km/h ≤ speed < 40 km/h.
The class boundaries are 20, 30, 40, 60, 80, 100
The class widths are 10, 10, 20, 20, 20

(vi) Frequency distribution to show ages (in completed years) of
applicants for a teaching post

Age (years)    21-24    25-28    29-32    33-40    41-52
Frequency      4        2        2        1        1

As the ages are in completed years (not to the nearest year) then
‘21-24’ means 21 ≤ age < 25. Someone who is 24 years and 11
months would come into this category. Sometimes this interval is
written ‘21-’ and the next is ‘25-’, etc.
The class boundaries are 21, 25, 29, 33, 41, 53
The class widths are 4, 4, 4, 8, 12
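The two commonest conventions above (values rounded to the nearest unit, and values recorded in completed units) can be compared in a short Python sketch. The function names are hypothetical and chosen only for this illustration.

```python
def boundaries_nearest_unit(lower, upper, unit=1):
    """Class boundaries when values were rounded to the nearest `unit`."""
    return lower - unit / 2, upper + unit / 2


def boundaries_completed_units(lower, upper, unit=1):
    """Class boundaries when values were truncated, e.g. age in completed years."""
    return lower, upper + unit


def widths(boundaries):
    """Class widths are the differences of consecutive class boundaries."""
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


print(boundaries_nearest_unit(27, 31))        # rods measured to the nearest mm
print(boundaries_completed_units(21, 24))     # ages in completed years
print(widths([26.5, 31.5, 36.5, 46.5, 51.5])) # distribution (i)
print(widths([21, 25, 29, 33, 41, 53]))       # distribution (vi)
```

The same label, such as ‘21-24’, therefore leads to a width of 5 under one convention and 4 under the other, which is why the method of measurement must be known before the boundaries are written down.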

HISTOGRAMS

Grouped data can be displayed in a histogram.

In a histogram rectangles are drawn so that the area of each
rectangle is proportional to the frequency in the range covered.

We have area ∝ frequency

(a) Histograms with equal class widths


Example 1.2 The lengths of 30 Swiss cheese plant leaves were measured and the
information grouped as shown. Measurements were taken correct to
the nearest cm. Draw a histogram to illustrate the data.

Length of leaf (cm) 10-14 15-19 20-24 25-29

Solution 1.2 The class boundaries are 9.5, 14.5, 19.5, 24.5, 29.5
The class widths are 5, 5, 5, 5

Now, area of rectangle = class width × height of rectangle


As the class width is 5 for each interval,

area of rectangle = 5 × height of rectangle

So area ∝ height of rectangle

Now, if we make the height of each rectangle the same as the
frequency, we have area ∝ frequency, as required.

When all the class intervals are of equal width the frequency can
be used for the height of each rectangle.

[Figure: Histogram to show the lengths of 30 leaves. Vertical axis: Frequency; horizontal axis: Length of leaf (cm).]

(b) Histograms with unequal class widths


Example 1.3 The frequency distribution gives the masses of 35 objects, measured
to the nearest kg. Draw a histogram to illustrate the data.

Mass (kg)    6-8    9-11    12-17    18-20    21-29
Frequency    4      6       10       3        12

Solution 1.3 The class boundaries are 5.5, 8.5, 11.5, 17.5, 20.5, 29.5
The class widths are 3, 3, 6, 3, 9
As the class widths are not equal we cannot make the height of each
rectangle equal to the frequency.
So we choose a convenient width as a ‘standard’ and adjust the
heights of the rectangles accordingly, as follows.
If we choose a class width of 3 as standard, then the first two
rectangles can be 4 and 6 units high respectively. However, as the
third interval is twice the standard width we must make the height
of the rectangle equal to half the frequency.
Similarly, as the last interval is 3 X standard we must make the height
of the rectangle equal to one-third of the frequency.
As the heights of the rectangles have been adjusted, we are
considering frequency per standard width. We will write this as
‘standard frequency’.

Mass (kg)    Class width          Frequency    Height of rectangle (standard frequency)
6-8          3 = standard         4            4
9-11         3 = standard         6            6
12-17        6 = 2 × standard     10           5
18-20        3 = standard         3            3
21-29        9 = 3 × standard     12           4

We have now ensured that the area of each rectangle is proportional
to the frequency, and the histogram is drawn as shown.

[Figure: Histogram to show the masses of 35 objects. Vertical axis: Standard frequency; horizontal axis: Mass (kg).]

In general, choose a ‘standard’ width.

If class width = n × standard width, then

\[ \text{height of rectangle} = \frac{1}{n} \times \text{corresponding frequency} \]
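The general rule can be tried out with a short Python sketch. The function name `standard_frequencies` is hypothetical; the data are the class boundaries and frequencies of Example 1.3, with a standard width of 3.

```python
def standard_frequencies(boundaries, frequencies, standard_width):
    """Histogram heights chosen so that area is proportional to frequency.

    A class of width n * standard_width is given height frequency / n.
    """
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need exactly one more boundary than frequencies")
    heights = []
    for lower, upper, f in zip(boundaries, boundaries[1:], frequencies):
        n = (upper - lower) / standard_width  # width as a multiple of the standard
        heights.append(f / n)
    return heights


# Example 1.3: masses of 35 objects, standard width 3
boundaries = [5.5, 8.5, 11.5, 17.5, 20.5, 29.5]
frequencies = [4, 6, 10, 3, 12]
heights = standard_frequencies(boundaries, frequencies, standard_width=3)
print(heights)
```

Multiplying each height by its class width gives 3 times the frequency in every case, which is what "area proportional to frequency" requires.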

Example 1.4 The following table gives the distribution of the interest paid to 460
investors in a particular year.

Interest (£)    25-    30-    40-    60-    80-    110-
Frequency       17     55     142    153    93     0

Draw a histogram to illustrate this information.

Solution 1.4 The class boundaries are 25, 30, 40, 60, 80, 110
The class widths are 5, 10, 20, 20, 30
We will choose a class width of 10 as the standard width.

Interest (£)    Class width           Frequency    Standard frequency
25-             5 = ½ × standard      17           2 × 17 = 34
30-             10 = standard         55           55
40-             20 = 2 × standard     142          ½ × 142 = 71
60-             20 = 2 × standard     153          ½ × 153 = 76.5
80-             30 = 3 × standard     93           ⅓ × 93 = 31

[Figure: Histogram to show the interest paid to 460 investors. Vertical axis: Standard frequency; horizontal axis: Interest (£), from 25 to 110.]

Example 1.5 The following table gives the distribution of marks of 60 pupils in a
test. Draw a histogram to illustrate the data.

Marks        0-9    10-14    15-19    20-24    25-34
Frequency    13     19       12       7        9

Solution 1.5 The class boundaries are 0, 9.5, 14.5, 19.5, 24.5, 34.5
The class widths are 9.5, 5, 5, 5, 10

We will choose a class width of 5 as the standard width.

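Marks    Class width                 Frequency    Standard frequency
0-9      9.5 = (9.5/5) × standard    13           (5/9.5) × 13 = 6.8
10-14    5 = standard                19           19
15-19    5 = standard                12           12
20-24    5 = standard                7            7
25-34    10 = 2 × standard           9            ½ × 9 = 4.5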

[Figure: Histogram to show the marks of 60 pupils. Vertical axis: Standard frequency; horizontal axis: Marks, from 0 to 34.5.]

Alternative Approach The first interval can be regarded as having
a lower class boundary of −0.5, in which case the width of the first
interval is 10. Therefore the height of the first rectangle is 6.5 and
the histogram would look like this:

[Figure: Histogram to show the marks of 60 pupils, drawn with the first class starting at −0.5. Vertical axis: Standard frequency.]

Exercise 1a

1. The following table gives the distribution of marks obtained by 101 pupils in a test.

[Table of marks; not legible in the source.]

Construct a frequency distribution, taking equal class intervals 10-19, 20-29, 30-39, ..., 90-99.
Draw a histogram to illustrate the data.

2. The masses of 50 apples (measured to the nearest g) were noted and shown in the table.

[First row of the table not legible in the source.]
93 101 111 96 117 100 106
107 96 101 102 104 92 99
105 113 100 103 108 92 109
103 110 113 99 106 116 101
88 108 92

Construct a frequency distribution, using equal class intervals of width 5 g, and taking the lower class boundary of the first interval as 84.5 g.
Draw a histogram to illustrate the data.

3. The masses (measured to the nearest g) of washers are recorded in the table. Draw a histogram to illustrate the data.

Mass (g)    0-2    3-5    6-11    12-14    15-17
[Frequencies not legible in the source.]

4. 100 people were asked to record how many television programmes they watched in a week. The results were as follows:

Number of programmes    0-    10-    18-    30-    35-    45-    50-    60-
[Numbers of viewers not legible in the source.]

Draw a histogram to illustrate the data.

5. 68 smokers were asked to record their consumption of cigarettes each day for several weeks. The table shown is based on the information obtained.

[Table of average number of cigarettes smoked per day against number of smokers; the class limits and frequencies are not fully legible in the source.]

Illustrate these data by means of a histogram. (C Additional) P

6. On a particular day, the length of stay of each car at a city car park was recorded. The length of stay was measured to the nearest minute. The results were as shown in Table A below.
Taking 20 minutes as ‘standard’, draw a histogram to illustrate this information.

7. The marks awarded to 136 pupils in an examination are summarised in Table B below.
Draw a histogram to illustrate the data.

8. 38 children solved a simple problem and the time taken by each was noted.

[Table of times; not legible in the source.]

Draw a histogram to illustrate this information.

9. Table C below shows the number of pupils gaining marks within various groups in an examination.
Draw a histogram to illustrate these data. (C Additional) P

Table A

Length of stay (min)    6-25    26-60    61-80    81-105    106-115    116-150    151-200    201-300
[Frequencies not legible in the source.]

Table B

Marks    10-29    30-39    40-49    50-59    60-64    65-69    70-84
[Frequencies not legible in the source.]

Table C

Marks               5-29    30-39    40-49    50-59    60-79    80-99
Number of pupils    30      30       65       48       40       20
CIRCULAR DIAGRAMS OR PIE DIAGRAMS

Another useful way of displaying data is to draw a pie diagram,
sometimes called a pie chart. Here again, area is proportional to
frequency.

Example 1.6 The sales (in thousands of litres) of petrol from four petrol stations
A, B, C and D are noted for the first week of March, and are shown
in the table:

Petrol station                 A     B      C     D
Sales (thousands of litres)    90    140    30    20

Construct a pie diagram to illustrate this information.

Solution 1.6 The total angle of 360° at the centre of a circle is divided according
to the sales at each of the stations.
The total sales (thousands of litres) = 90 + 140 + 30 + 20 = 280

The angle representing the sales of petrol at station A is given by

\[ \frac{90}{280} \times 360^\circ = 115.7^\circ \quad \text{(1 d.p.)} \]

and so, for each of the petrol stations we have

Petrol station    Sales    Angle
A                 90       115.7°
B                 140      180°
C                 30       38.6°
D                 20       25.7°
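The angle calculation in Solution 1.6 can be reproduced with a few lines of Python. The function name `pie_angles` is an assumption for this sketch; the sales figures are those used in the solution (90, 140, 30 and 20 thousand litres).

```python
def pie_angles(values):
    """Sector angles in degrees, each proportional to its share of the total."""
    total = sum(values.values())
    return {name: 360 * v / total for name, v in values.items()}


sales = {"A": 90, "B": 140, "C": 30, "D": 20}  # thousands of litres
angles = pie_angles(sales)
for station, angle in angles.items():
    print(f"Station {station}: {angle:.1f} degrees")
```

The unrounded angles always sum to 360°; after rounding to 1 d.p. the total may differ slightly from 360°, so it is worth checking the sum before drawing.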
[Figure: Pie diagram to show the sales of petrol (in thousands of litres). Sectors: Station A (90), Station B (140), Station C (30), Station D (20).]

COMPARISON OF DATA USING PIE DIAGRAMS

Pie diagrams are particularly useful when we wish to compare two
or more sets of similar data.
Suppose that we are given information about the land use (for
barley, wheat and woodland) in three parishes. We can draw three
pie diagrams to illustrate the land use in each parish. However, if we
wish to compare the sets of data we must make the size (or area) of
each circle proportional to the total land for each parish. In this
example we will refer to the total amount of land as the ‘frequency’
F.
The area of a circle is πr², so we will require, with obvious notation,

\[ \pi r_1^2 : \pi r_2^2 : \pi r_3^2 = F_1 : F_2 : F_3 \]

so

\[ r_1^2 : r_2^2 : r_3^2 = F_1 : F_2 : F_3 \quad \text{(cancelling } \pi \text{)} \]

i.e.

\[ r_1 : r_2 : r_3 = \sqrt{F_1} : \sqrt{F_2} : \sqrt{F_3} \quad \text{(taking square roots)} \]

So, the radii of the circles are proportional to the square roots of
the frequencies.

We then choose a convenient scale and draw the circles.

Example 1.7 The following agricultural statistics refer to the land use, in hectares,
of three parishes. Draw three pie diagrams to compare these data.

Parish       Barley    Wheat    Woodland
Appleford    1830      1640     550
Burnford     645       435      120
Carnford     320       160      150
Solution 1.7 Now F₁ = 4020, F₂ = 1200 and F₃ = 630.

So

\[
\begin{aligned}
r_1 : r_2 : r_3 &= \sqrt{F_1} : \sqrt{F_2} : \sqrt{F_3} \\
&= \sqrt{4020} : \sqrt{1200} : \sqrt{630} \\
&= 63.40 : 34.64 : 25.10 \\
&= 3.17 : 1.732 : 1.255
\end{aligned}
\]

For convenience, we take r₁ = 3.2 cm, r₂ = 1.7 cm and r₃ = 1.3 cm.
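The choice of radii can be checked with a small Python sketch. The function name `comparative_radii` and the decision to fix the largest circle at 3.2 cm are assumptions made for illustration; any convenient scale would serve, provided the radii stay in the ratio of the square roots of the totals.

```python
import math


def comparative_radii(totals, largest_radius):
    """Radii proportional to the square roots of the totals.

    The circle for the largest total is given `largest_radius`; this
    choice of scale is an assumption, not part of the method itself.
    """
    biggest = max(totals.values())
    return {name: largest_radius * math.sqrt(total / biggest)
            for name, total in totals.items()}


land = {"Appleford": 4020, "Burnford": 1200, "Carnford": 630}  # hectares
radii = comparative_radii(land, largest_radius=3.2)
for parish, r in radii.items():
    print(f"{parish}: r = {r:.1f} cm")
```

Because area depends on the square of the radius, the resulting circle areas are in the ratio 4020 : 1200 : 630, as required.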

The angles in the pie diagrams are calculated as shown in the table:

Parish       Barley                         Wheat                          Woodland
Appleford    (1830/4020)(360°) = 163.9°     (1640/4020)(360°) = 146.9°     (550/4020)(360°) = 49.2°
Burnford     (645/1200)(360°) = 193.5°      (435/1200)(360°) = 130.5°      (120/1200)(360°) = 36°
Carnford     (320/630)(360°) = 182.9°       (160/630)(360°) = 91.4°        (150/630)(360°) = 85.7°

[Figure: Pie diagrams to show land use (in hectares) in three parishes. Appleford: Barley (1830), Wheat (1640), Woodland (550). Burnford: Barley (645), Wheat (435), Woodland (120). Carnford: Barley (320), Wheat (160), Woodland (150).]

Exercise 1b

1. Construct a pie diagram to illustrate the scores obtained when a die is thrown 120 times.

[Table of scores and frequencies; not legible in the source.]

2. The results of the voting in an election were as follows:

[Candidate names not legible in the source.]
2045 votes
4238 votes
8605 votes
12012 votes

Represent this information on a pie diagram.

3. The pie chart, which is not drawn to scale, shows the distribution of various types of land and water in a certain county. Calculate
(i) the area of woodland,
(ii) the angle of the urban sector,
(iii) the total area of the county.
(C Additional) P

[Figure: Pie chart, not drawn to scale. Legible labels: WOODLAND, FARMLAND, 88°, 160°, 1200 km².]

4. The table shows the sales, in millions of dollars, of a company in two successive years.

[Four regional column headings, partly illegible in the source; the third reads Asia.]
1972    8.4    12.2    15.6    23.8
1973    5.5    6.7     13.2    19.6

Draw two pie charts which allow the total annual sales to be compared. (C Additional)

5. Five companies form a group. The sales of each company during the year ending 5th April, 1978, are shown in the table below.

Company    A    B    C    D    E
Sales (in £1000’s)    [values not legible in the source]

Draw a pie chart of radius 5 cm to illustrate this information.
For the year ending 5th April, 1979, the total sales of the group increased by 20%, and this growth was maintained for the year ending 5th April, 1980.
If pie charts were drawn to compare the total sales for each of these years with the total sales for the year ending 5th April, 1978, what would be the radius of each of these pie charts?
If the sales of company E for the year ending 5th April, 1980, were again £60 000, what would be the angle of the sector representing them? (C Additional)

6. Mr Williams worked out how much it had cost him to run his car for each of 3 consecutive years. The results were as follows:

[Column headings partly illegible in the source; one reads ‘Tax and insurance’.]
£150.00    £72.50     £190.00
£187.00    £116.00    £205.00
£175.00    £289.90    £253.10

Draw three pie diagrams to compare this information.

7. Housewives were asked how much they spent last week on various items. Mrs M replied as follows:

[Table of items and amounts; not legible in the source.]

Draw a pie diagram with radius 4 cm to illustrate this information.
A comparison was then made with the pie diagram drawn to illustrate Mrs N’s replies in which the circle representing the total amount had a radius of 5 cm, the sector representing the amount spent on item A had an angle of 72° and the amount spent on item B was £4.00. Find the amount spent on item C by Mrs N and draw a pie diagram to illustrate her expenditure.

FREQUENCY POLYGONS
A frequency distribution may be displayed as a frequency polygon.

(a) Ungrouped data

Example 1.8 Twenty pieces of material, each of length 10 m, were examined for flaws and the number of flaws in each length was noted. Draw a frequency polygon to illustrate this information.

Number of flaws 0 1 2 3 4 5

Solution 1.8 Points are plotted, with the number of flaws on the horizontal axis
and the frequency on the vertical axis.

[Figure: A frequency polygon to show the number of flaws in lengths of material. Horizontal axis: number of flaws; vertical axis: frequency.]

(b) Grouped data


A frequency polygon may be superimposed on a histogram by
joining the mid-points of the tops of the rectangles.

Example 1.9 Construct a frequency polygon for the data given in Example 1.3.

Solution 1.9 [Figure: Frequency polygon to show the masses of 35 objects. Horizontal axis: mass (kg), marked at 5.5, 8.5, 11.5, ..., 29.5; vertical axis: standard frequency.]

The frequency polygon can be constructed without drawing the histogram first. To do this, plot standard frequency against the mid-point of the interval.

NOTE: the mid-point of the interval (a, b) is $\frac{1}{2}(a + b)$.
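The mid-point rule can be sketched in a few lines of Python. This is only an illustration: the boundaries below are assumed to be the class boundaries marked on the mass axis of Solution 1.9, and the helper name `mid_points` is hypothetical.

```python
# Class boundaries assumed from the mass axis of Solution 1.9 (kg).
boundaries = [5.5, 8.5, 11.5, 14.5, 17.5, 20.5, 23.5, 26.5, 29.5]


def mid_points(bounds):
    """Return the mid-point (a + b) / 2 of each consecutive pair of boundaries."""
    return [(a + b) / 2 for a, b in zip(bounds, bounds[1:])]


mids = mid_points(boundaries)
print(mids)
```

Each mid-point would then be paired with the standard frequency of its class to give the points of the polygon.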

FREQUENCY CURVES
If the number of intervals is large, then the frequency polygon will
consist of a large number of line segments. The frequency polygon
approaches a smooth curve, known as a frequency curve.
[Figure: Frequency curve.]

Exercise 1c

1. In a competition to grow the tallest hollyhock, the heights recorded by 50 competitors were as follows. Heights were measured to the nearest cm (see Table A below).
Draw a histogram and superimpose the frequency polygon.

Table A

Height (cm) |177-186 187-191 192-196 197-201 202-206 207-216


Frequency 12 8 8 S, a 6

2. (a) The following table shows the weekly sales of television sets in a department store in one year.

Number of sets sold/week   5-13  14-22  23-31  32-40  41-49
Number of weeks

Draw a frequency polygon to illustrate this information.

(b) The following year the sales were as follows:

Number of sets sold/week   5-13  14-22  23-31  32-40  41-49
Number of weeks              3     16     20     12      1

Draw a frequency polygon to show the sales in the second year, on the same grid as part (a).

3. The table shows the duration, in minutes, of 64 telephone calls made from a high street call box in one day.

Length of call (min)
Frequency              3 1 22 20 6 6 0

Draw a frequency polygon to illustrate the information.

4. The table shows the ages (in completed years) of women who gave birth to a child at Anytown Maternity Hospital during a particular year.

Age (years)        16-  20-  25-  30-  35-  45-
Number of births    70  470  535  280  118    0

Draw a frequency polygon to illustrate this information. Do not draw a histogram first.

CUMULATIVE FREQUENCY

The cumulative frequency is the total frequency up to a particular


item or class boundary. Sometimes this is thought of as a ‘running
total’.

(a) Ungrouped data

Example 1.10 The marks of 40 pupils in a test are shown in the table. Construct a
cumulative frequency distribution.

Mark        4  5  6   7  8  9  10
Frequency   2  5  8  10  7  5   3

Solution 1.10 The cumulative frequency distribution for the marks is as follows.

Mark                      Cumulative frequency
Up to and including 4     2
Up to and including 5     2 + 5 = 7
Up to and including 6     2 + 5 + 8 = 15
Up to and including 7     2 + 5 + 8 + 10 = 25
Up to and including 8     2 + 5 + 8 + 10 + 7 = 32
Up to and including 9     2 + 5 + 8 + 10 + 7 + 5 = 37
Up to and including 10    2 + 5 + 8 + 10 + 7 + 5 + 3 = 40

NOTE: the final value in the cumulative frequency column must be 40, as all the pupils obtained 10 marks or less.
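The running total of Solution 1.10 can be checked with a short Python sketch. The frequencies are those implied by the sums in the solution; the variable names are illustrative only.

```python
from itertools import accumulate

# Marks and frequencies as implied by the running totals in Solution 1.10.
marks = [4, 5, 6, 7, 8, 9, 10]
frequencies = [2, 5, 8, 10, 7, 5, 3]

# accumulate() yields the running total, i.e. the cumulative frequency.
cumulative = list(accumulate(frequencies))

for mark, cf in zip(marks, cumulative):
    print(f"Up to and including {mark}: {cf}")
```

The last running total equals the number of pupils, which is a quick check that no frequency has been missed.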

(b) Grouped data


When data is grouped we consider the total frequency up to the
upper class boundary of each interval.

Example 1.11 The heights of 30 broad bean plants were measured, correct to the
nearest cm, 6 weeks after planting. The frequency distribution is
given below. Construct the cumulative frequency table.

Height (cm) | 3-5  6-8  9-11  12-14  15-17  18-20


Solution 1.11. The upper class boundaries are 5.5, 8.5, 11.5, 14.5, 17.5, 20.5.
The lower boundary of the first class is 2.5.

Cumulative frequency table to show heights of plants

Height (cm) Cumulative frequency

CUMULATIVE FREQUENCY CURVE


The information in a cumulative frequency table can be shown on a
graph, called a cumulative frequency curve, or ogive.
The cumulative frequencies are plotted against the upper class
boundaries.

Example 1.12 (a) Construct a cumulative frequency curve for the data in Example 1.11.
(b) Estimate from the curve
(i) the number of plants that were less than 10 cm tall;
(ii) the value of x, if 10% of the plants were of height x cm or
more.

Solution 1.12 (a) [Figure: Cumulative frequency curve to show the heights of 30 broad bean plants. Horizontal axis: height (cm); vertical axis: cumulative frequency.]

(b)
(i) To find how many plants were less than 10 cm tall, find the height 10 cm on the horizontal axis. Draw a vertical line to meet the curve and then draw a horizontal line to meet the cumulative frequency axis.

From the graph, 7 plants were less than 10 cm tall.

(ii) 10% of the plants were of height x cm or more,


so 90% of the plants were less than x cm tall,
i.e. 27 plants were less than x cm tall.

Find 27 on the cumulative frequency axis and draw a horizontal line to meet the curve. Then draw a vertical line to meet the height axis.

From the graph, 27 plants were less than 16 cm tall.

Therefore 10% of the plants were of height 16 cm or more, and the


value of x is 16.

Example 1.13 Pupils were asked how long it took them to walk to school on a
particular morning. A cumulative frequency distribution was
formed:

Time taken (minutes)    <5  <10  <15  <20  <25  <30  <35  <40  <45
Cumulative frequency    28   45   81  143  280  349  374  395  400

(a) Draw a cumulative frequency curve.


(b) Estimate how many pupils took less than 18 minutes.
(c) 6% of the pupils took x minutes or longer. Find x.
(d) Taking equal class intervals of 0-, 5-, 10-,..., construct a
frequency distribution and draw a histogram.

Solution 1.13 (a) [Figure: Cumulative frequency curve to show the times taken to walk to school. Horizontal axis: time (minutes), 0 to 45; vertical axis: cumulative frequency.]

(b) From the graph we estimate that 114 pupils took less than 18
minutes.

(c) 6% of the pupils took x minutes or longer,


so 24 pupils took x minutes or longer,
and 376 pupils took less than x minutes.

From the graph, x = 36

Therefore, 6% of the pupils took 36 minutes or longer.

(d) We form the frequency distribution as follows:

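Upper class boundary   Cumulative frequency   Time (min)   Frequency
<5                      28                     0-           28
<10                     45                     5-           45 - 28 = 17
<15                     81                     10-          81 - 45 = 36
<20                    143                     15-          143 - 81 = 62
<25                    280                     20-          280 - 143 = 137
<30                    349                     25-          349 - 280 = 69
<35                    374                     30-          374 - 349 = 25
<40                    395                     35-          395 - 374 = 21
<45                    400                     40-          400 - 395 = 5
                                                            Total = 400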
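Recovering class frequencies from cumulative frequencies is just a matter of taking successive differences. A minimal Python sketch, using the cumulative frequencies of Example 1.13 (the helper name is hypothetical):

```python
# Cumulative frequencies at the upper class boundaries <5, <10, ..., <45.
cumulative = [28, 45, 81, 143, 280, 349, 374, 395, 400]


def frequencies_from_cumulative(cum):
    """The first class keeps its own total; each later class is a difference."""
    return [cum[0]] + [later - earlier for earlier, later in zip(cum, cum[1:])]


frequencies = frequencies_from_cumulative(cumulative)
print(frequencies)
print(sum(frequencies))
```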

[Figure: Histogram to show times taken by 400 pupils to walk to school. Horizontal axis: time (minutes), 0 to 45; vertical axis: frequency.]

Exercise 1d

1. Table A below gives the distribution of marks of candidates in an examination.
(a) Construct a cumulative frequency distribution and draw a cumulative frequency curve.
(b) Use your curve to estimate
(i) the percentage of candidates who passed, if the pass mark was 45;
(ii) the range of marks gained by all candidates except the top 10% and the bottom 10%.

Table A

Marks       0-9  10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89  90-99
Frequency   20 15 35 55 65 105 155 £100

2. The cumulative frequency curve below has been drawn from information about the amount of time spent by 50 people in a supermarket on a particular day.

[Figure: cumulative frequency curve. Horizontal axis: time (minutes); vertical axis: cumulative frequency.]

(a) Construct the cumulative frequency table, taking boundaries <5, <10, ....
(b) How many people spent between 17 and 27 minutes in the supermarket?
(c) 60% of the people spent less than or equal to t minutes. Find t.
(d) 60% of the people spent longer than s minutes. Find s.
(e) Taking equal class intervals of -5, -10, -15, ..., construct the frequency distribution and draw a histogram.

3. Table B below shows the frequency distribution of the masses of 52 women students at a college. Measurements have been recorded to the nearest kg.
(a) Construct a cumulative frequency table and draw a cumulative frequency curve.
(b) How many students were of mass less than 57 kg?
(c) How many students were of mass greater than 61 kg?
(d) What was the mass exceeded by 20% of the students?

4. 49 soil samples were collected in an area of woodland, and the pH value for each sample was found. The cumulative frequency distribution was constructed as shown in Table C below.
(a) Draw a cumulative frequency curve.
(b) What percentage of the samples had a pH value less than 7?
(c) 50% of the samples had a pH value greater than x. Find x.
(d) Taking equal class intervals of 4.4-, 4.8-, 5.2-, ..., construct the frequency distribution and draw a histogram.

Table B

Mass (kg) | 40-44  45-49  50-54  55-59  60-64  65-69  70-74

Table C

pH value | <4.8  <5.2  <5.6  <6.0  <6.4  <6.8  <7.2  <7.6  <8.0  52
Cumulative frequency
THE MEDIAN

The median is the middle value of a set of numbers arranged in order of magnitude.
If there are n numbers, the median is the $\frac{1}{2}(n + 1)$th value.

(a) Raw data


Example 1.14 Find the median of each of the sets
(a) 7, 7, 2, 3, 4, 2, 7, 9, 31,
(b) 36, 41, 27, 32, 29, 38, 39, 43.
Solution 1.14 (a) Arranging in order of magnitude

2, 2, 3, 4, [7], 7, 7, 9, 31

n = 9, and the median is the $\frac{1}{2}(9 + 1)$th value, i.e. the 5th value.
So median = 7.

(b) Arranging in order of magnitude

27, 29, 32, [36, 38], 39, 41, 43

n = 8, and the median is the $\frac{1}{2}(8 + 1)$th value, i.e. the $4\frac{1}{2}$th value.
This does not exist, so we consider the 4th and 5th values:

median = $\frac{1}{2}(36 + 38)$ = 37

So median = 37.
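The two cases in Solution 1.14 (odd and even n) can be sketched in Python as follows. The function name is illustrative; Python's standard `statistics.median` applies the same rule.

```python
def median(values):
    """Middle value for odd n; mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # The (n + 1)/2 th value, counting from 1, sits at index n // 2.
        return ordered[mid]
    # The (n + 1)/2 th value does not exist: average its two neighbours.
    return (ordered[mid - 1] + ordered[mid]) / 2


set_a = [2, 2, 3, 4, 7, 7, 7, 9, 31]
set_b = [27, 29, 32, 36, 38, 39, 41, 43]
print(median(set_a), median(set_b))
```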

(b) Ungrouped frequency distribution


The median can be found directly from the cumulative frequency
distribution.

Example 1.15 The table shows the number of children in the family for 35 families
in a certain area. Find the median number of children per family.

Number of children   0  1   2  3  4  5
Frequency            3  5  12  9  4  2

Solution 1.15 Form a cumulative frequency distribution:

Number of children        Cumulative frequency
0                          3
Up to and including 1      8
Up to and including 2     20
Up to and including 3     29
Up to and including 4     33
Up to and including 5     35

There are 35 values, so the median is the $\frac{1}{2}(35 + 1)$th value, i.e. the 18th value.

We could have written out all the values in order from the frequency
table, thus 0, 0,0,1,1,1,1,1, 2, 2,.... However, we can see from
the cumulative frequency table that the 18th value will be 2, as the
first 8 values are 0 or 1 and the first 20 values are 0 or 1 or 2.
Therefore the median number of children per family is 2.
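Reading the median off a cumulative frequency table can be sketched as below. The frequencies are those implied by the cumulative totals in Solution 1.15; the function names are hypothetical.

```python
import math
from itertools import accumulate

values = [0, 1, 2, 3, 4, 5]          # number of children
frequencies = [3, 5, 12, 9, 4, 2]    # implied by the cumulative totals


def value_at(values, freqs, k):
    """Return the k-th ordered value (counting from 1) using running totals."""
    for value, running_total in zip(values, accumulate(freqs)):
        if k <= running_total:
            return value
    raise ValueError("k exceeds the total frequency")


def median_from_table(values, freqs):
    position = (sum(freqs) + 1) / 2
    # For a half-integer position, average the two neighbouring values.
    low = value_at(values, freqs, math.floor(position))
    high = value_at(values, freqs, math.ceil(position))
    return (low + high) / 2


print(median_from_table(values, frequencies))
```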

Exercise 1e

1. Find the median of each of the following sets of numbers:
(a) 4, 6, 18, 25, 9, 16, 22, 5, 20, 4, 8
(b) 192, 217, 189, 210, 214, 204
(c) 1267, 1896, 895, 3457, 2164
(d) 0.7, 0.4, 0.65, 0.78, 0.45, 0.32, 1.9, 0.0078

2. The table shows the scores obtained when a die is thrown 60 times. Form a cumulative frequency table and use it to find the median score.

3. Find the median of each of the following frequency distributions:
(a)
(b)
(c) 5 9 13 17 21

24 54 84 11.4 14.4
9 aa Sle Way aL7, 6

(c) Grouped frequency distribution


Once the information has been grouped and the raw data lost we
can only estimate a value for the median.
This can be done by one of the following methods:
(a) by calculation,
(b) from a cumulative frequency curve,
(c) from a histogram.
Example 1.16 below is done in the three different ways to illustrate
the methods.

Example 1.16 The masses, measured to the nearest kg, of 49 boys are noted and
the distribution formed. Estimate the median mass.

Mass (kg)  | 60-64  65-69  70-74  75-79  80-84  85-89
Frequency  |   2      6     12     14     10      5
Solution 1.16 First form a cumulative frequency distribution.
The upper class boundaries are 64.5, 69.5, 74.5, 79.5, 84.5, 89.5.
The lower class boundary of the first class is 59.5.

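Mass (kg)   Frequency   Mass (kg)   Cumulative frequency
60-64           2       <64.5           2
65-69           6       <69.5           8
70-74          12       <74.5          20
75-79          14       <79.5          34
80-84          10       <84.5          44
85-89           5       <89.5          49

The median is the $\frac{1}{2}(49 + 1)$th value, i.e. the 25th value.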

Method (a): By calculation. The 25th value lies in the class 74.5-79.5.

[Diagram: the interval from 74.5 kg to 79.5 kg, with 20 items below 74.5 kg, 25 items below the median and 34 items below 79.5 kg.]

There are 14 items in the class 74.5-79.5 and from the diagram the median is $\frac{5}{14}$ of the way along the interval of 5 kg from 74.5 to 79.5.

Estimate of the median mass = $74.5 + \frac{5}{14}(5)$ = 76.3 kg (1 d.p.)

Therefore we estimate the median to be 76.3 kg (1 d.p.)
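The calculation in Method (a) amounts to linear interpolation within the class that contains the median. A minimal Python sketch using the data of Example 1.16 (the function name is hypothetical, and the $\frac{1}{2}(n + 1)$th position is taken as in the text):

```python
def grouped_median(boundaries, freqs):
    """Estimate the median by linear interpolation inside the median class."""
    target = (sum(freqs) + 1) / 2          # position of the median
    below = 0                              # cumulative frequency so far
    classes = zip(boundaries, boundaries[1:], freqs)
    for lower, upper, freq in classes:
        if below + freq >= target:
            fraction = (target - below) / freq
            return lower + fraction * (upper - lower)
        below += freq
    raise ValueError("frequencies must not all be zero")


boundaries = [59.5, 64.5, 69.5, 74.5, 79.5, 84.5, 89.5]
frequencies = [2, 6, 12, 14, 10, 5]
estimate = grouped_median(boundaries, frequencies)
print(round(estimate, 1))
```

Here the median class is 74.5-79.5, with 20 items below it, so the fraction is 5/14 as in the worked solution.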

Method (b) — From the cumulative frequency curve Draw the


cumulative frequency curve and read off the value corresponding to
a cumulative frequency of 25.
[Figure: Cumulative frequency curve to show the masses of 49 boys. Horizontal axis: mass (kg), marked at 59.5, 64.5, ..., 89.5, with the median indicated; vertical axis: cumulative frequency.]

From the graph, the value corresponding to the cumulative frequency


of 25 is 76.3 kg.
Therefore an estimate of the median is 76.3 kg.

Method (c) — From a histogram First, draw the histogram. As the


classes are of equal width the vertical axis can be labelled ‘frequency’.
[Figure: Histogram to show the masses of 49 boys. Horizontal axis: mass (kg), marked at 59.5, 64.5, ..., 89.5; vertical axis: frequency.]

In a histogram, the area is proportional to the frequency. The median is the middle value, so it will divide the area under the histogram into two equal parts. Altogether, contained in the histogram, there are 49 'units' of area. We want to draw a line to the left of which there are $24\frac{1}{2}$ units of area, and to the right of which there are $24\frac{1}{2}$ units of area.

Consider the class 74.5-79.5; it contains 14 units. To the left of it there is a total of 20 units of area. So we need to divide the area in this class in the ratio $4\frac{1}{2} : (14 - 4\frac{1}{2})$, i.e. $4\frac{1}{2} : 9\frac{1}{2}$. This will give $24\frac{1}{2}$ units of area to the left of the median.

Now, to divide AB in the ratio $4\frac{1}{2} : 9\frac{1}{2}$, first find the point P which is at $4\frac{1}{2}$ on the vertical axis.
So AP:PC = $4\frac{1}{2} : 9\frac{1}{2}$ and AP:AC = $4\frac{1}{2} : 14$.

Draw AD, the diagonal of the rectangle.
Draw a horizontal line PR and drop the vertical line QM from the point where PR cuts the diagonal AD.
By similar triangles PQ:CD = $4\frac{1}{2} : 14$.
Therefore AM:AB = $4\frac{1}{2} : 14$ and so AM:MB = $4\frac{1}{2} : 9\frac{1}{2}$.
Therefore, from the histogram, an estimate of the median is 76.3 kg.
Example 1.17 The haemoglobin levels were measured in a sample of 50 people and the results were as follows, each being correct to 1 d.p.:

13.5 15.6 16.38 12.3 13.1 14.2 12.4 11.3 14.0 14.6
19.G514.8 12.7 10.9 11.0 11.4 15.0 10.1 15.4 11.3
10% 14.6 13.5 15.1 12.1 12.0 14.2 11.4 15.0 13.3
13.2 9.1 16.9 14.2 15.0 13.6 14.8 11.4 14.8 15.7
13.5 13.5 12.9 13.8 13.7 16.2 11.6 13.8 14.2 10.7

(a) Group the data into eight classes, 9.0-9.9, 10.0-10.9,...,


16.0-16.9.
(b) What are the smallest and largest possible measurements which
could be included in the class 9.0-9.9?
(c) Draw a histogram of the grouped data and use it to estimate the
median value of the sample, showing your working.
(d) Find the true median of the sample. (SUJB)

Solution 1.17 (a)

Haemoglobin level   Frequency
9.0-9.9                 1
10.0-10.9               4
11.0-11.9               7
12.0-12.9               6
13.0-13.9              12
14.0-14.9              10
15.0-15.9               7
16.0-16.9               3

(b) If the haemoglobin level y is in the class 9.0-9.9 then, as levels have been measured correct to 1 d.p., we have 8.95 ≤ y < 9.95.

The smallest measurement is 8.95 and the largest measurement is a, such that a < 9.95. The upper class boundary of the interval is 9.95.

(c) The class boundaries are 8.95, 9.95, 10.95, 11.95, ... , 16.95.

The class widths are each equal to 1. As the class widths are equal
we can label the vertical axis ‘frequency’.

In a histogram the area is proportional to frequency. So the median


divides the histogram into two equal parts. We need to draw on the
histogram a line which will have 25 ‘units’ of area to the left of it
and 25 ‘units’ to the right of it.

Consider the class 13.0-13.9; there are 18 units of area to the left
of the lower class boundary point of 12.95, so we need another
7 units. If we find P such that AP = 7 then AP: AC = 7:12 and, by
similar triangles, PQ: CD = 7:12. Hence AM: AB = 7:12,

[Figure: Histogram to show haemoglobin levels of 50 people. Horizontal axis: haemoglobin level, marked at 8.95, 9.95, ..., 16.95; vertical axis: frequency.]

So there are 7 units of area in the class 13.0-13.9 to the left of the
line QM.
From the histogram, an estimate of the median is 13.45.

(d) In the sample the median is the $\frac{1}{2}(50 + 1)$th value, i.e. the $25\frac{1}{2}$th value.
Now there are 18 readings as far as a haemoglobin level of 12.95.
Arranging the items in the class 13.0-13.9 gives

19th   20th   21st   22nd   23rd   24th   25th   26th
13.1   13.2   13.3   13.5   13.5   13.5   13.5   13.6

The median lies between the 25th and 26th values.

So the true median is $\frac{1}{2}(13.5 + 13.6)$ = 13.55.

Exercise 1f

1. Estimate the median of the following frequency distribution
(a) by calculation,
(b) from a cumulative frequency curve,
(c) from a histogram.
The frequency distribution shows the times taken by 55 pupils to do their mathematics homework. Times have been measured to the nearest minute.

Time (min) | 5-14  15-24  25-34  35-44  45-54
Frequency  |   5      7     19     17      7

2. Eggs laid at Hill Farm are weighed and the results grouped as shown:

Mass (gm)   -50  -54  -58  -62  -66  -70  -74
Frequency   S222 IER AQ M10 PG. a9

Construct a cumulative frequency table and draw a cumulative frequency curve. Use the curve to estimate the median mass.

3. The table shows the frequency distribution of the speeds of cars passing along a marked stretch of road of length 1 kilometre. Estimate the median speed.

Speed (km/h)   40-  60-  80-  100-

4. Estimate the median diameter of rods produced by a particular machine by drawing a histogram of the data given in Table A below. Explain your method.

5. The distribution of marks obtained by 199 students in a mathematics examination is shown in Table B below. Construct a cumulative frequency table and use it to estimate the median mark.

6. The length of life (to the nearest hour) of each of 50 electric light bulbs is noted and the results shown in Table C below. Calculate the median length of life.

Table A

Diameter (cm) | 0.49-0.51  0.52-0.54  0.55-0.57  0.58-0.60  0.61-0.63
Frequency

Table B

10-39  40-49  50-54  55-59  60-69  70-79  80-89
120 ih ot AGe be oe) 17

Table C

Length of life (h)   650-669  670-679  680-689  690-699  700-719

QUARTILES, PERCENTILES, INTERQUARTILE RANGE


The three values which split a distribution into four equal portions
are known as the quartiles.
The 99 values which split a distribution into 100 equal portions are
the percentiles.
Consider n items, arranged in ascending order:

Lower quartile     $Q_1$      $\frac{1}{4}(n + 1)$th value
Median             $Q_2$      $\frac{1}{2}(n + 1)$th value
Upper quartile     $Q_3$      $\frac{3}{4}(n + 1)$th value
10th percentile    $P_{10}$   $\frac{10}{100}(n + 1)$th value
90th percentile    $P_{90}$   $\frac{90}{100}(n + 1)$th value

and so on.

The interquartile range = upper quartile - lower quartile = $Q_3 - Q_1$

The semi-interquartile range = $\frac{1}{2}(Q_3 - Q_1)$

NOTE: the advantage of these ranges is that they depend entirely on the middle half of the readings and they are not affected by extreme values.

The 10 to 90 percentile range = $P_{90} - P_{10}$.



Example 1.18 Find the semi-interquartile range of the following set of numbers:
2,3,3,9,6,6,12,11,8, 2, 3,5, 7,5,4,4,5,12,9

Solution 1.18 First, arrange the numbers in ascending order:

2, 2, 3, 3, [3], 4, 4, 5, 5, 5, 6, 6, 7, 8, [9], 9, 11, 12, 12

There are 19 numbers.

$Q_1$ is the $\frac{1}{4}(19 + 1)$th value, i.e. the 5th value

$Q_1 = 3$

$Q_3$ is the $\frac{3}{4}(19 + 1)$th value, i.e. the 15th value

$Q_3 = 9$

Therefore semi-interquartile range = $\frac{1}{2}(Q_3 - Q_1)$
= $\frac{1}{2}(9 - 3)$
= 3

The semi-interquartile range of the set of numbers is 3.
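The $(n + 1)$ rule for quartiles can be sketched in Python as follows. The function names are hypothetical, and the handling of fractional positions (linear interpolation between the two neighbouring values) is an assumption that extends the averaging used for the median; the sketch also assumes n is large enough for every position to lie between 1 and n.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position; interpolate if the position is fractional."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)


def quartiles(values):
    """Return (Q1, Q2, Q3) as the k(n + 1)/4 th ordered values, k = 1, 2, 3."""
    ordered = sorted(values)
    n = len(ordered)
    return tuple(value_at_position(ordered, k * (n + 1) / 4) for k in (1, 2, 3))


data = [2, 3, 3, 9, 6, 6, 12, 11, 8, 2, 3, 5, 7, 5, 4, 4, 5, 12, 9]
q1, q2, q3 = quartiles(data)
semi_iqr = (q3 - q1) / 2
print(q1, q3, semi_iqr)
```

For the 19 numbers of Example 1.18 the positions are whole numbers (5th, 10th and 15th), so no interpolation is needed.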

When data has been grouped into classes, the values of the quartiles
and percentiles may be obtained in the same way as the median.

Example 1.19 The table gives the cumulative distribution of the heights (in cm) of
400 children in a certain school:

Height (cm)            <100  <110  <120  <130  <140  <150  <160  <170
Cumulative frequency      0    27    85   215   320   370   395   400

(a) Draw a cumulative frequency curve.


(b) Find an estimate of the median.
(c) Determine the interquartile range.
(d) Determine the 10 to 90 percentile range.

Solution 1.19 Consider the 400 values arranged in ascending order:


Median             $\frac{1}{2}(401)$th value ≈ 200th value
Lower quartile     $\frac{1}{4}(401)$th value ≈ 100th value
Upper quartile     $\frac{3}{4}(401)$th value ≈ 300th value
10th percentile    $\frac{10}{100}(401)$th value ≈ 40th value
90th percentile    $\frac{90}{100}(401)$th value ≈ 360th value

(a) [Figure: Cumulative frequency curve to show the heights of 400 children. Horizontal axis: height (cm), 100 to 170; vertical axis: cumulative frequency.]

From the curve,

(b) An estimate of the median is 129 cm.

(c) $Q_3$ = 137.5 cm, $Q_1$ = 121.5 cm.

The interquartile range = $Q_3 - Q_1$
= 137.5 - 121.5
= 16 cm

The middle half of the readings therefore has a range of 16 cm.

(d) $P_{90}$ = 147 cm, $P_{10}$ = 113 cm.

The 10 to 90 percentile range = $P_{90} - P_{10}$
= 147 - 113
= 34 cm

Therefore the middle 80% of the readings have a range of 34 cm.
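When no graph is to hand, the readings in Solution 1.19 can be approximated by linear interpolation between the tabulated points. The Python sketch below does this for the cumulative table of Example 1.19; the function name is hypothetical, and because it joins the points with straight lines rather than a smooth curve, its answers differ slightly from the values read from the curve in the text.

```python
def height_at(bounds, cumulative, target):
    """Height below which `target` readings lie, by straight-line interpolation."""
    points = list(zip(bounds, cumulative))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= target <= c1:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target lies outside the table")


bounds = [100, 110, 120, 130, 140, 150, 160, 170]      # height (cm)
cumulative = [0, 27, 85, 215, 320, 370, 395, 400]

median = height_at(bounds, cumulative, 200)
q1 = height_at(bounds, cumulative, 100)
q3 = height_at(bounds, cumulative, 300)
print(round(median, 1), round(q1, 1), round(q3, 1))
```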



Exercise 1g

1. Find (a) the median, (b) the lower quartile $Q_1$, (c) the upper quartile $Q_3$ for each of the following sets of data:
(i) Test marks of 11 students:

52, 61, 78, 49, 47, 79, 54, 58, 62, 73, 72

(ii)
Number of peas per pod
Frequency   LO 1S se 24 22519548) 85

2. The marks scored by 63 pupils in a test are shown in the frequency distribution. Calculate (a) the median, (b) the interquartile range for the set of marks.

Mark        0  1  2  3  4   5   6   7  8  9  10
Frequency   2  2  3  4  6  11  15  10  6  3   1

3. Table A below shows the marks, collected into groups, of 400 candidates in an examination. The maximum mark was 99.
Compile the cumulative frequency table and draw the cumulative frequency curve.
Use your curve to estimate (i) the median, (ii) the 20th percentile.
If the minimum mark for Grade A was fixed at 74, estimate from your curve the percentage of candidates obtaining Grade A. (C Additional)

4. An inspection of 34 aircraft assemblies revealed a number of missing rivets as shown in Table B below.
Draw a cumulative frequency curve. Use this curve to estimate the median and the quartiles of the distribution. (O&C)

5. From the soil of an English garden 100 earthworms were collected. Their lengths were recorded to the nearest millimetre and grouped as shown in Table C below.
Write down the cumulative frequency table and draw a cumulative frequency curve to illustrate this information.
Estimate
(i) the median length of worm,
(ii) the semi-interquartile range,
(iii) the percentage of worms which are over 180 mm in length. (C Additional)

6. Every day at 08 28 a train departs from one city and travels to a second city. The times taken for the journey were recorded in minutes over a certain period and were grouped as shown in Table D below.
(The interval '-90' indicates all times greater than 85 minutes up to and including 90 minutes.)
From these figures draw a cumulative frequency curve and from this curve estimate
(i) the median time for the journey,
(ii) the semi-interquartile range,
(iii) the number of trains which arrived at the second city between 10 00 and 10 15. (C Additional)

Table A

Marks               0-9  10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89  90-99
No. of candidates   26 42 66 83 als 52 30

Table B

Number of rivets missing   0-2  3-5  6-8  9-11  12-14  15-17  18-20  21-23

Table C

Length (mm)        95-109  110-124  125-139  140-154  155-169  170-184  185-199  200-214
Number of worms       2        8       17       26       24       16        6        1

Table D

2806-85 -90 -95 -100 -105 -110 -115 -120 -125 over 125
0 6 12Z60°227) 231 FTO a + 2 1 0
7. 30 specimens of sheet steel are tested for tensile strength, measured in kN m⁻². Table E below gives the distribution of the measurements.
Draw a cumulative frequency diagram of this distribution.
Estimate the median and the 10th and 90th percentiles. (O&C)

8. The figure below shows the cumulative frequency diagram for the distribution of the number of marks, N, in the range 0 to 99 inclusive, obtained by 120 candidates in an examination. From the diagram, estimate
(a) the median mark,
(b) the inter-quartile range,
(c) the number of candidates who scored more than 59 marks.
State why the diagram has to be read at N = 9.5, 19.5, ... if a grouped frequency table showing how many candidates are in the class intervals 0-9, 10-19, ... is to be found.
Draw up such a table and illustrate it by drawing a histogram. Mark on your diagram the median mark. (L Additional)

9. The distribution of the times taken when a certain task was performed by each of a large number of people was such that its twentieth percentile was 25 minutes, its fortieth percentile was 50 minutes, its sixtieth percentile was 64 minutes and its eightieth percentile was 74 minutes. Use linear interpolation to estimate (i) the median of the distribution, (ii) the upper quartile of the distribution, (iii) the percentage of persons who performed the task in forty minutes or less. (JMB)

10. The frequency distribution, given in the table, refers to the heights, in cm, of 50 men, corrected to the nearest 10 cm.

Height (cm) | 140  150  160  170  180  190

(a) State the least possible height of the one man whose height is recorded in the table as 140 cm.
(b) Draw on graph paper a histogram to illustrate the data of the table, drawing five columns, with the first column representing the seven shortest men. Label the axes carefully and explain clearly how frequency has been represented on your histogram.
(c) Draw a cumulative frequency diagram on graph paper for the data given in the table. From your diagram, estimate the upper and lower quartiles, the median height and the interquartile range.
(L Additional)

11. The following data concern a random sample of 1000 men with heights in the given ranges.

180-
182-
184-
186-
188-
190-192

Draw a cumulative frequency diagram to illustrate these data. Use your diagram to estimate
(a) the median height,
(b) the range of heights for men who are between the fortieth and seventieth percentiles,
(c) the number of men in the sample with heights of at least 183 cm.
(L Additional)

Table E

Tensile strength      | 405-415  415-425  425-435  435-445  445-455  455-465
Number of specimens   |    4        3        6       10        5        2

[Figure for Question 8: cumulative frequency diagram. Horizontal axis: marks N, up to 100; vertical axis: cumulative frequency.]

MEASURES OF LOCATION
There are three main statistical measures which attempt to locate a
‘typical’ value. These are
the median (which we have already investigated, p. 22),
the mode,
the arithmetic mean.

THE MODE
The mode is the value that occurs most often.

The mode has the advantage that it is easy to calculate and it eliminates the effects of extreme values, but it is generally unsuitable for further calculation and it is not used widely.

(a) Raw data

Example 1.20 Find the mode of each of the following sets:


(a) 4, 5, 5, 1, 2, 9, 5, 6, 4, 5, 7, 5, 5
(b) 1, 8, 19, 12, 3, 4, 6, 9
(c) 2, 2, 3, 5, 8, 2, 5, 6, 6, 5
Solution 1.20 (a) The mode is 5, as it occurs most often.
(b) The mode does not exist.
(c) The modes are 2 and 5. The distribution is said to be bimodal.

(b) Grouped data


When data has been grouped into classes, the class which has the
largest standard frequency is called the modal class. An estimate of
the mode can be obtained from the modal class.

Example 1.21 Estimate the mode of the following frequency distribution which
shows the marks of 330 candidates in an examination.

Marks       11-20  21-30  31-40  41-50  51-60  61-70  71-80  81-90  91-100
Frequency     20     40     80    100     50     20     10     10      0

Solution 1.21 First, a histogram is constructed.



[Figure: Histogram to show examination marks. Horizontal axis: marks, marked at 10.5, 20.5, ..., 90.5, with the estimate of the mode indicated at 43; vertical axis: frequency.]

The modal class is 41-50.

Now the modal class contains 20 more than the class below and 50
more than the class above. So the mode is likely to divide the modal
class in the ratio 20:50 = 2:5.

An estimate of the mode can be found from the histogram by


drawing lines as shown in the diagram. This gives a value of 43
marks.

Estimate of the mode by calculation:

An estimate of the mode is $\frac{20}{20 + 50}$ of the way along the interval of 10 marks from 40.5 to 50.5. So

estimate of mode = $40.5 + \frac{20}{20 + 50}(10)$
= $40.5 + \frac{2}{7}(10)$
= 43.4

An estimate of the mode is 43.4 marks.
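The calculation above uses only the modal class and its two neighbours, so it is easy to sketch in Python. The function and parameter names are illustrative, and the sketch assumes equal class widths, as in Example 1.21.

```python
def estimate_mode(lower_boundary, width, f_below, f_modal, f_above):
    """Divide the modal class in the ratio d1 : d2 of the frequency excesses."""
    d1 = f_modal - f_below   # excess over the class below
    d2 = f_modal - f_above   # excess over the class above
    return lower_boundary + d1 / (d1 + d2) * width


# Modal class 41-50 (boundaries 40.5 to 50.5), neighbours 31-40 and 51-60.
mode = estimate_mode(40.5, 10, f_below=80, f_modal=100, f_above=50)
print(round(mode, 1))
```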

From the example it can be seen that the relevant information


concerns the modal class and the class on either side of it. Hence
we can save time if we extract this information instead of drawing
the complete histogram. For example,
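[Diagram: the modal class and the class on either side of it; $d_1$ and $d_2$ are the amounts by which the frequency of the modal class exceeds that of the class below and of the class above.]

a is the l.c.b. of the modal class

c is the width of the modal class

Estimate of the mode = $a + \frac{d_1}{d_1 + d_2}\,c$
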
Estimate of the mode from the cumulative frequency curve
The rate of increase of a cumulative frequency curve is greatest at
the point corresponding to the mode. Therefore, at the mode there
is a point of inflexion on the cumulative frequency curve. To
estimate where this occurs, place a ruler along the curve and find
where the curve has its maximum gradient.
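[Figure: a cumulative frequency curve with its point of inflexion marked; the corresponding value on the horizontal axis is the mode.]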

Example 1.22 Draw a cumulative frequency curve for the data in Example 1.21
and use it to estimate the mode.

Solution 1.22 The upper class boundaries are 20.5, 30.5, 40.5, ..., 90.5. The
lower boundary of the first class is 10.5.
The cumulative frequency distribution is as follows:
Marks                  <10.5  <20.5  <30.5  <40.5  <50.5  <60.5  <70.5  <80.5  <90.5
Cumulative frequency      0     20     60    140    240    290    310    320    330

[Figure: Cumulative frequency curve to show examination marks. Horizontal axis: marks, marked at 10.5, 20.5, ..., 90.5, with the mode indicated; vertical axis: cumulative frequency, 0 to 300.]

We see from the curve that the point of inflexion occurs when the
mark is 44.5 (approximately).
Therefore an estimate of the mode is 44.5 marks.

Exercise 1h

1. Find the mode of each of the following sets of numbers:
(a) e2im 22 a Zon 2a eee ao
(b) 412, 426, 435, 412, 427, 428, 485
(c) 4, 6, 4, 8, 9, 2, 4, 2, 6, 7, 8, 6, 5, 5, 4, 6
(d) 101, 106, 99, 108, 76, 87, 102, 93

2. Find the mode of the following frequency distribution:

Shoe size   Donraitess 5 (6
Frequency   8  15  23  20  14

3. State the modal class and using the histograms which you drew previously estimate the mode:
(a) for the data of Question 2, Exercise 1a,
(b) for the data of Question 3, Exercise 1a,
(c) for the data of Question 4, Exercise 1a.
For part (a) refer to the original data and state the true mode.

4. Table A below shows the number of men in various age groups with some form of paid employment in the village of West Morton. The age recorded for each man is the number of completed years lived.
(a) Construct the cumulative frequency table and draw the cumulative frequency curve.
(b) From the cumulative frequency curve, estimate the mode.
(c) Draw a histogram, and use it to estimate the mode.

5. The lives of 80 electric light bulbs were recorded in hours to the nearest hour and grouped as shown in Table B below.
(a) State the limits between which the actual life of each bulb in the first group must lie.
(b) Construct the cumulative frequency table and draw the cumulative frequency curve.
(c) Use your curve to estimate
(i) the median,
(ii) the 90th percentile.
(d) Explain how the curve may be used to estimate the mode of the distribution.
(C Additional)

Table A

Age (years)     | 14-20  21-30  31-40  41-50  51-60  61-70  71-90
Number of men   |   12     14     26     35     23      5      1

Table B

Life (in hours)   660-669  670-679  680-689  690-699  700-709  710-719  720-729  730-739
No. of bulbs

THE ARITHMETIC MEAN

The mean of the n numbers $x_1, x_2, \ldots, x_n$ is $\bar{x}$ where

$\bar{x} = \frac{\sum x_i}{n}$ for $i = 1, 2, \ldots, n$

NOTE: for simplicity we often drop the subscript i and write

$\bar{x} = \frac{\sum x}{n}$

(a) Raw data
Example 1.23 Find the mean of the set of numbers
63, 65, 67, 68, 69, 70, 71, 72, 74, 75

Solution 1.23 n = 10

\[ \sum x = 63+65+67+68+69+70+71+72+74+75 = 694 \]

Therefore
\[ \bar{x} = \frac{\sum x}{n} = \frac{694}{10} = 69.4 \]

The mean of the set of numbers is 69.4.
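The calculation in Example 1.23 can be checked with a few lines of Python. This is only an illustrative sketch; the function name `mean` and the variable `data` are chosen here and are not part of the text.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

# Data of Example 1.23
data = [63, 65, 67, 68, 69, 70, 71, 72, 74, 75]
print(sum(data))   # 694
print(mean(data))  # 69.4
```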

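The calculations can be made easier by using an assumed mean, written $x_a$.
We then consider the deviation, y, of each reading from $x_a$. So, for $x_1, x_2, \ldots, x_n$ we have

\[ y_1 = x_1 - x_a, \quad y_2 = x_2 - x_a, \quad \ldots, \quad y_n = x_n - x_a \]

Summing
\[ \sum y_i = \sum x_i - n x_a \quad \text{for } i = 1, 2, \ldots, n \]

So
\[ \frac{\sum y_i}{n} = \frac{\sum x_i}{n} - x_a \]

Therefore
\[ \bar{y} = \bar{x} - x_a \]

Rearranging
\[ \bar{x} = x_a + \bar{y} \]

In general, if $y = x - x_a$, then $\bar{y} = \bar{x} - x_a$ and so $\bar{x} = x_a + \bar{y}$.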

Example 1.24 Find the mean of the set of numbers given in Example 1.23, using
an assumed mean X, of 70.

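Solution 1.24 $x_a = 70$, so $y = x - 70$

x          | 63 65 67 68 69 70 71 72 74 75
y = x - 70 | -7 -5 -3 -2 -1  0  1  2  4  5

\[ \sum y = -6 \]

Now $y = x - 70$, therefore $\bar{y} = \bar{x} - 70$, so

\[ \bar{x} = 70 + \bar{y} = 70 + \frac{\sum y}{n} = 70 + \frac{-6}{10} = 69.4 \]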
Therefore the mean, X = 69.4, as before.
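The assumed-mean method of Example 1.24 can be sketched in Python as follows. The helper name `mean_with_assumed_mean` is hypothetical; the code simply applies $\bar{x} = x_a + \bar{y}$ with $y = x - x_a$.

```python
def mean_with_assumed_mean(values, assumed):
    """Mean via an assumed mean: x_bar = x_a + (sum of deviations) / n."""
    deviations = [x - assumed for x in values]  # y = x - x_a
    return assumed + sum(deviations) / len(values)

data = [63, 65, 67, 68, 69, 70, 71, 72, 74, 75]
deviation_total = sum(x - 70 for x in data)
print(deviation_total)                              # -6
print(round(mean_with_assumed_mean(data, 70), 1))   # 69.4
```

Any assumed mean gives the same answer; a value close to the true mean just keeps the deviations small.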

Exercise 1i

1. Find the mean for each of the following sets of numbers (a) without using an assumed mean, (b) using an assumed mean.
   (i) 5, 6, 6, 8, 8, 9, 11, 13, 14, 17
   (ii) [values illegible]
   (iii) 445, 475, 485, 515, 525, 545, 555, 565
   (iv) 1769, 1771, 1772, 1775, 1778, 1781, [remaining values illegible]
   (v) [values illegible]

2. If the mean of the following numbers is 17, find the value of c:
   12, 18, 21, c, 13

3. The mean of 10 numbers is 8. If an eleventh number is now included in the results, the mean becomes 9. What is the value of the eleventh number?

4. The mean of 4 numbers is 5, and the mean of 3 different numbers is 12. What is the mean of the 7 numbers together?

5. The mean of n numbers is 5. If the number 13 is now included with the n numbers, the new mean is 6. Find the value of n.

(b) Ungrouped frequency distribution


For a frequency distribution, the mean is given by

\[ \bar{x} = \frac{\sum fx}{\sum f} \]

Example 1.25 The 30 members of an orchestra were asked how many instruments
each could play. The results are set out in the frequency distribution.
Calculate the mean number of instruments played:

Number of instruments, x |  1   2   3   4   5
Frequency, f             | 11  10   5   3   1

Solution 1.25

x |  f | fx
1 | 11 | 11
2 | 10 | 20
3 |  5 | 15
4 |  3 | 12
5 |  1 |  5
  | Σf = 30 | Σfx = 63

\[ \bar{x} = \frac{\sum fx}{\sum f} = \frac{63}{30} = 2.1 \]

The mean number of instruments played is 2.1.
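As a check on Example 1.25, the mean of a frequency distribution can be computed directly from $\bar{x} = \sum fx / \sum f$. The sketch below assumes the frequencies 11, 10, 5, 3, 1, which are consistent with the totals $\sum f = 30$ and $\sum fx = 63$ in the solution; the function name is illustrative.

```python
def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f*x) / sum(f)."""
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / sum(freqs)

instruments = [1, 2, 3, 4, 5]
freqs = [11, 10, 5, 3, 1]   # assumed from the totals in Solution 1.25
print(sum(freqs))                           # 30
print(frequency_mean(instruments, freqs))   # 2.1
```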

Again, an assumed mean may be used, and we have

\[ \bar{x} = x_a + \bar{y} \quad \text{where } \bar{y} = \frac{\sum fy}{\sum f} \text{ and } y = x - x_a \]

Example 1.26 For the data of Example 1.25, find the mean using an assumed
mean of 2.

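Solution 1.26 Let $x_a = 2$, so $y = x - 2$.

x |  f |  y |  fy
1 | 11 | -1 | -11
2 | 10 |  0 |   0
3 |  5 |  1 |   5
4 |  3 |  2 |   6
5 |  1 |  3 |   3
  | Σf = 30 | | Σfy = 3

\[ \bar{x} = x_a + \frac{\sum fy}{\sum f} = 2 + \frac{3}{30} = 2.1 \]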
The mean number of instruments played is 2.1, as before.

It is particularly useful to use an assumed mean when dealing with large numbers or ones involving fractions or decimals.

(c) Grouped frequency distribution


When data has been grouped into intervals, the mid-point of the interval is taken to represent the interval.

Example 1.27 The lengths of 40 bean pods were measured to the nearest cm and
grouped as shown. Find the mean length, giving the answer to 1 d.p.

Length (cm)  | 4-8  9-13  14-18  19-23  24-28  29-33
Frequency, f |  2    4      7     14      8      5

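Solution 1.27 Method 1 Without using an assumed mean

Length (cm) | Mid-point, x |  f |  fx
4-8         |  6           |  2 |  12
9-13        | 11           |  4 |  44
14-18       | 16           |  7 | 112
19-23       | 21           | 14 | 294
24-28       | 26           |  8 | 208
29-33       | 31           |  5 | 155
            |              | Σf = 40 | Σfx = 825

\[ \bar{x} = \frac{\sum fx}{\sum f} = \frac{825}{40} = 20.6 \ (1 \text{ d.p.}) \]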
Therefore the mean length of the bean pods is 20.6 cm (1 d.p.).

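Method 2 Using an assumed mean

Let $x_a = 21$, then $y = x - 21$, where x is the mid-point of an interval.

Length (cm) | Mid-point, x |  f |   y |  fy
4-8         |  6           |  2 | -15 | -30
9-13        | 11           |  4 | -10 | -40
14-18       | 16           |  7 |  -5 | -35
19-23       | 21           | 14 |   0 |   0
24-28       | 26           |  8 |   5 |  40
29-33       | 31           |  5 |  10 |  50
            |              | Σf = 40 | | Σfy = -105 + 90 = -15

\[ \bar{x} = 21 + \bar{y} \quad \text{where } \bar{y} = \frac{\sum fy}{\sum f} \]

\[ \bar{x} = 21 + \frac{-15}{40} = 20.6 \ (1 \text{ d.p.}) \]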
Therefore the mean length of the bean pods is 20.6 cm (1 d.p.), as
before.
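The grouped-data calculation of Example 1.27 can be sketched in Python. The class limits are taken from the example; the frequencies 2, 4, 7, 14, 8, 5 are assumed from the totals $\sum f = 40$ and $\sum fx = 825$, and the function name is illustrative only.

```python
def grouped_mean(classes, freqs):
    """Mean of grouped data, using the mid-point of each class to represent it."""
    midpoints = [(low + high) / 2 for low, high in classes]
    total_fx = sum(f * x for x, f in zip(midpoints, freqs))
    return total_fx / sum(freqs)

classes = [(4, 8), (9, 13), (14, 18), (19, 23), (24, 28), (29, 33)]
freqs = [2, 4, 7, 14, 8, 5]   # assumed from the totals in Solution 1.27
pod_mean = grouped_mean(classes, freqs)
print(round(pod_mean, 1))     # 20.6
```

Because mid-points stand in for the actual lengths, the result is an estimate of the mean rather than its exact value.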

Exercise 1j

1. Find the mean for each of the following frequency distributions (a) without using an assumed mean, (b) using an assumed mean.
   (i), (ii), (iii) [tables illegible]
   (iv) Interval | 5-9 10-14 15-19 20-24 25-29 30-34
        f        | [values illegible]
   (v)  Interval | 101-104 105-108 109-112 113-116 117-120
        f        | 13 18 21 12 6

2. If the mean of the following frequency distribution is 3.66, find the value of a, and find the median and the mode of the distribution.
   [table illegible]

3. (a) State briefly the meaning of the terms mode, median and mean. Give an example of a situation where the most appropriate average would be (i) the mode, (ii) the median.
   (b) A bag contained five balls each bearing one of the numbers 1, 2, 3, 4, 5. A ball was drawn from the bag, its number noted, and then replaced. This was repeated 50 times and the table below shows the resulting frequency distribution.
   [table illegible]
   If the mean is 2.7, (i) determine the value of x and y, (ii) state the mode and median of this distribution. (C Additional)

4. The maximum daytime temperature was recorded for each day in February and the results noted as shown in Table A below.
   (a) Calculate the mean maximum daytime temperature for February. (b) Find the mode. (c) Find the median.

5. A sample of 100 boxes of matches was taken and a record made of the number of matches per box. The results were as follows:

   Number of matches per box | 47 48 49 50 51
   Frequency                 | 14 25 32 23 6

   Calculate the mean number of matches per box, using an assumed mean of 49.

6. The table shows the speeds of 200 vehicles passing a particular point:

   Speed (km/h) | 30- 40- 50- 60- 70- 80-
   Frequency    | 14 30 52 71 33 0

   Using an assumed mean of 55 km/h, calculate the mean speed.

7. After an exciting friendly rugby match between the A and B teams at a certain school the matron had to treat all 30 boys for bruises. She recorded the number of bruises sustained by each boy and submitted her report to the headmaster. However, she gave it to a very grubby boy to deliver and what the headmaster finally saw looked something like this (● marks a smudged entry):

   Number of bruises | 1 2 3 4 5 6 7 8
   Number of boys    | 3 1 ● 3 ● 6 3 6

   The headmaster overheard one boy saying that if everyone had received the same number of bruises he would have 5, instead of 8.
   (a) How many boys received (i) 3 bruises, (ii) 5 bruises?
   (b) How many boys were fortunate enough to sustain fewer than the median number of bruises?

8. On a certain day the number of books on 40 shelves in a library was noted and grouped as shown. Find the mean number of books on a shelf, using an assumed mean. Give your answer to 2 significant figures.

   Number of books   | 31-35 36-40 41-45 46-50 51-55 56-60
   Number of shelves | 4 6 10 13 5 2

Table A

Temperature (°C) | [values illegible]
Frequency        | [values illegible]

Using a method of ‘coding’ or ‘scaling’ to find the mean


The idea of using an assumed mean may be extended to make
calculations easier. We now illustrate this method of ‘coding’ or
‘scaling’.

Example 1.28 Find the mean of the set of numbers 5693, 5700, 5714, 5721,
5735.

Solution 1.28 Choose as assumed mean $x_a = 5714$. However, when we write out the values of $x - x_a$ we note that all the numbers are multiples of 7.
So, we introduce a further column $y = \frac{x - 5714}{7}$.

x    | x - 5714 | y
5693 | -21      | -3
5700 | -14      | -2
5714 |   0      |  0
5721 |   7      |  1
5735 |  21      |  3
     |          | Σy = -1

We have used the 'coding'
\[ y = \frac{x - 5714}{7} \]
Rearranging
\[ x = 5714 + 7y \]
Summing, for all 5 values
\[ \sum x = (5)(5714) + \sum 7y = (5)(5714) + 7\sum y \]
So
\[ \frac{\sum x}{5} = 5714 + 7\,\frac{\sum y}{5} \]
and
\[ \bar{x} = 5714 + 7\bar{y} \]
Therefore
\[ \bar{x} = 5714 + 7\left(\frac{-1}{5}\right) = 5712.6 \]

The mean of the set of numbers is 5712.6.

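In general, if the set of numbers $x_1, x_2, \ldots, x_n$ is transformed to the set of numbers $y_1, y_2, \ldots, y_n$ by means of the coding
\[ y = \frac{x - x_a}{b} \]
then $x = x_a + by$ and $\bar{x} = x_a + b\bar{y}$.

Proof Let
\[ y_i = \frac{x_i - x_a}{b} \quad \text{for } i = 1, 2, \ldots, n \]
Rearranging
\[ x_i = x_a + b y_i \]
Summing
\[ \sum x_i = n x_a + b \sum y_i \]
so
\[ \frac{\sum x_i}{n} = x_a + b\,\frac{\sum y_i}{n} \]
i.e.
\[ \bar{x} = x_a + b\bar{y} \]

This method is particularly useful when the data is in the form of a frequency distribution and the intervals are of equal width.

In this case
\[ \bar{y} = \frac{\sum f_i y_i}{\sum f_i} \]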
For data grouped into classes of equal width:
(a) use the mid-point of each interval to represent the class,
(b) choose a central value as assumed mean, X,,
(c) divide by the class width, b.

Example 1.29 A girl measured her waiting time (in minutes, to the nearest minute)
for the school bus on 30 mornings, and obtained the following
results:

Waiting time (min) | 1-4  5-8  9-12  13-16  17-20
Frequency          |  3    6    10     7      4

Use a method of coding to find her mean waiting time.

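Solution 1.29 The mid-points are 2.5, 6.5, 10.5, 14.5, 18.5. Choose a central value of 10.5, say, as assumed mean; so $x_a = 10.5$.
The class widths are each equal to 4 so take b = 4.

So we use the coding
\[ y = \frac{x - 10.5}{4} \quad \text{i.e. } x = 10.5 + 4y \]

Waiting time (min) | Mid-point, x |  f |  y | fy
1-4                |  2.5         |  3 | -2 | -6
5-8                |  6.5         |  6 | -1 | -6
9-12               | 10.5         | 10 |  0 |  0
13-16              | 14.5         |  7 |  1 |  7
17-20              | 18.5         |  4 |  2 |  8
                   |              | Σf = 30 | | Σfy = -12 + 15 = 3

Now $\bar{x} = x_a + b\bar{y}$ where $\bar{y} = \frac{\sum fy}{\sum f}$

So
\[ \bar{x} = 10.5 + 4\left(\frac{3}{30}\right) = 10.5 + 0.4 = 10.9 \]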
The mean waiting time is 10.9 minutes.
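The coding method of Example 1.29 can be sketched as below. The function name `coded_mean` is hypothetical, and the frequencies 3, 6, 10, 7, 4 are assumed from the totals $\sum f = 30$ and $\sum fy = 3$ in the solution.

```python
def coded_mean(midpoints, freqs, assumed, width):
    """Mean via the coding y = (x - x_a) / b, then x_bar = x_a + b * y_bar."""
    coded = [(x - assumed) / width for x in midpoints]      # y values
    y_bar = sum(f * y for y, f in zip(coded, freqs)) / sum(freqs)
    return assumed + width * y_bar

midpoints = [2.5, 6.5, 10.5, 14.5, 18.5]
freqs = [3, 6, 10, 7, 4]      # assumed from the totals in Solution 1.29
waiting_mean = coded_mean(midpoints, freqs, assumed=10.5, width=4)
print(round(waiting_mean, 1))  # 10.9
```

The coded values here are the small integers -2, -1, 0, 1, 2, which is what makes the hand calculation easy.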

Exercise 1k

1. Find the mean, $\bar{x}$, of the set of numbers 10, 20, 30, 40, 50, 60 using the coding y = (x - 40)/[divisor illegible].

2. Find the mean, $\bar{x}$, of the numbers 217, 222, 227, 237, 242, 252 using the coding [illegible].

3. The table shows the masses of a group of male students at a college. Measurements have been taken to the nearest kg.

   Mass (kg) | 60-64 65-69 70-74 75-79 80-84 85-89
   Frequency | [values illegible]

   Find the mean, using a method of coding.

4. Find the mean, using a method of coding, for each of the following frequency distributions:

   (a) Interval  | 15-21 22-28 29-35 36-42 43-49
       Frequency | 2 18 23 17 9

   (b) Interval  | 0- 10- 20- 30- 40- 50- 60-
       Frequency | 10 15 23 32 18 2 0

   (c) Interval  | 1-2 3-4 5-6 7-8 9-10 11-12 13-14
       Frequency | [values illegible]

5. In a practical class students timed how long it took for a sample of their saliva to break down a 2% starch solution. The times, to the nearest second, are shown in Table A below. Find the mean time, using a method of coding.

6. Each morning for a month the owner of a smallholding timed how long it took to feed the animals. The results were as shown:

   Time (min) | -15 -20 -25 -30 -35 -40 -45 -50
   Frequency  | [values illegible]

   Calculate the mean time taken to feed the animals, using a method of coding.

Table A

Time (seconds) | 11-20 21-30 31-40 41-50 51-60 61-70 71-90
Frequency      | [values illegible]

MEASURES OF DISPERSION

There are several ways of obtaining a measure of the 'spread' of a set of observations.

THE RANGE

The range is the difference between the highest and the lowest
value. It is based entirely on the extreme values.

RANGES BASED ON QUARTILE AND PERCENTILE OBSERVATIONS

Interquartile range = $Q_3 - Q_1$ where $Q_3$ is the upper quartile and $Q_1$ is the lower quartile.

Semi-interquartile range = $\frac{1}{2}(Q_3 - Q_1)$

NOTE: these ranges depend entirely on the middle half of the observations.

The 10 to 90 percentile range = $P_{90} - P_{10}$ where $P_{90}$ is the 90th percentile and $P_{10}$ is the 10th percentile.

NOTE: this range depends on the middle 80% of the observations.

THE MEAN DEVIATION FROM THE MEAN

The mean deviation from the mean makes use of all the observations.

The mean deviation from the mean of a set of n numbers, $(x_1, x_2, \ldots, x_n)$, is given by

\[ \frac{\sum |x_i - \bar{x}|}{n} \]

where $\bar{x}$ is the mean of the set of numbers.

NOTE: $|x_i - \bar{x}|$ is the positive difference between $x_i$ and $\bar{x}$ and is called the modulus of $(x_i - \bar{x})$.

Example 1.30 Two machines, A and B, are used to pack biscuits. A sample of 10
packets was taken from each machine and the mass of each packet,
measured to the nearest gram, was noted. Find the mean deviation
from the mean of the masses of the packets taken in the sample
for each machine. Comment on your answer.

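Machine A (mass in g) | [values illegible]
Machine B (mass in g) | [values illegible]

Solution 1.30

Machine A:
\[ \bar{x} = \frac{\sum x}{n} = \frac{2000}{10} = 200 \qquad \text{Mean deviation} = \frac{\sum |x - 200|}{10} = 1.8 \]

Machine B:
\[ \bar{x} = \frac{\sum x}{n} = \frac{2000}{10} = 200 \qquad \text{Mean deviation} = \frac{\sum |x - 200|}{10} = 4.2 \]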

The larger number for machine B indicates that the masses are more
widely spread than those from machine A.
Therefore machine A is more reliable.
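The mean deviation from the mean is simple to compute directly from its definition. The sketch below uses a small set of made-up masses (hypothetical values, not the data of Example 1.30) purely to illustrate the formula $\sum |x - \bar{x}| / n$.

```python
def mean_deviation(values):
    """Mean deviation from the mean: average of |x - x_bar|."""
    x_bar = sum(values) / len(values)
    return sum(abs(x - x_bar) for x in values) / len(values)

# Hypothetical packet masses in grams, chosen only for illustration
sample = [196, 198, 199, 200, 200, 201, 202, 204]
print(mean_deviation(sample))  # 1.75
```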

For a frequency distribution

\[ \text{mean deviation from the mean} = \frac{\sum f|x - \bar{x}|}{\sum f} \]

It is possible to find the mean deviation from the median and the mean deviation from the mode. However, the mean deviation is not widely used.
THE STANDARD DEVIATION, s

The square of the deviation from the mean is considered for each
value of x.

The standard deviation of a set of n numbers, $x_1, x_2, \ldots, x_n$, with mean $\bar{x}$ is given by s, where

\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} \]

The standard deviation is the most useful measure of spread. For most distributions the bulk of the readings lie within ± 2 standard deviations of the mean, i.e. within the interval $(\bar{x} \pm 2s)$.
The units of standard deviation are the same as the units of the
original data.
NOTE: sometimes the abbreviation s.d. is used for standard devia-
tion.

THE VARIANCE

The variance of the set of numbers is given by $s^2$ where

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n} \]

We have standard deviation = $\sqrt{\text{variance}}$

Example 1.31 For the data given in Example 1.30, calculate the standard devia-
tion of each machine, given X = 200g in each case.

Solution 1.31 [The tables of $(x - 200)$ and $(x - 200)^2$ for each machine are illegible.]

Machine A:
\[ s^2 = \frac{\sum (x - 200)^2}{10} = 5.6, \qquad s = \sqrt{5.6} = 2.37 \ (2 \text{ d.p.}) \]

Machine B:
\[ s^2 = \frac{\sum (x - 200)^2}{10} = 24, \qquad s = \sqrt{24} = 4.90 \ (2 \text{ d.p.}) \]

The standard deviation for machine A is 2.37 g and the standard deviation for machine B is 4.90 g, once again indicating that machine A is more reliable.

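Alternative form of the formula for standard deviation The formula
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \]
is sometimes difficult to use, especially when $\bar{x}$ is not an integer, so an alternative form is often used.

Now
\[ s^2 = \frac{1}{n}\sum (x_i - \bar{x})^2 \quad \text{for } i = 1, 2, \ldots, n \]
\[ = \frac{1}{n}\sum (x_i^2 - 2\bar{x}x_i + \bar{x}^2) \]
\[ = \frac{1}{n}\left(\sum x^2 - 2\bar{x}\sum x + n\bar{x}^2\right) \]
\[ = \frac{\sum x^2}{n} - 2\bar{x}\,\frac{\sum x}{n} + \frac{n\bar{x}^2}{n} \]
\[ = \frac{\sum x^2}{n} - 2\bar{x}^2 + \bar{x}^2 \]
\[ = \frac{\sum x^2}{n} - \bar{x}^2 \]

So we have
\[ s = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} \]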

Example 1.32 Find the mean and the standard deviation of the set of numbers
2,3,5,6,8

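Solution 1.32

Mean:
\[ \bar{x} = \frac{\sum x}{n} = \frac{2+3+5+6+8}{5} = \frac{24}{5} = 4.8 \]

Standard deviation:

Method 1, using $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$:
\[ s^2 = \frac{\sum (x - 4.8)^2}{5} = \frac{22.8}{5} = 4.56, \qquad s = \sqrt{4.56} = 2.14 \ (2 \text{ d.p.}) \]

Method 2, using $s = \sqrt{\dfrac{\sum x^2}{n} - \bar{x}^2}$:
\[ s^2 = \frac{138}{5} - (4.8)^2 = 4.56, \qquad s = \sqrt{4.56} = 2.14 \ (2 \text{ d.p.}) \]

Therefore the standard deviation of the set of numbers is 2.14 (2 d.p.).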
NOTE: in this case there is far less working involved in method 2.
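Both forms of the standard deviation formula can be compared on the data of Example 1.32. This is a minimal sketch; the function names are invented for illustration, and both divide by n as in the text.

```python
import math

def sd_definition(values):
    """s = sqrt( sum((x - x_bar)^2) / n )."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / n)

def sd_alternative(values):
    """s = sqrt( sum(x^2)/n - x_bar^2 )."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum(x * x for x in values) / n - x_bar ** 2)

data = [2, 3, 5, 6, 8]
print(round(sd_definition(data), 2))   # 2.14
print(round(sd_alternative(data), 2))  # 2.14
```

The two functions agree up to floating-point rounding; the alternative form needs only the running totals of x and x squared.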

Exercise 1l

1. Find the mean and the standard deviation of the following sets of numbers. For questions (a), (b) and (c) try using both forms of the formula for the standard deviation. Use whichever you wish for parts (d), (e) and (f). Do not use the programmed functions on your calculator.
   (a) 2, 4, 5, 6, 8
   (b) 6, 8, 9, 11
   (c) 11, 14, 17, 28, 29
   (d) 5, 13, 7, 9, 16, 15
   (e) 4.6, 2.7, 3.1, 0.5, 6.2
   (f) 200, 203, 206, 207, 209

2. The mean of the numbers 3, 6, 7, a, 14 is 8. Find the standard deviation of the set of numbers.

3. For a set of 10 numbers $\sum x = 290$ and $\sum x^2 = 8469$. Find the mean and the variance.

4. For a set of 9 numbers $\sum (x - \bar{x})^2 = 234$. Find the standard deviation of the numbers.

5. For a set of 9 numbers $\sum (x - \bar{x})^2 = 60$ and $\sum x^2 = 285$. Find the mean of the numbers.

6. The numbers a, b, 8, 5, 7 have a mean of 6 and a variance of 2. Find the values of a and b, if a > b.

7. Find the mean and the standard deviation of the set of integers 1, 2, 3, ..., 20.

8. Find the mean and the standard deviation of the first n integers. You may use
   \[ \sum r = \tfrac{1}{2}n(n+1), \qquad \sum r^2 = \tfrac{1}{6}n(n+1)(2n+1) \]

9. From the information given about each of the following sets of data, work out the missing values in the table:
   [table illegible]

10. Calculate the mean and the standard deviation of the four numbers
    [values illegible]
    Two numbers, a and b, are to be added to this set of four numbers, such that the mean is increased by 1 and the variance is increased by 2.5. Find a and b.
    (L Additional)

THE USE OF CALCULATORS

If your calculator has SD (standard deviation) mode then it can be used to calculate standard deviations, and you will have access to the following information: $\bar{x}$, s, n, $\sum x$, $\sum x^2$.

The following example has been done using two types of calculator,
and you should consult your calculator instructions if yours does
not appear to follow one of the patterns.

Example 1.33 Find the mean and standard deviation of the numbers
33, 28, 26, 35, 38

Solution 1.33

Method 1 (using Casio 100C or 115N) and Method 2 (using Casio 82D)

Set the calculator to SD mode and enter each of the five numbers in turn. [The key sequences for the two calculators are illegible.]

(Try to use both hands, left hand for the numbers and right hand for [illegible].)

Both calculators then give

\[ \bar{x} = 32, \quad s = 4.427\ldots, \quad n = 5, \quad \sum x = 160, \quad \sum x^2 = 5218 \]

Therefore the mean is 32 and the standard deviation is 4.43 (2 d.p.)

NOTE: To clear the SD mode, press |MODE [9|.
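The same summary values can be obtained in software. The sketch below uses Python's standard `statistics` module, where `pstdev` is the standard deviation that divides by n (matching s in this text), applied to the data of Example 1.33.

```python
import statistics

data = [33, 28, 26, 35, 38]

n = len(data)
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
x_bar = statistics.mean(data)
s = statistics.pstdev(data)   # divides by n, as in the text

print(n, sum_x, sum_x2)       # 5 160 5218
print(x_bar, round(s, 2))     # 32 4.43
```

Note that `statistics.stdev` divides by n - 1 instead, so it would give a different (larger) value.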

Exercise 1m

Do Exercise 1l question 1 using your calculator in SD mode.

Calculations involving the mean and the standard deviation

Example 1.34 For the set of numbers 3, 6, 7, 9, 10 the mean is 7 and the standard deviation is $\sqrt{6}$. If each number in the set is increased by 3, find the new mean and standard deviation. Comment on your answers.

Solution 1.34 The new set of numbers is 6, 9, 10, 12, 13.

The mean
\[ \bar{x} = \frac{\sum x}{n} = \frac{6+9+10+12+13}{5} = \frac{50}{5} = 10 \]

\[ s^2 = \frac{\sum (x - \bar{x})^2}{n} \quad \text{where } \bar{x} = 10 \text{ and } \sum (x - \bar{x})^2 = 30 \]

\[ s^2 = \frac{30}{5} = 6, \qquad s = \sqrt{6} \]

Therefore, if each member of the set of numbers is increased by 3, then the mean is increased by 3 but the standard deviation remains unaltered.

In general, consider the set of n numbers $x_1, x_2, \ldots, x_n$ with mean $\bar{x}$ and standard deviation $s_x$.

(i) Increase each number by a constant, c Then

\[ y_i = x_i + c \quad \text{for } i = 1, 2, \ldots, n \]

Summing
\[ \sum y_i = \sum x_i + nc \]
and
\[ \frac{\sum y_i}{n} = \frac{\sum x_i}{n} + c \]
So
\[ \bar{y} = \bar{x} + c \]

If each number is increased by a constant c, the mean is increased by c.

For the new set of numbers

\[ s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n} = \frac{\sum (x_i + c - (\bar{x} + c))^2}{n} = \frac{\sum (x_i - \bar{x})^2}{n} = s_x^2 \]

Therefore $s_y = s_x$

If each number is increased by a constant c, the standard deviation remains unaltered.

(ii) Multiply each number by a constant k It can be shown that

new mean = $k\bar{x}$
new standard deviation = $k s_x$

If each number is multiplied by a constant k, both the mean and the standard deviation are multiplied by k.
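These two results can be checked numerically on the set 3, 6, 7, 9, 10 from Example 1.34. The sketch uses the standard `statistics` module (`pstdev` divides by n); the constants c = 3 and k = 3 are chosen here for illustration.

```python
import math
import statistics

data = [3, 6, 7, 9, 10]
c, k = 3, 3

shifted = [x + c for x in data]   # each number increased by c
scaled = [k * x for x in data]    # each number multiplied by k

mean, sd = statistics.mean(data), statistics.pstdev(data)

# Adding c shifts the mean by c and leaves the standard deviation alone
print(statistics.mean(shifted) == mean + c)
print(math.isclose(statistics.pstdev(shifted), sd))

# Multiplying by k multiplies both the mean and the standard deviation by k
print(statistics.mean(scaled) == k * mean)
print(math.isclose(statistics.pstdev(scaled), k * sd))
```

For a negative multiplier the standard deviation is multiplied by the size of k, since a standard deviation cannot be negative.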

Exercise 1n

1. By considering the set of numbers 3, 6, 7, 9, 10, with mean 7 and standard deviation $\sqrt{6}$, investigate the effect on the mean and on the standard deviation of multiplying each term by 3.

2. The set of numbers $x_1, x_2, \ldots, x_n$ has mean $\bar{x}$ and standard deviation $s_1$. Each of the numbers is multiplied by a constant term k. Show that the new mean is $k\bar{x}$ and the new standard deviation $s_2 = ks_1$.

3. (a) Find the mean and the standard deviation of the set of numbers 4, 6, 9, 3, 5, 6, 9.
   (b) Deduce the mean and the standard deviation of the set of numbers 514, 516, 519, 513, 515, 516, 519.
   (c) Deduce the mean and the standard deviation of the set of numbers 52, 78, 117, 39, 65, 78, 117.

4. (a) Find the mean and the variance of the ordered set of numbers
       A = {1, 2, 3, 4, 5, 6, 7}.
   Hence find the mean and the variance of the following ordered sets
       B = {4, 5, 6, 7, 8, 9, 10}
       C = {10, 20, 30, 40, 50, 60, 70}
       D = {13, 23, 33, 43, 53, 63, 73}
   (b) If $a_i$ is the ith member of A and $d_i$ is the ith member of D, find a relationship between $a_i$ and $d_i$ in the form $d_i = la_i + m$ where l and m are constants.
   (c) The two ordered sets X and Y each have n elements and $y_i = px_i + q$ where p and q are constants. If the mean and the variance of X are $\bar{x}$ and $s_x^2$, show that [remainder illegible].
   (L Additional)

5. A set of values of a variable X has mean 5 and standard deviation 2. Values of a new variable are obtained by using the formula Y = 4X - 3. Find the mean and the standard deviation of the set of values of Y.

Scaling similar sets of data for comparison


If we wish to compare two sets of data, e.g. examination marks in
two papers, we ‘scale’ one of the sets of data so that the two means
are the same and the two standard deviations are the same.

Example 1.35 A set of marks has a mean of 40 and a standard deviation of 5. The
marks are to be scaled so that the mean becomes 50 and the stand-
ard deviation becomes 8. If the equation of the transformation is
y = ax + b, find the values of the constants a and b. Find also the
scaled mark which corresponds to a mark of 45 in the original set.

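Solution 1.35 If there are n marks, then $y_i = ax_i + b$ for each $i = 1, 2, \ldots, n$.

Summing
\[ \sum y_i = a\sum x_i + nb \]
\[ \frac{\sum y_i}{n} = a\,\frac{\sum x_i}{n} + b \]
So
\[ \bar{y} = a\bar{x} + b \]
Hence 50 = a(40) + b, i.e.
\[ 40a + b = 50 \qquad \text{(i)} \]

Let $s_x$ be the original standard deviation, and $s_y$ the new standard deviation.

Then
\[ s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n} = \frac{\sum (ax_i + b - (a\bar{x} + b))^2}{n} = \frac{a^2 \sum (x_i - \bar{x})^2}{n} = a^2 s_x^2 \]
or
\[ s_y = a s_x \]
So 8 = 5a, i.e.
\[ a = \tfrac{8}{5} \qquad \text{(ii)} \]

Substituting for a from (ii) into (i),
\[ 40\left(\tfrac{8}{5}\right) + b = 50, \qquad b = -14 \]

Therefore the equation of the transformation is $y = \tfrac{8}{5}x - 14$.

If x = 45, then $y = \tfrac{8}{5}(45) - 14 = 58$.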


Therefore a mark of 45 in the original set becomes a mark of 58
when scaled.
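The scaling constants in Example 1.35 follow from $s_y = a s_x$ and $\bar{y} = a\bar{x} + b$, which can be sketched as a small Python helper. The function name `linear_scaling` is hypothetical, and a positive multiplier a is assumed.

```python
import math

def linear_scaling(old_mean, old_sd, new_mean, new_sd):
    """Return (a, b) for y = a*x + b, assuming a > 0."""
    a = new_sd / old_sd            # from s_y = a * s_x
    b = new_mean - a * old_mean    # from y_bar = a * x_bar + b
    return a, b

a, b = linear_scaling(old_mean=40, old_sd=5, new_mean=50, new_sd=8)
scaled_mark = a * 45 + b
print(a, round(b, 6), round(scaled_mark, 6))
```

With the values of the example this gives a = 1.6 and b = -14, so a mark of 45 scales to 58.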

Exercise 1o

1. It is proposed to convert a set of marks whose mean is 52 and standard deviation is 4 to a set of marks with mean 61 and standard deviation 3. The equation for the transformation necessary to convert the marks is y = ax + b. Find (i) the values of a and b, (ii) the value of the scaled mark which corresponds to a mark of 64 in the original data, (iii) the value in the original data if the scaled mark is 79.

2. The marks of 5 students in a mathematics test were 27, 31, 35, 47, 50.
   (i) Calculate the mean mark and the standard deviation.
   (ii) The marks are scaled so that the mean and standard deviation become 50 and 20 respectively. Calculate, to the nearest whole number, the new marks corresponding to the original marks of 31 and 50.
   (C Additional)

3. In order to compare the performances of candidates in two schools a test was given. The mean mark at school A was 45, and the mean mark at school B was 31 with a standard deviation of 5. The marks of school A are scaled so that the mean and standard deviation are the same as school B and a mark of 85 at school A becomes 63. Find the values of a and b if the transformation used is y = ax + b. Find also the original standard deviation of the marks from school A.

4. Given that the mean and standard deviation of a set of figures are μ and σ respectively, write down the new values of the mean and standard deviation when
   (i) each figure is increased by a constant c,
   (ii) each figure is multiplied by a constant k.

5. A group of students sat two examinations, one in algebra and one in biology. In order to compare the results the algebra marks were scaled linearly (that is, a mark of x became a mark of ax + b where a and b are constants) so that the means and standard deviations of the marks in both examinations became the same. The original means and standard deviations are shown in the table.
   [Table of mean mark and standard deviation for algebra and biology; the only legible entries are a mean mark of 48 and a standard deviation of 12.]
   Find a and b.
   The original marks of a particular student are 36 in algebra, 48 in biology. In what sense, if any, has he done better in algebra than in biology? (C Additional)

6. A linear function f(x) = ax + b transforms the set X = [elements illegible] into a set Y, so that f(5) = 13 and f(1) = 5.
   (a) Find f.
   (b) Calculate the mean and the variance of X.
   (c) Hence calculate the mean and the variance of Y.
   An element, k, is added to X forming a set Z. Given that the mean of Z is three greater than the mean of X, find
   (d) the value of k,
   (e) the variance of Z. (L Additional)

7. Show that the standard deviation of the integers
   1, 2, 3, 4, 5, 6, 7
   is 2.
   Using this result find the standard deviation of the numbers
   (a) 101, 102, 103, 104, 105, 106, 107.
   (b) 100, 200, 300, 400, 500, 600, 700.
   (c) 2.01, 3.02, 4.03, 5.04, 6.05, 7.06, 8.07.
   (d) Write down seven integers which have mean 5 and standard deviation 6.
   (L Additional)

Combining sets of numbers

Example 1.36 A set of 12 numbers has a mean of 4 and a standard deviation of 2.
A second set of 20 numbers has a mean of 5 and a standard deviation of 3.

Find the mean and the standard deviation of the combined set of
32 numbers.

Solution 1.36 For the first set of numbers: $n_1 = 12$, $\bar{x}_1 = 4$, $s_1 = 2$

Therefore
\[ \sum_{i=1}^{n_1} x_i = n_1\bar{x}_1 = (12)(4) = 48 \]
and
\[ s_1^2 = \frac{\sum_{i=1}^{n_1} x_i^2}{n_1} - \bar{x}_1^2 \]
so
\[ \sum_{i=1}^{n_1} x_i^2 = n_1(\bar{x}_1^2 + s_1^2) = (12)(4^2 + 2^2) = 240 \]

For the second set of numbers: $n_2 = 20$, $\bar{x}_2 = 5$, $s_2 = 3$

Therefore
\[ \sum_{i=1}^{n_2} x_i = n_2\bar{x}_2 = (20)(5) = 100 \]
and
\[ \sum_{i=1}^{n_2} x_i^2 = n_2(\bar{x}_2^2 + s_2^2) = (20)(5^2 + 3^2) = 680 \]

For the combined set
\[ \bar{x} = \frac{48 + 100}{n_1 + n_2} = \frac{148}{32} = 4.625 \]

To find the standard deviation of the combined set of numbers:
\[ \sum x^2 = 240 + 680 = 920 \]
So
\[ s^2 = \frac{920}{32} - (4.625)^2 = 7.359 \]
So
\[ s = \sqrt{7.359} = 2.71 \ (2 \text{ d.p.}) \]

Therefore the mean of the combined set of numbers is 4.625 and the standard deviation is 2.71 (2 d.p.).

In general, the mean of the combined set of numbers is given by

\[ \bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} \]

The formula for the standard deviation of the combined set of numbers is very complicated to write out. It is better to work out $\sum x_i^2$ for each set (summing over $n_1$ and $n_2$ terms respectively) as in Example 1.36 and proceed from there.
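The procedure of Example 1.36 (recover $\sum x$ and $\sum x^2$ for each set, add them, then apply the alternative formula) can be sketched in Python. The function name `combine` is hypothetical.

```python
import math

def combine(n1, mean1, sd1, n2, mean2, sd2):
    """Mean and standard deviation of two sets pooled together."""
    sum_x = n1 * mean1 + n2 * mean2
    # sum of squares for each set: n * (mean^2 + sd^2)
    sum_x2 = n1 * (mean1 ** 2 + sd1 ** 2) + n2 * (mean2 ** 2 + sd2 ** 2)
    n = n1 + n2
    mean = sum_x / n
    variance = sum_x2 / n - mean ** 2
    return mean, math.sqrt(variance)

combined_mean, combined_sd = combine(12, 4, 2, 20, 5, 3)
print(combined_mean, round(combined_sd, 2))  # 4.625 2.71
```

Working with the unrounded combined mean inside the function avoids the small discrepancies that appear when a rounded mean is squared.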

Example 1.37 Suppose that the values of a random sample taken from some population are $x_1, x_2, \ldots, x_n$. Prove the formula

\[ \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \]

Prior to the start of delicate wage negotiations in a large company, the unions and the management take independent samples of the work force and ask them at what percentage level they believe a settlement should be made. The results are as follows:

Sample taken by | Sample size | Mean  | Standard deviation
Management      | 350         | 12.4% | 2.1%
Union           | 237         | 10.7% | 1.8%

Assuming that no individual was consulted by both sides, calculate the mean and standard deviation for these 587 workers.
(AEB 1979)

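Solution 1.37 For the first part, see the derivation of the alternative form of the formula for standard deviation.

Management: $n_1 = 350$, $\bar{x}_1 = 12.4$, $s_1 = 2.1$

Union: $n_2 = 237$, $\bar{x}_2 = 10.7$, $s_2 = 1.8$

For the combined set of 587 workers

\[ \bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{(350)(12.4) + (237)(10.7)}{587} = 11.7 \ (1 \text{ d.p.}) \]

Therefore the mean percentage level for the 587 workers is 11.7% (1 d.p.).

For the combined variance
\[ s^2 = \frac{\sum x^2}{n} - \bar{x}^2 \]

We need to find $\sum x^2$ for 'management' and 'union'.

Management
\[ \sum_{i=1}^{n_1} x_i^2 = n_1(\bar{x}_1^2 + s_1^2) = 350(12.4^2 + 2.1^2) = 55\,359.5 \]

Union
\[ \sum_{i=1}^{n_2} x_i^2 = n_2(\bar{x}_2^2 + s_2^2) = 237(10.7^2 + 1.8^2) = 27\,902.0 \]

\[ s^2 = \frac{\sum_{i=1}^{n_1} x_i^2 + \sum_{i=1}^{n_2} x_i^2}{n_1 + n_2} - \bar{x}^2 = \frac{55\,359.5 + 27\,902.0}{587} - (11.7)^2 = 4.95 \]

Hence s = 2.2 (1 d.p.)

(Using the unrounded mean 11.714 instead of 11.7 gives $s^2 = 4.63$; the standard deviation is still 2.2 to 1 d.p.)

The standard deviation for the 587 workers is 2.2% (1 d.p.).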


Exercise 1p

1. For each of the following sets of data, find the mean and the standard deviation of the combined set.
   (a) $n_1$ = [illegible], $\bar{x}_1 = 6$, $s_1 = 2$
       $n_2$ = [illegible], $\bar{x}_2 = 10$, $s_2 = 3$
   (b) $n_1 = 30$, $\bar{x}_1 = 27$, $s_1 = 5.6$
       $n_2 = 40$, $\bar{x}_2 = 33$, $s_2 = 6.4$
   (c) $n_1 = 42$, $\bar{x}_1 = 15$, $s_1 = 2.7$
       [second set illegible]
       $n_3 = 13$, $\bar{x}_3 = 12$, $s_3 = 2.4$

2. For a set of 20 numbers $\sum x = 300$ and $\sum x^2 = 5500$. For a second set of 30 numbers $\sum x = 480$ and $\sum x^2 = 9600$. Find the mean and the standard deviation of the combined set of 50 numbers.

3. Prove the formula $\sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2$.
   In a Middle School there are 253 girls whose ages have a mean 11.8 yr and a standard deviation 1.7 yr. There are also 312 boys whose ages have a mean 12.3 yr and a standard deviation 1.9 yr. Calculate the mean and standard deviation of the ages of all the 565 pupils. (AEB 1976)

4. Suppose that the values of a random sample taken from some population are $x_1, x_2, \ldots, x_n$. Prove the formula
   [formula illegible]
   Parplan Opinion Polls Ltd. conducted a nationwide survey into the attitudes of teenage girls. One of the questions asked was 'What is the ideal age for a girl to have her first baby?' In reply, the sample of 165 girls from the Northern zone gave a mean of 23.4 years and a standard deviation of 1.6 years. Subsequently, the overall sample of 384 girls (Northern plus Southern zones) gave a mean of 24.8 years and a standard deviation of 2.2 years.
   Assuming that no girl was consulted twice, calculate the mean and standard deviation for the 219 girls from the Southern zone.
   (AEB 1981)

Standard deviation —data in the form of a frequency distribution


If $(x_1, x_2, \ldots, x_n)$ occur with frequencies $(f_1, f_2, \ldots, f_n)$ then the standard deviation s is given by:

\[ s = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}} \quad i = 1, 2, \ldots, n \]

When class intervals are given, the mid-point of an interval is taken to represent the interval.

Example 1.38 The table shows the number of children per family for a group of
20 families. The mean number of children per family is 2.9. Find
the standard deviation.

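[Table of number of children per family, x, against frequency, f; the entries are illegible.]

Solution 1.38 Method 1, using $s = \sqrt{\dfrac{\sum f(x - \bar{x})^2}{\sum f}}$

So
\[ s^2 = \frac{\sum f(x - 2.9)^2}{\sum f} = \frac{29.80}{20} = 1.49 \]
\[ s = \sqrt{1.49} = 1.22 \ (2 \text{ d.p.}) \]

The standard deviation of the number of children per family is 1.22 (2 d.p.).

Method 2, using $s = \sqrt{\dfrac{\sum fx^2}{\sum f} - \bar{x}^2}$

So
\[ s^2 = \frac{198}{20} - (2.9)^2 = 1.49 \]
\[ s = \sqrt{1.49} = 1.22 \ (2 \text{ d.p.}) \]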

The standard deviation is 1.22 (2 d.p.), as before.
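Method 2 needs only the three totals $\sum f$, $\sum fx$ and $\sum fx^2$. The sketch below applies it to the totals quoted in Example 1.38 (20, 58 and 198); the function name `sd_from_totals` is illustrative.

```python
import math

def sd_from_totals(sum_f, sum_fx, sum_fx2):
    """Mean and standard deviation of a frequency distribution from its totals."""
    mean = sum_fx / sum_f
    variance = sum_fx2 / sum_f - mean ** 2
    return mean, math.sqrt(variance)

children_mean, children_sd = sd_from_totals(sum_f=20, sum_fx=58, sum_fx2=198)
print(round(children_mean, 1), round(children_sd, 2))  # 2.9 1.22
```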


Method 3, using the calculator in SD mode.
This time we need to take account of the frequencies. [The key sequences for the Casio 100C or 115N and for the Casio 82D are illegible.] Both calculators give

\[ \bar{x} = 2.9, \quad s = 1.22\ldots, \quad \sum f = 20, \quad \sum fx = 58, \quad \sum fx^2 = 198 \]

Therefore the standard deviation is 1.22 (2 d.p.), as before.

Example 1.39 The lengths of 32 leaves were measured correct to the nearest mm.
Find the mean length and the standard deviation.

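Length (mm)  | 20-22 23-25 26-28 29-31 32-34
Frequency, f | [values illegible]

Solution 1.39 The mid-points, x, of each interval are considered: 21, 24, 27, 30, 33. From the table of x, f, fx and $fx^2$ we obtain $\sum f = 32$, $\sum fx = 867$ and $\sum fx^2 = 23\,805$.

Now
\[ \bar{x} = \frac{\sum fx}{\sum f} = \frac{867}{32} = 27.1 \ (1 \text{ d.p.}) \]
and
\[ s^2 = \frac{\sum fx^2}{\sum f} - \bar{x}^2 = \frac{23\,805}{32} - \left(\frac{867}{32}\right)^2 = 9.835 \]
\[ s = \sqrt{9.835} = 3.14 \ (2 \text{ d.p.}) \]

The mean length of the leaves is 27.1 mm and the standard deviation is 3.14 mm (2 d.p.).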

_ Exercise 1q

Do questions 1, 2 and 3 without using the calculator in SD mode, and then check them using SD mode.

1. The score for a round of golf for each of 50 club members was noted. Find the mean score for a round and the standard deviation.

   Score, x     | 66 67 68 69 70 71 72 73
   Frequency, f | [values illegible]

2. The scores in an IQ test for 60 candidates are shown in the table. Find the mean score and the standard deviation.

   Score     | 100-106 107-113 114-120 121-127 128-134
   Frequency | [values illegible]

3. Find the mean and the standard deviation for each of the following sets of data:
   (a) [table illegible]
   (b) Interval  | 1-3 4-6 7-9 10-12 13-15
       Frequency | [values illegible]
   (c) Interval  | 20-24 25-29 30-34 35-39 40-44
       Frequency | 1 6 10 2 1
   (d) x         | 10 20 30 40 50 60
       Frequency | [values illegible]
   (e) Interval  | 1-7 8-14 15-21 22-28
       Frequency | [values illegible]
   (f) [table illegible]

4. For a particular set of observations $\sum f = 20$, $\sum fx^2 = 16\,143$, $\sum fx = 563$. Find the values of the mean and the standard deviation.

5. For a given frequency distribution $\sum f(x - \bar{x})^2 = 182.3$, $\sum fx^2 = 1025$, $\sum f = 30$. [Remainder of question illegible.]

6. From the information given about each of the following frequency distributions, work out the missing values in the table:
   [table illegible]
Using the method of coding to find the standard deviation
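Consider the set of numbers $x_1, x_2, \ldots, x_n$. To find the standard deviation we can use the coding

\[ y = \frac{x - x_a}{b} \quad \text{where } x_a \text{ is the assumed mean and } b \text{ is a suitable constant} \]

Now $x = x_a + by$ and $\bar{x} = x_a + b\bar{y}$ (as shown in the section on coding to find the mean).

Also
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{\sum (x_a + by_i - (x_a + b\bar{y}))^2}{n} = \frac{\sum b^2 (y_i - \bar{y})^2}{n} \]

and so
\[ s = b\sqrt{\frac{\sum (y_i - \bar{y})^2}{n}} \quad \text{for } i = 1, 2, \ldots, n \]

Using the alternative form of the formula, we have

\[ s = b\sqrt{\frac{\sum y_i^2}{n} - \bar{y}^2} \]

If the numbers occur with frequencies $f_1, f_2, \ldots, f_n$, then the corresponding formulae for the standard deviation are:

\[ s = b\sqrt{\frac{\sum f_i (y_i - \bar{y})^2}{\sum f_i}} \quad \text{or} \quad s = b\sqrt{\frac{\sum f_i y_i^2}{\sum f_i} - \bar{y}^2} \]

These formulae look very complicated, but in fact they are very easy to use.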

Example 1.40 Find the standard deviation of the set of numbers


327, 332, 342, 347, 352

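Solution 1.40 We will use the coding $y = \dfrac{x - x_a}{b}$ where $x_a = 342$, $b = 5$,

so $y = \dfrac{x - 342}{5}$

x   |  y | y²
327 | -3 | 9
332 | -2 | 4
342 |  0 | 0
347 |  1 | 1
352 |  2 | 4
    | Σy = -2 | Σy² = 18

Now
\[ s^2 = b^2\left(\frac{\sum y^2}{n} - \bar{y}^2\right) = 25\left[\frac{18}{5} - \left(\frac{-2}{5}\right)^2\right] = 25[3.6 - (-0.4)^2] = 86 \]

So
\[ s = \sqrt{86} = 9.27 \ (2 \text{ d.p.}) \]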
The standard deviation of the set of numbers is 9.27 (2 d.p.).
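The coded calculation of Example 1.40 can be checked against a direct calculation. In this sketch the function name `coded_sd` is hypothetical, and `statistics.pstdev` (which divides by n) provides the direct value.

```python
import math
import statistics

def coded_sd(values, assumed, b):
    """Standard deviation via the coding y = (x - x_a) / b, so s = b * s_y."""
    coded = [(x - assumed) / b for x in values]
    n = len(coded)
    y_bar = sum(coded) / n
    variance_y = sum(y * y for y in coded) / n - y_bar ** 2
    return b * math.sqrt(variance_y)

data = [327, 332, 342, 347, 352]
s = coded_sd(data, assumed=342, b=5)
print(round(s, 2))                        # 9.27
print(round(statistics.pstdev(data), 2))  # 9.27
```

The assumed mean drops out of the standard deviation entirely; only the scale factor b has to be restored at the end.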

Example 1.41 For the data given in Example 1.39, find the standard deviation,
using a method of coding.

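Solution 1.41 We note that the class widths are each equal to 3 and a central value is 27. Therefore, we choose the coding $y = \dfrac{x - 27}{3}$.

[The table of x, y, f, fy and $fy^2$ is illegible; it gives $\sum f = 32$, $\sum fy = 1$ and $\sum fy^2 = 35$.]

We have
\[ s^2 = b^2\left(\frac{\sum fy^2}{\sum f} - \bar{y}^2\right) \quad \text{where } \bar{y} = \frac{\sum fy}{\sum f} \]
\[ = 9\left[\frac{35}{32} - \left(\frac{1}{32}\right)^2\right] = 9.835 \]
So
\[ s = \sqrt{9.835} = 3.14 \ (2 \text{ d.p.}) \]

The standard deviation of the length of the leaves is 3.14 mm, as before.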

Exercise 1r

1. Find the mean and the standard deviation
of the following sets of data, using a
method of coding:
(a) |x |304 308 312 316 320 324
ife es oN eAel ea 2
(b) 10-19 20-29 30-39 40-49 50-59 60-69
f 3 ie 48. 2p te
(c) |x |1250 1500 1750 2000 2250 2500 2750
Ca ar ee ee
(d) i be LOS ls) 22) Gee G 0
(e) ey(Oa Of 0:7 1.0 123° WS 1.9 22
ie eens 1579 fe. 3.1
(f) -200 -250 -300 -350 -400 -450 -600
i 8 80 ess 38 |25 14 28

2. The marks obtained in an examination by
190 students are recorded in Table A below.
Find the mean mark and the standard
deviation by using an assumed mean
between 50 and 60 and a coding factor
of 10.

3. A farmer grows two different varieties of
potatoes, Desiree and Pentland Squire. A
sample of 50 potatoes of each variety is
taken and the potatoes are weighed. The
results are shown in Table B below. Find
the mean and the standard deviation for
each sample. Use a method of coding.

4. The table shows the times taken on 30
consecutive days for a coach to complete
one journey on a particular route. Times
have been given to the nearest minute.
Find the mean time for the journey and
the standard deviation.

Time (min) | 60-63 64-67 68-71 72-75 76-79
Frequency  | 1     3     12    10    4
Table A

Mark      | 0-  10-  20-  30-  40-  50-  60-  70-  80-  90-  100-
Frequency | 3   5    5    6    25   33   49   40   15   9    0
TableB

0-60 60-120 120-180 180-240 240-300 300-360 360-420 420-480


Mass (g)

Desiree 1 3 4 7 12 15 5 3
frequency

Pentland
17 12 3 1
Squire
frequency

MISCELLANEOUS WORKED EXAMPLES

Example 1.42 In order to estimate the mean length of leaves from a certain tree a
sample of 100 leaves was chosen and their lengths measured correct
to the nearest mm. A grouped frequency table was set up and the
results were as follows:

Mid-interval value (cm) | 2.2 2.7 3.2 3.7 4.2 4.7 5.2 5.7 6.2

(a) Display the table in the form of a frequency polygon and
describe the distribution exhibited by this polygon.

(b) Calculate estimates for the mean and standard deviation of the
leaf lengths using an assumed mean of 4.7 cm.

(c) What are the boundaries of the interval whose mid-point is
3.7 cm?

(d) Construct a cumulative frequency table and use it to estimate
the sample median. (SUJB)

Solution 1.42 (a) Frequency polygon to show lengths of leaves

[Figure: frequency polygon; horizontal axis: length of leaf (cm), vertical axis: frequency.]

This is a frequency distribution which is skewed to the left; an
estimate of the modal length of leaf is 4.7 cm.

(b) As the intervals each have a width of 0.5, and we are told to
take 4.7 as assumed mean, we use the code $y = \dfrac{x - 4.7}{0.5}$ to make
the working easier when finding the mean and the standard deviation.

Now $y = \dfrac{x - 4.7}{0.5}$
so $x = 4.7 + 0.5y$
and $\bar{x} = 4.7 + 0.5\bar{y}$ where $\bar{y} = \dfrac{\sum fy}{\sum f}$

Therefore $\bar{x} = 4.7 + 0.5\left(\dfrac{-59}{100}\right)$
$= 4.405$

Now $s^2 = b^2\left(\dfrac{\sum fy^2}{\sum f} - \bar{y}^2\right)$
$= (0.25)\left[\dfrac{363}{100} - \left(\dfrac{-59}{100}\right)^2\right]$
$= 0.8205$

$s = \sqrt{0.8205}$
$= 0.91$ (2 d.p.)

Therefore the mean length of the leaves is 4.405 cm and the standard
deviation is 0.91 cm.
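
The arithmetic in part (b) uses only three coded totals, so it is easy to check by machine. The sketch below is illustrative only: the function name `mean_sd_from_coded_sums` and its parameter names are assumptions introduced here, and the totals 100, -59 and 363 are the values of Σf, Σfy and Σfy² used in the working.

```python
from math import sqrt

def mean_sd_from_coded_sums(sum_f, sum_fy, sum_fy2, assumed_mean, b):
    """Decode the mean and standard deviation from coded totals.

    Uses x_bar = x0 + b * y_bar and s = b * sqrt(sum_fy2/sum_f - y_bar^2).
    """
    y_bar = sum_fy / sum_f
    mean = assumed_mean + b * y_bar
    variance = b ** 2 * (sum_fy2 / sum_f - y_bar ** 2)
    return mean, sqrt(variance)

mean, sd = mean_sd_from_coded_sums(
    sum_f=100, sum_fy=-59, sum_fy2=363, assumed_mean=4.7, b=0.5
)
print(f"mean = {mean:.3f} cm, s.d. = {sd:.2f} cm")
```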

(c) Each class width is 0.5 cm, so the class with mid-point 3.7 has
l.c.b. $= 3.7 - 0.25 = 3.45$
u.c.b. $= 3.7 + 0.25 = 3.95$

Therefore the interval whose mid-point is 3.7 cm is defined by

$3.45\ \text{cm} \le \text{length of leaf} < 3.95\ \text{cm}$



(d) Cumulative frequency table to show lengths of leaves

Length (cm) oy Cumulative


(mid-point) poco oe frequency

< 1:95 0

The median is the $\tfrac{1}{2}(\sum f + 1)$th item, i.e. the 50.5th item.
From the table, we see that this lies in the interval 4.45-4.95.
There are 24 items in the interval 4.45-4.95 and the median is
$\dfrac{4.5}{24}$ of the interval of 0.5 cm from 4.45 to 4.95.

[Diagram: the interval from 4.45 cm to 4.95 cm, with 46 items below 4.45 cm, the median at the 50.5th item and 70 items below 4.95 cm.]

An estimate of the median is $4.45 + \dfrac{4.5}{24}(0.5) = 4.54$ cm (2 d.p.).
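
The interpolation used for the median can be written as a small function. This is a sketch under the same assumptions as the working above (the median is taken as the ½(Σf + 1)th item and items are spread evenly across the median class); the name `interpolated_median` and its parameters are hypothetical.

```python
def interpolated_median(lower_bound, width, cum_before, class_freq, total):
    """Estimate the median of grouped data by linear interpolation.

    lower_bound : lower class boundary of the median class
    width       : width of the median class
    cum_before  : cumulative frequency below lower_bound
    class_freq  : frequency of the median class
    total       : total frequency
    """
    position = (total + 1) / 2  # the (total + 1)/2 th item
    return lower_bound + (position - cum_before) / class_freq * width

median = interpolated_median(
    lower_bound=4.45, width=0.5, cum_before=46, class_freq=24, total=100
)
print(f"median = {median:.2f} cm")
```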

Example 1.43 In a certain village of 400 inhabitants the distribution of ages is
as follows:

Age (years) | 0-9  10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-99
Frequency   | 44   56     64     78     60     40     36     18     4

Explain why the mid-interval value of the 80-99 group is 90 years.
Use the data to estimate the mean age and the standard deviation,
in years, each correct to 1 d.p.
Draw a cumulative frequency graph and estimate the percentage of
the population that have ages within 1 s.d. of the mean. (SUJB)

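Solution 1.43 Assuming that ages have been given in completed years then the
interval 80-99 means $80 \le \text{age} < 100$; someone who is 99 years and
11 months, for example, would come into this interval.

The mid-point of the interval $= \tfrac{1}{2}(\text{u.c.b.} + \text{l.c.b.})$
$= \tfrac{1}{2}(100 + 80)$
$= 90$

To find the mean and the standard deviation we use a method of
‘coding’. The class widths (with the exception of the last one) are
each equal to 10 and a ‘central’ value, suitable to be chosen as an
assumed mean, is the mid-point of the interval ‘40-49’ which is 45.
We use the code $y = \dfrac{x - 45}{10}$.

Age    | Mid-point x | y    | f   | fy    | fy^2
0-9    | 5           | -4   | 44  | -176  | 704
10-19  | 15          | -3   | 56  | -168  | 504
20-29  | 25          | -2   | 64  | -128  | 256
30-39  | 35          | -1   | 78  | -78   | 78
40-49  | 45          | 0    | 60  | 0     | 0
50-59  | 55          | 1    | 40  | 40    | 40
60-69  | 65          | 2    | 36  | 72    | 144
70-79  | 75          | 3    | 18  | 54    | 162
80-99  | 90          | 4.5  | 4   | 18    | 81

$\sum f = 400$, $\sum fy = -366$, $\sum fy^2 = 1969$

Now $y = \dfrac{x - 45}{10}$
So $x = 45 + 10y$
and $\bar{x} = 45 + 10\bar{y}$ where $\bar{y} = \dfrac{\sum fy}{\sum f}$
$= 45 + 10\left(\dfrac{-366}{400}\right)$
$= 35.9$ (1 d.p.)

We have $s^2 = b^2\left(\dfrac{\sum fy^2}{\sum f} - \bar{y}^2\right)$ where $b = 10$
$= 100\left[\dfrac{1969}{400} - \left(\dfrac{-366}{400}\right)^2\right]$
$= 408.53$

So $s = 20.2$ (1 d.p.)

Therefore the mean age of the population is 35.9 years and the
standard deviation is 20.2 years.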
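
The whole calculation can be reproduced from the frequency table. The sketch below makes two assumptions that should be read as such: the frequency of the 80-99 group is taken as 4 (the value needed for the nine frequencies to total 400), and the mid-points 5, 15, ..., 75 follow from treating ages as completed years, so that 0-9 means 0 ≤ age < 10. All variable names are illustrative.

```python
from math import sqrt

# Mid-interval values and frequencies for the nine age groups
mid_points = [5, 15, 25, 35, 45, 55, 65, 75, 90]
freqs = [44, 56, 64, 78, 60, 40, 36, 18, 4]  # last value assumed to be 4

x0, b = 45, 10  # assumed mean and coding constant
ys = [(x - x0) / b for x in mid_points]

sum_f = sum(freqs)
sum_fy = sum(f * y for f, y in zip(freqs, ys))
sum_fy2 = sum(f * y * y for f, y in zip(freqs, ys))

y_bar = sum_fy / sum_f
mean = x0 + b * y_bar
sd = b * sqrt(sum_fy2 / sum_f - y_bar ** 2)

print(sum_f, sum_fy, sum_fy2)
print(f"mean = {mean:.2f} years, s.d. = {sd:.2f} years")
```

The coded total Σfy² comes out as 1969, which agrees with the figure quoted in the solution and supports the assumed frequency for the last group.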

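The cumulative frequency distribution is as shown:

Age (years) | Frequency | Age (years) | Cumulative frequency
0-9         | 44        | < 10        | 44
10-19       | 56        | < 20        | 100
20-29       | 64        | < 30        | 164
30-39       | 78        | < 40        | 242
40-49       | 60        | < 50        | 302
50-59       | 40        | < 60        | 342
60-69       | 36        | < 70        | 378
70-79       | 18        | < 80        | 396
80-99       | 4         | < 100       | 400

[Figure: cumulative frequency graph; horizontal axis: age (years), 0 to 100; vertical axis: cumulative frequency.]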

We want to find the number of people with ages within one standard
deviation of the mean, i.e. the number of people with ages in
the interval $35.9 \pm 20.2 = (15.7, 56.1)$.
From the graph, we estimate that
72 people have ages below 15.7 years,
330 people have ages below 56.1 years.
So 258 people have ages in the interval (15.7, 56.1),
i.e. the percentage of the population $= \dfrac{258}{400}(100)\%$
$= 64.5\%$
Therefore 64.5% of the population of the village have ages which
are within 1 s.d. of the mean.

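Example 1.44 A set of $n$ values has mean $\mu$ and variance $s_1^2$. A second set of $n$ values
has mean $a\mu$ and variance $s_2^2$. Given that $s$ is the standard deviation
of the combined set of $2n$ values, show that
$$s^2 = \tfrac{1}{2}(s_1^2 + s_2^2) + \tfrac{1}{4}\mu^2(a - 1)^2$$

Solution 1.44

           | Mean     | Variance
First set  | $\mu$    | $s_1^2$
Second set | $a\mu$   | $s_2^2$

Let $\bar{X}$ be the mean of the combined set.

$$\bar{X} = \frac{\sum x}{2n} = \frac{n\mu + an\mu}{2n} = \frac{n\mu(a + 1)}{2n} = \tfrac{1}{2}\mu(a + 1)$$

For the first set, $s_1^2 = \dfrac{\sum x^2}{n} - \mu^2$, so $\sum x^2 = n(s_1^2 + \mu^2)$.
Similarly, for the second set, $\sum x^2 = n(s_2^2 + a^2\mu^2)$.
For the combined set

$$\sum x^2 = n(s_1^2 + \mu^2 + s_2^2 + a^2\mu^2) = n(s_1^2 + s_2^2 + \mu^2(1 + a^2))$$

$$s^2 = \frac{\sum x^2}{2n} - \bar{X}^2$$
$$= \frac{n(s_1^2 + s_2^2 + (1 + a^2)\mu^2)}{2n} - \frac{(1 + a)^2\mu^2}{4}$$
$$= \tfrac{1}{2}(s_1^2 + s_2^2) + \tfrac{1}{2}(1 + a^2)\mu^2 - \tfrac{1}{4}(1 + a)^2\mu^2$$
$$= \tfrac{1}{2}(s_1^2 + s_2^2) + \tfrac{1}{4}\mu^2\left(2(1 + a^2) - (1 + 2a + a^2)\right)$$
$$= \tfrac{1}{2}(s_1^2 + s_2^2) + \tfrac{1}{4}\mu^2(2 + 2a^2 - 1 - 2a - a^2)$$
$$= \tfrac{1}{2}(s_1^2 + s_2^2) + \tfrac{1}{4}\mu^2(a^2 - 2a + 1)$$
$$= \tfrac{1}{2}(s_1^2 + s_2^2) + \tfrac{1}{4}\mu^2(a - 1)^2 \quad \text{as required}$$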
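
A numerical spot check can make the identity s² = ½(s₁² + s₂²) + ¼μ²(a − 1)² more concrete. The two small data sets below are invented purely for illustration (they do not come from the text): the first has mean 5 and the second has mean 15, so a = 3. The helper name `variance` is likewise an assumption.

```python
def variance(values):
    """Variance with divisor n, as used throughout this chapter."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / n

# Hypothetical data: two sets of equal size n = 4
first = [2, 4, 6, 8]        # mean mu = 5
second = [13, 15, 17, 15]   # mean 15 = a * mu, so a = 3

mu = sum(first) / len(first)
a = (sum(second) / len(second)) / mu

lhs = variance(first + second)  # variance of the combined 2n values
rhs = 0.5 * (variance(first) + variance(second)) + 0.25 * mu ** 2 * (a - 1) ** 2
print(lhs, rhs)
```

One example does not prove the result, but a mismatch here would show that a step in the algebra had gone wrong.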

Miscellaneous Exercise 1s

Two hundred and fifty Army recruits Obtain an estimate of the mean and standard
have the following heights. deviation of the data. Estimate the median,
and the lower and upper quartiles. (O &C)
Height (cm) 165- 170- 175- 180- 185- 190-195
No. of recruits | 18 37 60 65 48 22 4, Table A below gives an analysis by num-
bers of employees of the size of UK
Plot the data in the form of a cumulative factories of less than 1000 employees
frequency curve. Use the curve to estimate manufacturing clothing and footwear.
(a) the median height, (b) the lower Calculate as accurately as the data allow
quartile height. the mean and the median of this distribu-
The tallest 40% of the recruits are to be tion, showing your working.
formed into a special squad. Estimate If 90% of the factories have less than N
(a) the median, (b) the upper quartile of employees, estimate N. (O&C)
the heights of the members of this squad.
(SUJB Additional) \ The numbers 4,6,12;4,10,12,3,x,y
' have a mean of 7 and a mode of 4. Find
Below are given the number n of hours (i) the values of the two numbers x and
worked in a week by 64 men. y, (ii) the median of this set of nine
numbers.
30.8 27.6 33.6 39.4 39.7
21.8 40.6 33.9 36.9 39.1 When two additional numbers 7+ 7 and
45.4 42.5 9.6 26.3 36.1 7— n are included the standard deviation
30.5 44.4 38.4 40.6 26.5 of all eleven numbers is found to be 4.
52.7 35.7 28.9 38.2 30.4
34.8 37.8 38.0 43.7 40.8 Write down the mean of these eleven
40.1 23.7 31.8 42.0 29.1 numbers and calculate the value of n.
37.3 28.4 39.6 22.9 35.2 (C Additional)

(i) Group the numbers into intervals of 654 The sum of 20 numbers is 320 and the
width 8 hours defined by 9.5 <n < 12.5, sum of their squares is 5840. Calculate
1285 in) <a ober the mean of the 20 numbers and the
(ii) Use the grouped data to calculate standard deviation.
estimates of the mean and standard (i) Another number is added to these
deviation of n. 20 so that the mean is unchanged. Show
(iii) Estimate the percentage of workmen that the standard deviation is decreased.
for whom nis within one standard devia- (ii) Another set of 10 numbers is such
tion of the mean. (MEI) that their sum is 130 and the sum of
their squares is 2380. This set is com-
The following table shows the durations bined with the original 20 numbers. Cal-
of 40 telephone calls from an office via culate the mean and standard deviation
the office switchboard. of all 30 numbers. (C Additional)
Duration A weather station recorded the number
S1 1-2/ 2-3) 3-5 5-10 210
in minutes
of hours of sunshine each day for 80
Number of
6 LO 45 5 4 0} days, with the results as shown in Table B
calls
below.

Table A

Number of
11-19 20-24 25-99 100-199 200-499 500-999
employees

Number of
1500 800 2800 70 0 400 100 5800

Table B
Hours of
O O-1 1-2 2-3 3-4 4-5 5-6 6-7 17-8 8-12 over 12
sunshine

Number
Bee ls 2 6 17 ade allah 5 3 9 2 0

[The grouping symbol 2-3, for example, nearest minute, the estimated mean and
denotes greater than 2 hours and less than standard deviation for the duration of all
or equal to 3 hours.| 100 journeys. (C)
State which is the modal group.
10.
~
Table D below gives the cumulative
Construct a cumulative frequency table frequency distribution of the masses x in
and draw the cumulative frequency curve. kilogrammes of a group of 200 eighteen-
Use your curve to estimate (i) the median, year-old boys.
(ii) the inter-quartile range, (iii) the per-
Draw a cumulative frequency graph and
centage of days for which more than 3h from this estimate the median.
hours of sunshine were recorded.
(C Additional) Compile a frequency distribution from
the data and hence estimate the mean and
8. (a) Sketch the expected frequency curves standard deviation of the sample. State a
‘for each of the following distributions: well known probability distribution which
(i) the number of light bulbs broken you would expect to fit such data.
in boxes containing 125 bulbs, id (JMB)
assuming that the modal number of iy
breakages is 0, 11. 100 pupils were tested to determine their
(ii) the age at marriage of females. intelligence quotient (I.Q.), and the
(b) State the assumption that is made in results were as follows:
obtaining measures of average and dis- Beals 55- 65- 75- 85- 95- 105- 115- 125-134
persion from grouped frequency tables. No, of pupils | 1 1° 2 #6 21 29 24 12 4
The table below shows the ages, at last
birthday, of the employees of a certain All 1.Q.’s are given to the nearest integer.
firm. (i) Calculate the mean, and the standard
deviation.
Age (last Less than 20 20- 25- 30- 40- 50 and over (ii) Draw a cumulative frequency graph,
birthday)
and estimate how many pupils have I.Q.’s
within 1s.d. on either side of the mean.
employees
(SUJB)
Without drawing a cumulative frequency
curve, estimate (i) the semi-interquartile 12. (a) Find the median, mean, and standard
range, (ii) the number of employees aged deviation of the set of numbers 3,5,12,1,
37 and over. (C Additional) 6, 3,12.
(bo) A set of digits consists of m zeros
9. Table C below shows the durations of 60 and n ones. Find the mean of this set and
journeys on the same route bya lorry, show that the standard deviation is
the variations in journey times being
caused by varying traffic conditions. V(mn
Can) (C Additional)
Calculate, to the nearest minute, estimates
of the mean and standard deviation for
the duration of the journeys. 13. (a) A set of values of a variable X has a
When the times for 40 other journeys mean WU and a standard deviation 0. State
were taken, it was found that the mean the new value of the mean and of the
_pf standard deviation for the times of standard deviation when each of the
these 40 journeys were 6h 24 min and variables is (i) increased by k, (ii) multi-
18 min, respectively. Find, also to the plied by p.

Table C
\

5.6-5.8 5.8-6.0 6,0-6.2 6.2-6.4 6.4-6.6 6.6-6.8


j
Time of journey
in hours
Number of
journeys

Table D

30 35 40 45 50 55 60 65 70 75 80 85 90 95

Number with
0 1 4 11 25 47 79 114 146 171 187 195 198 200
mass less
than x

Values of a new variable Y are obtained that the mean becomes 45 and the lower
by using the formula Y = 3X+ 5. Find quartile becomes 35.
the mean and the standard deviation of State, with reason, whether the quartiles
the set of values of Y. of the original marks will scale into the
(b) It is proposed to convert a set of quartiles of the scaled marks. (SUJB)
values of a variable X, whose mean and |
16: The table shows the yield, in litres, of
standard deviation are 20 and 5 respec-
/ milk produced by 131 cows at a certain
tively, to a set of values of a variable Y
farm on a given day.
whose mean and standard deviation are
42 and 8 respectively. If the conversion Yield (litres) }5-10 11-16 17-22 23-28 29-34 35-40
formula is Y = aX + b, calculate the value 26 18 7
of a and of b. (C Additional)
(a) State the modal class and estimate the
14. A set of numbers has mean pL and standard mode. (b) By calculation, estimate the
deviation o. A new set of numbers is median yield. (c) Draw a cumulative
obtained by subtracting uw from each frequency curve, and from it estimate
number and dividing the result by 0. Write the semi-interquartile range. (d) Calculate
down the mean and standard deviation the mean and the standard deviation of
of the new set of numbers. the distribution, using a method of
In an examination in Statistics the mean coding.
mark of a group of 120 students was 68
andathe=stantiard udeviatiouswassGain 17. In a certain industry, the numbers of
Algebra the mean mark of the group was thousands of employees in 1970 were as
62 and the standard deviation was 5. One shown in Table F below, by age groups.
student scored 76 in Statistics and 70 in Calculate the arithmetic mean, median,
Algebra. By scaling the marks for each variance and standard deviation of the
subject so that each set of marks has the ages of employees in the industry.
same mean and standard deviation ou Estimate the percentage of the employees
pare the performances of this student in whose ages lie within one standard
the two subjects. (C Additional) deviation of the arithmetic mean.
15. 200 candidates sat an examination and aS ee te
the distribution was obtained as shown in 18, PRO ee ae 4d ee eee
ble B below. \) each estimate the height of the top ofa

If the limits of class 40-49 are 39.5 to | eeSe se


49.5, what is the mid-interval value of
tlicuslana? 47, 52,52, 54, 52, 50, 51,50, 48,538,54,49
Calculate the mean of the marks explain- (a) Calculate the median of these
ing any limitations of your calculation. ecuietl miter . "
: f alculate the mo e,m,t at is the
ae ee: mat Bae Ae number which has the highest frequency.
fee (c) Two extra children join the class
: and each makes an estimate. The mode
Assuming that your estimates are exact, for the set of 14 estimates is different
find values for a and 6 correct to 2 sig- from m and unique. Suggest what the two
nificant figures, in order that the above new estimates could be. .
marks can be scaled by the equation (d) Calculate the arithmetic mean, X, of
y =ax+ b, where y is the new mark, so the original 12 estimates.

Table E

Marks (x) |10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89 90-99
10 18 20 30 49 46 20 5 2

Table F

Age last
AaeeOtee 20-24 25-29 30-34 35-39 40-44 45-49 50-54 55-59 60-64

Number of
thousands
66 65 56 50 42 37 35 30 24 22

(e) One member of the original class of (b) The average height of 20 boys is
12 revises his estimate and the new mean 160 cm, with a standard deviation of
for the 12 estimates is X+ 0.5. Find the 4cm. The average height of 30 girls is
increase in the estimate of this member. 155 cm, with a standard deviation of
(f) The teacher of the class makes an 3.5cm. Find the standard deviation of
estimate of the height of the church the whole group of 50 children. (SUJB)
tower and when her estimate is taken
with the original 12, the mean of all 13 21. (a) Sketch frequency curves for distribu-
estimates is X+ 0.5. Find the teacher’s tions which have one mode and for
estimate. which (i) the mode, median and mean
(g) Two extra children, different from coincide, (ii) the mode is less than the
those mentioned in (c), join the class and median, indicating on each sketch the
each make an estimate so that the mean positions of these measures.
of their two estimates and the original 12 (b) The mean of the set of numbers 3,1,
estimates is X+ 0.5. 7,2,1,1,7,x,y, where x and y are single
digit positive whole numbers, is known to
Find the sum of their two estimates.
be 4. Show that x+y = 14.
(L Additional)
Hence, or otherwise, find the mode of
this set of numbers when (i)x =y,
_\19. Ten values of a variable x are
(ii)x Fy.
822, 0.0, O71, 6225 0.4, (-9,,8.0, 8:3, 7.8, 8.1
If the standard deviation is 3/76 find x
Express each of these values in the form
and y, assuming that x Sy.
8+ 0.1y. Calculate the arithmetic mean
(C Additional)
and the variance of the ten values of y
and hence, or otherwise, deduce the 22. A random sample of 1000 surnames is
mean and the variance of the ten values drawn from a local telephone directory.
of x. The distribution of the lengths of the
Hence find the mean and the variance of names is as shown in Table G below.
the set of ten numbers Calculate the sample mean and sample
824, 804, 814, 824,844, standard deviation. Obtain the upper
794, 804, 834, 784,814 quartile.
A transformation of the form z = a+ bx, Represent graphically the data in the
where b > 0, is applied to the first set of table.
ten values of x so that the mean is Give a reason why the sample of names
increased by 0.9 and the standard devia- obtained in this way may not be truly
tion is doubled. Find the values of the representative of the population of Great
constants a and b. (L Additional) Britain. (JMB)
)
20. /Show, from the basic definition, why 23. In an agricultural experiment the gains in
/ the standard deviation of a set of obser- mass, in kilograms, of 100 pigs during a

4
vations x,X2,X3,---,*, with certain period were recorded as follows:
mean X may be found by evaluating
Gain in mass | 59 410-14 15-19 20-24 25-29 30-34
(kilograms)
Ex,70°.5sa.
freee
n Frequency 2 29 37 16 14 2

(a) Find, showing your working clearly Construct a histogram and a relative
and not using any pre-programmed cumulative frequency polygon of these
function on your calculator, the standard data. Obtain (i) the median and the semi-
deviation of the following frequency interquartile range, (ii) the mean and the
distribution: standard deviation.
Which of these pairs of statistics do you
27 28
consider more appropriate in this case,
15 11 and why? (AEB 1977)

Table G

Number of letters | 3 4 5 6 7 8 94410-1419


in surname
13 102.186) 6237 215, 1134, 83 32 13 6

24. Table H below gives the ages in completed (b) Draw a histogram and comment on
years of the 113 persons convicted of the shape of the distribution.
shop-lifting in a British town in 1986. (c) Using the frequency table estimate
Working in years and giving answers the mean and standard deviation of the
correct to 1 place of decimals, calculate marks.
(a) the mean age and standard deviation, (d) The marks are to be scaled linearly
(b) the coefficient of skewness given by
by the relation Y=atbX where X is
the old mark and Y the new mark. The
(mean — mode)/standard deviation, new mean and standard deviation are
(c) the median age. to be 50 and 10 respectively. Using your
estimates in (c) calculate suitable values
Which do you consider to be best as a
fora and b. (SUJB)
representative average of the distribution
—the mean, median or mode? Give
27. A travel agency has two shops, FR and S.
reasons for your choice.
The number of holidays purchased in a
Draw a histogram of the data with a class particular week and the mean and stand-
interval of 2 years. (SUJB) ard deviation of the costs of these holidays
at each shop are shown in the following
\ 25. A grouped frequency distribution of the table.
ages of 358 employees in a factory is
shown in Table I below. Estimate, to the Number of | Meancost | S.D.
nearest month, the mean and the standard holidays (£) (&)
deviation of the ages of these employees.
Shop R 32 190.35 10.4
Graphically, or otherwise, estimate
Shop S 24 202.25 15.5
(a) the median and the interquartile
range of the ages, each to the nearest Calculate the mean, and, to the nearest
month, : penny, the standard deviation of the costs
(b) the percentage, to one decimal place, of all the 56 holidays purchased. (L)P
of the employees who are over 27 years
old and under 55 years old. (L) 28. The following are the ignition times in
seconds (correct to the nearest 100th
26. The following is a set of 109 examination of a second) of samples of 80 uphol-
marks ordered for convenience. stery materials They are arranged in
numerical order by columns.

(a) Construct a grouped frequency distri- (a) Group these data into 8 equal classes
bution using a class width of 10 and commencing 1.00-2.49, 2.50-3.99, ...
starting with 0-9. and arrange them in a frequency table.

Table H

|Age| 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30-49
Bee ee aa ae es 6

Table I

Age (last birthday) 16-20 21-25 26-30 31-35 36-40 41-45 46-50 51-60 61-
Number of employees 6 56 58 52 46 38 36 36 0
(b) Using the frequency table obtain litter. The contents of each bag are then
estimates for the mean time and standard weighed. A summary of the results is
deviation. shown in the table.
(c) Construct a frequency polygon for
the distribution and comment on its : Mean wt. S.D
Sample | Size %
shape.
(d) Chebychev’s Theorem states that, for al 50 11.8 0.5
any distribution, the proportion of the 2 30 2 1 0.9
population that lies outside k standard ao 20 ite. 7 eal
deviations from the mean is less than
1/k?. Verify this for the above distribu- Find, in kg to 2 decimal places, the mean
tion when k > 1.5. (SUJB) weight per bag and the standard deviation
arr for the 100 bags. (L)P |
20Xtn a borehole the thickness, in mm, of : : . s
Heros diriks aie cho warn thetable: 31. Referring to your projects if possible,
give an example of a graphical represen-
tation of
Thickness
(mm) Preis ast Oms hs AD 89% (a) a discrete frequency distribution,
(6) a grouped frequency distribution.
Number “5 9 ey 0
of strata Given the frequency distribution

Draw a histogram to illustrate these data. bisa OSes wo 6 Teno


Construct a cumulative frequency table
dBi hab Sai ary grag, OG LDC 1
and draw a cumulative frequency polygon.
Hence, or otherwise, estimate the median find the median and the semi-interquartile
and the interquartile range for these data. range when
Find the proportion of the strata that (c) x is a discrete variable,
are less than 28 mm thick. (L)P (d) x isa continuous variable whose values
were recorded to the nearest integer.

30. Three random samples of 50, 30 and 20 Calculate also, to 2 decimal places, the
bags respectively are taken from the mean and the variance of the above
production line of ‘12 kg bags’ of cat distribution. (L)

Table J

Lifetime (to 720-729 730-739 740-744 745-749 750-754 755-759 760-769 770-788|
690-709 710-719
nearest hour)

Number of 3 7 15 38 41 35 21 16 14 10 |
discs
PROBABILITY
An experiment can result in several possible outcomes. For example
(a) One toss of a coin results in the outcomes (H, T). If the coin is
fair, then each outcome is equally likely.
(b) Two tosses of a coin result in the outcomes (HH, HT, TH, TT).
Again, if the coin is fair, then each outcome is equally likely.
(c) If a machine produces articles, some of which are defective, the
outcomes are (defective, not defective). In this case the out-
comes should not be equally likely.
(d) If a coin is tossed repeatedly until a head is obtained, the out-
comes are (H, TH, TTH, TTTH, TTTTH,...).
(e) The outcomes of a race being run by A and B could be (A wins,
B wins, there is a dead heat). These outcomes may not be
equally likely.
Each possible outcome is called a sample point and the set of all
possible outcomes is the possibility space S.
If the possibility space has a finite number of sample points then we
denote the number of points in S by n(S).
Consider an event E which is a subset of S, then $n(E) \le n(S)$.
For example, for one throw of an ordinary die the possibility
space S = (1, 2, 3, 4, 5, 6) and n(S) = 6.
Let $E_1$ be the event ‘the number is odd’, then $E_1$ = (1, 3, 5) and
$n(E_1) = 3$.
Let $E_2$ be the event ‘the number is less than 3’, then $E_2$ = (1, 2) and
$n(E_2) = 2$.

CLASSICAL DEFINITION OF PROBABILITY

If the possibility space S consists of a finite number of equally
likely outcomes, then the probability of an event E, written P(E),
is defined as

$$P(E) = \frac{n(E)}{n(S)}$$
So, in the example above

$$P(E_1) = \frac{n(E_1)}{n(S)} = \frac{3}{6} = \frac{1}{2}$$
$$P(E_2) = \frac{n(E_2)}{n(S)} = \frac{2}{6} = \frac{1}{3}$$
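
The classical definition translates directly into a count of sample points. The following is a minimal sketch, assuming equally likely outcomes; the function name `probability` is illustrative, and Python's standard `fractions.Fraction` is used so that answers stay as exact fractions.

```python
from fractions import Fraction

def probability(event, space):
    """Classical probability n(E)/n(S) for equally likely outcomes."""
    event, space = set(event), set(space)
    if not event <= space:
        raise ValueError("event must be a subset of the possibility space")
    return Fraction(len(event), len(space))

S = {1, 2, 3, 4, 5, 6}                 # one throw of an ordinary die
E1 = {x for x in S if x % 2 == 1}      # 'the number is odd'
E2 = {x for x in S if x < 3}           # 'the number is less than 3'

p_e1 = probability(E1, S)
p_e2 = probability(E2, S)
print(p_e1, p_e2)
```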
In order to investigate the rules which apply when considering
probabilities, we will consider the situation in the classical defini-
tion —that of a finite possibility space with equally likely outcomes.
However, the results apply in general and will be used in other
situations in the problems.

IMPORTANT RESULTS

Let the number of sample points in the possibility space be n, so
that n(S) = n.
Let the event A have r sample points, so that n(A) = r.

Result 1 We have

$$P(A) = \frac{n(A)}{n(S)} = \frac{r}{n}$$

Now, since A is a subset of S

$$0 \le r \le n$$
i.e.
$$0 \le \frac{r}{n} \le 1$$

Hence $0 \le P(A) \le 1$

NOTE: the probability of an event A is a number between 0 and 1
inclusive.
If P(A) = 0 then the event cannot possibly occur.
If P(A) = 1 then the event is certain to occur.
For example, if a card is drawn from the clubs suit of a pack of
cards, then
P(card is red) = 0
P(card is black) = 1

Result 2 Let $\bar{A}$ denote the event ‘A does not occur’.

Now
$$P(\bar{A}) = \frac{n(\bar{A})}{n(S)} = \frac{n - r}{n} = 1 - \frac{r}{n} = 1 - P(A)$$

Therefore $P(\bar{A}) = 1 - P(A)$

or $P(A) + P(\bar{A}) = 1$

Example 2.1 A card is drawn at random from an ordinary pack of 52 playing
cards. Find the probability that the card (a) is a seven, (b) is not a
seven.

Solution 2.1 The possibility space S = (the pack of 52 cards) and n(S) = 52. Let
A be the event ‘the card is a seven’, then n(A) = 4.

(a) Now $P(A) = \dfrac{n(A)}{n(S)} = \dfrac{4}{52} = \dfrac{1}{13}$

Therefore the probability that the card drawn is a seven is $\frac{1}{13}$.

(b) Let $\bar{A}$ be the event ‘the card is not a seven’.

Now $P(\bar{A}) = 1 - P(A) = 1 - \dfrac{1}{13} = \dfrac{12}{13}$

Therefore the probability that the card drawn is not a seven is $\frac{12}{13}$.
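
As a check on the complement rule, the pack can be enumerated and the ‘not a seven’ cards counted directly. This is only a sketch: the rank and suit labels are arbitrary names chosen here, and the comparison simply confirms that counting gives the same answer as 1 − P(A).

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
pack = list(product(ranks, suits))          # the 52 sample points

sevens = [card for card in pack if card[0] == "7"]
not_sevens = [card for card in pack if card[0] != "7"]

p_seven = Fraction(len(sevens), len(pack))
p_not_seven_by_rule = 1 - p_seven                       # complement rule
p_not_seven_by_count = Fraction(len(not_sevens), len(pack))

print(p_seven, p_not_seven_by_rule, p_not_seven_by_count)
```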

Example 2.2 Compare the probabilities of scoring a 4 with one die and a total of
8 with two dice.

Solution 2.2 With one die

The possibility space S = (1, 2, 3, 4, 5, 6) and n(S) = 6.
Let A be the event ‘a 4 occurs’, then n(A) = 1.

So $P(A) = \dfrac{n(A)}{n(S)} = \dfrac{1}{6}$
The probability of scoring 4 with one die is $\frac{1}{6}$.

With two dice

The possibility space S has 36 sample points, each of which is
equally likely to occur. These can be represented on a diagram as
shown. The dots in the first column represent the outcomes (1, 1),
(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), and so on for the other
columns.

[Diagram: possibility space for two dice; horizontal axis: first die, vertical axis: second die.]

Let B be the event ‘the sum on the two dice is 8’.
The sample points which give a sum of 8 are ringed on the diagram.
We see that n(B) = 5.

So $P(B) = \dfrac{n(B)}{n(S)} = \dfrac{5}{36}$
The probability of obtaining a total of 8 with two dice is $\frac{5}{36}$.

So P(scoring an 8 with two dice) < P(scoring a 4 with one die)
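
The 36-point possibility space is small enough to list in full, which gives an independent count of the ringed points. The sketch below uses `itertools.product` from the standard library; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die)
space = list(product(range(1, 7), repeat=2))

# Event B: 'the sum on the two dice is 8'
event_b = [(first, second) for first, second in space if first + second == 8]

p_b = Fraction(len(event_b), len(space))
p_a = Fraction(1, 6)  # scoring a 4 with one die

print(event_b)
print(p_b, p_a, p_b < p_a)
```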

Example 2.3 Two fair coins are tossed. Illustrate the possible outcomes on a
possibility space diagram and find the probability that two heads
are obtained.

Solution 2.3 Each coin is equally likely to show a head or a tail. The possibility
space for the outcomes when two coins are tossed is as shown.

[Diagram: possibility space for two coins; horizontal axis: first coin, vertical axis: second coin.]

n(S) = 4
Let A be the event ‘two heads are obtained’.
From the diagram n(A) = 1.

Therefore $P(A) = \dfrac{n(A)}{n(S)} = \dfrac{1}{4}$
The probability that two heads are obtained when two fair coins
are tossed is $\frac{1}{4}$.

Exercise 2a

1. An ordinary die is thrown. Find the
probability that the number obtained
(a) is a multiple of 3, (b) is less than 7,
(c) is a factor of 6.

2. A card is drawn at random from an
ordinary pack containing 52 playing
cards. Find the probability that the
card drawn (a) is the four of spades,
(b) is the four of spades or any diamond,
(c) is not a picture card (Jack, Queen,
King) of any suit.

3. From a set of cards numbered 1 to 20 a
card is drawn at random. Find the proba-
bility that the number (a) is divisible by
4, (b) is greater than 15, (c) is divisible
by 4 and greater than 15.
If the card is divisible by 4 and it is not
replaced, find the probability that (d) the
second card drawn is even.

4. A counter is drawn from a box con-
taining 10 red, 15 black, 5 green and 10
yellow counters. Find the probability
that the counter is (a) black, (b) not
green or yellow, (c) not yellow, (d) red
or black or green, (e) not blue.

5. Two ordinary dice are thrown. Find the
probability that (a) the sum on the two
dice is 3, (b) the sum on the two dice
exceeds 9, (c) the two dice show the
same number, (d) the numbers on the
two dice differ by more than 2, (e) the
product of the two numbers is even.

6. The pupils in a class were asked how
many brothers and sisters they had. Their
answers are shown in the table:

Number of brothers
Number of pupils

If a child is chosen at random, find the
probability that there are three children
in his or her family.

7. If
$\mathcal{E}$ = {x : x is an integer and 1 < x < 20}
A = {x : x is a multiple of 3}
B = {x : x is a multiple of 4}
and an integer is picked at random from
$\mathcal{E}$, find the probability that (a) it is in
A, (b) it is not in B.

8. A die is in the form of a tetrahedron and
its faces are marked 1, 2, 3 and 4. The
‘score’ is the number on which the die
lands. Find the probability that when a
tetrahedral die is thrown the score is
(a) an even number, (b) a prime number.
(NOTE: 1 is not a prime number.)
If two tetrahedral dice are thrown find
the probability that (c) the sum of the
two scores is 5, (d) the difference of the
two scores is 1, (e) the product of the
two scores is a multiple of 4.

9. An ordinary die and a fair coin are
thrown together. Show the possible
outcomes on a possibility space diagram
and find the probability that (a) a head
and a 2 is obtained, (b) a tail and a 7 is
obtained, (c) a head and an even number
is obtained.

10. An ordinary die and two coins are thrown
together. Show the possible outcomes on
a possibility space diagram and find the
probability that (a) two heads and a
number less than 3 is obtained, (b) the
coins show different faces and a 4 is
shown on the die, (c) the die shows an
odd number and the coins show the
same face, (d) a 6 and at least one head
is obtained.

11. Two dice are thrown simultaneously. The 30


scores are to be multiplied. Denoting by (0) P(4), (c) P(44), (a) ZY Pm).
P(n) the probability that the number n E ‘lee Ce
will be obtained, calculate (a) P(9), ce prin Se )PoesIDle
values of ft. (L Additional)

Result 3
'

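If A and B are any two events of the same experiment such that
$P(A) \ne 0$ and $P(B) \ne 0$ then
$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$$
Note that ‘A or B’ means ‘A occurs, or B occurs, or both A and B
occur’.
Writing the result in set notation we have

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

To illustrate this result let n(S) = n where S is the possibility space,
n(A) = r
n(B) = s
$n(A \cap B) = t$
The Venn diagram is as shown. The shaded area represents $A \cup B$.

[Venn diagram: sets A and B inside S, with r − t sample points in A only, t in $A \cap B$ and s − t in B only.]

Now
$$P(A \cup B) = \frac{n(A \cup B)}{n(S)}$$
$$= \frac{(r - t) + t + (s - t)}{n}$$
$$= \frac{r + s - t}{n}$$
$$= \frac{r}{n} + \frac{s}{n} - \frac{t}{n}$$
$$= P(A) + P(B) - P(A \cap B)$$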

Example 2.4 A coin and a die are thrown together. Draw a possibility space diagram and find the probability of obtaining (a) a head, (b) a number greater than 4, (c) a head and a number greater than 4, (d) a head or a number greater than 4.

Solution 2.4 The possibility space S is as shown, with $n(S) = 12$.

[Possibility space diagram: coin (H, T) against number on die (1 to 6), with the events A and B marked.]

Let A be the event ‘a head is obtained’, so $n(A) = 6$.
Let B be the event ‘a number greater than 4 is obtained’, so $n(B) = 4$.

(a) $P(A) = \frac{n(A)}{n(S)} = \frac{6}{12} = \frac{1}{2}$

The probability of obtaining a head is $\frac{1}{2}$.

(b) $P(B) = \frac{n(B)}{n(S)} = \frac{4}{12} = \frac{1}{3}$

The probability of obtaining a number greater than 4 is $\frac{1}{3}$.

(c) $P(\text{a head and a number greater than 4}) = P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{2}{12} = \frac{1}{6}$

The probability of obtaining a head and a number greater than 4 is $\frac{1}{6}$.

(d) $P(\text{a head or a number greater than 4}) = P(A \cup B) = \frac{n(A \cup B)}{n(S)} = \frac{8}{12} = \frac{2}{3}$

The probability of obtaining a head or a number greater than 4 is $\frac{2}{3}$.

We now check that this satisfies $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

left hand side $= P(A \cup B) = \frac{2}{3}$

right hand side $= P(A) + P(B) - P(A \cap B) = \frac{1}{2} + \frac{1}{3} - \frac{1}{6} = \frac{2}{3}$

Therefore left hand side = right hand side and $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
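The check above can also be carried out by listing the possibility space and counting. The following is a minimal sketch in Python; the helper name `prob` and the encoding of outcomes as (coin, die) pairs are illustrative choices, not part of the text.

```python
from fractions import Fraction
from itertools import product

# Possibility space for one coin and one die: 12 equally likely outcomes.
space = set(product("HT", range(1, 7)))

def prob(event):
    """Probability of an event (a subset of the space) as n(E) / n(S)."""
    return Fraction(len(event), len(space))

A = {(coin, die) for (coin, die) in space if coin == "H"}  # a head is obtained
B = {(coin, die) for (coin, die) in space if die > 4}      # a number greater than 4

# Set intersection (&) plays the role of 'and', set union (|) the role of 'or'.
print(prob(A), prob(B), prob(A & B), prob(A | B))

# The addition law holds exactly when working with fractions.
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
```

Working with `Fraction` rather than decimals keeps the arithmetic exact, so the two sides of the law can be compared directly.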

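Example 2.5 Events A and B are such that $P(A) = \frac{19}{30}$, $P(B) = \frac{2}{5}$ and $P(A \cup B) = \frac{4}{5}$. Find $P(A \cap B)$.

Solution 2.5 Now $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

so $\frac{4}{5} = \frac{19}{30} + \frac{2}{5} - P(A \cap B)$

$$P(A \cap B) = \frac{19}{30} + \frac{12}{30} - \frac{24}{30} = \frac{7}{30}$$

Therefore $P(A \cap B) = \frac{7}{30}$.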

Example 2.6 In a group of 20 adults, 4 out of the 7 women and 2 out of the 13
men wear glasses. What is the probability that a person chosen at
random from the group is a woman or someone who wears glasses?

Solution 2.6 Let W be the event ‘the person chosen is a woman’ and G be the
event ‘the person chosen wears glasses’.
Now

$$P(W) = \frac{7}{20}, \quad P(G) = \frac{6}{20}, \quad P(W \text{ and } G) = P(W \cap G) = \frac{4}{20}$$

$$P(W \text{ or } G) = P(W \cup G) = P(W) + P(G) - P(W \cap G) = \frac{7}{20} + \frac{6}{20} - \frac{4}{20} = \frac{9}{20}$$

Therefore the probability that the person is a woman or someone who wears glasses is $\frac{9}{20}$.

MUTUALLY EXCLUSIVE EVENTS

Result 4
If an event A can occur or an event B can occur but not both A and B can occur, then the two events A and B are said to be mutually exclusive.

In this case $n(A \cap B) = 0$ and $A \cap B = \emptyset$.

When A and B are mutually exclusive events

$$P(A \cup B) = P(A) + P(B)$$

and $P(A \cap B) = 0$.
This is known as the addition law for mutually exclusive events.

Examples of mutually exclusive events:

(i) A number is chosen from the set of integers from 1 to 10 inclusive. If A is the event ‘the number is odd’ and B is the event ‘the number is a multiple of 4’ then A and B are mutually exclusive events, as a number cannot be both odd and a multiple of 4.
(ii) Two men are standing for election as chairman of a committee.
Let A be the event ‘Mr Smith is elected’ and Y be the event ‘Mr
Jones is elected’. Then A and Y are mutually exclusive events as
both cannot be elected as chairman.

Example 2.7 In a race the probability that John wins is $\frac{1}{3}$, the probability that Paul wins is $\frac{1}{4}$ and the probability that Mark wins is $\frac{1}{5}$. Find the probability that (a) John or Mark wins, (b) neither John nor Paul wins. Assume that there are no dead heats.

Solution 2.7 We assume that only one person can win, so the events are mutually exclusive.

(a) $P(\text{John or Mark wins}) = P(\text{John wins}) + P(\text{Mark wins}) = \frac{1}{3} + \frac{1}{5} = \frac{8}{15}$

$P(\text{John or Mark wins}) = \frac{8}{15}$

(b) $P(\text{neither John nor Paul wins}) = 1 - P(\text{John or Paul wins}) = 1 - \left(\frac{1}{3} + \frac{1}{4}\right) = 1 - \frac{7}{12} = \frac{5}{12}$

$P(\text{neither John nor Paul wins}) = \frac{5}{12}$

Example 2.8 A card is drawn at random from an ordinary pack of 52 playing cards. Find the probability that the card is (a) a club or a diamond, (b) a club or a king.

Solution 2.8 The possibility space S = (the pack of 52 cards) so $n(S) = 52$.
Let C be the event ‘a club is drawn’, D be the event ‘a diamond is drawn’, K be the event ‘a king is drawn’.

(a) $P(\text{club}) = \frac{n(C)}{n(S)} = \frac{13}{52} = \frac{1}{4}$ and $P(\text{diamond}) = \frac{n(D)}{n(S)} = \frac{13}{52} = \frac{1}{4}$

Now the events C and D are mutually exclusive since they cannot occur together; a card cannot be both a club and a diamond.
Therefore

$$P(\text{club} \cup \text{diamond}) = P(\text{club}) + P(\text{diamond}) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$$

The probability of drawing a club or a diamond is $\frac{1}{2}$.

(b) $P(\text{club}) = \frac{13}{52}$ and $P(\text{king}) = \frac{n(K)}{n(S)} = \frac{4}{52}$

Now $P(\text{king} \cap \text{club}) = P(\text{king of clubs}) = \frac{1}{52}$

The events C and K are not mutually exclusive as a card can be both a king and a club.
Therefore

$$P(\text{club} \cup \text{king}) = P(\text{club}) + P(\text{king}) - P(\text{club} \cap \text{king}) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$$

The probability of drawing a club or a king is $\frac{4}{13}$.

In this example we could have noted straight away that the event ‘a club or a king is drawn’ has 16 sample points:

(A♣, 2♣, 3♣, 4♣, 5♣, 6♣, 7♣, 8♣, 9♣, 10♣, J♣, Q♣, K♣, K♦, K♥, K♠)

and the possibility space has 52 sample points, so $P(\text{club} \cup \text{king}) = \frac{16}{52} = \frac{4}{13}$ as before.
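The contrast between the two parts of this example can be seen by building the pack as a set and testing whether two events share any sample points. This is only a sketch; the rank and suit labels and the helper `prob` are illustrative names.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))          # 52 equally likely cards

def prob(event):
    """n(E) / n(S) for an event E drawn from the pack."""
    return Fraction(len(event), len(deck))

clubs = {card for card in deck if card[1] == "clubs"}
diamonds = {card for card in deck if card[1] == "diamonds"}
kings = {card for card in deck if card[0] == "K"}

# Clubs and diamonds share no cards, so the simple addition law applies.
assert not (clubs & diamonds)
assert prob(clubs | diamonds) == prob(clubs) + prob(diamonds)

# Clubs and kings share the king of clubs, so the overlap must be subtracted.
assert clubs & kings == {("K", "clubs")}
assert prob(clubs | kings) == prob(clubs) + prob(kings) - prob(clubs & kings)

print(len(clubs | kings), prob(clubs | kings))
```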

Exercise 2b

1. An ordinary die is thrown. Find the probability that the number obtained is (a) even, (b) prime, (c) even or prime.

2. In a group of 30 students all study at least one of the subjects physics and biology. 20 attend the physics class and 21 attend the biology class. Find the probability that a student chosen at random studies both physics and biology.

3. From an ordinary pack of 52 playing cards the seven of diamonds has been lost. A card is dealt from the well-shuffled pack. Find the probability that it is (a) a diamond, (b) a queen, (c) a diamond or a queen, (d) a diamond or a seven.

4. For events A and B it is known that $P(A)$ = 3, $P(A \cup B)$ = 3 and $P(A \cap B)$ = 35. Find $P(B)$.

5. In a street containing 20 houses, 3 households do not own a television set; 12 households have a black and white set and 7 households have a colour and a black and white set. Find the probability that a household chosen at random owns a colour television set.

6. For events A and B it is known that $P(A) = P(B)$ and $P(A \cap B) = 0.1$ and $P(A \cup B) = 0.7$. Find $P(A)$.

7. The probability that a boy in class 2 is in the football team is 0.4 and the probability that he is in the chess team is 0.5. If the probability that a boy in the class is in both teams is 0.2, find the probability that a boy chosen at random is in the football or the chess team.

8. Two ordinary dice are thrown. Find the probability that the sum of the scores obtained (a) is a multiple of 5, (b) is greater than 9, (c) is a multiple of 5 or is greater than 9, (d) is a multiple of 5 and is greater than 9.

9. Given that $P(A)$ = 2, $P(B)$ = 4 and $P(A \cap B)$ = +; find $P(A \cup B)$.

10. Two ordinary dice are thrown. Find the probability that (a) at least one 6 is thrown, (b) at least one 3 is thrown, (c) at least one 6 or at least one 3 is thrown.

EXHAUSTIVE EVENTS

Result 5

If two events A and B are such that $A \cup B = S$ then $P(A \cup B) = 1$ and the events A and B are said to be exhaustive.

For example
(i) Let $S = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)$. If $A = (1, 2, 3, 4, 5, 6)$ and $B = (5, 6, 7, 8, 9, 10)$ then $A \cup B = S$ and A and B are exhaustive events.

(ii) Let S be the possibility space when an ordinary die is thrown. If A is the event ‘the number is less than 5’ and B is the event ‘the number is greater than 3’ then the events A and B are exhaustive as $A \cup B = S$.

Example 2.9 Events A and B are such that they are both mutually exclusive and exhaustive. Find a relationship between A and B. Give an example of such events.

Solution 2.9 If A and B are mutually exclusive then $P(A \cup B) = P(A) + P(B)$

If A and B are exhaustive then $P(A \cup B) = 1$
Therefore

$$P(A) + P(B) = 1$$

so $P(B) = 1 - P(A)$
But $P(\bar{A}) = 1 - P(A)$
Therefore $P(B) = P(\bar{A})$
i.e. $B = \bar{A}$

Similarly $A = \bar{B}$.

Toss a coin. Let A be the event ‘a head is obtained’, B be the event ‘a tail is obtained’.

Now A and B are mutually exclusive, as the coin cannot show both a head and a tail.

A and B are exhaustive as the probability that the outcome is a head or a tail is 1.
Therefore A and B are both mutually exclusive and exhaustive.

CONDITIONAL PROBABILITY
Result 6

If A and B are two events and $P(A) \neq 0$ and $P(B) \neq 0$, then the probability of A, given that B has already occurred, is written $P(A \mid B)$

and

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Illustrating this by means of the Venn diagram, the possibility space is B, since we know that B has already occurred.

$$P(A \mid B) = \frac{n(A \cap B)}{n(B)} = \frac{t}{s} = \frac{t/n}{s/n} = \frac{P(A \cap B)}{P(B)}$$

This result is often written

$$P(A \cap B) = P(A \mid B) \cdot P(B)$$

NOTE: if A and B are mutually exclusive events then, as $P(A \cap B) = 0$ and $P(B) \neq 0$, it follows that $P(A \mid B) = 0$.

Example 2.10 Given that a heart is picked at random from a pack of 52 playing
cards, find the probability that it is a picture card.

Solution 2.10 We require


P(picture card ™ heart)
P(picture card| heart) =
P(heart)
3/52
"13/52
3
13
The probability that it is a picture card, given that it is a heart,
isao=.
13
PROBABILITY 91

Example 2.11 When a die is thrown, an odd number occurs. What is the proba-
bility that the number is prime?

Solution 2.11

$$P(\text{prime} \mid \text{odd}) = \frac{P(\text{prime} \cap \text{odd})}{P(\text{odd})} = \frac{2/6}{3/6} = \frac{2}{3}$$

(The odd prime numbers are 3 and 5.)

The probability that the number is prime, given that it is odd, is $\frac{2}{3}$.
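For equally likely outcomes, conditioning on an event amounts to shrinking the possibility space to that event and counting within it. A short sketch of this idea in Python follows; the function name `conditional` is an illustrative choice.

```python
from fractions import Fraction

def conditional(event, given):
    """P(event | given) for equally likely outcomes: n(event and given) / n(given)."""
    if not given:
        raise ValueError("the conditioning event must contain at least one outcome")
    return Fraction(len(event & given), len(given))

die = set(range(1, 7))
odd = {n for n in die if n % 2 == 1}    # (1, 3, 5)
prime = {2, 3, 5}                       # 1 is not a prime number

# Reduced possibility space is 'odd'; two of its three points are prime.
print(conditional(prime, odd))
```

Swapping the arguments gives $P(\text{odd} \mid \text{prime})$, which for this die happens to take the same value since there are also three primes, two of which are odd.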

Result 7

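As $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ we have $P(A \cap B) = P(A \mid B) \cdot P(B)$

It follows that

$P(B \mid A) = \frac{P(B \cap A)}{P(A)}$ and $P(B \cap A) = P(B \mid A) \cdot P(A)$

Now $P(A \cap B) = P(B \cap A)$

Therefore $P(A \mid B) \cdot P(B) = P(B \mid A) \cdot P(A)$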

Example 2.12 Two tetrahedral dice, with faces labelled 1, 2, 3 and 4, are thrown and the number on which each lands is noted. The ‘score’ is the sum of these two numbers. Find the probability that (a) the score is even, given that at least one die lands on a 3, (b) at least one die lands on a 3, given that the score is even.

Solution 2.12 There are 16 sample points in the possibility space S, as shown in the diagram, so $n(S) = 16$.
Let A be the event ‘at least one die lands on a 3’ and let B be the event ‘the score is even’.

[Possibility space diagram: second die against first die, with the sample points of A and B marked.]

The sample space A is ((1, 3), (2, 3), (3, 3), (4, 3), (3, 1), (3, 2), (3, 4)), so

$$n(A) = 7 \text{ and } P(A) = \frac{n(A)}{n(S)} = \frac{7}{16}$$

Sample space B is ((1, 1), (1, 3), (2, 2), (2, 4), (3, 1), (3, 3), (4, 2), (4, 4)). The points of B have been marked on the diagram.

We have

$$n(B) = 8 \text{ and } P(B) = \frac{n(B)}{n(S)} = \frac{8}{16}$$

There are 3 sample points which are in both A and B, so

$$P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{3}{16}$$

(a) $P(\text{score is even} \mid \text{at least one die lands on a 3}) = P(B \mid A)$

$$P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{3/16}{7/16} = \frac{3}{7}$$

Therefore the probability that the score is even, given that at least one die lands on a 3, is $\frac{3}{7}$.

NOTE: this result could have been obtained directly from the diagram. The possibility space has been reduced to the 7 sample points in A. For 3 of these the event B occurs, so $P(B \mid A) = \frac{3}{7}$.

(b) $P(\text{at least one die lands on a 3} \mid \text{score is even}) = P(A \mid B)$.

Using $P(A \mid B) \cdot P(B) = P(B \mid A) \cdot P(A)$

we have $P(A \mid B) \cdot \frac{8}{16} = \left(\frac{3}{7}\right)\left(\frac{7}{16}\right)$

$$P(A \mid B) = \frac{3}{8}$$

Therefore the probability that at least one die lands on a 3, given that the score is even, is $\frac{3}{8}$.

NOTE: the possibility space has been reduced to the 8 sample points in B. For three of these the event A occurs, so $P(A \mid B) = \frac{3}{8}$.
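Both conditional probabilities in Example 2.12, and the relation of Result 7 that links them, can be checked by enumerating the 16 sample points. The sketch below assumes the outcomes are written as (first die, second die) pairs; the helper `prob` with an optional `given` argument is an illustrative device.

```python
from fractions import Fraction
from itertools import product

# 16 equally likely outcomes for two tetrahedral dice.
space = set(product(range(1, 5), repeat=2))

A = {(x, y) for (x, y) in space if 3 in (x, y)}         # at least one die lands on a 3
B = {(x, y) for (x, y) in space if (x + y) % 2 == 0}    # the score is even

def prob(event, given=None):
    """P(event) or, if 'given' is supplied, P(event | given), by counting."""
    reduced = space if given is None else given
    return Fraction(len(event & reduced), len(reduced))

print(prob(B, given=A), prob(A, given=B))

# Result 7: P(A|B).P(B) = P(B|A).P(A), both sides being P(A and B).
assert prob(A, given=B) * prob(B) == prob(B, given=A) * prob(A) == prob(A & B)
```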

Example 2.13 A bag contains 10 counters, of which 7 are green and 3 are white. A counter is picked at random from the bag and its colour is noted. The counter is not replaced. A second counter is then picked out. Find the probability that (a) the first counter is green, (b) the first counter is green and the second counter is white, (c) the counters are of different colours.

Solution 2.13 (a) Let $G_1$ be the event ‘the first counter is green’.

$P(G_1) = \frac{7}{10}$ (as there are 10 counters, of which 7 are green)

The probability that the first counter is green is $\frac{7}{10}$.

(b) Let $W_2$ be the event ‘the second counter picked is white’. Now

$P(W_2 \mid G_1) = \frac{3}{9} = \frac{1}{3}$ (as there are 9 counters in the bag, of which 3 are white)

We require

$$P(W_2 \cap G_1) = P(W_2 \mid G_1) \cdot P(G_1) = \frac{1}{3} \cdot \frac{7}{10} = \frac{7}{30}$$

The probability that the first counter is green and the second counter is white is $\frac{7}{30}$.

(c) With obvious notation, we require $P(W_2 \cap G_1) + P(G_2 \cap W_1)$.

Now $P(W_1) = \frac{3}{10}$ and $P(G_2 \mid W_1) = \frac{7}{9}$. Therefore

$$P(G_2 \cap W_1) = P(G_2 \mid W_1) \cdot P(W_1) = \frac{7}{9} \cdot \frac{3}{10} = \frac{7}{30}$$

So $P(W_2 \cap G_1) + P(G_2 \cap W_1) = \frac{7}{30} + \frac{7}{30} = \frac{7}{15}$

Therefore the probability that the counters are different colours is $\frac{7}{15}$.
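Sampling without replacement can be checked by listing every ordered pair of distinct counters, each pair being equally likely. The following sketch assumes the bag of Example 2.13 holds 7 green and 3 white counters; the names `draws`, `colour` and `prob` are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# Counters 0-6 are green, counters 7-9 are white.
colours = ["G"] * 7 + ["W"] * 3

# Every ordered pair of different counters: 10 x 9 = 90 equally likely draws.
draws = list(permutations(range(len(colours)), 2))

def colour(counter):
    return colours[counter]

def prob(condition):
    """Proportion of ordered draws (first, second) satisfying the condition."""
    favourable = sum(1 for draw in draws if condition(draw))
    return Fraction(favourable, len(draws))

first_green = prob(lambda d: colour(d[0]) == "G")
green_then_white = prob(lambda d: colour(d[0]) == "G" and colour(d[1]) == "W")
different = prob(lambda d: colour(d[0]) != colour(d[1]))

print(first_green, green_then_white, different)
```

The counts agree with the products of conditional probabilities used in the solution, for example 7 x 3 = 21 favourable draws out of 90 for green then white.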

Exercise 2c

1. A card is picked at random from a pack of 20 cards numbered 1, 2, 3, ..., 20. Given that the card shows an even number, find the probability that it is a multiple of 4.

2. If $P(A \mid B)$ = 2, $P(B)$ = }, $P(A)$ = 3, find (a) $P(B \mid A)$, (b) $P(A \cap B)$.

3. Two digits are chosen at random from a table of random numbers containing the digits 0, 1, 2, ..., 9. Find the probability that (a) the sum of the two numbers is greater than 9, given that the first number is 3, (b) the second number is 2, given that the sum of the two numbers is greater than 7, (c) the first number is 4, given that the difference between the two numbers is 4.

4. A bag contains 4 red counters and 6 black counters. A counter is picked at random from the bag and not replaced. A second counter is then picked. Find the probability that (a) the second counter is red, given that the first counter is red, (b) both counters are red, (c) the counters are of different colours.

5. Two cards are drawn successively from an ordinary pack of 52 playing cards and kept out of the pack. Find the probability that (a) both cards are hearts, (b) the first card is a heart and the second card is a spade, (c) the second card is a diamond, given that the first card is a club.

6. X and Y are two events such that $P(X)$ = z, $P(X \mid Y)$ = 3 and $P(Y \mid X)$ = 2. Find (a) $P(X \cap Y)$, (b) $P(Y)$, (c) $P(X \cup Y)$.

7. A box contains two yellow and two black tickets numbered 1 and 2. Two tickets are drawn from the box. Indicate the sample space by listing all possible pairs of results. What is the probability that both tickets drawn will be yellow, (a) if nothing is known about either of them, (b) if one is known to be yellow, (c) if one is known to be yellow ticket numbered 1? (SUJB Additional)

8. A number is picked at random from the digits 1, 2, ..., 9. Given that the number is a multiple of 3, find the probability that the number is (a) even, (b) a multiple of 4.

9. Two tetrahedral dice are thrown; one is red and the other is blue. The number on which each lands is noted, the faces being marked 1, 2, 3 and 4. Find the probability that (a) the sum of the numbers on which the dice land is 6 given that the red die lands on an odd number, (b) the blue die lands on a 2 or a 3, given that the red die lands on a 2.

10. If events A and B are such that $P(A)$ = 5, $P(B)$ = # and $P(A \mid B) = 0$. (a) Find $P(A \cup B)$. (b) Are events A and B exhaustive? (Give a reason.)

11. A and B are two events such that $P(A)$ = 4, $P(B)$ = 3 and $P(A \cap B)$ =. Are A and B ee events?

12. A and B are Sree events and it is known that $P(A \mid B)$ = i and $P(B)$ = z. Find $P(A)$.

13. Give two examples of events which are both mutually exclusive and exhaustive.

14. Two coins are tossed. A is the event ‘at least one head is obtained’. Describe an event B such that A and B are exhaustive events.

15. If
ℰ = {(x, y): x and y are positive integers}
A = {(x, y): 2 < x < 5 and 1 < y < 4}
B = {(x, y): 44 y = 5}
C = {(x, y): y = 2}
Find the probability that a member of A chosen at random will also be a member of (a) B, (b) C, (c) $B \cap C$, (d) $B \cup C$.

INDEPENDENT EVENTS

Result 8 If the occurrence or non-occurrence of an event A does not influence in any way the probability of an event B, then event B is independent of event A and $P(B \mid A) = P(B)$.

If events A and B are independent, then $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$

Now $P(A \cap B) = P(A \mid B) \cdot P(B)$

Therefore

$$P(A \cap B) = P(A) \cdot P(B)$$

This is known as the multiplication law for independent events.
NOTE: If two events are mutually exclusive, $P(A \cap B) = 0$. So for two events to be both independent and mutually exclusive we must have $P(A) \cdot P(B) = 0$. This is possible only if either $P(A) = 0$ or $P(B) = 0$.

Example 2.14 A die is thrown twice. Find the probability of obtaining a 4 on the
first throw and an odd number on the second throw.

Solution 2.14 Let A be the event ‘a 4 is obtained on the first throw’, then P(A) = i
Let B be the event ‘an odd number is obtained on the second throw’.

Now the result on the second throw is not affected in any way by
. the result on the first throw. Therefore A and B are independent
events and P(B) = Sicyt
62k
As A and B are independent events
P(ANB) = P(A)-P(B)

= (@)
ee aly al

= li
a2
The probability that the first throw results in a 4 and the second
throw results in an odd number is os

Example 2.15 A bag contains 5 red counters and 7 black counters. A counter is
drawn from the bag, the colour is noted and the counter is replaced.
A second counter is then drawn. Find the probability that the first
counter is red and the second counter is black.

Solution 2.15 Let R, be the event ‘the first counter is red’.


5
Then (R,) = =—
P(R,;)

Let B, be the event ‘the second counter is black’.


Now, as the first counter is replaced before the second draw is made
R, and B, are independent events.
96 A CONCISE COURSE IN A-LEVEL STATISTICS

7
Now P(B2) = 12

and P(R,NB,) = P(R,)-P(B2)

Pica
35
5 7

~ 144
The probability that the first counter is red and the second counter
is black is =.

Example 2.16 A fair die is thrown twice. Find the probability that (a) neither
throw results in a 4, (b) at least one throw results in a 4.

Solution 2.16 Let A be the event ‘the number on the first throw is 4’.
Let B be the event ‘the number on the second throw is 4’.

Now P(A) = zsso P(A) = :where A is the event ‘the number on the
first throw is nota 4’.

Similarly P(B) = 2.
NOTE: Aand Bare independent events.

(a) P(neither throw results in a 4) P(ANMB)


= P(A)-P(B)

-A 5\/5

25
36
The probability that neither throw results in a 4 is 2.

(b) P(at least one throw results in a 4) = 1 — P(neither results in a 4)

oh ee ES
36

ou
36
The probability that at least one throw results in a4 isae
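The ‘at least one’ argument of Example 2.16 can be confirmed by counting over the 36 equally likely pairs of throws and comparing with the product given by the multiplication law. This is a sketch only; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely results of throwing a fair die twice.
throws = list(product(range(1, 7), repeat=2))

neither_four = Fraction(
    sum(1 for first, second in throws if first != 4 and second != 4),
    len(throws),
)
at_least_one_four = 1 - neither_four

print(neither_four, at_least_one_four)

# Multiplication law for the independent events 'not a 4' on each throw.
assert neither_four == Fraction(5, 6) * Fraction(5, 6)
```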

Example 2.17 Events A and B are such that $P(A) = \frac{1}{3}$ and $P(A \cap B) = \frac{1}{12}$. If A and B are independent events, find (a) $P(B)$, (b) $P(A \cup B)$.

Solution 2.17 (a) As A and B are independent events $P(A \cap B) = P(A) \cdot P(B)$

so $\frac{1}{12} = \frac{1}{3} P(B)$

Hence $P(B) = \frac{1}{4}$

(b) Now $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

so $P(A \cup B) = \frac{1}{3} + \frac{1}{4} - \frac{1}{12} = \frac{1}{2}$

Therefore $P(B) = \frac{1}{4}$ and $P(A \cup B) = \frac{1}{2}$.

Example 2.18 Two events A and B are such that P(A)= i P(A|B) = 5and
P(B|A) =
(a) Are A and B independent events? (b) Are A and B mutually
exclusive events? (c) Find P(AMB). (d) Find P(B).

Solution 2.18 (a) If A and B are eee events then P(A|B) = P(A).
Now P(A|B)= 5and P(A)=
Therefore $P(A \mid B) \neq P(A)$ and A and B are not independent events.

(b) If A and B are mutually exclusive events then $P(A \mid B) = 0$.


But we are given that P(A|B) = 5
Therefore A and B are not mutually exclusive events.

(c) Now P(ANMB) P(B|A)-P(A)

- (NC
ie
«6
Therefore P(AM B) = &.

(d) Now P(A|B)-P(B) = P(B|A)-P(A)

sO ra
IG
$P(B) = ie

P(B)
ees(Bt= os

Example 2.19 The probability that a certain type of machine will break down in the first month of operation is 0.1. If a firm has two such machines which are installed at the same time, find the probability that, at the end of the first month, just one has broken down.

Solution 2.19 We assume that the performances of the two machines are independent.
Let A be the event ‘machine 1 breaks down’ and let B be the event ‘machine 2 breaks down’.

Then, if just one machine breaks down, either machine 1 breaks down and machine 2 is still working, or machine 2 breaks down and machine 1 is still working. Therefore we require

$$P(A \cap \bar{B}) + P(\bar{A} \cap B) = P(A) \cdot P(\bar{B}) + P(\bar{A}) \cdot P(B) = (0.1)(0.9) + (0.9)(0.1) = 0.18$$

NOTE: A and $\bar{B}$ are independent events, as are $\bar{A}$ and B.
Therefore the probability that after 1 month just one machine has broken down is 0.18.
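The ‘just one occurs’ calculation has the same shape for any two independent events, so it can be written once as a small function. The sketch below is illustrative: the function name `exactly_one` is an assumption, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def exactly_one(p_a, p_b):
    """P(exactly one of two independent events occurs).

    Adds the two mutually exclusive cases: A occurs and B does not,
    or B occurs and A does not.
    """
    return p_a * (1 - p_b) + (1 - p_a) * p_b

# Two machines, each breaking down with probability 0.1 in the first month.
p_breakdown = Fraction(1, 10)
p_just_one = exactly_one(p_breakdown, p_breakdown)

print(p_just_one, float(p_just_one))
```

Together with the probabilities that both break down and that neither breaks down, the three cases account for every outcome and so sum to 1.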

Exercise 2d

1. A die is thrown twice. Find the probability of obtaining a number less than 3 on both throws.

2. A card is picked from a pack containing 52 playing cards. It is then replaced and a second card is picked. Find the probability that (a) both cards are the seven of diamonds, (b) the first card is a heart and the second card is a spade, (c) one card is from a black suit and the other is from a red suit, (d) at least one card is a queen.

3. A coin is tossed and a die is thrown. What is the probability of obtaining a head on the coin and an even number on the die?

4. Two men fire at a target. The probability that Alan hits the target is 5 and the probability that Bob does not hit the target is . Alan fires at the target first, then Bob fires at the target. Find the probability that (a) both Alan and Bob hit the target, (b) only one hits the target, (c) neither hits the target.

5. The probability that I am late for work is 0.05. Find the probability that, on two consecutive mornings, (a) I am late for work twice, (b) I am late for work once.

6. Events A and B are such that $P(A)$ = 2 and $P(B)$ = 4. If A and B are independent events, find (a) $P(A \cap B)$, (b) P(A OB), (c) PAN B).

7. If events A and B are such that they are independent and $P(A) = 0.3$, $P(B) = 0.5$, find (a) $P(A \cap B)$, (b) $P(A \cup B)$. Are events A and B mutually exclusive?

8. Events A and B are such that $P(A)$ = 2, $P(A \mid B)$ = 3, $P(B)$ = §. Find (a) $P(B \mid A)$, (b) $P(A \cap B)$.

9. In a group of 120 girls, each is either freckled or blonde or both; 80 are freckled and 60 are blonde. A girl is to be chosen at random from the group. A is the event ‘a freckled girl is chosen’ and B is the event ‘a blonde girl is chosen’. (a) Calculate $P(A \cap B)$. (b) State, giving a reason, if you think A and B are independent events. (L Additional)

10. A and B are independent events and $P(A)$ = L, $P(B)$ = 3. Find the probability that (a) both A and B occur, (b) only one occurs.

11. The probability that I have to wait at the traffic lights on my way to school is a. Find the probability that, on two consecutive mornings, I have to wait on at least one morning.

Result 9

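For events A and B we have $P(B) = P(B \cap A) + P(B \cap \bar{A})$.
Illustrating this by means of the Venn diagram:

$$P(B) = \frac{n(B)}{n(S)} = \frac{n(B \cap A)}{n} + \frac{n(B \cap \bar{A})}{n} = P(B \cap A) + P(B \cap \bar{A})$$

This result is often written

$$P(B) = P(B \mid A) \cdot P(A) + P(B \mid \bar{A}) \cdot P(\bar{A})$$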
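Result 9 translates directly into a short calculation: weight each conditional probability by the probability of its condition and add. The sketch below uses made-up values chosen purely for illustration, $P(A) = \frac{1}{4}$, $P(B \mid A) = \frac{1}{2}$ and $P(B \mid \bar{A}) = \frac{1}{3}$; they are assumptions and are not taken from any example in the text. The function name `total_probability` is likewise illustrative.

```python
from fractions import Fraction

def total_probability(p_a, p_b_given_a, p_b_given_not_a):
    """P(B) = P(B|A).P(A) + P(B|not A).P(not A), with P(not A) = 1 - P(A)."""
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Illustrative (assumed) values, not taken from the text.
p_a = Fraction(1, 4)
p_b = total_probability(p_a, Fraction(1, 2), Fraction(1, 3))

# 1/2 x 1/4 + 1/3 x 3/4 = 1/8 + 1/4
print(p_b)
```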

Example 2.20 The probability that it will be sunny tomorrow is 7 If it is sunny,


the probability that Susan plays tennis is :. If it is not sunny,
the probability that Susan plays tennis is 2.Find the probability
that Susan plays tennis tomorrow.

Solution 2.20 Let A be the event ‘it is sunny tomorrow’ and let B be the event
‘Susan plays tennis tomorrow’.
Then $\bar{A}$ is the event ‘it is not sunny tomorrow’.
P(A) = § and P(A) = 2; also P(B|A) =# and P(B| A) = 2.

We require $P(B) = P(B \mid A) \cdot P(A) + P(B \mid \bar{A}) \cdot P(\bar{A})$

Therefore the probability that Susan plays tennis tomorrow is ie



Example 2.21 If events A and B are independent, show that events $\bar{A}$ and B are independent.

Solution 2.21 Now $P(B) = P(B \cap A) + P(B \cap \bar{A})$

so

$$P(B \cap \bar{A}) = P(B) - P(B \cap A) = P(B) - P(B) \cdot P(A) = P(B)[1 - P(A)] = P(B) \cdot P(\bar{A})$$

where the second step holds as A and B are independent.

Therefore $P(B \cap \bar{A}) = P(B) \cdot P(\bar{A})$ and so $\bar{A}$ and B are independent.

Exercise 2e

1. A bag contains 6 white counters and 4 blue counters. A counter is drawn, its colour is noted and it is not put back into the bag. A second counter is then drawn. Find the probability that the second counter drawn is blue.

2. In a restaurant 40% of the customers choose steak for their main course. If a customer chooses steak, the probability that he will choose ice cream to follow is 0.6. If he does not have steak, the probability that he will choose ice cream is 0.3. Find the probability that a customer picked at random will choose (a) steak and ice cream, (b) ice cream.

3. Events C and D are such that $P(C)$ = 4, P(CND) = . $P(C \mid D)$ = 1% Find (a) (COD), (b) $P(D)$, (c) $P(D \mid C)$.

4. Exactly 60% of the members of a form are boys, and 90% of these boys and 75% of the girls each buy one raffle ticket, the rest buying none. Calculate the probability that the winning ticket will be bought by a boy. (L Additional)

5. $P(X)$ = 5 and $P(Y)$ = Z. Given that X and Y are mutually exclusive, find (a) $P(X \cup Y)$, (b) P(Y MX).

6. It is estimated that one-quarter of the drivers on the road between 11 p.m. and midnight have been drinking during the evening. If a driver has not been drinking, the probability that he will have an accident at that time of night is 0.004%; if he has been drinking, the probability of an accident goes up to 0.02%. What is the probability that a car selected at random at that time of night will have an accident? A policeman on the beat at 11.30 p.m. sees a car run into a lamp-post, and jumps to the conclusion that the driver has been drinking. What is the probability that he is right? (SMP)

SUMMARY — PROBABILITY LAWS

For a finite possibility space S with equally likely outcomes, and a subset E of S,

$$P(E) = \frac{n(E)}{n(S)}$$

$$0 \leq P(E) \leq 1$$

$$P(S) = 1$$

$$P(E) + P(\bar{E}) = 1$$

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

If A and B are exhaustive, then $P(A \cup B) = 1$

If A and B are mutually exclusive, then $P(A \cap B) = 0$ and $P(A \cup B) = P(A) + P(B)$ (addition law for mutually exclusive events)

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \qquad P(B \mid A) = \frac{P(B \cap A)}{P(A)}$$

i.e. $P(A \cap B) = P(A \mid B) \cdot P(B)$ and $P(B \cap A) = P(B \mid A) \cdot P(A)$

so that $P(A \mid B) \cdot P(B) = P(B \mid A) \cdot P(A)$

If A and B are independent, $P(A \mid B) = P(A)$ and $P(A \cap B) = P(A) \cdot P(B)$ (multiplication law for independent events)

If A and B are mutually exclusive, $P(A \cap B) = 0$ so that $P(A \mid B) = 0$

If $P(A) \neq 0$ and $P(B) \neq 0$ then events A and B cannot be both independent and mutually exclusive, as independence would give $P(A \cap B) = P(A) \cdot P(B) \neq 0$

$$P(A) = P(A \cap B) + P(A \cap \bar{B})$$

or $P(A) = P(A \mid B) \cdot P(B) + P(A \mid \bar{B}) \cdot P(\bar{B})$
Miscellaneous Exercise 2f

1. Bag A contains 5 red and 4 white counters. Bag B contains 6 red and 3 white counters. A counter is picked at random from bag A and placed in bag B. A counter is now picked from bag B. Find the probability that this counter is white.

2. In a set of 28 dominoes each domino has from 0 to 6 spots at each end. Each domino is different from every other and the ends are indistinguishable so that, for example, the two diagrams in the figure represent the same domino. A domino which has no spots at all or the same number of spots at each end is called a ‘double’. A domino is drawn from the set. Let the event A be ‘The domino is a double’, event B be ‘The sum of the spots is 6’ and event C be ‘The number of spots at each end differ by more than 3’. On graph paper draw a diagram to represent the possibility space with, for example, the point (1, 2) representing the selection of the domino shown in the figure. On your diagram mark clearly the set of elements associated with each of the events A, B and C. Using your diagram find the probability that (a) both A and B occur, (b) both A and C occur, (c) both B and C occur. State a pair of events which are independent and also a pair which are mutually exclusive. Find the probability that A occurs and B does not occur. (L Additional)

3. The probability that a person in a particular evening class is left-handed is z. From the class of 15 women and 5 men a person is chosen at random. Assuming that ‘left-handedness’ is independent of the sex of a person, find the probability that the person chosen is a man or is left handed.

4. Two events A and B are such that $P(A) = 0.2$, $P(A' \cap B) = 0.22$, $P(A \cap B) = 0.18$. Evaluate (a) $P(A \cap B')$, (b) $P(A \mid B)$. (JMB)
(NOTE: $B'$ is the event ‘B does not occur’.)

5. In a group of 100 people, 40 own a cat, 25 own a dog and 15 own a cat and a dog. Find the probability that a person chosen at random (a) owns a dog or a cat, (b) owns a dog or a cat, but not both, (c) owns a dog, given that he owns a cat, (d) does not own a cat, given that he owns a dog.

6. The two events A and B are such that $P(A) = 0.6$, $P(B) = 0.2$, $P(A \mid B) = 0.1$. Calculate the probabilities that (i) both of the events occur, (ii) at least one of the events occur, (iii) exactly one of the events occurs, (iv) B occurs, given that A has occurred. (JMB)

7. Of a group of pupils studying at A-level in schools in a certain area, 56% are boys and 44% are girls. The probability that a boy of this group is studying Chemistry is : and the probability that a girl of this group is studying Chemistry is i.
(a) Find the probability that a pupil selected at random from this group is a girl studying Chemistry.
(b) Find the probability that a pupil selected at random from this group is not studying Chemistry.
(c) Find the probability that a Chemistry pupil selected at random from this group is male.
(You may leave your answers as fractions in their lowest terms.) (O & C)

8. At a fête the vicar has a board in the shape of a circle, having sectors coloured red and green, with an arrow which can be spun above it: you have to try to guess the colour on which the arrow will come to rest when it is next spun. It is made so that the results of successive spins are independent, and

P(the arrow rests on red) = 0.6

Find the probability of guessing correctly
(i) if you always guess ‘green’;
(ii) if you toss a fair coin and guess ‘green’ if it comes down ‘head’ and ‘red’ otherwise;
(iii) if your guess is always the colour the arrow is resting on before the spin. (SMP)

9. Two soldiers, Alan and Bill, are shooting at a target with independent probabilities of 2 and & respectively of hitting the bull with a single shot. If they each fire two shots, copy and complete the tables which show the possible outcomes, together with their associated probabilities.

Alan’s two shots at the target
Number of bulls
Probability

Bill’s two shots at the target
Number of bulls
Probability

Calculate the probability that
(a) Alan records two bulls and Bill records two misses,
(b) either Alan records two bulls or Bill records two misses with the two events not occurring simultaneously,
(c) the soldiers record two bulls and two misses between them. (L Additional)

EXTENSION OF RESULTS TO MORE THAN TWO EVENTS


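The result $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ can be extended for three events A, B and C:

$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(C \cap A) + P(A \cap B \cap C)$$

To illustrate this, consider the Venn diagram with the number of elements in each part as shown:

[Venn diagram of A, B and C; the shaded area shows $A \cup B \cup C$.]

Writing each probability as a number of elements divided by $n = n(S)$, every part of the shaded area is counted exactly once on the right hand side, so

right hand side $= P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(C \cap A) + P(A \cap B \cap C)$

$= \frac{n(A) + n(B) + n(C) - n(A \cap B) - n(B \cap C) - n(C \cap A) + n(A \cap B \cap C)}{n}$

$= \frac{n(A \cup B \cup C)}{n}$

$= P(A \cup B \cup C)$

= left hand side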

Example 2.22 In the Good Grub Restaurant customers may (if they wish) order
any combination of chips, peas and salad to accompany the main
course. The probability that a customer chooses salad is 0.45,
peas and chips 0.19, salad and peas 0.15, salad and chips 0:25, salad
or peas 0.6, salad or chips 0.84, salad or chips or peas 0.9. Find the
probability that a customer chooses (a) peas, (b) chips, (c) all
three, (d) none of these.

Solution 2.22 Let A be the event ‘salad is chosen’, E the event ‘peas are chosen’
and C the event ‘chips are chosen’.
Then
P(A) = 0.45, P(ENC) = 0.19, P(ANE) = 0.15,
P(ANC) = 0.25, P(AVE) = 0.6, P(AUC) = 0.84,
P(AUEUC) = 0.9

(a) We require P(peas are chosen) = P(E).


Now P(AUE) = P(A)+P(E)—P(ANE)
so 0.6 = 0.45+P(E)—0.15
P(E) = 0.3
The probability that a customer chooses peas is 0.3.
104 A CONCISE COURSE IN A-LEVEL STATISTICS

(b) We require P(C).


Now P(AUC) = P(A) +P(C)—P(ANC)
50 0.84 = 0.45+P(C)—0.25
P(C) = 0.64
The probability that a customer chooses chips is 0.64.

(c) We require PAN EMC).


Now
P(AUEUC) = P(A)+P(£E)+ P(C)—P(ANE)—-P(ENC)
—P(CNA)+P(AN ENC)
so 0.9 0.45 +.0.3+ 0.64—0.15—0.19—0.25
+P(ANENC)
P(ANENC) 0.1
The probability that a customer chooses all three is 0.1.

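(d) Writing A′, E′ and C′ for the events that salad, peas and chips
respectively are not chosen,

P(customer chooses none) = P(A′ ∩ E′ ∩ C′)
                         = 1 − P(A ∪ E ∪ C)
                         = 1 − 0.9
                         = 0.1

[Venn diagram: the region outside A ∪ E ∪ C represents A′ ∩ E′ ∩ C′]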

The probability that a customer chooses none of these is 0.1.
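
The arithmetic of Solution 2.22 can be retraced in a few lines. This is a sketch only; the variable names are illustrative, and exact fractions are used so that no rounding error creeps in.

```python
from fractions import Fraction

# Data given in Example 2.22 (A = salad, E = peas, C = chips).
p_A = Fraction("0.45")
p_E_and_C = Fraction("0.19")
p_A_and_E = Fraction("0.15")
p_A_and_C = Fraction("0.25")
p_A_or_E = Fraction("0.6")
p_A_or_C = Fraction("0.84")
p_A_or_E_or_C = Fraction("0.9")

# (a), (b): rearrange the two-event addition rule.
p_E = p_A_or_E - p_A + p_A_and_E
p_C = p_A_or_C - p_A + p_A_and_C

# (c): rearrange the three-event addition rule.
p_all_three = (p_A_or_E_or_C - (p_A + p_E + p_C)
               + (p_A_and_E + p_E_and_C + p_A_and_C))

# (d): complement of the union.
p_none = 1 - p_A_or_E_or_C

print(float(p_E), float(p_C), float(p_all_three), float(p_none))
```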

Mutually exclusive events

If events A, B and C are mutually exclusive, then


P(A ∪ B ∪ C) = P(A) + P(B) + P(C)

This can be extended to any number of mutually exclusive events:


P(A₁ ∪ A₂ ∪ ... ∪ Aₙ) = P(A₁) + P(A₂) + ... + P(Aₙ)

Example 2.23 Records in a music shop are classed in the following sections:
classical, popular, rock, folk and jazz. The respective probabilities
that a customer buying a record will choose from each section are
0.3, 0.4, 0.2, 0.05 and 0.05. Find the probability that a person
(a) will choose a record from the classical or the folk or the jazz
sections, (b) will not choose a record from the rock or folk or
classical sections.

Solution 2.23 A record cannot be classed in more than one section, so the events
are mutually exclusive.
(a) P(classical or folk or jazz) = P(classical) + P(folk) + P(jazz)
= 0.3 + 0.05 + 0.05
= 0.4
The probability that the record will be classical or folk or jazz is 0.4.

(b) P(rock or folk or classical) = P(rock) + P(folk) + P(classical)


= 0.2+0.05+0.3
= 0.55
P(not rock nor folk nor classical) = 1—0.55
= 0.45
Therefore the probability that the record is not rock nor folk nor
classical is 0.45.

The result P(A ∩ B) = P(A)·P(B|A) can be extended for three
events A, B and C as follows:
P(A ∩ B ∩ C) = P[(A ∩ B) ∩ C]
             = P(A ∩ B)·P[C|(A ∩ B)]
             = P(A)·P(B|A)·P(C|A ∩ B)
So P(A ∩ B ∩ C) = P(A)·P(B|A)·P(C|A ∩ B)

Example 2.24 A bag of sweets contains 4 red ‘fruities’ and 5 green ones. A child
picks out 3 fruities one after the other and eats them. Find the
probability that the first is red, the second is green and the third
is red.

Solution 2.24 Let R₁ be the event ‘the first fruitie is red’, G₂ be the event ‘the
second fruitie is green’, R₃ be the event ‘the third fruitie is red’.
We require P(R₁ ∩ G₂ ∩ R₃) = P(R₁)·P(G₂|R₁)·P(R₃|G₂ ∩ R₁).

Now P(R₁) = 4/9

and P(G₂|R₁) = 5/8 as there are now 8 fruities, of which 5 are green

P(R₃|G₂ ∩ R₁) = 3/7 as there are now 7 fruities, of which 3 are red

So P(R₁ ∩ G₂ ∩ R₃) = (4/9)(5/8)(3/7)
                   = 5/42
Therefore the probability that the first is red, the second is green
and the third is red is 5/42.
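
As a check on the multiplication rule used in Solution 2.24, the sketch below lists every ordered way of drawing three of the nine sweets and counts those that come out red, green, red. The labelling of the sweets is an assumption made for the enumeration.

```python
from fractions import Fraction
from itertools import permutations

# 4 red and 5 green sweets; positions 0-3 are red, 4-8 are green.
sweets = ["R"] * 4 + ["G"] * 5

# Every ordered draw of 3 distinct sweets is equally likely.
draws = list(permutations(range(len(sweets)), 3))
favourable = sum(
    1 for draw in draws if [sweets[i] for i in draw] == ["R", "G", "R"]
)
p_by_counting = Fraction(favourable, len(draws))

# The same probability from P(R1) * P(G2 | R1) * P(R3 | G2 and R1).
p_by_rule = Fraction(4, 9) * Fraction(5, 8) * Fraction(3, 7)

print(p_by_counting, p_by_rule)
```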

Independent events

If A, B and C are independent events, then


P(A ∩ B ∩ C) = P(A)·P(B)·P(C).

This result can be extended to n independent events so that


P(A₁ ∩ A₂ ∩ ... ∩ Aₙ) = P(A₁)·P(A₂)·...·P(Aₙ)

Example 2.25 A die is thrown four times. Find the probability that a 5 is obtained
each time.

Solution 2.25 Let 5₁ be the event ‘a 5 is obtained on the first throw’, 5₂ be the
event ‘a 5 is obtained on the second throw’ and so on.
The events are independent, so
P(5₁ ∩ 5₂ ∩ 5₃ ∩ 5₄) = P(5₁)·P(5₂)·P(5₃)·P(5₄)
                     = (1/6)(1/6)(1/6)(1/6)
                     = 1/1296
Therefore the probability that a 5 is obtained each time is 1/1296.

Example 2.26 Three men in an office decide to enter a marathon race. The respec-
tive probabilities that they will complete the marathon are 0.9, 0.7
and 0.6. Find the probability that at least two will complete the
marathon. Assume that the performance of each is independent of
the performances of the others.

Solution 2.26 Let A be the event ‘the first man completes the marathon’, then
P(A) = 0.9.

Let B be the event ‘the second man completes the marathon’, then
P(B) = 0.7.
Let C be the event ‘the third man completes the marathon’, then
P(C) = 0.6.
P(all complete the marathon) = P(A ∩ B ∩ C). We will abbreviate
P(A ∩ B ∩ C) as P(ABC), and write A′ for the event ‘the first man does
not complete the marathon’, with B′ and C′ defined similarly. Then
P(all complete the marathon) = P(ABC)
= P(A)·P(B)·P(C) (independent events)
= (0.9)(0.7)(0.6)
= 0.378
P(two out of the three complete the marathon) = P(ABC′) + P(AB′C) + P(A′BC)
= (0.9)(0.7)(0.4)
+ (0.9)(0.3)(0.6)
+ (0.1)(0.7)(0.6)
= 0.456
P(at least two complete the marathon) = 0.378 + 0.456
= 0.834
Therefore the probability that at least two complete the marathon
is 0.834.
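
The result of Solution 2.26 can be confirmed by listing all eight finish/fail combinations for the three men and adding the probabilities of those in which at least two finish. This is a sketch; the names used are illustrative.

```python
from itertools import product

# P(complete) for the three men, as given in Example 2.26.
p_complete = [0.9, 0.7, 0.6]

p_at_least_two = 0.0
for outcome in product([True, False], repeat=3):
    if sum(outcome) >= 2:  # at least two of the three complete
        prob = 1.0
        # Independence: multiply the individual probabilities.
        for finished, p in zip(outcome, p_complete):
            prob *= p if finished else 1 - p
        p_at_least_two += prob

print(round(p_at_least_two, 3))
```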

Exercise 2g

Three cards are drawn from a pack containing 52 playing cards. Find the
probability that they are a heart, club and spade, in that order, if
(a) the card is looked at and then replaced after each draw, (b) the
card is not replaced after each draw.

A die is thrown three times. What is the probability of scoring a 2 on
just one occasion?

A coin is tossed four times. Find the probability of obtaining less than
two heads.

A box contains 4 black, 6 white and 2 red balls. Balls are picked out of
the box without replacement. With obvious notation, find
(a) P(B₁ ∩ W₂), (b) P(W₂), (c) P(B₁ ∪ W₂), (d) P(B₁ ∩ W₂ ∩ R₃),
(e) P(the first three are different colours).

Of 24 boys in a class, 8 play rugby, 6 play hockey and 13 play soccer.
One boy plays both soccer and rugby. Every boy plays at least one game
but not one plays all three games. Two boys play both hockey and rugby.
A boy is to be picked at random from the group.
(a) Draw a possibility space diagram to illustrate the situation.
(b) Calculate the probability of a boy being selected who (i) only plays
hockey, (ii) plays both hockey and soccer.
(c) If S is the event ‘a boy is chosen who plays soccer’, H is the event
‘a boy is chosen who plays hockey’ and R is the event ‘a boy is chosen
who plays rugby’, state, giving reasons for your answers, (i) two events
which are independent, (ii) two events which are mutually exclusive.
(L Additional)

In a lucky drawafirst prize is given, then a (b) An athlete aims to measure his fitness
second, then a third prize. 8 boys and 4 by subjecting himself to a sequence of 3
girls each buy one ticket. (a) Find the physical tests, the completion of each
probability that (i) a girl has the first test in a specified time being classed by
prize, a boy the second andagirl the third, him as a ‘pass’. The probability that he
(ii) the prizes go to 3 boys. (6) If the first passes the first test in the sequence is p,
and third prizes go to members of one but the probability of passing any subse-
sex and the second to a member of the quent test is half the probability of passing
opposite sex, find the probability that the the immediately preceding test. Show that,
second prize goes to a boy. if the probability of passing all 3 tests is
se the value of p is ‘ Hence find the
probabilities (i) that he fails all the tests,
Three fair cubical dice are thrown. Find (ii) that he passes exactly 2 of the 3 tests.
the probability that (i) the sum of the (MEI)
scores is 18, (ii) the sum of the scores is
A, B and C are three events and, for
5, (iii) none of the three dice shows a
example, AUB denotes the event that
6, (iv) the product of the scores is 90.
‘either A or B or both A and B occur,
(AEB 1974)
AB denotes the event that both A and B
occur, A is the event complementary to the
event A. Find Pr(AN BMC) given that
(a) A, B and C represent 3 events. If 11
A(MB is the event that both A and B occur PHA) = Pie 9 OC yaa19
and P(B|A) is the probability that B occurs
given that A has already occurred, show
PANE) EBay
that 16 16
P(ANB) = P(A)-P(BIA)
Deduce, or show otherwise, that
Pr(BNC) = a Pr(AIBNC) = .
P(ANBNC) = P(A)-P(B|A)-P(C| ANB) (MEI)

PROBABILITY TREES

A useful way of tackling many probability problems is to draw a


‘probability tree’. The method is illustrated in the following
examples.

Example 2.27 A bag contains 8 white counters and 3 black counters. Two counters
are drawn, one after the other. Find the probability of drawing one
white and one black counter, in any order, (a) if the first counter is
replaced, (b) if the first counter is not replaced.

Solution 2.27 (a) With replacement Let

W₁ be the event ‘a white counter is drawn first’,
W₂ be the event ‘a white counter is drawn second’,
B₁ be the event ‘a black counter is drawn first’,
B₂ be the event ‘a black counter is drawn second’.
The results of the first draw and the second draw are shown on the
‘tree’ diagram. As the counter is replaced after the first draw the
events along any one ‘branch’ of the tree are independent.
We have P(W₁ ∩ W₂) = P(W₁)·P(W₂)
We multiply terms as we go along the branch.

[Tree diagram: 1st draw, 2nd draw]

P(W₁ ∩ W₂) = (8/11)(8/11) = 64/121
P(W₁ ∩ B₂) = (8/11)(3/11) = 24/121
P(B₁ ∩ W₂) = (3/11)(8/11) = 24/121
P(B₁ ∩ B₂) = (3/11)(3/11) = 9/121
Total = 1

NOTE: these events are mutually exclusive, so check that the sum
of the probabilities is 1.
P(drawing one white and one black counter) = P(W₁ ∩ B₂) + P(B₁ ∩ W₂)
= 24/121 + 24/121
= 48/121
The probability of drawing one black and one white counter if the
counter is replaced after the first draw is 48/121.
(b) Without replacement The events along one branch are no
longer independent, but we may still multiply as we go along the
branch, using the fact
P(W₁ ∩ W₂) = P(W₁)·P(W₂|W₁) and so on

[Tree diagram: 1st draw, 2nd draw]

P(W₁ ∩ W₂) = (8/11)(7/10) = 56/110
P(W₁ ∩ B₂) = (8/11)(3/10) = 24/110
P(B₁ ∩ W₂) = (3/11)(8/10) = 24/110
P(B₁ ∩ B₂) = (3/11)(2/10) = 6/110
Total = 110/110 = 1

P(drawing one white and one black counter) = P(W₁ ∩ B₂) + P(B₁ ∩ W₂)
= 24/110 + 24/110
= 48/110
= 24/55
The probability of drawing one black and one white counter if the
counter is not replaced after the first draw is 24/55.
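
Both parts of Solution 2.27 follow the same pattern of multiplying along each branch and adding the two suitable branches. The sketch below wraps this in a small function; the function name and its parameters are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction


def one_of_each(white, black, replace):
    """P(one white and one black, in any order) in two draws."""
    total_first = white + black
    # The second draw has one counter fewer if the first is not replaced.
    total_second = total_first if replace else total_first - 1
    # When the first counter is the other colour, the number of counters
    # of the required second colour is unchanged in both cases.
    p_white_then_black = Fraction(white, total_first) * Fraction(black, total_second)
    p_black_then_white = Fraction(black, total_first) * Fraction(white, total_second)
    return p_white_then_black + p_black_then_white


p_with = one_of_each(8, 3, replace=True)
p_without = one_of_each(8, 3, replace=False)
print(p_with, p_without)
```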

NOTE: the diagram can be made simpler if, instead of writing
P(W₂|W₁) = 7/10 on the second branch, we write P(W₂) = 7/10 as the
diagram makes it clear that event W₁ has already occurred. The
diagram then becomes:

[Simplified tree diagram: 1st draw, 2nd draw]

Example 2.28 The probability that a golfer hits the ball on to the green if it is
windy as he strikes the ball is 0.4, and the corresponding probability
if it is not windy as he strikes the ball is 0.7. The probability that
the wind will blow as he strikes the ball is 0.3.
Find the probability that (a) he hits the ball on to the green, (b) it
was not windy, given that he does not hit the ball on to the green.

Solution 2.28 Let W be the event ‘it is windy’ and W′ the event ‘it is not windy’,
then P(W) = 0.3 and P(W′) = 0.7.
Let H be the event ‘he hits the ball on to the green’ and H′ the event
‘he does not hit the ball on to the green’.
Then P(H|W) = 0.4 and P(H|W′) = 0.7.

We can draw a probability tree as follows:

[Tree diagram: first branches ‘windy’, second branches ‘hitting the green’]

P(W ∩ H) = (0.3)(0.4) = 0.12

P(W ∩ H′) = (0.3)(0.6) = 0.18

P(W′ ∩ H) = (0.7)(0.7) = 0.49

P(W′ ∩ H′) = (0.7)(0.3) = 0.21

Total = 1

(a) We require P(H) = P(H ∩ W) + P(H ∩ W′)
= 0.12 + 0.49
= 0.61

The probability that he hits the ball on to the green is 0.61.

(b) We require P(W′|H′) = P(W′ ∩ H′)/P(H′)

Now P(H′) = 1 − P(H)
= 1 − 0.61
= 0.39

So P(W′|H′) = 0.21/0.39
= 0.54 (2 d.p.)

The probability that it was not windy, given that he does not hit
the ball on to the green, is 0.54 (2 d.p.).
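
The two answers in Solution 2.28 can be reproduced directly from the given probabilities. A minimal sketch, with illustrative variable names:

```python
# Data from Example 2.28.
p_windy = 0.3
p_hit_given_windy = 0.4
p_hit_given_calm = 0.7

# (a) Total probability of hitting the green: add the two 'hit' branches.
p_hit = p_windy * p_hit_given_windy + (1 - p_windy) * p_hit_given_calm

# (b) Reverse the condition: P(not windy | miss).
p_calm_and_miss = (1 - p_windy) * (1 - p_hit_given_calm)
p_calm_given_miss = p_calm_and_miss / (1 - p_hit)

print(round(p_hit, 2), round(p_calm_given_miss, 2))
```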

= :.
Example 2.29 Events A and B are such that P(A) = 1 P(B|A) = @ and P(B|A)

By drawing a tree diagram, or otherwise, find (a) P(B|A),


(b) (ANB), (c) P(B), (a) AUB).

Solution 2.29 If we draw a tree diagram and put on it the information given, we have

From the diagram, it is obvious that P(A) = 2, P(B|A) = sand


P(B|A) = E as the ‘total’ probability for each set of branches is 1.
The completed tree diagram is

. 3
(a) P(BIA) =F

1
(b) P(ANB) = 75

are y I
(c) P(B) = P(BNA)+P(BNA) =—+ ope
12 15 60

(a) P(AUB)= 1—P(ANB) =1 = —7


-
15 15

Example 2.30 A fair coin is tossed three times. What is the probability of obtaining
(a) exactly two heads, (b) at least two heads?

Solution 2.30 We now extend the tree to include three events.

Let H₁ be the event ‘a head occurs on the first toss’, T₂ be the event
‘a tail occurs on the second toss’, and so on.

[Tree diagram: 1st toss, 2nd toss, 3rd toss]

P(H₁ ∩ H₂ ∩ H₃) = 1/8
P(H₁ ∩ H₂ ∩ T₃) = 1/8
P(H₁ ∩ T₂ ∩ H₃) = 1/8
P(H₁ ∩ T₂ ∩ T₃) = 1/8
P(T₁ ∩ H₂ ∩ H₃) = 1/8
P(T₁ ∩ H₂ ∩ T₃) = 1/8
P(T₁ ∩ T₂ ∩ H₃) = 1/8
P(T₁ ∩ T₂ ∩ T₃) = 1/8

(a) P(exactly two heads) = P(H₁ ∩ H₂ ∩ T₃) + P(H₁ ∩ T₂ ∩ H₃)
                           + P(T₁ ∩ H₂ ∩ H₃)
= 1/8 + 1/8 + 1/8
= 3/8

The probability that exactly two heads are obtained is 3/8.

(b) P(at least two heads) = P(two heads and a tail)
                            + P(three heads)
= 3/8 + 1/8
= 1/2

The probability that at least two heads are obtained is 1/2.
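
The eight branches of the three-toss tree can be generated mechanically. The sketch below lists them and counts the branches for each part; it relies on the coin being fair, so that all eight outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of three tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))


def prob(condition):
    """Fraction of the equally likely outcomes satisfying the condition."""
    return Fraction(sum(1 for o in outcomes if condition(o)), len(outcomes))


p_exactly_two = prob(lambda o: o.count("H") == 2)
p_at_least_two = prob(lambda o: o.count("H") >= 2)
print(p_exactly_two, p_at_least_two)
```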



Exercise 2h

The probability that a biased die falls Draw a tree diagram to show all the
showing a 6 is £.This biased die is thrown possible total scores and their respective
twice. probabilities after a player has completed
(a) Draw a tree diagram showing the two rounds.
possible outcomes and the corresponding Find the probability that a player has
probabilities, considering the event ‘a six (a) a score of 4 after 2 rounds, (6) an
is thrown’. odd number score after 2 rounds.
(b) Find the probability that exactly one (L Additional)
six will be obtained.
An unbiased die is now thrown.
(c) Extend the tree diagram to show the
possible outcomes, again with regard to
whether or not a 6 is thrown. ‘

(d) Find the probability that, in the three 6. Three bags, A, B and C contain counters

[Ret [Yatow
throws, exactly one 6 will be obtained. as follows:

In a class of 24 girls, 7 have black hair. 4 3


(a) If 2 girls are chosen at random from 3 6
the class, find the probability that (i) they 2, 4
both have black hair, (ii) neither has
black hair. (a) A counter is taken at random from
(b) If 8 girls are chosen at random, find each of the bags in turn and kept. Draw a
the probability that more than 1 will tree diagram to show the possible out-
have black hair. comes and find the probability that more
red counters than yellow counters are
kept.
A coin is biased so that the probability
(b) The counters are now replaced in
that it lands showing heads is 2 The coin their original bags. A counter is taken at
is tossed three times. Find the probability random from bag A and placed in bag B.
that (a) no heads are obtained, (b) more Then a counter is taken at random from
heads than tails are obtained. bag B and placed in bag C. What is the
probability that a counter now taken
from bag C is yellow?
A box contains 6 red pens and 3 blue
pens.
(a) A pen is selected at random, the
colour is noted and the pen is returned
to the box. This procedure is performed
a second, then a third time. Find the Events X and Yare such that P(X) = 3,
probability of obtaining (i) 3 red pens, P(Y|X) = $ and P(Y |X) = }. By drawing
(ii) 2 red pens and 1 blue pen, in any
a tree diagram, or otherwise, find (a) P(Y)
order, (iii) more than 1 blue pen.
(b) (XN Y), (c) P(XU Y).
(b) Repeat (a) but this time find the
probabilities if, at each selection, the pen
is not returned to the box.

In each round of a certain game a player


A mother and her daughter both enter the
can score 1, 2 or 3 only. Copy and
cake competition at a show. The proba-
complete the table which shows the
scores and two of the respective proba- bility that the mother wins a prize is :
bilities of these being scored ina single and the probability that her daugher wins
round. a prize is 2. Assuming that the two events
are independent, find the probability that
(a) either the mother, or the daughter,
but not both, wins a prize, (b) at least
one of them wins a prize.

9. A bag contains 7 black and 3 white bility of choosing (a) three black marbles,
marbles. Three marbles are chosen at (b) a white marble, a black marble and a
random and in succession, each marble white marble in that order, (c) two white
being replaced after it has been taken out marbles and a black marble in any order,
of the bag. (d) at least one black marble.
Drawatree diagram to show all possible State an event from this experiment
selections. which together with the event described
From your diagram, or otherwise, cal- in (d) would be both exhaustive and
culate, to 2 significant figures, the proba- mutually exclusive. (L Additional)

BAYES’ THEOREM

Now we come to an important extension of the result

P(A|B) = P(B|A)·P(A) / P(B)

Suppose A₁, A₂, ..., Aₙ are n mutually exclusive and exhaustive
events so that A₁ ∪ A₂ ∪ ... ∪ Aₙ = S, the possibility space, and B is
an arbitrary event of S, then

P(Aᵢ|B) = P(B|Aᵢ)·P(Aᵢ) / [P(B|A₁)·P(A₁) + P(B|A₂)·P(A₂) + ... + P(B|Aₙ)·P(Aₙ)]

for i = 1, 2, ..., n

This is known as Bayes’ theorem and is useful when we have to
‘reverse the conditions’ in a problem.

Proof

P(B) = P(B ∩ A₁) + P(B ∩ A₂) + ... + P(B ∩ Aₙ)
     = P(B|A₁)·P(A₁) + P(B|A₂)·P(A₂) + ... + P(B|Aₙ)·P(Aₙ)

Now

P(Aᵢ|B) = P(B|Aᵢ)·P(Aᵢ) / P(B)

so

P(Aᵢ|B) = P(B|Aᵢ)·P(Aᵢ) / [P(B|A₁)·P(A₁) + P(B|A₂)·P(A₂) + ... + P(B|Aₙ)·P(Aₙ)]

The formula looks very complicated, but in fact it is easy to use if
you remember that the denominator is the total probability of B.
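
Bayes’ theorem translates into a very short calculation: multiply each P(Aᵢ) by P(B|Aᵢ), add the products to obtain P(B), and divide. The following is a sketch under assumed inputs; the function name `bayes` and the two-event figures are hypothetical and serve only to show the steps.

```python
from fractions import Fraction


def bayes(priors, likelihoods):
    """Return the list of P(A_i | B).

    priors[i] is P(A_i) for mutually exclusive, exhaustive events A_i;
    likelihoods[i] is P(B | A_i).
    """
    # Numerators P(B | A_i) * P(A_i), one per branch of the tree.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator: the total probability of B.
    p_B = sum(joint)
    return [j / p_B for j in joint]


# Hypothetical figures: P(A1) = P(A2) = 1/2, P(B|A1) = 1/5, P(B|A2) = 3/5.
posteriors = bayes([Fraction(1, 2), Fraction(1, 2)],
                   [Fraction(1, 5), Fraction(3, 5)])
print(posteriors)
```

The posterior probabilities always sum to 1, which gives a quick check on any hand calculation.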

Example 2.31 Three girls, Aileen, Barbara and Cathy, pack biscuits in a factory.
From the batch allotted to them Aileen packs 55%, Barbara 30%
and Cathy 15%. The probability that Aileen breaks some biscuits in
a packet is 0.7, and the respective probabilities for Barbara and
Cathy are 0.2 and 0.1. What is the probability that a packet with
broken biscuits found by the checker was packed by Aileen?

Solution 2.31 Let A be the event ‘the packet was packed by Aileen’, B be the
event ‘the packet was packed by Barbara’, C be the event ‘the
packet was packed by Cathy’, D be the event ‘the packet contains
broken biscuits’.
We are given P(A) = 0.55, P(B) = 0.3, P(C) = 0.15
and P(D| A) = 0.7, P(D|B) = 0.2, P(D|C) = 0.1.
We require P(A |D). So we use Bayes’ theorem to ‘reverse the con-
ditions’:

P(A|D) = P(D|A)·P(A) / P(D)

Now P(D) is the ‘total’ probability of D, that is the probability


that a packet contains broken biscuits. This can be found very
easily from the tree diagram. The outcomes resulting in a packet
with broken biscuits are shown with an asterisk.

[Tree diagram: Packer, State of biscuits]

* P(D|A)·P(A) = (0.7)(0.55)

* P(D|B)·P(B) = (0.2)(0.3)

* P(D|C)·P(C) = (0.1)(0.15)

P(D) = P(D|A)·P(A) + P(D|B)·P(B) + P(D|C)·P(C)
= (0.7)(0.55) + (0.2)(0.3) + (0.1)(0.15)
= 0.46
As shown in the tree diagram
P(D|A)·P(A) = (0.7)(0.55)

Therefore P(A|D) = (0.7)(0.55) / 0.46
= 0.837 (3 d.p.)

The probability that a packet with broken biscuits was packed by


Aileen is 0.837 (3 d.p.).
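
The figures in Solution 2.31 can be verified as follows. This is a sketch; the dictionary layout is an illustrative choice, not part of the method.

```python
# Proportion of packets packed by each girl, and P(broken | packer).
p_packer = {"Aileen": 0.55, "Barbara": 0.30, "Cathy": 0.15}
p_broken_given = {"Aileen": 0.7, "Barbara": 0.2, "Cathy": 0.1}

# Total probability of broken biscuits: sum over the starred branches.
p_broken = sum(p_packer[g] * p_broken_given[g] for g in p_packer)

# Bayes' theorem: P(Aileen | broken).
p_aileen_given_broken = p_packer["Aileen"] * p_broken_given["Aileen"] / p_broken

print(round(p_broken, 2), round(p_aileen_given_broken, 3))
```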

Example 2.32 Three children, Catherine, Michael and David, have equal plots in a
circular patch of garden. The boundaries are marked out by pebbles.
Catherine has 80 red and 20 white flowers in her patch, Michael has
30 red and 40 white flowers and David has 10 red and 60 white
flowers. Their young sister, Mary, wants to pick a flower for her
teacher.
(a) Find the probability that she picks a red flower if she chooses
a flower at random from the garden, ignoring the boundaries.
(b) Find the probability that she picks a red flower if she first
chooses a plot at random.
(c) If she picks a red flower by the method described in (b), find
the probability that it came from Michael’s plot.

Solution 2.32 (a) If the boundaries are ignored

The possibility space S = {flowers in the garden}.
We have n(S) = 100 + 70 + 70 = 240
Let R be the event ‘a red flower is chosen’,
then n(R) = 80 + 30 + 10 = 120
Then P(R) = n(R)/n(S)
= 120/240
= 1/2
The probability that Mary picks a red flower if she ignores the
boundaries is 1/2.

(b) A plot is chosen first. Each of the three plots is equally likely
to be chosen.
Let C be the event ‘Catherine’s plot is chosen’, then P(C) = 1/3.

With similar notation P(M) = 1/3 and P(D) = 1/3.

[Diagram: circular garden divided into three plots. Catherine: 80 red,
20 white. Michael: 30 red, 40 white. David: 10 red, 60 white.]

The outcomes resulting in event R are shown with an asterisk on
the tree diagram.

[Tree diagram: Plot, Colour of flower]

* P(R|C)·P(C) = (80/100)(1/3)

* P(R|M)·P(M) = (30/70)(1/3)

* P(R|D)·P(D) = (10/70)(1/3)

Now P(R) = P(R|C)·P(C) + P(R|M)·P(M) + P(R|D)·P(D)
= (80/100)(1/3) + (30/70)(1/3) + (10/70)(1/3)
= 16/35
The probability that Mary picks a red flower if she chooses a plot
at random first is 16/35.

NOTE: the two different results for part (a) and part (b) are slightly
surprising. In the first case, there is one group of flowers and each
flower is equally likely to be chosen. In the second case, even
though each plot is equally likely to be chosen, the proportions of
red and white flowers within these plots are different.

(c) Using Bayes’ theorem:

P(M|R) = P(R|M)·P(M) / P(R)

Now P(R|M)·P(M) = (30/70)(1/3) = 1/7 (from tree diagram)
and P(R) = 16/35 (from part (b))

Therefore P(M|R) = (1/7) / (16/35)
= 5/16
Given that Mary picks a red flower, the probability that it came
from Michael’s plot is 5/16.
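
All three parts of Solution 2.32 can be checked with exact fractions. The sketch below uses the flower counts from the example; the variable names are illustrative.

```python
from fractions import Fraction

# (red, white) flowers in each child's plot, from Example 2.32.
plots = {"Catherine": (80, 20), "Michael": (30, 40), "David": (10, 60)}

# (a) Boundaries ignored: every flower is equally likely.
total_red = sum(red for red, white in plots.values())
total_flowers = sum(red + white for red, white in plots.values())
p_red_ignoring = Fraction(total_red, total_flowers)

# (b) Plot chosen first, each with probability 1/3, then a flower.
p_plot = Fraction(1, 3)
p_red_plot_first = sum(
    p_plot * Fraction(red, red + white) for red, white in plots.values()
)

# (c) Bayes' theorem: P(Michael's plot | red flower).
red_m, white_m = plots["Michael"]
p_michael_given_red = p_plot * Fraction(red_m, red_m + white_m) / p_red_plot_first

print(p_red_ignoring, p_red_plot_first, p_michael_given_red)
```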
Exercise 2i

In my bookcase there are four shelves and | 5


the number of books on each shelf is as
shown in the table: Age group | 0-16 17-25 26-64 65 and over

Shelf 1 11 9 In a town the percentage of males in


Shelf 2 8 12 particular age groups are as shown in the
Shelf 3 16 4 table above. In a survey it is found that the
Shelf 4 9 3 probability that a male aged 65 or over
wearing glasses is 0.8. Similarly if a male is
(a) If I choose a book at random, irrespec- in the age group 0-16, 17-25 and 26-64
tive of its position in the bookcase, what is the probabilities of his wearing glasses are
the probability that it is a paperback? 0.2, 0.1 and 0.4 respectively. Given that a
(6) Iam equally likely to choose any shelf. particular male is wearing glasses, use
I choose a shelf at random and then choose Bayes’ formula
a book. (i) What is the probability that it
is a hardback? (ii) If the book chosen is a
Pag By = —PAW PBI)
P(A;,)-P

hardback, what is the probability that it 2X P(A,)-P(B|A,)


is from shelf 3? r=1

to calculate the probability that he is 65


I travel to work by route A or route B. or over.
The probability that I choose route A is
Similarly calculate the probabilities that he
4 The probability that I am late for work is in each of the other age groups and
if I go via route A is 2 and the corres- hence state which age group he is most
ponding probability if I go via route B likely to be in. (L Additional)
is $-
(a) What is the probability that I am late 6) In an experiment two bags A and B,
for work on Monday? containing red and green marbles are used.
(b) Given that Iam late for work, what is Bag A contains 4 red marbles and 1 green
the probability that I went via route B? marble and bag B contains 2 red marbles
and 7 green marbles. An unbiased coin is
Of the buses leaving the bus station each tossed. If a head turns up, a marble is
day, 60% are double deckers and the rest drawn at random from bag A while if a tail
are single deckers; 30% of the double turns up, a marble is drawn at random
deckers are ‘limited stop’ buses and 40% from bag B. Calculate the probability that
of the single deckers are ‘limited stop’ a red marble is drawn in a single trial.
buses. Draw a tree diagram to represent Given that a red marble is selected, cal-
this information. culate the probability that when the coin
was tossed a head was obtained.
Find the probability that a bus leaving the
(L Additional)
bus station (a) is not a ‘limited stop’ bus,
(b) is a double decker, given that it is a
‘limited stop’ bus. @ On a given day a petrol station serves
three times as many men as women.
Susan takes examinations in mathematics, Two types of petrol are available, grade A
French and history. The probability that
and grade B. Customers pay by cheque or
she passes mathematics is 0.7 and the
by cash.
corresponding probabilities for French
and history are 0.8 and 0.6. Given that 70% of the men and 40% of the women
her performances in each subject are buy grade A petrol.
independent, draw a tree diagram to show Of the men buying grade A petrol, 80%
the possible outcomes. pay by cheque, and of the men buying
grade B petrol, 60% pay by cheque.
Find the probability that Susan (a) fails
all three examinations, (0) fails just one Of the women buying grade A petrol,
examination. Given that Susan fails just half pay by cheque and of the women
one examination, (c) find the probability buying grade B petrol, 40% pay by
that she fails history. cheque.

Find the probability that (a) a customer B is an arbitrary event of S such that
buys grade A petrol, (b) a customer pays P(B) # 0, show that
by cheque, (c) a woman customer pays P(A)-P(B|A)
by cheque, (d) a customer who pays by P(A|B) =
cheque for grade A petrol is a man. P(A)-P(B|A) + P(A) -P(B |A)
: ties that aon boy
(ii) The probabilibicycle goes to
;
6,) A bag contains school by bus, or foot on a
8.) 10 counters of which 4 are

pink, 3»are-green and3 "are yellow. certain day are 0.2, 0.3 and 0.5 respec-
Counters are removed at random, one at tively. The probabilities of his being
a time and without replacement. Find late by these methods are 0.6, 0.3 and 0.1

Mss probability nc en tsoy respectively. If he was late on this particular


drawa is greet, °(B) tive figs’ Lwo Grwn day, using Bayes’ theorem or otherwise,
Veale she tus travelled
calculate the probability that(LheAdditiona
ee are of re different colours, by bus. l)
three drawn
(d) the second one drawn is green, given
that the first one drawn is pink, (e) the /Y ~
third one drawn is green, given that the \ 11. . ‘If Aj, Az and A3 are mutually exclusive
first two are the same colour as each —~"_ events whose union is the sample space S
_-~. other. of an experiment and B is an arbitrary
( 9 event of S such that P(B) # 0, show that
9 A and B are two events for which P(A) = §,
= -
P(B|A) = 3, and P(B|A) = 3. oe -P
Pt dkpeapyoed oe
(a) Draw a fully labelled tree diagram >» P(A,)-P(B| A,)
with A preceding B, that is with A and A r=

on the first two branches. whee and write down the results for P(A,| B) and
(b) Calculate (i) QPANMB), (ii) QPANB), P(A3iB).
(iii) P(A UB). A factory has three machines 1,2 and 3,
(c) Draw a possibility space diagram to
illustrate both the given data and your producing a particular type of item. One
anowerd item is drawn at random from the factory’s
(d) Calculate (i) P(AIB), (ii) P( A\B). production. Let B denote the event that
(e) Use your answers to (d); to draw a the chosen item is defective and let A,
denote the event that the item was
fully labelled tree diagram with B
preceding A. (L Additional) produced on machine k where k = 1,2 or
38. Suppose that machines 1,2 and 3
t 10. ) a, s produce respectively 35%, 45% and 20% of
eee the total production of items and that
Gey P(B|A,) = 0.02, P(B| A2) = 0.01,
P(B|A3) = 0.08.
Given that an item chosen at random is
defective, find which machine was most
(i) By considering the diagram which repre- likely to have produced it.
sents the sample space S, for AU A, where (L Additional)

SOME USEFUL METHODS

(a) Problems involving an ‘at least’ situation

Example 2.33 (a) Find the probability of obtaining at least one 6 when 5 dice are
thrown.

(b) Find the probability of obtaining at least one 6 when n dice are
thrown.
(c) How many dice must be thrown so that the probability of
obtaining at least one 6 is at least 0.99?

Solution 2.33 (a) In one throw P(6) = 1/6 and P(6′) = 5/6, where 6′ denotes
‘a 6 is not obtained’.

When 5 dice are thrown,
P(at least one 6) = 1 − P(no 6s)
= 1 − P(6′6′6′6′6′)
= 1 − (5/6)^5
= 0.598 (3 d.p.)
The probability of obtaining at least one 6 when 5 dice are thrown
is 0.598 (3 d.p.).

(b) When n dice are thrown, P(at least one 6) = 1 − (5/6)^n.

(c) We require n such that

1 − (5/6)^n ≥ 0.99
i.e. (5/6)^n ≤ 0.01

Taking logs of both sides

n log (5/6) ≤ log (0.01)

Dividing both sides by log (5/6) and reversing the inequality sign
since log (5/6) is negative, we have

n ≥ log (0.01) / log (5/6)
  = 25.3 (1 d.p.)
so least n = 26
Therefore 26 dice must be thrown so that the probability of
obtaining at least one 6 is at least 0.99.
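
The logarithm step in part (c) is easy to check by computer. The sketch below repeats the calculation and also confirms that 25 dice are not quite enough; the variable names are illustrative.

```python
import math

# (a) Five dice: complement of 'no 6s'.
p_five_dice = 1 - (5 / 6) ** 5

# (c) Smallest n with (5/6)^n <= 0.01. The ratio of logs gives the bound,
# and n must be the next whole number above it.
bound = math.log(0.01) / math.log(5 / 6)
least_n = math.ceil(bound)

print(round(p_five_dice, 3), round(bound, 1), least_n)
```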

(b) Problems involving the use of an infinite geometric progression (G.P.)


Many probability examples involve the use of G.P.s and the follow-
ing formula is required.
If S = a + ar + ar² + ar³ + ... (to infinity)
then

S = a/(1 − r) for |r| < 1, where a is the first term
                           r is the common ratio

Example 2.34 A, B and C, in that order, throw a tetrahedral die. The first one to
throw a 4 wins. The game is continued indefinitely until someone
wins. Find the probability that (a) A wins, (b) B wins, (c) C wins.

Solution 2.34 With a tetrahedral die the number ‘thrown’ is the number on which
the die lands. Therefore P(4 is thrown) = 1/4.

(a) Let A₁ be the event ‘A wins on his first throw’, A₂ be the event
‘A wins on his second throw’, and so on. A prime denotes that the
throw is not a 4, so A₁′ is the event ‘A does not throw a 4 on his
first throw’.

Now

P(A wins) = P(A₁) + P(A₂) + P(A₃) + ... (mutually exclusive events)

P(A₁) = 1/4 and P(A₁′) = 3/4
P(A₂) = P(A₁′B₁′C₁′A₂) = (3/4)^3 (1/4)
P(A₃) = P(A₁′B₁′C₁′A₂′B₂′C₂′A₃) = (3/4)^6 (1/4) and so on

Therefore

P(A wins) = 1/4 + (3/4)^3 (1/4) + (3/4)^6 (1/4) + ...
= (1/4)[1 + (3/4)^3 + (3/4)^6 + ...]
= (1/4)S where S is the sum of an infinite G.P.
         with a = 1 and r = (3/4)^3 = 27/64
= (1/4) · 1/(1 − 27/64)
= (1/4)(64/37)
= 16/37
The probability that A wins is 16/37.

(b)
P(B wins) = P(B₁) + P(B₂) + P(B₃) + ... to infinity (mutually
exclusive events)

Now P(B₁) = P(A₁′B₁) = (3/4)(1/4)
P(B₂) = P(A₁′B₁′C₁′A₂′B₂) = (3/4)^4 (1/4)
P(B₃) = P(A₁′B₁′C₁′A₂′B₂′C₂′A₃′B₃) = (3/4)^7 (1/4) and so on

So

P(B wins) = (3/4)(1/4)[1 + (3/4)^3 + (3/4)^6 + ...]
= (3/16)(64/37)
= 12/37
Therefore the probability that B wins is 12/37.

(c)
P(C wins) = 1 − P(A wins) − P(B wins)
= 1 − 28/37
= 9/37
Therefore the probability that C wins is 9/37.
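
The three answers in Solution 2.34 can be confirmed with the formula S = a/(1 − r). The sketch below works in exact fractions and also adds up the first 50 terms of A's series, to show that the partial sums approach the same value; the variable names are illustrative.

```python
from fractions import Fraction

p = Fraction(1, 4)  # P(a 4 is thrown)
q = 1 - p           # P(a 4 is not thrown)
r = q ** 3          # common ratio: one full round with no winner

# Sum to infinity a/(1 - r), with first terms p, q*p and q*q*p.
p_A_wins = p / (1 - r)
p_B_wins = q * p / (1 - r)
p_C_wins = q * q * p / (1 - r)

# Partial sum of A's series: p + r*p + r^2*p + ... (50 terms).
partial_A = sum(p * r ** k for k in range(50))

print(p_A_wins, p_B_wins, p_C_wins, float(partial_A))
```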

Exercise 2j

G) A coin is biased so that the probability that 3. A die is biased so that the probability of
it falls showing tails is 2. obtaining a 3 is p. When the die is thrown
four times the probability that there is at
(a) Find the probability of obtaining at
least one 3 is 0.9375. Find the value of p.
least one head when the coin is tossed five
times. How many times should the die be thrown
(b) How many times must the coin be so that the probability that there are no 3’s
tossed so that the probability of obtaining is less than 0.03?
at least one head is greater than 0.98?
On asafe there are four alarms which are
2. A missile is fired at a target and the proba- arranged so that any one will sound when
bility that the target is hit is 0.7. someone tries to break into the safe. If
(a) Find how many missiles should be the probability that each alarm will function
fired so that the probability that the target properly is 0.85, find the probability that
is hit at least once is greater than 0.995. at least one alarm will sound when some-
(b) Find how many missiles should be fired one tries to break into the safe.
so that the probability that the target is
not hit is less than 0.001. ) Two people, A and B, play a game. An

ordinary die is thrown and the first person The first boy to draw the white ball wins
to throw a 4 wins. A and B take it in turns the game. Assuming that they do not
to throw the die, starting with A. Find the replace the balls as they draw them out,
__ probability thatB wins. find the probability that Bill wins the
) ‘ game.
reat
(
6,/ A,B, Cand D throwacoin, in turn, starting If the game is changed, so that, in thelnee
with A. The first to throw a head wins. game, they replace each Rent ther i bas

oesae cae a aang 7 been drawn out, find the probabilities that:
os pee wee “ a ee A,
because the others have their first turn
itine Scorn Mit thind attem
) a ?
before him. Compare the probability that
S Ph.
D wins with the probability that A wins.
Show that these answers are terms in a
7. A box contains five black balls and one Geometric Progression. Hence find the
white ball. Alan and Bill take turns to draw probability that Alan wins the new game.
a ball from the box, starting with Alan. (SUJB Additional)

ARRANGEMENTS

In order to calculate the number of possible outcomes in a possibil-


ity space and the number of sample points for an event, the following
results are often used.

Result 1 The number of ways of arranging n unlike objects in a


line is n!

NOTE: n! = n(n−1)(n−2)...(3)(2)(1).
For example, consider the letters A, B, C, D.

The first letter can be chosen in 4 ways (either A or B or C or D).
Then
the second letter can be chosen in 3 ways,
the third letter can be chosen in 2 ways,
the fourth letter can be chosen in only 1 way.

Therefore the number of ways of arranging the 4 letters is
(4)(3)(2)(1) = 4! = 24.

These are

ABCD ABDC ACBD ACDB ADCB ADBC
BCDA BCAD BDAC BDCA BACD BADC
CDBA CDAB CABD CADB CBAD CBDA
DABC DACB DBCA DBAC DCAB DCBA
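
The list of 24 arrangements can be generated rather than written out by hand. A brief sketch using Python's standard library:

```python
from itertools import permutations
from math import factorial

# Every ordering of the four unlike letters, joined back into strings.
arrangements = ["".join(p) for p in permutations("ABCD")]

print(len(arrangements), factorial(4))
print(arrangements)
```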

Example 2.35 How many different number plates can be formed if each is to
contain the three letters A, C and E followed by the three digits 4, 7, 8?

Solution 2.35 There are 3! ways of arranging the letters A, C and E, and 3! ways
of arranging the digits 4, 7 and 8.
Therefore the total number of different plates = (3!)(3!)
= 36
36 different number plates can be formed.

Result 2 The number of ways of arranging in a line n objects, of
which p are alike, is n!/p!

For example, if, instead of the letters A, B, C, D we have the letters
A, A, A, D then the 24 arrangements listed previously reduce to the
following:
AAAD AADA ADAA DAAA
So, the number of ways of arranging the 4 objects, of which 3 are
alike, is 4!/3! = (4)(3)(2)(1) / ((3)(2)(1)) = 4.
The result can be extended as follows:

The number of ways of arranging in a line n objects of which p
of one type are alike, q of a second type are alike, r of a third
type are alike, and so on, is n!/(p! q! r! ...).
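
As an illustration of the extended result, here is a short Python sketch (an addition, not from the original text). The function name `arrangements_in_line` is hypothetical; it simply divides n! by the factorial of each group of alike letters.

```python
from collections import Counter
from math import factorial


def arrangements_in_line(word: str) -> int:
    """Return n!/(p! q! r! ...) for the letters of `word`."""
    total = factorial(len(word))
    # Divide by the factorial of the size of each group of alike letters.
    for repeats in Counter(word).values():
        total //= factorial(repeats)
    return total


print(arrangements_in_line("AAAD"))        # 4
print(arrangements_in_line("STATISTICS"))  # 50400
print(arrangements_in_line("MINIMUM"))     # 420
```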

Example 2.36 (a) In how many ways can the letters of the word STATISTICS
be arranged?
(b) If the letters of the word MINIMUM are arranged in a line
at random, what is the probability that the three M’s are
together at the beginning of the arrangement?

Solution 2.36 (a) Consider the word STATISTICS.

There are 10 letters and S occurs 3 times,
T occurs 3 times,
I occurs twice.

Therefore
number of ways = 10!/(3! 3! 2!)
= (10)(9)(8)(7)(6)(5)(4)(3)(2)(1) / ((3)(2)(1)(3)(2)(1)(2)(1))
= 50 400

There are 50 400 ways of arranging the letters in the word
STATISTICS.

(b) Consider the word MINIMUM.

The possibility space S = {arrangements of MINIMUM}.

Now n(S) = 7!/(3! 2!)
= (7)(6)(5)(4)(3)(2)(1) / ((3)(2)(1)(2)(1))
= 420

Let E be the event ‘the three M’s are placed together at the beginning
of the arrangement’.

So we must have MMM.... There is only one way of arranging
MMM, then the remaining 4 letters can be arranged in 4!/2! ways = 12
ways.

Therefore n(E) = 12

So P(E) = n(E)/n(S)
= 12/420
= 1/35

The probability that the three M’s are together at the beginning of
an arrangement is 1/35.

Example 2.37 Ten pupils are placed at random in a line. What is the probability
that the two youngest pupils are separated?

Solution 2.37 Let the possibility space be S, then n(S) = 10!

Let E be the event ‘the two youngest pupils are together’.

Now treat these two together as one ‘item’ and so we have 9 ‘items’
to arrange.

The 9 items can be arranged in 9! ways.

But the two youngest can be arranged in 2! ways (Y₁Y₂ or Y₂Y₁).

Therefore n(E) = 2!9!

So P(E) = n(E)/n(S)
= 2!9!/10!
= 1/5

Now Ē is the event ‘the two youngest are not together’.
So P(Ē) = 1 − P(E)
= 1 − 1/5
= 4/5
The probability that the two youngest are separated is 4/5.

Example 2.38 If a four-digit number is formed from the digits 1, 2, 3 and 5 and
repetitions are not allowed, find the probability that the number
is divisible by 5.

Solution 2.38 Let S be the possibility space, then n(S) = 4! = 24.


Let E be the event ‘the number is divisible by 5’.
If the number is divisible by 5 then it must end with the digit 5.
Therefore
n(E) = number of ways of arranging the digits 1, 2, 3
= 3!
So P(E) = n(E)/n(S)
= 3!/24
= 1/4
The probability that the number is divisible by 5 is 1/4.

Result 3 The number of ways of arranging n unlike objects in a
ring when clockwise and anticlockwise arrangements are
different is (n−1)!

For example, consider 4 people A, B, C and D, who are to be seated
at a round table. The following four arrangements are the same, as
A always has D on his immediate right and B on his immediate left.

[Diagram: four round tables showing A, B, C and D in the same order, each turned to a different position]

To find the number of different arrangements, we fix A and then
consider the number of ways of arranging B, C and D.

Therefore the number of different arrangements of 4 people around
the table is 3!

Result 4 The number of ways of arranging n unlike objects in a
ring, when clockwise and anticlockwise arrangements
are the same, is (n−1)!/2.

For example, if A, B, C and D are 4 different coloured beads which
are threaded on a ring, then the following two arrangements are the
same; the one is the other viewed from the other side.

[Diagram: two rings of the beads A, B, C, D, one being the other viewed from the other side]

Therefore the number of arrangements of 4 beads on a ring is 3!/2 = 3.
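
Results 3 and 4 can be checked by brute force for small n. The sketch below is an addition in Python, not part of the original text; `ring_arrangements` is a hypothetical helper that treats two orderings as the same ring when one is a rotation of the other (and, optionally, when one is the other viewed from the other side).

```python
from itertools import permutations


def ring_arrangements(items, flips_are_same=False):
    """Count distinct arrangements of unlike items in a ring by listing them."""
    seen = set()
    for p in permutations(items):
        n = len(p)
        # All rotations of p describe the same ring.
        variants = [p[i:] + p[:i] for i in range(n)]
        if flips_are_same:
            # Viewing the ring from the other side reverses the order.
            r = p[::-1]
            variants += [r[i:] + r[:i] for i in range(n)]
        # Use the smallest variant as the representative of the ring.
        seen.add(min(variants))
    return len(seen)


people_at_table = ring_arrangements("ABCD")                       # (4-1)! = 6
beads_on_ring = ring_arrangements("ABCD", flips_are_same=True)    # (4-1)!/2 = 3
print(people_at_table, beads_on_ring)
```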

Example 2.39 Six bulbs are planted in a ring and two do not grow. What is the
probability that the two that do not grow are next to each other?

Solution 2.39 Let S be the possibility space, then n(S) = 5!

Let E be the event ‘the bulbs that do not grow are next to each
other’. Consider the two bulbs that do not grow as one ‘item’.
They can be arranged in 2! ways. There are now five ‘items’ to be
arranged in a ring and this can be done in 4! ways.
Therefore n(E) = 2!4!

So P(E) = n(E)/n(S)
= 2!4!/5!
= 2/5

The probability that the bulbs that do not grow are next to each
other is 2/5.

Example 2.40 One white, one blue, one red and two yellow beads are threaded on
a ring to make a bracelet. Find the probability that the red and
white beads are next to each other.

Solution 2.40 Let S be the possibility space.

If all the objects are unlike, the number of ways of arranging five
beads on a ring is 4!/2, but as there are two yellows

n(S) = 4!/((2)(2!))
= 6

Let E be the event ‘the red and the white beads are next to each
other’. Treat the red and white beads as one item.

Then n(E) = 2!3!/((2)(2!))

where
2! is the number of ways in which red and white can be arranged,
3! is the number of ways of arranging four objects in a ring,
the 2 in the denominator is there because anticlockwise and clockwise
arrangements are the same,
the 2! in the denominator is there because there are two yellows.

So n(E) = 3

P(E) = n(E)/n(S)
= 3/6
= 1/2

The probability that the red and white beads are next to each other
is 1/2.
This result can be shown diagrammatically:

[Diagram: the six ways of arranging the beads W, B, R, Y, Y on the ring]

NOTE: in three of the six arrangements the red and white beads
are next to each other.
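
The six arrangements can also be listed by computer. This Python sketch is an addition, not from the original text; `canonical` and `red_next_to_white` are hypothetical helper names. It treats rotations and flips of the bracelet as the same arrangement.

```python
from itertools import permutations

beads = "WBRYY"
n = len(beads)


def canonical(p):
    """Smallest of all rotations and reflections, so equal bracelets match."""
    r = p[::-1]
    return min([p[i:] + p[:i] for i in range(n)] +
               [r[i:] + r[:i] for i in range(n)])


# Distinct bracelets; the two yellow beads are indistinguishable.
bracelets = {canonical(p) for p in permutations(beads)}


def red_next_to_white(p):
    i = p.index("R")
    # Neighbours on a ring: the bead before and the bead after, wrapping round.
    return "W" in (p[i - 1], p[(i + 1) % n])


favourable = sum(red_next_to_white(b) for b in bracelets)
print(len(bracelets), favourable)  # 6 3
```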

Exercise 2k

1. In how many ways can the letters of the word FACETIOUS be arranged
in a line? What is the probability that an arrangement begins with F
and ends with S?

2. (a) In how many ways can 7 people sit at a round table?
(b) What is the probability that a husband and wife sit together?

3. On a shelf there are 4 mathematics books and 8 English books.
(a) If the books are to be arranged so that the mathematics books are
together, in how many ways can this be done?
(b) What is the probability that all the mathematics books will not be
together?

4. If the letters of the word PROBABILITY are arranged at random, find
the probability that the two I’s are separated.

5. Nine children play a party game and hold hands in a circle.
(a) In how many different ways can this be done?
(b) What is the probability that Mary will be holding hands with her
friends Natalie and Sarah?

6. If the letters in the word ABSTEMIOUS are arranged at random, find
the probability that the vowels and consonants appear alternately.

7. (a) In how many different ways can the letters in the word
ARRANGEMENTS be arranged?
(b) Find the probability that an arrangement chosen at random begins
with the letters EE.

PERMUTATIONS OF r OBJECTS FROM n OBJECTS

Consider the number of ways of placing 3 of the letters A, B, C, D,
E, F, G in 3 empty spaces.
The first space can be filled in 7 ways. The second space can be
filled in 6 ways. The third space can be filled in 5 ways. Therefore
there are (7)(6)(5) ways of arranging 3 letters taken from 7 letters.

This is the number of permutations of 3 objects taken from 7 and
it is written ⁷P₃.

So ⁷P₃ = (7)(6)(5) = 210
Now (7)(6)(5) could be written (7)(6)(5)(4)(3)(2)(1) / ((4)(3)(2)(1)) = 7!/4!
i.e. ⁷P₃ = 7!/4! = 7!/(7−3)!
NOTE: the order in which the letters are arranged is important;
ABC is a different permutation from ACB.

In general, the number of permutations, or ordered arrangements,
of r objects taken from n unlike objects is written ⁿPᵣ where

ⁿPᵣ = n!/(n−r)!

NOTE: ⁿPₙ = n!/(n−n)! = n!/0!
But we know that the number of ways of arranging n unlike objects
is n!
So we must define 0! to be 1.

So 0! = 1

COMBINATIONS OF r OBJECTS FROM n OBJECTS

When considering the number of combinations of r objects from n
objects, the order in which they are placed is not important.
For example, the one combination ABC gives rise to 3! permutations
ABC, ACB, BCA, BAC, CAB, CBA
So, if the number of combinations of 3 letters from the 7 letters
A, B, C, D, E, F, G is denoted by ⁷C₃ then
⁷C₃ (3!) = ⁷P₃

so ⁷C₃ = ⁷P₃/3!
= 7!/(3! 4!)
= (7)(6)(5)(4)(3)(2)(1) / ((3)(2)(1)(4)(3)(2)(1))
= 35

In general, the number of combinations of r objects from n
unlike objects is ⁿCᵣ where ⁿCᵣ = n!/(r!(n−r)!).

NOTE: ⁿCᵣ is sometimes written ₙCᵣ or as a pair of large brackets
with n written above r.
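
For readers who want to check such values by computer, the following Python sketch (an addition, not from the original text) uses the standard-library functions `math.perm` and `math.comb`, available from Python 3.8. The variable names are illustrative only.

```python
from math import comb, factorial, perm

# Permutations of 3 objects from 7: 7!/(7-3)!
p_7_3 = perm(7, 3)
# Combinations of 3 objects from 7: 7!/(3! 4!)
c_7_3 = comb(7, 3)

print(p_7_3)  # 210
print(c_7_3)  # 35

# Each combination of 3 letters gives rise to 3! permutations.
print(c_7_3 * factorial(3) == p_7_3)  # True

# Arranging all n objects: nPn = n!/0! = n!, consistent with 0! = 1.
print(perm(7, 7) == factorial(7), factorial(0))  # True 1
```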

Example 2.41 In how many ways can a hand of 4 cards be dealt from an ordinary
pack of 52 playing cards?

Solution 2.41 We need to consider combinations, as the order in which the cards
are dealt is not important.

Now ⁵²C₄ = 52!/(4! 48!)
= (52)(51)(50)(49) / ((4)(3)(2)(1))
= 270 725 ways

The number of ways of dealing the hand of 4 cards is 270 725.

Example 2.42 Four letters are chosen at random from the word RANDOMLY.
Find the probability that all four letters chosen are consonants.

Solution 2.42 Let S be the possibility space, then

n(S) = ⁸C₄
= 8!/(4! 4!)
= (8)(7)(6)(5) / ((4)(3)(2)(1))
= 70
Let E be the event ‘four consonants are chosen’. As there are six
consonants

n(E) = ⁶C₄
= 6!/(4! 2!)
= (6)(5) / ((2)(1))
= 15

Now P(E) = n(E)/n(S)
= 15/70
= 3/14
The probability that the four letters chosen are consonants is 3/14.

Example 2.43 A team of 4 is chosen at random from 5 girls and 6 boys.
(a) In how many ways can the team be chosen if (i) there are no
restrictions; (ii) there must be more boys than girls?
(b) Find the probability that the team contains only one boy.

Solution 2.43 (a) (i) There are 11 people, from whom 4 are chosen. The order
in which they are chosen is not important.
Number of ways of choosing the team = ¹¹C₄
= 11!/(4! 7!)
= (11)(10)(9)(8) / ((4)(3)(2)(1))
= 330
If there are no restrictions, the team can be chosen in 330 ways.

(ii) If there are to be more boys than girls, then there must be 3
boys and 1 girl or 4 boys.

Number of ways of choosing 3 boys and 1 girl = (⁶C₃)(⁵C₁)
= (6!/(3! 3!)) (5!/(1! 4!))
= (6)(5)(4)(5) / ((3)(2)(1)(1))
= 100

Number of ways of choosing 4 boys = ⁶C₄
= 6!/(4! 2!)
= (6)(5) / ((2)(1))
= 15

Therefore the number of ways of choosing the team if there are
more boys than girls = 100 + 15 = 115 ways.

(b) The possibility space S = {all possible teams of 4} and
n(S) = 330.

Let E be the event ‘only one boy is chosen’.

Now

n(E) = (⁶C₁)(⁵C₃) (if 1 boy is chosen, then 3 girls must be chosen)
= (6!/(1! 5!)) (5!/(3! 2!))
= (6)(5)(4) / ((2)(1))
= 60

So P(E) = n(E)/n(S)
= 60/330
= 2/11

The probability that the team contains only one boy is 2/11.

Example 2.44 Four items are taken at random from a box of 12 items and
inspected. The box is rejected if more than 1 item is found to be
faulty. If there are 3 faulty items in the box, find the probability
that the box is accepted.

Solution 2.44 The box is accepted if (a) there are no faulty items in the sample
of 4, or (b) there is one faulty item in the sample of 4.

Let S be the possibility space, then

n(S) = ¹²C₄
= 12!/(4! 8!)
= (12)(11)(10)(9) / ((4)(3)(2)(1))
= 495
There are 9 items that are not faulty, so the number of ways of
choosing 4 items that are not faulty
= ⁹C₄
= 9!/(4! 5!)
= (9)(8)(7)(6) / ((4)(3)(2)(1))
= 126
The number of ways of choosing 1 faulty item and 3 good items
= (³C₁)(⁹C₃)
= (3!/(1! 2!)) (9!/(3! 6!))
= (3)(9)(8)(7) / ((3)(2)(1))
= 252
Let E be the event ‘the number of faulty items chosen is 0 or 1’.

Then n(E) = 126 + 252 = 378

So P(E) = n(E)/n(S)
= 378/495
= 0.76 (2 d.p.)
If the number of faulty items is 0 or 1 then the box is accepted, so
the probability that the box is accepted is 0.76 (2 d.p.).
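
The same calculation can be sketched in Python (an addition, not part of the original text). The function `p_box_accepted` and its parameter names are hypothetical; it sums the favourable selections for 0 and 1 faulty items and divides by the total number of selections.

```python
from fractions import Fraction
from math import comb


def p_box_accepted(items=12, faulty=3, sample=4, max_faulty=1) -> Fraction:
    """P(at most `max_faulty` faulty items in a random sample)."""
    good = items - faulty
    # Choose k faulty items and (sample - k) good items, for k = 0, 1, ...
    favourable = sum(comb(faulty, k) * comb(good, sample - k)
                     for k in range(max_faulty + 1))
    return Fraction(favourable, comb(items, sample))


p = p_box_accepted()
print(p)                    # 42/55, which is 378/495 in lowest terms
print(round(float(p), 2))   # 0.76
```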

Example 2.45 If a diagonal of a polygon is defined to be a line joining any two
non-adjacent vertices, how many diagonals are there in a polygon of
(i) 5 sides, (ii) 6 sides, (iii) n sides? SUJB (P)

Solution 2.45 (i) Number of ways to choose 2 points from 5
= ⁵C₂
= (5)(4)/2
= 10

So there are 10 possible lines to draw, but
as there are 5 sides, 5 of these are joining
adjacent vertices.
Therefore the number of diagonals = 10 − 5 = 5.

(ii) Similarly, for the polygon of 6 sides,
the number of diagonals = ⁶C₂ − 6
= (6)(5)/2 − 6
= 9
The number of diagonals for a polygon with 6 sides is 9.

(iii) For a polygon with n sides,
the number of diagonals = ⁿC₂ − n
= n(n−1)/2 − n
= (n² − n − 2n)/2
= n(n−3)/2

The number of diagonals for a polygon with n sides is n(n−3)/2.

Example 2.46 A certain family consists of Mother, Father and their ten sons.
(a) They are invited to send a group of four representatives to a
wedding. Evaluate the number of ways in which the group can
be formed, if it must contain (i) both parents; (ii) one and
only one parent; (iii) neither parent.
(b) On another occasion, the ten sons decide to play five-a-side
football. Evaluate the number of ways in which the teams can
be made up. Determine the probability that the two eldest
brothers are in the same team. (SUJB Additional)

Solution 2.46 (a) (i) If the group contains both parents,
number of ways to choose remaining 2 from 10 = ¹⁰C₂
= (10)(9) / ((2)(1))
= 45
If the group is to contain both parents, then it can be chosen in
45 ways.

(ii) Number of ways to choose one parent = 2.
Number of ways to choose remaining 3 from 10 = ¹⁰C₃
= (10)(9)(8) / ((3)(2)(1))
= 120
Therefore number of ways to choose the group of 4 = (2)(120)
= 240
If the group is to contain one, and only one parent, then it can be
chosen in 240 ways.

(iii) Number of ways to choose 4 from 10 = ¹⁰C₄
= (10)(9)(8)(7) / ((4)(3)(2)(1))
= 210
If the group is to contain neither parent, then the number of ways
in which it can be chosen is 210.

(b) Number of ways to choose 5 from 10 = ¹⁰C₅
= (10)(9)(8)(7)(6) / ((5)(4)(3)(2)(1))
= 252
When one team has been chosen, the other team is formed
automatically. But, since the pairs of teams are interchangeable, e.g.
ABCDE versus FGHIJ is the same as FGHIJ versus ABCDE, the
total number of ways in which the two teams can be formed is
½(252) = 126.
The two teams can be formed in 126 ways.

If the two eldest are in the same team,
the number of ways in which the remaining 3 can be chosen = ⁸C₃
= (8)(7)(6) / ((3)(2)(1))
= 56

Let E be the event ‘the two eldest are in the same team’, then
n(E) = 56.
If S is the possibility space, then n(S) = 126.
P(two eldest are in the same team) = P(E)
= n(E)/n(S)
= 56/126
= 4/9
The probability that the two eldest are in the same team is 4/9.
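
Part (b) can be confirmed by listing the teams. In this Python sketch (an addition, not from the original text) the sons are labelled 0 to 9, with 0 and 1 standing for the two eldest; that labelling is an assumption made purely for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Put son 0 in the first team so that each way of splitting the ten sons
# into two teams of five is counted once only.
splits = [{0, *others} for others in combinations(range(1, 10), 4)]

# Splits in which son 1 is in the same team as son 0.
together = sum(1 in team for team in splits)

p_same_team = Fraction(together, len(splits))
print(len(splits), together, p_same_team)  # 126 56 4/9
```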

SUMMARY — ARRANGEMENTS, PERMUTATIONS AND


COMBINATIONS

The number of ways of arranging n unlike objects in a line:
    n!

The number of ways of arranging in a line n objects of which p of one
type are alike, q of another type are alike, r of a third type are
alike, and so on:
    n!/(p! q! r! ...)

The number of ways of arranging n unlike objects in a ring when
clockwise and anticlockwise arrangements are different:
    (n−1)!

The number of ways of arranging n unlike objects in a ring when
clockwise and anticlockwise arrangements are the same:
    (n−1)!/2

The number of permutations of r objects taken from n unlike objects:
    ⁿPᵣ = n!/(n−r)!

The number of combinations of r objects taken from n unlike objects:
    ⁿCᵣ = n!/(r!(n−r)!)

Exercise 2l

1. From a group of 10 boys and 8 girls, 2 pupils are chosen at random.
Find the probability that they are both girls.

2. From a group of 6 men and 8 women, 5 people are chosen at random.
Find the probability that there are more men chosen than women.

3. From a bag containing 6 white counters and 8 blue counters, 4
counters are chosen at random. Find the probability that 2 white
counters and 2 blue counters are chosen.

4. From a group of 10 people, 4 are to be chosen to serve on a
committee.
(a) In how many different ways can the committee be chosen?
(b) Among the 10 people there is one married couple. Find the
probability that both the husband and the wife will be chosen.
(c) Find the probability that the 3 youngest people will be chosen.

5. Four persons are chosen at random from a group of ten persons
consisting of four men and six women. Three of the women are sisters.
Calculate the probabilities that the four persons chosen will:
(i) consist of four women, (ii) consist of two women and two men,
(iii) include the three sisters. (JMB)

6. A touring party of 20 cricketers consists of 9 batsmen, 8 bowlers
and 3 wicket keepers. A team of 11 players must have at least 5
batsmen, 4 bowlers and 1 wicket keeper. How many different teams can be
selected, (a) if all the players are available for selection, (b) if 2
batsmen and 1 bowler are injured and cannot play?

7. Find the number of ways in which 10 different books can be shared
between a boy and a girl if each is to receive an even number of books.
8. Four letters are picked from the word BREAKDOWN. What is the
probability that there is at least one vowel among the letters?

9. Eight people sit in a minibus: 4 on the sunny side and 4 on the
shady side. If 2 people want to sit on opposite sides to each other,
another 2 people want to sit on the shady side, in how many ways can
this be done?

10. Disco lights are arranged in a vertical line. How many different
arrangements can be made from 2 green, 3 blue and 4 red lights (a) if
all 9 lights are used, (b) if at least 8 lights are used?

11. A group consisting of 10 boys and 11 girls attends a course for
special games coaching.
(a) When they are introduced, each person hands a card containing his
or her photograph, name and address to every other member of the
group. State the total number of cards which are exchanged.
(b) Five boys are selected for basketball and six girls for netball.
Find the number of different possible selections for each of these.
(c) Five particular boys and five particular girls are selected and
placed in mixed pairs for tennis. Find the total number of different
mixed pairs which can be made using these ten children.
(d) If 4 children are chosen at random from the whole group find the
probability that there is a majority of girls in the 4 selected.
(L Additional)

12. To enter a cereal competition, competitors have to choose the 8
most important features of a new car, from a possible 12 features, then
list the 8 in order of preference. Each cereal packet entry form
contains space for 5 entries. A correct entry wins a new car.
(a) What is the probability that a housewife wins a new car if she
completes the entry form from one packet?
(b) How many entry forms would she need to complete, each entry
showing different arrangements, if the probability that she wins a car
is to be at least 0.8?

13. Three letters are selected at random from the word BIOLOGY. Find
the probability that the selection (a) does not contain the letter O,
(b) contains both the letter O’s.

14. How many even numbers can be formed with the digits 3, 4, 5, 6, 7
by using some or all of the numbers (repetitions are not allowed)?

15. Different coloured pegs, each of which is painted in one and only
one of the six colours red, white, black, green, blue and yellow, are
to be placed in four holes, as shown in the figure, with one peg in
each hole.

[Figure: the four holes]

Pegs of the same colour are indistinguishable. Calculate how many
different arrangements of pegs placed in the four holes so that they
are all occupied can be made from
(a) six pegs, all of different colours,
(b) two red and two white pegs,
(c) two red, one white and one black peg,
(d) twelve pegs, two of each colour. (L Additional)

16. (a) Calculate how many different numbers altogether can be formed
by taking one, two, three and four digits from the digits 9, 8, 3 and
2, repetitions not being allowed.
(b) Calculate how many of the numbers in part (a) are odd and greater
than 800.
(c) If one of the numbers in part (a) is chosen at random, calculate
the probability that it will be greater than 300. (L Additional)

17. The positions of nine trees which are to be planted along the sides
of a road, five on the north side and four on the south side, are shown
in the figure.

[Figure: five tree positions on the north side (N) of the road and four on the south side (S)]

(a) Find the number of ways in which this can be done if the trees are
all of different species.
(b) If the trees in (a) are planted at random, find the probability
that two particular trees are next to each other on the same side of
the road.
(c) If there are 3 cupressus, 4 prunus and 2 magnolias, find the number
of different ways in which these could be planted assuming that trees
of the same species are identical.
(d) If the trees in (c) are planted at random, find the probability
that the 2 magnolias are on the opposite sides of the road.
(L Additional)
18. A committee consisting of 6 persons is to be selected from 5 women
and 6 men.
(a) Calculate the number of ways in which the chosen committee will
contain exactly two men.
(b) Given that the committee is to contain at least 2 men, show that it
can be selected in 456 ways.
(c) Given that these 456 ways are equally likely, calculate the
probability that there will be more men than women on the committee.
(d) At a meeting the members of the chosen committee sit at a
rectangular table in the fixed seats illustrated in the diagram:

[Diagram: rectangular table with six fixed seats]

(i) Given that each may sit in any of the six places, calculate the
number of different ways they may be seated at the table.
(ii) Given that the committee consists of 3 men and 3 women and that
the men and women must sit alternately round the table, calculate in
how many different ways they may be seated. (L Additional)

MISCELLANEOUS WORKED EXAMPLES

Example 2.47 The events A and B are such that P(A) = 1/3, P(B) = 2/5 and
P(B|Ā) = 11/20.
Find

(a) P(A∩B),
(b) P(A∪B),
(c) P(Ā|B),
(d) P(A|B).
State whether A and B are (i) independent, (ii) mutually exclusive.

Solution 2.47 Consider a possibility space S and let n(S) = n.
Consider events A and B such that n(A) = r and n(B) = s,
n(A∩B) = t.
First, draw a Venn diagram, showing this information:

[Venn diagram: sets A and B within S, showing the regions A∩B and B∩Ā]

(a) Now P(A) = n(A)/n(S) = r/n, but P(A) = 1/3, so r/n = 1/3.

Also P(B) = n(B)/n(S) = s/n, but P(B) = 2/5, so s/n = 2/5.

Now
P(B|Ā) = P(B∩Ā)/P(Ā) = n(B∩Ā)/n(Ā) = (s − t)/(n − r) = (s/n − t/n)/(1 − r/n)

But we are given that P(B|Ā) = 11/20, therefore

11/20 = (s/n − t/n)/(1 − r/n)
= (2/5 − t/n)/(1 − 1/3)
(2/3)(11/20) = 2/5 − t/n
t/n = 2/5 − 11/30
= 1/30
Now P(A∩B) = n(A∩B)/n(S) = t/n
Therefore P(A∩B) = 1/30.

(b) P(A∪B) = P(A) + P(B) − P(A∩B)
= 1/3 + 2/5 − 1/30
= 7/10
Therefore P(A∪B) = 7/10.

(c) P(Ā|B) = P(Ā∩B)/P(B)
= n(Ā∩B)/n(B)
= (s − t)/s
= (s/n − t/n)/(s/n)
= (2/5 − 1/30)/(2/5)
= 11/12

Therefore P(Ā|B) = 11/12.

(d) P(A|B) = P(A∩B)/P(B)
= (1/30)/(2/5)
= 1/12
Therefore P(A|B) = 1/12.

NOTE: P(A|B) + P(Ā|B) = 1.

(i) If two events are independent then P(A∩B) = P(A)·P(B).

Now P(A∩B) = 1/30 and P(A)·P(B) = (1/3)(2/5) = 2/15.
So P(A∩B) ≠ P(A)·P(B) and the events A and B are not independent.

(ii) If two events are mutually exclusive, then P(A∩B) = 0.
So, as P(A∩B) ≠ 0, A and B are not mutually exclusive.

Example 2.48 Tung-Pong and Ping-Ho play a game of table tennis. The score
reaches 20-20. The game continues until one player has scored two
more points than the other.
The probability that Tung-Pong wins each point is 0.6. What are
the probabilities that:
(a) Tung-Pong wins the game after 2 further points?
(b) Ping-Ho wins the game after 2 further points?
(c) The score is 21-21 after 2 further points?
(d) Tung-Pong wins the game after 3 further points?
(e) Tung-Pong wins the game after 4 further points?
(f) Tung-Pong wins the game after 6 further points?
If the game can continue indefinitely, for each player what is the
probability that he will ultimately win? (SUJB Additional)
Solution 2.48 Let W be the event ‘Tung wins a point’.
Then P(W) = 0.6 and P(W̄) = 0.4.
(a) P(Tung wins after 2 further points) = P(WW)
= (0.6)(0.6)
= 0.36
The probability that Tung wins after 2 further points is 0.36.

(b) P(Ping wins after 2 further points) = P(W̄W̄)
= (0.4)(0.4)
= 0.16
The probability that Ping wins after 2 further points is 0.16.

(c) P(score is 21-21 after 2 further points) = P(WW̄) + P(W̄W)
= 2(0.6)(0.4)
= 0.48
The probability that the score is 21-21 after 2 further points is 0.48.

(d) To consider the situation after 3 further points we look first at
the situation after 2 further points. Now after 2 further points
either Tung has won, or Ping has won, or the score is 21-21. If the
score is 21-21 then there is no way that Tung can win after just one
more point.
So the probability that Tung wins after 3 further points is 0.

(e) After 4 further points, if Tung wins, the sequence of points
must be
(WW̄)(WW)
or (W̄W)(WW)

So
P(Tung wins after 4 further points) = P(WW̄ WW) + P(W̄W WW)
= 2(0.6)³(0.4)
= 0.1728

The probability that Tung wins after 4 further points is 0.1728.

(f) If Tung wins after 6 further points, the sequence must be a level
pair (WW̄ or W̄W, 2 ways), then another level pair (2 ways), then WW
(1 way).

So P(Tung wins after 6 further points) = 4P(WW̄ WW̄ WW)
= 4(0.6)⁴(0.4)²
= 0.0829 (3 S.F.)
The probability that Tung wins after 6 further points is 0.0829
(3 S.F.).

If the game can continue indefinitely,
P(Tung wins) = (0.6)² + 2(0.6)³(0.4) + 4(0.6)⁴(0.4)² + ...
= (0.6)²(1 + 2(0.6)(0.4) + 4(0.6)²(0.4)² + ...)
= (0.6)²(1 + 0.48 + 0.48² + ...)
= (0.6)² (1/(1 − 0.48)) (sum of an infinite G.P., common ratio 0.48)
= 0.36/0.52
= 9/13

Therefore
P(Ping wins) = 1 − 9/13
= 4/13

The probability that Tung wins is 9/13 and the probability that Ping
wins is 4/13.
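
The pattern in parts (a) to (f) and the sum to infinity can be sketched in Python with exact fractions. This is an addition, not part of the original text; the names `tung_wins_after`, `ratio`, `tung_total` and `ping_total` are illustrative assumptions.

```python
from fractions import Fraction

p = Fraction(3, 5)   # P(Tung wins a point) = 0.6
q = 1 - p            # P(Ping wins a point) = 0.4
ratio = 2 * p * q    # P(scores level again after 2 further points) = 0.48


def tung_wins_after(points: int) -> Fraction:
    """P(Tung wins the game after exactly `points` further points)."""
    if points < 2 or points % 2:
        return Fraction(0)  # the game can only end after an even number
    # (points/2 - 1) level pairs, then Tung wins two points in a row.
    return ratio ** (points // 2 - 1) * p**2


# Sum of the infinite G.P. with first term p^2 (or q^2) and common ratio 0.48.
tung_total = p**2 / (1 - ratio)
ping_total = q**2 / (1 - ratio)

print([float(tung_wins_after(k)) for k in (2, 3, 4, 6)])
print(tung_total, ping_total)  # 9/13 4/13
```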

Example 2.49 (a) A bag contains 5 red and 4 blue balls. 3 balls are picked out,
one at a time, and are not replaced. Find the probability that at
least 1 of the 3 balls is blue.

(b) One letter is selected from each of the names: SIMMS, SMITH,
THOMPSON. What is the probability that 2, and only 2 are the
same?

(c) A candidate attempts a question to which 5 possible answers
have been given, one of them correct. For any question, there is
a probability of 1/3 that he knows the correct answer. If he does
not know the correct answer he will mark one of the answers at
random. He does, in fact, mark the correct answer. What is the
probability that he knew the correct answer? (SUJB)

Solution 2.49 (a) Using an obvious notation,
P(3 red balls) = P(R₁∩R₂∩R₃)
= (5/9)(4/8)(3/7) (non-independent events)
= 5/42
So P(at least one ball is blue) = 1 − P(3 balls are red)
= 1 − 5/42
= 37/42
The probability that at least one of the 3 balls is blue is 37/42.

(b) SIMMS SMITH THOMPSON

Let the event S₁ be ‘choosing an S from the first name’, and so on.
If one letter is selected from each name, then if 2 and only 2 letters
are the same the possible outcomes are listed below, with their
respective probabilities:

P(S₁S₂S̄₃) = (2/5)(1/5)(7/8) = 0.07
P(S₁S̄₂S₃) = (2/5)(4/5)(1/8) = 0.04
P(S̄₁S₂S₃) = (3/5)(1/5)(1/8) = 0.015
P(I₁I₂Ī₃) = (1/5)(1/5)(1) = 0.04
P(M₁M₂M̄₃) = (2/5)(1/5)(7/8) = 0.07
P(M₁M̄₂M₃) = (2/5)(4/5)(1/8) = 0.04
P(M̄₁M₂M₃) = (3/5)(1/5)(1/8) = 0.015
P(H̄₁H₂H₃) = (1)(1/5)(1/8) = 0.025
P(T̄₁T₂T₃) = (1)(1/5)(1/8) = 0.025
Total 0.340

Therefore P(2 and only 2 letters are the same) = 0.34.

(c) Let K be the event ‘he knows the correct answer’, then
P(K) = 1/3.

Let M be the event ‘he marks the correct answer’.

Now P(M|K̄) = 1/5 as he marks an answer at random if he does not
know the correct answer.

[Tree diagram: knowing the answer, then marking the correct answer]
P(K∩M) = (1/3)(1) = 1/3
P(K∩M̄) = 0
P(K̄∩M) = (2/3)(1/5) = 2/15

We require P(K|M) = P(K∩M)/P(M)
Now P(M) = P(M∩K) + P(M∩K̄)
= 1/3 + 2/15
= 7/15
So P(K|M) = (1/3)/(7/15)
= 5/7

The probability that he knew the correct answer, given that he
marked the correct answer, is 5/7.
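
The conditional probability in part (c) can be reproduced with exact fractions. The following Python sketch is an addition, not from the original text, and the variable names are chosen here for illustration.

```python
from fractions import Fraction

p_knows = Fraction(1, 3)             # P(K)
p_mark_given_knows = Fraction(1)     # P(M|K): he is certain to mark it
p_mark_given_guess = Fraction(1, 5)  # P(M|not K): one of 5 at random

# P(M) = P(M and K) + P(M and not K)
p_mark = p_knows * p_mark_given_knows + (1 - p_knows) * p_mark_given_guess

# P(K|M) = P(K and M) / P(M)
p_knows_given_mark = p_knows * p_mark_given_knows / p_mark

print(p_mark)              # 7/15
print(p_knows_given_mark)  # 5/7
```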

Example 2.50 (a) A bag contains a number of counters, alike in shape and size,
but x are red and y are green. Counters are to be chosen at random
from the bag. Prove that the probability that the second
counter chosen will be red is the same, whether the first counter
is replaced or not before the second is drawn.
(b) A three-figure number, not less than 100, is to be made up
using three digits selected at random from the digits 0, 1, 2, 3, 4,
5, 6, 7, 8, 9 WITHOUT using the same digit twice in any number.
Show that the total possible number of numbers is 648.
Calculate the probabilities: (i) that the number is even, (ii) that
the number is divisible by 5, (iii) that the number is greater
than 600, (iv) that the number is even and greater than 600.
What are the corresponding results for (i) and (ii) if the same
digit may be used two or three times in the same number?
(SUJB)

Solution 2.50 (a) The bag contains x red counters and y green counters.
When the first counter is replaced

With obvious notation P(R₂) = x/(x+y)

When the first counter is not replaced

[Tree diagram: 1st counter, then 2nd counter]
P(R₁∩R₂) = x(x−1) / ((x+y)(x+y−1))
P(G₁∩R₂) = yx / ((x+y)(x+y−1))

P(R₂) = P(R₁∩R₂) + P(G₁∩R₂) (mutually exclusive events)
= x(x−1) / ((x+y)(x+y−1)) + yx / ((x+y)(x+y−1))
= x(x−1+y) / ((x+y)(x+y−1))
= x/(x+y)

Therefore the probability that the second counter is red is x/(x+y)
whether or not the first counter is replaced. This is because there
is no condition placed on the first counter so that any one of the
x+y counters is equally likely to be the second counter.

(b) 1st digit can be chosen in 9 ways (0 not included)
2nd digit can be chosen in 9 ways (0 included here)
3rd digit can be chosen in 8 ways
So, total number of ways = (9)(9)(8)
= 648
Therefore the possibility space is 648 equally likely outcomes.

(i) If the number is even

Either the 3rd digit is 0:
3rd digit chosen in 1 way
2nd digit chosen in 9 ways
1st digit chosen in 8 ways
Number of ways = (1)(9)(8) = 72

or the 3rd digit is 2, 4, 6 or 8:
3rd digit chosen in 4 ways
1st digit chosen in 8 ways (0 excluded)
2nd digit chosen in 8 ways
Number of ways = (4)(8)(8) = 256

Therefore the number of ways in which the number is even = 328.
P(number is even) = 328/648 = 41/81

(ii) If the number is divisible by 5

Either the 3rd digit is 0:
3rd digit chosen in 1 way
1st digit chosen in 9 ways
2nd digit chosen in 8 ways
Number of ways = (1)(9)(8) = 72

or the 3rd digit is 5:
3rd digit chosen in 1 way
1st digit chosen in 8 ways (0 excluded)
2nd digit chosen in 8 ways
Number of ways = (1)(8)(8) = 64

Therefore the number of ways in which the number is divisible by
5 = 136.
P(number is divisible by 5) = 136/648 = 17/81

(iii) If the number is greater than 600
1st digit can be chosen in 4 ways (from 6, 7, 8, 9)
2nd digit can be chosen in 9 ways
3rd digit can be chosen in 8 ways
Number of ways = (4)(9)(8)
= 288

P(number is greater than 600) = 288/648 = 4/9

(iv) If the number is even and greater than 600

Either the 1st digit is 6 or 8:
1st digit chosen in 2 ways
3rd digit chosen in 4 ways
2nd digit chosen in 8 ways
Number of ways = (2)(4)(8) = 64

or the 1st digit is 7 or 9:
1st digit chosen in 2 ways
3rd digit chosen in 5 ways
2nd digit chosen in 8 ways
Number of ways = (2)(5)(8) = 80

The number of ways in which the number is even and greater than
600 = 144.

P(number is even and greater than 600) = 144/648 = 2/9

Now consider the case when the same digit may be used two or
three times:

If the number is even

We are concerned with the 3rd digit, which can be chosen in 5 ways.
If there was no restriction, this could be chosen in 10 ways.

P(number is even) = 5/10 = 1/2

If the number is divisible by 5

We are concerned with the 3rd digit which can be chosen in 2 ways.

P(number is divisible by 5) = 2/10 = 1/5
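
The counts in part (b) for numbers with no repeated digit can be checked by listing all 648 numbers. The Python sketch below is an addition, not part of the original text; `numbers` and `prob` are hypothetical names.

```python
from fractions import Fraction
from itertools import permutations

# Three-figure numbers with three different digits and a non-zero 1st digit.
numbers = [100 * a + 10 * b + c
           for a, b, c in permutations(range(10), 3) if a != 0]


def prob(condition) -> Fraction:
    """Proportion of the equally likely numbers that satisfy `condition`."""
    return Fraction(sum(1 for m in numbers if condition(m)), len(numbers))


p_even = prob(lambda m: m % 2 == 0)
p_div_by_5 = prob(lambda m: m % 5 == 0)
p_over_600 = prob(lambda m: m > 600)
p_even_over_600 = prob(lambda m: m % 2 == 0 and m > 600)

print(len(numbers))  # 648
print(p_even, p_div_by_5, p_over_600, p_even_over_600)  # 41/81 17/81 4/9 2/9
```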

Example 2.51 (a) A and B play a game as follows: an ordinary die is rolled and if
a six is obtained then A wins and if a one is obtained then B
wins. If neither a six nor a one is obtained then the die is rolled
again until a decision can be made. What is the probability that
A wins on (i) the first roll, (ii) the second roll, (iii) the rth
roll? What is the probability that A wins?

(b) A bag contains 4 red and 3 yellow balls and another bag con-
tains 3 red and 4 yellow. A ball is taken from the first bag and
placed in the second, the second bag is shaken and a ball taken
from it and placed in the first bag. If a ball is now taken from
the first bag what is the probability that it is red?
(You are advised to draw a tree diagram.) (SUJB)

Solution 2.51 (a) P(6 is obtained) = 1/6 and P(1 is obtained) = 1/6.

(i) P(A wins on the first roll) = P(6 is obtained) = 1/6

(ii) P(A wins on the second roll)
        = P(neither a 6 nor a 1 on the first roll) · P(6 on second roll)
        = (4/6)(1/6)
        = 1/9

(iii) P(A wins on the rth roll)
        = P(neither a 6 nor a 1 on 1st r - 1 rolls) · P(6 on the rth roll)
        = (2/3)^(r-1) (1/6)

These are mutually exclusive events.

So P(A wins) = 1/6 + (2/3)(1/6) + (2/3)^2 (1/6) + ... to infinity
             = (1/6)[1 + (2/3) + (2/3)^2 + ...]
             = (1/6) S

where S is the sum of an infinite G.P. with a = 1, r = 2/3.

P(A wins) = (1/6) · 1/(1 - 2/3)
          = (1/6)(3)
          = 1/2

The probability that A wins is 1/2.
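
The convergence of this series can be seen numerically. The sketch below adds up the probability that A wins on roll 1, 2, ..., r and compares it with the closed form a/(1 - r) scaled by 1/6; the function name p_a_wins_by is an illustrative choice.

```python
from fractions import Fraction

P_SIX = Fraction(1, 6)      # A wins on a given roll
P_NEITHER = Fraction(2, 3)  # neither a six nor a one, so roll again


def p_a_wins_by(r):
    """Probability that A has won on or before roll r."""
    return sum(P_NEITHER ** (k - 1) * P_SIX for k in range(1, r + 1))


# Sum to infinity of the G.P. with first term 1/6 and common ratio 2/3.
closed_form = P_SIX / (1 - P_NEITHER)

if __name__ == "__main__":
    for r in (1, 2, 5, 10, 50):
        print(r, float(p_a_wins_by(r)))
    print("sum to infinity:", closed_form)
```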



(b) First we show the possibilities diagrammatically:

[Diagram: contents of the two bags after each transfer]

The tree diagram to show the possible outcomes and the probabilities
is as follows:

[Tree diagram: Draw from 1st bag | Draw from 2nd bag | Draw from 1st bag]

P(RRR) = (4/7)(4/8)(4/7) = 64/392
P(RYR) = (4/7)(4/8)(3/7) = 48/392
P(YRR) = (3/7)(3/8)(5/7) = 45/392
P(YYR) = (3/7)(5/8)(4/7) = 60/392

P(red from 1st bag) = (1/392)(64 + 48 + 45 + 60)
                    = 217/392
                    = 0.554 (3 d.p.)
The probability that the ball is red is 0.554 (3 d.p.).
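
The four branches of the tree can also be generated mechanically. The sketch below follows each possible colour of the two transferred balls, updates the contents of the bags, and adds up the probability of finally drawing red from the first bag; the function name and the (red, yellow) tuple layout are assumptions made for illustration.

```python
from fractions import Fraction


def p_red_final(bag1=(4, 3), bag2=(3, 4)):
    """Bags are (red, yellow). Move a ball from bag 1 to bag 2, then a ball
    from bag 2 to bag 1, then return P(red) for a draw from bag 1."""
    total = Fraction(0)
    for first in "RY":
        p1 = Fraction(bag1[0] if first == "R" else bag1[1], sum(bag1))
        # Contents after the first transfer.
        b1 = (bag1[0] - (first == "R"), bag1[1] - (first == "Y"))
        b2 = (bag2[0] + (first == "R"), bag2[1] + (first == "Y"))
        for second in "RY":
            p2 = Fraction(b2[0] if second == "R" else b2[1], sum(b2))
            # Contents of bag 1 after the second transfer.
            b1_final = (b1[0] + (second == "R"), b1[1] + (second == "Y"))
            p3 = Fraction(b1_final[0], sum(b1_final))
            total += p1 * p2 * p3
    return total


if __name__ == "__main__":
    p = p_red_final()
    print(p, round(float(p), 3))
```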

Example 2.52 (a) A pack of 52 playing cards is cut at random into three piles.
Find the probability that the top cards are all (i) black,
(ii) hearts, (iii) aces.
After the top cards have been examined and found not to be
picture cards, calculate the probability that the three bottom
cards are all queens.

(b) A bag contains eight black counters and two white ones. Each
of two players, A and B, draws one counter in turn, without
replacement, until one of them wins by drawing a white counter.
A draws first. Calculate his chance of winning. (AEB 1975)

Solution 2.52 (a) The possibility space S = (the pack of 52 cards).

(i) There are 26 black cards in the pack, so with obvious notation,

P(B1 B2 B3) = (26/52)(25/51)(24/50)     (non-independent events)
            = 2/17
The probability that the top cards are all black is 2/17.

(ii) P(H1 H2 H3) = (13/52)(12/51)(11/50)     (non-independent events)
                 = 11/850
The probability that the top cards are all hearts is 11/850.

(iii) P(A1 A2 A3) = (4/52)(3/51)(2/50)     (non-independent events)
                  = 1/5525
The probability that the top cards are all aces is 1/5525.

If the three top cards have been examined and found not to be
picture cards, the possibility space has been reduced by 3 members
so that S consists of the 49 remaining cards.
The number of remaining queens = 4.

Therefore P(Q1 Q2 Q3) = (4/49)(3/48)(2/47) = 1/4606
The probability that the three bottom cards are queens is 1/4606.
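
Each of these answers is a product of the same form, with the number of favourable cards and the size of the pack both falling by one at every draw. A short sketch of that pattern (the helper name p_all_from is illustrative):

```python
from fractions import Fraction


def p_all_from(favourable, pack_size, draws=3):
    """P(every one of `draws` cards, taken without replacement, comes from
    a group of `favourable` cards in a pack of `pack_size`)."""
    p = Fraction(1)
    for k in range(draws):
        p *= Fraction(favourable - k, pack_size - k)
    return p


if __name__ == "__main__":
    print("all black :", p_all_from(26, 52))
    print("all hearts:", p_all_from(13, 52))
    print("all aces  :", p_all_from(4, 52))
    print("all queens, 49 cards left:", p_all_from(4, 49))
```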
(b) Let A1 be the event 'A wins on the first draw',
A2 be the event 'A wins on the second draw', and so on.
Then, with obvious notation for the black and white counters,

P(A1) = P(W) = 2/10 = 1/5
P(A2) = P(BBW) = (8/10)(7/9)(2/8) = 7/45
P(A3) = P(BBBBW) = (8/10)(7/9)(6/8)(5/7)(2/6) = 1/9
P(A4) = P(BBBBBBW) = (8/10)(7/9)(6/8)(5/7)(4/6)(3/5)(2/4) = 1/15
P(A5) = P(BBBBBBBBW)
      = (8/10)(7/9)(6/8)(5/7)(4/6)(3/5)(2/4)(1/3)(2/2) = 1/45

P(A wins) = P(A1) + P(A2) + P(A3) + P(A4) + P(A5)
                                     (mutually exclusive events)
          = 1/5 + 7/45 + 1/9 + 1/15 + 1/45
          = 5/9
Therefore the probability that A wins is 5/9.
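
The alternating draws can be followed step by step in code. The sketch below tracks the probability that every earlier counter was black and credits the first player whenever the first white counter appears on an odd-numbered draw; the function name is illustrative and at least one white counter is assumed.

```python
from fractions import Fraction


def p_first_player_wins(black=8, white=2):
    """Two players draw alternately without replacement; the first to draw
    a white counter wins. Returns P(the player who draws first wins)."""
    p_win = Fraction(0)
    p_all_black_so_far = Fraction(1)
    remaining = black + white
    draw = 1
    while True:
        p_white_now = Fraction(white, remaining)
        if draw % 2 == 1:  # odd-numbered draws belong to the first player
            p_win += p_all_black_so_far * p_white_now
        if black == 0:     # only white counters left, so the game is over
            break
        p_all_black_so_far *= Fraction(black, remaining)
        black -= 1
        remaining -= 1
        draw += 1
    return p_win


if __name__ == "__main__":
    print(p_first_player_wins())
```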

Example 2.53 (a) Ruby Welloff, the daughter of a wealthy jeweller, is about to
get married. Her father decides that as a wedding present she
can select one of two similar boxes. Each box contains three
stones. In one box two of the stones are real diamonds, and the
other is a worthless imitation; and in the other box one is a real
diamond, and the other two are worthless imitations. She has
no idea which box is which. If the daughter were to choose
randomly between the two boxes, her chance of getting two
real diamonds would be 1/2. Mr Welloff, being a sporting type,
allows his daughter to draw one stone from one of the boxes
and to examine it to see if it is a real diamond. The daughter
decides to take the box that the stone she tested came from if
the tested stone is real, and to take the other box otherwise.
Now what is the probability that the daughter will get two real
diamonds as her wedding present?
(b) A fair die is cast; then n fair coins are tossed, where n is the
number shown on the die. What is the probability of exactly
two heads?
(c) A fair die is thrown for as long as necessary for a 6 to turn up.
Given that 6 does not turn up at the first throw, what is the
probability that more than four throws will be necessary?
(AEB 1979)

Solution 2.53 (a) Let A be the event ‘she chooses the box with 2 diamonds’,
B be the event ‘she chooses the box with 1 diamond’,
D be the event ‘she chooses a diamond from the box’.

[Tree diagram: Choosing box | Choosing stone]

Writing D' for the event 'she chooses a worthless stone from the box',

P(A ∩ D) = (1/2)(2/3) = 1/3
P(B ∩ D') = (1/2)(2/3) = 1/3

Now, she takes the box if the tested stone is real,
she takes the other box if the tested stone is worthless.

So the probability of getting 2 real diamonds = P(A ∩ D) + P(B ∩ D')
                                               = 1/3 + 1/3
                                               = 2/3

Therefore the probability that she has 2 diamonds for her wedding
present is 2/3.
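
The daughter's strategy can be checked by listing the two boxes and, for each, the chance that the tested stone leads her to the box with two diamonds. This is a small sketch with illustrative names, not part of the original solution.

```python
from fractions import Fraction


def p_two_diamonds():
    """Keep the tested box if the stone is real, otherwise switch boxes.
    Returns P(she ends up with the box holding two real diamonds)."""
    diamonds_in_box = {"A": 2, "B": 1}  # each box holds three stones
    p_total = Fraction(0)
    for box, diamonds in diamonds_in_box.items():
        p_box = Fraction(1, 2)
        p_real = Fraction(diamonds, 3)
        if box == "A":
            p_total += p_box * p_real        # real stone, keeps box A
        else:
            p_total += p_box * (1 - p_real)  # worthless stone, switches to A
    return p_total


if __name__ == "__main__":
    print(p_two_diamonds())
```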

(b) Number on the die Number of heads and tails required


(i) 2 2H
(ii) 3 2H, 1T
(iii) 4 2H, 2T
(iv) 5 2H, 3T
(v) 6 2H, 4T
We will consider first the probability of obtaining the required
number of heads and tails for each of the situations:

(i) Two coins are tossed
The possibility space consists of 2^2 equally likely outcomes.
So P(2H) = 1/4

(ii) Three coins are tossed
The possibility space consists of 2^3 equally likely outcomes.
Number of ways of arranging H, H, T = 3!/2! = 3
So P(2H, 1T) = 3/8

(iii) Four coins are tossed
The possibility space consists of 2^4 equally likely outcomes.
Number of ways of arranging H, H, T, T = 4!/(2!2!) = 6.
So P(2H, 2T) = 6/16 = 3/8

(iv) Five coins are tossed
The possibility space consists of 2^5 equally likely outcomes.
Number of ways of arranging H, H, T, T, T = 5!/(2!3!) = 10.
So P(2H, 3T) = 10/32 = 5/16

(v) Six coins are tossed
The possibility space consists of 2^6 equally likely outcomes.
Number of ways of arranging H, H, T, T, T, T = 6!/(2!4!) = 15.
So P(2H, 4T) = 15/64


Let P(n) be the probability that n is shown on the die.

P(exactly 2 heads) = (1/4)P(2) + (3/8)P(3) + (3/8)P(4) + (5/16)P(5) + (15/64)P(6)

But P(2) = P(3) = P(4) = P(5) = P(6) = 1/6

Therefore
P(exactly 2 heads) = (1/6)(1/4 + 3/8 + 3/8 + 5/16 + 15/64)
                   = 33/128

The probability that exactly two heads are obtained is 33/128.

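
The five cases combine into a single sum over the die score n, each term being (1/6) times the number of arrangements with two heads divided by 2^n. A brief sketch using math.comb for the arrangement counts (the function name is illustrative):

```python
from fractions import Fraction
from math import comb


def p_exactly_two_heads():
    """Roll a fair die, toss that many fair coins, and return
    P(exactly two heads). comb(1, 2) is 0, so a score of 1 adds nothing."""
    return sum(
        Fraction(1, 6) * Fraction(comb(n, 2), 2 ** n)
        for n in range(1, 7)
    )


if __name__ == "__main__":
    print(p_exactly_two_heads())
```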

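(c) Let 6_1 be the event 'a 6 is obtained on the first throw' and so
on, and let 6_1' be the event 'a 6 is not obtained on the first throw'.

We require P(6_2' 6_3' 6_4' | 6_1') = P(6_1' 6_2' 6_3' 6_4') / P(6_1')
                                    = (5/6)^4 / (5/6)
                                    = (5/6)^3
                                    = 125/216

The probability that more than four throws will be necessary is 125/216.
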
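
The conditional probability here is a ratio of two 'no six so far' probabilities, which a couple of lines make explicit. The helper name below is an illustrative choice.

```python
from fractions import Fraction


def p_no_six_in(k):
    """P(no 6 appears in the first k throws of a fair die)."""
    return Fraction(5, 6) ** k


# P(more than four throws needed | no 6 on the first throw)
p_conditional = p_no_six_in(4) / p_no_six_in(1)

if __name__ == "__main__":
    print(p_conditional)
```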

Example 2.54 (a) When a person needs a minicab, it is hired from one of three
firms, X, Y and Z. Of the hirings 40% are from X, 50% are from
Y and 10% are from Z. For cabs hired from X, 9% arrive late,
the corresponding percentages for cabs hired from firms Y and
Z being 6% and 20% respectively. Calculate the probability
that the next cab hired
(i) will be from X and will not arrive late,
(ii) will arrive late.
Given that a call is made for a minicab and that it arrives late,
find, to 3 decimal places, the probability that it came from Y.
(b) For a certain strain of wallflower, the probability that, when
sown, a seed produces a plant with yellow flowers is 1/6. Find
the minimum number of seeds that should be sown in order
that the probability of obtaining at least one plant with yellow
flowers is greater than 0.98. (L)

Solution 2.54 (a) Let L be the event 'the cab arrives late' and L' the
event 'the cab does not arrive late'.

[Tree diagram: Choice of firm | Arriving late or not]

* P(L ∩ X) = (0.09)(0.4)
  P(L' ∩ X) = (0.91)(0.4)
* P(L ∩ Y) = (0.06)(0.5)
* P(L ∩ Z) = (0.2)(0.1)

(i) From the diagram
P(L' ∩ X) = P(L' | X) · P(X)
          = (0.91)(0.4)
          = 0.364
Therefore the probability that the next cab hired will be
from X and will not arrive late is 0.364.

(ii) P(L) = P(L ∩ X) + P(L ∩ Y) + P(L ∩ Z)
          = P(L | X) · P(X) + P(L | Y) · P(Y) + P(L | Z) · P(Z)
                                     (marked with * on diagram)
          = (0.09)(0.4) + (0.06)(0.5) + (0.2)(0.1)
          = 0.086
Therefore the probability that the next cab hired will arrive
late is 0.086.

We require P(Y | L) = P(L | Y) · P(Y) / P(L)
                    = (0.06)(0.5) / 0.086
                    = 0.349 (3 d.p.)
Therefore, given that the cab arrives late, the probability
that it came from Y is 0.349 (3 d.p.).
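
The same calculation can be laid out as a table of firms, which makes it easy to read off the other conditional probabilities as well. A minimal sketch, with dictionary names chosen for illustration:

```python
from fractions import Fraction

# Share of hirings and probability of arriving late, per firm.
share = {"X": Fraction(40, 100), "Y": Fraction(50, 100), "Z": Fraction(10, 100)}
late = {"X": Fraction(9, 100), "Y": Fraction(6, 100), "Z": Fraction(20, 100)}

# (i) from X and not late
p_x_not_late = share["X"] * (1 - late["X"])

# (ii) total probability of arriving late
p_late = sum(share[f] * late[f] for f in share)

# P(firm | late) for every firm
p_firm_given_late = {f: share[f] * late[f] / p_late for f in share}

if __name__ == "__main__":
    print("P(X and not late) =", float(p_x_not_late))
    print("P(late) =", float(p_late))
    for firm, p in p_firm_given_late.items():
        print("P(%s | late) = %.3f" % (firm, float(p)))
```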

(b) When a seed is sown, P(yellow flower) = 1/6.
When n seeds are sown,
P(at least one yellow flower) = 1 - P(no yellow flowers)
                              = 1 - (5/6)^n
Now we need
P(at least one yellow flower) > 0.98

so 1 - (5/6)^n > 0.98
       (5/6)^n < 0.02
Taking logs of both sides    n log(5/6) < log 0.02

Dividing both sides by log(5/6) and reversing the inequality since
log(5/6) is negative, we have
n > log 0.02 / log(5/6)
n > 21.45...
Therefore the minimum number of seeds that should be sown
is 22.
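
The answer can be confirmed by trying successive values of n until the requirement is met. The sketch below does this with exact fractions and also evaluates the logarithmic bound; the function name min_seeds is illustrative.

```python
import math
from fractions import Fraction


def min_seeds(p_yellow=Fraction(1, 6), target=Fraction(98, 100)):
    """Smallest n for which P(at least one yellow flower) exceeds target."""
    n = 1
    while 1 - (1 - p_yellow) ** n <= target:
        n += 1
    return n


# The bound obtained by taking logs: n must exceed this value.
log_bound = math.log(0.02) / math.log(5 / 6)

if __name__ == "__main__":
    print("bound:", log_bound)
    print("minimum n:", min_seeds())
```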

Miscellaneous Exercise 2m

1. (a) Two dice are thrown together, and the scores added. What is the
probability that (i) the total score exceeds 8? (ii) the total score
is 9, or the individual scores differ by 1, or both?
(b) A bag contains 3 red balls and 4 black ones. 3 balls are picked
out, one at a time and not replaced. What is the probability that
there will be 2 red and 1 black in the sample?
(c) A committee of 4 is to be chosen from 6 men and 5 women. One
particular man and one particular woman refuse to serve if the other
person is on the committee. How many different committees may be
formed? (SUJB)

2. In Camelot it never rains on Friday, Saturday, Sunday or Monday.
The probability that it rains on a given Tuesday is 1/5. For each of
the remaining two days, Wednesday and Thursday, the conditional
probability that it rains, given that it rained the previous day, is
α, and the conditional probability that it rains, given that it did
not rain the previous day, is β.
(a) Show that the (unconditional) probability of rain on a given
Wednesday is (1/5)(α + 4β), and find the probability of rain on a
given Thursday.
(b) If X is the event that, in a randomly chosen week, it rains on
Thursday, Y is the event that it rains on Tuesday, and Y' is the
event that it does not rain on Tuesday, show that
P(X | Y) - P(X | Y') = (α - β)^2
(c) Explain the implications of the case α = β. (Cambridge)

3. Mass-produced glass bricks are inspected for defects. The
probability that a brick has air bubbles is 0.002. If a brick has air
bubbles the probability that it is also cracked is 0.5 while the
probability that a brick free of air bubbles is cracked is 0.005.
What is the probability that a brick chosen at random is cracked?
The probability that a brick is discoloured is 0.006. Given that
discolouration occurs independently of the other two defects, find
the probability that a brick chosen at random has no defects. (O & C)

4. A bag contains red, blue and green counters of equal size and
shape. A counter is taken at random from the bag. The probability
that it is red is 1.5 times the probability that it is blue, and the
probability that it is blue is twice the probability that it is
green. Find the probabilities that the counter is (a) red, (b) blue,
(c) green.
A counter is taken at random from the bag, its colour is noted and it
is then replaced in the bag. The process continues until at least one
of each colour has been seen. Considering the order in which the
colours are first seen, find the probabilities that (d) red is seen
before green, (e) the order is green, blue and finally red. (O & C)

5. There are eight girls and ten boys in the upper sixth form of a
small school. Six prefects are to be selected. In how many ways can
this be done if (a) there must be three girl and three boy prefects;
(b) there must be at least four boy prefects?
Amongst the eighteen pupils, there is a pair of twins, one girl and
one boy. The Headmaster has decided that there must be three girl and
three boy prefects. Find the probability that both of the twins are
selected. (SUJB Additional)

6. The events A and B are such that
P(A) = [illegible], P(A' | B) = [illegible], P(A ∪ B) = 3/5
where A' is the event 'A does not occur'. Using a Venn diagram, or
otherwise, determine P(B | A'), P(B ∩ A) and P(A | B').
The event C is independent of A and P(A ∩ C) = [illegible]. Determine
P(C | A').
State, with a reason in each case, whether (a) A and B are
independent, (b) A and C are mutually exclusive. (Cambridge)

7. If p1 and p2 are the probabilities of two independent events, show
that the probability of the simultaneous occurrence of these two
events is p1 p2.
In 18 games of chess between A and B, A wins 8, B wins 6 and 4 are
drawn. A and B play a tournament of 3 games. On the basis of the
above data, estimate the probability that: (a) A wins all three

games, (b) A and B win alternately, (c) two games are drawn, (d) A
wins at least one game. (AEB 1972)

8. (a) From an ordinary pack of 52 cards two are dealt face downwards
on a table. What is the probability that (i) the first card dealt is
a heart, (ii) the second card dealt is a heart, (iii) both cards are
hearts, (iv) at least one card is a heart?
(b) Bag A contains 3 white counters and 2 black counters whilst bag B
contains 2 white and 3 black. One counter is removed from bag A and
placed in bag B without its colour being seen. What is the
probability that a counter removed from bag B will be white?
(c) A box of 24 eggs is known to contain 4 old and 20 new eggs. If 3
eggs are picked at random determine the probability that (i) 2 are
new and the other old, (ii) they are all new. (SUJB)

9. The probabilities of A, B or C winning a game in which all three
take part are 0.5, 0.3 and 0.2 respectively. A match is won by a
player who first wins two games. Find the probability that A will win
a game involving all three players.
When the players are joined by a fourth player, D, the probabilities
of A, B or C winning a game, in which all four take part, are reduced
to 0.3, 0.2 and 0.1 respectively. A match is played in which all four
players take part; again, the first player to win two games wins the
match. Find the probabilities that D wins in fewer than (i) four
games, (ii) five games, (iii) six games. (JMB)

10. A bag contains 5 red, 4 orange and 3 yellow sweets. One after
another 3 children select and eat one sweet each. When the bag
contains n sweets, the probability of any one child choosing any
particular sweet is 1/n. What are the probabilities that (a) they all
choose red sweets, (b) at least one orange sweet is chosen, (c) each
chooses a different colour, (d) all choose the same colour? Answers
may be left as fractions in their lowest terms. (O & C)

11. Three men, A, B and C agree to meet at the theatre. The man A
cannot remember whether they agreed to meet at the Palace or the
Queen's and tosses a coin to decide which theatre to go to. The man B
also tosses a coin to decide between the Queen's and the Royalty. The
man C tosses a coin to decide whether to go to the Palace or not and
in this latter case he tosses again to decide between the Queen's and
the Royalty. Find the probability that (a) A and B meet, (b) B and C
meet, (c) A, B and C all meet, (d) A, B and C all go to different
places, (e) at least two meet. (C)

12. In a game, three cubical dice are thrown by a player who attempts
to throw the same number on all three. What is the chance of the
player
(a) throwing the same number on all three?
(b) throwing the same number on just two?
If the first throw results in just two dice showing the same number,
then the third is thrown again. If no two dice show the same number,
then all are thrown again. The player then comes to the end of his
turn. What is the chance of the player succeeding in throwing three
identical numbers in a complete turn?
What is the chance that all the numbers are different at the end of a
turn? (O & C)

13. Alec and Bill frequently play each other in a series of games of
table tennis. Records of the outcomes of these games indicate that
whenever they play a series of games, Alec has the probability 0.6 of
winning the first game and that in every subsequent game in the
series, Alec's probability of winning the game is 0.7 if he won the
preceding game but only 0.5 if he lost the preceding game. A game
cannot be drawn. Find the probability that Alec will win the third
game in the next series he plays with Bill. (JMB)

14. The events A and B are such that
P(A) = [illegible],
P(A or B but not both A and B) = [illegible],
P(B) = [illegible].
Calculate P(A ∩ B), P(A' ∩ B), P(A | B) and P(B | A'), where A' is
the event 'A does not occur'. State, with reasons, whether A and B
are (a) independent, (b) mutually exclusive. (C)

15. (a) Two men each have a set of 7 cards, numbered 1 to 7. Each
shows a card drawn at random. Find the probability that the total of
the two numbers is (i) even, (ii) odd, (iii) greater than 5.
(b) A signal consisting of 7 dots and/or dashes is to be given. The
probability of a dot in any position is 2/5 and of a dash is

3/5. Find the probability that, in a signal, no two consecutive
characters are the same.
(c) A die is loaded so that the chance of throwing a one is x/4, the
chance of a two is 1/4 and the chance of a six is (1 - x)/4. The
chance of a three, four or five is 1/6. The die is thrown twice.
Prove that the chance of throwing a total of 7 is
(9x - 9x^2 + 10)/72.
Find the value of x which will make this chance a maximum, and find
this maximum probability. (SUJB)

16. 4 girls and 3 boys plan to meet together on the following
Saturday. The probability that each boy will be present is
[illegible] independently of the other boys. Find the probability
that (a) 0, (b) 1, (c) 2, (d) 3 boys will be present.
The probability that each girl will be present is [illegible]
independently of the other girls and of the boys.
(e) Find the probability that the number of girls present will equal
the number of boys.
(f) Find the probability that both sexes will be present.
(g) Afterwards it was reported that the gathering had included at
least one boy and at least one girl. What is the probability that
there were equal numbers of boys and girls in the light of this
additional information?
(Answers may be left as fractions in their lowest terms.) (O & C)

17. A sailing competition between two boats, A and B, consists of a
series of independent races, the competition being won by the first
boat to win three races. Every race is won by either A or B, and
their respective probabilities of winning are influenced by the
weather. In rough weather the probability that A will win is 0.9; in
fine weather the probability that A will win is 0.4. For each race
the weather is either rough or fine, the probability of rough weather
being 0.2. Show that the probability that A will win the first race
is 0.5.
Given that the first race was won by A, determine the conditional
probability that (a) the weather for the first race was rough, (b) A
will win the competition. (C)

18. Six fuses, of which two are defective and four are good, are to
be tested one after another in random order until both defective
fuses are identified. Find the probability that the number of fuses
that will be tested is
(a) three,
(b) four or fewer. (L)

19. In this question you may leave the answers as fractions. Your
arguments must be carefully explained in both parts.
(a) A pack of ten cards consists of two marked with the letter A,
three with E, four with S and one with T. The pack is well shuffled
and six cards are dealt. Find the probability that (i) they form the
word ASSETS, the letters appearing in that order; (ii) the letters
either form or can be made to form the word ASSETS.
(b) A manufacturer of tea inserts one of five types of picture card
into each packet. Equal numbers of each type are distributed
randomly. Estimate the probability that a person buying three packets
will have (i) three cards of the same type, (ii) just two the same.
If a person buys five packets, estimate the probability of obtaining
five different types of card. (SUJB)

20. A census of married couples showed that 50% of the couples had no
car, 40% had one car and the remaining 10% had two cars. Three of the
married couples are chosen at random.
(a) Find the probability that one couple has no car, one has one car
and one has two cars.
(b) Find the probability that the three couples have a combined total
of three cars.
The census also showed that both the husband and the wife were in
full-time employment in 16% of those couples having no car, in 45% of
those having one car and in 60% of those having two cars.
(c) For a randomly chosen married couple find the probability that
both the husband and wife are in full-time employment.
(d) Given that a randomly chosen married couple is one where both the
husband and wife are in full-time employment, find the conditional
probability that the couple has no car. (JMB)

21. Three machines A, B and C produce 25%, 25% and 50% respectively
of the output of a factory manufacturing a certain article. A sample
of 3 articles is selected at random from the total output. Find

the probabilities that (a) they are all from C, (b) at least 2 are
from B.
If a second independent sample of 3 articles is selected, find the
probability that both samples have the same number of articles
produced by A.
Of the articles produced by A, B and C, 1%, 2% and 5% respectively
are defective. A single article is selected at random. If D denotes
the event 'defective' and C the event 'produced by machine C', find
p(D) and p(C and D).
An article is examined and found to be defective. What is the
probability that it was produced by C? (SMP)

22. (a) The events A and B are such that P(A) = 0.6, P(B) = 0.25,
P(A ∪ B) = 0.725. Show that the events A and B are neither mutually
exclusive nor independent. Calculate the values of P(A ∪ B) and
P(A | B).
(b) One red card and two black cards are removed from a pack of
cards. From the remainder, three cards are taken at random without
replacement. Show that the probability that they are all of the same
colour is 23/98. Assuming that this event occurs, find the
probability that a fourth card drawn from the remaining 46 cards will
be of the same colour as the previous three. (L Additional)

23. Events A and B are such that P(A) = [illegible],
P(A | B) = [illegible], P(A ∩ B) = [illegible]. Find (a) P(B),
(b) P(A | B), (c) P(B | A), (d) P(A ∪ B).
State whether events A and B are (a) mutually exclusive,
(b) independent.

24. The following are three of the classical problems in probability.
(a) Compare the probability of a total of 9 with the probability of a
total of 10 when three fair dice are tossed once (Galileo and Duke of
Tuscany).
(b) Compare the probability of at least one six in four tosses of a
fair die with the probability of at least one double-six in
twenty-four tosses of two fair dice (Chevalier de Mere).
(c) Compare the probability of at least one 6 when six dice are
tossed with the probability of at least two sixes when 12 dice are
tossed (Pepys to Newton).
Solve each of these problems. (AEB 1978)

25. A set consists of 12 observations no two of which are equal. Five
of the observations are selected at random. What are the
probabilities that
(a) the five observations include the largest and the least among the
12 observations,
(b) the second largest and the second smallest will be included,
(c) the five smallest observations are included,
(d) at least three of the smallest five observations are included?
(MET)

26. In a class of 30 pupils, 12 walk to school, 10 travel by bus, 6
cycle and 2 travel by car. If 4 pupils are picked at random, obtain
the probabilities that (a) they all travel by bus, (b) they all
travel by the same means.
If 2 are picked at random from the class, find the probability that
they travel by different means.
In picking out pupils from the class, find the probability that more
than three trials are needed before a pupil who walks to school is
selected. (JMB)

27. Four ball-point pen refills are to be drawn at random without
replacement from a bag containing ten refills, of which 5 are red, 3
are green and 2 are blue. Find
(a) the probability that both blue refills will be drawn,
(b) the probability that at least one refill of each colour will be
drawn. (JMB)

28. At the ninth hole on a certain golf course there is a pond. A
golfer hits a grade B ball into the pond. Including the golfer's ball
there are then 6 grade C, 10 grade B and 4 grade A balls in the pond.
The golfer uses a fishing net and 'catches' four balls. The events X,
Y and Z are defined as follows:
X: the catch consists of two grade A balls and two grade C balls
Y: the catch consists of two grade B balls and two other balls
Z: the catch includes the golfer's own ball
Assuming that the catch is a random selection from the balls in the
pond, determine
(a) P(X), (b) P(Y), (c) P(Z), (d) P(Z | Y).
For each of the pairs X and Y, Y and Z, state, with a brief reason,
whether the two events are (i) mutually exclusive, (ii) independent.
(C)

29. A committee of 8 members consists of one married couple together
with 4 other men and 2 other women. From the committee a working
party of 4 persons is to be formed. Find the number of

different working parties which can be formed.
Find also the number of working parties which can be formed if the
working party
(a) may not contain both the husband and his wife,
(b) must contain 2 men and 2 women,
(c) must contain at least one man and at least one woman.
The 8 committee members sit round an octagonal table, their positions
being decided by drawing lots. Find the probability of
(d) the man sitting next to his wife,
(e) the man sitting opposite to his wife,
(f) the 3 women sitting together. (AEB)

30. In a game of chance, a player's turn starts by drawing a card at
random from a pack of playing cards. If he draws a black card which
is not an ace, his turn ends. If he draws a black ace he throws a
black die, and if he draws a red card he throws a red die. After a
die has been thrown, the card that was drawn is replaced in the pack
which is then shuffled and the player draws again with the same
conditions leading to the throwing of a die. This continues until the
player draws a black card, which is not an ace, when his turn ends. A
player's score in any turn is the sum of the scores thrown with the
red die plus three times the sum of the scores thrown with the black
die. Calculate the probability that in a turn a player will score
(a) zero, (b) exactly three. (L)

31. (a) Two cards are drawn at random without replacement, from an
ordinary pack of 52 cards. Find the probability that they are: (i) of
the same suit, (ii) of the same value (both aces, both kings, etc.),
(iii) either of the same suit or of the same value.
(b) Two cards are drawn at random, one from each of two ordinary
packs. Find the probability that they are (i) of the same suit,
(ii) of the same value, (iii) either of the same suit, the same
value, or both.
(c) Three fair cubical dice are thrown. Find the probability that the
sum of the number of spots on the upper faces is a perfect square.
(AEB 1976)

32. A card is drawn from a full pack of 52 playing cards. If the card
drawn is an Ace, King, Queen or Jack, two dice are thrown and the sum
of the scores on the dice noted. If any other card is drawn, one die
only is thrown and the sum of the score on the card and die noted. X
denotes the event 'Both dice are thrown', and Y denotes the event
'The score noted is less than five.' Calculate the probabilities
(a) P(X), (b) P(X ∩ Y), (c) P(Y), (d) P(Y | X), (e) P(X | Y). (C)

33. In a constituency containing many elderly inhabitants there are
twice as many women as men. At an election seven-eighths of the women
and half the men cast a vote. Show that the probability that an adult
inhabitant (selected at random) casts a vote is 3/4. For a random
group of four inhabitants, find
(a) the probability that just one of them votes;
(b) the probability that two or more vote.
It is further found that for married couples the probability that a
man votes is [illegible], the probability that a woman votes is
[illegible], the probability that a woman votes given that her
husband votes is [illegible], and the probability that a man votes
given that his wife votes is [illegible] (you may assume that this
information is consistent). Find
(c) the probability that a husband and wife both vote;
(d) the probability that a husband votes and his wife does not vote;
(e) the expected number of votes per married couple.* (SMP)
*Expectation required: see p.171.

34. In a single round of a general knowledge contest, each competitor
is first asked a question. If the competitor answers correctly, then
that competitor is asked another question. This continues until
either the competitor has answered five questions correctly, in which
case the competitor scores six points (including a bonus point), or
until the competitor answers a question incorrectly, in which case
the competitor's score in that round is equal to the number of
correct answers given.
One of the competitors is named Smith. The probability that Smith
answers a question correctly is p, independent of all previous
answers. Determine the probability distribution of Smith's score in a
single round, and show that Smith's expected score is
p(1 + p + p^2 + p^3 + 2p^4).*
At the start of the final round of the contest Smith is 3 points
ahead of Jones, and Smith and Jones are then the only

competitors who can win the contest. The probability that Jones
answers a question correctly is also p, independent of all previous
answers. Show that the probability that Jones wins the contest is
p^4 (1 - p)(1 + p^2 + p^3).
Given that Jones wins the contest, determine the probability that he
scored 6 in the final round. (C)

35. Two men are walking directly towards each other on a wide
pavement, along the same line. When they are six paces apart, they
realise that they are in danger of colliding. With each of his next
three steps forward therefore, each pedestrian adopts the following
strategy: if the two are still in line with each other each
independently steps half-left with probability p, or steps half-right
with probability p, or keeps straight on with probability 1 - 2p; if
they are not still in line, each keeps walking straight forward (the
diagram illustrates one possible version of the encounter). Calculate
the probability that they are still in line after each has taken his
first step, and deduce that the probability of a collision is
(1 - 4p + 6p^2)^3.

[Diagram: one possible version of the encounter]
(SMP)

36. A college has 750 women students and 2250 male students. There is
a higher proportion of male students in engineering, physics and
similar subjects so that 60% of male students study mathematics and
only 30% of women students study it. If one student studying
mathematics is chosen at random from all the students studying it,
what is the probability that the student will be a woman?
If three students studying mathematics are chosen, what is the
probability that there will be at least two men?
25% of all the students study French. The proportion of male students
of mathematics who also study French is 20% and the proportion of
women students of French who also study mathematics is 20%. There are
500 male students of French. If four students are selected what is
the probability that at least one is male and at least one is
studying mathematics and at least one is studying French? (MET)

37. A company makes a certain type of fan heater (called an X-heater)
at each of its two factories F1 and F2. The factory F1 produces one
quarter and F2 three quarters of the total output. X-heaters are
coloured either red or blue. One third of the X-heaters produced at
F1 are red and seven-ninths of the X-heaters produced at F2 are red.
A customer goes into a shop and selects an X-heater at random. Show
the probability is 2/3 that when he unpacks it he will find that it
is red.
Two shops A and B stock X-heaters. Shop A has four and shop B has
three. Find
(a) the probability that neither shop has a red X-heater;
(b) the probability that there are at least 3 X-heaters in shop A;
(c) the probability that there are the same number of red X-heaters
in each shop;
(d) the probability that there are two red X-heaters in each shop,
given that all the X-heaters in shop A come from F1 and that all the
X-heaters in shop B come from F2.
(You may leave all your answers as fractions with powers of 3 as
denominators.) (SMP)

38. (a) Find the number of ways in which 10 people can be divided
into
(i) two groups consisting of 7 and 3 people,
(ii) three groups consisting of 4, 3 and 2 people with 1 person
rejected.
(b) Seven coins of which 3 are silver and 4 are copper are in a box.
A random selection of 3 coins is made and the coins selected are
placed in a purse (purse A). The remaining coins are placed in a
second purse (purse B). Find the probabilities associated with each
of the possible numbers of silver coins (ranging from 0 to 3) in
purse A.
On a particular occasion it is known that purse A has in it 2 silver
coins and 1 copper coin, and that the remaining coins are in purse B.
If one coin is then drawn at random from a purse selected at random,
find the probability that the coin is silver. (C)

39. X1, X2 and X3 are three independent events with probabilities of
occurrence Pr(Xi) = pi, i = 1, 2, 3. Give the probabilities of the
occurrence of 0, 1, 2 and 3 events respectively and verify that these
164 A CONCISE COURSE IN A-LEVEL STATISTICS

probabilities satisfy the conditions for a distribution.

If
    Pr(at least one event) = 0.664,
    Pr(at least two events) = 0.212,
and
    Pr(at most two events) = 0.976,
find the probabilities of exactly 0, 1, 2 and 3 events respectively.
By considering a linear combination of Pr(one event) and Pr(two events) and
Pr(three events) find the value of $p_1 + p_2 + p_3$. (MEI)

40. (a) Two cards are drawn from a well shuffled pack of 52 playing cards. If
Jacks, Queens and Kings count 10 points, aces count 1 point, and the rest count
points equal to their face values, what is the chance that the total points of
the two cards will be 12?
(b) If three cards are drawn in succession from a complete pack, what is the
probability that the first two cards score 12 points and the total points will
be less than 21? (O & C)

41. A hand of 13 cards is dealt from a standard pack of 52 cards (which consists
of 4 suits, clubs, diamonds, hearts, and spades, each of 13 cards).
(a) Write down, but do not calculate, an expression for the probability that the
hand consists of 3 spades, 4 hearts, and 6 cards from the other suits.
(b) Calculate, to 3 d.p., the conditional probability that the hand contains
exactly 3 diamonds, given that it contains exactly 3 spades and 4 hearts.
(c) Calculate, again to 3 d.p., the probability that the hand contains at least
2 diamonds given the same conditions as in part (b). (O)

42. Show that the total number of random samples of size r that can be drawn
from a population of size n is

    $\frac{n!}{r!\,(n-r)!}$

In a class of 10 boys and 10 girls there are 5 children with blue eyes. A random
sample of 4 children is taken. Find the probabilities that in this sample there
are exactly (a) 2 boys, (b) 2 children with blue eyes.

Half the children living in a big city are boys, and one quarter of the children
have blue eyes. A random sample of 4 children is taken. Estimate the
probabilities that in this sample there are exactly (c) 2 boys, (d) 2 children
with blue eyes. (MEI)

43. A committee has 22 members, of whom 7 have black hair, do not smoke and do
not wear glasses; 5 have white hair, do not smoke and do not wear glasses; 4 have
white hair, smoke and wear glasses; 3 have black hair, smoke and do not wear
glasses; 2 have white hair, do not smoke and wear glasses; 1 has black hair,
smokes and wears glasses.
(a) One committee member is chosen at random. Let W be the event that this
member has white hair, G be the event that this member wears glasses and S the
event that the member smokes.
Find (i) P(W), (ii) P(W|S), (iii) P(W|G), (iv) the probability that this member
has either white hair or glasses (but not both), given that this member smokes.
Are the events W and S independent? Are the events W and G independent? Give a
reason for each answer.
(b) Two committee members are chosen at random. Let $W_2$ be the event that both
have white hair. Let $S_2$ be the event that both smoke.
Find (i) $P(W_2)$, (ii) $P(W_2 | S_2)$. (C)

44. (a) How many odd numbers can be formed from the figures 1, 2, 3 and 5 if
repetitions are not allowed?
(b) See worked example, p. 135.
(c) Six different books lie on a table, and a boy is told that he can take away
as many as he likes but he must not leave empty handed. How many different
selections can he make? One of the books is a Bible. How many of these
selections will include this Bible? (SUJB)

45. (a) Of the households in Edinburgh, 35% have a freezer and 60% have a colour
TV set. Given that 25% of the households have both a freezer and a colour TV set,
calculate the probability that a household has either a freezer or a colour TV
set but not both.
State, with your reasons, whether the events of having a freezer and of having a
colour TV set are or are not independent.
(b) State in words the meaning of the symbol P(B|A), where A and B are two
events.
A shop stocks tinned cat food of two makes, A and B, and two sizes, large and
small. Of the stock, 70% is of brand A, 30% is of brand B. Of the tins of brand
A, 30% are small size whilst of the tins of brand B, 40% are small size. Using a
PROBABILITY 165

tree diagram, or otherwise, find the probability that
(i) a tin chosen at random from the stock will be of small size,
(ii) a small tin chosen at random from the stock will be of brand A. (L)

46. During an epidemic of a certain disease a doctor is consulted by 110 people
suffering from symptoms commonly associated with the disease. Of the 110 people,
45 are female of whom 20 actually have the disease and 25 do not. Fifteen males
have the disease and the rest do not.
(a) A person is selected at random. The event that this person is female is
denoted by A and the event that this person is suffering from the disease is
denoted by B. Evaluate (i) P(A), (ii) P(A ∪ B), (iii) P(A ∩ B), (iv) P(A|B).
(b) If three different people are selected at random without replacement, what
is the probability of (i) all three having the disease, (ii) exactly one of the
three having the disease, (iii) one of the three being a female with the disease,
one a male with the disease and one a female without the disease?
(c) Of people with the disease 96% react positively to a test for diagnosing the
disease as do 8% of people without the disease. What is the probability of a
person selected at random (i) reacting positively, (ii) having the disease given
that he or she reacted positively? (AEB 1987)

47. In a simple model of the weather in October, each day is classified as either
fine or rainy. The probability that a fine day is followed by a fine day is 0.8.
The probability that a rainy day is followed by a fine day is 0.4. The
probability that 1 October is fine is 0.75.
(a) Find the probability that 2 October is fine and the probability that
3 October is fine.
(b) Find the conditional probability that 3 October is rainy, given that
1 October is fine.
(c) Find the conditional probability that 1 October is fine, given that
3 October is rainy. (C)

48. Two archers A and B shoot alternately at a target until one of them hits the
centre of the target and is declared the winner. Independently, A and B have
probabilities of 3 and 4, respectively, of hitting the centre of the target on
each occasion they shoot.
(a) Given that A shoots first, find (i) the probability that A wins on his
second shot, (ii) the probability that A wins on his third shot, (iii) the
probability that A wins.
(b) Given that the archers toss a fair coin to determine who shoots first, find
the probability that A wins. (JMB)

49. (a) Explain in words the meaning of the symbol P(A|B) where A and B are two
events. State the relationship between A and B when (i) P(A|B) = 0,
(ii) P(A|B) = P(A).
When a car owner needs her car serviced she phones one of three garages, A, B,
or C. Of her phone calls to them, 30% are to garage A, 10% to B and 60% to C. The
percentages of occasions when the garage phoned can take the car in on the day
of phoning are 20% for A, 6% for B and 9% for C. Find the probability that the
garage phoned will not be able to take the car in on the day of phoning.
Given that the car owner phones a garage and the garage can take her car in on
that day, find the probability that she phoned garage B.
(b) A shelf contains ten box files of which four are empty and six contain
papers. Five files are chosen at random one after another from the shelf. Find,
to 3 decimal places, the probability that exactly two of the chosen files will
be empty when the files are chosen (i) with replacement, (ii) without
replacement. (L)

50. Show that, for any two events E and F

    P(E ∪ F) = P(E) + P(F) - P(E ∩ F)

Express in words the meaning of P(E|F).
Given that E and F are independent events, express P(E ∩ F) in terms of P(E) and
P(F), and show that E' and F are also independent.
In a college, 60 students are studying one or more of the three subjects
Geography, French and English. Of these, 25 are studying Geography, 26 are
studying French, 44 are studying English, 10 are studying Geography and French,
15 are studying French and English, and 16 are studying Geography and English.
Write down the probability that a student chosen at random from those studying
English is also studying French. Determine whether or not the events ‘studying
Geography’ and ‘studying French’ are independent.
A student is chosen at random from all 60 students. Find the probability that
the chosen student is studying all three subjects. (L)
51. Explain, by suitably defining events A and B, what is meant by ‘the
probability of A occurring given that B has occurred’.
A local greengrocer sells conventionally grown and organically grown vegetables.
Conventionally grown vegetables constitute 80% of his sales; carrots constitute
12% of the conventional sales and 30% of the organic sales.
Display this information in an appropriately and accurately labelled tree
diagram.
One day a customer emerges from the shop and is questioned about her purchases.
What is the probability that she bought
(a) conventionally grown carrots,
(b) carrots?
Given that she did buy carrots, what is the probability that they were
organically grown? What assumptions have you made in answering this question? (O)

52. (a) In a group of 200 people, each individual is classified as either male
or female and according to whether or not he or she wears glasses. The numbers
falling into each category are as tabulated.

            Not wearing glasses   Wearing glasses
    Male            90                  24
    Female          66                  20

Suppose one of this group is chosen at random. Let A be the event that the
person chosen is male and B the event that the person chosen is not wearing
glasses.
(i) Define the events A' and A ∪ B'.
(ii) Calculate the probability of occurrence of each of the events in (i).
(iii) Given that the person chosen is not wearing glasses, calculate the
probability that the person is male.
(iv) Use the available data to determine whether not wearing glasses is
independent of sex within the group. Give a practical interpretation to your
finding.
(b) After advertising for an assistant, a manager decides to interview suitable
applicants. The interview of an applicant will take place during the morning or
the afternoon with probabilities 0.45 and 0.55 respectively. Each applicant is
informed by telephone and in each case a message has to be left. A morning
interview is wrongly transmitted to the applicant as an afternoon interview with
probability 0.2, and an afternoon interview is wrongly transmitted to the
applicant as a morning interview with probability 0.1. Find the probability that
an applicant arriving
(i) for a morning interview is expected for a morning interview,
(ii) for an afternoon interview is expected for an afternoon interview.
(AEB 1988)

53. (a) A bag contains 4 red, 6 white and 5 blue balls. If a random sample of
6 balls is selected (without replacement) what is the probability that there are
two balls of each colour?
(b) A number N consists of n digits each of which can be 0 or 1. It is copied
onto a sheet of paper by A and the probability that A transcribes any digit
wrongly is p. The sheet of paper is then passed to B who copies the number onto
another sheet of paper. The probability that B transcribes any digit wrongly is
p'. What is the probability that the number written by B contains no error?
(c) An assembly plant receives 60% of its resistors from supplier X and 40% from
supplier Y. 5% of X's resistors and 6% of Y's are defective. If a resistor is
tested at the plant and found to be defective, what is the probability that it
was supplied by X? (SUJB)

54. A and B are mutually exclusive and exhaustive events in a sample space S and
C is any event in S for which P(C) ≠ 0. …

    $P(A|C) = \frac{P(C|A)\,P(A)}{P(C|A)\,P(A) + P(C|B)\,P(B)}$

… it classifies the client as either class A (…) … clients are class A. Records
show that the probability of a client making a claim during any year is 0.08 for
class A and …
(a) Mr Smith buys a policy and makes a claim during his first year. Calculate,
each to 3 decimal places, the probability that Mr Smith was originally
classified A or was originally classified B.
(b) Mrs Jones bought a policy two years ago and has not made a claim during that
time. Show that she is more likely to be class B.
Show, also, that if she does not make a claim for a further two years she is more
likely to be class A than class B. What do you need to assume for your
calculations to be valid? (SUJB)
PROBABILITY
DISTRIBUTIONS I —
DISCRETE RANDOM
VARIABLES
DISCRETE RANDOM VARIABLE

Let X have the following properties:


(a) it is a discrete variable,
(b) it can only assume values $x_1, x_2, \ldots, x_n$,
(c) the probabilities associated with these values are $p_1, p_2, \ldots, p_n$

where $P(X = x_1) = p_1$

      $P(X = x_2) = p_2$

      ...

      $P(X = x_n) = p_n$

then X is a discrete random variable if $p_1 + p_2 + \ldots + p_n = 1$.

This can be written

$\sum p_i = 1, \quad i = 1, 2, \ldots, n$

or $\sum_{\text{all } x} P(X = x) = 1$

We usually denote a random variable (r.v.) by a capital letter


(X, Y, R, etc.) and the particular value it takes by a small letter
(x, y, r, etc.).

Example 3.1 Let X be the discrete variable ‘the number of fours obtained when
two dice are thrown’. Show that X is a random variable.

Solution 3.1 With regard to the number of fours thrown, the outcome could be
one of the following: 0 fours, or 1 four, or 2 fours.


Therefore X can assume the values 0,1 and 2 only.


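Then, with obvious notation,

$P(X = 0) = P(\bar{4}\,\bar{4}) = \left(\frac{5}{6}\right)\left(\frac{5}{6}\right) = \frac{25}{36}$

$P(X = 1) = P(4\,\bar{4}) + P(\bar{4}\,4) = \left(\frac{1}{6}\right)\left(\frac{5}{6}\right) + \left(\frac{5}{6}\right)\left(\frac{1}{6}\right) = \frac{10}{36}$

$P(X = 2) = P(4\,4) = \left(\frac{1}{6}\right)\left(\frac{1}{6}\right) = \frac{1}{36}$

so $\sum_{\text{all } x} P(X = x) = \frac{25}{36} + \frac{10}{36} + \frac{1}{36} = \frac{36}{36} = 1$
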
Therefore X is a discrete random variable.
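
The check in Example 3.1 can also be done by listing all 36 equally likely outcomes. The following is a minimal sketch in Python, not part of the original text; the function name `fours_distribution` is chosen here purely for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def fours_distribution():
    """Distribution of the number of fours when two fair dice are thrown.

    Every ordered pair (first die, second die) is equally likely, so each
    probability is (number of pairs with that many fours) / 36.
    """
    counts = Counter(
        sum(1 for face in roll if face == 4)
        for roll in product(range(1, 7), repeat=2)
    )
    return {fours: Fraction(n, 36) for fours, n in sorted(counts.items())}


dist = fours_distribution()
for x, p in dist.items():
    print(x, p)
# The probabilities of a random variable must total 1.
print("total:", sum(dist.values()))
```

Exact fractions are used so that the total is exactly 1 rather than a rounded decimal.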

PROBABILITY DENSITY FUNCTION

We can write the results obtained in Example 3.1 in table form:

x          |   0      1      2
P(X = x)   | 25/36  10/36  1/36

This is known as the probability distribution of X.


The function which is responsible for allocating probabilities is
known as the probability density function (p.d.f.) of X.
In Example 3.1, the p.d.f. of X is given by P(X = x) for x = 0, 1, 2.
Sometimes the p.d.f. can be expressed as a formula, as in the
following example.

Example 3.2 Two tetrahedral dice, each with faces labelled 1,2,3 and 4 are
thrown and the score noted, where the score is the sum of the
two numbers on which the dice land. If X is the r.v. ‘the score when
two tetrahedral dice are thrown’, find the p.d.f. of X.

Solution 3.2 The score for each possible outcome is shown in the table:

‘Score’                 First die
                  |  1   2   3   4
            ------+----------------
Second die     1  |  2   3   4   5
               2  |  3   4   5   6
               3  |  4   5   6   7
               4  |  5   6   7   8

From the table we can see that X can assume the values 2, 3, 4, 5, 6, 7, 8 only.


The probabilities can be found from the table, as each outcome


shown is equally likely.

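For example $P(X = 5) = \frac{4}{16}$ as 4 out of the total of 16 outcomes
result in a score of 5.
The probability distribution is formed:

x          |  2     3     4     5     6     7     8
P(X = x)   | 1/16  2/16  3/16  4/16  3/16  2/16  1/16

This can be written as a formula, giving the p.d.f. of X as

$P(X = x) = \frac{x - 1}{16}$   for x = 2, 3, 4, 5

$P(X = x) = \frac{9 - x}{16}$   for x = 6, 7, 8

NOTE: $\sum_{\text{all } x} P(X = x) = \frac{1}{16}(1 + 2 + 3 + 4 + 3 + 2 + 1) = 1$,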
confirming that X is a random variable.
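
As a cross-check on Example 3.2, the table of scores can be generated by enumeration and compared with the two-part formula. This is an illustrative Python sketch only; `score_distribution` and `pdf_formula` are names assumed here, not taken from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def score_distribution(faces=4):
    """Distribution of the sum of two fair dice with faces 1..faces."""
    counts = Counter(a + b for a, b in product(range(1, faces + 1), repeat=2))
    total = faces ** 2  # number of equally likely ordered outcomes
    return {score: Fraction(n, total) for score, n in sorted(counts.items())}


def pdf_formula(x):
    """The p.d.f. of Example 3.2 written as a formula."""
    if 2 <= x <= 5:
        return Fraction(x - 1, 16)
    if 6 <= x <= 8:
        return Fraction(9 - x, 16)
    return Fraction(0)


dist = score_distribution()
for x, p in dist.items():
    print(x, p, pdf_formula(x))
```

Each row prints the enumerated probability next to the value given by the formula, so any disagreement would be visible at once.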

Example 3.3 The p.d.f. of a discrete random variable Y is given by $P(Y = y) = cy^2$,
for y = 0, 1, 2, 3, 4. Given that c is a constant, find the value of c.

Solution 3.3 The probability distribution of Y is

y          | 0   1   2    3    4
P(Y = y)   | 0   c   4c   9c   16c

As Y is a random variable, $\sum_{\text{all } y} P(Y = y) = 1$.

So      c + 4c + 9c + 16c = 1
                      30c = 1
                        c = 1/30

Therefore if Y is a random variable, then $c = \frac{1}{30}$.
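
The method of Example 3.3 (add up the unscaled probabilities, then choose the constant so that the total is 1) can be sketched in Python as follows. The helper name `normalising_constant` is an assumption made for this illustration.

```python
from fractions import Fraction


def normalising_constant(values, weight):
    """Return c such that c * weight(v), summed over all values, equals 1."""
    total = sum(weight(v) for v in values)
    return Fraction(1, total)


# P(Y = y) = c * y**2 for y = 0, 1, 2, 3, 4
c = normalising_constant(range(5), lambda y: y ** 2)
dist = {y: c * y ** 2 for y in range(5)}
print("c =", c)
print("total =", sum(dist.values()))
```

The same helper applies to any p.d.f. of the form "constant times an integer-valued function" on a finite set of values.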

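Example 3.4 The p.d.f. of the discrete r.v. X is given by $P(X = x) = a\left(\frac{3}{4}\right)^x$ for
x = 0, 1, 2, 3, .... Find the value of the constant, a.

Solution 3.4 As X is a random variable, $\sum_{\text{all } x} P(X = x) = 1$.

Now $P(X = 0) = a\left(\frac{3}{4}\right)^0$

    $P(X = 1) = a\left(\frac{3}{4}\right)^1$
    $P(X = 2) = a\left(\frac{3}{4}\right)^2$
    $P(X = 3) = a\left(\frac{3}{4}\right)^3$
and so on

So $\sum_{\text{all } x} P(X = x) = a + a\left(\frac{3}{4}\right) + a\left(\frac{3}{4}\right)^2 + a\left(\frac{3}{4}\right)^3 + \ldots$

   $= a\left(1 + \frac{3}{4} + \left(\frac{3}{4}\right)^2 + \left(\frac{3}{4}\right)^3 + \ldots\right)$

   $= a\left(\frac{1}{1 - \frac{3}{4}}\right)$   (sum of an infinite G.P. with first term 1 and common ratio $\frac{3}{4}$)

   $= a(4)$

We have 4a = 1

Therefore $a = \frac{1}{4}$

If X is a random variable, then $a = \frac{1}{4}$.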
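
The infinite sum in Example 3.4 can be illustrated numerically: partial sums of the geometric progression approach the closed-form value 1/(1 - r). The Python sketch below is an illustration under that standard G.P. result; `partial_sum` is a name assumed here.

```python
from fractions import Fraction


def partial_sum(ratio, terms):
    """1 + r + r**2 + ... taken to the given number of terms."""
    return sum(ratio ** k for k in range(terms))


r = Fraction(3, 4)
limit = 1 / (1 - r)   # sum to infinity of a G.P. with first term 1, valid for |r| < 1
a = 1 / limit         # the constant that makes the probabilities total 1

for n in (5, 20, 80):
    print(n, float(partial_sum(r, n)))
print("sum to infinity:", limit, " a =", a)
```

The printed partial sums increase towards the sum to infinity, which is why the total probability can be set equal to 1 even though X has infinitely many possible values.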

Exercise 3a

1. For each of the following random variables write out the probability
distributions. Check that the variables are random and for parts (b), (d) and
(f) write the formula for the p.d.f.
(a) The number of heads obtained when two fair coins are tossed.
(b) The sum of the scores when two ordinary dice are thrown.
(c) The number of threes obtained when two tetrahedral dice are thrown.
(d) The numerical value of a digit chosen from a set of random number tables.
(e) The number of tails obtained when three fair coins are tossed.
(f) The difference between the numbers when two ordinary dice are thrown.

2. The probability density function of a discrete random variable X is given by
P(X = x) = kx for x = 12, 13, 14. Find the value of the constant k.

3. The discrete random variable R has p.d.f. given by P(R = r) = c(3 - r) for
r = 0, 1, 2, 3. Find the value of the constant c.

4. A game consists of throwing tennis balls into a bucket from a given distance.
The probability that William will get the tennis ball in the bucket is 0.4. A
‘go’ consists of three attempts. (a) Construct the probability distribution for
X, the number of tennis balls that land in the bucket in a go. William wins a
prize if, at the end of his go, there are two or more tennis balls in the
bucket. (b) What is the probability that William does not win a prize?

5. A drawer contains 8 brown socks and 4 blue socks. A sock is taken from the
drawer at random, its colour is noted and it is then replaced. This procedure is
performed twice more. If X is the r.v. ‘the number of brown socks taken’, find
the probability distribution for X.

6. The r.v. X has p.d.f. P(X = x) = c(…)^x for x = 0, 1, 2, 3, .... Find the
value of the constant, c.

EXPECTATION, £(X)
Experimental approach
Suppose we throw an unbiased die 120 times and record the results:

Score, x 1 2 3 4 5 6
Frequency, f 1Oee225 Coe t eee bom. FOtalt ZO

Then we can calculate the mean score obtained where

$\bar{x} = \frac{\sum fx}{\sum f} = \frac{427}{120} = 3.558$ (3 d.p.)

Theoretical approach
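The probability distribution for the r.v. X where X is ‘the number
on the die’ is as shown:

Score, x   |  1    2    3    4    5    6
P(X = x)   | 1/6  1/6  1/6  1/6  1/6  1/6

We can obtain a value for the ‘expected mean’ by multiplying each
score by its corresponding probability and summing, so that

expected mean = $1\left(\frac{1}{6}\right) + 2\left(\frac{1}{6}\right) + 3\left(\frac{1}{6}\right) + 4\left(\frac{1}{6}\right) + 5\left(\frac{1}{6}\right) + 6\left(\frac{1}{6}\right)$
              = $\frac{21}{6}$
              = 3.5
So, the expected mean = 3.5.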
If we have a statistical experiment:
a practical approach results in a frequency distribution and a
mean value,
a theoretical approach results in a probability distribution and
an expected value.
The expectation of X (or expected value), written E(X), is given by

$E(X) = \sum_{\text{all } x} x\,P(X = x)$

This can also be written

$E(X) = \sum x_i p_i, \quad i = 1, 2, \ldots, n$
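
The definition of E(X) translates directly into a short function. The following Python sketch is illustrative only; it stores a probability distribution as a dictionary mapping each value x to P(X = x), which is a representation assumed here for convenience.

```python
from fractions import Fraction


def expectation(dist):
    """E(X) = sum of x * P(X = x) over all values x in the distribution."""
    return sum(x * p for x, p in dist.items())


# The fair die of the theoretical approach: each score has probability 1/6.
die = {score: Fraction(1, 6) for score in range(1, 7)}
print(expectation(die))         # exact value as a fraction
print(float(expectation(die)))  # the same value as a decimal
```

For the fair die this reproduces the expected mean of 3.5 obtained above.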

Example 3.5 A r.v. X has a p.d.f. defined as shown. Find E(X).

x          | -2    -1    0     1     2
P(X = x)   | 0.3   0.1   0.15  0.4   0.05

Solution 3.5 Now

$E(X) = \sum_{\text{all } x} x\,P(X = x)$

      = (-2)(0.3) + (-1)(0.1) + 0(0.15) + 1(0.4) + 2(0.05)

      = -0.2
Therefore E(X) = -0.2.

NOTE: an important property which some probability distributions
possess is that of symmetry.
For example, (a) Consider the r.v. with probability distribution:

x          | 1    2    3    4    5
P(X = x)   | 0.1  0.2  0.4  0.2  0.1

It can be seen from the table that the distribution is symmetrical
about the central value X = 3, so E(X) = 3.
Check: $E(X) = \sum_{\text{all } x} x\,P(X = x)$ = 1(0.1) + 2(0.2) + 3(0.4) + 4(0.2) + 5(0.1) = 3

(b) Consider the r.v. with p.d.f. $P(X = x) = \frac{1}{8}$ for x = 1, 2, ..., 8.

The probability distribution for X is:

x          |  1    2    3    4    5    6    7    8
P(X = x)   | 1/8  1/8  1/8  1/8  1/8  1/8  1/8  1/8

The distribution is symmetrical about the central value, mid-way
between 4 and 5, so E(X) = 4.5.

Example 3.6 A fruit machine consists of three windows, each of which shows
pictures of fruits: lemons or oranges or cherries or plums. The
probability that a window shows a particular fruit is as follows:
P(lemons) = 0.4, P(oranges) = 0.1, P(cherries) = 0.2,
P(plums) = 0.3
The windows operate independently.
Anyone wanting to play the fruit machine pays 10p for a turn.
The winning combinations are as follows:
Oranges in 3 windows                             £1.00
Cherries in 3 windows                            £0.50
Oranges in 2 windows and cherries in 1 window    £0.80
Lemons in 3 windows                              £0.40
Find the expected gain/loss per turn.

Solution 3.6 P(oranges in 3 windows) = $(0.1)^3$ = 0.001 (independent events)
P(cherries in 3 windows) = $(0.2)^3$ = 0.008
P(oranges in 2 and cherries in 1) = $3(0.1)^2(0.2)$ = 0.006
P(lemons in 3 windows) = $(0.4)^3$ = 0.064
Therefore
P(combination will not win a prize)
   = 1 - (0.001 + 0.008 + 0.006 + 0.064)
   = 0.921
Let X be the r.v. ‘the amount gained per turn in pence’.

Now the amount paid out by the fruit machine could be 100p,
80p, 50p, 40p or 0p.
So considering the initial payment of 10p for a turn, X can assume
the values 90, 70, 40, 30, -10.
The probability distribution for X is

x          |  90     70     40     30     -10
P(X = x)   | 0.001  0.006  0.008  0.064  0.921

Now $E(X) = \sum_{\text{all } x} x\,P(X = x)$

          = 90(0.001) + 70(0.006) + 40(0.008) + 30(0.064) + (-10)(0.921)

          = -6.46
So, the expected loss per turn is 6.46p.
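
The result of Example 3.6 can be confirmed by enumerating all 4 × 4 × 4 ordered showings of the three windows. This Python sketch is an illustration, not the book's method; the names `WINDOW`, `STAKE`, `payout` and `expected_gain` are assumptions introduced here.

```python
from fractions import Fraction
from itertools import product

# Probability that a single window shows each fruit (from the example).
WINDOW = {
    "lemon": Fraction(4, 10),
    "orange": Fraction(1, 10),
    "cherry": Fraction(2, 10),
    "plum": Fraction(3, 10),
}
STAKE = 10  # pence paid for one turn


def payout(windows):
    """Prize in pence for one showing of the three windows."""
    oranges = windows.count("orange")
    cherries = windows.count("cherry")
    if oranges == 3:
        return 100
    if cherries == 3:
        return 50
    if oranges == 2 and cherries == 1:
        return 80
    if windows.count("lemon") == 3:
        return 40
    return 0


def expected_gain():
    """E(X), where X is the gain in pence (prize minus stake) per turn."""
    total = Fraction(0)
    for windows in product(WINDOW, repeat=3):
        # Windows operate independently, so probabilities multiply.
        prob = WINDOW[windows[0]] * WINDOW[windows[1]] * WINDOW[windows[2]]
        total += prob * (payout(windows) - STAKE)
    return total


print(float(expected_gain()))
```

A negative value corresponds to an expected loss for the player.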

Example 3.7 (a) Three dice are thrown. If a 1 or a 6 turns up, you will be paid
1p, but, if neither a 1 nor a 6 turns up, you will pay 5p. How
much would you expect to lose in 9 games?
You are now given the opportunity to change the rule for
payment when a 1 or a 6 appears. To make the game worthwhile
to yourself, what is the minimum amount in everyday
currency that you would suggest?

(b) Three coins are thrown. If one head turns up, 1p is paid. If two
heads turn up, 3p is paid, and if three heads turn up 5p is paid.
If the game is to be regarded as fair (i.e. neither the player nor
the bank should lose in the long run), what should be the
penalty if no heads turn up?

(c) A bag contains 3 red balls and 1 blue ball. A second bag contains
1 red ball and 1 blue ball. A ball is picked out of each bag and
is then placed in the other bag. What is the expected number
of red balls in the first bag? (SUJB)

Solution 3.7 (a) P(1 or 6 on die) = $\frac{2}{6} = \frac{1}{3}$.

If three dice are thrown, P(neither a 1 nor a 6 on all three) = $\left(\frac{2}{3}\right)^3 = \frac{8}{27}$.
So P(a 1 or a 6 turns up) = $1 - \left(\frac{2}{3}\right)^3 = \frac{19}{27}$.
Let X be the r.v. ‘the number of pence won in a game’. Then X can
assume the values -5 and 1 only. Now

P(X = -5) = P(neither a 1 nor a 6) = $\frac{8}{27}$

P(X = 1) = P(a 1 or a 6) = $\frac{19}{27}$

The probability distribution for X is

x          |  -5      1
P(X = x)   | 8/27   19/27

So $E(X) = \sum_{\text{all } x} x\,P(X = x)$

         $= (-5)\left(\frac{8}{27}\right) + (1)\left(\frac{19}{27}\right)$

         $= -\frac{7}{9}$

Therefore the expected loss after one game is $\frac{7}{9}$p,
so, after 9 games, the expected loss is 7p.

If we change the rule for payment to y pence when a 1 or a 6 turns
up then the probability distribution becomes

x          |  -5      y
P(X = x)   | 8/27   19/27

We now have $E(X) = (-5)\left(\frac{8}{27}\right) + y\left(\frac{19}{27}\right)$

                  $= \frac{-40 + 19y}{27}$

To make the game worthwhile, we require E(X) > 0,
i.e. -40 + 19y > 0.

So $y > \frac{40}{19}$

Therefore the minimum amount we require to be paid, in everyday
currency, is 3p.

(b) Three coins are thrown. Let X be the r.v. ‘the number of pence
paid in 1 game’.
Then

P(X = 1) = P(1 head) = P(HTT) + P(TTH) + P(THT) = $3\left(\frac{1}{2}\right)^3 = \frac{3}{8}$

P(X = 3) = P(2 heads) = P(HHT) + P(HTH) + P(THH) = $3\left(\frac{1}{2}\right)^3 = \frac{3}{8}$

P(X = 5) = P(3 heads) = P(HHH) = $\left(\frac{1}{2}\right)^3 = \frac{1}{8}$

If the penalty if 0 heads turn up is y p,

P(X = -y) = P(0 heads) = P(TTT) = $\left(\frac{1}{2}\right)^3 = \frac{1}{8}$

The probability distribution of X is

x          | -y    1    3    5
P(X = x)   | 1/8  3/8  3/8  1/8

So $E(X) = \sum_{\text{all } x} x\,P(X = x)$

         $= (-y)\left(\frac{1}{8}\right) + 1\left(\frac{3}{8}\right) + 3\left(\frac{3}{8}\right) + 5\left(\frac{1}{8}\right)$

         $= \frac{17 - y}{8}$

Now, if the game is to be fair, the expected winnings must be zero.

So $\frac{17 - y}{8} = 0$
i.e. y = 17
Therefore the penalty if 0 heads turn up should be 17p.

(c) Assume that the balls are taken from each bag simultaneously.
If a red ball is picked from each bag and placed in the other then
the number of red balls in the first bag is now 3, etc.
Let X be the r.v. ‘the final number of red balls in the first bag’.
Then X can assume the values 2, 3 or 4 only.

P(X = 2) = P(red from first bag and blue from second bag)
         = $P(R_1 B_2)$ with obvious notation
         = $\left(\frac{3}{4}\right)\left(\frac{1}{2}\right) = \frac{3}{8}$

P(X = 3) = $P(R_1 R_2) + P(B_1 B_2)$
         = $\left(\frac{3}{4}\right)\left(\frac{1}{2}\right) + \left(\frac{1}{4}\right)\left(\frac{1}{2}\right) = \frac{1}{2}$

P(X = 4) = $P(B_1 R_2)$
         = $\left(\frac{1}{4}\right)\left(\frac{1}{2}\right) = \frac{1}{8}$

So $E(X) = \sum_{\text{all } x} x\,P(X = x)$

         $= 2\left(\frac{3}{8}\right) + 3\left(\frac{1}{2}\right) + 4\left(\frac{1}{8}\right)$

         $= 2\frac{3}{4}$

The expected number of red balls in the first bag after the exchange is $2\frac{3}{4}$.
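
Parts (a) and (c) of Example 3.7 can be checked with a few lines of Python. This is a sketch under the assumptions stated in the solution (fair dice, balls drawn simultaneously and at random); the function names are chosen here for illustration only.

```python
from fractions import Fraction


def dice_game_mean(win_payment, lose_payment=5):
    """E(X) in pence for part (a): win if at least one 1 or 6 shows on three dice."""
    p_lose = Fraction(2, 3) ** 3   # none of the three dice shows a 1 or a 6
    p_win = 1 - p_lose
    return win_payment * p_win - lose_payment * p_lose


def minimum_worthwhile_payment():
    """Smallest whole number of pence y for which E(X) > 0."""
    y = 1
    while dice_game_mean(y) <= 0:
        y += 1
    return y


def expected_reds_in_first_bag():
    """Part (c): bag 1 holds 3 red, 1 blue; bag 2 holds 1 red, 1 blue."""
    first = {"R": Fraction(3, 4), "B": Fraction(1, 4)}
    second = {"R": Fraction(1, 2), "B": Fraction(1, 2)}
    mean = Fraction(0)
    for ball1, p1 in first.items():
        for ball2, p2 in second.items():
            # Bag 1 loses ball1 and gains ball2.
            reds = 3 - (ball1 == "R") + (ball2 == "R")
            mean += reds * p1 * p2
    return mean


print(dice_game_mean(1), 9 * dice_game_mean(1))
print(minimum_worthwhile_payment())
print(expected_reds_in_first_bag())
```

The first line of output gives the expected gain per game and over 9 games with the original 1p payment; the remaining lines give the minimum worthwhile payment and the expected number of red balls.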

Exercise 3b

1. The probability distribution for the r.v. X is shown in the table:

   Find E(X).

2. The r.v. X has p.d.f. P(X = x) for x = 5, 6, 7, 8, 9 as defined in the table:

   Find E(X).

3. The probability distribution of a r.v. X is as shown in the table:

   P(X = x)   | 0.1   0.3   y   0.2   0.1

   Find (a) the value of y, (b) E(X).

4. Find the expected number of heads when two fair coins are tossed.

5. Find the expected number of ones when three ordinary fair dice are thrown.

6. A bag contains 5 black counters and 6 red counters. Two counters are drawn,
one at a time, and not replaced. Let X be the r.v. ‘the number of red counters
drawn’. Find E(X).

7. An unbiased tetrahedral die has faces marked 1, 2, 3, 4. If the die lands on
the face marked 1, the player has to pay 10p. If it lands on a face marked with
a 2 or a 4, the player wins 5p and if it lands on a 3, the player wins 3p. Find
the expected gain in one throw.

8. A discrete r.v. X can assume values 10 and 20 only. If E(X) = 16, write the
p.d.f. of X in table form.

9. The discrete r.v. X can assume values 0, 1, 2 and 3 only. Given
P(X ≤ 2) = 0.9, P(X ≤ 1) = 0.5 and E(X) = 1.4, find
(a) P(X = 1), (b) P(X = 0).

10. In a game, a player rolls two balls down an inclined plane so that each ball
finally settles in one of five slots and scores the number of points allotted to
that slot as shown in the diagram below:

It is possible for both balls to settle in one slot and it may be assumed that
each slot is equally likely to accept either ball. The player's score is the sum
of the points scored by each ball.
Draw up a table showing all the possible scores and the probability of each.
If the player pays 10p for each game and receives back a number of pence equal
to his score, calculate the player's expected gain or loss per 50 games.
(C Additional)

11. In a game a player tosses three fair coins. He wins £10 if 3 heads occur,
£x if 2 heads occur, £3 if 1 head occurs and £2 if no heads occur. Express in
terms of x his expected gain from each game.
Given that he pays £4.50 to play each game, calculate
(a) the value of x for which the game is fair,
(b) his expected gain or loss over 100 games if x = 4.90. (C Additional)

12. A committee of 3 is to be chosen from 4 girls and 7 boys. Find the expected
number of girls on the committee, if the members of the committee are chosen at
random.

13. The discrete r.v. X has p.d.f. given by P(X = x) = kx for x = 1, 2, 3, 4, 5
where k is constant. Find E(X).

14. In an examination a candidate is given the four answers to four questions
but is not told which answer applies to which question. He is asked to write
down each of the four answers next to its appropriate question.
(a) Calculate in how many different ways he could write down the four answers.
(b) Explain why it is impossible for him to have just three answers in the
correct places and show that there are six ways of getting just two answers in
the correct places.
(c) If a candidate guesses at random where the four answers are to go and X is
the number of correct guesses he makes, draw up the probability distribution for
X in tabular form.
(d) Calculate E(X). (L Additional)

15. The above table shows the probability distribution for a random variable X.
Calculate (a) c, (b) E(X). (L Additional)

16. A box contains 9 discs of which 4 are red, 3 are white and 2 are blue. Three
discs are to be drawn at random without replacement from the box. Calculate
(a) the probability that the discs, in the order drawn, will be coloured red,
white and blue respectively,
(b) the probability that one disc of each colour will be drawn,
(c) the probability that the third disc drawn will be red,
(d) the probability that no red disc will be drawn,
(e) the most probable number of red discs that will be drawn,
(f) the expected number of red discs that will be drawn, and state the
probability that this expected number of red discs will be drawn. (JMB)

17. A woman has 3 keys on a ring, just one of which opens the front door. As she
approaches the front door she selects one key after another at random without
replacement. Draw a tree diagram to illustrate the various selections before she
finds the correct key. Use this diagram to calculate the expected number of keys
that she will use before opening the door. (L Additional)

18. An urn containing 4 black balls and 8 white balls is used for two
experiments. In experiment 1, two balls are to be drawn at random from the urn,
one after the other, without replacement. In experiment 2, one ball is to be
drawn at random from the 12 balls in the urn and replaced before a second ball
is drawn at random.
Copy and complete the following two tables, which give the probabilities for the
different compound events in the two experiments.

For each of the two experiments, calculate the expected number of black balls
which will be drawn.
If in experiment 2, the urn contains b black balls and w white balls, where
b + w = 12, calculate the expected number of black balls which will be drawn.
(L Additional)

THE EXPECTATION OF ANY FUNCTION OF X, E[g(X)]


The definition of expectation can be extended to any function of
the random variable such as 10X, X?, (X — 4)‘, etc.

In general, if g(X) is any function of the discrete random variable


X then

$E[g(X)] = \sum_{\text{all } x} g(x)\,P(X = x)$

Example 3.8  In a game a turn consists of a tetrahedral die being thrown three
times. The faces on the die are marked 1, 2, 3, 4 and the number on
which the die falls is noted. A man wins £x² whenever x fours
occur in a turn. Find his average win per turn.

Solution 3.8  Let X be the r.v. ‘the number of fours obtained when the die is
thrown three times’. Then X can assume the values 0, 1, 2, 3 only.

On any one throw P(4) = 1/4 and P(not 4) = 3/4, so we have

    P(X = 0) = (3/4)³ = 27/64
    P(X = 1) = 3(1/4)(3/4)² = 27/64
    P(X = 2) = 3(1/4)²(3/4) = 9/64
    P(X = 3) = (1/4)³ = 1/64

The average win is given by E(X²), so we write out the probability
distribution for X, but add a row showing the values of x² to help
make the calculations easier:

    x          0      1      2      3
    x²         0      1      4      9
    P(X = x)   27/64  27/64  9/64   1/64

Now
    E(X²) = Σ x²P(X = x)
          = 0(27/64) + 1(27/64) + 4(9/64) + 9(1/64)
          = 72/64
          = 1.125

Therefore his average win per turn is £1.13 (nearest p).
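
The calculation in Example 3.8 can be checked with a short program. The following is a minimal sketch, not part of the original text: the function name `expectation` and the dictionary `fours` are our own illustrative choices, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction


def expectation(pmf, g=lambda x: x):
    """Return E[g(X)]: the sum of g(x) * P(X = x) over every x in the table."""
    return sum(g(x) * p for x, p in pmf.items())


# P(X = x) for the number of fours in three throws of a fair tetrahedral die.
fours = {
    0: Fraction(27, 64),
    1: Fraction(27, 64),
    2: Fraction(9, 64),
    3: Fraction(1, 64),
}

average_win = expectation(fours, lambda x: x ** 2)  # E(X^2), in pounds
print(average_win, float(average_win))  # 9/8 1.125
```

Passing a different function g gives the expectation of any other function of X from the same table.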
180 A CONCISE COURSE IN A-LEVEL STATISTICS

Example 3.9  The random variable X has p.d.f. P(X = x) for x = 1, 2, 3.

    x          1    2    3
    P(X = x)   0.1  0.6  0.3

Calculate (a) E(3), (b) E(X), (c) E(5X), (d) E(5X + 3),
(e) 5E(X) + 3, (f) E(X²), (g) E(4X² − 3), (h) 4E(X²) − 3.
Comment on your answers to parts (d) and (e) and parts (g) and (h).

Solution 3.9  We have

    x          1    2    3
    P(X = x)   0.1  0.6  0.3

Now E[g(X)] = Σ g(x)P(X = x), summed over all x.

(a) E(3) = Σ 3P(X = x)
         = 3(0.1) + 3(0.6) + 3(0.3)
         = 3
    E(3) = 3

(b) E(X) = Σ xP(X = x)
         = 1(0.1) + 2(0.6) + 3(0.3)
         = 2.2
    E(X) = 2.2

(c) E(5X) = Σ 5xP(X = x)
          = 5(0.1) + 10(0.6) + 15(0.3)
          = 11
    E(5X) = 11

(d) E(5X + 3) = Σ (5x + 3)P(X = x)
              = 8(0.1) + 13(0.6) + 18(0.3)
              = 14
    E(5X + 3) = 14

(e) 5E(X) + 3 = 5(2.2) + 3
              = 14
    5E(X) + 3 = 14

(f) E(X²) = Σ x²P(X = x)
          = 1(0.1) + 4(0.6) + 9(0.3)
          = 5.2
    E(X²) = 5.2

(g) E(4X² − 3) = Σ (4x² − 3)P(X = x)
               = 1(0.1) + 13(0.6) + 33(0.3)
               = 17.8
    E(4X² − 3) = 17.8

(h) 4E(X²) − 3 = 4(5.2) − 3
               = 17.8
    4E(X²) − 3 = 17.8

We note that  E(5X + 3) = 5E(X) + 3
              E(4X² − 3) = 4E(X²) − 3

In general, the following results hold when X is a discrete random
variable:

Result 1  E(a) = a, where a is any constant.

Proof:  E(a) = Σ aP(X = x)
             = a Σ P(X = x)
             = a      since Σ P(X = x) = 1

Result 2  E(aX) = aE(X), where a is any constant.

Proof:  E(aX) = Σ axP(X = x)
              = a Σ xP(X = x)
              = aE(X)

Result 3  E(aX + b) = aE(X) + b, where a and b are any constants.

Proof:  E(aX + b) = Σ (ax + b)P(X = x)
                  = Σ axP(X = x) + Σ bP(X = x)
                  = aE(X) + b

Result 4  E[f₁(X) + f₂(X)] = E[f₁(X)] + E[f₂(X)], where f₁ and f₂
are functions of X.

Proof:  E[f₁(X) + f₂(X)] = Σ [f₁(x) + f₂(x)]P(X = x)
                         = Σ f₁(x)P(X = x) + Σ f₂(x)P(X = x)
                         = E[f₁(X)] + E[f₂(X)]

In each case the sum is taken over all x.
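
Results 3 and 4 can be checked numerically for a particular distribution. The sketch below uses the distribution of Example 3.9; the helper `expectation` and the variable names are our own assumptions, introduced only for illustration. A check on one distribution is of course not a proof.

```python
from fractions import Fraction


def expectation(pmf, g=lambda x: x):
    """Return E[g(X)] for a table {x: P(X = x)}."""
    return sum(g(x) * p for x, p in pmf.items())


# Distribution from Example 3.9, held as exact fractions.
pmf = {1: Fraction(1, 10), 2: Fraction(6, 10), 3: Fraction(3, 10)}

a, b = 5, 3
lhs_result3 = expectation(pmf, lambda x: a * x + b)   # E(5X + 3)
rhs_result3 = a * expectation(pmf) + b                # 5E(X) + 3

lhs_result4 = expectation(pmf, lambda x: 4 * x ** 2 - 3)      # E(4X^2 - 3)
rhs_result4 = 4 * expectation(pmf, lambda x: x ** 2) - 3      # 4E(X^2) - 3

print(lhs_result3, rhs_result3)                # 14 14
print(float(lhs_result4), float(rhs_result4))  # 17.8 17.8
```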

Exercise 3c

1. The discrete r.v. X has p.d.f. P(X = x) for x = 1, 2, 3.

       x          1    2    3
       P(X = x)   0.2  0.3  0.5

   Find (a) E(X), (b) E(X²).
   (c) Verify that E(3X − 1) = 3E(X) − 1.
   (d) Verify that E(2X² + 4) = 2E(X²) + 4.

2. The discrete r.v. X has p.d.f. P(X = 0) = 0.05, P(X = 1) = 0.45,
   P(X = 2) = 0.5. Verify that
   E(5X² + 2X − 3) = 5E(X²) + 2E(X) − 3.

3. The discrete r.v. X has p.d.f. given by P(X = x) = 1/6 for
   x = 1, 2, 3, 4, 5, 6.
   Find (a) E(X), (b) E(X²), (c) E(8X + 4).
   Verify that E(2X² + X − 4) = 2E(X²) + E(X) − 4.

4. The discrete r.v. X has p.d.f. given by P(X = x) = (3x + 1)/… for x = …
   Find (a) E(X), (b) E(X²), (c) E(3X − 2), (d) E(2X² + 4X − 3).

5. A roulette wheel is divided into 6 sectors 6. Ther.v. X has p.d.f. P(X = x) as shown in
of unequal area, marked with the numbers the table:
1,2,3,4,5 and 6. The wheel is spun and
X is the r.v. ‘the number on which the
wheel stops’. The probability distribution
of X is as follows: P(X=x)|01 01 O08 O04 O12

5 et Find the value of c (a) if E(X) = 0.3,


pie} mle]
& Ble|
o

Calculate (a) E(X), (b) E(X²), (c) E(3X − 5), (d) E(6X²),
(e) E(6X² + 6X − 10).

VARIANCE, Var(X)

Consider the discrete r.v. X and let E(X) = μ (pronounced ‘mew’).

The variance of X, written Var(X), is given by

    Var(X) = E(X − μ)²

For a frequency distribution we found that the formula for the
variance, s² = Σf(x − x̄)²/Σf, could be written in the alternative form

    s² = Σfx²/Σf − x̄²

In the same way, we can find an alternative form for the formula
for Var(X). Now

    Var(X) = E(X − μ)²
           = E(X² − 2μX + μ²)
           = E(X²) − 2μE(X) + E(μ²)
           = E(X²) − 2μ² + μ²
           = E(X²) − μ²

So we have    Var(X) = E(X²) − μ²

NOTE: μ² = [E(X)]².
We write [E(X)]² as E²(X) in a similar way to the notation used in
trigonometry where (sin A)² is written sin²A.

So we have    Var(X) = E(X²) − E²(X)

Example 3.10  The r.v. X has probability distribution as shown in the table:

    x          1    2    3    4    5
    P(X = x)   0.1  0.3  0.2  0.3  0.1

Find
(a) μ = E(X),
(b) Var(X), using the formula Var(X) = E(X − μ)²,
(c) E(X²),
(d) Var(X), using the formula Var(X) = E(X²) − μ².

Solution 3.10  (a) By symmetry, μ = E(X) = 3.

(b) E(X − μ)² = E(X − 3)²
              = Σ (x − 3)²P(X = x), summed over all x

    x          1    2    3    4    5
    x − 3      −2   −1   0    1    2
    (x − 3)²   4    1    0    1    4
    P(X = x)   0.1  0.3  0.2  0.3  0.1

So  E(X − 3)² = 4(0.1) + 1(0.3) + 0(0.2) + 1(0.3) + 4(0.1)
              = 1.4
Therefore Var(X) = E(X − μ)² = 1.4.

(c) E(X²) = Σ x²P(X = x), summed over all x
          = 1(0.1) + 4(0.3) + 9(0.2) + 16(0.3) + 25(0.1)
          = 10.4
So E(X²) = 10.4.

(d) Now Var(X) = E(X²) − μ²
               = 10.4 − 9
               = 1.4
Therefore Var(X) = 1.4, as before.

Usually we use the most convenient form of the formula for the
variance.
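
The two forms of the variance formula can be compared directly in code. This is a minimal sketch using the distribution of Example 3.10; the function names `variance_from_definition` and `variance_alternative` are illustrative assumptions of ours, not notation from the text.

```python
from fractions import Fraction


def expectation(pmf, g=lambda x: x):
    """Return E[g(X)] for a table {x: P(X = x)}."""
    return sum(g(x) * p for x, p in pmf.items())


def variance_from_definition(pmf):
    """Var(X) = E(X - mu)^2, where mu = E(X)."""
    mu = expectation(pmf)
    return expectation(pmf, lambda x: (x - mu) ** 2)


def variance_alternative(pmf):
    """Var(X) = E(X^2) - E^2(X)."""
    return expectation(pmf, lambda x: x ** 2) - expectation(pmf) ** 2


# Distribution from Example 3.10.
pmf = {
    1: Fraction(1, 10),
    2: Fraction(3, 10),
    3: Fraction(2, 10),
    4: Fraction(3, 10),
    5: Fraction(1, 10),
}

print(float(variance_from_definition(pmf)))  # 1.4
print(float(variance_alternative(pmf)))      # 1.4
```

The second form needs only one pass through the table once E(X) is known, which is why it is usually the more convenient one for hand calculation.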

Example 3.11  Two discs are drawn, without replacement, from a box containing
3 red discs and 4 white discs. The discs are drawn at random. If X
is the r.v. ‘the number of red discs drawn’, find (a) E(X), (b) the
standard deviation of X.

Solution 3.11  X is the r.v. ‘the number of red discs drawn’.

Now X can assume the values 0, 1, 2 only. We have

    P(X = 0) = P(W₁W₂) = (4/7)(3/6) = 12/42 = 2/7

    P(X = 1) = P(W₁R₂) + P(R₁W₂) = (4/7)(3/6) + (3/7)(4/6) = 24/42 = 4/7

    P(X = 2) = P(R₁R₂) = (3/7)(2/6) = 6/42 = 1/7

The probability distribution for X is as follows:

    x          0    1    2
    P(X = x)   2/7  4/7  1/7

(a) Now E(X) = Σ xP(X = x), summed over all x
             = 0(2/7) + 1(4/7) + 2(1/7)
             = 6/7

So E(X) = 6/7, or the expected number of red discs is 6/7.

(b) Standard deviation of X = √Var(X).

Now Var(X) = E(X²) − E²(X)

We have E(X²) = Σ x²P(X = x), summed over all x
              = 0(2/7) + 1(4/7) + 4(1/7)
              = 8/7

So Var(X) = 8/7 − (6/7)²
          = 20/49
          = 0.408 (3 d.p.)

Therefore the standard deviation of X = √0.408 = 0.639 (3 d.p.).

The following results are useful.

Result 1  Var(a) = 0, where a is any constant.

Proof:  Var(a) = E(a²) − E²(a)
               = a² − a²
               = 0

NOTE: this is as expected, as a constant does not vary.

Result 2  Var(aX) = a²Var(X), where a is any constant.

Proof:  Var(aX) = E[(aX)²] − E²(aX)
                = a²E(X²) − a²E²(X)
                = a²[E(X²) − E²(X)]
                = a²Var(X)

Result 3  Var(aX + b) = a²Var(X), where a and b are any constants.

Proof:
    Var(aX + b) = E[(aX + b)²] − E²(aX + b)
                = E(a²X² + 2abX + b²) − [aE(X) + b]²
                = a²E(X²) + 2abE(X) + b² − a²E²(X) − 2abE(X) − b²
                = a²E(X²) − a²E²(X)
                = a²[E(X²) − E²(X)]
                = a²Var(X)
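
To see these results at work, we can build the distribution of aX + b explicitly and compute its variance from scratch. The sketch below is our own illustration: the helper `linear_transform` and the choice of a fair tetrahedral die for X are assumptions made for the example.

```python
from fractions import Fraction


def expectation(pmf, g=lambda x: x):
    """Return E[g(X)] for a table {x: P(X = x)}."""
    return sum(g(x) * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = E(X^2) - E^2(X)."""
    return expectation(pmf, lambda x: x ** 2) - expectation(pmf) ** 2


def linear_transform(pmf, a, b):
    """Return the distribution of aX + b as a new table {value: probability}."""
    out = {}
    for x, p in pmf.items():
        y = a * x + b
        out[y] = out.get(y, 0) + p  # add up, in case two x values give the same y
    return out


# X is the score on a fair tetrahedral die (an illustrative choice).
die = {x: Fraction(1, 4) for x in (1, 2, 3, 4)}

var_x = variance(die)                                  # 5/4
var_2x_plus_3 = variance(linear_transform(die, 2, 3))  # 4 * 5/4 = 5
var_constant = variance(linear_transform(die, 0, 7))   # a constant: 0

print(var_x, var_2x_plus_3, var_constant)  # 5/4 5 0
```

Setting a = 0 collapses every value onto the constant b, which reproduces Result 1.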

Example 3.12  The discrete r.v. X has the probability distribution shown in the
table.

    x          1    2    3    4
    P(X = x)   3/8  1/8  1/4  1/4

Verify that Var(2X + 3) = 4 Var(X).


Solution 3.12  First, we need to find E(X). Now

    E(X) = Σ xP(X = x), summed over all x
         = 19/8

Now Var(X) = E(X²) − E²(X)

We have E(X²) = Σ x²P(X = x), summed over all x
              = 57/8

So  Var(X) = 57/8 − (19/8)²
           = 95/64

Now consider 2X + 3. We require

    E(2X + 3) = Σ (2x + 3)P(X = x), summed over all x
              = 31/4

    E[(2X + 3)²] = Σ (2x + 3)²P(X = x), summed over all x
                 = 66

Therefore
    Var(2X + 3) = E[(2X + 3)²] − E²(2X + 3)
                = 66 − (31/4)²
                = 95/16
                = 4(95/64)
                = 4 Var(X)

Therefore Var(2X + 3) = 4 Var(X).

Exercise 3d

1. The probability distribution for the r.v. X is as shown:

   Find (a) μ = E(X), (b) …
   (c) Verify that E(X − μ)² = E(X²) − μ².
   (d) Verify that Var(3X) = 9Var(X).
   (e) Verify that Var(3X + 4) = 9Var(X).

2. If X is the r.v. ‘the sum of the scores on two tetrahedral dice’, where
   the ‘score’ is the number on which the die lands, find
   (a) E(X), (b) Var(X), (c) Var(2X), (d) Var(9X + 3).

3. Find Var(X) for each of the following probability distributions:
   (a)
   (b)
   (c)

4. If X is the r.v. ‘the number on a biased die’, and the p.d.f. of X is
   shown,

   find (a) the value of y, (b) E(X), (c) E(X²), (d) Var(X), (e) Var(4X).

5. A team of 3 is to be chosen from 4 boys and 5 girls. If X is the r.v.
   ‘the number of girls in the team’, find (a) E(X), (b) E(X²), (c) Var(X).

6. The r.v. X has p.d.f. as shown:

       P(X = x)   0.11  0.28  0.33  0.18  0.10

   (a) Find E(3X² − 5X + 7).
   (b) Verify that Var(2X − 1) = 4Var(X).

7. Two discs are drawn without replacement from a box containing 3 red and
   4 white discs. If X is the r.v. ‘the number of white discs drawn’,
   construct a probability distribution table.
   Find (a) E(X), (b) E(X²), (c) Var(X), (d) Var(3X − 4).

8. For the following probability distribution find (a) μ = E(X),
   (b) E(X²), (c) E(X − μ)². Verify that E(X − μ)² = E(X²) − μ².

9. Ten identically shaped discs are in a bag; two of them are black, the
   rest white. Discs are drawn at random from the bag in turn and not
   replaced.
   Let X be the number of discs drawn up to and including the first black
   one.
   List the values of X and the associated theoretical probabilities.
   Calculate the mean value of X and its standard deviation. What is the
   most likely value of X?
   If instead each disc is replaced before the next is drawn, construct a
   similar list of values and point out the chief differences between the
   two lists. (SUJB)

10. The discrete r.v. X has p.d.f. P(X = x) = k|x| where x takes the values
    −3, −2, −1, 0, 1, 2, 3. Find (a) the value of the constant k,
    (b) E(X), (c) E(X²), (d) the standard deviation of X.

11. The random variable X takes integer values only and has p.d.f.

        P(X = x) = kx          x = 1, 2, 3, 4, 5
        P(X = x) = k(10 − x)   x = 6, 7, 8, 9

    Find (a) the value of the constant k, (b) E(X), (c) Var(X),
    (d) E(2X − 3), (e) Var(2X − 3).

THE CUMULATIVE DISTRIBUTION FUNCTION

When we had a frequency distribution, the corresponding cumulative
frequencies were obtained by summing all the frequencies up to
a particular value. In the same way, if X is a discrete random
variable, the corresponding cumulative probabilities are obtained by
summing all the probabilities up to a particular value.

If X is a discrete random variable with p.d.f. P(X = x) for
x = x₁, x₂, ..., xₙ, then the cumulative distribution function is given by
F(t) where

    F(t) = P(X ≤ t)
         = Σ P(X = x), summed from x = x₁ to x = t,    t = x₁, x₂, ..., xₙ

The cumulative distribution function is sometimes known just as the
distribution function.

Example 3.13  Find the cumulative distribution function for the r.v. X where X is
‘the score on an unbiased die’.

Solution 3.13  The probability distribution for X is shown in the table:

    x          1    2    3    4    5    6
    P(X = x)   1/6  1/6  1/6  1/6  1/6  1/6

    F(1) = P(X ≤ 1) = 1/6
    F(2) = P(X ≤ 2) = P(X = 1) + P(X = 2) = 2/6
    F(3) = P(X ≤ 3) = 3/6
    F(4) = P(X ≤ 4) = 4/6
    F(5) = P(X ≤ 5) = 5/6
    F(6) = P(X ≤ 6) = 6/6

Therefore F(t) = t/6 for t = 1, 2, ..., 6.

NOTE: (a) F(6) = 1, as expected.
(b) Although we work with the variable t we often write the final
answer in terms of x; i.e. F(x) = x/6, x = 1, 2, ..., 6.
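
Cumulative probabilities are simply running totals, so they are easy to compute mechanically. The following minimal sketch builds the cumulative distribution table for the unbiased die of Example 3.13; the variable names are our own, and `itertools.accumulate` from the Python standard library does the running sum.

```python
from fractions import Fraction
from itertools import accumulate

# P(X = x) for the score on an unbiased die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

values = sorted(die)
# F(t) = P(X <= t): running total of the probabilities, in increasing order of x.
cdf = dict(zip(values, accumulate(die[x] for x in values)))

for t in values:
    print(t, cdf[t])

# A single probability can be recovered as a difference of cumulative values.
p_three = cdf[3] - cdf[2]  # P(X = 3) = F(3) - F(2)
```

The last line uses the same idea as Example 3.15, where P(X = 3) is found from F(3) − F(2).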

Example 3.14 The probability distribution for the r.v. X is shown in the table.
Construct the cumulative distribution table.

Solution 3.14  Now F(t) = Σ P(X = x), summed from x = 0 to x = t, for
t = 0, 1, 2, ..., 6.

So  F(0) = P(X ≤ 0) = 0.03
    F(1) = P(X ≤ 1) = 0.03 + 0.04 = 0.07
    F(2) = P(X ≤ 2) = 0.03 + 0.04 + 0.06 = 0.13
and so on.
So we have the cumulative distribution table:

    x      0     1     2     ...
    F(x)   0.03  0.07  0.13  ...

NOTE: it is not possible to write a formula for the cumulative
distribution function in this example.

Example 3.15 For a discrete r.v. X the cumulative distribution function F(x) is
as shown:

Find (a) P(X = 3), (b) P(X > 2).



Solution 3.15  (a) From the table,

    F(3) = P(X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = 0.67
    F(2) = P(X ≤ 2) = P(X = 1) + P(X = 2) = 0.32

Now P(X = 3) = F(3) − F(2)

Therefore P(X = 3) = 0.67 − 0.32
                   = 0.35

(b) P(X > 2) = 1 − P(X ≤ 2)
             = 1 − F(2)
             = 1 − 0.32
             = 0.68

So, P(X = 3) = 0.35 and P(X > 2) = 0.68.

Exercise 3e

1. Construct the cumulative distribution tables for the following discrete
   random variables:
   (a) the number of sixes obtained when two ordinary dice are thrown,
   (b) the smaller number when two ordinary dice are thrown,
   (c) the number of heads when three fair coins are tossed.

2. The probability distribution for the r.v. Y is shown in the table:

       P(Y = y)   0.05  0.25  0.3  0.15  0.25

   Construct the cumulative distribution table.

3. For a discrete r.v. R the cumulative distribution function F(r) is as
   shown in the table:

       F(r)   0.18  0.54  0.75  …

   Find (a) P(R = 2), (b) P(R > 1), (c) P(R > 3), (d) P(R < 2), (e) E(R).

4. For the discrete r.v. X the cumulative distribution function F(x) is as
   shown:

   Construct the probability distribution of X, and find Var(X).

5. For a discrete r.v. X the cumulative distribution function is given by
   F(x) = … for x = 1, 2, 3. Find (a) F(2), (b) P(X = 2),
   (c) Write out the probability distribution of X, (d) Find E(2X − 3).

6. For a discrete r.v. X the cumulative distribution function is given by
   F(x) = kx, x = 1, 2, 3. Find (a) the value of the constant k,
   (b) P(X < 3), (c) the probability distribution of X, (d) the standard
   deviation of X.

7. The discrete r.v. X has distribution function F(x) where
   F(x) = … for x = 1, 2, 3, …
   (a) Show that F(3) = … and F(2) = …
   (b) Obtain the probability distribution of X.
   (c) Find E(X) and Var(X).
   (d) Find P(X > E(X)).

TWO INDEPENDENT RANDOM VARIABLES

If X and Y are any two random variables, then

    E(X + Y) = E(X) + E(Y)

If X and Y are independent random variables, then

    Var(X + Y) = Var(X) + Var(Y)

Example 3.16  X is the r.v. ‘the score on a tetrahedral die’, Y is the r.v. ‘the number
of heads obtained when two coins are tossed’.
(a) Obtain the probability distributions of X and of Y.
(b) Find E(X) and E(Y).
(c) Find Var(X) and Var(Y).
(d) Obtain the probability distribution for the r.v. X + Y.
(e) Find E(X + Y) and Var(X + Y) using the probability distribution
for X + Y; comment on your results.

Solution 3.16
(a) The probability distributions are as follows:

    x          1    2    3    4          y          0    1    2
    P(X = x)   1/4  1/4  1/4  1/4        P(Y = y)   1/4  1/2  1/4

(b) By symmetry, E(X) = 2½ and E(Y) = 1.

(c) E(X²) = Σ x²P(X = x)
          = 1(1/4) + 4(1/4) + 9(1/4) + 16(1/4)
          = 7½
    E(Y²) = Σ y²P(Y = y)
          = 0(1/4) + 1(1/2) + 4(1/4)
          = 1½

    Var(X) = E(X²) − E²(X) = 7½ − 6¼ = 1¼
    Var(Y) = E(Y²) − E²(Y) = 1½ − 1 = ½

So Var(X) = 1¼ and Var(Y) = ½.

(d) Consider the r.v. X + Y.
X + Y can assume values 1, 2, 3, 4, 5 and 6.

    P(X + Y = 1) = P(1 on die, 0 heads)
                 = (1/4)(1/4) = 1/16

    P(X + Y = 2) = P(2 on die, 0 heads) + P(1 on die, 1 head)
                 = (1/4)(1/4) + (1/4)(1/2) = 3/16

    P(X + Y = 3) = P(3 on die, 0 heads) + P(2 on die, 1 head)
                   + P(1 on die, 2 heads)
                 = (1/4)(1/4) + (1/4)(1/2) + (1/4)(1/4) = 4/16

    P(X + Y = 4) = P(4 on die, 0 heads) + P(3 on die, 1 head)
                   + P(2 on die, 2 heads)
                 = (1/4)(1/4) + (1/4)(1/2) + (1/4)(1/4) = 4/16

    P(X + Y = 5) = P(4 on die, 1 head) + P(3 on die, 2 heads)
                 = (1/4)(1/2) + (1/4)(1/4) = 3/16

    P(X + Y = 6) = P(4 on die, 2 heads)
                 = (1/4)(1/4) = 1/16

The probability distribution is as follows:

    x + y              1     2     3     4     5     6
    P(X + Y = x + y)   1/16  3/16  4/16  4/16  3/16  1/16

(e) By symmetry E(X + Y) = 3½.

But from (b) E(X) + E(Y) = 2½ + 1 = 3½.

Therefore E(X + Y) = E(X) + E(Y).

Now Var(X + Y) = E[(X + Y)²] − E²(X + Y)

    E[(X + Y)²] = [1(1) + 4(3) + 9(4) + 16(4) + 25(3) + 36(1)]/16
                = 224/16
                = 14

    Var(X + Y) = 14 − 12¼
               = 1¾

Therefore Var(X + Y) = 1¾.

NOTE: Var(X) + Var(Y) = 1¼ + ½ = 1¾.

So Var(X + Y) = Var(X) + Var(Y).

In this example, the variables X and Y are independent.
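
The table for X + Y in part (d) can be generated mechanically: for independent variables, each pair (x, y) contributes P(X = x)·P(Y = y) to the total x + y. The sketch below is our own illustration of that idea; the function name `sum_distribution` is an assumption, and the method relies on X and Y being independent.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def sum_distribution(pmf_x, pmf_y):
    """Distribution of X + Y for INDEPENDENT X and Y, as {total: probability}."""
    total = defaultdict(Fraction)  # Fraction() is 0, so missing totals start at 0
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        total[x + y] += px * py  # independence: P(X = x and Y = y) = px * py
    return dict(total)


def expectation(pmf, g=lambda x: x):
    return sum(g(x) * p for x, p in pmf.items())


def variance(pmf):
    return expectation(pmf, lambda x: x ** 2) - expectation(pmf) ** 2


die = {x: Fraction(1, 4) for x in (1, 2, 3, 4)}                # tetrahedral die
heads = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}  # two coins

s = sum_distribution(die, heads)
for value in sorted(s):
    print(value, s[value])

print(expectation(s), variance(s))  # 7/2 7/4
```

The printed mean and variance agree with E(X) + E(Y) = 3½ and Var(X) + Var(Y) = 1¾.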

In general, for any two random variables X and Y and constants a and b,

    E(aX + bY) = aE(X) + bE(Y)

If X and Y are independent, then

    Var(aX + bY) = a²Var(X) + b²Var(Y)

An important application of this occurs when a = 1 and b = −1.
In this case, the r.v. is X + (−1)Y, i.e. X − Y, and

    E(X − Y) = E(X) − E(Y)
    Var(X − Y) = Var(X) + (−1)²Var(Y)
               = Var(X) + Var(Y)

Example 3.17  X and Y are independent random variables with p.d.f. as shown:

    x          0    1    2          y          1    2    3
    P(X = x)   0.2  0.6  0.2        P(Y = y)   0.3  0.4  0.3

Construct the probability distribution for X − Y and find
(a) E(X − Y), (b) Var(X − Y).

Given that E(X) = 1, Var(X) = 0.4, E(Y) = 2, Var(Y) = 0.6,
comment on your answers.

Solution 3.17  X − Y can take the values −3, −2, −1, 0, 1.

    P(X − Y = −3) = P(X = 0)·P(Y = 3) = 0.06
    P(X − Y = −2) = P(X = 0)·P(Y = 2) + P(X = 1)·P(Y = 3)
                  = 0.08 + 0.18 = 0.26
    P(X − Y = −1) = P(X = 0)·P(Y = 1) + P(X = 1)·P(Y = 2)
                    + P(X = 2)·P(Y = 3)
                  = 0.06 + 0.24 + 0.06 = 0.36
    P(X − Y = 0)  = P(X = 1)·P(Y = 1) + P(X = 2)·P(Y = 2)
                  = 0.18 + 0.08 = 0.26
    P(X − Y = 1)  = P(X = 2)·P(Y = 1) = 0.06

    x − y              −3    −2    −1    0     1
    P(X − Y = x − y)   0.06  0.26  0.36  0.26  0.06

(a) E(X − Y) = −1 (by symmetry)

(b) Var(X − Y) = E[(X − Y)²] − E²(X − Y)
               = E[(X − Y)²] − (−1)²
               = 9(0.06) + 4(0.26) + 1(0.36) + 0 + 1(0.06) − 1
               = 1
    Var(X − Y) = 1.

Now E(X) − E(Y) = 1 − 2 = −1
so  E(X − Y) = E(X) − E(Y)

    Var(X) + Var(Y) = 0.4 + 0.6 = 1
so  Var(X − Y) = Var(X) + Var(Y)

Example 3.18  The r.v. X is such that E(X) = 2, Var(X) = 0.5; the r.v. Y is such
that E(Y) = 5, Var(Y) = 2; X and Y are independent.
Find (a) E(3X + 4Y), (b) Var(3X + 4Y), (c) Var(5X − 2Y).

Solution 3.18  (a) E(3X + 4Y) = 3E(X) + 4E(Y)
                              = 3(2) + 4(5)
                              = 26
So E(3X + 4Y) = 26.

(b) Var(3X + 4Y) = 9Var(X) + 16Var(Y)
                 = 9(0.5) + 16(2)
                 = 36.5
So Var(3X + 4Y) = 36.5.

(c) Var(5X − 2Y) = 25Var(X) + 4Var(Y)
                 = 25(0.5) + 4(2)
                 = 20.5
So Var(5X − 2Y) = 20.5.
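
When only the means and variances are known, as in Example 3.18, the two rules for aX + bY reduce to simple arithmetic. The function below is a small sketch of our own (its name and argument order are assumptions); note that the variance line is valid only when X and Y are independent.

```python
from fractions import Fraction


def linear_combination(a, mean_x, var_x, b, mean_y, var_y):
    """Return (E(aX + bY), Var(aX + bY)).

    The mean formula holds for any X and Y; the variance formula
    assumes that X and Y are independent.
    """
    mean = a * mean_x + b * mean_y
    var = a ** 2 * var_x + b ** 2 * var_y  # the sign of b disappears on squaring
    return mean, var


# Data of Example 3.18: E(X) = 2, Var(X) = 0.5, E(Y) = 5, Var(Y) = 2.
half = Fraction(1, 2)
mean_sum, var_sum = linear_combination(3, 2, half, 4, 5, 2)     # 3X + 4Y
mean_diff, var_diff = linear_combination(5, 2, half, -2, 5, 2)  # 5X - 2Y

print(mean_sum, float(var_sum))    # 26 36.5
print(mean_diff, float(var_diff))  # 0 20.5
```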

Example 3.19  The table gives the joint probability distribution of two random
variables X and Y:

                 X = 0   X = 1
        Y = 1    0.2     0.4
        Y = 2    0.3     0.1

Calculate (a) E(X), (b) E(Y), (c) E(X + Y).

Solution 3.19  Consider the r.v. X.

    P(X = 0) = P(X = 0 and Y = 1) + P(X = 0 and Y = 2)
             = 0.2 + 0.3
             = 0.5
    P(X = 1) = P(X = 1 and Y = 1) + P(X = 1 and Y = 2)
             = 0.4 + 0.1
             = 0.5

The probability distribution for X is

    x          0    1
    P(X = x)   0.5  0.5

By symmetry E(X) = 0.5.

Consider the r.v. Y.

    P(Y = 1) = P(Y = 1 and X = 0) + P(Y = 1 and X = 1)
             = 0.2 + 0.4
             = 0.6
    P(Y = 2) = P(Y = 2 and X = 0) + P(Y = 2 and X = 1)
             = 0.3 + 0.1
             = 0.4

The probability distribution for Y is

    y          1    2
    P(Y = y)   0.6  0.4

    E(Y) = Σ yP(Y = y), summed over all y
         = 1(0.6) + 2(0.4)
         = 1.4
Therefore E(Y) = 1.4.

Now E(X + Y) = E(X) + E(Y)
             = 0.5 + 1.4
             = 1.9
Therefore E(X + Y) = 1.9.
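
The distributions of X and of Y separately are obtained from a joint table by adding along its rows and columns. The following sketch, using the joint probabilities of Example 3.19, is our own illustration; the names `joint`, `marginal_x` and `marginal_y` are assumptions. It also computes E(X + Y) directly from the joint table, and shows that the rule E(X + Y) = E(X) + E(Y) holds here even though X and Y are not independent.

```python
from collections import defaultdict
from fractions import Fraction

# Joint probabilities P(X = x and Y = y), keyed by the pair (x, y).
joint = {
    (0, 1): Fraction(2, 10),
    (0, 2): Fraction(3, 10),
    (1, 1): Fraction(4, 10),
    (1, 2): Fraction(1, 10),
}

marginal_x = defaultdict(Fraction)
marginal_y = defaultdict(Fraction)
for (x, y), p in joint.items():
    marginal_x[x] += p  # add over all y for this x
    marginal_y[y] += p  # add over all x for this y

e_x = sum(x * p for x, p in marginal_x.items())
e_y = sum(y * p for y, p in marginal_y.items())
e_sum = sum((x + y) * p for (x, y), p in joint.items())  # straight from the joint table

# Independence would need P(X = x and Y = y) = P(X = x) * P(Y = y) for every pair.
independent = all(p == marginal_x[x] * marginal_y[y] for (x, y), p in joint.items())

print(float(e_x), float(e_y), float(e_sum), independent)  # 0.5 1.4 1.9 False
```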

THE DISTRIBUTION OF X₁ + X₂

We now consider the distribution of X₁ + X₂, where X₁, X₂ are
two independent observations from the same distribution X.

Now E(X₁ + X₂) = E(X₁) + E(X₂)
               = E(X) + E(X)
               = 2E(X)

and Var(X₁ + X₂) = Var(X₁) + Var(X₂)
                 = Var(X) + Var(X)
                 = 2Var(X)

For the distribution X₁ + X₂, where X₁ and X₂ are independent
observations from the distribution X,

    E(X₁ + X₂) = 2E(X)
    Var(X₁ + X₂) = 2Var(X)

For n independent observations

    E(X₁ + X₂ + ... + Xₙ) = nE(X)
    Var(X₁ + X₂ + ... + Xₙ) = nVar(X)

Example 3.20  X has p.d.f. as shown:

    x          2    3    4
    P(X = x)   0.3  0.5  0.2

(a) Find E(X) and Var(X).
(b) Two independent observations are made from X. Construct the
probability distribution for X₁ + X₂ and find the expectation
and variance. Comment on your results.

Solution 3.20  (a) E(X) = Σ xP(X = x)
                        = 2(0.3) + 3(0.5) + 4(0.2)
                        = 2.9

    E(X²) = Σ x²P(X = x)
          = 4(0.3) + 9(0.5) + 16(0.2)
          = 8.9

    Var(X) = E(X²) − E²(X)
           = 8.9 − 2.9²
           = 0.49

Therefore E(X) = 2.9 and Var(X) = 0.49.

(b) Consider the distribution of X₁ + X₂. To show the possible
outcomes it is useful to draw a probability tree. Its branches give:

    X₁   X₂   X₁ + X₂   probability
    2    2    4         (0.3)(0.3) = 0.09
    2    3    5         (0.3)(0.5) = 0.15
    2    4    6         (0.3)(0.2) = 0.06
    3    2    5         (0.5)(0.3) = 0.15
    3    3    6         (0.5)(0.5) = 0.25
    3    4    7         (0.5)(0.2) = 0.1
    4    2    6         (0.2)(0.3) = 0.06
    4    3    7         (0.2)(0.5) = 0.1
    4    4    8         (0.2)(0.2) = 0.04

Now P(X₁ + X₂ = 4) = 0.09, P(X₁ + X₂ = 5) = 0.15 + 0.15 = 0.3,
and so on. We see that X₁ + X₂ can take values 4, 5, 6, 7, 8
where

    x₁ + x₂                  4     5    6     7    8
    P(X₁ + X₂ = x₁ + x₂)     0.09  0.3  0.37  0.2  0.04

    E(X₁ + X₂) = 4(0.09) + 5(0.3) + 6(0.37) + 7(0.2) + 8(0.04)
               = 5.8

    Var(X₁ + X₂) = 16(0.09) + 25(0.3) + 36(0.37) + 49(0.2)
                   + 64(0.04) − 5.8²
                 = 0.98

Therefore E(X₁ + X₂) = 5.8 and Var(X₁ + X₂) = 0.98.

We note that, as expected,

    E(X₁ + X₂) = 5.8 = 2(2.9) = 2E(X)
    Var(X₁ + X₂) = 0.98 = 2(0.49) = 2Var(X)

In practice, if we need the expectation and variance, but not the
actual probability distribution, we can just quote these results.

Example 3.21  Find the expectation and variance of the number of heads obtained
when 6 coins are tossed.

Solution 3.21  Let X be the r.v. ‘the number of heads when a coin is tossed’. Then
X can take the values 0, 1.

    x          0    1
    P(X = x)   0.5  0.5

Now E(X) = 0.5 (by symmetry)
    E(X²) = 1(0.5) = 0.5
so  Var(X) = E(X²) − E²(X)
           = 0.5 − 0.25
           = 0.25

Now consider Y = X₁ + X₂ + ... + X₆ where Y is the r.v. ‘the number
of heads when 6 coins are tossed’.

    E(Y) = 6E(X)          Var(Y) = 6Var(X)
         = 6(0.5)                = 6(0.25)
         = 3                     = 1.5

So the expected number of heads is 3, and the variance is 1.5.
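
The answer to Example 3.21 can be confirmed without the results for sums by listing all 2⁶ = 64 equally likely outcomes of six fair coins. This brute-force check is our own sketch and is practical only because the number of outcomes is small.

```python
from fractions import Fraction
from itertools import product

# Every outcome of six coins, writing 1 for a head and 0 for a tail.
outcomes = list(product((0, 1), repeat=6))
heads = [sum(outcome) for outcome in outcomes]  # number of heads in each outcome

n = len(outcomes)  # 64 equally likely outcomes
mean = Fraction(sum(heads), n)
var = Fraction(sum(h * h for h in heads), n) - mean ** 2

print(n, mean, var)  # 64 3 3/2
```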

COMPARING THE DISTRIBUTIONS OF 2X AND X₁ + X₂

Confusion often arises over the different random variables 2X and
X₁ + X₂, where X₁, X₂ are two independent observations of X. For
example, if X is the r.v. ‘the number on which a tetrahedral die
lands’ then 2X is the r.v. ‘double the number on which a tetrahedral
die lands’, whereas X₁ + X₂ is the r.v. ‘the sum of the two numbers
when a tetrahedral die is thrown twice’. We will see from the
following example that the distributions of the two random variables
are very different.

Example 3.22  (a) A tetrahedral die is thrown and the number of the face on
which it lands is noted. Find the expectation and the variance.

(b) The ‘score’ is double the number on which it lands. Find the
expectation and the variance of the ‘score’.

(c) A new experiment is set up, where the ‘score’ is the sum of the
numbers obtained when the die is thrown twice. Find the
expectation and the variance of this new ‘score’.

Solution 3.22  (a) Let X be the r.v. ‘the number on which the die lands’.

The probability distribution for X is:

    x          1    2    3    4
    P(X = x)   1/4  1/4  1/4  1/4

Now, by symmetry, E(X) = 2.5.

    Var(X) = E(X²) − E²(X)
           = Σ x²P(X = x) − E²(X)
           = (1/4)(1 + 4 + 9 + 16) − (2.5)²
           = 1.25

So E(X) = 2.5, Var(X) = 1.25.

(b) Now consider the r.v. R, where R is ‘double the number on
which the die lands’, i.e. R = 2X. The probability distribution
for R is:

    r          2    4    6    8
    P(R = r)   1/4  1/4  1/4  1/4

By symmetry E(R) = 5.

    Var(R) = E(R²) − E²(R)
           = Σ r²P(R = r) − 25
           = (1/4)(4 + 16 + 36 + 64) − 25
           = 5

Therefore E(R) = 5 and Var(R) = 5, where R = 2X.

We note that

    E(2X) = 2(2.5) = 2E(X) and Var(2X) = 4(1.25) = 4 Var(X).

(c) Consider the r.v. S where S is the sum of the two numbers on
which the die lands when it is thrown twice. Therefore S = X₁ + X₂.

Now S can assume the values 2, 3, 4, 5, 6, 7, 8 and the outcomes (all
equally likely) are shown in the diagram:

                         First throw
                         1   2   3   4
    Second throw   1     2   3   4   5
                   2     3   4   5   6
                   3     4   5   6   7
                   4     5   6   7   8

The probability distribution for S is:

    s          2     3     4     5     6     7     8
    P(S = s)   1/16  2/16  3/16  4/16  3/16  2/16  1/16

By symmetry, E(S) = 5.

    Var(S) = E(S²) − E²(S)
           = Σ s²P(S = s) − 25
           = (1/16)[4(1) + 9(2) + 16(3) + 25(4) + 36(3) + 49(2) + 64(1)] − 25
           = 2.5

Therefore E(S) = 5, Var(S) = 2.5, where S = X₁ + X₂.

We note that

    E(X₁ + X₂) = 2(2.5) = 2E(X),
    Var(X₁ + X₂) = 2(1.25) = 2 Var(X).

We can see that the distribution for R, double the number on which
the die lands, is very different from the distribution for S, the sum
of the numbers on which the die lands when it is thrown twice.

Although the means of the two distributions are the same, the
variances are not, with the r.v. ‘double the number’ having the
greater variance.

Summarising, we have

    E(2X) = 2E(X)            Var(2X) = 4Var(X)
    E(X₁ + X₂) = 2E(X)       Var(X₁ + X₂) = 2Var(X)
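
The contrast between 2X and X₁ + X₂ can be reproduced for the tetrahedral die with a few lines of code. This is a sketch under our own naming (`double`, `total` and the helper functions are assumptions); the distribution of X₁ + X₂ is built by treating the two throws as independent.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def expectation(pmf, g=lambda x: x):
    return sum(g(x) * p for x, p in pmf.items())


def variance(pmf):
    return expectation(pmf, lambda x: x ** 2) - expectation(pmf) ** 2


die = {x: Fraction(1, 4) for x in (1, 2, 3, 4)}

# R = 2X: one throw, with every value doubled.
double = {2 * x: p for x, p in die.items()}

# S = X1 + X2: two independent throws, added together.
total = defaultdict(Fraction)
for (x1, p1), (x2, p2) in product(die.items(), repeat=2):
    total[x1 + x2] += p1 * p2
total = dict(total)

print(expectation(double), variance(double))  # 5 5
print(expectation(total), variance(total))    # 5 5/2
```

Both variables have mean 5, but 2X takes only the four even values 2, 4, 6, 8, while X₁ + X₂ takes all seven values from 2 to 8 and is concentrated near the middle, which is why its variance is smaller.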

Exercise 3f

1. Independent random variables X and Y have probability distributions as
   shown in the tables:

   (a) Find E(X), E(Y), Var(X), Var(Y).
   (b) Construct the probability distribution for the r.v. X + Y.
   (c) Verify that E(X + Y) = E(X) + E(Y).
   (d) Verify that Var(X + Y) = Var(X) + Var(Y).
   (e) Construct the probability distribution for the r.v. X − Y.
   (f) Verify that E(X − Y) = E(X) − E(Y).
   (g) Verify that Var(X − Y) = Var(X) + Var(Y).

2. Independent random variables X and Y are such that E(X) = 4, E(Y) = 5,
   Var(X) = 1, Var(Y) = 2. Find
   (a) E(4X + 2Y), (b) E(5X − Y), (c) Var(3X + 2Y), (d) Var(5Y − 3X),
   (e) Var(3X − 5Y).

3. The above table gives the joint probability distribution of two random
   variables X and Y. Calculate (a) P(Y = 1), (b) P(XY = 2), (c) E(X + Y).
   (L Additional)

4. Independent random variables X and Y are such that E(X) = 14,
   E(Y) = 20, Var(X) = 10, Var(Y) = 11. Find
   (a) E(3X − 2Y), (b) Var(3X + 2Y).

5. Independent random variables X and Y are such that E(X) = 3,
   E(X²) = 12, E(Y) = 4, E(Y²) = 18. Find the value of
   (a) E(3X − 2Y), (b) E(2Y − 3X), (c) …, (d) Var(2X − Y),
   (e) Var(2X + Y), (f) Var(3Y + 2X).

6. Two ordinary dice are thrown, a red and a green die. Let R be the r.v.
   ‘the score on the red die’ and let G be the r.v. ‘the score on the
   green die’.
   (a) Construct the probability distribution for R + G, the r.v. ‘the sum
   of the two scores’, and find (i) E(R + G), (ii) Var(R + G).
   (b) Construct the probability distribution for R − G and find
   (i) E(R − G), (ii) Var(R − G).
   (c) Given that E(R) = 3.5 and Var(R) = 35/12, comment on your answers.

7. X has probability distribution as shown:

   (a) Find E(X) and Var(X).
   (b) Find P(X₁ + X₂ = 4) where X₁, X₂ are two independent observations
   of X.
   (c) Find E(X₁ + X₂) and Var(X₁ + X₂).
   (d) Find P(2X = 4).
   (e) Find E(2X) and Var(2X).

8. Rods of length 2 m or 3 m are selected at random with probabilities 0.4
   and 0.6 respectively.
   (a) Find the expectation and variance of the length of a rod.
   (b) Two lengths are now selected at random. Find the expectation and
   variance of the sum of the two lengths.
   (c) Three lengths are now selected at random. Show that the probability
   distribution of Y, the sum of the three lengths, is

       y          6      7      8      9
       P(Y = y)   0.064  0.288  0.432  0.216

   and find E(Y) and Var(Y). Comment on your results.

9. Find the variance of the sum of the scores when an ordinary die is
   thrown 10 times.

10. X has a p.d.f. given by P(X = x) = kx, x = 1, 2, 3, 4. Find (a) k,
    (b) E(X), (c) Var(X), (d) P(X₁ + X₂ = 5), (e) E(4X),
    (f) Var(X₁ + X₂ + X₃).

SUMMARY — DISCRETE RANDOM VARIABLES

For the discrete random variable X with probability density
function P(X = x) for x = x₁, x₂, ..., xₙ,

    F(t) = Σ P(X = x), summed from x = x₁ to x = t,
    where F(t) is the cumulative distribution function

    E(X) = Σ xP(X = x), summed over all x

    Var(X) = E(X²) − E²(X)
           = Σ x²P(X = x) − E²(X), summed over all x

For the random variable X and constants a and b,

    E(a) = a                   Var(a) = 0
    E(aX) = aE(X)              Var(aX) = a²Var(X)
    E(aX + b) = aE(X) + b      Var(aX + b) = a²Var(X)

For any two random variables X and Y and constants a and b,

    E(X + Y) = E(X) + E(Y)
    E(X − Y) = E(X) − E(Y)
    E(aX + bY) = aE(X) + bE(Y)

For independent random variables X and Y and constants a and b,

    Var(X + Y) = Var(X) + Var(Y)
    Var(X − Y) = Var(X) + Var(Y)
    Var(aX + bY) = a²Var(X) + b²Var(Y)
    Var(aX − bY) = a²Var(X) + b²Var(Y)

If X₁, X₂, ..., Xₙ are n independent observations of the r.v. X then,

    E(X₁ + X₂ + ... + Xₙ) = nE(X)
    Var(X₁ + X₂ + ... + Xₙ) = nVar(X)

Miscellaneous Exercise 3g

1. Two tetrahedral dice are thrown and the score is the product of the
   numbers on which the dice fall. What is the expected score for a throw?

2. A housewife removes the labels from three tins of peaches and a tin of
   baked beans in order to enter a competition and then puts the tins in a
   cupboard. She discovers that the tins are outwardly identical. Let X be
   the number of tins she now needs to open in order to have baked beans.
   List the values that X can take and determine the probabilities for
   each of these values of X. Calculate the expected value of X.
   Her neighbour has five tins of peaches and two tins of baked beans,
   again outwardly identical once the labels are removed. This woman
   removes the labels and puts the tins away. Find the probability that
   this woman later requires to open at least three tins to have baked
   beans. (SUJB Additional)

3. On a long train journey, a statistician is invited by a gambler to play
   a dice game. The game uses two ordinary dice which the statistician is
   to throw. If the total score is 12, the statistician is paid £6 by the
   gambler. If the total score is 8, the statistician is paid £3 by the
   gambler. However if both or either dice show a 1, the statistician pays
   the gambler £2. Let £X be the amount paid to the statistician by the
   gambler after the dice are thrown once.
   Determine the probability that (a) X = 6, (b) X = 3, (c) X = −2.
   Find the expected value of X and show that, if the statistician played
   the game 100 times, his expected loss would be £2.78, to the nearest
   penny.
   Find the amount, £a, that the £6 would have to be changed to in order
   to make the game unbiased. (SUJB)

4. A box contains nine numbered balls. Three balls are numbered 3, four
   balls are numbered 4 and two balls are numbered 5.
   Each trial of an experiment consists of drawing two balls without
   replacement and recording the sum of the numbers on them, which is
   denoted by X. Show that the probability that X = 10 is 1/36, and find
   the probabilities of all other possible values of X.
   Use your results to show that the mean of X is 7 7/9, and find the
   standard deviation of X.
   Two trials are made. (The two balls in the first trial are replaced in
   the box before the second trial.) Find the probability that the second
   value of X is greater than or equal to the first value of X. (MEI)

5. A man stakes £2 to play a game in which he rolls an ordinary (fair)
   die. If he scores 1 or 2 he wins £3 (plus his stake) and loses his
   stake if he scores 3, 4 or 5. If he scores a six he may roll the die
   once again, winning if he scores 1, 2 or 6, losing if he scores 3, 4
   or 5. Find
   (a) the probability that the man wins the game by rolling (i) once,
   (ii) twice,
   (b) his expectation,
   (c) the expected number of times he will roll the die.
   If the rules are changed so that the winning scores are 1 and 2 but
   that every time he scores 6 he may roll the die again, find
   (d) the probability that he wins on his rth roll of the die,
   (e) the probability that he wins the game. (SUJB)

6. A and B each roll a fair die simultaneously. Construct a table for the
   difference in their scores showing the associated probabilities.
   Calculate the mean of the distribution. If the difference in scores is
   1 or 2, A wins; if it is 3, 4 or 5, B wins and if it is zero, they roll
   their dice again. The game ends when one of the players has won.
   Calculate the probability that A wins on (a) the first, (b) the second,
   (c) the rth roll. What is the probability that A wins?
   If B stakes £1 what should A stake for the game to be fair? (SUJB)

7. A gambler has 4 packs of cards each of which is well shuffled and has
   equal numbers of red, green and blue cards. For each turn he pays £2
   and draws a card from each pack. He wins £3 if he gets 2 red cards, £5
   if he gets 3 red cards and £10 if he gets 4 red cards.
   (a) What are the probabilities of his drawing 0, 1, 2, 3, 4 red cards?

   (b) What is the expectation of his winnings (to the nearest 10p)? (SUJB)

8. During winter a family requests 4 bottles of milk every day, and these
   are left on the door-step. Three of the bottles have silver tops and
   the fourth has a gold top. A thirsty blue-tit attempts to remove the
   tops from these bottles. The probability distribution of X, the number
   of silver tops removed by the blue-tit, is the same each day and is
   given by

       P(X = 0) = …, P(X = 1) = …, P(X = 2) = …, P(X = 3) = …

   The blue-tit finds the gold top particularly attractive, and the
   probability that this top is removed is …, independent of the number
   of silver tops removed. Determine the expectation and variance of
   (a) the number of silver tops removed in a day,
   (b) the number of gold tops removed in a day,
   (c) the total number of tops (silver and gold) removed in 7 days.
   Find also the probability distribution of the total number of tops
   (silver and gold) removed in a day. (C)

9. The probability of there being X unusable matches in a full box of
   Surelite matches is given by P(X = 0) = 8k, P(X = 1) = 5k,
   P(X = 2) = P(X = 3) = k, P(X ≥ 4) = 0.
   Determine the constant k and the expectation and variance of X.
   Two full boxes of Surelite matches are chosen at random and the total
   number Y of unusable matches is determined. Calculate P(Y > 4), and
   state the values of the expectation and variance of Y. (C)

10. A player throws a die whose faces are numbered 1 to 6 inclusive. If
    the player obtains a six he throws the die a second time, and in this
    case his score is the sum of 6 and the second number; otherwise his
    score is the number obtained. The player has no more than two throws.
    Let X be the random variable denoting the player’s score. Write down
    the probability distribution of X, and determine the mean of X.
    Show that the probability that the sum of two successive scores is 8
    or more is 17/36.
    Determine the probability that the first of two successive scores is 7
    or more, given that their sum is 8 or more. (C)

11. The faces of an ordinary die are re-numbered so that the faces are 1,
    2, 2, 3, 3 and 3. This die and an ordinary, unaltered die are thrown
    at the same time. The score, X, is the sum of the numbers on the
    uppermost faces of the two dice.
    Show that the probability of X being 3 is 1/12 and of being 4 is 1/6.
    List the values that X can take and determine their respective
    probabilities. Hence obtain the expected value of X, correct to 3
    decimal places.
    If the dice are thrown 3 times, determine the probability, correct to
    3 significant figures, that none of the three values of X exceeds 3.
    (SUJB)

12. Alan and his younger brother Bill play a game each day. Alan throws
    three darts at a dartboard and for each dart that scores a bull (which
    happens with probability p) Bill gives him a penny, while for each
    dart which misses the bull (which happens with probability 1 − p) Alan
    gives Bill twopence. By considering all possible outcomes for the
    three throws, or otherwise, find the distribution of the number of
    pence (positive or negative) that Bill receives each day. Show that,
    when p = 1/3, the mean is 3 and the variance 6.
    The game takes place on 150 days. What is the mean and standard
    deviation of Bill’s total winnings when p = 1/3? (O)

13. In a certain field, each puffball which is growing in one year gives
    rise to a number, X, of new puffballs in the following year. None of
    the original puffballs is present in the following year. The
    probability distribution of the random variable X is as follows:

        P(X = 0) = P(X = 2) = 0.3
        P(X = 1) = 0.4

    Find the probability distribution of Y, the number of puffballs
    resulting from there being two puffballs in the previous year, and
    show that the variance of Y is 1.2.
    Hence, or otherwise, determine the probability distribution of the
    number, Z, of puffballs present in year 3, given that there was a
    single puffball present in year 1. Find also the mean and variance
    of Z. (C)

14. A discrete random variable X can take only the values 0, 1, 2 or 3, and its probability distribution is given by P(X = 0) = k, P(X = 1) = 8k, P(X = 2) = 4k, P(X = 3) = 5k, where k is a constant. Find
(a) the value of k,
(b) the mean and variance of X. (JMB)

15. A random variable R takes the integer value r with probability P(r) where
P(r) = [...],
P(r) = 0, otherwise.
Find
(a) the value of k, and display the distribution on graph paper,
(b) the mean and the variance of the distribution,
(c) the mean and the variance of 5R − 3. (L)P

16. A gambling machine works in the following way. The player inserts a penny into one of five slots, which are coloured Blue, Red, Orange, Yellow and Green corresponding to five coloured light bulbs. The player can choose whichever coloured slot he likes. After the penny has been inserted one of the five bulbs lights up. If the bulb lit up is the same colour as the slot selected by the player, then the player wins and receives from the machine R pennies, where
P(R = 2) = [...], P(R = 4) = [...], P(R = 6) = [...], and P(R = 8) = P(R = 10) = [...].
If the colour of the bulb lit up and the slot selected are not the same, the player receives nothing from the machine. In either case the player does not get back the penny that he inserted. Assuming that each of the colours is equally likely to light up, and that the machine selects the bulbs at random, determine
(a) the probability that the player receives nothing from the machine,
(b) the expected value of the amount gained by the player from a single try,
(c) the variance of the amount gained by the player from a single try. (C)

17. Four rods of lengths 1, 2, 3 and 4 units are placed in a bag from which one rod is selected at random. The probability of selecting a rod of length l is kl. Find the value of k.
Show that the expected value of X, the length of the selected rod, is 3 units and find the variance of X.
After a rod has been selected it is not replaced. The probabilities of selection for each of the three rods that remain are in the same ratio as they were before the first selection. A second rod is now selected from the bag. Defining Y to be the length of this rod and writing
P_1 = P(Y = 1 | X = 2), P_2 = P(Y = 2 | X = 1)
show that 16P_1 = 9P_2.
Show also that P(X + Y = 3) = 17/360. (C)

18. A game is played in which a complete throw consists of three fair coins being tossed once each and any which have landed tails being tossed a second time; no coin is tossed more than twice. The score for the complete throw is the total number of heads showing at the end of the throw.
(a) Find the respective probabilities that the score after a complete throw is (i) 0, (ii) 1, (iii) 2, (iv) 3.
(b) Show that the average score over a large number of complete throws is 9/4.
(You may leave your answers as fractions in their lowest terms.) (O & C)

19. The random variable X takes values −2, 0, 2 with probabilities [...] respectively.
Find Var(X) and E(|X|).
The random variable Y is defined by Y = X_1 + X_2, where X_1 and X_2 are two independent observations of X. Find the probability distribution of Y. Find Var(Y) and E(Y + 3). (C)

20. The discrete random variable X can take only the values 0, 1, 2, 3, 4, 5. The probability distribution of X is given by the following:
P(X = 0) = P(X = 1) = P(X = 2) = a
P(X = 3) = P(X = 4) = P(X = 5) = b
P(X > 2) = 3P(X < 2)
where a and b are constants.
(i) Determine the values of a and b.
(ii) Show that the expectation of X is 3 and determine the variance of X.
(iii) Determine the probability that the sum of two independent observations from this distribution exceeds 7. (C)

21. A random variable R takes the integer values 1, 2, ..., n each with probability 1/n. Find the mean and variance of R.
A pack of 15 cards bearing the numbers 1 to 15 is shuffled. Find the probability that the number on the top card is larger than that on the bottom card, giving reasons for your answer.
If the sum of these two numbers is S, find
(a) the probability that S < 4,
(b) the expected value of S.
(Answers may be left as fractions in their lowest terms.) (O & C)

22. A discrete random variable X has the distribution function
[...]
(a) Write down the probability distribution of X.
(b) Find the probability distribution of the sum of two independent observations from X and find the mean and variance of the distribution of this sum.

23. A random variable R takes the integer value r with probability P(r) defined by
P(r) = [...], r = 1, 2, 3,
P(r) = k(7 − r), r = 4, 5, 6,
P(r) = 0, otherwise.
Find the value of k and the mean and variance of the probability distribution. Exhibit this distribution by a suitable diagram.
Determine the mean and the variance of the variable Y where Y = 4R − 2. (L)P
SPECIAL DISCRETE
PROBABILITY
DISTRIBUTIONS
THE BINOMIAL DISTRIBUTION
Consider an experiment which has two possible outcomes, one
which may be termed ‘success’ and the other ‘failure’. A binomial
situation arises when n independent trials of the experiment are
performed, for example
toss a coin 6 times; consider obtaining a head on a single toss as
a success, and obtaining a tail as a failure;
throw a die 10 times; consider obtaining a 6 on a single throw as
a success, and not obtaining a 6 as a failure.

Example 4.1 A coin is biased so that the probability of obtaining a head is 2/3. The
coin is tossed four times. Find the probability of obtaining exactly
two heads.

Solution 4.1 We will consider ‘obtaining a head’ as success.

Now P(H) = 2/3 and P(T) = 1/3, where T denotes a tail.
The probability of obtaining two tails and two heads, in that order,
is given by

P(TTHH) = (1/3)^2 (2/3)^2 (independent events)

But the result ‘two heads and two tails’ can be obtained in 4!/(2! 2!) = 6 ways.
This is the number of ways of choosing the 2 places for the heads
from the 4 places, i.e. ^{4}C_{2} ways. The arrangements are:

TTHH  THTH  THHT  HTTH  HTHT  HHTT

Therefore P(2 heads exactly) = ^{4}C_{2} (1/3)^2 (2/3)^2 = 6 × 4/81 = 8/27

The probability of obtaining exactly two heads when the biased
coin is tossed four times is 8/27.

Example 4.2 An ordinary die is thrown seven times. Find the probability of
obtaining exactly three sixes.

Solution 4.2 We will consider ‘obtaining a 6’ as success.

Now P(6) = 1/6 and P(not 6) = 5/6.

The probability of obtaining four numbers which are not 6 followed by three sixes, in that order, is (5/6)^4 (1/6)^3.

But the result ‘four numbers which are not 6 and three sixes’ can be
obtained in 7!/(4! 3!) ways, i.e. ^{7}C_{3} ways (the number of ways of choosing
the 3 places for the sixes from the 7 places).

So P(exactly three sixes) = ^{7}C_{3} (5/6)^4 (1/6)^3
                          = 0.078 (3 d.p.)

The probability of obtaining exactly three sixes when a die is
thrown seven times is 0.078 (3 d.p.).

Example 4.3 The probability that a marksman hits a target is p and the probability
that he misses is q, where q = 1 − p. Write an expression for
the probability that, in 10 shots, he hits the target 6 times.

Solution 4.3 We will consider ‘obtaining a hit’ as success.

P(success) = p and P(failure) = q = 1 − p
We require 4 failures and 6 successes, in any order, so
P(6 successes) = ^{10}C_{6} q^4 p^6
Therefore the probability that he hits the target exactly 6 times in
10 shots is ^{10}C_{6} q^4 p^6.

In general:

If the probability that an experiment results in a successful outcome
is p and the probability that the outcome is a failure is q,
where q = 1 − p, and if X is the r.v. ‘the number of successful
outcomes in n independent trials’, then the p.d.f. of X is given
by

P(X = x) = ^{n}C_{x} q^{n-x} p^{x},   x = 0, 1, 2, ..., n
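
The formula translates directly into a few lines of code. The following is a minimal sketch, assuming Python 3.8 or later; the function name `binomial_pmf` is chosen here for illustration and does not come from the text.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Return P(X = x) for X ~ Bin(n, p), i.e. nCx * q^(n-x) * p^x."""
    q = 1 - p
    return comb(n, x) * q ** (n - x) * p ** x

# Example 4.2: exactly three sixes in seven throws of an ordinary die
three_sixes = binomial_pmf(3, 7, 1 / 6)
print(round(three_sixes, 3))
```

Summing `binomial_pmf(x, n, p)` over x = 0, 1, ..., n should give 1, which is a useful check on any hand calculation.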

Example 4.4 If p is the probability of success and q = 1 − p is the probability
of failure, find the probability of 0, 1, 2, ..., 5 successes in 5 independent
trials of the experiment. Comment on your answer.

Solution 4.4 Let X be the r.v. ‘the number of successful outcomes’. Then

P(X = x) = ^{n}C_{x} q^{n-x} p^{x}, x = 0, 1, ..., 5 and n = 5. So
P(X = 0) = ^{5}C_{0} q^5 p^0 = q^5
P(X = 1) = ^{5}C_{1} q^4 p^1 = 5q^4 p
P(X = 2) = ^{5}C_{2} q^3 p^2 = 10q^3 p^2
P(X = 3) = ^{5}C_{3} q^2 p^3 = 10q^2 p^3
P(X = 4) = ^{5}C_{4} q^1 p^4 = 5q p^4
P(X = 5) = ^{5}C_{5} q^0 p^5 = p^5
We note that q^5, 5q^4 p, ..., p^5 are the terms in the binomial expansion
of (q + p)^5 and we have

(q + p)^5 = q^5 + 5q^4 p + 10q^3 p^2 + 10q^2 p^3 + 5q p^4 + p^5

1 = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5)

where each term of the expansion corresponds to the probability written beneath it.

In general:
The values P(X = x) for x = 0, 1, ..., n can be obtained by considering
the terms in the binomial expansion of (q + p)^n, noting
that q + p = 1.

(q + p)^n = ^{n}C_{0} q^n p^0 + ^{n}C_{1} q^{n-1} p^1 + ^{n}C_{2} q^{n-2} p^2 + ... + ^{n}C_{r} q^{n-r} p^r + ... + ^{n}C_{n} q^0 p^n

1 = P(X = 0) + P(X = 1) + P(X = 2) + ... + P(X = r) + ... + P(X = n)

If X is distributed in this way, we write

X ~ Bin(n, p) where n is the number of independent trials
              and p is the probability of a successful
              outcome in one trial

n and p are called the parameters of the distribution.

So we read the statement X ~ Bin(n, p) thus: X follows a binomial
distribution with parameters n and p.

Example 4.5 The probability that a person supports Party A is 0.6. Find the
probability that in a randomly selected sample of 8 voters there are
(a) exactly 3 who support Party A, (b) more than 5 who support
Party A.

Solution 4.5 We will consider ‘supporting Party A’ as success. Then p = 0.6 and
q = 1 − p = 0.4. Let X be the r.v. ‘the number of Party A
supporters’. Then X ~ Bin(n, p) with n = 8 and p = 0.6.
So X ~ Bin(8, 0.6)
and
P(X = x) = ^{8}C_{x} (0.4)^{8-x} (0.6)^{x}, x = 0, 1, ..., 8
(a) We require
P(X = 3) = ^{8}C_{3} (0.4)^5 (0.6)^3 = 0.124 (3 d.p.)
The probability that there are exactly 3 Party A supporters is 0.124
(3 d.p.).
(b) We require
P(X > 5) = P(X = 6) + P(X = 7) + P(X = 8)
         = ^{8}C_{6} (0.4)^2 (0.6)^6 + ^{8}C_{7} (0.4)(0.6)^7 + ^{8}C_{8} (0.6)^8
         = 28(0.4)^2 (0.6)^6 + 8(0.4)(0.6)^7 + (0.6)^8
         = (0.6)^6 (4.48 + 1.92 + 0.36)
         = 0.315 (3 d.p.)
The probability that there are more than 5 Party A supporters is
0.315 (3 d.p.).
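
The two answers in Example 4.5 can be checked by listing the whole distribution of X ~ Bin(8, 0.6) and adding the terms required. This is an illustrative sketch only; the variable names are assumptions made for the example.

```python
from math import comb

n, p = 8, 0.6
q = 1 - p

# pmf[x] holds P(X = x) for x = 0, 1, ..., 8
pmf = [comb(n, x) * q ** (n - x) * p ** x for x in range(n + 1)]

exactly_three = pmf[3]          # part (a)
more_than_five = sum(pmf[6:])   # part (b): P(X = 6) + P(X = 7) + P(X = 8)

print(round(exactly_three, 3), round(more_than_five, 3))
```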

Example 4.6 A box contains a large number of red and yellow tulip bulbs in the
ratio 1:3. Bulbs are picked at random from the box. How many
bulbs must be picked so that the probability that there is at least
one red tulip bulb among them is greater than 0.95?

Solution 4.6 Consider ‘obtaining a red tulip bulb’ as ‘success’.

Then p = P(success) = 1/4 and q = 3/4.

Let X be the r.v. ‘the number of red tulip bulbs’.
Then X ~ Bin(n, p) where p = 1/4 and n is unknown.

Now P(X = x) = ^{n}C_{x} (3/4)^{n-x} (1/4)^{x}, x = 0, 1, 2, ..., n

We require P(X ≥ 1) > 0.95.
Now P(X ≥ 1) = 1 − P(X = 0)
             = 1 − (3/4)^n

So 1 − (3/4)^n > 0.95
   0.05 > (3/4)^n
   log 0.05 > n log 0.75 (taking logs to base 10)
i.e.
   n log 0.75 < log 0.05
   (−0.1249)n < −1.301
   n > 1.301/0.1249 (change inequality when dividing by a negative quantity)

So n > 10.4, and the least value of n is 11.

Therefore at least 11 bulbs must be picked out of the box to ensure
that the probability that there is at least one red tulip bulb among
them is greater than 0.95.
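
The same least value of n can be found by direct search instead of logarithms. The sketch below assumes a helper named `least_trials` (a name invented for this illustration) that increases n until 1 − q^n exceeds the target probability.

```python
def least_trials(p, target):
    """Smallest n for which P(at least one success in n trials) > target."""
    q = 1 - p
    n = 1
    while 1 - q ** n <= target:
        n += 1
    return n

# Example 4.6: p = 1/4, require P(X >= 1) > 0.95
print(least_trials(0.25, 0.95))
```

The search agrees with the logarithm method because 1 − q^n increases steadily with n, so the first n that passes the test is the least one.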


Exercise 4a

Give answers to 3 S.F. where appropriate.

1. If X ~ Bin(6, [...]), find (a) P(X = 4), (b) P(X ≤ 2).

2. If X ~ Bin(8, 0.4), find (a) P(X = 2), (b) P(X = 0), (c) P(X > 6).

3. The probability that a pen drawn at random from a box of pens is defective is 0.1. If a sample of 6 pens is taken, find the probability that it will contain (a) no defective pens, (b) 5 or 6 defective pens, (c) less than 3 defective pens.

4. Find the probability of throwing at least five sixes in seven throws of an unbiased die.

5. Find the probability of throwing not [...] than four heads [...].

6. Assuming that a couple are equally likely to produce a girl or a boy, find the probability that in a family of 5 children there will be more boys than girls.

7. The probability that a housewife will buy Soapysuds Powder is 0.65. Find the probability that in a sample of 8 housewives who have each bought a packet of soap powder (a) exactly 3 have bought Soapysuds, (b) more than 5 have bought Soapysuds.

8. Describe an experiment in which the probabilities involved are the terms of the binomial expansion of [...]. In terms of this experiment describe the event whose probability is given by the fourth term of the expansion, and calculate this probability.

9. [...] likely to show [...] probability that in five tosses of the coin (a) exactly three heads are obtained, (b) more than three heads are obtained.

10. The probability that a marksman scores a bull when he shoots at a target is 0.6. Find the probability that in 7 attempts he scores less than 3 bulls. Assume that the outcome of each shot is independent of any other.

11. (a) A coin is biased so that the probability of obtaining a head is p. The coin is tossed three times. Show the possible outcomes on a tree diagram and compare the probabilities of obtaining 0, 1, 2, 3 heads with the terms in the binomial expansion of (q + p)^3 where q = 1 − p.
(b) The coin is now tossed four times. Compare the probabilities P(X = x) for x = 0, 1, 2, 3, 4 given in the tree diagram with the binomial expansion of (q + p)^4. X is the r.v. ‘the number of heads obtained in four tosses’.

12. If X ~ Bin(n, 0.6) and P(X < 1) = 0.0256, find n.

13. 1% of a box of light bulbs are faulty. What is the largest sample size which can be taken if it is required that the probability that there are no faulty bulbs in the sample is greater than 0.5?

14. If X ~ Bin(n, 0.3) and P(X ≥ 1) > 0.8, find the least possible value of n.

15. The probability that a target is hit is 0.3. Find the least number of shots which should be fired if the probability that the target is hit at least once is greater than 0.95.

16. In a multiple choice test there are 10 questions and for each question there is a choice of 4 answers, only one of which is correct. If a student guesses at each of the answers, find the probability that he gets (a) none correct, (b) more than 7 correct. If he needs to obtain over half marks to pass, and the questions carry equal weight, find the probability that he passes.

17. Of the pupils in a school, 30% travel to school by bus. From a sample of 10 pupils chosen at random, find the probability that (a) only 3 travel by bus, (b) more than 8 travel by bus.

EXPECTATION AND VARIANCE

If the random variable X is such that X ~ Bin(n, p)

then E(X) = np
and Var(X) = npq where q = 1 − p

Proof Now
P(X = x) = ^{n}C_{x} q^{n-x} p^{x}, x = 0, 1, 2, ..., n

So X has the probability distribution shown in the table:

x           0      1              2                          3                               ...   n
P(X = x)    q^n    n q^{n-1} p    [n(n-1)/2!] q^{n-2} p^2    [n(n-1)(n-2)/3!] q^{n-3} p^3    ...   p^n

E(X) = Σ x P(X = x), summed over all x

     = (0)q^n + (1)n q^{n-1} p + (2)[n(n-1)/2!] q^{n-2} p^2 + (3)[n(n-1)(n-2)/3!] q^{n-3} p^3 + ... + n p^n

     = np[q^{n-1} + (n-1) q^{n-2} p + [(n-1)(n-2)/2!] q^{n-3} p^2 + ... + p^{n-1}]

     = np[(q + p)^{n-1}]
     = np since q + p = 1

Therefore E(X) = np

Now Var(X) = E(X^2) − E^2(X).

E(X^2) = Σ x^2 P(X = x), summed over all x

       = (0)q^n + (1)n q^{n-1} p + (4)[n(n-1)/2!] q^{n-2} p^2 + (9)[n(n-1)(n-2)/3!] q^{n-3} p^3 + ... + n^2 p^n

       = np[q^{n-1} + 2(n-1) q^{n-2} p + [3(n-1)(n-2)/2!] q^{n-3} p^2 + ... + n p^{n-1}]

       = np{ q^{n-1} + (n-1) q^{n-2} p + [(n-1)(n-2)/2!] q^{n-3} p^2 + ... + p^{n-1}
             + (n-1) q^{n-2} p + (n-1)(n-2) q^{n-3} p^2 + ... + (n-1) p^{n-1} }

Now the first row of terms is, as before, the expansion of
(q + p)^{n-1}.
So E(X^2) = np{(q + p)^{n-1} + (n-1)p[q^{n-2} + (n-2) q^{n-3} p + ... + p^{n-2}]}
          = np[1 + (n-1)p(q + p)^{n-2}]
          = np[1 + (n-1)p]
          = np(1 − p) + n^2 p^2
Therefore Var(X) = np(1 − p) + n^2 p^2 − (np)^2
                 = npq where q = 1 − p

Therefore Var(X) = npq
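
The results E(X) = np and Var(X) = npq can be checked numerically by computing Σ x P(X = x) and Σ x² P(X = x) directly from the distribution. The sketch below is one way to do this; the function name `moments` is an assumption made for this illustration.

```python
from math import comb, isclose

def moments(n, p):
    """Return (mean, variance) of X ~ Bin(n, p) by summing over the distribution."""
    q = 1 - p
    pmf = [comb(n, x) * q ** (n - x) * p ** x for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(pmf))             # E(X)
    mean_of_square = sum(x * x * px for x, px in enumerate(pmf))  # E(X^2)
    return mean, mean_of_square - mean ** 2

n, p = 7, 0.4
mean, variance = moments(n, p)
print(isclose(mean, n * p), isclose(variance, n * p * (1 - p)))
```

A numerical check of this kind does not replace the proof, but it quickly exposes a slip in the algebra for particular values of n and p.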

Example 4.7 If the probability that it is a fine day is 0.4, find the expected
number of fine days in a week, and the standard deviation.

Solution 4.7 Let ‘fine day’ be ‘success’. Then p = 0.4 and q = 0.6. Let X be the
r.v. ‘the number of fine days in a week’.

Then X ~ Bin(n, p) where n = 7 and p = 0.4.

Now E(X) = np = (7)(0.4) = 2.8

Var(X) = npq = (7)(0.4)(0.6) = 1.68

Therefore the standard deviation of X = √1.68 = 1.30 days (2 d.p.).

The expected number of fine days in a week is 2.8 and the standard
deviation is 1.30 days (2 d.p.).

Example 4.8 The r.v. X is such that X ~ Bin(n, p) and E(X) = 2, Var(X) = 24/13.
Find the values of n and p, and P(X = 2).

Solution 4.8 If X ~ Bin(n, p) then E(X) = np and Var(X) = npq.

Now E(X) = 2, so np = 2 (i)

Var(X) = 24/13, so npq = 24/13 (ii)

Substituting for np in (ii) we have

2q = 24/13
q = 12/13

Therefore p = 1 − q
            = 1 − 12/13
            = 1/13

Now substituting for p in (i) we have

n(1/13) = 2
n = 26

Therefore n = 26 and p = 1/13, so that X ~ Bin(26, 1/13).

Now P(X = x) = ^{n}C_{x} q^{n-x} p^{x}
             = ^{26}C_{x} (12/13)^{26-x} (1/13)^{x}, x = 0, 1, ..., 26

so P(X = 2) = ^{26}C_{2} (12/13)^{24} (1/13)^2
            = [(26)(25)/((1)(2))] × 12^{24}/13^{26}
            = 0.282 (3 d.p.)
Therefore P(X = 2) = 0.282 (3 d.p.).
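
The steps of Solution 4.8 (divide the variance by the mean to get q, then recover p and n) can be carried out with exact fractions. This is a sketch under the assumption that the mean and variance really do come from a binomial distribution; `parameters_from_moments` is a name made up for the example.

```python
from fractions import Fraction
from math import comb

def parameters_from_moments(mean, variance):
    """Recover (n, p) for X ~ Bin(n, p) from E(X) = np and Var(X) = npq."""
    q = Fraction(variance) / Fraction(mean)   # npq / np = q
    p = 1 - q
    n = Fraction(mean) / p                    # np / p = n
    return n, p

n, p = parameters_from_moments(2, Fraction(24, 13))
n = int(n)
prob_two = comb(n, 2) * (1 - p) ** (n - 2) * p ** 2
print(n, p, round(float(prob_two), 3))
```

Working in fractions avoids rounding q = 12/13 too early, which matters here because it is raised to the 24th power.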

Exercise 4b

1. Of the articles from a certain production line, 10% are defective. If a sample of 25 articles is taken, find the expected number of defective articles and the standard deviation.

2. The probability that an apple, picked at random from a sack, is bad is 0.05. Find the standard deviation of the number of bad apples in a sample of 15 apples.

3. X is a r.v. such that X ~ Bin(n, p). Given that E(X) = 2.4 and p = 0.3, find n and the standard deviation of X.

4. In a group of people the expected number who wear glasses is 2 and the variance is 1.6. Find the probability that (a) a person chosen at random from the group wears glasses, (b) 6 people in the group wear glasses.

5. If the r.v. X is such that X ~ Bin(10, p) where p < [...] and Var(X) = [...], find (a) p, (b) E(X), (c) P(X = 2).

6. A die is biased and the probability, p, of throwing a six is known to be less than [...]. An experiment consists of recording the number of sixes in 25 throws of the die. In a large number of experiments the standard deviation of the number of sixes is 1.5. Calculate the value of p and hence determine, to two places of decimals, the probability that exactly three sixes are recorded during a particular experiment. (C Additional)

7. (i) For each of the experiments described below, state, giving a reason, whether a binomial distribution is appropriate.
Experiment 1. A bag contains black, white and red marbles which are selected at random, one at a time with replacement. The colour of each marble is noted.
Experiment 2. This experiment is a repeat of Experiment 1 except that the bag contains black and white marbles only.
Experiment 3. This experiment is a repeat of Experiment 2 except that marbles are not replaced after selection.
(ii) On average 20% of the bolts produced by a machine in a factory are faulty. Samples of 10 bolts are to be selected at random each day. Each bolt will be selected and replaced in the set of bolts which have been produced on that day.
(a) Calculate, to 2 significant figures, the probability that, in any one sample, two bolts or less will be faulty.
(b) Find the expected value and the variance of the number of bolts in a sample which will not be faulty. (L Additional)

8. In two binomial distributions the ratio of the number of independent trials is 5:6, the ratio of the arithmetic means is 2:9 and the ratio of the variances is 32:45. For each distribution, find the probability of success.

DIAGRAMMATIC REPRESENTATION OF THE BINOMIAL DISTRIBUTION

Consider X ~ Bin(5, p) for various values of p. The probability


distributions are illustrated on the following page. It is useful to
compare, for example, the distributions of X ~ Bin(5,0.1) and
X ~ Bin(5,0.9) and these have been printed side by side to facilitate
this.
Consider X ~ Bin(5, 0.1) and X ~ Bin(5, 0.9)
Notice that
P(X = 0 | X ~ Bin(5, 0.1)) = P(X = 5 | X ~ Bin(5, 0.9))
P(X = 1 | X ~ Bin(5, 0.1)) = P(X = 4 | X ~ Bin(5, 0.9))
and so on.
[Figure: diagrams of the probability distributions of X ~ Bin(5, p). The pairs p = 0.1 and 0.9, p = 0.2 and 0.8, p = 0.3 and 0.7, p = 0.4 and 0.6 are printed side by side, followed by p = 0.5.]

Also, considering X ~ Bin(5, 0.2) and X ~ Bin(5, 0.8)

P(X = 2 | X ~ Bin(5, 0.2)) = P(X = 3 | X ~ Bin(5, 0.8))

In general
P(X = r | X ~ Bin(n, p)) = P(X = n − r | X ~ Bin(n, 1 − p))

NOTE: If p = 0.5 then the distribution of X, where
X ~ Bin(n, 0.5), is symmetrical.
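
The mirror-image relationship between Bin(n, p) and Bin(n, 1 − p) can be confirmed numerically for every value of r. The sketch below uses a helper called `pmf`, a name assumed for this illustration.

```python
from math import comb, isclose

def pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p)."""
    return comb(n, r) * (1 - p) ** (n - r) * p ** r

n, p = 5, 0.2
# Compare P(X = r | Bin(n, p)) with P(X = n - r | Bin(n, 1 - p)) for each r
mirror_holds = all(
    isclose(pmf(r, n, p), pmf(n - r, n, 1 - p)) for r in range(n + 1)
)
print(mirror_holds)
```

The relationship holds because counting r successes with probability p is the same event as counting n − r failures, each failure having probability 1 − p.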

CUMULATIVE BINOMIAL PROBABILITY TABLES

The task of finding binomial probabilities is made much easier if
tables are available. These give the cumulative probabilities
F(r) = P(X ≤ r) for the possible values of r. The tables are printed
on p. 680 and an extract is shown below. In this we are considering
X ~ Bin(5, 0.3).

Example 4.9 If X ~ Bin(5, 0.3) find (a) P(X ≤ 4), (b) P(X = 2),
(c) P(X < 3), (d) P(X > 1), (e) P(X ≥ 3).

Solution 4.9 (a) P(X ≤ 4) = 0.9976 (directly from the tables)

(b) P(X = 2) = P(X ≤ 2) − P(X ≤ 1)
             = 0.8369 − 0.5282
             = 0.3087
(c) P(X < 3) = P(X ≤ 2)
             = 0.8369
(d) P(X > 1) = 1 − P(X ≤ 1)
             = 1 − 0.5282
             = 0.4718
(e) P(X ≥ 3) = 1 − P(X ≤ 2)
             = 1 − 0.8369
             = 0.1631
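
Where printed tables are not to hand, a column of cumulative probabilities can be generated and then used exactly as in Example 4.9. The following is a sketch only; the function name `cumulative_table` is an assumption, and values are rounded to 4 d.p. to imitate printed tables.

```python
from math import comb

def cumulative_table(n, p):
    """Return [F(0), ..., F(n)] with F(r) = P(X <= r), each rounded to 4 d.p."""
    q = 1 - p
    running_total = 0.0
    table = []
    for r in range(n + 1):
        running_total += comb(n, r) * q ** (n - r) * p ** r
        table.append(round(running_total, 4))
    return table

F = cumulative_table(5, 0.3)
p_equals_2 = round(F[2] - F[1], 4)   # P(X = 2) = F(2) - F(1)
p_at_least_3 = round(1 - F[2], 4)    # P(X >= 3) = 1 - F(2)
print(F, p_equals_2, p_at_least_3)
```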

In the tables values of p are given from 0.1 to 0.5. However we are
still able to use them for p = 0.6, 0.7, 0.8 and 0.9 by using the
fact that
P(X = r | X ~ Bin(n, p)) = P(X = n − r | X ~ Bin(n, 1 − p))
Consider again the probability distributions for X ~ Bin(5, 0.3)
and X ~ Bin(5, 0.7).

[Figure: probability distributions of X ~ Bin(5, 0.3) and X ~ Bin(5, 0.7), each drawn for x = 0, 1, 2, 3, 4, 5.]

We see that P(X ≤ 3 | p = 0.3) = P(X ≥ 2 | p = 0.7)
and P(X ≥ 4 | p = 0.3) = P(X ≤ 1 | p = 0.7)
In general

P(X ≤ r | X ~ Bin(n, p)) = P(X ≥ n − r | X ~ Bin(n, 1 − p))
P(X ≥ r | X ~ Bin(n, p)) = P(X ≤ n − r | X ~ Bin(n, 1 − p))

Example 4.10 If X ~ Bin(5, 0.7) find (a) P(X ≥ 3), (b) P(X ≤ 4),
(c) P(X = 4).

Solution 4.10 Using the column headed p = 0.3, with n = 5:

(a) P(X ≥ 3 | p = 0.7) = P(X ≤ 2 | p = 0.3)
                       = 0.8369

(b) P(X ≤ 4 | p = 0.7) = P(X ≥ 1 | p = 0.3)
                       = 1 − P(X ≤ 0 | p = 0.3)
                       = 1 − 0.1681
                       = 0.8319
(c) P(X = 4 | p = 0.7) = P(X = 1 | p = 0.3)
                       = P(X ≤ 1 | p = 0.3) − P(X ≤ 0 | p = 0.3)
                       = 0.5282 − 0.1681
                       = 0.3601
NOTE: Obviously there are times when it is not advantageous to
use the tables and it is quicker to calculate the probabilities directly.
However they are particularly useful when finding the probability
distribution P(X =x) for all values of x.
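
The device of reading a p = 0.7 probability from the p = 0.3 column can be written as a small function. The sketch below is illustrative; `cdf` and `cdf_via_complement` are hypothetical names, and `cdf` plays the part of the printed table.

```python
from math import comb

def cdf(r, n, p):
    """F(r) = P(X <= r) for X ~ Bin(n, p); the sum is empty (zero) when r < 0."""
    return sum(comb(n, k) * (1 - p) ** (n - k) * p ** k for k in range(0, r + 1))

def cdf_via_complement(r, n, p):
    """P(X <= r | p) obtained only from the 1 - p column.

    X <= r with parameter p corresponds to Y >= n - r with parameter 1 - p,
    and P(Y >= n - r) = 1 - P(Y <= n - r - 1).
    """
    return 1 - cdf(n - r - 1, n, 1 - p)

# Example 4.10(b): P(X <= 4) for X ~ Bin(5, 0.7), using the p = 0.3 column
print(round(cdf_via_complement(4, 5, 0.7), 4))
```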

Example 4.11 Given X ~ Bin(8, 0.2) write out the probability distribution of X.

Solution 4.11 Using the cumulative probability tables on p. 630 with n = 8
and p = 0.2, and writing P(X ≤ r) as F(r),

P(X = 0) = F(0)        = 0.1678
P(X = 1) = F(1) − F(0) = 0.5033 − 0.1678 = 0.3355
P(X = 2) = F(2) − F(1) = 0.7969 − 0.5033 = 0.2936
P(X = 3) = F(3) − F(2) = 0.9437 − 0.7969 = 0.1468
P(X = 4) = F(4) − F(3) = 0.9896 − 0.9437 = 0.0459
P(X = 5) = F(5) − F(4) = 0.9988 − 0.9896 = 0.0092
P(X = 6) = F(6) − F(5) = 0.9999 − 0.9988 = 0.0011
P(X = 7) = F(7) − F(6) = 1.0000 − 0.9999 = 0.0001
P(X = 8) = F(8) − F(7) = 1 − 1.0000 = 0.0000

NOTE: P(X = 8) = (0.2)^8 = 0.000 002 56, but since the tables
give values to 4 d.p. they will give P(X = 8) = 0.0000.

Exercise 4c

1. Use cumulative binomial probability tables to find the following:
(a) X ~ Bin(6, 0.2), find (i) P(X < 3), (ii) P(X > 4), (iii) P(X = 5).
(b) X ~ Bin(10, 0.45), find (i) P(X = 6), (ii) P(X < 3), (iii) P(X < 5), (iv) P(X > 8).
(c) X ~ Bin(4, 0.9), find (i) P(X < 1), (ii) P(X < 2), (iii) P(X = 3).
(d) X ~ Bin(7, 0.75), find (i) F(5), (ii) F(3), (iii) P(X ≥ 4), (iv) P(X = 6).

2. Given that X ~ Bin(6, 0.4) write out the probability distribution of X.

3. Given that X ~ Bin(5, 0.65) write out the probability distribution of X.

THE RECURRENCE FORMULA FOR THE BINOMIAL DISTRIBUTION


If cumulative probability tables are not available, then calculations
can be performed more easily with the help of the recurrence
formula, especially when a calculator with a memory is being used.
Now, if X ~ Bin(n, p) then

P(X = x) = ^{n}C_{x} q^{n-x} p^{x}
         = [n!/((n-x)! x!)] q^{n-x} p^{x}

and P(X = x+1) = ^{n}C_{x+1} q^{n-x-1} p^{x+1}
               = [n!/((n-x-1)! (x+1)!)] q^{n-x-1} p^{x+1}

Dividing these, we have

P(X = x+1)/P(X = x) = [n!/((n-x-1)! (x+1)!)] × [(n-x)! x!/n!] × (p/q)
                    = (n-x)p / ((x+1)q)

So
P(X = x+1) = [(n-x)p / ((x+1)q)] P(X = x)

This is often written

p_{x+1} = [(n-x)p / ((x+1)q)] p_x   where p_{x+1} = P(X = x+1)
                                     and p_x = P(X = x)

Example 4.12 If X ~ Bin(8, 0.3) use the recurrence formula to calculate P(X ≤ 4).

Solution 4.12 We require P(X ≤ 4) = p_0 + p_1 + p_2 + p_3 + p_4.

Now p = 0.3, q = 0.7 and n = 8.

p_0 = P(X = 0) = (0.7)^8

If you are using a calculator with a memory system, there is no
need to write out the numerical answer for p_0; it can be stored in
the memory straight away. However, the values are written out so
that you can check them on your calculator.

Now p_{x+1} = [(8-x)(0.3) / ((x+1)(0.7))] p_x (check each value as it is
stored in the memory)

and p_0 = (0.7)^8 = 0.057 648

When x = 0
p_1 = [8(0.3) / (1(0.7))] p_0 = 0.197 650 3

When x = 1
p_2 = [7(0.3) / (2(0.7))] p_1 = 0.296 475 4

When x = 2
p_3 = [6(0.3) / (3(0.7))] p_2 = 0.254 121 8

When x = 3
p_4 = [5(0.3) / (4(0.7))] p_3 = 0.136 136 7

So P(X ≤ 4) = p_0 + p_1 + p_2 + p_3 + p_4
            = 0.942 032 3 (from ‘memory’ on calculator)
Therefore P(X ≤ 4) = 0.942 (3 S.F.).
NOTE: if you are using a calculator, it is well worth practising this
method. However, you need to set out all your working first, then
use the calculator when you are ready.
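
The recurrence formula lends itself to a short loop in which each probability is obtained from the one before, just as each value is built up in the calculator memory. This is a minimal sketch; the function name `binomial_by_recurrence` is an assumption made for the illustration.

```python
def binomial_by_recurrence(n, p):
    """Return [p_0, ..., p_n] using p_{x+1} = (n - x) p / ((x + 1) q) * p_x."""
    q = 1 - p
    probs = [q ** n]                      # p_0 = q^n
    for x in range(n):
        factor = (n - x) * p / ((x + 1) * q)
        probs.append(factor * probs[-1])  # p_{x+1} from p_x
    return probs

# Example 4.12: X ~ Bin(8, 0.3), P(X <= 4) = p_0 + p_1 + p_2 + p_3 + p_4
probs = binomial_by_recurrence(8, 0.3)
print(round(sum(probs[:5]), 3))
```

Only one power is ever calculated (q^n); every later term costs a single multiplication, which is the advantage the text describes.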

Example 4.13 A pottery produces royal souvenir mugs. It is known that 6% are
defective. If 20 mugs are selected at random, find the probability
that the sample contains less than 5 defective mugs.

Solution 4.13 Let ‘obtaining a defective mug’ be ‘success’.


Then P(success) = p = 0.06 and q = 0.94.
Let X be the r.v. ‘the number of defective mugs’.
Then X ~ Bin(n, p) where n = 20 and p = 0.06.
We have
P(X = x) = ^{n}C_{x} q^{n-x} p^{x}
         = ^{20}C_{x} (0.94)^{20-x} (0.06)^{x}, x = 0, 1, ..., 20
We require
P(X < 5) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)
We show the two methods of performing the calculations:

Method 1
P(X = 0) = (0.94)^{20}                                            = 0.290 106 (6 d.p.)
P(X = 1) = 20(0.94)^{19}(0.06)                                    = 0.370 348
P(X = 2) = [(20)(19)/((2)(1))] (0.94)^{18}(0.06)^2                = 0.224 573
P(X = 3) = [(20)(19)(18)/((3)(2)(1))] (0.94)^{17}(0.06)^3         = 0.086 007
P(X = 4) = [(20)(19)(18)(17)/((4)(3)(2)(1))] (0.94)^{16}(0.06)^4  = 0.023 332
                                                          Total   = 0.994 366
So P(X < 5) = 0.994 (3 S.F.).

Method 2 using the recurrence formula

Now
p_{x+1} = [(n-x)p / ((x+1)q)] p_x

In the example n = 20, p = 0.06, q = 0.94. Therefore

p_{x+1} = [(20-x)(0.06) / ((x+1)(0.94))] p_x (store each value in the memory)

Now
p_0 = (0.94)^{20} = 0.290 106 (as above)
When x = 0
p_1 = (20/1)(0.06/0.94) p_0 = 0.370 348

When x = 1
p_2 = (19/2)(0.06/0.94) p_1, and so on

When x = 2
p_3 = (18/3)(0.06/0.94) p_2

When x = 3
p_4 = (17/4)(0.06/0.94) p_3

P(X < 5) = p_0 + p_1 + p_2 + p_3 + p_4
         = 0.994 (3 S.F.).
The probability that there are less than 5 defective mugs in a sample
of 20 is 0.994 (3 S.F.).

To find the value of X that is most likely to occur


The value of X that is most likely to occur is the one with the
highest probability. It is very tedious to work through finding
P(X = x) for all x. Instead the recurrence formula can be used.

Example 4.14 Of the inhabitants of a certain African village, 80% are known to
have a particular eye disorder. If 12 people are waiting to see the
nurse, what is the most likely number of them to have the eye
disorder?

Solution 4.14 Let X be the r.v. ‘the number of people with the eye disorder’.
Then X ~ Bin(n, p) with n = 12 and p = 0.8.
Therefore X ~ Bin(12, 0.8)
and P(X = x) = ^{12}C_{x} (0.2)^{12-x} (0.8)^{x}, x = 0, 1, ..., 12.
Using the recurrence formula

p_{x+1} = [(n-x)p / ((x+1)q)] p_x

so p_{x+1} > p_x when (n-x)p > (x+1)q,

i.e. (12 − x)(0.8) > (x + 1)(0.2)
     4(12 − x) > x + 1
i.e. p_{x+1} > p_x when x < 9.4

So that p_10 > p_9 > p_8 > ... > p_1 > p_0

But p_{x+1} < p_x when x > 9.4
So that p_10 > p_11 > p_12.

p_10 is the highest probability, so the most likely number of people
with the eye disorder is 10.
NOTE: the reader should verify that P(X = 10) does give the
highest probability by calculating P(X = x) for x = 0, 1, ..., 12.
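
The argument above, that the probabilities rise while (n − x)p > (x + 1)q and fall afterwards, can be turned into a short search for the most likely value. The sketch below uses a function name, `most_likely_value`, invented for this illustration; when (n − x)p = (x + 1)q exactly, two neighbouring values are equally likely and the function returns the smaller.

```python
from math import comb

def most_likely_value(n, p):
    """Step x upwards while p_{x+1} > p_x, i.e. while (n - x) p > (x + 1) q."""
    q = 1 - p
    x = 0
    while x < n and (n - x) * p > (x + 1) * q:
        x += 1
    return x

# Example 4.14: X ~ Bin(12, 0.8)
mode = most_likely_value(12, 0.8)

# Verification by calculating P(X = x) for every x, as the NOTE suggests
pmf = [comb(12, x) * 0.2 ** (12 - x) * 0.8 ** x for x in range(13)]
print(mode, pmf.index(max(pmf)))
```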

Exercise 4d

1. (a) If the r.v. X is such that X ~ Bin(6, [...]), use the recurrence formula to find the most likely value of X.
(b) Now use the formula to find P(X = x) for x = 0, 1, ..., 6. Check whether your answer is consistent with your answer to part (a).

2. The r.v. X is such that X ~ Bin(9, 0.35). Find P(X < 6) (a) without using the recurrence formula, (b) using the recurrence formula. Compare your answers.

3. In a bag there are 6 red counters, 8 yellow counters and 6 green counters. A counter is drawn at random from the bag, its colour is noted and it is then replaced. This procedure is carried out ten times in all. Find
(a) the expected number of red counters drawn,
(b) the most likely number of green counters drawn,
(c) the probability that no more than 4 yellow counters are drawn.

4. The random variable X is distributed binomially with mean 2 and variance 1.6. Find (a) the most likely value of X, (b) P(X < 6).

5. The probability that a student is awarded a pass in the mathematics examination is 0.75. Find the probability that in a group of 10 students more than half pass the mathematics examination.

6. The random variable X is such that X ~ Bin(8, 0.4). Find (a) the most likely value of X, (b) P(X < 4), (c) P(X ≥ 4).

FITTING A THEORETICAL DISTRIBUTION


It is sometimes useful to compare experimental results with a
theoretical distribution.

Example 4.15 A biased coin is tossed 4 times and the number of heads noted. The
experiment is performed 500 times in all. The results obtained are
shown in the table:

Number of heads    0     1     2     3     4
Frequency          12    50    151   200   87

(a) Find the probability of obtaining a head when the coin is
tossed.
(b) Calculate the theoretical frequencies of 0, 1, 2, 3, 4 heads, using
the associated theoretical binomial distribution.

Solution 4.15 (a) For the frequency distribution

mean = Σfx / Σf
     = [(0)(12) + (1)(50) + (2)(151) + (3)(200) + (4)(87)] / 500
     = 1300/500
     = 2.6
Let X be the r.v. ‘the number of heads obtained in 4 tosses’. Then
X ~ Bin(n, p) with n = 4. So the mean, E(X) = np.
Therefore np = 2.6
          4p = 2.6
So         p = 0.65
Therefore the probability that the coin will show heads is 0.65.

(b) X ~ Bin(4, 0.65). To find the values of P(X = x) for
x = 0, 1, 2, 3, 4:

Method 1: using P(X = x) = ^{4}C_{x} (0.35)^{4-x} (0.65)^{x}

P(X = 0) = (0.35)^4            = 0.015 006 25
P(X = 1) = 4(0.35)^3 (0.65)    = 0.111 475
P(X = 2) = 6(0.35)^2 (0.65)^2  = 0.310 537 5
P(X = 3) = 4(0.35)(0.65)^3     = 0.384 475
P(X = 4) = (0.65)^4            = 0.178 506 25

Method 2: using p_{x+1} = [(4-x)(0.65) / ((x+1)(0.35))] p_x

p_0 = (0.35)^4 = 0.015 006 25
p_1 = [4(0.65) / (1(0.35))] p_0 = (7.428 571 4)(0.015 006 25)
    = 0.111 475
p_2 = [3(0.65) / (2(0.35))] p_1 = (2.785 714 3)(0.111 475)
    = 0.310 537 5
p_3 = [2(0.65) / (3(0.35))] p_2 = (1.238 095 2)(0.310 537 5)
    = 0.384 475
p_4 = [1(0.65) / (4(0.35))] p_3 = (0.464 285 7)(0.384 475)
    = 0.178 506 25
Method 3: using cumulative probability tables.

X ~ Bin(4, 0.65) so we will need to consider X ~ Bin(4, 0.35)
in order to use the tables.

P(X = 0|X ~ Bin(4, 0.65)) = P(X = 4|X ~ Bin(4, 0.35))
= 1 − 0.9850 = 0.015

P(X = 1|X ~ Bin(4, 0.65)) = P(X = 3|X ~ Bin(4, 0.35))
= 0.9850 − 0.8735 = 0.1115

P(X = 2|X ~ Bin(4, 0.65)) = P(X = 2|X ~ Bin(4, 0.35))
= 0.8735 − 0.5630 = 0.3105

P(X = 3|X ~ Bin(4, 0.65)) = P(X = 1|X ~ Bin(4, 0.35))
= 0.5630 − 0.1785 = 0.3845

P(X = 4|X ~ Bin(4, 0.65)) = P(X = 0|X ~ Bin(4, 0.35))
= 0.1785
To obtain the theoretical distribution, multiply each of the probabilities
by the total frequency, 500.

Therefore the theoretical binomial frequencies (rounded to the
nearest integer) are as follows:

Number of heads           0     1     2     3     4
Theoretical frequency     8    56   155   192    89

NOTE: this compares reasonably well with the original frequency
distribution.
A statistical test to compare the two sets of data is illustrated on
p. 540 (chi-squared test).
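The fitting procedure of Example 4.15 can also be sketched in a few lines of Python. This is only an illustrative check, not part of the original method; the names `observed`, `binomial_pmf` and `expected` are assumptions introduced here.

```python
from math import comb

# Observed frequencies of 0, 1, 2, 3, 4 heads (Example 4.15).
observed = [12, 50, 151, 200, 87]
n = 4                      # tosses per experiment
total = sum(observed)      # 500 experiments

# Estimate p by equating the sample mean to np.
mean = sum(x * f for x, f in enumerate(observed)) / total
p = mean / n

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * (1 - p) ** (n - x) * p ** x

# Theoretical frequencies: probability times total frequency, rounded.
expected = [round(total * binomial_pmf(x, n, p)) for x in range(n + 1)]

print("mean =", mean, " p =", p)
print("theoretical frequencies:", expected)
```

Rounding each frequency separately does not guarantee that the rounded values add up to the total, although they happen to do so for these data.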

Exercise 4e

1. A biased die is thrown 3 times and the number of fours is noted. The procedure is performed 180 times in all and the results are shown in the table.

Number of 4’s    0    1    2    3
Frequency        [values illegible in source]

(a) What is the mean of this distribution?
(b) What is the probability of obtaining a 4 when the die is thrown?
(c) Calculate the theoretical probabilities of obtaining 0, 1, 2, 3 fours, using the binomial distribution.
(d) Calculate the corresponding theoretical frequencies.

2. In a large batch of items from a production line the probability that an item is faulty is p. 400 samples, each of size 5, are taken and the number of faulty items in each batch is noted. From the frequency distribution below estimate p and work out the expected frequencies of 0, 1, 2, 3, 4, 5 faulty items per batch for a theoretical binomial distribution having the same mean.

Number of faulty items    0     1     2    3    4    5
Frequency               297    90    10    2    [last two values illegible in source]

3. In an experiment a certain number of dice are thrown and the number of sixes obtained is recorded. The dice are all biased and the probability of obtaining a six with each individual die is p. In all there were 60 experiments and the results are shown in the table.

Number of sixes obtained in an experiment    0    1    2    3    4    More than 4
Frequency                                    [values illegible in source]

Calculate the mean and the standard deviation of these data.
By comparing these answers with those expected for a binomial distribution, estimate (a) the number of dice thrown in each experiment, (b) the value of p. (C Additional)

4. Fit a theoretical binomial distribution to the following frequency distribution, given n = 4:

x    0    1    2    3    4
f    7   20   35   30    8

5. Seeds are planted in rows of six and after 14 days the number of seeds which have germinated in each of the 100 rows is noted. The results are shown in the table:

[table of number of seeds germinating against number of rows; values illegible in source]

Find the theoretical frequencies of 0, 1, ..., 6 seeds germinating in a row, using the associated theoretical binomial distribution.

6. Derive the mean and variance of the binomial distribution.
Mass production of miniature hearing aids is a particularly difficult process and so the quality of these products is monitored carefully. Samples of size six are selected regularly and tested for correct operation. The number of defectives in each sample is recorded. During one particular week 140 samples are taken and the distribution of the number of defectives per sample is given in the following table.

Number of defectives per sample (x)          [values illegible in source]
Number of samples with x defectives (f)      [values illegible in source]

Find the frequencies of the number of defectives per sample given by a binomial distribution having the same mean and total as the observed distribution. (AEB 1978)

WORKED EXAMPLE

Example 4.16 70% of the passengers who travel on the 8.17 to London buy the
‘Daily Doom’ at the bookstall before boarding the train. The train is
full and each compartment holds eight passengers.
(a) What is the probability that all the passengers in a compart-
ment have bought the ‘Daily Doom’?
(b) What is the probability that none of the passengers in a com-
partment has bought the ‘Daily Doom’?
(c) What is the probability that exactly three of the passengers in
a compartment bought the ‘Daily Doom’?
(d) What is the most likely number of passengers in a compartment
to have bought the ‘Daily Doom’?
(e) If there are 40 compartments on the train in how many of
them would you expect there to be exactly three copies of the
‘Daily Doom’?
(f) The train is so full that in each carriage ten people are standing
in the corridor. What is the probability that the third passenger
I pass in the corridor of a carriage is the first I meet who has
bought the ‘Daily Doom’?
(g) What is the mean number of buyers of the ‘Daily Doom’
standing in a corridor? (SUJB)

Solution 4.16 Let ‘buying the Daily Doom’ be termed ‘success’. Therefore
p = 0.7 and q = 1 − p = 0.3.
Let X be the r.v. ‘the number of passengers who have bought the
Daily Doom’. Then X ~ Bin(n, p) where n = 8 and p = 0.7, i.e.
X ~ Bin(8, 0.7).

$P(X = x) = {}^{n}C_{x}\,q^{n-x}p^{x} = {}^{8}C_{x}\,(0.3)^{8-x}(0.7)^{x}, \quad x = 0, 1, \ldots, 8$

(a) $P(X = 8) = (0.7)^{8} = 0.0576$ (3 S.F.)

The probability that all the passengers in a compartment have
bought the Daily Doom is 0.0576 (3 S.F.).

(b) $P(X = 0) = (0.3)^{8} = 6.561 \times 10^{-5}$

The probability that none of the passengers has bought the Daily
Doom is $6.561 \times 10^{-5}$.

(c) $P(X = 3) = {}^{8}C_{3}\,(0.3)^{5}(0.7)^{3} = \dfrac{(8)(7)(6)}{(1)(2)(3)}(0.3)^{5}(0.7)^{3} = 0.0467$ (3 S.F.)

The probability that exactly three passengers have bought the Daily
Doom is 0.0467 (3 S.F.).

(d) To find the most likely number of people who have bought the
Daily Doom, use the recurrence formula to find the term with the
highest probability.
Recurrence formula:

$p_{x+1} = \dfrac{(8-x)(0.7)}{(x+1)(0.3)}\,p_{x}$

So $p_{x+1} > p_{x}$ when (8 − x)(0.7) > (x + 1)(0.3),
i.e. 5.6 − 0.7x > 0.3x + 0.3
x < 5.3

Therefore $p_{6} > p_{5} > p_{4} > p_{3} > p_{2} > p_{1} > p_{0}$ but $p_{6} > p_{7} > p_{8}$.

So the most likely number of passengers to have bought the Daily
Doom is 6.
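The recurrence argument in part (d) can be checked numerically. The following is a minimal sketch, assuming the hypothetical helper name `binomial_probs_by_recurrence`; it builds every probability from p_0 = q^n with the recurrence formula and then picks the largest term.

```python
def binomial_probs_by_recurrence(n, p):
    """Return [P(X=0), ..., P(X=n)] for X ~ Bin(n, p) using
    p_{x+1} = (n - x) p / ((x + 1) q) * p_x, starting from p_0 = q^n."""
    q = 1 - p
    probs = [q ** n]
    for x in range(n):
        probs.append(probs[-1] * (n - x) * p / ((x + 1) * q))
    return probs

probs = binomial_probs_by_recurrence(8, 0.7)

# The most likely value is the index of the largest probability.
mode = max(range(len(probs)), key=probs.__getitem__)

for x, pr in enumerate(probs):
    print(x, round(pr, 4))
print("most likely number:", mode)
```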

(e) In one compartment P(X = 3) = 0.0467. Let Y be the r.v. ‘the
number of compartments where there are exactly three copies of
the Daily Doom’.
Then Y ~ Bin(n, p) where n = 40 and p = 0.0467.
So E(Y) = np
= (40)(0.0467)
= 1.87 (3 S.F.)

Therefore the expected number of compartments where there are
exactly three copies of the Daily Doom on the train of 40
compartments is 1.87 (3 S.F.).

(f) P(third passenger is the first to have a copy) = P(D′D′D) where
D is the event ‘the person has a copy of the Daily Doom’, D′ is the
event that the person does not have a copy, and P(D) = 0.7.
Now P(D′D′D) = (0.3)(0.3)(0.7) (independent events)
= 0.063
The probability that the third person is the first to have a copy is
0.063.

(g) Let C be the r.v. ‘the number of people in the corridor to have
bought the Daily Doom’. Then C ~ Bin(10, 0.7).
E(C) = (10)(0.7)
= 7
Therefore the expected number of buyers standing in a corridor is 7.

Example 4.17 (a) State in words the meaning of P(E’) and of P(E|F) for two
events E and F.
(b) All the letters in a particular office are typed either by Pat, a
trainee typist, or by Lyn, who is a fully trained typist. The
probability that a letter typed by Pat will contain one or more
errors is 0.3. Find the probability that a random sample of 4
letters typed by Pat will include exactly one letter free from
error.
(c) The probability that a letter typed by Lyn will contain one or
more errors is 0.05. Using the tables provided, or otherwise,
find, to 3 decimal places, the probability that in a random
sample of 20 letters typed by Lyn, not more than 2 letters will
contain one or more errors.

(d) On any one day, 6% of the letters typed in the office are typed
by Pat. One letter is chosen at random from those typed on
that day. Show that the probability that it will contain one or
more errors is 0.065.
(e) Given that each of 2 letters chosen at random from the day’s
typing contains one or more errors, find, to 4 decimal places,
the probability that one was typed by Pat and the other by
Lyn. (L)

Solution 4.17 (a) P(E′) is the probability that event E does not occur.
P(E|F) is the probability that E occurs, given that F has
occurred.

(b) P(Pat’s letter contains errors) = 0.3.
P(Pat’s letter is free from errors) = 1 − 0.3 = 0.7.

Let X be the r.v. ‘the number of letters typed by Pat which
are free from errors’. Then X ~ Bin(4, 0.7).

$P(X = 1) = {}^{4}C_{1}\,(0.3)^{3}(0.7) = 0.0756$

Therefore the probability that a random sample of 4 letters
typed by Pat will include exactly one letter free from error is
0.0756.

(c) P(Lyn’s letter contains errors) = 0.05.

Let Y be the r.v. ‘the number of letters typed by Lyn containing
errors’. Then Y ~ Bin(20, 0.05).

$P(Y \le 2) = P(Y = 0) + P(Y = 1) + P(Y = 2)$
$= (0.95)^{20} + 20(0.95)^{19}(0.05) + 190(0.95)^{18}(0.05)^{2}$
$= 0.925$ (3 d.p.)

Therefore the probability that a random sample of 20 letters
typed by Lyn will contain not more than two with errors is
0.925 (3 d.p.).
NOTE: P(Y ≤ 2) can be found directly from cumulative
binomial probability tables, p. 629, with n = 20, p = 0.05.
P(Y ≤ 2) = 0.925 (3 d.p.)
(d) Let E be the event ‘a letter contains one or more errors’.

From a tree diagram (typist first, then errors or not):

P(E|Pat)·P(Pat) = (0.3)(0.06)

P(E|Lyn)·P(Lyn) = (0.05)(0.94)

P(E) = P(E|Pat)·P(Pat) + P(E|Lyn)·P(Lyn)
= (0.3)(0.06) + (0.05)(0.94)
= 0.065

Therefore the probability that a letter will contain errors is
0.065, as required.

(e) Now $P(\text{Pat}\,|\,E) = \dfrac{P(E\,|\,\text{Pat}) \cdot P(\text{Pat})}{P(E)} = \dfrac{(0.3)(0.06)}{0.065} = \dfrac{18}{65}$

$P(\text{Lyn}\,|\,E) = \dfrac{P(E\,|\,\text{Lyn}) \cdot P(\text{Lyn})}{P(E)} = \dfrac{(0.05)(0.94)}{0.065} = \dfrac{47}{65}$

P(1 typed by Lyn, 1 typed by Pat | 2 letters contained errors)

$= 2\left(\dfrac{18}{65}\right)\left(\dfrac{47}{65}\right)$

= 0.4005 (4 d.p.)
Therefore the probability that one was typed by Pat and one
by Lyn, given that the two letters contained errors, is 0.4005
(4 d.p.).
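Parts (d) and (e) of Solution 4.17 can be verified with a short calculation. The sketch below is illustrative only; the variable names are assumptions made here, and the two letters are treated as independent choices, as in the solution.

```python
# Proportion of letters typed by each typist and their error rates.
p_pat, p_lyn = 0.06, 0.94
p_err_given_pat, p_err_given_lyn = 0.3, 0.05

# (d) Total probability that a randomly chosen letter contains errors.
p_err = p_err_given_pat * p_pat + p_err_given_lyn * p_lyn

# (e) Bayes' theorem for each typist, given that the letter has errors.
p_pat_given_err = p_err_given_pat * p_pat / p_err   # 18/65
p_lyn_given_err = p_err_given_lyn * p_lyn / p_err   # 47/65

# One letter by each typist: two possible orders.
p_one_each = 2 * p_pat_given_err * p_lyn_given_err

print(round(p_err, 3), round(p_one_each, 4))
```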

SUMMARY — BINOMIAL DISTRIBUTION

If X ~ Bin(n, p) then $P(X = x) = {}^{n}C_{x}\,q^{n-x}p^{x}$, x = 0, 1, 2, ..., n,
where q = 1 − p.

E(X) = np

Var(X) = npq

Recurrence formula:

$p_{x+1} = \dfrac{(n-x)p}{(x+1)q}\,p_{x}$ where $p_{x+1} = P(X = x+1)$

P(X = r|X ~ Bin(n, p)) = P(X = n − r|X ~ Bin(n, 1 − p))
P(X ≥ r|X ~ Bin(n, p)) = P(X ≤ n − r|X ~ Bin(n, 1 − p))
P(X ≤ r|X ~ Bin(n, p)) = P(X ≥ n − r|X ~ Bin(n, 1 − p))

Miscellaneous Exercise 4f

1. Find the probability of throwing three sixes twice in five throws of six dice.

2. In a large city 1 person in 5 is left handed.
(a) Find the probability that in a random sample of 10 people
(i) exactly 3 will be left handed,
(ii) more than half will be left-handed.
(b) Find the most likely number of left-handed people in a random sample of 12 people.
(c) Find the mean and the standard deviation of the number of left-handed people in a random sample of 25 people.
(d) How large must a random sample be if the probability that it contains at least one left-handed person is to be greater than 0.95?

3. A crossword puzzle is published in The Times each day of the week, except Sunday. A man is able to complete, on average, 8 out of 10 of the crossword puzzles.
(a) Find the expected value and the standard deviation of the number of completed crosswords in a given week.
(b) Show that the probability that he will complete at least 5 in a given week is 0.655 (to 3 significant figures).
(c) Given that he completes the puzzle on Monday, find, to three significant figures, the probability that he will complete at least 4 in the rest of the week.
(d) Find, to three significant figures, the probability that, in a period of four weeks, he completes 4 or less in only one of the four weeks. (C)

4. Samples, each of 8 articles, are taken at random from a large consignment in which 20% of articles are defective. Find the number of defective articles which is most likely to occur in a single sample, and find the probability of obtaining this number.
If 100 samples of 8 articles are to be examined, calculate the number of samples in which you would expect to find 3 or more defective articles. (C)

5. A small boy plays a game in which he has to guess in which hand his uncle is hiding a toffee. The first time he chooses ‘left’. For the next three times he chooses ‘same hand as previous time’ with probability s, and ‘different hand from previous time’ with probability d, where s + d = 1. Find the probability that he will choose ‘left’ on the last time.
By adding together the binomial expansions for $(s + d)^{3}$ and $(s - d)^{3}$, deduce that the probability that he chooses ‘left’ on the last time can be written as $\frac{1}{2}\{1 + (s - d)^{3}\}$. (SMP)

6. In an inspection scheme a sample of 20 items is selected at random from a large batch and the number of defective items is noted. If this number is more than 2 the batch is rejected; if it is less than 2 the batch is accepted. If the number of defective items is exactly 2, a further sample of 10 items is taken and the batch is rejected if this second sample has any defective items, but otherwise the batch is accepted.
If the proportion of defective items in a particular batch is 2%, evaluate, to two decimal places, the probabilities that
(a) the batch is accepted as a result of inspection of the first sample,
(b) a second sample is taken and the batch accepted as a result of inspection of the second sample,
(c) the batch is rejected. (C)

7. Thatcher’s Pottery produces large batches of coffee mugs decorated with the faces of famous politicians. They are considering adopting one of the following sampling plans for batch inspection.
Method A (single sample plan) Select 10 mugs from the batch at random and accept the batch if there are 2 or less defectives, otherwise reject the batch.
Method B (double sample plan) Select 5 mugs from the batch at random and accept the batch if there are no defectives, reject the batch if there are 2 or more defectives, otherwise select another 5 mugs at random. When the second sample is drawn count the number of defectives in the combined sample of 10 and accept the batch if the number of defectives is 2 or less, otherwise reject the batch.
(a) If the proportion of defectives in a batch is p, find, in terms of p, for each method in turn, the probability that the batch will be accepted.
(b) Evaluate both the above probabilities for p = 0.2 and p = 0.5.
(c) Hence, or otherwise, decide which of these two plans is more appropriate, and why. (AEB 1981)

8. A trial may have two outcomes, success or failure. If in n such independent trials, the probability p of a success remains constant from trial to trial, write down the probability of r successes in the n trials.
When two friends A and B play chess, the probability that A wins any game is 2/[denominator illegible in source] and if A does not win the game, the probabilities then of B winning and of a draw are equal. In the course of an evening they play four games. Calculate the probabilities (a) that A does not win a game, (b) that he wins more than two games.
If it is known that A has won exactly two of these four games, write down the probability distribution of the number of games that B has won.
Calculate the probability that A wins more games than B when four games are played. (JMB)

9. At a certain university in Cambford students attending a first course in statistics are asked by the lecturer, Professor Thomas Bayes, to complete 10 example sheets during the course. At the end of the course each student sits an examination as a result of which he either passes or fails. Assuming that
(I) the number, N, of example sheets completed by any student has a binomial distribution given by

$P(N = n) = {}^{10}C_{n}\,(\ldots)^{n}(\ldots)^{10-n}$, n = 0, 1, ..., 10 [the two probabilities are illegible in source]

and (II) the probability of a student passing the examination given that he completed n sheets during the course, is n/10,
(a) what is the (unconditional) probability that a student passes the examination?
(b) What is the probability that a student selected at random from the examination pass list had in fact completed four example sheets or less? (AEB 1977)

10. Two random variables $X_{1}$ and $X_{2}$ have independent binomial probability distributions, where both $X_{1}$ and $X_{2}$ can only take the values 0, 1 and 2. If $P(X_{1} = 2) = p_{1}^{2}$ and $P(X_{2} = 2) = p_{2}^{2}$, show that $E(X_{1}X_{2}) = 4p_{1}p_{2}$. (L Additional)

11. In an experiment two bags, A and B, contain a very large number of white and black balls. In Bag A, 20% of the balls are white and in Bag B, 70% of the balls are white. Two balls are selected at random from each bag.
(a) Find the expected number of white balls selected from Bag A.
(b) If Z denotes the total number of white balls selected, calculate (i) P(Z = 2), (ii) E(Z). (L Additional)

THE GEOMETRIC DISTRIBUTION

A geometric distribution arises when we have a sequence of independent
trials, each with a definite probability p of success and
probability q of failure, where q = 1 − p. Let X be the r.v. ‘the
number of trials up to and including the first success’.

Now

P(X = 1) = P(success on the first trial) = p

P(X = 2) = P(failure on first trial, success on second) = qp

$P(X = 3) = q^{2}p$
$P(X = 4) = q^{3}p$
...
$P(X = x) = q^{x-1}p$

A discrete r.v. X having p.d.f. of the form $P(X = x) = q^{x-1}p$,
x = 1, 2, 3, ..., where 0 < p < 1 and q = 1 − p, is said to follow a
geometric distribution.

p is the parameter of the distribution.

If X is defined in this way, we write

X ~ Geo(p)

EXPECTATION AND VARIANCE


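If X ~ Geo(p) then $E(X) = \dfrac{1}{p}$, $\text{Var}(X) = \dfrac{q}{p^{2}}$,
where q = 1 − p.

$E(X) = \sum_{\text{all } x} x\,P(X = x)$
$= p + 2qp + 3q^{2}p + 4q^{3}p + \ldots$
$= p(1 + 2q + 3q^{2} + 4q^{3} + \ldots)$
$= p(1 - q)^{-2}$ since $(1 - q)^{-2} = 1 + 2q + 3q^{2} + 4q^{3} + \ldots$
$= \dfrac{p}{p^{2}} = \dfrac{1}{p}$

Now

$E(X^{2}) = \sum_{\text{all } x} x^{2}P(X = x)$
$= p + 4qp + 9q^{2}p + 16q^{3}p + \ldots$
$= p(1 + 4q + 9q^{2} + 16q^{3} + \ldots)$
$= p(1 + 2q + 3q^{2} + 4q^{3} + \ldots + 2q + 6q^{2} + 12q^{3} + \ldots)$
$= p\left((1 - q)^{-2} + 2q(1 + 3q + 6q^{2} + \ldots)\right)$
$= p\left[\dfrac{1}{(1-q)^{2}} + \dfrac{2q}{(1-q)^{3}}\right]$ since $(1 - q)^{-3} = 1 + 3q + 6q^{2} + \ldots$
$= p\left[\dfrac{1}{p^{2}} + \dfrac{2q}{p^{3}}\right]$
$= \dfrac{1}{p} + \dfrac{2q}{p^{2}}$

$\text{Var}(X) = E(X^{2}) - E^{2}(X)$
$= \dfrac{1}{p} + \dfrac{2q}{p^{2}} - \dfrac{1}{p^{2}}$
$= \dfrac{p + 2q - 1}{p^{2}}$
$= \dfrac{q}{p^{2}}$ since p − 1 = −q

Therefore $E(X) = \dfrac{1}{p}$ and $\text{Var}(X) = \dfrac{q}{p^{2}}$.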
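As a numerical sanity check on these two results, the series for E(X) and E(X²) can be summed directly. This is a rough sketch rather than a proof; the function name `geometric_moments` and the truncation at 2000 terms are assumptions made here (the neglected tail is negligible for the value of p used).

```python
def geometric_moments(p, terms=2000):
    """Approximate E(X) and Var(X) for X ~ Geo(p) by summing
    x * q^(x-1) * p and x^2 * q^(x-1) * p over x = 1, ..., terms."""
    q = 1 - p
    mean = sum(x * q ** (x - 1) * p for x in range(1, terms + 1))
    second_moment = sum(x * x * q ** (x - 1) * p for x in range(1, terms + 1))
    return mean, second_moment - mean ** 2

p = 0.4
mean, variance = geometric_moments(p)

print(mean, 1 / p)                  # both close to 2.5
print(variance, (1 - p) / p ** 2)   # both close to 3.75
```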

Example 4.18 The probability that a marksman hits the bull’s eye is 0.4 for each
shot, and each shot is independent of all others. Find

(a) the probability that he hits the bull’s eye for the first time on
his fourth attempt,

(b) the mean number of throws needed to hit the bull’s eye, and
the standard deviation,

(c) the most common number of throws until he hits the bull’s
eye.

Solution 4.18 (a) P(hits bull’s eye on fourth attempt) = $(0.6)^{3}(0.4) = 0.0864$.

(b) Let X be the r.v. ‘the number of attempts up to and including
the first bull’s eye’. Now X follows a geometric distribution
with p = 0.4.

So $P(X = x) = q^{x-1}p$ with q = 0.6, p = 0.4.

Now $E(X) = \dfrac{1}{p} = \dfrac{1}{0.4} = 2.5$ and $\text{Var}(X) = \dfrac{q}{p^{2}} = \dfrac{0.6}{(0.4)^{2}} = 3.75$,

and s.d. of X = $\sqrt{3.75} = 1.94$ (3 S.F.)

So the mean number of attempts is 2.5 and the standard deviation
is 1.94 (3 S.F.).

(c) P(X = 1) = 0.4

P(X = 2) = 0.6 × 0.4 = 0.24

$P(X = 3) = (0.6)^{2} \times 0.4 = 0.144$

The probabilities are decreasing, and therefore the most
common number of throws is 1.

Result 1

$P(X = r) = q^{r-1}p$

$P(X \le r) = \sum_{x=1}^{r} q^{x-1}p$
$= p(1 + q + q^{2} + \ldots + q^{r-1})$
$= \dfrac{p(1 - q^{r})}{1 - q}$
$= 1 - q^{r}$

Therefore $P(X > r) = q^{r}$

Example 4.19 A coin is biased so that the probability of obtaining a head is 0.6.
If X is the r.v. the number of tosses up to and including the first
head, find

(a) P(X ≤ 4),
(b) P(X > 5),
(c) the probability that more than 8 tosses will be required to
obtain a head, given that more than 5 tosses are required.

Solution 4.19 $P(X = x) = q^{x-1}p$, x = 1, 2, 3, ... with p = 0.6 and q = 0.4.

(a) $P(X > 4) = q^{4}$

Therefore $P(X \le 4) = 1 - q^{4}$
$= 1 - (0.4)^{4}$
$= 0.9744$

Therefore P(X ≤ 4) = 0.9744.

(b) $P(X > 5) = q^{5} = (0.4)^{5} = 0.010\,24$

Therefore P(X > 5) = 0.010 24.

(c) $P(X > 8\,|\,X > 5) = \dfrac{P(X > 8 \cap X > 5)}{P(X > 5)} = \dfrac{P(X > 8)}{P(X > 5)} = \dfrac{q^{8}}{q^{5}} = q^{3} = (0.4)^{3} = 0.064$

The probability that more than 8 tosses will be required given
that more than 5 tosses are required is 0.064.

In general, $P(X > a + b\,|\,X > a) = \dfrac{q^{a+b}}{q^{a}} = q^{b} = P(X > b)$

Result 2 $P(X > a + b\,|\,X > a) = P(X > b)$
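Result 2 can be illustrated numerically with the figures of Example 4.19. The sketch below assumes a hypothetical helper `geometric_tail` implementing Result 1, P(X > r) = q^r; it simply compares the conditional probability with the unconditional tail.

```python
def geometric_tail(p, r):
    """P(X > r) = q^r for X ~ Geo(p), where q = 1 - p (Result 1)."""
    return (1 - p) ** r

p = 0.6  # probability of a head in Example 4.19

# P(X > 8 | X > 5) = P(X > 8) / P(X > 5), since X > 8 implies X > 5.
conditional = geometric_tail(p, 8) / geometric_tail(p, 5)

# Result 2 says this equals P(X > 3).
unconditional = geometric_tail(p, 3)

print(conditional, unconditional)
```

In words, having already waited 5 tosses without a head does not change the chance of waiting more than 3 further tosses.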

Example 4.20 In a particular board game a player can get out of jail only by
obtaining two heads when she tosses two coins.

(a) Find the probability that more than 6 attempts are needed to
get out of jail.

(b) What is the smallest value of n if there is to be at least a 90%


chance of getting out of jail on or before the nth attempt.

Solution 4.20 P(2 heads when 2 coins are tossed) = $\frac{1}{4}$

So p = P(success) = $\frac{1}{4}$ and q = $\frac{3}{4}$.
Let X be the r.v. ‘the number of attempts required to get out of
jail’. Then X follows a geometric distribution, X ~ Geo($\frac{1}{4}$).

(a) $P(X > 6) = q^{6} = \left(\frac{3}{4}\right)^{6} = 0.178$ (3 S.F.)

The probability that a player needs more than 6 attempts
before getting out of jail is 0.178 (3 S.F.).

(b) $P(X > n) = \left(\frac{3}{4}\right)^{n}$.

So $P(X \le n) = 1 - \left(\frac{3}{4}\right)^{n}$.
We require P(X ≤ n) ≥ 0.9.

So $1 - \left(\frac{3}{4}\right)^{n} \ge 0.9$, i.e. $\left(\frac{3}{4}\right)^{n} \le 0.1$

(Take logs to base 10.)

$n \log 0.75 \le \log 0.1$

(Divide by negative quantity, so reverse the inequality.)

$n \ge \dfrac{\log 0.1}{\log 0.75}$

$n \ge 8.0039$

Therefore the smallest value of n is 9.

Check: $P(X \le 6) = 1 - \left(\frac{3}{4}\right)^{6} = 0.8220\ldots < 90\%$
$P(X \le 7) = 1 - \left(\frac{3}{4}\right)^{7} = 0.8665\ldots < 90\%$
$P(X \le 8) = 1 - \left(\frac{3}{4}\right)^{8} = 0.8998\ldots < 90\%$
$P(X \le 9) = 1 - \left(\frac{3}{4}\right)^{9} = 0.9249\ldots > 90\%$

So least value of n is 9.
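The search for the least n in part (b) can also be done by direct trial, mirroring the check above. This is a minimal sketch; the function name `smallest_n` is an assumption made for illustration.

```python
from math import log10

def smallest_n(p, target):
    """Least n with P(X <= n) = 1 - q^n >= target for X ~ Geo(p)."""
    q = 1 - p
    n = 1
    while 1 - q ** n < target:
        n += 1
    return n

n = smallest_n(0.25, 0.9)

# The logarithm bound from the solution: n >= log 0.1 / log 0.75.
bound = log10(0.1) / log10(0.75)

print(n, round(bound, 4))
```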

Exercise 4g

1. (a) The probability distribution of a random variable X is geometric, that is

$P(X = k) = (1 - p)p^{k}$, k = 0, 1, 2, ..., where 0 < p < 1.

Given that $\sum_{n=1}^{\infty} np^{n-1} = 1/(1 - p)^{2}$, show that E(X) = p/(1 − p).
(b) When two dice are thrown together a ‘double’ is obtained when the scores on the two dice are the same. Assuming the two dice to be unbiased, calculate the probabilities that
(i) in two throws of the two dice, a double will be obtained on the first throw but not on the second throw,
(ii) in three throws of the two dice, doubles will be obtained on the first two throws but not on the third throw.
(c) Suppose the two dice are thrown until a double is not obtained. Using the result of part (a) find the expected number of doubles. (L Additional)

2. An unbiased coin is tossed repeatedly until a tail appears. Find the expected number of tosses.

3. A darts player practises throwing a dart at the bull’s-eye on a dart board. Independently for each throw, her probability of hitting the bull’s-eye is 0.2. Let X be the number of throws she makes, up to and including her first success.
(a) Find the probability that she is successful for the first time on her third throw.
(b) Write down the distribution of X, and give the name of this distribution.
(c) Find the probability that she will have at least 3 failures before her first success.
(d) Show that the mean value of X is 5.
(You may assume the result $\sum_{r=1}^{\infty} rq^{r-1} = \dfrac{1}{(1 - q)^{2}}$ when |q| < 1.)
On another occasion the player throws the dart at the bull’s-eye until she has 2 successes. Let Y be the number of throws she makes up to and including her second success. Given that Var(X) = 20, determine the mean and the variance of Y, and find the probability that Y = 4. (L)

4. A random variable R has probability function P(R = r) defined by

P(R = r) = [formula and range of r illegible in source],
P(R = r) = 0, otherwise.

Given that [series result illegible in source], find E(R).
Given that the variance of R is [value illegible in source], determine the mean and variance of S where S is [a linear function of R, illegible in source].

5. The probability that a telephone box is occupied is [fraction illegible in source]. Find, to 2 significant figures, the probability that a person wishing to make a phone call will find a telephone box which is not occupied only at the sixth box tried.
Write down the mean number of occupied boxes which will have to be tried before the person finds a box which is not occupied. (L)P
NOTE: $(1 - x)^{-2} = 1 + 2x + 3x^{2} + \ldots$

6. (a) Describe an experiment you may have carried out which can be modelled by a geometric distribution. State any assumptions you may have made.
(b) In many board games it is necessary to ‘throw a six with an ordinary die’ before a player can start the game. Write down, as a fraction, the probability of a player
(i) starting on his first attempt,
(ii) not starting until his third attempt,
(iii) requiring more than three attempts before starting.
What is
(iv) the most common number of throws required to obtain a six,
(v) the mean number of throws required to obtain a six?
Prove that the probability of a player requiring more than n attempts before starting is $\left(\frac{5}{6}\right)^{n}$.
(c) What is the smallest value of n if there is to be at least a 95% chance of starting on or before the nth attempt? (O)

7. State conditions which give rise to a geometric distribution whose probability function is $P(X = r) = (1 - p)^{r-1}p$, r = 1, 2, 3, ..., where 0 < p < 1.
Prove that $P(X \le r) = 1 - (1 - p)^{r}$.
Hence prove that, for any two positive integers s and t,
P(X > s + t|X > s) = P(X > t)
and explain in words the meaning of this result.
During the winter in Glen Shee, the probability that snow will fall on any given day is 0.1. Taking November 1st as the first day of winter and assuming independence from day to day, find, to 2 significant figures, the probability that the first snow of winter will fall in Glen Shee on the last day of November (30th).
Given that no snow has fallen at Glen Shee during the whole of November, a teacher decides not to wait any longer to book a skiing holiday. The teacher decides to book for the earliest date for which the probability that snow will have fallen on or before that date is at least 0.9. Find the date of the booking. (L)

8. In a sales campaign, a petrol company gives each motorist who buys their petrol a card with a picture of a film star on it. There are 10 different picture cards, one of each of ten different film stars, and any motorist who collects a complete set of all ten pictures gets a free gift. On any occasion when a motorist buys petrol, the card received is equally likely to carry any of the ten pictures in the set.
(a) Find the probability that the first four cards the motorist receives all carry different pictures.
(b) Find the probability that the first four cards received result in the motorist having exactly three different pictures.
(c) Two of the ten film stars in the set are X and Y. Find the probability that the first four cards received result in the motorist having a picture of X or of Y (or both).
(d) At a certain stage the motorist has collected nine of the ten pictures. Find the least value of n such that
P(at most n more cards are needed to complete the set) > 0.99. (C)

9. A marksman fires at a target. The probability of his hitting the bull’s-eye is p for each shot, and each shot is independent of all others. The random variable X denotes the number of shots previous to that on which the bull’s-eye is first hit. Show that
$\Pr(X = x) = q^{x}p$
where q = 1 − p. Find the mean of X and show that the variance is $q/p^{2}$. (O & C)

THE POISSON DISTRIBUTION

A discrete r.v. X having p.d.f. of the form

$P(X = x) = e^{-\lambda}\dfrac{\lambda^{x}}{x!}$ for x = 0, 1, 2, 3, ... to infinity,

where λ can take any positive value, is said to follow the Poisson
distribution.

NOTE: λ is the parameter of the distribution.

If X is distributed in this way, then X ~ Po(λ).

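Example 4.21 Verify that if X ~ Po(λ), then X is a random variable.

Solution 4.21 If $\sum_{\text{all } x} P(X = x) = 1$ then X is a random variable.

Now $\sum_{\text{all } x} P(X = x) = \sum_{x=0}^{\infty} e^{-\lambda}\dfrac{\lambda^{x}}{x!}$
$= e^{-\lambda}\sum_{x=0}^{\infty} \dfrac{\lambda^{x}}{x!}$
$= e^{-\lambda}\left(1 + \lambda + \dfrac{\lambda^{2}}{2!} + \dfrac{\lambda^{3}}{3!} + \ldots\right)$

But $e^{\lambda} = 1 + \lambda + \dfrac{\lambda^{2}}{2!} + \dfrac{\lambda^{3}}{3!} + \ldots$

So $\sum_{\text{all } x} P(X = x) = (e^{-\lambda})(e^{\lambda}) = 1$

Therefore X is a random variable.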

EXPECTATION AND VARIANCE

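Example 4.22 If X ~ Po(λ) find (a) E(X), (b) E(X²), (c) Var(X).

Solution 4.22 $P(X = x) = e^{-\lambda}\dfrac{\lambda^{x}}{x!}$, x = 0, 1, 2, ...

The probability distribution can be written as follows:

x           0                 1                         2                                   3                                   ...
P(X = x)    $e^{-\lambda}$    $\lambda e^{-\lambda}$    $\dfrac{\lambda^{2}e^{-\lambda}}{2!}$    $\dfrac{\lambda^{3}e^{-\lambda}}{3!}$    ...

(a) Now $E(X) = \sum_{\text{all } x} xP(X = x)$
$= e^{-\lambda}\left(0 + \lambda + \dfrac{2\lambda^{2}}{2!} + \dfrac{3\lambda^{3}}{3!} + \dfrac{4\lambda^{4}}{4!} + \ldots\right)$
$= \lambda e^{-\lambda}\left(1 + \lambda + \dfrac{\lambda^{2}}{2!} + \dfrac{\lambda^{3}}{3!} + \ldots\right)$
$= \lambda e^{-\lambda}(e^{\lambda})$
$= \lambda$
Therefore E(X) = λ.

(b) Now $E(X^{2}) = \sum_{\text{all } x} x^{2}P(X = x)$
$= e^{-\lambda}\left(0 + \lambda + \dfrac{4\lambda^{2}}{2!} + \dfrac{9\lambda^{3}}{3!} + \dfrac{16\lambda^{4}}{4!} + \ldots\right)$
$= \lambda e^{-\lambda}\left(1 + 2\lambda + \dfrac{3\lambda^{2}}{2!} + \dfrac{4\lambda^{3}}{3!} + \ldots\right)$
$= \lambda e^{-\lambda}\left[\left(1 + \lambda + \dfrac{\lambda^{2}}{2!} + \dfrac{\lambda^{3}}{3!} + \ldots\right) + \left(\lambda + \dfrac{2\lambda^{2}}{2!} + \dfrac{3\lambda^{3}}{3!} + \ldots\right)\right]$
$= \lambda e^{-\lambda}\left[e^{\lambda} + \lambda\left(1 + \lambda + \dfrac{\lambda^{2}}{2!} + \ldots\right)\right]$
$= \lambda e^{-\lambda}(e^{\lambda} + \lambda e^{\lambda})$
$= \lambda + \lambda^{2}$
Therefore E(X²) = λ + λ².

(c) Now $\text{Var}(X) = E(X^{2}) - E^{2}(X)$
$= \lambda + \lambda^{2} - \lambda^{2}$
$= \lambda$
Therefore Var(X) = λ.

Therefore if X ~ Po(λ), then E(X) = λ
and Var(X) = λ.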
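The results of Examples 4.21 and 4.22 can be checked numerically for a particular value of λ. The sketch below is illustrative only; the helper name `poisson_probs`, the choice λ = 2 and the truncation at 100 terms are assumptions made here. Successive probabilities are built using the relation P(X = x) = P(X = x − 1) × λ/x, which follows directly from the p.d.f.

```python
from math import exp

def poisson_probs(lam, terms=100):
    """Return [P(X=0), ..., P(X=terms-1)] for X ~ Po(lam), using
    P(X = x) = P(X = x - 1) * lam / x, starting from e^(-lam)."""
    probs = [exp(-lam)]
    for x in range(1, terms):
        probs.append(probs[-1] * lam / x)
    return probs

lam = 2.0
probs = poisson_probs(lam)

total = sum(probs)                                    # should be close to 1
mean = sum(x * pr for x, pr in enumerate(probs))      # should be close to lam
second_moment = sum(x * x * pr for x, pr in enumerate(probs))
variance = second_moment - mean ** 2                  # should be close to lam

print(total, mean, variance)
```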

Example 4.23 If X ~ Po(2), find (a) P(X = 4), (b) P(X ≥ 3).

Solution 4.23 X ~ Po(2), so $P(X = x) = e^{-2}\dfrac{2^{x}}{x!}$, x = 0, 1, 2, ...

(a) $P(X = 4) = e^{-2}\dfrac{2^{4}}{4!} = 0.0902$ (3 S.F.)

(b) $P(X \ge 3) = 1 - P(X < 3)$
$= 1 - [P(X = 0) + P(X = 1) + P(X = 2)]$
$= 1 - \left(e^{-2} + 2e^{-2} + \dfrac{2^{2}}{2!}e^{-2}\right)$
$= 1 - 5e^{-2}$
$= 0.323$ (3 S.F.)

Exercise 4h

1. If X ~ Po(3.5), find (a) P(X = 0), (b) P(X = 1), (c) P(X = 2), (d) P(X = 3), (e) P(X < 3), (f) P(X ≥ 4).

2. If X ~ Po(1.8), find (a) P(X = 6), (b) P(X = 8), (c) P(X < 2), (d) P(X > 4).

3. If X ~ Po(2.4) and F(X) is the cumulative distribution, find (a) F(0), (b) F(1), (c) F(2), (d) F(3).

4. X ~ Po(λ) and P(X = 0) = 0.2019. Find (a) λ, (b) P(X < 4).

5. Find the first 5 terms of the Poisson distribution if (a) λ = 0.5, (b) λ = 2.8, (c) λ = 3.6.

6. If X ~ Po(λ) and E(X²) = 6, find (a) λ, (b) P(X = 2).

7. The random variable X follows a Poisson distribution with standard deviation 2. Find P(X ≤ 8).

USES OF THE POISSON DISTRIBUTION

There are two main practical uses of the Poisson distribution:


(1) when considering the distribution of random events,
(2) as an approximation to the binomial distribution.

We shall now look at these in more detail.

(1) The distribution of random events

If an event is randomly scattered in time (or space) and has
mean number of occurrences λ in a given interval of time (or
space) and if X is the r.v. ‘the number of occurrences in the given
interval’, then X ~ Po(λ).

Examples of events which might follow a Poisson distribution:


The number of
(a) flaws in a given length of material,
(b) car accidents on a particular stretch of road in one day,
(c) accidents in a factory in one week,
(d) telephone calls made to a switchboard in a given minute,
(e) insurance claims made to a company in a given time,
(f) particles emitted by a radioactive source in a given time.

Example 4.24 The mean number of bacteria per millilitre of a liquid is known to be
4. Assuming that the number of bacteria follows a Poisson distribution,
find the probability that, in 1 ml of liquid, there will be
(a) no bacteria, (b) 4 bacteria, (c) less than 3 bacteria.

Solution 4.24 Let X be the r.v. ‘the number of bacteria in 1 ml of liquid’.

Then X ~ Po(4), so that $P(X = x) = e^{-4}\dfrac{4^{x}}{x!}$, x = 0, 1, 2, ...

(a) $P(X = 0) = e^{-4} = 0.0183$ (3 S.F.)

The probability that there will be no bacteria in 1 ml of liquid is
0.0183 (3 S.F.).

(b) $P(X = 4) = e^{-4}\dfrac{4^{4}}{4!} = 0.195$ (3 S.F.)

The probability that there will be 4 bacteria in 1 ml of liquid is
0.195 (3 S.F.).

(c) $P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2)$
$= e^{-4} + 4e^{-4} + \dfrac{4^{2}}{2!}e^{-4}$
$= e^{-4}(1 + 4 + 8)$
$= 13e^{-4}$
$= 0.238$ (3 S.F.)

The probability that there are less than 3 bacteria in 1 ml of liquid
is 0.238 (3 S.F.).

UNIT INTERVAL

In Example 4.24 we have considered 1 ml of liquid as the ‘unit’


interval.
The number of bacteria in 1 ml of liquid follows a Poisson distribu-
tion with parameter 4.
It follows that the number of bacteria in 2 ml follows a Poisson
distribution with parameter 8, the number in 3 ml follows a Poisson
distribution with parameter 12, and so on.

Example 4.25 Using the date of Example 4.24, find the probability that
(a) in 3 ml of liquid there will be less than 2 bacteria,
(b) in 5ml of liquid there will be more than 2 bacteria.

Solution 4.25   (a) In 1 ml of liquid we ‘expect’ to find 4 bacteria, so in 3 ml of
liquid we ‘expect’ to find 12 bacteria.
Let Y be the r.v. ‘the number of bacteria in 3 ml of liquid’.

So Y ~ Po(12) and P(Y = y) = e^{-12} 12^y / y!,   y = 0, 1, 2, ...

Now, we require P(Y < 2) = P(Y = 0) + P(Y = 1)
                         = e^{-12} + e^{-12}(12)
                         = 13e^{-12}
                         = 7.99 × 10^{-5} (3 S.F.)
Therefore the probability that there are less than 2 bacteria in 3 ml
of liquid is 7.99 × 10^{-5} (3 S.F.).

(b) In 1 ml of liquid we ‘expect’ 4 bacteria, so in 1/2 ml of liquid we
‘expect’ 2 bacteria.
Let R be the r.v. ‘the number of bacteria in 1/2 ml of liquid’.

Then R ~ Po(2) and P(R = r) = e^{-2} 2^r / r!,   r = 0, 1, 2, ...

We require
P(R > 2) = 1 - [P(R = 0) + P(R = 1) + P(R = 2)]
         = 1 - [e^{-2} + e^{-2}(2) + e^{-2}(2^2 / 2!)]
         = 1 - e^{-2}(5)
         = 0.323 (3 S.F.)
The probability that there are more than 2 bacteria in 1/2 ml of liquid
is 0.323 (3 S.F.).
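
The idea of the unit interval, namely that the Poisson parameter scales in proportion to the size of the interval, can be illustrated with a short Python sketch. This is not part of the original text; the names `RATE_PER_ML`, `scaled_mean` and `poisson_pmf` are illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)

RATE_PER_ML = 4  # mean number of bacteria in the unit interval of 1 ml

def scaled_mean(volume_ml):
    """Poisson parameter for a given volume: the mean is proportional to the volume."""
    return RATE_PER_ML * volume_ml

# (a) fewer than 2 bacteria in 3 ml, where the parameter is 12
p_a = sum(poisson_pmf(y, scaled_mean(3)) for y in range(2))

# (b) more than 2 bacteria in half a millilitre, where the parameter is 2
p_b = 1 - sum(poisson_pmf(r, scaled_mean(0.5)) for r in range(3))

print(p_a, p_b)
```

Only the parameter changes between the two parts; the probability formula itself is the same.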

Exercise 4i

1. A book containing 750 pages has 500 misprints. Assuming that the
   misprints occur at random, find the probability that a particular
   page contains (a) no misprints, (b) exactly 4 misprints, (c) more
   than 2 misprints.

2. An insurance company receives on average 2 claims per week from a
   certain factory. Assuming that the number of claims follows a
   Poisson distribution, find the probability that (a) it receives
   more than 3 claims in a given week, (b) it receives more than 2
   claims in a given fortnight, (c) it receives no claims on a given
   day, assuming that the factory operates on a 5 day week.

3. Cars arrive at a petrol station at an average rate of 30 per hour.
   Assuming that the number of cars arriving at the petrol station
   follows a Poisson distribution, find the probability that
   (a) no cars arrive during a particular 5 minute interval,
   (b) more than 3 cars arrive during a 5 minute interval,
   (c) more than 5 cars arrive in a 15 minute interval,
   (d) in a period of half an hour, 10 cars arrive,
   (e) less than 3 cars arrive during a 10 minute interval.

4. If the number of bacterial colonies on a petri dish follows a
   Poisson distribution with average number 2.5 per cm², find the
   probability that
   (a) in 1 cm² there will be no bacterial colonies,
   (b) in 1 cm² there will be more than 4 bacterial colonies,
   (c) in 2 cm² there will be less than 4 bacterial colonies,
   (d) in 4 cm² there will be 6 bacterial colonies.

5. The mean number of flaws per 100 m of material produced on a
   certain machine at Blanktown Fabrics is 2. If flaws occur
   randomly, find the probability that (a) in a 200 m length of
   material there will be more than 8 flaws, (b) in 50 m of material
   there will be exactly 2 flaws.

6. The number of goals scored in a match by Random Rovers follows a
   Poisson distribution with mean λ. If the probability that the team
   scores no goals in a match is 0.301 (3 d.p.) find (a) the value of
   λ, (b) the probability that the team scores less than 3 goals in a
   match, (c) the probability that the team scores less than 8 goals
   in 2 matches.

7. The number of telephone calls made to the school office during a
   5 minute interval follows a Poisson distribution with mean 0.5.
   Find the probability that (a) no calls will be received between
   10 05 and 10 10, (b) more than 4 calls will be received during a
   particular period of 30 minutes.

8. The number of accidents per week in a certain factory follows a
   Poisson distribution with variance 3.2. Find the probability that
   (a) no accidents occur in a particular week, (b) more than 4
   accidents occur in a particular week, (c) less than 3 accidents
   occur in a particular fortnight, (d) exactly 7 accidents occur in
   a particular fortnight.

(2) Using the Poisson distribution as an approximation to
the binomial distribution

A binomial distribution with parameters n and p can be approximated
by a Poisson distribution, with parameter λ = np, if n is
large (n > 50, say) and p is small (p < 0.1, say). The approximation
gets better as n → ∞ and p → 0.

For a binomial distribution P(X = x), x = 0, 1, ..., n are given by
the terms of the expansion of (q + p)^n.
For a Poisson distribution P(X = x), x = 0, 1, 2, ..., are given by
the terms of e^{-λ}(1 + λ + λ^2/2! + λ^3/3! + ...).

So we wish to show that

(q + p)^n → e^{-λ}(1 + λ + λ^2/2! + ...)   as n → ∞

We will need the following theory, relating to the binomial theorem.

By the binomial theorem,

(1 + x/n)^n = 1 + n(x/n) + [n(n - 1)/2!](x^2/n^2) + [n(n - 1)(n - 2)/3!](x^3/n^3) + ...

            = 1 + x + (x^2/2!)(1 - 1/n) + (x^3/3!)(1 - 1/n)(1 - 2/n) + ...

Now, as n → ∞, (1 - 1/n) → 1, (1 - 2/n) → 1, and so on, so that

(1 + x/n)^n → 1 + x + x^2/2! + x^3/3! + ... = e^x

i.e.   lim (n → ∞) of (1 + x/n)^n = e^x

Similarly   lim (n → ∞) of (1 - x/n)^n = e^{-x}

Now

(q + p)^n = q^n + n q^{n-1} p + [n(n - 1)/2!] q^{n-2} p^2 + [n(n - 1)(n - 2)/3!] q^{n-3} p^3 + ...

Writing p = λ/n, so that q = 1 - λ/n,

(q + p)^n = (1 - λ/n)^n + n(1 - λ/n)^{n-1}(λ/n) + [n(n - 1)/2!](1 - λ/n)^{n-2}(λ/n)^2 + ...

          = (1 - λ/n)^n [1 + λ(1 - λ/n)^{-1} + (λ^2/2!)(1 - 1/n)(1 - λ/n)^{-2} + ...]

As n → ∞ we have (1 - λ/n)^n → e^{-λ} from the limit above, and 1/n → 0, λ/n → 0.

So (q + p)^n → e^{-λ}(1 + λ + λ^2/2! + ...)   as required.

Example 4.26 A factory packs bolts in boxes of 500. The probability that a bolt
is defective is 0.002. Find the probability that a box contains 2
defective bolts.

Solution 4.26   Let X be the r.v. ‘the number of defective bolts in a box’.
This is a binomial situation, with n = 500, p = 0.002.
So X ~ Bin(500, 0.002)

Method 1   Using the binomial distribution,

P(X = x) = nCx q^{n-x} p^x,   x = 0, 1, 2, ..., n

We have n = 500, p = 0.002 and q = 0.998, so
P(X = x) = 500Cx (0.998)^{500-x} (0.002)^x
P(X = 2) = 500C2 (0.998)^{498} (0.002)^2
         = [(500)(499) / (2)(1)] (0.998)^{498} (0.002)^2
         = 0.184 (3 d.p.)

Method 2   Since n is large and p is small, we use the Poisson
approximation.
The parameter λ = np = 500(0.002) = 1.

So X ~ Po(1) and P(X = x) = e^{-1} 1^x / x!,   x = 0, 1, 2, ...

We require P(X = 2) = e^{-1} / 2! = 0.184 (3 d.p.)

Therefore the probability that a box contains 2 defective bolts is
0.184 (3 d.p.).
NOTE: the answers agree to 3d.p. and the calculations were much
easier in method 2.
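
The agreement between the two methods can be seen by computing both probabilities side by side. The following Python sketch is an illustration only and is not part of the original text; the function names are assumptions.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * (1 - p)**(n - x) * p**x

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)

n, p = 500, 0.002                 # box size and P(bolt is defective)
exact = binomial_pmf(2, n, p)     # Method 1: binomial
approx = poisson_pmf(2, n * p)    # Method 2: Poisson with parameter np
print(exact, approx)
```

Both values round to 0.184 at 3 d.p.; changing `n` and `p` in the sketch shows how the agreement weakens when n is small or p is not small.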

Example 4.27 Find the probability that at least two double 6’s are obtained when
two dice are thrown 90 times.

Solution 4.27   Throw two dice, P(double 6) = (1/6)(1/6) = 1/36

Let X be the r.v. ‘the number of double 6’s obtained when two dice
are thrown 90 times’, then X ~ Bin(90, 1/36) and np = (90)(1/36) = 2.5.

Using the Poisson approximation,

X ~ Po(2.5) and P(X = x) = e^{-2.5} (2.5)^x / x!,   x = 0, 1, 2, ...

Now P(X ≥ 2) = 1 - [P(X = 0) + P(X = 1)]
             = 1 - (e^{-2.5} + 2.5e^{-2.5})
             = 1 - e^{-2.5}(3.5)
             = 0.713 (3 d.p.)
The probability that at least two double 6’s are obtained when two
dice are thrown 90 times is 0.713 (3 d.p.).

Exercise 4j

1. If X ~ Bin(100, 0.03), use (a) the binomial distribution, (b) the
   Poisson distribution to evaluate (i) P(X = 0), (ii) P(X = 2),
   (iii) P(X = 4).

2. If X ~ Bin(200, 0.006), use the Poisson distribution to find
   (a) P(X < 3), (b) P(X > 5).

3. On average one in 200 cars breaks down on a certain stretch of
   road per day. Find the probability that on a certain day (a) none
   of a sample of 250 cars breaks down, (b) more than 2 of a sample
   of 300 cars break down.

4. The probability that a particular make of light bulb is faulty is
   0.01. The light bulbs are packed in boxes of 100.
   (a) Find the probability that in a certain box there are (i) no
   faulty light bulbs, (ii) 2 faulty light bulbs, (iii) more than 3
   faulty light bulbs.
   (b) A buyer accepts a consignment of 50 boxes if, when he chooses
   two boxes at random, he finds that they contain no more than two
   faulty light bulbs altogether. Find the probability that he
   accepts the consignment.

5. Eggs are packed in boxes of 500. On average, 0.8% of the eggs are
   found to be broken when the eggs are unpacked.
   (a) Find the probability that in a box of 500 eggs (i) exactly 3
   will be broken, (ii) less than 2 will be broken.
   (b) A hypermarket unpacks 100 boxes of eggs. What is the
   probability that there will be exactly 4 boxes containing no
   broken eggs?

6. The probability that there is a flaw in a metre length of cloth is
   0.02. Find the probability that there are more than three flaws in
   175 m of cloth.

7. An aircraft has 116 seats. The airline has found, from long
   experience, that on average 2.5% of people with tickets for a
   particular flight do not arrive for that flight. If the airline
   sells 120 seats for a particular flight determine, using a suitable
   approximation, the probability that more than 116 people arrive
   for that flight. Determine also the probability that there are
   empty seats on the flight. (C)

8. A firm selling electrical components packs them in boxes of 60.
   On average 2% of the components are faulty. What is the chance of
   getting more than 2 defective components in a box? (SUJB)P

CUMULATIVE POISSON PROBABILITY TABLES

The task of finding Poisson probabilities can be made much easier
if tables are available. These give P(X ≤ r) for given values of λ.
The tables are printed on p. 632 and an extract is shown below.
In the extract, X ~ Po(2.4).

[Table extract: P(X ≤ r) for X ~ Po(2.4), r = 0, 1, 2, ..., 10]

Example 4.28   If X ~ Po(2.4) find (a) P(X ≤ 6), (b) P(X ≥ 3),
(c) P(X < 8), (d) P(X > 1), (e) P(X = 4).

Solution 4.28   (a) P(X ≤ 6) = 0.9884 (directly from the tables).

(b) P(X ≥ 3) = 1 - P(X ≤ 2) = 1 - 0.5697 = 0.4303.

(c) P(X < 8) = P(X ≤ 7) = 0.9967.

(d) P(X > 1) = 1 - P(X ≤ 1) = 1 - 0.3084 = 0.6916.

(e) P(X = 4) = P(X ≤ 4) - P(X ≤ 3) = 0.9041 - 0.7787 = 0.1254.
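
When printed tables are not to hand, the cumulative probabilities used in Solution 4.28 can be generated directly. The sketch below is an illustration in Python, not part of the original text; `poisson_cdf` is an assumed helper name.

```python
from math import exp, factorial

def poisson_cdf(r, lam):
    """Cumulative probability P(X <= r) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**x / factorial(x) for x in range(r + 1))

lam = 2.4
a = poisson_cdf(6, lam)                         # (a) P(X <= 6)
b = 1 - poisson_cdf(2, lam)                     # (b) P(X >= 3) = 1 - P(X <= 2)
c = poisson_cdf(7, lam)                         # (c) P(X < 8)  = P(X <= 7)
d = 1 - poisson_cdf(1, lam)                     # (d) P(X > 1)  = 1 - P(X <= 1)
e = poisson_cdf(4, lam) - poisson_cdf(3, lam)   # (e) P(X = 4)
print(a, b, c, d, e)
```

Each line mirrors the rewriting step used in the solution: every required probability is first expressed in terms of P(X ≤ r).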

DIAGRAMMATIC REPRESENTATION OF THE POISSON DISTRIBUTION

The following diagrams show the probability distribution of
X ~ Po(λ) for various values of λ. The horizontal axis gives values
of x and the vertical axis gives values of P(X = x).

Notice that for small values of λ the distribution is very skew, but
it becomes more symmetrical as λ increases.

[Diagrams: probability distributions of X ~ Po(λ) for λ = 1, 1.6, 2, 2.2, 3, 3.8, 5 and 10; horizontal axis x, vertical axis P(X = x)]

THE MODE OF THE POISSON DISTRIBUTION

The mode is the value which is most likely to occur, that is the one
with the highest probability. Consider X ~ Po(λ). Considering the
diagrams, we see that
when λ = 1, there are two modes, 0 and 1
when λ = 2, there are two modes, 1 and 2
when λ = 3, there are two modes, 2 and 3

In general, if λ is an integer then there are two modes and these
occur when x = λ - 1, x = λ.

Notice that
when λ = 1.6, the mode is 1
when λ = 2.2, the mode is 2
when λ = 3.8, the mode is 3.

In general, if λ is not an integer, then the mode m is the integer
such that
λ - 1 < m < λ.
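
The rule for the mode can be checked against a direct search for the largest probability. The following Python sketch is illustrative only and not part of the original text; the function names and the search limit `upto` are assumptions.

```python
from math import exp, factorial, floor, isclose

def poisson_modes(lam):
    """Mode(s) of Po(lam) from the rule: lam - 1 and lam if lam is an integer,
    otherwise the single integer m with lam - 1 < m < lam."""
    if float(lam).is_integer():
        return [int(lam) - 1, int(lam)]
    return [floor(lam)]

def brute_force_modes(lam, upto=60):
    """Find the value(s) of x with the highest probability by direct comparison."""
    probs = [exp(-lam) * lam**x / factorial(x) for x in range(upto)]
    best = max(probs)
    return [x for x, p in enumerate(probs) if isclose(p, best)]

for lam in (1, 1.6, 2, 2.2, 3, 3.8):
    print(lam, poisson_modes(lam), brute_force_modes(lam))
```

For each value of λ quoted in the text, the two functions should list the same mode or pair of modes.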

THE RECURRENCE FORMULA FOR THE POISSON DISTRIBUTION

As with the binomial distribution, when cumulative probability
tables are not available calculations can be made easier with the use
of the recurrence formula.

If X ~ Po(λ) then

P(X = x) = e^{-λ} λ^x / x!

so   P(X = x + 1) = e^{-λ} λ^{x+1} / (x + 1)!

Therefore

P(X = x + 1) / P(X = x) = [e^{-λ} λ^{x+1} x!] / [(x + 1)! e^{-λ} λ^x]
                        = λ / (x + 1)

So   P(X = x + 1) = [λ / (x + 1)] P(X = x)

Sometimes this is written p_{x+1} = [λ / (x + 1)] p_x   where p_x = P(X = x).

This is known as the recurrence formula for the Poisson distribution
with parameter λ.

Example 4.29   If X ~ Po(2.3), use the recurrence formula to find P(X ≥ 5).

Solution 4.29   p_{x+1} = [2.3 / (x + 1)] p_x

Now   p_0 = e^{-2.3}          (0.100 258 8)
      p_1 = 2.3 p_0           (0.230 595 3)
      p_2 = (2.3/2) p_1       (0.265 184 6)
      p_3 = (2.3/3) p_2       (0.203 308 2)
      p_4 = (2.3/4) p_3       (0.116 902 2)

We require P(X ≥ 5) = 1 - (p_0 + p_1 + p_2 + p_3 + p_4)
                    = 1 - 0.916 249 2   (from memory store)
                    = 0.0838 (3 S.F.)
Therefore P(X ≥ 5) = 0.0838 (3 S.F.).
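
The recurrence formula lends itself to a simple loop, each term being obtained from the one before. The sketch below is a Python illustration and not part of the original text; `poisson_terms` is an assumed name.

```python
from math import exp

def poisson_terms(lam, count):
    """Return [p_0, p_1, ..., p_(count-1)] using p_(x+1) = lam / (x + 1) * p_x."""
    terms = [exp(-lam)]                           # p_0 = e^(-lam)
    for x in range(count - 1):
        terms.append(terms[-1] * lam / (x + 1))   # p_(x+1) from p_x
    return terms

p = poisson_terms(2.3, 5)       # p_0, ..., p_4 as in Solution 4.29
p_at_least_5 = 1 - sum(p)       # P(X >= 5)
print(p, p_at_least_5)
```

The running sum plays the part of the calculator's memory store in the solution.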
The recurrence formula can be used to find the value of X which is
most likely to occur, i.e. the value of X with the highest probability.

Example 4.30 If X ~ Po(3.4), find the most likely value of X.

Solution 4.30   The recurrence formula for this Poisson distribution is

p_{x+1} = [3.4 / (x + 1)] p_x,   x = 0, 1, 2, ...

Now p_{x+1} > p_x when 3.4 / (x + 1) > 1
i.e.   3.4 > x + 1
or     x < 2.4
i.e.   p_{x+1} > p_x when x < 2.4
so that   p_3 > p_2 > p_1 > p_0
But   p_{x+1} < p_x when x > 2.4
so that   p_3 > p_4 > p_5 > ...
p_3 is the greatest term in the sequence, so the most likely number to
occur is 3.
NOTE: λ = 3.4 and the mode m is such that 2.4 < m < 3.4, giving
m = 3 as above.


Exercise 4k

1. Use (a) cumulative Poisson probability tables, (b) the recurrence
   formula, to find the first six terms of the following Poisson
   distributions. Sketch the distributions and verify that the mode m
   is the integer such that λ - 1 < m < λ.
   (i) X ~ Po(1.8)   (ii) X ~ Po(2.6)
   (iii) X ~ Po(4.5)   (iv) X ~ Po(3.8).

FITTING A THEORETICAL DISTRIBUTION

As with the binomial distribution, it is possible to fit a theoretical
Poisson distribution to experimental data.

Example 4.31   I recorded the number of phone calls I received over a period of
150 days:

Number of calls     0    1    2    3    4

Number of days     51   54   36    6    3

(a) Find the average number of calls per day.
(b) Calculate the frequencies of the comparable Poisson distribution. (SUJB)P

Solution 4.31   (a) For the data given

mean = Σfx / Σf
     = [0(51) + 1(54) + 2(36) + 3(6) + 4(3)] / 150
     = 1.04

The average number of calls per day is 1.04.

(b) We use the mean of 1.04 as the parameter λ of the Poisson
distribution.
Let X be the r.v. ‘the number of calls per day’.
Then, if X ~ Po(1.04),
P(X = x) = e^{-1.04} (1.04)^x / x!,   x = 0, 1, 2, ...
To make the calculations easier, use the recurrence formula.
Then, to find the expected frequencies, multiply each probability
by 150 and give the answer to the nearest day.
The recurrence formula for this Poisson distribution is
p_{x+1} = [1.04 / (x + 1)] p_x

p_x                                       Expected frequency (150 p_x)

p_0 = e^{-1.04}      = 0.353 454 6        53
p_1 = 1.04 p_0       = 0.367 592 8        55
p_2 = (1.04/2) p_1   = 0.191 148 2        29
p_3 = (1.04/3) p_2   = 0.066 264 7        10
p_4 = (1.04/4) p_3   = 0.017 228 8         3

                                   Total 150

NOTE: the sum of the above probabilities is 0.995 689 4, not 1.
This is because we have stopped calculating at P(X = 4). As the sum
of the remaining probabilities is only 0.004 310 6, we ignore the
error involved.

The theoretical Poisson distribution gives frequencies as follows:

Number of calls, x     0    1    2    3    4

Number of days, f     53   55   29   10    3

This compares reasonably well with the original frequency distribution.
A statistical test to compare the two sets of data is illustrated on
p. 548 (chi-squared test).
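
The fitting procedure of Example 4.31 (estimate λ by the sample mean, then scale the Poisson probabilities by the total frequency) can be sketched in a few lines of Python. This is an illustration and not part of the original text; the variable names are assumptions.

```python
from math import exp, factorial

calls = [0, 1, 2, 3, 4]        # number of calls per day, x
days = [51, 54, 36, 6, 3]      # observed number of days, f

total_days = sum(days)
# The sample mean is used as the Poisson parameter.
mean = sum(x * f for x, f in zip(calls, days)) / total_days

# Expected frequency for each x, to the nearest day.
expected = [round(total_days * exp(-mean) * mean**x / factorial(x)) for x in calls]
print(mean, expected)
```

Here the rounded frequencies happen to sum to 150; with other data the rounded values need not add up exactly to the total.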

Exercise 41

1. For each of the following sets of data, fit a theoretical Poisson
   distribution:
   (a) [frequency table illegible in this copy]
   (b) [frequency table illegible in this copy]

2. An inn caters for overnight travellers and during its busy season
   of 100 days the number of requests each day for rooms has a
   Poisson distribution with mean 4. The inn has four rooms for hire.
   Draw up a table to show the expected frequencies of 0, 1, 2, ..., 12
   requests each day for rooms during the 100 days.
   Obtain an estimate of the number of requests which will have to be
   refused during the period.
   The cost of building an extra room is estimated at £1000 which the
   owner would pay from capital invested to yield a net 85% per
   annum. If each room let yields on average £3 net per day, estimate
   the annual gain or loss of income (excluding the capital outlay)
   were the owner to have the room built. It may be assumed that
   during the rest of the year fewer than five rooms are let. (SUJB)

3. A firm investigated the number of employees suffering injuries
   whilst at work. The results recorded below were obtained for a
   52-week period:

   Number of employees injured in a week     0   1   2   3   4 or more
   Number of weeks                           [values illegible in this copy]

   Give reasons why one might expect this distribution to approximate
   to a Poisson distribution. Evaluate the mean and variance of the
   data and explain why this gives further evidence in favour of a
   Poisson distribution.
   Using the calculated value of the mean, find the theoretical
   frequencies of a Poisson distribution for the number of weeks in
   which 0, 1, 2, 3, 4 or more, employees were injured. (C)

4. State the conditions under which the binomial distribution
   approximates to the Poisson distribution. Hence derive the Poisson
   distribution of mean m and show that its variance is also m.
   Tests for defects are carried out in a textile factory on a lot
   comprising 400 pieces of cloth. The results of the tests are shown
   in Table A below.

   Table A

   Number of faults per piece     [values illegible in this copy]
   Number of pieces               92   142   96   46   [remaining values illegible in this copy]

   Show that this is approximately a Poisson distribution and
   calculate the frequencies on this assumption.
   How many pieces from a sample of 1000 pieces may be expected to
   have 4 or more faults? (AEB 1972)

THE DISTRIBUTION OF TWO INDEPENDENT POISSON VARIABLES

The sum of two independent Poisson variables with parameters
m and n, respectively, is a Poisson variable with parameter m + n.

If X ~ Po(m) and Y ~ Po(n) then X + Y ~ Po(m + n).

Proof

X ~ Po(m) so P(X = x) = e^{-m} m^x / x!      Y ~ Po(n) so P(Y = y) = e^{-n} n^y / y!

P(X = 0) = e^{-m}                            P(Y = 0) = e^{-n}
P(X = 1) = e^{-m} m                          P(Y = 1) = e^{-n} n
P(X = 2) = e^{-m} m^2 / 2!                   P(Y = 2) = e^{-n} n^2 / 2!

and so on

Now
P(X + Y = 0) = P(X = 0)·P(Y = 0)
             = (e^{-m})(e^{-n})
             = e^{-(m+n)}

P(X + Y = 1) = P(X = 0)·P(Y = 1) + P(X = 1)·P(Y = 0)
             = (e^{-m})(e^{-n} n) + (e^{-m} m)(e^{-n})
             = e^{-(m+n)} (m + n)

P(X + Y = 2) = P(X = 0)·P(Y = 2) + P(X = 1)·P(Y = 1) + P(X = 2)·P(Y = 0)
             = (e^{-m})(e^{-n} n^2 / 2!) + (e^{-m} m)(e^{-n} n) + (e^{-m} m^2 / 2!)(e^{-n})
             = e^{-(m+n)} (m^2 + 2mn + n^2) / 2!
             = e^{-(m+n)} (m + n)^2 / 2!
and so on.

The probability distribution for X + Y is

x + y                  0             1                    2                         ...

P(X + Y = x + y)       e^{-(m+n)}    e^{-(m+n)} (m + n)   e^{-(m+n)} (m + n)^2 / 2!   ...

From the distribution we see that X + Y ~ Po(m + n), as required.
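
The result can be checked numerically for particular values of m and n by summing over all the ways in which a given total can arise, exactly as in the proof. The following Python sketch is an illustration only, not part of the original text; the values m = 2.3 and n = 1.7 and the function names are assumptions chosen for the check.

```python
from math import exp, factorial, isclose

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)

def pmf_of_sum(t, m, n):
    """P(X + Y = t) for independent X ~ Po(m), Y ~ Po(n):
    sum P(X = x) * P(Y = t - x) over x = 0, 1, ..., t."""
    return sum(poisson_pmf(x, m) * poisson_pmf(t - x, n) for x in range(t + 1))

m, n = 2.3, 1.7   # illustrative parameters
checks = [isclose(pmf_of_sum(t, m, n), poisson_pmf(t, m + n)) for t in range(8)]
print(checks)
```

A numerical check of this kind does not replace the proof; it only confirms agreement for the totals tested.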

Example 4.32 Two identical racing cars are being tested on a circuit. For each car,
the number of mechanical breakdowns follows a Poisson distribution
with a mean of one breakdown in 100 laps.

The first car does 20 laps and the second does 40 laps. What is the
probability that there will be (a) no breakdowns, (b) one break-
down, (c) more than two breakdowns altogether? Assume that
breakdowns are attended and the cars continue on the circuit.

Solution 4.32   In 100 laps we ‘expect’ 1 breakdown. So, in 20 laps we ‘expect’
0.2 breakdowns and in 40 laps we ‘expect’ 0.4 breakdowns.

Let X be the r.v. ‘the number of breakdowns for the first car’.

Then X ~ Po(0.2)

Let Y be the r.v. ‘the number of breakdowns for the second car’.

Then Y ~ Po(0.4)

Let T be the r.v. ‘the total number of breakdowns’, so T = X + Y

T ~ Po(0.2 + 0.4)

i.e. T ~ Po(0.6)

(a) P(T = 0) = e^{-0.6}
             = 0.549 (3 d.p.)

Therefore the probability that there are no breakdowns is 0.549
(3 d.p.).

(b) P(T = 1) = e^{-0.6}(0.6)
             = 0.329 (3 d.p.)

The probability that there will be one breakdown is 0.329 (3 d.p.).

(c) P(T > 2) = 1 - [P(T = 0) + P(T = 1) + P(T = 2)]
             = 1 - [e^{-0.6} + e^{-0.6}(0.6) + e^{-0.6}(0.6)^2 / 2!]
             = 1 - e^{-0.6}(1 + 0.6 + 0.18)
             = 1 - e^{-0.6}(1.78)
             = 0.023 (3 d.p.)

The probability that there will be more than two breakdowns is
0.023 (3 d.p.).

Example 4.33 The centre pages of the ‘Weekly Sentinel’ consists of 1 page of film
and theatre reviews and 1 page of classified advertisements. The
number of misprints in the reviews has a Poisson distribution with
mean 2.3 and the number of misprints in the classified section has a
Poisson distribution with mean 1.7.
(a) Find the probability that, on the centre pages, there will be
(i) no misprints, (ii) more than 5 misprints.
(b) Find the smallest integer n such that the probability that there
are more than n misprints on the centre pages is less than 0.1.

Solution 4.33   Let X be the r.v. ‘the number of misprints on the review page’.
Then X ~ Po(2.3)
Let Y be the r.v. ‘the number of misprints on the classified page’.
Then Y ~ Po(1.7)
Let T be the r.v. ‘the number of misprints on the centre pages’.
Therefore T = X + Y and T ~ Po(2.3 + 1.7)
i.e. T ~ Po(4)

(a) (i) P(T = 0) = e^{-4} = 0.018 315 6 = 0.018 (3 d.p.)

The probability that there will be no misprints on the centre pages
is 0.018 (3 d.p.).

(ii) Using cumulative Poisson probability tables, with λ = 4,
r = 5,

P(T > 5) = 1 - P(T ≤ 5)
         = 1 - 0.7851
         = 0.215 (3 d.p.)

If tables are not available, consider

P(T > 5) = 1 - [p_0 + p_1 + p_2 + p_3 + p_4 + p_5]

Using the recurrence formula:

p_0 = e^{-4}         (0.018 315 6)
p_1 = 4 p_0          (0.073 262 5)
p_2 = (4/2) p_1      (0.146 525 1)
p_3 = (4/3) p_2      (0.195 366 8)
p_4 = (4/4) p_3      (0.195 366 8)
p_5 = (4/5) p_4      (0.156 293 4)

So that P(T > 5) = 1 - 0.785   (from memory store)
                 = 0.215 (3 d.p.)
The probability that there will be more than 5 misprints on the
centre pages is 0.215 (3 d.p.).

(b) Now P(T > 5) = 0.215 > 0.1
and from tables we find that
P(T > 6) = 0.111 > 0.1
P(T > 7) = 0.051 < 0.1
So the smallest integer n, such that the probability that there are
more than n misprints on the centre pages is less than 0.1, is 7.

Exercise 4m

1. Telephone calls reach a secretary independently and at random,
   internal ones at a mean rate of 2 in any 5 minute period, and
   external ones at a mean rate of 1 in any 5 minute period.
   Calculate the probability that there will be more than 2 calls in
   any period of 2 minutes. (O & C)

2. During a weekday, heavy lorries pass a census point P on a village
   high street independently and at random times. The mean rate for
   westward travelling lorries is 2 in any 30-minute period, and for
   eastward travelling lorries is 3 in any 30-minute period.
   Find the probability
   (a) that there will be no lorries passing P in a given 10-minute
   period,
   (b) that at least one lorry from each direction will pass P in a
   given 10-minute period,
   (c) that there will be exactly 4 lorries passing P in a given
   20-minute period. (O & C)

3. A large number of screwdrivers from a trial production run is
   inspected. It is found that the cellulose acetate handles are
   defective on 1% and that the chrome steel blades are defective on
   13% of the screwdrivers, the defects occurring independently.
   (a) What is the probability that a sample of 80 contains more than
   two defective screwdrivers?
   (b) What is the probability that a sample of 80 contains at least
   one screwdriver with both a defective handle and a defective
   blade? (O & C)

4. A restaurant kitchen has 2 food mixers, A and B. The number of
   times per week that A breaks down has a Poisson distribution with
   mean 0.4, while independently the number of times that B breaks
   down in a week has a Poisson distribution with mean 0.1. Find, to
   3 decimal places, the probability that in the next 3 weeks
   (a) A will not break down at all,
   (b) each mixer will break down exactly once,
   (c) there will be a total of 2 breakdowns. (L)P

MISCELLANEOUS WORKED EXAMPLES

Example 4.34   Along a stretch of motorway, breakdowns requiring the summoning
of the breakdown services occur with a frequency of 2.4 per day,
on average. Assuming that the breakdowns occur randomly and that
they follow a Poisson distribution, find

(a) the probability that there will be exactly 2 breakdowns on a
given day,

(b) the smallest integer n such that the probability of more than n
breakdowns in a day is less than 0.03.

Solution 4.34   (a) Let X be the r.v. ‘the number of breakdowns a day requiring
the breakdown services’.
Then
X ~ Po(2.4) and P(X = x) = e^{-2.4} (2.4)^x / x!,   x = 0, 1, 2, ...

So P(X = 2) = e^{-2.4} (2.4)^2 / 2! = 0.261 (3 S.F.)

The probability that there will be exactly 2 breakdowns on a given
day is 0.261 (3 S.F.).

(b) We require the least integer n such that P(X > n) < 0.03.
Now if   P(X > n) < 0.03
then     P(X ≤ n) > 0.97
From cumulative Poisson probability tables, with λ = 2.4
P(X ≤ 5) = 0.9643 < 0.97
P(X ≤ 6) = 0.9884 > 0.97
So the least integer n such that P(X > n) < 0.03 is 6.

If tables are not available then consider

1 - (p_0 + p_1 + p_2 + ... + p_n) < 0.03   where p_x = P(X = x)

i.e.   p_0 + p_1 + p_2 + ... + p_n > 0.97

Using the recurrence formula p_{x+1} = [2.4 / (x + 1)] p_x we will need to
find p_0, p_1, p_2, ..., and keep a record of the cumulative probabilities
so that we know when the cumulative probability is greater than
0.97.

                                           Cumulative probability
Now p_0 = e^{-2.4}     = 0.090 717 9   |   F(0) = 0.090 717 9
    p_1 = 2.4 p_0      = 0.217 723     |   F(1) = 0.308 441
    p_2 = (2.4/2) p_1  = 0.261 267 7   |   F(2) = 0.569 708 7
    p_3 = (2.4/3) p_2  = 0.209 014 1   |   F(3) = 0.778 722 9
    p_4 = (2.4/4) p_3  = 0.125 408 5   |   F(4) = 0.904 131 4
    p_5 = (2.4/5) p_4  = 0.060 196 0   |   F(5) = 0.964 327 4
    p_6 = (2.4/6) p_5  = 0.024 078 4   |   F(6) = 0.988 405 9

By trial,   p_0 + p_1 + p_2 + p_3 + p_4 + p_5 + p_6 = 0.988 405 9 > 0.97

So the least integer n such that P(X > n) < 0.03 is 6.
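
The trial procedure used when tables are not available (build up the cumulative probability term by term until the tail falls below the required level) can be written as a short loop. The Python sketch below is an illustration and not part of the original text; `smallest_n` and its parameter names are assumptions.

```python
from math import exp

def smallest_n(lam, tail):
    """Least integer n with P(X > n) < tail, for X ~ Po(lam) and 0 < tail < 1."""
    term = exp(-lam)       # p_0
    cumulative = term      # F(0) = P(X <= 0)
    n = 0
    while 1 - cumulative >= tail:   # tail P(X > n) still too large
        n += 1
        term *= lam / n             # recurrence: p_n = (lam / n) * p_(n-1)
        cumulative += term          # F(n) = F(n-1) + p_n
    return n

print(smallest_n(2.4, 0.03))   # breakdowns of Example 4.34(b)
print(smallest_n(4, 0.1))      # misprints of Example 4.33(b)
```

The same function covers both Example 4.34(b) and Example 4.33(b), since each asks for the first n at which the upper tail drops below a stated probability.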

Example 4.35   A random variable X has a Poisson distribution given by

P(X = r) = p_r = e^{-λ} λ^r / r!,   r = 0, 1, 2, ...

Prove that the mean of X is λ. Give two examples (other than that
suggested below) of situations where you would expect a Poisson
distribution to occur.
The number of white corpuscles on a slide has a Poisson distribution
with mean 3.2. By considering the values of r for which p_{r+1}/p_r > 1,
find the most likely number of white corpuscles on a slide. Calculate
correct to 3 decimal places the probability of obtaining this
number. If two such slides are prepared what is the probability,
correct to 3 decimal places, of obtaining at least two white
corpuscles in total on the two slides? (SUJB)

Solution 4.35   For the answer to the first part of the question, see page 243.
Let X be the r.v. ‘the number of white corpuscles on a slide’. Then
X ~ Po(3.2).

So    P(X = r) = e^{-3.2} (3.2)^r / r!
and   P(X = r + 1) = e^{-3.2} (3.2)^{r+1} / (r + 1)!

P(X = r + 1) / P(X = r) = [e^{-3.2} (3.2)^{r+1} r!] / [e^{-3.2} (3.2)^r (r + 1)!]
                        = 3.2 / (r + 1)

So   p_{r+1} / p_r = 3.2 / (r + 1)   where p_r = P(X = r)

Hence p_{r+1} > p_r when 3.2 > r + 1
                         r < 2.2,
i.e.   p_{r+1} > p_r when r = 0, 1, 2
i.e.   p_0 < p_1 < p_2 < p_3;
but    p_{r+1} < p_r when r = 3, 4, 5, ... so p_3 > p_4 > p_5 > ...
Therefore the most likely number of white corpuscles on a slide
is 3.

Now P(X = 3) = e^{-3.2} (3.2)^3 / 3!
             = 0.223 (3 d.p.)

The probability of obtaining 3 white corpuscles on a slide is 0.223
(3 d.p.).

Let X_1 be the r.v. ‘the number of white corpuscles on the first slide’.
Then X_1 ~ Po(3.2).
Let X_2 be the r.v. ‘the number of white corpuscles on the second
slide’. Then X_2 ~ Po(3.2).
Let Y = X_1 + X_2. Then Y ~ Po(3.2 + 3.2)
i.e. Y ~ Po(6.4)
We require P(Y ≥ 2) = 1 - [P(Y = 0) + P(Y = 1)]
                    = 1 - (e^{-6.4} + 6.4e^{-6.4})
                    = 1 - 7.4e^{-6.4}
                    = 0.988 (3 d.p.)
The probability of obtaining at least two white corpuscles in total
on the two slides is 0.988 (3 d.p.).

We now show an alternative approach to the last part:
If two slides are prepared, we require P(total number of white
corpuscles ≥ 2).
Now
P(total ≥ 2) = 1 - [P(total = 0) + P(total = 1)]
             = 1 - [P(X_1 = 0)P(X_2 = 0)
                    + P(X_1 = 0)P(X_2 = 1)
                    + P(X_1 = 1)P(X_2 = 0)]
             = 1 - [(e^{-3.2})(e^{-3.2}) + e^{-3.2}(e^{-3.2} 3.2) + (e^{-3.2} 3.2)e^{-3.2}]
             = 1 - e^{-6.4}(1 + 3.2 + 3.2)
             = 1 - 7.4e^{-6.4}
             = 0.988 as before
Example 4.36 Derive the mean and the variance of the Poisson distribution.

In a large town, one person in 80, on the average, has blood of type
X. If 200 blood donors are taken at random, find an approximation
to the probability that they include at least five persons having
blood of type X.
How many donors must be taken at random in order that the
probability of including at least one donor of type X shall be 0.9
or more? (AEB)

Solution 4.36   We have already shown (p. 243-4) that if R is a r.v. such that
R ~ Po(λ), then
E(R) = λ and Var(R) = λ.

Let R be the r.v. ‘the number of blood donors of type X’.

Then R ~ Bin(n, p) where n = 200 and p = P(blood type X) = 1/80

Now, as n is large and p is small, we use the Poisson approximation
to the binomial distribution.

The parameter λ = np = (200)(1/80) = 2.5

The probability that there are at least five donors of type X is

P(R ≥ 5) = 1 - P(R ≤ 4)
         = 1 - 0.8912   (from tables)
         = 0.109 (3 d.p.)

The probability that the sample will contain at least five people
having blood of type X is 0.109 (3 d.p.).

Suppose n donors are taken, then λ = n(1/80) = n/80

So R ~ Po(n/80)

We require n such that P(R ≥ 1) ≥ 0.9,
i.e.   1 - P(R = 0) ≥ 0.9
       P(R = 0) ≤ 0.1
Now    P(R = 0) = e^{-n/80}
So     e^{-n/80} ≤ 0.1
       e^{n/80} ≥ 1/0.1
i.e.   e^{n/80} ≥ 10
So, taking logs to the base e,
       n/80 ≥ ln(10)
       n/80 ≥ 2.30
       n ≥ (80)(2.30)
       n ≥ 184.2

So we need to take 185 donors in order that the probability of
including at least one donor of type X is 0.9 or more.

CHECK: If n = 184, λ = (184)(1/80) = 2.3

So P(R ≥ 1) = 1 - e^{-2.3}
            = 0.8997 (4 d.p.)
We have P(R ≥ 1) < 0.9 when n = 184.
Now consider n = 185, then λ = (185)(1/80) = 2.3125

P(R ≥ 1) = 1 - e^{-2.3125}
         = 0.901 (3 d.p.)

So P(R ≥ 1) > 0.9 when n = 185.
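
The search for the required number of donors can be reproduced in a few lines. The sketch below is a Python illustration, not part of the original text; the variable names are assumptions. As a side calculation it also evaluates the same requirement using the exact binomial probability 1 - (79/80)^n, which the solution above does not do.

```python
from math import ceil, exp, log

p_type_x = 1 / 80   # P(a donor has blood of type X)

# Poisson approximation: 1 - e^(-n/80) >= 0.9  gives  n >= 80 ln(10)
n_poisson = ceil(log(10) / p_type_x)

# Exact binomial for comparison: 1 - (79/80)^n >= 0.9  gives  n >= ln(0.1) / ln(79/80)
n_binomial = ceil(log(0.1) / log(1 - p_type_x))

print(n_poisson, n_binomial)
print(1 - exp(-184 / 80), 1 - exp(-185 / 80))   # the CHECK values for n = 184 and n = 185
```

Under the Poisson approximation the answer is 185, as in the solution; the exact binomial condition is met one donor earlier, which indicates the size of the error introduced by the approximation here.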

Example 4.37 In the Growmore Market Garden plants are inspected for the
presence of the deadly red angus leaf bug. The number of bugs per
leaf is known to follow a Poisson distribution with mean one. What
is the probability that any one leaf on a given plant will have been
attacked (at least one bug is found on it)?

A random sample of twelve plants is taken. For each plant ten leaves
are selected at random and inspected for these bugs. If more than
eight leaves on any particular plant have been attacked then the
plant is destroyed. What is the probability that exactly two of these
twelve plants are destroyed? (AEB 1977)

Solution 4.37   Let X be the r.v. ‘the number of bugs per leaf’.

Then X ~ Po(1) and P(X = x) = e^{-1} 1^x / x!,   x = 0, 1, 2, ...

We require P(X ≥ 1) = 1 - P(X = 0)
                    = 1 - e^{-1}
                    = 1 - 0.368
                    = 0.632 (3 S.F.)
The probability that any one leaf has been attacked is 0.632 (3 S.F.).

Now let Y be the r.v. ‘the number of leaves that have been attacked
on a plant’.
Then Y ~ Bin(10, 0.632) since n = 10, and the probability that a
leaf has been attacked is 0.632.
We have
P(Y = y) = 10Cy (0.368)^{10-y} (0.632)^y,   y = 0, 1, 2, ..., 10

We require
P(Y > 8) = P(Y = 9) + P(Y = 10)
         = 10(0.368)(0.632)^9 + (0.632)^{10}
         = 0.069 (3 d.p.)

The probability that any one plant is destroyed is 0.069 (3 d.p.).

Now let R be the r.v. ‘the number of plants that are destroyed’.
Then R ~ Bin(12, 0.069) since 12 plants are inspected, and the
probability that a plant is destroyed is 0.069.
P(R = r) = 12Cr (0.931)^{12-r} (0.069)^r

We require P(R = 2) = 12C2 (0.931)^{10} (0.069)^2
                    = [(12)(11) / (2)(1)] (0.931)^{10} (0.069)^2
                    = 0.154 (3 d.p.)

The probability that exactly two of the twelve plants will be
destroyed is 0.154 (3 d.p.).
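
The three stages of Solution 4.37 (Poisson for a leaf, binomial for a plant, binomial for the sample of plants) can be chained together in code. The following Python sketch is an illustration and not part of the original text; the variable names are assumptions. Unlike the solution, it carries the unrounded probabilities from one stage to the next.

```python
from math import comb, exp

# Stage 1: a leaf is attacked if it carries at least one bug, X ~ Po(1).
p_leaf = 1 - exp(-1)

# Stage 2: a plant is destroyed if more than 8 of its 10 inspected leaves are attacked.
p_plant = 10 * (1 - p_leaf) * p_leaf**9 + p_leaf**10

# Stage 3: exactly 2 of the 12 plants are destroyed.
p_two = comb(12, 2) * (1 - p_plant)**10 * p_plant**2

print(p_leaf, p_plant, p_two)
```

Carrying full precision through gives a final value of about 0.155, compared with the 0.154 obtained above from the rounded intermediate values 0.632 and 0.069, so the third decimal place is sensitive to early rounding.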

SUMMARY — POISSON DISTRIBUTION

If X ~ Po(λ) then P(X = x) = e^{-λ} λ^x / x!,   x = 0, 1, 2, ...

E(X) = λ
Var(X) = λ

Recurrence formula: p_{x+1} = [λ / (x + 1)] p_x

If X ~ Po(m) and Y ~ Po(n) then X + Y ~ Po(m + n)
(X, Y independent)

Miscellaneous Exercise 4n

1. Lemons are packed in boxes, each box containing 200. It is found
   that, on average, 0.45% of the lemons are bad when the boxes are
   opened. Use the Poisson distribution to find the probabilities of
   0, 1, 2, and more than 2 bad lemons in a box.
   A buyer who is considering buying a consignment of several hundred
   boxes checks the quality of the consignment by having a box
   opened. If the box opened contains no bad lemons he buys the
   consignment. If it contains more than 2 bad lemons he refuses to
   buy, and if it contains 1 or 2 bad lemons he has another box
   opened and buys the consignment if the second box contains fewer
   than 2 bad lemons. What is the probability that he buys the
   consignment?
   Another buyer checks consignments on a different basis. He has one
   box opened; if that box contains more than 1 bad lemon he asks for
   another to be opened and does not buy if the second also contains
   more than 1 bad lemon. What is the probability that he refuses to
   buy the consignment? (SUJB)

2. A manufacturer produces an integrated electronic unit which
   contains 36 separate pressure sensors. Due to difficulties in
   manufacture, it happens very often that not all the sensors in a
   unit are operational. 100 units are tested and the numbers N of
   pressure sensors which function correctly are distributed
   according to Table A below.
   Calculate the mean number of sensors which are faulty.
   The manufacturer only markets those units which have at least 32
   of their 36 sensors operational. Estimate, using the Poisson
   distribution, the percentage of units produced which are not
   marketed. (O & C)

   Table A

   N                  36   35   34   33   32   31   30   29   28   <28
   Number of units    [values illegible in this copy]

3. Show that, for the Poisson distribution in which the probabilities
   of 0, 1, 2, ... successes are e^{-m}, me^{-m}, m^2 e^{-m} / 2!, ...,
   the mean number of successes is equal to m. State the variance.
   A sales manager receives 6 telephone calls on average between
   9.30 a.m. and 10.30 a.m. on a weekday. Find the probability that
   (a) he will receive 2 or more calls between 9.30 and 10.30 on a
   certain weekday;
   (b) he will receive exactly 2 calls between 9.30 and 9.40;
   (c) during a normal 5-day working week, there will be exactly 3
   days on which he will receive no calls between 9.30 and
   9.40. (SUJB)

4, X is a random variable having a Poisson of 100 working days would you expect
distribution given by a particular lawnmower not to be in use?
f(x) = e&™ m*/x!, x =0,1,2,... (MEI)

Prove that the mean of X is m and state


An experimenter marked out ten neigh-
the variance of X.
bouring plots of land, all of the same area,
The number of telephone calls received and examined them for the occurrence of
per minute at the switchboard of a certain a certain species of plant. The numbers,
office was logged during the period f,, of plots in which r plants were found
10a.m. to noon on a working day. The were as follows:
results were as follows:

Calls per min. (x) Op ee ie ae ee Ae DE Oe AiG,

f is the number of minutes with x calls Calculate the mean number of plants per
per minute. plot.
By consideration of the mean and variance Assuming that the plants were scattered
of this distribution show that a possible randomly with this same mean number
model is a Poisson distribution. of plants per plot, find
(a) the probability of a given plot con-
Using the calculated mean and on the
taining no plants;
assumption of a Poisson distribution
(6) the probability of at least three plots
calculate (a) the probability that two or
being found which contain no plants.
more calls were received during any one
minute, (6b) the probability that no calls What conclusions, if any, can be drawn
were received during any two consecutive from the observed number of plots in
minutes. (SUJB) which the experimenter found no plants?
(You may assume that e237 ~ 0.1) (SMP)
Customers enter an antique shop inde-
pendently of one another and at random Derive the Poisson distribution as the
intervals of time at an average rate of limiting form of the Binomial distribution
four per hour throughout the five days of when n becomes very large and p becomes
a week on which the shop is open. The very small in such a way that np remains
owner has a coffee-break of fifteen min- constant. Write down the mean and the
utes each morning; if one or more cus- variance of this distribution.
tomers arrive during this period then his The mean number of bacteria per milli-
coffee goes cold, otherwise he drinks it litre of a liquid is known to be 3. Ten
while it is hot. samples of the liquid, chosen at random
Let X be the random variable denoting and each of volume 1 ml, are examined.
the number of customers arriving during Assuming the Poisson distribution is
a Monday coffee-break, and let Y be the applicable, obtain expressions for the
random variable denoting the number of probabilities
days during a week on which the owner’s (a) that each of the ten samples contains
coffee goes cold. Assuming that X has a at least one bacterium,
Poisson distribution, determine (correct (b) that exactly eight of the samples
to three significant figures) (a) P(X = 9), contain at least one bacterium,
(b) P(X> 2), (c) E(Y), (d) P(Y = 2). (C) If 3 ml of the liquid is examined show
that it is rather improbable that it will
A hire company has two electric lawn-
contain fewer than 3 or more than 15
mowers which it hires out by the day. (MEI)
bacteria.
The number of demands per day for a
lawnmower has the form of a Poisson
distribution with mean 1.50. In a period A discrete random variable, X, can take
of 100 working days, how many times values 0,1,... and has a Poisson distribu-
do you expect tion such that the probability that X =r
(a) neither of the lawnmowers to be is em" /r!. Prove that the mean of X
is m.
used,
(b) some requests for the lawnmowers During each working day in a certain
to have to be refused? factory a number of accidents occur
If each lawnmower is to be used an equal
independently according to a Poisson
distribution with mean 0.5.
amount, on how many days in a period

Calculate the probability that (b) Also find, to 3 decimal places, the
(a) during any one day there are 2 or probability that, if two of the sample
more accidents, are chosen at random, they have birth-
(b) during two consecutive days there days in the same month. (In 1961 there
are exactly three accidents altogether. were 7 months with 31 days, 4 months
with 30 days and 1 month with 28 days.)
Out of 50 consecutive five-day weeks how (MEI)
many would you expect to be accident-
free?
13. Define the Poisson distribution and derive
Give two further situations where you its mean. State the circumstances under
would expect a Poisson distribution to which it is appropriate to use the Poisson
apply. (SUJB)
distribution as an approximation to the
binomial distribution.
10. Prove that for the Poisson distribution in
which the probability of r successes is A lottery has a very large number of
tickets, one in every 500 of which entitles
eqttm"
(r2 0) the purchaser to a prize. An agent sells
1000 tickets for the lottery. Using the
the expected number of successes is equal Poisson distribution, find, to three
tom. decimal places, the probabilities that the
number of prize-winning tickets sold by
The telephone exchange inside an office
building has a number of outside lines of the agent is (a) less than three, (b) more
which, on average, 3 are being used at any than five.
instant. Assuming that the number of Calculate the minimum number of tickets
lines in use at any instant follows a the agent must sell to have a 95% chance
Poisson distribution, find of selling at least one prize-winning ticket.
(a) the probability that, at any given (JMB)
instant, not more than 3 lines are in use,
(b) the minimum number of outside lines 14. Define the Poisson distribution and derive
required if there is to be a probability of its mean and variance.
more than 0.9 that, at any given instant, The number of telephone calls received at
at least one of the lines is not being used. a switchboard in any time interval of
(C) length T minutes has a Poisson distribution
tae The monthly demand for a certain with mean ST. The operator leaves the
magazine at a small newsagent’s shop switchboard unattended for five minutes.
has a Poisson distribution with mean 3. Calculate to three decimal places the
The newsagent always orders 4 copies probabilities that there are (a) no calls,
of the magazine for sale each month; any (b) four or more calls in her absence.
demand for the magazine in excess of 4 is
Find to three significant figures the maxi-
not met.
mum length of time in seconds for which
(a) Calculate the probability that the
the operator could be absent with a 95%
newsagent will not be able to meet the
probability of not missing a call. | (JMB)
demand in a given month.
(b) Find the most probable number of
magazines sold in one month. 15. Define the Poisson distribution and derive
(c) Find the expected number of maga- its mean and variance. :
zines sold in one month. In the first year of the life of a certain
(d) Determine the least number of copies type of machine, the number of times a
of the magazine that the newsagent maintenance engineer is required has a
should order each month so as to meet Poisson distribution with mean four. Find
the demand with a probability of at least the probability that more than four calls
0.95. (JMB) are necessary.
A random sample of 500 people born in The first call is free of charge and subse-
12.
1961 is being studied. It can be assumed quent calls cost £20 each. Find the mean
that birthdays are uniformly distributed cost of maintenance in the first year.
throughout the year. (JMB)
(a) Use the Poisson distribution to find,
to 3 decimal places, the probabilities that 16. The number of oil tankers arriving at a
there are (i) exactly two people, and port between successive high tides has a
(ii) no more than two people, with birth- Poisson distribution with mean 2. The
days on 1 January. depth of the water is such that loaded
vessels can enter the dock area only on Find the probability that in exactly half
the high tide. The port has dock space for of these 10 rooms the carpets will con-
only three tankers, which are discharged tain exactly 3 faults. (AEB 1988)
and leave the dock area before the next
tide. Only the first three loaded tankers 18. A randomly chosen doctor in general
waiting at any high tide go into the dock practice sees, on average, one case of a
area; any others must await another high broken nose per year and each case is
tide. independent of other similar cases.
(a) Regarding a month as a twelfth part
Starting from an evening high tide after of a year,
which no ships remain waiting their turn, (i) show that the probability that,
find (to three decimal places) the proba- between them, three such doctors
bilities that after the next morning’s high see no cases of a broken nose in a
tide (a) the three dock berths remain period of one month is 0.779,
empty, (0) the three berths are all filled. correct to three significant figures,
Find (to two decimal places) the proba- (ii) find the variance of the number
bility that no tankers are left waiting out- of cases seen by three such doctors
side the dock area after the following in a period of six months.
evening’s high tide. (JMB) (b) Find the probability that, between
them, three such doctors see at least
17. The random variable X has a Poisson three cases in one year.
distribution with parameter A. (c) Find the probability that, of three
(a) Prove that E(X) =A. such doctors, one sees three cases and the
(b) If P(X =k) =P(X=k+1), where other two see no cases in one year. (C)
k is some integer, show that A must also
be an integer. 19. State, giving your reasons, the distribution
(c) If is not an integer, show that the which you would expect to be appropriate
mode, m, of the distribution is such that in describing
A <n <1), (a) the number of heads in 10 throws of
a penny,
In the manufacture of commercial carpet, (b) the number of blemishes per m? of
small faults occur at random in the car- sheet metal.
pet at an average rate of 0.95 per 20 m?.
Find the probability that in a randomly A building has an automatic telephone
selected 20 m? area of this carpet exchange. The number X of wrong con-
nections in any one day is a Poisson
(d) there are no faults,
(e) there are at most 2 faults. variable with parameter A. Find, in terms
of A, the probability that in any one day
The ground floor of a new office block there will be
has 10 rooms. Each room has an area of (c) exactly 3 wrong connections,
80 m? and has been carpeted using the (d) 3 or more wrong connections.
same commercial carpet described above.
For any one of these rooms, determine Evaluate, to 3 decimal places, these
the probability that the carpet in that probabilities when A = 0.5. Find, to 3
room decimal places, the largest value of A for
(f) contains at least 2 faults, the probability of one or more wrong
connections in any day to be at most 6.
(g) contains exactly 3 faults,
(h) contains at most 5 faults.
(L)
PROBABILITY
DISTRIBUTIONS II -
CONTINUOUS RANDOM
VARIABLES
A continuous random variable (r.v.) is a theoretical representation
of a continuous variable such as height, mass or time.

PROBABILITY DENSITY FUNCTION

A continuous r.v. is specified by its probability density function (p.d.f.), written f(x).

If X is a continuous r.v. with p.d.f. f(x) valid over the range $a \le x \le b$, then

(i) $\int_{\text{all } x} f(x)\,dx = 1$, since X is a r.v.,

i.e. $\int_a^b f(x)\,dx = 1$.

The area under the curve y = f(x) between x = a and x = b is 1.

(ii) If $a \le x_1 < x_2 \le b$, then

$$P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(x)\,dx$$

$P(x_1 \le X \le x_2)$ is given by the area under the curve y = f(x) between $x = x_1$ and $x = x_2$.
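
The two properties above can be checked numerically. The following is only an illustrative sketch: it assumes the density f(x) = x/8 on 0 ≤ x ≤ 4 (the one obtained in Example 5.1), and the helper name `integrate` is hypothetical, a plain Simpson's rule rather than anything from the text.

```python
def integrate(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] by Simpson's rule (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # odd-indexed interior points get weight 4, even-indexed get weight 2
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def f(x):
    """Assumed p.d.f.: f(x) = x/8 on 0 <= x <= 4."""
    return x / 8


total_area = integrate(f, 0, 4)   # property (i): should be 1
prob = integrate(f, 1, 2.5)       # property (ii): P(1 <= X <= 2.5)
print(round(total_area, 6), round(prob, 3))
```

The first value confirms that the total area is 1; the second is the area between x = 1 and x = 2.5.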

NOTE: in an experimental approach, the area under the histogram
represents frequency. In a theoretical approach, the area under the
curve y = f(x) represents probability.

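Example 5.1 A continuous r.v. has p.d.f. f(x) where f(x) = kx, $0 \le x \le 4$.
(a) Find the value of the constant k,
(b) sketch y = f(x),
(c) find $P(1 \le X \le 2\tfrac{1}{2})$.

Solution 5.1 (a) Since X is a r.v., $\int_{\text{all } x} f(x)\,dx = 1$.

So
$$\int_0^4 kx\,dx = 1$$
$$\left[\frac{kx^2}{2}\right]_0^4 = 1$$
$$8k = 1$$
$$k = \frac{1}{8}$$

Therefore $f(x) = \frac{1}{8}x$, $0 \le x \le 4$.

(b) Sketch of y = f(x)

(c)
$$P(1 \le X \le 2\tfrac{1}{2}) = \int_1^{2.5} \frac{1}{8}x\,dx = \left[\frac{x^2}{16}\right]_1^{2.5} = 0.328 \text{ (3 S.F.)}$$

Therefore $P(1 \le X \le 2\tfrac{1}{2}) = 0.328$ (3 S.F.).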

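Example 5.2 A continuous r.v. X has p.d.f. f(x) where

$$f(x) = \begin{cases} kx & 0 \le x \le 2 \\ k(4-x) & 2 \le x \le 4 \\ 0 & \text{otherwise} \end{cases}$$

(a) Find the value of the constant k.
(b) Sketch y = f(x).
(c) Find $P(\tfrac{1}{2} \le X \le 2\tfrac{1}{2})$.

Solution 5.2 (a) Since X is a r.v., $\int_{\text{all } x} f(x)\,dx = 1$.

Therefore
$$\int_0^2 kx\,dx + \int_2^4 k(4-x)\,dx = 1$$
$$k\left[\frac{x^2}{2}\right]_0^2 + k\left[4x - \frac{x^2}{2}\right]_2^4 = 1$$
$$2k + k\{16 - 8 - (8 - 2)\} = 1$$
$$4k = 1$$
$$k = \frac{1}{4}$$

So the p.d.f. for X is given by

$$f(x) = \begin{cases} \frac{1}{4}x & 0 \le x \le 2 \\ \frac{1}{4}(4-x) & 2 \le x \le 4 \\ 0 & \text{otherwise} \end{cases}$$

(b) Sketch of y = f(x)

NOTE: this is known as a triangular distribution.

(c) We require $P(\tfrac{1}{2} \le X \le 2\tfrac{1}{2})$.

This is given by the shaded areas in the diagram.

NOTE: we find the required area in two stages:

$$P(\tfrac{1}{2} \le X \le 2\tfrac{1}{2}) = P(\tfrac{1}{2} \le X \le 2) + P(2 \le X \le 2\tfrac{1}{2})$$
$$= \int_{1/2}^{2} \frac{1}{4}x\,dx + \int_2^{2.5} \frac{1}{4}(4-x)\,dx$$
$$= \left[\frac{x^2}{8}\right]_{1/2}^{2} + \frac{1}{4}\left[4x - \frac{x^2}{2}\right]_2^{2.5}$$
$$= \frac{15}{32} + \frac{7}{32}$$
$$= \frac{11}{16}$$

Therefore $P(\tfrac{1}{2} \le X \le 2\tfrac{1}{2}) = \frac{11}{16}$.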
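
As a cross-check on Example 5.2, the areas can be computed exactly. This is a hedged sketch rather than the book's method: because f is linear on each piece, the trapezium formula gives the exact area, and the helper name `trapezium` is an assumption made for illustration.

```python
from fractions import Fraction as Fr


def f(x):
    """Triangular p.d.f. of Example 5.2 with k = 1/4."""
    if 0 <= x <= 2:
        return x / 4
    if 2 < x <= 4:
        return (4 - x) / 4
    return Fr(0)


def trapezium(a, b):
    """Exact area under f on [a, b]; valid only when f is linear on [a, b]."""
    return (f(a) + f(b)) * (b - a) / 2


# Total area: split at x = 2, where the rule defining f changes.
total = trapezium(Fr(0), Fr(2)) + trapezium(Fr(2), Fr(4))

# Required probability, again in two stages.
prob = trapezium(Fr(1, 2), Fr(2)) + trapezium(Fr(2), Fr(5, 2))
print(total, prob)
```

Splitting the range at x = 2 mirrors the two-stage calculation in the solution.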

Exercise 5a

The following distributions will be used in Exercises 5c and 5d and it will be useful to refer to these answers. Start each answer on a fresh sheet of paper so that you can add to it later.

1. The continuous r.v. X has p.d.f. f(x) where $f(x) = kx^2$, $0 \le x \le 2$.
(a) Find the value of the constant k.
(b) Sketch y = f(x).
(c) Find $P(X \ge 1)$.
(d) Find $P(0.5 \le X \le 1.5)$.

2. The continuous r.v. X has p.d.f. f(x) where $f(x) = k$, $-2 \le x \le 3$.
(a) Find the value of the constant k.
(b) Sketch y = f(x).
(c) Find $P(-1.6 \le X \le 2.1)$.

3. The continuous r.v. X has p.d.f. f(x) where $f(x) = k(4 - x)$, $1 \le x \le 3$.
(a) Find the value of the constant k.
(b) Sketch y = f(x).
(c) Find $P(1.2 \le X \le 2.4)$.

4. The continuous r.v. X has p.d.f. f(x) where $f(x) = k(x + 2)^2$, $0 \le x \le 2$.
(a) Find the value of the constant k.
(b) Sketch y = f(x).
(c) Find $P(0 \le X \le 1)$ and hence find $P(X > 1)$.

5. The continuous r.v. X has p.d.f. f(x) where $f(x) = kx^3$, $0 \le x \le c$ and $P(X \le \tfrac{1}{2}) = \tfrac{1}{16}$. Find the values of the constants c and k and sketch y = f(x).

6. The continuous r.v. X has p.d.f. f(x) where

$$f(x) = \begin{cases} k & 0 \le x \le 2 \\ k(2x - 3) & 2 \le x \le 3 \\ 0 & \text{otherwise} \end{cases}$$

(a) Find the value of the constant k.
(b) Sketch y = f(x).
(c) Find $P(X \le 1)$.
(d) Find $P(X > 2.5)$.
(e) Find $P(1 \le X \le 2.8)$.

7. The continuous r.v. X has p.d.f. f(x) where

$$f(x) = \begin{cases} k(x + 2)^2 & -2 \le x \le 0 \\ 4k & 0 \le x \le 1\tfrac{1}{3} \\ 0 & \text{otherwise} \end{cases}$$

(a) Find the value of the constant k.
(b) Sketch y = f(x).
(c) Find $P(-1 \le X \le 1)$.
(d) Find $P(X > 1)$.

EXPECTATION

If X is a continuous r.v. with p.d.f. f(x), then the expectation of X is E(X) where

$$E(X) = \int_{\text{all } x} x f(x)\,dx$$

NOTE: E(X) is often denoted by $\mu$ and referred to as the mean of X.

Example 5.3 If X is a continuous r.v. with p.d.f. $f(x) = \frac{3x^2}{64}$, $0 \le x \le 4$, find E(X).

Solution 5.3 Now
$$E(X) = \int_{\text{all } x} x f(x)\,dx = \int_0^4 x\left(\frac{3x^2}{64}\right)dx = \frac{3}{64}\left[\frac{x^4}{4}\right]_0^4 = \frac{3}{64}(64) = 3$$

Therefore E(X) = 3.

Example 5.4 If the continuous r.v. X has p.d.f. $f(x) = \frac{3}{4}(3 - x)(x - 5)$, $3 \le x \le 5$, find E(X).

Solution 5.4
$$E(X) = \int_{\text{all } x} x f(x)\,dx = \int_3^5 \frac{3}{4}x(3 - x)(x - 5)\,dx$$
$$= \frac{3}{4}\int_3^5 (8x^2 - 15x - x^3)\,dx$$
$$= \frac{3}{4}\left[\frac{8x^3}{3} - \frac{15x^2}{2} - \frac{x^4}{4}\right]_3^5$$
$$= \frac{3}{4}\left(\frac{16}{3}\right) = 4$$

Therefore E(X) = 4.

NOTE: in this example it would have been advantageous to have drawn a sketch of $y = f(x) = \frac{3}{4}(3 - x)(x - 5)$. From the sketch, we see that there is symmetry about the line x = 4.

Therefore, by inspection, E(X) = 4.

So, whenever possible, draw a sketch of y = f(x) and look for symmetry when finding E(X).
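
The symmetry argument of Example 5.4 can be confirmed numerically. A minimal sketch, assuming the density f(x) = (3/4)(3 - x)(x - 5) on 3 ≤ x ≤ 5; the `simpson` helper is hypothetical and simply implements Simpson's rule.

```python
def simpson(g, a, b, n=200):
    """Simpson's rule approximation to the integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def f(x):
    """p.d.f. of Example 5.4 on 3 <= x <= 5."""
    return 0.75 * (3 - x) * (x - 5)


area = simpson(f, 3, 5)                      # should be 1
mean = simpson(lambda x: x * f(x), 3, 5)     # E(X)

# Symmetry about x = 4: f(4 - d) equals f(4 + d) for a few sample offsets d.
symmetric = all(abs(f(4 - d) - f(4 + d)) < 1e-12 for d in (0.1, 0.5, 0.9))
print(round(area, 6), round(mean, 6), symmetric)
```

The integration and the symmetry check both point to the same mean, x = 4.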

The expectation of any function of X

If g(x) is any function of the continuous r.v. X having p.d.f. f(x), then

$$E[g(X)] = \int_{\text{all } x} g(x) f(x)\,dx$$

In particular
$$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx$$

As in the case of the discrete r.v. (see p. 181), the following results hold when X is a continuous r.v.

Result 1 E(a) = a

$$E(a) = \int_{\text{all } x} a f(x)\,dx = a\int_{\text{all } x} f(x)\,dx = a$$

Result 2 E(aX) = aE(X)

$$E(aX) = \int_{\text{all } x} ax f(x)\,dx = a\int_{\text{all } x} x f(x)\,dx = aE(X)$$

Result 3 E(aX + b) = aE(X) + b

Result 4 $E[f_1(X) + f_2(X)] = E[f_1(X)] + E[f_2(X)]$

Example 5.5 The continuous r.v. X has p.d.f. f(x) where $f(x) = \frac{1}{20}(x + 3)$, $0 \le x \le 4$.
(a) Find E(X). (b) Verify that E(2X + 5) = 2E(X) + 5.
(c) Find $E(X^2)$. (d) Find $E(X^2 + 2X - 3)$.

Solution 5.5 Sketch of y = f(x)

(a) We note from the sketch that there is no symmetry.

Now
$$E(X) = \int_{\text{all } x} x f(x)\,dx = \int_0^4 \frac{1}{20}x(x + 3)\,dx = \frac{1}{20}\int_0^4 (x^2 + 3x)\,dx$$
$$= \frac{1}{20}\left[\frac{x^3}{3} + \frac{3x^2}{2}\right]_0^4 = \frac{1}{20}\left(\frac{64}{3} + 24\right) = \frac{34}{15}$$

Therefore $E(X) = \frac{34}{15}$.

(b)
$$E(2X + 5) = \int_{\text{all } x} (2x + 5) f(x)\,dx = \int_0^4 \frac{1}{20}(2x + 5)(x + 3)\,dx = \frac{1}{20}\int_0^4 (2x^2 + 11x + 15)\,dx$$
$$= \frac{1}{20}\left[\frac{2x^3}{3} + \frac{11x^2}{2} + 15x\right]_0^4 = \frac{1}{20}\left(\frac{128}{3} + 88 + 60\right) = \frac{572}{60} = \frac{143}{15}$$

Therefore $E(2X + 5) = \frac{143}{15}$.

Now
$$2E(X) + 5 = 2\left(\frac{34}{15}\right) + 5 = \frac{143}{15}$$

So E(2X + 5) = 2E(X) + 5.

(c)
$$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx = \frac{1}{20}\int_0^4 x^2(x + 3)\,dx = \frac{1}{20}\int_0^4 (x^3 + 3x^2)\,dx$$
$$= \frac{1}{20}\left[\frac{x^4}{4} + x^3\right]_0^4 = \frac{1}{20}(64 + 64) = \frac{32}{5}$$

Therefore $E(X^2) = \frac{32}{5}$.

(d)
$$E(X^2 + 2X - 3) = E(X^2) + E(2X) - E(3) = E(X^2) + 2E(X) - 3 = \frac{32}{5} + \frac{68}{15} - 3 = \frac{119}{15}$$

Therefore $E(X^2 + 2X - 3) = \frac{119}{15}$.
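
The four expectations in Example 5.5 can be reproduced with exact fractions. This is an illustrative sketch only: it assumes f(x) = (x + 3)/20 on 0 ≤ x ≤ 4, represents each g(x) as a list of polynomial coefficients (constant term first), and the helper names `integrate_poly` and `expect` are hypothetical.

```python
from fractions import Fraction as Fr


def integrate_poly(coeffs, a, b):
    """Exact integral over [a, b] of the polynomial sum(coeffs[i] * x**i)."""
    def antiderivative(x):
        return sum(Fr(c) / (i + 1) * Fr(x) ** (i + 1) for i, c in enumerate(coeffs))
    return antiderivative(b) - antiderivative(a)


def expect(g_coeffs):
    """E[g(X)] for the p.d.f. f(x) = (x + 3)/20 on [0, 4], with g a polynomial."""
    # Multiply g(x) by (3 + x), integrate over [0, 4], then divide by 20.
    product = [Fr(0)] * (len(g_coeffs) + 1)
    for i, c in enumerate(g_coeffs):
        product[i] += 3 * Fr(c)
        product[i + 1] += Fr(c)
    return integrate_poly(product, 0, 4) / 20


e_x = expect([0, 1])           # g(x) = x
e_2x5 = expect([5, 2])         # g(x) = 2x + 5
e_x2 = expect([0, 0, 1])       # g(x) = x^2
e_d = expect([-3, 2, 1])       # g(x) = x^2 + 2x - 3
print(e_x, e_2x5, e_x2, e_d)
```

Comparing `e_2x5` with `2 * e_x + 5`, and `e_d` with `e_x2 + 2 * e_x - 3`, illustrates Results 3 and 4.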

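Example 5.6 The continuous r.v. X has p.d.f. f(x) where

$$f(x) = \begin{cases} \frac{6}{7}x & 0 \le x \le 1 \\ \frac{6}{7}x(2 - x) & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$

Find (a) E(X), (b) $E(X^2)$.

Solution 5.6 From the sketch, we see that there is no symmetry.

(a)
$$E(X) = \int_{\text{all } x} x f(x)\,dx = \int_0^1 \frac{6}{7}x^2\,dx + \int_1^2 \frac{6}{7}x^2(2 - x)\,dx$$
$$= \frac{6}{7}\left[\frac{x^3}{3}\right]_0^1 + \frac{6}{7}\left[\frac{2x^3}{3} - \frac{x^4}{4}\right]_1^2$$
$$= \frac{6}{7}\left(\frac{1}{3}\right) + \frac{6}{7}\left(\frac{16}{3} - 4 - \frac{2}{3} + \frac{1}{4}\right)$$
$$= \frac{6}{7}\left(\frac{5}{4}\right) = \frac{15}{14}$$

Therefore $E(X) = \frac{15}{14}$.

(b)
$$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx = \int_0^1 \frac{6}{7}x^3\,dx + \int_1^2 \frac{6}{7}x^3(2 - x)\,dx$$
$$= \frac{6}{7}\left[\frac{x^4}{4}\right]_0^1 + \frac{6}{7}\left[\frac{x^4}{2} - \frac{x^5}{5}\right]_1^2$$
$$= \frac{6}{7}\left(\frac{1}{4}\right) + \frac{6}{7}\left(8 - \frac{32}{5} - \frac{1}{2} + \frac{1}{5}\right)$$
$$= \frac{6}{7}\left(\frac{31}{20}\right) = \frac{93}{70}$$

Therefore $E(X^2) = \frac{93}{70}$.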

Exercise 5b

The continuous r.v. X has p.d.f. f(x) 4. The continuous r.v. X has p.d-f. f(x)
where f(x) = 3x?,0<x <1. (a) Find where f(x) = kx?, O<x <2. (a) Find
E(X). (b) Find E(X?). (c) Verify that the value of the constant k. (b) Find
E(3X—1) = 3E(X)—1. (d) Find uU=E(X). (c) Find E(8X). (d) Find
E(2X7+ 3X+ 3). E(X?—4X+ 8).
The continuous r.v. X has p.d.f. f(x) 5. The continuous r.v. X has p.d.f. f(x)
where f(x) = $x(2—<x), 0x 2. where
(a) Find E(X). (b) Find E(X?). 3 2SX < re,

The continuous r.v. X has p.d.f. f(x) f(x) = 3x(4—x) 2<5x%<4


where f(x) = 4(6—x), 0 <x <6. Find 0 ‘ otherwise
(a) E(X), (b) E(2X—1), (c) E(X”), (a) Find E(X)- (b) Find E(5X— 2).
(d) E(X*—4X+ 8). (c) Find E(X*).
6. The continuous r.v. X has p.d.f. f(x) to function. If two new batteries are put
where in the torch, what is the probability that
the torch will function for at least 22
kx Osx
<1
hours, on the assumption that the life-
k 1s<<3 times of the batteries are independent?
8XSx <4 (0 &C)
0 otherwise
10. A random variable X has a probability
density function f given by
(a) Find k. (b) Calculate E(X). (c) Now
ex(5—x) ONxS5
sketch y = f(x). (d) Check E(X) from x —

the sketch. (e) Find E(X’). ix) 0 otherwise


Show that c = 6/125 and find the mean
In a game a wooden block is propelled
of X.
with a stick across a flat deck. On each
attempt the distance, x metre, reached by The lifetime X in years of an electric
the block lies between 0 and 10 m, and light bulb has this distribution. Given that
the variation is modelled by the proba- a lamp standard is fitted with two such
bility density function new bulbs and that their failures are
independent, find the probability that
g(x) = 0.0012x7(10—x). neither bulb fails in the first year and the
Calculate the mean distance reached by probability that exactly one bulb fails
the block. (SMP) within two years. (MET)

The continuous random variable X has ae The mass X kg of a particular substance


the probability density function f given produced per hour in a chemical process
by f(x) = kx, 5 <x < 10, f(x) =0 is a continuous random variable whose
otherwise. probability density function is given by
(a) Find the value of k. (6) Find the f(x) = 3x7/32 0<x<2
expected value of X. (c) Find the proba-
f(x). = 8(6—x)/82 25x <6
bility that X > 8.
f(x) = 0 otherwise
The annual income from money invested
in a Unit Trust Fund is X per cent of the (a) Find the mean mass produced per
amount invested, where X has the above hour.
distribution. Suppose that you have a (b) The substance produced is sold at £2
sum of money to invest and that you are per kg and the total running cost of the
prepared to leave the money invested over process is £1 per hour. Find the expected
a period of several years. profit per hour and the probability that in
an hour the profit will exceed £7. (JMB)
State, with your reasons, whether you
would invest in the Unit Trust Fund or in 12. A continuous random variable X has the
a Money Bond offering a guaranteed probability density function f defined by
annual income of 8 per cent on the money
invested. (JMB) f(x) = “3 0<x<3
The lifetime X in tens of hours of a torch f(x) =e 3<x<4
battery is a random variable with proba-
bility density function {(x) = 0 otherwise

_ j8a—@—-2)) Lee <3, where c is a positive constant. Find .


(i) the value of c,
Ae 0 otherwise
(ii) the mean of X,
Calculate the mean of X. (iii) the value, a, for there to be a proba-
A torch runs on two batteries, both of bility of 0.85 that a randomly observed
value of X will exceed a. (JMB)
which have to be working for the torch
Pes aa pbancenl gevidtiong of Xe

VARIANCE
For a random variable X,

$$\mathrm{Var}(X) = E(X - \mu)^2 \quad \text{where } \mu = E(X)$$

As in the discrete case (see p. 183) the formula can be written:

$$\mathrm{Var}(X) = E(X^2) - E^2(X)$$

or

$$\mathrm{Var}(X) = E(X^2) - \mu^2$$

As in the discrete case (see p. 186), the following results also hold when X is a continuous r.v., where a and b are constants:

Var(a) = 0
$\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$
$\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$

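Example 5.7 The continuous r.v. X has p.d.f. f(x) where $f(x) = \frac{1}{8}x$, $0 \le x \le 4$.
Find (a) E(X), (b) $E(X^2)$, (c) Var(X), (d) the standard deviation $\sigma$ of X, (e) Var(3X + 2).

Solution 5.7 From the sketch of y = f(x) we note that there is no symmetry.

(a)
$$E(X) = \int_{\text{all } x} x f(x)\,dx = \int_0^4 \frac{1}{8}x^2\,dx = \frac{1}{8}\left[\frac{x^3}{3}\right]_0^4 = \frac{8}{3}$$

Therefore $E(X) = \frac{8}{3}$.

(b)
$$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx = \int_0^4 \frac{1}{8}x^3\,dx = \frac{1}{8}\left[\frac{x^4}{4}\right]_0^4 = 8$$

Therefore $E(X^2) = 8$.

(c)
$$\mathrm{Var}(X) = E(X^2) - E^2(X) = 8 - \left(\frac{8}{3}\right)^2 = \frac{8}{9}$$

Therefore $\mathrm{Var}(X) = \frac{8}{9}$.

(d) Standard deviation,
$$\sigma = \sqrt{\mathrm{Var}(X)} = \sqrt{\frac{8}{9}} = \frac{2\sqrt{2}}{3}$$

Therefore $\sigma = \frac{2\sqrt{2}}{3}$.

(e)
$$\mathrm{Var}(3X + 2) = 9\,\mathrm{Var}(X) = 9\left(\frac{8}{9}\right) = 8$$

Therefore Var(3X + 2) = 8.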
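
The results of Example 5.7 can be checked numerically, including a direct calculation of Var(3X + 2) that does not rely on the rule Var(aX + b) = a²Var(X). A hedged sketch, assuming f(x) = x/8 on 0 ≤ x ≤ 4; `simpson` and `expect` are hypothetical helper names.

```python
import math


def simpson(g, a, b, n=200):
    """Simpson's rule approximation to the integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def f(x):
    """p.d.f. of Example 5.7 on 0 <= x <= 4."""
    return x / 8


def expect(g):
    """E[g(X)] for X with p.d.f. f on [0, 4]."""
    return simpson(lambda x: g(x) * f(x), 0, 4)


mean = expect(lambda x: x)
var = expect(lambda x: x ** 2) - mean ** 2
sd = math.sqrt(var)

# Variance of Y = 3X + 2 computed from the definition E[(Y - E(Y))^2].
mean_y = expect(lambda x: 3 * x + 2)
var_y = expect(lambda x: (3 * x + 2 - mean_y) ** 2)
print(round(mean, 4), round(var, 4), round(sd, 4), round(var_y, 4))
```

The directly computed `var_y` agrees with 9 times `var`, as part (e) states.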

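Example 5.8 The continuous random variable X has p.d.f. given by f(x) where

$$f(x) = \begin{cases} \frac{1}{27}x^2 & 0 \le x \le 3 \\ \frac{1}{3} & 3 \le x \le 5 \\ 0 & \text{otherwise} \end{cases}$$

(a) Sketch y = f(x). (b) Find E(X). (c) Find $E(X^2)$. (d) Find the standard deviation $\sigma$ of X.

Solution 5.8 (a) Sketch of y = f(x)

NOTE: there is no symmetry.

(b)
$$E(X) = \int_{\text{all } x} x f(x)\,dx = \int_0^3 \frac{1}{27}x^3\,dx + \int_3^5 \frac{1}{3}x\,dx$$
$$= \left[\frac{x^4}{108}\right]_0^3 + \left[\frac{x^2}{6}\right]_3^5 = \frac{3}{4} + \frac{8}{3} = \frac{41}{12}$$

Therefore $E(X) = \frac{41}{12}$.

(c)
$$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx = \int_0^3 \frac{1}{27}x^4\,dx + \int_3^5 \frac{1}{3}x^2\,dx$$
$$= \left[\frac{x^5}{135}\right]_0^3 + \left[\frac{x^3}{9}\right]_3^5 = \frac{9}{5} + \frac{98}{9} = \frac{571}{45}$$

Therefore $E(X^2) = \frac{571}{45}$.

(d)
$$\sigma^2 = \mathrm{Var}(X) = E(X^2) - E^2(X) = \frac{571}{45} - \left(\frac{41}{12}\right)^2 = 1.015 \text{ (3 d.p.)}$$

Therefore $\sigma = 1.008$ (3 d.p.)
The standard deviation of X is 1.008 (3 d.p.)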

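Example 5.9 The continuous r.v. X has p.d.f. f(x) where $f(x) = \frac{3}{4}(1 + x^2)$, $0 \le x \le 1$. If $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$, find $P(|X - \mu| < \sigma)$.

Solution 5.9 From the sketch of $y = f(x) = \frac{3}{4}(1 + x^2)$ we see that there is no symmetry.

We will need to find $\mu = E(X)$.

$$\mu = E(X) = \int_0^1 \frac{3}{4}x(1 + x^2)\,dx = \frac{3}{4}\int_0^1 (x + x^3)\,dx = \frac{3}{4}\left[\frac{x^2}{2} + \frac{x^4}{4}\right]_0^1 = \frac{3}{4}\left(\frac{3}{4}\right) = \frac{9}{16}$$

Therefore $\mu = \frac{9}{16} = 0.5625$.

To find $\sigma^2 = \mathrm{Var}(X)$, first consider $E(X^2)$.

$$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx = \int_0^1 \frac{3}{4}x^2(1 + x^2)\,dx = \frac{3}{4}\int_0^1 (x^2 + x^4)\,dx = \frac{3}{4}\left[\frac{x^3}{3} + \frac{x^5}{5}\right]_0^1 = \frac{2}{5}$$

So
$$\mathrm{Var}(X) = \frac{2}{5} - \left(\frac{9}{16}\right)^2 = 0.0836 \text{ (3 S.F.)}$$

and $\sigma = \sqrt{\mathrm{Var}(X)} = 0.289$ (3 S.F.)

We require
$$P(|X - \mu| < \sigma) = P(-\sigma < X - \mu < \sigma)$$
$$= P(\mu - \sigma < X < \mu + \sigma)$$
$$= P(0.5625 - 0.289 < X < 0.5625 + 0.289)$$
$$= P(0.2735 < X < 0.8515)$$
$$= \int_{0.2735}^{0.8515} \frac{3}{4}(1 + x^2)\,dx$$
$$= \frac{3}{4}\left[x + \frac{x^3}{3}\right]_{0.2735}^{0.8515}$$
$$= \frac{3}{4}\left[\left(0.8515 + \frac{(0.8515)^3}{3}\right) - \left(0.2735 + \frac{(0.2735)^3}{3}\right)\right]$$
$$= 0.583 \text{ (3 S.F.)}$$

Therefore $P(|X - \mu| < \sigma) = 0.583$ (3 S.F.).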
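
The chain of calculations in Example 5.9 (mean, variance, then the probability of lying within one standard deviation of the mean) can be sketched as follows. This assumes f(x) = (3/4)(1 + x²) on 0 ≤ x ≤ 1; the `simpson` helper is hypothetical, and the code uses the unrounded value of σ, so intermediate figures differ very slightly from the hand working.

```python
import math


def simpson(g, a, b, n=1000):
    """Simpson's rule approximation to the integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def f(x):
    """p.d.f. of Example 5.9 on 0 <= x <= 1."""
    return 0.75 * (1 + x ** 2)


mu = simpson(lambda x: x * f(x), 0, 1)
var = simpson(lambda x: x ** 2 * f(x), 0, 1) - mu ** 2
sigma = math.sqrt(var)

# P(|X - mu| < sigma) is the area under f between mu - sigma and mu + sigma.
prob = simpson(f, mu - sigma, mu + sigma)
print(round(mu, 4), round(var, 4), round(sigma, 3), round(prob, 3))
```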

THE MODE
The mode is the value of X for which f(x) is greatest, in the given
range of X. It is usually necessary to draw a sketch of y = f(x) and
this will give an idea of the location of the mode.
For some probability density functions it is possible to determine
the mode by finding the maximum point on the curve y = f(x)
from the relationship $f'(x) = 0$, where $f'(x) = \frac{d}{dx} f(x)$.

Example 5.10 The continuous r.v. X has p.d.f. f(x) where $f(x) = \frac{3}{80}(2 + x)(4 - x)$, $0 \le x \le 4$. (a) Sketch y = f(x). (b) Find the mode.

Solution 5.10 (a) Sketch of $y = f(x) = \frac{3}{80}(2 + x)(4 - x)$.

(b)
$$f(x) = \frac{3}{80}(2 + x)(4 - x) = \frac{3}{80}(8 + 2x - x^2)$$
$$f'(x) = \frac{3}{80}(2 - 2x)$$

Now $f'(x) = 0$ when 2 - 2x = 0, i.e. x = 1.

To check that this is a maximum, consider $f''(x) = \frac{3}{80}(-2) < 0$,

i.e. $f''(x)$ is negative for all values of x, therefore there is a maximum at x = 1.
So the mode is 1.
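
When f'(x) = 0 is awkward to solve, the mode can be located approximately by evaluating f on a fine grid. The sketch below is an illustration under assumptions, not the book's method: it uses the density of Example 5.10 and a hypothetical grid of step 0.001 across 0 ≤ x ≤ 4.

```python
def f(x):
    """p.d.f. of Example 5.10 on 0 <= x <= 4."""
    return 3 / 80 * (2 + x) * (4 - x)


# Evaluate f at x = 0, 0.001, 0.002, ..., 4 and keep the x giving the largest value.
grid = [i / 1000 for i in range(4001)]
mode = max(grid, key=f)
print(mode)
```

A grid search only finds the mode to the resolution of the grid, so it complements rather than replaces the calculus argument.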

Exercise 5c

For each of the following probability density 1


6. re=|, 4 <x<
OSxS2
functions of the continuous r.v. X, find
(a) E(X), (b) E(X7), (c) Var(X), (d) the 4(26—=3) 25553
standard deviation of X. It is assumed that
the value of the function is zero outside A(n+2)? —2<x<0
the range(s) stated. Do not forget to look for oe ee '
3 0<x<15
symmetry when considering EH(X).
8. A continuous r.v. X has p.d.f. f(x) = kx’,
NOTE: These functions were given in Exercise 0<Sx <4,
5a and you may wish to refer to your previous (a) Find the value of k, and sketch
sketches. You will need them again for Exer- y = f(x).
cise 5d. (b) Find E(X) and Var(X).
(c) Find? P< 22).
ite =3 3.2 <Sx< X has p.d.f. f(x) where
po a = 9. A continuous r.v.

2. f(x)=% eee Gg kx 0<x<1


ee ekex) eee
3. f(x) =4(4—2) 1<x<3 fx) ( aos
0 otherwise

4. f(x) = (x +2)" 0O<x <2 Find (a) the value of the constant Re,
(b) E(X), (c) Var(X), (d) P(a@<X<13),
5. = 4x?
f(x) O<x<1 (e) the mode.
eee eee

CUMULATIVE DISTRIBUTION FUNCTION, F(x)

When considering a frequency distribution the corresponding cumulative frequencies were obtained by summing all the frequencies up to a particular value.

In the same way, if X is a continuous random variable with p.d.f. f(x) defined for $a \le x \le b$, then the cumulative distribution function is given by F(t) where

$$F(t) = P(X \le t) = \int_a^t f(x)\,dx$$

NOTE: (1) $F(b) = \int_a^b f(x)\,dx = 1$.

(2) If f(x) is valid for $-\infty < x < \infty$ then

$$F(t) = \int_{-\infty}^t f(x)\,dx,$$

where the interval is taken over all values of $x \le t$.

(3) The cumulative distribution function is sometimes known just as the distribution function.

USING F(x) TO FIND $P(x_1 \le X \le x_2)$

The cumulative distribution function can be used to find $P(x_1 \le X \le x_2)$ as follows:

$$P(X \le x_1) = F(x_1)$$
$$P(X \le x_2) = F(x_2)$$

So
$$P(x_1 \le X \le x_2) = F(x_2) - F(x_1)$$

The median
The median splits the area under the curve y = f(x) into two halves.
So, if the value of the median is m,

$$\int_a^m f(x)\,dx = 0.5$$

i.e. F(m) = 0.5

Example 5.11 If X is a continuous r.v. with p.d.f. $f(x) = \frac{1}{8}x$, $0 \le x \le 4$,
(a) find the cumulative distribution function F(x) and sketch y = F(x), (b) find the median m, (c) find $P(0.3 \le X \le 1.8)$.

Solution 5.11 (a) Now
$$F(t) = \int_0^t \frac{1}{8}x\,dx = \left[\frac{x^2}{16}\right]_0^t = \frac{t^2}{16}$$

So that $F(t) = \frac{t^2}{16}$, $0 \le t \le 4$.

NOTE: (1) $F(4) = \frac{4^2}{16} = 1$ (as expected).

(2) F(t) = 1, $t \ge 4$.

(3) we usually calculate the answer in terms of t and then write out the cumulative distribution in terms of x as follows:

$$F(x) = \begin{cases} 0 & x \le 0 \\ \frac{x^2}{16} & 0 \le x \le 4 \\ 1 & x \ge 4 \end{cases}$$

Sketch of y = F(x)

(b) For the median m,

$$F(m) = 0.5$$

i.e.
$$\frac{m^2}{16} = 0.5$$
$$m^2 = 8$$
$$m = 2.83 \text{ (2 d.p.)}$$

The median m = 2.83 (2 d.p.).

NOTE: $0 \le m \le 4$, so m cannot be negative.

(c) $P(0.3 \le X \le 1.8) = F(1.8) - F(0.3)$.

Now
$$F(1.8) = \frac{1.8^2}{16} = 0.2025$$

and
$$F(0.3) = \frac{0.3^2}{16} = 0.005\,625$$

Therefore
$$P(0.3 \le X \le 1.8) = 0.2025 - 0.005\,625 = 0.196\,875 = 0.197 \text{ (3 d.p.)}$$

So $P(0.3 \le X \le 1.8) = 0.197$ (3 d.p.).
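
Once F(x) is known, both the median and interval probabilities follow directly from it. The sketch below assumes the distribution function F(x) = x²/16 on 0 ≤ x ≤ 4 from Example 5.11; the bisection routine `median` is a hypothetical helper, shown because it also works when F(m) = 0.5 cannot be solved algebraically.

```python
def F(x):
    """Cumulative distribution function of Example 5.11."""
    if x <= 0:
        return 0.0
    if x >= 4:
        return 1.0
    return x * x / 16


def median(cdf, lo, hi, tol=1e-10):
    """Solve cdf(m) = 0.5 by bisection, assuming cdf is increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid      # the median lies to the right of mid
        else:
            hi = mid      # the median lies at or to the left of mid
    return (lo + hi) / 2


m = median(F, 0, 4)
prob = F(1.8) - F(0.3)    # P(0.3 <= X <= 1.8)
print(round(m, 2), round(prob, 3))
```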

Example 5.12 X is a continuous r.v. with p.d.f. f(x) where

$$f(x) = \begin{cases} \frac{x}{3} & 0 \le x \le 2 \\ -\frac{2x}{3} + 2 & 2 \le x \le 3 \\ 0 & \text{otherwise} \end{cases}$$

(a) Sketch y = f(x). (b) Find the cumulative distribution function F(x). (c) Sketch y = F(x). (d) Find $P(1 \le X \le 2.5)$. (e) Find the median, m.

Solution 5.12 (a) Sketch of y = f(x).

(b) Now $F(t) = \int_0^t f(x)\,dx$.

But, as f(x) is given in two parts, we must find F(x) in two stages:

Consider $0 \le x \le 2$:

$$F(t) = \int_0^t \frac{x}{3}\,dx = \left[\frac{x^2}{6}\right]_0^t = \frac{t^2}{6}$$

So, for $0 \le x \le 2$, $F(x) = \frac{x^2}{6}$.

NOTE: $F(2) = \frac{4}{6} = \frac{2}{3}$.

Now, for t in the range $2 \le t \le 3$ we see from the diagram that

F(t) = F(2) + (area under the curve $y = -\frac{2x}{3} + 2$ between 2 and t)

So
$$F(t) = F(2) + \int_2^t \left(-\frac{2x}{3} + 2\right)dx$$
$$= \frac{2}{3} + \left[-\frac{x^2}{3} + 2x\right]_2^t$$
$$= \frac{2}{3} + \left(-\frac{t^2}{3} + 2t\right) - \left(-\frac{4}{3} + 4\right)$$
$$= -\frac{t^2}{3} + 2t - 2, \quad 2 \le t \le 3$$

Now check the value of F(3):

$$F(3) = -\frac{9}{3} + 6 - 2 = 1, \text{ as required}$$

For any value of $t \ge 3$, F(t) = 1.

Writing the answer in terms of x, we have

$$F(x) = \begin{cases} \frac{x^2}{6} & 0 \le x \le 2 \\ -\frac{x^2}{3} + 2x - 2 & 2 \le x \le 3 \\ 1 & x \ge 3 \end{cases}$$

(c) Sketch of y = F(x).

(d) $P(1 \le X \le 2.5) = F(2.5) - F(1)$.

Now
$$F(2.5) = -\frac{x^2}{3} + 2x - 2 \quad \text{(as 2.5 is in the range } 2 \le x \le 3)$$
$$= -\frac{(2.5)^2}{3} + 2(2.5) - 2 = \frac{11}{12}$$

$$F(1) = \frac{x^2}{6} \quad \text{(as 1 is in the range } 0 \le x \le 2)$$
$$= \frac{1}{6}$$

Therefore
$$P(1 \le X \le 2.5) = F(2.5) - F(1) = \frac{11}{12} - \frac{1}{6} = 0.75$$

So $P(1 \le X \le 2.5) = 0.75$.

(e) F(m) = 0.5, where m is the median.

Now $F(2) = \frac{2}{3}$, so the median must lie in the range $0 \le x \le 2$.

Therefore $F(m) = \frac{m^2}{6}$.

So
$$\frac{m^2}{6} = 0.5$$
$$m^2 = 3$$
$$m = 1.73 \text{ (2 d.p.)}$$

The median m is 1.73 (2 d.p.).

Example 5.13 A continuous r.v. X takes values in the range $0 \le x \le 1$ and has p.d.f.

$$f(x) = \begin{cases} 3.75x + 0.1 & 0 \le x \le 0.4 \\ 1.6 & 0.4 \le x \le 0.6 \\ 3.85 - 3.75x & 0.6 \le x \le 1 \end{cases}$$

(a) Sketch y = f(x). (b) Find the mean $\mu$. (c) Find the cumulative distribution function, F(x) and sketch y = F(x). (d) Find $P(|X - \mu| \le 0.2)$.

Solution 5.13 (a) Sketch of y = f(x), showing the line of symmetry.

(b) By symmetry, $\mu = E(X) = 0.5$.

(c) We must consider F(x) in three stages:

$0 \le t \le 0.4$

$$F(t) = \int_0^t (3.75x + 0.1)\,dx = \left[3.75\frac{x^2}{2} + 0.1x\right]_0^t = 1.875t^2 + 0.1t$$

Now
$$F(0.4) = (1.875)(0.4)^2 + (0.1)(0.4) = 0.34$$

$0.4 \le t \le 0.6$

$$F(t) = F(0.4) + \int_{0.4}^t 1.6\,dx = F(0.4) + [1.6x]_{0.4}^t = 0.34 + 1.6t - 0.64 = 1.6t - 0.3$$

Now
$$F(0.6) = (1.6)(0.6) - 0.3 = 0.66$$

$0.6 \le t \le 1$

$$F(t) = F(0.6) + \int_{0.6}^t (3.85 - 3.75x)\,dx = F(0.6) + \left[3.85x - 3.75\frac{x^2}{2}\right]_{0.6}^t$$
$$= 0.66 + 3.85t - 1.875t^2 - 2.31 + 0.675$$
$$= 3.85t - 1.875t^2 - 0.975$$

Check F(1):
$$F(1) = 3.85 - 1.875 - 0.975 = 1 \text{ as required}$$

Writing the answer in terms of x we have:

$$F(x) = \begin{cases} 0 & x \le 0 \\ 1.875x^2 + 0.1x & 0 \le x \le 0.4 \\ 1.6x - 0.3 & 0.4 \le x \le 0.6 \\ 3.85x - 1.875x^2 - 0.975 & 0.6 \le x \le 1 \\ 1 & x \ge 1 \end{cases}$$

Sketch of y = F(x).

(d)
$$P(|X - \mu| \le 0.2) = P(-0.2 \le X - \mu \le 0.2) = P(-0.2 \le X - 0.5 \le 0.2) = P(0.3 \le X \le 0.7) = F(0.7) - F(0.3)$$

Now 0.7 lies in the range $0.6 \le x \le 1$, so

$$F(0.7) = 3.85x - 1.875x^2 - 0.975 = (3.85)(0.7) - (1.875)(0.7)^2 - 0.975 = 0.801\,25$$

0.3 lies in the range $0 \le x \le 0.4$, so

$$F(0.3) = 1.875x^2 + 0.1x = (1.875)(0.3)^2 + (0.1)(0.3) = 0.198\,75$$

Therefore
$$P(0.3 \le X \le 0.7) = F(0.7) - F(0.3) = 0.801\,25 - 0.198\,75 = 0.6025$$

So $P(|X - \mu| \le 0.2) = 0.6025$.
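
A piecewise distribution function such as the one in Example 5.13 translates naturally into a function with one branch per range. This is a minimal sketch, assuming the three formulae derived in the solution; it checks the values at the joins and then evaluates the required probability.

```python
def F(x):
    """Cumulative distribution function of Example 5.13, one branch per range."""
    if x < 0:
        return 0.0
    if x <= 0.4:
        return 1.875 * x ** 2 + 0.1 * x
    if x <= 0.6:
        return 1.6 * x - 0.3
    if x <= 1:
        return 3.85 * x - 1.875 * x ** 2 - 0.975
    return 1.0


# Values at the joins and at the upper end of the range.
joins = (F(0.4), F(0.6), F(1))

# P(|X - 0.5| <= 0.2) = P(0.3 <= X <= 0.7) = F(0.7) - F(0.3)
prob = F(0.7) - F(0.3)
print([round(v, 4) for v in joins], round(prob, 4))
```

Checking that each branch gives the same value where two ranges meet is a quick way to catch an arithmetic slip in the constants.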

Exercise 5d

For each of the following probability density 0, (c) the cumulative distribution function
functions of the continuous r.v. X, find F(x), (d) the median, m.
(a) the cumulative distribution function F(x),
and for questions 1, 2, 5 and 7 find also 10. The continuous r.v. X has continuous
(b) the median, m. p.d.f. f(x) where

NOTE: These are the functions used in 2 9<x<3


IX IX
Exercise 5a and 5c. 3
1 f(x) = 3x? 0<xS<2 <x < 5
2. f(x)=% —2<x%<3 < x< 6
3. f(x) =4(4—*x) 1<x<3 otherwise

4. fix) =%(xt2) O0<x<2 Find (a) wand B, (b) F(x) and sketch
y = F(x), (ec) P(2<X S3.5),
Se f(x) = 4x? O0<x<1 (d) P(X > 5.5), (e) E(X), (f) Var(X).
1
6. 4;
f(x) -| 0O<x<2
SxeS 11. The continuous r.v. X has probability
density function given by
4(2x 3) 25553
k
a(x+2)?> —2<x<0 in) = for 1<x <Q,
0 otherwise,
2 0<x<13
where k is a constant. Giving your
8. The continuous r.v. X has p.d.f. f(x) = 4, answers correct to three significant
0<x <3. Find (a) E(X), (b) Var(X), figures where appropriate, find
(c) F(x) and sketch y = F(x), (a) the value of k, and find also the
(d) P(X > 1.8), (e) P(1.1<X<1.7). median value of X,
(b) the mean and variance of X,
9. X isthe continuousr.v. with p.d.f. f(x) = kx?, (c) the cumulative distribution function,
1<x <2. Find (a) the constant k and F, of X, and sketch the graph of
sketch y = f(x), (b) the standard deviation y = F(x): (C)

12. The continuous r.v. X has probability places, the median and the interquartile
density function f given by range of the distribution (L)P
k(4—x") for OS x <2. 14. Define the probability density function
Oj 0 otherwise, f(x) and the distribution function F(x)
where F is a constant. Show that k = % of a continuous random variable X.
and find the values of E(X) and Var(X). A factory is supplied with flour at the
Find the cumulative distribution function beginning of each week. The weekly
of X, and verify by calculation that the demand, X thousand tonnes, for flour
median value of X is between 0.69 and from this factory is a continuous random
0.70. variable having the probability density
Find also P(0.69 < X <0.70), giving function
your answer correct to one significant fix) = k(1—x)*, O<x<1,
figure. (C) f(x) = 0, elsewhere.
Find
13. A continuous random variable X has (a) the value of k,
probability density function, f, defined (b) the mean value of X,
by (c) the variance of X, to 3 decimal
f(x) ll 5, 0<x <1, places.

x 3 Sketch the probability density function.


f(x) Pte = 2, Find, to the nearest tonne, the quantity
5
of flour that the factory should have in
f(x) 0, otherwise. stock at the beginning of a week in order
Obtain the distribution function and that there is a probability of 0.98 that
hence, or otherwise, find, to 3 decimal the demand in that week will be met. (L)
pee ne

OBTAINING THE p.d.f. FROM THE CUMULATIVE DISTRIBUTION

The probability density function can be obtained from the cumulative distribution function as follows.

Now $F(t) = \int_{-\infty}^{t} f(x)\,dx$

So $f(x) = \dfrac{d}{dx}F(x) = F'(x)$

NOTE: the gradient of the F(x) curve gives the value of f(x).

[Figure: sketches of the cumulative distribution function y = F(x) and the probability density function y = f(x). Where the F(x) curve is shallow (small gradient) the value of f is small; where it is steep (large gradient) the value of f is large.]

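Example 5.14 The continuous r.v. X has cumulative distribution function F(x) where

$$F(x) = \begin{cases} 0 & x \le 0 \\ \dfrac{x^3}{27} & 0 \le x \le 3 \\ 1 & x \ge 3 \end{cases}$$

Find the p.d.f. of X, f(x), and sketch y = f(x).

Solution 5.14 $f(x) = \dfrac{d}{dx}F(x) = \dfrac{d}{dx}\left(\dfrac{x^3}{27}\right) = \dfrac{x^2}{9}$ for $0 \le x \le 3$

The p.d.f. for X is f(x) where

$$f(x) = \begin{cases} \dfrac{x^2}{9} & 0 \le x \le 3 \\ 0 & \text{otherwise} \end{cases}$$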
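The statement that the gradient of the F(x) curve gives f(x) can be illustrated numerically. The sketch below assumes the cumulative distribution function of Example 5.14 is F(x) = x³/27 on 0 ≤ x ≤ 3, so that f(x) = x²/9; the helper name `pdf_from_cdf` and the step size `h` are choices made here, not part of the text.

```python
def F(x):
    """Assumed c.d.f. of Example 5.14: x**3 / 27 on [0, 3]."""
    if x <= 0:
        return 0.0
    if x >= 3:
        return 1.0
    return x**3 / 27

def pdf_from_cdf(cdf, x, h=1e-5):
    """Central-difference estimate of the gradient F'(x)."""
    return (cdf(x + h) - cdf(x - h)) / (2 * h)

# Compare the estimated gradient with x**2 / 9 at a few interior points.
for x in (0.5, 1.5, 2.5):
    print(x, round(pdf_from_cdf(F, x), 6), round(x**2 / 9, 6))
```

A central difference is used because it is accurate away from the joins at x = 0 and x = 3; at the joins themselves the gradient may differ on each side.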

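Example 5.15 The continuous r.v. X has cumulative distribution function F(x) where

$$F(x) = \begin{cases} 0 & x \le -2 \\ \frac{1}{12}(2 + x) & -2 \le x \le 0 \\ \frac{1}{6}(1 + x) & 0 \le x \le 4 \\ \frac{1}{12}(6 + x) & 4 \le x \le 6 \\ 1 & x \ge 6 \end{cases}$$

Find the p.d.f. of X, f(x), and sketch y = f(x).

Solution 5.15 Now $f(x) = \dfrac{d}{dx}F(x)$,

So, for $-2 \le x \le 0$, $f(x) = \dfrac{d}{dx}\,\dfrac{1}{12}(2 + x) = \dfrac{1}{12}$

for $0 \le x \le 4$, $f(x) = \dfrac{d}{dx}\,\dfrac{1}{6}(1 + x) = \dfrac{1}{6}$

for $4 \le x \le 6$, $f(x) = \dfrac{d}{dx}\,\dfrac{1}{12}(6 + x) = \dfrac{1}{12}$

[Figure: sketch of y = f(x), a step function of height 1/12 on −2 ≤ x ≤ 0, 1/6 on 0 ≤ x ≤ 4 and 1/12 on 4 ≤ x ≤ 6.]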

Example 5.16 The cumulative distribution function F(x) for a continuous r.v. X is defined as

$$F(x) = \begin{cases} 0 & x \le 0 \\ \frac{1}{2}x - \frac{1}{8}x^2 & 0 \le x \le 1 \\ a + \frac{1}{4}x & 1 \le x \le 2 \\ b + \frac{1}{8}x^2 - \frac{1}{4}x & 2 \le x \le 3 \\ 1 & x \ge 3 \end{cases}$$

(a) Find a and b and sketch F(x). (b) Find and sketch the p.d.f. f(x). (c) Find the mean μ and the standard deviation σ.

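Solution 5.16 (a) Now F(3) = 1, as F(x) = 1 when $x \ge 3$.

But $F(x) = b + \frac{1}{8}x^2 - \frac{1}{4}x$ for $2 \le x \le 3$

Therefore $F(3) = b + \frac{9}{8} - \frac{3}{4} = b + \frac{3}{8}$

So $b + \frac{3}{8} = 1$

$b = \frac{5}{8}$

So we have, for $2 \le x \le 3$, $F(x) = \frac{5}{8} + \frac{1}{8}x^2 - \frac{1}{4}x$

Therefore $F(2) = \frac{5}{8} + \frac{4}{8} - \frac{2}{4} = \frac{5}{8}$

But for the range $1 \le x \le 2$

$F(x) = a + \frac{1}{4}x$

$F(2) = a + \frac{1}{2}$

So we have $a + \frac{1}{2} = \frac{5}{8}$

Therefore $a = \frac{1}{8}$

So the cumulative distribution function F(x) is as follows:

$$F(x) = \begin{cases} 0 & x \le 0 \\ \frac{1}{2}x - \frac{1}{8}x^2 & 0 \le x \le 1 \\ \frac{1}{8} + \frac{1}{4}x & 1 \le x \le 2 \\ \frac{5}{8} + \frac{1}{8}x^2 - \frac{1}{4}x & 2 \le x \le 3 \\ 1 & x \ge 3 \end{cases}$$

[Figure: sketch of y = F(x).]

(b) $f(x) = \dfrac{d}{dx}F(x)$

Therefore

$$f(x) = \begin{cases} 0 & x < 0 \\ \frac{1}{2} - \frac{1}{4}x & 0 \le x \le 1 \\ \frac{1}{4} & 1 \le x \le 2 \\ \frac{1}{4}x - \frac{1}{4} & 2 \le x \le 3 \\ 0 & x > 3 \end{cases}$$

[Figure: sketch of y = f(x).]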
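(c) By symmetry, E(X) = μ = 1.5.

$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx$

$= \int_0^1 \left(\frac{1}{2}x^2 - \frac{1}{4}x^3\right)dx + \int_1^2 \frac{1}{4}x^2\,dx + \int_2^3 \left(\frac{1}{4}x^3 - \frac{1}{4}x^2\right)dx$

$= \left[\frac{x^3}{6} - \frac{x^4}{16}\right]_0^1 + \left[\frac{x^3}{12}\right]_1^2 + \left[\frac{x^4}{16} - \frac{x^3}{12}\right]_2^3$

$= \frac{19}{6}$

Now $\text{Var}(X) = E(X^2) - E^2(X)$

$= \frac{19}{6} - \left(\frac{3}{2}\right)^2$

$= \frac{11}{12}$

The standard deviation

$\sigma = \sqrt{\text{Var}(X)} = \sqrt{\frac{11}{12}} = 0.957$ (3 S.F.)

Therefore the mean μ = 1.5 and the standard deviation σ = 0.957 (3 S.F.).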
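The values in part (c) can be checked numerically. The sketch below assumes the p.d.f. found in part (b) is f(x) = 1/2 − x/4 on [0, 1], 1/4 on [1, 2] and x/4 − 1/4 on [2, 3]; the helper names `simpson` and `moment` are illustrative. Simpson's rule is exact for cubics, so integrating piece by piece reproduces the exact moments up to rounding.

```python
from math import sqrt

def f(x):
    """Assumed p.d.f. from part (b) of Example 5.16."""
    if 0 <= x <= 1:
        return 0.5 - x / 4
    if 1 < x <= 2:
        return 0.25
    if 2 < x <= 3:
        return x / 4 - 0.25
    return 0.0

def simpson(g, a, b, n=200):
    """Composite Simpson's rule with n (even) strips."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def moment(k):
    """E(X**k), integrating over each piece of the p.d.f. separately."""
    pieces = [(0, 1), (1, 2), (2, 3)]
    return sum(simpson(lambda x: x**k * f(x), lo, hi) for lo, hi in pieces)

mean = moment(1)
variance = moment(2) - mean**2
print(round(moment(0), 6), round(mean, 6), round(variance, 6), round(sqrt(variance), 3))
```

The first printed value is the total area under f, which should be 1 for a valid p.d.f.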

Example 5.17 A continuous random variable X takes values in the interval 0 to 3. It is given that $P(X > x) = a + bx^3$, $0 \le x \le 3$.

(a) Find the values of the constants a and b.

(b) Find the cumulative distribution function F(x).

(c) Find the probability density function f(x).

(d) Show that E(X) = 2.25.

(e) Find the standard deviation.

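Solution 5.17 (a) $P(X > x) = a + bx^3$, $0 \le x \le 3$.

So $P(X > 0) = 1$ and $P(X > 3) = 0$

i.e. $a + b(0) = 1$ and $a + b(27) = 0$

Therefore $a = 1$, and $1 + 27b = 0$

$b = -\dfrac{1}{27}$

So $P(X > x) = 1 - \dfrac{x^3}{27}$, $0 \le x \le 3$

(b) Now $F(x) = P(X \le x) = 1 - P(X > x) = \dfrac{x^3}{27}$

i.e.

$$F(x) = \begin{cases} 0 & x \le 0 \\ \dfrac{x^3}{27} & 0 \le x \le 3 \\ 1 & x \ge 3 \end{cases}$$

(c) $f(x) = \dfrac{d}{dx}F(x) = \dfrac{3x^2}{27} = \dfrac{x^2}{9}$

Therefore the p.d.f. of X is f(x) where $f(x) = \dfrac{x^2}{9}$, $0 \le x \le 3$.

(d) $E(X) = \int_{\text{all } x} x f(x)\,dx = \int_0^3 \dfrac{x^3}{9}\,dx = \left[\dfrac{x^4}{36}\right]_0^3 = 2.25$

Therefore E(X) = 2.25, as required.

(e) $\text{Var}(X) = \int_{\text{all } x} x^2 f(x)\,dx - E^2(X)$

$= \int_0^3 \dfrac{x^4}{9}\,dx - (2.25)^2$

$= \left[\dfrac{x^5}{45}\right]_0^3 - 5.0625$

$= 5.4 - 5.0625$

$= 0.3375$

So $\sigma = \sqrt{0.3375} = 0.581$ (3 S.F.)

The standard deviation of X is 0.581 (3 S.F.).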
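As a check on Solution 5.17, the constants and moments can be computed exactly with Python's `fractions` module. This is a sketch under the assumption that P(X > x) = a + bx³ on 0 ≤ x ≤ 3, so that f(x) = x²/9; the function name `moment` is illustrative.

```python
from fractions import Fraction
from math import sqrt

# P(X > x) = a + b*x**3 on 0 <= x <= 3
a = Fraction(1)      # from P(X > 0) = 1
b = -a / 3**3        # from P(X > 3) = a + 27b = 0

def moment(k):
    """E(X**k) = integral of x**(k+2) / 9 over [0, 3], evaluated exactly."""
    return Fraction(3 ** (k + 3), 9 * (k + 3))

mean = moment(1)
variance = moment(2) - mean**2
print(b, mean, variance, round(sqrt(variance), 3))
```

Working in exact fractions avoids any rounding until the final square root.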

Exercise 5e

The continuous r.v. X has cumulative The length X of an offcut of wooden


distribution function F(x) where planking is a random variable which can
take any value up to 0.5m. It is known
0 x <0 that the probability of the length being
not more than x metres (0 <x < 0.5) is
Sianae equal to kx. Determine
F(x) =) s (a) the value of k,
e 1<x<2 (b) the probability density function of X,
(c) the expected value of X,
1 x22 (d) the standard deviation of X (correct to
3 significant figures). (C)
Find (a) the value of k, (b) the p.d-f. f(x)
and sketch it, (c) the mean yW, (d) the
standard deviation 0, (e) P(| X—ulSo).
A continuous random variable X takes
The continuous r.v. X has cumulative values in the interval 0 to 4. The proba-
distribution function F(x) where bility that X takes a value greater than x
is equal to ax?+ B, (O<x <4).
0 x1 (a) Determine the values of @ and £.
ae oD (b) Determine the probability density
ae 1<x<3 function f(x) of X.
12
(c) Show that the expected value yu of X
Boe 14x — x?— 25)
Sse 24
ee ge, is 3.
(d) Show that the standard deviation 0 of
1 x27 X is 3/2.
(e) Show that the probability that
(a) Find and sketch f(x). (0) Find E(X)
and Var(X). (c) Find the median m. (u—0) <X<(uto)is$V2 (C)
(d) Find P(2.8 < X < 5.2).

A random variable X has cumulative The continuous random variable X has


(distribution) function F(x) where (cumulative) distribution function given
0 x<—1 by
ax+ a —-1<x<0 (1+x)/8 (-1<x<0)
F(x) = 2ax + a
oe
0<x<1 F(x) ={\(1+3x)/8 (0<x<2)
(5+x)/8 (2<x <8)
38a 1<x

Determine
with F(x) = 0 for x <—1, and F(x) =1
(a) the value of a, for x >'3..
(a) Sketch the graph of the probability
(b) the frequency function f(x) of X,
density function f(x).
(c) the expected value py of XxX,
standard deviation 0 of X, , (b) Determine the expectation of X and
(d) the
the variance of X.
(e) the probability that |X— p| exceeds 5. (C)
(C) (c) Determine P(3 < 2X < 5).
Ce tas
EE EE
THE RECTANGULAR DISTRIBUTION

A continuous r.v. X having p.d.f. f(x) where

$f(x) = \dfrac{1}{b - a}$ for $a \le x \le b$

where a and b are constants, is said to follow a rectangular (or uniform) distribution.

a and b are the parameters of the distribution.

If X is distributed in this way, we write

$X \sim R(a, b)$

[Figure: graph of y = f(x), a rectangle of height 1/(b − a) between x = a and x = b.]

X is a random variable, since

$\int_a^b \dfrac{1}{b - a}\,dx = 1$

This can be seen more easily from the diagram.

For a rectangular distribution, the probability of the variable lying in one particular range of length l (say) is exactly the same as the probability that it lies in another range of the same length l.

In each case, the probability is $\dfrac{l}{b - a}$.
For example, if X ~ R(1, 6) then $f(x) = \frac{1}{5}$

and

$P(1.8 \le X \le 1.9) = 0.1 \times \frac{1}{5} = 0.02$

$P(5.1 \le X \le 5.2) = 0.1 \times \frac{1}{5} = 0.02$

[Figure: graph of y = f(x) for X ~ R(1, 6), with the strips from 1.8 to 1.9 and from 5.1 to 5.2 shaded.]
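The equal-length property can be illustrated with a small Python sketch. The function name `uniform_prob` is chosen here for illustration; it simply multiplies the length of the range (clipped to the interval from a to b) by the constant height 1/(b − a).

```python
def uniform_prob(a, b, lo, hi):
    """P(lo <= X <= hi) for X ~ R(a, b), clipping the range to [a, b]."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0) / (b - a)

# Two ranges of the same length 0.1 inside (1, 6) have the same probability.
p1 = uniform_prob(1, 6, 1.8, 1.9)
p2 = uniform_prob(1, 6, 5.1, 5.2)
print(round(p1, 4), round(p2, 4))
```

Clipping matters only when part of the range falls outside the interval; for example, `uniform_prob(1, 6, 5.5, 7)` uses only the part from 5.5 to 6.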

Example 5.18 The rounding error made when measuring the lengths of metal rods
to the nearest 5 mm is a random variable E. What is the distribution
of E?

Solution 5.18 The error is the difference between the true length and the recorded
length after rounding to the nearest 5mm.

Suppose we have recorded a length to be 75 mm, to the nearest 5 mm. Now the true length l could have been any length in the interval

$72.5\text{ mm} \le l < 77.5\text{ mm}$

So, the error, E, could be anywhere in the interval — 2.5 < E < 2.5.

All points in this interval are equally likely ‘stopping places’ for E,
so E is uniformly distributed in the interval.

We write E ~ R(—2.5, 2.5)

Example 5.19 A child spins a ‘Spinning Jenny’ at a fair. When the wheel stops, the
shorter distance of an arrow measured along the circumference
from the child is denoted by C. What is the distribution of C?

Solution 5.19 All the points on the circumference are equally likely stopping places for the arrow, so C is uniformly distributed between 0 (when the arrow is next to the child) and πr (when the arrow is diametrically opposite the child), where r is the radius of the wheel.

So $C \sim R(0, \pi r)$

[Figure: the wheel, showing the child, the arrow and the distance C measured along the circumference.]

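EXPECTATION AND VARIANCE

If X ~ R(a, b) then

$E(X) = \frac{1}{2}(a + b)$

$\text{Var}(X) = \frac{1}{12}(b - a)^2$

[Figure: graph of y = f(x), symmetrical about x = ½(a + b).]

By symmetry, $E(X) = \frac{1}{2}(a + b)$.

Now $\text{Var}(X) = E(X^2) - E^2(X)$

where $E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx$

$= \int_a^b \dfrac{x^2}{b - a}\,dx$

$= \dfrac{b^3 - a^3}{3(b - a)}$

$= \dfrac{(b - a)(b^2 + ab + a^2)}{3(b - a)}$

$= \dfrac{b^2 + ab + a^2}{3}$

So $\text{Var}(X) = E(X^2) - E^2(X)$

$= \dfrac{b^2 + ab + a^2}{3} - \dfrac{a^2 + 2ab + b^2}{4}$

$= \dfrac{1}{12}\{4(b^2 + ab + a^2) - 3(a^2 + 2ab + b^2)\}$

$= \dfrac{1}{12}(b^2 - 2ab + a^2)$

$= \dfrac{1}{12}(b - a)^2$

Therefore $E(X) = \frac{1}{2}(a + b)$ and $\text{Var}(X) = \frac{1}{12}(b - a)^2$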

Example 5.20 The random variable X has a rectangular distribution over the interval (0, 8). Find the probability density function of Y where $Y = X^{1/3}$.

Solution 5.20 The p.d.f. of X is $f(x) = \frac{1}{8}$, $0 \le x \le 8$.

Now, since X is a random variable,

$1 = \int_0^8 \frac{1}{8}\,dx$

Now, we require some function of y, g(y) say, such that

$1 = \int_{y_1}^{y_2} g(y)\,dy$

g(y) then gives the p.d.f. of Y.

Now $x^{1/3} = y$, so that

$\frac{1}{3}x^{-2/3}\,dx = dy$

i.e. $dx = 3x^{2/3}\,dy = 3y^2\,dy$

Also, when x = 0, y = 0 and when x = 8, y = 2. So, the first integral becomes

$1 = \int_0^2 \frac{1}{8}\,3y^2\,dy$

i.e. $1 = \int_0^2 \dfrac{3y^2}{8}\,dy$

Thus, the p.d.f. of Y is $g(y) = \frac{3}{8}y^2$, $0 \le y \le 2$.
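The result of Example 5.20 can be cross-checked through the cumulative distribution function. The sketch below assumes X ~ R(0, 8) and Y = X^(1/3), so that P(Y ≤ y) = P(X ≤ y³) = y³/8 for 0 ≤ y ≤ 2; the names `g` and `G` and the step `h` are illustrative choices. The gradient of G should agree with the claimed p.d.f. g(y) = 3y²/8.

```python
def g(y):
    """Claimed p.d.f. of Y = X**(1/3) when X ~ R(0, 8)."""
    return 3 * y**2 / 8 if 0 <= y <= 2 else 0.0

def G(y):
    """P(Y <= y) = P(X <= y**3) = y**3 / 8 for 0 <= y <= 2."""
    return min(max(y, 0.0), 2.0) ** 3 / 8

h = 1e-5  # step for the central-difference gradient
for y in (0.5, 1.0, 1.5):
    slope = (G(y + h) - G(y - h)) / (2 * h)
    print(y, round(slope, 6), round(g(y), 6))
```

This route (find the c.d.f. of Y, then differentiate) is an alternative to the substitution used in the solution and gives the same p.d.f.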

Example 5.21 A rectangle, with one side of length x cm and perimeter 12 cm, has area A cm². If X is uniformly distributed between 0 and 2, find the probability density function of A.

Solution 5.21 Let the p.d.f. of A be f(a).

One side of the rectangle has length x cm. If the perimeter is 12 cm, then the length of the other side is (6 − x) cm.

Now $a = x(6 - x)$

So $x^2 - 6x = -a$

$(x - 3)^2 = 9 - a$

$x - 3 = \pm\sqrt{9 - a}$

Since $0 \le x \le 2$, we have $x - 3 < 0$, so $x = 3 - \sqrt{9 - a}$ and

$dx = \dfrac{da}{2\sqrt{9 - a}}$

We are given that X is uniformly distributed between 0 and 2, therefore the p.d.f. of X is $f(x) = \frac{1}{2}$, $0 \le x \le 2$, and

$1 = \int_0^2 \frac{1}{2}\,dx$

and when x = 0, a = 0; when x = 2, a = 8.

Therefore

$1 = \int_0^8 \frac{1}{2} \cdot \dfrac{da}{2\sqrt{9 - a}} = \int_0^8 \dfrac{da}{4\sqrt{9 - a}}$

so A is distributed with p.d.f. $f(a) = \dfrac{1}{4\sqrt{9 - a}}$, $0 \le a \le 8$.

Exercise 5f

aD If the continuous
X ~ R(3,6) find
r.v. X is such
(a) the p.df.
(b) E(X), (c) Var(X), (d) P(X> 5).
that
of X,
Show that the mean is (b+ a)/2, and the
variance is (b—a)?/ 12 for this distribu-
tion.

If the continuous r.v. X has p.d.f. f(x)


Given that the mean equals 1 and the
where f(x) =k and X ~ R(— 5,— 2) variance equals 4/3 find
(i) P(X <0),
find (a) the value of the constant k,
(ii) the value of z such that
(b) P(—4.38 < X <— 2.8), (c) E(X),
(d) Var(X). P(X >z+0,)
=},
The continuous r.v. X has p.d.f. f(x) as where 0, is the standard deviation of X.
shown in the diagram: (AEB 1979)

y y =f(x) A rectangle of area A square metres has a


perimeter of 20 metres and a side of
length X metres, where X is uniformly
distributed between 0 and 2. Show that
the probability density function of A is
1 x

(0<A<16)
Find (a) the value of k, 4\/(25—A)
(b) P(2.1<X< 3.4), Find the mean and variance of A. (JMB)
(c) E(X), A child rides on a roundabout and his
(d) Var(X).
father waits for him at the point where
The length X of a side of a square is he started. His journey may be regarded
rectangularly distributed between 1 and as a circular route of radius six metres and
4. Find the probability density function the father’s position as a fixed point on
of A, the area of the square, and calculate the circle. When the roundabout stops,
the mean and variance of the area of the the shorter distance of the child from the
square. father, measured alone the circular path,
The radius of a circle follows a rectangular is S metres. All points on the circle are
distribution between 1 and 3. Find the equally likely stopping points, so that S
probability density function of A, the is uniformly distributed between 0 and
area of the circle and calculate the mean 67. Find the mean and variance of S.
and the variance of the area of the circle. The direct linear distance of the child’s
The random variable X has probability stopping point from the father is D
density function given by metres. Show that the probability density
2
1 function of D is. =. for D

(b—a)
aXSx<b where b>a m/(144— D?)
f(x) = between 0 and 12 and zero outside this
0 otherwise range.
The father’s voice can be heard at a Han5
distance of up to ten metres. Find to two
decimal places the probability that the
(v= fy
child can hear his father shout to him and state the range of corresponding
when the roundabout stops. (JMB) values for V. Obtain the mean and median
of V. (C)
10. The line y + 2x = R crosses the coordinate
-» The object distance U and the image axes Ox and Oy at P and Qrespectively.
distance V for a concave mirror are related Given that the area of AOPQ is A, show
to the focal distance f by the formula that A = k?/4.
pet A random variable takes values k such
that 0 <k S65 and is rectangularly dis-
Umar. i tributed in this interval.
U is a random variable uniformly distri- (a) Show that the expected value of A
buted over the interval (2f, 3f). Show that is 25/12.
V is distributed with probability density (b) Calculate the variance of A.
function (L Additional)

THE EXPONENTIAL DISTRIBUTION

A continuous r.v. X having p.d.f. f(x) where

$f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$,

where λ is a positive constant, is said to follow an exponential distribution.

λ is the parameter of the distribution.

[Figure: graph of y = f(x), decreasing from λ at x = 0 towards 0 as x increases.]

Now X is a random variable, since

$\int_0^\infty \lambda e^{-\lambda x}\,dx = \left[-e^{-\lambda x}\right]_0^\infty$

$= 1$ (since $\lim_{x \to \infty} e^{-\lambda x} = 0$)

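EXPECTATION AND VARIANCE

Now $E(X) = \int_{\text{all } x} x f(x)\,dx$

$= \int_0^\infty x(\lambda e^{-\lambda x})\,dx$

$= \left[x(-e^{-\lambda x})\right]_0^\infty - \int_0^\infty (-e^{-\lambda x})\,dx$

$= 0 + \int_0^\infty e^{-\lambda x}\,dx$ (since $\lim_{x \to \infty} x e^{-\lambda x} = 0$)

$= \left[-\dfrac{1}{\lambda}e^{-\lambda x}\right]_0^\infty$

$= \dfrac{1}{\lambda}$

To find Var(X) consider first E(X²).

Now $E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx$

$= \int_0^\infty x^2(\lambda e^{-\lambda x})\,dx$

$= \left[x^2(-e^{-\lambda x})\right]_0^\infty - \int_0^\infty 2x(-e^{-\lambda x})\,dx$

$= 0 + 2\int_0^\infty x e^{-\lambda x}\,dx$ (since $\lim_{x \to \infty} x^2 e^{-\lambda x} = 0$)

$= \dfrac{2}{\lambda^2}$ (since $\int_0^\infty \lambda x e^{-\lambda x}\,dx = \dfrac{1}{\lambda}$)

$\text{Var}(X) = E(X^2) - E^2(X)$

$= \dfrac{2}{\lambda^2} - \dfrac{1}{\lambda^2}$

$= \dfrac{1}{\lambda^2}$

Therefore $E(X) = \dfrac{1}{\lambda}$ and $\text{Var}(X) = \dfrac{1}{\lambda^2}$.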

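If the continuous r.v. X has p.d.f. $f(x) = \lambda e^{-\lambda x}$, $x \ge 0$, then

$P(X > a) = e^{-\lambda a}$

$P(X > a + b \mid X > a) = P(X > b)$.

To show this, consider

$P(X > a) = \int_a^\infty \lambda e^{-\lambda x}\,dx$

$= \left[-e^{-\lambda x}\right]_a^\infty$

$= e^{-\lambda a}$

Also $P(X > a + b \mid X > a) = \dfrac{P(X > a + b \cap X > a)}{P(X > a)}$

$= \dfrac{e^{-\lambda(a + b)}}{e^{-\lambda a}}$

$= e^{-\lambda b}$

$= P(X > b)$.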
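The second result (often called the 'memoryless' property, a name not used in the text) can be checked numerically. In the sketch below the function names and the values of λ, a and b are illustrative assumptions; the conditional probability is computed directly from its definition and compared with P(X > b).

```python
from math import exp, isclose

def survival(x, lam):
    """P(X > x) = exp(-lam * x) for x >= 0."""
    return exp(-lam * x)

def conditional_survival(a, b, lam):
    """P(X > a + b | X > a), from the definition of conditional probability."""
    return survival(a + b, lam) / survival(a, lam)

lam, a, b = 0.001, 1300, 200  # illustrative values only
lhs = conditional_survival(a, b, lam)
rhs = survival(b, lam)
print(round(lhs, 6), round(rhs, 6), isclose(lhs, rhs, rel_tol=1e-12))
```

Changing a while keeping b fixed leaves the conditional probability unchanged, which is exactly what the result states.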

Example 5.22 The lifetime in years of a television tube of a certain make is a random variable T and its probability density function f(t) is given by

$$f(t) = \begin{cases} Ae^{-kt} & \text{for } 0 \le t < \infty \quad (k > 0) \\ 0 & \text{elsewhere} \end{cases}$$

Obtain A in terms of k.

(a) If the manufacturer, after some research, finds that out of 1000 such tubes 371 failed within the first two years of use, estimate the value of k.

(b) Using this value of k correct to 3 significant figures, calculate the mean and variance of T, giving answers correct to 2 significant figures ($t^r e^{-kt} = 0$ when $t = \infty$ for finite r).

(c) If two such tubes are bought, what is the probability that one fails within its first year and the other lasts longer than six years? (SUJB)

Solution 5.22 Since T is a random variable, $\int_{\text{all } t} f(t)\,dt = 1$.

So $\int_0^\infty Ae^{-kt}\,dt = 1$

$A\left[-\dfrac{1}{k}e^{-kt}\right]_0^\infty = 1$

$\dfrac{A}{k} = 1$ (since $e^{-kt} \to 0$ as $t \to \infty$)

$A = k$

So $f(t) = ke^{-kt}$, $0 \le t < \infty$. This is an exponential distribution.

(a) $P(T < 2) = 0.371$.

i.e. $\int_0^2 ke^{-kt}\,dt = 0.371$

$\left[-e^{-kt}\right]_0^2 = 0.371$

$-e^{-2k} + e^0 = 0.371$

$e^{-2k} = 0.629$

So $e^{2k} = \dfrac{1}{0.629}$

Taking logs to base e, $2k = \ln\dfrac{1}{0.629} = 0.464$ (3 S.F.)

Therefore we estimate that k = 0.232 (3 S.F.).

(b) $E(T) = \dfrac{1}{k}$ (see p. 308)

$= \dfrac{1}{0.232}$

$= 4.3$ (2 S.F.)

Therefore E(T) = 4.3 years (2 S.F.).

Now $\text{Var}(T) = \dfrac{1}{k^2} = \dfrac{1}{(0.232)^2} = 19$ (2 S.F.)

So, putting k = 0.232, we have Var(T) = 19 years² (2 S.F.).

(c) $P(T < 1) = \int_0^1 ke^{-kt}\,dt$

$= \left[-e^{-kt}\right]_0^1$

$= -e^{-k} + 1$

$= 1 - 0.793$

$= 0.207$ (3 S.F.)

and $P(T > 6) = 1 - P(T < 6)$

$= 1 - \int_0^6 ke^{-kt}\,dt$

$= 1 - \left[-e^{-kt}\right]_0^6$

$= 1 + e^{-6k} - 1$

$= 0.249$ (3 S.F.)

Therefore, if two tubes are bought,

$P[(T_1 < 1) \cap (T_2 > 6)] + P[(T_2 < 1) \cap (T_1 > 6)] = 2(0.207)(0.249) = 0.103$ (3 S.F.)

Therefore the probability that one fails within its first year and the other lasts longer than 6 years is 0.103 (3 S.F.).
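The calculations in Solution 5.22 can be reproduced with a short Python sketch. The variable names are illustrative; k is rounded to 3 significant figures before it is used, as the question requires.

```python
from math import exp, log

p_fail_2yr = 371 / 1000               # observed proportion failing within 2 years
k_exact = -log(1 - p_fail_2yr) / 2    # from 1 - e^(-2k) = 0.371
k = round(k_exact, 3)                 # value correct to 3 S.F.

mean = 1 / k                          # E(T) = 1/k
variance = 1 / k**2                   # Var(T) = 1/k^2

p_first_year = 1 - exp(-k)            # P(T < 1)
p_beyond_six = exp(-6 * k)            # P(T > 6)
# Either tube may be the one that fails early, hence the factor 2.
p_one_each = 2 * p_first_year * p_beyond_six

print(k, round(mean, 1), round(variance), round(p_one_each, 3))
```

The sketch carries full precision through part (c) rather than rounding the two probabilities first, and the answer still agrees with the text to 3 significant figures.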

Example 5.23 The continuous random variable X has the negative exponential distribution whose probability density function is given by

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0, \\ 0 & \text{otherwise,} \end{cases}$$

where λ is a positive constant. Obtain expressions, in terms of λ, for

(a) the mean, E(X), of the distribution,

(b) F(x), the (cumulative) distribution function.

Television sets are hired out by a rental company. The time in months, X, between major repairs has the above negative exponential distribution with λ = 0.05. Find, to 3 significant figures, the probability that a television set hired out by the company will not require a major repair for at least a 2-year period. Find also the median value of X.

The company agrees to replace any set for which the time between
major repairs is less than M months. Given that the company does
not want to have to replace more than one set in 5, fin (L)

Solution 5.23 $f(x) = \lambda e^{-\lambda x}$, $x \ge 0$

(a) $E(X) = \dfrac{1}{\lambda}$ (see page 308)

(b) $F(t) = \int_0^t \lambda e^{-\lambda x}\,dx$, $t \ge 0$

$= \left[-e^{-\lambda x}\right]_0^t$

$= -(e^{-\lambda t} - 1)$

$= 1 - e^{-\lambda t}$

Therefore $F(x) = 1 - e^{-\lambda x}$, $x \ge 0$.

Let X be the r.v. 'the time, in months, between major repairs'.

$f(x) = 0.05e^{-0.05x}$

$P(X > 24) = 1 - F(24)$

$= e^{-0.05(24)}$

$= e^{-1.2}$

$= 0.301$ (3 S.F.)

The probability that a television set will not need major repair in a 2-year period is 0.301 (3 S.F.).

Let m be the median value, then

$F(m) = 0.5$

So $1 - e^{-\lambda m} = 0.5$

$e^{-\lambda m} = 0.5$

$-\lambda m = \ln 0.5$

$m = -\dfrac{1}{0.05}\ln 0.5$

$= 13.9$ months (3 S.F.)

The median is 13.9 months (3 S.F.).

We require $P(X < M) \le 0.2$

Therefore $1 - e^{-0.05M} \le 0.2$

$e^{-0.05M} \ge 0.8$

$-0.05M \ge \ln 0.8$

$M \le -\dfrac{\ln 0.8}{0.05}$

$M \le 4.46$

Since M is an integer, M = 4

The company agrees to replace any set for which the time between
major repairs is less than 4 months.
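The three numerical answers in Solution 5.23 can be reproduced as follows. This is a sketch with illustrative names; it assumes, as the solution does, that M is to be taken as a whole number of months.

```python
from math import exp, floor, log

lam = 0.05  # parameter of the distribution (time measured in months)

def F(x):
    """Cumulative distribution function 1 - e^(-lam * x) for x >= 0."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0

p_no_repair_2yr = 1 - F(24)   # P(X > 24)
median = log(2) / lam         # solves F(m) = 0.5
m_bound = -log(0.8) / lam     # largest M with P(X < M) <= 0.2
M = floor(m_bound)            # whole number of months

print(round(p_no_repair_2yr, 3), round(median, 1), round(m_bound, 2), M)
```

Note that `log(2) / lam` is the same quantity as −(1/λ) ln 0.5 in the solution, since ln 0.5 = −ln 2.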

Example 5.24 The lifetime of a particular type of lightbulb has a negative exponen-
tial distribution with mean lifetime 1000 hours.

(a) Find the probability that a bulb is still working after 1300
hours.

(b) Given that it is still working after 1300 hours, find the prob-
ability that it is still working after 1500 hours.

Solution 5.24 Let X be the r.v. 'the lifetime of a lightbulb in hours', then

$f(x) = \lambda e^{-\lambda x}$, $x \ge 0$.

Now $E(X) = \dfrac{1}{\lambda}$ and E(X) = 1000,

therefore $\dfrac{1}{\lambda} = 1000$

$\lambda = 0.001$

So $f(x) = 0.001e^{-0.001x}$, $x \ge 0$

(a) $P(X > a) = e^{-\lambda a}$ (see p. 309)

so $P(X > 1300) = e^{-0.001(1300)}$

$= e^{-1.3}$

$= 0.273$ (3 S.F.)

The probability that a bulb is still working after 1300 hours is 0.273 (3 S.F.).

(b) $P(X > 1500 \mid X > 1300) = P(X > 200)$ (see p. 309)

$= e^{-0.001(200)}$

$= e^{-0.2}$

$= 0.819$ (3 S.F.)

The probability that the bulb is still working after 1500 hours, given that it is still working after 1300 hours, is 0.819 (3 S.F.).

LINK BETWEEN THE EXPONENTIAL DISTRIBUTION AND THE POISSON DISTRIBUTION

The exponential distribution can be regarded as the distribution of the 'waiting time' between events following a Poisson distribution. This is illustrated as follows:

Cars arrive at a petrol station at an average rate of λ per minute. Let X be the r.v. 'the number of cars arriving in t minutes'. Assuming that the number of cars arriving follows a Poisson distribution and we 'expect' λt to arrive in t minutes, then X ~ Po(λt).

Now $P(X = 0) = e^{-\lambda t}$

and $P(X \ge 1) = 1 - e^{-\lambda t}$

So P(at least one car arrives) $= 1 - e^{-\lambda t}$

Now let T be the r.v. 'the length of time before a car arrives'.

We have P(waiting time is less than t minutes) = P(at least one car arrives in t minutes)

So $P(T < t) = 1 - e^{-\lambda t}$

Therefore $F(t) = 1 - e^{-\lambda t}$ (cumulative distribution function)

Now $f(t) = \dfrac{d}{dt}F(t) = \lambda e^{-\lambda t}$ (probability density function)

This is the exponential distribution, with parameter λ.

NOTE: the parameter λ is the same value as the respective Poisson parameter, and the units of time are the same in both distributions.

Example 5.25 On a stretch of road, breakdowns occur at an average rate of 2 per day, and the number of breakdowns follows a Poisson distribution. Find

(a) the mean time between breakdowns,

(b) the median time between breakdowns.

Solution 5.25 Let T be the r.v. 'the time between breakdowns'. Then T follows an exponential distribution with parameter 2 where

$f(t) = 2e^{-2t}$, $t \ge 0$, and t is in days

(a) $E(T) = \dfrac{1}{\lambda} = \dfrac{1}{2}$

The mean time between breakdowns is half a day.

(b) The median is m, where

$0.5 = \int_0^m 2e^{-2t}\,dt$

$= \left[-e^{-2t}\right]_0^m$

$= 1 - e^{-2m}$

Therefore

$e^{-2m} = 0.5$

$-2m = \ln 0.5$ (taking logs to base e)

$m = -\frac{1}{2}\ln 0.5$

$= 0.3466$ days (4 d.p.)

$= 8$ hours (approx.)

Therefore the median time between breakdowns is approximately 8 hours.

Exercise 5g
A continuous r.v. X has p.d.f. f(x) where other lasts for less than the mean number
f(x) = 5e *, x 20. Find (a) P(X > 0.5), of hours.
(b) E(X), (c) P(X < E(X)), (d) the (d) A random sample of6 lightbulbs is
standard deviation of X, (e) the median, chosen. Find the probability that exactly
(f) the mode. 4 will each last more than 2500 hours.

The lifetime, in thousands of hours, of A batch of high-power light bulbs is such


Extralight lightbulbs follows an that the probability that any bulb fails
exponential distribution with p.d-f. before x hours, when kept on continu-
f(x) = 0.5e7-°-*, ously, is F(x) = 1—e7*/!9, (x > 0). Find
(a) Find the mean lifetime. (a) the median time to failure,
(b) A bulb is selected at random. Find (b) the density function of the distribu-
the probability that it lasts (i) more than tion of the time to failure,
2500 hours, (ii) less than 1800 hours. (c) the mean and the variance of the
(c) Two lightbulbs are selected at random. distribution
Find the probability that one lasts more (d) the probability that a bulb will fail
than the mean number of hours and the between five and ten hours. (O)

The lifetime T, in years, of articles Show that A =1/a and that the mean
and variance of T are a anda’ respec-
produced by a manufacturer can be
tively.
modelled by the probability density
function given by
(You may assume that | t"e—*/4 dt
fit) = ae @-t 20, 0
f(t) = 0, t<0. =a"+t1n! for integral values of n.)
1 The life in hours of a type of electric
Prove that the mean of T is a and its battery can be modelled by the above
Pee tine
median is ——. distribution and when a sample of 800
a is tested the mean life is found to be
The articles are produced at a unit cost 92.2h. What are the values of A anda
of £10 and sold for £25. Research shows based on this figure?
that 50% of those produced fail within (a) What is the probability that a battery
the first five years of life. Find the value will last for at least 200 h?
of a. (b) If a battery has lasted 200 h what is
the probability that it will last for at
After some time in business the manu-
least a further 100 h?
facturer decides to guarantee free replace-
(c) If two batteries are bought what is
ment of items which fail during their first
the probability that one fails before 200h
year, but at the same time he raises the and the other after 200 h? (SUJB)
price so that the increase covers the expec-
ted cost of providing the guarantee. What
should the new price be? The random variable X can take all values
If two items are purchased what is the between 0 and a inclusive, where a > 0.
probability that just one will be replaced Its probability density function f(x) is
under guarantee? (SUJB) zero for x <0 and x >a, and, for
0 <x Sa, satisfies
Describe the conditions under which it is f(x) = (A/a)exp(— x/a),
appropriate to use the Exponential Distri- where A is a positive constant. Show by
bution, supporting your answer with integration that A = 1.582 to 3 decimal
reference to an experiment you may have places.
carried out. Also use integration to find to 2 decimal
A major road construction project is places
underway. In the site supervisor’s office, (i) the probability that X is less than 3a;
there is an average of two telephone calls (ii) the number A for which there is a
every 5 minutes. Stating any assumptions probability 5 that X is less than Aa. (MEI)
you make, write down the probability
that in a period of ¢ minutes there is
(a) no telephone call, Explain briefly, from your projects if
(b) at least one telephone call. possible, a real-life situation that can be
Presenting a carefully reasoned argument, modelled by an exponential distribution.
give the cumulative distribution function, An archer shoots arrows at a target. The
F(t), for the length of time between tele- distance X cm from the centre of the
phone calls. Hence establish that the prob- target at which an arrow strikes the
ability density function, f(t), is target has probability density function,
iit = Uden ow peO f, defined by

Calculate f(x) = we 2? x>0.


(c) the mean time between calls, f(x) = 0 otherwise.
(d) the median time between calls. An arrow scores 8 points if X < 2,5
Given that the supervisor has had no call points if 2<X <5, one point if
in the last 3 minutes, what is the proba- 5 <X<15 and no points otherwise.
bility that he could leave the office for 5 Find, to 3 decimal places, the expected
minutes without missing a call? (O) score when one arrow is shot at the
target. (L)
A continuous random variable T has a
negative exponential distribution given by
Find the mean of the random variable X
f(t) = Ae *" te 0 which has an exponential distribution
= 0 elsewhere. with probability density function

f(x) = Ae** for


x >0 where A>0 10. A random variable X has the probability
f(x) = 0 for
x <0 density function f given by

For people suffering from a mental ill- f(x) = ce * x>0


ness, the time in days from the end of a f(x) = 0 otherwise.
treatment to the occurrence of renewed Find the value of c. Find also the mean
symptoms is an exponential random and the variance of X.
variable with parameter \ > 0. Find,
in terms of Ad and ¢, the probability that [You may assume that
neither of two randomly chosen sufferers
| ‘xe 2% dx = i.
from the illness will show renewed symp- 0
toms for a time ¢ days after a treatment.
Find the distribution function of X.
Given that two patients have no renewed
Hence, or otherwise, show that, for
symptoms for a time t days after a treat- positive ¢ and k,
ment, find, in terms of ) and t, the prob-
ability that both will remain free of symp- P(X>t+k|X>k) = P(X>t)
toms for a further ¢ days. Given that X is the lifetime in years of a
During a routine check at time t days particular type of indicator lamp that is
after his treatment, another patient is alight continuously, explain in words
found to be showing renewed symptoms. the meaning of the above result.
Find, in terms of A, k and t, the probabil- Given that 2 such lamps, A and B, have
ity that the renewed symptoms first already been alight for 3 months and 4
showed in this patient less than kt days months respectively, find the probability
before the day of the routine check, that both will still be alight in 3 months
where 0<Rk X11. (L) time. (L)

THE NORMAL DISTRIBUTION

This distribution will be used extensively in the following chapters and it is a very important one in statistics. Here we consider some of its mathematical properties.

A continuous r.v. X having p.d.f. f(x) where

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-(x - \mu)^2/2\sigma^2}$, $-\infty < x < \infty$

is said to follow a normal distribution.

μ and σ² are the parameters of the distribution.

We write $X \sim N(\mu, \sigma^2)$.

EXPECTATION AND VARIANCE

If $X \sim N(\mu, \sigma^2)$ then

$E(X) = \mu$

$\text{Var}(X) = \sigma^2$

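(In the following, we assume that $\dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-t^2/2}\,dt = 1$.)

$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$

$= \dfrac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} x\,e^{-(x - \mu)^2/2\sigma^2}\,dx$

Now, let $t = \dfrac{x - \mu}{\sigma}$ so that $dt = \dfrac{1}{\sigma}\,dx$, and when $x = \infty$, $t = \infty$; when $x = -\infty$, $t = -\infty$.

So $E(X) = \dfrac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} (\mu + \sigma t)\,e^{-t^2/2}\,\sigma\,dt$

$= \dfrac{\mu}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-t^2/2}\,dt + \dfrac{\sigma}{\sqrt{2\pi}}\int_{-\infty}^{\infty} t\,e^{-t^2/2}\,dt$

$= \mu + \dfrac{\sigma}{\sqrt{2\pi}}\left[-e^{-t^2/2}\right]_{-\infty}^{\infty}$

$= \mu$

Therefore E(X) = μ

$\text{Var}(X) = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 = I - \mu^2$

where $I = \dfrac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} (\mu + \sigma t)^2\,e^{-t^2/2}\,\sigma\,dt$

$= \dfrac{1}{\sqrt{2\pi}}\left[\mu^2\int_{-\infty}^{\infty} e^{-t^2/2}\,dt + 2\mu\sigma\int_{-\infty}^{\infty} t\,e^{-t^2/2}\,dt + \sigma^2\int_{-\infty}^{\infty} t^2 e^{-t^2/2}\,dt\right]$

Now

$\int_{-\infty}^{\infty} t^2 e^{-t^2/2}\,dt = \int_{-\infty}^{\infty} t\,(t\,e^{-t^2/2})\,dt$

$= \left[-t\,e^{-t^2/2}\right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-t^2/2}\,dt$

$= 0 + \sqrt{2\pi}$

So $I = \dfrac{1}{\sqrt{2\pi}}\left(\mu^2\sqrt{2\pi} + 2\mu\sigma\left[-e^{-t^2/2}\right]_{-\infty}^{\infty} + \sigma^2\sqrt{2\pi}\right)$

$= \mu^2 + \sigma^2$, since $e^{-t^2/2} \to 0$ as $t \to \pm\infty$

Therefore $\text{Var}(X) = \mu^2 + \sigma^2 - \mu^2 = \sigma^2$

So, E(X) = μ and Var(X) = σ².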
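The two results can be illustrated by numerical integration. The sketch below is not a proof: it assumes illustrative values μ = 3 and σ = 2, truncates the range of integration at μ ± 10σ (where the p.d.f. is negligible), and uses a hand-written composite Simpson's rule; all names are chosen here for illustration.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """p.d.f. of N(mu, sigma**2)."""
    return exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def simpson(g, a, b, n=2000):
    """Composite Simpson's rule with n (even) strips."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def moments(mu, sigma):
    """Return (total area, mean, variance) by numerical integration."""
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    area = simpson(lambda x: normal_pdf(x, mu, sigma), lo, hi)
    mean = simpson(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)
    second = simpson(lambda x: x**2 * normal_pdf(x, mu, sigma), lo, hi)
    return area, mean, second - mean**2

area, mean, variance = moments(3.0, 2.0)  # illustrative parameters
print(round(area, 6), round(mean, 6), round(variance, 6))
```

With these parameters the printed mean and variance should agree with μ = 3 and σ² = 4 to the precision shown.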

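The following results are also important.

Result 1 If $X \sim N(\mu, \sigma^2)$, the maximum value of f(x) occurs when x = μ.

We consider $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-(x - \mu)^2/2\sigma^2}$

$f'(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\left(-\dfrac{(x - \mu)}{\sigma^2}\right)e^{-(x - \mu)^2/2\sigma^2}$

$= -\dfrac{1}{\sigma^3\sqrt{2\pi}}\,(x - \mu)\,e^{-(x - \mu)^2/2\sigma^2}$

Now f'(x) = 0 when x − μ = 0

i.e. x = μ

Now

$f''(x) = -\dfrac{1}{\sigma^3\sqrt{2\pi}}\left[(x - \mu)\left(-\dfrac{(x - \mu)}{\sigma^2}\right)e^{-(x - \mu)^2/2\sigma^2} + e^{-(x - \mu)^2/2\sigma^2}\right]$

$= -\dfrac{1}{\sigma^3\sqrt{2\pi}}\,e^{-(x - \mu)^2/2\sigma^2}\left[-\dfrac{(x - \mu)^2}{\sigma^2} + 1\right]$

When x = μ, f''(x) < 0.

There is a maximum value of f(x) when x = μ.

Result 2 If $X \sim N(\mu, \sigma^2)$ then f(x) has points of inflexion at x = μ + σ and x = μ − σ.

To show this, consider f''(x).

f''(x) = 0 when $(x - \mu)^2 = \sigma^2$

$x - \mu = \pm\sigma$

x = μ + σ or x = μ − σ

There are points of inflexion at x = μ + σ and x = μ − σ.

NOTE: [Figure: sketch of y = f(x), a bell-shaped curve with its maximum at x = μ and points of inflexion at x = μ − σ and x = μ + σ.]

MISCELLANEOUS WORKED EXAMPLES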

MISCELLANEOUS WORKED EXAMPLES

Example 5.26 A random variable X has a probability density function

$$f(x) = \begin{cases} Ax(6 - x)^2 & 0 \le x \le 6 \\ 0 & \text{elsewhere.} \end{cases}$$

Find the value of the constant A.

Calculate the arithmetic mean, mode, variance and standard deviation of X. (AEB)

Solution 5.26 Since X is a r.v., $\int_{\text{all } x} f(x)\,dx = 1$

Therefore $1 = \int_0^6 Ax(6 - x)^2\,dx$

$= A\int_0^6 (36x - 12x^2 + x^3)\,dx$

$= A\left[18x^2 - 4x^3 + \frac{1}{4}x^4\right]_0^6$

$= 108A$

So $A = \dfrac{1}{108}$

Therefore $f(x) = \dfrac{1}{108}x(6 - x)^2$, $0 \le x \le 6$

The arithmetic mean is E(X) where $E(X) = \int_{\text{all } x} x f(x)\,dx$.

$E(X) = \dfrac{1}{108}\int_0^6 x^2(6 - x)^2\,dx$

$= \dfrac{1}{108}\int_0^6 (36x^2 - 12x^3 + x^4)\,dx$

$= \dfrac{1}{108}\left[12x^3 - 3x^4 + \dfrac{x^5}{5}\right]_0^6$

$= 2.4$

The arithmetic mean is 2.4.

To find the mode we consider the maximum point on y = f(x).

$f(x) = \dfrac{1}{108}(36x - 12x^2 + x^3)$

$f'(x) = \dfrac{1}{108}(36 - 24x + 3x^2)$

$= \dfrac{3}{108}(x - 2)(x - 6)$

f'(x) = 0 when x = 2 and when x = 6

Consider $f''(x) = \dfrac{1}{108}(6x - 24)$.

When x = 2, f''(x) < 0 and when x = 6, f''(x) > 0.

Therefore x = 2 gives a maximum value of f(x), i.e. the mode is 2.

$\text{Var}(X) = E(X^2) - E^2(X)$

Now $E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx$

$= \dfrac{1}{108}\int_0^6 (36x^3 - 12x^4 + x^5)\,dx$

$= \dfrac{1}{108}\left[9x^4 - \dfrac{12x^5}{5} + \dfrac{x^6}{6}\right]_0^6$

$= 7.2$

$\text{Var}(X) = 7.2 - (2.4)^2$

$= 1.44$

Standard deviation of X $= \sqrt{\text{Var}(X)}$

$= 1.2$

Therefore the variance of X is 1.44 and the standard deviation is 1.2.
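Because the p.d.f. in Example 5.26 is a polynomial, every integral can be evaluated exactly. The sketch below uses Python's `fractions` module; the helper `integrate_poly` and the coefficient-list representation (index k holds the coefficient of x^k) are illustrative choices, not part of the text.

```python
from fractions import Fraction

def integrate_poly(coeffs, lo, hi):
    """Exact integral over [lo, hi] of the polynomial sum(coeffs[k] * x**k)."""
    def antiderivative(x):
        return sum(Fraction(c) * Fraction(x) ** (k + 1) / (k + 1)
                   for k, c in enumerate(coeffs))
    return antiderivative(hi) - antiderivative(lo)

# x(6 - x)^2 = 36x - 12x^2 + x^3, stored lowest power first
base = [0, 36, -12, 1]

A = 1 / integrate_poly(base, 0, 6)                 # total probability must be 1
mean = A * integrate_poly([0] + base, 0, 6)        # multiply integrand by x
second = A * integrate_poly([0, 0] + base, 0, 6)   # multiply integrand by x^2
variance = second - mean**2

print(A, mean, second, variance, float(variance) ** 0.5)
```

Prepending a zero to the coefficient list shifts every power up by one, which is how the integrand is multiplied by x.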

Example 5.27 A continuous random variable, X, has probability density function given by

$$f(x) = \begin{cases} ax - bx^2 & \text{for } 0 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$$

Observations on X indicate that the mean is 1.

(a) Obtain two simultaneous equations for a and b, show that a = 1.5 and find the value of b.

(b) Find the variance of X.

(c) If F(x) is the probability that $X \le x$ find F(x) and verify that F(2) = 1.

(d) If two independent observations are made on X what is the probability that at least one of them is less than ½? (SUJB)
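Solution 5.27 (a) If X is a random variable, $\int_{\text{all } x} f(x)\,dx = 1$. Therefore

$$\int_0^2 (ax - bx^2)\,dx = 1$$
$$\left[\frac{ax^2}{2} - \frac{bx^3}{3}\right]_0^2 = 1$$
$$2a - \frac{8b}{3} = 1$$
so
$$\frac{8b}{3} = 2a - 1 \qquad \text{(i)}$$

Now
$$E(X) = \int_{\text{all } x} x f(x)\,dx = \int_0^2 (ax^2 - bx^3)\,dx = \left[\frac{ax^3}{3} - \frac{bx^4}{4}\right]_0^2 = \frac{8a}{3} - 4b$$
But we are given that E(X) = 1. So
$$\frac{8a}{3} - 4b = 1$$
$$4b = \frac{8a}{3} - 1$$
$$\frac{8b}{3} = \frac{2}{3}\left(\frac{8a}{3} - 1\right) = \frac{16a}{9} - \frac{2}{3} \qquad \text{(ii)}$$

From equations (i) and (ii) we have
$$\frac{16a}{9} - \frac{2}{3} = 2a - 1$$
$$\frac{1}{3} = \frac{2a}{9}$$
$$a = \frac{3}{2}$$

Substituting for a in (i) we have $\frac{8b}{3} = 3 - 1$, so
$$b = \frac{3}{4}$$

Therefore a = 1.5 and b = ¾.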

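(b)
$$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx = \int_0^2 \left(\frac{3}{2}x^3 - \frac{3}{4}x^4\right)dx = \left[\frac{3x^4}{8} - \frac{3x^5}{20}\right]_0^2 = 6 - 4.8 = 1.2$$

Now
$$\mathrm{Var}(X) = E(X^2) - E^2(X) = 1.2 - 1 = 0.2$$

Therefore Var(X) = 0.2.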

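(c) F(x) = P(X ≤ x), so for 0 ≤ t ≤ 2
$$F(t) = \int_0^t \left(\frac{3}{2}x - \frac{3}{4}x^2\right)dx = \frac{3}{4}t^2 - \frac{1}{4}t^3$$
and
$$F(2) = 3 - 2 = 1 \quad \text{as required.}$$

(d)
$$P(X < \tfrac{1}{2}) = F(\tfrac{1}{2}) = \frac{3}{4}\left(\frac{1}{4}\right) - \frac{1}{4}\left(\frac{1}{8}\right) = \frac{5}{32}$$
Therefore $P(X \ge \tfrac{1}{2}) = \frac{27}{32}$.

So if two independent observations are made,
$$P(\text{at least one is less than } \tfrac{1}{2}) = 1 - P(\text{both} \ge \tfrac{1}{2}) = 1 - \left(\frac{27}{32}\right)^2 = 0.288 \ \text{(3 d.p.)}$$
Therefore if two independent observations are made on X, the probability that at least one of them is less than ½ is 0.288 (3 d.p.).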
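
As a cross-check on Solution 5.27, the sketch below solves the two conditions on a and b by Cramer's rule and then evaluates the variance and the probability in part (d). It is an illustration only: the variable names are assumptions, and the two equations are the total-probability and mean conditions derived in part (a).

```python
from fractions import Fraction as F

# Conditions from part (a), written as  p*a + q*b = r :
#   total probability:  2a     - (8/3)b = 1
#   mean equal to 1:    (8/3)a - 4b     = 1
p1, q1, r1 = F(2), F(-8, 3), F(1)
p2, q2, r2 = F(8, 3), F(-4), F(1)

det = p1 * q2 - p2 * q1                  # Cramer's rule for a 2x2 system
a = (r1 * q2 - r2 * q1) / det
b = (p1 * r2 - p2 * r1) / det

def cdf(x):
    """F(x) = a*x^2/2 - b*x^3/3 for 0 <= x <= 2."""
    return a * x**2 / 2 - b * x**3 / 3

# E(X^2) is the integral of a*x^3 - b*x^4 from 0 to 2; the mean is 1
variance = (a * F(2)**4 / 4 - b * F(2)**5 / 5) - 1
p_below_half = cdf(F(1, 2))
p_at_least_one = 1 - (1 - p_below_half) ** 2

print(a, b, variance, p_below_half, float(p_at_least_one))
```

Working in fractions keeps 5/32 and 27/32 exact, so only the final probability needs rounding.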

Example 5.28 The time taken to perform a particular task, t hours, has the probability density function
$$f(t) = \begin{cases} 10ct^2 & 0 \le t < 0.6 \\ 9c(1-t) & 0.6 \le t \le 1.0 \\ 0 & \text{otherwise} \end{cases}$$
where c is a constant.
(a) Find the value of c and sketch the graph of this distribution.
(b) Write down the most likely time.
(c) Find the expected time.
(d) Determine the probability that the time will be
(i) more than 48 minutes,
(ii) between 24 and 48 minutes.

Solution 5.28 (a) Now

$$1 = \int_{\text{all } t} f(t)\,dt = 10c\int_0^{0.6} t^2\,dt + 9c\int_{0.6}^{1.0} (1-t)\,dt$$
$$= 10c\left[\frac{t^3}{3}\right]_0^{0.6} + 9c\left[t - \frac{t^2}{2}\right]_{0.6}^{1.0}$$
$$= 0.72c + 0.72c = 1.44c$$

Therefore
$$c = \frac{1}{1.44} = \frac{25}{36}$$

We have
$$f(t) = \begin{cases} \frac{125}{18}t^2 & 0 \le t < 0.6 \\ \frac{25}{4}(1-t) & 0.6 \le t \le 1.0 \\ 0 & \text{otherwise} \end{cases}$$

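(b) t = 0.6 gives the maximum value of f(t).

[Figure: sketch of f(t) against t, rising to a maximum of 2.5 at t = 0.6 and falling to 0 at t = 1.0.]

Therefore the mode is 0.6 hours = 36 mins.

The most likely time is 36 minutes.

(c)
$$E(T) = \int_{\text{all } t} t f(t)\,dt = 10c\int_0^{0.6} t^3\,dt + 9c\int_{0.6}^{1.0} (t - t^2)\,dt$$
$$= 10c\left[\frac{t^4}{4}\right]_0^{0.6} + 9c\left[\frac{t^2}{2} - \frac{t^3}{3}\right]_{0.6}^{1.0}$$
$$= 0.225 + 0.366\ldots = 0.591\ldots \text{ hours} = 35.5 \text{ minutes}$$
The expected time is 35.5 minutes.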

(d) (i) 48 minutes = 0.8 hours.

$$P(T > 0.8) = 9c\int_{0.8}^{1.0} (1-t)\,dt = 9c\left[t - \frac{t^2}{2}\right]_{0.8}^{1.0} = 0.125$$

The probability that the time will be more than 48 minutes is 0.125.

(ii) 24 minutes = 0.4 hours.

Now
$$P(0.4 < T < 0.8) = 1 - P(T > 0.8) - P(T < 0.4)$$
and
$$P(T < 0.4) = 10c\int_0^{0.4} t^2\,dt = 10c\left[\frac{t^3}{3}\right]_0^{0.4} = 0.1481\ldots$$
Therefore
$$P(0.4 < T < 0.8) = 1 - 0.125 - 0.1481\ldots = 0.727 \ \text{(3 S.F.)}$$
The probability that the time will be between 24 and 48 minutes is 0.727 (3 S.F.).
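
A piecewise density such as the one in Example 5.28 can also be integrated numerically. The sketch below uses composite Simpson's rule, which is a technique swapped in for illustration and not the method used in the text. The function names simpson, pdf and prob are assumptions, and each branch is integrated separately because the density has a corner at t = 0.6.

```python
def simpson(g, lo, hi, n=200):
    """Composite Simpson's rule with n (even) strips."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

c = 1 / 1.44                      # value of c found in part (a)

def pdf(t):
    if 0 <= t <= 0.6:
        return 10 * c * t**2
    if 0.6 < t <= 1.0:
        return 9 * c * (1 - t)
    return 0.0

def prob(lo, hi):
    """P(lo < T < hi), integrating on each side of t = 0.6 separately."""
    cut = min(max(lo, 0.6), hi)
    return simpson(pdf, lo, cut) + simpson(pdf, cut, hi)

def weighted(t):
    return t * pdf(t)

expected_hours = simpson(weighted, 0, 0.6) + simpson(weighted, 0.6, 1.0)

print(prob(0, 1), prob(0.8, 1.0), prob(0.4, 0.8), 60 * expected_hours)
```

Simpson's rule is exact for polynomials up to degree three, so on each branch the only error here is floating-point rounding.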

SUMMARY — CONTINUOUS RANDOM VARIABLES

For a continuous random variable X, with p.d.f. f(x) (a ≤ x ≤ b):

$$\int_a^b f(x)\,dx = 1$$
$$P(c \le X \le d) = \int_c^d f(x)\,dx \qquad a \le c < d \le b$$
$$E(X) = \int_{\text{all } x} x f(x)\,dx$$
$$\mathrm{Var}(X) = \int_{\text{all } x} x^2 f(x)\,dx - E^2(X)$$
$$F(t) = \int_a^t f(x)\,dx \qquad a \le t \le b$$
where F(t) is the cumulative distribution function, and
$$f(x) = \frac{d}{dx}F(x)$$

The rectangular distribution

If $f(x) = \frac{1}{b-a}$, a ≤ x ≤ b, then X ~ R(a, b)
$$E(X) = \tfrac{1}{2}(a+b)$$
$$\mathrm{Var}(X) = \tfrac{1}{12}(b-a)^2$$

The exponential distribution

If $f(x) = \lambda e^{-\lambda x}$, x ≥ 0
$$E(X) = \frac{1}{\lambda}$$
$$\mathrm{Var}(X) = \frac{1}{\lambda^2}$$

The normal distribution

If $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2}$, −∞ < x < ∞, then X ~ N(μ, σ²)
$$E(X) = \mu$$
$$\mathrm{Var}(X) = \sigma^2$$

Miscellaneous Exercise 5h

1. (a) A continuous variable X is distributed at random between the values 2 and 3 and has a probability density function of $\frac{6}{x^2}$. Find the median value of X.
(b) A continuous random variable X takes values between 0 and 1, with a probability density function of $Ax(1-x)^2$. Find the value of A, and the mean and standard deviation of X. (SUJB)

2. A continuous variable X is distributed at random between 2 values, x = 0 and x = 2, and has a probability density function of $ax^2 + bx$. The mean is 1.25.
(i) Show that b = ¾ and find the value of a.
(ii) Find the variance of X.
(iii) Verify that the median value of X is approximately 1.3.
(iv) Find the mode. (SUJB)

3. The random variable X is the distance, in metres, that an inexperienced tight-rope walker has moved along a given tight-rope before falling off. It is given that
$$P(X > x) = 1 - \frac{x^3}{64} \qquad 0 \le x \le 4$$
(a) Show that E(X) = 3.
(b) Find the standard deviation, σ, of X.
(c) Show that $P(|X - 3| < \sigma) = \frac{69}{80}\sqrt{\frac{3}{5}}$. (C)

The random variable X has a probability 8. The probability density function of X is


density function given by given by
kx(1—x?) (0<x<1) k(ax—x?) O<x<2
P(x) = 0 elsewhere f(x) = 0 <0, = 2
k being a constant. Find the value of k
where k and a are positive constants.
and find also the mean and variance of
this distribution.
Show that a> 2 and that k = 3
Find the median of the distribution. 6a— 8
(O &C)
Given that the mean value of X is 1, cal-
culate the values of a and k.
An ironmonger is supplied with paraffin For these values of a and k sketch the
once a week. The weekly demand, X graph of the probability density function
hundred litres, has the probability and find the variance of X. (JMB)
density function f, where
f(x) = e(l—x)’ O<x<1
A continuous random variable X has
f(x) = 0 otherwise
probability density function f(x) defined
where c is a constant. Find the value of c. by
Find the mean value of X, and, to the 12(x?—x?
Ate (: (x*— x”) 0<x<1
nearest litre, the minimum capacity of otherwise
his paraffin tank if the probability that
Find the mean and standard deviation of
it will be exhausted in a given week is not
to exceed 0.02. (L)P X; find also its mean deviation about the
mean. (O &C)

A continuous random variable X has The continuous random variable X has


10.
probability density function f(x) given by probability density function f(x) defined
f(x) = 0 forx <0 and x > 8 and between
x = 0 andx = 3 its form is as shown in by
the graph. e4 (<1)
fix) =( e(2—x?) (-1<x<1)
2 (x > 1)

(a) Show that c = i:


(b) Sketch the graph of f(x).
(c) Determine the cumulative distribution
(a) Find the value of A. function F(x).
(bo) Express f(x) algebraically and obtain (d) Determine the expected value of X
the mean and variance of X. and the variance of X. (C)
(c) Find the median value of X.
A sample X,, X and X3 is obtained. What 11. The random variable X takes all values x
is the probability that at least one is in the range 0 <x <1, and has a con-
greater than the median value? (SUJB) tinuous probability density function f(x)
defined by
Determine X such that f(x) = "khx®-"1—x)*? "(0 2 1)
0 x<0
(a) Show that k = $0(0+1)(0+ 2).
(b) Find E(X) and E(X?).
A/2 0O<x<1
(c) Deduce the variance of X.
f(x) = {0 1 <7 <2 (d) For 0= 3. find the location of the
BA/2—38X(x—8)7/4 2<x<4 mode and sketch f(x). (O &C)
0 x>4
is a probability density function of the 12. State the conditions under which the
distribution of a random variable X. binomial distribution is a suitable model
Sketch the density function and find to use in statistical work. Describe briefly
E(X) and Pr(X S 3.5). (MEI) how you used, or could have used, a
binomial distribution in a project, giving 15. (a) A discrete random variable R takes
the parameters of your distribution. integer values between 0 and 4 inclusive
A large store sells a certain size of nail with probabilities given by
either in a small packet at 50p per packet,
or loose at £3 per kg. On any shopping Teateak (r= 0,1,2)
day the number, X, of packets sold is a P(R=r)= 10 ee ? tu}

random variable where X ~ B(8, 0.6), Or. ( Od


and the weight, Y kg, of nails sold loose 10 r=
»4)
is a continuous random variable with
probability density function f given by Find the expectation and variance of R.
(6) A continuous random variable X
fy) = = 1<y S6, takes values in the interval x 2 0. The
probability density function of X is
defined by
TY dariaOs otherwise.
Find, to 3 decimal places, the probability kx if OSx<1
that, on any shopping day, the number Kx) x pk
of packets sold will be “ if x>1
(a) more than one,
(6) seven or fewer. Prove that k = §8 and find the expectation
Find the probability that and variance of X. (C)
(c) the weight of nails sold loose on any
shopping day will be between 4 kg and
iSkg. 16.
A continuous random variable X has
(d) on any one shopping day the shop probability density function defined by
will sell exactly 2 packets of nails and 0 x <0,
less than 2 kg of nails sold loose, giving
Wayne i <x< 2,
0OSx
your answer to 2 significant figures.
(e) Calculate the expected money ee x > 2.
received on any shopping day from the
sale of this size of nail in this store. (L)
Calculate the value of k. Find the median
value and the expectation of X.
13. A beam of electrons is directed at a solid Prove that the standard deviation of X
object. The depth, X, to which any given is infinite.
electron will penetrate the object before
colliding with an atom is a random Find the value of a such that
variable whose probability density P(X >a) = 0.005 (C)
function is f.
(i) If f(x) =a—bx(a>0,b>0,
0 <x Sa/b), find b in terms of a. Find 17. The distances x, in miles, travelled by
E(X) and Var(X) in terms of a. customers to the ‘Cheep Supermarket’
If f(x) =ae"™(a > 0,c > 0, x2 0) are distributed with density function
find c in terms of a. Show that, if a :
1 ,-x/5
0<x<o
fraction 1/N of all electrons penetrates to f(x) =
il 0 otherwise
a depth greater than d, then d = a N.
Find the proportion of customers
(C) travelling less than 1 mile and the propor-
tion travelling more than 15 miles to the
supermarket.
14. The number of kilograms of metal ex- The chance that a customer goes twice
tracted from 10 kg of ore from a certain to the supermarket on one day is p(x)
mine is a continuous random variable X when the customer has to travel x miles
with probability density function f(x), each way and the chance of one visit only
where f(x)= ex(2— x)? if O<x <2 and is 1— p(x), where
f(x) = 0 otherwise, where c is a constant.
0o< x<5
Show that c = 3 and find the mean and
p(x).= pet tee
variance of X. The cost of extracting the Ale
Cole
metal from 10 kg of ore is £10x. Find the
expected cost of extracting the metal Find the expected distance travelled by a
from 10 kg of ore. (MEI) customer on one day. (MEI)

18. A person frequently makes telephone When a caller telephones a particular


calls to destinations for which each call is company, there is a probability of 5 that
charged at the rate of 15p per minute or he will be asked to hold the line. When he
part of a minute. The cost of such a call is is asked to hold the line and decides to do
X pence and its duration, T minutes, has so, the total time taken for the call has
the exponential probability density the above exponential distribution with
function a= z;when he is not asked to hold the
At) eeae* +270 the line the total time for the call has the
f(t) = 0 a0) exponential distribution with a= - Cal-
culate the expected cost if he rings the
Show that
company and is asked to hold the line and
P(X = 15r) = eF(e"—
1) r= 7, 2, 3,... he
and that the mean cost per call in pence is (a) holds and completes the call,
(b) rings off and then rings later, com-
15/A—e-°).
pleting the second call whether he is
co
x asked to hold or not.
[You may assume that 2 rx? =——a,
nl (ie) Assume that a wasted call costs 15p and
take e~!/2 = 0.6065, e71/6 = 0.8465.
i2i<1| (JMB)
THE NORMAL DISTRIBUTION
The normal distribution is the most important continuous distribution in statistics. Many measured quantities in the natural sciences follow a normal distribution, for example heights, masses, ages, random errors, I.Q. scores, examination results.

PROBABILITY DENSITY FUNCTION OF NORMAL VARIABLE

A continuous random variable X having p.d.f. f(x) where
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2} \qquad -\infty < x < \infty$$
is said to have a normal distribution with mean μ and variance σ².

μ and σ² are the parameters of the distribution.

If X is distributed in this way we write
$$X \sim N(\mu, \sigma^2)$$

[Figure: sketch of y = f(x), a bell-shaped curve centred on x = μ with the x-axis marked at μ, μ ± σ, μ ± 2σ and μ ± 3σ.]

The distribution is bell shaped and symmetrical about x = μ.
Approximately 95% of the distribution lies within ±2 standard deviations of the mean.
Approximately 99.7% of the distribution lies within ±3 standard deviations of the mean.

The range of the distribution is therefore approximately 6 standard deviations.
The maximum value of f(x) occurs when x = μ and is given by
$$f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}$$

There is a point of inflexion at x = μ − σ and at x = μ + σ.


The actual size of the bell-shaped curve depends on the values of μ and σ.
Here are some examples, each drawn to the same scale:

[Figure: four sketches of f(x) against x, labelled (1) X ~ N(0, 1), (2) X ~ N(100, 6.25), (3) X ~ N(50, 4), (4) X ~ N(4, 3).]

NOTE: the variable X is random. It is possible to show that
$$\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2}\,dx = 1$$
but this is beyond the scope of this book.

The probability that X lies between a and b is written P(a < X < b) and is given by
$$P(a < X < b) = \int_a^b \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2}\,dx$$
However, this integral is very difficult to evaluate, so we work with the standard normal variable Z.

THE STANDARD NORMAL DISTRIBUTION

To standardise X, subtract μ and divide by σ. So
$$Z = \frac{X - \mu}{\sigma}$$

Now, if the r.v. X has a normal distribution with mean μ and variance σ², then the r.v. Z has a standard normal distribution with mean 0 and variance 1.

If X ~ N(μ, σ²) and $Z = \frac{X-\mu}{\sigma}$, then Z ~ N(0, 1).
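Example 6.1 Show that E(Z) = 0 and Var(Z) = 1, where Z is the standard normal variable.

Solution 6.1 $Z = \frac{X-\mu}{\sigma}$ and E(X) = μ, Var(X) = σ².

$$E(Z) = E\left(\frac{X-\mu}{\sigma}\right) = \frac{1}{\sigma}\left[E(X) - E(\mu)\right] = \frac{1}{\sigma}(\mu - \mu) = 0$$
So E(Z) = 0.

$$\mathrm{Var}(Z) = \mathrm{Var}\left(\frac{X-\mu}{\sigma}\right) = \frac{1}{\sigma^2}\mathrm{Var}(X - \mu) = \frac{1}{\sigma^2}\left[\mathrm{Var}(X) + 0\right] = \frac{\sigma^2}{\sigma^2} = 1$$
So Var(Z) = 1.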

THE PROBABILITY DENSITY FUNCTION FOR Z, φ(z)

The p.d.f. of the standard normal variable Z is denoted by φ(z) where
$$\phi(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2} \qquad -\infty < z < \infty$$

[Figure: sketch of φ(z) against z for −3 ≤ z ≤ 3, with maximum value approximately 0.4 at z = 0.]

THE CUMULATIVE DISTRIBUTION FUNCTION FOR Z, Φ(z)

The cumulative distribution function of the standard normal variable Z is denoted by Φ(z) where
$$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}\,dt$$

This integral is still very difficult to evaluate, so we refer to tables.
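
Where tables are not to hand, Φ(z) can be computed from the error function, using the standard identity Φ(z) = ½[1 + erf(z/√2)]. The sketch below is an illustration and not part of the original text; the function names phi_pdf and Phi are assumptions.

```python
import math

def phi_pdf(z):
    """Standard normal p.d.f., phi(z)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard normal c.d.f., P(Z <= z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for z in (0.0, 1.0, 1.96, 2.575):
    print(f"Phi({z}) = {Phi(z):.4f}")
```

Values computed this way may differ from four-figure tables in the last decimal place, because the tables are rounded.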


It should be noted that the tables may be printed in one of two different formats.
They may give the values of Φ(z), where Φ(z) = P(Z < z),
or the values of Q(z), where Q(z) = P(Z > z).

[Figure: two sketches of the standard normal curve, one with the area to the left of z shaded and labelled Φ(z), the other with the area to the right of z shaded and labelled Q(z).]

In the main text we will refer to the tables giving Φ(z), the cumulative probabilities of the standard normal distribution. These are printed on p. 634.

However, you will find instructions for the use of the Q(z) tables in Appendix 2 on pp. 641-53. Q(z) tables are printed on p. 640.

The values of Q(z) are known as the 'Upper tail Probabilities'.

USE OF THE STANDARD NORMAL TABLES USING Φ(z)

Only positive values of z are printed in the tables, so for negative values of z the symmetrical properties of the curve are used:

P(Z < a) = Φ(a)
P(Z > −a) = Φ(a)
P(Z > a) = 1 − Φ(a)
P(Z < −a) = 1 − Φ(a)

NOTE: We have Φ(−a) = 1 − Φ(a).

Example 6.2 If Z ~ N(0, 1), find from tables (a) P(Z < 1.377), (b) P(Z > −1.377), (c) P(Z > 1.377), (d) P(Z < −1.377).

Solution 6.2
(a) P(Z < 1.377) = Φ(1.377) = 0.9158
(b) P(Z > −1.377) = P(Z < 1.377) = Φ(1.377) = 0.9158
(c) P(Z > 1.377) = 1 − Φ(1.377) = 1 − 0.9158 = 0.0842
(d) P(Z < −1.377) = P(Z > 1.377) = 1 − 0.9158 = 0.0842
Example 6.3 If Z ~ N(0, 1), find (a) P(0.345 < Z < 1.751), (b) P(−2.696 < Z < 1.865), (c) P(−1.4 < Z < −0.6), (d) P(|Z| < 1.433), (e) P(Z > 0.863 or Z < −1.527).

Solution 6.3
(a) P(0.345 < Z < 1.751) = Φ(1.751) − Φ(0.345)
= 0.9600 − 0.6350
= 0.3250
So P(0.345 < Z < 1.751) = 0.3250.

(b) P(−2.696 < Z < 1.865) = Φ(1.865) − Φ(−2.696)
= Φ(1.865) − (1 − Φ(2.696))
= Φ(1.865) + Φ(2.696) − 1
= 0.9690 + 0.9965 − 1
= 0.9655
So P(−2.696 < Z < 1.865) = 0.9655.

(c) P(−1.4 < Z < −0.6) = Φ(−0.6) − Φ(−1.4)
= Φ(1.4) − Φ(0.6)
= 0.9192 − 0.7257
= 0.1935
So P(−1.4 < Z < −0.6) = 0.1935.

(d) P(|Z| < 1.433) = P(−1.433 < Z < 1.433)
= 2Φ(1.433) − 1
= 2(0.9240) − 1
= 0.848
So P(|Z| < 1.433) = 0.848.

(e) P(Z > 0.863 or Z < −1.527) = 1 − (Φ(0.863) + Φ(1.527) − 1)
= 2 − Φ(0.863) − Φ(1.527)
= 2 − 0.8059 − 0.9365
= 0.2576
So P(Z > 0.863 or Z < −1.527) = 0.2576.

Example 6.4 If Z ~ N(0, 1), show that (a) P(−1.96 < Z < 1.96) = 0.95, (b) P(−2.575 < Z < 2.575) = 0.99.

Solution 6.4
(a) P(−1.96 < Z < 1.96) = 2Φ(1.96) − 1
= 2(0.975) − 1
= 0.95
Therefore P(−1.96 < Z < 1.96) = 0.95.

NOTE: This is an important result:

The central 95% of the distribution lies between ±1.96.

(b) P(−2.575 < Z < 2.575) = 2Φ(2.575) − 1
= 2(0.995) − 1
= 0.99
Therefore P(−2.575 < Z < 2.575) = 0.99.

The central 99% of the distribution lies between ±2.575.
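
The symmetry rules and the central-area results of Examples 6.2 to 6.4 can be reproduced with Python's standard library, whose statistics.NormalDist class provides a cdf method. This is a sketch for checking table work; the helper names central and between are assumptions made for illustration.

```python
from statistics import NormalDist

Z = NormalDist()          # standard normal: mean 0, standard deviation 1

def central(a):
    """P(-a < Z < a) = 2*Phi(a) - 1."""
    return 2 * Z.cdf(a) - 1

def between(lo, hi):
    """P(lo < Z < hi) = Phi(hi) - Phi(lo)."""
    return Z.cdf(hi) - Z.cdf(lo)

print(round(central(1.96), 4), round(central(2.575), 4))
print(round(between(-1.4, -0.6), 4))
print(round(1 - Z.cdf(1.377), 4), round(Z.cdf(-1.377), 4))   # equal by symmetry
```

Because cdf accepts negative arguments directly, the symmetry step needed with printed tables is not required here, though the last line confirms that both routes agree.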


Exercise 6a

1. If Z ~ N(0, 1), find (a) P(Z > 0.874), (b) P(Z < 0.874), (c) P(Z < −0.874), (d) P(Z > −0.874).

2. If Z ~ N(0, 1), find (a) P(Z > 1.8), (b) P(Z < −0.65), (c) P(Z > −3.46), (d) P(Z < 1.36), (e) P(Z > 2.58), (f) P(Z > −2.37), (g) P(Z < 1.86), (h) P(Z < −0.725), (i) P(Z > 1.863), (j) P(Z < 1.63), (k) P(Z > −2.061), (l) P(Z < −2.875).

3. If Z ~ N(0, 1), find (a) P(Z > 1.645), (b) P(Z < −1.645), (c) P(Z > 1.282), (d) P(Z > 1.96), (e) P(Z > 2.575), (f) P(Z > 2.326), (g) P(Z > 2.808), (h) P(Z < 1.96).

4. If Z ~ N(0, 1), find
(a) P(0.829 < Z < 1.843),
(b) P(−2.56 < Z < 0.134),
(c) P(−1.762 < Z < −0.246),
(d) P(0 < Z < 1.73),
(e) P(−2.05 < Z < 0),
(f) P(−3.08 < Z < 3.08),
(g) P(1.764 < Z < 2.567),
(h) P(−1.65 < Z < 1.725),
(i) P(−0.98 < Z < −0.16),
(j) P(Z < −1.97 or Z > 2.5),
(k) P(|Z| < 1.78), (l) P(|Z| > 0.754),
(m) P(−1.645 < Z < 1.645),
(n) P(|Z| > 2.326).

Example 6.5 If Z ~ N(0, 1), find the value of a if (a) P(Z > a) = 0.3802, (b) P(Z > a) = 0.7818, (c) P(Z < a) = 0.0793, (d) P(Z < a) = 0.9693, (e) P(|Z| < a) = 0.9.

Solution 6.5
(a) P(Z > a) = 0.3802.
Φ(a) = 1 − 0.3802 = 0.6198
so from tables a = 0.305.

(b) P(Z > a) = 0.7818.
Now, since the probability is greater than 0.5, a must be negative.
Now Φ(−a) = 0.7818
−a = 0.778
a = −0.778

(c) P(Z < a) = 0.0793.
Since the probability is less than 0.5, a must be negative.
Φ(−a) = 1 − 0.0793 = 0.9207
−a = 1.41
a = −1.41

(d) P(Z < a) = 0.9693.
Φ(a) = 0.9693
a = 1.87

(e) P(|Z| < a) = 0.9,
i.e. P(−a < Z < a) = 0.9
From symmetry
2Φ(a) − 1 = 0.9
2Φ(a) = 1.9
Φ(a) = 0.95
a = 1.645
Exercise 6b

1. If Z ~ N(0, 1), find a if
(a) P(Z > a) = 0.001 22,
(b) P(Z > a) = 0.0100,
(c) P(Z > a) = 0.025, (d) P(Z > a) = 0.198,
(e) P(Z > a) = 0.481, (f) P(Z > a) = 0.692,
(g) P(Z > a) = 0.812, (h) P(Z > a) = 0.9885.

2. If Z ~ N(0, 1), find a if
(a) P(Z < a) = 0.0003,
(b) P(Z < a) = 0.0296,
(c) P(Z < a) = 0.325, (d) P(Z < a) = 0.506,
(e) P(Z < a) = 0.787, (f) P(Z < a) = 0.891,
(g) P(Z < a) = 0.8297,
(h) P(Z < a) = 0.9738.

3. If Z ~ N(0, 1), find a if
(a) P(|Z| < a) = 0.6372,
(b) P(|Z| > a) = 0.097,
(c) P(|Z| < a) = 0.5,
(d) P(|Z| > a) = 0.0404.

4. If Z ~ N(0, 1), find the upper quartile and the lower quartile of the distribution. Find also the 70th percentile.

5. If Z ~ N(0, 1), find a if P(|Z| > a) takes the value (a) 10%, (b) 5%, (c) 4%, (d) 2%, (e) 1%, (f) 0.5%.

6. If Z ~ N(0, 1), find a if P(|Z| < a) takes the value (a) 80%, (b) 96%, (c) 97%, (d) 99%.

USE OF THE STANDARD NORMAL TABLES FOR ANY NORMAL DISTRIBUTION

We now show how the tables for the standard normal distribution can be adapted for use with any random variable X where X ~ N(μ, σ²).

Example 6.6 The r.v. X ~ N(300, 25). Find (a) P(X > 305), (b) P(X < 291), (c) P(X < 312), (d) P(X > 286).

Solution 6.6 (a) P(X > 305).
X ~ N(300, 25), so s.d. = 5.
First we have to standardise the random variable X by subtracting the mean, 300, and dividing by the standard deviation (s.d.), 5, so that $Z = \frac{X - 300}{5}$.

We also use the following properties of inequalities:
if X > 305 then X − 300 > 305 − 300 and $\frac{X-300}{5} > \frac{305-300}{5}$.

So
$$P(X > 305) = P\left(\frac{X-300}{5} > \frac{305-300}{5}\right) = P(Z > 1)$$
$$= 1 - \Phi(1) = 1 - 0.8413 = 0.1587$$
Therefore P(X > 305) = 0.1587.

NOTE: if the two curves had been drawn to scale, the curve for X would have been much more spread out and not as steep as the curve for Z. However, for convenience of drawing, we use the same sketch.
Often, again for convenience, we draw one sketch and write the values of the standardised variable underneath the x values. We use the abbreviation S.V. for 'standardised variable'.

(b)
$$P(X < 291) = P\left(\frac{X-300}{5} < \frac{291-300}{5}\right) = P(Z < -1.8)$$
$$= 1 - \Phi(1.8) = 1 - 0.9641 = 0.0359$$
Therefore P(X < 291) = 0.0359.

(c)
$$P(X < 312) = P\left(\frac{X-300}{5} < \frac{312-300}{5}\right) = P(Z < 2.4) = \Phi(2.4) = 0.9918$$
Therefore P(X < 312) = 0.9918.

(d)
$$P(X > 286) = P\left(\frac{X-300}{5} > \frac{286-300}{5}\right) = P(Z > -2.8) = \Phi(2.8) = 0.997\,44$$
Therefore P(X > 286) = 0.997 44.
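
The standardising step in Example 6.6 can be mirrored in code. In this sketch (an illustration, with the helper name standardise an assumption) the probability is found twice: once through Z, as in the text, and once directly from the distribution of X. Note that statistics.NormalDist is constructed from the standard deviation, not the variance, so N(300, 25) becomes NormalDist(mu=300, sigma=5).

```python
from statistics import NormalDist

X = NormalDist(mu=300, sigma=5)      # variance 25, so standard deviation 5
Z = NormalDist()                     # standard normal

def standardise(x, mu, sigma):
    """Return z = (x - mu) / sigma."""
    return (x - mu) / sigma

z = standardise(305, 300, 5)
p_via_z = 1 - Z.cdf(z)               # P(Z > 1), the route taken in the text
p_direct = 1 - X.cdf(305)            # P(X > 305) without standardising

print(z, round(p_via_z, 4), round(p_direct, 4))
print(round(X.cdf(291), 4), round(X.cdf(312), 4), round(1 - X.cdf(286), 5))
```

Both routes give the same probability, which is the point of standardising: every normal variable can be referred to the one table for Z.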

Example 6.7 The r.v. X is such that X ~ N(50, 8). Find (a) P(48 < X < 54), (b) P(52 < X < 55), (c) P(46 < X < 49), (d) P(|X − 50| < √8).

Solution 6.7 Standardise X so that $Z = \frac{X - 50}{\sqrt{8}}$.

(a)
$$P(48 < X < 54) = P\left(\frac{48-50}{\sqrt{8}} < \frac{X-50}{\sqrt{8}} < \frac{54-50}{\sqrt{8}}\right) = P(-0.707 < Z < 1.414)$$
$$= \Phi(1.414) + \Phi(0.707) - 1 = 0.9213 + 0.7601 - 1 = 0.6814$$
Therefore P(48 < X < 54) = 0.6814.

(b)
$$P(52 < X < 55) = P\left(\frac{52-50}{\sqrt{8}} < \frac{X-50}{\sqrt{8}} < \frac{55-50}{\sqrt{8}}\right) = P(0.707 < Z < 1.768)$$
$$= \Phi(1.768) - \Phi(0.707) = 0.9615 - 0.7601 = 0.2014$$
Therefore P(52 < X < 55) = 0.2014.

(c)
$$P(46 < X < 49) = P\left(\frac{46-50}{\sqrt{8}} < \frac{X-50}{\sqrt{8}} < \frac{49-50}{\sqrt{8}}\right) = P(-1.414 < Z < -0.354)$$
$$= \Phi(1.414) - \Phi(0.354) = 0.9213 - 0.6383 = 0.283$$
Therefore P(46 < X < 49) = 0.283.

(d)
$$P(|X - 50| < \sqrt{8}) = P(-\sqrt{8} < X - 50 < \sqrt{8}) = P\left(-1 < \frac{X-50}{\sqrt{8}} < 1\right) = P(-1 < Z < 1)$$
$$= 2\Phi(1) - 1 = 2(0.8413) - 1 = 0.6826$$
Therefore P(|X − 50| < √8) = 0.6826.

Example 6.8 The time taken by a milkman to deliver milk to the High Street is normally distributed with mean 12 minutes and standard deviation 2 minutes. He delivers milk every day. Estimate the number of days during the year when he takes (a) longer than 17 minutes, (b) less than 10 minutes, (c) between 9 and 13 minutes.

Solution 6.8 Let X be the r.v. 'the time taken to deliver the milk to the High Street'. Then X ~ N(12, 2²).
We standardise X so that $Z = \frac{X - 12}{2}$.

(a)
$$P(X > 17) = P\left(\frac{X-12}{2} > \frac{17-12}{2}\right) = P(Z > 2.5) = 1 - \Phi(2.5) = 1 - 0.993\,79 = 0.006\,21$$
The number of days when he takes longer than 17 minutes
= 365(0.006 21)
= 2.27
≈ 2
Therefore on approximately 2 days in the year he takes longer than 17 minutes.

(b)
$$P(X < 10) = P\left(\frac{X-12}{2} < \frac{10-12}{2}\right) = P(Z < -1) = 1 - \Phi(1) = 1 - 0.8413 = 0.1587$$
The number of days when he takes less than 10 minutes
= 365(0.1587)
= 57.9
≈ 58
Therefore on approximately 58 days in the year he takes less than 10 minutes.

(c)
$$P(9 < X < 13) = P\left(\frac{9-12}{2} < \frac{X-12}{2} < \frac{13-12}{2}\right) = P(-1.5 < Z < 0.5)$$
$$= \Phi(0.5) + \Phi(1.5) - 1 = 0.6915 + 0.9332 - 1 = 0.6247$$
The number of days when he takes between 9 and 13 minutes
= 365(0.6247)
= 228 days
Therefore on 228 days he takes between 9 and 13 minutes.
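
Estimating a number of days, as in Example 6.8, is a matter of multiplying each probability by 365. The sketch below (illustrative only; the variable names are assumptions) carries out the three calculations with statistics.NormalDist.

```python
from statistics import NormalDist

T = NormalDist(mu=12, sigma=2)       # delivery time in minutes
DAYS = 365

longer_than_17 = DAYS * (1 - T.cdf(17))
less_than_10 = DAYS * T.cdf(10)
between_9_and_13 = DAYS * (T.cdf(13) - T.cdf(9))

for label, days in [("longer than 17 min", longer_than_17),
                    ("less than 10 min", less_than_10),
                    ("between 9 and 13 min", between_9_and_13)]:
    print(f"{label}: {days:.1f} days, about {round(days)}")
```

Each expected count is rounded to the nearest whole day only at the end, as in the worked solution.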

1. If X~N(300, 25), find (a) P(X > 308), 6. If X ~N(84,12), find (a) P(80< X < 89),
(b) P(X > 311.5), (c) P(X > 294), (b) P(X <79 or X > 92),
(d) P(X > 290.5), (e) P(X < 302), (c) (76 <X <82),
(f) P(X < 312), (g) P(X < 299.5), (d) P(| X—84|> 2.9), (e) P(87 <X< 98).

(ny F(A = 293); 7. IfX~N(2,0.3), find


2. If xX ~N(50, 20), find (a) P(X> 60.3), (a) P(1.8 << X < 2.9),
(b) P(X < 47.3), (c) P(X > 48.9), (b) P(2.01<X< 2.8),
(d) P(X > 53.5), (e) P(X < 59.8), (c) P(|X—21< 2/0.3).
(f) P(X < 62.3). . a
8. Packages from a packing machine have a
3. If X~N(—8,12), find (a) P.X<— 9.8), mass which is normally distributed with
(b) P(X > 0), (ce) PX <= 34), mean 200 g and standard deviation 2 g.
(DPC bi) le) PRC 11058) Find the probability that a package from
(f) P(X >—1.6), (g) P(X >—8.2). the machine weighs (a) less than 197 g,
ie :
4. IfX~N(a,a’), find (a) P(X <0), (b) more than 200.5 g, (c) between 198.5 g
(b) P(X > 0), (c) P(X > 4a), ead
8a 5a 9./ The heights of boys at a particular age
(¢) (xZi 2 | tz) |x ~ 2 follow a normal distribution with mean
150.3 cm and standard deviation 5 cm.
5. IfX~N(100,80), find Find the probability that a boy picked at
(a) P(85<.X < 112), random from this age group has height
(b) P(105 <<X <115), (a) less than 153 cm, (b) less than 148 cm,
(c) P(85<X <92), (c) more than 158 cm, (d) more than
(d) P(| X—100| < V0), 144cm, (e) between 147 cm and 149.5 cm,
(e) P(99 <X< 105). (f) between 150 cm and 158 cm.
10. A random variable X is such that 1 kg and standard deviation 0.15 kg. Ina
X ~ N(—5,9). Find the probability that lorry load of 800’of these cabbages,
(a) an item chosen at random will have a estimate how many will have mass
positive value, (b) out of 10 items chosen
at random, just 4 will have a positive value. (a) greater than 0.79 kg,
(b) less than 1.18 kg,
11. A certain type of cabbage has a mass (c) between 0.85 kg and 1.15 kg,
which is normally distributed with mean (d) between 0.75 kg and 1.29 kg.

De-standardising
Sometimes it is necessary to find a value X which corresponds to the standardised value Z. We use $Z = \frac{X - \mu}{\sigma}$ so that X = μ + σZ.

Example 6.9 If X ~ N(50, 6.8), find the value of X which corresponds to a standardised value of (a) −1.2, (b) 0.6.

Solution 6.9 Now X = μ + σZ, where μ = 50 and σ = √6.8, so that X = 50 + √6.8 Z.

(a) When z = −1.2,
x = 50 + √6.8(−1.2) = 46.87 (2 d.p.)

(b) When z = 0.6,
x = 50 + √6.8(0.6) = 51.56 (2 d.p.)
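
De-standardising is a one-line calculation, sketched below for the values in Example 6.9. The function name destandardise is an assumption; it takes the variance, as in the N(μ, σ²) notation, and takes the square root internally.

```python
import math

def destandardise(z, mu, variance):
    """Return x = mu + sigma*z, where sigma = sqrt(variance)."""
    return mu + math.sqrt(variance) * z

print(round(destandardise(-1.2, 50, 6.8), 2))
print(round(destandardise(0.6, 50, 6.8), 2))
```

Passing the variance and not the standard deviation guards against the common slip of using 6.8 itself as σ.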

Exercise 6d

Find the value of X which corresponds to a standardised value of (a) −2.05, (b) 0.86 for each of the following distributions:
(i) X ~ N(60, 17), (ii) X ~ N(124, 3.2²),
(iii) X ~ N(84.5, 50), (iv) X ~ N(62.3, 38),
(v) X ~ N(μ, σ²), (vi) X ~ N(a, b),
(vii) X ~ N(a, a²), (viii) X ~ N(49, 49).

Example 6.10 If X ~ N(100, 36) and P(X > a) = 0.1093, find the value of a.

Solution 6.10 As P(X > a) is less than 0.5, a must be greater than the mean, 100.
Now P(X > a) = 0.1093
so
$$P\left(\frac{X-100}{6} > \frac{a-100}{6}\right) = 0.1093$$
i.e.
$$P\left(Z > \frac{a-100}{6}\right) = 0.1093$$
We have
$$\Phi\left(\frac{a-100}{6}\right) = 1 - 0.1093 = 0.8907$$
But from tables,
$$\Phi(1.23) = 0.8907$$
Therefore
$$\frac{a-100}{6} = 1.23$$
$$a = 100 + 6(1.23) = 107.38$$

Therefore, if P(X > a) = 0.1093, then a = 107.38.



Example 6.11 If X ~ N(24, 9) and P(X > a) = 0.974, find the value of a.

Solution 6.11 As P(X > a) is greater than 0.5, a must be less than the mean, 24.
Now P(X > a) = 0.974
so
$$P\left(\frac{X-24}{3} > \frac{a-24}{3}\right) = 0.974$$
i.e.
$$P\left(Z > \frac{a-24}{3}\right) = 0.974$$
Now $\frac{a-24}{3}$ must be negative and
$$\Phi\left(-\frac{a-24}{3}\right) = 0.974$$
$$-\left(\frac{a-24}{3}\right) = 1.943$$
$$\frac{a-24}{3} = -1.943$$
$$a = 24 - (3)(1.943) = 18.171$$
Therefore, if P(X > a) = 0.974, then a = 18.171.

Example 6.12 If X ~ N(70, 25), find the value of a such that P(|X − 70| < a) = 0.8. Hence find the limits within which the central 80% of the distribution lies.

Solution 6.12 P(|X − 70| < a) = 0.8

Therefore
$$P(-a < X - 70 < a) = 0.8$$
$$P\left(-\frac{a}{5} < \frac{X-70}{5} < \frac{a}{5}\right) = 0.8$$
$$P\left(-\frac{a}{5} < Z < \frac{a}{5}\right) = 0.8$$
Now, by symmetry
$$2\Phi\left(\frac{a}{5}\right) - 1 = 0.8$$
$$2\Phi\left(\frac{a}{5}\right) = 1.8$$
$$\Phi\left(\frac{a}{5}\right) = 0.9$$
Therefore
$$\frac{a}{5} = 1.282$$
$$a = 6.41$$

So P(−6.41 < X − 70 < 6.41) = 0.8
or P(63.59 < X < 76.41) = 0.8
The central 80% of the distribution lies between 63.59 and 76.41.
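
The method of Example 6.12 generalises to any central proportion. The sketch below is an illustration; the function name central_limits is an assumption, and inv_cdf from statistics.NormalDist plays the role of reading the Φ(z) table in reverse.

```python
from statistics import NormalDist

def central_limits(mu, sigma, proportion):
    """Limits symmetric about mu that enclose the given central proportion."""
    # By symmetry, Phi(z) = (1 + proportion) / 2 at the upper limit.
    z = NormalDist().inv_cdf((1 + proportion) / 2)
    return mu - sigma * z, mu + sigma * z

low, high = central_limits(70, 5, 0.8)
print(round(low, 2), round(high, 2))
```

The same function with proportion 0.95 or 0.99 reproduces the ±1.96 and ±2.575 results for Z when mu = 0 and sigma = 1.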

Exercise 6e

1. If X ~ N(60, 25) and if
(i) P(X > a) = 0.2324, find a,
(ii) P(X > b) = 0.0702, find b,
(iii) P(X > c) = 0.837, find c,
(iv) P(X > d) = 0.7461, find d.

2. If X ~ N(45, 16) and if
(i) P(X < a) = 0.0317, find a,
(ii) P(X < b) = 0.895, find b,
(iii) P(X < c) = 0.0456, find c,
(iv) P(X < d) = 0.996, find d.

3. If X ~ N(80, 36), find c such that P(|X − 80| < c) = 0.9 and hence find the limits within which the central 90% of the distribution lies.

4. If X ~ N(400, 64), find
(i) a such that P(|X − 400| < a) = 0.75,
(ii) b such that P(|X − 400| < b) = 0.98,
(iii) c such that P(|X − 400| < c) = 0.95,
(iv) d such that P(|X − 400| < d) = 0.975,
(v) the limits within which the central 95% of the distribution lies.

5. The masses of cos lettuces sold at a hypermarket are normally distributed with mean mass 600 g and standard deviation 20 g.
(a) If a lettuce is chosen at random, find the probability that its mass lies between 570 g and 610 g.
(b) Find the mass exceeded by 7% of the lettuces.
(c) In one day, 1000 lettuces are sold. Estimate how many weigh less than 545 g.

6. The marks of 500 candidates in an examination are normally distributed with a mean of 45 marks and a standard deviation of 20 marks.
(a) Given that the pass mark is 41, estimate the number of candidates who passed the examination.
(b) If 5% of the candidates obtain a distinction by scoring x marks or more, estimate the value of x.
(c) Estimate the interquartile range of the distribution. (L Additional)

7. If X ~ N(k, k²), find
(i) a such that P(|X − k| < ak) = 0.9,
(ii) b such that P(|X − k| > bk) = 0.01,
(iii) c such that P(|X − k| > ck) = 0.05,
(iv) d such that P(|X − k| < dk) = 0.995.

8. Bags of flour packed by a particular machine have masses which are normally distributed with mean 500 g and standard deviation 20 g. 2% of the bags are rejected for being underweight and 1% of the bags are rejected for being overweight. Between what range of values should the mass of a bag of flour lie if it is to be accepted?

9. A sample of 100 apples is taken from a Determine the mean and standard devia-
load. The apples have the following distri- tion of these diameters.

bution of sizes Assuming that the distribution is approxi-


mately normal with this mean and this
Diameter to nearest standard deviation find the range of size of
67
8 oer oO apples for packing, if 5% are to be rejected
cm
as too small and 5% are to be rejected as
TL -2ir38i i eas too large. (O &C)

PROBLEMS THAT INVOLVE FINDING THE VALUE OF μ OR σ OR BOTH

Example 6.13 The lengths of certain items follow a normal distribution with mean μ cm and standard deviation 6 cm. It is known that 4.78% of the items have a length greater than 82 cm. Find the value of the mean μ.

Solution 6.13 Let X be the r.v. 'the length of an item in cm'.

X ~ N(μ, 36) and P(X > 82) = 0.0478.

Now
$$P(X > 82) = P\left(\frac{X-\mu}{6} > \frac{82-\mu}{6}\right) = P\left(Z > \frac{82-\mu}{6}\right)$$
so
$$1 - \Phi\left(\frac{82-\mu}{6}\right) = 0.0478$$
$$\Phi\left(\frac{82-\mu}{6}\right) = 0.9522$$

But from tables
$$\Phi(1.667) = 0.9522$$
so
$$\frac{82-\mu}{6} = 1.667$$
$$82 - \mu = 10.002$$
$$\mu = 72 \ \text{(2 S.F.)}$$
The mean of the distribution is 72 cm.

Example 6.14 X ~ N(100, σ²) and P(X < 106) = 0.8849. Find the standard deviation, σ.

Solution 6.14 P(X < 106) = 0.8849

$$P\left(\frac{X-100}{\sigma} < \frac{106-100}{\sigma}\right) = 0.8849$$
$$P\left(Z < \frac{6}{\sigma}\right) = 0.8849$$
$$\Phi\left(\frac{6}{\sigma}\right) = 0.8849$$
But from tables
$$\Phi(1.2) = 0.8849$$
Therefore
$$\frac{6}{\sigma} = 1.2$$
$$\sigma = \frac{6}{1.2} = 5$$
The standard deviation of the distribution is 5.

Example 6.15 The masses of articles produced in a particular workshop are


normally distributed with mean yp and standard deviation o. 5% of
the articles have a mass greater than 85 g and 10% have a mass less
than 25g. Find the values of yu and o, and find the range symmetrical
about the mean, within which 75% of the masses lie.

Solution 6.15 Let X be the r.v. 'the mass, in g, of an article'. Then X ~ N(μ, σ²)
where μ and σ are unknown.

Now P(X > 85) = 0.05

P((X − μ)/σ > (85 − μ)/σ) = 0.05

P(Z > (85 − μ)/σ) = 0.05

so Φ((85 − μ)/σ) = 0.95

But from tables

Φ(1.645) = 0.95

Therefore (85 − μ)/σ = 1.645

85 − μ = 1.645σ (i)

Also P(X < 25) = 0.10

P(Z < (25 − μ)/σ) = 0.10

But (25 − μ)/σ is negative, and by symmetry,

Φ((μ − 25)/σ) = 0.90

From tables

Φ(1.282) = 0.9

Therefore (μ − 25)/σ = 1.282

i.e. μ − 25 = 1.282σ (ii)

Adding (i) and (ii) we have

60 = 2.927σ
σ = 20.5 (3 S.F.)

Substituting for σ in (ii)

μ = 25 + (1.282)(20.5)
= 51.3 (3 S.F.)

Therefore the distribution has mean mass 51.3 g and standard
deviation 20.5 g.

Now consider values a and b such that

P(a < X < b) = 0.75

and a and b are symmetrical about the mean.

[Diagram: normal curve with mean 51.3 and s.d. 20.5; central area 75% between a and b, with 12.5% in each tail.]

Now P(X > b) = 0.125

P((X − 51.3)/20.5 > (b − 51.3)/20.5) = 0.125

So Φ((b − 51.3)/20.5) = 0.875

But from tables Φ(1.15) = 0.875

Therefore (b − 51.3)/20.5 = 1.15

b = 51.3 + (20.5)(1.15) = 74.9 (3 S.F.)
From symmetry a = 51.3 − (20.5)(1.15) = 27.7 (3 S.F.)
Therefore, the central 75% of the distribution lies between the limits
27.7g and 74.9 g.
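
The two simultaneous equations in Solution 6.15 can be checked with a short calculation. This is a sketch only, assuming `statistics.NormalDist` replaces the tables; the names `z_upper`, `z_lower` and `z_mid` are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # Z ~ N(0, 1)

# (i)  P(X > 85) = 0.05  gives  (85 - mu) / sigma = z_upper
# (ii) P(X < 25) = 0.10  gives  (mu - 25) / sigma = z_lower (by symmetry)
z_upper = std_normal.inv_cdf(0.95)
z_lower = std_normal.inv_cdf(0.90)

# Adding (i) and (ii): 60 = (z_upper + z_lower) * sigma
sigma = (85 - 25) / (z_upper + z_lower)
mu = 25 + z_lower * sigma

# Central 75%: 12.5% in each tail, so Phi(z_mid) = 0.875
z_mid = std_normal.inv_cdf(0.875)
a, b = mu - z_mid * sigma, mu + z_mid * sigma

print(round(mu, 1), round(sigma, 1), round(a, 1), round(b, 1))
```

Because the code does not round intermediate values, its results may differ from the hand working in the last figure.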

Exercise 6f

1. X ~ N(45, σ²) and P(X > 51) = 0.288. Find σ.

2. X ~ N(21, σ²) and P(X < 27) = 0.9332. Find σ.

3. X ~ N(μ, 25) and P(X < 27.5) = 0.3085. Find μ.

4. X ~ N(μ, 12) and P(X > 32) = 0.8438. Find μ.

5. X ~ N(μ, σ²) and P(X > 80) = 0.0113, P(X > 30) = 0.9713. Find μ and σ.

6. X ~ N(μ, σ²) and P(X > 102) = 0.42, P(X < 97) = 0.25. Find μ and σ.

7. X ~ N(μ, σ²) and P(X < 57.84) = 0.90, P(X > 50) = 0.5. Find μ and σ.

8. X ~ N(μ, σ²) and P(X < 35) = 0.2, P(35 < X < 45) = 0.65. Find μ and σ.

9. The marks in an examination were normally distributed with mean μ and
standard deviation σ. 10% of the candidates had more than 75 marks and 20%
had less than 40 marks. Find the values of μ and σ.

10. The lengths of rods produced in a workshop follow a normal distribution
with mean μ and variance 4. 10% of the rods are less than 17.4 cm long. Find
the probability that a rod chosen at random will be between 18 and 23 cm long.

11. A man cuts hazel twigs to make bean poles. He says that a stick is 240 cm
long. In fact, the length of the stick follows a normal distribution and 10%
are of length 250 cm or more while 55% have a length over 240 cm. Find the
probability that a stick, picked at random, is less than 235 cm long.

12. The diameters of bolts produced by a particular machine follow a normal
distribution with mean 1.34 cm and standard deviation 0.04 cm. A bolt is
rejected if its diameter is less than 1.24 cm or more than 1.40 cm.
(a) Find the percentage of bolts which are accepted.
The setting of the machine is altered so that the mean diameter changes but
the standard deviation remains the same. With the new setting, 3% of the bolts
are rejected because they are too large in diameter.
(b) Find the new mean diameter of the bolts produced by the machine.
(c) Find the percentage of bolts which are rejected because they are too small
in diameter.

13. A certain make of car tyre can be safely used for 25 000 km on average
before it is replaced. The makers guarantee to pay compensation to anyone
whose tyre does not last for 22 000 km. They expect 7.5% of all tyres sold to
qualify for compensation. Assuming that the distance, X, travelled before a
tyre is replaced has a normal probability distribution, draw a diagram
illustrating the facts given above.
Calculate, to 3 significant figures, the standard deviation of X.
Estimate the number of tyres per 1000 which will not have been replaced when
they have covered 26 500 km. (L Additional)

14. A cutting machine produces steel rods which must not be more than 100 cm
in length. The mean length of a large batch of rods taken from the machine is
found to be 99.80 cm and the standard deviation of these lengths is 0.15 cm.
(a) Assuming that the lengths of the rods are normally distributed, calculate,
to one decimal place, the percentage of rods which are too long.
(b) The position of the cut can be adjusted without altering the standard
deviation of the lengths. Calculate in cm, to 2 decimal places, how small the
mean length should be if no more than 2% of the rods are to be rejected for
being longer than 100 cm.
(c) If the mean length is maintained at 99.80 cm, calculate, to the nearest
tenth of a mm, by how much the standard deviation must be reduced if no more
than 4% of the rods are to be rejected for being longer than 100 cm.
(L Additional)

15. The continuous random variable X is normally distributed with mean μ
and standard deviation σ. Given that P(X < 53) = 0.04 and P(X < 65) = 0.97,
find the interquartile range of the distribution.

16. Tea is sold in packages marked 750 g. The masses of the packages are
normally distributed with mean 760 g, standard deviation σ. What is the
maximum value of σ if less than 1% of the packages are underweight?

NOTE: In solutions involving the normal distribution we will now
omit the line of working involving Φ and the solutions should
appear the same whether the reader is using the standard normal
tables giving Φ(Z) or Q(Z).

MISCELLANEOUS WORKED EXAMPLES

Example 6.16 Tests on 2 types of electric light bulb show the following:
Type A, lifetime distributed normally with an average life of 1150
hours and a standard deviation of 30 hours.
Type B, long-life bulb, average lifetime of 1900 hours, with standard
deviation of 50 hours.
(a) What percentage of bulbs of type A could be expected to have
a life of more than 1200 hours?
(b) What percentage of type B would you expect to last longer
than 1800 hours?
(c) What lifetime limits would you estimate would contain the
central 80% of the production of type A? (SUJB)

Solution 6.16 (a) Let X be the r.v. 'the length of life in hours of type A bulb'.
Then X ~ N(1150, 30²).

P(X > 1200) = P((X − 1150)/30 > (1200 − 1150)/30)
= P(Z > 1.667)
= 0.0478

[Diagram: Type A, X ~ N(1150, 30²), s.d. = 30; area to the right of 1200; S.V. 0 and 1.667.]

Therefore 4.78% of type A bulbs could be expected to have a life
of more than 1200 hours.

(b) Let Y be the r.v. 'the length of life in hours of type B bulb'.
Then Y ~ N(1900, 50²).

P(Y > 1800) = P((Y − 1900)/50 > (1800 − 1900)/50)
= P(Z > −2)
= 0.9772

[Diagram: Y ~ N(1900, 50²); area to the right of 1800; S.V. −2 and 0.]

Therefore 97.72% of type B bulbs could be expected to have a life
of more than 1800 hours.

(c) Now, from tables Φ(1.282) = 0.90 or Q(1.282) = 0.10.

So that the central 80% of the standard normal distribution lies
between the limits ±1.282.

We wish to find the values of X which correspond to the standardised
values of ±1.282, so we use X = μ + σZ.
So the central 80% of the distribution lies
between the limits μ ± 1.282σ.
For type A, the limits are
1150 ± (1.282)(30) = 1150 ± 38.46
i.e. (1111.54, 1188.46)

[Diagram: central 80% between μ − 1.282σ and μ + 1.282σ; S.V. −1.282, 0, 1.282.]

We estimate that the limits (1111.54 hours, 1188.46 hours) would
contain the central 80% of the production of type A.
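
As a cross-check on Solution 6.16, the three parts can be computed directly. The sketch below assumes `statistics.NormalDist` in place of the tables; the variable names are illustrative only.

```python
from statistics import NormalDist

type_a = NormalDist(mu=1150, sigma=30)   # lifetime of a type A bulb, in hours
type_b = NormalDist(mu=1900, sigma=50)   # lifetime of a type B bulb, in hours

# (a) P(X > 1200) and (b) P(Y > 1800) as upper-tail probabilities
p_a = 1 - type_a.cdf(1200)
p_b = 1 - type_b.cdf(1800)

# (c) central 80% of type A: 10% in each tail
lower, upper = type_a.inv_cdf(0.10), type_a.inv_cdf(0.90)

print(round(p_a, 4), round(p_b, 4), round(lower, 2), round(upper, 2))
```

The limits in (c) may differ slightly from 1111.54 and 1188.46 because the tables round the standardised value to 1.282.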

Example 6.17 A machine is producing components whose lengths are normally
distributed about a mean of 6.50 cm. An upper tolerance limit of
6.54 cm has been adopted and, when the machine is correctly set,
1 in 20 components is rejected as exceeding this limit. On a certain
day, it is found that 1 in 15 components is rejected for exceeding
this limit.

(a) Assuming that the mean has not changed but that the production
has become more variable, estimate the new standard
deviation.

(b) Assuming that the standard deviation has not changed but that
the mean has moved, estimate the new mean.

(c) If 1000 components are produced in a shift, how many of them
may be expected to have lengths in the range 6.48 to 6.53 cm
if the machine is set as in (a)? (AEB 1972)

Solution 6.17 (a) Let X be the r.v. 'the length in cm of a component'.
Then X ~ N(6.50, σ²) where σ is the new standard deviation.

P(X > 6.54) = 1/15 = 0.0667

so P((X − 6.50)/σ > (6.54 − 6.50)/σ) = 0.0667

P(Z > 0.04/σ) = 0.0667

therefore 0.04/σ = 1.501

σ = 0.0266 (3 S.F.)

[Diagram: upper tail of 6.67% beyond 6.54, mean 6.50; S.V. 0 and 1.501.]

The new standard deviation is 0.0266 cm (3 S.F.).

(b) Let the original standard deviation be σ₁.
Then X ~ N(6.50, σ₁²) originally.

P(X > 6.54) = 1/20 = 0.05

P((X − 6.50)/σ₁ > (6.54 − 6.50)/σ₁) = 0.05

P(Z > 0.04/σ₁) = 0.05

therefore 0.04/σ₁ = 1.645

σ₁ = 0.0243 (3 S.F.)

[Diagram: upper tail of 5% beyond 6.54, mean 6.50; S.V. 0 and 1.645.]

Now suppose that the new mean is μ.
So X ~ N(μ, 0.0243²).

P(X > 6.54) = 0.0667

P((X − μ)/0.0243 > (6.54 − μ)/0.0243) = 0.0667

P(Z > (6.54 − μ)/0.0243) = 0.0667

therefore (6.54 − μ)/0.0243 = 1.501

μ = 6.54 − (1.501)(0.0243)
= 6.504 (3 d.p.)

Therefore the new mean is 6.504 cm (3 d.p.).

(c) If the machine is set as in part (a) then X ~ N(6.50, 0.0266²).

P(6.48 < X < 6.53) = P((6.48 − 6.50)/0.0266 < (X − 6.50)/0.0266 < (6.53 − 6.50)/0.0266)
= P(−0.752 < Z < 1.128)
= 0.6442

[Diagram: s.d. = 0.0266; area between 6.48 and 6.53, mean 6.50; S.V. −0.752, 0, 1.128.]

So for 1000 components we expect (0.644)(1000) = 644 to have
lengths in the range 6.48 to 6.53 cm.
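
The three parts of Solution 6.17 can be reproduced numerically. This is a sketch under the assumption that `statistics.NormalDist` stands in for the tables; names such as `sigma_new` and `mu_new` are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()
limit, mean = 6.54, 6.50

# (a) mean unchanged, 1 in 15 rejected: 0.04 / sigma = z with upper tail 1/15
sigma_new = (limit - mean) / std_normal.inv_cdf(1 - 1 / 15)

# (b) original s.d. from 1 in 20 rejected, then the shifted mean for 1 in 15
sigma_orig = (limit - mean) / std_normal.inv_cdf(1 - 1 / 20)
mu_new = limit - std_normal.inv_cdf(1 - 1 / 15) * sigma_orig

# (c) machine set as in (a): expected number out of 1000 between 6.48 and 6.53
setting_a = NormalDist(mean, sigma_new)
expected = 1000 * (setting_a.cdf(6.53) - setting_a.cdf(6.48))

print(round(sigma_new, 4), round(sigma_orig, 4), round(mu_new, 3), round(expected))
```

The expected count comes out within a component or two of 644; the gap arises because the hand working rounds σ to 0.0266 before standardising.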

Miscellaneous Exercise 6g

1. Batteries for a transistor radio have a mean life under normal usage of
160 hours, with a standard deviation of 30 hours. Assuming that battery life
follows a normal distribution,
(a) calculate the percentage of batteries which have a life between 150 hours
and 180 hours;
(b) calculate the range, symmetrical about the mean, within which 75% of the
battery lives lie;
(c) if a radio takes four of these batteries and requires all of them to be
working, calculate the probability that the radio will run for at least
135 hours. (O&C)

2. (a) Without using a calculator, calculate the mean and standard deviation
of the numbers: 2, 3, 5, 5, 8, 11, 11, 11.
(b) A machine produces components in batches of 20 000, the lengths of which
may be considered to be normally distributed.
At the beginning of production, the machine is set to produce the required
mean length of components at 15 mm, and it can then be set to give any one of
three standard deviations: 0.06 mm, 0.075 mm, 0.09 mm.
It costs £850, £550 and £100 respectively to set these deviations.
Any length produced must lie in the range 14.82 mm to 15.18 mm, otherwise it
is classed as defective and costs the company £1.
Which standard deviation should be used, if the decision is to be made purely
on the cost of setting the machine and of the defectives? (SUJB)

3. Six hundred rounds are fired from a gun at a horizontal target 50 m long
which extends from 950 m to 1000 m in range from the gun. The trajectories of
the rounds all lie in the vertical plane through the gun and the target. It is
found that 27 rounds fall short of the target and 69 rounds fall beyond it.
Assuming that the range of rounds is normally distributed, find the mean and
standard deviation of the range. Estimate the number of rounds falling within
5 m of the centre of the target. (C)

4. Machine components are mass-produced at a factory. A customer requires that
the components should be 5.2 cm long but they will be acceptable if they are
within limits 5.195 cm to 5.205 cm. The customer tests the components and
finds that 10.75% of those supplied are over-size and 4.95% are under-size.
Find the mean and standard deviation of the lengths of the components supplied
assuming that they are normally distributed.
If three of the components are selected at random what is the probability that
one is under-size, one over-size and one satisfactory?
If the standard deviation of the machine producing the components is altered
without altering the mean so that 4.95% are over-size, what will be the new
standard deviation and what percentage of components will now be under-size?
(SUJB)

5. A marketing organisation grades onions into 3 sizes: small (diameter less
than 60 mm), medium (diameter between 60 mm and 80 mm) and large (diameter
greater than 80 mm). A certain grower finds that 61% of his crop falls into
the small category and 14% into the large category. Assuming that the
distribution of diameters of the onions in his crop is described by a Normal
probability function, sketch a graph showing the information given above.
On this basis, calculate the standard deviation and the mean of the diameters
of the onions in his crop. (SMP)

6. Packets of semolina are nominally 226 g in weight. The actual weights have
a Normal distribution with μ = 230.00 g and σ = 1.50 g. What is the
probability that a packet is underweight?
A decision is taken that the probability of an underweight packet should not
exceed 0.001. To change the distribution of weights of the semolina packets to
conform to this decision, two methods are considered:
(a) to increase μ, leaving σ unaltered;
(b) to improve the packing machine, thus reducing σ, while leaving μ unaltered.
Find the new values (of μ and of σ respectively) required for each method to
succeed, given that, for the standardised Normal distribution,
P(Z > 3.0902) = 0.0010 (SMP)

7. A factory is illuminated by 2000 bulbs. The lives of these bulbs are
normally distributed with a mean of 550 hours and a standard deviation of
50 hours. It is decided to replace all the bulbs at such intervals of time
that only about 20 bulbs are likely to fail during each interval. How
frequently should the bulbs be changed?
When the manufacturing process is improved so that the mean life of bulbs is
increased to 600 hours and the standard deviation is reduced to 40 hours, the
replacement interval is changed to 500 hours. Show that it will now be
necessary to tolerate the failure of only about 12 bulbs per interval.
(AEB 1973)

8. Before joining the Egghead Society, every candidate is given an
intelligence test which, applied to the general public, would give a normal
distribution of I.Q.s with mean 100 and standard deviation 20. The candidate
is not admitted unless his I.Q., as given by the test, is at least 130.
Estimate the median I.Q. of the members of the Egghead Society, assuming that
their I.Q. distribution is representative of that of the part of the
population having I.Q.s greater than, or equal to, 130.
What I.Q. would be expected to be exceeded by one member in ten of the
society? (AEB)

9. The acidity of each of 100 random samples of soil from an area of land was
measured and the results given in Table A below.
Assuming that the pH values are determined correct to the nearest tenth of a
unit construct a cumulative frequency curve to illustrate the distribution.
A possible measure of kurtosis (i.e. flatness) is given by

k = Q/(P₉₀ − P₁₀)

where Q is the semi-interquartile range, P₉₀ the 90th percentile and P₁₀ the
10th percentile. Estimate the value of k for the above distribution.
Use the standard normal table (p. 633) to estimate the value of k for a Normal
distribution. Is the above distribution flatter than a Normal distribution
with the same total frequency? (SUJB)

Table A

Acidity (pH)      4.6–  4.8–  5.0–  5.2–  5.4–  5.6–  5.8–  6.0–6.2
No. of samples

10. Describe the principal features of a normal distribution. Draw a sketch of
the probability density function of the distribution N(0, 1).
A machine is producing a type of circular gasket. The specifications for the
use of these gaskets in the manufacture of a certain make of engine are that
the thickness should lie between 5.45 mm and 5.55 mm, and the diameter should
lie between 8.45 mm and 8.54 mm. The machine is producing the gaskets so that
their thicknesses are N(5.5, 0.0004), that is, normally distributed with mean
5.5 mm and variance 0.0004 mm², and their diameters are independently
distributed N(8.54, 0.0025).
Calculate, to one decimal place, the percentage of gaskets produced which will
not meet
(a) the specified thickness limits,
(b) the specified diameter limits,
(c) the specifications.
Find, to 3 decimal places, the probability that, if 6 gaskets made by the
machine are chosen at random, exactly 5 of them will meet the specifications.
(L)

11. State the conditions under which the binomial distribution is a suitable
model to use in statistical work. Describe briefly how a binomial distribution
was used, or could have been used, in one of your projects giving the
parameters of your distribution.
It is known that bearings produced at a factory have diameters that are
normally distributed with mean 14.2 mm and standard deviation 1.2 mm. Find, to
4 decimal places, the probability that a bearing chosen at random from the
production will have a diameter less than 13.9 mm.
Six bearings are to be chosen at random from the production. Find, to
2 significant figures, the probability that at least 5 of these bearings will
have diameters between 13.9 mm and 14.6 mm. (L)

12. The random variables X₁ and X₂ are both normally distributed such that
X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²). Given that μ₁ < μ₂ and σ₁² < σ₂², sketch
both distributions on the same diagram.
State the '2σ rule' for a normal random variable. Explain how you used, or
could have used, a normal distribution in a project.
The weights of vegetable marrows supplied to retailers by a wholesaler have a
normal distribution with mean 1.5 kg and standard deviation 0.6 kg. The
wholesaler supplies 3 sizes of marrow:
Size 1, under 0.9 kg,
Size 2, from 0.9 kg to 2.4 kg,
Size 3, over 2.4 kg.
Find, to 3 decimal places, the proportions of marrows in the three sizes.
Find, in kg to one decimal place, the weight exceeded on average by 5 marrows
in every 200 supplied.
The prices of the marrows are 16p for Size 1, 40p for Size 2 and 60p for
Size 3. Calculate the expected total cost of 100 marrows chosen at random from
those supplied. (L)

THE NORMAL APPROXIMATION TO THE BINOMIAL DISTRIBUTION

Under certain circumstances the normal distribution can be used as
an approximation to the binomial distribution. One practical advantage
is that calculations are much less tedious to perform.

If X ~ Bin(n, p) then
E(X) = np
Var(X) = npq where q = 1 − p
Now, for large n and p not too small or too large,
X ~ N(np, npq) approximately

Example 6.18 Find the probability of obtaining between 4 and 7 heads inclusive
with 12 tosses of a fair coin,
(a) using the binomial distribution,
(b) using the normal approximation to the binomial distribution.

Solution 6.18 Let X be the r.v. 'the number of heads obtained'. Let 'success' be
'obtaining a head'.
Then X ~ Bin(n, p) where n = 12 and p = P(head) = 1/2

(a) So X ~ Bin(12, 1/2) and P(X = x) = ¹²Cₓ (1/2)^(12−x) (1/2)^x = ¹²Cₓ (1/2)¹²

Now P(X = 4) = ¹²C₄ (1/2)¹² = 0.121

P(X = 5) = ¹²C₅ (1/2)¹² = 0.193

P(X = 6) = ¹²C₆ (1/2)¹² = 0.226

P(X = 7) = ¹²C₇ (1/2)¹² = 0.193

So P(4 ≤ X ≤ 7) = 0.121 + 0.193 + 0.226 + 0.193
= 0.733 (3 d.p.)
The probability of obtaining between 4 and 7 heads inclusive is
0.733 (3 d.p.).

The probability distribution for the number of heads in 12 tosses
has been calculated and is shown below. The required probability is
the sum of the areas of the shaded rectangles. Now this can be
approximated by the area under the corresponding normal curve.

[Diagram: probability distribution of the number of heads in 12 tosses; the rectangles for 4, 5, 6 and 7 heads, from 3.5 to 7.5, are shaded.]

(b) To find the probability of obtaining between 4 and 7 heads
using the normal approximation we consider X such that

X ~ N(np, npq) where n = 12 and p = 1/2
So X ~ N(6, 3)
However, before using the approximation we must take into account
the fact that we are using a continuous distribution to approximate
a discrete variable. So we make a continuity correction.
In this example P(4 ≤ X ≤ 7) transforms to P(3.5 < X < 7.5).
So,
P(3.5 < X < 7.5) = P((3.5 − 6)/√3 < (X − 6)/√3 < (7.5 − 6)/√3)
= P(−1.443 < Z < 0.866)
= 0.732 (3 d.p.)

[Diagram: normal curve with mean 6; area between 3.5 and 7.5; S.V. −1.443, 0, 0.866.]

The probability of obtaining between 4 and 7 heads inclusive is
0.732 (3 d.p.).

NOTE: this answer compares very well with the answer in part (a),
and the working is much quicker to perform.
The approximation is even better for large n and it is preferable that
p is close to 1/2.
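
The comparison in Solution 6.18 can be repeated numerically. The sketch below is illustrative only: it assumes Python's `math.comb` for the binomial coefficients and `statistics.NormalDist` for the normal curve, and the variable names are not from the text.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 12, 0.5

# (a) exact binomial probability P(4 <= X <= 7)
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(4, 8))

# (b) normal approximation N(np, npq) with continuity correction:
#     P(4 <= X <= 7) becomes P(3.5 < X < 7.5)
approx_dist = NormalDist(n * p, sqrt(n * p * (1 - p)))
approx = approx_dist.cdf(7.5) - approx_dist.cdf(3.5)

print(round(exact, 3), round(approx, 3))
```

Changing `n` and `p` in this sketch is a quick way to see how the agreement varies with the parameters.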

Continuity corrections often cause difficulty so we will look at
these in more detail. It will be helpful to refer to the diagram
showing the distribution for 12 tosses of the coin.

If we require the probability that there are 3 heads or less, i.e.
P(X ≤ 3), then we consider P(X < 3.5).
So P(X ≤ 3) transforms to P(X < 3.5).
We will use the notation
P(X ≤ 3) → P(X < 3.5)
[Diagram: P(X ≤ 3), rectangle for 3 (from 2.5 to 3.5) included.]

If we require the probability that there are less than 3 heads, i.e.
P(X < 3), then we consider P(X < 2.5).
So P(X < 3) → P(X < 2.5)
[Diagram: P(X < 3), rectangle for 3 (from 2.5 to 3.5) not included.]

If we require the probability that there are exactly 3 heads, then
P(X = 3) → P(2.5 < X < 3.5)

Further examples:
P(5 ≤ X ≤ 8) → P(4.5 < X < 8.5)
P(5 < X ≤ 8) → P(5.5 < X < 8.5)
P(5 ≤ X < 8) → P(4.5 < X < 7.5)
P(5 < X < 8) → P(5.5 < X < 7.5)
P(X < 4) → P(X < 3.5)
P(X ≤ 4) → P(X < 4.5)
P(X ≥ 4) → P(X > 3.5)
P(X > 4) → P(X > 4.5)
P(X = 9) → P(8.5 < X < 9.5)
P(X = 7) → P(6.5 < X < 7.5)
P(X ≥ 0) → P(X > −0.5)
P(X > 0) → P(X > 0.5)
P(X = 0) → P(−0.5 < X < 0.5)
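
The rule behind the continuity corrections listed above can be written as a small helper. This is a hypothetical function, not part of the text: an included end point is pushed outwards by 0.5 and an excluded end point is pulled inwards by 0.5.

```python
def continuity_correct(low=None, high=None, low_inclusive=True, high_inclusive=True):
    """Return (a, b) such that the discrete event transforms to a < X < b.

    low / high are integer end points; None means there is no bound on that side.
    An inclusive end point keeps its rectangle, so the limit moves outwards by 0.5;
    an exclusive end point drops its rectangle, so the limit moves inwards by 0.5.
    """
    a = None if low is None else (low - 0.5 if low_inclusive else low + 0.5)
    b = None if high is None else (high + 0.5 if high_inclusive else high - 0.5)
    return a, b


print(continuity_correct(5, 8))                       # 5 <= X <= 8
print(continuity_correct(5, 8, low_inclusive=False))  # 5 <  X <= 8
print(continuity_correct(high=3))                     # X <= 3
print(continuity_correct(low=4, low_inclusive=False)) # X > 4
```

A single value such as P(X = 9) is the case where `low` and `high` are both 9 and both inclusive.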
Example 6.19 It is known that in a sack of mixed grass seeds 35% are ryegrass. Use
the normal approximation to the binomial distribution to find the
probability that in a sample of 400 seeds there are
(a) less than 120 ryegrass seeds,
(b) between 120 and 150 ryegrass seeds (inclusive),
(c) more than 160 ryegrass seeds.

Solution 6.19 Let X be the r.v. 'the number of ryegrass seeds'. Let 'success' be
'obtaining a ryegrass seed'.
Then X ~ Bin(n, p) where n = 400 and p = 0.35.
Now, as n is large, we use the normal approximation to give
X ~ N(np, npq) where np = (400)(0.35) = 140
npq = (140)(0.65) = 91

so X ~ N(140, 91)

(a) We require
P(X < 120) → P(X < 119.5) (continuity correction)

Now
P(X < 119.5) = P((X − 140)/√91 < (119.5 − 140)/√91)
= P(Z < −2.149)
= 0.0158

[Diagram: normal curve with mean 140; area to the left of 119.5; S.V. −2.149 and 0.]

The probability that there are less than 120 ryegrass seeds is 0.0158.

(b) P(120 ≤ X ≤ 150) → P(119.5 < X < 150.5) (continuity correction)

P(119.5 < X < 150.5) = P((119.5 − 140)/√91 < (X − 140)/√91 < (150.5 − 140)/√91)
= P(−2.149 < Z < 1.101)
= 0.8487

[Diagram: s.d. = √91; area between 119.5 and 150.5, mean 140; S.V. −2.149, 0, 1.101.]

The probability that there are between 120 and 150 ryegrass seeds
is 0.8487.

(c) P(X > 160) → P(X > 160.5) (continuity correction)

P(X > 160.5) = P((X − 140)/√91 > (160.5 − 140)/√91)
= P(Z > 2.149)
= 0.0158

[Diagram: s.d. = √91; area to the right of 160.5, mean 140; S.V. 0 and 2.149.]

The probability that there are more than 160 ryegrass seeds is
0.0158.
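
The three probabilities in Solution 6.19 can be checked as follows. This is a sketch assuming `statistics.NormalDist` in place of the tables; the names `p_a`, `p_b` and `p_c` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 400, 0.35
# Normal approximation to Bin(400, 0.35): mean np = 140, variance npq = 91
approx = NormalDist(n * p, sqrt(n * p * (1 - p)))

p_a = approx.cdf(119.5)                       # P(X < 120)         -> P(X < 119.5)
p_b = approx.cdf(150.5) - approx.cdf(119.5)   # P(120 <= X <= 150) -> P(119.5 < X < 150.5)
p_c = 1 - approx.cdf(160.5)                   # P(X > 160)         -> P(X > 160.5)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

Parts (a) and (c) agree because 119.5 and 160.5 are the same distance from the mean.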

Example 6.20 The random variable X has a binomial distribution with parameters
n and p. Derive the mean and variance of X.
Show that the probability of obtaining a total of seven when two
fair dice are tossed is 1/6. A pair of fair dice is tossed 100 times and
the total observed on each occasion. What is the probability of
getting more than 25 sevens? How many tosses would be required
in order that the probability of getting at least one seven is 0.9
or more. (AEB)

Solution 6.20 If X ~ Bin(n, p) then E(X) = np and Var(X) = npq (see p. 214).

[Diagram: possibility space showing the total on two dice, first die against second die.]

Of the 36 equally likely outcomes, 6 give a total of 7, so
P(total of 7 when two dice are tossed) = 6/36 = 1/6

Let X be the r.v. 'the number of sevens when two dice are tossed'.
Let 'success' be 'obtaining a total of 7'.

Then X ~ Bin(n, p) where n = 100 and p = 1/6, so X ~ Bin(100, 1/6).

Now n is large and p is not too small, so we use the normal
approximation:

X ~ N(np, npq) where np = (100)(1/6) = 50/3

npq = (100)(1/6)(5/6) = 125/9

standard deviation = 5√5/3

so X ~ N(50/3, 125/9).

We require
P(X > 25) → P(X > 25.5) (continuity correction)

P(X > 25.5) = P((X − 50/3)/(5√5/3) > (25.5 − 50/3)/(5√5/3))
= P(Z > 2.370)
= 0.00889

[Diagram: s.d. = 5√5/3; area to the right of 25.5, mean 50/3; S.V. 0 and 2.370.]

The probability of obtaining more than 25 sevens is 0.008 89.

Let the number of tosses required be n.

Then X ~ Bin(n, 1/6)

Now P(X = 0) = (5/6)ⁿ

P(at least one 7) = P(X ≥ 1)
= 1 − P(X = 0)
= 1 − (5/6)ⁿ

We require n such that

1 − (5/6)ⁿ ≥ 0.9

i.e. (5/6)ⁿ ≤ 0.1

Taking logs to base 10

n log(5/6) ≤ log(0.1)

n ≥ log(0.1)/log(5/6)
(when dividing by a negative quantity the inequality is reversed.)

n ≥ 12.63

The number of tosses required is 13.
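
Both numerical parts of Solution 6.20 can be checked in a few lines. The sketch assumes `statistics.NormalDist` for the tail probability and `math.log` for the inequality; the variable names are illustrative.

```python
from math import ceil, log, sqrt
from statistics import NormalDist

p = 1 / 6          # P(total of 7 with two fair dice)
n = 100

# Normal approximation N(50/3, 125/9) with continuity correction:
# P(X > 25) becomes P(X > 25.5)
approx = NormalDist(n * p, sqrt(n * p * (1 - p)))
p_more_than_25 = 1 - approx.cdf(25.5)

# Smallest n with 1 - (5/6)^n >= 0.9, i.e. n >= log(0.1) / log(5/6)
bound = log(0.1) / log(5 / 6)
min_tosses = ceil(bound)

print(round(p_more_than_25, 5), round(bound, 2), min_tosses)
```

The base of the logarithm does not matter here, since it cancels in the ratio.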



Exercise 6h

1. Continuity corrections: write down the transformations for each of the
following:
(a) P(3 ≤ X < 9), (b) P(3 < X < 9),
(c) P(10 < X < 24), (d) P(2 < X < 8),
(e) P(X > 54), (f) P(X > 76),
(g) P(45 < X < 67), (h) P(X < 109),
(i) P(X < 45), (j) P(X = 56),
(k) P(400 < X < 560), (l) P(X = 67),
(m) P(X > 59), (n) P(X = 100),
(o) P(34 < X < 48), (p) P(X = 7),
(q) P(X > 509), (r) P(X < 7),
(s) P(27 ≤ X ≤ 29), (t) P(X = 58).

2. If X ~ Bin(200, 0.7), use the normal approximation to find
(a) P(X ≥ 130), (b) P(136 ≤ X < 148), (c) P(X < 142),
(d) P(X > 152), (e) P(141 ≤ X < 146).

3. 10% of the chocolates produced in a factory are mis-shapes. In a sample of
1000 chocolates find the probability that the number of mis-shapes is
(a) less than 80, (b) between 90 and 115 inclusive, (c) 120 or more.

4. Find the probability of obtaining more than 110 ones in 400 tosses of an
unbiased tetrahedral die with faces marked 1, 2, 3 and 4.

5. A coin is biased so that the probability that it will come down heads is
double the probability that it will come down tails. The coin is tossed
120 times. Find the probability that there will be (a) between 42 and 51 tails
inclusive, (b) 48 tails or less, (c) less than 34 tails, (d) between 72 and
90 heads inclusive.

6. An experiment consists of tossing two unbiased coins. The outcome is called
a success if and only if two heads appear, all other outcomes being called a
failure. If the experiment were repeated 27 times, write down the binomial
distribution governing this series of experiments in the form (p + q)ⁿ,
stating the values of p, q and n.
Find the expected number of successes and the standard deviation of this
distribution.
With the normal curve approximation estimate, using tables and giving your
answer to 2 decimal places, the probability of obtaining at least 5 successes.
(L Additional)

7. It is estimated that 1/5 of the population of England watched last year's
Cup Final on television. If random samples of 100 people are interviewed,
calculate the mean and variance of the number of people from these samples who
watched the Cup Final on television.
Use normal distribution tables to estimate, to 2 significant figures, the
approximate probability of finding, in a random sample of 100 people, more
than 30 people who watched the Cup Final on television. (L Additional)

8. In a series of n independent trials the probability of a 'success' at each
trial is p. If R is the random variable denoting the total number of
successes, state the probability that R = r. State, also, the mean and
variance of R.
A certain variety of flower seed is sold in packets containing about
1000 seeds. The packet claims that 40% will bloom white and 60% red. This may
be assumed to be accurate.
If five seeds are planted estimate the probability that
(a) exactly three will bloom white;
(b) at least one will bloom white.
If 100 seeds are planted use the normal approximation to estimate the
probability of obtaining between 30 and 45 white flowers. (SUJB)

9. A die is biased so that the probability of obtaining a six is 1/4. The die
is thrown 200 times. (a) Find the probability of obtaining a six on the die
(i) more than 60 times, (ii) less than 45 times, (iii) between 40 and 55 times
(inclusive). (b) How many throws would be required if the probability of
obtaining at least one six is greater than 0.9?

10. Two hundred fair dice are thrown 1000 times. Use the normal approximation
to the binomial distribution to find the number of times you would expect to
have the following number of sixes (a) 30, (b) 53, (c) more than 38,
(d) less than 28, (e) between 28 and 38 inclusive.

11. A certain tribe is distinguished by the fact that 45% of the males have
6 toes on their right foot. Two explorers discover a group of 200 males from
the tribe. Find the probability that the number who have six toes on their
right foot is (a) 90, (b) less than 85, (c) between 82 and 91 inclusive.

12. Four hundred pupils sit a test which consists of 80 true-false questions.
None of the candidates knows any of the answers and so guesses. (a) If the
pass mark is 38, how many of the candidates would be expected to pass?
(b) What should the new pass mark be if it is decided that only 115 candidates
pass?

13. A lorry load of potatoes has, on average, one rotten potato in 6. A
greengrocer tests a random sample of 100 potatoes and decides to turn away the
lorry if he finds more than 18 rotten potatoes in the sample. Find the
probability that he accepts the consignment.

THE NORMAL APPROXIMATION TO THE POISSON DISTRIBUTION

If X ~ Po(λ) then E(X) = λ
Var(X) = λ

Now, for large λ,

X ~ N(λ, λ) approximately

Generally, we require λ > 20 for a good approximation.

Example 6.21 A radioactive disintegration gives counts that follow a Poisson
distribution with mean count per second of 25. Find the probability
that in 1 second the count is between 23 and 27 inclusive,
(a) using the Poisson distribution,
(b) using the normal approximation to the Poisson distribution.

Solution 6.21 Let X be the r.v. 'the radioactive count in a 1 second interval'. Then
X ~ Po(25).

Now, we require P(23 ≤ X ≤ 27).

(a) Using the Poisson distribution:

P(X = x) = e⁻²⁵ 25^x / x!

so P(X = 23) = e⁻²⁵ 25²³ / 23! = 0.076 342

Using the recurrence formula for ease of calculation:

p(x+1) = (25/(x + 1)) p(x) where p(x) = P(X = x)

p(24) = (25/24) p(23) = 0.079 522 9

p(25) = (25/25) p(24) = 0.079 522 9

p(26) = (25/26) p(25) = 0.076 464 8

p(27) = (25/27) p(26) = 0.070 800 3

So P(23 ≤ X ≤ 27) = p(23) + p(24) + p(25) + p(26) + p(27)
= 0.383 (3 d.p.)
The probability that the count is between 23 and 27 inclusive is
0.383 (3 d.p.).

(b) Using the normal approximation, X ~ N(25, 25).
So P(23 ≤ X ≤ 27) → P(22.5 < X < 27.5) (continuity correction)

P(22.5 < X < 27.5) = P((22.5 − 25)/5 < (X − 25)/5 < (27.5 − 25)/5)
= P(−0.5 < Z < 0.5)
= 0.383

[Diagram: normal curve with mean 25; area between 22.5 and 27.5; S.V. −0.5, 0, 0.5.]

So the probability that the count is between 23 and 27 inclusive is
0.383.
NOTE: this answer compares very well with the answer in part (a)
and the working is easier.
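
The recurrence used in part (a) and the approximation in part (b) can be compared directly. This sketch is illustrative only, assuming Python's `math` module for the Poisson terms and `statistics.NormalDist` for the normal curve.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

lam = 25

# (a) exact Poisson: start from P(X = 23), then use p(x+1) = lam / (x + 1) * p(x)
term = exp(-lam) * lam**23 / factorial(23)
exact = term
for x in range(23, 27):          # builds p(24), p(25), p(26), p(27)
    term *= lam / (x + 1)
    exact += term

# (b) normal approximation N(25, 25) with continuity correction
approx_dist = NormalDist(lam, sqrt(lam))
approx = approx_dist.cdf(27.5) - approx_dist.cdf(22.5)

print(round(exact, 3), round(approx, 3))
```

The recurrence avoids recomputing large powers and factorials for each term, which is the same saving the hand calculation relies on.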

Exercise 6i

1. If X ~ Po(24), use the normal approximation to find (a) P(X < 25),
(b) P(22 < X < 26), (c) P(X > 23).

2. If X ~ Po(35), use the normal approximation to find (a) P(X < 33),
(b) P(33 < X < 37), (c) P(X SS), (d) P(X = 37).

3. If X ~ Po(60), use the normal approximation to find (a) P(50 < X < 58),
(b) P(57 < X < 68), (c) P(X > 52), (d) P(X ≥ 70).

4. The number of calls received by an office switchboard per hour follows a
Poisson distribution with parameter 30. Using the normal approximation to the
Poisson distribution, find the probability that, in one hour, (a) there are
more than 33 calls, (b) there are between 25 and 28 calls (inclusive),
(c) there are 34 calls.

5. In a certain factory the number of accidents occurring in a month follows a
Poisson distribution with mean 4. Find the probability that there will be at
least 40 accidents during one year.

6. The number of bacteria on a plate viewed under a microscope follows a
Poisson distribution with parameter 60. Find the probability that there are
between 55 and 75 bacteria on a plate.
A plate is rejected if less than 38 bacteria are found. If 2000 such plates
are viewed, how many will be rejected?

7. In an experiment with a radioactive substance the number of particles
reaching a counter over a given period of time follows a Poisson distribution
with mean 22. Find the probability that the number of particles reaching the
counter over the given period of time is (a) less than 22, (b) between 25 and
30, (c) 18 or more.

8. The number of accidents on a certain railway line occur at an average rate
of one every 2 months. Find the probability that (a) there are 25 or more
accidents in 4 years, (b) there are 30 or less accidents in 5 years.

9. The number of eggs laid by an insect follows a Poisson distribution with
parameter 200. (a) Find the probability that (i) more than 150 eggs are laid,
(ii) more than 250 eggs are laid, (iii) between 180 and 240 eggs (inclusive)
are laid.
(b) If the probability that an egg develops is 0.1, show that the number of
survivors follows a Poisson distribution with parameter 20, and find the
probability that there are more than 30 survivors.

10. Two towns, Allport and Bunchester, are linked by telephone. There are
2000 subscribers in Allport, but it is too expensive to install 2000 trunk
lines between the two towns. In a busy hour, each subscriber in Allport
requires a trunk line to Bunchester for an average time of 2 minutes. Show
that the number of trunk lines in use follows a Poisson distribution with mean
66.67 per hour. What is the minimum number of trunk lines that should be
installed if only 1% of all the calls will fail to find an empty trunk line?

WHEN TO USE THE DIFFERENT APPROXIMATIONS

Restrictions on parameters                        Approximation

X ~ Bin(n, p) with n large (say n > 50)           X ~ Po(np)
and p small (say p < 0.1)

X ~ Bin(n, p) with n > 10, p close to 1/2,        X ~ N(np, npq)
or n > 30 (say), p moving away from 1/2

X ~ Po(λ) with λ > 20 (say)                       X ~ N(λ, λ)

Example 6.22 If X ~ Bin(10, 1/2), find the probability that X = 5. Then find the approximation to this probability using (a) the normal distribution, (b) the Poisson distribution.

Solution 6.22 X ~ Bin(n, p) where n = 10, p = 1/2

i.e. X ~ Bin(10, 1/2)

Now P(X = x) = 10Cx (1/2)^(10−x) (1/2)^x = 10Cx (1/2)^10

so P(X = 5) = 10C5 (1/2)^10
            = 0.2461

Therefore P(X = 5) = 0.2461, using the binomial distribution.

(a) Using the normal approximation:

X ~ N(np, npq) where np = (10)(1/2) = 5
                     npq = (5)(1/2) = 2.5

So X ~ N(5, 2.5)

Now P(X = 5) → P(4.5 < X < 5.5) (continuity correction)

P(4.5 < X < 5.5) = P((4.5 − 5)/√2.5 < (X − 5)/√2.5 < (5.5 − 5)/√2.5)
                 = P(−0.316 < Z < 0.316)
                 = 0.2478

[Sketch: normal curve, s.d. = √2.5, shaded between x = 4.5 and x = 5.5 about the mean 5; standardised values (S.V.) −0.316, 0, 0.316]

So the probability that X = 5 is 0.2478 (using the normal approximation).

NOTE: this is a fairly good approximation even though n = 10. This is because p is exactly 1/2.

(b) Using the Poisson approximation:

λ = np = 5 so X ~ Po(5)

P(X = 5) = e^(−5) 5^5 / 5! = 0.175

So the probability that X = 5 is 0.175 (using the Poisson approximation).

NOTE: this is a poor approximation since we should have n > 50 and p < 0.1 for a good Poisson approximation.
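
The three calculations in Solution 6.22 can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function names are illustrative, and the normal probabilities come from the standard library's `statistics.NormalDist` rather than from printed tables, so the results may differ from the table-based values in the third or fourth decimal place.

```python
from math import comb, exp, factorial, sqrt
from statistics import NormalDist


def binomial_pmf(n, p, x):
    """Exact P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def normal_approx_pmf(n, p, x):
    """P(x - 0.5 < X < x + 0.5) for X ~ N(np, npq): continuity correction."""
    dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return dist.cdf(x + 0.5) - dist.cdf(x - 0.5)


def poisson_approx_pmf(n, p, x):
    """P(X = x) for X ~ Po(np)."""
    lam = n * p
    return exp(-lam) * lam**x / factorial(x)


# Example 6.22: n = 10, p = 1/2, x = 5
exact = binomial_pmf(10, 0.5, 5)
normal = normal_approx_pmf(10, 0.5, 5)
poisson = poisson_approx_pmf(10, 0.5, 5)
print(f"binomial {exact:.4f}, normal {normal:.4f}, Poisson {poisson:.4f}")
```

Calling the same three functions with n = 100, p = 0.05, x = 4 gives the comparison for a case where n is large and p is small.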

Example 6.23 If X ~ Bin(20, 0.4), find the probability that 6 ≤ X ≤ 10. Then find the approximations to this probability using (a) the normal distribution, (b) the Poisson distribution.

Solution 6.23 X ~ Bin(n, p) where n = 20, p = 0.4

i.e. X ~ Bin(20, 0.4)

P(X = x) = 20Cx (0.6)^(20−x) (0.4)^x

P(X = 6) = p_6 = 20C6 (0.6)^14 (0.4)^6 = 0.124 411 7

Now, using the recurrence formula:

p_(x+1) = [(n − x)p / ((x + 1)q)] p_x

p_7 = [(14)(0.4) / ((7)(0.6))] p_6     (0.165 882 2)
p_8 = [(13)(0.4) / ((8)(0.6))] p_7     (0.179 705 7)
p_9 = [(12)(0.4) / ((9)(0.6))] p_8     (0.159 738 4)
p_10 = [(11)(0.4) / ((10)(0.6))] p_9   (0.117 141 5)

P(6 ≤ X ≤ 10) = p_6 + p_7 + ... + p_10
              = 0.7469 (4 d.p.)

Using the binomial distribution, P(6 ≤ X ≤ 10) = 0.7469 (4 d.p.).

(a) Using the normal distribution:

X ~ N(np, npq) where np = (20)(0.4) = 8
                     npq = (8)(0.6) = 4.8

so X ~ N(8, 4.8)

Now P(6 ≤ X ≤ 10) → P(5.5 < X < 10.5) (continuity correction)

P(5.5 < X < 10.5) = P((5.5 − 8)/√4.8 < (X − 8)/√4.8 < (10.5 − 8)/√4.8)
                  = P(−1.141 < Z < 1.141)
                  = 0.7462

[Sketch: normal curve, s.d. = √4.8, shaded between x = 5.5 and x = 10.5 about the mean 8; standardised values (S.V.) −1.141, 0, 1.141]

Therefore P(6 ≤ X ≤ 10) = 0.7462, using the normal approximation.

NOTE: this is a good approximation. p has moved away from 1/2 but n = 20, which is quite large.

(b) Using the Poisson distribution:

λ = np = 8

So X ~ Po(8) and P(X = x) = e^(−8) 8^x / x!

P(X = 6) = p_6 = e^(−8) 8^6 / 6!   (0.122 138 2)

Now, using the recurrence formula:

p_(x+1) = [λ / (x + 1)] p_x

we have p_7 = (8/7) p_6    (0.139 586 5)
        p_8 = (8/8) p_7    (0.139 586 5)
        p_9 = (8/9) p_8    (0.124 076 9)
        p_10 = (8/10) p_9  (0.099 261 5)

P(6 ≤ X ≤ 10) = p_6 + p_7 + ... + p_10
              = 0.6246 (4 d.p.)

So P(6 ≤ X ≤ 10) = 0.6246 (4 d.p.) using the Poisson approximation.

NOTE: this is a poor approximation since we should have n > 50 and p < 0.1.
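
The binomial recurrence formula used in Solution 6.23 lends itself to a short loop. The sketch below is an illustration only (the function name `binomial_range_by_recurrence` is an assumption, not from the text): it evaluates the first term directly and then obtains each later term from its predecessor, exactly as p_7, ..., p_10 are obtained from p_6 above.

```python
from math import comb


def binomial_range_by_recurrence(n, p, lo, hi):
    """P(lo <= X <= hi) for X ~ Bin(n, p), built term by term with
    p_(x+1) = (n - x) p / ((x + 1) q) * p_x."""
    q = 1 - p
    term = comb(n, lo) * p**lo * q ** (n - lo)  # p_lo, computed directly
    total = term
    for x in range(lo, hi):
        term *= (n - x) * p / ((x + 1) * q)  # p_(x+1) from p_x
        total += term
    return total


# Example 6.23: X ~ Bin(20, 0.4), P(6 <= X <= 10)
total = binomial_range_by_recurrence(20, 0.4, 6, 10)
print(f"{total:.4f}")
```

Only one binomial coefficient is ever evaluated, which is the practical reason for using the recurrence when working by hand or with a basic calculator.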

Example 6.24 If X ~ Bin(100, 0.05) find the probability that X = 4. Then find the approximations to the probability using (a) the normal distribution, (b) the Poisson distribution.

Solution 6.24 X ~ Bin(n, p) where n = 100, p = 0.05

i.e. X ~ Bin(100, 0.05)

Now P(X = x) = 100Cx (0.95)^(100−x) (0.05)^x

so P(X = 4) = 100C4 (0.95)^96 (0.05)^4
            = 0.1781 (4 d.p.)

Therefore P(X = 4) = 0.1781 (4 d.p.) using the binomial distribution.

(a) Using the normal approximation:

X ~ N(np, npq) where np = (100)(0.05) = 5
                     npq = (5)(0.95) = 4.75

so X ~ N(5, 4.75)

Now P(X = 4) → P(3.5 < X < 4.5) (continuity correction)

P(3.5 < X < 4.5) = P((3.5 − 5)/√4.75 < (X − 5)/√4.75 < (4.5 − 5)/√4.75)
                 = P(−0.6882 < Z < −0.2294)
                 = 0.1637

[Sketch: normal curve, s.d. = √4.75, shaded between x = 3.5 and x = 4.5, to the left of the mean 5; standardised values (S.V.) −0.6882, −0.2294, 0]

So P(X = 4) = 0.1637 using the normal approximation.

NOTE: this is a fairly good approximation, even though p is small. This is because n is large.

(b) Using the Poisson approximation:

λ = np = 5 so X ~ Po(5)

P(X = x) = e^(−5) 5^x / x!

so P(X = 4) = e^(−5) 5^4 / 4! = 0.1755 (4 d.p.)

So P(X = 4) = 0.1755 (4 d.p.) using the Poisson approximation.

NOTE: this is a good approximation, since n is large and p is small; also note that mean ≈ variance.

Example 6.25 If X ~ Po(30), find P(28 ≤ X ≤ 32). Then find the approximation to this probability using the normal distribution.

Solution 6.25 X ~ Po(30)

So P(X = x) = e^(−30) 30^x / x!

and P(X = 28) = p_28 = e^(−30) 30^28 / 28!   (0.070 213 3)

Using the recurrence formula

p_(x+1) = [30 / (x + 1)] p_x

we have p_29 = (30/29) p_28   (0.072 634 5)
        p_30 = (30/30) p_29   (0.072 634 5)
        p_31 = (30/31) p_30   (0.070 291 4)
        p_32 = (30/32) p_31   (0.065 898 2)

P(28 ≤ X ≤ 32) = p_28 + ... + p_32
               = 0.3517 (4 d.p.)

So P(28 ≤ X ≤ 32) = 0.3517 (4 d.p.).

Using the normal approximation:

X ~ N(30, 30)

We require P(28 ≤ X ≤ 32) → P(27.5 < X < 32.5) (continuity correction)

P(27.5 < X < 32.5) = P((27.5 − 30)/√30 < (X − 30)/√30 < (32.5 − 30)/√30)
                   = P(−0.456 < Z < 0.456)
                   = 0.3516

[Sketch: normal curve, s.d. = √30, shaded between x = 27.5 and x = 32.5 about the mean 30; standardised values (S.V.) −0.456, 0, 0.456]

So P(28 ≤ X ≤ 32) = 0.3516 using the normal approximation.

NOTE: this is a good approximation since λ > 20.
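
The comparison in Solution 6.25 can be reproduced with a few lines of Python. This is a hedged sketch rather than part of the original text: the function names are illustrative, and because `statistics.NormalDist` does not round the standardised value to three decimal places, the normal result may differ slightly from the table-based 0.3516.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist


def poisson_range(lam, lo, hi):
    """P(lo <= X <= hi) for X ~ Po(lam), using p_(x+1) = lam / (x + 1) * p_x."""
    term = exp(-lam) * lam**lo / factorial(lo)  # p_lo, computed directly
    total = term
    for x in range(lo, hi):
        term *= lam / (x + 1)  # p_(x+1) from p_x
        total += term
    return total


def normal_approx_range(lam, lo, hi):
    """P(lo - 0.5 < X < hi + 0.5) for X ~ N(lam, lam): continuity correction."""
    dist = NormalDist(mu=lam, sigma=sqrt(lam))
    return dist.cdf(hi + 0.5) - dist.cdf(lo - 0.5)


# Example 6.25: X ~ Po(30), P(28 <= X <= 32)
exact = poisson_range(30, 28, 32)
approx = normal_approx_range(30, 28, 32)
print(f"Poisson {exact:.4f}, normal approximation {approx:.4f}")
```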

Exercise 6j

In Questions 1 to 4 calculate the probabilities using the binomial distribution. Then find the approximations to them using (a) the normal distribution, (b) the Poisson distribution. Comment on your answers.

1. X ~ Bin(15, [illegible]), find P(7 < X < 9).

2. X ~ Bin(60, 0.08), find P(4 ≤ X ≤ 8).

3. X ~ Bin(30, 0.6), find P(X = 17).

4. X ~ Bin(120, 0.1), find P(X = 8).

5. If X ~ Po(27), (a) find P(X = 30), (b) find an approximation using the normal distribution. Comment on your answer.

6. If X ~ Po(12), find P(10 < X < 12). Then find an approximation using the normal distribution. Comment on your answer.

Miscellaneous Exercise 6k

1. A number of different types of fungi are distributed at random in a field. Eighty per cent of these fungi are mushrooms, and the remainder are toadstools. Five per cent of the toadstools are poisonous. A man, who cannot distinguish between mushrooms and toadstools, wanders across the field and picks a total of 100 fungi. Determine, correct to 2 significant figures, using appropriate approximations, the probability that the man has picked
(a) at least 20 toadstools,
(b) exactly two poisonous toadstools. (C)

2. An old car is never garaged at night. On the morning following a wet night, the probability that the car does not start is [illegible]. On the morning following a dry night, this probability is [illegible]. The starting performance of the car each morning is independent of its performance on previous mornings.
(a) There are 6 consecutive wet nights. Determine the probability that the car does not start on at least 2 of the 6 mornings.
(b) During a wet autumn there are 32 wet nights. Using a suitable approximation, determine the probability that the car does not start on less than 16 of the 32 mornings.
(c) During a long summer drought there are 100 dry nights. Using a Poisson approximation, determine the probability that the car does not start on 5 or more of the 100 mornings.
(Give 3 decimal places in your answers.) (C)

3. An urn contains 100 balls of which 4 are coloured red and the remainder are coloured white. A ball is drawn at random from the urn, its colour is noted and it is then replaced in the urn.
Write down (but do not evaluate) an expression for the probability that, in a total of 10 such draws, a red ball is drawn exactly once.
Determine, correct to two decimal places, making use of a suitable approximation in each case, the probability that
(a) in a total of 100 such draws, a red ball is drawn on exactly four occasions,
(b) in a total of 9600 such draws, a red ball is drawn on between 350 and 400 occasions inclusive. (C)

4. Henri de Lade regularly travels from his home in the suburbs to his office in Paris. He always tries to catch the same train, the 08.05 from his local station. He walks to the station from his home in such a way that his arrival times form a normal distribution with mean 08.00 hours and standard deviation 6 minutes.
(a) Assuming that his train always leaves on time, what is the probability that on any given day Henri misses his train?
(b) If Henri visits his office in this way 5 days each week and if his arrival times at the station each day are independent, what is the probability that he misses his train once and only once in a given week?
(c) Henri visits his office 46 weeks every year. Assuming that there are no absences during this time, what is the probability that he misses his train less than 35 times in the year? (AEB 1980)

5. The probability of a man aged exactly 85 dying before he is 86 is about 0.211. Write down an expression for p_r, the probability that r of a group of n men aged exactly 85 die before they are 86.
(a) Calculate p_[subscript illegible] when n = 5.
(b) By considering (p_r / p_(r+1)), or otherwise, calculate the most likely value of r for the case n = 100.
(c) Use the normal approximation to the binomial to estimate the probability that at least 25 of a group of 100 men aged exactly 85 die before they are 86. (MEI)

6. In Urbania, selection for the Royal Flying Corps (RFC) is by means of an aptitude test based on a week's intensive military training. It is known that the scores of potential recruits on this test follow a normal distribution with mean 45 and standard deviation 10.
(a) What is the probability that a randomly chosen recruit will score between 40 and 60?
(b) What percentage of the recruits is expected to score more than 30?
(c) In a particular year 100 recruits take the test. Assuming that the pass mark is 50, calculate the probability that less than 35 recruits qualify for the RFC. (AEB 1978)

7. During an advertising campaign, the manufacturers of Wolfitt (a dog food) claimed that 60% of dog owners preferred
to buy Wolfitt. Assuming that the manufacturer's claim is correct for the population of dog owners, calculate
(a) using the binomial distribution, and
(b) using a normal approximation to the binomial;
the probability that at least 6 of a random sample of 8 dog owners prefer to buy Wolfitt. Comment on the agreement, or disagreement, between your two values. Would the agreement be better or worse if the proportion had been 80% instead of 60%?
Continuing to assume that the manufacturer's figure of 60% is correct, use the normal approximation to the binomial to estimate the probability that, of a random sample of 100 dog owners, the number preferring Wolfitt is between 60 and 70 inclusive. (MEI)

8. If the probability of a male birth is 0.514, what is the probability that there will be fewer boys than girls in 1000 births? (You may assume that 0.514 × 0.486 ≈ 0.25.)
How large a sample, to the nearest hundred, should be taken to reduce the probability of fewer boys than girls to less than 5%? (You may assume that the sample size in this part of the question is sufficiently large for a continuity correction to be unnecessary.) (SMP)

9. On the surface of halfpenny postage stamps there are either one or two phosphor bands. Ninety per cent of halfpenny stamps have two bands and the rest have one band. Of those having one band, 95% have the band in the centre of the stamp and the remainder have the band on the left-hand edge of the stamp.
(a) Determine the probability that in a random sample of ten halfpenny stamps there are exactly eight having two phosphor bands.
(b) Determine, using a normal approximation, the probability that in a random sample of 100 halfpenny stamps there are between five and fifteen stamps (inclusive) having one phosphor band.
(c) Determine, using a Poisson approximation, the probability that in a random sample of 100 halfpenny stamps there are less than three stamps which have only a single band, this band being on the left-hand edge of the stamp.
(Any expressions evaluated should be clearly exhibited, and answers should be given correct to three significant figures.) (C)

10. (a) Every year very small numbers of American wading birds lose their way on migration between North and South America and arrive in Great Britain instead, so that in September the proportion of American waders amongst the waders in Great Britain is about one in ten thousand.
At Dunsmere (a bird reserve in Great Britain), one September, there are twenty thousand waders, which may be regarded as a random sample of the waders present in Great Britain. Determine the probability that there are
(i) no American waders present at Dunsmere,
(ii) more than two American waders present at Dunsmere.
(b) Three-quarters of all the sightings in Great Britain of American waders are made in the autumn. Suppose that in 1980 there will be ten sightings of American waders at Dunsmere. Assuming that all sightings are independent of one another, determine the probability that exactly seven of these ten sightings will be made in the autumn. (C)

11. An inter-city telephone exchange has 100 lines and on average 80 are in use at any moment (on a typical business-day morning). Calculate
(a) the probability that all lines are engaged;
(b) the probability that more than 30 lines are free.
We say that a number x of lines is the 'effective minimum level' if the number of lines in use exceeds x for 95% of the time. Find x.
(You may assume that for large n the binomial probability may be approximated by a normal probability with mean na and variance nab.) (SMP)

12. A telephone exchange serves 2000 subscribers, and at any moment during the busiest period there is a probability of 1/30 for each subscriber that he will require a line. Assuming that the needs of subscribers are independent, write down an expression for the probability that exactly N lines will be occupied at any moment during the busiest period.
Use the normal distribution to estimate the minimum number of lines that would ensure that the probability that a call cannot be made because all the lines are occupied is less than 0.01.
Investigate whether the total number of lines needed would be reduced if the subscribers were split into two groups of 1000, each with its own set of lines. (MEI)

13. A population consists of individuals of three types A, B and C occurring in the proportions 1:5:14.
(a) A sample of three individuals is drawn at random from the population.
(i) Determine the probability that all three are of different types.
(ii) Determine the probability that all three are of the same type.
(b) A sample of 40 individuals is drawn at random from the population.
(i) Determine the approximate value of the probability that 4 or more are of type A.
(ii) Determine the approximate value of the probability that exactly 10 are of type B. (C)

14. In each of n independent trials of an experiment the probability of an event A occurring is 0.05.
(a) When n = 10, determine the probability that A occurs exactly once, giving your answer to three decimal places.
(b) When n = 200, use a suitable approximate method to determine the probability that A occurs not more than 10 times. (JMB)

15. A discrete random variable X has the Poisson distribution given by

P(X = r) = e^(−a) a^r / r!,   r = 0, 1, 2, ...

Prove that the mean and the variance of X are each equal to a.
When a trainee typist types a document the number of mistakes made on any one page is a Poisson variable with mean 3, independently of the number of mistakes made on any other page. Use tables, or otherwise, to find, to three significant figures,
(a) the probability that the number of mistakes on the first page is less than two,
(b) the probability that the number of mistakes on the first page is more than four.
Find expressions in terms of e for
(c) the probability that the first mistake appears on the second page,
(d) the probability that the first mistake appears on the second page and the second mistake appears on the third page.
Evaluate these expressions, giving your answers to four significant figures.
When the typist types a 48-page document the total number of mistakes made by the typist is a Poisson variable with mean 144. Use a suitable approximate method to find, to three decimal places, the probability that this total number of mistakes is greater than 130. (JMB)

16. Describe, briefly, the conditions under which the binomial distribution Bin(n, p) may be approximated by
(a) a normal distribution,
(b) a Poisson distribution,
giving the parameters of each of the approximate distributions.
Among the blood cells of a certain animal species, the proportion of cells which are of type A is 0.37 and the proportion of cells which are of type B is 0.004. Find, to 3 decimal places, the probability that in a random sample of 8 blood cells at least 2 will be of type A.
Find, to 3 decimal places, an approximate value for the probability that
(c) in a random sample of 200 blood cells the combined number of type A and type B cells is 81 or more,
(d) there will be 4 or more cells of type B in a random sample of 300 blood cells. (L)

17. Manufactured articles are packed in boxes each containing 200 articles, and on average 1½% of all articles manufactured are defective. A box which contains 4 or more defective articles is substandard. Using a suitable approximation, show that the probability that a randomly chosen box will be substandard is 0.353, correct to three decimal places.
A lorry-load consists of 16 boxes, randomly chosen. Find the probability that a lorry-load will include at most 2 boxes which are substandard, giving three decimal places in your answer.
A warehouse holds 100 lorry-loads. Show that, correct to two decimal places, the probability that exactly one of the lorry-loads in the warehouse will include at most 2 substandard boxes is 0.06. (C)

18. Answer the following questions using, in each case, tables of the binomial, Poisson or normal distribution according to which you think is most appropriate. In each example draw attention to any feature which either supports or casts doubt on your choice of distribution.
(a) Cars pass a point on a busy city centre road at an average rate of 7 per five second interval. What is the probability that in a particular five second interval the number of cars passing will be
(i) 7 or less,
(ii) exactly 7?
(b) Weather records show that for a certain airport during the winter months an average of one day in 25 is foggy enough to prevent landings. What is the probability that in a period of seven winter days landings are prevented on
(i) 2 or more days,
(ii) no days?
(c) The working lives of a particular brand of electric light bulb are distributed with mean 1200 hours and standard deviation 200 hours. What is the probability of
(i) a bulb lasting more than 1150 hours,
(ii) the mean life of a sample of 64 bulbs exceeding 1150 hours?* (AEB 1988)
*See 'Distribution of the sample mean', p. 408.

19. Explain briefly the circumstances under which a normal distribution may be used as an approximation to a binomial distribution. Write down the mean and the variance of the normal approximation to the binomial distribution Bin(n, p). Give an example, from your projects if possible, of the use of this approximation stating the parameters of your binomial distribution and of your normal approximation.
In a multiple-choice examination, candidate Jones picks his answer to each question at random from the list of 3 answers provided, of which only one is correct. A candidate answering 18 or more questions correctly passes the examination.
(a) For a paper containing 45 questions, use a normal approximation to find, to 3 decimal places, the probability that Jones passes.
(b) It is required that the probability that Jones passes should be less than 0.005. Use a normal approximation to show that the paper should contain at most 31 questions. (L)

20. Explain briefly how you used, or could have used, a binomial distribution in a project.
State the conditions under which a normal distribution may be used as an approximation to the distribution Bin(n, p) and write down, in terms of n and p, the mean and the variance of this normal approximation.
A large bag of seeds contains three varieties in the ratios 4:2:1 and their germination rates are 50%, 60% and 80% respectively. Show that the probability that a seed chosen at random from the bag will germinate is 4/7.
Find, to 3 decimal places, the probability that of 4 seeds chosen at random from the bag, exactly two of them will germinate. Given that 150 seeds are chosen at random from the bag, estimate, to 3 decimal places, the probability that less than 90 of them will germinate. (L)

21. Describe a project, or an experiment in your course work, which you conducted to demonstrate a Poisson distribution.
State the condition under which a normal distribution may be used as an approximation to the Poisson distribution. Write down the mean and the variance of the normal approximation to the Poisson distribution with mean λ.
Tomatoes from a particular nursery are packed in boxes and sent to a market. Assuming that the number of bad tomatoes in a box has a Poisson distribution with mean 0.44, find, to 3 significant figures, the probability of there being
(a) fewer than 2,
(b) more than 2 bad tomatoes in a box when it is opened.
Use a normal approximation to find, to 3 decimal places, the probability that in 50 randomly chosen boxes there will be fewer than 20 bad tomatoes in total. (L)

RANDOM VARIABLES AND RANDOM SAMPLING

If X and Y are any two random variables, continuous or discrete,
then
E(X+Y) = E(X)+E(Y)
E(X—Y) = E(X)—E(Y)
Also, if X and Y are independent, then
Var(X + Y) = Var(X) + Var(Y)
Var(X — Y) = Var(X)+ Var(Y)

SUM AND DIFFERENCE OF TWO INDEPENDENT NORMAL VARIABLES

If X and Y are two independent normal variables such that

X ~ N(μ_1, σ_1²) and Y ~ N(μ_2, σ_2²)

then X + Y ~ N(μ_1 + μ_2, σ_1² + σ_2²)
and  X − Y ~ N(μ_1 − μ_2, σ_1² + σ_2²)

Example 7.1 If X ~ N(60, 16) and Y ~ N(70, 9), find (a) P(X + Y < 140), (b) P(120 < X + Y < 135), (c) P(Y − X > 7), (d) P(2 < Y − X < 12).

Solution 7.1 (a) X + Y ~ N(60 + 70, 16 + 9),

i.e. X + Y ~ N(130, 25)

For convenience, let R = X + Y,

so R ~ N(130, 25)

We require

P(R < 140) = P((R − 130)/5 < (140 − 130)/5)
           = P(Z < 2)
           = 0.9772

[Sketch: normal curve for R = X + Y, shaded to the left of 140, mean 130; standardised values (S.V.) 0, 2]

Therefore P(X + Y < 140) = 0.9772.

(b) We require

P(120 < R < 135) = P((120 − 130)/5 < (R − 130)/5 < (135 − 130)/5)
                 = P(−2 < Z < 1)
                 = 0.8185

[Sketch: normal curve for R = X + Y, shaded between 120 and 135, mean 130; standardised values (S.V.) −2, 0, 1]

Therefore P(120 < X + Y < 135) = 0.8185.

(c) We need to consider the r.v. Y − X.

Now Y − X ~ N(70 − 60, 9 + 16)

For convenience, let T = Y − X.

So T ~ N(10, 25)

We require

P(T > 7) = P((T − 10)/5 > (7 − 10)/5)
         = P(Z > −0.6)
         = 0.7257

[Sketch: normal curve for T = Y − X, shaded to the right of 7, mean 10; standardised values (S.V.) −0.6, 0]

Therefore P(Y − X > 7) = 0.7257.

(d) We require

P(2 < T < 12) = P((2 − 10)/5 < (T − 10)/5 < (12 − 10)/5)
              = P(−1.6 < Z < 0.4)
              = 0.6006

[Sketch: normal curve for T = Y − X, shaded between 2 and 12, mean 10; standardised values (S.V.) −1.6, 0, 0.4]

Therefore P(2 < Y − X < 12) = 0.6006.
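
The results of Example 7.1 can be checked with Python's standard library, whose `statistics.NormalDist` class supports `+` and `−` between two instances on the assumption that they represent independent normal variables. The sketch below is illustrative and not part of the original text; note that `NormalDist` takes the standard deviation, not the variance, as its second argument.

```python
from statistics import NormalDist

# X ~ N(60, 16) and Y ~ N(70, 9): sigma is the square root of the variance
X = NormalDist(mu=60, sigma=4)
Y = NormalDist(mu=70, sigma=3)

R = X + Y  # sum of independent normals: means add, variances add
T = Y - X  # difference: means subtract, variances still add

print(R.mean, R.variance)  # parameters of X + Y
print(T.mean, T.variance)  # parameters of Y - X

prob_a = R.cdf(140)               # P(X + Y < 140)
prob_b = R.cdf(135) - R.cdf(120)  # P(120 < X + Y < 135)
prob_c = 1 - T.cdf(7)             # P(Y - X > 7)
prob_d = T.cdf(12) - T.cdf(2)     # P(2 < Y - X < 12)
print(f"{prob_a:.4f} {prob_b:.4f} {prob_c:.4f} {prob_d:.4f}")
```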
Example 7.2 Each weekday Mr Jones walks to the local library to read the newspapers. The time he takes to walk to and from the library is a normal variable with mean 15 minutes and standard deviation 2 minutes. The time he spends in the library is a normal variable with mean 25 minutes and standard deviation √12 minutes. Find the probability that, on a particular day, (a) Mr Jones is away from the house for more than 45 minutes, (b) Mr Jones spends more time travelling than in the library.

Solution 7.2 Let L be the r.v. 'the time in minutes spent in the library'. Then L ~ N(25, 12).
Let W be the r.v. 'the time in minutes spent walking to and from the library'. Then W ~ N(15, 4).

(a) We require the distribution of the total time spent away from the house.

Let T = L + W.

So T ~ N(40, 16)

We require

P(T > 45) = P((T − 40)/4 > (45 − 40)/4)
          = P(Z > 1.25)
          = 0.1056

[Sketch: normal curve for T = L + W, shaded to the right of 45, mean 40; standardised values (S.V.) 0, 1.25]

Therefore the probability that Mr Jones is away from the house for more than 45 minutes is 0.1056.

(b) We require P(W > L), i.e. P(W − L > 0).

Let U = W − L, then U ~ N(15 − 25, 12 + 4)

i.e. U ~ N(−10, 16)

We require

P(U > 0) = P((U − (−10))/4 > (0 − (−10))/4)
         = P(Z > 2.5)
         = 0.006 21

Therefore the probability that Mr Jones spends more time travelling than in the library is 0.006 21.

Exercise 7a

1. If X ~ N(100, 49) and Y ~ N(110, 576), find (a) P(X + Y > 200), (b) P(180 < X + Y < 240), (c) P(Y − X < 0), (d) P(−20 < Y − X < 50).

2. If X ~ N(75, 5) and Y ~ N(78, 20), find (a) P(X + Y > 162), (b) P(140 < X + Y < 150), (c) P(X + Y < 155), (d) P(X − Y > 0), (e) P(Y − X < 15).

3. If A ~ N(3, 0.05) and B ~ N(2, 0.04), find (a) P(A − B > 1.9), (b) P(A + B < 4.4), (c) P(B > A − 0.6).

4. If X ~ N(25, 5) and Y ~ N(30, 4), find (a) P(|X + Y − 55| < 5), (b) P(Y > X), (c) P(|Y − X − 5| < 3).

5. At a self-service cafeteria a coffee machine is installed which dispenses (a) black coffee in amounts normally distributed with mean 6.1 oz and standard deviation 0.4 oz, (b) white coffee by first releasing a quantity of black coffee normally distributed with mean 4.9 oz and s.d. 0.3 oz and then adding milk normally distributed with mean 1.2 oz and s.d. 0.2 oz. Each cup is marked on the inside to a level of 5.5 oz and if this level is not attained the customer receives the drink without charge.
(i) What percentage of cups of black coffee will fall short of the 5.5 oz?
(ii) What is the mean and s.d. of the amount of white coffee dispensed into each cup?
(iii) What percentage of cups of white coffee will fall short of 5.5 oz?
(iv) If 10% of cups dispensed are black and the cost per cup for the ingredients is 2.1p per cup for both black and white coffee, whilst the customer is charged 10p per cup, what will be the gross profit on 1000 dispensed cups?
(v) What price per cup (to the nearest 5p) should the cafeteria charge if the average profit is to be 5p per cup? (SUJB)

6. Bolts are manufactured which are to fit in holes in steel plates. The diameter of the bolts is normally distributed with mean 2.60 cm and standard deviation 0.03 cm; the diameter of the holes is normally distributed with mean 2.71 cm and standard deviation 0.04 cm.
(a) Find the probability that a bolt selected at random has a diameter greater than 2.65 cm.
(b) Find the probability that a hole selected at random has a diameter less than 2.65 cm.
(c) Prove that, if a bolt and a hole are selected at random, the probability that the bolt will be too large to enter the hole is about 0.0139.
(d) The random selection of a bolt and a hole described in (c) above is carried out five times. Find the probability that in every case the bolt will be able to enter the hole. (C)

7. The diameters of axles supplied by a factory have a mean value of 19.92 mm and a standard deviation of 0.05 mm. The inside diameters of bearings supplied by another factory have a mean of 20.04 mm and a standard deviation of 0.038 mm. What is the mean and standard deviation of the random variable defined to be the diameter of a bearing less the diameter of an axle? Assuming that both dimensions are normally distributed, what percentage of axles and bearings taken at random will not fit? (O & C)

EXTENSION TO MORE THAN TWO INDEPENDENT NORMAL VARIABLES

We can extend the results on p. 374 as follows:

If X_1, X_2, ..., X_n is any set of random variables, then

E(X_1 + X_2 + ... + X_n) = E(X_1) + E(X_2) + ... + E(X_n)

If the random variables are independent, then

Var(X_1 + X_2 + ... + X_n) = Var(X_1) + Var(X_2) + ... + Var(X_n)

If X_1, X_2, ..., X_n are n independent normal variables such that

X_1 ~ N(μ_1, σ_1²), X_2 ~ N(μ_2, σ_2²), ..., X_n ~ N(μ_n, σ_n²)

then

X_1 + X_2 + ... + X_n ~ N(μ_1 + μ_2 + ... + μ_n, σ_1² + σ_2² + ... + σ_n²)

NOTE: In the special case when X_1, X_2, ..., X_n are independent observations from the same normal distribution, so that X_i ~ N(μ, σ²) for i = 1, 2, ..., n, then

X_1 + X_2 + ... + X_n ~ N(nμ, nσ²).

Example 7.3 If W ~ N(100, 8), X ~ N(120, 10) and Y ~ N(110, 12), find P(W + X + Y < 320).

Solution 7.3 Let A = W + X + Y.

Then

E(A) = E(W) + E(X) + E(Y) = 100 + 120 + 110 = 330

Var(A) = Var(W) + Var(X) + Var(Y) = 8 + 10 + 12 = 30

So A ~ N(330, 30).

We require

P(A < 320) = P((A − 330)/√30 < (320 − 330)/√30)
           = P(Z < −1.826)
           = 0.0340

[Sketch: normal curve for A = W + X + Y, s.d. = √30, shaded to the left of 320, mean 330; standardised values (S.V.) −1.826, 0]

Therefore P(W + X + Y < 320) = 0.0340.

Example 7.4 Masses of a particular article are normally distributed with mean 20 g and standard deviation 2 g. If a random sample of 12 such articles is chosen, find the probability that the total mass is less than 230 g.

Solution 7.4 Let X be the r.v. 'the mass, in g, of an article'.

Then X_1 ~ N(20, 4)
     X_2 ~ N(20, 4)
     ...
     X_12 ~ N(20, 4)

Now, let B = X_1 + X_2 + ... + X_12

so E(B) = E(X_1) + E(X_2) + ... + E(X_12)
        = 12E(X)
        = 240

and Var(B) = Var(X_1) + Var(X_2) + ... + Var(X_12)
           = 12Var(X)
           = 48

We have B ~ N(240, 48)

We require

P(B < 230) = P((B − 240)/√48 < (230 − 240)/√48)
           = P(Z < −1.443)
           = 0.0745

[Sketch: normal curve for B = X_1 + X_2 + ... + X_12, s.d. = √48, shaded to the left of 230, mean 240; standardised values (S.V.) −1.443, 0]

Therefore the probability that the total mass of the articles is less than 230 g is 0.0745.
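
A common slip is to treat the total of 12 independent observations as 12X, whose variance would be 12² Var(X) rather than 12 Var(X). The sketch below, which is an illustration and not part of the original text (the helper name `sum_of_iid_normals` is an assumption), contrasts the two. In Python's `statistics.NormalDist`, multiplying an instance by a constant rescales a single variable, so it models 12X and not X_1 + ... + X_12.

```python
from math import sqrt
from statistics import NormalDist


def sum_of_iid_normals(mu, sigma, n):
    """Distribution of X_1 + ... + X_n for independent X_i ~ N(mu, sigma^2):
    N(n * mu, n * sigma^2), so the s.d. is sqrt(n) * sigma."""
    return NormalDist(mu=n * mu, sigma=sqrt(n) * sigma)


# Example 7.4: 12 articles, each with mass ~ N(20, 4)
B = sum_of_iid_normals(20, 2, 12)
prob = B.cdf(230)  # P(total mass < 230 g)
print(B.mean, B.variance, f"{prob:.4f}")

# Contrast: 12 * X is ONE observation scaled by 12, with variance 144 * 4
scaled = NormalDist(mu=20, sigma=2) * 12
print(scaled.mean, scaled.variance)
```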

Example 7.5 If A ~ N(10, 4), B ~ N(12, 9) and C ~ N(8, 12), find (a) P(A + B − C < 10), (b) P(B − C − A > 0), (c) P[A_1 + A_2 − (B_1 + B_2) + C_1 + C_2 > 20] where A_1, A_2 are two independent observations from the population of A, etc., (d) the probability that three independent observations from the population of A have a sum which is greater than four independent observations from the population of C.

Solution 7.5 (a) Let Y = A + B − C.

Then

E(Y) = E(A) + E(B) − E(C) = 10 + 12 − 8 = 14

Var(Y) = Var(A) + Var(B) + Var(C) = 4 + 9 + 12 = 25

So Y ~ N(14, 25)

We require

P(Y < 10) = P((Y − 14)/5 < (10 − 14)/5)
          = P(Z < −0.8)
          = 0.2119

[Sketch: normal curve for Y = A + B − C, shaded to the left of 10, mean 14; standardised values (S.V.) −0.8, 0]

Therefore P(A + B − C < 10) = 0.2119.

(b) Let W = B − C − A.

Then

E(W) = E(B) − E(C) − E(A) = 12 − 8 − 10 = −6

Var(W) = Var(B) + Var(C) + Var(A) = 9 + 12 + 4 = 25

So W ~ N(−6, 25)

We require

P(W > 0) = P((W − (−6))/5 > (0 − (−6))/5)
         = P(Z > 1.2)
         = 0.1151

[Sketch: normal curve for W = B − C − A, shaded to the right of 0, mean −6; standardised values (S.V.) 0, 1.2]

Therefore P(B − C − A > 0) = 0.1151.

(c) Let V = A_1 + A_2 − (B_1 + B_2) + C_1 + C_2

Then E(V) = E(A_1) + E(A_2) − E(B_1) − E(B_2) + E(C_1) + E(C_2)
          = 2E(A) − 2E(B) + 2E(C)
          = 20 − 24 + 16
          = 12

Var(V) = 2Var(A) + 2Var(B) + 2Var(C)
       = 8 + 18 + 24
       = 50

So V ~ N(12, 50)

We require

P(V > 20) = P((V − 12)/√50 > (20 − 12)/√50)
          = P(Z > 1.131)
          = 0.1290

[Sketch: normal curve for V, s.d. = √50, shaded to the right of 20, mean 12; standardised values (S.V.) 0, 1.131]

Therefore P[A_1 + A_2 − (B_1 + B_2) + C_1 + C_2 > 20] = 0.1290.

(d) Let U = A_1 + A_2 + A_3 − (C_1 + C_2 + C_3 + C_4).

Then E(U) = 3E(A) − 4E(C)
          = 30 − 32
          = −2

Var(U) = 3Var(A) + 4Var(C)
       = 12 + 48
       = 60

So U ~ N(−2, 60)

We require P(A_1 + A_2 + A_3 > C_1 + C_2 + C_3 + C_4)

i.e. P(U > 0) = P((U − (−2))/√60 > (0 − (−2))/√60)
              = P(Z > 0.258)
              = 0.3982

[Sketch: normal curve for U, s.d. = √60, shaded to the right of 0, mean −2; standardised values (S.V.) 0, 0.258]

Therefore the probability that three observations from the population of A have a sum which is greater than four observations from the population of C is 0.3982.
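
The pattern in Solution 7.5 is always the same: means are added or subtracted according to the sign, while variances are always added. The helper below is a hypothetical sketch of that rule (the name `combine` and the tuple layout are assumptions made for illustration, not part of the original text). Each term is given as (k, mean, variance), where k is the signed number of independent observations taken from that population.

```python
from math import sqrt
from statistics import NormalDist


def combine(terms):
    """Distribution of a sum/difference of independent normal observations.

    Each term is (k, mean, variance): k > 0 means k observations are added,
    k < 0 means |k| observations are subtracted. Means combine with the
    sign of k; variances are added |k| times regardless of sign.
    """
    mean = sum(k * mu for k, mu, var in terms)
    variance = sum(abs(k) * var for k, mu, var in terms)
    return NormalDist(mu=mean, sigma=sqrt(variance))


A, B, C = (10, 4), (12, 9), (8, 12)  # (mean, variance) from Example 7.5

Y = combine([(1, *A), (1, *B), (-1, *C)])  # (a) A + B - C
V = combine([(2, *A), (-2, *B), (2, *C)])  # (c) A1 + A2 - (B1 + B2) + C1 + C2
U = combine([(3, *A), (-4, *C)])           # (d) A1 + A2 + A3 - (C1 + ... + C4)

prob_a = Y.cdf(10)      # P(A + B - C < 10)
prob_c = 1 - V.cdf(20)  # P(V > 20)
prob_d = 1 - U.cdf(0)   # P(U > 0)
print(f"{prob_a:.4f} {prob_c:.4f} {prob_d:.4f}")
```

The values agree with the table-based answers to within rounding of the standardised value.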

Example 7.6 In a cafeteria, baked beans are served either in ordinary portions or in children's portions. The quantity given for an ordinary portion is a normal variable with mean 90 g and standard deviation 3 g and the quantity given for a children's portion is a normal variable with mean 43 g and standard deviation 2 g. What is the probability that John, who has two children's portions, is given more than his father, who has an ordinary portion?

Solution 7.6 Let C be the r.v. 'the quantity given, in g, in a children's portion'. Then C ~ N(43, 4).
Let A be the r.v. 'the quantity given, in g, in an ordinary portion'. Then A ~ N(90, 9).

We require P(C_1 + C_2 > A),

i.e. P(C_1 + C_2 − A > 0).

Now let W = C_1 + C_2 − A

E(W) = E(C_1) + E(C_2) − E(A)
     = 2E(C) − E(A)
     = 86 − 90
     = −4

and Var(W) = Var(C_1) + Var(C_2) + Var(A)
           = 2Var(C) + Var(A)
           = 8 + 9
           = 17

So W ~ N(−4, 17).

Now

P(C_1 + C_2 − A > 0) = P(W > 0)
                     = P((W − (−4))/√17 > (0 − (−4))/√17)
                     = P(Z > 0.970)
                     = 0.166

[Sketch: normal curve for W = C_1 + C_2 − A, s.d. = √17, shaded to the right of 0, mean −4; standardised values (S.V.) 0, 0.970]

Therefore the probability that John has more than his father is 0.166.
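
A simulation gives an independent check that two separate children's portions must be modelled as C_1 + C_2 (variance 8) and not as 2C (variance 16). The following Monte Carlo sketch is illustrative only and not part of the original text; the trial count and seed are arbitrary assumptions, and the estimate will vary slightly with both.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate(trials=200_000, seed=1):
    """Estimate P(C_1 + C_2 > A) by drawing independent portions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        children = rng.gauss(43, 2) + rng.gauss(43, 2)  # two independent portions
        ordinary = rng.gauss(90, 3)                     # one ordinary portion
        if children > ordinary:
            hits += 1
    return hits / trials


estimate = simulate()
# Theoretical value from W = C_1 + C_2 - A ~ N(-4, 17)
exact = 1 - NormalDist(mu=-4, sigma=sqrt(17)).cdf(0)
print(f"simulated {estimate:.3f}, theoretical {exact:.3f}")
```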

Exercise 7b

1. If A ~ N(50, 6), B ~ N(30, 8) and C ~ N(80, 11) find
(a) P(A + B + C > 170),
(b) P(−6 < A + B − C < 10),
(c) P(A_1 + A_2 − (B + C) < 0).

2. A random sample of 20 items is taken from a normal population with mean 15 and variance 5. Find the probability that the sum of the values in the sample is less than 305.

3. Lengths of rod of type A are normally distributed with mean 5 cm and standard deviation 0.5 cm and lengths of rod of type B are normally distributed with mean 10 cm and standard deviation 1 cm. Find the probability that (a) a length consisting of 2 rods of type A and 4 rods of type B is more than 52 cm long, (b) a length consisting of 3 rods of type A and 2 rods of type B is between 33 cm and 36 cm long, (c) a length consisting of 6 rods of type A is longer than a length consisting of 3 rods of type B.

4. If X ~ N(5, 4) and Y ~ N(6, 9), find the probability that a sample consisting of 3 items from the population with r.v. X and 4 items from the population with r.v. Y will have a sum exceeding 50.

5. Chocolate Delight cakes are sold in packets of 6. The mass of each cake is a normal variable with mean 20 g and standard deviation 2 g. The mass of the packing material is a normal variable with mean 30 g and standard deviation 4 g. Find the probability that the total mass of the packet (a) exceeds 162 g, (b) is less than 137 g, (c) lies between 140 g and 153 g.

6. If X_i ~ N(2, 2) and Y_i ~ N(1.5, 2.2) and if L = X_1 + X_2 + ... + X_20 and M = Y_1 + Y_2 + ... + Y_26, find P(L > M).

7. In a certain village the heights of women follow a normal distribution with mean 164 cm and standard deviation 5 cm and the heights of men are normally distributed with mean 173 cm and standard deviation 6 cm. If a man and woman are picked at random, find the probability that (a) the woman is taller than the man, (b) the man is more than 5 cm taller than the woman.

8. The time taken to carry out a standard service on a car of type A is known, to a good approximation, to be a normal variable with mean 1 hour and standard deviation 10 minutes. Assuming that only one car is serviced at a time, find the probability that it will take more than 6½ hours to service 6 cars.
The time taken to carry out a standard service on a car of type B is a normal variable with mean 1½ hours and standard deviation 15 minutes. Find the probability that 5 cars of type B can be serviced more quickly than 8 cars of type A. (C)

9. If X_i ~ N(70, 10), find P(335 < X_1 + X_2 + ... + X_5 < 360).

10. Four runners, A, B, C and D train to run the distances 100 m, 200 m, 500 m and 800 m respectively, in order to take part in a 1600 m relay race. During training
RANDOM VARIABLES AND RANDOM SAMPLING
; 383
their individual times (recorded in The means and variances of independent
seconds) are normally distributed as normal variables X and Y are known. State
follows: A ~ N(10.8, 0.27), the means and variances of X+ Y in terms
B~ N(23.7, 0.3”), C ~ N(62.8, 0.97), of those of X and Y.
D ~ N(121.2, 2.17), Find the probability The values of two types of resistors are
that the runners take less than 3 minutes
normally distributed as follows:
35 seconds to run the relay race.
Type A: mean: 100 ohms; standard
deviation: 2 ohms
11. Type B: mean: 50 ohms, standard devia-
Mr Smith has five dogs, two of which are
tion: 1.3 ohms
male and three are female. The masses
of food they eat in any given week are (a) What tolerances would be permitted
normally distributed as follows: for type A if only 0.5% were rejected?
(b) 300-ohm resistors are made by
Standard connecting together three of the type A
deviation (kg) resistors, drawn from the total production.
What percentage of the 300-ohm resistors
Male 3.5 0.4 may be expected to have resistances
Female 25 0.3 greater than 295 ohms?
Find the probability that the two males (c) Pairs of resistors, one of 100 ohms
eat more than the three females in a and one of 50ohms, drawn from the
particular week. total production for types A and B
respectively, are connected together to
make 150-ohm resistors. What percentage
of the resulting resistors may be expected
12. The process of painting the body-work of to have resistances in the range 150 to
a mass-produced lorry consists of giving it 151.4 ohms? (AEB)
1 coat of paint A, 3 coats of paint B and
2 coats of paint C. A record of the 14. The time of departure of my train from
quantity of each type of paint used for Temple Meads Station is distributed
each coat is kept for each lorry produced normally about the scheduled time of
over a long period. The following table 08 25 with a standard deviation 1 minute.
gives the means and standard deviations I arrive at Temple Meads Station on 4
of these quantities measured in litres: another train whose time of arrival is
normally distributed about the scheduled
Mean ar time of 08 20 with standard deviation of
deviation 1 minute. It takes me 3 minutes to change
Pe re < ater dation
platforms.
The coat of paint A Sal 0.42
(a) Find the probability that I miss the
Each coat of paint B 1.3 0.15 08 25 and am late for work.
Each coat of paint C 1.0 0.12 (6) Find the probability that this happens
Assuming independence of the distribu- every day from Monday to Friday in a
tion for each coat, calculate the mean and given week.
standard deviation for the total quantity 15. The mass of a certain grade of apple is
of paint used on each lorry. normally distributed with mean mass
Assuming that the quantities of paint 120 g and standard deviation 10 g.
used for each coat are normally distribu- (a) If an apple of this grade is chosen at
ted, calculate random, find the probability that its mass
(a) the percentage of lorries receiving less lies between 100.5 g and 124g.
than 8.5 litres of paint, (b) If four apples of this grade are
(b) the percentage of lorries receiving chosen at random, find the probability
more than 10.0 litres of paint. (C) that their total mass will exceed 505 g.

MULTIPLES OF NORMAL VARIABLES


We have shown previously that, for any constant a,

E(aX) = aE(X)
Var(aX) = a²Var(X)

Now, if X is a normal variable such that X ~ N(μ, σ²) then

aX ~ N(aμ, a²σ²)

If X and Y are two independent normal variables such that X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²), and a and b are any constants, then

aX + bY ~ N(aμ₁ + bμ₂, a²σ₁² + b²σ₂²)
aX − bY ~ N(aμ₁ − bμ₂, a²σ₁² + b²σ₂²)

Example 7.7 If X ~ N(50, 25), find P(3X > 160).

Solution 7.7 Now E(3X) = 3E(X) = 150
Var(3X) = 9Var(X) = 225
So 3X ~ N(150, 225)
Therefore
P(3X > 160) = P((3X − 150)/15 > (160 − 150)/15)
= P(Z > 0.667)
= 0.2523
Therefore P(3X > 160) = 0.2523.

Example 7.8 If X ~ N(70,10) and Y ~ N(50, 8), find P(2X > 3Y).
Solution 7.8 We require P(2X > 3Y), i.e. P(2X—3Y > 0).
Let A = 2X − 3Y
then E(A) = 2E(X) − 3E(Y)
= 140 − 150
= −10
Var(A) = 4Var(X) + 9Var(Y)
= 40 + 72
= 112
So A ~ N(−10, 112)
P(A > 0) = P((A − (−10))/√112 > (0 − (−10))/√112)
= P(Z > 0.945)
= 0.1723
Therefore P(2X > 3Y) = 0.1723.
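
A result such as this can also be checked by simulation rather than by tables. The sketch below is illustrative only: it assumes Python's standard `random` module, and the function name, number of trials and seed are arbitrary choices that do not come from the text. It draws independent values of X ~ N(70, 10) and Y ~ N(50, 8) and counts how often 2X exceeds 3Y.

```python
import random
from math import sqrt

def simulate_p_2x_gt_3y(trials=200_000, seed=1):
    """Estimate P(2X > 3Y) for independent X ~ N(70, 10), Y ~ N(50, 8)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.gauss(70, sqrt(10))   # gauss takes the standard deviation
        y = rng.gauss(50, sqrt(8))
        if 2 * x > 3 * y:
            hits += 1
    return hits / trials

estimate = simulate_p_2x_gt_3y()
print(f"simulated P(2X > 3Y) is about {estimate:.3f}")
```

The estimate varies slightly with the seed and the number of trials, but it should settle close to the value found from N(−10, 112).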
DISTINGUISHING BETWEEN MULTIPLES AND SUMS OF
RANDOM VARIABLES

Care must be taken to distinguish between the r.v. 2X and the r.v. X₁ + X₂, where X₁ and X₂ are two independent observations of the r.v. X.

If X ~ N(μ, σ²) then 2X ~ N(2μ, 4σ²)

but X₁ + X₂ ~ N(2μ, 2σ²)

NOTE: the means of the two distributions are the same, but the
variances are different.

Example 7.9 If X ~ N(10, 9), find (a) P(2X > 23), (b) P(X₁ + X₂ > 23) where X₁ and X₂ are two independent observations from the population of X.

Solution 7.9 Now X ~ N(10, 9).

(a) Let V = 2X, then
E(V) = E(2X) = 2E(X) = 20
and Var(V) = Var(2X) = 4Var(X) = 36
So V ~ N(20, 36)
and P(V > 23) = P((V − 20)/6 > (23 − 20)/6)
= P(Z > 0.5)
= 0.3085
Therefore P(2X > 23) = 0.3085.

(b) Let W = X₁ + X₂.
Then E(W) = E(X₁) + E(X₂) = 2E(X) = 20
and Var(W) = Var(X₁) + Var(X₂) = 2Var(X) = 18
So W ~ N(20, 18)
and P(W > 23) = P((W − 20)/√18 > (23 − 20)/√18)
= P(Z > 0.707)
= 0.2399
Therefore P(X₁ + X₂ > 23) = 0.2399.
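
The two cases of Example 7.9 can be set side by side in a few lines. This is a sketch only, assuming Python 3.8 or later and the standard `statistics.NormalDist` class; the variable names are illustrative. Multiplying a `NormalDist` by a constant scales its standard deviation (the multiple 2X), whereas adding two `NormalDist` objects treats them as independent observations (the sum X₁ + X₂).

```python
from statistics import NormalDist

x = NormalDist(mu=10, sigma=3)   # X ~ N(10, 9)

double = x * 2                   # the multiple 2X: variance becomes 4 * 9
total = x + x                    # X1 + X2: variance becomes 9 + 9

p_double = 1 - double.cdf(23)    # P(2X > 23)
p_total = 1 - total.cdf(23)      # P(X1 + X2 > 23)

print(double.mean, round(double.variance, 6), round(p_double, 4))
print(total.mean, round(total.variance, 6), round(p_total, 4))
```

Both distributions have mean 20, but the different variances give different probabilities.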

In general, if X ~ N(μ, σ²)
then nX ~ N(nμ, n²σ²)
but X₁ + X₂ + ... + Xₙ ~ N(nμ, nσ²)

The following example illustrates the difference between multiples and sums of random variables.

Example 7.10 A soft drinks manufacturer sells bottles of drinks in two sizes. The amount in each bottle, in ml, is normally distributed as shown in the table:

         Mean (ml)   Variance (ml²)
Small    252         4
Large    1012        25

(a) A bottle of each size is selected at random. Find the probability that the large bottle contains less than four times the amount in the small bottle.
(b) One large and four small bottles are selected at random. Find the probability that the amount in the large bottle is less than the total amount in the four small bottles.

Solution 7.10 Let S be the r.v. ‘the amount, in ml, in a small bottle’. Then S ~ N(252, 4).
Let L be the r.v. ‘the amount, in ml, in a large bottle’. Then L ~ N(1012, 25).

(a) We need P(L < 4S) = P(L − 4S < 0).
Now E(L − 4S) = E(L) − E(4S) (multiple of S)
= E(L) − 4E(S)
= 1012 − 1008
= 4
Var(L − 4S) = Var(L) + Var(4S)
= Var(L) + 16Var(S)
= 25 + 64
= 89
So L − 4S ~ N(4, 89)
P(L − 4S < 0) = P(Z < (0 − 4)/√89)
= P(Z < −0.424)
= 0.3358
Therefore the probability that the large bottle contains less than four times the amount of a small bottle is 0.3358.

(b) We need P(L < S₁ + S₂ + S₃ + S₄) = P(L − (S₁ + S₂ + S₃ + S₄) < 0).
Now
E(L − (S₁ + ... + S₄)) = E(L) − E(S₁ + ... + S₄) (sum of r.v. S)
= E(L) − 4E(S)
= 1012 − 1008
= 4
Var(L − (S₁ + ... + S₄)) = Var(L) + Var(S₁ + ... + S₄)
= Var(L) + 4Var(S)
= 25 + 16
= 41
Therefore L − (S₁ + ... + S₄) ~ N(4, 41)
and
P(L − (S₁ + ... + S₄) < 0) = P(Z < (0 − 4)/√41)
= P(Z < −0.625)
= 0.266

Therefore the probability that the large bottle contains less than the four small bottles is 0.266.

Exercise 7c

1. If X ~ N(40, 12) and Y ~ N(60, 15), find
(a) P(2X > 90), (b) P(4Y < 270),
(c) P(3X − 2Y < 20), (d) P[½(X + Y) > 55].

2. If A ~ N(…, 1.5²), B ~ N(42, 0.3²) and C ~ N(85, 0.7²), find (a) P(3A < 250), (b) P(… > 255), (c) P(3A > 6B), (d) P(2B + A > 2C), (e) P[½(A + B) < 64], (f) P[⅓(A + B + C) > 70].

3. The r.v. X is normally distributed with mean μ and variance 6, and the r.v. Y is normally distributed with mean 8 and variance σ². If the r.v. 2X − 3Y is normally distributed with mean −12 and variance 42, find (a) the values of μ and σ², (b) P(X > 8), (c) P(Y < 9), (d) P(… 3X … 2Y < …).

4. The r.v. X is distributed normally with mean 25 and standard deviation 4, the r.v. Y is distributed normally with mean 30 and standard deviation 3, and X and Y are independent. Find the probability that a single observation from the population of X is greater than two-thirds of the value of a single observation from the population of Y.

5. If X ~ N(50, 16) and Y ~ N(40, 9), find
(a) P(2X + Y > 120), (b) P[…(X − Y) > 0],
(c) P(100 < 3X − Y < 130).

6. If X ~ N(30, 4) find (a) P(5X > 160), (b) P(Y > 160) where Y = X₁ + ... + X₅.

7. The thickness, P cm, of a randomly chosen paperback book may be regarded as an observation from a normal distribution with mean 2.0 and variance 0.730. The thickness, H cm, of a randomly chosen hardback book may be regarded as an observation from a normal distribution with mean 4.9 and variance 1.920.
(a) Determine the probability that the combined thickness of four randomly chosen paperbacks is greater than the combined thickness of two randomly chosen hardbacks.
(b) By considering X = 2P − H, or otherwise, determine the probability that a randomly chosen paperback is less than half as thick as a randomly chosen hardback.
(c) Determine the probability that a randomly chosen collection of sixteen paperbacks and eight hardbacks will have a combined thickness of less than 70 cm.
(Give 3 decimal places in your answers.) (C)

MISCELLANEOUS WORKED EXAMPLES


Example 7.11 (a) A certain liquid drug is marketed in bottles containing nominal 20 ml of drug. Tests on a large number of bottles indicate that the volume of liquid in each bottle is distributed normally with mean 20.42 ml and s.d. 0.429 ml.
(i) Estimate the percentage of bottles which would be expected to contain less than 20 ml of drug.
(ii) Find the level to which the mean should be adjusted (without altering the s.d.) so that only 1% of bottles should contain less than 20 ml.

(b) If the independent random variables X and Y are normally distributed with means μ₁, μ₂ and variances σ₁², σ₂² respectively, state what you can about the distribution of Z = X − Y.
If the capacity of the bottles in (a) is normally distributed with mean 21.77 ml and s.d. 0.210 ml and the liquid with (unadjusted) mean 20.42 ml and s.d. 0.429 ml, estimate what percentage of bottles will overflow during filling. (SUJB)

Solution 7.11 (a) Let X be the r.v. ‘the volume in ml of liquid in a bottle’.
Then X ~ N(20.42, 0.429²)

(i) P(X < 20) = P((X − 20.42)/0.429 < (20 − 20.42)/0.429)
= P(Z < −0.979)
= 0.1637

Therefore 16.37% of bottles would be expected to contain less than 20 ml of drug.

(ii) We need to find μ such that
P(X < 20) = 0.01
i.e. P(Z < (20 − μ)/0.429) = 0.01

Now (20 − μ)/0.429 must be negative, and by symmetry we find that
(μ − 20)/0.429 = 2.326
μ = 20 + (2.326)(0.429)
= 21.00
The adjusted value of the mean should be 21.00 ml of drug.

(b) X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²).
If Z = X − Y, then
E(Z) = E(X) − E(Y) = μ₁ − μ₂
Var(Z) = Var(X) + Var(Y) = σ₁² + σ₂²
and Z ~ N(μ₁ − μ₂, σ₁² + σ₂²).
If X is the r.v. ‘the volume in ml of liquid’
then X ~ N(20.42, 0.429²) as before.
If Y is the r.v. ‘the capacity in ml of a bottle’,
then Y ~ N(21.77, 0.210²)
and X − Y ~ N(20.42 − 21.77, 0.429² + 0.210²)
i.e. X − Y ~ N(−1.35, 0.2281)
Now, the bottle will overflow if X > Y, i.e. if X − Y > 0.

P(X − Y > 0) = P(Z > (0 − (−1.35))/√0.2281)
= P(Z > 2.827)
= 0.00235

We estimate that 0.2% of the bottles will overflow during filling.
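
The three parts of Example 7.11 can be reproduced without tables. The sketch below assumes Python 3.8 or later and the standard `statistics.NormalDist` class; the variable names are illustrative. `inv_cdf` returns the standardised value below which a given proportion lies, which is what the tables are used for in part (a)(ii).

```python
from statistics import NormalDist

SD = 0.429
fill = NormalDist(mu=20.42, sigma=SD)          # volume of liquid, X

# (a)(i) proportion of bottles holding less than 20 ml
p_under = fill.cdf(20)

# (a)(ii) mean needed so that only 1% of bottles hold less than 20 ml
z_01 = NormalDist().inv_cdf(0.01)              # lower 1% point of Z
adjusted_mean = 20 - z_01 * SD

# (b) overflow occurs when liquid minus capacity is positive
capacity = NormalDist(mu=21.77, sigma=0.210)   # capacity of bottle, Y
p_overflow = 1 - (fill - capacity).cdf(0)

print(round(p_under, 4), round(adjusted_mean, 2), round(p_overflow, 5))
```

Small differences in the last decimal place from the worked solution are to be expected, since the tables round the standardised value first.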

Example 7.12 The random variable X has a normal distribution with parameters μ and σ². Derive the mean and variance of X.

(You may assume that (1/√(2π)) ∫ e^(−t²/2) dt = 1, where the integral is taken from −∞ to ∞.)

Ben Wedgewood and Sons in co-operation with the National Enterprise Commission have just developed a sophisticated new microwave oven. The ‘in use’ lifetimes of two vital components may be considered to be random variables, such that the lifetime of the quality sensitiser, X, is normal with mean 60 hours and standard deviation 5 hours and the lifetime of the overheat warning mechanism, Y, is normal with mean 70 hours and standard deviation 4 hours.
(a) What value of x should be quoted such that P(X > x) = 0.99?
(b) The intensive inspection period for the overheat warning mechanism begins at 60 hours and ends at 75 hours. What is the probability of the mechanism failing in this period?
(c) Assuming that X and Y are independent and that W = Y − X, what are E(W) and V(W)? Further, what is the probability that the overheat warning mechanism lasts longer than the quality sensitiser? (AEB 1981)

Solution 7.12 For the derivation of the mean and variance of the normal distribution see p. 317.

Let X be the r.v. ‘the lifetime of the sensitiser in hours’. Then X ~ N(60, 5²).

Let Y be the r.v. ‘the lifetime of the overheat warning mechanism in hours’. Then Y ~ N(70, 4²).

(a) If P(X > x) = 0.99
then P((X − 60)/5 > (x − 60)/5) = 0.99
i.e. P(Z > (x − 60)/5) = 0.99
therefore (x − 60)/5 = −2.326
x = 48.37

Therefore the value of x which should be quoted is 48 hours (to the nearest hour).

(b) P(60 < Y < 75) = P((60 − 70)/4 < (Y − 70)/4 < (75 − 70)/4)
= P(−2.5 < Z < 1.25)
= 0.8944 − 0.0062
= 0.8882

Therefore the probability that the mechanism will fail in the intensive inspection period is 0.8882.

(c) If W = Y − X then
E(W) = E(Y) − E(X) = 70 − 60 = 10
and Var(W) = Var(Y) + Var(X) = 16 + 25 = 41
Therefore E(W) = 10 and Var(W) = 41.

Now W ~ N(10, 41).
We require P(Y > X), i.e. P(Y − X > 0).

This is P(W > 0) = P(Z > (0 − 10)/√41)
= P(Z > −1.562)
= 0.9408

Therefore the probability that the overheat warning mechanism lasts longer than the quality sensitiser is 0.9408.
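
The numerical parts of Example 7.12 can be evaluated directly as a check on the table work. This is a minimal sketch, assuming Python 3.8 or later and the standard `statistics.NormalDist` class; the variable names are illustrative and not part of the original question.

```python
from statistics import NormalDist

sensitiser = NormalDist(mu=60, sigma=5)   # X ~ N(60, 5^2)
warning = NormalDist(mu=70, sigma=4)      # Y ~ N(70, 4^2)

# (a) P(X > x) = 0.99 means x is the point with 1% of lifetimes below it
x_quote = sensitiser.inv_cdf(0.01)

# (b) P(60 < Y < 75)
p_inspection = warning.cdf(75) - warning.cdf(60)

# (c) W = Y - X for independent X and Y
w = warning - sensitiser
p_outlasts = 1 - w.cdf(0)

print(round(x_quote, 2), round(p_inspection, 4))
print(w.mean, round(w.variance, 6), round(p_outlasts, 4))
```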

SUMMARY — SUMS, DIFFERENCES AND MULTIPLES OF


INDEPENDENT NORMAL VARIABLES

For two independent normal variables such that X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²)

X + Y ~ N(μ₁ + μ₂, σ₁² + σ₂²)
X − Y ~ N(μ₁ − μ₂, σ₁² + σ₂²)

For n independent normal variables such that Xᵢ ~ N(μᵢ, σᵢ²)

X₁ + X₂ + ... + Xₙ ~ N(μ₁ + μ₂ + ... + μₙ, σ₁² + σ₂² + ... + σₙ²)

For n independent observations of the r.v. X where X ~ N(μ, σ²)

X₁ + X₂ + ... + Xₙ ~ N(nμ, nσ²)

For the normal variable such that X ~ N(μ, σ²) and for any constant a

aX ~ N(aμ, a²σ²)

For two independent normal variables such that X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²) and for any constants a and b

aX + bY ~ N(aμ₁ + bμ₂, a²σ₁² + b²σ₂²)
aX − bY ~ N(aμ₁ − bμ₂, a²σ₁² + b²σ₂²)
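
All of the results in this summary are special cases of one rule: for independent normal variables, the mean of a₁X₁ + ... + aₙXₙ is the sum of the aᵢμᵢ and the variance is the sum of the aᵢ²σᵢ². The following sketch expresses that rule as a small helper. The function name `linear_combination` and its argument format are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def linear_combination(terms):
    """Return (mean, variance) of sum(a_i * X_i) for independent normal X_i.

    `terms` is a list of (a_i, mean_i, variance_i) tuples.
    """
    mean = sum(a * mu for a, mu, _ in terms)
    variance = sum(a * a * var for a, _, var in terms)
    return mean, variance

# 2X - 3Y with X ~ N(70, 10) and Y ~ N(50, 8): coefficients 2 and -3
mean_a, var_a = linear_combination([(2, 70, 10), (-3, 50, 8)])
p_positive = 1 - NormalDist(mean_a, sqrt(var_a)).cdf(0)

# X1 + X2 (two observations) versus 2X (one observation doubled), X ~ N(10, 9)
sum_two = linear_combination([(1, 10, 9), (1, 10, 9)])
double = linear_combination([(2, 10, 9)])

print(mean_a, var_a, round(p_positive, 4))
print(sum_two, double)
```

A difference is handled by giving a negative coefficient; because the coefficient is squared, the variances are still added.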
Miscellaneous Exercise 7d

1. The weights of grade A oranges are normally distributed with mean 200 g and standard deviation 12 g. Determine, correct to 2 significant figures, the probability that
(a) a grade A orange weighs more than 190 g but less than 210 g,
(b) a sample of 4 grade A oranges weighs more than 820 g.
The weights of grade B oranges are normally distributed with mean 175 g and standard deviation 9 g. Determine, correct to 2 significant figures, the probability that
(c) a grade B orange weighs less than a grade A orange,
(d) a sample of 8 grade B oranges weighs more than a sample of 7 grade A oranges. (C)

2. Prints from two types of film C and D have developing times which can be modelled by normal variables, C with mean 16.18 s and standard deviation 0.11 s and D with mean 15.88 s and standard deviation 0.10 s.
(a) What is the probability that a type C print will take less than 16 s to develop?
(b) A type C print is developed and immediately afterwards a type D print is developed. What is the probability that the total time is greater than 32.5 s?
(c) What is the probability of a type C print taking longer to develop than a type D print? (SUJB)

3. In testing the length of life of electric light bulbs of a particular type, it is found that 12.3% of the bulbs tested fail within 800 hours and that 28.1% are still operating 1100 hours after the start of the test. Assuming that the distribution of the length of life is normal, calculate, to the nearest hour in each case, the mean, μ, and the standard deviation, σ, of the distribution.
A light fitting takes a single bulb of this type. A packet of three bulbs is bought, to be used one after the other in this fitting. State the mean and variance of the total life of the three bulbs in the packet in terms of μ and σ and calculate, to two decimal places, the probability that the total life is more than 3300 hours.
Calculate the probability that all three bulbs have lives in excess of 1100 hours, so that again the total life is more than 3300 hours. Explain why this answer should be different from the previous one. (JMB)

4. The weight of a large loaf of bread is a normal variable with mean 420 g and standard deviation 30 g. The weight of a small loaf of bread is a normal variable with mean 220 g and standard deviation 10 g.
(a) Find the probability that 5 large loaves weigh more than 10 small loaves.
(b) Find the probability that the total weight of 5 large loaves and 10 small loaves lies between 4.25 kg and 4.4 kg. (C)

5. The tensile strengths, measured in newtons (N), of a large number of ropes of equal length are independently and normally distributed such that five per cent are under 706 N and five per cent over 1294 N. Four such ropes are randomly selected and joined end-to-end to form a single rope; the strength of the combined rope is equal to the strength of the weakest of the four selected ropes. Derive the probabilities that this combined rope will not break under tensions of 1000 N and 900 N, respectively.
A further four ropes are randomly selected and attached between two rings, the strength of the arrangement being the sum of the strengths of the four separate ropes. Derive the probabilities that this arrangement will break under tensions of 4000 N and 4200 N, respectively.
Find the smallest number of ropes that should be selected if the probability that at least one of them has a strength greater than 1000 N is to exceed 0.99. (JMB)

6. The independent random variables X₁ and X₂ are normally distributed with means μ₁, μ₂ and variances σ₁², σ₂² respectively. What is the distribution of the random variable Y = a₁X₁ + a₂X₂?
Certain components for a revolutionary new sewing machine are assembled by inserting a part of one type (sprotsil) into a part of another type (weavil). Sprotsils have external dimensions which are normally distributed with mean 2.50 cm and standard deviation 0.018 cm. Weavils have internal dimensions which are normally distributed with mean 2.54 cm and standard deviation 0.024 cm. Under suitable pressure, the two types fit together satisfactorily if the dimensions differ by not more than ±0.035 cm.
Show that, if pairs of parts are chosen at random, the difference
D = internal dimension of a weavil − external dimension of a sprotsil
is distributed with mean 0.04 cm and standard deviation 0.030 cm. Hence show that approximately 42.8% of randomly selected pairs will fit together satisfactorily.
Now, if it is known that the internal dimension of a given weavil is 2.517 cm, what is the probability that a randomly chosen sprotsil will fit this weavil satisfactorily? (AEB 1980)

7. The mass of a cheese biscuit has a normal distribution with mean 6 g and standard deviation 0.2 g. Determine the probability that
(a) a collection of twenty-five cheese biscuits has a mass of more than 149 g,
(b) a collection of thirty cheese biscuits has a mass of less than 180 g,
(c) twenty-five times the mass of a cheese biscuit is less than 149 g.
The mass of a ginger biscuit has a normal distribution with mean 10 g and standard deviation 0.3 g. Determine the probability that a collection of seven cheese biscuits has a mass greater than a collection of four ginger biscuits.
(It may be assumed that all the biscuits were sampled at random from their respective populations.) (C)

8. In a packaging factory, the empty containers for a certain product have a mean weight of 400 g with a standard deviation of 10 g. The mean weight of the contents of a full container is 800 g with a standard deviation of 15 g. Find the expected total weight of 10 full containers and the standard deviation of this weight, assuming that the weights of containers and contents are independent.
Assuming further that these weights are normally distributed random variables, find the proportion of batches of 10 full containers which weigh more than 12.1 kg.
If 1% of the containers are found to be holding weights of product which are less than the guaranteed minimum amount, deduce this minimum weight. (O & C)

9. Next May, an ornithologist intends to trap one male cuckoo and one female cuckoo. The mass M of the male cuckoo may be regarded as being a normal random variable with mean 116 g and standard deviation 16 g. The mass F of the female cuckoo may be regarded as being independent of M and as being a normal random variable with mean 106 g and standard deviation 12 g. Determine
(a) the probability that the mass of the two birds together will be more than 230 g,
(b) the probability that the mass of the male will be more than the mass of the female.
By considering X = 9M − 16F, or otherwise, determine the probability that the mass of the female will be less than nine-sixteenths of that of the male.
Suppose that one of the two trapped birds escapes. Assuming that the remaining bird will be equally likely to be the male or the female, determine the probability that its mass will be more than 118 g. (C)

10. A train leaves a station punctually at its scheduled time, which is currently 08 08 hours (i.e. 8 minutes past 8 a.m.). A bus is due to arrive at that station at 08 00 hours, but in fact its arrival time is normally distributed about the scheduled time with standard deviation 5 minutes. Transfer from bus to train requires 1 minute. What is the probability that the bus-train connection is made?
It is proposed to change the scheduled departure time of the train (it must still be an exact minute, e.g. 08 09 hours, 08 10 hours). What would be the earliest scheduled departure time in order that the probability of making the bus-train connection should be at least 99%?
The train travels to a junction station, its journey time being normally distributed with mean 15 minutes and standard deviation 1.6 minutes. A connecting train leaves the junction punctually at 08 29 hours. Transfer between the two trains can be regarded as instantaneous. What is the probability that the two trains will connect with the original train schedule?
Find what departure times (exact minutes) of the train from the first station will result in both connections being made with probability at least 95%. Find also whether it is possible to arrange for this probability to be at least 97½%. (MEI)

11. The random variables X₁, X₂, X₃ and X₄ are normal, independent and identically distributed with mean μ and variance σ². The random variables Y and Z are defined by
Y = 4X₁ and Z = Σ Xᵢ (i = 1 to 4)
Show that Var(Y) = 4Var(Z).
The number of hours per week spent in study by both male and female college students is known to be normally distributed. For male students the mean is 28 hours and the standard deviation 6 hours, with corresponding figures for the female students being 30 hours and 4 hours respectively. If a random sample of 6 male and 2 female students is taken, find the probability that in a given week the mean number of hours spent in study by this sample of students will lie between 25 and 31 hours.
Calculate the probability that the number of hours studied that week by the 2 female students will differ by more than 8 hours.
Two of the students in the sample are twins, one male and the other female. Calculate the probability that in a particular week the female twin works less hours than her brother. Comment briefly on the assumption of independence. (AEB 1987)

12. A dispenser discharges an amount of soft drink which is normally distributed with standard deviation 20 ml. The mean amount may be set to any required value. If the cups into which it is dispensed have a capacity of 500 ml,
(a) what proportion of cups will overflow if the mean amount is set to 475 ml,
(b) to what value should the mean be set so that only 0.1% of cups will overflow?
A customer requires a double size drink. If the mean is set to 475 ml what is the probability of no overflow occurring if he
(c) uses two 500 ml cups,
(d) makes two discharges into a 1000 ml cup?
If now the capacity of the cups is normally distributed with mean 500 ml and standard deviation 30 ml,
(e) what proportion of cups will overflow if the mean amount discharged is 475 ml,
(f) to what value should the mean be set so that only 0.1% of cups will overflow? (AEB 1987)

13. The random variables X₁, X₂ and X₃ are independent and normally distributed with means μ₁, μ₂ and μ₃ respectively and common variance σ². State precisely the distribution of X₁ + X₂ − X₃.
Two types of metal bars, A and B, are produced. The lengths of A bars are distributed normally with mean 20 cm and standard deviation 0.05 cm and the lengths of B bars distributed normally with mean 30 cm and standard deviation 0.05 cm. An A bar is welded to a B bar with an overlap whose length is normally distributed with mean 5 cm and standard deviation 0.05 cm. The lengths of the welded bars must lie between 44.9 cm and 45.15 cm in order to be acceptable.
(a) Calculate the proportion of welded bars that are unsatisfactory.
(b) If the welded bars cost 40p each to produce find the price that the manufacturer should charge in order that the expected profit per article should be 50p.
(c) Before testing the lengths of the welded bars, two are selected at random. What is the probability that their lengths differ by more than 0.1 cm? (SUJB)

14. X and Y are independent normally distributed random variables such that X has mean 32 and variance 25, and Y has mean 43 and variance 96. Find
(a) P(X > 43),
(b) P(X − Y > 0),
(c) P(2X − Y > 0). (JMB)

15. The times taken by two runners A and B to run 400 m races are independent and normally distributed with means 45.0 s and 45.2 s, and standard deviations 0.5 s and 0.8 s respectively. The two runners are to compete in a 400 m race for which there is a track record of 44.5 s.
(a) Calculate, to three decimal places, the probability of runner A breaking the track record.
(b) Show that the probability of runner B breaking the track record is greater than that of runner A.
(c) Calculate, to three decimal places, the probability of runner A beating runner B. (JMB)
THE SAMPLE MEAN

If X₁, X₂, ..., Xₙ is a random sample of size n taken from any infinite population (or finite population if sampling is with replacement) with mean μ and variance σ², then the sample mean X̄ is such that E(X̄) = μ and Var(X̄) = σ²/n.

Proof:

E(X̄) = E[(1/n)(X₁ + X₂ + ... + Xₙ)]
= (1/n)[E(X₁) + E(X₂) + ... + E(Xₙ)]
= (1/n)(μ + μ + ... + μ) = (1/n)(nμ) = μ

Var(X̄) = Var[(1/n)(X₁ + X₂ + ... + Xₙ)]
= (1/n²)[Var(X₁) + Var(X₂) + ... + Var(Xₙ)]
= (1/n²)(σ² + σ² + ... + σ²) = (1/n²)(nσ²) = σ²/n

Example 7.13 The discrete r.v. X has probability distribution P(X = x) where P(X = 0) = 0.5, P(X = 1) = 0.3, P(X = 2) = 0.2. The mean μ is 0.7 and the variance σ² is 0.61. Random samples of size 2 are taken from the distribution. By considering all possible samples, find the probability distribution of the mean X̄ of such samples. Verify that E(X̄) = μ and Var(X̄) = σ²/2.

Solution 7.13 Consider the samples of size 2 from the distribution. For the sample (2, 1) say, the probability is P(X = 2)·P(X = 1) = (0.2)(0.3) = 0.06 and the sample mean is 1.5. Summarising the results for all the samples:

Sample        (0,0)  (0,1)  (0,2)  (1,0)  (1,1)  (1,2)  (2,0)  (2,1)  (2,2)
Mean          0      0.5    1      0.5    1      1.5    1      1.5    2
Probability   0.25   0.15   0.1    0.15   0.09   0.06   0.1    0.06   0.04

The probability distribution for X̄ is therefore

x̄             0      0.5    1      1.5    2
Probability   0.25   0.3    0.29   0.12   0.04

We have

E(X̄) = 0 + 0.15 + 0.29 + 0.18 + 0.08 = 0.7
E(X̄²) = 0 + 0.075 + 0.29 + 0.27 + 0.16 = 0.795
Var(X̄) = E(X̄²) − E²(X̄) = 0.795 − 0.49 = 0.305

Since μ = 0.7 and σ² = 0.61, E(X̄) = μ and Var(X̄) = σ²/2.
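
The enumeration in Solution 7.13 can be carried out mechanically. The sketch below is one possible way of doing so, assuming Python's standard `itertools` and `fractions` modules; the function names are illustrative. Exact fractions are used so that the check E(X̄) = μ and Var(X̄) = σ²/2 is not affected by rounding.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# P(X = 0) = 0.5, P(X = 1) = 0.3, P(X = 2) = 0.2, held as exact fractions
pmf = {0: Fraction(5, 10), 1: Fraction(3, 10), 2: Fraction(2, 10)}

def sample_mean_distribution(pmf, n):
    """Distribution of the mean of n independent observations."""
    dist = defaultdict(Fraction)
    for sample in product(pmf, repeat=n):      # every ordered sample
        prob = Fraction(1)
        for value in sample:
            prob *= pmf[value]
        dist[Fraction(sum(sample), n)] += prob
    return dict(dist)

def mean_and_variance(dist):
    mean = sum(x * p for x, p in dist.items())
    var = sum(x * x * p for x, p in dist.items()) - mean ** 2
    return mean, var

mu, sigma2 = mean_and_variance(pmf)
xbar = sample_mean_distribution(pmf, 2)
xbar_mean, xbar_var = mean_and_variance(xbar)

print(sorted((float(x), float(p)) for x, p in xbar.items()))
print(float(xbar_mean), float(xbar_var), float(sigma2 / 2))
```

Changing the second argument of `sample_mean_distribution` gives the corresponding check for samples of other sizes.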

Example 7.14 (a) For the set of numbers 1, 4, 7 find the mean μ and the variance σ².

(b) Draw up a frequency distribution of the means of all possible samples of size 3, where sampling is carried out with replacement. Find the mean and variance of this distribution and comment.

Solution 7.14 (a) μ = 4 and σ² = 6 (calculator).

(b) There are three ways of obtaining a sample containing the numbers 1, 1, 4 i.e. (1, 1, 4), (1, 4, 1), (4, 1, 1) and there are 6 ways of obtaining a sample containing the numbers 1, 4, 7. The frequency distribution of the means of all possible samples of size 3 is shown in the table.

Numbers in sample   Sample mean   Frequency
1, 1, 1             1             1
1, 1, 4             2             3
1, 1, 7             3             3
1, 4, 4             3             3
1, 4, 7             4             6
4, 4, 4             4             1
1, 7, 7             5             3
4, 4, 7             5             3
4, 7, 7             6             3
7, 7, 7             7             1
                    Total         27

Now, using a calculator, we find that

Mean of sample means = 4
and Variance of sample means = 2
So, when samples of size 3 are taken,
Mean of sample means = population mean
Variance of sample means = (Population variance)/3
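
The frequency distribution in Solution 7.14 can be generated by listing all 27 ordered samples. This is a sketch only, assuming Python's standard `itertools`, `collections` and `statistics` modules; `pvariance` is the population variance (divisor equal to the number of values), which is the variance used in this example.

```python
from collections import Counter
from itertools import product
from statistics import mean, pvariance

population = [1, 4, 7]
n = 3

# Every ordered sample of size 3, drawn with replacement
sample_means = [sum(s) / n for s in product(population, repeat=n)]
frequencies = Counter(sample_means)

print(sorted(frequencies.items()))
print(mean(sample_means), pvariance(sample_means))
print(pvariance(population) / n)   # population variance divided by n
```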

SAMPLING WITHOUT REPLACEMENT

If X₁, X₂, ..., Xₙ is a random sample of size n taken without replacement from a finite population of size N with mean μ and variance σ², then the sample mean X̄ is such that E(X̄) = μ and

Var(X̄) = (σ²/n)((N − n)/(N − 1))

Example 7.15 Find the mean μ and the variance σ² of the population 1, 4, 7. Draw up a frequency distribution of the means of all possible samples of size 2, taken without replacement. Find the mean and variance of this distribution and verify that

Var(X̄) = (σ²/n)((N − n)/(N − 1))

where X̄ is the r.v. ‘the sample mean’, N is the number in the population and n is the sample size.
What happens as N → ∞?

Solution 7.15 μ = 4, σ² = 6 (calculator)

Sample   (1,4)  (1,7)  (4,1)  (4,7)  (7,1)  (7,4)
Mean     2.5    4      2.5    5.5    4      5.5

From calculator, mean = 4, variance = 1.5.

With N = 3, n = 2 and σ² = 6,
(σ²/n)((N − n)/(N − 1)) = (6/2)((3 − 2)/(3 − 1))
= 1.5
= Var(X̄) as required.

Now, as N → ∞, (N − n)/(N − 1) → 1 and Var(X̄) → σ²/n

(sampling from an infinite population).
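
The verification in Example 7.15 can be repeated for any small population. The following is a minimal sketch, assuming Python's standard `itertools` and `statistics` modules; the helper name `finite_population_variance` is hypothetical. `permutations` lists every ordered sample drawn without replacement.

```python
from itertools import permutations
from statistics import mean, pvariance

def finite_population_variance(sigma2, N, n):
    """Var of the sample mean when sampling without replacement."""
    return (sigma2 / n) * (N - n) / (N - 1)

population = [1, 4, 7]
N, n = len(population), 2

# Every ordered sample of size 2, drawn without replacement
sample_means = [sum(s) / n for s in permutations(population, n)]

observed = pvariance(sample_means)
predicted = finite_population_variance(pvariance(population), N, n)
print(mean(sample_means), observed, predicted)

# For a very large population the factor (N - n)/(N - 1) is close to 1
print(finite_population_variance(6, 10**9, 2))
```

The last line illustrates the limiting case: with σ² = 6 and n = 2 the value approaches σ²/n = 3.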



Exercise 7e

For each of the following distributions, where X is the r.v. ‘the sample mean’, N
(a) find the mean wu and the variance 0°, is the number in the population and n is
(b) by taking all possible samples of size the sample size.
2
ee nO) What happens as N > ©?
2 verify that E(X) = wand Var(X) = ee

(i) P(X = 0) = 0.6,


P(X = 1)=0.3, Find the mean py and the variance o” of
PX = 3) = 0.1 the five numbers 0, 3, 3, 6, 6. A sample
(ii) P(X = 0) = 0.2, of three of these numbers is to be drawn
P(X =S1)= 0:8; at random without replacement. By
P(X =2) = 0.3, making a list of all such samples, or
P(X = 8) = 0.2) otherwise, show that the sampling distri-
(iii) P(X =— 3) = 0.4, bution of the sample mean X is given by
PGS 2) 003: the following table.
P(X S41) = 03

The discrete random variable J has the


distribution Verify that X is an unbiased estimator* of
Mand calculate the variance. If, instead,
the sample is to be taken with replace-
ment, state the value of the variance of
the sample mean. (JMB)
Find the mean, UU, and variance, 0”, of the *See p. 420, Unbiased estimator.
distribution.
Random samples of size 2 are taken from
the distribution. By considering all possible A lecturer sets her students an assign-
samples, or otherwise, obtain the proba- ment on sampling. Part of the assign-
bility distribution of the mean of such ment involves the students sampling from
samples. a population which consists of 50 peeled
pickling onions kept in a large water-
Give the mean and variance of the distri- filled bowl. The lecturer knows that the
bution of the mean of random samples of mean and standard deviation of the
size 3 from the original distribution. (O) weight of such onions is 24.5 grammes
and 7.7 grammes respectively.
One student randomly selects an onion,
Find the mean U and the variance o” of the finds its weight and then returns it to the
population 1,4,5,9. Draw up a frequency bowl. This student repeats the process
distribution of the means of all possible until he has weighed a total of. nine
samples of size 2. Find the mean and the onions. What would you expect to be
variance of the distribution formed and (a) the mean of the weights of the nine
comment on your answers. onions,
(b) the standard deviation of the mean
weight?

Find the mean and the variance o” of the Another student adopts a different proc-
population 1,4,7,8. Draw up a frequency edure and she selects nine onions at ran-
distribution of the means of all possible dom without replacing them. What would
samples of size 2, taken without replace- you expect the standard deviation of the
ment. mean weight of the nine onions to be in
this case?
Find the mean and the variance of this
distribution and verify that For each approach, what is the minimum
number of onions that have to be selected
2 =
Var(X) = a= if the standard deviation of the sample |
mean is to be less than 3 grammes? (O)

THE DISTRIBUTION OF THE SAMPLE MEAN


(a) From a normal population

If $X_1, X_2, \ldots, X_n$ is a random sample of size n taken from a
normal distribution with mean $\mu$ and variance $\sigma^2$ such that
$X \sim N(\mu, \sigma^2)$, then the distribution of $\bar{X}$ is also normal and

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad\text{where}\quad \bar{X} = \frac{1}{n}(X_1 + X_2 + \ldots + X_n).$$

The distribution of the sample mean ($\bar{X}$) is known as the sampling
distribution of means and the standard deviation of this distribution,
$\frac{\sigma}{\sqrt{n}}$, is known as the standard error of the mean.

Example 7.16 A random sample of size 15 is taken from a normal distribution


with mean 60 and standard deviation 4. Find the probability that
the mean of the sample is less than 58.

Solution 7.16 Now $X \sim N(60, 16)$.

So, for samples of size 15, $\bar{X} \sim N\left(60, \frac{16}{15}\right)$.

We require

$$P(\bar{X} < 58) = P\left(\frac{\bar{X} - 60}{\sqrt{16/15}} < \frac{58 - 60}{\sqrt{16/15}}\right) = P(Z < -1.936) = 0.0264$$

[Sketch: distribution of $\bar{X}$, s.d. $= \sqrt{16/15}$; the value 58 has standardised value $-1.936$.]

The probability that the mean of the sample is less than 58 is
0.0264.
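The calculation can be reproduced without tables. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 60, 4, 15

# Sampling distribution of the mean: N(60, 16/15), so the s.d. is 4/sqrt(15).
xbar_dist = NormalDist(mu, sigma / sqrt(n))

z = (58 - mu) / (sigma / sqrt(n))   # standardised value
p = xbar_dist.cdf(58)               # P(X-bar < 58)

print(round(z, 3), round(p, 4))
```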

Example 7.17 The heights of a particular species of plant follow a normal distribution
with mean 21 cm and standard deviation $\sqrt{90}$ cm. A random
sample of 10 plants is taken and the mean height calculated. Find
the probability that this sample mean lies between 18 cm and
27 cm.

Solution 7.17 Let X be the r.v. ‘the height in cm of a plant’. Then $X \sim N(21, 90)$.

Now n = 10, so $\bar{X} \sim N\left(21, \frac{90}{10}\right)$, i.e. $\bar{X} \sim N(21, 9)$.

We require

$$P(18 < \bar{X} < 27) = P\left(\frac{18 - 21}{3} < \frac{\bar{X} - 21}{3} < \frac{27 - 21}{3}\right) = P(-1 < Z < 2) = 0.8185$$

[Sketch: distribution of $\bar{X}$; the values 18, 21, 27 have standardised values $-1$, 0, 2.]

Therefore the probability that the mean height of the sample lies
between 18 cm and 27 cm is 0.8185.

Example 7.18 A large number of random samples of size n are taken from the
distribution of X where $X \sim N(74, 36)$ and the sample means are
calculated. If $P(\bar{X} > 72) = 0.854$, estimate the value of n.

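Solution 7.18 $X \sim N(74, 36)$.

Therefore $\bar{X} \sim N\left(74, \frac{36}{n}\right)$.

Now

$$P(\bar{X} > 72) = P\left(\frac{\bar{X} - 74}{6/\sqrt{n}} > \frac{72 - 74}{6/\sqrt{n}}\right) = P\left(Z > -\frac{\sqrt{n}}{3}\right)$$

so $P\left(Z > -\frac{\sqrt{n}}{3}\right) = 0.854$.

Therefore $\frac{\sqrt{n}}{3} = 1.054$

$n = 9(1.054)^2 = 10.0$ (3 S.F.)

Samples of size 10 are taken.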
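As a check, the standardised value can be obtained from the inverse of the standard normal distribution function instead of tables. This is a sketch only; `z` and `n_estimate` are illustrative names.

```python
from statistics import NormalDist

mu, sigma = 74, 6

# P(Z > -sqrt(n)/3) = 0.854 is equivalent to P(Z < sqrt(n)/3) = 0.854.
z = NormalDist().inv_cdf(0.854)          # about 1.054

# sqrt(n) * (mu - 72) / sigma = z, so n = (z * sigma / (mu - 72))^2.
n_estimate = (z * sigma / (mu - 72)) ** 2

print(round(z, 3), round(n_estimate, 2))
```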

Example 7.19 (a) If $X_1, X_2, \ldots, X_n$ is a random sample from $N(\mu, 1)$, state the
distribution of the sample mean $\bar{X}$.
(b) Find the sample size required to ensure that the probability
that $\bar{X}$ is within 0.1 of $\mu$ is greater than 0.95.

Solution 7.19 (a) $X \sim N(\mu, 1)$, therefore $\bar{X} \sim N\left(\mu, \frac{1}{n}\right)$.

(b) We require n such that

$P(|\bar{X} - \mu| < 0.1) > 0.95$

i.e. $P(-0.1 < \bar{X} - \mu < 0.1) > 0.95$

$$P\left(\frac{-0.1}{\sqrt{1/n}} < \frac{\bar{X} - \mu}{\sqrt{1/n}} < \frac{0.1}{\sqrt{1/n}}\right) > 0.95$$

$P(-0.1\sqrt{n} < Z < 0.1\sqrt{n}) > 0.95$

Now $P(-1.96 < Z < 1.96) = 0.95$

So $0.1\sqrt{n} > 1.96$

$\sqrt{n} > \frac{1.96}{0.1}$

$n > 384.16$

Therefore the least sample size required is 385.
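The least sample size can also be found numerically. The sketch below assumes the same requirement, $P(|\bar{X} - \mu| < 0.1) > 0.95$ with $\sigma = 1$; `prob_within` is a hypothetical helper used only to confirm the answer.

```python
from math import ceil, sqrt
from statistics import NormalDist

half_width, sigma, level = 0.1, 1.0, 0.95

# Two-sided critical value: P(-z < Z < z) = 0.95 gives z of about 1.96.
z = NormalDist().inv_cdf((1 + level) / 2)

# Need half_width * sqrt(n) / sigma > z.
n_min = ceil((z * sigma / half_width) ** 2)

def prob_within(n):
    """P(|X-bar - mu| < half_width) for a sample of size n."""
    limit = half_width * sqrt(n) / sigma
    return 2 * NormalDist().cdf(limit) - 1

print(n_min, prob_within(n_min), prob_within(n_min - 1))
```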

Exercise 7f

1. If X ~ N(200, 80) and a random sample of size 5 is taken from the
distribution, find the probability that the sample mean (a) is greater
than 207, (b) lies between 201 and 209.

2. If X ~ N(200, 100) and a random sample of size 10 is taken from the
distribution, find the probability that the sample mean lies outside
the range 198 to 205.

3. If X ~ N(50, 12) and a random sample of size 12 is taken from the
distribution, find the probability that the sample mean (a) is less
than 48.5, (b) is less than 52.3, (c) lies between 50.7 and 51.7.

4. At a college, the masses of the male students are distributed
approximately normally with mean mass 70 kg and standard deviation
5 kg. Four male students are chosen at random. Find the probability
that their mean mass is less than 65 kg.

5. A normal distribution has a mean of 40 and a standard deviation of 4.
If 25 items are drawn at random, find the probability that their mean
is (a) 41.4 or more, (b) between 38.7 and 40.7, (c) less than 39.5.

6. If a large number of samples, size n, are taken from a population
which follows a normal distribution with mean 74 and standard
deviation 6, (a) find n if the probability that the sample mean
exceeds 75 is 0.282, (b) find n if the probability that the sample
mean is less than 70.4 is 0.001 35.

7. A normal distribution has a mean of 30 and a variance of 5. Find the
probability that (a) the average of 10 observations exceeds 30.5,
(b) the average of 40 observations exceeds 30.5, (c) the average of
100 observations exceeds 30.5. Find n such that the probability that
the average of n observations exceeds 30.5 is less than 1%.

8. The r.v. X is such that $X \sim N(\mu, 4)$. A random sample, size n, is
taken from the population. Find the least n such that
$P(|\bar{X} - \mu| < 0.5) > 0.95$.

9. $\bar{X}$ is the r.v. ‘the sample mean of samples, size 15, taken from
N(30, 18)’ and $\bar{Y}$ is the r.v. ‘the sample mean of samples, size 8,
taken from N(20, 16)’. Find the distribution of (a) $\bar{X} - \bar{Y}$,
(b) $\bar{X} + \bar{Y}$, (c) $\bar{Y} - \bar{X}$, (d) $5\bar{X} + 8\bar{Y}$, (e) $4\bar{X} - 2\bar{Y}$.

10. In a certain country the heights of men are normally distributed with
mean 175 cm and standard deviation 5 cm and the heights of women are
normally distributed with mean 165 cm and standard deviation 6 cm.
Find the probability that the mean height of three women chosen at
random is greater than the mean height of four men chosen at random
from the population.

11. The continuous random variable X is such that X ~ N(20, 16). If
samples of size n are taken and $\bar{X}$ is the random variable ‘the mean
of the n sample values’, find the least value of n such that
$P(\bar{X} > 21) < 0.05$.

12. A random sample $X_1$, $X_2$ is drawn from a distribution with mean $\mu$
and standard deviation $\sigma$. State the mean and standard deviation of
the distribution of (a) $X_1 + X_2$, (b) $X_1 - X_2$, (c) $\bar{X}$.
A student’s performance is equally good in two subjects. The marks he
might be expected to score in each subject may be treated as
independent observations drawn from a normal distribution with mean
45 and standard deviation 5. Two procedures might be used to decide
whether to give the student an overall pass. One is to demand that he
pass separately in each subject, the pass mark being 40; the other is
to require that his mean mark in the two subjects exceeds 40. Find
the probability that the student will obtain an overall pass by each
of these procedures. (O)

13. In a certain nation, men have heights distributed normally with mean
1.70 m and standard deviation 10 cm. Find the probability that a man
chosen randomly has height not less than 1.83 m.
What is the probability that the average height of three men chosen
randomly is greater than 1.78 m and the probability that all three
will have heights greater than 1.83 m?
For the nation, women have heights distributed normally with mean
1.60 m and standard deviation 7.5 cm. Find the probability that a
husband and wife have not more than 5 cm difference in heights and
state the assumptions that you have made in the calculation. (MEI)

14. $X_1$ and $X_2$ are random variables such that $X_1$ is normally distributed
with mean 120 and variance 8 and $X_2$ is normally distributed with
mean 150 and variance 22. A random sample of size 20 is taken from
the distribution of $3X_1 + 4X_2$. Find the distribution of the sample
mean.

15. Random variables X and Y are such that X ~ N(100, 10) and
Y ~ N(120, 20). Random samples of size 50 are taken from each
distribution. Find the probability that the sample from the
distribution of Y will have a mean which is at least 21 more than the
mean of the sample from the distribution of X.

16. Every child in a class does an experiment which consists of measuring
V, the volume of water displaced by a solid sphere. The children’s
values of V are distributed approximately normally with mean 27.4 cm³
and standard deviation 1 cm³.
(a) Given that the nominal volume $V_0$ of the sphere is 27.1 cm³,
estimate to 2 decimal places the probability, p, that the value of V
of a child chosen at random exceeds $1.05V_0$.
(b) The nominal radius $r_0$ of the sphere, calculated from the formula
$r_0 = (3V_0/4\pi)^{1/3}$, is 1.86 cm. Each child calculates a value r for
the radius of the sphere, using the formula $r = (3V/4\pi)^{1/3}$. Explain
why you would expect the probability that a value of r of a child
chosen at random exceeds $1.05r_0$ to be different from the value of p
you obtained in (a). Show that, in fact, it is extremely unlikely
that any child’s value of r exceeds $1.05r_0$.
(Hint: express $r > 1.05r_0$ in terms of V and $V_0$.)
(c) The measured values of V obtained by a second class of children
are also distributed approximately normally with the same mean as the
first class, but with a standard deviation of 1.5 cm³. One child from
each class is chosen at random. Estimate to 2 decimal places the
probability that the mean of their values of V exceeds $1.05V_0$. (MEI)

17. (a) If X and Y are independent random variables with means $\mu_x$, $\mu_y$
and variances $\sigma_x^2$, $\sigma_y^2$ respectively, show from first principles
that the mean and variance of $aX + bY$ are $a\mu_x + b\mu_y$ and
$a^2\sigma_x^2 + b^2\sigma_y^2$ respectively where a and b are constants.
(b) The diameters x of 110 steel rods were measured in centimetres
and the results were summarised as follows:
DoT, 86 Butea tae Le. 0.
Find the mean and standard deviation of these measurements.
Assuming these measurements are a sample from a normal distribution
with this mean and this variance, find the probability that the mean
diameter of a sample of size 110 is greater than 0.345 cm. (O&C)

18. The number of miles travelled per week by a motorist is distributed
normally with a mean of 640 and a standard deviation of 50.
(a) Calculate the probabilities that in a week he will travel
(i) more than 600 miles, (ii) between 600 and 700 miles.
(b) Calculate the probability that the average number of miles
travelled per week over a complete year of 52 weeks will exceed 650.
(c) If the car’s petrol consumption is 30 miles per gallon, calculate
the probability that the motorist will use less than 80 gallons of
petrol over a period of 4 weeks. (JMB)

THE DISTRIBUTION OF THE SAMPLE MEAN (continued)

(b) From any population

The central limit theorem

If $X_1, X_2, \ldots, X_n$ is a random sample of size n from any distribution
with mean $\mu$ and variance $\sigma^2$ then, for large n, the distribution
of the sample mean ($\bar{X}$) is approximately normal and

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

NOTE: the approximation gets better as n gets larger.

Now if $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ then $n\bar{X} \sim N\left(n\mu, \frac{n^2\sigma^2}{n}\right)$

But $n\bar{X} = X_1 + X_2 + \ldots + X_n$

therefore $X_1 + X_2 + \ldots + X_n \sim N(n\mu, n\sigma^2)$

If $X_1, X_2, \ldots, X_n$ is a random sample of size n from any distribution
with mean $\mu$ and variance $\sigma^2$ then, for large n, the distribution
of the sum of the random variables is approximately normal with
mean $n\mu$ and variance $n\sigma^2$.

The definition in this form is also referred to as the central limit
theorem.
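The theorem can be illustrated by simulation. The sketch below is an assumption-laden example rather than part of the text: it takes samples of size 30 from a rectangular (uniform) distribution on 3 to 6, which has mean 4.5 and variance 0.75, and compares the mean and variance of the simulated sample means with $\mu$ and $\sigma^2/n$. The seed and the number of trials are arbitrary choices.

```python
import random
from statistics import fmean, pvariance

random.seed(1)            # fixed seed so that the run is repeatable
n, trials = 30, 20_000

# Each entry is the mean of n observations from the uniform distribution on [3, 6].
means = [fmean(random.uniform(3, 6) for _ in range(n)) for _ in range(trials)]

print(fmean(means))       # should be close to mu = 4.5
print(pvariance(means))   # should be close to sigma^2 / n = 0.75 / 30 = 0.025
```

A histogram of `means` would look close to a normal curve even though the parent distribution is flat.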

Example 7.20 If a random sample of size 30 is taken from each of the following
distributions, find, for each case, the probability that the sample
mean exceeds 5.

(a) X ~ Po(4.5), (b) X ~ Bin(9, 0.5), (c) X ~ R(3, 6).



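Solution 7.20 (a) X ~ Po(4.5), so E(X) = 4.5 and Var(X) = 4.5.

The sample size is large, so by the central limit theorem

$\bar{X} \sim N\left(4.5, \frac{4.5}{30}\right)$ approximately.

The standard deviation of $\bar{X}$ is $\sqrt{4.5/30} = \sqrt{0.15}$.

So $P(\bar{X} > 5) = P\left(\frac{\bar{X} - 4.5}{\sqrt{0.15}} > \frac{5 - 4.5}{\sqrt{0.15}}\right) = P(Z > 1.291) = 0.0983$

[Sketch: s.d. $= \sqrt{0.15}$; the value 5 has standardised value 1.291.]

So, if X ~ Po(4.5), then $P(\bar{X} > 5) = 0.0983$.

(b) X ~ Bin(9, 0.5)
so
E(X) = (9)(0.5) = 4.5 and Var(X) = (9)(0.5)(0.5) = 2.25

Now, by the central limit theorem, $\bar{X} \sim N\left(4.5, \frac{2.25}{30}\right)$.

The standard deviation of $\bar{X}$ is $\sqrt{2.25/30} = \sqrt{0.075}$.

So $P(\bar{X} > 5) = P\left(\frac{\bar{X} - 4.5}{\sqrt{0.075}} > \frac{5 - 4.5}{\sqrt{0.075}}\right) = P(Z > 1.826) = 0.0340$

[Sketch: s.d. $= \sqrt{0.075}$; the value 5 has standardised value 1.826.]

Therefore if X ~ Bin(9, 0.5) then $P(\bar{X} > 5) = 0.034$.

(c) X ~ R(3, 6).
Then
$E(X) = \frac{1}{2}(3 + 6) = 4.5$ and $\mathrm{Var}(X) = \frac{(6 - 3)^2}{12} = 0.75$

So $\bar{X} \sim N\left(4.5, \frac{0.75}{30}\right)$ by the central limit theorem

i.e. $\bar{X} \sim N(4.5, 0.025)$

Now $P(\bar{X} > 5) = P\left(\frac{\bar{X} - 4.5}{\sqrt{0.025}} > \frac{5 - 4.5}{\sqrt{0.025}}\right) = P(Z > 3.162) = 0.000\,783$

[Sketch: s.d. $= \sqrt{0.025}$; the value 5 has standardised value 3.162.]

Therefore if X ~ R(3, 6) then $P(\bar{X} > 5) = 0.000\,783$.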
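The three parts differ only in the mean and variance of the parent distribution, so they can be handled by one short function. This is a sketch under the normal approximation; `p_mean_exceeds` is a hypothetical helper name, and small differences from the table values in the last figure are to be expected.

```python
from math import sqrt
from statistics import NormalDist

def p_mean_exceeds(mean, variance, n, value):
    """Central limit theorem approximation to P(sample mean > value)."""
    return 1 - NormalDist(mean, sqrt(variance / n)).cdf(value)

# (mean, variance) of each parent distribution.
cases = {
    "Po(4.5)":     (4.5, 4.5),
    "Bin(9, 0.5)": (9 * 0.5, 9 * 0.5 * 0.5),
    "R(3, 6)":     ((3 + 6) / 2, (6 - 3) ** 2 / 12),
}

results = {name: p_mean_exceeds(m, v, 30, 5) for name, (m, v) in cases.items()}
for name, p in results.items():
    print(name, round(p, 6))
```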



Example 7.21 If a large number of samples of size n are taken from Po(2.5) and
approximately 5% of the sample means are less than 2.025, estimate
n.

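Solution 7.21 If X ~ Po(2.5) then

$E(X) = \mu = 2.5$
$\mathrm{Var}(X) = \sigma^2 = 2.5$

So, by the central limit theorem

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately

i.e. $\bar{X} \sim N\left(2.5, \frac{2.5}{n}\right)$

Now, we require n such that $P(\bar{X} < 2.025) = 0.05$

i.e. $P\left(\frac{\bar{X} - 2.5}{\sqrt{2.5/n}} < \frac{2.025 - 2.5}{\sqrt{2.5/n}}\right) = 0.05$

$P\left(Z < -\frac{0.475}{\sqrt{2.5/n}}\right) = 0.05$

[Sketch: s.d. $= \sqrt{2.5/n}$; the value 2.025 has standardised value $-1.645$.]

Now, from tables $P(Z < -1.645) = 0.05$

So $\frac{0.475}{\sqrt{2.5/n}} = 1.645$

$\sqrt{n} = \frac{1.645\sqrt{2.5}}{0.475} = 5.476$

so n = 29.98

Therefore an approximate value for n is 30.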

Exercise 7g

1. To find the mean life and the standard 2. Ifa large number of samples, size 30, are
deviation of a particular make of fluores- taken with replacement from the following
cent light bulbs a large number of samples distribution, find the mean and standard
of 100 bulbs are tested. The mean and deviation of the sampling distribution of
the standard deviation of the resulting means. Estimate the probability that a
sampling distribution of means were sample mean exceeds 4.
found to be 1580 hours and 120 hours,
respectively. Calculate the mean life and
lx | 0 oo 3. Au 6406
the standard deviation of this make of fst G0n16 227) 2116 5
light bulbs.

A random sample of size 100 is taken standard error of the sampling distribu-
from Bin(20, 0.6). Find the probability tion obtained were 20 500 km and
that (a) X is greater than 12.4, (0) X is 250 km respectively. Estimate the mean
less than 12.2, where X is the sample life and the standard deviation of this
mean. brand of car tyre.

12. The standard deviation of the masses


The heights of a new variety of sun- of articles in a large population is
flower are normally distributed with 4.55 kg. If random samples of size 100
mean 2m and standard deviation 40 cm. are drawn from the population, find the
100 samples of 50 flowers each are probability that a sample mean will
measured. In how many would you differ from the true population mean by
expect the sample mean to be (a) greater less than 0.8 kg.
than 210cm, (b) between 195cm and
205 cm, (c) less than 188 cm? 13. The lifetime, X, in hours of an electrical
component is modelled by the following
A random sample of size 30 is taken probability density function.
from Po(4). Find (a) P(X <4.5),
(b) P(X > 3.8), (c) P(3.8 =< X <5).
f(x) = ae 0<x<10
If a large number of samples, of size n, 0’ elsewhere
are taken from Po(4.6) and approx-
imately 2.5% of the sample means are (a) Show that a= 20 and sketch the
less than 4.005, estimate n. graph of y = f(x).
(b) Find the mean and variance of X,
If a large number of samples of size n each correct to three significant figures.
are taken from Po(2.9) and approx- (The substitution p =x+10 will help
imately 1% of the sample means are with the integrals.)
greater than 3.41, estimate n. (c) What is the probability that a random
sample of 80 components has a mean life
If a large number of samples of size n of more than 4 hours? (SUJB)
are taken from R(2,30) and approx-
imately 80% of the sample means are 14. A continuous random variable X has
less than 17.15, estimate n. probability density function given by

OR a
If a large number of samples of size n
f{ay= (axe 42= 0
are taken from Bin(20, 0.2) and approx-
ax, O<Sx<2,
imately 90% of the sample means are
On eet
less than 4.354, estimate n.
where a is a constant. Sketch the graph
10. If a large number of samples of size n of f and hence, or otherwise, find the
is taken from R(2, 6) and approximately value of a.
1% of the sample means are less than
Show that Var(X) = 2.
2.8, estimate the value of n.
A random sample of 200 independent
11. To find the mean life and standard observations of X is taken. Using a
deviation of a certain brand of car suitable approximation, find the proba-
tyres a large number of random samples bility that the sample mean exceeds 0.2.
of size 50 were tested. The mean and (C)

THE DISTRIBUTION OF THE SAMPLE PROPORTION

Consider a binomial population in which p is the proportion of


‘successes’, then the probability of success in any trial is p.
If a random sample of size n is taken from this population and X
is the r.v. ‘the number of successes’,
then X ~ Bin(n, p)
and for large n, X ~ N(np, npq) approximately, where q = 1—p
(see p. 355).

Now, if $P_s$ is the r.v. ‘the proportion of successes in a sample’,

then $P_s = \frac{X}{n}$

$E(P_s) = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{1}{n}(np) = p$

and

$\mathrm{Var}(P_s) = \mathrm{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2}\mathrm{Var}(X) = \frac{1}{n^2}(npq) = \frac{pq}{n}$

So, for large n, $P_s \sim N\left(p, \frac{pq}{n}\right)$ approximately.

NOTE: the larger the sample size, the better the approximation.
The distribution of $P_s$ is known as the sampling distribution of
proportions and the standard deviation of the sampling distribution,
$\sqrt{\frac{pq}{n}}$, is known as the standard error of proportion.

NOTE: when considering the normal approximation to the binomial
distribution, a continuity correction of $\pm\frac{1}{2}$ is used.
Now, since $P_s = \frac{X}{n}$, we use a continuity correction of $\pm\frac{1}{2n}$.

Example 7.22 It is known that 3% of frozen pies arriving at a freezer centre are
broken. What is the probability that, on a morning when 500 pies
arrive, (a) 5% or more will be broken, (b) 3% or less will be
broken?

Solution 7.22 Let $P_s$ be the r.v. ‘the proportion of pies in the sample which are
broken’.
Then

$P_s \sim N\left(p, \frac{pq}{n}\right)$ approximately, where n = 500, p = 0.03

i.e. $P_s \sim N\left(0.03, \frac{(0.03)(0.97)}{500}\right)$

i.e. $P_s \sim N(0.03, 0.007\,63^2)$

(a) We require

$P(P_s \ge 0.05) \to P\left(P_s > 0.05 - \frac{1}{(2)(500)}\right)$ (continuity correction)

$= P\left(\frac{P_s - 0.03}{0.007\,63} > \frac{0.049 - 0.03}{0.007\,63}\right)$

$= P(Z > 2.49)$

$= 0.006\,39$

[Sketch: s.d. = 0.007 63; the value 0.049 has standardised value 2.49.]

Therefore the probability that 5% or more will be broken is 0.006 39.

(b) We require

$P(P_s \le 0.03) \to P\left(P_s < 0.03 + \frac{1}{(2)(500)}\right)$ (continuity correction)

$= P\left(\frac{P_s - 0.03}{0.007\,63} < \frac{(0.03 + 1/1000) - 0.03}{0.007\,63}\right)$

$= P(Z < 0.131)$

$= 0.5521$

[Sketch: s.d. = 0.007 63; the value 0.031 has standardised value 0.131.]

Therefore the probability that 3% or less are broken is 0.5521.

We now show another method of approaching this problem.

Let X be the r.v. ‘the number of broken pies in a sample’.
Then X ~ N(np, npq) where n = 500, p = 0.03, q = 0.97.
So $X \sim N(15, 3.814^2)$.

(a) Now 5% of 500 = 25.

So we require

$P(X \ge 25) \to P(X > 24.5)$ (continuity correction)

$= P\left(\frac{X - 15}{3.814} > \frac{24.5 - 15}{3.814}\right)$

$= P(Z > 2.49)$

$= 0.006\,39$

The probability that 5% or more will be broken is 0.006 39, as
before.

(b) 3% of 500 = 15.

So we require

$P(X \le 15) \to P(X < 15.5)$ (continuity correction)

$= P\left(\frac{X - 15}{3.814} < \frac{15.5 - 15}{3.814}\right)$

$= P(Z < 0.131)$

$= 0.5521$

The probability that 3% or less will be broken is 0.5521, as before.

NOTE: problems of this type may be solved by considering the
distribution of the sample proportion, $P_s$, or by using the normal
approximation to the binomial distribution. If the continuity
corrections are used in both cases, or omitted in both cases, the
standardised values will agree exactly.
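The agreement between the two methods can be seen numerically. The following sketch uses the pie data (n = 500, p = 0.03) for the event ‘5% or more broken’; the names `z_prop` and `z_count` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 500, 0.03
q = 1 - p

# Method 1: sample proportion, with continuity correction 1/(2n).
z_prop = ((0.05 - 1 / (2 * n)) - p) / sqrt(p * q / n)

# Method 2: number of successes, with continuity correction 1/2.
z_count = ((25 - 0.5) - n * p) / sqrt(n * p * q)

prob = 1 - NormalDist().cdf(z_prop)
print(round(z_prop, 3), round(z_count, 3), round(prob, 5))
```

The two standardised values are the same expression with numerator and denominator each divided by n, which is why they agree.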

Exercise 7h

1. 2% of the trees in a plantation are known to have a certain disease.
What is the probability that, in a sample of 300 trees (a) less than
1%, (b) more than 4% are diseased?

2. A fair coin is tossed 150 times. Find the probability that (a) less
than 40% of the tosses will result in heads, (b) between 40% and 50%
(inclusive) are heads, (c) more than 55% are heads.

3. A fair coin is tossed 300 times. Work through parts (a), (b), (c) as
in Question 2. Why are the results different?

4. Mr Hand gained 48% of the votes in the District Council Elections.
What is the probability that a poll of (a) 100, (b) 1000 randomly
selected voters would show over 50% in favour of Mr Hand?

5. Three-quarters of the houseowners in a particular area own a colour
television set. Find the probability that at least 73 of a random
sample of 100 houseowners in the area own a colour television set.

6. A die is biased so that 1 in 5 throws results in a six. Find the
probability that, when the die is thrown 300 times, (a) more than 70
throws will result in a six, (b) at least 70 throws will result in a
six, (c) less than 57 throws will result in a six.

7. 70% of the strawberry plants of a particular variety produce more
than 10 strawberries per plant. Find the probability that the random
sample of 50 plants in my garden consists of more than 37 plants which
produce more than 10 strawberries per plant.

SUMMARY — THE SAMPLE MEAN AND THE SAMPLE PROPORTION

Distribution of the sample mean ($\bar{X}$)

If $X_1, X_2, \ldots, X_n$ is a random sample of size n taken from
a normal distribution such that $X \sim N(\mu, \sigma^2)$ then

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad\text{where}\quad \bar{X} = \frac{1}{n}\sum X_i$$

The central limit theorem:

For large n, the result holds approximately for a random sample taken
from any distribution.

Distribution of the sample proportion ($P_s$)

For large n, $P_s \sim N\left(p, \frac{pq}{n}\right)$

where p is the proportion of successes in the population,

q = 1 − p,

n is the number in the sample.

RANDOM SAMPLING

If we are to select an item at random from a population then we


must ensure that each item in the population has an equal chance of
being selected.
To obtain a random sample of n items we repeat n times the pro-
cedure for selecting one item. However, each selection must be
independent of any other.

Example 7.23 Discuss how to select, at random, a sample of two people from a
group of six.

Solution 7.23 Write the name of each person on one of six otherwise identical
discs and mix them thoroughly in a hat. Without looking, select a
disc, note the name and return it to the hat. Draw again. If the first
name re-appears, disregard it and repeat the procedure until a
different name appears. The sample of two people is then obtained.
An alternative method might be to allocate to each person one of
the numbers 1, 2, 3, 4, 5, 6 and then select the people correspon-
ding to the numbers obtained on a die when it is thrown twice, for
example (3, 5).
If the population is large then the method of ‘drawing out of a hat’
is obviously not practical. We can however allocate a number to
each item and make the choice by referring to Random Number
Tables, shown on p. 629. If you have a random number generator
on your calculator you will be able to produce a random
3-digit number every time you press it.

NOTE: Most random number tables are computer-generated. These


numbers and the numbers produced on your calculator are known
as ‘pseudo’ random numbers. However, they suit our purposes very
well indeed.

RANDOM NUMBER TABLES

Random number tables consist of lists of digits 0,1, 2,..., 9 which


are such that each digit has an equal chance of appearing at any
stage. So each digit has a probability of 1/10 of occurring.

In random number tables the digits may be listed individually, or


grouped in some way. This is solely for convenience of printing.
Here are some examples:

List (a) 6 68 th Z 5 3 68 1 Deo


2 5 3 4 7 0 5 Ai VO. @5
Ser LORS d er aa 0 5

List (b) 52 74 54 80 68 72 51 96 08 00


Ogi en 005-03 60 ba ued 44
List (c) 848051 386103 153842
242330 580007 479971
These tables may be used to represent any number, discrete or
continuous.

Example 7.24 Using random number tables, select at random a sample of 8 people
from a group of 100.

Solution 7.24 Allocate a two-digit number to each person, for example 01 for the
first on the list, 02 for the second, ... to 98, 99, 00 (calling the
hundredth person 00, for convenience).

Using list (a) above, we might select people corresponding to the


following numbers:

Ssmieizuoo. SIb1 59525 34 70


Example 7.25 Choose 8 people from a group of 60.

Solution 7.25 Allocate each person a number 01 to 60, then disregard any
number outside this range. Using list (a)

68 7 538 Bt 59 25 34 W 54 9
32 68 FF AT 05

So the people chosen will correspond to the numbers

53, 59, 25, 34, 54, 32, 47, 05.

Example 7.26 Take a random sample of 12 numbers (to 2 d.p.) from the
continuous range $0 \le x < 10$.

Solution 7.26 We require the sample values to have 2 d.p. accuracy so we will
need to consider groups of 3 digits, inserting the decimal point
between the first and second digit. Using list (b) on p. 411:

5.27, 4.54, 8.06, 8.72, 5.19, 6.08,
0.00, 2.52, 0.99, 3.60, 4.35, 7.42

Example 7.27 Take a random sample of 4 numbers (to 3 d.p.) from the
continuous range $0 \le x < 5$.

Solution 7.27 Using list (c) on p. 411 and disregarding any values out of range,
we have

8.480, 5.138, 6.103 (out of range), 1.538, 4.224, 2.330, 5.800 (out of range), 0.747

So the numbers chosen are

1.538, 4.224, 2.330, 0.747.

SAMPLING FROM GIVEN DISTRIBUTIONS

(a) Frequency distributions

Example 7.28 Take a random sample of size 5 from the following distribution,
using the random numbers 364294 588330 923918 400300.

Solution 7.28 Consider first the cumulative frequencies and then transfer them to
proportional frequencies with a total proportion of 1. Random
numbers can then be allocated in accordance with the cumulative
proportional frequencies as shown:

Since the proportional frequencies are all given to 2d.p., we


consider 2-digit random numbers. Note that 00 was allocated to
the x-value of 4 for convenience.
Random numbers: 36, 42, 94, 58, 83
Sample value: Dt 2h Oe
So a random sample of size 5 taken from the distribution gives
sample values 2, 2, 3, 3, 4.

(b) Probability distributions

Example 7.29 A discrete random variable X has probability distribution

Generate a random sample of size 10 from the distribution, using


the random numbers 3, 7, 4, 7, 6, 5, 3, 3, 9, 0.

Solution 7.29 Form the cumulative distribution function F(x) and then allocate
random numbers in a convenient way:

Taking 10 sample values, using the random numbers given, we have


Random number: Bae ly 10,8, sone, Bee OD
Sample value: fae Glee) 1. 38, 3
NOTE: We could have decided on a different allocation of the
random numbers, for example

a ae
Corresponding
random numbers

In this case, the sample generated would have been


Random number: 3°17) AsienS; Baus. 0

Sample value: Dy 12, RODMa 12 MICE TO

NOTE: When sampling from a given p.d.f. remember that every


member of the population must have an equal chance of being
selected. In each case, work with the cumulative distribution
function F(x). When we know F(x) it is easy to allocate the random
numbers.

Example 7.30 Take a random sample of four from a binomial distribution with
parameters n = 4 and p = 0.2, using the random numbers 2811,
5747, 6157, 8988.

Solution 7.30 X ~ Bin(4, 0.2). Since the given random numbers have 4 digits, we
will work to 4 d.p.

Probability                                  Cumulative distribution function, F(x)
$P(X = 0) = (0.8)^4 = 0.4096$                F(0) = 0.4096
$P(X = 1) = 4(0.8)^3(0.2) = 0.4096$          F(1) = 0.8192
$P(X = 2) = 6(0.8)^2(0.2)^2 = 0.1536$        F(2) = 0.9728
$P(X = 3) = 4(0.8)(0.2)^3 = 0.0256$          F(3) = 0.9984
$P(X = 4) = (0.2)^4 = 0.0016$                F(4) = 1 (as expected)

NOTE: We could have used the cumulative binomial probability
tables to calculate the values of F(x).

Putting these results in table form, together with the corresponding
random number allocation, we have:

x    Corresponding random numbers
0    0001 to 4096
1    4097 to 8192
2    8193 to 9728
3    9729 to 9984
4    9985 to 9999 and 0000

The given number 2811 is in the range 0001 to 4096 and corresponds
to x = 0.

Similarly 5747 corresponds to x = 1,
6157 corresponds to x = 1,
and 8988 corresponds to x = 2.
So the random sample of four is 0, 1, 1, 2.
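The allocation of random numbers amounts to finding the smallest x whose F(x) is at least the random number read as a decimal. The following is a minimal sketch of that idea; `binomial_cdf` and `observe` are hypothetical helper names, and treating 0000 as the top of the range is an assumed convention.

```python
from math import comb

def binomial_cdf(n, p):
    """Return the list F(0), F(1), ..., F(n) for X ~ Bin(n, p)."""
    total, cdf = 0.0, []
    for x in range(n + 1):
        total += comb(n, x) * p**x * (1 - p) ** (n - x)
        cdf.append(total)
    return cdf

def observe(random_number, cdf, digits=4):
    """Map a random number such as 2811 to the smallest x with F(x) >= 0.2811."""
    u = random_number / 10**digits
    if u == 0:                      # 0000 is allocated to the top of the range
        u = 1.0
    for x, F in enumerate(cdf):
        if u <= round(F, digits):   # F(x) is used to 4 d.p., as in the table
            return x
    return len(cdf) - 1

cdf = binomial_cdf(4, 0.2)
sample = [observe(r, cdf) for r in (2811, 5747, 6157, 8988)]
print(sample)
```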

Example 7.31 Using the random number 8135 take a single random observation
from a Poisson distribution with parameter 3.
Solution 7.31 X ~ Po(3), so that $P(X = x) = e^{-3}\frac{3^x}{x!}$, x = 0, 1, 2, ...

Let $p_x = P(X = x)$ and use the recurrence formula

$p_{x+1} = \frac{3}{x+1}p_x$

writing values to 4 d.p. but retaining all figures in the calculator:

Probability                              Cumulative distribution function, F(x)
$p_0 = e^{-3} = 0.0498$                  F(0) = 0.0498
$p_1 = 3p_0 = 0.1494$                    F(1) = 0.1991
$p_2 = \frac{3}{2}p_1 = 0.2240$          F(2) = 0.4232
$p_3 = \frac{3}{3}p_2 = 0.2240$          F(3) = 0.6472
$p_4 = \frac{3}{4}p_3 = 0.1680$          F(4) = 0.8153
$p_5 = \frac{3}{5}p_4 = 0.1008$          F(5) = 0.9161
$p_6 = \frac{3}{6}p_5 = 0.0504$          F(6) = 0.9665
$p_7 = \frac{3}{7}p_6 = 0.0216$          F(7) = 0.9881

The probabilities are now very small, so we end with
$P(X \ge 8) = 1 - F(7) = 1 - 0.9881 = 0.0119$
and the corresponding distribution function has the value 1.
Arranging these results in a table, we have

x            Corresponding random numbers
0            0001 to 0498
1            0499 to 1991
2            1992 to 4232
3            4233 to 6472
4            6473 to 8153
5            8154 to 9161
6            9162 to 9665
7            9666 to 9881
8 or over    9882 to 9999 and 0000

The given random number 8135 is in the range 6473 to 8153, so
the random observation corresponds to x = 4.
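The recurrence formula lends itself to a short loop that accumulates F(x) until it reaches the random number. This is a sketch only; `poisson_observation` is a hypothetical name, and unlike the table it returns the exact value of x instead of grouping ‘8 or over’.

```python
from math import exp

def poisson_observation(random_number, lam, digits=4):
    """Return the smallest x with F(x) (to 4 d.p.) >= the random number as a decimal."""
    u = random_number / 10**digits
    if u == 0:                       # 0000 is allocated to the top of the range
        u = 1.0
    x = 0
    p = exp(-lam)                    # p_0 = e^(-lam)
    F = p
    while u > round(F, digits) and x < 1000:
        p *= lam / (x + 1)           # recurrence: p_(x+1) = lam / (x + 1) * p_x
        x += 1
        F += p
    return x

x_obs = poisson_observation(8135, 3)
print(x_obs)
```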
Example 7.32 Using the random numbers 723, 850, take a random sample of size
two from the continuous distribution whose p.d.f. is f(x) where
$f(x) = \frac{3}{8}x^2$ $(0 \le x \le 2)$

Solution 7.32 The cumulative distribution function is given by

$$F(x) = \int_0^x \frac{3}{8}t^2\,dt = \frac{x^3}{8}$$

Now, we use the given random numbers in the following way.

If F(x) = 0.723, then

$\frac{x^3}{8} = 0.723$

and $x = \sqrt[3]{8(0.723)} = 1.80$ (2 d.p.)

and if F(x) = 0.850, then

$\frac{x^3}{8} = 0.850$

and $x = \sqrt[3]{8(0.850)} = 1.89$ (2 d.p.)

So the two random observations are x = 1.80 and x = 1.89.
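Setting F(x) equal to the random number and solving for x can be written directly as a function. A minimal sketch, assuming the distribution function $F(x) = x^3/8$ on 0 to 2; `inverse_cdf` is an illustrative name.

```python
def inverse_cdf(u):
    """Solve F(x) = x**3 / 8 = u for x, where 0 <= u <= 1."""
    return (8 * u) ** (1 / 3)

# The 3-digit random numbers 723 and 850 are read as 0.723 and 0.850.
observations = [round(inverse_cdf(r / 1000), 2) for r in (723, 850)]
print(observations)
```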

Example 7.33 Use the random numbers 382 824 to take a random sample of two
from the normal distribution N(30, 4).

Solution 7.33 $X \sim N(30, 4)$.

The cumulative distribution function is given by $\Phi(z)$ where

$z = \dfrac{x - 30}{2}$

NOTE: $\Phi(z) = 1 - Q(z)$.

Now, if $\Phi(z) = 0.382$, then $P(Z < z) = 0.382$ and $z = -0.3$.

Therefore $\dfrac{x - 30}{2} = -0.3$

$x = 30 - 0.6 = 29.4$

If $\Phi(z) = 0.824$, then $z = 0.931$.

Therefore $\dfrac{x - 30}{2} = 0.931$

$x = 30 + 1.862 = 31.862 = 31.9$ (1 d.p.)
So the two random observations are 29.4 and 31.9.
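
The table look-up for the normal distribution can be mirrored with the inverse normal function in Python's standard library. This is a sketch under stated assumptions: the helper name `normal_from_digits` is illustrative, and `NormalDist.inv_cdf` plays the role of reading the normal tables in reverse.

```python
from statistics import NormalDist

def normal_from_digits(digits: str, mean: float, sd: float) -> float:
    """Turn a block of random digits into an observation from N(mean, sd**2)."""
    u = int(digits) / 10 ** len(digits)    # "382" -> 0.382; must lie strictly between 0 and 1
    return NormalDist(mean, sd).inv_cdf(u)

# N(30, 4) has variance 4, so the standard deviation is 2.
observations = [round(normal_from_digits(d, 30, 2), 1) for d in ("382", "824")]
print(observations)
```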

Exercise 7i

In the following, use the random number tables on p. 629 if random numbers have not been given in the question.

1. Select a random sample of size 10 (to 3 d.p.) from the continuous range
2 = x= oO:

Draw up a random sample of 100 numbers from the discrete integer range 0 to 9. Find the mean and variance of the sample values and compare them with the theoretical mean and variance.

The discrete random variable X has probability distribution

Simulate a sample of size 12 from the distribution of X. Compare the mean and variance of this sample with E(X) and Var(X).

The discrete random variable X has distribution function $F(x) = \frac{1}{4}(x - 2)$, x = 3, 4, 5, 6. Using random number tables, generate 10 observations of X, showing your working clearly.

Describe how you would select a random sample of 30 pupils from a school containing 850 pupils.

You wish to select a person at random from a group of 58 people. The following procedure is suggested:
Allocate the numbers 1 to 58 to the people. Choose a line in a table of random numbers and call the first two digits x and y. Let z = 10x + y. If $1 \leq z \leq 58$ then the person who was allocated the number is selected. Otherwise, the person allocated the number z − 58 is selected. Comment on this method of selection.

6. Take a random sample of size 6 from the distribution:

7. Take a random sample of size 3 from the distribution:

8. Take a random sample of size 10 from each of the following probability distributions. In each case, find the sample mean and variance and compare with E(X) and Var(X).
(a) P(X = x) | 0.11 0.2 0.45 0.24
(b) 01-—02-—-0.3
    P(X = x) | 0.175 0.214 0.329 0.165 0.117
(c) P(X = x) = kx, x = 0, 1, 2, 3.

9. Take a random sample of size 5 from the distribution of X where $F(x) = \frac{1}{5}x$, x = 2, 3, 4, 5.

10. (a) The discrete r.v. X is such that X ~ Bin(3, 0.4). Take a random sample of size 5 from this distribution, using the random numbers
407 315 401 203 972
(b) Using the random number 6143 take a single random observation from the Poisson distribution with parameter 4.

11. Using the random numbers 267 394 018 take a random sample of size 3 from the normal distribution with mean 35 and variance 9.

12. Using the random numbers 2654 9342, make two random observations from each of the following distributions:
(a) The number of seeds that germinate in a group of 5 selected at random, given that 75% are expected to germinate.
(b) The number of goals in a football match, where the number of goals follows a Poisson distribution with variance 2.4.
(c) The mass of a bag of sugar, where the mass is normally distributed with mean 1010 g and standard deviation 4.5 g.

13. Using the random number 256 construct a random observation of the continuous r.v. X where
(a) F(x) =5x’, 0
(b) f(x) =7sx9, 1

14. Take 20 samples, each of size 2, from the following distribution:

Calculate the mean of each sample and find the mean and variance of the sample means. Find the mean and variance of the original distribution. Comment.

15. The following table gives the frequency distribution of the number of telephone calls per minute received over a period of 2400 minutes at the switchboard of a solicitor's office.

No. of calls    0     1     2     3     4     5
Frequency       592   844   602   269   91    2

(a) Convert the frequencies to probabilities working correct to 4 decimal places. Hence draw up a table of cumulative probabilities.
(b) Using the table of random numbers provided, simulate the number of calls arriving at the switchboard in 30 consecutive minutes. Indicate precisely how your values have been obtained.
(c) Calculate the sample mean number of calls per minute. Given that the mean and standard deviation of the number of calls per minute obtained from the table are 1.345 and 1.087 respectively, calculate the probability of a random sample of 30 giving a mean value of at least that obtained from your sample. (SUJB)

16. The digits 8453276 are obtained from a table of random digits. Use them to obtain a random observation from each of the following distributions:
(a) the number of the winning ticket in a lottery in which there are 500 ticket numbers from 1 to 500 and every ticket has the same chance of being selected,
(b) the number of babies born in a cottage hospital in a week, assuming that on average one baby is born every 3 days and that births are independent (and ignoring the possibility of multiple births),
(c) the time between successive emissions of a particle from a radioactive substance, assuming that the probability density function of this time is $2e^{-2t}$ (t > 0). (O)

17. You are given the random number 431. Use this number to obtain a sample observation from
(a) a Binomial distribution with n = 12 and p = 0.4.
(b) a Normal distribution with mean 6.2 and standard deviation 0.1.
You are expected to explain clearly how you obtain the sample observations. (O)

18. The 25 members of a City Council were asked to record over a twenty day period the number of days on which they made a journey by public transport. The results are given below, c indicating that the councillor was a car owner.

(a) Calculate the arithmetic mean and the standard deviation of the population.
(b) Explaining fully the procedure you have followed use the extract from a table of random sampling numbers at the end of the question to
(i) take an unrestricted random sample (i.e. allow the same person to be chosen more than once) of size 5 from the population. Calculate the sample mean and state its standard deviation.
(ii) take a simple random sample (i.e. do not allow the same person to be chosen more than once) of size 5 from the population. Calculate the sample mean and state its standard deviation.
(c) A councillor suggests that an alternative way of estimating the population mean would be to make up the sample of 5 by taking a simple random sample of size 3 from the car owners and one of size 2 from the rest.
Rank the three methods for estimating the mean in order of preference, explaining your choice.

Extract from table of random sampling numbers
) 2828 00920 61841 64754
94342 91090 94035 02650 36284 91162
(AEB 1987)

ESTIMATION OF POPULATION PARAMETERS
Suppose that a population has an unknown parameter, such as the
mean, or the variance, or the proportion of ‘successes’. Then an
estimate of the unknown parameter can be made from the informa-
tion supplied by a random sample (or samples) taken from the popu-
lation.
A statistic used to estimate the value of a parameter is called an
estimator and it is denoted by a capital letter (e.g. U, T, ...). The
numerical value taken by the estimator in a particular instance is
called an estimate and is denoted by a small letter (e.g. u, t, ...).

POINT ESTIMATION — UNBIASED ESTIMATOR

Consider a population with unknown parameter $\theta$.

If $U$ is some statistic derived from a random sample taken from
the population, then $U$ is an unbiased estimator for $\theta$ if

$E(U) = \theta$

There are many estimators which could be formed, but the best (or
most efficient) estimator is the one which (i) is unbiased, and
(ii) has the smallest variance.

Example 8.1 If $X_1, X_2, X_3$ is a random sample taken from a population with
mean $\mu$ and variance $\sigma^2$, find which of the following estimators for
$\mu$ are unbiased, and which is the most efficient of these.

$T_1 = \dfrac{X_1 + X_2 + X_3}{3}$, $\quad T_2 = \dfrac{X_1 + 2X_2}{3}$, $\quad T_3 = \dfrac{X_1 + 2X_2 + 3X_3}{3}$

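Solution 8.1 Now $E(X_i) = \mu$ for $i = 1, 2, 3$.

So $E(T_1) = E\left(\dfrac{X_1 + X_2 + X_3}{3}\right) = \dfrac{1}{3}[E(X_1) + E(X_2) + E(X_3)] = \dfrac{1}{3}(3\mu) = \mu$

As $E(T_1) = \mu$, $T_1$ is an unbiased estimator for $\mu$.

Now $E(T_2) = E\left(\dfrac{X_1 + 2X_2}{3}\right) = \dfrac{1}{3}[E(X_1) + 2E(X_2)] = \dfrac{1}{3}(\mu + 2\mu) = \mu$

As $E(T_2) = \mu$, $T_2$ is an unbiased estimator for $\mu$.

Now $E(T_3) = E\left(\dfrac{X_1 + 2X_2 + 3X_3}{3}\right) = \dfrac{1}{3}[E(X_1) + 2E(X_2) + 3E(X_3)] = \dfrac{1}{3}(\mu + 2\mu + 3\mu) = 2\mu$

As $E(T_3) \neq \mu$, $T_3$ is not an unbiased estimator for $\mu$.

The more efficient estimator is the one which has the smaller
variance.

Now $\mathrm{Var}(T_1) = \mathrm{Var}\left(\dfrac{X_1 + X_2 + X_3}{3}\right) = \dfrac{1}{9}[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \mathrm{Var}(X_3)] = \dfrac{3\sigma^2}{9}$

and $\mathrm{Var}(T_2) = \mathrm{Var}\left(\dfrac{X_1 + 2X_2}{3}\right) = \dfrac{1}{9}[\mathrm{Var}(X_1) + 4\mathrm{Var}(X_2)] = \dfrac{5\sigma^2}{9}$

As $\mathrm{Var}(T_1) < \mathrm{Var}(T_2)$, $T_1$ is a more efficient estimator for $\mu$ than $T_2$.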

NOTE: the sample X,, X2, X3 is made with replacement and the
observations are independent.
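
The expectation and variance of any estimator of the form $T = w_1X_1 + w_2X_2 + w_3X_3$ depend only on the weights, so the comparison in this example can be sketched numerically. The helper below is illustrative (its name and the use of exact fractions are assumptions). It relies on the independence noted above, and returns the multiples of $\mu$ and $\sigma^2$ rather than numerical values.

```python
from fractions import Fraction as F

def linear_estimator_moments(weights):
    """For T = sum(w_i * X_i), with independent X_i each having mean mu and variance sigma^2,
    return (a, b) such that E(T) = a * mu and Var(T) = b * sigma^2."""
    return sum(weights), sum(w * w for w in weights)

estimators = {
    "T1": [F(1, 3), F(1, 3), F(1, 3)],   # (X1 + X2 + X3) / 3
    "T2": [F(1, 3), F(2, 3), F(0)],      # (X1 + 2 X2) / 3
    "T3": [F(1, 3), F(2, 3), F(1)],      # (X1 + 2 X2 + 3 X3) / 3
}

for name, weights in estimators.items():
    a, b = linear_estimator_moments(weights)
    status = "unbiased" if a == 1 else "biased"
    print(f"{name}: E(T) = {a} mu ({status}), Var(T) = {b} sigma^2")
```

An estimator is unbiased exactly when its weights sum to 1, and among the unbiased ones the smaller sum of squared weights gives the more efficient estimator.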

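Example 8.2 Two random samples of sizes $n$ and $3n$ are taken from normal
populations with means $\mu$ and $3\mu$ and variances $\sigma^2$ and $3\sigma^2$ respectively.
If $\bar{X}_1$ and $\bar{X}_2$ are the sample means, show that the estimator
$a\bar{X}_1 + b\bar{X}_2$ is an unbiased estimator for $\mu$ if $a + 3b = 1$. Also, find
the values of $a$ and $b$ if this estimator is to be the most efficient
estimator.

Solution 8.2 $E(\bar{X}_1) = \mu$ and $\mathrm{Var}(\bar{X}_1) = \dfrac{\sigma^2}{n}$

$E(\bar{X}_2) = 3\mu$ and $\mathrm{Var}(\bar{X}_2) = \dfrac{3\sigma^2}{3n} = \dfrac{\sigma^2}{n}$

Therefore $E(a\bar{X}_1 + b\bar{X}_2) = aE(\bar{X}_1) + bE(\bar{X}_2)$

$= a\mu + 3b\mu$

$= \mu(a + 3b)$

The estimator is unbiased if $E(a\bar{X}_1 + b\bar{X}_2) = \mu$,

i.e. the estimator is unbiased if $a + 3b = 1$.

Also

$\mathrm{Var}(a\bar{X}_1 + b\bar{X}_2) = a^2\mathrm{Var}(\bar{X}_1) + b^2\mathrm{Var}(\bar{X}_2) = \dfrac{a^2\sigma^2}{n} + \dfrac{b^2\sigma^2}{n} = \dfrac{\sigma^2}{n}(a^2 + b^2)$

But $a + 3b = 1$, so $a = 1 - 3b$.

So

$\mathrm{Var}(a\bar{X}_1 + b\bar{X}_2) = \dfrac{\sigma^2}{n}[(1 - 3b)^2 + b^2]$

$= \dfrac{\sigma^2}{n}(1 - 6b + 10b^2)$

$= \dfrac{\sigma^2}{n}\left[10\left(b - \dfrac{3}{10}\right)^2 + \dfrac{1}{10}\right]$ (completing the square)

The minimum variance occurs when $b = \dfrac{3}{10}$.

When $b = \dfrac{3}{10}$, $a = 1 - 3\left(\dfrac{3}{10}\right) = \dfrac{1}{10}$.

The most efficient estimator is the estimator which is unbiased and
which has the minimum variance. To satisfy this, $a = \dfrac{1}{10}$ and $b = \dfrac{3}{10}$.

The most efficient estimator for $\mu$ is $\dfrac{1}{10}(\bar{X}_1 + 3\bar{X}_2)$.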

Example 8.3 If $X_1, X_2, \ldots, X_n$ is a random sample taken from a population with
mean $\mu$ and variance $\sigma^2$, prove that $T = k_1X_1 + k_2X_2 + \ldots + k_nX_n$ is
an unbiased estimator for $\mu$ provided that $\sum_{i=1}^{n} k_i = 1$.

Solution 8.3 Now $E(X_i) = \mu$ for $i = 1, 2, \ldots, n$.

So $E(T) = E(k_1X_1 + k_2X_2 + \ldots + k_nX_n)$
$= k_1E(X_1) + k_2E(X_2) + \ldots + k_nE(X_n)$
$= k_1\mu + k_2\mu + \ldots + k_n\mu$
$= \mu(k_1 + k_2 + \ldots + k_n)$

The estimator is unbiased if $E(T) = \mu$.
Now if $(k_1 + k_2 + \ldots + k_n) = 1$, then $E(T) = \mu$.

So, the estimator is unbiased provided that $\sum_{i=1}^{n} k_i = 1$.

CONSISTENT ESTIMATOR

If $U$ is an estimator for an unknown parameter $\theta$, then $U$ is a
consistent estimator for $\theta$ if $\mathrm{Var}(U) \to 0$ as $n \to \infty$, where $n$ is the
size of the sample from which $U$ is obtained.

Example 8.4 If $X_1, X_2, \ldots, X_n$ is a random sample taken from a population with
mean $\mu$ and variance $\sigma^2$, state a condition which will ensure that the
estimator $T = k_1X_1 + k_2X_2 + \ldots + k_nX_n$ is a consistent estimator
for $\mu$.

Solution 8.4 Now $\mathrm{Var}(X_i) = \sigma^2$ for each $i = 1, 2, \ldots, n$. If $T$ is a consistent
estimator for $\mu$, then $\mathrm{Var}(T) \to 0$ as $n \to \infty$.

$\mathrm{Var}(T) = \mathrm{Var}(k_1X_1 + k_2X_2 + \ldots + k_nX_n)$
$= k_1^2\mathrm{Var}(X_1) + k_2^2\mathrm{Var}(X_2) + \ldots + k_n^2\mathrm{Var}(X_n)$
$= \mathrm{Var}(X)[k_1^2 + k_2^2 + \ldots + k_n^2]$
$= \sigma^2 \sum_{i=1}^{n} k_i^2$

So $\mathrm{Var}(T) \to 0$ as $n \to \infty$ if $\sum_{i=1}^{n} k_i^2 \to 0$ as $n \to \infty$.

Exercise 8a

1. If X,, Xz, X3 is a random sample taken 4, We wish to estimate K, the breaking


from a population with mean u and variance strength of elastic bands in a batch. We
e find which of the following estimators observe values of a related random variable
for Mare unbiased: Z which has density 3z7/K° for O0<z<K
and 0 elsewhere. Show that 4Z/3 is an
1 1 ah
(a) Uy= goat on Fee unbiased estimate of K and that it has
variance Kk TBs

(0) U= = 12X Xa hee3te Xs We can also observe independent variables


X 1, X2, X3 each having mean K/2 and
4 1 i variance K*/12. Set Y= X,+ X,+ X3
(c) (GeyU3 = =5)
Ge Oe
fo? =0?
ek What is the value of the constant c such
that cY is an unbiased estimate of K?
i 2 1 Of the unbiased estimates cY and 42/3,
ad) Ug= SX oa
SE 6. es ek oa which do you prefer, and why? (O)
al
(eyes = 3 X2+ X3)
5. Two populations have the2 same mean
2. Ofthe unbiased estimators given in Question
1, which is the most efficient estimator? but different variances oy, 07. A random
observation X, is taken from the first
3. If X,, X2,...,X, is a random sample population and another, X2, from the
taken from a population with mean py and second population. Show that if, in the
variance on show that linear function T = c1X,+0¢,X2, c; and c2
are chosen so that T is an unbiased
it
(a) neat X2+...+ X,) is an unbiased estimator of y (i.e. E(T)= y), and Thas)
minimum variance, then c; = 07 wltere + 04 2)
and consistent estimator for ,
and ¢2 = 07 /(0; + o7).
Xt 2X4+ eels nXy
( Toeie sy is an unbiased Two instruments are used in a laboratory
to measure a particular physical property
and consistent estimator for LU. of metals. From long experience it has

been found that both instruments give two means that is unbiased and has mini-
unbiased readings and that determinations mum variance. (O)
by Instrument A have a variance that is
twice that of determinations by Instrument A random variable X has mean pM and
B. Random samples were taken from an variance 2 and an independent random
ingot of metal and divided between the variable Y has mean 3yu and variance 7.
two instruments. The mean of 12 deter- Find the values of a and b if aX+ bY has
minations by Instrument A was 6.0, in mean LM and minimum variance.
appropriate units, while the mean of 9 The values obtained in a single observation
determinations by Instrument B was 6.5. of each of X and Y are 10 and 25 respec-
Estimate the common mean from these tively. Obtain a best estimate of u and
data by using the linear function of the explain in what sense it is best. (C)

THE MOST EFFICIENT ESTIMATOR OF THE POPULATION MEAN

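From a population with unknown mean $\mu$ take a random sample of
size $n$, and let $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$.

Then the most efficient estimator for $\mu$, which we will write as
$\hat{\mu}$, is $\bar{X}$, where $\bar{X}$ is the sample mean.

We write $\hat{\mu} = \bar{X}$

NOTE: $\bar{X}$ is an unbiased estimator for $\mu$:

$E(\bar{X}) = \dfrac{1}{n}[E(X_1) + E(X_2) + \ldots + E(X_n)] = \dfrac{1}{n}(n\mu) = \mu$

$\bar{X}$ is a consistent estimator for $\mu$:

$\mathrm{Var}(\bar{X}) = \dfrac{1}{n^2}[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \ldots + \mathrm{Var}(X_n)] = \dfrac{1}{n^2}\,n\,\mathrm{Var}(X) = \dfrac{\sigma^2}{n}$

and $\dfrac{\sigma^2}{n} \to 0$ as $n \to \infty$.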

THE MOST EFFICIENT ESTIMATOR OF THE POPULATION VARIANCE


From a population with unknown variance $\sigma^2$ take a random sample
of size $n$, and let $S^2 = \dfrac{\sum (X_i - \bar{X})^2}{n}$, where $S^2$ is the sample variance.

The most efficient estimator for $\sigma^2$, written $\hat{\sigma}^2$, is $\dfrac{nS^2}{n-1}$.

We write $\hat{\sigma}^2 = \dfrac{nS^2}{n-1}$

NOTE: this is a surprising result, as one might expect $\hat{\sigma}^2 = S^2$ but
this is not the case.
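
Example 8.5 Show that $\dfrac{nS^2}{n-1}$ is an unbiased estimator for $\sigma^2$.

Solution 8.5 Now

$S^2 = \dfrac{\sum (X_i - \bar{X})^2}{n} = \dfrac{\sum X_i^2}{n} - \bar{X}^2$

so $nS^2 = \sum X_i^2 - n\bar{X}^2$

and

$E(nS^2) = E\left(\sum X_i^2\right) - E(n\bar{X}^2)$

$= E(X_1^2) + E(X_2^2) + \ldots + E(X_n^2) - nE(\bar{X}^2)$   (i)

Now $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$
and $\mathrm{Var}(X_i) = E(X_i^2) - E^2(X_i)$
so $\sigma^2 = E(X_i^2) - \mu^2$
Therefore $E(X_i^2) = \mu^2 + \sigma^2$

Also $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n}$

and $\mathrm{Var}(\bar{X}) = E(\bar{X}^2) - E^2(\bar{X})$

so $\dfrac{\sigma^2}{n} = E(\bar{X}^2) - \mu^2$

Therefore $E(\bar{X}^2) = \mu^2 + \dfrac{\sigma^2}{n}$

Therefore, from (i)

$E(nS^2) = n(\mu^2 + \sigma^2) - n\left(\mu^2 + \dfrac{\sigma^2}{n}\right) = (n - 1)\sigma^2$

So that $E\left(\dfrac{nS^2}{n-1}\right) = \sigma^2$ as required.

NOTE: $nS^2 = \sum (X_i - \bar{X})^2$ so $E\left[\dfrac{\sum (X_i - \bar{X})^2}{n-1}\right] = \sigma^2$.

Also $\dfrac{nS^2}{n-1}$ is a consistent estimator for $\sigma^2$ (no proof given).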

Example 8.6 Obtain the most efficient, or best, estimates of the population mean
and variance from which the following sample is drawn:
19.30, 19.61, 18.27, 18.90, 19.14, 19.90, 18.76, 19.10

Solution 8.6 The best estimate of the population mean is $\hat{\mu}$ where $\hat{\mu} = \bar{x}$, the
sample mean.

Now $\bar{x} = \dfrac{\sum x}{n} = \dfrac{152.98}{8} = 19.12$ (2 d.p.)

So $\hat{\mu} = 19.12$ (2 d.p.).

The best estimate of the population variance is $\hat{\sigma}^2$ where $\hat{\sigma}^2 = \dfrac{ns^2}{n-1}$
and $s^2$ is the sample variance.

Now $s^2 = \dfrac{\sum (x - \bar{x})^2}{n} = \dfrac{1.782}{8} = 0.223$ (3 d.p.)

So $\hat{\sigma}^2 = \dfrac{ns^2}{n-1} = \dfrac{1.782}{7} = 0.25$ (2 d.p.).
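
The two estimates can be checked with Python's standard `statistics` module. This is a sketch rather than part of the original solution: `pvariance` uses the divisor $n$ (the sample variance $s^2$ as defined in this chapter) and `variance` uses the divisor $n - 1$, which is the unbiased estimate.

```python
from statistics import mean, pvariance, variance

sample = [19.30, 19.61, 18.27, 18.90, 19.14, 19.90, 18.76, 19.10]
n = len(sample)

mu_hat = mean(sample)            # best estimate of the population mean
s2 = pvariance(sample)           # sample variance, divisor n
sigma2_hat = n * s2 / (n - 1)    # best estimate of the population variance, divisor n - 1

print(round(mu_hat, 2), round(sigma2_hat, 2))
```

The value of `sigma2_hat` agrees with `variance(sample)`, since both divide the sum of squared deviations by $n - 1$.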

Example 8.7 Obtain the best unbiased estimates of the population mean and
variance from which the following sample is drawn: $n = 12$, $\bar{x} = 23.5$,
$\sum (x - \bar{x})^2 = 48.72$.

Solution 8.7 The best estimate of the population mean is $\hat{\mu}$ where $\hat{\mu} = \bar{x}$.

Therefore $\hat{\mu} = 23.5$.

The best estimate of the population variance is $\hat{\sigma}^2$ where

$\hat{\sigma}^2 = \dfrac{ns^2}{n-1} = \dfrac{\sum (x - \bar{x})^2}{n-1}$ since $ns^2 = \sum (x - \bar{x})^2$

So $\hat{\sigma}^2 = \dfrac{48.72}{11} = 4.43$ (2 d.p.).

eS SS

Exercise 8b
———— Ot

In questions 1 to 11, find the best unbiased 4. Lx =120, Dx? = 2102,n=8.


estimate of the population mean and of the
population variance from which each of the 5. Lx =120, D(x—X)* = 302, n= 8.
following samples is drawn:
- Des =
1. 46,48, 51,50, 45, 53, 50, 48. 6. Lx = 100, 2x” = 1028, n= 10.

2. 35,42, 38, 55, 70, 69. 7. n=34, Dx = 330, Dx? = 23700.


8. 1.684, 1.691, 1.687, 1.688, 1.689, 1.688, s.
1.690, 1.693, 1.685. B72, wx 560, 2(x—x) = 168 900.

oe 1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80

pears eo me te Fl elpolctel fe 22) ocd



10. Interval 02 W4S) ts=oi2= G6 se20-


Frequency 3 6 24 10 il 0)

D1 |x|20 21 22 23 24 25 Use the sample to obtain unbiased esti-


mates of the population mean and
fad 14 17 26 20 9 variance. Compare these with the true
values.
12. The concentrations, in mg per litre, ofa 14. The random variable X has probability
trace element in 7 randomly chosen distribution
samples of water from a spring were:

240.8 237.3 236.7 236.6 A


234.2 233.9 232.5 O23 LO r2 a Ore!

Determine unbiased estimates of the Use the random numbers given below to
mean and the variance of the concentra- generate a random sample of size 20 from
tion of the trace element per litre of , the distribution of X and use it to obtain
water from the spring. (L)P unbiased estimates of the population
mean and variance.
13. Using the random numbers on p. 636
take a random sample of size 10 from the Random numbers: 57048 86526
following distribution: 27795 36820

ESTIMATOR OF POPULATION PROPORTION

From a binomial population in which $p$ is the proportion of successes
(unknown), a random sample of size $n$ is taken.
Let $P_s$ be the r.v. 'the proportion of successes in the sample'.

Then, an unbiased estimator for $p$ is $P_s$.

NOTE: the estimator is unbiased, since $E(P_s) = p$ (see p. 407).

The estimator is consistent, since $\mathrm{Var}(P_s) = \dfrac{pq}{n}$ where $q = 1 - p$ and

$\dfrac{pq}{n} \to 0$ as $n \to \infty$.

Example 8.8 A random sample of 50 children from a large school is chosen and
the number who are left handed is noted. It is found that 6 are left
handed. Obtain an unbiased estimate of the proportion of children
in the school who are left handed.

Solution 8.8 From the sample, the proportion of children who are left handed is
$p_s$ where $p_s = \dfrac{6}{50} = 0.12$.

An unbiased estimate of the proportion of children in the school


who are left handed is 0.12.

POOLED ESTIMATORS FROM TWO SAMPLES

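Estimates of the population mean, variance, proportion, etc., may
be made by 'pooling' values from two samples.

Pooled estimators of population mean and of population variance

From a population with unknown mean $\mu$ and unknown variance
$\sigma^2$ we take two random samples: Sample I of size $n_1$, with mean $\bar{X}_1$
and variance $S_1^2$, and Sample II of size $n_2$, with mean $\bar{X}_2$ and variance $S_2^2$.

Then $\hat{\mu} = \dfrac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$

where $\hat{\mu}$ is an unbiased estimator for the population mean $\mu$.

The estimator is unbiased since

$E\left(\dfrac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}\right) = \dfrac{1}{n_1 + n_2}E(n_1\bar{X}_1 + n_2\bar{X}_2)$

$= \dfrac{1}{n_1 + n_2}[n_1E(\bar{X}_1) + n_2E(\bar{X}_2)]$

$= \dfrac{1}{n_1 + n_2}(n_1\mu + n_2\mu)$

$= \mu$

Also $\hat{\sigma}^2 = \dfrac{n_1S_1^2 + n_2S_2^2}{n_1 + n_2 - 2}$

where $\hat{\sigma}^2$ is an unbiased estimator for the population variance $\sigma^2$.

The estimator is unbiased since

$E(n_1S_1^2 + n_2S_2^2) = E(n_1S_1^2) + E(n_2S_2^2)$
$= (n_1 - 1)\sigma^2 + (n_2 - 1)\sigma^2$ (see p. 426)
$= (n_1 + n_2 - 2)\sigma^2$

So $E\left(\dfrac{n_1S_1^2 + n_2S_2^2}{n_1 + n_2 - 2}\right) = \sigma^2$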

Example 8.9 Two samples, sizes 40 and 50 respectively, are taken from a population
with unknown mean $\mu$ and unknown variance $\sigma^2$. Using the
data from the two samples, obtain unbiased estimates of $\mu$ and $\sigma^2$.

Sample I Sample II

|x,[18 19 20 21 22 x2|18 19 20 21 22 23
eal 7 A510 8 /f[10 21 PES ©

Solution 8.9 Sample I

$\bar{x}_1 = \dfrac{\sum fx}{\sum f} = \dfrac{807}{40} = 20.175$

$s_1^2 = \dfrac{\sum fx^2}{\sum f} - \bar{x}_1^2 = \dfrac{16\,329}{40} - \left(\dfrac{807}{40}\right)^2 = 1.194$

Sample II

$\bar{x}_2 = \dfrac{\sum fx}{\sum f} = \dfrac{977}{50} = 19.54$

$s_2^2 = \dfrac{\sum fx^2}{\sum f} - \bar{x}_2^2 = \dfrac{19\,177}{50} - \left(\dfrac{977}{50}\right)^2 = 1.7284$

An unbiased estimate for $\mu$ is $\hat{\mu}$ where

$\hat{\mu} = \dfrac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \dfrac{40(20.175) + 50(19.54)}{40 + 50} = 19.82$ (2 d.p.)

An unbiased estimate for $\sigma^2$ is $\hat{\sigma}^2$ where

$\hat{\sigma}^2 = \dfrac{n_1s_1^2 + n_2s_2^2}{n_1 + n_2 - 2} = \dfrac{40(1.194) + 50(1.728)}{40 + 50 - 2} = 1.52$ (2 d.p.)

Therefore an unbiased estimate of the population mean is 19.82
(2 d.p.) and an unbiased estimate of the population variance is
1.52 (2 d.p.).
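
The pooling formulas lend themselves to a small helper function. The sketch below is illustrative (the name `pooled_estimates` is an assumption); it takes each sample's size, mean and variance (divisor $n$), using the summary values found in the solution above.

```python
def pooled_estimates(n1, mean1, var1, n2, mean2, var2):
    """Pooled unbiased estimates of the population mean and variance from two samples.

    var1 and var2 are the sample variances calculated with divisor n."""
    mu_hat = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    sigma2_hat = (n1 * var1 + n2 * var2) / (n1 + n2 - 2)
    return mu_hat, sigma2_hat

mu_hat, sigma2_hat = pooled_estimates(40, 20.175, 1.194, 50, 19.54, 1.728)
print(round(mu_hat, 2), round(sigma2_hat, 2))
```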

Example 8.10 A count was made of the bacteria in a certain volume of water.
Denoting the number of bacteria by $x = 1800 + d$, the results for
the first sample were
$n_1 = 27$, $\sum d = 162$, $\sum (d - \bar{d})^2 = 11\,466$
The results for the second sample, where $y = 1800 + e$, were
$n_2 = 25$, $\sum e = 125$, $\sum (e - \bar{e})^2 = 14\,984$
Obtain unbiased estimates of the population mean and standard
deviation (a) considering the results of the first sample only,
(b) considering both samples.

Solution 8.10 (a) Sample I: $x = 1800 + d$,

$\bar{x} = 1800 + \bar{d} = 1800 + \dfrac{162}{27} = 1806$

$\hat{\sigma}^2 = \dfrac{n_1s_1^2}{n_1 - 1} = \dfrac{\sum (d - \bar{d})^2}{n_1 - 1} = \dfrac{11\,466}{26} = 441$, so $\hat{\sigma} = 21$

Therefore, an unbiased estimate of the population mean is 1806
and an unbiased estimate of the population standard deviation
is 21.

(b) For sample II:

$\bar{y} = 1800 + \bar{e} = 1800 + \dfrac{125}{25} = 1805$

So, for the two samples together,

$\hat{\mu} = \dfrac{n_1\bar{x} + n_2\bar{y}}{n_1 + n_2} = \dfrac{27(1806) + 25(1805)}{52} = 1805.52$ (2 d.p.)

and

$\hat{\sigma}^2 = \dfrac{\sum (d - \bar{d})^2 + \sum (e - \bar{e})^2}{n_1 + n_2 - 2} = \dfrac{11\,466 + 14\,984}{50} = 529$

So, on the basis of the two samples, $\hat{\mu} = 1805.52$ (2 d.p.) and
$\hat{\sigma} = 23$.

Pooled estimator of population proportion


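From a binomial population which has unknown proportion $p$ of
'successes', we take two samples: Sample I of size $n_1$, with sample
proportion $P_{s_1}$, and Sample II of size $n_2$, with sample proportion $P_{s_2}$.

Then $\hat{p}$, an unbiased estimator for the population proportion $p$, is
given by

$\hat{p} = \dfrac{n_1P_{s_1} + n_2P_{s_2}}{n_1 + n_2}$

The estimator is unbiased since

$E\left(\dfrac{n_1P_{s_1} + n_2P_{s_2}}{n_1 + n_2}\right) = \dfrac{1}{n_1 + n_2}[E(n_1P_{s_1}) + E(n_2P_{s_2})]$

$= \dfrac{1}{n_1 + n_2}[n_1E(P_{s_1}) + n_2E(P_{s_2})]$

$= \dfrac{1}{n_1 + n_2}(n_1p + n_2p)$

$= p$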

Example 8.11 An opinion poll in a certain city indicated that 69 people in a
random sample of 120 said that they would vote for Mr Jones,
while in a second random sample of 160, 93 said that they would
vote for Mr Jones. Find an unbiased estimate of the proportion of
people in the city who will vote for Mr Jones.

Solution 8.11 $n_1 = 120$, $p_{s_1} = \dfrac{69}{120}$; $\quad n_2 = 160$, $p_{s_2} = \dfrac{93}{160}$

An unbiased estimate $\hat{p}$ is given by

$\hat{p} = \dfrac{n_1p_{s_1} + n_2p_{s_2}}{n_1 + n_2} = \dfrac{69 + 93}{120 + 160} = 0.58$ (2 d.p.)

So, on the basis of the two samples, it is estimated that approxi-


mately 58% of the people in the city will vote for Mr Jones.
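
Because $n_1p_{s_1}$ and $n_2p_{s_2}$ are just the numbers of successes in each sample, the pooled estimate is the total number of successes divided by the total sample size. A minimal sketch, with an illustrative function name:

```python
def pooled_proportion(successes1, n1, successes2, n2):
    """Pooled unbiased estimate of a population proportion from two samples."""
    return (successes1 + successes2) / (n1 + n2)

p_hat = pooled_proportion(69, 120, 93, 160)
print(round(p_hat, 2))
```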

Exercise 8c

In each of the following, find unbiased estimates 4. SampleI n,=138, 2x = 109.8,


of the population mean and variance, using the Sx 1104
data given by the two samples. Sample II n,=15, 2x = 147.6
Dx? = 1529.68
1. SampleI 0.68, 0.67, 0.61, 0.78, 0.65 5. Ss et = 93 as
Sample II 0.64, 0.66, 0.63, 0.69, 0.66, Tege genes erly bal
Cae Sample II nz, = 18, Dx = 45,
2, Sample 1b-10.2010-1,10,94,10.5,.8.9, Dx? = 275
9.8 6. SampleI 5.26, 5.89, 5.64, 5.83, 5.
Sample II 8.7, 10.6, 10.8, 9.6, 9.9, Fe easton ten ine oe ; aD :aa
10.9, 8.4, 8.6, 10.9 Sample II 5.31, 5.37, 5.41, 5.45, 5.58,
3. SampleI 5.29, 5.36, 5.28
1 See 7. SampleI ni = 9, 2x = 267,
X(x—xX)° = 100
2 Seels tones Sample II n,=11,oeDx = 336,
U(x —%)* = 114.7
Sample II
8. SampleI n,=15, 2x = 35.9
i 2c Se)?
=0.269
3 6 12 26. 175"5 a Sample Tl nz= 20, Lx = 47.8,
D(x —X)* = 0.638

In the following questions, find an unbiased 12. A random sample of 600 people from a
estimate of the population proportion, based certain district were questioned and the
on the data given by the two samples. results indicated that 30% used a parti-
cular product. In a second random sample
9. ny= 200, Dg, = 0.36; n2 = 300, Ps, = 0.34
of 300 people, 96 used the product. Find
10. ny= 50, Ds, = 0.82;n2= 80,Ds, =) 0:85: an unbiased estimate of the proportion of
people in the district who used the
11. ny= 10, ps, = 0.6; nz = 20, py = 0.7 product.

SUMMARY — POINT ESTIMATORS

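POPULATION MEAN

From one sample: $\hat{\mu} = \bar{X}$

From two samples: $\hat{\mu} = \dfrac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$

POPULATION VARIANCE

From one sample: $\hat{\sigma}^2 = \dfrac{nS^2}{n - 1}$

From two samples: $\hat{\sigma}^2 = \dfrac{n_1S_1^2 + n_2S_2^2}{n_1 + n_2 - 2}$

POPULATION PROPORTION

From one sample: $\hat{p} = P_s$

From two samples: $\hat{p} = \dfrac{n_1P_{s_1} + n_2P_{s_2}}{n_1 + n_2}$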

INTERVAL ESTIMATION — CONFIDENCE INTERVALS


An interval estimate of an unknown population parameter is a
random interval constructed so that it has a given probability of
including the parameter.
Consider a population with unknown parameter $\theta$.
If we can find an interval $(a, b)$ such that $P(a < \theta < b) = 0.95$, we
say that $(a, b)$ is a 95% confidence interval for $\theta$.
In this case, 0.95 is the probability that the interval includes $\theta$.
NOTE: it is not the probability that $\theta$ lies in the interval.

CONFIDENCE INTERVAL FOR THE POPULATION MEAN

Consider a population, with mean $\mu$ and variance $\sigma^2$. Now take a
random sample from the population, $X_1, X_2, \ldots, X_n$ and consider
the distribution of $\bar{X}$ where $\bar{X} = \dfrac{1}{n}\sum X_i$, $i = 1, 2, \ldots, n$.

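(a) Confidence interval for $\mu$, population variance $\sigma^2$ known

If $X$ is normally distributed such that $X \sim N(\mu, \sigma^2)$ then, for any $n$,

$\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$

If $X$ does not follow a normal distribution, and $n$ is large, then by
the central limit theorem, approximately,

$\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$

Standardising, we have $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ where $Z \sim N(0, 1)$.

We know that the central 95% of $N(0, 1)$ lies between the values $\pm 1.96$ (p. 336),
with 2.5% in each tail.

So $P\left(-1.96 < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$

$P\left(-1.96\dfrac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96\dfrac{\sigma}{\sqrt{n}}\right) = 0.95$

Now multiply through by $-1$, so reversing the inequality

$P\left(1.96\dfrac{\sigma}{\sqrt{n}} > \mu - \bar{X} > -1.96\dfrac{\sigma}{\sqrt{n}}\right) = 0.95$

$P\left(\bar{X} + 1.96\dfrac{\sigma}{\sqrt{n}} > \mu > \bar{X} - 1.96\dfrac{\sigma}{\sqrt{n}}\right) = 0.95$

Therefore

$P\left(\bar{X} - 1.96\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\dfrac{\sigma}{\sqrt{n}}\right) = 0.95$

So, we have found an interval such that the probability that the
interval includes $\mu$ is 0.95. This is called the 95% confidence interval
for $\mu$.

If $\bar{x}$ is the mean of a random sample of size $n$ taken from a normal
population with known variance $\sigma^2$, then a central 95% confidence
interval for $\mu$, the population mean, is given by

$\left(\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}\right)$

This can be written $\bar{x} \pm 1.96\dfrac{\sigma}{\sqrt{n}}$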

NOTE: if a large number of intervals are calculated in the same


way, then 95% of them will include, or ‘trap’, py.
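
This 'trapping' interpretation can be illustrated by simulation. The sketch below is an illustration under stated assumptions: the population values ($\mu = 10$, $\sigma = 2$), the sample size, the number of trials and the function name `coverage` are all chosen for the example and do not come from the text. It draws repeated samples from a normal population, forms $\bar{x} \pm 1.96\sigma/\sqrt{n}$ each time, and reports the proportion of intervals that include $\mu$.

```python
import random
from math import sqrt

def coverage(mu, sigma, n, trials, seed=0):
    """Proportion of simulated 95% confidence intervals that include the true mean mu."""
    rng = random.Random(seed)               # fixed seed so the run is repeatable
    half_width = 1.96 * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials

print(coverage(mu=10.0, sigma=2.0, n=12, trials=2000))
```

The printed proportion should be close to 0.95, though it varies a little from one set of simulated samples to another.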

If the population is not normal, then we require n to be large


(n > 30, say) for the result to be used.

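Similarly, a central 99% confidence interval for $\mu$ is given by

$\left(\bar{x} - 2.575\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 2.575\dfrac{\sigma}{\sqrt{n}}\right)$

(0.5% of $N(0, 1)$ lies in each tail beyond $\pm 2.575$.)

This can be written $\bar{x} \pm 2.575\dfrac{\sigma}{\sqrt{n}}$

A central 98% confidence interval for $\mu$ is given by

$\left(\bar{x} - 2.326\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 2.326\dfrac{\sigma}{\sqrt{n}}\right)$

(1% of $N(0, 1)$ lies in each tail beyond $\pm 2.326$.)

This can be written $\bar{x} \pm 2.326\dfrac{\sigma}{\sqrt{n}}$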

NOTE: often the word ‘central’ is omitted when considering con-


fidence intervals, but it is assumed, unless otherwise stated, that an
interval that is central, or symmetric, about the mean is required.
A central 95% confidence interval is sometimes written 95% C.I.
NOTE: One-sided confidence intervals.

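A one-sided 95% confidence interval for $\mu$ is given by

$\mu > \bar{x} - 1.645\dfrac{\sigma}{\sqrt{n}}$

and $P\left(\bar{X} - 1.645\dfrac{\sigma}{\sqrt{n}} < \mu < \infty\right) = 0.95$

or $\mu < \bar{x} + 1.645\dfrac{\sigma}{\sqrt{n}}$

and $P\left(-\infty < \mu < \bar{X} + 1.645\dfrac{\sigma}{\sqrt{n}}\right) = 0.95$

(In each case 5% of $N(0, 1)$ lies in a single tail, beyond $-1.645$ or beyond $1.645$.)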
The format of the one-sided confidence interval depends on the
information required in a particular situation.

Example 38.12 After a particularly wet night, 12 worms surfaced on the lawn.
Their lengths, measured in cm, were:
9.5, 9.5, 11.2, 10.6, 9.9, 11.1, 10.9, 9.8, 10.1, 10.2, 10.9, 11.0
Assuming that this sample came from a normal population with
variance 4, calculate a 95% confidence interval for the mean length
of all the worms in the garden.

Solution 8.12 $n = 12$, $\sum x = 124.7$.

So $\bar{x} = \dfrac{124.7}{12} = 10.39$ (2 d.p.)

We are given the population variance $\sigma^2 = 4$, so $\sigma = 2$.

A symmetric 95% confidence interval for the mean length of the
worms is

$\bar{x} \pm 1.96\dfrac{\sigma}{\sqrt{n}} = 10.39 \pm (1.96)\dfrac{2}{\sqrt{12}}$
$= 10.39 \pm 1.13$
$= (9.26, 11.52)$
The 95% confidence interval ‘for the mean length of the worms in
the garden is (9.26 cm, 11.52 cm).
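
The calculation can be reproduced with a short function. This is a sketch, not the book's method of working: the name `mean_ci_known_sigma` is an assumption, and the multiplier is taken from `NormalDist().inv_cdf` (about 1.96 for a 95% interval) instead of from tables.

```python
from math import sqrt
from statistics import NormalDist, mean

def mean_ci_known_sigma(sample, sigma, level=0.95):
    """Central confidence interval for the population mean when sigma is known."""
    z = NormalDist().inv_cdf((1 + level) / 2)    # about 1.96 when level = 0.95
    xbar = mean(sample)
    half_width = z * sigma / sqrt(len(sample))
    return xbar - half_width, xbar + half_width

worms = [9.5, 9.5, 11.2, 10.6, 9.9, 11.1, 10.9, 9.8, 10.1, 10.2, 10.9, 11.0]
low, high = mean_ci_known_sigma(worms, sigma=2)
print(round(low, 2), round(high, 2))
```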

Example 38.13 On the basis of the results obtained from a random sample of 100
men from a particular district, the 95% confidence interval for the
mean height of the men in the district is found to be (177.22 cm,
179.18 cm). Find the value of $\bar{x}$, the mean of the sample, and $\sigma$, the
standard deviation of the normal population from which the sample
is drawn. Calculate the 98% confidence interval for the mean height.

Solution 8.13 The 95% confidence interval is given by

$\bar{x} \pm 1.96\dfrac{\sigma}{\sqrt{n}} = (177.22, 179.18)$

Hence, since $\sqrt{n} = 10$,

$\bar{x} + 1.96\dfrac{\sigma}{10} = 179.18$   (i)

$\bar{x} - 1.96\dfrac{\sigma}{10} = 177.22$   (ii)

Adding (i) and (ii), $2\bar{x} = 356.4$

$\bar{x} = 178.2$

Subtracting (ii) from (i),

$2(1.96)\dfrac{\sigma}{10} = 1.96$

$\sigma = \dfrac{10}{2}$

$\sigma = 5$

Therefore the sample mean $\bar{x}$ is 178.2 cm and the population
standard deviation $\sigma$ is 5 cm.

The 98% confidence interval is given by

$\bar{x} \pm 2.326\dfrac{\sigma}{\sqrt{n}} = 178.2 \pm 2.326\left(\dfrac{5}{10}\right)$
$= 178.2 \pm 1.163$
$= (177.037, 179.363)$
The 98% confidence interval for the mean height of the men in the
district is (177.04 cm, 179.36 cm) (2 d.p.).

(b) Confidence interval for $\mu$, population variance $\sigma^2$ unknown

We must consider separately the following cases:
(i) $n$, the sample size, is large ($n \geq 30$, say),
(ii) $n$, the sample size, is small.

(b) (i) Sample size large ($n \geq 30$, say)

Since $\sigma^2$ is unknown, it is necessary to use an estimator, $\hat{\sigma}^2$, for it.

Now $\hat{\sigma}^2 = \dfrac{nS^2}{n - 1}$ where $S^2$ is the sample variance.

If $\bar{x}$ and $s^2$ are the mean and variance of a random sample of size
$n$ (where $n$ is large) from a normal population with unknown
mean $\mu$ and unknown variance $\sigma^2$, then a central 95% confidence
interval for $\mu$ is

$\left(\bar{x} - 1.96\dfrac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + 1.96\dfrac{\hat{\sigma}}{\sqrt{n}}\right)$ where $\hat{\sigma}^2 = \dfrac{ns^2}{n - 1} \approx s^2$ for large $n$

This can be written $\bar{x} \pm 1.96\dfrac{\hat{\sigma}}{\sqrt{n}}$

NOTE: $P\left(\bar{X} - 1.96\dfrac{\hat{\sigma}}{\sqrt{n}} < \mu < \bar{X} + 1.96\dfrac{\hat{\sigma}}{\sqrt{n}}\right) = 0.95$.

Similarly,

a 99% confidence interval for $\mu$ is given by $\bar{x} \pm 2.575\dfrac{\hat{\sigma}}{\sqrt{n}}$

a 98% confidence interval for $\mu$ is given by $\bar{x} \pm 2.326\dfrac{\hat{\sigma}}{\sqrt{n}}$


Example 8.14 A random sample of 120 measurements taken from a normal
population gave the following data:
$n = 120$, $\sum x = 1008$, $\sum (x - \bar{x})^2 = 172.8$

Find (a) a 97% confidence interval, (b) a 99% confidence interval
for the population mean $\mu$.

Solution 8.14 Now $\bar{x} = \dfrac{\sum x}{n} = \dfrac{1008}{120} = 8.4$

and $s^2 = \dfrac{\sum (x - \bar{x})^2}{n} = \dfrac{172.8}{120} = 1.44$

so $s = 1.2$

Using the large sample approximation, with $\hat{\sigma}^2 = \dfrac{ns^2}{n - 1} \approx s^2$:

(a) A 97% confidence interval for the population mean is given by
(with 1.5% of $N(0, 1)$ in each tail beyond $\pm 2.17$)

$\bar{x} \pm 2.17\dfrac{s}{\sqrt{n}} = 8.4 \pm 2.17\dfrac{1.2}{\sqrt{120}}$
$= 8.4 \pm 0.238$
$= (8.162, 8.638)$ (3 d.p.)

Therefore a 97% confidence interval for the population mean is
(8.162, 8.638).

(b) A 99% confidence interval for the population mean is given by

$\bar{x} \pm 2.575\dfrac{s}{\sqrt{n}} = 8.4 \pm 2.575\dfrac{1.2}{\sqrt{120}}$
$= 8.4 \pm 0.282$
$= (8.118, 8.682)$ (3 d.p.)

Therefore a 99% confidence interval for the population mean is
(8.118, 8.682).
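
The same working can be sketched from the summary totals alone. The function name `large_sample_ci` and its argument names are assumptions made for illustration. Following the large sample approximation, the sample standard deviation $s$ (divisor $n$) is used in place of $\hat{\sigma}$, and the multiplier comes from `NormalDist().inv_cdf` instead of from tables.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_ci(n, sum_x, sum_sq_dev, level):
    """Central confidence interval for the mean from n, sum of x and sum of (x - xbar)^2."""
    xbar = sum_x / n
    s = sqrt(sum_sq_dev / n)                     # sample s.d. with divisor n
    z = NormalDist().inv_cdf((1 + level) / 2)    # e.g. about 2.17 when level = 0.97
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width

for level in (0.97, 0.99):
    low, high = large_sample_ci(120, 1008, 172.8, level)
    print(level, round(low, 3), round(high, 3))
```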

Example 8.15 A sample of readings from a normal population with unknown
mean $\mu$ and unknown variance $\sigma^2$ gave the following data:

be [1740] 17.6 Oia ae


[omar 4 19 23 10
A second sample of readings taken from the same population gave
$n_2 = 72$, $\sum x = 1267.2$, $\sum x^2 = 22\,536$
Combine the two samples to give estimates of $\mu$ and $\sigma^2$, and give the
appropriate 90% confidence interval for $\mu$.

Solution 8.15 Sample I:

$\bar{x}_1 = \dfrac{\sum fx}{\sum f} = \dfrac{1408.3}{80} = 17.60375$

$s_1^2 = \dfrac{\sum fx^2}{\sum f} - \bar{x}_1^2 = \dfrac{24\,792.63}{80} - \left(\dfrac{1408.3}{80}\right)^2 = 0.0159$ (3 S.F.)

Sample II:

$\bar{x}_2 = \dfrac{\sum x}{n_2} = \dfrac{1267.2}{72} = 17.6$

$s_2^2 = \dfrac{\sum x^2}{n_2} - \bar{x}_2^2 = \dfrac{22\,536}{72} - 17.6^2 = 3.24$

$\hat{\mu} = \dfrac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \dfrac{1408.3 + 1267.2}{80 + 72} = 17.602$ (3 d.p.)

and

$\hat{\sigma}^2 = \dfrac{n_1s_1^2 + n_2s_2^2}{n_1 + n_2 - 2} = \dfrac{80(0.0159) + 72(3.24)}{80 + 72 - 2} = 1.564$ (3 d.p.)

From the two samples, estimates of $\mu$ and $\sigma^2$ are 17.602 and 1.564
respectively.

A 90% confidence interval for $\mu$ based on the combined sample is

$\bar{x} \pm 1.645\dfrac{\hat{\sigma}}{\sqrt{n}}$ where $\bar{x} = 17.602$, $\hat{\sigma} = \sqrt{1.564}$, $n = 152$

90% confidence interval is

$17.602 \pm 1.645\dfrac{\sqrt{1.564}}{\sqrt{152}} = 17.602 \pm 0.167$
$= (17.435, 17.769)$

Therefore a 90% confidence interval for $\mu$, based on the two samples,
is (17.435, 17.769).

Exercise 8d

1. Acertain type of tennis ball is known to the mean height of bounce of the sample
have a height of bounce which is normally is 140cm. Find (a) 95%, (b) 98% con-
distributed with standard deviation 2 cm. fidence intervals for the mean height of
A sample of 60 tennis balls is tested and bounce of this type of tennis ball.
A CONCISE COURSE IN A-LEVEL STATISTICS

A random sample of 100 is taken from a gave n, = 64, 2x = 5452.8,


population. The sample is found to have a D(x—X)?= 973.44. Estimate Mand o7
mean of 76.0 and standard deviation from this data.
12.0. Find (a) 90%, (b)97%, (c) 99% A second sample of readings gave:
confidence intervals for the mean of the
population. |x|82 83 84 85 86 87
|F| 6 9 19 27 22 17
150 bags of flour of a particular brand are
weighed and the mean mass is found to be Estimate u and o” for this second sample.
748 g with standard deviation 3.6 g. Find Now combine the two samples to give a
(a) 90%, (b)95%, (c) 98% confidence further estimate of U, together with the
intervals for the mean mass of bags of appropriate 97% confidence interval.
flour of this brand.

A random sample of 100 readings taken 10. The age, X, in years at last birthday, of
from a normal population | gave the 250 mothers when their first child was
following data: x = 82, Dx? = 686 800. born is given in the following table:
Find (a) 98%, (b) 99% confidence inter- Ni 18- 20- 22- 24- 26- 28- 30- 32- 34- 36- 38-
vals for the population mean UL.
No.of |14 36 42 57 48 26 17 7 2 0 1
80 people were asked to measure their mothers

pure mares when,they: woKe,up im ine (The notation implies that, for example in
Cree: ane as ues 68 he podvhe column 1, there are 14 mothers for whom
standard deviation a beats: Find (a) 95%, the continuous variable X satisfies
(b) 997% confidence intervals for the 18<X <20,)
population mean.
Calculate, to the nearest 0.1 of a year,
The 95% confidence interval for the mean estimates of the mean and the standard
length of life of a particular brand of light deviation of X.
bulb is (1023.3 h, 1101.7 h). This interval If the 250 mothers are a random sample
is based on results from a random sample from a large population of mothers, find
of 36 light bulbs. Find the 99% confidence 95% confidence limits for the mean age,
interval for the mean length of life of this U, of the total population. (C)
brand of light bulb, assuming that the
length of life is normally distributed.
11. The distribution of measurements of
A random sample of six items taken from thicknesses of a random sample of yarns
a normal population with variance 4.5 cm? produced in a textile mill is shown in the
gave the following data: following table:
Sample values: 12.9 cm, 13.2 cm, 14.6 cm, : ae
12.6 cm, ? 11.3 cm, ’ 10.1 em Se ec)
(mid-interval value) Brequency
Find the 94% confidence interval for the
population mean UL.

The data is from a random sample of 150


readings taken from a population with
mean Mand variance O°. Estimate Wand
(Or

pal sibsA ie 14 oer Illustrate these data on a histogram. Esti-


: . mate to two decimal places the mean and
A second sample of 100 readings is standard deviation of yarn thickness.
cae ze the aa oo ae For Hence estimate the standard error of the
is sample, n2= 100, 2x2=1119, mean to two decimal places, and use it to
Leg = 12 585.61. Calculate estimates of determine approximate symmetric 95%
1! and 0” from this second sample. Now confidence limits, giving your answer to
combine the two samples to give a further one decimal place (MEI)
estimate of LU, together with its appropriate ’
96% confidence interval. :
12. Thelifetimes of 200 electrical components
A sample of 64 readings from a normal were recorded to the nearest hour and
population with mean J and variance o? classified in the frequency tabulation.
ESTIMATION OF POPULATION PARAMETERS y 441
(d) Estimate the median time to failure
of thesample. (SUJB)
80
15. A machine produces plastic balls for use
in an industrial process. It incorporates a
device which automatically recycles those
balls whose mass is outside certain limits.
The mass of the balls produced (measured
Draw a histogram of the data and estimate as a deviation in g from the minimum
the mean and standard deviation of the value) may be regarded as a random
distribution. variable, X, with probability density
function
Calculate a symmetric 90% confidence
interval for the population means, using Raat Om we 2
a suitable normal approximation for the f(x) -{ 0 otherwise
distribution of the sample mean. (MEI)
(a) Show that k = 0.5.
13. A random sample of 250 adult men (6) Find the mean and the standard
undergoing a routine medical inspection deviation of X correct to 3 significant
had their heights (x cm) measured to the figures.
nearest centimetre, and the following (c) Find the probability that the mass of
data were obtained: 2x = 43 205, a ball is less than the mean. Compare this
Lx? = 7469107. Calculate an unbiased with the result you would have obtained
estimate of the population variance. if X had followed a normal distribution.
Calculate also a symmetric 99% con- (d) The mean of the distribution may
fidence interval for the population mean change from day to day but the shape
(C)P and standard deviation do not. A random
sample of size 20 yields a sample mean of
14. The time to failure of a sample of 200
0.9. Calculate a 90% confidence interval
batteries is given in Table A below.
for the population mean. Explain the
(a) Draw a histogram of the data. relevance of the Central Limit Theorem
(b) Estimate the sample mean and to your calculations. (AEB 1988)
variance of the time to failure by the
usual method of considering all observa- 16. The lifetime of a shuttlecock is the num-
tions in a class as being concentrated at ber of hours of continuous play before it
the mid-point of that class. Would you becomes unusable. A random sample of
expect the actual sample mean to be 40 shuttlecocks had a mean lifetime of
greater or less than the estimated value? 4 hours, with standard deviation 1.1
Give a reason for your answer. hours. Find the value of c such that
(c) Using your calculated values obtain c<pU<© isa 95% one-sided confidence
a 95% confidence interval for the popula- interval for Lt, the mean lifetime of a
tion mean. shuttlecock.
Table A

Time (hours) 0-20 20-40 40-60 60-80 80-100 100-120 120-140 140-160
80 48 29 18 12 7 4 2

Before we consider the case when the sample size is small, we must
introduce the t-distribution. It has a very complicated p.d.f. which
is included here only for completeness.

THE t-DISTRIBUTION

The r.v. X is said to follow the t-distribution if the p.d.f. of X is

$f(x) = c_\nu \left(1 + \dfrac{x^2}{\nu}\right)^{-(\nu+1)/2}, \qquad -\infty < x < \infty$

X has one parameter, $\nu$, known as the number of degrees of freedom.
(We use the Greek letter $\nu$, pronounced 'new'.) The constant $c_\nu$
depends on $\nu$.

We say that $X \sim t(\nu)$.

The curve of y = f(x) is shown for $\nu = 2$ and $\nu = 10$.

For large values of $\nu$ the t-distribution approximates to the standard
normal distribution N(0, 1), shown by the broken line.

[Figure: curves of t(2) and t(10), with the normal curve N(0, 1) shown as a broken line]

USE OF t-DISTRIBUTION TABLES

These are printed on p. 636. Note that the upper quantiles of the
t-distribution are printed.

Referring to the tables, note that each column is headed by values of P, Q and 2Q.

This means, for $X \sim t(\nu)$,

$P(X < t) = P$
$P(X > t) = 1 - P = Q$
$P(|X| > t) = 2Q$

[Diagram: $t(\nu)$ curve with area P to the left of t and area Q to the right]

NOTE. We will use this diagram for all values of $\nu$.

Consider the column headed P = 0.975, Q = 0.025, 2Q = 0.050.

For $X \sim t(6)$, row $\nu = 6$ gives t = 2.447.

P = 0.975 = 97.5% means that 97.5% of the t(6) distribution lies to
the left of 2.447.
Q = 0.025 = 2.5% means that 2.5% of the t(6) distribution lies to the
right of 2.447.
2Q = 0.05 = 5% means that 5% of the t(6) distribution lies outside
the range (−2.447, 2.447) (using symmetry properties of the curve).

[Figure: t(6) curves showing 2.5% to the right of t = 2.447, and 2.5% in each tail outside t = −2.447 and t = 2.447]

Example 8.16 (a) Find two symmetrically placed values for t outside which 1%
of the t(11) distribution lies.
(b) If $X \sim t(4)$, find t such that (i) $P(X < t) = 0.99$,
(ii) $P(X > t) = 0.05$, (iii) $P(|X| < t) = 0.95$.

Solution 8.16 (a) Row $\nu = 11$, column 2Q = 0.01 gives t = 3.106, so that 1% of the
distribution lies outside (−3.106, 3.106), with 0.5% in each tail.

(b) (i) Row $\nu = 4$, column P = 0.99 gives t = 3.747, so
$P(X < 3.747) = 0.99$.
(ii) Row $\nu = 4$, column Q = 0.05 gives t = 2.132, so
$P(X > 2.132) = 0.05$.
(iii) Row $\nu = 4$, column 2Q = 1 − 0.95 = 0.05 gives t = 2.776, so
$P(-2.776 < X < 2.776) = 0.95$.

Example 8.17 If the r.v. X is such that $X \sim t(8)$, find (a) $P(X < -2.9)$,
(b) $P(X > 3.36)$, (c) $P(-2.9 < X < 3.36)$.

Solution 8.17

(a) Row $\nu = 8$, t = 2.896 gives Q = 0.01, so that, by symmetry,
approximately 1% of the distribution lies to the left of −2.9, so
$P(X < -2.9) = 0.01$.

(b) Row $\nu = 8$, t = 3.355 gives Q = 0.005, so that 0.5% of the
distribution lies to the right of 3.36 and $P(X > 3.36) = 0.005$.

(c) $P(-2.9 < X < 3.36) = 1 - (0.01 + 0.005) = 0.985$

So 98.5% of the distribution lies between −2.9 and 3.36.
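
Table values such as these can be checked numerically. The sketch below integrates the p.d.f. of the t-distribution with Simpson's rule. Two things are assumptions rather than material from the text: the explicit form of the constant, $c_\nu = \Gamma\!\left(\frac{\nu+1}{2}\right) / \left(\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)\right)$, which is the standard result, and the function names `t_pdf` and `t_cdf`, which are illustrative.

```python
from math import gamma, pi, sqrt

def t_pdf(x, nu):
    """p.d.f. of t(nu); the constant c_nu is the standard gamma-function form."""
    c = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_cdf(t, nu, steps=2000):
    """P(X < t), by Simpson's rule on [0, |t|] and symmetry about 0.

    steps must be even for Simpson's rule.
    """
    h = abs(t) / steps
    total = t_pdf(0, nu) + t_pdf(abs(t), nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3          # area under the curve between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

# Row nu = 6, t = 2.447 should correspond to P = 0.975
print(round(t_cdf(2.447, 6), 3))
# Example 8.17(c): P(-2.9 < X < 3.36) for X ~ t(8)
print(round(t_cdf(3.36, 8) - t_cdf(-2.9, 8), 3))
```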


Exercise 8e

1. Find two symmetrically placed values for t outside which 1% of a t(6) distribution lies.

2. Repeat Question 1 for (a) 10%, (b) 5%, (c) 2%, (d) ½% of a t(6) distribution.

3. Repeat Question 1 for 1% of a (a) t(7), (b) t(12), (c) t(15), (d) t(16) distribution.

4. If $X \sim t(13)$, find $P(-1.77 < X < 3.012)$.

5. If $X \sim t(10)$, find $P(1.812 < X < 3.169)$.

6. If $X \sim t(6)$, find $P(-2.447 < X < -1.44)$.

7. If $X \sim t(8)$, find the value of a such that $P(X > a) = 0.05$.

8. If $X \sim t(12)$, find the value of a such that $P(|X| < a) = 0.95$.

NOTE: DEGREES OF FREEDOM The number of degrees of freedom associated with a sample statistic is given by

$\nu$ = number of variables − number of restrictions involved in calculating the statistic

For example

(a) the sample variance is given by $S^2 = \dfrac{1}{n}\sum (X_i - \bar{X})^2$.
When calculating $S^2$ the variables are $X_1, X_2, \ldots, X_n$, so the number
of variables is n. However, these variables are 'restricted' by the fact
that $\sum X_i = n\bar{X}$, so the number of restrictions is 1.

Therefore $\nu = n - 1$.

(b) Consider

$\hat{\sigma}^2 = \dfrac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2 - 2}$

The number of variables = $n_1 + n_2$.
The number of restrictions = 2 (these are $\bar{X}_1$ and $\bar{X}_2$).
Therefore $\nu = n_1 + n_2 - 2$.
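
The link between the sample variance (divisor n) and the estimate with n − 1 degrees of freedom can be seen in a few lines of Python. This is only an illustration: the data are the ten biscuit masses from Example 8.18, and the comparison relies on the standard library's `pvariance` dividing by n and `variance` dividing by n − 1.

```python
from statistics import pvariance, variance

sample = [397.3, 399.6, 401.0, 392.9, 396.8, 400.0, 397.6, 392.1, 400.8, 400.6]
n = len(sample)

s2 = pvariance(sample)            # sample variance S^2, divisor n
sigma_hat2 = n * s2 / (n - 1)     # rescaled so that the divisor is n - 1

# statistics.variance already uses the divisor n - 1, so the two agree
print(round(s2, 4), round(sigma_hat2, 4), round(variance(sample), 4))
```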

(b) (ii) Confidence interval for $\mu$, population variance unknown,
sample size small (n < 30, say)

Since $\sigma^2$ is unknown, we use the estimator $\hat{\sigma}^2$ for it.

Now $\hat{\sigma}^2 = \dfrac{nS^2}{n-1}$ where $S^2$ is the sample variance, so

$\dfrac{\hat{\sigma}}{\sqrt{n}} = \dfrac{\sqrt{n}\,S}{\sqrt{n}\sqrt{n-1}} = \dfrac{S}{\sqrt{n-1}}$

The statistic $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ becomes $\dfrac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}}$, which is $\dfrac{\bar{X} - \mu}{S/\sqrt{n-1}}$.

Now, for small samples this statistic does not follow the normal
distribution; the t-distribution must be used instead. The number of
degrees of freedom involved in calculating the statistic is n − 1.

So, if $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n-1}}$, then T follows a t-distribution with (n − 1)
degrees of freedom,

i.e. $T \sim t(n-1)$

If $\bar{x}$ and $s^2$ are the mean and variance of a random sample of size
n (where n is small) from a normal population with unknown
mean $\mu$ and unknown variance $\sigma^2$, then a central 95% confidence
interval for $\mu$ is given by

$\left(\bar{x} - t\,\dfrac{s}{\sqrt{n-1}},\; \bar{x} + t\,\dfrac{s}{\sqrt{n-1}}\right)$

where $P\!\left(\bar{X} - t\,\dfrac{S}{\sqrt{n-1}} < \mu < \bar{X} + t\,\dfrac{S}{\sqrt{n-1}}\right) = 0.95$

i.e. (−t, t) encloses 95% of the t(n − 1) distribution.

Example 8.18 Ten packets of a particular brand of biscuits are chosen at random
and their masses noted. The results (in grams) are 397.3, 399.6,
401.0, 392.9, 396.8, 400.0, 397.6, 392.1, 400.8, 400.6. Assuming
that the sample is taken from a normal population with mean mass
$\mu$, calculate (a) the 95% confidence interval for $\mu$, (b) the 99%
confidence interval for $\mu$.

Solution 8.18 From the sample,

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{3978.7}{10} = 397.87$

$s^2 = \dfrac{\sum x^2}{n} - \bar{x}^2 = \dfrac{1\,583\,098.3}{10} - (397.87)^2 = 9.29$ (2 d.p.)

As n is small, the small sample approximation is used.

(a) The 95% confidence interval for $\mu$ is

$\bar{x} \pm t\,\dfrac{s}{\sqrt{n-1}}$

where (−t, t) encloses 95% of the t(n − 1) distribution, with n = 10.

From tables, row $\nu = 9$, column 2Q = 5%, we find that t = 2.262, with 2.5% of the t(9) distribution in each tail.

So that the 95% confidence interval is

$397.87 \pm 2.262 \times \dfrac{\sqrt{9.29}}{\sqrt{9}} = 397.87 \pm 2.298$
$= (395.57, 400.17)$ (2 d.p.)

The 95% confidence interval for $\mu$ is (395.57 g, 400.17 g).

(b) The 99% confidence interval for $\mu$ is

$\bar{x} \pm t\,\dfrac{s}{\sqrt{n-1}}$

where (−t, t) encloses 99% of the t(n − 1) distribution, with n = 10.
From tables, row $\nu = 9$, column 2Q = 1%, we find that t = 3.25, with 0.5% of the t(9) distribution in each tail.

The 99% confidence interval is

$397.87 \pm 3.25 \times \dfrac{\sqrt{9.29}}{\sqrt{9}} = 397.87 \pm 3.302$
$= (394.57, 401.17)$ (2 d.p.)

The 99% confidence interval for $\mu$ is (394.57 g, 401.17 g).
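
The working of Example 8.18 can be reproduced with a short Python sketch. The function name `small_sample_ci` is illustrative, and the t values are passed in as read from the tables (2.262 and 3.25 for $\nu = 9$) rather than computed.

```python
from math import sqrt
from statistics import mean, pvariance

def small_sample_ci(data, t_value):
    """Interval x-bar ± t * s / sqrt(n - 1), with s the sample s.d. (divisor n)."""
    n = len(data)
    x_bar = mean(data)
    s = sqrt(pvariance(data))
    half_width = t_value * s / sqrt(n - 1)
    return x_bar - half_width, x_bar + half_width

masses = [397.3, 399.6, 401.0, 392.9, 396.8, 400.0, 397.6, 392.1, 400.8, 400.6]

ci_95 = small_sample_ci(masses, 2.262)   # t(9), 2Q = 5%
ci_99 = small_sample_ci(masses, 3.25)    # t(9), 2Q = 1%
print([round(v, 2) for v in ci_95])
print([round(v, 2) for v in ci_99])
```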

Example 8.19 Fifteen pupils experimented to find the value of g, the acceleration
due to gravity. Their results were as follows:
9.806, 9.807, 9.810, 9.802, 9.805, 9.806, 9.804, 9.811, 9.801,
9.804, 9.805, 9.808, 9.803, 9.809, 9.807
Calculate the mean and the standard deviation of these results.

Give 95% confidence limits for the value of g based upon them.
Estimate the number of experimenters needed to give a confidence
interval of less than 0.001. (SUJB)

Solution 8.19 $\bar{x} = \dfrac{\sum x}{n} = \dfrac{147.088}{15} = 9.8059$ (4 d.p.)

$s^2 = \dfrac{\sum x^2}{n} - \bar{x}^2 = \dfrac{1442.3254}{15} - \left(\dfrac{147.088}{15}\right)^2$

Therefore s = 0.0028 (2 S.F.).
The mean value for g obtained from the results is 9.8059 and the
standard deviation is 0.0028.

From the sample: n = 15, $\bar{x} = 9.8059$, s = 0.0028.

As n is small, we use the small sample approximation, so a central
95% confidence interval for g is $\bar{x} \pm t\,\dfrac{s}{\sqrt{n-1}}$ where (−t, t) encloses
95% of the t(n − 1) distribution, with n = 15.
From tables, $\nu = 14$, column 2Q = 5%, we
find that t = 2.145, with 2.5% of the t(14) distribution in each tail.

So the 95% confidence interval is

$9.8059 \pm 2.145 \times \dfrac{0.0028}{\sqrt{14}} = 9.8059 \pm 0.0016$
$= (9.8043, 9.8075)$

So the 95% confidence limits for the value of g based on the results
are (9.8043, 9.8075).

If we require a confidence interval of less than 0.001, then n will be
large, so we use the large sample approximation.
A central 95% confidence interval for g is given by

$\bar{x} \pm 1.96\,\dfrac{\hat{\sigma}}{\sqrt{n}}$ where $\hat{\sigma}^2 = \dfrac{ns^2}{n-1} \approx s^2$

The width of this interval is twice 1.96 standard errors, so we require

$\dfrac{(2)(1.96)\,\hat{\sigma}}{\sqrt{n}} < 0.001$ where $\hat{\sigma} \approx 0.0028$

i.e. $\sqrt{n} > \dfrac{(2)(1.96)(0.0028)}{0.001}$

$\sqrt{n} > 10.976$

so $n > 120.5$

So the number of experimenters needed would be at least 121.
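
The final step, finding the smallest n that makes the interval narrower than a target width, can be sketched as below. The function name `sample_size_for_width` is hypothetical; the inputs are the values used in the solution.

```python
from math import floor

def sample_size_for_width(sigma_hat, z, max_width):
    """Smallest n with total width 2 * z * sigma_hat / sqrt(n) strictly below max_width."""
    root_n = 2 * z * sigma_hat / max_width
    # floor(...) + 1 gives the smallest integer strictly greater than root_n ** 2
    return floor(root_n ** 2) + 1

n_needed = sample_size_for_width(0.0028, 1.96, 0.001)
print(n_needed)
```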

Exercise 8f

1. The heights (measured in cm) of six policemen were as follows:
180, 176, 179, 181, 183, 179
Calculate (a) 90%, (b) 95%, (c) 99% confidence intervals for the mean height of the population of policemen. (Assume that the heights of policemen are normally distributed.)

2. A sample of eight observations of a normally distributed variable gives values
3.6, 3.9, 4.5, 3.8, 4.4, 4.9, 4.2, 3.8
Determine a 95% confidence interval for the unknown population mean $\mu$.

3. A normal distribution has variance $\sigma^2$ and mean $\mu$. A random sample of ten observations gives values
0.3, 0.28, 0.27, 0.33, 0.35, 0.33, 0.27, 0.31, 0.37, 0.29
Find (a) 95%, (b) 99% confidence intervals for $\mu$.

4. The masses, in grams, of thirteen washers selected at random are
15.4, 15.2, 14.6, 16.1, 14.8, 15.3, 15.9, 16.0, 15.4, 14.6, 15.0, 15.5, 16.1
Calculate 98% confidence limits for the mean mass of the population from which the sample is drawn, assuming that the population is normal.

5. Twenty measurements of the life of a candle (measured in hours) gave the following data: $\sum x = 172$, $\sum x^2 = 1495.4$. By taking a sample of 20 as (a) large, (b) small, calculate a 99% confidence interval for the mean life of a candle, assuming the length of life to be normally distributed.

6. A random sample of seven independent observations of a normal variable gave $\sum x = 35.9$, $\sum x^2 = 186.19$. Calculate a 90% confidence interval for the population mean.

7. A random sample of eight observations of a normal variable gave $\sum x = 261.2$, $\sum (x - \bar{x})^2 = 3.22$. Calculate a 95% confidence interval for the population mean.

CONFIDENCE INTERVAL FOR THE PROPORTION OF SUCCESSES IN A POPULATION

Consider a binomial population where p, the proportion of 'successes'
in the population, is unknown.

Take a random sample of size n from the population and let $P_s$ be
the random variable 'the proportion of successes in the sample'.

Then $P_s \sim N\!\left(p, \dfrac{pq}{n}\right)$ where q = 1 − p (see p. 407)

Now, since p is unknown, we use an estimator for it.

An unbiased estimator for p is $P_s$.

It is reasonable to assume that an estimator for $\dfrac{pq}{n}$ is $\dfrac{P_sQ_s}{n}$, where
$Q_s = 1 - P_s$.

So we have $P_s \sim N\!\left(p, \dfrac{P_sQ_s}{n}\right)$ approximately.

Standardising, we have

$Z = \dfrac{P_s - p}{\sqrt{P_sQ_s/n}}$ where $Z \sim N(0, 1)$

Therefore $P\!\left(-1.96 < \dfrac{P_s - p}{\sqrt{P_sQ_s/n}} < 1.96\right) = 0.95$

so, rewriting,

$P\!\left(P_s - 1.96\sqrt{\dfrac{P_sQ_s}{n}} < p < P_s + 1.96\sqrt{\dfrac{P_sQ_s}{n}}\right) = 0.95$

If in a random sample of size n (n > 30) the proportion with a
particular property is $p_s$, the 95% confidence interval for the
population proportion p is given by

$\left(p_s - 1.96\sqrt{\dfrac{p_sq_s}{n}},\; p_s + 1.96\sqrt{\dfrac{p_sq_s}{n}}\right)$ where $q_s = 1 - p_s$

This can be written $p_s \pm 1.96\sqrt{\dfrac{p_sq_s}{n}}$

Similarly, a 99% confidence interval for p is $p_s \pm 2.575\sqrt{\dfrac{p_sq_s}{n}}$

and a 98% confidence interval for p is $p_s \pm 2.326\sqrt{\dfrac{p_sq_s}{n}}$

Example 8.20 A manufacturer wants to assess the proportion of defective items
in a large batch produced by a particular machine. He tests a
random sample of 300 items and finds that 45 are defective. Calculate
(a) a 95% confidence interval, (b) a 98% confidence interval
for the proportion of defective items in the complete batch.

Solution 8.20 The proportion of defective items in the sample, $p_s = \dfrac{45}{300} = 0.15$.

So $q_s = 1 - p_s = 0.85$, n = 300.

(a) The 95% confidence interval for the proportion p of defective items
in the complete batch is given by

$p_s \pm 1.96\sqrt{\dfrac{p_sq_s}{n}} = 0.15 \pm (1.96)\sqrt{\dfrac{(0.15)(0.85)}{300}}$
$= 0.15 \pm 0.0404$
$= (0.1096, 0.1904)$

The 95% confidence interval is (0.1096, 0.1904).

(b) The 98% confidence interval for p is given by

$p_s \pm 2.326\sqrt{\dfrac{p_sq_s}{n}} = 0.15 \pm (2.326)\sqrt{\dfrac{(0.15)(0.85)}{300}}$
$= 0.15 \pm 0.048$
$= (0.102, 0.198)$

The 98% confidence interval is (0.102, 0.198).
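
A minimal Python sketch of the proportion interval, applied to the data of Example 8.20, is given below. The function name `proportion_ci` is illustrative, and the critical value comes from the standard library's `NormalDist` instead of the tables, so the later decimal places may differ slightly from hand working with 1.96 or 2.326.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, level):
    """Large-sample interval p_s ± z * sqrt(p_s * q_s / n) for a proportion."""
    p_s = successes / n
    q_s = 1 - p_s
    z = NormalDist().inv_cdf(0.5 + level / 2)   # N(0, 1) critical value
    half_width = z * sqrt(p_s * q_s / n)
    return p_s - half_width, p_s + half_width

ci_95 = proportion_ci(45, 300, 0.95)
ci_98 = proportion_ci(45, 300, 0.98)
print([round(v, 4) for v in ci_95])
print([round(v, 3) for v in ci_98])
```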


Example 8.21 A point whose coordinates are (X, Y) with respect to rectangular
axes is chosen at random where 0 < X < 1 and 0 < Y < 1. What is
the probability that the point lies inside the circle whose equation
is $x^2 + y^2 = 1$?
In a computer simulation 1000 such points were generated and 784
of them lay inside the circle. Obtain an estimate for $\pi$ and give an
approximate 90% confidence interval for your estimate. Show that
about 290 000 points need to be selected in order to be 90%
certain of obtaining a value for $\pi$ which will be in error by less
than 0.005. (SUJB)

Solution 8.21 The point (X, Y) is chosen at random,
where 0 < X < 1 and 0 < Y < 1.
Points thus chosen are spread uniformly
over the square with vertices (0, 0), (1, 0),
(1, 1), (0, 1).
So the probability that the point lies within
the circle $x^2 + y^2 = 1$ is equal to the fraction
of the area of the square which lies in the
region defined by the inequality $x^2 + y^2 < 1$.

So, P(point lies within circle) $= \dfrac{\text{area of quadrant}}{\text{area of square}} = \dfrac{\pi/4}{1} = \dfrac{\pi}{4}$

If 1000 points are taken and 784 lie within the circle, then if the
true proportion of points lying within the circle is p, an estimate for
p is $p_s$ where

$p_s = \dfrac{784}{1000} = 0.784$

So an estimate for $\pi/4$ is 0.784 and hence an estimate for $\pi$ is
(0.784)(4) = 3.136.

Now, a 90% confidence interval for p is given by

$p_s \pm 1.645\sqrt{\dfrac{p_sq_s}{n}}$ where $q_s = 1 - p_s$

$= 0.784 \pm (1.645)\sqrt{\dfrac{(0.784)(0.216)}{1000}}$
$= 0.784 \pm 0.0214$
$= (0.7626, 0.8054)$

Therefore $P(0.7626 < \pi/4 < 0.8054) = 0.90$

i.e. $P(3.0504 < \pi < 3.2216) = 0.90$
So, a 90% confidence interval for $\pi$ is (3.0504, 3.2216).

If the value for $\pi$ is to be in error by less than 0.005 then the value
for $\pi/4$ must be in error by less than 0.001 25.
When n = 1000, the size of the interval was $p_s \pm 0.0214$.
Now we need to find n such that the size of the interval is
$p_s \pm 0.001\,25$,

i.e. we require n such that

$1.645\sqrt{\dfrac{p_sq_s}{n}} < 0.001\,25$

so $\sqrt{n} > \dfrac{1.645\sqrt{(0.784)(0.216)}}{0.001\,25}$

$\sqrt{n} > 541.55$
$n > 293\,279$

So about 290 000 points need to be selected in order to be 90%
certain of obtaining a value for $\pi$ which will be in error by less
than 0.005.
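
The simulation described in Example 8.21 is easy to repeat. The sketch below is one possible version, not the simulation used in the question: the function name `estimate_pi`, the choice of 10 000 points and the fixed seed are all assumptions made for illustration, and the counts it produces will not match the 784 out of 1000 quoted above.

```python
import random
from math import sqrt

def estimate_pi(n_points, seed=0):
    """Estimate pi from the proportion of random points in the unit square
    that fall inside the quadrant x^2 + y^2 < 1, with a 90% interval."""
    rng = random.Random(seed)       # fixed seed so the run is repeatable
    inside = sum(
        1 for _ in range(n_points)
        if rng.random() ** 2 + rng.random() ** 2 < 1
    )
    p_s = inside / n_points
    half_width = 1.645 * sqrt(p_s * (1 - p_s) / n_points)
    # p estimates pi / 4, so the estimate and both limits are multiplied by 4
    return 4 * p_s, (4 * (p_s - half_width), 4 * (p_s + half_width))

pi_hat, (low, high) = estimate_pi(10_000)
print(round(pi_hat, 4), (round(low, 4), round(high, 4)))
```

Increasing `n_points` narrows the interval in proportion to $1/\sqrt{n}$, which is why so many points are needed for an error below 0.005.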

Example 8.22 Derive the mean and variance of the binomial distribution.
In a survey carried out in Funville, 28 children out of a random
sample of 80 said that they bought Bopper comic regularly. Find
95% approximate confidence limits for the true proportion of all
children in Funville who buy this comic. A similar survey in Funville
found that 45 children out of a random sample of 100 said that
they bought Shooter comic regularly. Find 95% approximate
confidence limits for the true proportion of all children in Funville
who buy this comic.
On the basis of these surveys, is there any evidence that the sales of
Shooter comic are higher than the sales of Bopper comic in Funville?
Justify your reply. (AEB 1980)

Solution 8.22 For the answer to the first part, see p. 214.
Let $p_B$ be the true proportion of all children who buy the Bopper.
In the sample of 80,

$p_s = \dfrac{28}{80} = 0.35$, $q_s = 1 - p_s = 0.65$

The 95% confidence interval for $p_B$ is

$p_s \pm 1.96\sqrt{\dfrac{p_sq_s}{n}} = 0.35 \pm (1.96)\sqrt{\dfrac{(0.35)(0.65)}{80}}$
$= 0.35 \pm 0.105$
$= (0.245, 0.455)$

The 95% confidence interval for the proportion who buy the
Bopper is (0.245, 0.455).

Let $p_S$ be the true proportion who buy the Shooter.

In the sample of 100, $p_s = \dfrac{45}{100} = 0.45$, $q_s = 0.55$.

The 95% confidence interval for $p_S$ is

$p_s \pm 1.96\sqrt{\dfrac{p_sq_s}{n}} = 0.45 \pm (1.96)\sqrt{\dfrac{(0.45)(0.55)}{100}}$
$= 0.45 \pm 0.098$
$= (0.352, 0.548)$

The 95% confidence interval for the proportion who buy the
Shooter is (0.352, 0.548).

[Diagram: the two intervals on a number line, (0.245, 0.455) for $p_B$ and (0.352, 0.548) for $p_S$, overlapping between 0.352 and 0.455]

These confidence intervals overlap, so it is possible that $p_B = 0.43$
(say) and $p_S = 0.39$ (say) so that $p_B > p_S$.
So, on these results there is not sufficient evidence to suggest that
the sales of Shooter comic are higher than the sales of Bopper
comic.

NOTE: this could have been approached by considering a significance
test for the difference between proportions; for the method,
see p. 497.

Example 8.23 In a sample of 400 shops taken in 1972, it was discovered that 136
of them sold carpets at below the list prices which had been recommended
by manufacturers.
(a) Estimate the percentage of all carpet selling shops selling below
list price.
(b) Calculate the 95% confidence limits for this estimate, and
explain briefly what these mean.
(c) What size sample would have to be taken in order to estimate
the percentage to within ±2%? (SUJB)

Solution 8.23 From the sample, the proportion of shops selling below list price is
$p_s$ where $p_s = \dfrac{136}{400} = 0.34$.

(a) An estimate of the percentage of all carpet selling shops selling
below list price is $\hat{p}$ where $\hat{p} = p_s$.
So $\hat{p}$ = 0.34 = 34%.

(b) A 95% confidence interval for the true population proportion p
is given by

$p_s \pm 1.96\sqrt{\dfrac{p_sq_s}{n}} = 0.34 \pm (1.96)\sqrt{\dfrac{(0.34)(0.66)}{400}}$
$= 0.34 \pm 0.046$
$= 34\% \pm 4.6\%$

The 95% confidence interval is 34% ± 4.6% = (29.4%, 38.6%).

(c) In part (b) the percentage of shops was estimated to within
±4.6%. So we now require n such that the percentage of shops is
estimated to within ±2% (assuming 95% confidence).

Situation in part (b): $p_s \pm 1.96\sqrt{\dfrac{(0.34)(0.66)}{400}}$

Situation in part (c): $p_s \pm 1.96\sqrt{\dfrac{(0.34)(0.66)}{n}}$

We require n such that

$p_s \pm 1.96\sqrt{\dfrac{p_sq_s}{n}} = p_s \pm 0.02$

i.e. $1.96\sqrt{\dfrac{(0.34)(0.66)}{n}} = 0.02$

so $\sqrt{n} = \dfrac{1.96}{0.02}\sqrt{(0.34)(0.66)} = 46.42$

$n = 2155.14$

So a sample of size 2156 would have to be taken.
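
The same rearrangement gives the required sample size for any margin of error. The sketch below uses a hypothetical helper, `sample_size_for_margin`, and applies it both to the carpet shops of Example 8.23 and to the simulation of Example 8.21. It rounds up to the next whole number, since a fractional sample size is not possible.

```python
from math import floor, sqrt

def sample_size_for_margin(p_s, z, margin):
    """Smallest whole n for which z * sqrt(p_s * (1 - p_s) / n) is below margin."""
    root_n = z * sqrt(p_s * (1 - p_s)) / margin
    return floor(root_n ** 2) + 1

# Example 8.23(c): 95% confidence, percentage within ±2%
n_shops = sample_size_for_margin(0.34, 1.96, 0.02)

# Example 8.21: 90% confidence, pi/4 within ±0.00125
n_points = sample_size_for_margin(0.784, 1.645, 0.00125)

print(n_shops, n_points)
```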
Exercise 8g

1. A sample, of size n, is taken from a population in which the proportion of 'successes' is p. From the value of the sample proportion given, calculate the confidence interval indicated for the proportion p.
[Table: sample sizes, sample proportions and confidence levels; entries not recoverable]

2. In a survey carried out in a large city, 170 households out of a random sample of 250 owned at least one pet. Find 95% confidence limits for the proportion of households in the city who own at least one pet.

3. In order to assess the probability of a successful outcome, an experiment is performed 200 times and the number of successful outcomes is found to be 72. Find (a) 95%, (b) 99% confidence intervals for p, the probability of a successful outcome.

4. In a market research survey 25 people out of a random sample of 100 from a certain area said that they used a particular brand of soap. Find 97% confidence limits for the proportion of people in the area who use this brand of soap.

5. The probability of success in each of a long series of n independent trials is constant and equal to p. Explain how 95% approximate confidence limits for p may be obtained.
In an opinion poll carried out before a local election, 501 people out of a random sample of 925 declare that they will vote for a particular one of two candidates contesting the election. Find 95% confidence limits for the true proportion of all voters in favour of this candidate.
Do you consider there is significant evidence that this candidate will win the election? (AEB 1977)

6. The data in Table A overleaf refer to 144 recoveries of a particular type of sea-bird. The distances given are those between the original colonies where the birds were born and the place of recovery. Using an assumed mean of 250.5 obtain the sample mean recovery distance and the sample standard deviation.
Assuming that the recovery distances accurately reflect the dispersal of birds from their original colonies, estimate the proportion of this type of bird at more than 300 miles from the original colony. Give an approximate symmetric 95% confidence interval for this estimate. (C)

7. A random sample of 600 was chosen from the adults living in a town in order to investigate the number x of days of work lost through illness. Before taking the sample it was decided that certain categories of people would be excluded from the analysis of the number of working days lost although they would not be excluded from the sample. In the sample 180 were found to be from these categories. For the remaining 420 members of the sample $\sum x = 1260$ and $\sum x^2 = 46\,000$.
(a) Estimate the mean number of days lost through illness, for the restricted population, and give a 95% confidence interval for the mean.
(b) Estimate the percentage of people in the town who fall into the excluded categories, and give a 99% confidence interval for this percentage.
(c) Give two examples, with reasons, of people who might fall into the excluded categories. (O)

8. There are $n_0$ fish in a lake. A random sample of m of these fish is taken. The fish in this sample are tagged and released unharmed back into the lake. After a suitable interval, a second random sample of size n is taken. The random variable R is the number of fish in this second sample that are found to have been tagged. Assuming that the probability that a fish is captured is independent of whether it has been tagged or not, and that $n_0$ is sufficiently large for a binomial approximation to be used, obtain the expectation of R in terms of m, n and $n_0$.
Suppose that m = 100, n = 4000 and that the observed value of R is 20. Obtain an approximate symmetric 98% confidence interval for the proportion of fish in the lake which are tagged. Deduce an approximate 98% confidence interval for $n_0$. (C)

9. A random sample of 500 fish is taken from a lake, marked, and returned to the lake. After a suitable interval a second sample of 500 is taken, and 25 of these are found to be marked. By considering the proportion of marked fish in the second sample, estimate the number of fish in the lake and, by considering a confidence interval for the proportion of marked fish in the lake, obtain a 95% confidence interval for the number of fish. (O)

10. In observations of a particular type of event, the probability of a positive result of any one observation is independent of the results of other observations and has the value $\theta$, the same for all observations. In n observations the proportion giving positive results is p. State the mean and standard deviation of the probability distribution of p. Say also how and in what circumstances this probability distribution can be approximated by a normal distribution. Show that, according to this approximation, the probability that p satisfies the inequality

$|p - \theta| < 1.96\sqrt{\dfrac{\theta(1-\theta)}{n}}$

is 95%.
In a set of 100 observations of this type, 90 gave a positive result. Obtain an inequality of the above form, and by squaring both sides of the inequality calculate from the roots of a quadratic equation an approximate 95% symmetric confidence interval for the value of $\theta$ for the type of event observed. (JMB)
TableA

Distance (miles) 1-100 101-200 201-300 301-400 401-500 501-600

Frequency 50 16 it 12 9

Distance (miles ) 700 701-800 801-900 901-1000


Frequency 8 3 4 1

SUMMARY — CONFIDENCE INTERVALS


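In every case n is the sample size.

Population mean $\mu$, $\sigma^2$ known
    95% confidence interval: $\bar{x} \pm 1.96\,\dfrac{\sigma}{\sqrt{n}}$
    99% confidence interval: $\bar{x} \pm 2.575\,\dfrac{\sigma}{\sqrt{n}}$

Population mean $\mu$, $\sigma^2$ unknown, n large
    95% confidence interval: $\bar{x} \pm 1.96\,\dfrac{\hat{\sigma}}{\sqrt{n}}$
    99% confidence interval: $\bar{x} \pm 2.575\,\dfrac{\hat{\sigma}}{\sqrt{n}}$
    where $\hat{\sigma}^2 = \dfrac{ns^2}{n-1}$ and $s^2$ is the sample variance.

Population mean $\mu$, $\sigma^2$ unknown, n small
    95% confidence interval: $\bar{x} \pm t\,\dfrac{s}{\sqrt{n-1}}$ where (−t, t) encloses 95% of the t(n − 1) distribution
    99% confidence interval: $\bar{x} \pm t\,\dfrac{s}{\sqrt{n-1}}$ where (−t, t) encloses 99% of the t(n − 1) distribution
    $s^2$ is the sample variance.

Population proportion p, n large
    95% confidence interval: $p_s \pm 1.96\sqrt{\dfrac{p_sq_s}{n}}$
    99% confidence interval: $p_s \pm 2.575\sqrt{\dfrac{p_sq_s}{n}}$
    $p_s$ is the sample proportion and $q_s = 1 - p_s$.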

SPECIAL NOTE: In some texts the unbiased estimate of the
variance is not written as $\hat{\sigma}^2$, but as $s^2$, the notation that we have
used for the sample variance. Therefore care must be taken to
ensure that these formulae are fully understood so that they can
be used accurately whichever notation is adopted.

Miscellaneous Exercise 8h

a (a) Before its annual overhaul, the mean Calculate unbiased estimates, in each case
operating time of an automatic machine to 2 decimal places, of the mean and the
was 103 seconds. After the annual over- variance of the sizes of the diamonds in
the original large pile. (L)
haul, the following random sample of
operating times (in seconds) was obtained.

Describe briefly the empirical evidence


90 97 101 92 101 95 95 98 96 95 that you acquired for the Central Limit
theorem.
Assuming that the time taken by the The amount, to the nearest mg, of a
machine to perform the operation is a certain chemical in particles in the atmos-
normally distributed random variable phere at a meteorological station was
with a known standard deviation of 5 measured each day for 300 days. The
seconds, find 98% confidence limits for results are shown in the table.
the mean operating time after the over-
haul.
Amount of 12 13 14.15. 16
chemical (mg)
Comment on the magnitude of these
limits relative to the mean operating time
before the overhaul. Number of 5-42 910-31 12
days

(b) The results of a survey showed that Find the mean daily amount of chemical
3600 out of 10000 families regularly over the 300 days and estimate, to 2
purchased a specific weekly magazine. decimal places, its standard error.
Find the 95% confidence limits for the
Obtain, to 2 decimal places, approximate
proportion of the population buying the
98% confidence limits for the mean daily
magazine.
amount of chemical in the atmosphere.
Estimate the additional number of If daily measurements are taken for a
families to be contacted if the proba- further 300 days, estimate, to 2 decimal
bility that the estimated proportion is places, the probability that the mean of
in error by more than 0.01 is to be at
these daily measurements will be less than
most 1%. (AEB 1987)
14, (L)

A company manufactures bars of soap. In


From alarge pile of industrial diamonds,
a random sample of 70 bars, 18 were
20 were put through6sieves of different
found to be mis-shaped. Calculate an
mesh sizes and the number of diamonds
passing through each sieve was counted. approximate 99% confidence interval for
The table shows the mesh size (mm) and the proportion of mis-shaped bars of soap.
corresponding number of diamonds Explain what you understand by a 99%
passing through each sieve. confidence interval by considering

Piemaee
[>[eT fo]8[a
(a) intervals in general based on the
above method,
(b) the interval you have calculated.
Number of 1/2 14 The bars of soap are either pink or white
diamonds in colour and differently shaped according
to colour. The masses of both types of
Graphically, or otherwise, estimate the soap are known to be normally distributed,
mesh size of the sieve if half the diamonds the mean mass of the white bars being
will pass through it, and the mesh size of 176.2 g. The standard deviation for both
the sieve if one quarter of the diamonds bars is 6.46 g. A sample of 12 of the pink
will pass through it. bars of soap had masses, measured to the
nearest gram, as follows.
Construct a frequency table showing for
each mesh size listed in the table the
WA, 164.182.5169 «171 187 176
extra number of diamonds which passed
alefgh abtatey alireat MILESOY IEr/l3}
through.

Find a 95% confidence interval for the soap of mass x gmis (15+ 0.065x)p,
mean mass of pink bars of soap. and it is sold for 32p. If the company
Calculate also an interval within which manufactures 9000 bars of pink soap
approximately 90% of the masses of the per week, derive a 95% confidence
white bars of soap will lie. interval for its weekly expected profit
The cost of manufacturing a pink bar of from pink bars of soap. (AEB 1988)
SIGNIFICANCE
TESTING
Often in scientific enquiry a statement concerning a population
parameter is put forward as a statistical hypothesis. Its validity is
then tested, based on observations made from random samples
taken from the population.

NULL AND ALTERNATIVE HYPOTHESES


For example, suppose we wish to test whether a sample value x
could have been drawn from a normal population with mean μ
and variance σ².
We assume that the sample is drawn from N(μ, σ²). This hypothesis
is called the null hypothesis and is denoted by H₀.
If statistical tests show that we should reject the null hypothesis, we
do so in favour of the alternative hypothesis, denoted by H₁.
For example, if we wish to investigate whether the mean of the
population from which the sample value is taken is 25, say, or
whether the population mean is not 25, the hypotheses would be
written:
H₀: μ = 25 (the population mean μ is 25)
H₁: μ ≠ 25 (the population mean μ is not 25)
If H₀ is true, then X ~ N(μ, σ²).

Now, we must decide whether it is likely that the sample value has
been drawn from this population. We consider whether it is ‘close
to μ’, or whether it is in the tail end of the distribution.
We consider the standard normal variable Z.

In this case Z = (X − μ)/σ, where Z ~ N(0, 1).

Z is known as the test statistic.
Based on the sample observation, we calculate z = (x − μ)/σ.
If z is small (i.e. close to zero), we accept that the sample value
could have been taken from a population with mean μ and we do
not reject H₀. If z is large (i.e. far from zero), we reject H₀.

CRITICAL REGION AND CRITICAL VALUES

We need to select a set of values for Z which tell us when to reject
H₀. This set of values is known as the critical region and it depends
on the type and the level of the test chosen. The boundaries of the
critical region are called the critical values.
Often, the critical region is chosen so that the probability that Z
falls within it is just 5%.
We have P(|Z| > 1.96) = 0.05, so the critical values are ±1.96.

[Figure: standard normal curve. The critical regions (reject H₀) lie
below −1.96 and above 1.96; the acceptance region lies between them.]

If the test is carried out at the 5% level, we reject H₀ if z < −1.96
or if z > 1.96, i.e. if |z| > 1.96.
If we reject H₀ at this level, we say that ‘there is significant evidence,
at the 5% level, that the population mean is not μ’.
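
The tail probability behind the critical values ±1.96 can be checked numerically. The following is a minimal sketch that uses Python's standard-library `statistics.NormalDist` in place of normal tables; the function name `two_tailed_probability` is illustrative and does not come from the text.

```python
from statistics import NormalDist


def two_tailed_probability(c):
    """Return P(|Z| > c) for Z ~ N(0, 1)."""
    upper_tail = 1 - NormalDist().cdf(c)  # P(Z > c)
    return 2 * upper_tail                 # the two tails are equal by symmetry


p = two_tailed_probability(1.96)
print(round(p, 4))
```

Rounded to four decimal places the result is 0.05, which is the 5% level used above.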

ONE-TAILED AND TWO-TAILED TESTS

There are two types of test which could be performed, depending
on the alternative hypothesis being made. These are (a) a two-tailed
test, (b) a one-tailed test.

(a) Two-tailed test

A two-tailed test looks for any change in the parameter, e.g. the
hypotheses could be
H₀: μ = 25
H₁: μ ≠ 25
The critical region depends on the level of the test as shown:
Critical region at 5% level: P(|Z| > 1.96) = 0.05 (2.5% in each tail).
Reject H₀ if z < −1.96 or z > 1.96.
Critical region at 2% level: P(|Z| > 2.326) = 0.02 (1% in each tail).
Reject H₀ if z < −2.326 or z > 2.326.
Critical region at 1% level: P(|Z| > 2.575) = 0.01 (0.5% in each tail).
Reject H₀ if z < −2.575 or z > 2.575.

(b) One-tailed test

A one-tailed test looks for a definite decrease or a definite increase
in the parameter. For example, the hypotheses could be:

                              Definite decrease        Definite increase
Hypotheses                    H₀: μ = 25               H₀: μ = 25
                              H₁: μ < 25               H₁: μ > 25
Critical region at 5% level   P(Z < −1.645) = 0.05     P(Z > 1.645) = 0.05
Critical region at 2% level   P(Z < −2.054) = 0.02     P(Z > 2.054) = 0.02
Critical region at 1% level   P(Z < −2.326) = 0.01     P(Z > 2.326) = 0.01

In each case H₀ is rejected if z lies in the tail shown.
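
The critical values listed for the 5%, 2% and 1% levels can be reproduced from the inverse of the standard normal distribution function. This is a sketch under the assumption that `statistics.NormalDist().inv_cdf` is an acceptable substitute for tables; `critical_value` is a hypothetical helper name.

```python
from statistics import NormalDist


def critical_value(level, tails):
    """Positive critical value of Z for a test at the given level.

    level: significance level as a proportion, e.g. 0.05 for 5%
    tails: 1 for a one-tailed test, 2 for a two-tailed test
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Each tail of the critical region has area level / tails.
    return NormalDist().inv_cdf(1 - level / tails)


for level in (0.05, 0.02, 0.01):
    two = round(critical_value(level, 2), 3)
    one = round(critical_value(level, 1), 3)
    print(f"level {level}: two-tailed ±{two}, one-tailed {one}")
```

The computed two-tailed value at the 1% level rounds to 2.576, whereas the text uses the table value 2.575; the difference does not affect any of the conclusions in this chapter.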
If the distribution given by the null hypothesis H₀ is true, then the
probability that z lies in the region pronounced as ‘critical’ is 0.05,
0.02, 0.01, ..., depending on the level of the test (5%, 2%, 1%, ...).

But if z lies in the critical region, we reject H₀. Therefore the
probability that we reject H₀, when in fact it is true, is determined
by the level of the test chosen. For example, if the test is performed
at the 5% level, then the probability of wrongly rejecting H₀ is
0.05.

When performing a significance test it is useful to follow a set
procedure:

Before any sample readings are considered:

(1) State the null hypothesis H₀ and the alternative hypothesis H₁.
If we are looking for a definite increase or a definite decrease in
the population parameter, we use a one-tailed test and if we are
looking for any change we use a two-tailed test.

(2) Consider the appropriate distribution given by the null hypothesis.

(3) Decide on the level of the test. This fixes the critical values of
the test statistic.

(4) Decide on the rejection criteria.

Now consider the sample values.

(5) Calculate the value of the test statistic.

(6) Make a conclusion: If the value of the test statistic lies in the
critical region, reject H₀.
If the value of the test statistic does not
lie in the critical region, do not reject H₀.

If H₀ is rejected at the 5% level, we say that the test value is
‘significant’.

If H₀ is rejected at the 1% level, then the test value is ‘highly
significant’.
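
Steps (3), (4) and (6) of the procedure can be sketched as a small decision function. This is an illustration only: the name `decide` and the labels "two-sided", "less" and "greater" are assumptions introduced here, and the critical values come from `statistics.NormalDist` rather than from tables.

```python
from statistics import NormalDist


def decide(z, level, alternative):
    """Return (critical_value, reject_h0) for an observed test statistic z.

    alternative: "two-sided" (any change), "less" (definite decrease)
                 or "greater" (definite increase)
    """
    std_normal = NormalDist()
    if alternative == "two-sided":
        critical = std_normal.inv_cdf(1 - level / 2)
        reject = abs(z) > critical
    elif alternative == "less":
        critical = std_normal.inv_cdf(level)      # negative value
        reject = z < critical
    elif alternative == "greater":
        critical = std_normal.inv_cdf(1 - level)
        reject = z > critical
    else:
        raise ValueError("unknown alternative")
    return critical, reject


critical, reject = decide(z=2.2, level=0.05, alternative="two-sided")
print(round(critical, 2), reject)
```

Steps (1) and (2) remain a matter of judgement about the situation, so they are passed in as arguments rather than computed.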
TEST 1: TESTING A SINGLE SAMPLE VALUE

Example 9.1 Test, at the 5% level, whether the single sample value of 172 comes
from a normal population with mean μ = 150 and variance σ² = 100.

Solution 9.1 Reminders:

(1) State H₀ and H₁; decide whether the test is one-tailed or two-tailed.
H₀: μ = 150 (the mean of the distribution is 150)
H₁: μ ≠ 150 (the mean is not 150) (two-tailed test)

(2) Consider the distribution given by H₀.
Now, under H₀, X ~ N(μ, σ²) with μ = 150, σ = 10.

(3) Decide on the level of the test.
Test at the 5% level.

(4) Decide on rejection criteria.
Reject H₀ if |z| > 1.96.

[Figure: normal curve centred on 150, with rejection regions beyond
the standardised values (S.V.) −1.96 and 1.96.]

(5) Calculate the value of the test statistic.
z = (x − μ)/σ
  = (172 − 150)/10
  = 2.2

(6) Make conclusion.
As |z| > 1.96, we reject H₀ and conclude that there is
significant evidence, at the 5% level, to suggest that the
sample value does not come from a population with mean 150.

NOTE: Once z has been calculated, its position can be noted on
the diagram thus, ⊗, indicating whether it lies in the rejection
region or not. In Example 9.1, z = 2.2 so ⊗ is placed to the right
of the standardised value 1.96.
Example 9.2 Test at the 1% level whether the single sample value 54 has been
drawn from a normal population with mean 65 and variance 30, or
whether the mean is less than 65.

Solution 9.2 Let the population mean be μ and the population variance be σ².
H₀: μ = 65
H₁: μ < 65 (one-tailed test)
Now under H₀, X ~ N(μ, σ²) with μ = 65, σ = √30.
We perform a one-tailed test at the 1% level, and reject H₀ if
z < −2.326, where
z = (x − μ)/σ
  = (54 − 65)/√30
  = −2.01
Conclusion: as z > −2.326, we do not reject H₀ and conclude, at
the 1% level, that the sample value could have been drawn from a
population with mean 65.
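
The arithmetic of Examples 9.1 and 9.2 can be checked with a few lines of code. This is a sketch; `standardise` is a hypothetical helper name, and the critical values are taken from `statistics.NormalDist` instead of tables.

```python
from math import sqrt
from statistics import NormalDist


def standardise(x, mu, sigma):
    """Test statistic z = (x - mu) / sigma for a single sample value."""
    return (x - mu) / sigma


std_normal = NormalDist()

# Example 9.1: two-tailed test at the 5% level, x = 172, N(150, 100)
z1 = standardise(172, 150, 10)
reject1 = abs(z1) > std_normal.inv_cdf(0.975)

# Example 9.2: one-tailed (lower) test at the 1% level, x = 54, N(65, 30)
z2 = standardise(54, 65, sqrt(30))
reject2 = z2 < std_normal.inv_cdf(0.01)

print(round(z1, 2), reject1)
print(round(z2, 2), reject2)
```

The first test rejects H₀ and the second does not, in agreement with the two worked solutions.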

Example 9.3 If 100 seeds are planted, and 83 seeds germinate, use the normal
approximation to the binomial distribution to test the manufacturer’s
claim of a 90% germination rate. Use a 5% level of significance.

Solution 9.3 Let X be the r.v. ‘the number of seeds that germinate’. Then we
have a binomial situation, and X ~ Bin(n, p) with n = 100.
H₀: p = 0.9 (the germination rate is 90%)
H₁: p < 0.9 (the germination rate is less than 90%)
(We have chosen a one-tailed test, as this seems more appropriate
to the situation.)
Under H₀, X ~ Bin(n, p) with n = 100, p = 0.9.
Now, as n is large, we use the normal approximation to the binomial
distribution, so X ~ N(np, npq), where
np = (100)(0.9) = 90
npq = (100)(0.9)(0.1) = 9
i.e. X ~ N(90, 9).
Perform a one-tailed test, at the 5% level, and reject H₀ if
z < −1.645, where
z = (x − np)/√(npq), with s.d. = √(npq) = 3
  = (83 − 90)/3
  = −2.33

Conclusion: As z < −1.645 we reject H₀ and conclude that there is
significant evidence, at the 5% level, to suggest that the manufacturer’s
claim is false.

NOTE: on the continuous scale, 83 lies between 82.5 and 83.5, so
we should really apply a continuity correction. In this case we
would try 83.5. If this value is in the critical region, then obviously
82.5 is also.
With the continuity correction,
z = (83.5 − 90)/3
  = −2.17
and the same conclusion is reached.
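
The normal approximation used in this example, with and without the continuity correction, can be sketched as follows. The function name `binomial_z` and its `shift` parameter are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def binomial_z(x, n, p, shift=0.0):
    """z-value for an observed count x under X ~ Bin(n, p), using the
    normal approximation N(np, npq).

    shift: continuity correction added to x (here +0.5 for a lower-tail test)
    """
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return (x + shift - mean) / sd


critical = NormalDist().inv_cdf(0.05)              # lower-tail 5% point
z_plain = binomial_z(83, 100, 0.9)                 # no correction
z_corrected = binomial_z(83, 100, 0.9, shift=0.5)  # uses 83.5

print(round(z_plain, 2), z_plain < critical)
print(round(z_corrected, 2), z_corrected < critical)
```

Both values lie below the critical value of about −1.645, so H₀ is rejected either way.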

Exercise 9a

1. Test whether the sample value could have been drawn from the normal
population indicated in the null hypothesis. Test at (a) the 5% level,
(b) the 1% level.

[Table of sample values and hypotheses: not recoverable from the source.]

2. A coin is tossed 64 times. Test at the 5% level of significance
whether the coin is fair, or whether it is biased in favour of
showing heads, if (a) 38 heads occur, (b) 42 heads occur.

3. A manufacturer claims that 8 out of 10 dogs prefer his brand to any
other. In a random sample of 120 dogs, it was found that 88 ate that
brand. Test at the 5% level whether you would accept the manufacturer’s
claim.

4. In a survey it was found that 3 out of 10 people supported a
particular political party. A month later the party representative
claimed that the popularity of the party had increased. Would you
accept that the number who supported the party was still 3 out of 10
if a further survey revealed that 38 people in a random sample of 100
supported the party? Test at the 3% level.

5. A gardener sows 150 ‘Special’ cabbage seeds and knows that the
germination rate is 75%. (a) By using a suitable approximation find
the probability that (i) more than 122 seeds germinate, (ii) less than
106 seeds germinate. (b) The gardener also sows 120 ‘Everyday’ cabbage
seeds and finds that 81 germinate. Test whether the ‘Everyday’ seeds
have a germination rate less than 75%. Test at the 4% level.
TEST 2: TESTING A MEAN

We may wish to investigate whether a random sample of size n,
with mean x̄, could have been drawn from a normal population
with mean μ.

Case 1: Population variance σ² known

Under the null hypothesis that the population
mean is μ, taking samples of size n, the distribution of X̄ is

X̄ ~ N(μ, σ²/n)

If X is distributed normally then this holds for all sample sizes, but
if X does not follow a normal distribution then n must be large
(Central Limit Theorem).

Reminders: the distribution of X̄ is known as the sampling distribution
of means; the standard deviation of this distribution (σ/√n) is
known as the standard error of the mean (see p. 399).

Now, we want to investigate whether there is a significant difference
between the sample mean and the population mean given by the
null hypothesis.

Standardising, we have Z = (X̄ − μ)/(σ/√n), where Z ~ N(0, 1).

We use the test statistic Z = (X̄ − μ)/(σ/√n), which is distributed as N(0, 1)
under the null hypothesis H₀ that the true population mean is μ.

Example 9.4 The lengths of metal bars produced by a particular machine are
normally distributed with mean length 420 cm and standard deviation
12 cm. The machine is serviced, after which a sample of 100
bars gives a mean length of 423 cm. Is there evidence, at the 5%
level, of a change in the mean length of the bars produced by the
machine, assuming that the standard deviation remains the same?

Solution 9.4 Let X be the r.v. ‘the length, in cm, of a metal bar’. Let the
population mean be μ and the population variance be σ².

Reminders:

(1) State H₀ and H₁; decide whether the test is one-tailed or two-tailed.
H₀: μ = 420 cm (there is no change in the population mean μ)
H₁: μ ≠ 420 cm (there is a change in the population mean μ)
(Two-tailed test)

(2) Consider the distribution given by H₀.
Consider the sampling distribution of means under H₀,
X̄ ~ N(μ, σ²/n) with σ = 12, n = 100.

(3) Decide on the level of the test.
We perform a two-tailed test at the 5% level.

(4) Decide on rejection criteria.
Reject H₀ if |z| > 1.96.

(5) Calculate the value of the test statistic.
z = (x̄ − μ)/(σ/√n)
  = (423 − 420)/(12/√100)
  = 2.5

(6) Make conclusion.
As |z| > 1.96, we reject H₀ and conclude that there is
sufficient evidence, at the 5% level, of a change in the mean
length of the bars produced by the machine.
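
The test statistic for a sample mean with known σ can be computed directly. This is a minimal sketch using the figures of Example 9.4; `z_for_mean` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def z_for_mean(xbar, mu0, sigma, n):
    """z = (xbar - mu0) / (sigma / sqrt(n)), with sigma known."""
    standard_error = sigma / sqrt(n)
    return (xbar - mu0) / standard_error


z = z_for_mean(xbar=423, mu0=420, sigma=12, n=100)
critical = NormalDist().inv_cdf(0.975)  # two-tailed test at the 5% level
reject = abs(z) > critical

print(round(z, 2), reject)
```

Here the standard error is 12/√100 = 1.2, so z = 2.5 and H₀ is rejected.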

Example 9.5 Experience has shown that the scores obtained in a particular test
are normally distributed with mean score 70 and variance 36. When
the test is taken by a random sample of 36 students, the mean score
is 68.5. Is there sufficient evidence, at the 3% level, that these
students have not performed as well as expected?

Solution 9.5 Let X be the r.v. ‘the score of a student’. Let the population mean
be μ and the population variance be σ².
H₀: μ = 70 (the population mean μ is 70 and the students
have not under-achieved)
H₁: μ < 70 (the population mean is less than 70 and the
students have not done as well as expected)
Consider the sampling distribution of means where, under H₀,

X̄ ~ N(μ, σ²/n) with μ = 70, σ² = 36, n = 36

We perform a one-tailed test at the 3% level.
Now, if a is the critical value of the test statistic, then
Φ(a) = 0.03
a = −1.881
So, we reject H₀ if z < −1.881, where
z = (x̄ − μ)/(σ/√n)
  = (68.5 − 70)/(6/√36)
  = −1.5
Conclusion: as z > −1.881 we do not reject H₀ and conclude that
at the 3% level the students have not under-achieved.
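
For a level such as 3%, which is not among the standard tabulated ones, the critical value a with Φ(a) = 0.03 has to be found by inverse lookup. The sketch below does this with `statistics.NormalDist().inv_cdf`, assumed here as a stand-in for tables, and then repeats the test of Example 9.5.

```python
from math import sqrt
from statistics import NormalDist

level = 0.03
a = NormalDist().inv_cdf(level)   # lower-tail critical value, Phi(a) = 0.03

# Example 9.5: xbar = 68.5, mu = 70, sigma = 6, n = 36
z = (68.5 - 70) / (6 / sqrt(36))
reject = z < a

print(round(a, 3), z, reject)
```

Since z = −1.5 lies above a, H₀ is not rejected.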

Example 9.6 It is claimed that the masses of components produced at a particular
workshop are normally distributed with a mean mass of 6 g and a
standard deviation of 0.8 g. If this claim is accepted, at the 5% level,
on the basis of the mean mass obtained from a random sample of
50 components, between what values must the mean mass of the
50 components in the sample lie?

Solution 9.6 Let X be the r.v. ‘the mass, in g, of a component’. Let the
population mean be μ and the population variance be σ².
H₀: μ = 6 g
H₁: μ ≠ 6 g (two-tailed test)
Consider the sampling distribution of means where, under H₀,

X̄ ~ N(μ, σ²/n) with μ = 6, σ = 0.8 and n = 50

If the test is performed at the 5% level then
H₀ is accepted if |z| < 1.96, where
z = (x̄ − μ)/(σ/√n), with s.d. = σ/√n = 0.8/√50
Now, as H₀ is accepted,
−1.96 < z < 1.96
i.e. −1.96 < (x̄ − 6)/(0.8/√50) < 1.96
6 − 1.96(0.8/√50) < x̄ < 6 + 1.96(0.8/√50)
5.78 < x̄ < 6.22
Therefore the mean mass of the 50 components must lie in the
range 5.78 g < x̄ < 6.22 g.
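
The range of sample means for which H₀ is accepted can be computed by rearranging the inequality for z. This is a sketch; `acceptance_interval` is an illustrative name, not notation from the text.

```python
from math import sqrt
from statistics import NormalDist


def acceptance_interval(mu0, sigma, n, level):
    """Values of the sample mean for which a two-tailed test of
    H0: mu = mu0 at the given level does not reject H0."""
    c = NormalDist().inv_cdf(1 - level / 2)
    half_width = c * sigma / sqrt(n)
    return mu0 - half_width, mu0 + half_width


low, high = acceptance_interval(mu0=6, sigma=0.8, n=50, level=0.05)
print(round(low, 2), round(high, 2))
```

With the figures of Example 9.6 this gives the interval from 5.78 g to 6.22 g.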

Example 9.7 A machine produces elastic bands with breaking tension normally
distributed with mean 45.00 N and s.d. 4.36 N. On a certain day a
sample of 50 was tested and found to have a mean breaking tension
of 43.46 N. Test at the 5% level of significance whether this indicates
a change in the mean, explaining what is meant by ‘5% level of
significance’.
Find a 95% confidence interval for the population mean based on
the sample mean assuming an unchanged s.d.
If the s.d. has changed to σ, find the least value of σ for a 95%
confidence interval for the population mean to contain 45.00 N.
(SUJB)

Solution 9.7 Let X be the r.v. ‘the breaking tension, in N, of an elastic band’.
Let the population mean be μ and the population variance be σ₁².
H₀: μ = 45.00 (there is no change in the mean)
H₁: μ ≠ 45.00 (there is a change in the mean)

Consider the sampling distribution of means where, under H₀,

X̄ ~ N(μ, σ₁²/n) with μ = 45.00 N, σ₁ = 4.36 N, n = 50

We use a two-tailed test at the 5% level and
reject H₀ if |z| > 1.96, where
z = (x̄ − μ)/(σ₁/√n), with s.d. = σ₁/√n = 4.36/√50
  = (43.46 − 45.00)/(4.36/√50)
  = −2.498

Conclusion: As |z| > 1.96, we reject H₀ and conclude that there
is evidence, at the 5% level, of a change in the mean.

As the level of the test chosen is the 5% level of significance, the
probability that we have wrongly rejected H₀ is 0.05.
A 95% confidence interval for the population mean is given by

x̄ ± 1.96 σ₁/√n = 43.46 ± 1.96(4.36/√50)
               = 43.46 ± 1.209
               = (42.25, 44.67)
So the 95% confidence interval for the mean breaking tension is
(42.25 N, 44.67 N).

NOTE: as expected, the value 45.00 is not in this interval.

If the standard deviation has changed to σ, then the least value of σ
for the 95% confidence interval to contain 45.00 is such that

43.46 + 1.96 σ/√50 = 45.00
σ = (45.00 − 43.46)√50/1.96
  = 5.56 (2 d.p.)

So the least value of σ must be 5.56 N.
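
The confidence interval and the least value of σ in this example follow from the same half-width formula, 1.96σ/√n. The sketch below computes both; the function names are assumptions introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(xbar, sigma, n, confidence):
    """Symmetric confidence interval for the mean, sigma known."""
    c = NormalDist().inv_cdf((1 + confidence) / 2)
    half_width = c * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


def least_sigma_to_cover(xbar, target, n, confidence):
    """Smallest sigma for which the interval around xbar reaches target."""
    c = NormalDist().inv_cdf((1 + confidence) / 2)
    return abs(target - xbar) * sqrt(n) / c


low, high = confidence_interval(43.46, 4.36, 50, 0.95)
sigma_min = least_sigma_to_cover(43.46, 45.00, 50, 0.95)

print(round(low, 2), round(high, 2))
print(round(sigma_min, 2))
```

The interval comes out as (42.25, 44.67) and the least σ as 5.56, matching the worked solution.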

Example 9.8 Describe, referring to your projects if you wish, the steps used in
carrying out a significance test.

Over a long period it has been found that the breaking strains of
cables produced by a factory are normally distributed with mean
6000 N and standard deviation 150 N. Find, to 3 decimal places,
the probability that a cable chosen at random from the production
will have a breaking strain of more than 6200 N.
A modification is introduced into the production process which
only affects the value of the mean breaking strain. Six cables,
chosen at random from the modified process, are tested and found
to have a mean breaking strain of 5920 N.

(a) Test, at the 5% significance level, whether the sample evidence
is sufficient to conclude that the mean breaking strain of the cables
is actually less than 6000 N.

(b) Find, to 3 significant figures, the value C for which we can
state with 90% confidence that the mean breaking strain of the
cables exceeds C N. (L)

Solution 9.8 For steps used in carrying out a significance test, see p. 461.
Let X be the r.v. ‘the breaking strain, in N, of a cable’.
X ~ N(6000, 150²)

P(X > 6200) = P(Z > (6200 − 6000)/150)
            = P(Z > 1.333)
            = 0.091 (3 d.p.)

The probability that a cable chosen at random will have a breaking
strength of more than 6200 N is 0.091 (3 d.p.).

A sample of six cables is tested, giving x̄ = 5920 N. Let μ be the
population mean.

(a) H₀: μ = 6000 N
H₁: μ < 6000 N

Consider the sampling distribution of means, where under H₀,

X̄ ~ N(μ, σ²/n) with μ = 6000 N, σ = 150 N, n = 6

so X̄ ~ N(6000, 150²/6)

We perform a one-tailed test, at the 5% level, and reject H₀ if
z < −1.645, where
z = (x̄ − μ)/(σ/√n), with s.d. = 150/√6
  = (5920 − 6000)/(150/√6)
  = −1.306

Since z > −1.645 we do not reject H₀ and conclude
that the mean breaking strength is not less than 6000 N.

(b) We require a one-sided (not symmetric) confidence interval
such that
P(C < μ < ∞) = 0.9
The lower limit is given by

x̄ − 1.282 σ/√n

so C = 5920 − 1.282(150/√6)
     = 5840 (3 S.F.)
Therefore we can state, with 90% confidence, that the mean
breaking strength of the cables exceeds 5840 N.
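
The one-sided limit in part (b) uses the single-tail value 1.282 rather than a two-tail value. The following sketch computes the lower limit and the earlier tail probability; `lower_confidence_bound` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def lower_confidence_bound(xbar, sigma, n, confidence):
    """C such that we can state, with the given confidence, that mu > C
    (sigma known)."""
    c = NormalDist().inv_cdf(confidence)   # one tail only, e.g. 1.282 for 90%
    return xbar - c * sigma / sqrt(n)


# P(X > 6200) for X ~ N(6000, 150^2)
tail = 1 - NormalDist(mu=6000, sigma=150).cdf(6200)

# Part (b): six cables with mean 5920 N, 90% confidence
C = lower_confidence_bound(xbar=5920, sigma=150, n=6, confidence=0.90)

print(round(tail, 3))
print(round(C, -1))   # nearest 10, i.e. 3 significant figures here
```

The unrounded limit is a little above 5841, which is 5840 to 3 significant figures.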

Exercise 9b

1. For each of the following, a random sample of size n is taken from a
normal distribution with mean μ and variance σ². The sample mean is x̄.
Test the hypotheses stated, at the level of significance indicated.

[Table: the values of n, x̄, σ and the level of significance for each
part are not recoverable from the source. The surviving hypotheses are:]
(a) H₀: μ = 15.8, H₁: μ ≠ 15.8
(b) H₀: μ = 26.3, H₁: μ > 26.3
(c) H₀: μ = 123.5
(d) H₀: μ = 4.40, H₁: μ < 4.40

2. The masses of components produced by a certain machine are normally
distributed with mean 15.4 g and standard deviation 2.3 g. The setting
on the machine is altered, following which a random sample of 81
components is found to have a mean mass of 15.0 g. Does this provide
evidence, at the 5% level, of a reduction in the mean mass of
components produced by this machine? Assume that the standard
deviation is not altered.

3. A variable with known variance of 32 is thought to have a mean of 55.
A random sample of 81 independent observations of the variable gives a
mean of 56.2. Is there sufficient evidence that the mean is not 55
(a) at the 10% level, (b) at the 5% level, (c) at the 1% level?

4. A manufacturer claims that his cassettes, advertised as having a
playing time of 90 minutes, actually have a mean playing time of 92
minutes, with standard deviation 1.8 minutes. 36 tapes are selected at
random and tested. The investigator rejects the manufacturer’s claim,
at the 5% level, saying that the mean playing time of the tapes is
less than 92 minutes. What can be said about the value of the sample
mean obtained for this decision to be taken?

5. Mass-produced washers have thicknesses which are normally distributed
with mean 3 mm and standard deviation 0.2 mm.
(a) Find, correct to three decimal places, the probability that the
mean thickness of a random sample of 4 washers will lie between
2.9 mm and 3.1 mm.

(b) During a check on the manufacturing machine indicate that the diameters are
process a random sample of 25 washers is
normally distributed with mean 0.824cm
and standard deviation 0.046cm. Two
taken from production and the mean
hundred samples, each consisting of 100
thickness X mm is calculated. Find the
interval in which the value of X must lie ball bearings, are chosen. Calculate the
expected number of the 200 samples
in order that the hypothesis that the
having a mean diameter less than
production mean thickness is 3mm will
0.823 cm.
not be rejected when the significance
level is 5 per cent. (JMB) On a certain day it was suspected that
the machine was malfunctioning. It may
Describe briefly how the Central Limit be assumed that if the machine is malfunc-
Theorem may be demonstrated. tioning it will change the mean of the
The distance driven by a long distance diameters without changing their standard
lorry driver in a week is a normally distri- deviation. On that day a random sample
buted variable having mean 1130 km and of 100 ball bearings had a mean diameter
standard deviation 106 km. Find, to 3 of 0.834cm. Determine a 98% confidence
decimal places, the probability that in a interval for the mean diameter of the ball
given week he will drive less than bearings being produced that day.
1000 km. Find, to 3 decimal places the Hence state whether or not you would
probability that in 20 weeks his average conclude that the machine is malfunc-
distance driven per week is more than tioning on that day given that the signifi-
1200 km. cance level is 2%. (L)
New driving regulations are introduced 10. X, and X, are independen t random
and, in the first 20 weeks after their variables with means Ll, and [2, variances
introduction, he drives a total of ov and oy respectively. Give the mean
21900km. Assuming that the standard and variance of X;— X2. If Y= AX,
deviation of the weekly distances he where A is a constant, give the mean and
drives is unchanged, test, at the 10% variance of Y.
level of significance, whether his mean
weekly driving distance has been reduced. The random variable X , denotes the mean
State clearly your null and alternative of a random sample of size n,; from the
second of the above distributions, show
hypotheses. (L)
how to obtain the mean and variance of
A machine packs flour into bags. A the distribution of AyX;+ A,X from the
random sample of eleven filled bags was results you have stated, A, and A, being
taken and the masses of the bags to the constants.
nearest O.1g were: 1506.8, 1506.6,
The yield of a certain crop per plot of
1506.7, 1507.2, 1506.9, 1506.8, 1506.6,
standard area is normally distributed with
1507.0, 1507.5, 1506.3, 1506.4. Obtain
mean 253.0 kg and variance 67.1 kg. A
the mean and the variance of this sample
new fertiliser is applied to 10 randomly
showing your working clearly. Filled bags
selected plots, and their mean yield is
are supposed to have a mass of 1506.5 g.
found to be 257.8 kg. Is there any evidence
Assuming that the mass of a bag has
of significant improvement in the yield?
normal distribution with variance 0.16 g
Assume the new fertiliser does not affect
test whether the sample provides signifi-
the variance of the yields. (O)
cant evidence at the 5% level that the
machine produces overweight bags. Give 11. The masses of loaves from a certain
the 99% confidence interval for the mass bakery are normally distributed with
of a filled bag. (C) mean 500 g and standard deviation 20 g.
A sample of size 25 is taken from the (a) Determine what percentage of the
distribution of X where X ~ N(p, 4). output would fall below 475 g and what
The sample mean X is 10.72. At what percentage would be above 530 g.
level test would we reject the null hypo- (b) The bakery produces 1000 loaves
thesis that 4 = 10 in favour of the alter- daily at a cost of 8 p per loaf and can sell
native hypothesis all those above 475 g for 20p each but is
(a) u>10, (b)u#10? not allowed to sell the rest. Calculate the
expected daily profit.
Explain, briefly, the roles of a null hypo- (c) A sample of 25 loaves yielded a mean
thesis and a level of significance in a mass of 490g. Does this provide evidence
project which you have undertaken. of a reduced population mean? Use the
Records of the diameters of spherical 5% level of significance and state whether
ball bearings produced on a certain the test is one-tailed or two. (SUJB)
12. Illustrate the role of the null hypothesis turer believes that this will increase the
with reference, if possible, to one of your mean breaking strength without changing
projects making sure that you state the the standard deviation. A random sample
alternative hypothesis and the level of of 50 one-metre lengths of the new rope
significance used. Explain how you is found to have a mean breaking strength
decided whether to use a one-tail or a of 172.4 kg. Perform a significance test
two-tail test. at the 5% level to decide whether this
Research workers measured the body result provides sufficient evidence to
lengths, in mm, of 10 specimens of fish confirm the manufacturer’s belief that
spawn of a certain species off the coast of the mean breaking strength is increased.
Eastern Scotland and found these lengths State clearly the null and alternative
to be hypotheses which you are using. (L)

220 O 2a Lal EG 1201)


10.7 11.4 14.7 10.4 9.3 14. (a) Write down the mean and the variance
of the distribution of the means of all
Obtain unbiased estimates for the mean
possible samples of size n taken from an
and variance of the lengths of all such fish
infinite population having mean W and
spawn off Eastern Scotland.
variance 0.
Research shows that, for a very large
Describe the form of this distribution of
number of specimens of spawn of this
sample means when
species off the coast of Wales, the mean
body length is 10.2 mm. Assuming that (i) n is large,
the variance of the lengths of spawn off (ii) the distribution of the population
is normal.
Eastern Scctland is 2.56, perform asigni-
ficance test at the 5% level to decide Explain briefly how you acquired
whether the mean body length of fish empirical evidence for the Central Limit
spawn off the coast of Eastern Scotland Theorem.
is larger than that of fish spawn off the (6) The standard deviation of all the till
coast of Wales. (L) receipts of a supermarket during 1984
13. Give an example, from your projects if was £4.25.
you wish, of the steps used in carrying (i) Given that the mean of a random
out a test of significance. sample of 100 of the till receipts is
£18.50, obtain an approximate 95%
Climbing rope produced by a manufac- confidence interval for the mean of
turer is known to be such that one-metre all the till receipts during 1984.
lengths have breaking strengths that are (ii) Find the size of sample that
normally distributed with mean 170.2 kg should be taken so that the manage-
and standard deviation 10.5 kg. Find, to 3 ment can be 95% confident that the
decimal places, the probability that sample mean will not differ from the
(a) a one-metre length of rope chosen at true mean by more than 50p.
random from those produced by the (iii) The mean of all the till receipts
manufacturer will have a breaking strength of the supermarket during 1983 was
of 175 kg to the nearest kg. £19.40. Using a 5% significance level,
(b) a random sample of 50 one-metre investigate whether the sample in (i)
lengths will have a mean breaking strength above provides sufficient evidence to
of more than 172.4 kg. conclude that the mean of all the
A new component material is added to 1984 till receipts is different from
the ropes being produced. The manufac- that in 1983. (L)

Testing a Mean. Case 2: Population variance σ² unknown, sample
size large (n ≥ 30, say)
Again, we have the sampling distribution of means, where

X̄ ~ N(μ, σ²/n)

Now, since σ² is unknown, we use an estimator σ̂² for it.

We have σ̂² = nS²/(n − 1), where S² is the r.v. ‘the sample variance’. But
since n is large, σ̂² ≈ S².

We use the test statistic Z = (X̄ − μ)/(σ̂/√n), which is distributed as N(0, 1)
under the null hypothesis H₀ that the true population mean is μ,
with σ̂ ≈ S.

Example 9.9 A normal distribution is thought to have a mean of 50. A random
sample of 100 gave a mean of 52.6 and a standard deviation of
14.5. Is there evidence that the population mean has increased
(a) at the 5% level, (b) at the 1% level?

Solution 9.9 Let the population mean be μ and the population variance be σ².

The sample mean x̄ is 52.6 and the sample standard deviation s is
14.5.

H₀: μ = 50 (there is no change in the population mean μ)
H₁: μ > 50 (there is an increase in the population mean μ)

Consider the sampling distribution of means.

Under H₀, X̄ ~ N(μ, σ²/n)

Now, σ² is unknown, so since n is large we use σ̂² = ns²/(n − 1) ≈ s².

So X̄ ~ N(μ, σ̂²/n) with σ̂ ≈ 14.5, μ = 50, n = 100

(a) We use a one-tailed test at the 5% level and
reject H₀ if z > 1.645, where
z = (x̄ − μ)/(σ̂/√n)
  = (52.6 − 50)/(14.5/√100)
  = 1.793
Conclusion: As z > 1.645, we reject H₀ and conclude that there is
evidence, at the 5% level, that the population mean has increased.

(b) We use a one-tailed test at the 1% level and
reject H₀ if z > 2.326, where
z = (x̄ − μ)/(σ̂/√n)
  = 1.793 (as before)
Conclusion: As z < 2.326, we do not reject H₀ and conclude that
there is not sufficient evidence, at the 1% level, that the population
mean has increased.

NOTE: the value of the test statistic z is significant at the 5% level,
but not at the 1% level, giving some, but not strong, evidence that
the mean has increased.
If we do not approximate σ̂ and use σ̂² = ns²/(n − 1) = (100)(14.5²)/99, then
σ̂ = 14.57... and z = 1.784..., giving the same conclusions.
There is little difference in the z value when n is large so we usually
use σ̂ ≈ s.
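
The effect of replacing σ̂ by s can be seen by computing the test statistic both ways. This is a sketch with the figures of Example 9.9; `z_large_sample` and its `use_unbiased` flag are names introduced here for illustration.

```python
from math import sqrt


def z_large_sample(xbar, mu0, s, n, use_unbiased=False):
    """z for a large sample with unknown population variance.

    s is the sample standard deviation (divisor n).
    If use_unbiased is True, sigma-hat = s * sqrt(n / (n - 1)) is used;
    otherwise sigma-hat is approximated by s.
    """
    sigma_hat = s * sqrt(n / (n - 1)) if use_unbiased else s
    return (xbar - mu0) / (sigma_hat / sqrt(n))


z_approx = z_large_sample(52.6, 50, 14.5, 100)
z_exact = z_large_sample(52.6, 50, 14.5, 100, use_unbiased=True)

print(round(z_approx, 3), round(z_exact, 3))
```

The two values, about 1.793 and 1.784, fall on the same side of both 1.645 and 2.326, so the conclusions at the 5% and 1% levels are unchanged.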

Example 9.10 A manufacturer claims that the average life of his electric light
bulbs is 2000 hours. A random sample of 64 bulbs is tested and the
life $x$ in hours recorded. The results obtained are as follows:
$\sum x = 127\,808$, $\sum (x - \bar{x})^2 = 9694.6$. Is there sufficient evidence, at
the 2% level, that the manufacturer is over-estimating the length of
life of his light bulbs? Assume that the distribution of the length of
life of light bulbs is normal.

Solution 9.10 Sample readings:

$\sum x = 127\,808$, $\sum (x - \bar{x})^2 = 9694.6$, $n = 64$

$$\bar{x} = \frac{\sum x}{n} = \frac{127\,808}{64} = 1997 \quad \text{and} \quad s^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{9694.6}{64} = 151.48$$

The sample mean $\bar{x} = 1997$ and the sample variance $s^2 = 151.48$
and standard deviation $s = 12.31$.

Significance test: Let $X$ be the r.v. ‘the life, in hours, of a light
bulb’. Let the population mean be $\mu$ and the population variance
be $\sigma^2$.

$H_0$: $\mu = 2000$ (the manufacturer is not over-estimating the length of life)

$H_1$: $\mu < 2000$ (the manufacturer is over-estimating the length of life)

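Consider the sampling distribution of means.

Under $H_0$, $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$.

Now, as $\sigma^2$ is unknown, and we have a large sample, we estimate for
it, using $\hat{\sigma}^2 = \dfrac{n s^2}{n-1} \approx s^2$,

and so $\bar{X} \sim N\left(\mu, \dfrac{\hat{\sigma}^2}{n}\right)$ with $\hat{\sigma} \approx 12.31$, $\mu = 2000$, $n = 64$.

We use a one-tailed test at the 2% level, and reject $H_0$ if $z < -2.054$,
where

$$z = \frac{\bar{x} - \mu}{\hat{\sigma}/\sqrt{n}} = \frac{1997 - 2000}{12.31/\sqrt{64}} = -1.95$$

[Sketch: standard normal curve with the lower 2% tail, below $z = -2.054$, marked as the rejection region.]

Conclusion: As $z > -2.054$, we do not reject $H_0$ and conclude that
there is not sufficient evidence, at the 2% level, that the manufacturer
is over-estimating the length of life of the light bulbs.

NOTE: If we do not approximate for $\hat{\sigma}$, then $\hat{\sigma}^2 = \dfrac{n s^2}{n-1} = \dfrac{64(151.48)}{63}$,
$\hat{\sigma} = 12.40\ldots$ and $z = -1.93$, giving the same conclusion.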

Exercise 9c

In this exercise use $\hat{\sigma} \approx s$ unless otherwise stated.

1. For each of the following, a random sample of size $n$ is taken from a normal distribution with mean $\mu$ and variance $\sigma^2$. The sample mean is $\bar{x}$. Test the hypotheses stated, at the level of significance indicated.

[Table of sample data for the parts of this question; only the following entries are legible.]
$H_0$: $\mu = 99.2$, $H_1$: $\mu \neq 99.2$, 5%
$H_0$: $\mu = 99.2$, $H_1$: $\mu > 99.2$
$H_0$: $\mu = 7$, $H_1$: [illegible]

2. A sample of 40 observations from a normal distribution gave $\sum x = 24$ and $\sum x^2 = 596$. Test, at the 5% level, whether the mean of the distribution is zero. Perform a two-tailed test.

3. A random sample of 75 eleven-year-olds performed a particular task. Denoting the time taken by $(15 + y)$ minutes, the results are summarised as follows: $\sum y = 90$, $\sum (y - \bar{y})^2 = 2025$. Test whether there is sufficient evidence, at the 4% level, to suggest that the mean time to perform the task is greater than 15 minutes.
Determine a symmetric 96% confidence interval for the mean time, based on the sample observations.

Explain, briefly, the roles of a null hypothesis, an alternative hypothesis and a level of significance in a statistical test, referring to your projects where possible.

A shopkeeper complains that the average weight of chocolate bars of a certain type that he is buying from a wholesaler is less than the stated value of 8.50 g. The shopkeeper weighed 100 bars from a large delivery and found that their weights had a mean of 8.36 g and a standard deviation of 0.72 g. Using a 5% significance level, determine whether or not the shopkeeper is justified in his complaint. State clearly the null and alternative hypotheses that you are using, and express your conclusion in words.
Obtain, to 2 decimal places, the limits of a 98% confidence interval for the mean weight of the chocolate bars in the shopkeeper’s delivery. (L)

An electronic device is advertised as being able to retain information stored in it ‘for 70 to 90 hours’ after power has been switched off. In experiments carried out to test this claim, the retention time in hours, $X$, was measured on 250 occasions, and the data obtained is summarised by $\sum (x - 76) = 683$ and $\sum (x - 76)^2 = 26\,132$. The population mean and variance of $X$ are denoted by $\mu$ and $\sigma^2$ respectively.
(a) Show that, correct to one decimal place, an unbiased estimate of $\sigma^2$ is 97.5.
(b) Test the hypothesis that $\mu = 80$ against the alternative hypothesis that $\mu < 80$, using a 5% significance level.
(c) Calculate a symmetric 95% confidence interval for $\mu$. (C)

At an early stage in analysing the marks scored by the large number of candidates in an examination paper, the Examination Board takes a random sample of 250 candidates and finds that the marks, $x$, of these candidates give $\sum x = 11\,872$ and $\sum x^2 = 646\,193$. Calculate a 90% confidence interval for the population mean, $\mu$, for this paper.
Using the figures obtained in this sample, the null hypothesis $\mu = 49.5$ is tested against the alternative hypothesis $\mu < 49.5$ at the $\alpha$% significance level. Determine the set of values of $\alpha$ for which the null hypothesis is rejected in favour of the alternative hypothesis.
It is subsequently found that the population mean and standard deviation for the paper are 45.292 and 18.761 respectively. Find the probability of a random sample of size 250 giving a sample mean at least as high as the one found in the sample above. (C)

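Testing a Mean. Case 3 - Population variance $\sigma^2$ unknown, sample size small ($n < 30$, say)

Consider $X \sim N(\mu, \sigma^2)$.

Since $\sigma^2$ is unknown, we use $\hat{\sigma}^2 = \dfrac{n S^2}{n-1}$ where $S^2$ is the r.v. ‘the
sample variance’.

Since $n$ is small, $\dfrac{n}{n-1}$ is not approximately equal to 1 and we
cannot use $\hat{\sigma}^2 \approx S^2$.

Now
$$\frac{\hat{\sigma}}{\sqrt{n}} = \frac{\sqrt{n}\,S}{\sqrt{n}\,\sqrt{n-1}} = \frac{S}{\sqrt{n-1}}$$

The test statistic becomes
$$\frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}} = \frac{\bar{X} - \mu}{S/\sqrt{n-1}}$$
and the distribution of the test statistic changes from $N(0, 1)$ to
$t(n-1)$.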

We use the test statistic $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n-1}}$, which is distributed as
$t(n-1)$ under the null hypothesis $H_0$ that the true population
mean is $\mu$.

Example 9.11 Five readings of the resistance, in ohms, of a piece of wire gave the
following results

1.51, 1.49, 1.54, 1.52, 1.54

If the wire were pure silver, its resistance would be 1.50 ohms. If
the wire were impure, the resistance would be increased. Test, at
the 5% level, the hypothesis that the wire is pure silver.

Solution 9.11 Let $X$ be the r.v. ‘the resistance, in ohms, of a piece of wire’. We
assume $X$ is normally distributed.

$H_0$: $\mu = 1.50$ ohms (the population mean $\mu$ is 1.50 ohms and the wire is pure silver)

$H_1$: $\mu > 1.50$ ohms (the population mean $\mu$ is greater than 1.50 ohms and the wire is not pure silver)

(One-tailed test)

Under $H_0$, the test statistic is $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n-1}}$, where $T \sim t(n-1)$.

Now $n - 1 = 4$, so $T \sim t(4)$.

We use a one-tailed test at the 5% level.

The critical value for $t$ is found from row $\nu = 4$, column $Q = 5\%$,
giving $t = 2.132$, so we reject $H_0$ if $t_{\text{test}} > 2.132$.

Considering the sample readings, we have

$$\bar{x} = \frac{\sum x}{n} = \frac{7.60}{5} = 1.52 \quad \text{and} \quad s^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{0.0018}{5} = 0.000\,36$$

$\bar{x} = 1.52$, $s = 0.019$

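Now
$$t_{\text{test}} = \frac{\bar{x} - \mu}{s/\sqrt{n-1}} = \frac{1.52 - 1.50}{0.019/\sqrt{4}} \approx 2.11$$

[Sketch: $t(4)$ curve with the upper 5% tail, beyond $t = 2.132$, marked as the rejection region.]

Conclusion: As $t_{\text{test}} < 2.132$, we do not reject $H_0$ and conclude
that there is not sufficient evidence, at the 5% level, that the wire
is impure.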
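
The small-sample statistic of Example 9.11 can be reproduced from the raw readings. This is a minimal sketch with the standard library only; the helper name `t_statistic` is an illustrative choice, and the critical value is simply copied from the $t$ tables, as in the text, rather than computed.

```python
from math import sqrt


def t_statistic(data, mu0):
    """Return (t, degrees of freedom) for H0: mu = mu0.

    Uses the sample variance s^2 with divisor n, so that
    t = (mean - mu0) / (s / sqrt(n - 1)).
    """
    n = len(data)
    mean = sum(data) / n
    s_squared = sum((x - mean) ** 2 for x in data) / n
    return (mean - mu0) / sqrt(s_squared / (n - 1)), n - 1


readings = [1.51, 1.49, 1.54, 1.52, 1.54]  # resistances in ohms
t, df = t_statistic(readings, 1.50)

T_CRIT = 2.132  # from t tables: nu = 4, upper 5% point
reject = t > T_CRIT

print(f"t = {t:.3f} on {df} degrees of freedom, reject H0: {reject}")
```

Working with the unrounded $s$ gives a value of $t$ very slightly different from the hand calculation with $s = 0.019$; the conclusion is unchanged.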

Example 9.12 A random sample of eight observations of a normal variable gave
$\bar{x} = 4.65$, $\sum (x - \bar{x})^2 = 0.74$. Test, at the 2% level, whether the mean
of the distribution is 4.3.

Solution 9.12 Now $s^2 = \dfrac{\sum (x - \bar{x})^2}{n} = \dfrac{0.74}{8} = 0.0925$

$s = 0.304$, $\bar{x} = 4.65$

$H_0$: $\mu = 4.3$

$H_1$: $\mu \neq 4.3$

Under $H_0$, the test statistic is $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n-1}}$, where $T \sim t(n-1)$.

Now, $n = 8$, therefore $T \sim t(7)$.

We perform a two-tailed test, at the 2% level.

The critical value for $t$ is found from row $\nu = 7$, column $2Q = 2\%$,
giving $t = 2.998$.

So we reject $H_0$ if $|t_{\text{test}}| > 2.998$, where

$$t_{\text{test}} = \frac{\bar{x} - \mu}{s/\sqrt{n-1}} = \frac{4.65 - 4.3}{0.304/\sqrt{7}} = 3.05$$

[Sketch: $t(7)$ curve with 1% in each tail, beyond $t = \pm 2.998$, marked as the rejection region.]

Conclusion: As $|t_{\text{test}}| > 2.998$, we reject $H_0$ at the 2% level and
conclude that the mean of the distribution is not 4.3.

Exercise 9d

1. For each of the following, a random sample of size $n$ is taken from a normal distribution with mean $\mu$ and variance $\sigma^2$. The sample mean is $\bar{x}$. Test the hypotheses stated, at the level of significance indicated.

[Table of sample data for parts (a) to (d); only the following hypotheses are legible.]
(c) $H_0$: $\mu = 1503$, $H_1$: $\mu > 1503$
(d) $H_0$: $\mu = 133.0$, $H_1$: $\mu < 133.0$

2. A machine is supposed to produce steel pins of length 2 cm. A sample of 10 pins was taken and their lengths measured in cm. The following results were obtained:

1.98, 1.96, 1.99, 2.00, 2.01, 1.95, 1.97, 1.96, 1.97, 1.99

Assuming that the lengths are normally distributed, test, at the 1% level of significance, whether the machine is in good working order.

3. An athlete finds that his times for running the 100 m race follow a normal distribution with mean 10.6 seconds. He trains intensively for a week and then runs 100 m on each of 5 consecutive days. His times (measured in seconds) were 10.7, 10.65, 10.75, 10.8, 10.6. Is there evidence, at the 5% level, that the training has improved his times?

4. ‘Family’ packs of bacon slices are sold in 1.5 kg packs. A sample of 12 packs was selected at random and the masses, measured in kg, noted. The following results were obtained: $\sum x = 17.61$, $\sum x^2 = 26.4357$.
Assuming that the masses of the packs follow a normal distribution, with variance $\sigma^2$, test at the 1% level whether the packs are significantly underweight (a) if $\sigma^2$ is unknown, (b) if $\sigma^2 = 0.0003$.

5. It is thought that a certain Normal population has a mean of 1.6. A sample of 10 gives $\bar{x} = 1.49$ and $s = 0.3$. Does this provide evidence, at the 5% level, that the population mean is less than 1.6?

6. A marmalade manufacturer produces thousands of jars of marmalade each week. The mass of marmalade in a jar is an observation from a normal distribution having mean 455 g and standard deviation 0.8 g. Determine the probability that a randomly chosen jar contains less than 454 g.
Following a slight adjustment to the filling machine, a random sample of 10 jars is found to contain the following masses (in g) of marmalade:

454.8, 453.8, 455.0, 454.4, 455.4, 454.4, 454.4, 455.0, 455.0, 453.6

(a) Assuming that the variance of the distribution is unaltered by the adjustment, test, at the 5% significance level, the hypothesis that there has been no change in the mean of the distribution.
(b) Assuming that the variance of the distribution may have altered, obtain an unbiased estimate of the new variance and, using this estimate, test, at the 5% significance level, the hypothesis that there has been no change in the mean of the distribution. (C Further Maths)

7. A random sample of 8 women yielded the following cholesterol levels:

TALR2 ST howd. 8.4 1.9.3.3 4.6

It is required to test whether the sample could be drawn from a population whose mean cholesterol level is 3.1.
(a) Assuming that the sample is drawn from a normal distribution give two reasons why a t-test is appropriate.
(b) Perform the test, stating your null and alternative hypotheses. What conclusions do you draw?
(c) Calculate a 90% symmetric confidence interval for the mean cholesterol level in the population. (SUJB)
TEST 3 — TESTING THE DIFFERENCE BETWEEN MEANS _
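Consider two unpaired, independent samples of sizes $n_1$ and $n_2$ such
that

$X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$

Then $\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$

This distribution is known as the sampling distribution of the
difference between means.
The following may be used to test whether there is a significant
difference between means.

(1) If $\sigma_1^2$, $\sigma_2^2$ are known
we use the test statistic
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
which is distributed as $N(0, 1)$.

(2) If there is a known common population variance such that
$\sigma_1^2 = \sigma_2^2 = \sigma^2$, then
$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$$
and we use the test statistic
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \quad \text{where } Z \sim N(0, 1)$$

(3) If there is an unknown common population variance $\sigma^2$ then we
use an estimate $\hat{\sigma}^2$ for it, where
$$\hat{\sigma}^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$$
where $s_1^2$, $s_2^2$ are the sample variances.

For small samples we use the test statistic
$$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \quad \text{where } T \sim t(n_1 + n_2 - 2)$$

For large samples we use the test statistic
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \quad \text{where } Z \sim N(0, 1)$$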

Example 9.13 A random sample of size 100 is taken from a normal population
with variance $\sigma_1^2 = 40$. The sample mean $\bar{x}_1$ is 38.3. Another
random sample, of size 80, is taken from a normal population with
variance $\sigma_2^2 = 30$. The sample mean $\bar{x}_2$ is 40.1. Test, at the 5% level,
whether there is a significant difference in the population means
$\mu_1$ and $\mu_2$.

Solution 9.13 Sample 1: $n_1 = 100$, $\bar{x}_1 = 38.3$, $\sigma_1^2 = 40$, population mean $= \mu_1$

Sample 2: $n_2 = 80$, $\bar{x}_2 = 40.1$, $\sigma_2^2 = 30$, population mean $= \mu_2$

$H_0$: $\mu_1 = \mu_2$ (there is no difference between the means)

$H_1$: $\mu_1 \neq \mu_2$ (there is a difference)

We consider the sampling distribution of the difference between
means where under $H_0$,
$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$

We use a two-tailed test, at the 5% level and reject $H_0$ if $|z| > 1.96$,
where

$$z = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{38.3 - 40.1 - (0)}{\sqrt{\dfrac{40}{100} + \dfrac{30}{80}}} = -2.04$$

[Sketch: standard normal curve with 2.5% in each tail, beyond $z = \pm 1.96$, marked as the rejection region.]

Conclusion: As $|z| > 1.96$, we reject $H_0$ and conclude that there is
evidence, at the 5% level, of a difference in population means.
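
As a numerical check of Example 9.13, the sketch below evaluates the test statistic for two samples with known population variances. The function name `two_sample_z` and the optional `diff0` argument (the hypothesised value of $\mu_1 - \mu_2$) are illustrative choices, not notation from the text.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, var1, n1, mean2, var2, n2, diff0=0.0):
    """z statistic for H0: mu1 - mu2 = diff0 with known variances."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2 - diff0) / standard_error


# Data of Example 9.13
z = two_sample_z(38.3, 40, 100, 40.1, 30, 80)

# Two-tailed test at the 5% level: 2.5% in each tail
crit = NormalDist().inv_cdf(0.975)  # about 1.96
reject = abs(z) > crit

print(f"z = {z:.2f}, critical value = {crit:.2f}, reject H0: {reject}")
```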

Example 9.14 The same test was given to a group of 100 scouts and to a group of
144 guides. The mean score for the scouts was 27.53 and the mean
score for the guides was 26.81. Assuming a common population
standard deviation of 3.48, test, using a 5% level of significance,
whether the scouts’ performance in the test was better than that of
the guides. Assume that the scores are normally distributed.

Solution 9.14 Let $X$ be the r.v. ‘a scout’s score’.

Scouts: $\bar{x} = 27.53$, $n_1 = 100$, population mean $= \mu_1$

Let $Y$ be the r.v. ‘a guide’s score’.

Guides: $\bar{y} = 26.81$, $n_2 = 144$, population mean $= \mu_2$

Common population standard deviation $\sigma = 3.48$.

$H_0$: $\mu_1 = \mu_2$ (there is no difference in the performances)

$H_1$: $\mu_1 > \mu_2$ (the performance of the scouts was better)

Consider the sampling distribution of the difference between means,
where under $H_0$,
$$\bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2, \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$$

We use a one-tailed test, at the 5% level, and reject $H_0$ if $z > 1.645$,
where

$$z = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{27.53 - 26.81 - (0)}{3.48\sqrt{\dfrac{1}{100} + \dfrac{1}{144}}} = 1.589$$

[Sketch: standard normal curve with the upper 5% tail, beyond $z = 1.645$, marked as the rejection region.]

Conclusion: As $z < 1.645$, we do not reject $H_0$ and conclude that
there is not sufficient evidence, at the 5% level, to show that the
performance of the scouts in the test was better than that of the
guides.

Example 9.15 A certain political group maintains that girls reach a higher standard
in single-sex classes than in mixed classes. To test this hypothesis
140 girls of similar ability are split into two groups, with 68
attending classes containing only girls and 72 attending classes with
boys. All the classes follow the same syllabus and after a specified
time the girls are given a test. The test results are summarised thus:

Girls in the mixed classes: $\sum x = 7920$, $\sum x^2 = 879\,912$

Girls in single-sex classes: $\sum y = 7820$, $\sum y^2 = 904\,808$

Treating both samples as large samples from normal distributions


having the same variance, obtain a 2-sample pooled estimate of the
common population variance. Test whether the results provide
significant evidence, at the 1% level, that girls reach a higher stan-
dard in single-sex classes.

Solution 9.15 Mixed classes:

Let $X$ be the r.v. ‘score of a girl in a mixed class’.

$\sum x = 7920$, $\sum x^2 = 879\,912$, $n_1 = 72$, population mean $\mu_1$

Therefore
$$\bar{x} = \frac{\sum x}{n_1} = \frac{7920}{72} = 110 \quad \text{and} \quad s_1^2 = \frac{\sum x^2}{n_1} - \bar{x}^2 = \frac{879\,912}{72} - 110^2 = 121$$

Single-sex classes:

Let $Y$ be the r.v. ‘the score of a girl in a single-sex class’.

$\sum y = 7820$, $\sum y^2 = 904\,808$, $n_2 = 68$, population mean $\mu_2$

Therefore
$$\bar{y} = \frac{\sum y}{n_2} = \frac{7820}{68} = 115 \quad \text{and} \quad s_2^2 = \frac{\sum y^2}{n_2} - \bar{y}^2 = \frac{904\,808}{68} - 115^2 = 81$$

The pooled 2-sample estimate of the common population variance
is given by $\hat{\sigma}^2$, where

$$\hat{\sigma}^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \frac{72(121) + 68(81)}{72 + 68 - 2} = 103.04$$

Therefore $\hat{\sigma}^2 = 103.04$ and $\hat{\sigma} = 10.15$ (2 d.p.).

Significance test:

$H_0$: $\mu_1 = \mu_2$ (there is no difference in the test scores)

$H_1$: $\mu_1 < \mu_2$ (the girls in the single-sex classes reach a higher standard)

We consider the sampling distribution of the difference between
means, where under $H_0$
$$\bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2, \hat{\sigma}^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$$

We use a one-tailed test at the 1% level, and reject $H_0$ if $z < -2.326$,

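where
$$z = \frac{\bar{x} - \bar{y} - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{110 - 115 - (0)}{10.15\sqrt{\dfrac{1}{72} + \dfrac{1}{68}}} = -2.91$$

[Sketch: standard normal curve with the lower 1% tail, below $z = -2.326$, marked as the rejection region.]

Conclusion: As $z < -2.326$ we reject $H_0$ and conclude that there is
evidence at the 1% level to suggest that girls in single-sex classes
reach a higher standard.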
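
The pooled estimate in Example 9.15 is built from the summary totals $\sum x$ and $\sum x^2$ of each sample. A minimal sketch of that route follows; the helper names `mean_and_variance` and `pooled_variance` are illustrative, not taken from the text.

```python
from math import sqrt
from statistics import NormalDist


def mean_and_variance(sum_x, sum_x2, n):
    """Sample mean and sample variance (divisor n) from summary totals."""
    mean = sum_x / n
    return mean, sum_x2 / n - mean ** 2


def pooled_variance(n1, s1_sq, n2, s2_sq):
    """Two-sample pooled estimate (n1*s1^2 + n2*s2^2) / (n1 + n2 - 2)."""
    return (n1 * s1_sq + n2 * s2_sq) / (n1 + n2 - 2)


n1, n2 = 72, 68
x_bar, s1_sq = mean_and_variance(7920, 879912, n1)  # mixed classes
y_bar, s2_sq = mean_and_variance(7820, 904808, n2)  # single-sex classes

sigma_hat_sq = pooled_variance(n1, s1_sq, n2, s2_sq)
z = (x_bar - y_bar) / sqrt(sigma_hat_sq * (1 / n1 + 1 / n2))

crit = NormalDist().inv_cdf(0.01)  # lower 1% point, about -2.326
print(f"pooled variance = {sigma_hat_sq:.2f}, z = {z:.2f}, reject H0: {z < crit}")
```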

Example 9.16 Two statistics teachers, Mr Chalk and Mr Talk, argue about their
abilities at golf. Mr Chalk claims that with a number 7 iron he can
hit the ball, on average, at least 10 m further than Mr Talk. Denoting
the distance Mr Chalk hits the ball by $(100 + c)$ metres, the following
results were obtained: $n_1 = 40$, $\sum c = 80$, $\sum (c - \bar{c})^2 = 1132$.
Denoting the distance Mr Talk hits the ball by $(100 + t)$ metres, the
following results were obtained: $n_2 = 35$, $\sum t = -175$,
$\sum (t - \bar{t})^2 = 1197$.
If the distances for both teachers are normally distributed with a
common variance, show that an unbiased estimate of this common
variance is 31.90.
Test whether there is any evidence, at the 1% level, to support Mr
Chalk’s claim.

Solution 9.16 Mr Chalk:

Let $X_1$ be the r.v. ‘the distance, in m, for Mr Chalk’.

Distance $= (100 + c)$ metres

$\sum c = 80$, $\sum (c - \bar{c})^2 = 1132$, $n_1 = 40$,
sample mean $\bar{x}_1$, population mean $\mu_1$

Now $\bar{x}_1 = 100 + \bar{c} = 100 + \dfrac{80}{40} = 102$ m

Mr Talk:

Let $X_2$ be the r.v. ‘the distance, in m, for Mr Talk’.

Distance $= (100 + t)$ metres

$\sum t = -175$, $\sum (t - \bar{t})^2 = 1197$, $n_2 = 35$,
sample mean $\bar{x}_2$, population mean $\mu_2$

Now $\bar{x}_2 = 100 + \bar{t} = 100 + \dfrac{-175}{35} = 95$ m

The unbiased estimate of the population variance is $\hat{\sigma}^2$ where

$$\hat{\sigma}^2 = \frac{\sum (c - \bar{c})^2 + \sum (t - \bar{t})^2}{n_1 + n_2 - 2} = \frac{1132 + 1197}{40 + 35 - 2} = \frac{2329}{73} = 31.90$$

The unbiased estimate of the common population variance is 31.90.

Mr Chalk claims that he can hit the ball at least 10 m further than
Mr Talk. Therefore the alternative hypothesis is that Mr Chalk hits
the ball less than 10 m further than Mr Talk, and a one-tailed test is performed.

$H_0$: $\mu_1 - \mu_2 = 10$ (Mr Chalk hits the ball 10 m further than Mr Talk)

$H_1$: $\mu_1 - \mu_2 < 10$ (Mr Chalk hits the ball less than 10 m further than Mr Talk)

We consider the sampling distribution of the difference between
means where under $H_0$,

$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \hat{\sigma}^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$$

We perform a one-tailed test, at the 1% level, and reject $H_0$ if
$z < -2.326$ where

$$z = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{102 - 95 - (10)}{5.648\sqrt{\dfrac{1}{40} + \dfrac{1}{35}}} = -2.29 \text{ (2 d.p.)}$$

[Sketch: standard normal curve with the lower 1% tail, below $z = -2.326$, marked as the rejection region.]

Conclusion: As $z > -2.326$, we do not reject $H_0$ and conclude that
there is not sufficient evidence, at the 1% level, to reject Mr Chalk’s
claim that he hits the ball, on average, at least 10 m further than
Mr Talk.
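
Example 9.16 differs from the earlier two-sample tests in that the null hypothesis specifies a non-zero difference, $\mu_1 - \mu_2 = 10$. A minimal sketch of the arithmetic is given below; the variable names (`ss_chalk`, `diff0` and so on) are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Summary data of Example 9.16
n1, mean1, ss_chalk = 40, 100 + 80 / 40, 1132    # Mr Chalk
n2, mean2, ss_talk = 35, 100 + (-175) / 35, 1197  # Mr Talk
diff0 = 10  # hypothesised value of mu1 - mu2 under H0

# Unbiased pooled estimate from the sums of squared deviations
sigma_hat_sq = (ss_chalk + ss_talk) / (n1 + n2 - 2)

z = (mean1 - mean2 - diff0) / sqrt(sigma_hat_sq * (1 / n1 + 1 / n2))

crit = NormalDist().inv_cdf(0.01)  # lower 1% point, about -2.326
print(f"pooled variance = {sigma_hat_sq:.2f}, z = {z:.2f}, reject H0: {z < crit}")
```

The statistic falls just inside the acceptance region, so the outcome is sensitive to the significance level chosen.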

Example 9.17 The heights (measured to the nearest cm) of a random sample of six
policemen from a certain force in Wales were found to be

176, 180, 179, 181, 183, 179

The heights (measured to the nearest cm) of a random sample of


eleven policemen from a certain force in Scotland gave the following
data:

$\sum y = 1991$, $\sum (y - \bar{y})^2 = 54$

Test at the 5% level, the hypothesis that Welsh policemen are shorter
than Scottish policemen. Assume that the heights of policemen in
both forces are normally distributed and have a common population
variance.

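Solution 9.17 Welsh policemen:

Let $X$ be the r.v. ‘the height, in cm, of a Welsh policeman’.

$\sum x = 1078$, $\sum x^2 = 193\,708$, $n_1 = 6$, population mean $= \mu_1$

So $\bar{x} = \dfrac{\sum x}{n_1} = \dfrac{1078}{6} = 179.67$ cm (2 d.p.)

$$s_1^2 = \frac{\sum x^2}{n_1} - \left(\frac{\sum x}{n_1}\right)^2 = \frac{193\,708}{6} - \left(\frac{1078}{6}\right)^2 = 4.556 \text{ (3 d.p.)}$$

Scottish policemen:

Let $Y$ be the r.v. ‘the height, in cm, of a Scottish policeman’.

$\sum y = 1991$, $\sum (y - \bar{y})^2 = 54$, $n_2 = 11$, population mean $= \mu_2$

So $\bar{y} = \dfrac{\sum y}{n_2} = \dfrac{1991}{11} = 181$ cm

$n_2 s_2^2 = \sum (y - \bar{y})^2 = 54$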
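Let the unbiased estimate of the common population variance be $\hat{\sigma}^2$.

We have

$$\hat{\sigma}^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \frac{6(4.556) + 54}{6 + 11 - 2} = 5.422 \text{ (3 d.p.)}$$

so $\hat{\sigma} = 2.329$ cm (3 d.p.)

Significance test:

$H_0$: $\mu_1 = \mu_2$ (there is no difference in the mean heights)

$H_1$: $\mu_1 < \mu_2$ (the Welsh policemen are shorter)

The test statistic is
$$T = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
where under $H_0$, $T \sim t(n_1 + n_2 - 2)$.

Now $n_1 + n_2 - 2 = 15$, so $T \sim t(15)$.

We use a one-tailed test at the 5% level.

The critical value for $t$ is found from row $\nu = 15$, column $Q = 5\%$,
giving $t = 1.753$; as the rejection region is in the lower tail, the critical value is $-1.753$.

So we reject $H_0$ if $t_{\text{test}} < -1.753$, where

$$t_{\text{test}} = \frac{\bar{x} - \bar{y} - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{(179.67 - 181) - (0)}{2.329\sqrt{\dfrac{1}{6} + \dfrac{1}{11}}} = -1.13 \text{ (2 d.p.)}$$

[Sketch: $t(15)$ curve with the lower 5% tail, below $t = -1.753$, marked as the rejection region.]

Conclusion: As $t_{\text{test}} > -1.753$, we do not reject $H_0$ and conclude
that there is not sufficient evidence, at the 5% level, to suggest that
Welsh policemen are shorter than Scottish policemen.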
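
The two-sample $t$ statistic of Example 9.17 can be checked as follows. This is a sketch using the standard library only; one sample is given as raw data and the other as summary totals, as in the example, and the critical value is taken from the $t$ tables rather than computed. The variable names are illustrative.

```python
from math import sqrt

# Welsh sample: raw heights in cm
welsh = [176, 180, 179, 181, 183, 179]
n1 = len(welsh)
mean1 = sum(welsh) / n1
ss1 = sum((x - mean1) ** 2 for x in welsh)  # equals n1 * s1^2

# Scottish sample: summary totals
n2 = 11
mean2 = 1991 / n2
ss2 = 54                                     # equals n2 * s2^2

# Unbiased pooled estimate of the common variance
sigma_hat_sq = (ss1 + ss2) / (n1 + n2 - 2)

t = (mean1 - mean2) / sqrt(sigma_hat_sq * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

T_CRIT = -1.753  # from t tables: nu = 15, lower 5% point
reject = t < T_CRIT

print(f"pooled variance = {sigma_hat_sq:.3f}, t = {t:.2f} on {df} d.f., reject H0: {reject}")
```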

Exercise 9e

1. For each of the following sets of data, perform a test to decide whether there is a significant difference between the means, $\mu_1$ and $\mu_2$, of the normal populations from which the samples are drawn.

[Table of data for the parts of this question, including a column headed ‘Common population standard deviation ($\sigma$)’; the entries are not recoverable.]

2. An investigation was carried out to assess the effects of adding certain vitamins to the diet. 64 two-week old rats were given a vitamin supplement in their diet for a period of one month, after which time their masses were noted. A control group of 36 rats of the same age were fed on an ordinary diet and their masses were also noted after one month. The results are summarised in the table.

[Table with rows ‘Rats with vitamin supplement’ and ‘Rats without vitamin supplement’ and a column ‘Number in sample’; the remaining entries are not recoverable.]

Treating the samples as large samples from normal distributions with the same variance, test whether the results provide evidence, at the 5% level, that rats given the vitamin supplement have a greater mass, at age six weeks, than those not given the vitamin supplement.

3. (a) In one county in England, a random sample of 225 12-year-old boys and 250 12-year-old girls was given an arithmetic test. The average mark for the boys was

57 with a standard deviation of 12, whilst the average for the girls was 60 with a standard deviation of 15.
Assuming that the distributions are normal, does this provide evidence at the 2% level that 12-year-old girls are superior to 12-year-old boys at arithmetic?
(b) An IQ test which had been standardised giving a mean of 100 and a standard deviation of 12 was given to a random sample of 50 children in one area. The average mark obtained was 105.
Does this provide evidence, at the 5% level, that children from this area are generally more intelligent? (SUJB)

4. The mean height of 50 male students of a college who took an active part in athletic activities was 178 cm with a standard deviation of 5 cm, while 50 male students who showed no interest in such activities had a mean height of 176 cm with a standard deviation of 7 cm. Test the hypothesis that male students who take an active part in athletic activities have the same mean height as the other male students.
If both samples had been of size $n$, instead of 50, find the least value of $n$ which would ensure that the observed difference of 2 cm in the mean height would be significant at the 1% level. (Assume that the samples continue to have the same means and standard deviations.) (C)

5. Mr Mean notes the time, in minutes, that it takes him to drive to work in the mornings. The results are $n_1 = 8$, $\sum x_1 = 120$, $\sum x_1^2 = 1827$.
(a) Show that the value of the unbiased estimate of the population variance obtained from this sample is 10.29.
(b) Assuming that the times are normally distributed, find a 98% confidence interval for the average journey time, and explain what it means.
For his return journey in the rush hour, Mr Mean notes that for $n_2 = 10$, $\sum x_2 = 230$, $\sum x_2^2 = 5436$.
He maintains that, on average, it takes him at least 10 minutes longer to drive home.
(c) Using the results from the two samples, find an unbiased estimate of the common population variance.
(d) Assuming that the times of all journeys are normally distributed, use the two-sample t-test at the 5% level to test Mr Mean’s claim.

6. Random samples of fourth-year pupils at two schools are given the same mathematics test. The results are summarised thus:
School A: $n_1 = 20$, $\bar{x} = 43$, $\sum (x - \bar{x})^2 = 1296$
School B: $n_2 = 17$, $\bar{y} = 86$, $\sum (y - \bar{y})^2 = 1388$
Assuming that the distributions of marks are normal with a common population variance, and treating the samples as large, test at the 2% level whether there is a significant difference in the mathematical ability of the fourth-year pupils at the two schools.

7. A random sample of 27 individuals from the population of young men aged 18 and of high intelligence have foot lengths (in cm, to the nearest cm) as summarised below.

Foot length (in cm): 24 25 26 27 28 29 30
Number with this foot length: 1 (2: Che p? |Sage |

Obtain the sample mean and show that the unbiased estimate of the population variance, based on this sample, is 2.00. Obtain a 96% confidence interval for the mean foot length of this type of person.
A random sample of 48 individuals from the population of young men aged 18 and of moderate intelligence have foot lengths summarised by $\bar{x} = 26.6$, $\sum (x - \bar{x})^2 = 123.20$. A complex genetic theory suggests that persons of high intelligence have a greater foot length than do those of moderate intelligence. The two samples described above may be assumed to have been drawn at random from independent normal distributions having a common variance. Obtain an unbiased two-sample estimate of this common variance. Treating the samples as large samples, test this genetic theory, using a significance test at the 1% significance level and stating clearly the hypotheses under comparison. (C)

8. If the mean of $n$ numbers, $x_1, x_2, \ldots, x_n$, is $\bar{x}$, prove that their variance is
$$\frac{1}{n}\sum_{i=1}^{n} (x_i - m)^2 - (\bar{x} - m)^2$$
for any constant $m$.
A garage wished to estimate the average time spent in servicing and repairing

cars during a certain month. A sample of 100 cars yielded: $\sum x_i = 325.5$, $\sum x_i^2 = 1076$, $x_i$ being the time spent, in hours, on the $i$th car.
Assuming that the measurements are from a normal population, give 95% confidence limits for the population mean.
Could the restriction of the population being normal be dropped? A sample of 25 cars from the following month yielded a mean repair time of 3.55 hours. Is this evidence of an increase in population mean over the previous month? (SUJB)

9. Mr Brown and Mr Green work at the same office and live next door to each other. Each day they leave for work together but travel by different routes. Mr Brown maintains that his route is quicker, on average, by at least 4 minutes. Both men time their journeys in minutes over a period of 10 weeks. The results obtained were:
Mr Brown: $n_1 = 50$, $\bar{x}_1 = 21$, $s_1^2 =$ [illegible]
Mr Green: $n_2 = 50$, $\bar{x}_2 = 24$, $s_2^2 = 7.84$
Assuming that the times are normally distributed and that they have a common population variance, test at the 5% level whether Mr Brown’s claim can be accepted.

10. Hischi and Taschi are two makes of video tapes. They are both advertised as having a recording time of 3 hours. A sample of 49 Hischi tapes was tested and, denoting the actual recording time by $(180 + h)$ minutes, the following results were obtained:
$\sum h = 147$, $\sum (h - \bar{h})^2 = 12\,720$
A sample of 81 Taschi tapes was also tested. Denoting the actual recording time by $(180 + t)$ minutes, the results obtained were
$\sum t = 324$, $\sum (t - \bar{t})^2 = 33\,488$
If the recording times for the two makes are normally distributed and have a common variance, show that the unbiased estimate of this common variance is 361. Test whether there is significant evidence, at the 5% level, of a difference in the mean recording times. Is the difference significant at the 4% level?

11. The lengths (in millimetres) of nine screws selected at random from a large consignment are found to be 7.99, 8.01, 8.00, 8.02, 8.03, 7.99, 8.00, 8.01, 8.01. Calculate unbiased estimates of the population mean and variance. Assuming a normal distribution with variance 0.0001, test, at the 5% level, the hypothesis that the population mean is 8.00 against the alternative hypothesis that the population mean is not 8.00.
From a second large consignment, sixteen screws are selected at random and their mean length (in millimetres) is found to be 7.992. Assuming a normal distribution with variance 0.0001, test, at the 5% level, the hypothesis that this population has the same mean as the first population, against the alternative hypothesis that this population has a smaller mean than the first population. (C)

12. A random sample of size $n_1$ is taken from a population $P_1$ whose mean is $\mu_1$ and variance $\sigma_1^2$ and a random sample of size $n_2$ is taken from population $P_2$ with mean $\mu_2$ and variance $\sigma_2^2$. Under what circumstances is it valid to test the hypothesis $\mu_1 - \mu_2 = 0$ using a two-sample t-test?
A machine fills bags of sugar and a random sample of 20 bags selected from a week’s production yielded a mean weight of 499.1 g with standard deviation 0.63 g. A week later a sample of 25 bags yielded a mean weight of 500.2 g with standard deviation 0.48 g. Assuming that your stated conditions are satisfied perform a test to determine whether the mean has increased significantly during the second week. Test whether the mean during the second week could be 500 g. (Use a 5% significance level for both tests.) (SUJB)

13. A large number of tomato plants are grown under controlled conditions. Half of the plants, chosen at random, are treated with a new fertilizer, and the other half of the plants are treated with a standard fertilizer. Random samples of 100 plants are selected from each half, and records are kept of the total crop mass of each plant. For those treated with the new fertilizer, the crop masses (in suitable units) are summarized by the figures $\sum x = 1030.0$, $\sum x^2 = 11\,045.59$. Obtain an unbiased estimate of the population variance, and, treating the sample as a large sample from a normal distribution, obtain a symmetric 96% confidence interval for the mean crop mass.
The corresponding figures for those plants treated with the standard fertilizer are $\sum y = 990.0$, $\sum y^2 = 10\,079.19$. Treating the sample as a large sample from a normal distribution, and assuming that

the population variances of both distributions are equal, obtain a two-sample pooled estimate of the common population variance.
Assuming that it is impossible for the new fertilizer to be less efficacious than the old fertilizer and assuming that both distributions are normal, test whether the results provide significant evidence (at the 3% level) that the new fertilizer is associated with a greater mean crop mass, stating clearly your null and alternative hypotheses. (C)

14. From a large population of students, 120 males and 160 females are chosen at random. Their heights $x$ in metres are summarised in the table below. The males and females may be treated as random samples from two independent populations.

          Sample size   $\sum x$   $\sum x^2$
Males         120         198        327
Females       160         248        385

(a) Find the sample means and variances.
(b) Assuming that in both populations the heights are normally distributed with these means and variances, find the probability that a randomly-chosen female will be taller than a randomly-chosen male.
(c) Assuming only that height is distributed normally in both populations, test the hypothesis that the mean height of the population of male students exceeds the mean height of the population of female students by less than 0.08 metres. (O&C)

15. A large group of sunflowers is growing in the shady side of a garden. A random sample of 36 of these sunflowers is measured. The sample mean height is found to be 2.86 m, and the sample standard deviation is found to be 0.60 m. Treating the sample as a large sample and assuming the heights to be normally distributed, give a symmetric 99% confidence interval for the mean height of the sunflowers in the shady side of the garden.
A second group of sunflowers is growing in the sunny side of the garden. A random sample of 26 of these sunflowers is measured. The sample mean height is found to be 3.29 m and the sample standard deviation is found to be 0.90 m. Treating the samples as large samples from normal distributions having the same variance but possibly different means, obtain a pooled estimate of the variance and test whether the results provide significant evidence (at the 5% level) that the sunny-side sunflowers grow taller, on average, than the shady-side sunflowers. (C)

TEST 4 — TESTING A PROPORTION

We may wish to test whether a random sample of size n, with proportion of ‘successes’ $p_s$, could have been drawn from a population with proportion of ‘successes’ p.

The sampling distribution of proportions gives

$$P_s \sim N\left(p, \frac{pq}{n}\right) \quad \text{where } q = 1 - p \text{ and } n \text{ is large}$$

(see p. 407)

The test statistic used is

$$Z = \frac{P_s - p}{\sqrt{\dfrac{pq}{n}}}$$

which is distributed as N(0, 1) under the null hypothesis $H_0$ that the proportion of ‘successes’ in the population is p.
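The test statistic above is easy to evaluate by machine. The following is a minimal sketch, not part of the original text: the function name `proportion_z` and the data (45 successes in 100 trials, tested against p = 0.5) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(successes, n, p0):
    """z statistic for H0: population proportion equals p0 (n assumed large)."""
    p_s = successes / n                      # sample proportion
    q0 = 1 - p0
    return (p_s - p0) / sqrt(p0 * q0 / n)    # (p_s - p) / sqrt(pq/n)


# Hypothetical data: 45 successes in 100 trials, H0: p = 0.5
z = proportion_z(45, 100, 0.5)
lower_tail_p = NormalDist().cdf(z)           # P(Z <= z) under H0
print(round(z, 3), round(lower_tail_p, 4))
```

For a one-tailed test at the 5% level with alternative p < 0.5, the value of z would be compared with the critical value -1.645.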

Example 9.18 The manufacturer of ‘Chummy Morsels’ claims that 8 out of 10 dogs choose his product rather than that produced by a rival firm. In a random sample of 200 dogs, 152 chose ‘Chummy Morsels’, and the rest chose the rival brand. Comment on the manufacturer’s claim.

Solution 9.18 From the sample: $p_s = \frac{152}{200} = 0.76$, $n = 200$

Let p be the population proportion of dogs who prefer ‘Chummy Morsels’.

$H_0$: p = 0.8 (80% of dogs prefer ‘Chummy Morsels’ and the manufacturer’s claim is correct)
$H_1$: p < 0.8 (less than 80% of dogs prefer ‘Chummy Morsels’ and the manufacturer’s claim is not correct)

Consider the sampling distribution of proportions, where under $H_0$,

$$P_s \sim N\left(p, \frac{pq}{n}\right) \quad \text{where } q = 1 - p$$

Use a one-tailed test at the 5% level. We will reject $H_0$ if z < -1.645, where

$$z = \frac{p_s - p}{\sqrt{\dfrac{pq}{n}}} = \frac{0.76 - 0.80}{\sqrt{\dfrac{(0.8)(0.2)}{200}}} = -1.414$$

[Diagram: N(0, 1) curve of standard values with the lower 5% tail, z < -1.645, marked as the rejection region for $H_0$.]

Conclusion: As z > -1.645, we do not reject $H_0$ and conclude that there is not sufficient evidence, at the 5% level, to refute the manufacturer’s claim.

NOTE: as we are dealing with proportions, we should use the continuity correction of $\pm\frac{1}{2n}$. However, if n is large, this makes very little difference to the calculation of z.

With the continuity correction,

$$z = \frac{0.76 - \frac{1}{400} - 0.8}{\sqrt{\dfrac{(0.8)(0.2)}{200}}} = -1.503$$

and the conclusion is the same as before.


NOTE: an alternative approach to this type of problem was introduced in Test 1, on p. 463. The method is shown again below.

Alternative Solution 9.18 Let X be the r.v. ‘the number of dogs who prefer Chummy Morsels’. Then X ~ Bin(n, p).

$H_0$: p = 0.8 (80% prefer Chummy Morsels)
$H_1$: p < 0.8 (less than 80% prefer Chummy Morsels)

Under $H_0$, X ~ Bin(n, p) with n = 200 and p = 0.8.

Now, using the normal approximation to the binomial distribution,

$$X \sim N(np, npq) \quad \text{where } q = 1 - p$$

with np = (200)(0.8) = 160 and npq = (200)(0.8)(0.2) = 32.

We use a one-tailed test, at the 5% level, and reject $H_0$ if z < -1.645, where

$$z = \frac{x - np}{\sqrt{npq}} = \frac{152 - 160}{\sqrt{32}} = -1.414$$

[Diagram: N(0, 1) curve with s.d. $\sqrt{32}$ for X, and the lower 5% tail, z < -1.645, marked as the rejection region for $H_0$.]

Conclusion: As z > -1.645, we do not reject $H_0$ and conclude that there is not sufficient evidence at the 5% level to refute the manufacturer’s claim.

If the continuity correction is used,

$$z = \frac{151.5 - 160}{\sqrt{32}} = -1.503$$

and the conclusion is the same.

Problems of this type can be tackled by either method. The calculations performed correspond exactly.
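The correspondence between the two methods can be checked numerically. This sketch uses the figures of Example 9.18; the variable names are illustrative assumptions, and the corrected values simply reproduce the arithmetic printed in the solutions (152 replaced by 151.5).

```python
from math import sqrt

n, p0, x = 200, 0.8, 152      # sample size, hypothesised proportion, observed count
q0 = 1 - p0

# Method 1: sampling distribution of proportions
z_prop = (x / n - p0) / sqrt(p0 * q0 / n)

# Method 2: normal approximation to the binomial count
z_count = (x - n * p0) / sqrt(n * p0 * q0)

# Continuity-corrected versions, following the arithmetic in the text
z_prop_cc = (x / n - 1 / (2 * n) - p0) / sqrt(p0 * q0 / n)
z_count_cc = (x - 0.5 - n * p0) / sqrt(n * p0 * q0)

print(round(z_prop, 3), round(z_count, 3))
print(round(z_prop_cc, 3), round(z_count_cc, 3))
```

Dividing the numerator and denominator of the count form by n gives the proportion form, which is why the pairs of values agree.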

Example 9.19 A large college claims that it admits equal numbers of men and women. A random sample of 500 students at the college gave 267 males. Is there any evidence, at the 5% level, that the college population is not evenly divided into males and females?

Solution 9.19 From the sample: $p_s = \frac{267}{500} = 0.534$, $n = 500$

Let p be the proportion of males in the population.

$H_0$: p = 0.5 (there are equal numbers of males and females)
$H_1$: p ≠ 0.5 (the college population is not evenly divided into males and females)

Consider the sampling distribution of proportions, where

$$P_s \sim N\left(p, \frac{pq}{n}\right) \quad \text{where } q = 1 - p$$

We use a two-tailed test, at the 5% level, and reject $H_0$ if |z| > 1.96, where

$$z = \frac{p_s - p}{\sqrt{\dfrac{pq}{n}}} = \frac{0.534 - 0.5}{\sqrt{\dfrac{(0.5)(0.5)}{500}}} = 1.52$$

[Diagram: N(0, 1) curve with 2.5% in each tail; $H_0$ is rejected for z < -1.96 or z > 1.96.]

Conclusion: As z < 1.96, we do not reject $H_0$ and conclude that, at the 5% level, there is not sufficient evidence to refute the claim that the population is evenly divided into males and females.
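As a rough check on Solution 9.19, the two-tailed test can be sketched as follows. The variable names are illustrative, and the two-sided p-value is an addition that the text itself does not compute.

```python
from math import sqrt
from statistics import NormalDist

n, males, p0 = 500, 267, 0.5

z = (males / n - p0) / sqrt(p0 * (1 - p0) / n)

# Two-tailed test at the 5% level: 2.5% in each tail
critical = NormalDist().inv_cdf(0.975)           # about 1.96
reject_h0 = abs(z) > critical

# Two-sided p-value (not part of the original solution)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 2), round(critical, 2), reject_h0)
```

Because the p-value exceeds 0.05, the decision agrees with comparing |z| with 1.96.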

Exercise 9f

1. For each of the following sets of data, carry out a significance test for the hypotheses stated.

       Sample size   Successes   Hypotheses
(a)        50           45       $H_0$: p = 0.8,  $H_1$: p > 0.8
(b)        60           42       $H_0$: p = 0.55, $H_1$: p ≠ 0.55
(c)       120           21       $H_0$: p = 1/4,  $H_1$: p ≠ 1/4
(d)       300          213       $H_0$: p = 0.65, $H_1$: p ≠ 0.65
(e)        90           56       $H_0$: p = 0.76, $H_1$: p < 0.76

2. A theory predicts that the probability of an event is 0.4. The theory is tested experimentally and in 400 independent trials the event occurred 140 times. Is the number of occurrences significantly less than that predicted by the theory? Test at the 1% level.

3. It is thought that the proportion of defective items produced by a particular machine is 0.1. A random sample of 100 items is inspected and found to contain 15 defective items. Does this provide evidence, at the 5% level, that the machine is producing more defective items than expected?

4. A coin is tossed 100 times and 38 heads are obtained. Is there evidence, at the 2% level, that the coin is biased in favour of tails?

5. A government report states that one-third of teenagers in Great Britain belong to a youth organisation. A survey, conducted among a random sample of 1000 teenagers from a certain city, revealed that 370 belonged to a youth organisation. Does this provide significant evidence, at the 2% level, that the proportion of teenagers who belong to a youth organisation is greater in this city than the national average?
Based on the results of this sample, calculate a 95% confidence interval for the proportion of teenagers in this city who belong to a youth organisation.

6. The probability that an oyster larva will develop in unpolluted water is 0.9, while in polluted water this probability is less than 0.9. Given that 20 oyster larvae are placed in unpolluted water, find the probabilities, each to two decimal places, that the number that will develop is
(a) at least 17,
(b) exactly 17.
An oyster breeder put 20 larvae in a sample of water and observed that only 16 of them developed. Use a 10% significance level to determine whether the breeder would be justified in concluding that the water is polluted. (JMB)

7. A fruit farm grows ‘Golden Delicious’ apples, and it can be assumed that the distribution of the masses of the apples is described by a normal probability function. The apples are graded by mass (x g) into three grades: ‘small’ for x < 80, ‘medium’ for 80 < x < 100, ‘large’ for x > 100. In 1979, 20% of the apples were graded as ‘small’ and 54% as ‘medium’. Estimate, to one decimal place, the mean and standard deviation of the masses of the apples produced on the farm in that year. Estimate also what proportion of the apples had masses exceeding 105 g.
When he begins to harvest his 1980 crop the grower picks out a sample of 100 apples at random and finds that only 9 are ‘small’. Find, on the hypothesis that the proportion of ‘small’ apples in the whole crop is the same as in 1979, the probability of getting 9 or fewer ‘small’ apples in such a sample. Would he be justified in concluding, on the evidence of this sample, that there has been a reduction (for 1980 as compared with 1979) in the percentage of ‘small’ apples in the crop as a whole? (SMP)

8. A factory produces large numbers of sweets in a variety of colours. Automatic machines select the sweets at random and pack them in boxes of 20. A random sample of 100 boxes was chosen, the contents of each box examined and the number of black sweets in each box recorded. The results obtained are summarised in the following table.

No. of black sweets    0    1    2    3    4    5    6 or more
No. of boxes           [frequencies illegible in the source; the entry for ‘6 or more’ is 0]

(a) Find an unbiased estimate for the proportion p of sweets produced which are black, and, to three significant figures, an estimate of its standard error.
(b) Using a distributional approximation and a 5 per cent significance level, test the null hypothesis p = 0.1 against the alternative hypothesis p ≠ 0.1. State your conclusion.
(c) Given that p = 0.1, use tables to find, to the nearest integer, the expected frequencies corresponding to the observed frequencies tabulated above. (JMB)

9. In a public opinion poll, 1000 randomly chosen electors were asked whether they would vote for the ‘Purple Party’ at the next election and 357 replied ‘Yes’. Find a 95% confidence interval for the proportion p of the population who would answer ‘Yes’ to the same question.
Twenty similar polls are taken and the 95% confidence interval is determined for each poll. State the expected number of these intervals which will enclose the true value of p.
The leader of the ‘Purple Party’ believes that the true value of p is 0.4. Test, at the 8% level, whether he is overestimating his support. (C)

10. In an investigation into ownership of calculators, 200 randomly chosen school students were interviewed, and 143 of them owned a calculator. Using the evidence of this sample, test, at the 5% level of significance, the hypothesis that the proportion of school students owning a calculator is 75% against the alternative hypothesis that the proportion is less than 75%. (C)

TEST 5 — TESTING THE DIFFERENCE BETWEEN PROPORTIONS


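Consider two random samples, sizes $n_1$ and $n_2$, with proportion of ‘successes’ $p_{s_1}$ and $p_{s_2}$. If the population proportions are $p_1$ and $p_2$, and the sample sizes are large, then

$$P_{s_1} \sim N\left(p_1, \frac{p_1 q_1}{n_1}\right) \quad \text{with } q_1 = 1 - p_1$$

$$P_{s_2} \sim N\left(p_2, \frac{p_2 q_2}{n_2}\right) \quad \text{with } q_2 = 1 - p_2$$

So

$$P_{s_1} - P_{s_2} \sim N\left(p_1 - p_2, \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}\right)$$

The distribution is known as the sampling distribution of the difference between proportions.

We usually wish to test whether the samples have been drawn from populations which have a common proportion p. In this case

$$P_{s_1} - P_{s_2} \sim N\left(0, pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right) \quad \text{where } q = 1 - p$$

Case 1: If p is known, the test statistic is

$$Z = \frac{P_{s_1} - P_{s_2}}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Case 2: If p is unknown, we use an estimate $\hat{p}$ for it, where

$$\hat{p} = \frac{n_1 p_{s_1} + n_2 p_{s_2}}{n_1 + n_2}$$

and the test statistic is

$$Z = \frac{P_{s_1} - P_{s_2}}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \quad \text{where } \hat{q} = 1 - \hat{p}$$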
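Case 2 can be sketched in code as follows. The function name `pooled_two_proportion_z` and the data (60 successes out of 100 against 50 out of 100) are hypothetical and serve only to illustrate the pooled estimate and the resulting statistic.

```python
from math import sqrt


def pooled_two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 = p, with p estimated from both samples.

    x1, x2 are the numbers of successes; n1, n2 are the (large) sample sizes.
    Returns the pair (z, p_hat).
    """
    # n1*p_s1 + n2*p_s2 is just the total number of successes
    p_hat = (x1 + x2) / (n1 + n2)
    q_hat = 1 - p_hat
    se = sqrt(p_hat * q_hat * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se, p_hat


# Hypothetical data: 60 successes out of 100, and 50 successes out of 100
z, p_hat = pooled_two_proportion_z(60, 100, 50, 100)
print(round(z, 3), p_hat)
```

If p were known (Case 1), the same calculation would apply with the known p in place of the pooled estimate.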

Example 9.20 Two companies ‘Consumer Opinion’ and ‘People’s Choice’ conduct
research in a large city before an election. ‘Consumer Opinion’
finds that in a random sample of 500 people, 325 said that they
would vote for Mr A. The ‘People’s Choice’ finds that in a random
sample of 300 people, 201 said that they would vote for Mr A.
Test, at the 5% level, whether there is a significant difference
between the two proportions.

Solution 9.20 Consumer Opinion: $p_{s_1} = \frac{325}{500} = 0.65$, $n_1 = 500$
Population proportion = $p_1$

People’s Choice: $p_{s_2} = \frac{201}{300} = 0.67$, $n_2 = 300$
Population proportion = $p_2$

Let the true population proportion voting for Mr A be p.

$H_0$: $p_1 = p_2 = p$ (there is no significant difference between the proportions)
$H_1$: $p_1 \neq p_2$ (there is a significant difference between the proportions)

Consider the sampling distribution of the difference between proportions, where

$$P_{s_1} - P_{s_2} \sim N\left(0, pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$$

Now, p is unknown, so we use

$$\hat{p} = \frac{n_1 p_{s_1} + n_2 p_{s_2}}{n_1 + n_2} = \frac{325 + 201}{500 + 300} = 0.6575$$

and $\hat{q} = 1 - \hat{p} = 0.3425$

We use a two-tailed test, at the 5% level, and reject $H_0$ if |z| > 1.96, where

$$z = \frac{p_{s_1} - p_{s_2}}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.65 - 0.67}{\sqrt{(0.6575)(0.3425)\left(\dfrac{1}{500} + \dfrac{1}{300}\right)}} = -0.577$$

[Diagram: N(0, 1) curve with 2.5% in each tail; $H_0$ is rejected for z < -1.96 or z > 1.96.]

Conclusion: As |z| < 1.96, we do not reject $H_0$ and conclude that there is no evidence at the 5% level of a significant difference between the proportions obtained by the two companies.
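The arithmetic of Solution 9.20 can be reproduced with a few lines of code. This is only a sketch with illustrative variable names; the two-sided p-value at the end is an addition not found in the text.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 325, 500      # Consumer Opinion: votes for Mr A, sample size
x2, n2 = 201, 300      # People's Choice: votes for Mr A, sample size

p_hat = (x1 + x2) / (n1 + n2)            # pooled estimate of p
q_hat = 1 - p_hat
z = (x1 / n1 - x2 / n2) / sqrt(p_hat * q_hat * (1 / n1 + 1 / n2))

# Two-sided p-value (not computed in the original solution)
p_value = 2 * NormalDist().cdf(-abs(z))

print(round(p_hat, 4), round(z, 3))
```

The p-value is well above 0.05, in line with the decision not to reject the null hypothesis.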

Example 9.21 The manufacturer of Stay-White toothpaste maintains that by the addition of a certain chemical he can guarantee that (over a given period of time) people will have less dental decay if they clean their teeth regularly with Stay-White, than if they clean their teeth with ordinary toothpaste.

To test this theory, free tubes of Stay-White were given to 150 children at a school dental clinic, and at the same time, free tubes of ordinary toothpaste were given to a further 100 children at the clinic.

All the children guaranteed to clean their teeth regularly for 6 months and then return for a check-up.

At the end of 6 months it was found that 118 children using Stay-White had no dental decay, whereas 72 children using the ordinary toothpaste had no dental decay.

Dentists at the clinic believe that there is no real difference between the proportions of children who had no dental decay over the 6 month period. Use the sample results to test, at the 10% level, the dentists’ theory.

Assuming that the proportions of children with no further dental decay are the same, give an approximate 98% confidence interval for this common proportion.

Solution 9.21 Stay-White toothpaste: $p_{s_1} = \frac{118}{150} = 0.787$, $n_1 = 150$, population proportion = $p_1$

Ordinary toothpaste: $p_{s_2} = \frac{72}{100} = 0.72$, $n_2 = 100$, population proportion = $p_2$

The dentists’ theory is that the proportions are the same.

Let the common population proportion be p. Now, as we do not know p, we use

$$\hat{p} = \frac{n_1 p_{s_1} + n_2 p_{s_2}}{n_1 + n_2} = \frac{118 + 72}{250} = 0.76$$

and $\hat{q} = 1 - \hat{p} = 0.24$

$H_0$: $p_1 = p_2 = p$ (the proportions are the same)
$H_1$: $p_1 > p_2$ (the proportion with no dental decay is greater if Stay-White is used)

Consider the sampling distribution of the difference between proportions, where

$$P_{s_1} - P_{s_2} \sim N\left(0, pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$$

We use a one-tailed test at the 10% level, and reject $H_0$ if z > 1.282, where

$$z = \frac{p_{s_1} - p_{s_2}}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.787 - 0.72}{\sqrt{(0.76)(0.24)\left(\dfrac{1}{150} + \dfrac{1}{100}\right)}} = 1.21 \text{ (3 S.F.)}$$

[Diagram: N(0, 1) curve with the upper 10% tail, z > 1.282, marked as the rejection region for $H_0$.]

Conclusion: As z < 1.282 we do not reject $H_0$ and conclude that there is no evidence, at the 10% level, to suggest that people have fewer occurrences of dental decay if they use Stay-White toothpaste.

A 98% confidence interval for the common population proportion p of people who have no further dental decay after 6 months is given by

$$\hat{p} \pm 2.326\sqrt{\frac{\hat{p}\hat{q}}{N}} \quad \text{where } \hat{p} = 0.76,\ \hat{q} = 0.24, \text{ and } N = 250$$

The 98% C.I. is

$$0.76 \pm 2.326\sqrt{\frac{(0.76)(0.24)}{250}} = 0.76 \pm 0.0628$$

[Diagram: N(0, 1) curve with 1% in each tail beyond -2.326 and 2.326.]

The 98% confidence interval for p is (0.697, 0.823).
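Both parts of Solution 9.21 can be checked with a short sketch. Variable names are illustrative assumptions, and the critical value is taken from the standard normal inverse c.d.f. rather than from printed tables, so it differs from 2.326 only beyond the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 118, 150      # Stay-White: children with no decay, sample size
x2, n2 = 72, 100       # ordinary toothpaste: children with no decay, sample size
N = n1 + n2

p_hat = (x1 + x2) / N
q_hat = 1 - p_hat

# One-tailed test of H0: p1 = p2 against H1: p1 > p2
z = (x1 / n1 - x2 / n2) / sqrt(p_hat * q_hat * (1 / n1 + 1 / n2))

# Symmetric 98% confidence interval for the common proportion
z_crit = NormalDist().inv_cdf(0.99)              # about 2.326
half_width = z_crit * sqrt(p_hat * q_hat / N)
ci = (p_hat - half_width, p_hat + half_width)

print(round(z, 2), tuple(round(v, 3) for v in ci))
```

The interval uses the combined sample of N = 250 children because it is calculated on the assumption that the two proportions are equal.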

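Exercise 9g

1. For each of the following sets of data, test the hypothesis that there is a common population proportion p.

[Table columns: Sample I (number in sample, number of ‘successes’); Sample II (number in sample, number of ‘successes’); hypotheses; level. Most entries are illegible in the source; only the values below survive.]

(a) 150     $H_0$: $p_1 = p_2 = p$,  $H_1$: $p_1 < p_2$
(b) 1000    $H_0$: $p_1 = p_2 = p$,  $H_1$: $p_1 \neq p_2$
(c) 100     $H_0$: $p_1 = p_2 = p$,  $H_1$: $p_1 \neq p_2$
(d) 80      $H_0$: $p_1 = p_2 = p$,  $H_1$: $p_1 > p_2$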

2. A shipment of Golden Delicious apples was tested for bruising. A random sample of 1000 apples was found to contain 30 which were bruised. A second random sample of 2000 apples contained 78 bruised apples. Show that the unbiased estimate for the proportion of bruised apples in a shipment is 3.6%. Find whether the proportions obtained from the two samples differ significantly at the 5% level.

3. See Example 8.22 (p. 451). Answer the end part of that question using a significance test for the difference between proportions. Use a 5% level of significance.

4. A region has a large population, 60% of whom have surnames beginning with a letter between A and M in the alphabet. In a random sample of 400 people from a town in the region it is found that 260 have surnames beginning with a letter between A and M. Test whether this result indicates any significant difference between the town and the region. Give full details of your test, stating any assumptions made and the hypotheses under test.
In a sample of 300 from another town in the region, 200 have surnames beginning with a letter between A and M. Test whether the results for the two towns indicate a significant difference between them. Give full details of your test. (C)

5. A certain country in a fairy tale is populated by elves and fairies. A random sample of 100 fairies were each asked the question ‘Do you believe in people’: 72 fairies replied ‘Yes’ and the remainder replied ‘No’. Give an approximate symmetric 98% confidence interval for the proportion of fairies that say they believe in people.
A random sample of 62 elves were each asked the same question: 54 elves replied ‘Yes’ and the remainder replied ‘No’. A question of interest is whether the proportion of fairies that say they believe in people differs from the proportion of elves that say they believe in people. Assuming that these proportions are equal, obtain an unbiased estimate of the common proportion. Using this estimate, test the question of interest at the 10% significance level, stating clearly your null and alternative hypotheses. (C)

6. A farmer has an orchard containing large numbers of two varieties (A and B) of apple trees. One year the farmer selected at random one tree of each variety, and kept a careful count of the fates of all the apples from these two trees. Some apples fell from the trees before picking time; some were eaten by insects; some eaten apples fell and some apples remained uneaten and on the trees until picking time. The farmer’s results are given below.

[Table: for each of Variety A and Variety B, the numbers of apples classified as Eaten or Uneaten against Fallen or On tree. Most entries are illegible in the source; the values 150 (Eaten) and 40 (Uneaten) survive for Variety A.]

You may assume, when answering the questions below, that the fate of an individual apple was independent of the fate of all other apples.
(a) Before any apple had fallen or been eaten, the farmer selected at random a variety A apple and stated that it would not fall before picking time. Estimate the probability that he was correct.
(b) At picking time the farmer accidentally trod on a fallen apple. Assuming that this apple was equally likely to have been any one of the fallen apples, estimate the probability that it was of variety A.
(c) Give an approximate symmetric 95% confidence interval for $p_A$, the proportion of variety A apples remaining on the tree and uneaten until picking time.
(d) The proportion of variety B apples remaining on the tree and uneaten until picking time is $p_B$. Determine whether there is evidence at the 0.1% significance level of a difference between $p_A$ and $p_B$. (C)

7. Assuming that the mean and variance of a random variable X having a binomial distribution with parameters n and p are np and np(1 - p) respectively, prove that the mean and variance of a proportion based on a sample of size n are p and p(1 - p)/n respectively, where p is the true proportion. Of a random sample of 50 shoppers in a certain city store 13 stated that they lived more than 10 miles from the city centre. Of a random sample of 50 shoppers from another store in the same city 9 lived more than 10 miles from the city centre. Stating your null and alternative hypotheses and using a significance level of 5%
(a) test that the true proportion in both stores could be 0.15;
(b) show that the two samples do not offer evidence of a difference in proportions between the two stores. (SUJB)

8. An organisation interviews a randomly chosen sample of 1000 adults from the population of the United Kingdom, and 517 of those interviewed claim to support the Conservative party. A second organisation independently interviews a random sample of 2000 adults, of whom 983 claim to support the Conservative party.
(a) Verify that the results of the two organisations do not differ significantly at the 5% level.
(b) Obtain a symmetric 99% confidence interval (based on the combined results) for the proportion of the population who claim to support the Conservative party. (C)

9. Countries A and B contain large numbers of elm trees, many of which are affected by Dutch elm disease. It is found that in country A, out of a random sample of 100 elm trees, 67 are affected by the disease and that in country B, out of a random sample of 150 elm trees, 93 are affected by the disease.
Tree experts have a theory that the proportions of affected trees in the two countries are the same, although there is a possibility that, since the disease affected the trees in country A before those in country B, the proportion of trees affected in country A may be greater than the proportion affected in country B. Using the sample results, test, at the 10% significance level, the theory of the experts. For the test that you perform, state clearly the hypotheses under comparison.
Assuming that the proportions of affected trees in the two countries are the same, give an approximate symmetric 98% confidence interval for this common proportion. (C)

10. According to some recent accident statistics, a random sample of 800 car drivers injured in road accidents comprised 250 who were wearing seat belts and 550 who were not wearing seat belts. Determine a symmetric 99% confidence interval for the proportion of injured drivers wearing seat belts.
The injuries of the car drivers were described as either slight or serious. Of those wearing seat belts, 50 were seriously injured; of those not wearing seat belts, 150 were seriously injured. Determine an unbiased estimate of the overall proportion of serious injuries amongst the injured drivers. Test, at the 5% significance level, the hypothesis that the proportion of serious injuries is greater amongst those injured drivers not wearing seat belts than amongst those injured drivers wearing seat belts. (C)

11. It is known that in a large population there is a proportion p with a certain attribute. A random sample of size n is taken and it is found that x of them have the attribute. If $\hat{p} = x/n$ show that the mean and variance of $\hat{p}$ are p and p(1 - p)/n respectively. (You may assume any result relating to a binomial distribution.) What is the approximate distribution of $\hat{p}$ when n is large?
An ambulance station claims that at least 30% of its calls are life-threatening emergencies. To check this a random sample of 150 of its records were examined and, of these, only 38 were found to be life-threatening emergencies. Test the claim using a 1% significance level.
At a neighbouring station a random sample of 150 records showed that 50 were life-threatening emergencies. Test whether there is a difference between percentage rates in the two stations. (SUJB)

12. A drug research company has produced two compounds A and B for reducing blood pressure. They are administered to two sets of volunteers. One group of 90 was treated with A and 59 responded with lower blood pressure. The other group of 80 was treated with B and 51 responded with lower blood pressure.
(a) Find an approximate 95% confidence interval for the population proportion for which A is effective. In what way is your interval approximate?
(b) Test (at the 5% significance level) if there is any difference between the effectiveness of the two drugs. (SUJB)

SUMMARY — SIGNIFICANCE TESTING, USING NORMAL AND


t-DISTRIBUTIONS
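Type of test: Test statistic

Single sample value:
$$Z = \frac{X - \mu}{\sigma}$$

Binomial situation:
$$Z = \frac{X - np}{\sqrt{npq}}$$

Sample mean, $\sigma$ unknown:
Large n: $$Z = \frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}}$$
Small n: $$T = \frac{\bar{X} - \mu}{S/\sqrt{n-1}}$$
where $\hat{\sigma}^2 = \dfrac{nS^2}{n-1}$

Difference between means:
Unequal population variances, $\sigma_1$, $\sigma_2$ known:
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
Equal population variance $\sigma^2$:
$\sigma$ known: $$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
$\sigma$ unknown, large samples: $$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
$\sigma$ unknown, small samples: $$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
where $\hat{\sigma}^2 = \dfrac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2 - 2}$

Proportions, p known:
$$Z = \frac{P_s - p}{\sqrt{\dfrac{pq}{n}}} \quad \text{(large n)}$$

Difference between proportions, equal population proportion p:
p known: $$Z = \frac{P_{s_1} - P_{s_2}}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
p unknown: $$Z = \frac{P_{s_1} - P_{s_2}}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where $\hat{p} = \dfrac{n_1 p_{s_1} + n_2 p_{s_2}}{n_1 + n_2}$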

Miscellaneous Exercise 9h

1. The heights of men can be assumed to be normally distributed with standard deviation 0.11 m.
In 1928 the mean height of men in a certain city was 1.72 m. In a survey in 1978 the mean height of a random sample of 16 men from the same city was 1.77 m. On the hypothesis that the population mean height has not changed, calculate the probability of obtaining a sample mean height greater than that measured.
In another survey in 1978 the mean height of a random sample of 32 men from a second city was 1.73 m. Assuming that the population mean heights are the same in the two cities, calculate the probability that a difference in sample mean heights greater than that measured would be obtained. (MEI)

2. Jack says ‘Boys are better than girls at Watology’. Jill says ‘Not true, girls are better’. Assume ability at Watology can be measured by a test with a maximum score of 100 and that scores are approximately normally distributed. Explain how to investigate Jack and Jill’s assertions by describing how to conduct an experiment in which the measurements made are the Watology test scores of boys and girls.
Assume your experiment has been done and the following scores obtained:

Boys, x  | 92 80 76 79 84 80 87 88 81 91
Girls, y | 94 86 78 77 85 83 96 88 82 90

Test if there is any difference in the ability of boys and girls at Watology.
(The following sums may be of use to some candidates: $\sum x = 838$, $\sum y = 859$, $\sum x^2 = 70492$, $\sum y^2 = 74143$.) (O)

3. An expert golfer wishes to discover whether the average distances travelled by two different brands of golf ball differ significantly. He tests each ball by hitting it with his driver and measuring the distance X (in metres) that it travels. The distribution of X may be assumed to be normal.
His results for a random sample of nine ‘Farfly’ golf balls were $\bar{x} = 214$ and $\sum (x - \bar{x})^2 = 2048$. Making the assumption that the population variance is equal to the sample variance, obtain a 95% symmetric confidence interval for the mean of X for ‘Farfly’ golf balls.
His results for a random sample of sixteen ‘Gofar’ golf balls were $\bar{x} = 224$ and $\sum (x - \bar{x})^2 = 2460$. Assuming that the variance of X is the same for both types of golf ball, obtain a pooled (two-sample) estimate of this variance and, making the assumption that the true variance is equal to this estimate, test at the 5% level whether his results for ‘Gofar’ golf balls differ significantly from those for ‘Farfly’ golf balls. (C)

4. Mr Smith and Mr Jones are neighbours who work at the same office. Mr Smith drives to work in his old car, and each day records the time (x minutes) his journey takes. After 250 journeys his observations are summarised by $\sum x = 6250$, $\sum x^2 = 158491$. Regarding his observations as constituting a large random sample, give a symmetric 97% confidence interval for his average journey time.
Mr Jones drives to work in his new car, and his average time over a random sample of 50 journeys is found to be 21 minutes. Mr Jones claims that if he leaves home 3 minutes after Mr Smith he will, on average, arrive at work before him. Assuming that Mr Smith and Mr Jones take different routes to work, that their journey times have standard deviation 3 minutes and that the samples may be treated as being large samples, test whether Mr Jones’ claim may be accepted at the 2% significance level. (C)

5. Let p denote the probability of obtaining a head when a certain coin is tossed.
(a) If p = 0.4, find the probability of obtaining at least 3 heads in 10 independent tosses of the coin.
(b) If p = 0.6, find the probability of obtaining exactly 12 heads in 20 independent tosses of the coin.
(c) Write down an appropriate null hypothesis and an appropriate alternative hypothesis for testing whether the coin is unbiased.
To carry out this test 20 independent tosses of the coin are made and the number of heads that occurs is observed. Given that 15 heads occurred, carry out the test, assuming a 5 per cent significance level. Write down a statement of the conclusion you draw about the value of p for this coin. (JMB)
6. (a) After a survey a market research company asserted that 75% of T.V. viewers watched a certain programme. Another company interviewed 75 viewers and found that 51 watched the programme and 24 did not. Does this provide evidence at the 5% level of significance that the first company’s figure of 75% was incorrect?
(b) Samples of leaves were collected from two oak trees A and B. The number of galls was counted on each leaf and the mean and standard deviation of the number of galls per leaf were calculated with the following results:

[Table: sample size, mean and S.D. for trees A and B; the values are illegible in the source.]

Assuming normal distributions, do the data provide evidence at the 5% significance level of different population means for the two trees? (SUJB)

7. An investigation was conducted into the dust content in the flue gases of two types of solid-fuel boilers. Thirteen boilers of type A and nine boilers of type B were used under identical fuelling and extraction conditions. Over a similar period, the following quantities (Table A), in grams, of dust were deposited in similar traps inserted in each of the twenty-two flues.
Assuming that these independent samples came from normal populations with the same variance
(a) use a two sample t-test at the 5% level of significance to determine whether there is any difference between the two samples as regards the mean dust deposit.
(b) test at the 5% level of significance whether there is any difference between the two samples as regards the mean dust deposit, where this time you should also assume that the population variances are both known to be 196.0.
Explain the apparent contradiction in your test results. (AEB 1980)

8. Explain what is meant by a random sample.
Random samples of size $n_1$ and $n_2$ are taken, one from each of two normal distributions with means $\mu_1$, $\mu_2$ and variances $\sigma_1^2$, $\sigma_2^2$ respectively. The sample means are $\bar{x}_1$ and $\bar{x}_2$ respectively. Write down expressions for the population mean and variance of $\bar{X}_1 - \bar{X}_2$.
Given that $\sigma_1^2 = 0.04$, $\sigma_2^2 = 0.05$ and $n_1 = n_2 = 100$, write down a symmetrical two-sided 99% confidence interval for $\mu_1 - \mu_2$ in terms of $\bar{x}_1 - \bar{x}_2$.
If in fact $\mu_1 = 3.06$ and $\mu_2 = 3.00$, determine the probability that the hypothesis $\mu_1 = \mu_2$ would not be rejected using a two-tailed test with significance level 1%. State how this probability would be affected if the values of the population means were $\mu_1 = 3.00$, $\mu_2 = 3.06$.
Determine whether or not the hypothesis $\mu_1 = \mu_2$ should be rejected at the 1% level of significance in each of the cases when
(a) $\bar{x}_1 = 3.07$, $\bar{x}_2 = 2.99$,
(b) $\bar{x}_1 = 3.07$, $\bar{x}_2 = 3.12$. (JMB)

9. The length X of a certain component made by a machine is specified by the manufacturer to be 10 cm. X may be considered to be a random variable distributed normally with mean 10 cm and standard deviation 0.05 cm. All components are tested and are acceptable if they lie between 9.95 cm and 10.03 cm. Those less than 9.95 cm are rejected at a loss of 40p each to the manufacturer; those between 10.03 cm and 10.05 cm can be shortened at a loss of 20p and those greater than 10.05 cm can be shortened resulting in a loss of 25p. Calculate the probabilities that if a component is tested the loss L = 0, 20, 25, 40 pence and hence calculate the expected value of L.
In order to test the accuracy of the machine a random sample of 25 components is measured and found to have a mean length of 10.014 cm. Is this sufficient evidence at the 5% level of significance to indicate that the mean is greater than 10 cm? If a further random sample of 25 yielded a mean of 10.012 cm, by pooling the two samples determine whether your conclusion about the mean alters. (SUJB)

10. Blocks of wood used for flooring are cut by machine. Their lengths are normally distributed with mean 230 mm and standard deviation 2 mm, while their widths are normally distributed with mean 80 mm and standard deviation 1.5 mm; the two measurements are independent. Calculate the probabilities
(a) that a block selected at random will lie within the tolerance limits 226.5 mm to 233 mm in length,
(b) that a block selected at random will lie within the tolerance limits 77 mm to 82 mm for width,
(c) that a block selected at random will satisfy both tolerances,

seconds, the results obtained are summar-


(d) that a block selected at random will
be within the tolerance limits for width ised by Dw = 1800, D(w— iw)” = 9009.
but not for length. Assuming that the two types of tape have
playing time distributions with equal
The setting on the machine which cuts variances, show that the unbiased two-
the blocks to length is to be changed so sample estimate of this common variance
that, while the standard deviation remains is 324. Treating both samples as large
unchanged, 95% of the blocks will be no samples, test whether there is significant
longer than 232.7 mm. Calculate the new evidence, at the 3% level, of a difference
mean length. After this resetting a block in the means of the two playing time
is produced that is only 224.6 mm long. distributions. State your hypotheses
Does this suggest that the machine is not clearly. (C)
correctly set? (SUJB)
12. The lifetime, T, in hours of a certain
11. HUM and WOW are two makes of cassette type of electric lamp is a random variable
tapes both having a nominal playing time with distribution
of 5400 seconds. Each tape of a random
sample of 64 HUM tapes was timed using fit) = Ae t! g<t<o
standard equipment. Denoting the actual = 0, t<0
playing time of a HUM tape by (5400 +h)
Find the value of A and show that the
seconds, the results obtained are summar-
mean and standard deviation of T are
ised by Dh = 2624, D(h—h)? = 22748. both 1200 hours.
Obtain an unbiased estimate of the
variance of the playing time of HUM To test the reliability of the production
tapes and obtain a symmetric 96% con- a random sample of 40 bulbs was tested
fidence interval for the average playing and found to have a mean life of 1020
time of HUM tapes. hours. Does this indicate at the 5% level
A random sample of 36 WOW tapes was of significance that the batch from which
also tested. Denoting the actual playing the sample was taken was sub-standard?
time of a WOW tape by (5400+ w) (SUJB)

Table A

Dust deposit (Type A boilers): 73.1, 56.4, 82.1, 67.2, 78.7, 75.1, 48.0, 53.3, 55.5, 61.5, 60.6, 55.2, 63.1
Dust deposit (Type B boilers): 53.0, 39.3, 55.8, 58.8, 41.2, 66.6, 46.0, 56.4, 58.9

TESTS INVOLVING THE BINOMIAL DISTRIBUTION

We have already seen that if X ~ Bin(n, p), where n is large, we
can test whether a value comes from a particular distribution by
using the normal approximation (see p. 463). We now consider the
case when n is not large.
The significance test can be designed in a similar manner to those
involving the normal distribution. When dealing with a normal
distribution, which is continuous, we look to see whether a
particular point lies in the critical region or not, but when dealing with
a binomial distribution, which is discrete, we look to see whether a
particular rectangle lies in the critical region or not.
Consider X ~ Bin(8, p), where p = 0.4, and suppose we wish to
test, at the 5% level, whether a single sample value x = 7 comes
from this distribution or from a distribution with a higher value
of p.
We make the hypotheses: H₀: p = 0.4
                         H₁: p > 0.4
Under H₀, X ~ Bin(8, 0.4) and $P(X = x) = {}^{8}C_{x}(0.6)^{8-x}(0.4)^{x}$, $x = 0, 1, \ldots, 8$.
The diagram shows the probability distribution of X.

[Diagram: P(X = x) against x for X ~ Bin(8, 0.4)]

We use a 1-tailed test and look at the right-hand tail of the distribu-
tion.
We want to draw the boundary line for the critical region so that
5% of the area lies to the right of the boundary.
We find, from tables or calculations, that
P(X ≥ 5) = 0.1737 (> 0.05)
P(X ≥ 6) = 0.0498 (< 0.05)
So the boundary line must be drawn slightly to the left of the
rectangle for x = 6.

[Diagram: X ~ Bin(8, 0.4) for x = 0 to 8, showing the boundary line and the critical region to its right]

Now we wish to test the value x = 7 and will reject H₀ if the
rectangle representing x = 7 lies wholly in the critical region.

From the diagram we see that this is the case, and conclude that
x = 7 is unlikely to occur in the distribution Bin(8, 0.4).
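The tail probabilities quoted above can be checked without tables. The following is a minimal sketch using only the Python standard library; the helper names `binom_pmf`, `upper_tail` and `smallest_upper_critical_value` are illustrative choices, not notation from the text.

```python
from math import comb

def binom_pmf(n, p, x):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def upper_tail(n, p, r):
    """P(X >= r) for X ~ Bin(n, p)."""
    return sum(binom_pmf(n, p, x) for x in range(r, n + 1))

def smallest_upper_critical_value(n, p, level):
    """Smallest r whose whole rectangle lies in the upper critical region,
    i.e. the smallest r with P(X >= r) < level (None if there is none)."""
    for r in range(n + 1):
        if upper_tail(n, p, r) < level:
            return r
    return None

print(round(upper_tail(8, 0.4, 5), 4))               # P(X >= 5), about 0.1737
print(round(upper_tail(8, 0.4, 6), 4))               # P(X >= 6), about 0.0498
print(smallest_upper_critical_value(8, 0.4, 0.05))   # 6
```

Since the smallest such value is 6, the observation x = 7 lies wholly in the critical region, in agreement with the diagram argument.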

Example 9.22 A coin is tossed 6 times. Test, at the 5% level, whether the coin is
biased towards heads if (a) 6 heads are obtained, (b) 5 heads
are obtained.

Solution 9.22 Let X be the r.v. ‘the number of heads when the coin is tossed 6
times’, and let p be the probability that the coin shows heads.
H₀: p = 0.5 (the coin is fair)
H₁: p > 0.5 (the coin is biased so that it is
more likely to show heads)

Under H₀, X ~ Bin(6, 0.5)

and $P(X = x) = {}^{6}C_{x}(0.5)^{6-x}(0.5)^{x} = {}^{6}C_{x}(0.5)^{6}$, $x = 0, 1, 2, \ldots, 6$.
From tables or calculations
P(X ≥ 5) = 0.109375 (> 0.05)
P(X = 6) = 0.015625 (< 0.05)

so the boundary line for the critical region will be drawn as shown
in the diagram, to give an area of 5% in the critical region.

[Diagram: X ~ Bin(6, 0.5), 5% shaded in the right-hand tail; critical region, reject H₀]

Using a 1-tailed test, at the 5% level, we reject H₀ if our observation
lies wholly in the critical region.
(a) 6 heads are obtained:

[Diagram: X ~ Bin(6, 0.5), 5% shaded; the rectangle for x = 6 lies in the region where H₀ is rejected]

We see that the rectangle for x = 6 lies wholly in the critical region,
and conclude that there is evidence, at the 5% level, to suggest that
the coin is biased towards heads if 6 heads are obtained in 6 tosses.
(b) 5 heads are obtained:

[Diagram: P(X = x) for x = 0 to 6, 5% shaded; the rectangle for x = 5 lies only partly in the region where H₀ is rejected]

The rectangle for x = 5 does not lie wholly in the critical region,
so we do not reject Hy and conclude that there is no evidence, at
the 5% level, to suggest that the coin is biased towards heads if 5
heads are obtained in 6 tosses.

Example 9.23 The discrete r.v. X is distributed binomially with n = 10. If a
single observation x is taken from the distribution, test, at the 8%
level, the hypothesis that p = 0.45 against the alternative hypothesis
p ≠ 0.45 when (a) x = 7, (b) x = 1.

Solution 9.23 H₀: p = 0.45
H₁: p ≠ 0.45 (2-tailed test)

Under H₀, X ~ Bin(10, 0.45)

Since the test is 2-tailed, we need to consider both tails of the
probability distribution and find boundary lines such that 4% of
the area is in each tail. Note that it is not necessary to draw the
complete probability distribution, since we are interested only in
the tails.

(a) We test first the single observation x = 7. We are interested in
the position of the boundary of the critical region in the right-hand
tail, and need to know whether the rectangle for x = 7 lies wholly
to the right of the boundary, i.e. wholly in the critical region.

Now if x = 7 does lie wholly in the critical region we would have
P(X ≥ 7) < 0.04.

From tables, P(X ≥ 7) = 0.102 > 0.04, so the rectangle for x = 7
does not lie wholly in the critical region. We do not reject H₀, and
conclude, at the 8% level, that p = 0.45.

[Diagram: right-hand tail, x = 7 to 10, with the boundary line and 4% shaded; reject H₀ to the right of the boundary]

NOTE: Since P(X ≥ 8) = 0.0274 the boundary line comes within
the rectangle for x = 7.

(b) We now test the single observation x = 1. This time we are
interested in the position of the boundary of the critical region in
the left-hand tail. Now the rectangle for x = 1 will lie wholly in
the critical region if P(X ≤ 1) < 0.04.

From tables, P(X ≤ 1) = 0.0233 < 0.04, indicating that the
rectangle for x = 1 does lie wholly in the critical region.
Therefore we reject H₀ and conclude, at the 8% level, that there is
evidence that p ≠ 0.45.

[Diagram: left-hand tail, x = 0 to 2, with the boundary line and 4% shaded; reject H₀ to the left of the boundary]

NOTE: Since P(X ≤ 2) = 0.0996, the boundary line comes within
the rectangle for x = 2.
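The 2-tailed rule used in this example (half the significance level in each tail) can be sketched in a few lines of Python. This is an illustrative check only; the function names `binom_cdf` and `two_tailed_reject` are assumptions introduced here.

```python
from math import comb

def binom_cdf(n, p, r):
    """P(X <= r) for X ~ Bin(n, p); an empty sum (r < 0) gives 0."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(r + 1))

def two_tailed_reject(n, p, r, level_percent):
    """Reject H0 if the observed value r lies wholly in either tail,
    each tail holding half of the significance level."""
    half = level_percent / 200           # e.g. 8% level gives 0.04 per tail
    lower = binom_cdf(n, p, r)           # P(X <= r)
    upper = 1 - binom_cdf(n, p, r - 1)   # P(X >= r)
    return lower < half or upper < half

print(two_tailed_reject(10, 0.45, 7, 8))   # False: P(X >= 7) is about 0.102
print(two_tailed_reject(10, 0.45, 1, 8))   # True:  P(X <= 1) is about 0.0233
```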

Example 9.24 State the conditions under which the binomial distribution may be
used. Illustrate your answer by referring to a specific example
preferably from a project.

Records kept in a hospital show that 3 out of every 10 casualties
who come to the casualty department have to wait more than half
an hour before receiving medical attention. Find, to 3 decimal
places, the probability that of the first 8 casualties who come to
that casualty department (a) none, (b) more than two will have to
wait more than half an hour before receiving medical attention.
Find also the most probable number of the 8 casualties that will
have to wait more than half an hour.

The hospital decided to increase the staff of the department by one
member and it was then found that of the next 20 casualties 2 had
to wait more than half an hour for medical attention. Test (c) at
the 2% level, (d) at the 5% level whether the new staffing has
decreased the number of casualties who have to wait more than half
an hour for medical attention. (L)

Solution 9.24 For the first part see p. 209.

Let X be the r.v. ‘the number of casualties who wait more than half
an hour’, and let p be the probability that a casualty has to wait
more than half an hour.

Then X ~ Bin(n, p) with n = 8, p = 0.3

and $P(X = x) = {}^{8}C_{x}(0.7)^{8-x}(0.3)^{x}$, $x = 0, 1, \ldots, 8$.

(a) P(X = 0) = (0.7)⁸ = 0.058 (3 d.p.)

(b) P(X > 2) = 1 − P(X ≤ 2)
= 1 − 0.5518 (from tables)
= 0.448 (3 d.p.)

P(X = 0) = 0.058
P(X = 1) = 8(0.7)⁷(0.3) = 0.1977
P(X = 2) = 28(0.7)⁶(0.3)² = 0.2965
P(X = 3) = 56(0.7)⁵(0.3)³ = 0.2541
and so on ...

P(X = 2) > P(X = 1) and P(X = 2) > P(X = 3), and so the
most probable number of casualties that will have to wait more
than half an hour is 2.

Now let X be the r.v. ‘the number of casualties in 20 who wait more
than half an hour’.

Then X ~ Bin(20, p)

H₀: p = 0.3 (there is no change in the waiting pattern)
H₁: p < 0.3 (there is a decrease in the number who wait
more than half an hour)

Under H₀, X ~ Bin(20, 0.3).

(c) To test the significance of x = 2, at the 2% level, we perform
a 1-tailed test, and reject H₀ if P(X ≤ 2) < 0.02 (indicating that
the rectangle for x = 2 lies wholly in the critical region).

Now, from tables, P(X ≤ 2) = 0.0355 > 0.02, so we do not
reject H₀ and conclude that there is no evidence at the 2% level to
suggest a decrease in the number of casualties who wait more than
half an hour.

[Diagram: left-hand tail, x = 0 to 3, 2% shaded; critical region for the 2% level]

(d) At the 5% level, reject H₀ if P(X ≤ 2) < 0.05.

Now P(X ≤ 2) = 0.0355 < 0.05, so we reject H₀ and conclude that
there is sufficient evidence, at the 5% level, to suggest a decrease in
the number of casualties who wait more than half an hour.

[Diagram: left-hand tail; critical region for the 5% level]
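As a cross-check on Solution 9.24, the sketch below recomputes the probability of more than two long waits, the most probable number, and the lower-tail probability used in parts (c) and (d). It assumes only the binomial model stated in the solution; the variable names are illustrative.

```python
from math import comb

def binom_pmf(n, p, x):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(n, p, r):
    """P(X <= r) for X ~ Bin(n, p)."""
    return sum(binom_pmf(n, p, x) for x in range(r + 1))

# First 8 casualties: X ~ Bin(8, 0.3)
probs_8 = [binom_pmf(8, 0.3, x) for x in range(9)]
most_probable = max(range(9), key=lambda x: probs_8[x])   # value of x with largest P(X = x)
p_more_than_two = 1 - binom_cdf(8, 0.3, 2)                # about 0.448

# Next 20 casualties: X ~ Bin(20, 0.3) under H0, observed x = 2
p_lower = binom_cdf(20, 0.3, 2)                           # about 0.0355
reject_at_2_percent = p_lower < 0.02                      # False
reject_at_5_percent = p_lower < 0.05                      # True
```

The same observation is significant at the 5% level but not at the 2% level, because the single tail probability 0.0355 lies between the two thresholds.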

SUMMARY — TESTING A BINOMIAL PROPORTION, n NOT LARGE

X ~ Bin(n, p)
Level of test: α%

Test the single value x = r

1-tailed test
    Upper tail: reject H₀ if P(X ≥ r) < α/100
    Lower tail: reject H₀ if P(X ≤ r) < α/100

2-tailed test
    Reject H₀ if P(X ≤ r) < (½α)/100
    or if P(X ≥ r) < (½α)/100

Exercise 9i

1. For each of the following, a single observation x is taken from a binomial distribution where
X ~ Bin(n, p). Test the hypotheses at the level of significance stated.

(a) H₀: p = 0.45, H₁: p > 0.45
(b) H₀: p = 0.45, H₁: p < 0.45
(c) H₀: p = 0.35, H₁: p > 0.35
(d) H₀: p = 0.35, H₁: p ≠ 0.35
(e) H₀: p = 0.45, H₁: p < 0.45
(f) H₀: p = 0.45, H₁: p > 0.45
(g) H₀: p = 0.4, H₁: p > 0.4
(h) H₀: p = 0.3, H₁: p < 0.3

2. A die is thrown 15 times and it shows a six on twelve occasions. Is the die biased in favour of showing a six? Test at the 1% level.

3. The probability that a certain type of seed germinates is 0.7. The seeds undergo a new treatment, and when a packet of 10 seeds is tested 9 germinate. Is this evidence, at the 5% level, of an increase in the germination rate?

4. In a test of 10 true-false questions a student gets 8 correct. The student claims she was not guessing. Test this claim at the 5% level.

TESTS INVOLVING THE POISSON DISTRIBUTION

As with the test involving the binomial distribution we again look
to see whether a particular rectangle lies wholly in the critical
region or not.
Consider X ~ Po(λ) and suppose we wish to test, at the 5% level,
whether the single sample value x = 14 comes from the distribution
where λ = 8.5 or from a distribution with a higher value of λ.
We make the hypotheses: H₀: λ = 8.5
                         H₁: λ > 8.5
We use a 1-tailed test and look at the right-hand tail of the
distribution.
We want to draw the boundary line for the critical region so that
5% of the area lies to the right of the boundary.
If X ~ Po(8.5), P(X ≥ 14) = 0.0514 > 5% and
P(X ≥ 15) = 0.0274 < 5%, so the boundary line must be drawn as
shown.

[Diagram: right-hand tail of X ~ Po(8.5), x = 13 to 18, with the boundary line, 5% shaded, and the critical region to the right]

NOTE: Since P(X ≥ 15) = 0.0274, the boundary line comes
within the rectangle for x = 14.
The rectangle representing P(X = 14) does not lie wholly in the
critical region and we would therefore accept H₀ and conclude that
x = 14 does come from the distribution Po(8.5).
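The Poisson tail probabilities above can be reproduced directly from the probability function. The sketch below is one possible check; `poisson_cdf`, `poisson_upper_tail` and `smallest_upper_critical_value` are names chosen here for illustration, and the search limit of 200 is an arbitrary assumption that is ample for small means.

```python
from math import exp, factorial

def poisson_cdf(lam, r):
    """P(X <= r) for X ~ Po(lam); an empty sum (r < 0) gives 0."""
    return sum(exp(-lam) * lam**x / factorial(x) for x in range(r + 1))

def poisson_upper_tail(lam, r):
    """P(X >= r) for X ~ Po(lam)."""
    return 1 - poisson_cdf(lam, r - 1)

def smallest_upper_critical_value(lam, level, search_limit=200):
    """Smallest r with P(X >= r) < level, searching r = 0, 1, ..., search_limit."""
    for r in range(search_limit + 1):
        if poisson_upper_tail(lam, r) < level:
            return r
    return None

print(round(poisson_upper_tail(8.5, 14), 4))      # about 0.0514
print(round(poisson_upper_tail(8.5, 15), 4))      # about 0.0274
print(smallest_upper_critical_value(8.5, 0.05))   # 15, so x = 14 is not rejected
```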

Example 9.25 The number of misprints on the front page of the Daily Informer
is found to have a Poisson distribution with mean 6.5. A new proof-
reader is employed and shortly afterwards the front page was found
to have 12 misprints. The editor says that the mean number of mis-
prints has increased. Test this claim at the 5% level.

Solution 9.25 Let X be the r.v. ‘the number of misprints on the front page’.
Then X ~ Po(λ)
H₀: λ = 6.5 (the mean is unchanged)
H₁: λ > 6.5 (the mean has increased)
We test at the 5% level and will reject H₀ if P(X ≥ 12) < 0.05,
indicating that the rectangle for P(X = 12) lies wholly within the
critical region.

Now, from tables or by calculation, we find that
P(X ≥ 12) = 0.0339 < 0.05, so we reject H₀ and conclude that
there is evidence, at the 5% level, to suggest that the mean has
increased.

[Diagram: right-hand tail of X ~ Po(6.5) with the boundary line and the critical region to the right]

NOTE: Since P(X ≥ 11) = 0.0668, the boundary line is drawn
within the rectangle for P(X = 11).

Example 9.26 Consider X ~ Po(λ) and H₀: λ = 6.5. If x = 2, test, at the 5%
level, (a) H₁: λ ≠ 6.5, (b) H₁: λ < 6.5.

Solution 9.26 (a) H₀: λ = 6.5
H₁: λ ≠ 6.5
We perform a 2-tailed test, at the 5% level, and reject H₀ if
P(X ≤ 2) < 0.025, indicating that the rectangle for P(X = 2) lies
wholly in the critical region.
We find, from tables or calculation, that P(X ≤ 2) = 0.043 > 0.025,
therefore we do not reject H₀ and conclude that λ = 6.5.

(b) H₀: λ = 6.5
H₁: λ < 6.5
We perform a 1-tailed test, at the 5% level, and reject H₀ if
P(X ≤ 2) < 0.05.
Now, since P(X ≤ 2) = 0.043 < 0.05, we reject H₀ and conclude
that the mean is less than 6.5.
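Examples 9.25 and 9.26 show that the conclusion depends on which tail rule is applied to the same observation. A small sketch of the three rules for a single Poisson observation follows; the function `reject_h0` and its `tail` argument are hypothetical names used only for this illustration.

```python
from math import exp, factorial

def poisson_cdf(lam, r):
    """P(X <= r) for X ~ Po(lam); an empty sum (r < 0) gives 0."""
    return sum(exp(-lam) * lam**x / factorial(x) for x in range(r + 1))

def reject_h0(lam, r, level_percent, tail):
    """Apply the single-observation rule; tail is 'lower', 'upper' or 'two'."""
    lower = poisson_cdf(lam, r)           # P(X <= r)
    upper = 1 - poisson_cdf(lam, r - 1)   # P(X >= r)
    alpha = level_percent / 100
    if tail == "lower":
        return lower < alpha
    if tail == "upper":
        return upper < alpha
    if tail == "two":
        return lower < alpha / 2 or upper < alpha / 2
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

print(reject_h0(6.5, 12, 5, "upper"))   # True:  P(X >= 12) is about 0.0339
print(reject_h0(6.5, 2, 5, "two"))      # False: P(X <= 2) is about 0.043 > 0.025
print(reject_h0(6.5, 2, 5, "lower"))    # True:  0.043 < 0.05
```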

Example 9.27 State conditions under which the Poisson distribution is a suitable
model to use in statistical work. Describe briefly how a Poisson
distribution was used, or could have been used, in a project.
(a) The number, X, of breakdowns per day of the lifts in a large
block of flats has a Poisson distribution with mean 0.2. Find,
to 3 decimal places, the probability that on a particular day
(i) there will be at least one breakdown,
(ii) there will be at most two breakdowns.
(b) Find, to 3 decimal places, the probability that, during a 20
day period, there will be no lift breakdowns.
(c) The maintenance contract for the lifts is given to a new
company. With this company it is found that there are 2
breakdowns over a period of 30 days. Perform a significance test at
the 5% level to decide whether or not the number of
breakdowns has decreased. (L)

Solution 9.27 (a) Let X be the r.v. ‘the number of breakdowns per day’.
Then X ~ Po(0.2).

(i) P(X ≥ 1) = 1 − P(X = 0)
$= 1 - e^{-0.2}$
= 0.181 (3 d.p.)

(ii) P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
$= e^{-0.2}\left(1 + 0.2 + \frac{0.2^{2}}{2!}\right)$
$= 1.22\,e^{-0.2}$
= 0.999 (3 d.p.)

(b) In 1 day we ‘expect’ 0.2 breakdowns, so in 20 days we ‘expect’
20 × 0.2 = 4 breakdowns.
Let Y be the r.v. ‘the number of breakdowns in 20 days’.
Then Y ~ Po(4) and $P(Y = 0) = e^{-4} = 0.018$ (3 d.p.)

NOTE: We could consider
$P(\text{no breakdowns in 20 days}) = (P(X = 0))^{20} = (e^{-0.2})^{20} = e^{-4}$ as before.

(c) In 30 days we ‘expect’ 30 × 0.2 = 6 breakdowns.
Let B be the r.v. ‘the number of breakdowns in 30 days’.
Then B ~ Po(λ) where λ = 6.
Now there are 2 breakdowns in 30 days and we wish to test whether
there has been a decrease in the average number of breakdowns.
H₀: λ = 6 (there is no change)
H₁: λ < 6 (the average number of breakdowns has decreased)

We perform a 1-tailed test, at the 5% level, and will reject H₀ if
P(B ≤ 2) < 0.05.
Now, from tables, P(B ≤ 2) = 0.062 > 0.05. Therefore we do not
reject H₀ and conclude that there is no evidence, at the 5% level, to
suggest that the average number of breakdowns has decreased.
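The key step in Solution 9.27 is scaling the Poisson mean with the length of the period (0.2 per day, so 4 in 20 days and 6 in 30 days). The sketch below recomputes each part under that assumption; the variable names are illustrative.

```python
from math import exp, factorial

def poisson_cdf(lam, r):
    """P(X <= r) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**x / factorial(x) for x in range(r + 1))

daily_mean = 0.2

# (a) one day: X ~ Po(0.2)
p_at_least_one = 1 - poisson_cdf(daily_mean, 0)    # about 0.181
p_at_most_two = poisson_cdf(daily_mean, 2)         # about 0.999

# (b) 20 days: Y ~ Po(20 * 0.2) = Po(4)
p_none_in_20_days = poisson_cdf(20 * daily_mean, 0)   # about 0.018
same_by_independence = exp(-daily_mean) ** 20         # (P(X = 0))**20

# (c) 30 days: B ~ Po(6) under H0, observed 2 breakdowns, lower-tail test at 5%
p_lower = poisson_cdf(30 * daily_mean, 2)             # about 0.062
reject_h0 = p_lower < 0.05                            # False
```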

SUMMARY — TESTING A POISSON MEAN

X ~ Po(λ)
Level of test: α%

Test the single value x = r

1-tailed test
    Upper tail: reject H₀ if P(X ≥ r) < α/100
    Lower tail: reject H₀ if P(X ≤ r) < α/100

2-tailed test
    Reject H₀ if P(X ≥ r) < (½α)/100
    or if P(X ≤ r) < (½α)/100

Exercise 9j

1. For each of the following a single observation x is taken from a Poisson distribution, where
X ~ Po(A). Test the hypotheses at the level of significance stated.

Level of
Hypotheses significance

(2) Ho:A = rie Neal

(b) H₀: λ = 7, H₁: λ ≠ 7
(c) H₀: λ = 10, H₁: λ < 10
(d) H₀: λ = 10, H₁: λ > 10
(e) H₀: λ = 6.5, H₁: λ ≠ 6.5
(f)

2. The number of white corpuscles on a slide has a Poisson distribution with mean 3.5. After certain treatment another sample was taken and the number of white corpuscles was found to be 8. Test, at the 5% level, whether the mean has increased.

3. The number of breakdowns in a computer is known to follow a Poisson distribution with a mean of 4.5 per month. A new computer is installed and in the first month there are 2 breakdowns. Test, at the 5% level, the claim that the mean has decreased.

4. The number of telephone calls to an office follows a Poisson distribution with a mean number of 6 per hour on a weekday.
(a) On Monday there were 5 calls between 10.00 and 10.30. Test, at the 5% level, whether the mean has increased.
(b) On Wednesday there were 3 calls between 11.00 and 12.30. Test, at the 5% level, whether the mean has decreased.

5. The number of flaws per 100 m of fabric is known to follow a Poisson distribution with mean 2. A 200 m length of fabric is tested and found to have 7 flaws. Test at the 5% level, whether the mean has increased.

6. Describe, briefly, the experimental evidence which you obtained in order to illustrate the Poisson distribution. State carefully any assumptions which you made.
The number X of emergency telephone calls to a gas board office in t minutes at weekends is known to follow a Poisson distribution with mean got. Given that the telephone in that office is unmanned for 10 minutes, calculate, to 2 significant figures, the probability that there will be at least 2 emergency telephone calls to the office during that time.
Find, to the nearest minute, the length of time that the telephone can be left unmanned for there to be a probability of 0.9 that no emergency telephone call is made to the office during the period the telephone is unmanned.
During a week of very cold weather it was found that there had been 10 emergency telephone calls to the office in the first 12 hours of the weekend. Using the tables provided, or otherwise, determine whether the increase in the average number of emergency telephone calls to that office is significant at the 5% level. (L)

7. Explain briefly, referring to your projects if possible, the role of the null hypothesis and of the alternative hypothesis in a test of significance.
Over a long period, John has found that the bus taking him to school arrives late on average 9 times per month. In the month following the start of new summer schedules, John finds that his bus arrives late 13 times. Assuming that the number of times the bus is late has a Poisson distribution, test, at the 5% level of significance, whether the new schedules have in fact increased the number of times on which the bus is late. State clearly your null and alternative hypotheses. (L)

TYPE I AND TYPE II ERRORS

When conducting a significance test we reach one of four possible
conclusions. These are summarised in the table below.

      True situation     Our conclusion     Decision
(1)   H₀ is true         Accept H₀          Correct decision
(2)   H₀ is true         Reject H₀          Wrong decision
(3)   H₀ is false        Accept H₀          Wrong decision
(4)   H₀ is false        Reject H₀          Correct decision

We say that
(a) Type I error is made if we reject H₀ when it is true.
(b) Type II error is made if we accept H₀ when it is false.
We write
(a) P(Type I error) = P(rejecting H₀ | H₀ is true)
(b) P(Type II error) = P(accepting H₀ | H₁ is true)
NOTE: when considering Type II errors we must state a definite
value of the parameter in the alternative hypothesis H₁.

Example 9.28 Define Type I and Type II errors in testing hypotheses.

A box is known to contain either (H₀) ten white counters and 90
black counters or (H₁) 50 white counters and 50 black counters.
In order to test hypothesis H₀ against hypothesis H₁, four counters
are drawn at random from the box, without replacement. If all
four counters are black, H₀ is accepted. Otherwise it is rejected.
Find the size of the Type I and Type II errors for this test.
(AEB)
Solution 9.28 For the first part, see preceding paragraph.
H₀: The box contains 10 white and 90 black counters
H₁: The box contains 50 white and 50 black counters
H₀ is accepted if all four counters, drawn without replacement, are
black.
P(Type I error) = P(rejecting H₀ | H₀ is true)
= P(at least 1 white | there are 10 white and 90 black)

Now, if there are 10 white and 90 black
$P(\text{drawing 4 black}) = \left(\frac{90}{100}\right)\left(\frac{89}{99}\right)\left(\frac{88}{98}\right)\left(\frac{87}{97}\right) = 0.652$
P(drawing at least 1 white) = 1 − 0.652
= 0.348
Therefore P(Type I error) = 0.348.

P(Type II error) = P(accepting H₀ | H₁ is true)
= P(all 4 are black | there are 50 white and 50 black)
If there are 50 white and 50 black
$P(\text{drawing 4 black}) = \left(\frac{50}{100}\right)\left(\frac{49}{99}\right)\left(\frac{48}{98}\right)\left(\frac{47}{97}\right) = 0.059$
Therefore P(Type II error) = 0.059.
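The two error probabilities in this example are products of conditional probabilities for drawing without replacement. A minimal sketch using exact fractions is given below; the function name `prob_all_black` is an illustrative assumption.

```python
from fractions import Fraction

def prob_all_black(white, black, draws):
    """Probability that every counter drawn (without replacement) is black."""
    total = white + black
    prob = Fraction(1)
    for i in range(draws):
        # i black counters have already been removed from the box
        prob *= Fraction(black - i, total - i)
    return prob

# H0: 10 white, 90 black.  H0 is rejected unless all four counters are black.
type_1 = 1 - prob_all_black(10, 90, 4)
# H1: 50 white, 50 black.  H0 is accepted if all four counters are black.
type_2 = prob_all_black(50, 50, 4)

print(round(float(type_1), 3))   # about 0.348
print(round(float(type_2), 3))   # about 0.059
```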

Example 9.29 A man claims that he can throw a six with a fair die five times out
of six on the average. Calculate the probability that he will throw
four or more sixes in six throws (i) if his claim is justified (ii) if he
can throw a six, on the average, only once in six throws.
To test the claim, he is invited to throw the die six times, his claim
being accepted if he throws at least four sixes.
Find the probability that the test will (a) accept the man’s claim
when hypothesis (ii) is true, or (b) reject the claim when it is
justified, that is, when hypothesis (i) is true. (AEB 1974)

Solution 9.29 (i) If his claim is justified
$P(\text{throws a six}) = \frac{5}{6}$

So, in six throws, let X be the r.v. ‘the number of sixes obtained’.

Then X ~ Bin(n, p) with n = 6, p = 5/6.

Now $P(X = x) = {}^{6}C_{x}\left(\frac{1}{6}\right)^{6-x}\left(\frac{5}{6}\right)^{x}$, $x = 0, 1, \ldots, 6$

P(X ≥ 4) = P(X = 4) + P(X = 5) + P(X = 6)
$= 15\left(\frac{1}{6}\right)^{2}\left(\frac{5}{6}\right)^{4} + 6\left(\frac{1}{6}\right)\left(\frac{5}{6}\right)^{5} + \left(\frac{5}{6}\right)^{6}$
$= \frac{5^{4}}{6^{6}}(15 + 30 + 25)$
= 0.938 (3 d.p.)
Therefore P(X ≥ 4) = 0.938 (3 d.p.).

(ii) If $P(\text{throws a six}) = \frac{1}{6}$

Then X ~ Bin(n, p) with n = 6, p = 1/6.

$P(X = x) = {}^{6}C_{x}\left(\frac{5}{6}\right)^{6-x}\left(\frac{1}{6}\right)^{x}$, $x = 0, 1, \ldots, 6$

P(X ≥ 4) = P(X = 4) + P(X = 5) + P(X = 6)
$= 15\left(\frac{5}{6}\right)^{2}\left(\frac{1}{6}\right)^{4} + 6\left(\frac{5}{6}\right)\left(\frac{1}{6}\right)^{5} + \left(\frac{1}{6}\right)^{6}$
$= \frac{1}{6^{6}}(375 + 30 + 1)$
= 0.0087 (2 S.F.)
Therefore P(X ≥ 4) = 0.0087 (2 S.F.).

H₀: The man can throw a six five times out of six
H₁: The man can throw a six, on the average, only once in six throws
H₀ is accepted if the man throws at least four sixes in six throws.

(a) $P(\text{accepting } H_0 \mid H_1 \text{ is true}) = P\left(X \ge 4 \mid p = \tfrac{1}{6}\right)$
= 0.0087

Therefore the probability of accepting the man’s claim, when
hypothesis (ii) is true = P(Type II error) = 0.0087 (2 S.F.).

(b) $P(\text{rejecting } H_0 \mid H_0 \text{ is true}) = P\left(X < 4 \mid p = \tfrac{5}{6}\right)$
= 1 − 0.938
= 0.062 (3 d.p.)
Therefore the probability of rejecting the man’s claim when it is
justified (probability of a Type I error) is 0.062 (3 d.p.).
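The decision rule "accept the claim if at least four sixes are thrown" fixes both error probabilities once p is specified under each hypothesis. The sketch below evaluates them exactly with fractions; `prob_at_least` is a name introduced here for illustration.

```python
from fractions import Fraction
from math import comb

def prob_at_least(n, p, k):
    """P(X >= k) for X ~ Bin(n, p); exact when p is a Fraction."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

p_claim = Fraction(5, 6)    # hypothesis (i): the claim is justified
p_chance = Fraction(1, 6)   # hypothesis (ii): one six in six throws on average

type_2 = prob_at_least(6, p_chance, 4)      # accept the claim when (ii) is true
type_1 = 1 - prob_at_least(6, p_claim, 4)   # reject the claim when (i) is true

print(type_2, round(float(type_2), 4))   # 203/23328, about 0.0087
print(type_1, round(float(type_1), 3))   # 1453/23328, about 0.062
```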

Example 9.30 Dating of archaeological specimens is a difficult task. It is known
that specimens emit a certain type of radioactive particle; the
number of particles emitted in n minutes having a Poisson
distribution with parameter nλ, where the value of λ depends upon the age
of the specimen.
Two hypotheses concerning the age of one particular specimen are
put forward:
$H_A$: specimen is 7000 years old (in which case λ = 1.0)
$H_B$: specimen is 15 000 years old (in which case λ = 4.0)
It is decided to count the number, X, of radioactive particles
emitted in n minutes and
accept $H_A$ (and reject $H_B$) if X ≤ 1
and accept $H_B$ (and reject $H_A$) if X ≥ 2
If n = 1 what is (a) the probability of rejecting $H_A$ when $H_A$ is in
fact true, (b) the probability of rejecting $H_B$ when $H_B$ is in fact
true?
If the probability of rejecting $H_B$ when $H_B$ is in fact true is to be less
than 0.001, show that the minimum number of complete minutes
for which counting should be recorded is three. What is the
corresponding probability of rejecting $H_A$ when $H_A$ is in fact true?
(AEB 1980)

Solution 9.30 $H_A$: specimen is 7000 years old (λ = 1.0)
$H_B$: specimen is 15 000 years old (λ = 4.0)

Let X be the r.v. ‘the number of particles emitted in n minutes’.
Then X ~ Po(nλ).
We accept $H_A$ (and reject $H_B$) if X ≤ 1
and accept $H_B$ (and reject $H_A$) if X ≥ 2.
If n = 1, then X ~ Po(λ).

(a) P(rejecting $H_A$ | $H_A$ is true) = P(X ≥ 2 | X ~ Po(1.0))
Now, if X ~ Po(1.0),
$P(X = x) = e^{-1}\,\frac{(1.0)^{x}}{x!}$, $x = 0, 1, 2, \ldots$
So P(X ≥ 2) = 1 − P(X = 0) − P(X = 1)
$= 1 - e^{-1} - e^{-1}$
$= 1 - 2e^{-1}$
= 1 − 0.736
= 0.264
Therefore the probability of rejecting $H_A$ when $H_A$ is true is 0.264
(3 d.p.).

(b) P(rejecting $H_B$ | $H_B$ is true) = P(X ≤ 1 | X ~ Po(4.0))
If X ~ Po(4.0), then
$P(X = x) = e^{-4}\,\frac{(4.0)^{x}}{x!}$, $x = 0, 1, 2, \ldots$
So P(X ≤ 1) = P(X = 0) + P(X = 1)
$= e^{-4} + 4e^{-4}$
$= 5e^{-4}$
= 0.092
Therefore the probability of rejecting $H_B$ when $H_B$ is true is 0.092
(3 d.p.).

Now under $H_A$: X ~ Po((1.0)n)
under $H_B$: X ~ Po((4.0)n)
If P(rejecting $H_B$ | $H_B$ is true) < 0.001 then
P(X ≤ 1 | X ~ Po(4n)) < 0.001.
If X ~ Po(4n) then
$P(X = x) = e^{-4n}\,\frac{(4n)^{x}}{x!}$, $x = 0, 1, 2, \ldots$
We have P(X ≤ 1) = P(X = 0) + P(X = 1)
$= e^{-4n} + e^{-4n}\,4n$
$= e^{-4n}(1 + 4n)$

So, we require to find n such that
$e^{-4n}(1 + 4n) < 0.001$
By trial, when n = 1, $e^{-4}(5) = 0.092 > 0.001$
when n = 2, $e^{-8}(9) = 0.00302 > 0.001$
when n = 3, $e^{-12}(13) = 0.00008 < 0.001$
Therefore, the minimum number of complete minutes is 3.

In this case, when n = 3, X ~ Po(3λ).
So P(rejecting $H_A$ | $H_A$ is true) = P(X ≥ 2 | X ~ Po(3(1.0)))
If X ~ Po(3) then
$P(X = x) = e^{-3}\,\frac{3^{x}}{x!}$, $x = 0, 1, 2, \ldots$
So P(X ≥ 2) = 1 − P(X = 0) − P(X = 1)
$= 1 - e^{-3} - 3e^{-3}$
$= 1 - 4e^{-3}$
= 0.801 (3 d.p.)
Therefore the probability of rejecting $H_A$ when $H_A$ is true is 0.801.
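The "by trial" search for the minimum counting time can be written as a short loop. This sketch assumes the decision rule of the example (accept the younger-age hypothesis when X ≤ 1) and uses illustrative function names.

```python
from math import exp

def prob_reject_hb_when_true(n, lam_b=4.0):
    """P(X <= 1) when X ~ Po(n * lam_b): the 15 000-year hypothesis is wrongly rejected."""
    mean = n * lam_b
    return exp(-mean) * (1 + mean)

def prob_reject_ha_when_true(n, lam_a=1.0):
    """P(X >= 2) when X ~ Po(n * lam_a): the 7000-year hypothesis is wrongly rejected."""
    mean = n * lam_a
    return 1 - exp(-mean) * (1 + mean)

def minimum_minutes(threshold=0.001):
    """Smallest whole number of minutes n with P(reject H_B | H_B true) < threshold."""
    n = 1
    while prob_reject_hb_when_true(n) >= threshold:
        n += 1
    return n

n_min = minimum_minutes()
print(n_min)                                       # 3
print(round(prob_reject_ha_when_true(n_min), 3))   # about 0.801
```

Lengthening the count protects the second hypothesis but raises the chance of wrongly rejecting the first from about 0.264 to about 0.801, because the acceptance rule X ≤ 1 is not changed as n grows.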

Example 9.31 Two hypotheses concerning the probability density function of a
random variable X are

$H_0$: $f(x) = \tfrac{1}{4}(x + 1)$ for $0 < x < 2$, and $f(x) = 0$ otherwise

$H_1$: $f(x) = \tfrac{1}{4}x^{3}$ for $0 < x < 2$, and $f(x) = 0$ otherwise

Sketch the p.d.f. in each case.
The following test procedure is decided upon. A single observation
of X is made and if X is less than a particular value k, where
0 < k < 2, then H₀ is accepted, otherwise H₁ is accepted.
(a) Find k if P(Type I error) = 0.1.
(b) With this value of k, find P(Type II error).

Solution 9.31

[Sketches: the p.d.f. under H₀, f(x) = ¼(x + 1), and under H₁, f(x) = ¼x³, each for 0 < x < 2]

(a) If X < k, we accept H₀. If X ≥ k, we accept H₁.

We need to find k such that P(Type I error) = 0.1,
i.e. P(accept H₁ | H₀ is true) = 0.1

At this stage it is important to rewrite the statements as follows:
$P\left(X \ge k \mid f(x) = \tfrac{1}{4}(x + 1)\right) = 0.1$

Under H₀, $P(X \ge k) = \int_{k}^{2} \tfrac{1}{4}(x + 1)\,dx$

So $\int_{k}^{2} \tfrac{1}{4}(x + 1)\,dx = 0.1$

$\left[\tfrac{1}{2}x^{2} + x\right]_{k}^{2} = 0.4$

$2 + 2 - \tfrac{1}{2}k^{2} - k = 0.4$

$k^{2} + 2k - 7.2 = 0$
$(k + 1)^{2} = 8.2$
$k + 1 = \pm 2.86$
Therefore k = 1.86 since 0 < k < 2.

(b) Now, when k = 1.86
P(Type II error) = P(accept H₀ | H₁ is true)
$= P\left(X < 1.86 \mid f(x) = \tfrac{1}{4}x^{3}\right)$
$= \int_{0}^{1.86} \tfrac{1}{4}x^{3}\,dx$
$= \left[\tfrac{1}{16}x^{4}\right]_{0}^{1.86}$
= 0.748
Therefore, when k = 1.86, P(Type II error) = 0.748.
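The value of k can also be found numerically from the cumulative distribution functions implied by the two densities, F₀(x) = (x²/2 + x)/4 and F₁(x) = x⁴/16 on 0 < x < 2. The sketch below uses bisection; the function names and the tolerance are illustrative assumptions.

```python
def cdf_h0(x):
    """P(X <= x) under H0, where f(x) = (x + 1)/4 on 0 < x < 2."""
    return (x**2 / 2 + x) / 4

def cdf_h1(x):
    """P(X <= x) under H1, where f(x) = x**3 / 4 on 0 < x < 2."""
    return x**4 / 16

def solve_k(type_1=0.1, tol=1e-10):
    """Bisection on 0 < k < 2 for P(X >= k | H0) = type_1."""
    lo, hi = 0.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1 - cdf_h0(mid) > type_1:   # too much area to the right of mid
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

k = solve_k()
print(round(k, 2))              # 1.86
print(round(cdf_h1(1.86), 3))   # P(Type II error) with k rounded to 1.86, about 0.748
print(round(cdf_h1(k), 3))      # with unrounded k, about 0.754
```

The small difference between the last two figures comes only from rounding k to 1.86 before evaluating the Type II error, as the worked solution does.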

Example 9.32 To test whether a coin is fair, the following decision rule is adopted.
Toss the coin 120 times; if the number of heads is between 50 and
70 inclusive, accept the hypothesis that the coin is fair, otherwise
reject it.
(a) Find the probability of rejecting the hypothesis when it is
correct.
(b) How should the decision rule be modified if
P(Type I error) < 0.01?
(c) With the original decision rule, find P(Type II error) if the coin
is biased and the probability that a head is obtained is in fact 0.6.

Solution 9.32 Let X be the r.v. ‘the number of heads obtained’.
Then X ~ Bin(n, p) with n = 120.
Now, as n is large, X ~ N(np, npq) approximately, where q = 1 − p.
H₀: The coin is fair (p = ½)
H₁: The coin is biased (p ≠ ½)
Under H₀,
np = (120)(½) = 60 and npq = (120)(½)(½) = 30
So X ~ N(60, 30)

(a) Under H₀,
P(50 ≤ X ≤ 70) → P(49.5 < X < 70.5) (continuity correction)
$= P\left(\frac{49.5 - 60}{\sqrt{30}} < Z < \frac{70.5 - 60}{\sqrt{30}}\right)$
= P(−1.917 < Z < 1.917)
= 0.9446

[Diagram: N(60, 30); accept H₀ between 49.5 and 70.5 (standardised values −1.917 and 1.917), reject H₀ in both tails]

So P(accepting H₀ | H₀ is true) = 0.9446
P(rejecting H₀ | H₀ is true) = 1 − 0.9446 = 0.0554.

(b) If P(Type I error) < 0.01 then we need to find a value a such that
Φ(a) = 0.995. Now, from tables, a = 2.575.
So, if X ~ N(60, 30), then the value of X corresponding to the
standardised value of 2.575 is
60 + 2.575√30 = 74.1 (3 S.F.)
The value corresponding to the standardised value of −2.575 is
60 − 2.575√30 = 45.9 (3 S.F.)

[Diagram: N(60, 30); accept H₀ between 45.9 and 74.1, reject H₀ in both tails]

Therefore, the decision rule becomes:

Accept the hypothesis that the coin is fair if the number of heads
lies between 46 and 74 inclusive, otherwise reject it.

(c) H₀: coin is fair (p = 0.5)
H₁: coin is biased (p = 0.6)
We accept H₀ if the number of heads lies between 50 and 70
inclusive.
Now P(Type II error) = P(accepting H₀ | H₁ is true)
= P(49.5 < X < 70.5 | p = 0.6)
Now, if p = 0.6, np = (120)(0.6) = 72
npq = (120)(0.6)(0.4) = 28.8
So X ~ N(72, 28.8).
$P(49.5 < X < 70.5) = P\left(\frac{49.5 - 72}{\sqrt{28.8}} < Z < \frac{70.5 - 72}{\sqrt{28.8}}\right)$
= P(−4.193 < Z < −0.2795)
= 0.390 (3 S.F.)
Therefore, P(Type II error) = 0.390 (3 S.F.).

[Diagram: N(72, 28.8); the area between 49.5 and 70.5 (standardised values −4.193 and −0.28) is shaded]

We see that, with the given decision rule, there is a fairly high
probability that the coin will be accepted as fair when in fact
p=0.6.

[Diagram: distributions of the number of heads under H₀ and H₁, with the region where H₀ is accepted starting at 49.5; P(Type II error) and P(Type I error) shaded]
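The calculations in Solution 9.32 use the normal approximation with a continuity correction. The sketch below repeats them with `statistics.NormalDist` from the Python standard library (available from Python 3.8); the helper `accept_probability` is an illustrative name, and the results can differ from the tabulated values in the last decimal place because tables round z.

```python
from math import sqrt
from statistics import NormalDist

def accept_probability(n, p, low, high):
    """Approximate P(low <= X <= high) for X ~ Bin(n, p) using
    N(np, npq) with a continuity correction of 0.5 at each end."""
    dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return dist.cdf(high + 0.5) - dist.cdf(low - 0.5)

# (a) original rule: accept 'fair' if 50 <= heads <= 70
type_1 = 1 - accept_probability(120, 0.5, 50, 70)            # about 0.055
# (b) modified rule: accept 'fair' if 46 <= heads <= 74
type_1_modified = 1 - accept_probability(120, 0.5, 46, 74)   # below 0.01
# (c) original rule when in fact p = 0.6
type_2 = accept_probability(120, 0.6, 50, 70)                # about 0.390
```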

Example 9.33 A sample of size 100 is taken from a normal population with
unknown mean μ and known variance 36. An investigator wishes to
test the hypotheses H₀: μ = 65, H₁: μ > 65. He decides on the
following criteria:
accept H₀ if the sample mean x̄ ≤ 66.5
reject H₀ if x̄ > 66.5
Find the probability that he makes a Type I error.
If he uses as alternative hypothesis H₁: μ = 67.9, find the
probability that he makes a Type II error.
On which critical value should he decide for the sample mean if he
wants P(Type I error) = P(Type II error)?

Solution 9.33 Under H₀,
$\bar{X} \sim N\left(\mu, \frac{\sigma^{2}}{n}\right)$ with μ = 65, σ = 6, n = 100

He rejects H₀ if x̄ > 66.5.

Now
$P(\bar{X} > 66.5) = P\left(Z > \frac{66.5 - 65}{6/10}\right)$
= P(Z > 2.5)
= 0.00621

[Diagram: distribution of X̄ under H₀ centred at 65, with 66.5 at standardised value 2.5]

Therefore, the probability that he rejects H₀, when in fact H₀ is
true (Type I error) is 0.00621.

If H₁: μ = 67.9, then under H₁
P(Type II error) = P(accept H₀ | H₁ is true)
= P(X̄ < 66.5 | μ = 67.9)
$= P\left(Z < \frac{66.5 - 67.9}{6/10}\right)$
= P(Z < −2.333)
= 0.00982
Therefore, P(Type II error) = 0.00982.

[Diagram: distribution of X̄ under H₁ centred at 67.9, with 66.5 at standardised value −2.333]

If he wants P(Type I error) = P(Type II error) then the critical value
x̄ should be fixed so that
P(X̄ > x̄ | H₀ is true) = P(X̄ < x̄ | H₁ is true)
As the variances of the distributions given by H₀ and H₁ are equal,
we see, by symmetry, that the value of x̄ lies mid-way between 65
and 67.9.
Therefore, he should take as critical value x̄ = ½(65 + 67.9) = 66.45.

[Diagram: the two distributions of X̄ with the critical value mid-way between the means; P(Type II error) and P(Type I error) shaded]
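For a test on a sample mean with known variance, both error probabilities follow from the sampling distribution N(μ, σ²/n). A brief sketch, using `statistics.NormalDist` and an illustrative helper `error_probabilities`, is:

```python
from math import sqrt
from statistics import NormalDist

def error_probabilities(mu0, mu1, sigma, n, critical):
    """Return (P(Type I), P(Type II)) for the rule 'reject H0 if the
    sample mean exceeds critical', with mu1 > mu0 and known sigma."""
    se = sigma / sqrt(n)                                # standard error of the mean
    type_1 = 1 - NormalDist(mu0, se).cdf(critical)      # reject H0 when mu = mu0
    type_2 = NormalDist(mu1, se).cdf(critical)          # accept H0 when mu = mu1
    return type_1, type_2

t1, t2 = error_probabilities(65, 67.9, 6, 100, 66.5)
print(round(t1, 5), round(t2, 5))    # about 0.00621 and 0.00982

# Mid-way critical value: the two error probabilities coincide
e1, e2 = error_probabilities(65, 67.9, 6, 100, (65 + 67.9) / 2)
```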

Example 9.34 The ingredients for concrete are mixed together to obtain a mean
breaking strength of 2000 newtons. If the mean breaking strength
drops below 1800 newtons then the composition must be changed.
The distribution of the breaking strength is normal with standard
deviation 200 newtons.
Samples are taken in order to investigate the hypotheses:
H₀: μ = 2000 newtons
H₁: μ = 1800 newtons
How many samples must be tested so that
P(Type I error) = α = 0.05
and P(Type II error) = β = 0.1?

Solution 9.34

[Figure: sampling distributions of $\bar{X}$ centred on $\mu = 1800$ and $\mu = 2000$; 'Accept $H_1$' to the left of the critical value, 'Accept $H_0$' to the right]

Under $H_0$, $X \sim N(2000, 200^2)$

So, for a random sample of size n,

$\bar{X} \sim N\left(2000, \frac{200^2}{n}\right)$

Now $\alpha$ corresponds to a standardised value of $-1.645$,

i.e. $a = 2000 - 1.645\left(\frac{200}{\sqrt{n}}\right)$   (i)

Under $H_1$, $X \sim N(1800, 200^2)$

So, for a random sample of size n,

$\bar{X} \sim N\left(1800, \frac{200^2}{n}\right)$

Now $\beta$ corresponds to a standardised value of 1.282,

i.e. $b = 1800 + 1.282\left(\frac{200}{\sqrt{n}}\right)$   (ii)

Now, if we find a value for n such that $\alpha = 0.05$ and $\beta = 0.1$, then $a = b$.

Equating (i) and (ii)

$2000 - 1.645\left(\frac{200}{\sqrt{n}}\right) = 1800 + 1.282\left(\frac{200}{\sqrt{n}}\right)$

$200 = \frac{200}{\sqrt{n}}(1.282 + 1.645)$

$\sqrt{n} = 2.927$

$n = 8.57$

So the estimated number of samples which need to be tested is 9.
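The algebra in Solution 9.34 reduces to $\sqrt{n} = (z_\alpha + z_\beta)\sigma / |\mu_0 - \mu_1|$. A short sketch of that calculation follows, assuming a one-tailed test with known $\sigma$; the function name `required_sample_size` is hypothetical and chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(mu0, mu1, sigma, alpha, beta):
    """Unrounded n at which P(Type I) = alpha and P(Type II) = beta
    for a one-tailed test on a normal mean with known sigma."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645 when alpha = 0.05
    z_beta = NormalDist().inv_cdf(1 - beta)    # about 1.282 when beta = 0.1
    root_n = (z_alpha + z_beta) * sigma / abs(mu0 - mu1)
    return root_n ** 2


n_exact = required_sample_size(2000, 1800, 200, 0.05, 0.1)
print(round(n_exact, 2), ceil(n_exact))  # round up to a whole number of samples
```

Because the standardised values are not rounded to three decimal places here, the unrounded result may differ slightly from 8.57 in the last digit; the rounded-up answer is unaffected.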


Exercise 9k

1. Two separate tests are proposed to determine whether a coin is biased or unbiased. These are:
Test 1: Toss the coin 4 times, and conclude that it is biased if all 4 tosses give the same result, and unbiased otherwise.
Test 2: Toss the coin 7 times, and conclude that it is biased if at least 6 of the tosses give the same result, and unbiased otherwise.
(a) Suppose that the coin is unbiased. Show that each test has the same probability of giving a wrong conclusion.
(b) Suppose that the coin is such that the probability of a head in any toss is [value illegible]. Determine which test is more likely to give the conclusion that the coin is biased. (MEI)

2. Two alternative hypotheses concerning the probability density function of a random variable are
$H_0$: $f(x) = 2x$, $0 < x < 1$
        $= 0$ otherwise
$H_1$: $f(x) = 2(1 - x)$, $0 < x < 1$
        $= 0$ otherwise
Give a sketch of the probability density function for each case.
The following test procedure is decided upon. A single observation of X is made and if X exceeds a particular value a, where $0 < a < 1$, then $H_0$ is accepted, otherwise $H_1$ is accepted. Find the value of a if the probability of accepting $H_1$ given that $H_0$ is true is [value illegible]. With this value of a, find the probability of accepting $H_0$ given that $H_1$ is true. (C)

3. A manufacturer makes two grades of squash ball, 'slow' and 'fast'. Slow balls have a 'bounce' (measured under standard conditions) which is known to be a normal variable with mean 10 cm and standard deviation 2 cm. The 'bounce' of fast balls is a normal variable with mean 15 cm and standard deviation 2 cm. A box of balls is unlabelled so that it is not known whether they are all slow or all fast. Devise a test, based on a single observation of the bounce of one ball, such that the probability of deciding that the box contains fast balls when in fact it contains slow balls, i.e. the Type I error, is equal to the Type II error.
Devise a test, based on an observation of the mean bounce of a sample of 4 balls from the box, such that the Type I error is 0.05 and state the magnitude of the Type II error for this test. (C)

4. Two hypotheses concerning the probability density function of a random variable are
$H_0$: $f(x) =$ [expression illegible]
        $= 0$ otherwise
$H_1$: $f(x) =$ [expression illegible]
        $= 0$ otherwise
Give a sketch of the probability density function for each case.
The following test procedure is decided upon:
A single observation of X is made and if X is less than a particular value a, where $1 < a < 2$, then $H_0$ is accepted; otherwise $H_1$ is accepted.
Find a such that, when $H_0$ is true, the test procedure leads, with probability 0.1, to the acceptance of $H_1$. With this value of a, find the probability that, when $H_1$ is true, the test procedure leads to the acceptance of $H_0$. (C)

5. One of two dice is loaded so that there is a probability of 0.2 of throwing a six with it, nothing being known about the other scores. The other die is fair. A person is given one of these dice (which is just as likely to be the fair as the biased one), together with the above information and is asked to discover which die it is. He decides to throw the die 10 times; if there are two or more sixes he will assert that the die is biased, otherwise he will assert that it is fair. Calculate the probability of his asserting that the die is (a) biased when it is, in fact, fair; (b) fair when it is, in fact, biased. What is the probability that his choice will be incorrect?
If, instead, he decided to throw the die 240 times and will assert that the die is biased if there are N or more sixes, use the normal approximation to the binomial distribution to estimate N if the probability of his asserting that it is fair when it is biased is to be 0.2. (SUJB)

6. An automated engineering process for manufacturing components includes an automatic screening of the output to reject defective components. The process gives on average 5% of defectives. The probability that the screening stage identifies correctly a defective component is 98% but there is also a probability of 6% that a component which is not defective is rejected at the screening stage. What is the proportion of all components which is rejected and what is the proportion of all components passed from the screening stage that is still defective? (MEI)

7. In order to examine a six-sided die for bias, one face is marked, the die is tossed a pre-determined number of times, and the number of times the marked face is uppermost is recorded.
(a) If this occurred r times in n tosses, explain how you would decide if this provided significant evidence of bias. Do not consider any approximate methods in this part.
(b) Would you consider it likely to be biased if the marked face came up once in 30 tosses?
(c) Would you consider it likely to be biased if the marked face came up 39 times in 180 tosses? (O)

8. Flour is packed in bags. The combined mass, X grams, of a full bag and its contents is a normally distributed random variable with mean $\mu$ grams and standard deviation 5 grams. When the packing machine is working correctly $\mu = 136$, but when the packing machine is working incorrectly $\mu = 130$. Show that the probabilities of a randomly chosen bag having a combined mass of less than 131.5 grams when the machine is working (a) correctly, (b) incorrectly, are approximately 0.2 and 0.6 respectively.
When X is less than 131.5 the bag is underweight. Using the approximate probability 0.2, determine the probability that, when the machine is working correctly, in a random sample of five bags there are precisely k bags which are underweight, for k = 3, k = 4 and k = 5.
The machine is presumed to be working incorrectly if the number of underweight bags found in a random sample of five bags is equal to or greater than r.
Determine the minimum value of r which gives a probability less than 0.01 of presuming the machine to be working incorrectly when it is working correctly. (C)

9. One suggested test for deciding whether a coin is fair or not is to toss it four times and call it 'biased' if four heads or four tails are obtained. A second suggested way is to toss it seven times and call it biased if six or seven heads, or six or seven tails, are obtained. Show that both these tests would be equally likely to conclude wrongly that a fair coin was biased.
Which of these two suggested tests would be better for correctly judging as biased a coin whose probability of coming down heads was 2/3?
Are any of the above results statistically significant? (SMP)

10. A fair coin is tossed 100 times. Use a normal approximation to determine the probability of obtaining (a) more than 57 heads, (b) more than 58 heads.
It is desired to construct a significance test to choose between the following two hypotheses concerning the possible bias of a coin:
$H_0$: the probability that the coin falls heads is 0.5
$H_1$: the probability that the coin falls heads is 0.6
The coin is to be tossed 100 times and the number of heads, X, recorded. Construct a significance test based upon the observed value of X such that the probability of accepting $H_1$ when $H_0$ is true is as close as possible to 0.05. For this test calculate the probability of accepting $H_0$ when $H_1$ is true. (C)

11. Two alternative hypotheses for the probability density function of a random variable X are given below.
$H_0$: $f(x) =$ [expression illegible]
        $= 0$ otherwise
$H_1$: $f(x) =$ [expression illegible]
        $= 0$ otherwise.
Design a test, based on a single observation of X, such that the probability of wrongly accepting $H_0$ is 0.05.
Design also a test, based on a single observation of X, such that the probability of wrongly accepting $H_0$ is twice the probability of wrongly accepting $H_1$. (C)

12. You are provided with a coin which may be biased. In order to test this you are allowed to toss it 12 times and count the number, r, of heads and to use the value of r to decide. If the coin is really fair you wish to have at least a 95% chance of saying so. For what values of r should you say that the coin is fair?
If you adopt your procedure with a coin which is actually biased two to one in favour of heads, what is the probability that you decide the coin is biased? (O)

13. Random samples of 400 seeds are taken from a large batch. For this batch the probability of a randomly chosen seed germinating is $\alpha$. The r.v. X is defined as the number of germinating seeds in a sample. Use an appropriate normal approximation to determine the values of
(a) $P(X < 340 \mid \alpha = 0.9)$
(b) $P(X > 340 \mid \alpha = 0.8)$
The seed assessor knows that the value of $\alpha$ is either 0.8 or 0.9. Suppose that, in fact, out of the 400 seeds in a particular sample a total of x germinate. The assessor decides that the value of $\alpha$ is 0.8 if

$Z = P(X > x \mid \alpha = 0.8) - P(X < x \mid \alpha = 0.9)$

is positive. Otherwise he decides that $\alpha$ is 0.9. Determine the assessor's decision for each of the cases x = 330, x = 340, x = 350. (C)
10
THE $\chi^2$ TEST
We now investigate the use of the chi-squared distribution in
significance testing. It has a very complicated p.d.f. which has been
included only for completeness. The information required for the
test will be obtained from tables.

THE CHI-SQUARED DISTRIBUTION

Consider the r.v. X with p.d.f. f(x) where

$f(x) = K_\nu\, x^{\frac{\nu}{2} - 1} e^{-\frac{x}{2}}, \quad x > 0$

X has one parameter, $\nu$, and the constant $K_\nu$ depends on this parameter.
If X is distributed in this way, we write

$X \sim \chi^2(\nu)$

NOTE: $\chi^2$ is pronounced 'kye-squared'.
The shape of the distribution for various values of $\nu$ is shown:

[Figure: graphs of f(x) against x for various values of $\nu$]

The probability that the value of X is greater than a particular value $\chi^2_p$ is given by the tail area, shaded in the diagram.

So $P(X > \chi^2_p) = \int_{\chi^2_p}^{\infty} K_\nu\, x^{\frac{\nu}{2} - 1} e^{-\frac{x}{2}}\, dx$

However, this integral is very difficult to evaluate, so we refer to $\chi^2$ tables.

In significance tests we are usually concerned with the values of $\chi^2_p$ such that

$P(X > \chi^2_{5\%}) = 0.05$ (5% of the area is in the tail)
$P(X > \chi^2_{1\%}) = 0.01$ (1% of the area is in the tail)

These values are summarised on p. 637; the first few lines are reproduced below:

[Table: values of $\chi^2_{5\%}(\nu)$ and $\chi^2_{1\%}(\nu)$ for the first few values of $\nu$]

For example, if $\nu = 4$,

$\chi^2_{5\%}(4) = 9.49$        $\chi^2_{1\%}(4) = 13.28$

$P(X > 9.49) = 0.05$        $P(X > 13.28) = 0.01$
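Although the text relies on tables, the tail area can also be approximated numerically. The sketch below is one possible approach and not the book's method: it sums the standard series for the regularised lower incomplete gamma function, using only the Python standard library. The name `chi2_upper_tail` is an illustrative assumption.

```python
import math


def chi2_upper_tail(x, v):
    """Approximate P(X > x) for X ~ chi-squared with v degrees of freedom, x > 0."""
    a, half_x = v / 2, x / 2
    # Series: gamma(a, t) = t^a e^(-t) * sum_{k>=0} t^k / (a (a+1) ... (a+k))
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= half_x / (a + k)
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return 1.0 - lower


# Tail areas beyond the tabulated critical values for v = 4.
print(round(chi2_upper_tail(9.49, 4), 3))
print(round(chi2_upper_tail(13.28, 4), 3))
```

The two printed values should be close to 0.05 and 0.01, matching the tabulated 5% and 1% points for $\nu = 4$.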

THE $\chi^2$ TEST
Consider an experiment or situation which results in n observed frequencies, written $O_i$, $i = 1, 2, \ldots, n$.
Say we wish to make a hypothesis about the distribution; we could then calculate the frequencies expected under this hypothesis, written $E_i$, $i = 1, 2, \ldots, n$. We now decide how well the 'observed' data fits the 'expected' data, and consider whether it is likely that the differences can be attributed to chance.
Now, the comparison between the observed frequencies ($O_i$) and the expected frequencies ($E_i$), for $i = 1, 2, \ldots, n$ (that is, for n pairs of values, or classes) is made by considering the statistic

$\sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$

For high values of $O_i$ and $E_i$ this statistic approximates to the chi-squared distribution.

We define

$\chi^2_{\text{calc}} = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$

Now
if $\chi^2_{\text{calc}} = 0$ then there is exact agreement between the observed and the expected data,
if $\chi^2_{\text{calc}} > 0$ then $O_i$ and $E_i$ do not agree exactly and, for a given value of $\nu$, the larger the value of $\chi^2_{\text{calc}}$ the greater the discrepancy.

For a test performed at the 5% level:
if $\chi^2_{\text{calc}} > \chi^2_{5\%}(\nu)$, then we consider the discrepancy to be too large and reject the null hypothesis,
if $\chi^2_{\text{calc}} < \chi^2_{5\%}(\nu)$, we do not reject the null hypothesis.

[Figure: the $\chi^2(\nu)$ distribution with the 5% upper tail beyond $\chi^2_{5\%}$ shaded as the rejection region]

NOTE: (a) When using the chi-squared distribution we are approximating from a discrete to a continuous distribution.
(i) The approximation is not valid if the expected frequency is less than 5. This problem can be overcome by combining two or more classes with small frequencies to form a class sufficiently large.
(ii) If $\nu = 1$, it is advisable to use Yates' continuity correction. In this case

$\chi^2_{\text{calc}} = \sum_{i=1}^{n} \frac{(|O_i - E_i| - 0.5)^2}{E_i}$

(b) When the value of $\chi^2_{\text{calc}}$ is very small, it is wise to query the reliability of the observed data and to question whether they have been 'fiddled'.

Degrees of freedom
The parameter $\nu$ is known as the number of degrees of freedom.
Now the number of degrees of freedom associated with a statistic is given by
$\nu$ = number of independent variables involved in calculating the statistic

This can be found by considering

$\nu$ = number of classes − number of restrictions

When considering the statistic $\sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$, the number of classes is the number of pairs of values, i.e. there are n classes. The number of restrictions involved depends on the null hypothesis. We consider several cases in the following examples.

UNIFORM DISTRIBUTION

Example 10.1 The table shows the number of employees absent for a single day during a particular period of time:

[Table: number of absentees for each day of the week]

(a) Find the frequencies expected under the hypothesis that the number of absentees is independent of the day of the week.
(b) Test, at the 5% level, whether the differences in the observed and expected data are significant.

Solution 10.1 (a) If the number of absentees is independent of the day of the week, then we would expect the total of 500 to be spread uniformly throughout the week, so that the expected number of absentees for any day is 100.

Expected frequencies:

[Table: expected number of absentees, 100 for each day]

(b) Degrees of freedom, $\nu$

Now $\nu$ = number of independent variables
= number of classes − number of restrictions

There are 5 classes and, since the total expected and observed frequencies each have to be 500, there is one restriction.
Therefore $\nu = 5 - 1 = 4$
The $\chi^2$ test is carried out as follows:

Reminders: Significance test

(1) Make the null hypothesis ($H_0$)
$H_0$: the number of absentees is independent of the day of the week

(2) Work out the number of degrees of freedom, $\nu$
$\nu$ = number of classes − number of restrictions = 5 − 1 = 4

(3) Decide on the level of the test
Test at the 5% level

(4) Decide on rejection criterion
Consider $\chi^2(4)$ and reject $H_0$ if $\chi^2_{\text{calc}} > \chi^2_{5\%}(4)$, i.e. if $\chi^2_{\text{calc}} > 9.49$

(5) Calculate $\chi^2_{\text{calc}}$
[Table: $O_i$, $E_i$ and $\frac{(O_i - E_i)^2}{E_i}$ for each day, with $\sum O_i = 500$ and $\sum E_i = 500$]
Therefore $\chi^2_{\text{calc}} = 10.56$

(6) Make conclusion
As $\chi^2_{\text{calc}} > 9.49$ we reject $H_0$ and conclude that the number of absentees is not independent of the day of the week.

NOTE: (a) The test does not indicate what the relationship might be between number of absentees and the day of the week. However, a look at the observed frequencies suggests a tendency towards a greater number of absentees on Mondays and Fridays.
(b) When working out the table we are not concerned whether $O_i - E_i$ is positive or negative, as we require $(O_i - E_i)^2$. Therefore we find $|O_i - E_i|$.

Example 10.2 An ordinary die is thrown 120 times and each time the number on the uppermost face is noted. The results are as follows:

[Table: observed frequency of each score, 1 to 6]

Perform a $\chi^2$ test, at the 5% level, to investigate whether the die is fair.

Solution 10.2 $H_0$: The die is fair.

Now $\sum O_i = \sum E_i$, therefore there is one restriction, namely that the totals agree.
$\nu$ = number of classes − number of restrictions
= 6 − 1
= 5
Therefore $\nu = 5$ and we consider the $\chi^2(5)$ distribution.

We will test at the 5% level and reject $H_0$ if $\chi^2_{\text{calc}} > \chi^2_{5\%}(5)$, i.e. if $\chi^2_{\text{calc}} > 11.07$,

where $\chi^2_{\text{calc}} = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i}$

Now, under $H_0$ (that the die is fair) we would expect each number to occur the same number of times,
so $E_i = 20$ for $i = 1, 2, \ldots, 6$.

[Table: $O_i$, $E_i$ and $\frac{(O_i - E_i)^2}{E_i}$ for each score]

Therefore $\chi^2_{\text{calc}} = 4.4$.

As $\chi^2_{\text{calc}} < 11.07$ we do not reject $H_0$ and conclude that the differences between the observed and expected frequencies are not significant at the 5% level and the die is fair.

DISTRIBUTION IN A GIVEN RATIO


Example 10.3 According to genetic theory the number of colour-strains pink, white and blue in a certain flower should appear in the ratio 3:2:5. For 100 plants, the results were as follows:

[Table: observed number of plants of each colour]

Are the differences between the observed and expected frequencies significant, at the 1% level?

Solution 10.3 $H_0$: the colours pink, white and blue occur in the ratio 3:2:5.

Now $\sum O_i = \sum E_i = 100$, so there is one restriction, that the totals agree.

So $\nu$ = number of classes − number of restrictions
= 3 − 1
= 2

Therefore $\nu = 2$ and we consider the $\chi^2(2)$ distribution.

We test at the 1% level and reject $H_0$ if $\chi^2_{\text{calc}} > \chi^2_{1\%}(2)$, i.e. if $\chi^2_{\text{calc}} > 9.21$, where

$\chi^2_{\text{calc}} = \sum_{i=1}^{3} \frac{(O_i - E_i)^2}{E_i}$

Now under $H_0$ we expect the colours pink, white and blue to appear in the ratio 3:2:5, so the expected frequencies are

$\frac{3}{10}(100) : \frac{2}{10}(100) : \frac{5}{10}(100) = 30 : 20 : 50$

As $\chi^2_{\text{calc}} < 9.21$, we do not reject $H_0$ and conclude that the differences in observed and expected frequencies are not significant at the 1% level.
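The step from a hypothesised ratio to expected frequencies in Example 10.3 is simple proportional sharing of the total. A minimal sketch follows; the helper name `expected_from_ratio` is hypothetical.

```python
def expected_from_ratio(ratio, total):
    """Share `total` observations among the classes in the given ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


# Pink : white : blue = 3 : 2 : 5 for 100 plants.
expected = expected_from_ratio([3, 2, 5], 100)
print(expected)
```

The same helper covers the uniform case by passing a ratio of equal parts.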

Exercise 10a

1. A tetrahedral die is thrown 120 times and the number on which it lands is noted.

Frequency   35   32   25   28   Total 120

Test, at the 5% level, whether the die is fair.

2. From a list of 500 digits, the occurrence of each digit is noted.

Digit       0   1   2   3   4   5   6   7   8   9
Frequency   40  58  49  53  38  56  61  53  60  32

Test, at the 1% level, whether the sequence is a random sample from a uniform distribution.

3. The outcomes, A, B and C, of a certain experiment are thought to occur in the ratio 1:2:1. The experiment is performed 200 times and the observed frequencies of A, B and C are 36, 115 and 49 respectively. Is the difference in the observed and expected results significant? Test at the 5% level.

4. According to genetic theory the number of colour strains red, yellow, blue and white in a certain flower should appear in proportions 4:12:5:4. Observed frequencies of red, yellow, blue and white strains amongst 800 plants were 110, 410, 150, 130 respectively. Are these differences from the expected frequencies significant at the 5% level? If the number of plants had been 1600 and the observed frequencies 220, 820, 300, 260, would the difference have been significant at the 5% level? (C Additional)

5. It is thought that each of the 8 outcomes of an experiment is equally likely to occur. When the experiment is performed 400 times, the observed frequencies are 45, 42, 55, 53, 40, 62, 47 and 56. Perform a test at the 1% level to investigate the validity of the theory.

6. In a particular subject students are set multiple choice questions each of which contains 5 alternatives A, B, C, D and E. A teacher suggests that when students do not know the correct answer they are twice as likely to choose one of B, C or D than to choose A or E. For 160 questions where it was known that the student answered without knowing the correct answer, A, B, C, D, E were chosen 23, 45, 36, 43 and 13 times respectively. Is there evidence, at the 5% level, to support the teacher's theory?

7. For a given set of data the observed and expected frequencies are shown:

Observed frequency   80   31   42   40   57
Expected frequency   88   45   36   36   45

Are the differences between the observed and expected frequencies significant at the 1% level?

‘GOODNESS OF FIT’ TESTS


We now illustrate the use of the chi-squared test to investigate
whether an observed distribution fits a well-known distribution
such as the binomial, Poisson or normal. The test is often referred
to as a ‘goodness of fit’ test.

BINOMIAL DISTRIBUTION, p known

Example 10.4 Four coins are thrown 160 times, and the distribution of the number of heads is observed to be

x (number of heads)   0    1    2    3    4
f (frequency)         5    35   67   41   12

Find the expected frequencies if the coins are unbiased. Compare the observed and expected frequencies and apply the $\chi^2$ test. Is there any evidence that the coins are biased? (AEB 1974)

Solution 10.4 Let X be the r.v. 'the number of heads obtained when four coins are thrown'. Then if the coins are unbiased $X \sim \text{Bin}(4, \frac{1}{2})$

and $P(X = x) = {}^4C_x \left(\frac{1}{2}\right)^{4-x} \left(\frac{1}{2}\right)^{x} = {}^4C_x \left(\frac{1}{2}\right)^{4}$, $x = 0, 1, \ldots, 4$

x (number of heads)   P(X = x)   Expected frequency [160P(X = x)]
0                     1/16       10
1                     4/16       40
2                     6/16       60
3                     4/16       40
4                     1/16       10

$\chi^2$ test:
$H_0$: the coins are not biased and P(head) $= \frac{1}{2}$

Degrees of freedom: number of classes = 5
number of restrictions = 1 (totals must agree)

Therefore $\nu = 5 - 1 = 4$, and we consider the $\chi^2(4)$ distribution.

We will test at the 5% level and reject $H_0$ if $\chi^2_{\text{calc}} > \chi^2_{5\%}(4)$, i.e. if $\chi^2_{\text{calc}} > 9.49$.

Now $\chi^2_{\text{calc}} = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i}$

[Table: $O_i$, $E_i$ and $\frac{(O_i - E_i)^2}{E_i}$ for each class]

Therefore $\chi^2_{\text{calc}} = 4.367$ (3 d.p.).

As $\chi^2_{\text{calc}} < 9.49$, we do not reject $H_0$ and conclude that there is no evidence that the coins are biased.
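The whole of Example 10.4 can be reproduced in a few lines. This is a minimal sketch, assuming Python 3.8 or later for `math.comb`; the variable names are illustrative, and the critical value 9.49 is taken from the tables rather than computed.

```python
from math import comb

observed = [5, 35, 67, 41, 12]   # frequencies of 0, 1, 2, 3, 4 heads
trials = sum(observed)           # 160 throws of four coins

# Expected frequencies under H0: X ~ Bin(4, 1/2).
expected = [trials * comb(4, x) * 0.5 ** 4 for x in range(5)]

# Chi-squared statistic: sum of (O - E)^2 / E over the classes.
chi2_calc = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical = 9.49                  # 5% point of chi-squared(4), from tables
print(expected)
print(round(chi2_calc, 3))
print("reject H0" if chi2_calc > critical else "do not reject H0")
```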

BINOMIAL DISTRIBUTION, p unknown

Example 10.5 Samples of size 5 are selected regularly from a production line and tested. During one week 500 samples are taken and the number of defective items in each sample recorded.

[Table: number of defectives, x = 0, 1, 2, 3, 4, 5, against number of samples, f]

(a) Find the frequencies of the number of defectives per sample given by the binomial distribution having the same mean and total as the observed distribution.
(b) Test whether the observed distribution follows a binomial pattern.

Solution 10.5 (a) Now $\bar{x} = \frac{\sum fx}{\sum f} = 1.044$

and n = 5, p = P(defective item).
We require $np = \bar{x}$
i.e. $5p = 1.044$
So $p = 0.2088$
Let X be the r.v. 'the number of defectives in a sample'.
Then $X \sim \text{Bin}(n, p)$ with $n = 5$, $p = 0.2088$, $q = 1 - p = 0.7912$
We have

$P(X = x) = {}^5C_x (0.7912)^{5-x} (0.2088)^{x}$, $x = 0, 1, \ldots, 5$

The expected frequencies can be found by calculating 500P(X = x) for $x = 0, 1, \ldots, 5$.

These are shown in the table. Frequencies have been rounded to the nearest integer.

Number of defectives   0     1     2     3    4   5
Expected frequency     155   205   108   28   4   0
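The expected frequencies for the fitted binomial distribution can be generated directly from the estimated p. The sketch below starts from the sample mean 1.044 quoted in the solution (the observed frequencies themselves are not used); variable names are illustrative.

```python
from math import comb

n, samples = 5, 500
mean = 1.044                     # sample mean quoted in the solution
p = mean / n                     # estimate of P(defective item), 0.2088

# 500 * P(X = x) for X ~ Bin(5, p), x = 0, 1, ..., 5
expected = [samples * comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
rounded = [round(e) for e in expected]
print(rounded)
```

The unrounded values show why the last three classes are later combined: the expected frequencies for 4 and 5 defectives fall below 5.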

(b) Perform a $\chi^2$ test.

$H_0$: the distribution follows a binomial pattern
Now, we note that the expected frequencies for two of the classes are less than 5. So we combine the last three classes to read '3 or more defectives'. Therefore the classes are 0, 1, 2, 3 or more.
Degrees of freedom: The number of classes = 4
The number of restrictions = 2 (totals agree, means agree)
Therefore $\nu = 4 - 2 = 2$ and we consider the $\chi^2(2)$ distribution.
We test at the 5% level and reject $H_0$ if $\chi^2_{\text{calc}} > \chi^2_{5\%}(2)$, i.e. if $\chi^2_{\text{calc}} > 5.99$.

Now $\chi^2_{\text{calc}} = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i}$

[Table: $O_i$, $E_i$ and $\frac{(O_i - E_i)^2}{E_i}$ for the four classes]

Therefore $\chi^2_{\text{calc}} = 5.959$.

As $\chi^2_{\text{calc}} < 5.99$ we do not reject $H_0$ and conclude that the distribution follows a binomial pattern.

POISSON DISTRIBUTION
Example 10.6 Analysis of the goals scored per match by a certain football team gave the following results:

No. goals per match (x)   0    1    2    3    4    5   6   7
No. of matches (f)        14   18   29   18   10   7   3   1

Calculate the mean of the above distribution and the frequencies (each correct to 1 decimal place) associated with a Poisson distribution having the same mean. Perform a $\chi^2$ goodness of fit test to determine whether or not the above distribution can be reasonably modelled by this Poisson distribution. (SUJB)

Solution 10.6 Now $\bar{x} = \frac{\sum fx}{\sum f} = \frac{230}{100} = 2.3$

Consider the r.v. X where $X \sim \text{Po}(2.3)$, X is 'the number of goals per match'.

Then $P(X = x) = \frac{e^{-2.3}(2.3)^x}{x!}$, $x = 0, 1, 2, \ldots$

and the expected frequencies are given by 100P(X = x).
These have been calculated and are shown in the table.

Number of goals per match      0      1      2      3      4      5     6     7     8 or more
Expected frequency (1 d.p.)    10.0   23.1   26.5   20.3   11.7   5.4   2.1   0.7   0.2
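The Poisson expected frequencies for 0 to 7 goals can be regenerated from the fitted mean. A minimal sketch, with illustrative variable names:

```python
from math import exp, factorial

mean, matches = 2.3, 100

# 100 * P(X = x) for X ~ Po(2.3), x = 0, 1, ..., 7, each to 1 decimal place
expected = [round(matches * exp(-mean) * mean ** x / factorial(x), 1) for x in range(8)]
print(expected)

# What is left of the 100 matches after the rounded values for 0 to 7 goals.
remainder = round(matches - sum(expected), 1)
print(remainder)
```

The remainder printed last corresponds to the '8 or more' class; it is obtained here by subtraction from the total, which is one plausible reading of how the final table entry was found.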

$\chi^2$ test:
$H_0$: the distribution is Poisson
As the $\chi^2$ test is not valid for expected frequencies less than 5, we combine the end categories into '5 or more goals'.
Degrees of freedom: number of classes = 6
number of restrictions = 2 (totals agree, means agree)
Therefore $\nu = 6 - 2 = 4$ and we consider the $\chi^2(4)$ distribution.
We wish to test at the 5% level and reject $H_0$ if $\chi^2_{\text{calc}} > \chi^2_{5\%}(4)$, i.e. if $\chi^2_{\text{calc}} > 9.49$,

where $\chi^2_{\text{calc}} = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i}$

Number of goals   0      1      2      3      4      5 or more
O                 14     18     29     18     10     11
E                 10.0   23.1   26.5   20.3   11.7   8.4

Therefore $\chi^2_{\text{calc}} = 4.275$.

As $\chi^2_{\text{calc}} < 9.49$, we do not reject $H_0$ and conclude that the distribution can be reasonably modelled by the Poisson distribution having the same mean.
NORMAL DISTRIBUTION, mean and variance known

Example 10.7 For a period of six months 100 similar hamsters were given a new type of feedstuff. The gains in mass are recorded in the table below:

[Table: gain in mass, x (g), against observed frequency]

It is thought that these data follow a normal distribution, with mean 10 and variance 100. Use the $\chi^2$ distribution at the 5% level of significance to test this hypothesis.
Describe briefly how you would modify this test if the mean and variance were unknown. (AEB)

Solution 10.7 Let X be the r.v. 'the gain in mass', then $X \sim N(10, 100)$. We calculate $P(a < X < b) = p$ from the normal distribution tables (p. 634) and work out the expected frequencies using $E_i = 100p$.

[Table with columns: interval (a < x < b); upper class boundary (u.c.b.); standardised u.c.b. (z); P(Z < z); P(a < X < b); expected frequency]

$\chi^2$ test:
$H_0$: the distribution is normal with mean 10 and variance 100
We note that the expected frequencies given by $H_0$ are such that the first two classes contain less than 5, similarly the last two classes. So these are combined to give two classes instead of 4.
Degrees of freedom: number of classes = 8
number of restrictions = 1 (totals agree)

Therefore $\nu = 8 - 1 = 7$ and the $\chi^2(7)$ distribution is considered.

We test at the 5% level and reject $H_0$ if $\chi^2_{\text{calc}} > \chi^2_{5\%}(7)$, i.e. if $\chi^2_{\text{calc}} > 14.07$,

where $\chi^2_{\text{calc}} = \sum_{i=1}^{8} \frac{(O_i - E_i)^2}{E_i}$

[Table: $O_i$, $E_i$ and $\frac{(O_i - E_i)^2}{E_i}$ for the eight classes]

Therefore $\chi^2_{\text{calc}} = 3.197$.

As $\chi^2_{\text{calc}} < 14.07$, we do not reject $H_0$ and conclude that the data follows a normal distribution with mean 10 and variance 100.

NORMAL DISTRIBUTION, mean and variance unknown

If the mean and variance are not given for the normal distribution
then these have to be estimated from the observed data. The
expected frequencies are then calculated using these estimates.
This alters the number of degrees of freedom, for if estimates of the
mean and the variance are used, then

number of restrictions = 3 (totals agree, means agree, standard


deviations agree)
For Example 10.7, $\nu = 8 - 3 = 5$ and the $\chi^2(5)$ distribution would be considered.

Exercise 10b

1. Perform a $\chi^2$ test to investigate whether the following data is drawn from a binomial distribution with p = 0.3. Use a 5% level of significance.

[Table: x against frequency f; values illegible]

2. Using the data of Question 1, Exercise 4d (p. 225) carry out a $\chi^2$ test, at the 5% level of significance, of whether the observed results are sufficiently close to the theoretical results to support an assumption of a binomial distribution with mean as calculated from the observed data.

3. Repeat the procedure in Question 2 for the data given in Exercise 4e (p. 228) (a) question 4 (b) question 6.

4. Two dice were thrown 216 times, and the number of sixes at each throw were counted. The results were:

Frequency   130   76   10   Total 216

Test the hypothesis that the distribution is binomial with the parameter $p = \frac{1}{6}$.
Explain how the test would be modified if the hypothesis to be tested is that the distribution is binomial with the parameter p unknown. (Do not carry out the test.) (O)

5. A six-sided die with faces numbered as usual from 1 to 6 was thrown 5 times and the number of sixes was recorded. The experiment was repeated 200 times, with the following results:

[Table: number of sixes against frequency]

On this evidence, would you consider the die to be biased? Fit a suitable distribution to the data and test and comment on the goodness of fit. (MEI)

6. Under what circumstances would you expect a variate, X, to have a binomial distribution? What is the mean of X if it has a binomial distribution with parameters n and p?
A new fly spray is applied to 50 samples each of 5 flies and the number of living flies counted after one hour. The results were as follows:

[Table: number living (0 to 5) against frequency; values illegible]

Calculate the mean number of living flies per sample and hence an estimate for p, the probability of a fly surviving the spray. Using your estimate calculate the expected frequencies (each correct to one place of decimals) corresponding to a binomial distribution and perform a $\chi^2$ goodness-of-fit test using a 5% significance level. (SUJB)

7. Table A overleaf gives the distribution for the number of heavy rainstorms reported by 330 weather stations in the United States of America over a one year period.
(a) Find the expected frequencies of rainstorms given by the Poisson distribution having the same mean and total as the observed distribution.
(b) Use the $\chi^2$ distribution to test the adequacy of the Poisson distribution as a model for these data. (AEB 1977)

8. Use the $\chi^2$ distribution to test the adequacy of the Poisson distribution as a model for the data given in Example 4.31 (p. 256).

9. The numbers of cars passing a checkpoint during 100 intervals each of time 5 minutes, were noted:

Number of cars   0   1    2    3    4    5    6 or more
Frequency        5   23   23   25   14   10   0

Fit a Poisson distribution to these data and test the goodness of fit.

10. Write a short account of the $\chi^2$ test of goodness of fit, giving some indication of its shortcomings.
During the weaving of cloth the thread sometimes breaks. 147 lengths of thread of equal length were observed during weaving and the table records the number of these threads for which the indicated number of breaks occurred.

Number of breaks per thread   0    1    2    3    4   5
Number of threads             48   46   30   12   9   2

Fit a Poisson distribution to the data and examine whether the deviation between
theory and experiment is significant. (MEI)

11. The table below gives the distribution of the number of hits by flying bombs in 450 equally sized areas in South London during World War II.

[Table: number of hits (x) against frequency (f); values illegible]

(a) Find the expected frequencies of hits given by a Poisson distribution having the same mean and total as the observed distribution.
(b) Use the $\chi^2$ distribution and a 10% level of significance to test the adequacy of the Poisson distribution as a model for these data. (AEB 1980)
See Exercise 10e, Question 5, (p. 556) for values of $\chi^2_{10\%}(\nu)$.

12. The following data (Table B) gives the heights in cm of 100 male students:
(a) Test, at the 5% level, whether the data follows a normal distribution with mean 173.5 cm and standard deviation 7 cm.
(b) Find the expected frequencies for a normal distribution having the same mean and variance as the data given, and test the goodness of fit, using a 5% level of significance.

13. During observations on a patch of white dead nettles it was noticed that the numbers of flowers visited by bees during 100 5-minute intervals were as follows:

[Table: number of flowers visited per 5-minute interval (first class 0-5) against frequency]

(a) Calculate the mean and variance for the data.
(b) Find the expected frequencies for a normal distribution with the same mean and variance.
(c) Test, at the 5% level of significance, how well the observed data fits this normal distribution.

Table A

Number of rainstorms (x)                        0     1     2    3    4    5   more than 5
Number of stations (f) reporting x rainstorms   102   114   74   28   10   2   0

Table B

Height (cm)   155-160   161-166   167-172   173-178   179-184   185-190
Frequency     5         17        38        25        9         6

USE OF $\chi^2$ TESTS IN CONTINGENCY TABLES


Sometimes situations arise when individuals are classified according
to two sets of attributes. We may then wish to investigate whether
the two sets of attributes are independent, or whether there is
evidence of an association between them.

2 <x 2 Contingency tables (2 rows and 2 columns)


Example 108 A driving school examined the results of 100 candidates who were
taking their driving test for the first time. They found that of the
40 men, 28 passed and out of the 60 women, 34 passed. Do these
results indicate, at the 5% level of significance, a relationship
between the sex of a candidate and the ability to pass first time?

Solution 10.8 The results can be shown in a table, known as a 2 × 2 (read '2 by 2')
contingency table:

Observed frequencies:

                 Results of first-time candidates
                 Pass      Fail      Totals
Sex   Male        28        12         40
      Female      34        26         60
      Totals      62        38        100

$H_0$: there is no relationship between the sex of a candidate
and the ability to pass first time; the attributes are
independent.

To calculate the expected frequencies:

$$P(\text{candidate is male}) = \frac{40}{100}, \qquad P(\text{candidate passes first time}) = \frac{62}{100}$$

Under $H_0$, the events are independent. Therefore

$$P(\text{candidate passes first time and is male}) = \left(\frac{40}{100}\right)\left(\frac{62}{100}\right)$$

$$\text{Expected number who pass and are male} = 100\left(\frac{40}{100}\right)\left(\frac{62}{100}\right) = \frac{(40)(62)}{100} = 24.8$$

NOTE:
$$\text{Expected frequency} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

We could work through this procedure to give the other expected
frequencies, but this is unnecessary, as the other frequencies can be
found by using the fact that the sub-totals and totals must agree
with those in the observed data:

Expected frequencies:

                 Results of first-time candidates
                 Pass      Fail      Totals
Sex   Male       24.8      15.2        40
      Female     37.2      22.8        60
      Totals      62        38        100

Degrees of freedom: number of independent variables = 1
(once one expected frequency is known, the others are determined
by agreement of totals).
Therefore $\nu = 1$ and we consider the $\chi^2(1)$ distribution.
NOTE: As $\nu = 1$, we use Yates' continuity correction when
calculating $\chi^2_{\text{calc}}$.
We test at the 5% level and reject $H_0$ if $\chi^2_{\text{calc}} > \chi^2_{5\%}(1)$,
i.e. if $\chi^2_{\text{calc}} > 3.84$,

where
$$\chi^2_{\text{calc}} = \sum \frac{(|O - E| - 0.5)^2}{E}$$
(using Yates' continuity correction)

   O       E      |O − E|    (|O − E| − 0.5)²
   28     24.8      3.2           7.29
   12     15.2      3.2           7.29
   34     37.2      3.2           7.29
   26     22.8      3.2           7.29

$$\chi^2_{\text{calc}} = \frac{7.29}{24.8} + \frac{7.29}{15.2} + \frac{7.29}{37.2} + \frac{7.29}{22.8} = 1.29$$

Therefore $\chi^2_{\text{calc}} = 1.29$.

As $\chi^2_{\text{calc}} < 3.84$, we do not reject $H_0$ and conclude that these results
do not indicate a relationship between the sex of the candidate and
the ability to pass first time.
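
The arithmetic of a 2 × 2 test with Yates' continuity correction can be checked with a few lines of Python. This is only a sketch: the function name `chi2_2x2_with_yates` is an illustrative choice, the counts are those of the driving-test example, and the critical value 3.84 is taken from the text rather than computed.

```python
def chi2_2x2_with_yates(observed):
    """Return (expected, chi2) for a 2 x 2 table, using Yates' correction."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected frequency = (row total)(column total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Sum of (|O - E| - 0.5)^2 / E over the four cells
    chi2 = sum(
        (abs(o - e) - 0.5) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return expected, chi2


observed = [[28, 12],   # male:   pass, fail
            [34, 26]]   # female: pass, fail
expected, chi2_calc = chi2_2x2_with_yates(observed)

CRITICAL_VALUE = 3.84  # 5% point of chi-squared with 1 degree of freedom (from the text)
reject_h0 = chi2_calc > CRITICAL_VALUE
print(expected)
print(round(chi2_calc, 2), reject_h0)
```

The same function could be reused for any 2 × 2 table of counts; only the `observed` list would change.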

Exercise 10c

1. In an investigation into eye colour and left or right handedness the
   following results were obtained:

                          Handedness
                          Left     Right
   Eye colour   Blue       15        85
                Brown      20        80

   Is there evidence, at the 5% level, of an association between eye
   colour and left or right handedness?

2. An investigation into colourblindness and the sex of a person gave the
   following results:

                     Colourblindness
                     Colourblind    Not colourblind
   Sex   Male            36              964
         Female          19              981

   Is there evidence, at the 5% level, of an association between the sex
   of a person and whether or not they are colourblind?

3. Consider the following 2 × 2 contingency tables, and for each test
   whether A and B are independent. Use a 5% level of significance.

5. In an examination 37 out of 47 boys passed and 27 out of 41 girls
   passed. By considering a suitable 2 × 2 contingency table, test
   whether boys and girls differ in their ability in this subject.
4. Ina2xX2 contingency table, the observed
frequencies are as sh :
4 ae 6. The results obtained by 200 students in
chemistry and biology are shown in the
table. Test, at the 5% level, whether the
Group I
performances in both subjects are related.
Group II

[Pas[al|
Totals
4 : 2 eA
O;—-£; k(ae— bd
i=1 Ej efgh . Pass | 102 45
(do not use the continuity correction).

h × k Contingency tables (h rows and k columns)

Example 10.9 In the principality of Viewmania a survey of 200 families known to
be regular television viewers was undertaken. They were asked
which of the three television channels they watched most during an
average week. A summary of their replies is given in the following
table, together with the region in which they lived.

                             Region
                     North   East   South   West
Channel     CCB1       29     16      42     23
watched     CCB2        6     11      26      7
most        VIT        15      3      12     10

Find the expected frequencies on the hypothesis that there is no
association between the channel watched most and the region.

Use the χ² distribution and a 5% level of significance to test the
above hypothesis.

By considering the contribution to the value of your test statistic
from each cell and the relative sizes of the observed and expected
frequencies in each cell, indicate the main source of the association,
if any exists. (AEB 1980)

Solution 10.9 $H_0$: there is no association between the channel watched most
and the region.

The observed frequencies are first totalled, and then the expected
frequencies under $H_0$ are calculated from

$$\text{Expected frequency} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

Observed data:

            North   East   South   West   Totals
CCB1          29     16      42     23      110
CCB2           6     11      26      7       50
VIT           15      3      12     10       40
Totals        50     30      80     40      200

This is a 3 × 4 contingency table.

Expected data:
Expected frequency for northern viewers of
$$\text{CCB1} = \frac{(110)(50)}{200} = 27.5$$
This process is continued until six of the expected frequencies have
been calculated. The remaining frequencies are found by ensuring that
totals and sub-totals agree.

            North   East   South   West   Totals
CCB1         27.5   16.5     44     22      110
CCB2         12.5    7.5     20     10       50
VIT          10      6       16      8       40
Totals       50     30       80     40      200

Degrees of freedom: Once 6 expected frequencies have been
found, the others are known automatically
(by agreement of totals).

So $\nu$ = number of independent variables = 6, and we consider the
$\chi^2(6)$ distribution.
We test at the 5% level and reject $H_0$ if $\chi^2_{\text{calc}} > \chi^2_{5\%}(6)$,
i.e. if $\chi^2_{\text{calc}} > 12.59$,

where
$$\chi^2_{\text{calc}} = \sum_{i=1}^{12} \frac{(O_i - E_i)^2}{E_i}$$

   O       E      |O − E|    (O − E)²/E
   29     27.5      1.5        0.082
   16     16.5      0.5        0.015
   42     44        2          0.091
   23     22        1          0.045
    6     12.5      6.5        3.380
   11      7.5      3.5        1.633
   26     20        6          1.800
    7     10        3          0.900
   15     10        5          2.500
    3      6        3          1.500
   12     16        4          1.000
   10      8        2          0.500

Therefore $\chi^2_{\text{calc}} = 13.446$.

As $\chi^2_{\text{calc}} > 12.59$, we reject $H_0$ and conclude that there is an
association between the channel watched most and the region.

The largest pair of contributions to $\chi^2_{\text{calc}}$ are from the northern viewers
of CCB2 and VIT, indicating that these viewers watch less CCB2
and more VIT than might be expected; the reverse effect is seen to
some extent in the eastern and southern regions.
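
The whole h × k calculation (totals, expected frequencies, test statistic and degrees of freedom) can be sketched in Python as below. The function name `chi2_contingency_table` is illustrative, the column order North, East, South, West is an assumption about the layout of the survey table, and the critical value 12.59 is quoted from the text rather than computed.

```python
def chi2_contingency_table(observed):
    """Return (expected, chi2, dof) for an h x k table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected frequency = (row total)(column total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Sum of (O - E)^2 / E over every cell (no continuity correction)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, dof


# Rows: CCB1, CCB2, VIT; columns assumed to be North, East, South, West
observed = [[29, 16, 42, 23],
            [6, 11, 26, 7],
            [15, 3, 12, 10]]
expected, chi2_calc, dof = chi2_contingency_table(observed)

CRITICAL_VALUE = 12.59  # 5% point of chi-squared with 6 degrees of freedom (from the text)
reject_h0 = chi2_calc > CRITICAL_VALUE
print(dof, round(chi2_calc, 2), reject_h0)
```

Working with unrounded terms gives a statistic of about 13.45, which agrees with the hand calculation to two decimal places.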

Number of degrees of freedom for an h × k contingency table

In general, if there are h rows, then each column contains h expected
frequencies, and once h − 1 of them have been calculated, the last
value in the column is known (agreement of totals). Similarly, if
there are k columns, once k − 1 expected frequencies in a row have
been calculated, the last value in the row is known.
Therefore, number of independent variables = (h − 1)(k − 1).

So, for an h × k contingency table, ν = (h − 1)(k − 1).

Exercise 10d

1. Consider the following contingency tables


and for each one, test at the 5% level
whether A and B are independent.

A,|16 19 15
A,|26 14 10

2. A thousand households are taken at random and divided into three
   groups A, B and C, according to the total weekly income.
   The following table shows the numbers in each group having a colour
   television receiver, a black and white receiver, or no television
   at all.

                          A      B      C
   Colour television      56     51     93
   Black and white       118    207    375
   None                   26     42     32

   Calculate the expected frequencies if there is no association
   between total income and television ownership.
   Apply a test to find whether the observed frequencies suggest that
   there is such an association. (AEB 1974)

3. The following table shows the numbers of students passed and failed
   by three examiners A, B and C.

                Examiners

   Test the hypothesis that the three examiners fail equal proportions
   of students by applying χ² tests with and without Yates' correction.
   Comment on the results. (AEB 1976)

4. At St. Trinian's College for Young Ladies there are 1000 pupils. Of
   these 75 have represented the College at both hockey and netball,
   10 have represented the College at hockey but do not play netball,
   35 have represented the College at netball but do not play hockey,
   and 100 do not play games at all. In all 100 girls have represented
   the College at hockey, and 150 at netball. The number who do not
   play hockey is 200 and the number who do not play netball is 125.
   Arrange the above data in the form of a 3 × 3 contingency table,
   and state how many pupils play both hockey and netball but have not
   represented the College at either.
   Apply the χ² test to your 3 × 3 table, and state the hypothesis
   which it tests. (AEB 1976)

5. The following are data on 150 chickens, divided into two groups
   according to breed, and into three groups according to yield of
   eggs:

   Rhode Island Red    46    29
   Leghorn             27    14

   Are these data consistent with the hypothesis that the yield is not
   affected by the type of breed?

6. In a small survey 350 car owners from four districts P, Q, R, S were
   found to have cars in price ranges A, B, C, D, the frequencies of
   the prices being as shown in the table.

                Price of car

   Find the expected frequencies on the hypothesis that there is no
   association between the district and the price of the car.
   Use the χ² distribution to test this hypothesis. (AEB 1975)

SUMMARY: χ² TEST AND DEGREES OF FREEDOM

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

For $\nu = 1$, using Yates' continuity correction,

$$\chi^2 = \sum_{i=1}^{n} \frac{(|O_i - E_i| - 0.5)^2}{E_i}$$

Degrees of freedom (ν):

Distribution                        Classes and restrictions                    ν

Uniform distribution and            n classes                                   n − 1
distributions in a given ratio      1 restriction (totals agree)
                                    n − 1 independent variables

Binomial distribution
  (a) p known                       n classes                                   n − 1
                                    1 restriction (totals agree)
                                    n − 1 independent variables
  (b) p unknown                     n classes                                   n − 2
                                    2 restrictions (totals agree, means agree)
                                    n − 2 independent variables

Poisson distribution                n classes                                   n − 2
                                    2 restrictions (totals agree, means agree)
                                    n − 2 independent variables

Normal distribution
  mean and variance known           n classes                                   n − 1
                                    1 restriction (totals agree)
                                    n − 1 independent variables
  mean and variance unknown         n classes                                   n − 3
                                    3 restrictions (totals agree, means agree,
                                    variances agree)
                                    n − 3 independent variables

2 × 2 contingency table             4 classes                                   1
                                    1 independent variable

h × k contingency table             h × k classes                               (h − 1)(k − 1)
                                    (h − 1)(k − 1) independent variables

Miscellaneous Exercise 10e

1. A random sample of 100 housewives were asked by a market research
   team whether or not they used Sudsey Soap. 58 said yes and 42 said
   no. In a second random sample of 80 housewives, 62 said yes and 18
   said no. By considering a suitable 2 × 2 contingency table, test
   whether these two samples are consistent with each other. (O & C)

2. Two fair dice are thrown 432 times. Find the expected frequencies of
   the scores 2, 3, ..., 12.
   Two players, A and B, are each given two dice and told to throw them
   432 times, recording the results. The frequencies reported are given
   in Table A below.
   Is there any evidence that either pair of dice is biased? What can
   be said about B's alleged results? (AEB 1976)

3. Over a period of 50 weeks the numbers of road accidents reported to
   a police station are shown in the table below.
   Find the mean number of accidents per week.
   Use this mean, a 5% level of significance, and your table of χ² to
   test the hypothesis that these data are a random sample from a
   population with a Poisson distribution. (O & C)

4. Table B below shows the girths of one type of fir tree in a
   plantation of 480 trees set alongside the distribution that would be
   expected if the distribution were normal.
   Use the χ² test, with a 5% significance level, to determine whether
   the observed distribution can be taken as normal, with the same mean
   and standard deviation as the observed distribution. (C Additional)

5. Smallwoods Ltd. run a weekly football pools competition. One part of
   this involves a fixed-odds contest where the entrant has to forecast
   correctly the result of each of five given matches. In the event of
   a fully correct forecast the entrant is paid out at odds of 100 to 1.
   During the last two years Miss Fortune has entered this fixed-odds
   contest 80 times. The table below summarises her results.

   Number of matches correctly
   forecast per entry (x)           0    1    2    3    4    5
   Number of entries with x
   correct forecasts (f)            8   19   25   22    5    1

   (a) Find the frequencies of the number of matches correctly forecast
   per entry given by a binomial distribution having the same mean and
   total as the observed distribution.
   (b) Use the χ² distribution and a 10% level of significance to test
   the adequacy of the binomial distribution as a model for these data.
   (c) On the evidence before you, and assuming that the point of
   entering is to win money, would you advise Miss Fortune to continue
   with this competition and why? (AEB 1981)
   (NOTE: $\chi^2_{10\%}(1) = 2.71$, $\chi^2_{10\%}(2) = 4.61$, $\chi^2_{10\%}(3) = 6.25$,
   $\chi^2_{10\%}(4) = 7.78$, $\chi^2_{10\%}(5) = 9.24$)

6. The table summarises the incidence of cerebral tumours in 141
   neurosurgical patients.

                                  Type of tumour
   Site of     Frontal lobes      23      9      6
   tumour      Temporal lobes     21      4      3
               Elsewhere          34     24     17

Table A

Scores           2    3    4    5    6    7    8    9   10   11   12
A's frequency   18   33   28   54   62   65   66   42   30   27    7
B's frequency   14   22   34   51   58   73   63   45   38   25    9

Table B

Girth of trees (in metres) | 0.6-0.8
No. of trees (observed)
No. of trees (expected)

   Find the expected frequencies on the hypothesis that there is no
   association between the type and site of a tumour. Use the χ²
   distribution to test this hypothesis. (AEB 1977)

7. Explain how to calculate the degrees of freedom for the χ² statistic
   in (a) a goodness-of-fit test, (b) a test of no association of the
   two factors in an h × k contingency table.
   An ecologist collected organisms of a particular species from three
   beaches and counted the number of females in each sample (the
   remainder were males).
   Test if the proportion of females differed significantly between the
   beaches.
   Find the percentage of females at each beach and comment on the
   results. (O)

8. A factory operates four production lines. Maintenance records show
   that the daily number of stoppages due to mechanical failure were as
   shown in Table C below (it is possible for a production line to
   break down more than once on the same day). You may assume that
   Σf = 1400, Σfx = 1036.
   (a) Use a χ² distribution and a 1% significance level to determine
   whether the Poisson distribution is an adequate model for the data.
   (b) The maintenance engineer claims that breakdowns occur at random
   and that the mean rate has remained constant throughout the period.
   State, giving a reason, whether your answer to (a) is consistent
   with this claim.
   (c) Of the 1036 breakdowns which occurred 230 were on production
   line A, 303 on B, 270 on C and 233 on D. Test at the 5% significance
   level whether these data are consistent with breakdowns occurring at
   an equal rate on each production line. (AEB 1988)

9. The number of accidents in a large factory over a period of one
   month is recorded in 7 hourly periods in Table D below.
   Display these data in a suitable diagram and comment on them.
   Test the hypothesis that accidents are equally likely to happen at
   any time of the day. Comment on your conclusion in relation to the
   diagram drawn. (SUJB)

10. (a) As part of a statistics project, students observed five private
    cars passing a college and counted the number which were carrying
    the driver only, with no passengers. This was repeated 80 times.
    The results of one student were as follows:

    Number of cars        Number of
    with driver only      times observed
          0                     0
          1                     3
          2                    12
          3                    27
          4                    26
          5                    12

    Use the χ² distribution and a 5% significance level to test whether
    the binomial distribution provides an adequate model for the data.
    (b) In a further part of the project the students counted the
    number of cars passing the college in 130 intervals each of length
    5 seconds. Table E overleaf shows the results obtained by the same
    student together with the expected numbers if a Poisson
    distribution, with the same mean as the observed data, is fitted.
    Use the χ² distribution and a 5% significance level to test whether
    the Poisson distribution provides an adequate model for the data.

Table C

Number of stoppages, x
Number of days, f

Table D

Period                09.00-   10.00-   11.00-   13.00-   14.00-   15.00-   16.00-
                      10.00    11.00    12.00    14.00    15.00    16.00    17.00
Number of accidents

(c) The teacher suspected that this 12. Over a long period of time, a research
student had not observed the data but team monitored the number of car
invented them. Explain why the teacher accidents which occurred in a particular
was suspicious and comment on the county. Each accident was classified as
strength of the evidence supporting her being trivial (minor damage and no
suspicions. (AEB 1987) personal injuries), serious (damage to
vehicles and passengers, but no deaths)
11. One formula for the x’ statistic is or fatal (damage to vehicles and loss of
(fo-f, Me life). The colour of the car which, in the
2= Decera opinion of the research team, caused
the accident was also recorded, together
where f, is the observed frequency, he with the day of the week on which the
is the expected frequency and the accident occurred. The following data
summation is over the number of groups. were collected.
Show that the formula may also be
written as

16 s

where N is¢he total number of observa-


tions.
(a) Ballpoint pens come off a production
line and are packed into batches of 100.
It is believed that the number of defec-
tive pens in each batch follows a Poisson Analyse these data for evidence of associa-
distribution with mean 2.8. 100 batches tion between the colour of the car and
of pens were examined and the observed the type of accident.
frequencies of the number of defective State the condition which sometimes
pens in each batch found to be those necessitates the amalgamation of rows or
in the table below. Test whether the columns in contingency tables. Explain
suggested Poisson model fits these data. why amalgamation might not be approp-
riate for this table.
Number of
defective pens severe re
The following table summarises the
data relating to the day of the week on
Frequency 519825. 2071687, 8 which the accident occurred.

(6) To find whether there is any associa-


tion between a person’s eye colour and
his or her skin’s susceptibility to sunburn, accidents
a random sample of 180 people was taken
and the data in the table below obtained. Monday
Test whether there is significant evidence Tuesday
of association. Wednesday
Thursday
Eye colour eee a
to | Total
Friday
eee a
Saturday
Sunday
Medium
Blue . 27
Investigate the hypothesis that these
Brown 13 fa
data are a random sample from a uniform
Grey-green 0 48 26 100 (O) distribution. (AEB 1987)
Table E

Number of cars passing a point


in a 5 second interval 7 or more

|Number of
of intervals
intervals observed
observed _| ack 32 eee

Se ofee
intervals expected — 4 33.72]}18.16 |7.33|237| 064] 0.18 |
11
REGRESSION AND CORRELATION

SCATTER DIAGRAM

Sometimes we wish to investigate the results of a statistical enquiry
or experiment by comparing two sets of data, x and y, for example

   x                                       y
   The weight at the end of a spring       The length of the spring
   Pupil's mark in French                  Pupil's mark in German
   The diameter of the stem of a plant     The average length of leaf of the plant
   The age of a plant                      The quantity of fruit produced by a plant

Consider the set of points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. If the values
of y are plotted against the values of x, then a scatter diagram is
obtained.

REGRESSION FUNCTION
We then look for a relationship y = f(x), where the function f is to
be determined, i.e. given the points only we have to 'work backwards'
or 'regress' to the original function f. Hence this function is
called the regression function.

LINEAR CORRELATION AND REGRESSION LINES

We shall consider only the simplest type of function where y = f(x)
is a straight line. If all the points in the scatter diagram seem to lie
near a straight line, we say that there is linear correlation between
x and y.

We try to estimate fairly accurately the position of this line, and
having done so we call it a regression line.
(a) If y tends to increase as x increases,
then there is positive linear correlation.

(b) If y tends to decrease as x increases,
then there is negative linear correlation.

(c) If there is no relationship between x and
y, then there is no correlation.

NOTE: common sense is needed when interpreting scatter diagrams;


for instance we might find that there is an increase in the number of
bank robberies and an increase in the number of health food shops
over the last 5 years in a certain town — however, it would be foolish
to look for a relationship between the variables.
There are many ways of obtaining regression lines for different
purposes — two methods are indicated below.

METHOD I: DRAWING A REGRESSION LINE 'BY EYE'

(a) If there is very little scatter.
First calculate the co-ordinates of the point $(\bar{x}, \bar{y})$
where $\bar{x} = \dfrac{\sum x}{n}$ and $\bar{y} = \dfrac{\sum y}{n}$.
Then draw a line of good fit, ensuring that it
passes through $(\bar{x}, \bar{y})$.
(b) If there is a fair degree of scatter.
In this case we can distinguish two regression lines:
(i) A line of regression of y on x. This can be used to estimate y,
given a value of x.
(ii) A line of regression of x on y. This can be used to estimate x,
given a value of y.

Method for drawing these two regression lines by eye

(i) A line of regression of y on x: we assume the values of x to be
accurate and draw a regression line as follows:
(a) Find the mean $M(\bar{x}, \bar{y})$ of the distribution.
(b) Through M draw a line parallel to the y axis. This divides
the points into two groups.
(c) Find the mean $M_L$ of the points on the left.
(d) Find the mean $M_R$ of the points on the right.
(e) Draw a line of best fit through M, $M_L$ and $M_R$.
(ii) A line of regression of x on y: we assume the values of y to be
accurate and draw a regression line as follows:
(a) Find the mean $M(\bar{x}, \bar{y})$ of the distribution.
(b) Through M draw a line parallel to the x axis.
(c) Find the mean $M_A$ of the points above.
(d) Find the mean $M_B$ of the points below.
(e) Draw the line of best fit through M, $M_A$ and $M_B$.
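
The guide points used in the 'by eye' method can be computed before any drawing is done. The sketch below is illustrative only: the function names are hypothetical, the ten (arithmetic, English) mark pairs are those of the worked example in this section (with the first English mark read as 3, which is consistent with the stated total of 186), and points lying exactly on the dividing line are left out of both groups, a choice the text does not specify.

```python
def mean_point(points):
    """Mean (x-bar, y-bar) of a list of (x, y) pairs."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def guide_points_y_on_x(points):
    """M, M_L, M_R: split the points by a line through M parallel to the y axis."""
    m = mean_point(points)
    left = [p for p in points if p[0] < m[0]]
    right = [p for p in points if p[0] > m[0]]
    return m, mean_point(left), mean_point(right)


def guide_points_x_on_y(points):
    """M, M_A, M_B: split the points by a line through M parallel to the x axis."""
    m = mean_point(points)
    above = [p for p in points if p[1] > m[1]]
    below = [p for p in points if p[1] < m[1]]
    return m, mean_point(above), mean_point(below)


marks = [(1, 3), (8, 14), (15, 8), (18, 20), (23, 19),
         (28, 17), (33, 36), (39, 26), (45, 14), (45, 29)]

m, m_left, m_right = guide_points_y_on_x(marks)
_, m_above, m_below = guide_points_x_on_y(marks)
print(m, m_left, m_right)
print(m_above, m_below)
```

The line itself is still drawn by eye through the three points; the code only locates them.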

Example 11.1 The following table gives the test results for 10 children.

Child                   A    B    C    D    E    F    G    H    I    J
Arithmetic mark, x      1    8   15   18   23   28   33   39   45   45
English mark, y         3   14    8   20   19   17   36   26   14   29

(a) (i) Draw a scatter diagram, and by finding the means of
certain points draw a regression line y on x.
(ii) Estimate an English mark for a child who missed the
English test, but who had 20 in the arithmetic test.

(b) (i) On the scatter diagram draw a regression line x on y.
(ii) Estimate an arithmetic mark for a child who was absent for
the arithmetic test, but who had 30 in the English test.
(c) Would you use one of these lines to estimate an English mark
for a child who had 60 in the arithmetic test?

Solution 11.1 (a) (i)
$$\bar{x} = \frac{\sum x}{n} = \frac{255}{10} = 25.5, \qquad \bar{y} = \frac{\sum y}{n} = \frac{186}{10} = 18.6$$

So we plot the point M(25.5, 18.6) and ensure that the line passes
through it.
For a regression line y on x, draw a line through M parallel to the
y axis.

For the points on the left        For the points on the right

    x       y                         x       y
    1       3                        28      17
    8      14                        33      36
   15       8                        39      26
   18      20                        45      14
   23      19                        45      29

  Σx = 65  Σy = 64                 Σx = 190  Σy = 122

$$\bar{x}_L = \frac{65}{5} = 13, \quad \bar{y}_L = \frac{64}{5} = 12.8 \qquad \bar{x}_R = \frac{190}{5} = 38, \quad \bar{y}_R = \frac{122}{5} = 24.4$$

We plot $M_L(13, 12.8)$             We plot $M_R(38, 24.4)$

Now draw a line of good fit through M, $M_L$ and $M_R$. This is a
regression line y on x.
(ii) If a child had 20 in the arithmetic test, since x is given, we use
the line y on x to estimate the English mark. From the line, the
estimated mark for English is 16.

(b) (i) x on y: Draw the line through M parallel to the x axis.

For the points above              For the points below

    x       y                         x       y
   18      20                         1       3
   23      19                         8      14
   33      36                        15       8
   39      26                        28      17
   45      29                        45      14

  Σx = 158  Σy = 130               Σx = 97  Σy = 56

$$\bar{x}_A = \frac{158}{5} = 31.6, \quad \bar{y}_A = \frac{130}{5} = 26 \qquad \bar{x}_B = \frac{97}{5} = 19.4, \quad \bar{y}_B = \frac{56}{5} = 11.2$$

We plot $M_A(31.6, 26)$             We plot $M_B(19.4, 11.2)$

Now draw a line of good fit through M, $M_A$, $M_B$. This is a regression
line x on y.

(ii) If a child had 30 in the English test we use the line x on y, as y
is given. From this line, the estimated arithmetic mark is 35.
(c) The mark of 60 in the arithmetic test is outside the range of the
data. We could use the regression line y on x as drawn on the scatter
diagram to give an estimated English mark of 35, but this result
should be used with caution. As a general rule, keep within the
range of the data.

[Figure: Scatter diagram to show English and Arithmetic marks for 10 pupils (English mark plotted against Arithmetic mark)]

NOTE: it may appear strange to have two regression lines, but it
does matter which is considered. Suppose that there is a positive
correlation between the height and mass of males. A result of this
might be that the average mass of all males of height 1.93 m (6 ft
4 inches) is 85.7 kg (13½ stone).

So, if you were given a height of 6 ft 4 inches you would guess 13½
stone for the mass.

But, if you were given a mass of 13½ stone, would you guess 6 ft
4 inches for the height? If you would not, then the two regression
lines are different.

NOTE: if we had a set of data such as

   x     5    10    15    20    25
   y    20    21    23    24    23

then it is obvious that the value of x has been controlled. In this
case we would use a regression line of y on x to estimate y, given x,
but not a regression line x on y to estimate x, given y.
Exercise 11a

1. For the following sets of data, draw scatter 3. Four identical money boxes contain
diagrams and comment on the correlation. different numbers of a particular type of
Draw regression lines y on x and x on y. coin and no coins of other types. From the
information on the combined weights,
(a) Use these 11 pairs of data: which is given below, it is desired to
[x [3 7 9 11 14 14 15 21 22 23 26 estimate the weight of a box and the mean
[y [5 12 5 12 10 17 28 16 10 20 25 weight of a coin.

(b) Use these 13 pairs of data:


in box

Combined weight y |312 509 682 865


of coins and box

eat 10L £4 1012,5. 14 614,65 (a) Plot these data on a scatter diagram,
labelling the axes clearly. State whether the
fy| 81 70 74 66 © 69. 63 data display strong positive, strong
negative, or near zero correlation (or
otherwise).
(b) State the co-ordinates of one point
through which the line of regression of y
upon x must pass.
(c) Draw on your diagram, by eye, this
regression line.
(d) Estimate, from your regression line,
(i) the weight of an empty box, (ii) the
mean weight of a single coin. (C)
2. Values of two variables x and y obtained
from a survey are recorded below.
4. Table A gives the rainfall, in cm, for the
eet oh Vad EG ge es first nine months of a year at two weather
stations. Calculate the mean monthly rain-
ly |81 78 53 585 48°29 15 3 fall over this period at each station and
Represent these data on a scatter diagram plot the information given in the table ona
and draw in the line of best fit. Obtain the scatter Hiegeam;deavang-a line of best 110
equation of the line of best fit in the form Find the equation of this line and use it to
y =mx-+c and estimate the value of y predict the rainfall at B in a month when
when x = 5.5. (SUJB) 2.5 cm of rain fell at A. (C Additional)

Table A

METHOD II: CALCULATING THE EQUATIONS OF THE LEAST SQUARES REGRESSION LINES

(a) The least squares regression line y on x
Let the equation of the least squares regression line y on x be
$y = ax + b$.

Consider the set of points $(x_i, y_i)$, where $i = 1, 2, \ldots, n$.
We find the values of a and b such that $\sum m_i^2$ is a minimum, where
$m_i$ is measured parallel to the y axis: $P_i$ is the data point $(x_i, y_i)$,
$Q_i$ is the point on the line with the same x-coordinate, and $R_i$ is
the point $(x_i, 0)$ on the x axis.

The lengths $P_1Q_1, P_2Q_2, \ldots, P_nQ_n$ are called residuals.

Now $m_i = Q_iP_i = Q_iR_i - P_iR_i = (ax_i + b) - y_i$

$$m_i^2 = (ax_i + b - y_i)^2$$

So
$$\sum m_i^2 = \sum (ax_i + b - y_i)^2 \qquad (i = 1, 2, \ldots, n)$$

$\sum m_i^2$ is the sum of the squares of the residuals and if we can find
values of a and b such that $\sum m_i^2$ is a minimum, then the line
$y = ax + b$ is called the least squares regression line of y on x.

If we allow b to vary, keeping all other quantities constant, we can
obtain the value of b for which $\sum m_i^2$ is a minimum by setting
$\dfrac{d\sum m_i^2}{db} = 0$.

Now
$$\frac{d\sum m_i^2}{db} = \frac{d\sum (ax_i + b - y_i)^2}{db} = \sum 2(ax_i + b - y_i) = 2\left(a\sum x_i + nb - \sum y_i\right)$$

So $\dfrac{d\sum m_i^2}{db} = 0$ when
$$\sum y_i = a\sum x_i + nb \qquad \text{(i)}$$

Similarly, if we allow a to vary, keeping all the other quantities
constant,

$$\frac{d\sum m_i^2}{da} = \frac{d\sum (ax_i + b - y_i)^2}{da} = \sum 2(ax_i + b - y_i)x_i = 2\left(a\sum x_i^2 + b\sum x_i - \sum x_iy_i\right)$$

So $\dfrac{d\sum m_i^2}{da} = 0$ when
$$\sum x_iy_i = a\sum x_i^2 + b\sum x_i \qquad \text{(ii)}$$

If the least squares regression line y on x is $y = ax + b$, the values
of a and b are found by solving the simultaneous equations
$$\sum y = a\sum x + nb$$
$$\sum xy = a\sum x^2 + b\sum x$$
These equations are called the normal equations for y on x.
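
Solving the two normal equations is a small piece of linear algebra, sketched below in Python. The function name `least_squares_y_on_x` is illustrative, and the data are the five controlled-x pairs quoted in the earlier NOTE (x = 5, 10, ..., 25), used here only as a convenient test case.

```python
def least_squares_y_on_x(xs, ys):
    """Solve the normal equations for y on x and return (a, b) in y = ax + b.

    sum(y)  = a*sum(x)   + n*b        (i)
    sum(xy) = a*sum(x^2) + b*sum(x)   (ii)
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Eliminate b between (i) and (ii); the denominator is zero only
    # when every x value is the same.
    denominator = n * sum_xx - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the line is undefined")
    a = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y * sum_xx - sum_x * sum_xy) / denominator
    return a, b


xs = [5, 10, 15, 20, 25]
ys = [20, 21, 23, 24, 23]
a, b = least_squares_y_on_x(xs, ys)
print(round(a, 2), round(b, 2))
```

For these five points the fitted line comes out as y = 0.18x + 19.5, and substituting the means (15, 22.2) confirms that it passes through the mean of the data.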

Example 11.2 Show that the least squares regression line y on x passes through
the mean of the data.

Solution 11.2 Consider the regression line $y = ax + b$.

To calculate a and b the following equation is used:
$$\sum y = a\sum x + nb$$
Divide through by n:
$$\frac{\sum y}{n} = a\frac{\sum x}{n} + b$$
so
$$\bar{y} = a\bar{x} + b$$
Hence, the point $(\bar{x}, \bar{y})$ lies on the regression line $y = ax + b$.

(b) The least squares regression line x on y

Let the equation of the least squares regression line x on y be
$x = cy + d$.

We find the values c and d such that $\sum n_i^2$ is a minimum, where $n_i$
is the distance, measured parallel to the x axis, between the point
$(x_i, y_i)$ and the line $x = cy + d$.

It can be shown that

if the least squares regression line x on y is $x = cy + d$, the values
of c and d are found by solving the simultaneous equations
$$\sum x = c\sum y + nd$$
$$\sum xy = c\sum y^2 + d\sum y$$
These equations are called the normal equations for x on y.

It can be shown that the point $(\bar{x}, \bar{y})$ lies on the line $x = cy + d$.
NOTE: in general, these lines will not coincide with those obtained
by the earlier methods described.

Example 11.3 Obtain the normal equations for the least squares regression line y
on x for the following data:

eeei Gi 4 opyi 0
ya [io 14 ie Sis is treme
Hence find the equation of the least squares regression line y on x.

Solution 11.3

$$\sum x = 38, \quad \sum y = 89, \quad \sum x^2 = 270, \quad \sum y^2 = 1147, \quad \sum xy = 495$$

For these data n = 7. The normal equations for y on x are

$$\sum y = a\sum x + nb \quad \text{so} \quad 89 = 38a + 7b \qquad \text{(i)}$$

$$\sum xy = a\sum x^2 + b\sum x \quad \text{so} \quad 495 = 270a + 38b \qquad \text{(ii)}$$

Solving equations (i) and (ii) gives a = 0.186, b = 11.7.

The least squares regression line y on x is

$$y = 0.186x + 11.7$$

Exercise 11b

1. Calculate the equation of the least squares regression line x on y
   for the data given in Example 11.3.

2. Calculate the equation of the least squares regression line
   (a) y on x, (b) x on y for the data given in (i) Exercise 11a,
   Question 1(a) (p. 564), (ii) Example 11.1 (p. 561).

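COVARIANCE

The covariance of $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ is defined as

$$s_{xy} = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y})$$

Now
$$s_{xy} = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y}) = \frac{\sum xy}{n} - \bar{x}\frac{\sum y}{n} - \bar{y}\frac{\sum x}{n} + \bar{x}\bar{y} = \frac{\sum xy}{n} - \bar{x}\bar{y}$$

since $\dfrac{\sum x}{n} = \bar{x}$ and $\dfrac{\sum y}{n} = \bar{y}$.

Note that
$$s_{xx} = \frac{1}{n}\sum (x - \bar{x})(x - \bar{x}) = s_x^2, \qquad s_{yy} = \frac{1}{n}\sum (y - \bar{y})(y - \bar{y}) = s_y^2$$

In the following we shall use the alternative forms of the formulae:

$$s_x^2 = \frac{\sum x^2}{n} - \bar{x}^2, \qquad s_y^2 = \frac{\sum y^2}{n} - \bar{y}^2 \qquad \text{(see p. 49)}$$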
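
The agreement between the defining form of the covariance and the shorter computational form can be checked numerically. The sketch below uses hypothetical function names and, purely as a test case, the five controlled-x pairs from the earlier NOTE.

```python
def covariance_from_definition(xs, ys):
    """s_xy = (1/n) * sum((x - x_bar)(y - y_bar))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


def covariance_shortcut(xs, ys):
    """s_xy = sum(xy)/n - x_bar * y_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar


xs = [5, 10, 15, 20, 25]
ys = [20, 21, 23, 24, 23]
s_xy_definition = covariance_from_definition(xs, ys)
s_xy_shortcut = covariance_shortcut(xs, ys)
print(s_xy_definition, s_xy_shortcut)
```

Both forms give a covariance of 9 for these data, up to floating-point rounding.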
For the least squares regression line y on x, let y = ax + b.

ALTERNATIVE METHOD FOR CALCULATING THE EQUATIONS OF LEAST SQUARES REGRESSION LINES

Using regression coefficients

The normal equations are

$$\sum y = a\sum x + nb \qquad \text{(i)}$$
$$\sum xy = a\sum x^2 + b\sum x \qquad \text{(ii)}$$

Multiplying (i) by $\sum x$,

$$\sum x \sum y = a\left(\sum x\right)^2 + nb\sum x$$

Multiplying (ii) by n,

$$n\sum xy = an\sum x^2 + nb\sum x$$

Subtracting and rearranging

$$a = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - \left(\sum x\right)^2}
   = \frac{\dfrac{\sum xy}{n} - \dfrac{\sum x}{n}\cdot\dfrac{\sum y}{n}}{\dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2}$$
(dividing both the numerator and denominator by $n^2$)

$$a = \frac{s_{xy}}{s_x^2}$$

a is known as the coefficient of regression of y on x, where

$$a = \frac{s_{xy}}{s_x^2}$$

a is the gradient of the least squares regression line y on x.

Now, since the regression line passes through $(\bar{x}, \bar{y})$, its equation
must be of the form $y - \bar{y} = a(x - \bar{x})$.

The equation of the least squares regression line y on x is

$$y - \bar{y} = \frac{s_{xy}}{s_x^2}(x - \bar{x})$$

Similarly, for the least squares regression line x on y, $x = cy + d$,
it can be shown that

$$c = \frac{s_{xy}}{s_y^2}$$

where c is the coefficient of regression of x on y.

The equation of the regression line x on y is

$$x - \bar{x} = \frac{s_{xy}}{s_y^2}(y - \bar{y})$$

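MINIMUM SUM OF SQUARES OF RESIDUALS

The equation of the least squares regression line y on x is

$$y - \bar{y} = \frac{s_{xy}}{s_x^2}(x - \bar{x})$$

The point $Q_i$ lies on this line and its x-coordinate is $x_i$. So its
y-coordinate is $y_Q$ where

$$y_Q - \bar{y} = \frac{s_{xy}}{s_x^2}(x_i - \bar{x})$$

So
$$m_i = y_Q - y_i = \frac{s_{xy}}{s_x^2}(x_i - \bar{x}) - (y_i - \bar{y})$$

We shall denote the minimum value of the sum of the squares of
residuals by $\sum m_i^2(\text{min})$ where

$$\sum m_i^2(\text{min}) = \sum \left[(y_i - \bar{y}) - \frac{s_{xy}}{s_x^2}(x_i - \bar{x})\right]^2$$

$$= \sum (y_i - \bar{y})^2 - 2\frac{s_{xy}}{s_x^2}\sum (x_i - \bar{x})(y_i - \bar{y}) + \frac{s_{xy}^2}{(s_x^2)^2}\sum (x_i - \bar{x})^2$$

$$= ns_y^2 - 2\frac{s_{xy}}{s_x^2}\,ns_{xy} + \frac{s_{xy}^2}{(s_x^2)^2}\,ns_x^2$$

$$= ns_y^2 - 2n\frac{s_{xy}^2}{s_x^2} + n\frac{s_{xy}^2}{s_x^2}$$

$$= n\left(s_y^2 - \frac{s_{xy}^2}{s_x^2}\right)$$

Similarly, it can be shown that

The minimum sum of squares of residuals for x on y is

$$\sum n_i^2(\text{min}) = n\left(s_x^2 - \frac{s_{xy}^2}{s_y^2}\right)$$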

Example 11.4 Draw a scatter diagram for the following data. Calculate the equations
of the lines of regression (a) y onx, (b) x ony, and draw these on
the diagram.
Find also the minimum sum of squares of residuals (c) for y on x,
(d) for x on y.

| || eee
ee LG
Fee eR Ps ee

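Solution 11.4 This is the same data as in Example 11.3, so we refer to the table on p. 567.

We have $\bar{x} = \dfrac{\sum x}{n} = \dfrac{38}{7} = 5.4$ (1 d.p.),

$\bar{y} = \dfrac{\sum y}{n} = \dfrac{89}{7} = 12.7$ (1 d.p.)

On the scatter diagram, plot M(5.4, 12.7).

$$s_{xy} = \frac{\sum xy}{n} - \bar{x}\bar{y} = \frac{495}{7} - \left(\frac{38}{7}\right)\left(\frac{89}{7}\right) = 1.694$$

$$s_x^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{270}{7} - \left(\frac{38}{7}\right)^2 = 9.102$$

$$s_y^2 = \frac{\sum y^2}{n} - \bar{y}^2 = \frac{1147}{7} - \left(\frac{89}{7}\right)^2 = 2.204$$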

(a) Equation of least squares regression line y on x

$$y - \bar{y} = \frac{s_{xy}}{s_x^2}(x - \bar{x})$$

so

$$y - \frac{89}{7} = \frac{1.694}{9.102}\left(x - \frac{38}{7}\right)$$

Rearranging, y = 0.186x + 11.7 (as before)

Draw this on the scatter diagram, by plotting M(5.4, 12.7) and two other points, say (0, 11.7) and (1, 11.886).

(b) Equation of least squares regression line x on y

$$x - \bar{x} = \frac{s_{xy}}{s_y^2}(y - \bar{y})$$

so

$$x - \frac{38}{7} = \frac{1.694}{2.204}\left(y - \frac{89}{7}\right)$$

Rearranging, x = 0.769y - 4.34

Draw this on the scatter diagram by plotting M(5.4, 12.7) and two other points, say (0, 5.64) and (1, 6.94).

(c) For y on x

$$\sum m_i^2(\min) = n\left(s_y^2 - \frac{s_{xy}^2}{s_x^2}\right) = 7\left(2.204 - \frac{1.694^2}{9.102}\right) = 13.2 \text{ (3 S.F.)}$$

The minimum sum of squares of residuals for y on x is 13.2 (3 S.F.).

(d) For x on y

$$\sum n_i^2(\min) = n\left(s_x^2 - \frac{s_{xy}^2}{s_y^2}\right) = 7\left(9.102 - \frac{1.694^2}{2.204}\right) = 54.6 \text{ (3 S.F.)}$$

The minimum sum of squares of residuals for x on y is 54.6 (3 S.F.).
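Parts (c) and (d) can be reproduced from the totals alone. This is a small sketch under the assumption that only the sums quoted in the solution are available; the variable names are illustrative.

```python
# Totals quoted for Example 11.4
n = 7
sum_x, sum_y = 38, 89
sum_x2, sum_y2, sum_xy = 270, 1147, 495

s_xy = sum_xy / n - (sum_x / n) * (sum_y / n)   # covariance
s_x2 = sum_x2 / n - (sum_x / n) ** 2            # variance of x
s_y2 = sum_y2 / n - (sum_y / n) ** 2            # variance of y

# Minimum sums of squares of residuals for each regression line
ssr_y_on_x = n * (s_y2 - s_xy ** 2 / s_x2)
ssr_x_on_y = n * (s_x2 - s_xy ** 2 / s_y2)

print(f"y on x: {ssr_y_on_x:.1f}")
print(f"x on y: {ssr_x_on_y:.1f}")
```

The two values differ because the y on x line minimises vertical distances while the x on y line minimises horizontal ones.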

CALCULATOR NOTE: working in SD mode, the values of $\bar{x}$, $\bar{y}$, $s_x$ and $s_y$ can be obtained directly and used in the calculations. The value of $s_{xy}$ will need to be calculated separately.
If your calculator has LR mode (linear regression) then the regression line y on x can be found directly. The following calculations were done on a Casio 100C or 115N, for the data in Example 11.4 on p. 570.

Set the calculator to LR (linear regression) mode and enter each pair of values using the DATA key.

(Try to use both hands, the left hand for the numbers and the right hand for the DATA keys.)

If the regression line y on x is written in the form

y = A + Bx

then the calculator gives A = 11.704 035 87 and B = 0.186 098 654

so that the regression line y on x is y = 0.186x + 11.7 (as before).

You also have access to the following information:

$\sum x^2 = 270$
$\sum x = 38$
$n = 7$
$\sum y^2 = 1147$
$\sum y = 89$
$\sum xy = 495$
$\bar{x} = 5.428\,571\,429$
$s_x^2 = 9.102\,040\,816$
$\bar{y} = 12.714\,285\,71$
$s_y^2 = 2.204\,081\,634$

SPECIAL NOTE: you also have access to r, the product-moment correlation coefficient (see p. 579), and this is obtained by pressing SHIFT 9, which gives r = 0.378 180 198.

To clear the LR mode, press MODE 0.

Exercise 11c

In the following questions, check your answers using your calculator in LR mode if possible.

1. Calculate (i) the covariance, (ii) the equations of the two least squares regression lines for the following data. Plot the scatter diagrams and draw in the regression lines. Find also the minimum sum of squares of residuals (iii) for y on x, (iv) for x on y.

20 20.2 21.4 21.6 22.8 23.4 24.6
c.g ows ST Ss los

9 65 66 115 14 765
5 75 fee 10 os (eels

eer 12° 12 deni Sunes
|y | 65 63 64/65 63 62 60) 61

2. Calculate the equation of the regression line of y on x for the following distribution:

x | 25 30 35 40 45 50
y | 78 70 65 58 48 42

Is it possible to calculate from the equation you have just found (a) an estimate for the value of x when y = 54? (b) an estimate for the value of y when x = 37? In each case, if the answer is 'Yes', calculate the estimate. If the answer is 'No', say why not. (SUJB)

3. The following data show, in convenient units, the yield (y) of a chemical reaction run at various different temperatures (x):

Temperature (x) 110 120 130 140 150 160 170
Yield (y) ale ey Gial e DCy iaia EES)

(a) Plot the data. Comment on whether it appears that the usual simple linear regression model is appropriate.
(b) Assuming that such a model is appropriate, estimate the regression line of yield on temperature.
(c) Plot your estimated line on your graph, and indicate clearly on your graph the distances, the sum of whose squares is minimised by the linear regression procedure. (MEI)

4. To test the effect of a new drug twelve patients were examined before the drug was administered and given an initial score (I) depending on the severity of various symptoms. After taking the drug they were examined again and given a final score (F). A decrease in score represented an improvement. The scores for the twelve patients are given in Table A.

Table A

Patient     |  1  2  3  4  5  6  7  8  9 10 11 12
Initial (I) | 61 23  8 14 42 34 32 31 41 25 20 50
Final (F)   | 49 12  3  4 28 27 20 20 34 15 16 40

Calculate the equation of the line of regression of F on I.
On the average what improvement would you expect for a patient whose initial score was 30? (MEI)

5. A straight line regression equation is fitted by the least squares method to the n points $(x_r, y_r)$, r = 1, 2, ..., n. For the regression equation y = ax + b, show in a sketch the distances whose sum of squares is minimised, and mark clearly which axis records the dependent variable and which axis records the independent (controlled) variable.
In a chemical reaction it is known that the amount, A grams, of a certain compound produced is a linear function of the temperature T °C. Eight trial runs of this reaction are performed, two at each of four different temperatures. The observed values of A are subject to error. The results are shown in the table.
Draw a scatter diagram for these data. Calculate $\bar{A}$ and $\bar{T}$.
Obtain the equation of the regression line of A on T giving the coefficients to 2 decimal places.
Draw this line on your scatter diagram.
Use the regression equation to obtain an estimate of the mean value of A when T = 20, and explain why this estimate is preferable to averaging the two observed values of A when T = 20.
Estimate the mean increase in A for a one degree increase in temperature.
State any reservations you would have about estimating the mean value of A when T = 0. (L)

6. In an attempt to increase the yield (kg/h) of an industrial process a technician varies the percentage of a certain additive used, while keeping all other conditions as constant as possible. The results are shown below.
You may assume that $\sum x = 34$, $\sum y = 1057$, $\sum xy = 4504.55$, $\sum x^2 = 155$.
(a) Draw a scatter diagram of the data.
(b) Calculate the equation of the regression line of yield on percentage additive and draw it on the scatter diagram.
The technician now varies the temperature (°C) while keeping other conditions as constant as possible and obtains the following results.

70 TAS: 80 igo 85 90

He calculates (correctly) that the regression line is y = 107.14 0.29t.
(c) Draw a scatter diagram of these data together with the regression line.
(d) The technician reports as follows, 'The regression coefficient of yield on percentage additive is larger than that of yield on temperature, hence the most effective way of increasing the yield is to make the percentage additive as large as possible, within reason.'
Criticise the report and make your own recommendations on how to achieve the maximum yield. (AEB 1988)

7. Referring to your projects if possible, explain clearly the purpose of obtaining a linear regression equation, and describe what use was, or could be, made of this equation.

8. A large field used for growing potatoes was divided into 6 equal plots, and each plot was treated with a different concentration of a certain fertiliser. At harvest time the yield from each plot was recorded, and the results are given in the table, with potato yield (Y kg m^-2) and fertiliser concentration (C g l^-1).

Concentration, C - i an
Yield, Y 10 16 26 36 50 72

Draw a scatter diagram for these data, and mark on your diagram the point representing the mean of the data.
Find the equation of a suitable regression line from which the yield to be expected for a concentration of 5 g l^-1 can be predicted, and give the value of this expected yield. Sketch the regression line on your scatter diagram.
Calculate the sum of squares of the residuals and explain what this value represents with regard to your regression line.
[If required, you may assume in your working that $\sum C^2 = 66.25$, $\sum CY = 813$, $\sum Y^2 = 10012$.] (L)

9. In an experiment the temperature of a metal rod was raised from 300 K. The extensions E mm of the rod at selected temperatures T K are shown in the table.
Draw a scatter diagram of the data and mark on your diagram the point representing the means of T and E.
Find the equation of the regression line of E on T and draw this line on your diagram. Estimate the extension of the rod at 430 K. (L)

THE PRODUCT-MOMENT CORRELATION COEFFICIENT

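The least squares regression line y on x is

$$y - \bar{y} = \frac{s_{xy}}{s_x^2}(x - \bar{x})$$

and the least squares regression line x on y is

$$x - \bar{x} = \frac{s_{xy}}{s_y^2}(y - \bar{y})$$

We 'standardise' these equations as follows:

$$\frac{y - \bar{y}}{s_y} = \frac{s_{xy}}{s_x s_y}\cdot\frac{(x - \bar{x})}{s_x}$$

$$\frac{x - \bar{x}}{s_x} = \frac{s_{xy}}{s_x s_y}\cdot\frac{(y - \bar{y})}{s_y}$$

Now take new axes, with origin $(\bar{x}, \bar{y})$ and the X axis graduated in units of $s_x$, the Y axis graduated in units of $s_y$, so that

$$Y = \frac{y - \bar{y}}{s_y} \quad \text{and} \quad X = \frac{x - \bar{x}}{s_x}$$

Now the regression lines can be written

$$Y = rX \quad \text{where} \quad r = \frac{s_{xy}}{s_x s_y}$$

and $X = rY$

The diagram from Example 11.4 would change from diagram (a) to diagram (b):

Diagram (a)     Diagram (b)

where the new origin is at the point (5.4, 12.7) and

$$r = \frac{s_{xy}}{s_x s_y} = \frac{1.694}{\sqrt{9.102}\,\sqrt{2.204}} = 0.378$$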

Note that if Y = rX makes an angle θ with the X-axis, then r = tan θ, so that X = rY makes the same angle θ with the Y-axis.

Some examples of regression lines together with the corresponding


lines Y=rX and X =rY are shown below.
Generally, the more correlated the variables are, the closer are the
two regression lines.

[Diagrams: pairs of regression lines with the corresponding lines Y = rX and X = rY, for each of the following cases]

Perfect positive correlation, r = 1 (y on x and x on y coincide)
High positive correlation, r = 0.8
Some positive correlation, r = 0.5
No correlation, r = 0
Some negative correlation, r = -0.4
High negative correlation, r = -0.9
Perfect negative correlation, r = -1 (x on y coincides with y on x)
Thus r is a measure of the degree of scatter. It is independent of the units in which the data are measured. Note that it is a measure of linear correlation only, so that even if there were a perfect quadratic relation between the variables we could still have r ≈ 0 (for example, y = x² with the x-values placed symmetrically about zero gives r = 0).

r is known as the product-moment correlation coefficient.

The product-moment correlation coefficient, r, is given by

$$r = \frac{s_{xy}}{s_x s_y}$$

ALTERNATIVE METHOD FOR FINDING THE MINIMUM SUM OF SQUARES OF RESIDUALS

The minimum sum of squares of residuals can now be written in a slightly simpler form, using the product-moment correlation coefficient, r.

For y on x

$$\sum m_i^2(\min) = n\left(s_y^2 - \frac{s_{xy}^2}{s_x^2}\right)$$

$$= n\left(s_y^2 - r^2 s_y^2\right) \quad \text{since } r = \frac{s_{xy}}{s_x s_y}$$

$$= n(1 - r^2)s_y^2$$

For y on x, the minimum sum of squares of residuals is $n(1 - r^2)s_y^2$.

For x on y

$$\sum n_i^2(\min) = n\left(s_x^2 - \frac{s_{xy}^2}{s_y^2}\right)$$

$$= n\left(s_x^2 - r^2 s_x^2\right) \quad \text{since } r = \frac{s_{xy}}{s_x s_y}$$

$$= n(1 - r^2)s_x^2$$

For x on y, the minimum sum of squares of residuals is $n(1 - r^2)s_x^2$.

Example 11.5 For the following data, find the product-moment correlation coefficient. Find also the minimum sum of squares of residuals for y on x.

| x | 20 30 40 46 54 60 80 88 92
| y | 54 60 54 62 68 80 66 80 100

Solution 11.5

For these data, n = 9.

$$s_{xy} = \frac{\sum xy}{n} - \bar{x}\bar{y} = \frac{37\,884}{9} - \left(\frac{510}{9}\right)\left(\frac{624}{9}\right) = 280.4444$$

$$s_x^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{34\,140}{9} - \left(\frac{510}{9}\right)^2 = 582.2222$$

$$s_y^2 = \frac{\sum y^2}{n} - \bar{y}^2 = \frac{45\,056}{9} - \left(\frac{624}{9}\right)^2 = 199.1111$$

So

$$r = \frac{s_{xy}}{s_x s_y} = \frac{280.4444}{\sqrt{(582.2222)(199.1111)}} = 0.8237$$

Therefore the product-moment correlation coefficient is 0.82 (2 d.p.), indicating a high positive correlation.

Now, for y on x

$$\sum m_i^2(\min) = n(1 - r^2)s_y^2 = 9(1 - 0.8237^2)(199.1111) = 576 \text{ (3 S.F.)}$$

The minimum sum of squares of residuals for y on x is 576 (3 S.F.).
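The formula n(1 - r²)s_y² can be compared with a direct sum of squared residuals about the fitted line. The sketch below uses the data of Example 11.5; the variable names are illustrative.

```python
import math

x = [20, 30, 40, 46, 54, 60, 80, 88, 92]
y = [54, 60, 54, 62, 68, 80, 66, 80, 100]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum(xi * yi for xi, yi in zip(x, y)) / n - x_bar * y_bar
s_x2 = sum(xi ** 2 for xi in x) / n - x_bar ** 2
s_y2 = sum(yi ** 2 for yi in y) / n - y_bar ** 2

r = s_xy / math.sqrt(s_x2 * s_y2)        # product-moment correlation coefficient

# Least squares line y = a*x + b, then residuals measured parallel to the y-axis
a = s_xy / s_x2
b = y_bar - a * x_bar
direct = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))

formula = n * (1 - r ** 2) * s_y2        # shortcut from the text

print(f"r = {r:.4f}")
print(f"direct = {direct:.2f}, formula = {formula:.2f}")
```

The two totals agree to within floating-point rounding, which is a useful check on a hand calculation.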


CALCULATOR NOTE: if your calculator has pre-programmed SD and/or LR modes then make use of them whenever possible.
In LR mode, r can be obtained directly: enter each pair of values (x, y) in turn, pressing DATA after each pair.

SHIFT 9 gives r = 0.82367..., as before.

The values of $s_x^2$ and $s_y^2$ can also be obtained using the SHIFT key.

So to calculate $n(1 - r^2)s_y^2$, where n = 9, use these stored values, which gives $\sum m_i^2(\min) = 576$ (3 S.F.), as before.

RELATIONSHIP BETWEEN REGRESSION COEFFICIENTS AND r

For the regression line y on x

$$y = ax + b \quad \text{where} \quad a = \frac{s_{xy}}{s_x^2}$$

and for the regression line x on y

$$x = cy + d \quad \text{where} \quad c = \frac{s_{xy}}{s_y^2}$$

where a and c are the regression coefficients.

Now

$$ac = \frac{s_{xy}^2}{s_x^2 s_y^2} = r^2$$

Now, either a and c are both positive or a and c are both negative,

so $r^2 = ac$ and

$r = +\sqrt{ac}$ if a, c are positive

$r = -\sqrt{ac}$ if a, c are negative

Example 11.6 For the data given in Example 11.4, find r, the product-moment
correlation coefficient.

Solution 11.6 Method 1


From p. 571 the least squares regression line y on x is

y = 0.186x + 11.7 (a = 0.186)

and x on y is

x = 0.769y - 4.34 (c = 0.769)

So $r = +\sqrt{ac} = \sqrt{(0.186)(0.769)} = 0.378$ (3 d.p.)

r = 0.378, indicating that there is some positive correlation.

Method 2

$$r = \frac{s_{xy}}{s_x s_y}$$

We have shown that $s_{xy} = 1.694$, $s_x^2 = 9.102$, $s_y^2 = 2.204$ (p. 571).

So

$$r = \frac{s_{xy}}{s_x s_y} = \frac{1.694}{\sqrt{(9.102)(2.204)}} = 0.378 \text{ (3 d.p.)}$$
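Method 1 amounts to taking the square root of ac with the common sign of a and c. A minimal sketch follows; the function name `r_from_coefficients` is illustrative, and rejecting coefficients of opposite sign is an added safeguard, not something stated in the text.

```python
import math


def r_from_coefficients(a, c):
    """Correlation coefficient from the two regression coefficients a and c."""
    if a * c < 0:
        # a and c share the sign of the covariance, so they cannot differ in sign
        raise ValueError("regression coefficients must have the same sign")
    return math.copysign(math.sqrt(a * c), a)


print(round(r_from_coefficients(0.186, 0.769), 3))
```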
Example 11.7 The moisture content, M, in grams of water per 100 grams of dried solids, of core samples of mud from an estuary was measured at depth D metres. The results are shown in the table:

Depth (D)            |  0  5 10 15 20 25 30 35
Moisture content (M) | 90 82 56 42 30 21 21 18

(a) On graph paper, draw a scatter diagram for these data.
(b) Obtain, to 3 decimal places, the product-moment correlation coefficient. Without performing a significance test, interpret the meaning of your result.
(c) Find the equation of the regression line of M on D, giving the coefficients to 2 decimal places.
(d) Find, to 2 decimal places, the minimum sum of squares of the residuals and explain using words and a diagram what this number represents.
(e) From your equation estimate, to 2 decimal places, the decrease in M when D increases by 1. (L)

Solution 11.7 (a) Scatter diagram to show moisture content, M, and depth, D.

[Scatter diagram: moisture content against depth (m)]

Method 1: using calculator in SD mode.

(b)
| D |  0  5 10 15 20 25 30 35
| M | 90 82 56 42 30 21 21 18

$$r = \frac{s_{DM}}{s_D s_M} \quad \text{where} \quad s_{DM} = \frac{\sum DM}{8} - \bar{D}\bar{M}$$

Now $\sum DM = (0)(90) + (5)(82) + \ldots + (35)(18) = 3985$

From calculator,

$\bar{D} = 17.5$, $s_D = 11.456\,439$

$\bar{M} = 45$, $s_M = 26.528\,287$

$$s_{DM} = \frac{3985}{8} - (17.5)(45) = -289.375$$

Therefore

$$r = \frac{-289.375}{(11.45\ldots)(26.52\ldots)} = -0.952 \text{ (3 d.p.)}$$

This is almost perfect negative correlation.

(c) Equation of least squares regression line M on D is

$$M - \bar{M} = \frac{s_{DM}}{s_D^2}(D - \bar{D})$$

Therefore

$$M - 45 = \frac{-289.375}{(11.45\ldots)^2}(D - 17.5) = -2.20(D - 17.5)$$

M = -2.20D + 83.58 (2 d.p.)

We show the regression line drawn on the scatter diagram. Note that it goes through $(\bar{D}, \bar{M})$, i.e. (17.5, 45), and the intercept on the M-axis is 83.58.

The least squares regression line of M on D: M = 83.58 - 2.20D.

[Graph: scatter diagram of moisture content against depth (m), with the regression line drawn]

(d) The lengths $P_1Q_1, P_2Q_2, \ldots, P_8Q_8$ are called the residuals. The sum of the squares of the residuals is given by

$$\sum m^2 = P_1Q_1^2 + \ldots + P_8Q_8^2$$

and the minimum value of this sum is given by

$$\sum m^2(\min) = n(1 - r^2)s_M^2 = 8\left(1 - (-0.952\ldots)^2\right)(26.5\ldots)^2 = 525.98 \text{ (2 d.p.)}$$

(e) The gradient of the regression line is -2.20 so that when D increases by 1, M decreases by 2.20.

Gradient = -2.20
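Parts (b) to (e) can be reproduced in a few lines. This is a sketch rather than the book's method; the M values are those read from the table, and they reproduce the quoted totals ΣDM = 3985 and mean M = 45.

```python
import math

D = [0, 5, 10, 15, 20, 25, 30, 35]
M = [90, 82, 56, 42, 30, 21, 21, 18]
n = len(D)

d_bar = sum(D) / n
m_bar = sum(M) / n
s_dm = sum(d * m for d, m in zip(D, M)) / n - d_bar * m_bar
s_d2 = sum(d ** 2 for d in D) / n - d_bar ** 2
s_m2 = sum(m ** 2 for m in M) / n - m_bar ** 2

r = s_dm / math.sqrt(s_d2 * s_m2)    # (b) correlation coefficient
a = s_dm / s_d2                      # (c) gradient of M on D
b = m_bar - a * d_bar                # (c) intercept on the M-axis

# (d) residuals measured parallel to the M-axis
ssr = sum((m - (a * d + b)) ** 2 for d, m in zip(D, M))

print(f"r = {r:.3f}")
print(f"M = {a:.2f}D + {b:.2f}")
print(f"minimum sum of squares of residuals = {ssr:.2f}")
print(f"decrease in M per unit increase in D = {-a:.2f}")   # (e)
```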
Method 2: using calculator in LR mode.

(b) We shall use x for depth D and y for moisture content M.

Set the calculator to LR mode and enter the eight pairs of values.

Now SHIFT 9 gives r = -0.952 142 933.

So the product-moment correlation coefficient = -0.952 (3 d.p.).

This is almost perfect negative correlation.

(c) The least squares regression line M on D is given by M = A + BD,

where the calculator gives A = 83.583... = 83.58 (2 d.p.)

and B = -2.204 761 905 = -2.20 (2 d.p.)

So the regression line is M = 83.58 - 2.20D (2 d.p.).

NOTE: the calculator also gives $\bar{x} = 17.5$ and $\bar{y} = 45$.

(d) $\sum m_i^2(\min) = n(1 - r^2)s_M^2$

Now n = 8 and from the calculator $s_M^2 = 703.75$.

From (b), r = -0.952 142 933,

so calculating $n(1 - r^2)s_M^2$ gives $\sum m_i^2(\min) = 525.98$ (2 d.p.).

(e) As in Method 1.

Exercise 11d

1. Calculate the product-moment correlation coefficient for the sets of data given in Exercise 11c, Question 1, and comment on your answers.

2. If the equations of the least squares regression lines are
y = 0.648x + 2.64 (y on x) and
x = 0.917y - 1.91 (x on y)
find the product-moment correlation coefficient for the data.

3. For a given set of data $\sum x = 680$, $\sum y = 996$, $\sum x^2 = 20\,154$, $\sum y^2 = 34\,670$, $\sum xy = 24\,844$, n = 80. Find the product-moment correlation coefficient and the equations of the two least squares regression lines.

4. For a given set of data the equations of the least squares regression lines are
y = -0.219x + 20.8 (y on x) and
x = -0.785y + 16.2 (x on y)
Find the product-moment correlation coefficient for the data.

5. For a given set of data $\sum x = 21$, $\sum y = 33$, $\sum x^2 = 91$, $\sum y^2 = 205$, $\sum xy = 128$, n = 6. Find the product-moment correlation coefficient for the data. Find also the minimum sum of squares of residuals for y on x.

USING A METHOD OF CODING


When the values of x and y are very large or very small we need to avoid exceeding the capacity of the calculator. The least squares calculations can be better done by a change of origin and scaling, that is, using a method of coding.
For the data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ suppose we use the coding

$$X_i = \frac{x_i - a}{b} \quad \text{and} \quad Y_i = \frac{y_i - c}{d}$$

where the scale factors b and d are taken to be positive.
NOTE: do not confuse the scaling constants a and c used here with the regression coefficients a and c.
Now, rearranging we have

$x_i = a + bX_i$ and $y_i = c + dY_i$ for i = 1, 2, ..., n

We have already seen (pp. 44, 63) that

$\bar{x} = a + b\bar{X}$, $\bar{y} = c + d\bar{Y}$

and $s_x = b s_X$, $s_y = d s_Y$

For the covariance

$$s_{xy} = \frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})$$

$$= \frac{1}{n}\sum \left[a + bX_i - (a + b\bar{X})\right]\left[c + dY_i - (c + d\bar{Y})\right]$$

$$= \frac{1}{n}\sum b(X_i - \bar{X})\, d(Y_i - \bar{Y})$$

So $s_{xy} = bd\, s_{XY}$

For the product-moment correlation coefficient

$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{bd\, s_{XY}}{b s_X \cdot d s_Y} = \frac{s_{XY}}{s_X s_Y}$$

So $r_{xy} = r_{XY}$,

i.e. the product-moment correlation coefficient remains unchanged. This is because r is a measure of the degree of scatter and this is unchanged by a change of origin and scaling.

Example 11.8 For the following data, use a method of coding to find (a) the covariance, (b) the product-moment correlation coefficient, (c) the least squares regression lines y on x and x on y.

| x | 1000 1012 1009 1007 1010 1015 1010 1011
| y |  235  240  245  250  255  260  265  270

Solution 11.8 We use the codings

$$X = x - 1000, \qquad Y = \frac{y - 250}{5}$$

So, referring to the results on p. 588 with a = 1000, b = 1, c = 250, d = 5, we have $s_x = s_X$, $s_y = 5s_Y$ and $s_{xy} = 5s_{XY}$.

(a)
$$s_{XY} = \frac{\sum XY}{n} - \bar{X}\bar{Y} = \frac{81}{8} - \left(\frac{74}{8}\right)\left(\frac{4}{8}\right) = 5.5$$

Therefore $s_{xy} = 5s_{XY} = 5(5.5) = 27.5$

The covariance $s_{xy}$ is 27.5.

(b) Now

$$s_X^2 = \frac{\sum X^2}{n} - \bar{X}^2 = \frac{820}{8} - \left(\frac{74}{8}\right)^2 = 16.9375$$

$$s_Y^2 = \frac{\sum Y^2}{n} - \bar{Y}^2 = \frac{44}{8} - \left(\frac{4}{8}\right)^2 = 5.25$$

Therefore

$$r_{XY} = \frac{s_{XY}}{s_X s_Y} = \frac{5.5}{\sqrt{(16.9375)(5.25)}} = 0.58 \text{ (2 d.p.)}$$

So $r_{xy} = r_{XY} = 0.58$ (2 d.p.)
The product-moment correlation coefficient is 0.58 (2 d.p.).

(c) The equation of the least squares regression line Y on X is

$$Y - \bar{Y} = \frac{s_{XY}}{s_X^2}(X - \bar{X})$$

i.e.
$$Y - \frac{4}{8} = \frac{5.5}{16.9375}\left(X - \frac{74}{8}\right)$$

so Y = 0.3247X - 2.5037

Now, since $Y = \dfrac{y - 250}{5}$ and X = x - 1000, this equation may be written

$$\frac{y - 250}{5} = 0.3247(x - 1000) - 2.5037$$

y = 1.6235x - 1386.0185 (least squares regression line y on x)

The equation of the least squares regression line X on Y is

$$X - \bar{X} = \frac{s_{XY}}{s_Y^2}(Y - \bar{Y})$$

$$X - \frac{74}{8} = \frac{5.5}{5.25}\left(Y - \frac{4}{8}\right)$$

i.e. X = 1.048Y + 8.726

This equation may be written

$$x - 1000 = 1.048\left(\frac{y - 250}{5}\right) + 8.726$$

x = 0.2096y + 956.326 (least squares regression line x on y)
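The effect of coding on the covariance and on r can be checked directly. The sketch below uses the data and codings of Example 11.8; the helper name `cov_and_r` is illustrative.

```python
import math


def cov_and_r(u, v):
    """Return (covariance, product-moment correlation) of paired data u, v."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v)) / n
    s_u2 = sum((ui - u_bar) ** 2 for ui in u) / n
    s_v2 = sum((vi - v_bar) ** 2 for vi in v) / n
    return s_uv, s_uv / math.sqrt(s_u2 * s_v2)


x = [1000, 1012, 1009, 1007, 1010, 1015, 1010, 1011]
y = [235, 240, 245, 250, 255, 260, 265, 270]

# Coding: X = x - 1000 (b = 1), Y = (y - 250)/5 (d = 5)
X = [xi - 1000 for xi in x]
Y = [(yi - 250) / 5 for yi in y]

s_xy, r_xy = cov_and_r(x, y)
s_XY, r_XY = cov_and_r(X, Y)

print(f"s_XY = {s_XY:.4f}, s_xy = {s_xy:.4f}, ratio = {s_xy / s_XY:.1f}")
print(f"r_XY = {r_XY:.4f}, r_xy = {r_xy:.4f}")
```

The covariance scales by bd = 5 while r is the same for the coded and uncoded data.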

Exercise 11e

For the following sets of data, use appropriate methods of coding to calculate (a) the covariance, (b) the product-moment correlation coefficient, (c) the least squares lines of regression of y on x and x on y.

1. 1701 1722 1717 1718 1703 1701
   45.1 45.8 45.6 45.3 45.1 45.1

2. oe 981.2 981.3 981.9 981.6 981.5
   55.6. * 652 90 64g ee Tl esEes

3. 0.00157 0.00156 0.00149 0.00165
   100.4 100.7 100.0 100.4

COEFFICIENTS OF RANK CORRELATION

For the data $(x_1, y_1), \ldots, (x_n, y_n)$ the product-moment correlation coefficient is $\dfrac{s_{xy}}{s_x s_y}$.

Now suppose that, instead of using precise values of the variables, or when such information is not available, we rank the numbers in order of size using the numbers 1, 2, ..., n.
A correlation coefficient can be determined on the basis of the ranks. There are two useful rank correlation coefficients:

Spearman's Coefficient of Rank Correlation, $r_S$

Kendall's Coefficient of Rank Correlation, $r_K$.

SPEARMAN'S COEFFICIENT OF RANK CORRELATION $r_S$

Suppose that

$X_1, X_2, \ldots, X_n$ are the ranks of $x_1, x_2, \ldots, x_n$
$Y_1, Y_2, \ldots, Y_n$ are the ranks of $y_1, y_2, \ldots, y_n$

Then

$X_1, X_2, \ldots, X_n$ are the numbers 1, 2, ..., n in some order
and $Y_1, Y_2, \ldots, Y_n$ are the numbers 1, 2, ..., n in some order

Consider the rank differences $d_1, d_2, \ldots, d_n$ given by

$d_1 = X_1 - Y_1, \; d_2 = X_2 - Y_2, \; \ldots, \; d_n = X_n - Y_n$

so that

$$\sum d_i^2 = \sum (X_i - Y_i)^2 = \sum X_i^2 + \sum Y_i^2 - 2\sum X_i Y_i$$

$$= 2\left(\tfrac{1}{6}\right)n(n + 1)(2n + 1) - 2\sum X_i Y_i$$

since $1^2 + 2^2 + \ldots + n^2 = \tfrac{1}{6}n(n + 1)(2n + 1)$.

So, dropping the subscript, we have

$$\sum XY = \tfrac{1}{6}n(n + 1)(2n + 1) - \tfrac{1}{2}\sum d^2 \qquad \text{(i)}$$

Now, if we substitute for the x's and y's in the original data their corresponding ranks in the formula for r, the product-moment correlation coefficient, we obtain an approximation to r. This approximation is called Spearman's coefficient of rank correlation, $r_S$.

We write

$$r_S = \frac{s_{XY}}{s_X s_Y} \quad \text{where} \quad s_{XY} = \frac{\sum XY}{n} - \left(\frac{\sum X}{n}\right)\left(\frac{\sum Y}{n}\right)$$

Substituting from (i) and using

$$\sum X = \sum Y = 1 + 2 + \ldots + n = \tfrac{1}{2}n(n + 1)$$

$$s_{XY} = \tfrac{1}{6}(n + 1)(2n + 1) - \frac{1}{2n}\sum d^2 - \tfrac{1}{4}(n + 1)^2$$

$$= \frac{(n + 1)}{12}\left[2(2n + 1) - 3(n + 1)\right] - \frac{1}{2n}\sum d^2$$

$$= \frac{(n + 1)(n - 1)}{12} - \frac{1}{2n}\sum d^2$$

$$= \frac{(n^2 - 1)}{12} - \frac{1}{2n}\sum d^2 \qquad \text{(ii)}$$

Now

$$s_X^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2 = \frac{(n + 1)(2n + 1)}{6} - \frac{(n + 1)^2}{4} = \frac{(n^2 - 1)}{12} \qquad \text{(iii)}$$

Similarly

$$s_Y^2 = \frac{(n^2 - 1)}{12}$$

so

$$s_X s_Y = \frac{(n^2 - 1)}{12}$$

Therefore from (ii) and (iii)

$$r_S = \frac{s_{XY}}{s_X s_Y} = \frac{\dfrac{(n^2 - 1)}{12} - \dfrac{1}{2n}\sum d^2}{\dfrac{(n^2 - 1)}{12}} = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

Spearman's coefficient of rank correlation, $r_S$, is given by

$$r_S = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

It is much easier to calculate $r_S$ than to calculate r, the product-moment correlation coefficient, as there is far less working involved.
However, in general, r is a more accurate measure of correlation.
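The derivation says that, when there are no tied ranks, the formula in Σd² is exactly the product-moment coefficient applied to the ranks. The following sketch checks this for every possible ordering of n = 5 ranks; the function names are illustrative.

```python
import math
from itertools import permutations


def pearson(u, v):
    """Product-moment correlation coefficient of paired values u, v."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / n
    s_u2 = sum((a - u_bar) ** 2 for a in u) / n
    s_v2 = sum((b - v_bar) ** 2 for b in v) / n
    return s_uv / math.sqrt(s_u2 * s_v2)


def spearman_from_d(X, Y):
    """Spearman's coefficient from rank differences (no tied ranks)."""
    n = len(X)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(X, Y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


n = 5
X = list(range(1, n + 1))
# Largest disagreement between the two formulas over all 5! orderings of Y
max_gap = max(abs(pearson(X, Y) - spearman_from_d(X, Y)) for Y in permutations(X))
print(max_gap)
```

The largest gap is at the level of floating-point rounding, as the algebra predicts.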

Method of ranking
Suppose we have the masses, x, (in kg) of five men

66, 68, 65, 69, 70

Arranged in ascending order of magnitude, these are 65, 66, 68, 69, 70, so we assign the ranks as follows:

Mass x (kg) | 66 68 65 69 70
Rank        |  2  3  1  4  5
If we have two or more equal values we proceed as follows:

Here, the 3rd and the 4th places represent the same mass (68 kg), so
we assign the average rank 3.5 to both these places.
Similarly for the eight values:

Here the 3rd, 4th and 5th places represent the same mass (66 kg) so we assign the average rank 4 to these places; also the 7th and the 8th places represent the same mass (68 kg) so we assign the average rank 7.5 to both these places.
NOTE: if there are more than just a few equal values, then this
method is not appropriate.
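The ranking rule, including the averaging of tied places, can be sketched as follows. The function name `average_ranks` and the second list (a made-up set of masses with 68 kg in the 3rd and 4th places) are illustrative assumptions; the first list is the five masses given above.

```python
def average_ranks(values):
    """Rank in ascending order (1 = smallest); tied values share the mean of their places."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first_place = ordered.index(v) + 1     # first place occupied by v
        count = ordered.count(v)               # number of places occupied by v
        rank_of[v] = first_place + (count - 1) / 2
    return [rank_of[v] for v in values]


print(average_ranks([66, 68, 65, 69, 70]))    # the five masses from the text
print(average_ranks([65, 66, 68, 68, 70]))    # hypothetical data with one tie
```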

Example 11.9 Two competitors rank the eight photographs in a competition as follows:
Photograph A ~“B C
1st competitor 2ZRe@5 “38
2nd competitor Ae TTS

Calculate Spearman’s coefficient of rank correlation for the data.

Solution 11.9 In this example, the data has been ranked already.
Let d = rank(x) - rank(y).

[Table: Photograph, Rank (x), Rank (y), d and d²]

$\sum d^2 = 30$

$$r_S = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \quad \text{where } n = 8$$

$$= 1 - \frac{6(30)}{8(64 - 1)}$$

= 0.64 (2 d.p.)

Spearman's coefficient of rank correlation for the data is 0.64, indicating some positive correlation between the competitors.
indicatin

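Example 11.10 The marks of 10 pupils in French and German tests are as follows.

| French, x | 12  8 16 12  7 10 12 16 12  9
| German, y |  6  5  7  7  4  6  8 13 10 10

Calculate Spearman's coefficient of rank correlation.

Solution 11.10 Let d = rank(x) - rank(y).

Ranking each set of marks in ascending order and giving tied marks the average of the places they occupy:

| Rank (x) | 6.5  2  9.5  6.5  1  4    6.5   9.5   6.5  3
| Rank (y) | 3.5  2  5.5  5.5  1  3.5  7     10    8.5  8.5
| d        | 3    0  4    1    0  0.5  -0.5  -0.5  -2   -5.5
| d²       | 9    0  16   1    0  0.25 0.25  0.25  4    30.25

$\sum d^2 = 61$

$$r_S = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(61)}{10(100 - 1)} = 0.63 \text{ (2 d.p.)}$$

Spearman's coefficient of rank correlation is 0.63, indicating some positive correlation between the marks in the two tests.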
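The calculation for Example 11.10 can be checked as below. This is a sketch that follows the text in using the Σd² formula with average ranks for tied marks; the helper name `average_ranks` is illustrative.

```python
def average_ranks(values):
    """Rank in ascending order (1 = smallest); tied values share the mean of their places."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first_place = ordered.index(v) + 1
        count = ordered.count(v)
        rank_of[v] = first_place + (count - 1) / 2
    return [rank_of[v] for v in values]


french = [12, 8, 16, 12, 7, 10, 12, 16, 12, 9]
german = [6, 5, 7, 7, 4, 6, 8, 13, 10, 10]

rank_x = average_ranks(french)
rank_y = average_ranks(german)

n = len(french)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(f"sum of d squared = {sum_d2}")
print(f"r_S = {r_s:.2f}")
```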

Exercise 11f

1. The table shows the marks awarded to six children in a competition. Calculate a coefficient of rank correlation for the data:

Childe | AB Ch eee
Judge 1 G:8 rane Ors seal Ono
Judge 2 48 94 7.99 916 8.9 6.9

2. The following table shows the marks of eight pupils in biology and chemistry. Rank the results and find the value of Spearman's coefficient of rank correlation.

Biology, x   | 65 65 70 75 75 80 85 85
Chemistry, y | 50 55 58 55 65 58 61 65

3. Mr and Mrs Brown and their son John all drive the family car. Before ordering a new car they decide to list in order their preferences for five optional extras independently. The rank order of their choices is as shown:

Optional extra: Heated rear window, Anti-rust treatment, Headrests, Inertia-reel seat belts, Radio

(a) Calculate coefficients of rank correlation between each pair of members of the Brown family. (b) A salesman offered to supply three of these extras free with the new car. The family agreed to choose those three which were ranked highest by the two members who agreed most. Which three did they choose, and in what order? (L Additional)

4. Two adjudicators at a Music Competition award marks to ten pianists as follows:

Pianist        |  A  B  C  D  E  F  G  H  I  J
Adjudicator I  | 78 66 73 73 84 66 89 84 67 77
Adjudicator II | 81 68 81 75 80 67 85 83 66 78

Calculate a coefficient of rank correlation for these data. Name the method you have used and describe briefly, without proof, the principle on which it is based. (SUJB Additional)

5. Calculate Spearman's coefficient of rank correlation for the set of data given in Exercise 11c, Question 1 (p. 574). Compare these with the product-moment correlation coefficients found in Exercise 11d, Question 1 (p. 587).

6. In a skating competition one judge awards the same mark to all 4 competitors. Show that the coefficient of rank correlation (Spearman's) is 0.5, irrespective of the marks awarded to the competitors by the other judge.

7. Each of the variables x and y takes the values 1, 2, ..., n but not necessarily in the same order as each other. Prove that the covariance of x and y is

$$\text{cov}(x, y) = \frac{1}{12}(n^2 - 1) - \frac{1}{2n}\sum_{i=1}^{n}(x_i - y_i)^2$$

Hence show that Spearman's coefficient of rank correlation between x and y may be written as

$$1 - \frac{6}{n(n^2 - 1)}\sum_{i=1}^{n}(x_i - y_i)^2$$

Seven army recruits (A, B, ..., G) were given two separate aptitude tests. Their orders of merit in each test were

Order of merit | 1st 2nd 3rd 4th 5th 6th 7th
2nd test       |  D   F   E   B   G   C   A

Find Spearman's coefficient of rank correlation between the two orders and comment briefly on the correlation obtained. (O&C)

8. Sketch two scatter diagrams illustrating the following situations:
(a) two variables having a large, negative correlation;
(b) two variables having a small, positive correlation.
The mean rainfall per day and the mean number of hours of sunshine per day observed at a weather station are given below.

Month (January to December) | Rainfall | Sunshine

Calculate, correct to two decimal places, the rank correlation coefficient between rainfall and hours of sunshine.
What is the rank correlation coefficient between rainfall and minutes of sunshine? (SUJB Additional)

9. In a study of population density in eight suburbs of a town the statistics shown in the table were obtained. The population density is denoted by p, and the distance of the suburb from the centre of the town by d.

Suburb              |   A   B   C   D   E   F   G   H
p (persons/hectare) |  55  11  68  88  46  43  21  25
d (km)              | 0.7 3.8 1.7 2.6 1.5 2.6 3.4 1.9

(a) Plot p against d on a scatter diagram. (b) Calculate and mark on the diagram the mean of the array. (c) Calculate a coefficient of rank correlation between p and d, stating the system of ranking adopted for both quantities. (d) State what conclusions can be drawn from your answers to (a) and (c) concerning the general trend of the results. (e) Giving a reason for your answer, state which suburb in your opinion fits the general trend least well. (L Additional)

10.
English | 38 62 56 42 59 48
History | 64 84 84 60 73 69

The table shows the original marks of six candidates in two examinations. Calculate a coefficient of rank correlation and comment on the value of your result.
The History papers are re-marked and one of the six candidates is awarded five additional marks. Given that the other marks, and the coefficient of rank correlation, are unchanged, state, with reasons, which candidate received the extra marks. (C Additional)

11. (a) X and Y were judges at a beauty contest in which there were 10 competitors. Their rankings are shown below.

Maw eee ocemueienn ayte 8! I
YiLeow shite Ce atalesialine? palates

Calculate a coefficient of rank correlation between these two sets of ranks and comment briefly on your result.
(b) Illustrate by means of two scatter diagrams rank correlation coefficients of 0 and -1 between two variables X and Y. (C Additional)

12. (a) Sketch scatter diagrams which illustrate (i) positive linear correlation, (ii) negative linear correlation, (iii) no correlation, between two variables X and Y.
(b) A doctor asked ten of his patients, who were smokers, how many years they had smoked. In addition, for each patient, he gave a grade between 0 and 100 indicating the extent of their lung damage. The following table shows the results:

Patient                 |  A  B  C  D  E  F  G  H  I  J
Number of years smoking | 45 99 95 28 31 33 36 39 42 48
Lung damage grade       | 30 50 55 30 57 35 60 72 70 75

Calculate a coefficient of rank correlation between the number of years of smoking and the extent of lung damage. Comment on the figure which you obtain. (C Additional)

SIGNIFICANCE OF SPEARMAN’S RANK CORRELATION COEFFICIENT

In order to test the significance of the calculated value of $r_s$, it is necessary to calculate the probability of obtaining a given value of $\sum d^2$.

We look at the distribution of $\sum d^2$ in the following situation:


These are the rankings of four samples of sparkling wine by two
wine-tasters, Enrico and Claude.

Wine

Enrico’s ranking
Claude’s ranking
If we leave the first row in its natural ranking order, 1, 2, 3, 4, then the second row could be ranked in 4! = 24 different ways, assuming that there are no equal ranks. These 24 arrangements are shown here, with the corresponding values of $\sum d^2$.

Arranging $\sum d^2$ in the form of a frequency distribution:

$\sum d^2$ | 0 2 4 6 8 10 12 14 16 18 20
Frequency | 1 3 1 4 2 2 2 4 1 3 1

The distribution is symmetrical about $\sum d^2 = 10$.
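The frequency distribution can be checked by brute force. The following is a minimal Python sketch, not part of the original text; the function names `sum_d2_distribution` and `lower_tail` are illustrative assumptions. It enumerates all $n!$ rankings of the second row against the natural order and tallies $\sum d^2$, assuming there are no tied ranks.

```python
from collections import Counter
from itertools import permutations


def sum_d2_distribution(n):
    """Count how often each value of sum(d^2) occurs over all n! rankings."""
    base = range(1, n + 1)
    counts = Counter()
    for perm in permutations(base):
        counts[sum((x - y) ** 2 for x, y in zip(base, perm))] += 1
    return dict(sorted(counts.items()))


def lower_tail(n, value):
    """P(sum d^2 <= value) when all n! rankings are equally likely."""
    dist = sum_d2_distribution(n)
    total = sum(dist.values())
    return sum(freq for s, freq in dist.items() if s <= value) / total


dist4 = sum_d2_distribution(4)
print(dist4)                       # frequencies for n = 4, totalling 24
print(round(lower_tail(4, 8), 3))  # 11/24, about 0.458
```

The same functions could be run for larger $n$, although the number of permutations grows quickly, so this is only practical for small samples.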
Note the following results, using

$$r_s = 1 - \frac{6\sum d^2}{n(n^2-1)}$$

$\sum d^2 = 0$, $r_s = 1$: perfect positive correlation; the rankings agree exactly.
$\sum d^2 = 10$, $r_s = 0$: no correlation.
$\sum d^2 = 20$, $r_s = -1$: perfect negative correlation; one rank is the exact reverse of the other.
The frequency distribution of $\sum d^2$ (bar chart of frequency against $\sum d^2$, with $\sum f = 24$).

We can now use this bar chart to find probabilities associated with various values of $\sum d^2$.

(a) $P(\sum d^2 \le 0) = \frac{1}{24} = 0.0417$ and $P(\sum d^2 \ge 20) = \frac{1}{24} = 0.0417$

(b) $P(\sum d^2 \le 2) = \frac{4}{24} = 0.167$ and $P(\sum d^2 \ge 18) = \frac{4}{24} = 0.167$

(c) $P(\sum d^2 \le 4) = \frac{5}{24} = 0.208$ and $P(\sum d^2 \ge 16) = \frac{5}{24} = 0.208$

(d) $P(\sum d^2 \le 6) = \frac{9}{24} = 0.375$ and $P(\sum d^2 \ge 14) = \frac{9}{24} = 0.375$

(e) $P(\sum d^2 \le 8) = \frac{11}{24} = 0.458$ and $P(\sum d^2 \ge 12) = \frac{11}{24} = 0.458$

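If we consider one of these diagrams, (e) say,

$P(\sum d^2 \le 8) = 0.458$ and $P(\sum d^2 \ge 12) = 0.458$

Also, if $\sum d^2 \le 8$, with $n = 4$,

$$r_s \ge 1 - \frac{6(8)}{4(15)} \quad \text{so} \quad r_s \ge 0.2$$

So $P(\sum d^2 \le 8) = P(r_s \ge 0.2) = 0.458$.

Also, if $\sum d^2 \ge 12$, with $n = 4$,

$$r_s \le 1 - \frac{6(12)}{4(15)} \quad \text{so} \quad r_s \le -0.2$$

So $P(\sum d^2 \ge 12) = P(r_s \le -0.2) = 0.458$.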

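Putting together the results from all the diagrams we have

Lower tail | Upper tail | Probability
$r_s \ge 1$ | $r_s \le -1$ | 0.0417
$r_s \ge 0.8$ | $r_s \le -0.8$ | 0.167
$r_s \ge 0.6$ | $r_s \le -0.6$ | 0.208
$r_s \ge 0.4$ | $r_s \le -0.4$ | 0.375
$r_s \ge 0.2$ | $r_s \le -0.2$ | 0.458

The results for $\sum d^2$ are also summarised thus:

$n = 4$
$\sum d^2 \le$ | $\sum d^2 \ge$ | Probability
0 | 20 | 0.0417
2 | 18 | 0.167
4 | 16 | 0.208
6 | 14 | 0.375
8 | 12 | 0.458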

This is an extract from a larger table, given in the Appendix on p. 638, which gives probabilities for $\sum d^2$ for values of $n$ in the range $4 \le n \le 10$. We will refer to this table as Table A.

SIGNIFICANCE TEST FOR $r_s$, USING PROBABILITIES OF $\sum d^2$

When we test $r_s$ for significance, a suitable null hypothesis is $H_0: \rho = 0$, where $\rho$ is the true population correlation coefficient and $\rho = 0$ indicates that there is no predictable correlation between the rankings. The method is illustrated as follows:

Example 11.11 For 8 pairs of rankings, $\sum d^2 = 28$, giving Spearman’s coefficient of rank correlation $r_s = 0.667$ (3 d.p.).
Does this value indicate (a) a correlation significantly different from zero, at the 10% level, (b) a significant positive correlation at the 1% level?

Solution 11.11 (a) $H_0: \rho = 0$ (there is no correlation)
$H_1: \rho \ne 0$ (there is some correlation different from zero)
We refer to Table A (p. 638).
Use a 2-tailed test at the 10% level and reject $H_0$ if $P(\sum d^2 \le 28) < 0.05$.
From Table A, with $n = 8$:

$\sum d^2 \le 28$ | $\sum d^2 \ge 140$ | 0.0415

This indicates that $P(\sum d^2 \le 28) = 0.0415 < 0.05$, so we reject $H_0$ and conclude that there is evidence at the 10% level of a correlation different from zero.

(b) $H_0: \rho = 0$ (there is no correlation)
$H_1: \rho > 0$ (there is some positive correlation)

Use a 1-tailed test at the 1% level and reject $H_0$ if $P(\sum d^2 \le 28) < 0.01$.

Now, from Table A, $P(\sum d^2 \le 28) = 0.0415 > 0.01$, so we do not reject $H_0$ and conclude that there is no evidence at the 1% level of a positive correlation.

Example 11.12 For 9 pairs of rankings it is found that $\sum d^2 = 214$, giving $r_s = -0.783$. Does this provide evidence, at the 1% level, of a negative correlation?

Solution 11.12 $H_0: \rho = 0$ (there is no correlation)
$H_1: \rho < 0$ (there is some negative correlation)

Use a 1-tailed test at the 1% level and reject $H_0$ if $P(\sum d^2 \ge 214) < 0.01$.

Now, from Table A, $P(\sum d^2 \ge 214) = 0.0086 < 0.01$, so we reject $H_0$ and conclude that there is evidence, at the 1% level, of negative correlation.

Example 11.13 An expert on porcelain is asked to place 7 china bowls in date order
of manufacture assigning the rank 1 to the oldest bowl. The actual
dates of manufacture and the order given by the expert are shown.

Date of manufacture |1920 1857 1710 1896 1810 1690 1780

i Oe a fF 6 U2 et 5

Find, to 3 decimal places, the Spearman rank correlation coefficient between the order of manufacture and the order given by the expert.

Refer to one of the tables of critical values provided to comment on the significance of your result. State clearly the null hypothesis which is being tested. (L)P

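Solution 11.13 Ranking the dates of manufacture (x) and the order given by the expert (y), and finding the difference $d$ for each bowl, gives $\sum d^2 = 16$, $n = 7$.

Now

$$r_s = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6(16)}{7(48)} = 0.714 \text{ (3 d.p.)}$$

$H_0: \rho = 0$ (no correlation)
$H_1: \rho > 0$ (some positive correlation)

Use a 1-tailed test, at the 5% level, and reject $H_0$ if $P(\sum d^2 \le 16) < 0.05$.

From Table A, $P(\sum d^2 \le 16) = 0.044 < 0.05$, so we reject $H_0$ and conclude that there is evidence, at the 5% level, of agreement between the order given by the expert and the actual dates of manufacture.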
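The values of $r_s$ quoted in Examples 11.11 to 11.13 can be reproduced directly from $\sum d^2$ and $n$. The sketch below is illustrative only; the function names are assumptions, and the formula is valid only when there are no tied ranks. The helper `max_sum_d2` uses the fact that the largest possible $\sum d^2$ is $\frac{1}{3}n(n^2-1)$, which is why a lower-tail value of 28 pairs with an upper-tail value of 140 when $n = 8$.

```python
def spearman_from_sum_d2(sum_d2, n):
    """Spearman's r_s from the sum of squared rank differences (no ties)."""
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def max_sum_d2(n):
    """Largest possible sum d^2, reached when one ranking reverses the other."""
    return n * (n ** 2 - 1) // 3


# (sum d^2, n) as given in Examples 11.11, 11.12 and 11.13
examples = [(28, 8), (214, 9), (16, 7)]
for sum_d2, n in examples:
    print(n, sum_d2, round(spearman_from_sum_d2(sum_d2, n), 3))

# Upper-tail value that mirrors sum d^2 = 28 when n = 8
print(max_sum_d2(8) - 28)
```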

Exercise 11g
In each of the following questions use Table A to test the hypotheses.

SIGNIFICANCE TEST FOR $r_s$, USING CRITICAL VALUES

Instead of working with $\sum d^2$ it is much easier to refer to Table B, which gives critical values of $r_s$. This is printed below and in the Appendix on p. 639.

Critical values of the Spearman rank correlation coefficient

TABLE B

Example 11.14 Using Table B, for $n = 8$ and $r_s = 0.667$, test the following hypotheses:
(a) $H_0: \rho = 0$, $H_1: \rho \ne 0$ (10% level of significance)
(b) $H_0: \rho = 0$, $H_1: \rho > 0$ (1% level of significance)

Solution 11.14 (a) Using a 2-tailed test, at the 10% level, and considering Table B, with $n = 8$, significance 0.05 (because the test is 2-tailed), we reject $H_0$ if $r_s \ge 0.643$.
Now $r_s = 0.667 > 0.643$, so we reject $H_0$ and conclude that there is evidence at the 10% level of a correlation different from zero.

(b) Using a 1-tailed test, at the 1% level, and considering Table B, with $n = 8$, significance 0.01, we reject $H_0$ if $r_s \ge 0.833$.
Now $r_s = 0.667 < 0.833$, so we do not reject $H_0$ and conclude that there is no evidence, at the 1% level, of positive correlation.

Example 11.15 For 9 pairs of values, $r_s$ is found to be $-0.765$. Test, at the 1% level, whether there is evidence of a negative correlation.

Solution 11.15 $H_0: \rho = 0$ (there is no correlation)
$H_1: \rho < 0$ (there is negative correlation)
Using Table B, with $n = 9$, significance level 0.01, we reject $H_0$ if $r_s \le -0.783$.

Now $r_s = -0.765 > -0.783$, so we do not reject $H_0$ and conclude that there is no evidence, at the 1% level, of a negative correlation.

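Exercise 11h
In each of the following, use Table B to comment on the significance of the value for $r_s$.

$H_0: \rho = 0$, $H_1: \rho \ne 0$
$H_0: \rho = 0$, $H_1: \rho > 0$
$H_0: \rho = 0$, $H_1: \rho \ne 0$
$H_0: \rho = 0$, $H_1: \rho > 0$
$H_0: \rho = 0$, $H_1: \rho < 0$
$H_0: \rho = 0$, $H_1: \rho \ne 0$
$H_0: \rho = 0$, $H_1: \rho > 0$
$H_0: \rho = 0$, $H_1: \rho > 0$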

_ Exercise 11i
In the following questions, either Table A or (f) X 123456789 10
Table B may be used. YoU TOsOSSmii Omon
4. elm
(g)X 123456 78910
1. Find rg and comment on the significance 3164581092 7
of the result.
X and Y have been ranked. (h) Xv vlo2 S946 6 7 § 9 10
(a)X 123456 YY. OS 1007s GES (2 4.3)
Y 123456
(b) X 12383456
Y 654321
id x 128 one 2. Calculate rs for the following data and
comment on the significance of the
Y 3 5°))476°2 results.
(d) . : : 3 y i : (a) (20, 13), (47, 29), (50, 33), (33, 20),
(57, 32), (44, 23), (38, 25), (25, 19).
(e) X 12345678910 (b) (4.8, 81), (6.2, 79), (8.4, 86),
YY. 126354 2556 ee Omo mL (4.1, 63), (7.5, 90), (5.1, 87).
KENDALL’S COEFFICIENT OF RANK CORRELATION $r_k$

To calculate Kendall’s coefficient of rank correlation the data is ranked and then written so that one set is in rank order, for example

Rank (x) | 1 2 3 4 5
Rank (y) | 3 1 4 2 5
 | $y_1$ $y_2$ $y_3$ $y_4$ $y_5$

We investigate the agreement between the rankings by calculating a ‘score’, $S$, in the following way. We denote the $y$ values by $y_1, y_2, y_3, y_4, y_5$.

Start at $y_1$ and work from left to right.
Add 1 to the ‘score’ $S$ for each $y$-value greater than $y_1$.
Subtract 1 from the ‘score’ $S$ for each $y$-value less than $y_1$.
Then repeat the process, starting in turn from $y_2$, $y_3$ and $y_4$ and always working from left to right.

2 values greater than $y_1$, 2 values less than $y_1$: score to the right of $y_1$ = +2 - 2 = 0
3 values greater than $y_2$, 0 values less than $y_2$: score to the right of $y_2$ = +3 - 0 = +3
1 value greater than $y_3$, 1 value less than $y_3$: score to the right of $y_3$ = +1 - 1 = 0
1 value greater than $y_4$, 0 values less than $y_4$: score to the right of $y_4$ = +1 - 0 = +1

Therefore the total ‘score’ $S = 0 + 3 + 0 + 1 = 4$

Now, Kendall’s coefficient of rank correlation is defined as

$$r_k = \frac{S}{\frac{1}{2}n(n-1)}$$

So, when $n = 5$, $S = 4$,

$$r_k = \frac{4}{\frac{1}{2}(5)(4)} = 0.4$$
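The left-to-right scoring rule translates directly into code. The sketch below is a minimal illustration, not the book's own procedure; the names `kendall_score` and `kendall_rk` are assumptions, and the ranking 3, 1, 4, 2, 5 is an assumed example chosen because it reproduces the per-position scores 0, +3, 0, +1 worked through above. It requires the x ranks to be in natural order already and no tied ranks.

```python
def kendall_score(rank_y):
    """Total score S: for each position, +1 for every later value that is
    greater and -1 for every later value that is smaller."""
    score = 0
    for i, current in enumerate(rank_y):
        for later in rank_y[i + 1:]:
            if later > current:
                score += 1
            elif later < current:
                score -= 1
    return score


def kendall_rk(rank_y):
    """Kendall's coefficient r_k = S / (n(n-1)/2)."""
    n = len(rank_y)
    return kendall_score(rank_y) / (0.5 * n * (n - 1))


example = [3, 1, 4, 2, 5]          # y ranks, with x ranks in order 1..5
print(kendall_score(example))      # 4
print(kendall_rk(example))         # 0.4
```

Applying `kendall_score` to a ranking that is already in order, such as 1 to 9, gives the maximum score $\frac{1}{2}n(n-1)$.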
Now, when there is perfect agreement, Kendall’s coefficient $r_k$ should take the value +1. We check that this is the case. Consider

 | A B C D E F G H I
Rank (x) | 1 2 3 4 5 6 7 8 9
Rank (y) | 1 2 3 4 5 6 7 8 9
 | $y_1$ $y_2$ $y_3$ $y_4$ $y_5$ $y_6$ $y_7$ $y_8$ $y_9$
Score to the right of $y_i$ | 8 7 6 5 4 3 2 1

The total score is $S = 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = 36$, so

$$r_k = \frac{S}{\frac{1}{2}n(n-1)} = \frac{36}{\frac{1}{2}(9)(8)} = 1 \text{ as expected.}$$

Maximum score = 36.

NOTE: In general, when there are $n$ pairs of rankings, the maximum score $S$ is given by

$$S = 1 + 2 + 3 + \dots + (n-1) = \tfrac{1}{2}n(n-1) \quad \text{(sum of an A.P.)}$$

Example 11.16 Nine applicants are interviewed for a teaching post by the head-
teacher and the head of department. Each ranked the applicants in
order of merit as follows:

Applicant A Barc
Headteacher 2 ie rO
Head of Department | 3 12

Investigate the extent of the agreement between the rankings of the


two interviewers.

Solution 11.16 We will consider Kendall’s rank correlation coefficient, and must first put one set in rank order and allocate letters $y_1$ to $y_9$ to the other set:

Headteacher (x) | 1 2 3 4 5 6 7 8 9
Head of Department (y) | 1 3 5 4 2 7 9 8 6
 | $y_1$ $y_2$ $y_3$ $y_4$ $y_5$ $y_6$ $y_7$ $y_8$ $y_9$

Calculating the total ‘score’ $S$,

Score to the right of $y_i$ | +8 +5 +2 +3 +4 +1 -2 -1

Total score $S = 20$

Now

$$r_k = \frac{S}{\frac{1}{2}n(n-1)} = \frac{20}{\frac{1}{2}(9)(8)} = 0.556$$

Therefore there is some agreement between the rankings of the two


interviewers. We now investigate the level of significance of this
value, and will complete the solution after the next section, in
Solution 11.17.

SIGNIFICANCE OF KENDALL’S RANK CORRELATION COEFFICIENT

In order to test the significance of the calculated value of $r_k$, it is necessary to calculate the probability of obtaining a given value of $S$. We look at the distribution of $S$ in the same way that we looked at the distribution of $\sum d^2$ on p. 597.
For $n = 4$, all the possible ranks are listed below, showing the value of $S$ in each case.

Arranging $S$ in a frequency distribution:

$S$ | -6 -4 -2 0 2 4 6
Frequency | 1 3 5 6 5 3 1

The distribution is symmetrical about $S = 0$.
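As with $\sum d^2$, the distribution of $S$ for small $n$ can be obtained by listing every possible ranking. The following is a hedged sketch rather than the book's method; `score_distribution` and `upper_tail` are illustrative names, and equal likelihood of all $n!$ rankings with no ties is assumed.

```python
from collections import Counter
from itertools import permutations


def kendall_score(rank_y):
    """Score S: +1 for each later value that is greater, -1 for each smaller."""
    score = 0
    for i, current in enumerate(rank_y):
        for later in rank_y[i + 1:]:
            score += 1 if later > current else -1
    return score


def score_distribution(n):
    """Frequency of each value of S over all n! rankings (no ties)."""
    counts = Counter(kendall_score(p) for p in permutations(range(1, n + 1)))
    return dict(sorted(counts.items()))


def upper_tail(n, value):
    """P(S >= value) when all n! rankings are equally likely."""
    dist = score_distribution(n)
    return sum(f for s, f in dist.items() if s >= value) / sum(dist.values())


print(score_distribution(4))
print(round(upper_tail(4, 4), 3))  # 4/24, about 0.167
```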

Note the following results:

$S = 6$, $r_k = 1$: perfect positive correlation; the rankings agree exactly.
$S = 0$, $r_k = 0$: no correlation.
$S = -6$, $r_k = -1$: perfect negative correlation; one rank is the exact reverse of the other.

The frequency distribution of $S$ (bar chart of frequency against $S$, with $\sum f = 24$).

We can now use this diagram to find the probabilities associated with various values of $S$.

(a) $P(S \le -6) = \frac{1}{24} = 0.0417$ and $P(S \ge 6) = \frac{1}{24} = 0.0417$

(b) $P(S \le -4) = \frac{4}{24} = 0.167$ and $P(S \ge 4) = \frac{4}{24} = 0.167$

(c) $P(S \le -2) = \frac{9}{24} = 0.375$ and $P(S \ge 2) = \frac{9}{24} = 0.375$

(d) $P(S \le 0) = \frac{15}{24} = 0.625$

(e) $P(S \ge 0) = \frac{15}{24} = 0.625$

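Putting together the results from all the diagrams:

Lower tail | Upper tail | Probability
$S \le -6$ | $S \ge 6$ | 0.0417
$S \le -4$ | $S \ge 4$ | 0.167
$S \le -2$ | $S \ge 2$ | 0.375
$S \le 0$ | $S \ge 0$ | 0.625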

The results are also summarised thus:

This is an extract from a larger table, given in the Appendix on p. 640, which gives probabilities for $S$ for $n$ in the range $4 \le n \le 10$. We will refer to this as Table C.

Note that the table is in two parts:

When $n$ = 4, 5, 8 or 9 the possible values of $S$ are even numbers. For example, for $n = 8$, $S$ can take even values from -28 to 28 inclusive and $P(S \le -20) = P(S \ge 20) = 0.0071$.

When $n$ = 6, 7 or 10 the possible values of $S$ are odd numbers. For example, for $n = 10$, $S$ can take odd values from -45 to 45 inclusive and $P(S \le -15) = P(S \ge 15) = 0.108$.
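The even/odd split follows from the way $S$ is built. If $C$ pairs are in the same order in both rankings and $D$ pairs are in opposite order, then $S = C - D$ and $C + D = \frac{1}{2}n(n-1)$, so $S = \frac{1}{2}n(n-1) - 2D$ always has the same parity as $\frac{1}{2}n(n-1)$. The short sketch below, with illustrative function names, simply confirms this by enumeration for a few small $n$.

```python
from itertools import permutations


def kendall_score(rank_y):
    """Score S: +1 for each later value that is greater, -1 for each smaller."""
    score = 0
    for i, current in enumerate(rank_y):
        for later in rank_y[i + 1:]:
            score += 1 if later > current else -1
    return score


def score_parities(n):
    """Set of S mod 2 over all rankings of n items (0 = even, 1 = odd)."""
    return {kendall_score(p) % 2 for p in permutations(range(1, n + 1))}


for n in range(4, 8):
    max_score = n * (n - 1) // 2
    print(n, max_score, score_parities(n))
```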

When we test $r_k$ for significance, a suitable null hypothesis is $H_0: \rho = 0$, where $\rho$ is the true population correlation coefficient and $\rho = 0$ indicates that there is no predictable correlation between the rankings. The method is illustrated as follows:

Example 11.17 Considering the data given in Example 11.16, perform a significance test to determine the extent of agreement between the headteacher and head of department when ranking nine applicants for a teaching post.

Solution 11.17 We found that $S = 20$, $n = 9$ and $r_k = 0.556$, and we want to test $r_k$ for significance.

So $H_0: \rho = 0$ (no correlation)
$H_1: \rho > 0$ (there is some positive correlation)

Use a 1-tailed test at the 5% level and reject $H_0$ if $P(S \ge 20) < 0.05$.
Now, Table C indicates that $P(S \ge 20) = 0.0223 < 0.05$, so we reject $H_0$ and conclude that there is evidence, at the 5% level, of a positive correlation between the rankings of the headteacher and head of department.

Example 11.18 Calculate $r_k$ for the following data and comment on the result.
(6.9, 89), (5.8, 73), (4.8, 81), (6.2, 79), (8.4, 86),
(4.1, 63), (7.5, 90), (5.1, 87), (9.9, 96), (4.3, 72).

Solution 11.18
x | 6.9 5.8 4.8 6.2 8.4 4.1 7.5 5.1 9.9 4.3
y | 89 73 81 79 86 63 90 87 96 72
Rank (x) | 7 5 3 6 9 1 8 4 10 2
Rank (y) | 8 3 5 4 6 1 9 7 10 2

Now re-arrange the pairs so that the x-values are in rank order.

Rank (x) | 1 2 3 4 5 6 7 8 9 10
Rank (y) | 1 2 5 7 3 4 8 9 6 10
 | $y_1$ $y_2$ $y_3$ $y_4$ $y_5$ $y_6$ $y_7$ $y_8$ $y_9$ $y_{10}$

Score to the right of $y_i$:
$y_1$: 9 - 0 = 9
$y_2$: 8 - 0 = 8
$y_3$: 5 - 2 = 3
$y_4$: 3 - 3 = 0
$y_5$: 5 - 0 = 5
$y_6$: 4 - 0 = 4
$y_7$: 2 - 1 = 1
$y_8$: 1 - 1 = 0
$y_9$: 1 - 0 = 1

Total score $S = 31$

Now

$$r_k = \frac{S}{\frac{1}{2}n(n-1)} = \frac{31}{\frac{1}{2}(10)(9)} = 0.689 \text{ (3 d.p.)}$$

$H_0: \rho = 0$ (no correlation)
$H_1: \rho > 0$ (some positive correlation)

Use a 1-tailed test, at the 1% level, and reject $H_0$ if $P(S \ge 31) < 0.01$.
From Table C, $n = 10$, $P(S \ge 31) = 0.0023 < 0.01$, so reject $H_0$ and conclude that there is evidence, at the 1% level, of positive correlation.
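The whole calculation in Example 11.18, from raw pairs to $r_k$, can be sketched in a few lines. This is an illustrative sketch only; the helper names are assumptions, rank 1 is given to the smallest value in each set, and the data are assumed to contain no ties.

```python
pairs = [(6.9, 89), (5.8, 73), (4.8, 81), (6.2, 79), (8.4, 86),
         (4.1, 63), (7.5, 90), (5.1, 87), (9.9, 96), (4.3, 72)]


def ranks(values):
    """Rank 1 for the smallest value; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def kendall_score(rank_y):
    """Score S: +1 for each later value that is greater, -1 for each smaller."""
    score = 0
    for i, current in enumerate(rank_y):
        for later in rank_y[i + 1:]:
            score += 1 if later > current else -1
    return score


rank_x = ranks([x for x, _ in pairs])
rank_y = ranks([y for _, y in pairs])

# Put the x ranks into natural order and carry the y ranks along with them.
y_in_x_order = [ry for _, ry in sorted(zip(rank_x, rank_y))]

n = len(pairs)
S = kendall_score(y_in_x_order)
r_k = S / (0.5 * n * (n - 1))
print(y_in_x_order)      # [1, 2, 5, 7, 3, 4, 8, 9, 6, 10]
print(S, round(r_k, 3))  # 31 0.689
```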

Example 11.19 When calculating $r_k$, with 9 pairs of data, it is found that $S = -24$. Test, at the 1% level, the hypotheses: $H_0: \rho = 0$, $H_1: \rho \ne 0$.

Solution 11.19 Use a 2-tailed test, at the 1% level, and reject $H_0$ if $P(S \le -24) < 0.005$.
Now, from Table C, with $n = 9$, $P(S \le -24) = P(S \ge 24) = 0.0063 > 0.005$. Therefore we do not reject $H_0$ and conclude that, at the 1% level, there is no evidence of correlation.

Exercise 11j

In each of questions 1 to 7, use Table C to test the hypotheses, at the level of significance indicated.

8. Calculate $r_k$ for number 1, Exercise 11i and comment on the significance of the result.

9. Calculate $r_k$ for number 2 of Exercise 11i and comment on the significance of the result.

10. Three judges in a bouncing baby competition rank the babies as shown.
Calculate $r_k$ for each pair of judges and comment on the significance of your results.

11. These were the marks obtained by 8 pupils in Mathematics and Physics.
Mathematics | 67 42 85 51 39 97 81 70
Physics | 70 59 71 38 55 62 80 76
Calculate $r_k$ and comment on the significance of the result.

MISCELLANEOUS WORKED EXAMPLES

Example 11.20 (a) The marks of eight candidates in English and Mathematics are:
Candidate | 1 2 3 4 5 6 7 8
English (x) | 50 58 35 86 76 43 40 60
Mathematics (y) | 65 72 54 82 32 74 40 53
Rank the results and hence find a rank correlation coefficient between the two sets of marks.
(b) Using the data in part (a), obtain the product-moment correlation coefficient. To assist in the lengthy calculation, you may use the information $s_x = 16.67$. (SUJB)

Solution 11.20 (a)

Candidate | 1 2 3 4 5 6 7 8
Rank (x) | 4 5 1 8 7 3 2 6
Rank (y) | 5 6 4 8 1 7 2 3
$|d|$ | 1 1 3 0 6 4 0 3

$\sum d^2 = 72$, $n = 8$

$$r_s = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6(72)}{8(64-1)} = 0.14 \text{ (2 d.p.)}$$

Spearman’s coefficient of rank correlation is 0.14 (2 d.p.).

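(b) $\bar{x} = 56$, $\bar{y} = 59$

Covariance $s_{xy} = \frac{\sum xy}{n} - \bar{x}\bar{y} = \frac{26\,762}{8} - (56)(59) = 41.25$

$s_y^2 = \frac{\sum y^2}{n} - \bar{y}^2 = \frac{29\,958}{8} - 59^2 = 263.75$; $s_y = 16.24$

$s_x = 16.67$ (given)

Therefore

$$r = \frac{s_{xy}}{s_x s_y} = \frac{41.25}{(16.67)(16.24)} = 0.15 \text{ (2 d.p.)}$$

The product-moment correlation coefficient is 0.15 (2 d.p.).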
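The summary statistics in Solution 11.20 can be checked from the raw marks. The sketch below is illustrative and rests on one stated assumption: the third English mark is read as 35 and the fifth Mathematics mark as 32, because those readings reproduce the quoted values $s_x = 16.67$, $s_y = 16.24$ and $s_{xy} = 41.25$. Variable and helper names are hypothetical.

```python
from math import sqrt

english = [50, 58, 35, 86, 76, 43, 40, 60]
maths = [65, 72, 54, 82, 32, 74, 40, 53]   # fifth mark assumed to be 32
n = len(english)


def ranks(values):
    """Rank 1 for the smallest value; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


# Part (a): Spearman's coefficient from the rank differences
sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(english), ranks(maths)))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Part (b): product-moment coefficient from covariance and standard deviations
mean_x = sum(english) / n
mean_y = sum(maths) / n
s_xy = sum(x * y for x, y in zip(english, maths)) / n - mean_x * mean_y
s_x = sqrt(sum(x * x for x in english) / n - mean_x ** 2)
s_y = sqrt(sum(y * y for y in maths) / n - mean_y ** 2)
r = s_xy / (s_x * s_y)

print(sum_d2, round(r_s, 2))
print(round(s_xy, 2), round(s_x, 2), round(s_y, 2), round(r, 2))
```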

Example 11.21 It is suspected that two quantities Q and W are related according to the formula $Q = aW^b$, where a and b are constants. Observations on Q and W were made and the results were as follows:

| w [13 16°20) 26 8a 40 coe a


|@ | 4050 32 24°81 25 16
Plot a scatter diagram of $\log_{10} Q$ against $\log_{10} W$ and estimate the equation of the regression line of $\log Q$ on $\log W$, using the means of certain points, or otherwise. Use your results to estimate values for a and b.

State how you would obtain the product moment correlation coefficient between $\log Q$ and $\log W$. (SUJB)

Solution 11.21 Let $x = \log_{10} W$ and $y = \log_{10} Q$.

$$\bar{x} = \frac{\sum x}{n} = \frac{11.601}{8} = 1.45, \qquad \bar{y} = \frac{\sum y}{n} = 1.52$$

We plot $M(\bar{x}, \bar{y})$ and draw a line parallel to the y axis.

For the points on the left: $\bar{x}_L = \frac{5.017}{4} = 1.25$ and $\bar{y}_L = 1.66$. We plot $M_L(1.25, 1.66)$.

For the points on the right: $\bar{x}_R = \frac{6.584}{4} = 1.65$ and $\bar{y}_R = 1.37$. We plot $M_R(1.65, 1.37)$.

The line of best fit is drawn through $M$, $M_L$ and $M_R$, ensuring that the line passes through $M$. This is a regression line y on x.

Using these points, we estimate the gradient to be

$$\frac{\bar{y}_L - \bar{y}_R}{\bar{x}_L - \bar{x}_R} = \frac{1.66 - 1.37}{1.25 - 1.65} = -0.73 \text{ (2 d.p.)}$$

Now $Q = aW^b$ so

$$\log_{10} Q = \log_{10} a + b\log_{10} W$$

i.e. $y = \log_{10} a + bx$

The gradient, $b = -0.73$.

Scatter diagram of $\log_{10} Q$ against $\log_{10} W$, showing the regression line ($x = \log_{10} W$, $y = \log_{10} Q$)

To find a, use the fact that $(\bar{x}, \bar{y})$ lies on the line.

Then $\bar{y} = \log_{10} a + b\bar{x}$, so $1.52 = \log_{10} a + (-0.73)(1.45)$

$\log_{10} a = 2.5785$ and $a = 10^{2.5785} = 380$ (2 S.F.)
We estimate the values of a and b to be 380 and -0.73 respectively.
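The arithmetic that turns the fitted straight line back into the power law $Q = aW^b$ can be sketched as follows. This is only an illustration under stated assumptions: the three mean points are the rounded values quoted in the solution, $b$ is taken as the rounded gradient $-0.73$, and the variable names are hypothetical.

```python
# Mean points quoted in the solution, with x = log10(W) and y = log10(Q)
x_bar, y_bar = 1.45, 1.52        # overall mean point M
x_left, y_left = 1.25, 1.66      # mean of the four left-hand points
x_right, y_right = 1.65, 1.37    # mean of the four right-hand points

# Gradient of the line through the left and right mean points
gradient = (y_left - y_right) / (x_left - x_right)   # about -0.725

# Use the rounded gradient, as in the text, and make the line pass through M
b = -0.73
log10_a = y_bar - b * x_bar      # 1.52 + 0.73 * 1.45
a = 10 ** log10_a                # a little under 380


def predict_q(w):
    """Estimated Q for a given W from the fitted relation Q = a * W**b."""
    return a * w ** b


print(round(gradient, 3), round(log10_a, 4), round(a))
```

Because $a$ is obtained by raising 10 to a power, small changes in the estimated gradient produce noticeably different values of $a$, which is one reason the answer is quoted to only 2 significant figures.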

The product-moment correlation coefficient would be found by calculating $r = \dfrac{s_{xy}}{s_x s_y}$, where $x = \log_{10} W$ and $y = \log_{10} Q$.

Example 11.22 The body and heart masses of fourteen 10-month-old male mice are tabulated below:

Body mass (x) (grams) | 27 30 37 38 32 36 32 32 38 42 36 44 33 38
Heart mass (y) (milligrams) | 118 136 156 150 140 155 157 114 144 159 149 170 131 160

(a) Draw a scatter diagram of these data.
(b) Calculate the equation of the regression line of y on x and draw this line on the scatter diagram.
(c) Calculate the product-moment coefficient of correlation. (AEB)

Solution 11.22 (a)

$\sum x = 495$, $\sum y = 2039$, $\sum xy = 72\,867$, $\sum x^2 = 17\,783$, $\sum y^2 = 300\,405$

$$\bar{x} = \frac{\sum x}{n} = \frac{495}{14} = 35.36 \text{ (2 d.p.)}, \qquad \bar{y} = \frac{\sum y}{n} = \frac{2039}{14} = 145.64 \text{ (2 d.p.)}$$

(b) The least squares regression line of y on x is given by

$$y - \bar{y} = \frac{s_{xy}}{s_x^2}(x - \bar{x})$$

where

$$s_{xy} = \frac{\sum xy}{n} - \bar{x}\bar{y} = \frac{72\,867}{14} - \left(\frac{495}{14}\right)\left(\frac{2039}{14}\right) = 55.27 \text{ (2 d.p.)}$$

and

$$s_x^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{17\,783}{14} - \left(\frac{495}{14}\right)^2 = 20.09 \text{ (2 d.p.)}$$

Therefore the coefficient of regression is $\frac{55.27}{20.09} = 2.75$.

So the equation of the least squares regression line y on x is

$$y - 145.64 = 2.75(x - 35.36)$$

To draw this on the scatter diagram, first plot $(\bar{x}, \bar{y})$. Then find two further points, e.g.

when x = 40, y = 145.64 + 2.75(40 - 35.36) = 158.4

when x = 30, y = 145.64 + 2.75(30 - 35.36) = 130.9

Scatter diagram to show body and heart masses of 14 mice (heart mass in mg against body mass in g)

(c) The product-moment correlation coefficient r is given by

$$r = \frac{s_{xy}}{s_x s_y}$$

Now

$$s_y^2 = \frac{\sum y^2}{n} - \bar{y}^2 = \frac{300\,405}{14} - \left(\frac{2039}{14}\right)^2 = 245.66$$

Therefore

$$r = \frac{s_{xy}}{s_x s_y} = \frac{55.27}{\sqrt{20.09}\sqrt{245.66}} = 0.79$$

The product-moment correlation coefficient is 0.79 (2 d.p.).
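The regression coefficient and the correlation coefficient in Example 11.22 depend only on the five sums, so they can be checked without retyping the data. The following sketch is illustrative; the variable names and the `predict` helper are assumptions, and the sums are the values $\sum x = 495$, $\sum y = 2039$, $\sum xy = 72\,867$, $\sum x^2 = 17\,783$, $\sum y^2 = 300\,405$ used in the solution.

```python
from math import sqrt

n = 14
sum_x, sum_y = 495, 2039
sum_xy, sum_x2, sum_y2 = 72867, 17783, 300405

mean_x, mean_y = sum_x / n, sum_y / n
s_xy = sum_xy / n - mean_x * mean_y      # covariance
var_x = sum_x2 / n - mean_x ** 2         # s_x squared
var_y = sum_y2 / n - mean_y ** 2         # s_y squared

slope = s_xy / var_x                     # regression coefficient of y on x
r = s_xy / sqrt(var_x * var_y)           # product-moment correlation


def predict(x):
    """Heart mass (mg) predicted from body mass x (g) by the y-on-x line."""
    return mean_y + slope * (x - mean_x)


print(round(s_xy, 2), round(var_x, 2), round(slope, 2), round(r, 2))
print(round(predict(40), 1), round(predict(30), 1))
```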



Example 11.23 The positions in a league of 8 hockey clubs at the end of a season
are shown in the table.
Shown also are the average attendances (in hundreds) at home
matches during that season.

Average attendance | 30 32 12 19 27 18 15 25

Calculate a coefficient of rank correlation between position in the league and average home attendance.
Refer to the appropriate table of critical values provided in the formulae booklet to comment on the significance of your result, stating clearly the null hypothesis being tested. (L)P

Solution 11.23 Either $r_s$ or $r_k$ could be calculated. We show the working for both.

Spearman

Position (x) | 1 2 3 4 5 6 7 8
Attendance rank (y) | 2 1 8 5 3 6 7 4
$d^2$ | 1 1 25 1 4 0 0 16

We have $\sum d^2 = 48$.

Now

$$r_s = 1 - \frac{6(48)}{8(63)} = 0.4286 \text{ (4 d.p.)}$$

Significance test:
$H_0: \rho = 0$ (no correlation between the two ranks)
$H_1: \rho > 0$ (some positive correlation)

NOTE: We would expect a positive correlation between league position and average attendance.
Use a 1-tailed test, at the 5% level, and reject $H_0$ if $P(\sum d^2 \le 48) < 0.05$.
From Table A, with $n = 8$, $P(\sum d^2 \le 48) = 0.15 > 0.05$, so we do not reject $H_0$ and conclude that there is no evidence, at the 5% level, of positive correlation between the two sets of ranks.

Kendall

Position (x) | 1 2 3 4 5 6 7 8
Attendance rank (y) | 2 1 8 5 3 6 7 4
 | $y_1$ $y_2$ $y_3$ $y_4$ $y_5$ $y_6$ $y_7$ $y_8$
Score to the right of $y_i$ | +5 +6 -5 0 +3 0 -1

Total score $S = 8$

$$r_k = \frac{S}{\frac{1}{2}n(n-1)} = \frac{8}{\frac{1}{2}(8)(7)} = 0.286 \text{ (3 d.p.)}$$

Significance test:
$H_0: \rho = 0$ (no correlation)
$H_1: \rho > 0$ (some positive correlation)
Use a 1-tailed test, at the 5% level, and reject $H_0$ if $P(S \ge 8) < 0.05$.
From Table C, with $n = 8$, $P(S \ge 8) = 0.199 > 0.05$, so we do not reject $H_0$ and conclude that there is no evidence, at the 5% level, of positive correlation between the two sets of ranks.

NOTE: The significance tests gave the same conclusion.
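Both coefficients in Example 11.23 can be computed side by side from the attendance figures. The sketch below is illustrative only and assumes that the clubs are listed in league order 1 to 8, that the second attendance figure is 32, and that rank 1 goes to the highest attendance; the helper names are hypothetical.

```python
attendance = [30, 32, 12, 19, 27, 18, 15, 25]   # clubs in league order (assumed)
position = list(range(1, len(attendance) + 1))
n = len(attendance)

# Rank 1 for the highest attendance; no ties in these data
ordered = sorted(attendance, reverse=True)
attendance_rank = [ordered.index(a) + 1 for a in attendance]


def kendall_score(rank_y):
    """Score S: +1 for each later value that is greater, -1 for each smaller."""
    score = 0
    for i, current in enumerate(rank_y):
        for later in rank_y[i + 1:]:
            score += 1 if later > current else -1
    return score


# Spearman: based on squared rank differences
sum_d2 = sum((p - a) ** 2 for p, a in zip(position, attendance_rank))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Kendall: positions are already in rank order, so score the attendance ranks
S = kendall_score(attendance_rank)
r_k = S / (0.5 * n * (n - 1))

print(attendance_rank)
print(sum_d2, round(r_s, 4))   # 48 0.4286
print(S, round(r_k, 3))        # 8 0.286
```

The two coefficients differ in size because they measure agreement in different ways, which is why each has its own table of probabilities.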

SUMMARY — REGRESSION AND CORRELATION

Least squares regression lines

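y on x: if the equation of the line is $y = ax + b$, then
$\sum y = a\sum x + nb$
$\sum xy = a\sum x^2 + b\sum x$

x on y: if the equation of the line is $x = cy + d$, then
$\sum x = c\sum y + nd$
$\sum xy = c\sum y^2 + d\sum y$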
1
=e)
n
. aN
= ey)
n
Product-moment correlation coefficient, $r$

$$r = \frac{s_{xy}}{s_x s_y}$$

In terms of the regression coefficients: $r^2 = ac$

where $a = \dfrac{s_{xy}}{s_x^2}$ (regression coefficient of y on x)

and $c = \dfrac{s_{xy}}{s_y^2}$ (regression coefficient of x on y)

Spearman’s coefficient of rank correlation, $r_s$

$$r_s = 1 - \frac{6\sum d^2}{n(n^2-1)}$$

Kendall’s coefficient of rank correlation, $r_k$

$$r_k = \frac{S}{\frac{1}{2}n(n-1)}$$

Miscellaneous Exercise 11k

In questions involving regression lines assume, unless stated otherwise, that the least squares regression lines are required.

1. 12 students were given a prognostic test at the beginning of a course and their scores $X_i$ in the test were compared with their scores $Y_i$ obtained in an examination at the end of the course ($i = 1, 2, \ldots, 12$). The results were as follows:

Find the equation of the regression line of Y on X and determine the correlation coefficient between X and Y. (SUJB)

2. The heights h, in cm, and weights W, in kg, of 10 people are measured. It is found that $\sum h = 1710$, $\sum W = 760$, $\sum h^2 = 293\,162$, $\sum hW = 130\,628$ and $\sum W^2 = 59\,390$.
Calculate the correlation coefficient between the values of h and W.
What is the equation of the regression line of W on h? (O & C)

3. Ten boys compete in throwing a cricket ball, and the following table shows the height of each boy (x cm) to the nearest cm and the distance (y m) to which he can throw the ball.

|Boy | A Bl LOMDINED EE... Git. \Taaed ground is marshy but very few where the
ground is dry. The number x of alder
x [122 124 133 138 144 156 158 161 164 168 trees and the ground moisture content y
y |41 88 52 66 29 54 59 61 63 67 are found in each of 10 equal areas
(which have been chosen to cover the
Find the equations of the regression lines range of x in all such areas). The following
of y on x, and ofx on y. No diagram is is a summary of the results of the survey:
needed. Calculate also the coefficient of yx = 500, Ly = 300,
correlation. Vx? =.27 818° Vey = 16837}
Estimate the distance to which a cricket Ly” = 10462
ball can be thrown by a boy 150cm
Find the equation of the regression line
in height. (AEB)
of y on x.
4. Sketch scatter diagrams for which Estimate the ground moisture content in
(a) the product moment correlation an area equal to one of the chosen areas
coefficient is — 1, which contains 60 alder trees. (O &C)
(bo) Spearman’s correlation coefficient is
+ 1, but the product moment correlation 7. (a) The following marks were awarded
coefficient is less than 1. by 2 judges at a music competition:
Five independent observations of the
random variables X and Y were: Child 1 10 9
Child 2 5 6
Child 3 8 10
Child 4 7 5
Child 5 9 8
Find
(c) the sample product moment correla- Calculate a coefficient of rank correlation.
tion coefficient, (b) Determine, by calculation, the
(d) Spearman’s correlation coefficient. equation of the regression line of x on y
(0 &C) based on the following information about
8 children:
5. The state of Tempora demands that
every household in the country shall have Child Tees” 4. 8b" 86 es
a reliable clock; inspectors are being intro- Arithmetic mark (x) |45 33 27 23 18 14 8 O
duced throughout the country to imple-
English mark (y) Seco tien 20 e129) SOR asian
ment the policy. The Chief Inspector has
the following data on the population size (SUJB)
of towns, where Inspection Units have
been set up, and the number of man- 8. The following data (Table A) represent
hours spent on inspection. the lengths (x) and breadths (y) of 12
cuckoos’ eggs measured in millimetres.
Population
Ghousendsy | 2 40,5 (2 a8 45" 28 (20-21) 22 Draw a scatter diagram for the data.
Obtain the least squares regression lines
Manboum
(thousands)
108 (11, dais" 24. 96y8F 82 pangae of y on x and plot this on the scatter
diagram. (JMB)
(a) Calculate the regression line for
predicting the number of man-hoursfrom 9. (X;, Y;),i=1,2,...,nisasample froma
the population size (note that the mean bivariate population. The least-square
value of each variate is a whole number). regression lines of Y on X and X on Y are
(b) Predict the manpower required (in calculated. Why would you not expect
man-hours) for a new Inspection Unit to the two lines to coincide? Under what
be installed in a town with a population circumstances would they coincide?
of 17000. (O) In the table, Y; is the mass (in grammes)
of potassium bromide which will dissolve
6. Ina certain heathland region there is a in 100 grammes of water at a tempera-
large number of alder trees where the ture of X; C.

Table A

22.3 23.6 24.2 22.6 22.38 22.38 22.1 23.38 22.2 22.2 21.8 23.2
16.5 17.1 17.38 17.0 16.8 16.4 17.2 16.8 16.7 16.2 16.6 16.4

|X ]10 20 30 40 50 11. Explain clearly what is meant by the


|y [61 64 70 73 78 statistical term ‘correlation’.
Vegboost Industries, a small chemical
Find the equation of the regression line firm specializing in garden fertilizers set
of Y on X. up an experiment to study the relation-
Find, also, the product-moment correla- ship between a new fertilizer compound
tion coefficient between X and Y. (SUJB) and the yield from tomato plants. Fight
similar plants were selected and treated
regularly throughout their life with x
10. (a) The 1973 and 1980 catalogue prices grams of fertilizer diluted in a standard
(in pence) of five British postage stamps volume of water. The yield y, in kilo-
are as follows: grams, of good tomatoes was measured
for each plant. The following table
1973 price} 50 45 65 25 15 summarizes the results.
1980 price |500 350 600 500 120
ALTE PC iDeehh FAYGs -H
(i) Plot these results on a scatter diagram. Amount
of fertilizer x (g) |1.2 1.8 3.1 4.9 5.7 7.1 8.6 9.8
(ii) Write down the coordinates of one Yield y (kg) 4.5 °6.9°°-7.0'7.8 7.26.8 4.5 2.7
point through which the regression line
of the 1980 price on the 1973 price must (a) Calculate the product-moment correla-
pass. tion coefficient for these data.
(iii) Fit, by eye, the regression line of the (6) Calculate Spearman’s rank correlation
1980 price on the 1973 price. coefficient for these data.
(iv) Denoting the 1980 price by y and (c) Is there any evidence of a relation-
the 1973 price by x, write down the ship between these variables? Justify your
equation of your fitted regression line answer. (No formal test is required.)
in the form y = ax + b, giving the con- (AEB 1978)
stants a and 6 to one decimal place. Use
this equation to determine the value of y
when x has the value 20. 12. State the effect on the product-moment
correlation coefficient between two
(b) (i) Ona certain island there are large
variables x and y of (a) changing the
numbers of each of two clans, the Fatties
origin for x and (b) changing the units
and the Thinnies. Two random samples of
of x.
50 adult males are taken, one sample
from each clan. Each of the clansmen is Table B below gives the daily output
weighed and measured. For each clan, the of the substance creatinine from the
value of the correlation coefficient body of each of ten nutrition students
between the heights and weights of the together with the student’s body mass.
clansmen is found to be near + 1. How- Draw a scatter diagram for the data.
ever, for the combined sample of 100
Calculate, correct to two decimal places,
adult males, the value of the correlation
the product-moment correlation co-
coefficient is found to be near — 1. Show,
efficient.
by a sketch of the scatter diagram for the
combined sample, how this could arise. Comment on any relationship which is
indicated by the scatter diagram and the
(ii) A large sample survey of three-
correlation coefficient. (JMB)
person families is conducted. The value of
X, the greatest amount earned by any one
member of the family, and the value of Y,
13. In a regression calculation for five pairs of
the total amount earned by the entire
observations one pair of values was lost
family, are both recorded. Would you
when the data were filed. For the regres-
expect the value of the correlation
sion of y on x the equation was calculated
coefficient between X and Yto be near
as
+1, near —1 or near 0? Justify your
answer. (C) y = 2x—0.1
Table B

Output of creatinine |4 35 154 1.45 1.06 2.13 1.00 0.90 2.00 2.70 0.75
(grammes)
Body mass 55 48-56 53" 74 44 | 49-68 78 «61
(kilogrammes)

The four recorded pairs of values are

x | 0.1 0.2 0.4 0.3
y | 0.1 0.3 0.7 0.4

Find the missing pair of values, using the following data for the four pairs above:
Σx = 1, Σx² = 0.3, Σxy = 0.47, Σy = 1.5.
(MEI)

14. (a) On two separate occasions, ranks 1, 2, 3 are assigned at random to three objects A, B, C. Obtain the probability distribution of a coefficient of rank correlation between the pair of rankings.
(b) Five sacks of coal, A, B, C, D and E have different weights, with A being heavier than B, B being heavier than C, and so on. A weight lifter ranks the sacks (heaviest first) in the order A, D, B, E, C. Calculate a coefficient of rank correlation between the weight lifter's ranking and the true ranking of the weights of the sacks. (C)

15. In an investigation into prediction using the stars and planets, a celebrated astrologist Horace Cope predicted the ages at which thirteen young people would first marry. The complete data, of predicted and actual ages at first marriage, are now available and are summarised in Table C below.
(a) Draw a scatter diagram of these data.
(b) Calculate the equation of the regression line of y on x and draw this line on the scatter diagram.
(c) Comment upon the results obtained, particularly in view of the data for person G. What further action would you suggest?
(AEB 1981)

16. Explain how you used, or could have used, a correlation coefficient to analyse the results of an experiment. State briefly when it is appropriate to use a rank correlation coefficient rather than a product-moment correlation coefficient.
Seven rock samples taken from a particular locality were analysed. The percentages, C and M, of two oxides contained in each sample were recorded. The results are shown in the table.

[Table of C and M for the seven samples; entries not recoverable]

Given that
ΣCM = 4.459, ΣC² = 19278, ΣM² = 7.196,
find, to 3 decimal places, the product-moment correlation coefficient of the percentages of the two oxides. Calculate also, to 3 decimal places, a rank correlation coefficient.
Using the tables provided state any conclusions which you draw from the value of your rank correlation coefficient. State clearly the null hypothesis being tested. (L)

17. Giving an example from your projects if you wish, describe conditions under which you would use a rank correlation coefficient as a measure of association.
In a ski-jumping contest each competitor made 2 jumps. The orders of merit for the 10 competitors who completed both jumps are shown in the table.

[Table of orders of merit by ski-jumper for the two jumps; entries not recoverable]

(a) Calculate, to 2 decimal places, a rank correlation coefficient for the performances of the ski-jumpers in the two jumps.
(b) Using a 5% level of significance and quoting from the tables of critical values provided, interpret your result. State clearly your null and alternative hypotheses. (L)
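A rank correlation coefficient of the kind asked for in questions 14 and 17 can be checked numerically. The sketch below assumes Spearman's coefficient for untied ranks, r_s = 1 − 6Σd²/(n(n² − 1)), and applies it to the weight lifter's ranking in question 14(b); the function and variable names are illustrative only.

```python
def spearman_from_ranks(rank_a, rank_b):
    """Spearman's r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)) for two untied rankings."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("need two rankings of equal length (at least 2 items)")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Question 14(b): the true ranking (heaviest first) is A, B, C, D, E.
true_rank = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

# The weight lifter's order, heaviest first, converted to a rank per sack.
lifter_order = ["A", "D", "B", "E", "C"]
lifter_rank = {sack: pos for pos, sack in enumerate(lifter_order, start=1)}

sacks = sorted(true_rank)
r_s = spearman_from_ranks(
    [true_rank[s] for s in sacks],
    [lifter_rank[s] for s in sacks],
)
print(r_s)
```

The same function can be reused for any pair of rankings without ties, such as the two jumps in question 17, once the ranks are entered.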

Table C

Person                  | A  B  C  D  E  F  G  H  I  J  K  L  M
Predicted age x (years) | 24 30 28 36 20 22 31 28 21 29 40 25 27
Actual age y (years)    | 23 31 28 35 20 25 45 30 22 27 40 27 26

18. In Table D below x is the average weekly household income in £ and y the infant mortality per 1000 live births in 11 regions of the UK in 1985.
It is hypothesised that a high value of x will be associated with a low value of y. Explain why it would not be appropriate to use the product moment correlation coefficient to investigate this. Calculate a rank correlation coefficient and test its significance. The values below give the probabilities of exceeding the given values of r_s and r_k calculated from 10 and 11 pairs of uncorrelated variables.

One-sided test probability | 5%     | 2.5%   | 1%
Two-sided test probability | 10%    | 5.0%   | 2.0%
n = 10   r_s               | 0.5636 | 0.6485 | 0.7455
n = 11   r_s               | 0.5364 | 0.6182 | 0.7091
n = 10   r_k               | 0.4667 | 0.5111 | 0.6000
n = 11   r_k               | 0.4182 | 0.4909 | 0.5636

It appears that region A is exceptional. What would your findings be if this region were omitted from the analysis? (SUJB)

19. Define a ranking scale and give an example to illustrate your definition. Explain how you would rank values of equal magnitude.
At the end of the academic year students on a particular course are given examinations in Sociology (S), Social Administration (SA) and Quantitative Methods (QM). The final grade awarded to each student is based on the total of the marks scored on the three papers. Table E shows the marks obtained by a sample of ten students who sat the three papers.
The following matrix of Spearman rank correlation coefficients was obtained for this sample of ten students.

1    0.24   −0.01   0.78
     1      x       0.77
            1       y
                    1

Find the values of x and y.
It has been decided that in future students should only be required to sit two papers. Use these data to decide which two examinations should be used. Give a reason for your choice. (AEB 1987)

20. (a) Explain briefly, referring to your project work if you wish, the conditions under which you would measure association using a rank correlation coefficient rather than a product moment correlation coefficient.

Table D

170.4 183.2 172.9 187.1 203.2 204.8 208.8 248.0 198.3 187.1 179.1

94 10.3 10%” 8:3 9.4 8.5 9.0 9.4 9.8

Table E

: Social Quantitative
Student | Sociology(S) | Administration(SA) | Methods (QM)
66 48 44

= SCO
ONaQoarhwnre

At an agricultural show 10 Shetland sheep were ranked by a qualified judge and by a trainee judge. Their rankings are shown in the table.

Qualified judge | 4 Oe 1 Ono #20
Trainee judge   | 1 2 5 6 7 8 10 4 3 9

Calculate a rank correlation coefficient for these data.
Using one of the tables provided and a 10% significance level, state your conclusions as to whether there is some degree of agreement between the two sets of ranks.
(b) The variables H and T are known to be linearly related. Fifty pairs of experimental observations of the two variables gave the following results:
ΣH = 83.4, ΣT = 402.0, ΣHT = 680.2, ΣH² = 384.6, ΣT² = 3238.12.
Obtain the regression equation from which one can estimate H when T has the value 7.8, and give, to 1 decimal place, the value of this estimate. (L)

21. A company is to replace its fleet of cars. Eight possible models are considered and the transport manager is asked to rank them, from 1 to 8, in order of preference. A saleswoman is asked to use each type of car for a week and grade them according to their suitability for the job (A – very suitable to E – unsuitable). The price is also recorded.

[Table: transport manager's ranking, saleswoman's grade and price for models S, T, U, V, W, X, Y, Z; entries not recoverable]

(a) Calculate Spearman's rank correlation coefficient between
(i) price and transport manager's rankings,
(ii) price and saleswoman's grades.
(b) Based on the results of (a) state, giving a reason, whether it would be necessary to use all three different methods of assessing the cars.
(c) A new employee is asked to collect further data and to do some calculations. He produces the following results.
The correlation coefficient between
(i) price and boot capacity is 1.2,
(ii) maximum speed and fuel consumption in miles per gallon is −0.7,
(iii) price and engine capacity is −0.9.
For each of his results say, giving a reason, whether you think it is reasonable.
(d) Suggest two sets of circumstances where Spearman's rank correlation coefficient would be preferred to the product moment correlation coefficient as a measure of association. (AEB 1988)

22. A sample of n pairs (x_i, y_i), i = 1, 2, ..., n, is drawn from a bivariate population (X, Y) and a rank correlation coefficient, r, calculated.
(a) What range of values is it possible for r to have?
(b) What information about the sample does r indicate?
(c) What can be concluded about the sample points when r = 1? Can the same be said about the population from which the sample is drawn? Explain your answer.
Table F below gives the average share rate and average mortgage rate calculated on the first day of the months shown for the years 1976 to 1985.
Plot a scatter diagram and comment on its implication for r. Indicate on your diagram which point appears to be an outlier (i.e. one that is far from the trend line).
Calculate a rank correlation coefficient between share rate and mortgage rate.
Under the null hypothesis that the population r = 0 against the alternative r ≠ 0 the following are critical values of Spearman's and Kendall's coefficients for a sample size of 10.

Significance level | Spearman's | Kendall's

Comment on the significance of your calculated value of r. (SUJB)

Table F

Year       | 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985
Month      | Nov Nov Dec Dec Jan Nov Dec Jul Dec Apr
Share %    | 78 6.0. 8.0,%10:5 ,10.5:| wo8uleee m7 on Mas emnes
Mortgage % | 12.2 95° 11.85,11,8 915.0), 15.03910.0 o6i1 25 see

23. Explain briefly what is measured by the product moment correlation coefficient.
The manager of a large office supervises 15 clerical assistants, each using a word-processor. Because of the pressure of work, the assistants did not all receive the same amount of training in the use of their word-processors. In order to make an assessment of the need for training the manager monitored their work during a given week, recording the number of pieces of work correctly produced without any errors (x₁), the number produced containing errors (x₂) together with the number of days training received (y).
The results are summarised in Table G below.
(a) Given that Σx₁² = 11513, Σy² = 728 and Σx₁y = 2676, show that the product moment correlation coefficient between y and x₁ is 0.491.
(b) Without using a comment of the form 'The correlation between x₁ and y is not very strong', suggest how the manager might have attempted to interpret this value as part of the assessment.
The manager then decided to investigate whether there is any association between the number of days training and a perceived measure of accuracy based on the difference between x₁ and x₂. Consequently a new variable z = x₁ − x₂ was created.
(c) Plot a scatter diagram of y against z. Explain why the manager should not correlate y and z using the product moment correlation coefficient.
(d) Explain why z² might be a better variable to correlate with y using the product moment correlation coefficient. Evaluate the correlation coefficient between y and z² and explain why the manager might be pleased with the value obtained. Suggest how this new variable would present the manager with a practical problem. (AEB 1988)

24. The experimental data below were obtained by measuring the horizontal distance y cm, rolled by an object released from the point P on a plane inclined at θ° to the horizontal, as shown in the diagram.

[Diagram: plane inclined at θ° to the horizontal, point of release P, horizontal distance y cm]

Σy = 828, Σyθ = 18 147, Σθ = 155.5, Σθ² = 3520.25.

Table G

Number of days
training (y)
en

oO
aw
5
ray1
8
9
3
8
2
6
4
5
3
rea1

(a) Illustrate the data by a scatter diagram.
(b) Calculate the equation of the regression line of distance on angle and draw this line on the scatter diagram.
(c) It later emerged that one of the points was obtained using a different object.
(i) Suggest which point this was.
(ii) Draw by eye a line of best fit on the scatter diagram ignoring the point apparently obtained with the different object.
(iii) Use the line drawn by eye to estimate the distance the original object would roll if released at an angle of (a) 12°, (b) 40°.
Discuss the uncertainty of each of these estimates. (AEB 1987)

25. Table H below gives the average cost per hundredweight of zinc manufactures imported into the UK during each of the years 1873 to 1882.
(a) Plot the data on graph paper, by coding with (year − 1872) as the x variable and (cost − 100) as the y variable.
(b) Given that Σy = 270 and Σxy = 1057, show that the gradient of the equation of the least squares regression line of y on x is −5.2 (to 2 significant figures). Calculate the equation of this line and plot it on your graph.
(c) Use your equation to predict the cost of zinc manufactures imported in 1883. Comment on your prediction.
(Source: Statistical Abstract for the United Kingdom 1871 to 1885.) (O)

26. A purchasing manager, of a London-based company, believes that the time in transit of goods sent by road depends upon the distance between the supplier and the company. In an attempt to measure this dependence, twelve packages, sent from different parts of the country, have their transit times (y, days) accurately recorded, together with the distance (x, miles) of the supplier from the company. The results are summarised as follows:
Σx = 1800, Σy = 36.0, Σxy = 6438.6, Σx² = 336 296, Σy² = 126.34.
Obtain the least squares straight line regression equation of y on x.
Explain the significance of the regression coefficient.
Predict the transit time of a package sent from a supplier 200 miles away from the company.
Give two reasons why you would not use the equation to predict transit time for a package sent from a supplier 1500 miles away.
Calculate the product moment correlation coefficient between x and y.
Explain why the value you have obtained supports the purchasing manager's attempt to establish a regression equation of y on x. (AEB 1987)

Table H

Year     | 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882
Cost (p) | 147  147  144  140  129  119  112  116  107  109
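The summary values quoted in question 25 can be checked directly from Table H. The following is a minimal sketch, assuming the usual least squares formulas for the regression line of y on x (gradient S_xy / S_xx, with the line passing through the point of means); the variable names are illustrative.

```python
# Table H: average cost (p) per hundredweight of zinc manufactures, 1873 to 1882.
years = list(range(1873, 1883))
cost = [147, 147, 144, 140, 129, 119, 112, 116, 107, 109]

# Coding suggested in part (a): x = year - 1872, y = cost - 100.
x = [yr - 1872 for yr in years]
y = [c - 100 for c in cost]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Corrected sums of products and squares.
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n

gradient = s_xy / s_xx
intercept = sum_y / n - gradient * sum_x / n

print(sum_y, sum_xy)       # the two sums quoted in part (b)
print(round(gradient, 1))  # -5.2, the gradient quoted in part (b)
```

For part (c), a prediction is obtained by evaluating intercept + gradient * x at x = 11 and adding 100 to undo the coding; whether extrapolating one year beyond the data is sensible is the point of the "comment" the question asks for.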
APPENDIX 1
RANDOM NUMBERS

6523 6800 | 7782 5814 | 1085 1185 | 5711 7374 | 4525 5046
0956 7651 | 0473 9430 | 1674 6959 | 0438 8398 | 3020 8785
5599 9860 | 0133 0693 | 8513 2317 | 2551 9204 | 5231 3870
7282 4544 | 0953 0483 | 0383 9841 | 6741 0138 | 6683 1199
0421 2872 | 7325 0274 | 3581 7849 | 5267 6140 | 6050 4750
8701 8059 | 8936 4159 | 6027 6489 | 4745 1821 | 6984 7606
3162 4653 | 8440 5631 | 7476 5223 | 7295 9606 | 5683 8522
2981 5794 | 3591 9070 | 9424 1935 | 5022 2372 | 8734 8315
3998 7422 | 7719 1281 | 2942 0450 | 6234 3681 | 4307 9792
5614 8010 | 7652 3854 | 8413 9990 | 2255 4104 | 7237 8933
2956 6274 | 1267 0935 | 8933 0428 | 4475 0157 | 8745 5221
9332 5738 | 3936 8742 | 7255 7397 | 9836 5741 | 7609 1168
9569 5154 | 4319 2049 | 5725 9055 | 2620 7098 | 4373 5645
6571 3243 | 6467 2255 | 6565 4886 | 1088 2012 | 4018 4925
9027 3343 | 9784 2057 | 4991 4120 | 1764 2960 | 6687 5597
9029 4245 | 6134 3013 | 3039 2152 | 5928 6498 | 0876 0927
9974 0629 | 2055 7270 | 1143 9582 | 7537 9024 | 7748 6321
8787 5691 | 1697 5150 | 6136 9647 | 7668 4911 | 5056 5106
4624 1774 | 9737 3903 | 5483 3400 | 7461 7751 | 4363 1567
6679 8143 | 4092 8472 | 8832 8324 | 6701 4134 | 7019 2693
3642 9458 | 8330 9239 | 1840 0300 | 1290 3237 | 9165 4815
0766 2508 | 9927 6948 | 8532 1646 | 1931 8502 | 8636 2296
9310 0572 | 1826 3667 | 6848 3169 | 6858 9349 | 4586 9929
4950 6399 | 2671 4794 | 3271 7291 | 3418 7406 | 3214 4080
2075 5889 | 3904 4273 | 3793 1107 | 2877 9136 | 6047 8262
0240 6209 | 0071 0937 | 8044 5037 | 3270 2038 | 7186 7534
5987 2138 | 2978 7267 | 4283 6521 | 5479 6642 | 4786 3115
4808 9966 | 4338 2813 | 5025 4793 | 1115 0784 | 2830 1907
5426 8675 | 4415 2039 | 2003 5854 | 8029 6253 | 0697 7151
8535 5845 | 2358 6366 | 0962 8092 | 1455 8141 | 2148 8734
7384 9049 | 0121 9029 | 5706 6873 | 5110 5195 | 6308 5799
3464 7800 | 9259 6774 | 5848 9209 | 4220 4037 | 6380 5893
6856 8747 | 6306 2471 | 4198 7906 | 0718 5829 | 1649 6737
7247 0542 | 8807 2755 | 5874 8208 | 4228 2648 | 2532 0031
4444 9675 | 8957 1260 | 4238 7736 | 4569 2168 | 3270 0496
2811 5747 | 6157 8988 | 6218 9367 | 5732 9672 | 2117 1354
8722 3888 | 9199 1608 | 1776 2747 | 5214 9886 | 3568 2385
4493 1459 | 6740 2410 | 1163 4047 | 0756 1422 | 6274 9339
8184 3725 | 9043 5662 | 9458 4903 | 8422 5722 | 4798 8637
0975 3521 | 0447 5408 | 9844 0816 | 4486 6971 | 2052 6494
7765 0504 | 2218 2010 | 8187 0569 | 4370 9676 | 4205
1906 5161 | 3403 6155 | 9858 8350 | 0148 9985 | 0867
5291 8707 | 1962 3228 | 0491 4248 | 6524 8609 | 8768
5247 2514 | 9391 7551 | 4926 4941 | 2083 3030 | 4322
5267 8740 | 6341 9186 | 1047 8070 | 5687 2586 | 8994
6525 7173 | 7860 5062 | 9104 9597 | 6416 7131 | 3280
2997 5642 | 5690 1675 | 7495 9926 | 0163 2516 | 5418
1525 0368 | 9245 5300 | 0629 4643 | 4666 2712 | 8505
8208 6567 | 6413 5114 | 3828 2430 | 3962 2035 | 2390
8135 0325 | 8224 8359 | 0467 5152 | 2821 6975 | 8728

Each digit in this table is an independent sample from a population where each of the digits 0 to 9 has a probability of occurrence of 0.1. It should be noted that these digits have been computer generated, and are therefore 'pseudo' random numbers.
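A table of this kind can be produced with any pseudo-random number generator. The sketch below is only an illustration using Python's standard `random` module; it is not the method used to generate the printed table, and the function name, layout defaults and seed are assumptions.

```python
import random


def random_digit_table(rows, groups_per_row=10, digits_per_group=4, seed=None):
    """Return `rows` lines of pseudo-random digits, each digit 0-9 equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the table reproducible
    table = []
    for _ in range(rows):
        groups = [
            "".join(str(rng.randrange(10)) for _ in range(digits_per_group))
            for _ in range(groups_per_row)
        ]
        table.append(" ".join(groups))
    return table


for line in random_digit_table(rows=3, seed=1):
    print(line)
```

Because the generator is deterministic once seeded, the digits are "pseudo" random in exactly the sense of the note above: they behave like independent equally likely digits but can be regenerated from the seed.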

CUMULATIVE BINOMIAL PROBABILITIES

The tabulated value is P(X ≤ r) where X ~ Bin(n, p)

p=      | 0.05   | 0.10   | 0.15   | 0.20   | 0.25   | 0.30   | 0.35   | 0.40   | 0.45   | 0.50
n=2 r=0 | 0.9025 | 0.8100 | 0.7225 | 0.6400 | 0.5625 | 0.4900 | 0.4225 | 0.3600 | 0.3025 | 0.2500
      1 | 0.9975 | 0.9900 | 0.9775 | 0.9600 | 0.9375 | 0.9100 | 0.8775 | 0.8400 | 0.7975 | 0.7500
      2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000
n=3 r=0 | 0.8574 | 0.7290 | 0.6141 | 0.5120 | 0.4219 | 0.3430 | 0.2746 | 0.2160 | 0.1664 | 0.1250
      1 | 0.9928 | 0.9720 | 0.9393 | 0.8960 | 0.8438 | 0.7840 | 0.7183 | 0.6480 | 0.5748 | 0.5000
      2 | 0.9999 | 0.9990 | 0.9966 | 0.9920 | 0.9844 | 0.9730 | 0.9571 | 0.9360 | 0.9089 | 0.8750
      3 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000
n=4 r=0 | 0.8145 | 0.6561 | 0.5220 | 0.4096 | 0.3164 | 0.2401 | 0.1785 | 0.1296 | 0.0915 | 0.0625
      1 | 0.9860 | 0.9477 | 0.8905 | 0.8192 | 0.7383 | 0.6517 | 0.5630 | 0.4752 | 0.3910 | 0.3125
      2 | 0.9995 | 0.9963 | 0.9880 | 0.9728 | 0.9492 | 0.9163 | 0.8735 | 0.8208 | 0.7585 | 0.6875
      3 | 1.0000 | 0.9999 | 0.9995 | 0.9984 | 0.9961 | 0.9919 | 0.9850 | 0.9744 | 0.9590 | 0.9375
      4 |        | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000
n=5 r=0 | 0.7738 | 0.5905 | 0.4437 | 0.3277 | 0.2373 | 0.1681 | 0.1160 | 0.0778 | 0.0503 | 0.0313
      1 | 0.9774 | 0.9185 | 0.8352 | 0.7373 | 0.6328 | 0.5282 | 0.4284 | 0.3370 | 0.2562 | 0.1875
      2 | 0.9988 | 0.9914 | 0.9734 | 0.9421 | 0.8965 | 0.8369 | 0.7648 | 0.6826 | 0.5931 | 0.5000
      3 | 1.0000 | 0.9995 | 0.9978 | 0.9933 | 0.9844 | 0.9692 | 0.9460 | 0.9130 | 0.8688 | 0.8125
      4 |        | 1.0000 | 0.9999 | 0.9997 | 0.9990 | 0.9976 | 0.9947 | 0.9898 | 0.9815 | 0.9688
      5 |        |        | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000
n=6 r=0 | 0.7351 | 0.5314 | 0.3771 | 0.2621 | 0.1780 | 0.1176 | 0.0754 | 0.0467 | 0.0277 | 0.0156
      1 | 0.9672 | 0.8857 | 0.7765 | 0.6554 | 0.5339 | 0.4202 | 0.3191 | 0.2333 | 0.1636 | 0.1094
      2 | 0.9978 | 0.9842 | 0.9527 | 0.9011 | 0.8306 | 0.7443 | 0.6471 | 0.5443 | 0.4415 | 0.3438
      3 | 0.9999 | 0.9987 | 0.9941 | 0.9830 | 0.9624 | 0.9295 | 0.8826 | 0.8208 | 0.7447 | 0.6563
      4 | 1.0000 | 0.9999 | 0.9996 | 0.9984 | 0.9954 | 0.9891 | 0.9777 | 0.9590 | 0.9308 | 0.8906
      5 |        | 1.0000 | 1.0000 | 0.9999 | 0.9998 | 0.9993 | 0.9982 | 0.9959 | 0.9917 | 0.9844
      6 |        |        |        | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000
n=7 r=0 | 0.6983 | 0.4783 | 0.3206 | 0.2097 | 0.1335 | 0.0824 | 0.0490 | 0.0280 | 0.0152 | 0.0078
      1 | 0.9556 | 0.8503 | 0.7166 | 0.5767 | 0.4449 | 0.3294 | 0.2338 | 0.1586 | 0.1024 | 0.0625
      2 | 0.9962 | 0.9743 | 0.9262 | 0.8520 | 0.7564 | 0.6471 | 0.5323 | 0.4199 | 0.3164 | 0.2266
      3 | 0.9998 | 0.9973 | 0.9879 | 0.9667 | 0.9294 | 0.8740 | 0.8002 | 0.7102 | 0.6083 | 0.5000
      4 | 1.0000 | 0.9998 | 0.9988 | 0.9953 | 0.9871 | 0.9712 | 0.9444 | 0.9037 | 0.8471 | 0.7734
      5 |        | 1.0000 | 0.9999 | 0.9996 | 0.9987 | 0.9962 | 0.9910 | 0.9812 | 0.9643 | 0.9375
      6 |        |        | 1.0000 | 1.0000 | 0.9999 | 0.9998 | 0.9994 | 0.9984 | 0.9963 | 0.9922
      7 |        |        |        |        | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000
n=8 r=0 | 0.6634 | 0.4305 | 0.2725 | 0.1678 | 0.1001 | 0.0576 | 0.0319 | 0.0168 | 0.0084 | 0.0039
      1 | 0.9428 | 0.8131 | 0.6572 | 0.5033 | 0.3671 | 0.2553 | 0.1691 | 0.1064 | 0.0632 | 0.0352
      2 | 0.9942 | 0.9619 | 0.8948 | 0.7969 | 0.6785 | 0.5518 | 0.4278 | 0.3154 | 0.2201 | 0.1445
      3 | 0.9996 | 0.9950 | 0.9786 | 0.9437 | 0.8862 | 0.8059 | 0.7064 | 0.5941 | 0.4770 | 0.3633
      4 | 1.0000 | 0.9996 | 0.9971 | 0.9896 | 0.9727 | 0.9420 | 0.8939 | 0.8263 | 0.7396 | 0.6367
      5 |        | 1.0000 | 0.9998 | 0.9988 | 0.9958 | 0.9887 | 0.9747 | 0.9502 | 0.9115 | 0.8555
      6 |        |        | 1.0000 | 0.9999 | 0.9996 | 0.9987 | 0.9964 | 0.9915 | 0.9819 | 0.9648
      7 |        |        |        | 1.0000 | 1.0000 | 0.9999 | 0.9998 | 0.9993 | 0.9983 | 0.9961
      8 |        |        |        |        |        | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000
p=       | 0.05   | 0.10   | 0.15   | 0.20   | 0.25   | 0.30   | 0.35   | 0.40   | 0.45   | 0.50
n=9 r=0 | 0.6302 | 0.3874 | 0.2316 | 0.1342 | 0.0751 | 0.0404 | 0.0207 | 0.0101 | 0.0046 | 0.0020
1 | 0.9288 | 0.7748 | 0.5995 | 0.4362 | 0.3003 | 0.1960 | 0.1211 | 0.0705 | 0.0385 | 0.0195
2 | 0.9916 | 0.9470 | 0.8591 | 0.7382 | 0.6007 | 0.4628 | 0.3373 | 0.2318 | 0.1495 | 0.0898
3 | 0.9994 | 0.9917 | 0.9661 | 0.9144 | 0.8343 | 0.7297 | 0.6089 | 0.4826 | 0.3614 | 0.2539
4 | 1.0000 | 0.9991 | 0.9944 | 0.9804 | 0.9511 | 0.9012 | 0.8283 | 0.7334 | 0.6214 | 0.5000
5 0.9999 | 0.9994 | 0.9969 | 0.9900 | 0.9747 | 0.9464 | 0.9006 | 0.8342 | 0.7461
6 1.0000 | 1.0000 | 0.9997 | 0.9987 | 0.9957 | 0.9888 | 0.9750 | 0.9502 | 0.9102
7 1.0000 | 0.9999 | 0.9996 | 0.9986 | 0.9962 | 0.9909 | 0.9805
8 1.0000 | 1.0000 | 0.9999 | 0.9997 | 0.9992 | 0.9980
9 1.0000 | 1.0000 | 1.0000 | 1.0000
n=10 r=0 | 0.5987 | 0.3487 | 0.1969 | 0.1074 | 0.0563 | 0.0282 | 0.0135 | 0.0060 | 0.0025 | 0.0010
1 | 0.9139 | 0.7361 | 0.5443 | 0.3758 | 0.2440 | 0.1493 | 0.0860 | 0.0464 | 0.0233 | 0.0107
2 | 0.9885 | 0.9298 | 0.8202 | 0.6778 | 0.5256 | 0.3828 | 0.2616 | 0.1673 | 0.0996 | 0.0547
3 | 0.9990 | 0.9872 | 0.9500 | 0.8791 | 0.7759 | 0.6496 | 0.5138 | 0.3823 | 0.2660 | 0.1719
4 | 0.9999 | 0.9984 | 0.9901 | 0.9672 | 0.9219 | 0.8497 | 0.7515 | 0.6331 | 0.5044 | 0.3770
5 | 1.0000 | 0.9999 | 0.9986 | 0.9936 | 0.9803 | 0.9527 | 0.9051 | 0.8338 | 0.7384 | 0.6230
6 |        | 1.0000 | 0.9999 | 0.9991 | 0.9965 | 0.9894 | 0.9740 | 0.9452 | 0.8980 | 0.8281
7 0.9999 | 0.9996 | 0.9984 | 0.9952 | 0.9877 | 0.9726 | 0.9453
8 1.0000 | 0.9999 | 0.9995 | 0.9983 | 0.9955 | 0.9893
9 1.0000 | 1.0000 | 0.9999 | 0.9997 | 0.9990
10 1.0000 | 1.0000 | 1.0000
n=15 r=0 | 0.4633 | 0.2059 | 0.0874 | 0.0352 | 0.0134 | 0.0047 | 0.0016 | 0.0005 | 0.0001 | 0.0000
1 | 0.8290 | 0.5490 | 0.3186 | 0.1671 | 0.0802 | 0.0353 | 0.0142 | 0.0052 | 0.0017 | 0.0005
2 | 0.9638 | 0.8159 | 0.6042 | 0.3980 | 0.2361 | 0.1268 | 0.0617 | 0.0271 | 0.0107 | 0.0037
3 | 0.9945 | 0.9444 | 0.8227 | 0.6482 | 0.4613 | 0.2969 | 0.1727 | 0.0905 | 0.0424 | 0.0176
4 | 0.9994 | 0.9873 | 0.9383 | 0.8358 | 0.6865 | 0.5155 | 0.3519 | 0.2173 | 0.1204 | 0.0592
5 | 0.9999 | 0.9978 | 0.9832 | 0.9389 | 0.8516 | 0.7216 | 0.5643 | 0.4032 | 0.2608 | 0.1509
6 | 1.0000 | 0.9997 | 0.9964 | 0.9819 | 0.9434 | 0.8689 | 0.7548 | 0.6098 | 0.4522 | 0.3036
7 1.0000 | 0.9994 | 0.9958 | 0.9827 | 0.9500 | 0.8868 | 0.7869 | 0.6535 | 0.5000
8 0.9999 | 0.9992 | 0.9958 | 0.9848 | 0.9578 | 0.9050 | 0.8182 | 0.6964
9 1.0000 | 0.9999 | 0.9992 | 0.9963 | 0.9876 | 0.9662 | 0.9231 | 0.8491
10 1.0000 | 0.9999 | 0.9993 | 0.9972 | 0.9907 | 0.9745 | 0.9408
11 1.0000 | 0.9999 | 0.9995 | 0.9981 | 0.9937 | 0.9824
12 1.0000 | 0.9999 | 0.9997 | 0.9989 | 0.9963
13 1.0000 | 1.0000 | 0.9999 | 0.9995
14 1.0000 | 1.0000
n=20 r=0 | 0.3585 | 0.1216 | 0.0388 | 0.0115 | 0.0032 | 0.0008 | 0.0002 | 0.0000 | 0.0000 | 0.0000
1 | 0.7358 | 0.3917 | 0.1756 | 0.0692 | 0.0243 | 0.0076 | 0.0021 | 0.0005 | 0.0001 | 0.0000
2 | 0.9245 | 0.6769 | 0.4049 | 0.2061 | 0.0913 | 0.0355 | 0.0121 | 0.0036 | 0.0009 | 0.0002
3 | 0.9841 | 0.8670 | 0.6477 | 0.4114 | 0.2252 | 0.1071 | 0.0444 | 0.0160 | 0.0049 | 0.0013
4 | 0.9974 | 0.9568 | 0.8298 | 0.6296 | 0.4148 | 0.2375 | 0.1182 | 0.0510 | 0.0189 | 0.0059
5 | 0.9997 | 0.9887 | 0.9327 | 0.8042 | 0.6172 | 0.4164 | 0.2454 | 0.1256 | 0.0553 | 0.0207
6 | 1.0000 | 0.9976 | 0.9781 | 0.9133 | 0.7858 | 0.6080 | 0.4166 | 0.2500 | 0.1299 | 0.0577
7 0.9996 | 0.9941 | 0.9679 | 0.8982 | 0.7723 | 0.6010 | 0.4159 | 0.2520 | 0.1316
8 0.9999 | 0.9987 | 0.9900 | 0.9591 | 0.8867 | 0.7624 | 0.5956 | 0.4143 | 0.2517
9 1.0000 | 0.9998 | 0.9974 | 0.9861 | 0.9520 | 0.8782 | 0.7553 | 0.5914 | 0.4119
10 1.0000 | 0.9994 | 0.9961 | 0.9829 | 0.9468 | 0.8725 | 0.7507 | 0.5881
11 0.9999 | 0.9991 | 0.9949 | 0.9804 | 0.9435 | 0.8692 | 0.7483
12 1.0000 | 0.9998 | 0.9987 | 0.9940 | 0.9790 | 0.9420 | 0.8684
13 1.0000 | 0.9997 | 0.9985 | 0.9935 | 0.9786 | 0.9423
14 1.0000 | 0.9997 | 0.9984 | 0.9936 | 0.9793
15 1.0000 | 0.9997 | 0.9985 | 0.9941
16 1.0000 | 0.9997 | 0.9987
17 1.0000 | 0.9998
18 1.0000
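Entries in the cumulative binomial tables can be reproduced, or extended to values of n and p that are not tabulated, by summing the binomial probabilities directly. This is a minimal sketch using only the standard library (`math.comb` needs Python 3.8 or later); the function name is illustrative.

```python
from math import comb


def binomial_cdf(r, n, p):
    """P(X <= r) for X ~ Bin(n, p), summed term by term."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(r + 1))


# Compare with the table: n = 10, p = 0.25, r = 2.
print(round(binomial_cdf(2, 10, 0.25), 4))
```

Rounding to 4 decimal places matches the precision of the printed table, so agreement to within a unit in the last place is the most that should be expected.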

CUMULATIVE POISSON PROBABILITIES

The tabulated value is P(X ≤ r) where X ~ Po(λ)

λ =  | 0.4    | 0.5    | 0.6    | 0.8    | 1.0    | 1.2    | 1.4    | 1.5
r=0  | 0.6703 | 0.6065 | 0.5488 | 0.4493 | 0.3679 | 0.3012 | 0.2466 | 0.2231
  1  | 0.9384 | 0.9098 | 0.8781 | 0.8088 | 0.7358 | 0.6626 | 0.5918 | 0.5578
  2  | 0.9921 | 0.9856 | 0.9769 | 0.9526 | 0.9197 | 0.8795 | 0.8335 | 0.8088
  3  | 0.9992 | 0.9982 | 0.9966 | 0.9909 | 0.9810 | 0.9662 | 0.9463 | 0.9344
  4  | 0.9999 | 0.9998 | 0.9996 | 0.9986 | 0.9963 | 0.9923 | 0.9857 | 0.9814
  5  | 1.0000 | 1.0000 | 1.0000 | 0.9998 | 0.9994 | 0.9985 | 0.9968 | 0.9955
  6  |        |        |        | 1.0000 | 0.9999 | 0.9997 | 0.9994 | 0.9991
  7  |        |        |        |        | 1.0000 | 1.0000 | 0.9999 | 0.9998
  8  |        |        |        |        |        |        | 1.0000 | 1.0000

λ =  | 1.8    | 2.0    | 2.2    | 2.4    | 3.0
r=0  | 0.1653 | 0.1353 | 0.1108 | 0.0907 | 0.0498
  1  | 0.4628 | 0.4060 | 0.3546 | 0.3084 | 0.1991
  2  | 0.7306 | 0.6767 | 0.6227 | 0.5697 | 0.4232
  3  | 0.8913 | 0.8571 | 0.8194 | 0.7787 | 0.6472
  4  | 0.9636 | 0.9473 | 0.9275 | 0.9041 | 0.8153
  5  | 0.9896 | 0.9834 | 0.9751 | 0.9643 | 0.9161
  6  | 0.9974 | 0.9955 | 0.9925 | 0.9884 | 0.9665
  7  | 0.9994 | 0.9989 | 0.9980 | 0.9967 | 0.9881
  8  | 0.9999 | 0.9998 | 0.9995 | 0.9991 | 0.9962
  9  | 1.0000 | 1.0000 | 0.9999 | 0.9998 | 0.9989
 10  |        |        | 1.0000 | 1.0000 | 0.9997
 11  |        |        |        |        | 0.9999
 12  |        |        |        |        | 1.0000

λ =  | 3.2    | 3.4    | 3.5    | 3.6    | 3.8    | 4.0    | 4.5    | 5.0    | 5.5
r=0  | 0.0408 | 0.0334 | 0.0302 | 0.0273 | 0.0224 | 0.0183 | 0.0111 | 0.0067 | 0.0041
  1  | 0.1712 | 0.1468 | 0.1359 | 0.1257 | 0.1074 | 0.0916 | 0.0611 | 0.0404 | 0.0266
  2  | 0.3799 | 0.3397 | 0.3208 | 0.3027 | 0.2689 | 0.2381 | 0.1736 | 0.1247 | 0.0884
  3  | 0.6025 | 0.5584 | 0.5366 | 0.5152 | 0.4735 | 0.4335 | 0.3423 | 0.2650 | 0.2017
  4  | 0.7806 | 0.7442 | 0.7254 | 0.7064 | 0.6678 | 0.6288 | 0.5321 | 0.4405 | 0.3575
  5  | 0.8946 | 0.8705 | 0.8576 | 0.8441 | 0.8156 | 0.7851 | 0.7029 | 0.6160 | 0.5289
  6  | 0.9554 | 0.9421 | 0.9347 | 0.9267 | 0.9091 | 0.8893 | 0.8311 | 0.7622 | 0.6860
  7  | 0.9832 | 0.9769 | 0.9733 | 0.9692 | 0.9599 | 0.9489 | 0.9134 | 0.8666 | 0.8095
  8  | 0.9943 | 0.9917 | 0.9901 | 0.9883 | 0.9840 | 0.9786 | 0.9597 | 0.9319 | 0.8944
  9  | 0.9982 | 0.9973 | 0.9967 | 0.9960 | 0.9942 | 0.9919 | 0.9829 | 0.9682 | 0.9462
 10  | 0.9995 | 0.9992 | 0.9990 | 0.9987 | 0.9981 | 0.9972 | 0.9933 | 0.9863 | 0.9747
 11  | 0.9999 | 0.9998 | 0.9997 | 0.9996 | 0.9994 | 0.9991 | 0.9976 | 0.9945 | 0.9890
 12  | 1.0000 | 0.9999 | 0.9999 | 0.9999 | 0.9998 | 0.9997 | 0.9992 | 0.9980 | 0.9955
 13  |        | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9999 | 0.9997 | 0.9993 | 0.9983
 14  |        |        |        |        |        | 1.0000 | 0.9999 | 0.9998 | 0.9994
 15  |        |        |        |        |        |        | 1.0000 | 0.9999 | 0.9998
 16  |        |        |        |        |        |        |        | 1.0000 | 0.9999
 17  |        |        |        |        |        |        |        |        | 1.0000

λ =  | 6.0    | 6.5    | 7.0    | 7.5    | 8.0    | 8.5    | 9.0    | 9.5    | 10.0
r=0  | 0.0025 | 0.0015 | 0.0009 | 0.0006 | 0.0003 | 0.0002 | 0.0001 | 0.0001 | 0.0000
  1  | 0.0174 | 0.0113 | 0.0073 | 0.0047 | 0.0030 | 0.0019 | 0.0012 | 0.0008 | 0.0005
  2  | 0.0620 | 0.0430 | 0.0296 | 0.0203 | 0.0138 | 0.0093 | 0.0062 | 0.0042 | 0.0028
  3  | 0.1512 | 0.1118 | 0.0818 | 0.0591 | 0.0424 | 0.0301 | 0.0212 | 0.0149 | 0.0103
  4  | 0.2851 | 0.2237 | 0.1730 | 0.1321 | 0.0996 | 0.0744 | 0.0550 | 0.0403 | 0.0293
  5  | 0.4457 | 0.3690 | 0.3007 | 0.2414 | 0.1912 | 0.1496 | 0.1157 | 0.0885 | 0.0671
  6  | 0.6063 | 0.5265 | 0.4497 | 0.3782 | 0.3134 | 0.2562 | 0.2068 | 0.1649 | 0.1301
  7  | 0.7440 | 0.6728 | 0.5987 | 0.5246 | 0.4530 | 0.3856 | 0.3239 | 0.2687 | 0.2202
  8  | 0.8472 | 0.7916 | 0.7291 | 0.6620 | 0.5925 | 0.5231 | 0.4557 | 0.3918 | 0.3328
  9  | 0.9161 | 0.8774 | 0.8305 | 0.7764 | 0.7166 | 0.6530 | 0.5874 | 0.5218 | 0.4579
 10  | 0.9574 | 0.9332 | 0.9015 | 0.8622 | 0.8159 | 0.7634 | 0.7060 | 0.6453 | 0.5830
 11  | 0.9799 | 0.9661 | 0.9467 | 0.9208 | 0.8881 | 0.8487 | 0.8030 | 0.7520 | 0.6968
 12  | 0.9912 | 0.9840 | 0.9730 | 0.9573 | 0.9362 | 0.9091 | 0.8758 | 0.8364 | 0.7916
 13  | 0.9964 | 0.9929 | 0.9872 | 0.9784 | 0.9658 | 0.9486 | 0.9261 | 0.8981 | 0.8645
 14  | 0.9986 | 0.9970 | 0.9943 | 0.9897 | 0.9827 | 0.9726 | 0.9585 | 0.9400 | 0.9165
 15  | 0.9995 | 0.9988 | 0.9976 | 0.9954 | 0.9918 | 0.9862 | 0.9780 | 0.9665 | 0.9513
 16  | 0.9998 | 0.9996 | 0.9990 | 0.9980 | 0.9963 | 0.9934 | 0.9889 | 0.9823 | 0.9730
 17  | 0.9999 | 0.9998 | 0.9996 | 0.9992 | 0.9984 | 0.9970 | 0.9947 | 0.9911 | 0.9857
 18  | 1.0000 | 0.9999 | 0.9999 | 0.9997 | 0.9993 | 0.9987 | 0.9976 | 0.9957 | 0.9928
 19  |        | 1.0000 | 1.0000 | 0.9999 | 0.9997 | 0.9995 | 0.9989 | 0.9980 | 0.9965
 20  |        |        |        | 1.0000 | 0.9999 | 0.9998 | 0.9996 | 0.9991 | 0.9984
 21  |        |        |        |        | 1.0000 | 0.9999 | 0.9998 | 0.9996 | 0.9993
 22  |        |        |        |        |        | 1.0000 | 0.9999 | 0.9999 | 0.9997
 23  |        |        |        |        |        |        | 1.0000 | 0.9999 | 0.9999
 24  |        |        |        |        |        |        |        | 1.0000 | 1.0000
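Cumulative Poisson probabilities for any mean can be computed in the same spirit as the tables. The sketch below builds each term from the previous one using P(X = k) = P(X = k − 1) × λ/k, which avoids large factorials; the function name is illustrative.

```python
from math import exp


def poisson_cdf(r, lam):
    """P(X <= r) for X ~ Po(lam), using P(X = k) = P(X = k - 1) * lam / k."""
    term = exp(-lam)  # P(X = 0)
    total = term
    for k in range(1, r + 1):
        term *= lam / k
        total += term
    return total


# Compare with the tables: lambda = 3.0, r = 3.
print(round(poisson_cdf(3, 3.0), 4))
```

As with the binomial tables, printed entries are rounded to 4 decimal places, so a computed value should agree with the table only to that precision.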
THE DISTRIBUTION FUNCTION Φ(z) OF THE NORMAL DISTRIBUTION N(0, 1)

z 0 i 2 3 4 5 6 7 8 9)

0.0 5040 .5080 .5120 |.5160 5199 .5239 |.5279 .5319 .5359 Avs
0.1 5488 5478 .5517 |.5557 .5596 .56386 |.5675 .5714 5753) 4 8
0.2 5832 5871 5910 |.5948 5987 .6026 |.6064,~.6103 .6141 | 4 8
0.3 6217 .6255 .6293 |.6331 .6368 .6406 |.6443 .6480 .6517 | 4 7
0.4 6591 .6628 .6664|.6700 .6736 .6772 |.6808 .6844 .6879 | 4 7
0.5 6950 .6985 .7019 |.7054 .7088 .7123 |.7157 .7190 .7224] 3 7
0.6 7291 .1324 .7857 |.7389 .7422 .7454 |.7486 .7517 .7549 | 3 7
0.7 WGI “742 “7673 |104 7784 e764 |79L 1823 7852 ea (6) 9 Dal Eh 7)
0.8 7910 .7939 .7967 |.7995 .8023 .8051 |.8078 .8106 .8133} 3 5 8 1OR22N25
0.9 8186 .8212 .8238 |.8264 .8289 .8315 |.8340 .8365 .8389 | 3 5 8 18 20 23
1.0 8438 .8461 .8485 |.8508 .8531 .8554 |.8577 .8599 8621 | 2 5 7/9 16 19 21
iil 8665 .8686 .8708 |.8729 .8749 .8770 |.8790 .8810 .8830 | 2 4 6) 8 14 16 18
Hee 8869 .8888 .8907 |.8925 .8944 .8962 |.8980 8997 .9015 | 2 4 6| 7 9 13005) 17
1.3 9049 .9066 .9082 |.9099 .9115 .9131 |.9147 .9162 .9177 | 2 3 5] 6 8 lipitor
1.4 9207 .9222 .9236 |.9251 .9265 .9279 |.9292 .9306 .93819 | 1 3 4] 6 7 Op ieetS
1.5 9345 .9357 .9370 |.9382 .9394 .9406 |.9418 .9429 9441] 1 2 4] 5 6 7} 8 10 11
1.6 9463 .9474 .9484 |.9495 .9505 .9515 |.9525 .9585 .9545|1 2 3/ 4 5 6| 7 8 9
1.7 9564 .9573 .9582 |.9591 .9599 .9608 |.9616 .9625 .9633 | 1 2 3] 4 4 5] 6 7 8
1.8 9649 .9656 .9664 |.9671 .9678 .9686 |.9693 .9699 .9706 | 1 1 2] 8 4 4]| 5 6 6
1.9 9719 9726) 97329738 9744) 19750) 119756).976) -9767—| 1 1 2) (2 3 4/7405 5
2.0 9778 .9783 .9788 |.9793 .9798
.9803 |.9808 .9812 .9817 | 0 1 1) 2 2 3) 38 4 4
ont 9826 .9830 .9834 |.9838 .9842
.9846 !|.9850..9854 .9857 | 0 1 1] 2 2 2] 3 3 4
2.2 |.9861 |.9864 .9868 .9871 |.9875 .9878
.9881 |.9884 19887". 9890" Or ie meets (2) 0) tomes 8
2.3 |.9893 |.9896 .9898 | O), SW FATES Fh Weito. go ato
.9901 |.99036 .99061 .99086 3 5) 810) 13 15/18 20 23
99111 .99134 99158] 2 5 7] 9 12 14/16 18 21
2.4 |.99180].99202
.99224 .99245|.99266 | DEAD Ge Set lS Seared 9)
99286 .99305).99324 .99343 .99361' 2 4 6]! 7 9 11/13 15 17
2.5 |.99379|.99396 .99413 .99430].99446 .99461 .99477|.99492 .99506 .99520| 2 3 5] 6 8 9/11 12 14
2.6 |.99534}.99547 .99560 .99573].99585 .99598 .99609].99621 .99632 .99643] 1 2 3] 5 6 7| 8 9 10
2.7 |.996531.99664 .99674 .99683].99693 .99702 .99711].99720 .99728 99736, 1 2 3] 4 5 6| 7 8 9
2.8 |.997441.99752 .99760 .99767|.99774 .99781 .99788|.99795 .99801 99807; 1 1° 2) 38 4 4] 5 6 6
2.9 |.998131|.99819 .99825 .99831|.99836 .99841 .99846].99851..99856 .99861)/ 0 1 1] 2 2 3) 3 4 4
3.0 |.99865 |.99869 .99874 .99878|.99882 .99886 .99889].99893 .99896 .99900] 0 1 1) 2 2 2} 3 3 4
3.1|.9°032 |.9°065 .9°096 3 6 9/13 16 19/22 25 28
.9°126 |.9°155 .99184 .93211 SeG mers) pele lay 20mg mo
192238 .97264 .92289)| 2, 5) VIO 12 1517 20 22
3.2 1.93313 |.93336 .99359 .9°381 |.9°402 Dee Oe Lia TSiloea see O
| 93423 .93443].93462 .99481 .99499| 2 .4 6] 8 9 11/13 15 17
3.3|.9°517 1.99534 .9°550 .9°566 |.99581 DEES S| Gn oat Olen tou
99596 .9°610].99624 .9°638 .99651}/ 1 3 4! 5 7 8] 9 10 12
3.4 |.9°663 |.9°675 .93687 .99698 |.9°709 .93720 .9°730|.93740 .99749 .99758| 1 2 38] 4 5 6/7 8 Q
3.5 |.9°767 1.99776 .9°784 .99792|.9°800 .93807 .9°815].99822 .9°828 .9°835/1 1 2) 3 4 41 5 6 7
3.6|.9°841 |.9°847 .9°853 .9°858 |.9°864 .9°869 .9°874|.9°879 .9°883 .9°888/ 0 1 1| 2 2 3) 38 4 5
3.7 |.9°892 |.9°9896 .9°90 .9704 |.9708 .9712 .9715 |.9418 .9722 .97250
3.8 |.9728 |.9931 .9733 .9736 1.9938 .9441 .9743 |.9446 .9748 94500
3.9 |.9752 |.9°54 .9756 .9758 1.9959 9961 .9°63 ‘|.9°64 .9°66 .9°670

For negative values of z use Φ(z) = 1 − Φ(−z)
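Values of Φ(z) can also be obtained without tables. The sketch below assumes the standard identity Φ(z) = ½[1 + erf(z/√2)], which is not stated in this appendix, and uses it to confirm the symmetry rule for negative z; the function name is illustrative.

```python
from math import erf, sqrt


def phi(z):
    """Distribution function of N(0, 1), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


z = 1.5
print(phi(-z), 1 - phi(z))  # the two values agree, as the symmetry rule states
```

Unlike the printed table, this function accepts negative z directly, so the rule Φ(z) = 1 − Φ(−z) is needed only when working from the tables.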


UPPER QUANTILES z[P] OF THE NORMAL DISTRIBUTION N(0, 1)

Reproduced, by permission, from Miller and Powell, The Cambridge Elementary Mathematical Tables (Cambridge University Press).

UPPER QUANTILES t[P] OF t-DISTRIBUTIONS t(ν)

P       | .75 | .90 | .95 | .975  | .99   | .995  | .9975 | .999  | .9995
Q       | .25 | .10 | .05 | .025  | .01   | .005  | .0025 | .001  | .0005
2Q      | .50 | .20 | .10 | .050  | .02   | .010  | .0050 | .002  | .0010
ν = 1   |     |     |     | 12.71 | 31.82 | 63.66 | 127.3 | 318.3 | 636.6
    2   |     |     |     | 4.303 | 6.965 | 9.925 | 14.09 | 22.33 | 31.60
    3   |     |     |     | 3.182 | 4.541 | 5.841 | 7.453 | 10.21 | 12.92
    4   |     |     |     | 2.776 | 3.747 | 4.604 | 5.598 | 7.173 | 8.610
    5   |     |     |     | 2.571 | 3.365 | 4.032 | 4.773 | 5.893 | 6.869
    6   |     |     |     | 2.447 | 3.143 | 3.707 | 4.317 | 5.208 | 5.959
    7   |     |     |     | 2.365 | 2.998 | 3.499 | 4.029 | 4.785 | 5.408
    8   |     |     |     | 2.306 | 2.896 | 3.355 | 3.833 | 4.501 | 5.041
    9   |     |     |     | 2.262 | 2.821 | 3.250 | 3.690 | 4.297 | 4.781
   10   |     |     |     | 2.228 | 2.764 | 3.169 | 3.581 | 4.144 | 4.587
   11   |     |     |     | 2.201 | 2.718 | 3.106 | 3.497 | 4.025 | 4.437
   12   |     |     |     | 2.179 | 2.681 | 3.055 | 3.428 | 3.930 | 4.318
   13   |     |     |     | 2.160 | 2.650 | 3.012 | 3.372 | 3.852 | 4.221
   14   |     |     |     | 2.145 | 2.624 | 2.977 | 3.326 | 3.787 | 4.140
   15   |     |     |     | 2.131 | 2.602 | 2.947 | 3.286 | 3.733 | 4.073
   16   |     |     |     | 2.120 | 2.583 | 2.921 | 3.252 | 3.686 | 4.015
   17   |     |     |     | 2.110 | 2.567 | 2.898 | 3.222 | 3.646 | 3.965
   18   |     |     |     | 2.101 | 2.552 | 2.878 | 3.197 | 3.610 | 3.922
   19   |     |     |     | 2.093 | 2.539 | 2.861 | 3.174 | 3.579 | 3.883
   20   |     |     |     | 2.086 | 2.528 | 2.845 | 3.153 | 3.552 | 3.850
   21   |     |     |     | 2.080 | 2.518 | 2.831 | 3.135 | 3.527 | 3.819
   22   |     |     |     | 2.074 | 2.508 | 2.819 | 3.119 | 3.505 | 3.792
   23   |     |     |     | 2.069 | 2.500 | 2.807 | 3.104 | 3.485 | 3.767
   24   |     |     |     | 2.064 | 2.492 | 2.797 | 3.091 | 3.467 | 3.745
   25   |     |     |     | 2.060 | 2.485 | 2.787 | 3.078 | 3.450 | 3.725
   26   |     |     |     | 2.056 | 2.479 | 2.779 | 3.067 | 3.435 | 3.707
   27   |     |     |     | 2.052 | 2.473 | 2.771 | 3.057 | 3.421 | 3.690
   28   |     |     |     | 2.048 | 2.467 | 2.763 | 3.047 | 3.408 | 3.674
   29   |     |     |     | 2.045 | 2.462 | 2.756 | 3.038 | 3.396 | 3.659
   30   |     |     |     | 2.042 | 2.457 | 2.750 | 3.030 | 3.385 | 3.646
   40   |     |     |     | 2.021 | 2.423 | 2.704 | 2.971 | 3.307 | 3.551
   60   |     |     |     | 2.000 | 2.390 | 2.660 | 2.915 | 3.232 | 3.460
  120   |     |     |     | 1.980 | 2.358 | 2.617 | 2.860 | 3.160 | 3.373
    ∞   |     |     |     | 1.960 | 2.326 | 2.576 | 2.807 | 3.090 | 3.291

Reproduced, by permission, from Miller and Powell, The Cambridge Elementary Mathematical Tables (Cambridge University Press).

The figure shows the form of the distribution for ν = 2; the shaded area represents the tail probability Q. For large ν the distributions approximate to the normal distribution N(0, 1), shown by the broken line.

[Figure: probability density of the t-distribution, with t[P] marked on the horizontal axis]

CHI-SQUARED TABLES showing

χ²(5%) where P(χ² > χ²(5%)) = 0.05
χ²(1%) where P(χ² > χ²(1%)) = 0.01
NOTE: Extended tables can be found in J. White, A. Yeats and


G. Skipworth, Tables for Statisticians, 3rd edition, (Stanley Thornes
Publishers Ltd).
TABLE A
Table of probabilities associated with Σd² in Spearman's rank correlation coefficient, r_s.
Probability that Σd² exceeds, or is less than, certain values, for 4 ≤ n ≤ 10.

Column headings that survive: n = 4 (MAX Σd² = 20), n = 5 (MAX Σd² = 40), n = 9 (MAX Σd² = 240), n = 10 (MAX Σd² = 330).

TABLE B
Table of critical values of the Spearman’s rank correlation coefficient.

Significance level (one-tailed test)

TABLE C
Table of probabilities associated with S in Kendall’s rank correlation coefficient, r_k.
Probability that S is equal to, or greater than, certain values, for 4 ≤ n ≤ 10.

n=5 n=8 n=9

=10 = 28 = 36
.592 .548 .540 .500 .500 .500
.408 .452 .460 .360 .386 .431
.242 .360 .381 .230 .281 .364
.117 .274 .306 .136 .191 .300
.0417 .199 .238 .0681 .119 .242
.0083 .138 .179 .0278 .0681 .190
.0894 .130 .0083 .0345 .146
.0543 .0901 .0014 .0151 .108
.0305 .0597 .0054 .0779
.0156 .0376 .0014 .0542
.0071 .022 .0002 .0363
.0028 .0124 .0233
.0009 .0063 .0143
.0002 .0029 .0083
.0012 .0046
.0004 .0023
.0001 .0011

THE UPPER TAIL PROBABILITIES Q(z) OF THE NORMAL DISTRIBUTION N(0, 1)

[Table of Q(z) values with SUBTRACT difference columns; figure: the N(0, 1) curve with the upper tail Q(z) shaded]

Reproduced, by permission, from Miller and Powell, The Cambridge
Elementary Mathematical Tables (Cambridge University Press).

For negative values of z, use

Q(z) = 1 − Q(−z)
APPENDIX 2
USE OF THE STANDARD NORMAL TABLES USING Q(z) (see p.640)

Only positive values of z are printed in the tables, so for negative
values of z the symmetrical properties of the curve are used:

P(Z < −a) = P(Z > a) = Q(a)
P(Z > −a) = 1 − P(Z < −a) = 1 − Q(a)

NOTE: We have Q(−a) = 1 − Q(a).
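
The symmetry rules above can be checked numerically. The sketch below is not part of the printed tables: it assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`, and the helper name `Q` is chosen here only to mirror the notation of the tables.

```python
from statistics import NormalDist

# Standard normal distribution Z ~ N(0, 1)
Z = NormalDist(mu=0.0, sigma=1.0)


def Q(z: float) -> float:
    """Upper tail probability Q(z) = P(Z > z), the quantity the tables list."""
    return 1.0 - Z.cdf(z)


a = 1.377
# P(Z < -a) and P(Z > a) should agree
print(round(Z.cdf(-a), 4), round(Q(a), 4))
# Q(-a) and 1 - Q(a) should agree
print(round(Q(-a), 4), round(1.0 - Q(a), 4))
```

Computed values may differ from table readings in the last decimal place, because the tables are rounded.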

Example 6.2A If Z ~ N(0,1) find from tables (a) P(Z > 1.377), (b) P(Z < 1.377),
(c) P(Z < −1.377), (d) P(Z > −1.377).

Solution 6.2A (a) P(Z > 1.377) = Q(1.377)
= 0.0842

(b) P(Z < 1.377) = 1 − Q(1.377)
= 1 − 0.0842
= 0.9158

(c) P(Z < −1.377) = P(Z > 1.377)
= Q(1.377)
= 0.0842

(d) P(Z > −1.377) = 1 − Q(1.377)
= 1 − 0.0842
= 0.9158

Example 6.3A If Z ~ N(0,1), find
(a) P(0.345 < Z < 1.751), (b) P(−2.696 < Z < 1.865),
(c) P(−1.4 < Z < −0.6), (d) P(|Z| < 1.433),
(e) P(Z > 0.863 or Z < −1.527).

Solution 6.3A (a) P(0.345 < Z < 1.751) = Q(0.345) − Q(1.751)
= 0.3650 − 0.0400
= 0.3250
So P(0.345 < Z < 1.751) = 0.3250.

(b) P(−2.696 < Z < 1.865) = Q(−2.696) − Q(1.865)
= 1 − Q(2.696) − Q(1.865)
= 1 − 0.00350 − 0.0310
= 0.9655
So P(−2.696 < Z < 1.865) = 0.9655.

(c) P(−1.4 < Z < −0.6) = Q(−1.4) − Q(−0.6)
= 1 − Q(1.4) − [1 − Q(0.6)]
= Q(0.6) − Q(1.4)
= 0.2743 − 0.0808
= 0.1935
So P(−1.4 < Z < −0.6) = 0.1935.

(d) P(|Z| < 1.433) = P(−1.433 < Z < 1.433)
= 1 − 2Q(1.433)
= 1 − 2(0.0760)
= 0.848
So P(|Z| < 1.433) = 0.848.

(e) P(Z > 0.863 or Z < −1.527) = Q(0.863) + Q(1.527)
= 0.1941 + 0.0635
= 0.2576
So P(Z > 0.863 or Z < −1.527) = 0.2576.
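
Each part of Solution 6.3A is a sum or difference of upper tail probabilities. As a rough check, the following sketch computes the same five quantities, assuming Python's `statistics.NormalDist` stands in for the tables; the names `Q` and `between` are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)


def Q(z: float) -> float:
    """Upper tail probability P(Z > z)."""
    return 1.0 - Z.cdf(z)


def between(lo: float, hi: float) -> float:
    """P(lo < Z < hi), written as a difference of upper tails."""
    return Q(lo) - Q(hi)


answers = {
    "a": between(0.345, 1.751),
    "b": between(-2.696, 1.865),
    "c": between(-1.4, -0.6),
    "d": 1.0 - 2.0 * Q(1.433),       # P(|Z| < 1.433)
    "e": Q(0.863) + Q(1.527),        # two disjoint tails
}
for part, p in answers.items():
    print(part, round(p, 4))
```

Because `between` uses Q(lo) − Q(hi) for any signs of `lo` and `hi`, the case analysis needed with positive-only tables is handled by the library.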

Example 6.4A If Z ~ N(0,1), show that (a) P(−1.96 < Z < 1.96) = 0.95,
(b) P(−2.575 < Z < 2.575) = 0.99.

Solution 6.4A (a) P(−1.96 < Z < 1.96) = 1 − 2Q(1.96)
= 1 − 2(0.025)
= 0.95

NOTE: this is an important result:

The central 95% of the distribution lies between ±1.96.

(b) P(−2.575 < Z < 2.575) = 1 − 2Q(2.575)
= 1 − 2(0.005)
= 1 − 0.01
= 0.99

Therefore P(−2.575 < Z < 2.575) = 0.99.

The central 99% of the distribution lies between ±2.575.

Exercise 6a (page 337)

Example 6.5A If Z ~ N(0,1), find the value of a if (a) P(Z > a) = 0.3802,
(b) P(Z > a) = 0.7818, (c) P(Z < a) = 0.0793,
(d) P(Z < a) = 0.9693, (e) P(|Z| < a) = 0.9.

Solution 6.5A (a) P(Z > a) = 0.3802
i.e. Q(a) = 0.3802
so from tables
a = 0.305

(b) P(Z > a) = 0.7818.
Since the probability is greater than 0.5, a must be negative.
Now Q(a) = 1 − Q(−a) = 0.7818
so Q(−a) = 1 − 0.7818
= 0.2182
But from tables
Q(0.778) = 0.2182
therefore
−a = 0.778
a = −0.778

(c) P(Z < a) = 0.0793.
Since the probability is less than 0.5, a must be negative.
By symmetry
P(Z < a) = Q(−a) = 0.0793
From tables
Q(1.41) = 0.0793
therefore −a = 1.41
a = −1.41

(d) P(Z < a) = 0.9693.
Now 1 − Q(a) = 0.9693
so Q(a) = 0.0307
from tables a = 1.87

(e) P(|Z| < a) = 0.9
i.e. P(−a < Z < a) = 0.9
From symmetry
P(Z > a) = ½(1 − 0.9)
= 0.05
i.e. Q(a) = 0.05
From tables
Q(1.645) = 0.05
so a = 1.645
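
Reading the tables in reverse, as in Solution 6.5A, corresponds to evaluating the inverse of the distribution function. A minimal sketch follows, assuming `statistics.NormalDist.inv_cdf` from the Python standard library; the three function names are hypothetical and simply label the three cases met in the example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)


def a_for_upper_tail(p: float) -> float:
    """Value a with P(Z > a) = p, i.e. Q(a) = p."""
    return Z.inv_cdf(1.0 - p)


def a_for_lower_tail(p: float) -> float:
    """Value a with P(Z < a) = p."""
    return Z.inv_cdf(p)


def a_for_central(p: float) -> float:
    """Value a with P(|Z| < a) = p; each tail holds (1 - p) / 2."""
    return Z.inv_cdf(1.0 - (1.0 - p) / 2.0)


print(round(a_for_upper_tail(0.3802), 3))   # part (a)
print(round(a_for_upper_tail(0.7818), 3))   # part (b), negative
print(round(a_for_lower_tail(0.0793), 3))   # part (c), negative
print(round(a_for_lower_tail(0.9693), 3))   # part (d)
print(round(a_for_central(0.9), 3))         # part (e)
```

The sign of a comes out automatically, so the "a must be negative" argument is needed only when working from tables of positive z.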

Exercise 6b (page 338)

USE OF THE STANDARD NORMAL TABLES FOR ANY NORMAL DISTRIBUTION

We now show how the tables for the standard normal distribution
can be adapted for use with any random variable X where
X ~ N(μ, σ²).

Example 6.6A The r.v. X ~ N(300, 25). Find (a) P(X > 305), (b) P(X < 291),
(c) P(X < 312), (d) P(X > 286).

Solution 6.6A (a) P(X > 305).
First we have to standardise the random variable X by subtracting
the mean, 300, and dividing by the standard deviation (s.d.), 5,
so that Z = (X − 300)/5.

We also use the following properties of inequalities:

X > 305 ⇒ X − 300 > 305 − 300 ⇒ (X − 300)/5 > (305 − 300)/5

So P(X > 305) = P((X − 300)/5 > (305 − 300)/5)
= P(Z > 1)
= Q(1)
= 0.1587
Therefore P(X > 305) = 0.1587.

NOTE: if the two curves had been drawn to scale, the curve for
X would have been much more spread out and not as steep as
the curve for Z. However, for convenience of drawing, we use
the same sketch.
Often, again for convenience, we draw one sketch and write the
values of the standardised variable underneath the x values. We
use the abbreviation S.V. for ‘standardised variable’.

(b) P(X < 291) = P((X − 300)/5 < (291 − 300)/5)
= P(Z < −1.8)
= Q(1.8)
= 0.0359

Therefore P(X < 291) = 0.0359.

(c) P(X < 312) = P((X − 300)/5 < (312 − 300)/5)
= P(Z < 2.4)
= 1 − Q(2.4)
= 1 − 0.0082
= 0.9918

Therefore P(X < 312) = 0.9918.

(d) P(X > 286) = P((X − 300)/5 > (286 − 300)/5)
= P(Z > −2.8)
= 1 − Q(2.8)
= 1 − 0.00256
= 0.99744
Therefore P(X > 286) = 0.99744.
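
The standardising step of Solution 6.6A can be sketched in code. This is an illustration only, assuming Python's `statistics.NormalDist`; note that the second parameter of N(300, 25) is the variance, whereas `NormalDist` takes the standard deviation, and the helper `standardise` is a name introduced here.

```python
from math import sqrt
from statistics import NormalDist

variance = 25.0
X = NormalDist(mu=300.0, sigma=sqrt(variance))  # X ~ N(300, 25), s.d. 5
Z = NormalDist()                                # Z ~ N(0, 1)


def standardise(x: float) -> float:
    """Standardised value z = (x - mean) / s.d. for the variable X."""
    return (x - X.mean) / X.stdev


# Route 1: standardise, then use the standard normal distribution.
p_a = 1.0 - Z.cdf(standardise(305.0))   # P(X > 305) = P(Z > 1)
p_b = Z.cdf(standardise(291.0))         # P(X < 291) = P(Z < -1.8)

# Route 2: let the library work with X directly.
p_c = X.cdf(312.0)                      # P(X < 312)
p_d = 1.0 - X.cdf(286.0)                # P(X > 286)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4), round(p_d, 5))
```

Both routes give the same probabilities; the first mirrors the table method, the second is what one would normally write when tables are not involved.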

Example 6.7A The r.v. X is such that X ~ N(50, 8). Find (a) P(48 < X < 54),
(b) P(52 < X < 55), (c) P(46 < X < 49), (d) P(|X − 50| < √8).

Solution 6.7A Standardise X so that Z = (X − 50)/√8.

(a) P(48 < X < 54) = P((48 − 50)/√8 < (X − 50)/√8 < (54 − 50)/√8)
= P(−0.707 < Z < 1.414)
= 1 − [Q(0.707) + Q(1.414)]
= 1 − (0.2399 + 0.0787)
= 1 − 0.3186
= 0.6814
Therefore P(48 < X < 54) = 0.6814.

(b) P(52 < X < 55) = P((52 − 50)/√8 < (X − 50)/√8 < (55 − 50)/√8)
= P(0.707 < Z < 1.768)
= Q(0.707) − Q(1.768)
= 0.2399 − 0.0385
= 0.2014
Therefore P(52 < X < 55) = 0.2014.

(c) P(46 < X < 49) = P((46 − 50)/√8 < (X − 50)/√8 < (49 − 50)/√8)
= P(−1.414 < Z < −0.354)
= Q(0.354) − Q(1.414)
= 0.3617 − 0.0787
= 0.283
Therefore P(46 < X < 49) = 0.283.

(d) P(|X − 50| < √8) = P(−√8 < X − 50 < √8)
= P(−1 < (X − 50)/√8 < 1)
= P(−1 < Z < 1)
= 1 − 2Q(1)
= 1 − 2(0.1587)
= 0.6826
Therefore P(|X − 50| < √8) = 0.6826.

Example 6.8A The time taken by a milkman to deliver milk to the High Street is
normally distributed with mean 12 minutes and standard deviation
2 minutes. He delivers milk every day. Estimate the number of days
during the year when he takes (a) longer than 17 minutes, (b) less
than 10 minutes, (c) between 9 and 13 minutes.

Solution 6.8A Let X be the r.v. ‘the time taken to deliver the milk to the High
Street’. Then X ~ N(12, 2²).
We standardise X so that Z = (X − 12)/2.

(a) P(X > 17) = P((X − 12)/2 > (17 − 12)/2)
= P(Z > 2.5)
= Q(2.5)
= 0.00621

The number of days when he takes longer than 17 minutes
= 365(0.00621)
= 2.27
≈ 2

Therefore on approximately 2 days in the year he takes longer
than 17 minutes.

(b) P(X < 10) = P((X − 12)/2 < (10 − 12)/2)
= P(Z < −1)
= Q(1)
= 0.1587
The number of days when he takes less than 10 minutes
= 365(0.1587)
= 57.9
≈ 58

Therefore on approximately 58 days in the year he takes less
than 10 minutes.

(c) P(9 < X < 13) = P((9 − 12)/2 < (X − 12)/2 < (13 − 12)/2)
= P(−1.5 < Z < 0.5)
= 1 − Q(1.5) − Q(0.5)
= 1 − 0.0668 − 0.3085
= 0.6247

The number of days when he takes between 9 and 13 minutes
= 365(0.6247)
= 228 days
Therefore on 228 days he takes between 9 and 13 minutes.
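
The estimates in Solution 6.8A are expected frequencies: a probability multiplied by 365. The sketch below repeats the arithmetic, assuming Python's `statistics.NormalDist` in place of the tables; the variable names are illustrative.

```python
from statistics import NormalDist

DAYS_IN_YEAR = 365
T = NormalDist(mu=12.0, sigma=2.0)  # delivery time in minutes

probabilities = {
    "a": 1.0 - T.cdf(17.0),           # longer than 17 minutes
    "b": T.cdf(10.0),                 # less than 10 minutes
    "c": T.cdf(13.0) - T.cdf(9.0),    # between 9 and 13 minutes
}

# Expected number of days = 365 x probability, rounded to whole days
expected_days = {part: DAYS_IN_YEAR * p for part, p in probabilities.items()}
for part, days in expected_days.items():
    print(part, round(days, 2), "->", round(days), "days")
```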
Exercise 6c (page 342)

De-standardising
Sometimes it is necessary to find a value X which corresponds to
the standardised value Z. We use Z = (X − μ)/σ so that X = μ + σZ.

Example 6.9A If X ~ N(50, 6.8), find the value of X which corresponds to a
standardised value of (a) −1.2, (b) 0.6.

Solution 6.9A Now X = μ + σZ, where μ = 50 and σ = √6.8, so that
X = 50 + √6.8 Z.
(a) when z = −1.2, x = 50 + √6.8(−1.2) = 46.87 (2 d.p.)
(b) when z = 0.6, x = 50 + √6.8(0.6) = 51.56 (2 d.p.)
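
De-standardising is a one-line calculation, x = μ + σz. A short sketch, in which the function name `destandardise` is an illustrative choice and the variance (not the standard deviation) is passed in, matching the N(μ, σ²) notation:

```python
from math import sqrt


def destandardise(z: float, mu: float, variance: float) -> float:
    """Return x = mu + sigma * z, where sigma is the square root of the variance."""
    return mu + sqrt(variance) * z


# Example 6.9A: X ~ N(50, 6.8)
x_a = destandardise(-1.2, mu=50.0, variance=6.8)
x_b = destandardise(0.6, mu=50.0, variance=6.8)
print(round(x_a, 2), round(x_b, 2))
```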
Exercise 6d (page 343)

Example 6.10A If X ~ N(100, 36) and P(X > a) = 0.1093, find the value of a.

Solution 6.10A As P(X > a) is less than 0.5, a must be greater than the mean, 100.
Now P(X > a) = 0.1093
so P((X − 100)/6 > (a − 100)/6) = 0.1093
i.e. P(Z > (a − 100)/6) = 0.1093
We have Q((a − 100)/6) = 0.1093
But from tables,
Q(1.23) = 0.1093
Therefore (a − 100)/6 = 1.23
a = 100 + 6(1.23) = 107.38
Therefore, if P(X > a) = 0.1093, then a = 107.38.

Example 6.11A If X ~ N(24, 9) and P(X > a) = 0.974, find the value of a.

Solution 6.11A As P(X > a) is greater than 0.5, a must be less than the mean, 24.
Now P(X > a) = 0.974
so P((X − 24)/3 > (a − 24)/3) = 0.974
i.e. P(Z > (a − 24)/3) = 0.974
Now (a − 24)/3 must be negative and
Q((a − 24)/3) = 1 − Q((24 − a)/3)
so 1 − Q((24 − a)/3) = 0.974
Q((24 − a)/3) = 0.026

But, from the tables, Q(1.943) = 0.026
Therefore (24 − a)/3 = 1.943
a = 24 − (3)(1.943)
= 18.171

Therefore, if P(X > a) = 0.974, then a = 18.171.

Example 6.12A If X ~ N(70, 25), find the value of a such that
P(|X − 70| < a) = 0.8. Hence find the limits within which the
central 80% of the distribution lies.

Solution 6.12A P(|X − 70| < a) = 0.8
Therefore
P(−a < X − 70 < a) = 0.8
P(−a/5 < (X − 70)/5 < a/5) = 0.8
P(−a/5 < Z < a/5) = 0.8
Now, by symmetry
P(Z > a/5) = ½(1 − 0.8)
= 0.1
and from tables, Q(1.282) = 0.1
Therefore a/5 = 1.282
a = 6.41

So P(−6.41 < X − 70 < 6.41) = 0.8
or P(63.59 < X < 76.41) = 0.8
The central 80% of the distribution lies between 63.59 and 76.41.
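
Examples 6.10A to 6.12A all invert a probability for a general normal variable. The following sketch shows one way to do this, assuming `statistics.NormalDist.inv_cdf`; the helper names are hypothetical, and the second argument is the variance to match the N(μ, σ²) notation.

```python
from math import sqrt
from statistics import NormalDist


def value_with_upper_tail(mu: float, variance: float, p: float) -> float:
    """Value a such that P(X > a) = p for X ~ N(mu, variance)."""
    return NormalDist(mu, sqrt(variance)).inv_cdf(1.0 - p)


def central_limits(mu: float, variance: float, p: float) -> tuple:
    """Limits symmetrical about the mean containing a proportion p."""
    X = NormalDist(mu, sqrt(variance))
    tail = (1.0 - p) / 2.0
    return X.inv_cdf(tail), X.inv_cdf(1.0 - tail)


a_610 = value_with_upper_tail(100.0, 36.0, 0.1093)   # Example 6.10A
a_611 = value_with_upper_tail(24.0, 9.0, 0.974)      # Example 6.11A
lower, upper = central_limits(70.0, 25.0, 0.8)       # Example 6.12A
print(round(a_610, 2), round(a_611, 3), round(lower, 2), round(upper, 2))
```

Small differences from the worked answers are possible in the last decimal place, since the tables give z to three or four figures.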

Exercise 6e (page 345)

PROBLEMS THAT INVOLVE FINDING THE VALUE OF μ OR σ OR BOTH

Example 6.13A The lengths of certain items follow a normal distribution with mean
μ cm and standard deviation 6 cm. It is known that 4.78% of the
items have a length greater than 82 cm. Find the value of the mean μ.

Solution 6.13A Let X be the r.v. ‘the length of an item in cm’.

X ~ N(μ, 36) and P(X > 82) = 0.0478.

Now P(X > 82) = P((X − μ)/6 > (82 − μ)/6)
= P(Z > (82 − μ)/6)
= Q((82 − μ)/6)
so Q((82 − μ)/6) = 0.0478

But from tables

Q(1.667) = 0.0478
so (82 − μ)/6 = 1.667
82 − μ = 10.002
μ = 72 (2 S.F.)
The mean of the distribution is 72 cm.

Example 6.14A X ~ N(100, σ²) and P(X < 106) = 0.8849. Find the standard
deviation, σ.

Solution 6.14A P(X < 106) = 0.8849
P((X − 100)/σ < (106 − 100)/σ) = 0.8849
P(Z < 6/σ) = 0.8849
P(Z > 6/σ) = 1 − 0.8849
= 0.1151
Q(6/σ) = 0.1151
But from tables

Q(1.2) = 0.1151
Therefore 6/σ = 1.2
σ = 6/1.2
= 5
The standard deviation of the distribution is 5.
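
In Examples 6.13A and 6.14A one parameter is unknown and a single probability statement fixes it through x = μ + σz. A minimal sketch under that reading, assuming `statistics.NormalDist.inv_cdf` replaces the tables; both function names are illustrative, and the second assumes p ≠ 0.5 so that z is not zero.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)


def mean_from_upper_tail(x: float, sigma: float, p: float) -> float:
    """Mean mu such that P(X > x) = p when the s.d. sigma is known."""
    z = Z.inv_cdf(1.0 - p)      # standardised value of x
    return x - sigma * z        # from x = mu + sigma * z


def sd_from_lower_tail(x: float, mu: float, p: float) -> float:
    """S.d. sigma such that P(X < x) = p when the mean mu is known (p != 0.5)."""
    z = Z.inv_cdf(p)            # standardised value of x
    return (x - mu) / z         # from x = mu + sigma * z


mu_613 = mean_from_upper_tail(82.0, sigma=6.0, p=0.0478)    # Example 6.13A
sigma_614 = sd_from_lower_tail(106.0, mu=100.0, p=0.8849)   # Example 6.14A
print(round(mu_613, 1), round(sigma_614, 1))
```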

Example 6.15A The masses of articles produced in a particular workshop are
normally distributed with mean μ and standard deviation σ. 5% of
the articles have a mass greater than 85 g and 10% have a mass less
than 25 g. Find the values of μ and σ, and find the range, symmetrical
about the mean, within which 75% of the masses lie.

Solution 6.15A Let X be the r.v. ‘the mass, in g, of an article’. Then X ~ N(μ, σ²)
where μ and σ are unknown.
Now P(X > 85) = 0.05
i.e. P((X − μ)/σ > (85 − μ)/σ) = 0.05
P(Z > (85 − μ)/σ) = 0.05
Q((85 − μ)/σ) = 0.05

But from tables
Q(1.645) = 0.05
Therefore (85 − μ)/σ = 1.645
85 − μ = 1.645σ (i)
Also P(X < 25) = 0.10
P((X − μ)/σ < (25 − μ)/σ) = 0.10
P(Z < (25 − μ)/σ) = 0.10

But (25 − μ)/σ is negative, and by symmetry,
Q(−(25 − μ)/σ) = 0.10
From tables Q(1.282) = 0.10
Therefore (μ − 25)/σ = 1.282
i.e. μ − 25 = 1.282σ (ii)
Adding (i) and (ii) we have
60 = 2.927σ
σ = 20.5 (3 S.F.)
Substituting for σ in (ii)
μ = 25 + (1.282)(20.5)
= 51.3 (3 S.F.)

Therefore the distribution has mean mass 51.3 g and standard
deviation 20.5 g.

Now consider values a and b such that
P(a < X < b) = 0.75
and a and b are symmetrical about the mean.
Now P(X > b) = 0.125
P((X − 51.3)/20.5 > (b − 51.3)/20.5) = 0.125
so Q((b − 51.3)/20.5) = 0.125
but from tables Q(1.15) = 0.125
therefore (b − 51.3)/20.5 = 1.15
b = 51.3 + (20.5)(1.15) = 74.9 (3 S.F.)
From symmetry a = 51.3 − (20.5)(1.15) = 27.7 (3 S.F.)
Therefore, the central 75% of the distribution lies between the limits
27.7 g and 74.9 g.
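
Equations (i) and (ii) of Solution 6.15A are two linear equations in μ and σ. The sketch below solves them in general form, assuming `statistics.NormalDist.inv_cdf` supplies the z values; the function name `fit_from_two_tails` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)


def fit_from_two_tails(x_lo: float, p_lo: float, x_hi: float, p_hi: float):
    """Find (mu, sigma) given P(X < x_lo) = p_lo and P(X > x_hi) = p_hi.

    Uses x_lo = mu + z_lo * sigma and x_hi = mu + z_hi * sigma.
    """
    z_lo = Z.inv_cdf(p_lo)          # negative when p_lo < 0.5
    z_hi = Z.inv_cdf(1.0 - p_hi)    # positive when p_hi < 0.5
    sigma = (x_hi - x_lo) / (z_hi - z_lo)
    mu = x_lo - z_lo * sigma
    return mu, sigma


# 10% of masses below 25 g, 5% above 85 g
mu, sigma = fit_from_two_tails(25.0, 0.10, 85.0, 0.05)

# Limits symmetrical about the mean containing the central 75%
X = NormalDist(mu, sigma)
a, b = X.inv_cdf(0.125), X.inv_cdf(0.875)
print(round(mu, 1), round(sigma, 1), round(a, 1), round(b, 1))
```

Subtracting the two equations eliminates μ, which is the same step as adding (i) and (ii) in the worked solution.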

Exercise 6f (page 349)


ANSWERS
CHAPTER 1 Some (aad (b) 14 (c) 17
(d) 5.4
Exercise 1a (page 10)
A 10, 9,11, 13,14,19,13,9,3 Exercise 1f (page 28)
2. 4,6,7,13,10,5,5 1. Approx 33 mins
For Questions 3-9, heights of rectangles are 2., 61.5¢
in the proportions indicated: 8. 51km/h
3. 6,6,7,4,3 4. 0.56cm
3, 20, 30,42,12,18,3 5. 59 marks
2,6,12,14,8,4,1 6. 687.25 hrs
62, 40, 88,100, 112,60,12,3
11, 18, 22, 24, 28, 24,16 Exercise 1g (page 32)
ye oa
TP 12,30, 65, 48, 20, 10
OCOIMHP 1. (i)(a)61 (b)52 (c) 73
(ii) (a) 8 (b)7 (c)10
Exercise 1b (page 14) Zoe (a) a6 (b) 3 marks
8. (i) 46 (ii) 29, 9%
66., 60", 45°, Te 30°, 84° 4. 6.6, 4, 9.25
27.4°, 56.7" c54to 160.8° 5. (i) 153mm (ii) 15 mm
(i) 660 km? (ii) 4°, (ili) 2700 km? (iii) 11%
Radii in the ratio 7. 7: 6.7 6. (i) 96.5 mins (ii) 5 mins
66, 156°, 24°, 42°, 72°; 5.5 cm, 6 cm;
Sia (iii) 61 approx
50 1. 437.5, 418, 455
Radii in
i the ratio 20. oaes 5: 26.8 8. (a) 46 (b) 24
eee 67. 5°, 57. 5, 135°; £6; 72° ele 2°, (c) 80
ae 8°
9. (i) 57 (ii) 71.5 (iii) 32%
10. (a) 135cm (c) 176, 162,170,14
Exercise 1c (page 16) Lie (a) eas (b) 4.4 cm
Ratio of standard frequencies (c) 285 approx
3. 6:14:22:10:6:3
4. 87.5:470:535 :280:59
Exercise 1h (page 37)
Exercise 1d (page 21) 1 (a) 27 (b) 412, 485
(c) 4,6 (d) none
1. (b) (i) Approx 74% 25 4
(ii) approx 50 marks 3. (a) 108 (b) 7 (c) 32; 101
2. (a) Cumulative frequencies 2, 4, 7,13, 4. Approx 45
25,41,47, 50 5. (a) 659.5 <x < 669.5
(b) 24 (c) 26 (d) 23 (b) 4, 9, 21, 45, 60, 70, 77, 80
(e) 2, 2; 3,6, 12516; 6; 3 (c) (i) 697.5 (ii) 723.5
(a) Cumulative frequencies 3, 5,12, 30, (d) point of inflexion
48,51, 52
(b) 21 (c) 14 (d) 62kg
(b) 84% (c) 6.5 Exercise 1i (page 39)
(d) 1,1,3,5,9,19,5,3,2,1 (i) 9.7
ry (ii) 154.8 (iii) 51.375
Exercise 1e (page 24)
(iv) 17753 (v) 0.908 (3S.F.)
21
L: (a) 9 (b) 207 (c) 1896 19
(d) 0.55 8
2. 4 NoR7
wo
Exercise 1} (page 42) (i) ute, 0 (ii) ku, ko;
it (i) 4 (ii) 29.54 (iii) 122.82 a=%, b= 22
(iv) 18.625 (v) 109.4 (1 d.p.) (a) f(x) = 2x +3 (b) 5,123
1

12:4,3 (c) 18,493 (d)26 (e) 644


(b) 15,7;1,2 (a)2 (b)200 (ce) 2.02
rece (a) 733 (b) bimodal 8,10 (d) —4, —1, 2,5,8,11,14
(c) median 8
49.3 Exercise 1p (page 59)
58.95 1. (a) 7.6,3.14, (b)30.4, 6.76
6,2;15 (c) 13.65, 3.02
45 (2S.F.)
cara
2. 15.6, 7.66
3. (124, 1.83
Exercise 1k (page 45) 4. 25.9, 1.99
1. 35
2. 2328 Exercise 1q (page 62)
3. 75.6 (38.F.) 1603517
4 (a) 33.3 (3S.F.) (b) 28.9 2. 115.8 (48.F.), 7.58
(c)_ 7.1 3. (a)441.23 (b) 7.85, 3.07
465 fc) Bh, 4.26, or (4) 341,134
mn 31.7 (3S.F.) (e) 16.04, 7.01 (f) 10,1.44
4. 28.15, 3.84
5. 159
Answers are given to 3 S.F. where appropriate 6. (a) 294.55, 28.15, 3.84
in Exercises 1] to 1q (b) 10, 7450,4 (c) 500, 5450, 450 |
(d) 159, 5.3, 2.47 (e) 12, 300,5
Exercise 11 (page 50)
ie (b) 8.5, 1.80 Exercise 1r (page 65)
(a) 5,2
(c) 18.8, 6.46 (d) 102, 4.10 1. (a) 3138.76,5.19 (b) 42.6, 13.2
(e) 3.42,1.91 (f) 205, 3.16 (c) 1954, 348.4 (d) 17.1, 8.18
3.74 (e) 1.02, 0.507 (£) 3241.95 6871:
29, 5.9 d 2. 61.5,18.6
5.10 Seu cine OOnl 204-4, 92.5
5 AR leataeoa
6,4
10.5, 5.77
eee Exercise 1s (page 72)
n?—1 1.(a) 180.5 (b) 175.5;
co 3(n
+1), 12 second part (a) 187 (b) 189.5
2. (i) 2,0, 1,0, 4,4, 8,5,8,14,12,5,0,0,1
(a) 121, 6.19 (b) 14, 1703.8
(ii) 34.4 (8S.F.), 7.88 (3S.F.)
(c) 1716, 3.59 (d) 1026, 58770
(iii) 74%
10. 5,V7.5;5,11 2 min 38 secs, 1 min 54 secs,
2 min 18 secs, 1 min 25.5 secs,
Exercise 1m (page 52) 2 min 59 secs
Answers as for Exercise 11 86.6, 44.1, N= 188
(i) 4,8 (ii) 6; mean = 7, n= +6
16,6 (i)5.86(38.F.) (ii) 15,7
Exercise 1n (page 53)
ra
oe
Re3—4 (i)3.25 (ii) 2.2
1. Each multiplied by 3 (iii) approx 41%
3. (a) 6, 2.14 (b) 516, 2.14 oo (b) Mid-point is representative of
(c) 78, 27.8 interval (i) 11 (ii) 101
4. (a) 4,4; 7,4; 40, 400; 43, 400 6 hr 14 min, 13 min; 6 hr 18 min, 16 min
(b) dj = 10a; +3 63; intervals 30-, 40-, ... frequencies
5. 17,8 178.7146 22032, 35, 32, 20, 16,38;3,5, 2;
63.05, 11.3 (3S.F.): normal (see
Exercise 10 (page 55) chapter 6)
11. (i) 100.7, 14.5 (3S.F.) (ii) approx 73
ve (i)a=3, b= 22 (ii) 70 — (iii) 76
n
2. (i) 38,8.99 (3S.F.) (ii) 34,77 12. (a) 5,6, 4.07 (3S.F.) (b)
a=0.8, b=—5;6.25 mtn
3.
S

13. (a) (i)M+k,o (ii) PU, PO; 8+ 5, 30; Exercise 2b (page 88)
(b)a = 1.6, b= 10 (a) 3
1
(b) 3 (c)
14. 0,1; better in algebra au
15. 44.5, 51.75, 64, 40.5; a= 0.89(2S8.F.), ee 4
, oO Teves (a) 17 (b) si (co). 7a
16. (i)17—22;19.2 (ii) 20.2 5
(d) 77
(iii) 6 (iv) 20.6 (3S.F.), 8.19 (3S.F.) i
3
17. 34.9, 32.7, 186.5, 13.7; 61% 2
18. (a) 51.5 (b) 52 (c) 50 or 54 5

(d)51 °-(e)6 = (f) 57.5 0.4


(g) 109 0.7
19) 13,841, 0,033,814 5300; d= ear
b=2
Ca
Sr
Or
es (a) 36 (b) é (c) is
20. (a) 0.785 (3S.F.) (b) 4.44 (3S.F.) (d) 77
21. (b)(i)7 (i) 1;x=5,y =9 9. 3
92.
23.
“6.49, 1.71 (3S.F.);7
(i) 17, 4 (approx)
10.
;
(a) 36
11
(b) 36 (c) §
(ii) 17.85, 5.57 (8 S.F.)
Exercise 2c (page 94)
24. (a) 20.1,5.7 (b) 0.46
(c) 18.4- _
25. 35 years 1 month, 11 years 3 months
(b) 10 ;
(a) Approx 33 years 10 months
(b) Approx 17 years 10 months (b) 16 (c) 6
(c) Approx 65.6% (b)& (c)ts
26. Taking mark intervals 0 < mark < 10, (a) 7 (b)gen gleias:
ete.
(c) 40.4,15.4; a= 24 (2SF.), (a) 4s (b)&. (e)3
b = 0.65 (2 SF.) (a) 3 (b) § (c)3
27. £195.45, £14.12
28. (b) 5.21 (3S-F.), 2.70 (3 S.F.) (a) 3 (b)0

29. 36.7mm, 15.5mm, 35 Rw


APA
Co (a) 38 (b)3
30. 11.87, 0.80 (a) 70 (b) No
31. (c)5,1 (d)4.9,1.15; 4.86, 2.84 Yes
1

ee
eee
Oo
;
(a) 16
ane
a)
(b) 4 (c)
i6
CHAPTER 2 (d)3
Exercise 2a (page 82) Exercise 2d (page 98)
1
1 a
1. (a) 3 (b) 11 (c) 3i
2. (a) 33 (b)ze — (c) 33 2.
3
(a) 3704
1
(b) 76 (c)3
25

8. (ada (b) 4
1
(c) io (d)T69
4
(d) 19
4. (a) (b) 8
5
(c) 3 (a) 3 (b) 3 (c)
(a) 0.0025 (b) 0.095
(cy (e)) 1
1 (a) to (Dea (e) 30
5. (ais (b) (c) (a) 0.15 (b) 0.65; No
(d) 3 _ (e)4
(a) 4 (b)é
6. Ts ‘
3 ea (a)
eee
Os (b) Not independent
1. (ay 10 (b) 4. : il
1 10. (a) 4 (b) iz
8.i (a) 2 (b) 31 (c) 4 11. ee
16-
(d) 8 (e) 2
9. (a)iz — (b) 0 (c) 4 Exercise 2e (page 99)
1Qin(A) 1211 (deal ~ cede 0.4
(d) 8 (a) 0.24 (b) 0.42
1
11. (a)3% (b) (ec) 0 (a) a1 (b) 3 (c).%
(d) 4; t=6 orl2 9
14

5. (a) (b) 4
6. 0.008%; 0.625
64. 4555
1; (a) 8 2
(bb) 350 ¢| (e) 3
Exercise 2f (page 101) (d) 47
4
Ly
a
30 84 (a) a (b) 7s (c) io
2. (a) # (b) 0 (43 €e) rc
(c) ig; A and B, A andC, & O:wenGh)(iane ee Gis (iii)§
Sis
3 (d) (i)3Bey
(ii)5 id

4. (a) 0.02 (b)0.45 10. (ii)%


11. Machine 1
5. (a) 0.5 (b) 0.35 (c) 0.375
(d) 0.4 ;
6. (i) 0. 02 (ii) 0.78 (iii) 0.76 Exercise 2j (page 123)
(iv) 30 1. (a) 0.763 (3S.F.) (b) 14
7. (a)$e (b)ixs (ec) $6 2. (a)5 (b) 6
8. (i) 2A (ii) 0.5 (iii) 0.52 3. 5,6
9. 3, ae 6; 2. (a) 0.04 4. 0.999 (3S.F.)
5
(b) 0.6225 (c) 0.1825 5. it
Gass
Exercise 2g (page 107) 742% ) (jan
1

Pape (ica (Sap (c) 77763 Ti


25
2. 7
Exercise 2k (page 130)
3. &
1. Oba
aemncaa, (b) 3 (c) %%
2. (a) 6! (b)4
(djmee- (eo) as 3. (a) 4!9! (b)=
5. (b)(i)% (ii) 0 4. 4
(c)(i)H and R (ii) HandS
6 (eit Wos 5. (a) 8! (b)+
A
77 (aie (ia. “*Guyate iv) & 6. Te

ae (by Gi) ionaee =Gl) a7 7. (a) apr! (db)&1


3
9. & is i
Exercise 21 (page 138)
Exercise 2h (page 115) 28

ray(b) 35 (dis 2. 143


(ii) &
49
PA (a) (i) 32 (b) 353
6

32 (a) 35 (b) 37
bat ee 7)
o
Aaya
he
(b) 7
2
(c) 30.
1

4. (a) (i) a7 (ii) §


ais
(iii) 37 6. (ii
P
(ii) 5
SERS
(iii)30
noone

(b) (i) #7 (ii) 28


oss?
(ili) 34 6. (i) 65 268 (ii) 4263
(a) 25
7. 510
8.
(a) & 9. 4608
(a) (c) 3 10. (a) 1260 (b) 2520
(a) 4 11. (a) 420 (b) Boys
44
252, Girls 462
Soe
a
aS (a) 0.34 (b) 0.063 (c) 120 (d) 733
(c) 0.19 (d) 0.97; 3 white 12. (a) 2.5X10~. (b) 3193 344
Exercise 2i (page 119)
ieee (b)§
14.” 130
ie (Ey % (b)(i)3 (ii) 25 15. (a) 360 (b)6 (d) 12
(a) iz (b)§ (e) 1170 7
16. (a) 64 (b) 18 (c) 32
(a) 0.66 (b) #7 17. (a) 9! (b) 3 (c) 1260
aoN (a) 0.024 (b) 0.452
(c) 0.496
(3SF.)
5.
oe
a Set ise, ae 26-64 age group 18. (a) 75
(4)
3
(c) 181
456 (d) (i) 6! (ii) 72 ©

Exercise 2m (page 158) (b) (i)4 (ii)73 (iii) 5 : (c) 108


1. (a) (i) (ii)3 (b)3¢_—(€) 294 32. (a)73 (b) 35 (c)7
2. (a) B+5(a—B)(a+ 48) (d) 6 (e)3
3. 0.005 99, 0.987 (3S.F.) 83. (a)@ (b) #3 38
(c)
4 (a) 2 (b) 3 (c)& (d) 25 (e)35
(d) 4 (e)is 34. (ptp*+p(itp tp)
(a) 6720 (b) 8106; 3 35. 1—4pt+6p*
hia (a)No — (b) No 36. %,343, 0.617
7. (a)oS (b)#5 (c) 243 37. (a) 1/3" (b) 16/3° (c) 593/37
(das (d) 784/3°
“8. (a) (i)% (ii)4 (iii) (iv) 1534 (b)30 38. (a) (i) 120 (ii) 12600
i812) 10 10
(c) (i) 35 (ii)Ste
ae 9) Rae.
(b) 35> 35> 35> 35> 24

9. 0.59; (i) 0.352 (ii) 0.4576 39. 0.336, 0.452, 0.188, 0.024; 0.9
(iii) 0.480 64 2
40.. (a) (b) 0.0546
10. (a)% Ole (c)
i 4i o Cis
(ay POS erCe
(A)& (b) 0.355 (3 d.p.) (c) 0.920 (3 d.p.)
42. (a)38 best (c)3
135 70
aa (b) 4 hn
(d) 4 (e)3
2

12. Wa) 3 eet 43. (a)(if)3 (ii)


(4)
ias 1 6
s,
2 (iii)F (i
I
v)
No, no
13. 0.624 (b@) yar divas
wanbee

14. Fizaieii; (a)No. (b)No


Leese
44. (a) 48 (c) 63, 32
45. (a) 0.45, not
15. (ae (i)% (ii (b) (i) 0.33 (ii) G
(b) 0.013 824 (c) 3,aa
6. @Os Wn, Ga
16. “aja, (by3 (c)5 (iv) 5
Cony WO wes (b) (i) 0.0303 (ii) 0.450
(e) ts (iii) 0.0348
17. (i)0.36 (ii) 0.6875 47. (a) 0.7, 0.68 (b) 0.28
18. (a)$ (b)5 (c) 0.656 25
19. (a) (i)toso (ii) 35 (b) (i)25 48. (a) (i)% (ii)73 (iii)3
(ii)333as (b)
20. (a) 0.12 (b) 0.184 = (c) 0.82 49. (a) 0.88, 0.05
(d) 0.25 (b) (i) 6.346 (ii) 0.476
21. (a)% (b) 3; P(D) = 0.0325, 50. 4, no, 0.1
P(CAD) = 0.025, P(C|D) = 3 51. (a) 0.096 (ii) 0.156;
22. (a) 0.875,35 (b) a 52. (a) (ii) 0.43, 0.67 (iii) 32
23.
1
(a) 3 (b) 4
1 3
(c) to (b)(i)ais. (ii) |
(d) 24; No, no 53. (a) ior (b) (1—p—p’
+2pp')”
25 27
24. (a) 316: 216 (b) 0.5177, 0.4914 (c)§
(c) 0.6651, 0.6186 54. (a) 0.071, 0.929 (b) 0.600, 0.572
5. (ae . (bs (c) 799
(d) ia CHAPTER3
26. (a)zr © (b)gop,aa)i01s Exercise 3a (page 170)
27 Halis <tb)a
1. (a) x 0 1 2
28. (a)as (b)ss (c)s
Pixexy 4") a
1 1 ji
(d) 4 (i) Yes, no (ii) No, yes
29. 70 (a)55 (b)30 (ce) 65 0 al
COOL 8ar (b) P(
b) P(X = x) =a6°° 206 2k Sa
=2,...

30. (a)i3 — (b) 0.0481 (3S.F.) 13— x=8,...,12


P(X =x) =———~,
31. @in Gh Gis 36.0 °t

(c) x 0 1 2 4.7 (at (b)tr — (e)#


ax=n |e 2s 5.
(d) 1647
(a)3.5 (b)14 (c) 5.5
(d) P(X =x)=0.1,
x =0,1,...,9
(d) 84 (e) 95
(e) x 1 6. (a)2 (b) 30r—3
3
8 Exercise 3d (page 188)

(f) P(X =0)=4, (xX =x) =S—*,


18
1.
2.
(a6
(a)5
(b) 22
(b)2.5 (c)10 (d)10
1
ee 2. en D 8. (a)4.2 (b)
73 (c) 3.67 (3S.F.)
39
1 ae ayaa (b) 33 (c) 1535
6
(d)2en may (e) 475
(a) x 0 1 2 3 5. (a)13 (b)397 fo) (c)8
P(X =x) | 0.216 0.432 0.288 0.064 6. (a) 21.34
(b) 0.648 rds x 0 1 2
PES ay 5 ONE
x 0 1 2; 3
@a cubs . Se). e.(aae
Pe aie es ce 8. (a) —0.7 5. (b)3.5 (c)3.01
1
5
9. P(X=x) == or
45mae
7 x Sel 2s.
——
5,9

Exercise 3b (page 177) 33, 2.21 (2d.p.), 1;


1. 23 pix taex)=en (5
a \eee (5).
1 =eS 1.2...
sey :
8. (a) 0.82. (b).2.9 10. (a) (b) 0 (c)6
4, 1 (d) 2.45 (2 d.p.)
5. 0.5 11. (a) 0.04 (b)5 (c)4
6. (d) 7 (e) 16
fe OMe
Exercise 3e (page 191)
8. «x 10 20
P(X=x)|04 0.6

9. (a)03 (b)0.2 F(x)


105 5% ao) 69) Seo ett eet (b) 128 4k be
P(X=<x) | 0.16 0.382 0.16 0.16 0.16 0.04°

loss of £1.20 Fig) Weteages 49 § 35 1


11. (i) £8(7+x) (a)5 (b) Loss
of£3.75 (c) «x owls 2 3
12. 7 L(x) (eer Oo jest
13.
14. (a) 24 2° ¥ 0.1 0.2 0.3 0.4 0.5
(c) x ONstar 2a 3 4 P(Y=y) | 0.05 0.3 0.6 0.75 1
Peas 4 0 & 3. (a)0.41 (b)0.87 (c) 0.46
(d)1 (d) 0.13 (e) 2.58
15. (a)0.2 (b) 2.08 4. ase aa fae 8* 0.9724
16. (a)2r —(b)4 (c)5 P(X=x) | 0.01 0.22 0.41 0.22 0.14 °
(d) a3 (e) 1 (f) 3, 0 Bu (a)e ae(b)S
is7 226
gO2 bw
eae (c) (X =x) =~ —, x =1,2,3
(d)>
Exercise 3c (page 182) 6 oe nbs
dn (a) 2h 00(b) 5-9 (ciP(X =x) = 3, x= 12,3
3. (a) 3.5" (b) 15g | (c) 14.5 (d) 0.816 (3 S.F.)

6 '
12. x Io 0 3

P(X =x)} p> 3p%(1—p) 3p(1—p)” (1—P)


450p, 30p

13. y 0 i 2 3 4
Exercise 3f (page 203) P(Y=y) | 0.09 0.24 0.34 0.24 0.09
1, ia) Vote 01, 08 4
z 0 1 2 3
(b) x+y 17 52) Sign dcr go J

P(Z =z) | 0.447 0.232 0.222 0.072 0.027


P(X+Y=x+y)|0.12 0.14 0.32 0.2 0.18 0.04
abel
(e) x-y eee One eee 2S
P(X—Y=x—y) | 0.12 0.14 0.32 0.2 0.18 0.04
14. (a) a3 (b)2, 73
15. (a) 0.01 (b) 3.54, 0.4684
2. (a) 26 (b) 15 (c) 17 (c) 14.7, 11.71
(d) 59 (e) 59
i! 5 2 16. (a)? (b) —0.24p
3. (a) 2 (b) 72 (c) 23 . (c) 3.34
p?(2 d.p.)
4. (a) 0orl12 or —12 (b) 294
5. (a)1 (b) —1 (c) 34 ie orale!
(d) 14 (e) 14 (f) 30 18. aia Wa GWE WE
6. (a)(i)7 (i) (b)()O (ii)F
5 ene s PENS.
19 edi AES, wubess
0 Zita &
7. (a) 1.2, 0.36 (b) 0.09 ima apaitn Me
(c) 2.4, 0.72 (d) 0.3 Spott sueseate
(e) 2.421140 © 20. (i337 (ii) 2.7 8
(3S.F.)
8. (a) 2.6, 0.24 (b) 5.2, 0.48 (iii) 0.260 (3S.F.)
(c) 7.8, 0.72 1 12 1
Po;
a8
(a) 705
2%. 3(n +1), a(ne 1)
9. 29%
10. (a) 0.1 (b) 3 (c) 1 (b) 16
(d) 0.2 (e) 12 (f) 3 22. (a)
x 1p een 5
P(X = x) | areas is
Exercise 3g (page 205) y 2 3.4 5 6 a

1. 6.25 im|ma
P(Y=y) Ye3 36
2. 23,2 y 8 9 10 a! 251
5 11 1
Ge (2)"36 (b) 36 (ec) 36; — Ge, @ PYY=y) 5 ae
4, x Gunna Coie all 12,,20
23. 3%, 3.5, 1.25
Pxk=x)lq 3 43 &
0.975 (3S.F.), 0.640 (3S.F.)
5. (a) ()3--(l) 19>
ae ol! mee
(b) fa CHAPTER4
1 pie? (3. 2 Answers are given to 3S.F. where applicable.
(c) 1216 d) 5
(d) [= =
5 (e) 25
35 1 1
Exercise 4a (page 214)
6. Te; (a) 2 (b) iz
(a) 0.0823 (b) 0.680
@ A) }.3, 21.50 (a) 0.209 (b) 0.0168
‘i r—!1 i 3

= =)-.= $1.
(c) 0.008 52
1.
16) .32
(a) ainsi, 619.8m 81
024 Ou ok
(b)—50p (a) 0.531 (b) 0.000 055
8. (a)1,2 (b)2,35 (ce) 11.2, 7.28 (c) 0.984
0.002 00
t 0 1 a 3 4 0.891
P(E
= yn eae ase see
0.5
115> 234 1 4 68 (a) 0.0808 (b) 0.428
9. 3» 45» 75> 39 45 0.0819
10. P(X=x)
=, x =1,2/3/4,5; (a) 0.329 (b) 0.461
0.0962
P(Ko=16)'=,0, P(X
x)= ge, 4
x=17,8,...,12:45,& 68
11. «x 713 4 506% 8°99 5
= Ce UN al
SleTeen
9
P( X=")! ||368 io akc GunGm Gancom (a) 0.0563 (b) 0.000 416
5%, 0.001 37 (3S.F.) (a) 0.267 (b) 0.000 144

Exercise 4b (page 217) 7. (a) (1—p)*(36p?+


8p +1);
Le 2.5,1.5 (1—p)°
+ 5p(1 —p)§(1 +4p)
0.844 (b) 0.678, 0.630, 0.0547, 0.0605
8, 1.30 8. "C,(1—p)"—"p’ (a) 0.1296
(a) 0.2 (b) 0.005 51 (b) 0.1792; x oi 2
(a) 0.25 (b) 2.5 (c) 0.282
0.1, 0.23 (2 d.p.) PX=x)]5 3 4
ul)(a) 0.68 (2S.F.) (b) 8,1.6 0.4816
NHS
oO
Ob 5)4 9. (a)4 (b) 0.0424
10. (a) 0.4 (b) (i) 0.4516 (ii) 1.8
Exercise 4c (page 221)
te (a) (i) 0.9830 (ii) 0.0170 Exercise 4g (page 241)
(iii) 0.0015
(b) (i) 0.1596 (ii) 0.2660 1a (byGi)36 | Cilia (ce)
(iii) 0.5044 (iv) 0.9004 9.49
(c) (i) 0.0037 (ii) 0.0037 3. (a) 0.128
(iii) 0.2916 (b) P(X =r) = (0.8) — (0.2) Geometric
(d) (i) 0.5551 ~— (ii) 0.0706 (c) 0.512; 10, 40, 0.0768
(iii) 0.9294 (iv) 0.3114 4. 3,37, 28 0.00026,5
i 0 1 2 3 5. (a)(i)s
(ii)ae (ili)2
P(X =x)}0.0467 0.1866 0.311 0.2765 (iv)1 (v)6
0G 4 5 6
(c) 17
6. 0.0047, December 22
P(X =x)|0.1382 0.0369 0.0041 7. (a) 0.504 (b)0.432 (c) 0.5904
ie 0 1 2 3
(d) 44
P(X =x)|0.0053 0.0487 0.1812 0.3364 ee a
p
x 4 5

P(X = x){0.3124 0.116 Exercise 4h (page 245) ¢


NOTE: Answers are given to 3S.F. but all —
Exercise 4d (page 225) the numbers are retained in the
calculator when addition of
(a) 1 probabilities is required.
0.946
(a) 3 (b) 3 (c) 0.633 (a) 0.0302 (b) 0.106
(a) 2 (b) 0.994 (c) 0.185 (d) 0.216 (e) 0.321
0.922 (f) 0.463
boDOR
oo (a) 3 (b) 0.826 (c) 0.406 (a) 0.00781 (b) 0.000452
(c) 0.731 (d) 0.109
Exercise 4e (page 228) (a) 0.0907 (b) 0.308 (c) 0.570
(d) 0.779
1. (a) 1.2 (b) 0.4 (a) 1.6 (b) 0.976
(c) 0.216, 0.432, 0.288, 0.064 (a) 0.607, 0.303, 0.0758, 0.0126,
(d) 39, 78, 52, 11 0.001 58
0.06; 293, 94, 12, 1, 0,0 (b) 0.0608, 0.170, 0.238, 0.222, 0.156
1;0.894 (a)5 (b) 0.2 (c) 0.0273, 0.0984, 0.177, 0.212,
5, 22,37, 28,8 0.191
0, 0, 3, 13, 30, 36,18 6. (a) 2 (b) 0.271
om
eh
Se 16.5, 42.4, 45.4, 25.9, 8.3, 1.4,0.1 7 0.433

Exercise 4f (page 233) Exercise 4i (page 247)


0.0243 LL; (a) 0.513 (b) 0.00423 = (c) 0.0302
(a) (i) 0.201 (ii) 0.00637 (b) 2 2. (a) 0.143. + (b) 0.762 ~— (c) 0.670
(c)5,2 (d)14 3. (a) 0.0821 (b) 0.242 — (c) 0.759
(a) 4.8, 0.98 (2 d.p.) (c) 0.737 (d) 0.0486 (e) 0.125
(d) 0.388 (a) 0.0821 (b) 0.109 = (c) 0.265
1, 0.336, 20 (d) 0.0631
s>+ 3sd? (a) 0.567 (b) 0.184
esas(a) 0.940 (b) 0.04382 (c) 0.0167 (a) 1.2 (b) 0.879 — (c) 0.570

lis (a) 0.607 (b) 0.185 14. (a) 0.082 (b) 0.242;6.15
8. (a) 0.0408 (b) 0.219 (c) 0.0463 15. 0.371, £60.37
(d) 0.145 16. (a) 0.135 (b) 0.323; 0.81
17. (d) 0.387 (e) 0.929 (f) 0.893
Exercise 4j (page 251) (g) 0.205 (h) 0.816; 0.0290
18. (a) (ii)1.5 (b) 0.577 (c) 0.0249
1. (i) 0.0476, 0.0498 ieee A ?
(ii) 0.225, 0.224 (iii) 0.171, 0.168 19. (c)e *— (d)1—e r4a+};
2. (a) 0.879 — (b) 0.00150 6 2
3. (a) 0.287 (b) 0.191 0.013, 0.014, 0.182
4 (a) (i) 0.368 (ii) 0.184 (iii) 0.0190
(b) 0.677
(a) (i) 0.195 (ii) 0.0916 CHAPTER 5
(b) 0.075
0.463 Exercise 5a (page 275)
] 0.647, 0.185
0.121 1. (a)eo eles ee ae
2. (a) (c) 0.74
Exercise 4k (page 255)
3. (a)% (ce) 0.66
ae (i) 0.165, 0.298, 0.268, 0.161, 4, (a)i (eeeee
0.0723, 0.0260 5. c=1,k=4
(ii) 0.0743, 0.1931, 0.2510, 0.2176,
0.1414, 0.0736 6 (aye Meae eae
(iii) 0.0111, 0.05, 0.113, 0.169, (e) 0.3475
0.190, 0.171 7. (alg (ete) seed (a)ig
(iv) 0.0224, 0.0850, 0.162, 0.205,
0.194, 0.148 Exercise 5b (page 280)
Exercise 41 (page 257) 1. -(a) (bye (d) 6.45
ate (a) 44, 44, 22, 8, 2
2. (a)1 (b) 1.2
(b)I9On7 2298.15 0
3. (a)2, (bySe xene ways
2: Delo 2 On 2 OFS MUON G esas OOS OF 4. (a)% (b)16 (c) 4.8
71; 23 (78, 26 if do not round figures) (d)—= |
3. 0.5, 0.481; 31, 16, 4,1, 0
5. (a)2 (b)8% (c) 4.86(3S-F.)
4. 95, 137, 98, 47, 17, 5, 1; Approx 58
6. (a)3 (b)2 (e) 48
Exercise 4m (page 261) 7. 6m

1. 0.121 8. (ae ‘(b)>


2. (a) 0.189 (b)0.308 (c) 0.184 (c) 0.48; Money bond
3. (a) 0.323 (b) 0.0119 9. 2,0.124(3S.F.)
4. (a) 0.301 (b) 0.080 (ec) 0.251 10. 2.5, 0.803 (3.d.p.), 0.456 (3 d.p.)
11. (a) 2.875kg (b)£4.75,%
Exercise 4n (page 268)
12) (i) 040.) Givase) Gils
de 0.407, 0.366, 0.165, 0.0629; 0.816,
0.0518 Exercise 5c (page 287)
3, 18.5% 3 12 S|
m;(a) 0.983 (b) 0.184 do (ay (i) = (c) 20
(c) 0.199 (d) 0.387 (3S.F.)
(a) 0.788 = (b) 0.00293 2. (ayzet (8)e088 tc)igs
outs (a) 0.368 (b)0.264 (c) 3.16 (d) 1.44 (3S.F.)
(d) 0.199
(a) 22 (b)19;39 3.2 (azole (by Mee
(a) 0.100 = (b) 0.0702 (d) 0.553 (3S.F.)
(a) 0.600 (b) 0.0741 4. (apie ORGS UY (yep
(a) 0.0902 ~— (b) 0.0613; 4 (d) 0.545 (3S.F.)
(a) 0.647 += (b) 6
oe
oe
Rape
oa
Se (a)0.185 (b)4 (c) 2.68 Bs, (akg S b)s ame)5
(d) 6 (d) 0.163 (3S.F.)
12. (a) (i) 0.238 (ii) 0.841 = (b) 0.083 Gis, (a)
Sy Wb)
5,a ee
(a) 0.677 = (b) 0.017; 1498 (d) 0.912 (3S.F.)
7. (a)is (b)
30 (ce)15 9. (a)3 (b) 0.272 (3S.F.)
(d) 0.672 (3S.F.) 1
8. (AG (b)3,3 (da =(x3-1) 1<x<2
(c) F(x) =, 7
Of (aleigemen)1 ee) g.wOCd)
a5 it x22
(e) 1
(d) 1.65 (3S.F.)

10. (a)3,3
Qx
2
2
Exercise 5d (page 294) a 25x53
ears 3
3 5
= 85x <5
to (a) Flix)
a
5
0 <x = 2 (b)F(x)=; 3 6,
2e——-—5 5S <6
1 x22
i x26
(b) 1.59 (3 S.F.)
()iore pdar te)4. Bele
1
pitt) Pq Sie 11. (a) 0.455,3 (b) 3.64, 4.95
2. (a) F(x) = 1
es 1<x<9
Hf es
(c) F(x) =)"
(b) 0.5 1 x29

di
SOX 2 XE) 1 Sk <3 12. 4,80

3. (a) F(x)= 3x — ix? 0O<Sx <2


NG °
2 8 F(x)=
il x22
ag(x?+ 6x + 12) 0<x<2 0.007
4. (a) F(x) = 13. 4x O0<x<l
i xZ yaaa
F(x)= \ 5+—20 1<x<2
xt O<x<1
5. (a) F(x) = 1 x22
1.565, 0.821
(b) 0.841 (3 S.F.) 14. (a)5 (b)3
(c) 335; 543 tonnes
re 0<x
4
Exercise 5e (page 301)
= 0O<x<l

(ays1 2:
ne x

Lee (b) f(x) . ete eo


i
—(x+2)? 3 -—2<x
24 0 otherwise

a, @QPeye 4 3 22 0 =e (a) —Z
Jil
—(e) 0.608 (38.F.)
(c)§
a ae

ore 1<x <8


(b)3
8. (a) 1.5 (b) 0.75 85 RIO ilyTa)
12
ee
0 otherwise
(c) F(x) = (b) 4, (ec) 3.54 (3S.F.)
(d) 0.4 (e) 0.2 (d) 0.595

0 el! Exercise 5h (page 327)


1. (a) 2.4 (b) 20,3, 0.178 (3S.F.)
3. (a)3
. ay3 (b) f(x) x) = Ree:
<x<1 2. Mi aaet & isene May)2
3
0 x21 8 Oe
(ec) (d)0.553.~— (e) 48
2 0<x<05 4. 4,%, ss, 0.541 (3S.F.)
8,3, 39
4. (a) 2 >)
otherwise 2x
Fy <x<1
(c) 0.25 (d) 0.144 2
6. (a)3 (b) fix = ae 1<x <3;
’ a OSe<4
On (a!) pee igen (sey = 0 otherwise

0 otherwise 45 (ec) 1.27 (3S.F.), 0.875


G. dyer: Ts Re, ee
8. a=2,k=%,0.2
Exercise 5f (page 306)
9. 0.6, 0.2, 0.166
= 1 ==4
Eee ea 12x3
1. (a) f(x)= (3 10. (c) F(x) a(SiLa = 2S
Kaex
een pepe
x
0 otherwise

(b) 4.5 (c) 0.75 (d)4 1


St
i 123% cs
2. (a)5 (b) 0.5 (c) -3.5 11
(d) 0.75 (d) 0, 5
Be" (a)5 "(b) 0.325 (c)i3 (a) 6 6(6 +1)
11. (b) 643 +3044)
4. f(a)=%a!?, 1<a<16;7, 19.2 30
1 137 2447 ee d) 0.2
5. f(a) — Sy
eee aa:Brand
(°) (64 3)10 +4) «)
12. (a) 0.991 (b) 0.983 (c) 0.28
6. (i) 0.25 (ii) 0.845 (d) 0.0017 (e) £15.40
26 964 2
Ts a ae 18,Saal){b= ees
ead
arm2s 2
i )e
(ii)c = =a
8. 37, 377, 0.63 (2d.p.)
14. 22 g8
9. of<u <2f, f(1 +1n2), p
15. (a) 2.1, 1.29 (b)1,3
10. (b) 3e 16. 4,2,% 20
17. 0.181, 0.0498; 11.6 miles (3 S.F.)
Exercise 5g (page 315)
18. (a) 98p (b) 83p
1. (a) 0.0821 (b)0.2 (c) 0.632
(d) 0.2 (e) 0.139 (f) 0
2. (a) 2000hrs (b)(i) 0.287 (ii) 0.593
(c) 0.465 (d) 0.0515
8. (a) 6.93 (by Ole. 0%
(c) 10,100 (d)0.24(2S.F.)
4. 0.1386, £26.30, 0.225 CHAPTER 6
5: (a) ecyar (by dae. 129% (c) 2.5
(d) 1.73; 0.135 Exercise 6a (page 337)
6. a=92.2, A=0.0108 1. (a) 0.1911 (b) 0.8089
(a) 0.114 (b) 0.338 (c) 0.202 (c) 0.1911 (d) 0.8089
7. (i) 0.62 (ii) 0.38 2. (a) 0.0359 (b) 0.2578
8. 2.895 (c) 0.99973 (d) 0.9131
=p A(l—kKhE (e) 0.004 94 (f) 0.99111
9. =. e-2At e-2Atia (g) 0.9686 (h) 0.2343
xX ‘ ; 1—e “*t (i) 0.0312 (j) 0.9484
10. 2,3,4,1—e-2*, 0.368 (k) 0.9803 (1) 0.00201

3. (a) 0.05 (b) 0.05 Exercise 6d (page 343)


(c) 0.0999 (d) 0.025 (i) (a) 51.55 (b) 63.55
(e) 0.005 (f) 0.01 (ii) (a) 117.44 (b) 126.752
(g) 0.0025 (h) 0.975 (iii) (a) 70.00 (b) 90.58
4. (a) 0.1709 (b) 0.548 07 (iv) (a) 49.66 (b) 67.60
(c) 0.3639 (d) 0.4582 (v) (a) u—2.050_ (b)u+0.860 __
(e) 0.4798 (f) 0.997 92 (vi) (a)a—2.05,/b (b)a+0.86,/b
(g) 0.033 68 (h) 0.9082 (vii) (a) —1.05a (b) 1.86a
(i) 0.2729 (j) 0.03061 (viii) (a) 34.65 (b) 55.02
(k) 0.925 (1) 0.4508
(m) 0.9 (n) 0.02 Exercise 6e (page 345)
1. (i) 63.655 (ii) 67.37
Exercise 6b (page 338) (iii) 55.09 (iv) 56.69 or 56.695
1. (a) 3.03 (b) 2.326/7/8/9 2. (a) 37.572 (b) 50.012
(c) 38.244 (d) 55.608
(c) 1.96 (d) 0.849 3. 9.87; 70.13 < X < 89.87
(e) 0.047/8 (€).70,501/2 4. (i)9.2 (ii) 18.608
(g) —0.885 (h) — 2.272/3/4 (iii) 15.68 (iv) 17.92
(a)yais432 (b) —1.887 (v) (384.32, 415.68)
(c) —0.454 (d) 0.015 5. (a) 0.6247 (b) 629.52¢ (c) 3
(e) 0.796 (f) 1.281/2 6. (a) 290 (b) 78 (c) 27
(g) 0.953 (h) 1.94 7. (i) 1.645 (ii) 2.575
(a) 0.91 (b) 1.66 (iii) 1.96 (iv) 2.808
(c) 0.674 (d) 2.05 8. (458.92, 546.52)
0.674, — 0.674, 0.524 9. 8,1.158, (6.10, 9.90)
(a) 1.645 (b) 1.96
(c) 2.054/5 (d) 2.326 Exercise 6f (page 349)
(e) 2.575 (f) 2.808/9
(a) 1.282 - (b) 2.054/5 1. 10.7
(c) 2.17 4
(d) 2.575
30
35.5
Exercise 6c (page 342) 52.73, 11.96
1. (a) 0.0548 (b) 0.0107 100.8, 5.71
(c) 0.8849 (d) 0.9713 50, 6.12
(e) 0.6554 (f) 0.9918 39.5, 5.32
(g) 0.4602 (h) 0.0808 Rwh
AIBA
OO 53.87, 16.48
(a) 0.0106 (b) 0.273 10. 0.7725
(c) 0.5971 (d) 0.2168 11. 0.203
(e) 0.9857 (f) 0.997 02 12. (a) 92.7% (b) 1.32
(a) 0.3015 (b) 0.0105 (c) 1.7%
(c) 0.9079 (d) 0.2533 13. 2080, 236
(e) 0.2097 (f) 0.0323 14. (a) 9.1% (b) 99.69
(g) 0.5231 (c) 0.4mm
(a) 0.1587 (b) 0.8413 15. 4,46
(c) 0.6915 (d) 0.3085 16. 4.299 ;
(e) 0.9332 Exercise 6g (page 353)
(a) 0.8634 (b) 0.2413
(c) 0.1388 (d) 0.6826 1. (a) 37.8% (b) (125.5, 194.5)
(e) 0.2565 (c) 0.405
(a) 0.8014 (b) 0.085 (a) 7, 3.5 (b) 0.075
(c) 0.2714 (d) 0.4028 979.27, 17.27, 133
(e) 0.188 62 5.2007, 0.003 46; 0.0269; 0.002 61,
(a) 0.5923 (b) 0.4208 1.4%
(c) 0.9544 24.97, 53.03
(a) 0.0668 (b) 0.4013 0.0038, 230.65, 1.29
(c) 0.1747 433.7 hrs
(a) 0.7054 (b) 0.3228 137, 149.5
0.30 (2 d.p.), 0.26 (2 d.p.); steeper
(c) 0.0618 (d) 0.8962
(e) 0.1818 (f) 0.4621 (a) 1.2 (b) 53.6
10. (a) 0.0478 (b) 8.17x107 (c) 54.2; 0.066
11. (a) 735 (b) 646 0.4013, 0.0031
(c) 546 (d) 740 0.159, 0.775, 0.067, 2.7, £37.56

Exercise 6h (page 361) 0.1036 (a) 0.098 812


(b) 0.1061
1. (a) P(2.5 < X < 9.5) 0.1360 (a) 0.1381
(b) P(3.5 < X < 8.5) (b) 0.0936
(c) P(10.5 < X < 24.5) 0.063 03 (a) 0.0579
(d) P(1.5 < X < 7.5) (b) 0.0655
(e) P(X > 54.5) (f) P(X > 75.5) (a) 0.061 84 (b) 0.0651
(g) P(45.5 < X < 66.5) 0.2192, 0.2075
(h) P(X < 108.5) (i) P(X < 45.5)
(j) P(55.5 < X < 56.5)
(k) P(400.5 < X < 560.5) Exercise 6k (page 370)
(l) P(66.5 < X < 67.5) 1. (a) 0.55 (b) 0.18
(m) P(X > 59.5) 2. (a) 0.649 (b) 0.965 (c) 0.371
(n) P(99.5 < X < 100.5) 3. 1°C,(0.96)%(0.04) (a) 0.20
(o) P(33.5 < X < 42.5)
(p) P(6.5 < X < 7.5) (q) P(X > 508.5) (b) 0.77
(r) P(X < 6.5) 4. (a) 0.2025 (b) 0.410 (c) 0.0238
(s) P(26.5 < X < 28.5) 5. "C,(0.211)'(0.789)”
"; (a) 0.306
(t) P(52.5 < X < 53.5) (b) 21 (c) 0.203
2. (a) 0.9474 (b) 0.6325 6. (a) 9.6247 (b)93.32% (c) 0.7852
(c) 0.5914 (d) 0.0269 7. (a) 0.3154
(e) 0.2106 (b) 0.3068; worse; 0.5245
3. (a) 0.0154 (b) 0.8145 8. 0.1796, 3500
(c) 0.02 9. (a) 0.194 (b) 0.933 (c) 0.986
4. 0.1127 10. (a) (i) 0.1353 (ii) 0.3233
5. (a) 0.3729 (b) 0.9501 (c) 0.1039 (b) 0.250
(d) 0.929 | 11. (a) 2.04x1607°
6. 4, 3, 27; 6.75, 2.25, 0.8413 (b) 0.004 34; x = 73
7. 20,16, 0.004 36
Be 2000! Wye e/g aa
85 =CXi—p).
a p.wp, np lerp) * (2000—N)!N! \30) \30 ,
(a) 0.2304 86 lines; 2 X 47 lines > 86; No
(b) 0.922 24; 0.8531 (incl.), 0.7946 13. (a) (i) 0.0525 (ii) 0.358 75
(not incl.) (b) (i) 0.143 (ii) 0.145
9. (a) (i) 0.0432 (ii) 0.1845 14. (a) 0.315 (b) 0.5644
(iii) 0.7723 (b) at least 9 15. (a) 0.199 (b) 0.353
10. (a) 61.7 (b) 0.075 (c) 163.5 (c)e °>—e~
(d) 134.3 (e) 702.2 (d) 83e°(1—e °); 0.047 31, 0.007 066;
11. (a) 0.0566 (b) 0.2171 (c) 0.4708 0.870
(d) 0.1432 16. 0.859 (c) 0.204 (d) 0.034
12. (a) 285 (b) 43 17. 0.043
13. 0.6886 18. (a) (i) 0.5987 (ii) 0.149
(b) (i) 0.0294 (ii) 0.751
Exercise 6i (page 365) (c) (i) 0.5987. (ii) 0.9772
19. (a) 0.215
1. (a) 0.6201 (b) 0.39 20. 0.360, 0.734
(c) 0.5406 21. (a) 0.927 (b) 0.0102; 0.297
2. (a) 0.3998 (b) 0.2004
(c) 0.3361 (d) 0.0637
3. (a) 0.313 (b) 0.5078
(c) 0.8335 (d) 0.1101
4. (a) 0.2614 (b) 0.2343 CHAPTER 7
(c) 0.0558
5. 0.8901 Exercise 7a (page 377)
6. 0.6887; 4
7. (a) 0.4574 (b) 0.173 (c) 0.8312
1. (a) 0.6554 (b) 0.7698
8. (a) 0.4594 (b) 0.53638 (c) 0.3446 (d) 0.8301
9. (a) (i) 0.999767 (ii) 0.000177
2. (a) 0.0359 (b) 0.269 64
(iii) 0.924 41 (b) 0.009 44 (c) 0.6554 (d) 0.2743
10. 86 (e) 0.9918
(a) 0.001 35 (b) 0.0228
Exercise 6j (page 369) (c) 0.0913
(a) 0.9044 (b) 0.9522-
1. 0.5455 (a) 0.5462 (b) 0.38983 (c) 0.6826

5. (i) 6.68% (ii) 6.1, √0.13 12. (a) 0.106 (b) 438.2 ml
(iii) 4.81% (iv) £74 (c) 0.800 (d) 0.961
6. (a) 0.0478 (b) 0.0668 (e) 0.244 (f) 388.6 ml
(d) 0.9324 13. N(μ1 + μ2 - μ3, 3σ²)
7. 0.12, 0.0583, 1.98% (a) 0.1657 (b) 108p
(c) 0.4148
14. (a) 0.0139 (b) 0.1562
Exercise 7b (page 382) (c) 0.9332
1. (a) 0.0228 (b) 0.8621 15. (a) 0.159 (c) 0.584
(c) 0.9638
0.6915
(a) 0.1728 (b) 0.6127
(c) 0.5 Exercise 7e (page 398)
0.0561 Ae (a) (i) 0.5, 0.45 (ii) 1.5, 1.05
ee (a) 0.0289 (b) 0.0200 (iii) 0.6, 9.24
(c) 0.6252
2.
0.5402
(a) 0.1247 (b) 0.6957
0.1103, 0.753
ae 0.9043
10. 0.0651
11. 0.2575 4.75, 8.1875; 4.75, 4.09 (3S.F.)
12. 967020225 (a) 0.0177 5, 7.5; Mean | 2.5 4 4.5 5.5 6 7.5.
(b) 0.2218 f 2 282e @ieray
13. (a) (94.4, 105.6) (b) 92.55%
(c) 22.14% 5; 2.5
14. (a) 0.0787 (b) 3.019 x10°° 0.84, 1.68
15. (a) 0.6298 (b) 0.1056 (a) 24.5 (b) 2.57; 2.35, 7,6

Exercise 7c (page 389)


Exercise 7f (page 401)
1. (a) 0.0745 (b) 0.9736
(c) 0.9386 (d) 0.0271 (a) 0.0401 (b) 0.8891
2. (a) 0.8131 (b) 0.0478 0.3206
(c) 0.1078 (d) 0.0306 (a) 0.0668 (b) 0.9893
(e) 0.99553 (f) 0.2762 (c) 0.1974
(a) 6,2 (b) 0.2074 0.0228
(c) 0.7601 (d) 0.5143 (a) 0.0401 (b) 0.7571
0.8681 (c) 0.2660
(a) 0.990 39 (b) 0.9772 (a) 12 (b) 25
(c) 0.7373 (a) 0.2399 (b) 0.0787
(a) 0.1587 (b) 0.0127 (c) 0.0127; n > 108
(a) 0.244 (b) 0.659 62
(c) 0.409 (a) N(10, 3.2) (b) N(50, 3.2)
(c) N(—10, 3.2) (d) N(210, 48)
(e) N(80, 27.2)
Exercise 7d (page 392) 0.009 61
44
Lt (a) 0.60 (b) 0.20 (b) 0, /20
(d) 0.5 (a) 2u, re
(a) 0.051 (b) 0.001 55 (c) 0.9782 (c) be 0.7078, 0.9213
1000, 172.4, 3000, 298.6, 0.16, 0.02
(a) 0.0888 (b) 0.6611 0.0968, 0.0828, 0.000 907, 0.2295
0.0625, 0.2574, 0.5, 0.7123, 7 N(960, 21.2)
+a2b2, a1 25)? + a9”0”), 0.84
Y ~ N(aipi 0.0983
IO
wy
OR (a) 0.8413 (b) 0.5 (a) 0.1457
(c) 0.4207; 0.9938 (b) Distribution of v8 different,
12kg, 57.0 g, 3.97%, 765g prob. < 105"
© (a) 0.3446 (c) 0.1210
0.332, 0.0587, 0.009
(b) 0.6915; 0.003 29, 0.304 17.
10. 0.9192, 08 13, 0.999 912, 0810, No 18. (a) (i) 0.7881 (ii) 0.673
11. 0.8603, 0.1574, 0.3909 (b) 0.0749 (c) 0.0548

Exercise 7g (page 405) Exercise 8b (page 427)


1.. 158081200. 48.875, 6.98 (2d.p.)
2. 3.21, 0.265 (3S.F.), 0.001 44 51.5, 241.1
3. (a) 0.034 (b) 0.8194 1.69 (2d.p.), 8x10 (1S.F.)
4. (a) 3.85 (b) 62.34 15, 43.14 (2d.p.)
(c) 1.7 15, 43.14 (2d.p.)
5. (a) 0.9145 (b) 0.7081 10, 3.11 (2d.p.)
(c) 0.6226 9.71 (2 d.p.), 621.12 (2d.p.)
6. 50 57.78 (2 d.p.), 6496.15 (2 d.p.)
7. 60 46.9, 242.46 (2d.p.)
8. 35 10. 10.96, 17.35 (2d.p.)
9. 42 11. 22.79 (2d.p.), 1.81 (2d.p.)
10. 45 12. , 236N7-68
11. 20500, 1768 13° (875614
12. 0.9212
13. (b) 8.86, 7.82 (c) 0.331 Exercise 8c (page 432)
14. 0.25, 0.0228
Answers are given to 3 S.F. where applicable

Exercise 7h (page 409)


1. 0.663, 0.002 21
9.88, 0.796
1. (a) 0.0745 (b) 0.003 67 3.69, 1.33
2. (a) 0.005 68 (b) 0.527 02 9.19, 10.0
(c) 0.0958 2.27, 10.2
3. (a) 0.000215 (b) 0.5229 5.46, 0.0481
(c) 0.0367 30.15, 11.9
4. (a) 0.3085 (b) 0.0970 2.39, 0.0275
6. comin 0.348
6. (a) 0.0648 (b) 0.0851 thee
Oe 0.838
(c) 0.3068 11. - 25
7. 0.22 12. 0.307
Exercise 7i (page 417) Exercise 8d (page 439)
Some answers depend on the random numbers 1. (a) (139.494, 140.506)
used and on the method of allocation. These (b) (139.399, 140.601)
are possible answers. 2. (a) (74.026, 77.974)
10. (a)iie 0n3 (b) 4 (b) (73.396, 78.604)
11. 33.134, 34.193, 28.712 (c) (72.91, 79.09)
12. (a) 3,5 (b) 1,5 3. (a) (747.516, 748.484)
(c) 1007.2, 1016.8 (b) (747.424, 748.576)
13. 1.52,1.48 (c) (747.316, 748.684)
14. 3.3, 1.41 4. (a) (79.209, 84.791)
17. (a)3 (b) 6.1826 (b) (78.91, 85.09)
18. (a) 5.36, 5.53 5. (a) (68.123, 69.877)
(b) (67.848, 70.152)
(1011, 1114)
(10.821, 14.079)
8. 10.82, 1.70 (3S.F.), 11.19, 0.646
(3S.F.), 10.968, (10.821, 11.115)
9. 85.2, 15.45 (2 d.p.), 85.01, 2.01,
CHAPTER 8 85.08 (2 d.p.), (84.628, 85.540)
10. 25.3, 3.68, (24.85, 25.75)
Exercise 8a (page 424) 11. 91.32 (2d.p.), 7.42 (2d.p.),
1. a,c,e (90.5, 92.2)
2. e 12. 194, 176.41 (2d.p.), (173.48, 214.52)
LYS, 13. (9,71 s(A72.3, 17am
4. =3° 3° i
smaller variance 14. (b) 38.1, 1080.39 (c) 38.1+4.56
(d) 28.5
5. 65 15. (b)3,0.471 (c) 3
6. a=0.24, b= 0.28; 8.8, Unbiased, (d) 0.940.173
Minimum variance 16. 3.71

Exercise 8e (page 444) Mesh size Otol Sl to? ee 2itos

1.(— 3.707, 3.707) Additional


2.(a) (—1.943, 1.943) diamonds - 4

(b) (— 2.447, 2.447) Mesh size >4to6 >6to8 >8& to12


(c) (— 3.143, 3.143) Additional 2 6
6
(d) (— 4.317, 4.317) diamonds
(a) (— 3.499, 3.499) 6.30, 9.93
(b) (— 3.055, 3.055) 3. 14.01, 0.04, (13.92, 14.10); 0.40
(c) (—2.947, 2.947) 4. (0.123, 0.392), (170.84, 178.16),
(d) (— 2.921, 2.921) (165.57, 186.83), (£488, £531)
0.945
0.045
0.075
1.86
90
oT
I 2.179
CHAPTER 9

Exercise 8f (page 447) Exercise 9a (page 464)

1. (a) (177.74, 181.59) 1. (i)z=2 (a) No (b) Yes


(b) (177.21, 182.12) (ii) z= —1.5 (a) Yes (b) Yes
(c) (175.82, 183.52) (iii) z = 2.12 (a) No (b) Yes
(3.77, 4.51) (iv) z=—2.475 (a) No (b) Yes
wn (a) (0.285, 0.335)
(v) z= 3.645 (a) No (b) No
(b) (0.275, 0.345) (vi)z=—1.826 (a) No (b) Yes
(14.98, 15.78)
(a) (8.08, 9.12) (b) (8.01, 9.19) In questions 2-5 the continuity correction
(4.70, 5.56) has been omitted.
IO(32.08, 33.22) 2. (a) z = 1.5, Fair (b) z = 2.5, Biased
3. z = -1.826, Reject claim
4. z=1.746, Accept
Exercise 8g (page 454) 5. (a) (i) 0.0298 (ii) 0.0934
(b) z =—1.897, Yes, less than 75%
1. (a) (0.323, 0.517)
(b) (0.696, 0.904)
(c) (0.222, 0.418)
(d) (0.529, 0.791)
Exercise 9b (page 471)
(e) (0.146, 0.254) 1. (a) z=—1.095, Accept Ho
(f) (0.693, 0.847) (b) z = 1.845, Reject
(g) (0.469, 0.531) (c) z = 2.5, Reject
(0.622, 0.738) (d) z = —2.778, Reject
(a) (0.293, 0.427) 2. z= —1.565, No
(b) (0.273, 0.447) 3. z = 1.909 (a) Yes (b) No
(0.156, 0.344) (c) No
(0.510, 0.574); Yes 4. X<91.51 min
267.2 (1d.p.), 227.9 (1 d.p.), 5. (a) 0.683
(0.256, 0.410) (b) 2.9216 <* < 3.0784
(a) 3, (2.04, 3.96) 6. 0.1101, 0.001 58, Reduced
(b) 30%, (25.2, 34.8) 7. z=2.487, Yes; 1506.81+0.311
8. (a) a > 3.59% (b) a > 7.18%
MN (0.002 41, 0.007 59), 9. Approx. 83, (0.823, 0.845), No
no 10. My—Me, O17 +097; Agua
tAoMe,
(13175, 41493) Nor ” Naor
10 000, (7236, 16 181)
Bin(n0, nO(1—8@)), n large; ny nz
(0.826, 0.945) z= 1.853, Yes at the 5% level
11. (a) 10.6%,6.7% (b) £98.80
(c) z = -2.5, One tailed, Yes
Exercise 8h (page 456) 12. 11.2, 2.54, Reject at the 5% level
13. 0.0341, 0.069, Do not reject
Me (a) (92.32, 99.68)
(b) (0.351, 0.369), 5277
14. (b) (i) (£173, £193)
(ii) 279 (iii) Yes
2. 6.6mm, 3.5mm

Exercise 9c (page 476) 10. |z| = 2.036, S at 5%, NS at 4%


11. 8.0067, 0.000175, z = 2.00, S, Reject
1. (a) z = 1.792, Accept
population mean is 8.00, z = 3.52, S,
(b) z = 1.792, Reject
Second population has smaller mean
(c) z = -1.437, Reject than first
(d) z = -2.5, Accept
12. |t| = 6.496, Yes; t = 2.041, Yes
z = 0.995, Accept
13. 4.41, (9.87, 10.73), 3.61, z = 1.49, NS
z = 2.00, Yes; 16.2+1.232
14. (a) 1.65, 0.0025; 1.55, 0.003 75
Justified; (8.19, 8.53) (b) 0.1038 (c) z = 2.911, Reject
(b) Reject Ho (c) (77.50, 79.96)
15. (2.602, 3.118), 0.567, z = -2.219, S,
(45.6, 49.4), a> 4, 0.0321
Flowers on sunny side grow taller

Exercise 9d (page 480)


Exercise 9f (page 495)
1. (a) t = 2.622, Reject
(b) t= —1.892, Accept Continuity corrections have been omitted.
(c) t= 2.152, Reject 1. (a) z=1.768,S, Reject
(d) t = —3.078, Reject (b) z = 2.335, S, Reject
t = -3.601, Not in good working order
t = 2.828, Yes (d) z = 2.179, NS, Accept
Awb (a) t = -3.54, Yes (e) z = -3.060, S, Reject
(b) z = -3.2, Yes z = -2.04, No
t=1.1, No z= 1.667, Yes
0.1056; z=—2.4, Yes
(a) z = -1.660, Do not reject z = 2.46, Yes, (0.340, 0.400)
(b) 0.3824; t = —2.38, Reject (a) 0.87 (b) 0.19, z = —1.49, Yes
(b) Ho rejected at better than 2% significance level, population mean unlikely to be 3.1
91.3, 13.5; 0.1571; 0.003 (without continuity correction), z = -2.75, Yes
(a) 0.0985, 0.0666
(c) (1.81, 2.76) (b) z = —0.0223, Do not reject
(Co) 2x2 Oe LO moma
9. (0.32738, 0.3867), 19, z = -2.775
Exercise 9e (page 491) Reject Ho, Yes
10. z=—1.143, Do not reject
1. (a) z = -2.096, S
(b) z = -1.402, NS
(c) z = 2.493, S
(d) z = 1.99, NS Exercise 9g (page 501)
(e) z = 2.076, S 1. (a) z = -1.245, NS, Accept
(f) z= (b) z = 2.941, S, Reject
(g) z = 1.783, S (c) z = -0.568, NS, Accept
(h) z = 1.779, S (d) z = 1.373, NS, Accept
(i) z = -2.321, (only just) NS z = -1.247, NS
(j) z = 2.55, S |z| = 1.85, NS
(k) t = 2.135, S z = 2.04, S, z = -0.45, NS (5%)
(l) t = -0.567, NS
(m) t = 2.088, NS
oapw (0.616, 0.824), 738, z = 2.25, S,
(n) t = 1.260, NS Difference in proportions
z= 2.33, Yes 6. (a)% (b)35 (ec) 0.440.048
(a) |z| = 2.385, Yes (d) z = — 5.657, Yes
(b) z = 2.946, Yes 7. (a) z1 = 2.178, Reject; z2 = 0.594,
z= 1.627, Accept; least n = 124 Accept (b) z = 0.966, NS
(b) 15+3.4 (c) 13.625 8. (a) z=1.317, NS
(d) t= — 1.42, Accept claim (b) (0.476, 0.524)
z= 2.423,8 9. z= 0.807, Proportions same,
(26.77, 27.89); 2.4, z = 1.97, NS, (0.569, 0.711)
Those of high intelligence do not have 10. (0.270, 0.355), 0.25,z = 2.20, Yes,
greater foot length Greater among those not wearing seat
(3.175, 3.335), Yes, n large, belts :
2 — @o.246, Yes 11. z=~—1.247, Accept claim
z = -1.646, Just S, Reject Mr Brown’s z = -1.521, No at 5% level
claim 12. (a) 0.557<p< 0.754 (b) No
Exercise 9h (page 504) 3. Accept as slow if ‘bounce’ < 12.5,
1. 0.0345, 0.1174 Accept as slow if
‘mean bounce’ < 11.64; 0.0004
2. σ̂² = 34.583, |t| = 0.798, NS 4. 1.9, 0.837
3. (204.1, 223.9), 196, |z| = 1.714, NS
5. 0.515, 0.876, 0.4455, 43
4 (24.59, 25.41), |z| = 2.15, Accept 6. 0.106, 0.02
Mr Jones’ claim
om (a) 0.833 (b) 0.180 7. (a) Hop =% Ai: p#S
(c) Ho: p = 0.5, H1: p ≠ 0.5; z = 2.236, (b) z = -1.9595..., Just NS at 5%
Biased, (0.560, 0.940), p > 0.5 level, No (c) z = 1.8, NS, no
(a) z=—1.4, NS, no 8. 0.000 32, 0.006 72, 0.057 92; 4
(b) |z| = 1.405, NS 9. See question 1
(a) t = 2.52, evidence that means are 10. (a) 0.0668
different (b) 0.0446; If X > 58 accept Hj, If
(b) z = 1.80, No evidence that means X < 58 accept Ho; 0.380
are different. Assumption or = oy 11. Accept Ho if X > 0.817; Accept Ho if
= 196.0 is clearly suspect; a value X > — 0.255
of 100 would be more consistent 12. 3<r<9;0.182
with the data 13. (a) 0.000577
X¥1—X2+0.077 25, 0.7174, Same (b) 0.007 38; 0.8, 0.8, 0.9
(a) z = 2.667, Reject
(b) z = — 1.667, Accept
0.567, 0.1156, 0.1587, 0.1587;
12.6 pence; z = 1.4, Not sufficient
evidence, z = 1.838, Sufficient
evidence
10. (a) 0.8931 (b) 0.8859 CHAPTER 10
(c) 0.7912 (d) 0.0947; NOTE: There will be variation in answers,
d= 229.41, z = — 2.405, Yes, Not depending on the degree of
correctly set approximation used at various stages
11. 861, 5441 +4.88, |z| = 2.4,S, Yes in the working.
12. 1/1200, z = — 0.949, NS, No

Exercise 10a (page 540)


Exercise 9i (page 513) v Decision
1. (a) N.S. (b) S. (c) S. 3 accept fair
(d) N.S. (e) N.S. (f) S. 9 accept
(g) S, (h) N.S. 6.19 2 reject Ho, yes
N.S. ee
e
coat 4.95 3 no
No 9.90 3 yes
N.S. (She could have been guessing) 8.24 7 accept
4.15 4 yes
ae 10.68 4 NS

Exercise 9j (page 518)


A (a) N.S. (b) N.S. (c) N.S. Exercise 10b (page 547)
(d) S. (e) N.S. (f) S.
1. X ~ Bin(5, 0.3), Ei = 17, 36, 31, 13, 3, 0 (combine last three classes), ν = 3, χ²calc = 4.49, accept
S.
N.S.
(a) N.S. (b) S.
X ~ Bin(3, 0.4), Ei = 39, 78, 52, 11, ν = 2, χ²calc = 26.9, No
N.S.
0.0057, 9 mins, N.S.
No
(a) X ~ Bin(4, 0.53), Ei = 5, 22, 37, 28, 8, ν = 3, χ²calc = 1.28, Yes
(b) X ~ Bin(6, 0.3), Ei = 17, 42, 45, 26, 10 (with last three classes combined), ν = 3, χ²calc = 11.8, No
Exercise 9k (page 529)
1. (a) 0.125, 0.125
(b) 0.2099, 0.2702, Test 2
2.
14
39
X ~ Bin(2, 1/6), Ei = 150, 60, 6, ν = 2, χ²calc = 9.6, Reject; Use x̄ = 0.444, p = 0.222, Find Ei, ν = 1

X ~ Bin(5, 1/6), Ei = 80.5, 80.5, 32, 7 (with last three classes combined), ν = 3, χ²calc = 8.21, yes, biased; x̄ = 1, p = 0.2, X ~ Bin(5, 0.2), Ei = 66, 82, 41, 11 (with last three classes combined), ν = 2, χ²calc is very small, too good a fit, query data

Exercise 10d (page 553)

NOTE: Minor adjustments need to be made when approximating, so that totals agree.

Ei Conclusion
1. (a) 21, 16.5, Independent
np, 1.6) 0.32, £;= 7.8; 17.1; 16:15)7.5, 12555) 24),
1.8, 0.2 (combine last three classes), 16.5, 12.5
V = 2, Xcalc = 1.79, Good fit (b) 17.3, 35.0 Not
38.2, 11.5 independent
(a) X = 1.2, Ei = 99, 119, 72, 29, 9, 2 14.9, 30.3,
(combine last two classes), 32.9, 9.9,
(b) χ²calc = 0.48, ν = 3, Very good fit 33.8, 68.7,
E; = 58, 55, 29, 10, 3 (combine last 74.9, 22.6
(c) 18.6, 10.4, Independent
two classes), Vv= 2, er 15-0
ky 28.9, 16.1,
Good fit 18.910;
X = 2.5, E; = 8, 21, 26, 21,13,11 24.5, 13.5
(combining end classes), v = 4, (d) 11.8, 24.5, Independent
Xie = 2.59, Good 33.7, 13.5,
xX = 1.28 (2d.p.), H; = 41, 52, 34, 14,6 28, 38.5,
10.
20.3, 42,
(combining end classes), Vv = 3,
Dialao-4,
X72 = 681, NS 17.5, 24-1
ate (a) X = 0.9, EH; = 183,165, 74, 22,6 2. 40, 60, 100,
(combining end classes) 140, 210,
350, 20, 30,
(b) v = 8, X-sale = 1.62, Adequate 50
12. (a) E; = 3,138, 28, 32,18, 6 (combine
Reject
8. 40, 60, 100,
first two classes), v = 4, 140, 210, hypothesis
Xehie = 11.9, S, Reject normal 350. 2055005
(b)X = 171.54, s= 7.11 (2d.p.),
Accept
E; = 6, 18, 32, 28,13, 3 (combine
hypothesis
last two classes), v = 2,
X-calc = 1.73, Accept normal, 4. (645); 25, Performance
Good fit 145, 30, in both
13. Xx = 25.9,5 =11.8 (1 d.p.), E; = 4,7, 87.5, 507.5, sports not
13,18, 20,17, 12,6, 3,1 (combine first 105, 12.5, independent
two classes and last three classes), 72.5, 15
v= 4, Xone = 0.95, Very good fit 5. 50.1, 29.5,
23.4, 22.9,
13.5, 10.6
Gawd ieds 4eS:
Exercise 10c (page 550) 8.6, 15.7,
18.3, 22.9,
1. E; = 17.5, 82.5, 17.5, 82.5, v=1, Loa 725.0,
Xeon = 0.58, no 20.6, 25.7,
E, = 27.5, 972.5, 27.5, 972.5,v=1, 15.4, 28.3,
ZOE OM ck
Xrcale= 4.79, yes 22.3, 40.9
(a) E;= 24.225, 60.775, 32.775,
Sorte OAy = al gr bo MLO.
Independent, query whether Exercise 10e (page 556)
agreement too close 1. £;=66.7, 33.3, 53.3, 26.7,v=1,
(b) £; = 42.3, 47.7, 51.7, 58.3, v= 1, X cate = 0.01, No
cae = 7.79, Not independent 2. A: EH; = 12, 24, 36, 48, 60, 72,60, 48,
(c) Ej= 37.5, 22.5, 87.5,52.5, v= 1, 36; 24922, P= 10,X a 44.5,.Not
Xcalc = 2.54, Independent biased
(d) #;= 11.5,13.5, 43.5, 51.5, v=1, B: E; asin A, Xe = 2.12, Unbiased,
Nga = 3.18, Independent but query whether data have been
E; = 34.2, 29.8,12.8,11.2, v= 1, fiddled :
Xai= hed NS 3. x=0.9, H; = 21,18,8, 2,1 (combine
E; = 90.405, 56.595, 35.595, 20.405, last three classes), v = 1, NOs. = 1.80,
V=1, X calc = 13.3, Related Yes, Consistent

ae v= 2, NCL = 1.15, Yes, Normal Exercise 11c (page 574):


(a) X = 2, p= 0.4, E; = 6, 21, 28,18,
6, 1 (combine last two classes),
1. (a) (i) —1.96
(ii) y = —0.297x + 6.125;
(b) X*cale = 2.21, v = 3, Yes, Binomial x =—0.757y + 8.51
adequate
(iii) 20.083 (iv) 51.170
Ea — 2180, 1020, 7-0, 15:5, 7.5.5, 41.5 (b) (i) 4.18
19.5, 14, v= 4, X*cac = 7.86, Accept (ii)y = 1.709x — 28.76;
hypothesis = 0.348y +18.92
E; = 48,52,96,104,96,104, v= 2, (iii) 34.129 (iv) 6.944
A coal = 6.57, Yes, Proportions different, (c) (i) 18.5
1:44%, 2:42%, 3:55%, Beach 3 (ii) y = 0.75x+ 2.02;
contributed to the high value of Miia x =1.19y —1.37
(a) \ = 0.74; combine 2 5, (iii) 9.924 (iv) 15.601
E; = 667.96, 494. 29, 182. 89, 45.11, (d) (i) —3.92
8.35, iLZA v= A, Xue = 13.8, (ii) y = —0.524x« + 70.58;
Not aden unis «=\—1.37y + 100.8
(c) v= 3, Meare = 13.8, not consistent (iii) 6.434 (iv) 16.84
y+1.45x = 114.4;
Noe 13.77, v = 6, Reject uniform (a) No, x is controlled
distribution (b)y = 61 (2S.F.)
10. (a) Combine first 3 classes, v = 2 (b) y = 0.0207x + 0.614
E; = 15.24, 25.76, 27.37, 11.63, F = 0.901—6.33, F = 20.8
Xralc = 0.144; NS., Yes Age 815, sf = 175.
(b) Combine > 4, v = 3, X2a1¢ = 0.404 A =6.3+0.49T, 16.1, 0.49
E;= 25.85, 41.75, 33.72, 18.16, (b) ¥.= 12715471 A7%
10.52,N.S. Cy C= bribe 35,
(c) Both values of Xoere are very small Y = 3.976+11.28C, 5.22
gla (a) E;= 6.081, 17.026, 23.837 ,22.248, T= 500m =1n6)
15.573, 8.721, 6.514 E=—2.4+0.008T, 1.04
Vv = 6 (totals agree),
Xzalc = 1.39, Yes Exercise 11d (page 587)
(b) X2a1c = 24.6, v= 4, Yes
1. (a) -0.47, Some negative correlation
12. P= 10 eae 16 (b) 0.77, High positive
No association at 5% level
Y= 6, XGate 43.2 (c) 0.95, Very high positive
Reject hypothesis (d) — 0.84, High negative
Ona
0.82;y = 0.48x + 22.36,
CHAPTER 11 x = 1.42y— 24.3
Exercise 11a (page 564) — 0.415
0.616, 14.57
NOTE: Answers may vary, depending on the
final approximation used and the
Exercise 11e (page 590)
number of figures retained in the
calculator during working. 1. (a) 227 (b) 0.895
(c) y = 0.028x — 2.37,
1. (a) Positive correlation
x = 28.8y + 406.6
(x, ¥) = (15, 14.1)
(a) 0.088 (b)—0.359
(b) Negative correlation, (c) y =—0.63x + 677.0;
(X, ¥) = (8.2, 75.7) x =—0.204y + 992.8
(c) No correlation, (x, ¥) = (2.95, 8.25) (a) 7.1875 x10° (b) 0.51
2. yi 10 Sxt 93,34 (c) y = 2233x+ 96.9;
3. (a) Strong positive (b) (25, 592) x = 0.000 116y— 0.0102
(d) (i) 130 (ii) 18.5
4. ABS NOPOi; y = 0.8125« + 0.376; 2.4 Exercise 11f (page 594)
Exercise 11b (page 567) 0.26
0.75
1. x = 0.769y - 4.34 (a) 0.3, 0.5, 0.7
2. (i) (a) y = 0.64x + 4.50 (b) Mrs Brown and John
(b) x = 0.75y + 4.42 1) Headrests 2) Heated rear window
(ii) (a) y = 0.438x + 7.57 3) Anti-rust treatment
(b) x = 1.04y + 6.18

4. 0.86 y = 0.53 8x x =1.01y + 94.4;


— 25.4,
5. (a) —0.52 (b) 0.82 (c) 0.9 0.73, 53.7
(d) —0.9; Quite good agreement (a) Straight line, negative slope
6. Assume second judge gives no tied ranks (b) Monotonically increasing curve
7. —0.086, Very little negative correlation (c) —0.92 (d) —90.9
8. 0.60, Same (a) y = 1.33x + 5.7 (b) 28 330
9. (b) (38.375, 2.275) (c) —0.84 y = 0.65 x 36.5
— 2.59,
(d) High negative correlation (a) 0.6 (by x= 1:238y— 1:17
10. 0.84,E y = 0.238x-- 11.5
11. (a) 0.75, High positive correlation Y = 0.48X+ 56.3; 0.995
12. (b) 0.84, High positive correlation (a) (ii) (40, 414) (iv) 7, 185; 275
(b) (ii) Near +1
Exercise 11g (page 602) (a)*=07237 (b) — 0.26
n (c) Very little linear negative correlation
{ N.S. Se RS Pee 7, 95 0.91, High positive

ee
ey
S. oe 5, Gs 8, 0.3, 0.6
Exercise 11h (page 604) (a) (X =—1)=P(X=1)=6;
P(X =—0.5) = P(X = 0.5) = 3;
Xe NG (onde TIS: (b) 0.5
adie S. 1 eo emo;
(b) y = 1.038x + 0.53
r= 0.825, rg = 0.929, S at 1% level
Exercise 11i (page 604)
(a) r, = 0.511, rg = 0.660
1. Alay (b) —1 (c) 0.028 (b) r, significant at 5% level
(d) 0.886 (e)1 (f) —1 rg significant at 5% level
(g) 0.479 (h)—0.927 18. rg = —0.3341, rp, = —0.2545,
2. (a) 0.952(S.) (b) 0.6(N.S.) N.S. 5%;
rs = — 0.6939, Tyo O50 118
Exercise 11j (page 612) S. 25%
19. 0.48, 0.36
NS2.14 2) ap. sbere69e7 20. (a) rg = 0.527, S.
gos 4,
(b) = 1°57 9m eS aes
8. (a)1 (bj =i (c) 0.067 21. (a) (i) —0.976 (ii) —0.292
(d) 0.733 (e)1 (f) -1
22. rg = 0.845, rp, = 0.71 or 0.76
(g) 0.878 (h)—0.822 Both highly significant indicating
9. (a) 0.857 (S.) (b) 0.467 (N.S.)
an association.
10. Judges1,2; r,=0, NS.
Judges 1, 3; r, = 0.714, S. 1% level
23. (d) 0.962
24. (b) y = 4.120 + 22.3
Judges 2, 3; r, = —0.143, N.S.
(c) (i) (10.5, 91)
11. 0.429, N.S.
(iii) (a) Approx 64cm
(b) Approx 196 cm
Exercise 11k (page 623)
25. (b) y = —5.2x + 55.5 (c) 98.3p
1. Y=0.56X+ 2.9; 0.79 26. y = 0.0157x+ 0.65,
2. 0.60, W=0.89h—76 3days 19 hours, 0.9419
INDEX
Acceptance region 359 Consistent estimator 423
Addition law Contingency tables, 2 × 2 548
Alternative hypothesis H1 458 h × k 551
Appendix 1 629 Continuity correction 358
Appendix 2 641 Continuous data
Approximations, Poisson to binomial 248 random variable 272
normal to binomial 355 Correlation,
normal to Poisson 362 coefficient, product-moment 576
Arithmetic mean 37 Kendall’s rank 605
Arrangements 124 Spearman’s rank 591
Covariance 567
Bayes’ theorem 115 Critical region and values 459
Best estimator 425 Critical values,
Binomial distribution, Kendall’s coefficient 605
cumulative probability tables table 639
(use of) 219 Spearman’s coefficient 603
diagrammatic representation 217 table 640
expectation and variance 214 Cumulative distribution function,
fitting a distribution 225 continuous r.v. 287
goodness of fit test (χ²) 540 discrete r.v. 189
normal approximation to 355 Cumulative frequency 17
Poisson approximation to 248 Cumulative probability tables,
recurrence formula 221 binomial 630
situation 209 Poisson 632
tests 506 use of 219 >
251

Calculator, use of 51, 61 444


Degrees of freedom
Central limit theorem 403 535
for χ² test
Chi-squared distribution (χ²) 533 554
summary of
tables 637 343
De-standardising
tests, binomial 540 46
Deviation from mean
contingency tables 548
Difference between normal r.v.
normal 545 374
(distribution of)
Poisson 543
Difference between means
ratio (in a given) 539 481
(significance test)
uniform distribution 536 497
proportions (significance test)
Circular diagram el)
Discrete data
Class boundaries random variable 167
Class width Dispersion, measures of 46
Classical probability eo Distribution, frequency
Coding, for calculating mean 43 167
probability
product moment correlation of sample mean 395
coefficient 588 of sample proportion 406
standard deviation 63
Coefficients of rank correlation,
605 Errors, type I and type II 518
Kendall’s
591 Estimation of population parameters 420
Spearman’s
131 Estimator, consistent 423
Combinations
most efficient of mean and variance 425
Conditional probability
433 pooled, of mean and variance 429
Confidence interval
434 pooled, proportion 431
mean, σ² known
437 proportion 428
mean, σ² unknown, large sample
444 unbiased 420
mean, σ² unknown, small sample
448 Exhaustive events 89
proportion


Expectation, binomial distribution 214 Normal equations (regression lines) 565


continuous random variable 275 Null hypothesis Ho 458
discrete random variable LZ.
exponential distribution 308 Ogive 18
geometric distribution 236 One tailed tests 359
normal distribution 317
Poisson distribution 243 Percentile range (10-90) 20
rectangular distribution 304 Percentiles 29
Exponential distribution 307 Permutations 130
expectation and variance 308 Pie diagram 11
link with Poisson 314 Point estimation 420
Poisson, approximation to binomial 248
Factorial notation (!) 124 cumulative probability tables 632
Frequency distribution 1 diagrammatic representation 252
polygon 15 distribution (χ² test) 543
expectation and variance 243
Geometric distribution 235 fitting a distribution 255
expectation and variance 236 mode 253
Geometric progression, use of in normal approximation to 362
probability 2 recurrence formula 254
Goodness of fit tests 540 Pooled estimator of mean and variance 429
proportion 431
Histogram 5 Positive correlation 560
Hypotheses, alternative, null 458 Probability,
arrangements, permutations and
Impossible event 79 combinations
Independent events 94,105 classical definition
Interquartile range 29 conditional events
Interval estimation 433 continuous random variables
density function, continuous
Kendall’s rank correlation coefficient 605 discrete
significance of 607 from cumulative distribution
for normal variable
Least squares regression lines 564 distribution
Level of significance 459 exhaustive events
Linear correlation 559 independent events
Location, measures of 34 mutually exclusive events 90
trees
Mean, arithmetic 37 Product-moment correlation coefficient 576
confidence interval 433, 437, 444 Proportion, confidence interval
deviation from the mean 46 difference between significance test
difference between, significance test 481 distribution of sample
distribution of sample 399 estimation
most efficient estimators 425 significance tests
pooled estimator from two samples 429
significance test 464,474, 477 Quartiles 29
Measures of dispersion 46
location 34 Random numbers (use of) 412
Median 22, 288 table 629
Mode 34, 286 Random sampling 410
Multiples of a normal r.v. 383 frequency distribution 412
Multiplication law 95 probability distribution 413
Mutually exclusive events 86,104 Random variables, continuous 272
difference between 374
Negative correlation 560 discrete
Normal approximation to binomial 355 multiples of 383
to Poisson 362 sums of 374
Normal distribution 317,331 Range 46
χ² test 545 Rank correlation, Kendall’s coefficient 605
expectation and variance 317 Spearman’s coefficient 591
tables 634, 635 Rectangular distribution 302
tables, use of 334 expectation and variance 304

Recurrence formula, binomial 221 t-Distribution 441


Poisson 254 use of tables 442
Regression, coefficient of 568 tables 636
function 559 Tabulation of data
least squares lines 564 Test statistic 458
lines (by eye) 560 difference between means 481
Residuals 565 difference between proportions 497
minimum sum of squares of 569 mean 462, 474, 477
proportion 492
Sampling distribution, summary 503
difference between means 481 Tied ranks 593
difference between proportions 497 Tree diagrams 108
means 399 Triangular distribution 274
proportions 407 Two tailed tests 459
Sample mean 397 Type I and type II errors 518
Scatter diagram 559
Semi-interquartile range
Significance testing, level of 458 Unbiased estimator 420
difference between means 481 Unequal intervals (histogram)
difference between proportions 497 Uniform distribution 302
mean 465 χ² test 536
single value 462 Unit interval, Poisson distribution 246
proportion 492
χ² tests 534 Variance 48
Spearman’s rank correlation coefficient 591 binomial distribution 214
significance of 596 continuous random variable 281
table of critical values 639 discrete random variable 183
Standard deviation, exponential distribution 308
frequency distribution 58 geometric distribution 236
raw data most efficient estimator 425
Standard error of mean 399 normal distribution 317
proportion 407 pooled estimator from two samples 429
Standard normal distribution 332 Poisson distribution 243
tables (use of) 334 rectangular distribution 304
Standard width
Standardising, normal variable 333
Sum of normal random variables 374 Yates’ correction 535
This book has been written with the needs of students and
teachers of the statistics section of A-level Pure Mathematics
with Statistics in mind. It will also be useful for A-level
Statistics and AS Mathematics with Applications.
In its Second Edition, this book remains the most lucid and
carefully prepared textbook in this area. The authors have
provided an extremely supportive text that is fully up-to-date
with the latest syllabuses and the requirements of all the main
examination boards. The scope has been expanded to
include the exponential distribution, the use of random
number tables, testing the mean of a binomial and a Poisson
distribution and further detail on significance testing.
Throughout the book, theory is supported by worked
examples, and further consolidation is provided by carefully
graded exercises.

ISBN 0-7487-0455-8

Stanley Thornes
Old Station Drive
Leckhampton
CHELTENHAM
Glos. GL53 0DN
