Chapter 1
Statistics
Introduction
All natural processes, as well as those devised by humans, are subject to variability. Civil engineers
are aware, for example, that crushing strengths of concrete, soil pressures, strengths of welds, traffic flow,
floods, and pollution loads in streams have wide variations. These may arise on account of natural changes
in properties, differences in interactions between the ingredients of a material, environmental factors, or
other causes. To cope with uncertainty, the engineer must first obtain and investigate a sample of data, such
as a set of flow data or triaxial test results. The sample is used in applying statistics and probability at the
descriptive stage.
For inferential purposes, however, one needs to make decisions regarding the population from which
the sample is drawn. By this we mean the total or aggregate, which, for most physical processes, is the
virtually unlimited universe of all possible measurements. The main interest of the statistician is in the
aggregation; the individual items provide the hints, clues, and evidence.
Having obtained a sample of data, the first step is its presentation. Consider, for example, the modulus
of rupture data for a certain type of timber. The initial problem facing the civil engineer is that such an
array of data by itself does not give a clear idea of the underlying characteristics of the stress values in this
natural type of construction material. To extract the salient features and the particular types of information
one needs, one must summarize the data and present them in some readily comprehensible forms. There
are several methods of presentation and organization of data.
Data Type
When working with statistics, it’s important to recognize the different types of data: numerical
(discrete and continuous), and categorical. Data are the actual pieces of information that you collect through
your study. For example, if you ask five of your friends how many pets they own, they might give you the
following data: 0, 2, 1, 4, 18. (The fifth friend might count each of her/his aquarium fish as a separate pet).
Not all data are numbers; let’s say you also record the gender of each of your friends, getting the following
data: male, male, female, male, female.
Qualitative Data
Qualitative (categorical) data represent characteristics such as a person’s gender, marital status, hometown, or the
types of movies they like. Categorical data can take on numerical values (such as “1” indicating male and
“2” indicating female), but these numbers don’t have mathematical meaning. For example, you couldn’t
meaningfully add them together. Other names for categorical data are qualitative data, or Yes/No data.
Quantitative Data
Quantitative (Numerical) data have a meaning as a measurement, such as a person’s height, weight, IQ,
or blood pressure; or they’re a count, such as the number of stock shares a person owns, how many teeth a
dog has, or how many pages you can read of your favorite book before you fall asleep. Numerical data can
be further broken into two types: discrete and continuous.
Discrete Data
Discrete data represent items that can be counted; they take on possible values that can be listed out.
The list of possible values may be fixed (also called finite); or it may go from 0, 1, 2, on to infinity (making
it countably infinite). For example, the number of heads in 100 coin flips takes on values from 0 through
100 (finite case), but the number of flips needed to get 100 heads takes on values from 100 (the fastest
scenario) on up to infinity (if you never get to that 100th head). Its possible values are listed as 100, 101,
102, 103, ... (representing the countably infinite case).
Continuous Data
Continuous data represent measurements; their possible values cannot be counted and can only be
described using intervals on the real number line. For example, the exact amount of gas purchased at the
pump for cars with 20-gallon tanks would be continuous data from 0 gallons to 20 gallons, represented by
the interval [0, 20], inclusive. You might pump 8.40 gallons, or 8.41, or 8.414863 gallons, or any possible
number from 0 to 20. In this way, continuous data can be thought of as being uncountably infinite. For ease
of record keeping, statisticians usually pick some point in the number to round off. Another example would
be that the lifetime of a C battery can be anywhere from 0 hours to an infinite number of hours (if it lasts
forever), technically, with all possible values in between. Granted, you don’t expect a battery to last more
than a few hundred hours, but no one can put a cap on how long it can go.
A graph is a method of presenting statistical data in visual form. The main purpose of any chart is to
give a quick, easy-to-read-and-interpret pictorial representation of data which is more difficult to obtain
from a table or a complete listing of the data. This session discusses how to do the following:
Example 1 Twenty elementary school children were asked if they live with both parents (B), father
only (F), mother only (M), or someone else (S). The responses of the children are as
follows.
M B B M F S B M F B
B F B M M B B F B M
Solution:
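The relative frequency of a category is its frequency divided by the total number of observations:
\[ \text{Relative frequency} = \frac{f}{\sum f} \]
(Relative frequencies are useful for comparing distributions of different sizes.)

Items   Tally        Frequency   Relative Frequency
M       //// /       6           0.30
F       ////         4           0.20
B       //// ////    9           0.45
S       /            1           0.05

[Figure: bar graph of relative frequency (vertical axis) against the items M, F, B, S (horizontal axis).]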
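The tally in Example 1 can be reproduced with a short script. This is a minimal sketch, not part of the original solution; the function name `frequency_table` is an assumption introduced here for illustration.

```python
from collections import Counter

# Responses from Example 1 (B = both parents, F = father only,
# M = mother only, S = someone else).
responses = "M B B M F S B M F B B F B M M B B F B M".split()

def frequency_table(data):
    """Return {category: (frequency, relative frequency)}."""
    counts = Counter(data)
    total = sum(counts.values())
    return {item: (f, f / total) for item, f in counts.items()}

table = frequency_table(responses)
for item in "MFBS":
    f, rel = table[item]
    print(f"{item}  {f:2d}  {rel:.2f}")
```

The relative frequencies add up to 1, which is a quick check that no response was missed.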
Example 2 The following data give the number of computer courses taken by 30 business majors
who recently graduated from a university.
2 3 2 3 1 4 2 2 3 4
1 2 3 1 1 3 2 2 4 2
1 2 3 1 1 3 2 2 4 1
Solution:
Items   Tally           Frequency   Relative Frequency
1       //// ///        8           0.2667
2       //// //// /     11          0.3667
3       //// //         7           0.2333
4       ////            4           0.1333

[Figure: bar graph of relative frequency (vertical axis) against the items 1, 2, 3, 4 (horizontal axis).]
Example 3 The following data give the number of computer keyboards assembled at Twentieth
Century Electronic Company for a sample of 25 days.
45 52 48 41 56 46 44 42 48 53
51 53 51 48 46 43 52 50 54 47
44 47 50 49 52
(i) Construct a frequency distribution table if the number of classes is 6.
(ii) Write the relative frequency for all categories.
(iii) Draw a histogram and ogive graph for the relative frequency distribution
and class cumulative frequency respectively.
(iv) Draw the polygon graph.
Solution:
The minimum value is 41. The maximum value is 56.
Range = maximum value − minimum value = 56 − 41 = 15
Class width ≈ 2.66, which is rounded up to 3.

Classes    Tally      Frequency   Cumulative   Relative    Relative Cumulative   Midpoint
                                  Frequency    Frequency   Frequency             x
41 – 44    ///        3           3            0.12        0.12                  42.5
44 – 47    ////       5           8            0.20        0.32                  45.5
47 – 50    //// /     6           14           0.24        0.56                  48.5
50 – 53    //// //    7           21           0.28        0.84                  51.5
53 – 56    ///        3           24           0.12        0.96                  54.5
56 – 59    /          1           25           0.04        1.00                  57.5

[Figure: histogram of the relative frequencies, frequency polygon plotted at the class midpoints (39.5 to 60.5), and ogive of the cumulative relative frequency (C. R. F).]
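The frequency table of Example 3 can be checked with a small script. This is a sketch under one stated assumption: each class contains its lower limit but not its upper limit (so 44 is counted in 44 – 47), which is the convention that reproduces the frequencies in the table. The function name `grouped_table` is hypothetical.

```python
# Keyboards assembled per day (Example 3).
data = [45, 52, 48, 41, 56, 46, 44, 42, 48, 53,
        51, 53, 51, 48, 46, 43, 52, 50, 54, 47,
        44, 47, 50, 49, 52]

def grouped_table(values, start, width, n_classes):
    """Rows of (lower, upper, f, cumulative f, relative f, cumulative relative f, midpoint).
    Assumption: a value x belongs to the class with lower <= x < upper."""
    n = len(values)
    rows, cum = [], 0
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width
        f = sum(lower <= x < upper for x in values)
        cum += f
        rows.append((lower, upper, f, cum, f / n, cum / n, (lower + upper) / 2))
    return rows

rows = grouped_table(data, start=min(data), width=3, n_classes=6)
for lo, hi, f, cf, rf, crf, mid in rows:
    print(f"{lo:2d} - {hi:2d}  {f:2d}  {cf:2d}  {rf:.2f}  {crf:.2f}  {mid:.1f}")
```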
Example 4 The following data give the number of computer terminals produced at the company
for a sample of 30 days.
24 32 27 23 33 33 29 25 23 28
21 26 31 22 27 33 27 23 28 29
31 35 34 22 26 28 23 35 31 27
(i) Construct a frequency distribution table if the number of classes is 6.
(ii) Write the relative frequency for all categories.
(iii) Draw a histogram and ogive graph for the relative frequency distribution
and class cumulative frequency respectively.
(iv) Draw the polygon graph.
Solution:
The minimum value is 21. The maximum value is 35.
Range = maximum value − minimum value = 35 − 21 = 14
Class width = 3.

Classes    Tally         Frequency   Cumulative   Relative    Relative Cumulative   Midpoint
                                     Frequency    Frequency   Frequency             x
21 – 24    //// //       7           7            0.2333      0.2333                22.5
24 – 27    ////          4           11           0.1333      0.3667                25.5
27 – 30    //// ////     9           20           0.3000      0.6667                28.5
30 – 33    ////          4           24           0.1333      0.8000                31.5
33 – 36    //// /        6           30           0.2000      1.0000                34.5

[Figure: histogram of the relative frequencies, frequency polygon plotted at the class midpoints (19.5 to 37.5), and ogive of the cumulative relative frequency.]
Stem-and-Leaf Plot
A stem-and-leaf plot is a statistical technique used, when the number of classes is not given, to present a
set of data by splitting each numerical value into two parts, a stem and a leaf. Consider the following data:
37 33 33 32 29 28 28 23
22 22 22 21 21 21 20 20
19 19 18 18 18 18 16 15
14 14 14 12 12 9 6
Solution. In this example, we will explain how to construct and interpret this kind of graph. A stem and
leaf display of the data is shown in the following table.
Stem | Leaf
0    | 6 9
1    | 2 2 4 4 4 5 6 8 8 8 8 9 9
2    | 0 0 1 1 1 2 2 2 3 8 8 9
3    | 2 3 3 7
The left portion of the table contains the stems. They are the numbers 0, 1, 2, and 3, arranged as a column
to the left of the bars. These numbers are the 10’s digits. A stem of 3, for example, can be used to represent the
10’s digit in any of the numbers from 30 to 39. The numbers to the right of the bar are the leaves, and they
represent the 1’s digits. Every entry in the graph therefore stands for the result of adding the leaf to 10 times
its stem.
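The construction described above can be sketched in a few lines. This is an illustrative sketch for two-digit whole numbers only; the function name `stem_and_leaf` is an assumption, not something defined in the text.

```python
from collections import defaultdict

data = [37, 33, 33, 32, 29, 28, 28, 23,
        22, 22, 22, 21, 21, 21, 20, 20,
        19, 19, 18, 18, 18, 18, 16, 15,
        14, 14, 14, 12, 12, 9, 6]

def stem_and_leaf(values):
    """Map each 10's digit (stem) to the sorted list of 1's digits (leaves)."""
    plot = defaultdict(list)
    for x in sorted(values):
        stem, leaf = divmod(x, 10)  # x = 10 * stem + leaf
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(data)
for stem in sorted(plot):
    print(f"{stem} | {' '.join(map(str, plot[stem]))}")
```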
Example 7 The following data give the annual incomes (in thousands of dollars) for 40 production
managers randomly selected from large companies.
57.6 63.3 47.3 72.5 41.2 66.1 59.6 68.5
73.3 39.4 44.15 84.9 53.7 37.7 63.3 77.4
60.2 55.9 43.1 35.6 49.3 67.4 79.2 71.9
48.8 73.2 76.0 64.3 51.8 73.5 48.8 63.5
81.5 72.7 69.4 51.5 77.5 67.9 46.1 65.1
Solution.
Stem | Leaf
3    | 9.4 7.7 5.6
4    | 7.3 1.2 4.15 3.1 9.3 8.8 8.8 6.1
5    | 7.6 9.6 3.7 5.9 1.8 1.5
6    | 3.3 6.1 8.5 3.3 0.2 7.4 4.3 3.5 9.4 7.9 5.1
7    | 2.5 3.3 7.4 9.2 1.9 3.2 6.0 3.5 2.7 7.5
8    | 4.9 1.5
Classes    Frequency   Cumulative   Relative    Relative Cumulative   Midpoint
                       Frequency    Frequency   Frequency             x
30 – 40    3           3            0.075       0.075                 35
40 – 50    8           11           0.200       0.275                 45
50 – 60    6           17           0.150       0.425                 55
60 – 70    11          28           0.275       0.700                 65
70 – 80    10          38           0.250       0.950                 75
80 – 90    2           40           0.050       1.000                 85

[Figure: histogram of the relative frequencies over the classes 30 to 90; frequency polygon plotted at the midpoints of the classes (25 to 95); ogive of the cumulative relative frequency (C. R. F) over 30 to 90.]
Example 7 The following are the GPAs of 30 students who signed up for a graduate course at a
university.
3.46 3.72 3.95 3.55 3.62 3.80 3.86 3.71 3.56 3.49
3.96 3.90 3.70 3.61 3.72 3.65 3.48 3.87 3.82 3.91
3.69 3.67 3.72 3.66 3.79 3.75 3.93 3.74 3.50 3.83
Solution.
Stem | Leaf
3.4  | 6 8 9
3.5  | 0 5 6
3.6  | 1 2 5 6 7 9
3.7  | 0 1 2 2 2 4 5 9
3.8  | 0 2 3 6 7
3.9  | 0 1 3 5 6
Classes      Frequency   Cumulative   Relative    Relative Cumulative   Midpoint
                         Frequency    Frequency   Frequency             x
3.4 – 3.5    3           3            0.1000      0.1000                3.45
3.5 – 3.6    3           6            0.1000      0.2000                3.55
3.6 – 3.7    6           12           0.2000      0.4000                3.65
3.7 – 3.8    8           20           0.2667      0.6667                3.75
3.8 – 3.9    5           25           0.1667      0.8333                3.85
3.9 – 4.0    5           30           0.1667      1.0000                3.95

[Figure: histogram of the relative frequencies over the classes 3.4 to 4.0, and ogive of the relative cumulative frequency (R. C. F) over 3.4 to 4.]
Sheet (1)
(Statistics - Representation and organization of grouped and ungrouped Data)
1. A survey was taken on how much trust people place in the information they read on the Internet.
Construct a categorical frequency distribution for the data. A = trust in everything they read, M = trust
in most of what they read, H = trust in about one-half of what they read, S = trust in a small portion of
what they read.
M M M A H M S M H M S M M M M A M M A M
M M H M M M H M H M A M M M H M M M M M
(i) Construct a frequency distribution table.
(ii) Write the relative frequency for all categories.
(iii) Draw a bar graph for the frequency distribution and the relative frequency.
2. The children in a class state how many children there are in their family. The numbers that they state
are given below.
1 2 1 3 2 1 2 4 2 2 1 3 1 2 3
2 2 1 1 4 3 1 2 1 2 2 1 2 3 4
3. The 28-day compressive strengths of 40 cubes of M-25 grade concrete are given in the table below.
35.28 41.11 26.04 38.41 32.56 34.22 39.31 40.55 36.08 38.09
28.93 35.75 25 36.63 25.28 33.99 34.77 30.71 34.65 34.63
39.06 37.81 26.79 29.01 38.41 41.3 25.95 35.54 31.65 36.57
32.48 32.46 30.15 33.82 36.76 35.39 31.8 25.55 40.36 41.38
4. The given data values represent the record temperature (°F) for each of the U.S. states.
112 100 127 120 133 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
5. The average quantitative GRE scores for the top 30 graduate schools of engineering are listed:
767 770 761 760 771 768 776 771 756 770
763 760 747 766 754 771 771 778 766 762
780 750 746 764 769 759 757 753 758 746
(i) Use it to construct a frequency distribution.
(ii) Using frequencies, sketch the frequency polygon.
6. The following data are the measures of the diameters of 30 rivet heads in 1/100 of an inch.
6.44 6.47 6.22 6.17 6.48 6.21 6.52 6.37 6.27 6.17 6.51 6.58 6.41 6.37 6.28
6.1 6.33 6.4 6.21 6.45 6.59 6.27 6.31 6.41 6.53 6.32 6.23 6.43 6.5 6.3
7. The cumulative frequency graph below gives the results of 120 students on a test.
[Figure: cumulative frequency graph; horizontal axis: Score (0 to 100), vertical axis: Cumulative Frequency (0 to 120).]
In this part we introduce the subject matter of descriptive statistics and in doing so learn ways to
describe and summarize a set of data.
In statistical terms we are trying to find a measure of central tendency. The question we are now faced
with is: what is the central position in our frequency distribution? One answer is simply to select the most
frequent mark, the longest bar in the histogram. This statistic is called the mode. Another measure of central
tendency that is used more often than the mode is the median or the second quartile ($Q_2$). This is the
score that comes in the middle of the list when we have ordered it from lowest to highest. The median of
the data values below $Q_2$ is called the first quartile ($Q_1$). The median of the data values above $Q_2$ is
called the third quartile ($Q_3$).
Whilst we might regard the median as a better choice of a central value than the mode, as it finds the
score at the middle position rather than the most frequent score, there is a third measure of central tendency
that is used far more often than either of the above two measures. This is the mean. The mean represents
the average value of the given scores.
How to determine the 1st Quartile and 3rd Quartile for Ungrouped Data?
1. Sort the data values.
2. The 1st Quartile $Q_1$ is the median of the data values that fall below $Q_2$. The 3rd Quartile $Q_3$ is the
median of the data values that fall above $Q_2$.
𝐼𝑄𝑅 = 𝑄3 − 𝑄1
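The two steps above can be sketched as follows. One assumption is made explicit here: when the number of values is odd, the median itself is left out of both halves, which is the convention that reproduces the quartiles quoted in Example 8. The function name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.
    Assumption: for an odd count, the middle value is excluded from both halves."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # values that fall below the median position
    upper = xs[half + n % 2:]    # values that fall above the median position
    return median(lower), median(xs), median(upper)

q1, q2, q3 = quartiles([56, 47, 68, 55, 71, 52, 62])
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

Other quartile conventions exist (statistical software often interpolates), so results from a library routine may differ slightly from this rule.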
Measures of central tendency locate the center of a distribution. They do not indicate how the values
are distributed around the center. Measures of variation examine the spread, or variation, of data values
around the center. For example, the following two groups illustrate the meaning of variation:
Group I : 65 67 68 72 75 80 85 88
Group II : 10 20 30 40 110 120 130 140
\[ \mu_{I} = \frac{65 + 67 + 68 + 72 + 75 + 80 + 85 + 88}{8} = 75 \]
\[ \mu_{II} = \frac{10 + 20 + 30 + 40 + 110 + 120 + 130 + 140}{8} = 75 \]
The two distributions have the same mean, but in Group I the data values are clustered closer to the mean, so
Group I is more consistent. A small measure of dispersion indicates that the data are clustered closely
around the mean; the mean is therefore considered representative of the data. Conversely, a
large measure of dispersion indicates that the mean is not reliable. A second reason for studying the
dispersion in a set of data is to compare the spread in two or more distributions.
How to determine the Variance and Standard Deviation for ungrouped data?
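1. For a population of size $N$, the variance $\sigma^2$ and standard deviation $\sigma$ are given by:
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N}, \qquad \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \]
We can write simple formulas of the variance and standard deviation in the forms
\[ \sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}, \qquad \sigma = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}} \]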
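2. For a sample of size $n$, the variance $S^2$ and standard deviation $S$ are given by:
\[ S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}, \qquad S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
Also, we can write simple formulas of the variance and standard deviation in the forms
\[ S^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}, \qquad S = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} \]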
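The definitional and the simplified forms of the sample variance give the same number, which can be checked numerically. The sketch below uses the book prices from Example 8; the function names are assumptions introduced for illustration.

```python
import math

def sample_variance(values):
    """Definitional form: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Simplified form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)

prices = [56, 47, 68, 55, 71, 52, 62]
s2 = sample_variance(prices)
s = math.sqrt(s2)
print(round(s2, 2), round(s, 2))
```

Replacing `n - 1` by `n` in either function gives the corresponding population formulas.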
Dividing the standard deviation by the mean gives a dimensionless measure of dispersion called the
coefficient of variation. For a population, the population coefficient of variation $\nu$ is
\[ \nu = \frac{\sigma}{\mu}, \]
and for a sample, the sample coefficient of variation is
\[ \nu = \frac{s}{\bar{x}}. \]
This is usually expressed as a percentage. The coefficient of variation is useful in comparing different data
sets with respect to central location and dispersion.
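As an illustration, the coefficient of variation can be computed for Group I and Group II above. This sketch treats both groups as samples (an assumption, since the text uses $\mu$ for them), so it uses the sample standard deviation from Python's `statistics` module.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample coefficient of variation s / x-bar, expressed as a percentage."""
    return stdev(values) / mean(values) * 100

group_1 = [65, 67, 68, 72, 75, 80, 85, 88]
group_2 = [10, 20, 30, 40, 110, 120, 130, 140]
cv_1 = coefficient_of_variation(group_1)
cv_2 = coefficient_of_variation(group_2)
print(f"Group I: {cv_1:.2f}%  Group II: {cv_2:.2f}%")
```

Both groups have mean 75, yet the second coefficient is several times larger, which matches the observation that Group I is the more consistent group.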
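For grouped data with class midpoints $x$ and class frequencies $f$, the population mean and the sample mean are
\[ \mu = \frac{\sum x f}{\sum f} = \frac{\sum x f}{N} \]
\[ \bar{x} = \frac{\sum x f}{\sum f} = \frac{\sum x f}{n} \]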
How to determine the Variance and Standard Deviation for Grouped Data?
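1. For a population of size $N$, the variance $\sigma^2$ and standard deviation $\sigma$ are given by:
\[ \sigma^2 = \frac{\sum (x - \mu)^2 f}{N}, \qquad \sigma = \sqrt{\frac{\sum (x - \mu)^2 f}{N}} \]
We can write simple formulas of the variance and standard deviation in the forms
\[ \sigma^2 = \frac{\sum x^2 f - \frac{(\sum x f)^2}{N}}{N}, \qquad \sigma = \sqrt{\frac{\sum x^2 f - \frac{(\sum x f)^2}{N}}{N}} \]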
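2. For a sample of size $n$, the variance $S^2$ and standard deviation $S$ are given by:
\[ S^2 = \frac{\sum (x - \bar{x})^2 f}{n - 1}, \qquad S = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} \]
Also, we can write simple formulas of the variance and standard deviation in the forms
\[ S^2 = \frac{\sum x^2 f - \frac{(\sum x f)^2}{n}}{n - 1}, \qquad S = \sqrt{\frac{\sum x^2 f - \frac{(\sum x f)^2}{n}}{n - 1}} \]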
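The grouped-data formulas can be sketched as below, using the sales distribution from Example 10 (class limits 0 – 20 up to 100 – 120). The function name `grouped_stats` is hypothetical, and each class is represented by its midpoint, as in the formulas above.

```python
import math

def grouped_stats(classes, freqs):
    """Sample mean, variance and standard deviation from class limits and frequencies."""
    mids = [(lo + hi) / 2 for lo, hi in classes]        # class midpoints x
    n = sum(freqs)
    sum_xf = sum(x * f for x, f in zip(mids, freqs))
    sum_x2f = sum(x * x * f for x, f in zip(mids, freqs))
    mean = sum_xf / n
    var = (sum_x2f - sum_xf ** 2 / n) / (n - 1)          # simplified sample form
    return mean, var, math.sqrt(var)

classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100), (100, 120)]
freqs = [16, 18, 14, 24, 20, 8]
mean, var, sd = grouped_stats(classes, freqs)
print(round(mean, 2), round(var, 2), round(sd, 2))
```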
Example 8 A sample of 7 business statistics books produced the following data on their prices.
56 47 68 55 71 52 62
(i) Calculate the mean, median, mode, range, variance and standard deviation.
(ii) Evaluate the first, third quartiles and interquartile range (IQR).
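Solution: $n = 7$

Rank $x$   47     52     55     56     62     68     71    | $\sum x = 411$
$x^2$      2209   2704   3025   3136   3844   4624   5041  | $\sum x^2 = 24583$

Mean : $\bar{x} = \frac{\sum x}{n} = \frac{411}{7} = 58.71$
Median : $Q_2 = 56$ (the 4th of the 7 ranked values).
Mode : none (no value is repeated).
Range : Range = Max. value − Min. value = 71 − 47 = 24.
Variance : $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{24583 - \frac{(411)^2}{7}}{6} = 75.24$.
Standard Deviation : $s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{75.24} = 8.674$.
The first quartile : The median of the data values that fall below $Q_2$ ⇒ $Q_1 = 52$.
The third quartile : The median of the data values that fall above $Q_2$ ⇒ $Q_3 = 68$.
Interquartile range : IQR = $Q_3 − Q_1$ = 68 − 52 = 16.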
Example 9 The following table gives the 1992 gross sales (rounded to billions of dollars) for a
sample of eight U.S. companies.
(i) Calculate the mean, median, mode, range, variance and standard deviation.
(ii) Evaluate the first, third quartiles and interquartile range (IQR).
Company              Gross Sales (billions of dollars)
Philip Morris        50
General Electric     62
Pfizer               7
Merck                10
Coca-Cola            13
AT&T                 65
Hewlett-Packard      17
Johnson & Johnson    13
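Solution: $n = 8$

Rank $x$   7    10    13    13    17    50     62     65    | $\sum x = 237$
$x^2$      49   100   169   169   289   2500   3844   4225  | $\sum x^2 = 11345$

Mean : $\bar{x} = \frac{\sum x}{n} = \frac{237}{8} = 29.625$
Median : Median order $= \frac{n}{2} = \frac{8}{2} = 4$ ⇒ $Q_2 = \frac{13 + 17}{2} = 15$.
Mode : 13.
Range : Range = Max. value − Min. value = 65 − 7 = 58.
Variance : $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{11345 - \frac{(237)^2}{8}}{7} = 617.70$.
Standard Deviation : $s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{617.70} = 24.85$.
The first quartile : The median of the data values that fall below $Q_2$ ⇒ $Q_1 = \frac{10 + 13}{2} = 11.5$.
The third quartile : The median of the data values that fall above $Q_2$ ⇒ $Q_3 = \frac{50 + 62}{2} = 56$.
Interquartile range : IQR = $Q_3 − Q_1$ = 56 − 11.5 = 44.5.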
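There are two accepted techniques for determining the approximate mode from a grouped frequency
distribution:
1. Determining the modal class (the class with the highest frequency) and then using its
class mark as the approximate mode.
2. Using the following mode-locating formula for grouped frequency distributions:
\[ \text{Mode} \approx l + \frac{d_1}{d_1 + d_2}\,(w), \]
where $l$ is the lower boundary of the modal class, $d_1$ is the difference between the frequency of the
modal class and that of the class preceding it, $d_2$ is the difference between the frequency of the modal
class and that of the class following it, and $w$ is the width of the modal class.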
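Using the same method of calculation as for the median, we can get the $Q_1$ and $Q_3$ equations as follows:
\[ Q_1 \approx l + \frac{\frac{n}{4} - cf}{f}\,(w), \qquad Q_3 \approx l + \frac{\frac{3n}{4} - cf}{f}\,(w) \]
where, for the class that contains the quartile, $l$ is its lower boundary, $cf$ is the cumulative frequency of
the classes before it, $f$ is its frequency, and $w$ is its width.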
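The interpolation formulas for grouped data can be sketched as below. This is an illustrative sketch, not the text's own procedure: the function names are hypothetical, the median is obtained by assuming the same formula with position $n/2$, and the data are the sales classes of Example 10.

```python
def grouped_quantile(classes, freqs, p):
    """Interpolated quantile l + (p*n - cf) / f * w for the class containing position p*n."""
    n = sum(freqs)
    target = p * n
    cf = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cf + f > target:
            return lower + (target - cf) / f * (upper - lower)
        cf += f
    return classes[-1][1]  # p = 1: upper boundary of the last class

def grouped_mode(classes, freqs):
    """Mode ~ l + d1 / (d1 + d2) * w, evaluated for the modal class."""
    k = freqs.index(max(freqs))
    lower, upper = classes[k]
    d1 = freqs[k] - (freqs[k - 1] if k > 0 else 0)
    d2 = freqs[k] - (freqs[k + 1] if k + 1 < len(freqs) else 0)
    return lower + d1 / (d1 + d2) * (upper - lower)

classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100), (100, 120)]
freqs = [16, 18, 14, 24, 20, 8]
q1 = grouped_quantile(classes, freqs, 0.25)
q2 = grouped_quantile(classes, freqs, 0.50)
q3 = grouped_quantile(classes, freqs, 0.75)
mode = grouped_mode(classes, freqs)
print(round(q1, 2), round(q2, 2), round(q3, 2), round(mode, 2))
```

For this distribution the median falls in the class 60 – 80, because the cumulative frequency passes 50 there (48 ≤ 50 < 72).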
Example 10 A hardware distributor reports the following distribution of sales from a sample of 100
sales receipts.
Dollar values of sales Number of sales
0 – 20 16
20 – 40 18
40 – 60 14
60 – 80 24
80 – 100 20
100 – 120 8
(i) Calculate the mean, median, mode, range, variance, standard deviation and
sample variation coefficients.
(ii) Evaluate the first, third quartiles and interquartile range (IQR).
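Solution:
Mean : $\bar{x} = \frac{\sum x f}{\sum f} = \frac{5760}{100} = 57.6$
Median : Median order $= \frac{n}{2} = \frac{100}{2} = 50$ ⇒ 48 ≤ 50 < 72.
Median class is 60 − 80
Variance : $s^2 = \frac{\sum x^2 f - \frac{(\sum x f)^2}{n}}{n - 1} = \frac{429200 - \frac{(5760)^2}{100}}{99} = 984.08$.
Standard Deviation : $s = \sqrt{\frac{\sum x^2 f - \frac{(\sum x f)^2}{n}}{n - 1}} = \sqrt{984.08} = 31.37$.
Variation Coefficient: $\nu = \frac{s}{\bar{x}} = \frac{31.37}{57.6} \times 100\% \approx 54.46\%$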
Example 11 The following table gives the frequency distribution of the total hours spent studying
by a sample of 40 university students enrolled in an introductory course during spring 2015.
Hours of Study Number of Students
24 – 40 3
40 – 56 5
56 – 72 10
72 – 88 12
88 – 104 5
104 – 120 5
(i) Calculate the mean, median, mode, range, variance, standard deviation and sample
variation coefficients.
(ii) Evaluate the first, third quartiles and interquartile range (IQR).
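Solution:
Mean : $\bar{x} = \frac{\sum x f}{\sum f} = \frac{2976}{40} = 74.4$
Median class is 72 − 88
Variance : $s^2 = \frac{\sum x^2 f - \frac{(\sum x f)^2}{n}}{n - 1} = \frac{241152 - \frac{(2976)^2}{40}}{39} = 506.09$.
Standard Deviation : $s = \sqrt{\frac{\sum x^2 f - \frac{(\sum x f)^2}{n}}{n - 1}} = \sqrt{506.09} = 22.50$.
Variation Coefficient: $\nu = \frac{s}{\bar{x}} = \frac{22.50}{74.4} \times 100\% \approx 30.24\%$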
Sheet (2)
(Statistics - Measures of Central Tendency for grouped and ungrouped Data)
1. A tire manufacturer wants to determine the inner diameter of a certain grade of tire. Ideally, the diameter
would be 570 mm. The data are as follows:
572, 572, 573, 568, 569, 575, 565, 570.
Find the sample mean and median, range, variance, standard deviation, quartiles and interquartile range
(IQR). Using the calculated statistics can you comment on the quality of the tires?
2. The average height of a group of 25 children was calculated to be 78.4 cm. It was later discovered that
one value was misread as 69 cm instead of the correct value of 96 cm. Calculate the correct average.
3. In the competition of tug of war, two teams play with each other. The weight, in kilograms, of the 8
members of Hereward House team are 75, 73, 77, 76, 84, 76, 77 and 78. The weight, in kilograms, of
the 8 members of Nelson House team are 100, 73, 54, 95, 80, 76, 70 and 60.
(i) Which team do you think will win a tug of war between Hereward House and Nelson House?
Give a reason for your answer using statistical calculation.
(ii) If the heaviest member of the Hereward House team is replaced by a member weighing 76 kilograms,
which team would you prefer to train? Give a reason for your answer using statistical calculation.
4. The number of M&Ms is counted in several bags and recorded in the frequency table below
Number of M&Ms 37 38 39 40 41 42 43
Frequency 3 8 11 19 13 7 2
5. The following data are the measures of the resistance of resistors produced in an electronics factory.
Resistance (kΩ) 0 ≤ 𝑅 < 15 15 ≤ 𝑅 < 30 30 ≤ 𝑅 < 45 45 ≤ 𝑅 < 60
Frequency (×1000) 50 100 75 25
(i) Construct a frequency distribution table with relative frequency for all categories.
(ii) Draw a histogram and ogive graph with the relative frequency distribution and class cumulative
frequency respectively.
(iii) Use histogram graph to find the mode value.
(iv) Draw the polygon graph.
(v) Calculate the mean, median, mode, range, variance and standard deviation.
(vi) Evaluate the first, third quartiles and interquartile range (IQR).
6. The following graph shows the number of hours a sample of people spent viewing television one week
during the summer.
[Figure: histogram; horizontal axis ticks from 0 to 60, vertical axis: Frequency (0 to 35).]
(i) Construct a frequency distribution table with relative frequency for all categories.
(iii) Draw the polygon and ogive graph.
(iv) Calculate the mean, median, mode, range, variance and standard deviation.
(v) Evaluate the first, third quartiles and interquartile range (IQR).
7. In a certain examination, the average grade of all students in class A is 68.4 and that of all students in
class B is 71.2. If the average of both classes combined is 70, find the ratio of the number of students
in class A to the number in class B.
8. An incomplete distribution of families according to their expenditure per week is given below. The
median and mode for the distribution are $250 and $240 respectively. Calculate the missing
frequencies.
Expenditure 0 – 100 100 – 200 200 – 300 300 – 400 400 – 500
No. of families 14 ? 27 ? 15
Chapter 2
Probability
Introduction
This chapter introduces the key concept of probability and its fundamental rules and properties, and
discusses the most basic methods of computing probabilities of various events. Probability theory is a branch
of mathematics that deals with repetitive events subject to chance variation.
Sample Space:
The set of all possible outcomes of an experiment. If the elements of this set are individual, distinct,
countable entities, then the sample space is said to be discrete. If, on the other hand, the elements are a
continuum of values, the sample space is said to be continuous.
Event:
A set of possible outcomes that share a common attribute from the sample space.
Example 1 In tossing a coin two times and recording the number of observed heads and tails,
identify the experiment and the sample space.
Solution:
1. Experiment: Toss a coin 2 times; record the number of observed heads (each one as an “𝐻”) and
tails (each one as a “𝑇”).
2. Sample space: The set S defined by
𝑆 = {𝐻𝐻, 𝐻𝑇, 𝑇𝐻, 𝑇𝑇}
consisting of all possible 4 outcomes, is the sample space for this experiment. A sample space of $N$
possible outcomes yields $2^N$ possible events.
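The sample space of Example 1 and the count of its events can be enumerated directly. This is a small sketch; the helper names `sample_space` and `all_events` are assumptions used only for illustration.

```python
from itertools import combinations, product

def sample_space(n_tosses):
    """All sequences of H and T of the given length, e.g. 'HT'."""
    return ["".join(outcome) for outcome in product("HT", repeat=n_tosses)]

def all_events(space):
    """Every subset of the sample space, from the empty set up to the space itself."""
    return [set(c) for r in range(len(space) + 1) for c in combinations(space, r)]

S = sample_space(2)
events = all_events(S)
print(S, len(events))
```

With 4 outcomes there are 2 to the power 4, that is 16, events, counting the empty event and the whole sample space.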
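Example 2 In rolling a die and recording the number of the observed face, identify the experiment
and the sample space.
Solution:
1. Experiment: Roll a die once; record the number on the observed face.
2. Sample space: $S = \{1, 2, 3, 4, 5, 6\}$, consisting of all 6 possible outcomes.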
Example 3 In tossing a coin 3 times and recording the number of observed heads and tails, identify
the experiment and the sample space.
Solution:
1. Experiment: Toss a coin 3 times; record the number of observed heads (each one as an “𝐻”) and
tails (each one as a “𝑇”).
2. Sample space: The set S defined by
𝑆 = {𝐻𝐻𝐻, 𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝐻𝑇𝑇, 𝑇𝐻𝐻, 𝑇𝐻𝑇, 𝑇𝑇𝐻, 𝑇𝑇𝑇}
consisting of all possible 8 outcomes, is the sample space for this experiment.
Example 4 In rolling two dice and recording the numbers of the observed faces
• Identify the experiment and the sample space.
• Write the event 𝑨 that presents equal faces.
• Write the event 𝑩 that produces sum of faces more than 8.
• Write the event 𝑪 that gives sum divisible by 5.
Solution:
𝑆 = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (3, 1),
(3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 1), (5, 2),
(5, 3), (5, 4), (5, 5), (5, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
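The events asked for in Example 4 can be read off the sample space by filtering it. The sketch below is an illustration rather than the text's own solution; the variable names follow the event labels in the example.

```python
from itertools import product

# Sample space for two dice: ordered pairs (first die, second die).
S = list(product(range(1, 7), repeat=2))

A = [(a, b) for a, b in S if a == b]             # equal faces
B = [(a, b) for a, b in S if a + b > 8]          # sum of faces more than 8
C = [(a, b) for a, b in S if (a + b) % 5 == 0]   # sum divisible by 5

print(len(S), len(A), len(B), len(C))
```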
Example 5 Out of six computer chips, two are defective. Two chips are randomly chosen for
testing (without replacement). Identify the experiment and list all the outcomes in the
sample space.
Solution:
1. Experiment: choosing 2 chips; record the number of observed defective (each one as a “D”) and non-defective
(each one as an “N”).
Example 6 Flash memories are drawn one at a time from a lot of 6 items, 1 of which is defective, until
the defective flash memory appears.
Solution:
To write the elements of the sample space, we must note that if the defective memory is drawn first, the first
element is determined. The second element in 𝑆 is determined if the first item is non-defective and the
second one is defective, and so on. So, the sample space is
𝑆 = {𝐷, 𝑁𝐷, 𝑁𝑁𝐷, 𝑁𝑁𝑁𝐷, 𝑁𝑁𝑁𝑁𝐷, 𝑁𝑁𝑁𝑁𝑁𝐷}.
Example 7 Among 10 laptop computers, five are good and five have defects. Unaware of this, a
customer buys 3 laptops. Write down the sample space. What events correspond in
this 𝑆 to the statements?
a. 𝑥 ∶ at least one defective computer is obtained.
b. 𝑦 ∶ exactly two computers are defective.
c. 𝑧 ∶ at most two defective computers.
Solution:
𝑆 = {𝐷𝐷𝐷, 𝐷𝐷𝑁, 𝐷𝑁𝐷, 𝐷𝑁𝑁, 𝑁𝐷𝐷, 𝑁𝐷𝑁, 𝑁𝑁𝐷, 𝑁𝑁𝑁}
Definition: A union of events 𝐴, 𝐵, 𝐶, … is an event consisting of all the outcomes in all these events.
It occurs if any of 𝐴, 𝐵, 𝐶, … occurs and therefore corresponds to the word “𝑶𝑹”: 𝐴 or 𝐵 or
𝐶 or …
Notation: Union: 𝐴 ∪ 𝐵 ∪ 𝐶
Definition: An intersection of events 𝐴, 𝐵, 𝐶, … is an event consisting of outcomes that are common to
all these events. It occurs if each of 𝐴, 𝐵, 𝐶, … occurs and therefore corresponds to the word
“𝑨𝑵𝑫”: 𝐴 and 𝐵 and 𝐶 and …
Notation: Intersection: 𝐴 ∩ 𝐵 ∩ 𝐶
Definition: The complement of an event 𝐴 is an event that occurs every time 𝐴 does not occur. It
consists of the outcomes excluded from 𝐴 and therefore corresponds to the word “𝑵𝑶𝑻”: not
𝐴.
Notation: Not: $A^c$
Definition: A difference of events 𝐴 and 𝐵 consists of all outcomes included in 𝐴 but excluded in 𝐵. It
occurs when 𝐴 occurs and 𝐵 does not, and therefore, corresponds to the word “𝑩𝑼𝑻 𝑵𝑶𝑻”:
𝐴 but not 𝐵.
Notation: Difference: $A \cap B^c$
Definition: Events 𝐴, 𝐵, 𝐶, … are disjoint or mutually exclusive if their intersection is empty.
Notation: Disjoint: 𝐴 ∩ 𝐵 = 𝜙
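These operations map directly onto Python's built-in set type. The sketch below uses a hypothetical illustration (one roll of a die, with 𝐴 the even faces and 𝐵 the faces greater than 3); these particular events are assumptions chosen for the demonstration and are not taken from the text.

```python
S = set(range(1, 7))   # sample space: one roll of a die
A = {2, 4, 6}          # the face is even
B = {4, 5, 6}          # the face is greater than 3

union = A | B              # A or B
intersection = A & B       # A and B
complement_A = S - A       # not A
difference = A - B         # A but not B

print(union, intersection, complement_A, difference)
print(A.isdisjoint(complement_A))  # an event and its complement are disjoint
```

The difference can equally be written as the intersection of 𝐴 with the complement of 𝐵, which is how the notation above expresses it.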
Solution:
1. region 5: 𝑀 ∩ 𝑇 ∩ 𝑉 .
2. region 3: 𝑉 ∩ 𝑇 ∩ 𝑀 .
3. region 1 and 2 together: 𝑀 ∩ 𝑉.
4. regions 4 and 7 together: (𝑀 ∩ 𝑇 ∩ 𝑉 ) ∪ (𝑉 ∩ 𝑀 ∩ 𝑇 ).
5. regions 3, 6, 7, and 8 together: (𝑉 ∪ 𝑇 ∩ 𝑀) ∪ (𝑀 ∪ 𝑉 ∪ 𝑇) .
Several results that follow from the foregoing definitions, which may easily be verified by means of Venn
diagrams, are as follows:
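\[ A \cup \phi = A, \qquad A \cap \phi = \phi, \]
\[ S^c = \phi, \qquad \phi^c = S, \]
\[ A \cup A^c = S, \qquad A \cap A^c = \phi, \]
\[ (A \cup B)^c = A^c \cap B^c, \qquad (A \cap B)^c = A^c \cup B^c. \]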
The concept of probability perfectly agrees with our intuition. In everyday life, probability of an event
is understood as a chance that this event will happen. If a fair coin is tossed, we say that it has a 50-50
(equal) chance of turning up heads or tails. Hence, the probability of each side equals 1/2. It does not mean
that a coin tossed 10 times will always produce exactly 5 heads and 5 tails.
If you don’t believe it, try it! However, if you toss a coin 1 million times, the proportion of heads is
anticipated to be very close to 1/2. This example suggests that in the long run, probability can be viewed as a
proportion, or relative frequency. In forecasting, it is common to speak about the probability of an event as
a likelihood of this event to happen (say, the company’s profit is likely to rise during the next quarter). In
gambling and lottery, probability is equivalent to odds. Having the winning odds of 1 to 100 (1:100) means
that the probability to win is 0.01.
Definition: Probability is a finite measure. Being finite means that it has the largest possible value,
which is one. Being a measure means first of all that it is a function or mechanism that takes
input, event 𝐸, and converts it into output, probability 𝑃(𝐸).
Definition: (Classical Definition) Suppose an event 𝐸 can occur in 𝑛 ways out of a total of 𝑁 equally likely possible
ways. Then
\[ P(E) = \frac{n}{N} \]
Definition: Let $S = \{s_1, s_2, s_3, \ldots, s_N\}$ be the sample space of a given random experiment with equally likely
outcomes. Suppose that 𝐸 ⊂ 𝑆 is a given event. Let 𝑁 denote the number of outcomes (elements
or points) in 𝑆 and 𝑛 denote the number of outcomes in 𝐸. Then the probability 𝑃(𝐸) of the event 𝐸
is given by:
\[ P(E) = \frac{\text{number of elements in } E}{\text{number of elements in } S} = \frac{n}{N} \]
Proposition 1:
𝑃(𝑆) = 1
Proposition 2:
𝑃(𝜙) = 0
Proposition 3:
$P(E^c) = 1 - P(E)$
Example 9 A coin is tossed three times. Draw the tree diagram and find the probability of the
following events.
(i) A: obtaining at least two heads.
(ii) B: obtaining at most two heads.
(iii) C: getting two tails and one head.
(iv) D: getting exactly one tail.
Solution:
𝑆 = {𝐻𝐻𝐻, 𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝐻𝑇𝑇, 𝑇𝐻𝐻, 𝑇𝐻𝑇, 𝑇𝑇𝐻, 𝑇𝑇𝑇}.
(i) 𝐴 = {𝐻𝐻𝐻, 𝐻𝐻𝑇, 𝑇𝐻𝐻, 𝐻𝑇𝐻} ⇒ $P(A) = \frac{4}{8} = \frac{1}{2}$,
(ii) 𝐵 = {𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝐻𝑇𝑇, 𝑇𝐻𝐻, 𝑇𝐻𝑇, 𝑇𝑇𝐻, 𝑇𝑇𝑇} ⇒ $P(B) = \frac{7}{8}$,
(iii) 𝐶 = {𝐻𝑇𝑇, 𝑇𝐻𝑇, 𝑇𝑇𝐻} ⇒ $P(C) = \frac{3}{8}$,
(iv) 𝐷 = {𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝑇𝐻𝐻} ⇒ $P(D) = \frac{3}{8}$.
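The classical definition can be applied mechanically by listing the sample space and counting. The sketch below repeats Example 9 this way; the helper `prob` is a hypothetical name, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# Sample space for three tosses of a fair coin (8 equally likely outcomes).
S = ["".join(outcome) for outcome in product("HT", repeat=3)]

def prob(event):
    """Classical probability: outcomes in the event / outcomes in the sample space."""
    return Fraction(len(event), len(S))

A = [o for o in S if o.count("H") >= 2]   # at least two heads
B = [o for o in S if o.count("H") <= 2]   # at most two heads
C = [o for o in S if o.count("T") == 2]   # two tails and one head
D = [o for o in S if o.count("T") == 1]   # exactly one tail

print(prob(A), prob(B), prob(C), prob(D))
```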
(i)
𝐴 = {(2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (3, 2), (4, 2), (5, 2), (6, 2), (4, 3), (5, 3), (6, 3), (5, 4), (6, 4), (6, 5)},
𝑃(𝐴) = 15/36.
(ii)
𝐵 = {(4, 5), (5, 4), (6, 3), (3, 6), (5, 5), (4, 6), (6, 4), (6, 5), (5, 6), (6, 6)},
𝑃(𝐵) = 10/36.
(iii)
𝐶 = {(1, 4), (4, 1), (2, 3), (3, 2), (5, 5), (4, 6), (6, 4)},
𝑃(𝐶) = 7/36.
(iv)
𝐷 = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)},
𝑃(𝐷) = 6/36.
(v)
𝐵 ∩ 𝐶 = {(5, 5), (4, 6), (6, 4)},
𝑃(𝐵 ∩ 𝐶) = 3/36.
(vi)
𝐵 ∩ 𝐷 = {(5, 5), (6, 6)},
𝑃(𝐵 ∩ 𝐷) = 2/36.
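The sets in this solution can be regenerated by enumerating the 36 ordered outcomes of two dice. In the sketch below, the event descriptions (first die greater than the second, sum at least 9, sum divisible by 5, doubles) are assumptions inferred from the listed sets, because the problem statement is not reproduced in this part of the notes.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 ordered pairs (first, second)

# Assumed event definitions, inferred from the sets listed in the solution.
A = {r for r in rolls if r[0] > r[1]}      # first number greater than second
B = {r for r in rolls if sum(r) >= 9}      # sum at least 9
C = {r for r in rolls if sum(r) % 5 == 0}  # sum divisible by 5
D = {r for r in rolls if r[0] == r[1]}     # both numbers equal

def prob(event):
    """Classical probability over the 36 equally likely pairs."""
    return Fraction(len(event), len(rolls))

for name, ev in [("A", A), ("B", B), ("C", C), ("D", D),
                 ("B and C", B & C), ("B and D", B & D)]:
    print(name, len(ev), prob(ev))
```

Set intersection (`&`) gives the joint events directly, which is a convenient cross-check on intersections written out by hand.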
Example 11 A die is loaded in such a way that the even numbers are twice as likely to occur as the
odd numbers. Find the probability of the following events:
(i) A: the number is a perfect square.
(ii) B: the number is greater than 3 and smaller than or equal to 5.
(iii) C: the number is divisible by 3.
(iv) D: the number is divisible by 4.
(v) Which of the previous events are mutually exclusive?
Solution:
𝑆 = {1,2,3,4,5,6}.
The die is not fair. Give each even number a weight of 2 and each odd number a weight of 1, so the three even
numbers carry a total weight of 6, the three odd numbers a total weight of 3, and all six faces a total weight of 9. Then
𝑃(1) = 1/9, 𝑃(2) = 2/9, 𝑃(3) = 1/9, 𝑃(4) = 2/9, 𝑃(5) = 1/9, 𝑃(6) = 2/9.
(i) 𝐴 = {1, 4} ⇒ 𝑃(𝐴) = 𝑃(1) + 𝑃(4) = 1/9 + 2/9 = 1/3.
(ii) 𝐵 = {4, 5} ⇒ 𝑃(𝐵) = 𝑃(4) + 𝑃(5) = 2/9 + 1/9 = 1/3.
(iii) 𝐶 = {3, 6} ⇒ 𝑃(𝐶) = 𝑃(3) + 𝑃(6) = 1/9 + 2/9 = 1/3.
(iv) 𝐷 = {4} ⇒ 𝑃(𝐷) = 𝑃(4) = 2/9.
(v) A and C are mutually exclusive.
B and C are mutually exclusive.
D and C are mutually exclusive.
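When outcomes are not equally likely, each outcome gets its own probability and an event's probability is the sum over its outcomes. A small sketch for the loaded die follows; the weights 2 and 1 encode "even twice as likely as odd", and all names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Weight 2 for even faces, 1 for odd faces; normalise so probabilities sum to 1.
weights = {face: (2 if face % 2 == 0 else 1) for face in range(1, 7)}
total = sum(weights.values())  # 9
p = {face: Fraction(w, total) for face, w in weights.items()}

def prob(event):
    """Probability of an event as the sum of its outcome probabilities."""
    return sum(p[face] for face in event)

events = {"A": {1, 4}, "B": {4, 5}, "C": {3, 6}, "D": {4}}
for name, ev in events.items():
    print(name, prob(ev))

# Two events are mutually exclusive when they share no outcome.
exclusive = [(a, b) for a, b in combinations(sorted(events), 2)
             if not events[a] & events[b]]
print(exclusive)
```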
Example 12 A box contains 5 red balls, 6 green balls and 4 blue balls. A ball is drawn at random
from the box. Find
(i) The probability that the ball is green.
(ii) The probability that the ball is red or blue.
(iii) The probability that the ball is not red.
(iv) The probability that the ball is green or blue.
Solution:
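The box contains 5 + 6 + 4 = 15 balls.
(i) 𝑃(𝐺) = 6/15 = 2/5.
(ii) 𝑃(𝑅 ∪ 𝐵) = 𝑃(𝑅) + 𝑃(𝐵) = 5/15 + 4/15 = 9/15 = 3/5.
(iii) 𝑃(𝑅ᶜ) = 1 − 𝑃(𝑅) = 1 − 5/15 = 10/15 = 2/3.
(iv) 𝑃(𝐺 ∪ 𝐵) = 𝑃(𝐺) + 𝑃(𝐵) = 6/15 + 4/15 = 10/15 = 2/3 = 𝑃(𝑅ᶜ).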
Example 13 If 𝐴 and 𝐵 are two events in the sample space 𝑆 such that 𝑃(𝐴) = 2/5 and 𝑃(𝐵) = 4/5,
determine whether 𝐴 and 𝐵 are mutually exclusive or not.
Solution: Assume A and B are mutually exclusive. So, 𝐴 ∩ 𝐵 = 𝜙 and 𝑃(𝐴 ∩ 𝐵) = 0. Then,
𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵) = 2/5 + 4/5 = 6/5 > 1.
This result is impossible, because a probability must be less than or equal to 1. Hence our assumption is not
true, so 𝐴 and 𝐵 are not mutually exclusive.
Example 14 If 𝐴 and 𝐵 are two events in the sample space 𝑆 such that 𝑃(𝐴) = 3/8, 𝑃(𝐵) = 1/2 and
𝑃(𝐴 ∩ 𝐵) = 1/4, find:
(i) 𝑃(𝐴ᶜ ∩ 𝐵ᶜ).
(ii) 𝑃(𝐴ᶜ ∪ 𝐵ᶜ).
(iii) 𝑃(𝐵 ∩ 𝐴ᶜ).
(iv) 𝑃(𝐴 ∩ 𝐵ᶜ).
Solution:
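One possible route, sketched here rather than taken from the original notes, uses the addition rule, De Morgan's laws and the identity 𝑃(𝐵 ∩ 𝐴ᶜ) = 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵). The variable names are illustrative.

```python
from fractions import Fraction

p_a, p_b, p_ab = Fraction(3, 8), Fraction(1, 2), Fraction(1, 4)

p_union = p_a + p_b - p_ab   # addition rule: P(A or B)
p_neither = 1 - p_union      # (i)   P(A^c and B^c) = P((A or B)^c), De Morgan
p_not_both = 1 - p_ab        # (ii)  P(A^c or B^c)  = P((A and B)^c), De Morgan
p_b_only = p_b - p_ab        # (iii) P(B and A^c)
p_a_only = p_a - p_ab        # (iv)  P(A and B^c)

print(p_neither, p_not_both, p_b_only, p_a_only)
```

As a sanity check, the four disjoint pieces (A only, B only, both, neither) should add up to 1.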
Example 15 If A and B are two events in the sample space S such that P(A) = and P(A ∩ B ) =
Find P(B) in each of the following cases:
(i) A and B are disjoint events.
(ii) A ⊂ B.
Solution:
𝑃(𝐴ᶜ ∩ 𝐵ᶜ) = 𝑃((𝐴 ∪ 𝐵)ᶜ) = 1 − 𝑃(𝐴 ∪ 𝐵) = 1 − [𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵)]
(i) Because 𝐴 and 𝐵 are disjoint, 𝐴 ∩ 𝐵 = 𝜙 or 𝑃(𝐴 ∩ 𝐵) = 0.
= 1− + 𝑃(𝐵) − 0 ⇒ 𝑃(𝐵) = 1 − − = .
Sheet (3)
(Probability - Basic Rules)
5. A coin is tossed six times. Find the probability that at least one head occurs.
6. A card is drawn from an ordinary deck of playing cards. What is the probability that it is either a spade
or an ace?
7. A class consists of 100 students. 60 students like music, 70 students like football and 40 students like
football and music. A student is chosen randomly, what is the probability that
(i) He likes music or football. (ii) He likes music only.
(iii) He likes football only. (iv) He doesn't like music or football.
(v) He doesn't like music and football. (vi) He likes at least one of them.
(vii) He likes at most one of them. (viii) He likes only one of them.
8. A new computer virus can enter the system through the e-mail or through the internet. There is a 30%
chance of receiving this virus through the e-mail. There is a 40% chance of receiving it through the
internet. Also, the virus enters the system simultaneously through the e-mail and the internet with
probability 0.15. What is the probability that the virus does not enter the system at all?
9. Three newspapers A, B, C are published in a city and a survey of readers indicates the following: 20%
read A, 16% read B, 14% read C, 8% read both A and B, 5% read both A and C, 4% read both B and C,
and 2% read all three. For a person chosen at random, find the probability that he reads none of the
papers.
10. Let A and B be two events such that P(A) = , P(B ) = and P(A ∩ B ) = . Find P(𝐴 ∪ 𝐵),
P(𝐴 ∪ 𝐵 ).
11. Let A and B be two events such that P(A) = , P(B) = and P(A ∩ B) = . Find P(𝐴 ∩ B ),
P(A ∪ B ) and P(B ∩ 𝐴 ).
12. Let A and B be two events such that P(A) = 0.2, and P(A ∩ B ) = 0.6. Find P(𝐵) in the following
cases:
(i) 𝐴 ⊂ 𝐵. (ii) 𝐴 and 𝐵 are mutually exclusive.
Conditional Probability
The probability of the occurrence of an event E1 when another event E2 is known to have
already happened is called “Conditional Probability” and is denoted by P(E1|E2).
Independent Events
An event E1 is said to be independent of an event E2 if P(E1|E2) = P(E1), i.e., if the probability
of the occurrence of E1 does not depend on the occurrence of E2, or equivalently P(E1 ∩ E2) = P(E1) P(E2).
Dependent Events
The probability of the simultaneous occurrence of two events is equal to the probability of one
of the events multiplied by the conditional probability of the other, i.e., for two events E1 and E2,
P(E1 ∩ E2) = P(E1) P(E2|E1),
where P(E2|E1) represents the conditional probability of the occurrence of E2 when the event E1
has already happened.
Example 16 Suppose that five good fuses and two defective ones have been
mixed up. To find the defective fuses, we test them one-by-one, at random and
without replacement. What is the probability that we are lucky and find both
of the defective fuses in the first two tests?
Solution:
Let D1 be the event that we find a defective fuse in the first test and D2 be the event
that we find a defective fuse in the second test. We want to compute
P(D1 ∩ D2) = P(D1) P(D2|D1) = (2/7) · (1/6) = 1/21.
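The multiplication rule used here can be cross-checked by listing every ordered pair of fuses that could be tested first and second. This is a minimal sketch; labelling the fuses "D" and "G" is an illustrative choice.

```python
from fractions import Fraction
from itertools import permutations

fuses = ["D", "D", "G", "G", "G", "G", "G"]  # two defective, five good

# All ordered pairs of distinct fuses for the first two tests (7 * 6 = 42).
first_two = list(permutations(range(len(fuses)), 2))
hits = sum(1 for i, j in first_two if fuses[i] == "D" and fuses[j] == "D")
p_enumerated = Fraction(hits, len(first_two))

# Multiplication rule: P(D1 and D2) = P(D1) * P(D2 | D1).
p_rule = Fraction(2, 7) * Fraction(1, 6)

print(p_enumerated, p_rule)
```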
Example 17 A can hit a target 4 times in 5 shots; B can hit it 3 times in 4
shots. What is the probability that both of their shots hit?
Solution:
Probability of A’s hitting the target = 4/5
Probability of B’s hitting the target = 3/4
Assuming the two shots are independent, the probability that both hit is (4/5) × (3/4) = 3/5.
Example 2.19 One bag contains 4 white balls and 3 black balls, and a second
bag contains 3 white balls and 5 black balls. One ball is drawn from the first
bag and placed unseen in the second bag. What is the probability that a ball
now drawn from the second bag is black?
Solution:
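Let W and B denote the colour (white or black) of the ball moved from the first bag. After the move the
second bag holds 9 balls, 5 of them black if a white ball was moved and 6 if a black ball was moved.
P(the ball drawn from the second bag is black) = P(W ∩ black) + P(B ∩ black)
= P(W)P(black|W) + P(B)P(black|B)
= (4/7)(5/9) + (3/7)(6/9) = 20/63 + 18/63 = 38/63.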
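The calculation is an instance of the law of total probability, conditioning on the colour of the transferred ball. A short sketch with exact fractions follows; the variable names are illustrative.

```python
from fractions import Fraction

# Bag 1: 4 white, 3 black. Bag 2: 3 white, 5 black, plus the transferred ball.
p_move_white = Fraction(4, 7)
p_move_black = Fraction(3, 7)

# After the transfer bag 2 holds 9 balls.
p_black_given_move_white = Fraction(5, 9)  # still 5 black among 9
p_black_given_move_black = Fraction(6, 9)  # 6 black among 9

# Law of total probability over the two possible transfers.
p_black = (p_move_white * p_black_given_move_white
           + p_move_black * p_black_given_move_black)
print(p_black)
```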
P(Aᶜ ∩ Bᶜ ∩ Cᶜ) = P(Aᶜ) P(Bᶜ) P(Cᶜ) = (1/2) × (2/3) × (3/4) = 1/4.
∴ The probability that the problem is solved by at least one of them is
P(at least one of them solves the problem) = 1 − P(Aᶜ ∩ Bᶜ ∩ Cᶜ) = 1 − 1/4 = 3/4.
Example 22 Three boxes each contain five balls numbered from one to five.
One ball is drawn from each box. Find the probability that the sum of the numbers
on the drawn balls is more than four.
Solution:
Let the event A be that the sum of the numbers on the balls is more than four. Then Aᶜ
is the event that the sum is less than or equal to four. There are 5 × 5 × 5 = 125 equally likely outcomes, and
P(Aᶜ) = P(1,1,1) + P(1,1,2) + P(1,2,1) + P(2,1,1) = 1/125 + 1/125 + 1/125 + 1/125 = 4/125.
P(A) = 1 − P(Aᶜ) = 1 − 4/125 = 121/125.
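Because the sample space is small, the complement argument can be confirmed by direct enumeration. The sketch below simply counts the triples whose sum exceeds four; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

# One ball numbered 1..5 from each of three boxes: 5**3 = 125 outcomes.
draws = list(product(range(1, 6), repeat=3))

favourable = [d for d in draws if sum(d) > 4]
p_direct = Fraction(len(favourable), len(draws))

# Complement route: count the outcomes with sum <= 4 instead.
small = [d for d in draws if sum(d) <= 4]
p_complement = 1 - Fraction(len(small), len(draws))

print(p_direct, p_complement)
```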
Example 23 Let A and B be two events such that P(A) = 0.8, P(B) = 0.4 and
P(A ∩ B) = 0.3. Find P(A|B), P(B|A), P(A ∪ B), P(Aᶜ) and P(Bᶜ).
Solution:
P(A|B) = P(A ∩ B)/P(B) = 0.3/0.4 = 3/4, P(B|A) = P(A ∩ B)/P(A) = 0.3/0.8 = 3/8.
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.8 + 0.4 − 0.3 = 0.9.
P(Aᶜ) = 1 − P(A) = 0.2. P(Bᶜ) = 1 − P(B) = 0.6.
Example 24 Let A and B be two events such that P(A) = 0.8, P(B) = 0.4 and
P(A ∩ B) = 0.3. Find P(Aᶜ|Bᶜ) and P(Bᶜ|Aᶜ).
Solution:
P(Aᶜ ∩ Bᶜ) = P((A ∪ B)ᶜ) = 1 − P(A ∪ B) = 1 − [P(A) + P(B) − P(A ∩ B)]
= 1 − 0.8 − 0.4 + 0.3 = 0.1.
P(Aᶜ|Bᶜ) = P(Aᶜ ∩ Bᶜ)/P(Bᶜ) = 0.1/(1 − P(B)) = 0.1/0.6 = 1/6.
P(Bᶜ|Aᶜ) = P(Aᶜ ∩ Bᶜ)/P(Aᶜ) = 0.1/(1 − P(A)) = 0.1/0.2 = 1/2.
Example 25 Let A and B be two events such that P(Aᶜ ∩ Bᶜ) = 0.2, P(A ∩ Bᶜ) = 0.2 and
P(Aᶜ ∩ B) = 0.5. Find P(A|B) and P(B|A).
Solution:
P(Aᶜ ∩ Bᶜ) = P((A ∪ B)ᶜ) = 1 − P(A ∪ B) = 0.2 ⇒ P(A ∪ B) = 0.8.
P(A ∩ Bᶜ) = P(A) − P(A ∩ B) = 0.2 . . . . . . . . . . . (1)
P(Aᶜ ∩ B) = P(B) − P(A ∩ B) = 0.5 . . . . . . . . . . . (2)
Adding eq. (1) and eq. (2) yields
P(A) + P(B) − P(A ∩ B) − P(A ∩ B) = 0.7 ⇒ P(A ∪ B) − P(A ∩ B) = 0.7
so that P(A ∩ B) = 0.1, P(A) = 0.3, P(B) = 0.6.
P(B|A) = P(A ∩ B)/P(A) = 0.1/0.3 = 1/3, P(A|B) = P(A ∩ B)/P(B) = 0.1/0.6 = 1/6.
Bayes’ Rule
If E1, E2, . . . , En are mutually exclusive and exhaustive events of a random experiment with P(Ei) ≠ 0, then
for any arbitrary event A of the sample space of the above experiment with P(A) > 0, we have
P(Ei|A) = P(Ei) P(A|Ei) / [P(E1) P(A|E1) + P(E2) P(A|E2) + ··· + P(En) P(A|En)], i = 1, 2, . . . , n.
Example 28 Three different boxes contain colored balls. Box I contains 2 red,
3 white, 5 blue balls all numbered 1, box II contains 4 red, 1 white and 3 blue
balls all numbered 2 while box III contains 3 red, 4 white and 3 blue balls all
numbered 3. The balls in the three boxes are mixed together in one box and
then a ball is drawn. If the ball drawn is white, what is the probability
that it came from box II?
Solution.
Let
E1: the ball is drawn from box I,
E2: the ball is drawn from box II,
E3: the ball is drawn from box III,
and
W: the ball is white.
We have to find P(E2 |W).
By Bayes’ Theorem
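The remaining computation can be sketched as follows. The priors are taken as each box's share of the 28 mixed balls and the likelihoods as each box's share of white balls; this reading of "mixed together in one box" is an assumption, as are all variable names.

```python
from fractions import Fraction

boxes = {
    "I":   {"red": 2, "white": 3, "blue": 5},
    "II":  {"red": 4, "white": 1, "blue": 3},
    "III": {"red": 3, "white": 4, "blue": 3},
}
total_balls = sum(sum(b.values()) for b in boxes.values())  # 28

# Prior P(E_i): chance that the drawn ball originally belonged to box i.
prior = {k: Fraction(sum(b.values()), total_balls) for k, b in boxes.items()}
# Likelihood P(W | E_i): share of white balls among the balls of box i.
likelihood = {k: Fraction(b["white"], sum(b.values())) for k, b in boxes.items()}

# Total probability of white, then Bayes' rule for each box.
p_white = sum(prior[k] * likelihood[k] for k in boxes)
posterior = {k: prior[k] * likelihood[k] / p_white for k in boxes}

print(p_white, posterior["II"])
```

The result agrees with the direct count: one of the eight white balls carries the number 2.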
Sheet (4)
1. Let A and B be two events such that P(A|B) = 0.4, P(B) = 0.25 and P(A) = 0.2.
(i) Find P(𝐵|𝐴). (ii) Find 𝑃(𝐴 ∩ 𝐵). (iii) Find 𝑃(𝐴 ∪ 𝐵).
2. For two events 𝐴 and 𝐵, let 𝑃(𝐴|𝐵) = 0.4, 𝑃(𝐵|𝐴) = 0.25 and 𝑃(𝐴 ∩ 𝐵) = 0.12.
(i) Calculate the value of 𝑃(𝐵).
(ii) Give a reason why 𝐴 and 𝐵 are not independent.
(iii) Calculate the value of 𝑃(𝐴 ∩ 𝐵 ).
3. For two events 𝐴 and 𝐵, let 𝑃(𝐴) = 0.4, 𝑃(𝐵) = 𝑝 and 𝑃(𝐴 ∪ 𝐵) = 0.6
(i) Find 𝑝 so that 𝐴 and 𝐵 are independent events.
(ii) For what value of 𝑝 are 𝐴 and 𝐵 mutually exclusive?
4. Let A and B be two events such that P(A ∪ B) = , P(A ∩ B) = and P(A ∩ B ) = .
(i) Find P(𝐴|𝐵). (ii) Check the independence of 𝐴 and 𝐵.
5. Ninety percent of flights depart on time. Eighty percent of flights arrive on time. Seventy-five percent
of flights depart on time and arrive on time. What is the probability that
(i) A flight will arrive on time if it departed on time.
(ii) A flight will depart on time if it arrived on time.
(iii) A flight will arrive on time if it didn't depart on time.
(iv) Are the events departing on time and arriving on time independent?
6. Three cards are drawn in succession, without replacement, from an ordinary deck of playing cards. Find
the probability that the event 𝐴1 ∩ 𝐴2 ∩ 𝐴3 occurs, where 𝐴1 is the event that the first card is a red ace,
𝐴2 is the event that the second card is a 10 or a jack, and 𝐴3 is the event that the third card is greater
than 3 but less than 7.
7. A problem in statistics is given to five students. Their chances of solving it are , , , and . What is
the probability that the problem will be solved?
8. A computer program is tested by 5 independent tests. If there is an error, these tests will discover it with
probabilities 0.1, 0.2, 0.3, 0.4, and 0.5, respectively. Suppose that the program contains an error. What
is the probability that it will be found
(i) by at least one test?
(ii) by at least two tests?
(iii) by all five tests?
9. A bag contains 10 balls, two of which are red, three are blue, and five are black. Three balls are drawn
at random from the bag. What is the probability that
(i) the three balls are of different colors.
(ii) two balls are of the same color.
(iii) the balls are all of the same color.
11. Under good weather conditions, 80% of flights arrive on time. During bad weather, only 30% of flights
arrive on time. Tomorrow, the chance of good weather is 60%. What is the probability that your flight
will arrive on time?
12. A computer maker receives parts from three suppliers, S1, S2, and S3. Fifty percent come from S1,
twenty percent from S2, and thirty percent from S3. Among all the parts supplied by S1, 5% are
defective. For S2 and S3, the portion of defective parts is 3% and 6%, respectively.
(i) What portion of all the parts is defective?
(ii) A customer complains that a certain part in her recently purchased computer is
defective. What is the probability that it was supplied by S1?
13. A factory has four machines; the numbers of pieces produced per day by these are 1000, 1200, 1800 and
2000, respectively. The first machine produces on average 1% defective pieces, the second 0.5%, the
third 0.5% and the fourth 1%. If a piece selected at random is defective, what is the probability that it was
produced by the fourth machine?
14. An insurance company insured 2000 motorcycle drivers, 4000 car drivers, and 6000 truck drivers. The
probability of an accident is 0.01, 0.03, and 0.15 respectively. One of the insured persons has an accident.
What is the probability that he is a motorcycle driver?
15. Suppose the following three boxes are given: Box A contains 3 red and 5 white marbles. Box B contains
2 red and 1 white marbles. Box C contains 2 red and 3 white marbles. A box is selected at random, and
a marble is randomly drawn from the box. If the marble is red, find the probability that it came from
box A.
16. The contents of bags I, II, and III are as follows: 1 white, 2 black, and 3 red balls, 2 white, 1 black, and
1 red balls, and 4 white, 5 black, and 3 red balls. One bag is chosen at random and two balls are drawn
from it. They happen to be white and red. What is the probability that they come from bags I, II, or III?
Counting Rules
In many of the examples in this chapter, it is easy to determine the number of outcomes in
each event. In more complicated examples, determining the outcomes in the sample space (or an
event) becomes more difficult. Instead, counts of the numbers of outcomes in the sample space
and various events are used to analyze the random experiments. These methods are referred to
as counting rules. Some simple rules can be used to simplify the calculations.
Counting Rule 1
If any one of 𝑘 different mutually exclusive and collectively exhaustive events can occur on each
of 𝑛 trials, the number of possible outcomes is equal to
𝑘ⁿ
Example 2.29 Suppose you toss a coin five times. What is the number of different possible outcomes
(the sequences of heads and tails)?
Solution:
If you toss a two-sided coin five times, the number of outcomes is
𝑁 = 2⁵ = 2 × 2 × 2 × 2 × 2 = 32
Example 2.30 Suppose you roll a die twice. How many different possible outcomes can occur?
Solution:
Each roll has six possible results, so the number of outcomes is
𝑁 = 6² = 6 × 6 = 36
Counting Rule 2
If there are 𝑘1 events on the first trial, 𝑘2 events on the second trial, … , and 𝑘𝑛 events on the
𝑛th trial, then the number of possible outcomes is
𝑁 = 𝑘1 × 𝑘2 × 𝑘3 × ··· × 𝑘𝑛
Example 2.31 In how many different ways can you seat 8 people in a row at a dinner
table?
Solution:
For the first seat, there are eight choices. For the second, there are seven remaining
choices, since one person has already been seated. For the third seat, there are 6 choices, since two people
are already seated. By the time we get to the last seat, there is only one person left.
Therefore, using the Multiplicative Rule above, we get
𝑁 = 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1 = 40320
Example 2.32 Suppose there are 30 candidates that are competing for three
executive positions. How many different ways can you fill the three positions?
Solution:
Since there are three executive positions and 30 candidates, let 𝑘1 be the number of
candidates that are available to fill the first position, 𝑘2 the number of candidates remaining to
fill the second position, and 𝑘3 the number of candidates remaining to fill the third position.
Hence, we have 𝑘1 = 30, 𝑘2 = 29 and 𝑘3 = 28. Then the number of different ways to fill the
three executive positions with the given candidates is
𝑁 = 𝑘1 × 𝑘2 × 𝑘3 = 30 × 29 × 28 = 24360
Example 2.33 In a certain state, automobile license plates display three letters
followed by three digits. How many such plates are possible if repetition of the
letters and digits is allowed?
Solution:
There are six choices, one for each letter or digit on the license plate. At the first
stage, we choose a letter (from 26 possible choices); at the second stage, another letter (again
from 26 choices); at the third stage, another letter (26 choices); at the fourth stage, a digit (from
10 possible choices); at the fifth stage, a digit (again from 10 choices); and at the sixth stage,
another digit (10 choices). By the Fundamental Counting Principle, the number of possible license
plates is
𝑁 = 26 × 26 × 26 × 10 × 10 × 10 = 17576000
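The count changes when letters may not repeat, because each later letter position then has one fewer choice. The sketch below computes both cases side by side; the variable names are illustrative, and `math.perm` requires Python 3.8 or later.

```python
from math import perm

letters, digits = 26, 10

# Letters may repeat: 26 choices at each of the three letter positions.
plates_with_repeats = letters**3 * digits**3

# Letters may not repeat: 26 * 25 * 24 ordered choices; digits may still repeat.
plates_without_letter_repeats = perm(letters, 3) * digits**3

print(plates_with_repeats, plates_without_letter_repeats)
```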
Counting Rule 3
The number of ways of arranging in order 𝑛 objects of which 𝑝 are alike is
𝑛!/𝑝! = [𝑛 × (𝑛 − 1) × (𝑛 − 2) × (𝑛 − 3) × ··· × 3 × 2 × 1] / [𝑝 × (𝑝 − 1) × (𝑝 − 2) × (𝑝 − 3) × ··· × 3 × 2 × 1]
For example, consider the letters 𝐴, 𝐵, 𝐶, 𝐷, none of which are alike. The first letter can be chosen in four ways (either 𝐴 or 𝐵 or 𝐶
or 𝐷), the second letter can be chosen in three ways, the third letter can be chosen in two ways, and the fourth
letter can be chosen in only one way. Therefore the number of ways of arranging the four letters is 4 × 3 ×
2 × 1 = 4! = 24. The arrangements are
ABCD, ABDC, ACBD, ACDB, ADBC, ADCB,
BACD, BADC, BCAD, BCDA, BDAC, BDCA,
CABD, CADB, CBAD, CBDA, CDAB, CDBA,
DABC, DACB, DBAC, DBCA, DCAB, DCBA.
Example 2.34 If a set of six books is to be placed on a shelf, in how many ways
can the six books be arranged?
Solution:
To begin, you must realize that any of the six books could occupy the first position
on the shelf. Once the first position is filled, there are five books to choose from in filling the
second position. You continue this assignment procedure until all the positions are occupied. The
number of ways that you can arrange six books is
𝑁 = 6! = 6 × 5 × 4 × 3 × 2 × 1 = 720
Example 2.35 A witness reported that a car seen speeding away from the scene
of the crime had a number plate beginning with V or W, the digits were 4, 7 and 8 and
the end letters were A, C, E. He could not, however, remember the order of the
digits or the end letters. How many cars would need to be checked to be sure of including the suspect car?
Solution:
There are 3! ways of arranging the digits 4, 7, 8 and 3! ways of arranging the letters A, C, E. There are two
choices for the initial letter. The total number of different plates is 𝑁 = 2 × 3! × 3! = 72. So, 72 cars would
need to be checked.
Example 2.36 A hospital operating room needs to schedule three knee surgeries
and two hip surgeries in a day. We denote a knee and hip surgery as k and h,
respectively. What is the number of possible sequences of three knee and two
hip surgeries?
Solution:
The number of possible sequences of three knee and two hip surgeries is
5!/(2! 3!) = 10
The 10 sequences are easily summarized as
{𝑘𝑘𝑘ℎℎ, 𝑘𝑘ℎ𝑘ℎ, 𝑘𝑘ℎℎ𝑘, 𝑘ℎ𝑘𝑘ℎ, 𝑘ℎ𝑘ℎ𝑘, 𝑘ℎℎ𝑘𝑘, ℎ𝑘𝑘𝑘ℎ, ℎ𝑘𝑘ℎ𝑘, ℎ𝑘ℎ𝑘𝑘, ℎℎ𝑘𝑘𝑘}
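Example 2.37 How many distinct permutations can be formed from all the letters
of each word?
(i) can (ii) bottle (iii) banana (iv) statistics
Solution:
(i) 3! = 6
(ii) 6!/2! = 360 (the letter t appears twice)
(iii) 6!/(3! 2!) = 60 (a appears three times, n twice)
(iv) 10!/(3! 3! 2!) = 50400 (s and t appear three times each, i twice)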
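The rule behind this example divides 𝑛! by the factorial of each letter's multiplicity. A minimal sketch is given below; the function name `distinct_permutations` is a hypothetical helper, not something defined in the notes.

```python
from collections import Counter
from math import factorial

def distinct_permutations(word):
    """n! divided by the factorial of each repeated letter's count."""
    count = factorial(len(word))
    for repeats in Counter(word).values():
        count //= factorial(repeats)  # exact: each partial quotient is an integer
    return count

for word in ["can", "bottle", "banana", "statistics"]:
    print(word, distinct_permutations(word))
```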
In many instances you need to know the number of ways in which a subset of an entire group
of items can be arranged in order. Each possible arrangement is called a permutation.
Example 2.38 How many permutations consisting of five letters can be made from eight letters?
Solution:
Again, there are eight choices for the first position, seven for the second, six for the third, five for the fourth,
and four for the fifth. So, the number of choices is
8P5 = 8!/(8 − 5)! = 8 × 7 × 6 × 5 × 4 = 6720.
Example 2.39 A club has nine members. In how many ways can a president, vice president, and
secretary be chosen from the members of this club?
Solution:
We need the number of ways of selecting three members, in order, for the positions of president, vice
president, and secretary from the nine club members. This number is
9P3 = 9!/(9 − 3)! = 9 × 8 × 7 = 504.
Example 2.40 If you have six books, but the shelf has four places for four books only, in how many
ways can you arrange these books on the shelf?
Solution:
The number of ordered arrangements of four books selected from six books is equal to
6P4 = 6!/(6 − 4)! = 6 × 5 × 4 × 3 = 360.
Sometimes we are interested in counting the number of ordered sequences for objects that are not all
different. The following result is a useful, general calculation. In many situations, you are not interested in
the order of the outcomes but only in the number of ways that 𝑥 items can be selected from 𝑛 items,
irrespective of order. Each possible selection is called a combination.
Example 2.41 If you have six different books, but the shelf has room for only four, and the
order of the books on the shelf is irrelevant, in how many ways can you choose the
books for the shelf?
Solution:
The number of combinations of four books selected from six books is equal to
6C4 = 6!/(4! (6 − 4)!) = (6 × 5 × 4 × 3 × 2 × 1)/((4 × 3 × 2 × 1)(2 × 1)) = 15.
Example 2.42 A bin of 50 manufactured parts contains 3 defective parts. A sample of 6 parts is
selected from the 50 parts without replacement. How many different samples are
there of size 6 that contain exactly 2 defective parts?
Solution:
A subset containing exactly 2 defective parts can be formed by first choosing the 2 defective parts from the
three defective parts. This step can be completed in
3C2 = 3!/(2! (3 − 2)!) = 3 ways.
Then, the second step is to select the remaining 4 parts from the 47 acceptable parts in the bin. The second
step can be completed in
47C4 = 47!/(4! (47 − 4)!) = 178365 ways.
Therefore, from the multiplication rule, the number of subsets of size 6 that contain exactly 2 defective parts
is
3C2 × 47C4 = 3 × 178365 = 535095.
Example 2.43 A group of 25 campers contains 15 women and 10 men. In how many ways can a
scouting party of 5 be chosen if it must consist of 3 women and 2 men?
Solution:
Three women can be chosen from the 15 women in the group in 15C3 ways, and two men can be chosen
from the 10 men in the group in 10C2 ways. Thus, the number of ways of choosing the scouting party is
15C3 × 10C2 = 455 × 45 = 20475.
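Counts of this kind, where a selection is split into independent sub-selections, are products of binomial coefficients. The sketch below reproduces the permutation and combination counts of the preceding examples with `math.perm` and `math.comb` (Python 3.8+); the variable names are illustrative.

```python
from math import comb, perm

# Ordered selections (permutations): nPx = n! / (n - x)!
officers = perm(9, 3)          # president, vice president, secretary from 9
shelf_ordered = perm(6, 4)     # 4 of 6 books in order

# Unordered selections (combinations): nCx = n! / (x! (n - x)!)
shelf_unordered = comb(6, 4)   # 4 of 6 books, order irrelevant

# Multiplication rule applied to two independent sub-selections.
samples_two_defective = comb(3, 2) * comb(47, 4)   # 2 of 3 defective, 4 of 47 good
scouting_parties = comb(15, 3) * comb(10, 2)       # 3 of 15 women, 2 of 10 men

print(officers, shelf_ordered, shelf_unordered,
      samples_two_defective, scouting_parties)
```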
𝑃(𝐸) = 𝑛(𝐸)/𝑛(𝑆) = (2 × 9!)/10! = 0.2
Example 2.45 If a four-digit number is formed from the digits 1, 2, 3, 4 and 5 and repetition is not allowed, find
the probability that the number is divisible by 5.
Solution:
Let the possibility space be 𝑆; then 𝑛(𝑆) = 5P4 = 120. Let 𝐸 be the event “the number is divisible by 5”.
If the number is divisible by 5 then it must end with the digit 5.
𝑛(𝐸) = number of ways of filling the first three places from the digits 1, 2, 3, 4 = 4P3 = 24
So,
𝑃(𝐸) = 𝑛(𝐸)/𝑛(𝑆) = 24/120 = 0.2
∴ 𝑃(vowels together) = 120960/4989600 = 0.024
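Definition:
Assume 𝑁 = 𝑁1 + 𝑁2 + 𝑁3 + ··· + 𝑁𝑟 objects of which 𝑁1 are of one type, 𝑁2 are of a second
type, …, and 𝑁𝑟 are of an 𝑟th type. If 𝑛 = 𝑛1 + 𝑛2 + ··· + 𝑛𝑟 objects are chosen at random without
replacement, the probability of simultaneously choosing 𝑛1 from 𝑁1, 𝑛2 from 𝑁2, and so on, is
𝑃 = [C(𝑁1, 𝑛1) × C(𝑁2, 𝑛2) × ··· × C(𝑁𝑟, 𝑛𝑟)] / C(𝑁, 𝑛),
where C(𝑎, 𝑏) = 𝑎!/(𝑏! (𝑎 − 𝑏)!) is the number of combinations of 𝑏 items chosen from 𝑎.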
Example 2.47 A box contains 8 red balls, 3 white balls and 9 blue balls. 3 balls are drawn at random
without replacement, determine the probability that
(i) all 3 balls are red.
(ii) all 3 balls are white.
(iii) 2 balls are red and 1 ball is blue.
(iv) at least 1 ball is white.
(v) one ball of each color is drawn.
Solution:
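A possible solution, sketched here under the assumption that all C(20, 3) samples of three balls are equally likely, counts the favourable colour combinations with binomial coefficients. The helper `prob` and the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

counts = {"red": 8, "white": 3, "blue": 9}
N = sum(counts.values())   # 20 balls in the box
n = 3                      # balls drawn without replacement
total = comb(N, n)         # 1140 equally likely samples

def prob(red=0, white=0, blue=0):
    """P(exactly this many balls of each colour in a sample of n)."""
    assert red + white + blue == n
    ways = (comb(counts["red"], red) * comb(counts["white"], white)
            * comb(counts["blue"], blue))
    return Fraction(ways, total)

p_all_red = prob(red=3)                        # (i)
p_all_white = prob(white=3)                    # (ii)
p_two_red_one_blue = prob(red=2, blue=1)       # (iii)
# (iv) complement of "no white ball": choose all 3 from the 17 non-white balls.
p_at_least_one_white = 1 - Fraction(comb(N - counts["white"], n), total)
p_one_of_each = prob(red=1, white=1, blue=1)   # (v)

print(p_all_red, p_all_white, p_two_red_one_blue,
      p_at_least_one_white, p_one_of_each)
```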
Example 2.48 A bag contains 7 white, 6 red, and 5 black balls. Two balls are drawn at random. Find
the probability that they will both be white.
Solution:
Total number of balls = 7 + 6 + 5 = 18.
𝑃(both white) = 7C2/18C2 = 21/153 = 7/51.
Example 2.49 Four people are chosen at random from a group containing 3 men, 2 women, and 4
children. Show that the chance that exactly two of them will be children is 10/21.
Solution:
The total number of people is 9, and the number of people who are women or men is 5.
𝑃(exactly two children) = (4C2 × 5C2)/9C4 = (6 × 10)/126 = 10/21.
Sheet (5)
(Probability - Counting Rules)
1. A student can select one of 6 different mathematics books, one of 3 different chemistry books and one
of 4 different science books. In how many different ways can a student select a book of mathematics, a
book of chemistry and a book of science?
2. A machine operator must make four safety checks before machining a part. It does not matter in which
order the checks are made. In how many different ways can the operator make the checks?
6. In how many ways can gold, silver and bronze medals be awarded for a race run by 8 people?
7. Find the number of ways in which ten different books can be shared between a boy and girl if each is to
receive an even number of books.
8. A student is taking a probability test in which 7 questions out of 10 must be answered. In how many
ways can the student answer the exam if
(i) any 7 questions may be selected?
(ii) the first 2 questions must be selected?
(iii) the student must choose 3 questions from the first 5 and 4 questions from the last 5?
9. Suppose the question is matching: there are 6 questions and 10 possible choices. Now, how many
ways can you match?
10. How many 3-digit numbers can be formed from the digits of the set {2, 3, 4, 5, 6, 7, 8, 9} if:
(i) No repetitions are allowed.
(ii) No repetitions are allowed and the number must be odd.
(iii) Repetitions are allowed and the number must be even.
(iv) No repetitions are allowed and the number must be greater than 500.
(v) Repetitions are allowed and the number must be less than 400.
(vi) No repetitions are allowed and the number must be divisible by 5.
(vii) No repetitions are allowed and the number must be divisible by 3.
11. In how many ways can the letters of the word FACETIOUS be arranged in a line? What is the
probability that an arrangement begins with F and ends with S?
12. Seven cards labeled A, B, C, D, E, F, G, are thoroughly shuffled and then dealt out face upward on table.
Find the probability that
(i) the first three cards to appear are the cards labeled A, B, C in that order.
(ii) the first three cards to appear are the cards labeled A, B, C but in any order.
(iii) the seven cards appear in their original orders: A, B, C, D, E, F, G.
13. Ezz, Omar, Ahmed and Donia decided to take part in a competition to design a bridge joining
two urban areas. Fifty competitors have entered the contest and there are only three prizes. Find the number
of different ways in which the prizes could be won if neither Ezz nor Omar wins a prize and both Ahmed
and Donia win a prize.
14. On a shelf there are four mathematics books and eight English books
(i) If the books are to be arranged so that the mathematics books are together, in how many ways
can this be done?
(ii) What is the probability that all mathematics books will not be together?
15. From a group of six men and eight girls, five people are chosen at random. Find the probability that there
are more men chosen than girls.
16. A drum contains 3 black balls, 5 red balls and 6 green balls. If 4 balls are selected at random what is the
probability that the 4 selected contain
(i) no red ball?
(ii) exactly 1 black ball?
(iii) exactly 1 red ball and exactly 2 green balls?
17. Among eighteen computers in some store, six have defects. Five randomly selected computers are
bought for the university lab. Compute the probability that
(i) all five computers have no defects.
(ii) only one computer has defect.
(iii) at most 4 computers have defects.
18. An internet search engine looks for a keyword in 9 databases, searching them in a random order. Only
5 of these databases contain the given keyword. Find the probability that it will be found in at least 2 of
the first 4 searched databases.
19. A hand of 5 cards is dealt from a well-shuffled deck. What is the probability that the hand contains:
(i) no aces?
(ii) 5 clubs?
(iii) at least one ace?
(iv) 3 clubs and 2 hearts?
20. Five men in a company of twenty are graduated. If 3 men are picked out of 20 at random, what is the
probability that
(i) they are all graduated?
(ii) only one is graduated?
(iii) at least one is graduated?
Chapter 3
Random Variables
Introduction
The outcome from a random experiment is summarized by a simple number. In many of the examples
of random experiments that we have considered, the sample space has been a description of possible
outcomes. In some cases, descriptions of outcomes are sufficient, but in other cases, it is useful to associate
a number with each outcome in the sample space. Because the particular outcome of the experiment is not
known in advance, the resulting value of our variable is not known in advance. For this reason, the variable
that associates a number with the outcome of a random experiment is referred to as a random variable.
Definition 1 A random variable is a function that assigns a real number to each outcome in the sample
space of a random experiment.
A random variable is denoted by an uppercase letter such as 𝑋. After an experiment is conducted, the
measured value of the random variable is denoted by a lowercase letter such as 𝑥 = 70 milliamperes.
Sometimes a measurement (such as current in a copper wire or length of a machined part) can assume
any value in an interval of real numbers (at least theoretically). Then arbitrary precision in the measurement
is possible. Of course, in practice, we might round off to the nearest tenth or hundredth of a unit. The
random variable that represents this measurement is said to be a continuous random variable.
The range of the random variable includes all values in an interval of real numbers; that is, the range
can be thought of as a continuum. In other experiments, we might record a count such as the number of
transmitted bits that are received in error. Then, the measurement is limited to integers. Or we might record
that a proportion such as 0.0042 of the 10,000 transmitted bits were received in error. Then, the
measurement is fractional, but it is still limited to discrete points on the real line. Whenever the
measurement is limited to discrete points on the real line, the random variable is said to be a discrete
random variable.
For a discrete random variable 𝑋 with possible values 𝑥1, 𝑥2, …, 𝑥𝑛 and probabilities 𝑝𝑖 = 𝑓(𝑥𝑖) = 𝑃(𝑋 = 𝑥𝑖),
𝑝1 + 𝑝2 + 𝑝3 + 𝑝4 + ⋯ + 𝑝𝑛 = 1 or ∑ 𝑓(𝑥𝑖) = 1
𝑓(𝑥) is called the discrete probability distribution (probability density function (p.d.f)) for 𝑋 and it spells
out how a total probability of 1 is distributed over several values of the random variable.
The cumulative distribution function of 𝑋 with probability distribution 𝑓(𝑥) is the function defined by
𝐹(𝑥) = 𝑃(𝑋 ≤ 𝑥) = ∑ 𝑓(𝑡), summed over all 𝑡 ≤ 𝑥
The mean of 𝑋 is
𝜇 = ∑ 𝑥𝑖 𝑓(𝑥𝑖)
Other names for the mean are average or expected value 𝐸(𝑋). The variance is denoted by 𝜎² and defined as
𝜎² = ∑ (𝑥𝑖 − 𝜇)² 𝑓(𝑥𝑖) = 𝐸(𝑋²) − [𝐸(𝑋)]²
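Theorem 3.1 If 𝑋 is a discrete random variable with probability distribution 𝑓(𝑥), expected value 𝐸(𝑋) and variance
𝑉𝑎𝑟(𝑋), then the expected value and variance of the function 𝑔(𝑋) = 𝑎𝑋 + 𝑏 are given by
𝐸(𝑎𝑋 + 𝑏) = 𝑎𝐸(𝑋) + 𝑏, 𝑉𝑎𝑟(𝑎𝑋 + 𝑏) = 𝑎²𝑉𝑎𝑟(𝑋).
For the expected value,
𝐸(𝑎𝑋 + 𝑏) = ∑ (𝑎𝑥𝑖 + 𝑏) 𝑓(𝑥𝑖)
= 𝑎 ∑ 𝑥𝑖 𝑓(𝑥𝑖) + 𝑏 ∑ 𝑓(𝑥𝑖)
= 𝑎𝐸(𝑋) + 𝑏.
For the variance,
𝑉𝑎𝑟(𝑎𝑋 + 𝑏) = 𝐸[(𝑎𝑋 + 𝑏)²] − [𝐸(𝑎𝑋 + 𝑏)]²
= 𝑎²𝐸(𝑋²) + 2𝑎𝑏𝐸(𝑋) + 𝑏² − 𝑎²[𝐸(𝑋)]² − 2𝑎𝑏𝐸(𝑋) − 𝑏²
= 𝑎²{𝐸(𝑋²) − [𝐸(𝑋)]²} = 𝑎²𝑉𝑎𝑟(𝑋).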
Example 1 Consider an experiment of tossing 3 fair coins and counting the number of heads. Let 𝑋
be the discrete random variable that denotes the number of heads.
(i) Write the values of the random variable 𝑋.
(ii) Find the probability distribution 𝑓(𝑥) and the cumulative distribution 𝐹(𝑥).
(iii) Plot 𝑓(𝑥) and 𝐹(𝑥).
(iv) Find the mean and variance of 𝑋 and of 𝑔(𝑋) = 2𝑋 − 3.
Mean: 𝐸(𝑋) = ∑ 𝑥𝑓(𝑥) = 12/8 = 3/2.
Variance: 𝑉𝑎𝑟(𝑋) = 𝐸(𝑋²) − [𝐸(𝑋)]² = ∑ 𝑥²𝑓(𝑥) − [∑ 𝑥𝑓(𝑥)]² = 24/8 − 9/4 = 3/4.
Mean of 𝑔(𝑋): 𝐸(2𝑋 − 3) = 2𝐸(𝑋) − 3 = 2(3/2) − 3 = 0.
Variance of 𝑔(𝑋): 𝑉𝑎𝑟(2𝑋 − 3) = 4𝑉𝑎𝑟(𝑋) = 3.
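The distribution behind these values can be rebuilt by enumerating the eight outcomes of three tosses. The sketch below derives 𝑓(𝑥) and 𝐹(𝑥), then the mean and variance, and applies the linear transformation 2𝑋 − 3 used in the worked solution; all names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))          # 8 equally likely outcomes
heads = Counter(o.count("H") for o in outcomes)   # number of heads per outcome

# Probability distribution f(x) and cumulative distribution F(x).
f = {x: Fraction(c, len(outcomes)) for x, c in sorted(heads.items())}
F, running = {}, Fraction(0)
for x, p in f.items():
    running += p
    F[x] = running

mean = sum(x * p for x, p in f.items())
var = sum(x**2 * p for x, p in f.items()) - mean**2

# Linear transformation g(X) = aX + b: E = a*E(X) + b, Var = a^2 * Var(X).
a, b = 2, -3
mean_g = a * mean + b
var_g = a**2 * var

print(f, F, mean, var, mean_g, var_g)
```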
Example 2 A die is loaded such that each even number is three times as likely to occur as each odd
number. The die is rolled once and the score recorded. Let 𝑋 be the random variable denoting
the number that appears on the die face.
(i) Find the probability distribution 𝑓(𝑥) and the cumulative distribution 𝐹(𝑥).
(ii) Plot 𝑓(𝑥) and 𝐹(𝑥).
(iii) Find the mean and variance of the score 𝑋.
(iv) A ‘prize’ is awarded which depends on the score on the die. The value of the prize is
$𝑍 = 3𝑋 − 6. Find the mean and variance of 𝑍.
Solution. 𝑆 = {1, 2, 3, 4, 5, 6}
The random variable is X = {1,2,3,4,5,6}. Then,
1 3
𝑓(1) = 𝑃(𝑋 = 1) = 𝑃(𝑁𝑜. 1) = , 𝑓(2) = 𝑃(𝑋 = 2) = 𝑃(𝑁𝑜. 2) = .
12 12
1 3
𝑓(3) = 𝑃(𝑋 = 3) = 𝑃(𝑁𝑜. 3) = , 𝑓(4) = 𝑃(𝑋 = 4) = 𝑃(𝑁𝑜. 4) = .
12 12
1 3
𝑓(5) = 𝑃(𝑋 = 5) = 𝑃(𝑁𝑜. 5) = , 𝑓(6) = 𝑃(𝑋 = 6) = 𝑃(𝑁𝑜. 6) = .
12 12
𝑥 1 2 3 4 5 6 ∑
1 3 1 3 1 3
𝑓(𝑥) 1
12 12 12 12 12 12
1 4 5 8 9 12
𝐹(𝑥) ---
12 12 12 12 12 12
1 6 3 12 5 18 45
𝑥𝑓(𝑥)
12 12 12 12 12 12 12
1 12 9 48 25 108 203
𝑥 𝑓(𝑥)
12 12 12 12 12 12 12
\[
F(x) =
\begin{cases}
0, & x < 1, \\
1/12, & 1 \le x < 2, \\
4/12, & 2 \le x < 3, \\
5/12, & 3 \le x < 4, \\
8/12, & 4 \le x < 5, \\
9/12, & 5 \le x < 6, \\
1, & 6 \le x.
\end{cases}
\]
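Mean: \( E(X) = \sum x f(x) = \frac{45}{12} = \frac{15}{4} = 3.75 \).
Variance: \( Var(X) = E(X^2) - [E(X)]^2 = \sum x^2 f(x) - \big[\sum x f(x)\big]^2 = \frac{203}{12} - \left(\frac{45}{12}\right)^2 = \frac{137}{48} \approx 2.854 \).
Mean of 𝑍: \( E(Z) = E(3X - 6) = 3E(X) - 6 = 3\left(\frac{15}{4}\right) - 6 = \frac{21}{4} = 5.25 \).
Variance of 𝑍: \( Var(Z) = Var(3X - 6) = 9\,Var(X) = \frac{411}{16} \approx 25.69 \).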
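The table for the loaded die can be reproduced from the relative weights alone. The sketch below is illustrative only (the variable names are assumptions, not from the notes): it normalises the weights into 𝑓(𝑥), accumulates 𝐹(𝑥), and then applies 𝐸(𝑎𝑋 + 𝑏) = 𝑎𝐸(𝑋) + 𝑏 and 𝑉𝑎𝑟(𝑎𝑋 + 𝑏) = 𝑎²𝑉𝑎𝑟(𝑋) to the prize 𝑍 = 3𝑋 − 6.

```python
from fractions import Fraction

# Relative weights: each even face is three times as likely as each odd face
weights = {x: (3 if x % 2 == 0 else 1) for x in range(1, 7)}
total = sum(weights.values())  # 12

# Probability function f(x), normalised so that the probabilities sum to 1
f = {x: Fraction(w, total) for x, w in weights.items()}

# Cumulative distribution F(x) = P(X <= x)
F = {}
running = Fraction(0)
for x in sorted(f):
    running += f[x]
    F[x] = running

mean_x = sum(x * p for x, p in f.items())
var_x = sum(x**2 * p for x, p in f.items()) - mean_x**2

# Prize Z = 3X - 6: mean and variance from the linear-transformation rules
mean_z = 3 * mean_x - 6
var_z = 9 * var_x

print("E(X) =", mean_x, " Var(X) =", var_x)
print("E(Z) =", mean_z, " Var(Z) =", var_z)
```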
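Solution.
(i) Since \( \sum f(x) = 1 \), we have
\[ 0 + k + 2k + 2k + 3k + k^2 + 2k^2 + (7k^2 + k) = 1 \]
\[ \Rightarrow 10k^2 + 9k - 1 = 0 \Rightarrow (10k - 1)(k + 1) = 0 \Rightarrow k = \frac{1}{10}, \]
since a probability cannot be negative, which rules out 𝑘 = −1.
(ii)
\[ P(X \ge 6) = f(6) + f(7) = 2k^2 + (7k^2 + k) = 9k^2 + k = \frac{9}{100} + \frac{10}{100} = \frac{19}{100}, \]
\[ P(X < 6) = 1 - P(X \ge 6) = 1 - \frac{19}{100} = \frac{81}{100}. \]
(iii)
\[ P(X \le 1) = k = \frac{1}{10} < \frac{1}{2}, \qquad P(X \le 2) = k + 2k = \frac{3}{10} < \frac{1}{2}, \]
\[ P(X \le 3) = k + 2k + 2k = \frac{5}{10} = \frac{1}{2}, \qquad P(X \le 4) = k + 2k + 2k + 3k = \frac{8}{10} > \frac{1}{2}. \]
∴ The minimum value of 𝑥 such that 𝑃(𝑋 ≤ 𝑥) > 1/2 is 4.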
Next, we introduce the most commonly used families of discrete distributions. Remarkably, very
different phenomena can be adequately described by the same mathematical model, or family of
distributions. For example, as we shall see below, the number of virus attacks, received e-mails, error
messages, network blackouts, telephone calls, traffic accidents, earthquakes, and so on can all be modeled
by the same Poisson family of distributions.
Binomial Distribution
Consider an experiment consisting of 𝑛 independent trials, and let the random variable 𝑋 denote the number of
successes in these 𝑛 trials. Let 𝑝 be the probability of a success and 𝑞 that of a failure in a single trial, so
that 𝑝 + 𝑞 = 1, with 𝑝 constant for every trial. The probability function is
\[ P(X = x) = f(x) = \binom{n}{x} p^x q^{n-x}, \qquad x = 0, 1, \dots, n, \]
where
𝑃(𝑋 = 𝑥) = probability of exactly 𝑥 successes in 𝑛 trials
𝑛 = number of trials
𝑥 = number of successes
𝑝 = probability of success for any given trial
𝑞 = probability of failure for any given trial
\( \binom{n}{x} = nCx = \frac{n!}{x!\,(n-x)!} \)
Expectation (mean): 𝐸(𝑋) = 𝑛𝑝
Variance: 𝜎² = 𝑛𝑝𝑞
Example 4 One ship out of 9 was sunk on average in making a certain voyage. What was the
probability that exactly 3 out of a convoy of 6 ships would arrive safely?
Example 5 Assume that one telephone number out of fifteen called between 2 P.M. and 3 P.M. on
weekdays is busy. What is the probability that, if 6 randomly selected telephone numbers
are called, (i) not more than three, (ii) at least three of them will be busy?
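Solution.
Let 𝑝 be the probability of a telephone number being busy between 2 P.M. and 3 P.M. on weekdays. Then
\[ p = \frac{1}{15}, \qquad q = 1 - \frac{1}{15} = \frac{14}{15}, \qquad n = 6. \]
(i) The probability that not more than three will be busy is
\[ P(X \le 3) = f(0) + f(1) + f(2) + f(3) \]
\[ = \binom{6}{0}\left(\tfrac{1}{15}\right)^0\left(\tfrac{14}{15}\right)^6 + \binom{6}{1}\left(\tfrac{1}{15}\right)^1\left(\tfrac{14}{15}\right)^5 + \binom{6}{2}\left(\tfrac{1}{15}\right)^2\left(\tfrac{14}{15}\right)^4 + \binom{6}{3}\left(\tfrac{1}{15}\right)^3\left(\tfrac{14}{15}\right)^3 = 0.9997. \]
(ii) The probability that at least three of them will be busy is
\[ P(X \ge 3) = f(3) + f(4) + f(5) + f(6) \]
\[ = \binom{6}{3}\left(\tfrac{1}{15}\right)^3\left(\tfrac{14}{15}\right)^3 + \binom{6}{4}\left(\tfrac{1}{15}\right)^4\left(\tfrac{14}{15}\right)^2 + \binom{6}{5}\left(\tfrac{1}{15}\right)^5\left(\tfrac{14}{15}\right)^1 + \binom{6}{6}\left(\tfrac{1}{15}\right)^6\left(\tfrac{14}{15}\right)^0 = 0.005. \]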
Example 6 The probability that a certain kind of component will survive a shock test is 3/4. Find the
probability that exactly 2 of the next 4 components tested survive.
Solution.
Assuming that the tests are independent, with 𝑝 = 3/4 and 𝑞 = 1/4 for each of the 𝑛 = 4 tests, we obtain
\[ P(X = 2) = \binom{4}{2}\left(\frac{3}{4}\right)^2\left(\frac{1}{4}\right)^2 = \frac{27}{128}. \]
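The binomial probabilities in Examples 4 to 6 can be evaluated directly from the probability function. The sketch below is a hedged illustration: `binomial_pmf` is a helper name chosen here, not something defined in the notes, and for Example 4 it assumes that each ship arrives safely with probability 8/9, independently of the others.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): nCx * p^x * q^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Example 4 (assumption: "success" = a ship arrives safely, p = 8/9)
exactly_three_safe = binomial_pmf(3, 6, 8 / 9)

# Example 5: n = 6 calls, each busy with probability 1/15
at_most_three = sum(binomial_pmf(x, 6, 1 / 15) for x in range(0, 4))
at_least_three = sum(binomial_pmf(x, 6, 1 / 15) for x in range(3, 7))

# Example 6: exactly 2 of 4 components survive, p = 3/4
exactly_two = binomial_pmf(2, 4, 3 / 4)

print(f"Example 4: {exactly_three_safe:.4f}")
print(f"Example 5: {at_most_three:.4f}, {at_least_three:.4f}")
print(f"Example 6: {exactly_two:.4f}")
```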
Poisson Distribution
If the parameters 𝑛 and 𝑝 of a binomial distribution are known, we can find the distribution. But in
situations where 𝑛 is very large and 𝑝 is very small, applying the binomial distribution is very
laborious. However, if we let 𝑛 → ∞ and 𝑝 → 0 such that 𝑛𝑝 always remains finite, say 𝜆, we
get the Poisson approximation to the binomial distribution:
\[ P(X = x) = f(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \]
where 𝜆 = 𝑛𝑝 and 𝑥 = 0, 1, 2, …
Example 7 If the probability of producing a defective screw is 0.01, what is the probability that a lot
of 200 screws will contain more than 2 defectives?
Solution.
The probability of getting a defective screw is 𝑝 = 0.01, with 𝑛 = 200, so 𝜆 = 𝑛𝑝 = 2.
The probability that a lot of screws will contain more than 2 defectives is
\[ P(X > 2) = 1 - P(X \le 2) = 1 - [f(2) + f(1) + f(0)] = 1 - e^{-2}\left[\frac{2^2}{2!} + \frac{2^1}{1!} + \frac{2^0}{0!}\right] = 1 - 5e^{-2} \approx 0.3233. \]
Solution.
Probability of success 𝑝 = 5% = 0.05, 𝑛 = 100, 𝜆 = 𝑛𝑝 = (100)(0.05) = 5
Guarantee: 𝑋 not more than 2 ⇒ 𝑋 = 0, 1, 2
Example 9 Assume that the probability of an individual coal miner being injured in a certain way in
a mine accident during a year is 1/2400. Use the Poisson distribution to calculate the
probability that in a mine employing 200 miners there will be at least one such
accident in a year.
Solution.
Here 𝑝 = 1/2400, 𝑛 = 200 and
\[ \lambda = np = \frac{200}{2400} = \frac{1}{12} = 0.083. \]
\[ P(\text{at least one accident}) = 1 - P(\text{no accident}) = 1 - f(0) = 1 - \frac{e^{-0.083}(0.083)^0}{0!} = 0.08. \]
Example 10 It is known from past experience that in a certain plant there are on average
4 industrial accidents per month. Find the probability that in a given month there will be fewer
than 3 accidents.
Solution. 𝜆 = 4
\[ P(X < 3) = f(2) + f(1) + f(0) = e^{-4}\left[\frac{4^2}{2!} + \frac{4^1}{1!} + \frac{4^0}{0!}\right] = 13e^{-4} \approx 0.2381. \]
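The Poisson calculations above are easy to check numerically, and the same code shows how close the Poisson approximation is to the exact binomial value for the screw example. This is a minimal sketch under stated assumptions: the function names are illustrative, and the accident example is evaluated with 𝜆 = 4 as in its solution.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam): e^(-lam) * lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p), used here only for comparison."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Screws: more than 2 defectives in a lot of n = 200 with p = 0.01
n, p = 200, 0.01
lam = n * p
poisson_tail = 1 - sum(poisson_pmf(x, lam) for x in range(3))
binomial_tail = 1 - sum(binomial_pmf(x, n, p) for x in range(3))

# Miners: at least one accident among 200 miners, p = 1/2400
at_least_one = 1 - poisson_pmf(0, 200 / 2400)

# Plant accidents: fewer than 3 accidents when lam = 4
fewer_than_three = sum(poisson_pmf(x, 4) for x in range(3))

print(f"screws: Poisson {poisson_tail:.4f}, exact binomial {binomial_tail:.4f}")
print(f"miners: {at_least_one:.4f}")
print(f"plant:  {fewer_than_three:.4f}")
```

With 𝑛 = 200 and 𝑝 = 0.01 the two tail probabilities agree to several decimal places, which is the situation (large 𝑛, small 𝑝) in which the approximation is intended to be used.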
Negative Binomial Distribution
Let us consider an experiment where the properties are the same as those listed for a binomial
experiment, with the exception that the trials will be repeated until a fixed number of successes occur.
Therefore, instead of the probability of 𝑥 successes in 𝑛 trials, where 𝑛 is fixed, we are now interested in
the probability that the 𝑘th success occurs on the 𝑥th trial. Experiments of this kind are called negative
binomial experiments. The number 𝑋 of trials required to produce 𝑘 successes in a negative binomial
experiment is called a negative binomial random variable, and its probability distribution is called the
negative binomial distribution. If repeated independent trials can result in a success with probability 𝑝 and
a failure with probability 𝑞 = 1 − 𝑝, then the probability distribution of the random variable 𝑋, the number
of the trial on which the 𝑘th success occurs, is
\[ f(X = x, k) = \binom{x-1}{k-1} p^{k} q^{x-k}, \qquad x = k, k+1, k+2, \dots \]
\[ E(X) = \frac{k}{p}, \qquad Var(X) = \frac{k(1-p)}{p^2}. \]
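Example 11 In tossing a coin repeatedly, what is the probability of getting the third head on the 8th trial?
Solution.
\[ p = \frac{1}{2}, \quad q = \frac{1}{2}, \quad f(x = 8, k = 3) = \binom{7}{2}\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^5 = \frac{21}{256} \approx 0.0820. \]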
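Example 12 In rolling a die repeatedly, what is the probability of getting the fourth “one” on the 10th
trial?
Solution.
\[ p = \frac{1}{6}, \quad q = \frac{5}{6}, \quad f(x = 10, k = 4) = \binom{9}{3}\left(\frac{1}{6}\right)^4\left(\frac{5}{6}\right)^6 \approx 0.0217. \]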
Example 13 In the NBA (National Basketball Association) championship series, the team that wins four
games out of seven is the winner. Suppose that teams A and B face each other in the
championship games and that team A has probability 0.55 of winning a game over team
B.
(i) What is the probability that team A will win the series in 6 games?
(ii) What is the probability that team A will win the series?
(iii) If teams A and B were facing each other in a regional playoff series, which is
decided by winning three out of five games, what is the probability that team A
would win the series?
Solution.
𝑝 = 0.55 and 𝑞 = 0.45
(i)
\[ f(x = 6, k = 4) = \binom{5}{3}(0.55)^4(1 - 0.55)^{6-4} = 0.1853. \]
(ii)
𝑃(team A wins the championship series)
\[ = f(x = 4, k = 4) + f(x = 5, k = 4) + f(x = 6, k = 4) + f(x = 7, k = 4) \]
\[ = \binom{3}{3}(0.55)^4(1 - 0.55)^0 + \binom{4}{3}(0.55)^4(1 - 0.55)^1 + \binom{5}{3}(0.55)^4(1 - 0.55)^2 + \binom{6}{3}(0.55)^4(1 - 0.55)^3 \]
\[ = 0.0915 + 0.1647 + 0.1853 + 0.1668 = 0.6083. \]
(iii)
𝑃(team A wins the playoff)
\[ = f(x = 3, k = 3) + f(x = 4, k = 3) + f(x = 5, k = 3) \]
\[ = \binom{2}{2}(0.55)^3(1 - 0.55)^0 + \binom{3}{2}(0.55)^3(1 - 0.55)^1 + \binom{4}{2}(0.55)^3(1 - 0.55)^2 \]
\[ = 0.1664 + 0.2246 + 0.2021 = 0.5931. \]
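The series calculations are sums of negative binomial probabilities over the possible game on which the deciding win occurs. A minimal sketch follows; `neg_binomial_pmf` and `series_win_probability` are illustrative helper names introduced here, not part of the notes.

```python
from math import comb

def neg_binomial_pmf(x, k, p):
    """Probability that the k-th success occurs on trial x (x = k, k+1, ...)."""
    return comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)

def series_win_probability(wins_needed, p):
    """Probability of reaching `wins_needed` wins in a series of at most
    2 * wins_needed - 1 games, when each game is won with probability p."""
    max_games = 2 * wins_needed - 1
    return sum(neg_binomial_pmf(x, wins_needed, p)
               for x in range(wins_needed, max_games + 1))

third_head_on_8th = neg_binomial_pmf(8, 3, 0.5)      # coin example
fourth_one_on_10th = neg_binomial_pmf(10, 4, 1 / 6)  # die example
win_in_six = neg_binomial_pmf(6, 4, 0.55)            # series won in game 6
win_best_of_7 = series_win_probability(4, 0.55)
win_best_of_5 = series_win_probability(3, 0.55)

print(f"{third_head_on_8th:.4f} {fourth_one_on_10th:.4f}")
print(f"{win_in_six:.4f} {win_best_of_7:.4f} {win_best_of_5:.4f}")
```

The shorter series gives team A a slightly lower chance of winning, since fewer games leave more room for the weaker team to win by luck.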
Geometric Distributions
If repeated independent trials can result in a success with probability p and a failure with probability
𝑞 = 1 − 𝑝, then the probability distribution of the random variable X, the number of the trial on which the
first success occurs, is
\[ f(X = x) = p\,q^{x-1}, \qquad x = 1, 2, 3, \dots \]
\[ E(X) = \frac{1}{p}, \qquad Var(X) = \frac{1-p}{p^2}. \]
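Example 14 In tossing a coin repeatedly, what is the probability of getting the first head on the 8th trial?
Solution.
\[ p = 0.5, \quad q = 0.5, \quad f(x = 8) = (0.5)(0.5)^{7} = \frac{1}{256} \approx 0.0039. \]
Example 15 In rolling a die repeatedly, what is the probability of getting the first “one” on the 10th trial?
Solution.
\[ p = \frac{1}{6}, \quad q = \frac{5}{6}, \quad f(x = 10) = \left(\frac{1}{6}\right)\left(\frac{5}{6}\right)^{9} \approx 0.0323. \]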
Example 16 For a certain manufacturing process, it is known that, on the average, 1 in every 100
items is defective. What is the probability that the fifth item inspected is the first
defective item found?
Solution.
Using the geometric distribution with 𝑥 = 5 and 𝑝 = 0.01, we have
\[ f(x = 5) = (0.01)(0.99)^{4} \approx 0.0096. \]
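The geometric probabilities in Examples 14 to 16 follow from a one-line function. The sketch below uses an illustrative helper name, `geometric_pmf`, and also checks the formula 𝐸(𝑋) = 1/𝑝 numerically with a long truncated sum; the cutoff of 5000 terms is an arbitrary choice that makes the neglected tail negligible for 𝑝 = 0.01.

```python
def geometric_pmf(x, p):
    """Probability that the first success occurs on trial x (x = 1, 2, ...)."""
    return p * (1 - p) ** (x - 1)

first_head_on_8th = geometric_pmf(8, 0.5)        # Example 14
first_one_on_10th = geometric_pmf(10, 1 / 6)     # Example 15
first_defective_on_5th = geometric_pmf(5, 0.01)  # Example 16

# Numerical check of E(X) = 1/p for p = 0.01 (truncated sum, assumed cutoff)
approx_mean = sum(x * geometric_pmf(x, 0.01) for x in range(1, 5001))

print(f"{first_head_on_8th:.4f} {first_one_on_10th:.4f} {first_defective_on_5th:.4f}")
print(f"approximate mean for p = 0.01: {approx_mean:.3f}")
```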
Chapter 4
Normal Distribution
The most important continuous probability distribution in the entire field of statistics is the normal
distribution. Its graph, called the normal curve, is the bell-shaped curve of the following figure, which
approximately describes many phenomena that occur in nature, industry, and research.
A continuous random variable 𝑋 having the bell-shaped distribution of the following figure is called a
normal random variable. The mathematical equation for the probability distribution of the normal variable
depends on the two parameters 𝜇 and 𝜎, its mean and standard deviation, respectively. Hence, we denote
the values of the density of 𝑋 by 𝑁(𝑥, 𝜇, 𝜎).
The density function of the normal random variable 𝑋 that gives the graph of the normal distribution,
with mean 𝜇 and variance 𝜎², is
\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \qquad -\infty < x < \infty. \]
Since each normally distributed random variable has its own 𝜇 and 𝜎, the shape and the location
of these curves vary; that is, there are infinitely many different normal distributions, and one would need
a separate table of areas (probabilities) under the curve for each variable. The following curves show how
the curve changes as the parameters 𝜇 and 𝜎 vary.
The difficulty encountered in solving integrals of normal density functions necessitates the tabulation of
normal curve areas for quick reference. However, it would be a hopeless task to attempt to set up separate
tables for every conceivable value of 𝜇 and 𝜎. Fortunately, we are able to transform all the observations
of any normal random variable 𝑋 into a new set of observations of a normal random variable 𝑍 with
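mean 𝜇 = 0 and variance 𝜎² = 1. This can be done by means of the transformation
\[ Z = \frac{X - \mu}{\sigma}. \]
In this case, the normal distribution is called the standard normal distribution and is written in the form
\[ f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^{2}}, \qquad -\infty < z < \infty, \]
with total area under the curve
\[ \int_{-\infty}^{\infty} f(z)\,dz = 1. \]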
The standard normal distribution table provides the probability that a normally distributed random variable
𝑍, with mean equal to 0 and variance equal to 1, is less than or equal to 𝑧. It does this for positive values
of 𝑧 only (i.e., 𝑧-values on the right-hand side of the mean). What this means in practice is that if someone
asks you to find the probability of a value being less than a specific, positive 𝑧-value, you can simply look
that value up in the table. We call this area 𝜙. Thus, for this table,
\[ P(Z < a) = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^{2}}\,dz = \phi(a) \]
where 𝑎 is positive.
Diagrammatically, the probability of 𝑍 less than 𝑎 being 𝜙(𝑎), as determined from the standard normal
distribution table, is shown below:
This section shows how to calculate probabilities (areas under the curve) for the standard normal
distribution. It first relies on reading 𝜙(𝑎) from the Standard Normal Distribution Table, as above, and then
shows how to calculate the probability of a value less than a negative 𝑧-value, greater than a 𝑧-value,
and between two 𝑧-values.
We start by remembering that the standard normal distribution has a total area (probability) equal to 1 and
is symmetrical about the mean. To handle negative 𝑧-values, we need to appreciate that the area under the
curve covered by 𝑃(𝑍 > 𝑎) is the same as the probability of a value less than −𝑎, that is, 𝑃(𝑍 < −𝑎),
as illustrated below:
Making this connection is very important because from the standard normal distribution table, we can
calculate the probability less than 𝑎, as 𝑎 is now a positive value. Imposing 𝑃(𝑍 < 𝑎) on the above graph
is illustrated below:
From the above illustration, and from our knowledge that the area under the standard normal distribution
is equal to 1, we can conclude that the two areas add up to 1. We can, therefore, make the following
statements:
∵ 𝜙(𝑎) + 𝜙(−𝑎) = 1 ⇒ ∴ 𝜙(−𝑎) = 1 − 𝜙(𝑎)
Thus, to find the probability of a value less than a negative 𝑧-value, we use the following equation:
𝑃(𝑍 < −𝑎) = 𝜙(−𝑎) = 1 − 𝜙(𝑎)
(1) The probability 𝑃(𝑍 > 𝑎) is 1 − 𝜙(𝑎). To understand the reasoning behind this, look at the illustration
below:
You know 𝜙(𝑎), and you know that the total area under the standard normal curve is 1, so by
deduction: 𝑃(𝑍 > 𝑎) = 1 − 𝜙(𝑎).
(2) The probability 𝑃(𝑍 > −𝑎) is equal to 𝑃(𝑍 < 𝑎), which is 𝜙(𝑎). To understand this, we need to appreciate the
symmetry of the standard normal distribution curve. We are trying to find the area below:
But by reflecting the area around the center line (mean), this is the same size area as the area we are looking
for, only we already know this area, as we can get it straight from the standard normal distribution table: it
is 𝑃(𝑍 < 𝑎). Therefore, the 𝑃(𝑍 > −𝑎) is 𝑃(𝑍 < 𝑎), which is 𝜙(𝑎).
The key requirement to solve the probability between 𝑧-values is to understand that the probability between
𝑧-values is the difference between the probability of the greatest 𝑧-value and the lowest 𝑧-value:
𝑃(𝑎 < 𝑍 < 𝑏) = 𝑃(𝑍 < 𝑏) − 𝑃(𝑍 < 𝑎) = 𝜙(𝑏) − 𝜙(𝑎).
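The table look-ups and the symmetry rules above can be mirrored in code. The sketch below is illustrative rather than part of the notes: it computes 𝜙 from the error function in Python's standard library instead of a printed table, and the values 𝑎 = 0.5 and 𝑏 = 1.2 are arbitrary choices.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF P(Z < z), computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

a, b = 0.5, 1.2  # arbitrary positive z-values for illustration

less_than_negative = phi(-a)         # P(Z < -a), equals 1 - phi(a)
greater_than = 1 - phi(a)            # P(Z > a)
greater_than_negative = 1 - phi(-a)  # P(Z > -a), equals phi(a)
between = phi(b) - phi(-a)           # P(-a < Z < b)

print(f"phi({a}) = {phi(a):.4f}, phi({b}) = {phi(b):.4f}")
print(f"P(Z < -a) = {less_than_negative:.4f}")
print(f"P(Z > a)  = {greater_than:.4f}")
print(f"P(Z > -a) = {greater_than_negative:.4f}")
print(f"P(-a < Z < b) = {between:.4f}")
```

Unlike the printed table, this function accepts negative arguments directly, so the identity 𝜙(−𝑎) = 1 − 𝜙(𝑎) becomes a check rather than a necessary step.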
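To find a probability for a general normal random variable 𝑋, proceed as follows:
1. Convert any non-standard normal distribution to the standard normal distribution using the transformation \( Z = \frac{X - \mu}{\sigma} \).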
2. Sketch a normal curve, label the mean and the specific 𝑥 values, then shade the region representing
the desired probability.
3. For each relevant value 𝑥 that is a boundary for the shaded region, convert it to the equivalent 𝑧 value.
4. Use the standard normal distribution table to find the area of the shaded region. This area is the desired
probability.
Example 1 Given a standard normal distribution, find the area under the curve that lies
(a) to the right of z = 1.84 and
(b) between z = −1.97 and z = 0.86.
Solution.
(a) The area in the above figure to the right of 𝑧 = 1.84 is equal to 1 minus the area in the 𝑧-table to the left of
𝑧 = 1.84, namely, 1 − 𝜙(1.84) = 1 − 0.9671 = 0.0329.
(b) The area in above figure between 𝑧 = −1.97 and 𝑧 = 0.86 is equal to the area to the left of 𝑧 = 0.86 minus
the area to the left of 𝑧 = −1.97. From 𝑧-table, we find the desired area to be
𝜙(0.86) − 𝜙(−1.97) = 𝜙(0.86) − 1 + 𝜙(1.97) = 0.8051 − 0.0244 = 0.7807
Example 2 Given a standard normal distribution, find the value of k such that
1. 𝑃(𝑍 > 𝑘) = 0.3015.
2. 𝑃(𝑘 < 𝑍 < −0.18) = 0.4197.
Solution.
1. In above left figure, we see that the 𝑘 value leaving an area of 0.3015 to the right must then leave an
area of 0.6985 to the left. From 𝑧-table, it follows that 𝑘 = 0.52.
2. From z-table we note that the total area to the left of −0.18 is equal to 0.4286. In above right figure, we
see that the area between 𝑘 and −0.18 is 0.4197, so the area to the left of 𝑘 must be 0.4286 − 0.4197 =
0.0089. Hence, from 𝑧-table, we have 𝑘 = −2.37.
Example 3 Given a random variable 𝑋 having a normal distribution with 𝜇 = 50 and 𝜎 = 10, find
the probability that 𝑋 assumes a value between 45 and 62.
Solution.
The 𝑧 values corresponding to \( x_1 = 45 \) and \( x_2 = 62 \) are
\[ z_1 = \frac{45 - 50}{10} = -0.5, \qquad z_2 = \frac{62 - 50}{10} = 1.2. \]
Therefore,
𝑃(45 < 𝑋 < 62) = 𝑃(−0.5 < 𝑍 < 1.2)
𝑃(−0.5 < 𝑍 < 1.2) is found by subtracting the area to the left of the ordinate 𝑧 = −0.5 from the entire area to the
left of 𝑧 = 1.2. Using the 𝑧-table, we have
𝑃(45 < 𝑋 < 62) = 𝑃(−0.5 < 𝑍 < 1.2) = 𝑃(𝑍 < 1.2) − 𝑃(𝑍 < −0.5)
= 𝜙(1.2) − 𝜙(−0.5) = 𝜙(1.2) − 1 + 𝜙(0.5)
= 0.8849 − 0.3085 = 0.5764
Example 4 Given that 𝑋 has a normal distribution with 𝜇 = 300 and 𝜎 = 50 find the probability
that 𝑋 assumes a value greater than 𝟑𝟔𝟐.
Solution.
To find 𝑃(𝑋 > 362), we need to evaluate the area under the normal curve to the right of 𝑥 = 362. This
can be done by transforming 𝑥 = 362 to the corresponding 𝑧 value, obtaining the area to the left of 𝑧 from
Z-table, and then subtracting this area from 1. We find that
\[ z = \frac{x - \mu}{\sigma} = \frac{362 - 300}{50} = 1.24 \]
Hence,
𝑃(𝑋 > 362) = 𝑃(𝑍 > 1.24) = 1 − 𝑃(𝑍 < 1.24) = 1 − 𝜙(1.24)
= 1 − 0.8925 = 0.1075.
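The standardisation step used in Examples 3 and 4 can be wrapped in a small function. This is a hedged sketch: `normal_cdf` is a name chosen here, and it evaluates the area with the error function rather than reading it from the 𝑧-table, so the last digit can differ slightly from table-based answers.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ Normal(mu, sigma), via z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Example 3: P(45 < X < 62) with mu = 50, sigma = 10
p_between = normal_cdf(62, 50, 10) - normal_cdf(45, 50, 10)

# Example 4: P(X > 362) with mu = 300, sigma = 50
p_greater = 1 - normal_cdf(362, 300, 50)

print(f"P(45 < X < 62) = {p_between:.4f}")
print(f"P(X > 362)     = {p_greater:.4f}")
```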
Example 5 Each month, an American household generates an average of 143 pounds of newspaper
for garbage or recycling. Assume that the standard deviation is 29 pounds. If a household is
selected at random, find the probability that it generates
(i) between 140 and 211 pounds per month;
(ii) more than 174.9 pounds per month.
Assume the variable is normally distributed.
Solution.
(a) Find the two 𝑍 values corresponding to the given two 𝑋 values:
In the section on the history of the normal distribution, we saw that the normal distribution can be used
to approximate the binomial distribution. This section shows how to compute these approximations.
Let’s begin with an example. Assume you have a fair coin and wish to know the probability that you
would get 8 heads out of 10 flips. The binomial distribution has a mean of 𝜇 = 𝑛𝑝 = (10)(0.5) = 5 and
a variance of 𝜎² = 𝑛𝑝(1 − 𝑝) = (10)(0.5)(0.5) = 2.5. The standard deviation is therefore 𝜎 = √2.5 =
1.5811. A total of 8 heads is 𝑧 = (8 − 5)/1.5811 = 1.897 standard deviations above the mean. The question then is, “What is the probability of getting
a value exactly 1.897 standard deviations above the mean?” You may be surprised to learn that the answer
is 0: The probability of any one specific point is 0. The problem is that the binomial distribution is a discrete
probability distribution, whereas the normal distribution is a continuous distribution.
The solution is to round off and consider any value from 7.5 to 8.5 to represent an outcome of 8 heads.
Using this approach, we figure out the area under a normal curve from 7.5 to 8.5. The area in black in the
following figure is an approximation of the probability of obtaining 8 heads.
The difference between the area below 8.5 and the area below 7.5 is 0.044, which is the approximation of
the binomial probability. For these parameters, the approximation is very accurate.
The same logic applies when calculating the probability of a range of outcomes. For example, to
calculate the probability of 8 to 10 heads, calculate the area from 7.5 to 10.5. The accuracy of the
approximation depends on the values of 𝑛 and 𝑝. A rule of thumb is that the approximation is good if both
𝑛𝑝 and 𝑛(1 − 𝑝) are greater than 5.
Theorem 3.1 If 𝑋 is a random variable with binomial distribution 𝐵(𝑛, 𝑝), then for sufficiently large 𝑛,
such that 𝑛𝑝 > 5 and 𝑛(1 − 𝑝) > 5, the following random variable has approximately a standard
normal distribution:
\[ Z = \frac{X - \mu}{\sigma} \sim N(0, 1), \]
where
\[ \mu = np, \qquad \sigma^2 = np(1 - p). \]
Example 6 Find the probability of obtaining 4, 5, 6, or 7 heads when a fair coin is tossed 12 times
(a) using the binomial distribution,
(b) using a normal approximation to the binomial distribution.
Solution.
𝑋 is the number of heads in 12 tosses. Since the coin is fair, 𝑃(ℎ𝑒𝑎𝑑) = 0.5, so 𝑋~𝐵(12,0.5).
(b) Using a normal approximation to the binomial distribution. The diagram below shows the probability
distribution for 𝑋~𝐵(12,0.5).
Note that the vertical lines have been replaced by rectangles to help illustrate the intention to use a
continuous distribution as an approximation for a discrete one. The required binomial probability is
represented by the sum of the areas of the shaded rectangles. First, check the conditions for a normal
approximation:
𝑛𝑝 = 12 × 0.5 = 6, so 𝑛𝑝 > 5
𝑛(1 − 𝑝) = 12 × 0.5 = 6, so 𝑛(1 − 𝑝) > 5
Since 𝑛𝑝 > 5 and 𝑛(1 − 𝑝) > 5, use the normal approximation
𝑋 ~ 𝑁(𝑛𝑝, 𝑛𝑝𝑞) with 𝑛𝑝 = 6, 𝑛𝑝𝑞 = 12 × 0.5 × 0.5 = 3.
So, 𝑋 ~ 𝑁(6, 3), approximately.
Superimposing the curve which is approximately 𝑁(6, 3), the probability of obtaining 4, 5, 6, or 7
heads is found by considering the area under this normal curve from x = 3.5 to x = 7.5
Note that the probabilities found by the two methods compare well, and the working for part (b) is
quicker to perform. The approximation is good because, although 𝑛 is not very large, 𝑝 = 0.5.
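The comparison between the exact binomial probability and its normal approximation can be reproduced numerically. The following is a sketch under stated assumptions: the helper names are illustrative, the normal area is computed with the error function rather than a table, and the interval is widened by 0.5 at each end as described in the text.

```python
from math import comb, erf, sqrt

def binomial_pmf(x, n, p):
    """Exact P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ Normal(mu, sigma)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def normal_approx(lo, hi, n, p):
    """Approximate P(lo <= X <= hi) for X ~ B(n, p) by the area under
    N(np, np(1-p)) from lo - 0.5 to hi + 0.5."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return normal_cdf(hi + 0.5, mu, sigma) - normal_cdf(lo - 0.5, mu, sigma)

# 8 heads in 10 flips of a fair coin
exact_8 = binomial_pmf(8, 10, 0.5)
approx_8 = normal_approx(8, 8, 10, 0.5)

# 4, 5, 6, or 7 heads in 12 tosses of a fair coin
exact_4_to_7 = sum(binomial_pmf(x, 12, 0.5) for x in range(4, 8))
approx_4_to_7 = normal_approx(4, 7, 12, 0.5)

print(f"8 of 10:     exact {exact_8:.4f}, normal approx {approx_8:.4f}")
print(f"4..7 of 12:  exact {exact_4_to_7:.4f}, normal approx {approx_4_to_7:.4f}")
```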
Continuity correction
This correction is needed when we approximate a discrete distribution (Binomial in this case) by a
continuous distribution (Normal). Recall that the probability 𝑃(𝑋 = 𝑥) may be positive if 𝑋 is discrete,
whereas it is always 0 for continuous 𝑋. Thus, a direct use of the Normal distribution will always
approximate this probability by 0, which is obviously a poor approximation. This is resolved by introducing
a continuity correction: expand the interval by 0.5 units in each direction, then use the Normal
approximation. Notice that
\[ P(X = x) = P(x - 0.5 < X < x + 0.5) \]
is true for a Binomial variable 𝑋; therefore, the continuity correction does not change the event and preserves
its probability. It makes a difference for the Normal distribution, so every time we approximate a
discrete distribution with a continuous distribution, we should use a continuity correction. The quantity being
approximated is then the probability of an interval instead of a single number, and it is not zero.
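The following table lists the continuity corrections:
Discrete (Binomial)        Continuous (Normal)
P(X = a)                   P(a − 0.5 < X < a + 0.5)
P(X ≤ a)                   P(X < a + 0.5)
P(X < a)                   P(X < a − 0.5)
P(X ≥ a)                   P(X > a − 0.5)
P(X > a)                   P(X > a + 0.5)
P(a ≤ X ≤ b)               P(a − 0.5 < X < b + 0.5)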
Example 7 A new computer virus attacks a folder consisting of 200 files. Each file gets damaged with
probability 0.2 independently of other files. What is the probability that fewer than 50
files get damaged?
Solution.
The number 𝑋 of damaged files has a Binomial distribution with 𝑛 = 200, 𝑝 = 0.2, 𝜇 = 𝑛𝑝 = 40, and
\( \sigma = \sqrt{np(1-p)} = 5.657 \). Applying the continuity correction,
\[ P(X < 50) \approx P(X < 49.5) = P\left(Z < \frac{49.5 - 40}{5.657}\right) = \phi(1.68) = 0.9535. \]
Example 8 A multiple-choice quiz has 200 questions, each with 4 possible answers of
which only 1 is correct. What is the probability that sheer guesswork yields from 25 to
30 correct answers for the 80 of the 200 problems about which the student has no
knowledge?
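Solution.
The probability of guessing a correct answer for each of the 𝑛 = 80 questions is 𝑝 = 1/4. If 𝑋 represents the
number of correct answers resulting from guesswork, then
\[ P(25 \le X \le 30) = \sum_{x=25}^{30} \binom{80}{x}\left(\frac{1}{4}\right)^{x}\left(\frac{3}{4}\right)^{80-x}. \]
Using the normal approximation with \( \mu = np = 20 \) and \( \sigma = \sqrt{npq} = \sqrt{15} = 3.873 \), and applying the continuity correction,
\[ P(25 \le X \le 30) \approx P(24.5 < X < 30.5) = P(1.16 < Z < 2.71) = \phi(2.71) - \phi(1.16) = 0.9966 - 0.8770 = 0.1196. \]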
Example 9 Based on past experience, 7% of all luncheon vouchers are in error. If a random sample
of 400 vouchers is selected, what is the approximate probability that
(a) exactly 25 are in error?
(b) fewer than 25 are in error?
(c) between 20 and 25 (inclusive) are in error
Solution.
We approximate the 𝐵(400, 0.07) random variable 𝑋 with a normal random variable, with mean 𝑛𝑝 = (400)(0.07) =
28 > 5, 𝑛(1 − 𝑝) = (400)(0.93) = 372 > 5, and standard deviation \( \sigma = \sqrt{(400)(0.07)(0.93)} = 5.103 \).
The probability calculations are thus
(a)
\[ P(X = 25) \approx P(24.5 < X < 25.5) = P\left(\frac{24.5 - 28}{5.103} < Z < \frac{25.5 - 28}{5.103}\right) \]
\[ = P(-0.69 < Z < -0.49) = \phi(-0.49) - \phi(-0.69) = 0.3121 - 0.2451 = 0.0670 \]
(b)
\[ P(X < 25) \approx P(X < 24.5) = P\left(Z < \frac{24.5 - 28}{5.103}\right) = P(Z < -0.69) = \phi(-0.69) = 0.2451. \]
(c)
\[ P(20 \le X \le 25) \approx P(19.5 < X < 25.5) = P\left(\frac{19.5 - 28}{5.103} < Z < \frac{25.5 - 28}{5.103}\right) \]
\[ = P(-1.67 < Z < -0.49) = \phi(-0.49) - \phi(-1.67) = 0.3121 - 0.0475 = 0.2646 \]
Example 10 Suppose that a sample of 𝑛 = 1,600 tires of the same type is obtained at random from
an ongoing production process in which 8% of all such tires produced are defective. What
is the probability that in such a sample not more than 150 tires will be defective?
Solution.
We approximate the 𝐵(1600, 0.08) random variable 𝑇 with a normal random variable, with mean 𝑛𝑝 = (1600)(0.08) =
128 > 5, 𝑛(1 − 𝑝) = (1600)(0.92) = 1472 > 5, and standard deviation \( \sigma = \sqrt{(1600)(0.08)(0.92)} = 10.85 \).
The probability calculation is thus
\[ P(T \le 150) \approx P(T < 150.5) = P\left(Z < \frac{150.5 - 128}{10.85}\right) = P(Z < 2.07) = \phi(2.07) = 0.9808. \]
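The continuity-corrected calculations in Examples 7 to 10 all reduce to one operation: approximate 𝑃(𝑋 ≤ 𝑘) by the normal area below 𝑘 + 0.5. The sketch below is illustrative only; `approx_cdf` and `binomial_cdf` are names introduced here, and because 𝑧 is not rounded to two decimals the results can differ from the table-based answers in the third or fourth decimal place.

```python
from math import comb, erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ Normal(mu, sigma)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p), for comparison."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

def approx_cdf(k, n, p):
    """Normal approximation to P(X <= k) with continuity correction."""
    return normal_cdf(k + 0.5, n * p, sqrt(n * p * (1 - p)))

# Example 7: P(X < 50) = P(X <= 49), n = 200, p = 0.2
ex7_approx = approx_cdf(49, 200, 0.2)
ex7_exact = binomial_cdf(49, 200, 0.2)

# Example 8: P(25 <= X <= 30), n = 80, p = 1/4
ex8_approx = approx_cdf(30, 80, 0.25) - approx_cdf(24, 80, 0.25)

# Example 9(c): P(20 <= X <= 25), n = 400, p = 0.07
ex9c_approx = approx_cdf(25, 400, 0.07) - approx_cdf(19, 400, 0.07)

# Example 10: P(T <= 150), n = 1600, p = 0.08
ex10_approx = approx_cdf(150, 1600, 0.08)

print(f"Example 7:    approx {ex7_approx:.4f}, exact {ex7_exact:.4f}")
print(f"Example 8:    approx {ex8_approx:.4f}")
print(f"Example 9(c): approx {ex9c_approx:.4f}")
print(f"Example 10:   approx {ex10_approx:.4f}")
```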