Applied Statistics Random Variables and Probability Distributions
The total number of sample points in the experiment is found to be $\binom{x-1}{r-1}$, with each having the above probability. Thus the general formula for the pmf of the negative binomial distribution is
$$p(x) = P(X = x) = \binom{x-1}{r-1} p^{r} q^{x-r}, \quad x = r, r+1, r+2, \ldots,$$
where q = 1 − p.
EXAMPLE 2.13:
Solution:
a) Using the negative binomial distribution with x = 7, r = 3, and p = 0.5, we find that
$$p(7) = \binom{6}{2}(0.5)^{7} = 0.1172.$$
b) Using the negative binomial distribution with x = 4, r = 1, and p = 0.5, we find that
$$p(4) = \binom{3}{0}(0.5)^{4} = 1/16.$$
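To check Example 2.13, the pmf above can be evaluated directly. The following is a minimal Python sketch using only the standard library (math.comb, Python 3.8+). The function name neg_binom_pmf is our own hypothetical choice and does not come from the text.

```python
from math import comb

def neg_binom_pmf(x: int, r: int, p: float) -> float:
    """P(X = x): probability that the r-th success occurs on trial x."""
    if x < r:
        return 0.0  # fewer than r trials cannot contain r successes
    q = 1.0 - p
    return comb(x - 1, r - 1) * p**r * q**(x - r)

# Example 2.13 a): x = 7, r = 3, p = 0.5
print(round(neg_binom_pmf(7, 3, 0.5), 4))   # 0.1172
# Example 2.13 b): x = 4, r = 1, p = 0.5
print(neg_binom_pmf(4, 1, 0.5))             # 0.0625, i.e. 1/16
```

The guard for x < r mirrors the support x = r, r+1, r+2, … stated with the formula.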
The normal distribution is no doubt the most important distribution in statistics, and the most widely
used continuous probability distribution. There are 4 basic reasons why the normal distribution occupies
a prominent place in statistics.
1. The normal distribution comes close to fitting the actual observed frequency distributions of
many phenomena:
a) Human characteristics such as weights, heights, and IQs.
b) Outputs from physical processes, such as dimensions and yield.
c) Repeated measurements of the same quantity, as described above, and errors made in measuring physical and economic phenomena.
The probability density function (pdf) of a normally distributed random variable X with mean μ and variance σ², written in short as X ~ N(μ, σ²), is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right], \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ 0 < \sigma < \infty.$$
In the above notation for the pdf of the normal random variable X, μ is the mean, or location parameter, while σ is the standard deviation, or scale parameter. The pdf is used extensively for finding probabilities, and the mean and standard deviation of X can take a very wide range of values. For these reasons a single table has been introduced, based on transforming the general random variable X into the standard normal random variable Z, where Z = (X − μ)/σ. Thus Z ~ N(0, 1), and its distribution is called the standard normal distribution (Figure 8).
Figure 8
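The pdf and the standardizing transformation can each be written in a few lines. The sketch below is a minimal illustration that assumes Python 3.8+ for statistics.NormalDist. The helper names normal_pdf and standardize are hypothetical. The code checks the formula for f(x) against the standard library. It also shows that the density of X at x equals the standard normal density at z = (x − μ)/σ divided by σ, which is the relationship behind using a single standard normal table.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2), written exactly as in the formula for f(x)."""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def standardize(x: float, mu: float, sigma: float) -> float:
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma

mu, sigma, x = 490.0, 100.0, 590.0   # illustrative values (the SAT figures of Example 2.14)
z = standardize(x, mu, sigma)
print(z)                                                                        # 1.0
# The hand-written density agrees with the standard library's NormalDist.pdf
print(abs(normal_pdf(x, mu, sigma) - NormalDist(mu, sigma).pdf(x)) < 1e-12)     # True
# Density of X at x equals the standard normal density at z, divided by sigma
print(abs(normal_pdf(x, mu, sigma) - normal_pdf(z, 0.0, 1.0) / sigma) < 1e-12)  # True
```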
This transformation reduces the many tables that would otherwise be needed, one for each combination of μ and σ, to one single table. Some basic properties of the probability density function of a normal random variable X are:
Figure 9 (internet)
3. The integral of f(x) over the real line is 1, i.e. $\int_{-\infty}^{\infty} f(x)\,dx = 1$; that is, the total area under the curve and above the horizontal axis is 1.
4. The horizontal axis acts as a horizontal asymptote to the curve of the normal pdf.
5. Areas under the graph of the normal density function represent probabilities. The value of the integral of f(x) over the interval (a, b) represents the probability that a ≤ X ≤ b; in other words,
$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx,$$
as shown in Figure 10.
Figure 10 (Internet)
6. Note that P(a < X < b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a ≤ X ≤ b), because a single point carries no probability:
$$P(X = a) = \int_{a}^{a} f(x)\,dx = 0.$$
7. The Empirical Rule, or 68-95-99.7 rule, is a statistical rule for a normal distribution that is determined by its mean and standard deviation. Approximately 68% of the area under the normal curve lies between x = μ − σ and x = μ + σ, 95% lies between x = μ − 2σ and x = μ + 2σ, and 99.7% lies between x = μ − 3σ and x = μ + 3σ (see Figure 11A).
In terms of probability and mathematical notation, these facts can be expressed as follows:
$$P(\mu - \sigma < X < \mu + \sigma) \approx 0.68,\quad P(\mu - 2\sigma < X < \mu + 2\sigma) \approx 0.95,\quad P(\mu - 3\sigma < X < \mu + 3\sigma) \approx 0.997.$$
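Properties 3, 5, and 7 can be checked numerically. The sketch below integrates the standard normal pdf with the composite Simpson's rule and compares the result with cumulative probabilities from Python's statistics.NormalDist (Python 3.8+). Simpson's rule is our own choice of numerical method, not necessarily the one used to build printed tables, and the helper names are hypothetical. The code uses the standard normal because the area within k standard deviations of the mean is the same for every μ and σ.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density f(x); the defaults give the standard normal N(0, 1)."""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] by composite Simpson's rule (n even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

Z = NormalDist()  # standard normal; Z.cdf(z) is the area to the left of z

# Property 3: total area is 1 (the tails beyond +/-10 are negligible)
print(round(simpson(normal_pdf, -10, 10), 6))      # 1.0

# Properties 5 and 7: area over (-k, k) as an integral and as a difference of cdf values
for k in (1, 2, 3):
    integral = simpson(normal_pdf, -k, k)
    from_cdf = Z.cdf(k) - Z.cdf(-k)
    print(k, round(integral, 4), round(from_cdf, 4))
# 1 0.6827 0.6827
# 2 0.9545 0.9545
# 3 0.9973 0.9973
```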
EXAMPLE 2.14:
The scores for all high school seniors taking the verbal section of the Scholastic Aptitude Test
(SAT) in a particular year had a mean of 490 and a standard deviation of 100. The distribution
of SAT scores is bell-shaped.
a) What percentage of seniors scored between 390 and 590 on this SAT test?
b) One student scored 795 on this test. How did this student do compared to the rest of the
scores?
c) A rather exclusive university only admits students who were among the highest 16% of
the scores on this test. What score would a student need on this test to be qualified for
admittance to this university?
For the example above we have X ~ N(490, 100²); Figure 6A displays the areas noted above.
Solution:
The data being described are the verbal SAT scores for all seniors taking the test one year. Since
this is describing a population, we will denote the mean and standard deviation as μ = 490 and
σ = 100, respectively. A bell-shaped curve summarizing the percentages given by the empirical rule is below.
a) From Figure 11A, about 68% of seniors scored between 390 and 590 on this SAT test.
b) Since about 99.7% of the scores are between 190 and 790, a score of 795, which is (795 − 490)/100 = 3.05 standard deviations above the mean, is excellent. This is one of the highest scores on this test.
c) Since about 68% of the scores are between 390 and 590, this leaves 32% of the scores
outside this interval. Since a bell-shaped curve is symmetric, one-half of the scores, or
16%, are on each end of the distribution. Figure 11B, below, shows these percentages.
Since about 16% of the students scored above 590 on this SAT test, to be qualified for admittance to this
university, a student would need to score 590 or above on this test.
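The empirical-rule answers in Example 2.14 can be compared with exact normal probabilities. This is a minimal sketch that assumes Python 3.8+ for statistics.NormalDist. The variable name sat is our own.

```python
from statistics import NormalDist

sat = NormalDist(mu=490, sigma=100)   # verbal SAT scores, X ~ N(490, 100^2)

# a) proportion scoring between 390 and 590 (one standard deviation either side of the mean)
print(round(sat.cdf(590) - sat.cdf(390), 4))   # 0.6827
# b) z-score of a 795
print((795 - 490) / 100)                       # 3.05
# c) score that cuts off the top 16% (area 0.84 to its left)
print(round(sat.inv_cdf(0.84), 1))             # 589.4
```

The exact cutoff of about 589.4 agrees with the empirical-rule answer of 590, because 16% is only approximately the area beyond one standard deviation above the mean.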
EXAMPLE 2.15:
The weight of a certain type of chicken, at a certain age, follows a normal distribution with
mean 1.0 kg and a standard deviation of 0.20 kg. Find
Solution:
Let X be the weight of a chicken, then X has a normal distribution with μ =1.0, and σ=0.2, i.e.
X ~ N (1.0, 0.04). By using Z = (X -μ)/σ, we have
a) P(X < 1.5) = P[(X − μ)/σ < (1.5 − μ)/σ] = P(Z < 2.5) = 0.9938
b) P(0.9 < X < 1.2) = P(X < 1.2) − P(X < 0.9) = P(Z < 1) − P(Z < −0.5) = 0.8413 − 0.3085 = 0.5328
c) P(X > 1.6) = 1 − P(X ≤ 1.6) = 1 − P(Z ≤ 3.0) = 1 − 0.9987 = 0.0013
d) P(0.8 < X < 1.5) = P(Z < 2.5) − P(Z < −1.0) = 0.9938 − 0.1587 = 0.8351 = 83.51%.
e) 0.8351 × 300 = 250.53 ≈ 251.
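Each part of Example 2.15 can be reproduced without a table. This is a minimal sketch using statistics.NormalDist (Python 3.8+). The multiplier 300 in part e) is taken from the solution as given. The probabilities agree with the table-based answers to four decimal places, and the count in part e) rounds to the same whole number.

```python
from statistics import NormalDist

weight = NormalDist(mu=1.0, sigma=0.2)   # chicken weight in kg, X ~ N(1.0, 0.04)

a = weight.cdf(1.5)                      # P(X < 1.5)
b = weight.cdf(1.2) - weight.cdf(0.9)    # P(0.9 < X < 1.2)
c = 1 - weight.cdf(1.6)                  # P(X > 1.6)
d = weight.cdf(1.5) - weight.cdf(0.8)    # P(0.8 < X < 1.5)
e = d * 300                              # part e): the proportion in d) times 300

print([round(p, 4) for p in (a, b, c, d)])   # [0.9938, 0.5328, 0.0013, 0.8351]
print(round(e))                              # 251
```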
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right], \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ 0 < \sigma < \infty.$$
The mean, the variance, and the variable itself can each take a very large number of values, and there is no way to build a table, or tables, for every combination of those values. Looking again at the standardizing formula,
Z = (X − μ)/σ,
we see that just one table is needed: the standard normal table for the random variable Z, which is distributed as Z ~ N(0, 1) (see Figure 8).
Having covered these basics of the normal distribution, let us discuss the procedure for finding areas under the normal curve. For the general normal random variable we have X ~ N(μ, σ²). Three cases arise:
1. Finding the area under the normal curve, above the x-axis, and between two values of the random variable. In other words, find the probability
P(a < X < b), P(a ≤ X < b), P(a < X ≤ b), or P(a ≤ X ≤ b).
Is it one probability or four different ones? All four are equal, whether we include both end points, exclude them, or include one and exclude the other. This rests on a concept from calculus: for a continuous random variable there is no area above a single point,
$$P(X = a) = \int_{a}^{a} f(x)\,dx = 0.$$
2. Finding the area to the left of a value of the random variable: P(X < c) (see Figure 12).
Figure 12 (Internet)
3. Finding the area to the right of a value of the random variable: P(X > d), the unshaded area in Figure 13.
Figure 13 (Internet)
There is no doubt that we can find the required probabilities for any value of the variable X, any value of the mean, and any value of the standard deviation; calculus techniques were used to do exactly that. Standardizing saves a lot of time and resources and reduces the tremendous number of tables to just ONE, the standard normal table. Therefore, if we use the transformation
Z = (X − μ)/σ,
the probabilities for the three cases above can be calculated with the standard normal table. The equivalent cases in terms of z are:
1. P(z1 ≤ Z ≤ z2) = P(a ≤ X ≤ b).
2. P(Z ≤ z3) = P(X < c).
3. P(Z ≥ z4) = P(X > d).
Here z1 = (a − μ)/σ, z2 = (b − μ)/σ, z3 = (c − μ)/σ, and z4 = (d − μ)/σ.
Since the standard normal table gives the area to the left of any point, the probability for case 1 is
P(z1 ≤ Z ≤ z2) = P(Z ≤ z2) − P(Z ≤ z1).
The probability for case 2 is read directly from the table.
For case 3, since the total area under the curve is 1 and the table lists areas to the left, we compute
P(Z ≥ z4) = 1 − P(Z ≤ z4).
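Once there is a function that returns the area to the left of a point, the three cases translate directly into code. In this minimal sketch (Python 3.8+), NormalDist().cdf stands in for the standard normal table. The function names and the example distribution N(50, 10²) are hypothetical.

```python
from statistics import NormalDist

TABLE = NormalDist()  # standard normal N(0, 1); TABLE.cdf(z) is the area to the left of z

def prob_between(a, b, mu, sigma):
    """Case 1: P(a <= X <= b) = P(Z <= z2) - P(Z <= z1)."""
    z1, z2 = (a - mu) / sigma, (b - mu) / sigma
    return TABLE.cdf(z2) - TABLE.cdf(z1)

def prob_below(c, mu, sigma):
    """Case 2: P(X < c) = P(Z < z3), read directly from the table."""
    z3 = (c - mu) / sigma
    return TABLE.cdf(z3)

def prob_above(d, mu, sigma):
    """Case 3: P(X > d) = 1 - P(Z <= z4)."""
    z4 = (d - mu) / sigma
    return 1 - TABLE.cdf(z4)

# Hypothetical example: X ~ N(50, 10^2)
print(round(prob_between(40, 60, 50, 10), 4))   # 0.6827
print(prob_below(50, 50, 10))                   # 0.5
print(prob_above(50, 50, 10))                   # 0.5
```

Because all three functions standardize first, only the single standard normal "table" is ever consulted, which matches the point made above.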
EXAMPLE 2.16
Solution:
Standardizing the values, we see that the above probabilities can be found from the standard normal table using the corresponding values of z, as follows:
1. P(56.5 < X < 90.1) = P[(56.5 − 70)/10 < Z < (90.1 − 70)/10] = P(−1.35 < Z < 2.01)
= P(Z < 2.01) − P(Z < −1.35) = 0.9778 − 0.0885 = 0.8893.
We read 0.9778 in the row for 2.0 under z and the column under .01 to get P(Z < 2.01).
Similarly, we read 0.0885 in the row for −1.3 under z and the column under .05 to get P(Z < −1.35) = 0.0885. The difference of the two values is the answer shown above.
Similarly, we read 0.6255 in the row for 0.3 under z and the column under .02 to get P(Z < 0.32) = 0.6255.
3. P(X > 86.8) = 1 − P(X < 86.8) = 1 − P[Z < (86.8 − 70)/10] = 1 − P(Z < 1.68) = 1 − 0.9535 = 0.0465.
Similarly, we read 0.9535 in the row for 1.6 under z and the column under .08 to get P(Z < 1.68) = 0.9535.
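The table lookups in Example 2.16 can be checked with software. This minimal sketch (Python 3.8+) assumes X ~ N(70, 10²), as the standardizations in the solution imply, because the problem statement itself is not reproduced. Part 1 is computed both directly on the X scale and from the z-values, to show that the two routes agree.

```python
from statistics import NormalDist

X = NormalDist(mu=70, sigma=10)   # assumed from the standardizations (x - 70)/10
Z = NormalDist()                  # standard normal

# Part 1: P(56.5 < X < 90.1), directly and via the z-values
direct = X.cdf(90.1) - X.cdf(56.5)
via_z = Z.cdf(2.01) - Z.cdf(-1.35)
print(round(direct, 4), round(via_z, 4))   # 0.8893 0.8893

# Part 3: P(X > 86.8) = 1 - P(Z < 1.68)
print(round(1 - X.cdf(86.8), 4))           # 0.0465

# Table lookup quoted in the text: row 0.3, column .02 gives P(Z < 0.32)
print(round(Z.cdf(0.32), 4))               # 0.6255
```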
In the next example we work backwards; the procedure is a two-way street. Now we are given the area, which stands for a probability, and we need to find the cutoff point on either the X-axis or the Z-axis. Finding the cutoff point on one axis and applying the transformation below gives the cutoff point on the other:
Z = (X − μ)/σ, or equivalently X = μ + σZ.
EXAMPLE 2.17
a)
Reading the standard normal table, we find 0.9972 in the row for 2.7 under Z and the column under 0.07 for the second decimal place. Hence P(Z < 2.77) = 0.9972. From the transformation X = μ + σZ, we get x = 5(2.77) + 60 = 73.85.
b)
Reading the standard normal table, we find that 0.3000 lies between two values in the row for −0.5 under Z: 0.3015 in the column under 0.02 and 0.2981 in the column under 0.03. Since 0.3015 is closer to 0.3000 than 0.2981 is, we take z to be −0.52. Using the transformation based on the distribution of X, we have x = (2/3)(−0.52) + 5 = 4.65.
c)
Since the given area is to the right of the required value, we must subtract it from 1 before reading the standard normal table, i.e., 1 − 0.9368 = 0.0632. The value 0.0632 is closest to 0.0630, which appears in the row for −1.5 under Z and the column under 0.03 for the second decimal place. We take z to be −1.53. Using the transformation based on the distribution of X, we have x = 6(−1.53) + 200 = 190.82.
d)
As in part c), the given area, as a percentage, is to the right of the required value, so we must subtract it from 100 before reading the standard normal table, i.e., 100 − 23.15 = 76.85. The value 0.7685 is closest to 0.7673, which appears in the row for 0.7 under Z and the column under 0.03 for the second decimal place. We take z to be 0.73. Using the transformation based on the distribution of X, we have x = 2(0.73) + 0 = 1.46.
The above example could also have been solved with technology (see Technology Step-by-Step) by applying the calculator command invNorm(area to the left, mean, standard deviation) and pressing ENTER. This gives x to as many decimal places as you like, and the area does not need to be rounded to match an entry in the standard normal table.
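A Python analog of invNorm is statistics.NormalDist.inv_cdf (Python 3.8+), which takes the area to the left. This minimal sketch applies it to Example 2.17. Because the problem statements are not reproduced, the means, standard deviations, and left-tail areas below are inferred from the transformations and areas used in the solution; treat them as assumptions.

```python
from statistics import NormalDist

# (mean, standard deviation, area to the LEFT of the cutoff) for each part of Example 2.17,
# inferred from the solution (assumption)
parts = {
    "a": (60, 5, 0.9972),
    "b": (5, 2 / 3, 0.30),
    "c": (200, 6, 0.0632),    # left-tail area after subtracting the right-tail area from 1
    "d": (0, 2, 1 - 0.2315),  # 23.15% of the area lies to the right of the cutoff
}

cutoffs = {}
for name, (mu, sigma, left_area) in parts.items():
    # inv_cdf plays the role of invNorm(area to the left, mean, standard deviation)
    cutoffs[name] = NormalDist(mu, sigma).inv_cdf(left_area)
    print(name, round(cutoffs[name], 2))
```

The results agree with the table-based answers to within about 0.01. The small differences come from rounding z to two decimal places when reading the table.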
Approximating binomial probabilities with the normal distribution is no longer necessary in this age of technology, because software and calculators that are more accurate and less time-consuming are readily accessible to students. For this reason, we do not discuss this topic here.
CHAPTER 2 EXERCISES
2.1 An oil exploration firm finds that 5% of the test wells it drills yield a deposit of natural gas. If it drills 6 wells, find the probability that at least one well will yield gas.
2.2 Medical research suggests that 20% of the general population suffers adverse side effects from a new drug. If a doctor prescribes the drug for 4 patients, what is the probability that:
2.3 Determine whether each distribution is a discrete probability distribution. If not, state why.
2.4 Determine the value of the missing probability that is required to make the distribution a discrete probability distribution.
2.5 Consider Exercise 2.4. After finding the missing probability, find
a) The mean,
b) The variance, and
c) The standard deviation.
2.6 An insurance company finds that 0.005% of the population dies from a certain kind of accident each year. What is the probability that the company must pay off no more than 3 of 1000 insured risks against such accidents in a given year? (Hint: use the Poisson approximation to the binomial distribution with λ = np.) This approximation is satisfactory whenever n is large and p is near 0 or 1. If p is near 0.50 and n is large, the normal distribution is used to approximate the binomial distribution, with mean np and variance npq.
2.7 Find the probability of the indicated event if P (E) = 0.25 and P (F) = 0.45
2.8 Weights of fish caught by a certain method are approximately normally distributed with mean
of 4.5 lbs. and a standard deviation of 0.50 lbs.
2.9 The inside diameter of a piston ring is normally distributed with a mean of 4 inches and a standard deviation of 0.01 inches.
a) What percentage of the rings will have an inside diameter exceeding 4.025 inches?
b) What is the probability that a piston ring will have an inside diameter between 3.99 and
4.01 inches?
c) Below what value of the inside diameter will 15% of the rings fall?
2.10 Gauges are used to reject all components in which a certain dimension is not within the specifications 1.5 − d and 1.5 + d. It is known that this dimension is normally distributed with mean 1.50 and standard deviation 0.2. Determine the value of d such that the specifications
2.11 A sample space consists of five simple events, E1, E2, E3, E4, and E5.
a) If P(E1) = P(E2) = 0.15, P(E3) = 0.4, and P(E4) = 2P(E5), find P(E4) and P(E5).
b) If P(E1) = 3P(E2) = 0.3, find the probabilities of the remaining simple events, given that the remaining simple events are equally probable.
2.13 Consider the tables in 2.9, and pick the one that represents a discrete probability distribution. Then, for that distribution, find
a) The mean.
b) The Variance.
c) The Standard deviation.
2.14 Suppose that in families with 4 children (single births only), the probabilities of having 0, 1, 2, 3, or 4 boys are, respectively, 1/16, 4/16, 6/16, 4/16, and 1/16. Find
2.16 The random variable X follows a Poisson process with λ = 4. Find each of the following:
TECHNOLOGY STEP-BY-STEP
Finding the Mean and Standard Deviation of a Discrete Random Variable
TI-83/84 Plus
1. Enter the values of the random variable in L1 and their corresponding probabilities in L2.
2. Press STAT, highlight CALC, and select 1: 1-Var Stats.
3. With 1-Var Stats on the HOME screen, type L1, followed by a comma, followed by L2, as follows: 1-Var Stats L1, L2
Hit ENTER.
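For readers without a TI calculator, this minimal Python sketch (standard library only) computes what 1-Var Stats L1, L2 reports for a discrete random variable: the mean μ = Σ x·P(x) and the standard deviation σ = √(Σ (x − μ)²·P(x)). The function name and the distribution used are hypothetical and not taken from the text.

```python
from math import sqrt

def discrete_mean_sd(values, probs):
    """Mean and standard deviation of a discrete random variable (cf. 1-Var Stats L1, L2)."""
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in zip(values, probs))
    variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return mean, sqrt(variance)

# Hypothetical distribution (not from the text)
L1 = [0, 1, 2, 3]
L2 = [0.1, 0.3, 0.4, 0.2]
mu, sigma = discrete_mean_sd(L1, L2)
print(round(mu, 4), round(sigma, 4))   # 1.7 0.9
```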
TI-83/84 Plus
Computing P(x)
Computing P(X ≤ x)
Excel
Computing P(x)
1. Click on the fx icon. Highlight Statistical in the Function category window. Highlight BINOMDIST in the Function name window.
2. Fill in the window with the appropriate values. For example, if x = 5, n = 10, and p = 0.2, enter 5, 10, and 0.2 in the corresponding fields and type FALSE in the cumulative cell. Click OK.
Computing P(X ≤ x)
Follow the same steps as those presented for computing P(x). In the BINOMDIST window, type TRUE
in the cumulative cell.
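The two BINOMDIST settings correspond to the binomial pmf and cdf. This minimal Python sketch uses only the standard library and the example values from the steps above (x = 5, n = 10, p = 0.2). The function names are our own.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p); same quantity as BINOMDIST(x, n, p, FALSE)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """P(X <= x); same quantity as BINOMDIST(x, n, p, TRUE)."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

print(round(binom_pmf(5, 10, 0.2), 4))   # 0.0264
print(round(binom_cdf(5, 10, 0.2), 4))   # 0.9936
```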
TI-83/84 Plus
Computing P(x)
Computing P(X ≤ x)
Excel
Computing P(x)
Computing P(X ≤ x)
Follow the same steps as those presented for computing P(x). In the POISSON window, type
TRUE in the cumulative cell.
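The Excel POISSON function returns the Poisson pmf when cumulative is FALSE and the cdf when it is TRUE. This minimal standard-library sketch computes the same two quantities. The values λ = 2 and x = 3 are hypothetical, and the function names are our own.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam); same quantity as POISSON(x, lam, FALSE)."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x); same quantity as POISSON(x, lam, TRUE)."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

# Hypothetical values: lam = 2, x = 3
print(round(poisson_pmf(3, 2), 4))   # 0.1804
print(round(poisson_cdf(3, 2), 4))   # 0.8571
```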
TI-83/84 Plus
Note: When there is no lower bound, enter -1E99. When there is no upper bound, enter 1E99. The E shown is scientific notation; it is selected by pressing 2nd and then the comma key.
Excel
1. Select the fx button from the toolbar. In Function Category, select "Statistical". In Function Name, select "NORMSDIST". Click OK.
2. Enter the specified z-score. Click OK.
1. Select the fx button from the toolbar. In Function Category, select "Statistical". In Function Name, select "NORMSINV". Click OK.
2. Enter the specified area. Click OK.
TI-83/84 Plus
Note: When there is no lower bound, enter -1E99. When there is no upper bound, enter 1E99. The E shown is scientific notation; it is selected by pressing 2nd and then the comma key.
Excel
1. Select the fx button from the toolbar. In Function Category, select "Statistical". In Function Name, select "NORMDIST". Click OK.
2. Enter the specified observation, mu, and sigma, and set Cumulative to TRUE. Click OK.
1. Select the fx button from the toolbar. In Function Category, select "Statistical". In Function Name, select "NORMINV". Click OK.
2. Enter the specified area to the left of the unknown normal value, mu, and sigma. Click OK.
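The cumulative normal probability (NORMDIST with Cumulative set to TRUE) and its inverse (NORMINV) correspond to statistics.NormalDist.cdf and NormalDist.inv_cdf in Python 3.8+. This minimal sketch wraps them in hypothetical helper functions. Setting mu = 0 and sigma = 1 gives the standard normal case.

```python
from statistics import NormalDist

def normdist_cumulative(x, mu, sigma):
    """Area to the left of x under N(mu, sigma^2); the role of NORMDIST(x, mu, sigma, TRUE)."""
    return NormalDist(mu, sigma).cdf(x)

def norminv(area, mu, sigma):
    """Value with the given area to its left; the role of NORMINV(area, mu, sigma)."""
    return NormalDist(mu, sigma).inv_cdf(area)

# Standard normal case (mu = 0, sigma = 1): area to the left of z = 1.68
print(round(normdist_cumulative(1.68, 0, 1), 4))                 # 0.9535
# Round trip on a general normal: NORMINV undoes NORMDIST
print(round(norminv(normdist_cumulative(86.8, 70, 10), 70, 10), 6))  # 86.8
```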
TI-83/84 Plus
Excel