
BinoPoiNormalDist 12 33

The document discusses the binomial distribution. It begins by describing Bernoulli trials as experiments with two possible outcomes (success/failure) where the probability of success is constant across trials and trials are independent. It then defines the binomial distribution as the probability of getting r successes in n trials. The key points are: i) The probability of r successes is given by the binomial probability mass function P(X=r) = C(n,r)prqn-r, where p is the probability of success and q is the probability of failure. ii) Examples calculate the probabilities of getting different numbers of successes in examples like coin tosses and sales calls. iii) The binomial distribution applies when experiments have two

Uploaded by

sinhapalak1002
Copyright
© © All Rights Reserved

i) more than 500 hours to complete the programme?
ii) less than 500 hours to complete the programme?
Solution:
i) From the figure, we see that half of the area under the curve is located on either
side of the mean of 500 hours. Thus, the probability that the random
variable will take on a value higher than 500 is one half, or 0.5.
ii) A similar argument shows that the chance is 0.5.

Why don't you try some exercises now?

E5) Suppose X is a continuous random variable defined on [2,4] and f is a function on
[2,4] such that

f(x) = { 0,
3
for2<x<4
elsewhere

i) Draw a rough sketch of f(x),


ii) Does f define a probability density function of X? If so, why?
E6) Classify the r.v.'s given at the beginning of Sec. 3.1 as discrete or continuous.

In the following sections we shall discuss some standard distributions.

3.3 BINOMIAL DISTRIBUTION


One of the important discrete random variables (or, discrete distributions) is the
binomial variable. In this section we shall discuss this random variable and its
probability distribution.
Many times we have to deal with experiments where there are only two possible
outcomes. For example, when a coin is tossed, either a head or a tail comes up; a seed either
germinates or fails to germinate; a newborn is either a girl or a boy.
Let us consider such an experiment. For example, consider the experiment of tossing a
fair coin 3 times. This experiment has certain characteristics. First of all, it involves
repetition of three identical experiments (trials). Each trial has only two possible
outcomes - a head or a tail. We call the outcome "head" a success and the outcome "tail" a failure.
All trials are independent of each other. We also know that the probability of getting a head
in a trial and the probability of getting a tail in a trial are both 1/2, i.e.

P[H] = 1/2 and P[T] = 1/2

This shows that the probabilities of a "success" and of a "failure" do not change from
one trial to another.
If X denotes the total number of heads obtained in 3 trials, then X is a random
variable which takes values from {0, 1, 2, 3}.
Suppose that p denotes the probability of a success (i.e. getting a head) and q denotes the
probability of a failure (i.e. getting a tail).
Then regarding the above experiment, we have observed the following:
1) It involves a repetition of n identical trials (here n = 3).
2) The trials are independent of each other.
3) Each trial has two possible outcomes.
4) The probabilities of a "success" (p) and of a "failure" (q) do not change.
If you go back for a moment to Sec. 3.1, you will see that we have already obtained the
probability distribution of this in Example 3. Let us look at the probabilities once again.
P[X = 0] = P[getting three tails]
= P[TTT] = q x q x q = q^3

Similarly,
P[X = 1] = P[{TTH, THT, HTT}]
= P[TTH] + P[THT] + P[HTT]
= q^2 p + q^2 p + q^2 p = 3q^2 p
Similarly,
P[X = 2] = P[THH] + P[HTH] + P[HHT]
= 3p^2 q
and
P[X = 3] = p^3
In fact, the coefficient in the probability P[X = r], r = 0, 1, 2, 3, tells us that if we toss a coin three times,
how many ways, or combinations, will yield r heads and n - r tails.
Now you recall from your school mathematics that the number of combinations of n
objects taken r at a time is calculated by the formula

C(n, r) = n! / (r! (n - r)!)

In the case of tossing a coin three times, n = 3. Therefore,
we rewrite the probabilities as
P[X = 0] = C(3, 0) p^0 q^{3-0} = q^3
P[X = 1] = C(3, 1) p^1 q^{3-1} = 3pq^2
P[X = 2] = C(3, 2) p^2 q^{3-2} = 3p^2 q
P[X = 3] = C(3, 3) p^3 q^0 = p^3
This suggests that the probability P[X = r] = p_r for a given r can be calculated using
the formula

p_r = C(n, r) p^r q^{n-r} (3)


where r = number of successes
n = number of trials made
p = probability of success in a trial
q = 1 - p = probability of failure in a trial.
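
The formula can also be checked numerically. The following is a minimal sketch in Python; the function names `binomial_pmf` and `brute_force_pmf` are our own, chosen only for illustration. It compares the formula with a direct count over all head/tail sequences for the three-toss experiment discussed above.

```python
from itertools import product
from math import comb


def binomial_pmf(r, n, p):
    """P[X = r] = C(n, r) * p**r * q**(n - r), with q = 1 - p."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)


def brute_force_pmf(r, n, p):
    """Add up the probabilities of every H/T sequence with exactly r heads."""
    q = 1 - p
    total = 0.0
    for seq in product("HT", repeat=n):
        if seq.count("H") == r:
            total += p**r * q**(n - r)
    return total


# Three tosses of a fair coin: both columns should agree for r = 0, 1, 2, 3.
for r in range(4):
    print(r, binomial_pmf(r, 3, 0.5), brute_force_pmf(r, 3, 0.5))
```

The same two functions can be called with n = 5 to check other cases by enumeration.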
Why don't you check this formula for n=5, i.e. tossing of a coin 5 times. For example,
try this exercise.

E7) In the experiment of tossing a coin 5 times find the probability of getting 3 heads
and 2 tails. Verify that this probability is given by the formula given in Equation
(3).

Let us now sum up the points we have observed in the example above.
An experiment consisting of n trials is performed such that
i) each trial has two possible outcomes, viz., a "success" (with probability p) and a "failure" (with probability q);
ii) the probability of success, p, is the same for any trial;
iii) the outcomes of different trials are statistically independent (i.e. the trials are
independent).
These trials are called Bernoulli trials.
The sample space of this experiment consists of elements like "SSFSSF..." of
length n, made up of 'S's and 'F's, where S stands for a success and F stands for a failure.
Let X represent the number of successes (in any order whatsoever) in the set of n trials.
Then X is a discrete random variable taking integral values 0, 1, ..., n. The
probability of the event [X = r] is given by

P[X = r] = p_r = C(n, r) p^r q^{n-r} (4)

where r = exact number of successes
n = number of trials made
p = probability of success on a trial
q = 1 - p = probability of failure on a trial and
C(n, r) = n! / ((n - r)! r!). (The earlier example illustrates how we got this formula for p_r.)
Such a random variable X is called a binomial random variable and its probability
distribution is called the binomial distribution and is given by Eqn.(4).
Note: James Bernoulli was a seventeenth century Swiss mathematician who
performed some of the early work on the binomial distribution.
Problem 2: A sales representative calls on four potential clients. The probability that
she will obtain an order from each of them is 1/2, and whether or not she obtains an order
from one of them is statistically independent of whether or not she obtains an order
from any of the others. What is the probability distribution of the number of orders she
will receive?
Solution: We note that there are two mutually exclusive events (obtaining an order or
no order) each time she makes a call and the probability of an order is 1/2 each time.
Also the outcomes of the calls are statistically independent. Therefore this is a situation
where there are four Bernoulli trials and where the probability of a success (an order)
equals 1/2. Substituting n = 4 and p = 1/2 in Eqn.(4), we get that

P[X = r] = C(4, r) (1/2)^r (1/2)^{4-r} = C(4, r)/16, r = 0, 1, 2, 3, 4

Thus, the probability of no orders is 1/16, of one order is 1/4, of two orders is 3/8, of
three orders is 1/4, and of four orders is 1/16.
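
As a check on Problem 2, the distribution can be tabulated with exact fractions. This is only an illustrative sketch; the variable name `orders` is ours.

```python
from fractions import Fraction
from math import comb

# Problem 2: n = 4 calls, probability of an order p = 1/2 on each call.
n = 4
p = Fraction(1, 2)

# Binomial formula P[X = r] = C(n, r) p^r (1 - p)^(n - r), kept as exact fractions.
orders = {r: comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(n + 1)}

for r, prob in orders.items():
    print(r, prob)

# The probabilities of all possible values must add up to 1.
print("total:", sum(orders.values()))
```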

Problem 3: It has been claimed that in 60% of all solar heat installations, the utility
bill is reduced by at least one-third. Accordingly, what are the probabilities that the
utility bill will be reduced by one-third in
i) four of five installations?
ii) at least four of the five installations?
Solution: Here the random variable follows a binomial distribution with p = 0.6
and n = 5.
To find (i), we have to calculate P[X = 4], which is given by
P[X = 4] = C(5, 4)(0.6)^4(0.4)
= 0.259
Now to find (ii), we have to find the probability that X is at least 4. This probability is
the sum of the probabilities that X = 4 and X = 5 because 'at least 4' means '4 or more'.
Thus we have to find P[X = 4] + P[X = 5].
P[X = 5] = C(5, 5)(0.6)^5
= 0.078
∴ the required probability = 0.259 + 0.078 = 0.337.
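
The arithmetic of Problem 3 can be reproduced as follows. This is a small sketch, with `binomial_pmf` being our own helper name rather than anything defined in this unit.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Binomial probability of exactly r successes in n trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 5, 0.6
p4 = binomial_pmf(4, n, p)   # exactly four of five installations
p5 = binomial_pmf(5, n, p)   # all five installations
at_least_4 = p4 + p5         # 'at least 4' means 4 or 5

print(round(p4, 3), round(p5, 3), round(at_least_4, 3))
```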

The binomial distribution is particularly useful in situations where we have to decide whether
to accept a lot of goods (items) coming out of a manufacturing process. This decision
is based on how many defective items are in the lot. Companies (or firms) will generally
return the entire lot if there is evidence that more than a certain number of items are defective. To
make such a decision, let us see how we can make use of the binomial distribution.
An item coming out of a manufacturing process can either be defective or non
defective. Consider a lot of N items produced by the manufacturing process. Let m of
these be defective. Suppose a quality control inspector draws a random sample of n
items from the lot, one by one, with replacement (i.e. an item drawn is put back in the
lot, after noting down whether it is defective or non defective, before the next item is
drawn at random). Let X be the number of defective items drawn by the inspector. Note
that there are n trials and in each trial the probability that a defective item is picked
remains the same, namely m/N, as the drawing is done with replacement and at random.
Also note that the trials are independent. Therefore, the random variable X defined
above is distributed as a Binomial (n, p) where p = m/N.
By now you must have got some idea for recognising those situations where we can
apply the binomial formula. If we can apply the binomial distribution to study a situation, then
we say that the situation can be modelled by the binomial distribution.
Here are some exercises for you.

E8) A farmer buys a quantity of cabbage seeds from a company that claims that
approximately 80 % of the seeds will germinate if planted properly. If four seeds
are planted, what is the probability that exactly two will germinate?

E9) Consider again the data collected by Sunil, the newspaper boy. When Sunita, the
statistics student, saw the data, she started wondering if the number of customers
from among his ten irregular customers, who actually buy from him on a given
day, will follow a binomial distribution. What do you think? Under what
conditions will this random variable follow a binomial distribution?
E10) Sunita was still glancing through Sunil's diary wondering to herself if she could
think of the 10 customers as 'ten identical coins', when she noticed something
significant. She noticed that a lot more of the sequences had a 1 in the third
position than in the 8th position. Sunil remembered that customer 3 was the
management trainee whom he called 'Alka Didi'. She was from a neighbouring
town and was undergoing training in a software company. She was interested in
news about software companies, science and environmental issues. She would
often buy from Sunil but not always. Customer 8, Sunil told his sister, was a
mysterious young man by name Kapil, who was rumoured to be working for a
detective agency. One could rarely find him in the morning hours and if he was at
home in the morning hours, he would certainly buy from Sunil.

Given the situation above, do you still think that the number of sales on a day can
be modelled as a binomially distributed random variable? Give reasons for your
answer.

Once we have the probability distribution, we naturally ask what is the 'expected
value'. We shall see that now.
Expected Value of a Binomial Variable
We have already seen in Sec. 3.2 that for a discrete random variable X, the 'Expected
Value' E(X) is

E(X) = x_0 p_0 + x_1 p_1 + x_2 p_2 + ...

where x_0, x_1, ... are the values assumed by X and p_0, p_1, ... are the probabilities
associated with these values, i.e.

p_j = P[X = x_j], j = 0, 1, 2, ...

If X is a binomial r.v., taking values 0, 1, ..., n, then we know that

E(X) = 0 · C(n, 0) p^0 q^n + 1 · C(n, 1) p^1 q^{n-1} + ... + n · C(n, n) p^n q^0

We rewrite this expression in the sum notation Σ (called sigma) as

E(X) = Σ_{j=1}^{n} j C(n, j) p^j q^{n-j} = np Σ_{j=1}^{n} C(n-1, j-1) p^{j-1} q^{n-j}

Those who are familiar with the binomial expansion can recognise that the second
expression on the R.H.S. is (1 - p + p)^{n-1}. Therefore we have
E(X) = np [1 - p + p]^{n-1}
= np.

This means that the expected number of successes is np.
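
The result E(X) = np can also be checked numerically by adding up r × P[X = r] over all values of r. The sketch below does this for a few illustrative pairs (n, p); the helper names are ours.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Binomial probability of exactly r successes in n trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def expected_value(n, p):
    """E(X) computed directly from the definition: sum of r * P[X = r]."""
    return sum(r * binomial_pmf(r, n, p) for r in range(n + 1))


# Each line prints the direct sum next to n * p; the two should agree.
for n, p in [(3, 0.5), (6, 0.1), (20, 0.02)]:
    print(n, p, expected_value(n, p), n * p)
```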

Let us do a problem.

Problem 4: An oil exploration firm plans to drill six holes. It is believed that the
probability that each hole will yield oil is 0.1. Since the holes are in quite different
locations, the outcome of drilling one hole is statistically independent of that of drilling
any of the other holes.
(a) If the firm will be able to stay in business only if two or more holes produce oil,
what is the probability of its staying in business?
(b) Give the expected value of the number of holes that result in oil.

Solution: (a) If the firm can stay in business only if two or more holes produce oil, it
follows that the probability that it will stay in business equals 1 minus the probability
that the number of holes resulting in oil is 0 or 1. Each hole drilled can be viewed as a
Bernoulli trial where the probability of success is .1. Thus, the probability that the
number of successes is 0 or 1 equals:

P(0 or 1) = P(0) + P(1) = [6!/(0! 6!)] (.9)^6 + [6!/(1! 5!)] (.1)(.9)^5
= .531 + .354 = .885.

Consequently, the probability that the firm will be able to stay in business is
1 - .885 = .115.
(b) The expected value of the number of holes yielding oil is 6 x 0.1 = 0.6, since n = 6
and p = .1.
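
The figures of Problem 4 can be recomputed without intermediate rounding. In this sketch (our own variable names) the unrounded answer to part (a) comes out as about 0.1143; the value .115 in the solution arises from subtracting the already rounded .885.

```python
from math import comb

n, p = 6, 0.1

p0 = comb(n, 0) * (1 - p) ** n            # no hole yields oil
p1 = comb(n, 1) * p * (1 - p) ** (n - 1)  # exactly one hole yields oil
p_stay = 1 - (p0 + p1)                    # two or more holes yield oil
expected_holes = n * p                    # E(X) = np

print(round(p0, 3), round(p1, 3), round(p_stay, 4), expected_holes)
```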

A problem with the binomial distribution is that if the number of trials 'n' is very large
and the probability 'p' is very small, computation of P[X = r] is cumbersome.
The distribution which we introduce in the next section may be useful in such a
situation.

3.4 POISSON DISTRIBUTION

In this section we introduce you to another discrete distribution called the 'Poisson
distribution'. We will familiarise you with different situations where we can apply this
Poisson distribution. Let us try to understand this distribution through an example.
Note: Poisson was a nineteenth century French mathematician.
Suppose it is the busy Friday noon hour at a bank, and we are interested in the
number of customers who might arrive during that hour, or during a 5-minute or a
10-minute interval in that hour.
In statistical terms, we want to find the probabilities for the number of arrivals in a time
interval.
As in the case of the binomial distribution, here also we make some assumptions.
1) The average arrival rate per unit of time remains the same over the entire noon
hour.
2) The number of arrivals in a time interval does not depend on what happened in
previous time intervals.
3) It is extremely unlikely that there will be more than one arrival in a very short
interval of time. For instance, it is practically impossible for more than one customer to get
through the revolving entrance door in a fraction of a second.
Under these assumptions we find the required probability. For this we make use of the
following formula, known as the Poisson formula, given by

P(x) = e^{-λt} (λt)^x / x!

where λ is the Greek letter lambda, which denotes the average arrival rate per unit of
time, t is the number of units of time, and x is the number of arrivals in t units of time.
Suppose we want to find P(4), the probability of exactly four arrivals in a 3-minute interval.
Also we know that λ = 72 arrivals per hour is a constant for this situation. Since in the
question λ is given per hour, to standardise the unit, we have to express 't' in hours,
i.e. 60 minutes = 1 hour, so 3 minutes = 3/60 hours

∴ t = 1/20 hours

Then

λt = 72 × 1/20 = 3.6

To find P(4), we use Table 2, given in the Appendix. This table shows P(x) for
selected values of λ (here λt = 3.6).
From the table, we get

P(4) = 0.191

What does this value 0.191 specify? This tells us that if the arrivals are arrivals of
customers at a bank, there is a 19.1% chance that exactly four customers will arrive in the
next 3 minutes.
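
Instead of a table, the Poisson formula P(x) = e^{-λt} (λt)^x / x! can be evaluated directly. Here is a minimal sketch; `poisson_pmf` is a helper name of our own, and the figures are those of the bank example (λ = 72 per hour, t = 1/20 hour).

```python
from math import exp, factorial


def poisson_pmf(x, rate, t=1.0):
    """P(x) = e^(-rate*t) * (rate*t)^x / x! for x arrivals in t units of time."""
    mean = rate * t
    return exp(-mean) * mean**x / factorial(x)


# Exactly four arrivals in 3 minutes (t = 1/20 hour) at 72 arrivals per hour.
p_four = poisson_pmf(4, rate=72, t=1 / 20)
print(round(p_four, 3))
```

Changing `x` and `t` in the call gives the other probabilities of the same distribution.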
If we vary the values of x and t, we can get different probabilities. This gives the
probability distribution which is called Poisson probability distribution.
In the above discussion we saw that the Poisson formula is applicable only if certain
conditions are satisfied. We re-state the formula now.
Poisson Formula
The Poisson formula, given by

P(x) = e^{-λt} (λt)^x / x!, x = 0, 1, 2, ...

is used to compute probabilities for the number of occurrences in an interval of
time, if the occurrences have the following characteristics:
1) The average occurrence rate per unit of time is constant.
2) Occurrence in an interval is independent of what happened previously.
3) It rarely happens that there will be more than one occurrence in a very short
time interval.
A distribution having probabilities given by the Poisson formula is called a Poisson
distribution.
Now let us see some situations where we can apply the Poisson distribution. Here is an
example.
Problem 5: Calls at a telephone switchboard occur at an average rate of six calls per
10 minutes. Suppose the operator leaves for a 5-minute coffee break. What is the
probability that exactly two calls come in (and so go unanswered) while the operator is
away?

Solution: Here you can check that conditions 1, 2 and 3 of the Poisson formula are
satisfied in this case. Therefore we can use the formula. Note that here λ = 6/10 calls per minute. In this
case t = 5, so that λt = 3. Hence the required probability P(2) is given by

P(2) = e^{-3} 3^2 / 2! = 0.2240
That means there is 0.2240 chance that two calls go unanswered.
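
The value in Problem 5 can be verified with a few lines of Python. This is only a sketch; the variable names are ours, and the rate is expressed per minute so that t = 5 minutes.

```python
from math import exp, factorial

rate_per_minute = 6 / 10   # six calls per 10 minutes
t = 5                      # length of the coffee break in minutes
mean = rate_per_minute * t # expected number of calls during the break

# Poisson formula with x = 2 calls.
p_two_calls = exp(-mean) * mean**2 / factorial(2)
print(round(p_two_calls, 4))
```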



Here are some exercises for you.

E11) If a bank receives on an average λ = 6 bad checks per day, what is the probability
that it will receive 4 bad checks on any given day?

E12) Suppose a hospital has 20 kidney dialysis machines and that the chance of any one of them
malfunctioning during any day is .02. We want to find the probability that exactly
3 machines will be out of service on the same day. Then,
i) can we use the binomial formula to find this probability? If yes, calculate the
probability.
ii) can we use the Poisson formula to find this? If yes, calculate the probability.

In the above exercise we have seen that the difference between the two calculations is
very small.
The Poisson formula can be used to approximate the binomial probability of r successes
in n binomial trials in the situations where n is large and probability of success 'p' is
small.
For instance, suppose we are interested in the number of road accidents in a metropolitan
city or the daily number of machine breakdowns in a workshop etc., during a specified
interval of time. Suppose this interval is divided into a large number of subintervals.
Each of these subintervals is so small that at most one
occurrence happens within it. Thus we may look upon each subinterval as a trial. Each
trial leads to a "success" if the occurrence happens during that subinterval and to a
"failure" if the occurrence does not happen. Assume that the occurrences are
independent of each other. Hence, the total number of occurrences can be considered
to be distributed binomially, the total number of trials being equal to the number of
subintervals, which we have ensured to be large; also, the length of each subinterval
being small, the probability of an occurrence (success) is likely to be small.
Thus we have seen that there are situations where both the binomial and the Poisson
distributions can be applied. The rule of thumb followed by most statisticians is that if n ≥ 20 and
p ≤ 0.05, then the Poisson formula can be used to calculate binomial probabilities.
It is clear that the Poisson calculation is simpler than the binomial calculation. An
advantage of the Poisson distribution, if it is applicable, is that it has only one
parameter, λ, whereas the binomial distribution has two parameters, n and p;
consequently, Poisson probabilities can be tabulated more compactly than binomial
probabilities. For example, the Poisson probability P(3) is the same for n = 200,
p = 0.01 as it is for n = 100, p = 0.02, and for any other pair of n and p values whose
product is λ = np = 2.
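
To see how close the approximation is, the exact binomial probabilities for the two pairs mentioned above can be compared with the single Poisson value for λ = 2. The following is a rough sketch with helper names of our own.

```python
from math import comb, exp, factorial


def binomial_pmf(r, n, p):
    """Exact binomial probability of r successes in n trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def poisson_pmf(x, mean):
    """Poisson probability of x occurrences when the mean is np."""
    return exp(-mean) * mean**x / factorial(x)


poisson_p3 = poisson_pmf(3, 2)  # one value serves every (n, p) with np = 2
for n, p in [(200, 0.01), (100, 0.02)]:
    exact = binomial_pmf(3, n, p)
    print(n, p, round(exact, 4), round(poisson_p3, 4))
```

The exact binomial values differ slightly between the two pairs, while the Poisson value is the same for both; the approximation improves as n grows and p shrinks.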
By now you must have got a fairly good idea where the Poisson formula can be used.
In all the situations we have considered so far, we have calculated the probability over an
interval of time. But there are situations where we need to calculate probability over a
region (or space) or something else as our physical reference. In the following example
we give such a situation and illustrate how to use the Poisson distribution to calculate the
probability.
Example 5: During the Second World War, V-1 rockets hit South London. Later a study
was conducted on which regions were not affected by the rocket hits. Let us see how
the Poisson distribution was used for this study.
They took λ as the average number of hits per unit area (note that earlier in the formula
λ was the average rate per unit time). Instead of the variable 't' they used the variable 'v',
and x denotes the number of hits per unit area. Then they assumed that all the
conditions required for the Poisson formula hold in this case. With all these assumptions,
they calculated the probability using the formula

P(x) = e^{-λv} (λv)^x / x!

According to the problem stated, they had to calculate the probability of 'no hit' per unit
area. That is, x = 0 and v = 1, so that λv = λ. Now, to calculate λ, what they did
was, they divided the area into 576 areas of equal size (the number 576 was chosen based
on some other study) and they found that there were 537 hits.

∴ the average number of hits per unit area λ = 537/576 = 0.9323

Then the required probability is

P(0) = e^{-0.9323} (0.9323)^0 / 0! = e^{-0.9323} = 0.3936

This means that if we take one region, then the probability that the region is not hit by
the rocket is 0.3936. Hence, out of 576 regions, the number of regions not hit by the
rocket is given by

576 × 0.3936 = 226.7 (approximately), i.e. about 226 regions.

Now, the actual number got from the record was that there are 229 regions not hit by the
rocket. This number is quite close to 226. This shows that the values got using the Poisson
formula are very close to the actual values.
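
The calculation in Example 5 is easy to repeat. The sketch below uses only the figures quoted in the example (576 regions, 537 hits, 229 regions recorded as not hit); the variable names are ours.

```python
from math import exp

regions = 576
hits = 537
observed_unhit = 229

lam = hits / regions                 # average number of hits per region
p_no_hit = exp(-lam)                 # Poisson P(0) = e^(-lambda)
expected_unhit = regions * p_no_hit  # expected number of regions with no hit

print(round(lam, 4), round(p_no_hit, 4), round(expected_unhit, 1), observed_unhit)
```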
Thus we saw that the Poisson distribution is very effective in studying various real-life
problems where the occurrence is very rare.
One of the main disadvantages of this distribution is that it is applicable only in
situations where the outcomes are independent, i.e. each outcome is independent of what
happened previously.
In the next section we shall discuss another standard distribution.

3.5 UNIFORM DISTRIBUTION


The uniform distribution is the simplest of a few well-known continuous distributions
which occur often.
As we have seen in Sec. 3.2, in the continuous case we are interested in the behaviour of the
variable in the subintervals of the sample space, rather than at single points. If, for
example, the sample space is what we call the unit interval [0, 1], and we set the random
variable X as a value selected from this interval, then we are no longer interested in
outcomes of the kind {X = a}, but rather in events of the kind {a < X < b}, i.e.
values lying between the two numbers a and b, where 0 ≤ a ≤ b ≤ 1.
Suppose X is a random variable such that if we take any subinterval of the sample
space, then the probability of this interval is the same as the probability of any other
subinterval of the same length. The distribution corresponding to this r.v. is called a
uniform distribution. As the name suggests, the probability is uniform along
subintervals.
Let us see some examples of such sample spaces.
Example 6: A train is likely to arrive at a station at any time between 6.10 p.m. and
6.40 p.m. The time the train reaches, measured in minutes, after 6 p.m. is a random
variable X. Here X can take any value between 10 and 40 minutes. Therefore the
sample space is the interval (10,40). It is reasonable to assume that the likelihood for X
taking any value between 10 and 40 is equal. So if we take subintervals of equal
lengths, then the probability will be the same. The distribution corresponding to this r.v.
is uniform over the interval (10,40).
***
Example 7: An office fire drill is scheduled for a particular day, and the fire alarm is
likely to ring at any time between 9 a.m. and 5 p.m. The time the fire alarm starts,
measured in minutes, after 9 a.m. is therefore a random variable which takes any value
between 0 and 480 (= 8 hours = 8 x 60 = 480 minutes) equally. The distribution
corresponding to this r.v. is uniform.
***
Now, why don't you look for such sample spaces on your own? Try this exercise now.

E13) Verify whether the following situations can be described by a uniform distribution
or not.
a) The average life span of a light bulb produced by a manufacturing company.

b) The number of defective items produced by an assembly process.

Next we will see how we can define (calculate) the probabilities for this distribution. As
we have seen in Sec. 3.2, in the case of a continuous distribution, the probabilities are
calculated using a function called the 'probability density function' (p.d.f.). The p.d.f. for
the uniform distribution is given as follows.
Definition 5: The p.d.f. of a random variable X which is distributed uniformly in the
interval [a, b], where a < b, is given by

f(x) = 1/(b - a), for a ≤ x ≤ b
= 0, otherwise

We can easily draw the graph of this distribution. It is given in Fig. 8.

Fig. 8: Graph of the p.d.f. of a uniform distribution


Now let us see how we calculate different probabilities for this distribution. As stated in
Sec. 3.2, for a continuous r.v., we calculate the probability of an interval rather than a
point. For example, what will be P[c < X < d] where a < c < d < b? We have seen
that it is given by the area above this interval and under the graph. The area is shown in
Fig. 9.

Fig. 9: P[c < X < d] = Area of the rectangle shown

So, essentially it is the area of the rectangle with length d - c and height 1/(b - a), i.e.

P[c < X < d] = (d - c) x 1/(b - a)
For example, if we take the situation in Example 7, let us find the probability that the
alarm sounds between 1 p.m. and 2 p.m. Here the pdf is

f(x) = 1/480, for 0 ≤ x ≤ 480
= 0, otherwise
To find the required probability, you have to find the time elapsed in minutes between 9
a.m. and 1 p.m. and between 9 a.m. and 2 p.m.
For 1 p.m. this is 4 x 60 = 240 minutes.
Similarly, for 2 p.m., it is 5 x 60 = 300 minutes.
Therefore you have to calculate the probability P[240 < X < 300]. This is given by the
area under the pdf between 240 and 300.
This area is the rectangle with base 60 (= 300 - 240) and height 1/480, so that

P[240 < X < 300] = 60 x 1/480 = 1/8 = 0.125

That is, there is a 12.5% chance that the alarm sounds between 1 p.m. and 2 p.m. [Some
of you may think that this fact was rather obvious from the statement of the problem
itself. But we have given this situation as an illustrative example. There are situations
which are complicated, where we can easily calculate the probability using this
distribution.]
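
The rectangle-area rule P[c < X < d] = (d - c) x 1/(b - a) translates directly into code. Below is a minimal sketch applied to the fire-alarm situation (uniform on 0 to 480 minutes, interval from 240 to 300 minutes); the function name `uniform_prob` is our own.

```python
def uniform_prob(c, d, a, b):
    """P[c < X < d] for X uniform on [a, b], assuming a <= c <= d <= b."""
    if not (a <= c <= d <= b):
        raise ValueError("need a <= c <= d <= b")
    # Area of a rectangle with base (d - c) and height 1 / (b - a).
    return (d - c) / (b - a)


# Alarm between 1 p.m. (240 min after 9 a.m.) and 2 p.m. (300 min after 9 a.m.).
p_alarm = uniform_prob(240, 300, 0, 480)
print(p_alarm)
```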
Next we state below the expected value of this distribution.

E(X) = (a + b)/2
You can try this exercise now.

E14) Suppose that the weight of sugar obtained by processing a tank of sugar cane juice
is uniformly distributed with a mean of 10 kg. and range of 1.8 kg. Then
i) What are the largest and smallest weights of sugar obtained from a tank of
sugar cane juice?
ii) What is the probability that a tank of juice will yield sugar weighing between
9 kg. and 10.5 kg.?

E15) A train is due to arrive at 5.30 p.m. but in practice is equally likely to arrive at any
time between 2 minutes early and 30 minutes late. Let the time of arrival
(expressed as minutes from due time) be X. Sketch the pdf f(x) of the r.v. X and
shade the areas given below.
1) The probability that the train is less than 10 minutes late.
2) The probability that the train is late, but less than 16 minutes late.

Next we shall discuss another continuous distribution which is widely used in statistical
problems.

3.6 NORMAL DISTRIBUTION


'Normal distribution' is a class of distributions which can be used to study the probability
distributions occurring frequently in real-life situations, in biology, manufacturing,
machines, psychology etc.
A particular form of this distribution was found by the seventeenth - eighteenth century
mathematicians Abraham De Moivre and Pierre Laplace, while they were working on
various problems in probability. They found that the distribution corresponding to
certain random variables had a special property: when graphed, a bell-shaped
curve is obtained. This came to be called the normal pattern. The graph of the pattern
became known as the normal curve. Later this class of distributions was studied extensively
by another mathematician, Karl Friedrich Gauss, and therefore it became known as the
'Gaussian distribution'.
We shall now state the distribution.

Definition 6: We say that a random variable X is normally distributed with parameters
μ and σ if the probability density function f(x) of X is given by

f(x) = (1/(σ √(2π))) e^{-(x - μ)^2 / (2σ^2)}, -∞ < x < ∞,

where μ is a real number lying between -∞ and ∞ and σ is a real number lying
between 0 and ∞.
The function f(x) may look rather formidable to you at first sight. At this stage we just
ask you to notice that it involves two parameters, σ and μ. Corresponding to each pair
(μ, σ), we get a distribution. Therefore there is a whole family of distributions, each
one specified by a particular pair of values for σ and μ.
The most important characteristic of this distribution is that the graph of the pdf, f(x), for a
particular value of μ and σ is bell-shaped as shown in Fig. 11.
The probability density function, pdf, is also symmetrical about the mean μ. The word
symmetrical means that the two halves of the curve are mirror images (see Fig. 11). In
Fig. 11 you note that if we place a mirror on the dashed vertical line (which occurs at
75 in Fig. 11) then the mirror image of the portion on the left is the same as the portion
on the right side.

Both μ and σ have a 'nice' interpretation. We have already said that the pdf is
symmetric about μ, so it is no surprise that μ is the mean of the distribution. The
other constant, σ^2, dictates how spread out and flat the 'bell-shape' is and in fact σ^2 is
the variance of the normal distribution.
As an illustration, the following figure shows the normal pdfs for the values of μ and σ given
as follows:
A μ = 10, σ = 1
B μ = 10, σ = 2
C μ = 10, σ = 3
D μ = 15, σ = 1

Fig. 12
Pdfs A, B and C all have the mean 10 and so they are all centred at x = 10. Of these
three curves, C has the largest variance and so is the most 'spread out'. Curve B has a
smaller variance and so is less spread out, and curve A has the smallest variance and so
is the most 'squeezed in'. Curves A and D have the same variance and so they have
exactly the same shape, but they have different means so they are centred at x = 10 and
x = 15 respectively.
Some notation
As a normal distribution is entirely specified by its parameters μ and σ, we denote such a
distribution by N(μ, σ^2), where μ is the mean and σ^2 is the variance. So, for instance,
the curve shown in (A) above is the pdf N(10, 1), the curve in (B) is the pdf N(10, 4)
and so on.
The standard normal distribution
The normal distribution with mean μ = 0 and variance σ^2 = 1 is called the standard
normal distribution. Z is the notation usually used for a random variable which has this
distribution. A graph of the standard normal pdf, p(z), is shown in Fig. 13.

Fig. 13
Notice that most of the area under the standard normal curve lies between -3 and +3.
Calculating Probabilities
The normal distribution is continuous, and so the probability that the random variable X
lies in the interval (a, b) is calculated by obtaining the area under the pdf
curve between a and b.
For example, suppose an individual's IQ score X has a normal distribution with
mean μ = 100 and standard deviation σ = 15. Fig. 14 shows the areas under the pdf which
correspond to P(X < 85) and P(115 < X < 120).

Fig. 14
Unfortunately there are no 'nice' formulae for calculating such areas. But there are
tables available from which we can find the area. Statistical software is also
available with which we can calculate the area.
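To see what 'area under the pdf' means in practice, the area can be approximated numerically. The sketch below is an illustration only and is not the method used to build the tables. It assumes the usual normal density formula and uses the trapezoidal rule; the names normal_pdf and area_under_pdf are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x (standard formula, assumed here)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_under_pdf(a, b, mu, sigma, steps=10_000):
    """Approximate P(a < X < b) by the trapezoidal rule (hypothetical helper)."""
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h

mu, sigma = 100, 15  # IQ example from the text
# The left tail is cut off 10 standard deviations below the mean,
# where the remaining area is negligible.
p_below_85 = area_under_pdf(mu - 10 * sigma, 85, mu, sigma)
p_115_to_120 = area_under_pdf(115, 120, mu, sigma)
print(round(p_below_85, 4))
print(round(p_115_to_120, 4))
```

In practice one would not integrate by hand like this; the point is only that each probability is literally an area, which is what the tables store.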
Because the number of possible values for μ and σ is unlimited, the number of different
normal distributions is unlimited. However, probabilities for every normal distribution
can be obtained from a table of probabilities for the standard normal distribution.
We shall first discuss how to use the table for calculating probabilities for a standard
normal distribution. Then we shall discuss how to use this to find the probability for
any normal distribution.
Using tables to calculate normal probability
We denote by F(a) = P[Z ≤ a] the probability that the standard normal variable Z
takes values less than or equal to a. The values of F for different values of a are
calculated and listed in a table. One such table is given in Sec. 3.9 Appendix.
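A table of this kind can be regenerated with a few lines of code. The sketch below assumes that F can be written in terms of the error function as F(z) = ½[1 + erf(z/√2)], a standard identity that is not derived in this unit. The helper table_row is hypothetical and only mimics the usual layout, in which the row gives z to one decimal place and the column gives the second decimal place.

```python
import math

def F(z):
    """Standard normal cdf F(z) = P[Z <= z], written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def table_row(row_z):
    """Entries of the table row labelled row_z, for columns 0.00 to 0.09."""
    return [round(F(row_z + col / 100), 4) for col in range(10)]

row = table_row(1.2)
print("z = 1.2:", row)
print("F(1.28) =", row[8])  # entry in the column headed 0.08
```

Reading the row labelled 1.2 and the column headed 0.08 gives F(1.28), exactly as one would do with the printed table.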
Note that the table gives F(z) for z = 0.00, 0.01, 0.02, ..., 3.49. To find
the probability that a random variable having the standard normal distribution will take
on a value between a and b, we use the equation
P[a < Z < b] = F(b) − F(a),
and, if either a or b is negative, we also make use of the identity
F(−z) = 1 − F(z).

In the following example we illustrate how to use the table to calculate different
probabilities.
Example 8: Suppose we want to find the following probabilities:
i) P[0.87 < Z < 1.28]
ii) P[−0.34 < Z < 0.62]
iii) P[Z ≥ 0.85]
iv) P[Z ≥ −0.65]
We proceed as follows.
i) We know that
P[0.87 < Z < 1.28] = F(1.28) − F(0.87).
To find F(1.28), we find the row where z = 1.2, then move across that row to the
column headed 0.08 and find the entry 0.8997. Similarly we can find that F(0.87)
= 0.8078. Then the required probability is
P[0.87 < Z < 1.28] = 0.8997 − 0.8078 = 0.0919.

ii) Similarly,
P[−0.34 < Z < 0.62] = F(0.62) − F(−0.34)
= F(0.62) − [1 − F(0.34)], by the identity F(−z) = 1 − F(z),
= 0.7324 − (1 − 0.6331)
= 0.3655.
iii) From the previous unit (Unit 2), you have already learnt that the probability that
an event does not occur is 1 minus the probability that it occurs.
Hence we have
P[Z > 0.85] = 1 − P[Z ≤ 0.85]
= 1 − F(0.85)
= 0.1977.
iv) As in (iii), we can write
P[Z > −0.65] = 1 − P[Z ≤ −0.65]
= 1 − F(−0.65)
= 1 − [1 − F(0.65)]
= F(0.65) = 0.7422.
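The four answers of Example 8 can be cross-checked without the table. This is a sketch under the assumption that F is computed from the error function (a standard identity, not part of this unit). Because the printed table is rounded to four decimal places, the last digit of a computed answer may differ by one from the table-based answer.

```python
import math

def F(z):
    """Standard normal cdf F(z) = P[Z <= z], via the error function (assumed identity)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p1 = F(1.28) - F(0.87)    # (i)   P[0.87 < Z < 1.28]
p2 = F(0.62) - F(-0.34)   # (ii)  P[-0.34 < Z < 0.62]
p3 = 1.0 - F(0.85)        # (iii) P[Z > 0.85], by the complement rule
p4 = 1.0 - F(-0.65)       # (iv)  P[Z > -0.65], which equals F(0.65) by symmetry

for label, p in zip(("i", "ii", "iii", "iv"), (p1, p2, p3, p4)):
    print(f"({label}) {p:.4f}")
```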

In the following exercise we ask you to find certain probabilities using the normal
distribution table.

E16) If a random variable has the standard normal distribution, find the probability that
it will take on a value
i) less than 1.50
ii) less than −1.20
iii) greater than −1.75
E17) A filling machine is set to pour 952 ml (millilitres) of oil into bottles. The
amounts of fill are normally distributed with a mean of 952 ml and a standard
deviation of 4 ml. Use the standard normal table to find the probability that a
bottle contains between 952 and 956 ml of oil.

Next we shall see how to use the standard normal probability table to calculate
probabilities for any normal distribution.
Standardising

Any normal random variable X, which has mean μ and variance σ², can be standardised
as follows.
Take the variable X, and
i) subtract its mean, μ, and then
ii) divide by its standard deviation, σ.
We will call the result Z, so

Z = (X − μ)/σ
For example, suppose, as earlier, that X is an individual's IQ score and that it has a
normal distribution with mean μ = 100 and standard deviation σ = 15. To standardise
an individual's IQ score, X, we subtract μ = 100 and divide the result by σ = 15 to give

Z = (X − 100)/15

In this way every value of X has a corresponding value of Z. For instance, when
X = 130, Z = (130 − 100)/15 = 2, and when X = 90, Z = (90 − 100)/15 = −0.67.

The distribution of standardised normal random variables

The reason for standardising a normal random variable in this way is that a standardised
normal random variable

Z = (X − μ)/σ

has a standard normal distribution.


That is, Z is N(0, 1). So if we take any normal random variable, subtract its mean and
then divide by its standard deviation, the resulting random variable will have a standard
normal distribution. We are going to use this fact to calculate (non-standard) normal
probabilities.
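The claim that standardised values behave like N(0, 1) can be illustrated by simulation. The following is a rough sketch, not a proof. It draws simulated IQ scores with Python's random.gauss, standardises them with a hypothetical helper standardise, and checks that the results have mean close to 0 and standard deviation close to 1. The sample size and the seed are arbitrary choices.

```python
import random
import statistics

def standardise(x, mu, sigma):
    """Return the standardised value (x - mu) / sigma (hypothetical helper)."""
    return (x - mu) / sigma

mu, sigma = 100, 15        # IQ example from the text
rng = random.Random(0)     # fixed seed so that the run is repeatable
iq_scores = [rng.gauss(mu, sigma) for _ in range(100_000)]
z_scores = [standardise(x, mu, sigma) for x in iq_scores]

z_mean = statistics.fmean(z_scores)   # should be close to 0
z_sd = statistics.pstdev(z_scores)    # should be close to 1
print(round(z_mean, 2), round(z_sd, 2))

# The two individual scores used in the text
print(standardise(130, mu, sigma), round(standardise(90, mu, sigma), 2))
```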
Calculating probabilities
With reference to the problem of IQ scores, suppose we want to find the probability that
an individual's IQ score is less than 85, i.e. P[X < 85]. The corresponding area under
the pdf of N(100, 15²) is shown in Fig. 15.

We cannot use normal tables directly because these give N(0, 1) probabilities. Instead,
we will convert the statement X < 85 into an equivalent statement which involves the
standardised score, Z = (X − 100)/15, because we know it has a standard normal distribution.
We start with X < 85. To turn X into Z we must standardise X, but to ensure that we
preserve the meaning of the statement we must treat the other side of the inequality in
exactly the same way. (Otherwise we will end up calculating the probability of another
statement, not X < 85.) 'Standardising' both sides gives

(X − 100)/15 < (85 − 100)/15.
The left-hand side is now a standard normal random variable and so we can call it Z,
and we have

Z < −1.

So we have established that the statement we started with, X < 85, is equivalent to
Z < −1. This means that whenever an IQ score, X, is less than 85 the corresponding
standardised score, Z, will be less than −1, and so the probability we are seeking,
P[X < 85], is the same as P[Z < −1].
P[Z < −1] is just a standard normal probability and so we can look it up in Table 1 in
the usual way, which gives 0.1587. We get that P[X < 85] = 0.1587.
This process of rewriting a probability statement about X in terms of Z is not difficult
if you systematically write down what you are doing at each stage. We would lay
out the working we have just done for P[X < 85] as follows.
X has a normal distribution with mean 100 and standard deviation 15. Let us find the
probability that X is less than 85.

P[X < 85] = P[(X − 100)/15 < (85 − 100)/15]
= P[Z < −1] = 0.1587
Let us do some problems now.
Problem 6: For each of these write down the equivalent standard normal probability.
a) The number of people who visit a historic monument in a week is normally
distributed with a mean of 10,500 and a standard deviation of 600. Consider the
probability that fewer than 9000 people visit in a week.
b) The number of cheques processed by a bank each day is normally distributed with
a mean of 30,100 and a standard deviation of 2450. Consider the probability that
the bank processes more than 32,000 cheques in a day.
Solution:
a) Here we want to find the standard normal probability corresponding to the
probability P[X < 9000]. We have
P[X < 9000] = P[(X − 10500)/600 < (9000 − 10500)/600] = P[Z < −2.5].
b) Here we want to find the standard normal probability corresponding to the
probability P[X > 32000]. We have
P[X > 32000] = P[(X − 30100)/2450 > (32000 − 30100)/2450] = P[Z > 0.78].

Note: Probabilities like P[a < X < b] can be calculated in the same way. The only
difference is that when X is standardised, similar operations must be applied to both a
and b. That is, a < X < b becomes

(a − μ)/σ < (X − μ)/σ < (b − μ)/σ

which is

(a − μ)/σ < Z < (b − μ)/σ

Problem 7: An individual's IQ score has a N(100, 15²) distribution. Find the
probability that an individual's IQ score is between 91 and 121.

Solution: We require P[91 < X < 121]. Standardising gives

(91 − 100)/15 < (X − 100)/15 < (121 − 100)/15.

The middle term is a standardised normal random variable and so we have

P[91 < X < 121] = P[−0.6 < Z < 1.4] = F(1.4) − F(−0.6) = 0.9192 − 0.2743 = 0.6449.
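The recipe for P[a < X < b] (standardise both limits, then take a difference of two F values) is short enough to write as a function. The sketch below is one possible way to do it. The name normal_interval_prob is hypothetical, and F is computed from the error function rather than looked up in the table, so the last digit may differ slightly from a table-based answer.

```python
import math

def F(z):
    """Standard normal cdf F(z) = P[Z <= z], via the error function (assumed identity)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_interval_prob(a, b, mu, sigma):
    """P[a < X < b] for X ~ N(mu, sigma^2), obtained by standardising both limits."""
    z_a = (a - mu) / sigma   # standardised lower limit
    z_b = (b - mu) / sigma   # standardised upper limit
    return F(z_b) - F(z_a)

# Problem 7: IQ score between 91 and 121, with mu = 100 and sigma = 15
p = normal_interval_prob(91, 121, 100, 15)
print(round(p, 3))
```

The same function handles one-sided statements if a very small lower limit or a very large upper limit is supplied, since F is then practically 0 or 1.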


Try these exercises now.

E18) A flight is due at Palam airport at 1800 hours. Its arrival time has a normal
distribution with mean 1810 hours and standard deviation 10 minutes.
a) What is the probability that the flight arrives before its due time?
b) Passengers must check in for a connecting flight by 1830 at the latest. What
is the probability that passengers from the first flight arrive too late for the
connecting flight? (Assume no travelling time from aircraft to check-in.)
E19) The length of metallic strips produced by a machine has mean 100 cm and
variance 2.25 cm². Only strips with a length between 98 and 103 cm are
acceptable. What proportion of strips will be acceptable? You may assume that
the length of a strip has a normal distribution.

With this we come to an end of this unit.


Let us now summarise the points we have covered in this unit.

3.7 SUMMARY
In this unit we have covered the following points:
1) A random variable is a variable that takes on different numerical values according
to chance outcomes.
2) There are two types of random variables: discrete and continuous.
3) A probability distribution gives the probabilities with which a random variable
takes on the various values in its range.
4) We have discussed four standard distributions:

a) Binomial distribution: The probability P[X = r] in this distribution is given by
P[X = r] = C(n, r) p^r q^(n−r)
b) Poisson distribution: The probability P[X = x] in this distribution is given by
P[X = x] = e^(−λ) λ^x / x!
where λ is a constant for a particular situation.
c) Uniform distribution: The probability density function is defined by
f(x) = 1/(b − a), if a ≤ x ≤ b
f(x) = 0, elsewhere
The probability P[c < X < d] = (d − c)/(b − a).
d) Normal distribution: The probability for this distribution is calculated by
finding the area under the curve of a function called the probability density
function, defined by
f(x) = (1/(σ√(2π))) e^(−(x − μ)²/(2σ²)).

E1) a) If X denotes the number of correct answers, then X is the random variable for
this situation.
b) X can take the values 0, 1, 2, ..., up to 50.
c) P[X = 40] means the probability that the number of correct answers is 40.
E2) (1) is not discrete. (2) and (3) are discrete.
(1) is not discrete because it takes values in an interval.
(2) is discrete because the number of accidents is finite. Similarly, you can argue for
the situation in (3).
E3) Let X denote the amount you win or lose. Then X takes the values Rs. 50, 0 or −10
(a loss of Rs. 10). The probability that both the marbles are green is 1/9, i.e.
P[X = 50] = 1/9. The probability that both the marbles are red is 4/9, i.e.
P[X = −10] = 4/9.
The probability that the marbles are of different colours is 4/9, i.e. P[X = 0] =
4/9.
Thus the probability distribution is as given in the following table.
Amount (in Rs.) won (+) or lost (−)    Probability
50                                      1/9
0                                       4/9
−10                                     4/9
E4) He has to calculate the mean. It is given by

Mean = (0 × 0.05 + 1 × 0.15 + 2 × 0.35 + 3 × 0.25 + 4 × 0.12 + 5 × 0.18) / (0.05 + 0.15 + 0.35 + 0.25 + 0.12 + 0.18)

This means that he can expect that, on average, 2 cars will be sold per day over the
long run (or, more precisely, 5 cars will be sold over 2 days).

Fig. 16
ii) The area under the graph and above the interval [2, 4] is the area of the
rectangle shown in Fig. 16, which is given by
Area = 2 × 1/2 = 1.
∴ f defines a probability density function of X.
E6) 1,3,4,5,6 are discrete. 2 and 7 are continuous.
E7) Here we have to calculate the probability of 3 heads and 2 tails, that is, P[X = 3].
The possible ways of getting 3 heads and 2 tails in five tosses of a coin are
[HHHTT], [HHTHT], [HHTTH], [HTHHT], [HTHTH], [HTTHH], [THHHT],
[THHTH], [THTHH], [TTHHH].
Each of these events has probability p³q² and there are 10 such events.
Therefore we get
P[X = 3] = 10p³q².
If we apply the formula in Equation (3) to find P[X = 3], then we substitute
r = 3, n = 5 in the formula, and we get
P[X = 3] = C(5, 3) p³q².
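The count of 10 arrangements can be confirmed by listing all 2⁵ = 32 sequences of five tosses and keeping those with exactly three heads. The sketch below does this with Python's itertools. The helper binomial_pmf is hypothetical, and the value p = 0.5 (a fair coin) is an assumption made only for the numerical check, since the exercise leaves p general.

```python
from itertools import product
from math import comb

# All sequences of five tosses, written as strings such as "HHTHT"
outcomes = ["".join(seq) for seq in product("HT", repeat=5)]
three_heads = [s for s in outcomes if s.count("H") == 3]

print(three_heads)
print(len(three_heads), comb(5, 3))  # both counts should agree

def binomial_pmf(r, n, p):
    """P[X = r] = C(n, r) p^r q^(n-r) with q = 1 - p (hypothetical helper)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Assumed fair coin: each of the 32 sequences is then equally likely
print(binomial_pmf(3, 5, 0.5), len(three_heads) / len(outcomes))
```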

E8) This situation follows binomial distribution with n=4 and p = #$ = $.The
random variable X is the number of seeds that germinate. We have to calculate the
probability that exactly two of the four seeds will germinate, that is, P[X = 2]. By
applying the binomial formula, we get
P[X = 2] = C(4, 2) p²q².
Therefore the required probability is 0.154.


E9) If Xi denotes the random variable indicating whether the ith customer buys the paper on a given
day, then the Xi's may not be identically distributed. Therefore the Xi's may not be
binomially distributed. But if the customers have the same business
activities, or the same kind of habits or working nature, then we can expect that the Xi's
will be identically distributed. In such a situation we can expect that the Xi's will
follow a binomial distribution.
E10) The number of sales in a day is actually X1 + X2 + ... + X10, where each Xi is
either 0 or 1 depending on whether customer i buys the paper or not on the given
day. Now since customer 8 is more likely to buy on a given day than customer 3, X3
and X8 are not identically distributed. That is, P[X8 = 1] > P[X3 = 1]. Therefore
X1 + X2 + ... + X10 cannot be thought of as a binomially distributed random
variable.
E11) Since the problem deals with the receipt of bad cheques, which is an event with
rare occurrence over an interval of time (a day, in this case), we can apply the Poisson
distribution.
Since on average 6 bad cheques are received per day, λ = 6.
Substituting λ = 6 and x = 4 in the Poisson formula, we get
P[X = 4] = 6⁴e⁻⁶/4! = (1296 × 0.0025)/24 = 0.135.
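The Poisson calculation can be repeated without rounding e⁻⁶ to 0.0025. The sketch below uses the Poisson formula P[X = x] = λ^x e^(−λ)/x! with λ = 6 and x = 4. The function name poisson_pmf is our own, and the small difference from the hand calculation comes only from the rounding of e⁻⁶.

```python
import math

def poisson_pmf(x, lam):
    """P[X = x] = lam^x * e^(-lam) / x! (hypothetical helper for the Poisson formula)."""
    return lam**x * math.exp(-lam) / math.factorial(x)

p = poisson_pmf(4, 6)  # probability of exactly 4 bad cheques when lam = 6
print(round(p, 4))

# Sanity check: the probabilities over x = 0, 1, 2, ... add up to (almost exactly) 1
total = sum(poisson_pmf(k, 6) for k in range(60))
print(round(total, 6))
```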

E12) Note that here the experiment or trial is 'checking the machine for its functioning'.
There are 20 trials and the trials are identically distributed with probability 0.02.
i) The trials are also independent. Therefore we can apply the binomial formula.
We are required to calculate P[X = 3]. Then
P[X = 3] = C(20, 3)(0.02)³(0.98)¹⁷.
ii) Here we have to check whether we can apply the Poisson distribution.

Note that here the occurrences are 'function of the dialysis machines'. Then the
average rate of machines that go out of service in a day is a constant,
λ = 20 × 0.02 = 0.4.
Also note that we can make the subintervals so small that at most one
machine goes out of service. Thus conditions (2) and (3) are satisfied. Therefore we
can apply the Poisson formula to calculate the required probability P(3). Then
P(3) = (0.4)³e^(−0.4)/3!
E14) i) The mean, 10, is the centre point of a line segment whose length is the range,
1.8 kg. Hence, the line segment extends ½ × (1.8) = 0.9 kg to the left and to
the right of 10, i.e. from 9.1 to 10.9 kg. Hence the smallest weight is 9.1 kg and
the largest weight is 10.9 kg.
ii) We are required to calculate P[9 < X < 10.5].

= 0.833.
That is, the probability that the weight lies between 9.1 kg and 10.9 kg is
0.833.
E15) Here the pdf f(x) is given by
f(x) = 1/32, 5.28 < x < 6
f(x) = 0, otherwise
The sketch of f(x) is as given below.

Fig. 17
i) To find the required probability we note that X can take values in the interval
(5.28, 5.40). Hence the required probability is
P[5.28 < X < 5.40] = 0.12 × 1/32 = 0.03/8
≈ 0.0037
E16) a) 0.9332
b) 0.1151
c) 0.9599
E17) The standard normal probability corresponding to this probability is given by
P[952 < X < 956] = P[(952 − 952)/4 < Z < (956 − 952)/4]
= P[0 < Z < 1]
= F(1) − F(0)
= 0.8413 − 0.5
= 0.3413.
E18) Let the time of arrival in minutes past 1800 hrs be X. Then X follows the normal
distribution N(10, 10²).
a) The flight arrives before its due time when X < 0, so the required probability
is P[X < 0]. The standard probability corresponding to this is
P[Z < (0 − 10)/10] = P[Z < −1]
= F(−1) = 1 − F(1)
= 0.1587.
b) The required probability is P[X > 30]. Then
P[X > 30] = P[Z > (30 − 10)/10] = P[Z > 2]
= 1 − F(2)
= 0.0228.
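E19) We have to find the probability P[98 < X < 103]. Since the variance is 2.25, the
standard deviation is 1.5. The standard probability is
P[(98 − 100)/1.5 < Z < (103 − 100)/1.5] = P[−1.33 < Z < 2]
= F(2) − F(−1.33)
= 0.9772 − 0.0918
= 0.8854.
So about 88.5% of the strips will be acceptable.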
