
Probability Lecture Note Stat(2012)

CHAPTER ONE
INTRODUCTION
Deterministic and non-deterministic models
Deterministic models always return the same result any time they are called with a specific set
of input values. Nondeterministic models may return different results each time even if the
input values that they access remain the same. In deterministic models, the conditions under
which an experiment is performed determine the outcome of the experiment, but in
non-deterministic models we cannot determine the outcome of the experiment even if we know
the conditions under which the experiment is performed.
A mathematically deterministic model is a representation y = f(x) that allows you to make
predictions of y based on x. The model is used like this: when x = 3, then we predict that
y = f(3). For example, suppose y = 2 + 3x − 4x². We can predict that if x = 3, then y = 2 + 9 − 36 = −25.
❖ Mathematical model in which outcomes are precisely determined through known
relationships among states and events, without any room for random variation. In such
models, a given input will always produce the same output.
❖ A deterministic event always has the same outcome and is predictable 100% of the time.
➢ Distance traveled = time * velocity
➢ The speed of light
➢ The sun rising in the east
❖ Note that this "prediction" does not necessarily occur in the past, future, or even the present.
It is simply a hypothetical, "what-if" statement. It helps us identify what would be the
outcome if we were to use a particular x.
❖ This type of model is "deterministic" because y is completely determined if you know x.
Non-deterministic/ probability model
❖ A probability model is a representation y ~ p(y). Note that we say "y ~ p(y)" not "y =
p(y)". The notation "y ~ p(y)" specifically means that y is generated at random from a
probability distribution whose mathematical form is p(y). This model also allows you to
make "what-if" predictions as to the value of y, but, unlike the deterministic model, it
does not allow you to say precisely what the value of y will be.
A probabilistic event is an event for which the exact outcome is not predictable
100% of the time.


➢ The number of heads in ten tosses of a coin.
➢ The winner of the World Series.
➢ The number of games played in a World Series.
➢ The number of defects in a batch of product.
Example: Suppose you wish to predict whether the next customer will buy a red car, a gray
car or a green car. The possible values of y are "red", "gray", or "green", and the distribution
p(y) (probability of y) might have the form
y red gray green total
P(y) 0.35 0.4 0.25 1.00
The model does not tell you precisely what the next customer will do, because the model simply
says it is random: y could be either "red", "gray" or "green". However, the model does allow
aggregate what-if predictions as follows: "What if we were to sell cars to the next 100 customers?
Then about 35 of them would buy a red car, 40 would buy a gray car, and 25 of them would buy
a green car."
Again, this "prediction" does not necessarily occur in the past, future, or even the present. It is
simply a hypothetical, "what-if" statement. The probability model allows us to predict aggregate
outcomes if we were to observe a large number of y values.
1. Review of set theory
A set is a well-defined collection of objects. Sets are usually denoted by capital letters A, B etc.
Examples
1. A= {1, 2, 3, 4} describes the set consisting of positive integers 1, 2, 3 and 4.
2. A = {x : 0 ≤ x ≤ 1} ⇒ A consists of all real numbers between 0 and 1 inclusive, i.e. the set of
all x where x is a real number between 0 and 1, inclusive.
The individual objects making up the collection are called members or elements of A.
Examples: 1∈A and 6∉A
U= Universal set: the set of all objects under considerations.
Empty set or Null set: - the set containing no members. i.e. {} or 𝜙 ⇒ impossible event.
Example: A is the set of all real numbers X satisfying the equation 𝑋 2 + 1 = 0. A={ }
Union: Let A and B be any two subsets of a universal set S. Then the union of the sets A and B
is the set of all elements in S that are in at least one of the sets A or B; it is denoted by A ∪ B.


Intersection: Let A and B be any two subsets of a specified universal set S. Then the intersection
of the sets A and B is the set of all elements in S that are in both sets A and B; it is denoted by
A ∩ B.
De Morgan’s rules: for any two events A and B
❖ (𝐴 ∩ 𝐵)𝑐 = 𝐴𝑐 ∪ 𝐵𝑐
❖ (𝐴 ∪ 𝐵)𝑐 = 𝐴𝑐 ∩ 𝐵𝑐
Generally, to work with the probability of an event, one must be able to translate verbal
descriptions of events into set notation.
If A and B are events, then the following hold.
➢ If A and B are disjoint events, then A ∩ B = ∅
➢ At least one of the events occurs: A ∪ B
➢ Both of the events occur: A ∩ B
➢ Neither event A nor event B occurs: Aᶜ ∩ Bᶜ
➢ Only the event A occurs (A but not B): A ∩ Bᶜ
➢ Exactly one of the events occurs: (A ∩ Bᶜ) ∪ (Aᶜ ∩ B)
➢ Not more than one of the events A or B occurs: (A ∩ Bᶜ) ∪ (Aᶜ ∩ B) ∪ (Aᶜ ∩ Bᶜ)
Exercise:
1. Suppose that A, B, and C are events such that P(A) = P(B) = P(C) = 1/4,
P(A ∩ B) = P(C ∩ B) = 0, and P(A ∩ C) = 1/8. Evaluate the probability that at least one of
the events A, B, or C occurs.
2. Let A, B, and C be three events associated with an experiment. Express the following
verbal statements in set notation.
A. At least one of the events occurs. F. At least two events occur.
B. Exactly one of the events occurs. G. All three events occur.
C. Exactly two of the events occur. H. None occurs.
D. Only A occurs. I. At most one occurs.
E. Both A and B but not C occurs. J. At most two occurs.
K. Not more than two of the events occur simultaneously.
3. Random experiments, Sample space and events
1. Experiment: Any process of observation or measurement or any process which generates well
defined outcome.
2. Probability Experiment (Random Experiment): It is an experiment that can be repeated any number
of times under similar conditions and it is possible to enumerate the total number of outcomes
without predicting an individual outcome.
Example: If a fair coin is tossed three times, it is possible to enumerate all possible eight sequences of
head (H) and tail (T). But it is not possible to predict which sequence will occur at any occasion.
3. Outcome: The result of a single trial of a random experiment


4. Sample Space(S): Set of all possible outcomes of a probability experiment.

Example: Sample space of a trial conducted by three tossing of a coin is


S= {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Sample space can be
➢ Countable (finite or infinite)
➢ Uncountable
5. Event (Sample Point): It is a subset of sample space. It is a statement about one or more

outcomes of a random experiment. It is denoted by capital letter A, B, C - - -.


For example, the event that there are exactly two heads in three tosses of a coin
consists of the three points HTH, HHT and THH.
Remark: If S (sample space) has n members, then there are exactly 2ⁿ subsets or events.
6. Equally Likely Events: Events which have the same chance of occurring.

7. Complement of an Event: the complement of an event A means non- occurrence of A and is

denoted by A' or Ac or A , contains those points of the sample space which don’t belong to A.

8. Elementary (simple) Event: an event having only a single element or sample point.

9. Mutually Exclusive (Disjoint) Events: Two events which cannot happen at the same time.

10. Independent Events: Two events are said to be independent if the occurrence of one does

not affect the probability of the other occurring.


11. Dependent Events: Two events are dependent if the first event affects the outcome or

occurrence of the second event in a way the probability is changed.


4. Counting Techniques
In order to calculate probabilities, we have to know
❖ The number of elements of an event
❖ The number of elements of the sample space
That is in order to judge what is probable, we have to know what is possible.
➢ In order to determine the number of outcomes, one can use several rules of counting
❖ The addition rule ❖ The permutation rule
❖ The multiplication rule ❖ The combination rule


Addition Rule: Suppose that the 1st procedure of an experiment can be performed in n1 ways,
the 2nd procedure in n2 ways, …, the kth procedure in nk ways, and that no two of the
procedures can be performed together. Then the number of ways in which one of the
procedures can be performed is (n1 + n2 + n3 + … + nk) ways.
i.e. n(A or B) = n(A) + n(B) − n(A ∩ B)
To list the outcomes of a sequence of events, a useful device called a tree diagram is used.
Example 1: A student goes to the nearest snack bar to have breakfast. He can take tea, coffee, or
milk with bread, cake, or a sandwich. How many possibilities does he have?
Solution: The tree diagram pairs each drink with each food:
Tea → cake, sandwich, bread
Coffee → cake, sandwich, bread
Milk → cake, sandwich, bread
There are nine possibilities.


Example 2: Suppose we are planning a trip to some place. If there are 5 bus routes and 3 train
routes we can take, then there are 5 + 3 = 8 different routes by which we can make the trip.
The Multiplication Rule:
If an operation consists of k steps and the 1st step can be performed in n1 ways, the 2nd step in
n2 ways (regardless of how the 1st step was performed), …, the kth step in nk ways (regardless
of how the preceding steps were performed), then the whole operation can be performed in
(n1 · n2 · n3 · … · nk) ways.
Example: 1) A student has two shoes, three trousers and three jackets. In how many ways can
he be dressed?
2. The digits 0, 1, 2, 3, and 4 are to be used in a 4-digit identification card. How many different
cards are possible if
I. repetitions are permitted? II. repetitions are not permitted?
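A quick brute-force enumeration can confirm both counts. The following sketch (plain Python, assuming cards may begin with 0) reproduces the multiplication-rule answers 5⁴ = 625 and 5·4·3·2 = 120:

from itertools import product, permutations

digits = [0, 1, 2, 3, 4]

# I. With repetition: each of the 4 positions may hold any of the 5 digits.
with_rep = len(list(product(digits, repeat=4)))    # 5 * 5 * 5 * 5 = 625
# II. Without repetition: 5 choices, then 4, then 3, then 2.
without_rep = len(list(permutations(digits, 4)))   # 5 * 4 * 3 * 2 = 120

print(with_rep, without_rep)    # 625 120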
Permutation:
A linear ordering of a set of n distinct objects is called a permutation of the objects. It is easy to
see that the number of distinct permutations of n > 0 distinct objects is n! = n × (n−1) × … × 2 × 1.
An arrangement of n objects in a specified order is called a permutation of the objects.
Permutation Rules:
1. The number of permutations of n distinct objects taken all together is n!,
where n! = n·(n−1)·(n−2)·…·2·1.
2. The arrangement of n objects in a specified order using r objects at a time is called the
permutation of n objects taken r objects at a time. It is written as nPr and the formula is
nPr = n! / (n − r)!
3. The number of distinct permutations of n objects in which k1 are alike, k2 are alike, etc. is
n! / (k1! · k2! · …)
Example:
1. Suppose we have the letters A, B, C, D.
a) How many permutations are there taking all the four?
b) How many permutations are there two letters at a time?
2. How many different permutations can be made from the letters in the word
“MISSISSIPPI”?
3. In how many ways can a party of 7 people arrange themselves?
a) In a row of 7 chairs?
b) Around a circular table?
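The permutation rules answer all three questions directly; as a sketch, they can be checked with Python's math module (math.perm requires Python 3.8+):

from math import factorial, perm

# 1a) All four of the letters A, B, C, D arranged: 4! = 24
print(factorial(4))                  # 24
# 1b) Two letters at a time: 4P2 = 4!/(4-2)! = 12
print(perm(4, 2))                    # 12
# 2) "MISSISSIPPI": 11 letters with I x4, S x4, P x2 alike (rule 3)
print(factorial(11) // (factorial(4) * factorial(4) * factorial(2)))  # 34650
# 3) 7 people in a row: 7!; around a circular table: (7-1)!
print(factorial(7), factorial(6))    # 5040 720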

Combination: A combination is a set of elements without repetitions and without regard to
ordering. The number of combinations of n distinct objects taken r at a time is the number of
subsets of size r taken from the n things without replacement. We write this as nCr.
A selection of objects without regard to order is called combination.
Example: Given the letters A, B, C, and D, list the permutations and combinations for selecting two letters.
Solution:
Permutations: AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC (12 in all)
Combinations: AB, AC, AD, BC, BD, CD (6 in all)
Note that in permutation AB is different from BA. But in combination AB is the same as BA.
Combination Rule: The number of combinations of r objects selected from n objects is denoted
by nCr and is given by the formula:
nCr = n! / ((n − r)! · r!)
Example:
1. In how many ways can a committee of 5 people be chosen out of 9 people?
2. Out of 5 Mathematician and 7 Statistician a committee consisting of 2 Mathematician and 3
Statistician is to be formed. In how many ways this can be done if
i. There is no restriction
ii. One particular Statistician should be included
iii. Two particular Mathematicians cannot be included on the committee.
3. A committee of 5 people must be selected out of 5 men and 8 women. In how many ways can
the selection be made if there are exactly three women on the committee?
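As a sketch of how the combination rule answers these exercises (reading the restrictions in the usual way: "included" fixes one member, "cannot be included" removes two candidates, and question 3 is read as exactly three women), Python's math.comb gives:

from math import comb

print(comb(9, 5))               # 1)   126 committees of 5 out of 9
print(comb(5, 2) * comb(7, 3))  # 2i)  350: no restriction
print(comb(5, 2) * comb(6, 2))  # 2ii) 150: one statistician fixed, 2 more from 6
print(comb(3, 2) * comb(7, 3))  # 2iii) 105: 2 mathematicians from the remaining 3
print(comb(8, 3) * comb(5, 2))  # 3)   560: exactly 3 women and 2 men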
Definitions of probability
Probability is the chance of an outcome of an experiment. It is the measure of how likely an
outcome is to occur.
❖ In any random experiment there is always uncertainty as to whether a particular event will or
will not occur. As a measure of the chance, or probability, with which we can expect the event to
occur, it is convenient to assign a number between 0 and 1. If we are sure or certain that the
event will occur, we say that its probability is 100% or 1, but if we are sure that the event will
not occur, we say that its probability is zero.
1.6 Approaches to measuring Probability
There are four different conceptual approaches to study probability theory. These are
➢ The classical approach ➢ The axiomatic approach
➢ The frequency approach ➢ The subjective approach
1. CLASSICAL APPROACH :
If an event can occur in h different ways out of a total number of n possible ways, all of which
are equally likely, then the probability of the event is h/n.
EXAMPLE 1: Suppose we want to know the probability that a head will turn up in a single
toss of a coin. Since there are two equally likely ways in which the coin can come up—namely,
heads and tails (assuming it does not roll away or stand on its edge)—and of these two ways a


head can arise in only one way, we reason that the required probability is 1/2. In arriving at this,
we assume that the coin is fair, i.e., not loaded in any way.
Example
1. A fair die is tossed once. What is the probability of getting
A. Number 4? C. Number greater than 4?
B. An odd number? D. Either 1 or 2 or …. Or 6
2. A box of 80 candles consists of 30 defective and 50 non-defective candles. If 10 of these
candles are selected at random, what is the probability that
a) all will be defective?
b) 6 will be non-defective?
c) all will be non-defective?
2. Frequency approach: If after n repetitions of an experiment, where n is very large, an event
is observed to occur in h of these, then the probability of the event is h/n. This is also called
the empirical probability of the event.
Example 1:If we toss a coin 1000 times and find that it comes up heads 532 times, we estimate
the probability of a head coming up to be 532/1000=0.532.
Example 2: If records show that 60 out of 100,000 bulbs produced are defective. What is the
probability of a newly produced bulb to be defective?
❖ Both the classical and frequency approaches have serious drawbacks, the first because the
words “equally likely” are vague and the second because the “large number” involved is
vague. Because of these difficulties, mathematicians have been led to an axiomatic approach
to probability.
3. Axiomatic Approach ( Basic notion of Probability):
Probability theory is derived from a small set of axioms – a minimal set of essential assumptions.
A deep understanding of axiomatic probability theory is not essential to financial
econometrics or to the use of probability and statistics in general, although understanding these
core concepts does provide additional insight.
Let “E” be a random experiment and S be a sample space associated with “E”. With each event
A we associate a real number, designated by P(A) and called the probability of A, satisfying the
following properties:


1. 0 ≤ P(A) ≤ 1
2. P(S) = 1
3. If A and B are mutually exclusive events, the probability that one or the other occurs equals the
sum of the two probabilities, i.e. P(A ∪ B) = P(A) + P(B)
4. If A1, A2, A3, … are mutually exclusive events, then
P(⋃_{i=1}^∞ A_i) = P(A1) + P(A2) + ⋯ = Σ_{i=1}^∞ P(A_i)
For any finite n, P(⋃_{i=1}^n A_i) = Σ_{i=1}^n P(A_i)


4. Subjective Approach:
It is always based on some prior body of knowledge. Hence subjective measures of uncertainty
are always conditional on this prior knowledge. The subjective approach accepts unreservedly
that different people (even experts) may have vastly different beliefs about the uncertainty of the
same event.
Example: Abebe’s belief about the chances of Ethiopia Buna club winning the FA Cup this year
may be very different from Daniel's. Abebe, using only his knowledge of the current team and
past achievements may rate the chances at 30%. Daniel, on the other hand, may rate the chances
as 10% based on some inside knowledge he has about key players having to be sold in the next
two months.
2.7. Derived theorems of probability
1. P(S) = 1
2. For any event A, P(A) ≥ 0
3. P(∅) = 0
4. For any events A and B, P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
5. P(Aᶜ) = 1 − P(A)
6. If A, B and C are any three events, then
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) − P(A ∩ C) + P(A ∩ B ∩ C)
7. If A ⊆ B, then P(A) ≤ P(B)
Reading Assignment: Prove the above theorems.


Exercise:
1. If two dice are thrown, what is the probability that the sum is
A. greater than 8? B. neither 7 nor 11?
2. An urn contains 8 white balls and 2 green balls. A sample of three balls is selected at
random. What is the probability that the sample contains at least one green ball?
3. A box contains 12 light bulbs of which 5 are defective. All the bulbs look alike and have
equal probability of being chosen. Three bulbs are picked up at random. What is the
probability that at least 2 are defective?
4. A problem in Mathematics is given to three students, whose chances of solving it are
1/2, 1/3 and 1/4 respectively. What is the probability that the problem will be solved?
5. The probabilities of A, B and C solving a problem are 1/2, 2/7 and 3/8 respectively. If all
the three try to solve the problem simultaneously, find the probability that exactly one of
them will solve it.
6. A husband and wife appear in an interview for two vacancies in the same department. The
probability of the husband's selection is 1/7 and that of the wife's selection is 1/5. What is the
probability that
A. Only one of them will be selected?
B. Both of them will be selected?
C. None of them will be selected?
D. At least one of them will be selected?


7. Almaze and Tamerat appear for an interview for two vacancies. The probability of Almaze's
selection is 1/3 and that of Tamerat's selection is 1/5. Find the probability that
A. both of them will be selected.
B. none of them is selected.
C. at least one of them is selected.
D. only one of them is selected.


CHAPTER TWO:
Conditional probability and Independency
Conditional Events: If the occurrence of one event has an effect on the next occurrence of the
other event, then the two events are conditional or dependent events. The conditional
probability of an event B is the probability that the event will occur given the knowledge that an
event A has already occurred. This probability is written P(B|A), notation for the probability of
B given A. In the case where events A and B are independent (where event A has no effect on the
probability of event B), the conditional probability of event B given event A is simply the
probability of event B, that is P(B).
If events A and B are not independent, then the probability of the intersection of A and B (the
probability that both events occur) is defined by P(A and B) = P(A)P(B|A).
From this definition, the conditional probability P(B|A) is easily obtained by dividing by P(A):
P(B|A) = P(A ∩ B) / P(A), P(A) ≠ 0, or equivalently P(A ∩ B) = P(B|A)·P(A)

To calculate the probability of the intersection of more than two events, the conditional
probabilities of all of the preceding events must be considered. In the case of three events, A, B,
and C, the probability of the intersection P(A∩B ∩ C) =P(A)P(B|A)P(C|A ∩B).
Example: Suppose we have two red and three white balls in a bag
1. Draw a ball with replacement
2. Draw a ball without replacement
Conditional probability rules
❖ Addition rule: P(A1 ∪ A2 ∪ … | B) = P(A1|B) + P(A2|B) + … for mutually disjoint events
A1, A2, …
❖ Complement rule: P(Aᶜ|B) = 1 − P(A|B)
❖ Multiplication rule: P(A ∩ B) = P(B)·P(A|B). More generally, for any events A1, A2, …, An,
P(A1 ∩ A2 ∩ … ∩ An) = P(A1)·P(A2|A1)·P(A3|A1 ∩ A2)·…·P(An|A1 ∩ A2 ∩ … ∩ An−1)
Conditional probability of an event
The conditional probability of an event A given that B has already occurred is denoted by P(A|B).
Since B is known to have occurred, it becomes the new sample space, replacing the original
sample space.
From this we are led to the definition


P(A|B) = P(A ∩ B) / P(B), P(B) ≠ 0, or P(A ∩ B) = P(A|B)·P(B)
Remark:
1. P(Aᶜ|B) = 1 − P(A|B)
2. P(Bᶜ|A) = 1 − P(B|A)
3. For three events A1, A2 and A3: P(A1 ∩ A2 ∩ A3) = P(A1)·P(A2|A1)·P(A3|A1 ∩ A2)
4. Generalizing the multiplication theorem, for events A1, A2, A3, …, An we have
P(A1 ∩ A2 ∩ … ∩ An) = P(A1)·P(A2|A1)·P(A3|A1 ∩ A2)·…·P(An|A1 ∩ A2 ∩ … ∩ An−1)
Example:
1. For a student enrolling as a freshman at a certain university, the probability is 0.25 that he/she
will get a scholarship and 0.75 that he/she will graduate. If the probability is 0.2 that he/she
will get a scholarship and will also graduate, what is the probability that a student who gets a
scholarship graduates?
Solution: Let A = the event that a student will get a scholarship
B = the event that a student will graduate
Given: P(A) = 0.25, P(B) = 0.75, P(A ∩ B) = 0.20
Required: P(B|A)
P(B|A) = P(A ∩ B) / P(A) = 0.20 / 0.25 = 0.80
2. A lot consists of 20 defective and 80 non-defective items from which two items are chosen
without replacement. Events A & B are defined as A = {the first item chosen is defective}, B =
{the second item chosen is defective}
a. What is the probability that both items are defective?
b. What is the probability that the second item is defective?
3. In the above example 2, if we choose 3 items one after the other without replacement, what is the
probability that all items are defective?
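For example 2, the multiplication theorem gives P(A ∩ B) = P(A)·P(B|A) = (20/100)(19/99), and total probability gives P(B). A minimal sketch with exact fractions:

from fractions import Fraction

P_A = Fraction(20, 100)            # first item defective
P_B_given_A = Fraction(19, 99)     # second defective, given first defective
P_B_given_Ac = Fraction(20, 99)    # second defective, given first non-defective

print(P_A * P_B_given_A)                                # a) 19/495
print(P_A * P_B_given_A + (1 - P_A) * P_B_given_Ac)     # b) 1/5
# Exercise 3: three items chosen one after another, all defective.
print(Fraction(20, 100) * Fraction(19, 99) * Fraction(18, 98))   # 19/2695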
The law of total probability
Law of Total Probability: The “Law of Total Probability” (also known as the “Method of
Conditioning”) allows one to compute the probability of an event B by conditioning on cases,
according to a partition of the sample space.
Let {A1, A2, …, An} be a partition of the sample space S, and suppose each one of the events
A1, A2, …, An has nonzero probability of occurrence. Let B be any event. Then
P(B) = P(A1)P(B|A1) + P(A2)P(B|A2) + ⋯ + P(An)P(B|An)


= Σ_{i=1}^n P(A_i)·P(B|A_i)
Definition (Partition): We say that events A1, A2, …, An represent a partition of the sample space S
if they are non-overlapping, non-empty subsets of S whose union is the sample space itself. So
the properties of a partition are:
I. A_i ∩ A_j = ∅ for all i ≠ j
II. ⋃_{i=1}^n A_i = S
III. P(A_i) > 0 for all i
Proof: Given the partition of the sample space, let B be any event in S. Then
B = ⋃_{i=1}^n (B ∩ A_i) = (B ∩ A1) ∪ (B ∩ A2) ∪ (B ∩ A3) ∪ … ∪ (B ∩ An)
(B ∩ A_i) and (B ∩ A_j) are mutually exclusive if i ≠ j.
∴ P(B) = P(B ∩ A1) + P(B ∩ A2) + P(B ∩ A3) + … + P(B ∩ An)
= P(B|A1)P(A1) + P(B|A2)P(A2) + P(B|A3)P(A3) + … + P(B|An)P(An)
P(B) = Σ_{i=1}^n P(B|A_i)·P(A_i) → theorem of total probability

Bayes’ theorem
Bayes’ theorem (also known as Bayes’ rule or Bayes’ law) is a result in probability theory that
relates conditional probabilities. If A and B denote two events, P(A|B) denotes the conditional
probability of A occurring, given that B occurs. The two conditional probabilities P(A|B) and
P(B|A) are in general different. Bayes theorem gives a relation between P(A|B) and P(B|A).
An important application of Bayes’ theorem is that it gives a rule for updating or revising the
strengths of evidence-based beliefs in light of new evidence a posteriori.
Bayes’ theorem relates the conditional and marginal probabilities of stochastic events A and B:


P(A_i|B) = P(B|A_i)·P(A_i) / P(B)
Each term in Bayes’ theorem has a conventional name:

➢ P(A) is the prior probability or marginal probability of A. It is”prior” in the sense that it
does not take into account any information about B.
➢ P(Ai|B) is the conditional probability of A, given B. It is also called the posterior probability
because it is derived from or depends upon the specified value of B.
➢ P(B|A) is the conditional probability of B given A.
➢ P(B) is the prior or marginal probability of B, and acts as a normalizing constant
❖ A prior probability is an initial probability value originally obtained before any additional
information is obtained.
❖ A posterior probability is a probability value that has been revised by using additional
information that is later obtained
Let A1, A2, …, An be a partition of the sample space S and let B be an event associated with S.
Applying the definition of conditional probability, we have
P(A_i|B) = P(B|A_i)·P(A_i) / Σ_{i=1}^n P(B|A_i)·P(A_i)
Proof: P(A_i|B) = P(B ∩ A_i) / P(B), but P(A ∩ B) = P(B ∩ A) = P(A)·P(B|A), so
P(A_i|B) = P(B|A_i)·P(A_i) / Σ_{i=1}^n P(B|A_i)·P(A_i)

Example 1: An aircraft emergency locator transmitter (ELT) is a device designed to transmit a


signal in the case of a crash. The Altigauge Manufacturing Company makes 80% of the ELTs, the
Bryant Company makes 15% of them, and the Chartair Company makes the other 5%. The ELTs
made by Altigauge have a 4% rate of defects, the Bryant ELTs have a 6% rate of defects, and the
Chartair ELTs have a 9% rate of defects (which helps to explain why Chartair has the lowest
market share).
A. If an ELT is randomly selected from the general population of all ELTs, find the
probability that it was made by the Altigauge Manufacturing Company.
B. If a randomly selected ELT is then tested and is found to be defective, find the
probability that it was made by the Altigauge Manufacturing Company.
Solution: We use the following notation. Let:
A = ELT manufactured by Altigauge; B = ELT manufactured by Bryant;
C = ELT manufactured by Chartair;
D = ELT is defective; Dᶜ = ELT is not defective (or it is good)
A. If an ELT is randomly selected from the general population of all ELTs, the probability that it
was made by Altigauge is 0.8 (because Altigauge manufactures 80% of them).
B. If we now have the additional information that the ELT was tested and was found to be
defective, we want to revise the probability from part (a) so that the new information can be
used. We want to findthe value of P(A|D), which is the probability that the ELT was made
by the Altigauge company given that it is defective. Based on the given information, we
know these probabilities:
P(A) = 0.80 because Altigauge makes 80% of the ELTs
P(B) = 0.15 because Bryant makes 15% of the ELTs
P(C) = 0.05 because Chartair makes 5% of the ELTs
P(D|A) = 0.04 because 4% of the Altigauge ELTs are defective
P(D|B) = 0.06 because 6% of the Bryant ELTs are defective
P(D|C) = 0.09 because 9% of the Chartair ELTs are defective
Here is Bayes’ theorem extended to include the three events corresponding to the selection of ELTs
from the three manufacturers (A, B, C):
P(A|D) = P(D|A)·P(A) / P(D)
= P(D|A)·P(A) / [P(D|A)·P(A) + P(D|B)·P(B) + P(D|C)·P(C)]
= 0.032 / 0.0455 = 0.703
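The same computation, written as a short Python sketch (the dictionary keys are just labels for the three manufacturers):

prior = {"A": 0.80, "B": 0.15, "C": 0.05}    # market shares P(A), P(B), P(C)
defect = {"A": 0.04, "B": 0.06, "C": 0.09}   # defect rates P(D|.)

p_d = sum(prior[m] * defect[m] for m in prior)               # total probability P(D)
posterior = {m: prior[m] * defect[m] / p_d for m in prior}   # Bayes' theorem
print(round(p_d, 4), round(posterior["A"], 3))               # 0.0455 0.703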

Example 2: Of the travelers arriving at a small airport, 60 % fly on major airlines, 30 % fly on
privately owned planes, and the remainder fly on commercially owned planes not belonging to
a major airline. Of those travelling on major airlines, 50 % are travelling for business reasons,
whereas 60 % of those arriving on private planes and 90 % of those arriving on other
commercially owned planes are travelling for business reasons. Suppose that we randomly
select one person arriving at this airport. What is the probability that the person
A. Is travelling on business?
B. Is travelling for business on a privately owned plane?
C. Arrived on privately owned plane, given that the person is travelling for business reasons?
D. Is travelling on business, given that the person is flying on a commercially owned plane?
Solution: - Define the following events.


M = traveler flies on a major airline; B = travelling for business reasons
R = traveler flies on a privately owned plane; C = traveler flies on a commercially owned plane
Answer: A. 0.57 B. 0.18 C. 0.3158 D. 0.9
Example 3: Three urns contain coloured balls:
Urn Red White Blue
1 3 4 1
2 1 2 3
3 4 3 2
One urn is chosen at random and a ball is withdrawn.
A. What is the probability that a white ball is drawn?
B. Suppose a red ball is drawn. What is the probability that it came from urn 2?
Solution:
Let E_i be the event that the i-th urn is selected, i = 1, 2, 3. Let S be the sample space of this
experiment – selecting an urn and drawing a ball. Then E1, E2 and E3 form a partition of S.
Moreover, since the urn is selected at random, P(E1) = P(E2) = P(E3) = 1/3.

A. Let W be the event that a white ball is drawn. The conditional probabilities are
P(W|E1) = 4/8, P(W|E2) = 2/6 and P(W|E3) = 3/9
Then P(W) = P(E1 ∩ W) + P(E2 ∩ W) + P(E3 ∩ W)
= P(E1)P(W|E1) + P(E2)P(W|E2) + P(E3)P(W|E3)
= (1/3)(4/8) + (1/3)(2/6) + (1/3)(3/9) = 7/18
B. Let B be the event that the ball withdrawn is red. The probability that the chosen red ball is
from urn 2 is P(E2|B). By Bayes’ theorem,
P(E2|B) = P(E2 ∩ B) / P(B) = P(E2)P(B|E2) / [P(E1)P(B|E1) + P(E2)P(B|E2) + P(E3)P(B|E3)]
= P(E2)P(B|E2) / Σ_i P(E_i)P(B|E_i)
where P(B) = Σ_i P(E_i)P(B|E_i) by total probability, and P(B|E_i) is the probability of drawing
a red ball given that the i-th urn is chosen.
Using the information from the table: P(B|E1) = 3/8, P(B|E2) = 1/6 and P(B|E3) = 4/9
P(B) = Σ_i P(E_i)P(B|E_i) = (1/3)(3/8) + (1/3)(1/6) + (1/3)(4/9) = 71/216
So we obtain
P(E2|B) = P(E2)P(B|E2) / P(B) = (1/3)(1/6) / (71/216) = 12/71
Exercise: Box I contains 3 red and 2 blue marbles while Box II contains 2 red and 8 blue marbles.
A fair coin is tossed. If the coin turns up heads, a marble is chosen from Box I; if it turns up tails, a
marble is chosen from Box II.
A. Find the probability that a red marble is chosen
B. What is the probability that Box I was chosen given that a red marble is known to have been
chosen?
Probability of Independent Events
If the probability of B occurring is not affected by the occurrence or nonoccurrence of A, then we
say that A and B are independent events, i.e. P(B|A) = P(B). This is equivalent to
P(A ∩ B) = P(A)·P(B)
Let A and B be events with P(B) ≠ 0. Then A and B are independent if and only
if P(A|B) = P(A).
Proof: First suppose that A and B are independent. Remember that this means that
P(A ∩ B) = P(A)·P(B). Then
P(A|B) = P(A ∩ B) / P(B) = P(A)·P(B) / P(B) = P(A)

Corollary: If A and B are independent, then A and Bᶜ are independent.
We are given that P(A ∩ B) = P(A)·P(B), and asked to prove that P(A ∩ Bᶜ) = P(A)·P(Bᶜ). We know
that P(Bᶜ) = 1 − P(B). The events A ∩ B and A ∩ Bᶜ are disjoint (since no outcome can be both in B
and Bᶜ), and their union is A (since every outcome in A is either in B or in Bᶜ); so we have P(A) =
P(A ∩ B) + P(A ∩ Bᶜ). Hence
P(A ∩ Bᶜ) = P(A) − P(A ∩ B)
= P(A) − P(A)·P(B) (since A and B are independent)
= P(A)[1 − P(B)]
= P(A)·P(Bᶜ)
Corollary: If A and B are independent, then Aᶜ and Bᶜ are independent.


Corollary: Let events A, B, C be mutually independent. Then A and B ∩ C are independent, and A
and B ∪ C are independent.
Remarks: If A1, A2, A3 are to be independent then they must be pairwise independent,
P(A_j ∩ A_k) = P(A_j)·P(A_k), j ≠ k, where j, k = 1, 2, 3, and we must also have
P(A1 ∩ A2 ∩ A3) = P(A1)·P(A2)·P(A3)

Exercise:
1. A ball is drawn at random from a box containing 6 red balls, 4 white balls and 5 blue balls.
Find the probability that they are drawn in the order red, white and blue if each ball is
A. replaced B. not replaced
2. In a factory, machines A1, A2, A3 manufacture 25%, 35% and 40% of the total output
respectively. Of their outputs, 5%, 4% and 2% respectively are defective. An item drawn at
random from the product is found to be defective.
A. What is the probability that the defective item was produced by each of the machines?
B. What is the probability that this item was produced by machine A1?
3. Suppose the probabilities of three events A, B and C are as depicted in the following
diagram (not reproduced here):

A. Are the three events pair wise independent?


B. Are the three events independent?
C. What is P(A∩B)? E. What is P(C |A ∩B)?
D. What is P(A ∩B|C)? F. What is P(C |A ∪B)?
4. An industry has three machines A1, A2, A3 which produce the same item, accounting for
30%, 30% and 40% of the total output respectively. It is known that 2%, 3% and 3% of their
respective outputs are defective. All the items are put into one stockpile and then one item is
chosen at random. Then find the probability that:
A. this item is defective;
B. given that the item selected is defective, this item was produced by machine A3.


5. A laboratory blood test is 95 percent effective in detecting a certain disease when it is, in fact,
present. However, the test also yields a "false positive" result for 1 percent of the healthy
persons tested. (That is, if a healthy person is tested, then, with probability 0.01, the test result
will imply he has the disease.) If 0.5 percent of the population actually has the disease, what
is the probability a person has the disease given that his test result is positive?
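A sketch of how exercise 5 can be checked numerically (total probability for P(positive), then Bayes' theorem):

p_disease = 0.005     # prevalence: 0.5 percent of the population
sensitivity = 0.95    # P(positive | disease)
false_pos = 0.01      # P(positive | healthy)

p_positive = sensitivity * p_disease + false_pos * (1 - p_disease)
p_disease_given_pos = sensitivity * p_disease / p_positive
print(round(p_disease_given_pos, 3))   # about 0.323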


CHAPTER THREE
One Dimensional Random Variable
Definitions of Random variable
• A variable whose value is unknown or a function that assigns values to each of an
experiment's outcomes. Random variables are often designated by letters and can be
classified as discrete, which are variables that have specific values, or continuous, which are
variables that can have any values within a continuous range.
• A random variable is a variable that has a single numerical value, determined by chance,
foreach outcome of a procedure.
• A random variable is discrete if it takes on a countable number of values (i.e. there are gaps
between values).
• A random variable is continuous if there are an infinite number of values the random
variable can take, and they are densely packed together (i.e. there are no gaps between
values).
• A probability distribution is a description of the chance a random variable has of taking on
particular values. It is often displayed in a graph, table, or formula.
• A probability histogram is a display of a probability distribution of a discrete random
variable. It is a histogram where each bar’s height represents the probability that the random
variable takes on a particular value.
A random variable is a variable X whose value is determined by the outcome of a random
experiment. It is classified as:
• Discrete random variable
• Continuous random variable
1. Discrete random variable:
➢ Its possible values are isolated points along the number line. Random variables have their own
sample space, or set of possible values; if this set is finite or countable, the random variable is
said to be discrete.
➢ It is a random variable which can assume only a countable number of real values.
➢ If X is a discrete random variable taking values x1, x2, …, xn, then P(xi) = P(X = xi), where
i = 1, 2, 3, …, n, is called the probability mass function (pmf) of the random variable X


➢ The set of ordered pairs [xi, P(xi)], i = 1, 2, 3, …, n gives the probability distribution of the
random variable X
Discrete probability distribution: A probability distribution describes the possible values and
their probability of occurring.
A discrete probability distribution is called a probability mass function (pmf), p(·), and it needs to
satisfy the following conditions.
Properties of P(X=xi) where X is Discrete random variable:
1. 0 ≤ P(X=xi) ≤ 1
2. ∑𝒏𝒊=𝟏 P(X = xi) =1
Example: Consider the experiment of tossing a coin three times and let X be the number of heads;
write the probability distribution of the random variable X.
Solution:
The variable X takes the values 0, 1, 2, 3 over the sample space {HHH, HHT, HTH, HTT, THH,
THT, TTH, TTT}; the probability distribution of X is
X 0 1 2 3
P(X) 1/8 3/8 3/8 1/8
Example: Suppose we record four consecutive baby births in a hospital. Let X be the absolute
difference between the number of girls and the number of boys born. X is discrete, since it can
only have the values 0, 2, or 4. (Why can’t it be 1 or 3?) Furthermore, if we write out the sample
space for this procedure, we can find the probability that X equals 0, 2, or 4:
S={mmmm, mmmf, mmfm, mfmm, fmmm, mmff, mfmf, mffm, fmfm, ffmm, fmmf, mfff, fmff,
ffmf, fffm, ffff}
There are 16 total cases, and each one is equally likely.
P(X = 0) = 6/16 = 0.375 (these are the cases mmff, mfmf, mffm, fmfm, ffmm, fmmf)
P(X = 2) = 8/16 = 0.5 (these are the cases mmmf, mmfm, mfmm, fmmm, mfff, fmff, ffmf,fffm)
P(X = 4) = 2/16 = 0.125 (these are the cases mmmm and ffff)
Is this a probability distribution?
✓ ∑𝑥 𝑃(𝑥)= 0.375+ 0.5 +0.125= 1
✓ 0 ≤ 0.125 < 0.375 < 0.5 ≤ 1 So it is a probability distribution
Another Example: Is the following a probability distribution? If Y = X², X = 1, 2, …, 11


X 1 4 9 16 25 36 49 64 81 100 121
P(x) 0.20 0.15 0.14 0.12 0.10 0.09 0.07 0.05 0.04 0.02 0.01
✓ Clearly, each probability is between 0 and 1. Now we need to see if they sum to 1.
✓ ∑𝑥 𝑃(𝑥)= 0.20+ 0.15+ 0.14 + 0.12+ 0.10 + 0.09+ 0.07 +0.05 +0.04+ 0.02+ 0.01= 0.99 since
they do not sum to 1, it is not a probability distribution.
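Both checks can be automated; a minimal sketch using exact fractions (so the sums are not blurred by floating-point rounding):

from fractions import Fraction

def is_pmf(probs):
    # A pmf must have every value in [0, 1] and the values must sum to 1.
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1

coin = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]
table = [Fraction(n, 100) for n in (20, 15, 14, 12, 10, 9, 7, 5, 4, 2, 1)]
print(is_pmf(coin))   # True
print(sum(table))     # 99/100, so...
print(is_pmf(table))  # False: not a probability distribution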
2. Continuous random variable:
• A random variable X is said to be continuous if it can take all possible values (integral as well
as fractional) between certain limits or intervals. Continuous random variables occur when
we deal with quantities that are measured on a continuous scale. For instance, the life length
of an electric bulb, the speed of a car, weights, heights, and the like are continuous
• If X is a continuous random variable, its probability behaviour is described by a function f(x),
called the probability density function (pdf) of the random variable X
Properties of a pdf:
1. f(x) ≥ 0 for all x
2. ∫_{−∞}^{∞} f(x) dx = 1, i.e. the pdf integrates to 1 over the range (−∞, ∞)
3. P(x1 < X < x2) = ∫_{x1}^{x2} f(x) dx
4. P(X = a) = ∫_a^a f(x) dx = 0

Example: Suppose we have a continuous random variable X whose probability density function
is given by f(x) = { cx², 0 < x < 3; 0, otherwise }
A. Determine the value of c
B. Verify that f is a pdf
C. Calculate P(1 < X < 2)
Solution:
a) ∫_{−∞}^{∞} f(x) dx = 1 (property of a pdf)
∫_0^3 cx² dx = c·(x³/3)|_0^3 = 9c = 1 ⟹ c = 1/9
b) f(x) ≥ 0 and ∫_0^3 (1/9)x² dx = (x³/27)|_0^3 = 1, so f is a pdf
c) P(1 < X < 2) = ∫_1^2 (1/9)x² dx = (x³/27)|_1^2 = (8 − 1)/27 = 7/27
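The integrals above can also be verified numerically; a sketch assuming SciPy is available:

from scipy.integrate import quad

c = 1 / 9
f = lambda x: c * x**2        # the pdf found above, on (0, 3)

total, _ = quad(f, 0, 3)      # should integrate to 1
p12, _ = quad(f, 1, 2)        # P(1 < X < 2)
print(round(total, 6), round(p12, 6))   # 1.0 0.259259  (= 7/27)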

Cumulative distribution function of a discrete random variable
Let X be a discrete random variable with probability mass function (pmf); then the cumulative
distribution function is denoted by F(x) and is defined by
F(x) = P(X ≤ x) = Σ_{xi ≤ x} P(X = xi)
Example: Toss a coin three times and let X be the number of heads; find the CDF of X.
Solution: The variable X takes the values 0, 1, 2, 3 over the sample space
(HHH, HHT, HTH, HTT, THH, THT, TTH, TTT):
x P(x) F(X)
0 1/8 1/8
1 3/8 4/8
2 3/8 7/8
3 1/8 1
Cumulative distribution function for continuous random variable
If X is a continuous random variable with probability density function (pdf) f(x), then the
cumulative distribution function of X is F(x), defined as
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt
F(x) gives the “accumulated” probability up to x.
Properties of the CDF
1. 0 ≤ F(x) ≤ 1
2. lim_{x→∞} F(x) = ∫_{−∞}^{∞} f(t) dt = 1
3. lim_{x→−∞} F(x) = 0
4. F′(x) = f(x), i.e. F(x) is an anti-derivative of f(x)
5. F(x) is a non-decreasing function
Example: If the pdf of a continuous random variable X is given by
f(x) = { (2/49)x, 0 ≤ x ≤ 7; 0, otherwise }
A. find the CDF of X
B. find P(4 ≤ X ≤ 5)
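A symbolic sketch of the solution, assuming SymPy is available: integrating the pdf from 0 to x gives the CDF, and differencing the CDF gives the interval probability.

import sympy as sp

x, t = sp.symbols("x t", nonnegative=True)
f = sp.Rational(2, 49) * t                 # pdf on [0, 7]

F = sp.integrate(f, (t, 0, x))             # A. CDF for 0 <= x <= 7
print(F)                                   # x**2/49
print(F.subs(x, 5) - F.subs(x, 4))         # B. P(4 <= X <= 5) = 9/49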


CHAPTER FOUR
Function of random variable
Equivalent sets
Let X be a random variable defined on the sample space S and let Y be a function of X; then Y is
also a random variable. Let R_X and R_Y be the range spaces of X and Y respectively. Let
C ⊆ R_Y and B ⊆ R_X, where B is defined as B = {x ∈ R_X : Y(x) ∈ C}; then the events B and C are
equivalent sets, which means one occurs if and only if the other occurs. For any event C ⊆ R_Y,
P(C) is defined as P(C) = P{x ∈ R_X : Y(x) ∈ C} = P(B), i.e. P(C) = P(B)
Example:
1. Suppose that Y = πx²; then are B = {X : x ≥ 2} and C = {Y : y ≥ 4π} equivalent sets? Why?
2. Let Y = 2x + 1 with Rx = {x : x > 0} and Ry = {y : y > 1}, and suppose that event C is defined as
C = {y ≥ 5}. How can we define event B so that event C and event B are equivalent sets?
Solution:
1. Event C and event B are equivalent sets, because Y = πx² ≥ 4π holds exactly when x ≥ 2, so
one event occurs if and only if the other does:
xi 2 3 4 5 …….
yi 4𝜋 9𝜋 16𝜋 25𝜋 …….
2. First we make a table of Y = 2x + 1 to find when C = {Y : y ≥ 5} occurs:
xi 0 1 2 3 4 …….
yi 1 3 5 7 9 …….
From the table we see that y ≥ 5 exactly when x ≥ 2. So the event B is defined as B = {x ≥ 2},
where x ranges over Rx. Therefore event C and event B are equivalent sets.
Function of Discrete Random Variables
Let us first dispose of the case when X is a discrete random variable, since it requires only
simple point-to-point mapping. Suppose that the possible values taken by X can be enumerated
as x1, x2, …. Then the corresponding possible values of Y may be enumerated as y1 = g(x1),
y2 = g(x2), …. Let the pmf of X be given by P(xi) = P(X = xi).
The pmf of Y is then simply determined as
P(yj) = P(Y = yj) = Σ P(xi), summed over all xi such that g(xi) = yj.


Example:
1. The pmf of a random variable X is given as
P(X = −1) = 1/2, P(X = 0) = 1/4, P(X = 1) = 1/8, P(X = 2) = 1/8
Determine the pmf of Y if Y is related to X by Y = 2X + 1.
Solution: the corresponding values of Y are: g(−1) = 2(−1) + 1 = −1; g(0) = 1; g(1) = 3; and
g(2) = 5. Hence, the pmf of Y is given by
P(Y = −1) = 1/2, P(Y = 1) = 1/4, P(Y = 3) = 1/8, P(Y = 5) = 1/8
2. For the same X as given in Example 1, determine the pmf of Y if Y = 2X² + 1.
Solution: in this case, the corresponding values of Y are: g(−1) = 2(−1)² + 1 = 3; g(0) = 1;
g(1) = 3; and g(2) = 9, resulting in
P(y) = { 1/4, for y = 1;  5/8 = 1/2 + 1/8, for y = 3;  1/8, for y = 9 }
3. A random variable X has the following probability mass function:
x:        0  1  2   3   4   5   6    7
P(X = x): 0  k  2k  2k  3k  k²  2k²  7k² + k
Find
A. the value of k
B. P(X < 4), P(X > 5) and P(3 < X ≤ 6)
C. the smallest value of x for which P(X ≤ x) > 1/2
Solution:
A. Since P(X = x) is a probability mass function, Σ_x P(X = x) = 1, i.e.
k + 2k + 2k + 3k + k² + 2k² + 7k² + k = 10k² + 9k = 1
⟹ 10k² + 9k − 1 = 0 ⟹ (10k − 1)(k + 1) = 0 ⟹ k = 1/10 (k = −1 is rejected, since
probabilities cannot be negative).
B. P(X < 4) = 0 + k + 2k + 2k = 5k = 0.5
P(X > 5) = 2k² + 7k² + k = 9k² + k = 0.19
P(3 < X ≤ 6) = 3k + k² + 2k² = 3k + 3k² = 0.33
C. The minimum value of x may be determined by trial and error:
P(X ≤ 3) = 0.5, which is not greater than 1/2, while P(X ≤ 4) = 0.8 > 1/2.
Therefore the smallest value of x for which P(X ≤ x) > 1/2 is 4.
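The point-to-point mapping used in Examples 1 and 2 is easy to mechanize: accumulate P(X = x) into the bucket for g(x). A sketch, using the pmf of X from those examples (pmf_of_y is a hypothetical helper name):

from fractions import Fraction
from collections import defaultdict

pmf_x = {-1: Fraction(1, 2), 0: Fraction(1, 4), 1: Fraction(1, 8), 2: Fraction(1, 8)}

def pmf_of_y(pmf_x, g):
    # P(Y = y) is the sum of P(X = x) over all x with g(x) = y.
    pmf_y = defaultdict(Fraction)
    for x, p in pmf_x.items():
        pmf_y[g(x)] += p
    return dict(pmf_y)

print({y: str(p) for y, p in pmf_of_y(pmf_x, lambda x: 2 * x + 1).items()})
# {-1: '1/2', 1: '1/4', 3: '1/8', 5: '1/8'}
print({y: str(p) for y, p in pmf_of_y(pmf_x, lambda x: 2 * x**2 + 1).items()})
# {3: '5/8', 1: '1/4', 9: '1/8'}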

Function of Continuous Random Variables
Suppose that X is a continuous random variable with pdf f(x) and Y = g(X) is a function of X;
then Y is also a random variable, and we can obtain the pdf of Y, g(y), from the pdf of X, f(x).
The general procedure to find the pdf of Y from the pdf of X:
Step 1: obtain the CDF of Y, G(y) = P(Y ≤ y), by finding the equivalent event C in the range space of X
Step 2: differentiate G(y) with respect to y to obtain g(y)
Step 3: determine the values of y in the range space of Y for which g(y) ≥ 0
Note: g(y) = f[g⁻¹(y)]·|d g⁻¹(y)/dy|

A more frequently encountered case arises when X is continuous with known CDF, F_X(x), or pdf,
f_X(x). To carry out the mapping steps as outlined at the beginning of this section, care must be
exercised in choosing appropriate corresponding regions in the range spaces R_X and R_Y, this
mapping being governed by the transformation Y = g(X). Thus, the degree of complexity in
determining the probability distribution of Y is a function of the complexity of the transformation
g(x). Let us start by considering the simple relationship Y = g(X) = 2X + 1.
(Figure: transformation defined by Y = g(X) = 2X + 1.)
Consider the CDF of Y; it is defined by F_Y(y) = P(Y ≤ y).
The region defined by Y ≤ y in the range space R_Y corresponds, in the range space R_X, to the
region g(X) ≤ y, or X ≤ g⁻¹(y), where
g⁻¹(y) = (y − 1)/2
is the inverse function of g(x), or the solution for x of the equation y = g(x) in terms of y. Hence
F_Y(y) = P(X ≤ g⁻¹(y)) = F_X((y − 1)/2)
This gives the relationship between the CDF of X and that of Y, our desired result.
The relationship between the pdfs of X and Y is obtained by differentiating both sides of the
above equation with respect to y. We have:
f_Y(y) = f_X(g⁻¹(y))·(d g⁻¹(y)/dy) = (1/2)·f_X((y − 1)/2)


Theorem: Let X be a continuous random variable and Y = g(X), where g(x) is continuous in x and
strictly monotone. Then
f_Y(y) = f_X(g⁻¹(y))·|d g⁻¹(y)/dy|
where |u| denotes the absolute value of u.


Example:
1. Let X have a given p.d.f. f(x). Find the p.d.f. of Y = X³.
Solution: Y = X³ is strictly monotone with g⁻¹(y) = y^{1/3}, so by the theorem
f_Y(y) = f_X(y^{1/3})·(1/3)|y|^{−2/3}.
2. Let X have a p.d.f. f(x) and Y = X²; find the p.d.f. of Y.
Solution: this transformation is not monotone; it is worked out in the next example.
Example: Consider the quadratic function Y = X². For one value of Y there are two values of X,
namely √y and −√y. Thus, the CDF of Y is given by
F_Y(y) = P(Y ≤ y) = P(−√y ≤ X ≤ √y) = F_X(√y) − F_X(−√y), y ≥ 0
(Figure: plot of Y = X².)
Differentiating with respect to y gives
f_Y(y) = [f_X(√y) + f_X(−√y)] / (2√y), y > 0
If f(x) is an even function, then f(x) = f(−x) and F(−x) = 1 − F(x). Thus, we have
f_Y(y) = f_X(√y)/√y, y > 0
Example: Find the pdf of the random variable Y = X², where X is the standard normal random
variable.
Solution: The pdf of X is f(x) = (1/√(2π))·e^{−x²/2}, which is an even function. Therefore, letting
u = √y in the result above,
f_Y(y) = f_X(√y)/√y = (1/√(2πy))·e^{−y/2}, y > 0



Exercise:
1. The random variable X has the pdf f(x) = { 1/3, −1 < x < 2; 0, otherwise }.
If we define Y = 2X + 3, what is the pdf of Y?
2. Let X be a random variable with pdf f(x) = { 2x, 0 < x < 1; 0, otherwise }, and let Y = 3X + 1.
Find the pdf of Y, g(y).
3. Let X be a random variable with pdf f(x) = { 2x, 0 < x < 1; 0, otherwise }, and let Y = e^{−X}.
Find the pdf of Y, g(y).
4. Let X be a random variable with pdf f(x) = { (1/4)·x·e^{−x/2}, x ≥ 0; 0, otherwise }, and let
Y = −X/2 + 2. Find the pdf of Y, g(y).
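For exercise 1 the theorem gives g(y) = f((y − 3)/2)·|1/2| = 1/6 on 1 < y < 7; a Monte Carlo sketch in plain Python (hypothetical sample size) can serve as a sanity check:

import random

N = 200_000
ys = [2 * random.uniform(-1, 2) + 3 for _ in range(N)]   # X ~ Uniform(-1, 2), Y = 2X + 3
# If g(y) = 1/6 on (1, 7), the fraction of samples in (2, 3) should be near 1/6.
est = sum(2 < y < 3 for y in ys) / N
print(round(est, 3))   # approximately 0.167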


CHAPTER FIVE

Two-dimensional random variable


What is a two-dimensional random variable?
Definition: Let E be an experiment and S a sample space associated with E. Let X=X(s) and
Y= Y(s) be two functions each assigning a real number to each outcomes s ∈S.We call (X, Y) a
two-dimensional random variable (sometimes called a random vector).

Example:
1. recording the amount of precipitate (p) and volume of gas (q) for a given locality, (p,q)
2. Observing the rainfall (R) and temperature (T) of certain town (R, T)
Note: If the set of possible values (x, y) is finite or countably infinite, then (X, Y) is called a two-
dimensional discrete random variable.
If (X, Y) assumes all values in a specified region R in the xy-plane, then (X, Y) is called a two-
dimensional continuous random variable.
Joint probability distribution
If X and Y are two random variables, the probability distribution for their simultaneous
occurrence can be represented by a function f(x, y) for any pair of values (x, y) within the range
of the random variables X and Y. This function is known as the joint probability distribution of (X, Y).
Definition 1: Let (X, Y) be a two-dimensional discrete random variable. With each possible
outcome (x_i, y_j) we associate a number P(x_i, y_j) representing P(X = x_i, Y = y_j) and satisfying
the following conditions:
1. P(x_i, y_j) ≥ 0 for all x, y
2. Σ_{i=1}^∞ Σ_{j=1}^∞ P(x_i, y_j) = 1
The function P is the joint probability mass function of (X, Y). The set of triples
[(x_i, y_j), P(x_i, y_j)], i, j = 1, 2, 3, …, is the joint probability distribution of (X, Y).
Definition 2: Let (X, Y) be a two-dimensional continuous random variable assuming all values in
some region R of the Euclidean plane. Then the joint probability density function f is a function
satisfying the following conditions:
1. f(x, y) ≥ 0 for all (x, y) ∈ R
2. ∬_R f(x, y) dx dy = 1

Example:
1. Two production lines, 1 and 2, have the capacity to produce up to 5 and 3 items per day
respectively; assume the number of items produced by each line is a random variable.
Let (X, Y) be the two-dimensional random variable giving the number of items produced
by line 1 and line 2 respectively.
Y|X 0 1 2 3 4 5
0 0 0.01 0.03 0.05 0.07 0.09
1 0.01 0.02 0.04 0.05 0.06 0.08
2 0.01 0.03 0.05 0.05 0.05 0.06
3 0.01 0.02 0.04 0.06 0.06 0.05
A. Show that 𝑃(𝑋𝑖 , 𝑌𝑖 ) is a legitimate probability function of (𝑥, 𝑦).
B. What is the probability that both lines produce the same numbers of items?
C. What is the probability that more items are produce by line 2?
2. Let (𝑥, 𝑦) be a two-dimensional discrete random variable with
P(x, y) = { 2^{−(x+y)}, x = 1, 2, 3, … and y = 1, 2, 3, …; 0, otherwise }
A. Show that P(x, y) is a pmf.
B. Find P(X > Y).
3. Suppose that 𝑓(𝑥, 𝑦) is a two-dimensional random variable with joint pdf is given by
𝑒 −(𝑥+𝑦) , 0 < 𝑥 < ∞, 0 < 𝑦 < ∞
𝑓(𝑥, 𝑦) = {
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
A. Show that f(x, y) is a pdf.
B. Find P(X < Y).
4. Given f(x, y) = { k, 0 < x < 2, 0 < y < 4; 0, otherwise }

A. Find the value of K


B. Find 𝑃(𝑥 < 1, 𝑦 < 3)
C. Find 𝑃(𝑥 > 𝑦)

Solution
1. A.
I. P(x_i, y_j) ≥ 0 for all x, y
II. Σ_{i=0}^5 Σ_{j=0}^3 P(x_i, y_j) = 0 + 0.01 + 0.03 + … + 0.06 + 0.05 = 1
Therefore it is a legitimate probability function.
1B. P(X = Y) = P(0, 0) + P(1, 1) + P(2, 2) + P(3, 3)
= 0 + 0.02 + 0.05 + 0.06 = 0.13
1C. P(X < Y) = P(0, 1) + P(0, 2) + P(0, 3) + P(1, 2) + P(1, 3) + P(2, 3)
= 0.01 + 0.01 + 0.01 + 0.03 + 0.02 + 0.04 = 0.12
2. A.
I. P(x, y) ≥ 0 for all x, y
II. Σ_{x=1}^∞ Σ_{y=1}^∞ 2^{−(x+y)} = (Σ_{x=1}^∞ 2^{−x})·(Σ_{y=1}^∞ 2^{−y})
Each factor is a geometric series:
Σ_{x=1}^∞ 2^{−x} = 1/2 + 1/4 + 1/8 + 1/16 + ⋯ = (1/2)·1/(1 − 1/2) = 1
so the double sum equals 1 × 1 = 1.
Therefore it is a legitimate pmf.


2B. P(X > Y) = Σ_{y=1}^∞ Σ_{x=y+1}^∞ 2^{−(x+y)} = Σ_{y=1}^∞ 2^{−y} Σ_{x=y+1}^∞ 2^{−x}
Consider the inner sum:
Σ_{x=y+1}^∞ 2^{−x} = (1/2)^{y+1} + (1/2)^{y+2} + (1/2)^{y+3} + ⋯ = (1/2)^{y+1}·1/(1 − 1/2) = 2^{−y}
Hence
P(X > Y) = Σ_{y=1}^∞ 2^{−y}·2^{−y} = Σ_{y=1}^∞ 2^{−2y} = 1/4 + 1/16 + 1/64 + ⋯
= (1/4)·1/(1 − 1/4) = 1/3

3. A.
I. f(x, y) ≥ 0 for all (x, y) ∈ R
II. ∬_R f(x, y) dx dy = ∫_0^∞ ∫_0^∞ e^{−(x+y)} dx dy = (∫_0^∞ e^{−y} dy)(∫_0^∞ e^{−x} dx) = 1
Therefore f(x, y) is a legitimate pdf.
3B. P(X < Y) = 1 − P(X ≥ Y) = 1 − ∫_0^∞ e^{−x} ∫_0^x e^{−y} dy dx
= 1 − ∫_0^∞ e^{−x}(1 − e^{−x}) dx = 1 − ∫_0^∞ (e^{−x} − e^{−2x}) dx = 1 − (1 − 1/2) = 1/2



Definition: Let (X, Y) be a two-dimensional random variable. The cumulative distribution
function (cdf) F of the two-dimensional random variable (X, Y) is defined by
F(x, y) = P(X ≤ x, Y ≤ y)
I. F(x, y) = Σ_{x_i ≤ x} Σ_{y_j ≤ y} P(x_i, y_j), if (X, Y) is a discrete random variable
II. F(x, y) = ∫_{−∞}^y ∫_{−∞}^x f(s, t) ds dt, if (X, Y) is a continuous random variable
Properties of the CDF F
I. F(−∞, −∞) = 0, F(∞, ∞) = 1
II. ∂²F(x, y)/∂x∂y = f(x, y)
III. P(a ≤ X ≤ b, c ≤ Y ≤ d) = F(b, d) − F(a, d) − F(b, c) + F(a, c)

Example: Suppose that the two-dimensional continuous random variable (X, Y) has joint pdf
given by f(x, y) = { x² + xy/3, 0 ≤ x ≤ 1, 0 ≤ y ≤ 2; 0, otherwise }
A. Determine the joint CDF of (X, Y)
B. Obtain F(1, 2) and F(1, 1.5)
C. Obtain P(X + Y ≥ 1)
Solution
A. F(x, y) = ∫_0^y ∫_0^x (t² + st/3) dt ds = ∫_0^y (x³/3 + sx²/6) ds = x³y/3 + x²y²/12
Therefore F(x, y) = { x³y/3 + x²y²/12, 0 ≤ x ≤ 1, 0 ≤ y ≤ 2; 1, x > 1 and y > 2; 0, otherwise }
B. F(1, 2) = 2/3 + 4/12 = 1 and F(1, 1.5) = 1/2 + (1.5)²/12 = 11/16
C. P(X + Y ≥ 1) = 1 − P(X + Y < 1) = 1 − P(Y < 1 − X)
= 1 − ∫_0^1 ∫_0^{1−x} (x² + xy/3) dy dx = 1 − ∫_0^1 [x²(1 − x) + x(1 − x)²/6] dx
= 1 − 7/72 = 65/72

Example: If the joint probability density of X and Y is given by
f(x, y) = { x + y, for 0 < x < 1, 0 < y < 1; 0, elsewhere }
Find the joint cumulative distribution function of these two random variables.


Solution: F(x, y) = ∫_0^y ∫_0^x (s + t) ds dt = (1/2)·xy(x + y), 0 < x < 1, 0 < y < 1
Example: Find the joint probability density function of the two random variables X and Y
whose joint distribution function is given by
F(x, y) = { (1 − e^{−x})(1 − e^{−y}), for 0 < x, 0 < y; 0, elsewhere }
Also use the joint probability density to determine P(1 < X < 3, 1 < Y < 2).
Solution: Since partial differentiation yields f(x, y) = ∂²F(x, y)/∂x∂y = e^{−(x+y)} for x > 0 and
y > 0 and 0 elsewhere, we find that the joint probability density of X and Y is given by
f(x, y) = { e^{−(x+y)}, for 0 < x, 0 < y; 0, elsewhere }
Thus, integration yields
P(1 < X < 3, 1 < Y < 2) = ∫_1^2 ∫_1^3 e^{−(x+y)} dx dy = (e⁻¹ − e⁻³)(e⁻¹ − e⁻²) = 0.074
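A numerical sketch of the same double integral, assuming SciPy is available (dblquad integrates over y first, so the integrand takes arguments in the order (y, x)):

from math import exp
from scipy.integrate import dblquad

f = lambda y, x: exp(-(x + y))
p, _ = dblquad(f, 1, 3, lambda x: 1, lambda x: 2)   # x in (1, 3), y in (1, 2)
print(round(p, 3))   # 0.074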

Marginal and Conditional Probability Distributions


Definition: If X and Y are discrete two-dimensional random variables and p(x, y) is the value
of their joint probability distribution at (x, y):
g(x) = Σ_y p(x, y) for each x within the range of X: the marginal probability mass function of X.
h(y) = Σ_x p(x, y) for each y within the range of Y: the marginal probability mass function of Y.
Definition: If X and Y are continuous two-dimensional random variables and f(x, y) is the
value of their joint probability density at (x, y):
g(x) = ∫_{−∞}^∞ f(x, y) dy for each −∞ < x < ∞: the marginal probability density function of X.
h(y) = ∫_{−∞}^∞ f(x, y) dx for each −∞ < y < ∞: the marginal probability density function of Y.
NB: P(c ≤ X ≤ d) = P(c ≤ X ≤ d, −∞ < Y < ∞) = ∫_c^d ∫_{−∞}^∞ f(x, y) dy dx = ∫_c^d g(x) dx
Example:
1. Let (X, Y) have the joint probability function given by
Y\X      0     1     2     P(y_j)
0        0.25  0.15  0.1   0.50
1        0.1   0.08  0.1   0.28
2        0.05  0.07  0.1   0.22
P(x_i)   0.40  0.30  0.30  1
Find the marginal distribution function of X and Y
Solution:
i. The marginal distribution function of X is
X=𝑥𝑖 0 1 2 Total
𝑃(𝑋 = 𝑥𝑖 ) 0.40 0.30 0.30 1
ii. The marginal distribution function of Y are
Y=𝑦𝑖 0 1 2 Total
𝑃(𝑌 = 𝑦𝑖 ) 0.50 0.28 0.22 1

2. The joint PMF of two random variables X and Y is given by


P(x, y) = { k(2x + y), x = 1, 2; y = 1, 2; 0, otherwise }
where k is a constant.
A. What is the value of k? B. Find the marginal pmf's of X and Y.
Solution:
A. To evaluate k, we remember that Σ_y Σ_x P(x, y) = 1. Thus
Σ_{y=1}^2 Σ_{x=1}^2 k(2x + y) = k Σ_{x=1}^2 [(2x + 1) + (2x + 2)] = k Σ_{x=1}^2 (4x + 3)
= k[(4 + 3) + (8 + 3)] = 18k = 1
This gives k = 1/18.
B. The marginal pmf’s are
P(x) = Σ_y P(x, y) = (1/18) Σ_{y=1}^2 (2x + y) = (1/18)[(2x + 1) + (2x + 2)] = (4x + 3)/18, x = 1, 2
P(y) = Σ_x P(x, y) = (1/18) Σ_{x=1}^2 (2x + y) = (1/18)[(2 + y) + (4 + y)] = (2y + 6)/18, y = 1, 2
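Marginals of a discrete joint pmf are just row and column sums; a sketch for this example:

from fractions import Fraction

joint = {(x, y): Fraction(2 * x + y, 18) for x in (1, 2) for y in (1, 2)}

p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (1, 2)}
p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (1, 2)}
print({x: str(p) for x, p in p_x.items()})   # {1: '7/18', 2: '11/18'}  i.e. (4x+3)/18
print({y: str(p) for y, p in p_y.items()})   # {1: '4/9', 2: '5/9'}     i.e. (2y+6)/18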
3. Let the joint pdf of (X, Y) be given by
f(x, y) = { 2(x + y − 2xy), 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1; 0, otherwise }
Find the marginal distribution functions of X and Y.
Solution:
i. The marginal density function of X is
f(x) = ∫_{−∞}^∞ f(x, y) dy = ∫_0^1 2(x + y − 2xy) dy = 2(xy + y²/2 − xy²)|_0^1 = 1
Therefore f(x) = { 1, 0 ≤ x ≤ 1; 0, otherwise }, that is, X is uniformly distributed over [0, 1].
ii. The marginal density function of Y is
f(y) = ∫_{−∞}^∞ f(x, y) dx = ∫_0^1 2(x + y − 2xy) dx = 2(x²/2 + xy − x²y)|_0^1 = 1
Therefore f(y) = { 1, 0 ≤ y ≤ 1; 0, otherwise }, that is, Y is uniformly distributed over [0, 1].
4. Given the joint probability density
f(x, y) = { (2/3)(x + 2y), for 0 < x < 1, 0 < y < 1; 0, elsewhere }
Find the marginal densities of X and Y.
Solution: Performing the necessary integrations, we get
g(x) = ∫_{−∞}^∞ f(x, y) dy = ∫_0^1 (2/3)(x + 2y) dy = (2/3)(x + 1), 0 < x < 1
Likewise, h(y) = ∫_{−∞}^∞ f(x, y) dx = ∫_0^1 (2/3)(x + 2y) dx = (1/3)(1 + 4y), 0 < y < 1
Exercise: Let the joint pdf of (X, Y) be given by f(x, y) = { 12x², 0 < x < y < 1; 0, otherwise }
Find the marginal distribution functions of X and Y.
Definition: We say that the two-dimensional continuous random variable is uniformly
𝑐, 𝑓𝑜𝑟 (𝑥, 𝑦) ∈ 𝑅
distributed over a region R in the Euclidean plane if 𝑓(𝑥, 𝑦) = {
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
Because of the requirement ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1, the above implies that the constant c equals 1/area(R). We are assuming that R is a region with finite, nonzero area.

Note: This definition represents the two-dimensional analog to the one-dimensional


uniformly distributed random variable.
Example: Suppose that the two-dimensional random variable (X, Y) is uniformly distributed over the region R = {(x, y): 0 < x < 1, x² < y < x}, so that
f(x, y) = c for (x, y) ∈ R, and 0 otherwise.
A. Find the value of c.
B. Find the marginal pdfs of X and Y.
Solution:
A. We find that
area(R) = ∫_{0}^{1} ∫_{x²}^{x} dy dx = ∫_{0}^{1} (x − x²) dx = 1/6,
so that c = 1/area(R) = 6.
Therefore the pdf is given by f(x, y) = 6 for (x, y) ∈ R, and 0 otherwise.
B. In the following equations we find the marginal pdfs of X and Y:
i. The marginal density function of X is
f(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{x²}^{x} 6 dy = 6(x − x²), 0 ≤ x ≤ 1
ii. The marginal density function of Y is
f(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_{y}^{√y} 6 dx = 6(√y − y), 0 ≤ y ≤ 1
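A quick SciPy sketch (illustrative only) confirms the area of R, and hence c = 6, by numerical double integration:

from scipy import integrate

# area of R = {(x, y): x**2 < y < x, 0 < x < 1}; dblquad integrates f(y, x)
area, _ = integrate.dblquad(lambda y, x: 1.0, 0, 1, lambda x: x**2, lambda x: x)
print(area, 1 / area)            # 0.1667 and 6.0, so c = 6

# the marginal f(x) = 6(x - x**2) must integrate to 1
total, _ = integrate.quad(lambda x: 6 * (x - x**2), 0, 1)
print(total)                     # approximately 1.0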
Conditional Probability Distributions


Definition: Let (X, Y) be a discrete two-dimensional random variable with joint pmf P(x, y). Let P(x) and P(y) be the marginal pmfs of X and Y, respectively. Then the conditional pmf of X given that Y = yᵢ is defined by
P(x|y) = P(X = xᵢ, Y = yᵢ)/P(yᵢ), P(yᵢ) > 0,
and the conditional pmf of Y given that X = xᵢ is defined by
P(y|x) = P(X = xᵢ, Y = yᵢ)/P(xᵢ), P(xᵢ) > 0.
Definition: Let (X, Y) be a continuous two-dimensional random variable with joint pdf f(x, y). Let f(x) and f(y) be the marginal pdfs of X and Y, respectively. Then the conditional pdf of X given that Y = y is defined by
f(x|Y = y) = f(x, y)/f(y), f(y) > 0,
and the conditional pdf of Y given that X = x is defined by
f(y|X = x) = f(x, y)/f(x), f(x) > 0.
Generally: If X and Y have a joint distribution with joint density or probability function f(x, y), then the marginal distribution of X has a probability function or density function denoted f(x), which is equal to f(x) = ∫_{−∞}^{∞} f(x, y) dy in the continuous case and P(x) = Σ_y P(x, y) in the discrete case. The density function for the marginal distribution of Y is found in a similar way: f(y) = ∫_{−∞}^{∞} f(x, y) dx in the continuous case and P(y) = Σ_x P(x, y) in the discrete case.
Example:

1. The joint PMF of two random variables X and Y is given by


P(x, y) = (1/18)(2x + y) for x = 1, 2; y = 1, 2, and 0 otherwise.
A. What is the conditional PMF of Y given X?
B. What is the conditional PMF of X given Y?
Solution:we know that the marginal PMFs are given by
P(x) = Σ_y P(x, y) = (1/18) Σ_{y=1}^{2} (2x + y) = (1/18)(4x + 3), x = 1, 2, and
P(y) = Σ_x P(x, y) = (1/18) Σ_{x=1}^{2} (2x + y) = (1/18)(2y + 6), y = 1, 2.
Thus, the conditional PMFs are given by
A. the conditional PMF of Y given X: P(y|x) = P(x, y)/P(x) = (2x + y)/(4x + 3)
B. the conditional PMF of X given Y: P(x|y) = P(x, y)/P(y) = (2x + y)/(2y + 6)

2. Suppose that the two-dimensional continuous random variable (X, Y) has joint pdf given by f(x, y) = x² + xy/3 for 0 ≤ x ≤ 1, 0 ≤ y ≤ 2, and 0 otherwise.
Determine the conditional pdf of X given Y and the conditional pdf of Y given X.
Solution: To determine the conditional PDFs, we first evaluate the marginal pdf’s, which are
given by
❖ f(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{0}^{2} (x² + xy/3) dy = (x²y + xy²/6)|_{0}^{2} = 2x² + 2x/3
❖ f(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_{0}^{1} (x² + xy/3) dx = (x³/3 + x²y/6)|_{0}^{1} = 1/3 + y/6
Hence,
f(x|y) = f(x, y)/f(y) = (x² + xy/3)/(1/3 + y/6) = (6x² + 2xy)/(2 + y), 0 ≤ y ≤ 2, 0 ≤ x ≤ 1;
f(y|x) = f(x, y)/f(x) = (x² + xy/3)/(2x² + 2x/3) = (3x + y)/(6x + 2), 0 ≤ y ≤ 2, 0 ≤ x ≤ 1

Exercise: verify that f(y|x) and f(x|y) are pdfs.

3. Two random variables X and Y have the following joint PDF:
f(x, y) = x e^{−x(1+y)} for 0 ≤ x < ∞, 0 ≤ y < ∞, and 0 otherwise.
Determine the conditional pdf of X given Y and the conditional pdf of Y given X.
Solution: To determine the conditional PDFs, we first evaluate the marginal pdfs, which are given by
f(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{0}^{∞} x e^{−x(1+y)} dy = x e^{−x} ∫_{0}^{∞} e^{−xy} dy = x e^{−x} [−e^{−xy}/x]_{0}^{∞} = e^{−x}, 0 ≤ x < ∞
f(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_{0}^{∞} x e^{−x(1+y)} dx
Let u = x, which means that du = dx; and let dv = e^{−x(y+1)} dx, which means that v = −e^{−x(y+1)}/(y + 1).
Integrating by parts we obtain
f(y) = ∫_{0}^{∞} x e^{−x(1+y)} dx = [−x e^{−x(y+1)}/(y + 1)]_{0}^{∞} + (1/(y + 1)) ∫_{0}^{∞} e^{−x(1+y)} dx
     = 0 − (1/(y + 1)²)[e^{−x(1+y)}]_{0}^{∞}
     = 1/(y + 1)², 0 ≤ y < ∞

Thus, the conditional PDFs are given by:

f(x|y) = f(x, y)/f(y) = x e^{−x(1+y)}/(1/(y + 1)²) = x(y + 1)² e^{−x(1+y)}, 0 ≤ x < ∞
f(y|x) = f(x, y)/f(x) = x e^{−x(1+y)}/e^{−x} = x e^{−xy}, 0 ≤ y < ∞
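As a sanity check, a SciPy sketch (the test point x = 1.7 is an arbitrary choice) verifies that f(y|x) = x e^{−xy} integrates to 1 over y ≥ 0 for a fixed x > 0:

from scipy import integrate
import math

x = 1.7                                                  # arbitrary fixed value of X
total, _ = integrate.quad(lambda y: x * math.exp(-x * y), 0, math.inf)
print(total)                                             # approximately 1.0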
Independent random variables
Definition: Let (X, Y) denote a continuous bivariate random variable with joint pdf f(x, y) and marginal pdfs f(x) and f(y). Then X and Y are called independent random variables if f(x, y) = f(x)f(y) for every x ∈ X and y ∈ Y. Similarly, let (X, Y) denote a discrete bivariate random variable with joint pmf P(x, y) and marginal pmfs P(x) and P(y). Then X and Y are called independent random variables if P(x, y) = P(x)P(y) for every x ∈ X and y ∈ Y.
Example: P(x, y) = (x + y)/21 for x = 1, 2, 3; y = 1, 2, and 0 otherwise: here X and Y are not independent.
P(x, y) = xy²/30 for x = 1, 2, 3; y = 1, 2, and 0 otherwise: here X and Y are independent, since P(x, y) factors as (x/6)(y²/5).
1. Suppose (X, Y) are discrete random variables with probability function given by
A. Find the marginal pmf of X and Y.
B. Are X and Y independent?
2. Given the joint probability density f(x, y) = (2/3)(x + 2y) for 0 < x < 1, 0 < y < 1, and 0 elsewhere:
A. Find the marginal densities of X and Y.
B. Are X and Y independent?
Solution:
A. Performing the necessary integrations, we get
❖ f(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{0}^{1} (2/3)(x + 2y) dy = (2/3)(x + 1), 0 < x < 1
Likewise, f(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_{0}^{1} (2/3)(x + 2y) dx = (1/3)(1 + 4y), 0 < y < 1
B. X and Y are not independent because f(x, y) ≠ f(x)·f(y).

Function of two dimensional random variables


Let X and Y be two random variables with a given joint PDF f(x, y). Assume that U and V are two functions of X and Y; that is, U = g(x, y) and V = h(x, y). Sometimes it is necessary to obtain the joint PDF of U and V, f(u, v), in terms of the pdf of X and Y.
It can be shown that 𝑓(𝑢, 𝑣)is given by:
𝑓(𝑥1 , 𝑦1 ) 𝑓(𝑥2 , 𝑦2 ) 𝑓(𝑥𝑛 , 𝑦𝑛 )
𝑓(𝑢, 𝑣) = + + ⋯+
|𝐽(𝑥1 , 𝑦1 )| |𝐽(𝑥2 , 𝑦2 )| |𝐽(𝑥𝑛 , 𝑦𝑛 )|
Where (𝑥1 , 𝑦1 ), (𝑥2 , 𝑦2 ), … , , (𝑥𝑛 , 𝑦𝑛 )are real solutions of the equations U= 𝑔(𝑥, 𝑦)and

V=ℎ(𝑥, 𝑦); and J(x, y) is called the Jacobian of the transformation


J(x, y) = det [ ∂g/∂x  ∂g/∂y ; ∂h/∂x  ∂h/∂y ] = (∂g/∂x)(∂h/∂y) − (∂g/∂y)(∂h/∂x)
Example: Let U = g(x, y) = X + Y and V = h(x, y) = X − Y. Find f(u, v).
Solution: The unique solution to the equations u = x + y and v = x − y is x = (u + v)/2 and y = (u − v)/2. Thus, there is only one set of solutions. Since
J(x, y) = det [ ∂g/∂x  ∂g/∂y ; ∂h/∂x  ∂h/∂y ] = det [ 1  1 ; 1  −1 ] = (1)(−1) − (1)(1) = −2,
we obtain f(u, v) = f(x, y)/|J(x, y)| = (1/|−2|) f((u + v)/2, (u − v)/2) = (1/2) f((u + v)/2, (u − v)/2)
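The Jacobian above can be checked symbolically with a short SymPy sketch (illustrative only):

import sympy as sp

x, y = sp.symbols('x y')
g = x + y                                  # U = g(x, y)
h = x - y                                  # V = h(x, y)
J = sp.Matrix([[sp.diff(g, x), sp.diff(g, y)],
               [sp.diff(h, x), sp.diff(h, y)]]).det()
print(J)    # -2, so |J| = 2 and f(u, v) = f((u+v)/2, (u-v)/2)/2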
Exercise:
1. Find f(u, v) if U = x² + y² and V = x².
2. Let X and Y have the joint probability density function f(x, y) = (3/2) x²(1 − |y|), −1 < x < 1, −1 < y < 1.
Let A = {(x, y): 0 < x < 1, 0 < y < x}. Find the probability that (X, Y) falls into A.
3. Let X and Y have the joint probability function p(x, y) = (x + y)/21, x = 1, 2, 3; y = 1, 2.
A. Find the conditional probability function of 𝑋, given that 𝑌 = 𝑦.
B. Find P(X = 2|Y = 2).
C. Find the conditional probability function of 𝑌, given that 𝑋 = 𝑥.
4. Let X and Y have the joint probability density function

𝑓(𝑥, 𝑦) = 2 , 0≤ 𝑥≤𝑦≤1
A. Find the marginal probability density functions.
B. Find the conditional probability density function𝑓(𝑦|𝑥) 𝑎𝑛𝑑 𝑓(𝑥|𝑦)?
C. Calculate P(3/4 ≤ Y ≤ 7/8 | X < 1/4).
CHAPTER SIX
EXPECTATION

Expectation of a Random Variable


The definition of the expectation of a random variable can be motivated both by the concept
of a weighted average and through the use of the physics concept of center of gravity, or the
balancing point of a distribution of weights. We first examine the case of a discrete random
variable and look at a problem involving the balancing-point concept.

The averaging process, when applied to a random variable, is called expectation. It is denoted by E(X) or μ and is read as the expected value of X or the mean value of X.

Expectation of a Random Variable: Discrete Case


The expected value of a discrete random variable is defined by E(X) = Σ_x x·P(x), provided the sum exists. Since P(x) ≥ 0 for all x in the range of X and Σ_x P(x) = 1, the expected value of a discrete random variable can also be straightforwardly interpreted as a weighted average of the possible outcomes (or range elements) of the random variable. In this context the weight assigned to a particular outcome of the random variable is equal to the probability that the outcome occurs (as given by the value of P(x)).
Example 1: Consider the random variable representing the number of episodes of diarrhea in
the first 2 years of life. Suppose this random variable has a probability mass function as
below
X 0 1 2 3 4 5 6
P(X = x) 0.129 0.264 0.271 0.185 0.095 0.039 0.017
What is the expected number of episodes of diarrhea in the first 2 years of life?
Solution:E (X) = ∑6𝑥=0 𝑥𝑃(𝑥)= 0.P(X=0) +1.P (X=1) +2P(X=2) + … + 6P(X=6)
= 0 (0.129) + 1(0.264) +2(0.271) + … + 6(0.017) = 2.038
Thus, on the average a child would be expected to have 2 episodes of diarrhea in the first 2
years of life.
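The weighted-average computation is a one-liner in code; here is a plain-Python sketch (the list names are illustrative only):

x_vals = [0, 1, 2, 3, 4, 5, 6]
probs  = [0.129, 0.264, 0.271, 0.185, 0.095, 0.039, 0.017]
mean = sum(x * p for x, p in zip(x_vals, probs))   # E(X) as a weighted average
print(mean)                                        # 2.038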
Example: A construction firm has recently sent in bids for 3 jobs worth (in profits) 10, 20, and
40 (thousand) dollars. If its probabilities of winning the jobs are respectively 0.2,0 .8, and 0.3,
what is the firm’s expected total profit?
Solution: Letting Xᵢ, i = 1, 2, 3 denote the firm's profit from job i, then
Total profit=𝑋1 + 𝑋2 + 𝑋3
So𝐸(𝑡𝑜𝑡𝑎𝑙 𝑝𝑟𝑜𝑓𝑖𝑡) = 𝐸(𝑋1 ) + 𝐸(𝑋2 ) + 𝐸(𝑋3 )
= 10 ∗ 0.2 + 20 ∗ 0.8 + 40 ∗ 0.3
= 2 + 16 + 12
Therefore the firm’s expected total profit is 30 thousand dollars.

Expected Value of a Random Variable: Continuous Case


The expected value of the continuous random variable X is defined by
E(X) = ∫_{−∞}^{+∞} x f(x) dx, provided the integral exists.

EXAMPLE: Let the random variable X be defined as follows. Suppose that X is the time (in
minutes) during which electrical equipment is used at maximum load in a certain specified
time period. Suppose that X is a continuous random variable with the following pdf;
f(x) = x/1500² for 0 ≤ x ≤ 1500; f(x) = −(x − 3000)/1500² for 1500 ≤ x ≤ 3000; f(x) = 0 otherwise.
Solution: By definition, E(X) = ∫_{−∞}^{+∞} x f(x) dx, so
E(X) = ∫_{0}^{1500} x f(x) dx + ∫_{1500}^{3000} x f(x) dx
     = ∫_{0}^{1500} x·(x/1500²) dx + ∫_{1500}^{3000} x·(−1/1500²)(x − 3000) dx
     = (1/1500²) ∫_{0}^{1500} x² dx − (1/1500²) ∫_{1500}^{3000} x(x − 3000) dx
     = (1/1500²)(x³/3)|_{0}^{1500} − (1/1500²)(x³/3 − 3000x²/2)|_{1500}^{3000}
     = 1500 minutes
Example:A large domestic automobile manufacturer mails out quarterly customer


satisfaction surveys to owners who have purchased new automobiles within the last 3 years.
The proportion of surveys returned in any given quarter is the outcome of a random variable X having density function f(x) = 3x² for 0 ≤ x ≤ 1, and 0 otherwise. What is the expected proportion of surveys returned in any given quarter?
Solution: By definition,
E(X) = ∫_{−∞}^{+∞} x f(x) dx = ∫_{0}^{1} x(3x²) dx = ∫_{0}^{1} 3x³ dx = (3x⁴/4)|_{0}^{1} = 0.75
The expected proportion of surveys returned in any given quarter is 0.75
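The same integral can be evaluated numerically with a SciPy sketch (illustrative only):

from scipy import integrate

mean, _ = integrate.quad(lambda x: x * 3 * x**2, 0, 1)   # E(X) = integral of x*f(x)
print(mean)                                              # 0.75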

Expectations of Functions of Random Variables

Let X be a random variable having density function f(x). Then the expectation of Y = g(X) is given by
E(Y) = Σ_x g(x)P(x) if X is a discrete random variable, and
E(Y) = ∫_{−∞}^{+∞} g(x)f(x) dx if X is a continuous random variable.
Example:
1. A small rural bank has two branches located in neighboring towns in East Gojjam. The
numbers of certificates of deposit that are sold at the branch in Menkorere and the
branch in Gozamen in any given week can be viewed as the outcome of the bivariate
random variable (X,Y) having joint probability density function

P(x, y) = x²y/30 for y = 1, 2, 3 and x = 1, 2, and 0 otherwise.

A. Are the random variables independent?
B. What is the expected number of certificate sales by the Gozamen branch?
C. What is the expected number of combined certificate sales for both branches?
Properties of Expectation
If X and Y are random variables and a, b are constants, then:
1. E(k) = k, where k is any constant
2. E(kX) = kE(X), where k is any constant
3. E(X + k) = E(X) + k
4. E(X + Y) = E(X) + E(Y)
5. E(X) ≥ 0, if X ≥ 0
6. |E(X)| ≤ E(|X|)
7. [E(XY)]² ≤ E(X²)E(Y²)
8. E(XY) = E(X)E(Y), if X and Y are independent random variables
Theorem: Let (X, Y) be a two-dimensional random variable and suppose that X and Y are independent. Then E(XY) = E(X)E(Y).
Proof (continuous case): By independence, f(x, y) = f(x)f(y), so E(XY) = ∫∫ xy f(x, y) dx dy = ∫∫ xy f(x)f(y) dx dy = ∫ x f(x) dx · ∫ y f(y) dy = E(X)E(Y). The discrete case is identical with sums in place of integrals.

Variance of random variable


The variance of a random variable measures the variability of the random variable about its expected value.
Definition: Let X be a random variable. We define the variance of X, denoted by V(X) or σx², as follows:
Mean of X = E(X)
Variance of X = σx² = E[X − E(X)]² = E(X²) − [E(X)]²
The positive square root of V(X) is called the standard deviation of X and is denoted by σx.
Proof: Expanding E[X − E(X)]² and using the previously established properties of expectation, we obtain
E[X − E(X)]² = E[X² − 2X·E(X) + (E(X))²] = E(X²) − 2E(X)·E(X) + (E(X))² = E(X²) − [E(X)]².

Case 1: Variance for discrete random variable

If X is a discrete random variable with expected value μ, then the variance of X, denoted by Var(X), is defined by:
σx² = Var(X) = E(X − μ)² = E(X²) − μ² = Σᵢ xᵢ²P(xᵢ) − μ²
Alternatively, Var(X) = Σᵢ (xᵢ − μ)²P(xᵢ)
Case 2: Variance for continuous random variable



If X is a continuous random variable, then Var(X) = σx² = ∫_{−∞}^{∞} (x − μ)² f(x) dx
Properties of Variances
✓ For any random variable X and Y and constant a and b, it can be shown that
- 𝑉𝑎𝑟(𝑎𝑥 + 𝑏) = 𝑎2 𝑣𝑎𝑟(𝑥)
- 𝑉𝑎𝑟(𝑎𝑥 + 𝑏𝑦) = 𝑎2 𝑣𝑎𝑟(𝑥) + 𝑏 2 𝑉𝑎𝑟(𝑦) + 2𝑎𝑏𝐶𝑜𝑣(𝑥, 𝑦)
- 𝑉𝑎𝑟(𝑎𝑥 − 𝑏𝑦) = 𝑎2 𝑣𝑎𝑟(𝑥) + 𝑏 2 𝑉𝑎𝑟(𝑦) − 2𝑎𝑏𝐶𝑜𝑣(𝑥, 𝑦)
- if X1, X2 ……, Xn are independent random variables, then
Var(X₁ + X₂ + ⋯ + Xₙ) = Var(X₁) + Var(X₂) + ⋯ + Var(Xₙ)
i.e.,𝑉𝑎𝑟(∑𝑘𝑖=1 𝑥𝑖 ) = ∑𝑘𝑖=1 𝑉𝑎𝑟(𝑥𝑖 )
Var(x + y) = Var(x) + Var(y) and Var(x − y) = Var(x) + Var(y), when X and Y are independent;
Var(x + y) = Var(x) + Var(y) + 2Cov(x, y) and Var(x − y) = Var(x) + Var(y) − 2Cov(x, y), when X and Y are not independent.
Example: Compute the variance of f(x) = x²/9 for 0 < x < 3.
V(X) = E(X²) − [E(X)]²
E(X²) = ∫_{0}^{3} x² f(x) dx = ∫_{0}^{3} x²·(x²/9) dx = (1/9)(x⁵/5)|_{0}^{3} = 27/5
E(X) = ∫_{0}^{3} x f(x) dx = ∫_{0}^{3} x·(x²/9) dx = (1/9)(x⁴/4)|_{0}^{3} = 9/4
Therefore, V(X) = E(X²) − [E(X)]² = 27/5 − (9/4)² = 27/80 = 0.3375 ≈ 0.34
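These moments are easy to verify numerically; a SciPy sketch (illustrative only):

from scipy import integrate

f = lambda x: x**2 / 9                                   # pdf on (0, 3)
m1, _ = integrate.quad(lambda x: x * f(x), 0, 3)         # E(X)   = 9/4
m2, _ = integrate.quad(lambda x: x**2 * f(x), 0, 3)      # E(X^2) = 27/5
print(m2 - m1**2)                                        # 27/80 = 0.3375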
Moments of a Random Variable
The expectations of certain power functions of a random variable have uses as measures of central tendency, spread or dispersion, and skewness of the density function of the random variable, and also are important components of statistical inference procedures that we will study in later chapters. These special expectations are called moments of the random variable (or of the density function). There are two types of moments that we will be concerned with: moments about the origin and moments about the mean.
Moments tell us information about the shape of the distribution. The nature of the distribution can be identified by looking at various moment values.
Moment about the Origin
Let X be a random variable with density function f(x). Then the rth moment of X about the origin, denoted by μ′ᵣ, is defined for integers r > 0 as
μ′ᵣ = E(xʳ) = Σ_x xʳP(x) if X is a discrete random variable, and
μ′ᵣ = E(xʳ) = ∫_{−∞}^{+∞} xʳ f(x) dx if X is a continuous random variable.
The value of r in the definition of moments is referred to as the order of the moment, so one refers to E(xʳ) as the moment of order r. Note that μ′₀ = 1 for any discrete or continuous random variable, since μ′₀ = E(x⁰) = E(1) = 1.
The first moment about the origin is simply the expectation of the random Variable X,
i.e, 𝜇 ′1 = E(𝑥1 )= E(x) a quantity that we have examined at the beginning of our discussion of
mathematical expectation. This balancing point of a density function, or the weighted
average of the elements in the range of the random variable, will be given a special name and
symbol.
❖ The first moment about the origin of a random variable,X, is called the mean of the
random variable X(or mean of the density function of X), and will be denoted by the

symbol𝜇. Thus, the first moment about the origin characterizes the central tendencyof a
density function. Measures of spread and skewness of a density function aregiven by
certain moments about the mean

Moment about the Mean


Let X be a random variable with density function f(x). Then the rth central moment of X (or the rth moment of X about the mean), denoted by μᵣ, is defined as
μᵣ = E((x − μ)ʳ) = Σ_x (x − μ)ʳP(x) if X is a discrete random variable, and
μᵣ = E((x − μ)ʳ) = ∫_{−∞}^{+∞} (x − μ)ʳ f(x) dx if X is a continuous random variable.
Note that 𝜇0 =1 for any discrete or continuous random variable, since
𝜇0 =E((𝑥 − 𝜇)0 )= E(1)=1 . Furthermore, 𝜇1 = 0 for any discrete or continuous random
variable for which E(X) exists, since𝜇1 =E((𝑥 − 𝜇)1 ) =E(𝑥) − 𝐸(𝜇)= 𝜇 − 𝜇=0
The second central moment is given a special name and symbol. The second central
moment, E((𝑥 − 𝜇)2 ), of a random variable, X, is called the variance of the random
variable X (or the variance of the density function of X), and will be denoted by the
symbol 𝜎 2 , or by Var(x).
The nonnegative square root of the variance of a random variable X (i.e., +√σ²) is called the standard deviation of the random variable (or standard deviation of the density function of X) and will be denoted by the symbol σ, or by Std(X).

The relationship between raw moment and central moment


I. μ₁ = 0
II. μ₂ = μ′₂ − (μ′₁)²
III. μ₃ = μ′₃ − 3μ′₂μ′₁ + 2(μ′₁)³
IV. μ₄ = μ′₄ − 4μ′₃μ′₁ + 6μ′₂(μ′₁)² − 3(μ′₁)⁴
Exercise:
1. If X is a random variable having probability mass function P(x) = x/6 for x = 1, 2, 3 and 0 otherwise, find the mean and variance of X.
2. If X is a random variable having probability density function f(x) = 2(x − 1) for 1 < x < 2 and 0 otherwise, find the mean and variance of X.
Moment Generating Functions
The expectation of e^{tX} results in a function of t that, when differentiated with respect to the argument t and then evaluated at t = 0, generates moments of X about the origin. The function is aptly called the moment-generating function of X.
Definition: The expected value of e^{tX} is defined to be the moment-generating function (MGF) of X if the expected value exists for every value of t in some open interval containing 0, i.e. ∀t ∈ (−h, h), h > 0. The moment generating function of X will be denoted by Mx(t), and is represented by:
Mx(t) = E(e^{tX}) = Σ_x e^{xt}P(x) if X is a discrete random variable, and
Mx(t) = E(e^{tX}) = ∫_{−∞}^{+∞} e^{xt} f(x) dx if X is a continuous random variable.
Note that Mx(0) = E(e⁰) = E(1) = 1 is always defined, and from this property it is clear that a function of t cannot be a MGF unless the value of the function at t = 0 is 1. The condition that Mx(t) must be defined ∀t ∈ (−h, h) is a technical condition that ensures Mx(t) is differentiable at the point zero, a property whose importance will become evident shortly.
We now indicate how the MGF can be used to generate moments about the origin. In the following theorem, we use the notation dʳg(a)/dxʳ to indicate the rth derivative of g(x) with respect to x evaluated at x = a.
Theorem: Let X be a random variable for which the MGF, Mx(t), exists. Then
μ′ᵣ = E(xʳ) = dʳMx(0)/dtʳ
Example: Suppose that X is binomially distributed with parameters n and p. Then the
moment generating function is defined as Mx(t)=[𝑝𝑒 𝑡 + 𝑞]𝑛 then find the 1st and the 2nd
moment then find the mean and variance of the binomial distribution.
Solution:
M′x(t) = n[pe^t + q]^{n−1} pe^t
M″x(t) = n(n − 1)p²e^{2t}[pe^t + q]^{n−2} + n[pe^t + q]^{n−1} pe^t
Therefore E(X) = M′x(0) = np and E(X²) = M″x(0) = n²p² − np² + np
Var(X) = M″x(0) − [M′x(0)]² = npq
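The differentiation can be delegated to a computer algebra system; a SymPy sketch (illustrative only):

import sympy as sp

t, n, p = sp.symbols('t n p', positive=True)
M = (1 - p + p * sp.exp(t))**n            # binomial MGF with q = 1 - p

m1 = sp.diff(M, t).subs(t, 0)             # E(X)
m2 = sp.diff(M, t, 2).subs(t, 0)          # E(X^2)
print(sp.simplify(m1))                    # n*p
print(sp.simplify(m2 - m1**2))            # equals n*p*(1 - p), i.e. npq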
Theorem: Suppose that the random variable X has MGF Mx(t). Let Y= aX +b. ThenMy(t), the

MGF of the random variable Y, is given by; My(t)=𝑒 𝑏𝑡 Mx(at)


Example: The random variable X has the density function f(x) = e^{−x} for x ≥ 0, and 0 otherwise. Find the MGF, and use it to find the mean and variance of X.
Solution: Mx(t) = ∫_{−∞}^{+∞} e^{tx} f(x) dx = ∫_{0}^{+∞} e^{tx} e^{−x} dx = ∫_{0}^{+∞} e^{x(t−1)} dx = e^{x(t−1)}/(t − 1)|_{0}^{∞}
= 0 − 1/(t − 1) = [1 − t]^{−1}, provided t < 1.
Then M′x(t) = (1 − t)^{−2} and M″x(t) = 2(1 − t)^{−3}, so E(X) = M′x(0) = 1, E(X²) = M″x(0) = 2, and Var(X) = 2 − 1² = 1.

Exercise:
2𝑥, 0 < 𝑥 < 1
1. Suppose that X has pdf given by𝑓(𝑥) = { .
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
A. Determine the MGF of X.
B. Using the MGF, evaluate E(X) and Var(X).
2. Suppose that the continuous random variable X has pdf
f(x) = (1/2)e^{−|x|}, −∞ < x < ∞
A. Obtain the MGF of X.


B. Using the MGF, find E(X) and Var(X).
3. Suppose that the MGF of a random variable X is of the formM x(t) = [0.4𝑒 𝑡 + 0.6]8
A. What is the MGF of the random variable Y = 3X + 2 ?
B. Evaluate E(X).
C. Can you check your answer to (b) by some other method?
4. The daily quantity of water demanded by the population of a large city such as Addis Ababa in the summer months is the outcome of a random variable, X, measured in millions of gallons and having a MGF of Mx(t) = [1 − 0.5t]^{−10}, for t < 2.
A. Find the mean and variance of the daily quantity of water demanded.
B. Is the density function of water quantity demanded symmetric?
The Markov and Chebychev’sinequalities:It is often not possible to calculate exactly the
probabilities associated with a random variable, and we will often look for bounds on these
probabilities. Two of the most famous bounds (or inequalities) are Markov’s inequality and
Chebychev’s inequality.
Markov's inequality: Let X be a non-negative random variable and let a be any positive constant. Then P(X ≥ a) ≤ E(X)/a.
We shall prove this result in the continuous case. For a continuous random variable with probability density function f(x),
P(X ≥ a) = ∫_{a}^{∞} f(x) dx ≤ ∫_{a}^{∞} (x/a) f(x) dx = (1/a) ∫_{a}^{∞} x f(x) dx ≤ (1/a) ∫_{0}^{∞} x f(x) dx = E(X)/a,
where the first inequality uses x/a ≥ 1 on the range of integration.
Chebychev's inequality: Let X be a random variable with mean μ and variance σ² and let k be any positive constant. Then P(|X − μ| < kσ) ≥ 1 − 1/k², which indicates that at least (1 − 1/k²)·100% of the probability lies in the interval (μ − kσ, μ + kσ); and P(|X − μ| ≥ kσ) ≤ 1/k², which indicates that at most (1/k²)·100% lies outside the interval (μ − kσ, μ + kσ), where μ = E(X).
P(|X − μ| ≥ kσ) ≤ 1/k² says that the probability that the value of X lies at least k standard deviations from its mean is at most 1/k², where k is a positive number greater than 1.
Note: P(|X − μ| < a) ≥ 1 − Var(X)/a², where a = kσ.

Example:
1. A random variable X has mean 4 and variance 2. Use Chebychev's inequality to obtain a lower bound for P(|X − 4| < 3).
Solution: By Chebychev's inequality, P(|X − μ| < kσ) ≥ 1 − 1/k² gives the lower bound.
Here kσ = 3, so k = 3/σ = 3/√2, and
P(|X − 4| < 3) ≥ 1 − 1/k² = 1 − 2/9 = 7/9, i.e. P(1 < X < 7) ≥ 7/9.
So the probability that some value of the random variable X will be between 1 and 7 is at least 0.778.
2. A random variable X has pdf f(x) = 1/3 for 1 < x < 4, and 0 otherwise. Use Chebychev's inequality to estimate P(|X − 2.5| < 2).
Solution: By Chebychev's inequality, P(|X − μ| < kσ) ≥ 1 − 1/k² gives the lower bound.
E(X) = ∫_{1}^{4} x(1/3) dx = (1/6)x²|_{1}^{4} = 5/2 = 2.5
E(X²) = ∫_{1}^{4} x²(1/3) dx = (1/9)x³|_{1}^{4} = 63/9 = 7
Then Var(X) = σ² = E(X²) − [E(X)]² = 7 − 25/4 = 3/4, so that σ = √3/2.
Here kσ = 2, so k = 2/σ = 4/√3, and
P(|X − 2.5| < 2) ≥ 1 − 1/k² = 1 − 3/16 = 13/16.
So the probability that some value of the random variable X will be between 0.5 and 4.5 is at least 0.8125.
Example: Suppose that it is known that the number of items produced in a factory during a
week is a random variable with mean 50.
A. What can be said about the probability that this week’s production will exceed 75?
B. If the variance of a week’s production is known to equal 25, then what can be
saidabout the probability that this week’s production will be between 40 and 60?
Solution: Let X be the number of items that will be produced in a week.
A. By Markov's inequality: P(X > 75) ≤ E(X)/75 = 50/75 = 2/3.
B. By Chebychev's inequality: P(|X − 50| ≥ 10) ≤ σ²/10² = 25/100 = 1/4.
Hence P(|X − 50| < 10) ≥ 1 − 1/4 = 3/4.
So the probability that this week’s production will be between 40 and 60 is at least 0.75.
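Chebychev's bound is usually conservative, which a quick simulation illustrates. In the sketch below the normal model for weekly production is an assumption (the inequality holds for any distribution with this mean and variance):

import random

random.seed(1)
mu, sigma, n = 50, 5, 100_000
xs = [random.gauss(mu, sigma) for _ in range(n)]   # hypothetical production model
frac = sum(1 for x in xs if abs(x - mu) < 10) / n
print(frac)    # about 0.95, comfortably above the Chebychev bound of 0.75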
Exercise:From past experience, a professor knows that the test score of a student taking her
final examination is a random variable with mean 75.
A. Give an upper bound to the probability that a student’s test score willexceed
85.Suppose in addition the professor knows that the variance of a student’s test score
is equal to 25.
B. What can be said about the probability that a student will score between65 and 85?
C. How many students would have to take the examination so as to ensure,
withprobability at least0.9, that the class average would be within 5 of 75?
Joint Moment about the Origin
Let X and Y be two random variables having joint density function f(x, y). Then the (r, s)th joint moment of (X, Y) (or of f(x, y)) about the origin is defined by
μ′_{r,s} = E(xʳyˢ) = Σ_x Σ_y xʳyˢP(x, y) if (X, Y) is discrete, and
μ′_{r,s} = E(xʳyˢ) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} xʳyˢ f(x, y) dy dx if (X, Y) is continuous.
Covariance and Correlation coefficient
Regarding joint moments, our immediate interest is in a particular joint moment about the mean, μ₁,₁, and the relationship between this moment and moments about the origin. The central moment μ₁,₁ is given a special name and symbol, and we will see that μ₁,₁ is useful as a measure of "linear association" between X and Y.

Covariance: The central joint moment μ₁,₁ = E[(x − E(x))(y − E(y))] is called the covariance between X and Y, and is denoted by the symbol σxy, or by Cov(x, y).

Note that there is a simple relationship between 𝜎𝑥𝑦 and moments about the origin that can

be used for the calculation of the covariance.

𝜎𝑥𝑦 = 𝐸(𝑥𝑦) − 𝐸(𝑥)𝐸(𝑦)


Proof: This result follows directly from the properties of the expectation operation. In
particular, by definition
𝜎𝑥𝑦 = 𝐸[(𝑥 − 𝐸(𝑥))(𝑦 − 𝐸(𝑦))]
= 𝐸[𝑥𝑦 − 𝑦𝐸(𝑥) − 𝑥𝐸(𝑦) + 𝐸(𝑥)𝐸(𝑦)]
= 𝐸(𝑥𝑦) − 𝐸(𝑦)𝐸(𝑥) − 𝐸(𝑥)𝐸(𝑦) + 𝐸(𝑥)𝐸(𝑦)
= 𝐸(𝑥𝑦) − 𝐸(𝑥)𝐸(𝑦)
Example: Let the bivariate random variable (X,Y) have a joint density function
𝑥 + 𝑦, 0 < 𝑥 < 1,0 < 𝑦 < 1
𝑓(𝑥, 𝑦) = { Find 𝑐𝑜𝑣(𝑥, 𝑦).
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
Solution: From the definition of covariance it is computed as Cov(x, y) = E(xy) − E(x)E(y).
E(xy) = ∫_{0}^{1}∫_{0}^{1} xy(x + y) dx dy = ∫_{0}^{1}∫_{0}^{1} (x²y + xy²) dx dy = 1/6 + 1/6 = 1/3
E(x) = ∫_{0}^{1}∫_{0}^{1} x(x + y) dy dx = ∫_{0}^{1} (x² + x/2) dx = 1/3 + 1/4 = 7/12, and by symmetry E(y) = 7/12.
Therefore Cov(x, y) = E(xy) − E(x)E(y) = 1/3 − (7/12)(7/12) = −1/144

❖ If X and Y are independent random variables then σxy = 0 (assuming the covariance exists); the converse does not hold, as the next example shows.

Example: Let X and Y be two random variables having a joint density function given by
f(x, y) = 1.5 for −1 < x < 1, 0 < y < x², and 0 otherwise.
Note this density implies that (x, y) points are equally likely to occur on and below the parabola represented by the graph of y = x². There is a direct functional dependence between X and the range of Y, so that f(y|x) will change as x changes, and thus X and Y must be dependent random variables. Nonetheless, σxy = 0. To see this, note that E(X) = 1.5∫_{−1}^{1} x·x² dx = 0 and E(XY) = 1.5∫_{−1}^{1}∫_{0}^{x²} xy dy dx = 0.75∫_{−1}^{1} x⁵ dx = 0 (both integrands are odd functions of x), so σxy = E(XY) − E(X)E(Y) = 0.
This shows that zero covariance does not imply that X and Y are independent.


Properties of covariance:
If X and Y are either continuous or discrete random variables and a and b are any constant numbers, then covariance has the following properties.
1. 𝐶𝑜𝑣(𝑥, 𝑦) = 𝐸(𝑥𝑦) − 𝐸(𝑥)𝐸(𝑦)


2. 𝐶𝑜𝑣(𝑥, 𝑦) = 0, 𝑖𝑓 𝑋 𝑎𝑛𝑑 𝑌 𝑎𝑟𝑒 𝑖𝑛𝑑𝑒𝑝𝑒𝑛𝑑𝑒𝑛𝑡
3. 𝐶𝑜𝑟𝑟(𝑥, 𝑦) = 0, 𝑖𝑓 𝑋 𝑎𝑛𝑑 𝑌 𝑎𝑟𝑒 𝑖𝑛𝑑𝑒𝑝𝑒𝑛𝑑𝑒𝑛𝑡
4. 𝐶𝑜𝑣(𝑥, 𝑥) = 𝑉𝑎𝑟(𝑥)
5. 𝐶𝑜𝑣(𝑥 + 𝑎, 𝑦) = 𝐶𝑜𝑣(𝑥, 𝑦)
6. 𝐶𝑜𝑣(𝑎𝑥, 𝑦) = 𝑎𝐶𝑜𝑣(𝑥, 𝑦)
7. 𝐶𝑜𝑣(𝑥, 𝑦 + 𝑧) = 𝐶𝑜𝑣(𝑥, 𝑦) + 𝐶𝑜𝑣(𝑥, 𝑧)
8. 𝐶𝑜𝑣(𝑎𝑥, 𝑏𝑦) = 𝑎𝑏𝐶𝑜𝑣(𝑥, 𝑦)
9. 𝐶𝑜𝑣(𝑥 ± 𝑎, 𝑦 ± 𝑏) = 𝐶𝑜𝑣(𝑥, 𝑦)

Correlation coefficient: The correlation coefficient between two random variables X and Y is defined by corr(x, y) = ρxy = σxy/(σx σy).
The correlation coefficient tells us the degree of association and the direction of the linear relationship between the random variables.
The correlation coefficient computed from sample data measures the strength and direction of a linear relationship between two variables.
The symbol for the sample correlation coefficient is r.
The symbol for the population correlation coefficient is ρ.
The range of the correlation coefficient is from −1 to +1.
If there is a strong positive linear relationship between the variables, the value of r will be close to +1.
If there is a strong negative linear relationship between the variables, the value of r will be close to −1.
When there is no linear relationship between the variables or only a weak relationship, the value of r will be close to 0.
Example: Let the bivariate random variable (X,Y) have a joint density function
𝑥 + 𝑦, 0 < 𝑥 < 1,0 < 𝑦 < 1
𝑓(𝑥, 𝑦) = { , then compute the correlation coefficient of X and Y?
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
Solution: From the previous example, Cov(x, y) = −1/144 and E(x) = E(y) = 7/12.
E(x²) = ∫_{0}^{1}∫_{0}^{1} x²(x + y) dy dx = ∫_{0}^{1} (x³ + x²/2) dx = 1/4 + 1/6 = 5/12, so Var(x) = 5/12 − (7/12)² = 11/144, and by symmetry Var(y) = 11/144.
Therefore ρxy = Cov(x, y)/(σx σy) = (−1/144)/(11/144) = −1/11 ≈ −0.09, a very weak negative linear relationship.
Properties of correlation: Let X and Y be random variables with correlation equal to ρ. Then −1 ≤ ρ ≤ 1. Furthermore, ρ equals 1 or −1 if and only if Y is a linear function of X. In fact:
1. If Y = a + bX for some constants a and b, then ρ = 1 if b > 0, and ρ = −1 if b < 0.
2. If Y ≠ a + bX for all a and b, then −1 < ρ < 1.
If X and Y are independent then ρ = 0.
Exercise:
1. A Seattle newspaper intends to administer two different surveys relating to two different anti-tax initiatives on the ballot in November. The proportion of surveys mailed that will actually be completed and returned to the newspaper can be represented as the outcome of a bivariate random variable (X, Y) having the density function
f(x, y) = (2/5)(x + 2y) for 0 < x < 1, 0 < y < 1, and 0 otherwise,
where X is the proportion of surveys relating to initiative I that are returned, and Y refers to the proportion of surveys relating to initiative II that are returned. Compute the covariance and correlation coefficient between X and Y, and also interpret the value of the correlation coefficient.
CHAPTER SEVEN
Common Discrete probability Distributions and their Properties
Binomial distribution
The binomial distribution is one of the simplest and most frequently used discrete probability distributions and is very useful in many practical situations involving either/or types of events.
Assumption of Binomial Experiment:
1. The procedure/experiment has a fixed number of trials.
2. The trials must be independent. (The outcome of any individual trial doesn’taffect the
probabilities in the other trials.)
3. Each trial must have all outcomes classified into two categories. One of the outcomes is
labeled as Success and the other as Failure.
4. The probability of Success remains the same from trial to trial.
The outcomes of the binomial experiment and the corresponding probabilities of these outcomes are called the Binomial distribution.
Let X be the number of successes. Then X follows a binomial distribution with parameters n, the number of trials performed, and p, the probability of success, and we write X ~ Bin(n, p).
The probability mass function of the Binomial distribution is given by:
P(X = x) = C(n, x) pˣ(1 − p)^{n−x}, x = 0, 1, 2, 3, …, n,
where p is the probability of success, q = 1 − p is the probability of failure, n is the number of trials, and x is the number of successes.
The binomial coefficient C(n, x) = n!/(x!(n − x)!) represents the number of ways of arranging x successes and n − x failures. The shape of the PMF of X depends on the parameters n and p.
Properties of Binomial distribution
1. E(X) = np
2. Var(X) = npq
3. Mx(t) = (q + pe^t)ⁿ
Note: the binomial expansion is (a + b)ⁿ = Σ_{x=0}^{n} C(n, x) aˣ b^{n−x}.
Proof: First we derive the moment generating function of the random variable X.
Mx(t) = E(e^{tX}) = Σ_{x=0}^{n} e^{tx} C(n, x) pˣ(1 − p)^{n−x} = Σ_{x=0}^{n} C(n, x)(pe^t)ˣ q^{n−x}
= (q + pe^t)ⁿ, by the binomial expansion.


From the moment generating function we find the expected values of r.v X and variance of X by
using the successively differentiating the Mx(t).
The mean and variance of the binomial distribution may be determined as
E(X) = Σ_{x=0}^{n} x · [n!/(x!(n − x)!)] pˣq^{n−x}
     = np Σ_{x=1}^{n} [(n − 1)!/((x − 1)!(n − x)!)] p^{x−1}q^{n−x}
     = np Σ_{y=0}^{n−1} [(n − 1)!/(y!(n − 1 − y)!)] p^{y}q^{n−1−y},  setting y = x − 1
     = np(p + q)^{n−1} = np, since p + q = 1.
So E(X) = np.
Using a similar approach we can find E(X²) = E(X(X − 1) + X) = E(X(X − 1)) + E(X), where
E(X(X − 1)) = Σ_{x=0}^{n} x(x − 1) · [n!/(x!(n − x)!)] pˣq^{n−x}
            = n(n − 1)p² Σ_{x=2}^{n} [(n − 2)!/((x − 2)!(n − x)!)] p^{x−2}q^{n−x}
            = n(n − 1)p²(p + q)^{n−2} = n(n − 1)p².
Therefore E(X²) = n(n − 1)p² + np, and
V(X) = E(X²) − [E(X)]²
     = n(n − 1)p² + np − (np)²
     = np − np²
     = npq
Example:
1. Suppose a coin is tossed 10 times. What is the probability of getting?
A. Exactly 3 heads D. More than 3 heads
B. At most 3 heads E. No head
C. At least 3 heads

F. Find the average and variance of the number of heads.

2. On the basis of past experience, the probability that a certain electrical component will be
satisfactory is 0.98. The components are sampled item by item from continuous production. In a
sample of five components, what are the probabilities of finding
A. five satisfactory, D. two or more satisfactory
B. exactly one satisfactory E. find mean and variance
C. exactly two satisfactory
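The probabilities in Example 1 can be checked with a statistics library; a SciPy sketch (illustrative only):

from scipy.stats import binom

n, p = 10, 0.5                       # ten tosses of a fair coin
print(binom.pmf(3, n, p))            # A. P(X = 3)  ≈ 0.1172
print(binom.cdf(3, n, p))            # B. P(X <= 3) ≈ 0.1719
print(binom.sf(2, n, p))             # C. P(X >= 3) ≈ 0.9453
print(binom.sf(3, n, p))             # D. P(X > 3)  ≈ 0.8281
print(binom.pmf(0, n, p))            # E. P(X = 0)  ≈ 0.00098
print(n * p, n * p * (1 - p))        # F. mean 5.0 and variance 2.5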

Poisson distribution
The Poisson distribution is a discrete probability distribution that applies to occurrences of some event over a specified interval. The random variable X is the number of occurrences of the event in an interval. The interval can be time, distance, area, volume, or some similar unit.
A discrete random variable X is called a Poisson random variable with parameter λ, where λ > 0, if its PMF is given by:
P(X = x) = e^{−λ}λˣ/x!, x = 0, 1, 2, 3, …
The symbol e stands for a constant approximately equal to 2.7183. It is a famous constant in mathematics, named after the Swiss mathematician L. Euler, and it is also the base of the so-called natural logarithm.
The Poisson distribution has the following requirements:
• The random variable x is the number of occurrences of an event over some interval.
• The occurrences must be random.
• The occurrences must be independent of each other.
• The occurrences must be uniformly distributed over the interval being used.
A Poisson distribution differs from a binomial distribution in these fundamental ways:


1. The binomial distribution is affected by the sample size n and the probability p, whereas the Poisson distribution is affected only by the mean λ.


2. In a binomial distribution, the possible values of the random variable x are 0, 1, . . . ,n, but a
Poisson distribution has possible x values of 0, 1, 2, . . . , with no upper limit.
The Poisson distribution has many applications in science and engineering. For example, the number of telephone calls arriving at a switchboard during various intervals of time and the number of customers arriving at a bank during various intervals of time are usually modeled by Poisson random variables.
Poisson distribution applies for rare events. Some examples of random variables that usually obey,
to a good approximation, the Poisson probability law are:
1. The number of misprints on a page (or a group of pages) of a book.
2. The number of people in a community living to 100 years of age.
3. The number of wrong telephone numbers that are dialed in a day.
4. The number of transistors that fail on their first day of use.
5. In the case of Natural disaster ,like earth quake, flooding
6. In the case of accidents like car accident.

Properties of Poisson distribution
1. E(X) = λ
2. Var(X) = λ
3. Mx(t) = e^{λ(e^t − 1)}
Note: Σ_{n=0}^{∞} xⁿ/n! = eˣ; this is the Maclaurin series (exponential expansion).
𝑛!
Proof: First we derive the moment generating function of the random variable X.
Mx(t) = E(e^{tX}) = Σ_{x=0}^{∞} e^{tx} e^{−λ}λˣ/x! = Σ_{x=0}^{∞} (λe^t)ˣ e^{−λ}/x!
      = e^{−λ} Σ_{x=0}^{∞} (λe^t)ˣ/x!
      = e^{−λ} e^{λe^t}
      = e^{λ(e^t − 1)}
Exercise: From the moment generating function, find the expected value and variance of X by successively differentiating Mx(t).
The mean and variance of the Poisson distribution are easily determined as follows:
E(X) = Σ_{x=0}^{∞} x e^{−λ}λˣ/x! = λe^{−λ} Σ_{x=1}^{∞} λ^{x−1}/(x − 1)!
     = λe^{−λ}[1 + λ/1! + λ²/2! + λ³/3! + λ⁴/4! + ⋯]
     = λe^{−λ}e^{λ} = λ
Similarly, E(X²) = Σ_{x=0}^{∞} x² e^{−λ}λˣ/x! = λ² + λ, since E(X²) = E[X(X − 1) + X],
so that Var(X) = E(X²) − (E(X))² = λ.
Example: Messages arrive at a switchboard in a Poisson manner at an average rate of six per hour.
Find the probability for each of the following events:
A. Exactly two messages arrive within one hour.
B. No message arrives within one hour.
C. At least three messages arrive within one hour
Solution: Let X be the random variable that denotes the number of messages arriving at the switchboard; the mean arrival rate gives E(X) = 6 per hour.
A. The probability that exactly two messages arrive within one hour is
P(X = 2) = e^{−λ}λˣ/x! = e^{−6}6²/2! = 18e^{−6} = 0.0446
B. The probability that no message arrives within one hour is
P(X = 0) = e^{−6}6⁰/0! = e^{−6} = 0.00248

C. The probability that At least three messages arrive within one hour
P(X≥ 3) = 1-P(x<3)
= 1-[P(x=0)+P(x=1)+P(x=2)]
=1-[e−6+6e−6 + 18e−6 ]=1-25e−6 =0.938
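The same three answers drop out of a statistics library; a SciPy sketch (illustrative only):

from scipy.stats import poisson

lam = 6                            # mean arrivals per hour
print(poisson.pmf(2, lam))         # A. P(X = 2)  ≈ 0.0446
print(poisson.pmf(0, lam))         # B. P(X = 0)  ≈ 0.00248
print(poisson.sf(2, lam))          # C. P(X >= 3) = 1 - P(X <= 2) ≈ 0.938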
1. The number of phone calls that arrive at a secretary’s desk has a Poisson distribution with a
mean of 4 per hour.
A. What is the probability that no phone calls arrive in a given hour?


B. What is the probability that at least 2 calls arrive within a given hour?
Solution: Let X be the random variable that denotes the number of phone calls; the mean number of calls is E(X) = 4 per hour.
A. The probability that no phone calls arrive in a given hour is
P(X = 0) = e^{−λ}λˣ/x! = e^{−4}4⁰/0! = e^{−4} = 0.0183
B. The probability that at least 2 calls arrive within a given hour is
P(X ≥ 2) = 1 − P(X < 2) = 1 − [P(X = 0) + P(X = 1)] = 1 − [e^{−4} + 4e^{−4}] = 1 − 5e^{−4} = 0.9084
2. The number of typing mistakes that Ann makes on a given page has a Poisson distribution with
a mean of 3 mistakes.
A. What is the probability that she makes exactly 7 mistakes on a given page?
B. What is the probability that she makes fewer than 4 mistakes on a given page?
C. What is the probability that Ann makes no mistake on a given page?
Geometric Distribution
The geometric random variable is used to describe the number of Bernoulli trials until the first
success occurs. An experiment is said to be geometric experiment if it provides;
1. Each repetition is called trial.
2. For each trial there are two mutually exclusive out comes, success or failure.
3. The trials are independent.
4. The probability of success is the same for each trail of the experiment.
5. We repeat the trials until we get the success.
Let X be a random variable that denotes the number of Bernoulli trials until the first success. If the
first success occurs on the xth trial, then we know that the first x − 1 trials resulted in failures. Thus,

the PMF of a geometric random variable, X, is given by P(X = x) = p(1 − p)^{x−1}, x = 1, 2, 3, 4, 5, …


The differences between the Geometric and the Binomial distributions are:
1. The most obvious difference is that the geometric distribution does not have a set number of
observation, n
2. The 2nd most obvious difference is the question being asked:
❖ Binomial distribution asks for the probability of a certain number of successes.
❖ Geometric distribution asks for the probability of the first success.
Properties of Geometric distribution
1. E(X) = 1/p
2. Var(X) = q/p²
3. Mx(t) = pe^t/(1 − qe^t)
Note: Σ_{x=0}^{∞} arˣ = a + ar + ar² + ar³ + ar⁴ + ar⁵ + ⋯ = a/(1 − r), where |r| < 1.

Proof: First we derive the moment generating function of the random variable X.
Mx(t) = E(e^{tX}) = Σ_{x=1}^{∞} e^{tx} p(1 − p)^{x−1} = p Σ_{x=1}^{∞} e^{tx} q^{x−1}, since q = 1 − p
      = p(e^t + qe^{2t} + q²e^{3t} + q³e^{4t} + ⋯)
      = pe^t(1 + qe^t + (qe^t)² + (qe^t)³ + ⋯)
      = pe^t · 1/(1 − qe^t)
      = pe^t/(1 − qe^t)

Note: Sometimes the pdf of X is given in the form P(X = x) = pqˣ, x = 0, 1, 2, …; then E(X) = q/p, Var(X) = q/p² and Mx(t) = E(e^{tX}) = p/(1 − qe^t).

From the moment generating function we can find the expected value and variance of X by successively differentiating Mx(t).
Exercise: find the expected value and variance of X in this way.
Directly:
• μ = E(X) = Σ_{x=1}^{∞} x p q^{x−1} = p Σ_{x=1}^{∞} x q^{x−1}
           = p Σ_{x=1}^{∞} (d/dq) qˣ,  since (d/dq) qˣ = x q^{x−1}
           = p (d/dq) Σ_{x=1}^{∞} qˣ = p (d/dq)[q/(1 − q)] = p · 1/(1 − q)² = 1/p
• σ² = Var(X) = E(X²) − [E(X)]², where E(X²) = E(X(X − 1)) + E(X) and
E(X(X − 1)) = Σ_{x=1}^{∞} x(x − 1) p q^{x−1} = pq Σ_{x=2}^{∞} x(x − 1) q^{x−2} = pq (d²/dq²) Σ_{x=0}^{∞} qˣ = pq · 2/(1 − q)³ = 2q/p².
Hence Var(X) = 2q/p² + 1/p − 1/p² = q/p².

Example:
1. A manufacturer uses electrical fuses in an electronic system. The fuses are purchased in large lots
and tested sequentially until the first defective fuse is observed. Assume that the lot contains 5%
defective fuses.
A. What is the probability that the first defective fuse will be one of the first five tested?
B. Find the mean, variance, and standard deviation for X, the number of fuses tested until the
first defective fuse is observed.
Solution: Given p = 0.05, q = 0.95.
A. The probability that the first defective fuse will be one of the first five tested is
P(X ≤ 5) = Σ_{x=1}^{5} pq^{x−1}
         = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5)
         = p + pq + pq² + pq³ + pq⁴
         = 0.226
B. The mean, variance, and standard deviation of X, the number of fuses tested until the first defective fuse is observed, are
E(X) = 1/p = 20,  Var(X) = q/p² = 380,  s.d. = √Var(X) = 19.49
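These values agree with a library implementation of the geometric distribution; a SciPy sketch (illustrative only):

from scipy.stats import geom

p = 0.05
print(geom.cdf(5, p))              # A. P(X <= 5) = 1 - 0.95**5 ≈ 0.226
print(geom.mean(p))                # 1/p = 20
print(geom.var(p))                 # q/p**2 = 380
print(geom.std(p))                 # ≈ 19.49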
2. A fair die is rolled repeatedly until a 6 appears.
A. What is the probability that the experiment stops at the fourth roll?
B. Given that the experiment stops at the third roll, what is the probability that sum of all the
three rolls are at least 12?
3. Suppose X has a geometric distribution with p = 0.1. Find:
A. P(X = 7)
B. P(X = 10)
C. P(X ≤ 3)
D. P(X > 5)
E. P(7 ≤ X ≤ 10)
F. the mean of X
G. the variance of X
CHAPTER EIGHT
Common Continuous Distributions and their properties
Uniform (Rectangular) distribution
A continuous random variable X has a uniform distribution over an interval a to b (b > a) if it is equally likely to take on any value in this interval. The probability density function (pdf) of X is
constant over the interval (a, b) and has the form
f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise, where −∞ < a < b < ∞.
The probability that X lies in any subinterval of [a, b] is equal to the length of that subinterval divided by the length of the interval [a, b]. This follows since, when [c, d] is a subinterval of [a, b],
P(c ≤ X ≤ d) = ∫_{c}^{d} dx/(b − a) = (d − c)/(b − a)
The term "uniform" is justified by the fact that intervals of equal length in (a, b) are assigned the same probability regardless of their location. The notation used for such a distribution is U(a, b) or R(a, b), and the expected mean, variance and moment generating function of the uniform distribution are given by:
E(X) = (a + b)/2
Var(X) = (b − a)²/12
Mx(t) = (e^{bt} − e^{at})/(t(b − a))
Note: A uniform random variable X is often used where we have no prior knowledge of the actual pdf and all continuous values in some range seem equally likely.
Example: If X is uniformly distributed over the interval [0, 10], compute the probability that
A. 2 < X < 9,  B. 1 < X < 4,  C. X < 5,  D. X > 6.
Answer: The respective answers are (A) 7/10, (B) 3/10, (C) 5/10, (D) 4/10.
Normal distribution
Let the random variable X denote the ideal symmetric distribution with mean μ and standard deviation σ. This distribution is known as the normal distribution. Statisticians have found that the probability density function of such a distribution is given by the function
f(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}, −∞ < x < ∞, and we write X ~ N(μ, σ²).
A normal distribution is a continuous probability distribution for a random variable x. The graph of
a normal distribution is called the normal curve. A normal distribution has the following properties.
1. The mean, median, and mode are equal.
2. The normal curve is bell shaped and is symmetric about the mean.
3. The total area under the normal curve is equal to one.
4. The normal curve approaches, but never touches, the x-axis as it extends farther and farther
away from the mean.
5. Between 𝜇 − 𝜎and𝜇 + 𝜎 (in the center of curve) the graph curves downward. The graph curves
upward to the left of 𝜇 − 𝜎and to the right𝜇 + 𝜎. The points at which the curve changes from
curving upward to curving downward are called inflection points.
6. The Empirical Rule:
❖ Approximately 68% of the area under the normal curve is between 𝜇 − 𝜎and𝜇 + 𝜎.
❖ Approximately 95% of the area under the normal curve is between 𝜇 − 2𝜎and 𝜇 + 2𝜎.
❖ Approximately 99.7% of the area under the normal curve is between 𝜇 − 3𝜎 and 𝜇 + 3𝜎.
The normal density f (x) is a bell-shaped curve that is symmetric about μ and that attains its
maximum value at x= μ. In practice, many random phenomena obey, at least approximately, a
normal probability distribution. Because the normal probability density function is symmetrical,
themean, median and mode coincide at x = μ.
Thus, the value of μ determines the location of the center of the distribution, and the value of σ² determines its spread.
An important fact about normal random variables is that if X is normal with mean μ and variance σ2,
then Y = αX + β is normal with mean αμ + β and variance α2𝜎2.
The uses of normal distribution
• Many things actually are normally distributed, or very close to it. For example, height and
intelligence are approximately normally distributed; measurement errors also often have a
normal distribution
• The normal distribution is easy to work with mathematically. In many practical cases, the
methods developed using normal theory work quite well even when the distribution is not
normal.
• There is a very strong connection between the size of a sample N and the extent to which a
sampling distribution approaches the normal form. Many sampling distributions based on
large N can be approximated by the normal distribution even though the population
distribution itself is definitely not normal.
It follows from the foregoing that if X ~ N(μ, σ²) then Z = (X − μ)/σ is a normal random variable with mean 0 and variance 1. Such a random variable Z is said to have a standard, or unit, normal distribution.
The distribution function of the N(0, 1)-distribution is usually denoted by Φ; i.e., if Z ~ N(0, 1), then:
P(Z ≤ x) = Φ(x) = ∫_{−∞}^{x} (1/√(2π)) e^{−t²/2} dt, −∞ < x < ∞

Calculations of probabilities of the form P(a < X < b) for −∞ ≤ a ≤ b < ∞ are done in two steps: first, turn the r.v. X ~ N(μ, σ²) into a N(0, 1)-distributed r.v., or, as we say, standardize it, and then use the available Normal tables. For instance, to obtain P(X < b), we note that X will be less than b if and only if (X − μ)/σ is less than (b − μ)/σ, and so
P(X ≤ b) = P((X − μ)/σ ≤ (b − μ)/σ) = Φ((b − μ)/σ)
Similarly, for any a <b,
P(a < X < b) = P((a − μ)/σ < (X − μ)/σ < (b − μ)/σ) = P((a − μ)/σ < Z < (b − μ)/σ)
             = P(Z < (b − μ)/σ) − P(Z < (a − μ)/σ) = Φ((b − μ)/σ) − Φ((a − μ)/σ)
While the normal table tabulates Φ(x) only for nonnegative values of x, we can also obtain Φ(−x) from the table by making use of the symmetry (about 0) of the standard normal probability density function. That is, for x > 0, if Z represents a standard normal random variable, then
Φ(−x) = P(Z < −x) = P(Z > x), by symmetry,
       = 1 − Φ(x).
Thus, for instance, P(Z < −1) = Φ(−1) = 1 − Φ(1) = 1 − 0.8413 = 0.1587.

The mean, variance and moment generating function of the Normal distribution
If X ~ N(μ, σ²) then:
E(X) = μ
Var(X) = σ²
Mx(t) = e^{μt + σ²t²/2}
Proof: First we derive the moment generating function of the random variable X.
Mx(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)} dx
Writing e^{tx} = e^{t(x−μ)+tμ},
Mx(t) = e^{tμ} ∫_{−∞}^{∞} (1/(σ√(2π))) e^{t(x−μ)} e^{−(x−μ)²/(2σ²)} dx
      = e^{tμ} ∫_{−∞}^{∞} (1/(σ√(2π))) e^{−[(x−μ)² − 2σ²t(x−μ)]/(2σ²)} dx
But (x − μ − σ²t)² = (x − μ)² − 2σ²t(x − μ) + σ⁴t², so (x − μ)² − 2σ²t(x − μ) = (x − μ − σ²t)² − σ⁴t², and
Mx(t) = e^{tμ} e^{σ²t²/2} ∫_{−∞}^{∞} (1/(σ√(2π))) e^{−(x−μ−σ²t)²/(2σ²)} dx
      = e^{tμ + σ²t²/2},
since the remaining integrand is a normal density with mean μ + σ²t and variance σ², so the integral equals 1.
Therefore Mx(t) = e^{μt + σ²t²/2}.

Example
1. If X is a normal random variable with mean μ = 3 and variance σ² = 16, find
A. P(X < 11);  B. P(X > −1);  C. P(2 < X < 7)
Answer: A. P(X < 11) = Φ((11 − 3)/4) = Φ(2) = 0.9772
B. P(X > −1) = P(Z > −1) = P(Z < 1) = 0.8413
C. P(2 < X < 7) = Φ(1) − Φ(−1/4) = Φ(1) − (1 − Φ(1/4)) = 0.8413 + 0.5987 − 1 = 0.4400
2. Find the area under the normal distribution curve
A. P(0<Z<2.34) D. P(Z<-1.93)
B. P(-1.75<Z<0) E. P(-1.37<Z<1.68)
C. P(Z>1.11) F. P(2<Z<2.47)
Answer A.0.4904 B.0.4599 C.0.1335 D. 0.0268 E.0.8682 F. 0.0160
3. Find the Zₐ-values such that the area under the normal distribution curve is
A. P(0 < Z < Zₐ) = 0.2123
B. P(0 < Z < Zₐ) = 0.45
Answer: A. Zₐ = 0.56  B. Zₐ = 1.645
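All of these normal probabilities can be reproduced without tables; a SciPy sketch (illustrative only):

from scipy.stats import norm

mu, sigma = 3, 4                   # X ~ N(3, 16) from Example 1
print(norm.cdf(11, mu, sigma))     # P(X < 11) ≈ 0.9772
print(norm.sf(-1, mu, sigma))      # P(X > -1) ≈ 0.8413
print(norm.cdf(7, mu, sigma) - norm.cdf(2, mu, sigma))   # P(2 < X < 7) ≈ 0.4400
print(norm.ppf(0.5 + 0.45))        # z with P(0 < Z < z) = 0.45, about 1.645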
Exercise: The mean weight of 200 students in a certain college is 140 lbs, and the standard deviation
is 10 lbs. If we assume that the weights are normally distributed, evaluate the following:
A. The expected number of students that weight between 110 and 145 lbs.
B. The expected number of students that weight less than 120 lbs.
C. The expected number of students that weigh more than 170 lbs.
Exponential distribution
A continuous random variable whose probability density function is given, for some λ > 0, by
f(x) = λe^{−λx}, x ≥ 0
is said to be an exponential random variable (or, more simply, is said to be exponentially distributed) with parameter λ.
The cumulative distribution function F(x) of an exponential random variable is given by
F(x) = P(X ≤ x) = ∫_{0}^{x} λe^{−λy} dy = 1 − e^{−λx}, x ≥ 0

The exponential distribution often arises, in practice, as being the distribution of the amount of time
until some specific event occurs. For instance, the amount of time (starting from now) until an
earthquake occurs, or until a new war breaks out, or until a telephone call you receive turns out to be
a wrong number are all random variables that tend in practice to have exponential distributions.
The mean, variance and moment generating function of the exponential distribution are
E(X) = 1/λ,  Var(X) = 1/λ²,  Mx(t) = λ/(λ − t), for t < λ.
Example:
1. Assume that the length of phone calls made at a particular telephone booth is exponentially
distributed with a mean of 3 minutes. If you arrive at the telephone booth just as Chris was about
to make a call, find the following:
A. The probability that you will wait more than 5 minutes before Chris is donewith the call.
B. The probability that Chris’ call will last between 2 minutes and 6 minutes
Solution: Let X be a random variable that denotes the length of calls made at the telephone booth. Since the mean length of calls is 1/λ = 3, the PDF of X is f(x) = (1/3)e^{−x/3}, x ≥ 0.
A. The probability that you will wait more than 5 minutes is the probability that X is greater than 5 minutes, which is given by
P(X > 5) = 1 − F(5) = e^{−5/3} ≈ 0.189
B. The probability that the call lasts between 2 and 6 minutes is given by
P(2 < X < 6) = F(6) − F(2) = e^{−2/3} − e^{−2} ≈ 0.5134 − 0.1353 = 0.378
2. The lifetime of an automobile battery is described by a r.v. X having the exponential distribution with parameter λ = 1/3. Then:
A. Determine the expected lifetime of the battery and the variation around this mean.
B. Calculate the probability that the lifetime will be between 2 and 4 time units.
Answer: (a) E(X) = 3 and Var(X) = 9
(b) P(2 ≤ X ≤ 4) = P(X ≤ 4) − P(X ≤ 2) = F(4) − F(2) = (1 − e^{−4/3}) − (1 − e^{−2/3}) = e^{−2/3} − e^{−4/3} ≈ 0.25
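A library check of the battery example; a SciPy sketch (illustrative only):

from scipy.stats import expon

X = expon(scale=3)                 # exponential with rate lambda = 1/3, mean 3
print(X.mean(), X.var())           # 3.0 and 9.0
print(X.cdf(4) - X.cdf(2))         # P(2 <= X <= 4) ≈ 0.25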
Exercise:
1. The lifetime X of a system in weeks is given by the following pdf:

A. What are the expected value of X and the variance of X?
B. What is the CDF of X?
2. A certain industrial process yields a large number of steel cylinders whose lengths are
distributed normally with mean 3.25 inches and standard deviation 0.05 inch. If two such
cylinders are chosen at random and placed end to end, what is the probability that their
combined length is less than 6.60 inches? (Hint: using standard normal distribution)