Probability Theory Lecture Note
CHAPTER ONE
INTRODUCTION
Deterministic and non-deterministic models
Deterministic models always return the same result any time they are called with a specific set
of input values. Nondeterministic models may return different results each time even if the
input values that they access remain the same. In deterministic models the condition under which an experiment is performed determines the outcome of the experiment, but in nondeterministic models we cannot determine the outcome of the experiment even if we know the condition under which it is performed.
A mathematically deterministic model is a representation y = f(x) that allows you to make predictions of y based on x. The model is used like this: when x = 3, then we predict that y = f(3). For example, suppose y = 2 + 3x − 4x². We can predict that if x = 3, then y = 2 + 9 − 36 = −25.
❖ Mathematical model in which outcomes are precisely determined through known
relationships among states and events, without any room for random variation. In such
models, a given input will always produce the same output.
❖ A deterministic event always has the same outcome and is predictable 100% of the time.
➢ Distance traveled = time * velocity
➢ The speed of light
➢ The sun rising in the east
❖ Note that this "prediction" need not refer to the past, the future, or even the present. It is simply a hypothetical, "what-if" statement: it identifies what the outcome would be if we were to use a particular x.
❖ This type of model is "deterministic" because y is completely determined if you know x.
Non-deterministic/ probability model
❖ A probability model is a representation y ~ p(y). Note that we say "y ~ p(y)" not "y =
p(y)". The notation "y ~ p(y)" specifically means that y is generated at random from a
probability distribution whose mathematical form is p(y). This model also allows you to
make "what-if" predictions as to the value of y, but, unlike the deterministic model, it
does not allow you to say precisely what the value of y will be.
A probabilistic event is an event for which the exact outcome is not predictable
100% of the time.
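To make the contrast concrete, here is a minimal Python sketch; the quadratic is the deterministic example above, while the normal distribution standing in for p(y) is an illustrative assumption, not one fixed by the text.

import random

def deterministic(x):
    # Same x always yields the same y.
    return 2 + 3 * x - 4 * x ** 2

print(deterministic(3))    # -25, every time it is called
print(deterministic(3))    # -25 again

random.seed(1)
# y ~ p(y): each call draws a fresh value from the distribution.
print(random.gauss(0, 1))  # one random draw
print(random.gauss(0, 1))  # a different draw, same "input"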
Intersection: Let A and B be any two subsets of a specified universal set, S. Then the intersection
of the sets A and B is the set of all elements in S that are in both sets A and B, and is denoted by
A∩B.
De Morgan's rules: for any two events A and B
❖ (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
❖ (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
Generally, to work with the probability of an event, one must first know the basic concepts and notation of set theory.
If A and B are events, then the following statements hold.
➢ If A and B are disjoint events, then A ∩ B = ∅.
➢ At least one of the events occurs: A ∪ B.
➢ Both events occur: A ∩ B.
➢ Neither event A nor event B occurs: Aᶜ ∩ Bᶜ.
➢ Only the event A occurs: A ∩ Bᶜ.
➢ Exactly one of the events occurs: (A ∩ Bᶜ) ∪ (Aᶜ ∩ B).
➢ Not more than one of the events A or B occurs: (A ∩ Bᶜ) ∪ (Aᶜ ∩ B) ∪ (Aᶜ ∩ Bᶜ).
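These identities are easy to check numerically. The following Python sketch verifies De Morgan's rules and one of the event expressions above; the particular universal set S and events A, B are illustrative assumptions.

S = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

Ac, Bc = S - A, S - B          # complements relative to S
assert S - (A & B) == Ac | Bc  # (A n B)^c = A^c u B^c
assert S - (A | B) == Ac & Bc  # (A u B)^c = A^c n B^c

# "Exactly one of A, B occurs" = (A n B^c) u (A^c n B)
exactly_one = (A & Bc) | (Ac & B)
print(sorted(exactly_one))     # [1, 2, 5, 6]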
Exercise:
1. Suppose that A, B, and C are events such that P(A) = P(B) = P(C) = 1/4, P(A ∩ B) = P(C ∩ B) = 0, and P(A ∩ C) = 1/8. Evaluate the probability that at least one of the events A, B, or C occurs.
2. Let A, B, and C be three events associated with an experiment. Express the following
verbal statements in set notation.
A. At least one of the events occurs.
B. Exactly one of the events occurs.
C. Exactly two of the events occur.
D. Only A occurs.
E. Both A and B occur but not C.
F. At least two of the events occur.
G. All three events occur.
H. None of the events occurs.
I. At most one event occurs.
J. At most two events occur.
K. Not more than two of the events occur simultaneously.
Random experiments, sample space and events
1. Experiment: Any process of observation or measurement, or any process which generates well-defined outcomes.
2. Probability Experiment (Random Experiment): It is an experiment that can be repeated any number
of times under similar conditions and it is possible to enumerate the total number of outcomes
without predicting an individual outcome.
Example: If a fair coin is tossed three times, it is possible to enumerate all possible eight sequences of
head (H) and tail (T). But it is not possible to predict which sequence will occur at any occasion.
3. Outcome: The result of a single trial of a random experiment
7. Complement of an Event: the complement of an event A, denoted by A′, Aᶜ or Ā, contains those points of the sample space which don't belong to A.
8. Elementary (simple) Event: an event having only a single element or sample point.
9. Mutually Exclusive (Disjoint) Events: Two events which cannot happen at the same time.
10. Independent Events: Two events are said to be independent if the occurrence of one does not affect the probability of occurrence of the other.
Addition Rule: Suppose that the 1st procedure of an experiment, designated by 1, can be performed in n1 ways, the 2nd procedure, designated by 2, can be performed in n2 ways, …, and the kth procedure, designated by k, can be performed in nk ways, and suppose that no two of the procedures can be performed together. Then the number of ways in which the experiment can be performed is n1 + n2 + n3 + … + nk.
i.e. n(A or B) = n(A) + n(B) − n(A ∩ B)
To list the outcomes of a sequence of events, a useful device called a tree diagram is used.
Example 1: A student goes to the nearest snack bar to have breakfast. He can take tea, coffee, or milk with bread, cake or a sandwich. How many possibilities does he have?
Solution: Any of the 3 drinks can be combined with any of the 3 foods, so by the multiplication principle he has 3 × 3 = 9 possibilities, as the enumeration below confirms.
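A small Python sketch that enumerates the breakfast combinations instead of drawing the tree diagram by hand:

from itertools import product

drinks = ["tea", "coffee", "milk"]
foods = ["bread", "cake", "sandwich"]

combos = list(product(drinks, foods))
print(len(combos))  # 9
print(combos[:3])   # [('tea', 'bread'), ('tea', 'cake'), ('tea', 'sandwich')]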
head can arise in only one way, we reason that the required probability is 1/2. In arriving at this,
we assume that the coin is fair, i.e., not loaded in any way.
Example
1. A fair die is tossed once. What is the probability of getting
A. the number 4? C. a number greater than 4?
B. an odd number? D. either 1 or 2 or … or 6?
2. A box of 80 candles consists of 30 defective and 50 non-defective candles. If 10 of these candles are selected at random, what is the probability that (see the sketch below)
a) all will be defective?
b) 6 will be non-defective?
c) all will be non-defective?
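Under the classical approach these are counting problems. A hedged Python sketch of question 2, using math.comb for the binomial coefficients:

from math import comb

total = comb(80, 10)                            # ways to choose 10 of 80 candles
p_all_def = comb(30, 10) / total                # (a) all 10 defective
p_6_nondef = comb(50, 6) * comb(30, 4) / total  # (b) 6 non-defective, 4 defective
p_all_nondef = comb(50, 10) / total             # (c) all 10 non-defective

print(round(p_all_def, 6), round(p_6_nondef, 4), round(p_all_nondef, 4))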
2. Frequency approach: If after n repetitions of an experiment, where n is very large, an event is observed to occur in h of these, then the probability of the event is h/n. This is also called the empirical probability of the event.
Example 1: If we toss a coin 1000 times and find that it comes up heads 532 times, we estimate the probability of a head coming up to be 532/1000 = 0.532.
Example 2: If records show that 60 out of 100,000 bulbs produced are defective, what is the probability that a newly produced bulb is defective? (Estimate: 60/100,000 = 0.0006.)
❖ Both the classical and frequency approaches have serious drawbacks, the first because the
words “equally likely” are vague and the second because the “large number” involved is
vague. Because of these difficulties, mathematicians have been led to an axiomatic approach
to probability.
3. Axiomatic Approach (Basic notions of Probability):
Probability theory is derived from a small set of axioms – a minimal set of essential assumptions. A deep understanding of axiomatic probability theory is not essential to financial econometrics or to the use of probability and statistics in general, although understanding these core concepts does provide additional insight.
Let E be a random experiment and S be a sample space associated with E. With each event A we associate a real number designated by P(A), called the probability of A, which satisfies the following properties:
1. 0 ≤ P(A) ≤ 1
2. P(S) = 1
3. If A and B are mutually exclusive events, the probability that one or the other occurs equals the sum of the two probabilities, i.e. P(A ∪ B) = P(A) + P(B).
4. If A1, A2, A3, … are mutually exclusive events, then P(⋃_{i=1}^{∞} Ai) = ∑_{i=1}^{∞} P(Ai).
Exercise:
1. If two dice are thrown, what is the probability that the sum is
A. greater than 8? B. neither 7 nor 11?
2. An urn contains 8 white balls and 2 green balls. A sample of three balls is selected at
random. What is the probability that the sample contains at least one green ball?
3. A box contains 12 light bulbs of which 5 are defective. All the bulbs look alike and have
equal probability of being chosen. Three bulbs are picked up at random. What is the
probability that at least 2 are defective?
4. A problem in Mathematics is given to three students, whose chances of solving it are 1/2, 1/3 and 1/4 respectively. What is the probability that the problem will be solved?
5. The probabilities of A, B and C solving a problem are 1/2, 2/7 and 3/8 respectively. If all three try to solve the problem simultaneously, find the probability that exactly one of them will solve it.
6. A husband and wife appear in an interview for two vacancies in the same department. The probability of the husband's selection is 1/7 and that of the wife's selection is 1/5. What is the probability that
A. only one of them will be selected?
B. both of them will be selected?
C. none of them will be selected?
D. at least one of them will be selected?
7. Almaze and Tamerat appear for an interview for two vacancies. The probability of Almaze's selection is 1/3 and that of Tamerat's selection is 1/5. Find the probability that
A. both of them will be selected.
B. none of them is selected.
C. at least one of them is selected.
D. only one of them is selected.
CHAPTER TWO:
Conditional probability and Independency
Conditional Events: If the occurrence of one event has an effect on the occurrence of the other event, then the two events are conditional or dependent events. The conditional probability of an event B is the probability that the event will occur given the knowledge that an event A has already occurred. This probability is written P(B|A), the notation for the probability of B given A. In the case where events A and B are independent (where event A has no effect on the probability of event B), the conditional probability of event B given event A is simply the probability of event B, that is, P(B).
If events A and B are not independent, then the probability of the intersection of A and B (the
probability that both events occur) is defined by P(A and B) = P(A)P(B|A).
From this definition, a conditional probability is obtained by dividing the joint probability by the probability of the conditioning event:
P(A|B) = P(A ∩ B)/P(B), P(B) ≠ 0, or P(A ∩ B) = P(A|B)·P(B)
To calculate the probability of the intersection of more than two events, the conditional
probabilities of all of the preceding events must be considered. In the case of three events, A, B,
and C, the probability of the intersection P(A∩B ∩ C) =P(A)P(B|A)P(C|A ∩B).
Example: Suppose we have two red and three white balls in a bag
1. Draw a ball with replacement
2. Draw a ball without replacement
Conditional probability rule
❖ Addition rule: P(A1 ∪ A2 ∪ … |B) = P(A1|B) + P(A2|B) + … for mutually disjoint events A1, A2, …
❖ Complement rule: P(Ac|B)=1-P(A|B)
❖ Multiplication rule: P(A ∩ B) = P(B)·P(A|B). More generally, for any events A1, A2, …, An,
P(A1 ∩ A2 ∩ … ∩ An) = P(A1)P(A2|A1)P(A3|A1 ∩ A2) … P(An|A1 ∩ A2 ∩ … ∩ An−1)
Conditional probability of an event
The conditional probability of an event A given that B has already occurred is denoted by P(A|B). Since B is known to have occurred, it becomes the new sample space, replacing the original sample space.
From this we are led to the definition
P(A|B) = P(A ∩ B)/P(B), P(B) ≠ 0, or P(A ∩ B) = P(A|B)·P(B)
Remark:
1. P(Aᶜ|B) = 1 − P(A|B)
2. P(Bᶜ|A) = 1 − P(B|A)
3. For three events A1, A2 and A3, P(A1 ∩ A2 ∩ A3) = P(A1)P(A2|A1)P(A3|A1 ∩ A2)
4. Generalization of the multiplication theorem: for events A1, A2, A3, …, An we have
P(A1 ∩ A2 ∩ … ∩ An) = P(A1)P(A2|A1)P(A3|A1 ∩ A2) … P(An|A1 ∩ A2 ∩ … ∩ An−1)
Example:
1. For a student enrolling as a freshman at a certain university, the probability is 0.25 that he/she will get a scholarship and 0.75 that he/she will graduate. The probability is 0.20 that he/she will both get a scholarship and graduate. What is the probability that a student who got a scholarship will graduate?
Solution: Let A = the event that a student will get a scholarship
B = the event that a student will graduate
Given: P(A) = 0.25, P(B) = 0.75, P(A ∩ B) = 0.20. Required: P(B|A).
P(B|A) = P(A ∩ B)/P(A) = 0.20/0.25 = 0.80
2. A lot consists of 20 defective and 80 non-defective items, from which two items are chosen without replacement. Events A and B are defined as A = {the first item chosen is defective}, B = {the second item chosen is defective}.
a. What is the probability that both items are defective?
b. What is the probability that the second item is defective?
3. In example 2 above, if we choose 3 items one after the other without replacement, what is the probability that all three items are defective?
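A small numeric sketch of these computations in Python; the fractions follow directly from the multiplication rule:

p_a = 20 / 100                     # first item defective
p_b_given_a = 19 / 99              # second defective given first defective
p_both = p_a * p_b_given_a         # P(A n B) = P(A) P(B|A)
print(round(p_both, 4))            # 0.0384

# P(B) by conditioning on the first draw (law of total probability):
p_b = p_a * (19 / 99) + (80 / 100) * (20 / 99)
print(round(p_b, 4))               # 0.2 -- same as the first draw

# Example 3: all three drawn items defective, via the chain rule.
p_three = (20 / 100) * (19 / 99) * (18 / 98)
print(round(p_three, 5))           # 0.00705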
The law of total probability
The law of total probability (also known as the "method of conditioning") allows one to compute the probability of an event B by conditioning on cases, according to a partition of the sample space.
Let {A1, A2, …, An} be a partition of the sample space S, and suppose each one of the events A1, A2, …, An has nonzero probability of occurrence. Let B be any event. Then
P(B) = P(A1)P(B|A1) + P(A2)P(B|A2) + ··· + P(An)P(B|An)
= ∑_{i=1}^{n} P(Ai)P(B|Ai)
Definition (Partition): We say that events A1, A2, …, An represent a partition of the sample space S if Ai ∩ Aj = ∅ for i ≠ j, ⋃_{i=1}^{n} Ai = S, and P(Ai) > 0 for every i.
Figure: the sample space S partitioned into A1, A2, A3, …, An, with an event B cutting across the partition.
From the above diagram, B = ⋃_{i=1}^{n} (B ∩ Ai)
B = (B ∩ A1) ∪ (B ∩ A2) ∪ (B ∩ A3) ∪ … ∪ (B ∩ An)
(B ∩ Ai) and (B ∩ Aj) are mutually exclusive if i ≠ j
∴ P(B) = P(B ∩ A1) + P(B ∩ A2) + P(B ∩ A3) + … + P(B ∩ An)
= P(B|A1)P(A1) + P(B|A2)P(A2) + P(B|A3)P(A3) + … + P(B|An)P(An)
= ∑_{i=1}^{n} P(B|Ai)P(Ai)
Bayes’ theorem
Bayes’ theorem (also known as Bayes’ rule or Bayes’ law) is a result in probability theory that
relates conditional probabilities. If A and B denote two events, P(A|B) denotes the conditional
probability of A occurring, given that B occurs. The two conditional probabilities P(A|B) and
P(B|A) are in general different. Bayes' theorem gives a relation between P(A|B) and P(B|A).
An important application of Bayes’ theorem is that it gives a rule how to update or revise the
strengths of evidence-based beliefs in light of new evidence a posteriori.
Bayes’ theorem relates the conditional and marginal probabilities of stochastic events A and B:
P(Ai|B) = P(B|Ai)P(Ai)/P(B). Each term in Bayes' theorem has a conventional name:
➢ P(Ai) is the prior probability or marginal probability of Ai. It is "prior" in the sense that it does not take into account any information about B.
➢ P(Ai|B) is the conditional probability of A, given B. It is also called the posterior probability
because it is derived from or depends upon the specified value of B.
➢ P(B|A) is the conditional probability of B given A.
➢ P(B) is the prior or marginal probability of B, and acts as a normalizing constant
❖ A prior probability is an initial probability value originally obtained before any additional
information is obtained.
❖ A posterior probability is a probability value that has been revised by using additional
information that is later obtained
Let A1, A2, …, An be a partition of the sample space S and let B be an event associated with S. Applying the definition of conditional probability, we have
P(Ai|B) = P(B|Ai)P(Ai) / ∑_{i=1}^{n} P(B|Ai)P(Ai)
Proof: P(Ai|B) = P(B ∩ Ai)/P(B), but P(A ∩ B) = P(B ∩ A) = P(A)P(B|A) and, by the law of total probability, P(B) = ∑_{i=1}^{n} P(B|Ai)P(Ai). Hence
P(Ai|B) = P(B|Ai)P(Ai) / ∑_{i=1}^{n} P(B|Ai)P(Ai)
Example 2: Of the travelers arriving at a small airport, 60 % fly on major airlines, 30 % fly on
privately owned planes, and the remainder fly on commercially owned planes not belonging to
a major airline. Of those travelling on major airlines, 50 % are travelling for business reasons,
whereas 60 % of those arriving on private planes and 90 % of those arriving on other
commercially owned planes are travelling for business reasons. Suppose that we randomly
select one person arriving at this airport. What is the probability that the person
A. Is travelling on business?
B. Is travelling for business on a privately owned plane?
C. Arrived on privately owned plane, given that the person is travelling for business reasons?
D. Is travelling on business, given that the person is flying on a commercially owned plane?
Solution: Define the following events: M = the person flew on a major airline, V = the person flew on a privately owned plane, C = the person flew on a commercially owned plane not belonging to a major airline, and B = the person is travelling for business reasons.
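A short Python sketch of the computation, using the event labels just defined:

p = {"M": 0.6, "V": 0.3, "C": 0.1}            # prior probabilities by plane type
p_b = {"M": 0.5, "V": 0.6, "C": 0.9}          # P(B | plane type)

# A. Law of total probability.
total_b = sum(p[k] * p_b[k] for k in p)
print(round(total_b, 2))                      # 0.57

# B. P(V n B) = P(V) P(B|V).
print(round(p["V"] * p_b["V"], 2))            # 0.18

# C. Bayes' theorem: P(V | B).
print(round(p["V"] * p_b["V"] / total_b, 4))  # 0.3158

# D. P(B | C) is given directly.
print(p_b["C"])                               # 0.9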
A. Let W be the event that a white ball is drawn. The conditional probabilities are
P(W|E1) = 4/8, P(W|E2) = 2/6 and P(W|E3) = 3/9
Then P(W) = P(E1 ∩ W) + P(E2 ∩ W) + P(E3 ∩ W)
P(B) = ∑ P(Ei)P(B|Ei) = (1/3)(3/8) + (1/3)(1/6) + (1/3)(4/9) = 71/216
So we obtain
P(E2|B) = P(E2)P(B|E2)/P(B) = (1/18)/(71/216) = 12/71
Exercise: Box I contains 3 red and 2 blue marbles while Box II contains 2 red and 8 blue marbles. A fair coin is tossed. If the coin turns up heads, a marble is chosen from Box I; if it turns up tails, a marble is chosen from Box II.
A. Find the probability that a red marble is chosen
B. What is the probability that Box I was chosen given that a red marble is known to have been
chosen?
Probability of Independent Events
If the probability of B occurring is not affected by the occurrence or nonoccurrence of A, then we say that A and B are independent events, i.e. P(B|A) = P(B). This is equivalent to
P(A ∩ B) = P(A)·P(B)
Let A and B be events with P(B) ≠ 0. Then A and B are independent if and only if P(A|B) = P(A).
Proof: First suppose that A and B are independent. Remember that this means that P(A ∩ B) = P(A)·P(B). Then
P(A|B) = P(A ∩ B)/P(B) = P(A)·P(B)/P(B) = P(A)
Corollary: Let events A, B, C be mutually independent. Then A and B ∩ C are independent, and A and B ∪ C are independent.
Remark: If A1, A2, A3 are to be mutually independent, then they must be pairwise independent and, in addition, must satisfy P(A1 ∩ A2 ∩ A3) = P(A1)P(A2)P(A3).
Exercise:
1. Three balls are drawn in succession from a box containing 6 red balls, 4 white balls and 5 blue balls. Find the probability that they are drawn in the order red, white and blue if each ball is
A. replaced B. not replaced
2. In a factory, machines A1, A2, A3 manufacture 25%, 35% and 40% of the total output respectively. Of their outputs, 5%, 4% and 2% respectively are defective. An item drawn at random from the total output is found to be defective.
A. What is the probability that an item drawn at random from the total output is defective?
B. What is the probability that this defective item was produced by machine A1?
3. Suppose the probabilities of three events, A, B and C are as depicted in the
followingdiagram:
19
Probability Lecture Note Stat(2012)
5. A laboratory blood test is 95 percent effective in detecting a certain disease when it is, in fact, present. However, the test also yields a "false positive" result for 1 percent of the healthy persons tested. (That is, if a healthy person is tested, then, with probability 0.01, the test result will imply he has the disease.) If 0.5 percent of the population actually has the disease, what is the probability a person has the disease given that his test result is positive?
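For exercise 5, Bayes' theorem gives the answer directly; a minimal Python sketch (the variable names are assumed labels):

p_d = 0.005                 # prevalence: P(disease)
p_pos_given_d = 0.95        # sensitivity: P(positive | disease)
p_pos_given_healthy = 0.01  # false-positive rate: P(positive | healthy)

p_pos = p_pos_given_d * p_d + p_pos_given_healthy * (1 - p_d)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_d_given_pos, 3))  # 0.323 -- surprisingly low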
CHAPTER THREE
One Dimensional Random Variable
Definitions of Random variable
• A variable whose value is unknown or a function that assigns values to each of an
experiment's outcomes. Random variables are often designated by letters and can be
classified as discrete, which are variables that have specific values, or continuous, which are
variables that can have any values within a continuous range.
• A random variable is a variable that has a single numerical value, determined by chance, for each outcome of a procedure.
• A random variable is discrete if it takes on a countable number of values (i.e. there are gaps
between values).
• A random variable is continuous if there are an infinite number of values the random
variable can take, and they are densely packed together (i.e. there are no gaps between
values).
• A probability distribution is a description of the chance a random variable has of taking on
particular values. It is often displayed in a graph, table, or formula.
• A probability histogram is a display of a probability distribution of a discrete random
variable. It is a histogram where each bar’s height represents the probability that the random
variable takes on a particular value.
A random variable is a variable X whose value is determined by the outcomes of a random experiment. It is classified as:
• Discrete random variable
• Continuous random variable
1. Discrete random variable:
➢ Its possible values are isolated points along the number line. Random variables have their own sample space, or set of possible values. If this set is finite or countable, the random variable is said to be discrete.
➢ It is a random variable which can assume only a countable number of real values.
➢ If X is a discrete random variable taking values x1, x2, …, xn, then P(xi) = P(X = xi), where i = 1, 2, 3, …, n, is called the probability mass function (pmf) of the random variable X.
➢ The set of ordered pairs (xi, P(xi)), i = 1, 2, 3, …, n, gives the probability distribution of the random variable X.
Discrete probability distribution: A probability distribution describes the possible values and their probabilities of occurring.
A discrete probability distribution is called a probability mass function (pmf), p(·), and needs to satisfy the following conditions.
Properties of P(X = xi), where X is a discrete random variable:
1. 0 ≤ P(X = xi) ≤ 1
2. ∑_{i=1}^{n} P(X = xi) = 1
Example: Consider the experiment of tossing a coin three times. Let X be the number of heads; write down the probability distribution of the random variable X.
Solution:
The sample space is {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}, and the variable X takes the values 0, 1, 2, 3. The probability distribution for X is
X 0 1 2 3
P(X) 1/8 3/8 3/8 1/8
Example: Suppose we record four consecutive baby births in a hospital. Let X be the difference
in the number of girls and boys born. X is discrete, since it can only have the values 0, 2, or 4.
(Why can’t it be 1 or 3?) Furthermore, if we write out the sample space for this procedure, we
can find the probability that X equals 0, 2, or 4:
S={mmmm, mmmf, mmfm, mfmm, fmmm, mmff, mfmf, mffm, fmfm, ffmm, fmmf, mfff, fmff,
ffmf, fffm, ffff}
There are 16 total cases, and each one is equally likely.
P(X = 0) = 6/16 = 0.375 (these are the cases mmff, mfmf, mffm, fmfm, ffmm, fmmf)
P(X = 2) = 8/16 = 0.5 (these are the cases mmmf, mmfm, mfmm, fmmm, mfff, fmff, ffmf,fffm)
P(X = 4) = 2/16 = 0.125 (these are the cases mmmm and ffff)
Is this a probability distribution?
✓ ∑𝑥 𝑃(𝑥)= 0.375+ 0.5 +0.125= 1
✓ 0 ≤ 0.125 < 0.375 < 0.5 ≤ 1 So it is a probability distribution
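The same distribution can be rebuilt by brute-force enumeration; a short Python sketch:

from itertools import product
from collections import Counter

counts = Counter()
for seq in product("mf", repeat=4):
    x = abs(seq.count("f") - seq.count("m"))  # X = |#girls - #boys|
    counts[x] += 1

dist = {x: c / 16 for x, c in sorted(counts.items())}
print(dist)                      # {0: 0.375, 2: 0.5, 4: 0.125}
print(sum(dist.values()) == 1)   # True -- a valid probability distribution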
Another Example: Is the following a probability distribution? Here Y = X², X = 1, 2, …, 11.
Y 1 4 9 16 25 36 49 64 81 100 121
P(y) 0.20 0.15 0.14 0.12 0.10 0.09 0.07 0.05 0.04 0.02 0.01
✓ Clearly, each probability is between 0 and 1. Now we need to see if they sum to 1.
✓ ∑_y P(y) = 0.20 + 0.15 + 0.14 + 0.12 + 0.10 + 0.09 + 0.07 + 0.05 + 0.04 + 0.02 + 0.01 = 0.99. Since the probabilities do not sum to 1, it is not a probability distribution.
2. Continuous random variable:
• A random variable X is said to be continuous if it can take all possible values (integral as well
as fractional) between certain limits or intervals. Continuous random variables occur when
we deal with quantities that are measured on a continuous scale. For instance, the life length
of an electric bulb, the speed of a car, weights, heights, and the like are continuous
• If X is a continuous random variable, its probability distribution is described by a function f(x), called the probability density function (pdf) of the random variable X.
Properties of a pdf:
1. f(x) ≥ 0
2. ∫_{−∞}^{∞} f(x) dx = 1
3. The pdf is integrated over the whole range (−∞, ∞).
4. P(x1 < X < x2) = ∫_{x1}^{x2} f(x) dx
5. P(X = a) = ∫_a^a f(x) dx = 0
Example: Suppose we have a continuous random variable X with probability density function given by
f(x) = { cx², 0 < x < 3
0, otherwise
A. Determine the value of c.
B. Verify that f is a pdf.
C. Calculate P(1 < X < 2).
Solution:
a) ∫_{−∞}^{∞} f(x) dx = 1 (property of a pdf)
∫_0^3 cx² dx = c(x³/3)|_0^3 = 9c = 1, so c = 1/9.
b) ∫_0^3 (1/9)x² dx = (1/27)x³|_0^3 = 1, so f is a pdf.
c) P(1 < X < 2) = ∫_1^2 (1/9)x² dx = (1/27)x³|_1^2 = (8 − 1)/27 = 7/27
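A quick numeric cross-check of parts (b) and (c); scipy is an assumed dependency here:

from scipy.integrate import quad

f = lambda x: x ** 2 / 9
total, _ = quad(f, 0, 3)
p_12, _ = quad(f, 1, 2)
print(round(total, 6))  # 1.0 -- f integrates to one, so it is a pdf
print(round(p_12, 6))   # 0.259259 = 7/27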
CHAPTER FOUR
Function of random variable
Equivalent sets
Let X be a random variable defined on the sample space S and let Y be a function of X; then Y is also a random variable. Let Rx and Ry be the range spaces of X and Y respectively. Let C ⊂ Ry and B ⊂ Rx, where B is defined as B = {x ∈ Rx : Y(x) ∈ C}; then the events B and C are equivalent sets. This means that one occurs if and only if the other occurs. For any event C ⊂ Ry, P(C) is defined as P(C) = P({x ∈ Rx : Y(x) ∈ C}) = P(B), i.e. P(C) = P(B).
Example:
1. Suppose that Y = πx². Are B = {X : x ≥ 2} and C = {Y : y ≥ 4π} equivalent sets? Why?
2. Let Y = 2x + 1, Rx = {x : x > 0} and Ry = {y : y > 1}, and suppose that the event C is defined as C = {y ≥ 5}. How can we define the event B so that events C and B are equivalent sets?
Solution:
1. Events C and B are equivalent sets, because the values in C are attained exactly when x lies in B, as the table of values shows:
xi 2 3 4 5 …
yi 4π 9π 16π 25π …
2. First we tabulate Y = 2x + 1 to locate C = {y : y ≥ 5}:
xi 0 1 2 3 4 …
yi 1 3 5 7 9 …
From the table, y ≥ 5 exactly when x ≥ 2. So the event B is defined as B = {x ≥ 2} on the range space Rx. Therefore events C and B are equivalent sets.
Function of Discrete Random Variables
Let us first dispose of the case when X is a discrete random variable, since it requires only a simple point-to-point mapping. Suppose that the possible values taken by X can be enumerated as x1, x2, …. Then the corresponding possible values of Y may be enumerated as y1 = g(x1), y2 = g(x2), …. Let the pmf of X be given by
Example:
1. The pmf of a random variable X is given as
2. For the same X as given in Example 1, determine the pmf of Y if Y = 2X² + 1.
Solution: In this case the corresponding values of Y are g(−1) = 2(−1)² + 1 = 3, g(0) = 1, g(1) = 3 and g(2) = 9, resulting in
P(y) = { 1/4, for y = 1
5/8 = 1/2 + 1/8, for y = 3
1/8, for y = 9
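The point-to-point mapping can be automated. In the sketch below the pmf of X is reconstructed from the answer above; the split 5/8 = 1/2 + 1/8 pins down P(X = −1) and P(X = 1) only up to order, so that assignment is an assumption.

from collections import defaultdict

pmf_x = {-1: 1 / 2, 0: 1 / 4, 1: 1 / 8, 2: 1 / 8}  # assumed assignment
g = lambda x: 2 * x ** 2 + 1

pmf_y = defaultdict(float)
for x, p in pmf_x.items():
    pmf_y[g(x)] += p  # x-values mapping to the same y pool their mass

print(dict(sorted(pmf_y.items())))  # {1: 0.25, 3: 0.625, 9: 0.125}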
3. A random variable X has the following probability mass function
Find
A. the value of K
B. evaluate P(X < 4), P(X > 5) and P(3 < X ≤ 6)
C. what is the smallest value of x for which P(X ≤ x) > 1/2?
Solution:
A. Since P(X = x) is a probability mass function, its values must sum to one: ∑_x P(X = x) = 1.
B.
Therefore the smallest value of x for which P(X ≤ x) > 1/2 is 4.
A more frequently encountered case arises when X is continuous with known CDF, F(x), or pdf, f(x). To carry out the mapping steps as outlined at the beginning of this section, care must be exercised in choosing the appropriate corresponding regions in the range spaces Rx and Ry, this mapping being governed by the transformation Y = g(x). Thus, the degree of complexity in determining the probability distribution of Y depends on the complexity of the transformation g(x). Let us start by considering the simple relationship Y = g(x) = 2x + 1. The transformation y = g(x) is presented graphically in the figure below. Consider the CDF of Y, F_Y(y); it is defined by F_Y(y) = P(Y ≤ y).
The region defined by Y ≤ y in the range space Ry covers the heavier portion of the transformation curve, as shown in the figure, which, in the range space Rx, corresponds to the region g(X) ≤ y, or X ≤ g^{−1}(y), where g^{−1}(y) = (y − 1)/2.
Figure: Transformation defined by the equation Y = g(x) = 2x + 1. Here g^{−1}(y) is the inverse function of g(x), i.e. the solution for x of the equation y = g(x) in terms of y. Hence
F_Y(y) = P(Y ≤ y) = P(X ≤ g^{−1}(y)) = F_X((y − 1)/2)
This gives the relationship between the CDF of X and that of Y, our desired result.
The relationship between the pdfs of X and Y is obtained by differentiating both sides of the above equation with respect to y. We have
f_Y(y) = f_X((y − 1)/2) · (1/2)
Theorem: Let X be a continuous random variable and Y = g(X), where g(x) is continuous in x and strictly monotone. Then
f_Y(y) = f_X(g^{−1}(y)) · |d g^{−1}(y)/dy|
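A Monte Carlo sketch of the theorem, using the illustrative transformation Y = 2X + 3 with X uniform on (−1, 2) (the same setup as exercise 1 at the end of this chapter); the theorem predicts g(y) = f((y − 3)/2)·(1/2) = 1/6 on (1, 7):

import random

random.seed(0)
ys = [2 * random.uniform(-1, 2) + 3 for _ in range(100_000)]

# The fraction of draws in (2, 3) should be about (3 - 2) * 1/6.
frac = sum(2 < y < 3 for y in ys) / len(ys)
print(round(frac, 3))  # approximately 0.167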
Example: Consider the quadratic function Y = X². The plot of Y against X is shown in the figure below, where we see that for one value of Y there are two values of X.
Figure: Plot of Y = X²
If f(x) is an even function, then f(x) = f(−x) and F(−x) = 1 − F(x). Thus, for y ≥ 0 we have
F_Y(y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = F_X(√y) − F_X(−√y) = 2F_X(√y) − 1
Example: Find the pdf of the random variable Y = X², where X is the standard normal random variable.
Solution: Since the pdf of X is given by f(x) = (1/√(2π)) e^{−x²/2}, which is an even function, we have F_Y(y) = 2F_X(√y) − 1 and
f_Y(y) = d F_Y(y)/dy = (1/√y) f_X(√y) = (1/√(2πy)) e^{−y/2}, y > 0
Exercise:
1. The random variable X has pdf f(x) = { 1/3, −1 < x < 2
0, otherwise
If we define Y = 2X + 3, what is the pdf of Y?
2. Let X be a random variable with pdf f(x) = { 2x, 0 < x < 1
0, otherwise
Let Y = 3X + 1; find the pdf of Y, g(y).
3. Let X be a random variable with pdf f(x) = { 2x, 0 < x < 1
0, otherwise
Let Y = e^{−X}; find the pdf of Y, g(y).
4. Let X be a random variable with pdf f(x) = { (1/4) x e^{−x/2}, x ≥ 0
0, otherwise
Let Y = −X/2 + 2; find the pdf of Y, g(y).
CHAPTER FIVE
Two-Dimensional Random Variables
Example:
1. Recording the amount of precipitate (p) and volume of gas (q) for a given locality: (p, q)
2. Observing the rainfall (R) and temperature (T) of a certain town: (R, T)
Note: If the possible values of (x, y) are finite or countably infinite, then (x, y) is called a two-dimensional discrete random variable.
If (x, y) can assume all values in a specified region R in the xy-plane, then (x, y) is called a two-dimensional continuous random variable.
Joint probability distribution
If X and Y are two random variables, the probability distribution for their simultaneous occurrence can be represented by a function f(x, y) for any pair of values (x, y) within the range of the random variables X and Y. This function is known as the joint probability distribution of (X, Y).
Definition 1: Let (x, y) be a two-dimensional discrete random variable. With each possible outcome (Xi, Yj) we associate a number P(Xi, Yj) representing P(X = Xi, Y = Yj) and satisfying the following conditions:
1. P(Xi, Yj) ≥ 0 for all x, y
2. ∑_{i=1}^{∞} ∑_{j=1}^{∞} P(Xi, Yj) = 1
The function P is the joint probability mass function of (X, Y). The set of triples [(Xi, Yj), P(Xi, Yj)], i, j = 1, 2, 3, …, is the joint probability distribution of (X, Y).
Definition 2: Let (X, Y) be a two-dimensional continuous random variable assuming all values in some region R of the Euclidean plane. Then the joint probability density function f is a function satisfying the following conditions:
1. f(x, y) ≥ 0 for all (x, y) ∈ R
2. ∬_R f(x, y) dx dy = 1
Example:
1. Two production lines, 1 and 2, have capacities of producing 5 and 3 items per day respectively. Assume the number of items produced by each line is a random variable. Let (x, y) be the two-dimensional random variable giving the numbers of items produced by line 1 and line 2 respectively.
Y|X 0 1 2 3 4 5
0 0 0.01 0.03 0.05 0.07 0.09
1 0.01 0.02 0.04 0.05 0.06 0.08
2 0.01 0.03 0.05 0.05 0.05 0.06
3 0.01 0.02 0.04 0.06 0.06 0.05
A. Show that P(Xi, Yj) is a legitimate probability function of (x, y).
B. What is the probability that both lines produce the same number of items?
C. What is the probability that more items are produced by line 2?
2. Let (x, y) be a two-dimensional discrete random variable with
P(x, y) = { 2^{−(x+y)}, x = 1, 2, 3, … and y = 1, 2, 3, …
0, otherwise
A. Show that P(x, y) is a pmf.
B. Find P(x > y).
3. Suppose that (x, y) is a two-dimensional random variable with joint pdf given by
f(x, y) = { e^{−(x+y)}, 0 < x < ∞, 0 < y < ∞
0, otherwise
A. Show that f(x, y) is a pdf.
B. Find P(x < y).
4. Given f(x, y) = { k, 0 < x < 2, 0 < y < 4
0, otherwise
find the value of k for which f is a pdf.
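Before the analytic solutions, a short Python sketch that checks example 1 (parts A–C) directly from the table; rows are y = 0..3, columns are x = 0..5:

P = [
    [0.00, 0.01, 0.03, 0.05, 0.07, 0.09],
    [0.01, 0.02, 0.04, 0.05, 0.06, 0.08],
    [0.01, 0.03, 0.05, 0.05, 0.05, 0.06],
    [0.01, 0.02, 0.04, 0.06, 0.06, 0.05],
]

total = sum(sum(row) for row in P)
same = sum(P[k][k] for k in range(4))                    # P(X = Y)
line2_more = sum(P[y][x] for y in range(4) for x in range(6) if y > x)

print(round(total, 2))       # 1.0  -> legitimate joint pmf
print(round(same, 2))        # 0.13 -> both lines produce the same number
print(round(line2_more, 2))  # 0.12 -> line 2 produces more items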
Solution
1. A.
I. P(Xi, Yj) ≥ 0 for all (x, y): every entry of the table is nonnegative.
II. ∑_i ∑_j P(Xi, Yj) = 1: the table entries sum to one.
Therefore P is a legitimate probability function.
2. A. We must show that ∑_{i=1}^{∞} ∑_{j=1}^{∞} 2^{−(x+y)} = 1:
∑_{x=1}^{∞} ∑_{y=1}^{∞} 2^{−(x+y)} = (∑_{y=1}^{∞} 2^{−y})(∑_{x=1}^{∞} 2^{−x})
= (1/2 + 1/4 + 1/8 + 1/16 + ⋯)(1/2 + 1/4 + 1/8 + 1/16 + ⋯)
= (1/2)(1 + 1/2 + 1/4 + 1/8 + ⋯) · (1/2)(1 + 1/2 + 1/4 + 1/8 + ⋯)
= (1/2)·(1/(1 − 1/2)) · (1/2)·(1/(1 − 1/2)) = 1 · 1 = 1
2. B. P(x > y) = ∑_{y=1}^{∞} ∑_{x=y+1}^{∞} 2^{−(x+y)}.
Consider the inner sum:
∑_{x=y+1}^{∞} 2^{−x} = (1/2)^{y+1} + (1/2)^{y+2} + (1/2)^{y+3} + ⋯
= (1/2)(1/2)^y + (1/4)(1/2)^y + (1/8)(1/2)^y + (1/16)(1/2)^y + ⋯
= (1/2)^y · (1/2)·(1/(1 − 1/2)) = 2^{−y}
⇒ ∑_{y=1}^{∞} ∑_{x=y+1}^{∞} 2^{−(x+y)} = ∑_{y=1}^{∞} 2^{−y} · 2^{−y} = ∑_{y=1}^{∞} 2^{−2y}
= 1/4 + 1/16 + 1/64 + 1/256 + ⋯
= (1/4)(1 + 1/4 + 1/16 + 1/64 + ⋯)
= (1/4)·(1/(1 − 1/4)) = 1/3
3. A.
I. f(x, y) ≥ 0 for all (x, y) ∈ R.
II. ∬_R f(x, y) dx dy = 1 ⇒ ∫_0^∞ ∫_0^∞ e^{−(x+y)} dx dy = ∫_0^∞ e^{−y} ∫_0^∞ e^{−x} dx dy = 1.
Therefore f(x, y) is a legitimate pdf.
3. B. P(x < y) = 1 − P(x ≥ y) = 1 − ∫_0^∞ e^{−x} ∫_0^x e^{−y} dy dx
= 1 − ∫_0^∞ e^{−x}(1 − e^{−x}) dx = 1 − ∫_0^∞ (e^{−x} − e^{−2x}) dx = 1 − (1 − 1/2) = 1/2
The joint cumulative distribution function (CDF), F, is defined by
I. F(x, y) = ∑_{xi ≤ x} ∑_{yj ≤ y} P(xi, yj), if (x, y) is a discrete random variable;
II. F(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f(s, t) ds dt, if (x, y) is a continuous random variable.
Properties of the CDF, F:
I. F(−∞, −∞) = 0, F(∞, ∞) = 1
II. ∂²F(x, y)/∂x∂y = f(x, y)
Example: Suppose that the two-dimensional continuous random variable (X, Y) has joint pdf given by
f(x, y) = { x² + xy/3, 0 ≤ x ≤ 1, 0 ≤ y ≤ 2
0, otherwise
Solution: F(x, y) = ∫_0^y ∫_0^x (s + t) ds dt = (1/2)xy(x + y), 0 < x < 1, 0 < y < 1
Example: Find the joint probability density function of the two random variables 𝑋 and 𝑌
whose joint distribution function is given by,
(1 − 𝑒 −𝑥 )(1 − 𝑒 −𝑦 ) , 𝑓𝑜𝑟 0 < 𝑥, 0 < 𝑦
𝐹(𝑥, 𝑦) = {
0 , 𝑒𝑙𝑠𝑒𝑤ℎ𝑒𝑟𝑒
Also use the joint probability density to determine 𝑃(1 < 𝑋 < 3, 1 < 𝑌 < 2).
Solution: Since partial differentiation yields f(x, y) = ∂²F(x, y)/∂x∂y = e^{−(x+y)} for x > 0 and y > 0, and 0 elsewhere, we find that the joint probability density of X and Y is given by
f(x, y) = { e^{−(x+y)}, for 0 < x, 0 < y
0, elsewhere
Thus, integration yields
P(1 < X < 3, 1 < Y < 2) = ∫_1^2 ∫_1^3 e^{−(x+y)} dx dy = (e^{−1} − e^{−3})(e^{−1} − e^{−2}) ≈ 0.074
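A numeric cross-check of this double integral; scipy is an assumed dependency:

from math import exp
from scipy.integrate import dblquad

# dblquad integrates func(y, x): inner variable first, here y in (1, 2), x in (1, 3).
val, _ = dblquad(lambda y, x: exp(-(x + y)), 1, 3, 1, 2)
print(round(val, 3))                                        # 0.074
print(round((exp(-1) - exp(-3)) * (exp(-1) - exp(-2)), 3))  # same value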
Example: Let (X, Y) have the joint pmf given by the following table.
Y|X 0 1 2 P(yj)
0 0.25 0.15 0.1 0.50
1 0.1 0.08 0.1 0.28
2 0.05 0.07 0.1 0.22
P(xi) 0.40 0.30 0.30 1
Find the marginal distributions of X and Y.
Solution:
i. The marginal distribution of X is
X = xi 0 1 2 Total
P(X = xi) 0.40 0.30 0.30 1
ii. The marginal distribution of Y is
Y = yi 0 1 2 Total
P(Y = yi) 0.50 0.28 0.22 1
2. Let P(x, y) = (2x + y)/18 for x = 1, 2 and y = 1, 2. Then
P(x) = ∑_y P(x, y) = (1/18) ∑_{y=1}^{2} (2x + y) = (1/18){(2x + 1) + (2x + 2)} = (1/18)(4x + 3), x = 1, 2
P(y) = ∑_x P(x, y) = (1/18) ∑_{x=1}^{2} (2x + y) = (1/18){(2 + y) + (4 + y)} = (1/18)(2y + 6), y = 1, 2
3. Let the joint pdf of (x, y) be given by f(x, y) = { 2(x + y − 2xy), 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1
0, otherwise
Find the marginal distributions of X and Y.
Solution:
i. The marginal density function of X is
f(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_0^1 2(x + y − 2xy) dy = 2(xy + y²/2 − xy²)|_0^1 = 2(x + 1/2 − x) = 1
Therefore f(x) = { 1, 0 ≤ x ≤ 1
0, otherwise
so that X is uniformly distributed over [0, 1].
ii. The marginal density function of Y is
f(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_0^1 2(x + y − 2xy) dx = 2(x²/2 + xy − x²y)|_0^1 = 2(1/2 + y − y) = 1
Therefore f(y) = { 1, 0 ≤ y ≤ 1
0, otherwise
so that Y is uniformly distributed over [0, 1].
4. Given the joint probability density f(x, y) = { (2/3)(x + 2y), for 0 < x < 1, 0 < y < 1
0, elsewhere
find the marginal densities of X and Y.
Solution: Performing the necessary integrations, we get
g(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_0^1 (2/3)(x + 2y) dy = (2/3)(x + 1), 0 < x < 1
Likewise, h(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_0^1 (2/3)(x + 2y) dx = (1/3)(1 + 4y), 0 < y < 1
Exercise: Let the joint pdf of (x, y) be given by f(x, y) = { 12x², 0 < x < y < 1
0, otherwise
Find the marginal distributions of X and Y.
Definition: We say that the two-dimensional continuous random variable (X, Y) is uniformly distributed over a region R in the Euclidean plane if
f(x, y) = { c, for (x, y) ∈ R
0, otherwise
Because of the requirement ∬ f(x, y) dx dy = 1, the above implies that the constant c equals 1/area(R). We are assuming that R is a region with finite, nonzero area.
P(x) = ∑_y P(x, y) in the discrete case. The density function for the marginal distribution of Y is found in a similar way: f(y) = ∫_{−∞}^{∞} f(x, y) dx in the continuous case.
Example:
1. For P(x, y) = (2x + y)/18, x = 1, 2, y = 1, 2:
P(y) = ∑_x P(x, y) = (1/18) ∑_{x=1}^{2} (2x + y) = (1/18)(2y + 6), y = 1, 2
2. Suppose that the two-dimensional continuous random variable (X, Y) has joint pdf given by
f(x, y) = { x² + xy/3, 0 ≤ x ≤ 1, 0 ≤ y ≤ 2
0, otherwise
Determine the conditional pdf of X given Y and the conditional pdf of Y given X.
Solution: To determine the conditional pdfs, we first evaluate the marginal pdfs, which are given by
❖ f(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_0^2 (x² + xy/3) dy = (x²y + xy²/6)|_0^2 = 2x² + 2x/3
❖ f(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_0^1 (x² + xy/3) dx = (x³/3 + yx²/6)|_0^1 = 1/3 + y/6
Hence,
f(x|y) = f(x, y)/f(y) = (x² + xy/3)/(y/6 + 1/3) = (6x² + 2xy)/(2 + y), 0 ≤ x ≤ 1, 0 ≤ y ≤ 2;
f(y|x) = f(x, y)/f(x) = (x² + xy/3)/(2x² + 2x/3) = (3x + y)/(6x + 2), 0 ≤ x ≤ 1, 0 ≤ y ≤ 2
3. Suppose that (X, Y) has joint pdf
f(x, y) = { x e^{−x(1+y)}, 0 ≤ x < ∞, 0 ≤ y < ∞
0, otherwise
Determine the conditional pdf of X given Y and the conditional pdf of Y given X.
Solution: To determine the conditional pdfs, we first evaluate the marginal pdfs, which are given by
f(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_0^∞ x e^{−x(1+y)} dy = x e^{−x} ∫_0^∞ e^{−xy} dy = x e^{−x} [−e^{−xy}/x]_0^∞ = e^{−x}, 0 ≤ x < ∞
f(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_0^∞ x e^{−x(1+y)} dx
Let u = x, which means that du = dx, and let dv = e^{−x(y+1)} dx, which means that v = −e^{−x(y+1)}/(y + 1). Integration by parts then gives f(y) = 1/(y + 1)², 0 ≤ y < ∞. Hence
f(x|y) = f(x, y)/f(y) = x e^{−x(1+y)}/(1/(y + 1)²) = x(y + 1)² e^{−x(1+y)}, 0 ≤ x < ∞
f(y|x) = f(x, y)/f(x) = x e^{−x(1+y)}/e^{−x} = x e^{−xy}, 0 ≤ y < ∞
Independent random variables
Definition: Let (x, y) denote a continuous bivariate random variable with joint pdf f(x, y) and marginal pdfs f(x) and f(y). Then X and Y are called independent random variables if, for every x ∈ X and y ∈ Y, f(x, y) = f(x)·f(y). Similarly, let (x, y) denote a discrete bivariate random variable with joint pmf P(x, y) and marginal pmfs P(x) and P(y). Then X and Y are called independent random variables if, for every x ∈ X and y ∈ Y, P(x, y) = P(x)·P(y).
Example: For P(x, y) = { (x + y)/21, x = 1, 2, 3; y = 1, 2
0, otherwise
X and Y are not independent, whereas for
P(x, y) = { x y²/30, x = 1, 2, 3; y = 1, 2
0, otherwise
X and Y are independent.
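The factorization test can be run mechanically; a Python sketch for the two pmfs above:

def is_independent(pmf, xs, ys, tol=1e-12):
    # Compare the joint pmf against the product of its marginals.
    px = {x: sum(pmf(x, y) for y in ys) for x in xs}
    py = {y: sum(pmf(x, y) for x in xs) for y in ys}
    return all(abs(pmf(x, y) - px[x] * py[y]) < tol for x in xs for y in ys)

xs, ys = (1, 2, 3), (1, 2)
print(is_independent(lambda x, y: (x + y) / 21, xs, ys))     # False
print(is_independent(lambda x, y: x * y ** 2 / 30, xs, ys))  # True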
1. Suppose (X, Y) are discrete random variables with probability function given by
We obtain f(u, v) = f(x, y)/|J(x, y)| = (1/|−2|) f((u + v)/2, (u − v)/2) = (1/2) f((u + v)/2, (u − v)/2)
Exercise:
1. Find f(u, v) if U = X² + Y² and V = X².
2. Let X and Y have the joint probability density function
f(x, y) = (3/2) x²(1 − |y|), −1 < x < 1, −1 < y < 1
Let A = {(x, y): 0 < x < 1, 0 < y < x}. Find the probability that (X, Y) falls into A.
3. Let X and Y have the joint probability function p(x, y) = (x + y)/21, x = 1, 2, 3, y = 1, 2.
A. Find the conditional probability function of X, given that Y = y.
B. Find P(X = 2 | Y = 2).
C. Find the conditional probability function of Y, given that X = x.
4. Let X and Y have the joint probability density function
f(x, y) = 2, 0 ≤ x ≤ y ≤ 1
A. Find the marginal probability density functions.
B. Find the conditional probability density functions f(y|x) and f(x|y).
C. Calculate P(3/4 ≤ Y ≤ 7/8 | X < 1/4).
CHAPTER SIX
EXPECTATION
The averaging process, when applied to a random variable, is called expectation. It is denoted by E(X) or μ and is read as the expected value of X or the mean value of X.
Example: A construction firm has recently sent in bids for 3 jobs worth (in profits) 10, 20, and
40 (thousand) dollars. If its probabilities of winning the jobs are respectively 0.2,0 .8, and 0.3,
what is the firm’s expected total profit?
Solution: Letting Xi, i = 1, 2, 3, denote the firm's profit from job i, then
Total profit=𝑋1 + 𝑋2 + 𝑋3
So𝐸(𝑡𝑜𝑡𝑎𝑙 𝑝𝑟𝑜𝑓𝑖𝑡) = 𝐸(𝑋1 ) + 𝐸(𝑋2 ) + 𝐸(𝑋3 )
= 10 ∗ 0.2 + 20 ∗ 0.8 + 40 ∗ 0.3
= 2 + 16 + 12
Therefore the firm’s expected total profit is 30 thousand dollars.
EXAMPLE: Let the random variable X be defined as follows. Suppose that X is the time (in
minutes) during which electrical equipment is used at maximum load in a certain specified
time period. Suppose that X is a continuous random variable with the following pdf;
f(x) = { x/(1500)², 0 ≤ x ≤ 1500
−(x − 3000)/(1500)², 1500 ≤ x ≤ 3000
0, otherwise
Find E(X).
Solution: By definition, E(X) = ∫_{−∞}^{+∞} x f(x) dx
E(X) = ∫_0^{1500} x f(x) dx + ∫_{1500}^{3000} x f(x) dx
= ∫_0^{1500} x · x/(1500)² dx + ∫_{1500}^{3000} x · (−1/(1500)²)(x − 3000) dx
= (1/(1500)²) ∫_0^{1500} x² dx − (1/(1500)²) ∫_{1500}^{3000} x(x − 3000) dx
= (1/(1500)²) (x³/3)| from 0 to 1500 − (1/(1500)²) (x³/3 − 3000·x²/2)| from 1500 to 3000
= 1500 minutes
P(x, y) = { x²y/30, y = 1, 2, 3 and x = 1, 2
0, otherwise
The positive square root of V(X) is called the standard deviation of X and is denoted by σx.
Proof: Expanding E(X − E(X))² and using the previously established properties of expectation, we obtain V(X) = E(X²) − [E(X)]².
If X is a discrete random variable with expected value μ, then the variance of X, denoted by Var(X), is defined by Var(X) = E[(X − μ)²].
Alternatively, Var(X) = ∑_{i=1}^{n} (xi − μ)² P(xi)
symbol μ. Thus, the first moment about the origin characterizes the central tendency of a density function. Measures of spread and skewness of a density function are given by certain moments about the mean.
argument and then evaluated at t = 0, generates moments of X about the origin. The function is aptly called the moment-generating function of X.
Definition: The expected value of e^{tX} is defined to be the moment-generating function of X if the expected value exists for every value of t in some open interval containing 0, i.e. for all t ∈ (−h, h), h > 0. The moment generating function of X will be denoted by Mx(t), and is represented by
Mx(t) = E(e^{tX}) = { ∑_x e^{xt} P(x), if X is a discrete random variable
∫_{−∞}^{+∞} e^{xt} f(x) dx, if X is a continuous random variable
Note that Mx(0) = E(e⁰) = E(1) = 1 is always defined, and from this property it is clear that a function of t cannot be a MGF unless the value of the function at t = 0 is 1. The condition that Mx(t) must be defined for all t ∈ (−h, h) is a technical condition that ensures Mx(t) is differentiable at the point zero, a property whose importance will become evident shortly.
We now indicate how the MGF can be used to generate moments about the origin. In the following theorem, we use the notation d^r g(a)/dx^r to indicate the rth derivative of g(x) with respect to x evaluated at x = a.
Theorem: Let X be a random variable for which the MGF, Mx(t), exists. Then
μ′_r = E(X^r) = d^r Mx(0)/dt^r
Example: Suppose that X is binomially distributed with parameters n and p. Then the
moment generating function is defined as Mx(t)=[𝑝𝑒 𝑡 + 𝑞]𝑛 then find the 1st and the 2nd
moment then find the mean and variance of the binomial distribution.
Solution:
M′x(t) = n[p e^t + q]^{n−1} p e^t
M″x(t) = n(n − 1)[p e^t + q]^{n−2} (p e^t)² + n[p e^t + q]^{n−1} p e^t
Therefore E(X) = M′x(0) = np and E(X²) = M″x(0) = n²p² − np² + np, so
Var(X) = M″x(0) − [M′x(0)]² = npq
Theorem: Suppose that the random variable X has MGF Mx(t). Let Y = aX + b. Then My(t), the MGF of Y, is given by My(t) = e^{bt} Mx(at).
Exercise:
1. Suppose that X has pdf given by f(x) = { 2x, 0 < x < 1
0, otherwise
A. Determine the MGF of X.
B. Using the MGF, evaluate E(X) and Var(X).
2. Suppose that the continuous random variable X has pdf
f(x) = (1/2) e^{−|x|}, −∞ < x < ∞
Chebyshev's inequality: P(|X − μ| ≥ σk) ≤ 1/k², i.e. the probability that the value of X lies at least k standard deviations from its mean is at most 1/k², where k is a positive number greater than 1.
Note: P(|X − μ| < a) ≥ 1 − Var(X)/a², where a = σk.
Example:
1. A random variable X has mean 4 and variance 2. Use Chebyshev's inequality to estimate P(1 < X < 7).
P(|X − 4| < 3) ≥ 1 − 2/9
P(1 < X < 7) ≥ 7/9
So the probability that some value of the random variable X will be between 1 and 7 is at least 0.778.
2. A random variable X has pdf f(x) = { 1/3, 1 < x < 4
0, otherwise
Use Chebyshev's inequality to estimate P(|X − 2.5| < 2).
Solution: By Chebyshev's inequality,
P(|X − μ| < σk) ≥ 1 − 1/k² is the lower bound.
E(X) = ∫_1^4 x f(x) dx = ∫_1^4 (x/3) dx = (x²/6)|_1^4 = 15/6 = 5/2 = 2.5
E(X²) = ∫_1^4 x² f(x) dx = ∫_1^4 (x²/3) dx = (x³/9)|_1^4 = 63/9 = 7
Then Var(X) = σ² = E(X²) − [E(X)]² = 7 − 25/4 = 3/4, so that σ = √3/2.
P(|X − 2.5| < 2) ≥ 1 − 1/k². But σk = 2, so k = 2/σ = 4/√3.
Then P(|X − 2.5| < 2) ≥ 1 − 3/16
P(|X − 2.5| < 2) ≥ 13/16
So the probability that some value of the random variable X will be between 0.5 and 4.5 is at least 0.8125.
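A small sketch comparing the Chebyshev lower bound with the exact probability for example 2; the exact value is available here because the uniform pdf is fully known:

mu, var = 2.5, 3 / 4
a = 2                                  # half-width of the interval
bound = 1 - var / a ** 2               # P(|X - mu| < a) >= 1 - var/a^2
print(bound)                           # 0.8125 = 13/16

# Exact value: (0.5, 4.5) covers the whole support (1, 4), so it is 1.
exact = (min(4.5, 4) - max(0.5, 1)) / (4 - 1)
print(exact)                           # 1.0 -- the bound is conservative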
Example: Suppose that it is known that the number of items produced in a factory during a
week is a random variable with mean 50.
A. What can be said about the probability that this week’s production will exceed 75?
B. If the variance of a week's production is known to equal 25, then what can be said about the probability that this week's production will be between 40 and 60?
Solution: Let X be the number of items that will be produced in a week.
A. By Markov's inequality, P(X > 75) ≤ E(X)/75 = 50/75 = 2/3.
B. By Chebyshev's inequality, P{|X − 50| ≥ 10} ≤ σ²/10² = 25/100 = 1/4.
Hence P(|X − 50| < 10) ≥ 1 − 1/4 = 3/4.
So the probability that this week's production will be between 40 and 60 is at least 0.75.
Exercise: From past experience, a professor knows that the test score of a student taking her final examination is a random variable with mean 75.
A. Give an upper bound to the probability that a student's test score will exceed 85. Suppose in addition the professor knows that the variance of a student's test score is equal to 25.
B. What can be said about the probability that a student will score between 65 and 85?
C. How many students would have to take the examination so as to ensure, with probability at least 0.9, that the class average would be within 5 of 75?
Joint Moment about the Origin
Let X and Y be two random variables having joint density function f(x, y). Then the (r, s)th joint moment of (X, Y) (or of f(x, y)) about the origin is defined by
μ′_{r,s} = E(X^r Y^s) = { ∑_x ∑_y x^r y^s P(x, y), if (X, Y) is a discrete random variable
∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x^r y^s f(x, y) dy dx, if (X, Y) is a continuous random variable
Covariance and Correlation coefficient
Regarding joint moments, our immediate interest is in a particular joint moment about the mean, μ_{1,1}, and the relationship between this moment and moments about the origin. The central moment μ_{1,1} is given a special name and symbol, and we will see that μ_{1,1} is useful as a measure of "linear association" between X and Y.
Covariance: The central joint moment μ_{1,1} = E[(X − E(X))(Y − E(Y))] is called the covariance between X and Y, and is denoted by the symbol σxy, or by cov(x, y).
Note that there is a simple relationship between σxy and moments about the origin that can be established by expanding the product: σxy = E(XY) − E(X)E(Y). For instance, with E(XY) = 1/3 and E(X) = E(Y) = 7/12,
σxy = 1/3 − (7/12)(7/12) = −1/144
❖ If X and Y are independent random variables, then σxy = 0 (assuming the covariance exists); consequently, if σxy ≠ 0 then X and Y are dependent.
Example: Let X and Y be two random variables having a joint density function that is constant on the region on and below the parabola y = x². This density implies that (x, y) points are equally likely to occur on and below the parabola. There is a direct functional dependence between X and the range of Y, so that f(x|y) will change as x changes, and thus X and Y must be dependent.
The correlation coefficient tells us the degree of association and the direction of the linear relationship between the random variables.
The correlation coefficient computed from sample data measures the strength and direction of a linear relationship between two variables.
The symbol for the sample correlation coefficient is r.
The symbol for the population correlation coefficient is ρ.
The range of the correlation coefficient is from −1 to +1.
If there is a strong positive linear relationship between the variables, the value of r will be close to +1.
If there is a strong negative linear relationship between the variables, the value of r will be close to −1.
When there is no linear relationship between the variables or only a weak relationship, the value of r will be close to 0.
Example: Let the bivariate random variable (X, Y) have joint density function
f(x, y) = { x + y, 0 < x < 1, 0 < y < 1
0, otherwise
Compute the correlation coefficient of X and Y.
Solution:
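A numeric sketch of the computation with scipy's dblquad (an assumed dependency); it reproduces ρ = −1/11:

from scipy.integrate import dblquad

def E(g):
    # E[g(X, Y)] = double integral of g(x, y) * f(x, y) over the unit square;
    # dblquad passes the inner variable (y) first.
    val, _ = dblquad(lambda y, x: g(x, y) * (x + y), 0, 1, 0, 1)
    return val

ex, ey = E(lambda x, y: x), E(lambda x, y: y)   # both 7/12
vx = E(lambda x, y: x ** 2) - ex ** 2           # 11/144
vy = E(lambda x, y: y ** 2) - ey ** 2           # 11/144
cov = E(lambda x, y: x * y) - ex * ey           # -1/144
print(round(cov / (vx * vy) ** 0.5, 4))         # -0.0909, i.e. rho = -1/11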
CHAPTER SEVEN
Common Discrete probability Distributions and their Properties
Binomial distribution
The binomial distribution is one of the simplest and most frequently used discrete probability distributions and is very useful in many practical situations involving either/or types of events.
Assumption of Binomial Experiment:
1. The procedure/experimenthas a fixed number of trials.
2. The trials must be independent. (The outcome of any individual trial doesn't affect the probabilities in the other trials.)
3. Each trial must have all outcomes classified into two categories. One of the outcomes is
labeled as Success and the other as Failure.
4. The probability of Success remains the same from trial to trial.
The outcomes of the binomial experiment and the corresponding probabilities of these outcomes are called the Binomial distribution.
Let X be the number of successes. Then X follows a binomial distribution with parameters n, the number of trials performed, and p, the probability of success, and we write X ~ Bin(n, p).
The probability mass function of the Binomial distribution is given by
P(X = x) = (n!/(x!(n − x)!)) p^x q^{n−x}, x = 0, 1, 2, …, n, where q = 1 − p.
So that E(X) = np.
Using a similar approach we can find E(X²) = E(X(X − 1) + X).
Example:
1. Suppose a coin is tossed 10 times. What is the probability of getting
A. Exactly 3 heads D. More than 3 heads
B. At most 3 heads E. No head
C. At least 3 heads
2. On the basis of past experience, the probability that a certain electrical component will be
satisfactory is 0.98. The components are sampled item by item from continuous production. In a
sample of five components, what are the probabilities of finding
A. five satisfactory, D. two or more satisfactory
B. exactly one satisfactory E. find mean and variance
C. exactly two satisfactory
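A Python sketch of example 1, using the binomial pmf directly:

from math import comb

def binom_pmf(x, n, p):
    # P(X = x) = C(n, x) p^x (1-p)^(n-x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5
print(round(binom_pmf(3, n, p), 4))                             # A. exactly 3 heads
print(round(sum(binom_pmf(x, n, p) for x in range(4)), 4))      # B. at most 3
print(round(1 - sum(binom_pmf(x, n, p) for x in range(3)), 4))  # C. at least 3
print(round(1 - sum(binom_pmf(x, n, p) for x in range(4)), 4))  # D. more than 3
print(round(binom_pmf(0, n, p), 4))                             # E. no head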
Poisson distribution
The Poisson distribution is a discrete probability distribution that applies to occurrences of some event over a specified interval. The random variable x is the number of occurrences of the event in an interval. The interval can be time, distance, area, volume, or some similar unit.
A discrete random variable X is called a Poisson random variable with parameter λ, where λ > 0, if its pmf is given by
P(X = x) = e^{−λ} λ^x / x!, x = 0, 1, 2, 3, …
The symbol e stands for a constant approximately equal to 2.7183. It is a famous constant in mathematics, named after the Swiss mathematician L. Euler, and it is also the base of the so-called natural logarithm.
The Poisson distribution has the following requirements:
• The random variable x is the number of occurrences of an event over some interval.
• The occurrences must be random.
• The occurrences must be independent of each other.
• The occurrences must be uniformly distributed over the interval being used.
Properties of the Poisson distribution:
1. E(X) = λ
2. Var(X) = λ
Note: ∑_{n=0}^{∞} x^n/n! = e^x; this is the Maclaurin series (exponential expansion).
Proof: First we derive the moment generating function of the random variable X:
Mx(t) = E(e^{tX}) = ∑_{x=0}^{∞} e^{tx} e^{−λ} λ^x/x! = ∑_{x=0}^{∞} (λe^t)^x e^{−λ}/x!
= e^{−λ} ∑_{x=0}^{∞} (λe^t)^x/x!
= e^{−λ} e^{λe^t} = e^{λ(e^t − 1)}
Exercise: From the moment generating function, find the expected value and variance of the r.v. X by successively differentiating Mx(t).
The mean and variance of the Poisson distribution are easily determined as follows:
E(X) = ∑_{x=0}^{∞} x e^{−λ} λ^x/x! = λ ∑_{x=1}^{∞} e^{−λ} λ^{x−1}/(x − 1)!
= λe^{−λ} ⌈1 + λ/1! + λ²/2! + λ³/3! + λ⁴/4! + ⋯⌉
= λe^{−λ} e^{λ} = λ
Similarly, E(X²) = ∑_{x=0}^{∞} x² e^{−λ} λ^x/x! = λ² + λ, since E(X²) = E[X(X − 1) + X]
So that Var(X) = E(X²) − (E(X))² = λ
Example: Messages arrive at a switchboard in a Poisson manner at an average rate of six per hour.
Find the probability for each of the following events:
A. Exactly two messages arrive within one hour.
B. No message arrives within one hour.
C. At least three messages arrive within one hour
Solution: Let X be the random variable that denotes the number of messages arriving at the switchboard; the mean number of arrivals is E(X) = 6 per hour.
A. The probability that exactly two messages arrive within one hour is
P(X = 2) = e^{−λ} λ^x/x! = e^{−6} 6²/2! = 18e^{−6} = 0.0446
B. The probability that no message arrives within one hour is
P(X = 0) = e^{−6} 6⁰/0! = e^{−6} = 0.00248
C. The probability that at least three messages arrive within one hour is
P(X ≥ 3) = 1 − P(X < 3)
= 1 − [P(X = 0) + P(X = 1) + P(X = 2)]
= 1 − [e^{−6} + 6e^{−6} + 18e^{−6}] = 1 − 25e^{−6} = 0.938
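A quick numeric check of this example in Python:

from math import exp, factorial

def pois_pmf(x, lam):
    # P(X = x) = e^(-lambda) lambda^x / x!
    return exp(-lam) * lam ** x / factorial(x)

lam = 6
print(round(pois_pmf(2, lam), 4))                             # A. 0.0446
print(round(pois_pmf(0, lam), 5))                             # B. 0.00248
print(round(1 - sum(pois_pmf(x, lam) for x in range(3)), 3))  # C. 0.938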
1. The number of phone calls that arrive at a secretary's desk has a Poisson distribution with a mean of 4 per hour.
A. The probability that no calls arrive within a given hour is
P(X = 0) = e^{−4} 4⁰/0! = e^{−4} = 0.0183
B. The probability that at least 2 calls arrive within a given hour is
P(X ≥ 2) = 1 − P(X < 2)
= 1 − [P(X = 0) + P(X = 1)]
= 1 − [e^{−4} + 4e^{−4}] = 1 − 5e^{−4} = 0.9084
2. The number of typing mistakes that Ann makes on a given page has a Poisson distribution with
a mean of 3 mistakes.
A. What is the probability that she makes exactly 7 mistakes on a given page?
B. What is the probability that she makes fewer than 4 mistakes on a given page?
C. What is the probability that Ann makes no mistake on a given page?
Geometric Distribution
The geometric random variable is used to describe the number of Bernoulli trials until the first success occurs. An experiment is said to be a geometric experiment if:
1. Each repetition is called trial.
2. For each trial there are two mutually exclusive out comes, success or failure.
3. The trials are independent.
4. The probability of success is the same for each trail of the experiment.
5. We repeat the trials until we get the success.
Let X be a random variable that denotes the number of Bernoulli trials until the first success. If the first success occurs on the xth trial, then we know that the first x − 1 trials resulted in failures. Thus
P(X = x) = p(1 − p)^{x−1} = p q^{x−1}, x = 1, 2, 3, …, where p is the probability of success and q = 1 − p.
1. The most obvious difference is that the geometric distribution does not have a fixed number of observations, n.
2. The 2nd most obvious difference is the question being asked:
❖ Binomial distribution asks for the probability of a certain number of successes.
❖ Geometric distribution asks for the probability of the first success.
Properties of the Geometric distribution
1. E(X) = 1/p
2. Var(X) = q/p²
3. Mx(t) = p e^t/(1 − q e^t)
Note: ∑_{x=0}^{∞} a r^x = a + ar + ar² + ar³ + ar⁴ + ar⁵ + ⋯ = a/(1 − r), where |r| < 1.
Proof: First we derive the moment generating function of the random variable X.
Mx(t) = E(e^{tX}) = ∑_{x=1}^{∞} e^{tx} p(1 − p)^{x−1}
= p ∑_{x=1}^{∞} e^{tx} q^{x−1}, since q = 1 − p
= p(e^t + q e^{2t} + q² e^{3t} + q³ e^{4t} + q⁴ e^{5t} + q⁵ e^{6t} + ⋯)
= p e^t (1 + q e^t + q² e^{2t} + q³ e^{3t} + q⁴ e^{4t} + q⁵ e^{5t} + ⋯)
= p e^t {1 + q e^t + (q e^t)² + (q e^t)³ + (q e^t)⁴ + (q e^t)⁵ + ⋯}
= p e^t · 1/(1 − q e^t)
= p e^t/(1 − q e^t)
Note: Sometimes the pmf of X is given in the form P(X = x) = p q^x, x = 0, 1, 2, …; then E(X) = q/p, Var(X) = q/p² and Mx(t) = E(e^{tX}) = p/(1 − q e^t).
From the moment generating function we find the expected value of the r.v. X and the variance of X by successively differentiating Mx(t).
Exercise: find the expected value and the variance of the r.v. X.
• μ = E(X) = ∑_{x=1}^{∞} x p q^{x−1} = p ∑_{x=1}^{∞} x q^{x−1}
= p ∑_{x=1}^{∞} d/dq (q^x), since d/dq (q^x) = x q^{x−1}
= p · d/dq (∑_{x=1}^{∞} q^x)
= p · d/dq ⌈q/(1 − q)⌉ = p · 1/(1 − q)² = 1/p
• σ² = Var(X) = E(X²) − [E(X)]² = ∑_{x=1}^{∞} x² p q^{x−1} − (1/p)².
Since E[X(X − 1)] = pq ∑_{x=1}^{∞} x(x − 1) q^{x−2} = pq · d²/dq² (∑_{x=1}^{∞} q^x) = pq · 2/(1 − q)³ = 2q/p²,
we get E(X²) = E[X(X − 1)] + E(X) = 2q/p² + 1/p, and hence
Var(X) = 2q/p² + 1/p − 1/p² = q/p²
Example:
1. A manufacturer uses electrical fuses in an electronic system. The fuses are purchased in large lots
and tested sequentially until the first defective fuse is observed. Assume that the lot contains 5%
defective fuses.
A. What is the probability that the first defective fuse will be one of the first five tested?
B. Find the mean, variance, and standard deviation for X, the number of fuses tested until the
first defective fuse is observed.
Solution: Given p = 0.05, q = 0.95
A. The probability that the first defective fuse will be one of the first five tested is
P(X ≤ 5) = 1 − P(X > 5) = 1 − q^5 = 1 − (0.95)^5 = 1 − 0.7738 = 0.2262
B. E(X) = 1/p = 20, Var(X) = q/p^2 = 380 and s.d. = √Var(X) ≈ 19.49
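A quick numeric check of this example, sketched with scipy.stats.geom (SciPy's geom uses the same x = 1, 2, … convention as this section):

```python
from scipy.stats import geom

p = 0.05  # probability that a tested fuse is defective

# A. First defective among the first five tested: P(X <= 5)
print(geom.cdf(5, p))  # ~0.2262

# B. Mean, variance and standard deviation of X
print(geom.mean(p))    # 20.0
print(geom.var(p))     # 380.0
print(geom.std(p))     # ~19.49
```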
2. A fair die is rolled repeatedly until a 6 appears.
A. What is the probability that the experiment stops at the fourth roll?
B. Given that the experiment stops at the third roll, what is the probability that the sum of all three rolls is at least 12?
3. Suppose X has a geometric distribution with p = 0.1. Find:
A. P(X = 7)
B. P(X = 10)
C. P(X ≤ 3)
D. P(X > 5)
E. P(7 ≤ X ≤ 10)
F. the mean of the r.v. X
G. the variance of the r.v. X
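Every part of exercise 3 reduces to the pmf, the cdf, or the moments; a minimal sketch of the corresponding calls (same SciPy convention as above):

```python
from scipy.stats import geom

p = 0.1

print(geom.pmf(7, p))                    # A. P(X = 7) = q^6 * p
print(geom.pmf(10, p))                   # B. P(X = 10) = q^9 * p
print(geom.cdf(3, p))                    # C. P(X <= 3) = 1 - q^3
print(geom.sf(5, p))                     # D. P(X > 5) = q^5
print(geom.cdf(10, p) - geom.cdf(6, p))  # E. P(7 <= X <= 10)
print(geom.mean(p), geom.var(p))         # F. 1/p = 10,  G. q/p^2 = 90
```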
CHAPTER EIGHT
Common Continuous Distributions and their properties
Uniform (Rectangular) distribution
A continuous random variable X has a uniform distributionover an interval a tob(b >a) if it is
equally likely to take on any value in this interval. The probability density function (pdf) of X is
constant over interval (a, b) and has the form
1
, 𝑎≤𝑥≤𝑏
𝑓(𝑥) = {𝑏−𝑎 ,−∞ < 𝑎 < 𝑏 < ∞
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
The probability that X lies in any subinterval of [a, b] is equal to the length of that subinterval divided by the length of the interval [a, b]. This follows since, when [c, d] is a subinterval of [a, b],
P(c ≤ X ≤ d) = ∫_c^d 1/(b − a) dx = (d − c)/(b − a)
The term “uniform” is justified by the fact that intervals of equal length in (a, b) are assigned the same probability regardless of their location. The notation used for such a distribution is U(a, b) or R(a, b), and the mean, variance and moment generating function of the uniform distribution are given by:
E(X) = (a + b)/2
Var(X) = (b − a)^2/12
M_X(t) = (e^(bt) − e^(at)) / (t(b − a)), t ≠ 0
Note: A uniform random variable X is often used where we have no prior knowledge of the actual pdf and all continuous values in some range seem equally likely.
Example: If X is uniformly distributed over the interval [0, 10], compute the probability that
A. 2 < X < 9   B. 1 < X < 4   C. X < 5   D. X > 6
Answer: The respective answers are (A) 7/10, (B) 3/10, (C) 5/10, (D) 4/10.
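These answers are just subinterval lengths divided by 10; a minimal sketch checking them with scipy.stats.uniform (SciPy parameterizes the uniform by loc = a and scale = b − a):

```python
from scipy.stats import uniform

a, b = 0, 10
X = uniform(loc=a, scale=b - a)  # X ~ U(0, 10)

print(X.cdf(9) - X.cdf(2))  # A. P(2 < X < 9) = 0.7
print(X.cdf(4) - X.cdf(1))  # B. P(1 < X < 4) = 0.3
print(X.cdf(5))             # C. P(X < 5)     = 0.5
print(X.sf(6))              # D. P(X > 6)     = 0.4
```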
Normal distribution
Let the random variable X follow the ideal symmetric bell-shaped distribution with mean μ and standard deviation σ. This distribution is known as the normal distribution. Statisticians have found that the probability density function of such a distribution is given by
f(x) = (1/(σ√(2π))) e^(−(x−μ)^2/(2σ^2)), −∞ < x < ∞, and we write X ~ N(μ, σ^2).
A normal distribution is a continuous probability distribution for a random variable x. The graph of
a normal distribution is called the normal curve. A normal distribution has the following properties.
1. The mean, median, and mode are equal.
2. The normal curve is bell shaped and is symmetric about the mean.
3. The total area under the normal curve is equal to one.
4. The normal curve approaches, but never touches, the x-axis as it extends farther and farther
away from the mean.
5. Between μ − σ and μ + σ (in the center of the curve) the graph curves downward. The graph curves upward to the left of μ − σ and to the right of μ + σ. The points at which the curve changes from curving upward to curving downward are called inflection points.
6. The Empirical Rule:
❖ Approximately 68% of the area under the normal curve is between μ − σ and μ + σ.
❖ Approximately 95% of the area under the normal curve is between μ − 2σ and μ + 2σ.
❖ Approximately 99.7% of the area under the normal curve is between μ − 3σ and μ + 3σ.
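The Empirical Rule follows directly from the standard normal cdf; a minimal sketch with scipy.stats.norm:

```python
from scipy.stats import norm

for k in (1, 2, 3):
    # P(mu - k*sigma < X < mu + k*sigma) = Phi(k) - Phi(-k)
    print(k, norm.cdf(k) - norm.cdf(-k))
# 1 -> ~0.6827, 2 -> ~0.9545, 3 -> ~0.9973
```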
The normal density f (x) is a bell-shaped curve that is symmetric about μ and that attains its
maximum value at x= μ. In practice, many random phenomena obey, at least approximately, a
normal probability distribution. Because the normal probability density function is symmetrical,
the mean, median and mode coincide at x = μ.
Thus, the value of μ determines the location of the center of the distribution, and the value of σ^2 determines its spread.
An important fact about normal random variables is that if X is normal with mean μ and variance σ^2, then Y = αX + β is normal with mean αμ + β and variance α^2σ^2.
The uses of normal distribution
• Many things actually are normally distributed, or very close to it. For example, height and
intelligence are approximately normally distributed; measurement errors also often have a
normal distribution
• The normal distribution is easy to work with mathematically. In many practical cases, the
methods developed using normal theory work quite well even when the distribution is not
normal.
• There is a very strong connection between the size of a sample N and the extent to which a
sampling distribution approaches the normal form. Many sampling distributions based on
large N can be approximated by the normal distribution even though the population
distribution itself is definitely not normal.
It follows from the foregoing that if X ~ N(μ, σ^2) then Z = (X − μ)/σ is a normal random variable with mean 0 and variance 1. Such a random variable Z is said to have a standard, or unit, normal distribution.
The distribution function of the N(0, 1) distribution is usually denoted by Φ; i.e., if Z ~ N(0, 1), then
P(Z ≤ x) = Φ(x) = (1/√(2π)) ∫_{−∞}^x e^(−t^2/2) dt, −∞ < x < ∞
Calculations of probabilities of the form P(a < X < b) for −∞ ≤ a ≤ b < ∞ are done in two steps: first turn the r.v. X ~ N(μ, σ^2) into a N(0, 1)-distributed r.v. (that is, standardize it), and then use the available Normal tables. For instance, to obtain P(X < b), we note that X will be less than b if and only if (X − μ)/σ is less than (b − μ)/σ, and so
P(X < b) = P((X − μ)/σ < (b − μ)/σ) = Φ((b − μ)/σ)
Similarly, for any a < b,
P(a < X < b) = P((a − μ)/σ < (X − μ)/σ < (b − μ)/σ) = P((a − μ)/σ < Z < (b − μ)/σ)
= P(Z < (b − μ)/σ) − P(Z < (a − μ)/σ) = Φ((b − μ)/σ) − Φ((a − μ)/σ)
While the normal table tabulates Φ(x) only for nonnegative values of x, we can also obtain Φ(−x) from the table by making use of the symmetry (about 0) of the standard normal probability density function. That is, for x > 0, if Z represents a standard normal random variable, then
Φ(−x) = P(Z ≤ −x) = P(Z ≥ x), by symmetry
= 1 − Φ(x)
Thus, for instance, P(Z < −1) = Φ(−1) = 1 − Φ(1) = 1 − 0.8413 = 0.1587
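Where tables are unavailable, Φ can be evaluated numerically, and the symmetry identity above can be confirmed directly; a minimal sketch:

```python
from scipy.stats import norm

print(norm.cdf(-1))     # Phi(-1) ~ 0.1587
print(1 - norm.cdf(1))  # 1 - Phi(1): same value, by symmetry
```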
The mean, variance and moment generating function of the Normal distribution
If X ~ N(μ, σ^2) then:
E(X) = μ
Var(X) = σ^2
M_X(t) = e^(μt + σ^2 t^2/2)
Proof: First we derive the moment generating function of the random variable X.
M_X(t) = E(e^(tX)) = ∫_{−∞}^∞ e^(tx) (1/(σ√(2π))) e^(−(x−μ)^2/(2σ^2)) dx
Writing e^(tx) = e^(tx − tμ + tμ) = e^(tμ) e^(t(x−μ)), this becomes
= e^(tμ) ∫_{−∞}^∞ (1/(σ√(2π))) e^(t(x−μ)) e^(−(x−μ)^2/(2σ^2)) dx
= e^(tμ) ∫_{−∞}^∞ (1/(σ√(2π))) e^(−[(x−μ)^2 − 2σ^2 t(x−μ)]/(2σ^2)) dx
But (x − μ − σ^2 t)^2 = (x − μ)^2 − 2σ^2 t(x − μ) + σ^4 t^2, so
(x − μ)^2 − 2σ^2 t(x − μ) = (x − μ − σ^2 t)^2 − σ^4 t^2, and hence
= e^(tμ) ∫_{−∞}^∞ (1/(σ√(2π))) e^(−[(x − μ − σ^2 t)^2 − σ^4 t^2]/(2σ^2)) dx
= e^(tμ) e^(σ^2 t^2/2) ∫_{−∞}^∞ (1/(σ√(2π))) e^(−(x − μ − σ^2 t)^2/(2σ^2)) dx
= e^(tμ + σ^2 t^2/2),
since ∫_{−∞}^∞ (1/(σ√(2π))) e^(−(x − μ − σ^2 t)^2/(2σ^2)) dx = 1, being the integral of a normal density with mean μ + σ^2 t and variance σ^2.
Therefore M_X(t) = e^(μt + σ^2 t^2/2).
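The same result can be checked symbolically by computing E(e^(tX)) as a direct integral; a minimal sketch, assuming SymPy is available (the integration may take a moment):

```python
import sympy as sp

x, t, mu = sp.symbols('x t mu', real=True)
sigma = sp.symbols('sigma', positive=True)

pdf = sp.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sp.sqrt(2 * sp.pi))
M = sp.simplify(sp.integrate(sp.exp(t * x) * pdf, (x, -sp.oo, sp.oo)))
print(M)  # equivalent to exp(mu*t + sigma**2*t**2/2)
```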
Example
1. If X is a normal random variable with mean μ = 3 and variance σ^2 = 16, find
A. P(X < 11)   B. P(X > −1)   C. P(2 < X < 7)
Answer: A. P(X < 11) = Φ((11 − 3)/4) = Φ(2) = 0.9772
B. P(X > −1) = P(Z > −1) = P(Z < 1) = 0.8413
C. P(2 < X < 7) = Φ(1) − Φ(−1/4) = Φ(1) − (1 − Φ(1/4)) = 0.8413 + 0.5987 − 1 = 0.4400
7. Find the area under the normal distribution curve:
A. P(0 < Z < 2.34)
B. P(−1.75 < Z < 0)
C. P(Z > 1.11)
D. P(Z < −1.93)
E. P(−1.37 < Z < 1.68)
F. P(2 < Z < 2.47)
Answer: A. 0.4904   B. 0.4599   C. 0.1335   D. 0.0268   E. 0.8682   F. 0.0160
8. Find the Z_a values such that the area under the normal distribution curve satisfies:
A. P(0 < Z < Z_a) = 0.2123
B. P(0 < Z < Z_a) = 0.45
Answer: A. Z_a = 0.56   B. Z_a = 1.645
Exercise: The mean weight of 200 students in a certain college is 140 lbs, and the standard deviation is 10 lbs. If we assume that the weights are normally distributed, evaluate the following:
A. The expected number of students that weigh between 110 and 145 lbs.
B. The expected number of students that weigh less than 120 lbs.
C. The expected number of students that weigh more than 170 lbs.
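Each expected count is n times a normal probability; a minimal sketch of the computation with scipy.stats.norm:

```python
from scipy.stats import norm

n, mu, sigma = 200, 140, 10
X = norm(mu, sigma)

print(n * (X.cdf(145) - X.cdf(110)))  # A. expected count between 110 and 145
print(n * X.cdf(120))                 # B. expected count below 120
print(n * X.sf(170))                  # C. expected count above 170
```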
Exponential distribution
A continuous random variable whose probability density function is given, for some λ > 0, by
f(x) = λe^(−λx), x ≥ 0
is said to be an exponential random variable (or, more simply, exponentially distributed) with parameter λ.
The cumulative distribution function F(x) of an exponential random variable is given by
F(x) = P(X ≤ x) = ∫_0^x λe^(−λy) dy = 1 − e^(−λx), x ≥ 0
The exponential distribution often arises, in practice, as being the distribution of the amount of time
until some specific event occurs. For instance, the amount of time (starting from now) until an
earthquake occurs, or until a new war breaks out, or until a telephone call you receive turns out to be
a wrong number are all random variables that tend in practice to have exponential distributions.
The mean, variance and moment generating function of the exponential distribution are
E(X) = 1/λ, Var(X) = 1/λ^2 and M_X(t) = λ/(λ − t), t < λ
Example:
1. Assume that the length of phone calls made at a particular telephone booth is exponentially distributed with a mean of 3 minutes. If you arrive at the telephone booth just as Chris was about to make a call, find the following:
A. The probability that you will wait more than 5 minutes before Chris is done with the call.
B. The probability that Chris’ call will last between 2 minutes and 6 minutes.
Solution: Let X be a random variable that denotes the length of calls made at the telephone booth. Since the mean length of calls is 1/λ = 3, we have λ = 1/3 and the pdf of X is f(x) = (1/3)e^(−x/3), x ≥ 0.
A. The probability that you will wait more than 5 minutes is the probability that X is greater than 5 minutes, which is given by
P(X > 5) = 1 − F(5) = e^(−5/3) ≈ 0.1889
B. The probability that the call lasts between 2 and 6 minutes is given by
P(2 < X < 6) = F(6) − F(2) = e^(−2/3) − e^(−2) ≈ 0.5134 − 0.1353 = 0.3781
2. The lifetime of an automobile battery is described by a r.v. X having the Exponential distribution with parameter λ = 1/3. Then:
A. Determine the expected lifetime of the battery and the variation around this mean.
B. Calculate the probability that the lifetime will be between 2 and 4 time units.
Answer: A. E(X) = 1/λ = 3 and Var(X) = 1/λ^2 = 9
B. P(2 ≤ X ≤ 4) = F(4) − F(2) = e^(−2/3) − e^(−4/3) ≈ 0.5134 − 0.2636 = 0.2498
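Both exponential examples can be checked numerically; a minimal sketch with scipy.stats.expon (SciPy parameterizes the exponential by scale = 1/λ):

```python
from scipy.stats import expon

X = expon(scale=3)  # mean 3, i.e. lambda = 1/3, as in both examples

# Example 1
print(X.sf(5))              # A. P(X > 5) ~ 0.1889
print(X.cdf(6) - X.cdf(2))  # B. P(2 < X < 6) ~ 0.3781

# Example 2
print(X.mean(), X.var())    # A. 3.0, 9.0
print(X.cdf(4) - X.cdf(2))  # B. P(2 <= X <= 4) ~ 0.2498
```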
Exercise:
1. The lifetime X of a system in weeks is given by the following pdf: