5 Prob - Overview
OVERVIEW OF PROBABILITY
1. Statistical Experiment: Experiment with uncertain outcomes, e.g., flipping a coin, height of a
student randomly drawn from NTUST, no. of defectives found in 10 inspections, no. of inspections
required to observe 3 defectives.
2. Sample Space S: Set of all possible outcomes of a statistical experiment, e.g., {head, tail},
{h (cm)|155 ≤ h ≤ 187}, {x|x = 0, 1, 2, . . . , 10}, {y|y = 3, 4, . . .}.
3. Event E: a subset of the sample space (i.e., a particular set of outcomes), usually expressed in terms of a random
variable, e.g., {H ≤ 165}, {X = 6}, {Y ≥ 40}.
4. Probability Distribution Function of a random variable: Since random variable X varies from
sample to sample, to study (or describe) the behavior of X, we need the probability distribution
function of X — (a) What are the possible realizations of X? (b) How often does each realization
occur?
5. Two components of the probability distribution function of a random variable X:
(a) Possible realizations of X.
(b) P(X = a) for each possible a.
For example, if X is the no. of defectives found in 10 inspections then
(a) Possible realizations of X are 0, 1, 2, . . . , 10.
(b) Need to find P(X = 0), P(X = 1), P(X = 2), . . . , P(X = 10). For simplicity, these probabilities
often are expressed by a probability distribution function
fX(a) ≡ P(X = a) = C(10, a) p^a (1 − p)^(10−a),  a = 0, 1, 2, . . . , 10,
where C(10, a) = 10!/(a!(10 − a)!) is the binomial coefficient.
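This pmf can be evaluated directly; a minimal Python sketch, assuming for illustration a defective rate p = 0.1:

```python
from math import comb

def binom_pmf(a, n=10, p=0.1):
    """P(X = a) for X ~ Binomial(n, p)."""
    return comb(n, a) * p**a * (1 - p)**(n - a)

pmf = [binom_pmf(a) for a in range(11)]
print(pmf[0])       # P(X = 0) = 0.9^10 ≈ 0.3487
print(sum(pmf))     # the probabilities sum to 1
```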
8. Discrete random variables: (Realizations of a discrete random variable are usually integers.)
(a) (Binomial distribution) X: When inspecting n items, X defectives are found. Possible
realizations of X are x = 0, 1, 2, . . . , n.
(b) (Neg-binomial distribution) Y : When inspecting the items one by one, the 3rd defective
appears on the Y th inspection. Possible realizations of Y are y = 3, 4, . . ..
(c) (Geometric distribution) G: When inspecting the items one by one, the 1st defective
appears on the Gth inspection. Possible realizations of G are g = 1, 2, . . ..
(d) (Hypergeometric distribution) H: When randomly drawing without replacement n fish
from a pond consisting of N fish of which k fish are tagged, H fish are found tagged. Possible
realizations h of H are max{0, n − (N − k)} ≤ h ≤ min{n, k}.
(e) (Poisson distribution) P : Assuming that the arrival rate of customers is λ, P customers
arrive during a period of length t. Possible realizations of P are p = 0, 1, 2, . . ..
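The pmfs of several of these distributions can be written down and evaluated directly. A short sketch (the defective rate p = 0.1 and arrival rate λ = 2 below are illustrative only):

```python
from math import comb, exp, factorial

p, lam, t = 0.1, 2.0, 1.0   # hypothetical defective rate, arrival rate, period length

def geometric_pmf(g, p):
    """P(G = g): the 1st defective appears on the g-th inspection."""
    return (1 - p)**(g - 1) * p

def negbinom_pmf(y, r, p):
    """P(Y = y): the r-th defective appears on the y-th inspection."""
    return comb(y - 1, r - 1) * p**r * (1 - p)**(y - r)

def poisson_pmf(k, lam, t):
    """P(P = k): k arrivals during a period of length t at rate lam."""
    mu = lam * t
    return exp(-mu) * mu**k / factorial(k)

print(geometric_pmf(1, p))    # p = 0.1
print(negbinom_pmf(3, 3, p))  # first 3 inspections all defective: p^3 = 0.001
print(poisson_pmf(0, lam, t)) # e^{-2} ≈ 0.1353
```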
9. Continuous random variables: (Realizations of a continuous random variable are usually real
numbers.)
(a) (Normal distribution):
It is used for making inference about the population mean when the population variance is
known.
i. Central Limit Theorem: as n → ∞, X1 + · · · + Xn is approximately normal, and so is X̄ = (X1 + · · · + Xn)/n.
ii. Standardization: If X ∼ N(µ, σ²) then (X − µ)/σ ∼ N(0, 1).
iii. Let Z denote a standard normal random variable.
A. Φ(a) ≡ P(Z ≤ a) (the standard normal cdf)
B. zα: P(Z ≥ zα) = α = P(Z ≤ −zα) (the distribution of Z is symmetric about 0)
C. • P(|Z| ≤ 1) ≈ 68%
• P(|Z| ≤ 2) ≈ 95%
• P(|Z| ≤ 3) ≈ 99.7%
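Φ(a) can be computed from the error function, which also lets us verify the 68–95–99.7 rule; a minimal sketch (the value 1.96 ≈ z0.025 is quoted from standard normal tables):

```python
from math import erf, sqrt

def Phi(a):
    """Standard normal cdf: Phi(a) = P(Z <= a)."""
    return 0.5 * (1 + erf(a / sqrt(2)))

# 68-95-99.7 rule: P(|Z| <= k) = 2*Phi(k) - 1
for k in (1, 2, 3):
    print(k, round(2 * Phi(k) - 1, 4))   # 0.6827, 0.9545, 0.9973

# z_alpha satisfies P(Z >= z_alpha) = alpha; e.g. z_0.025 ≈ 1.96
print(round(1 - Phi(1.96), 4))           # 0.025
```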
(b) (Exponential distribution) If a continuous random variable X has density f(x) = λe^(−λx), x > 0, then X follows an exponential distribution with failure rate λ.
i. E(X) = 1/λ, Var(X) = 1/λ², and coefficient of variation CV ≡ σ/µ = 1.
ii. Memoryless property: If X ∼ exponential(λ), then
P(X > a + b | X > a) = P(X > b),  for all a, b ≥ 0;
that is, a lightbulb that has burned for length a (i.e., it is old but still functioning) is no
different from a brand-new lightbulb, since its remaining-life distribution is the same as
the life distribution of a brand-new one.
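The memoryless property is easy to check by simulation; a sketch (λ, a, b below are arbitrary illustrative values):

```python
import random

random.seed(0)
lam, a, b = 0.5, 1.0, 2.0               # illustrative values
xs = [random.expovariate(lam) for _ in range(200_000)]

# memoryless: P(X > a + b | X > a) = P(X > b) = e^{-lam*b}
survivors = [x for x in xs if x > a]
lhs = sum(x > a + b for x in survivors) / len(survivors)
rhs = sum(x > b for x in xs) / len(xs)
print(round(lhs, 3), round(rhs, 3))     # both ≈ e^{-1} ≈ 0.368
```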
iii. If X ∼ exponential(λx ), Y ∼ exponential(λy ), and X and Y are statistically independent,
then
A. P(X < Y) = λx/(λx + λy).
B. min{X, Y} ∼ exponential(λx + λy).
C. (X − Y) | X > Y ∼ exponential(λx) and (Y − X) | Y > X ∼ exponential(λy).
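Properties A–C can likewise be checked by Monte Carlo; a sketch with illustrative rates λx = 1, λy = 2:

```python
import random

random.seed(1)
lam_x, lam_y = 1.0, 2.0          # illustrative rates
n = 200_000
pairs = [(random.expovariate(lam_x), random.expovariate(lam_y))
         for _ in range(n)]

# A. P(X < Y) = lam_x / (lam_x + lam_y) = 1/3
p_x_first = sum(x < y for x, y in pairs) / n

# B. min{X, Y} ~ exponential(lam_x + lam_y), so E[min{X, Y}] = 1/3
e_min = sum(min(x, y) for x, y in pairs) / n

# C. given X > Y, the excess X - Y ~ exponential(lam_x), so its mean is 1
excess = [x - y for x, y in pairs if x > y]
e_excess = sum(excess) / len(excess)

print(p_x_first, e_min, e_excess)   # ≈ 0.333, 0.333, 1.0
```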
(c) (Chi-squared distribution): It is used for making inference about the population variance
of a normal distribution.
(d) (Student-t distribution): It is used for making inference about the population mean of a
normal distribution when the population variance is unknown.
(e) (F distribution): It is used for making inference about the variance ratio of two normal
populations.
10. (Example) Joint distribution of the height H (cm) and weight W (kg) of a randomly drawn NTUST student:
[Figure: scatterplot of W versus H, with the marginal density curves of H and W.]
(a) Intuitively H and W are not independent. Why? Because P(W ≥ 80|H ≥ 180) > P(W ≥ 80)
i. P(H ≥ 180, W ≥ 80) ≈ 0.0210 joint distribution of H and W
ii. P(H ≥ 180) ≈ 0.0229 marginal distribution of H
iii. P(W ≥ 80) ≈ 0.1576 marginal distribution of W
iv. P(W ≥ 80|H ≥ 180) ≈ 0.9137 conditional distribution of W (given H ≥ 180)
compared with P(W ≥ 80) ≈ 0.1576
v. How to calculate P(W ≥ 80|H ≥ 180)? P(W ≥ 80|H ≥ 180) = P(W ≥ 80, H ≥ 180)/P(H ≥ 180)
(Question: Given P(W ≥ 80|H ≥ 180) and P(H ≥ 180), how to calculate P(W ≥ 80, H ≥ 180)? )
vi. If H and W are independent, then P(W ≥ 80|H ≥ 180) = P(W ≥ 80|H < 160) =
P(W ≥ 80).
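These conditional-probability relations can be reproduced by simulation. The model below is one hypothetical parameterization (H ∼ N(160, 10²), correlation 0.8 between H and W) chosen to be consistent with the probabilities quoted above, not necessarily the one used to produce them:

```python
import random

random.seed(2)
n = 200_000
# hypothetical model: H ~ N(160, 10^2), W | H ~ N(70 + 0.8*(H - 160), 6^2)
hw = []
for _ in range(n):
    h = random.gauss(160, 10)
    w = 70 + 0.8 * (h - 160) + random.gauss(0, 6)
    hw.append((h, w))

p_joint = sum(h >= 180 and w >= 80 for h, w in hw) / n   # joint
p_tall  = sum(h >= 180 for h, w in hw) / n               # marginal of H
p_heavy = sum(w >= 80 for h, w in hw) / n                # marginal of W
tall_w  = [w for h, w in hw if h >= 180]
p_cond  = sum(w >= 80 for w in tall_w) / len(tall_w)     # conditional

print(p_joint, p_tall, p_heavy)   # ≈ 0.021, 0.023, 0.158
print(p_cond, p_joint / p_tall)   # both ≈ 0.91
```

The last line checks the definition numerically: the conditional probability equals the joint probability divided by the marginal.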
(b) Theorem of Total Probability: How to calculate P(W ≥ 80)?
Given P(W ≥ 80|group i) and P(group i), how to calculate P(W ≥ 80)?
i. P(H ≥ 180|W ≥ 80) = P(H ≥ 180, W ≥ 80)/P(W ≥ 80)
ii. P(W ≥ 80) =?
iii. P(H ≥ 180, W ≥ 80) =?
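The theorem of total probability can be illustrated with hypothetical numbers (the group split and conditional probabilities below are made up purely for illustration):

```python
# hypothetical groups with known proportions and conditional probabilities
p_group = {"A": 0.6, "B": 0.4}            # P(group i)
p_heavy_given = {"A": 0.05, "B": 0.30}    # P(W >= 80 | group i)

# total probability: P(W >= 80) = sum_i P(W >= 80 | group i) * P(group i)
p_heavy = sum(p_heavy_given[g] * p_group[g] for g in p_group)
print(p_heavy)                            # 0.05*0.6 + 0.30*0.4 = 0.15

# Bayes: P(group B | W >= 80) = P(W >= 80 | B) * P(B) / P(W >= 80)
print(p_heavy_given["B"] * p_group["B"] / p_heavy)   # 0.12 / 0.15 = 0.8
```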
(d) E(W) = E[E(W |H)]:
Given the average weight for each group, how to calculate the overall average weight?
E[Var(W |H)] + Var[E(W |H)] = 55.3759 + 43.7652 ≈ 99.1411, compared with Var(W) ≈ 99.7493
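The decomposition Var(W) = E[Var(W |group)] + Var[E(W |group)] holds exactly for any grouping of the sample. A sketch verifying it by simulation, grouping a hypothetical bivariate-normal sample by height rounded to the nearest 10 cm (parameters are illustrative):

```python
import random
from statistics import fmean, pvariance

random.seed(3)
n = 50_000
data = []
for _ in range(n):
    h = random.gauss(160, 10)
    w = 70 + 0.8 * (h - 160) + random.gauss(0, 6)
    data.append((round(h, -1), w))      # group by height rounded to 10 cm

groups = {}
for g, w in data:
    groups.setdefault(g, []).append(w)

weights = [w for _, w in data]
mu = fmean(weights)
# within-group variance (E[Var(W|group)]) and between-group variance (Var[E(W|group)])
within  = sum(len(ws) * pvariance(ws) for ws in groups.values()) / n
between = sum(len(ws) * (fmean(ws) - mu) ** 2 for ws in groups.values()) / n

print(within + between, pvariance(weights))   # the two sides agree
```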
11. Joint distribution of (X, Y): For discrete X and Y, fX,Y(a, b) ≡ P(X = a, Y = b), for all (a, b),
describes the behavior of the pair (X, Y) and is called the joint distribution of (X, Y). For example,
(Xi, Yi) may be the amounts of money in the left and right pockets of person i.
And fX(a) ≡ P(X = a) = P(X = a, −∞ < Y < ∞) = Σb P(X = a, Y = b) = Σb fX,Y(a, b),
for all a, is called the marginal distribution of X. Similarly, fY(b) = Σa fX,Y(a, b), for all b, is the
marginal distribution of Y.
(a) E(X) = Σa a · P(X = a) = Σa a · fX(a), E(Y) = Σb b · fY(b)
(b) E(X + Y) = Σ(a,b) (a + b) · fX,Y(a, b), E[g(X, Y)] = Σ(a,b) g(a, b) · fX,Y(a, b)
(c) Var(X) ≡ E[(X − µX)²] = E(X²) − µX²
(d) Var[g(X, Y)] = E[(g(X, Y) − E[g(X, Y)])²] = E[g²(X, Y)] − µ²g(X,Y).
(Writing Z = g(X, Y), these are E[(Z − µZ)²] and E(Z²) − µZ².)
(e) Var(X − Y) = E[((X − Y) − E(X − Y))²]
= E[(X − µX)²] + E[(Y − µY)²] − 2E[(X − µX)(Y − µY)]
= Var(X) + Var(Y) − 2Cov(X, Y)
12. Cov(X, Y) ≡ E[(X − µX)(Y − µY)]
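These identities can be verified numerically on a small hypothetical joint pmf:

```python
# hypothetical joint pmf of (X, Y) on {0,1} x {0,1,2}; probabilities sum to 1
f = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
     (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.30}

# E[g(X, Y)] = sum over (a, b) of g(a, b) * f(a, b)
E = lambda g: sum(g(a, b) * p for (a, b), p in f.items())

mu_x, mu_y = E(lambda a, b: a), E(lambda a, b: b)
var_x = E(lambda a, b: (a - mu_x) ** 2)
var_y = E(lambda a, b: (b - mu_y) ** 2)
cov   = E(lambda a, b: (a - mu_x) * (b - mu_y))

# check: Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y)
mu_d  = E(lambda a, b: a - b)
var_d = E(lambda a, b: (a - b - mu_d) ** 2)
print(var_d, var_x + var_y - 2 * cov)   # the two sides agree
```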
13. In addition to understanding the random behavior of X from the probability distribution function
fX (a), we need fX (a) for making statistical inference, where X is the sample statistic related to the
population parameter of interest, e.g., sample mean X̄ when making inference on the population
mean µ, no. of defectives X found in n inspections when making inference on the defective rate p.
14. Statistical Inference: deciding whether or not to reject a hypothesis H about some population
parameter. For example, do you accept the hypothesis H : p = 10%, with p being the defective rate
of some production line?
15. Population: set of all objects of interest, e.g., set of all products from a production line.
16. Population Parameter: characteristic of the population (often serves as the system performance
measure), e.g., defective rate p of the production line (which is defined to be the proportion of
defectives in the lot)
17. Sample: subset of the population, e.g., 10 products randomly drawn for inspection
18. Sample Statistic: a number summarizing the sample observations, e.g., T = X1 + X2 + · · · + X10 denotes
the total number of defectives found in 10 inspections, with Xi = 1 if the ith product is defective
and Xi = 0 otherwise. Since the sample statistic T varies from sample to sample, it is a random
variable and has its own distribution, called the sampling distribution (of T).
19. How to make statistical inference? How to determine if we should reject H : p = 10% when
observing T = 3?
20. If the observed T is significantly different from the expected T under H, then we tend to reject
H. Here the observed value is t = 3 and the expected value is E(T |H) = 1. (Analogy: suppose that 9 heads appeared when
flipping a coin 10 times. Do you still believe that the coin is fair?)
Answer: Find the proportion of T (when H is true) that is greater than or equal to 3.
In order to calculate P(T ≥ 3|H), we need the probability distribution function of T (sampling
distribution of T ) when H is true.
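Under H : p = 10% with 10 independent inspections, T ∼ Binomial(10, 0.1), so E(T |H) = np = 1 and the required tail probability can be computed directly; a minimal sketch:

```python
from math import comb

n, p = 10, 0.10                  # under H: defective rate p = 10%
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# p-value: proportion of T (under H) at least as large as the observed t = 3
p_value = sum(pmf[3:])
print(round(p_value, 4))         # ≈ 0.0702
```

A tail probability of about 7% means observing T = 3 is unusual, but not overwhelmingly so, under H.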
# R code generating the simulated (H, W) data
set.seed(1235)
N=5000
SIG.H=10; SIG.W=10
MU.H=160; MU.W=70
R=.8
# regression of W on H: W = A + B*H + E with E ~ N(0, SIG^2)
B=R*SIG.W/SIG.H               # slope = 0.8
A=MU.W-B*MU.H                 # intercept = -58
SIG=SIG.W*sqrt(1-R^2)         # residual sd = 6
H=MU.H+SIG.H*rnorm(N)
E=SIG*rnorm(N)
W=A+B*H+E
# end generation
mean(H); sd(H)
mean(W); sd(W)
# conditional mean and variance of W given H near 160 (window is illustrative)
W.given.H.160=W[abs(H-160)<1]
N.160=length(W.given.H.160)
E.W.given.H.160=sum(W.given.H.160)/N.160
V.W.given.H.160=sum(W.given.H.160^2)/N.160-E.W.given.H.160^2; V.W.given.H.160
[Figure: scatterplot of the simulated (H, W) data, with conditional distributions of W at selected heights.]