

OVERVIEW OF PROBABILITY

1. Statistical Experiment: Experiment with uncertain outcomes, e.g., flipping a coin, height of a
student randomly drawn from NTUST, no. of defectives found in 10 inspections, no. of inspections
required to observe 3 defectives.
2. Sample Space S: Set of all possible outcomes of a statistical experiment, e.g., {head, tail},
{h (cm)|155 ≤ h ≤ 187}, {x|x = 0, 1, 2, . . . , 10}, {y|y = 3, 4, . . .}.
3. Event E: subset of the sample space (i.e., a special set of outcomes), usually expressed in terms of a
random variable, e.g., {H ≤ 165}, {X = 6}, {Y ≥ 40}.
4. Probability Distribution Function of a random variable: Since random variable X varies from
sample to sample, to study (or describe) the behavior of X, we need the probability distribution
function of X — (a) What are the possible realizations of X? (b) How often does each realization
occur?
5. Two components of the probability distribution function of a random variable X:
(a) Possible realizations of X.
(b) P(X = a) for each possible a.
For example, if X is the no. of defectives found in 10 inspections then
(a) Possible realizations of X are 0, 1, 2, . . . , 10.
(b) Need to find P(X = 0), P(X = 1), P(X = 2), . . . , P(X = 10). For simplicity, these probabilities
often are expressed by a probability distribution function
 
f_X(a) ≡ P(X = a) = (10 choose a) p^a (1 − p)^(10−a), a = 0, 1, 2, . . . , 10.
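As a quick sanity check, this pmf can be evaluated in R; a minimal sketch, assuming an illustrative
defective rate p = 0.1:

p = 0.1
a = 0:10
f = choose(10, a) * p^a * (1 - p)^(10 - a)   # the pmf written out
all.equal(f, dbinom(a, size = 10, prob = p)) # agrees with R's built-in binomial pmf
sum(f)                                       # the probabilities sum to 1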

6. Permutation and Combination:


(a) (Permutation) Number of arrangements of n people is n · (n − 1) · (n − 2) · · · · · 2 · 1 ≡ n!.
What is the number of arrangements of 5 white balls? What is the number of arrangements
of 3 white balls and 2 black balls?
(b) (Combination) Number of arrangements for separating n people into two groups of sizes k
and (n − k) is n!/(k!(n − k)!) ≡ (n choose k).
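These counts are built into R. Note that 5 indistinguishable white balls admit only one arrangement,
while 3 white and 2 black balls admit 5!/(3!2!) = 10; a quick sketch:

factorial(5)                              # 5! = 120 arrangements of 5 distinct people
choose(5, 3)                              # 5!/(3!2!) = 10: arrangements of 3 white and 2 black balls
factorial(5)/(factorial(3)*factorial(2))  # the same value, written out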

7. Characteristics of a probability distribution function of X:


Let c and d be constants.

(a) Mean: E(X) = µ ≡ Σ_a a · P(X = a) is the center of gravity of the probability distribution
function. E(cX + d) = cE(X) + d.

(b) Variance: Var(X) = E[(X − µ)²] = σ² ≡ Σ_a (a − µ)² · P(X = a) reflects the dispersion of the
realizations of X. If Var(X) = 0 then all realizations are the same. Var(cX + d) = c²Var(X).

(c) Standard deviation: σ ≡ √σ² has the same dimension as X and is usually used as the
measurement scale, e.g., (Chebyshev's inequality: P(|X − µ| ≤ kσ) ≥ 1 − 1/k²)
P(|X − µ| ≤ 2σ) ≥ 3/4 and P(|X − µ| ≤ 3σ) ≥ 8/9. √Var(cX + d) = |c|√Var(X) = |c|σ.

If X follows a normal distribution then


i. P(|X − µ| ≤ σ) ≈ 68%,
ii. P(|X − µ| ≤ 2σ) ≈ 95%,
iii. P(|X − µ| ≤ 3σ) ≈ 99.7%.
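A minimal simulation sketch of these statements, using an assumed standard normal sample:

set.seed(1)
x = rnorm(100000)                  # sample with mu = 0, sigma = 1
mean(abs(x - mean(x)) <= 2*sd(x))  # ~0.95, while Chebyshev only guarantees >= 3/4
mean(abs(x - mean(x)) <= 3*sd(x))  # ~0.997, while Chebyshev only guarantees >= 8/9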

8. Discrete random variables: (Realizations of a discrete random variable are usually integers.)
(a) (Binomial distribution) X: When inspecting n items, X defectives are found. Possible
realizations of X are x = 0, 1, 2, . . . , n.
(b) (Neg-binomial distribution) Y : When inspecting the items one by one, the 3rd defective
appears on the Y th inspection. Possible realizations of Y are y = 3, 4, . . ..
(c) (Geometric distribution) G: When inspecting the items one by one, the 1st defective
appears on the Gth inspection. Possible realizations of G are g = 1, 2, . . ..
(d) (Hypergeometric distribution) H: When randomly drawing without replacement n fish
from a pond consisting of N fish of which k fish are tagged, H fish are found tagged. Possible
realizations h of H are max{0, n − (N − k)} ≤ h ≤ min{n, k}.
(e) (Poisson distribution) P : Assuming that the arrival rate of customers is λ, P customers
arrive during a period of length t. Possible realizations of P are p = 0, 1, 2, . . ..
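All five pmfs are built into R. The sketch below uses assumed illustrative parameters (defective
rate p = 0.1; a pond of N = 50 fish with k = 10 tagged and n = 5 drawn; arrival rate λ = 2 over a
period of length t = 1); note that R's dnbinom and dgeom count failures before the success(es),
hence the shifts:

p = 0.1
dbinom(2, size = 10, prob = p)         # binomial: P(X = 2) defectives in 10 inspections
dnbinom(40 - 3, size = 3, prob = p)    # neg-binomial: P(Y = 40), 3rd defective on the 40th inspection
dgeom(6 - 1, prob = p)                 # geometric: P(G = 6), 1st defective on the 6th inspection
dhyper(2, m = 10, n = 50 - 10, k = 5)  # hypergeometric: P(H = 2) tagged fish among the 5 drawn
dpois(3, lambda = 2 * 1)               # Poisson: P(P = 3) arrivals during a period of length 1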
9. Continuous random variables: (Realizations of a continuous random variable are usually real
numbers.)
(a) (Normal distribution):
It is used for making inference about the population mean when the population variance is
known.
i. Central Limit Theorem: as n → ∞, (X₁ + · · · + Xₙ) ∼ approximately normal and
X̄ = Σ_{i=1}^n X_i / n ∼ approximately normal.
ii. Standardization: If X ∼ N(µ, σ²) then (X − µ)/σ ∼ N(0, 1).
iii. Let Z denote a standard normal random variable.
A. Φ(a): Φ(a) = P(Z ≤ a)
B. z_α: P(Z ≥ z_α) = α = P(Z ≤ −z_α) (the probability distribution function of Z is
symmetric about 0)
C. • P(|Z| ≤ 1) ≈ 68%
• P(|Z| ≤ 2) ≈ 95%
• P(|Z| ≤ 3) ≈ 99.7%
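Φ(a) and z_α correspond to R's pnorm and qnorm; a quick sketch:

pnorm(1.96)           # Phi(1.96) ≈ 0.975
qnorm(1 - 0.05)       # z_0.05 ≈ 1.645, since P(Z >= z_alpha) = alpha
pnorm(2) - pnorm(-2)  # P(|Z| <= 2) ≈ 0.954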
(b) (Exponential distribution) If a continuous random variable X has density f(x) = λe^(−λx), x > 0,
then X follows an exponential distribution with failure rate λ.
i. E(X) = 1/λ, Var(X) = 1/λ², and coefficient of variation CV ≡ σ/µ = 1.
ii. Memoryless property: If X ∼ exponential(λ), then

P(X ≥ a + b | X ≥ a) = P(X ≥ b) = P(X ≥ 0 + b | X ≥ 0), for any a, b ≥ 0;

that is, a lightbulb that has burned for length a (i.e., it is old but still functioning) is no
different from a brand-new lightbulb, since its remaining-life distribution is the same as
the life distribution of a brand-new one.
iii. If X ∼ exponential(λ_x), Y ∼ exponential(λ_y), and X and Y are statistically independent,
then
A. P(X < Y) = λ_x/(λ_x + λ_y)
B. min{X, Y} ∼ exponential(λ_x + λ_y).
C. (X − Y) | X > Y ∼ exponential(λ_x) and (Y − X) | Y > X ∼ exponential(λ_y).
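All three facts are easy to check by simulation; a sketch with assumed rates λ_x = 1 and λ_y = 2:

set.seed(1)
x = rexp(100000, rate = 1); y = rexp(100000, rate = 2)
mean(x >= 1.5)/mean(x >= 1)  # P(X >= 1 + 0.5 | X >= 1) ...
mean(x >= 0.5)               # ... matches P(X >= 0.5): memoryless
mean(x < y)                  # ≈ lambda_x/(lambda_x + lambda_y) = 1/3
mean(pmin(x, y))             # ≈ 1/(lambda_x + lambda_y) = 1/3, the mean of min{X,Y}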

(c) (Chi-squared distribution): It is used for making inference about the population variance
of a normal distribution.
(d) (Student-t distribution): It is used for making inference about the population mean of a
normal distribution when the population variance is unknown.
(e) (F distribution): It is used for making inference about the variance ratio of two normal
populations.
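For reference, the corresponding R quantile functions (the degrees of freedom here are assumed
illustrations):

qchisq(0.95, df = 9)        # chi-squared quantile, used for inference on a normal variance
qt(0.975, df = 9)           # Student-t quantile, used for inference on a mean with unknown variance
qf(0.95, df1 = 4, df2 = 9)  # F quantile, used for comparing two normal variances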

10. Let (Hi , Wi ) denote the height and weight of person i.

[Figure: "Joint distribution of 5000 pairs of (H,W)"; H ~ N(160,10^2), W ~ N(70,10^2), r = 0.8 (scatterplot)]
[Figure: "Marginal distribution of 5000 H"; H ~ N(160,10^2) (kernel density, N = 5000, bandwidth = 1.637)]
[Figure: "Marginal distribution of 5000 W"; W ~ N(70,10^2) (kernel density, N = 5000, bandwidth = 1.644)]

(a) Intuitively Hi and Wi are not independent. Why? ∵ P(W ≥ 80|H ≥ 180) > P(W ≥ 80)

 
i. P(H ≥ 180, W ≥ 80) ≈ 0.0210 (joint distribution of H and W)
ii. P(H ≥ 180) ≈ 0.0229 (marginal distribution of H)
iii. P(W ≥ 80) ≈ 0.1576 (marginal distribution of W)
iv. P(W ≥ 80 | H ≥ 180) ≈ 0.9137 (conditional distribution of W given H ≥ 180),
compared with P(W ≥ 80) ≈ 0.1576
v. How to calculate P(W ≥ 80 | H ≥ 180)? P(W ≥ 80 | H ≥ 180) = P(W ≥ 80, H ≥ 180)/P(H ≥ 180)

(Question: Given P(W ≥ 80|H ≥ 180) and P(H ≥ 180), how to calculate P(W ≥ 80, H ≥ 180)? )
vi. If H and W are independent then P(W ≥ 80 | H ≥ 180) = P(W ≥ 80 | H < 160) =
P(W ≥ 80).
(b) Theorem of Total Probability: How to calculate P(W ≥ 80)?
(Given P(W ≥ 80 | group i) and P(group i), how to calculate P(W ≥ 80)?)
i. P(W ≥ 80|H ≥ 180) ≈ 0.9137, P(H ≥ 180) ≈ 0.0229


ii. P(W ≥ 80|160 ≤ H < 180) ≈ 0.2751, P(160 ≤ H < 180) ≈ 0.4762
iii. P(W ≥ 80|H < 160) ≈ 0.0112, P(H < 160) ≈ 0.5008
 
(0.0229)(0.9137) + (0.4762)(0.2751) + (0.5008)(0.0112) ≈ 0.1575, compared with P(W ≥ 80) ≈ 0.1576
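The same weighted sum in R:

(0.0229)*(0.9137) + (0.4762)*(0.2751) + (0.5008)*(0.0112)  # ≈ 0.1575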
 
(c) Bayes’ Theorem: P(H ≥ 180|W ≥ 80) =? compared with P(H ≥ 180)

Given P(W ≥ 80|H ≥ 180), P(H ≥ 180), P(W ≥ 80|160 ≤ H < 180), P(160 ≤ H < 180),

P(W ≥ 80|H < 160), P(H < 160), how to calculate P(H ≥ 180|W ≥ 80)?

i. P(H ≥ 180 | W ≥ 80) = P(H ≥ 180, W ≥ 80)/P(W ≥ 80)
ii. P(W ≥ 80) =?
iii. P(H ≥ 180, W ≥ 80) =?
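Answering i.–iii. with the estimates already obtained above; a sketch:

P.joint = 0.9137 * 0.0229  # P(H >= 180, W >= 80) = P(W >= 80 | H >= 180) * P(H >= 180) ≈ 0.0209
P.W80 = (0.0229)*(0.9137) + (0.4762)*(0.2751) + (0.5008)*(0.0112)  # P(W >= 80) ≈ 0.1575
P.joint/P.W80              # P(H >= 180 | W >= 80) ≈ 0.133, far above P(H >= 180) ≈ 0.0229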

 
(d) E(W) = E[E(W | H)]:
(Given the average weight of each group, how to calculate the overall average weight?)

i. E(W |H ≥ 180) ≈ 89.0159, P(H ≥ 180) ≈ 0.0229


ii. E(W |160 ≤ H < 180) ≈ 75.7522, P(160 ≤ H < 180) ≈ 0.4762
iii. E(W |H < 160) ≈ 63.6311, P(H < 160) ≈ 0.5008

(0.0229)(89.0159) + (0.4762)(75.7522) + (0.5008)(63.6311) ≈ 69.9781, E(W ) ≈ 69.9858


   
(e) Var(W) = E[Var(W | H)] + Var[E(W | H)], where E[Var(W | H)] is the variation within
groups and Var[E(W | H)] is the variation between groups.
(Given the weight variance within each group and the mean weight of each group, how to
calculate the variance of weight?)
 
E[Var(W | H)]:

i. Var(W |H ≥ 180) ≈ 44.4997, P(H ≥ 180) ≈ 0.0229


ii. Var(W |160 ≤ H < 180) ≈ 51.8192, P(160 ≤ H < 180) ≈ 0.4762
iii. Var(W |H < 160) ≈ 59.2663, P(H < 160) ≈ 0.5008

(0.0229)(44.4997) + (0.4762)(51.8192) + (0.5008)(59.2663) ≈ 55.3759


     
Var[E(W | H)] = E[(E(W | H))²] − (E[E(W | H)])²:
(0.0229)(89.0159)² + (0.4762)(75.7522)² + (0.5008)(63.6311)² − (69.9858)² ≈ 43.7652
   
E[Var(W | H)] + Var[E(W | H)] = 55.3759 + 43.7652 ≈ 99.1411, compared with Var(W) ≈ 99.7493
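The decomposition can be reproduced from the three-group numbers; a sketch:

pr = c(0.0229, 0.4762, 0.5008)       # P(group)
m  = c(89.0159, 75.7522, 63.6311)    # E(W | group)
v  = c(44.4997, 51.8192, 59.2663)    # Var(W | group)
within = sum(pr * v)                 # E[Var(W|H)] ≈ 55.3759
between = sum(pr * m^2) - 69.9858^2  # Var[E(W|H)] ≈ 43.77, using E(W) ≈ 69.9858 as in the notes
within + between                     # ≈ 99.14, close to Var(W) ≈ 99.7493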

11. Joint distribution of (X, Y): For discrete X and Y, f_{X,Y}(a, b) ≡ P(X = a, Y = b), for all (a, b),
which describes the behavior of (X, Y), is called the joint distribution of (X, Y). For example,
(X_i, Y_i) are the amounts of money in the left and right pockets of person i.
And f_X(a) ≡ P(X = a) = P(X = a, −∞ < Y < ∞) = Σ_b P(X = a, Y = b) = Σ_b f_{X,Y}(a, b),
for all a, is called the marginal distribution of X. Similarly, f_Y(b) = Σ_a f_{X,Y}(a, b), for all b,
is the marginal distribution of Y.
(a) E(X) = Σ_a a · P(X = a) = Σ_a a · f_X(a), E(Y) = Σ_b b · f_Y(b)
(b) E(X + Y) = Σ_{(a,b)} (a + b) · f_{X,Y}(a, b), E[g(X, Y)] = Σ_{(a,b)} g(a, b) · f_{X,Y}(a, b)
 
(c) Var(X) ≡ E[(X − µ_X)²] = E(X²) − µ_X²
(d) Var[g(X, Y)] = E[(g(X, Y) − E[g(X, Y)])²] = E[g²(X, Y)] − µ²_{g(X,Y)}; writing Z = g(X, Y),
this is just E[(Z − µ_Z)²] = E(Z²) − µ_Z².

 
(e) Var(X − Y) = E[((X − Y) − E(X − Y))²]
= E[(X − µ_X)²] + E[(Y − µ_Y)²] − 2E[(X − µ_X)(Y − µ_Y)]
= Var(X) + Var(Y) − 2Cov(X, Y)
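A tiny hypothetical joint pmf makes (a)–(e) concrete; the 2×2 table below is an assumed example,
not data from the notes:

f = matrix(c(0.3, 0.2,
             0.1, 0.4), nrow = 2, byrow = TRUE)  # rows: X = 0, 1; columns: Y = 0, 1
fX = rowSums(f); fY = colSums(f)        # marginal distributions of X and Y
EX = sum(c(0, 1)*fX); EY = sum(c(0, 1)*fY)
EXY = sum(outer(c(0, 1), c(0, 1)) * f)  # E(XY)
EXY - EX*EY                             # Cov(X, Y) = E(XY) - E(X)E(Y) = 0.1 here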
 
12. Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]

Let c, d, e, and f be constants.


(a) Cov(cX + d, eY + f ) = ceCov(X, Y )
(b) If X and Y are statistically independent then Cov(X, Y ) = 0.
(c) Correlation coefficient ρ = Cov(X, Y)/√(Var(X)Var(Y)) (scale-independent), −1 ≤ ρ ≤ 1
(d) Var(X + Y ) = Var(X) + Var(Y ) + 2Cov(X, Y ).
If X and Y are statistically independent then Var(X + Y ) = Var(X) + Var(Y ).
If X and Y are negatively correlated then Var(X + Y) is smaller than Var(X) + Var(Y).
(e) Var(X − Y ) = Var(X) + Var(Y ) − 2Cov(X, Y ).
If X and Y are statistically independent then Var(X − Y ) = Var(X) + Var(Y ).
If X and Y are positively correlated then Var(X − Y) is smaller than Var(X) + Var(Y).
(f) Let {X₁, . . . , Xₙ} be a random sample (or i.i.d. sample) and X̄ = Σ_{i=1}^n X_i / n.
E(X̄) = E(X_i) and Var(X̄) = Var(X_i)/n.
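A simulation sketch of (f), assuming an i.i.d. sample of n = 25 draws from exponential(1) (so
E(X_i) = Var(X_i) = 1):

set.seed(1)
xbar = replicate(20000, mean(rexp(25, rate = 1)))  # 20000 realizations of X-bar
mean(xbar)  # ≈ E(X_i) = 1
var(xbar)   # ≈ Var(X_i)/n = 1/25 = 0.04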

13. In addition to describing the random behavior of X, the probability distribution function f_X(a)
is needed for making statistical inference, where X is the sample statistic related to the population
parameter of interest, e.g., the sample mean X̄ when making inference on the population mean µ,
or the no. of defectives X found in n inspections when making inference on the defective rate p.
14. Statistical Inference: Decide whether or not to reject a hypothesis H about a population parameter.

For example, do you accept the hypothesis H : p = 10% with p being the defective rate of some
production line?
15. Population: set of all objects of interest, e.g., set of all products from a production line.
16. Population Parameter: characteristic of the population (often serves as the system performance
measure), e.g., defective rate p of the production line (which is defined to be the proportion of
defectives in the lot)
17. Sample: subset of the population, e.g., 10 products randomly drawn for inspection
18. Sample Statistic: a number summarizing the sample observations, e.g., T = Σ_{i=1}^{10} X_i denotes
the total number of defectives found in 10 inspections, with X_i = 1 if the ith product is defective
and X_i = 0 otherwise. Since the sample statistic T varies from sample to sample, it is a random
variable and has its own distribution, called the sampling distribution (of T).
19. How to make statistical inference? How to determine if we should reject H : p = 10% when
observing T = 3?
20. If the observed T is significantly different from the expected T under H then we tend to reject

H. The observed is t = 3 and the expected is E(T |H) = 1. Suppose that 9 heads appeared when

flipping the coin 10 times. Do you believe that the coin is fair?

Question: Is 3 significantly larger than 1?

Answer: Find the proportion of T (when H is true) that is greater than or equal to 3.

If this proportion P(T ≥ 3|H) is small then · · ·

In order to calculate P(T ≥ 3|H), we need the probability distribution function of T (sampling
distribution of T ) when H is true.
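Under H: p = 10%, T is a sum of 10 independent Bernoulli(0.1) indicators, so T ∼ binomial(10, 0.1)
and the tail probability can be computed directly:

1 - pbinom(2, size = 10, prob = 0.1)      # P(T >= 3 | H) ≈ 0.070
sum(dbinom(3:10, size = 10, prob = 0.1))  # the same value, summed from the pmf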

par(mfrow=c(1,1)) #allocation of plots on 1 page

set.seed(1235)
N=5000
SIG.H=10; SIG.W=10
MU.H=160; MU.W=70
R=.8

# generate (H,W) ~ bivariate normal( (160,70), (10^2,80,80,10^2) )


B=R*SIG.W/SIG.H
A=MU.W-B*MU.H
SIG=((1-R^2)*SIG.W^2)^.5 #sd of errors in regression model
# (see p.676 of my text-book)
H=MU.H+SIG.H*rnorm(N)

E=SIG*rnorm(N)
W=A+B*H+E
# end generation

mean(H); sd(H)
mean(W); sd(W)

plot(H,W,main="Joint distribution of 5000 pairs of (H,W)
H ~ N(160,10^2), W ~ N(70, 10^2)
r=0.8",cex=.3)
abline(v=160,lty=2);abline(h=70,lty=2)  #dashed lines at the means (lty=29 is not a standard line type)
abline(h=80,lty=1,col="red");abline(v=180,lty=1,col="red")  #thresholds used in the group analysis

mean(H>180 & W>80)              #joint probability P(H>180, W>80)

mean(H>180); mean(W>80)         #marginal probabilities
mean(H>180)*mean(W>80)          #product of marginals: equals the joint only under independence
mean(H>180 & W>80)/mean(H>180)  #conditional probability P(W>80 | H>180)

# H > 180 group


H.180=H>180
P.H.180=mean(H.180); P.H.180
W.given.H.180=W*H.180      #zeroes out people outside the group (valid since all W > 0)
N1=sum(W.given.H.180 > 0)  #group size
P.W.80.given.H.180=sum(W.given.H.180 > 80)/N1; P.W.80.given.H.180
E.W.given.H.180=sum(W.given.H.180)/N1; E.W.given.H.180
V.W.given.H.180=sum(W.given.H.180^2)/N1-E.W.given.H.180^2; V.W.given.H.180

# 160 < H < 180 group


H.160.180=(H>160)*(H<180)
P.160.H.180=mean(H.160.180); P.160.H.180
W.given.160.H.180=W*H.160.180
N2=sum(W.given.160.H.180 > 0)
P.W.80.given.160.H.180=sum(W.given.160.H.180 > 80)/N2; P.W.80.given.160.H.180
E.W.given.160.H.180=sum(W.given.160.H.180)/N2; E.W.given.160.H.180
V.W.given.160.H.180=sum(W.given.160.H.180^2)/N2-E.W.given.160.H.180^2; V.W.given.160.H.180

# H < 160 group


H.160=H<160
P.H.160=mean(H.160); P.H.160
W.given.H.160=W*H.160
N3=sum(W.given.H.160 > 0)
P.W.80.given.H.160=sum(W.given.H.160 > 80)/N3; P.W.80.given.H.160
E.W.given.H.160=sum(W.given.H.160)/N3; E.W.given.H.160

V.W.given.H.160=sum(W.given.H.160^2)/N3-E.W.given.H.160^2; V.W.given.H.160

plot(density(H),main="Marginal distribution of 5000 H
H ~ N(160,10^2)");abline(v=160,lty=2) #marginal distribution of H
plot(density(W),main="Marginal distribution of 5000 W
W ~ N(70,10^2)");abline(v=70,lty=2)   #marginal distribution of W

[Figure: "Joint distribution of 5000 pairs of (H,W)"; H ~ N(160,10^2), W ~ N(70,10^2), r = 0.0 (scatterplot)]
[Figure: "Joint distribution of 5000 pairs of (H,W)"; H ~ N(160,10^2), W ~ N(70,10^2), r = 0.9 (scatterplot)]
[Figure: "Joint distribution of 5000 pairs of (H,W)"; H ~ N(160,10^2), W ~ N(70,10^2), r = 0.99999 (scatterplot)]
