01 Lecture Slides: Probability Theory
Maximilian Soelch
Probability Theory
Basic Elements of Probability
Sample space Ω: the set of all possible outcomes of a random experiment, e.g. Ω = {1, 2, 3, 4, 5, 6} for rolling a die.
Event space F: a σ-field over Ω, e.g. the (smallest) σ-field that contains A = {1, 3, 5}: F = {∅, {1, 3, 5}, {2, 4, 6}, {1, 2, 3, 4, 5, 6}}
Basic Elements of Probability ctd.
e.g. for rolling a die, P(A) = |A| / |Ω|
Important Properties
Conditional Probability
P(A | B) := P(A ∩ B) / P(B).
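A minimal numeric sketch of this definition, reusing the die example from above (the choice of events is illustrative):

```python
# Conditional probability on a fair die: A = "odd", B = "at most 3".
omega = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}
B = {1, 2, 3}

P = lambda event: len(event) / len(omega)  # uniform measure: P(E) = |E| / |Ω|

p_A_given_B = P(A & B) / P(B)  # P(A | B) = P(A ∩ B) / P(B)
print(p_A_given_B)             # 2/3: two of the three outcomes in B are odd
```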
Multiplication law
P(A ∩ B) = P(A | B) · P(B) = P(B | A) · P(A)
Law of total probability (revisited)
For a partition {A₁, . . . , Aₙ} of Ω: P(B) = Σᵢ P(B | Aᵢ) · P(Aᵢ).
Bayes’ rule
Bayes' rule applies the multiplication rule twice to set P(A | B) and P(B | A) in relation:
P(A | B) = P(B | A) · P(A) / P(B).
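A small numeric sketch of Bayes' rule on a toy diagnostic-test scenario (all numbers are invented for illustration):

```python
# Bayes' rule on a toy diagnostic test (all numbers made up for illustration).
p_disease = 0.01            # prior P(A)
p_pos_given_disease = 0.95  # likelihood P(B | A)
p_pos_given_healthy = 0.05  # false-positive rate P(B | not A)

# Law of total probability for the evidence P(B):
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior P(A | B) = P(B | A) · P(A) / P(B):
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # ≈ 0.161, despite the accurate test
```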
Independence
Events A and B are independent iff P(A ∩ B) = P(A) · P(B).
Random variables
We are usually only interested in some aspects of a random experiment.
Formally, a random variable is a function X : Ω → R.
Cumulative distribution function – CDF
A probability measure P is specified by a cumulative distribution function (CDF), a function FX : R → [0, 1]:
FX(x) ≡ P(X ≤ x).
Properties:
▶ 0 ≤ FX(x) ≤ 1
▶ lim_{x→−∞} FX(x) = 0
▶ lim_{x→∞} FX(x) = 1
▶ x ≤ y ⇒ FX(x) ≤ FX(y)
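These properties can be checked numerically, e.g. for a standard normal; a minimal sketch using SciPy:

```python
# Sketch: CDF properties for a standard normal via scipy.stats.
from scipy.stats import norm

F = norm.cdf                  # FX(x) = P(X ≤ x) for X ~ N(0, 1)
print(F(-10.0), F(10.0))      # ≈ 0 and ≈ 1: the limits at −∞ and ∞
print(F(0.0) <= F(1.0))       # True: FX is non-decreasing
```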
Probability density function—PDF
For a continuous random variable X and small ∆x:
P(x ≤ X ≤ x + ∆x) ≈ fX(x) · ∆x.
Properties:
▶ fX(x) ≥ 0
▶ ∫_{x∈A} fX(x) dx = P(X ∈ A)
▶ ∫_{−∞}^{∞} fX(x) dx = 1
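A quick numeric sketch of the interval approximation, again for a standard normal:

```python
# Sketch: P(x ≤ X ≤ x + Δx) ≈ fX(x) · Δx for small Δx, X ~ N(0, 1).
from scipy.stats import norm

x, dx = 0.5, 1e-3
exact = norm.cdf(x + dx) - norm.cdf(x)  # exact interval probability
approx = norm.pdf(x) * dx               # density times interval width
print(exact, approx)                    # agree to several decimal places
```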
Probability mass function—PMF
X takes on only a countable set of possible values (discrete random variable).
pX(x) = P(X = x)
Properties:
▶ 0 ≤ pX(x) ≤ 1
▶ Σ_x pX(x) = 1
▶ Σ_{x∈A} pX(x) = P(X ∈ A)
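A minimal sketch with the fair-die PMF:

```python
# Sketch: PMF of a fair die and the property Σ_{x∈A} pX(x) = P(X ∈ A).
p = {x: 1 / 6 for x in range(1, 7)}  # pX(x) = 1/6 for x = 1, ..., 6
print(sum(p.values()))               # 1.0: the PMF sums to one
A = {1, 3, 5}
print(sum(p[x] for x in A))          # 0.5 = P(X ∈ A) for A = "odd"
```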
Transformation of Random Variables
Expectation
Properties:
▶ E[a] = a for any constant a ∈ R
▶ E[a f(X)] = a E[f(X)] for any constant a ∈ R
▶ E[f(X) + g(X)] = E[f(X)] + E[g(X)]
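A minimal sketch of linearity, using the discrete definition E[f(X)] = Σ_x f(x) pX(x) on a fair die:

```python
# Sketch: linearity of expectation, computed exactly from a fair-die PMF.
xs = range(1, 7)
E = lambda f: sum(f(x) / 6 for x in xs)  # E[f(X)] = Σ_x f(x) pX(x)

a = 3.0
print(E(lambda x: a * x), a * E(lambda x: x))  # E[aX] = a·E[X] = 10.5
print(E(lambda x: x + x**2), E(lambda x: x) + E(lambda x: x**2))  # both ≈ 18.67
```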
Variance and Standard Deviation
Properties:
▶ Var(a) = 0 for any constant a ∈ R.
▶ Var(a f(X)) = a² Var(f(X)) for any constant a ∈ R.
σ(X) = √Var(X) is called the standard deviation of X.
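A matching sketch for the scaling property, again exact on the fair die:

```python
# Sketch: Var(a·X) = a² · Var(X), via Var(f) = E[f²] − E[f]².
xs = range(1, 7)
E = lambda f: sum(f(x) / 6 for x in xs)
Var = lambda f: E(lambda x: f(x) ** 2) - E(f) ** 2

a = 3.0
print(Var(lambda x: a * x), a**2 * Var(lambda x: x))  # both ≈ 26.25
```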
Entropy
For a discrete random variable: H(X) = −Σ_x pX(x) · log pX(x).
Bernoulli distribution
pX(x) = µ if x = 1, 1 − µ if x = 0, and 0 otherwise. Compactly, for x ∈ {0, 1}:
Ber(x | µ) = µ^x · (1 − µ)^(1−x)
Binomial distribution
For x ∈ {0, 1, . . . , N}:
Bin(x | N, µ) = (N choose x) · µ^x · (1 − µ)^(N−x)
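A minimal sketch checking the formula against SciPy:

```python
# Sketch: Binomial PMF via the formula vs. scipy.stats.binom.
from math import comb
from scipy.stats import binom

N, mu, x = 10, 0.3, 4
formula = comb(N, x) * mu**x * (1 - mu) ** (N - x)
print(formula, binom.pmf(x, N, mu))  # both ≈ 0.2001
```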
Poisson distribution
Limit of the Binomial for N → ∞, µ → 0 with Nµ = λ fixed: X ∼ Bin(N, µ) → X ∼ Poi(λ).
For x ∈ N₀:
Poi(x | λ) = e^(−λ) · λ^x / x!
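A sketch of the limit statement, comparing the two PMFs for large N and small µ:

```python
# Sketch: Poisson approximation of the Binomial (N large, µ small, Nµ = λ).
from scipy.stats import binom, poisson

N, lam = 10_000, 3.0
mu = lam / N
for x in range(6):
    print(x, binom.pmf(x, N, mu), poisson.pmf(x, lam))  # nearly identical
```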
Uniform distribution
For x ∈ [a, b]: U(x | a, b) = 1 / (b − a).
Exponential distribution
For x ∈ R₀⁺:
Exp(x | λ) = λ · e^(−λx)
Normal/Gaussian distribution
For x ∈ R:
N(x | µ, σ²) = 1 / √(2πσ²) · e^(−(x−µ)²/(2σ²))
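A minimal sketch matching the density formula against SciPy:

```python
# Sketch: Gaussian density formula vs. scipy.stats.norm.
from math import exp, pi, sqrt
from scipy.stats import norm

x, mu, sigma = 1.0, 0.5, 2.0
formula = exp(-((x - mu) ** 2) / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)
print(formula, norm.pdf(x, loc=mu, scale=sigma))  # identical values
```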
Beta distribution
For x ∈ [0, 1]:
Beta(x | a, b) = Γ(a + b) / (Γ(a) Γ(b)) · x^(a−1) · (1 − x)^(b−1)
Gamma distribution
Overview: probability distributions
* marks discrete distributions.
With the gamma function Γ(x) = ∫₀^∞ t^(x−1) · e^(−t) dt, which satisfies Γ(n + 1) = n! for n ∈ N₀.
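A one-line numeric sketch of the factorial property of Γ:

```python
# Sketch: Γ(n + 1) = n! for n ∈ N₀, checked numerically.
from math import factorial, gamma

for n in range(6):
    print(n, gamma(n + 1), factorial(n))  # equal for each n
```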
Two random variables—Bivariate case
FXY(x, y) = P(X ≤ x, Y ≤ y).
Properties:
▶ 0 ≤ FXY(x, y) ≤ 1
▶ lim_{x,y→−∞} FXY(x, y) = 0
▶ lim_{x,y→∞} FXY(x, y) = 1
▶ FX(x) = lim_{y→∞} FXY(x, y)
Two continuous random variables
Most properties can be defined analogously to the univariate case.
Joint probability density function:
fXY(x, y) = ∂²FXY(x, y) / (∂x ∂y).
Properties:
▶ fXY(x, y) ≥ 0
▶ ∬_A fXY(x, y) dx dy = P((X, Y) ∈ A)
▶ ∫_{−∞}^{∞} ∫_{−∞}^{∞} fXY(x, y) dx dy = 1
Relations between fXY, fX, fY, FXY, FX and FY
e.g. marginalization: fX(x) = ∫_{−∞}^{∞} fXY(x, y) dy.
Two discrete random variables
pXY(x, y) = P(X = x, Y = y).
Properties:
▶ 0 ≤ pXY(x, y) ≤ 1
▶ Σ_x Σ_y pXY(x, y) = 1
Conditional distributions/Bayes’ rule
                 discrete                                    continuous
Definition       pY|X(y | x) = pXY(x, y) / pX(x)             fY|X(y | x) = fXY(x, y) / fX(x)
Bayes' rule      pY|X(y | x) = pX|Y(x | y) pY(y) / pX(x)     fY|X(y | x) = fX|Y(x | y) fY(y) / fX(x)
Probabilities    pY|X(y | x) = P(Y = y | X = x)              P(Y ∈ A | X = x) = ∫_A fY|X(y | x) dy
Independence
X and Y are independent iff FXY(x, y) = FX(x) · FY(y) for all x, y. Equivalently:
▶ pXY(x, y) = pX(x) pY(y)
▶ pY|X(y | x) = pY(y)
▶ fXY(x, y) = fX(x) fY(y)
▶ fY|X(y | x) = fY(y)
Independent and identically distributed—i.i.d.
X and Y are identically distributed if
fX(x) = fY(x),    FX(x) = FY(x).
In particular,
E[X] = E[Y],    Var(X) = Var(Y).
Expectation and covariance
Given two random variables X, Y and g : R² → R.
▶ Discrete case: E[g(X, Y)] := Σ_x Σ_y g(x, y) pXY(x, y).
▶ Continuous case: E[g(X, Y)] := ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fXY(x, y) dx dy.
Covariance
▶ Cov(X, Y) := E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y].
▶ When Cov(X, Y) = 0, X and Y are uncorrelated.
▶ Pearson correlation coefficient:
ρ(X, Y) := Cov(X, Y) / √(Var(X) Var(Y)) ∈ [−1, 1].
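A sampling-based sketch of covariance and correlation with NumPy (the linear model for Y is chosen for illustration):

```python
# Sketch: empirical covariance and Pearson correlation via NumPy.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2.0 * x + rng.normal(size=100_000)  # Y = 2X + noise

cov = np.cov(x, y)[0, 1]                # sample estimate of Cov(X, Y)
rho = cov / np.sqrt(x.var(ddof=1) * y.var(ddof=1))
print(cov, rho)                         # Cov ≈ 2, ρ ≈ 2/√5 ≈ 0.894
```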
Multiple random variables—Random vectors
Generalize previous ideas to more than two random variables. Putting all these random variables together in one vector X yields a random vector (X : Ω → Rⁿ). The notions of joint CDF and PDF apply equivalently.
Covariance matrix
Σᵢⱼ = Cov(Xᵢ, Xⱼ).
Σ = E[(X − E[X])(X − E[X])ᵀ] = E[X Xᵀ] − E[X] E[X]ᵀ
Multinomial distribution
The multivariate version of the Binomial is called a Multinomial, X ∼ Multinomial(N, µ). We have k ≥ 1 mutually exclusive events, event i occurring with probability µᵢ (such that Σᵢ µᵢ = 1). We draw N times independently.
pX(x₁, x₂, . . . , xₖ) = (N choose x₁, x₂, . . . , xₖ) · µ₁^x₁ · µ₂^x₂ · · · µₖ^xₖ,
where
Σᵢ xᵢ = N,
(N choose x₁, x₂, . . . , xₖ) = N! / (x₁! x₂! · · · xₖ!),
E[X] = (Nµ₁, Nµ₂, . . . , Nµₖ),
Var(Xᵢ) = Nµᵢ(1 − µᵢ),
Cov(Xᵢ, Xⱼ) = −Nµᵢµⱼ for i ≠ j.
Example: an urn with n balls carrying k ≥ 1 different labels, drawn N ≥ 1 times with replacement, with probabilities µᵢ = #i / n (the fraction of balls with label i).
The marginal distribution of Xᵢ is Bin(N, µᵢ).
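A sampling sketch of the stated mean and covariance, using NumPy's multinomial sampler:

```python
# Sketch: empirical mean and covariance of a Multinomial via NumPy.
import numpy as np

rng = np.random.default_rng(0)
N, mu = 10, np.array([0.2, 0.3, 0.5])
samples = rng.multinomial(N, mu, size=100_000)  # each row sums to N

print(samples.mean(axis=0))     # ≈ N·µ = [2, 3, 5]
print(np.cov(samples.T)[0, 1])  # ≈ −N·µ₁·µ₂ = −0.6
```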
Multivariate Gaussian
For x ∈ Rᵏ:
fX(x₁, x₂, . . . , xₖ) = 1 / √((2π)ᵏ det Σ) · exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ)),
where
E[X] = µ,
Var(Xᵢ) = Σᵢᵢ,
Cov(Xᵢ, Xⱼ) = Σᵢⱼ.
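A sketch matching the density formula against SciPy's multivariate normal:

```python
# Sketch: multivariate Gaussian density formula vs. scipy.stats.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
x = np.array([0.5, 0.5])

d = x - mu
k = len(mu)
formula = np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) / np.sqrt(
    (2 * np.pi) ** k * np.linalg.det(Sigma)
)
print(formula, multivariate_normal(mu, Sigma).pdf(x))  # identical values
```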
Notation in the lecture
Consider
pX(x), x ∈ R    vs.    pX(y), y ∈ R.
Both denote the same function: the subscript, not the argument name, identifies the random variable.