Probability and Statistics Cookbook

This cookbook integrates a variety of topics in probability theory and statistics. It is based on literature [1] and in-class material from courses of the statistics department at the University of California in Berkeley, but is also influenced by other sources [2, 3]. If you find errors or have suggestions for further topics, I would appreciate it if you sent me an email. The most recent version of this document is available at http://matthias.vallentin.net/probability-and-statistics-cookbook/. To reproduce, please contact me.

Contents

1 Distribution Overview
  1.1 Discrete Distributions
  1.2 Continuous Distributions
2 Probability Theory
3 Random Variables
  3.1 Transformations
4 Expectation
5 Variance
6 Inequalities
7 Distribution Relationships
8 Probability and Moment Generating Functions
9 Multivariate Distributions
  9.1 Standard Bivariate Normal
  9.2 Bivariate Normal
  9.3 Multivariate Normal
10 Convergence
  10.1 Law of Large Numbers (LLN)
  10.2 Central Limit Theorem (CLT)
11 Statistical Inference
  11.1 Point Estimation
  11.2 Normal-Based Confidence Interval
  11.3 Empirical Distribution
  11.4 Statistical Functionals
12 Parametric Inference
  12.1 Method of Moments
  12.2 Maximum Likelihood
    12.2.1 Delta Method
  12.3 Multiparameter Models
    12.3.1 Multiparameter Delta Method
  12.4 Parametric Bootstrap
13 Hypothesis Testing
14 Exponential Family
15 Bayesian Inference
  15.1 Credible Intervals
  15.2 Function of Parameters
  15.3 Priors
    15.3.1 Conjugate Priors
  15.4 Bayesian Testing
16 Sampling Methods
  16.1 Inverse Transform Sampling
  16.2 The Bootstrap
    16.2.1 Bootstrap Confidence Intervals
  16.3 Rejection Sampling
  16.4 Importance Sampling
17 Decision Theory
  17.1 Risk
  17.2 Admissibility
  17.3 Bayes Rule
  17.4 Minimax Rules
18 Linear Regression
  18.1 Simple Linear Regression
  18.2 Prediction
  18.3 Multiple Regression
  18.4 Model Selection
19 Non-parametric Function Estimation
  19.1 Density Estimation
    19.1.1 Histograms
    19.1.2 Kernel Density Estimator (KDE)
  19.2 Non-parametric Regression
  19.3 Smoothing Using Orthogonal Functions
20 Stochastic Processes
  20.1 Markov Chains
  20.2 Poisson Processes
21 Time Series
  21.1 Stationary Time Series
  21.2 Estimation of Correlation
  21.3 Non-Stationary Time Series
    21.3.1 Detrending
  21.4 ARIMA Models
    21.4.1 Causality and Invertibility
  21.5 Spectral Analysis
22 Math
  22.1 Gamma Function
  22.2 Beta Function
  22.3 Series
  22.4 Combinatorics
1 Distribution Overview

1.1 Discrete Distributions

For each distribution we give the notation, CDF $F_X(x)$, PMF $f_X(x)$, mean $E[X]$, variance $V[X]$, and MGF $M_X(s)$. We use the notation $\gamma(s,x)$ and $\Gamma(x)$ to refer to the Gamma functions (see 22.1), and use $B(x,y)$ and $I_x$ to refer to the Beta functions (see 22.2).

Uniform $\mathrm{Unif}\{a,\dots,b\}$:
$F_X(x) = 0$ for $x<a$; $\frac{\lfloor x\rfloor - a + 1}{b-a+1}$ for $a\le x\le b$; $1$ for $x>b$
$f_X(x) = \frac{I(a\le x\le b)}{b-a+1}$
$E[X] = \frac{a+b}{2}$; $V[X] = \frac{(b-a+1)^2-1}{12}$; $M_X(s) = \frac{e^{as} - e^{(b+1)s}}{s(b-a)}$

Bernoulli $\mathrm{Bern}(p)$:
$F_X(x) = 0$ for $x<0$; $1-p$ for $0\le x<1$; $1$ for $x\ge1$
$f_X(x) = p^x(1-p)^{1-x}$, $x\in\{0,1\}$
$E[X] = p$; $V[X] = p(1-p)$; $M_X(s) = 1-p+pe^s$

Binomial $\mathrm{Bin}(n,p)$:
$F_X(x) = I_{1-p}(n-x, x+1)$
$f_X(x) = \binom{n}{x}p^x(1-p)^{n-x}$
$E[X] = np$; $V[X] = np(1-p)$; $M_X(s) = (1-p+pe^s)^n$

Multinomial $\mathrm{Mult}(n,p)$:
$f_X(x) = \frac{n!}{x_1!\cdots x_k!}p_1^{x_1}\cdots p_k^{x_k}$ with $\sum_{i=1}^k x_i = n$
$E[X_i] = np_i$; $V[X_i] = np_i(1-p_i)$; $M_X(s) = \left(\sum_{i=1}^k p_ie^{s_i}\right)^n$

Hypergeometric $\mathrm{Hyp}(N,m,n)$:
$f_X(x) = \binom{m}{x}\binom{N-m}{n-x}\big/\binom{N}{n}$
$E[X] = \frac{nm}{N}$; $V[X] = \frac{nm(N-n)(N-m)}{N^2(N-1)}$

Negative Binomial $\mathrm{NBin}(r,p)$:
$F_X(x) = I_p(r, x+1)$
$f_X(x) = \binom{x+r-1}{r-1}p^r(1-p)^x$
$E[X] = \frac{r(1-p)}{p}$; $V[X] = \frac{r(1-p)}{p^2}$; $M_X(s) = \left(\frac{pe^s}{1-(1-p)e^s}\right)^r$

Geometric $\mathrm{Geo}(p)$:
$F_X(x) = 1-(1-p)^x$, $x\in\mathbb{N}^+$
$f_X(x) = p(1-p)^{x-1}$, $x\in\mathbb{N}^+$
$E[X] = \frac1p$; $V[X] = \frac{1-p}{p^2}$; $M_X(s) = \frac{pe^s}{1-(1-p)e^s}$

Poisson $\mathrm{Po}(\lambda)$:
$F_X(x) = e^{-\lambda}\sum_{i=0}^{\lfloor x\rfloor}\frac{\lambda^i}{i!}$
$f_X(x) = \frac{\lambda^xe^{-\lambda}}{x!}$, $x\in\{0,1,2,\dots\}$
$E[X] = \lambda$; $V[X] = \lambda$; $M_X(s) = e^{\lambda(e^s-1)}$
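The table entries can be sanity-checked numerically. Below is a minimal sketch using scipy.stats (not part of the original cookbook); the parameter values are arbitrary examples.

```python
# Sanity-check the Binomial and Geometric rows of the table with scipy.stats.
from scipy import stats

n, p = 40, 0.3
X = stats.binom(n, p)
assert abs(X.mean() - n * p) < 1e-9            # E[X] = np
assert abs(X.var() - n * p * (1 - p)) < 1e-9   # V[X] = np(1-p)

G = stats.geom(p)                              # support {1, 2, ...}, PMF p(1-p)^(x-1)
assert abs(G.mean() - 1 / p) < 1e-9            # E[X] = 1/p
assert abs(G.var() - (1 - p) / p**2) < 1e-9    # V[X] = (1-p)/p^2
print("table entries confirmed")
```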
[Figure: PMF (top row) and CDF (bottom row) of the discrete Uniform, Binomial (n=40, p=0.3; n=30, p=0.6; n=25, p=0.9), Geometric (p=0.2; p=0.5; p=0.8), and Poisson (lambda=1; 4; 10) distributions.]
1.2 Continuous Distributions

Uniform $\mathrm{Unif}(a,b)$:
$F_X(x) = 0$ for $x<a$; $\frac{x-a}{b-a}$ for $a<x<b$; $1$ for $x>b$
$f_X(x) = \frac{I(a<x<b)}{b-a}$
$E[X] = \frac{a+b}{2}$; $V[X] = \frac{(b-a)^2}{12}$; $M_X(s) = \frac{e^{sb}-e^{sa}}{s(b-a)}$

Normal $N(\mu,\sigma^2)$:
$F_X(x) = \Phi\left(\frac{x-\mu}{\sigma}\right)$ where $\Phi(x) = \int_{-\infty}^x \phi(t)\,dt$
$f_X(x) = \phi(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$
$E[X] = \mu$; $V[X] = \sigma^2$; $M_X(s) = \exp\left\{\mu s + \frac{\sigma^2s^2}{2}\right\}$

Log-Normal $\ln N(\mu,\sigma^2)$:
$F_X(x) = \frac12 + \frac12\,\mathrm{erf}\left[\frac{\ln x - \mu}{\sqrt{2\sigma^2}}\right]$
$f_X(x) = \frac{1}{x\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(\ln x-\mu)^2}{2\sigma^2}\right\}$
$E[X] = e^{\mu+\sigma^2/2}$; $V[X] = (e^{\sigma^2}-1)e^{2\mu+\sigma^2}$

Multivariate Normal $\mathrm{MVN}(\mu,\Sigma)$:
$f_X(x) = (2\pi)^{-k/2}|\Sigma|^{-1/2}\exp\left\{-\frac12(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}$
$E[X] = \mu$; $V[X] = \Sigma$; $M_X(s) = \exp\left\{\mu^Ts + \frac12 s^T\Sigma s\right\}$

Student's t, $\mathrm{Student}(\nu)$:
$f_X(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac\nu2\right)}\left(1+\frac{x^2}{\nu}\right)^{-(\nu+1)/2}$
$E[X] = 0$ ($\nu>1$); $V[X] = \frac{\nu}{\nu-2}$ for $\nu>2$, $\infty$ for $1<\nu\le2$

Chi-square $\chi^2_k$:
$F_X(x) = \frac{\gamma(k/2,\,x/2)}{\Gamma(k/2)}$
$f_X(x) = \frac{1}{2^{k/2}\Gamma(k/2)}x^{k/2-1}e^{-x/2}$
$E[X] = k$; $V[X] = 2k$; $M_X(s) = (1-2s)^{-k/2}$ ($s<1/2$)

F, $\mathrm{F}(d_1,d_2)$:
$F_X(x) = I_{\frac{d_1x}{d_1x+d_2}}\left(\frac{d_1}{2},\frac{d_2}{2}\right)$
$f_X(x) = \frac{\sqrt{(d_1x)^{d_1}d_2^{d_2}/(d_1x+d_2)^{d_1+d_2}}}{x\,B(d_1/2,\,d_2/2)}$
$E[X] = \frac{d_2}{d_2-2}$ ($d_2>2$); $V[X] = \frac{2d_2^2(d_1+d_2-2)}{d_1(d_2-2)^2(d_2-4)}$ ($d_2>4$)

Exponential $\mathrm{Exp}(\beta)$:
$F_X(x) = 1-e^{-x/\beta}$
$f_X(x) = \frac1\beta e^{-x/\beta}$
$E[X] = \beta$; $V[X] = \beta^2$; $M_X(s) = \frac{1}{1-\beta s}$ ($s<1/\beta$)

Gamma $\mathrm{Gamma}(\alpha,\beta)$:
$F_X(x) = \frac{\gamma(\alpha,\,x/\beta)}{\Gamma(\alpha)}$
$f_X(x) = \frac{1}{\Gamma(\alpha)\beta^\alpha}x^{\alpha-1}e^{-x/\beta}$
$E[X] = \alpha\beta$; $V[X] = \alpha\beta^2$; $M_X(s) = \left(\frac{1}{1-\beta s}\right)^\alpha$ ($s<1/\beta$)

Inverse Gamma $\mathrm{InvGamma}(\alpha,\beta)$:
$F_X(x) = \frac{\Gamma(\alpha,\,\beta/x)}{\Gamma(\alpha)}$
$f_X(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}x^{-\alpha-1}e^{-\beta/x}$
$E[X] = \frac{\beta}{\alpha-1}$ ($\alpha>1$); $V[X] = \frac{\beta^2}{(\alpha-1)^2(\alpha-2)}$ ($\alpha>2$); $M_X(s) = \frac{2(-\beta s)^{\alpha/2}}{\Gamma(\alpha)}K_\alpha\left(\sqrt{-4\beta s}\right)$

Dirichlet $\mathrm{Dir}(\alpha)$:
$f_X(x) = \frac{\Gamma\left(\sum_{i=1}^k\alpha_i\right)}{\prod_{i=1}^k\Gamma(\alpha_i)}\prod_{i=1}^k x_i^{\alpha_i-1}$
$E[X_i] = \frac{\alpha_i}{\sum_{j=1}^k\alpha_j}$; $V[X_i] = \frac{E[X_i](1-E[X_i])}{\sum_{i=1}^k\alpha_i + 1}$

Beta $\mathrm{Beta}(\alpha,\beta)$:
$F_X(x) = I_x(\alpha,\beta)$
$f_X(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}$
$E[X] = \frac{\alpha}{\alpha+\beta}$; $V[X] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$; $M_X(s) = 1 + \sum_{k=1}^\infty\left(\prod_{r=0}^{k-1}\frac{\alpha+r}{\alpha+\beta+r}\right)\frac{s^k}{k!}$

Weibull $\mathrm{Weibull}(\lambda,k)$:
$F_X(x) = 1-e^{-(x/\lambda)^k}$
$f_X(x) = \frac k\lambda\left(\frac x\lambda\right)^{k-1}e^{-(x/\lambda)^k}$, $x\ge0$
$E[X] = \lambda\Gamma\left(1+\frac1k\right)$; $V[X] = \lambda^2\Gamma\left(1+\frac2k\right) - \mu^2$; $M_X(s) = \sum_{n=0}^\infty \frac{s^n\lambda^n}{n!}\Gamma\left(1+\frac nk\right)$ ($k\ge1$)

Pareto $\mathrm{Pareto}(x_m,\alpha)$:
$F_X(x) = 1-\left(\frac{x_m}{x}\right)^\alpha$, $x\ge x_m$
$f_X(x) = \alpha\frac{x_m^\alpha}{x^{\alpha+1}}$, $x\ge x_m$
$E[X] = \frac{\alpha x_m}{\alpha-1}$ ($\alpha>1$); $V[X] = \frac{x_m^2\alpha}{(\alpha-1)^2(\alpha-2)}$ ($\alpha>2$); $M_X(s) = \alpha(-x_ms)^\alpha\,\Gamma(-\alpha,-x_ms)$ ($s<0$)
[Figure: PDF (top rows) and CDF (bottom rows) of the continuous Uniform, Normal, Log-Normal, Student's t, Chi-square (k=1,...,5), F, Exponential, Gamma, Inverse Gamma, Beta, Weibull, and Pareto distributions for various parameter settings.]
2 Probability Theory

Definitions
- Sample space $\Omega$
- Outcome (point or element) $\omega\in\Omega$
- Event $A\subseteq\Omega$
- $\sigma$-algebra $\mathcal{A}$:
  1. $\emptyset\in\mathcal{A}$
  2. $A_1,A_2,\dots\in\mathcal{A} \Rightarrow \bigcup_{i=1}^\infty A_i\in\mathcal{A}$
  3. $A\in\mathcal{A} \Rightarrow \neg A\in\mathcal{A}$
- Probability distribution $P$:
  1. $P[A]\ge0$ for every $A$
  2. $P[\Omega] = 1$
  3. $P\left[\bigsqcup_{i=1}^\infty A_i\right] = \sum_{i=1}^\infty P[A_i]$
- Probability space $(\Omega,\mathcal{A},P)$

Properties
- $P[\emptyset] = 0$
- $B = \Omega\cap B = (A\sqcup\neg A)\cap B = (A\cap B)\sqcup(\neg A\cap B)$
- $P[\neg A] = 1 - P[A]$
- $P[B] = P[A\cap B] + P[\neg A\cap B]$
- $P[\Omega] = 1$, $P[\emptyset] = 0$
- $\neg\left(\bigcup_n A_n\right) = \bigcap_n \neg A_n$ and $\neg\left(\bigcap_n A_n\right) = \bigcup_n \neg A_n$ (DeMorgan)
- $P\left[\bigcup_n A_n\right] = 1 - P\left[\bigcap_n \neg A_n\right]$
- $P[A\cup B] = P[A] + P[B] - P[A\cap B] \Rightarrow P[A\cup B] \le P[A] + P[B]$
- $P[A\cup B] = P[A\cap\neg B] + P[\neg A\cap B] + P[A\cap B]$
- $P[A\cap\neg B] = P[A] - P[A\cap B]$

Continuity of probabilities
- $A_1\subset A_2\subset\dots \Rightarrow \lim_{n\to\infty}P[A_n] = P[A]$ where $A = \bigcup_{i=1}^\infty A_i$
- $A_1\supset A_2\supset\dots \Rightarrow \lim_{n\to\infty}P[A_n] = P[A]$ where $A = \bigcap_{i=1}^\infty A_i$

Independence
$A\perp B \iff P[A\cap B] = P[A]\,P[B]$

Conditional probability
$P[A\mid B] = \frac{P[A\cap B]}{P[B]}$ for $P[B]>0$

Law of total probability
$P[B] = \sum_{i=1}^n P[B\mid A_i]P[A_i]$ where $\Omega = \bigsqcup_{i=1}^n A_i$

Bayes' theorem
$P[A_i\mid B] = \frac{P[B\mid A_i]P[A_i]}{\sum_{j=1}^n P[B\mid A_j]P[A_j]}$ where $\Omega = \bigsqcup_{i=1}^n A_i$

Inclusion-exclusion principle
$\left|\bigcup_{i=1}^n A_i\right| = \sum_{r=1}^n (-1)^{r-1}\sum_{i_1<\dots<i_r}\left|\bigcap_{j=1}^r A_{i_j}\right|$

3 Random Variables

Random variable: $X : \Omega\to\mathbb{R}$

Probability mass function (PMF): $f_X(x) = P[X = x]$
Probability density function (PDF): $P[a\le X\le b] = \int_a^b f(x)\,dx$
Cumulative distribution function (CDF): $F_X(x) = P[X\le x]$

Conditional densities
$P[a<Y<b\mid X=x] = \int_a^b f_{Y\mid X}(y\mid x)\,dy$ where $f_{Y\mid X}(y\mid x) = \frac{f(x,y)}{f_X(x)}$

Independence
$X\perp Y$ if and only if
1. $P[X\le x, Y\le y] = P[X\le x]\,P[Y\le y]$
2. $f_{X,Y}(x,y) = f_X(x)f_Y(y)$
3.1 Transformations

Transformation function: $Z = \varphi(X)$
Discrete:
$f_Z(z) = P[\varphi(X)=z] = P[\{x : \varphi(x)=z\}] = P[X\in\varphi^{-1}(z)] = \sum_{x\in\varphi^{-1}(z)} f(x)$
Continuous:
$F_Z(z) = P[\varphi(X)\le z] = \int_{A_z} f(x)\,dx$ with $A_z = \{x : \varphi(x)\le z\}$
Convolution:
- $Z := X+Y$: $f_Z(z) = \int_{-\infty}^\infty f_{X,Y}(x,\,z-x)\,dx \stackrel{X,Y\ge0}{=} \int_0^z f_{X,Y}(x,\,z-x)\,dx$
- $Z := |X-Y|$: $f_Z(z) = 2\int_0^\infty f_{X,Y}(x,\,z+x)\,dx$
- $Z := \frac{X}{Y}$: $f_Z(z) = \int_{-\infty}^\infty |x|f_{X,Y}(x,\,xz)\,dx \stackrel{X\perp Y}{=} \int_{-\infty}^\infty |x|f_X(x)f_Y(xz)\,dx$
4 Expectation

$E[X] = \mu_X = \int x\,dF_X(x) = \begin{cases}\sum_x xf_X(x) & X\text{ discrete}\\ \int xf_X(x)\,dx & X\text{ continuous}\end{cases}$
Properties:
- $P[X=c]=1 \Rightarrow E[X]=c$
- $E[cX] = c\,E[X]$
- $E[X+Y] = E[X]+E[Y]$
- $E[XY] = \iint xy\,f_{X,Y}(x,y)\,dx\,dy$
- In general, $E[\varphi(X)] \ne \varphi(E[X])$ (cf. Jensen inequality)
- $P[X\ge Y]=1 \Rightarrow E[X]\ge E[Y]$ and $P[X=Y]=1 \Rightarrow E[X]=E[Y]$
- $E[X] = \sum_{x=1}^\infty P[X\ge x]$ for integer-valued $X$
Sample mean: $\bar X_n = \frac1n\sum_{i=1}^n X_i$
For $Z = \varphi(X)$: $E[Z] = \int\varphi(x)\,dF_X(x)$; $E[I_A(x)] = \int_A dF_X(x) = P[X\in A]$
Conditional expectation:
- $E[Y\mid X=x] = \int yf(y\mid x)\,dy$
- $E[X] = E[E[X\mid Y]]$
- $E[\varphi(X,Y)\mid X=x] = \int\varphi(x,y)f_{Y\mid X}(y\mid x)\,dy$
- $E[Y+Z\mid X] = E[Y\mid X] + E[Z\mid X]$
- $E[\varphi(X)Y\mid X] = \varphi(X)E[Y\mid X]$
- $E[Y\mid X] = c \Rightarrow \mathrm{Cov}[X,Y] = 0$

5 Variance

$V[X] = \sigma_X^2 = E[(X-E[X])^2] = E[X^2] - E[X]^2$
$V\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n V[X_i] + 2\sum_{i\ne j}\mathrm{Cov}[X_i,X_j] = \sum_{i=1}^n V[X_i]$ if $X_i\perp X_j$
Standard deviation: $sd[X] = \sqrt{V[X]} = \sigma_X$
Covariance: $\mathrm{Cov}[X,Y] = E[(X-E[X])(Y-E[Y])] = E[XY] - E[X]E[Y]$
$\mathrm{Cov}\left[\sum_{i=1}^n X_i,\ \sum_{j=1}^m Y_j\right] = \sum_{i=1}^n\sum_{j=1}^m\mathrm{Cov}[X_i,Y_j]$
Correlation: $\rho[X,Y] = \frac{\mathrm{Cov}[X,Y]}{\sqrt{V[X]\,V[Y]}}$
Independence: $X\perp Y \Rightarrow \rho[X,Y]=0 \iff \mathrm{Cov}[X,Y]=0 \iff E[XY]=E[X]E[Y]$
Sample variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X_n)^2$
Conditional variance:
$V[Y\mid X] = E[(Y-E[Y\mid X])^2\mid X] = E[Y^2\mid X] - E[Y\mid X]^2$
$V[Y] = E[V[Y\mid X]] + V[E[Y\mid X]]$
6 Inequalities

Cauchy-Schwarz: $E[XY]^2 \le E[X^2]\,E[Y^2]$
Markov: $P[\varphi(X)\ge t] \le \frac{E[\varphi(X)]}{t}$
Chebyshev: $P[|X-E[X]|\ge t] \le \frac{V[X]}{t^2}$
Chernoff: $P[X\ge(1+\delta)\mu] \le \left(\frac{e^\delta}{(1+\delta)^{1+\delta}}\right)^\mu$, $\delta>-1$
Hoeffding: $X_1,\dots,X_n$ independent with $P[X_i\in[a_i,b_i]] = 1$ for $1\le i\le n$:
$P[\bar X - E[\bar X]\ge t] \le e^{-2nt^2}$, $t>0$
$P[|\bar X - E[\bar X]|\ge t] \le 2\exp\left\{-\frac{2n^2t^2}{\sum_{i=1}^n (b_i-a_i)^2}\right\}$, $t>0$
Jensen: $E[\varphi(X)] \ge \varphi(E[X])$ for convex $\varphi$

7 Distribution Relationships

Binomial
$X_i\sim\mathrm{Bern}(p) \Rightarrow \sum_{i=1}^n X_i\sim\mathrm{Bin}(n,p)$

Negative Binomial
$\mathrm{NBin}(1,p) = \mathrm{Geo}(p)$; $X_i\sim\mathrm{Geo}(p)$ independent $\Rightarrow \sum_{i=1}^r X_i\sim\mathrm{NBin}(r,p)$

Poisson
$X_i\sim\mathrm{Po}(\lambda_i) \wedge X_i\perp X_j \Rightarrow \sum_{i=1}^n X_i\sim\mathrm{Po}\left(\sum_{i=1}^n\lambda_i\right)$
$X_i\sim\mathrm{Po}(\lambda_i) \wedge X_i\perp X_j \Rightarrow X_i\,\Big|\,\sum_{j=1}^n X_j\sim\mathrm{Bin}\left(\sum_{j=1}^n X_j,\ \frac{\lambda_i}{\sum_{j=1}^n\lambda_j}\right)$

Exponential
$X_i\sim\mathrm{Exp}(\beta) \wedge X_i\perp X_j \Rightarrow \sum_{i=1}^n X_i\sim\mathrm{Gamma}(n,\beta)$

Normal
$X\sim N(\mu,\sigma^2) \Rightarrow \frac{X-\mu}{\sigma}\sim N(0,1)$
$X\sim N(\mu,\sigma^2) \wedge Z = aX+b \Rightarrow Z\sim N(a\mu+b,\,a^2\sigma^2)$
$X_i\sim N(\mu_i,\sigma_i^2) \wedge X_i\perp X_j \Rightarrow \sum_i X_i\sim N\left(\sum_i\mu_i,\ \sum_i\sigma_i^2\right)$
$P[a<X\le b] = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)$
$\Phi(-x) = 1-\Phi(x)$; $\phi'(x) = -x\phi(x)$; $\phi''(x) = (x^2-1)\phi(x)$
Upper quantile of $N(0,1)$: $z_\alpha = \Phi^{-1}(1-\alpha)$

Gamma
$X\sim\mathrm{Gamma}(\alpha,\beta) \iff X/\beta\sim\mathrm{Gamma}(\alpha,1)$
$\mathrm{Gamma}(\alpha,\beta) \sim \sum_{i=1}^\alpha\mathrm{Exp}(\beta)$ for integer $\alpha$
$X_i\sim\mathrm{Gamma}(\alpha_i,\beta) \wedge X_i\perp X_j \Rightarrow \sum_i X_i\sim\mathrm{Gamma}\left(\sum_i\alpha_i,\ \beta\right)$
$\Gamma(\alpha) = \int_0^\infty x^{\alpha-1}e^{-x}\,dx$

Beta
$f_X(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1} = \frac{1}{B(\alpha,\beta)}x^{\alpha-1}(1-x)^{\beta-1}$
$E[X^k] = \frac{B(\alpha+k,\beta)}{B(\alpha,\beta)} = \frac{\alpha+k-1}{\alpha+\beta+k-1}E[X^{k-1}]$
$\mathrm{Beta}(1,1)\sim\mathrm{Unif}(0,1)$
8 Probability and Moment Generating Functions

$G_X(t) = E[t^X]$, $|t|<1$
$M_X(t) = G_X(e^t) = E[e^{Xt}] = E\left[\sum_{i=0}^\infty\frac{(Xt)^i}{i!}\right] = \sum_{i=0}^\infty\frac{E[X^i]}{i!}t^i$
$P[X=0] = G_X(0)$; $P[X=1] = G_X'(0)$; $P[X=i] = \frac{G_X^{(i)}(0)}{i!}$
$E[X] = G_X'(1^-)$; $E[X^k] = M_X^{(k)}(0)$; $E\left[\frac{X!}{(X-k)!}\right] = G_X^{(k)}(1^-)$
$V[X] = G_X''(1^-) + G_X'(1^-) - \left(G_X'(1^-)\right)^2$
$G_X(t) = G_Y(t) \Rightarrow X \stackrel{d}{=} Y$

9 Multivariate Distributions

9.1 Standard Bivariate Normal

Let $X,Z\sim N(0,1)$ with $X\perp Z$ and $Y = \rho X + \sqrt{1-\rho^2}\,Z$.
Joint density:
$f(x,y) = \frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left\{-\frac{x^2+y^2-2\rho xy}{2(1-\rho^2)}\right\}$
Conditionals: $(Y\mid X=x)\sim N(\rho x,\,1-\rho^2)$ and $(X\mid Y=y)\sim N(\rho y,\,1-\rho^2)$
Independence: $X\perp Y \iff \rho = 0$

9.2 Bivariate Normal

Let $X\sim N(\mu_x,\sigma_x^2)$ and $Y\sim N(\mu_y,\sigma_y^2)$.
$f(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left\{-\frac{z}{2(1-\rho^2)}\right\}$
$z = \left(\frac{x-\mu_x}{\sigma_x}\right)^2 + \left(\frac{y-\mu_y}{\sigma_y}\right)^2 - 2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right)$
Conditional mean and variance:
$E[X\mid Y] = E[X] + \rho\frac{\sigma_X}{\sigma_Y}(Y-E[Y])$
$V[X\mid Y] = \sigma_X^2(1-\rho^2)$

9.3 Multivariate Normal

Covariance matrix $\Sigma$ (precision matrix $\Sigma^{-1}$):
$\Sigma = \begin{pmatrix}V[X_1] & \cdots & \mathrm{Cov}[X_1,X_k]\\ \vdots & \ddots & \vdots\\ \mathrm{Cov}[X_k,X_1] & \cdots & V[X_k]\end{pmatrix}$
If $X\sim N(\mu,\Sigma)$:
$f_X(x) = (2\pi)^{-n/2}|\Sigma|^{-1/2}\exp\left\{-\frac12(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}$
Properties:
- $Z\sim N(0,1) \wedge X = \mu+\Sigma^{1/2}Z \Rightarrow X\sim N(\mu,\Sigma)$
- $X\sim N(\mu,\Sigma) \Rightarrow \Sigma^{-1/2}(X-\mu)\sim N(0,1)$
- $X\sim N(\mu,\Sigma) \Rightarrow AX\sim N(A\mu,\,A\Sigma A^T)$
- $X\sim N(\mu,\Sigma) \wedge a$ a vector of length $k$ $\Rightarrow a^TX\sim N(a^T\mu,\,a^T\Sigma a)$
10 Convergence

Let $\{X_1,X_2,\dots\}$ be a sequence of rvs and let $X$ be another rv. Let $F_n$ denote the cdf of $X_n$ and let $F$ denote the cdf of $X$.

Types of convergence
1. In distribution (weakly, in law): $X_n\xrightarrow{D}X$ if $\lim_{n\to\infty}F_n(t) = F(t)$ at all $t$ where $F$ is continuous
2. In probability: $X_n\xrightarrow{P}X$ if $(\forall\varepsilon>0)\ \lim_{n\to\infty}P[|X_n-X|>\varepsilon] = 0$
3. Almost surely (strongly): $X_n\xrightarrow{as}X$ if $P\left[\lim_{n\to\infty}X_n = X\right] = P\left[\omega\in\Omega : \lim_{n\to\infty}X_n(\omega) = X(\omega)\right] = 1$
4. In quadratic mean ($L_2$): $X_n\xrightarrow{qm}X$ if $\lim_{n\to\infty}E[(X_n-X)^2] = 0$

Relationships
- $X_n\xrightarrow{qm}X \Rightarrow X_n\xrightarrow{P}X \Rightarrow X_n\xrightarrow{D}X$
- $X_n\xrightarrow{as}X \Rightarrow X_n\xrightarrow{P}X$
- $X_n\xrightarrow{D}X \wedge (\exists c\in\mathbb{R})\,P[X=c]=1 \Rightarrow X_n\xrightarrow{P}X$
- $X_n\xrightarrow{P}X \wedge Y_n\xrightarrow{P}Y \Rightarrow X_n+Y_n\xrightarrow{P}X+Y$
- $X_n\xrightarrow{qm}X \wedge Y_n\xrightarrow{qm}Y \Rightarrow X_n+Y_n\xrightarrow{qm}X+Y$
- $X_n\xrightarrow{P}X \wedge Y_n\xrightarrow{P}Y \Rightarrow X_nY_n\xrightarrow{P}XY$
- $X_n\xrightarrow{P}X \Rightarrow \varphi(X_n)\xrightarrow{P}\varphi(X)$
- $X_n\xrightarrow{D}X \Rightarrow \varphi(X_n)\xrightarrow{D}\varphi(X)$
- $X_n\xrightarrow{qm}b \iff \lim_{n\to\infty}E[X_n] = b \wedge \lim_{n\to\infty}V[X_n] = 0$
- $X_1,\dots,X_n$ iid $\wedge\ E[X]=\mu \wedge V[X]<\infty \Rightarrow \bar X_n\xrightarrow{qm}\mu$

Slutsky's theorem
- $X_n\xrightarrow{D}X$ and $Y_n\xrightarrow{P}c \Rightarrow X_n+Y_n\xrightarrow{D}X+c$
- $X_n\xrightarrow{D}X$ and $Y_n\xrightarrow{P}c \Rightarrow X_nY_n\xrightarrow{D}cX$
- In general: $X_n\xrightarrow{D}X$ and $Y_n\xrightarrow{D}Y$ does not imply $X_n+Y_n\xrightarrow{D}X+Y$

10.1 Law of Large Numbers (LLN)

Let $\{X_1,\dots,X_n\}$ be a sequence of iid rvs with $E[X_1] = \mu$.
Weak (WLLN): $\bar X_n\xrightarrow{P}\mu$ as $n\to\infty$
Strong (SLLN): $\bar X_n\xrightarrow{as}\mu$ as $n\to\infty$

10.2 Central Limit Theorem (CLT)

Let $\{X_1,\dots,X_n\}$ be iid with $E[X_1]=\mu$ and $V[X_1]=\sigma^2<\infty$. Then
$Z_n := \frac{\bar X_n-\mu}{\sqrt{V[\bar X_n]}} = \frac{\sqrt n(\bar X_n-\mu)}{\sigma}\xrightarrow{D}Z$ where $Z\sim N(0,1)$

CLT notations:
$Z_n\approx N(0,1)$; $\bar X_n\approx N\left(\mu,\frac{\sigma^2}{n}\right)$; $\bar X_n-\mu\approx N\left(0,\frac{\sigma^2}{n}\right)$; $\sqrt n(\bar X_n-\mu)\approx N(0,\sigma^2)$; $\frac{\sqrt n(\bar X_n-\mu)}{\sigma}\approx N(0,1)$

Continuity correction:
$P[\bar X_n\le x]\approx\Phi\left(\frac{x+\frac12-\mu}{\sigma/\sqrt n}\right)$, $P[\bar X_n\ge x]\approx1-\Phi\left(\frac{x-\frac12-\mu}{\sigma/\sqrt n}\right)$

Delta method:
$Y_n\approx N\left(\mu,\frac{\sigma^2}{n}\right) \Rightarrow \varphi(Y_n)\approx N\left(\varphi(\mu),\ (\varphi'(\mu))^2\frac{\sigma^2}{n}\right)$
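A quick Monte Carlo check of the CLT, as a minimal sketch that is not part of the original cookbook; the Exp(1) data and sample size are arbitrary choices.

```python
# Monte Carlo illustration of the CLT: standardized means of Exp(1) samples
# (mu = 1, sigma = 1) approach N(0, 1).
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 100_000
x = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - 1.0) / 1.0   # Z_n = sqrt(n)(X_bar - mu)/sigma

# P[Z_n <= 1] should be close to Phi(1) ~ 0.8413.
print((z <= 1.0).mean())
```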
11 Statistical Inference

Let $X_1,\dots,X_n\stackrel{iid}{\sim}F$ if not otherwise noted.

11.1 Point Estimation

Point estimator of $\theta$: $\hat\theta_n = g(X_1,\dots,X_n)$
$bias(\hat\theta_n) = E[\hat\theta_n] - \theta$
Consistency: $\hat\theta_n\xrightarrow{P}\theta$
Standard error: $se(\hat\theta_n) = \sqrt{V[\hat\theta_n]}$
Mean squared error: $mse = E[(\hat\theta_n-\theta)^2] = bias(\hat\theta_n)^2 + V[\hat\theta_n]$
Asymptotic normality: $\frac{\hat\theta_n-\theta}{se}\xrightarrow{D}N(0,1)$

11.2 Normal-Based Confidence Interval

Suppose $\hat\theta_n\approx N(\theta,\hat{se}^2)$. Let $z_{\alpha/2} = \Phi^{-1}(1-(\alpha/2))$, i.e., $P[Z>z_{\alpha/2}] = \alpha/2$ and $P[-z_{\alpha/2}<Z<z_{\alpha/2}] = 1-\alpha$ where $Z\sim N(0,1)$. Then
$C_n = \hat\theta_n \pm z_{\alpha/2}\hat{se}$

11.3 Empirical Distribution

Empirical distribution function (ECDF):
$\hat F_n(x) = \frac{\sum_{i=1}^n I(X_i\le x)}{n}$ where $I(X_i\le x) = 1$ if $X_i\le x$ and $0$ if $X_i>x$
Dvoretzky-Kiefer-Wolfowitz (DKW) inequality ($X_1,\dots,X_n\sim F$):
$P\left[\sup_x\left|F(x)-\hat F_n(x)\right|>\varepsilon\right]\le2e^{-2n\varepsilon^2}$

11.4 Statistical Functionals

Statistical functional: $T(F)$
Plug-in estimator of $\theta = T(F)$: $\hat\theta_n = T(\hat F_n)$
Linear functional: $T(F) = \int\varphi(x)\,dF_X(x)$
Plug-in estimator for linear functionals: $T(\hat F_n) = \int\varphi(x)\,d\hat F_n(x) = \frac1n\sum_{i=1}^n\varphi(X_i)$
Often: $T(\hat F_n)\approx N(T(F),\hat{se}^2) \Rightarrow T(\hat F_n)\pm z_{\alpha/2}\hat{se}$
12 Parametric Inference

Let $\mathfrak{F} = \{f(x;\theta) : \theta\in\Theta\}$ be a parametric model with parameter space $\Theta\subset\mathbb{R}^k$ and parameter $\theta = (\theta_1,\dots,\theta_k)$.

12.1 Method of Moments

$j$th moment: $\alpha_j(\theta) = E[X^j] = \int x^j\,dF_X(x)$
$j$th sample moment: $\hat\alpha_j = \frac1n\sum_{i=1}^n X_i^j$
Method of moments estimator $\hat\theta_n$: the solution of $\alpha_j(\hat\theta_n) = \hat\alpha_j$ for $j = 1,\dots,k$.
Asymptotic normality:
$\sqrt n(\hat\theta-\theta)\xrightarrow{D}N(0,\Sigma)$
where $\Sigma = gE[YY^T]g^T$, $Y = (X,X^2,\dots,X^k)^T$, $g = (g_1,\dots,g_k)$ and $g_j = \frac{\partial\alpha_j^{-1}(\theta)}{\partial\theta}$
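A minimal sketch of the method of moments for the Gamma(alpha, beta) (shape/scale) family; this example is not part of the original cookbook. Matching $\alpha_1 = \alpha\beta$ and $\alpha_2 = \alpha\beta^2 + (\alpha\beta)^2$ gives closed-form estimators.

```python
# Method of moments for Gamma(alpha, beta):
# beta_hat = (a2 - a1^2)/a1 and alpha_hat = a1^2/(a2 - a1^2).
import numpy as np

rng = np.random.default_rng(1)
x = rng.gamma(shape=3.0, scale=2.0, size=50_000)

a1 = np.mean(x)        # first sample moment
a2 = np.mean(x**2)     # second sample moment
beta_hat = (a2 - a1**2) / a1
alpha_hat = a1**2 / (a2 - a1**2)
print(alpha_hat, beta_hat)   # close to (3, 2)
```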
12.2 Maximum Likelihood

Likelihood: $\mathcal{L}_n : \Theta\to[0,\infty)$, $\mathcal{L}_n(\theta) = \prod_{i=1}^n f(X_i;\theta)$
Log-likelihood: $\ell_n(\theta) = \log\mathcal{L}_n(\theta) = \sum_{i=1}^n\log f(X_i;\theta)$
Maximum likelihood estimator (mle) $\hat\theta_n$: $\mathcal{L}_n(\hat\theta_n) = \sup_\theta\mathcal{L}_n(\theta)$
Score function: $s(X;\theta) = \frac{\partial}{\partial\theta}\log f(X;\theta)$
Fisher information: $I(\theta) = V_\theta[s(X;\theta)]$, $I_n(\theta) = nI(\theta)$
Observed Fisher information: $I_n^{obs}(\theta) = -\frac{\partial^2}{\partial\theta^2}\sum_{i=1}^n\log f(X_i;\theta)$
Properties of the mle:
- Consistency: $\hat\theta_n\xrightarrow{P}\theta$
- Equivariance: $\hat\theta_n$ is the mle $\Rightarrow \varphi(\hat\theta_n)$ is the mle of $\varphi(\theta)$
- Asymptotic normality:
  1. $se\approx\sqrt{1/I_n(\theta)}$ and $\frac{\hat\theta_n-\theta}{se}\xrightarrow{D}N(0,1)$
  2. $\hat{se}\approx\sqrt{1/I_n(\hat\theta_n)}$ and $\frac{\hat\theta_n-\theta}{\hat{se}}\xrightarrow{D}N(0,1)$
- Asymptotic optimality (or efficiency), i.e., smallest variance for large samples. If $\tilde\theta_n$ is any other estimator, the asymptotic relative efficiency is $are(\tilde\theta_n,\hat\theta_n) = \frac{V[\hat\theta_n]}{V[\tilde\theta_n]}\le1$

12.2.1 Delta Method

If $\tau = \varphi(\theta)$ where $\varphi$ is differentiable and $\varphi'(\theta)\ne0$:
$\frac{\hat\tau_n-\tau}{\hat{se}(\hat\tau)}\xrightarrow{D}N(0,1)$
where $\hat\tau = \varphi(\hat\theta)$ is the mle of $\tau$ and $\hat{se}(\hat\tau) = |\varphi'(\hat\theta)|\,\hat{se}(\hat\theta_n)$
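A minimal sketch of numerical maximum likelihood (not part of the original cookbook): minimize the negative log-likelihood for Po(lambda) and compare with the closed-form mle $\bar x$. The data and optimizer bounds are arbitrary choices.

```python
# Numerical mle for Po(lambda) via scipy, checked against x_bar.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.poisson(lam=4.2, size=1_000)

def nll(lam):
    # -l_n(lambda) up to the additive constant sum(log x_i!)
    return -(np.sum(x) * np.log(lam) - len(x) * lam)

res = minimize_scalar(nll, bounds=(1e-9, 50.0), method="bounded")
print(res.x, x.mean())   # the two estimates agree
```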
12.3 Multiparameter Models

Let $\theta = (\theta_1,\dots,\theta_k)$ and let $\hat\theta = (\hat\theta_1,\dots,\hat\theta_k)$ be the mle.
$H_{jk} = \frac{\partial^2\ell_n}{\partial\theta_j\partial\theta_k}$
Fisher information matrix:
$I_n(\theta) = -\begin{pmatrix}E[H_{11}] & \cdots & E[H_{1k}]\\ \vdots & \ddots & \vdots\\ E[H_{k1}] & \cdots & E[H_{kk}]\end{pmatrix}$

12.3.1 Multiparameter Delta Method

Let $\tau = \varphi(\theta_1,\dots,\theta_k)$ with gradient $\nabla\varphi = \left(\frac{\partial\varphi}{\partial\theta_1},\dots,\frac{\partial\varphi}{\partial\theta_k}\right)^T$.
Suppose $\nabla\varphi\big|_{\theta=\hat\theta}\ne0$ and $\hat\tau = \varphi(\hat\theta)$. Then
$\frac{\hat\tau-\tau}{\hat{se}(\hat\tau)}\xrightarrow{D}N(0,1)$
where $\hat{se}(\hat\tau) = \sqrt{\left(\hat\nabla\varphi\right)^T\hat J_n\left(\hat\nabla\varphi\right)}$, $\hat J_n = J_n(\hat\theta)$ with $J_n = I_n^{-1}(\theta)$, and $\hat\nabla\varphi = \nabla\varphi$ evaluated at $\theta = \hat\theta$.

12.4 Parametric Bootstrap

Sample from $f(x;\hat\theta_n)$ instead of from $\hat F_n$, where $\hat\theta_n$ could be the mle or method of moments estimator.

13 Hypothesis Testing

$H_0 : \theta\in\Theta_0$ versus $H_1 : \theta\in\Theta_1$

Definitions:
- Null hypothesis $H_0$, alternative hypothesis $H_1$
- Simple hypothesis: $\theta = \theta_0$; composite hypothesis: $\theta>\theta_0$ or $\theta<\theta_0$
- Two-sided test: $H_0 : \theta = \theta_0$ versus $H_1 : \theta\ne\theta_0$
- One-sided test: $H_0 : \theta\le\theta_0$ versus $H_1 : \theta>\theta_0$
- Critical value $c$; test statistic $T$; rejection region $R = \{x : T(x)>c\}$
- Power function $\beta(\theta) = P[X\in R]$
- Power of a test: $1 - P[\text{Type II error}] = 1-\beta = \inf_{\theta\in\Theta_1}\beta(\theta)$

Outcomes:
           | Retain H0             | Reject H0
H0 true    | correct ($1-\alpha$)  | Type I error ($\alpha$)
H1 true    | Type II error ($\beta$) | correct (power)

p-value:
p-value $= \sup_{\theta\in\Theta_0}P_\theta[T(X)\ge T(x)] = \inf\{\alpha : T(x)\in R_\alpha\}$
p-value $= \sup_{\theta\in\Theta_0}P[T(X^\star)\ge T(X)]$ (since $T(X^\star)\sim\hat F$) $= \inf\{\alpha : T(X)\in R_\alpha\}$
p-value $= 1 - F(T(X))$

Evidence scale:
p-value $<0.01$: very strong evidence against $H_0$
$0.01$ to $0.05$: strong evidence against $H_0$
$0.05$ to $0.1$: weak evidence against $H_0$
$>0.1$: little or no evidence against $H_0$

Wald test
Two-sided test; reject $H_0$ when $|W|>z_{\alpha/2}$ where $W = \frac{\hat\theta-\theta_0}{\hat{se}}$
$P[|W|>z_{\alpha/2}]\to\alpha$
p-value $= P_{\theta_0}[|W|>|w|]\approx P[|Z|>|w|] = 2\Phi(-|w|)$
Likelihood ratio test (LRT)
$T(X) = \frac{\sup_{\theta\in\Theta}\mathcal{L}_n(\theta)}{\sup_{\theta\in\Theta_0}\mathcal{L}_n(\theta)} = \frac{\mathcal{L}_n(\hat\theta_n)}{\mathcal{L}_n(\hat\theta_{n,0})}$
$\lambda(X) = 2\log T(X)\xrightarrow{D}\chi^2_{r-q}$, where $\sum_{i=1}^k Z_i^2\sim\chi^2_k$ for $Z_1,\dots,Z_k\stackrel{iid}{\sim}N(0,1)$
p-value $= P_{\theta_0}[\lambda(X)>\lambda(x)]\approx P[\chi^2_{r-q}>\lambda(x)]$

Multinomial LRT
mle: $\hat p_n = \left(\frac{X_1}{n},\dots,\frac{X_k}{n}\right)$
$T(X) = \frac{\mathcal{L}_n(\hat p_n)}{\mathcal{L}_n(p_0)} = \prod_{j=1}^k\left(\frac{\hat p_j}{p_{0j}}\right)^{X_j}$
$\lambda(X) = 2\sum_{j=1}^k X_j\log\left(\frac{\hat p_j}{p_{0j}}\right)\xrightarrow{D}\chi^2_{k-1}$

Pearson chi-square test
$T = \sum_{j=1}^k\frac{(X_j-E[X_j])^2}{E[X_j]}$ where $E[X_j] = np_{0j}$ under $H_0$
$T\xrightarrow{D}\chi^2_{k-1}$; p-value $= P[\chi^2_{k-1}>T(x)]$
Converges in distribution faster than the LRT, hence preferable for small $n$.

Independence testing
For data cross-tabulated with $I$ rows and $J$ columns, the LRT and Pearson statistics compare the observed counts $X_{ij}$ with the counts expected under independence; both are asymptotically $\chi^2_\nu$ with $\nu = (I-1)(J-1)$.

14 Exponential Family

Scalar parameter:
$f_X(x\mid\theta) = h(x)\exp\{\eta(\theta)T(x) - A(\theta)\} = h(x)g(\theta)\exp\{\eta(\theta)T(x)\}$
Vector parameter:
$f_X(x\mid\theta) = h(x)\exp\left\{\sum_{i=1}^s\eta_i(\theta)T_i(x) - A(\theta)\right\}$
Natural form:
$f_X(x\mid\eta) = h(x)\exp\{\eta^TT(x) - A(\eta)\}$

15 Bayesian Inference

Bayes' theorem:
$f(\theta\mid x) = \frac{f(x\mid\theta)f(\theta)}{f(x)} = \frac{f(x\mid\theta)f(\theta)}{\int f(x\mid\theta)f(\theta)\,d\theta}\propto\mathcal{L}_n(\theta)f(\theta)$

Definitions:
- $X^n = (X_1,\dots,X_n)$, $x^n = (x_1,\dots,x_n)$
- Prior density $f(\theta)$
- Likelihood $f(x^n\mid\theta)$: joint density of the data; in particular, $X^n$ iid $\Rightarrow f(x^n\mid\theta) = \prod_{i=1}^n f(x_i\mid\theta) = \mathcal{L}_n(\theta)$
- Posterior density $f(\theta\mid x^n)$
- Normalizing constant $c_n = f(x^n) = \int f(x\mid\theta)f(\theta)\,d\theta$
- Kernel: part of a density that depends on $\theta$
- Posterior mean $\bar\theta_n = \int\theta f(\theta\mid x^n)\,d\theta = \frac{\int\theta\mathcal{L}_n(\theta)f(\theta)\,d\theta}{\int\mathcal{L}_n(\theta)f(\theta)\,d\theta}$

15.1 Credible Intervals

Posterior interval: $P[\theta\in(a,b)\mid x^n] = \int_a^b f(\theta\mid x^n)\,d\theta = 1-\alpha$
Equal-tail credible interval: $\int_{-\infty}^a f(\theta\mid x^n)\,d\theta = \int_b^\infty f(\theta\mid x^n)\,d\theta = \alpha/2$
Highest posterior density (HPD) region $R_n$:
1. $P[\theta\in R_n] = 1-\alpha$
2. $R_n = \{\theta : f(\theta\mid x^n)>k\}$ for some $k$
$R_n$ is unimodal $\Rightarrow R_n$ is an interval.
15.2 Function of Parameters

Let $\tau = \varphi(\theta)$ and $A = \{\theta : \varphi(\theta)\le\tau\}$.
Posterior CDF for $\tau$: $H(\tau\mid x^n) = P[\varphi(\theta)\le\tau\mid x^n] = \int_A f(\theta\mid x^n)\,d\theta$
Posterior density: $h(\tau\mid x^n) = H'(\tau\mid x^n)$

15.3 Priors

Choice:
- Subjective bayesianism
- Objective bayesianism
- Robust bayesianism

Types:
- Flat: $f(\theta)\propto$ constant
- Proper: $\int f(\theta)\,d\theta = 1$
- Improper: $\int f(\theta)\,d\theta = \infty$
- Jeffreys' prior (transformation-invariant): $f(\theta)\propto\sqrt{I(\theta)}$ and, in the multiparameter case, $f(\theta)\propto\sqrt{\det(I(\theta))}$

15.3.1 Conjugate Priors

Continuous likelihoods (posterior hyperparameters for $n$ observations $x_1,\dots,x_n$, with $\bar x$ the sample mean):
- $\mathrm{Unif}(0,\theta)$; prior $\mathrm{Pareto}(x_m,k)$; posterior $\mathrm{Pareto}\left(\max\{x_{(n)},x_m\},\ k+n\right)$
- $\mathrm{Exp}(\lambda)$; prior $\mathrm{Gamma}(\alpha,\beta)$; posterior $\mathrm{Gamma}\left(\alpha+n,\ \beta+\sum_{i=1}^n x_i\right)$
- $N(\mu,\sigma_c^2)$, known variance; prior $N(\mu_0,\sigma_0^2)$; posterior $N\left(\left(\frac{\mu_0}{\sigma_0^2}+\frac{\sum_{i=1}^n x_i}{\sigma_c^2}\right)\Big/\left(\frac{1}{\sigma_0^2}+\frac{n}{\sigma_c^2}\right),\ \left(\frac{1}{\sigma_0^2}+\frac{n}{\sigma_c^2}\right)^{-1}\right)$
- $N(\mu_c,\sigma^2)$, known mean; prior Scaled Inverse Chi-square$(\nu,\sigma_0^2)$; posterior Scaled Inverse Chi-square$\left(\nu+n,\ \frac{\nu\sigma_0^2+\sum_{i=1}^n(x_i-\mu_c)^2}{\nu+n}\right)$
- $N(\mu,\sigma^2)$; prior Normal-scaled Inverse Gamma$(\mu_0,\nu,\alpha,\beta)$; posterior $\left(\frac{\nu\mu_0+n\bar x}{\nu+n},\ \nu+n,\ \alpha+\frac n2,\ \beta+\frac12\sum_{i=1}^n(x_i-\bar x)^2+\frac{n\nu(\bar x-\mu_0)^2}{2(n+\nu)}\right)$
- $\mathrm{MVN}(\mu,\Sigma_c)$; prior $\mathrm{MVN}(\mu_0,\Sigma_0)$; posterior $\mathrm{MVN}\left(\left(\Sigma_0^{-1}+n\Sigma_c^{-1}\right)^{-1}\left(\Sigma_0^{-1}\mu_0+n\Sigma_c^{-1}\bar x\right),\ \left(\Sigma_0^{-1}+n\Sigma_c^{-1}\right)^{-1}\right)$
- $\mathrm{MVN}(\mu_c,\Sigma)$; prior InverseWishart$(\nu,\Psi)$; posterior InverseWishart$\left(n+\nu,\ \Psi+\sum_{i=1}^n(x_i-\mu_c)(x_i-\mu_c)^T\right)$
- $\mathrm{Pareto}(x_{mc},k)$, known minimum; prior $\mathrm{Gamma}(\alpha,\beta)$; posterior $\mathrm{Gamma}\left(\alpha+n,\ \beta+\sum_{i=1}^n\log\frac{x_i}{x_{mc}}\right)$
- $\mathrm{Pareto}(x_m,k_c)$, known index; prior $\mathrm{Pareto}(x_0,k_0)$; posterior $\mathrm{Pareto}(x_0,\ k_0-kn)$ where $k_0>kn$
- $\mathrm{Gamma}(\alpha_c,\beta)$, known shape; prior $\mathrm{Gamma}(\alpha_0,\beta_0)$; posterior $\mathrm{Gamma}\left(\alpha_0+n\alpha_c,\ \beta_0+\sum_{i=1}^n x_i\right)$

Discrete likelihoods:
- $\mathrm{Bern}(p)$; prior $\mathrm{Beta}(\alpha,\beta)$; posterior $\mathrm{Beta}\left(\alpha+\sum_{i=1}^n x_i,\ \beta+n-\sum_{i=1}^n x_i\right)$
- $\mathrm{Bin}(p)$; prior $\mathrm{Beta}(\alpha,\beta)$; posterior $\mathrm{Beta}\left(\alpha+\sum_{i=1}^n x_i,\ \beta+\sum_{i=1}^n N_i-\sum_{i=1}^n x_i\right)$
- $\mathrm{NBin}(p)$; prior $\mathrm{Beta}(\alpha,\beta)$; posterior $\mathrm{Beta}\left(\alpha+rn,\ \beta+\sum_{i=1}^n x_i\right)$
- $\mathrm{Po}(\lambda)$; prior $\mathrm{Gamma}(\alpha,\beta)$; posterior $\mathrm{Gamma}\left(\alpha+\sum_{i=1}^n x_i,\ \beta+n\right)$
- $\mathrm{Multinomial}(p)$; prior $\mathrm{Dir}(\alpha)$; posterior $\mathrm{Dir}\left(\alpha+\sum_{i=1}^n x^{(i)}\right)$
- $\mathrm{Geo}(p)$; prior $\mathrm{Beta}(\alpha,\beta)$; posterior $\mathrm{Beta}\left(\alpha+n,\ \beta+\sum_{i=1}^n x_i\right)$
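A minimal sketch of the first row of the discrete table (not part of the original cookbook): the Beta-Bernoulli conjugate update. The prior hyperparameters and data-generating value of p are arbitrary choices.

```python
# Bern(p) likelihood with Beta(alpha, beta) prior gives the posterior
# Beta(alpha + sum(x), beta + n - sum(x)).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.binomial(1, 0.7, size=100)          # Bernoulli(0.7) data

alpha, beta = 2.0, 2.0                      # prior hyperparameters
alpha_post = alpha + x.sum()
beta_post = beta + len(x) - x.sum()

posterior = stats.beta(alpha_post, beta_post)
print(posterior.mean())                     # posterior mean of p
print(posterior.interval(0.95))             # 95% equal-tail credible interval
```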
15.4 Bayesian Testing

If $H_0 : \theta\in\Theta_0$:
Prior probability $P[H_0] = \int_{\Theta_0}f(\theta)\,d\theta$
Posterior probability $P[H_0\mid x^n] = \int_{\Theta_0}f(\theta\mid x^n)\,d\theta$
Marginal likelihood: $f(x^n\mid H_i) = \int f(x^n\mid\theta,H_i)f(\theta\mid H_i)\,d\theta$
Posterior odds (of $H_i$ relative to $H_j$):
$\frac{P[H_i\mid x^n]}{P[H_j\mid x^n]} = \underbrace{\frac{f(x^n\mid H_i)}{f(x^n\mid H_j)}}_{\text{Bayes factor }BF_{ij}}\cdot\underbrace{\frac{P[H_i]}{P[H_j]}}_{\text{prior odds}}$
Posterior probability of $H_1$ with prior $p = P[H_1]$:
$p^* = \frac{\frac{p}{1-p}BF_{10}}{1+\frac{p}{1-p}BF_{10}}$
Interpretation of the Bayes factor:
$\log_{10}BF_{10}$ in 0 to 0.5: weak evidence; 0.5 to 1: moderate; 1 to 2: strong; above 2: decisive

16 Sampling Methods

16.1 Inverse Transform Sampling

Setup: $U\sim\mathrm{Unif}(0,1)$, $X\sim F$, $F^{-1}(u) = \inf\{x\mid F(x)\ge u\}$
Algorithm:
1. Generate $u\sim\mathrm{Unif}(0,1)$
2. Compute $x = F^{-1}(u)$; then $X = F^{-1}(U)\sim F$
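A minimal sketch of inverse transform sampling for Exp(beta) (not part of the original cookbook): since $F(x) = 1-e^{-x/\beta}$, we have $F^{-1}(u) = -\beta\log(1-u)$.

```python
# Inverse transform sampling for Exp(beta).
import numpy as np

rng = np.random.default_rng(4)
beta = 2.0
u = rng.uniform(size=100_000)     # step 1: u ~ Unif(0,1)
x = -beta * np.log1p(-u)          # step 2: x = F^{-1}(u)

print(x.mean(), x.var())          # close to beta = 2 and beta^2 = 4
```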
16.2 The Bootstrap

Let $T_n = g(X_1,\dots,X_n)$ be a statistic.
(a) Repeat the following $B$ times to get $T^*_{n,1},\dots,T^*_{n,B}$, an iid sample from the sampling distribution implied by $\hat F_n$:
  i. Sample uniformly (with replacement) $X^*_1,\dots,X^*_n\sim\hat F_n$
  ii. Compute $T^*_n = g(X^*_1,\dots,X^*_n)$
(b) Then
$v_{boot} = \hat V_{\hat F_n} = \frac1B\sum_{b=1}^B\left(T^*_{n,b}-\frac1B\sum_{r=1}^BT^*_{n,r}\right)^2$

16.2.1 Bootstrap Confidence Intervals

Normal-based interval: $T_n\pm z_{\alpha/2}\hat{se}_{boot}$
Pivotal interval:
1. Location parameter $\theta = T(F)$
2. Pivot $R_n = \hat\theta_n-\theta$
3. Let $H(r) = P[R_n\le r]$ be the cdf of $R_n$
4. Let $R^*_{n,b} = \hat\theta^*_{n,b}-\hat\theta_n$; approximate $H$ using the bootstrap: $\hat H(r) = \frac1B\sum_{b=1}^B I(R^*_{n,b}\le r)$
5. Let $\theta^*_\beta$ denote the $\beta$ sample quantile of $(\hat\theta^*_{n,1},\dots,\hat\theta^*_{n,B})$
6. $r^*_\beta$ = $\beta$ sample quantile of $(R^*_{n,1},\dots,R^*_{n,B})$, i.e., $r^*_\beta = \theta^*_\beta-\hat\theta_n$
7. Approximate $1-\alpha$ confidence interval $C_n = (\hat a,\hat b)$ where
$\hat a = \hat\theta_n-\hat H^{-1}\left(1-\frac\alpha2\right) = \hat\theta_n-r^*_{1-\alpha/2} = 2\hat\theta_n-\theta^*_{1-\alpha/2}$
$\hat b = \hat\theta_n-\hat H^{-1}\left(\frac\alpha2\right) = \hat\theta_n-r^*_{\alpha/2} = 2\hat\theta_n-\theta^*_{\alpha/2}$
Percentile interval: $C_n = \left(\theta^*_{\alpha/2},\ \theta^*_{1-\alpha/2}\right)$
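A minimal sketch of the bootstrap variance estimate and the normal-based interval for the sample median (not part of the original cookbook); sample size and B are arbitrary choices.

```python
# Bootstrap standard error of the median, plus T_n +/- z_{alpha/2} se_boot.
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(200)
t_n = np.median(x)

B = 5_000
t_star = np.array([np.median(rng.choice(x, size=len(x), replace=True))
                   for _ in range(B)])
se_boot = t_star.std(ddof=0)           # sqrt(v_boot)

z = 1.959963984540054                  # z_{0.025}
print(t_n - z * se_boot, t_n + z * se_boot)
```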
16.3 Rejection Sampling

Setup: the target density $f$ is known up to proportionality, $f(\theta)\propto k(\theta)$; choose an envelope $g$ and a constant $M$ with $k(\theta)\le Mg(\theta)$.
Algorithm:
1. Draw $\theta^{cand}\sim g(\theta)$
2. Generate $u\sim\mathrm{Unif}(0,1)$
3. Accept $\theta^{cand}$ if $u\le\frac{k(\theta^{cand})}{Mg(\theta^{cand})}$
4. Repeat until $B$ values of $\theta^{cand}$ have been accepted
Example (posterior sampling): we can easily sample from the prior, $g(\theta) = f(\theta)$; the target is the posterior, $h(\theta)\propto k(\theta) = f(x^n\mid\theta)f(\theta)$; the envelope condition holds with $M = \mathcal{L}_n(\hat\theta_n)$ since $f(x^n\mid\theta)\le f(x^n\mid\hat\theta_n) = \mathcal{L}_n(\hat\theta_n)$. Then:
1. Draw $\theta^{cand}\sim f(\theta)$
2. Accept $\theta^{cand}$ if $u\le\mathcal{L}_n(\theta^{cand})/\mathcal{L}_n(\hat\theta_n)$
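A minimal sketch of rejection sampling (not part of the original cookbook): the target is the Beta(2,2) density $k(x) = 6x(1-x)$ on $[0,1]$, the envelope is Unif(0,1), and $M = 1.5$ is the maximum of $k$, so $k(x)\le Mg(x)$.

```python
# Rejection sampling for Beta(2,2) with a uniform envelope.
import numpy as np

rng = np.random.default_rng(6)
M = 1.5
samples = []
while len(samples) < 10_000:
    cand = rng.uniform()                       # 1. draw from g
    u = rng.uniform()                          # 2. u ~ Unif(0,1)
    if u <= 6 * cand * (1 - cand) / M:         # 3. accept w.p. k/(M g)
        samples.append(cand)                   # 4. repeat until B accepted

samples = np.array(samples)
print(samples.mean(), samples.var())           # close to 1/2 and 1/20
```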
16.4 Importance Sampling

Sample from an importance function $g$ rather than from the target density $h$. Algorithm to obtain an approximation to $E[q(\theta)\mid x^n]$:
1. Sample from the prior: $\theta_1,\dots,\theta_B\stackrel{iid}{\sim}f(\theta)$
2. Compute the normalized weights $w_i = \frac{\mathcal{L}_n(\theta_i)}{\sum_{j=1}^B\mathcal{L}_n(\theta_j)}$
3. $E[q(\theta)\mid x^n]\approx\sum_{i=1}^B q(\theta_i)w_i$
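A minimal sketch of this algorithm for Bernoulli data with a Unif(0,1) prior (not part of the original cookbook): the prior serves as importance function and the weights are normalized likelihoods; working on the log scale avoids underflow.

```python
# Importance sampling approximation of the posterior mean E[p | x^n].
import numpy as np

rng = np.random.default_rng(7)
x = rng.binomial(1, 0.3, size=50)
s, n = x.sum(), len(x)

p = rng.uniform(size=100_000)                      # 1. sample from the prior
log_lik = s * np.log(p) + (n - s) * np.log(1 - p)  # log L_n(p_i)
w = np.exp(log_lik - log_lik.max())
w /= w.sum()                                       # 2. normalized weights

print(np.sum(p * w))               # 3. ~ exact posterior mean (s+1)/(n+2)
print((s + 1) / (n + 2))
```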
17 Decision Theory

Definitions:
- Decision rule: synonymous with an estimator $\hat\theta$
- Action $a = \hat\theta(x)$: the estimated value of $\theta$
- Loss function $L(\theta,a)$: the cost of taking action $a$ when the true value is $\theta$; common choices are squared error loss $L(\theta,a) = (\theta-a)^2$, absolute error loss $L(\theta,a) = |\theta-a|$, and zero-one loss $L(\theta,a) = I(\theta\ne a)$

17.1 Risk

Posterior risk:
$r(\hat\theta\mid x) = \int L(\theta,\hat\theta(x))f(\theta\mid x)\,d\theta = E_{\theta\mid X}\left[L(\theta,\hat\theta(x))\right]$
(Frequentist) risk:
$R(\theta,\hat\theta) = \int L(\theta,\hat\theta(x))f(x\mid\theta)\,dx = E_{X\mid\theta}\left[L(\theta,\hat\theta(X))\right]$
Bayes risk:
$r(f,\hat\theta) = \iint L(\theta,\hat\theta(x))f(x,\theta)\,dx\,d\theta = E_{\theta,X}\left[L(\theta,\hat\theta(X))\right]$
$r(f,\hat\theta) = E_\theta\left[E_{X\mid\theta}\left[L(\theta,\hat\theta(X))\right]\right] = E_\theta\left[R(\theta,\hat\theta)\right]$
$r(f,\hat\theta) = E_X\left[E_{\theta\mid X}\left[L(\theta,\hat\theta(X))\right]\right] = E_X\left[r(\hat\theta\mid X)\right]$
17.2 Admissibility

$\hat\theta'$ dominates $\hat\theta$ if
$\forall\theta : R(\theta,\hat\theta')\le R(\theta,\hat\theta)$ and $\exists\theta : R(\theta,\hat\theta')<R(\theta,\hat\theta)$
$\hat\theta$ is inadmissible if there is at least one other estimator $\hat\theta'$ that dominates it. Otherwise it is called admissible.

17.3 Bayes Rule

Bayes rule (or Bayes estimator): $r(f,\hat\theta) = \inf_{\tilde\theta}r(f,\tilde\theta)$
$\hat\theta(x) = \inf r(\hat\theta\mid x)\ \forall x \Rightarrow r(f,\hat\theta) = \int r(\hat\theta\mid x)f(x)\,dx$
Theorems:
- Squared error loss: posterior mean
- Absolute error loss: posterior median
- Zero-one loss: posterior mode

17.4 Minimax Rules

Maximum risk: $\bar R(\hat\theta) = \sup_\theta R(\theta,\hat\theta)$, $\bar R(a) = \sup_\theta R(\theta,a)$
Minimax rule: $\sup_\theta R(\theta,\hat\theta) = \inf_{\tilde\theta}\bar R(\tilde\theta) = \inf_{\tilde\theta}\sup_\theta R(\theta,\tilde\theta)$
$\hat\theta$ = Bayes rule $\wedge\ \exists c : R(\theta,\hat\theta) = c \Rightarrow \hat\theta$ is minimax
Least favorable prior: $\hat\theta^f$ = Bayes rule $\wedge\ R(\theta,\hat\theta^f)\le r(f,\hat\theta^f)\ \forall\theta \Rightarrow \hat\theta^f$ is minimax
18 Linear Regression

Definitions:
- Response variable $Y$
- Covariate $X$ (aka predictor variable or feature)

18.1 Simple Linear Regression

Model: $Y_i = \beta_0+\beta_1X_i+\epsilon_i$ with $E[\epsilon_i\mid X_i] = 0$, $V[\epsilon_i\mid X_i] = \sigma^2$
Fitted line: $\hat r(x) = \hat\beta_0+\hat\beta_1x$
Predicted (fitted) values: $\hat Y_i = \hat r(X_i)$
Residuals: $\hat\epsilon_i = Y_i-\hat Y_i = Y_i-(\hat\beta_0+\hat\beta_1X_i)$
Residual sum of squares: $rss(\hat\beta_0,\hat\beta_1) = \sum_{i=1}^n\hat\epsilon_i^2$
Least squares estimates (minimizing $rss$):
$\hat\beta_0 = \bar Y_n-\hat\beta_1\bar X_n$
$\hat\beta_1 = \frac{\sum_{i=1}^n(X_i-\bar X_n)(Y_i-\bar Y_n)}{\sum_{i=1}^n(X_i-\bar X_n)^2} = \frac{\sum_{i=1}^nX_iY_i-n\bar X\bar Y}{\sum_{i=1}^nX_i^2-n\bar X^2}$
$E[\hat\beta\mid X^n] = (\beta_0,\beta_1)^T$
$V[\hat\beta\mid X^n] = \frac{\sigma^2}{ns_X^2}\begin{pmatrix}\frac1n\sum_{i=1}^nX_i^2 & -\bar X_n\\ -\bar X_n & 1\end{pmatrix}$
$\hat{se}(\hat\beta_0) = \frac{\hat\sigma}{s_X\sqrt n}\sqrt{\frac{\sum_{i=1}^nX_i^2}{n}}$, $\hat{se}(\hat\beta_1) = \frac{\hat\sigma}{s_X\sqrt n}$
where $s_X^2 = \frac1n\sum_{i=1}^n(X_i-\bar X_n)^2$ and $\hat\sigma^2 = \frac{1}{n-2}\sum_{i=1}^n\hat\epsilon_i^2$ (unbiased estimate).
Further properties:
- Consistency: $\hat\beta_0\xrightarrow{P}\beta_0$ and $\hat\beta_1\xrightarrow{P}\beta_1$
- Asymptotic normality: $\frac{\hat\beta_0-\beta_0}{\hat{se}(\hat\beta_0)}\xrightarrow{D}N(0,1)$ and $\frac{\hat\beta_1-\beta_1}{\hat{se}(\hat\beta_1)}\xrightarrow{D}N(0,1)$
- Approximate $1-\alpha$ confidence intervals: $\hat\beta_0\pm z_{\alpha/2}\hat{se}(\hat\beta_0)$ and $\hat\beta_1\pm z_{\alpha/2}\hat{se}(\hat\beta_1)$
$R^2 = \frac{\sum_{i=1}^n(\hat Y_i-\bar Y)^2}{\sum_{i=1}^n(Y_i-\bar Y)^2} = 1-\frac{\sum_{i=1}^n\hat\epsilon_i^2}{\sum_{i=1}^n(Y_i-\bar Y)^2} = 1-\frac{rss}{tss}$
Likelihood:
$\mathcal{L} = \prod_{i=1}^nf(X_i,Y_i) = \prod_{i=1}^nf_X(X_i)\prod_{i=1}^nf_{Y\mid X}(Y_i\mid X_i) = \mathcal{L}_1\mathcal{L}_2$
$\mathcal{L}_1 = \prod_{i=1}^nf_X(X_i)$
$\mathcal{L}_2 = \prod_{i=1}^nf_{Y\mid X}(Y_i\mid X_i)\propto\sigma^{-n}\exp\left\{-\frac{1}{2\sigma^2}\sum_i\left(Y_i-(\beta_0+\beta_1X_i)\right)^2\right\}$
Under the assumption of normality, the least squares estimates are also the mle.
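A minimal sketch of the closed-form least squares estimates above (not part of the original cookbook); the simulated coefficients and noise level are arbitrary choices.

```python
# Least squares for the simple linear model, with sigma^2_hat = rss/(n-2).
import numpy as np

rng = np.random.default_rng(8)
n = 200
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.5 * x + rng.normal(scale=1.5, size=n)   # beta0=1, beta1=2.5

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)
sigma2_hat = np.sum(resid**2) / (n - 2)
print(b0, b1, sigma2_hat)
```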
18.2 Prediction

Predict $Y$ at a new value $X = x_*$: $\hat Y_* = \hat\beta_0+\hat\beta_1x_*$
$\hat\xi_n^2 = \hat\sigma^2\left(\frac{\sum_{i=1}^n(X_i-x_*)^2}{n\sum_{i=1}^n(X_i-\bar X)^2}+1\right)$
Prediction interval: $\hat Y_*\pm z_{\alpha/2}\hat\xi_n$

18.3 Multiple Regression

$Y = X\beta+\epsilon$ where
$X = \begin{pmatrix}X_{11} & \cdots & X_{1k}\\ \vdots & \ddots & \vdots\\ X_{n1} & \cdots & X_{nk}\end{pmatrix}$, $\beta = \begin{pmatrix}\beta_1\\\vdots\\\beta_k\end{pmatrix}$, $\epsilon = \begin{pmatrix}\epsilon_1\\\vdots\\\epsilon_n\end{pmatrix}$
Likelihood: $\mathcal{L}(\beta,\sigma) = (2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}rss\right\}$ with $rss = (Y-X\beta)^T(Y-X\beta)$
If $X^TX$ is invertible:
$\hat\beta = (X^TX)^{-1}X^TY$
$V[\hat\beta\mid X^n] = \sigma^2(X^TX)^{-1}$
$\hat\beta\approx N\left(\beta,\ \sigma^2(X^TX)^{-1}\right)$
Estimated regression function: $\hat r(x) = \sum_{j=1}^k\hat\beta_jx_j$
Unbiased estimate: $\hat\sigma^2 = \frac{1}{n-k}\sum_{i=1}^n\hat\epsilon_i^2$ with $\hat\epsilon = X\hat\beta-Y$
Under the assumption of normality, the least squares parameter estimators are also the mles, but the least squares variance estimator is not the mle; the mle is $\hat\sigma^2_{mle} = \frac1n\sum_{i=1}^n\hat\epsilon_i^2$
$1-\alpha$ confidence interval: $\hat\beta_j\pm z_{\alpha/2}\hat{se}(\hat\beta_j)$
18.4 Model Selection

Consider predicting a new observation for covariates $X_*$ and let $S\subset J$ denote a subset of the covariates in the model, where $|J| = k$.
Procedure:
1. Assign a score to each model
2. Search through all models to find the one with the highest score
Hypothesis testing: $H_0 : \beta_j = 0$ vs. $H_1 : \beta_j\ne0\ \forall j\in J$
Mean squared prediction error (mspe): $mspe = E\left[(\hat Y(S)-Y^*)^2\right]$
Prediction risk: $R(S) = \sum_{i=1}^nmspe_i = \sum_{i=1}^nE\left[(\hat Y_i(S)-Y_i^*)^2\right]$
Training error: $\hat R_{tr}(S) = \sum_{i=1}^n(\hat Y_i(S)-Y_i)^2$
$R^2(S) = 1-\frac{rss(S)}{tss} = 1-\frac{\hat R_{tr}(S)}{tss} = \frac{\sum_{i=1}^n(\hat Y_i(S)-\bar Y)^2}{\sum_{i=1}^n(Y_i-\bar Y)^2}$
The training error is a downward-biased estimate of the prediction risk.
Adjusted $R^2$: $\bar R^2(S) = 1-\frac{n-1}{n-k}\frac{rss}{tss}$
Mallows' $C_p$ statistic: $\hat R(S) = \hat R_{tr}(S)+2k\hat\sigma^2$ = lack of fit + complexity penalty
Akaike information criterion (AIC): $AIC(S) = \ell_n(\hat\beta_S,\hat\sigma_S^2)-k$
Bayesian information criterion (BIC): $BIC(S) = \ell_n(\hat\beta_S,\hat\sigma_S^2)-\frac k2\log n$
Leave-one-out cross-validation:
$\hat R_{CV}(S) = \sum_{i=1}^n(Y_i-\hat Y_{(i)})^2 = \sum_{i=1}^n\left(\frac{Y_i-\hat Y_i(S)}{1-U_{ii}(S)}\right)^2$
where $\hat Y_{(i)}$ is the prediction with the $i$th observation omitted and $U_{ii}(S)$ is the $i$th diagonal element of the hat matrix $U(S) = X_S(X_S^TX_S)^{-1}X_S^T$.
19 Non-parametric Function Estimation

19.1 Density Estimation

Estimate $f(x)$, where $P[X\in A] = \int_Af(x)\,dx$.
Integrated square error (ISE): $L(f,\hat f_n) = \int\left(f(x)-\hat f_n(x)\right)^2dx = J(h)+\int f^2(x)\,dx$
Frequentist risk: $R(f,\hat f_n) = E\left[L(f,\hat f_n)\right] = \int b^2(x)\,dx+\int v(x)\,dx$
where $b(x) = E[\hat f_n(x)]-f(x)$ and $v(x) = V[\hat f_n(x)]$

19.1.1 Histograms

Definitions:
- Number of bins $m$, binwidth $h = \frac1m$
- Bin $B_j$ has $\nu_j$ observations
- $\hat p_j = \nu_j/n$ and $p_j = \int_{B_j}f(u)\,du$
Histogram estimator: $\hat f_n(x) = \sum_{j=1}^m\frac{\hat p_j}{h}I(x\in B_j)$
$E[\hat f_n(x)] = \frac{p_j}{h}$, $V[\hat f_n(x)] = \frac{p_j(1-p_j)}{nh^2}$
$R(\hat f_n,f)\approx\frac{h^2}{12}\int(f'(u))^2\,du+\frac{1}{nh}$
$h^* = \frac{1}{n^{1/3}}\left(\frac{6}{\int(f'(u))^2\,du}\right)^{1/3}$
$R^*(\hat f_n,f)\approx\frac{C}{n^{2/3}}\left(\int(f'(u))^2\,du\right)^{1/3}$ with $C = \left(\frac34\right)^{2/3}$
Cross-validation:
$\hat J_{CV}(h) = \int\hat f_n^2(x)\,dx-\frac2n\sum_{i=1}^n\hat f_{(-i)}(X_i) = \frac{2}{(n-1)h}-\frac{n+1}{(n-1)h}\sum_{j=1}^m\hat p_j^2$

19.1.2 Kernel Density Estimator (KDE)

Kernel $K$: $K(x)\ge0$, $\int K(x)\,dx = 1$, $\int xK(x)\,dx = 0$, $\int x^2K(x)\,dx \equiv \sigma_K^2>0$
KDE: $\hat f_n(x) = \frac1n\sum_{i=1}^n\frac1hK\left(\frac{x-X_i}{h}\right)$
$R(f,\hat f_n)\approx\frac{(h\sigma_K)^4}{4}\int(f''(x))^2\,dx+\frac{1}{nh}\int K^2(x)\,dx$
$h^* = \frac{c_1^{-2/5}c_2^{1/5}c_3^{-1/5}}{n^{1/5}}$ with $c_1 = \sigma_K^2$, $c_2 = \int K^2(x)\,dx$, $c_3 = \int(f''(x))^2\,dx$
$R^*(f,\hat f_n) = \frac{c_4}{n^{4/5}}$ with $c_4 = \frac54(\sigma_K^2)^{2/5}\left(\int K^2(x)\,dx\right)^{4/5}\left(\int(f'')^2\,dx\right)^{1/5}$
Epanechnikov kernel: $K(x) = \frac{3}{4\sqrt5}\left(1-\frac{x^2}{5}\right)$ for $|x|<\sqrt5$, $0$ otherwise
Cross-validation:
$\hat J_{CV}(h) = \int\hat f_n^2(x)\,dx-\frac2n\sum_{i=1}^n\hat f_{(-i)}(X_i)\approx\frac{1}{hn^2}\sum_{i=1}^n\sum_{j=1}^nK^*\left(\frac{X_i-X_j}{h}\right)+\frac{2}{nh}K(0)$
where $K^*(x) = K^{(2)}(x)-2K(x)$ and $K^{(2)}(x) = \int K(x-y)K(y)\,dy$
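A minimal sketch of a Gaussian KDE implemented from the definition above (not part of the original cookbook); the normal-reference bandwidth $h\approx1.06\,\hat\sigma n^{-1/5}$ is one common rule-of-thumb choice.

```python
# Gaussian KDE: f_hat(x) = (1/n) sum K((x - X_i)/h)/h.
import numpy as np

def kde(grid, data, h):
    # Gaussian kernel K(u) = exp(-u^2/2)/sqrt(2*pi)
    u = (grid[:, None] - data[None, :]) / h
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return k.mean(axis=1) / h

rng = np.random.default_rng(9)
data = rng.standard_normal(1_000)
h = 1.06 * data.std() * len(data) ** (-1 / 5)   # rule-of-thumb bandwidth

grid = np.linspace(-4, 4, 9)
print(kde(grid, data, h))        # close to the N(0,1) density on the grid
```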
19.2 Non-parametric Regression

Estimate $r(x) = E[Y\mid X=x]$ from $Y_i = r(x_i)+\epsilon_i$ with $E[\epsilon_i] = 0$ and $V[\epsilon_i] = \sigma^2$.
k-nearest-neighbor estimator: $\hat r(x) = \frac1k\sum_{i:x_i\in N_k(x)}Y_i$ where $N_k(x)$ contains the $k$ values of $x_1,\dots,x_n$ closest to $x$
Nadaraya-Watson kernel estimator:
$\hat r(x) = \sum_{i=1}^nw_i(x)Y_i$ with weights $w_i(x) = \frac{K\left(\frac{x-x_i}{h}\right)}{\sum_{j=1}^nK\left(\frac{x-x_j}{h}\right)}$
$R(\hat r_n,r)\approx\frac{h^4}{4}\left(\int x^2K(x)\,dx\right)^2\int\left(r''(x)+2r'(x)\frac{f'(x)}{f(x)}\right)^2dx+\int\frac{\sigma^2\int K^2(u)\,du}{nhf(x)}\,dx$
$h^*\approx\frac{c_1}{n^{1/5}}$, $R^*(\hat r_n,r)\approx\frac{c_2}{n^{4/5}}$

19.3 Smoothing Using Orthogonal Functions

Approximation: $r(x) = \sum_{j=1}^\infty\beta_j\varphi_j(x)\approx\sum_{j=1}^J\beta_j\varphi_j(x)$
Multivariate regression form: $Y = \Phi\beta+\eta$ where $\eta_i = \epsilon_i$ and
$\Phi = \begin{pmatrix}\varphi_0(x_1) & \cdots & \varphi_J(x_1)\\ \vdots & \ddots & \vdots\\ \varphi_0(x_n) & \cdots & \varphi_J(x_n)\end{pmatrix}$
Least squares estimator: $\hat\beta = (\Phi^T\Phi)^{-1}\Phi^TY\approx\frac1n\Phi^TY$ (for equally spaced observations only)
Cross-validation: $\hat R_{CV}(J) = \sum_{i=1}^n\left(Y_i-\sum_{j=1}^J\varphi_j(x_i)\hat\beta_{j,(i)}\right)^2$
20 Stochastic Processes

A stochastic process is a collection of random variables $\{X_t : t\in T\}$ with
$T = \{0,1,\dots\} = \mathbb{Z}_{\ge0}$ (discrete) or $T = [0,\infty)$ (continuous)
Notations: $X_t$, $X(t)$; state space $\mathcal{X}$; index set $T$

20.1 Markov Chains

Markov chain: $P[X_n = x\mid X_0,\dots,X_{n-1}] = P[X_n = x\mid X_{n-1}]$ for all $n\in T$, $x\in\mathcal{X}$
Transition probabilities:
$p_{ij}\equiv P[X_{n+1} = j\mid X_n = i]$
$p_{ij}(n)\equiv P[X_{m+n} = j\mid X_m = i]$ ($n$-step)
Transition matrix $P$ ($n$-step: $P_n$) with entries $p_{ij}$; each row sums to one: $\sum_jp_{ij} = 1$
Chapman-Kolmogorov:
$p_{ij}(m+n) = \sum_kp_{ik}(m)p_{kj}(n)$
$P_{m+n} = P_mP_n$ and $P_n = P\cdots P = P^n$
Marginal probability: $\mu_n = (\mu_n(1),\dots,\mu_n(N))$ where $\mu_n(i) = P[X_n = i]$; $\mu_0$ is the initial distribution and $\mu_n = \mu_0P^n$
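A minimal sketch of the two identities above, for a hypothetical two-state chain (not part of the original cookbook); the transition matrix values are arbitrary choices.

```python
# n-step transitions via Chapman-Kolmogorov (P_n = P^n) and mu_n = mu_0 P^n.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])        # transition matrix, rows sum to 1
mu0 = np.array([1.0, 0.0])        # initial distribution

P10 = np.linalg.matrix_power(P, 10)   # 10-step transition matrix
mu10 = mu0 @ P10                      # marginal distribution at n = 10
print(P10)
print(mu10)                           # approaches the stationary (0.8, 0.2)
```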
20.2 Poisson Processes

Poisson process: $\{X_t : t\in[0,\infty)\}$ = number of events up to and including time $t$; $X_0 = 0$
Independent increments: the event counts on disjoint intervals are independent
Intensity function $\lambda(t)>0$: $X_{s+t}-X_s\sim\mathrm{Po}\left(\int_s^{s+t}\lambda(u)\,du\right)$; for a homogeneous Poisson process with rate $\lambda$, $X_t\sim\mathrm{Po}(\lambda t)$
Waiting times: $W_t :=$ time at which $X_t$ occurs; $W_t\sim\mathrm{Gamma}\left(t,\frac1\lambda\right)$
Interarrival times: $S_t = W_{t+1}-W_t$; $S_t\stackrel{iid}{\sim}\mathrm{Exp}\left(\frac1\lambda\right)$
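A minimal sketch simulating a homogeneous Poisson process through its exponential interarrival times (not part of the original cookbook); rate, horizon, and replication count are arbitrary choices.

```python
# Cumulative sums of Exp(1/lambda) interarrivals give the waiting times W_t;
# the count of W_t <= T should then be Po(lambda*T).
import numpy as np

rng = np.random.default_rng(10)
lam, T, reps = 3.0, 5.0, 20_000

counts = np.empty(reps, dtype=int)
for r in range(reps):
    s = rng.exponential(scale=1 / lam, size=int(lam * T * 3) + 50)
    w = np.cumsum(s)                 # waiting times W_1, W_2, ...
    counts[r] = np.searchsorted(w, T)

print(counts.mean(), counts.var())   # both close to lambda*T = 15
```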
21 Time Series

Mean function: $\mu_{xt} = E[x_t] = \int_{-\infty}^\infty xf_t(x)\,dx$
Autocovariance function: $\gamma_x(s,t) = E[(x_s-\mu_s)(x_t-\mu_t)] = E[x_sx_t]-\mu_s\mu_t$; $\gamma_x(t,t) = E[(x_t-\mu_t)^2] = V[x_t]$
Autocorrelation function (ACF): $\rho(s,t) = \frac{\mathrm{Cov}[x_s,x_t]}{\sqrt{V[x_s]V[x_t]}} = \frac{\gamma(s,t)}{\sqrt{\gamma(s,s)\gamma(t,t)}}$
Cross-covariance function (CCV): $\gamma_{xy}(s,t) = E[(x_s-\mu_{xs})(y_t-\mu_{yt})]$
Cross-correlation function (CCF): $\rho_{xy}(s,t) = \frac{\gamma_{xy}(s,t)}{\sqrt{\gamma_x(s,s)\gamma_y(t,t)}}$
Backshift operator: $B^k(x_t) = x_{t-k}$
Difference operator: $\nabla^d = (1-B)^d$
White noise: $w_t\sim wn(0,\sigma_w^2)$; Gaussian white noise: $w_t\stackrel{iid}{\sim}N(0,\sigma_w^2)$; $E[w_t] = 0$ and $V[w_t] = \sigma_w^2$ for all $t\in T$, and $\gamma_w(s,t) = 0$ for $s\ne t$
Random walk with drift $\delta$: $x_t = \delta t+\sum_{j=1}^tw_j$, $E[x_t] = \delta t$
Symmetric moving average: $m_t = \sum_{j=-k}^ka_jx_{t-j}$ where $a_j = a_{-j}\ge0$ and $\sum_{j=-k}^ka_j = 1$
21.1 Stationary Time Series

Strictly stationary: $(x_{t_1},\dots,x_{t_k})\stackrel{d}{=}(x_{t_1+h},\dots,x_{t_k+h})$ for all $k\in\mathbb{N}$ and $t_k,c_k,h\in\mathbb{Z}$, i.e., $P[x_{t_1}\le c_1,\dots,x_{t_k}\le c_k] = P[x_{t_1+h}\le c_1,\dots,x_{t_k+h}\le c_k]$
Weakly stationary:
- $E[x_t^2]<\infty$ for all $t\in\mathbb{Z}$
- $E[x_t] = m$ for all $t\in\mathbb{Z}$
- $\gamma_x(s,t) = \gamma_x(s+r,t+r)$ for all $r,s,t\in\mathbb{Z}$
Autocovariance function of a stationary series: $\gamma(h)\equiv\gamma(t+h,t)$ for $h\in\mathbb{Z}$; $\gamma(0) = V[x_t]$, $|\gamma(h)|\le\gamma(0)$, $\gamma(h) = \gamma(-h)$
ACF: $\rho(h) = \frac{\gamma(h)}{\gamma(0)}$; CCF: $\rho_{xy}(h) = \frac{\gamma_{xy}(h)}{\sqrt{\gamma_x(0)\gamma_y(0)}}$
Linear process: $x_t = \mu+\sum_{j=-\infty}^\infty\psi_jw_{t-j}$ where $\sum_{j=-\infty}^\infty|\psi_j|<\infty$; then $\gamma(h) = \sigma_w^2\sum_{j=-\infty}^\infty\psi_{j+h}\psi_j$

21.2 Estimation of Correlation

Sample mean: $\bar x = \frac1n\sum_{t=1}^nx_t$ with $V[\bar x] = \frac1n\sum_{h=-n}^n\left(1-\frac{|h|}{n}\right)\gamma_x(h)$
Sample autocovariance function: $\hat\gamma(h) = \frac1n\sum_{t=1}^{n-h}(x_{t+h}-\bar x)(x_t-\bar x)$
Sample autocorrelation function: $\hat\rho(h) = \frac{\hat\gamma(h)}{\hat\gamma(0)}$
Sample cross-covariance function: $\hat\gamma_{xy}(h) = \frac1n\sum_{t=1}^{n-h}(x_{t+h}-\bar x)(y_t-\bar y)$
Sample cross-correlation function: $\hat\rho_{xy}(h) = \frac{\hat\gamma_{xy}(h)}{\sqrt{\hat\gamma_x(0)\hat\gamma_y(0)}}$
Properties:
- $\sigma_{\hat\rho_x(h)} = \frac{1}{\sqrt n}$ if $x_t$ is white noise
- $\sigma_{\hat\rho_{xy}(h)} = \frac{1}{\sqrt n}$ if $x_t$ or $y_t$ is white noise

21.3 Non-Stationary Time Series

Classical decomposition model: $x_t = \mu_t+s_t+w_t$ where $\mu_t$ = trend, $s_t$ = seasonal component, $w_t$ = random noise term
21.3.1 Detrending

Least squares: fit a trend model, e.g. $\mu_t = \beta_0+\beta_1t$, and work with the residuals.
Moving average: smooth with a symmetric moving average with $a_j = \frac{1}{2k+1}$: $\hat\mu_t = \frac{1}{2k+1}\sum_{i=-k}^kx_{t-i}$. If $\frac{1}{2k+1}\sum_{j=-k}^kw_{t-j}\approx0$, a linear trend function $\mu_t = \beta_0+\beta_1t$ passes without distortion.
Differencing: $\mu_t = \beta_0+\beta_1t \Rightarrow E[\nabla x_t] = \beta_1$, so first differences remove a linear trend.

21.4 ARIMA Models

Autoregressive polynomial: $\phi(z) = 1-\phi_1z-\dots-\phi_pz^p$, $z\in\mathbb{C}$, $\phi_p\ne0$
Autoregressive operator: $\phi(B) = 1-\phi_1B-\dots-\phi_pB^p$
AR(p): $x_t = \phi_1x_{t-1}+\dots+\phi_px_{t-p}+w_t$, i.e., $\phi(B)x_t = w_t$
AR(1):
$x_t = \phi^kx_{t-k}+\sum_{j=0}^{k-1}\phi^jw_{t-j}\ \stackrel{k\to\infty,\,|\phi|<1}{=}\ \sum_{j=0}^\infty\phi^jw_{t-j}$ (linear process)
$E[x_t] = \sum_{j=0}^\infty\phi^jE[w_{t-j}] = 0$
$\gamma(h) = \frac{\sigma_w^2\phi^h}{1-\phi^2}$, $\rho(h) = \frac{\gamma(h)}{\gamma(0)} = \phi^h$, and $\rho(h) = \phi\,\rho(h-1)$ for $h = 1,2,\dots$
Moving average polynomial: $\theta(z) = 1+\theta_1z+\dots+\theta_qz^q$, $z\in\mathbb{C}$, $\theta_q\ne0$
Moving average operator: $\theta(B) = 1+\theta_1B+\dots+\theta_qB^q$
MA(q) (moving average model of order q): $x_t = w_t+\theta_1w_{t-1}+\dots+\theta_qw_{t-q}$, i.e., $x_t = \theta(B)w_t$
$E[x_t] = \sum_{j=0}^q\theta_jE[w_{t-j}] = 0$
$\gamma(h) = \mathrm{Cov}[x_{t+h},x_t] = \begin{cases}\sigma_w^2\sum_{j=0}^{q-h}\theta_j\theta_{j+h} & 0\le h\le q\\ 0 & h>q\end{cases}$
MA(1): $x_t = w_t+\theta w_{t-1}$
$\gamma(h) = \begin{cases}(1+\theta^2)\sigma_w^2 & h = 0\\ \theta\sigma_w^2 & h = 1\\ 0 & h>1\end{cases}$, $\rho(h) = \begin{cases}\frac{\theta}{1+\theta^2} & h = 1\\ 0 & h>1\end{cases}$
ARMA(p,q): $x_t = \phi_1x_{t-1}+\dots+\phi_px_{t-p}+w_t+\theta_1w_{t-1}+\dots+\theta_qw_{t-q}$, i.e., $\phi(B)x_t = \theta(B)w_t$
Exponentially weighted moving average (EWMA): $x_t = x_{t-1}+w_t-\lambda w_{t-1}$
$x_t = \sum_{j=1}^\infty\lambda(1-\lambda)^{j-1}x_{t-j}+w_t$ when $|\lambda|<1$
$\tilde x_{n+1} = (1-\lambda)x_n+\lambda\tilde x_n$
Seasonal ARIMA: ARIMA models with additional autoregressive and moving average terms at lags that are multiples of the seasonal period.
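A minimal sketch (not part of the original cookbook) simulating an AR(1) series and checking the sample ACF of 21.2 against the theoretical $\rho(h) = \phi^h$; the value of phi and the series length are arbitrary choices.

```python
# AR(1) simulation plus sample autocorrelation rho_hat(h).
import numpy as np

rng = np.random.default_rng(11)
phi, n = 0.7, 50_000
w = rng.standard_normal(n)
x = np.empty(n)
x[0] = w[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + w[t]     # x_t = phi x_{t-1} + w_t

def acf(x, h):
    xm = x - x.mean()
    # gamma_hat(h)/gamma_hat(0); the common 1/n factors cancel
    return np.dot(xm[h:], xm[:len(x) - h]) / np.dot(xm, xm)

print([round(acf(x, h), 3) for h in range(5)])   # ~ [1, 0.7, 0.49, 0.34, 0.24]
```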
21.4.1 Causality and Invertibility

ARMA(p,q) is causal (future-independent) $\iff$ there exist constants $\{\psi_j\}$ with $\sum_{j=0}^\infty|\psi_j|<\infty$ such that $x_t = \sum_{j=0}^\infty\psi_jw_{t-j} = \psi(B)w_t$ $\iff$ the roots of $\phi(z)$ lie outside the unit circle, with
$\psi(z) = \sum_{j=0}^\infty\psi_jz^j = \frac{\theta(z)}{\phi(z)}$, $|z|\le1$
ARMA(p,q) is invertible $\iff$ there exist constants $\{\pi_j\}$ with $\sum_{j=0}^\infty|\pi_j|<\infty$ such that $\pi(B)x_t = \sum_{j=0}^\infty\pi_jx_{t-j} = w_t$ $\iff$ the roots of $\theta(z)$ lie outside the unit circle, with
$\pi(z) = \sum_{j=0}^\infty\pi_jz^j = \frac{\phi(z)}{\theta(z)}$, $|z|\le1$

Behavior of the ACF and PACF for causal and invertible ARMA models:
           | ACF                    | PACF
AR(p)      | tails off              | cuts off after lag p
MA(q)      | cuts off after lag q   | tails off
ARMA(p,q)  | tails off              | tails off

21.5 Spectral Analysis

Periodic process: $x_t = A\cos(2\pi\nu t+\varphi) = U_1\cos(2\pi\nu t)+U_2\sin(2\pi\nu t)$ with frequency $\nu$, amplitude $A$, phase $\varphi$, $U_1 = A\cos\varphi$ and $U_2 = -A\sin\varphi$
Periodic mixture: $x_t = \sum_{k=1}^q\left(U_{k1}\cos(2\pi\nu_kt)+U_{k2}\sin(2\pi\nu_kt)\right)$ with $U_{k1},U_{k2}$ uncorrelated, zero mean, variances $\sigma_k^2$; then $\gamma(h) = \sum_{k=1}^q\sigma_k^2\cos(2\pi\nu_kh)$
For a single periodic process:
$\gamma(h) = \sigma^2\cos(2\pi\nu_0h) = \frac{\sigma^2}{2}e^{-2\pi i\nu_0h}+\frac{\sigma^2}{2}e^{2\pi i\nu_0h} = \int_{-1/2}^{1/2}e^{2\pi i\nu h}\,dF(\nu)$
Spectral distribution function:
$F(\nu) = 0$ for $\nu<-\nu_0$; $\sigma^2/2$ for $-\nu_0\le\nu<\nu_0$; $\sigma^2$ for $\nu\ge\nu_0$
$F(-\infty) = F(-1/2) = 0$ and $F(\infty) = F(1/2) = \gamma(0)$
Spectral density:
$f(\nu) = \sum_{h=-\infty}^\infty\gamma(h)e^{-2\pi i\nu h}$, $-\frac12\le\nu\le\frac12$
Needs $\sum_{h=-\infty}^\infty|\gamma(h)|<\infty$; then $\gamma(h) = \int_{-1/2}^{1/2}e^{2\pi i\nu h}f(\nu)\,d\nu$ for $h = 0,\pm1,\dots$
Properties: $f(\nu)\ge0$, $f(\nu) = f(-\nu)$, $f(\nu) = f(1-\nu)$, $\gamma(0) = V[x_t] = \int_{-1/2}^{1/2}f(\nu)\,d\nu$
White noise: $f_w(\nu) = \sigma_w^2$
ARMA(p,q), $\phi(B)x_t = \theta(B)w_t$:
$f_x(\nu) = \sigma_w^2\frac{|\theta(e^{-2\pi i\nu})|^2}{|\phi(e^{-2\pi i\nu})|^2}$
where $\phi(z) = 1-\sum_{k=1}^p\phi_kz^k$ and $\theta(z) = 1+\sum_{k=1}^q\theta_kz^k$
Discrete Fourier Transform (DFT): $d(\nu_j) = n^{-1/2}\sum_{t=1}^nx_te^{-2\pi i\nu_jt}$
Fourier/fundamental frequencies: $\nu_j = j/n$
Inverse DFT: $x_t = n^{-1/2}\sum_{j=0}^{n-1}d(\nu_j)e^{2\pi i\nu_jt}$
Periodogram: $I(j/n) = |d(j/n)|^2$
Scaled periodogram:
$P(j/n) = \frac4nI(j/n) = \left(\frac2n\sum_{t=1}^nx_t\cos(2\pi tj/n)\right)^2+\left(\frac2n\sum_{t=1}^nx_t\sin(2\pi tj/n)\right)^2$
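A minimal sketch computing the scaled periodogram with the FFT (not part of the original cookbook); the signal frequency 0.1 and noise level are arbitrary choices.

```python
# Scaled periodogram of a sinusoid plus noise: the peak sits at the
# fundamental frequency j/n closest to the true frequency 0.1.
import numpy as np

rng = np.random.default_rng(12)
n = 500
t = np.arange(n)
x = 2.0 * np.cos(2 * np.pi * 0.1 * t) + rng.standard_normal(n)

d = np.fft.fft(x) / np.sqrt(n)        # d(nu_j) at nu_j = j/n
I = np.abs(d) ** 2                    # periodogram I(j/n)
P = 4.0 / n * I                       # scaled periodogram
print(np.argmax(P[: n // 2]) / n)     # ~ 0.1
```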
22 Math

22.1 Gamma Function

Ordinary: $\Gamma(s) = \int_0^\infty t^{s-1}e^{-t}\,dt$
Upper incomplete: $\Gamma(s,x) = \int_x^\infty t^{s-1}e^{-t}\,dt$
Lower incomplete: $\gamma(s,x) = \int_0^xt^{s-1}e^{-t}\,dt$
$\Gamma(\alpha+1) = \alpha\Gamma(\alpha)$ for $\alpha>0$
$\Gamma(n) = (n-1)!$ for $n\in\mathbb{N}$
$\Gamma(1/2) = \sqrt\pi$

22.2 Beta Function

Ordinary: $B(x,y) = B(y,x) = \int_0^1t^{x-1}(1-t)^{y-1}\,dt = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}$
Incomplete: $B(x;a,b) = \int_0^xt^{a-1}(1-t)^{b-1}\,dt$
Regularized incomplete:
$I_x(a,b) = \frac{B(x;a,b)}{B(a,b)}\ \stackrel{a,b\in\mathbb{N}}{=}\ \sum_{j=a}^{a+b-1}\frac{(a+b-1)!}{j!(a+b-1-j)!}x^j(1-x)^{a+b-1-j}$
$I_0(a,b) = 0$, $I_1(a,b) = 1$
$I_x(a,b) = 1-I_{1-x}(b,a)$
22.3 Series

Finite:
$\sum_{k=1}^nk = \frac{n(n+1)}{2}$
$\sum_{k=1}^n(2k-1) = n^2$
$\sum_{k=1}^nk^2 = \frac{n(n+1)(2n+1)}{6}$
$\sum_{k=1}^nk^3 = \left(\frac{n(n+1)}{2}\right)^2$
$\sum_{k=0}^nc^k = \frac{c^{n+1}-1}{c-1}$, $c\ne1$
Binomial:
$\sum_{k=0}^n\binom{n}{k} = 2^n$
$\sum_{k=0}^n\binom{r+k}{k} = \binom{r+n+1}{n}$
$\sum_{k=0}^n\binom{k}{m} = \binom{n+1}{m+1}$
Vandermonde's identity: $\sum_{k=0}^r\binom{m}{k}\binom{n}{r-k} = \binom{m+n}{r}$
Binomial theorem: $\sum_{k=0}^n\binom{n}{k}a^{n-k}b^k = (a+b)^n$
Infinite:
$\sum_{k=0}^\infty p^k = \frac{1}{1-p}$ and $\sum_{k=1}^\infty p^k = \frac{p}{1-p}$, $|p|<1$
$\sum_{k=1}^\infty kp^{k-1} = \frac{d}{dp}\left(\sum_{k=0}^\infty p^k\right) = \frac{d}{dp}\left(\frac{1}{1-p}\right) = \frac{1}{(1-p)^2}$, $|p|<1$
$\sum_{k=0}^\infty\binom{r+k-1}{k}x^k = (1-x)^{-r}$, $r\in\mathbb{N}^+$
$\sum_{k=0}^\infty\binom{\alpha}{k}p^k = (1+p)^\alpha$, $|p|<1$, $\alpha\in\mathbb{C}$
22.4 Combinatorics

Sampling $k$ out of $n$ objects:
- ordered, without replacement: $n^{\underline k} = \prod_{i=0}^{k-1}(n-i) = \frac{n!}{(n-k)!}$
- ordered, with replacement: $n^k$
- unordered, without replacement: $\binom{n}{k} = \frac{n^{\underline k}}{k!} = \frac{n!}{k!(n-k)!}$
- unordered, with replacement: $\binom{n-1+r}{r} = \binom{n-1+r}{n-1}$

Partitions: the number $P_{n,k}$ of partitions of $n$ into $k$ parts satisfies
$P_{n+k,k} = \sum_{i=1}^kP_{n,i}$, with $P_{n,k} = 0$ for $k>n$, $P_{n,0} = 0$ for $n\ge1$, and $P_{0,0} = 1$

Balls and urns: number of functions $f : B\to U$ with $|B| = n$ balls and $|U| = m$ urns; D = distinguishable, ¬D = indistinguishable; $\left\{{n\atop k}\right\}$ denotes the Stirling numbers of the second kind.

f arbitrary:
- B: D, U: D: $m^n$
- B: ¬D, U: D: $\binom{m+n-1}{n}$
- B: D, U: ¬D: $\sum_{k=1}^m\left\{{n\atop k}\right\}$
- B: ¬D, U: ¬D: $\sum_{k=1}^mP_{n,k}$

f injective:
- B: D, U: D: $m^{\underline n} = \frac{m!}{(m-n)!}$ if $m\ge n$, else $0$
- B: ¬D, U: D: $\binom{m}{n}$
- B: D, U: ¬D: $1$ if $m\ge n$, else $0$
- B: ¬D, U: ¬D: $1$ if $m\ge n$, else $0$

f surjective:
- B: D, U: D: $m!\left\{{n\atop m}\right\}$
- B: ¬D, U: D: $\binom{n-1}{m-1}$
- B: D, U: ¬D: $\left\{{n\atop m}\right\}$
- B: ¬D, U: ¬D: $P_{n,m}$

f bijective:
- B: D, U: D: $n!$ if $m = n$, else $0$
- all other cases: $1$ if $m = n$, else $0$
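A minimal sketch computing the four sampling counts directly (not part of the original cookbook); n and k are arbitrary example values.

```python
# The four "k out of n" sampling counts from the table above.
from math import comb, perm

n, k = 5, 3
print(perm(n, k))            # ordered, without replacement: n!/(n-k)! = 60
print(n**k)                  # ordered, with replacement: n^k = 125
print(comb(n, k))            # unordered, without replacement: C(n,k) = 10
print(comb(n - 1 + k, k))    # unordered, with replacement: C(n-1+k, k) = 35
```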
References

[1] L. M. Leemis and J. T. McQueston. Univariate Distribution Relationships. The American Statistician, 62(1):45-53, 2008.
[2] A. Steger. Diskrete Strukturen, Band 1: Kombinatorik, Graphentheorie, Algebra. Springer, 2001.
[3] A. Steger. Diskrete Strukturen, Band 2: Wahrscheinlichkeitstheorie und Statistik. Springer, 2002.