
8 A. K. MAJEE

2. Random variables and their distributions


Frequently, when an experiment is performed, we are interested mainly in some function of
the outcome as opposed to the actual outcome itself. For instance, in tossing dice, we are often
interested in the sum of the two dice and are not really concerned about the separate values of
each die.
Definition 2.1 (Random variable). Let (Ω, F, P) be a probability space. A real-valued
function X : Ω → R is said to be a random variable if for any B ∈ B(R), X^{-1}(B) ∈ F.
Example 2.1. Let (Ω, F, P) be a probability space and A ∈ F. Define a function X : Ω → R by

X(ω) = 1 if ω ∈ A,  and  X(ω) = 0 if ω ∉ A.

Then X is a random variable. Indeed, for any x ∈ R, we see that

{ω ∈ Ω : X(ω) ≤ x} = ∅ if x < 0,
                     A^c if 0 ≤ x < 1,
                     Ω if x ≥ 1,

and each of ∅, A^c, Ω lies in F. The function X is called the indicator function of A and is often denoted by χ_A or I_A.


Example 2.2. Let Ω = {0, 1, 2} and F = {∅, {0}, {1, 2}, Ω}. Then (Ω, F) is a measurable space.
Define a function X on Ω by X(i) = i for i ∈ Ω. Then X is NOT a random variable. Indeed,
for any x ∈ R, we see that

{i ∈ Ω : X(i) ≤ x} = ∅ if x < 0,
                     {0} if 0 ≤ x < 1,
                     {0, 1} if 1 ≤ x < 2,
                     Ω if x ≥ 2.

Since {0, 1} ∉ F, by definition X is NOT a random variable.
For any B ∈ B(R), we are interested in P(X ∈ B). Let X be a random variable defined on a
given probability space (Ω, F, P). Define a function PX on B(R) via

PX(B) := P(X ∈ B).

One can easily check that PX is a probability measure on (R, B(R)). It is called the distribution
of X.
Definition 2.2 (Distribution function/ Cumulative distribution function (cdf ) ). Let
X be a random variable defined on a probability space (Ω, F, P). Then the function FX : R →
[0, 1] defined by
FX (x) = P(X ≤ x) = PX ((−∞, x])
is called the distribution function or cumulative distribution function (cdf) of the random vari-
able X.
Example 2.3. A fair coin is tossed twice: Ω = {HH, HT, TH, TT}. Take F = P(Ω) and
define a probability measure P(A) = |A|/4 for any A ∈ F. For ω ∈ Ω, let X(ω) be the number of
heads, so that

X(HH) = 2, X(HT) = X(TH) = 1, X(TT) = 0.
PROBABILITY AND STOCHASTIC PROCESS 9

Then X is a random variable. Indeed, for any x ∈ R, one has

{ω : X(ω) ≤ x} = ∅ for x < 0,
                 {TT} for 0 ≤ x < 1,
                 {TT, HT, TH} for 1 ≤ x < 2,
                 Ω for x ≥ 2.

The distribution function FX of X is then given by

FX(x) = P(ω : X(ω) ≤ x) = 0 for x < 0,
                          1/4 for 0 ≤ x < 1,
                          3/4 for 1 ≤ x < 2,
                          1 for x ≥ 2.
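The step function above can be checked by simulation. The sketch below (the helper name empirical_cdf_two_tosses is ours, not from the notes) estimates FX at a few points by repeatedly tossing two fair coins and counting heads; the estimates should be close to the exact values 0, 1/4, 3/4 and 1.

```python
import random

def empirical_cdf_two_tosses(x, n_trials=100_000, seed=0):
    """Estimate F_X(x) = P(X <= x) for X = number of heads in two fair tosses."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_trials):
        heads = rng.randint(0, 1) + rng.randint(0, 1)  # one sample of X
        if heads <= x:
            count += 1
    return count / n_trials

# Compare with the exact cdf computed in Example 2.3.
for x, exact in [(-1, 0.0), (0.5, 0.25), (1.5, 0.75), (2.5, 1.0)]:
    assert abs(empirical_cdf_two_tosses(x) - exact) < 0.01
```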

Next we discuss some essential properties of the distribution function.


Lemma 2.1. The distribution function FX satisfies the following properties:
a) Nondecreasing: if x < y, then FX (x) ≤ FX (y).
b) Right continuity: FX is right-continuous, i.e., FX(x + h) → FX(x) as h ↓ 0.
c) Left limit: FX(·) has a left limit at every point and FX(x−) = P(X < x).
d) limx→∞ FX (x) = 1 and limx→−∞ FX (x) = 0.
e) P(x < X ≤ y) = FX (y) − FX (x).
f ) P(x ≤ X ≤ y) = FX (y) − FX (x−).
g) P(x ≤ X < y) = FX (y−) − FX (x−), P(x < X < y) = FX (y−) − FX (x).
h) P(X = x) = FX (x) − FX (x−).
Proof. Proof of a): For x < y, we have {X ≤ x} ⊆ {X ≤ y} and hence FX (x) ≤ FX (y).

Proof of b): Since FX is nondecreasing, it suffices to show that FX(x_n) → FX(x) for any
sequence x_1 ≥ x_2 ≥ · · · with x_n > x and x_n ↓ x. Define A_n := {X ≤ x_n}. Then
A_{n+1} ⊆ A_n and ∩_{n=1}^∞ A_n = {X ≤ x}. Hence, by continuity from above of P, we get

FX(x) = P(X ≤ x) = P(∩_{n=1}^∞ A_n) = lim_{n→∞} P(A_n) = lim_{n→∞} FX(x_n).

Proof of c): Let x ∈ R be fixed. Let {x_n} be such that x_1 ≤ x_2 ≤ · · · < x and lim_{n→∞} x_n = x.
Take A_n = {X ≤ x_n}. Then A_n ⊆ A_{n+1} and ∪_{n=1}^∞ A_n = {X < x}. Thus, we have

P(X < x) = P(∪_{n=1}^∞ A_n) = lim_{n→∞} P(A_n) = lim_{n→∞} FX(x_n).

Thus, FX(·) has a left limit at each x and FX(x−) = P(X < x).
Proof of d): Observe that {X ≤ n} ⊆ {X ≤ n + 1} and Ω = ∪_{n=1}^∞ {X ≤ n}. Hence, by
continuity from below of P, we get

1 = P(Ω) = P(∪_{n=1}^∞ {X ≤ n}) = lim_{n→∞} P(X ≤ n) = lim_{n→∞} FX(n) = lim_{x→∞} FX(x).

For the second part, take A_n = {X ≤ −n}. Then A_n is decreasing and ∩_{n=1}^∞ A_n = ∅. Hence one
has

0 = P(∅) = P(∩_{n=1}^∞ A_n) = lim_{n→∞} P(A_n) = lim_{n→∞} FX(−n) = lim_{x→−∞} FX(x).

Proof of e): Since PX is a probability measure on (R, B(R)), using the property that
PX(A) − PX(B) = PX(A \ B) for any B ⊆ A, together with the definition
FX(x) = PX((−∞, x]), we see that

FX(y) − FX(x) = PX((x, y]) = P(x < X ≤ y).
10 A. K. MAJEE

Proof of f): By e), PX((x − 1/n, y]) = FX(y) − FX(x − 1/n). Observe that the sequence of intervals
(x − 1/n, y] decreases to [x, y]. Hence

P(x ≤ X ≤ y) = PX([x, y]) = lim_{n→∞} PX((x − 1/n, y]) = lim_{n→∞} [FX(y) − FX(x − 1/n)] = FX(y) − FX(x−).

Proof of g): Note that [x, y) = ∪_n A_n, where A_n = [x, y − 1/n] and the A_n are nondecreasing. Thus, by f),

PX([x, y)) = lim_{n→∞} PX(A_n) = lim_{n→∞} [FX(y − 1/n) − FX(x−)] = FX(y−) − FX(x−).

For the second part, we take A_n = (x, y − 1/n]. Then A_n ↑ (x, y) and hence, by e), we see that

P(x < X < y) = PX((x, y)) = lim_{n→∞} PX(A_n) = lim_{n→∞} [FX(y − 1/n) − FX(x)] = FX(y−) − FX(x).
Proof of h): Using c) and the properties of PX, we see that

P(X = x) = PX((−∞, x] \ (−∞, x)) = PX((−∞, x]) − PX((−∞, x))
         = FX(x) − P(X < x) = FX(x) − FX(x−).

2.1. Discrete and Continuous random variables:
Definition 2.3 (Discrete random variable). Let X be a real-valued random variable. It
is said to be a discrete random variable if it takes values in some countable subset E =
{x_1, x_2, . . .} of R, i.e., P(X ∈ E) = 1.
Definition 2.4 (Probability mass function (pmf)). Let X be a discrete random variable,
defined on a given probability space (Ω, F, P), taking values in a countable set E ⊂ R. The
function pX : R → [0, 1] defined by

pX(x) = P(X = x) if x = x_i ∈ E,  and  pX(x) = 0 otherwise,

is called the probability mass function of X.
Let p_i = P(X = x_i). Then p_i ≥ 0 and Σ_i p_i = Σ_{i=1}^∞ P(X = x_i) = P(∪_{i=1}^∞ {X = x_i}) =
P(X ∈ E) = 1. The cdf of a discrete random variable may be given in terms of the pmf:

FX(x) = P(X ≤ x) = P(∪_{x_i ≤ x} {X = x_i}) = Σ_{x_i ≤ x} P(X = x_i) = Σ_{x_i ≤ x} pX(x_i).

Example 2.4. A fair coin is tossed two times. Let X be the number of heads obtained. Then
clearly X is a discrete random variable. Its pmf is given by

pX(0) = P(X = 0) = 1/4,  pX(1) = P(X = 1) = 2/4 = 1/2,
pX(2) = P(X = 2) = 1/4,  pX(x) = 0 for all x ∈ R \ {0, 1, 2}.

Let us determine the following:

P(0.5 < X ≤ 4) = P(X = 1) + P(X = 2) = 3/4,  P(−1.5 ≤ X < 1) = P(X = 0) = 1/4,
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 1.
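The pmf and the three probabilities above can be reproduced exactly by enumerating the four equally likely outcomes; this small sketch uses rational arithmetic so no rounding enters.

```python
from itertools import product
from fractions import Fraction

# Enumerate the sample space of two fair coin tosses and build the pmf of
# X = number of heads, with each outcome carrying probability 1/4.
outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT
pmf = {}
for w in outcomes:
    x = w.count("H")
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, len(outcomes))

assert pmf[0] == Fraction(1, 4)
assert pmf[1] == Fraction(1, 2)
assert pmf[2] == Fraction(1, 4)

# The probabilities computed in Example 2.4:
p1 = sum(p for x, p in pmf.items() if 0.5 < x <= 4)   # P(0.5 < X <= 4) = 3/4
p2 = sum(p for x, p in pmf.items() if -1.5 <= x < 1)  # P(-1.5 <= X < 1) = 1/4
```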
Definition 2.5 (Continuous random variable). Let X be a random variable with cdf FX.
Then X is said to be continuous if there exists a non-negative integrable function fX such that

FX(x) = ∫_{−∞}^x fX(t) dt.
Observe that, since lim_{x→∞} FX(x) = 1, we have ∫_{−∞}^∞ fX(t) dt = 1. Moreover, for any a, b ∈ R
with a < b, we have

P(a < X ≤ b) = FX(b) − FX(a) = ∫_{−∞}^b fX(t) dt − ∫_{−∞}^a fX(t) dt = ∫_a^b fX(t) dt.

Notice that for any continuous random variable X, the cdf FX is continuous and hence
i) P(X = a) = FX(a) − FX(a−) = 0.
ii) P(a ≤ X ≤ b) = P(a < X < b) = P(a ≤ X < b) = ∫_a^b fX(t) dt.
Furthermore, for any B ∈ B(R), we obtain PX(B) = P(X ∈ B) = ∫_B fX(t) dt.
Definition 2.6 (Probability density function (pdf)). The function fX given in Definition
2.5 is called the probability density function of the random variable X.
Observe that if fX is continuous, then FX is differentiable at every point x and fX = d/dx FX.
Since FX is absolutely continuous, it is differentiable at all x except at countably many points.
At the points where FX is not differentiable, we define d/dx FX = 0. In this way, we conclude
that

fX(x) = d/dx FX(x) for all x.
Example 2.5. Let X be a random variable whose cdf is given by

FX(x) = 1 − (1 + x)e^{−x} for x ≥ 0,  and  FX(x) = 0 for x < 0.

We would like to determine the pdf of X and then calculate P(X ≤ 1/3). Observe that FX is continuous
on R and differentiable except at the point zero. Thus

fX(x) = xe^{−x} for x > 0,  and  fX(x) = 0 for x ≤ 0.

By the definition of the cdf, we get

P(X ≤ 1/3) = FX(1/3) = 1 − (4/3)e^{−1/3}.
Example 2.6. Let X be a continuous random variable with pdf

fX(x) = kx(1 − x) for 0 < x < 1,  and  fX(x) = 0 otherwise.

We would like to determine k and then calculate P(X > 0.3). Since fX is a pdf, it must satisfy the
condition ∫_R fX(x) dx = 1. This implies that k = 6. Using the definition of a continuous random
variable, we get

P(X > 0.3) = 1 − P(X ≤ 0.3) = 1 − FX(0.3) = 1 − 6 ∫_0^{0.3} x(1 − x) dx = 0.784.
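Both the normalising constant k = 6 and P(X > 0.3) = 0.784 can be verified numerically; in this sketch the midpoint-rule helper integrate is ours, and k is recovered from the condition that the density integrates to one.

```python
def integrate(f, a, b, n=100_000):
    """Simple midpoint-rule integral; accurate for smooth integrands."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Normalisation: integral of k x(1-x) over (0,1) is k/6, so k = 6.
mass = integrate(lambda x: x * (1 - x), 0.0, 1.0)
k = 1.0 / mass                                             # ~ 6
p = 1.0 - k * integrate(lambda x: x * (1 - x), 0.0, 0.3)   # P(X > 0.3)
assert abs(k - 6.0) < 1e-6
assert abs(p - 0.784) < 1e-6
```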
2.2. Functions of a random variable: Let X be a random variable defined on a given
probability space (Ω, F, P). Let g be a Borel-measurable function on R. Then Y = g(X) is a
random variable. Indeed, for any y ∈ R,

{ω : Y(ω) ≤ y} = {ω : X(ω) ∈ g^{-1}((−∞, y])}.

Since g is Borel-measurable, g^{-1}((−∞, y]) ∈ B(R) and hence {ω : X(ω) ∈ g^{-1}((−∞, y])} ∈ F.
We are interested in determining the distribution function of Y = g(X) in terms of the cdf
FX of X. It is given by

FY(y) = P(g(X) ≤ y) = P(X ∈ g^{-1}((−∞, y])).

Example 2.7. Let X be a random variable with cdf FX. Take Y = |X|. Then, for y ≥ 0,

FY(y) = P(−y ≤ X ≤ y) = FX(y) − FX(−y) + P(X = −y).

If X is a continuous random variable, then P(X = −y) = 0 and hence

FY(y) = FX(y) − FX(−y) for y ≥ 0,  and  FY(y) = 0 for y < 0.

If X is a discrete random variable, then

FY(y) = FX(y) − FX(−y) + P(X = −y) for y ≥ 0,  and  FY(y) = 0 for y < 0.
Example 2.8. Suppose that X is a continuous random variable. Define

Y = 1 if X ≥ 0,  and  Y = −1 if X < 0.

We would like to find the cdf of Y. Note that P(Y = 1) = P(X ≥ 0) and P(Y = −1) = P(X < 0).
Hence we get

P(Y ≤ y) = 0 for y < −1,
           P(X < 0) for −1 ≤ y < 1,
           1 for y ≥ 1.

Theorem 2.2. Let X be a continuous random variable having pdf fX. Suppose that g(x) is a
strictly monotone, differentiable function of x. Then Y = g(X) has a pdf given by

fY(y) = fX(g^{-1}(y)) |d/dy g^{-1}(y)| if y = g(x) for some x,  and  fY(y) = 0 if y ≠ g(x) for all x.

Proof. Suppose g is increasing. Let y = g(x) for some x. Then

FY(y) = P(g(X) ≤ y) = P(X ≤ g^{-1}(y)) = FX(g^{-1}(y)).

Differentiating the above equality, we have

fY(y) = fX(g^{-1}(y)) d/dy g^{-1}(y) = fX(g^{-1}(y)) |d/dy g^{-1}(y)|,

where in the last equality we have used the fact that the derivative of g^{-1}(y) is non-negative
(as g^{-1} is non-decreasing). When g is decreasing, the same computation gives FY(y) =
1 − FX(g^{-1}(y)), and differentiating absorbs the negative derivative into the absolute value.
When y ≠ g(x) for all x, FY(y) is either 0 or 1 in a neighbourhood of y, and hence fY(y) = 0.
This completes the proof. □
Corollary 2.3. Let g be piecewise strictly monotone and continuously differentiable, i.e., there
exist intervals I_1, I_2, . . . , I_n which partition R such that g is strictly monotone and continuously
differentiable on the interior of each I_i. Then Y = g(X) has a pdf given by

fY(y) = Σ_{k=1}^n fX(g_k^{-1}(y)) |d/dy g_k^{-1}(y)|,

where g_k^{-1} is the inverse of g on I_k.


Example 2.9. Find the pdf of Y = X³ for any continuous nonnegative random variable X with
density function fX.
Solution: Let g(x) = x³. Then g is strictly monotone, g^{-1}(y) = y^{1/3} and d/dy g^{-1}(y) = (1/3) y^{−2/3}.
By applying Theorem 2.2, we have, for y > 0,

fY(y) = (1/3) y^{−2/3} fX(y^{1/3}).
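The resulting density can be sanity-checked against a numerical derivative of the cdf. Assuming, purely for illustration, X ~ Exp(1) (so FX(x) = 1 − e^{−x}), we have FY(y) = FX(y^{1/3}), and its difference quotient should match the formula fY(y) = (1/3) y^{−2/3} fX(y^{1/3}).

```python
import math

# Illustrative choice: X ~ Exp(1), so F_X(x) = 1 - exp(-x), f_X(x) = exp(-x).
def F_Y(y):
    """cdf of Y = X^3: F_Y(y) = F_X(y^{1/3}) for y > 0."""
    return 1.0 - math.exp(-y ** (1.0 / 3.0))

def f_Y_formula(y):
    """pdf of Y from Theorem 2.2: (1/3) y^{-2/3} f_X(y^{1/3})."""
    return (1.0 / 3.0) * y ** (-2.0 / 3.0) * math.exp(-y ** (1.0 / 3.0))

# A central difference of F_Y should match the formula away from y = 0.
h = 1e-6
for y in (0.5, 1.0, 2.0, 5.0):
    numeric = (F_Y(y + h) - F_Y(y - h)) / (2 * h)
    assert abs(numeric - f_Y_formula(y)) < 1e-5
```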

2.3. Expectation of a function of a random variable:

Definition 2.7 (Expected value of a random variable). Let X be a discrete random variable
with values E = {x_1, x_2, . . .}. We define the expected value of X as

E[X] := Σ_i x_i P(X = x_i),  provided Σ_i |x_i| P(X = x_i) < +∞.

Let X be a continuous random variable with pdf fX. We define the expected value of X as

E[X] = ∫_R x fX(x) dx,  provided ∫_R |x| fX(x) dx < +∞.
Example 2.10. Let X be a discrete random variable with pmf given by

pX(x) = e^{−5} 5^x/x! for x = 0, 1, 2, . . .,  and  pX(x) = 0 otherwise.

Then

E[X] = Σ_{k=1}^∞ k e^{−5} 5^k/k! = 5 e^{−5} Σ_{i=0}^∞ 5^i/i! = 5.
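The value E[X] = 5 for this Poisson(5) pmf can be confirmed by summing the series numerically, truncating once the terms are negligible.

```python
import math

def poisson_pmf(k, lam=5.0):
    """pmf of Example 2.10: e^{-lam} lam^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Truncating at k = 100 leaves an astronomically small tail for lam = 5.
mean = sum(k * poisson_pmf(k) for k in range(100))
total = sum(poisson_pmf(k) for k in range(100))
assert abs(mean - 5.0) < 1e-9   # E[X] = 5
assert abs(total - 1.0) < 1e-9  # pmf sums to 1
```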

Example 2.11. Let X be a random variable with pdf

fX(x) = x^{−1/2}/(4π) for 0 ≤ x ≤ 4π²,  and  fX(x) = 0 otherwise.

Then

E[X] = ∫_0^{4π²} x^{1/2}/(4π) dx = (4/3)π².
Example 2.12. Let X be a random variable with pdf given by fX(x) = α/(π(α² + x²)), where α > 0 is a
constant. Observe that, with the substitution z = α² + x²,

∫_R |x| fX(x) dx = (2α/π) ∫_0^∞ x/(α² + x²) dx = (α/π) ∫_{α²}^∞ (1/z) dz = ∞,

since lim_{M→∞} ∫_{α²}^M (1/z) dz = lim_{M→∞} [log(M) − log(α²)] = ∞. Thus, E[X] does not exist.


Let X be a given random variable defined on a given probability space and g : R → R be a
Borel measurable function. We want to find the expectation of Y = g(X).
Theorem 2.4. Let X be a random variable on a given probability space (Ω, F, P). Then

E[g(X)] = Σ_x g(x) pX(x) if X is a discrete random variable,
E[g(X)] = ∫_{−∞}^∞ g(x) fX(x) dx if X is a continuous random variable,

provided the above sum or integral converges absolutely.
Proof. Let X be a discrete random variable that takes the values x_1, x_2, . . .. Then Y := g(X)
takes the values g(x_1), g(x_2), . . ., but some of these values may coincide. Let y_j, j ≥ 1, be the
distinct values of g(x_i). Then, by grouping, we have

Σ_i g(x_i) pX(x_i) = Σ_j Σ_{i : g(x_i)=y_j} g(x_i) pX(x_i) = Σ_j y_j Σ_{i : g(x_i)=y_j} pX(x_i)
                   = Σ_j y_j P(g(X) = y_j) = Σ_j y_j P(Y = y_j) = E[Y] = E[g(X)].

To prove the second part, let us first prove the following: for any non-negative random variable Y,
one has

E[Y] = ∫_0^∞ P(Y > y) dy.

We prove this claim for a continuous random variable Y with density function fY. Indeed, by
interchanging the order of integration,

∫_0^∞ P(Y > y) dy = ∫_0^∞ ( ∫_y^∞ fY(x) dx ) dy = ∫_0^∞ ( ∫_0^x dy ) fY(x) dx
                  = ∫_0^∞ x fY(x) dx = E[Y].

For a continuous random variable X, let g be a non-negative Borel measurable function. Then, we
have

E[g(X)] = ∫_0^∞ P(g(X) > y) dy = ∫_0^∞ ( ∫_{x : g(x)>y} fX(x) dx ) dy
        = ∫_{x : g(x)>0} ( ∫_0^{g(x)} dy ) fX(x) dx = ∫_{x : g(x)>0} g(x) fX(x) dx. □


Example 2.13. A product that is sold seasonally yields a net profit of b dollars for each unit
sold and a net loss of ℓ dollars for each unit left unsold when the season ends. The number
of units of the product that are ordered at a specific department store during any season is a
random variable having probability mass function p_i, i ≥ 0. If the store must stock this product
in advance, determine the number of units the store should stock so as to maximize its expected
profit.
Solution: Let X denote the number of ordered units and S the stock. Then the profit is

P(S) = bX − (S − X)ℓ if X ≤ S,  and  P(S) = Sb if X > S.
Hence the expected profit is

E[P(S)] = Σ_{i=0}^S [bi − (S − i)ℓ] p_i + Σ_{i=S+1}^∞ Sb p_i
        = (b + ℓ) Σ_{i=0}^S i p_i + Sb − (b + ℓ) S Σ_{i=0}^S p_i
        = Sb + (b + ℓ) Σ_{i=0}^S (i − S) p_i.

Consequently,

E[P(S + 1)] = b(S + 1) + (b + ℓ) Σ_{i=0}^S (i − S − 1) p_i.

Thus,

E[P(S + 1)] − E[P(S)] = b − (b + ℓ) Σ_{i=0}^S p_i.

Hence, stocking S + 1 units will be better than stocking S units whenever

Σ_{i=0}^S p_i < b/(b + ℓ). (2.1)

Observe that the l.h.s. of (2.1) is increasing in S whereas the r.h.s. of (2.1) is constant, and hence
the inequality will be satisfied for all values of S ≤ S*, where S* is the largest value of S satisfying
(2.1). Thus, stocking S* + 1 items will lead to a maximum expected profit.
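The stocking rule can be tested against brute force on a small hypothetical demand distribution (the pmf, prices, and helper names below are ours, chosen only for illustration): the stock given by criterion (2.1) should coincide with the stock maximising the expected profit directly.

```python
from itertools import accumulate

def optimal_stock(p, b, loss):
    """Largest S with sum_{i<=S} p_i < b/(b+loss), plus one (criterion (2.1))."""
    threshold = b / (b + loss)
    s_star = -1
    for s, c in enumerate(accumulate(p)):
        if c < threshold:
            s_star = s
    return s_star + 1

def expected_profit(S, p, b, loss):
    """E[P(S)] by direct summation over the demand distribution."""
    return sum((b * i - (S - i) * loss) * pi if i <= S else S * b * pi
               for i, pi in enumerate(p))

# Hypothetical demand over 0..5 units, profit b = 3, loss per unsold unit = 1.
p = [0.1, 0.2, 0.3, 0.2, 0.1, 0.1]
b, loss = 3.0, 1.0
S_best = optimal_stock(p, b, loss)
brute = max(range(len(p)), key=lambda S: expected_profit(S, p, b, loss))
assert S_best == brute  # the criterion agrees with brute force
```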
Example 2.14. Let X be a random variable with pdf given by

fX(x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞,

where σ > 0 and µ ∈ R are given constants. We would like to find E[X] and E[X²]. Taking
y = (x − µ)/σ,

E[X] = (1/(√(2π) σ)) ∫_{−∞}^∞ x e^{−(x−µ)²/(2σ²)} dx = (1/√(2π)) ∫_{−∞}^∞ (σy + µ) e^{−y²/2} dy
     = µ (1/√(2π)) ∫_{−∞}^∞ e^{−y²/2} dy + (σ/√(2π)) ∫_{−∞}^∞ y e^{−y²/2} dy = µ,

where the second integral vanishes by symmetry and, in the last equality, we have used that

∫_{−∞}^∞ e^{−y²/2} dy = √(2π).

Similarly,

E[X²] = ∫_{−∞}^∞ x² fX(x) dx = (1/√(2π)) ∫_{−∞}^∞ (σy + µ)² e^{−y²/2} dy
      = (σ²/√(2π)) ∫_{−∞}^∞ y² e^{−y²/2} dy + (2σµ/√(2π)) ∫_{−∞}^∞ y e^{−y²/2} dy
        + µ² (1/√(2π)) ∫_{−∞}^∞ e^{−y²/2} dy.

The middle integral vanishes by symmetry, and integrating by parts,

(σ²/√(2π)) ∫_{−∞}^∞ y² e^{−y²/2} dy = −(σ²/√(2π)) [y e^{−y²/2}]_{−∞}^∞ + (σ²/√(2π)) ∫_{−∞}^∞ e^{−y²/2} dy = σ².

Hence E[X²] = σ² + µ².
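These two moments can be checked by Monte Carlo sampling from the normal density (here with the illustrative choice µ = 2, σ = 3); the sample moments should be close to µ and µ² + σ².

```python
import random

def normal_moments(mu, sigma, n=200_000, seed=1):
    """Sample first and second moments of N(mu, sigma^2)."""
    rng = random.Random(seed)
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    return m1, m2

mu, sigma = 2.0, 3.0
m1, m2 = normal_moments(mu, sigma)
assert abs(m1 - mu) < 0.05                   # E[X] = mu
assert abs(m2 - (mu**2 + sigma**2)) < 0.3    # E[X^2] = mu^2 + sigma^2
```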

We now state some important properties of the expected value of a random variable, whose
proofs are simple and hence left as an exercise.
Theorem 2.5. Let X be a random variable defined on a given probability space.
i) If X is non-negative, i.e., P(X ≥ 0) = 1 and E(X) exists, then E(X) ≥ 0.
ii) For any constant c, E[c] = c.
iii) If X is bounded i.e., there exists M > 0 such that P(|X| > M ) = 0, then E[X] exists.
iv) Let g and h be two functions such that g(X) and h(X) are random variables and both
E[h(X)] and E[g(X)] exist. Then for any α, β ∈ R, the expectation of the random variable
αg(X) + βh(X) exists and is given by
E[αg(X) + βh(X)] = αE[g(X)] + βE[h(X)].
In other words, E[·] is linear. Moreover, if g(x) ≤ h(x) for all x, then
E[g(X)] ≤ E[h(X)].
In particular, one has the following:
E[X] ≤ E[|X|].
If g(x) = x^n, then E[g(X)] = E[X^n] is called the moment of order n.
Lemma 2.6. Let X be a random variable such that E[X^r] exists. Then E[X^s] exists for all
0 < s < r.
Proof. We prove it for a continuous random variable; the discrete case is similar and left as an
exercise. Let X be a continuous random variable with pdf fX, and let 0 < s < r. Then for any
x ∈ R, we have |x|^s ≤ 1 + |x|^r (consider |x| ≤ 1 and |x| > 1 separately), and hence

∫_{−∞}^∞ |x|^s fX(x) dx ≤ ∫_{−∞}^∞ (1 + |x|^r) fX(x) dx = 1 + ∫_{−∞}^∞ |x|^r fX(x) dx < +∞.

This shows that E[X^s] exists for all 0 < s < r. □
We denote by µ the mean of a random variable, i.e., µ = E[X]. It is natural to ask how
far X typically is from its mean; a convenient measure of this spread is the quantity E[(X − µ)²].
Definition 2.8 (Variance of a random variable:). Let X be a random variable with finite
second moment i.e., E[X 2 ] < ∞ and mean µ. Then the variance of X, denoted by Var(X), is
defined as
Var(X) := E[(X − µ)2 ].
We now discuss certain properties of variance.
Theorem 2.7. The following hold:
i) Var(X) ≥ 0 and Var(α) = 0 for any constant α.
ii) Var(X) = E(X 2 ) − (E[X])2 .
iii) Var(aX + b) = a2 Var(X).
iv) Var(X) = 0 if and only if P(X = E[X]) = 1.
Proof. Proof of i) : Since (X − µ)2 ≥ 0, we have E[(X − µ)2 ] ≥ 0 i.e., Var(X) ≥ 0. Moreover,
for any constant α, we have Var(α) = E[(α − E[α])2 ] = E[0] = 0.
Proof of ii) : By using linearity of expectation, we see that
Var(X) = E[(X − µ)2 ] = E[X 2 − 2µX + µ2 ] = E[X 2 ] − (E[X])2 .
Proof of iii) : By using linearity of expectation and ii), we have for any a, b ∈ R
Var(aX + b) = E[(aX + b)2 ] − (E[aX + b])2 = E[a2 X 2 + 2abX + b2 ] − (aE[X] + b)2
= a2 {E[X 2 ] − (E[X])2 } + 2abE[X] + b2 − 2abE[X] − b2 = a2 Var(X).
Proof of iv): If P(X = E[X]) = 1, then it is clear that Var(X) = 0. Conversely, suppose
Var(X) = 0. If possible, let P(X = µ) < 1. Then there exists a constant c > 0 such that

P((X − µ)² > c) > 0.

We then have

0 = Var(X) = E[(X − µ)²] = E[(X − µ)² I_{(X−µ)² > c}] + E[(X − µ)² I_{(X−µ)² ≤ c}]
  ≥ E[(X − µ)² I_{(X−µ)² > c}] ≥ c P((X − µ)² > c) > 0,

a contradiction. □


One can consider the central moment of order k, i.e., E[(X − µ)k ].
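Properties i)–iii) of Theorem 2.7 can be illustrated numerically: for any data set, the empirical variance of a·x + b equals a² times the empirical variance of x, up to floating-point error. The sketch below uses an illustrative uniform sample.

```python
import random

def var(xs):
    """Empirical (population) variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(42)
xs = [rng.uniform(0, 1) for _ in range(50_000)]
a, b = 3.0, 7.0
ys = [a * x + b for x in xs]
# Empirical analogue of Var(aX + b) = a^2 Var(X):
assert abs(var(ys) - a * a * var(xs)) < 1e-8
```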

2.4. Generating functions and their applications: We will now discuss various types of
generating functions and their applications. One of them is the probability generating function,
which plays an important role in applied probability, in particular in stochastic processes.

Definition 2.9 (Probability generating function (pgf)). Let X be a non-negative integer-
valued random variable and let p_i = P(X = i), i = 0, 1, 2, . . ., with Σ_{i=0}^∞ p_i = 1. The probability
generating function of X is defined by

GX(s) = E[s^X] = Σ_{i=0}^∞ p_i s^i, |s| < 1.

Note that, for s = 1, GX(1) = E[1] = 1.


Example 2.15. i): If P(X = c) = 1, then GX(s) = E[s^X] = s^c.
ii): If P(X = 1) = p and P(X = 0) = 1 − p for some p ∈ (0, 1), then GX(s) = (1 − p) + ps.
iii): Let X be a discrete random variable with pmf given by pX(k) = e^{−λ} λ^k/k! for k = 0, 1, 2, . . ..
Then the pgf of X is given by

GX(s) = Σ_{k=0}^∞ s^k e^{−λ} λ^k/k! = e^{−λ} Σ_{k=0}^∞ (sλ)^k/k! = e^{λ(s−1)}.
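For the Poisson pgf in iii), the closed form e^{λ(s−1)} and the defining series agree, and a one-sided difference quotient at s = 1 recovers G′X(1) = E[X] = λ (here with the illustrative value λ = 2).

```python
import math

def G(s, lam=2.0, n_terms=60):
    """Poisson pgf via its defining series; should match exp(lam*(s-1))."""
    return sum(math.exp(-lam) * lam ** k / math.factorial(k) * s ** k
               for k in range(n_terms))

lam = 2.0
assert abs(G(1.0) - 1.0) < 1e-12                               # G_X(1) = 1
assert abs(G(0.5, lam) - math.exp(lam * (0.5 - 1.0))) < 1e-12  # closed form

# G'(1) = E[X] = lam, via a one-sided difference from below (s -> 1-).
h = 1e-6
deriv = (G(1.0) - G(1.0 - h)) / h
assert abs(deriv - lam) < 1e-4
```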

The next theorem illustrates an application of the pgf.

Theorem 2.8. Let X be a non-negative integer-valued random variable with pgf GX(·). Then

E[X(X − 1) · · · (X − k + 1)] = GX^{(k)}(1),

where GX^{(k)}(1) = lim_{s→1} GX^{(k)}(s).
Proof. Note that, since the series Σ_{i=0}^∞ p_i s^i converges in |s| < 1, it converges uniformly on
compact subsets and therefore term-by-term differentiation is possible at points s with |s| < 1.
Thus, we have

GX^{(k)}(s) = Σ_i i(i − 1) · · · (i − k + 1) p_i s^{i−k} = E[s^{X−k} X(X − 1) · · · (X − k + 1)].

This sum is convergent for |s| < 1. To prove the theorem, we will use the following result, called
Abel's theorem. It states that if a_i ≥ 0 for all i and Ga(s) = Σ_{i=0}^∞ a_i s^i is finite for |s| < 1,
then

lim_{s→1} Ga(s) = Σ_{i=0}^∞ a_i,

whether the sum is finite or equals +∞. Using Abel's theorem, we have

lim_{s→1} GX^{(k)}(s) = Σ_i i(i − 1) · · · (i − k + 1) p_i = E[X(X − 1) · · · (X − k + 1)].

This completes the proof. □
Suppose that E[X²] < +∞. Then we can calculate Var(X) in terms of the pgf. We have

Var(X) = E[X²] − (E[X])² = E[X(X − 1)] + E[X] − (E[X])² = GX^{(2)}(1) + GX^{(1)}(1) − {GX^{(1)}(1)}².

One can easily check that

p_i = P(X = i) = (1/i!) (d^i/ds^i) GX(s) |_{s=0}.

The probability generating function uniquely determines the distribution function. In particular,
if two random variables have the same pgf in some interval containing zero, then the random
variables have the same distribution.
Another important generating function is called the moment generating function.

Definition 2.10 (Moment generating function (mgf):). For a given random variable X,
we define its moment generating function, denoted by mX(·), as

mX(t) = E[e^{tX}],

provided E[e^{tX}] exists in a small neighborhood of the origin.
Example 2.16. Let X be a discrete random variable with pmf pX(k) = e^{−λ} λ^k/k! for k = 0, 1, 2, . . ..
Then

mX(t) = e^{−λ} Σ_{k=0}^∞ e^{tk} λ^k/k! = e^{−λ(1−e^t)} for all t.

Example 2.17. Let X be a random variable with pdf given by

fX(x) = 2e^{−2x} for x > 0,  and  fX(x) = 0 for x ≤ 0.

Then the mgf of X is given by

mX(t) = E[e^{tX}] = ∫_0^∞ e^{tx} 2e^{−2x} dx = 2 ∫_0^∞ e^{−(2−t)x} dx = 2/(2 − t), for t < 2.
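Differentiating mX(t) = 2/(2 − t) at t = 0 should recover the moments of this density, E[X] = 1/2 and E[X²] = 1/2; the sketch below checks this with numerical derivatives.

```python
def m(t):
    """mgf of Example 2.17: X with density 2 e^{-2x} (x > 0), m(t) = 2/(2-t)."""
    return 2.0 / (2.0 - t)

h = 1e-5
first = (m(h) - m(-h)) / (2 * h)               # ~ m'(0)  = E[X]   = 1/2
second = (m(h) - 2 * m(0.0) + m(-h)) / h ** 2  # ~ m''(0) = E[X^2] = 1/2
assert abs(first - 0.5) < 1e-6
assert abs(second - 0.5) < 1e-4
```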
Example 2.18. Let X be a discrete random variable with pmf pX(k) = (6/π²)(1/k²) for k = 1, 2, . . ..
Then for any s > 0, (6/π²) Σ_{k=1}^∞ e^{sk}/k² is infinite, and hence mX(·) does not exist. In fact,
one can check that E[X] = ∞.
Theorem 2.9. Let X be a random variable whose moment generating function mX(·) exists.
Then for any r ∈ N, the r-th moment E[X^r] exists. Moreover, E[X^r] can be calculated as follows:

(d^r/ds^r) mX(s) |_{s=0} = E[X^r], r ≥ 1.

Proof. Since mX(t) exists, there exists a > 0 such that mX(a) and mX(−a) are finite. We claim
that for any s ∈ (−a, a), mX(s) exists. Indeed,

E[e^{sX}] = E[e^{sX}(1_{X>0} + 1_{X≤0})] ≤ E[e^{aX} + e^{−aX}] = mX(a) + mX(−a) < ∞.

Now fix k ∈ N and choose s with 0 < s < a. Since (s|X|)^k/k! ≤ e^{s|X|} ≤ e^{aX} + e^{−aX}, we have

s^k |X|^k ≤ k! (e^{aX} + e^{−aX})  =⇒  E[|X|^k] ≤ k! s^{−k} (mX(a) + mX(−a)) < ∞.

Observe that

E[Σ_{i=0}^n (t^i/i!) X^i] = Σ_{i=0}^n (t^i/i!) E[X^i],

and, by the bound above, Σ_i (t^i/i!) E[|X|^i] ≤ (mX(a) + mX(−a)) Σ_i (t/s)^i, which converges for
|t| < a (choosing s with |t| < s < a). Hence we see that

mX(t) = E[e^{tX}] = E[Σ_{i=0}^∞ (t^i/i!) X^i] = Σ_{i=0}^∞ (t^i/i!) E[X^i].

This is a power series expansion of mX(t) around t = 0, and hence term-by-term differentiation is
possible for |t| < a. This implies that

(d^r/ds^r) mX(s) = Σ_{i=0}^∞ (s^i/i!) E[X^{r+i}]  =⇒  (d^r/ds^r) mX(s) |_{s=0} = E[X^r], r ≥ 1. □
Next we state an important theorem regarding mgf without proof.

Theorem 2.10. If mX(t) exists, then it is unique. Moreover, it determines the distribution
uniquely. In particular, if X and Y are two random variables whose moment generating functions
exist, and mX(t) = mY(t) for all t, then X and Y have the same distribution.
We now discuss another important generating function which always exists.
Definition 2.11 (Characteristic function:). For a random variable X, we define its charac-
teristic function φX : R → C via

φX(t) = E[e^{itX}] = E[cos(tX)] + i E[sin(tX)],  i² = −1.
We now prove some important properties of characteristic function.
Theorem 2.11. Let φX(·) be the characteristic function of a random variable X. Then the
following hold:
i) φX(0) = 1 and |φX(t)| ≤ 1 for all t ∈ R.
ii) φX(·) is uniformly continuous on R.
iii) If a, b ∈ R and Y = aX + b, then φY(t) = e^{itb} φX(at).
iv) If E[|X|^m] < +∞, then φX is m-times continuously differentiable and

(d^m/dt^m) φX(t) |_{t=0} = i^m E[X^m].

v) If X and Y are random variables such that φX(t) = φY(t) for all t ∈ R, then X and Y
have the same distribution.
Proof. Proof of i): Since cos(tX) and sin(tX) are bounded functions, expectation exists and
hence φX (t) exists for all t ∈ R. Moreover, φX (0) = E[1] = 1. Furthermore,
|φX (t)| ≤ E[|eitX |] = E[1] = 1, ∀ t ∈ R.
Proof of ii): Let t ∈ R. For any h, we have

|φX(t + h) − φX(t)| = |E[e^{itX}(e^{ihX} − 1)]| ≤ E[|e^{itX}| |e^{ihX} − 1|] = E[|e^{ihX} − 1|].

Set Y(h) = |e^{ihX} − 1|. Then Y(h) → 0 as h → 0 and |Y(h)| ≤ 2. Hence, by the bounded convergence
theorem, E[Y(h)] → 0. Since this bound is independent of t, we conclude that φX(·) is uniformly
continuous on R.
Proof of iii): Observe that Y is a random variable. Moreover, by using linearity of expectation,
we have

φY(t) = E[e^{it(aX+b)}] = E[e^{itb} e^{i(at)X}] = e^{itb} E[e^{i(at)X}] = e^{itb} φX(at).
Proof of iv): We prove the result for m = 1; the general case can be established analogously
by recurrence. Fix t ∈ R. For any h ∈ R, we have, by using linearity of expectation,

(1/h)[φX(t + h) − φX(t)] = (1/h) E[e^{itX}(e^{ihX} − 1)] = E[e^{itX} (e^{ihX} − 1)/h].

Note that (e^{ihx} − 1)/h → ix as h → 0 and |(e^{ihx} − 1)/h| ≤ |x| for all h ≠ 0. Since E[|X|] < +∞,
by the dominated convergence theorem, we have

lim_{h→0} (1/h)[φX(t + h) − φX(t)] = lim_{h→0} E[e^{itX} (e^{ihX} − 1)/h] = E[e^{itX} lim_{h→0} (e^{ihX} − 1)/h]
                                   = E[e^{itX} iX] = i E[X e^{itX}].

So φ′X(t) exists and φ′X(t) = i E[X e^{itX}]. We now show that φ′X(·) is continuous. Notice that

|φ′X(t + h) − φ′X(t)| = |i E[X e^{itX}(e^{ihX} − 1)]| ≤ E[|X| Y(h)],

where Y(h) := |e^{ihX} − 1|. Notice that |X| Y(h) → 0 as h → 0 and |X| Y(h) ≤ 2|X|. Since
E[|X|] < +∞, by the dominated convergence theorem, we have

lim_{h→0} E[|X| Y(h)] = E[lim_{h→0} |X| Y(h)] = 0.

Hence we conclude that φ′X(·) is uniformly continuous on R. □


Example 2.19. Let X be a random variable with pdf fX(x) = λe^{−λx} for x > 0. Then for any
t ∈ R, we have

φX(t) = E[e^{itX}] = ∫_0^∞ λe^{−λx} e^{itx} dx = λ { ∫_0^∞ cos(tx)e^{−λx} dx + i ∫_0^∞ sin(tx)e^{−λx} dx }
      = (λ + it) ∫_0^∞ cos(tx)e^{−λx} dx := (λ + it) I(t).

We now calculate I(t). By applying the integration by parts formula, we have

I(t) = −(1/λ) ∫_0^∞ cos(tx) (d/dx) e^{−λx} dx = 1/λ − (t/λ) ∫_0^∞ sin(tx)e^{−λx} dx = 1/λ − (t²/λ²) I(t)
=⇒ I(t) = λ/(t² + λ²).

Thus, we have

φX(t) = λ(λ + it)/(t² + λ²) = λ/(λ − it).
Example 2.20. Let X be a continuous random variable with density function fX given by

fX(x) = 1 for 0 < x < 1,  and  fX(x) = 0 otherwise.

Then for any t ≠ 0, we have

φX(t) = ∫_0^1 cos(tx) dx + i ∫_0^1 sin(tx) dx = (1/t)[sin(t) − i cos(t) + i] = (1/(it))(e^{it} − 1).
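This closed form can be verified by computing E[e^{itX}] for the uniform density numerically (midpoint rule, a helper of ours) and comparing with (e^{it} − 1)/(it).

```python
import cmath

def phi_numeric(t, n=20_000):
    """E[exp(itX)] for X ~ Uniform(0,1), by the midpoint rule."""
    h = 1.0 / n
    return h * sum(cmath.exp(1j * t * (k + 0.5) * h) for k in range(n))

def phi_closed(t):
    """Closed form from Example 2.20, extended by phi(0) = 1."""
    return (cmath.exp(1j * t) - 1.0) / (1j * t) if t != 0 else 1.0 + 0j

for t in (0.5, 1.0, 3.0, -2.0):
    assert abs(phi_numeric(t) - phi_closed(t)) < 1e-6
```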
