2 Random Variables
We will now study the main tools used in statistics for modelling experiments with unknown outcomes:
random variables and their distributions.
• Attached to each random variable is a probability rule that measures the likelihood of a
particular outcome. Suppose A is a set in R and we wish to measure the probability that
X ∈ A. The basic tool for doing so is the cumulative distribution function (cdf):

FX(x) = p(X ≤ x)
• For a function FX defined on all of R to be a cdf it must satisfy the following conditions:
– be non-decreasing;
– FX(+∞) = 1;
– FX(−∞) = 0;
– p(a < X ≤ b) = FX(b) − FX(a).
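As a minimal sketch of these conditions, the code below checks them numerically for the standard normal cdf, computed from the error function. The choice of the normal distribution is an assumption for illustration only.

```python
import math

def normal_cdf(x):
    """cdf of the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# non-decreasing on a grid of points
xs = [i / 10.0 for i in range(-80, 81)]
vals = [normal_cdf(x) for x in xs]
assert all(a <= b for a, b in zip(vals, vals[1:]))

# limits: FX(-inf) = 0 and FX(+inf) = 1 (checked far in the tails)
assert normal_cdf(-10.0) < 1e-12
assert normal_cdf(10.0) > 1 - 1e-12

# p(a < X <= b) = FX(b) - FX(a)
a, b = -1.0, 1.0
print(normal_cdf(b) - normal_cdf(a))  # ≈ 0.6827
```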
• A random variable X is discrete, or follows a discrete distribution, if X can take only a discrete
number of different values (possibly infinitely many), {k1 , k2 , . . . , kn , . . .}.
UCL Department of Economics MSc Maths and Statistics Refresher Course
Mónica Costa Dias September 2007
• A discrete random variable takes values only on a discrete set of values, {k1 , k2 , . . .}.
• For these variables we can define a function fX called the probability density function,
abbreviated as pdf:
fX (x) = p(X = x)
• Notice:
– If x is not one of the possible outcomes of the experiment, {k1 , k2 , . . .}, then fX (x) = 0.
– At points in the possible space of outcomes, the pdf will be strictly positive. These are
called mass points of the distribution.
– Thus fX is always non-negative. Since it measures the probability of each possible event,
it cannot be bigger than 1.
– The cdf of X can then be recovered by summing the pdf over the mass points:

FX(x) = p(X ≤ x) = Σ_{i: ki ≤ x} fX(ki)
– FX is non-decreasing: if x2 > x1 then all values ki that are no larger than x1 are also no
larger than x2, and thus FX(x2) must be at least equal to FX(x1);
– FX(+∞) = Σ_{all i} fX(ki) = 1;
– FX (−∞) = 0;
– p(a < X ≤ b) = Σ_{i: a < ki ≤ b} fX(ki) = Σ_{i: ki ≤ b} fX(ki) − Σ_{i: ki ≤ a} fX(ki) = FX(b) − FX(a).
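These relations can be checked directly for a small discrete distribution. The sketch below assumes a fair six-sided die as the example distribution and builds the cdf by summing the pdf over mass points:

```python
from fractions import Fraction

# pdf of a fair six-sided die (an assumed toy example)
pdf = {k: Fraction(1, 6) for k in range(1, 7)}

def cdf(x):
    """FX(x) = sum of fX(ki) over mass points ki <= x."""
    return sum(p for k, p in pdf.items() if k <= x)

assert cdf(6) == 1                      # FX at the top of the support is 1
assert cdf(0) == 0                      # FX below the support is 0
# p(2 < X <= 5) = FX(5) - FX(2)
p_direct = sum(p for k, p in pdf.items() if 2 < k <= 5)
assert p_direct == cdf(5) - cdf(2) == Fraction(1, 2)
print(p_direct)  # 1/2
```

Using `Fraction` keeps the probabilities exact, so the identities hold with equality rather than up to rounding error.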
• A continuous random variable X can assume any value within an interval which may be
bounded or not.
• In this case the cdf of X, FX, may be continuous over the whole of R or, at least, will be
continuous over intervals of R: it may have some discontinuity points.
• To start with, suppose that the cdf of X is continuous and differentiable over R. Then we
can define the function fX as follows:
fX(x) = dFX(x)/dx = FX′(x)    (1)
• The function fX defined above is called the probability density function and abbreviated pdf.
It measures the marginal change (increase) in FX as x changes infinitesimally.
• The reverse of (1) is that the cdf can be defined as the function FX that satisfies the
following condition:

FX(x) = ∫_{−∞}^{x} fX(t) dt
meaning that it measures the area below the curve of fX .
• This definition of cdf can be extended to random variables that follow a continuous distribu-
tion in all R except possibly for a finite number of points. In this case fX (x) = FX0 (x) for all
x where the derivative exists.
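The relation fX = FX′ can be illustrated numerically: differentiating a known cdf should recover the pdf. The sketch below assumes an exponential distribution with rate 1 as the example, and approximates the derivative by a central difference:

```python
import math

# exponential distribution with rate 1 (an assumed example):
# cdf FX(x) = 1 - exp(-x) and pdf fX(x) = exp(-x), for x >= 0
def F(x):
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def f_numeric(x, h=1e-6):
    """fX(x) = FX'(x), approximated by a central difference."""
    return (F(x + h) - F(x - h)) / (2 * h)

for x in [0.5, 1.0, 2.0]:
    exact = math.exp(-x)
    assert abs(f_numeric(x) - exact) < 1e-6
print(round(f_numeric(1.0), 6))  # ≈ exp(-1) ≈ 0.367879
```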
– Notice that, for any particular value a,

p(X = a) = p(a ≤ X ≤ a) = ∫_{a}^{a} fX(x) dx = 0
meaning that the likelihood of the realisation of a particular value in the (continuous)
distribution of X is zero.
– But then the function fX is not unique: it can be changed at a discrete (finite or infinite)
number of points and still generate the same cdf. We resolve the ambiguity by always using
the continuous version of fX unless there are reasons to do otherwise.
– Suppose the random variable X is defined on the space {k1 , k2 , . . .}, following a discrete
distribution with pdf fX so that the probability of x is p(X = x) = fX (x).
– Now consider a transformation of X through a function h to form a new random variable
Y : Y = h(X).
– Y follows a new probability rule fY which is defined as follows:

fY(y) = p(Y = y) = p(h(X) = y) = Σ_{x: h(x) = y} fX(x)
– If instead X is continuous with pdf fX, the cdf of Y is

FY(y) = p(Y ≤ y) = p(h(X) ≤ y) = ∫_{x: h(x) ≤ y} fX(x) dx
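For the discrete case, the transformation rule can be sketched directly. The example below assumes a uniform pdf on {−1, 0, 1, 2} and the transformation h(x) = x², so two values of X map to the same value of Y:

```python
from fractions import Fraction

# an assumed discrete pdf for X on {-1, 0, 1, 2}
fX = {-1: Fraction(1, 4), 0: Fraction(1, 4),
      1: Fraction(1, 4), 2: Fraction(1, 4)}

def h(x):
    """Transformation Y = h(X) = X^2."""
    return x * x

# fY(y) = sum of fX(x) over all x with h(x) = y
fY = {}
for x, p in fX.items():
    fY[h(x)] = fY.get(h(x), Fraction(0)) + p

assert fY[1] == Fraction(1, 2)   # both x = -1 and x = 1 map to y = 1
assert sum(fY.values()) == 1
print(sorted(fY.items()))        # mass points of Y are 0, 1 and 4
```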
2.1.4 Moments
• The distribution of a random variable contains all the information about it. However, it is
often cumbersome and difficult to present. Instead, some functions of the random variable
summarise the distribution and are often presented. The most commonly used functions are
the moments of the random variable.
• Higher order moments: these help characterise a distribution. They may be centred around the mean or not.
• Median: another measure of the centre of the distribution. It is the point m that divides the
distribution into two parts, each with probability 0.5.
– The median of the distribution of a discrete random variable X is defined as the smallest
value m such that:

p(X ≤ m) ≥ 0.5
• Quantile: the median is an example of a quantile, the 0.5-quantile. In general, the p-quantile
of a distribution is the value x that divides the distribution into two parts, one with probability
p and the other with probability 1 − p.
Qp(X) = x if p(X ≤ x) = p
– For a discrete random variable X, the p-quantile is defined as the smallest x such that
p(X ≤ x) ≥ p.
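This "smallest x" definition translates directly into code. The sketch below again assumes a fair die as the example distribution and scans the support in increasing order until the cdf reaches p:

```python
from fractions import Fraction

# pdf of a fair six-sided die (an assumed example)
pdf = {k: Fraction(1, 6) for k in range(1, 7)}
support = sorted(pdf)

def quantile(p):
    """Smallest x in the support with p(X <= x) >= p."""
    cum = Fraction(0)
    for x in support:
        cum += pdf[x]
        if cum >= p:
            return x
    raise ValueError("p must lie in (0, 1]")

assert quantile(Fraction(1, 2)) == 3   # median: p(X <= 3) = 1/2 >= 1/2
assert quantile(Fraction(3, 4)) == 5   # p(X <= 4) = 2/3 < 3/4 but p(X <= 5) = 5/6 >= 3/4
print(quantile(Fraction(1, 2)))  # 3
```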
We may imagine cases where more than one random variable is required to describe an experiment.
We will now study how to deal with more than one random variable simultaneously.
• Let (X, Y ) be a pair of random variables. We now want to characterise their joint distribution.
• Discrete case: if both X and Y are discrete random variables defined on the space S, the
joint pdf is

fXY(x, y) = p(X = x, Y = y)
• Continuous case: if X and Y are continuous random variables, the joint cdf is
FXY(x, y) = P(X ≤ x, Y ≤ y)

and the joint pdf is obtained by differentiating the joint cdf with respect to both arguments:

fXY(x, y) = ∂²FXY(x, y) / ∂x∂y
and thus:
p(ax < X ≤ bx, ay < Y ≤ by) = ∫_{ax}^{bx} ∫_{ay}^{by} fXY(x, y) dy dx
and
p(X ≤ a, Y ≤ b) = ∫_{−∞}^{a} ∫_{−∞}^{b} fXY(x, y) dy dx = FXY(a, b)
• Consider again the case of two random variables (X, Y ). If the joint cdf is known, then the
cdf of each variable can be derived.
• In the discrete case, this amounts to summing over all the possible values of the other variable. Let
S be the support of (X, Y ) (the set of possible values that (X, Y ) may assume) and suppose
there are MX and MY different possible values that X and Y can assume, respectively. The
marginal distribution of X is defined by its marginal pdf, fX , as follows:
fX(x) = p(X = x) = Σ_{j=1}^{MY} fXY(x, yj)
and similarly for Y:

fY(y) = p(Y = y) = Σ_{i=1}^{MX} fXY(xi, y)
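Marginalisation in the discrete case is just row and column sums of the joint pdf. The sketch below assumes a small joint distribution on {1, 2} × {1, 2, 3} chosen for illustration:

```python
from fractions import Fraction

# an assumed joint pdf for (X, Y) on {1, 2} x {1, 2, 3}
fXY = {
    (1, 1): Fraction(1, 6), (1, 2): Fraction(1, 6), (1, 3): Fraction(1, 6),
    (2, 1): Fraction(1, 12), (2, 2): Fraction(1, 4), (2, 3): Fraction(1, 6),
}
assert sum(fXY.values()) == 1

# marginal of X: sum the joint pdf over all values of Y (and vice versa)
fX, fY = {}, {}
for (x, y), p in fXY.items():
    fX[x] = fX.get(x, Fraction(0)) + p
    fY[y] = fY.get(y, Fraction(0)) + p

assert fX[1] == Fraction(1, 2) and fX[2] == Fraction(1, 2)
assert fY[3] == Fraction(1, 3)
assert sum(fX.values()) == sum(fY.values()) == 1
print(sorted(fX.items()), sorted(fY.items()))
```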
• In the continuous case we need to integrate over one of the variables to obtain the cdf of the
other:
FX(x) = ∫_{−∞}^{x} ∫_{−∞}^{+∞} fXY(s, y) dy ds
FY(y) = ∫_{−∞}^{+∞} ∫_{−∞}^{y} fXY(x, s) ds dx
The marginal pdf's can now be obtained as the first derivatives of the marginal cdf's, or equivalently by integrating the joint pdf over the other variable:
fX(x) = ∫_{−∞}^{+∞} fXY(x, y) dy
fY(y) = ∫_{−∞}^{+∞} fXY(x, y) dx
• We have encountered the concept of conditional probability before. We can now apply it to
distribution functions.
• Suppose we have a pair of random variables (X, Y ) and wish to determine the probability of
some realisation of y given that we have some information about X. In particular, we can
derive:
P(Y ≤ y | X ≤ x) = p(Y ≤ y, X ≤ x) / p(X ≤ x) = FXY(x, y) / FX(x)
• For discrete random variables we can immediately write the pdf in a similar way:
p(Y = y | X = x) = p(Y = y, X = x) / p(X = x) = fXY(x, y) / fX(x) = fY|X(y|x)
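The conditional pdf in the discrete case, and the Bayes-type relation between the two conditionals, can be sketched with a small joint distribution (the numbers below are an assumed example):

```python
from fractions import Fraction

# an assumed joint pdf for (X, Y), each taking values in {0, 1}
fXY = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
       (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4)}

fX = {x: sum(p for (a, b), p in fXY.items() if a == x) for x in (0, 1)}
fY = {y: sum(p for (a, b), p in fXY.items() if b == y) for y in (0, 1)}

def f_Y_given_X(y, x):
    """fY|X(y|x) = fXY(x, y) / fX(x)."""
    return fXY[(x, y)] / fX[x]

def f_X_given_Y(x, y):
    """fX|Y(x|y) = fXY(x, y) / fY(y)."""
    return fXY[(x, y)] / fY[y]

# conditional probabilities sum to one over y
assert f_Y_given_X(0, 0) + f_Y_given_X(1, 0) == 1
# fX|Y(x|y) = fY|X(y|x) fX(x) / fY(y)
x, y = 1, 0
assert f_X_given_Y(x, y) == f_Y_given_X(y, x) * fX[x] / fY[y]
print(f_X_given_Y(1, 0))  # 2/3
```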
• For continuous random variables we need to take a small interval in X and write a similar
relationship:
P(Y ≤ y | x < X ≤ x + ∆) = [p(Y ≤ y, X ≤ x + ∆) − p(Y ≤ y, X ≤ x)] / [p(X ≤ x + ∆) − p(X ≤ x)]
= [FXY(x + ∆, y) − FXY(x, y)] / [FX(x + ∆) − FX(x)]
= {[FXY(x + ∆, y) − FXY(x, y)] / ∆} / {[FX(x + ∆) − FX(x)] / ∆}

Letting ∆ → 0 defines the conditional pdf fY|X(y|x), and thus:
fX|Y(x|y) = fY|X(y|x) fX(x) / fY(y)
2.2.3 Moments
• Expected value: for a function g of (X, Y),

EXY(g(X, Y)) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} g(x, y) fXY(x, y) dy dx

– But then:

EXY(g1(X, Y) + g2(X, Y)) = EXY(g1(X, Y)) + EXY(g2(X, Y))
2.2.4 Independence

• X and Y are independent if and only if

fXY(x, y) = fX(x)fY(y)

and

FXY(x, y) = FX(x)FY(y)

which means that knowing one does not say anything about the other.
• In this case we have some results for the moments. If X and Y are independent then:
– E(XY ) = E(X)E(Y )
– V (aX + bY ) = a2 V (X) + b2 V (Y )
– E(X|Y ) = E(X) and E(Y |X) = E(Y ) where
E(Y | X = x) = ∫_{−∞}^{+∞} y fY|X(y|x) dy
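These moment results under independence can be verified exhaustively for a small example. The sketch below assumes X and Y are two independent fair dice and checks E(XY) = E(X)E(Y) and V(aX + bY) = a²V(X) + b²V(Y):

```python
from fractions import Fraction
from itertools import product

# X and Y: two independent fair dice (an assumed example)
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)  # each pair is equally likely under independence

def E(g):
    """Expectation of g(X, Y) by direct enumeration."""
    return sum(Fraction(g(x, y)) * p for x, y in outcomes)

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
VX = E(lambda x, y: (x - EX) ** 2)
VY = E(lambda x, y: (y - EY) ** 2)

# E(XY) = E(X)E(Y) under independence
assert E(lambda x, y: x * y) == EX * EY
# V(aX + bY) = a^2 V(X) + b^2 V(Y), checked for a = 2, b = -3
a, b = 2, -3
m = a * EX + b * EY
assert E(lambda x, y: (a * x + b * y - m) ** 2) == a**2 * VX + b**2 * VY
print(EX, VX)  # 7/2 35/12
```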
The above results extend straightforwardly to the case where there are many random variables. In that
case they are generally arranged in vectors.
FX(x) = p(X ≤ x) = p(X1 ≤ x1, X2 ≤ x2, . . . , Xn ≤ xn)
– Expected value:

EX(X) = (EX1(X1), EX2(X2), . . . , EXn(Xn))′
– Each of the expectations inside the vector are performed using the marginal distributions,
so for example:
EX1(X1) = ∫_{−∞}^{+∞} . . . ∫_{−∞}^{+∞} x1 fX(x1, x2, . . . , xn) dx1 dx2 . . . dxn = ∫_{−∞}^{+∞} x1 fX1(x1) dx1
VX(a′X + b) = VX(a′X) = a′V(X)a

This is a quadratic form. Since the variance is always non-negative, it follows that
VX(a′X + b) ≥ 0 for every vector a. But then V(X) is positive semi-definite (psd).
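The positive semi-definiteness of a variance matrix can be spot-checked numerically: a sample covariance matrix is psd by construction, so a′V(X)a should be non-negative for every a. The data points below are an assumed toy sample:

```python
import random

# build a 2x2 sample covariance matrix from an assumed toy sample
data = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
vxx = sum((x - mx) ** 2 for x, _ in data) / n
vyy = sum((y - my) ** 2 for _, y in data) / n
vxy = sum((x - mx) * (y - my) for x, y in data) / n
V = [[vxx, vxy], [vxy, vyy]]

def quad_form(a, V):
    """The quadratic form a'Va for a 2-vector a."""
    return sum(a[i] * V[i][j] * a[j] for i in range(2) for j in range(2))

# a'V(X)a = V(a'X) >= 0 for every a: spot-check with random vectors
random.seed(0)
for _ in range(1000):
    a = [random.uniform(-5, 5), random.uniform(-5, 5)]
    assert quad_form(a, V) >= -1e-12
print("quadratic form non-negative on all sampled a")
```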
2.4 Exercises
1. An exam consists of 100 multiple-choice questions. For each question there are four possible
answers, only one of them being correct. If a candidate guesses answers at random, what is
the probability of getting at least 30 questions correct?
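As a hedged sketch of how this probability could be computed (assuming the number of correct guesses is binomial with n = 100 trials and success probability 1/4):

```python
from math import comb

# number of correct guesses X ~ Binomial(n = 100, p = 1/4)
n, p = 100, 0.25
# p(X >= 30): sum the binomial pdf over k = 30, ..., 100
prob = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(30, n + 1))
print(round(prob, 4))  # p(X >= 30)
```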
2. Two fair dice are thrown. Let X be the number of points on the first die and Y be the
number of points on the second die. Define Z = X + Y and W = XY.
Find the expectations and variances of X, Y, Z, W . Also find E(X 2 ), E(Y 2 ), E(Z 2 ) and
E(W 2 ).
Let
T = 2X + Y − 3Z + 4
U = (X + Z)(Y + Z)
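A sketch for checking answers to exercise 2, computing moments by enumerating the 36 equally likely outcomes (the asserted values follow from the definitions above):

```python
from fractions import Fraction
from itertools import product

# enumerate all 36 equally likely outcomes of two fair dice
pairs = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

def E(g):
    """Expectation of g(X, Y) over the 36 outcomes."""
    return sum(Fraction(g(x, y)) * p for x, y in pairs)

def V(g):
    """Variance of g(X, Y) via E[(g - Eg)^2]."""
    m = E(g)
    return E(lambda x, y: (g(x, y) - m) ** 2)

Z = lambda x, y: x + y
W = lambda x, y: x * y

assert E(lambda x, y: x) == Fraction(7, 2)
assert V(lambda x, y: x) == Fraction(35, 12)
assert E(Z) == 7 and V(Z) == Fraction(35, 6)
assert E(W) == Fraction(49, 4)      # = E(X)E(Y) since X and Y are independent
print(E(lambda x, y: x * x))        # E(X^2) = 91/6
```

The same enumeration handles T and U by replacing the function being evaluated.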