Unit 1 Multivariate Analysis Lecture Notes
Contents
3 Multivariate Distributions
12 Solved Exercises
Copyright@2020
The material in these lecture notes was collected from various books, lectures, and online sources, all of which are cited
in the reference section. This document may not be reproduced or republished under any circumstances, although it is open
access for everyone. The material is provided only for students' study purposes.
Lecture Notes on Multivariate Analysis 1 Prepared & Documented using LATEX by Dr K Manoj
NSTC32: Multivariate Analysis M.Sc., Statistics III Semester
Objectives
• When researchers conduct a survey or experiment there are often many variables of interest.
• Multivariate analysis is a branch of statistics concerned with the analysis of multiple measurements, made on one
or several samples of individuals. For example, we may wish to measure length, width, and weight of a product.
• There is always more than one side to the problem you are trying to solve. It’s the same in your data.
• Multivariate analysis provides a more accurate view of the behavior between variables that are highly correlated,
and can detect potential problems in a product or process.
• Many decisions are based on univariate analysis, but only multivariate analysis reveals relationships that help you
detect problems that are not obvious by looking at the variables individually.
The collection of measurements on x is called a vector. In this case it is a row vector. We could have written x as a
column vector.
\[
x = \begin{pmatrix} 4 \\ 2 \\ 0.6 \end{pmatrix}
\]
3 Multivariate Distributions
• A multivariate distribution describes the underlying random structure of a vector of random variables.
• From it we can derive marginal properties of the individual variables.
• It also describes relationships between variables or groups of variables.
• As in much of statistics, we are generally interested in making inferences about this distribution based on a sample.
Example (Singular)
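Let
\[
A = \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}.
\]
Then
\[
|A| = \begin{vmatrix} 1 & 2 \\ 1 & 2 \end{vmatrix} = (1 \times 2) - (2 \times 1) = 2 - 2 = 0.
\]
Matrix A is said to be singular because its determinant is equal to zero.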
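The determinant check in this example can be reproduced numerically. The following is a minimal sketch, assuming NumPy is available; the tolerance used to decide singularity is an illustrative choice and not part of the notes.

```python
import numpy as np

# Matrix from the example: both rows are identical, so it cannot be inverted.
A = np.array([[1.0, 2.0],
              [1.0, 2.0]])

det_A = np.linalg.det(A)            # (1 * 2) - (2 * 1)
rank_A = np.linalg.matrix_rank(A)   # number of linearly independent rows

# Floating-point determinants are compared against a small tolerance.
is_singular = bool(np.isclose(det_A, 0.0, atol=1e-12))
print(det_A, rank_A, is_singular)
```

A rank smaller than the number of rows is an equivalent sign of singularity, and is often a more robust check than the determinant for larger matrices.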
\[
f(X) = f(X_1, X_2, \ldots, X_p)
     = \left(\frac{1}{2\pi}\right)^{p/2} |\Sigma|^{-1/2}
       \exp\left\{ -\frac{1}{2} (X - m)' \Sigma^{-1} (X - m) \right\}
\]
where $m = (m_1, \ldots, m_p)'$ is the vector of means and $\Sigma$ is the variance-covariance matrix of the multivariate normal
distribution. The shortcut notation for this density is $X \sim N_p(m, \Sigma)$.
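The density formula can be evaluated directly. The sketch below assumes NumPy is available; the function name `mvn_density` and the numerical values of the mean vector and covariance matrix are hypothetical, chosen only for illustration.

```python
import numpy as np

def mvn_density(x, m, sigma):
    """Evaluate the p-variate normal density at x for mean m and covariance sigma."""
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = m.size
    diff = x - m
    # Quadratic form (x - m)' Sigma^{-1} (x - m), computed without forming the inverse.
    quad = diff @ np.linalg.solve(sigma, diff)
    norm_const = (2.0 * np.pi) ** (-p / 2.0) * np.linalg.det(sigma) ** (-0.5)
    return norm_const * np.exp(-0.5 * quad)

# Hypothetical bivariate example (values chosen for illustration only).
m = [1.0, 2.0]
sigma = [[3.0, 1.0],
         [1.0, 2.0]]
density_at_mean = mvn_density(m, m, sigma)
print(density_at_mean)
```

At the mean the exponent vanishes, so the value reduces to the normalising constant $(2\pi)^{-p/2} |\Sigma|^{-1/2}$, which gives a quick sanity check.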
Figure 2: A normal density with mean µ and variance σ 2 and selected area under curve
A plot of this function yields the familiar bell-shaped curve shown in Figure 2. Also shown in the figure are the appropriate
areas under the curve within ±1 standard deviation and ±2 standard deviations of the mean.
These areas represent probabilities; thus, for the normal random variable X,
\[
P(\mu - \sigma \le X \le \mu + \sigma) \approx 0.68
\]
\[
P(\mu - 2\sigma \le X \le \mu + 2\sigma) \approx 0.95
\]
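These two probabilities can be checked with the error function, since for a normal variable the probability of falling within k standard deviations of the mean is erf(k/√2). The sketch below uses only the Python standard library; the helper name `prob_within` is hypothetical.

```python
import math

def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for a normal X, via the error function."""
    return math.erf(k / math.sqrt(2.0))

p1 = prob_within(1)
p2 = prob_within(2)
print(round(p1, 4), round(p2, 4))
```

The computed values are approximately 0.6827 and 0.9545, which round to the .68 and .95 quoted for the figure.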
The following are true for a random vector X having a multivariate normal distribution:
Theorem
Part A The marginal distributions of x1 and x2 are also normal with mean vector µi and covariance matrix Σii (i = 1, 2),
respectively.
Part B The conditional distribution of $x_i$ given $x_j$ is also normal with mean vector
\[
\mu_{i|j} = \mu_i + \Sigma_{ij} \Sigma_{jj}^{-1} (x_j - \mu_j)
\]
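The conditional mean in Part B can be computed from a partitioned mean vector and covariance matrix. The following is a minimal sketch, assuming NumPy is available; the parameter values, the observed value of $x_j$, and the index lists are all hypothetical.

```python
import numpy as np

# Hypothetical parameters: x_i is the first component, x_j the second.
mu = np.array([1.0, 2.0])
sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])

idx_i, idx_j = [0], [1]          # positions of x_i and x_j in the full vector
x_j = np.array([4.0])            # observed value of x_j (illustrative)

sigma_ij = sigma[np.ix_(idx_i, idx_j)]
sigma_jj = sigma[np.ix_(idx_j, idx_j)]

# mu_{i|j} = mu_i + Sigma_ij Sigma_jj^{-1} (x_j - mu_j)
cond_mean = mu[idx_i] + sigma_ij @ np.linalg.solve(sigma_jj, x_j - mu[idx_j])
print(cond_mean)
```

Using index lists rather than fixed slices lets the same lines handle blocks of any size, for instance conditioning two variables on a third.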
= F(x, ∞)
Let this be F(x). Clearly
\[
F(x) = \int_{-\infty}^{x} \int_{-\infty}^{\infty} f(u, v)\, dv\, du. \tag{2}
\]
We call
\[
\int_{-\infty}^{\infty} f(u, v)\, dv = f(u), \tag{3}
\]
say, the marginal density of X. Then (2) is
\[
F(x) = \int_{-\infty}^{x} f(u)\, du. \tag{4}
\]
In a similar fashion we define G(y), the marginal cdf of Y , and g(y), the marginal density of Y .
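Integrating the joint density over the second variable, as in (3), can be illustrated numerically. The sketch below assumes NumPy is available and uses a standard bivariate normal with correlation 0.5 as an assumed example; for that density the marginal of X is known to be N(0, 1), which gives something to compare against. The function names, integration limits, and grid size are hypothetical choices.

```python
import numpy as np

def joint_density(u, v, rho=0.5):
    """Standard bivariate normal density with correlation rho (an assumed example)."""
    norm_const = 1.0 / (2.0 * np.pi * np.sqrt(1.0 - rho**2))
    quad = (u**2 - 2.0 * rho * u * v + v**2) / (1.0 - rho**2)
    return norm_const * np.exp(-0.5 * quad)

def marginal_density(u, lo=-10.0, hi=10.0, n=4001):
    """Approximate f(u) = integral of f(u, v) dv with the trapezoidal rule."""
    v = np.linspace(lo, hi, n)
    values = joint_density(u, v)
    dv = v[1] - v[0]
    return float(np.sum((values[:-1] + values[1:]) * 0.5) * dv)

u = 0.7
approx = marginal_density(u)
exact = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # N(0, 1) density at u
print(approx, exact)
```

The infinite limits are replaced by ±10, which is harmless here because the density is negligible that far from the origin.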
Now we turn to the general case. Given $F(x_1, \ldots, x_p)$ as the cdf of $X_1, \ldots, X_p$, we wish to find the marginal cdf of
some of $X_1, \ldots, X_p$, say, of $X_1, \ldots, X_r$ $(r < p)$. It is
\[
\begin{aligned}
\Pr(X_1 \le x_1, \ldots, X_r \le x_r)
  &= \Pr[X_1 \le x_1, \ldots, X_r \le x_r, X_{r+1} \le \infty, \ldots, X_p \le \infty] \\
  &= F(x_1, \ldots, x_r, \infty, \ldots, \infty).
\end{aligned} \tag{5}
\]
The marginal density of $X_1, \ldots, X_r$ is
\[
\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_r, u_{r+1}, \ldots, u_p)\, du_{r+1} \cdots du_p. \tag{6}
\]
The marginal distribution and density of any other subset of X1 , . . . , X p are obtained in an obviously similar fashion.
The joint moments of a subset of variates can be computed from the marginal distribution; for example,
Conditional Distributions
If A and B are two events such that the probability of A and B occurring simultaneously is P(AB) and the probability of B
occurring is P(B) > 0, then the conditional probability of A occurring given that B has occurred is P(AB)/P(B). Suppose
the event A is X falling in the interval [x1 , x2 ] and the event B is Y falling in [y1 , y2 ] . Then the conditional probability that
X falls in [x1 , x2 ], given that Y falls in [y1 , y2 ] , is
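\[
\begin{aligned}
\Pr\{x_1 \le X \le x_2 \mid y_1 \le Y \le y_2\}
  &= \frac{\Pr\{x_1 \le X \le x_2,\; y_1 \le Y \le y_2\}}{\Pr\{y_1 \le Y \le y_2\}} \\
  &= \frac{\int_{x_1}^{x_2} \int_{y_1}^{y_2} f(u, v)\, dv\, du}{\int_{y_1}^{y_2} g(v)\, dv}.
\end{aligned} \tag{8}
\]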
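Now let $y_1 = y$, $y_2 = y + \Delta y$. Then for a continuous density,
\[
\int_{y}^{y + \Delta y} g(u)\, du = g(y^*)\, \Delta y, \tag{9}
\]
where $y \le y^* \le y + \Delta y$ by the mean value theorem for integrals.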
The characteristic function of a multivariate normal distribution has a form similar to the density function.
Definition
The characteristic function of a random vector X is
\[
\phi(t) = E\, e^{i t' X} \tag{14}
\]
The Moments
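The moments of $X_1, \ldots, X_p$ with a joint normal distribution can be obtained from the characteristic function (9). The mean
is
\[
\begin{aligned}
E X_h &= \frac{1}{i} \left. \frac{\partial \phi}{\partial t_h} \right|_{t=0} \\
      &= \frac{1}{i} \left. \left\{ -\sum_j \sigma_{hj} t_j + i \mu_h \right\} \phi(t) \right|_{t=0} \\
      &= \mu_h.
\end{aligned} \tag{15}
\]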
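The second moment is
\[
\begin{aligned}
E X_h X_j &= \frac{1}{i^2} \left. \frac{\partial^2 \phi}{\partial t_h\, \partial t_j} \right|_{t=0} \\
          &= \frac{1}{i^2} \left. \left\{ \left( -\sum_k \sigma_{hk} t_k + i \mu_h \right)
             \left( -\sum_k \sigma_{kj} t_k + i \mu_j \right) - \sigma_{hj} \right\} \phi(t) \right|_{t=0} \\
          &= \sigma_{hj} + \mu_h \mu_j.
\end{aligned} \tag{16}
\]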
Thus
\[
\text{Variance}(X_i) = E (X_i - \mu_i)^2 = \sigma_{ii}, \tag{17}
\]
\[
\text{Covariance}(X_i, X_j) = E (X_i - \mu_i)(X_j - \mu_j) = \sigma_{ij}. \tag{18}
\]
Any third moment about the mean is
\[
E (X_i - \mu_i)(X_j - \mu_j)(X_k - \mu_k) = 0. \tag{19}
\]
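The moment results can be checked by simulation. The sketch below is a Monte Carlo illustration, not a proof; it assumes NumPy is available, and the parameter values, sample size, and seed are hypothetical choices. The estimates should agree with the theoretical values only up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable

# Hypothetical parameters, chosen for illustration only.
mu = np.array([1.0, 2.0])
sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])

sample = rng.multivariate_normal(mu, sigma, size=200_000)
x1, x2 = sample[:, 0], sample[:, 1]

mean_est = sample.mean(axis=0)                  # compare with mu
second_est = np.mean(x1 * x2)                   # compare with sigma_12 + mu_1 * mu_2
second_exact = sigma[0, 1] + mu[0] * mu[1]
# One third moment about the mean; theory says it is zero.
third_central_est = np.mean((x1 - mu[0]) ** 2 * (x2 - mu[1]))

print(mean_est, second_est, second_exact, third_central_est)
```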
The fourth moment about the mean is
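Example Consider the linear combination $a'X$ of a multivariate normal random vector determined by the choice $a' = [1, 0, \ldots, 0]$.
Since
\[
a'X = [1, 0, \ldots, 0] \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} = X_1
\]
and
\[
a'\mu = [1, 0, \ldots, 0] \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix} = \mu_1
\]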
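We have
\[
a' \Sigma a = [1, 0, \ldots, 0]
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\
\sigma_{12} & \sigma_{22} & \cdots & \sigma_{2p} \\
\vdots      & \vdots      & \ddots & \vdots      \\
\sigma_{1p} & \sigma_{2p} & \cdots & \sigma_{pp}
\end{pmatrix}
\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
= \sigma_{11}
\]
and it follows from Result 4.2 that $X_1$ is distributed as $N(\mu_1, \sigma_{11})$. More generally, the marginal distribution of any
component $X_i$ of X is $N(\mu_i, \sigma_{ii})$.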
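The selection of a single component by the vector a can be sketched in a few lines. This assumes NumPy is available; the three-variable mean vector and covariance matrix are hypothetical values, not taken from the notes.

```python
import numpy as np

# Hypothetical 3-variate parameters (illustrative values only).
mu = np.array([1.0, 2.0, 0.5])
sigma = np.array([[3.0, 1.0, 0.2],
                  [1.0, 2.0, 0.1],
                  [0.2, 0.1, 1.0]])

a = np.array([1.0, 0.0, 0.0])    # picks out the first component

mean_aX = a @ mu                 # a' mu      -> mu_1
var_aX = a @ sigma @ a           # a' Sigma a -> sigma_11
print(mean_aX, var_aX)
```

Moving the single 1 in `a` to position i returns $\mu_i$ and $\sigma_{ii}$, matching the general statement about the marginal distribution of $X_i$.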
The set of 5 observations, measuring 3 variables, can be described by its mean vector and variance-covariance matrix.
The three variables, from left to right, are the length, width, and height of a certain object, for example. Each row vector $X_i$ is
another observation of the three variables (or components).
Thus, 0.025 is the variance of the length variable, 0.0075 is the covariance between the length and the width variables, 0.00175 is the covariance
between the length and the height variables, 0.007 is the variance of the width variable, 0.00135 is the covariance between the width and height variables
and 0.00043 is the variance of the height variable.
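A mean vector and variance-covariance matrix of this kind can be computed as sketched below, assuming NumPy is available. The five data rows are an assumption: the observations themselves are not listed in the text, and these values were chosen because they reproduce the variances and covariances quoted above.

```python
import numpy as np

# ASSUMED data: five observations of (length, width, height).
# These rows are not given in the notes; they are chosen because they
# reproduce the variances and covariances quoted in the text.
X = np.array([[4.0, 2.0, 0.60],
              [4.2, 2.1, 0.59],
              [3.9, 2.0, 0.58],
              [4.3, 2.1, 0.62],
              [4.1, 2.2, 0.63]])

mean_vector = X.mean(axis=0)
# rowvar=False: columns are variables; the default divisor is n - 1.
cov_matrix = np.cov(X, rowvar=False)

print(mean_vector)
print(np.round(cov_matrix, 5))
```

The diagonal of `cov_matrix` holds the three variances and the off-diagonal entries hold the covariances, in the same length, width, height order as the columns.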
12 Solved Exercises
Exercise 1
Let $X = [X_1 \; X_2]^\top$ be a multivariate normal random vector with mean
\[
\mu = [1 \; 2]^\top
\]
and covariance matrix
\[
V = \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}.
\]
Prove that the random variable
\[
Y = X_1 + X_2
\]
has a normal distribution with mean equal to 3 and variance equal to 7.
Hint: use the joint moment generating function of X and its properties.
Solution
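The stated mean and variance can be verified numerically by writing Y as the linear combination a'X with a = [1, 1]', so that E(Y) = a'µ and Var(Y) = a'Va. This check uses the linear-combination property rather than the moment generating function suggested in the hint, and it confirms the two numbers only, not normality. It is a minimal sketch that assumes NumPy is available.

```python
import numpy as np

mu = np.array([1.0, 2.0])
V = np.array([[3.0, 1.0],
              [1.0, 2.0]])
a = np.array([1.0, 1.0])         # Y = a'X = X1 + X2

mean_Y = a @ mu                  # 1 + 2
var_Y = a @ V @ a                # 3 + 2 + 2 * 1
print(mean_Y, var_Y)
```

The same numbers appear in the mgf route: setting both arguments of the joint mgf equal to t gives exp(3t + 7t²/2), which is the mgf of a normal variable with mean 3 and variance 7.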
Exercise 2
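Let $X = [X_1 \; X_2]^\top$ be a multivariate normal random vector with mean
\[
\mu = [2 \; 3]^\top
\]
and covariance matrix
\[
V = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.
\]
Using the joint moment generating function of X, compute the cross-moment
\[
E[X_1^2 X_2].
\]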
Solution
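The third-order cross-moment we want to compute is equal to a third partial derivative of the mgf, evaluated at zero:
\[
E[X_1^2 X_2] = \left. \frac{\partial^3 M_X(t_1, t_2)}{\partial t_1^2\, \partial t_2} \right|_{t_1 = 0,\, t_2 = 0}
\]
The partial derivatives are
\[
\frac{\partial M_X(t_1, t_2)}{\partial t_1}
  = (2 + 2t_1 + t_2) \exp\left(2t_1 + 3t_2 + t_1^2 + t_2^2 + t_1 t_2\right)
\]
\[
\begin{aligned}
\frac{\partial^2 M_X(t_1, t_2)}{\partial t_1^2}
  &= \frac{\partial}{\partial t_1} \left( \frac{\partial M_X(t_1, t_2)}{\partial t_1} \right) \\
  &= 2 \exp\left(2t_1 + 3t_2 + t_1^2 + t_2^2 + t_1 t_2\right) \\
  &\quad + (2 + 2t_1 + t_2)^2 \exp\left(2t_1 + 3t_2 + t_1^2 + t_2^2 + t_1 t_2\right)
\end{aligned}
\]
\[
\begin{aligned}
\frac{\partial^3 M_X(t_1, t_2)}{\partial t_1^2\, \partial t_2}
  &= \frac{\partial}{\partial t_2} \left( \frac{\partial^2 M_X(t_1, t_2)}{\partial t_1^2} \right) \\
  &= 2 (3 + 2t_2 + t_1) \exp\left(2t_1 + 3t_2 + t_1^2 + t_2^2 + t_1 t_2\right) \\
  &\quad + 2 (2 + 2t_1 + t_2) \exp\left(2t_1 + 3t_2 + t_1^2 + t_2^2 + t_1 t_2\right) \\
  &\quad + (2 + 2t_1 + t_2)^2 (3 + 2t_2 + t_1) \exp\left(2t_1 + 3t_2 + t_1^2 + t_2^2 + t_1 t_2\right)
\end{aligned}
\]
Thus,
\[
\begin{aligned}
E[X_1^2 X_2]
  &= \left. \frac{\partial^3 M_X(t_1, t_2)}{\partial t_1^2\, \partial t_2} \right|_{t_1 = 0,\, t_2 = 0}
   = 2 \cdot 3 \cdot 1 + 2 \cdot 2 \cdot 1 + 2^2 \cdot 3 \cdot 1 \\
  &= 6 + 4 + 12 = 22
\end{aligned}
\]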
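The value 22 can be cross-checked in two independent ways. The sketch below uses only the Python standard library. It assumes the covariance entries implied by the exponent of the mgf (variances 2 and 2, covariance 1), approximates the mixed third derivative by central finite differences with a hypothetical step size, and compares the result with the closed form obtained by expanding E[(µ1 + Z1)²(µ2 + Z2)] for centred normal Z.

```python
import math

mu1, mu2 = 2.0, 3.0
s11, s22, s12 = 2.0, 2.0, 1.0    # entries of V implied by the mgf exponent

def mgf(t1, t2):
    """Joint mgf exp(t'mu + t'Vt/2) of the bivariate normal in this exercise."""
    return math.exp(mu1 * t1 + mu2 * t2
                    + 0.5 * (s11 * t1**2 + 2.0 * s12 * t1 * t2 + s22 * t2**2))

h = 1e-3                         # finite-difference step (illustrative choice)

def d2_t1(t2):
    """Central second difference in t1, evaluated at t1 = 0."""
    return (mgf(h, t2) - 2.0 * mgf(0.0, t2) + mgf(-h, t2)) / h**2

# Central first difference in t2 of the second derivative in t1.
numeric = (d2_t1(h) - d2_t1(-h)) / (2.0 * h)

# Closed form: mu1^2 * mu2 + mu2 * sigma_11 + 2 * mu1 * sigma_12.
closed_form = mu1**2 * mu2 + mu2 * s11 + 2.0 * mu1 * s12
print(numeric, closed_form)
```

The finite-difference value agrees with 22 only approximately, because it carries truncation and rounding error; the closed form is exact.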
Exercise 3
Measurements were taken on n heart-attack patients on their cholesterol levels. For each patient, measurements were
taken 0, 2, and 4 days following the attack. Treatment was given to reduce cholesterol level. The sample mean vector is:
Variable Mean
X1 = 0-Day 259.5
X2 = 2-Day 230.8
X3 = 4-Day 221.5
Suppose that we are interested in the difference X1 − X2 , the difference between the 0-day and the 2-day measurements.
Solution
• Each single variable has a univariate normal distribution. Thus we can look at univariate tests of normality
for each variable when assessing multivariate normality.
• Any subset of the variables also has a multivariate normal distribution.
• Any linear combination of the variables has a univariate normal distribution.
• Any conditional distribution for a subset of the variables conditional on known values for another subset of
variables is a multivariate normal distribution.
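The difference X1 − X2 is the linear combination a'X with a = [1, −1, 0]', so its sample mean is a' applied to the sample mean vector. The sketch below assumes NumPy is available and computes only that mean; the corresponding variance, a'Sa = s11 + s22 − 2 s12, would need the sample covariance matrix S, which is not given here.

```python
import numpy as np

# Sample mean vector from the table: 0-day, 2-day, 4-day cholesterol.
x_bar = np.array([259.5, 230.8, 221.5])
a = np.array([1.0, -1.0, 0.0])   # coefficients selecting X1 - X2

mean_diff = a @ x_bar            # 259.5 - 230.8
print(round(mean_diff, 1))
```

The result, 28.7, is the average drop in cholesterol between day 0 and day 2 in this sample.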
Answer Key
1. A
2. A
3. A
4. C
5. B
References
[1] T. W. Anderson, "An Introduction to Multivariate Statistical Analysis" (Wiley Series in Probability and Statistics),
Wiley-Interscience, 2003, 747 pages.
[2] R. A. Johnson and D. W. Wichern, "Applied Multivariate Statistical Analysis" (Sixth Edition), Pearson New International
Edition, 2013.
[3] Marco Taboga, "Multivariate normal distribution", https://www.statlect.com/probability-distributions/multivariate-normal-distribution, 2020.
[4] "The Multivariate Normal Distribution", http://www.math.hkbu.edu.hk/~hpeng/Math3806/Lecture_note3.pdf
[5] "Lesson 4: Multivariate Normal Distribution", https://online.stat.psu.edu/stat505/book/export/html/636