
CHAPTER TEN

Correlation and Regression

10·1. Bivariate Distribution, Correlation. So far we have confined ourselves to univariate distributions, i.e., the distributions involving only one variable. We may, however, come across certain series where each term of the series may assume the values of two or more variables. For example, if we measure the heights and weights of a certain group of persons, we shall get what is known as a bivariate distribution, one variable relating to height and the other variable relating to weight.

In a bivariate distribution we may be interested to find out if there is any correlation or covariation between the two variables under study. If the change in one variable affects a change in the other variable, the variables are said to be correlated. If the two variables deviate in the same direction, i.e., if the increase (or decrease) in one results in a corresponding increase (or decrease) in the other, correlation is said to be direct or positive. But if they constantly deviate in the opposite directions, i.e., if increase (or decrease) in one results in a corresponding decrease (or increase) in the other, correlation is said to be inverse or negative. For example, the correlation between (i) the heights and weights of a group of persons, (ii) income and expenditure, is positive, and the correlation between (i) the price and demand of a commodity, (ii) the volume and pressure of a perfect gas, is negative. Correlation is said to be perfect if the deviation in one variable is followed by a corresponding and proportional deviation in the other.
10·2. Scatter Diagram. It is the simplest way of the diagrammatic representation of bivariate data. Thus for the bivariate distribution (x_i, y_i); i = 1, 2, ..., n, if the values of the variables X and Y are plotted along the x-axis and y-axis respectively in the xy-plane, the diagram of dots so obtained is known as a scatter diagram. From the scatter diagram we can form a fairly good, though vague, idea whether the variables are correlated or not; e.g., if the points are very dense, i.e., very close to each other, we should expect a fairly good amount of correlation between the variables, and if the points are widely scattered, a poor correlation is expected. This method, however, is not suitable if the number of observations is fairly large.
10·3. Karl Pearson Coefficient of Correlation. As a measure of intensity or degree of linear relationship between two variables, Karl Pearson (1857–1936), a British biometrician, developed a formula called the correlation coefficient.

The correlation coefficient between two random variables X and Y, usually denoted by r(X, Y) or simply r_XY, is a numerical measure of linear relationship between them and is defined as

    r(X, Y) = Cov(X, Y) / (σ_X σ_Y)    ...(10·1)
10·2 Fundamentals of Mathematical Statistics

If (x_i, y_i); i = 1, 2, ..., n is the bivariate distribution, then

    Cov(X, Y) = E[{X − E(X)}{Y − E(Y)}] = (1/n) Σ (x_i − x̄)(y_i − ȳ) = μ_11
    σ_X² = E[{X − E(X)}²] = (1/n) Σ (x_i − x̄)²,  σ_Y² = E[{Y − E(Y)}²] = (1/n) Σ (y_i − ȳ)²    ...(10·2)

the summation extending over i from 1 to n.

Another convenient form of the formula (10·2) for computational work is as follows:

    Cov(X, Y) = (1/n) Σ (x_i − x̄)(y_i − ȳ)
              = (1/n) Σ (x_i y_i − x_i ȳ − x̄ y_i + x̄ ȳ)
              = (1/n) Σ x_i y_i − x̄ ȳ

and

    σ_X² = (1/n) Σ x_i² − x̄²,  σ_Y² = (1/n) Σ y_i² − ȳ²    ...(10·2a)
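The computational forms (10·2a) translate directly into a short routine. A minimal sketch in Python (the helper name pearson_r is ours, not the book's):

```python
import math

def pearson_r(x, y):
    """Karl Pearson's r via the computational forms (10.2a):
    Cov = (1/n)Σx_i·y_i − x̄·ȳ,  σ_X² = (1/n)Σx_i² − x̄²."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - xbar * ybar
    var_x = sum(a * a for a in x) / n - xbar ** 2
    var_y = sum(b * b for b in y) / n - ybar ** 2
    return cov / math.sqrt(var_x * var_y)

# A perfectly linear pair should give r = 1
print(pearson_r([1, 2, 3], [2, 4, 6]))  # close to 1.0
```

For hand data, the function reproduces the worked examples later in the chapter.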


Remarks 1. [The original shows typical scatter diagrams of standard data for the cases r > 0, r < 0, r = 0, r = +1 and r = −1.]

2. It may be noted that r(X, Y) provides a measure of linear relationship between X and Y. For a nonlinear relationship, however, it is not very suitable.
3. Sometimes we write: Cov(X, Y) = σ_XY.
4. Karl Pearson's correlation coefficient is also called the product-moment correlation coefficient, since

    Cov(X, Y) = E[{X − E(X)}{Y − E(Y)}] = μ_11
10·3·1. Limits for Correlation Coefficient. We have

    r(X, Y) = Cov(X, Y)/(σ_X σ_Y)
            = [(1/n) Σ (x_i − x̄)(y_i − ȳ)] / {[(1/n) Σ (x_i − x̄)²]^(1/2) [(1/n) Σ (y_i − ȳ)²]^(1/2)}

    ∴ r²(X, Y) = (Σ a_i b_i)² / [(Σ a_i²)(Σ b_i²)], where a_i = x_i − x̄, b_i = y_i − ȳ    ...(*)

We have the Schwarz inequality, which states that if a_i, b_i; i = 1, 2, ..., n are real quantities, then

    (Σ a_i b_i)² ≤ (Σ a_i²)(Σ b_i²)

the sign of equality holding if and only if

    a_1/b_1 = a_2/b_2 = ... = a_n/b_n

Using the Schwarz inequality, we get from (*)

    r²(X, Y) ≤ 1, i.e., |r(X, Y)| ≤ 1  ⇒  −1 ≤ r(X, Y) ≤ 1    ...(10·3)

Hence the correlation coefficient cannot exceed unity numerically. It always lies between −1 and +1. If r = +1, the correlation is perfect and positive, and if r = −1, the correlation is perfect and negative.

Aliter. If we write E(X) = μ_X and E(Y) = μ_Y, then we have

    E[(X − μ_X)/σ_X ± (Y − μ_Y)/σ_Y]² ≥ 0
    ⇒ E[(X − μ_X)²/σ_X²] + E[(Y − μ_Y)²/σ_Y²] ± 2 E[(X − μ_X)(Y − μ_Y)]/(σ_X σ_Y) ≥ 0
    ⇒ 1 + 1 ± 2r(X, Y) ≥ 0  ⇒  −1 ≤ r(X, Y) ≤ 1.
Theorem 10·1. Correlation coefficient is independent of change of origin and scale.
Proof. Let U = (X − a)/h and V = (Y − b)/k, so that X = a + hU and Y = b + kV, where a, b, h, k are constants; h > 0, k > 0.
We shall prove that r(X, Y) = r(U, V).
Since X = a + hU and Y = b + kV, on taking expectations, we get

    E(X) = a + hE(U) and E(Y) = b + kE(V)
    ∴ X − E(X) = h[U − E(U)],  Y − E(Y) = k[V − E(V)]
    ⇒ Cov(X, Y) = E[{X − E(X)}{Y − E(Y)}] = E[h{U − E(U)} · k{V − E(V)}]
                = hk E[{U − E(U)}{V − E(V)}] = hk Cov(U, V)    ...(10·4)
    σ_X² = E[{X − E(X)}²] = E[h²{U − E(U)}²] = h²σ_U²  ⇒  σ_X = hσ_U, (h > 0)    ...(10·4a)
    σ_Y² = E[{Y − E(Y)}²] = E[k²{V − E(V)}²] = k²σ_V²  ⇒  σ_Y = kσ_V, (k > 0)    ...(10·4b)

Substituting from (10·4), (10·4a) and (10·4b) in (10·1), we get

    r(X, Y) = Cov(X, Y)/(σ_X σ_Y) = hk Cov(U, V)/(hσ_U · kσ_V) = Cov(U, V)/(σ_U σ_V) = r(U, V)

This theorem is of fundamental importance in the numerical computation of the correlation coefficient.
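Theorem 10·1 is easy to verify numerically; a sketch in Python (the sample data and the helper are ours):

```python
import math, random

def pearson_r(x, y):
    # r = Σ(x−x̄)(y−ȳ) / √[Σ(x−x̄)² Σ(y−ȳ)²]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [a + random.gauss(0, 1) for a in x]

# U = (X - a)/h, V = (Y - b)/k with h, k > 0 leave r unchanged
u = [(a - 3) / 2 for a in x]    # a = 3,  h = 2
v = [(b + 5) / 10 for b in y]   # b = -5, k = 10
print(pearson_r(x, y), pearson_r(u, v))  # identical up to rounding
```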
Corollary. If X and Y are random variables and a, b, c, d are any numbers, provided only that a ≠ 0, c ≠ 0, then

    r(aX + b, cY + d) = [ac/|ac|] r(X, Y)

Proof. With the usual notations, we have

    Var(aX + b) = a²σ_X²;  Var(cY + d) = c²σ_Y²;  Cov(aX + b, cY + d) = ac σ_XY
    ∴ r(aX + b, cY + d) = Cov(aX + b, cY + d) / [Var(aX + b) Var(cY + d)]^(1/2)
                        = ac σ_XY / (|a| |c| σ_X σ_Y) = [ac/|ac|] r(X, Y)

If ac > 0, i.e., if a and c are of the same sign, then ac/|ac| = +1.
If ac < 0, i.e., if a and c are of opposite signs, then ac/|ac| = −1.
Theorem 10·2. Two independent variables are uncorrelated.
Proof. If X and Y are independent variables, then

    Cov(X, Y) = 0    (cf. § 6·4)
    ∴ r(X, Y) = Cov(X, Y)/(σ_X σ_Y) = 0

Hence two independent variables are uncorrelated.
But the converse of the theorem is not true, i.e., two uncorrelated variables may not be independent, as the following example illustrates:

    X  :  −3   −2   −1    1    2    3      ΣX = 0
    Y  :   9    4    1    1    4    9      ΣY = 28
    XY : −27   −8   −1    1    8   27      ΣXY = 0

    x̄ = (1/n) ΣX = 0,  Cov(X, Y) = (1/n) ΣXY − x̄ȳ = 0
    ∴ r(X, Y) = Cov(X, Y)/(σ_X σ_Y) = 0

Thus in the above example the variables X and Y are uncorrelated. But on careful examination we find that X and Y are not independent; they are connected by the relation Y = X². Hence two uncorrelated variables need not necessarily be independent. A simple reasoning for this conclusion is that r(X, Y) = 0 merely implies the absence of any linear relationship between the variables X and Y. There may, however, exist some other form of relationship between them, e.g., quadratic, cubic or trigonometric.
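The table above can be replayed directly; a short check (the variable names are ours):

```python
x = [-3, -2, -1, 1, 2, 3]
y = [v * v for v in x]          # Y = X², a purely nonlinear dependence

n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
print(cov)  # effectively 0: X and Y are uncorrelated though Y = X²
```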
Remarks. 1. Following are some more examples where two variables are uncorrelated but not independent.
(i) X ~ N(0, 1) and Y = X².
Since X ~ N(0, 1), E(X) = 0 = E(X³)

    ∴ Cov(X, Y) = E(XY) − E(X)E(Y) = E(X³) − E(X)E(Y) = 0    (∵ Y = X²)
    ∴ r(X, Y) = Cov(X, Y)/(σ_X σ_Y) = 0

Hence X and Y are uncorrelated, but not independent.
(ii) Let X be a r.v. with a p.d.f. symmetric about the origin, and let Y = X². Here we shall get

    E(X) = 0 and E(XY) = E(X³) = 0  ⇒  r(X, Y) = 0

2. However, the converse of the theorem holds in the following cases:
(a) If X and Y are jointly normally distributed with ρ = ρ(X, Y) = 0, then they are independent. If ρ = 0, then [cf. § 10·10, (10·25)]

    f(x, y) = [1/(σ_X √(2π))] exp[−(x − μ_X)²/(2σ_X²)] · [1/(σ_Y √(2π))] exp[−(y − μ_Y)²/(2σ_Y²)]
    ∴ f(x, y) = f_1(x) f_2(y)  ⇒  X and Y are independent.

(b) If each of the two variables X and Y takes two values, 0, 1, with positive probabilities, then r(X, Y) = 0 ⇒ X and Y are independent.
Proof. Let X take the values 1 and 0 with positive probabilities p_1 and q_1 respectively, and let Y take the values 1 and 0 with positive probabilities p_2 and q_2 respectively. Then

    r(X, Y) = 0  ⇒  Cov(X, Y) = 0
    ⇒ 0 = E(XY) − E(X)E(Y) = 1 · P(X = 1, Y = 1) − [1 · P(X = 1)] × [1 · P(Y = 1)]
    ⇒ P(X = 1, Y = 1) = p_1 p_2 = P(X = 1) · P(Y = 1)
    ⇒ X and Y are independent.
10·3·2. Assumptions Underlying Karl Pearson's Correlation Coefficient. The Pearsonian correlation coefficient r is based on the following assumptions:
(i) The variables X and Y under study are linearly related. In other words, the scatter diagram of the data will give a straight-line curve.
(ii) Each of the variables (series) is being affected by a large number of independent contributory causes of such a nature as to produce normal distribution. For example, the variables (series) relating to ages, heights, weights, prices, etc., conform to this assumption. In the words of Karl Pearson:
"The sizes of the complex of organs (something measurable) are determined by a great variety of independent contributory causes, for example, climate, nourishment, physical training and innumerable other causes which cannot be individually observed or their effects measured." Karl Pearson further observes: "The variations in intensity of the contributory causes are small as compared with their absolute intensity and these variations follow the normal law of distribution."
(iii) The forces so operating on each of the variable series are not independent of each other but are related in a causal fashion. In other words, a cause and effect relationship exists between the different forces operating on the items of the two variable series. These forces must be common to both the series. If the operating forces are entirely independent of each other and not related in any fashion, then there cannot be any correlation between the variables under study. For example, the correlation coefficient between
(a) the series of heights and incomes of individuals over a period of time,
(b) the series of marriage rate and the rate of agricultural production in a country over a period of time,
(c) the series relating to the size of the shoe and intelligence of a group of individuals,
should be zero, since the forces affecting the two variable series in each of the above cases are entirely independent of each other.
However, if in any of the above cases the value of r for a given set of data is not zero, then such correlation is termed chance correlation or spurious or nonsense correlation.
Example 10·1. Calculate the correlation coefficient for the following heights (in inches) of fathers (X) and their sons (Y):

    X : 65 66 67 67 68 69 70 72
    Y : 67 68 65 68 72 72 69 71

Solution.
CALCULATIONS FOR CORRELATION COEFFICIENT

       X      Y      X²      Y²      XY
      65     67    4225    4489    4355
      66     68    4356    4624    4488
      67     65    4489    4225    4355
      67     68    4489    4624    4556
      68     72    4624    5184    4896
      69     72    4761    5184    4968
      70     69    4900    4761    4830
      72     71    5184    5041    5112
    Total   544    552   37028   38132   37560

    x̄ = (1/n) ΣX = 544/8 = 68,  ȳ = (1/n) ΣY = 552/8 = 69

    r(X, Y) = Cov(X, Y)/(σ_X σ_Y)
            = [(1/n) ΣXY − x̄ȳ] / {[(1/n) ΣX² − x̄²][(1/n) ΣY² − ȳ²]}^(1/2)
            = (4695 − 4692) / √[(4628·5 − 4624)(4766·5 − 4761)]
            = 3 / √(4·5 × 5·5) = 0·603

Aliter. (SHORT-CUT METHOD)

       X      Y   U = X − 68   V = Y − 69    U²    V²    UV
      65     67       −3          −2          9     4     6
      66     68       −2          −1          4     1     2
      67     65       −1          −4          1    16     4
      67     68       −1          −1          1     1     1
      68     72        0           3          0     9     0
      69     72        1           3          1     9     3
      70     69        2           0          4     0     0
      72     71        4           2         16     4     8
    Total              0           0         36    44    24

    Ū = (1/n) ΣU = 0,  V̄ = (1/n) ΣV = 0
    Cov(U, V) = (1/n) ΣUV − ŪV̄ = 24/8 = 3
    σ_U² = (1/n) ΣU² − Ū² = 36/8 = 4·5
    σ_V² = (1/n) ΣV² − V̄² = 44/8 = 5·5
    r(U, V) = Cov(U, V)/(σ_U σ_V) = 3/√(4·5 × 5·5) = 0·603 = r(X, Y)

Remark. The reader is advised to calculate the correlation coefficient by the arbitrary-origin (short-cut) method rather than by the direct method, since the former leads to much simpler arithmetical calculations.
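Both the direct and the short-cut computations of Example 10·1 can be replayed in a few lines (a sketch; the variable names are ours):

```python
import math

X = [65, 66, 67, 67, 68, 69, 70, 72]
Y = [67, 68, 65, 68, 72, 72, 69, 71]
n = len(X)

def r_from(x, y):
    # computational forms (10.2a)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mx * my
    vx = sum(a * a for a in x) / n - mx ** 2
    vy = sum(b * b for b in y) / n - my ** 2
    return cov / math.sqrt(vx * vy)

r_direct = r_from(X, Y)
# Short-cut method: shift to the arbitrary origin (68, 69)
U = [a - 68 for a in X]
V = [b - 69 for b in Y]
r_shortcut = r_from(U, V)
print(round(r_direct, 3), round(r_shortcut, 3))  # 0.603 0.603
```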

Example 10·2. A computer, while calculating the correlation coefficient between two variables X and Y from 25 pairs of observations, obtained the following results:

    n = 25, ΣX = 125, ΣX² = 650, ΣY = 100, ΣY² = 460, ΣXY = 508

It was, however, later discovered at the time of checking that two pairs had been copied down as

    X :  6   8        while the correct values were        X :  8   6
    Y : 14   6                                             Y : 12   8

Obtain the correct value of the correlation coefficient.
[Calcutta Univ. B.Sc. (Maths Hons.), 1988, 1991]
Solution.

    Corrected ΣX  = 125 − 6 − 8 + 8 + 6 = 125
    Corrected ΣY  = 100 − 14 − 6 + 12 + 8 = 100
    Corrected ΣX² = 650 − 6² − 8² + 8² + 6² = 650
    Corrected ΣY² = 460 − 14² − 6² + 12² + 8² = 436
    Corrected ΣXY = 508 − 6 × 14 − 8 × 6 + 8 × 12 + 6 × 8 = 520

    x̄ = (1/n) ΣX = 125/25 = 5,  ȳ = (1/n) ΣY = 100/25 = 4
    Cov(X, Y) = (1/n) ΣXY − x̄ȳ = 520/25 − 5 × 4 = 4/5
    σ_X² = (1/n) ΣX² − x̄² = 650/25 − 25 = 1
    σ_Y² = (1/n) ΣY² − ȳ² = 436/25 − 16 = 36/25
    ∴ r(X, Y) = Cov(X, Y)/(σ_X σ_Y) = (4/5) / (1 × 6/5) = 2/3 = 0·67
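The corrections of Example 10·2 are pure bookkeeping on the five sums; a sketch:

```python
n = 25
sx, sy, sxx, syy, sxy = 125, 100, 650, 460, 508

wrong = [(6, 14), (8, 6)]   # pairs as copied
right = [(8, 12), (6, 8)]   # pairs as they should have been
for (xw, yw), (xc, yc) in zip(wrong, right):
    sx += xc - xw
    sy += yc - yw
    sxx += xc ** 2 - xw ** 2
    syy += yc ** 2 - yw ** 2
    sxy += xc * yc - xw * yw

cov = sxy / n - (sx / n) * (sy / n)   # 520/25 - 20 = 0.8
vx = sxx / n - (sx / n) ** 2          # 1.0
vy = syy / n - (sy / n) ** 2          # 1.44
r = cov / (vx * vy) ** 0.5
print(round(r, 4))  # 0.6667, i.e. 2/3
```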
Example 10·3. Show that if X_i′, Y_i′ are the deviations of the random variables X and Y from their respective means, then

    (i)  r = 1 − (1/2N) Σ_i (X_i′/σ_X − Y_i′/σ_Y)²
    (ii) r = −1 + (1/2N) Σ_i (X_i′/σ_X + Y_i′/σ_Y)²

Deduce that −1 ≤ r ≤ +1.
[Madras Univ. B.Sc., Nov. 1991]
Solution. (i) Here X_i′ = x_i − x̄, Y_i′ = y_i − ȳ, and r = (1/N) Σ_i X_i′Y_i′ / (σ_X σ_Y).

    R.H.S. = 1 − (1/2N) Σ_i [X_i′²/σ_X² + Y_i′²/σ_Y² − 2X_i′Y_i′/(σ_X σ_Y)]
           = 1 − (1/2)[(1/Nσ_X²) Σ_i X_i′² + (1/Nσ_Y²) Σ_i Y_i′² − (2/Nσ_X σ_Y) Σ_i X_i′Y_i′]
           = 1 − (1/2)[σ_X²/σ_X² + σ_Y²/σ_Y² − 2r]    [∵ (1/N) Σ X_i′² = σ_X², etc.]
           = 1 − (1/2)(1 + 1 − 2r) = r

(ii) Proceeding similarly, we will get

    R.H.S. = −1 + (1/2)(1 + 1 + 2r) = r

Deduction. Since (X_i′/σ_X ± Y_i′/σ_Y)², being the square of a real quantity, is always non-negative, Σ_i (X_i′/σ_X ± Y_i′/σ_Y)² is also non-negative. From part (i) we get

    r = 1 − (some non-negative quantity)  ⇒  r ≤ 1    ...(*)

Also from part (ii), we get

    r = −1 + (some non-negative quantity)  ⇒  −1 ≤ r    ...(**)

The sign of equality in (*) and (**) holds if and only if

    X_i′/σ_X − Y_i′/σ_Y = 0  and  X_i′/σ_X + Y_i′/σ_Y = 0,  ∀ i = 1, 2, ..., N

respectively.
From (*) and (**), we get −1 ≤ r ≤ 1.
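The two identities of Example 10·3 hold for any data set; a numerical spot-check on random data (names ours):

```python
import math, random

random.seed(1)
N = 100
x = [random.gauss(0, 1) for _ in range(N)]
y = [0.7 * a + random.gauss(0, 1) for a in x]

mx, my = sum(x) / N, sum(y) / N
sx = math.sqrt(sum((a - mx) ** 2 for a in x) / N)
sy = math.sqrt(sum((b - my) ** 2 for b in y) / N)
r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (N * sx * sy)

# (i)  r = 1 − (1/2N) Σ (x'/σx − y'/σy)²
rhs1 = 1 - sum(((a - mx) / sx - (b - my) / sy) ** 2
               for a, b in zip(x, y)) / (2 * N)
# (ii) r = −1 + (1/2N) Σ (x'/σx + y'/σy)²
rhs2 = -1 + sum(((a - mx) / sx + (b - my) / sy) ** 2
                for a, b in zip(x, y)) / (2 * N)
print(r, rhs1, rhs2)  # all three agree up to rounding
```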
Example 10·4. The variables X and Y are connected by the equation aX + bY + c = 0. Show that the correlation between them is −1 if the signs of a and b are alike and +1 if they are different.
[Nagpur Univ. B.Sc. 1992; Delhi Univ. B.Sc. (Stat. Hons.) 1992]
Solution. aX + bY + c = 0  ⇒  aE(X) + bE(Y) + c = 0
Subtracting, a{X − E(X)} + b{Y − E(Y)} = 0

    ⇒ X − E(X) = −(b/a){Y − E(Y)}
    ∴ Cov(X, Y) = E[{X − E(X)}{Y − E(Y)}] = −(b/a) E[{Y − E(Y)}²] = −(b/a) σ_Y²

Also

    σ_X² = E[{X − E(X)}²] = (b²/a²) E[{Y − E(Y)}²] = (b²/a²) σ_Y²  ⇒  σ_X = |b/a| σ_Y
    ∴ r = Cov(X, Y)/(σ_X σ_Y) = −(b/a) σ_Y² / (|b/a| σ_Y²) = −(b/a)/|b/a|
        = { +1, if b and a are of opposite signs
            −1, if b and a are of the same sign.
Example 10·5. (a) If Z = aX + bY and r is the correlation coefficient between X and Y, show that

    σ_Z² = a²σ_X² + b²σ_Y² + 2ab r σ_X σ_Y

(b) Show that the correlation coefficient r between two random variables X and Y is given by

    r = (σ_X² + σ_Y² − σ_{X−Y}²) / (2σ_X σ_Y)

where σ_X, σ_Y and σ_{X−Y} are the standard deviations of X, Y and X − Y respectively.
[Calcutta Univ. B.Sc., M.S. Univ. B.Sc. 1992]
Solution. (a) Taking expectations of both sides of Z = aX + bY, we get

    E(Z) = aE(X) + bE(Y)
    ∴ Z − E(Z) = a{X − E(X)} + b{Y − E(Y)}

Squaring and taking expectations of both sides, we get

    σ_Z² = a²σ_X² + b²σ_Y² + 2ab Cov(X, Y) = a²σ_X² + b²σ_Y² + 2ab r σ_X σ_Y

(b) Taking a = 1, b = −1 in the above case, we have

    Z = X − Y and σ_{X−Y}² = σ_X² + σ_Y² − 2r σ_X σ_Y
    ∴ r = (σ_X² + σ_Y² − σ_{X−Y}²) / (2σ_X σ_Y)

Remark. In the above example, we have obtained

    V(aX + bY) = a² V(X) + b² V(Y) + 2ab Cov(X, Y)

Similarly, we could obtain the result

    V(aX − bY) = a² V(X) + b² V(Y) − 2ab Cov(X, Y)

The above results are useful in solving theoretical problems.
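The variance formula of Example 10·5 is an exact algebraic identity, so it holds for empirical moments as well; a quick check (data and names ours):

```python
import random

random.seed(2)
n = 300
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.5 * a + random.gauss(0, 2) for a in x]
a, b = 2.0, -3.0
z = [a * xi + b * yi for xi, yi in zip(x, y)]

def var(u):
    m = sum(u) / n
    return sum((t - m) ** 2 for t in u) / n

def cov(u, v):
    mu, mv = sum(u) / n, sum(v) / n
    return sum((s - mu) * (t - mv) for s, t in zip(u, v)) / n

lhs = var(z)
rhs = a * a * var(x) + b * b * var(y) + 2 * a * b * cov(x, y)
print(lhs, rhs)  # equal up to rounding
```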
Example 10·6. X and Y are two random variables with variances σ_X² and σ_Y² respectively, and r is the coefficient of correlation between them. If U = X + kY and V = X + (σ_X/σ_Y)Y, find the value of k so that U and V are uncorrelated.
[Delhi Univ. B.Sc. 1992; Andhra Univ. B.Sc. 1998]
Solution. Taking expectations of U = X + kY and V = X + (σ_X/σ_Y)Y, we get

    E(U) = E(X) + kE(Y) and E(V) = E(X) + (σ_X/σ_Y) E(Y)
    ∴ U − E(U) = {X − E(X)} + k{Y − E(Y)}
       V − E(V) = {X − E(X)} + (σ_X/σ_Y){Y − E(Y)}

    Cov(U, V) = E[{U − E(U)}{V − E(V)}]
              = E[[{X − E(X)} + k{Y − E(Y)}][{X − E(X)} + (σ_X/σ_Y){Y − E(Y)}]]
              = σ_X² + (σ_X/σ_Y) Cov(X, Y) + k Cov(X, Y) + k(σ_X/σ_Y) σ_Y²
              = [σ_X² + kσ_X σ_Y] + [(σ_X/σ_Y) + k] r σ_X σ_Y
              = σ_X(σ_X + kσ_Y) + r σ_X(σ_X + kσ_Y)
              = σ_X(σ_X + kσ_Y)(1 + r)

U and V will be uncorrelated if

    r(U, V) = 0  ⇒  Cov(U, V) = 0

i.e., if σ_X(σ_X + kσ_Y)(1 + r) = 0

    ⇒ σ_X + kσ_Y = 0  ⇒  k = −σ_X/σ_Y
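Because Cov(U, V) = σ_X(σ_X + kσ_Y)(1 + r) is an identity in the moments, the choice k = −σ_X/σ_Y also kills the empirical covariance exactly; a sketch (data and names ours):

```python
import math, random

random.seed(3)
n = 400
x = [random.gauss(0, 2) for _ in range(n)]
y = [0.6 * a + random.gauss(0, 1) for a in x]

def m(u):
    return sum(u) / n

def cov(u, v):
    mu, mv = m(u), m(v)
    return sum((s - mu) * (t - mv) for s, t in zip(u, v)) / n

sx = math.sqrt(cov(x, x))
sy = math.sqrt(cov(y, y))
k = -sx / sy                                    # k = −σx/σy
u = [a + k * b for a, b in zip(x, y)]           # U = X + kY
v = [a + (sx / sy) * b for a, b in zip(x, y)]   # V = X + (σx/σy)Y
print(cov(u, v))  # 0 up to rounding
```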
Example 10·7. The random variables X and Y are jointly normally distributed and U and V are defined by

    U = X cos α + Y sin α,  V = Y cos α − X sin α

Show that U and V will be uncorrelated if

    tan 2α = 2r σ_X σ_Y / (σ_X² − σ_Y²)

where r = Corr(X, Y), σ_X² = Var(X) and σ_Y² = Var(Y). Are U and V then independent?
[Delhi Univ. B.Sc. (Stat. Hons.) 1989; (Maths Hons.) 1990]
Solution. We have

    Cov(U, V) = E[{U − E(U)}{V − E(V)}]
              = E[[{X − E(X)} cos α + {Y − E(Y)} sin α][{Y − E(Y)} cos α − {X − E(X)} sin α]]
              = cos²α Cov(X, Y) − sin α cos α σ_X² + sin α cos α σ_Y² − sin²α Cov(X, Y)
              = (cos²α − sin²α) Cov(X, Y) − sin α cos α (σ_X² − σ_Y²)
              = cos 2α Cov(X, Y) − sin α cos α (σ_X² − σ_Y²)

U and V will be uncorrelated if and only if r(U, V) = 0, i.e., iff Cov(U, V) = 0,
i.e., if cos 2α Cov(X, Y) − sin α cos α (σ_X² − σ_Y²) = 0,
or if cos 2α · r σ_X σ_Y = (sin 2α / 2)(σ_X² − σ_Y²),
or if tan 2α = 2r σ_X σ_Y / (σ_X² − σ_Y²).
Moreover, since X and Y are jointly normal, U and V, being linear functions of X and Y, are also jointly normally distributed; hence r(U, V) = 0 implies that U and V are independent [cf. Remark 2(a) above].
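The decorrelating angle can be checked numerically; tan 2α = 2 Cov(X, Y)/(σ_X² − σ_Y²) is conveniently solved with atan2 (a sketch; data and names ours):

```python
import math, random

random.seed(4)
n = 500
x = [random.gauss(0, 1.5) for _ in range(n)]
y = [0.4 * a + random.gauss(0, 1) for a in x]

def m(u):
    return sum(u) / n

def cov(u, v):
    mu, mv = m(u), m(v)
    return sum((s - mu) * (t - mv) for s, t in zip(u, v)) / n

vx, vy, cxy = cov(x, x), cov(y, y), cov(x, y)
alpha = 0.5 * math.atan2(2 * cxy, vx - vy)  # tan 2α = 2rσxσy/(σx²−σy²)

u = [a * math.cos(alpha) + b * math.sin(alpha) for a, b in zip(x, y)]
v = [b * math.cos(alpha) - a * math.sin(alpha) for a, b in zip(x, y)]
print(cov(u, v))  # 0 up to rounding
```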
Example 10·8. If X, Y are standardized random variables and

    r(aX + bY, bX + aY) = (1 + 2ab) / (a² + b²)    ...(*)

find r(X, Y), the coefficient of correlation between X and Y.
[Sardar Patel Univ. B.Sc. 1993; Delhi Univ. B.Sc. (Stat. Hons.) 1989]
Solution. Since X and Y are standardized random variables, we have

    E(X) = E(Y) = 0,  Var(X) = Var(Y) = E(X²) = E(Y²) = 1
    Cov(X, Y) = E(XY) = r(X, Y) σ_X σ_Y = r(X, Y)    ...(**)

Also we have

    r(aX + bY, bX + aY)
      = {E[(aX + bY)(bX + aY)] − E(aX + bY) E(bX + aY)} / [Var(aX + bY) Var(bX + aY)]^(1/2)
      = {E[abX² + a²XY + b²YX + abY²] − 0}
        / {[a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)][b² Var(X) + a² Var(Y) + 2ab Cov(X, Y)]}^(1/2)
      = [ab · 1 + a² r(X, Y) + b² r(X, Y) + ab · 1]
        / {[a² + b² + 2ab r(X, Y)][b² + a² + 2ab r(X, Y)]}^(1/2)    [Using (**)]
      = [2ab + (a² + b²) r(X, Y)] / [a² + b² + 2ab r(X, Y)]

From (*), we get

    (1 + 2ab)/(a² + b²) = [(a² + b²) r(X, Y) + 2ab] / [a² + b² + 2ab r(X, Y)]

Cross multiplying, we get

    (a² + b²)(1 + 2ab) + 2ab(1 + 2ab) r(X, Y) = (a² + b²)² r(X, Y) + 2ab(a² + b²)
    ⇒ (a⁴ + b⁴ + 2a²b² − 2ab − 4a²b²) r(X, Y) = a² + b²
    ⇒ [(a² − b²)² − 2ab] r(X, Y) = a² + b²
    ∴ r(X, Y) = (a² + b²) / [(a² − b²)² − 2ab]

Example 10·9. If X and Y are uncorrelated random variables with means zero and variances σ_1² and σ_2² respectively, show that

    U = X cos α + Y sin α,  V = X sin α − Y cos α

have a correlation coefficient ρ given by

    ρ = (σ_1² − σ_2²) / [(σ_1² − σ_2²)² + 4σ_1²σ_2² cosec² 2α]^(1/2)

Solution. We are given that

    r(X, Y) = 0  ⇒  Cov(X, Y) = 0,  σ_X² = σ_1² and σ_Y² = σ_2²    ...(1)

We have

    σ_U² = V(X cos α + Y sin α)
         = cos²α V(X) + sin²α V(Y) + 2 sin α cos α Cov(X, Y)
         = cos²α σ_1² + sin²α σ_2²    [Using (1)]

Similarly,

    σ_V² = V(X sin α − Y cos α) = sin²α σ_1² + cos²α σ_2²

    Cov(U, V) = E[{U − E(U)}{V − E(V)}]
              = E[{(X − E(X)) cos α + (Y − E(Y)) sin α}{(X − E(X)) sin α − (Y − E(Y)) cos α}]
              = sin α cos α V(X) − cos²α Cov(X, Y) + sin²α Cov(X, Y) − sin α cos α V(Y)
              = (σ_1² − σ_2²) sin α cos α    [Using (1)]

Now

    ρ² = [Cov(U, V)]² / (σ_U² σ_V²)
       = sin²α cos²α (σ_1² − σ_2²)² / [(cos²α σ_1² + sin²α σ_2²)(sin²α σ_1² + cos²α σ_2²)]

The denominator

       = sin²α cos²α (σ_1⁴ + σ_2⁴) + σ_1²σ_2² (cos⁴α + sin⁴α)
       = sin²α cos²α (σ_1⁴ + σ_2⁴) + σ_1²σ_2² [(sin²α + cos²α)² − 2 sin²α cos²α]
       = sin²α cos²α (σ_1⁴ + σ_2⁴ − 2σ_1²σ_2²) + σ_1²σ_2²
       = sin²α cos²α (σ_1² − σ_2²)² + σ_1²σ_2²

    ∴ ρ² = [(σ_1² − σ_2²)² sin² 2α / 4] / [σ_1²σ_2² + (sin² 2α / 4)(σ_1² − σ_2²)²]
         = (σ_1² − σ_2²)² / [4σ_1²σ_2² cosec² 2α + (σ_1² − σ_2²)²]
    ⇒ ρ = (σ_1² − σ_2²) / [(σ_1² − σ_2²)² + 4σ_1²σ_2² cosec² 2α]^(1/2)

Example 10·10. If U = aX + bY and V = cX + dY, where X and Y are measured from their respective means, and if r is the correlation coefficient between X and Y, and if U and V are uncorrelated, show that

    σ_U σ_V = (ad − bc) σ_X σ_Y (1 − r²)^(1/2)

[Poona Univ. B.Sc. 1990; Delhi Univ. B.Sc. (Stat. Hons.) 1986]
Solution. We have

    r = Cov(X, Y)/(σ_X σ_Y)  ⇒  1 − r² = 1 − [Cov(X, Y)]²/(σ_X² σ_Y²)
    ⇒ (1 − r²) σ_X² σ_Y² = σ_X² σ_Y² − [Cov(X, Y)]²    ...(*)

[This step is suggested by the answer.]
Since X, Y are measured from their means,

    E(X) = 0 = E(Y)  ⇒  E(U) = 0 = E(V),  σ_U² = E(U²),  σ_V² = E(V²)    ...(**)

Solving U = aX + bY and V = cX + dY for X and Y, we get

    X = (dU − bV)/(ad − bc),  Y = (−cU + aV)/(ad − bc)    ...(***)
    ∴ Var(X) = [d² σ_U² + b² σ_V² − 2bd Cov(U, V)]/(ad − bc)²
             = [d² σ_U² + b² σ_V²]/(ad − bc)²

[Since U, V are uncorrelated, Cov(U, V) = 0.]
Similarly, we have

    Var(Y) = [c² σ_U² + a² σ_V²]/(ad − bc)²
    Cov(X, Y) = E(XY) − E(X)E(Y) = E(XY)    [∵ E(X) = 0 = E(Y)]
              = E[(dU − bV)(−cU + aV)]/(ad − bc)²    [From (***)]
              = −[cd σ_U² + ab σ_V²]/(ad − bc)²    [Using (**) and Cov(U, V) = 0, given]

Substituting in (*), we get

    (1 − r²) σ_X² σ_Y²
      = [1/(ad − bc)⁴] [(d² σ_U² + b² σ_V²)(c² σ_U² + a² σ_V²) − (cd σ_U² + ab σ_V²)²]
      = [1/(ad − bc)⁴] [c²d² σ_U⁴ + a²b² σ_V⁴ + (a²d² + b²c²) σ_U² σ_V²
                         − c²d² σ_U⁴ − a²b² σ_V⁴ − 2abcd σ_U² σ_V²]
      = [a²d² + b²c² − 2abcd] σ_U² σ_V² / (ad − bc)⁴
      = (ad − bc)² σ_U² σ_V² / (ad − bc)⁴ = σ_U² σ_V² / (ad − bc)²

Cross multiplying and taking the square root, we get the required result.
Example 10·11. (a) Establish the formula:

    n r σ_X σ_Y = n_1 r_1 σ_X1 σ_Y1 + n_2 r_2 σ_X2 σ_Y2 + n_1 d_x1 d_y1 + n_2 d_x2 d_y2    ...(10·5)

where n_1, n_2 and n are respectively the sizes of the first, second and combined samples; (x̄_1, ȳ_1), (x̄_2, ȳ_2), (x̄, ȳ) their means; r_1, r_2 and r their coefficients of correlation; (σ_X1, σ_Y1), (σ_X2, σ_Y2), (σ_X, σ_Y) their standard deviations; and

    d_x1 = x̄_1 − x̄,  d_y1 = ȳ_1 − ȳ,  d_x2 = x̄_2 − x̄,  d_y2 = ȳ_2 − ȳ

(b) Find the correlation coefficient of the combined sample, given that:

                                 Sample I    Sample II
    Sample size                    100          150
    Sample mean (x̄)                 80           72
    Sample mean (ȳ)                100          118
    Sample variance (σ_X²)          10           12
    Sample variance (σ_Y²)          15           18
    Correlation coefficient         0·6          0·4

Solution. (a) Let (x_1i, y_1i); i = 1, 2, ..., n_1 and (x_2j, y_2j); j = 1, 2, ..., n_2 be the two samples of sizes n_1 and n_2 respectively from the bivariate population. Then with the given notations, we have

    x̄ = (n_1 x̄_1 + n_2 x̄_2)/(n_1 + n_2),  ȳ = (n_1 ȳ_1 + n_2 ȳ_2)/(n_1 + n_2)
    n σ_X² = n_1 (σ_X1² + d_x1²) + n_2 (σ_X2² + d_x2²)
    n σ_Y² = n_1 (σ_Y1² + d_y1²) + n_2 (σ_Y2² + d_y2²)    ...(1)

Also

    n r σ_X σ_Y = Σ_i (x_1i − x̄)(y_1i − ȳ) + Σ_j (x_2j − x̄)(y_2j − ȳ)    ...(2)

Writing x_1i − x̄ = (x_1i − x̄_1) + d_x1 and y_1i − ȳ = (y_1i − ȳ_1) + d_y1, and noting that

    Σ_i (x_1i − x̄_1) = 0 and Σ_i (y_1i − ȳ_1) = 0,

being the algebraic sums of deviations from the mean, we get

    Σ_i (x_1i − x̄)(y_1i − ȳ) = n_1 r_1 σ_X1 σ_Y1 + n_1 d_x1 d_y1

Similarly, we will get

    Σ_j (x_2j − x̄)(y_2j − ȳ) = n_2 r_2 σ_X2 σ_Y2 + n_2 d_x2 d_y2

Substituting in (2), we get the required formula.
(b) Here we are given:

    n_1 = 100, x̄_1 = 80, ȳ_1 = 100, σ_X1² = 10, σ_Y1² = 15, r_1 = 0·6
    n_2 = 150, x̄_2 = 72, ȳ_2 = 118, σ_X2² = 12, σ_Y2² = 18, r_2 = 0·4

    x̄ = (100 × 80 + 150 × 72)/250 = 75·2
    ȳ = (100 × 100 + 150 × 118)/250 = 110·8
    d_x1 = x̄_1 − x̄ = 4·8,   d_y1 = ȳ_1 − ȳ = −10·8
    d_x2 = x̄_2 − x̄ = −3·2,  d_y2 = ȳ_2 − ȳ = 7·2
    n σ_X² = 100(10 + 23·04) + 150(12 + 10·24) = 6640
    n σ_Y² = 100(15 + 116·64) + 150(18 + 51·84) = 23640

Substituting these values in the formula (10·5) and simplifying, we get

    r = [n_1 r_1 σ_X1 σ_Y1 + n_2 r_2 σ_X2 σ_Y2 + n_1 d_x1 d_y1 + n_2 d_x2 d_y2] / (n σ_X σ_Y)
      = [60√150 + 60√216 − 5184 − 3456] / √(6640 × 23640)
      ≈ −7023·34 / 12528·75 ≈ −0·56
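Formula (10·5) and the arithmetic of part (b), including the signs of d_y1 and d_x2, can be checked mechanically (a sketch; the variable names are ours):

```python
import math

# Summary statistics of the two samples from part (b)
n1, m1x, m1y, v1x, v1y, r1 = 100, 80, 100, 10, 15, 0.6
n2, m2x, m2y, v2x, v2y, r2 = 150, 72, 118, 12, 18, 0.4

n = n1 + n2
mx = (n1 * m1x + n2 * m2x) / n    # combined mean of X: 75.2
my = (n1 * m1y + n2 * m2y) / n    # combined mean of Y: 110.8
dx1, dy1 = m1x - mx, m1y - my     # +4.8, -10.8
dx2, dy2 = m2x - mx, m2y - my     # -3.2, +7.2

vx = (n1 * (v1x + dx1 ** 2) + n2 * (v2x + dx2 ** 2)) / n
vy = (n1 * (v1y + dy1 ** 2) + n2 * (v2y + dy2 ** 2)) / n

num = (n1 * r1 * math.sqrt(v1x * v1y) + n2 * r2 * math.sqrt(v2x * v2y)
       + n1 * dx1 * dy1 + n2 * dx2 * dy2)
r = num / (n * math.sqrt(vx * vy))
print(round(r, 4))  # about -0.5606
```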

Example 10·12. The independent variables X and Y are defined by:

    f(x) = 4ax, 0 ≤ x ≤ r;  f(y) = 4by, 0 ≤ y ≤ s;  = 0, otherwise.

Show that

    r(U, V) = (b − a)/(b + a),

where U = X + Y and V = X − Y.
[I.I.T. (B. Tech.), Nov. 1992]
Solution. Since the total area under a probability curve is unity, we have

    ∫₀^r f(x) dx = 4a ∫₀^r x dx = 1  ⇒  2ar² = 1  ⇒  a = 1/(2r²)    ...(i)
    ∫₀^s f(y) dy = 4b ∫₀^s y dy = 1  ⇒  2bs² = 1  ⇒  b = 1/(2s²)    ...(ii)
    ∴ f(x) = 4ax = 2x/r², 0 ≤ x ≤ r;  f(y) = 4by = 2y/s², 0 ≤ y ≤ s    ...(iii)

Since X and Y are independent variates,

    r(X, Y) = 0  ⇒  Cov(X, Y) = 0    ...(iv)

    Cov(U, V) = Cov(X + Y, X − Y)
              = Cov(X, X) − Cov(X, Y) + Cov(Y, X) − Cov(Y, Y)
              = σ_X² − σ_Y²    [Using (iv)]
    Var(U) = Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) = σ_X² + σ_Y²    [Using (iv)]
    Var(V) = Var(X − Y) = Var(X) + Var(Y) − 2 Cov(X, Y) = σ_X² + σ_Y²    [Using (iv)]
    ∴ r(U, V) = Cov(U, V)/[Var(U) Var(V)]^(1/2) = (σ_X² − σ_Y²)/(σ_X² + σ_Y²)    ...(v)

We have:

    E(X) = ∫₀^r x f(x) dx = (2/r²) ∫₀^r x² dx = 2r/3    [From (iii)]
    E(X²) = ∫₀^r x² f(x) dx = (2/r²) ∫₀^r x³ dx = r²/2
    ∴ Var(X) = E(X²) − [E(X)]² = r²/2 − 4r²/9 = r²/18 = 1/(36a)    [From (i)]

Similarly, we shall get

    E(Y) = 2s/3,  E(Y²) = s²/2 and Var(Y) = s²/18 = 1/(36b)

Substituting in (v), we get

    r(U, V) = [1/(36a) − 1/(36b)] / [1/(36a) + 1/(36b)] = (b − a)/(b + a).
Example 10·13. Let the random variable X have the marginal density

    f_1(x) = 1, −1/2 < x < 1/2

and let the conditional density of Y be

    f(y | x) = 1, x < y < x + 1, −1/2 < x < 0
             = 1, −x < y < 1 − x, 0 < x < 1/2

Show that the variables X and Y are uncorrelated.
Solution. We have

    E(X) = ∫ from −1/2 to 1/2 of x f_1(x) dx = ∫ from −1/2 to 1/2 of x dx = 0

If f(x, y) is the joint p.d.f. of X and Y, then

    f(x, y) = f(y | x) f_1(x) = f(y | x)    [∵ f_1(x) = 1]

    E(XY) = ∫ from −1/2 to 0 ∫ from x to x+1 of xy dy dx + ∫ from 0 to 1/2 ∫ from −x to 1−x of xy dy dx
          = (1/2) ∫ from −1/2 to 0 of x(2x + 1) dx + (1/2) ∫ from 0 to 1/2 of x(1 − 2x) dx
          = (1/2)[2x³/3 + x²/2] from −1/2 to 0 + (1/2)[x²/2 − 2x³/3] from 0 to 1/2
          = (1/2)(1/12 − 1/8) + (1/2)(1/8 − 1/12) = 0
    ∴ Cov(X, Y) = E(XY) − E(X) E(Y) = 0  ⇒  r(X, Y) = 0

Hence the variables X and Y are uncorrelated.

EXERCISE 10(a)
1. (a) Show that the coefficient of correlation r is independent of a change of scale and origin of the variables. Also prove that for two independent variables r = 0. Show by an example that the converse is not true. State the limits between which r lies and give a proof.
[Delhi Univ. M.Sc. (O.R.), 1986]
(b) Let ρ be the correlation coefficient between two jointly distributed random variables X and Y. Show that |ρ| ≤ 1, and that |ρ| = 1 if, and only if, X and Y are linearly related. [Indian Forest Service, 1991]
2. (a) Calculate the coefficient of correlation between X and Y for the following:

    X : 1  3  4  5  7  8  10
    Y : 2  6  8  10 14 16 20

Ans. r(X, Y) = +1
(b) Discuss the validity of the following statements:
(i) "A high positive coefficient of correlation between increase in the sale of newspapers and increase in the number of crimes leads to the conclusion that newspaper reading may be responsible for the increase in the number of crimes."
(ii) "A high positive value of r between the increase in cigarette smoking and increase in lung cancer establishes that cigarette smoking is responsible for lung cancer."
(c) (i) Do you agree with the statement that "r = 0·8 implies 80% of the data are explained"?
(ii) Comment on the following: "The closeness of relationship between two variables is proportional to r."
Hint. (i) No, (ii) Wrong.
(d) By effecting a suitable change of origin and scale, calculate the product-moment correlation coefficient for the following set of 5 observations on (X, Y):

    X : −10 −5 0 5 10
    Y : 5 9 7 11 13

Ans. r(X, Y) = 0·9

3. The marks obtained by 10 students in Mathematics and Statistics are given below. Find the coefficient of correlation between the two subjects.

    Roll No.             :  1   2   3   4   5   6   7   8   9  10
    Marks in Mathematics : 75  30  60  80  53  35  15  40  38  48
    Marks in Statistics  : 85  45  54  91  58  63  35  43  45  44

4. (a) The following table gives the number of blind per lakh of population in different age-groups. Find out the correlation between age and blindness.

    Age in years          : 0–10 10–20 20–30 30–40 40–50
    No. of blind per lakh :  55    67   100   111   150
    Age in years          : 50–60 60–70 70–80
    No. of blind per lakh :   …    300   500

Ans. 0·89
(b) The following table gives the distribution of items of production and also the relatively defective items among them, according to size-groups. Is there any correlation between size and defect in quality?

    Size-group             : 15–16 16–17 17–18 18–19 19–20 20–21
    No. of items           :  200   270   340   360   400   300
    No. of defective items :  150   162   170   180   180   120

Hint. Here we have to find the correlation coefficient between the mid-values of the size-groups (X) and the percentage of defectives (Y) given below:

    X : 15·5 16·5 17·5 18·5 19·5 20·5
    Y :  75   60   50   50   45   40

Ans. r = −0·94.
5. Using the formula

    σ_{X−Y}² = σ_X² + σ_Y² − 2r(X, Y) σ_X σ_Y

obtain the correlation coefficient between the heights of fathers (X) and of the sons (Y) from the following data:

    X : 65 66 67 68 69 70 71 67
    Y : 67 68 64 72 70 67 70 68
6. (a) From the following data, compute the coefficient of correlation between X and Y:

                                        X series   Y series
    No. of items                           15         15
    Arithmetic mean                        25         18
    Sum of squares of deviations
      from mean                           136        138

Sum of products of deviations of the X and Y series from their respective arithmetic means = 122.
Ans. r(X, Y) = 0·891
(b) The coefficient of correlation between two variables X and Y is 0·32. Their covariance is 7·86. The variance of X is 10. Find the standard deviation of the Y series.
(c) In two sets of variables X and Y with 50 observations each, the following data were observed:

    x̄ = 10, σ_X = 3, ȳ = 6, σ_Y = 2 and r(X, Y) = 0·3

But on subsequent verification it was found that one value of X (= 10) and one value of Y (= 6) were inaccurate and hence weeded out. With the remaining 49 pairs of values, how is the original value of r affected?
[Nagpur Univ. B.Sc., 1990]
Hint. ΣX = n x̄ = 500, ΣY = n ȳ = 300

    ΣX² = n(σ_X² + x̄²) = 5450,  ΣY² = 50(4 + 36) = 2000
    r = Cov(X, Y)/(σ_X σ_Y) = [(1/n) ΣXY − x̄ȳ]/(σ_X σ_Y)
    ⇒ 0·3 × 3 × 2 = (1/50) ΣXY − 10 × 6  ⇒  ΣXY = 50(1·8 + 60) = 3090

After weeding out the incorrect pair of observations, viz. (X = 10, Y = 6), the corrected values of ΣX, ΣY, ΣX², ΣY² and ΣXY for the remaining 50 − 1 = 49 pairs of observations are given below.
Corrected values:

    ΣX = 500 − 10 = 490;  ΣY = 300 − 6 = 294
    ΣXY = 3090 − 10 × 6 = 3090 − 60 = 3030
    ΣX² = 5450 − 10² = 5350;  ΣY² = 2000 − 6² = 1964
    ∴ r = Corrected Cov(X, Y) / (Corrected σ_X × Corrected σ_Y)
        = (90/49) / √[(450/49)(200/49)] = 90/√90000 = 0·3

Hence the correlation coefficient is invariant in this case.
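The hint's bookkeeping can be replayed in code (a sketch; variable names ours):

```python
# Reconstruct the raw sums from the reported summary statistics
n = 50
sx, sy = n * 10, n * 6              # ΣX = n·x̄, ΣY = n·ȳ
sxx = n * (3 ** 2 + 10 ** 2)        # ΣX² = n(σx² + x̄²) = 5450
syy = n * (2 ** 2 + 6 ** 2)         # ΣY² = 2000
sxy = n * (0.3 * 3 * 2 + 10 * 6)    # ΣXY = n(r·σx·σy + x̄·ȳ) = 3090

# Weed out the inaccurate pair (X, Y) = (10, 6)
n -= 1
sx -= 10; sy -= 6
sxx -= 10 ** 2; syy -= 6 ** 2
sxy -= 10 * 6

cov = sxy / n - (sx / n) * (sy / n)
vx = sxx / n - (sx / n) ** 2
vy = syy / n - (sy / n) ** 2
r = cov / (vx * vy) ** 0.5
print(round(r, 10))  # 0.3: r is unchanged
```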
(d) A prognostic test in Mathematics was given to 10 students who were about to begin a course in Statistics. The scores (X) in their test were examined in relation to the scores (Y) in the final examination in Statistics. The following results were obtained:

    ΣX = 71, ΣY = 70, ΣX² = 555, ΣY² = 526 and ΣXY = 527

Find the coefficient of correlation between X and Y.
[Kerala Univ. B.Sc., 1990]
7. (a) X₁ and X₂ are independent variables with means 5 and 10 and standard deviations 2 and 3 respectively. Obtain r(U, V) where

    U = 3X₁ + 4X₂ and V = 3X₁ − X₂

Ans. 0    [Delhi Univ. B.Sc., 1988]

(b) If X and Y are independent random variables with zero means and standard deviations 9 and 12 respectively, and if X + 2Y and kX − Y are uncorrelated, find k.
(c) X, Y, Z are random variables each with expectation 10 and variances 1, 4 and 9 respectively. The correlation coefficients are

    r(X, Y) = 0,  r(Y, Z) = r(Z, X) = 1/4

Obtain the numerical values of:
(i) E(X + Y − 2Z), (ii) Cov(X + 3, Y + 3), (iii) V(X − 2Z) and (iv) Cov(3X, 5Z)
Ans. (i) 0, (ii) 0, (iii) 34, and (iv) 45/4.
(d) X and Y are discrete random variables. If Var(X) = Var(Y) = σ², Cov(X, Y) = σ²/2, find (i) Var(2X − 3Y), (ii) Corr(2X + 3, 2Y − 3).
8. (a) Prove that :
V(aX ± bY) = a²V(X) + b²V(Y) ± 2ab Cov (X, Y)
Hence deduce that if X and Y are independent
V(X ± Y) = V(X) + V(Y)
(b) Prove that the correlation coefficient between X and Y is positive or
negative according as
σ(X + Y) > or < σ(X − Y)
9. Show that if X and Y are two random variables each assuming only two
values and the correlation coefficient between them is zero, then they are
independent. Indicate with justification whether the result is true in general.
Find the correlation coefficient between X and a − X, where X is any
random variable and a is a constant.
10. (a) Xᵢ (i = 1, 2, 3) are uncorrelated variables each having the same
standard deviation. Obtain the correlation between X₁ + X₂ and X₂ + X₃.
Ans. 1/2
(b) If Xᵢ (i = 1, 2, 3) are three uncorrelated variables having standard
deviations σ₁, σ₂ and σ₃ respectively, obtain the coefficient of correlation
between (X₁ + X₂) and (X₂ + X₃).
Ans. σ₂²/√{(σ₁² + σ₂²)(σ₂² + σ₃²)}
(c) Two random variables X and Y have zero means, the same variance σ²
and zero correlation. Show that
U = X cos α + Y sin α and V = X sin α − Y cos α
have the same variance σ² and zero correlation.
(Bangalore Univ. B.Sc., 1991)
(d) Let X and Y be uncorrelated random variables. If U = X + Y and
V = X − Y, prove that the coefficient of correlation between U and V is
(σx² − σy²)/(σx² + σy²), where σx² and σy² are the variances of X and Y
respectively.
(e) Two independent random variables X and Y have the following
variances : σx² = 36, σy² = 16. Calculate the coefficient of correlation between
U = X + Y and V = X − Y
Correlation and Regression 10·23
(f) Random variables X and Y have zero means and non-zero variances σx²
and σy². If Z = Y − X, then find σz and the correlation coefficient ρ(X, Z) of X
and Z in terms of σx, σy and the correlation coefficient ρ(X, Y) of X and Y.
(g) If the independent random variables X₁, X₂ and X₃ have the means 4, 9
and 3 and variances 3, 7 and 5 respectively, obtain the mean and variance of
(i) Y = 2X₁ − 3X₂ + 4X₃, (ii) Z = X₁ + 2X₂ − X₃, and
(iii) Calculate the correlation between Y and Z.
[Delhi Univ. 1989]
11. (a) X₁, X₂, ..., Xₙ are independent random variables, all with the same
distribution and zero means. Let X̄ = ΣXᵢ/n.
Find the correlation coefficient between (i) Xᵢ and X̄ and (ii) Xᵢ − X̄ and X̄.
[Delhi Univ. B.Sc. (Stat. Hons.), 1993]
Hint. r(Xᵢ, X̄) = Cov (Xᵢ, X̄)/(σ × σ/√n) = (σ²/n)/(σ²/√n) = 1/√n
Cov (Xᵢ − X̄, X̄) = Cov (Xᵢ, X̄) − Var (X̄)
= (σ²/n) − (σ²/n) = 0
⇒ r(Xᵢ − X̄, X̄) = 0
(b) X₁, X₂, ..., Xₙ are random variables each with the same expected
value μ and s.d. σ. The correlation coefficient between any two X's is ρ. Show
that (i) Var (X̄) = (σ²/n)(1 − ρ) + ρσ²,
(ii) E Σᵢ (Xᵢ − X̄)² = (n − 1)(1 − ρ)σ², and (iii) ρ > −1/(n − 1)
12. (a) If X and Y are independent random variables, show that
r(X + Y, X − Y) = r²(X, X + Y) − r²(Y, X + Y),
where r(X + Y, X − Y) denotes the coefficient of correlation between (X + Y)
and (X − Y). (Meerut Univ. B.Sc., 1991)
(b) Let X and Y be random variables having mean 0, variance 1 and
correlation r. Show that X − rY and Y are uncorrelated and that X − rY has
mean zero and variance 1 − r².
13. X₁ and X₂ are two variables with zero means, variances σ₁² and σ₂²,
and r is the correlation coefficient between them. Determine the
values of the constants a and b which are independent of r such that X₁ + aX₂
and X₁ + bX₂ are uncorrelated.
14. (a) If X₁ and X₂ are two random variables with means μ₁, μ₂, variances
σ₁², σ₂² and correlation coefficient r, find the correlation coefficient between
U = a₁X₁ + a₂X₂ and V = b₁X₁ + b₂X₂,
where a₁, a₂ and b₁, b₂ are constants.

(b) Let X₁, X₂ be independent random variables with means μ₁, μ₂ and non-
zero variances σ₁², σ₂² respectively. Let U = X₁ − X₂ and V = X₁X₂. Find the
correlation coefficient between (i) X₁ and U, (ii) X₁ and V, in terms of μ₁, μ₂,
σ₁², σ₂².
=
15. (q) If U aX + oY and V =bX - aY, where X and Yare meas,ured frolll
their respective and if U and V are uncorrelated, ,. the cQ-efticient or
correlation between X and Y is given by the equation.
cru cry =(a 2 + b2) crx cry (1 - 1'2)1/2 (Utkal UDiv. B. Sc., 1993)
(b) 'Let U =aX + bY and V =aX - bY where X, Y represent deviations frolll
the means of two measurements on the same individual. The coefficient or
correlation between X and Y is p. If U, V are uncorrelated, show
cru cry =2abcrx cry (1 - 1'2)112
16. Show that, if a and b are constants and r is the correlation coefficient
between X and Y, then the correlation coefficient between aX and bY is equal to
r if the signs of a and b are alike, and to −r if they are different.
Also show that, if constants a, b and c are positive, the correlation
coefficient between (aX + bY) and cY is equal to
(arσx + bσy)/√(a²σx² + b²σy² + 2abrσxσy)
17. If X₁, X₂ and X₃ are three random variables measured from their
respective means as origin and of equal variances, find the coefficient of
correlation between X₁ + X₂ and X₂ + X₃ in terms of r₁₂, r₁₃ and r₂₃, and show
that it is equal to
(1 + r₁₂ + r₁₃ + r₂₃)/[2√{(1 + r₁₂)(1 + r₂₃)}],
which reduces to (i) ½√(1 + r₁₂), if r₁₃ = r₂₃ = 0, and
(ii) (1 + 3r)/{2(1 + r)}, if r₁₂ = r₁₃ = r₂₃ = r.
18. (a) For a weighted distribution (xᵢ, wᵢ), (i = 1, 2, ..., n) show that the
weighted arithmetic mean x̄w = Σwᵢxᵢ/Σwᵢ > or < the unweighted mean
x̄ = Σxᵢ/n according as r(x, w) > or < 0.
(b) Given N values x₁, x₂, ..., x_N of a variable X and weights w₁, w₂, ..., w_N,
express the coefficient of correlation between X and W in terms involving the
difference between the arithmetic mean and the weighted mean of X.
19. (a) A coin is tossed n times. If X and Y denote the (random) number of
heads and number of tails turned up respectively, show that r(X, Y) = −1.
Hint. Note that X + Y = n ⇒ Y = n − X
∴ r(X, Y) = r(X, n − X) = −r(X, X) = −1.
(b) Two dice are thrown, their scores being a and b. The first die is left on
the table while the second is picked up and thrown again, giving the score c.
Suppose the process is repeated a large number of times. What is the correlation
coefficient between X = a + b and Y = a + c ?
Ans. r(X, Y) = 1/2

20. (a) If X and Y are independent random variables with means μ₁ and μ₂
and variances σ₁², σ₂² respectively, show that the correlation coefficient between
U = X and V = X − Y in terms of μ₁, μ₂, σ₁² and σ₂² is σ₁/√(σ₁² + σ₂²).

(b) If X and Y are independent random variables with non-zero variances,
show that the correlation coefficient between U = XY and V = X in terms of the
means and variances of X and Y is given by
ρ(U, V) = μy σx/√(σx²σy² + μx²σy² + μy²σx²)
[Delhi Univ. B.Sc. (Stat. Hons.), 1981]
21. If Xᵢ, Yⱼ and Zₖ are all independent random variables with mean zero
and unit variance, find the correlation coefficient between
U = Σᵢ₌₁^m Xᵢ + Σⱼ₌₁ⁿ Yⱼ and V = Σᵢ₌₁^m Xᵢ + Σₖ₌₁ⁿ Zₖ
Ans. r(U, V) = m/(m + n) (Bombay Univ. B.Sc., 1990)


22. (a) Find the value of l so that the correlation coefficient between
(X − lY) and (X + Y) is maximum, where X, Y are independent random
variables each with mean zero and variance 1. [Ans. l = −1]
Hint. U = X − lY, V = X + Y ; find l so that r(U, V) = 1.
(b) If U = X + kY and V = X + mY and r is the correlation coefficient
between X and Y, find the correlation coefficient between U and V. Show that
U and V are uncorrelated if
k = − σx(σx + rm σy)/{σy(r σx + m σy)}
and further,
if m = σx/σy, then k = −σx/σy.
(Gujarat Univ. M.A., 1993)
23. X₁, X₂, X₃ are three variables, each with variance σ² and the correlation
coefficient between any two of them is r. If X̄ = (X₁ + X₂ + X₃)/3, show that
Var (X̄) = (σ²/3)(1 + 2r)
Deduce that r ≥ −1/2.
24. (a) If U = aX + bY and V = bX − aY, show that U and V are uncorrelated
if
(a² − b²)/ab = (σx² − σy²)/(ρ σx σy)
where ρ is the coefficient of correlation between X and Y. Show further that, in
this case,
σᵤ² + σᵥ² = (a² + b²)(σx² + σy²) and σᵤσᵥ = (a² + b²) σxσy √(1 − ρ²)
(b) If U = aX + bY, V = cX + dY, show that
| var (U)     cov (U, V) |   | a  b |²   | var (X)     cov (X, Y) |
| cov (U, V)  var (V)    | = | c  d |  × | cov (X, Y)  var (Y)    |
25. If X is a standard normal variate and Y = a + bX + cX²,
where a, b, c are constants, find the correlation coefficient between X and Y.
Hence or otherwise obtain the conditions when (i) X and Y are uncorrelated and
(ii) X and Y are perfectly correlated.
26. (a) If X ~ N(0, 1), find corr (X, Y) where Y = a + bX + cX².
[Delhi Univ. B.Sc. (Maths. Hons.), 1986]
Ans. r(X, Y) = b/√(b² + 2c²)
(b) If X has Laplace distribution with parameters (λ, 0) and
Y = a + bX + cX², find ρ(X, Y).
[Delhi Univ. B.A. (Stat. Hons. Spl. Course), 1989]
Hint. p(x) = (λ/2) exp [−λ|x|], −∞ < x < ∞.
E(X^{2k+1}) = 0 = μ_{2k+1} ; E(X^{2k}) = μ_{2k} = (2k)!/λ^{2k}
ρ(X, Y) = λb/√(b²λ² + 10c²)
27. In a sample of n random observations from an exponential distribution
with parameter λ, the numbers of observations in (0, 1/λ) and (1/λ, 2/λ),
denoted by X and Y, are noted. Find ρ(X, Y).
Hint. p₁ = P(0 < X < 1/λ) = ∫₀^{1/λ} λe^{−λx} dx = 1 − e⁻¹
p₂ = P(1/λ < X < 2/λ) = ∫_{1/λ}^{2/λ} λe^{−λx} dx = e⁻¹ − e⁻²
Then (X, Y) has a trinomial distribution with parameters (n ; p₁, p₂,
p₃ = 1 − p₁ − p₂).
Hence we have
ρ(X, Y) = −[p₁p₂/{(1 − p₁)(1 − p₂)}]^{1/2} = −(e − 1)/√(e² − e + 1).


28. Prove that :
r(X, Y + Z) = {σy r(X, Y) + σz r(X, Z)}/σ(Y + Z)
29. If X and Y are independent random variables, find Corr(X, XY).
Deduce the value of Corr(X, X/Y).
Ans. r(X, XY) = μy σx/√(σx²σy² + μx²σy² + μy²σx²)
30. Prove or Disprove :
(a) r(X, Y) = 0 ⇒ r(|X|, Y) = 0
(b) r(X, Y) = 0, r(Y, Z) = 0 ⇒ r(X, Z) = 0.
Ans. (a) False, unless X and Y are independent.
(b) Hint. Let Z ≡ X, and X and Y be independent. Then
r(X, Y) = 0 = r(Y, Z). But r(X, Z) = r(X, X) = 1.
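The hint for (b) can be made concrete with a small exact computation (an illustrative sketch, not from the text): take X, Y independent and uniform on {−1, +1}, and Z a copy of X; then r(X, Y) = r(Y, Z) = 0 while r(X, Z) = 1.

```python
from itertools import product

# Joint distribution of (X, Y): the four points below, each with probability 1/4
points = list(product([-1, 1], [-1, 1]))

def corr(f, g):
    # Pearson correlation of f(x, y) and g(x, y) under the uniform joint law
    n = len(points)
    ef = sum(f(x, y) for x, y in points) / n
    eg = sum(g(x, y) for x, y in points) / n
    cov = sum((f(x, y) - ef) * (g(x, y) - eg) for x, y in points) / n
    vf = sum((f(x, y) - ef) ** 2 for x, y in points) / n
    vg = sum((g(x, y) - eg) ** 2 for x, y in points) / n
    return cov / (vf * vg) ** 0.5

X = lambda x, y: x
Y = lambda x, y: y
Z = X                                      # Z is identically X
print(corr(X, Y), corr(Y, Z), corr(X, Z))  # 0.0 0.0 1.0
```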
31. Let the random variable X have a p.d.f. f(·) with distribution function
F(·), mean μ and variance σ². Define Y = α + βX, where α and β are
constants satisfying −∞ < α < ∞ and β > 0.
(a) Select α and β so that Y has mean 0 and variance 1.
(b) What is the correlation between X and Y ?

32. Let (X, Y) be jointly discrete random variables such that each of X and Y
has at most two mass points. Prove or disprove : X and Y are independent if
and only if they are uncorrelated.
Ans. True.
33. If the variables X₁, X₂, ..., X₂ₙ all have the same variance σ² and the
correlation coefficient between Xᵢ and Xⱼ (i ≠ j) has the same value ρ, show that
the correlation coefficient between Σᵢ₌₁ⁿ Xᵢ and Σᵢ₌ₙ₊₁²ⁿ Xᵢ is given by
nρ/{1 + (n − 1)ρ}.
34. The means of independent r.v.'s X₁, X₂, ..., Xₙ are zero and their variances
are equal, say unity. The correlation coefficients between the sum of t selected
(t < n) variables out of these variables and the sum of all n variables are found
out. Prove that the sum of squares of all these correlation coefficients is ⁿ⁻¹C_{t−1}.
[Burdwan Univ. B.Sc. (Hons.), 1989]
35. Two variables U and V are made up of the sum of a number of terms
as follows :
U = X₁ + X₂ + ... + Xₙ + Y₁ + Y₂ + ... + Yₐ,
V = X₁ + X₂ + ... + Xₙ + Z₁ + Z₂ + ... + Z_b,
where a and b are the numbers of Y and Z terms, and where X's, Y's and Z's are all
uncorrelated standardised random variables. Show that the correlation coefficient
between U and V is n/√{(n + a)(n + b)}. Show further that
ξ = √(n + b) U + √(n + a) V and η = √(n + b) U − √(n + a) V   ...(*)
are uncorrelated. [South Gujarat Univ. B.Sc., 1989]
36. (a) Let the random variables X and Y have the joint p.d.f.
f(x, y) = 1/3 ; (x, y) = (0, 0), (1, 1), (2, 0)
Compute E(X), V(X), E(Y), V(Y) and r(X, Y). Are X and Y stochastically
independent ? Give reasons.
(b) Let (X, Y) have the probability function :
f(0, 0) = 0·45, f(0, 1) = 0·05, f(1, 0) = 0·35, f(1, 1) = 0·15.
Evaluate V(X), V(Y) and ρ(X, Y). Show that while X and Y are correlated,
X and X − 5Y are uncorrelated. Are X and X − 5Y independent ?

(c) Given the bivariate distribution :
f(−1, 0) = 1/15, f(−1, 1) = 3/15, f(−1, 2) = 2/15
f(0, 0) = 2/15, f(0, 1) = 2/15, f(0, 2) = 1/15
f(1, 0) = 1/15, f(1, 1) = 1/15, f(1, 2) = 2/15
f(x, y) = 0, elsewhere.
Obtain :
(i) The marginal distributions of X and Y.

(ii) The conditional distribution of Y given X = 0.
(iii) E(Y | X = 0).
(iv) The product moment correlation coefficient between X and Y.
Are X and Y independently distributed ?
37. If X and Y are standardised variates with correlation coefficient ρ,
prove that E[max (X², Y²)] ≤ 1 + √(1 − ρ²).
Hint. max (X², Y²) = ½ [X² + Y² + |X − Y| · |X + Y|]   ...(*)
E(X) = E(Y) = 0 ; E(X²) = E(Y²) = 1 ; E(XY) = ρ
and [E{|X − Y| · |X + Y|}]² ≤ E(X − Y)² · E(X + Y)²
(By Cauchy-Schwarz Inequality)
38. The joint p.d.f. of two variates X and Y is given by
f(x. y) = k[(x + y) - (xl yl)] ; 0 < (x. y) < 1
= 0, otherwise.
Show that X and Y are uncorrelated but not independent.
39. (a) If the random variables X and Y have the joint p.d.f.
f(x, y) = x + y ; 0 < x < 1, 0 < y < 1
        = 0, elsewhere,
then show that the correlation coefficient between X and Y is −1/11.
[Madras Univ. B.Sc., Oct., 1990]
(b) The density function f of a random variable X is given by
f(x) = kx², 0 < x < 1
     = 0, otherwise
(i) What is the value of k ? What is the distribution function of X ?
(ii) Obtain the density of the random variable Y = X².
(iii) Obtain the correlation coefficient between X and Y.
(iv) Are X and Y independently distributed ?
40. (a) If f(x, y) = (6 − x − y)/8 ; 0 ≤ x ≤ 2, 2 ≤ y ≤ 4,
find (i) Var (X), (ii) Var (Y), (iii) r(X, Y).
Ans. (i) 11/36, (ii) 11/36, (iii) −1/11.
(b) Given the joint density of random variables X, Y, Z as :
f(x, y, z) = kx exp [−(y + z)], 0 < x < 2, y ≥ 0, z ≥ 0
           = 0, elsewhere
Find
(i) k,
(ii) the marginal density functions,
(iii) the conditional expectation of Y, given X and Z, and
(iv) the product moment correlation between X and Y.
[Madras Univ. B.Sc. (Main Stat.), 1988]

(c) Suppose that the two-dimensional random variable (X, Y) has p.d.f.
given by f(x, y) = ke⁻ʸ, 0 < x < y < 1
              = 0, elsewhere
Find the correlation coefficient r(X, Y). [Delhi Univ. M.C.A., 1991]
41. (a) The joint density of (X, Y) is :
f(x, y) = (x + y)/8, 0 ≤ x ≤ 2, 0 ≤ y ≤ 2.
Find μ′_{rs} = E(X^r Y^s) and hence find Corr (X, Y).
Ans. μ′_{rs} = 2^{r+s} [1/{(r + 2)(s + 1)} + 1/{(r + 1)(s + 2)}] ; r(X, Y) = −1/11
(b) Find the m.g.f. of the bivariate distribution :
f(x, y) = 1, 0 < (x, y) < 1
        = 0, otherwise
and hence find r(X, Y).
Ans. M(t₁, t₂) = (e^{t₁} − 1)(e^{t₂} − 1)/(t₁t₂) ; t₁ ≠ 0, t₂ ≠ 0 ; r(X, Y) = 0.
42. Let (X, Y) have joint density :
f(x, y) = xy e^{−(x + y)} I_{(0, ∞)}(x) I_{(0, ∞)}(y)
Find Corr (X, Y). Are X and Y independent ?
Ans. Corr (X, Y) = 0 ; X and Y are independent.
43. A bivariate distribution in two discrete random variables X and Y is
defined by the probability generating function :
exp [a(u − 1) + b(v − 1) + c(u − 1)(v − 1)],
the simultaneous probability of X = r, Y = s, where r and s are integers, being the
coefficient of u^r v^s. Find the correlation coefficient between X and Y.
Hint. Put u = e^{t₁} and v = e^{t₂} in exp [a(u − 1) + b(v − 1) + c(u − 1)(v − 1)] ;
the result will be the m.g.f. of the bivariate distribution and is given by
M(t₁, t₂) = exp [a(e^{t₁} − 1) + b(e^{t₂} − 1) + c(e^{t₁} − 1)(e^{t₂} − 1)]
We have
[∂M/∂t₁]_{t₁ = t₂ = 0} = a, [∂²M/∂t₁²]_{t₁ = t₂ = 0} = a(a + 1),
[∂²M/∂t₁∂t₂]_{t₁ = t₂ = 0} = ab + c,
[∂M/∂t₂]_{t₁ = t₂ = 0} = b, [∂²M/∂t₂²]_{t₁ = t₂ = 0} = b(b + 1)
So we have
E(X) = a, E(X²) = a(a + 1), E(Y) = b, E(Y²) = b(b + 1) and E(XY) = ab + c
∴ r(X, Y) = [E(XY) − E(X) E(Y)]/√{[E(X²) − {E(X)}²][E(Y²) − {E(Y)}²]} = c/√(ab)
44. Let the number X be chosen at random from among the integers 1, 2,
3, 4 and the number Y be chosen from among those at least as large as X.
Prove that Cov (X, Y) = 5/8. Find also the regression line of Y on X.
[Delhi Univ. B.Sc. (Maths. Hons.), 1990]
Hint. P(X = k) = 1/4 ; k = 1, 2, 3, 4 and Y ≥ X.

P(Y = y | X = 1) = 1/4 ; y = 1, 2, 3, 4
P(Y = y | X = 2) = 1/3 ; y = 2, 3, 4
P(Y = y | X = 3) = 1/2 ; y = 3, 4 ; P(Y = y | X = 4) = 1 ; y = 4.
The joint probability distribution can be obtained on using :
P(X = x, Y = y) = P(X = x) · P(Y = y | X = x).
r(X, Y) = Cov (X, Y)/(σx σy) = (5/8)/√{(5/4) × (41/48)} = √(15/41)
Regression line of Y on X : Y − E(Y) = r (σy/σx) [X − E(X)]
45. Two ideal dice are thrown. Let X₁ be the score on the first die and X₂
the score on the second die. Let Y = max (X₁, X₂). Obtain the joint distribution
of Y and X₁ and show that
Corr (Y, X₁) = 3√3/√73
46. Consider an experiment of tossing two tetrahedra. Let X be the number
on the down-turned face of the first tetrahedron and Y, the larger of the two
numbers. Obtain the joint distribution of X and Y and hence ρ(X, Y).
Ans. ρ(X, Y) = Cov (X, Y)/(σx σy) = (5/8)/√{(5/4) × (55/64)} = 2/√11
47. Three fair coins are tossed. Let X denote the number of heads on the first two
coins and let Y denote the number of tails on the last two coins.
(a) Find the joint distribution of X and Y.
(b) Find the conditional distribution of Y given that X = 1.
(c) Find Cov (X, Y).
Ans. Cov (X, Y) = −1/4.
48. For the trinomial distribution of two random variables X and Y :
f(x, y) = {n!/(x! y! (n − x − y)!)} pˣ qʸ (1 − p − q)ⁿ⁻ˣ⁻ʸ
for x, y = 0, 1, 2, ..., n and x + y ≤ n ; p > 0, q > 0 and p + q ≤ 1.
(a) Obtain the marginal distribution of Y.
(b) Obtain E(X | Y = y).
(c) Find ρ(X, Y).
Ans. (a) X ~ B(n, p), Y ~ B(n, q)
(b) (X | Y = y) ~ B(n − y, p/(1 − q))    (Note : p + q ≠ 1)
∴ E(X | Y = y) = (n − y) p/(1 − q)
(c) ρ(X, Y) = −[pq/{(1 − p)(1 − q)}]^{1/2}


OBJECTIVE TYPE QUESTIONS
I. Comment on the following :
(i) r(X, Y) = 0 ⇒ X and Y are independent.
(ii) If r(X, Y) > 0, then r(X, −Y) > 0, r(−X, Y) > 0 and r(−X, −Y) > 0.
(iii) r(X, Y) > 0 ⇒ E(XY) > E(X) E(Y).
(iv) Pearson's coefficient of correlation is independent of change of origin but not
of scale.
(v) The numerical value of the product moment correlation coefficient 'r'
between two variables X and Y cannot exceed unity.
(vi) If the correlation coefficient between the variables X and Y is zero,
then the correlation coefficient between X² and Y² is also zero.
(vii) If r > 0, then as X increases, Y also increases.
(viii) "The closeness of relationship between two variables is
proportional to r."
(ix) r measures every type of relationship between the two variables.
II. Comment on the following values of 'r' (correlation coefficient) :
1, −0·95, 0, −1·64, 0·87, 0·32, −1, 2·4.
III. (i) If ρ(X, Y) = −0·9, then for large values of X, what sort of values do we
expect for Y ?
(ii) If ρ(X, Y) = 0, what is the value of Cov (X, Y) and how are X and Y related ?
IV. Indicate the correct answer :
(i) The coefficient of correlation will have positive sign when
(a) X is increasing, Y is decreasing, (b) both X and Y are increasing,
(c) X is decreasing, Y is increasing, (d) there is no change in X and Y.
(ii) The coefficient of correlation (a) can take any value between −1 and +1,
(b) is always less than −1, (c) is always more than +1, (d) cannot be
zero.
(iii) The coefficient of correlation (a) cannot be positive, (b) cannot be
negative, (c) is always positive, (d) can be both positive as well as
negative.
(iv) The probable error of r is
(a) 0·6745 (1 − r²)/√n, (b) 0·6754 (1 − r²)/√n,
(c) 0·6547 (1 − r²)/√n, (d) 0·6754 (1 − r²)/n.
(v) The coefficient of correlation between X and Y is 0·6. Their covariance
is 4·8. The variance of X is 9. Then the S.D. of Y is
(a) 4·8/(3 × 0·6), (b) 0·6/(4·8 × 3), (c) 3/(4·8 × 0·6), (d) 4·8/(9 × 0·6).

(vi) The coefficient of correlation is independent of (a) change of scale only,
(b) change of origin only, (c) both change of scale and origin,
(d) neither change of scale nor change of origin.
V. Fill in the blanks :
(i) The Karl Pearson coefficient of correlation between variables X and Y
is ......
(ii) Two independent variables are ......
(iii) Limits for the correlation coefficient are ......
(iv) If r be the correlation coefficient between the random variables X and Y,
then the variance of X + Y is ......
(v) The absolute value of the product moment correlation coefficient is less
than ......
(vi) The correlation coefficient is invariant under changes of ...... and ......
VI. How can you use a scatter diagram to obtain an idea of the extent and nature
(direction) of the correlation coefficient ?
10·4. Calculation of the Correlation Coefficient for a Bivari-
ate Frequency Distribution. When the data are considerably large, they
may be summarised by using a two-way table. Here, for each variable a suitable
number of classes is taken, keeping in view the same considerations as in the
univariate case. If there are n classes for X and m classes for Y, there will be in
all m × n cells in the two-way table. By going through the pairs of values of X
and Y, we can find the frequency for each cell. The whole set of cell frequencies
will then define a bivariate frequency distribution. The column totals and row
totals will give us the marginal distributions of X and Y. A particular column
or row will be called the conditional distribution of Y for given X or of X for
given Y respectively.
Suppose that the bivariate data on X and Y are presented in a two-way
correlation table (shown on page 10·33) where the n classes of X are placed
along a horizontal line, the m classes of Y along a vertical line, and f(x, y) is the
frequency of individuals lying in the cell determined by the pair (x, y).
Then Σ_x f(x, y) = g(y)
is the sum of the frequencies along any row and
Σ_y f(x, y) = f(x)
is the sum of the frequencies along any column. We observe that
Σ_y Σ_x f(x, y) = Σ_x Σ_y f(x, y) = Σ_x f(x) = Σ_y g(y) = N
Thus x̄ = (1/N) Σ_x x f(x) = (1/N) Σ_x Σ_y x f(x, y)
Similarly ȳ = (1/N) Σ_y y g(y) = (1/N) Σ_y Σ_x y f(x, y)
and σx² = (1/N) Σ_x x² f(x) − x̄², σy² = (1/N) Σ_y y² g(y) − ȳ²
BIVARIATE FREQUENCY TABLE (CORRELATION TABLE)

                      X Series →
                   Classes (mid-points)              Total of frequencies
Y Series ↓         x₁   x₂  ...  xᵢ  ...  xₙ         of Y : g(y)
(mid-points)
y₁
y₂
...
yⱼ                          f(x, y)                  g(y) = Σ_x f(x, y)
...
yₘ
Total of
frequencies of X   f(x) = Σ_y f(x, y)                N = Σ_x Σ_y f(x, y)
Example 10·14. The following table gives, according to age, the
frequency of marks obtained by 100 students in an intelligence test.

Ages in years →   18    19    20    21    Total
Marks ↓
10-20              4     2     2     —      8
20-30              5     4     6     4     19
30-40              6     8    10    11     35
40-50              4     4     6     8     22
50-60              —     2     4     4     10
60-70              —     2     3     1      6
Total             19    22    31    28    100

Calculate the correlation coefficient.

Solution.
CORRELATION TABLE

                  u →   −1     0     1     2
                  x →   18    19    20    21
 v     y   Marks              f(u, v)           g(v)  v g(v)  v² g(v)  Σ_u uv f(u, v)
−2    15   10-20         4     2     2     —      8    −16      32           4
−1    25   20-30         5     4     6     4     19    −19      19          −9
 0    35   30-40         6     8    10    11     35      0       0           0
 1    45   40-50         4     4     6     8     22     22      22          18
 2    55   50-60         —     2     4     4     10     20      40          24
 3    65   60-70         —     2     3     1      6     18      54          15
Total f(u)              19    22    31    28    100     25     167          52
u f(u)                 −19     0    31    56     68
u² f(u)                 19     0    31   112    162
Σ_v uv f(u, v)           9     0    13    30     52

Let u = x − 19, v = (y − 35)/10. Then
ū = (1/N) Σ u f(u) = 68/100 = 0·68 ; v̄ = (1/N) Σ v g(v) = 25/100 = 0·25
Cov (u, v) = (1/N) Σ Σ uv f(u, v) − ū v̄ = 52/100 − 0·68 × 0·25 = 0·35
σᵤ² = (1/N) Σ u² f(u) − ū² = 162/100 − (0·68)² = 1·1576
σᵥ² = (1/N) Σ v² g(v) − v̄² = 167/100 − (0·25)² = 1·6075
r(U, V) = Cov (U, V)/(σᵤ σᵥ) = 0·35/√(1·1576 × 1·6075) = 0·2566
Since the correlation coefficient is independent of change of origin and scale,
r(X, Y) = r(U, V) = 0·2566
Remark. The entries in the last column of the table are the row totals, and
those in the last row the column totals, of the product terms uv f(u, v).
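As a cross-check of this kind of computation (an illustrative sketch, not part of the original text), Pearson's r can be computed directly from the cell frequencies of Example 10·14, once with the raw values (x, y) and once with the coded values (u, v); the two results agree, illustrating invariance under change of origin and scale.

```python
ages = [18, 19, 20, 21]            # x values (columns)
marks = [15, 25, 35, 45, 55, 65]   # class mid-points y (rows)
freq = [                           # cell frequencies f(x, y), one row per y
    [4, 2, 2, 0],
    [5, 4, 6, 4],
    [6, 8, 10, 11],
    [4, 4, 6, 8],
    [0, 2, 4, 4],
    [0, 2, 3, 1],
]

def pearson(xs, ys, f):
    # Pearson r for a bivariate frequency table
    cells = [(i, j) for i in range(len(ys)) for j in range(len(xs))]
    N = sum(f[i][j] for i, j in cells)
    sx = sum(f[i][j] * xs[j] for i, j in cells)
    sy = sum(f[i][j] * ys[i] for i, j in cells)
    sxx = sum(f[i][j] * xs[j] ** 2 for i, j in cells)
    syy = sum(f[i][j] * ys[i] ** 2 for i, j in cells)
    sxy = sum(f[i][j] * xs[j] * ys[i] for i, j in cells)
    cov = sxy / N - (sx / N) * (sy / N)
    vx = sxx / N - (sx / N) ** 2
    vy = syy / N - (sy / N) ** 2
    return cov / (vx * vy) ** 0.5

r_xy = pearson(ages, marks, freq)
r_uv = pearson([x - 19 for x in ages], [(y - 35) / 10 for y in marks], freq)
print(round(r_xy, 4), round(r_uv, 4))   # 0.2566 0.2566
```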
Example 10·15. The joint probability distribution of X and Y is given
below :

        X →    −1     +1
 Y ↓
  0            1/8    3/8
  1            2/8    2/8

Find the correlation coefficient between X and Y.
Solution.
COMPUTATION OF MARGINAL PROBABILITIES

        X →    −1     +1     g(y)
 Y ↓
  0            1/8    3/8    4/8
  1            2/8    2/8    4/8
 p(x)          3/8    5/8     1

We have :
E(X) = Σ x p(x) = (−1) × 3/8 + 1 × 5/8 = 1/4
E(X²) = Σ x² p(x) = (−1)² × 3/8 + 1² × 5/8 = 1
Var (X) = E(X²) − [E(X)]² = 1 − 1/16 = 15/16
E(Y) = Σ y g(y) = 0 × 4/8 + 1 × 4/8 = 1/2
E(Y²) = Σ y² g(y) = 0² × 4/8 + 1² × 4/8 = 1/2
Var (Y) = E(Y²) − [E(Y)]² = 1/2 − 1/4 = 1/4
E(XY) = 0 × (−1) × 1/8 + 0 × 1 × 3/8 + 1 × (−1) × 2/8 + 1 × 1 × 2/8 = 0
Cov (X, Y) = E(XY) − E(X) E(Y) = 0 − (1/4) × (1/2) = −1/8
r(X, Y) = Cov (X, Y)/(σx σy) = (−1/8)/√{(15/16) × (1/4)} = −1/√15
        = −0·2582
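The same answer can be obtained mechanically from the joint p.m.f. (an illustrative check, not part of the original text):

```python
# Joint p.m.f. of Example 10.15: keys are (x, y) pairs
pmf = {(-1, 0): 1/8, (1, 0): 3/8, (-1, 1): 2/8, (1, 1): 2/8}

ex  = sum(p * x for (x, y), p in pmf.items())          # E(X)
ey  = sum(p * y for (x, y), p in pmf.items())          # E(Y)
exy = sum(p * x * y for (x, y), p in pmf.items())      # E(XY)
vx  = sum(p * x * x for (x, y), p in pmf.items()) - ex ** 2
vy  = sum(p * y * y for (x, y), p in pmf.items()) - ey ** 2
r = (exy - ex * ey) / (vx * vy) ** 0.5
print(round(r, 4))   # -0.2582
```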
EXERCISE 10(b)
1. Write a brief note on the correlation table.
The following are the marks obtained by 24 students in a class test of
Statistics and Mathematics :
Roll No. of Students   1  2  3  4  5  6  7  8  9 10 11 12
Marks in Statistics   15  0  1  3 16  2 18  5  4 17  6  9
Marks in Mathematics  13  1  2  7  8  9 12  9 17 16  6 18
Roll No. of Students  13 14 15 16 17 18 19 20 21 22 23 24
Marks in Statistics   14  9  8 13 10 13 11 11 12 18  9  7
Marks in Mathematics  11  3  5  4 10 11 14  7 18 15 15  3
Prepare a correlation table taking the magnitude of each class interval as
four marks and the first class interval as "equal to 0 and less than 4". Calculate
Karl Pearson's coefficient of correlation between the marks in Statistics and
marks in Mathematics from the correlation table.
Ans. 0·5544.
2. An employment bureau asked applicants their weekly wages on jobs last
held. The actual wages were obtained for 54 of them, and are recorded in the
table below; x represents reported wage, y actual wage, and the entry in the table
represents frequency. Find the correlation coefficient and comment on the
significance of the computed value. [Four-figure log tables may be used.]
15 20 25 30 35 40

40 2
35 3 5
30 4 15
25 20
-
20 3 1
15 1
,
[Calcutta Univ. B.Sc. (Maths. Hons.), 1986]
3. Calculate the correlation coefficient from the following table :

0-10 10-20 20-30 30-40


..
0-5 1 '3 2 0
5-10 1 10 8 1
10-15 10 13 10 8
15-20 5 8 10 7
20-25 0 1 5 4

4. (a) Find the correlation coefficient between age and salary of 50 workers
in a factory :

Age                        Daily pay in rupees
(in years) ↓   160-169  170-179  180-189  190-199  200-209
20-30             5        3        1       ...      ...
30-40             2        6        2        1       ...
40-50             1        2        4        2        2
50-60            ...       1        3        6        2
60-70            ...      ...       1        1        5

(b) Find the coefficient of correlation between the ages of 100 mothers and
daughters :

Age of mothers      Age of daughters in years (Y)          Total
in years (X)    5-10  10-15  15-20  20-25  25-30
15-25             6      3    ...    ...    ...       9
25-35             3     16     10    ...    ...      29
35-45            ...    10     15      7    ...      32
45-55            ...    ...     7     10      4      21
55-65            ...    ...    ...     4      5       9
Total             9     29     32     21      9     100

[Madras Univ. B.Sc. (Main Maths.), 1991]


5. Given the following frequency distribution of (X, Y) :

  X →      5     10    Total      find the frequency distribution
Y ↓                               of (U, V), where
 10       30     20     50        U = (X − 7·5)/2·5, V = (Y − 15)/5.
 20       20     30     50        What shall be the relationship
Total     50     50    100        between the correlation coefficients
                                  of (X, Y) and of (U, V) ?
6. (a) Find the coefficient of correlation between X and Y for the following
table :

  Y →     y₁     y₂    Total

 x₁      p₁₁    p₁₂      P

 x₂      p₂₁    p₂₂      Q

Total     P′     Q′      1

(b) Consider the following probability distribution :

  Y →     0      1      2
X ↓                              Calculate E(X), Var (X),
  0      0·1    0·2    0·1       Cov (X, Y) and r(X, Y).
  1      0·2    0·3    0·1

[Delhi Univ. M.A. (Eco.), 1991]


(c) Let (X, Y) have the p.m.f.
p(0, 1) = p(1, 0) = p(0, −1) = p(−1, 0) = 1/4.
Find r(X, Y). Are X and Y independent ? For what values of k are X + kY and
kX + Y uncorrelated ?
10·5. Probable Error of the Correlation Coefficient. If r is the
correlation coefficient in a sample of n pairs of observations, then its standard
error is given by S.E.(r) = (1 − r²)/√n
The probable error of the correlation coefficient is given by
P.E.(r) = 0·6745 × S.E.(r) = 0·6745 (1 − r²)/√n    ...(10·6)
Probable error is an old measure for testing the reliability of an observed
correlation coefficient. The reason for taking the factor 0·6745 is that in a
normal distribution, the range μ ± 0·6745 σ covers 50% of the total area.
According to Secrist, "The probable error of the correlation coefficient is an
amount which if added to and subtracted from the mean correlation coefficient,
produces amounts within which the chances are even that a coefficient of
correlation from a series selected at random will fall."
If r < P.E.(r), correlation is not at all significant. If r > 6 P.E.(r), it is
definitely significant. A rigorous method (t-test) of testing the significance of an
observed correlation coefficient will be discussed later in "tests of significance"
[cf. Chapter 14].
Probable error also enables us to find the limits within which the
population correlation coefficient can be expected to vary. The limits are
r ± P.E.(r).
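Formula (10·6) is easy to apply numerically. The helper below is our own illustrative sketch (not from the text), applied to the assumed sample values r = 0·8, n = 25:

```python
from math import sqrt

def probable_error(r, n):
    # P.E.(r) = 0.6745 (1 - r^2) / sqrt(n), per formula (10.6)
    return 0.6745 * (1 - r * r) / sqrt(n)

r, n = 0.8, 25
pe = probable_error(r, n)
print(round(pe, 4))        # 0.0486
print(r - pe, r + pe)      # limits for the population coefficient
print(r > 6 * pe)          # True: r is definitely significant
```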
10·6. Rank Correlation. Let us suppose that a group of n individuals
is arranged in order of merit or proficiency in possession of two characteristics A
and B. These ranks in the two characteristics will, in general, be different. For
example, if we consider the relation between intelligence and beauty, it is not
necessary that a beautiful individual is intelligent also. Let (xᵢ, yᵢ); i = 1, 2, ...,
n be the ranks of the ith individual in the two characteristics A and B respectively.
The Pearsonian coefficient of correlation between the xᵢ's and yᵢ's is called the
rank correlation coefficient between A and B for that group of individuals.
Assuming that no two individuals are bracketed equal in either
classification, each of the variables X and Y takes the values 1, 2, ..., n.
Hence x̄ = ȳ = (1/n)(1 + 2 + 3 + ... + n) = (n + 1)/2
σx² = (1/n) Σᵢ₌₁ⁿ xᵢ² − x̄² = (1/n)(1² + 2² + ... + n²) − {(n + 1)/2}²
    = (n + 1)(2n + 1)/6 − (n + 1)²/4 = (n² − 1)/12
and similarly σy² = (n² − 1)/12 = σx²
In general xᵢ ≠ yᵢ. Let dᵢ = xᵢ − yᵢ
Then dᵢ = (xᵢ − x̄) − (yᵢ − ȳ)    (∵ x̄ = ȳ)
Squaring and summing over i from 1 to n, we get
Σ dᵢ² = Σ {(xᵢ − x̄) − (yᵢ − ȳ)}²
     = Σ (xᵢ − x̄)² + Σ (yᵢ − ȳ)² − 2 Σ (xᵢ − x̄)(yᵢ − ȳ)
Dividing both sides by n, we get
(1/n) Σ dᵢ² = σx² + σy² − 2 Cov (X, Y) = σx² + σy² − 2ρ σxσy,
where ρ is the rank correlation coefficient between A and B.
∴ (1/n) Σ dᵢ² = 2σx² − 2ρσx²  ⇒  1 − ρ = Σ dᵢ²/(2nσx²)
⇒  ρ = 1 − Σᵢ₌₁ⁿ dᵢ²/(2nσx²) = 1 − 6 Σᵢ₌₁ⁿ dᵢ²/{n(n² − 1)}    ...(10·7)
which is the Spearman's formula for the rank correlation coefficient.
Remark. We always have
Σ dᵢ = Σ (xᵢ − yᵢ) = Σ xᵢ − Σ yᵢ = n(x̄ − ȳ) = 0    (∵ x̄ = ȳ)
This serves as a check on the calculations.
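For untied ranks, Spearman's formula (10·7) gives exactly the Pearson coefficient computed on the ranks. A quick numerical confirmation (illustrative, not part of the original text), with an arbitrary assumed ranking:

```python
def spearman(xr, yr):
    # Spearman's formula (10.7): rho = 1 - 6*sum(d^2)/(n(n^2 - 1))
    n = len(xr)
    d2 = sum((a - b) ** 2 for a, b in zip(xr, yr))
    return 1 - 6 * d2 / (n * (n * n - 1))

def pearson(xs, ys):
    # Ordinary product-moment correlation coefficient
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    vx = sum((a - mx) ** 2 for a in xs) / n
    vy = sum((b - my) ** 2 for b in ys) / n
    return cov / (vx * vy) ** 0.5

xr = [1, 2, 3, 4, 5, 6]        # ranks in characteristic A
yr = [3, 1, 4, 2, 6, 5]        # ranks in characteristic B (no ties)
print(spearman(xr, yr), pearson(xr, yr))   # the two values agree
```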

10·6·1. Tied Ranks. If some of the individuals receive the same rank in a
ranking of merit, they are said to be tied. Let us suppose that m of the
individuals, say, the (k + 1)th, (k + 2)th, ..., (k + m)th, are tied. Then each of these m
individuals is assigned a common rank, which is the arithmetic mean of the
ranks k + 1, k + 2, ..., k + m.
Derivation of ρ(X, Y). We have :
ρ(X, Y) = Σ xy/(n σx σy)    ...(*)
where x = X − X̄, y = Y − Ȳ.
If X and Y each takes the values 1, 2, ..., n, then we have
X̄ = (n + 1)/2 = Ȳ
and n σx² = n(n² − 1)/12 = n σy²    ...(**)
Also d² = (X − Y)² = {(X − X̄) − (Y − Ȳ)}² = (x − y)² = x² + y² − 2xy
⇒ xy = ½ [x² + y² − d²]    ...(***)
We shall now investigate the effect of common ranking (in case of ties) on
the sum of squares of the ranks. Let S² and S₁² denote the sum of the squares of
untied and tied ranks respectively.
Then we have :
S² = (k + 1)² + (k + 2)² + ... + (k + m)²
   = mk² + (1² + 2² + ... + m²) + 2k(1 + 2 + ... + m)
   = mk² + m(m + 1)(2m + 1)/6 + mk(m + 1)
S₁² = m (Average rank)²
    = m [{(k + 1) + (k + 2) + ... + (k + m)}/m]²
    = m {k + (m + 1)/2}² = mk² + m(m + 1)²/4 + mk(m + 1)
∴ S² − S₁² = m(m + 1){2(2m + 1) − 3(m + 1)}/12 = m(m² − 1)/12
Thus the effect of tying m individuals (ranks) is to reduce the sum of the
squares by m(m² − 1)/12, though the mean value of the ranks remains the same,
viz., (n + 1)/2.
Suppose that there are s such sets of ranks to be tied in the X-series so that
the total reduction in the sum of squares due to them is
(1/12) Σᵢ₌₁ˢ mᵢ(mᵢ² − 1) = (1/12) Σᵢ₌₁ˢ (mᵢ³ − mᵢ) = T_X, (say)    ...(10·7a)
Similarly, suppose that there are t such sets of ranks to be tied with respect
to the other series Y so that the reduction in the sum of squares due to them is
(1/12) Σⱼ₌₁ᵗ mⱼ′(mⱼ′² − 1) = (1/12) Σⱼ₌₁ᵗ (mⱼ′³ − mⱼ′) = T_Y, (say)    ...(10·7b)
Thus, in the case of ties, the new sums of squares are given by :
n Var′(X) = Σ x² − T_X = n(n² − 1)/12 − T_X
n Var′(Y) = Σ y² − T_Y = n(n² − 1)/12 − T_Y
and n Cov′(X, Y) = ½ [Σ x² − T_X + Σ y² − T_Y − Σ d²]    [From (***)]
= ½ [n(n² − 1)/12 − T_X + n(n² − 1)/12 − T_Y − Σ d²]
= n(n² − 1)/12 − ½ [T_X + T_Y + Σ d²]
∴ ρ(X, Y) = [n(n² − 1)/12 − ½(T_X + T_Y + Σ d²)] /
    {[n(n² − 1)/12 − T_X]^{1/2} [n(n² − 1)/12 − T_Y]^{1/2}}
  = [n(n² − 1)/6 − (Σ d² + T_X + T_Y)] /
    {[n(n² − 1)/6 − 2T_X]^{1/2} [n(n² − 1)/6 − 2T_Y]^{1/2}}    ...(10·7c)
where T_X and T_Y are given by (10·7a) and (10·7b).
Remark. If we adjust only the covariance term, i.e., Σ xy, and not the
variances σx² (or Σ x²) and σy² (or Σ y²) for ties, then the formula (10·7c)
reduces to :
ρ(X, Y) = 1 − (Σ d² + T_X + T_Y)/{n(n² − 1)/6}
        = 1 − 6 [Σ d² + T_X + T_Y]/{n(n² − 1)}    ...(10·7d)
a formula which is commonly used in practice for numerical problems. For
illustration, see Example 10·18.
Example 10·16. The ranks of some 16 students in Mathematics and Physics are
as follows. The two numbers within brackets denote the ranks of the same
student in Mathematics and Physics.
(1,1) (2,10) (3,3) (4,4) (5,5) (6,7) (7,2) (8,6) (9,8)
(10,11) (11,15) (12,9) (13,14) (14,12) (15,16) (16,13).
Fundamentals of Mathematical Statistics
Calculate the rank correlation coefficient for proficiencies of this group in
Mathematics and Physics.
Solution.

Ranks in Maths (X)   : 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16 | Total
Ranks in Physics (Y) : 1 10  3  4  5  7  2  6  8  11  15   9  14  12  16  13 |
d = X − Y            : 0 −8  0  0  0 −1  5  2  1  −1  −4   3  −1   2  −1   3 |   0
d²                   : 0 64  0  0  0  1 25  4  1   1  16   9   1   4   1   9 | 136
The rank correlation coefficient is given by

    ρ = 1 − 6Σd²/{n(n² − 1)} = 1 − (6 × 136)/(16 × 255) = 1 − 1/5 = 4/5 = 0·8
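The arithmetic above can be replayed in a few lines of Python (an illustrative sketch of my own, not from the text):

```python
# Spearman's rank correlation for the Example 10.16 rank pairs.
pairs = [(1, 1), (2, 10), (3, 3), (4, 4), (5, 5), (6, 7), (7, 2), (8, 6),
         (9, 8), (10, 11), (11, 15), (12, 9), (13, 14), (14, 12), (15, 16),
         (16, 13)]

n = len(pairs)
sum_d2 = sum((x - y) ** 2 for x, y in pairs)   # sum of squared rank differences
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))      # Spearman's formula (10.7)
print(sum_d2, rho)                             # 136 0.8
```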
Example 10·17. Ten competitors in a musical test were ranked by the three
judges A, B and C in the following order:
Ranks by A: 1 6 5 10 3 2 4 9 7 8
Ranks by B : 3 5 8 4 7 10 2 1 6 9
Ranks by C : 6 4 9 8 1 2 3 10 5 7
Using rank correlation method. discuss which pair ofjudges has the nearest
approach to common likings in music.
Solution. Here n = 10.

Ranks by  Ranks by  Ranks by   d₁ =   d₂ =   d₃ =
  A (X)     B (Y)     C (Z)    X−Y    X−Z    Y−Z    d₁²   d₂²   d₃²
    1         3         6       −2     −5     −3      4    25     9
    6         5         4        1      2      1      1     4     1
    5         8         9       −3     −4     −1      9    16     1
   10         4         8        6      2     −4     36     4    16
    3         7         1       −4      2      6     16     4    36
    2        10         2       −8      0      8     64     0    64
    4         2         3        2      1     −1      4     1     1
    9         1        10        8     −1     −9     64     1    81
    7         6         5        1      2      1      1     4     1
    8         9         7       −1      1      2      1     1     4
Total:  Σd₁ = 0                              Σd₁² = 200  Σd₂² = 60  Σd₃² = 214

ρ(X, Y) = 1 − 6Σd₁²/{n(n² − 1)} = 1 − (6 × 200)/(10 × 99) = 1 − 40/33 = −7/33
ρ(X, Z) = 1 − 6Σd₂²/{n(n² − 1)} = 1 − (6 × 60)/(10 × 99) = 1 − 4/11 = 7/11
ρ(Y, Z) = 1 − 6Σd₃²/{n(n² − 1)} = 1 − (6 × 214)/(10 × 99) = 1 − 214/165 = −49/165
Since ρ(X, Z) is maximum, we conclude that the pair of judges A and C has the
nearest approach to common likings in music.
10·6·2. Repeated Ranks (Continued). If any two or more individuals are
bracketed equal in any classification with respect to characteristics A and B,
or if there is more than one item with the same value in either series, then
Spearman's formula (10·7) for calculating the rank correlation coefficient
breaks down, since in this case each of the variables X and Y does not assume
the values 1, 2, ..., n.
In this case, common ranks are given to the repeated items. This common rank
is the average of the ranks which these items would have assumed if they were
slightly different from each other, and the next item gets the rank next to
the ranks already assumed. As a result, the following adjustment or correction
is made in the rank correlation formula [c.f. (10·7c) and (10·7d)].
In the formula we add the factor m(m² − 1)/12 to Σd², where m is the number of
times an item is repeated. This correction factor is to be added for each
repeated value in both the X-series and the Y-series.
Example 10·18. Obtain the rank correlation coefficient for the following
data:
X : 68 64 75 50 64 80 75 40 55 64
Y : 62 58 68 45 81 60 68 48 50 70
Solution.
                CALCULATIONS FOR RANK CORRELATION

 X     Y    Rank X (x)  Rank Y (y)   d = x − y     d²
68    62        4           5           −1          1
64    58        6           7           −1          1
75    68       2·5         3·5          −1          1
50    45        9          10           −1          1
64    81        6           1            5         25
80    60        1           6           −5         25
75    68       2·5         3·5          −1          1
40    48       10           9            1          1
55    50        8           8            0          0
64    70        6           2            4         16
                                    Σd = 0    Σd² = 72
In the X-series we see that the value 75 occurs 2 times. The common rank
given to these values is 2·5, which is the average of 2 and 3, the ranks which
these values would have taken if they were different. The next value, 68, then
gets the next rank, which is 4. Again we see that the value 64 occurs thrice.
The common rank given to it is 6, which is the average of 5, 6 and 7.
Similarly, in the Y-series the value 68 occurs twice and its common rank is
3·5, the average of 3 and 4. As a result of these common rankings, the formula
for ρ has to be corrected. To Σd² we add m(m² − 1)/12 for each repeated value,
where m is the number of times a value occurs. In the X-series the correction
is to be applied twice, once for the value 75 which occurs twice (m = 2) and
then for the value 64 which occurs thrice (m = 3). The total correction for
the X-series is

    2(4 − 1)/12 + 3(9 − 1)/12 = 5/2

Similarly, the correction for the Y-series is 2(4 − 1)/12 = 1/2, as the value
68 occurs twice.

Thus  ρ = 1 − 6[Σd² + 5/2 + 1/2]/{n(n² − 1)} = 1 − 6(72 + 3)/(10 × 99) = 0·545
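A minimal Python sketch (my own, not from the text) that assigns average ranks, applies the tie corrections T_X and T_Y, and reproduces ρ ≈ 0·545 from formula (10·7d):

```python
# Rank correlation with tied observations (Example 10.18 data).
X = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
Y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]

def avg_ranks(vals):
    # rank 1 for the largest value; tied values share the average rank
    order = sorted(vals, reverse=True)
    return [sum(i + 1 for i, v in enumerate(order) if v == x) / vals.count(x)
            for x in vals]

def tie_correction(vals):
    # sum of m(m^2 - 1)/12 over each group of m tied values
    return sum(vals.count(v) ** 3 - vals.count(v) for v in set(vals)) / 12

rx, ry = avg_ranks(X), avg_ranks(Y)
n = len(X)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rho = 1 - 6 * (sum_d2 + tie_correction(X) + tie_correction(Y)) / (n * (n ** 2 - 1))
print(sum_d2, rho)   # 72.0 and 6/11 = 0.5454...
```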
10·6·3. Limits for the Rank Correlation Coefficient. Spearman's rank
correlation coefficient is given by

    ρ = 1 − 6 Σᵢ₌₁ⁿ dᵢ² / {n(n² − 1)}   ...(*)

ρ is maximum if Σdᵢ² is minimum, i.e., if each of the deviations dᵢ is
minimum. But the minimum value of dᵢ is zero, in the particular case xᵢ = yᵢ,
i.e., if the ranks of the ith individual in the two characteristics are equal.
Hence the maximum value of ρ is +1, i.e., ρ ≤ 1.

ρ is minimum if Σᵢ₌₁ⁿ dᵢ² is maximum, i.e., if each of the deviations dᵢ is
maximum, which is so if the ranks of the n individuals in the two
characteristics are in opposite directions, as given below:

    x :  1     2      3    ...  n − 1   n
    y :  n   n − 1  n − 2  ...    2     1

Case I. Suppose n is odd and equal to (2m + 1). Then the values of d are:

    d : 2m, 2m − 2, 2m − 4, ..., 2, 0, −2, −4, ..., −(2m − 2), −2m

∴   Σᵢ₌₁ⁿ dᵢ² = 2 {(2m)² + (2m − 2)² + ... + 4² + 2²}
              = 8 {m² + (m − 1)² + ... + 1²} = 8m(m + 1)(2m + 1)/6

Hence  ρ = 1 − 6 Σᵢ₌₁ⁿ dᵢ² / {n(n² − 1)}
         = 1 − 8m(m + 1)(2m + 1)/[(2m + 1){(2m + 1)² − 1}]
         = 1 − 8m(m + 1)/(4m² + 4m) = 1 − 8m(m + 1)/{4m(m + 1)} = 1 − 2 = −1
Case II. Let n be even and equal to 2m, (say). Then the values of d are

    (2m − 1), (2m − 3), ..., 1, −1, −3, ..., −(2m − 3), −(2m − 1)

∴   Σdᵢ² = 2 {(2m − 1)² + (2m − 3)² + ... + 1²}
         = 2 [{(2m)² + (2m − 1)² + ... + 2² + 1²} − {(2m)² + (2m − 2)² + ... + 4² + 2²}]
         = 2 [{1² + 2² + ... + (2m)²} − 2² {m² + (m − 1)² + ... + 1²}]
         = 2 [2m(2m + 1)(4m + 1)/6 − 4m(m + 1)(2m + 1)/6]
         = (2m/3) [(2m + 1)(4m + 1) − 2(m + 1)(2m + 1)]
         = (2m/3) (2m + 1)(4m + 1 − 2m − 2)
         = (2m/3) (2m + 1)(2m − 1) = 2m(4m² − 1)/3

∴   ρ = 1 − 6 Σdᵢ² / {n(n² − 1)} = 1 − 4m(4m² − 1)/{2m(4m² − 1)} = 1 − 2 = −1
Thus the limits for the rank correlation coefficient are given by −1 ≤ ρ ≤ 1.
Aliter. For an alternative and simpler proof for obtaining the minimum value
of ρ, from (*) onwards proceed as in the Hint to Question Number 9 of
Exercise 10(c).
Remarks on Spearman's Correlation Coefficient.
1. Σd = Σx − Σy = n(x̄ − ȳ) = 0, which provides a check for numerical
calculations.
2. Since Spearman's rank correlation coefficient ρ is nothing but the
Pearsonian correlation coefficient between the ranks, it can be interpreted in
the same way as Karl Pearson's correlation coefficient.
3. Karl Pearson's correlation coefficient assumes that the parent population
from which sample observations are drawn is normal. If this assumption is
violated then we need a measure which is distribution-free (or
non-parametric). A distribution-free measure is one which does not make any
assumptions about the parameters of the population. Spearman's ρ is such a
measure (i.e., distribution-free), since no strict assumptions are made about
the form of the population from which sample observations are drawn.
4. Spearman's formula is easy to understand and apply as compared with Karl
Pearson's formula. The values obtained by the two formulae, viz., Pearsonian r
and Spearman's ρ, are generally different. The difference arises due to the
fact that when ranking is used instead of the full set of observations, there
is always some loss of information. Unless many ties exist, the coefficient of
rank correlation should be only slightly lower than the Pearsonian
coefficient.
5. Spearman's formula is the only formula to be used for finding the
correlation coefficient if we are dealing with qualitative characteristics
which cannot be measured quantitatively but can be arranged serially. It can
also be used where actual data are given. In case of extreme observations,
Spearman's formula is preferred to Pearson's formula.
6. Spearman's formula has its limitations also. It is not practicable in the
case of a bivariate frequency distribution (correlation table). For n > 30,
this formula should not be used unless the ranks are given, since in the
contrary case the calculations are quite tedious.
EXERCISE 10(c)
1. Prove that Spearman's rank correlation coefficient is given by
ρ = 1 − 6Σdᵢ²/(n³ − n), where dᵢ denotes the difference between the ranks of
the ith individual.
2. (a) Explain the difference between the product moment correlation
coefficient and the rank correlation coefficient.
(b) The rankings of ten students in two subjects A and B are as follows:
A : 3 5 8 4 7 10 2 1 6 9
B : 6 4 9 8 1 2 3 10 5 7
Find the correlation coefficient.
3. (a) Calculate the coefficient of correlation for ranks from the following
data :
(X, Y) : (5, 8), (10, 3), (6, 2), (3, 9), (19, 12), (5, 3), (6, 17), (12, 18),
(8, 22), (2, 12), (10, 17), (19, 20).
[Calicut Univ. B.Sc. (Subs. Stat.), Oct. 1991]
(b) Ten recruits were subjected to a selection test to ascertain their
suitability for a certain course of training. At the end of training they were
given a proficiency test. The marks secured by the recruits in the selection
test (X) and in the proficiency test (Y) are given below :
Serial No. : 1   2   3   4   5   6   7   8   9   10
X          : 10  15  12  17  13  16  24  14  22  20
Y          : 30  42  45  46  33  34  40  35  39  38
Calculate the product moment correlation coefficient and the rank correlation
coefficient. Why are the two coefficients different?
4. (a) The I.Q.'s of a group of 6 persons were measured, and they then sat
for a certain examination. Their I.Q.'s and examination marks were as follows:
Person      : A   B   C   D   E   F
I.Q.        : 110 100 140 120 80  90
Exam. marks : 70  60  80  60  10  20
Compute the coefficients of correlation and rank correlation. Why are the
correlation figures obtained different?
Ans. 0·882 and 0·9.
The difference arises due to the fact that when ranking is used instead of the
full set of observations, there is always some loss of information.
(b) The value of the ordinary correlation coefficient (r) for the following
data is 0·636 :
X : ·05  ·14  ·24  ·30  ·47  ·52  ·57  ·61  ·67  ·72
Y : 1·08 1·15 1·27 1·33 1·41 1·46 1·54 2·72 4·01 9·63
(i) Calculate Spearman's rank correlation (ρ) for this data.
(ii) What advantage of ρ is brought out in this example ?
4. Ten competitors in a beauty contest are ranked by three judges as follows:
                         Competitors
Judges   1   2   3   4   5   6   7   8   9   10
  A      6   5   3  10   2   4   9   7   8    1
  B      5   8   4   7  10   2   1   6   9    3
  C      4   9   8   1   2   3  10   5   7    6
Discuss which pair of judges has the nearest approach to common tastes of
beauty.
5. A sample of 12 fathers and their eldest sons gave the following data
about their heights in inches :
Father : 65 63 67 64 68 62 70 66 68 67 69 71
Son    : 68 66 68 65 69 66 68 65 71 67 68 70
Calculate the coefficient of rank correlation. (Ans. 0·722)
6. The coefficient of rank correlation between marks in Statistics and marks
in Mathematics obtained by a certain group of students is 0·8. If the sum of
the squares of the differences in ranks is given to be 33, find the number of
students in the group. (Ans. 10) [Madras Univ. B.Sc., 1990]
7. The coefficient of rank correlation of the marks obtained by 10 students
in Maths and Statistics was found to be 0·5. It was later discovered that the
difference in ranks in the two subjects obtained by one of the students was
wrongly taken as 3 instead of 7. Find the correct coefficient of rank
correlation.
Hint.  0·5 = 1 − 6Σd²/(10 × 99)  ⇒  Σd² = 82·5
Since one difference was taken as 3 instead of 7, the correct value of Σd² is
given by

    Corrected Σd² = 82·5 − (3)² + (7)² = 122·5
    Corrected ρ = 1 − (6 × 122·5)/(10 × 99) = 0·2576
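The hint's arithmetic can be replayed in Python (an illustrative sketch; the variable names are mine):

```python
# Recover sum d^2 from the reported rho, correct the wrong difference
# (3 was recorded instead of 7), and recompute rho (Question 7 hint).
n = 10
sum_d2 = (1 - 0.5) * n * (n ** 2 - 1) / 6     # 82.5, from rho = 1 - 6*sum_d2/(n^3 - n)
corrected = sum_d2 - 3 ** 2 + 7 ** 2          # 122.5
rho = 1 - 6 * corrected / (n * (n ** 2 - 1))
print(sum_d2, corrected, round(rho, 4))       # 82.5 122.5 0.2576
```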
8. If dᵢ be the difference in the ranks of the ith individual in two
different characteristics, then show that the maximum value of Σᵢ₌₁ⁿ dᵢ² is
(n³ − n)/3. Hence or otherwise, show that the rank correlation coefficient
lies between −1 and +1. [Delhi Univ. B.Sc. (Maths. Hons.), 1986]
9. Let x₁, x₂, ..., xₙ be the ranks of n individuals according to a character
A, and y₁, y₂, ..., yₙ be the ranks of the same individuals according to
another character B. Obviously (x₁, x₂, ..., xₙ) and (y₁, y₂, ..., yₙ) are
permutations of 1, 2, ..., n. It is given that xᵢ + yᵢ = 1 + n, for
i = 1, 2, ..., n. Show that the value of the rank correlation coefficient ρ
between the characters A and B is −1.
Hint. We are given xᵢ + yᵢ = n + 1  ∀ i = 1, 2, ..., n. Also xᵢ − yᵢ = dᵢ.

    2xᵢ = n + 1 + dᵢ  ⇒  dᵢ = 2xᵢ − (n + 1)

    Σᵢ₌₁ⁿ dᵢ² = Σᵢ₌₁ⁿ [4xᵢ² + (n + 1)² − 2(n + 1) · 2xᵢ]
              = 4 n(n + 1)(2n + 1)/6 + n(n + 1)² − 4(n + 1) n(n + 1)/2
              = n(n² − 1)/3

∴   ρ = 1 − 6 Σᵢ₌₁ⁿ dᵢ² / {n(n² − 1)} = 1 − 2 = −1

Remark. From Spearman's formula we note that ρ will be minimum if Σdᵢ² is
maximum, which will be so if the ranks X and Y are in opposite directions, as
given below:

    x : 1    2    ...   n
    y : n  n − 1  ...   1

This gives us xᵢ + yᵢ = n + 1, i = 1, 2, ..., n. Hence the value ρ = −1
obtained above is the minimum value of ρ.
10. Show that in a ranked bivariate distribution in which no ties occur and
in which the variables are independent,
(a) Σᵢ dᵢ² is always even, and
(b) there are not more than (n³ − n)/6 + 1 possible values of ρ.
11. Show that if X, Y be identically distributed with common probability mass
function:

    P(X = k) = 1/N, for k = 1, 2, ..., N;  N > 1,

then ρ_{X,Y}, the correlation coefficient between X and Y, is given by

    ρ_{X,Y} = 1 − 6E(X − Y)²/(N² − 1)
[Delhi Univ. B.Sc. (Maths. Hons.), 1992]
10·7. Regression. The term "regression" literally means "stepping back
towards the average". It was first used by a British biometrician, Sir Francis
Galton (1822-1911), in connection with the inheritance of stature. Galton
found that the offspring of abnormally tall or short parents tend to "regress"
or "step back" to the average population height. But the term "regression" as
now used in Statistics is only a convenient term without having any reference
to biometry.
Definition. Regression analysis is a mathematical measure of the average
relationship between two or more variables in terms of the original units of
the data.
In regression analysis there are two types of variables. The variable whose
value is influenced or is to be predicted is called the dependent variable,
and the variable which influences the values or is used for prediction is
called the independent variable. In regression analysis the independent
variable is also known as the regressor or predictor or explanatory variable,
while the dependent variable is also known as the regressed or explained
variable.
10·7·1. Lines of Regression. If the variables in a bivariate distribution
are related, we will find that the points in the scatter diagram will cluster
round some curve called the "curve of regression". If the curve is a straight
line, it is called the line of regression and there is said to be linear
regression between the variables; otherwise the regression is said to be
curvilinear.
The line of regression is the line which gives the best estimate of the value
of one variable for any specific value of the other variable. Thus the line of
regression is the line of "best fit" and is obtained by the principle of least
squares.
Let us suppose that in the bivariate distribution (xᵢ, yᵢ); i = 1, 2, ..., n,
Y is the dependent variable and X is the independent variable. Let the line of
regression of Y on X be Y = a + bX.
According to the principle of least squares, the normal equations for
estimating a and b are [c.f. (9·2a)]:
    Σᵢ₌₁ⁿ yᵢ = na + b Σᵢ₌₁ⁿ xᵢ   ...(10·8)

and

    Σᵢ₌₁ⁿ xᵢyᵢ = a Σᵢ₌₁ⁿ xᵢ + b Σᵢ₌₁ⁿ xᵢ²   ...(10·9)

From (10·8), on dividing by n, we get

    ȳ = a + bx̄   ...(10·10)

Thus the line of regression of Y on X passes through the point (x̄, ȳ). Now

    μ₁₁ = Cov(X, Y) = (1/n) Σᵢ₌₁ⁿ xᵢyᵢ − x̄ȳ   ...(10·11)

Also

    (1/n) Σᵢ₌₁ⁿ xᵢ² = σX² + x̄²   ...(10·11a)

Dividing (10·9) by n and using (10·11) and (10·11a), we get

    μ₁₁ + x̄ȳ = ax̄ + b(σX² + x̄²)   ...(10·12)

Multiplying (10·10) by x̄ and then subtracting from (10·12), we get

    μ₁₁ = bσX²  ⇒  b = μ₁₁/σX²   ...(10·13)

Since b is the slope of the line of regression of Y on X, and since the line
of regression passes through the point (x̄, ȳ), its equation is

    Y − ȳ = b(X − x̄) = (μ₁₁/σX²)(X − x̄)   ...(10·14)

    Y − ȳ = r (σY/σX)(X − x̄)   ...(10·14a)

Starting with the equation X = A + BY and proceeding similarly, or by simply
interchanging the variables X and Y in (10·14) and (10·14a), the equation of
the line of regression of X on Y becomes

    X − x̄ = (μ₁₁/σY²)(Y − ȳ)   ...(10·15)

    X − x̄ = r (σX/σY)(Y − ȳ)   ...(10·15a)

Aliter. The straight line defined by

    Y = a + bX   ...(i)

and satisfying the residual (least square) condition

    S = E[(Y − a − bX)²] = minimum,

for variations in a and b, is called the line of regression of Y on X.
The necessary and sufficient conditions for a minimum of S, subject to
variations in a and b, are:

    (i)  ∂S/∂a = 0 and ∂S/∂b = 0   ...(*)
    (ii) Δ = (∂²S/∂a²)(∂²S/∂b²) − (∂²S/∂a∂b)² > 0 and ∂²S/∂a² > 0   ...(**)

Using (*), we get

    ∂S/∂a = −2 E[Y − a − bX] = 0   ...(iii)
    ∂S/∂b = −2 E[X(Y − a − bX)] = 0   ...(iv)

⇒   E(Y) = a + bE(X)   ...(v)   and   E(XY) = aE(X) + bE(X²)   ...(vi)

Equation (v) implies that the line (i) of regression of Y on X passes through
the mean values [E(X), E(Y)].
Multiplying (v) by E(X) and subtracting from (vi), we get

    E(XY) − E(X)E(Y) = b[E(X²) − {E(X)}²]
⇒   Cov(X, Y) = bσX²  ⇒  b = Cov(X, Y)/σX² = rσY/σX   ...(vii)

Subtracting (v) from (i) and using (vii), we obtain the equation of the line
of regression of Y on X as:

    Y − E(Y) = [Cov(X, Y)/σX²][X − E(X)]  ⇒  Y − E(Y) = (rσY/σX)[X − E(X)]

Similarly, the straight line defined by X = A + BY and satisfying the residual
condition E[X − A − BY]² = minimum, is called the line of regression of X
on Y.
Remarks 1. We note that

    ∂²S/∂a² = 2 > 0,  ∂²S/∂b² = 2E(X²)  and  ∂²S/∂a∂b = 2E(X)

Substituting in (**), we have

    Δ = (∂²S/∂a²)(∂²S/∂b²) − (∂²S/∂a∂b)² = 4[E(X²) − {E(X)}²] = 4σX² > 0

Hence the solution of the least square equations (iii) and (iv), in fact,
provides a minimum of S.
2. The regression equation (10·14a) implies that the line of regression of Y
on X passes through the point (x̄, ȳ). Similarly, (10·15a) implies that the
line of regression of X on Y also passes through the point (x̄, ȳ). Hence both
the lines of regression pass through the point (x̄, ȳ). In other words, the
mean values (x̄, ȳ) can be obtained as the point of intersection of the two
regression lines.
3. Why two lines of regression? There are always two lines of regression, one
of Y on X and the other of X on Y. The line of regression of Y on X (10·14a)
is used to estimate or predict the value of Y for any given value of X, i.e.,
when Y is the dependent variable and X is an independent variable. The
estimate so obtained will be best in the sense that it will have the minimum
possible error as defined by the principle of least squares. We can also
obtain an estimate of X for any given value of Y by using equation (10·14a),
but the estimate so obtained will not be best, since (10·14a) is obtained on
minimising the sum of the squares of errors of estimates in Y and not in X.
Hence to estimate or predict X for any given value of Y, we use the
regression equation of X on Y (10·15a), which is derived on minimising the
sum of the squares of errors of estimates in X. Here X is a dependent
variable and Y is an independent variable. The two regression equations are
not reversible or interchangeable, because the basis and assumptions for
deriving these equations are quite different. The regression equation of Y on
X is obtained on minimising the sum of the squares of the errors parallel to
the Y-axis, while the regression equation of X on Y is obtained on minimising
the sum of squares of the errors parallel to the X-axis.
In the particular case of perfect correlation, positive or negative, i.e.,
r = ±1, the equation of the line of regression of Y on X becomes:

    Y − ȳ = ± (σY/σX)(X − x̄)  ⇒  (Y − ȳ)/σY = ± (X − x̄)/σX   ...(10·16)

Similarly, the equation of the line of regression of X on Y becomes:

    X − x̄ = ± (σX/σY)(Y − ȳ)  ⇒  (Y − ȳ)/σY = ± (X − x̄)/σX,

which is the same as (10·16).
Hence in the case of perfect correlation (r = ±1), both the lines of
regression coincide. Therefore, in general, we always have two lines of
regression except in the particular case of perfect correlation, when both
the lines coincide and we get only one line.
10·7·2. Regression Curves. In modern terminology, the conditional mean
E(Y | X = x) for a continuous distribution is called the regression function
of Y on X, and the graph of this function of x is known as the regression
curve of Y on X, or sometimes the regression curve for the mean of Y.
Geometrically, the regression function represents the y co-ordinate of the
centre of mass of the bivariate probability mass in the infinitesimal vertical
strip bounded by x and x + dx.
Similarly, the regression function of X on Y is E(X | Y = y), and the graph of
this function of y is called the regression curve (of the mean) of X on Y.
In case a regression curve is a straight line, the corresponding regression is
said to be linear. If one of the regressions is linear, it does not, however,
follow that the other is also linear. For illustration, see Example 10·21.
Theorem 10·4. Let (X, Y) be a two-dimensional random variable with
E(X) = X̄, E(Y) = Ȳ, V(X) = σX², V(Y) = σY², and let r = r(X, Y) be the
correlation coefficient between X and Y. If the regression of Y on X is
linear, then

    E(Y | X) = Ȳ + r (σY/σX)(X − X̄)   ...(10·16a)

Similarly, if the regression of X on Y is linear, then

    E(X | Y) = X̄ + r (σX/σY)(Y − Ȳ)   ...(10·16b)
Proof. Let the regression equation of Y on X be

    E(Y | x) = a + bx   ...(1)

But by definition,

    E(Y | x) = ∫ y f(y | x) dy = ∫ y [f(x, y)/fX(x)] dy

i.e.,

    (1/fX(x)) ∫ y f(x, y) dy = a + bx   ...(2)

Multiplying both sides of (2) by fX(x) and integrating w.r.t. x, we get

    ∫∫ y f(x, y) dy dx = a ∫ fX(x) dx + b ∫ x fX(x) dx

⇒   ∫ y fY(y) dy = a + bE(X)

i.e.,  E(Y) = a + bE(X)  ⇒  Ȳ = a + bX̄   ...(3)

Multiplying both sides of (2) by x fX(x) and integrating w.r.t. x, we get

    ∫∫ xy f(x, y) dy dx = a ∫ x fX(x) dx + b ∫ x² fX(x) dx

⇒   E(XY) = aE(X) + bE(X²)
⇒   μ₁₁ + X̄Ȳ = aX̄ + b(σX² + X̄²)   ...(4)

[since μ₁₁ = E(XY) − E(X)E(Y) = E(XY) − X̄Ȳ, and
σX² = E(X²) − {E(X)}² = E(X²) − X̄²]
Solving (3) and (4) simultaneously, we get

    b = μ₁₁/σX²  and  a = Ȳ − (μ₁₁/σX²) X̄

Substituting in (1) and simplifying, we get the required equation of the line
of regression of Y on X as

    E(Y | x) = Ȳ + (μ₁₁/σX²)(x − X̄)
⇒   E(Y | X) = Ȳ + r (σY/σX)(X − X̄)

By starting with the line E(X | y) = A + By and proceeding similarly, we
shall obtain the equation of the line of regression of X on Y as

    E(X | y) = X̄ + (μ₁₁/σY²)(y − Ȳ) = X̄ + r (σX/σY)(y − Ȳ)
Example 10·19. Given

    f(x, y) = x e^{−x(y + 1)} ;  x ≥ 0, y ≥ 0,

find the regression curve of Y on X. [B.H. Univ. M.Sc., 1989]
Solution. The marginal p.d.f. of X is given by

    f₁(x) = ∫₀^∞ f(x, y) dy = ∫₀^∞ x e^{−x(y + 1)} dy
          = x e^{−x} ∫₀^∞ e^{−xy} dy = x e^{−x} · (1/x) = e^{−x},  x ≥ 0

The conditional p.d.f. of Y on X is given by

    f(y | x) = f(x, y)/f₁(x) = x e^{−x(y + 1)}/e^{−x} = x e^{−xy},  y ≥ 0

The regression curve of Y on X is given by

    y = E(Y | X = x) = ∫₀^∞ y f(y | x) dy = ∫₀^∞ y x e^{−xy} dy = 1/x,

i.e., xy = 1, which is the equation of a rectangular hyperbola. Hence the
regression of Y on X is not linear.
Example 10·20. Obtain the regression equation of Y on X for the following
distribution:

    f(x, y) = [y/(1 + x)⁴] exp{−y/(1 + x)} ;  x ≥ 0, y ≥ 0

Solution. The marginal p.d.f. of X is given by

    f₁(x) = ∫₀^∞ f(x, y) dy = [1/(1 + x)⁴] ∫₀^∞ y e^{−y/(1 + x)} dy
          = [1/(1 + x)⁴] · Γ(2) · (1 + x)²   (using the Gamma integral)
          = 1/(1 + x)²,  x ≥ 0

The conditional p.d.f. of Y (for given X) is

    f(y | x) = f(x, y)/f₁(x) = [y/(1 + x)²] exp{−y/(1 + x)},  y ≥ 0

The regression equation of Y on X is given by

    E(Y | x) = ∫₀^∞ y f(y | x) dy = [1/(1 + x)²] ∫₀^∞ y² e^{−y/(1 + x)} dy
             = [1/(1 + x)²] · Γ(3) · (1 + x)³   (using the Gamma integral)
⇒   E(Y | x) = 2(1 + x)   [since Γ(3) = 2! = 2]

Hence the regression of Y on X is linear.
Example 10·21. Let (X, Y) have the joint p.d.f. given by

    f(x, y) = 1,  |y| < x, 0 < x < 1 ;  = 0, otherwise.

Show that the regression of Y on X is linear, but the regression of X on Y is
not linear.
Solution. |y| < x  ⇒  −x < y < x and x > |y|. The marginal p.d.f.'s f₁(·) of
X and f₂(·) of Y are given by:

    f₁(x) = ∫₋ₓˣ f(x, y) dy = ∫₋ₓˣ 1 dy = 2x,  0 < x < 1

    f₂(y) = ∫_{|y|}¹ f(x, y) dx = 1 − |y|,  −1 < y < 1

∴   f₁(x | y) = f(x, y)/f₂(y) = 1/(1 − |y|),  |y| < x < 1,

i.e.,  f₁(x | y) = 1/(1 − y) for 0 ≤ y < 1, and 1/(1 + y) for −1 < y < 0.

    f₂(y | x) = f(x, y)/f₁(x) = 1/(2x),  0 < x < 1, |y| < x

    E(Y | X = x) = ∫₋ₓˣ y f₂(y | x) dy = (1/2x) ∫₋ₓˣ y dy = 0

Hence the curve of regression of Y on X is y = 0, which is a straight line.

    E(X | Y = y) = ∫_{|y|}¹ x f₁(x | y) dx = (1 − y²)/{2(1 − |y|)} = (1 + |y|)/2

Hence the curve of regression of X on Y is

    x = (1 + y)/2, 0 ≤ y < 1  and  x = (1 − y)/2, −1 < y < 0,

which is not a straight line.
Example 10·22. Variables X and Y have the joint p.d.f.

    f(x, y) = (1/3)(x + y),  0 ≤ x ≤ 1, 0 ≤ y ≤ 2.

Find:
(i) r(X, Y),
(ii) the two lines of regression,
(iii) the two regression curves for the means.
Solution. The marginal p.d.f.'s of X and Y are given by:

    f₁(x) = ∫₀² f(x, y) dy = (1/3) ∫₀² (x + y) dy = (2/3)(1 + x),  0 ≤ x ≤ 1   ...(1)

    f₂(y) = ∫₀¹ f(x, y) dx = (1/3) ∫₀¹ (x + y) dx = (1/3)(½ + y),  0 ≤ y ≤ 2   ...(2)
The conditional distributions are given by:

    f₃(y | x) = f(x, y)/f₁(x) = (1/2)(x + y)/(1 + x)
    f₄(x | y) = f(x, y)/f₂(y) = 2(x + y)/(1 + 2y)   ...(3)

    E(Y | x) = ∫₀² y f₃(y | x) dy = [1/{2(1 + x)}] ∫₀² y(x + y) dy
             = [1/{2(1 + x)}] (2x + 8/3) = (3x + 4)/{3(x + 1)}

Similarly, we shall get

    E(X | y) = ∫₀¹ x f₄(x | y) dx = [2/(1 + 2y)] ∫₀¹ (x² + xy) dx
             = (2 + 3y)/{3(1 + 2y)}

(iii) Hence the regression curves for the means are:

    y = E(Y | x) = (3x + 4)/{3(x + 1)}  and  x = E(X | y) = (2 + 3y)/{3(1 + 2y)}
From the marginal distributions we shall get

    E(X) = ∫₀¹ x f₁(x) dx = 5/9,  E(X²) = ∫₀¹ x² f₁(x) dx = 7/18
    Var(X) = σX² = 7/18 − 25/81 = 13/162

Similarly, E(Y) = 11/9, Var(Y) = σY² = 23/81, and Cov(X, Y) = −1/81.

(i)  r(X, Y) = Cov(X, Y)/(σX σY) = (−1/81)/√{(13/162)(23/81)} = −√(2/299)

(ii) The two lines of regression are:

    Y − E(Y) = [Cov(X, Y)/Var(X)][X − E(X)],  i.e.,  Y − 11/9 = −(2/13)(X − 5/9)
and
    X − E(X) = [Cov(X, Y)/Var(Y)][Y − E(Y)],  i.e.,  X − 5/9 = −(1/23)(Y − 11/9)
10·7·3. Regression Coefficients. 'b', the slope of the line of regression
of Y on X, is also called the coefficient of regression of Y on X. It
represents the increment in the value of the dependent variable Y
corresponding to a unit change in the value of the independent variable X.
More precisely, we write

    b_YX = regression coefficient of Y on X = Cov(X, Y)/σX² = r σY/σX   ...(10·17)

Similarly, the coefficient of regression of X on Y indicates the change in
the value of variable X corresponding to a unit change in the value of
variable Y, and is given by

    b_XY = regression coefficient of X on Y = Cov(X, Y)/σY² = r σX/σY   ...(10·17a)
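A short Python sketch (illustrative data of my own, not from the text) computing b_YX and b_XY from (10·17) and (10·17a), and checking the identity r² = b_YX · b_XY proved in the next article:

```python
# Regression coefficients b_YX = Cov/Var(X) and b_XY = Cov/Var(Y).
X = [1, 2, 3, 4, 5, 6]
Y = [2, 4, 5, 4, 6, 7]

n = len(X)
mx, my = sum(X) / n, sum(Y) / n
cov = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / n
vx = sum((x - mx) ** 2 for x in X) / n
vy = sum((y - my) ** 2 for y in Y) / n

b_yx, b_xy = cov / vx, cov / vy           # (10.17) and (10.17a)
r = cov / (vx * vy) ** 0.5
assert abs(r * r - b_yx * b_xy) < 1e-12   # r is the geometric mean of b_YX, b_XY
print(b_yx, b_xy, r)
```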

10·7·4. Properties of Regression Coefficients.
(a) The correlation coefficient is the geometric mean between the regression
coefficients.
Proof. Multiplying (10·17) and (10·17a), we get

    b_YX × b_XY = (r σY/σX) × (r σX/σY) = r²  ⇒  r = ±√(b_XY × b_YX)   ...(10·18)
Remark. We have

    r = μ₁₁/(σX σY),  b_YX = μ₁₁/σX²  and  b_XY = μ₁₁/σY²

It may be noted that the sign of the correlation coefficient is the same as
that of the regression coefficients, since the sign of each depends upon the
covariance term μ₁₁. Thus if the regression coefficients are positive, r is
positive, and if the regression coefficients are negative, r is negative. In
(10·18), the sign to be taken before the square root is that of the
regression coefficients.
(b) If one of the regression coefficients is greater than unity, the other
must be less than unity.
Proof. Let one of the regression coefficients, say b_YX, be greater than
unity; then we have to show that b_XY < 1.
Now  b_YX > 1  ⇒  1/b_YX < 1   ...(*)
Also r² ≤ 1  ⇒  b_YX · b_XY ≤ 1. Hence

    b_XY ≤ 1/b_YX < 1   [from (*)]

(c) The arithmetic mean of the regression coefficients is greater than the
correlation coefficient r, provided r > 0.
Proof. We have to prove that ½(b_YX + b_XY) ≥ r, i.e.,

    ½ (r σY/σX + r σX/σY) ≥ r  ⇔  σY/σX + σX/σY ≥ 2   (since r > 0)
    ⇔  σX² + σY² − 2σX σY ≥ 0,  i.e.,  (σY − σX)² ≥ 0,

which is always true, since the square of a real quantity is ≥ 0.
(d) Regression coefficients are independent of the change of origin but not
of scale.
Proof. Let U = (X − a)/h, V = (Y − b)/k, i.e., X = a + hU, Y = b + kV, where
a, b, h (> 0) and k (> 0) are constants. Then

    Cov(X, Y) = hk Cov(U, V),  σX² = h²σU²  and  σY² = k²σV²

∴   b_YX = Cov(X, Y)/σX² = hk Cov(U, V)/(h²σU²) = (k/h) · Cov(U, V)/σU²
         = (k/h) b_VU

Similarly, we can prove that b_XY = (h/k) b_UV.
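Property (d) can be checked numerically. A sketch with made-up data (the constants a, b, h, k below are arbitrary choices of mine):

```python
# b_YX is unchanged by a shift of origin, and scales by k/h under
# U = (X - a)/h, V = (Y - b)/k.
X = [2.0, 4.0, 5.0, 7.0, 9.0]
Y = [1.0, 3.0, 4.0, 8.0, 9.0]

def b_yx(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    return cov / vx

a, b, h, k = 3.0, 5.0, 2.0, 4.0
U = [(x - a) / h for x in X]
V = [(y - b) / k for y in Y]

# a change of origin alone (h = k = 1) leaves b_YX unchanged ...
assert abs(b_yx([x - a for x in X], [y - b for y in Y]) - b_yx(X, Y)) < 1e-12
# ... but a change of scale gives b_YX = (k/h) * b_VU
assert abs(b_yx(X, Y) - (k / h) * b_yx(U, V)) < 1e-12
print("regression coefficient: origin-free, scale-dependent")
```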
10·7·5. Angle Between Two Lines of Regression. Equations of the lines of
regression of Y on X, and of X on Y, are

    Y − ȳ = r (σY/σX)(X − x̄)  and  X − x̄ = r (σX/σY)(Y − ȳ)

The slopes of these lines are r σY/σX and σY/(r σX) respectively. If θ is the
angle between the two lines of regression, then

    tan θ = {σY/(r σX) − r σY/σX} / {1 + (r σY/σX) · σY/(r σX)}
          = [(1 − r²)/r] · [σX σY/(σX² + σY²)]   (since r² ≤ 1)

∴   θ = tan⁻¹ { [(1 − r²)/r] · σX σY/(σX² + σY²) }   ...(10·19)

Case (i) (r = 0). If r = 0, tan θ = ∞, i.e., θ = π/2. Thus if the two
variables are uncorrelated, the lines of regression become perpendicular to
each other.
Case (ii) (r = ±1). If r = ±1, tan θ = 0, i.e., θ = 0 or π.
In this case the two lines of regression either coincide or they are parallel
to each other. But since both the lines of regression pass through the point
(x̄, ȳ), they cannot be parallel. Hence in the case of perfect correlation,
positive or negative, the two lines of regression coincide.
Remarks 1. Whenever two lines intersect, there are two angles between them,
one acute and the other obtuse. Further, tan θ > 0 if 0 < θ < π/2, i.e., θ is
an acute angle, and tan θ < 0 if π/2 < θ < π, i.e., θ is an obtuse angle.
Since 0 < r² < 1, the acute angle (θ₁) and the obtuse angle (θ₂) between the
two lines of regression are given by

    θ₁ = acute angle = tan⁻¹ { [(1 − r²)/r] · σX σY/(σX² + σY²) },  r > 0

    θ₂ = obtuse angle = tan⁻¹ { [(r² − 1)/r] · σX σY/(σX² + σY²) },  r > 0
2. When r = 0, i.e., the variables X and Y are uncorrelated, the lines of
regression of Y on X and of X on Y are given respectively by [from (10·14a)
and (10·15a)]:

    Y = ȳ  and  X = x̄

Hence in this case (r = 0) the lines of regression are perpendicular to each
other and are parallel to the X-axis and the Y-axis respectively.
3. The fact that if r = 0 (variables uncorrelated) the two lines of
regression are perpendicular to each other, while if r = ±1, θ = 0, i.e., the
two lines coincide, leads us to the conclusion that for a higher degree of
correlation between the variables the angle between the lines is smaller,
i.e., the two lines of regression are nearer to each other. On the other
hand, if the lines of regression make a larger angle, they indicate a poor
degree of correlation between the variables, and ultimately for θ = π/2,
r = 0, i.e., the lines become perpendicular if no correlation exists between
the variables. Thus by plotting the lines of regression on a graph paper, we
can have an approximate idea about the degree of correlation between the two
variables under study.
[Diagrams: two lines coincide (r = −1), (r = +1); two lines apart (low degree
of correlation); two lines apart (high degree of correlation).]
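Formula (10·19) and the two limiting cases can be sketched in Python (an illustrative helper of my own; the function name is not from the text):

```python
import math

# Acute angle between the two regression lines, from (10.19):
# tan(theta) = ((1 - r^2)/|r|) * (sx*sy / (sx^2 + sy^2)).
def angle_between_regression_lines(r, sx, sy):
    if r == 0:
        return math.pi / 2          # uncorrelated: lines are perpendicular
    t = (1 - r * r) / abs(r) * (sx * sy) / (sx ** 2 + sy ** 2)
    return math.atan(t)

print(angle_between_regression_lines(1.0, 2.0, 3.0))   # 0.0 (lines coincide)
print(angle_between_regression_lines(0.0, 2.0, 3.0))   # pi/2
print(angle_between_regression_lines(0.5, 1.0, 1.0))   # atan(0.75)
```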

10·7·6. Standard Error of Estimate or Residual Variance. The equation of the line of regression of Y on X is

        Y = Ȳ + r (σY/σX)(X − X̄)
⇒      (Y − Ȳ)/σY = r (X − X̄)/σX

The residual variance sY² is the expected value of the squares of the deviations of the observed values of Y from the expected values as given by the line of regression of Y on X. Thus

        sY² = E[Y − {Ȳ + r (σY/σX)(X − X̄)}]²
            = σY² E[(Y − Ȳ)/σY − r (X − X̄)/σX]² = σY² E(Y* − rX*)²,

where X* and Y* are standardised variates, so that

        E(X*²) = 1 = E(Y*²)  and  E(X*Y*) = r.

∴      sY² = σY² [E(Y*²) + r² E(X*²) − 2r E(X*Y*)] = σY² (1 − r²)
⇒      sY = σY (1 − r²)^(1/2)

Similarly, the standard error of estimate of X is given by

        sX = σX (1 − r²)^(1/2)

Remarks 1. Since sX² ≥ 0 and sY² ≥ 0, it follows that

        (1 − r²) ≥ 0  ⇒  |r| ≤ 1  ⇒  −1 ≤ r(X, Y) ≤ 1.

2. If r = ±1, sX = sY = 0, so that each deviation is zero, and the two lines of regression are coincident.

3. Since sX and sY → 0 as r² → 1, the departure of the value of r² from unity indicates the departure of the relationship between the variables X and Y from linearity.

4. From the definition of linear regression, the minimisation condition implies that sY² (or sX²) is the minimum variance.
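The identity sY = σY (1 − r²)^(1/2) is easy to check numerically. The sketch below (helper name is mine) uses the data of Exercise 10(d), Q.6(a) of this chapter and confirms that the formula agrees with the root-mean-square residual about the fitted line of Y on X.

```python
import math

def moments(xs, ys):
    """Population-style moments (divide by n), matching the text's sigmas."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return mx, my, sx, sy, cov / (sx * sy)

# Data of Exercise 10(d), Q.6(a); here r = 0.95 exactly
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [9, 8, 10, 12, 11, 13, 14, 16, 15]
mx, my, sx, sy, r = moments(xs, ys)

# s_Y = sigma_Y * sqrt(1 - r^2) equals the RMS residual about the line of Y on X
b = r * sy / sx
s_y = sy * math.sqrt(1 - r * r)
rms = math.sqrt(sum((y - (my + b * (x - mx))) ** 2
                    for x, y in zip(xs, ys)) / len(xs))
print(round(s_y, 6), round(rms, 6))   # -> 0.806226 0.806226
```

The equality of the two printed values is exactly Remark 4: among all straight lines, the least-squares line attains the minimum residual variance σY²(1 − r²).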
10·7·7. Correlation Coefficient between Observed and Estimated Values. Here we shall find the correlation coefficient between Y and

        Ŷ = Ȳ + r (σY/σX)(X − X̄),

where Ŷ is the estimated value of Y as given by the line of regression of Y on X. The required coefficient is

        r(Y, Ŷ) = Cov(Y, Ŷ) / (σY σŶ)

We have

        Ŷ − E(Ŷ) = r (σY/σX)[X − E(X)]
⇒      σŶ² = r² (σY²/σX²) E[X − E(X)]² = r² σY²  ⇒  σŶ = r σY

Also

        Cov(Y, Ŷ) = E[{Y − E(Y)}{Ŷ − E(Ŷ)}]
                  = E[{Y − E(Y)} · r (σY/σX){X − E(X)}]
                  = r (σY/σX) E[{X − E(X)}{Y − E(Y)}]
                  = r (σY/σX) · r σX σY = r² σY²

∴      r(Y, Ŷ) = r² σY² / (σY · r σY) = r = r(X, Y)

Hence the correlation coefficient between the observed and estimated values of Y is the same as the correlation coefficient between X and Y.
Example 10·23. Obtain the equations of the lines of regression for the data in Example 10·1. Also obtain the estimate of X for Y = 70.

Solution. Let U = X − 68 and V = Y − 69. Then

        Ū = 0, V̄ = 0, σU² = 4·5, σV² = 5·5, Cov(U, V) = 3 and r(U, V) = 0·6

Since the correlation coefficient is independent of change of origin, we get

        r = r(X, Y) = r(U, V) = 0·6

We know that if U = (X − a)/h and V = (Y − b)/k, then

        X = a + hU, Y = b + kV, σX = h σU and σY = k σV

In our case h = k = 1, a = 68 and b = 69. Thus

        X̄ = 68 + 0 = 68,  Ȳ = 69 + 0 = 69
        σX = σU = √4·5 = 2·12  and  σY = σV = √5·5 = 2·35

Equation of the line of regression of Y on X:

        Y − Ȳ = r (σY/σX)(X − X̄)
i.e.,   Y = 69 + 0·6 × (2·35/2·12)(X − 68)  ⇒  Y = 0·665X + 23·78

Equation of the line of regression of X on Y:

        X − X̄ = r (σX/σY)(Y − Ȳ)
⇒      X = 68 + 0·6 × (2·12/2·35)(Y − 69),  i.e.,  X = 0·54Y + 30·74

To estimate X for a given Y, we use the line of regression of X on Y. If Y = 70, the estimated value of X is given by

        X̂ = 0·54 × 70 + 30·74 = 68·54,

where X̂ is the estimate of X.
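The arithmetic of this example can be reproduced directly from the summary figures; the helper below is my own sketch of the textbook recipe (the small differences from the printed coefficients come from the book rounding the standard deviations to two decimals).

```python
import math

def regression_lines(mx, my, sx, sy, r):
    """Slopes and intercepts of both regression lines from means, s.d.'s and r."""
    b_yx = r * sy / sx                      # slope of line of Y on X
    b_xy = r * sx / sy                      # slope of line of X on Y
    return (b_yx, my - b_yx * mx), (b_xy, mx - b_xy * my)

# Summary figures of Example 10.23
(b_yx, a_y), (b_xy, a_x) = regression_lines(68, 69,
                                            math.sqrt(4.5), math.sqrt(5.5), 0.6)
x_hat = b_xy * 70 + a_x                     # estimate of X when Y = 70
print(round(b_yx, 3), round(b_xy, 3), round(x_hat, 2))   # -> 0.663 0.543 68.54
```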
Example 10·24. In a partially destroyed laboratory record of an analysis of correlation data, the following results only are legible:

        Variance of X = 9.
        Regression equations: 8X − 10Y + 66 = 0, 40X − 18Y = 214.

What were (i) the mean values of X and Y, (ii) the correlation coefficient between X and Y, and (iii) the standard deviation of Y?
[Punjab Univ. B.Sc. (Hons.), 1993]

Solution. (i) Since both the lines of regression pass through the point (X̄, Ȳ), we have

        8X̄ − 10Ȳ + 66 = 0  and  40X̄ − 18Ȳ = 214.

Solving, we get X̄ = 13, Ȳ = 17.

(ii) Let 8X − 10Y + 66 = 0 and 40X − 18Y = 214 be the lines of regression of Y on X and of X on Y respectively. These equations can be put in the form:

        Y = (8/10)X + 66/10  and  X = (18/40)Y + 214/40

∴      bYX = regression coefficient of Y on X = 8/10 = 4/5
        bXY = regression coefficient of X on Y = 18/40 = 9/20

Hence  r² = bYX · bXY = (4/5) × (9/20) = 9/25
∴       r = ±3/5 = ±0·6

But since both the regression coefficients are positive, we take r = +0·6.

(iii) We have  bYX = r σY/σX  ⇒  4/5 = 0·6 × σY/3   [∵ σX² = 9 (given)]
Hence σY = 4.

Remarks 1. It can be verified that the values X̄ = 13 and Ȳ = 17 obtained in part (i) satisfy both the regression equations. In numerical problems of this type, this check should invariably be applied to ascertain the correctness of the answer.

2. If we had assumed that 8X − 10Y + 66 = 0 is the equation of the line of regression of X on Y and 40X − 18Y = 214 that of Y on X, we would get respectively:

        8X = 10Y − 66  and  18Y = 40X − 214
⇒      X = (10/8)Y − 66/8  and  Y = (40/18)X − 214/18
∴      bXY = 10/8  and  bYX = 40/18
        r² = bXY · bYX = (10/8) × (40/18) = 2·78

But since r² always lies between 0 and 1, our supposition is wrong.
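The consistency check used in Remark 2, that a valid assignment of the two lines must give r² ≤ 1, can be mechanised. The sketch below (function name is mine) works with exact fractions.

```python
from fractions import Fraction as F

def identify(p1, q1, p2, q2):
    """Treat p1*X + q1*Y = c1 as the line of Y on X and p2*X + q2*Y = c2 as the
    line of X on Y; return r^2 = b_yx * b_xy if admissible (0 <= r^2 <= 1)."""
    b_yx = F(-p1, q1)       # from Y = -(p1/q1) X + const
    b_xy = F(-q2, p2)       # from X = -(q2/p2) Y + const
    r2 = b_yx * b_xy
    return r2 if 0 <= r2 <= 1 else None

# Lines of Example 10.24: 8X - 10Y = -66 and 40X - 18Y = 214
print(identify(8, -10, 40, -18))    # -> 9/25  (valid: r = 3/5)
print(identify(40, -18, 8, -10))    # -> None  (swapped assignment: r^2 = 25/9 > 1)
```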
Example 10·25. Find the most likely price in Bombay corresponding to the price of Rs. 70 at Calcutta from the following:

                              Calcutta    Bombay
        Average price            65         67
        Standard deviation      2·5        3·5

Correlation coefficient between the prices of commodities in the two cities is 0·8.
[Nagpur Univ. B.Sc., 1993; Sri Venkateswara Univ. B.Sc. (Oct.), 1990]

Solution. Let the prices (in Rupees) in Bombay and Calcutta be denoted by Y and X respectively. Then we are given

X̄ = 65, Ȳ = 67, σX = 2·5, σY = 3·5 and r = r(X, Y) = 0·8. We want Ŷ for X = 70.

The line of regression of Y on X is

        Y − Ȳ = r (σY/σX)(X − X̄)
⇒      Y = 67 + 0·8 × (3·5/2·5)(X − 65)

When X = 70,

        Ŷ = 67 + 0·8 × (3·5/2·5)(70 − 65) = 72·6
Example 10·26. Can Y = 5 + 2·8X and X = 3 − 0·5Y be the estimated regression equations of Y on X and X on Y respectively? Explain your answer with suitable theoretical arguments. [Delhi Univ. M.A. (Eco.), 1986]

Solution. The line of regression of Y on X is:
        Y = 5 + 2·8X  ⇒  bYX = 2·8    ...(*)
The line of regression of X on Y is:
        X = 3 − 0·5Y  ⇒  bXY = −0·5   ...(**)

This is not possible, since the regression coefficients bYX and bXY must have the same sign, namely that of Cov(X, Y). If Cov(X, Y) is positive, both the regression coefficients are positive, and if Cov(X, Y) is negative, both are negative. Hence (*) and (**) cannot be the estimated regression equations of Y on X and X on Y respectively.

EXERCISE 10(d)


1. (a) Explain what regression lines are. Why are there two such lines? Also derive their equations.
(b) Define (i) line of regression, (ii) regression coefficient. Find the equations of the lines of regression and show that the coefficient of correlation is the geometric mean of the coefficients of regression.
(c) What equation is the equivalent mathematical statement of the following words?
"If the respective deviations in each series, X and Y, from their means were expressed in units of standard deviations, i.e., if each were divided by the standard deviation of the series to which it belongs, and plotted to a scale of standard deviations, the slope of a straight line best describing the plotted points would be the correlation coefficient r."
2. (a) Obtain the equation of the line of regression of Y on X and show that the angle θ between the two lines of regression is given by

        tan θ = [(1 − ρ²)/ρ] × σ₁σ₂/(σ₁² + σ₂²)

where σ₁, σ₂ are the standard deviations of X and Y respectively, and ρ is the correlation coefficient. [Delhi Univ. B.Sc. (Maths Hons.), 1989]
Interpret the cases ρ = 0 and ρ = ±1. [Bangalore Univ. B.Sc., 1990]
(b) If θ is the acute angle between the two regression lines with correlation coefficient r, show that sin θ ≤ 1 − r².
3. (a) Explain the term "regression" by giving examples. Assuming that the regression of Y on X is linear, outline a method for the estimation of the coefficients in the regression line based on a random paired sample of X and Y, and show that the variance of the error of the estimate of Y from the regression line is σY²(1 − ρ²), where σY² is the variance of Y and ρ is the correlation coefficient between X and Y.
(b) Prove that X and Y are linearly related if and only if ρXY² = 1; further show that the slope of the regression line is positive or negative according as ρ = +1 or ρ = −1.
(c) Let X and Y be two variates. Define X* = (X − a)/b, Y* = (Y − c)/d for some constants a, b, c and d. Show that the regression line (least squares) of Y on X can be obtained from that of Y* on X*.
(d) Show that the coefficient of correlation between the observed and the estimated values of Y, obtained from the line of regression of Y on X, is the same as that between X and Y.
4. Two variables X and Y are known to be related to each other by the relation Y = X/(aX + b). How is the theory of linear regression to be employed to estimate the constants a and b from a set of n pairs of observations (xᵢ, yᵢ), i = 1, 2, ..., n?
Hint. 1/Y = (aX + b)/X = a + b/X.
Put 1/X = U and 1/Y = V, so that V = a + bU.
5. Derive the standard error of estimate of Y obtained from the linear regression equation of Y on X. What does this standard error measure?
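The linearisation in the hint to Q.4 can be tried out numerically; the sketch below (function name and data are mine) fits V = a + bU by least squares and recovers the constants exactly when the data follow Y = X/(aX + b) without error.

```python
def fit_reciprocal(xs, ys):
    """Fit Y = X/(aX + b) via the transformation U = 1/X, V = 1/Y, V = a + bU."""
    us = [1.0 / x for x in xs]
    vs = [1.0 / y for y in ys]
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    b = (sum((u - mu) * (v - mv) for u, v in zip(us, vs)) /
         sum((u - mu) ** 2 for u in us))
    a = mv - b * mu
    return a, b            # so that Y is approximately X / (aX + b)

# Exact data generated from a = 2, b = 3 should be recovered (up to roundoff)
a_true, b_true = 2.0, 3.0
xs = [1.0, 2.0, 4.0, 5.0, 10.0]
ys = [x / (a_true * x + b_true) for x in xs]
a, b = fit_reciprocal(xs, ys)
print(round(a, 6), round(b, 6))    # -> 2.0 3.0
```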
6. (a) Calculate the coefficient of correlation from the following data:
        X:  1  2  3  4  5  6  7  8  9
        Y:  9  8 10 12 11 13 14 16 15
Also obtain the equations of the lines of regression and obtain an estimate of Y which should correspond on the average to X = 6·2.
Ans. r = 0·95; Y − 12 = 0·95(X − 5), X − 5 = 0·95(Y − 12); Ŷ = 13·14.
(b) Why do we have, in general, two lines of regression? Obtain the regression of Y on X, and of X on Y, from the following table and estimate the blood pressure when the age is 45 years:

        Age in years (X):     56   42   72   36   63   47   55   49   38   42   68   60
        Blood pressure (Y):  147  125  160  118  149  128  150  145  115  140  152  155

Ans. Y = 1·138X + 80·778, Y = 131·988 for X = 45.
(c) Suppose the observations on X and Y are given as:
        X: 59 65 45 52 60 62 70 55 45 49
        Y: 75 70 55 65 60 69 80 65 59 61
where N = 10 students, Y = marks in Maths and X = marks in Economics.
Compute the least squares regression equations of Y on X and of X on Y. If a student gets 61 marks in Economics, what would you estimate his marks in Maths to be?
7. (a) In a correlation analysis on the ages of wives and husbands, the following data were obtained. Find
(i) the value of the correlation coefficient, and (ii) the lines of regression.
Estimate the age of the husband whose wife's age is 31 years. Estimate the age of the wife whose husband is 40 years old.

                                   Age of Wife
        Age of Husband   15–25   25–35   35–45   45–55   55–65
           15–30           30       6       3      —       —
           30–45           18      32      15      12       8
           45–60            2      28      40      16       9
           60–75           —        9      10       8
(b) The following table gives the distribution of total cultivable area (X) and area under cultivation (Y), in Bighas, in a district of 69 villages.
Calculate (i) the linear regression of Y on X, (ii) the correlation coefficient r(X, Y), and (iii) the average area under wheat corresponding to a total area of 1,000 Bighas.

[Bivariate frequency table: total area X in classes 500–1000, 1000–1500, 1500–2000, 2000–2500 Bighas against area under cultivation Y in classes 0–200, 200–400, 400–600, 600–800, 800–1000 Bighas; the cell frequencies are illegible in this copy.]

Ans. (i) Ŷ = 0·7641X − 455·3854, (ii) r(X, Y) = 0·756, (iii) Ŷ = 308·7146 for X = 1000.
8. (a) Compare and contrast the roles of correlation and regression in studying the inter-dependence of two variates.
For 10 observations on price (X) and supply (Y) the following data were obtained (in appropriate units):
        ΣX = 130, ΣY = 220, ΣX² = 2288, ΣY² = 5506 and ΣXY = 3467
Obtain the line of regression of Y on X, estimate the supply when the price is 16 units, and find out the standard error of the estimate.
Ans. Y = 8·8 + 1·015X; 25·04.
(b) If a number X is chosen at random from among the integers 1, 2, 3, 4 and a number Y is chosen from among those at least as large as X, prove that
        Cov(X, Y) = 5/8
Find also the regression line of X on Y.
(c) Calculate the correlation coefficient from the following data:
        N = 100, ΣX = 12500, ΣY = 8000,
        ΣX² = 1585000, ΣY² = 648100, ΣXY = 1007425.
Also obtain the regression of Y on X.
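Q.8(a) is a good illustration of working entirely from raw sums. The sketch below (variable names are mine) forms the corrected sums of squares and products and reproduces the printed answer; the book's 25·04 comes from substituting the already-rounded coefficients 8·8 and 1·015.

```python
n = 10
SX, SY, SXX, SYY, SXY = 130, 220, 2288, 5506, 3467

mx, my = SX / n, SY / n
Cxx = SXX - n * mx * mx        # corrected sum of squares of X
Cxy = SXY - n * mx * my        # corrected sum of products

b_yx = Cxy / Cxx               # slope of line of Y on X
a = my - b_yx * mx             # intercept
y_at_16 = a + b_yx * 16        # estimated supply at price 16
print(round(b_yx, 3), round(a, 1), round(y_at_16, 2))   # -> 1.015 8.8 25.05
```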
9. (a) The means of a bivariate frequency distribution are at (3, 4), and r = 0·4. The line of regression of Y on X is parallel to the line Y = X. Find the two lines of regression and estimate the mean of X when Y = 1.
(b) For certain data, Y = 1·2X and X = 0·6Y are the regression lines. Compute ρ(X, Y) and σX/σY. Also compute ρ(X, Z), if Z = Y − X.
(c) The equations of two lines of regression obtained in a correlation analysis are as follows:
        3X + 12Y = 19,  3Y + 9X = 46
Obtain (i) the value of the correlation coefficient, (ii) the mean values of X and Y, (iii) the ratio of the coefficient of variability of X to that of Y.
Ans. (i) r = −1/(2√3); (ii) X̄ = 5, Ȳ = 1/3.
(d) For an army of strength 25, the regression of weight of kidneys (Y) on weight of heart (X), both measured in ounces, is
        Y − 0·399X − 6·934 = 0
and the regression of weight of heart on weight of kidneys is
        X − 1·212Y + 2·461 = 0
Find the correlation coefficient between X and Y and their mean values. Can you find out the standard deviations of X and Y as well?
Ans. r(X, Y) = 0·70, X̄ ≈ 11·5, Ȳ ≈ 11·5; No.
(e) Find the coefficient of correlation for a distribution in which:
        S.D. of X = 3·0 units
        S.D. of Y = 1·4 units
        Coefficient of regression of Y on X = 0·28.
10. (a) Given that X = 4Y + 5 and Y = kX + 4 are the lines of regression of X on Y and of Y on X respectively, show that 0 < 4k < 1. If k = 1/16, find the means of the two variables and the coefficient of correlation between them.
[Punjab Univ. B.Sc. (Hons.), 1989]
Hint. X = 4Y + 5 ⇒ bXY = 4
        Y = kX + 4 ⇒ bYX = k
∴      r² = 4k    ...(*)
But 0 ≤ r² ≤ 1 ⇒ 0 ≤ 4k ≤ 1.
If k = 1/16, then from (*) we get
        r² = 1/4  ⇒  r = +1/2   [∵ both the regression coefficients are positive]
For k = 1/16, the two lines of regression become
        X = 4Y + 5  and  Y = (1/16)X + 4
Solving the two equations, we get Ȳ = 5·75, X̄ = 28.
(b) For 50 students of a class, the regression equation of marks in Statistics (X) on marks in Mathematics (Y) is 3Y − 5X + 180 = 0. The mean marks in Mathematics is 44, and the variance of marks in Statistics is 9/16th of the variance of marks in Mathematics. Find the mean marks in Statistics and the coefficient of correlation between marks in the two subjects.
[Bangalore Univ. B.Sc., 1989]
Hint. We are given n = 50, Ȳ = 44 and
        σX² = (9/16) σY²    ...(*)
The equation of the line of regression of X on Y is given to be
        3Y − 5X + 180 = 0
⇒      X = (3/5)Y + 180/5
∴      bXY = r σX/σY = 3/5  ⇒  r × (3/4) = 3/5,  or  r = 0·8   [using (*)]
Since the lines of regression pass through the point (X̄, Ȳ), we get
        X̄ = (3/5)Ȳ + 180/5 = (3/5) × 44 + 36 = 62·4
(c) Out of the two lines of regression given by
        X + 2Y − 5 = 0  and  2X + 3Y − 8 = 0,
which one is the regression line of X on Y?
Use the equations to find the mean of X and the mean of Y. If the variance of X is 12, calculate the variance of Y.
Ans. 2X + 3Y − 8 = 0 is the line of X on Y; X̄ = 1, Ȳ = 2; σY² = 4.
(d) The lines of regression in a bivariate distribution are:
        X + 9Y = 7  and  Y + 4X = 4.
Find (i) the coefficient of correlation, (ii) the ratios σX² : σY² : Cov(X, Y), (iii) the means of the distribution, and (iv) E(X | Y = 1).
(e) Estimate X when Y = 10, if the two lines of regression are
        X = −(1/16)Y + λ  and  Y = −2X + μ,
(λ, μ) being unknown, and the mean of the distribution is at (−1, 2). Also compute r, λ and μ. [Gujarat Univ. B.Sc., Oct. 199–]
11. (a) The following results were obtained in the analysis of data on yield of dry bark in ounces (Y) and age in years (X) of 200 cinchona plants:

                                  X      Y
        Average                  9·2   16·5
        Standard deviation       2·1    4·2

        Correlation coefficient = +0·84

Construct the two lines of regression and estimate the yield of dry bark of a plant of age 8 years. [Patna Univ. B.Sc., 1991]
(b) The following data pertain to the marks in subjects A and B in a certain examination:
        Mean marks in A = 39·5
        Mean marks in B = 47·5
        Standard deviation of marks in A = 10·8
        Standard deviation of marks in B = 16·8
        Coefficient of correlation between marks in A and marks in B = 0·42
Draw the two lines of regression and explain why there are two regression equations. Give the estimate of marks in B for candidates who secured 50 marks in A.
Ans. Y = 0·65X + 21·825, X = 0·27Y + 26·675 and Y = 54·342 for X = 50.

(c) You are given the following information relating to advertising expenditure and sales:

                Advertising Expenditure (X)    Sales (Y)
                        (Rs. lakhs)           (Rs. lakhs)
        Mean                10                    90
        s.d.                 3                    12

        Correlation coefficient = 0·8

What should the advertising budget be if the company wants to attain a sales target of Rs. 120 lakhs? [Delhi Univ. M.C.A., 1990]
12. Twenty-five pairs of values of the variates X and Y led to the following results:
        N = 25, ΣX = 127, ΣY = 100, ΣX² = 760, ΣY² = 449 and ΣXY = 500
A subsequent scrutiny showed that two pairs of values were copied down as:

        X    Y        instead of        X    Y
        8   14                          8   12
        8    6                          6    8

(i) Obtain the correct value of the correlation coefficient.
(ii) Hence, or otherwise, find the correct equations of the two lines of regression.
(iii) Find the angle between the regression lines.
Ans. (i) r(X, Y) = −0·31; (ii) X = −0·64Y + 7·56, Y = −0·15X + 4·75.
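The standard correction trick in Q.12, adjusting the raw sums by removing the wrongly copied pairs and adding the correct ones, can be sketched as follows (the code is my own illustration):

```python
import math

n, SX, SY, SXX, SYY, SXY = 25, 127, 100, 760, 449, 500
wrong = [(8, 14), (8, 6)]
right = [(8, 12), (6, 8)]

for (x, y) in wrong:       # remove the miscopied pairs from every sum
    SX -= x; SY -= y; SXX -= x * x; SYY -= y * y; SXY -= x * y
for (x, y) in right:       # add back the correct pairs
    SX += x; SY += y; SXX += x * x; SYY += y * y; SXY += x * y

cov = SXY / n - (SX / n) * (SY / n)
var_x = SXX / n - (SX / n) ** 2
var_y = SYY / n - (SY / n) ** 2
r = cov / math.sqrt(var_x * var_y)
b_xy = cov / var_y                      # slope of the line of X on Y
print(round(r, 2), round(b_xy, 2))      # -> -0.31 -0.64
```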
13. Suppose you have n observations
        (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ)
on two variables X and Y, and you have fitted a linear regression Y = a + bX by the method of least squares. Denote the 'expected' value of Y by Ŷ and the residual Y − Ŷ by e. Find the means and variances of Ŷ and e, and the correlation coefficients between (i) X and e, (ii) Ŷ and e, and (iii) Y and Ŷ. Use these results to bring out the significance and limitations of the correlation coefficient.
Ans. r(X, e) = 0, r(Ŷ, e) = 0 and r(Y, Ŷ) = r(X, Y).
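The identities quoted in this answer can be illustrated numerically; the simulation below (data and names are mine) fits a least-squares line and checks that the residuals are uncorrelated with X and with Ŷ, while r(Y, Ŷ) = r(X, Y).

```python
import math, random

def corr(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    c = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return c / math.sqrt(sum((a - mu) ** 2 for a in u) *
                         sum((b - mv) ** 2 for b in v))

random.seed(1)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + random.gauss(0, 3) for x in xs]

mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
     sum((x - mx) ** 2 for x in xs))
a = my - b * mx
y_hat = [a + b * x for x in xs]
e = [y - yh for y, yh in zip(ys, y_hat)]

print(abs(corr(xs, e)) < 1e-9, abs(corr(y_hat, e)) < 1e-9,
      abs(corr(ys, y_hat) - corr(xs, ys)) < 1e-9)    # -> True True True
```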
14. (a) The regression lines of Y on X and of X on Y are respectively Y = aX + b and X = cY + d. Show that:
(i) the means are X̄ = (bc + d)/(1 − ac) and Ȳ = (ad + b)/(1 − ac),
(ii) the correlation coefficient between X and Y is ±√(ac), the sign being the common sign of a and c, and
(iii) the ratio of the standard deviations of X and Y is √(c/a).
(b) For two random variables X and Y with the same mean, the two regression equations are Y = aX + b and X = αY + β. Show that b/β = (1 − a)/(1 − α). Find also the common mean. [Punjab Univ. B.Sc. (Maths Hons.), 1992]
(c) If the lines of regression of Y on X and of X on Y are respectively
a₁X + b₁Y + c₁ = 0 and a₂X + b₂Y + c₂ = 0, prove that a₁b₂ ≤ a₂b₁.
[Delhi Univ. B.Sc. (Stat. Hons.), 1989]
Hint. r² = bYX · bXY = (−a₁/b₁)(−b₂/a₂) ≤ 1  ⇒  a₁b₂/(a₂b₁) ≤ 1  ⇒  a₁b₂ ≤ a₂b₁.
15. (a) By minimising Σᵢ₌₁ⁿ fᵢ (xᵢ cos α + yᵢ sin α − p)² for variations in α and p, show that there are two straight lines passing through the mean of the distribution for which the sum of squares of normal deviations has an extreme value. Prove also that their slopes are given by

        tan 2α = 2μ₁₁/(σX² − σY²)

Hint. We have to minimise

        S = Σᵢ₌₁ⁿ fᵢ (xᵢ cos α + yᵢ sin α − p)²    ...(1)

Equating to zero the partial derivatives of (1) w.r.t. α and p, we have

        ∂S/∂α = 0 = 2 Σᵢ fᵢ (xᵢ cos α + yᵢ sin α − p)(−xᵢ sin α + yᵢ cos α)    ...(2)
        ∂S/∂p = 0 = −2 Σᵢ fᵢ (xᵢ cos α + yᵢ sin α − p)    ...(3)

Equation (3) can be written as

        x̄ cos α + ȳ sin α − p = 0    ...(4)
        [∵ Σᵢ fᵢ (xᵢ cos α + yᵢ sin α − p) = 0]

From equation (2) we get a quadratic equation, which shows that there are two straight lines for extreme values of S. From equation (4) it becomes clear that both the straight lines pass through the point (x̄, ȳ).
Again, equation (2) can be written as:

        Σᵢ fᵢ (xᵢ cos α + yᵢ sin α − p)(yᵢ cos α − xᵢ sin α) = 0
⇒      Σᵢ fᵢ [cos α (xᵢ − x̄) + sin α (yᵢ − ȳ)][yᵢ cos α − xᵢ sin α] = 0    [Using (4)]
⇒      cos²α Σᵢ fᵢ yᵢ(xᵢ − x̄) − sin α cos α Σᵢ fᵢ xᵢ(xᵢ − x̄)
            + sin α cos α Σᵢ fᵢ yᵢ(yᵢ − ȳ) − sin²α Σᵢ fᵢ xᵢ(yᵢ − ȳ) = 0    ...(5)

We have

        μ₁₁ = (1/N) Σᵢ fᵢ (xᵢ − x̄)(yᵢ − ȳ) = (1/N) Σᵢ fᵢ yᵢ(xᵢ − x̄) = (1/N) Σᵢ fᵢ xᵢ(yᵢ − ȳ)

Similarly,

        σX² = (1/N) Σᵢ fᵢ (xᵢ − x̄)² = (1/N) Σᵢ fᵢ xᵢ(xᵢ − x̄)  and  σY² = (1/N) Σᵢ fᵢ yᵢ(yᵢ − ȳ)

Substituting these values in (5), we get the required result.

(b) If the straight line defined by
        Y = a + bX
satisfies the condition E[(Y − a − bX)²] = minimum, show that the regression line of the random variable Y on the random variable X is
        Y − Ȳ = r (σY/σX)(X − X̄),  where X̄ = E(X), Ȳ = E(Y).
16. (a) Define the curve of regression of Y on X.
The joint density function of X and Y is given by:
        f(x, y) = x + y, 0 < x < 1, 0 < y < 1
               = 0, otherwise
Find (i) the correlation coefficient between X and Y, (ii) the regression curve of Y on X, and (iii) the regression curve of X on Y.
Ans. ρ(X, Y) = −1/11. [Madras Univ. B.Sc., Stat. (Main), 1992]
(b) Let f(x₁, x₂) = 2, 0 < x₁ < x₂ < 1
               = 0, elsewhere
be the joint p.d.f. of X₁ and X₂.
Find the conditional means and variances. Also show that ρ = 1/2.
17. If the joint density of X and Y is given by
        f(x, y) = (x + y)/3, for 0 < x < 1, 0 < y < 2
               = 0, otherwise
obtain the regressions (i) of Y on X, and (ii) of X on Y.
Are the regressions linear? Find the correlation coefficient between X and Y. [… Univ. B.Sc., 1993]
Ans. ȳ = E(Y | x) = (3x + 4)/(3(x + 1)); x̄ = E(X | y) = (2 + 3y)/(3(1 + 2y));
Corr(X, Y) = −(2/299)^(1/2).
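The regression curve in Q.17 can be verified by direct quadrature; the sketch below (helper names mine) computes E(Y | X = x) numerically from the density and compares it with (3x + 4)/(3(x + 1)). Simpson's rule is exact here since the integrands are polynomials of degree at most three.

```python
def simpson(g, a, b, n=1000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b) + sum((4 if k % 2 else 2) * g(a + k * h) for k in range(1, n))
    return s * h / 3

def cond_mean_Y(x):
    # E(Y | X = x) = Int y f(x,y) dy / Int f(x,y) dy over 0 < y < 2
    num = simpson(lambda y: y * (x + y) / 3, 0.0, 2.0)
    den = simpson(lambda y: (x + y) / 3, 0.0, 2.0)
    return num / den

err = max(abs(cond_mean_Y(x) - (3 * x + 4) / (3 * (x + 1)))
          for x in (0.1, 0.5, 0.9))
print(err < 1e-9)   # -> True
```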
18. Let the joint density function of X and Y be given by:
        f(x, y) = 8xy, 0 < x < y < 1
               = 0, otherwise
Find: (i) E(Y | X = x), (ii) E(XY | X = x), (iii) Var(Y | X = x).
[Delhi Univ. B.Sc. (Maths Hons.), 1988]
Ans. (i) E(Y | x) = (2/3)(1 + x + x²)/(1 + x);
(ii) E(XY | x) = x E(Y | x); (iii) E(Y² | x) = (1 + x²)/2, so that
Var(Y | x) = (1 + x²)/2 − [E(Y | x)]².
19. Give an example to show that it is possible to have the regression of Y on X constant (does not depend on X), but the regression of X on Y not constant (does depend on Y).
Hint. See § 10·2·1.
20. Prove or disprove:
        E(Y | X = x) = constant  ⇒  r(X, Y) = 0
21. If f(x, y) = … exp[−y(1 + x)], y > 0, … is the p.d.f. of (X, Y), obtain the equation of the curve of regression of Y on X.
Ans. E(Y | X = x) = 1/(1 + x).
22. The variables (X, Y) have the joint p.d.f.
        f(x, y) = 6(1 − x − y), x > 0, y > 0, x + y < 1
               = 0, otherwise.
Find fX(x), fY(y) and Cov(X, Y). Are X and Y independent? Obtain the regression curves for the means.
[Calcutta Univ. B.Sc. (Maths Hons.), 1986]
Ans. fX(x) = 3(1 − x)², 0 < x < 1; fY(y) = 3(1 − y)², 0 < y < 1.
X and Y are not independent. The regression curves for the means are:
        ȳ = E(Y | x) = (1 − x)/3;  x̄ = E(X | y) = (1 − y)/3.
23. For the joint p.d.f.
        f(x, y) = 3x² − 8xy + 6y², 0 ≤ x, y ≤ 1
find the least squares lines and the regression curves for the means.
[Calcutta Univ. B.Sc. (Maths Hons.), 1981]
Ans. Regression lines:
        y − 2/3 = −(30/67)(x − 5/12);  x − 5/12 = −(15/32)(y − 2/3)
The regression curves for the means are:
        ȳ = E(Y | x) = (9x² − 16x + 9)/(6(3x² − 4x + 2));
        x̄ = E(X | y) = (36y² − 32y + 9)/(12(6y² − 4y + 1))
24. Let (X, Y) be jointly distributed with p.d.f.
        f(x, y) = e⁻ʸ, 0 < x < y < ∞
               = 0, otherwise.
Prove that:
        E(Y | X = x) = x + 1  and  E(X | Y = y) = y/2.
Hence prove that r(X, Y) = 1/√2.
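Q.24 admits a quick simulation check (my own illustration): since f(x, y) = e⁻ʸ factors as e⁻ˣ · e⁻⁽ʸ⁻ˣ⁾ on 0 < x < y, we have X ~ Exp(1) with Y = X + Z, Z ~ Exp(1) independent of X, and the sample correlation should be near 1/√2.

```python
import math, random

random.seed(0)
n = 100_000
xs = [random.expovariate(1.0) for _ in range(n)]
ys = [x + random.expovariate(1.0) for x in xs]   # Y = X + Z, Z ~ Exp(1)

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
vx = sum((x - mx) ** 2 for x in xs) / n
vy = sum((y - my) ** 2 for y in ys) / n
r = cov / math.sqrt(vx * vy)
print(abs(r - 1 / math.sqrt(2)) < 0.02)   # -> True
```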


25. Let f(x, y) = e⁻ʸ(1 − e⁻ˣ), 0 < x < y < ∞
             = e⁻ˣ(1 − e⁻ʸ), 0 < y < x < ∞
(a) Show that f(x, y) is a p.d.f.
(b) Find the marginal distributions of X and Y.
(c) Find E(Y | X = x) for x > 0.
(d) Find P(X ≤ 2, Y ≤ 2).
(e) Find the correlation coefficient r(X, Y).
(f) Find another joint p.d.f. having the same marginals.
Ans. (b) f₁(x) = xe⁻ˣ, 0 < x < ∞; f₂(y) = ye⁻ʸ, 0 < y < ∞.
(d) 1 − 4e⁻² − e⁻⁴. (e) r(X, Y) = 1/2.
(f) Hint. f(x, y; α) = f₁(x)f₂(y)[1 + α(2F₁(x) − 1)(2F₂(y) − 1)], |α| < 1, has the same marginals f₁(x) and f₂(y).
26. Obtain the regression equation of Y on X for the distributions:
(a) f(x, y) = 9(1 + x + y)/[2(1 + x)⁴(1 + y)⁴]; x ≥ 0, y ≥ 0.
(b) f(x, y) = …(x + …); x ≥ 0, y ≥ 0.
[Madras Univ. M.Sc., 1992]
Ans. (a) Hint. See Example 5·25, page 5·55; (b) …
27. A ball is drawn at random from an urn containing three white balls numbered 0, 1, 2; two red balls numbered 0, 1; and one black ball numbered 0. If the colours white, red and black are numbered 0, 1 and 2 respectively, find the correlation coefficient between the variates X, the colour number, and Y, the number on the ball drawn. Write down the equation of the regression line of Y on X.
[Calcutta Univ. B.Sc. (Maths Hons.), 1986]

OBJECTIVE TYPE QUESTIONS

I. State, giving reasons, whether each of the following statements is true or false:
(i) Both regression lines, of Y on X and of X on Y, do not intersect at all.
(ii) In a regression problem, bYX = bXY = 1·0.
(iii) The regression coefficient of Y on X is 3·2 and that of X on Y is 0·8.
(iv) There is no relationship between the correlation coefficient and the regression coefficients.
(v) Both the regression coefficients exceed unity.
(vi) The greater the value of r, the better are the estimates obtained through regression analysis.
(vii) If X and Y are negatively correlated variables, and (0, 0) is on the least squares line of Y on X, and if X = 1 is the observed value, then the predicted value of Y must be negative.
(viii) Let the correlation between X and Y be perfect and positive. Suppose the points (3, 5) and (1, 4) are on the regression lines. With this knowledge it is possible to determine the least squares line exactly.
(ix) If the lines of regression are Y = X and X = Y + 1, then ρ = 1 and E(X | Y = 0) = 1.
(x) In a distribution, bYX = 2·8 and bXY = 0·3.
II. Fill in the blanks:
(i) The regression analysis measures ... between X and Y.
(ii) The lines of regression are ... if rXY = 0 and they are ... if rXY = ±1.
(iii) If the regression coefficients of X on Y and of Y on X are −0·4 and −0·9 respectively, then the correlation coefficient between X and Y is ...
(iv) If the two regression lines are X + 3Y − 5 = 0 and 4X + 3Y − 8 = 0, then the correlation coefficient between X and Y is ...
(v) If one of the regression coefficients is ... unity, the other must be ... unity.
(vi) The farther apart the two regression lines cut each other, the ... will be the degree of correlation.
(vii) When one regression coefficient is positive, the other would be ...
(viii) The sign of a regression coefficient is the ... as that of the correlation coefficient.
(ix) The correlation coefficient is the ... between the regression coefficients.
(x) The arithmetic mean of the regression coefficients is ... the correlation coefficient.
(xi) When the correlation coefficient is zero, the two regression lines are ... and when it is ±1, the regression lines are ...
III. Indicate the correct answer:
(i) The regression line of Y on X minimises (a) the total of the squares of the horizontal deviations, (b) the total of the squares of the vertical deviations, (c) both vertical and horizontal deviations, (d) none of these.
(ii) The regression coefficients are b₂ and b₁. Then the correlation coefficient r is (a) b₁/b₂, (b) b₂/b₁, (c) b₁b₂, (d) ±√(b₁b₂).
(iii) The farther the two regression lines cut each other, (a) the greater will be the degree of correlation, (b) the lesser will be the degree of correlation, (c) it does not matter.
(iv) If one regression coefficient is greater than unity, then the other must be (a) greater than the first one, (b) equal to unity, (c) less than unity, (d) equal to zero.
(v) When the correlation coefficient r = ±1, then the two regression lines (a) are perpendicular to each other, (b) coincide, (c) are parallel to each other, (d) do not exist.
(vi) The two lines of regression are given as X + 2Y − 5 = 0 and 2X + 3Y = 8. Then the mean values of X and Y respectively are (a) 2, 1, (b) 1, 2, (c) 2, 5, (d) 2, 3.
(vii) The tangent of the angle between the two regression lines is given as 0·6 and the s.d. of Y is known to be twice that of X. Then the value of the correlation coefficient between X and Y is (a) …, (b) …, (c) 0·7, (d) 0·3.
IV. σX and σY are the standard deviations of two correlated variables X and Y respectively in a large sample, and r is the sample correlation coefficient.
(i) State the "Standard Error of Estimate" for the linear regression of Y on X.
(ii) What is the standard error in estimating Y from X if r = 0?
(iii) By how much is this error reduced if r is increased to 0·30?
(iv) How large must r be to reduce this standard error to one-half its value for r = 0?
(v) Give your interpretations for the cases r = 0 and r = 1.
V. Explain why we have two lines of regression.
10·8. Correlation Ratio. As discussed earlier, when the variables are linearly related, we have the linear regression of one variable on the other, and the correlation coefficient can be computed to tell us about the extent of association between them. However, if the variables are not linearly related but some sort of curvilinear relationship exists between them, the use of r, which is a measure of the degree to which the relation approaches a straight-line "law", will be misleading. We might come across bivariate distributions where r may be very low or even zero but the regression may be strong, or even perfect. The correlation ratio 'η' is the appropriate measure of curvilinear relationship between the two variables. Just as r measures the concentration of points about the straight line of best fit, η measures the concentration of points about the curve of best fit. If the regression is linear, η = r; otherwise η > r (cf. Remark 2, § 10·8·1).
10·8·1. Measures of Correlation Ratio. In the previous sections we have assumed that there is a single observed value of Y corresponding to a given value xᵢ of X, but sometimes there is more than one such value of Y. Suppose that, corresponding to the values xᵢ (i = 1, 2, ..., m) of the variable X, the variable Y takes the values yᵢⱼ with respective frequencies fᵢⱼ, j = 1, 2, ..., n. Though all the x's in the ith vertical array have the same value xᵢ, the y's are different. A typical pair of values in the ith array is (xᵢ, yᵢⱼ), with frequency fᵢⱼ. Thus the first suffix i indicates the vertical array, while the second suffix j indicates the position of y in that array. Let

        Σⱼ fᵢⱼ = nᵢ,  Σᵢ nᵢ = N,  Σⱼ fᵢⱼ yᵢⱼ = Tᵢ  and  Σᵢ Tᵢ = T.

If ȳᵢ and ȳ denote the mean of the ith array and the overall mean respectively, then

        ȳᵢ = Σⱼ fᵢⱼ yᵢⱼ / nᵢ = Tᵢ/nᵢ  and  ȳ = Σᵢ Σⱼ fᵢⱼ yᵢⱼ / N = Σᵢ nᵢ ȳᵢ / Σᵢ nᵢ = T/N.

In other words, ȳ is the weighted mean of all the array means, the weights being the array frequencies.

Def. The correlation ratio of Y on X, usually denoted by ηYX, is given by

        ηYX² = 1 − σeY²/σY²    ...(10·21)

where σeY² and σY² are given by

        σeY² = (1/N) Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − ȳᵢ)²  and  σY² = (1/N) Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − ȳ)².

A convenient expression for ηYX can be obtained in terms of the standard deviation σmY of the means of the vertical arrays, each mean being weighted by the array frequency. We have

        N σY² = Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − ȳ)² = Σᵢ Σⱼ fᵢⱼ [(yᵢⱼ − ȳᵢ) + (ȳᵢ − ȳ)]²
              = Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − ȳᵢ)² + Σᵢ Σⱼ fᵢⱼ (ȳᵢ − ȳ)² + 2 Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − ȳᵢ)(ȳᵢ − ȳ)

The product term 2 Σᵢ [(ȳᵢ − ȳ) Σⱼ fᵢⱼ (yᵢⱼ − ȳᵢ)] vanishes, since Σⱼ fᵢⱼ (yᵢⱼ − ȳᵢ) = 0, being the algebraic sum of the deviations from the mean.

∴      N σY² = Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − ȳᵢ)² + Σᵢ nᵢ (ȳᵢ − ȳ)² = N σeY² + N σmY²
⇒      σY² = σeY² + σmY²
⇒      1 − σeY²/σY² = σmY²/σY²,

which on comparison with (10·21) gives

        ηYX² = σmY²/σY² = Σᵢ nᵢ (ȳᵢ − ȳ)² / Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − ȳ)²    ...(10·22)
We have

        N σmY² = Σᵢ nᵢ (ȳᵢ − ȳ)² = Σᵢ nᵢ ȳᵢ² − N ȳ² = Σᵢ (Tᵢ²/nᵢ) − T²/N

∴      ηYX² = [Σᵢ (Tᵢ²/nᵢ) − T²/N] / (N σY²)    ...(10·23)

a formula much more convenient for computational purposes.

Remarks 1. (10·21) implies that

        σeY² = σY² (1 − ηYX²)

Since σeY² and σY² are non-negative, we get

        1 − ηYX² ≥ 0  ⇒  ηYX² ≤ 1  ⇒  |ηYX| ≤ 1

2. Since the sum of squares of deviations in any array is minimum when measured from its mean, we have

        Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − ȳᵢ)² ≤ Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − Ŷᵢⱼ)²    ...(*)

where Ŷᵢⱼ is the estimate of yᵢⱼ for the given value x = xᵢ, as given by the line of regression of Y on X, i.e., Ŷᵢⱼ = a + bxᵢ (j = 1, 2, ..., n). But

        Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − ȳᵢ)² = N σeY² = N σY² (1 − ηYX²)
and    Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − a − bxᵢ)² = N σY² (1 − r²)    (cf. § 10·7·6)

∴ (*) ⇒ 1 − ηYX² ≤ 1 − r²,  i.e.,  ηYX² ≥ r²  ⇒  |ηYX| ≥ |r|

Thus the absolute value of the correlation ratio can never be less than the absolute value of r, the correlation coefficient.
When the regression of Y on X is linear, the line of means of the arrays coincides with the line of regression and ηYX² = r². Thus ηYX² − r² is a measure of the departure of the regression from linearity. It is also clear (from Remark 1) that the more nearly ηYX² approaches unity, the smaller is σeY², and therefore the closer are the points to the curve of means of the vertical arrays.
When ηYX² = 1, σeY² = 0, i.e., Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − ȳᵢ)² = 0
⇒ yᵢⱼ = ȳᵢ, ∀ j = 1, 2, ..., n, i.e., all the points lie on the curve of means. This implies that there is a functional relationship between X and Y. ηYX is, therefore, the measure of the degree to which the association between the variables approaches a functional relationship of the form Y = F(X), where F(X) is a single-valued function of X, not necessarily of the linear form F(X) = a + bX.

3. It is worth noting that the value of ηYX is not independent of the classification of the data. As the class intervals become narrower, ηYX approaches unity, since in that case σmY² gets nearer to σY². If the grouping is so fine that only one item appears in each array (related to each x-class), that item will constitute the mean of that array, and thus in this case σmY² and σY² become equal, so that ηYX = 1. On the other hand, a very coarse grouping tends to make the value of ηYX approach r. "Student" has given a formula for the correction to be made in the correlation ratio for grouping in Biometrika (Vol. IX, pages 316–320).

4. It can be easily proved that ηYX is independent of change of origin and scale of measurements.

5. ηXY², the second correlation ratio, of X on Y, depends upon the scatter of the observations about the line of means of the horizontal arrays.

6. rXY and rYX are the same, but ηYX is, in general, different from ηXY.

7. In terms of expectation, the correlation ratio is defined as follows:

        ηYX² = EX[E(Y|X) − E(Y)]² / E[Y − E(Y)]² = E[E(Y|X) − E(Y)]² / σY²
        ηXY² = EY[E(X|Y) − E(X)]² / E[X − E(X)]² = E[E(X|Y) − E(X)]² / σX²
8. We give below some diagrams, exllibiting the relationship r
and Tlrx·
(i) For completely random scattering of the dots with no trend, both r IlIl4 TI
are zero.
(Figure: completely random scatter of dots; r = 0, η_YX = 0 = η_XY.)
(ii) If the dots lie precisely on a line, r = 1 and η = 1.



(iii) If the dots lie on a curve such that no ordinate cuts it in more than one point, η_YX = 1; and if, furthermore, the dots are symmetrically placed about the Y-axis, then η_XY = 0 and r = 0.
(iv) If η_YX > r, the dots are scattered about a definitely curved trend line.

EXERCISE 10(e)
1. (a) Define the correlation coefficient and the correlation ratio. When is the latter a more suitable measure of correlation than the former? Show that the correlation ratio is never less than the correlation coefficient. What do you infer if the two are equal? Further, show that neither of these can exceed one.
[Lucknow Univ. B.Sc. (Stat. Hons.), 1988]
(b) Show that 1 ≥ η_YX² ≥ r_YX² ≥ 0.
Interpret each of the following statements:
(i) r = 0, (ii) r² = 1, (iii) η² = 1, (iv) η² = r² and (v) η = 0.
(c) When the correlation coefficient is equal to unity, show that the two correlation ratios are also equal to unity. Is the converse true?
(d) Define the correlation ratio η_XY and prove that
1 ≥ η_XY² ≥ r²,
where r is the coefficient of correlation between X and Y. Show further that (η_XY² − r²) is a measure of the non-linearity of regression.
2. For the joint p.d.f.
f(x, y) = x exp[−x(y + 1)] ; y > 0, x > 0
        = 0 , otherwise,
find:
(i) the two lines of regression,
(ii) the regression curves for the means,
(iii) r(X, Y),
(iv) η_YX² and η_XY².

[Delhi Univ. B.A. (Stat. Hons. Spl. Course), 1987]


Ans. (i) y = −(1/6)x + 2 ; x = −(1/3)y + 10/3.
(ii) ȳ_x = E(Y | x) = 1/x ; x̄_y = E(X | y) = 2/(1 + y).
(iii) r(X, Y) = −1/√3. (iv) η_YX² = 1/3, η_XY² = 1/4.
3. Find r(X, Y) and η_YX for the following data:

x :    0·5 – 1·5   1·5 – 2·5   2·5 – 3·5   3·5 – 4·5   4·5 – 5·5
f :        20          30          35          25           5
ȳ_i :     11·3        12·7        14·1        16·5        19·1

Var (Y) = 9·61
Ans. r = 0·85.
4. Compute η_XY for the following table:

  Y \ X    47   52   57   62   67
   57       4    4    2
   62       4    8    8    1
   67       7   12    1    4
   72       3    1    8    5
   77       3    5    6

10·9. Intra-class Correlation. Intra-class correlation means within-class correlation. It is distinguishable from product moment correlation in as much as here both the variables measure the same characteristic. Sometimes, especially in biological and agricultural studies, it is of interest to know how the members of a family or group are correlated among themselves with respect to some one of their common characteristics. For example, we may require the correlation between the heights of brothers of a family, or between yields of plots of an experimental block. In such cases both the variables measure the same characteristic, e.g., height and height or weight and weight. There is nothing to distinguish one from the other, so that one may be treated as the X-variable and the other as the Y-variable.
Suppose we have n families A₁, A₂, ..., Aₙ with k₁, k₂, ..., kₙ members respectively, which may be represented as

x₁₁   x₂₁   ...   x_i1   ...   x_n1
 ⋮     ⋮           ⋮            ⋮
x₁ⱼ   x₂ⱼ   ...   x_ij   ...   x_nj
 ⋮     ⋮           ⋮            ⋮
x₁k₁  x₂k₂  ...   x_ik_i ...   x_nk_n

and let x_ij (i = 1, 2, ..., n; j = 1, 2, ..., k_i) denote the measurement on the jth member in the ith family.

We shall have k_i(k_i − 1) pairs for the ith family or group, like (x_ij, x_il), j ≠ l. There will be Σᵢ k_i(k_i − 1) = N pairs for all the n families or groups. If we prepare a correlation table, there will be k_i(k_i − 1) entries for the ith group or family and Σᵢ k_i(k_i − 1) = N entries for all the n families or groups. The table is symmetrical about the principal diagonal. Such a table is called an intra-class correlation table and the correlation is called intra-class correlation.

In the bivariate table x_i1 occurs (k_i − 1) times, x_i2 occurs (k_i − 1) times, ..., x_ik_i occurs (k_i − 1) times, i.e., from the ith family we have the marginal total (k_i − 1) Σⱼ x_ij, and hence for all the n families we have Σᵢ (k_i − 1) Σⱼ x_ij as the marginal total, the table being symmetrical about the principal diagonal.

∴  x̄ = ȳ = (1/N) [ Σᵢ (k_i − 1) Σⱼ x_ij ]

Similarly,

σ_x² = σ_y² = (1/N) [ Σᵢ (k_i − 1) Σⱼ (x_ij − x̄)² ]

Further,

Cov (X, Y) = (1/N) Σᵢ Σ_{j≠l} (x_ij − x̄)(x_il − x̄)
           = (1/N) Σᵢ [ {Σⱼ (x_ij − x̄)}² − Σⱼ (x_ij − x̄)² ]
If we write x̄_i = (1/k_i) Σⱼ x_ij, then

Σᵢ [ Σⱼ Σₗ (x_ij − x̄)(x_il − x̄) ] = Σᵢ [ Σⱼ (x_ij − x̄) · Σₗ (x_il − x̄) ] = Σᵢ k_i² (x̄_i − x̄)²

Therefore the intra-class correlation coefficient is given by

r(X, Y) = Cov (X, Y) / √{V(X) V(Y)}
        = [ Σᵢ k_i² (x̄_i − x̄)² − Σᵢ Σⱼ (x_ij − x̄)² ] / [ Σᵢ (k_i − 1) Σⱼ (x_ij − x̄)² ]    ...(10·24)
If we put k_i = k, i.e., if all families have an equal number of members, then

r = [ k² Σᵢ (x̄_i − x̄)² − Σᵢ Σⱼ (x_ij − x̄)² ] / [ (k − 1) Σᵢ Σⱼ (x_ij − x̄)² ]

  = (nk²σ_m² − nkσ²) / [(k − 1) nkσ²] = (1/(k − 1)) { kσ_m²/σ² − 1 }    ...(10·24a)

where σ² denotes the variance of X and σ_m² the variance of the means of families.

Limits. We have from (10·24a),

1 + (k − 1)r = kσ_m²/σ² ≥ 0  ⇒  r ≥ −1/(k − 1)

Also 1 + (k − 1)r ≤ k, since the ratio σ_m²/σ² ≤ 1  ⇒  r ≤ 1,

so that  −1/(k − 1) ≤ r ≤ 1.
Interpretation. Intra-class correlation cannot be less than −1/(k − 1), though it may attain the value +1 on the positive side, so that it is a skew coefficient, and a negative value has not the same significance as a departure from independence as an equivalent positive value.
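For equal family sizes, (10·24a) is straightforward to compute. The sketch below (hypothetical data, not the exercise values that follow) implements it directly; the same value would be obtained by correlating all ordered within-family pairs:

```python
import numpy as np

def intraclass_r(families):
    """Intraclass correlation via (10.24a) for n families of equal size k:
    r = (k * sigma_m^2 / sigma^2 - 1) / (k - 1), where sigma^2 is the variance
    of all n*k observations and sigma_m^2 the variance of the family means."""
    families = np.asarray(families, dtype=float)
    n, k = families.shape
    sigma2 = families.var()                  # overall variance (about grand mean)
    sigma_m2 = families.mean(axis=1).var()   # variance of the family means
    return (k * sigma_m2 / sigma2 - 1.0) / (k - 1.0)

# Hypothetical measurements: 3 families, k = 4 members each
data = [[10, 11, 12, 11],
        [15, 16, 14, 15],
        [20, 19, 21, 20]]
print(intraclass_r(data))   # always lies in [-1/(k-1), 1]
```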
EXERCISE 10(f)
1. If x₁, x₂, ..., x_k be k variates with standard deviation σ and m be any number, prove that

k²σ² = (k − 1) Σ_{r=1}^{k} (x_r − m)² − Σ_{r=1}^{k} Σ_{s=1}^{k} (x_r − m)(x_s − m), r ≠ s

Hence deduce that the coefficient of intra-class correlation for n families with varying numbers of members in each family is

1 − [ Σᵢ k_i² σ_i² ] / [ σ² Σᵢ k_i (k_i − 1) ]
where k_i, σ_i denote the number of members and the standard deviation respectively in the ith family and σ² is the general variance.
Given n = 5, σ_i = i, k_i = i + 1 (i = 1, 2, ..., 5), find the least possible intra-class correlation coefficient.
2. What do you understand by the intra-class correlation coefficient? Calculate its value for the following data:
Family No.    Height of brothers
1             60  62  63  65
2             59  60  61  62
3             62  62  64  63
4             65  66  65  66
5             66  67  67  69
3. In four families each containing eight persons, the chest measurements of the persons are given below. Calculate the intra-class correlation coefficient.
Family    1   2   3   4   5   6   7   8
I        43  46  48  42  50  45  45  49
II       33  34  37  39  82  35  37  41
III      56  52  50  51  54  52  39  52
IV       34  37  38  40  40  41  44  44
10·10. Bivariate Normal Distribution. The bivariate normal distribution is a generalization of the normal distribution for a single variate. Let X and Y be two normally correlated variables with correlation coefficient ρ and E(X) = μ₁, Var (X) = σ₁²; E(Y) = μ₂, Var (Y) = σ₂². In deriving the bivariate normal distribution we make the following three assumptions.
(i) The regression of Y on X is linear. Since the mean of each array is on the line of regression Y = ρ(σ₂/σ₁)X, the mean or expected value of Y is ρ(σ₂/σ₁)x for different values x of X.
(ii) The arrays are homoscedastic, i.e., the variance in each array is the same. The common variance of the estimate of Y in each array is then given by σ₂²(1 − ρ²), ρ being the correlation coefficient between the variables X and Y, and is independent of X.
(iii) The distribution of Y in the different arrays is normal. Suppose that one of the variates, say X, is distributed normally with mean 0 and standard deviation σ₁, so that the probability that a random value of X will fall in the small interval dx is

g(x) dx = (1/(σ₁√(2π))) exp (−x²/2σ₁²) dx

The probability that a value of Y, taken at random in an assigned vertical array, will fall in the interval dy is

h(y | x) dy = 1/(σ₂√(2π(1 − ρ²))) · exp{ −[y − ρ(σ₂/σ₁)x]² / (2σ₂²(1 − ρ²)) } dy

The joint probability differential of X and Y is given by
dP(x, y) = g(x) h(y | x) dx dy
Generalizing to means μ₁, μ₂ (i.e., shifting the origin to (μ₁, μ₂)), the joint density becomes

f(x, y) = 1/(2πσ₁σ₂√(1 − ρ²)) · exp[ −1/(2(1 − ρ²)) { (x − μ₁)²/σ₁² − 2ρ(x − μ₁)(y − μ₂)/(σ₁σ₂) + (y − μ₂)²/σ₂² } ],
( −∞ < x < ∞, −∞ < y < ∞ )    ...(10·25)

where μ₁, μ₂, σ₁ (> 0), σ₂ (> 0) and ρ (−1 < ρ < 1) are the five parameters of the distribution.

(Figure: THE NORMAL CORRELATION SURFACE)

This is the density function of a bivariate normal distribution. The variables X and Y are said to be normally correlated, and the surface z = f(x, y) is known as the normal correlation surface. The nature of the normal correlation surface is indicated in the above diagram.
Remarks 1. The vector (X, Y)′ following the joint p.d.f. f(x, y) as given in (10·25) will be abbreviated as (X, Y) ~ N(μ₁, μ₂, σ₁², σ₂², ρ) or BVN(μ₁, μ₂, σ₁², σ₂², ρ). If in particular μ₁ = μ₂ = 0 and σ₁ = σ₂ = 1, then
(X, Y) ~ N(0, 0, 1, 1, ρ) or BVN(0, 0, 1, 1, ρ).
2. The surface z = f(x, y), which is the equation of a surface in three dimensions, is called the 'Normal Correlation Surface'.
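As a quick cross-check of (10·25), the density can be coded directly and compared against a library implementation (the parameter values below are illustrative; SciPy is assumed available):

```python
import numpy as np
from scipy.stats import multivariate_normal

def bvn_pdf(x, y, mu1, mu2, s1, s2, rho):
    """Bivariate normal density (10.25); s1, s2 are standard deviations."""
    q = ((x - mu1)**2 / s1**2
         - 2 * rho * (x - mu1) * (y - mu2) / (s1 * s2)
         + (y - mu2)**2 / s2**2)
    return np.exp(-q / (2 * (1 - rho**2))) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2))

mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 3.0, 0.6
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
rv = multivariate_normal([mu1, mu2], cov)
print(bvn_pdf(0.5, -1.0, mu1, mu2, s1, s2, rho), rv.pdf([0.5, -1.0]))  # identical
```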

10·10·1. Moment Generating Function of the Bivariate Normal Distribution. Let (X, Y) ~ BVN(μ₁, μ₂, σ₁², σ₂², ρ). By def.,

M_{X,Y}(t₁, t₂) = E[e^{t₁X + t₂Y}] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} exp (t₁x + t₂y) f(x, y) dx dy

Put (x − μ₁)/σ₁ = u, (y − μ₂)/σ₂ = v, −∞ < (u, v) < ∞,
i.e., x = σ₁u + μ₁, y = σ₂v + μ₂, |J| = σ₁σ₂

M_{X,Y}(t₁, t₂) = [exp (t₁μ₁ + t₂μ₂) / (2π√(1 − ρ²))]
  × ∫∫ exp[ t₁σ₁u + t₂σ₂v − (u² − 2ρuv + v²)/(2(1 − ρ²)) ] du dv

 = [exp (t₁μ₁ + t₂μ₂) / (2π√(1 − ρ²))]
  × ∫∫ exp[ −1/(2(1 − ρ²)) { (u² − 2ρuv + v²) − 2(1 − ρ²)(t₁σ₁u + t₂σ₂v) } ] du dv

We have
(u² − 2ρuv + v²) − 2(1 − ρ²)(t₁σ₁u + t₂σ₂v)
 = [(u − ρv) − (1 − ρ²)t₁σ₁]²
  + (1 − ρ²) { (v − ρt₁σ₁ − t₂σ₂)² − t₁²σ₁² − t₂²σ₂² − 2ρt₁t₂σ₁σ₂ }    ...(*)

By taking
(u − ρv) − (1 − ρ²)t₁σ₁ = w(1 − ρ²)^{1/2}  and  v − ρt₁σ₁ − t₂σ₂ = z
⇒ du dv = √(1 − ρ²) dw dz,

and using (*), we get

M_{X,Y}(t₁, t₂) = exp[ t₁μ₁ + t₂μ₂ + ½(t₁²σ₁² + t₂²σ₂² + 2ρt₁t₂σ₁σ₂) ]
  × (1/2π) ∫∫ exp[ −(w² + z²)/2 ] dw dz

 = exp[ t₁μ₁ + t₂μ₂ + ½(t₁²σ₁² + t₂²σ₂² + 2ρt₁t₂σ₁σ₂) ]    ...(10·26)

In particular, if (X, Y) ~ BVN(0, 0, 1, 1, ρ), then

M_{X,Y}(t₁, t₂) = exp[ ½(t₁² + t₂² + 2ρt₁t₂) ]    ...(10·26a)
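Formula (10·26) can be sanity-checked by Monte Carlo; the parameters and the values of t₁, t₂ below are arbitrary illustrative choices (a simulation sketch, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(0)
mu1, mu2, s1, s2, rho = 0.5, -1.0, 1.0, 2.0, 0.7
cov = [[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]]
xy = rng.multivariate_normal([mu1, mu2], cov, size=400_000)

t1, t2 = 0.3, -0.2   # keep t small so the simulation average is stable
mgf_mc = np.exp(t1 * xy[:, 0] + t2 * xy[:, 1]).mean()
mgf_formula = np.exp(t1*mu1 + t2*mu2
                     + 0.5*(t1**2 * s1**2 + t2**2 * s2**2 + 2*rho*t1*t2*s1*s2))
print(mgf_mc, mgf_formula)   # agree to a few decimals
```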
Theorem 10·5. Let (X, Y) ~ BVN(μ₁, μ₂, σ₁², σ₂², ρ). Then X and Y are independent if and only if ρ = 0.
Proof. (a) If (X, Y) ~ BVN(μ₁, μ₂, σ₁², σ₂², ρ) and ρ = 0, then X and Y are independent [cf. Remark 2(a) to Theorem 10·2, page 10·5].
Aliter. (X, Y) ~ BVN(μ₁, μ₂, σ₁², σ₂², ρ)
∴ M_{X,Y}(t₁, t₂) = exp[ t₁μ₁ + t₂μ₂ + ½(t₁²σ₁² + 2ρt₁t₂σ₁σ₂ + t₂²σ₂²) ]
If ρ = 0, then
M_{X,Y}(t₁, t₂) = exp{ t₁μ₁ + ½t₁²σ₁² } · exp{ t₂μ₂ + ½t₂²σ₂² } = M_X(t₁) · M_Y(t₂)    ...(*)
[∵ If (X, Y) ~ BVN(μ₁, μ₂, σ₁², σ₂², ρ), then the marginal p.d.f.'s of X and Y are normal, i.e., X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²).]
(*) ⇒ X and Y are independent.
(b) Conversely, if X and Y are independent, then ρ = 0 [cf. Theorem 10·2].
Theorem 10·6. (X, Y) possesses a bivariate normal distribution if and only if every linear combination of X and Y, viz., aX + bY, a ≠ 0, b ≠ 0, is a normal variate.
Proof. (a) Let (X, Y) ~ BVN(μ₁, μ₂, σ₁², σ₂², ρ); then we shall prove that aX + bY, a ≠ 0, b ≠ 0, is a normal variate.
Since (X, Y) has a bivariate normal distribution, we have
M_{X,Y}(t₁, t₂) = E(e^{t₁X + t₂Y}) = exp[ t₁μ₁ + t₂μ₂ + ½(t₁²σ₁² + 2ρt₁t₂σ₁σ₂ + t₂²σ₂²) ]    ...(*)
Then the m.g.f. of Z = aX + bY is given by:
M_Z(t) = E(e^{tZ}) = E(e^{t(aX + bY)}) = E(e^{atX + btY})
 = exp[ t(aμ₁ + bμ₂) + (t²/2)(a²σ₁² + 2ρabσ₁σ₂ + b²σ₂²) ],   [Taking t₁ = at, t₂ = bt in (*)]
which is the m.g.f. of a normal distribution with parameters
μ = aμ₁ + bμ₂ ; σ² = a²σ₁² + 2ρabσ₁σ₂ + b²σ₂².    ...(**)
Hence by the uniqueness theorem of m.g.f.'s,
Z = aX + bY ~ N(μ, σ²),
where μ and σ² are given in (**).
(b) Conversely, let Z = aX + bY, a ≠ 0, b ≠ 0, be a normal variate. Then we have to prove that (X, Y) has a bivariate normal distribution.
Let Z = aX + bY ~ N(μ, σ²),
where μ = EZ = E(aX + bY) = aμ_x + bμ_y
and σ² = Var Z = Var (aX + bY) = a²σ_x² + 2abρσ_xσ_y + b²σ_y²
∴ M_Z(t) = exp[ tμ + t²σ²/2 ]
 = exp[ t(aμ_x + bμ_y) + (t²/2)(a²σ_x² + 2abρσ_xσ_y + b²σ_y²) ]
 = exp[ t₁μ_x + t₂μ_y + ½(t₁²σ_x² + 2ρt₁t₂σ_xσ_y + t₂²σ_y²) ]    ...(***)
where t₁ = at and t₂ = bt.
But (***) is the m.g.f. of a BVN distribution with parameters (μ_x, μ_y, σ_x², σ_y², ρ). Hence by the uniqueness theorem of m.g.f.'s,
(X, Y) ~ BVN(μ_x, μ_y, σ_x², σ_y², ρ).


10·10·2. Marginal Distributions of the Bivariate Normal Distribution. The marginal distribution of the random variable X is given by

f_X(x) = ∫_{−∞}^{∞} f_{XY}(x, y) dy

Put (y − μ₂)/σ₂ = u; then dy = σ₂ du. Therefore,

f_X(x) = 1/(2πσ₁√(1 − ρ²)) ∫_{−∞}^{∞} exp[ −1/(2(1 − ρ²)) { ((x − μ₁)/σ₁)² − 2ρ((x − μ₁)/σ₁)u + u² } ] du

 = 1/(2πσ₁√(1 − ρ²)) · exp[ −(x − μ₁)²/(2σ₁²) ]
   × ∫_{−∞}^{∞} exp[ −1/(2(1 − ρ²)) { u − ρ(x − μ₁)/σ₁ }² ] du

Put [u − ρ(x − μ₁)/σ₁] / √(1 − ρ²) = t; then du = √(1 − ρ²) dt

f_X(x) = (1/(2πσ₁)) exp[ −(x − μ₁)²/(2σ₁²) ] ∫_{−∞}^{∞} exp(−t²/2) dt

 = 1/(σ₁√(2π)) · exp[ −(x − μ₁)²/(2σ₁²) ]    ...(10·27)

Similarly, we shall get

f_Y(y) = ∫_{−∞}^{∞} f_{XY}(x, y) dx = 1/(σ₂√(2π)) · exp[ −(y − μ₂)²/(2σ₂²) ]    ...(10·27a)

Hence X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²).    ...(10·27b)

Remark. We have proved that if (X, Y) ~ BVN(μ₁, μ₂, σ₁², σ₂², ρ), then the marginal p.d.f.'s of X and Y are also normal. However, the converse is not true, i.e., we may have a joint p.d.f. f(x, y) of (X, Y) which is not normal but whose marginal p.d.f.'s are still normal, as discussed in the following illustration.
Consider the joint distribution of X and Y given by:

f(x, y) = ½ [ {2π(1 − ρ²)^{1/2}}^{−1} exp{ −(x² − 2ρxy + y²)/(2(1 − ρ²)) }
      + {2π(1 − ρ²)^{1/2}}^{−1} exp{ −(x² + 2ρxy + y²)/(2(1 − ρ²)) } ]

 = ½ [ f₁(x, y) + f₂(x, y) ] ; −∞ < (x, y) < ∞    ...(10·27c)

where f₁(x, y) is the p.d.f. of the BVN(0, 0, 1, 1, ρ) distribution and f₂(x, y) is the p.d.f. of the BVN(0, 0, 1, 1, −ρ) distribution.
It can be easily verified that f(x, y) is a joint p.d.f. of (X, Y), and obviously f(x, y) is not the p.d.f. of a bivariate normal distribution.
Marginal distribution of X in (10·27c):

f_X(x) = ½ [ ∫_{−∞}^{∞} f₁(x, y) dy + ∫_{−∞}^{∞} f₂(x, y) dy ]

But ∫_{−∞}^{∞} f₁(x, y) dy is the marginal p.d.f. of X, where (X, Y) ~ BVN(0, 0, 1, 1, ρ), and is given by X ~ N(0, 1).
Similarly, ∫_{−∞}^{∞} f₂(x, y) dy is the marginal p.d.f. of X, where (X, Y) ~ BVN(0, 0, 1, 1, −ρ), and is given by X ~ N(0, 1).

∴ f_X(x) = (1/√(2π)) e^{−x²/2} ; −∞ < x < ∞    ...(i)

⇒ X ~ N(0, 1), i.e., the marginal distribution of X in (10·27c) is normal.
Similarly, we can show that the marginal p.d.f. of Y in (10·27c) is given by:

f_Y(y) = ½ [ (1/√(2π)) e^{−y²/2} + (1/√(2π)) e^{−y²/2} ] = (1/√(2π)) e^{−y²/2} ; −∞ < y < ∞    ...(ii)

⇒ Y ~ N(0, 1).
Hence, if the marginal distributions of X and Y are normal (Gaussian), it does not necessarily imply that the joint distribution of (X, Y) is bivariate normal.
For another illustration, see Question Number 17, Exercise 10(f).
We further note that for the joint p.d.f. (10·27c), on using (i) and (ii), we have
E(X) = 0, σ_X² = 1 and E(Y) = 0, σ_Y² = 1.

Cov (X, Y) = E(XY) − E(X) E(Y) = E(XY)
 = ½ [ ∫∫ xy f₁(x, y) dx dy + ∫∫ xy f₂(x, y) dx dy ]
 = ½ [ ρ + (−ρ) ] = 0,
because for f₁(x, y), (X, Y) ~ BVN(0, 0, 1, 1, ρ) and for f₂(x, y), (X, Y) ~ BVN(0, 0, 1, 1, −ρ).

∴ Corr (X, Y) = Cov (X, Y)/(σ_X σ_Y) = 0

However, we have [from (i) and (ii)]

f_X(x) · f_Y(y) = (1/2π) e^{−(x² + y²)/2} ≠ f(x, y)

⇒ X and Y are not independent.
The above example illustrates that we may have a joint density (non-Gaussian) of r.v.'s (X, Y) in which the marginal p.d.f.'s of X and Y are normal and ρ(X, Y) = 0, and yet X and Y are not independent.
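The counterexample can also be seen in simulation. Each draw below comes from one of the two BVN components of (10·27c) with equal probability (ρ = 0·8 is an arbitrary illustrative value); the sample correlation is near zero, yet X² and Y² are clearly correlated, so X and Y cannot be independent:

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n = 0.8, 200_000
# Pick the component (correlation +rho or -rho) independently for each draw,
# then sample (X, Y) from the chosen BVN(0, 0, 1, 1, +/-rho).
sign_rho = rng.choice([rho, -rho], size=n)
x = rng.standard_normal(n)
y = sign_rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

print(np.corrcoef(x, y)[0, 1])        # ~ 0   (uncorrelated)
print(np.corrcoef(x**2, y**2)[0, 1])  # clearly positive (hence dependent)
```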

10·10·3. Conditional Distributions. The conditional distribution of X for a fixed Y is given by

f_{X|Y}(x | y) = f_{XY}(x, y) / f_Y(y)

 = 1/(√(2π) σ₁√(1 − ρ²)) · exp[ −1/(2σ₁²(1 − ρ²)) { (x − μ₁) − ρ(σ₁/σ₂)(y − μ₂) }² ],

which is the probability function of a univariate normal distribution with mean and variance given by

E(X | Y = y) = μ₁ + ρ(σ₁/σ₂)(y − μ₂) ; V(X | Y = y) = σ₁²(1 − ρ²)

Hence the conditional distribution of X for fixed Y is given by:

(X | Y = y) ~ N[ μ₁ + ρ(σ₁/σ₂)(y − μ₂), σ₁²(1 − ρ²) ]    ...(10·27d)

Similarly, the conditional distribution of the random variable Y for a fixed X is

f_{Y|X}(y | x) = f_{XY}(x, y) / f_X(x)

 = 1/(√(2π) σ₂√(1 − ρ²)) · exp[ −1/(2σ₂²(1 − ρ²)) { (y − μ₂) − ρ(σ₂/σ₁)(x − μ₁) }² ],
−∞ < y < ∞

Thus the conditional distribution of Y for fixed X is given by

(Y | X = x) ~ N[ μ₂ + ρ(σ₂/σ₁)(x − μ₁), σ₂²(1 − ρ²) ]    ...(10·27e)

It is apparent from the above results that the array means are collinear, i.e., the regression equations are linear (involving linear functions of the independent variables), and the array variances are constant (i.e., free from the independent variable). We express this by saying that the regressions of Y on X and of X on Y are linear and homoscedastic.
For ρ = 0, the conditional variance V(Y | X) is equal to the marginal variance σ₂², the conditional mean E(Y | X) is equal to the marginal mean μ₂, and the two variables become independent, which is also apparent from the joint distribution function. In between the two extremes ρ = ±1, the correlation coefficient ρ provides a measure of the degree of association or interdependence between the two variables.
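The conditional laws (10·27d)–(10·27e) can be checked empirically by restricting a large sample to a narrow window of X (parameters are illustrative; the finite window makes this a sketch rather than an exact computation):

```python
import numpy as np

rng = np.random.default_rng(2)
mu1, mu2, s1, s2, rho = 5.0, 10.0, 1.0, 5.0, 0.8
cov = [[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]]
xy = rng.multivariate_normal([mu1, mu2], cov, size=500_000)

x0 = 6.0
sel = np.abs(xy[:, 0] - x0) < 0.05            # condition on X close to x0
cond_mean, cond_var = xy[sel, 1].mean(), xy[sel, 1].var()

mean_formula = mu2 + rho * (s2 / s1) * (x0 - mu1)   # eq. (10.27e): = 14 here
var_formula = s2**2 * (1 - rho**2)                  # = 9, free of x0
print(cond_mean, mean_formula, cond_var, var_formula)
```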
Example 10·27. Show that for the bivariate normal distribution

dP = const. · exp[ −(x² − 2ρxy + y²)/(2(1 − ρ²)) ] dx dy,

(i) the M.G.F. is M(t₁, t₂) = exp[ ½(t₁² + 2ρt₁t₂ + t₂²) ];
(ii) the moments obey the recurrence relation
μ_rs = (r + s − 1)ρ μ_{r−1, s−1} + (r − 1)(s − 1)(1 − ρ²) μ_{r−2, s−2}.
Hence, or otherwise, show that
μ_rs = 0 if r + s is odd, μ₃₁ = 3ρ, μ₂₂ = 1 + 2ρ².
[Delhi Univ. B.Sc. (Stat. Hons.), 1989]
Solution. (i) From the given probability function, we see that
μ₁ = 0 = μ₂ and σ₁² = 1 = σ₂².
∴ From (10·26a), we get
M(t₁, t₂) = exp[ ½(t₁² + 2ρt₁t₂ + t₂²) ]

(ii) ∂M/∂t₁ = M(t₁ + ρt₂),  ∂M/∂t₂ = M(t₂ + ρt₁)
∂²M/∂t₁∂t₂ = ∂/∂t₁ (∂M/∂t₂) = ∂/∂t₁ [ M(t₂ + ρt₁) ]
 = Mρ + (t₂ + ρt₁)(t₁ + ρt₂)M

∴ ∂²M/∂t₁∂t₂ − ρt₁ (∂M/∂t₁) − ρt₂ (∂M/∂t₂)
 = [ Mρ + (t₂ + ρt₁)(t₁ + ρt₂)M − ρt₁(t₁ + ρt₂)M − ρt₂(t₂ + ρt₁)M ]
 = M[ t₁t₂ + ρ − ρ²t₁t₂ ]   (on simplification)
 = Mρ + (1 − ρ²)M t₁t₂

⇒ ∂²M/∂t₁∂t₂ = ρt₁ (∂M/∂t₁) + ρt₂ (∂M/∂t₂) + Mρ + M(1 − ρ²)t₁t₂    ...(*)

But M = exp[ ½(t₁² + 2ρt₁t₂ + t₂²) ] = Σ_{r=0}^{∞} Σ_{s=0}^{∞} μ_rs t₁^r t₂^s / (r! s!)

∴ (*) gives

Σ_{r=1}^{∞} Σ_{s=1}^{∞} μ_rs · t₁^{r−1} t₂^{s−1} / [(r − 1)! (s − 1)!]

 = ρ Σ_{r=1}^{∞} Σ_{s=0}^{∞} μ_rs t₁^r t₂^s / [(r − 1)! s!] + ρ Σ_{r=0}^{∞} Σ_{s=1}^{∞} μ_rs t₁^r t₂^s / [r! (s − 1)!]

 + ρ Σ_{r=0}^{∞} Σ_{s=0}^{∞} μ_rs t₁^r t₂^s / (r! s!) + (1 − ρ²) Σ_{r=0}^{∞} Σ_{s=0}^{∞} μ_rs t₁^{r+1} t₂^{s+1} / (r! s!)

Equating the coefficients of t₁^{r−1} t₂^{s−1} / [(r − 1)! (s − 1)!] on both sides, we get

μ_rs = [ ρ(r − 1)μ_{r−1, s−1} + ρ(s − 1)μ_{r−1, s−1} + ρμ_{r−1, s−1} + (1 − ρ²)(r − 1)(s − 1)μ_{r−2, s−2} ]

⇒ μ_rs = (r + s − 1)ρ μ_{r−1, s−1} + (r − 1)(s − 1)(1 − ρ²) μ_{r−2, s−2}

In particular,
μ₃₁ = 3ρμ₂₀ + 0 = 3ρσ₁² = 3ρ   (∵ σ₁² = 1)
μ₂₂ = 3ρμ₁₁ + (1 − ρ²) = 3ρ² + (1 − ρ²) = 1 + 2ρ²   (∵ μ₁₁ = ρ)
Also μ₃₀ = 0 = μ₀₃,
μ₁₂ = 2ρμ₀₁ + 0 = 0   (∵ μ₀₁ = 0),
μ₂₃ = 4ρμ₁₂ + 1·2 (1 − ρ²)μ₀₁ = 0,
and similarly we will get μ₂₁ = 0, μ₃₂ = 0.
If r + s is odd, so is (r − 1) + (s − 1), (r − 2) + (s − 2), and so on.
And since μ₃₀ = 0 = μ₀₃, μ₁₂ = 0 = μ₂₁, μ₂₃ = 0 = μ₃₂, ..., we finally get

μ_rs = 0, if r + s is odd.
Example 10·28. Show that if X₁ and X₂ are standard normal variates with correlation coefficient ρ between them, then the correlation coefficient between X₁² and X₂² is given by ρ².
Solution. Since X₁ and X₂ are two standard normal variates, we have
E(X₁) = E(X₂) = 0 and V(X₁) = E(X₁²) = 1 = V(X₂) = E(X₂²)

M_{X₁,X₂}(t₁, t₂) = exp[ ½(t₁² + 2ρt₁t₂ + t₂²) ]   [cf. (10·26a)]

Now ρ(X₁², X₂²) = [ E(X₁²X₂²) − E(X₁²) E(X₂²) ] / [ √{E(X₁⁴) − (E(X₁²))²} · √{E(X₂⁴) − (E(X₂²))²} ]

where E(X₁²X₂²) = coefficient of (t₁²/2!)(t₂²/2!) in M(t₁, t₂) = 2ρ² + 1

E(X₁⁴) = coefficient of t₁⁴/4! in M(t₁, t₂) = 3

E(X₂⁴) = coefficient of t₂⁴/4! in M(t₁, t₂) = 3

∴ ρ(X₁², X₂²) = (2ρ² + 1 − 1) / (√(3 − 1) · √(3 − 1)) = ρ²
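Example 10·28's conclusion ρ(X₁², X₂²) = ρ² is easy to confirm by simulation (ρ = 0·6 is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(3)
rho, n = 0.6, 500_000
x1 = rng.standard_normal(n)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # corr(x1, x2) = rho

print(np.corrcoef(x1**2, x2**2)[0, 1], rho**2)   # both close to 0.36
```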
Example 10·29. The variables X and Y with zero means and standard deviations σ₁ and σ₂ are normally correlated with correlation coefficient ρ. Show that U and V defined as
U = X/σ₁ + Y/σ₂ and V = X/σ₁ − Y/σ₂
are independent normal variates with variances 2(1 + ρ) and 2(1 − ρ) respectively.
Solution. We are given that

dF(x, y) = 1/(2πσ₁σ₂√(1 − ρ²)) · exp[ −1/(2(1 − ρ²)) { x²/σ₁² − 2ρxy/(σ₁σ₂) + y²/σ₂² } ] dx dy,
−∞ < (x, y) < ∞

Transforming to (u, v), with x/σ₁ = (u + v)/2 and y/σ₂ = (u − v)/2, we have
x²/σ₁² − 2ρxy/(σ₁σ₂) + y²/σ₂² = ½ { (1 − ρ)u² + (1 + ρ)v² },
and the probability differential becomes

dG(u, v) = 1/(2π · 2√(1 − ρ²)) · exp[ −{(1 − ρ)u² + (1 + ρ)v²}/(4(1 − ρ²)) ] du dv

 = 1/(2π √(2(1 + ρ)) √(2(1 − ρ))) · exp[ −u²/(2·2(1 + ρ)) − v²/(2·2(1 − ρ)) ] du dv

 = [ 1/(√(2π) √(2(1 + ρ))) · exp{ −u²/(2·2(1 + ρ)) } du ]
 × [ 1/(√(2π) √(2(1 − ρ))) · exp{ −v²/(2·2(1 − ρ)) } dv ]

 = [f₁(u) du] [f₂(v) dv], (say)

where f₁(u) = 1/(√(2π) √(2(1 + ρ))) · exp{ −u²/(2·2(1 + ρ)) }

and f₂(v) = 1/(√(2π) √(2(1 − ρ))) · exp{ −v²/(2·2(1 − ρ)) }

Hence U and V are independently distributed, U as N[0, 2(1 + ρ)] and V as N[0, 2(1 − ρ)].
Aliter. Find the joint m.g.f. of U and V, viz.,
M(t₁, t₂) = E(e^{t₁U + t₂V}) = E[ e^{X(t₁ + t₂)/σ₁ + Y(t₁ − t₂)/σ₂} ]
and use E(e^{t₁X + t₂Y}) = exp[ (t₁²σ₁² + t₂²σ₂² + 2ρt₁t₂σ₁σ₂)/2 ].
Example 10·30. If X and Y are standard normal variates with coefficient of correlation ρ, show that:
(i) the regression of Y on X is linear;
(ii) X + Y and X − Y are independently distributed;
(iii) Q = (X² − 2ρXY + Y²)/(1 − ρ²) is distributed as a chi-square, i.e., as the sum of the squares of standard normal variates.
[Madras Univ. B.E., 1990]
Solution. (i) cf. § 10·10·3.
(ii) Let u = x + y and v = x − y.

dF(x, y) = 1/(2π√(1 − ρ²)) · exp[ −(x² − 2ρxy + y²)/(2(1 − ρ²)) ] dx dy

Now x = (u + v)/2, y = (u − v)/2,

J = ∂(x, y)/∂(u, v) = | ½   ½ |
                      | ½  −½ | = −½, so |J| = ½.

Since x² − 2ρxy + y² = ½ { (1 − ρ)u² + (1 + ρ)v² }, we get

dG(u, v) = C exp[ −{(1 − ρ)u² + (1 + ρ)v²}/(4(1 − ρ²)) ] du dv,

where C = 1/(4π√(1 − ρ²)).

∴ dG(u, v) = [ C₁ exp{ −u²/(4(1 + ρ)) } du ] × [ C₂ exp{ −v²/(4(1 − ρ)) } dv ]

 = [g₁(u) du] [g₂(v) dv], (say).

Hence U = X + Y and V = X − Y are independently distributed.

(iii) M_Q(t) = ∫∫ e^{tQ} dF(x, y)

 = 1/(2π√(1 − ρ²)) ∫_{−∞}^{∞} ∫_{−∞}^{∞} exp(tQ) · exp[ −{x² − 2ρxy + y²}/(2(1 − ρ²)) ] dx dy

 = 1/(2π√(1 − ρ²)) ∫_{−∞}^{∞} ∫_{−∞}^{∞} exp( tQ − Q/2 ) dx dy

 = 1/(2π√(1 − ρ²)) ∫_{−∞}^{∞} ∫_{−∞}^{∞} exp[ −(Q/2)(1 − 2t) ] dx dy

Put √(1 − 2t) x = u and √(1 − 2t) y = v

∴ dx = du/√(1 − 2t) and dy = dv/√(1 − 2t)

Also Q = (x² − 2ρxy + y²)/(1 − ρ²) = (u² − 2ρuv + v²)/[(1 − ρ²)(1 − 2t)]

∴ M_Q(t) = 1/(2π√(1 − ρ²)(1 − 2t)) ∫∫ exp[ −(u² − 2ρuv + v²)/(2(1 − ρ²)) ] du dv

 = (1/(1 − 2t)) · 1 = (1 − 2t)^{−1},

which is the m.g.f. of a chi-square (χ²) variate with n (= 2) degrees of freedom.
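Part (iii) can likewise be verified by simulating Q and comparing with the χ² distribution on 2 degrees of freedom, which is the exponential distribution with mean 2, so that P(Q > q₀) = e^{−q₀/2} (ρ = 0·5 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)
rho, n = 0.5, 400_000
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

q = (x**2 - 2*rho*x*y + y**2) / (1 - rho**2)
# chi-square with 2 d.f.: mean 2, variance 4, and P(Q > q0) = exp(-q0/2)
print(q.mean(), q.var(), (q > 2.0).mean(), np.exp(-1.0))
```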
Example 10·31. Let X and Y be independent standard normal variates. Obtain the m.g.f. of XY.   [Gauhati M.Sc., 1992]
Solution. We have, by definition:

M_{XY}(t) = E(e^{tXY}) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{txy} f(x, y) dx dy

Since X and Y are independent standard normal variates, their joint p.d.f. f(x, y) is given by:

f(x, y) = f₁(x) f₂(y) = (1/2π) e^{−x²/2} e^{−y²/2} ; −∞ < (x, y) < ∞

∴ M_{XY}(t) = (1/2π) ∫_{−∞}^{∞} ∫_{−∞}^{∞} exp[ −½(x² − 2txy + y²) ] dx dy    ...(*)

The exponent in (*) is that of a bivariate normal density with σ₁² = σ₂² = 1/(1 − t²) and ρ = t, whose normalizing constant is 2πσ₁σ₂√(1 − ρ²) = 2π(1 − t²)^{−1/2}. Hence the double integral equals 2π(1 − t²)^{−1/2}, and

M_{XY}(t) = (1/2π) · 2π(1 − t²)^{−1/2}

⇒ M_{XY}(t) = (1 − t²)^{−1/2} ; −1 < t < 1
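A quick Monte Carlo check of M_XY(t) = (1 − t²)^{−1/2}. Note that the simulation average has finite variance only for |t| < ½ (since that requires E e^{2tXY} < ∞), so a small t is used:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
x, y = rng.standard_normal(n), rng.standard_normal(n)   # independent N(0, 1)

t = 0.3
mgf_mc = np.exp(t * x * y).mean()
print(mgf_mc, (1 - t**2) ** -0.5)   # both close to 1.048
```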
Example 10·32. Let X and Y have a bivariate normal distribution with parameters:
μ_X = 5, μ_Y = 10, σ_X² = 1, σ_Y² = 25 and Corr (X, Y) = ρ.
(a) If ρ > 0, find ρ when P(4 < Y < 16 | X = 5) = 0·954.
[Delhi Univ. B.Sc. (Maths Hons.), 1993, '83]
(b) If ρ = 0, find P(X + Y ≤ 16).
Solution. (a) Since (X, Y) ~ BVN(μ_X, μ_Y, σ_X², σ_Y², ρ), the conditional distribution of Y given X = x is also normal:

(Y | X = x) ~ N[ μ = μ_Y + ρ(σ_Y/σ_X)(x − μ_X), σ² = σ_Y²(1 − ρ²) ]

∴ (Y | X = 5) ~ N[ μ = 10 + 5ρ(5 − 5), σ² = 25(1 − ρ²) ] = N[ μ = 10, σ² = 25(1 − ρ²) ]

We want ρ so that
P(4 < Y < 16 | X = 5) = 0·954,
where Z = (Y − μ)/σ = (Y − 10)/(5√(1 − ρ²)) ~ N(0, 1)

⇒ P( (4 − 10)/σ < Z < (16 − 10)/σ ) = 0·954

⇒ P( −6/σ < Z < 6/σ ) = 0·954    ...(*)

But we know that if Z ~ N(0, 1), then
P(−2 < Z < 2) = 0·954    ...(**)
Comparing (*) and (**), we get

6/σ = 2 ⇒ σ = 3 ⇒ σ² = 9 = 25(1 − ρ²)

⇒ 1 − ρ² = 9/25 ⇒ ρ² = 16/25 ⇒ ρ = 4/5 = 0·8   (∵ ρ > 0)

(b) Since (X, Y) have a bivariate normal distribution,
ρ = 0 ⇒ X and Y are independent r.v.'s
X ~ N(μ_X, σ_X²) and Y ~ N(μ_Y, σ_Y²)
⇒ X + Y ~ N(μ = μ_X + μ_Y, σ² = σ_X² + σ_Y²) = N(15, 26)
Hence
P(X + Y ≤ 16) = P(Z ≤ (16 − 15)/√26) = Φ(1/√26),
where Z = [(X + Y) − μ]/σ ~ N(0, 1) and Φ(z) = P(Z ≤ z) is the distribution function of the standard normal variate.

*The chi-square distribution is discussed in Chapter 13.

Remark. P(X + Y ≤ 16) = P(Z ≤ 0·196)
 = 0·5 + P(0 ≤ Z ≤ 0·196)
 = 0·5 + 0·0793 (approx.)
 = 0·5793.
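Part (b) is a one-liner with SciPy's normal c.d.f. (the text rounds z = 1/√26 ≈ 0·196 up to 0·2 and reports 0·5793; without that rounding the value is ≈ 0·5777):

```python
from math import sqrt
from scipy.stats import norm

# rho = 0 => X and Y independent => X + Y ~ N(5 + 10, 1 + 25) = N(15, 26)
p = norm.cdf((16 - 15) / sqrt(26))
print(p)   # ~ 0.5777
```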

EXERCISE 10(f)
1. (a) Define conditional and marginal distributions. If X and Y follow a bivariate normal distribution, find (i) the conditional distribution of X given Y, and (ii) the marginal distribution of X. Show that the conditional mean of X is dependent on the given Y, but the conditional variance is independent of it.
(b) Define the bivariate normal distribution. If (X, Y) has a bivariate normal distribution, find the marginal density function f_X(x) of X.
[Delhi Univ. B.Sc. (Maths Hons.), 1988]
2. (a) The marks X and Y scored by candidates in an examination in two papers, Mathematics and Statistics, are known to follow a bivariate normal distribution. The mean of X is 52 and its standard deviation is 15, while Y has mean 48 and standard deviation 13. Also the coefficient of correlation between X and Y is 0·6.
Write down the joint distribution of X and Y. If 100 marks in the aggregate are needed for a pass in the examination, show how to calculate the proportion of candidates who pass the examination.
(b) A manufacturer of electric bulbs, in his desire for putting only good bulbs up for sale, rejects all bulbs for which a certain quality characteristic X of the filament is less than 65 units. Assume that the quality characteristic X and the life Y of the bulb in hours are jointly normally distributed with the parameters below:

                          X        Y
Mean                     80     1100
Standard deviation       10       10
Correlation coefficient  ρ(X, Y) = 0·60

Find (i) the proportion of bulbs produced that will burn for less than 1000 hours, (ii) the proportion of bulbs produced that will be put up for sale, (iii) the average life of the bulbs put up for sale.
3. (a) Determine the parameters of the bivariate normal distribution:
f(x, y) = k exp[ −(1/27) { (x − 7)² − 2(x − 7)(y + 5) + 4(y + 5)² } ]
Also find the value of k.
(b) For the bivariate normal distribution:
(X, Y) ~ BVN(1, 2, 4², 5², ρ),
find (i) P(X > 2), (ii) P(X > 2 | Y = 2).


(c) The bivariate random variable (X₁, X₂) has a bivariate normal distribution with means 60 and 75, standard deviations 6 and 12, and a correlation coefficient of 0·55. Find the following probabilities:
(i) P(X₂ ≤ 75), (ii) P(71 ≤ X₂ ≤ 80 | X₁ = 55) and (iii) P(|X₁ − X₂| ≥ 15).
4. For the bivariate normal distribution:

f_{XY}(x, y) = 1/(2π√(1 − ρ²)) · exp{ −(x² − 2ρxy + y²)/(2(1 − ρ²)) },
−∞ < (x, y) < ∞

find (i) the marginal distributions of X and Y,
(ii) the conditional distribution of Y given X,
(iii) the distribution of (1/(1 − ρ²)) [X² − 2ρXY + Y²],
and (iv) show that in general X and Y are stochastically dependent, and will be independent if and only if ρ = 0.
5. Let the joint p.d.f. of X and Y be

f(x, y) = 1/(2πσ₁σ₂√(1 − ρ²)) × exp{ −1/(2(1 − ρ²)) [ (x − μ₁)²/σ₁² − 2ρ(x − μ₁)(y − μ₂)/(σ₁σ₂) + (y − μ₂)²/σ₂² ] },

where −∞ < x < ∞, −∞ < y < ∞, −1 < ρ < 1.
(i) Find the marginal distribution of X.
(ii) Find the conditional distribution of Y given X = x.
(iii) Show that the regression of Y on X is linear and homoscedastic.
(iv) Find P(3 < Y < 8 | X = 7), given that μ₁ = 3, μ₂ = 1, σ₁² = 16, σ₂² = 25, ρ = 0·6.
(v) Find the probability of the simultaneous materialization of the inequalities X > E(X) and Y > E(Y).
Hint. (v) The required probability p is given by

p = P[X > E(X), Y > E(Y)] = P[(X > μ₁) ∩ (Y > μ₂)]

 = ∫_{μ₁}^{∞} ∫_{μ₂}^{∞} f(x, y) dx dy

 = ∫₀^{∞} ∫₀^{∞} 1/(2π√(1 − ρ²)) · exp[ −(u² − 2ρuv + v²)/(2(1 − ρ²)) ] du dv,

where u = (x − μ₁)/σ₁, v = (y − μ₂)/σ₂.
Now proceed as in the hint to Question Number 9(b).
6. Let the random variables X and Y be assumed to have a joint bivariate normal distribution with
μ₁ = μ₂ = 0, σ₁ = 4, σ₂ = 3, r(X, Y) = 0·8.
(i) Write down the joint density of X and Y.
(ii) Write down the regression of Y on X.
(iii) Obtain the densities of X + Y and X − Y.
7. For the distribution of random variables X and Y given by

dF = k exp[ −(x² − 2ρxy + y²)/(2(1 − ρ²)) ] dx dy ; −∞ ≤ x ≤ ∞, −∞ ≤ y ≤ ∞,

obtain
(i) the constant k,
(ii) the distributions of X and Y,
(iii) the distributions of X for given Y and of Y for given X,
(iv) the curves of regression of Y on X and of X on Y,
and (v) the distributions of X + Y and X − Y.
8. Let (X, Y) be a bivariate normal variable with E(X) = E(Y) = 0, Var (X) = Var (Y) = 1 and Cov (X, Y) = ρ. Show that the random variable Z = X/Y has a Cauchy distribution.
[Delhi Univ. B.Sc. (Maths. Hons.), 1989]
Ans. f(z) = (1/π) · (1 − ρ²)^{1/2} / [ (1 − ρ²) + (z − ρ)² ], −∞ < z < ∞.
9. (a) If (X, Y) ~ N(μ_x, μ_y, σ_x², σ_y², ρ), prove that

P(X > μ_x ∩ Y > μ_y) = ¼ + (1/2π) sin⁻¹ρ
[Delhi Univ. M.Sc. (Stat.), 1987]
(b) If (X, Y) ~ N(0, 0, 1, 1, ρ), then prove that

P(X > 0 ∩ Y > 0) = ¼ + (1/2π) sin⁻¹ρ.
[Delhi Univ. B.Sc. (Stat. Hons.), 1990]

Hint. P = P(X > 0 ∩ Y > 0)

 = 1/(2π√(1 − ρ²)) ∫₀^{∞} ∫₀^{∞} exp[ −(x² − 2ρxy + y²)/(2(1 − ρ²)) ] dx dy

Put x = r cos θ, y = r sin θ ⇒ |J| = r; 0 < r < ∞, 0 ≤ θ ≤ π/2

∴ P = 1/(2π√(1 − ρ²)) ∫₀^{∞} ∫₀^{π/2} exp[ −r²(1 − ρ sin 2θ)/(2(1 − ρ²)) ] r dr dθ

Now integrate first w.r.t. r, then w.r.t. θ.
10. (a) Let X₁ and X₂ be two independent normally distributed variables with zero means and unit variances. Let Y₁ and Y₂ be the linear functions of X₁ and X₂ defined by
Y₁ = m₁ + l₁₁X₁ + l₁₂X₂, Y₂ = m₂ + l₂₁X₁ + l₂₂X₂
Show that Y₁ and Y₂ are normally distributed with means m₁ and m₂, variances
μ₂₀ = l₁₁² + l₁₂², μ₀₂ = l₂₁² + l₂₂², and covariance l₁₁l₂₁ + l₁₂l₂₂.
(b) Let X₁ and X₂ be independent standard normal variates. Show that the variates Y₁, Y₂ defined by
Y₁ = a₁ + b₁₁X₁ + b₁₂X₂, Y₂ = a₂ + b₂₁X₁ + b₂₂X₂
are dependent normal variates, and find their means and variances.
Hint. Y₁ and Y₂, being linear combinations of S.N.V.'s, are also normally distributed. To prove that they are dependent, it is sufficient to prove that Cov (Y₁, Y₂) ≠ 0. [cf. Remark 2 to Theorem 10·2]
11. (a) Show that, if X and Y are independent normal variates with zero means and variances σ₁² and σ₂² respectively, the points of inflexion of the curves of intersection of the normal correlation surface by planes through the z-axis lie on the elliptical cylinder

X²/σ₁² + Y²/σ₂² = 1

(b) If X and Y are bivariate normal variates with standard deviations unity and with correlation coefficient ρ, show that the regression of X² (Y²) on Y² (X²) is strictly linear. Also show that the regression of X (Y) on Y² (X²) is not linear.
12. For the bivariate normal distribution:
dF = k exp[ −(x² − xy + y² − 3x + 3y + 3) ] dx dy,
obtain (i) the marginal distribution of Y, and
(ii) the conditional distribution of Y given X.
Also obtain the characteristic function of the above bivariate normal distribution and hence the covariance between X and Y.
13. Let f and g be p.d.f.'s with corresponding distribution functions F and G. Also let
h(x, y) = f(x) g(y) [ 1 + α(2F(x) − 1)(2G(y) − 1) ],
where |α| ≤ 1 is a constant, and h is a bivariate p.d.f. with marginal p.d.f.'s f and g. Further let f and g be the p.d.f.'s of the N(0, 1) distribution. Then prove that:
Cov (X, Y) = α/π
14. If (X, Y) ~ BVN(μ₁, μ₂, σ₁², σ₂², ρ), compute the correlation coefficient between e^X and e^Y.
Hint. Let U = e^X, V = e^Y.

μ′_rs = E[e^{rX + sY}] = exp[ rμ₁ + sμ₂ + ½(r²σ₁² + s²σ₂² + 2ρrsσ₁σ₂) ]
[cf. m.g.f. of B.V.N. distribution : t₁ = r, t₂ = s]
Now E(U) = μ′₁₀ ; E(U²) = μ′₂₀ ; E(UV) = μ′₁₁, and so on.

Ans. ρ(U, V) = (e^{ρσ₁σ₂} − 1) / [ (e^{σ₁²} − 1)(e^{σ₂²} − 1) ]^{1/2}
15. If (X, Y) ~ BVN(0, 0, 1, 1, ρ), find E[max (X, Y)].
Hint. max (X, Y) = ½(X + Y) + ½|X − Y| and Z = X − Y ~ N[0, 2(1 − ρ)]  [cf. Theorem 10·6]
Ans. E[max (X, Y)] = ((1 − ρ)/π)^{1/2}
16. If (X. Y) - BVN (0, 0, I, I, p) wilh joint p.d.f.j(x. y) lhen prove that
(a) P(XY>O)
10·102 Fundamentals olMatbematieal Statistics

Hint. P(XY > 0) = P(X > 0 ∩ Y > 0) + P(X < 0 ∩ Y < 0)
= 2P(X > 0 ∩ Y > 0) [By symmetry]
Now proceed as in Hint to Question No. 9(b).

(b) 2π ∫₀^∞ ∫₀^∞ f(x, y) dx dy = (π/2) + sin⁻¹ρ
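Both orthant results can be checked by simulation; a minimal sketch (ρ, seed and sample size are arbitrary illustrative choices) estimates P(XY > 0) and compares it with ½ + (1/π) sin⁻¹ρ:

```python
import math
import random

# Monte Carlo check of P(XY > 0) = 1/2 + (1/pi) * arcsin(rho)
random.seed(2)
rho = 0.5
n = 200_000

hits = 0
for _ in range(n):
    z1 = random.gauss(0.0, 1.0)
    z2 = random.gauss(0.0, 1.0)
    x = z1
    # y is constructed so that corr(X, Y) = rho in distribution
    y = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
    if x * y > 0:
        hits += 1

estimate = hits / n
exact = 0.5 + math.asin(rho) / math.pi   # equals 2/3 when rho = 0.5
```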

17. The joint density of r.v.'s (X, Y) is given by:
f(x, y) = (1/2π) exp[−(x² + y²)/2] × [1 + xy exp{−(x² + y² − 2)/2}];
−∞ < (x, y) < ∞

(i) Verify that f(x, y) is a p.d.f.
(ii) Show that the marginal distribution of each of X and Y is normal.
(iii) Are X and Y independent?
Ans. (ii) X ~ N(0, 1), Y ~ N(0, 1)
(iii) X and Y are not independent.
18. Show by means of an example that the normality of conditional p.d.f.'s does not imply that the bivariate density is normal.
Hint. Consider f(x, y) = constant · exp[−(1 + x²)(1 + y²)]; −∞ < (x, y) < ∞
Then (Y | X) ~ N(0, 1/[2(1 + x²)]) and (X | Y) ~ N(0, 1/[2(1 + y²)])


19. For a bivariate normal r.v. (X, Y), does the conditional p.d.f. of (X, Y) given X + Y = c (constant) exist? If so, find it. If not, why not?
Ans. No, since P(X + Y = c) = 0.
20. Let
f(x, y) = ½ [ {1/(2π√(1 − ρ²))} exp{−(x² − 2ρxy + y²)/(2(1 − ρ²))}
+ {1/(2π√(1 − ρ²))} exp{−(x² + 2ρxy + y²)/(2(1 − ρ²))} ],
−∞ < x < ∞, −∞ < y < ∞,

then show that:
(i) f(x, y) is a joint p.d.f. such that both marginal densities are normal but f(x, y) is not bivariate normal;
(ii) X and Y have zero correlation but X and Y are not independent.
[Delhi Univ. B.Sc. (Stat. Hons.), 1989]
21. Let X, Y be normally correlated variates with zero means and variances σ1², σ2², and let
w = x/σ1,  z = {1/√(1 − ρ²)}·(y/σ2 − ρx/σ1)
Show that
∂(w, z)/∂(x, y) = 1/[σ1σ2√(1 − ρ²)]

and
w² + z² = {1/(1 − ρ²)}·[x²/σ1² − 2ρxy/(σ1σ2) + y²/σ2²]
Deduce that the joint probability differential of W and Z is
(1/2π) exp[−½(w² + z²)] dw dz
and hence that W, Z are independent normal variates with zero means and unit S.D.'s. [Meerut Univ. M.Sc., 1993]
Hence or otherwise obtain the m.g.f. of the bivariate normal distribution.
22. From a standard bivariate normal population, a random sample of n observations (xᵢ, yᵢ), (i = 1, 2, ..., n) is drawn. Obtain the distribution of
Z1 = (1/n) Σᵢ₌₁ⁿ xᵢ²  and  Z2 = (1/n) Σᵢ₌₁ⁿ yᵢ²
Now use the result
∫₋∞^∞ ∫₋∞^∞ exp[−(ax² + 2hxy + by²)] dx dy = π/√(ab − h²)
and simplify.
10·11. Multiple and Partial Correlation. When the values of one variable are associated with or influenced by another variable, e.g., the age of husband and wife, the height of father and son, the supply and demand of a commodity and so on, Karl Pearson's coefficient of correlation can be used as a measure of linear relationship between them. But sometimes there is interrelation between many variables and the value of one variable may be influenced by many others, e.g., the yield of crop per acre, say (X1), depends upon quality of seed (X2), fertility of soil (X3), fertilizer used (X4), irrigation facilities (X5), weather conditions (X6), and so on. Whenever we are interested in studying the joint effect of a group of variables upon a variable not included in that group, our study is that of multiple correlation and multiple regression.
Suppose in a trivariate or multivariate distribution we are interested in the relationship between two variables only. There are two alternatives, viz., (i) we consider only those members of the observed data in which the other members have specified values, or (ii) we may eliminate mathematically the effect of the other variates on the two variates. The first method has the disadvantage that it limits the size of the data and is applicable only to the data in which the other variates have the assigned values. In the second method it may not be possible to eliminate the entire influence of the other variates, but the linear effect can be easily eliminated. The correlation and regression between only two variates after eliminating the linear effect of the other variates on them is called the partial correlation and partial regression.
10·11·1. Yule's Notation. Let us consider a distribution involving three random variables X1, X2 and X3. Then the equation of the plane of regression of X1 on X2 and X3 is
X1 = a + b12.3 X2 + b13.2 X3 ...(10·28)
Without loss of generality we can assume that the variables X1, X2 and X3 have been measured from their respective means, so that
E(X1) = E(X2) = E(X3) = 0
Hence on taking expectation of both sides in (10·28), we get a = 0.
Thus the plane of regression of X1 on X2 and X3 becomes
X1 = b12.3 X2 + b13.2 X3 ...(10·28a)
The coefficients b12.3 and b13.2 are known as the partial regression coefficients of X1 on X2 and of X1 on X3 respectively.
e1.23 = b12.3 X2 + b13.2 X3
is called the estimate of X1 as given by the plane of regression (10·28a), and the quantity
X1.23 = X1 − b12.3 X2 − b13.2 X3
is called the error of estimate or residual.
In the general case of n variables X1, X2, ..., Xn, the equation of the plane of regression of X1 on X2, X3, ..., Xn becomes
X1 = b12.34...n X2 + b13.24...n X3 + ... + b1n.23...(n−1) Xn
The error of estimate or residual is given by
X1.23...n = X1 − (b12.34...n X2 + b13.24...n X3 + ... + b1n.23...(n−1) Xn)
The notations used here are due to Yule. The subscripts before the dot (.) are known as primary subscripts and those after the dot are called secondary subscripts. The order of a regression coefficient is determined by the number of secondary subscripts, e.g.,
b12.3, b12.34, ..., b12.34...n
are the regression coefficients of order 1, 2, ..., (n − 2) respectively. Thus, in general, a regression coefficient with p secondary subscripts will be called a regression coefficient of order 'p'. It may be pointed out that the order in which the secondary subscripts are written is immaterial, but the order of the primary subscripts is important, e.g., in b12.34...n, X2 is the independent while X1 is the dependent variable, but in b21.34...n, X1 is the independent while X2 is the dependent

variable. Thus, of the two primary subscripts, the former refers to the dependent variable and the latter to the independent variable.
The order of a residual is also determined by the number of secondary subscripts in it, e.g., X1.23, X1.234, ..., X1.23...n are the residuals of order 2, 3, ..., (n − 1) respectively.
Remark. In the following sections we shall assume that the variables under consideration have been measured from their respective means.
10·12. Plane of Regression. The equation of the plane of regression of X1 on X2 and X3 is
X1 = b12.3 X2 + b13.2 X3 ...(10·29)
The constants b's in (10·29) are determined by the principle of least squares, i.e., by minimising the sum of the squares of the residuals, viz.,
S = ΣX²1.23 = Σ(X1 − b12.3 X2 − b13.2 X3)²,
the summation being extended to the given values (N in number) of the variables.
The normal equations for estimating b12.3 and b13.2 are
∂S/∂b12.3 = 0 = −2 ΣX2(X1 − b12.3 X2 − b13.2 X3)
∂S/∂b13.2 = 0 = −2 ΣX3(X1 − b12.3 X2 − b13.2 X3) ...(10·30)
i.e., ΣX2X1.23 = 0 and ΣX3X1.23 = 0 ...(10·30a)
⇒ ΣX1X2 − b12.3 ΣX2² − b13.2 ΣX2X3 = 0
ΣX1X3 − b12.3 ΣX2X3 − b13.2 ΣX3² = 0 ...(10·30b)
Since the Xi's are measured from their respective means, we have
σi² = (1/N) ΣXi²,  Cov(Xi, Xj) = (1/N) ΣXiXj
and rij = Cov(Xi, Xj)/(σiσj) = ΣXiXj/(Nσiσj) ...(10·30c)
Hence from (10·30b), we get
r12σ1σ2 − b12.3 σ2² − b13.2 r23σ2σ3 = 0
r13σ1σ3 − b12.3 r23σ2σ3 − b13.2 σ3² = 0 ...(10·30d)
Solving the equations (10·30d) for b12.3 and b13.2, we get
b12.3 = (σ1/σ2) × |r12 r23; r13 1| ÷ |1 r23; r23 1| = (σ1/σ2)·(r12 − r13r23)/(1 − r23²) ...(10·31)

Similarly, we will get
b13.2 = (σ1/σ3)·(r13 − r12r23)/(1 − r23²) ...(10·31a)
If we write
ω = |1 r12 r13; r21 1 r23; r31 r32 1| ...(10·32)
and ωij is the cofactor of the element in the ith row and jth column of ω, we have from (10·31) and (10·31a)
b12.3 = −(σ1/σ2)·(ω12/ω11)  and  b13.2 = −(σ1/σ3)·(ω13/ω11) ...(10·33)
Substituting these values in (10·29), we get the required equation of the plane of regression of X1 on X2 and X3 as
X1 = −(σ1/σ2)·(ω12/ω11)·X2 − (σ1/σ3)·(ω13/ω11)·X3
⇒ (X1/σ1)·ω11 + (X2/σ2)·ω12 + (X3/σ3)·ω13 = 0 ...(10·34)
Aliter. Eliminating the coefficients b12.3 and b13.2 in (10·29) and (10·30d), the required equation of the plane of regression of X1 on X2 and X3 is
|X1 X2 X3; r12σ1σ2 σ2² r23σ2σ3; r13σ1σ3 r23σ2σ3 σ3²| = 0
Dividing C1, C2 and C3 by σ1, σ2 and σ3 respectively, and also R2 and R3 by σ2 and σ3 respectively, we get
|X1/σ1 X2/σ2 X3/σ3; r12 1 r23; r13 r23 1| = 0
⇒ (X1/σ1)·ω11 + (X2/σ2)·ω12 + (X3/σ3)·ω13 = 0,
where ωij is as defined in (10·32).
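The two routes to the plane of regression can be compared numerically: solve the normal equations (10·30d) as a 2 × 2 linear system and check the solution against the cofactor expressions (10·33). The values of r12, r13, r23, σ1, σ2, σ3 below are arbitrary illustrative choices.

```python
# Numerical cross-check of (10.30d), (10.31) and (10.33).
r12, r13, r23 = 0.6, 0.7, 0.8
s1, s2, s3 = 2.0, 3.0, 5.0

# Normal equations (10.30d):
#   r12 s1 s2 = b12_3 * s2^2      + b13_2 * r23 s2 s3
#   r13 s1 s3 = b12_3 * r23 s2 s3 + b13_2 * s3^2
a11, a12, c1 = s2 * s2, r23 * s2 * s3, r12 * s1 * s2
a21, a22, c2 = r23 * s2 * s3, s3 * s3, r13 * s1 * s3
det = a11 * a22 - a12 * a21
b12_3 = (c1 * a22 - c2 * a12) / det        # Cramer's rule
b13_2 = (a11 * c2 - a21 * c1) / det

# Cofactor route (10.33): b12.3 = -(s1/s2) w12/w11, b13.2 = -(s1/s3) w13/w11
w11 = 1 - r23 ** 2
w12 = -(r12 - r13 * r23)    # cofactor of the (1,2) element of omega
w13 = r12 * r23 - r13       # cofactor of the (1,3) element of omega
b12_3_cof = -(s1 / s2) * (w12 / w11)
b13_2_cof = -(s1 / s3) * (w13 / w11)
```

Both routes give the same partial regression coefficients, as (10·33) asserts.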
10·12·1. Generalisation. In general, the equation of the plane of regression of X1 on X2, X3, ..., Xn is
X1 = b12.34...n X2 + b13.24...n X3 + ... + b1n.23...(n−1) Xn ...(10·35)
The sum of the squares of residuals is given by
S = ΣX²1.23...n

= Σ(X1 − b12.34...n X2 − b13.24...n X3 − ... − b1n.23...(n−1) Xn)²
Using the principle of least squares, the normal equations for estimating the (n − 1) b's are
∂S/∂b12.34...n = 0 = −2 ΣX2(X1 − b12.34...n X2 − b13.24...n X3 − ... − b1n.23...(n−1) Xn)
∂S/∂b13.24...n = 0 = −2 ΣX3(X1 − b12.34...n X2 − b13.24...n X3 − ... − b1n.23...(n−1) Xn)
⋮
∂S/∂b1n.23...(n−1) = 0 = −2 ΣXn(X1 − b12.34...n X2 − b13.24...n X3 − ... − b1n.23...(n−1) Xn)
...(10·36)
i.e., ΣXi X1.23...n = 0, (i = 2, 3, ..., n) ...(10·36a)
which on simplification, after using (10·30c), give
r12σ1σ2 = b12.34...n σ2² + b13.24...n r23σ2σ3 + ... + b1n.23...(n−1) r2nσ2σn
r13σ1σ3 = b12.34...n r23σ2σ3 + b13.24...n σ3² + ... + b1n.23...(n−1) r3nσ3σn
⋮
r1nσ1σn = b12.34...n r2nσ2σn + b13.24...n r3nσ3σn + ... + b1n.23...(n−1) σn²
...(10·36b)
Hence the eliminant of the b's between (10·35) and (10·36b) is
|X1 X2 ... Xn; r12σ1σ2 σ2² ... r2nσ2σn; ⋮; r1nσ1σn r2nσ2σn ... σn²| = 0
Dividing C1, C2, ..., Cn by σ1, σ2, ..., σn respectively and R2, R3, ..., Rn by σ2, σ3, ..., σn respectively, we get
|X1/σ1 X2/σ2 ... Xn/σn; r12 1 ... r2n; ⋮; r1n r2n ... 1| = 0 ...(10·37)

If we write
ω = |1 r12 r13 ... r1n; r21 1 r23 ... r2n; r31 r32 1 ... r3n; ⋮; rn1 rn2 rn3 ... 1| ...(10·38)
and ωij is the cofactor of the element in the ith row and jth column of ω, we get from (10·37)
(X1/σ1)·ω11 + (X2/σ2)·ω12 + (X3/σ3)·ω13 + ... + (Xn/σn)·ω1n = 0 ...(10·39)
as the required equation of the plane of regression of X1 on X2, X3, ..., Xn.
Equation (10·39) can be re-written as
X1 = −(σ1/σ2)·(ω12/ω11)·X2 − (σ1/σ3)·(ω13/ω11)·X3 − ... − (σ1/σn)·(ω1n/ω11)·Xn ...(10·39a)
Comparing (10·39a) with (10·35), we get
b12.34...n = −(σ1/σ2)·(ω12/ω11), ..., b1n.23...(n−1) = −(σ1/σn)·(ω1n/ω11) ...(10·40)
Remarks 1. From the symmetry of the result obtained in (10·40), the equation of the plane of regression of Xi (say) on the remaining variables Xj (j ≠ i = 1, 2, ..., n) is given by
(X1/σ1)·ωi1 + (X2/σ2)·ωi2 + ... + (Xi/σi)·ωii + ... + (Xn/σn)·ωin = 0; i = 1, 2, ..., n ...(10·41)
2. We have
b12.34...n = −(σ1/σ2)·(ω12/ω11) and b21.34...n = −(σ2/σ1)·(ω21/ω22)
Since each of σ1, σ2, ω11 and ω22 is non-negative and ω12 = ω21 [cf. Remarks 3 and 4 to §10·14, page 10·113], the sign of each regression coefficient b12.34...n and b21.34...n depends on ω12.

10·13. Properties of residuals
Property 1. The sum of the product of any residual of order zero with any other residual of higher order is zero, provided the subscript of the former occurs among the secondary subscripts of the latter.
The normal equations for estimating the b's in trivariate and n-variate distributions, as obtained in equations (10·30a) and (10·36a), are
ΣX2X1.23 = 0, ΣX3X1.23 = 0
and ΣXiX1.23...n = 0; i = 2, 3, ..., n
respectively. Here Xi, (i = 1, 2, 3, ..., n) can be regarded as a residual of order zero. Hence the result.
Property 2. The sum of the product of any two residuals in which all the secondary subscripts of the first occur among the secondary subscripts of the second is unaltered if we omit any or all of the secondary subscripts of the first. Conversely, the product sum of any residual of order 'p' with a residual of order p + q, the 'p' subscripts being the same in each case, is unaltered by adding to the secondary subscripts of the former any or all the 'q' additional subscripts of the latter.
Let us consider
ΣX1.2X1.23 = Σ(X1 − b12X2)X1.23 = ΣX1X1.23 − b12 ΣX2X1.23 = ΣX1X1.23 (cf. Property 1)
Also ΣX²1.23 = ΣX1.23X1.23 = Σ(X1 − b12.3X2 − b13.2X3)X1.23
= ΣX1X1.23 − b12.3 ΣX2X1.23 − b13.2 ΣX3X1.23
= ΣX1X1.23 (cf. Property 1)
Again ΣX1.34...nX2.34...n
= Σ[(X1 − b13.4...nX3 − b14.35...nX4 − ... − b1n.34...(n−1)Xn) X2.34...n]
= ΣX1X2.34...n (cf. Property 1)
Hence property 2.
Property 3. The sum of the product of two residuals is zero if all the subscripts (primary as well as secondary) of the one occur among the secondary subscripts of the other, e.g.,
ΣX1.2X3.12 = Σ(X1 − b12X2)X3.12 = ΣX1X3.12 − b12 ΣX2X3.12 = 0 (cf. Property 1)
ΣX2.34...nX1.23...n
= Σ[(X2 − b23.4...nX3 − b24.35...nX4 − ... − b2n.34...(n−1)Xn) X1.23...n]
= ΣX2X1.23...n − b23.4...n ΣX3X1.23...n − b24.35...n ΣX4X1.23...n − ... − b2n.34...(n−1) ΣXnX1.23...n
= 0 (cf. Property 1)
Hence the property 3.

10·13·1. Variance of the Residual. Let us consider the plane of regression of X1 on X2, X3, ..., Xn, viz.,
X1 = b12.34...n X2 + b13.24...n X3 + ... + b1n.23...(n−1) Xn
Since all the Xi's are measured from their respective means, we have
E(Xi) = 0; i = 1, 2, ..., n ⇒ E(X1.23...n) = 0
Hence the variance of the residual is given by
σ²1.23...n = (1/N) Σ[X1.23...n − E(X1.23...n)]² = (1/N) ΣX²1.23...n
= (1/N) ΣX1.23...n X1.23...n = (1/N) ΣX1X1.23...n (cf. Property 2, §10·13)
= (1/N) ΣX1(X1 − b12.34...n X2 − b13.24...n X3 − ... − b1n.23...(n−1) Xn)
= σ1² − b12.34...n r12σ1σ2 − b13.24...n r13σ1σ3 − ... − b1n.23...(n−1) r1nσ1σn
⇒ σ1² − σ²1.23...n = b12.34...n r12σ1σ2 + b13.24...n r13σ1σ3 + ... + b1n.23...(n−1) r1nσ1σn ...(10·42)


Eliminating the b's in equations (10·42) and (10·36b), we get
|σ1² − σ²1.23...n r12σ1σ2 ... r1nσ1σn; r12σ1σ2 σ2² ... r2nσ2σn; ⋮; r1nσ1σn r2nσ2σn ... σn²| = 0
Dividing R1, R2, ..., Rn by σ1, σ2, ..., σn respectively and also C1, C2, ..., Cn by σ1, σ2, ..., σn respectively, we get
|1 − σ²1.23...n/σ1² r12 ... r1n; r12 1 ... r2n; ⋮; r1n r2n ... 1| = 0
Writing the first column as the difference of the two columns (1, r12, ..., r1n)′ and (σ²1.23...n/σ1², 0, ..., 0)′, the determinant splits as
ω − (σ²1.23...n/σ1²)·ω11 = 0
⇒ σ²1.23...n = σ1²·(ω/ω11) ...(10·43)
Remark. In a tri-variate distribution,
σ²1.23 = σ1²·(ω/ω11) ...(10·43a)
where ω and ω11 are as defined in (10·32).
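Formula (10·43a) can be checked numerically: fit the least-squares plane on simulated trivariate data and compare the sample residual variance with σ1²·ω/ω11 computed from the sample correlations. The data-generating coefficients and the seed below are arbitrary illustrative choices; the agreement is essentially exact because the relation is an algebraic identity in the sample moments.

```python
import math
import random

random.seed(4)
n = 500

# Synthetic trivariate data with some built-in dependence
x3 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.6 * v + random.gauss(0, 1) for v in x3]
x1 = [0.5 * u + 0.3 * v + random.gauss(0, 1) for u, v in zip(x2, x3)]

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

x1, x2, x3 = center(x1), center(x2), center(x3)

def sd(v):
    return math.sqrt(sum(x * x for x in v) / len(v))

def corr(u, v):
    return sum(a * b for a, b in zip(u, v)) / (len(u) * sd(u) * sd(v))

s1, s2, s3 = sd(x1), sd(x2), sd(x3)
r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)

# Least-squares plane: partial regression coefficients (10.31)/(10.31a)
b12_3 = (s1 / s2) * (r12 - r13 * r23) / (1 - r23 ** 2)
b13_2 = (s1 / s3) * (r13 - r12 * r23) / (1 - r23 ** 2)

resid = [a - b12_3 * b - b13_2 * c for a, b, c in zip(x1, x2, x3)]
var_resid = sum(e * e for e in resid) / n

# sigma_1.23^2 = sigma_1^2 * omega / omega_11   ...(10.43a)
omega = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
omega11 = 1 - r23 ** 2
var_formula = s1 ** 2 * omega / omega11
```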
10·14. Coefficient of Multiple Correlation. In a trivariate distribution in which each of the variables X1, X2 and X3 has N observations, the multiple correlation coefficient of X1 on X2 and X3, usually denoted by R1.23, is the simple correlation coefficient between X1 and the joint effect of X2 and X3 on X1. In other words, R1.23 is the correlation coefficient between X1 and its estimated value as given by the plane of regression of X1 on X2 and X3, viz.,
e1.23 = b12.3X2 + b13.2X3
We have X1.23 = X1 − b12.3X2 − b13.2X3 = X1 − e1.23
⇒ e1.23 = X1 − X1.23
Since the Xi's are measured from their respective means, we have
E(X1.23) = 0 and E(e1.23) = 0 (∵ E(Xi) = 0; i = 1, 2, 3)
By def.,
R1.23 = Cov(X1, e1.23)/√[V(X1)·V(e1.23)] ...(10·44)
Cov(X1, e1.23) = E[{X1 − E(X1)}{e1.23 − E(e1.23)}] = E(X1e1.23)
= (1/N) ΣX1e1.23 = (1/N) ΣX1(X1 − X1.23)
= (1/N) ΣX1² − (1/N) ΣX1X1.23 = (1/N) ΣX1² − (1/N) ΣX²1.23
= σ1² − σ²1.23 (cf. Property 2, §10·13)

Also V(e1.23) = E(e²1.23) = (1/N) Σe²1.23 = (1/N) Σ(X1 − X1.23)²
= (1/N) Σ(X1² + X²1.23 − 2X1X1.23)
= (1/N) ΣX1² + (1/N) ΣX²1.23 − (2/N) ΣX1X1.23
= (1/N) ΣX1² + (1/N) ΣX²1.23 − (2/N) ΣX²1.23
= σ1² − σ²1.23 (cf. Property 2, §10·13)
∴ R1.23 = (σ1² − σ²1.23)/√[σ1²(σ1² − σ²1.23)] = √[(σ1² − σ²1.23)/σ1²]
⇒ R²1.23 = (σ1² − σ²1.23)/σ1² = 1 − σ²1.23/σ1²
⇒ σ²1.23/σ1² = 1 − R²1.23
Using (10·43a), we get
R²1.23 = 1 − ω/ω11 ...(10·45)
where
ω = |1 r12 r13; r21 1 r23; r31 r32 1| = 1 − r12² − r13² − r23² + 2r12r13r23 (on simplification)
and ω11 = |1 r23; r23 1| = 1 − r23²
Hence from (10·45), we get
R²1.23 = 1 − ω/ω11 = (r12² + r13² − 2r12r13r23)/(1 − r23²) ...(10·45a)

This formula expresses the multiple correlation coefficient in terms of the total correlation coefficients between the pairs of variables.
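As a numerical check of (10·45a), one can verify on simulated data that the formula reproduces the defining property of R1.23: the simple correlation between X1 and the fitted plane e1.23. The simulation parameters below are arbitrary illustrative choices.

```python
import math
import random

random.seed(5)
n = 400

x3 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.7 * v + random.gauss(0, 1) for v in x3]
x1 = [0.4 * u + 0.5 * v + random.gauss(0, 1) for u, v in zip(x2, x3)]

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

x1, x2, x3 = center(x1), center(x2), center(x3)

def sd(v):
    return math.sqrt(sum(x * x for x in v) / len(v))

def corr(u, v):
    return sum(a * b for a, b in zip(u, v)) / (len(u) * sd(u) * sd(v))

s1, s2, s3 = sd(x1), sd(x2), sd(x3)
r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)

# Formula (10.45a)
R2 = (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)

# Direct definition: correlation between X1 and the fitted plane e1.23
b12_3 = (s1 / s2) * (r12 - r13 * r23) / (1 - r23**2)
b13_2 = (s1 / s3) * (r13 - r12 * r23) / (1 - r23**2)
fitted = [b12_3 * b + b13_2 * c for b, c in zip(x2, x3)]
R_direct = corr(x1, fitted)
```

The two computations agree to machine precision, illustrating that R1.23 is indeed the correlation between X1 and its least-squares estimate.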
Generalisation. In the case of an n-variate distribution, the multiple correlation coefficient of X1 on X2, X3, ..., Xn, usually denoted by R1.23...n, is the correlation coefficient between X1 and
e1.23...n = b12.34...n X2 + b13.24...n X3 + ... + b1n.23...(n−1) Xn
Cov(X1, e1.23...n) = (1/N) ΣX1e1.23...n = (1/N) ΣX1(X1 − X1.23...n)
= (1/N) ΣX1² − (1/N) ΣX1X1.23...n
= (1/N) ΣX1² − (1/N) ΣX²1.23...n = σ1² − σ²1.23...n ...(*)
V(e1.23...n) = (1/N) Σe²1.23...n = (1/N) Σ(X1 − X1.23...n)²

= (1/N) Σ(X1² + X²1.23...n − 2X1X1.23...n)
= (1/N) ΣX1² + (1/N) ΣX²1.23...n − (2/N) ΣX1X1.23...n
= (1/N) ΣX1² + (1/N) ΣX²1.23...n − (2/N) ΣX²1.23...n
= σ1² − σ²1.23...n ...(**)
∴ R1.23...n = (σ1² − σ²1.23...n)/√[σ1²(σ1² − σ²1.23...n)] = [(σ1² − σ²1.23...n)/σ1²]^(1/2)
⇒ R²1.23...n = 1 − σ²1.23...n/σ1² = 1 − ω/ω11 ...(10·45c)
where ω and ω11 are as defined in (10·38).


Remarks 1. It may be pointed out here that the multiple correlation coefficient can never be negative, because from (*) and (**), we get
Cov(X1, e1.23...n) = σ1² − σ²1.23...n = Var(e1.23...n) ≥ 0
Since the sign of R1.23...n depends upon the covariance term Cov(X1, e1.23...n), we conclude that R1.23...n ≥ 0.
2. Since R²1.23...n ≥ 0, we have:
1 − ω/ω11 ≥ 0 ⇒ ω11 ≥ ω ...(10·45d)
Also, since R²1.23...n ≤ 1,
ω/ω11 ≥ 0 ⇒ ω ≥ 0 ...(10·45e)
From the above results, we get
ω11 ≥ ω ≥ 0 ...(10·45f)
3. In general, we have
ωii ≥ 0; i = 1, 2, ..., n
4. Since ω is symmetric in the rij's, we have
ωij = ωji; i ≠ j = 1, 2, ..., n ...(10·45g)

10·14·1. Properties of Multiple Correlation Coefficient
1. The multiple correlation coefficient measures the closeness of the association between the observed values and the expected values of a variable obtained from the multiple linear regression of that variable on the other variables.
2. The multiple correlation coefficient between observed values and expected values, when the expected values are calculated from a linear relation of the variables determined by the method of least squares, is always greater than that where the expected values are calculated from any other linear combination of the variables.

3. Since R1.23 is the simple correlation between X1 and e1.23, it must lie between −1 and +1. But as seen in Remark 1 above, R1.23 is a non-negative quantity, and we conclude that 0 ≤ R1.23 ≤ 1.
4. If R1.23 = 1, then the association is perfect and all the regression residuals are zero, and as such σ²1.23 = 0. In this case, since X1 = e1.23, the predicted value of X1, the multiple linear regression equation of X1 on X2 and X3 may be said to be a perfect prediction formula.
5. If R1.23 = 0, then all total and partial correlations involving X1 are zero [See Example 10·37]. So X1 is completely uncorrelated with all the other variables in this case, and the multiple regression equation fails to throw any light on the value of X1 when X2 and X3 are known.
6. R1.23 is not less than any total correlation coefficient involving X1, i.e., R1.23 ≥ r12, r13.
10·15. Coefficient of Partial Correlation. Sometimes the correlation between two variables X1 and X2 may be partly due to the correlation of a third variable, X3, with both X1 and X2. In such a situation, one may want to know what the correlation between X1 and X2 would be if the effect of X3 on each of X1 and X2 were eliminated. This correlation is called the partial correlation, and the correlation coefficient between X1 and X2 after the linear effect of X3 on each of them has been eliminated is called the partial correlation coefficient.
The residual X1.3 = X1 − b13X3 may be regarded as that part of the variable X1 which remains after the linear effect of X3 has been eliminated. Similarly, the residual X2.3 may be interpreted as the part of the variable X2 obtained after eliminating the linear effect of X3. Thus the partial correlation coefficient between X1 and X2, usually denoted by r12.3, is given by
r12.3 = Cov(X1.3, X2.3)/√[Var(X1.3)·Var(X2.3)] ...(10·46)
We have
Cov(X1.3, X2.3) = (1/N) ΣX1.3X2.3 = (1/N) ΣX1X2.3
= (1/N) ΣX1(X2 − b23X3) = (1/N) ΣX1X2 − b23·(1/N) ΣX1X3
= r12σ1σ2 − (r23σ2/σ3)·(r13σ1σ3)
= σ1σ2(r12 − r13r23)
V(X1.3) = (1/N) ΣX²1.3 = (1/N) ΣX1X1.3 = (1/N) ΣX1(X1 − b13X3)

= (1/N) ΣX1² − b13·(1/N) ΣX1X3
= σ1² − (r13σ1/σ3)·r13σ1σ3
= σ1²(1 − r13²)
Similarly, we shall get
V(X2.3) = σ2²(1 − r23²)
Hence
r12.3 = σ1σ2(r12 − r13r23)/√[σ1²(1 − r13²)·σ2²(1 − r23²)]
= (r12 − r13r23)/√[(1 − r13²)(1 − r23²)] ...(10·46a)

Aliter. We have
0 = ΣX2.3X1.23 = ΣX2.3(X1 − b12.3X2 − b13.2X3)
From this it follows that b12.3 is the coefficient of regression of X1.3 on X2.3. Similarly, b21.3 is the coefficient of regression of X2.3 on X1.3.
Since the correlation coefficient is the geometric mean between the regression coefficients, we have
r²12.3 = b12.3 × b21.3
But by def.,
b12.3 = −(σ1/σ2)·(ω12/ω11) and b21.3 = −(σ2/σ1)·(ω21/ω22)
∴ r²12.3 = [−(σ1/σ2)·(ω12/ω11)]·[−(σ2/σ1)·(ω21/ω22)] = ω12²/(ω11ω22) (∵ ω12 = ω21)
⇒ r12.3 = −ω12/√(ω11ω22),
the negative sign being taken since the sign of the regression coefficients is the same as that of (−ω12).
Substituting the values of ω12, ω11 and ω22 from (10·32), we get
r12.3 = (r12 − r13r23)/√[(1 − r13²)(1 − r23²)]
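Formula (10·46a) can be checked against the defining expression (10·46) directly: compute the residuals X1.3 and X2.3 on simulated data and correlate them. The data-generating scheme and seed are arbitrary illustrative choices; since the relation is an algebraic identity in the sample moments, the two routes agree to machine precision.

```python
import math
import random

random.seed(6)
n = 300

# X3 drives both X1 and X2, inducing a spurious X1-X2 correlation
x3 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.8 * v + random.gauss(0, 1) for v in x3]
x2 = [0.5 * v + random.gauss(0, 1) for v in x3]

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

x1, x2, x3 = center(x1), center(x2), center(x3)

def sd(v):
    return math.sqrt(sum(x * x for x in v) / len(v))

def corr(u, v):
    return sum(a * b for a, b in zip(u, v)) / (len(u) * sd(u) * sd(v))

r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)

# Formula (10.46a)
r12_3 = (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))

# Direct definition (10.46): correlate the residuals X1.3 and X2.3
b13 = corr(x1, x3) * sd(x1) / sd(x3)
b23 = corr(x2, x3) * sd(x2) / sd(x3)
res1 = [a - b13 * c for a, c in zip(x1, x3)]
res2 = [b - b23 * c for b, c in zip(x2, x3)]
r_resid = corr(res1, res2)
```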
Remarks 1. The expressions for r13.2 and r23.1 can be similarly obtained, to give
r13.2 = (r13 − r12r32)/√[(1 − r12²)(1 − r32²)]
and r23.1 = (r23 − r21r31)/√[(1 − r21²)(1 − r31²)]
2. If r12.3 = 0, then r12 = r13r23. It means r12 will not be zero if X3 is correlated with both X1 and X2. Thus, although X1 and X2 may be uncorrelated when the effect of X3 is eliminated, yet X1 and X2 may appear to be correlated because they carry the effect of X3 on them.
3. The partial correlation coefficient helps in deciding whether or not to include an additional independent variable in regression analysis.
4. We know that σ1²(1 − r12²) and σ1²(1 − r13²) are the residual variances if X1 is estimated from X2 and X3 individually, while σ1²(1 − R²1.23) is the residual variance if X1 is estimated from X2 and X3 taken together. So from the above remark and R²1.23 ≥ r12², r13², it follows that the inclusion of an additional variable can only reduce the residual variance. Now the inclusion of X3, when X2 has already been taken for predicting X1, is worthwhile only when the resultant reduction in the residual variance is substantial. This will be the case when r13.2 is sufficiently large. Thus in this respect the partial correlation coefficient has its significance in regression analysis.
10·15·1. Generalisation. In the case of n variables X1, X2, ..., Xn, the partial correlation coefficient r12.34...n between X1 and X2 (after the linear effect of X3, X4, ..., Xn on them has been eliminated) is given by
r²12.34...n = b12.34...n × b21.34...n
But we have
b12.34...n = −(σ1/σ2)·(ω12/ω11)
b21.34...n = −(σ2/σ1)·(ω21/ω22) [cf. Equation (10·40)]
∴ r²12.34...n = [−(σ1/σ2)·(ω12/ω11)]·[−(σ2/σ1)·(ω21/ω22)] = ω12²/(ω11ω22)
⇒ r12.34...n = −ω12/√(ω11ω22) ...(10·46b)
the negative sign being taken since the sign of the regression coefficient is the same as that of (−ω12).
10·16. Multiple Correlation in Terms of Total and Partial Correlations.
1 − R²1.23 = (1 − r12²)(1 − r²13.2) ...(10·46c)
Proof. We have
1 − R²1.23 = 1 − (r12² + r13² − 2r12r13r23)/(1 − r23²)
= (1 − r12² − r13² − r23² + 2r12r13r23)/(1 − r23²)
Also
1 − r²13.2 = 1 − (r13 − r12r23)²/[(1 − r12²)(1 − r23²)]
= (1 − r12² − r23² − r13² + 2r12r13r23)/[(1 − r12²)(1 − r23²)]
∴ (1 − r12²)(1 − r²13.2) = (1 − r12² − r13² − r23² + 2r12r13r23)/(1 − r23²) = 1 − R²1.23
Hence the result.
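The identity (10·46c) is easy to confirm arithmetically for any admissible triple of zero-order coefficients; a sketch with the arbitrary illustrative values r12 = 0·6, r13 = 0·7, r23 = 0·8:

```python
import math

# Any consistent triple of zero-order correlations will do
r12, r13, r23 = 0.6, 0.7, 0.8

# Multiple correlation (10.45a) and first-order partial (10.46a analogue)
R2 = (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)
r13_2 = (r13 - r12 * r23) / math.sqrt((1 - r12**2) * (1 - r23**2))

lhs = 1 - R2
rhs = (1 - r12**2) * (1 - r13_2**2)   # identity (10.46c)
```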
Theorem. Any standard deviation of order 'p' may be expressed in terms of a standard deviation of order (p − 1) and a partial correlation coefficient of order (p − 1).
Proof. Let us consider the sum:
ΣX²1.23...n = ΣX1.23...nX1.23...n = ΣX1.23...(n−1)X1.23...n (cf. Property 2, §10·13)
= Σ[X1.23...(n−1)(X1 − b12.34...nX2 − ... − b1(n−1).23...nXn−1 − b1n.23...(n−1)Xn)]
= ΣX1.23...(n−1)X1 − b1n.23...(n−1) ΣX1.23...(n−1)Xn (cf. Property 1, §10·13)
= ΣX²1.23...(n−1) − b1n.23...(n−1) ΣX1.23...(n−1)Xn.23...(n−1) (cf. Property 2, §10·13)
Dividing both sides by N (total number of observations), we get
σ²1.23...n = σ²1.23...(n−1) − b1n.23...(n−1)·Cov(X1.23...(n−1), Xn.23...(n−1))
The regression coefficient of Xn.23...(n−1) on X1.23...(n−1) is given by
bn1.23...(n−1) = Cov(X1.23...(n−1), Xn.23...(n−1))/σ²1.23...(n−1)
∴ σ²1.23...n = σ²1.23...(n−1)[1 − b1n.23...(n−1)·bn1.23...(n−1)]
= σ²1.23...(n−1)[1 − r²1n.23...(n−1)] ...(10·47)
a formula which expresses a standard deviation of order (n − 1) in terms of a standard deviation of order (n − 2) and a partial correlation coefficient of order (n − 2). If we take p = (n − 1), the theorem is established.
Cor. 1. From (10·47), we have
σ²1.23...(n−1) = σ²1.23...(n−2)(1 − r²1(n−1).23...(n−2)) ...(10·47a)
and so on. Thus the repeated application of (10·47) gives
σ²1.23...n = σ1²(1 − r12²)(1 − r²13.2)(1 − r²14.23)···(1 − r²1n.23...(n−1)) ...(10·47b)
Since partial correlation coefficients cannot exceed unity numerically, we get from (10·47), (10·47a), and so on,

σ²1.23...n ≤ σ²1.23...(n−1) ≤ σ²1.23...(n−2) ≤ ... ≤ σ1² ...(10·47c)
Cor. 2. Also, we have
σ²1.23...n = σ1²(1 − R²1.23...n)
On using (10·47b), we get
1 − R²1.23...n = (1 − r12²)(1 − r²13.2)···(1 − r²1n.23...(n−1)) ...(10·47d)
This is the generalisation of the result obtained in (10·46c).
Since |rij.(s)| ≤ 1; s = 0, 1, 2, ..., (n − 2),
where rij.(s) is a partial correlation coefficient of order s, we get from (10·47d)
1 − R²1.23...n ≤ 1 − r12²,
1 − R²1.23...n ≤ 1 − r²13.2,
and so on,
i.e., R²1.23...n ≥ r12², r²13.2, ..., r²1n.23...(n−1) ...(10·47e)
Since R1.23...n is symmetric in its secondary subscripts, we get
R²1.23...n ≥ r1i², (i = 2, 3, ..., n)
R²1.23...n ≥ r²1i.j (i ≠ j = 2, 3, ..., n) ...(10·47f)
and so on.
10·17. Expression for Regression Coefficients in Terms of Regression Coefficients of Lower Order. Consider
ΣX1.34...nX2.34...n = ΣX1.34...(n−1)X2.34...n (cf. Property 2, §10·13)
= ΣX1.34...(n−1)(X2 − b23.4...nX3 − ... − b2n.34...(n−1)Xn)
= ΣX1.34...(n−1)X2 − b2n.34...(n−1) ΣX1.34...(n−1)Xn (cf. Property 1, §10·13)
= ΣX1.34...(n−1)X2.34...(n−1) − b2n.34...(n−1) ΣX1.34...(n−1)Xn.34...(n−1)
Dividing both sides by N, the total number of observations, we get
Cov(X1.34...n, X2.34...n) = Cov(X1.34...(n−1), X2.34...(n−1)) − b2n.34...(n−1)·Cov(X1.34...(n−1), Xn.34...(n−1))
⇒ b12.34...n σ²2.34...n = b12.34...(n−1) σ²2.34...(n−1) − b2n.34...(n−1) b1n.34...(n−1) σ²n.34...(n−1)
On using (10·47), we get
b12.34...n σ²2.34...(n−1)(1 − r²2n.34...(n−1))
= σ²2.34...(n−1)·[b12.34...(n−1) − b2n.34...(n−1) b1n.34...(n−1)·σ²n.34...(n−1)/σ²2.34...(n−1)] ...(*)
In the case of two variables, we have
bij σj² = Cov(Xi, Xj) = bji σi² ⇒ bij = (σi²/σj²)·bji
∴ b2n.34...(n−1)·σ²n.34...(n−1)/σ²2.34...(n−1) = bn2.34...(n−1)
Hence from (*), we get
b12.34...n (1 − r²2n.34...(n−1)) = b12.34...(n−1) − b1n.34...(n−1) bn2.34...(n−1)
⇒ b12.34...n = [b12.34...(n−1) − b1n.34...(n−1) bn2.34...(n−1)] / (1 − r²2n.34...(n−1)) ...(10·48)
⇒ b12.34...n = [b12.34...(n−1) − b1n.34...(n−1) bn2.34...(n−1)] / (1 − b2n.34...(n−1) bn2.34...(n−1)) ...(10·48a)

10·18. Expression for Partial Correlation Coefficient in Terms of Correlation Coefficients of Lower Order. By definition, we have
bij.k...l = rij.k...l × (σi.k...l/σj.k...l) ...(*)
∴ b1n.34...(n−1)·bn2.34...(n−1)
= r1n.34...(n−1)·(σ1.34...(n−1)/σn.34...(n−1)) × rn2.34...(n−1)·(σn.34...(n−1)/σ2.34...(n−1))
= r1n.34...(n−1)·rn2.34...(n−1)·(σ1.34...(n−1)/σ2.34...(n−1)) ...(**)
Hence from (10·48), on using (*) and (**), we get
r12.34...n × (σ1.34...n/σ2.34...n)
= [{r12.34...(n−1) − r1n.34...(n−1)·rn2.34...(n−1)}·σ1.34...(n−1)/σ2.34...(n−1)] / (1 − r²2n.34...(n−1)) ...(***)
Also on using (10·47), we get
σ1.34...n/σ2.34...n = (σ1.34...(n−1)/σ2.34...(n−1)) × [(1 − r²1n.34...(n−1))/(1 − r²2n.34...(n−1))]^(1/2)
Hence from (***), we get
r12.34...n·[(1 − r²1n.34...(n−1))/(1 − r²2n.34...(n−1))]^(1/2)
= [r12.34...(n−1) − r1n.34...(n−1)·rn2.34...(n−1)] / (1 − r²2n.34...(n−1))
⇒ r12.34...n = (r12.34...(n−1) − r1n.34...(n−1)·r2n.34...(n−1)) / √[(1 − r²1n.34...(n−1))(1 − r²2n.34...(n−1))] ...(10·49)
which is an expression for the correlation coefficient of order p = (n − 2) in terms of the correlation coefficients of order (p − 1) = (n − 3).
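The recursion (10·49) can be exercised numerically for n = 4: starting from the six zero-order sample correlations, apply (10·49) twice to get r12.34, and compare with the correlation between the residuals of X1 and X2 after regressing each on X3 and X4. The data-generating scheme and seed are arbitrary illustrative choices.

```python
import math
import random

random.seed(8)
n = 400

# Four correlated variables
x4 = [random.gauss(0, 1) for _ in range(n)]
x3 = [0.5 * d + random.gauss(0, 1) for d in x4]
x1 = [0.6 * c + 0.3 * d + random.gauss(0, 1) for c, d in zip(x3, x4)]
x2 = [0.4 * c + 0.5 * d + random.gauss(0, 1) for c, d in zip(x3, x4)]

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

x1, x2, x3, x4 = (center(v) for v in (x1, x2, x3, x4))

def sd(v):
    return math.sqrt(sum(x * x for x in v) / len(v))

def corr(u, v):
    return sum(a * b for a, b in zip(u, v)) / (len(u) * sd(u) * sd(v))

def partial(rij, rik, rjk):
    """First-order partial correlation r_ij.k from zero-order coefficients."""
    return (rij - rik * rjk) / math.sqrt((1 - rik**2) * (1 - rjk**2))

r12, r13, r14 = corr(x1, x2), corr(x1, x3), corr(x1, x4)
r23, r24, r34 = corr(x2, x3), corr(x2, x4), corr(x3, x4)

# Recursion (10.49) applied twice: r12.34 from first-order coefficients
r12_3 = partial(r12, r13, r23)
r14_3 = partial(r14, r13, r34)
r24_3 = partial(r24, r23, r34)
r12_34 = (r12_3 - r14_3 * r24_3) / math.sqrt((1 - r14_3**2) * (1 - r24_3**2))

# Direct route: correlate residuals after regressing X1 and X2 on X3, X4
def residual(y, u, v):
    """Least-squares residual of y on two centred regressors u and v."""
    suu = sum(a * a for a in u)
    svv = sum(a * a for a in v)
    suv = sum(a * b for a, b in zip(u, v))
    syu = sum(a * b for a, b in zip(y, u))
    syv = sum(a * b for a, b in zip(y, v))
    det = suu * svv - suv * suv
    bu = (syu * svv - syv * suv) / det
    bv = (suu * syv - suv * syu) / det
    return [a - bu * b - bv * c for a, b, c in zip(y, u, v)]

r_direct = corr(residual(x1, x3, x4), residual(x2, x3, x4))
```

The two values coincide, confirming that (10·49) reproduces the residual-based definition of a second-order partial correlation.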
Example 10·33. From the data relating to the yield of dry bark (X1), height (X2) and girth (X3) for 18 cinchona plants, the following correlation coefficients were obtained:
r12 = 0·77, r13 = 0·72 and r23 = 0·52
Find the partial correlation coefficient r12.3 and the multiple correlation coefficient R1.23.
Solution.
r12.3 = (r12 − r13r23)/√[(1 − r13²)(1 − r23²)]
= (0·77 − 0·72 × 0·52)/√{[1 − (0·72)²][1 − (0·52)²]} = 0·667
R²1.23 = (r12² + r13² − 2r12r13r23)/(1 − r23²)
= [(0·77)² + (0·72)² − 2 × 0·77 × 0·72 × 0·52]/[1 − (0·52)²] = 0·7329
∴ R1.23 = 0·8561
(since the multiple correlation coefficient is non-negative).
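The arithmetic of this example can be reproduced mechanically from formulas (10·46a) and (10·45a); a sketch:

```python
import math

r12, r13, r23 = 0.77, 0.72, 0.52

# Partial correlation coefficient (10.46a)
r12_3 = (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))

# Multiple correlation coefficient (10.45a)
R2 = (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)
R = math.sqrt(R2)
```

This gives r12.3 ≈ 0·667 and R1.23 ≈ 0·8561.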
Example 10·34. In a trivariate distribution:
σ1 = 2, σ2 = σ3 = 3, r12 = 0·7, r23 = r31 = 0·5.
Find (i) r23.1, (ii) R1.23, (iii) b12.3 and b13.2, and (iv) σ1.23.
Solution. We have
(i) r23.1 = (r23 − r21r31)/√[(1 − r21²)(1 − r31²)] = (0·5 − 0·7 × 0·5)/√[(1 − 0·49)(1 − 0·25)] = 0·2425
(ii) R²1.23 = (r12² + r13² − 2r12r13r23)/(1 − r23²)
= [0·49 + 0·25 − 2(0·7)(0·5)(0·5)]/(1 − 0·25) = 0·52
∴ R1.23 = +0·7211
(iii) r12.3 = (r12 − r13r23)/√[(1 − r13²)(1 − r23²)] = (0·7 − 0·25)/√(0·75 × 0·75) = 0·6
σ1.3 = σ1√(1 − r13²) = 2√(1 − 0·25) = 1·7320; σ2.3 = σ2√(1 − r23²) = 3√(1 − 0·25) = 2·5980
σ1.2 = σ1√(1 − r12²) = 2√(1 − 0·49) = 1·4283; σ3.2 = σ3√(1 − r23²) = 3√(1 − 0·25) = 2·5980
r13.2 = (r13 − r12r23)/√[(1 − r12²)(1 − r23²)] = (0·5 − 0·35)/√(0·51 × 0·75) = 0·2425
Hence b12.3 = r12.3·σ1.3/σ2.3 = 0·4 and b13.2 = r13.2·σ1.2/σ3.2 = 0·1333
(iv) σ1.23 = σ1√(ω/ω11),
where ω = |1 r12 r13; r21 1 r23; r31 r32 1| = 1 − r12² − r13² − r23² + 2r12r13r23 = 0·36
and ω11 = |1 r23; r23 1| = 1 − r23² = 1 − 0·25 = 0·75
∴ σ1.23 = 2 × √0·48 = 2 × 0·6928 = 1·3856
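All four answers of this example can be verified with a few lines of arithmetic; a sketch:

```python
import math

s1, s2, s3 = 2.0, 3.0, 3.0
r12, r23, r31 = 0.7, 0.5, 0.5
r13, r21 = r31, r12

# (i) partial correlation r23.1
r23_1 = (r23 - r21 * r31) / math.sqrt((1 - r21**2) * (1 - r31**2))

# (ii) multiple correlation (10.45a)
R2 = (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)

# (iii) partial regression coefficients
b12_3 = (s1 / s2) * (r12 - r13 * r23) / (1 - r23**2)
b13_2 = (s1 / s3) * (r13 - r12 * r23) / (1 - r23**2)

# (iv) residual standard deviation via (10.43a)
omega = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
omega11 = 1 - r23**2
s1_23 = s1 * math.sqrt(omega / omega11)
```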
Example 10·35. Find the equation of Xl on X2 and X3 given
thefollowing resUlts :-
Trail Mean Standard deviation
XI 28·024·42
X2 4·91 1·10 -0·56
X, 594 85 -0·40
where Xl -= "Seed peT aCTe; X2 =Rainfallin inches
X3 =Accumulated temperature above 42°F.
Solution. Regression equation of Xl on X2 and X3 is given by
(Xl -Xl).!!!!!. + (X 2 -Xz) C.t>12 + (X 3 -X 3) C.t>103 =0

I
<11 <12 CJ:J
I Tn T13
where C.t> - T21 1
- T31 Tn ·1

C.t>ll =I 1.= I- T 232 =1-(-0·56)2 =0·686


C.t>12 ,... - II T21
T31
T13
1
.I= TU Tl3 - T21 =- 0·576
C.t>13 =T23 T12 - T13 = (- 0·56) (O.go) - (.... 040) =- 0·048
: ..Required equation ofpIane of regression of Xl onX2 andX3 is given by

442
(X .... 28·02) + (..,..0·576) (X .... 4·91) of (-0·048)
1"' 1.10 2' "85.00
(X3 - 594) =0
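The cofactors and the implied partial regression coefficients of this example can be recomputed as follows (r12 = 0·80 is the value implied by the worked cofactor ω12 = −0·576; the other figures come from the table):

```python
# Verification of the cofactors and regression coefficients of Example 10.35
s1, s2, s3 = 4.42, 1.10, 85.0
r12, r23, r31 = 0.80, -0.56, -0.40
r13 = r31

# Cofactors of omega appearing in the plane of regression
w11 = 1 - r23**2                  # ~0.6864
w12 = -(r12 - r13 * r23)          # cofactor of the (1,2) element
w13 = r12 * r23 - r13             # cofactor of the (1,3) element

# Partial regression coefficients via (10.33)
b12_3 = -(s1 / s2) * (w12 / w11)
b13_2 = -(s1 / s3) * (w13 / w11)
```

This reproduces ω12 = −0·576, ω13 = −0·048, b12.3 ≈ 3·37 and b13.2 ≈ 0·0036.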

Example 10·36. Five hundred students were examined in three subjects I


1/ and III; each subject carrying 100 marks. A student getting 120 or more bu;
less than 150 marks was put in pass class. A student getting 150 or more bur
less than 180 marks was put iii second class and a student getting 180 or more
marks was put in the first class. The following marks were obtained:
, I n I/-
Mean: 35·8 524 48·8
SD. : 4·2 5·3 6·1
Correlation: r12 =0·6, r13 =0·7 "23 =0-8
(J) Find the number of students in each of the three classes.
(ii) Find the total number of students with total marks lying between 120
and 190.
(ii.) Find the probabil.ity that a student gets more that 240 marks.
(iv) What should be the correlation between marks in subjects I and II
among students who scored equal marks in subject 11/ '1
(v) If r23 was not knowri,'pbtain limits within which it may lit from the
values of r12 and r13 (ignoring sampling errors). .
SoJution. If Z denotes dte total o( the students in the three subjects
and X 1" X2, X 3 the total, marks of the students in subjects I, II and III
respectively, then
Z =X 1 +X2 +X3
•• E(Z) = E(X1) + E(Xz) + E(X3) = 35·8 + 52·4 + 48·8 = 137
=
V(Z) V(X 1) + V(Xz) + V(X3)
. +2(Cov (Xl'Xz) + Cov (X2,X:i)-+ Cov (X3;X1)]
= 17-64 + 28.()9 + 37·21 + 26·712 + 35·868 + 51·728
= 197·248 . [Using Cov (Xi, X)) =rq<Yiap
=> <Y; = 197·248 or <Yz = 14·045
Now ; =Z- - N(O, 1)
<Yz

Z ; Z-137
14·045 P= J
-00
p(;)d; Class
Area ,,,.der the
curve in this
class (A)
Frequenq
500 x (A)

120 - 1·21050 0·11314 120 - 150 0·70937 354·685


150 0'92567 0·82251 159 -180 0·17639 88:195
180 3·06180 0·99890 180 - 0·00102 0·510
190 3·77400 0·99992 120 -; 190 0·88678 443·390
240 7·33410 - 240-
. 0·00000 0·000
(I) The number of students in fIrst. second and third class respectiv.ely are
355, 88 and 0 (approx.)
(iJ) Total number of students with total marts between 120 and 190 is 443.
(ii.) Probability that a student gets than 240 marks is zero,
(iv) The correlation coefficient marks in subjects I and.n of the
Sbldents who secured eqQ81 marlcs in subject m is rl2-3 and is given by
eorreJationand R.eer-ion 10·123

_ '12 - '13 '23


'12·3 - = . 0·04 = 0.0934
V(1 - '132)(1 - '232) ...J (1 - 0·49)(1 - 0·61)
(v)Wehave.
2 ('12 - '13 '23)2 S 1
'12·3 = (1 - '132)f1 - '232)

... (0·6 - 0·7a)2


(1 _ 0.49)(1- a2) S I, where a = '23·
0·36 + 049a2 - O·84a S 0·51 (1' - n2)
=> a 2 -0·840 -0·15 SO
Thus •a' .lies between the rootS of the equation:
a 2 -O·84a -0·15 ='0,
which are 0·99 and - 0·15.
Hence '23 should lie between - 0·15 and 0·99.
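Parts (i) and (ii) can be checked with the normal c.d.f. from the Python standard library; the sketch below recomputes E(Z), σZ and the pass- and second-class frequencies, with math.erf standing in for the normal area tables used in the text.

```python
# Recompute the totals of this example: Z = X1 + X2 + X3 with the
# stated means, s.d.'s and correlations, then the class frequencies
# for 500 students via the normal c.d.f.
import math

means = (35.8, 52.4, 48.8)
sds = (4.2, 5.3, 6.1)
r = {(0, 1): 0.6, (0, 2): 0.7, (1, 2): 0.8}

mean_z = sum(means)
var_z = sum(s * s for s in sds) + 2 * sum(
    rij * sds[i] * sds[j] for (i, j), rij in r.items())
sd_z = math.sqrt(var_z)

def phi(x):                        # standard normal c.d.f.
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def count(lo, hi, n=500):          # expected students with lo <= Z < hi
    return n * (phi((hi - mean_z) / sd_z) - phi((lo - mean_z) / sd_z))

pass_class = count(120, 150)
second_class = count(150, 180)
```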
Example 10·37. Prove that
1 - R1.23² = (1 - r12²)(1 - r13.2²)
Deduce that
(i) R1.23 ≥ r12, (ii) R1.23² = r12² + r13², if r23 = 0,
(iii) 1 - R1.23² = (1 - p)(1 + 2p)/(1 + p), provided all the coefficients of zero order are equal to p,
(iv) if R1.23 = 0, X1 is uncorrelated with any of the other variables, i.e., r12 = r13 = 0. [Delhi Univ. B.Sc. (Stat. Hons.), 1989]

Solution. (i) Since |r13.2| ≤ 1, we have from (10·46c)
1 - R1.23² ≤ 1 - r12² ⇒ R1.23 ≥ r12
(ii) We have
r13.2 = (r13 - r12 r23)/√[(1 - r12²)(1 - r23²)] = r13/√(1 - r12²), if r23 = 0
∴ From (10·46c), we get
1 - R1.23² = (1 - r12²)[1 - r13²/(1 - r12²)] = 1 - r12² - r13²
Hence R1.23² = r12² + r13², if r23 = 0.
(iii) Here we are given r12 = r13 = r23 = p
∴ r13.2 = (p - p²)/√[(1 - p²)(1 - p²)] = p(1 - p)/(1 - p²) = p/(1 + p)
Hence from (10·46c), we have
1 - R1.23² = (1 - p²)[1 - p²/(1 + p)²] = (1 - p)(1 + 2p)/(1 + p)
(iv) If R1.23 = 0, (10·46c) gives
1 = (1 - r12²)(1 - r13.2²)   ...(*)
Since 0 ≤ r12² ≤ 1 and 0 ≤ r13.2² ≤ 1, (*) will hold if and only if
r12 = 0 and r13.2 = 0
Now r13.2 = 0 ⇒ (r13 - r12 r32)/√[(1 - r12²)(1 - r32²)] = 0
⇒ r13/√(1 - r32²) = 0   (∵ r12 = 0)
⇒ r13 = 0
Thus if R1.23 = 0, then r13 = r12 = 0, i.e., X1 is uncorrelated with X2 and X3.
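The identity just proved is easy to verify numerically; in the sketch below the values of r12, r13, r23 are arbitrary illustrative choices, and the multiple correlation is computed from its standard closed form.

```python
# Numerical check of 1 - R_{1.23}^2 = (1 - r12^2)(1 - r13.2^2).
import math

def r13_2(r12, r13, r23):          # first-order partial correlation
    return (r13 - r12 * r23) / math.sqrt((1 - r12**2) * (1 - r23**2))

def R2_123(r12, r13, r23):         # squared multiple correlation of X1 on X2, X3
    return (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)

r12, r13, r23 = 0.5, 0.6, 0.3
lhs = 1 - R2_123(r12, r13, r23)
rhs = (1 - r12**2) * (1 - r13_2(r12, r13, r23)**2)
# Deduction (ii): with r23 = 0 the multiple correlation collapses
# to the sum of the squared total correlations.
special = R2_123(r12, r13, 0.0)
```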
Example 10·38. Show that the correlation coefficient between the residuals X1.23 and X2.13 is equal and opposite to that between X1.3 and X2.3.
[Poona Univ. B.Sc., 1991]
Solution. The correlation between X1.23 and X2.13 is given by
r(X1.23, X2.13) = Cov(X1.23, X2.13)/(σ1.23 σ2.13)
= (1/N) Σ X1.23 X2.13 / (σ1.23 σ2.13)
= (1/N) Σ X2.13 (X1 - b12.3 X2 - b13.2 X3) / (σ1.23 σ2.13)
= -b12.3 (1/N) Σ X2.13 X2 / (σ1.23 σ2.13)      (cf. Property 1, § 10·13)
= -b12.3 (1/N) Σ X2.13² / (σ1.23 σ2.13)        (cf. Property 2, § 10·13)
= -b12.3 σ2.13/σ1.23
= -b12.3 [σ2 √(ω/ω22)] / [σ1 √(ω/ω11)],

          | 1    r12  r13 |
where ω = | r21  1    r23 | ,
          | r31  r32  1   |

ω11 = | 1    r23 | = 1 - r23²  and  ω22 = | 1    r13 | = 1 - r13²
      | r32  1   |                        | r31  1   |

∴ r(X1.23, X2.13) = -b12.3 (σ2/σ1) √[(1 - r23²)/(1 - r13²)] = -b12.3 σ2.3/σ1.3
[since σ2.3² = σ2²(1 - r23²) and σ1.3² = σ1²(1 - r13²)]
Also b12.3 = Cov(X1.3, X2.3)/σ2.3², so that
r(X1.23, X2.13) = -Cov(X1.3, X2.3)/(σ2.3 σ1.3) = -r(X1.3, X2.3)
Hence the result.
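Since the result is an algebraic identity in the sample moments, it holds exactly (up to rounding) for any data set. The following sketch checks it on synthetic data; the data-generating coefficients are arbitrary assumptions, and the residuals are computed by ordinary least squares in pure Python.

```python
# Check r(X1.23, X2.13) = -r(X1.3, X2.3) on synthetic data.
import random

random.seed(1)
n = 200
x3 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.5 * v + random.gauss(0, 1) for v in x3]     # illustrative model
x2 = [-0.3 * v + random.gauss(0, 1) for v in x3]    # illustrative model

def mean(a):
    return sum(a) / len(a)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)

def corr(a, b):
    return cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5

def resid1(y, x):                  # residual of y on one regressor
    b = cov(y, x) / cov(x, x)
    my, mx = mean(y), mean(x)
    return [yi - my - b * (xi - mx) for yi, xi in zip(y, x)]

def resid2(y, xa, xb):             # residual of y on two regressors
    # Solve the 2x2 normal equations by Cramer's rule.
    saa, sbb, sab = cov(xa, xa), cov(xb, xb), cov(xa, xb)
    sya, syb = cov(y, xa), cov(y, xb)
    det = saa * sbb - sab * sab
    ba = (sya * sbb - syb * sab) / det
    bb = (syb * saa - sya * sab) / det
    my, ma, mb = mean(y), mean(xa), mean(xb)
    return [yi - my - ba * (ai - ma) - bb * (bi - mb)
            for yi, ai, bi in zip(y, xa, xb)]

lhs = corr(resid2(x1, x2, x3), resid2(x2, x1, x3))   # r(X1.23, X2.13)
rhs = -corr(resid1(x1, x3), resid1(x2, x3))          # -r(X1.3, X2.3)
```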


Example 10·39. Show that if X3 = aX1 + bX2, the three partial correlation coefficients are numerically equal to unity, r13.2 having the sign of a, r23.1 the sign of b, and r12.3 the opposite sign of a/b. [Kanpur Univ. M.Sc., 1993]

Solution. Here we may regard X3 as dependent on X1 and X2, which may be taken as independent variables. Since X1 and X2 are independent, they are uncorrelated.
Thus r(X1, X2) = 0 ⇒ Cov(X1, X2) = 0
V(X3) = V(aX1 + bX2) = a²V(X1) + b²V(X2) + 2ab Cov(X1, X2) = a²σ1² + b²σ2²,
where V(X1) = σ1², V(X2) = σ2². Write k² = a²σ1² + b²σ2².
Also X1X3 = X1(aX1 + bX2) = aX1² + bX1X2
Assuming that the Xi's are measured from their means, on taking expectations of both sides, we get
Cov(X1, X3) = aσ1² + b Cov(X1, X2) = aσ1²
∴ r13 = Cov(X1, X3)/√[V(X1)V(X3)] = aσ1²/√[σ1²(a²σ1² + b²σ2²)] = aσ1/k
Similarly, we will get
r23 = Cov(X2, X3)/√[V(X2)V(X3)] = bσ2/k
Since r12 = 0,
r13.2 = (r13 - r12 r23)/√[(1 - r12²)(1 - r23²)] = (aσ1/k) · k/√(k² - b²σ2²) = aσ1/√(a²σ1²) = ±1,
according as 'a' is positive or negative. Hence r13.2 has the same sign as 'a'.
Again
r23.1 = (r23 - r21 r31)/√[(1 - r21²)(1 - r31²)] = (bσ2/k) · k/√(k² - a²σ1²) = bσ2/√(b²σ2²) = ±1,
according as 'b' is positive or negative. Hence r23.1 has the same sign as 'b'.
Now
r12.3 = (r12 - r13 r23)/√[(1 - r13²)(1 - r23²)] = -(abσ1σ2/k²) · k²/√[(k² - a²σ1²)(k² - b²σ2²)]
      = -abσ1σ2/(|a|σ1 |b|σ2) = -(a/b)/|a/b| = ∓1,
according as a/b is positive or negative. Hence r12.3 has the sign opposite to that of a/b.
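A numerical illustration of the signs derived above (the choices a = 2, b = -3 and the two standard deviations are assumptions made for the illustration):

```python
# With X3 = a*X1 + b*X2 and X1, X2 uncorrelated, the three partial
# correlations are +/-1, with signs as derived in the text.
import math

def partials(a, b, s1, s2):
    k = math.sqrt(a * a * s1 * s1 + b * b * s2 * s2)
    r12, r13, r23 = 0.0, a * s1 / k, b * s2 / k

    def p(rij, rik, rjk):          # first-order partial correlation
        return (rij - rik * rjk) / math.sqrt((1 - rik**2) * (1 - rjk**2))

    return p(r13, r12, r23), p(r23, r12, r13), p(r12, r13, r23)

# a > 0, b < 0, so a/b < 0: expect r13.2 = +1, r23.1 = -1, r12.3 = +1.
r13_2, r23_1, r12_3 = partials(a=2.0, b=-3.0, s1=1.5, s2=0.5)
```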

Example 10·40. If all the coefficients of zero order in a set of p variates are equal to ρ, show that
(i) every partial correlation coefficient of s-th order is ρ/(1 + sρ)   ...(*)
(ii) the coefficient of multiple correlation R of a variate with the other (p - 1) variates is given by
1 - R² = (1 - ρ)[1 + (p - 1)ρ]/[1 + (p - 2)ρ]
[Delhi Univ. M.Sc. (Maths.), 1990]

Solution. (i) We are given
r_mn = ρ, (m, n = 1, 2, ..., p; m ≠ n)
We have
r_ij.k = (r_ij - r_ik r_jk)/√[(1 - r_ik²)(1 - r_jk²)]   (i, j, k = 1, 2, ..., p; i ≠ j ≠ k)
       = (ρ - ρ²)/√[(1 - ρ²)(1 - ρ²)] = ρ(1 - ρ)/(1 - ρ²) = ρ/(1 + ρ)   ...(**)
Thus every partial correlation coefficient of first order is ρ/(1 + ρ)
⇒ (*) is true for s = 1.
The result will be established by the principle of mathematical induction. Let us suppose that every partial correlation coefficient of order s is given by ρ/(1 + sρ). Then a partial correlation coefficient of order (s + 1) is given by
r_ij.km...t = (r_ij.(s) - r_ik.(s) r_jk.(s))/√[(1 - r_ik.(s)²)(1 - r_jk.(s)²)],
where k, m, ..., t are (s + 1) secondary subscripts and r_ij.(s), r_ik.(s), r_jk.(s) are partial correlation coefficients of order s. Thus
r_ij.km...t = [ρ/(1 + sρ) - ρ²/(1 + sρ)²] / [1 - ρ²/(1 + sρ)²]
            = ρ[(1 + sρ) - ρ] / [(1 + sρ)² - ρ²]
            = ρ[1 + (s - 1)ρ] / {[1 + (s - 1)ρ][1 + (s + 1)ρ]}
            = ρ/[1 + (s + 1)ρ]   ...(***)
Using (**) and (***), the required result follows by induction.

(ii) We have 1 - R² = ω/ω11,
where R is the multiple correlation coefficient of a variate with the other (p - 1) variates,

     | 1  ρ  ρ ... ρ |
     | ρ  1  ρ ... ρ |
ω =  | ρ  ρ  1 ... ρ | , a determinant of order p,
     | ⋮             |
     | ρ  ρ  ρ ... 1 |

and ω11 is the corresponding determinant of order (p - 1).
Adding all the remaining columns to the first and taking out the common factor [1 + (p - 1)ρ], we have

                   | 1  ρ  ρ ... ρ |
                   | 1  1  ρ ... ρ |
ω = [1 + (p - 1)ρ] | 1  ρ  1 ... ρ |
                   | ⋮             |
                   | 1  ρ  ρ ... 1 |

On operating Ri - R1 (i = 2, 3, ..., p), this becomes

                   | 1    ρ      ρ    ...   ρ    |
                   | 0  (1-ρ)    0    ...   0    |
ω = [1 + (p - 1)ρ] | 0    0    (1-ρ)  ...   0    |
                   | ⋮                           |
                   | 0    0      0    ... (1-ρ)  |

∴ ω = [1 + (p - 1)ρ](1 - ρ)^(p-1)
Similarly, we will have
ω11 = [1 + (p - 2)ρ](1 - ρ)^(p-2)
∴ 1 - R² = ω/ω11 = (1 - ρ)[1 + (p - 1)ρ]/[1 + (p - 2)ρ]
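Both determinant evaluations can be verified by direct elimination; the sketch below computes the p × p equicorrelation determinant in pure Python, compares it with [1 + (p-1)ρ](1-ρ)^(p-1), and likewise checks the expression for 1 - R². The values of p and ρ are illustrative.

```python
# Check omega_p = [1 + (p-1)*rho] * (1-rho)**(p-1) and the 1 - R^2 formula.
def det(m):
    # Determinant by Gaussian elimination with partial pivoting.
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(m[r][i]))
        if piv != i:
            m[i], m[piv] = m[piv], m[i]
            d = -d
        d *= m[i][i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= f * m[i][c]
    return d

def equicorr(p, rho):
    return [[1.0 if i == j else rho for j in range(p)] for i in range(p)]

p, rho = 5, 0.3
omega = det(equicorr(p, rho))
closed = (1 + (p - 1) * rho) * (1 - rho) ** (p - 1)
omega11 = det(equicorr(p - 1, rho))
one_minus_R2 = omega / omega11
formula = (1 - rho) * (1 + (p - 1) * rho) / (1 + (p - 2) * rho)
```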
Example 10'41. In a p-variate distribution, all the total (order zero)
correlation coefficients are equal 10 Po Let Pl denote the partial correlation
coefficient of order k and be the multiple correlation co'!jficient of one variate'
on k other variatef. Prove that
(i) Pn - (p I)' (ii) Pl- Pl-l =- PlPA>-lt and

( ";\ R Z - k (Delhi Univ. MoSc. (SIaL) 1981]


lll, 1 - 1 + (k - 1 )Po .
Solution. (z) We have proved in 'Example 10·40. that
_ Po
Pl-l +kpo
In the case of p-variate distribution, the partial correlation coefficient of the
highest ord€[, is Pp-z and is given by
_ Po
Pp-z-I + (P _ 2)po
Since I Pp-z lSI -I S Pp-z S 1.
we have (on considering the lower limit)

-I SI or -[I+(p-2)po]Spo
1
10.128

= - (I +
(iiI) Taking P = Po and k = P - I in P81l (ii) Exan;aple 1040, we get
2 .[ 1+ kpo ]
I -R t = (1 Po) I + (k - .1')Po
_ (I - Po)(1 + kpo) _ k p02 ".
Rt 2 - I- I + (k _ I)po - I + (k _ I)po (On simplificatIon).
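The three claims can be confirmed numerically; the values of k and ρ0 below are illustrative choices.

```python
# Check claims (ii) and (iii) of this example for illustrative k, rho0.
k, rho0 = 3, 0.25

def rho_order(k, rho0):
    # Partial correlation of order k when all zero-order coefficients equal rho0.
    return rho0 / (1 + k * rho0)

# (ii): rho_k - rho_{k-1} = -rho_k * rho_{k-1}
lhs = rho_order(k, rho0) - rho_order(k - 1, rho0)
rhs = -rho_order(k, rho0) * rho_order(k - 1, rho0)

# (iii): R_k^2 = k*rho0^2 / (1 + (k-1)*rho0), equivalently
# 1 - R_k^2 = (1 - rho0)(1 + k*rho0) / (1 + (k-1)*rho0)
Rk2 = k * rho0**2 / (1 + (k - 1) * rho0)
one_minus = (1 - rho0) * (1 + k * rho0) / (1 + (k - 1) * rho0)
```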
Example 10·42. If r12 and r13 are given, show that r23 must lie in the range:
r12 r13 ± (1 - r12² - r13² + r12² r13²)^(1/2)
If r12 = k and r13 = -k, show that r23 will lie between -1 and 1 - 2k².
[Sardar Patel Univ. B.Sc. Oct., 1992; Madras Univ. B.Sc. Main, 1991]

Solution. We have
r12.3² = [(r12 - r13 r23)/√((1 - r13²)(1 - r23²))]² ≤ 1
∴ (r12 - r13 r23)² ≤ (1 - r13²)(1 - r23²)
r12² + r13² r23² - 2 r12 r13 r23 ≤ 1 - r13² - r23² + r13² r23²
r12² + r13² + r23² - 2 r12 r13 r23 ≤ 1   ...(*)
This condition holds for consistent values of r12, r13 and r23. (*) may be rewritten as:
r23² - (2 r12 r13) r23 + (r12² + r13² - 1) ≤ 0
Hence, for given values of r12 and r13, r23 must lie between the roots of the quadratic (in r23) equation
r23² - (2 r12 r13) r23 + (r12² + r13² - 1) = 0,
which are given by:
r23 = r12 r13 ± √[r12² r13² - (r12² + r13² - 1)]
Hence
r12 r13 - √(1 - r12² - r13² + r12² r13²) ≤ r23 ≤ r12 r13 + √(1 - r12² - r13² + r12² r13²)   ...(**)
In other words, r23 must lie in the range
r12 r13 ± √(1 - r12² - r13² + r12² r13²)
In particular, if r12 = k and r13 = -k, we get from (**)
-k² - √(1 - k² - k² + k⁴) ≤ r23 ≤ -k² + √(1 - k² - k² + k⁴)
-k² - (1 - k²) ≤ r23 ≤ -k² + (1 - k²)
-1 ≤ r23 ≤ 1 - 2k²
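Note that 1 - r12² - r13² + r12²r13² = (1 - r12²)(1 - r13²), which the sketch below uses; the value of k is illustrative.

```python
# Bounds on r23 given r12 and r13:
# r12*r13 +/- sqrt((1 - r12^2)(1 - r13^2)).
import math

def r23_bounds(r12, r13):
    half_width = math.sqrt((1 - r12**2) * (1 - r13**2))
    centre = r12 * r13
    return centre - half_width, centre + half_width

# Special case r12 = k, r13 = -k collapses to [-1, 1 - 2k^2].
k = 0.6
lo, hi = r23_bounds(k, -k)
```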

EXERCISE 10(g)

1. (a) Explain partial correlation and multiple correlation.
(b) Explain the concepts of multiple and partial correlation coefficients. Show that the multiple correlation coefficient R1.23 is, in the usual notations, given by:
R1.23² = 1 - ω/ω11
2. (a) In the usual notations, prove that
R1.23² = (r12² + r13² - 2 r12 r13 r23)/(1 - r23²)
(b) If R1.23 = 1, prove that R2.13 is also equal to 1. If R1.23 = 0, does it necessarily mean that R2.13 is also zero?
3. (a) Obtain an expression for the variance of the residual X1.23 in terms of the correlations r12, r23 and r31, and deduce the variances of the residuals X1.2 and X1.3.
(b) Show that a standard deviation of order p may be expressed in terms of a standard deviation of order (p - 1) and a correlation coefficient of order (p - 1). Hence deduce that:
(i) σ1 ≥ σ1.2 ≥ σ1.23 ≥ σ1.234 ≥ ...
(ii) 1 - R²1.23...n = (1 - r12²)(1 - r13.2²) ... (1 - r²1n.23...(n-1))
[Delhi Univ. M.Sc. (Stat.), 1981]
4. (a) In a p-variate distribution all the total (zero order) correlation coefficients are equal to ρ0 ≠ 0. If ρk denotes the partial correlation coefficient of order k, find ρk. Hence deduce that:
(i) ρk - ρk-1 = -ρk ρk-1
(ii) ρ0 ≥ -1/(p - 1). [Delhi Univ. M.Sc. (Stat.), 1989]
(b) Show that the multiple correlation coefficient R1.23...j between X1 and (X2, X3, ..., Xj), j = 2, 3, ..., p, satisfies the inequality:
R1.2 ≤ R1.23 ≤ ... ≤ R1.23...p
[Univ. M.Sc. (Maths.), 1989]
5. (a) X0, X1, ..., Xn are (n + 1) variates. Obtain a linear function of X1, X2, ..., Xn which will have maximum correlation with X0. Show that the correlation R of X0 with this linear function is given by
R = (1 - ω/ω00)^(1/2),

       | 1    r01  r02 ... r0n |
       | r10  1    r12 ... r1n |
where ω = | ⋮                  |
       | rn0  rn1  rn2 ... 1   |

and ω00 is the determinant obtained by deleting the first row and the first column of ω.
(b) With the usual notations, prove that
σ²1.234...n = (ω/ω11) σ1² = σ1²(1 - r12²)(1 - r13.2²) ... (1 - r²1n.23...(n-1))
(c) For a trivariate distribution, prove that
r12.3 = (r12 - r13 r23)/√[(1 - r13²)(1 - r23²)]
6. (a) The simple correlation coefficients between temperature (X1), corn yield (X2) and rainfall (X3) are r12 = 0·59, r13 = 0·46 and r23 = 0·17.
Calculate the partial correlation coefficients r12.3, r23.1 and r31.2. Also calculate R1.23.
(b) If r12 = r13 = -0·40 and r23 = -0·56, find the values of r12.3, r13.2 and r23.1. Calculate further R1(23), R2(13) and R3(12).
7. (a) In a certain investigation, the following values were obtained:
r12 = 0·6, r13 = -0·4 and r23 = 0·7
Are the values consistent?
(b) Comment on the consistency of
r12 = 3/5, r23 = 4/5, r31 = -1/2.
(c) Suppose a computer has found, for a given set of values of X1, X2 and X3:
r12 = 0·91, r13 = 0·33 and r32 = 0·81
Examine whether the computations may be said to be free from error.
8. (a) Show that if r12 = r13 = 0, then R1(23) = 0. What is the significance of this result in regard to the multiple regression equation of X1 on X2 and X3?
(b) For what value of R1.23 will X2 and X3 be uncorrelated?
(c) Given the data: r12 = 0·6, r13 = 0·4, find the value of r23 so that R1.23, the multiple correlation coefficient of X1 on X2 and X3, should be unity.
9. From the heights (X1), weights (X2) and ages (X3) of a group of students the following standard deviations and correlation coefficients were obtained:
σ1 = 2·8 inches, σ2 = 12 lbs, and σ3 = 1·5 years; r12 = 0·75, r23 = 0·54, and r31 = 0·43. Calculate (i) the partial regression coefficients and (ii) the partial correlation coefficients.
10. For a trivariate distribution:
X̄1 = 40,   X̄2 = 70,   X̄3 = 90
σ1 = 3,    σ2 = 6,    σ3 = 7
r12 = 0·4,  r23 = 0·5,  r13 = 0·6
Find
(i) R1.23, (ii) r23.1, (iii) the value of X3 when X1 = 30 and X2 = 45.
11. (a) In a study of a random sample of 120 students, the following results are obtained:
X̄1 = 68,   X̄2 = 70,   X̄3 = 74
S1² = 100,  S2² = 25,   S3² = 81
r12 = 0·60, r13 = 0·70, r23 = 0·65
[Si² = Var(Xi)], where X1, X2, X3 denote the percentage of marks obtained by a student in test I, test II and the final examination respectively.
(i) Obtain the least square regression equation of X3 on X1 and X2.
(ii) Compute r12.3 and R3.12.
(iii) Estimate the percentage of marks of a student in the final examination if he gets 60% and 67% in tests I and II respectively.
(b) X1 is the consumption of milk per head, X2 the mean price of milk, and X3 the per capita income. Time series of the three variables are rendered trend-free and the standard deviations and correlation coefficients calculated:
S1 = 7·22,   S2 = 5·47,   S3 = 6·87
r12 = -0·83, r13 = 0·92,  r23 = -0·61
Calculate the regression equation of X1 on X2 and X3 and interpret the regression as a demand equation.
12. (a) Five thousand candidates were examined in the subjects (a), (b), (c), each of these subjects carrying 100 marks. The following constants relate to these data:

                         Subjects
                     (a)     (b)     (c)
Mean                39·46   52·31   45·26
Standard deviation   6·2     9·4     8·7
         rbc = 0·47,  rca = 0·38,  rab = 0·29

Assuming a normally correlated population, find the number of candidates who will pass if the minimum pass mark is an aggregate of 150 marks for the three subjects together.
(b) Establish the equation of the plane of regression for variates X1, X2, X3 in the determinant form
| X1/σ1  X2/σ2  X3/σ3 |
| r21    1      r23   |  =  0
| r31    r32    1     |
[Delhi Univ. B.Sc. (Maths. Hons.), 1986]
13. (a) Prove the identity
b12.3 b23.1 b31.2 = r12.3 r23.1 r31.2   [Gujarat Univ. B.Sc., 1992]
(b) Prove that
R1.23² = b12.3 r12 + b13.2 r13
[Sardar Patel Univ. B.Sc., 1991]
14. (a) If X3 = aX1 + bX2 for all sets of values of X1, X2 and X3, find the value of r23.1.
(b) If the relation aX1 + bX2 + cX3 = 0 holds for all sets of values of X1, X2 and X3, what must be the partial correlation coefficients?
15. (a) If r12 = r23 = r31 = ρ ≠ 1, then show that
r12.3 = r23.1 = r31.2 = ρ/(1 + ρ) and R1(23) = R2(13) = R3(12) = ρ√[2/(1 + ρ)]
(b) Y1, Y2, Y3 are uncorrelated standard variates. X1 = Y2 + Y3, X2 = Y3 + Y1, and X3 = Y1 + Y2. Find the multiple correlation coefficient between X3 and (X1, X2).
16. X, Y, Z are independent random variables with the same variance. If
X1 = (X - Z)/√2, X2 = (X + Y + Z)/√3, X3 = (X + 2Y + Z)/√6,
show that X1, X2, X3 have equal variances. Calculate r12.3 and R1(23).


17. (a) If X1, X2 and X3 are three variables measured from their respective means as origin, and if e1 is the expected value of X1 for given values of X2 and X3 from the linear regression of X1 on X2 and X3, prove that
Cov(X1, e1) = Var(e1) = Var(X1) - Var(X1 - e1)
(b) If r12 = k and r23 = -k, show that r13 will lie between -1 and 1 - 2k².
18. (a) For three variables X, Y and Z, prove that
rXY + rYZ + rZX ≥ -3/2   ...(*)
Hint. Transform X, Y, Z to their standard variates U, V and W (say), respectively, where
U = [X - E(X)]/σX, V = [Y - E(Y)]/σY, W = [Z - E(Z)]/σZ,
so that
E(U) = E(V) = E(W) = 0
σU² = σV² = σW² = 1 ⇒ E(U²) = E(V²) = E(W²) = 1   ...(**)
rUV = Cov(U, V)/(σU σV) = [E(UV) - E(U)E(V)]/(σU σV) = E(UV);
rUW = E(UW); rVW = E(VW)   ...(***)
Since the correlation coefficient is independent of change of origin and scale, proving (*) is equivalent to proving
rUV + rVW + rUW ≥ -3/2   ...(****)
To establish (****), consider E(U + V + W)², which is always non-negative, i.e., E(U + V + W)² ≥ 0, and use (**) and (***).
(b) X, Y, Z are three reduced (standard) variates and E(YZ) = E(ZX) = -1/2; find the limits between which the coefficient of correlation r(X, Y) necessarily lies.
Hint. Consider E(X + Y + Z)² ≥ 0 ⇒ r ≥ -1/2.
(c) If r12, r23 and r31 are the correlation coefficients of any three random variables X1, X2 and X3 taken in pairs (X1, X2), (X2, X3) and (X3, X1) respectively, show that
1 + 2 r12 r23 r31 ≥ r12² + r13² + r23²
19. (a) If the relation aX1 + bX2 + cX3 = 0 holds for all sets of values of X1, X2 and X3, where X1, X2 and X3 are three standardised variables, find the three total correlation coefficients r12, r23 and r13 in terms of a, b and c. What are the values of the partial correlation coefficients if a, b and c are positive?
(b) Suppose X1, X2 and X3 satisfy the relation a1X1 + a2X2 + a3X3 = k.
(i) Determine the three total correlation coefficients in terms of the standard deviations and the constants a1, a2 and a3.
(ii) State what the partial correlation coefficients would be.
20. (a) Show that the multiple correlation between Y and X1, X2, ..., Xp is the maximum correlation between Y and any linear function of X1, X2, ..., Xp.
(b) Show that for p variates there are pC2 correlation coefficients of order zero and p-2Cs · pC2 of order s. Show further that there are in all pC2 · 2^(p-2) correlation coefficients and pC2 · 2^(p-1) regression coefficients.

ADDITIONAL EXERCISES ON CHAPTER X

1. Find the correlation coefficient between
(i) aX + b and Y, (ii) lX + mY and X + Y, when the correlation coefficient between X and Y is ρ.
2. If X1 and X2 are independent normal variates and U and V are defined by
U = X1 cos α + X2 sin α, V = X2 cos α - X1 sin α,
show that the correlation coefficient ρ between U and V is given by
ρ² = 1 - 4σ1²σ2²/[4σ1²σ2² + (σ1² - σ2²)² sin² 2α],
where σ1² and σ2² are the variances of X1 and X2 respectively.
3. The variables X and Y are normally correlated, and ξ and η are defined by
ξ = X cos θ + Y sin θ, η = Y cos θ - X sin θ
Obtain θ so that the distributions of ξ and η are independent.
4. A set of n observations of simultaneous values of X and Y is made by an observer, and the standard deviations and product moment coefficient about the mean are found to be σX, σY and pXY. A second observer, repeating the same observations, made a constant error e in observing each X and a constant error ε in observing each Y. The two sets of observations are combined into a single set and the coefficient of correlation is calculated from it. Show that its value is
(pXY + eε/4)/[(σX² + e²/4)(σY² + ε²/4)]^(1/2)
Hint. Here we have two sets of observations:
1st set: (xi, yi), i = 1, 2, ..., n; mean = x̄, s.d. = σx; product moment coefficient pxy = rxy σx σy.
2nd set: (xi + e, yi + ε), i = 1, 2, ..., n
Mean(x') = (1/n) Σ (xi + e) = x̄ + e
Variance σx'² = (1/n) Σ [(xi + e) - (x̄ + e)]² = (1/n) Σ (xi - x̄)² = σx²
Mean(y') = ȳ + ε, σy'² = σy²
Product moment coefficient:
pxy' = (1/n) Σ [(xi + e) - (x̄ + e)][(yi + ε) - (ȳ + ε)] = pxy
To obtain the correlation coefficient for the combined set of 2n observations, use Formula (10·5), Example 10·11(a), page 10·15.
5. Each of n independent trials can materialise in exactly one of the results A1, A2, ..., Ak. If the probability of Ai is pi in every trial (Σ pi = 1), find the probability of obtaining the frequencies r1, r2, ..., rk for A1, A2, ..., Ak respectively in these trials. Also find Var(ri) and show that the correlation coefficient between ri and rj is independent of n.
6. In a sample of size n from a multinomial population, n1, n2, ..., nk are of type 1, 2, ..., k with Σ pi = 1, where pi is the probability of type i (i = 1, 2, ..., k). Show that the expected value of n2 when n1 is given is (n - n1) p2/(1 - p1), and hence or otherwise show that the coefficient of correlation between ni and nj is
-[pi pj/((1 - pi)(1 - pj))]^(1/2)
7. A ball is drawn at random from an urn containing 3 white balls numbered 0, 1, 2; 2 red balls numbered 0, 1; and 1 black ball numbered 0. If the colours white, red and black are numbered 0, 1 and 2 respectively, show that the correlation coefficient between the variables X, the colour number, and Y, the number on the ball, is -1/2.
8. If X1 and X2 are two independent normal variates with a common mean zero and variances σ1² and σ2² respectively, show that the variates defined by
U = (σ2/σ1)X1 + (σ1/σ2)X2, V = X1 - X2
are independent, and that each is normally distributed with mean zero and common variance (σ1² + σ2²).
9. If X1, X2 and X3 are uncorrelated variables with equal means M and variances V1², V2² and V3² respectively, prove that the correlation coefficient ρ between Z1 = X1/X3 and Z2 = X2/X3 is given by
ρ = V3²/√[(V1² + V3²)(V2² + V3²)]
Hint. Neglecting the cubes and higher powers of xi/M, xi being the deviation of Xi from M, and taking the means and s.d.'s of Z1 and Z2 to be l1, l2 and S1, S2 respectively, we get
l1 = (1/N) Σ (X1i/X3i) = (1/N) Σ (x1i + M)(x3i + M)^(-1)
   = (1/N) Σ (1 + x1i/M)(1 + x3i/M)^(-1)
   = (1/N) Σ [1 + x1i/M - x3i/M - x1i x3i/M² + x3i²/M² + ...]
   = 1 + V3²/M²
Similarly l2 = 1 + V3²/M², so that l1 = l2.
Again, to the same order,
S1² + l1² = (1/N) Σ Z1i² = 1 + (3V3² + V1²)/M²,
and so we have S1² = (V1² + V3²)/M².
Similarly S2² = (V2² + V3²)/M².
Now N ρ S1 S2 = Σ (Z1i - l1)(Z2i - l2) = N V3²/M²   (on simplification)
Hence ρ = V3²/√[(V1² + V3²)(V2² + V3²)]
10. (Weldon's Dice Problem). n white dice and m red dice are shaken and thrown on a table, and the sum of the dots on the upper faces is noted. The red dice are then picked up and thrown again among the white dice left on the table, and the sum of the dots on the upper faces is again noted. What is the correlation coefficient between the first and the second sums?
Ans. n/(n + m)
11. Random variables X and Y have zero means and variances σX² and σY². If Z = Y - X, find the variance of Z and the correlation coefficient ρ(X, Z) of X and Z in terms of σX, σY and the correlation coefficient r(X, Y) of X and Y.
For certain data, Y = 1·2X and X = 0·6Y are the regression lines. Compute r(X, Y) and σX/σY. Also compute ρ(X, Z), if Z = Y - X.
[Calcutta Univ. B.Sc. (Maths.)]
12. An item (say, a pen) from a production line can be acceptable, repairable or useless. Suppose production is stable and let p, q, r (p + q + r = 1) denote the probabilities of the three possible states of an item. If the items are put into lots of 100:
(i) Derive an expression for the probability function of (X, Y), where X and Y are the numbers of items in a lot that are respectively in the first two conditions.
(ii) Derive the moment generating function of X and Y.
(iii) Find the marginal distribution of X.
(iv) Find the conditional distribution of Y given X = 90.
(v) Obtain the regression function of Y on X.
[Delhi Univ. M.A. (Eco.), 1985]
13. If the regression of X1 on X2, ..., Xp is given by:
E(X1 | X2, ..., Xp) = α + β2X2 + β3X3 + ... + βpXp,
and
| a22  a23 ... a2p |
| a32  a33 ... a3p |
| ⋮                | > 0,   (aii = variances, aij = covariances),
| ap2  ap3 ... app |
then the constants α, β2, ..., βp are given by
α = μ1 + (R12/R11)(σ1/σ2)μ2 + (R13/R11)(σ1/σ3)μ3 + ... + (R1p/R11)(σ1/σp)μp
and βj = -(R1j/R11)(σ1/σj), (j = 2, ..., p),
where Rij is the cofactor of ρij in the determinant R of the correlation matrix
| ρ11  ρ12 ... ρ1p |
| ρ21  ρ22 ... ρ2p |
| ⋮                |
| ρp1  ρp2 ... ρpp |
[Delhi Univ. M.Sc. (Stat.)]
14. Let X1 and X2 be random variables with means 0, variances 1 and correlation coefficient ρ. Show that:
E[max(X1², X2²)] ≤ 1 + √(1 - ρ²)
Using the above inequality, show that for random variables X1 and X2 with means μ1 and μ2, variances σ1² and σ2², correlation coefficient ρ, and for any k > 0,
P[|X1 - μ1| ≥ kσ1 or |X2 - μ2| ≥ kσ2] ≤ [1 + √(1 - ρ²)]/k²
15. Let R be the maximum correlation between X0 and any linear function of X1, X2, ..., Xn, and suppose r01 = r02 = ... = r0n = r while all the other correlation coefficients are equal to s. Then show that:
R = r [n/(1 + (n - 1)s)]^(1/2)
16. If f = f(x, y) is the p.d.f. of the BVN(0, 0, 1, 1, ρ) distribution, verify that:
∂f/∂ρ = ∂²f/∂x∂y
Further, if two new random variables U and V are defined by the relations
U = P(Z ≤ X) - 1/2 and V = P(Z ≤ Y) - 1/2, where Z ~ N(0, 1),
prove that the marginal distributions of both U and V are uniform in the interval (-1/2, 1/2), with common variance 1/12.
Hence prove that R = Corr(U, V) satisfies the relation:
ρ = 2 sin(πR/6)
[Delhi Univ. B.A. (Stat. Hons. Spl. Course), 1988]
17. If (X, Y) ~ BVN(μX, μY, σX², σY², ρ), then prove that a + bX + cY (b ≠ 0, c ≠ 0) is distributed as N(a + bμX + cμY, b²σX² + c²σY² + 2bcρσXσY).
[Delhi Univ. M.Sc. (Stat.), 1989]
18. Let X1, X2, X3 be a random sample of size n = 3 from the N(0, 1) distribution.
(a) Show that Y1 = X1 + aX3, Y2 = X2 + aX3 has a bivariate normal distribution.
(b) Find the value of a so that ρ(Y1, Y2) = 1/2.
(c) What transformation involving Y1 and Y2 would produce a bivariate normal distribution with means μ1 and μ2, variances σ1² and σ2², and the same correlation coefficient ρ?
Ans. (b) -1 or 1. (c) Z1 = σ1Y1 + μ1, Z2 = σ2Y2 + μ2.
19. If (X, Y) ~ BVN(0, 0, 1, 1, ρ), prove that:
E[max(X, Y)] = [(1 - ρ)/π]^(1/2) and E[min(X, Y)] = -[(1 - ρ)/π]^(1/2)
20. If (X, Y) ~ BVN(0, 0, σ1², σ2², ρ), show that the r-th cumulant of XY is given by:
κr = ½(r - 1)! σ1^r σ2^r [(ρ + 1)^r + (ρ - 1)^r]
Deduce that E(X²Y²) = σ1²σ2²(1 + 2ρ²).
21. Let f and g be the p.d.f.'s of X and Y with corresponding distribution functions F and G. Also let
h(x, y) = f(x) g(y)[1 + α(2F(x) - 1)(2G(y) - 1)]; |α| ≤ 1.
Show that h(x, y) is a joint p.d.f. with marginal p.d.f.'s f and g. Further, let f and g be N(0, 1) p.d.f.'s. Show that Z = X + Y is not normally distributed, except in the trivial case α = 0.
Hint. Find MZ(t) = E(e^(tZ)) and use Cov(X, Y) = α/π.
22. State the p.d.f. of the bivariate normal distribution. Let X and Y have a joint p.d.f. of the form:
f(x, y) = k exp{-½[a11(x - b1)² + 2a12(x - b1)(y - b2) + a22(y - b2)²]};  -∞ < x, y < ∞
Find (i) k, (ii) the correlation coefficient between X and Y.
23. Write down, but do not derive, the moment generating function for a pair of random variables which have a bivariate normal distribution with both means equal to zero.
The independent random variables X, Y, Z are each normally distributed with mean 0 and variance 1. If U = X + Y + Z and V = X - Y + 2Z, show that U and V have a bivariate normal distribution. Find the correlation of U with V and the expectation of U when V is equal to 1.
24. Let X1 and X2 have the joint m.g.f.
M(t1, t2) = [a(e^(t1+t2) + 1) + b(e^(t1) + e^(t2))]²,
in which a and b are positive numbers such that 2a + 2b = 1.
Find E(X1), Var(X1), Var(X2), Cov(X1, X2).
Ans. Means = 1, Variances = ½, Covariance = 2a - ½.
25. X1, X2, X3 have a joint multinomial distribution with parameters N, p1, p2, p3. If rij is the correlation coefficient between Xi and Xj, find the expressions for r12, r23 and r31, and hence the expression for the partial correlation coefficient r12.3.
26. (i) If all the inter-correlations between (p + 1) variates X0, X1, X2, ..., Xp are equal to r, show that each of the partial correlation coefficients of order (p - 1) is equal to r/[1 + (p - 1)r], and that the multiple correlation of X0 on X1, X2, ..., Xp is given by
1 - R²0(12...p) = (1 - r)(1 + pr)/[1 + (p - 1)r]
(ii) Prove that r12 = (r12.3 + r13.2 r23.1)/[(1 - r13.2²)(1 - r23.1²)]^(1/2)
27. If R denotes the multiple correlation coefficient of X1 on X2, X3, ..., Xp in a p-variate distribution, prove that
(i) R² ≥ R0², where R0 is the correlation of X1 with any arbitrary linear function of X2, X3, ..., Xp;
(ii) R² ≥ R1², where R1 is the multiple correlation coefficient of X1 with X2, X3, ..., Xk, k < p;
(iii) 1 - R² = ∏ (from j = 2 to p) of (1 - r²1j.23...(j-1))
