Correlation and Regression

If (xi, yi); i = 1, 2, ..., n is the given bivariate distribution, then
Cov(X, Y) = E[{X - E(X)}{Y - E(Y)}] = (1/n) Σ (xi - x̄)(yi - ȳ) = μ11
σX² = E[{X - E(X)}²] = (1/n) Σ (xi - x̄)²,  σY² = E[{Y - E(Y)}²] = (1/n) Σ (yi - ȳ)²   ...(10.2)
and the correlation coefficient is
r(X, Y) = Cov(X, Y)/(σX σY).
In particular, r(X, Y) = 0 when Cov(X, Y) = 0, and r = ±1 in the case of a perfect linear relationship between X and Y.
[Scatter diagrams illustrating positive correlation (r > 0), negative correlation (r < 0), no correlation (r = 0), perfect positive correlation (r = +1) and perfect negative correlation (r = -1).]
Limits for the correlation coefficient. We have
r²(X, Y) = (Σ ai bi)² / [(Σ ai²)(Σ bi²)], where ai = xi - x̄, bi = yi - ȳ   ...(*)
We have the Schwartz inequality, which states that if ai, bi; i = 1, 2, ..., n are real quantities, then
(Σ ai bi)² ≤ (Σ ai²)(Σ bi²),
the sign of equality holding if and only if the ratios ai/bi are all equal. Hence, from (*), r² ≤ 1, i.e., -1 ≤ r ≤ 1.
Alternatively, since
E[(X - μX)/σX ± (Y - μY)/σY]² ≥ 0,
we get on expanding
E[(X - μX)²]/σX² + E[(Y - μY)²]/σY² ± 2 E[(X - μX)(Y - μY)]/(σX σY) ≥ 0
⟹ 1 + 1 ± 2r ≥ 0 ⟹ -1 ≤ r ≤ 1.
Theorem 10.1. The correlation coefficient is independent of change of origin and scale.
Proof. Let U = (X - a)/h, V = (Y - b)/k, so that X = a + hU and Y = b + kV, where a, b, h, k are constants; h > 0, k > 0.
We shall prove that r(X, Y) = r(U, V).
Since X = a + hU and Y = b + kV, on taking expectations we get
E(X) = a + hE(U) and E(Y) = b + kE(V)
∴ X - E(X) = h[U - E(U)] and Y - E(Y) = k[V - E(V)]
⟹ Cov(X, Y) = E[{X - E(X)}{Y - E(Y)}] = E[h{U - E(U)} k{V - E(V)}] = hk Cov(U, V)   ...(10.4)
σX² = E[{X - E(X)}²] = E[h²{U - E(U)}²] = h² σU² ⟹ σX = hσU, (h > 0)   ...(10.4a)
σY² = E[{Y - E(Y)}²] = E[k²{V - E(V)}²] = k² σV² ⟹ σY = kσV, (k > 0)   ...(10.4b)
Hence
r(X, Y) = Cov(X, Y)/(σX σY) = hk Cov(U, V)/(hσU · kσV) = Cov(U, V)/(σU σV) = r(U, V).
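A quick numerical check of Theorem 10.1 (a sketch in Python, using the height data tabulated later in this section; the particular shifts and scales are illustrative):

```python
import numpy as np

def corr(x, y):
    # Pearson correlation coefficient r = Cov(X, Y) / (sigma_X * sigma_Y)
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = ((x - x.mean()) * (y - y.mean())).mean()
    return cov / (x.std() * y.std())

x = np.array([65, 66, 67, 67, 68, 69, 70, 72])
y = np.array([67, 68, 65, 68, 72, 72, 69, 71])

u = (x - 68) / 2.0   # change of origin (a = 68) and scale (h = 2)
v = (y - 69) / 5.0   # change of origin (b = 69) and scale (k = 5)

print(corr(x, y), corr(u, v))   # the two values agree, illustrating r(X, Y) = r(U, V)
```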
Here
X̄ = (1/n) ΣX = 0 and Cov(X, Y) = (1/n) ΣXY - X̄ Ȳ = 0
∴ r(X, Y) = Cov(X, Y)/(σX σY) = 0
Thus in the above example the variables X and Y are uncorrelated. But on careful examination we find that X and Y are not independent; they are connected by the relation Y = X². Hence two uncorrelated variables need not necessarily be independent. A simple reasoning for this conclusion is that r(X, Y) = 0 merely implies the absence of any linear relationship between the variables X and Y. There may, however, exist some other form of relationship between them, e.g., quadratic, cubic or trigonometric.
Remarks. 1. Following are some more examples where two variables are uncorrelated but not independent.
(i) X ~ N(0, 1) and Y = X².
Since X ~ N(0, 1), E(X) = 0 = E(X³)
∴ Cov(X, Y) = E(XY) - E(X)E(Y) = E(X³) - E(X)E(Y) = 0   (∵ Y = X²)
⟹ r(X, Y) = Cov(X, Y)/(σX σY) = 0
Hence X and Y are uncorrelated, but not independent.
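A small simulation illustrating this remark (a sketch, not part of the original text): for X ~ N(0, 1) and Y = X² the sample correlation is close to zero although Y is completely determined by X.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = x ** 2                      # Y is a function of X, hence fully dependent on X

r = np.corrcoef(x, y)[0, 1]     # sample correlation coefficient
print(round(r, 3))              # near 0: no *linear* relationship, yet X and Y are dependent
```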
(ii) Let X be a r.v. with p.d.f.
x     y     U = x - 68   V = y - 69   U²    V²    UV
65    67       -3           -2         9     4     6
66    68       -2           -1         4     1     2
67    65       -1           -4         1    16     4
67    68       -1           -1         1     1     1
68    72        0            3         0     9     0
69    72        1            3         1     9     3
70    69        2            0         4     0     0
72    71        4            2        16     4     8
Total           0            0        36    44    24

Here n = 8.
Ū = (1/n) ΣU = 0,  V̄ = (1/n) ΣV = 0
Cov(U, V) = (1/n) ΣUV - Ū V̄ = (1/8) × 24 = 3
σU² = (1/n) ΣU² - (Ū)² = (1/8) × 36 = 4.5
σV² = (1/n) ΣV² - (V̄)² = (1/8) × 44 = 5.5
∴ r(X, Y) = r(U, V) = Cov(U, V)/(σU σV) = 3/√(4.5 × 5.5) ≈ 0.603
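The same computation done directly in Python (a sketch of the step-deviation method used in the table above):

```python
import numpy as np

x = np.array([65, 66, 67, 67, 68, 69, 70, 72])
y = np.array([67, 68, 65, 68, 72, 72, 69, 71])

u, v = x - 68, y - 69                              # step deviations, as in the table
n = len(x)
cov_uv = (u * v).sum() / n - u.mean() * v.mean()   # = 3
var_u  = (u * u).sum() / n - u.mean() ** 2         # = 4.5
var_v  = (v * v).sum() / n - v.mean() ** 2         # = 5.5
r = cov_uv / np.sqrt(var_u * var_v)
print(round(r, 3))                                 # about 0.603
```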
Example 10.3. If Xi = xi - x̄ and Yi = yi - ȳ denote the deviations from the respective means, show that
(i) r = 1 - (1/2N) Σi (Xi/σX - Yi/σY)²  and  (ii) r = -1 + (1/2N) Σi (Xi/σX + Yi/σY)².
Deduce that -1 ≤ r ≤ +1.
[... Univ. B.Sc. Oct. 1992; Madras Univ. B.Sc., Nov. 1991]
Solution. (i) Here Xi = xi - x̄ and Yi = yi - ȳ.
R.H.S. = 1 - (1/2N) Σi (Xi/σX - Yi/σY)²
= 1 - (1/2N) [ (1/σX²) Σ Xi² + (1/σY²) Σ Yi² - (2/(σX σY)) Σ Xi Yi ]
= 1 - (1/2) [ σX²/σX² + σY²/σY² - 2rσXσY/(σXσY) ]
= 1 - (1/2)[1 + 1 - 2r] = r
(ii) Proceeding similarly, we will get
R.H.S. = -1 + (1/2)(1 + 1 + 2r) = r
Since Σi (Xi/σX - Yi/σY)² ≥ 0 and Σi (Xi/σX + Yi/σY)² ≥ 0, (i) gives r ≤ 1 and (ii) gives r ≥ -1.
The sign of equality holds, i.e., r = +1 or r = -1, if and only if
Xi/σX - Yi/σY = 0  ∀ i = 1, 2, ..., n,  or  Xi/σX + Yi/σY = 0  ∀ i = 1, 2, ..., n,
respectively.
From (*) and (**), we get
-1 ≤ r ≤ 1.
Example 10.4. The variables X and Y are connected by the equation aX + bY + c = 0. Show that the correlation between them is -1 if the signs of a and b are alike and +1 if they are different.
[Nagpur Univ. B.Sc. 1992; Delhi Univ. B.Sc. (Stat. Hons.) 1992]
Solution. aX + bY + c = 0 ⟹ aE(X) + bE(Y) + c = 0
Subtracting, a{X - E(X)} + b{Y - E(Y)} = 0
⟹ X - E(X) = -(b/a){Y - E(Y)}
∴ Cov(X, Y) = E[{X - E(X)}{Y - E(Y)}] = -(b/a) E[{Y - E(Y)}²] = -(b/a) σY²
Also σX² = E[{X - E(X)}²] = (b²/a²) σY² ⟹ σX = |b/a| σY
∴ r = Cov(X, Y)/(σX σY) = -(b/a) σY² / (|b/a| σY²) = -(b/a)/|b/a|
Hence r = -1 if a and b are of the same sign and r = +1 if they are of opposite signs.
= (σX + kσY)[σX + Cov(X, Y)/σY]
= (σX + kσY)(1 + r)σX
U and V will be uncorrelated if
r(U, V) = 0 ⟹ Cov(U, V) = 0,
i.e., if (σX + kσY)(1 + r)σX = 0
⟹ σX + kσY = 0
⟹ k = -σX/σY
Example 10.7. The random variables X and Y are jointly normally distributed and U and V are defined by
U = X cos α + Y sin α,
V = Y cos α - X sin α.
Show that U and V will be uncorrelated if
tan 2α = 2rσXσY/(σX² - σY²),
where r = Corr(X, Y), σX² = Var(X) and σY² = Var(Y). Are U and V then independent?
[Delhi Univ. B.Sc. (Stat. Hons.) 1989; (Maths Hons.), 1990]
Solution. We have
Cov(U, V) = E[{U - E(U)}{V - E(V)}]
= E[{(X - E(X)) cos α + (Y - E(Y)) sin α} × {(Y - E(Y)) cos α - (X - E(X)) sin α}]
= cos²α Cov(X, Y) - sin α cos α σX² + sin α cos α σY² - sin²α Cov(X, Y)
= (cos²α - sin²α) Cov(X, Y) - sin α cos α (σX² - σY²)
= cos 2α Cov(X, Y) - (1/2) sin 2α (σX² - σY²)
U and V will be uncorrelated if and only if
r(U, V) = 0, i.e., iff Cov(U, V) = 0,
i.e., if cos 2α Cov(X, Y) - (1/2) sin 2α (σX² - σY²) = 0,
or if cos 2α · rσXσY = (sin 2α/2)(σX² - σY²),
or if tan 2α = 2rσXσY/(σX² - σY²).
However, since X and Y are jointly normally distributed, U and V are also jointly normal, and hence r(U, V) = 0 does imply that the variables U and V are independent. [For a detailed discussion of uncorrelatedness and independence, see Theorem 10.2, page 10.4.]
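A numerical illustration of the decorrelating angle (a sketch; the parameter values below are illustrative, not from the text):

```python
import numpy as np

sigma_x, sigma_y, r = 3.0, 2.0, 0.5
cov_xy = r * sigma_x * sigma_y

# alpha from tan(2*alpha) = 2*r*sigma_x*sigma_y / (sigma_x**2 - sigma_y**2)
alpha = 0.5 * np.arctan2(2 * cov_xy, sigma_x**2 - sigma_y**2)

# Cov(U, V) = cos(2a)*Cov(X, Y) - (1/2)*sin(2a)*(sigma_x^2 - sigma_y^2) should vanish
cov_uv = np.cos(2 * alpha) * cov_xy - 0.5 * np.sin(2 * alpha) * (sigma_x**2 - sigma_y**2)
print(round(alpha, 4), round(cov_uv, 12))   # cov_uv is (numerically) zero
```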
Example 10.8. If X, Y are standardized random variables and
r(aX + bY, bX + aY) = (1 + 2ab)/(a² + b²),   ...(*)
find r(X, Y), the coefficient of correlation between X and Y.
[Sardar Patel Univ. B.Sc., 1993; Delhi Univ. B.Sc. (Stat. Hons.), 1989]
Solution. Since X and Y are standardised random variables, we have
E(X) = E(Y) = 0 and Var(X) = Var(Y) = E(X²) = E(Y²) = 1
and Cov(X, Y) = E(XY) - E(X)E(Y) = E(XY) = r(X, Y) σX σY = r(X, Y)   ...(**)
Also we have
r(aX + bY, bX + aY)
= {E[(aX + bY)(bX + aY)] - E(aX + bY) E(bX + aY)} / [Var(aX + bY) Var(bX + aY)]^(1/2)
= {E[abX² + a²XY + b²YX + abY²] - 0} / {[a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)] × [b² Var(X) + a² Var(Y) + 2ba Cov(X, Y)]}^(1/2)
= [ab·1 + a² r(X, Y) + b² r(X, Y) + ab·1] / {[a² + b² + 2ab r(X, Y)][b² + a² + 2ba r(X, Y)]}^(1/2)   [Using (**)]
= [2ab + (a² + b²) r(X, Y)] / [a² + b² + 2ab r(X, Y)]
From (*) and (**), we get
(1 + 2ab)/(a² + b²) = [(a² + b²) r(X, Y) + 2ab] / [a² + b² + 2ab r(X, Y)]
Cross multiplying, we get
(a² + b²)(1 + 2ab) + 2ab r(X, Y)(1 + 2ab) = (a² + b²)² r(X, Y) + 2ab(a² + b²)
⟹ (a⁴ + b⁴ + 2a²b² - 2ab - 4a²b²) r(X, Y) = a² + b²
⟹ [(a² - b²)² - 2ab] r(X, Y) = a² + b²
⟹ r(X, Y) = (a² + b²) / [(a² - b²)² - 2ab]
ρ² = sin² 2α (σ1² - σ2²)² / [4σ1²σ2² + sin² 2α (σ1² - σ2²)²]
⟹ ρ = (σ1² - σ2²) / [(σ1² - σ2²)² + 4σ1²σ2² cosec² 2α]^(1/2)
Also aX + bY - U = 0 and cX + dY - V = 0
⟹ X/(dU - bV) = Y/(aV - cU) = 1/(ad - bc)
⟹ X = (dU - bV)/(ad - bc) and Y = (aV - cU)/(ad - bc)   ...(***)
∴ Var(X)·Var(Y) - [Cov(X, Y)]²
= [1/(ad - bc)⁴] [(d²σU² + b²σV²)(c²σU² + a²σV²) - (cdσU² + abσV²)²]
= [1/(ad - bc)⁴] [c²d²σU⁴ + a²b²σV⁴ + (a²d² + b²c²)σU²σV² - c²d²σU⁴ - a²b²σV⁴ - 2abcd σU²σV²]   ...(2)
But Σi (x1i - x̄1) = 0 and Σi (y1i - ȳ1) = 0.
∴ Σi (x1i - x̄)(y1i - ȳ) = n1 r1 σx1 σy1 + n1 dx1 dy1   [Using (2)]
Similarly, we will get
Σj (x2j - x̄)(y2j - ȳ) = n2 r2 σx2 σy2 + n2 dx2 dy2
Substituting in (3), we get the required formula.
(b) Here we are given:
n1 = 100, x̄1 = 80, ȳ1 = 100, σ²x1 = 10, σ²y1 = 15, r1 = 0.6
n2 = 150, x̄2 = 72, ȳ2 = 115, σ²x2 = 12, σ²y2 = 18, r2 = 0.4
x̄ = (n1 x̄1 + n2 x̄2)/(n1 + n2) = (100 × 80 + 150 × 72)/(100 + 150) = 75.2
= 0, otherwise
Example 10.12. The independent variables X and Y are defined by:
f(x) = 4ax, 0 ≤ x ≤ r;  f(y) = 4by, 0 ≤ y ≤ s;  = 0, otherwise.
Show that
r(U, V) = (b - a)/(b + a),
where U = X + Y and V = X - Y.   [I.I.T. (B. Tech.), Nov. 1992]
Solution. Since the total area under a probability curve is unity, we have
∫₀ʳ f(x) dx = 4a ∫₀ʳ x dx = 1 ⟹ 2ar² = 1 ⟹ a = 1/(2r²)   ...(i)
∫₀ˢ f(y) dy = 4b ∫₀ˢ y dy = 1 ⟹ 2bs² = 1 ⟹ b = 1/(2s²)   ...(ii)
∴ f(x) = 4ax = 2x/r², 0 ≤ x ≤ r;  and  f(y) = 4by = 2y/s², 0 ≤ y ≤ s   ...(iii)
Since X and Y are independent variates,
r(X, Y) = 0 ⟹ Cov(X, Y) = 0   ...(iv)
Cov(U, V) = Cov(X + Y, X - Y) = Cov(X, X) - Cov(X, Y) + Cov(Y, X) - Cov(Y, Y) = σX² - σY²   [Using (iv)]
Var(U) = Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) = σX² + σY²   [Using (iv)]
Var(V) = Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y) = σX² + σY²   [Using (iv)]
∴ r(U, V) = Cov(U, V)/[Var(U) Var(V)]^(1/2) = (σX² - σY²)/(σX² + σY²)   ...(v)
We have
E(X) = (2/r²) ∫₀ʳ x² dx = 2r/3,  E(X²) = (2/r²) ∫₀ʳ x³ dx = r²/2
∴ Var(X) = E(X²) - [E(X)]² = r²/2 - 4r²/9 = r²/18 = 1/(36a)   [From (i)]
Similarly, we shall get
E(Y) = 2s/3, E(Y²) = s²/2 and Var(Y) = s²/18 = 1/(36b)
Substituting in (v), we get
r(U, V) = [1/(36a) - 1/(36b)] / [1/(36a) + 1/(36b)] = (b - a)/(b + a).
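A Monte Carlo check of this result (a sketch; the values of r and s are illustrative, and sampling uses the inverse c.d.f. X = r√U for the density 2x/r²):

```python
import numpy as np

rng = np.random.default_rng(1)
r_lim, s_lim = 2.0, 3.0                         # illustrative values of r and s
a, b = 1 / (2 * r_lim**2), 1 / (2 * s_lim**2)

# f(x) = 2x/r^2 on [0, r] has c.d.f. x^2/r^2, so X = r*sqrt(U) with U uniform(0, 1)
x = r_lim * np.sqrt(rng.random(200_000))
y = s_lim * np.sqrt(rng.random(200_000))

u, v = x + y, x - y
r_uv = np.corrcoef(u, v)[0, 1]
print(round(r_uv, 3), round((b - a) / (b + a), 3))   # the two values agree closely
```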
Example 10.13. Let the random variable X have the marginal density
f1(x) = 1, -1/2 < x < 1/2,
and let the conditional density of Y be
f(y | x) = 1, x < y < x + 1, -1/2 < x < 0   (*)
         = 1, -x < y < 1 - x, 0 < x < 1/2
Show that the variables X and Y are uncorrelated.
Solution. We have
E(X) = ∫ x f1(x) dx over (-1/2, 1/2) = 0.
Since E(Y | x) = x + 1/2 for -1/2 < x < 0 and E(Y | x) = 1/2 - x for 0 < x < 1/2,
E(XY) = ∫ x E(Y | x) f1(x) dx
= (1/2) ∫ from -1/2 to 0 of x(2x + 1) dx + (1/2) ∫ from 0 to 1/2 of x(1 - 2x) dx
= (1/2)[1/12 - 1/8] + (1/2)[1/8 - 1/12] = 0
∴ Cov(X, Y) = E(XY) - E(X)E(Y) = 0 ⟹ r(X, Y) = 0
Hence the variables X and Y are uncorrelated.
EXERCISE 10(a)
1. (a) Show that the coefficient of correlation r is independent of a change of scale and origin of the variables. Also prove that for two independent variables r = 0. Show by an example that the converse is not true. State the limits between which r lies and give its proof.
[Delhi Univ. M.Sc. (O.R.), 1986]
(b) Let ρ be the correlation coefficient between two jointly distributed random variables X and Y. Show that |ρ| ≤ 1 and that |ρ| = 1 if, and only if, X and Y are linearly related.   [Indian Forest Service, 1991]
2. (a) Calculate the coefficient of correlation between X and Y for the following:
X : 1  3  4  5  7  8  10
Y : 2  6  8  10  14  16  20
Ans. r(X, Y) = +1
(b) Discuss the validity of the following statements:
(i) "High positive coefficient of correlation between increase in the sale of newspapers and increase in the number of crimes leads to the conclusion that newspaper reading may be responsible for the increase in the number of crimes."
(ii) "A high positive value of r between the increase in cigarette smoking and increase in lung cancer establishes that cigarette smoking is responsible for lung cancer."
(c) (i) Do you agree with the statement that "r = 0.8 implies 80% of the data are explained"?
(ii) Comment on the following: "The closeness of relationship between two variables is proportional to r".
Hint. (a) No (b) Wrong.
(d) By effecting a suitable change of origin and scale, compute the product moment correlation coefficient for the following set of 5 observations on (X, Y):
X: -10  -5  0  5  10
Y:   5   9  7  11  13
Ans. r(X, Y) = 0.9
4. (a) The following table gives the number of blind per lakh of population in different age-groups. Find out the correlation between age and blindness.
Age in years:             0-10  10-20  20-30  30-40  40-50
Number of blind per lakh:  55    67    100    111    150
Age in years:             50-60  60-70  70-80
Number of blind per lakh:   ...    300    500
Ans. 0.89
(b) The following table gives the distribution of items of production and also the relatively defective items among them, according to size-groups. Is there any correlation between size and defect in quality?
Size-group:            15-16  16-17  17-18  18-19  19-20  20-21
No. of items:           200    270    340    360    400    300
No. of defective items: 150    162    170    180    180    120
Hint. Here we have to find the correlation coefficient between the size-group (X) and the percentage of defectives (Y) given below.
Ans. r = -0.94.
5. Using the formula
σ²(X - Y) = σX² + σY² - 2 r(X, Y) σX σY,
obtain the correlation coefficient between the heights of fathers (X) and of the sons (Y) from the following data:
X: 65  66  67  68  69  70  71  67
Y: 67  68  64  72  70  67  70  68
6. (a) From the following data, compute the coefficient of correlation between X and Y.
                                      X series   Y series
No. of items                              15         15
Arithmetic mean                           25         18
Sum of squares of deviations from mean   136        138
Sum of products of deviations of X and Y series from their respective arithmetic means = 122.
Ans. r(X, Y) = 0.891
(b) The coefficient of correlation between two variables X and Y is 0.32. Their covariance is 7.86. The variance of X is 10. Find the standard deviation of the Y series.
(c) In two sets of variables X and Y with 50 observations each, the following data were observed:
X̄ = 10, σX = 3, Ȳ = 6, σY = 2 and r(X, Y) = 0.3
But on subsequent verification it was found that one value of X (= 10) and one value of Y (= 6) were inaccurate and hence weeded out. With the remaining 49 pairs of values, how is the original value of r affected?
[Nagpur Univ. B.Sc., 1990]
Hint. ΣX = nX̄ = 500, ΣY = nȲ = 300
ΣX² = n(σX² + X̄²) = 5450, ΣY² = 50(4 + 36) = 2000
r(X, Y) = Cov(X, Y)/(σX σY), where Cov(X, Y) = (1/n) ΣXY - X̄ Ȳ.
11. The random variables X1, X2, ..., Xn each have mean μ and s.d. σ. The correlation coefficient between any two of the X's is ρ. Show that
(i) Var(X̄) = (σ²/n)[1 + (n - 1)ρ],
(ii) E[Σi (Xi - X̄)²] = (n - 1)(1 - ρ)σ², and (iii) ρ > -1/(n - 1).
12. (a) If X and Y are independent random variables, show that
r(X + Y, X - Y) = r²(X, X + Y) - r²(Y, X + Y),
where r(X + Y, X - Y) denotes the coefficient of correlation between (X + Y) and (X - Y).   [Meerut Univ. B.Sc., 1991]
(b) Let X and Y be random variables having mean 0, variance 1 and correlation r. Show that X - rY and Y are uncorrelated and that X - rY has mean zero and variance 1 - r².
13. X1 and X2 are two variables with zero means, variances σ1² and σ2², and r is the correlation coefficient between them. Determine the values of the constants a and b which are independent of r such that X1 + aX2 and X1 + bX2 are uncorrelated.
14. (a) If X1 and X2 are two random variables with means μ1, μ2, variances σ1², σ2² and correlation coefficient r, find the correlation coefficient between ...
(b) Let X1, X2 be independent random variables with means μ1, μ2 and non-zero variances σ1², σ2² respectively. Let U = X1 - X2 and V = X1X2. Find the correlation coefficient between (i) X1 and U, (ii) X1 and V, in terms of μ1, μ2, σ1², σ2².
15. (a) If U = aX + bY and V = bX - aY, where X and Y are measured from their respective means, and if U and V are uncorrelated, show that the coefficient of correlation r between X and Y satisfies
σU σV = (a² + b²) σX σY (1 - r²)^(1/2).   [Utkal Univ. B.Sc., 1993]
(b) Let U = aX + bY and V = aX - bY, where X, Y represent deviations from the means of two measurements on the same individual. The coefficient of correlation between X and Y is ρ. If U, V are uncorrelated, show that
σU σV = 2ab σX σY (1 - ρ²)^(1/2).
16. Show that, if a and b are constants and r is the correlation coefficient between X and Y, then the correlation coefficient between aX and bY is equal to r if the signs of a and b are alike, and to -r if they are different.
Also show that, if constants a, b and c are positive, the correlation coefficient between (aX + bY) and cY is equal to
(arσX + bσY) / √(a²σX² + b²σY² + 2abrσXσY).
17. If X1, X2 and X3 are three random variables measured from their respective means as origin and of equal variances, find the coefficient of correlation between X1 + X2 and X2 + X3 in terms of r12, r13 and r23, and show that it is equal to
(i) (1/2)√(r12 + 1) if r13 = r23 = 0, and (ii) 3/4 if r12 = r13 = r23 = 1/3.
18. (a) For a weighted distribution (xi, wi), (i = 1, 2, ..., n) show that the weighted arithmetic mean x̄w = Σwi xi / Σwi > or < the unweighted mean x̄ = Σxi/n according as r(x, w) > or < 0.
(b) Given N values x1, x2, ..., xN of a variable X and weights w1, w2, ..., wN, express the coefficient of correlation between X and W in terms involving the difference between the arithmetic mean and the weighted mean of X.
19. (a) A coin is tossed n times. If X and Y denote the (random) number of heads and number of tails turned up respectively, show that r(X, Y) = -1.
Hint. Note that X + Y = n ⟹ Y = n - X
∴ r(X, Y) = r(X, n - X) = -r(X, X) = -1.
(b) Two dice are thrown, their scores being a and b. The first die is left on the table while the second is picked up and thrown again, giving the score c. Suppose the process is repeated a large number of times. What is the correlation coefficient between X = a + b and Y = a + c?
Ans. r(X, Y) = 1/2
20. (a) If X and Y are independent random variables with means μ1 and μ2 and variances σ1², σ2² respectively, show that the correlation coefficient between U = X and V = X - Y in terms of μ1, μ2, σ1² and σ2² is σ1/√(σ1² + σ2²).
U = Σj Yj and V = Σi Xi + Σk Zk,
and further ...
If m = σY/σX, then k = -σY/σX.   [Gujarat Univ. M.A., 1993]
23. X1, X2, X3 are three variables, each with variance σ², and the correlation coefficient between any two of them is r. If X̄ = (X1 + X2 + X3)/3, show that
Var(X̄) = (σ²/3)(1 + 2r).
24. If U = aX + bY and V = cX + dY, show that
| Var(U)     Cov(U, V) |   =  | a  b |²  | Var(X)     Cov(X, Y) |
| Cov(U, V)  Var(V)    |      | c  d |   | Cov(X, Y)  Var(Y)    |
(the determinants being taken on both sides).
25. If X is a standard normal variate and Y = a + bX + cX², where a, b, c are constants, find the correlation coefficient between X and Y. Hence or otherwise obtain the conditions when (i) X and Y are uncorrelated and (ii) X and Y are perfectly correlated.
26. (a) If X ~ N(0, 1), find Corr(X, Y) where Y = a + bX + cX².
[Delhi Univ. B.Sc. (Maths Hons.), 1986]
Ans. r(X, Y) = b/√(b² + 2c²)
(b) If X has a Laplace distribution with parameters (λ, 0) and Y = a + bX + cX², find ρ(X, Y).
[Delhi Univ. B.A. (Stat. Hons. Spl. Course), 1989]
Hint. p(x) = (λ/2) exp[-λ|x|], -∞ < x < ∞.
E(X^(2k+1)) = 0 = μ(2k+1);  E(X^(2k)) = μ(2k) = (2k)!/λ^(2k)
ρ(X, Y) = λb/√(b²λ² + 10c²)
27. In a sample of n random observations from an exponential distribution with parameter λ, the numbers of observations in (0, 1/λ) and (1/λ, 2/λ), denoted by X and Y respectively, are noted. Find ρ(X, Y).
32. Let (X, Y) be jointly discrete random variables such that each of X and Y has at most two mass points. Prove or disprove: X and Y are independent if and only if they are uncorrelated.
Ans. True.
33. If the variables X1, X2, ..., X2n all have the same variance σ² and the correlation coefficient between Xi and Xj (i ≠ j) has the same value, show that ...
34. The means of independent r.v.'s X1, X2, ..., Xn are zero and the variances are equal, say unity. The correlation coefficients between the sum of selected t (< n) variables out of these variables and the sum of all n variables are found out. Prove that the sum of squares of all these correlation coefficients is (n-1)C(t-1).
[Burdwan Univ. B.Sc. (Hons.), 1989]
35. Two variables U and V are made up of the sum of a number of terms as follows:
U = X1 + X2 + ... + Xn + Y1 + Y2 + ... + Ya,
V = X1 + X2 + ... + Xn + Z1 + Z2 + ... + Zb,
where the X's, Y's and Z's are all uncorrelated standardised random variables. Show that the correlation coefficient between U and V is n/√[(n + a)(n + b)]. Show further that
√(n + b) U + √(n + a) V and √(n + b) U - √(n + a) V   ...(*)
are uncorrelated.   [South Gujarat Univ. B.Sc., 1989]
36. (a) Let the random variables X and Y have the joint p.d.f.
f(x, y) = 1/3; (x, y) = (0, 0), (1, 1), (2, 0).
Compute E(X), V(X), E(Y), V(Y) and r(X, Y). Are X and Y stochastically independent? Give reasons.
(b) Let (X, Y) have the probability function:
f(0, 0) = 0.45, f(0, 1) = 0.05, f(1, 0) = 0.35, f(1, 1) = 0.15.
Evaluate V(X), V(Y) and ρ(X, Y).
Show that while X and Y are correlated, X and X - 5Y are uncorrelated. Are X and X - 5Y independent?
..., = 0, otherwise.
(i) What is the value of k? What is the distribution function of X?
(ii) Obtain the density of the random variable Y = X².
(iii) Obtain the correlation coefficient between X and Y.
(iv) Are X and Y independently distributed?
40. (a) If f(x, y) = (6 - x - y)/8; 0 ≤ x ≤ 2, 2 ≤ y ≤ 4,
find (i) Var(X), (ii) Var(Y), (iii) r(X, Y).
Ans. (i) 11/36, (ii) 11/36, (iii) -1/11.
(b) Given the joint density of the random variables X, Y, Z as:
f(x, y, z) = k x exp[-(y + z)], 0 < x < 2, y ≥ 0, z ≥ 0
           = 0, elsewhere
Find
(i) k,
(ii) the marginal density functions,
(iii) the conditional expectation of Y, given X and Z, and
(iv) the product moment correlation between X and Y.
[Madras Univ. B.Sc. (Main Stat.), 1988]
(c) Suppose that the two-dimensional random variable (X, Y) has p.d.f. given by
f(x, y) = k e^(-y), 0 < x < y < 1
        = 0, elsewhere.
Find the correlation coefficient r(X, Y).   [Delhi Univ. M.C.A., 1991]
41. The joint density of (X, Y) is:
f(x, y) = (1/8)(x + y), 0 ≤ x ≤ 2, 0 ≤ y ≤ 2.
Find μ'(r, s) = E(X^r Y^s) and hence find Corr(X, Y).
Ans. μ'(r, s) = 2^(r+s) [1/{(r + 2)(s + 1)} + 1/{(r + 1)(s + 2)}];  r = -1/11.
(b) Find the m.g.f. of the bivariate distribution:
f(x, y) = 1, 0 < (x, y) < 1
        = 0, otherwise
and hence find r(X, Y).
Ans. M(t1, t2) = (e^t1 - 1)(e^t2 - 1)/(t1 t2); t1 ≠ 0, t2 ≠ 0. r(X, Y) = 0.
42. Let (X, Y) have joint density:
f(x, y) = e^(-(x + y)) I(0, ∞)(x) I(0, ∞)(y)
Find Corr(X, Y). Are X and Y independent?
Ans. Corr(X, Y) = 0; X and Y are independent.
43. A bivariate distribution in two discrete random variables X and Y is defined by the probability generating function:
exp[a(u - 1) + b(v - 1) + c(u - 1)(v - 1)],
the simultaneous probability of X = r, Y = s, where r and s are integers, being the coefficient of u^r v^s. Find the correlation coefficient between X and Y.
Hint. Put u = e^t1 and v = e^t2 in exp[a(u - 1) + b(v - 1) + c(u - 1)(v - 1)]; the result will be the m.g.f. of the bivariate distribution and is given by
M(t1, t2) = exp[a(e^t1 - 1) + b(e^t2 - 1) + c(e^t1 - 1)(e^t2 - 1)]
So we have
E(X) = a, E(X²) = a(a + 1), E(Y) = b, E(Y²) = b(b + 1) and E(XY) = ab + c
∴ r(X, Y) = [E(XY) - E(X)E(Y)] / {√[E(X²) - {E(X)}²] √[E(Y²) - {E(Y)}²]} = c/√(ab)
44. Let the number X be chosen at random from among the integers 1, 2, 3, 4 and the number Y be chosen from among those at least as large as X. Prove that Cov(X, Y) = 5/8. Find also the regression line of Y on X.
[Delhi Univ. B.Sc. (Maths Hons.), 1990]
Hint. P(X = k) = 1/4; k = 1, 2, 3, 4 and Y ≥ X.
P(Y = y | X = 1) = 1/4; y = 1, 2, 3, 4
P(Y = y | X = 2) = 1/3; y = 2, 3, 4
P(Y = y | X = 3) = 1/2; y = 3, 4;  P(Y = y | X = 4) = 1; y = 4.
The joint probability distribution can be obtained on using:
P(X = x, Y = y) = P(X = x) · P(Y = y | X = x).
r(X, Y) = Cov(X, Y)/(σX σY) = (5/8)/(σX σY) = √15/√41
Regression line of Y on X:  Y - E(Y) = r (σY/σX)[X - E(X)]
45. Two ideal dice are thrown. Let X1 be the score on the first die and X2 the score on the second die. Let Y = max{X1, X2}. Obtain the joint distribution of Y and X1 and show that
Corr(Y, X1) = 3√3/√73
46. Consider an experiment of tossing two tetrahedra. Let X be the number on the down-turned face of the first tetrahedron and Y the larger of the two numbers. Obtain the joint distribution of X and Y and hence ρ(X, Y).
Ans. ρ(X, Y) = Cov(X, Y)/(σX σY) = (5/8)/√[(5/4)(55/64)] = 2/√11
47. Three fair coins are tossed. Let X denote the number of heads on the first two coins and let Y denote the number of tails on the last two coins.
(a) Find the joint distribution of X and Y.
(b) Find the conditional distribution of Y given that X = 1.
(c) Find Cov(X, Y).
Ans. Cov(X, Y) = -1/4.
48. For the trinomial distribution of two random variables X and Y:
f(x, y) = [n!/(x! y! (n - x - y)!)] p^x q^y (1 - p - q)^(n - x - y)
for x, y = 0, 1, 2, ..., n and x + y ≤ n, p ≥ 0, q ≥ 0 and p + q ≤ 1.
(a) Obtain the marginal distributions of X and Y.
(b) Obtain E(X | Y = y).
(c) Find ρ(X, Y).
Ans. (a) X ~ B(n, p), Y ~ B(n, q),
(b) (X | Y = y) ~ B(n - y, p/(1 - q)).
(Note: p + q ≠ 1)
x̄ = (1/N) Σx Σy x f(x, y) = (1/N) Σx x f(x),
ȳ = (1/N) Σy Σx y f(x, y) = (1/N) Σy y g(y),
σX² = (1/N) Σx Σy x² f(x, y) - x̄² = (1/N) Σx x² f(x) - x̄²,
and similarly for σY².
BIVARIATE FREQUENCY TABLE (CORRELATION TABLE)
The X-series (classes with mid-points x1, x2, ..., xi, ..., xn) is laid out along the columns and the Y-series (classes with mid-points y1, y2, ..., yj, ..., ym) along the rows. The cell in the jth row and ith column contains the frequency f(xi, yj). The column totals give the marginal frequencies of X, f(x) = Σy f(x, y); the row totals give the marginal frequencies of Y, g(y) = Σx f(x, y); and the grand total is N = Σx Σy f(x, y).
Marks (Y)   x: 18   19   20   21   Total
10-20           4    2    2    -      8
20-30           5    4    6    4     19
30-40           6    8   10   11     35
40-50           4    4    6    8     22
50-60           -    2    4    4     10
60-70           -    2    3    1      6
Total          19   22   31   28    100
Calculate the correlation coefficient.
Solution.
CORRELATION TABLE

u = x - 19:          -1     0     1     2
x:                    18    19    20    21
v    y(mid)  Marks                               g(v)   v g(v)   v² g(v)   Σu u v f(u, v)
-2    15     10-20     4     2     2     -        8      -16       32           4
-1    25     20-30     5     4     6     4       19      -19       19          -9
 0    35     30-40     6     8    10    11       35        0        0           0
 1    45     40-50     4     4     6     8       22       22       22          18
 2    55     50-60     -     2     4     4       10       20       40          24
 3    65     60-70     -     2     3     1        6       18       54          15
Total f(u):          19    22    31    28   N = 100       25      167          52
u f(u):             -19     0    31    56   total 68
u² f(u):             19     0    31   112   total 162
Σv v f(u, v):         9     0    13    30   total 52

Let U = x - 19, V = (y - 35)/10.
ū = (1/N) Σ u f(u) = 68/100 = 0.68;  v̄ = (1/N) Σ v g(v) = 25/100 = 0.25
Cov(U, V) = (1/N) ΣΣ uv f(u, v) - ū v̄ = 0.52 - 0.68 × 0.25 = 0.35
σU² = (1/N) Σ u² f(u) - ū² = 162/100 - (0.68)² = 1.1576
σV² = (1/N) Σ v² g(v) - v̄² = 167/100 - (0.25)² = 1.6075
r(X, Y) = r(U, V) = Cov(U, V)/(σU σV) = 0.35/√(1.1576 × 1.6075) ≈ 0.26
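The same grouped-data computation written out in Python (a sketch; it reproduces the step-deviation arithmetic of the table above):

```python
import numpy as np

# Cell frequencies f(u, v): rows are v = -2..3 (mid-values of marks 15..65),
# columns are u = -1..2 (x = 18..21), as in the correlation table above.
f = np.array([[4, 2, 2, 0],
              [5, 4, 6, 4],
              [6, 8, 10, 11],
              [4, 4, 6, 8],
              [0, 2, 4, 4],
              [0, 2, 3, 1]], dtype=float)
u = np.array([-1, 0, 1, 2], dtype=float)
v = np.array([-2, -1, 0, 1, 2, 3], dtype=float)

N = f.sum()
u_bar = (f.sum(axis=0) * u).sum() / N              # mean of U from column totals
v_bar = (f.sum(axis=1) * v).sum() / N              # mean of V from row totals
cov = (f * np.outer(v, u)).sum() / N - u_bar * v_bar
var_u = (f.sum(axis=0) * u**2).sum() / N - u_bar**2
var_v = (f.sum(axis=1) * v**2).sum() / N - v_bar**2
print(round(cov / np.sqrt(var_u * var_v), 3))      # about 0.26
```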
The joint probability distribution of X and Y is:

           X = -1    X = +1
Y = 0        1/8       3/8
Y = 1        2/8       2/8

Find the correlation coefficient between X and Y.
Solution.
COMPUTATION OF MARGINAL PROBABILITIES

           X = -1    X = +1    g(y)
Y = 0        1/8       3/8      4/8
Y = 1        2/8       2/8      4/8
p(x)         3/8       5/8       1

We have:
E(X) = -1 × (3/8) + 1 × (5/8) = 1/4
E(XY) = (-1)(0)(1/8) + (1)(0)(3/8) + (-1)(1)(2/8) + (1)(1)(2/8) = -2/8 + 2/8 = 0
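The remaining steps (E(Y), the variances and finally r) follow the same pattern; a short Python sketch carrying the example through to the value of r:

```python
import numpy as np

# Joint p.m.f. of the example above: rows y = 0, 1; columns x = -1, +1
p = np.array([[1/8, 3/8],
              [2/8, 2/8]])
x_vals = np.array([-1.0, 1.0])
y_vals = np.array([0.0, 1.0])

px, py = p.sum(axis=0), p.sum(axis=1)            # marginal distributions of X and Y
ex, ey = (px * x_vals).sum(), (py * y_vals).sum()
exy = (p * np.outer(y_vals, x_vals)).sum()       # E(XY) = 0 here
var_x = (px * x_vals**2).sum() - ex**2
var_y = (py * y_vals**2).sum() - ey**2
r = (exy - ex * ey) / np.sqrt(var_x * var_y)
print(ex, ey, exy, round(r, 4))                  # r = -1/sqrt(15), about -0.258
```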
Y \ X    15   20   25   30   35   40
40                            2
35                  3    5
30             4   15
25            20
20        3    1
15        1
[Calcutta Univ. B.Sc. (Maths Hons.), 1986]
3. Calculate the correlation coefficient from the following table:
4. (a) Find the correlation coefficient between age and salary of 50 workers in a factory:
Age (in years)      Daily pay in rupees
20-30                5    3    1   ...   ...
30-40                2    6    2    1    ...
40-50                1    2    4    2     2
50-60               ...   1    3    6     2
60-70               ...  ...   1    1     5
(b) Find the coefficient of correlation between the ages of 100 mothers and daughters:
Age of mothers     Age of daughters in years (Y)                 Total
in years (X)       5-10   10-15   15-20   20-25   25-30
15-25                6      3      ...     ...     ...             9
25-35                3     16      10      ...     ...            29
35-45               ...     10      15       7     ...            32
45-55               ...    ...       7      10       4            21
55-65               ...    ...     ...       4       5             9
Total                9     29      32      21       9            100
5. Given the following bivariate table, find the frequency distribution of (U, V), where
U = (X - 7.5)/2.5 and V = (Y - 15)/5.
What shall be the relationship between the correlation coefficients between X, Y and between U, V?

            X:   5    10   Total
Y = 10          30    20     50
Y = 20          20    30     50
Total           50    50    100
6. (a) Find the coefficient of correlation between X and Y for the following table:
        y1    y2    Total
x1     p11   p12     p
x2     p21   p22     q
For the following distribution, calculate E(X), Var(X), Cov(X, Y) and r(X, Y):
X:      0     1     2
       0.1   0.2   0.1   ...
0 0·1 0·2 0·1
Probable error also enables us to find the limits within which the
population correlation coefficient can be expected to vary. The limits are
r ± P.E.(r).
10.6. Rank Correlation. Let us suppose that a group of n individuals is arranged in order of merit or proficiency in possession of two characteristics A and B. These ranks in the two characteristics will, in general, be different. For example, if we consider the relation between intelligence and beauty, it is not necessary that a beautiful individual is intelligent also. Let (xi, yi); i = 1, 2, ..., n be the ranks of the ith individual in the two characteristics A and B respectively. The Pearsonian coefficient of correlation between the xi's and yi's is called the rank correlation coefficient between A and B for that group of individuals.
Assuming that no two individuals are bracketed equal in either classification, each of the variables X and Y takes the values 1, 2, ..., n.
Hence x̄ = ȳ = (1/n)(1 + 2 + 3 + ... + n) = (n + 1)/2
σX² = (1/n) Σ xi² - x̄² = (1/n)(1² + 2² + ... + n²) - [(n + 1)/2]²
= (n + 1)(2n + 1)/6 - (n + 1)²/4 = (n² - 1)/12
∴ σX² = (n² - 1)/12 = σY²
In general xi ≠ yi. Let di = xi - yi. Then
di = (xi - x̄) - (yi - ȳ)   (∵ x̄ = ȳ)
Squaring and summing over i from 1 to n, we get
Σ di² = Σ[(xi - x̄) - (yi - ȳ)]² = Σ(xi - x̄)² + Σ(yi - ȳ)² - 2Σ(xi - x̄)(yi - ȳ)
Dividing both sides by n, we get
(1/n) Σ di² = σX² + σY² - 2 Cov(X, Y) = σX² + σY² - 2ρ σX σY,
where ρ is the rank correlation coefficient between A and B.
⟹ (1/n) Σ di² = 2σX² - 2ρσX² ⟹ 1 - ρ = Σ di²/(2nσX²)
⟹ ρ = 1 - 6 Σ di²/[n(n² - 1)]   ...(10.7)
which is Spearman's formula for the rank correlation coefficient.
Remark. We always have
Σ di = Σ(xi - yi) = Σxi - Σyi = n(x̄ - ȳ) = 0   (∵ x̄ = ȳ)
This serves as a check on the calculations.
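A direct implementation of formula (10.7) (a sketch in Python; the ranks used are those of the Mathematics/Physics example that follows, and the function assumes no tied ranks):

```python
import numpy as np

def spearman_rho(x, y):
    # Spearman's rank correlation: rho = 1 - 6*sum(d^2) / (n(n^2 - 1));
    # valid when there are no tied ranks (see Sec. 10.6.1 for the tie correction).
    x, y = np.asarray(x), np.asarray(y)
    rank_x = x.argsort().argsort() + 1      # ranks 1..n (all values distinct)
    rank_y = y.argsort().argsort() + 1
    d = rank_x - rank_y
    n = len(x)
    return 1 - 6 * (d * d).sum() / (n * (n * n - 1))

maths   = np.arange(1, 17)
physics = np.array([1, 10, 3, 4, 5, 7, 2, 6, 8, 11, 15, 9, 14, 12, 16, 13])
print(spearman_rho(maths, physics))          # 0.8
```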
10.6.1. Tied Ranks. If some of the individuals receive the same rank in a ranking of merit, they are said to be tied. Let us suppose that m of the individuals, say the (k + 1)th, (k + 2)th, ..., (k + m)th, are tied. Then each of these m individuals is assigned a common rank, which is the arithmetic mean of the ranks k + 1, k + 2, ..., k + m.
Derivation of ρ(X, Y). We have
ρ(X, Y) = Σxy/(n σX σY),   ...(*)
where x = X - X̄, y = Y - Ȳ.
If X and Y each takes the values 1, 2, ..., n, then we have
X̄ = (n + 1)/2 = Ȳ
and nσX² = n(n² - 1)/12 = nσY²   ...(**)
Also Σd² = Σ(x - y)² = Σx² + Σy² - 2Σxy ⟹ Σxy = (1/2)[Σx² + Σy² - Σd²]   ...(***)
When m ranks are tied, the sum of their squares is reduced, since the m tied individuals receive the common rank k + (m + 1)/2, whose square multiplied by m is
m[k + (m + 1)/2]² = mk² + m(m + 1)²/4 + mk(m + 1),
so that for each such set the sum of squares Σx² is reduced by m(m² - 1)/12. Writing
TX = (1/12) Σi mi(mi² - 1) = (1/12) Σi (mi³ - mi)   ...(10.7a)
for all such sets in the X-series, and similarly, supposing that there are t such sets of tied ranks with respect to the other series Y,
TY = (1/12) Σj mj'(mj'² - 1) = (1/12) Σj (mj'³ - mj')   ...(10.7b)
Thus, in the case of ties, the new sums of squares are given by:
n Var(X) = Σx² - TX = n(n² - 1)/12 - TX
n Var(Y) = Σy² - TY = n(n² - 1)/12 - TY
and n Cov(X, Y) = (1/2)[Σx² - TX + Σy² - TY - Σd²]   [From (***)]
= n(n² - 1)/12 - (1/2)[TX + TY + Σd²]
∴ ρ(X, Y) = { n(n² - 1)/6 - (Σd² + TX + TY) } / { 2[n(n² - 1)/12 - TX]^(1/2) [n(n² - 1)/12 - TY]^(1/2) }
Calculate the rank correlation coefficient for the proficiencies of this group in Mathematics and Physics.
Solution.
Ranks in Maths (X):   1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16
Ranks in Physics (Y): 1 10  3  4  5  7  2  6  8  11  15   9  14  12  16  13
d = X - Y:            0 -8  0  0  0 -1  5  2  1  -1  -4   3  -1   2  -1   3   Total 0
d²:                   0 64  0  0  0  1 25  4  1   1  16   9   1   4   1   9   Total 136
The rank correlation coefficient is given by
ρ = 1 - 6Σd²/[n(n² - 1)] = 1 - (6 × 136)/(16 × 255) = 1 - 1/5 = 4/5 = 0.8
Example 10.17. Ten competitors in a musical test were ranked by the three judges A, B and C in the following order:
Ranks by A: 1 6 5 10 3 2 4 9 7 8
Ranks by B: 3 5 8 4 7 10 2 1 6 9
Ranks by C: 6 4 9 8 1 2 3 10 5 7
Using the rank correlation method, discuss which pair of judges has the nearest approach to common likings in music.
Solution. Here n = 10.

Ranks by A  Ranks by B  Ranks by C   d1 = X - Y   d2 = X - Z   d3 = Y - Z   d1²   d2²   d3²
(X)         (Y)         (Z)
 1           3           6              -2           -5           -3          4    25     9
 6           5           4               1            2            1          1     4     1
 5           8           9              -3           -4           -1          9    16     1
10           4           8               6            2           -4         36     4    16
 3           7           1              -4            2            6         16     4    36
 2          10           2              -8            0            8         64     0    64
 4           2           3               2            1           -1          4     1     1
 9           1          10               8           -1           -9         64     1    81
 7           6           5               1            2            1          1     4     1
 8           9           7              -1            1            2          1     1     4
Total                                  Σd1² = 200   Σd2² = 60   Σd3² = 214

ρ(X, Y) = 1 - 6Σd1²/[n(n² - 1)] = 1 - (6 × 200)/(10 × 99) = 1 - 40/33 = -7/33
ρ(X, Z) = 1 - 6Σd2²/[n(n² - 1)] = 1 - (6 × 60)/(10 × 99) = 1 - 4/11 = 7/11
ρ(Y, Z) = 1 - 6Σd3²/[n(n² - 1)] = 1 - (6 × 214)/(10 × 99) = 1 - 214/165 = -49/165
Since ρ(X, Z) is maximum, we conclude that the pair of judges A and C has the nearest approach to common likings in music.
10.6.2. Repeated Ranks (Continued). If any two or more individuals are bracketed equal in any classification with respect to characteristics A and B, or if there is more than one item with the same value in the series, then Spearman's formula (10.7) for calculating the rank correlation coefficient breaks down, since in this case each of the variables X and Y does not assume the values 1, 2, ..., n and, consequently, X̄ ≠ Ȳ.
In this case, common ranks are given to the repeated items. This common rank is the average of the ranks which these items would have assumed if they were slightly different from each other, and the next item gets the rank next to the ranks already assumed. As a result of this, the following adjustment or correction is made in the rank correlation formula [cf. (10.7c) and (10.7d)].
In the formula, we add the factor m(m² - 1)/12 to Σd², where m is the number of times an item is repeated. This correction factor is to be added for each repeated value in both the X-series and the Y-series.
Example 10.18. Obtain the rank correlation coefficient for the following data:
X: 68  64  75  50  64  80  75  40  55  64
Y: 62  58  68  45  81  60  68  48  50  70
Solution.
CALCULATIONS FOR RANK CORRELATION
 X     Y    Rank x   Rank y   d = x - y    d²
68    62      4        5         -1         1
64    58      6        7         -1         1
75    68     2.5      3.5        -1         1
50    45      9       10         -1         1
64    81      6        1          5        25
80    60      1        6         -5        25
75    68     2.5      3.5        -1         1
40    48     10        9          1         1
55    50      8        8          0         0
64    70      6        2          4        16
                              Σd = 0   Σd² = 72
In the X-series we see that the value 75 occurs 2 times. The common rank given to these values is 2.5, which is the average of 2 and 3, the ranks these values would have taken if they were different. The next value, 68, then gets the next rank, which is 4. Again we see that the value 64 occurs thrice. The common rank given to it is 6, which is the average of 5, 6 and 7. Similarly, in the Y-series the value 68 occurs twice and its common rank is 3.5, the average of 3 and 4. As a result of these common rankings, the formula for ρ has to be corrected. To Σd² we add m(m² - 1)/12 for each value which occurs m times. In the X-series the correction is to be applied twice, once for the value 75 which occurs twice (m = 2) and then for the value 64 which occurs thrice (m = 3). The total correction for the X-series is
2(4 - 1)/12 + 3(9 - 1)/12 = 1/2 + 2 = 5/2
Similarly, the correction for the Y-series is 2(4 - 1)/12 = 1/2, as the value 68 occurs twice.
Thus ρ = 1 - 6[Σd² + 5/2 + 1/2]/[n(n² - 1)] = 1 - 6(72 + 3)/(10 × 99) = 0.545
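A Python sketch of the tie-corrected computation just carried out (average ranks are built explicitly so the code needs only NumPy):

```python
import numpy as np

def avg_ranks(v):
    # average ranks (1-based): tied values receive the mean of their positions
    v = np.asarray(v, float)
    ranks = np.empty(len(v))
    ranks[v.argsort()] = np.arange(1, len(v) + 1)
    for val in np.unique(v):
        mask = (v == val)
        ranks[mask] = ranks[mask].mean()
    return ranks

def spearman_with_ties(x, y):
    # rho = 1 - 6*(sum d^2 + Tx + Ty) / (n(n^2 - 1)); each set of m tied
    # values contributes m(m^2 - 1)/12 to the correction term T
    rx, ry = avg_ranks(x), avg_ranks(y)
    d2 = ((rx - ry) ** 2).sum()
    n = len(rx)
    T = lambda r: sum(m * (m**2 - 1) / 12
                      for m in np.unique(r, return_counts=True)[1])
    return 1 - 6 * (d2 + T(rx) + T(ry)) / (n * (n**2 - 1))

x = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
print(round(spearman_with_ties(x, y), 3))    # 0.545
```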
10"6'3. Limits for the Rank Correlation Coefficient.
Speannan 's rank correlation coefficient is given by
x 1 2 3 ... ... n -1
. n
Case 1. Suppose n is odd and equal to (2m + 1) then the values of dare:
d : .2m, 2m - 2, 2m - 4, ... , 2, 0, -2, -4, ... - (2m - 2), -2m.
.. L" dl =2 {(2m)2 + (2m - 2)2 ... + 42 + 22)
j .. 1
II
6 I. d?
Hence . P =J j - 1 _ 1 8m(m + 1)(2m + 1) .
n(n 2 -I) - -(2m + I){(2m + I)2-I)
=8m(m + 1) = 1 _ 8m(m + I) -1
(4m2 + 4m) 4m(m + 1)
Case n. Let n be even and equal to 2m. (say).
Then the values of d are
(2m - 1). (2m - 3) •...• 1. -1. -3 •...• -(2m - 3). -(2m - 1)
. . L d? = 2 {(2m - 1)2 + (2m - 3)2 + ••. + I2)
= 2[{(2m)2 + (2m _1)2 + (2m - 2)2+ '" + 22 + 12)
_{(2m)2 + (2m - 2)2 + ..• + 42 + 22)]
2
::; 2[1 + 22 + '" + (2m)2 _ {22m 2 + 22(m -'--1)2 + .•. + 22}]
always some loss of information. Unless many ties exist, the coefficient of rank correlation should be only slightly lower than the Pearsonian coefficient.
5. Spearman's formula is the only formula to be used for finding the correlation coefficient if we are dealing with qualitative characteristics which cannot be measured quantitatively but can be arranged serially. It can also be used where actual data are given. In the case of extreme observations, Spearman's formula is preferred to Pearson's formula.
6. Spearman's formula has its limitations also. It is not practicable in the case of a bivariate frequency distribution (correlation table). For n > 30, this formula should not be used unless the ranks are given, since in the contrary case the calculations are quite tedious.
EXERCISE 10(c)
1. Prove that Spearman's rank correlation coefficient is given by
ρ = 1 - 6Σdi²/(n³ - n), where di denotes the difference between the ranks of the ith individual.
2. (a) Explain the difference between the product moment correlation coefficient and the rank correlation coefficient.
2. (a) Explain the difference between product moment correlation
coefficient and rank correlation coeffic\ent.
(b) The rankings of teD' students in twO' subjects A and B are as follows:
A 3 5 8 4 7 1-0 1 6 9
B 6 4 9 8 2 3 10 5 7
Find the correlation coefficient.
3. (a) Calculate the coefficient of,correlation for ranks-from the following
data :
(X, Y): (5, 8), (10, 3), (6, 2), (3, 9), (19, 12), (5, 3), (6, 17), (12, 18), (8, 22), (2, 12), (10, 17), (19, 20).
[Calicut Univ. B.Sc. (Subs. Stat.), Oct. 1991]
(b) Te'n recruits were subjected to a selection test to ascertain their
suitability for a certain course of training. At the end of training they were given
a proficiency test.
The marks secured by recruits in the selection test (X) and in the proficiency
test (Y) given below : -
Serial·No. 1 2 3 4 5 6 7 8 9 '10
X 10 15 12 17' 13 16 24' 14 22 20
Y : 30 42 45 46 33 34 40 35 39 38
Calculate product inoment correlation coefficient and rank correlation
coefficient. Why are two coefficients different?
4. (a) The I.Q.'s,of a group of 6 persons were measured, and they then, sat
for a certain examination. Their I.Q.'s and.examination marks were as follows:
Person: A B' C D E' F
I.Q. : 110 100 140 120 80 90
Exam. marks : 70 60 80 60 10 20
Compute the coefficients of correlation and,rank correlation. Why are the
correlation figures obtained different?
Ans. 0·882 and· 0,9.
The difference arises due to the fact that when ranking is used instead of the
full set of observations, there is always some loss of information.
(b) The value of ordinary correlation (r) for the following data is 0·636 : -
X: ·05 ·14 ·24 ·30 ·47 ·52 ·57 ·61 ·67 ·72
Y: r·08 1·15 1·27 1·33 1·41' 1·46 1·54 2·72 4·01 9·63
(i) Calculate Spearman's rank-correlation (p) for this data.
(iJ) What advantage of p was br9ught out in this example ?
4. Ten competitors in a beauty contest an[ ranked by three judges as
follows:
Competitors
Judges 1 2 3 4 5 6 7 8 9 10
A 6 5 3 10 2 4 9 7 8 1
B 5 8 do 7 10 2 1 6 9 3
C 4 9 8 1 2 3 10 5 7 6
Discuss which pair of judges has the nearest approach to common tastes of
beauty.
s. A sample of 12 fathers and their eldest sons gave the following data
about their height in inches :
Father: 65 63 67 64 68 62 70 66 68 67 69 71
Son : 68 66 68 65 69 66 68 65 ii 67 68 70
Calculate coefficient of rank correlation. (Ans. 0·7220)
6. The coefficient of rank correlation between marks in Statistics and marks
in Mathematics obtained by a certain group of students is 0·8. If the sum of the
squares of the difference in ranks is given to be 33, find the number of student in
the group (Ans. 10). [ModrOB Univ. B.Sc., 1990]
7. The coefficient of rank correlation of the marks obtained by 10 students
in Maths and Statistics was found to be 0·5. It was later discovered that the
difference in ranks in two subjects obtained by one of the students was wrongly
taken as 3 instead of 7. Find the correct coefficient of rank .correlation.
Hint. 0.5 = 1 - 6Σd²/(10 × 99)
Σ di² = Σ [4xi² + (n + 1)² - 2(n + 1)·2xi] = n(n² - 1)/3
∴ ρ = 1 - 6Σdi²/[n(n² - 1)] = 1 - 2 = -1.
This gives us
xi + yi = n + 1, i = 1, 2, ..., n.
Hence the value of ρ = -1 obtained above is the minimum value of ρ.
10. Show that in a ranked bivariate distribution in which no ties occur and
in which the variables are independent
(a) I. d? is always even. and
i
L" Yi = na + b L
" Xi ... (10·8)
;ml i-I
and
"
L X;Yi
"
=a i=1 "
L Xi + b L xl- ... (10·9)
i-I i-I
-
X - x -(Jx (Y -
=r Uy -)
l' ... (10·15a)
With S = E[Y - a - bX]², the necessary and sufficient conditions for a minimum are
(i) ∂S/∂a = 0, ∂S/∂b = 0   ...(*)
and (ii) Δ = (∂²S/∂a²)(∂²S/∂b²) - (∂²S/∂a∂b)² > 0 and ∂²S/∂a² > 0   ...(**)
Using (*), we get
∂S/∂a = -2 E[Y - a - bX] = 0   ...(iii)
∂S/∂b = -2 E[X(Y - a - bX)] = 0   ...(iv)
⟹ E(Y) = a + bE(X)   ...(v)  and  E(XY) = aE(X) + bE(X²)   ...(vi)
Equation (v) implies that the line (i) of regression of Y on X passes through the mean value [E(X), E(Y)].
Multiplying (v) by E(X) and subtracting from (vi), we get
E(XY) - E(X)E(Y) = b[E(X²) - {E(X)}²]
⟹ Cov(X, Y) = b σX² ⟹ b = Cov(X, Y)/σX² = r σY/σX   ...(vii)
Subtracting (v) from (i) and using (vii), we obtain the equation of the line of regression of Y on X as:
Y - E(Y) = [Cov(X, Y)/σX²][X - E(X)] ⟹ Y - E(Y) = (r σY/σX)[X - E(X)]
Similarly, the straight line given by X = A + BY
and satisfying the residual condition
E[X - A - BY]² = Minimum,
is called the line of regression of X on Y.
Remarks 1. We note that
∂²S/∂a² = 2 > 0, ∂²S/∂b² = 2E(X²) and ∂²S/∂a∂b = 2E(X)
Substituting in (**), we have
Δ = (∂²S/∂a²)(∂²S/∂b²) - (∂²S/∂a∂b)² = 4[E(X²) - {E(X)}²] = 4σX² > 0
Hence the solution of the least square equations (iii) and (iv), in fact, provides a minimum of S.
2. The regression equation (10.14a) implies that the line of regression of Y on X passes through the point (X̄, Ȳ). Similarly, (10.15a) implies that the line of regression of X on Y also passes through the point (X̄, Ȳ). Hence both the lines of regression pass through the point (X̄, Ȳ). In other words, the mean values (X̄, Ȳ) can be obtained as the point of intersection of the two regression lines.
3. Why two lines of regression? There are always two lines of regression, one of Y on X and the other of X on Y. The line of regression of Y on X (10.14a) is used to estimate or predict the value of Y for any given value of X, i.e., when Y is the dependent variable and X is an independent variable. The estimate so obtained will be best in the sense that it will have the minimum possible error as defined by the principle of least squares. We can also obtain an estimate of X for any given value of Y by using equation (10.14a), but the estimate so obtained will not be best, since (10.14a) is obtained on minimising the sum of the squares of errors of estimates in Y and not in X. Hence to estimate or predict X for any given value of Y, we use the regression equation of X on Y (10.15a), which is derived on minimising the sum of the squares of errors of estimates in X. Here X is the dependent variable and Y is an independent variable. The two regression equations are not reversible or interchangeable because of the simple reason that the bases and assumptions for deriving these equations are quite different. The regression equation of Y on X is obtained on minimising the sum of the squares of the errors parallel to the Y-axis, while the regression equation of X on Y is obtained on minimising the sum of squares of the errors parallel to the X-axis.
In the particular case of perfect correlation, positive or negative, i.e., r = ±1, the equation of the line of regression of Y on X becomes:
Y - Ȳ = ±(σY/σX)(X - X̄) ⟹ (Y - Ȳ)/σY = ±(X - X̄)/σX   ...(10.16)
Similarly, the equation of the line of regression of X on Y becomes:
(X - X̄)/σX = ±(Y - Ȳ)/σY,
which is the same as (10.16). Hence, in the case of perfect correlation, the two lines of regression coincide.
E(Y | x) = ∫ y f(y | x) dy = [1/f1(x)] ∫ y f(x, y) dy = a + bx   ...(2)
Solving for a and b, we get
b = μ11/σX²  and  a = Ȳ - (μ11/σX²) X̄
Substituting in (1) and simplifying, we get the required equation of the line of regression of Y on X as
E(Y | X) = Ȳ + (μ11/σX²)(X - X̄)
By starting with the line E(X | y) = A + By and proceeding similarly, we shall obtain the equation of the line of regression of X on Y as
E(X | Y) = X̄ + (μ11/σY²)(Y - Ȳ) = X̄ + r(σX/σY)(Y - Ȳ)
Example 10.19. Given
f(x, y) = x e^(-x(y + 1)); x ≥ 0, y ≥ 0,
find the regression curve of Y on X.   [B.H. Univ. M.Sc., 1989]
Solution. The marginal p.d.f. of X is given by
f1(x) = ∫₀^∞ f(x, y) dy = x e^(-x) ∫₀^∞ e^(-xy) dy = x e^(-x) [-e^(-xy)/x] from 0 to ∞ = e^(-x), x ≥ 0
The conditional p.d.f. of Y for given X is
f(y | x) = f(x, y)/f1(x) = x e^(-xy), y ≥ 0
∴ E(Y | X = x) = ∫₀^∞ y · x e^(-xy) dy = 1/x
Hence the regression curve of Y on X is
y = 1/x, i.e., xy = 1,
which is the equation of a rectangular hyperbola. Hence the regression of Y on X is not linear.
Example 10.20. Obtain the regression equation of Y on X for the following distribution:
f(x, y) = [y/(1 + x)⁴] exp[-y/(1 + x)]; x ≥ 0, y ≥ 0
Solution. The marginal p.d.f. of X is given by
f1(x) = ∫₀^∞ f(x, y) dy = [1/(1 + x)⁴] ∫₀^∞ y e^(-y/(1 + x)) dy = 1/(1 + x)², x ≥ 0
The conditional p.d.f. of Y (for given X) is
f(y | x) = [y/(1 + x)²] exp[-y/(1 + x)], y ≥ 0
The regression equation of Y on X is therefore given by
E(Y | X = x) = ∫₀^∞ y f(y | x) dy = 2(1 + x).
f2(y) = ∫ from |y| to 1 of f(x, y) dx = ∫ from |y| to 1 of dx = 1 - |y|; -1 < y < 1
Also f1(x) = 2x, 0 < x < 1, so that f(y | x) = 1/(2x), -x < y < x.
E(Y | X = x) = ∫ from -x to x of y f(y | x) dy = ∫ from -x to x of (y/2x) dy = [y²/(4x)] from -x to x = 0
Hence the curve of regression of Y on X is y = 0, which is a straight line.
E(X | Y = y) = ∫ x f(x | y) dx
For the joint p.d.f. f(x, y) = k(x + y), 0 ≤ x ≤ 1, 0 ≤ y ≤ 2 (so that k = 1/3), find
(i) r(X, Y),
(ii) the two lines of regression,
(iii) the two regression curves for the means.
Solution. The marginal p.d.f.'s of X and Y are given by:
f1(x) = ∫₀² f(x, y) dy = k ∫₀² (x + y) dy = (2/3)(1 + x); 0 ≤ x ≤ 1   ...(1)
f2(y) = ∫₀¹ f(x, y) dx = (1/6)(1 + 2y); 0 ≤ y ≤ 2   ...(2)
E(Y | x) = ∫₀² y f(y | x) dy = [1/(2(1 + x))] ∫₀² y(x + y) dy = (3x + 4)/[3(x + 1)]
Similarly, we shall get
E(X | y) = ∫₀¹ x f(x | y) dx = [2/(1 + 2y)] ∫₀¹ (x² + xy) dx = (2 + 3y)/[3(1 + 2y)]
(iii) Hence the curves of regression for the means are:
y = E(Y | x) = (3x + 4)/[3(x + 1)]  and  x = E(X | y) = (2 + 3y)/[3(1 + 2y)]
From the marginal distributions we shall get
E(X) = 5/9, Var(X) = 7/18 - 25/81 = 13/162; E(Y) = 11/9, Var(Y) = 23/81; Cov(X, Y) = -1/81
(i) r(X, Y) = Cov(X, Y)/(σX σY) = (-1/81)/√[(13/162)(23/81)] = -(2/299)^(1/2)
(ii) The two lines of regression are:
Y - E(Y) = [Cov(X, Y)/σX²][X - E(X)]
and X - E(X) = [Cov(X, Y)/σY²][Y - E(Y)]
;"= {I -r r2 ( axaXay
tan-1
2 + ay"
.. :(10,19)
tL
vz =0btuse ang1
e =tan- I { Ox
Z
.Oy
, Ox + Or
2
rZ -
• --
r
I} ,r > 0
-t
and (lO·15a)1 VI-
Y = Yand X =X,
as shown in the adjoining diagratn. (O,Y
V=Y teXt y_}
Hence, in this case (r =0), the lines X: X
of regression are perpendicular to
each other and are parallel to X- axis
and Y-axis:respectively.
3. The fact that if r =0 (variables'uncorrelated), the two lines of regression
= =
are perpendicular to each and if r ±l, e 0, i.e., the two lines coincide, leads
us to the conclUsion that for higher degree of correlation between the variables,
the angle between the lines is smaller, i.e ... .the two lines of regression are
nearer to each other. On the other hand, if the lines of regression make a larger
angle, they indicate a poor degree of correlation between the variables and
= =
ultimately for e 1t/2, r 0, i.e.. the lines become perpendicular if no
correlatiQn exists between the variables. Thus by ploUing the lines of regression
on a graph paper, we ,can. have an. approximate idea about the degree of
correlation between two variables under study. Consider' the following
:
1WOllNES 1WOllNES .1WOUNES 1WOllNES
COtNc;IDE COINODB >\PART (lOW APART(HIOH
(r=-l) (r=+l) Dl3GREEOF DEGREEOF
roRRELATION
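A small sketch of formula (10.19), returning the acute angle (the standard deviations used are illustrative):

```python
import numpy as np

def angle_between_regression_lines(r, sigma_x, sigma_y):
    # theta = arctan( (1 - r^2)/|r| * sigma_x*sigma_y / (sigma_x^2 + sigma_y^2) );
    # equals 0 for r = +/-1 and pi/2 for r = 0, matching the discussion above.
    if r == 0:
        return np.pi / 2
    return np.arctan((1 - r**2) / abs(r) * sigma_x * sigma_y / (sigma_x**2 + sigma_y**2))

for r in (0.1, 0.5, 0.9, 1.0):
    print(r, round(np.degrees(angle_between_regression_lines(r, 2.0, 3.0)), 2))
# the angle shrinks towards 0 as |r| increases
```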
The line of regression of Y on X is
Y = Ȳ + r (σY/σX)(X - X̄), i.e., (Y - Ȳ)/σY = r (X - X̄)/σX
The residual variance s² is the expected value of the squares of the deviations of the observed values of Y from the expected values as given by the line of regression of Y on X. Thus
s² = E[Y - {Ȳ + r σY (X - X̄)/σX}]²
= σY² E[Y* - r X*]², where X* = (X - X̄)/σX and Y* = (Y - Ȳ)/σY.
We have
⟹ σŶ = r σY
Also Cov(Y, Ŷ) = E[{Y - E(Y)}{Ŷ - E(Ŷ)}]
Variance of X = 9.
Regression equations: 8X - 10Y + 66 = 0, 40X - 18Y = 214.
What were (i) the mean values of X and Y,
(ii) the correlation coefficient between X and Y, and
(iii) the standard deviation of Y?
[Punjab Univ. B.Sc. (Hons.), 1993]
Solution. (i) Since both the lines of regression pass through the point (X̄, Ȳ), we have 8X̄ - 10Ȳ + 66 = 0 and 40X̄ - 18Ȳ = 214.
Solving, we get X̄ = 13, Ȳ = 17.
(ii) Let 8X - 10Y + 66 = 0 and 40X - 18Y = 214 be the lines of regression of Y on X and X on Y respectively. These equations can be put in the form:
Y = (8/10)X + 66/10 and X = (18/40)Y + 214/40
∴ b(Y on X) = 8/10, b(X on Y) = 18/40 and r² = (8/10)(18/40) = 0.36, i.e., r = +0.6.
Had we supposed the lines the other way round, we would have obtained r² = (10/8)(40/18) = 25/9. But since r² always lies between 0 and 1, that supposition is wrong.
(iii) b(Y on X) = r σY/σX ⟹ 8/10 = 0.6 × σY/3 ⟹ σY = 4.
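The arithmetic of this example, checked in a few lines of Python (a sketch of the steps just described):

```python
import numpy as np

# Example: 8X - 10Y + 66 = 0 and 40X - 18Y = 214, with Var(X) = 9
A = np.array([[8.0, -10.0],
              [40.0, -18.0]])
b = np.array([-66.0, 214.0])
x_bar, y_bar = np.linalg.solve(A, b)       # both lines pass through (x_bar, y_bar)

b_yx, b_xy = 8/10, 18/40                   # regression coefficients of Y on X and X on Y
r = np.sqrt(b_yx * b_xy)                   # r^2 = b_yx*b_xy; sign is positive here
sigma_y = b_yx * np.sqrt(9) / r            # from b_yx = r * sigma_y / sigma_x
print(x_bar, y_bar, r, sigma_y)            # 13.0 17.0 0.6 4.0
```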
Example 10.25. Find the most likely price in Bombay corresponding to the price of Rs. 70 at Calcutta from the following:
                       Calcutta   Bombay
Average price             65        67
Standard deviation        2.5       3.5
The correlation coefficient between the prices of the commodity in the two cities is 0.8.
[Nagpur Univ. B.Sc., 1993; Sri Venkateswara Univ. B.Sc. (Oct.) 1990]
Solution. Let the prices (in rupees) in Bombay and Calcutta be denoted by Y and X respectively. Then we are given
X̄ = 65, Ȳ = 67, σX = 2.5, σY = 3.5 and r = r(X, Y) = 0.8. We want Ŷ for X = 70.
The line of regression of Y on X is
Y - Ȳ = r (σY/σX)(X - X̄)
⟹ Y = 67 + 0.8 × (3.5/2.5)(X - 65)
When X = 70,
Ŷ = 67 + 0.8 × (3.5/2.5)(70 - 65) = 72.6
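The same prediction as a tiny Python sketch:

```python
# Example 10.25: predict the Bombay price (Y) from the Calcutta price (X)
x_bar, y_bar = 65.0, 67.0
sigma_x, sigma_y, r = 2.5, 3.5, 0.8

def predict_y(x):
    # line of regression of Y on X: Y = y_bar + r*(sigma_y/sigma_x)*(x - x_bar)
    return y_bar + r * (sigma_y / sigma_x) * (x - x_bar)

print(predict_y(70))    # 72.6
```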
Example 10.26. Can Y = 5 + 2.8X and X = 3 - 0.5Y be the estimated regression equations of Y on X and X on Y respectively? Explain your answer with suitable theoretical arguments.   [Delhi Univ. M.A. (Eco.), 1986]
Solution. The line of regression of Y on X is:
Y = 5 + 2.8X ⟹ b(Y on X) = 2.8   ...(*)
The line of regression of X on Y is:
X = 3 - 0.5Y ⟹ b(X on Y) = -0.5   ...(**)
This is not possible, since each of the regression coefficients b(Y on X) and b(X on Y) must have the same sign, which is the same as that of Cov(X, Y). If Cov(X, Y) is positive, then both the regression coefficients are positive, and if Cov(X, Y) is negative, then both the regression coefficients are negative. Hence (*) and (**) cannot be the estimated regression equations of Y on X and X on Y respectively.
"If the respective deviations in each series, X and Y, from their means were
expressed in units of standard deviations, i.e., if each were divided by the
"If the respective deviations in each series, X and Y. from their means were
expressed in units of standard deviations, i.e., if each were divided by the
r
·10-65
where crh crz are the standard deviations of X and Y respectively, and' p is ,the
correlation coefficient (Delhi Univ. RSc. (Math •• Hon••), 1989)
Interpret the cases when p = 0 and p = ± 1.
(Bango1ore Univ. B.Sc. 1990)
(b) If e is the acute angle' between the two regression lines with 'Yorrelation
coefficient r, show that sin e 1 - r2.
3. (a) Explain the'term "regression" by giving examples. Assuming that
the regression of Yon X is linear,'outJine a method for the estimation of the
coefficients in the regression line based on the random paired sample of X and
y, and show -that the varian'ce of the erroF. of the estimate for Y for the
regression line is cry2 (1 - p2), where cri is the variance of Y and p is the
correlation Coefficient between X and Y.
(b) Prove that X and Yare lineady related if and only if Pxy2 =.1.,further
show that the slope of the regression line is positive or nega.tivCl according as
p=+lorp=-l.
(c) Let X and Y be two variates. Define X* = (X - a)/b, Y* = (Y - c)/d for some constants a, b, c and d. Show that the regression line (least squares) of Y on X can be obtained from that of Y* on X*.
(d) Show that the, coefficient of correlation between the' observed and the
estimated values of Y obtained from the line of regression of Y on X, is the
same as that between X and Y.
4. Two variables X and Y are known to be related to each 'other by t)te
relation Y =X/(aX + b). How is the theory or linear regressic;m to be employed
to estimate the constants a and b from a set of n pairs of observations (Xi, Yi),
i =1,2, ..., n ?
Hint. 1/Y = (aX + b)/X = a + b/X
Put 1/X = U and 1/Y = V
⟹ V = a + bU
S. Derive the standard error of estimate of Y obtained from the linear
regression equation o( r on X. What does this standard error measure?
6. (0) Calculate the tocfficien( of correlation from the following data :
X: 1 2 3 4 5 6 7 8 9
Y: 9 8 10 12 H 13 14· 16 15
AI,so obtain the equations of the lines of regression and obtain an estimate
of Y which s!tould correspond on the average to X =6·2.
Ans. r = 0.95; Y - 12 = 0.95(X - 5), X - 5 = 0.95(Y - 12); Ŷ = 13.14 for X = 6.2.
(b) Why do we have, in general, two lines of regression '1 Obtain the
regression of Y on X, and X on Y from the following table and estimate the
blood pressure when the age is 45 years:
Age in years Blood pressure Age in years Illood pressure
(X) (Y) (X) . (Y)
56 147 55 150
...-42---... 125 49 145
12 160 38 115
36 118 42 140·
63 149 68 1 152
47 128 60 155
Ans. Y = 1·138X + 80·778, Y = 131·988 for X = 45.
(c) Suppose the observations on X and Y are given as :
X: S9 65 45 52 60 62 70 55 45 49
Y: 75 70 55 65 60 '69 80 .65 59 61
where N = 10 students, and Y = Marks in Maths, X = Marks in Economics.
Compute the least square regression equations of Y on X and of X on Y.
If a student gets 61 marks in Economics, what would you estimate his
marks in Maths to be ?
7. (a) In a correlation analysis on the ages of wives and husbands, the
following data were (,btained. Find
(I) the value of the correlation coemcient, and (il) the lines of regression.
Estimate the age of husband whose wife's age is 31 years. Estimate the age
of wife whose husband is 40 years old. J
30-45 18 32 15 12 8
45-60 2 28 40 16 9
60-75 - 9 10 8
(b) The following table gives the distribution of 'otal cultivable a,ea (X)
and area under cultivation (Y) in a district of 69 villages.
Calculate (0 the linear regression of Yon X.
(il) the correlation coefficient r(X. y), and (iii) the average area under wheat
corresponding to total area of 1,000 Bigi\as,
;;
\)
, Total' area Yin Bighas)
I
1 400-600 ...
...
4 1
...
.. .
:3
2
... ,
! 1 1
(iii) Ŷ = 308.7146 for X = 1000.
8. (a) Compare and contrast the roles of correlation regression 'in
studying the iQteJ:-depen&nce of two variates.
FOI: 10' observations on price (X) and supply (Y) the following data were
obtained (in appropriate units).
=
LX = 130, I,Y =220, I,Xl' =2288, I,f2 5506 and I,XY 3467 =
Obtaill the line of regression of Y on X and estimate the supply when the
price is 16 units, and find out the standard error. of the estimate,
ADS. Y = 8·8 + 1·015X, 25·04
(b) If a number X is chosen at random from anlong the integers 1,2,3,4
and a number Y is from among those at least as l!tfge as X, prove that
s .
Cov (X. Y) =8
Find also the regression line of X on Y.
(c) Calculate the correlation coefficient from Ihe following data : -
N = 100, IX =12500 I,Y =8000
I,X2 = 1585000, I,f2 =648100 I,XY' = 1007425.
obtain the regression of Y on X . ' •
9. (a) The means of a bivariate frequency distribution are at (3, 4), and r = 0.4. The line of regression of Y on X is parallel to the line Y = X. Find the two lines of regression and estimate the mean of X when Y = 1.
(b) For certain data, Y = 1.2X and X = 0.6Y are the regression lines. Compute ρ(X, Y) and σX/σY. Also compute ρ(X, Z), if Z = Y - X.
(c) The equations of two lines obtained in a correlation analysis are as follows:
3X + 12Y = 19, 3Y + 9X = 46.
Obtain (i) the value of the correlation coefficient,
(ii) the mean values of X and Y,
(c) The equations of two lines obtained in a correlation analysis
are as follows :
=
3X + 12Y 19, 3Y + 9X 46 =
Obtain (,) the value ofcorrelation coefficient,
(ii) mean values of X and Y,
3Y - 5X + 180 = 0 ⟹ X = (3/5)Y + 36
b(X on Y) = r σX/σY = 3/5, and since σX/σY = 3/4, r = 0.8.
Since the lines of regression pass through the point (X̄, Ȳ), we get
X̄ = (3/5)Ȳ + 36 = (3/5) × 44 + 36 = 62.4
(c) Out,of the two lines ,of. regression given by
X + 2Y - 5 = 0 and 2X + 3Y- - 8 = 0,
which one is the regression line of X on Y?
Use the equations to find the mean of X and the mean of Y. If the variance
of X is '}2', calculate the variance of Y.
Ans.
(Q) The lines of regression-in a bivariate distribution are :
X + 9Y = 7 and Y + 4X = 4:
Find (l) the coefficient of correlation, (iit) the ratios cr; : crYl : Cov (X, y),
(iii) the means of the distr!bution and (tv) E(X I Y= 1).
(e) Estimate.X when Y= IO,if the two lines of are :
1
X = - Ii Y + A. and Y = -2x + J,l.
(A, J!) being unknown and the meal) of the distribution is at (-1. 2). Also
compute r, A. and J.1. [Gujarat Univ. B.Sc., Oct. 199.J} ,
11. (a) The following reSultS were obtained in th.e arullysis of dat1 on yield
of dry bark in ounces (Y) and age in years (X)'of 200 cinchOna plants :
X Y
Average 9·2 . 16·5
Standard deviation ·2·1 4·2
Correlation coeffiCient = +0·84
.." f •
m
N
A subsequent scrutiny showed that two pairs of values were copied down
as :
8
8
14
6
y
m 8
6
y
12
8
/I
i /;
∂S/∂α = 0 = 2 Σi fi (xi cos α + yi sin α - p)(-xi sin α + yi cos α)   ...(2)
∂S/∂p = 0 = -2 Σi fi (xi cos α + yi sin α - p)   ...(3)
σX² = (1/N) Σ fi xi (xi - x̄) and σY² = (1/N) Σ fi yi (yi - ȳ)
Find: (l) E(Y IX =x), (il) E[XY I:¥ =xi, (iiI) 'ry IX == x]
[Dellai Uni,,-. BSc. (Moth. Hon ••),· 1988]
(d) 1 -'
C:'
-
c:-
(e) rex. y) - (XI
(1% (1, 2
1
y) - {il{2'= -2
2
(f) Hint. f(x, y, α) = f1(x) f2(y)[1 + α(2F1(x) - 1)(2F2(y) - 1)],
|α| < 1, has the same marginals f1(x) and f2(y).
26. Obtain the regression equation of Y on X for the distribution:
(b) f(x, y) = ... (1 + x + y) ...; x ≥ 0, y ≥ 0   [... M.Sc., 1992]
(VI) The greater the value of 'r', the better are the estimates obtained
through regression an8Iysis.
(vii) If X and Yare negatively correlated variables, and (0, 0) is on the
least line of Y on X, and if X = 1 is, the obser\red value then
predicted value of Y must be. negative.
(viiI) Let the correlation between X and Y be perfect and positive.
Suppose the points (3, 5) and (1,4) are on the regression lines, With
this knowledge it is possible to detennine the least squares line
lexactIy. .
(ix) If the lines of regression are Y = X and X = Y + 1, then ρ = 1 and E(X | Y = 0) = 1.
(x) In a distribution, b(Y on X) = 2.8 and b(X on Y) = 0.3.
II. Fill in the blanks:
(i) The regression analysis measures ... between X and Y.
(ii) Lines of regression are ... if r(X, Y) = 0 and they are ... if r(X, Y) = ±1.
(iil) If the regression coefficients of X on Y and Y on X are - 04 and
- 0·9 respectively then the correlation coefficient between X and Y
is ...
(iv) If the two regression lines are X + 3Y -5 =0 and 4X + 3Y - 8 =0,
r
then the correlation coefficient between X and is ...
(v) leone of the 'regression coefficients is ... unity, the other must,be ...
unity. '
(VI) The farther the two regression lines cut each other, ... will. be the
degree of correlation.
(viI) When one regression coefficient is positive, the other would be ..•
(viii) 'The sign 9f regression coefficient is ••• as that of cOrrelation
coefficienL
(ix) Correlation coefficient is the, ... between regression coefficients.
'(x) Arithmetic of is '" correlalion'coeffi-
. cienL
(.u) When the correlation coefficient is :zero, ,the. two lines are
•. and when it is ± I, then the regression lines are ...
III. Indicate the correct answer:
(i)' The regression line of Y on X (a) total of the squares of
'horizontal deviations. (b) total of the squares' of tht( vertical
devialions, (c) both vertical and horiZontal deviations, {d) none 'of
these.
(ii) The regression coefficients. b2 and hi, Then the correlation
. coefficient r is (a) bl/b,., (b) b;/bh (c) bib,. (d) ± VtJi b2 '
-'(iil) "Qte f8Jlber the two lines cut each other (a) tIle greater will
be the degree of correlation, (b) the lesser will be the degree of
correlation, (c) does DOt matter.
(iv) If one regression coefficient is greater than unity, then the other must be (a) greater than the first one, (b) equal to unity, (c) less than unity, (d) equal to zero.
(v) When the correlation coefficient r = ± 1, then the two regression lines (a) are perpendicular to each other, (b) coincide, (c) are parallel to each other, (d) do not exist.
(vi) The two lines of regression are given as X + 2Y − 5 = 0 and 2X + 3Y = 8. Then the mean values of X and Y respectively are (a) 2, 1, (b) 1, 2, (c) 2, 5, (d) 2, 3.
(vii) The tangent of the angle between the two regression lines is given as 0·6 and the s.d. of Y is known to be twice that of X. Then the value of the correlation coefficient between X and Y is (a) …, (b) …, (c) 0·7, (d) 0·3.
IV. σ_X and σ_Y are the standard deviations of two correlated variables X and Y respectively in a large sample, and r is the sample correlation coefficient.
(i) State the "Standard Error of Estimate" for the linear regression of Y on X.
Thus the first suffix i indicates the vertical array while the second suffix j indicates the position of y in that array. If ȳᵢ and ȳ denote the mean of the ith array and the overall mean respectively, then
  ȳᵢ = (1/nᵢ) Σⱼ fᵢⱼ yᵢⱼ = Tᵢ/nᵢ  and  ȳ = (Σᵢ Σⱼ fᵢⱼ yᵢⱼ)/N = (Σᵢ nᵢ ȳᵢ)/(Σᵢ nᵢ) = T/N
In other words, ȳ is the weighted mean of all the array means, the weights being the array frequencies.
Def. The correlation ratio of Y on X, usually denoted by η_YX, is given by
  η²_YX = 1 − σ²_eY/σ²_Y   ...(10·21)
where σ²_eY and σ²_Y are given by
  σ²_eY = (1/N) Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − ȳᵢ)²  and  σ²_Y = (1/N) Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − ȳ)²
A convenient expression for η_YX can be obtained in terms of the standard deviation σ_mY of the means of the vertical arrays, each mean being weighted by the array frequency. We have
  Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − ȳ)² = Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − ȳᵢ)² + Σᵢ Σⱼ fᵢⱼ (ȳᵢ − ȳ)² + 2 Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − ȳᵢ)(ȳᵢ − ȳ)
The last term, 2 Σᵢ [(ȳᵢ − ȳ) Σⱼ fᵢⱼ (yᵢⱼ − ȳᵢ)], vanishes since Σⱼ fᵢⱼ (yᵢⱼ − ȳᵢ) = 0. Hence
  N σ²_Y = N σ²_eY + N σ²_mY
We have
  N σ²_mY = Σᵢ nᵢ (ȳᵢ − ȳ)² = Σᵢ nᵢ ȳᵢ² − N ȳ² = Σᵢ (Tᵢ²/nᵢ) − (T²/N)   ...(10·23)
where Ŷᵢⱼ is the estimate of yᵢⱼ for a given value of X = xᵢ, say, as given by the line of regression of Y on X, i.e., Ŷᵢⱼ = a + bxᵢ, (j = 1, 2, ..., nᵢ). But the residuals about the array means cannot exceed the residuals about the line of regression, and
  Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − ȳᵢ)² = N σ²_eY ≤ Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − Ŷᵢⱼ)² = N σ²_Y (1 − r²)   (cf. § 10·7·6)
∴ (*) ⇒ 1 − η²_YX ≤ 1 − r²,
i.e., r² ≤ η²_YX  and  | η_YX | ≥ | r |.
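The computation of η_YX from a grouped correlation table can be checked numerically. The following Python sketch (not part of the original text) evaluates (10·21) and the equivalent form through (10·23) for a small hypothetical frequency table; all data values are illustrative assumptions.

    import numpy as np

    # Hypothetical grouped data: f[i, j] = frequency of y_vals[j] in the i-th x-array.
    f = np.array([[2, 1, 0],
                  [1, 3, 1],
                  [0, 2, 2]], dtype=float)
    y_vals = np.array([10.0, 20.0, 30.0])

    N = f.sum()
    n_i = f.sum(axis=1)                     # array totals n_i
    y_bar_i = (f @ y_vals) / n_i            # array means
    y_bar = (f @ y_vals).sum() / N          # overall mean

    sigma2_y = (f * (y_vals - y_bar) ** 2).sum() / N               # total variance of Y
    sigma2_ey = (f * (y_vals - y_bar_i[:, None]) ** 2).sum() / N   # within-array variance
    sigma2_my = (n_i * (y_bar_i - y_bar) ** 2).sum() / N           # variance of array means

    print(1.0 - sigma2_ey / sigma2_y, sigma2_my / sigma2_y)        # both equal eta^2_YX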
Thus the absolute value of the correlation ratio can never be less than the absolute value of r, the correlation coefficient.
When the regression of Y on X is linear, the curve of means of the arrays coincides with the line of regression and η²_YX = r². Thus η²_YX − r² is a measure of the departure of the regression from linearity. It is also clear (from Remark 1) that the more nearly η²_YX approaches unity, the smaller is σ²_eY and, therefore, the closer are the points to the curve of means of the vertical arrays.
When η²_YX = 1, σ²_eY = 0, i.e., Σᵢ Σⱼ fᵢⱼ (yᵢⱼ − ȳᵢ)² = 0, so that yᵢⱼ = ȳᵢ for all j = 1, 2, ..., nᵢ, i.e., all the points lie on the curve of means. This implies that there is a functional relationship between X and Y. η²_YX is, therefore, a measure of the degree to which the association between the variables approaches a functional relationship of the form Y = F(X), where F(X) is a single-valued function of X. [For linear regression, F(X) = a + bX.]
3. It is worth noting that the value of η_YX is not independent of the classification of the data. As the class intervals become narrower, η_YX approaches unity, since in that case σ²_mY gets nearer to σ²_Y. If the grouping is so fine that only one item appears in each array (related to each x-class), that item will constitute the mean of that array, and thus in this case σ²_mY and σ²_Y become equal so that η²_YX = 1. On the other hand, a very coarse grouping tends to make the value of η_YX approach r. "Student" has given a formula for the correction
(iii) If the dots of the scatter diagram lie on a curve such that no ordinate cuts it more than once, then η_YX = 1 ; and if, furthermore, the dots are symmetrically placed about the Y-axis, then η_XY = 0 and r = 0.
(iv) If η_YX > r, the dots are scattered around a definitely curved trend line.
EXERCISE IO(e)
1. (a) Define correlation and correlation ratio. When is the latter a more suitable measure of correlation than the former ? Show that the correlation ratio is never less than the correlation coefficient. What do you infer if the two are equal ? Further, show that none of these can exceed one.
[… Univ. B.Sc. (Stat. Hons.), 1988]
(b) Show that 1 ≥ η²_YX ≥ r²_YX ≥ 0.
Interpret each of the following statements :
(i) r = 0, (ii) r² = 1, (iii) η² = 1, (iv) η² = r² and (v) η = 0
(c) When the correlation coefficient is equal to unity, show that the two correlation ratios are also equal to unity. Is the converse true ?
(d) Define the correlation ratio η_XY and prove that
  1 ≥ η²_XY ≥ r²,
where r is the coefficient of correlation between X and Y. Show further that (η²_XY − r²) is a measure of the non-linearity of regression.
2. For the joint p.d.f.
  f(x, y) = x exp[−x(y + 1)] ;  x > 0, y > 0
          = 0                 ,  otherwise,
find :
(i) the two lines of regression,
(ii) the regression curves for the means,
(iii) r(X, Y),
(iv) η²_YX and η²_XY.
nothing to distinguish one from the other, so that one may be treated as the X-variable and the other as the Y-variable.
Suppose we have A₁, A₂, ..., Aₙ families with k₁, k₂, ..., kₙ members, each of which may be represented as
  x₁₁  x₂₁  ...  xᵢ₁  ...  xₙ₁
  ⋮    ⋮         ⋮         ⋮
  x₁ⱼ  x₂ⱼ  ...  xᵢⱼ  ...  xₙⱼ
  ⋮    ⋮         ⋮         ⋮
and let xᵢⱼ (i = 1, 2, ..., n ; j = 1, 2, ..., kᵢ) denote the measurement on the jth member in the ith family.
We shall have kᵢ(kᵢ − 1) pairs for the ith family or group, like (xᵢⱼ, xᵢₗ), j ≠ l. There will be Σᵢ₌₁ⁿ kᵢ(kᵢ − 1) = N pairs for all the n families or groups. If we prepare a correlation table there will be kᵢ(kᵢ − 1) entries for the ith group or family and Σᵢ kᵢ(kᵢ − 1) = N entries for all the n families or groups. The table is symmetrical about the principal diagonal. Such a table is called an intra-class correlation table and the correlation is called intra-class correlation.
In the bivariate table xᵢ₁ occurs (kᵢ − 1) times, xᵢ₂ occurs (kᵢ − 1) times, ..., xᵢₖᵢ occurs (kᵢ − 1) times, i.e., from the ith family we have (kᵢ − 1) Σⱼ xᵢⱼ and hence for all the n families we have Σᵢ (kᵢ − 1) Σⱼ xᵢⱼ for the marginal frequencies, the table being symmetrical about the principal diagonal.
The intra-class correlation coefficient is then
  r = Σᵢ Σⱼ Σₗ, ₗ≠ⱼ (xᵢⱼ − x̄)(xᵢₗ − x̄) / [ Σᵢ (kᵢ − 1) Σⱼ (xᵢⱼ − x̄)² ]
If we write x̄ᵢ = (1/kᵢ) Σⱼ xᵢⱼ, then
  Σᵢ [ Σⱼ Σₗ (xᵢⱼ − x̄)(xᵢₗ − x̄) ] = Σᵢ [ Σⱼ (xᵢⱼ − x̄) ]² = Σᵢ kᵢ²(x̄ᵢ − x̄)²
and, on omitting the terms with l = j,
  Σᵢ Σⱼ Σₗ, ₗ≠ⱼ (xᵢⱼ − x̄)(xᵢₗ − x̄) = Σᵢ kᵢ²(x̄ᵢ − x̄)² − Σᵢ [kᵢσᵢ² + kᵢ(x̄ᵢ − x̄)²],
so that
  r = { Σᵢ kᵢ²(x̄ᵢ − x̄)² − Σᵢ kᵢσᵢ² − Σᵢ kᵢ(x̄ᵢ − x̄)² } / { σ² Σᵢ kᵢ(kᵢ − 1) },
where kᵢ and σᵢ² denote the number of members and the variance respectively in the ith family, and σ² is the general variance.
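As a numerical illustration of the intra-class correlation table just described, the following Python sketch (not part of the original text) forms all ordered pairs (xᵢⱼ, xᵢₗ), j ≠ l, within each family and computes the ordinary product-moment correlation of the two coordinates, which is the intra-class correlation. The family data are taken from Exercise 2 below.

    import numpy as np

    families = [
        [60, 62, 63, 65],
        [59, 60, 61, 62],
        [62, 62, 64, 63],
        [65, 66, 65, 66],
        [66, 67, 67, 69],
    ]

    # All ordered pairs (x_ij, x_il), j != l, within each family.
    u, v = [], []
    for fam in families:
        for j, xj in enumerate(fam):
            for l, xl in enumerate(fam):
                if j != l:
                    u.append(xj)
                    v.append(xl)
    u, v = np.array(u, float), np.array(v, float)
    print(round(np.corrcoef(u, v)[0, 1], 4))    # intra-class correlation coefficient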
Given n = 5, σᵢ = i, kᵢ = i + 1 (i = 1, 2, ..., 5), find the least possible intra-class correlation coefficient.
2. What do you understand by the intra-class correlation coefficient ? Calculate its value for the following data :
Family No. Height of brothers
1 / 60 62 63 65
2 59 60 61 62
3 62 62 64 63
4 65 66 65 66
5 66 67 67 69
3. In four families each containing eight persons, the chest measurements of the persons are given below. Calculate the intra-class correlation coefficient.
Family    Member :   1    2    3    4    5    6    7    8
I 43 46 48 42 50 45 45 49
IT 33 34 37 39 82 35 37 41
ill 56 52 50 51 54 52 39 52
IV 34 37 38 40 40 41 44 44
10·10. Bivariate Normal Distribution. The bivariate normal distribution is a generalisation of the normal distribution for a single variate. Let X and Y be two normally correlated variables with correlation coefficient ρ and E(X) = μ₁, Var(X) = σ₁² ; E(Y) = μ₂, Var(Y) = σ₂². In deriving the bivariate normal distribution we make the following three assumptions.
(i) The regression of Y on X is linear. Since the mean of each array is on the line of regression Y = ρ(σ₂/σ₁)X, the mean or expected value of Y is ρ(σ₂/σ₁)x for the different values x of X.
(ii) The arrays are homoscedastic, i.e., the variance in each array is the same. The common variance of the estimate of Y in each array is then given by σ₂²(1 − ρ²), ρ being the correlation coefficient between the variables X and Y, and is independent of X.
(iii) The distribution of Y in the different arrays is normal. Suppose that one of the variates, say X, is distributed normally with mean 0 and standard deviation σ₁, so that the probability that a random value of X will fall in the small interval dx is
  g(x) dx = [1/(σ₁√(2π))] exp(−x²/2σ₁²) dx
  f(x, y) = [1/(2πσ₁σ₂√(1 − ρ²))] exp[ −{1/(2(1 − ρ²))} { (x − μ₁)²/σ₁² − 2ρ(x − μ₁)(y − μ₂)/(σ₁σ₂) + (y − μ₂)²/σ₂² } ],
  (−∞ < x < ∞, −∞ < y < ∞)   ...(10·25)
where μ₁, μ₂, σ₁ (> 0), σ₂ (> 0) and ρ (−1 < ρ < 1) are the five parameters of the distribution.
NORMAL CORRELATION SURFACE
Remarks. 1. For standardised variates we write (X, Y) ~ N(0, 0, 1, 1, ρ) or BVN(0, 0, 1, 1, ρ).
2. The curve z = f(x, y), which is the equation of a surface in three dimensions, is called the 'Normal Correlation Surface'.
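The density (10·25) is easy to verify numerically. The sketch below (an illustration, not part of the text) codes (10·25) directly and, assuming SciPy is available, compares it with scipy.stats.multivariate_normal at an arbitrary point; all parameter values are illustrative assumptions.

    import numpy as np
    from scipy.stats import multivariate_normal

    def bvn_pdf(x, y, mu1, mu2, s1, s2, rho):
        """Density (10.25) of the bivariate normal distribution."""
        u, v = (x - mu1) / s1, (y - mu2) / s2
        q = (u * u - 2 * rho * u * v + v * v) / (1 - rho ** 2)
        return np.exp(-q / 2) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho ** 2))

    mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 3.0, 0.6
    cov = [[s1 ** 2, rho * s1 * s2], [rho * s1 * s2, s2 ** 2]]
    rv = multivariate_normal([mu1, mu2], cov)

    x, y = 0.5, -1.0
    print(bvn_pdf(x, y, mu1, mu2, s1, s2, rho), rv.pdf([x, y]))  # the two values agree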
  M_X,Y(t₁, t₂) = E[e^(t₁X + t₂Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} exp(t₁x + t₂y) f(x, y) dx dy
Putting u = (x − μ₁)/σ₁ and v = (y − μ₂)/σ₂, we have
  (u² − 2ρuv + v²) − 2(1 − ρ²)(t₁σ₁u + t₂σ₂v)
  = [(u − ρv) − (1 − ρ²)t₁σ₁]² + (1 − ρ²){(v − ρt₁σ₁ − t₂σ₂)² − t₁²σ₁² − t₂²σ₂² − 2ρt₁t₂σ₁σ₂}   ...(*)
By taking
  u − ρv − (1 − ρ²)t₁σ₁ = w(1 − ρ²)^(1/2)  and  v − ρt₁σ₁ − t₂σ₂ = z,
so that du dv = √(1 − ρ²) dw dz, and using (*), we get
  M_X,Y(t₁, t₂) = exp[ t₁μ₁ + t₂μ₂ + ½(t₁²σ₁² + t₂²σ₂² + 2ρt₁t₂σ₁σ₂) ]   ...(10·26)
In particular, if (X, Y) ~ BVN(0, 0, 1, 1, ρ), then
  M_X,Y(t₁, t₂) = exp[ ½(t₁² + t₂² + 2ρt₁t₂) ]   ...(10·26a)
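Differentiating (10·26a) reproduces the product moments of the standardised distribution. The following SymPy sketch (illustrative only, not part of the text; it assumes SymPy is available) does this symbolically and anticipates the moment values obtained in the examples below. The value of ρ is kept symbolic.

    import sympy as sp

    t1, t2, rho = sp.symbols('t1 t2 rho')
    M = sp.exp(sp.Rational(1, 2) * (t1 ** 2 + t2 ** 2 + 2 * rho * t1 * t2))  # (10.26a)

    def moment(r, s):
        """E[X^r Y^s] for (X, Y) ~ BVN(0, 0, 1, 1, rho), read off from the m.g.f."""
        return sp.simplify(sp.diff(M, t1, r, t2, s).subs({t1: 0, t2: 0}))

    print(moment(1, 1))   # rho
    print(moment(2, 2))   # 2*rho**2 + 1
    print(moment(3, 1))   # 3*rho
    print(moment(1, 2))   # 0  (odd total order)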
Theorem 10·5. Let (X, Y) ~ BVN(μ₁, μ₂, σ₁², σ₂², ρ). Then X and Y are independent if and only if ρ = 0.
Proof. (a) If (X, Y) ~ BVN(μ₁, μ₂, σ₁², σ₂², ρ) and ρ = 0, then X and Y are independent [cf. Remark 2(a) to Theorem 10·2, page 10·5].
Aliter. (X, Y) ~ BVN(μ₁, μ₂, σ₁², σ₂², ρ)
  f_X(x) = ∫_{−∞}^{∞} f(x, y) dy
  = [1/(2πσ₁√(1 − ρ²))] exp[−(x − μ₁)²/(2σ₁²)] ∫_{−∞}^{∞} exp[ −{v − ρ(x − μ₁)/σ₁}²/(2(1 − ρ²)) ] dv,   (v = (y − μ₂)/σ₂)
  = [1/(σ₁√(2π))] exp[−(x − μ₁)²/(2σ₁²)],
since the last integral equals √(2π(1 − ρ²)). Similarly
  f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx = [1/(σ₂√(2π))] exp[−(y − μ₂)²/(2σ₂²)]
Hence X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²).
Remark. We have proved that if (X, Y) ~ BVN(μ₁, μ₂, σ₁², σ₂², ρ), then the marginal p.d.f.'s of X and Y are also normal. However, the converse is not true, i.e., we may have a joint p.d.f. f(x, y) of (X, Y) which is not bivariate normal but whose marginal p.d.f.'s are still normal, as discussed in the following illustration.
Consider the joint distribution of X and Y given by (10·27c). Its marginal p.d.f.'s work out to be
  f_X(x) = (1/√(2π)) e^(−x²/2), −∞ < x < ∞,  i.e.,  X ~ N(0, 1),
  f_Y(y) = (1/√(2π)) e^(−y²/2), −∞ < y < ∞,  i.e.,  Y ~ N(0, 1).
Hence, even if the marginal distributions of X and Y are normal (Gaussian), it does not necessarily follow that the joint distribution of (X, Y) is bivariate normal.
For another illustration, see Question Number 17, Exercise 10(f).
We further note that for the joint p.d.f. (10·27c), on using (i) and (ii), we have
  E(X) = 0, σ_X² = 1 and E(Y) = 0, σ_Y² = 1.
Also Corr(X, Y) = Cov(X, Y)/(σ_X σ_Y) = 0.
Returning to the bivariate normal distribution (10·25), the conditional distribution of X for a fixed Y is
  f_{X|Y}(x | y) = f_{XY}(x, y)/f_Y(y)
  = [1/(σ₁√(2π)√(1 − ρ²))] exp[ −{x − μ₁ − ρ(σ₁/σ₂)(y − μ₂)}²/(2σ₁²(1 − ρ²)) ],
which is the probability density of a univariate normal distribution with mean μ₁ + ρ(σ₁/σ₂)(y − μ₂) and variance σ₁²(1 − ρ²).   ...(10·27d)
Similarly, the conditional distribution of Y for a fixed X is
  f_{Y|X}(y | x) = f_{XY}(x, y)/f_X(x)
  = [1/(σ₂√(2π)√(1 − ρ²))] exp[ −{y − μ₂ − ρ(σ₂/σ₁)(x − μ₁)}²/(2σ₂²(1 − ρ²)) ],
i.e., Y | X = x is N[μ₂ + ρ(σ₂/σ₁)(x − μ₁), σ₂²(1 − ρ²)].
It is apparent from the above results that the array means are collinear, i.e., the regression equations are linear (involving linear functions of the independent variables) and the array variances are constant (i.e., free from the independent variable). We express this by saying that the regressions of Y on X and of X on Y are linear and homoscedastic.
For ρ = 0, the conditional variance V(Y | X) is equal to the marginal variance σ₂², the conditional mean E(Y | X) is equal to the marginal mean μ₂, and the two variables become independent, which is also apparent from the joint distribution function. In between the two extremes ρ = ± 1, the correlation coefficient ρ provides a measure of the degree of association or interdependence between the two variables.
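The linearity of the array means and the constancy of the array variances can be seen in a simulation. The sketch below (illustrative only; the parameter values are assumptions) draws a large bivariate normal sample, conditions on a thin slice about an arbitrary x₀, and compares the slice mean and variance with the formulas above.

    import numpy as np

    rng = np.random.default_rng(0)
    mu1, mu2, s1, s2, rho = 0.0, 0.0, 2.0, 1.0, 0.7
    cov = [[s1 ** 2, rho * s1 * s2], [rho * s1 * s2, s2 ** 2]]
    x, y = rng.multivariate_normal([mu1, mu2], cov, size=500_000).T

    x0 = 1.5
    sel = np.abs(x - x0) < 0.05                     # thin vertical array about x0
    print(y[sel].mean(), mu2 + rho * (s2 / s1) * (x0 - mu1))   # conditional mean
    print(y[sel].var(), s2 ** 2 * (1 - rho ** 2))              # conditional variance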
Example 10·27. Show that for the bivariate normal distribution BVN(0, 0, 1, 1, ρ) with m.g.f. M(t₁, t₂) : …
(ii) ∂M/∂t₁ = M(t₁ + ρt₂),  ∂M/∂t₂ = M(t₂ + ρt₁)
  ∂²M/∂t₁∂t₂ = ∂/∂t₁ (∂M/∂t₂) = ∂/∂t₁ [M(t₂ + ρt₁)]
  = Mρ + (t₂ + ρt₁)(∂M/∂t₁) = Mρ + (t₂ + ρt₁)(t₁ + ρt₂)M
∴ ∂²M/∂t₁∂t₂ − ρt₁ ∂M/∂t₁ − ρt₂ ∂M/∂t₂
  = Mρ + (t₂ + ρt₁)(t₁ + ρt₂)M − ρt₁(t₁ + ρt₂)M − ρt₂(t₂ + ρt₁)M
  = M[t₁t₂ + ρ − ρ²t₁t₂]   (on simplification)
  = Mρ + (1 − ρ²)Mt₁t₂
∴ ∂²M/∂t₁∂t₂ = ρt₁ ∂M/∂t₁ + ρt₂ ∂M/∂t₂ + Mρ + M(1 − ρ²)t₁t₂   ...(*)
Expanding M(t₁, t₂) = Σ_{r=0}^{∞} Σ_{s=0}^{∞} μ_rs t₁^r t₂^s/(r! s!) (the means being zero, the moments about the origin coincide with the central moments μ_rs), substituting in (*) and equating the coefficients of t₁^(r−1) t₂^(s−1)/[(r − 1)! (s − 1)!] on both sides, we get
  μ_rs = ρ(r − 1)μ_{r−1, s−1} + ρ(s − 1)μ_{r−1, s−1} + ρμ_{r−1, s−1} + (1 − ρ²)(r − 1)(s − 1)μ_{r−2, s−2}
⇒ μ_rs = (r + s − 1)ρ μ_{r−1, s−1} + (r − 1)(s − 1)(1 − ρ²) μ_{r−2, s−2}
In particular,
  μ₃₁ = 3ρμ₂₀ + 0 = 3ρσ₁² = 3ρ   (∵ σ₁² = 1)
  μ₂₂ = 3ρμ₁₁ + (1 − ρ²)μ₀₀ = 3ρ² + (1 − ρ²)·1 = (1 + 2ρ²)   (∵ μ₁₁ = ρσ₁σ₂ = ρ)
Also μ₀₃ = μ₃₀ = 0,
  μ₁₂ = 2ρμ₀₁ + 0 = 0   (∵ μ₀₁ = 0)
  μ₂₃ = 4ρμ₁₂ + 1·2(1 − ρ²)μ₀₁ = 0
Similarly we will get μ₂₁ = 0, μ₃₂ = 0.
If r + s is odd, so is (r − 1) + (s − 1), (r − 2) + (s − 2), and so on. And since μ₃₀ = 0 = μ₀₃, μ₁₂ = 0 = μ₂₁, μ₂₃ = 0 = μ₃₂, ..., we finally get
  μ_rs = 0, if r + s is odd.
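The recurrence (*) for μ_rs is easy to implement. The following sketch (not part of the text) evaluates it with the standard-normal marginal moments as boundary values and checks μ₃₁ and μ₂₂ against the closed forms above, together with an optional Monte-Carlo check; the value of ρ is an arbitrary assumption.

    import numpy as np
    from functools import lru_cache

    rho = 0.4

    def marginal_moment(r):
        """E[X^r] for X ~ N(0, 1): zero for odd r, (r - 1)!! for even r."""
        if r % 2:
            return 0.0
        out = 1.0
        for k in range(r - 1, 0, -2):
            out *= k
        return out

    @lru_cache(maxsize=None)
    def mu(r, s):
        """mu_{r,s} for BVN(0, 0, 1, 1, rho): recurrence, with marginals as boundary."""
        if r == 0 or s == 0:
            return marginal_moment(r + s)
        return ((r + s - 1) * rho * mu(r - 1, s - 1)
                + (r - 1) * (s - 1) * (1 - rho ** 2) * mu(r - 2, s - 2))

    print(mu(3, 1), 3 * rho)            # mu_31 = 3*rho
    print(mu(2, 2), 1 + 2 * rho ** 2)   # mu_22 = 1 + 2*rho^2
    print(mu(3, 2))                     # 0, since r + s is odd

    x, y = np.random.default_rng(1).multivariate_normal(
        [0, 0], [[1, rho], [rho, 1]], size=400_000).T
    print((x ** 2 * y ** 2).mean())     # Monte-Carlo check of mu_22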
Example 10·28. Show that if X₁ and X₂ are standard normal variates with correlation coefficient ρ between them, then the correlation coefficient between X₁² and X₂² is given by ρ².
Solution. Since X₁ and X₂ are two standard normal variates, we have
  E(X₁) = E(X₂) = 0  and  V(X₁) = E(X₁²) = 1 = V(X₂) = E(X₂²)
  M_{X₁, X₂}(t₁, t₂) = exp[ ½(t₁² + 2ρt₁t₂ + t₂²) ]   [cf. (10·26)]
Now
  ρ(X₁², X₂²) = [E(X₁²X₂²) − E(X₁²)E(X₂²)] / √{ [E(X₁⁴) − {E(X₁²)}²] [E(X₂⁴) − {E(X₂²)}²] },
where E(X₁²X₂²) = coefficient of (t₁²/2!)(t₂²/2!) in M(t₁, t₂) = (2ρ² + 1).
= ./.
21t. 2'1 (1 _.p2)
e"(p [- 4(1 1 2) { (1- p)u2+ (1 + p)v2
_.p
1] du dv
= 1
21t..J2(1 _ p) ..J2(1 + p)
. exp [_ u2 _ v2 du dv
2(1 + p)2 2(1- p)2 .
J
=[ _ 1 {
- u2 ]]
, flU..
ili ..J 2(1 + p) . exp . 2(1 + p,)2
u+v u-v
Now x =-2-'y =-2-
,.
ax ax 1 1
au av 2. 2 1
J = Q1 Q1 = 1 1 =-2
au av 2- 2
Writing Q = (X² − 2ρXY + Y²)/(1 − ρ²), we have
  M_Q(t) = E[e^(tQ)] = [1/(2π√(1 − ρ²))] ∫_{−∞}^{∞} ∫_{−∞}^{∞} exp(tQ − Q/2) dx dy
  = [1/(2π√(1 − ρ²))] ∫_{−∞}^{∞} ∫_{−∞}^{∞} exp[ −(1 − 2t)Q/2 ] dx dy
Putting u = x√(1 − 2t) and v = y√(1 − 2t), so that dx dy = du dv/(1 − 2t), we get
  M_Q(t) = [1/(1 − 2t)] · [1/(2π√(1 − ρ²))] ∫_{−∞}^{∞} ∫_{−∞}^{∞} exp[ −(u² − 2ρuv + v²)/(2(1 − ρ²)) ] du dv
  = [1/(1 − 2t)] · 1 = (1 − 2t)^(−1),
which is the m.g.f. of a chi-square (χ²) variate with n (= 2) degrees of freedom.
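That Q = (X² − 2ρXY + Y²)/(1 − ρ²) behaves as a χ² variate with 2 d.f. can also be checked by simulation. The sketch below (illustrative only; ρ and the sample size are assumptions) compares the sample moments and the Kolmogorov–Smirnov distance against the χ²₂ distribution.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    rho = 0.6
    x, y = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=200_000).T
    q = (x ** 2 - 2 * rho * x * y + y ** 2) / (1 - rho ** 2)

    print(q.mean(), q.var())                              # close to 2 and 4
    print(stats.kstest(q, 'chi2', args=(2,)).statistic)   # small KS distance from chi^2_2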
Example 10·31. Let X and Y be independent standard normal variates. Obtain the m.g.f. of XY.   [Gauhati M.Sc., 1992]
Solution. Since X and Y are independent standard normal variates, their joint p.d.f. f(x, y) is given by
  f(x, y) = f₁(x) f₂(y) = (1/2π) e^(−x²/2) e^(−y²/2) ;  −∞ < (x, y) < ∞
By definition,
  M_XY(t) = E[e^(tXY)] = (1/2π) ∫_{−∞}^{∞} ∫_{−∞}^{∞} exp[ −½(x² − 2txy + y²) ] dx dy
Integrating first with respect to x (a normal integral with mean ty and unit variance), we get
  M_XY(t) = (1/√(2π)) ∫_{−∞}^{∞} exp[ −½ y²(1 − t²) ] dy = (1 − t²)^(−1/2),  | t | < 1.
P(X+YSI6)=P(Z s
26/ V
where <I>(z) = P (Z S z), is the distribution functi.on of standard normal vilriale.
EXERCISE 10(f)
1. (0) Define conditional and marginal distributions. If X and Y follow
bivariate normal distribution, find (i) the conditional distri_bution of X given y
and (;,) the distribution of X. Show that the conditional mean of X is
dependent on the given Y, but conditional variance is independent of it.
(b) Derme Bivariate Normal distQbution. If (X. Y) has a normal
distribution, find the marginal density function/x(x) of X.
[Delhi Univ. B.Sc. (Maim. Hom.), 1988)
2. The marks X and Y scored by candidates in an examination in two
Mathematics and Statistics are known to follow a bivariate nonnal
distribution. The mean of X is 52 and its standard deviation is IS, while Y has
mean 48 and standard deviation 13. Also the of correlation between X
and Y is 0·6.
Write down the joint distribution of X aud Y. If 100 marks in the
aggregate !lJ'e needed fOI a pass in the examination, show how to calculate the
proportion of candidates who pass the euminatioJI ?
(b) A manufacturer of electric bulbs, ill his desire for putting only gOOd
bulbs for sale, rejects all bulbs for which a certain quality X of the
ftlarnent is less than 65 units. Assume that the quality characteristic X and lh('
life Y, of the bulb in hours are jointly normally distributed with parameters
below :
X Y
Mean 80 1100
Standard deviation 10 10
Correlation coefficient p(X. Y) =0·60
Find (i) the proportion of bulbs produced that will bum fOf, less ilian 1000
hours, (;,) the proportion of bulbs produced that will be put for sale, (iii) the
average life of bulbs put for sale.
3. (0) Determine the panpne,iels of the bivariate normal distribution:
Ax, y) =k exp [- :7 (x - 7)2 - 2(x - 7) (y + 5) + 4(y + 5)2 J J
Also find the value of k.
(b) For the bivariate normal distribution:
f(x, y) = 1
21tcrlcrl '" (1 - p7)
1 [(X-lll)Z 2 (X,.,.lll)
x exp {- 2( 1 _ pZ) 'crlZ - P crl
=J,"" J"
J.It J.I2
f(x, y) dx dy
'
( u = x - Ill, V =Y - Ill).
crl crl
Now proceed' as in Hint to Question Number 9(b).
6. Let the jrandom variables X apd Y be assumed to have a joint bivariate
normal distribution with
.
III =,Ilz, = 0, crl = 4, crz = 3 r(X. Y) = 0·8.
m Write do}Vn the joit;lt density of X and Y.
(il) Write down the regression of Yon X.
(iii) Obtain the density of X + Y and X - Y.
7. For the distribution 'of random van,abJesX and Y given by
dF= kexp [ - 2(1-:' pz) (xl •• 2pxy + yZ) Jdx dy; - - Sx S 00, -c:o S y 00-
Obtain
(I) the constant k,
(U) the distJ;ibutions of X and Y,
(iil) the distributions of X for given Y and of Y for given X.
(iv) the curves of regression of Yon X'and of X 011' Y,
Md (v) the distributions of X + Y and X - Y.
8. Let (X, Y) be a bivariate normal variable with E(X) = E(Y) = 0, Var(X) = Var(Y) = 1 and Cov(X, Y) = ρ. Show that the random variable Z = Y/X has a Cauchy distribution.
[Delhi Univ. B.Sc. (Maths. Hons.), 1989]
Ans. f(z) = (1/π) [ (1 − ρ²)^(1/2) / {(1 − ρ²) + (z − ρ)²} ],  −∞ < z < ∞.
9. (a) If (X, y) - N(IJ.", lJ.y, a"2, ai, p),
prove that
, 1 sin-Ip
P(X > IJ." n Y > lJ.y) = 4 +. 21t
[Delhi Univ. M.Sc. (Sial.), 1987]
(b) If (X, Y) - 0, 1, 1, p} then prove that
1 sin-I p
P(X > OnY > 0) = 4 + 21t .
[D.e(hi Univ. B.$c. (Sial. Honf.), 1990]
Hint. P = [1/(2π√(1 − ρ²))] ∫₀^(π/2) ∫₀^∞ exp[ − r²(1 − ρ sin 2θ)/(2(1 − ρ²)) ] r dr dθ.
Now integrate first w.r.t. r, then w.r.t. θ.
10. (a) Let XI and X2 be two indepenoent normally distributed variables
with zero means and unit variances. Let YI and Y2 be the linear functions of XI
and X2 defined by
Yj = ml + IIIX I + 112 X2, Y2 ,: m2 + 121 XI + 122 X2
Show that Y\ and Y2 are normally distributed with means ml and m2, variances
Jl20 = 1\1 2 + /122, 1-102 = 1212 + lxi, and covariance 111/21 + 112 / 22•
(b) Let XI and X2 be independent standard'normal variates. Show that the
variates Yh Y2 defined by
XI = a\ + bllX I + b 12X2 , >:2 = + b21 X I + b 22X2 .ate dependent normal
variates and find their and-variance.
"!nt. YI and Y2 , being ,inear combination of S.N.V's are also normally
l!istributCd: To prove that they ar:e.dependeQt, it i:; sufficieQt. to 'prove that
rO'\> Y2) O. [e/. Remark 2 to Theorem' 10·2) .
11. (a) Show that, if J( and Y are independent nonnal variates with zero
means and variances GI2 and G22 respectively, the point of inflexion of the curve
of intersection of the nonnal correlation surface by planes through the z-axis, lie
on the elliptical cylinder, .
}{2 f2
-
(f12+ - -1
(f,l-
(b) If X and Yare bivariate nonnal variates with standard deviations unity
and with correlation coefficient p, show that the regression of X2 (f2) on f2
(Xl) is strictly linear. Also show that the regression of X (Y) on f2 (}{2) is not
linear.
12. For the bivariate nonnal distribution:
£iF =k exp [- (x2 - xy + y2 - 3x + 3y + 3)] dx dy.
obtain (i) the marginal distri»ution of Y, and
(il) the conditional distribution of Y given X.
Also obtain ilie characteristic function of the above bivariate I}ormal
ditribution and hence the covariance betwecrn X,and Y •
• 3. Let/and g be the p.dJ.'s with corresponding distribution functions F
and G. Also let
h(x. y) -= j(x) g(y) [1 + a (2F.'(x) 1) (2G(y) - 1)],
where I a 1St, is a constant and h is a bivariate p.dJ. with marginal p.d.f.'s
f and g. Further let/and g be p.dJ.' s of N (0, 1) distribution. Then 'prove that:
Cov (X. y) =a/TC
14. If (X, Y) ~ BVN(μ₁, μ₂, σ₁², σ₂², ρ), compute the correlation coefficient between e^X and e^Y.
Hint. Let U = e^X, V = e^Y. Then
  μ′_rs = E[e^(rX + sY)] = exp[ rμ₁ + sμ₂ + ½(r²σ₁² + s²σ₂² + 2ρrsσ₁σ₂) ]
[cf. m.g.f. of the B.V.N. distribution : t₁ = r, t₂ = s]
Now E(U) = μ′₁₀, E(U²) = μ′₂₀, E(UV) = μ′₁₁, and so on.
Ans. ρ(U, V) = (e^(ρσ₁σ₂) − 1) / [ (e^(σ₁²) − 1)(e^(σ₂²) − 1) ]^(1/2)
15. If (X, Y) ~ BVN(0, 0, 1, 1, ρ), find E[max(X, Y)].
Hint. max(X, Y) = ½[(X + Y) + | X − Y |]  and  Z = X − Y ~ N[0, 2(1 − ρ)].   [cf. Theorem 10·6]
Ans. E[max(X, Y)] = {(1 − ρ)/π}^(1/2)
16. If (X. Y) - BVN (0, 0, I, I, p) wilh joint p.d.f.j(x. y) lhen prove that
(a) P(XY>O)
  f(x, y) = ½ [ {1/(2π√(1 − ρ²))} exp{ −(x² − 2ρxy + y²)/(2(1 − ρ²)) }
            + {1/(2π√(1 − ρ²))} exp{ −(x² + 2ρxy + y²)/(2(1 − ρ²)) } ] ;
  −∞ < x < ∞, −∞ < y < ∞
Let w = x/σ₁  and  z = [1/√(1 − ρ²)] (y/σ₂ − ρx/σ₁).
Show that
  ∂(w, z)/∂(x, y) = 1/(σ₁σ₂√(1 − ρ²))
and
  w² + z² = [1/(1 − ρ²)] [ x²/σ₁² − 2ρxy/(σ₁σ₂) + y²/σ₂² ].
Deduce that the joint probability differential of W and Z is
  (1/2π) exp[ −½(w² + z²) ] dw dz
and hence that W, Z are independent normal variates with zero means and unit s.d.'s.   [Meerut Univ. M.Sc., 1993]
Hence or otherwise obtain the m.g.f. of the bivariate normal distribution.
22. From a standard bivariate normal population, a random sample of n observations (xᵢ, yᵢ), (i = 1, 2, ..., n) is drawn. Show that the distribution of … , using the result
  ∫_{−∞}^{∞} ∫_{−∞}^{∞} exp[ −(ax² + 2hxy + by²) ] dx dy = π/√(ab − h²),
and simplify.
10·11. Multiple and Partial Correlation. When the values of one variable are associated with or influenced by another variable, e.g., the ages of husband and wife, the heights of father and son, the supply and demand of a commodity and so on, Karl Pearson's coefficient of correlation can be used as a measure of linear relationship between them. But sometimes there is interrelation between many variables and the value of one variable may be influenced by many others, e.g., the yield of crop per acre, say (X₁), depends upon the quality of seed (X₂), fertility of soil (X₃), fertilizer used (X₄), irrigation facilities (X₅), weather conditions (X₆) and so on. Whenever we are interested in studying the joint effect of a group of variables upon a variable not included in that group, our study is that of multiple correlation and multiple regression.
Suppose in a trivariate or multi-variate distribution we are interested in the relationship between two variables only. There are two alternatives, viz., (i) we consider only those two members of the observed data in which the other members have specified values, or (ii) we may eliminate mathematically the effect of the other variates on the two variates. The first method has the disadvantage that it limits the size of the data and also that it is applicable only to the data in which the other variates have the assigned values. In the second method it may not be possible to eliminate the entire influence of the other variates, but the linear effect can easily be eliminated. The correlation and regression between only two variates after eliminating the linear effect of the other variates on them is called the partial correlation and partial regression.
10·11·1. Yule's Notation. Let us consider a distribution involving three random variables X₁, X₂ and X₃. Then the equation of the plane of regression of X₁ on X₂ and X₃ is
  X₁ = a + b₁₂.₃ X₂ + b₁₃.₂ X₃   ...(10·28)
Without loss of generality we can assume that the variables X₁, X₂ and X₃ have been measured from their respective means, so that
  E(X₁) = E(X₂) = E(X₃) = 0
Hence, on taking expectations of both sides in (10·28), we get a = 0.
Thus the plane of regression of X₁ on X₂ and X₃ becomes
  X₁ = b₁₂.₃ X₂ + b₁₃.₂ X₃   ...(10·28a)
The coefficients b₁₂.₃ and b₁₃.₂ are known as the partial regression coefficients of X₁ on X₂ and of X₁ on X₃ respectively.
  e₁.₂₃ = b₁₂.₃ X₂ + b₁₃.₂ X₃
is called the estimate of X₁ as given by the plane of regression (10·28a), and the quantity
  X₁.₂₃ = X₁ − b₁₂.₃ X₂ − b₁₃.₂ X₃
is called the error of estimate or residual.
In the general case of n variables X₁, X₂, ..., Xₙ, the equation of the plane of regression of X₁ on X₂, X₃, ..., Xₙ is written similarly, with the dependent
variable as the first primary subscript. Thus, of the two primary subscripts, the former refers to the dependent variable and the latter to the independent variable.
The order of a residual is determined by the number of secondary subscripts in it, e.g., X₁.₂₃, X₁.₂₃₄, ..., X₁.₂₃...ₙ are the residuals of order 2, 3, ..., (n − 1) respectively.
Remark. In the following we shall assume that the variables under consideration have been measured from their respective means.
10·12. Plane of Regression. The equation of the plane of regression of X₁ on X₂ and X₃ is
  X₁ = b₁₂.₃ X₂ + b₁₃.₂ X₃   ...(10·29)
The constants b's in (10·29) are determined by the principle of least squares, i.e., by minimising the sum of the squares of the residuals, viz.,
  S = Σ X₁.₂₃² = Σ (X₁ − b₁₂.₃ X₂ − b₁₃.₂ X₃)²,
the summation being extended to the given values (N in number) of the variables.
The normal equations for estimating b₁₂.₃ and b₁₃.₂ are
  ∂S/∂b₁₂.₃ = 0 = −2 Σ X₂ (X₁ − b₁₂.₃ X₂ − b₁₃.₂ X₃)
  ∂S/∂b₁₃.₂ = 0 = −2 Σ X₃ (X₁ − b₁₂.₃ X₂ − b₁₃.₂ X₃)   ...(10·30)
Since the variables are measured from their means,
  σᵢ² = (1/N) Σ Xᵢ²,  Cov(Xᵢ, Xⱼ) = (1/N) Σ XᵢXⱼ  and  rᵢⱼ = Cov(Xᵢ, Xⱼ)/(σᵢσⱼ) = Σ XᵢXⱼ/(N σᵢσⱼ)
Hence from (10·30), on dividing by N, we get
  r₁₂σ₁σ₂ − b₁₂.₃ σ₂² − b₁₃.₂ r₂₃σ₂σ₃ = 0
  r₁₃σ₁σ₃ − b₁₂.₃ r₂₃σ₂σ₃ − b₁₃.₂ σ₃² = 0   ...(10·30d)
Solving these equations for b₁₂.₃ and b₁₃.₂, we obtain
  b₁₂.₃ = (σ₁/σ₂)(r₁₂ − r₁₃r₂₃)/(1 − r₂₃²)   ...(10·31)
  b₁₃.₂ = (σ₁/σ₃)(r₁₃ − r₁₂r₂₃)/(1 − r₂₃²)   ...(10·31a)
If we write
        | 1    r₁₂  r₁₃ |
  ω  =  | r₂₁  1    r₂₃ |   ...(10·32)
        | r₃₁  r₃₂  1   |
and ωᵢⱼ is the cofactor of the element in the ith row and jth column of ω, we have from (10·31) and (10·31a)
  b₁₂.₃ = −(σ₁/σ₂)(ω₁₂/ω₁₁)  and  b₁₃.₂ = −(σ₁/σ₃)(ω₁₃/ω₁₁)   ...(10·33)
Substituting these values in (10·29), we get the required equation of the plane of regression of X₁ on X₂ and X₃ as
  X₁ = −(σ₁/σ₂)(ω₁₂/ω₁₁) X₂ − (σ₁/σ₃)(ω₁₃/ω₁₁) X₃
⇒ (X₁/σ₁)ω₁₁ + (X₂/σ₂)ω₁₂ + (X₃/σ₃)ω₁₃ = 0   ...(10·34)
Aliter. Eliminating the coefficients b₁₂.₃ and b₁₃.₂ between (10·29) and (10·30d), the required equation of the plane of regression of X₁ on X₂ and X₃ is
  | X₁         X₂         X₃        |
  | r₁₂σ₁σ₂    σ₂²        r₂₃σ₂σ₃   |  =  0
  | r₁₃σ₁σ₃    r₂₃σ₂σ₃    σ₃²       |
Dividing C₁, C₂ and C₃ by σ₁, σ₂ and σ₃ respectively and also R₂ and R₃ by σ₂ and σ₃ respectively, we get
  | X₁/σ₁   X₂/σ₂   X₃/σ₃ |
  | r₁₂     1       r₂₃   |  =  0
  | r₁₃     r₂₃     1     |
⇒ (X₁/σ₁)ω₁₁ + (X₂/σ₂)ω₁₂ + (X₃/σ₃)ω₁₃ = 0,
where ωᵢⱼ is as defined in (10·32).
10·12·1. Generalisation. In general, the equation of the plane of regression of X₁ on X₂, X₃, ..., Xₙ is
  X₁ = b₁₂.₃₄...ₙ X₂ + b₁₃.₂₄...ₙ X₃ + ... + b₁ₙ.₂₃...(n−1) Xₙ   ...(10·35)
The sum of the squares of the residuals is given by
  S = Σ X₁.₂₃...ₙ²
and the normal equations are
  ∂S/∂b₁₂.₃₄...ₙ = 0 = −2 Σ X₂ (X₁ − b₁₂.₃₄...ₙ X₂ − b₁₃.₂₄...ₙ X₃ − ... − b₁ₙ.₂₃...(n−1) Xₙ)
  ∂S/∂b₁₃.₂₄...ₙ = 0 = −2 Σ X₃ (X₁ − b₁₂.₃₄...ₙ X₂ − b₁₃.₂₄...ₙ X₃ − ... − b₁ₙ.₂₃...(n−1) Xₙ),
and so on. Proceeding exactly as in § 10·12, the plane of regression may be written in determinant form as
  | X₁          X₂          ...   Xₙ         |
  | r₁₂σ₁σ₂     σ₂²         ...   r₂ₙσ₂σₙ    |
  | r₁₃σ₁σ₃     r₂₃σ₂σ₃     ...   r₃ₙσ₃σₙ    |  =  0
  | ⋮                                       |
Dividing C₁, C₂, ..., Cₙ by σ₁, σ₂, ..., σₙ respectively and R₂, R₃, ..., Rₙ by σ₂, σ₃, ..., σₙ respectively, we get
  | X₁/σ₁   X₂/σ₂   X₃/σ₃   ...   Xₙ/σₙ |
  | r₁₂     1       r₂₃     ...   r₂ₙ   |
  | r₁₃     r₂₃     1       ...   r₃ₙ   |  =  0   ...(10·37)
  | ⋮                                  |
  | r₁ₙ     r₂ₙ     r₃ₙ     ...   1     |
If we write
        | 1    r₁₂  r₁₃  ...  r₁ₙ |
        | r₂₁  1    r₂₃  ...  r₂ₙ |
  ω  =  | r₃₁  r₃₂  1    ...  r₃ₙ |   ...(10·38)
        | ⋮                      |
        | rₙ₁  rₙ₂  rₙ₃  ...  1   |
and ωᵢⱼ is the cofactor of the element in the ith row and jth column of ω, we get from (10·37)
  (X₁/σ₁)ω₁₁ + (X₂/σ₂)ω₁₂ + (X₃/σ₃)ω₁₃ + ... + (Xₙ/σₙ)ω₁ₙ = 0   ...(10·39)
so that
  b₁₂.₃₄...ₙ = −(σ₁/σ₂)(ω₁₂/ω₁₁),  ... ,  b₁ₙ.₂₃...(n−1) = −(σ₁/σₙ)(ω₁ₙ/ω₁₁)   ...(10·40)
Since each of σ₁, σ₂, ω₁₁ and ω₂₂ is non-negative and ω₁₂ = ω₂₁ [cf. Remarks 3 and 4 to § 10·14, page 10·113], the sign of each of the regression coefficients b₁₂.₃₄...ₙ and b₂₁.₃₄...ₙ depends on ω₁₂.
  σ²₁.₂₃...ₙ = (1/N) Σ [X₁.₂₃...ₙ − E(X₁.₂₃...ₙ)]² = (1/N) Σ X₁.₂₃...ₙ²
  = (1/N) Σ X₁.₂₃...ₙ · X₁.₂₃...ₙ = (1/N) Σ X₁ X₁.₂₃...ₙ   (cf. Property 2, § 10·13)
  = (1/N) Σ X₁ (X₁ − b₁₂.₃₄...ₙ X₂ − b₁₃.₂₄...ₙ X₃ − ... − b₁ₙ.₂₃...(n−1) Xₙ)
  = σ₁² − b₁₂.₃₄...ₙ r₁₂σ₁σ₂ − b₁₃.₂₄...ₙ r₁₃σ₁σ₃ − ... − b₁ₙ.₂₃...(n−1) r₁ₙσ₁σₙ
⇒ σ₁² − σ²₁.₂₃...ₙ = b₁₂.₃₄...ₙ r₁₂σ₁σ₂ + b₁₃.₂₄...ₙ r₁₃σ₁σ₃ + ... + b₁ₙ.₂₃...(n−1) r₁ₙσ₁σₙ
Eliminating the b's between this relation and the normal equations, we get
  | σ₁² − σ²₁.₂₃...ₙ   r₁₂σ₁σ₂   ...   r₁ₙσ₁σₙ |
  | r₁₂σ₁σ₂            σ₂²       ...   r₂ₙσ₂σₙ |  =  0
  | ⋮                                         |
  | r₁ₙσ₁σₙ            r₂ₙσ₂σₙ   ...   σₙ²     |
Dividing R₁, R₂, ..., Rₙ by σ₁, σ₂, ..., σₙ respectively and also C₁, C₂, ..., Cₙ by σ₁, σ₂, ..., σₙ respectively, we get
  | 1 − σ²₁.₂₃...ₙ/σ₁²   r₁₂   ...   r₁ₙ |
  | r₁₂                  1     ...   r₂ₙ |  =  0
  | ⋮                                   |
  | r₁ₙ                  r₂ₙ   ...   1   |
Writing the first column as the difference of the columns (1, r₁₂, ..., r₁ₙ)′ and (σ²₁.₂₃...ₙ/σ₁², 0, ..., 0)′, the determinant splits into two, giving
  ω − (σ²₁.₂₃...ₙ/σ₁²) ω₁₁ = 0
⇒ σ²₁.₂₃...ₙ = σ₁² ω/ω₁₁   ...(10·43)
Remark. In a tri-variate distribution,
  σ²₁.₂₃ = σ₁² ω/ω₁₁   ...(10·43a)
where ω and ω₁₁ are defined in (10·32).
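Formula (10·43a) can be cross-checked against a direct least-squares fit. The Python sketch below (illustrative only; the correlation matrix is an assumed example) computes σ²₁.₂₃ both ways.

    import numpy as np

    def residual_variance(s1, r12, r13, r23):
        """sigma^2_{1.23} = sigma_1^2 * omega / omega_11, cf. (10.43a)."""
        omega = np.linalg.det(np.array([[1, r12, r13],
                                        [r12, 1, r23],
                                        [r13, r23, 1]]))
        omega_11 = 1 - r23 ** 2          # cofactor of the (1, 1) element
        return s1 ** 2 * omega / omega_11

    rng = np.random.default_rng(3)
    R = np.array([[1, 0.5, 0.3], [0.5, 1, 0.4], [0.3, 0.4, 1]])
    x1, x2, x3 = rng.multivariate_normal(np.zeros(3), R, size=300_000).T
    coef, *_ = np.linalg.lstsq(np.column_stack([x2, x3]), x1, rcond=None)
    print(np.var(x1 - np.column_stack([x2, x3]) @ coef))   # empirical residual variance
    print(residual_variance(1.0, 0.5, 0.3, 0.4))           # formula (10.43a)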
10·14. Coefficient of Multiple Correlation. In a tri-variate distribution in which each of the variables X₁, X₂ and X₃ has N observations, the multiple correlation coefficient of X₁ on X₂ and X₃, usually denoted by R₁.₂₃, is the simple correlation coefficient between X₁ and the joint effect of X₂ and X₃ on X₁. In other words, R₁.₂₃ is the correlation coefficient between X₁ and its estimated value as given by the plane of regression of X₁ on X₂ and X₃, viz.,
  e₁.₂₃ = b₁₂.₃ X₂ + b₁₃.₂ X₃
We have X₁.₂₃ = X₁ − b₁₂.₃ X₂ − b₁₃.₂ X₃, so that e₁.₂₃ = X₁ − X₁.₂₃.
Since the Xᵢ's are measured from their respective means, we have
  E(X₁.₂₃) = 0 and E(e₁.₂₃) = 0   (∵ E(Xᵢ) = 0 ; i = 1, 2, 3)
By def.,
  R₁.₂₃ = Cov(X₁, e₁.₂₃)/√[Var(X₁) Var(e₁.₂₃)]   ...(10·44)
Now
  Cov(X₁, e₁.₂₃) = E[{X₁ − E(X₁)}{e₁.₂₃ − E(e₁.₂₃)}] = E(X₁ e₁.₂₃)
  = (1/N) Σ X₁ e₁.₂₃ = (1/N) Σ X₁ (X₁ − X₁.₂₃)
  = (1/N) Σ X₁² − (1/N) Σ X₁ X₁.₂₃ = (1/N) Σ X₁² − (1/N) Σ X₁.₂₃²
  = σ₁² − σ²₁.₂₃   (cf. Property 2, § 10·13)
Also
  Var(e₁.₂₃) = E(e₁.₂₃²) = E[(X₁ − X₁.₂₃)²] = (1/N) Σ X₁² + (1/N) Σ X₁.₂₃² − (2/N) Σ X₁ X₁.₂₃
  = σ₁² + σ²₁.₂₃ − 2σ²₁.₂₃ = σ₁² − σ²₁.₂₃   (cf. Property 2, § 10·13)
Hence
  R₁.₂₃ = (σ₁² − σ²₁.₂₃)/√[σ₁²(σ₁² − σ²₁.₂₃)]
⇒ R²₁.₂₃ = (σ₁² − σ²₁.₂₃)/σ₁² = 1 − σ²₁.₂₃/σ₁²  ⇒  1 − R²₁.₂₃ = σ²₁.₂₃/σ₁²
Using (10·43a), we get
  1 − R²₁.₂₃ = ω/ω₁₁   ...(10·45)
where
        | 1    r₁₂  r₁₃ |
  ω  =  | r₂₁  1    r₂₃ |  =  1 − r₁₂² − r₁₃² − r₂₃² + 2r₁₂r₁₃r₂₃   (on simplification)
        | r₃₁  r₃₂  1   |
and
  ω₁₁ = | 1    r₂₃ |  =  1 − r₂₃²
        | r₂₃  1   |
In general, for n variables,
  Cov(X₁, e₁.₂₃...ₙ) = (1/N) Σ X₁ e₁.₂₃...ₙ = (1/N) Σ X₁ (X₁ − X₁.₂₃...ₙ)
  = (1/N) Σ X₁² − (1/N) Σ X₁ X₁.₂₃...ₙ = (1/N) Σ X₁² − (1/N) Σ X₁.₂₃...ₙ² = σ₁² − σ²₁.₂₃...ₙ   ...(*)
  V(e₁.₂₃...ₙ) = (1/N) Σ e₁.₂₃...ₙ² = (1/N) Σ (X₁ − X₁.₂₃...ₙ)²
  = σ₁² − σ²₁.₂₃...ₙ,  so that, as before,  R²₁.₂₃...ₙ = 1 − σ²₁.₂₃...ₙ/σ₁² = 1 − ω/ω₁₁.
3. Since R₁.₂₃ is the simple correlation between X₁ and e₁.₂₃, it must lie between −1 and +1. But as seen in Remark 1 above, R₁.₂₃ is a non-negative quantity, and we conclude that 0 ≤ R₁.₂₃ ≤ 1.
4. If R₁.₂₃ = 1, then the association is perfect and all the regression residuals are zero, so that σ²₁.₂₃ = 0. In this case, since X₁ = e₁.₂₃, the predicted value of X₁, the multiple linear regression equation of X₁ on X₂ and X₃ may be said to be a perfect prediction formula.
5. If R₁.₂₃ = 0, then all total and partial correlations involving X₁ are zero [see Example 10·37]. So X₁ is completely uncorrelated with all the other variables in this case, and the multiple regression equation fails to throw any light on the value of X₁ when X₂ and X₃ are known.
6. R₁.₂₃ is not less than any total correlation coefficient, i.e.,
  R₁.₂₃ ≥ r₁₂, r₁₃, r₂₃
10·15. Coefficient of Partial Correlation. Sometimes the correlation between two variables X₁ and X₂ may be partly due to the correlation of a third variable, X₃, with both X₁ and X₂. In such a situation, one may want to know what the correlation between X₁ and X₂ would be if the effect of X₃ on each of X₁ and X₂ were eliminated. This correlation is called the partial correlation, and the correlation coefficient between X₁ and X₂ after the linear effect of X₃ on each of them has been eliminated is called the partial correlation coefficient.
The residual X₁.₃ = X₁ − b₁₃X₃ may be regarded as that part of the variable X₁ which remains after the linear effect of X₃ has been eliminated. Similarly, the residual X₂.₃ may be interpreted as the part of the variable X₂ obtained after eliminating the linear effect of X₃. Thus the partial correlation coefficient between X₁ and X₂, usually denoted by r₁₂.₃, is given by
  r₁₂.₃ = Cov(X₁.₃, X₂.₃)/√[Var(X₁.₃) Var(X₂.₃)]   ...(10·46)
We have
  Cov(X₁.₃, X₂.₃) = (1/N) Σ X₁.₃ X₂.₃ = (1/N) Σ X₁ X₂.₃
  = (1/N) Σ X₁ (X₂ − b₂₃X₃) = (1/N) Σ X₁X₂ − b₂₃ (1/N) Σ X₁X₃
Also
  Var(X₁.₃) = (1/N) Σ X₁.₃² = (1/N) Σ X₁ X₁.₃ = (1/N) Σ X₁ (X₁ − b₁₃X₃)
Aliter. We have
  0 = Σ X₂.₃ X₁.₂₃ = Σ X₂.₃ (X₁ − b₁₂.₃ X₂ − b₁₃.₂ X₃)
But by def.,
  b₁₂.₃ = −(σ₁/σ₂)(ω₁₂/ω₁₁)  and  b₂₁.₃ = −(σ₂/σ₁)(ω₁₂/ω₂₂)
∴ r²₁₂.₃ = b₁₂.₃ · b₂₁.₃ = ω₁₂²/(ω₁₁ω₂₂)
⇒ r₁₂.₃ = −ω₁₂/√(ω₁₁ω₂₂)
  = (r₁₂ − r₁₃r₂₃)/√[(1 − r₁₃²)(1 − r₂₃²)]   ...(10·46a)
Remarks. 1. The expressions for r₁₃.₂ and r₂₃.₁ can be similarly obtained, to give
  r₁₃.₂ = (r₁₃ − r₁₂r₂₃)/√[(1 − r₁₂²)(1 − r₂₃²)]  and  r₂₃.₁ = (r₂₃ − r₁₂r₁₃)/√[(1 − r₁₂²)(1 − r₁₃²)]
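The defining form (10·46) and the computing form (10·46a) agree, as the following Python sketch illustrates (not part of the text; the total correlations are taken from Exercise 6(a) below and the simulation is an illustrative assumption).

    import numpy as np

    def partial_corr(r12, r13, r23):
        """r_{12.3} = (r12 - r13*r23) / sqrt((1 - r13^2)(1 - r23^2)), cf. (10.46a)."""
        return (r12 - r13 * r23) / np.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

    # Equivalent check: correlate the residuals X1.3 and X2.3 on simulated data.
    rng = np.random.default_rng(4)
    R = np.array([[1, 0.59, 0.46], [0.59, 1, 0.17], [0.46, 0.17, 1]])
    x1, x2, x3 = rng.multivariate_normal(np.zeros(3), R, size=300_000).T
    res1 = x1 - (np.cov(x1, x3)[0, 1] / x3.var()) * x3   # X1.3 = X1 - b13 * X3
    res2 = x2 - (np.cov(x2, x3)[0, 1] / x3.var()) * x3   # X2.3 = X2 - b23 * X3
    print(np.corrcoef(res1, res2)[0, 1], partial_corr(0.59, 0.46, 0.17))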
  b₁₂.₃₄...ₙ = −(σ₁/σ₂)(ω₁₂/ω₁₁)  and  b₂₁.₃₄...ₙ = −(σ₂/σ₁)(ω₁₂/ω₂₂)   [cf. Equation (10·40)]
∴ r²₁₂.₃₄...ₙ = b₁₂.₃₄...ₙ · b₂₁.₃₄...ₙ = (ω₁₂/ω₁₁)(ω₁₂/ω₂₂) = ω₁₂²/(ω₁₁ω₂₂)
⇒ r₁₂.₃₄...ₙ = −ω₁₂/√(ω₁₁ω₂₂)   ...(10·46b)
the negative sign being taken since the sign of the regression coefficient is the same as that of (−ω₁₂).
10·16. Multiple Correlation in Terms of Total and Partial Correlations.
  1 − R²₁.₂₃ = (1 − r₁₂²)(1 − r₁₃.₂²)   ...(10·46c)
Proof. We have
  R²₁.₂₃ = (r₁₂² + r₁₃² − 2r₁₂r₁₃r₂₃)/(1 − r₂₃²)
⇒ 1 − R²₁.₂₃ = (1 − r₁₂² − r₁₃² − r₂₃² + 2r₁₂r₁₃r₂₃)/(1 − r₂₃²)
Also
  1 − r₁₃.₂² = 1 − (r₁₃ − r₁₂r₂₃)²/[(1 − r₁₂²)(1 − r₂₃²)]
  = (1 − r₁₂² − r₂₃² − r₁₃² + 2r₁₂r₁₃r₂₃)/[(1 − r₁₂²)(1 − r₂₃²)]
Hence the result.
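Both forms of R₁.₂₃ — the expression in the total correlations and the identity (10·46c) in terms of r₁₂ and r₁₃.₂ — can be checked numerically. A short Python sketch with illustrative values follows (not part of the text).

    import numpy as np

    def multiple_corr(r12, r13, r23):
        """R_{1.23} with R^2 = (r12^2 + r13^2 - 2 r12 r13 r23) / (1 - r23^2)."""
        R2 = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
        return np.sqrt(R2)

    def partial_corr(rab, rac, rbc):
        return (rab - rac * rbc) / np.sqrt((1 - rac ** 2) * (1 - rbc ** 2))

    r12, r13, r23 = 0.59, 0.46, 0.17
    R123 = multiple_corr(r12, r13, r23)
    # Identity (10.46c): 1 - R^2_{1.23} = (1 - r12^2)(1 - r13.2^2)
    r13_2 = partial_corr(r13, r12, r23)
    print(1 - R123 ** 2, (1 - r12 ** 2) * (1 - r13_2 ** 2))   # the two values agree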
Theorem. Any standard deviation of order p may be expressed in terms of a standard deviation of order (p − 1) and a partial correlation coefficient of order (p − 1).
Proof. Let us consider the sum
  Σ X₁.₂₃...ₙ² = Σ X₁.₂₃...(n−1) X₁.₂₃...ₙ,
which, on expanding the residual of order (n − 1), gives
  σ²₁.₂₃...ₙ = σ²₁.₂₃...(n−1) [1 − b₁ₙ.₂₃...(n−1) bₙ₁.₂₃...(n−1)] = σ²₁.₂₃...(n−1) [1 − r²₁ₙ.₂₃...(n−1)]
Again, for the regression coefficients we have the recurrence
  b₁₂.₃₄...ₙ = [ b₁₂.₃₄...(n−1) − b₁ₙ.₃₄...(n−1) bₙ₂.₃₄...(n−1) ] / [ 1 − r²₂ₙ.₃₄...(n−1) ]   ...(10·48)
Writing each regression coefficient as the corresponding partial correlation coefficient multiplied by the ratio of the appropriate residual standard deviations, e.g.,
  b₁₂.₃₄...ₙ = r₁₂.₃₄...ₙ (σ₁.₃₄...ₙ/σ₂.₃₄...ₙ),  b₁₂.₃₄...(n−1) = r₁₂.₃₄...(n−1) (σ₁.₃₄...(n−1)/σ₂.₃₄...(n−1)),   ...(**)
and substituting in (10·48), we obtain
  r₁₂.₃₄...ₙ = [ r₁₂.₃₄...(n−1) − r₁ₙ.₃₄...(n−1) r₂ₙ.₃₄...(n−1) ] / √{ [1 − r²₁ₙ.₃₄...(n−1)] [1 − r²₂ₙ.₃₄...(n−1)] }   ...(10·49)
R I -23 =+ 0·7211
I
<11 <12 CJ:J
I Tn T13
where C.t> - T21 1
- T31 Tn ·1
442
(X .... 28·02) + (..,..0·576) (X .... 4·91) of (-0·048)
1"' 1.10 2' "85.00
(X3 - 594) =0
Z ; Z-137
14·045 P= J
-00
p(;)d; Class
Area ,,,.der the
curve in this
class (A)
Frequenq
500 x (A)
Cov (X X )
1·23, 2-13 1·23
X
2·13
N.l - 'x2 - b 13.2X3)
Cll.23 Cl2·13 =N Cll.23 Cl2.13 - Cll.23 Cl2.13
=-b lZ.3NLX2.13X2
Cll·23 Cl2.U.
f
(c•• Property 1, § 10·13)
b ·LX2.132
=- lZ,3 NCll:21. Cl2.13 ,c.f. Property,2 § 10'13)
I
_ b ClZ,13... b (Cl2
- - 12·3 - - 12-3 .
Cll·23 (CliVW/ooll )
1 r12 r13
where 00 = r21 I r23
r31 rn '1
I I
"1 -
0011= r: 1= l--r232 and 1= l-r132
z
•• . XZ,13) = - b1203 Cl2
::;- • 1 rrZ3 2 =- b12.3 _
VI 13 vl·3
-
=
[since Cl2032 Cl22 (1 - r232) and Cll.32 ='Cl12 (I - r132)]
(X 1.3, X2.3) Cl2·3
•• r (X1·23, X
.
2·13) =- 2 •-
Cll.3
COY (Xp, X 2•3) r(X X_\
=- Cl203 Cll.3 =- 13
'. 231
  rᵢⱼ.ₖₘ...ₜ = [ rᵢⱼ.(s) − rᵢₖ.(s) rⱼₖ.(s) ] / √{ [1 − r²ᵢₖ.(s)] [1 − r²ⱼₖ.(s)] },
where k, m, ..., t are (s + 1) secondary subscripts and rᵢⱼ.(s), rᵢₖ.(s), rⱼₖ.(s) are partial correlation coefficients of order s. Thus, if every partial correlation coefficient of order s equals ρ/(1 + sρ), then
  rᵢⱼ.ₖₘ...ₜ = [ ρ/(1 + sρ) − {ρ/(1 + sρ)}² ] / [ 1 − {ρ/(1 + sρ)}² ] = ρ/[1 + (s + 1)ρ]
Also
         | 1  ρ  ρ  ...  ρ |
         | ρ  1  ρ  ...  ρ |
  ω₁₁ =  | ρ  ρ  1  ...  ρ |,  a determinant of order (p − 1).
         | ⋮               |
         | ρ  ρ  ρ  ...  1 |
We have
                          | 1  ρ  ρ  ...  ρ |
                          | 1  1  ρ  ...  ρ |
  ω = [1 + (p − 1)ρ] ×    | 1  ρ  1  ...  ρ |
                          | ⋮               |
                          | 1  ρ  ρ  ...  1 |
(on adding C₂, C₃, ..., Cₚ to C₁ and taking out the common factor [1 + (p − 1)ρ]). Operating Rᵢ − R₁ (i = 2, 3, ..., p), the determinant reduces to triangular form with (1 − ρ) in the diagonal below the first element, so that
  ω = [1 + (p − 1)ρ](1 − ρ)^(p−1)
Similarly, we will have
  ω₁₁ = [1 + (p − 2)ρ](1 − ρ)^(p−2)
∴ 1 − R² = ω/ω₁₁ = (1 − ρ)[1 + (p − 1)ρ]/[1 + (p − 2)ρ]
Example 10·41. In a p-variate distribution, all the total (order zero) correlation coefficients are equal to ρ₀. Let ρₖ denote the partial correlation coefficient of order k and Rₖ be the multiple correlation coefficient of one variate on k other variates. Prove that
(i) ρ₀ ≥ −1/(p − 1),  (ii) ρₖ − ρₖ₋₁ = −ρₖ ρₖ₋₁,  and  (iii) R²ₖ = kρ₀²/[1 + (k − 1)ρ₀].
Solution. (i) Since 1 − R² = ω/ω₁₁ ≥ 0 and, by the previous example, ω = [1 + (p − 1)ρ₀](1 − ρ₀)^(p−1), we must have
  −1 ≤ ρ₀ ≤ 1  and  −[1 + (p − 2)ρ₀] ≤ ρ₀,  i.e.,  ρ₀ ≥ −1/(p − 1).
(iii) Taking ρ = ρ₀ and k = p − 1 in part (ii) of Example 10·40, we get
  1 − R²ₖ = (1 − ρ₀)(1 + kρ₀)/[1 + (k − 1)ρ₀]
⇒ R²ₖ = 1 − (1 − ρ₀)(1 + kρ₀)/[1 + (k − 1)ρ₀] = kρ₀²/[1 + (k − 1)ρ₀]   (on simplification).
Example 10·42. If r₁₂ and r₁₃ are given, show that r₂₃ must lie in the range
  r₁₂r₁₃ ± (1 − r₁₂² − r₁₃² + r₁₂²r₁₃²)^(1/2).
If r₁₂ = k and r₁₃ = −k, show that r₂₃ will lie between −1 and 1 − 2k².
[Sardar Patel Univ. B.Sc. Oct., 1992; Madras Univ. B.Sc. Main, 1991]
Solution. We have
  r²₁₂.₃ = (r₁₂ − r₁₃r₂₃)²/[(1 − r₁₃²)(1 − r₂₃²)] ≤ 1
∴ (r₁₂ − r₁₃r₂₃)² ≤ (1 − r₁₃²)(1 − r₂₃²)
⇒ r₁₂² + r₁₃²r₂₃² − 2r₁₂r₁₃r₂₃ ≤ 1 − r₁₃² − r₂₃² + r₁₃²r₂₃²
⇒ r₁₂² + r₁₃² + r₂₃² − 2r₁₂r₁₃r₂₃ ≤ 1   ...(*)
This condition holds for consistent values of r₁₂, r₁₃ and r₂₃. (*) may be rewritten as
  r₂₃² − (2r₁₂r₁₃)r₂₃ + (r₁₂² + r₁₃² − 1) ≤ 0.
Hence, for given values of r₁₂ and r₁₃, r₂₃ must lie between the roots of the quadratic (in r₂₃) equation
  r₂₃² − (2r₁₂r₁₃)r₂₃ + (r₁₂² + r₁₃² − 1) = 0,
which are given by
  r₂₃ = r₁₂r₁₃ ± √(r₁₂²r₁₃² − r₁₂² − r₁₃² + 1)
Hence
  r₁₂r₁₃ − √(1 − r₁₂² − r₁₃² + r₁₂²r₁₃²) ≤ r₂₃ ≤ r₁₂r₁₃ + √(1 − r₁₂² − r₁₃² + r₁₂²r₁₃²)   ...(**)
In particular, with r₁₂ = k and r₁₃ = −k, (**) gives
  −k² − (1 − k²) ≤ r₂₃ ≤ −k² + (1 − k²),
i.e.,  −1 ≤ r₂₃ ≤ 1 − 2k².
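The bounds derived in Example 10·42 are easily evaluated numerically. The sketch below (illustrative values only, not part of the text) prints the admissible interval and checks the special case r₁₂ = k, r₁₃ = −k.

    import numpy as np

    def r23_bounds(r12, r13):
        """Admissible range of r23 given r12 and r13 (Example 10.42)."""
        half_width = np.sqrt(1 - r12 ** 2 - r13 ** 2 + r12 ** 2 * r13 ** 2)
        centre = r12 * r13
        return centre - half_width, centre + half_width

    print(r23_bounds(0.6, -0.4))     # arbitrary illustrative values
    k = 0.7
    print(r23_bounds(k, -k))         # gives (-1, 1 - 2*k**2)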
EXERCISE 10(g)
'-
L (0) Explain partial correlation and multiple correJatiOQ.
(b) Explain the concepts of multiple and partial coefficients.
Show ihat the multiple correlation coefficient R 1.23. is, in the usual
notations given by : R)'232 = I--
ro
roll
2 (0) In the usual notations, prove that
R .2 _ T122. + T1?- - 2r 1t'2l'31..,. 2
1·23 - 1 - T23 2 T12
(b) If R 1.23 = 1, prove that T2.13 is alW to 1. 'If R 1.23 =:= 0, .does it
necessarily mean
thatRz,13 is also ,zero ?
3. (a) Obtain an expression for the variance of the residual X₁.₂₃ in terms of the correlations r₁₂, r₂₃ and r₃₁.
(b) Show that the standard deviation of order p may be expressed in terms of a standard deviation of order (p − 1) and a correlation coefficient of order (p − 1). Hence deduce that :
(i) σ₁ ≥ σ₁.₂ ≥ σ₁.₂₃ ≥ ... ≥ σ₁.₂₃...ₙ
(ii) 1 − R²₁.₂₃...ₙ = (1 − r₁₂²)(1 − r₁₃.₂²) ... (1 − r²₁ₙ.₂₃...(n−1))
-. 1 '01 'OZ.···.Ja.
'10 1 '12..... Jl"
<.0=
  r₁₂.₃ = (r₁₂ − r₁₃r₂₃)/√[(1 − r₁₃²)(1 − r₂₃²)]
6. (a) The simple correlation coefficients between tePlperature (XI)' corn
=
yield (Xz) and rainfall (X3) are, '12 = ()'59, '13 046 and '23 = 0·17.
Calculate the partial correlation coefficients '12-3, '2H and '31.2' Also
calculate R I •23•
If r12 = '13 = - 040 and '23 = - 0-56, fmd the values of '12.3, '13.1
and '23:1' Calculate funher R 1(23)' R2(13) and R3(12)-
7. (a) In certain investigation, the following-values were obtained :
'12 =0-6, '13 = - 04 and '23 =0·7
Are the values consistent?
(b) Comment on the consistency of
3 4 1
'12 =S' '23 =S' '31 =- 2 .
(c) SupPose a computer has found, for a given set of values of XI, Xl and
=
'12 = 0·91, '13 0-33 and '32 = 0·81
Examine whether the computations may be said to be free from error_
=
8. (a) Show that if '11 '13 = 0, then R 1(23) =O. WhaJ is the sig'ilificance
of this result in,regard to the mulQple regression equation of XI 011 X2 and X3 ?
(b) For what value of R 1•23 will X2 andX3 be uncorrelated 7
(c) Given the data: '12 =-0·6, '13 = 04, fmd tile value of '13.80 thittRI.23'
the multiple correlation coefficient of XI oil X2 and X3 should be unity.
9. From the heights (Xl), weighl$ (Xz) and ages (X 3) of a group of students
the following,staildard deviations tJIld correlation coefficients were obtained :
<71 = 2·8 iJlches, <72 = 12 lbs, and <73 =,1'5 years, '12 = 0·75, T23 0-54, and =
=
'31 0,,43. Calculate (I) partial regression coefficients and (ii) partial correlation
coefficients.
10. For a trivariate distribution. :
XI =40 Xl =70 X3 =90
<71 =3 <71 =6 <73 =7
'12 =04 '23 =().5 '13 =0·6
Find
(0 R 1.23. (;0 r23.1. (iii) the value of X3 when Xl = 30 and X2 = 45.
11. (a) In a study of a random sample of 120 students, the following results
are obtained :
Xr = 68. X2 =70. X3 74 =
S1 2 =100. = 25.
S22 S32 = 81.
rn =0·60. r13 =
0·70. r2) 0·65 =
[S1 =Var (Xi)]. where X 1.X2 • X3 denote percentage of marks obtained by a
sbldent in I test, II test and the final examination respectively.
(0 Obtain the least square regression equation of X3 on Xl and X2:
(iO Compute rt2.3 andR 3•12•
(iii) Estimate the percentage marks of a student in the final examination if
lie gets 60% and 67% in I and II tests respectively.
(b) Xl is the consumption of mille per head. X2 the mean price of mille. and
X3• the per capita income. Time series of the three variables are rendered trend
free and the standani deviations and correlation coefficients calculated ':'
SI = =
7·22. S2 547. S3 6·87 =
rn =- =
0·83. r13 0·92. r23 =-
0·61
Calculate the regression equation of X 1 on X2 and X 3 and interpret the regression
'lIS a demand equation..
12. (a) Five thousand candidates were examined in·the subjects (a). (b), (c);
each of these subjects carrying 100 marks. The following constants relate to
these data : ./ y
Subjects
(a) (b) (c)
Mean 39·46 52·31 45·26
Standard deviation 6·2 9·4 8·7
rbc = 047 rca = 0·38 rab = 0·29
Assuming normally correlated population. find· the number of candidates
who will pass if minimum pass marks are all aggregate of 150 marks for the
three subjects together.
(b) Establish the equation of the plane of regression for the variates X₁, X₂, X₃ in the determinant form
  | X₁/σ₁   X₂/σ₂   X₃/σ₃ |
  | r₁₂     1       r₂₃   |  =  0
  | r₁₃     r₂₃     1     |
[Delhi Univ. B.Sc. (Maths. Hons.), 1986]
13. (a) Prove the identity b₁₂.₃ b₂₃.₁ b₃₁.₂ = r₁₂.₃ r₂₃.₁ r₃₁.₂   [Gujarat Univ. B.Sc., 1992]
(b) X, Y, Z are three reduced (standard) variates and E(YZ) = E(ZX) = − … ; find the limits between which the coefficient of correlation r(X, Y) necessarily lies.
(PXy+ (Gx 2 + +
J2
I
[ PiP;
.". (I - Pi) (1 - Pj)
7. A ball is drawn at random from an urn contaiaing 3 white balIs
numbered O. 1.2 ; 2 red balls numbered' O. I and I black ball numbered O. If tbe
cOlours red and black are again numJ>ered O. I and 2 respectively. shOW
tI¥lt the correlation coefficient between the variables : X, the colour number and
Y, the number of'the ball is -
8. If X\ and X z are two independent normal variates with a common mean
zero and variances al z and azz respectively. show that the variates defmed by
az al
al az
are independent and that each is normally distributed with mean zero and
common variance (O:I Z + azZ).
Hint. Neglecting the cubes and higher powers of ::' x! being the
deviation of Xi from M and the means and s.d. 's Qf ZI and Zz to be 11,12
and Slo S2 respectively, we get
11 = kL? =k 3
k(Xli +..M)(X3i + M )-1
=
_1.
-N LJ
[(I -M+ x.1C. ')
Ml-'" + M-
X...Jl Xli X 3i
M2 + ...
]
V32
=1+ Ml
V32
Similarly 12 = 1 + W
.. h =12
Now -112
Similarly Sz2 = +
For certain data Y = 1·2X and X = 0·6Y. are the regression lines. Compute
r(X. Y) and ax/ay. Also compute p(X. Z). if Z Y-X. =
[Calcutta Unifl. B.8c. (MtJtlu. 198f11
12. An item (say. a pen) from a production line can be acceptable
repairable or useless. Suppose a production is stable and let p. q. r (p + q + r
1). denQte the probabilities for three possible of an ilem. If the itellls
are put into lots of 100 :
(,) Derive an expression for the probability functiop of Y) where X and
Y are the number of items in the lots that are respectively in the rlJ'st
two conditions.
(ii) Derive the moment generating function of X and Y.
(ii,) Find the marginal distribution X.
(iv) Find the conditional distribution of Y given =90. :x
(v) Obtain the regression function. of Yon X:
(Delhi M4 (Eco.), 1985)
13. If the regression of X₁ on X₂, ..., Xₚ is given by
  E(X₁ | X₂, ..., Xₚ) = α + β₂X₂ + β₃X₃ + ... + βₚXₚ,
  | σ₂₂  σ₂₃  ...  σ₂ₚ |
  | σ₃₂  σ₃₃  ...  σ₃ₚ |  >  0,   (σᵢᵢ = variances, σᵢⱼ = covariances)
  | ⋮                  |
  | σₚ₂  σₚ₃  ...  σₚₚ |
then the constants α, β₂, ..., βₚ are given by
  α = μ₁ + (R₁₂/R₁₁)(σ₁/σ₂)μ₂ + (R₁₃/R₁₁)(σ₁/σ₃)μ₃ + ... + (R₁ₚ/R₁₁)(σ₁/σₚ)μₚ
and
  βⱼ = −(R₁ⱼ/R₁₁)(σ₁/σⱼ),   (j = 2, ..., p),
where Rᵢⱼ is the cofactor of ρᵢⱼ in the determinant (R) of the correlation matrix
       | ρ₁₁  ρ₁₂  ...  ρ₁ₚ |
  R =  | ρ₂₁  ρ₂₂  ...  ρ₂ₚ |
       | ⋮                  |
       | ρₚ₁  ρₚ₂  ...  ρₚₚ |
[Delhi Univ. M.Sc. (Stat.)]
14. Let X₁ and X₂ be random variables with means 0 and variances 1 and correlation coefficient ρ. Show that
  E[max(X₁², X₂²)] ≤ 1 + √(1 − ρ²)
Using the above inequality, show that for random variables X₁ and X₂ with means μ₁ and μ₂, variances σ₁² and σ₂², correlation coefficient ρ, and for any k > 0,
  P[ | X₁ − μ₁ | ≥ kσ₁  or  | X₂ − μ₂ | ≥ kσ₂ ] ≤ [1 + √(1 − ρ²)]/k²
15. Let R be the maximum correlation between X₀ and any linear function of X₁, X₂, ..., Xₙ. If r₀₁ = r₀₂ = ... = r₀ₙ = r and all other correlation coefficients are equal to s, then show that
  R = r [ n/{1 + (n − 1)s} ]^(1/2)