Unit 3 2DRV2
2.8 Regression
Regression Lines :
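The regression line of Y on X is
$$y - \bar{y} = b_{yx}\,(x - \bar{x})$$
where $b_{yx}$ is the regression coefficient of y on x:
$$b_{yx} = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} = \frac{\mathrm{Cov}(X, Y)}{\sigma_x^2} = r\,\frac{\sigma_y}{\sigma_x}$$
$$b_{yx} = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} \quad \text{(direct method)}$$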
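The regression line of X on Y is
$$x - \bar{x} = b_{xy}\,(y - \bar{y})$$
where $b_{xy}$ is the regression coefficient of x on y:
$$b_{xy} = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(Y)} = \frac{\mathrm{Cov}(X, Y)}{\sigma_y^2} = r\,\frac{\sigma_x}{\sigma_y}$$
$$b_{xy} = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum y^2 - \left(\sum y\right)^2} \quad \text{(direct method)}$$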
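The covariance form and the direct method describe the same quantity. The following is a minimal sketch that checks this numerically; the function names and the sample data are illustrative assumptions and do not come from the text.

```python
def b_direct(dep, ind):
    """Regression coefficient of dep on ind by the direct (raw sums) method."""
    n = len(dep)
    s_dep, s_ind = sum(dep), sum(ind)
    s_prod = sum(d * i for d, i in zip(dep, ind))
    s_ind_sq = sum(i * i for i in ind)
    return (n * s_prod - s_dep * s_ind) / (n * s_ind_sq - s_ind ** 2)


def b_cov(dep, ind):
    """Same coefficient computed as Cov(dep, ind) / Var(ind)."""
    n = len(dep)
    m_dep, m_ind = sum(dep) / n, sum(ind) / n
    cov = sum((d - m_dep) * (i - m_ind) for d, i in zip(dep, ind)) / n
    var = sum((i - m_ind) ** 2 for i in ind) / n
    return cov / var


# Hypothetical data, chosen only for illustration.
xs = [2.0, 4.0, 5.0, 7.0, 10.0]
ys = [1.0, 3.0, 2.0, 6.0, 8.0]

b_yx = b_direct(ys, xs)  # y on x: x is the predictor
b_xy = b_direct(xs, ys)  # x on y: y is the predictor
print(b_yx, b_cov(ys, xs))
print(b_xy, b_cov(xs, ys))
```

The two coefficients differ only in the denominator, which is why swapping the argument order is enough to move from one regression line to the other.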
Obtuse angle,
$$\theta' = \tan^{-1}\left[\frac{r^2 - 1}{|r|} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}\right]$$
(iii) $r^2 = b_{xy}\, b_{yx}$,
i.e., $r = \pm\sqrt{b_{xy}\, b_{yx}}$, and r has the same sign as $b_{xy}$ and $b_{yx}$.
(iv) Regression coefficients are independent of change of origin but not of scale: if
$$u = \frac{x - a}{h}, \qquad v = \frac{y - b}{k},$$
then
$$b_{xy} = \frac{h}{k}\, b_{uv} \quad \text{and} \quad b_{yx} = \frac{k}{h}\, b_{vu}$$
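Property (iv) can be checked numerically. The sketch below uses the five (X, Y) pairs of Example 2.41; the shift and scale constants a, h, b, k and the helper name `coeff` are arbitrary choices made for illustration.

```python
def coeff(dep, ind):
    """Regression coefficient of dep on ind (direct method)."""
    n = len(dep)
    num = n * sum(d * i for d, i in zip(dep, ind)) - sum(dep) * sum(ind)
    den = n * sum(i * i for i in ind) - sum(ind) ** 2
    return num / den


xs = [1, 3, 5, 7, 9]
ys = [15, 18, 21, 23, 22]

# Assumed change of origin (a, b) and scale (h, k).
a, h = 5, 2
b, k = 21, 4
us = [(x - a) / h for x in xs]
vs = [(y - b) / k for y in ys]

b_xy, b_yx = coeff(xs, ys), coeff(ys, xs)
b_uv, b_vu = coeff(us, vs), coeff(vs, us)

print(b_xy, (h / k) * b_uv)  # the two values agree
print(b_yx, (k / h) * b_vu)  # the two values agree
```

Changing a or b leaves all four coefficients unchanged, while changing h or k rescales b_uv and b_vu by the factors shown.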
Regression curve of X on Y: $x = E[X \mid Y = y]$
Regression curve of Y on X: $y = E[Y \mid X = x]$
Example 2.41. The following table gives the age X (in years) of cars and
the annual maintenance cost Y (in units of Rs. 100).
X 1 3 5 7 9
Y 15 18 21 23 22
Find
(a) regression equations
(b) correlation coefficient
(c) Estimate the maintenance cost for a 4 year old car.
Solution :
(a) W.K.T. the regression equations are as follows.
Regression equation of Y on X is
$$y - \bar{y} = b_{yx}\,(x - \bar{x}) \qquad (1)$$
Regression equation of X on Y is
$$x - \bar{x} = b_{xy}\,(y - \bar{y}) \qquad (2)$$
where
202 MA6453 Probability and Queueing Theory
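  #     X     Y     XY    X^2    Y^2
  1     1    15     15      1    225
  2     3    18     54      9    324
  3     5    21    105     25    441
  4     7    23    161     49    529
  5     9    22    198     81    484
Sum    25    99    533    165   2003

That is, $n = 5$, $\sum X = 25$, $\sum Y = 99$, $\sum XY = 533$, $\sum X^2 = 165$, $\sum Y^2 = 2003$.
$$\bar{x} = \frac{\sum x}{n} = \frac{25}{5} = 5, \qquad \bar{y} = \frac{\sum y}{n} = \frac{99}{5} = 19.8$$
$$b_{yx} = \frac{5(533) - (25)(99)}{5(165) - (25)^2} = \frac{190}{200} = 0.95$$
$$b_{xy} = \frac{5(533) - (25)(99)}{5(2003) - (99)^2} = \frac{190}{214} = 0.8878$$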
∴ From (1), the regression equation of Y on X is
$$(y - 19.8) = 0.95(x - 5) \;\Rightarrow\; y = 0.95x + 15.05 \qquad (3)$$
∴ From (2), the regression equation of X on Y is
$$(x - 5) = 0.8878(y - 19.8) \;\Rightarrow\; x = 0.8878y - 12.5784 \qquad (4)$$
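(b) Correlation coefficient r:
$$r = \pm\sqrt{b_{yx} \cdot b_{xy}} = \pm\sqrt{0.95 \times 0.8878} = \pm 0.918$$
Since $b_{yx}$ and $b_{xy}$ are both positive, r takes the positive sign.
∴ $r = 0.918$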
(c) Cost (Y) for a 4-year-old car (X = 4):
To estimate Y from X we use the regression equation of Y on X, i.e., (3):
$$y = 0.95x + 15.05 = 0.95(4) + 15.05 = 18.85$$
Since Y is measured in units of Rs. 100, the estimated cost for a 4-year-old car is Rs. 1885.
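The arithmetic of Example 2.41 can be reproduced from the column totals alone. This is a minimal sketch; the variable and function names are assumptions made for illustration.

```python
from math import sqrt

# Column totals from the table of Example 2.41.
n, sx, sy, sxy, sxx, syy = 5, 25, 99, 533, 165, 2003

x_bar, y_bar = sx / n, sy / n

# Direct-method regression coefficients.
b_yx = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
b_xy = (n * sxy - sx * sy) / (n * syy - sy ** 2)

# Both coefficients are positive here, so the positive root is taken.
r = sqrt(b_yx * b_xy)


def predict_y(x):
    """Estimate Y (in units of Rs. 100) from the line of Y on X."""
    return y_bar + b_yx * (x - x_bar)


print(x_bar, y_bar)           # means of X and Y
print(b_yx, round(b_xy, 4))   # regression coefficients
print(round(r, 3))            # correlation coefficient
print(round(predict_y(4), 2) * 100)  # estimated cost in Rs. for a 4-year-old car
```

Working with unrounded coefficients gives r slightly above the hand-computed value, which used the rounded figure 0.8878; both agree to three decimal places.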
Two - Dimensional Random Variables 203
8x − 10y + 66 = 0
40x − 18y − 214 = 0
Find
(a) mean values of X and Y (b) r (X, Y)
Solution : Given
$$8x - 10y + 66 = 0 \qquad (1)$$
$$40x - 18y - 214 = 0 \qquad (2)$$
(a) Both lines of regression pass through the point $(\bar{x}, \bar{y})$, where $\bar{x}$ and $\bar{y}$ are the mean values of x and y, so this point satisfies both given equations.
The mean values are therefore found by solving (1) and (2) simultaneously. Multiplying (1) by 5 and subtracting (2) gives $-32y + 544 = 0$, so $y = 17$, and then (1) gives $x = 13$:
$$\bar{x} = 13, \qquad \bar{y} = 17$$
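(b) r(x, y):
It is not stated which equation is the line of y on x, so both assignments are tried.

Case 1: take (1) as the line of x on y and (2) as the line of y on x.
$$(1) \Rightarrow 8x = 10y - 66 \Rightarrow x = \frac{10}{8}y - \frac{66}{8}, \qquad \therefore\; b_{xy} = \frac{10}{8}$$
$$(2) \Rightarrow 18y = 40x - 214 \Rightarrow y = \frac{40}{18}x - \frac{214}{18}, \qquad \therefore\; b_{yx} = \frac{40}{18}$$
$$r = \pm\sqrt{b_{xy} \cdot b_{yx}} = \pm\sqrt{\frac{10}{8} \cdot \frac{40}{18}} = \pm\sqrt{2.778} = \pm 1.667$$

Case 2: take (1) as the line of y on x and (2) as the line of x on y.
$$(1) \Rightarrow 10y = 8x + 66 \Rightarrow y = \frac{8}{10}x + \frac{66}{10}, \qquad \therefore\; b_{yx} = \frac{8}{10}$$
$$(2) \Rightarrow 40x = 18y + 214 \Rightarrow x = \frac{18}{40}y + \frac{214}{40}, \qquad \therefore\; b_{xy} = \frac{18}{40}$$
$$r = \pm\sqrt{b_{xy} \cdot b_{yx}} = \pm\sqrt{\frac{18}{40} \cdot \frac{8}{10}} = \pm\sqrt{0.36} = \pm 0.6$$

W.K.T. $-1 \le r \le 1$, so Case 1 is rejected. In Case 2 both regression coefficients are positive, so r is positive.
∴ $r = 0.6$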
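The procedure used here (solve for the means, then keep the assignment of lines whose coefficient product lies between 0 and 1) can be sketched as follows. The function name `analyse` and the (a, b, c) encoding of a line a·x + b·y + c = 0 are assumptions made for illustration, and the sketch assumes neither line is horizontal or vertical.

```python
from fractions import Fraction as F
from math import sqrt


def analyse(line1, line2):
    """Each line is (a, b, c), meaning a*x + b*y + c = 0.

    Returns (x_bar, y_bar, r) for the two regression lines.
    """
    (a1, b1, c1), (a2, b2, c2) = line1, line2

    # Means: intersection of the two lines (Cramer's rule).
    det = a1 * b2 - a2 * b1
    x_bar = F(b1 * c2 - b2 * c1, det)
    y_bar = F(a2 * c1 - a1 * c2, det)

    # Try both assignments; a valid one has 0 <= b_yx * b_xy <= 1.
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = F(-y_on_x[0], y_on_x[1])  # slope dy/dx of the y-on-x line
        b_xy = F(-x_on_y[1], x_on_y[0])  # slope dx/dy of the x-on-y line
        product = b_yx * b_xy
        if 0 <= product <= 1:
            sign = 1 if b_yx > 0 else -1  # r shares the sign of the coefficients
            return x_bar, y_bar, sign * sqrt(product)
    raise ValueError("no valid assignment of regression lines")


x_bar, y_bar, r = analyse((8, -10, 66), (40, -18, -214))
print(x_bar, y_bar, r)
```

Using `Fraction` keeps the means and the coefficient product exact; only the final square root is a floating-point value.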
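$$4x - 5y + 33 = 0 \qquad (A)$$
$$20x - 9y = 107 \qquad (B)$$
Since both lines of regression pass through the mean values $\bar{x}$ and $\bar{y}$, the point $(\bar{x}, \bar{y})$ must satisfy the two given regression lines:
$$4\bar{x} - 5\bar{y} = -33 \qquad (1)$$
$$20\bar{x} - 9\bar{y} = 107 \qquad (2)$$
Solving (1) and (2),
$$\bar{x} = 13, \qquad \bar{y} = 17$$
Taking (A) as the line of y on x gives $b_{yx} = \frac{4}{5}$, and taking (B) as the line of x on y gives $b_{xy} = \frac{9}{20}$. Then
$$r^2 = b_{xy}\, b_{yx} = \frac{9}{20} \cdot \frac{4}{5} = \frac{9}{25}$$
$$r = \frac{3}{5} = 0.6$$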
Example 2.44. If y = 2x − 3 and y = 5x + 7 are the two regression
lines, find the mean values of x and y. Find the correlation
coefficient between x and y. Find an estimate of x when y = 1.
Solution : {Mean values $\bar{x} = -\frac{10}{3}$, $\bar{y} = -\frac{29}{3}$, $r = \sqrt{2/5}$; for $y = 1$, $x = -\frac{6}{5}$}
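The stated answers to Example 2.44 can be checked with exact fractions. In this sketch the assignment of the two lines (which one is y on x) is decided by requiring the product of the regression coefficients to be at most 1; the variable and function names are illustrative assumptions.

```python
from fractions import Fraction as F
from math import sqrt

# y = 2x - 3 taken as the line of y on x, so b_yx = 2.
# y = 5x + 7 taken as the line of x on y: x = (y - 7)/5, so b_xy = 1/5.
b_yx, b_xy = F(2), F(1, 5)

# The opposite assignment would give b_yx * b_xy = 5 * (1/2) > 1, which is invalid.
assert 0 <= b_yx * b_xy <= 1

# Means: the lines intersect where 2x - 3 = 5x + 7.
x_bar = F(-3 - 7, 5 - 2)
y_bar = 2 * x_bar - 3

# Both coefficients are positive, so r is the positive root.
r = sqrt(b_yx * b_xy)


def estimate_x(y):
    """Estimate x from y using the line of x on y."""
    return (F(y) - 7) / 5


print(x_bar, y_bar)   # -10/3 -29/3
print(estimate_x(1))  # -6/5
```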
Solution :
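(i)
$$\begin{aligned}
f(x) &= \int_{-\infty}^{\infty} f(x, y)\,dy = \int_0^2 \frac{1}{3}(x + y)\,dy \\
&= \frac{1}{3}\left[xy + \frac{y^2}{2}\right]_0^2 = \frac{1}{3}\,[2x + 2]
\end{aligned}$$
$$f(x) = \frac{2}{3}\,[x + 1], \quad 0 \le x \le 1$$
$$\begin{aligned}
f(y) &= \int_{-\infty}^{\infty} f(x, y)\,dx = \int_0^1 \frac{1}{3}(x + y)\,dx \\
&= \frac{1}{3}\left[\frac{x^2}{2} + xy\right]_0^1 = \frac{1}{3}\left[\frac{1}{2} + y\right]
\end{aligned}$$
$$f(y) = \frac{1}{6}\,[1 + 2y], \quad 0 \le y \le 2$$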
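$$\begin{aligned}
E[X] &= \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^1 x \cdot \frac{2}{3}(x + 1)\,dx \\
&= \frac{2}{3}\left[\frac{x^3}{3} + \frac{x^2}{2}\right]_0^1 = \frac{2}{3}\left[\frac{1}{3} + \frac{1}{2}\right] = \frac{10}{18}
\end{aligned}$$
$$E[X] = \frac{5}{9}$$
$$\begin{aligned}
E[Y] &= \int_{-\infty}^{\infty} y f(y)\,dy = \int_0^2 y \cdot \frac{1}{6}(1 + 2y)\,dy \\
&= \frac{1}{6}\left[\frac{y^2}{2} + \frac{2y^3}{3}\right]_0^2 = \frac{1}{6}\left[2 + \frac{16}{3}\right] = \frac{22}{18}
\end{aligned}$$
$$E[Y] = \frac{11}{9}$$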
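$$E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f(x, y)\,dx\,dy \quad \text{(or)} \quad \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f(x, y)\,dy\,dx$$
$$\begin{aligned}
E[XY] &= \int_0^1\!\int_0^2 xy \cdot \frac{1}{3}(x + y)\,dy\,dx = \frac{1}{3}\int_0^1\!\int_0^2 \left(x^2 y + x y^2\right) dy\,dx \\
&= \frac{1}{3}\int_0^1 \left[\frac{x^2 y^2}{2} + \frac{x y^3}{3}\right]_0^2 dx \\
&= \frac{1}{3}\int_0^1 \left[2x^2 + \frac{8}{3}x\right] dx \\
&= \frac{1}{3}\left[\frac{2}{3}x^3 + \frac{8}{3} \cdot \frac{x^2}{2}\right]_0^1 \\
&= \frac{1}{3}\left[\frac{2}{3} + \frac{4}{3}\right] = \frac{1}{3}\left[\frac{6}{3}\right]
\end{aligned}$$
$$\therefore\; E[XY] = \frac{2}{3}$$
$$\mathrm{Cov}[X, Y] = E[XY] - E[X] \cdot E[Y] = \frac{2}{3} - \left[\frac{5}{9}\right]\left[\frac{11}{9}\right] = \frac{2}{3} - \frac{55}{81}$$
$$\mathrm{Cov}[X, Y] = -\frac{1}{81}$$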
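$$\begin{aligned}
E\left[X^2\right] &= \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_0^1 x^2 \cdot \frac{2}{3}(x + 1)\,dx = \frac{2}{3}\int_0^1 \left(x^3 + x^2\right) dx \\
&= \frac{2}{3}\left[\frac{x^4}{4} + \frac{x^3}{3}\right]_0^1 = \frac{2}{3}\left[\left(\frac{1}{4} + \frac{1}{3}\right) - (0 + 0)\right] = \frac{7}{18}
\end{aligned}$$
$$\begin{aligned}
E\left[Y^2\right] &= \int_{-\infty}^{\infty} y^2 f(y)\,dy = \int_0^2 y^2 \cdot \frac{1}{6}(1 + 2y)\,dy = \frac{1}{6}\int_0^2 \left(y^2 + 2y^3\right) dy \\
&= \frac{1}{6}\left[\frac{y^3}{3} + \frac{2y^4}{4}\right]_0^2 = \frac{1}{6}\left[\left(\frac{8}{3} + 8\right) - (0 + 0)\right] = \frac{16}{9}
\end{aligned}$$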
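The variances are
$$\sigma_x^2 = E\left[X^2\right] - (E[X])^2 = \frac{7}{18} - \frac{25}{81} = \frac{13}{162}, \qquad \sigma_y^2 = E\left[Y^2\right] - (E[Y])^2 = \frac{16}{9} - \frac{121}{81} = \frac{23}{81}$$
The correlation coefficient is
$$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y} = \frac{-\dfrac{1}{81}}{\sqrt{\dfrac{13}{162}}\,\sqrt{\dfrac{23}{81}}}$$
$$r(X, Y) = -\sqrt{\frac{2}{299}}$$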
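The moments above can be verified with exact rational arithmetic. This sketch assumes the joint density is f(x, y) = (x + y)/3 on 0 ≤ x ≤ 1, 0 ≤ y ≤ 2, as read off from the integrals in this solution; the helper names `mono` and `moment` are illustrative.

```python
from fractions import Fraction as F
from math import sqrt


def mono(p, q):
    """Integral of x**p * y**q over the rectangle 0 <= x <= 1, 0 <= y <= 2."""
    return F(1, p + 1) * F(2 ** (q + 1), q + 1)


def moment(p, q):
    """E[X**p * Y**q] under the assumed density f(x, y) = (x + y)/3."""
    return (mono(p + 1, q) + mono(p, q + 1)) / 3


ex, ey, exy = moment(1, 0), moment(0, 1), moment(1, 1)
cov = exy - ex * ey
var_x = moment(2, 0) - ex ** 2
var_y = moment(0, 2) - ey ** 2
r = float(cov) / sqrt(var_x * var_y)

print(moment(0, 0))   # total probability; should be 1
print(ex, ey, exy)    # 5/9 11/9 2/3
print(cov, var_x, var_y)
print(r)
```

Because every integrand is a polynomial, each integral reduces to a product of two one-dimensional monomial integrals, so no numerical quadrature is needed.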
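The regression line of X on Y is
$$X - E(X) = \frac{r\sigma_X}{\sigma_Y}\,[Y - E(Y)]$$
$$X - \frac{5}{9} = -\sqrt{\frac{2}{299}} \times \frac{\sqrt{13/162}}{\sqrt{23/81}} \left[Y - \frac{11}{9}\right]$$
$$X - \frac{5}{9} = -\frac{1}{23}\left[Y - \frac{11}{9}\right]$$
$$X = -\frac{1}{23}\,Y + \frac{14}{23}$$
The regression line of Y on X is
$$Y - E(Y) = \frac{r\sigma_Y}{\sigma_X}\,[X - E(X)]$$
$$Y - \frac{11}{9} = -\sqrt{\frac{2}{299}} \times \frac{\sqrt{23/81}}{\sqrt{13/162}} \left[X - \frac{5}{9}\right]$$
$$Y - \frac{11}{9} = -\frac{2}{13}\left[X - \frac{5}{9}\right]$$
$$Y = -\frac{2}{13}\,X + \frac{17}{13}$$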
W.K.T. the conditional density of X given Y is
$$f(x/y) = \frac{f(x, y)}{f(y)} = \frac{(x + y)/3}{(1 + 2y)/6} = \frac{2(x + y)}{1 + 2y}$$
and the conditional density of Y given X is
$$f(y/x) = \frac{f(x, y)}{f(x)} = \frac{(x + y)/3}{2(x + 1)/3} = \frac{x + y}{2(1 + x)}$$
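The regression curve of X on Y is
$$\begin{aligned}
x = E[X / Y = y] &= \int_0^1 x\, f(x/y)\,dx = \int_0^1 x\,\frac{2(x + y)}{1 + 2y}\,dx \\
&= \frac{2}{1 + 2y}\int_0^1 \left(x^2 + xy\right) dx = \frac{2}{1 + 2y}\left[\frac{x^3}{3} + \frac{x^2}{2}\,y\right]_0^1 \\
&= \frac{2}{1 + 2y}\left[\frac{1}{3} + \frac{y}{2}\right] = \frac{2}{1 + 2y}\left[\frac{2 + 3y}{6}\right]
\end{aligned}$$
$$\therefore\; x = \frac{2 + 3y}{3(1 + 2y)}$$
The regression curve of Y on X is
$$\begin{aligned}
y = E[Y / X = x] &= \int_0^2 y\, f(y/x)\,dy = \int_0^2 y\,\frac{x + y}{2(1 + x)}\,dy \\
&= \frac{1}{2(1 + x)}\int_0^2 \left(xy + y^2\right) dy = \frac{1}{2(1 + x)}\left[\frac{x y^2}{2} + \frac{y^3}{3}\right]_0^2 \\
&= \frac{1}{2(1 + x)}\left[\left(2x + \frac{8}{3}\right) - (0 + 0)\right]
\end{aligned}$$
$$\therefore\; y = \frac{3x + 4}{3(1 + x)}$$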
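The two regression curves can be cross-checked numerically against their closed forms. This is a rough sketch that assumes the joint density f(x, y) = (x + y)/3 on 0 ≤ x ≤ 1, 0 ≤ y ≤ 2 and uses a simple midpoint rule; the function names are illustrative.

```python
def midpoint(g, lo, hi, n=2000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


def f(x, y):
    """Assumed joint density on 0 <= x <= 1, 0 <= y <= 2."""
    return (x + y) / 3.0


def cond_mean_x(y):
    """E[X | Y = y]: integrate x * f(x, y) over x and divide by the marginal f(y)."""
    num = midpoint(lambda x: x * f(x, y), 0.0, 1.0)
    den = midpoint(lambda x: f(x, y), 0.0, 1.0)
    return num / den


def cond_mean_y(x):
    """E[Y | X = x]: integrate y * f(x, y) over y and divide by the marginal f(x)."""
    num = midpoint(lambda y: y * f(x, y), 0.0, 2.0)
    den = midpoint(lambda y: f(x, y), 0.0, 2.0)
    return num / den


y0, x0 = 0.5, 0.5
print(cond_mean_x(y0), (2 + 3 * y0) / (3 * (1 + 2 * y0)))
print(cond_mean_y(x0), (3 * x0 + 4) / (3 * (1 + x0)))
```

Each printed pair should agree to several decimal places. Unlike the regression lines, these curves are not linear in the conditioning variable.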