Ps Notes
1 Unit I: Basic Probability
Probability of A: For each event A of the sample space S we suppose that a number P(A), called
the probability of A, is defined and is such that
1. 0 ≤ P (A) ≤ 1
2. P (S) = 1
Proposition 1.1.1
1. P (A′) = 1 − P (A), where A′ is the complement of A
2. P (φ) = 0
Proposition 1.1.2
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
1.1.2 Problems
\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]
\[ P(B|A) = \frac{P(A \cap B)}{P(A)} \]
1.3 Independence
A set of events is said to be independent if the occurrence of any one of them does not depend on
the occurrence or non-occurrence of the others.
When 2 events A and B are independent, it is obvious from the definition that P(B|A) = P(B).
If the events A and B are independent, the product theorem takes the form
P (A ∩ B) = P (A) × P (B)
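The product rule can be checked by brute-force enumeration. The sketch below assumes two fair dice and two events chosen purely for illustration (they are not taken from these notes): A = "first die is even" and B = "the sum is 7".

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on (first, second)."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

# Hypothetical events, used only to illustrate the product rule.
A = lambda o: o[0] % 2 == 0       # first die shows an even number
B = lambda o: o[0] + o[1] == 7    # the two dice sum to 7

p_a = prob(A)
p_b = prob(B)
p_ab = prob(lambda o: A(o) and B(o))

# Independence holds exactly when P(A ∩ B) = P(A) × P(B).
is_independent = (p_ab == p_a * p_b)
print(p_a, p_b, p_ab, is_independent)
```

Exact fractions are used so that the equality test is not affected by floating-point rounding.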
1.3.1.1 Partition
2. Ai ∩ Ak = φ, i = 1, 2, ..., n; k = 1, 2, ..., n; i ≠ k, which means that the subsets are mutually
(or pairwise) disjoint; that is, no two subsets have any element in common.
3. A1 ∪ A2 ∪ ... ∪ An = A, which means that the subsets are collectively exhaustive.
That is, the subsets together include all elements of the set A.
1.4 Bayes' theorem
Let {B1 , B2 , ..., Bn } be a partition of the sample space S, and suppose each one of the events B1 ,
B2 ,... ,Bn , has nonzero probability of occurrence. Let A be any event. Then
\[ P(B_i|A) = \frac{P(B_i)\,P(A|B_i)}{\sum_{j=1}^{n} P(B_j)\,P(A|B_j)} \qquad (1.1) \]
1. There are 4 candidates for the office of the highway commissioner; the respective probabilities
that they will be selected are 0.3, 0.2, 0.4 and 0.1, and the probabilities for a project’s approval
are 0.35, 0.85, 0.45 and 0.15, depending on which of the 4 candidates is selected. What is the
probability of the project getting approved? Ans: 0.47
2. A bag contains 7 red and 3 black marbles, and another bag contains 4 red and 5 black marbles.
One marble is transferred from the first bag into the second bag and then a marble is taken
out of the second bag at random. If this marble happens to be red, find the probability that
a black marble was transferred. Ans: 12/47
3. The probability that a student passes a certain exam is 0.9, given that he studied. The
probability that he passes the exam without studying is 0.2. Assume that the probability
that the student studies for an exam is 0.75. Given that the student passed the exam, what
is the probability that he studied? Ans: 27/29
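The two quantities used in these problems, the total probability in the denominator of (1.1) and the posterior itself, can be sketched in a few lines. The function names are illustrative, and the inputs are the data of problems 1 and 3 above.

```python
def total_probability(priors, likelihoods):
    """P(A) = sum of P(B_i) * P(A|B_i) over a partition B_1, ..., B_n."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes_posteriors(priors, likelihoods):
    """Return [P(B_i|A)] for every member of the partition."""
    evidence = total_probability(priors, likelihoods)
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# Problem 1: probability that the project is approved.
approval = total_probability([0.3, 0.2, 0.4, 0.1], [0.35, 0.85, 0.45, 0.15])

# Problem 3: partition {studied, did not study}, event A = "passed".
studied_given_pass = bayes_posteriors([0.75, 0.25], [0.9, 0.2])[0]

print(round(approval, 4), round(studied_given_pass, 4))
```

The first value agrees with the stated answer 0.47 and the second with 27/29 ≈ 0.931.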
1.5 Problems
1. A box contains 4 bad and 6 good tubes. Two are drawn out from the box one after the other.
One of them is tested and found to be good. What is the probability that the other one is
also good? [Ans: 5/9]
2. Two fair dice are thrown independently. Three events A, B, and C are defined as follows:
3. From 6 positive and 8 negative numbers, 4 numbers are chosen at random (without replacement)
and multiplied. What is the probability that the product is positive? [Ans: 505/1001]
4. A lot consists of 10 good articles, 4 with minor defects and 2 with major defects. Two articles
are chosen from the lot at random (without replacement). Find the probability that
c) At least 1 is good. [Ans: 7/8]
d) At most 1 is good. [Ans: 5/8]
5. There are 3 true coins and 1 false coin with head on both sides. A coin is chosen at random
and tossed 4 times. If head occurs all the 4 times, what is the probability that the false coin
has been chosen and used? [Ans: 16/19]
6. A bag contains 5 balls and it is not known how many of them are white. Two balls are drawn at
random from the bag and they are noted to be white. What is the chance that all the balls
in the bag are white? [Ans: 1/2]
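Counting problems such as problem 3 can be checked with binomial coefficients. The sketch below assumes the usual reading: the product is positive exactly when an even number of the chosen numbers is negative. The function name is illustrative.

```python
from fractions import Fraction
from math import comb

def prob_positive_product(n_pos, n_neg, k):
    """Probability that the product of k numbers drawn without replacement
    is positive, i.e. that an even number of negatives is drawn."""
    total = comb(n_pos + n_neg, k)
    favourable = sum(
        comb(n_neg, j) * comb(n_pos, k - j)   # j negatives, k - j positives
        for j in range(0, k + 1, 2)           # j = 0, 2, 4, ...
    )
    return Fraction(favourable, total)

p_positive = prob_positive_product(6, 8, 4)
print(p_positive)
```

With 6 positive and 8 negative numbers the three cases contribute 15, 420 and 70 selections out of 1001.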
2 Unit II: Random Variables and Probability
Distributions
Practice 2.8.3 A sample of 100 dry battery cells tested to find the length of life produced the
following results: Mean=12 hrs. SD=3 hrs. Assuming the data to be normally distributed, what
percentage of battery cells are expected to have life
[Ans: 15.87%,2.28%,49.74%]
Practice 2.8.4 The incomes of a group of 10000 persons were found to be normally distributed with
mean Rs 520 and S.D. Rs 60. Find
Practice 2.8.5 For a normal variate X with mean 25 and standard deviation 10, find the area
between
1. X = 25, X = 35 2. X = 15, X = 35 3. X ≥ 15 4. X ≥ 35
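The areas in Practice 2.8.5 can be checked without tables. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Normal variate with mean 25 and standard deviation 10 (Practice 2.8.5).
X = NormalDist(mu=25, sigma=10)

area_25_35 = X.cdf(35) - X.cdf(25)   # area between X = 25 and X = 35
area_15_35 = X.cdf(35) - X.cdf(15)   # area between X = 15 and X = 35
area_ge_15 = 1 - X.cdf(15)           # area for X >= 15
area_ge_35 = 1 - X.cdf(35)           # area for X >= 35

for name, area in [("25..35", area_25_35), ("15..35", area_15_35),
                   (">= 15", area_ge_15), (">= 35", area_ge_35)]:
    print(name, round(area, 4))
```

Each limit is one standard deviation from the mean, so the areas are the familiar values for z = ±1.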
Practice 2.8.6 If the height of 500 students is normally distributed with mean 68 inches and SD
4 inches, estimate the number of students having heights
Practice 2.8.7 For a normally distributed variate X with mean 1 and s.d 3, find
3 Unit III: Bivariate Distributions
Consider two random variables X and Y defined on the same sample space. For example, X can
denote the grade of a student and Y can denote the height of the same student. The joint cumulative
distribution function (joint CDF) of X and Y is given by
\[ F_{XY}(x, y) = P[X \le x, Y \le y] \]
The pair (X, Y) is referred to as a bivariate random variable. If we define F_X(x) = P[X ≤ x] as the
marginal CDF of X and F_Y(y) = P[Y ≤ y] as the marginal CDF of Y, then we define the random
variables X and Y to be independent if
\[ F_{XY}(x, y) = F_X(x)\,F_Y(y) \quad \text{for all } x \text{ and } y \]
3.1 Marginal PMFs and PDFs
Consider two discrete random variables X and Y with the joint PMF p_XY(x, y). The conditional
PMF of Y, given X = x, is given by
\[ p_{Y|X}(y|x) = \frac{P[X = x, Y = y]}{P[X = x]} = \frac{p_{XY}(x, y)}{p_X(x)} \]
provided p_X(x) > 0. The conditional PMF of X, given Y = y, is given by
\[ p_{X|Y}(x|y) = \frac{P[X = x, Y = y]}{P[Y = y]} = \frac{p_{XY}(x, y)}{p_Y(y)} \]
provided p_Y(y) > 0.
If X and Y are independent random variables, then
\[ p_{Y|X}(y|x) = p_Y(y) \]
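As a sketch of how the conditional PMF is computed from a joint PMF, the code below uses the illustrative joint PMF p(x, y) = (2x + y)/18 on x, y ∈ {1, 2}. The dictionary layout and function name are assumptions made for this example.

```python
from fractions import Fraction

# Illustrative joint PMF: p(x, y) = (2x + y)/18 for x, y in {1, 2}.
joint = {(x, y): Fraction(2 * x + y, 18) for x in (1, 2) for y in (1, 2)}

def conditional_y_given_x(joint, x):
    """Return {y: p_{Y|X}(y|x)}, i.e. the joint PMF divided by p_X(x)."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)  # marginal of X
    if p_x == 0:
        raise ValueError("conditional PMF is undefined when p_X(x) = 0")
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

cond = conditional_y_given_x(joint, 1)
print(cond)
```

The conditional PMF sums to 1 over y, which is a quick sanity check on any hand computation.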
3.3 Conditional Distributions
Problem 3.3.1 The joint probability function of two discrete random variables X and Y is given
by
\[ p(x, y) = \begin{cases} c(2x + y), & 0 \le x \le 2,\ 0 \le y \le 3 \\ 0, & \text{otherwise} \end{cases} \]
Find
2. P (X ≥ 1, Y ≤ 2)
3. P (X = 2, Y = 1)
Problem 3.3.2 The joint density function of two continuous random variables X and Y is
\[ f(x, y) = \begin{cases} cxy, & 0 < x < 4,\ 1 < y < 5 \\ 0, & \text{otherwise} \end{cases} \]
Find
3. P (X ≥ 3, Y ≤ 2)
Problem 3.3.3 The joint probability distribution of two random variables X and Y is given by:
P (X = 0, Y = 1) = 1/3, P (X = 1, Y = −1) = 1/3, P (X = 1, Y = 1) = 1/3.
Find
Problem 3.3.4 The joint probability density function of a two dimensional random variable (X,
Y) is given by:
\[ f(x, y) = \begin{cases} 2, & 0 < x < 1,\ 0 < y < x \\ 0, & \text{otherwise} \end{cases} \]
2. Find the conditional density function of Y given X = x and conditional density function of X
given Y = y.
Test whether X and Y are independent. For the above joint distribution, find the conditional density
of X given Y = y
Ans: They are independent; \( f_{X|Y}(x|y) = 2x e^{-x^2},\ x \ge 0 \)
Problem 3.3.6 The joint density function of the random variables X and Y is given by:
\[ f(x, y) = \begin{cases} 8xy, & 0 < x \le 1,\ 0 < y \le x \\ 0, & \text{elsewhere} \end{cases} \]
Find
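\[ f_X(x|y) = \begin{cases} \dfrac{2x}{1 - y^2}, & y \le x \le 1 \\ 0, & \text{otherwise} \end{cases} \]
\[ f_Y(y|x) = \begin{cases} \dfrac{2y}{x^2}, & 0 \le y \le x \\ 0, & \text{otherwise} \end{cases} \]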
Problem 3.3.7 The joint PMF of two random variables X and Y is given by
\[ p_{XY}(x, y) = \begin{cases} k(2x + y), & x = 1, 2;\ y = 1, 2 \\ 0, & \text{otherwise} \end{cases} \]
Problem 3.3.8 A fair coin is tossed three times. Let X be a random variable that takes the value
0 if the first toss is a tail and the value 1 if the first toss is a head. Also, let Y be a random variable
that defines the total number of heads in the three tosses.
Solution Let H denote the event that a head appears on a toss and T the event that a tail appears
on a toss. Table shows the sample space and the values of the two random variables.
(a) Since X takes values 0 and 1, and Y takes values 0, 1, 2, and 3, the joint PMF of X and Y can then
be constructed as follows:
Problem 3.3.9 X and Y are two continuous random variables whose joint PDF is given by
\[ f_{XY}(x, y) = \begin{cases} e^{-(x + y)}, & 0 \le x < \infty,\ 0 \le y < \infty \\ 0, & \text{otherwise} \end{cases} \]
Problem 3.3.10 The joint PMF of two random variables X and Y is given by
\[ P_{XY}(x, y) = \begin{cases} \frac{1}{18}(2x + y), & x = 1, 2;\ y = 1, 2 \\ 0, & \text{otherwise} \end{cases} \]
Problem 3.3.11 The joint probability function of two discrete RVs X and Y is given by
\[ P_{XY}(x, y) = \begin{cases} c(2x + y), & x = 0, 1, 2;\ y = 0, 1, 2, 3 \\ 0, & \text{otherwise} \end{cases} \]
1. Find c. Ans: c = 1/42
2. P (X ≥ 1, Y ≤ 2). Ans: 4/7
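Both answers to Problem 3.3.11 follow from summing over the support. A minimal sketch with exact fractions (variable names are illustrative):

```python
from fractions import Fraction

# Support of the joint PMF in Problem 3.3.11: x = 0, 1, 2 and y = 0, 1, 2, 3.
support = [(x, y) for x in range(3) for y in range(4)]

# The PMF must sum to 1, so c is the reciprocal of the sum of (2x + y).
weight_sum = sum(2 * x + y for x, y in support)
c = Fraction(1, weight_sum)

# P(X >= 1, Y <= 2): add the PMF over the points that satisfy the event.
p_event = sum(c * (2 * x + y) for x, y in support if x >= 1 and y <= 2)

print(c, p_event)
```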
Problem 3.3.12
\[ f_{XY}(x, y) = \begin{cases} k(6 - x - y), & 0 < x < 2,\ 2 < y < 4 \\ 0, & \text{otherwise} \end{cases} \]
1. Find k. Ans: k = 1/8
2. P (X < 1, Y < 3), P (X < 1|Y < 3). Ans: P (X < 1, Y < 3) = 3/8, P (X < 1|Y < 3) = 3/5
4 Unit IV: Basic Statistics
4.2 Moments
Ans: \( M_X(t) = \frac{2}{2 - t} \), assuming t < 2; µ = µ′1 = 1/2, µ′2 = 1/2, µ′3 = 3/4, µ′4 = 3/2
Problem 4.3.3 Find the first four moments (a) about the origin, (b) about the mean,
\[ f(x) = \begin{cases} \dfrac{4x(9 - x^2)}{81}, & 0 \le x \le 3 \\ 0, & \text{otherwise} \end{cases} \]
Problem 4.3.4 a) Find the moment generating function of a random variable X having density
function
\[ f(x) = \begin{cases} \dfrac{x}{2}, & 0 \le x \le 2 \\ 0, & \text{otherwise} \end{cases} \]
(b) Use the generating function of (a) to find the first four moments about the origin.
Ans: \( M_X(t) = \frac{1 + 2te^{2t} - e^{2t}}{2t^2} \)
Problem 4.3.5 If X denotes the outcome when a fair die is tossed, Find the moment generating
function of X and hence find the mean and variance of X.
Ans: \( M_X(t) = \frac{1}{6}\left(e^{t} + e^{2t} + e^{3t} + e^{4t} + e^{5t} + e^{6t}\right) \), Mean = 7/2, Variance = 35/12
Problem 4.3.6 A r.v. X takes values 0 and 1 with probabilities q and p respectively with q+p=1.
Find the mgf of X and show that all the moments about the origin equal p
Solution:
The first four moments of a distribution about the value 5 of the random variable X are
E(X − 5) = 2, E[(X − 5)²] = 20, E[(X − 5)³] = 40 and E[(X − 5)⁴] = 50.
E(X − 5) = 2 ⟹ E(X) − E(5) = 2
E(X) − 5 = 2 ⟹ E(X) = 7
4.4 Skewness & Kurtosis
E[(X − 5)²] = 20
E(X² − 2·X·5 + 5²) = 20
E(X² − 10X + 25) = 20
E(X²) − 10E(X) + 25 = 20
E(X²) − 10(7) + 25 = 20
E(X²) − 45 = 20
E(X²) = 65
Problem 4.4.2 The first three moments of a distribution about the value 2 of the random variable
X are 1, 16 and - 40. Find the coefficient of skewness.
Ans: -1.480
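Moments about an arbitrary value a can be converted to central moments with the standard identities µ2 = m2 − m1², µ3 = m3 − 3m1m2 + 2m1³ and µ4 = m4 − 4m1m3 + 6m1²m2 − 3m1⁴, where m_r = E[(X − a)^r]. The sketch below assumes the coefficient of skewness is taken as µ3/µ2^(3/2); the function name is illustrative.

```python
def central_moments(m1, m2, m3, m4=None, about=0.0):
    """Convert moments about the value `about` into the mean and the
    central moments mu2, mu3, mu4 (mu4 is None when m4 is not given)."""
    mean = about + m1
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = None if m4 is None else m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mean, mu2, mu3, mu4

# Problem 4.4.2: moments about 2 are 1, 16 and -40.
mean, mu2, mu3, _ = central_moments(1, 16, -40, about=2)
skewness = mu3 / mu2 ** 1.5   # assumed definition: mu3 / mu2^(3/2)
print(mean, mu2, mu3, round(skewness, 3))

# Moments about 5 equal to 2, 20, 40, 50 (the worked example above).
print(central_moments(2, 20, 40, 50, about=5))
```

For Problem 4.4.2 this gives µ2 = 15 and µ3 = −86, which reproduces the stated coefficient of about −1.480.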
Problem 4.4.3 The first four moments of a distribution about X = 4 are 1, 4, 10, 45 respectively.
Find the mean, variance, coefficient of skewness and coefficient of Kurtosis.
Problem 4.4.4 The distribution of a random variable X has mean 10, variance 16, coefficient of
skewness 1 and coefficient of kurtosis 4. Obtain the first four moments of X about origin.
Problem 4.4.5 Compute the coefficient of skewness and coefficient of kurtosis for the following
distribution
X=x 0 1 2 3 4 5 6 7 8
p 0.004 0.036 0.1 0.232 0.280 0.240 0.112 0.028 0.004
Ans: 0.288,5.58
x 0 1 2 3 4 5 6
F 4 15 25 22 11 3 0
9P (X = 4) = P (X = 2)
Ans: [ 41 , 32 , 89 , 16
9 45
, 32 ]
4.6 Moments, skewness & kurtosis for Normal distribution
Problem 4.5.3 Find the mean and standard deviation, skewness and kurtosis of the following
probability distribution.
X=x 0 1 2 3 4 5 6 7 8
p 0.004 0.036 0.1 0.232 0.280 0.240 0.112 0.028 0.004
Ans: 0.5,0.5,0.5,1.25
x 0 1 2 3 4
F 121 61 15 3 1
P (X = 1) = 2P (X = 2)
, find the mean, variance, skewness and kurtosis of the distribution. Also find P(X=3).
Ans: 1,1,1,4,0.06143
Problem 4.5.7 Find out the fallacy of the statement: “If X is a Poisson variate such that
P (X = 2) = 9P (X = 4) + 90P (X = 6)
then the mean of X is 1.” Also find the mean, variance, coefficient of skewness and kurtosis.
Ans: Correct, 1, 1, 1, 4
5 Unit V: Testing of Hypothesis
5.1 z-test
\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
where σ is the standard deviation of the population.
Also,
\[ z = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]
Problem 5.1.1 A sample of 100 students is taken from a large population. The
mean height of the students in this sample is 160 cm. Can it be reasonably regarded
that, in the population, the mean height is 165cm, and the SD is 10cm?
Problem 5.1.2 The mean breaking strength of the cables supplied by a manufac-
turer is 1800, with an SD of 100. By a new technique in the manufacturing process,
it is claimed that the breaking strength of the cable has increased. To test this claim,
a sample of 50 cables is tested and it is found that the mean breaking strength is
1850. Can we support the claim at 5% LOS?
Problem 5.1.3 A random sample of 50 items gives the mean 6.2 and standard
deviation 10.24. Can it be regarded as drawn from a normal population with mean
5.4 at 5% level of significance?
Problem 5.1.4 The mean height of a random sample of 100 individuals from a
population is 160. The Standard deviation of the sample is 10. Would it be reason-
able to suppose that the mean height of the population is 165?
Ans: zcal = 5, No
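The z statistic for Problem 5.1.4 can be sketched as follows. The problem does not state a level of significance, so the two-sided 5% level used here is an assumption, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean, pop_mean, sd, n):
    """z = (x̄ - µ) / (sd / sqrt(n)) for a large sample."""
    return (sample_mean - pop_mean) / (sd / sqrt(n))

# Problem 5.1.4: n = 100, x̄ = 160, s = 10, hypothesised µ = 165.
z = one_sample_z(160, 165, 10, 100)

# Assumed two-sided 5% level of significance (critical value about 1.96).
z_crit = NormalDist().inv_cdf(0.975)
reject_h0 = abs(z) > z_crit

print(z, round(z_crit, 2), reject_h0)
```

The computed value is z = −5, so |z| = 5 exceeds the critical value and the hypothesis µ = 165 is rejected, in line with the stated answer.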
Problem 5.1.5 The mean value of a random sample of 60 items was found to
be 145, with a standard deviation of 40. Find the 95% confidence limits for the
population mean. What size of the sample is required to estimate the population
mean within 5 of its actual value with 95% or more confidence, using the sample
mean?
Problem 5.1.6 A simple sample of heights of 6400 Englishmen has a mean of
170 cm and an SD of 6.4 cm, while a simple sample of heights of 1600 Americans
has a mean of 172 cm and an SD of 6.3 cm. Do the data indicate that Americans
are, on the average, taller than the Englishmen?
Problem 5.1.7 Test the significance of the difference between the means of the
samples, drawn from two normal populations with the same SD using the following
data:
size mean SD
sample 1 100 61 4
sample 2 200 63 6
Problem 5.1.8 The average marks scored by 32 boys are 72 with SD of 8, while
that for 36 girls is 70 with SD of 6. Test at 5 % LOS whether the boys perform
better than girls.
Problem 5.1.9 An IQ test was given to a large group of boys in the age group
of 18-20 years, who scored an average of 62.5 marks. The same test was given to
a fresh group of 100 boys of the same age group. They scored an average of 64.5
marks with an SD of 12.5 marks. Can we conclude that the fresh group of boys has
better IQ?
Problem 5.1.10 The means of two simple samples of 1000 and 2000 items are
170 and 169 cm respectively. Can the samples be regarded as drawn from the same
population with SD 10 at 5% LOS?
Problem 5.1.11 Intelligence tests were given to two groups of boys and girls of
the same age group chosen from the same college and the following results were
obtained.
size mean SD
Boys 100 73 10
Girls 60 75 8
Examine whether the difference between the means is significant or not
Problem 5.1.12 In a college, 60 junior students are found to have a mean height
of 171.5 cm and 50 senior students are found to have a mean height of 173.8 cm.
Can we conclude, based on this data that the juniors are shorter than seniors at (i)
5% LOS and (ii) 1% LOS, assuming that the SD of the students of that college is
6.2 cm?
Problem 5.1.13 Experience has shown that 20% of a manufactured product is of top
quality. In one day’s production of 400 articles, only 50 are of top quality. Show
that either the production of the day chosen was not a representative sample or the
hypothesis of 20% was wrong.
Problem 5.1.15 In a particular area there are 984 girls for every 1000 boys. Can we
say that gender equality exists?
Problem 5.1.16 Certain crosses of pea gave 5321 yellow and 1804 green seeds.
The expectation is 25% green seeds based on a certain theory. Is this divergence
significant or due to sampling fluctuation?
[Ans: Not significant; the divergence is due to sampling fluctuation.]
Problem 5.1.18 In a large city A, 20% of a random sample of 900 school boys had
a slight physical defect. In another large city B, 18.5 percent of a random sample
of 1600 school boys had the same defect. Is the difference between the proportions
significant?
Problem 5.1.19 Before an increase in excise duty on tea, 800 people out of a
sample of 1000 were consumers of tea. After the increase in duty, 800 people were
consumers of tea in a sample of 1200 persons. Find whether there is significant
decrease in the consumption of tea after the increase in duty.
Problem 5.1.20 A Government has started giving tax incentives for installing
certain energy saving devices. However, not the entire population is aware of this
offer. Out of 400 persons, selected randomly, 140 were ‘unaware’ of the offer.
Construct an interval estimate for the ‘proportion aware’ of the offer with 95%
confidence level.
[Ans: (0.6032, 0.6967)]
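The interval in Problem 5.1.20 follows from p̂ ± z·sqrt(p̂(1 − p̂)/n). A minimal sketch, assuming the large-sample normal approximation; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# 140 of 400 persons were unaware, so 260 were aware of the offer.
low, high = proportion_ci(400 - 140, 400)
print(round(low, 4), round(high, 4))
```

The result agrees with the stated interval to about three decimal places; small differences come from rounding z to 1.96.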
5.2 t-test
Problem 5.2.1 Tests made on the breaking strength of 10 pieces of a metal gave
the following results: 578, 572, 570, 568, 572, 570, 570, 572, 596 and 584 kg. Test
if the mean breaking strength of the wire can be assumed as 577 kg
Problem 5.2.2 A machinist is expected to make engine parts with axle diameter
of 1.75 cm. A random sample of 10 parts shows a mean diameter 1.85 cm with
an SD of 0.1 cm. On the basis of this sample, would you say that the work of the
machinist is inferior?
Problem 5.2.3 Two independent samples of sizes 8 and 7 contained the following
values:
sample 1 19 17 15 21 16 18 16 14
sample 2 15 14 15 19 15 18 16
Is the difference between the sample means significant?
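For Problem 5.2.3 a pooled two-sample t statistic can be sketched as below. It assumes equal population variances (the usual small-sample setting); the function name is illustrative.

```python
from math import sqrt
from statistics import mean

def pooled_t(sample1, sample2):
    """Two-sample t statistic with a pooled variance estimate.
    Returns (t, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = mean(sample1), mean(sample2)
    ss1 = sum((x - m1) ** 2 for x in sample1)   # sum of squared deviations
    ss2 = sum((x - m2) ** 2 for x in sample2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    t = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

t_stat, dof = pooled_t([19, 17, 15, 21, 16, 18, 16, 14],
                       [15, 14, 15, 19, 15, 18, 16])
print(round(t_stat, 3), dof)
```

The sample means are 17 and 16, and the statistic is compared with the tabulated t value for 13 degrees of freedom (about 2.16 at the two-sided 5% level).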
H1 : µ1 ≠ µ2
Problem 5.2.4 The following data represent the biological values of protein from
cow’s milk and buffalo’s milk at a certain level
Problem 5.2.5 The following data relate to the marks obtained by 11 students in
2 tests, one held at the beginning of a year and the other at the end of the year
after intensive coaching
Test 1 19 23 16 24 17 18 20 18 21 19 20
Test 2 17 24 20 24 20 22 20 20 18 22 19
Do the data indicate that the students have benefited by coaching?
5.3 F-test
Problem 5.3.2 In one sample of 9 items, the sum of the squares of deviations
of the sample values from the sample mean was 160, and in another sample of 8
observations it was 91. Test whether difference in variance is significant at 5 %
level.
Problem 5.3.4 Two independent samples of 8 and 7 items respectively had the
following values of the variable:
Sample I 9 11 13 11 15 9 12 14
Sample II 10 12 10 14 9 8 10
Do the estimates of population variance differ significantly at 5 % LOS?
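The variance ratio for Problem 5.3.4 can be computed directly from the unbiased sample variances. This is a sketch that assumes the larger estimate goes in the numerator, as is conventional for the F-test.

```python
from statistics import variance

sample_1 = [9, 11, 13, 11, 15, 9, 12, 14]
sample_2 = [10, 12, 10, 14, 9, 8, 10]

# statistics.variance uses the divisor n - 1 (unbiased estimate).
s1_sq = variance(sample_1)
s2_sq = variance(sample_2)

# Larger variance in the numerator, so that F >= 1.
f_stat = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)
dof = (len(sample_1) - 1, len(sample_2) - 1)

print(round(s1_sq, 4), round(s2_sq, 4), round(f_stat, 4), dof)
```

The statistic is then compared with the tabulated F value for (7, 6) degrees of freedom at the 5% level.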
Problem 5.4.1 An unbiased die is thrown 600 times and outcomes are given below.
Can we really claim the die is unbiased?
Face        1     2     3     4     5     6
Frequency   110   98    92    103   105   92
H0 : Die is unbiased
H1 : Die is not unbiased
Oi    Ei    (Oi − Ei)²/Ei
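The χ² statistic for the die data in Problem 5.4.1 can be sketched as follows, with equal expected frequencies under H0 (variable names are illustrative).

```python
observed = [110, 98, 92, 103, 105, 92]

# Under H0 each face is equally likely, so every expected frequency is 600/6.
expected = [sum(observed) / len(observed)] * len(observed)

# Chi-square statistic: sum of (O - E)^2 / E over the six faces.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1

print(round(chi2, 2), dof)
```

The value is compared with the tabulated χ² for 5 degrees of freedom (about 11.07 at the 5% level).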
5.4 χ2 - test
A : AB : B = 1 : 2 : 1. A sample of AB 125
        Oi     Ei      (Oi − Ei)²/Ei
A       70     62.5    0.9
AB      125    125     0.0
B       55     62.5    0.9
Total                  1.8
Problem 5.4.3 The following data show the defective articles produced by 4 machines:
Do the figures indicate a significant difference in the performance of the machines?
Machine A B C D
Production time 1 1 2 3
No. of defective 12 30 63 98
Examine whether the nature of the area is related to voting preference in this
election.
Ans: χ2cal = 10.092, H0 is rejected.
i    Oi    Ei                       Ei (rounded)    (Oi − Ei)²/Ei
1    83    128 × 140/253 = 70.83    71              12²/71 = 2.03
2    57    125 × 140/253 = 69.17    69              12²/69 = 2.09
3    45    128 × 113/253 = 57.17    57              12²/57 = 2.53
4    68    125 × 113/253 = 55.83    56              12²/56 = 2.57
χ²cal = 9.22
ν = (m − 1)(n − 1) = (2 − 1)(2 − 1) = 1
χ²tab = χ²(ν = 1, 0.05) = 3.84
Test statistic:
χ²cal = Σ (Oi − Ei)²/Ei = 9.22
Conclusion:
χ²cal > χ²tab
H0 is rejected
There is some association between and smoking
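The same test can be run without rounding the expected frequencies. The sketch below assumes the 2 × 2 layout implied by the Ei expressions above (row totals 140 and 113, column totals 128 and 125); the function name is illustrative.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof

# Assumed layout: rows give [83, 57] and [45, 68].
chi2, dof = chi_square_independence([[83, 57], [45, 68]])
print(round(chi2, 2), dof)
```

With unrounded expected frequencies the statistic is about 9.48 rather than 9.22. It still exceeds 3.84, so the conclusion is unchanged.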
6 Linear Statistical models
6.1 Correlation
Problem 6.1.1 Calculate the correlation coefficient for the following heights (in inches) of fathers
(X) and their sons(Y)
X 65 66 67 67 68 69 70 72
Y 67 68 65 68 72 72 69 71
Solution:
X Y X2 Y2 XY
65 67
66 68
67 65
67 68
68 72
69 72
70 69
72 71
ΣX =      ΣY =      ΣX² =      ΣY² =      ΣXY =
n = 8, ΣX = 544, ΣY = 552, ΣX² = 37028, ΣY² = 38132, ΣXY = 37560
X̄ = ΣX/n = 68, Ȳ = ΣY/n = 69
r(X, Y ) = 0.6030
Problem 6.1.2 Calculate the Karl Pearson’s correlation coefficient from the following data
X 28 45 40 38 35 33 40 32 36 33
Y 23 34 33 34 30 26 28 31 36 35
[0.5185]
Solution:
X Y X2 Y2 XY
28 23
45 34
40 33
38 34
35 30
33 26
40 28
32 31
36 36
33 35
ΣX =      ΣY =      ΣX² =      ΣY² =      ΣXY =
n =    , ΣX =    , ΣY =    , ΣX² =    , ΣY² =    , ΣXY =
X̄ = ΣX/n =    , Ȳ = ΣY/n =
r(X, Y ) = 0.5185
Problem 6.1.3 Calculate the correlation coefficient from the following data:
X 30 33 25 10 33 75 40 85 90 95 65 55
Y 68 65 80 85 70 30 55 18 15 10 35 45
[r = −.9935]
Solution:
X Y X2 Y2 XY
30 68
33 65
25 80
10 85
33 70
75 30
40 55
85 18
90 15
95 10
65 35
55 45
ΣX =      ΣY =      ΣX² =      ΣY² =      ΣXY =
n =    , ΣX =    , ΣY =    , ΣX² =    , ΣY² =    , ΣXY =
X̄ = ΣX/n =    , Ȳ = ΣY/n =
r(X, Y ) = −.9935
Problem 6.1.4 A computer, while calculating the correlation coefficient between two variables X and Y
from 25 pairs of observations, obtained the following results:
n = 25, ΣX = 125, ΣX² = 650, ΣY = 100, ΣY² = 460, ΣXY = 508 [r = 0.2065]
Solution:
\[ r(X, Y) = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - \left(\sum X\right)^2}\;\sqrt{n\sum Y^2 - \left(\sum Y\right)^2}} \]
r(X, Y ) = 0.2065
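The formula translates directly into code once the five sums are known. A minimal sketch (the function and argument names are illustrative):

```python
from math import sqrt

def pearson_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Karl Pearson's r computed from the summary sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Problem 6.1.4: the sums reported by the computer.
r_614 = pearson_from_sums(25, 125, 100, 650, 460, 508)

# Problem 6.1.1: sums for the father/son heights (ΣY² computed from the data).
r_611 = pearson_from_sums(8, 544, 552, 37028, 38132, 37560)

print(round(r_614, 4), round(r_611, 4))
```

Both values agree with the stated answers to about three decimal places.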
Problem 6.1.5 The following table gives the distribution of items of production and also the
relatively defective items among them according to size-groups. Is there any correlation between size
and defect in quantity?
[Ans: r = 0.94]
Problem 6.1.6 Calculate the rank correlation coefficient from the following data.
X 1 3 7 5 4 6 2 10 9 8
Y 3 1 4 5 6 9 7 8 10 2
[r = 0.4182]
Solution:
n = 10
di = X − Y
X Y di d2i
Σd_i² =
\[ R = 1 - \frac{6\sum d_i^2}{n^3 - n} \]
R = 0.4182
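When the data are already ranks with no ties, as in Problem 6.1.6, the coefficient follows from the rank differences alone. A short sketch (the function name is illustrative):

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's R = 1 - 6 Σd² / (n³ - n) for untied ranks.
    Returns (R, Σd²)."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n), sum_d2

R, sum_d2 = spearman_from_ranks([1, 3, 7, 5, 4, 6, 2, 10, 9, 8],
                                [3, 1, 4, 5, 6, 9, 7, 8, 10, 2])
print(sum_d2, round(R, 4))
```

Here Σd² = 96, which gives the stated value of R.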
Problem 6.1.7 The following table shows the marks obtained by 10 students in Accountancy and
Statistics. Find the Spearman’s coefficient of rank correlation.
No. 1 2 3 4 5 6 7 8 9 10
Acc 45 70 65 30 90 40 50 57 85 60
Stat 35 90 70 40 95 40 60 80 80 50
[Ans : R = 0.8606]
Solution:
n = 10
di = Rank(Acc) − Rank(Stat)
If ranks are repeated, then Spearman's rank correlation formula becomes
\[ R = 1 - \frac{6\left[\sum d_i^2 + \sum \frac{m^3 - m}{12}\right]}{n^3 - n} \]
Σd_i² = 22
\[ R = 1 - \frac{6\left[\sum d_i^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2)\right]}{n^3 - n} \]
\[ R = 1 - \frac{6\left[22 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{10^3 - 10} \]
R = 1 − 138/990 = 0.8606
Problem 6.1.8 Find the coefficient of correlation between height of father and height of son from
the following data.
Height of father 65 66 67 67 68 69 71 73
Height of son 67 68 64 68 72 70 69 70
6.2 Regression
Regression is a method for estimating the value of one variable when the value of the other is
known and the two variables are correlated. Regression analysis is a mathematical measure of the
average relationship between two or more correlated variables.
1. Line of regression of y on x is:
y − ȳ = b_yx (x − x̄)
\[ b_{yx} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2} = r\,\frac{\sigma_y}{\sigma_x} \]
2. Line of regression of x on y is:
x − x̄ = b_xy (y − ȳ)
Least Squares Straight Line For a given set of N data points (x1 , y1 ),(x2 , y2 ),. . . (xN , yN )
assume that the straight line
Y = a0 + a1 X = f (X)
\[ \sum Y_i = N a_0 + a_1 \sum X_i \]
\[ \sum X_i Y_i = a_0 \sum X_i + a_1 \sum X_i^2 \]
Problem 6.2.1 The following are the marks in Statistics (X) and Mathematics(Y) of ten students
X 56 55 58 57 56 60 54 59 57 58
Y 68 67 67 65 68 70 66 68 66 70
Calculate the coefficient of correlation and estimate marks in Mathematics of a student who scored
62 marks in Statistics.
[Ans : r = 0.44 , Y = 69.5 ]
Problem 6.2.2 It is given that the means of x and y are 5 and 10. If the line of regression of y on
x is parallel to the line 20y = 9x + 40 , estimate the value of y at x = 30
[Ans: 20y = 9x + 155 and y = 21.25]
Problem 6.2.3 Find the two lines of regression from the following data
X 65 66 67 67 68 69 70 72
Y 67 68 65 68 72 72 69 71
Problem 6.2.4 In a partially destroyed laboratory record of an analysis of correlation data, only the
following results are legible: Variance of X = 9, regression equations are: 8X − 10Y + 66 = 0 and
40X − 18Y = 214. What was
3. the S.D. of Y
X Y
Mean 30.1 47.8
standard deviation 6.2 9.5
Problem 6.2.6 Obtain the equation of the line of regression of cost on age from the following table
giving the age of a car of certain make and the annual maintenance cost.
Age of car 2 4 6 8
Maintenance 5 7 8.5 11
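The regression line of cost on age in Problem 6.2.6 can be obtained by solving the two normal equations of the least squares straight line Y = a0 + a1 X. A minimal sketch (the function name is illustrative):

```python
def fit_line(xs, ys):
    """Solve the normal equations
         ΣY  = N a0 + a1 ΣX
         ΣXY = a0 ΣX + a1 ΣX²
    for the intercept a0 and slope a1."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a0 = (sum_y - a1 * sum_x) / n
    return a0, a1

# Age of car (X) and annual maintenance cost (Y) from Problem 6.2.6.
a0, a1 = fit_line([2, 4, 6, 8], [5, 7, 8.5, 11])
print(a0, a1)
```

For these data the fitted line is cost = 3 + 0.975 × age.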
So far, we have seen the concept of simple linear regression where a single predictor variable X was
used to model the response variable Y . In many applications, there is more than one factor that
influences the response. Multiple regression models thus describe how a single response variable Y
depends linearly on a number of predictor variables.
6.3 ANOVA
6.3.1.2 Technique
In one way classification the data are classified according to only one criterion . The null hypothesis
is:
H0 : µ1 = µ2 = µ3 ... = µk
H1 : µ1 ≠ µ2 ≠ µ3 ... ≠ µk (that is, not all the means are equal)
\[ F = \frac{\text{Between-column variance}}{\text{Within-column variance}} \]
Symbolically,
\[ F = \frac{S_1^2}{S_2^2} \]
4. Compare the calculated value of F with the table value of F for the degrees of freedom at a certain
critical level.
If the calculated value of F is greater than the table value, it is concluded that the difference in
sample means is significant.
ANOVA Table
Source of variation   Sum of squares (SS)   Degrees of freedom (ν)   Mean square (MS)            Variance ratio F
Between samples       50                    3                        MSC = SSC/(c − 1) = 16.7    F = MSC/MSE = 1.285
Within samples        208                   16                       MSE = SSE/(n − c) = 13.0
Total                 258                   19
Problem 6.3.2 Three brands A, B and C of tyres were tested for durability. A sample of 4 tyres
of each brand is subjected to the same test and the number of kilometers until wear out was noted
for each brand of tyres. The data in thousand km is given in the table below. Make an analysis of
variance of the data at 5% LOS.
Problem 6.3.3 To test the significance of variation in the retail prices of a commodity in three
principal cities, Mumbai, Kolkata and Delhi, four shops were chosen at random in each city and the
prices observed in rupees were as follows:
Mumbai    16   8    12   14
Kolkata   14   10   10   6
Delhi     4    10   8    8
Do the data indicate that the prices in the three cities are significantly different?
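The one-way classification for Problem 6.3.3 can be sketched as follows: the between-samples and within-samples sums of squares are computed directly from the group means. The function name is illustrative.

```python
def one_way_anova(groups):
    """Return (SS between, SS within, F) for a one-way classification."""
    n = sum(len(g) for g in groups)           # total number of observations
    k = len(groups)                           # number of samples (columns)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return ss_between, ss_within, f

prices = [
    [16, 8, 12, 14],   # Mumbai
    [14, 10, 10, 6],   # Kolkata
    [4, 10, 8, 8],     # Delhi
]
ss_between, ss_within, f_stat = one_way_anova(prices)
print(ss_between, ss_within, round(f_stat, 3))
```

The F value is compared with the tabulated F for (2, 9) degrees of freedom at the chosen level of significance.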