
Probability & Statistics

Sachin Maheshchandra Verma

July, 2023 - Dec, 2023


Contents

1 Unit I: Basic Probability
1.1 Probability Spaces
1.1.1 Axioms of Probability
1.1.2 Problems
1.2 Conditional Probability
1.3 Independence
1.3.1 Total Probability theorem
1.4 Bayes' theorem
1.5 Problems

2 Unit II: Random Variables and Probability Distributions
2.1 Discrete random variable
2.2 Probability mass function
2.3 Cumulative distribution function
2.4 Continuous random variable
2.5 Expectation and Variance
2.6 Binomial distribution
2.7 Poisson approximation to binomial distribution
2.8 Normal distribution

3 Unit III: Bivariate Distributions
3.1 Marginal PMFs and PDFs
3.1.1 Marginal PMFs
3.1.2 Marginal PDFs
3.2 Rule for Independence
3.3 Conditional Distributions
3.3.1 Conditional PMF for Discrete Random Variables
3.3.2 Conditional PDF for Continuous Random Variables

4 Unit IV: Basic Statistics
4.1 Measures of Central tendency
4.2 Moments
4.3 Moments generating function
4.4 Skewness & Kurtosis
4.5 Mean and variance of Binomial & Poisson distribution
4.6 Moments, skewness & kurtosis for Normal distribution

5 Unit V: Testing of Hypothesis
5.1 z-test
5.1.1 One sample mean test
5.1.2 Two sample mean
5.1.3 One sample proportion
5.1.4 Two sample proportion
5.2 t-test
5.2.1 One sample
5.2.2 Two sample
5.3 F-test
5.4 χ²-test

6 Linear Statistical models
6.1 Correlation
6.2 Regression
6.2.1 Linear regression
6.2.2 Multiple regression
6.3 ANOVA
6.3.1 ANOVA: One way

1 Unit I: Basic Probability

1.1 Probability Spaces


Sample space: Let S denote the set of all possible outcomes of an experiment. S is called the
sample space of the experiment.

Event: An event is a subset of a sample space S.

Probability of A: For each event A of the sample space S we suppose that a number P(A), called
the probability of A, is defined and is such that

1. 0 ≤ P (A) ≤ 1

2. P (S) = 1

1.1.1 Axioms of Probability

1. P (S) = 1

2. P (Ā) = 1 − P (A), where Ā denotes the complement of A

3. P (φ) = 0

4. If A = A1 ∪ A2 ∪ ... ∪ An , where A1 , A2 ,...,An are mutually exclusive events, then

P (A) = P (A1 ) + P (A2 ) + ... + P (An )

Proposition 1.1.1

If A ⊆ B then P (A) ≤ P (B).

Proposition 1.1.2
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)


1.1.2 Problems

1.2 Conditional Probability

The probability of A given B is given by

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]

The probability of B given A is given by

\[ P(B \mid A) = \frac{P(A \cap B)}{P(A)} \]

1.3 Independence

A set of events is said to be independent if the occurrence of any one of them does not depend on
the occurrence or non-occurrence of the others.
When 2 events A and B are independent, it is obvious from the definition that P(B|A) = P(B).
If the events A and B are independent, the product theorem takes the form

P (A ∩ B) = P (A) × P (B)
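
The product rule can be checked by brute-force enumeration. The following is a minimal sketch, assuming two fair dice and the three events used in Problem 2 of Section 1.5 (odd first die, odd second die, odd sum); the helper name `prob` is illustrative only. It shows that events can satisfy the product rule in pairs without satisfying it for all three together.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes (d1, d2) of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on an outcome (d1, d2)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def A(o): return o[0] % 2 == 1            # odd face on the first die
def B(o): return o[1] % 2 == 1            # odd face on the second die
def C(o): return (o[0] + o[1]) % 2 == 1   # sum of the two faces is odd

p_a, p_b, p_c = prob(A), prob(B), prob(C)
p_ab = prob(lambda o: A(o) and B(o))
p_ac = prob(lambda o: A(o) and C(o))
p_bc = prob(lambda o: B(o) and C(o))
p_abc = prob(lambda o: A(o) and B(o) and C(o))

# Pairwise independence: the product rule holds for every pair.
pairwise = (p_ab == p_a * p_b) and (p_ac == p_a * p_c) and (p_bc == p_b * p_c)
# Mutual independence additionally needs the product rule for all three events.
mutual = pairwise and (p_abc == p_a * p_b * p_c)

print("pairwise independent:", pairwise)
print("mutually independent:", mutual)
```

Two odd faces always give an even sum, so P(A ∩ B ∩ C) = 0 while P(A)P(B)P(C) = 1/8.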

1.3.1 Total Probability theorem

1.3.1.1 Partition

A partition of a set A is a set {A1 , A2 ,...,An } with the following properties:

1. Ai ⊆ A, i = 1, 2, ..., n, which means that the partition is a set of subsets of A.

2. Ai ∩ Ak = φ, i = 1, 2, ..., n; k = 1, 2, ..., n; i ≠ k, which means that the subsets are mutually
(or pairwise) disjoint; that is, no two subsets have any element in common.

3. A1 ∪ A2 ∪ ... ∪ An = A, which means that the subsets are collectively exhaustive.
That is, the subsets together include all elements of the set A.

1.3.1.2 Total Probability

Let {B1 , B2 , ..., Bn } be a partition of the sample space S, and suppose each one of the events B1 ,
B2 ,... ,Bn , has nonzero probability of occurrence. Let A be any event. Then

P (A) = P (B1 )P (A|B1 ) + P (B2 )P (A|B2 ) + ... + P (Bn )P (A|Bn )
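
As a numerical illustration of the total probability theorem, here is a small sketch using the figures of the highway commissioner problem in Section 1.4 (four candidates as the partition, project approval as the event A). The function name `total_probability` is an illustrative choice, not part of the notes.

```python
def total_probability(priors, likelihoods):
    """Return P(A) = sum_i P(B_i) * P(A | B_i) for a partition {B_i}."""
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("the priors of a partition must sum to 1")
    return sum(p * l for p, l in zip(priors, likelihoods))

# P(B_i): probability that candidate i is selected.
p_selected = [0.3, 0.2, 0.4, 0.1]
# P(A | B_i): probability of approval if candidate i is selected.
p_approval_given = [0.35, 0.85, 0.45, 0.15]

p_approval = total_probability(p_selected, p_approval_given)
print(round(p_approval, 2))
```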

1.4 Bayes' theorem


If {B1 , B2 , ..., Bn } is a set of exhaustive and mutually exclusive events associated with a random
experiment and A is another event associated with (or caused by) the Bi , then

\[ P(B_i \mid A) = \frac{P(B_i)\, P(A \mid B_i)}{\sum_{j=1}^{n} P(B_j)\, P(A \mid B_j)} \tag{1.1} \]

1. There are 4 candidates for the office of the highway commissioner; the respective probabilities
that they will be selected are 0.3, 0.2, 0.4 and 0.1, and the probabilities for a project’s approval
are 0.35, 0.85, 0.45 and 0.15, depending on which of the 4 candidates is selected. What is the
probability of the project getting approved? Ans: 0.47

2. A bag contains 7 red and 3 black marbles, and another bag contains 4 red and 5 black marbles.
One marble is transferred from the first bag into the second bag and then a marble is taken
out of the second bag at random. If this marble happens to be red, find the probability that
a black marble was transferred. Ans: 12/47

3. The probability that a student passes a certain exam is 0.9, given that he studied. The
probability that he passes the exam without studying is 0.2. Assume that the probability
that the student studies for an exam is 0.75. Given that the student passed the exam, what
is the probability that he studied? Ans: 27/29
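
Problems 2 and 3 above are direct applications of equation (1.1). A minimal sketch follows, using exact fractions; `bayes_posterior` is a hypothetical helper name and the hypotheses are listed in the order given in the comments.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return the list of P(B_i | A) from P(B_i) and P(A | B_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # P(A), by the total probability theorem
    return [j / total for j in joint]

# Problem 3: hypotheses (studied, did not study), event A = "passed".
post_student = bayes_posterior(
    [Fraction(3, 4), Fraction(1, 4)],
    [Fraction(9, 10), Fraction(1, 5)],
)

# Problem 2: hypotheses (red transferred, black transferred), A = "red drawn".
# After a red transfer the second bag holds 5 red of 10 marbles,
# after a black transfer it holds 4 red of 10 marbles.
post_marble = bayes_posterior(
    [Fraction(7, 10), Fraction(3, 10)],
    [Fraction(5, 10), Fraction(4, 10)],
)

print(post_student[0])  # P(studied | passed)
print(post_marble[1])   # P(black transferred | red drawn)
```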

1.5 Problems
1. A box contains 4 bad and 6 good tubes. Two are drawn out from the box one after the other.
One of them is tested and found to be good. What is the probability that the other one is
also good? [Ans: 5/9]

2. Two fair dice are thrown independently. Three events A, B, and C are defined as follows:

a) Odd face with the first die.

b) Odd face with the second die.

c) Sum of the numbers in the 2 dice is odd.

Are the events A, B and C mutually independent?

3. From 6 positive and 8 negative numbers, 4 numbers are chosen at random (without
replacement) and multiplied. What is the probability that the product is positive? [Ans: 505/1001]

4. A lot consists of 10 good articles, 4 with minor defects and 2 with major defects. Two articles
are chosen from the lot at random (without replacement). Find the probability that

a) Both are good. [Ans: 3/8]

b) Both have major defects. [Ans: 1/120]

c) At least 1 is good. [Ans: 7/8]

d) At most 1 is good. [Ans: 5/8]

e) Exactly 1 is good. [Ans: 1/2]

f) Neither has major defects. [Ans: 91/120]

g) Neither is good. [Ans: 1/8]

5. There are 3 true coins and 1 false coin with head on both sides. A coin is chosen at random
and tossed 4 times. If head occurs all the 4 times, what is the probability that the false coin
has been chosen and used? [Ans: 16/19]

6. A bag contains 5 balls and it is not known how many of them are white. Two balls are drawn at
random from the bag and they are noted to be white. What is the chance that all the balls
in the bag are white? [Ans: 1/2]
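
The answers to Problem 4 can be verified by counting unordered pairs. This is a rough sketch assuming every pair of articles is equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

good, minor, major = 10, 4, 2
total_pairs = comb(good + minor + major, 2)   # C(16, 2) = 120 possible pairs

p_both_good = Fraction(comb(good, 2), total_pairs)
p_both_major = Fraction(comb(major, 2), total_pairs)
p_neither_good = Fraction(comb(minor + major, 2), total_pairs)
p_at_least_one_good = 1 - p_neither_good
p_at_most_one_good = 1 - p_both_good
p_exactly_one_good = Fraction(good * (minor + major), total_pairs)
p_neither_major = Fraction(comb(good + minor, 2), total_pairs)

print(p_both_good, p_both_major, p_at_least_one_good, p_at_most_one_good)
print(p_exactly_one_good, p_neither_major, p_neither_good)
```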

2 Unit II: Random Variables and Probability
Distributions

2.1 Discrete random variable

2.2 Probability mass function

2.3 Cumulative distribution function

2.4 Continuous random variable

2.5 Expectation and Variance

2.6 Binomial distribution


Refer unit II doc.pdf and classroom notes

2.7 Poisson approximation to binomial distribution


Refer unit II doc.pdf and classroom notes

2.8 Normal distribution


Practice 2.8.1 In an intelligence test administered to 1000 students, the average was 42 and the
standard deviation was 24. Find the number of students

1. Exceeding the score 50 2. Between 30 and 54

[Ans: 371, 383]

Practice 2.8.2 If X is a normal random variable with parameters µ = 3 and σ² = 9, find

1. P (2 < X < 5) 2. P (X > 0) 3. P (|X − 3| > 6)

Practice 2.8.3 A sample of 100 dry battery cells tested to find the length of life produced the
following results: Mean=12 hrs. SD=3 hrs. Assuming the data to be normally distributed, what
percentage of battery cells are expected to have life

More than 15 hrs. Less than 6 hrs. Between 10 & 14 hrs.

[Ans: 15.87%,2.28%,49.74%]

Practice 2.8.4 The incomes of a group of 10000 persons were found to be normally distributed with
mean Rs 520 and S.D. Rs 60. Find

1. The number of persons having incomes between Rs 400 and Rs 550.

2. The lowest income of the richest 500.

[Ans: 6687, Rs 618.40]

Practice 2.8.5 For a normal variate X with mean 25 and standard deviation 10, find the area
between

1. X = 25, X = 35 2. X = 15, X = 35 3. X ≥ 15 4. X ≥ 35

Practice 2.8.6 If the height of 500 students is normally distributed with mean 68 inches and SD
4 inches, estimate the number of students having heights

1. greater than 72 inches 2. less than 62 inches 3. between 65 and 71 inches.

Practice 2.8.7 For a normally distributed variate X with mean 1 and s.d 3, find

1. P (3.43 ≤ X ≤ 6.19) 2. P (−1.43 ≤ X ≤ 2.3)

[Ans: 0.1672, 0.4574]
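
The practice problems above all reduce to evaluating the normal CDF after standardising. A minimal sketch using only the standard library is given below; `normal_cdf` and `normal_interval` are illustrative helper names. Because the answers in these notes were read from a z-table with z rounded to two decimals, values computed this way may differ slightly from them.

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def normal_interval(a, b, mu, sigma):
    """P(a <= X <= b) for X ~ N(mu, sigma^2)."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# Practice 2.8.3: battery life with mean 12 hrs and SD 3 hrs.
p_more_than_15 = 1.0 - normal_cdf(15, 12, 3)
p_less_than_6 = normal_cdf(6, 12, 3)

# Practice 2.8.7, part 1: mean 1, s.d. 3.
p_287 = normal_interval(3.43, 6.19, 1, 3)

print(round(100 * p_more_than_15, 2), "%")
print(round(100 * p_less_than_6, 2), "%")
print(round(p_287, 4))
```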

3 Unit III: Bivariate Distributions

Consider two random variables X and Y defined on the same sample space. For example, X can
denote the grade of a student and Y can denote the height of the same student. The joint cumulative
distribution function (joint CDF) of X and Y is given by

FXY (x, y) = P [X ≤ x, Y ≤ y]

The pair (X, Y ) is referred to as a bivariate random variable. If we define FX (x) = P [X ≤ x] as the
marginal CDF of X and FY (y) = P [Y ≤ y] as the marginal CDF of Y, then we define the random
variables X and Y to be independent if

FXY (x, y) = FX (x)FY (y)

for every value of x and y.

3.1 Marginal PMFs and PDFs

3.1.1 Marginal PMFs

The marginal PMF of X:

\[ p_X(x) = \sum_{y} p_{XY}(x, y) = P[X = x] \]

The marginal PMF of Y:

\[ p_Y(y) = \sum_{x} p_{XY}(x, y) = P[Y = y] \]

3.1.2 Marginal PDFs

The marginal PDF of X:

\[ f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy \]

The marginal PDF of Y:

\[ f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx \]

3.2 Rule for Independence


If X and Y are independent discrete random variables, then for all x and y

pXY (x, y) = pX (x)pY (y)

If X and Y are independent continuous random variables, then for all x and y

fXY (x, y) = fX (x)fY (y)
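
For a discrete pair, the marginal sums and the independence rule are easy to check mechanically. The sketch below assumes the joint PMF p(x, y) = (2x + y)/18 on x, y ∈ {1, 2}, which is the PMF stated in Problem 3.3.10; the function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

# Joint PMF p(x, y) = (2x + y) / 18 for x, y in {1, 2}.
joint = {(x, y): Fraction(2 * x + y, 18) for x in (1, 2) for y in (1, 2)}

def marginals(joint_pmf):
    """Sum the joint PMF over the other variable to get p_X and p_Y."""
    p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint_pmf.items():
        p_x[x] += p
        p_y[y] += p
    return dict(p_x), dict(p_y)

def is_independent(joint_pmf):
    """True when p(x, y) = p_X(x) * p_Y(y) for every pair (x, y)."""
    p_x, p_y = marginals(joint_pmf)
    return all(joint_pmf.get((x, y), 0) == p_x[x] * p_y[y]
               for x in p_x for y in p_y)

p_x, p_y = marginals(joint)
print(p_x, p_y)
print("independent:", is_independent(joint))
```

Here p(1, 1) = 1/6 while pX(1)pY(1) = (7/18)(4/9) = 14/81, so the product rule fails for this PMF.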

3.3 Conditional Distributions


Recall that for two events A and B, the conditional probability of event A given event B is defined
by

\[ P[A \mid B] = \frac{P[A \cap B]}{P[B]} \]

which is defined when P[B] > 0. In this section we extend the same concept to two random variables
X and Y governed by a joint CDF FXY (x, y).

3.3.1 Conditional PMF for Discrete Random Variables

Consider two discrete random variables X and Y with the joint PMF pXY (x, y). The conditional
PMF of Y, given X = x, is given by

\[ p_{Y|X}(y \mid x) = \frac{P[X = x, Y = y]}{P[X = x]} = \frac{p_{XY}(x, y)}{p_X(x)} \]

provided pX (x) > 0. The conditional PMF of X, given Y = y, is given by

\[ p_{X|Y}(x \mid y) = \frac{P[X = x, Y = y]}{P[Y = y]} = \frac{p_{XY}(x, y)}{p_Y(y)} \]

provided pY (y) > 0.
If X and Y are independent random variables, then

\[ p_{X|Y}(x \mid y) = p_X(x), \qquad p_{Y|X}(y \mid x) = p_Y(y) \]
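
A short sketch of the conditional PMF computation, assuming the joint PMF of Problem 3.3.11 (c(2x + y) on x = 0, 1, 2 and y = 0, 1, 2, 3). The helper `conditional_y_given_x` is a hypothetical name; it divides the joint PMF by the marginal of X, exactly as in the definition above.

```python
from fractions import Fraction

xs, ys = range(3), range(4)
weights = {(x, y): 2 * x + y for x in xs for y in ys}
c = Fraction(1, sum(weights.values()))        # normalising constant
joint = {xy: c * w for xy, w in weights.items()}

def conditional_y_given_x(joint_pmf, x):
    """Return {y: p_{Y|X}(y | x)}; requires p_X(x) > 0."""
    p_x = sum(p for (xi, _), p in joint_pmf.items() if xi == x)
    if p_x == 0:
        raise ValueError("conditional PMF undefined when p_X(x) = 0")
    return {y: p / p_x for (xi, y), p in joint_pmf.items() if xi == x}

cond_given_2 = conditional_y_given_x(joint, 2)
p_event = sum(p for (x, y), p in joint.items() if x >= 1 and y <= 2)

print("c =", c)
print("p(y | x = 2) =", cond_given_2)
print("P(X >= 1, Y <= 2) =", p_event)
```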

3.3.2 Conditional PDF for Continuous Random Variables


Problem 3.3.1 The joint probability function of two discrete random variables X and Y is given
by

\[ p(x, y) = \begin{cases} c(2x + y), & 0 \le x \le 2,\ 0 \le y \le 3 \\ 0, & \text{otherwise} \end{cases} \]

Find

1. the value of the constant c

2. P (X ≥ 1, Y ≤ 2)

3. P (X = 2, Y = 1)

4. Marginal probability function of X and Y.

Problem 3.3.2 The joint density function of two continuous random variables X and Y is

\[ f(x, y) = \begin{cases} cxy, & 0 < x < 4,\ 1 < y < 5 \\ 0, & \text{otherwise} \end{cases} \]

Find

1. The value of the constant c

2. P (1 < X < 2, 2 < Y < 3)

3. P (X ≥ 3, Y ≤ 2)

4. Marginal probability function of X and Y.

Problem 3.3.3 The joint probability distribution of two random variables X and Y is given by:
P (X = 0, Y = 1) = 1/3, P (X = 1, Y = −1) = 1/3, P (X = 1, Y = 1) = 1/3.
Find

1. Marginal distribution of X and Y

2. the conditional probability distribution of X given Y = 1


Problem 3.3.4 The joint probability density function of a two dimensional random variable (X,
Y) is given by:

\[ f(x, y) = \begin{cases} 2, & 0 < x < 1,\ 0 < y < x \\ 0, & \text{otherwise} \end{cases} \]

1. Find the marginal density function of X and Y.

2. Find the conditional density function of Y given X = x and conditional density function of X
given Y = y.

3. Check for independence of X and Y

Problem 3.3.5 The joint distribution of X and Y is given by

\[ f(x, y) = \begin{cases} 4xy\, e^{-(x^2 + y^2)}, & x \ge 0,\ y \ge 0 \\ 0, & \text{otherwise} \end{cases} \]

Test whether X and Y are independent. For the above joint distribution, find the conditional density
of X given Y = y.

Ans: They are independent, and

\[ f_{X|Y}(x \mid y) = 2x\, e^{-x^2}, \quad x \ge 0 \]

Problem 3.3.6 The joint density function of the random variables X and Y is given by:

\[ f(x, y) = \begin{cases} 8xy, & 0 < x \le 1,\ 0 < y \le x \\ 0, & \text{elsewhere} \end{cases} \]

Find

1. the marginal density of X

Ans: $f_X(x) = 4x^3$, 0 < x ≤ 1

2. the marginal density of Y

Ans: $f_Y(y) = 4y(1 - y^2)$

3. the conditional density of X

\[ f_X(x \mid y) = \begin{cases} \dfrac{2x}{1 - y^2}, & y \le x \le 1 \\ 0, & \text{otherwise} \end{cases} \]

4. the conditional density of Y.

\[ f_Y(y \mid x) = \begin{cases} \dfrac{2y}{x^2}, & 0 \le y \le x \\ 0, & \text{otherwise} \end{cases} \]

Problem 3.3.7 The joint PMF of two random variables X and Y is given by

\[ p_{XY}(x, y) = \begin{cases} k(2x + y), & x = 1, 2;\ y = 1, 2 \\ 0, & \text{otherwise} \end{cases} \]

1. What is the value of k?

2. Find the marginal PMFs of X and Y.

3. Are X and Y independent?


Problem 3.3.8 A fair coin is tossed three times. Let X be a random variable that takes the value
0 if the first toss is a tail and the value 1 if the first toss is a head. Also, let Y be a random variable
that defines the total number of heads in the three tosses.

1. Determine the joint PMF of X and Y. 2. Are X and Y independent?

Solution Let H denote the event that a head appears on a toss and T the event that a tail appears
on a toss. The sample space consists of the eight equally likely sequences HHH, HHT, HTH, HTT,
THH, THT, TTH and TTT, and each sequence determines the values of the two random variables.
(a) Since X takes values 0 and 1, and Y takes values 0, 1, 2, and 3, the joint PMF of X and Y is
obtained by counting, for each pair (x, y), the sequences that give X = x and Y = y and dividing
by 8.
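
The counting argument for Problem 3.3.8 can be sketched in a few lines. This assumes a fair coin, so that all eight sequences have probability 1/8; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

counts = Counter()
for seq in product("HT", repeat=3):      # the 8 equally likely sequences
    x = 1 if seq[0] == "H" else 0        # X: indicator of a head on toss 1
    y = seq.count("H")                   # Y: total number of heads
    counts[(x, y)] += 1

joint = {xy: Fraction(n, 8) for xy, n in counts.items()}

# Marginals, obtained by summing the joint PMF over the other variable.
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in range(4)}

# Independence requires p(x, y) = p_X(x) p_Y(y) for every pair (x, y).
independent = all(joint.get((x, y), 0) == p_x[x] * p_y[y]
                  for x in p_x for y in p_y)

for xy in sorted(joint):
    print(xy, joint[xy])
print("independent:", independent)
```

For instance p(0, 0) = 1/8, whereas pX(0)pY(0) = (1/2)(1/8) = 1/16, so X and Y are not independent.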

Problem 3.3.9 X and Y are two continuous random variables whose joint PDF is given by

\[ f_{XY}(x, y) = \begin{cases} e^{-(x + y)}, & 0 \le x < \infty,\ 0 \le y < \infty \\ 0, & \text{otherwise} \end{cases} \]

1. Find the marginal PDFs of X and Y. 2. Are X and Y independent?

Problem 3.3.10 The joint PMF of two random variables X and Y is given by

\[ p_{XY}(x, y) = \begin{cases} \dfrac{1}{18}(2x + y), & x = 1, 2;\ y = 1, 2 \\ 0, & \text{otherwise} \end{cases} \]

1. What is the conditional PMF of Y given X?

2. What is the conditional PMF of X given Y?


Problem 3.3.11 The joint probability function of two discrete RVs X and Y is given by

\[ p_{XY}(x, y) = \begin{cases} c(2x + y), & x = 0, 1, 2;\ y = 0, 1, 2, 3 \\ 0, & \text{otherwise} \end{cases} \]

1. Find c. Ans: c = 1/42

2. P (X ≥ 1, Y ≤ 2). Ans: 4/7

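Problem 3.3.12 The joint PDF of two continuous random variables X and Y is given by

\[ f_{XY}(x, y) = \begin{cases} k(6 - x - y), & 0 < x < 2,\ 2 < y < 4 \\ 0, & \text{otherwise} \end{cases} \]

1. Find k. Ans: k = 1/8

2. P (X < 1, Y < 3), P (X < 1|Y < 3). Ans: P (X < 1, Y < 3) = 3/8, P (X < 1|Y < 3) = 3/5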
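
The answers to Problem 3.3.12 can be checked numerically. The sketch below uses a plain midpoint rule on a grid (an assumption made here for simplicity, not the method of the notes; the integrals are normally done by hand). The helper `integrate2d` is illustrative.

```python
def integrate2d(f, x_lo, x_hi, y_lo, y_hi, n=200):
    """Midpoint-rule approximation of the double integral of f over a rectangle."""
    hx = (x_hi - x_lo) / n
    hy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        for j in range(n):
            y = y_lo + (j + 0.5) * hy
            total += f(x, y)
    return total * hx * hy

def g(x, y):
    """Unnormalised density 6 - x - y on 0 < x < 2, 2 < y < 4."""
    return 6.0 - x - y

k = 1.0 / integrate2d(g, 0, 2, 2, 4)             # normalising constant
p_joint = k * integrate2d(g, 0, 1, 2, 3)         # P(X < 1, Y < 3)
p_y_lt_3 = k * integrate2d(g, 0, 2, 2, 3)        # P(Y < 3)
p_cond = p_joint / p_y_lt_3                      # P(X < 1 | Y < 3)

print(round(k, 4), round(p_joint, 4), round(p_cond, 4))
```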

4 Unit IV: Basic Statistics

4.1 Measures of Central tendency

4.2 Moments

4.3 Moments generating function


Problem 4.3.1 The random variable X can assume the values 1 and −1 with probability 1/2
each.
Find

1. the moment generating function and

2. the first four moments about origin

Ans: $M_X(t) = \frac{1}{2}(e^{t} + e^{-t})$, $\mu = \mu'_1 = 0$, $\mu'_2 = 1$, $\mu'_3 = 0$, $\mu'_4 = 1$

Problem 4.3.2 A random variable X has the density function given by

\[ f(x) = \begin{cases} 2e^{-2x}, & x \ge 0 \\ 0, & x < 0 \end{cases} \]

Find (a) the moment generating function and (b) the first four moments about origin.

Ans: $M_X(t) = \frac{2}{2 - t}$ assuming t < 2, $\mu = \mu'_1 = \frac{1}{2}$, $\mu'_2 = \frac{1}{2}$, $\mu'_3 = \frac{3}{4}$, $\mu'_4 = \frac{3}{2}$

Problem 4.3.3 Find the first four moments (a) about the origin, (b) about the mean, for a
random variable X having density function

\[ f(x) = \begin{cases} \dfrac{4x(9 - x^2)}{81}, & 0 \le x \le 3 \\ 0, & \text{otherwise} \end{cases} \]

Ans (a): $\mu = \mu'_1 = \frac{8}{5}$, $\mu'_2 = 3$, $\mu'_3 = \frac{216}{35}$, $\mu'_4 = \frac{27}{2}$

Ans (b): $\mu_1 = 0$, $\mu_2 = \frac{11}{25}$, $\mu_3 = -\frac{32}{875}$, $\mu_4 = \frac{3693}{8750}$

Problem 4.3.4 (a) Find the moment generating function of a random variable X having density
function

\[ f(x) = \begin{cases} \dfrac{x}{2}, & 0 \le x \le 2 \\ 0, & \text{otherwise} \end{cases} \]

(b) Use the generating function of (a) to find the first four moments about the origin.

Ans: $M_X(t) = \dfrac{1 + 2te^{2t} - e^{2t}}{2t^2}$

Ans: $\mu = \mu'_1 = \frac{4}{3}$, $\mu'_2 = 2$, $\mu'_3 = \frac{16}{5}$, $\mu'_4 = \frac{16}{3}$

Problem 4.3.5 If X denotes the outcome when a fair die is tossed, Find the moment generating
function of X and hence find the mean and variance of X.

Ans: $M_X(t) = \frac{1}{6}\left(e^{t} + e^{2t} + e^{3t} + e^{4t} + e^{5t} + e^{6t}\right)$, Mean = 7/2, Variance = 35/12
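
The r-th derivative of the MGF at t = 0 equals the r-th moment about the origin, E(X^r). For a discrete variable these moments can also be computed directly from the PMF, which gives a quick check of the answers to Problems 4.3.1 and 4.3.5. A minimal sketch follows; `raw_moment` is an illustrative helper name.

```python
from fractions import Fraction

def raw_moment(pmf, r):
    """r-th moment about the origin, E(X^r), of a discrete PMF {x: p}."""
    return sum(p * x ** r for x, p in pmf.items())

# Problem 4.3.5: outcome of a fair die.
die = {x: Fraction(1, 6) for x in range(1, 7)}
mean = raw_moment(die, 1)
variance = raw_moment(die, 2) - mean ** 2

# Problem 4.3.1: X = +1 or -1 with probability 1/2 each.
coin = {1: Fraction(1, 2), -1: Fraction(1, 2)}
coin_moments = [raw_moment(coin, r) for r in range(1, 5)]

print(mean, variance)
print(coin_moments)
```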

Problem 4.3.6 A r.v. X takes values 0 and 1 with probabilities q and p respectively with q+p=1.
Find the mgf of X and show that all the moments about the origin equal p

Ans: $M_X(t) = q + pe^{t}$

4.4 Skewness & Kurtosis


Problem 4.4.1 The first four moments of a distribution about the value 5 of the random variable
X are 2, 20, 40 and 50. Compute a measure, each of central tendency, dispersion, skewness and
kurtosis. Comment on the skewness and kurtosis of the distribution.
Ans: mean = 7, SD = 4, skewness (µ3) = −64, kurtosis (µ4) = 162

Solution:
The first four moments of the distribution about the value 5 of the random variable X are
E(X − 5) = 2, E(X − 5)² = 20, E(X − 5)³ = 40 and E(X − 5)⁴ = 50.
E(X − 5) = 2 ⟹ E(X) − E(5) = 2
E(X) − 5 = 2 ⟹ E(X) = 7

E(X − 5)² = 20
E(X² − 2·X·5 + 5²) = 20
E(X² − 10X + 25) = 20
E(X²) − 10E(X) + 25 = 20
E(X²) − 10(7) + 25 = 20
E(X²) − 45 = 20
E(X²) = 65
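
The remaining steps follow the same pattern, and the whole conversion can be sketched with the standard relations between moments about an arbitrary point a and central moments. The function below is an illustrative helper, not part of the notes; it takes m_r = E[(X − a)^r] and returns the mean and µ2, µ3, µ4.

```python
def central_from_shifted(m1, m2, m3, m4, a):
    """Convert moments about the point a, m_r = E[(X - a)^r], to central moments."""
    mean = a + m1
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mean, mu2, mu3, mu4

# Problem 4.4.1: moments about 5 are 2, 20, 40 and 50.
mean, mu2, mu3, mu4 = central_from_shifted(2, 20, 40, 50, a=5)
sd = mu2 ** 0.5
beta1 = mu3 ** 2 / mu2 ** 3      # coefficient of skewness (beta_1)
beta2 = mu4 / mu2 ** 2           # coefficient of kurtosis (beta_2)

print(mean, sd, mu3, mu4, beta1, beta2)

# Problem 4.4.3: moments about 4 are 1, 4, 10 and 45.
result_443 = central_from_shifted(1, 4, 10, 45, a=4)
print(result_443)
```

For Problem 4.4.1 this gives µ3 = −64 (negatively skewed) and β2 = 162/256 < 3 (platykurtic).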


Problem 4.4.2 The first three moments of a distribution about the value 2 of the random variable
X are 1, 16 and - 40. Find the coefficient of skewness.

Ans: -1.480

Problem 4.4.3 The first four moments of a distribution about X = 4 are 1, 4, 10, 45 respectively.
Find the mean, variance, coefficient of skewness and coefficient of Kurtosis.

Ans: 5; 3; 0; 26/9

Problem 4.4.4 The distribution of a random variable X has mean 10, variance 16, coefficient of
skewness 1 and coefficient of kurtosis 4. Obtain the first four moments of X about origin.

Ans: 10, 116, 1544, 23184

Problem 4.4.5 Compute coefficient of skewness and coefficient of Kurtosis for the following
distribution
X = x   0      1      2      3      4      5      6      7      8
p       0.004  0.036  0.1    0.232  0.280  0.240  0.112  0.028  0.004

4.5 Mean and variance of Binomial & Poisson distribution


Problem 4.5.1 Fit a binomial distribution for the following data:
x 0 1 2 3 4 5 6
f 5 18 28 12 7 6 4

Ans: 0.288,5.58
Fitted (expected) frequencies:
x 0 1 2 3 4 5 6
F 4 15 25 22 11 3 0
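
Fitting a binomial distribution means estimating p from the sample mean (mean = np) and then computing the expected frequencies N · P(X = x). The sketch below applies this to the data of Problem 4.5.1 with n = 6; the function name `fit_binomial` is illustrative and the rounding to whole frequencies is an assumption about how the fitted row was produced.

```python
from math import comb

def fit_binomial(xs, freqs, n):
    """Estimate p by the method of moments and return expected frequencies."""
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(xs, freqs)) / total
    p = mean / n                                  # since mean = n * p
    expected = [total * comb(n, x) * p ** x * (1 - p) ** (n - x) for x in xs]
    return p, expected

p_hat, expected = fit_binomial(range(7), [5, 18, 28, 12, 7, 6, 4], n=6)
fitted = [round(e) for e in expected]

print("estimated p:", round(p_hat, 3))
print("fitted frequencies:", fitted)
```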

Problem 4.5.2 With usual notation find p of Binomial distribution if n=6,

9P (X = 4) = P (X = 2)

Also find mean, variance, skewness and kurtosis

Ans: [1/4, 3/2, 9/8, 9/16, 45/32]


Problem 4.5.3 Find the mean and standard deviation, skewness and kurtosis of the following
probability distribution.
X=x 0 1 2 3 4 5 6 7 8
p 0.004 0.036 0.1 0.232 0.280 0.240 0.112 0.028 0.004

[Ans. mean=3.972, SD=1.410]

Problem 4.5.4 Fit a Poisson distribution for the following data. Also find mean, variance,
skewness and kurtosis.
x 0 1 2 3 4
F 123 59 14 3 1

Ans: 0.5,0.5,0.5,1.25
Fitted (expected) frequencies:
x 0 1 2 3 4
F 121 61 15 3 1

Problem 4.5.5 Fit a Poisson distribution for the following data:


x 0 1 2 3 4 5 Total
f 142 156 69 27 5 1 400
Also find mean, variance, skewness and kurtosis

Problem 4.5.6 If a random variable X follows Poisson distribution such that

P (X = 1) = 2P (X = 2),

find the mean, variance, skewness and kurtosis of the distribution. Also find P(X=3).

Ans: 1, 1, 1, 4, 0.0613

Problem 4.5.7 Find out the fallacy, if any, of the statement: “If X is a Poisson variate such that

P (X = 2) = 9P (X = 4) + 90P (X = 6)

then the mean of X is 1.” Also find mean, variance, coefficient of skewness and kurtosis.

Ans: The statement is correct; 1, 1, 1, 4

4.6 Moments, skewness & kurtosis for Normal distribution

5 Unit V: Testing of Hypothesis

5.1 z-test

5.1.1 One sample mean test

• If the population standard deviation σ is known, the test statistic is

\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]

where σ is the standard deviation of the population.

• If σ is not known, we use

\[ z = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]

where s is the standard deviation of the sample.

In both cases, x̄ is the sample mean, µ is the population mean and n is the sample size.


Problem 5.1.1 A sample of 100 students is taken from a large population. The
mean height of the students in this sample is 160 cm. Can it be reasonably regarded
that, in the population, the mean height is 165cm, and the SD is 10cm?

Ans: zcal = −5, H0 is rejected

Problem 5.1.2 The mean breaking strength of the cables supplied by a manufacturer
is 1800, with an SD of 100. By a new technique in the manufacturing process,
it is claimed that the breaking strength of the cable has increased. To test this claim,
a sample of 50 cables is tested and it is found that the mean breaking strength is
1850. Can we support the claim at 5% LOS?

Ans: zcal = 3.54, H0 is rejected

Problem 5.1.3 A random sample of 50 items gives the mean 6.2 and standard
deviation 10.24. Can it be regarded as drawn from a normal population with mean
5.4 at 5% level of significance?

Ans: zcal = 1.74, H0 is accepted

Problem 5.1.4 The mean height of a random sample of 100 individuals from a
population is 160. The standard deviation of the sample is 10. Would it be
reasonable to suppose that the mean height of the population is 165?

Ans: zcal = −5, No

Problem 5.1.5 The mean value of a random sample of 60 items was found to
be 145, with a standard deviation of 40. Find the 95% confidence limits for the
population mean. What size of the sample is required to estimate the population
mean within 5 of its actual value with 95% or more confidence, using the sample
mean?

Ans: 134.9 ≤ µ ≤ 155.1, least size of the sample is 246.
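
The one-sample statistic, the confidence limits and the sample-size calculation above can be sketched as follows. The function names are illustrative, and the critical value 1.96 (two-tailed, 95%) is passed in as an assumption rather than looked up.

```python
from math import ceil, sqrt

def z_one_sample(xbar, mu0, sd, n):
    """z = (xbar - mu0) / (sd / sqrt(n))."""
    return (xbar - mu0) / (sd / sqrt(n))

def mean_confidence_interval(xbar, sd, n, z_crit=1.96):
    """Confidence limits xbar -/+ z_crit * sd / sqrt(n)."""
    half_width = z_crit * sd / sqrt(n)
    return xbar - half_width, xbar + half_width

def sample_size_for_margin(sd, margin, z_crit=1.96):
    """Smallest n with z_crit * sd / sqrt(n) <= margin."""
    return ceil((z_crit * sd / margin) ** 2)

# Problem 5.1.2: cables, mu0 = 1800, sigma = 100, n = 50, xbar = 1850.
z_cables = z_one_sample(1850, 1800, 100, 50)

# Problem 5.1.5: n = 60, xbar = 145, s = 40.
low, high = mean_confidence_interval(145, 40, 60)
n_needed = sample_size_for_margin(40, 5)

print(round(z_cables, 2))
print(round(low, 1), round(high, 1), n_needed)
```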


5.1.2 two sample mean

Problem 5.1.6 A simple sample of heights of 6400 English men has a mean of
170 cm and an SD of 6.4 cm, while a simple sample of heights of 1600 Americans
has a mean of 172 cm and an SD of 6.3 cm. Do the data indicate that Americans
are, on the average, taller than the Englishmen?

Ans: zcal = −11.32, H0 is rejected

Problem 5.1.7 Test the significance of the difference between the means of the
samples, drawn from two normal populations with the same SD using the following
data:
size mean SD
sample 1 100 61 4
sample 2 200 63 6

Ans: zcal = −3.02, H0 is rejected

Problem 5.1.8 The average marks scored by 32 boys are 72 with SD of 8, while
that for 36 girls is 70 with SD of 6. Test at 5% LOS whether the boys perform
better than girls.

Ans: zcal = 1.15, H0 is accepted (1.15 < 1.645, the one-tailed critical value at 5% LOS)

Problem 5.1.9 An IQ test was given to a large group of boys in the age group
of 18-20 years, who scored an average of 62.5 marks. The same test was given to
a fresh group of 100 boys of the same age group. They scored an average of 64.5
marks with an SD of 12.5 marks. Can we conclude that the fresh group of boys has
better IQ?

Ans: zcal = 1.6, YES


Problem 5.1.10 The means of two simple samples of 1000 and 2000 items are
170 and 169 cm respectively. Can the samples be regarded as drawn from the same
population with SD 10 at 5% LOS?

Ans: zcal = 2.58, NO.

Problem 5.1.11 Intelligence tests were given to two groups of boys and girls of
the same age group chosen from the same college and the following results were
obtained.
size mean SD
Boys 100 73 10
Girls 60 75 8
Examine whether the difference between the means is significant or not

Ans: zcal = 1.32, NO

Problem 5.1.12 In a college, 60 junior students are found to have a mean height
of 171.5 cm and 50 senior students are found to have a mean height of 173.8 cm.
Can we conclude, based on this data that the juniors are shorter than seniors at (i)
5% LOS and (ii) 1% LOS, assuming that the SD of the students of that college is
6.2 cm?

Ans: zcal = 1.937, Yes at 5 % and NO at 1 %.
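
The problems in this subsection use one of two standard large-sample statistics, which the notes do not write out. The sketch below states them as assumptions: separate sample SDs, z = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2), and a common known SD, z = (x̄1 − x̄2)/(σ√(1/n1 + 1/n2)). Function names are illustrative.

```python
from math import sqrt

def z_two_sample(x1, s1, n1, x2, s2, n2):
    """z for the difference of two means using the two sample SDs."""
    return (x1 - x2) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def z_two_sample_common_sd(x1, n1, x2, n2, sd):
    """z for the difference of two means when both populations share a known SD."""
    return (x1 - x2) / (sd * sqrt(1 / n1 + 1 / n2))

# Problem 5.1.6: Englishmen (170, 6.4, 6400) vs Americans (172, 6.3, 1600).
z_heights = z_two_sample(170, 6.4, 6400, 172, 6.3, 1600)

# Problem 5.1.10: means 170 and 169, sizes 1000 and 2000, common SD 10.
z_same_pop = z_two_sample_common_sd(170, 1000, 169, 2000, 10)

# Problem 5.1.12: juniors 171.5 (n = 60), seniors 173.8 (n = 50), SD 6.2.
z_juniors = z_two_sample_common_sd(171.5, 60, 173.8, 50, 6.2)

print(round(z_heights, 2), round(z_same_pop, 2), round(z_juniors, 3))
```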

5.1.3 One sample proportion

Problem 5.1.13 Experience has shown that 20% of a manufactured product is of top
quality. In one day's production of 400 articles, only 50 are of top quality. Show
that either the production of the day chosen was not a representative sample or the
hypothesis of 20% was wrong.

Ans: zcal = 3.75, H0 is rejected

Problem 5.1.14 A salesman in a departmental store claims that at most 60 percent
of the shoppers entering the store leave without making a purchase. A random sample
of 50 shoppers showed that 35 of them left without making a purchase. Are these
sample results consistent with the claim of the salesman? Use an LOS of 0.05.

Ans: zcal = 1.443, H0 is accepted

Problem 5.1.15 In a particular area there are 984 girls for every 1000 boys. Can we
say that gender equality exists?

Ans: zcal = −0.445, H0 is accepted

Problem 5.1.16 Certain crosses of pea gave 5321 yellow and 1804 green seeds.
The expectation is 25% green seeds based on a certain theory. Is this divergence
significant or due to sampling fluctuation?
[Ans: the divergence is due to sampling fluctuation]

Problem 5.1.17 A company supplies a lot of 1000 items, of which 3% are stated to be
defective. A person picks a sample of 50 and finds 2 defective items. Can we say that
the proportion of defectives has increased, so that the entire lot should be rejected?

Ans: zcal = 0.4145, H0 is accepted
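
The one-sample proportion problems use the statistic z = (p̂ − p0)/√(p0(1 − p0)/n), which is not written out in these notes and is stated here as the standard large-sample form. A minimal sketch, with an illustrative function name:

```python
from math import sqrt

def z_proportion(successes, n, p0):
    """z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n) for a single proportion."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Problem 5.1.14: 35 of 50 shoppers left without buying, claimed p0 = 0.60.
z_shoppers = z_proportion(35, 50, 0.60)

# Problem 5.1.17: 2 defectives in a sample of 50, stated p0 = 0.03.
z_defects = z_proportion(2, 50, 0.03)

print(round(z_shoppers, 3), round(z_defects, 4))
```

Both values lie below the one-tailed 5% critical value 1.645, which agrees with the stated conclusions.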

5.1.4 Two sample proportion

Problem 5.1.18 In a large city A, 20% of a random sample of 900 school boys had
a slight physical defect. In another large city B, 18.5 percent of a random sample
of 1600 school boys had the same defect. Is the difference between the proportions
significant?

Ans: zcal = 0.92, H0 is accepted

Problem 5.1.19 Before an increase in excise duty on tea, 800 people out of a
sample of 1000 were consumers of tea. After the increase in duty, 800 people were
consumers of tea in a sample of 1200 persons. Find whether there is significant
decrease in the consumption of tea after the increase in duty.

Ans: zcal = 6.82, H0 is rejected

Problem 5.1.20 A Government has started giving tax incentives for installing
certain energy saving devices. However, not the entire population is aware of this
offer. Out of 400 persons, selected randomly, 140 were ‘unaware’ of the offer.
Construct an interval estimate for the ‘proportion aware’ of the offer with 95%
confidence level.

Ans: [0.6032, 0.6967]

Problem 5.1.21 During a countrywide investigation, the incidence of TB was
found to be 1%. In a college with 400 students, 5 are reported to be affected whereas
in another college of 1200 students, 10 are found to be affected. Does this indicate
any significant difference?

Ans: zcal = 0.725, not significant.
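
For two proportions, a common choice (assumed here, since the notes do not write the formula) is the pooled-proportion statistic z = (p̂1 − p̂2)/√(p̂(1 − p̂)(1/n1 + 1/n2)) with p̂ = (x1 + x2)/(n1 + n2). The sketch also includes the interval estimate used in Problem 5.1.20; function names are illustrative.

```python
from math import sqrt

def z_two_proportions(x1, n1, x2, n2):
    """Pooled two-sample z statistic for the difference of two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def proportion_confidence_interval(x, n, z_crit=1.96):
    """Limits p_hat -/+ z_crit * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half_width = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Problem 5.1.18: 20% of 900 (180 boys) vs 18.5% of 1600 (296 boys).
z_cities = z_two_proportions(180, 900, 296, 1600)

# Problem 5.1.20: 260 of 400 persons were aware of the offer.
ci_aware = proportion_confidence_interval(260, 400)

print(round(z_cities, 2))
print(tuple(round(v, 4) for v in ci_aware))
```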

5.2 t-test


5.2.1 one sample

Problem 5.2.1 Tests made on the breaking strength of 10 pieces of a metal gave
the following results: 578, 572, 570, 568, 572, 570, 570, 572, 596 and 584 kg. Test
if the mean breaking strength of the wire can be assumed as 577 kg

Ans: tcal = −0.65, H0 is accepted

Problem 5.2.2 A machinist is expected to make engine parts with axle diameter
of 1.75 cm. A random sample of 10 parts shows a mean diameter 1.85 cm with
an SD of 0.1 cm. On the basis of this sample, would you say that the work of the
machinist is inferior?

Ans: tcal = 3, H0 is rejected

5.2.2 Two sample

Problem 5.2.3 Two independent samples of sizes 8 and 7 contained the following
values:
sample 1 19 17 15 21 16 18 16 14
sample 2 15 14 15 19 15 18 16
Is the difference between the sample means significant?

Ans: tcal = 0.93, H0 is accepted

Solution:
x1: 19 17 15 21 16 18 16 14
x2: 15 14 15 19 15 18 16

n1 = 8, n2 = 7, x̄1 = 17, x̄2 = 16,

\[ s_1^2 = \frac{\sum x_1^2}{n_1} - \left(\frac{\sum x_1}{n_1}\right)^2 = 4.5, \qquad s_2^2 = \frac{\sum x_2^2}{n_2} - \left(\frac{\sum x_2}{n_2}\right)^2 = 2.8571 \]

H0 : µ1 = µ2
H1 : µ1 ≠ µ2

ttab = t13,0.05 = 2.1604 [Two tailed]

\[ t_{cal} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = 0.9309 \]

Since |tcal| < ttab, H0 is accepted and H1 is rejected.


That is, the two sample means do not differ significantly at 5% LOS.
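
The computation in this solution can be reproduced with a short sketch. The function name `pooled_t` is illustrative; the critical value 2.1604 is the tabulated value quoted above, not something the code looks up.

```python
from math import sqrt

def pooled_t(sample1, sample2):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    ss1 = sum((x - m1) ** 2 for x in sample1)    # equals n1 * s1^2
    ss2 = sum((x - m2) ** 2 for x in sample2)    # equals n2 * s2^2
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    t = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

t_cal, df = pooled_t([19, 17, 15, 21, 16, 18, 16, 14],
                     [15, 14, 15, 19, 15, 18, 16])
t_tab = 2.1604                                   # t(13, 0.05), two-tailed
reject_h0 = abs(t_cal) > t_tab

print(round(t_cal, 4), df, reject_h0)
```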

Problem 5.2.4 The following data represent the biological values of protein from
cow's milk and buffalo's milk at a certain level

Cow's milk      1.82  2.02  1.88  1.61  1.81  1.54
Buffalo's milk  2.00  1.83  1.86  2.03  2.19  1.88
Examine if the average values of protein in the two samples significantly differ.

Ans: tcal = −2.03, H0 is accepted

Problem 5.2.5 The following data relate to the marks obtained by 11 students in
2 tests, one held at the beginning of a year and the other at the end of the year
after intensive coaching

Test 1 19 23 16 24 17 18 20 18 21 19 20
Test 2 17 24 20 24 20 22 20 20 18 22 19
Do the data indicate that the students have benefited by coaching?

Ans: tcal = −1.38, H0 is accepted


5.3 F-test

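A random variable F is said to follow Snedecor's F-distribution, or simply the
F-distribution, with (ν1, ν2) degrees of freedom if its probability density function is given by

\[ f(F) = \frac{(\nu_1/\nu_2)^{\nu_1/2}}{B(\nu_1/2,\ \nu_2/2)} \cdot \frac{F^{\nu_1/2 - 1}}{\left(1 + \nu_1 F/\nu_2\right)^{(\nu_1 + \nu_2)/2}}, \quad F > 0 \]

where B denotes the beta function.
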

Problem 5.3.1 A sample of size 13 gave an estimated population variance of 3.0
while another sample of size 15 gave an estimate of 2.5. Could both samples be
from populations with the same variance? LOS 5%. [ Fcal = 1.2, H0 is accepted.]


Problem 5.3.2 In one sample of 9 items, the sum of the squares of deviations
of the sample values from the sample mean was 160, and in another sample of 8
observations it was 91. Test whether difference in variance is significant at 5 %
level.

[ Fcal = 1.54, H0 is accepted.]


Problem 5.3.3 A study was conducted to understand whether women have a
greater variation in attitude on political issues than men. Two independent
samples of 31 men and 41 women were used for the study. The sample variances so
calculated were 120 for women and 80 for men. Test whether the difference in
the variability of attitude towards political issues is significant at 5% LOS.

[ Fcal = 1.5, H0 is accepted.]


Problem 5.3.4 Two independent samples of 8 and 7 items respectively had the
following values of the variable:

Sample I 9 11 13 11 15 9 12 14
Sample II 10 12 10 14 9 8 10
Do the estimates of population variance differ significantly at 5 % LOS?

[ Fcal = 1.21, H0 is accepted.]
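
The variance ratio for Problem 5.3.4 can be verified as follows. This is a sketch under the usual convention that each population variance is estimated with divisor n − 1 and the larger estimate goes in the numerator; the function names are illustrative.

```python
def unbiased_variance(xs):
    """Estimate of the population variance (divisor n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((v - mean) ** 2 for v in xs) / (n - 1)

def f_ratio(x1, x2):
    """Return (F, numerator d.f., denominator d.f.) with F >= 1."""
    v1, v2 = unbiased_variance(x1), unbiased_variance(x2)
    if v1 >= v2:
        return v1 / v2, len(x1) - 1, len(x2) - 1
    return v2 / v1, len(x2) - 1, len(x1) - 1

sample_I = [9, 11, 13, 11, 15, 9, 12, 14]
sample_II = [10, 12, 10, 14, 9, 8, 10]
f_cal, df_num, df_den = f_ratio(sample_I, sample_II)
print(round(f_cal, 2), df_num, df_den)
```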


5.4 χ2- test

Problem 5.4.1 A die is thrown 600 times and the observed frequencies of the
outcomes are given below. Can we claim that the die is unbiased?

Face        1    2    3    4    5    6
Frequency   110  98   92   103  105  92

H0 : Die is unbiased
H1 : Die is not unbiased

Face    Oi    Ei    (Oi − Ei)²/Ei
1       110   100   1.00
2       98    100   0.04
3       92    100   0.64
4       103   100   0.09
5       105   100   0.25
6       92    100   0.64
Total   600   600   2.66

$\chi^2_{tab} = \chi^2_{5,\,0.05} = 11.0705$

Test statistic:
$\chi^2_{cal} = \sum \frac{(O_i - E_i)^2}{E_i} = 2.66$

Conclusion:
$\chi^2_{cal} < \chi^2_{tab}$
H0 is accepted
Die is unbiased
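
The goodness-of-fit statistic above is a single sum, so it is easy to verify. A minimal sketch, with `chi_square_gof` as an illustrative name:

```python
def chi_square_gof(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [110, 98, 92, 103, 105, 92]
# Under H0 every face is equally likely, so each expected count is 600 / 6.
expected = [sum(observed) / len(observed)] * len(observed)
chi2_cal = chi_square_gof(observed, expected)
print(round(chi2_cal, 2))
```

The degrees of freedom are the number of classes minus one, here 5.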


Problem 5.4.2 If the parents' blood groups are A and B, then the blood group
of the child is A, AB or B. Past data shows A : AB : B = 1 : 2 : 1. In a sample of
250 newborn babies of such parents, A is 70, B is 55 and the remaining are AB. Can
we say at 5% LOS that the population is behaving expectedly?

Group   Frequency
A       70
AB      125
B       55

H0 : Population is behaving expectedly

H1 : Population is not behaving expectedly

Group   Oi    Ei     (Oi − Ei)²/Ei
A       70    62.5   0.90
AB      125   125    0.00
B       55    62.5   0.90
Total   250   250    1.80

$\chi^2_{tab} = \chi^2_{2,\,0.05} = 5.9915$

Test statistic:
$\chi^2_{cal} = \sum \frac{(O_i - E_i)^2}{E_i} = 1.8$

Conclusion:
$\chi^2_{cal} < \chi^2_{tab}$
H0 is accepted
Population is behaving expectedly.


Problem 5.4.3 The following data show the number of defective articles produced
by 4 machines. Do the figures indicate a significant difference in the performance
of the machines?

Machine            A    B    C    D
Production time    1    1    2    3
No. of defective   12   30   63   98

Ans: χ2cal = 11.82, H0 is rejected.

Problem 5.4.4 Two sample polls of voters for two candidates A and B for a public
office are taken, one from among the rural areas and another from urban areas.
The results are in the following table.

Area     Votes for A   Votes for B
Rural    620           380
Urban    550           450

Examine whether the nature of the area is related to voting preference in this
election.
Ans: χ2cal = 10.092, H0 is rejected.

Problem 5.4.5 A pharmaceutical company is considering a vaccine for a new disease.

                                            Vaccinated   Not vaccinated
No. of people affected by the disease       50           150
No. of people not affected by the disease   25           125

Can we conclude that vaccination and disease are independent?

Ans: χ2cal = 93.63, H0 is rejected.


Problem 5.4.6 The following data is collected on two characters. Based on
this, can you say that there is no relation between smoking and literacy?

              Smokers   Non-smokers
Literates     83        57
Illiterates   45        68

H0 : Literacy & smoking are independent.

H1 : Literacy & smoking are not independent.

Cell   Oi   Ei                      Ei (rounded)   (Oi − Ei)²/Ei
1      83   128×140/253 = 70.83     71             12²/71 = 2.03
2      57   125×140/253 = 69.17     69             12²/69 = 2.09
3      45   128×113/253 = 57.17     57             12²/57 = 2.53
4      68   125×113/253 = 55.83     56             12²/56 = 2.57
                                                   $\chi^2_{cal} = 9.22$

$\nu = (m - 1)(n - 1) = (2 - 1)(2 - 1) = 1$
$\chi^2_{tab} = \chi^2_{\nu=1,\,0.05} = 3.84$

Test statistic:
$\chi^2_{cal} = \sum \frac{(O_i - E_i)^2}{E_i} = 9.22$

Conclusion:
$\chi^2_{cal} > \chi^2_{tab}$
H0 is rejected
There is some association between literacy and smoking
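
The independence test generalises to any contingency table: each expected frequency is (row total × column total) / grand total. The sketch below is one way to compute it; `chi_square_independence` is an illustrative name. Note that it keeps the expected frequencies unrounded, so for this problem it gives about 9.48 rather than the 9.22 obtained with whole-number expected frequencies; the conclusion is unchanged.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - exp) ** 2 / exp
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Rows: literates, illiterates; columns: smokers, non-smokers
chi2_cal, dof = chi_square_independence([[83, 57], [45, 68]])
print(round(chi2_cal, 2), dof)
```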

6 Linear Statistical models

6.1 Correlation
Problem 6.1.1 Calculate the correlation coefficient for the following heights (in inches) of fathers
(X) and their sons(Y)

X 65 66 67 67 68 69 70 72
Y 67 68 65 68 72 72 69 71

[r = 0.6030 (Karl Pearson)]

Solution:
X Y X2 Y2 XY
65 67
66 68
67 65
67 68
68 72
69 72
70 69
72 71
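$n = 8$, $\sum X = 544$, $\sum Y = 552$, $\sum X^2 = 37028$, $\sum Y^2 = 38132$, $\sum XY = 37560$

$\bar{X} = \frac{\sum X}{n} = 68$, $\bar{Y} = \frac{\sum Y}{n} = 69$

Karl Pearson's coefficient of correlation

$r(X, Y) = \frac{\frac{\sum XY}{n} - \bar{X}\,\bar{Y}}{\sqrt{\frac{\sum X^2}{n} - \bar{X}^2}\;\sqrt{\frac{\sum Y^2}{n} - \bar{Y}^2}}$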


r(X, Y ) = 0.6030
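
The formula translates directly into code. A minimal sketch using the father/son heights of Problem 6.1.1 (`pearson_r` is an illustrative name):

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r from the raw sums, following the formula above."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    return cov / math.sqrt(var_x * var_y)

X = [65, 66, 67, 67, 68, 69, 70, 72]
Y = [67, 68, 65, 68, 72, 72, 69, 71]
r = pearson_r(X, Y)
print(round(r, 4))
```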

Problem 6.1.2 Calculate the Karl Pearson’s correlation coefficient from the following data

X 28 45 40 38 35 33 40 32 36 33
Y 23 34 33 34 30 26 28 31 36 35

[0.5185]

Solution:

X Y X2 Y2 XY
28 23
45 34
40 33
38 34
35 30
33 26
40 28
32 31
36 36
33 35
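$n =$ ____, $\sum X =$ ____, $\sum Y =$ ____, $\sum X^2 =$ ____, $\sum Y^2 =$ ____, $\sum XY =$ ____

$\bar{X} = \frac{\sum X}{n} =$ ____, $\bar{Y} = \frac{\sum Y}{n} =$ ____

Karl Pearson's coefficient of correlation

$r(X, Y) = \frac{\frac{\sum XY}{n} - \bar{X}\,\bar{Y}}{\sqrt{\frac{\sum X^2}{n} - \bar{X}^2}\;\sqrt{\frac{\sum Y^2}{n} - \bar{Y}^2}}$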

r(X, Y ) = 0.5185

Problem 6.1.3 Calculate the correlation coefficient from the following data:


X 30 33 25 10 33 75 40 85 90 95 65 55
Y 68 65 80 85 70 30 55 18 15 10 35 45

[r = −.9935]

Solution:
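X    Y    X²   Y²   XY
30   68
33   65
25   80
10   85
33   70
75   30
40   55
85   18
90   15
95   10
65   35
55   45

$n =$ ____, $\sum X =$ ____, $\sum Y =$ ____, $\sum X^2 =$ ____, $\sum Y^2 =$ ____, $\sum XY =$ ____

$\bar{X} = \frac{\sum X}{n} =$ ____, $\bar{Y} = \frac{\sum Y}{n} =$ ____

Karl Pearson's coefficient of correlation

$r(X, Y) = \frac{\frac{\sum XY}{n} - \bar{X}\,\bar{Y}}{\sqrt{\frac{\sum X^2}{n} - \bar{X}^2}\;\sqrt{\frac{\sum Y^2}{n} - \bar{Y}^2}}$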

r(X, Y ) = −.9935

Problem 6.1.4 A computer, while calculating the correlation coefficient between two variables X and Y
from 25 pairs of observations, obtained the following results:
$n = 25$, $\sum X = 125$, $\sum X^2 = 650$, $\sum Y = 100$, $\sum Y^2 = 460$, $\sum XY = 508$ [r = 0.2065]
Solution:


$r(X, Y) = \frac{n \sum XY - \sum X \sum Y}{\sqrt{n \sum X^2 - (\sum X)^2}\;\sqrt{n \sum Y^2 - (\sum Y)^2}}$

r(X, Y ) = 0.2065

Problem 6.1.5 The following table gives the distribution of items of production and also the
relatively defective items among them according to size-groups. Is there any correlation between size
and defect in quality?

Size groups 15-16 16-17 17-18 18-19 19-20 20-21


No. of items 200 270 340 360 400 300
No. of defective items 150 162 170 180 180 120

[Ans: r = −0.94, taking the percentage of defective items in each size group]

Problem 6.1.6 Calculate the rank correlation coefficient from the following data.

X 1 3 7 5 4 6 2 10 9 8
Y 3 1 4 5 6 9 7 8 10 2

[r = 0.4182]

Solution:

n = 10
di = X − Y

X Y di d2i


$\sum d_i^2 =$ ____

Spearman's rank correlation coefficient is defined by

$R = 1 - \frac{6 \sum d_i^2}{n^3 - n}$

R = 0.4182
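
Since X and Y in Problem 6.1.6 are already ranks with no ties, the formula can be applied directly. A minimal sketch (`spearman_from_ranks` is an illustrative name):

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's R = 1 - 6 * sum(d^2) / (n^3 - n), valid when no ranks are tied."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

X = [1, 3, 7, 5, 4, 6, 2, 10, 9, 8]
Y = [3, 1, 4, 5, 6, 9, 7, 8, 10, 2]
R = spearman_from_ranks(X, Y)
print(round(R, 4))
```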

Problem 6.1.7 The following table shows the marks obtained by 10 students in Accountancy and
Statistics. Find the Spearman’s coefficient of rank correlation.

No. 1 2 3 4 5 6 7 8 9 10
Acc 45 70 65 30 90 40 50 57 85 60
Stat 35 90 70 40 95 40 60 80 80 50

[Ans: R = 0.8606]

Solution:

n = 10
di = Rank(Acc) − Rank(Stat)

Acc   Stat   Rank(Acc)   Rank(Stat)   di     di²
45    35     8           10           -2     4
70    90     3           2            1      1
65    70     4           5            -1     1
30    40     10          8.5          1.5    2.25
90    95     1           1            0      0
40    40     9           8.5          0.5    0.25
50    60     7           6            1      1
57    80     6           3.5          2.5    6.25
85    80     2           3.5          -1.5   2.25
60    50     5           7            -2     4


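If ranks are repeated, then Spearman's rank correlation formula becomes

$R = 1 - \frac{6\left[\sum d_i^2 + \sum \frac{m^3 - m}{12}\right]}{n^3 - n}$

where m is the number of items sharing a repeated rank and the second sum runs over every group of tied values.

$\sum d_i^2 = 22$

$m_1 = 2$: 80 marks are obtained by two students in Statistics.

$m_2 = 2$: 40 marks are obtained by two students in Statistics.

$R = 1 - \frac{6\left[\sum d_i^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2)\right]}{n^3 - n}$

$R = 1 - \frac{6\left[22 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{10^3 - 10} = 1 - \frac{6 \times 23}{990}$

R = 0.8606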
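
The tied-rank procedure can be sketched in code: tied marks share the average of the positions they occupy, and each tied group of size m adds (m³ − m)/12 to Σd². The helper names below are illustrative. Evaluating the formula with Σd² = 22 and two tied pairs gives 1 − 138/990 ≈ 0.8606.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of their positions."""
    positions = {}
    for pos, v in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over groups of equal values (zero when m = 1)."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values())

def spearman_with_ties(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

acc = [45, 70, 65, 30, 90, 40, 50, 57, 85, 60]
stat = [35, 90, 70, 40, 95, 40, 60, 80, 80, 50]
R = spearman_with_ties(acc, stat)
print(round(R, 4))
```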

Problem 6.1.8 Find the coefficient of correlation between height of father and height of son from
the following data.

Height of father 65 66 67 67 68 69 71 73
Height of son 67 68 64 68 72 70 69 70

[Ans: r = 0.4719 (Karl Pearson), R = 0.6969 (Spearman's rank correlation)]

6.2 Regression
Regression is a method of estimating the value of one variable when the value of the other is
known, provided the variables are correlated. Regression analysis is a mathematical measure of the
average relationship between two or more correlated variables.


6.2.1 Linear regression

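Equations of lines of regression:

1. The line of regression of y on x is:

$y - \bar{y} = b_{yx}(x - \bar{x})$

where the regression coefficient of y on x is given by

$b_{yx} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2} = r \frac{\sigma_y}{\sigma_x}$

2. The line of regression of x on y is:

$x - \bar{x} = b_{xy}(y - \bar{y})$

where the regression coefficient of x on y is given by

$b_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2} = r \frac{\sigma_x}{\sigma_y}$

Properties:

1. Both lines of regression pass through the point $(\bar{x}, \bar{y})$.

2. $b_{yx} b_{xy} = r^2$, i.e. $r = \pm\sqrt{b_{yx} b_{xy}}$, where r takes the common sign of the two regression coefficients.

3. $b_{yx}$ and $b_{xy}$ have the same sign.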

Least Squares Straight Line For a given set of N data points $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$,
assume that the straight line

$Y = a_0 + a_1 X = f(X)$

fits the data in the least squares sense. The coefficients $a_0$ and $a_1$ then satisfy

$\sum Y_i = N a_0 + a_1 \sum X_i$

$\sum X_i Y_i = a_0 \sum X_i + a_1 \sum X_i^2$

known as the "normal equations".
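
Solving the two normal equations for a0 and a1 gives closed-form expressions that are easy to code. The sketch below uses the age/maintenance-cost data of Problem 6.2.6 as a test case; `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Solve the normal equations for the line Y = a0 + a1 * X."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
    a0 = (sy - a1 * sx) / n                          # intercept
    return a0, a1

age = [2, 4, 6, 8]
cost = [5, 7, 8.5, 11]
a0, a1 = fit_line(age, cost)
predicted_cost_at_9 = a0 + a1 * 9
print(round(a0, 3), round(a1, 3), round(predicted_cost_at_9, 3))
```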

Problem 6.2.1 The following are the marks in Statistics (X) and Mathematics(Y) of ten students

X 56 55 58 57 56 60 54 59 57 58
Y 68 67 67 65 68 70 66 68 66 70

Calculate the coefficient of correlation and estimate marks in Mathematics of a student who scored
62 marks in Statistics.
[Ans : r = 0.44 , Y = 69.5 ]

Problem 6.2.2 It is given that the means of x and y are 5 and 10. If the line of regression of y on
x is parallel to the line 20y = 9x + 40 , estimate the value of y at x = 30
[Ans: 20y = 9x + 155 and y = 21.25]

Problem 6.2.3 Find the two lines of regression from the following data

X 65 66 67 67 68 69 70 72
Y 67 68 65 68 72 72 69 71

[Ans : x = 30.364 + 0.545 y and y = 23.667 + 0.667 x]
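
Both regression lines follow from the means, the variances and the covariance. A minimal sketch for the data of Problem 6.2.3, with variances taken with divisor n as in the correlation formulas (`regression_lines` is an illustrative name):

```python
def regression_lines(xs, ys):
    """Return ((c_y, b_yx), (c_x, b_xy)) for y = c_y + b_yx*x and x = c_x + b_xy*y."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    b_yx, b_xy = cov / var_x, cov / var_y
    # Both lines pass through (mean_x, mean_y), which fixes the intercepts.
    return (mean_y - b_yx * mean_x, b_yx), (mean_x - b_xy * mean_y, b_xy)

X = [65, 66, 67, 67, 68, 69, 70, 72]
Y = [67, 68, 65, 68, 72, 72, 69, 71]
(y_intercept, b_yx), (x_intercept, b_xy) = regression_lines(X, Y)
print(round(y_intercept, 3), round(b_yx, 3), round(x_intercept, 3), round(b_xy, 3))
```

The product b_yx · b_xy equals r², which ties this result back to the correlation coefficient of the same data.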

Problem 6.2.4 In a partially destroyed laboratory record of an analysis of correlation data, only the
following results are legible: Variance of X = 9, regression equations are: 8X − 10Y + 66 = 0 &
40X − 18Y = 214. What was

1. the mean of X and Y

2. the correlation between X and Y

3. the S.D. of Y

Problem 6.2.5 You are given the following data


X Y
Mean 30.1 47.8
standard deviation 6.2 9.5

Problem 6.2.6 Obtain the equation of the line of regression of cost on age from the following table
giving the age of a car of certain make and the annual maintenance cost.

Age of car 2 4 6 8
Maintenance 5 7 8.5 11

Also find maintenance cost of the car if its age is 9 years


[Ans : y = 3 + 0.975 x and y = Rs. 11775]

6.2.2 Multiple regression

So far, we have seen the concept of simple linear regression where a single predictor variable X was
used to model the response variable Y . In many applications, there is more than one factor that
influences the response. Multiple regression models thus describe how a single response variable Y
depends linearly on a number of predictor variables.
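
In symbols, a multiple linear regression model with k predictors is usually written as below. The symbols $\beta_0, \ldots, \beta_k$ (regression coefficients), $\varepsilon$ (random error) and k are notation assumed here; the notes do not introduce them.

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon
```

With k = 1 this reduces to the simple linear regression line $Y = a_0 + a_1 X$ of the previous section, up to the error term.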

6.3 ANOVA

6.3.1 ANOVA: One way

6.3.1.1 Assumptions in Analysis of Variance

1. The populations from which the observations are taken are normal.

2. The environmental effects and the different treatment effects are additive in nature.

3. The observations are independent.

6.3.1.2 Technique

In one way classification the data are classified according to only one criterion. The null hypothesis
is:

H0 : µ1 = µ2 = µ3 = ... = µk

and the alternative hypothesis is:

H1 : at least two of the means µ1, µ2, ..., µk differ

1. Calculate the variance between the samples.

2. Calculate the variance within the samples.

3. Calculate the ratio F as follows:

$F = \frac{\text{Between-column variance}}{\text{Within-column variance}}$

Symbolically,

$F = \frac{S_1^2}{S_2^2}$

4. Compare the calculated value of F with the table value of F for the given degrees of freedom at the
chosen level of significance.

If the calculated value of F is greater than the table value, it is concluded that the difference in
sample means is significant.

Source of variation   Sum of squares (SS)   Degrees of freedom (ν)   Mean square (MS)      Variance ratio F
Between samples       SSC                   ν1 = c − 1               MSC = SSC/(c − 1)     F = MSC/MSE
Within samples        SSE                   ν2 = n − c               MSE = SSE/(n − c)
Total                 SST                   n − 1

SST = Total sum of squares of variations
SSC = Sum of squares between samples (columns)
SSE = Sum of squares within samples (rows)
MSC = Mean sum of squares between samples
MSE = Mean sum of squares within samples
c = number of samples (columns), n = total number of observations


Problem 6.3.1 To assess the significance of possible variations in performance in a certain
test between the convent schools of a city, a common test was given. The results are given below.
Make an analysis of variance of the data at 5% LOS.

A    B    C    D
8    12   18   13
10   11   12   9
12   9    16   12
8    14   6    16
7    4    8    15

X1   X1²   X2   X2²   X3   X3²   X4   X4²
8    64    12   144   18   324   13   169
10   100   11   121   12   144   9    81
12   144   9    81    16   256   12   144
8    64    14   196   6    36    16   256
7    49    4    16    8    64    15   225

$\sum X_1 = 45$, $\sum X_2 = 50$, $\sum X_3 = 60$, $\sum X_4 = 65$
$\sum X_1^2 = 421$, $\sum X_2^2 = 558$, $\sum X_3^2 = 824$, $\sum X_4^2 = 875$

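The sum of all the items of the various samples:
$T = \sum X_1 + \sum X_2 + \sum X_3 + \sum X_4 = 45 + 50 + 60 + 65 = 220$

Correction factor $= \frac{T^2}{N} = \frac{(220)^2}{20} = \frac{48400}{20} = 2420$

Total sum of squares $= \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \sum X_4^2 - \frac{T^2}{N}$
$= 421 + 558 + 824 + 875 - 2420 = 258$

Sum of squares between the samples
$= \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} + \frac{(\sum X_4)^2}{n_4} - \frac{T^2}{N}$
$= \frac{(45)^2}{5} + \frac{(50)^2}{5} + \frac{(60)^2}{5} + \frac{(65)^2}{5} - 2420 = 50$

where $n_j = 5$ is the size of each sample and $N = 20$ is the total number of observations.

Sum of squares within samples
= Total sum of squares − Sum of squares between samples
= 258 − 50 = 208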


ANOVA Table
Source of variation   Sum of squares (SS)   Degrees of freedom (ν)   Mean square (MS)              Variance ratio F
Between samples       50                    3                        MSC = SSC/(c − 1) = 16.7      F = MSC/MSE = 1.285
Within samples        208                   16                       MSE = SSE/(n − c) = 13.0
Total                 258                   19

The table value of F for ν1 = 3 and ν2 = 16 at the 5% level of significance is 3.24.

The calculated value of F is less than the table value and hence
the difference in the mean values of the samples is not significant,
i.e. the samples could have come from the same universe.
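
The whole one-way ANOVA computation for Problem 6.3.1 can be reproduced with the correction-factor method used above. This is a minimal sketch; `one_way_anova` and the dictionary keys are illustrative names. It carries MSC unrounded, so F comes out as about 1.282 rather than the 1.285 obtained from the rounded 16.7.

```python
def one_way_anova(samples):
    """One-way ANOVA sums of squares and F ratio via the correction factor T^2 / N."""
    all_values = [v for sample in samples for v in sample]
    n, c = len(all_values), len(samples)
    correction = sum(all_values) ** 2 / n
    sst = sum(v * v for v in all_values) - correction
    ssc = sum(sum(s) ** 2 / len(s) for s in samples) - correction
    sse = sst - ssc
    msc, mse = ssc / (c - 1), sse / (n - c)
    return {"SST": sst, "SSC": ssc, "SSE": sse, "F": msc / mse, "df": (c - 1, n - c)}

schools = [
    [8, 10, 12, 8, 7],     # A
    [12, 11, 9, 14, 4],    # B
    [18, 12, 16, 6, 8],    # C
    [13, 9, 12, 16, 15],   # D
]
result = one_way_anova(schools)
print(result)
```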

Problem 6.3.2 Three brands A, B and C of tyres were tested for durability. A sample of 4 tyres
of each brand is subjected to the same test and the number of kilometers
until wear out was noted for each brand of tyres. The data in thousand km
is given in the table below. Make an analysis of variance of the data at 5% of LOS.

Problem 6.3.3 To test the significance of variation in the retail prices of a commodity in three
principal cities, Mumbai, Kolkata and Delhi, four shops were chosen at random in each city and the
prices observed in rupees were as follows:

Mumbai    16   8    12   14
Kolkata   14   10   10   6
Delhi     4    10   8    8

Do the data indicate that the prices in the three cities are significantly different?
