P&S Mod3&4

The document provides various statistical problems and solutions related to correlation, regression, and rank correlation. It includes calculations for correlation coefficients using Karl Pearson’s method, partial and multiple correlations, and regression equations for different datasets. Additionally, it discusses the interpretation of results and the implications of findings in relation to advertisement expenses, age and playing habits, capital and profit, and student performance assessments.

Uploaded by

Manasvi Chouhan
Copyright
© All Rights Reserved

Simple Correlation

1. From the following information, find the correlation coefficient between advertisement
expenses and sales volume using Karl Pearson’s coefficient of correlation method.

Firm                                 1   2   3   4   5   6   7   8   9   10
Advertisement Exp. (Rs. in lakhs)    11  13  14  16  16  15  15  14  13  13
Sales Volume (Rs. in lakhs)          50  50  55  60  65  65  65  60  60  50

Interpretation: The calculation gives r = 0.7866, a high degree of positive correlation
between the two variables: higher advertisement expenses are associated with higher
sales volume.
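
The value of r quoted above can be checked with a short sketch. It assumes the usual deviation-from-the-mean form of Karl Pearson's formula; the function name `pearson_r` and the variable names are illustrative and do not come from the original document.

```python
from math import sqrt

def pearson_r(x, y):
    """Karl Pearson's r computed from deviations about the arithmetic means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

# Data from problem 1 (both series in Rs. lakhs)
advertisement = [11, 13, 14, 16, 16, 15, 15, 14, 13, 13]
sales = [50, 50, 55, 60, 65, 65, 65, 60, 60, 50]

r = pearson_r(advertisement, sales)
print(round(r, 4))  # 0.7866
```

Here the sum of products of deviations is 70 and the two sums of squared deviations are 22 and 360, so r = 70 / sqrt(22 × 360).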

2. Find the correlation coefficient between age and playing habits of the following
students using Karl Pearson’s coefficient of correlation method

Age                  15   16   17   18   19   20
Number of students   250  200  150  120  100  80
Regular players      200  150  90   48   30   12

Interpretation: Taking playing habits as the percentage of regular players in each age
group (80, 75, 60, 40, 30, 15), the calculation gives r = -0.9912, a high degree of
negative correlation between age and playing habits: the share of regular players among
students falls as age increases.

3. Find Karl Pearson’s coefficient of correlation between capital employed and profit
obtained from the following data.

Capital Employed (Rs. in crore)   10  20  30  40  50  60  70  80  90  100
Profit (Rs. in crore)             2   4   8   5   10  15  14  20  22  50

Solution: Let us assume that capital employed is variable X and profit is variable Y.

4. A computer, while calculating the correlation coefficient between the variables X and Y,
obtained the following results: N = 30; ∑X = 120; ∑X² = 600; ∑Y = 90; ∑Y² = 250;
∑XY = 335. It was, however, later discovered at the time of checking that it had copied
down two pairs of observations as (X, Y): (8, 10), (12, 7),
while the correct values were (X, Y): (8, 12), (10, 8).
Obtain the correct value of the correlation coefficient between X and Y.

5. The coefficient of correlation between X and Y is 0.3. Their covariance is 9. The
variance of X is 16. Find the standard deviation of the Y series.

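
Problems 4 and 5 above are both exercises in rearranging the correlation formula. The sketch below is one way to work them, assuming the raw-sums form of Pearson's r for problem 4 and r = cov(X, Y) / (σx σy) for problem 5. All variable names are illustrative.

```python
from math import sqrt

# Problem 4: sums as first (wrongly) reported
n = 30
sum_x, sum_x2 = 120, 600
sum_y, sum_y2 = 90, 250
sum_xy = 335

wrong_pairs = [(8, 10), (12, 7)]
right_pairs = [(8, 12), (10, 8)]

# Take the wrong pairs out of every sum, then put the right pairs in.
for x, y in wrong_pairs:
    sum_x -= x
    sum_y -= y
    sum_x2 -= x * x
    sum_y2 -= y * y
    sum_xy -= x * y
for x, y in right_pairs:
    sum_x += x
    sum_y += y
    sum_x2 += x * x
    sum_y2 += y * y
    sum_xy += x * y

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
corrected_r = numerator / denominator
print(round(corrected_r, 4))  # -0.4311

# Problem 5: r = cov / (sd_x * sd_y), so sd_y = cov / (r * sd_x)
r, cov, var_x = 0.3, 9, 16
sd_y = cov / (r * sqrt(var_x))
print(round(sd_y, 2))  # 7.5
```

For problem 4 the corrected sums are ∑X = 118, ∑Y = 93, ∑X² = 556, ∑Y² = 309 and ∑XY = 347.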
Partial and Multiple correlation

1. In a trivariate distribution, it is found that:

r12 = 0.7, r13 = 0.61, r23 = 0.4

Find the values of r23.1, r13.2, and r12.3.

2. On the basis of observations made on agricultural production (X1), the use of
fertilizers (X2), and the use of irrigation (X3), the following zero-order correlation
coefficients were obtained:

r12 = 0.8, r13 = 0.65, r23 = 0.7

Compute the partial correlation between agricultural production and the use of
fertilizers, eliminating the effect of irrigation.

3. Is it possible to get the following from a set of experimental data?

(a) r23 = 0, r31 = −0.5, r12 = 0.6
(b) r23 = 0.7, r31 = −0.4, r12 = 0.6

4. In a problem involving three variables X1, X2, and X3, we find that r12 = 0.8,
r13 = 0.6, r23 = 0.5.
Find the values of the coefficients of partial correlation r23.1 and r13.2.

5. In a trivariate distribution, the simple coefficients of correlation are as follows:
r12 = 0.8, r13 = 0.65, and r23 = 0.72. Calculate the coefficient of partial correlation
r12.3.

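
The partial correlation problems above all use the same first-order formula, r12.3 = (r12 − r13 r23) / sqrt((1 − r13²)(1 − r23²)). The sketch below applies it to problem 2. For problem 3 it uses one common consistency check, which is an assumption of this sketch rather than a method stated in the document: three correlations can come from real data only if each lies in [−1, 1] and the determinant of the 3 × 3 correlation matrix is non-negative. Function names are illustrative.

```python
from math import sqrt

def partial_r(r_ab, r_ac, r_bc):
    """First-order partial correlation of a and b, holding c constant."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac**2) * (1 - r_bc**2))

def is_consistent(r12, r13, r23):
    """True when the three values can form a valid correlation matrix."""
    if any(abs(r) > 1 for r in (r12, r13, r23)):
        return False
    det = 1 + 2 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2
    return det >= 0

# Problem 2: production vs. fertilizers, eliminating irrigation
r12_3 = partial_r(0.8, 0.65, 0.7)
print(round(r12_3, 4))  # 0.6357

# Problem 3
case_a = is_consistent(r12=0.6, r13=-0.5, r23=0.0)
case_b = is_consistent(r12=0.6, r13=-0.4, r23=0.7)
print(case_a, case_b)  # True False
```

In case (b) the same conclusion follows from the partial correlation itself: `partial_r(0.6, -0.4, 0.7)` comes out greater than 1, which no correlation coefficient can be.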
Rank Correlation
1. Find out Spearman’s coefficient of correlation between the two kinds of assessment
of graduate students’ performance in a college.

Name of student   A   B   C   D   E   F   G   H   I
Internal Exam     51  68  73  46  50  65  47  38  60
External Exam     49  72  74  44  58  66  50  30  35

Interpretation: The calculation gives R = 0.7833, a high degree of positive correlation
between the students’ internal exam and external exam results.
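
The rank correlation above can be reproduced with a minimal sketch. It assumes rank 1 is given to the highest mark and that there are no tied marks, which holds for this data set. The helper names are illustrative.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value; assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_r(x, y):
    """Spearman's R = 1 - 6 * sum(D^2) / (n * (n^2 - 1)) for untied data."""
    n = len(x)
    rank_x = ranks_descending(x)
    rank_y = ranks_descending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n**2 - 1))

internal = [51, 68, 73, 46, 50, 65, 47, 38, 60]
external = [49, 72, 74, 44, 58, 66, 50, 30, 35]

R = spearman_r(internal, external)
print(round(R, 4))  # 0.7833
```

The rank differences give ∑D² = 26, so R = 1 − (6 × 26) / (9 × 80).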

2. The coefficient of rank correlation of the marks obtained by 10 students in statistics
and accountancy was found to be 0.8. It was later discovered that the difference in
ranks in the two subjects obtained by one of the students was wrongly taken as 7
instead of 9. Find the correct coefficient of rank correlation.

3. From the following data, compute the rank correlation.

X 82 68 75 61 68 73 85 68
Y 81 71 71 68 62 69 80 70

Solution: The data contain tied values, so some ranks repeat: X = 68 occurs 3 times and
Y = 71 occurs 2 times. Therefore, we need to compute an adjustment factor, (m³ − m)/12
for each group of m tied values, and add it to the value of ∑D².
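
The tie adjustment described in this solution can be sketched as follows. The code assumes the usual textbook treatment: tied values share the average of the ranks they occupy, and (m³ − m)/12 is added to ∑D² for each group of m tied values. Function names are illustrative.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest; tied values share the average of their positions."""
    order = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(order, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m**3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_tied(x, y):
    """Spearman's R with the tie adjustment added to sum(D^2)."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    adjusted = sum_d2 + tie_correction(x) + tie_correction(y)
    return 1 - 6 * adjusted / (n * (n**2 - 1))

x = [82, 68, 75, 61, 68, 73, 85, 68]
y = [81, 71, 71, 68, 62, 69, 80, 70]

R = spearman_tied(x, y)
print(round(R, 2))  # 0.75
```

For this data ∑D² = 18.5, the adjustments are 2 (for the three 68s) and 0.5 (for the two 71s), and the adjusted total is 21.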

4. The coefficient of rank correlation of marks obtained by 10 students in English and
Economics was found to be 0.5. It was later discovered that the difference in ranks in
the two subjects obtained by one of the students was wrongly taken as 3 instead of 7.
Find the correct coefficient of rank correlation.

5. The coefficient of rank correlation between marks in two subjects obtained by a
group of students is 0.8. If the sum of squares of the differences in ranks is 33, find
the number of students in the group. Given: rank correlation R = 0.8.

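
Problems 2, 4 and 5 of this section all invert the formula R = 1 − 6∑D² / (n(n² − 1)). The sketch below shows one way to do that. It treats the stated differences as rank differences, and the helper names and the search limit `max_n` are assumptions made for illustration.

```python
def sum_d2_from_r(r, n):
    """Recover sum(D^2) from a reported rank correlation."""
    return (1 - r) * n * (n**2 - 1) / 6

def r_from_sum_d2(sum_d2, n):
    return 1 - 6 * sum_d2 / (n * (n**2 - 1))

def corrected_r(r, n, wrong_d, right_d):
    """Swap one wrongly recorded rank difference for the right one."""
    sum_d2 = sum_d2_from_r(r, n) - wrong_d**2 + right_d**2
    return r_from_sum_d2(sum_d2, n)

def group_size(r, sum_d2, max_n=1000):
    """Find n with n * (n^2 - 1) = 6 * sum(D^2) / (1 - r), or None."""
    target = 6 * sum_d2 / (1 - r)
    for n in range(2, max_n + 1):
        if abs(n * (n**2 - 1) - target) < 1e-6:
            return n
    return None

r_problem_2 = corrected_r(0.8, 10, wrong_d=7, right_d=9)
r_problem_4 = corrected_r(0.5, 10, wrong_d=3, right_d=7)
n_problem_5 = group_size(0.8, 33)
print(round(r_problem_2, 4), round(r_problem_4, 4), n_problem_5)
# 0.6061 0.2576 10
```

In problem 2 the reported R = 0.8 implies ∑D² = 33, which becomes 33 − 49 + 81 = 65 after the correction.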
Simple Regression
1. Find the two regression equations, X on Y and Y on X, from the following data:
X : 10 12 16 11 15 14 20 22
Y : 15 18 23 14 20 17 25 28
2. After investigation it has been found that the demand for automobiles in a city depends
mainly, if not entirely, upon the number of families residing in that city. Below are the
figures for the sales of automobiles in five cities for the year 2019 and the
number of families residing in those cities.

City         No. of Families (in lakhs): X   Sale of automobiles (in ‘000): Y
Belagavi     70                              25.2
Bangalore    75                              28.6
Hubli        80                              30.2
Kalaburagi   60                              22.3
Mangalore    90                              35.4

Fit a linear regression equation of Y on X by the least squares method and estimate the
sales for the year 2020 for the city of Belagavi, which is estimated to have 100 lakh
families, assuming that the same relationship holds true.
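
A minimal least-squares sketch for this problem follows. It fits Y = a + bX using deviations from the means and then evaluates the line at X = 100. The function and variable names are illustrative.

```python
def fit_line(x, y):
    """Least-squares line of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

families = [70, 75, 80, 60, 90]          # X, in lakhs
sales = [25.2, 28.6, 30.2, 22.3, 35.4]   # Y, in thousands

a, b = fit_line(families, sales)
estimate_2020 = a + b * 100
print(round(a, 3), round(b, 3), round(estimate_2020, 3))
# -4.885 0.443 39.415
```

Under the stated assumption that the same relationship holds, the fitted line Y = −4.885 + 0.443X gives estimated sales of about 39.4 thousand automobiles for 100 lakh families.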
3. From the following data obtain the two regression lines:
Capital Employed (Rs. in lakh): 7 8 5 9 12 9 10 15
Sales Volume (Rs. in lakh): 4 5 2 6 9 5 7 12
4. From the following information find regression equations and estimate the production
when the capacity utilisation is 70%.
                              Average (Mean)   Standard Deviation
Production (in lakh units)    42               12.5
Capacity Utilisation (%)      88               8.5

Correlation Coefficient (r) = 0.72
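
When only means, standard deviations and r are given, as here and in problems 6 and 8 of this section, the regression line of y on x has slope r·σy/σx and passes through the two means. The sketch below applies this to problem 4, taking production as Y and capacity utilisation as X. The function name is illustrative.

```python
def regression_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    """Regression line of y on x from summary statistics.

    Returns (intercept, slope) with slope = r * sd_y / sd_x.
    """
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Production (Y, lakh units) on capacity utilisation (X, %)
a, b = regression_from_summary(mean_x=88, sd_x=8.5, mean_y=42, sd_y=12.5, r=0.72)
production_at_70 = a + b * 70
print(round(b, 4), round(production_at_70, 2))  # 1.0588 22.94
```

Swapping the roles of x and y in the call gives the other regression line, capacity utilisation on production.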


5. The following data gives the age and blood pressure (BP) of 10 sports persons.
Name : A B C D E F G H I J
Age (X) : 42 36 55 58 35 65 60 50 48 51
BP (Y) : 98 93 110 85 105 108 82 102 118 99
i. Find regression equation of Y on X and X on Y (Use the method of deviation
from arithmetic mean)
ii. Find the correlation coefficient (r) using the regression coefficients.
iii. Estimate the blood pressure of a sports person whose age is 45.
6. There are two series of index numbers, P for price index and S for stock of commodity.
The mean and standard deviation of P are 100 and 8 and S are 103 and 4 respectively.
The correlation coefficient between the two series is 0.4. With these data, work out a
linear equation to read off values of P for various values of S. Can the same equation be
used to read off values of S for various values of P?

7. For the variables X and Y, the two lines of regression are given by 3x + 2y − 25 = 0
and 6x + y − 30 = 0.
(i) Identify the lines of regression of X on Y and Y on X.
(ii) Find the means of X and Y.
(iii) Find the correlation coefficient between X and Y.

Solution:
Let 3x + 2y − 25 = 0   (1)
    6x + y − 30 = 0    (2)
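
The solution text is cut off after naming equations (1) and (2), so the sketch below shows one standard way to finish such a problem rather than the document's own working. It relies on two facts: both regression lines pass through the point of means, and the product of the two regression coefficients equals r², which cannot exceed 1. The function name and the (a, b, c) encoding of a line a·x + b·y + c = 0 are assumptions, and all coefficients are assumed non-zero.

```python
from math import sqrt

def analyse_regression_lines(line1, line2):
    """Each line is (a, b, c), meaning a*x + b*y + c = 0.

    Returns (mean_x, mean_y, r, y_on_x_line).
    """
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    # The lines intersect at (mean_x, mean_y): solve by Cramer's rule.
    det = a1 * b2 - a2 * b1
    mean_x = (b1 * c2 - b2 * c1) / det
    mean_y = (a2 * c1 - a1 * c2) / det
    # Try both assignments; only one gives b_yx * b_xy <= 1.
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = -y_on_x[0] / y_on_x[1]  # slope of y when solved for y
        b_xy = -x_on_y[1] / x_on_y[0]  # slope of x when solved for x
        product = b_yx * b_xy
        if 0 <= product <= 1:
            sign = 1 if b_yx > 0 else -1
            return mean_x, mean_y, sign * sqrt(product), y_on_x
    raise ValueError("no valid assignment of the two lines")

# Problem 7
mean_x, mean_y, r, y_on_x = analyse_regression_lines((3, 2, -25), (6, 1, -30))
print(round(mean_x, 3), round(mean_y, 3), r, y_on_x)
# 3.889 6.667 -0.5 (3, 2, -25)
```

So equation (1) is the line of Y on X, equation (2) is the line of X on Y, and r takes the common negative sign of the two regression coefficients. The same function can be applied to problems 9 and 10 of this section.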

8. For a bivariate distribution, mean of x = 65, mean of y = 53, s.d. of x = 4.7,
s.d. of y = 5.2, and correlation coefficient = 0.78. Find the two regression equations
and estimate
i) The most probable value of y when x = 63
ii) The most probable value of x when y = 50
9. The regression equation of y on x is x + 3y − 88 = 0 and that of x on y is
2x + y − 71 = 0. Find
i) Mean values of x and y
ii) Coefficient of correlation
10. Given two regression equations 3x − y − 25 = 0 and 2x − 3y + 30 = 0. Find
i) Mean values of x and y
ii) Coefficient of correlation
iii) s.d. of x if s.d. of y is 2

Multiple Regression:

1. A study is conducted involving 10 students to investigate the relationship and
effects of revision time and lecture attendance on exam performance.

Students’ exam performance, revision time and lecture attendance

Obs  1   2   3   4   5   6   7   8   9   10
Y    40  44  46  48  52  58  60  68  74  80
X1   6   10  12  14  16  18  22  24  26  32
X2   4   4   5   7   9   12  14  20  21  24

Here Y stands for exam performance, X1 for revision time and X2 for lecture attendance.
Solution:
Calculating the coefficient of regression
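
The worked solution is not reproduced here, so the following is only a sketch of one way to obtain the regression coefficients. It solves the two normal equations written in deviations from the means, a standard route for two predictors, using plain Python. The function name `fit_two_predictors` is illustrative.

```python
def fit_two_predictors(y, x1, x2):
    """Least-squares fit of y = a + b1*x1 + b2*x2 via the normal equations."""
    n = len(y)
    m_y, m_1, m_2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    d_y = [v - m_y for v in y]
    d_1 = [v - m_1 for v in x1]
    d_2 = [v - m_2 for v in x2]
    s11 = sum(a * a for a in d_1)
    s22 = sum(b * b for b in d_2)
    s12 = sum(a * b for a, b in zip(d_1, d_2))
    s1y = sum(a * c for a, c in zip(d_1, d_y))
    s2y = sum(b * c for b, c in zip(d_2, d_y))
    det = s11 * s22 - s12**2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = m_y - b1 * m_1 - b2 * m_2
    return a, b1, b2

exam = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]       # Y
revision = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]    # X1
attendance = [4, 4, 5, 7, 9, 12, 14, 20, 21, 24]      # X2

a, b1, b2 = fit_two_predictors(exam, revision, attendance)
print(round(a, 2), round(b1, 2), round(b2, 2))  # 31.98 0.65 1.11
```

With these data the fitted equation is approximately Y = 31.98 + 0.65 X1 + 1.11 X2. The same function can be reused for the other raw-data problems in this section.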
2. Using the information below, calculate the slopes.

                     Husband’s Housework   Number of Children   Husband’s Education
Mean                 Y = 3.3               X1 = 2.7             X2 = 13.7
Standard deviation   sy = 2.1              s1 = 1.5             s2 = 2.6

Zero-order correlations: ry1 = 0.50; ry2 = −0.30; r12 = −0.47

Solution:
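
The solution itself is missing from this copy. The sketch below uses the standard formulas for partial slopes from zero-order correlations, b1 = (sy/s1)(ry1 − ry2 r12)/(1 − r12²) and b2 = (sy/s2)(ry2 − ry1 r12)/(1 − r12²). It assumes the first row of figures (3.3, 2.7, 13.7) gives the means, which is what the intercept calculation needs. The function name is illustrative.

```python
def slopes_from_correlations(s_y, s_1, s_2, r_y1, r_y2, r_12):
    """Partial slopes b1, b2 for y regressed on x1 and x2."""
    denom = 1 - r_12**2
    b1 = (s_y / s_1) * (r_y1 - r_y2 * r_12) / denom
    b2 = (s_y / s_2) * (r_y2 - r_y1 * r_12) / denom
    return b1, b2

# Assumed to be means (the bars over Y, X1, X2 appear to be lost in this copy)
mean_y, mean_1, mean_2 = 3.3, 2.7, 13.7

b1, b2 = slopes_from_correlations(
    s_y=2.1, s_1=1.5, s_2=2.6, r_y1=0.50, r_y2=-0.30, r_12=-0.47
)
a = mean_y - b1 * mean_1 - b2 * mean_2
print(round(b1, 2), round(b2, 2), round(a, 2))  # 0.65 -0.07 2.48
```

Under that assumption the equation is roughly Y = 2.48 + 0.65 X1 − 0.07 X2. The same figures reappear in problem 7 of this section, so the same calculation applies there.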
3. Find the multiple linear regression equation of X1 on X2 and X3 from the data
relating to three variables given below:

X1 4 6 7 9 13 15
X2 15 12 8 6 4 8
X3 30 24 20 14 10 4

4. Given the following, determine the regression equation of:

(i) x1 on x2 and x3, and
(ii) x2 on x1 and x3

r12 = 0.8, r13 = 0.6, r23 = 0.5

σ1 = 10, σ2 = 8, σ3 = 5

5. The in-charge of an adult education centre in a town wants to know how happy and
satisfied the adult education centre students are. The following four
factors were studied to measure the degree of satisfaction:

X1 = age at the time of completing education

X2 = number of living children

X3 = annual income

X4 = average number of social activities per week

The multiple regression equation was determined to be

y = −20 + 0.04 x1 + 30 x2 + 0.04 x3 + 36.3 x4

i. Calculate the satisfaction level of a person who passed out at the age of 45
has living children, has an annual income of rupees 12,000 and has only
one social activity in a week
ii. Interpret the value a = −20
iii. Would a person be more satisfied with the additional income of Rs. 2000?

6. Evaluate the following dataset to fit a multiple linear regression model.

y x1 x2
140 60 22
155 62 25
159 67 24
179 70 20
192 71 15
200 72 14
212 75 14
215 78 11

7. The salary of a person in an organisation has to be regressed in terms of
experience (X1) and mistakes (X2). It is given that mean of Y = 3.3; mean of X1 = 2.7;
mean of X2 = 13.7; Sy = 2.1; S1 = 1.5; S2 = 2.6; and the zero-order correlations are
ry1 = 0.5; ry2 = −0.3; r12 = −0.47. Find the multiple linear regression equation and
interpret the results.
Interpretation:

8. Find the regression equation of X on Y and Z given the following results:

Mean(X) = 35.8; Mean(Y) = 52.4; Mean(Z) = 48.8; r12 = 0.6; r23 = 0.7; r31 = 0.8

SD(X) = 4.2; SD(Y) = 5.3; SD(Z) = 6.1

9. Calculate the regression coefficients.

X1   X2   Y    X1²   X1X2   X2²   X1Y   X2Y
4    1    7    16    4      1     28    7
7    2    12   49    14     4     84    24
9    5    17   81    45     25    153   85
12   8    20   144   96     64    240   160

10. Find the multiple regression coefficients for the data below.

Sales Territory   Sales (Lakh Rs) Y   Advertising ('000 Rs) X1   Number of Selling Agents X2
1                 100                 40                         10
2                 80                  30                         10
3                 60                  20                         7
4                 120                 50                         15
5                 150                 60                         20
6                 90                  40                         12
7                 70                  20                         8
8                 130                 60                         14

Binomial Distribution:

1. If X is binomially distributed with 6 trials and a probability of success equal to 1/4
at each attempt, what is the probability of: (a) exactly 4 successes (b) at least one
success?
2. When an unbiased coin is tossed eight times what is the probability of obtaining: (a)
less than 4 heads (b) more than five heads?
3. A biased die is thrown thirty times and the number of sixes seen is eight. If the die is
thrown a further twelve times find: (a) the probability that a six will occur exactly
twice; (b) the expected number of sixes; (c) the variance of the number of sixes.
4. A random variable X is binomially distributed with mean 6 and variance 4.2. Find
P(X ≤ 6).

5. The random variable X has the binomial distribution B(40, 0.2). Determine each of the
following. a) P(X = 5). b) P(X < 13). c) P(X > 10). d) P(8 < X < 14).

Answer: 0.0854 , 0.9568 , 0.1608 , 0.3875

6. The probability of a customer ordering the colour of a particular model of new car in
silver is 0.2 . Find the probability that in next 30 random orders there will be …

a) … exactly 10 orders in silver.

b) … at most 8 orders in silver.

c) … no more than 11 orders in silver

Answer: 0.0355 , 0.8713 , 0.9905

7. The probability that Anna wakes up before her alarm rings is 0.4 .
a) Find the mean and variance of the number of times that Anna wakes up before her
alarm rings, in the next 7 mornings.

b) Determine the probability that in the next 7 mornings, Anna will wake up before
her alarm rings …

i. … at most once.

ii. … in more than 1 but less than 5 mornings.

c) Calculate the probability that in the next 4 weeks Anna will wake up before her
alarm rings on exactly 7 mornings.

Answer: E(X) = 2.8, Var(X) = 1.68, 0.1586, 0.7451, 0.0426
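
The answers listed for this problem can be reproduced with a short sketch of the binomial formulas, P(X = k) = C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ with mean np and variance np(1 − p). The helper names are illustrative, and `math.comb` needs Python 3.8 or later.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 7, 0.4                  # 7 mornings, P(wakes before alarm) = 0.4
mean = n * p
variance = n * p * (1 - p)
at_most_once = binom_cdf(1, n, p)
more_than_1_less_than_5 = binom_cdf(4, n, p) - binom_cdf(1, n, p)
exactly_7_in_4_weeks = binom_pmf(7, 28, p)   # 4 weeks = 28 mornings

print(round(mean, 2), round(variance, 2))    # 2.8 1.68
print(round(at_most_once, 4))                # 0.1586
print(round(more_than_1_less_than_5, 4))     # 0.7451
print(round(exactly_7_in_4_weeks, 4))        # 0.0426
```

The same two helpers cover the other binomial problems in this section, for example `binom_pmf(5, 40, 0.2)` for part a) of problem 5.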

8. A box contains 50 coloured drawing pins. The proportion of red drawing pins in these
boxes is 3 out of 20.

a) Find the mean and variance of the number of red drawing pins in these boxes.

b) Find the probability that in a box of 50 drawing pins there will be …


i. … exactly 7 red drawing pins.

ii. … more than 10 red drawing pins.

iii. … no more than 8 red drawing pins.

iv. … between 6 and 12, not inclusive, red drawing pins.

Answer: E(X) = 7.5, Var(X) = 6.375, 0.1575, 0.1199, 0.6681, 0.5759

9. Ama is a supermarket cashier. The probability that Ama will have to rescan a
shopping item because the barcode reader failed to “read” it, is 0.15. A shopping item
whose bar code is read on the first attempt is called a “first time item”. Ama scans 40
shopping items.

a) Determine the probability that Ama will have more than 31 but at most 37 “first
time items”.

Bama is a less experienced supermarket cashier. The probability that Bama will scan a
shopping item on her first attempt is 0.7. Bama scans 50 shopping items.

b) Determine the probability that Bama will have at least 35 “first time items”.

10. A geologist is looking for fossils in rocks. In a certain area it has been established
over a long period of time that 10% of the rocks contain fossils. The geologist selects
twenty rocks from this area.

a) State 2 conditions that must apply in order for a binomial model to be valid.

Find the probability that in the geologist’s sample there will be …

b) … one rock containing fossils.

c) … at least one rock containing fossils.

The geologist selects a new sample of n rocks. He wants to have at least a 95% chance
that his new sample will contain fossils.

d) Determine the smallest value of n.

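
Part d) asks for the smallest n with P(at least one fossil) = 1 − 0.9ⁿ ≥ 0.95. A minimal sketch of a direct search follows, assuming the binomial model of part a) holds. It reads "one rock" in part b) as exactly one, and the function name is illustrative.

```python
def smallest_sample_size(p_fossil, target):
    """Smallest n with P(at least one success in n trials) >= target."""
    n = 1
    while 1 - (1 - p_fossil) ** n < target:
        n += 1
    return n

p = 0.10
exactly_one = 20 * p * (1 - p) ** 19     # part b), sample of 20 rocks
at_least_one = 1 - (1 - p) ** 20         # part c)
n_min = smallest_sample_size(p, 0.95)    # part d)

print(round(exactly_one, 4), round(at_least_one, 4), n_min)
# 0.2702 0.8784 29
```

The search agrees with the logarithm route: n ≥ ln(0.05) / ln(0.9) ≈ 28.4, so n = 29.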
Poisson Distribution

1. In the manufacture of glassware, bubbles can occur in the glass which reduces the
status of the glassware to that of a ‘second’. If, on average, one in every 1000 items
produced has a bubble, calculate the probability that exactly six items in a batch of
three thousand are seconds.

2. A manufacturer produces light-bulbs that are packed into boxes of 100. If quality
control studies indicate that 0.5% of the light-bulbs produced are defective, what
percentage of the boxes will contain: (a) no defective? (b) 2 or more defectives?

3. Suppose it has been observed that, on average, 180 cars per hour pass a specified
point on a particular road in the morning rush hour. Due to impending roadworks it is
estimated that congestion will occur closer to the city centre if more than 5 cars pass
the point in any one minute. What is the probability of congestion occurring?
4. The mean number of bacteria per millilitre of a liquid is known to be 6. Find the
probability that in 1 ml of the liquid, there will be: (a) 0, (b) 1, (c) 2, (d) 3, (e) less
than 4, (f) 6 bacteria.
5. A factory uses tools of a particular type. From time to time failures in these tools
occur and they need to be replaced. The number of such failures in a day has a
Poisson distribution with mean 1.25. At the beginning of a particular day there are
five replacement tools in stock. A new delivery of replacements will arrive after four
days. If all five spares are used before the new delivery arrives then further
replacements cannot be made until the delivery arrives. Find (a) the probability that
three replacements are required over the next four days. (b) the expected number of
replacements actually made over the next four days.

6. Suppose vehicles arrive at a signalised road intersection at an average rate of 360 per
hour and the cycle of the traffic lights is set at 40 seconds. In what percentage of
cycles will the number of vehicles arriving be (a) exactly 5, (b) less than 5? If, after
the lights change to green, there is time to clear only 5 vehicles before the signal
changes to red again, what is the probability that waiting vehicles are not cleared in
one cycle?
7. A manufacturer sells a certain article in batches of 5000. By agreement with a
customer the following method of inspection is adopted: A sample of 100 items is
drawn at random from each batch and inspected. If the sample contains 4 or fewer
defective items, then the batch is accepted by the customer. If more than 4 defectives
are found, every item in the batch is inspected. If inspection costs are 75 p per
hundred articles, and the manufacturer normally produces 2% of defective articles,
find the average inspection costs per batch.

8. The number of failures occurring in a machine of a certain type in a year has a
Poisson distribution with mean 0.4. In a factory there are ten of these machines. What
is (a) the expected total number of failures in the factory in a year? (b) the probability
that there are fewer than two failures in the factory in a year?
9. The number of misprints on a page of the Daily Mercury has a Poisson distribution
with mean 1.2. Find the probability that the number of errors (a) on page four is 2; (b)
on page three is less than 3; (c) on the first ten pages totals 5; (d) on all forty pages
adds up to at least 3.

10. A shop sells a particular make of video recorder. (a) Assuming that the weekly
demand for the video recorder is a Poisson variable with mean 3, find the probability
that the shop sells (i) at least 3 in a week, (ii) at most 7 in a week, (iii) more than 20 in
a month (4 weeks). Stocks are replenished only at the beginning of each month. (b)
Find the minimum number that should be in stock at the beginning of a month so that
the shop can be at least 95% sure of being able to meet the demands during the month.
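
Most problems in this section come down to choosing the right mean λ for the interval in question and then evaluating P(X = k) = e^(−λ) λᵏ / k!. The sketch below illustrates this for problems 1 and 3. It assumes the Poisson approximation to the binomial in problem 1 (λ = np), and the helper names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

# Problem 1: 3000 items, 1 in 1000 has a bubble, so lam = 3000 / 1000 = 3
p_six_seconds = poisson_pmf(6, 3000 / 1000)

# Problem 3: 180 cars per hour = 3 per minute; congestion if more than 5
p_congestion = 1 - poisson_cdf(5, 180 / 60)

print(round(p_six_seconds, 4), round(p_congestion, 4))  # 0.0504 0.0839
```

For problems that add up independent Poisson counts, such as the ten machines in problem 8 or the ten pages in problem 9, the means add, so the same helpers apply with λ multiplied accordingly.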

Normal Distribution

1. The length of human pregnancies from conception to birth approximates a normal
distribution with a mean of 266 days and a standard deviation of 16 days. What
proportion of all pregnancies will last between 240 and 270 days (roughly
between 8 and 9 months)?
2. The average number of acres burned by forest and range fires in a large New
Mexico county is 4,300 acres per year, with a standard deviation of 750 acres. The
distribution of the number of acres burned is normal. What is the probability that
between 2,500 and 4,200 acres will be burned in any given year?
3. If Z ~ N(0, 1), find (a) P(Z > 1.2) (b) P(−2.0 < Z < 2.0) (c) P(−1.2 < Z < 1.0)

4. If Z ~ N(0, 1), find a such that (a) P(Z < a) = 0.90 (b) P(Z > a) = 0.25
5. Eggs laid by a particular chicken are known to have lengths normally distributed,
with mean 6 cm and standard deviation 1.4 cm. What is the probability of: (a)
finding an egg bigger than 8 cm in length; (b) finding an egg smaller than 5 cm in
length?
6. If X ~ N(4, 9), find (a) P(X > 6) (b) P(X > 1)

7. The lifetimes of a certain brand of car tyres, in km, are Normally distributed with
a mean of 7500 . Find the standard deviation, if 5% of these tyres last less than
6000 km.

8. The lengths of pine needles, in cm, are Normally distributed. It is further given
that 11.51% of these pine needles are shorter than 6.2 cm and 3.59% are longer
than 9.5 cm. Find the mean and the standard deviation of the length of these pine
needles.
9. The weights of newly born kittens are Normally distributed. 4.95% of newly born
kittens are heavier than 122 grams and 10.56% are lighter than 93 grams. Find the
mean and the standard deviation of the weights of newly born kittens.
10. An airline operates between Manchester and Madrid. The flight time may be
modelled by a Normal distribution with mean of 85 minutes and standard
deviation 8 . In order to boost sales for the service, the airline decides to refund
the fares if a flight time exceeds the mean flight time by t minutes. The airline
does not want to refund more than 0.005 of the fares. Find the value of t , correct
to the nearest minute.
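
The normal distribution problems above fall into two types: finding a probability from given parameters, and working backwards from a probability to a parameter or cut-off. The sketch below shows one of each using `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of printed tables, so results may differ from table-based answers in the last decimal place. Variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # Z ~ N(0, 1)

# Problem 1: P(240 < X < 270) for mean 266 days, s.d. 16 days
pregnancy = NormalDist(mu=266, sigma=16)
p_240_270 = pregnancy.cdf(270) - pregnancy.cdf(240)

# Problem 7: mean 7500 km and P(X < 6000) = 0.05, solve for sigma
z_05 = std_normal.inv_cdf(0.05)          # about -1.645
tyre_sigma = (6000 - 7500) / z_05

# Problem 10: P(X > 85 + t) = 0.005 with s.d. 8 minutes, solve for t
t = 8 * std_normal.inv_cdf(1 - 0.005)

print(round(p_240_270, 4))   # 0.5466
print(round(tyre_sigma, 1))  # 911.9
print(round(t))              # 21
```

Problems 8 and 9 follow the same idea with two unknowns: each given percentage yields one equation of the form x = μ + zσ, and the pair is solved simultaneously.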
Gamma Distribution, Exponential Distribution, Weibull Distribution

1. Suppose that on average 1 customer per minute arrives at a shop. What is the
probability that the shopkeeper will wait more than 5 minutes before (i) both
of the first two customers arrive, and (ii) the first customer arrives? Assume
that waiting times follow a gamma distribution.
2. Let X ∼ N(2, 4) and Y = 3 − 2X.

Find P(X > 1).

Find P(−2 < Y < 1).

Find P(X > 2 | Y < 1).

3. Suppose X has a gamma distribution with parameters α = 8 and β = 15.
Compute P(60 ≤ X ≤ 120).
4. The lifetime T (years) of an electronic component is a continuous random
variable with a probability density function given by f(t) = e^{−t}, t ≥ 0 (i.e. λ = 1
or µ = 1). Find the lifetime L which a typical component is 60% certain to
exceed. If five components are sold to a manufacturer, find the probability that
at least one of them will have a lifetime less than L years.

5. Car cooling systems are controlled by electrically driven fans. Assuming that
the lifetime T in hours of a particular make of fan can be modelled by an
exponential distribution with λ = 0.0003 find the proportion of fans which will
give at least 10000 hours service. If the fan is redesigned so that its lifetime
may be modelled by an exponential distribution with λ = 0.00035, would you
expect more fans or fewer to give at least 10000 hours service?

6. The time intervals between successive barges passing a certain point on a busy
waterway have an exponential distribution with mean 8 minutes. (a) Find the
probability that the time interval between two successive barges is less than 5
minutes. (b) Find a time interval t such that we can be 95% sure that the time
interval between two successive barges will be greater than t.
7.
8. The lifetime X (in hundreds of hours) of a certain type of vacuum
tube has a Weibull distribution with parameters α=2 and β=3. Compute
the following:
a. E(X) and V(X)
b. P(X≤5)
c. P(1.8≤X≤5)
d. P(X≥3).
9. Assume that the life of a packaged magnetic disk exposed to corrosive gases
has a Weibull distribution with α=300 hours and β=0.5.
Calculate the probability that
a. a disk lasts at least 600 hours,
b. a disk fails before 500 hours.
10. A monitor issues a warning signal when an action is needed as part of a
production process. The interval, X hours, between successive signals follows
an exponential distribution with parameter 0.08. (i) Find the probability that
the interval between the next two signals is: a. Between 10 and 20 hours; b.
Less than two hours; c. Longer than 50 hours. (ii) State the mean and standard
deviation of the intervals between successive signals. (iii) Following a
warning signal, what is the longest time the production process could be left
unsupervised whilst ensuring the probability of missing the next signal is less
than 0.01?
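
For the exponential problems in this section everything follows from the survival function P(X > t) = e^(−λt). The sketch below works through problem 10 on that basis. It reads the "parameter 0.08" as the rate λ per hour and reads part (iii) as requiring P(X < t) < 0.01, and both readings are assumptions. Names are illustrative.

```python
from math import exp, log

rate = 0.08  # assumed to be the rate (lambda) per hour

def survival(t, lam=rate):
    """P(X > t) for an exponential interval with rate lam."""
    return exp(-lam * t)

# (i) probabilities for the next interval
p_10_to_20 = survival(10) - survival(20)
p_under_2 = 1 - survival(2)
p_over_50 = survival(50)

# (ii) mean and standard deviation are both 1 / lambda
mean_interval = 1 / rate
sd_interval = 1 / rate

# (iii) largest t with P(X < t) = 1 - exp(-rate * t) below 0.01
t_max = -log(1 - 0.01) / rate

print(round(p_10_to_20, 4), round(p_under_2, 4), round(p_over_50, 4))
# 0.2474 0.1479 0.0183
print(mean_interval, round(t_max, 4))  # 12.5 0.1256
```

On this reading the process could be left unsupervised for a little under 0.126 hours, roughly 7.5 minutes. Problem 6 uses the same functions with rate 1/8 per minute.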
