Statistics For Data Science Suggestion-1
SUGGESTION
Marks: 1
11. From a group of 7 men and 6 women, five persons are to be selected to form a committee
so that there are at least 3 men on the committee. In how many ways can it be done?
Answer: 756
12. In how many different ways can the letters of the word 'LEADING' be arranged in such a
way that the vowels always come together?
13. What is the probability that two cards drawn at random from a deck of playing cards will
both be aces? Answer: 1/221 (m)
14. A die is cast twice and a coin is tossed twice. What is the probability that the die will turn up a 6
each time and the coin will turn up a tail every time? Answer: 1/144 (m)
15. A die is cast 6 times. What is the probability that each throw will return a prime number?
Answer: 1/64
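The counts and probabilities in questions 11 to 15 can be checked directly. The following is a minimal Python sketch using only the standard library; the variable names are illustrative. Question 12 has no answer listed in the source, so the value it produces (5! × 3! = 720, treating the three vowels E, A, I as one block) is a computed result, not one taken from the original.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial

# Q11: 5 people from 7 men and 6 women with at least 3 men (3, 4 or 5 men).
committee = sum(comb(7, m) * comb(6, 5 - m) for m in range(3, 6))

# Q12: brute force over all 7! orderings of the distinct letters of LEADING,
# keeping those where the vowels E, A, I sit in three consecutive positions.
VOWELS = set("EAI")

def vowels_together(arrangement):
    positions = [i for i, ch in enumerate(arrangement) if ch in VOWELS]
    return max(positions) - min(positions) == 2

leading = sum(1 for p in permutations("LEADING") if vowels_together(p))
# Same count by treating the vowels as one block: arrange 5 units, then order the vowels.
leading_formula = factorial(5) * factorial(3)

# Q13: two cards drawn without replacement, both aces.
two_aces = Fraction(4, 52) * Fraction(3, 51)

# Q14: a 6 on both throws of the die and a tail on both tosses of the coin.
six_and_tails = Fraction(1, 6) ** 2 * Fraction(1, 2) ** 2

# Q15: six throws, each showing a prime (2, 3 or 5) with probability 3/6.
all_primes = Fraction(3, 6) ** 6

print(committee, leading, leading_formula, two_aces, six_and_tails, all_primes)
```

Exact `Fraction` arithmetic is used so the results can be compared with the fractional answers above without rounding.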
Marks: 5
2. The probability that a management trainee will remain with a company is 0.6. The
probability that an employee earns more than Rs. 10,000 per month is 0.50. The
probability that an employee is a management trainee who remained with the
company or who earns more than Rs. 10,000 per month is 0.70. What is the
probability that an employee earns more than Rs. 10,000 per month, given that he is a
management trainee who stayed with the company?
Answer: 2/3
3. There are 12 balls in a bag, 8 red and 4 green. Three balls are drawn successively
without replacement. What is the probability that they are alternately of the same
colour?
Answer: 8/33
4. A box contains 3 red and 7 white balls. One ball is drawn at random and in its place a
ball of the other colour is put in the box. Now one ball is drawn at random from the
box. Find the probability that it is red.
Answer: 0.34
5. A bag contains 30 balls numbered 1 through 30. Suppose drawing an even-numbered ball
is considered a success. Two balls are drawn from the bag with replacement. Find the
probability of getting:
(iv) No successes
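The Marks 5 questions above can be checked with exact fractions. A minimal sketch, assuming that in question 3 "alternately of the same colour" means the first and third balls match and the middle one differs (RGR or GRG). Question 5 (iv) has no answer listed; the sketch computes it from a success probability of 15/30 = 1/2 per draw.

```python
from fractions import Fraction as F

# Q2: A = trainee stays, B = earns more than Rs. 10,000 per month.
p_a, p_b, p_a_or_b = F(6, 10), F(5, 10), F(7, 10)
p_a_and_b = p_a + p_b - p_a_or_b          # addition theorem
p_b_given_a = p_a_and_b / p_a             # conditional probability

# Q3: 8 red, 4 green, three draws without replacement; first and third
# balls share a colour and the middle one differs (RGR or GRG).
rgr = F(8, 12) * F(4, 11) * F(7, 10)
grg = F(4, 12) * F(8, 11) * F(3, 10)
alternate = rgr + grg

# Q4: 3 red, 7 white; the drawn ball is replaced by one of the other colour.
red_first = F(3, 10) * F(2, 10)    # red drawn -> box now has 2 red out of 10
white_first = F(7, 10) * F(4, 10)  # white drawn -> box now has 4 red out of 10
second_red = red_first + white_first

# Q5 (iv): 15 of the 30 balls are even, so p = 1/2 on each draw with replacement.
p_success = F(15, 30)
no_successes = (1 - p_success) ** 2

print(p_b_given_a, alternate, second_red, float(second_red), no_successes)
```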
Marks: 15
3. If P(A) = 0.4, P(B) = 0.7 and P(at least one of A and B) = 0.8, find P(only one of A and
B).
Answer: P(A ∩ B) = 0.4 + 0.7 − 0.8 = 0.3, so P(only one of A and B) = 0.8 − 0.3 = 0.5
4. In a village of 21 inhabitants, a person tells a rumor to a second person, who in
turn repeats it to a third person etc. At each step the recipient of the rumor is
chosen at random from the 20 people available. Find the probability that the rumor
will be told 10 times without
a) Returning to the originator
b) Being repeated to any person
Answer: a) (19/20)^9   b) (20 × 19 × 18 × … × 11)/20^10
6. a) State and prove the addition theorem of probability for any two events A and B.
Rewrite the theorem when A and B are mutually exclusive.
d) Define independent events. Obtain the necessary and sufficient condition for
the independence of two events A and B. (5 +3+3+4)
7. a) Explain, with an example, the rules of addition and multiplication in the
theory of probability.
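A minimal sketch for questions 3 and 4. For question 3 it computes P(A ∩ B) = 0.4 + 0.7 − 0.8 = 0.3 and then P(only one of A and B) = 0.8 − 0.3 = 0.5. For question 4 it evaluates both closed forms exactly and adds a small Monte Carlo check; people are labelled 0 to 20 with 0 as the originator, and the `simulate` function, its trial count and its seed are illustrative choices, not part of the original problem.

```python
import random
from fractions import Fraction as F

# Question 3: addition theorem, then P(only one) = P(A or B) - P(A and B).
p_a, p_b, p_union = F(4, 10), F(7, 10), F(8, 10)
p_both = p_a + p_b - p_union
p_exactly_one = p_union - p_both

# Question 4: 21 villagers; each teller picks uniformly among the other 20.
N_OTHERS, TELLINGS = 20, 10

# (a) Tellings 2..10 must avoid the originator: 19 of the 20 choices are safe.
p_no_return = F(N_OTHERS - 1, N_OTHERS) ** (TELLINGS - 1)

# (b) The k-th telling must reach someone new: 20 - (k - 1) of the 20 choices.
p_no_repeat = F(1)
for k in range(1, TELLINGS + 1):
    p_no_repeat *= F(N_OTHERS - (k - 1), N_OTHERS)

def simulate(trials=20_000, seed=1):
    """Monte Carlo estimates of (a) and (b); person 0 is the originator."""
    rng = random.Random(seed)
    no_return = no_repeat = 0
    for _ in range(trials):
        heard, teller = {0}, 0
        returned = repeated = False
        for _ in range(TELLINGS):
            listener = rng.randrange(N_OTHERS)   # uniform over 20 slots
            if listener >= teller:                # shift past the teller's own label
                listener += 1
            returned = returned or listener == 0
            repeated = repeated or listener in heard
            heard.add(listener)
            teller = listener
        no_return += not returned
        no_repeat += not repeated
    return no_return / trials, no_repeat / trials

print(p_exactly_one, float(p_no_return), float(p_no_repeat))
print(simulate())
```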
MARKS: 1
1. Consider a die with the property that the probability of a face with n dots showing
up is proportional to n. What is the probability of the face showing 4 dots? Answer: 4/21
2. Runs scored by a batsman in 5 one-day matches are 50, 70, 82, 93 and 20. What is the standard
deviation? The mean of the 5 innings is
(50 + 70 + 82 + 93 + 20) ÷ 5 = 63
S.D. = √[(1/n) Σ (xᵢ − mean)^2] = √[(169 + 49 + 361 + 900 + 1849)/5] = √665.6
S.D. ≈ 25.80.
3. Find the median and mode of the numbers of messages received on 9 consecutive days: 15, 11, 9, 5,
18, 4, 15, 13, 17. Arranging the terms in ascending order: 4, 5, 9, 11, 13, 15, 15, 17,
18.
The median is the ((n + 1)/2)th term; as n = 9 (odd), this is the 5th term = 13.
Mode = 15, which is repeated twice.
4. If E denotes expectation, how is the variance of a random variable X expressed? By the
properties of expectation,
V(X) = E(X^2) − (E(X))^2.
5. What is the area under a conditional Cumulative density function?
9. Let X be a random variable with probability density function f(x) = 0.2 for |x| < 1
= 0.1 for 1 < |x| < 4
= 0 otherwise.
What is P(0.5 < X < 5)?
Explanation: Integrate f(x) from 0.5 to 5 by splitting the range into three parts,
0.5 to 1, 1 to 4, and 4 to 5:
P(0.5 < X < 5) = 0.2 × 0.5 + 0.1 × 3 + 0 = 0.1 + 0.3 + 0
P(0.5 < X < 5) = 0.4.
10. If f(x) is a probability density function of a continuous random variable,
then ∫ f(x) dx taken from −∞ to ∞ = ? Answer: 1
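The Marks 1 answers above (questions 1 to 4 and 9) can be reproduced with Python's standard library (`statistics.pstdev`, `statistics.median`, `statistics.multimode`, which needs Python 3.8 or later, and `fractions.Fraction`). This is a minimal sketch; the fair-die check of V(X) = E(X^2) − (E(X))^2 and the helper `prob_between` are illustrative additions, and `prob_between` only handles intervals with a, b ≥ 0.

```python
from fractions import Fraction as F
from statistics import median, multimode, pstdev

# Q1: P(face n) proportional to n, so P(n) = n / (1 + 2 + ... + 6) = n / 21.
p_four = F(4, sum(range(1, 7)))

# Q2: population standard deviation (divides by n, as in the S.D. formula above).
runs = [50, 70, 82, 93, 20]
mean_runs = sum(runs) / len(runs)
sd_runs = pstdev(runs)

# Q3: median and mode(s) of the message counts.
messages = [15, 11, 9, 5, 18, 4, 15, 13, 17]
msg_median = median(messages)
msg_modes = multimode(messages)

# Q4: V(X) = E(X^2) - (E(X))^2, checked here on a fair six-sided die.
faces = range(1, 7)
e_x = F(sum(faces), 6)
e_x2 = F(sum(x * x for x in faces), 6)
var_die = e_x2 - e_x ** 2

# Q9: f is piecewise constant, so each integral is height * width of overlap.
def prob_between(a, b):
    """P(a < X < b) for the density of question 9, assuming 0 <= a <= b."""
    pieces = [(0.0, 1.0, 0.2), (1.0, 4.0, 0.1)]   # (start, end, density) for x >= 0
    total = 0.0
    for lo, hi, height in pieces:
        overlap = max(0.0, min(b, hi) - max(a, lo))
        total += height * overlap
    return total

print(p_four, mean_runs, round(sd_runs, 2), msg_median, msg_modes, var_die)
print(round(prob_between(0.5, 5), 4))
```

`pstdev` divides by n, matching the S.D. formula in question 2; `statistics.stdev` would divide by n − 1 instead.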
MARKS:5
1. (i) If the probability that a bomb dropped from a plane will strike the target is
60% and 10 bombs are dropped, find the mean and variance.
(ii) If P(1) = P(3) in a Poisson distribution, what is the mean?
(iii) Find the mean number of heads when 8 coins are tossed.
(i) Explanation: Here, p = 60% = 0.6 and q = 1-p = 40% = 0.4 and n = 10
Therefore, mean = np = 6
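Variance = npq = (10)(0.6)(0.4) = 2.4.
(ii) P(x) = e^(−λ) λ^x / x!
Therefore, P(3) = e^(−λ) λ^3 / 3!
and P(1) = e^(−λ) λ^1 / 1!
P(1) = P(3) ⇒ λ = λ^3 / 6 ⇒ λ^2 = 6
Therefore, λ = √6, and since the mean of a Poisson distribution is λ, the mean is √6.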
(iii) Explanation: p = 1⁄2
n=8
q = 1 ⁄2
Therefore, mean = np = 8 * 1⁄2 = 4.
2. (i) If 40% of boys opted for maths and 60% of girls opted for maths, what is
the probability that maths is chosen, if half of the class's population is girls?
Let E1 and E2 be the events of selecting a boy and a girl respectively, and let A be the
event of selecting a maths student.
P(A) = P(E1) P(A|E1) + P(E2) P(A|E2)
= (1/2)(40/100) + (1/2)(60/100)
= 0.5.
(ii) Company A produces 10% defective products, Company B produces 20%
defective products and C produces 5% defective products. If choosing a company
is an equally likely event, then find the probability that the product chosen is
defective. Answer: 0.12
3. (i) Suppose 5 out of 100 men and 10 out of 250 women are colour blind. Find the
total probability that a person is colour blind. (Assume that men and women are in
equal numbers.) Answer: 0.045
(ii) A problem is given to 5 students P, Q, R, S, T. If their probabilities of solving the
problem individually are 1/2, 1/3, 2/3, 1/5 and 1/6 respectively, find the probability
that the problem is solved (assuming the students work independently).
Answer: 1 − (1/2)(2/3)(1/3)(4/5)(5/6) = 1 − 2/27 = 25/27 ≈ 0.93
4. Let X and Y be jointly continuous random variables with joint PDF
f_XY(x, y) = cx + 1 for x, y ≥ 0, x + y < 1
= 0 otherwise
(i) Show the range of (X, Y), R_XY, in the x-y plane.
(ii) Find the constant c.
(iii) Find the marginal PDFs f_X(x) and f_Y(y).
(iv) Find P(Y < 2X^2).
5. Let X and Y be jointly continuous random variables with joint PDF
f_XY(x, y) = 6e^(−(2x + 3y)) for x, y ≥ 0
= 0 otherwise
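A minimal sketch that checks questions 1 to 4 of this set. Questions 2 (i), 2 (ii) and 3 (i) all use the theorem of total probability, so one small helper, `total_probability` (a name chosen here for illustration), covers them. Question 3 (ii) is computed as 1 minus the probability that all five students fail, assuming they work independently; this gives 25/27 ≈ 0.93. For question 4 the integrals are done numerically with a simple midpoint rule as a check on the hand calculation: the total mass is c/6 + 1/2, so c = 3, and the marginals are f_X(x) = (3x + 1)(1 − x) and f_Y(y) = 3(1 − y)^2/2 + (1 − y) on [0, 1]. The curve y = 2x^2 meets the line x + y = 1 at x = 1/2, so P(Y < 2X^2) splits into two integrals and comes to 53/96 ≈ 0.552.

```python
from fractions import Fraction as F
from math import sqrt

# Q1 (i): binomial with n = 10, p = 0.6.
n, p = 10, F(6, 10)
bomb_mean, bomb_var = n * p, n * p * (1 - p)
# Q1 (ii): e^-λ λ = e^-λ λ^3 / 3!  =>  λ^2 = 6.
lam = sqrt(6)
# Q1 (iii): heads in 8 fair coin tosses, mean n * p.
coin_mean = 8 * F(1, 2)

def total_probability(priors, conditionals):
    """P(A) = sum of P(E_i) * P(A | E_i) over a partition E_1, E_2, ..."""
    return sum(pe * pa for pe, pa in zip(priors, conditionals))

maths = total_probability([F(1, 2)] * 2, [F(40, 100), F(60, 100)])                 # Q2 (i)
defective = total_probability([F(1, 3)] * 3, [F(10, 100), F(20, 100), F(5, 100)])  # Q2 (ii)
colour_blind = total_probability([F(1, 2)] * 2, [F(5, 100), F(10, 250)])           # Q3 (i)

# Q3 (ii): solved unless every student fails (students assumed independent).
all_fail = F(1)
for s in [F(1, 2), F(1, 3), F(2, 3), F(1, 5), F(1, 6)]:
    all_fail *= 1 - s
solved = 1 - all_fail

# Q4: f(x, y) = c*x + 1 on the triangle x, y >= 0, x + y < 1.
def midpoint(g, a, b, steps=100_000):
    """Composite midpoint rule for a 1-D integrand g on [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

# f does not depend on y, so integrating y over [0, 1 - x) multiplies by (1 - x).
coef_c = midpoint(lambda x: x * (1 - x), 0, 1)
const = midpoint(lambda x: 1 - x, 0, 1)
c = (1 - const) / coef_c                      # solve c * coef_c + const = 1

def f_X(x):
    """Marginal PDF of X on [0, 1]."""
    return (c * x + 1) * (1 - x)

def f_Y(y):
    """Marginal PDF of Y on [0, 1]."""
    return c * (1 - y) ** 2 / 2 + (1 - y)

# P(Y < 2X^2): for each x, y runs from 0 to min(2x^2, 1 - x).
p_y_lt = midpoint(lambda x: (c * x + 1) * min(2 * x * x, 1 - x), 0, 1)

print(bomb_mean, bomb_var, round(lam, 4), coin_mean)
print(maths, defective, colour_blind, solved)
print(round(c, 4), round(p_y_lt, 4))
```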
Marks: 15
Marks: 1
1. Six men and five women apply for an executive position in a small company. Two of the
applicants are selected for an interview. Let X denote the number of women in the interview
pool. Find the probability mass function of X. (easy)
2. An urn contains four balls of red, black, green and blue colours. There is an equal
probability of getting any coloured ball. What is the expected number of blue balls in
30 draws with replacement? (easy)
E(X) = N × p
= 30 × 0.25
= 7.5
Solution:
P(7) = 0.193
Therefore, the probability of getting exactly 7 heads is 0.193.
9. The probability that a person can achieve a target is 3/4. The number of tries is 5. What
is the probability that he will attain the target at least thrice? (Med)
Solution:
Given that p = 3/4, q = 1/4, n = 5.
Using the binomial distribution formula, P(X = x) = nCx · p^x (1 − p)^(n−x).
Thus, the required probability is P(X = 3) + P(X = 4) + P(X = 5)
= 5C3 · (3/4)^3 (1/4)^2 + 5C4 · (3/4)^4 (1/4)^1 + 5C5 · (3/4)^5
= 270/1024 + 405/1024 + 243/1024 = 918/1024 = 459/512.
Therefore, the probability that the person will attain the target at least thrice is
459/512.
10. A fair coin is tossed n times. The probability of a head occurring six times is the same
as the probability of a head occurring 8 times. Find the value of n. (Hard)
Solution:
The probability that head occurs 6 times = nC6 (½)^6 (½)^(n−6) = nC6 (½)^n.
Similarly, the probability that head occurs 8 times = nC8 (½)^8 (½)^(n−8) = nC8 (½)^n.
Given that the probability of a head occurring six times is the same as the probability
of a head occurring 8 times,
⇒ nC6 = nC8
⇒ 6 = n − 8 (since nCr = nC(n−r) and 6 ≠ 8)
⇒ n = 14.
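These binomial and hypergeometric calculations can be checked with exact fractions. A minimal sketch; the PMF in question 1 is not written out in the source, so the values it produces (3/11, 6/11 and 2/11 for 0, 1 and 2 women) are computed here. The search in question 10 simply tries n = 8, 9, ..., 99 and stops at the first n where the two probabilities agree.

```python
from fractions import Fraction as F
from math import comb

# Q1: X = number of women when 2 of the 11 applicants (6 men, 5 women) are chosen.
pmf_women = {k: F(comb(5, k) * comb(6, 2 - k), comb(11, 2)) for k in range(3)}

# Q2: expected count of blue in 30 draws with replacement, E(X) = N * p.
expected_blue = 30 * F(1, 4)

def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Q9: at least 3 successes in 5 tries with p = 3/4.
at_least_three = sum(binom_pmf(5, k, F(3, 4)) for k in range(3, 6))

# Q10: first n for which 6 heads and 8 heads are equally likely with a fair coin.
n_equal = next(n for n in range(8, 100)
               if binom_pmf(n, 6, F(1, 2)) == binom_pmf(n, 8, F(1, 2)))

print(pmf_women, expected_blue, at_least_three, n_equal)
```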
Marks: 5
1. Determine the mean and variance of the random variable X having the following
probability distribution. (Easy)
Answer:
2. Define the normal distribution. What is an SNV (standard normal variate)? Explain the
advantages of the normal distribution. (Easy)
3. Consider a random variable X with probability density function
Answer:
From the given data, you first calculate the probability distribution of the random variable.
Then using it you calculate mean and variance.
4. (a)A radar unit is used to measure speeds of cars on a motorway. The speeds are
normally distributed with a mean of 90 km/hr and a standard deviation of 10
km/hr. What is the probability that a car picked at random is travelling at more
than 100 km/hr?
Answer: Let x be the random variable that represents the speed of cars. x has μ = 90
and σ = 10. We have to find the probability that x is higher than 100 or P(x > 100)
For x = 100 , z = (100 - 90) / 10 = 1
P(x > 100) = P(z > 1) = [total area] - [area to the left of z = 1]
= 1 - 0.8413 = 0.1587
The probability that a car selected at a random has a speed greater than 100 km/hr is
equal to 0.1587
(b) For a certain type of computers, the length of time between charges of the
battery is normally distributed with a mean of 50 hours and a standard deviation
of 15 hours. John owns one of these computers and wants to know the probability
that the length of time will be between 50 and 70 hours.
Answer: Let x be the random variable that represents the length of time. It has a
mean of 50 and a standard deviation of 15. We have to find the probability that x
is between 50 and 70 or P( 50< x < 70)
For x = 50 , z = (50 - 50) / 15 = 0
For x = 70 , z = (70 - 50) / 15 = 1.33 (rounded to 2 decimal places)
P( 50< x < 70) = P( 0< z < 1.33) = [area to the left of z = 1.33] - [area to the left of
z = 0]
= 0.9082 - 0.5 = 0.4082
The probability that John's computer has a length of time between 50 and 70 hours
is equal to 0.4082.
Answer: Let x be the random variable that represents the scores. x is normally
distributed with a mean of 500 and a standard deviation of 100. The total area under
the normal curve represents the total number of students who took the test. If we
multiply the values of the areas under the curve by 100, we obtain percentages.
For x = 585 , z = (585 - 500) / 100 = 0.85
The proportion P of students who scored below 585 is given by
P = [area to the left of z = 0.85] = 0.8023 = 80.23%
Tom scored better than 80.23% of the students who took the test and he will be
admitted to this University.
1. Note: What is meant here by area is the area under the standard normal curve.
a) For x = 40, the z-value z = (40 - 30) / 4 = 2.5
Hence P(x < 40) = P(z < 2.5) = [area to the left of 2.5] = 0.9938
b) For x = 21, z = (21 - 30) / 4 = -2.25
Hence P(x > 21) = P(z > -2.25) = [total area] - [area to the left of -2.25]
= 1 - 0.0122 = 0.9878
c) For x = 30 , z = (30 - 30) / 4 = 0 and for x = 35, z = (35 - 30) / 4 = 1.25
Hence P(30 < x < 35) = P(0 < z < 1.25) = [area to the left of z = 1.25] - [area to the
left of 0]
= 0.8944 - 0.5 = 0.3944
5. a) For x = 80, z = 1.
The area to the right of (higher than) z = 1 is 0.1587, so 15.87% scored more than 80.
b) For x = 60, z = -1.
The area to the right of z = -1 is 0.8413, so 84.13% should pass the test.
c) 100% - 84.13% = 15.87% should fail the test.
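The table look-ups above can be reproduced with the standard normal CDF, Φ(z) = (1 + erf(z/√2))/2, using `math.erf` from Python's standard library. A minimal sketch; the helper name `normal_cdf` is illustrative. Because the sketch does not round z to two decimals, 4(b) comes out near 0.4088 rather than the table value 0.4082. Question 5 is evaluated directly from the z-values given in the text.

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), via Phi(z) = (1 + erf(z / sqrt 2)) / 2."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# 4(a): speeds ~ N(90, 10^2), P(x > 100).
p_fast = 1 - normal_cdf(100, 90, 10)
# 4(b): battery life ~ N(50, 15^2), P(50 < x < 70); z = 1.333..., not rounded.
p_battery = normal_cdf(70, 50, 15) - normal_cdf(50, 50, 15)
# Tom: scores ~ N(500, 100^2), proportion scoring below 585.
p_below_585 = normal_cdf(585, 500, 100)
# Parts a) to c): mean 30, standard deviation 4.
p_a = normal_cdf(40, 30, 4)
p_b = 1 - normal_cdf(21, 30, 4)
p_c = normal_cdf(35, 30, 4) - normal_cdf(30, 30, 4)
# Question 5: z = 1 at x = 80 and z = -1 at x = 60.
above_80 = 1 - normal_cdf(1)
pass_rate = 1 - normal_cdf(-1)

results = [("4(a)", p_fast), ("4(b)", p_battery), ("Tom", p_below_585),
           ("a", p_a), ("b", p_b), ("c", p_c), ("5(a)", above_80), ("5(b)", pass_rate)]
for name, value in results:
    print(name, round(value, 4))
```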
CHAPTER 4
Marks: 1
1. What is the formula for calculating the confidence interval in confidence interval
estimation?
2. What is a range or set of values that has a chance of containing the value of a
population parameter at a particular confidence level called?
3. If the sample size is greater than or equal to 30, the sample standard deviation can be
used to approximate the population standard deviation for the?
4. Considering a sample statistic, if the mean of its sampling distribution is equal to the
population mean, how is the sample statistic classified?
5. How is the value of a sample statistic that is used to estimate a parameter of the
population classified?
6. How is the method in which a sample statistic is used to estimate the value of a
parameter of the population classified?
7. What is the confidence interval if the point estimate is 8 and the margin of error is 5?
8. What is the distance between the true value of a population parameter and its estimated
value called?
9. Which hypothesis test should be used to ascertain an improvement in workers'
performance before and after training?
10. Write one advantage of simple random sampling.
11. What do you mean by likelihood inference?
12. What do you mean by finite population?
13. What do you mean by posterior distribution?
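The interval in question 7 is the point estimate plus or minus the margin of error. A minimal sketch; the large-sample margin z·s/√n and the second example's numbers (mean 52, s = 10, n = 100, z = 1.96) are hypothetical illustrations, not values from the source.

```python
from math import sqrt

def confidence_interval(point_estimate, margin_of_error):
    """Interval estimate: point estimate plus or minus the margin of error."""
    return point_estimate - margin_of_error, point_estimate + margin_of_error

def z_margin(z, std_dev, n):
    """Large-sample margin of error z * s / sqrt(n) for a mean (assumes n >= 30)."""
    return z * std_dev / sqrt(n)

# Question 7: point estimate 8, margin of error 5.
print(confidence_interval(8, 5))

# Hypothetical example: sample mean 52, s = 10, n = 100, 95% level (z = 1.96).
print(confidence_interval(52, z_margin(1.96, 10, 100)))
```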
Marks 5
1. Write the steps of simple random sampling.
2. What do you mean by simple random sampling?
3. What do you mean by Maximum likelihood estimation?
4. What is the need for MLE in data science?
5. Suppose that you would like to estimate the portion of voters in your town that plan
to vote for Party A in an upcoming election. To do so, you take a random sample of
size n from the likely voters in the town. Since you have a limited amount of time and
resources, your sample is relatively small. Specifically, suppose that n=20. After
doing your sampling, you find out that 6 people in your sample say they will vote for
Party A.
For simplicity, suppose your prior beliefs about the proportion p of voters who plan to
vote for Party A have the following
distribution:
p Pr(p)
0 1/11
0.10 1/11
0.20 1/11
0.30 1/11
0.40 1/11
0.50 1/11
0.60 1/11
0.70 1/11
0.80 1/11
0.90 1/11
1.00 1/11
a) What is the posterior distribution of p?
b) What is the posterior probability that p exceeds 50%?
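Worked illustration of the method (a 6-point prior on p, with X = 28 successes observed in n = 127 trials):
p      Pr(p)   Pr(X=28 | p)     Pr(X=28, p)      Pr(p | X=28)
0      1/6     0                0                0
0.10   1/6     0.0000312        5.21 × 10^-6     0.00037
0.20   1/6     0.0724           0.012            0.8656
0.30   1/6     0.0112           0.00186          0.1339
0.40   1/6     0.000008         1.38 × 10^-6     0.0001
0.50   1/6     6.2 × 10^-11     1.04 × 10^-11    7.4 × 10^-10
Pr(X=28) = 0.013933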
Each entry in the third column is obtained by using the binomial formula for the
corresponding value of p. For example,
Pr(X=28 | p=0.20) = (127!)/(28! 99!) (0.20)^28 (0.80)^99 = 0.0724
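Each entry in the fourth column is obtained from the multiplication rule:
Pr(X=28, p) = Pr(p) Pr(X=28 | p)
Each entry in the fifth column is the posterior probability, obtained by dividing by
Pr(X=28), the sum of the fourth column:
Pr(p | X=28) = Pr(X=28, p) / Pr(X=28)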
Note that this prior distribution is very strong, in that it forces p to equal only one of 6
values. A more realistic prior distribution would allow p to range from 0 to 1. But,
that's more complicated computationally than we need to show the general idea of
Bayesian statistics.
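The worked table above applies the multiplication rule and then divides by Pr(X = x). The sketch below applies the same steps to the voter question as stated (n = 20, 6 successes, a prior of 1/11 on each of p = 0, 0.1, ..., 1.0); it is a minimal illustration with exact fractions, and the function name `posterior` is chosen here. With these inputs the posterior is largest at p = 0.3 (about 0.403), and the posterior probability that p exceeds 50% is about 0.011. The maximum likelihood estimate x/n = 0.3 is printed for comparison.

```python
from fractions import Fraction as F
from math import comb

def posterior(prior, n, x):
    """Discrete Bayes update for a binomial proportion.

    prior: dict mapping each candidate p to Pr(p).
    Returns a dict mapping p to Pr(p | X = x) after x successes in n trials.
    """
    joint = {p: w * comb(n, x) * p ** x * (1 - p) ** (n - x) for p, w in prior.items()}
    evidence = sum(joint.values())                 # Pr(X = x)
    return {p: j / evidence for p, j in joint.items()}

# Voter question: n = 20, x = 6, prior 1/11 on p = 0, 0.1, ..., 1.0.
grid = [F(k, 10) for k in range(11)]
prior = {p: F(1, 11) for p in grid}
post = posterior(prior, n=20, x=6)

# (b) Posterior probability that p exceeds 50%.
p_over_half = sum(w for p, w in post.items() if p > F(1, 2))

# Maximum likelihood estimate of p for comparison: x / n.
mle = F(6, 20)

for p in grid:
    print(f"{float(p):.1f}  {float(post[p]):.4f}")
print("Pr(p > 0.5 | X = 6) =", float(p_over_half), " MLE =", float(mle))
```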
Marks: 15
1. A machine is built to make mass-produced items. Each item made by the machine has
a probability p of being defective. Given the value of p, the items are independent of
each other. Because of the way in which the machines are made, p could take one of
several values. In fact p = X/100 where X has a discrete uniform distribution on the
interval [0, 5]. The machine is tested by counting the number of items made before a
defective is produced. Find the conditional probability distribution of X given that the
first defective item is the thirteenth to be made.
2. What do you mean by a hypothetical population? Give an example. Define the terms
population and sample. Write the differences between a sample and a population. What
do you mean by population parameter and sample statistic?
3. What do you mean by optimal inference? Define inference based on MLE. What
is the procedure for statistical inference? What are the three components of statistical
inference?
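For the machine question at the start of this Marks 15 set, the number of good items before the first defective is geometric given p, so the likelihood of the first defective being the thirteenth item is (1 − p)^12 p. A minimal sketch, assuming "discrete uniform on [0, 5]" means X takes each of the integers 0 to 5 with probability 1/6. X = 0 gets posterior probability 0, and the posterior rises from about 0.091 at X = 1 to about 0.279 at X = 5.

```python
from fractions import Fraction as F

# Assumption: X is uniform on the integers 0, 1, ..., 5 and p = X / 100.
prior = {x: F(1, 6) for x in range(6)}

def likelihood_first_defective(p, k=13):
    """Geometric likelihood: k - 1 good items followed by a defective one."""
    return (1 - p) ** (k - 1) * p

# Multiplication rule, then normalise by the total probability of the data.
joint = {x: w * likelihood_first_defective(F(x, 100)) for x, w in prior.items()}
evidence = sum(joint.values())
posterior = {x: j / evidence for x, j in joint.items()}

for x, prob in posterior.items():
    print(x, round(float(prob), 4))
```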
Chapter 5
Marks 1
Marks 5
1. What is a Markov process? Define a Markov chain. Give an example.
2. Define a Poisson process. Why is it needed?
3. What do you mean by time series data? Why is this kind of data required?
4. Under what conditions does a stochastic process become a counting process?
A counting process is a stochastic process {N(t), t ≥ 0} whose values are
non-negative, integer and non-decreasing: (i) N(t) ≥ 0; (ii) N(t) is an integer;
(iii) if s ≤ t, then N(s) ≤ N(t).
5. What is state space in stochastic process?
The range (possible values) of the random variables in a stochastic
process is called the state space of the process.
6. Why do we need stochastic processes?
A stochastic process is a model for the analysis of time series. The
stochastic process is considered to generate the infinite collection (called the
ensemble) of all possible time series that might have been observed. Every
member of the ensemble is a possible realization of the stochastic process.
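A Poisson process is a standard example of a counting process. The sketch below builds N(t) from exponential inter-arrival times and checks the three counting-process conditions on each simulated path; each path is one member of the ensemble described in question 6. The rate 2.0, the horizon 5.0 and the seed are hypothetical choices for illustration.

```python
import random

def poisson_process_path(rate, horizon, rng):
    """Arrival times of a Poisson process on [0, horizon]: i.i.d. exponential gaps."""
    times, t = [], rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times

def counting_function(arrivals):
    """Return N(t) = number of arrivals in [0, t]."""
    return lambda t: sum(1 for a in arrivals if a <= t)

rng = random.Random(42)
# Three realisations (members of the ensemble) of a rate-2 process on [0, 5].
paths = [counting_function(poisson_process_path(2.0, 5.0, rng)) for _ in range(3)]
grid = [0.5 * i for i in range(11)]          # t = 0, 0.5, ..., 5
for N in paths:
    values = [N(t) for t in grid]
    # Counting-process properties: non-negative, integer-valued, non-decreasing.
    assert all(v >= 0 and isinstance(v, int) for v in values)
    assert all(a <= b for a, b in zip(values, values[1:]))
    print(values)
```

For a Poisson process with rate λ, the mean of N(t) is λt, so averaging many simulated values of N(5) should give roughly 2.0 × 5 = 10.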
Marks: 15