Sample Questions: Subject Name: Semester: VI
Computer Engineering
Q:1.
Multiple linear regression (MLR) is a __________ type of statistical analysis.
Options
a) Univariate
b) Bivariate
c) Multivariate
d) Trivariate
Q:2.
A linear regression (LR) analysis produces the equation Y = 0.4X + 3. This indicates that:
Options
a) When Y = 0.4, X = 3
b) When Y = 0, X = 3
c) When X = 3, Y = 0.4
d) When X = 0, Y = 3
Q:3.
An LR analysis produces the equation Y = -3.2X + 7. This indicates that:
Options
a) A 1 unit increase in X results in a 3.2 unit decrease in Y.
b) A 1 unit decrease in X results in a 3.2 unit decrease in Y.
c) A 1 unit increase in X results in a 3.2 unit increase in Y.
d) An X value of 0 would increase Y by 7.
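Questions 2 and 3 both turn on reading the slope and intercept of a fitted line. A minimal sketch of that reading, where the helper name predict is only an illustration:

```python
def predict(slope, intercept, x):
    """Value of Y on the line Y = slope * X + intercept."""
    return slope * x + intercept

# Question 2: Y = 0.4X + 3, so the intercept is the value of Y at X = 0.
y_at_zero = predict(0.4, 3, 0)

# Question 3: Y = -3.2X + 7, so a 1-unit increase in X changes Y by the slope.
change_per_unit = predict(-3.2, 7, 1) - predict(-3.2, 7, 0)

print(y_at_zero)                  # 3.0
print(round(change_per_unit, 2))  # -3.2
```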
4. When writing regression formulae, which of the following refers to the predicted value on
the dependent variable (DV)?
a) Y
b) Y (hat)
c) X
d) X (hat)
5. In MLR, the square of the multiple correlation coefficient, or R², is called the
a) Coefficient of determination
b) Variance
c) Covariance
d) Cross-product
6. Which of the following is true about the adjusted R²?
a) It is usually larger than the R²
b) It is only used when there is just one predictor
c) It is usually smaller than the R²
d) It is used to determine whether residuals are normally distributed
7. The least squares method calculates the best-fitting line for the observed data by minimizing
the sum of the squares of the _______ deviations.
a) Vertical
b) Horizontal
c) Both of these
d) None of these
8. A residual is defined as
a) The difference between the actual Y values and the mean of Y.
b) The difference between the actual Y values and the predicted Y values.
c) The predicted value of Y for the average X value.
d) The square root of the slope.
9. The correct relationship between SST, SSR, and SSE is given by:
a) SSR = SST + SSE
b) SST = SSR + SSE
c) SSE = SSR – SST
d) All of the above
10.
Below you are given a summary of the output from a simple linear regression analysis
from a sample of size 15: SSR = 100, SST = 152. The coefficient of determination is:
r² = SSR/SST = 1 − SSE/SST
a) 0.5200
b) 0.6579
c) 0.8111
d) 1.52
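The arithmetic for question 10 can be checked directly. A short sketch using only the two sums of squares given in the question:

```python
ssr = 100.0  # regression (explained) sum of squares, from the question
sst = 152.0  # total sum of squares, from the question
sse = sst - ssr  # error sum of squares, using SST = SSR + SSE

r_squared = ssr / sst
r_squared_alt = 1 - sse / sst  # the equivalent form 1 - SSE/SST

print(round(r_squared, 4))      # 0.6579
print(round(r_squared_alt, 4))  # 0.6579
```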
11.
Significance for the coefficients (b) is determined by
a) an F-test.
b) an R² test.
c) a correlation coefficient.
d) a t-test.
Q:12.
A researcher polls people as they walk by on the street. Which sampling method is this?
Options
a) Systematic Random Sample
b) Convenience Sampling
c) Judgmental Sampling
d) Quota Sampling
Q:13.
Inspectors for a hospital chain with multiple locations randomly select some of their
locations for a cleanliness check of their operating rooms. Which sampling method is this?
Options
a) Cluster sampling
b) Stratified Sampling
c) Quota Sampling
d) Snowball Sampling
Q:14.
The runs scored by a batsman in 5 ODIs are 31,97,112, 63, and 12. The standard
deviation is
Options
1: 24.79
2: 23.79
3: 25.79
4: 26.79
Q:15.
Find the mode of the calls received on 7 consecutive days: 11, 13, 13, 17, 19, 23, 25
Options
1: 11
2: 13
3: 17
4: 23
Q:16.
Find the median of the calls received on 7 consecutive days: 11, 13, 17, 13, 23, 25, 19
Options
1: 13
2: 23
3: 25
4: 17
Median
The median is the middlemost value of a set of observations when the
observations are arranged in order of magnitude (either ascending or
descending).
EVALUATION
The given observations are
11, 13, 17, 13, 23, 25, 19
Rearranging in ascending order, we get
11, 13, 13, 17, 19, 23, 25
The total number of observations is 7, which is odd.
Hence the required median
= the middlemost value of the set of observations
= the 4th observation
= 17
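The mode and median for questions 15 and 16 can be checked with Python's standard statistics module. A minimal sketch using the call counts from the questions:

```python
import statistics

calls = [11, 13, 17, 13, 23, 25, 19]  # calls on 7 consecutive days

ordered = sorted(calls)             # [11, 13, 13, 17, 19, 23, 25]
middle = ordered[len(ordered) // 2]  # 4th observation, since n = 7 is odd

print(statistics.mode(calls))    # 13
print(statistics.median(calls))  # 17
print(middle)                    # 17
```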
Q:17.
If the probability of hitting an object is 0.8, find the variance
Options
1: 0.18
2: 0.16
3: 0.14
4: 0.12
Given,
p = 0.8
q = 1 − p
= 1 − 0.8
= 0.2
Variance = pq
= 0.8 × 0.2
= 0.16
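The variance of a single trial with success probability p is p(1 − p). A minimal sketch, assuming the question treats one shot at the object as a single Bernoulli trial:

```python
p = 0.8    # probability of hitting the object, from the question
q = 1 - p  # probability of missing

variance = p * q  # variance of one Bernoulli trial (assumed reading of the question)

print(round(q, 2))         # 0.2
print(round(variance, 2))  # 0.16
```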
Q:18.
E(X) = λ is used for which distribution?
Options
1: Binomial distribution
2: Poisson's distribution
3: Bernoulli's distribution
4: Laplace distribution
In the Poisson distribution, the positive constant λ is both the mean and the
variance of the distribution. The Poisson distribution predicts how many of a
certain type of event will occur in a bounded area or during a given period,
provided that the events occur independently and cannot occur simultaneously.
The events are sometimes called "outcomes" or "observed occurrences".
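That λ is both the mean and the variance can be checked numerically from the probability mass function P(X = k) = e^(−λ) λ^k / k!. A rough sketch, where the rate lam = 3.0 is a hypothetical value chosen only for illustration:

```python
import math

lam = 3.0  # hypothetical rate, not taken from the question

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

ks = range(60)  # the tail beyond k = 60 is negligible for lam = 3
mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)

print(round(mean, 6), round(variance, 6))  # 3.0 3.0
```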
Q:19.
The classification of data on a geographical basis is also called
Options
1: reflected classification
2: populated classification
3: sampling classification
4: spatial classification
Q:20.
The summary and presentation of data in tabular form with several non-overlapping
classes is referred to as a
Options
1: nominal distribution
2: ordinal distribution
3: chronological distribution
4: frequency distribution
Q:21.
If the largest value is 60, the smallest value is 40, and the number of classes desired is 5,
then the class interval is
Options
1: 20
2: 4
3: 25
4: 15
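The class interval is the range divided by the number of classes. A short sketch with the values from question 21:

```python
largest = 60
smallest = 40
num_classes = 5

data_range = largest - smallest            # 20
class_interval = data_range / num_classes  # width of each class

print(class_interval)  # 4.0
```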
Q:22.
The diagram used to represent grouped and ungrouped data is classified as
Options
1: breadth diagram
2: width diagram
3: bar diagram
4: length diagram
Q:23.
Histogram, pie charts and frequency polygon are all types of
Options
1: one dimensional diagram
2: two dimensional diagram
3: cumulative diagram
4: dispersion diagram
Q:24.
Which of the following is not a type of univariate frequency distribution?
Options
1: Individual observation
2: Discrete frequency distribution
3: Continuous frequency distribution
4: Random frequency distribution
Q:25.
The method of classification of data in terms of class intervals in which both the lower
limit and the upper limit of any class (class interval) are included in that class (class
interval) is known as __________________ method of classification
Options
1: exclusive
2: inclusive
3: equal
4: unequal
Exclusive Method: This method is used for series in which the upper limit of
one class becomes the lower limit of the next class, for example, 0–10, 10–20, 20–30
and so on. It is called an exclusive series because an observation equal to the upper
limit of a class interval is not counted in that class: the upper limit is excluded
but the lower limit is included. This method is most appropriate for data of
continuous variables.
Inclusive Method: Under this method of classification of data, the classes are formed
in such a manner that the upper limit of a class interval does not repeat itself as the
lower limit of the next class interval. In such a series, both the upper limit and the
lower limit are included in the particular class interval, for example, 1–5, 6–10, 11–15
and so on. The interval 1–5 includes both the limits, i.e. 1 and 5.
Q:26.
What is the arithmetic mean of 2, 8, 10, 6, 14?
Options
1: 5
2: 6
3: 7
4: 8
Correct option is 4: 8
Using the formula, required arithmetic
mean = sum of the terms / number of terms
= (2 + 8 + 10 + 6 + 14) / 5 = 40 / 5 = 8
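The arithmetic mean is the sum of the terms divided by the number of terms. A quick check with the values from question 26:

```python
terms = [2, 8, 10, 6, 14]

total = sum(terms)  # 40
count = len(terms)  # 5
arithmetic_mean = total / count

print(arithmetic_mean)  # 8.0
```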
Q:27.
From the following frequency distribution, find the median class:
Options
1: 1400–1550
2: 1550–1700
3: 1700–1850
4: 1850–2000
Q:28.
We need _____ dimension(s) to table or plot a univariate (1-variable) frequency
distribution
Options
1: one
2: two
3: three
4: four
Q:29.
Quota sampling is similar to ____________ sampling.
Options
1: purposive
2: convenience
3: stratified
4: cluster
Q:30.
The number of possible samples of size 2 out of 5 population size in simple random
sampling with replacement (SRSWR) is equal to
Options
1: 10
2: 15
3: 20
4: 25
Q:31.
Which of the following method of sampling is not a part of ‘restricted random
sampling’?
Options
1: Lottery method
2: Stratified method
3: Systematic method
4: Cluster method
Q:32.
Consider simple random sampling without replacement (SRSWOR) from a population
of size N. The number of samples of size n is
Options
1: P(N, n), i.e. N permute n
2: C(N, n), i.e. N choose n
3: N^n
4: N
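The counts in questions 30 and 32 can be confirmed by enumeration. A minimal sketch, assuming samples with replacement are ordered and samples without replacement are unordered; the population labels are hypothetical:

```python
from itertools import combinations, product
from math import comb

population = ["A", "B", "C", "D", "E"]  # hypothetical labels, N = 5
n = 2

# SRSWR: every draw can pick any unit, so ordered samples number N**n.
with_replacement = list(product(population, repeat=n))

# SRSWOR: unordered samples of distinct units number C(N, n).
without_replacement = list(combinations(population, n))

print(len(with_replacement))     # 25
print(len(without_replacement))  # 10
print(comb(len(population), n))  # 10
```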
Q:33.
Population census conducted by the government of India after every 10 years is an
example of __________ data.
Options
1: Primary data
2: Secondary data
3: Structured data
4: Unstructured data
Q:34.
If byx = 0.5 and bxy = 0.46, then the value of coefficient of correlation (r) is
Options
1: 0.39
2: 0.48
3: 0.23
4: 0.25
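The correlation coefficient is the geometric mean of the two regression coefficients and carries their common sign. A short sketch with the values from question 34:

```python
import math

b_yx = 0.5   # regression coefficient of y on x, from the question
b_xy = 0.46  # regression coefficient of x on y, from the question

# r**2 = b_yx * b_xy; r takes the sign shared by both coefficients.
sign = 1 if b_yx > 0 else -1
r = sign * math.sqrt(b_yx * b_xy)

print(round(r, 2))  # 0.48
```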
Q:35.
From the given table, the MAE value is
Options
1: 100
2: 333.33
3: 133.33
4: 33.33
Descriptive Questions
In a simple study about coffee habits in two towns A and B the following information is
given
Town A: Females were 40%, total coffee drinkers were 45% and female non coffee drinkers
were 20%.
Town B: males were 55%, male non coffee drinkers were 30% and female coffee drinkers
were 15%
Present the data into a table format
Ans:
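The missing cells follow from subtraction. A minimal sketch, assuming every percentage is a share of that town's total population; the helper two_way_table is only an illustration:

```python
def two_way_table(male_coffee, male_non, female_coffee, female_non):
    """Rows (label, males, females, total) of a sex-by-habit table, in percent."""
    return [
        ("Coffee drinkers", male_coffee, female_coffee, male_coffee + female_coffee),
        ("Non coffee drinkers", male_non, female_non, male_non + female_non),
        ("Total", male_coffee + male_non, female_coffee + female_non,
         male_coffee + male_non + female_coffee + female_non),
    ]

# Town A: females 40%, coffee drinkers 45%, female non coffee drinkers 20%.
a_female_non = 20
a_female_coffee = 40 - a_female_non      # 20
a_male_coffee = 45 - a_female_coffee     # 25
a_male_non = (100 - 40) - a_male_coffee  # 35
town_a = two_way_table(a_male_coffee, a_male_non, a_female_coffee, a_female_non)

# Town B: males 55%, male non coffee drinkers 30%, female coffee drinkers 15%.
b_male_non = 30
b_male_coffee = 55 - b_male_non              # 25
b_female_coffee = 15
b_female_non = (100 - 55) - b_female_coffee  # 30
town_b = two_way_table(b_male_coffee, b_male_non, b_female_coffee, b_female_non)

for name, table in (("Town A", town_a), ("Town B", town_b)):
    print(name)
    print(f"{'':<20}{'Males':>8}{'Females':>9}{'Total':>7}")
    for label, males, females, total in table:
        print(f"{label:<20}{males:>8}{females:>9}{total:>7}")
```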
Ans:
Linear regression is a way to model the relationship between two variables. You might also recognize
the equation as the slope formula. The equation has the form Y= a + bX, where Y is the dependent
variable (that’s the variable that goes on the Y axis), X is the independent variable (i.e. it is plotted on
the X axis), b is the slope of the line and a is the y-intercept.
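The slope, the intercept and the correlation coefficient r are computed as
b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
a = (Σy − bΣx) / n
r = (nΣxy − ΣxΣy) / √([nΣx² − (Σx)²] [nΣy² − (Σy)²])
Where
n = Total number of observations
Σx = Total of the First Variable Value
Σy = Total of the Second Variable Value
Σxy = Sum of the Product of First & Second Value
Σx² = Sum of the Squares of the First Value
Σy² = Sum of the Squares of the Second Value
Thus, the coefficient of determination = (correlation coefficient)² = r²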
x 9 8 7 6 5 4 3 2 1
y 15 16 14 13 11 12 10 8 9
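For the x and y values above, the sums n, Σx, Σy, Σxy, Σx² and Σy² give the correlation coefficient and the regression line of y on x. A sketch using the standard computational formulas (the variable names are illustrative):

```python
import math

x = [9, 8, 7, 6, 5, 4, 3, 2, 1]
y = [15, 16, 14, 13, 11, 12, 10, 8, 9]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

sxy = n * sum_xy - sum_x * sum_y  # 513
sxx = n * sum_x2 - sum_x ** 2     # 540
syy = n * sum_y2 - sum_y ** 2     # 540

r = sxy / math.sqrt(sxx * syy)            # correlation coefficient
slope = sxy / sxx                         # b in Y = a + bX
intercept = (sum_y - slope * sum_x) / n   # a in Y = a + bX

print(round(r, 4), round(r * r, 4))        # 0.95 0.9025
print(round(intercept, 2), round(slope, 2))  # 7.25 0.95
```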
Which is better, R² or adjusted R²?
Adjusted R² should be used to compare models with
different numbers of independent variables. Adjusted R²
should also be used when selecting important predictors (independent
variables) for the regression model.
Use a t-test to check the significance of each independent variable.
What is the effect on R² and adjusted R² of adding a new variable in multiple linear
regression?
Ans: R² shows how well terms (data points) fit a curve or line. R² never decreases when a
variable is added, whether or not that variable is useful. Adjusted R² also indicates how well
terms fit a curve or line, but adjusts for the number of terms in a model. If you add more and
more useless variables to a model, adjusted R² will decrease. If you add more useful variables,
adjusted R² will increase.
Adjusted R² will always be less than or equal to R².
You only need adjusted R² when working with samples. In other words, the adjustment isn’t necessary
when you have data from an entire population.
The formula is:
Adjusted R² = 1 − [(1 − R²)(N − 1) / (N − K − 1)]
where:
• N is the number of points in your data sample.
• K is the number of independent regressors, i.e. the number of variables in your model, excluding
the constant.
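The standard formula, adjusted R² = 1 − (1 − R²)(N − 1)/(N − K − 1), shows how a nearly useless extra variable can raise R² slightly yet lower adjusted R². A minimal sketch; the R², N and K values are hypothetical and chosen only for illustration:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n data points and k regressors (constant excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n_points = 30  # hypothetical sample size

# Hypothetical model with 3 regressors.
base = adjusted_r2(0.900, n_points, 3)

# Hypothetical fourth regressor that barely improves the fit.
extended = adjusted_r2(0.901, n_points, 4)

print(round(base, 4))      # 0.8885
print(round(extended, 4))  # 0.8852
```

R² went up from 0.900 to 0.901, but adjusted R² went down, which is the behaviour described in the answer.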
How will you decide about the relative importance of various independent variables?
What is non-probability sampling? Explain the types of non-probability sampling.
From 10 observations on Price (x) and Supply (y) of a commodity, the following
summary of figures were obtained.
Compute the line of regression of y on x and interpret the result. Estimate the
supply when price is 16 units.
is an unbiased estimator of
Obtain Partial correlation coefficients for following data
Frequency Distribution, Types of Univariate Frequency Distribution, Cumulative Frequency
Distribution, Bivariate/Two-way classification of data, Cumulative frequency curve or ogive
(i.e., more than ogive and less than ogive)
Methods to Check the Performance of Regression Models: MAE, MSE, R², MAPE (Moving
Averages)
Sums on Point Estimate of the Population mean, Population Std Deviation, and Std. Error of
the Estimate mean
Hypothesis Testing:
a) Z test for Single Mean
b) Z test for Difference of Mean
Explain: 1. Test of significance 2. Level of significance 3. Simple hypothesis
4. Composite hypothesis
The manufacturer of a certain make of electric bulbs claims that his bulbs have a mean life of
25 months with standard deviation of 5 months. A random sample of 6 such bulbs gave the
following values
Life of bulb in months: 24, 26, 30, 20, 20, 18
Is the manufacturer’s claim valid at the 1% level of significance? (Given that the table values of
the appropriate test statistic at the said level are 4.032, 3.707 and 3.499 for 5, 6 and 7 degrees of
freedom respectively.)
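One way to work this problem is a two-sided one-sample t-test on the mean, since the table values supplied are indexed by degrees of freedom. A sketch under that assumption, using the sample standard deviation and n − 1 = 5 degrees of freedom:

```python
import math
import statistics

lifetimes = [24, 26, 30, 20, 20, 18]  # sample from the question, in months
claimed_mean = 25                     # manufacturer's claimed mean life

n = len(lifetimes)
sample_mean = statistics.mean(lifetimes)  # 23
sample_sd = statistics.stdev(lifetimes)   # uses the n - 1 divisor

t_stat = (sample_mean - claimed_mean) / (sample_sd / math.sqrt(n))
critical = 4.032  # table value for 5 degrees of freedom, from the question

reject_claim = abs(t_stat) > critical

print(round(t_stat, 3))  # -1.085
print(reject_claim)      # False
```

Because |t| is below 4.032, the sample gives no reason to reject the claimed mean of 25 months at the 1% level under these assumptions.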
Explain in detail the MP (most powerful) and UMP (uniformly most powerful) tests.
Define MAPE, MAE, RMSE with formula and example
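The three measures are MAE = mean of |actual − predicted|, MAPE = 100 × mean of |actual − predicted| / |actual|, and RMSE = square root of the mean squared error. A minimal sketch; the actual and predicted values are hypothetical and chosen only for illustration:

```python
import math

actual = [100, 200, 300, 400]     # hypothetical observed values
predicted = [110, 190, 330, 380]  # hypothetical model predictions

errors = [a - p for a, p in zip(actual, predicted)]
n = len(errors)

mae = sum(abs(e) for e in errors) / n                                # mean absolute error
mape = 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n  # in percent
rmse = math.sqrt(sum(e * e for e in errors) / n)                     # root mean squared error

print(mae)             # 17.5
print(round(mape, 2))  # 7.5
print(round(rmse, 4))  # 19.3649
```

MAPE is undefined when any actual value is zero, which is why the hypothetical data avoids zeros.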