AS Multiple Choices

The document discusses several examples related to linear regression analysis and interpreting correlation coefficients. Multiple choice questions are asked about predicting relationships between variables based on scatterplots and regression models fit to sample data.

1. A field researcher who studies lions conjectured that the more time a cub spends playing, the sooner the cub
begins to hunt. Observational data were collected from 20 lion cubs. The researcher recorded how long they
spent playing and the age when they began hunting. Because male and female lions have different hunting
behaviors, the researcher recorded the data for males and females separately. The two scatterplots show the
data for the 10 female lions and 10 male lions. Based on the scatterplots, for which gender does there appear to
be evidence that the more time a lion cub spends playing, the sooner the lion cub is likely to begin hunting?

A) For female cubs only


B) For male cubs only
C) For both male cubs and female cubs, with equal evidence
D) For both male cubs and female cubs, with more evidence for female cubs than for male cubs
E) For neither male cubs nor female cubs
Explain: The scatterplots for female and male lion cubs are shown below. The x-axis is the playing time (hours) and
the y-axis is the age (months) at which the cub started to hunt.
In both scatterplots, there appears to be a negative linear association between playing time and the age of starting
to hunt. That is, as playing time increases, the age at which the cubs start to hunt decreases.
However, the points for male lion cubs are more tightly clustered around the line of best fit than the points
for female lion cubs. This suggests that the association between playing time and the age of
starting to hunt is stronger for male lion cubs than for female lion cubs (strength).

2. A tennis ball was thrown in the air. The height of the ball from the ground was recorded every millisecond from
the time the ball was thrown until it reached the height from which it was thrown. The correlation between the
time and the height was computed to be zero. What does this correlation suggest about the relationship
between time and height?

1. There is no relationship between time and height.


2. There is no linear relationship between time and height.
3. The distance the ball traveled upward is the same as the distance the ball traveled downward.
4. The correlation suggests that there is measurement or calculation error.
5. The correlation suggests that more measurements should be taken to better understand the relationship.
3. A researcher collected data on the age, in years, and the growth of sea turtles. The following graph is a residual
plot of the regression of growth versus age. Does the residual plot support the appropriateness of a linear
model?

A) Yes, because there is a clear pattern displayed in the residual plot


B) Yes, because about half the residuals are positive and the other half are negative
C) Yes, because as age increases, the residuals increase
D) No, because the points appear to be randomly distributed
E) No, because the graph displays a U-shaped pattern

4. Dairy farmers are aware there is often a linear relationship between the age, in years, of a dairy cow and the
amount of milk produced, in gallons per week. The least squares regression line produced from a random
sample is milk = 40.8 - 1.1(age). Based on the model, what is the difference in predicted amounts of milk
produced between a cow of 5 years and a cow of 10 years?
A) A cow of 5 years is predicted to produce 5.5 fewer gallons per week
B) A cow of 5 years is predicted to produce 5.5 more gallons per week
C) A cow of 5 years is predicted to produce 1.1 fewer gallons per week
D) A cow of 5 years is predicted to produce 1.1 more gallons per week
E) A cow of 5 years and a cow of 10 years are both predicted to produce 40.8 gallons per week
Explain:
Predicted amount of milk for a cow of 5 years = 40.8 - 1.1 * 5 = 35.3
Predicted amount of milk for a cow of 10 years = 40.8 - 1.1 * 10 = 29.8
We have: 35.3 - 29.8 = 5.5, so the 5-year-old cow is predicted to produce 5.5 more gallons per week => B
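The arithmetic for question 4 can be checked with a short sketch. The function name `predicted_milk` is an illustrative assumption; the coefficients are taken from the regression line stated in the question.

```python
def predicted_milk(age_years):
    """Predicted gallons per week from the fitted line milk = 40.8 - 1.1(age)."""
    return 40.8 - 1.1 * age_years

# Difference in predictions between a 5-year-old and a 10-year-old cow.
difference = predicted_milk(5) - predicted_milk(10)
print(round(difference, 1))  # 5.5
```

The intercept cancels in the subtraction, so the difference is simply the slope times the age gap: -1.1 * (5 - 10) = 5.5.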
5. A scatterplot of students' heights, in inches, versus corresponding arm span length, in inches, is shown below. One
of the points in the graph is labeled A. If point A is removed, which of the following statements would be true?
A) The slope of the least squares regression line is unchanged and the correlation coefficient increases.
B) The slope of the least squares regression line is unchanged and the correlation coefficient decreases.
C) The slope of the least squares regression line increases and the correlation coefficient increases.
D) The slope of the least squares regression line increases and the correlation coefficient decreases.
E) The slope of the least squares regression line decreases and the correlation coefficient increases.
6. At a large airport, data were recorded for one month on how many baggage items were unloaded from each
flight upon arrival as well as the time required to deliver all the baggage items on the flight to the baggage claim
area. A scatterplot of the two variables indicated a strong positive linear association between the variables.
Which of the following statements is a correct interpretation of the word strong in the description of the
association?
A) A least-squares model predicts that the more baggage items that are unloaded from a flight, the greater the
time required to deliver the items to the baggage claim area.
B) The actual time required to deliver all the items to the baggage claim area based on the number of items
unloaded will be very close to the time predicted by a least-squares model.
C) The time required to deliver an item to the baggage claim area is relatively constant, regardless of the
number of baggage items unloaded from a flight.
D) The variability in the time required to deliver all items to the baggage claim area is about the same for all
flights, regardless of the number of items unloaded from a flight.
E) The time required to unload baggage items from a flight is related to the time required to deliver the items
to the baggage claim area.
7. For a random sample of 20 professional athletes, there is a strong, linear relationship between the number of
hours of exercise per week and the resting heart rate. For the athletes in the sample, those who exercise more
hours per week tend to have a lower resting heart rate than those who exercise less. Which of the following is a
reasonable value for the correlation between the number of hours athletes exercise per week and their resting
heart rate?
A) 0.71
B) 0.00
C) −0.14
D) −0.87
E) −1.00
Explain: Athletes who exercise more hours per week tend to have a lower resting heart rate than those who exercise
less => negative linear relationship. The relationship is also described as strong.
Analyzing the answer choices:
A) 0.71: This is a strong positive correlation, which contradicts the negative relationship.
B) 0.00: This indicates no linear correlation, which contradicts the given information.
C) -0.14: This is a weak negative correlation, not compatible with a strong relationship.
D) -0.87: This is a strong negative correlation, which matches the scenario.
E) -1.00: This is a perfect negative correlation, which would require every point to lie exactly on a line; that is
not a reasonable value for sample data on athletes.
=> D
8. An exponential relationship exists between the explanatory variable and the response variable in a set of
data. The common logarithm of each value of the response variable is taken, and the least squares regression
line has an equation of log(Y) = 7.3 - 1.5X. Which of the following is closest to the predicted value of the
response variable for X = 4.8?
A) 0.1
B) 0.68
C) 1.105
D) 1.26
E) 14.5
Explain: log(Y) = 7.3 - 1.5 * 4.8 = 7.3 - 7.2 = 0.1 => Y = 10^0.1 ≈ 1.26 => D
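The back-transformation in question 8 can be sketched as follows. The variable names are illustrative assumptions; the key step is raising 10 to the predicted value, because the model was fit to the common (base-10) logarithm of the response.

```python
# Predicted log10(Y) from the fitted line log(Y) = 7.3 - 1.5X at X = 4.8.
log_y = 7.3 - 1.5 * 4.8

# Undo the common logarithm to return to the original scale of Y.
y_hat = 10 ** log_y

print(round(log_y, 2), round(y_hat, 2))  # 0.1 1.26
```

Stopping at 0.1 (choice A) is the typical mistake: that value is the predicted logarithm, not the predicted response.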
9. Which of the following is the best description of a positive association between two variables?
A) The values will create a line when graphed on a scatterplot.
B) The values will create a line with positive slope when graphed on a scatterplot.
C) As the value of one of the variables increases, the value of the other variable tends to decrease.
D) As the value of one of the variables increases, the value of the other variable tends to increase.
E) All values of both variables are positive.
10. For a specific species of fish in a pond, a wildlife biologist wants to build a regression equation to predict the
weight of a fish based on its length. The biologist collects a random sample of this species of fish and finds that
the lengths vary from 0.75 to 1.35 inches. The biologist uses the data from the sample to create a single linear
regression model. Would it be appropriate to use this model to predict the weight of a fish of this species that is
3 inches long?
A) Yes, because 3 inches falls above the maximum value of lengths in the sample.
B) Yes, because the regression equation is based on a random sample.
C) Yes, because the association between length and weight is positive.
D) No, because 3 inches falls above the maximum value of lengths in the sample.
E) No, because there may not be any 3-inch fish of this species in the pond.
11. Which of the following statements about a least squares regression analysis is true?
1) A point with a large residual is an outlier.
2) A point with high leverage has a y-value that is not consistent with the other y-values in the set.
3) The removal of an influential point from a data set could change the value of the correlation coefficient.
A) 3 only
B) 2 only
C) 1 and 2 only
D) 1,2,3
E) 1 only
12. A roadrunner is a desert bird that tends to run instead of fly. While running, the roadrunner uses its tail for
balance. A sample of 10 roadrunners was taken, and the birds' total length, in centimeters, and tail length, in
centimeters, were recorded. The output shown in the table is from a least squares regression to predict the tail
length given the total length. Suppose a roadrunner has a total length of 59.0 cm and a tail length of
31.1 cm. Based on the residual, does the regression model overestimate or underestimate the tail length of the
roadrunner?
A) Underestimate, because the residual is positive.
B) Underestimate, because the residual is negative.
C) Overestimate, because the residual is positive.
D) Overestimate, because the residual is negative.
E) Neither, because the residual is 0.
13. The distributions of four variables are shown in the following histograms. Which of the following shapes is not
represented by one of the four distributions?
A) Normal distribution
B) Uniform distribution
C) Exponential distribution
D) Bimodal distribution
Explain: Histogram 1: bimodal distribution; histograms 2 and 3: exponential distribution; histogram 4: normal
distribution. The uniform distribution is not represented => B
14. A company determines the mean and standard deviation of the number of sick days taken by its employees in
one year. Which of the following is the best description of the standard deviation?
A) Approximately the mean distance between the number of sick days taken by individual employees and the
mean number of sick days taken by all employees.
B) Approximately the median distance between the number of sick days taken by individual employees and the
median number of sick days taken by all employees.
C) The distance between the greatest number of sick days taken by an employee and the mean number of sick
days taken by all employees.
D) The number of days separating the fewest sick days taken and the most sick days taken when considering
all employees.
E) The number of days separating the fewest sick days taken and the most sick days taken when considering
the middle 50 percent of the distribution.
Explain: Here's why the other options are not accurate:
B) The median is the middle value when the data is ordered from least to greatest. While it has some
connection to the distribution, it doesn't necessarily represent the average distance from the mean like
the standard deviation.
C) This only considers the highest value and the mean, not the overall spread of data.
D) This simply gives the range, the difference between the highest and lowest values, which doesn't
capture the central tendency or how spread out the data is.
E) This describes the interquartile range (IQR), which only focuses on the middle 50% of the data and
doesn't capture the entirety of the distribution like the standard deviation.
15. A graph (not shown) of the selling prices of homes in a certain city for the month of April reveals that the
distribution is skewed to the left. Which of the following statements is the most reasonable conclusion about the
selling prices based on the graph?
A) The mean is greater than the median.
B) The median is the average of the first quartile and the third quartile.
C) There are fewer selling prices between the first quartile and the median than there are between the median
and the third quartile.
D) There are more selling prices that are less than the mean than selling prices that are greater than the mean.
E) The value of maximum minus third quartile is less than the value of first quartile minus minimum.
Explain: A) In a left-skewed distribution the mean is smaller than the median. B) True only if the distribution is symmetrical.
16. A local real estate magazine used the median instead of the mean when it reported the SAT score of the average
student who attends Groveland High School. A graphical display of SAT scores of students who attend Groveland High
School indicated that the data were strongly skewed to the right. Which of the following explains why, in this
situation, the median is a more accurate indicator of the SAT score of the average student than the mean is?
A) The mean is affected by the skewness, whereas the median is not.
B) The median is always the preferred statistic.
C) The mean will be less than the median when the data is strongly skewed to the right.
D) The mean should be used only when data are strongly skewed to the left.
E) The median is equal to one-half of the maximum and minimum SAT scores at Groveland High.
17. A marketing firm obtained random samples of 20 people in five regions of the country to investigate the level of
interest in a new product. People in the sample were asked to rate their level of interest on a scale from 1 to 10,
with 1 being the least amount of interest and 10 being the greatest. The histograms show the results for each
region. The graph for which region displays data for level of interest with the least standard deviation?

A) Region A
B) Region B
C) Region C
D) Region D
E) Region E
18. At a local ice-cream store, 210 people were surveyed on whether they preferred eating ice cream from a cone
or a cup. Of the 210 people surveyed, 70 were adults and 140 were children. Of the responses, 150 indicated the
cone as the preferred method of eating ice cream. For those surveyed, there was no association between age and
preferred method of eating ice cream. Which of the following tables shows the distribution of responses?

A) Table 1
B) Table 2
C) Table 3
D) Table 4
E) Table 5
19. As part of a science experiment, a student recorded 10 measurements of the temperature of a liquid. One of the
measurements was an outlier when compared with the other 9 measurements. Which of the following must be
true about the 9 measurements, excluding the outlier, when compared with the 10 measurements? (Note: An
outlier is any number that is greater than the upper quartile or less than the lower quartile by at least 1.5 times
the interquartile range.)
A) The median of the 9 measurements is less than the median of the 10 measurements.
B) The median of the 9 measurements is greater than the median of the 10 measurements.
C) The maximum of the 9 measurements is less than the maximum of the 10 measurements.
D) The maximum of the 9 measurements is greater than the maximum of the 10 measurements.
E) The standard deviation of the 9 measurements is less than the standard deviation of the 10 measurements.
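Explain: Based on the formula for the variance and standard deviation:
Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$, standard deviation: $s = \sqrt{s^2}$
The outlier lies far from the mean, so it contributes a large squared deviation to the sum; removing it reduces
the standard deviation => E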
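The effect of removing an outlier on the sample standard deviation can be illustrated with a small sketch. The temperature readings are hypothetical values invented for this illustration; they are not data from the question.

```python
import statistics

# Hypothetical temperatures: nine similar readings plus one high outlier.
readings = [20.1, 20.3, 19.9, 20.0, 20.2, 20.4, 19.8, 20.1, 20.0, 27.5]
without_outlier = readings[:-1]

# statistics.stdev uses the sample formula with the n - 1 divisor.
sd_all = statistics.stdev(readings)
sd_nine = statistics.stdev(without_outlier)

print(sd_nine < sd_all)  # True
```

The same sketch also shows why the choices about the median and maximum need not hold in general: their direction depends on whether the outlier is unusually high or unusually low.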
20. A car rental agency has two locations in a city. The box plots below summarize the miles driven for one day of
single-day car rentals at each location. Based on the box plots, which statement provides the best comparison of
the two locations?

A) The number of single-day rentals is greater for location A than for location B.
B) The number of single-day rentals is less for location A than for location B.
C) Compared with location A, the miles driven for location B display more variability, and the median is greater.
D) Compared with location A, the miles driven for location B display less variability, and the median is greater.
E) Compared with location A, the miles driven for location B display less variability, and the median is about the
same.
21. The histogram shows the distribution of heights, in inches, of 100 adult men. Based on the histogram, which of
the following is closest to the interquartile range, in inches, of the distribution?

A) 2
B) 5
C) 99
D) 12
E) 15
Explain: Position of the first quartile = ¼ * (n+1) = 101/4 = 25.25 -> the 25.25th height on the histogram is about 66
Position of the third quartile = ¾ * (n+1) = 303/4 = 75.75 -> the 75.75th height on the histogram is about 71
Interquartile range = 71 - 66 = 5 => B
22. Which of the following statements must be true about the data sets A and B displayed in histograms below?

A) The mean of data set A is equal to the mean of data set B.


B) The median of data set A is equal to the median of data set B.
C) The range of data set A is equal to the range of data set B.
D) The standard deviation of data set A is less than the standard deviation of data set B.
Explain: Because NA > NB
23. In a certain school district, students from grade 6 through grade 12 can participate in a school-sponsored
community service activity. The following bar chart shows the relative frequencies of students from each grade
who participate in the community service activity. Which of the following statements is supported by the bar
chart?

A) The greatest number of participating students was in grade 10.


B) The number of participating students in grade 6 was equal to the number of participating students in grade
7.
C) The relative frequency of all participating students in grades 6 and 7 combined was 0.60.
D) Grade 12 had the least relative frequency of participating students.
E) Grade 11 had the greatest relative frequency of participating students.
24. Scientists estimate that the distribution of the life span of the Galápagos Islands giant tortoise is approximately
normal with mean 100 years and standard deviation 15 years. Based on the estimate, which of the following is
closest to the age of a Galápagos Islands giant tortoise at the 90th percentile of the distribution?
A) 80 years
B) 115 years
C) 120 years
D) 125 years
E) 130 years
Explain:
- Find the z-score corresponding to the 90th percentile. At the 90th percentile, the cumulative
distribution function (CDF) of the standard normal distribution equals 0.9. Using a z-score table or online
calculator, we find that the z-score whose cumulative probability is closest to 0.9 is 1.28.
- Multiply the z-score by the standard deviation to find the distance above the mean: 1.28 * 15 years
= 19.2 years.
- Add the obtained value to the mean to reach the age of the tortoise at the 90th percentile: 100 years +
19.2 years = 119.2 years, which is closest to 120 years => C
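The percentile lookup in question 24 can be reproduced without a z-table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative assumptions.

```python
from statistics import NormalDist

# Life span model from the question: mean 100 years, standard deviation 15 years.
lifespan = NormalDist(mu=100, sigma=15)

# inv_cdf returns the value below which 90% of the distribution lies.
age_90th = lifespan.inv_cdf(0.90)

print(round(age_90th, 1))  # 119.2
```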
25. At a college the scores on the chemistry final exam are approximately normally distributed, with a mean of 75
and a standard deviation of 12. The scores on the calculus final are also approximately normally distributed, with
a mean of 80 and a standard deviation of 8. A student scored 81 on the chemistry final and 84 on the calculus
final. Relative to the students in each respective class, in which subject did this student do better?
A) The student did better in chemistry.
B) The student did better in calculus.
C) The student did equally well in each course.
D) There is no basis for comparison, since the subjects are different from each other and are in different
departments.
E) There is not enough information for comparison, because the number of students in each class is not
known.
Explain: To compare the student's performance relative to each class, measure how far each score lies from
its class mean in units of standard deviations by calculating the z-score for each exam:
Chemistry z-score: (81 - 75) / 12 = 0.5
Calculus z-score: (84 - 80) / 8 = 0.5
Both z-scores are equal (0.5), indicating that the student's score is the same number of standard deviations
above the mean in both courses. So the student did equally well in each course => C
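The comparison in question 25 comes down to two z-scores. A minimal sketch, in which the helper name `z_score` is an illustrative assumption:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / sd

z_chem = z_score(81, mean=75, sd=12)  # chemistry final
z_calc = z_score(84, mean=80, sd=8)   # calculus final

print(z_chem, z_calc)  # 0.5 0.5
```

The raw calculus score is higher, but relative to each class the two results are identical.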
26. Some descriptive statistics for a set of test scores are shown below. For this test, a certain student has a
standardized score of z = -1.2. What score did this student receive on the test?

A) 266.28
B) 779.42
C) 1008.02
D) 1083.38
E) 1311.98
27. The caffeine content of 8-ounce cans of a certain cola drink is approximately normally distributed with mean 33
milligrams (mg). A randomly selected 8-ounce can containing 35 mg of caffeine is 1.2 standard deviations above
the mean. Approximately what percent of 8-ounce cans of the cola have a caffeine content greater than 35 mg?
A) 1%
B) 8%
C) 12%
D) 16%
E) 99%
Explain:
Step 1: Find the standard deviation.
A randomly selected 8-ounce can containing 35 mg of caffeine is 1.2 standard deviations above the mean.
Mean = 33 mg
35 = Mean + 1.2(standard deviation)
35 = 33 + 1.2(standard deviation)
35 - 33 = 1.2(standard deviation)
2 = 1.2(standard deviation)
Standard deviation = 2/1.2 = 1.6666666667 ≈ 1.67
Step 2: Use the z-score formula.
z = (x - µ)/σ
x = 35 mg, µ = mean = 33 mg, σ = standard deviation ≈ 1.67
z = (35 - 33)/1.67 = 1.1976, which agrees with the given 1.2 up to rounding
Probability value from the z-table: P(x < 35) = 0.88446
P(x > 35) = 1 - P(x < 35) = 1 - 0.88446 = 0.11554
Converting to a percentage: 0.11554 × 100 = 11.554% ≈ 12% => C
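The tail probability in question 27 can be checked with the standard normal distribution from the Python standard library. This sketch uses z = 1.2 directly, as given in the question, so its result differs slightly from a hand calculation that rounds the standard deviation first; the variable names are illustrative assumptions.

```python
from statistics import NormalDist

z = 1.2                # the can is 1.2 standard deviations above the mean
sd = (35 - 33) / z     # implied standard deviation in mg

# Proportion of cans with more caffeine than this one (upper tail).
p_above = 1 - NormalDist().cdf(z)

print(round(sd, 2), round(p_above, 3))  # 1.67 0.115
```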
28. The weight of adult male grizzly bears living in the wild in the continental United States is approximately
normally distributed with a mean of 500 pounds and a standard deviation of 50 pounds. The weight of adult
female grizzly bears is approximately normally distributed with a mean of 300 pounds and a standard deviation
of 40 pounds. Approximately, what would be the weight of a female grizzly bear with the same standardized
score (z-score) as a male grizzly bear with a weight of 530 pounds?
A) 276 pounds
B) 324 pounds
C) 330 pounds
D) 340 pounds
E) 530 pounds
Explain:
Let the weight of male bears be X.
Mean µM = 500 and standard deviation σM = 50
z-score for X = 530: z = (530 - 500)/50 = 0.6
Let the weight of female bears be Y.
Mean µF = 300 and standard deviation σF = 40
The female bear has the same z-score as the male bear, so
0.6 = (Y - 300)/40 => Y = 300 + 0.6(40) = 324 pounds => B
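The matching z-score calculation for question 28 can be sketched in two steps: standardize the male weight, then convert that z-score back to the female scale. The variable names are illustrative assumptions.

```python
# Standardize the male bear's weight (mean 500, standard deviation 50).
z_male = (530 - 500) / 50

# Convert the same z-score to the female distribution (mean 300, standard deviation 40).
female_weight = 300 + z_male * 40

print(z_male, round(female_weight, 1))
```

The result is 30 pounds less above the mean than choice C would suggest, because the female distribution has the smaller standard deviation.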


29. Gina's doctor told her that the standardized score (z- score) for her systolic blood pressure, as compared to the
blood pressure of other women her age, is 1.50. Which of the following is the best interpretation of this
standardized score?
A) Gina's systolic blood pressure is 150.
B) Gina's systolic blood pressure is 1.50 standard deviations above the average systolic blood pressure of
women her age.
C) Gina's systolic blood pressure is 1.50 above the average systolic blood pressure of women her age.
D) Gina's systolic blood pressure is 1.50 times the average systolic blood pressure for women her age.
E) Only 1.5% of women Gina's age have a higher systolic blood pressure than she does.
30. A candy company produces individually wrapped candies. The quality control manager for the company believes
that the weight of the candies is approximately normally distributed with mean 720 milligrams (mg). If the
manager's belief is correct, which of the following intervals of weights will contain the largest proportion of the
candies in the distribution of weights?
A) 740 mg to 780 mg
B) 700 mg to 740 mg
C) 680 mg to 720 mg
D) 660 mg to 700 mg
E) 620 mg to 660 mg
31. Shalise competed in a jigsaw puzzle competition where participants are timed on how long they take to
complete puzzles of various sizes. Shalise completed a small puzzle in 75 minutes and a large jigsaw puzzle in 140
minutes. For all participants, the distribution of completion time for the small puzzle was approximately normal
with mean 60 minutes and standard deviation 15 minutes. The distribution of completion time for the large
puzzle was approximately normal with mean 180 minutes and standard deviation 40 minutes. Approximately
what percent of the participants had finishing times greater than Shalise's for each puzzle?
A) 16% on the small puzzle and 16% on the large puzzle
B) 16% on the small puzzle and 84% on the large puzzle
C) 32% on the small puzzle and 68% on the large puzzle
D) 84% on the small puzzle and 84% on the large puzzle
E) 84% on the small puzzle and 16% on the large puzzle
Explain: According to the empirical rule, approximately 68% of the completion times are within 1 standard
deviation of the mean of 60 minutes for the small puzzle. By symmetry, the remaining 32% splits evenly: 16% of the
completion times are less than 45 minutes and 16% are greater than 75 minutes. For the large
puzzle, the empirical rule says that approximately 68% of the times will be within 1 standard deviation
of the mean of 180 minutes. By symmetry, 16% of the times are less than 140 minutes and 16% of
the times are greater than 220 minutes. Therefore 84% of the times will be greater than Shalise's time of 140
minutes on the large puzzle => B
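The empirical-rule percentages for question 31 can be compared against exact normal probabilities. A minimal sketch using `statistics.NormalDist`, with illustrative variable names:

```python
from statistics import NormalDist

small = NormalDist(mu=60, sigma=15)    # small puzzle completion times (minutes)
large = NormalDist(mu=180, sigma=40)   # large puzzle completion times (minutes)

# Proportion of participants with finishing times greater than Shalise's.
p_small = 1 - small.cdf(75)    # 75 minutes is 1 standard deviation above the mean
p_large = 1 - large.cdf(140)   # 140 minutes is 1 standard deviation below the mean

print(round(p_small, 2), round(p_large, 2))  # 0.16 0.84
```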
32. The distribution of heights of 6-year-old girls is approximately normally distributed with a mean of 46.0 inches
and a standard deviation of 2.7 inches. Aliyaah is 6 years old, and her height is 0.96 standard deviation above the
mean. Her friend Jayne is also 6 years old and is at the 93rd percentile of the height distribution. At what
percentile is Aliyaah's height, and how does her height compare to Jayne's height?
A) Aliyaah's height is at the 17th percentile of the distribution, and she is shorter than Jayne.
B) Aliyaah's height is at the 67th percentile of the distribution, and she is shorter than Jayne.
C) Aliyaah's height is at the 67th percentile of the distribution, and she is taller than Jayne.
D) Aliyaah's height is at the 83rd percentile of the distribution, and she is shorter than Jayne.
E) Aliyaah's height is at the 83rd percentile of the distribution, and she is taller than Jayne.
33. Which of the following is the correct order from least to greatest for the values of r, s, and t ?

A) r, s, t
B) r, t, s
C) s, t, r
D) t, r, s
E) t, s, r
34. A botanist found a correlation between the length of an aspen leaf and its surface area to be 0.94. Why does the
correlation value of 0.94 not necessarily indicate that a linear model is the most appropriate model for the
relationship between length of an aspen leaf and its surface area?
A) The value must be exactly 1 or -1 to indicate a linear model is the most appropriate model.
B) The value must be 0 to indicate a linear model is the most appropriate model.
C) A causal relationship should be established first before determining the most appropriate model.
D) The value of 0.94 implies that only 88% of the data have a linear relationship.
E) Even with a correlation value of 0.94, it is possible that the relationship could still be better represented by a
nonlinear model.
35. A family would like to build a linear regression equation to predict the amount of grain harvested per acre of
land on their farm. They subdivide their land into several smaller plots of land for testing and would like to select
an explanatory variable they can control. Which of the following is an appropriate explanatory variable that the
family could use to create a linear regression equation?
A) The total amount of rainfall recorded at their farm.
B) The type of crop planted in the plot the previous year.
C) The average daily temperature at their farm.
D) The variety of grain planted in the plot.
E) The amount of fertilizer applied to each plot of land.
36. A market researcher asked a group of men and women to choose their favorite color design from a sample of
advertisements. The results are shown in the following table. Which of the following statements is not supported
by the table?

A) More men than women chose the color design red with black.
B) More women than men chose the color design yellow with black.
C) For men, the number who chose a design with black was greater than the number who chose a design with
blue.
D) The color design chosen by the most people was green with blue.
E) The total number of men surveyed by the market researcher was equal to the total number of women
surveyed by the market researcher.
37. The following segmented bar chart shows the number of flights that were either on time or delayed at three
different airports on one day. Which of the following statements is supported by the bar chart?

A) Airport T has the greatest percentage of on-time flights compared to the other two airports.
B) Airport R has the least percentage of on-time flights compared to the other two airports.
C) The number of on-time flights at Airport S is half the number of on-time flights at Airport T.
D) The number of on-time flights at Airport R is less than the number of on-time flights at Airport S.
E) The number of flights at Airport T is equal to the total number of flights at Airports R and S combined.
38. A survey of 57 students was conducted to determine whether or not they held jobs outside of school. The
two-way table above shows the number of students by employment status (job, no job), and class (juniors, seniors).
Which of the following best describes the relationship between employment status and class?

A) There appears to be no association, since the same number of juniors and seniors have jobs.
B) There appears to be no association, since close to half of the students have jobs.
C) There appears to be an association, since there are more seniors than juniors in the survey.
D) There appears to be an association, since the proportion of juniors having jobs is much larger than the
proportion of seniors having jobs.
E) A measure of association cannot be determined from these data.
39. In northwest Pennsylvania, a zoologist recorded the ages, in months, of 55 bears and whether each bear was
male or female. The data are shown in the back-to-back stemplot below. Based on the stemplot, which of the
following statements is true?

A) The median age and the range of ages are both greater for female bears than for male bears.
B) The median age and the range of ages are both less for female bears than for male bears.
C) The median age is the same for female bears and male bears, and the range of ages is the same for female
bears and male bears.
D) The median age is less for female bears than for male bears, and the range of ages is greater for female bears
than for male bears.
E) The median age is greater for female bears than for male bears, and the range of ages is less for female bears
than for male bears.
40. A school is having a contest in which students guess the number of candies in a jar. The student whose guess is
closest to the correct number of candies in the jar wins a prize. The number of candies guessed by male and
female students is shown in the back-to-back stemplot below.
Which of the following statements is true about the distributions of guesses?

A) The distribution of guesses for male students is skewed to the left, and the distribution of guesses for female
students is skewed to the right.
B) The distribution of guesses for male students is skewed to the right, and the distribution of guesses for
female students is skewed to the left.
C) The distributions of guesses for male and female students are both skewed to the right.
D) The distributions of guesses for male and female students are both skewed to the left.
E) The distributions of guesses for male and female students are both symmetric.
41. Janelle collected data on the amount of time in minutes each person in a large sample of customers spent in a
local store. The data also included recording the gender of each customer. These data were used to generate the
boxplots shown below. Which of the following statements is true?

A) The range in the amount of time in minutes males in the sample of customers spent in the store is
approximately 40 minutes.
B) The mean amount of time in minutes males in the sample of customers spent in the store is approximately
20 minutes.
C) The third quartile of the amount of time in minutes males in the sample of customers spent in the store is
approximately 45 minutes.
D) The interquartile range of the amount of time in minutes females in the sample of customers spent in the
store is 15 minutes.
E) Approximately half of the males in the sample of customers spent at least as much time in the store as any
female in the sample of customers.
42. The boxplots above summarize two data sets, A and B. Which of the following must be true?
I. Set A contains more data than Set B.
II. The box of Set A contains more data than the box of Set B.
III. The data in Set A have a larger range than the data in Set B.

A) I only
B) III only
C) I and II only
D) II and III only
E) I, II, and III
43. The following bar chart displays the relative frequency of responses of students, by grade level, when asked, “Do
you volunteer in a community-service activity?” Which of the following statements is not supported by the bar
chart?

A) More than 60% of both tenth-grade and eleventh-grade students responded yes.
B) Twelfth-grade students had the least percentage of students respond yes.
C) Less than 40% of tenth-grade students responded no.
D) The number of tenth-grade students who responded yes was greater than the number of ninth-grade
students who responded yes.
E) The percentage of eleventh-grade students who responded no was less than the percentage of ninth-grade
students who responded no.
44. In a standard golf tournament, golfers play 18 holes of golf on each of 4 consecutive days. For each hole, golfers
keep track of the number of times they hit the ball (strokes) before the ball goes into the cup. A golfer’s score for
the tournament is the total number of strokes needed to complete the tournament. The boxplots below
summarize the scores for golfers who competed in tournament 1 and golfers who competed in tournament 2.
Based on the boxplots, which of the following statements must be true?

A) More golfers played in tournament 1 than in tournament 2.
B) In both tournaments, at least half the golfers completed the tournament with a score less than 288.
C) The number of golfers who completed tournament 1 with a score less than 288 was greater than the number
of golfers who completed tournament 2 with a score less than 288.
D) The range of scores for tournament 1 is less than the range of scores for tournament 2.
E) The score of the golfer with the least score in tournament 1 was greater than the score of the golfer with the
least score in tournament 2.
45. A sociologist collected data from a sample of people on their highest level of education and the number of times
they visited any fast food restaurant during the previous week. The data are summarized in the boxplots. Based
on the boxplots, which of the following statements must be true?

A) The number of people surveyed at the more than four-year college level is greater than the number of
people surveyed at the high school level.
B) The proportion of people surveyed from the first quartile to the third quartile at the four-year college level is
less than the respective proportion at the community college level.
C) The interquartile range (IQR) for the number of visits at the community college level.
D) The maximum number of visits at the community college level is greater than the maximum number of visits
at the high school level.
E) The median number of visits at the four-year college level is greater than the median number of visits at the
high school level.
46. Nutritionists examined the sodium content of different brands of potato chips. Each brand was classified as
either healthy or regular based on how the chips were marketed to the public. The sodium contents, in
milligrams (mg) per serving, of the chips are summarized in the boxplots below. Based on the boxplots, which
statement gives a correct comparison between the two classifications of the sodium content of the chips?

A) The number of brands classified as healthy is greater than the number of brands classified as regular.
B) The interquartile range (IQR) of the brands classified as healthy is greater than the IQR of the brands
classified as regular.
C) The range of the brands classified as healthy is less than the range of the brands classified as regular.
D) The median of the brands classified as healthy is more than twice the median of the brands classified as
regular.
E) The brand with the least sodium content and the brand with the greatest sodium content are both classified
as healthy.
47. The director of a technical school was curious about whether there is a relationship between students who
complete one of the school's most popular health sciences certificate programs and whether those students go
on to complete more advanced studies in the health sciences within two years of completing the certificate
program. She randomly selected 100 students who completed the program. Data collected on these students are
shown in the table below. Which of the following statements is true for these 100 students?
A) Being a person who completed more advanced studies is more likely than being a person who did not complete
more advanced studies.
B) Being a person who completed the program is less likely than being a person who did not complete the program.
C) Being a person who completed the program and completed more advanced studies is less likely than being a
person who did not complete the program and did not complete more advanced studies.
D) Being a person who did not complete the program but completed more advanced studies is less likely than being
a person who completed the program and completed more advanced studies.
E) Being a person who completed the program but did not complete more advanced studies is more likely than
being a person who did not complete the program and did not complete more advanced studies.
48. The figure above summarizes the heights, in centimeters, of approximately 400 pine seedlings six years after they
were planted at a center for environmental study. Approximately half of the trees were fertilized yearly, and the
remaining trees were never fertilized. Which of the following statements about the medians and interquartile
ranges (IQRs) of the heights of the two groups of trees 6 years after being planted is true?

A) The medians and IQRs are the same for the unfertilized trees and the fertilized trees.
B) The median for the unfertilized trees is greater than the median for the fertilized trees, and the IQR is also
greater for the unfertilized trees.
C) The median for the unfertilized trees is the same as the median for the fertilized trees, and the IQR is greater
for the unfertilized trees.
D) The median for the unfertilized trees is less than the median for the fertilized trees, and the IQR is greater for
the unfertilized trees.
E) The median for the unfertilized trees is less than the median for the fertilized trees, and the IQR is less for
the unfertilized trees.
49. Grain moisture is a characteristic of grain that affects the price paid for the grain. A random sample of 28 loads of
corn was evaluated for moisture as a percent of the total weight. A different random sample of 28 loads of
soybeans was also evaluated for moisture. The data are displayed in the dotplots below. Based on the dotplots,
which of the following is greater for the percent moisture of corn than for the percent moisture of soybeans?

A) The first quartile
B) The median
C) The third quartile
D) The range
E) The interquartile range
50. A random sample of 374 United States pennies was collected, and the age of each penny was determined.
According to the boxplot below, what is the approximate interquartile range (IQR) of the ages?
A) 8
B) 10
C) 16
D) 40
E) 50
51. The histogram shown summarizes the responses of 100 people when asked, “What was the price of the last meal
you purchased?” Based on the histogram, which of the following could be the interquartile range of the prices?

A) $40
B) $21
C) $10
D) $5
E) $3
52. A random sample of 25 households from the Mountainview School District was surveyed. In this survey, data
were collected on the age of the youngest child living in each household. The histogram below displays the data
collected in the survey. In which of the following intervals is the median of these data located?

A) 0 years old to less than 2 years old
B) 4 years old to less than 6 years old
C) 6 years old to less than 8 years old
D) 8 years old to less than 10 years old
E) 10 years old to less than 12 years old
53. The histogram below displays the frequencies of waiting times, in minutes, for 175 patients in a dentist's office.
Which of the following could be the median of the waiting times, in minutes?
A) 2.50
B) 7.25
C) 12.25
D) 15.00
E) 17.50
54. First-year students enrolled at a college were asked whether they play video games. The responses, classified by
whether the students were enrolled in the school of sciences or the school of arts, are shown in the table. Of all
the students enrolled in the school of arts who responded, approximately what proportion responded that they
play video games?

A) 0.242
B) 0.401
C) 0.438 (=347/793)
D) 0.554
E) 0.605
55. A sample of 942 homeowners are classified, in the two-way frequency table below, by the number of credit cards
they have and the number of years they have owned their current homes. Of the homeowners in the sample
who have four or more credit cards, what proportion have owned their current homes for at least one year?

A) 78/212
B) 78/258
C) 78/942
D) 212/942
E) 258/942
56. As part of a study on the relationship between the use of tanning booths and the occurrence of skin cancer,
researchers reviewed the medical records of 1,436 people. The table below summarizes tanning booth use for
people in the study who did and did not have skin cancer. Of the people in the study who had skin cancer, what
fraction used a tanning booth?
A) 190/265
B) 190/896
C) 190/1,436
D) 265/1,436
E) 896/1,436
57. The following question(s) refer to the following scenario and set of data. In the 1830s, land surveyors began to
survey the land acquired in the Louisiana Purchase. Part of their task was to note the sizes of trees they
encountered in their surveying. The table of data below is for bur oak trees measured during the survey. An
outlier may be defined as a data point that is more than 1.5 times the interquartile range below the lower
quartile or is more than 1.5 times the interquartile range above the upper quartile. According to this definition,
what is the diameter, in inches, of the smallest tree that is an outlier?
A) 4
B) 28
C) 30
D) 34
E) 36
58. Which of the following describes a continuous variable?
A) The number of items sold at a craft booth for one day
B) The number of apps downloaded from a website one day
C) The diameters of the tree trunks at an evergreen farm
D) The number of baskets made by a basketball player
E) The shoe sizes of all shoes on sale at a department store
59. Professor James gave the same test to his three sections of statistics students. On the 35-question test, the
highest score was 32 and the lowest was 15. Based on the information displayed in the boxplots above, which of
the following statements is true?

A) Section 1 has the smallest interquartile range.
B) The lowest score in section 2 is higher than the highest score in either of the other sections.
C) Section 2 has the smallest range of scores.
D) The top 25% of scores in section 2 are lower than the highest score in section 3.
E) At least 50% of the scores in section 3 are higher than all of the scores in section 1.
60. Each person in a random sample of adults was asked how many DVDs he or she owned. Summary statistics are
given below. Which of the following statements is true?
A) Seventy-five percent of the adults in the sample own more than 95 DVDs.
B) Fifty percent of the adults in the sample own between 0 and 129.4 DVDs.
C) The distribution of the number of DVDs owned appears to be approximately symmetric.
D) The interquartile range of the number of DVDs owned is 65. (95-30)
E) The distribution of the number of DVDs owned contains outliers on both the low side and the high side.
61. A scientist recorded the duration of the eruptions of the Old Faithful geyser in Yellowstone National Park that
occurred during a one-month time period. The histogram below shows the distribution of the duration, in
seconds, of the eruptions. Based on the histogram, which of the following is the best description of the
distribution?

A) The distribution is uniform, is centered at about 200 seconds, and has a range of at most 250 seconds.
B) The distribution is skewed to the left, is centered at about 125 seconds, and has a range of at most 250
seconds.
C) The distribution is skewed to the right, is centered at about 260 seconds, and has a range of at most 250
seconds.
D) The distribution displays two clusters, has a range of at most 200 seconds, and includes outliers below 75
seconds and above 325 seconds.
E) The distribution displays two clusters, with one cluster centered at about 125 seconds and the other
centered at about 260 seconds, and has a range of at most 250 seconds.
62. A group of students played a game in which they earned points for answering questions correctly. The following
dotplot shows the total number of points earned by each student. Which of the following is the best description
of the distribution of points earned?

A) Approximately normal
B) Bimodal without a gap
C) Bimodal with a gap
D) Skewed to the right without a gap
E) Skewed to the right with a gap
63. The prices, in thousands of dollars, of 304 homes recently sold in a city are summarized in the histogram below.
Based on the histogram, which of the following statements must be true?
A) The minimum price is $250,000.
B) The maximum price is $2,500,000.
C) The median price is not greater than $750,000.
D) The mean price is between $500,000 and $750,000.
E) The upper quartile of the prices is greater than $1,500,000.
64. For a sample of 42 rabbits, the mean weight is 5 pounds and the standard deviation of weights is 3 pounds.
Which of the following is most likely true about the weights for the rabbits in this sample?
A) The distribution of weights is approximately normal because the sample size is 42, and therefore the central
limit theorem applies.
B) The distribution of weights is approximately normal because the standard deviation is less than the mean.
C) The distribution of weights is skewed to the right because the least possible weight is within 2 standard
deviations of the mean.
D) The distribution of weights is skewed to the left because the least possible weight is within 2 standard
deviations of the mean.
E) The distribution of weights has a median that is greater than the mean.
65. The number of hurricanes reaching the East Coast of the United States was recorded for each of the last ten
decades by the National Hurricane Center. Summary measures are shown below.
Min = 12 Max = 24
Lower quartile = 15 Upper quartile = 18
Median = 16 n = 10
Which of the following statements is true?
A) The smallest observation is 12 and it is an outlier. No other observations in the data set could be outliers.
B) The largest observation is 24 and it is an outlier. No other observations in the data set could be outliers.
C) Both 12 and 24 are outliers. It is possible that there are also other outliers.
D) 12 is an outlier and it is possible that there are other outliers at the low end of the data set. There are no
outliers at the high end of the data set.
E) 24 is an outlier and it is possible that there are other outliers at the high end of the data set. There are no
outliers at the low end of the data set.
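Explain (a sketch, assuming the 1.5 × IQR rule given in question 57 is the intended outlier definition; question 65 itself does not state one). The fences can be computed directly from the summary measures:

```python
# Summary measures taken from the question.
q1, q3 = 15, 18
minimum, maximum = 12, 24

# Assumption: an outlier lies more than 1.5 * IQR beyond a quartile.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

low_outlier = minimum < lower_fence    # is the smallest value below the lower fence?
high_outlier = maximum > upper_fence   # is the largest value above the upper fence?

print(lower_fence, upper_fence, low_outlier, high_outlier)
```

Under that assumption the fences are 10.5 and 22.5, so 12 lies inside the fences and 24 lies outside. The summary measures alone do not show whether other values also exceed 22.5.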
66. Data will be collected on the following variables. Which variable can be considered discrete?
A) The height of a person
B) The weight of a person
C) The length of a person’s arm span
D) The time it takes for a person to solve a puzzle
E) The number of books a person finished reading last month
67. Data on homes recently sold in a certain town included the area of the home, reported in square feet. The table
below shows summary statistics of the reported areas, in square feet. An auditor determined that an error was
made in the reported areas and that all of the areas should have been 100 square feet greater than what was
reported. The areas were corrected and new summary statistics were reported. What are the interquartile range
(IQR) and the standard deviation of the corrected areas?

A) IQR 102, standard deviation 61.0723
B) IQR 102, standard deviation 161.0723
C) IQR 202, standard deviation 61.0723
D) IQR 202, standard deviation 161.0723
E) IQR 187, standard deviation 61.0723
68. Administrators at a state university computed the mean GPA (grade point average) for juniors and seniors
majoring in either physics or chemistry. The results are displayed in the table below. When juniors and seniors
are grouped together, could physics majors have a higher mean GPA than chemistry majors?

A) No. The physics majors’ mean GPA for juniors and seniors must be 3.0, while the chemistry majors’ mean
GPA for juniors and seniors must be 3.3.
B) No. There is not enough information to determine the mean GPA for each major, but it must be higher for
chemistry majors than for physics majors.
C) Yes. It could happen. Whether it does happen depends on the number of juniors and seniors in each major.
D) Yes. It could happen. Whether it does happen depends on the variability of the GPAs within each of the four
groups of students.
E) Yes. It could happen. Whether it does happen depends on the shapes of the distributions of the GPAs for
each of the four groups of students.
69. Each value in a sample has been transformed by multiplying by 3 and then adding 10. If the original sample had a
variance of 4, what is the variance of the transformed sample?
A) 4
B) 12
C) 16
D) 22
E) 36
Explain:
When you apply a linear transformation of the form y = ax + b to a dataset, the variance of the transformed
data (y) is related to the variance of the original data (x) by the following formula:
Var(y) = a^2 * Var(x)
In this case, the transformation is to multiply by 3 and then add 10, which can be expressed as y = 3x + 10.
Therefore, a = 3 in the formula.
We are given that the original variance (Var(x)) is 4. Plugging this into the formula:
Var(y) = 3^2 * 4 = 9 * 4 = 36
Therefore, the variance of the transformed sample is 36.
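The formula above can be checked numerically. The following is a minimal sketch with a made-up sample (the values 1, 3, 5 are hypothetical, chosen only because their sample variance is 4):

```python
from statistics import variance

x = [1, 3, 5]                  # hypothetical sample with sample variance 4
y = [3 * v + 10 for v in x]    # transformation from the question: y = 3x + 10

var_x = variance(x)
var_y = variance(y)

# The added constant 10 shifts every value equally and does not affect spread;
# the factor 3 multiplies the variance by 3^2 = 9.
print(var_x, var_y)
```

For this sample the variance goes from 4 to 36, matching Var(y) = a^2 * Var(x) with a = 3.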
70. A graduate student conducted a study of field mice in rural Kansas. The student obtained a sample of 100 field
mice and recorded the weight, in grams, of each mouse. After the measurements were taken, it was discovered
that the scale was not calibrated correctly. The student adjusted the 100 recorded measurements by subtracting
3 grams from each measurement. Which of the following statistics for the weight, in grams, of the field mice has
the same value before and after the adjustment?
A) The median
B) The mean
C) The first quartile
D) The third quartile
E) The interquartile range
71. A data set of test scores is being transformed by applying the following rule to each of the raw scores.
Transformed score = 3.5(raw score) + 6.2.
Which of the following is NOT true?
A) The mean transformed score equals 3.5(the mean raw score) + 6.2.
B) The median transformed score equals 3.5(the median raw score) + 6.2.
C) The range of the transformed scores equals 3.5(the range of the raw scores) + 6.2.
D) The standard deviation of the transformed scores equals 3.5(the standard deviation of the raw scores).
E) The IQR of the transformed scores equals 3.5(the IQR of the raw scores).
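Explain (a sketch with hypothetical raw scores, not data from the question). Measures of center follow the full rule, while measures of spread are only multiplied by 3.5, because the added 6.2 cancels when two transformed values are subtracted:

```python
import math
from statistics import mean

raw = [10, 20, 30, 40, 50]                    # hypothetical raw scores
transformed = [3.5 * s + 6.2 for s in raw]    # rule from the question


def value_range(data):
    """Return the largest value minus the smallest value."""
    return max(data) - min(data)


# The mean follows the full rule 3.5 * (mean) + 6.2.
mean_follows_rule = math.isclose(mean(transformed), 3.5 * mean(raw) + 6.2)

# The range is scaled by 3.5 only; the + 6.2 does not carry over.
range_scales_only = math.isclose(value_range(transformed), 3.5 * value_range(raw))
range_follows_rule = math.isclose(value_range(transformed), 3.5 * value_range(raw) + 6.2)

print(mean_follows_rule, range_scales_only, range_follows_rule)
```

The same cancellation applies to the IQR and the standard deviation, since both are built from differences between values.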
72. A local company is interested in supporting environmentally friendly initiatives such as carpooling among
employees. The company surveyed all of the 200 employees at the downtown offices. Employees responded as
to whether or not they own a car and to the location of the home where they live. The results are shown in the
table below. Which of the following statements about a randomly chosen person from these 200 employees is
true?

A) If the person owns a car, he or she is more likely to live elsewhere in the city than to live in the downtown
area in the city.
B) If the person does not own a car, he or she is more likely to live outside the city than to live in the city
(downtown area or elsewhere).
C) The person is more likely to own a car if he or she lives in the city (downtown area or elsewhere) than if he
or she lives outside the city.
D) The person is more likely to live in the downtown area in the city than elsewhere in the city.
E) The person is more likely to own a car than not to own a car.
73. One statistic calculated for pitchers in baseball is called the earned run average, or ERA. The following boxplots
summarize the ERA for pitchers in two leagues, A and B. Based on the boxplots, which of the following statistics
is the same for both leagues?

A) The range
B) The interquartile range
C) The median
D) The minimum
E) The maximum
74. Roger claims that the two statistics most likely to change greatly when an outlier is added to a small data set are
the mean and the median. Is Roger’s claim correct?
A) Yes, both the mean and median are likely to change greatly.
B) No, only the mean is likely to change greatly.
C) No, only the median is likely to change greatly.
D) No, neither the mean nor the median is likely to change greatly.
E) There is not enough information to determine if the mean or the median is likely to change greatly.
75. A child psychologist asked 100 five year olds and 50 ten year olds to name their favorite color. Their results are
shown in the following table. Which of the following statements is supported by the table?

A) The percentage of five year olds who selected red or blue as their favorite color is greater than the
percentage of ten year olds who selected red or blue as their favorite color.
B) The percentage of five year olds who selected yellow as their favorite color is greater than the percentage of
ten year olds who selected yellow as their favorite color.
C) The percentage of children who selected red, yellow, or blue as their favorite color was equal for both ages.
D) Less than half of the five year olds selected red, yellow, or blue as their favorite color.
E) Less than half of the ten year olds selected red, yellow, or blue as their favorite color.
76. The following data were collected from a random sample of people, who identified their favorite type of juice.
The results are shown in the following two-way table. What proportion of the children identified orange as their
favorite type of juice?

A) 400/1,000
B) 400/700
C) 400/2,000
D) 600/1,300
E) 1,000/2,000
77. The following frequency table shows the responses from a group of college students who were asked to choose
their favorite flavor of ice cream. Which of the following statements is not supported by the table?

A) The number of student responses is 300.
B) One-third of the students chose vanilla.
C) One-third of the students chose chocolate or strawberry.
D) One-fourth of the students chose mint chip or coffee.
E) One-half of the students chose vanilla or chocolate.
78. The following data were collected from a random sample of people on their favorite types of leisure activities
and their age. The results are shown in the two-way table below. What proportion of the people aged 7 to 18
years gave watching television as their favorite type of leisure activity?
A) 300/2,200
B) 200/900
C) 100/1,300
D) 640/3,500
E) 300/640
79. The following pie chart summarizes the results of a survey given to airlines about the primary reason for flight
delays. Which of the following statements is supported by the pie chart?

A) The reason given most frequently was runway closure.
B) More delays were caused by weather than by all other reasons combined.
C) More delays were caused by runway closure than were caused by overbooking.
D) Overbooking and runway closure accounted for greater than one-fourth of the reasons given for flight
delays.
E) The combined percentage for other and runway closure was equal to the percentage for overbooking.
80. Which of the following statements is true about a distribution that appears to have a gap when displayed as a
histogram?
A) The distribution must have an outlier.
B) The distribution has a region between two data values where no data were observed.
C) The distribution is approximately normal.
D) The distribution cannot be symmetric.
E) The distribution must be bimodal.
81. The following boxplot shows the typical gas mileage, in miles per gallon, for 20 different car models. Based on
the boxplot, the top 25 percent of the cars have a typical gas mileage of at least how many miles per gallon?
A) 15
B) 20
C) 25
D) 35
E) 50
82. A golfer recorded the following scores for each of four rounds of golf: 86, 81, 87, 82. The mean of the scores is
84. What is the sum of the squared deviations of the scores from the mean?
A) ∑(x − x̄) = (86−84) + (81−84) + (87−84) + (82−84)
B) ∑|x − x̄| = |86−84| + |81−84| + |87−84| + |82−84|
C) 2∑|x − x̄| = 2[|86−84| + |81−84| + |87−84| + |82−84|]
D) ∑(x − x̄)^2 = (86−84)^2 + (81−84)^2 + (87−84)^2 + (82−84)^2
E) [∑|x − x̄|]^2 = [|86−84| + |81−84| + |87−84| + |82−84|]^2
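Explain (a short sketch using the four scores from the question). "Sum of the squared deviations" means each deviation from the mean is squared first and the squares are then added:

```python
scores = [86, 81, 87, 82]                # scores from the question
mean_score = sum(scores) / len(scores)   # the stated mean, 84

deviations = [x - mean_score for x in scores]

sum_of_deviations = sum(deviations)                  # plain deviations always cancel to 0
sum_of_squared = sum(d ** 2 for d in deviations)     # square each deviation, then add

print(mean_score, sum_of_deviations, sum_of_squared)
```

Here the squared deviations are 4, 9, 9, and 4, which add to 26.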
83. One way to measure the duration of subterranean disturbances such as earthquakes and mining is to calculate
the root-mean-square time. The following histograms summarize the distributions of the root-mean-square
times for two sources of disturbances. Based on the histograms, which of the following correctly compares the
two distributions?

A) The median of the earthquake disturbances is equal to the median of the mining disturbances.
B) The median of the earthquake disturbances is less than the median of the mining disturbances.
C) The range of the earthquake disturbances is equal to the range of the mining disturbances.
D) The range of the earthquake disturbances is less than the range of the mining disturbances.
E) The mode of the earthquake disturbances is equal to the mode of the mining disturbances.
84. An amusement park attraction has a sign that indicates that a person must be at least 48 inches tall to ride the
attraction. The following boxplot shows the heights of a sample of people who entered the amusement park on
one day. Based on the boxplot, approximately what percent of the people who entered the amusement park met
the height requirement for the attraction?
A) 25%
B) 48%
C) 50%
D) 75%
E) 100%
85. The following histogram summarizes the amount spent on plane tickets to travel home, in dollars, for a group of
30 college students. If the interval size is decreased from $200 to $100, which of the following must remain the
same on the new histogram?

A) The heights of the bars
B) The widths of the bars
C) The number of bars
D) The sum of the frequencies
E) The shape of the distribution
86. A study was conducted on three types of home siding and the type of damage done to the siding by
woodpeckers. Each hole made by a woodpecker was classified as either drumming (territorial signaling),
foraging (looking for food), or nesting. The following bar chart shows the relative frequency of the holes for each
type of siding. Which of the following statements is supported by the bar chart?

A) The proportion of holes created for drumming is the same for all three siding types.
B) The proportion of holes created for drumming is greatest for grooved plywood.
C) The proportion of holes created for drumming is least for grooved plywood.
D) The number of holes created for drumming is least for grooved plywood.
E) The number of holes created for drumming is greatest for nonwood.
87. Resting heart rates, in beats per minute, were recorded for two samples of people. One sample was from people
in the age-group of 20 years to 30 years, and the other sample was from people in the age-group of 40 years to
50 years. The five-number summaries are shown in the table. The values 60, 62, and 84 were common to both
samples. The three values are identified as outliers with respect to the age-group 20 years to 30 years because
they are either 1.5 times the interquartile range (IQR) greater than the upper quartile or 1.5 times the IQR less
than the lower quartile. Using the same method for identifying outliers, which of the three values are identified
as outliers for the age-group 40 years to 50 years?

A) None of the three values is identified as an outlier.
B) Only 60 is identified as an outlier.
C) Only 60 and 62 are identified as outliers.
D) Only 60 and 84 are identified as outliers.
E) The three values are all identified as outliers.
88. A certain motel is roughly 20 miles from the entrance to Yosemite National Park. The motel manager wants to
get a better estimate of the distance and asks five people to each measure the distance, to the nearest tenth of a
mile, using the odometer in his or her car. The manager will use the median of the five measurements as the
estimate of the distance. Which of the following statements is NOT a statistical justification for the manager’s
plan?
A) Odometer reading should be considered a variable when used to measure this distance.
B) The median of the five measurements is more likely to be close to the actual distance than is a single
measurement.
C) The actual distance should be considered a variable, and taking five measurements allows the manager to
estimate the variability in the actual distance.
D) If one or two odometers give inaccurate readings, the estimate still should be fairly close to the actual
distance.
E) The manager can get some indication of how far off the estimate might be.
89. As part of a community service program, students in three middle school grades (grade 6, grade 7, grade 8) each
chose to participate in one of three school-sponsored volunteer activities. The graph below shows the
distribution for each class for the three activities. Based on the graph, which statement must be true?

A) Of all the students who chose activity B, the greatest number of students were in grade 6.
B) Grade 7 and grade 8 had the same number of students who did not choose activity A.
C) The grade with the greatest percentage of students who chose activity C was grade 8.
D) For students in grade 7, the number who chose activity C was greater than the number who chose activity B.
E) For students in grade 8, the number who chose activity A was greater than the number who chose activity B.
90. An airline recorded the number of on-time arrivals for a sample of 100 flights each day. The boxplot below
summarizes the recorded data for one year. Based on the boxplot, which of the following statements must be
true?
A) The range of the number of on-time arrivals is greater than 90.
B) The interquartile range of the number of on-time arrivals is 22.
C) The number of days that had at least 80 on-time arrivals is greater than the number of days that had at most
76 on-time arrivals.
D) The number of days that had from 76 to 80 on-time arrivals is equal to the number of days that had at most
76 on-time arrivals.
E) The difference between the median and the lower quartile for the number of on-time arrivals is less than 2.
91. The pulse rate for each person in a sample of 20 men and 20 women was recorded. The boxplots below
summarize the pulse rates for the men and the women in the sample. Which of the following statements about
the people in the sample must be true?

A) There are more people between the first and third quartiles for women than there are between the first and
third quartiles for men.
B) The person with the lowest pulse rate is a woman.
C) At least half of the women had higher pulse rates than three-fourths of the men.
D) More than half of the men had lower pulse rates than three-fourths of the women.
E) If a man and a woman were randomly selected from the 40 people, the man would have the lower pulse
rate.
92. Data were collected on the amount, in dollars, that individual customers spent on dinner in an Italian restaurant.
The quartiles for these data are given below. Which of the following statements must be true for these
customers?

A) At least half of the customers spent less than or equal to $44.27 and at least half spent greater than or equal
to $44.27.
B) Seventy-five percent of the customers spent between $36.27 and $58.97.
C) Twenty-five percent of the customers spent less than or equal to $58.97 and the remaining 75 percent spent
greater than or equal to $58.97.
D) The mean amount spent by customers is $44.27.
E) A majority of customers spent $44.27.
93. The back-to-back stem-and-leaf plot below gives the percentage of students who dropped out of school at each
of the 49 high schools in a large metropolitan school district. Which of the following statements is NOT justified
by these data?
A) The drop-out rate decreased in each of the 49 schools between the 1989-90 and 1992-1993 school years.
B) For the school years shown, most students in the 49 schools did not drop out of high school.
C) In general drop-out rates decreased between the 1989-90 and 1992-1993 school years.
D) The median drop-out rate of the 49 high schools decreased between the 1989-90 and 1992-1993 school
years.
E) The spread between the schools with the lowest drop-out rates and those with the highest drop-out rates
did not change much between the 1989-90 and 1992-1993 school years.
94. The following histogram shows the ages, in years, of the people who attended a documentary at a movie theater.
Based on the histogram, which of the following statements best describes the relationship between the mean
and the median of the distribution of ages?

A) The mean and the median are equal in value because the distribution is symmetric.
B) The mean is most likely less than the median because the distribution is skewed to the right.
C) The mean is most likely less than the median because the distribution is skewed to the left.
D) The mean is most likely greater than the median because the distribution is skewed to the right.
E) The mean is most likely greater than the median because the distribution is skewed to the left.
95. Which of the following statistics is defined as the 50th percentile?
A) The mean
B) The median
C) The mode
D) The interquartile range
E) The standard deviation
96. The following list shows the selling prices of 8 houses in a certain town. What is the median selling price of the
houses in the list?

A) $263,200
B) $283,300
C) $288,450
D) $290,600
E) $293,400
97. Heights, in inches, for the 200 graduating seniors from Washington High School are summarized in the frequency
table below. Which of the following statements about the median height is true?
A) It is greater than or equal to 78 inches.
B) It is greater than or equal to 72 inches but less than 78 inches.
C) It is greater than or equal to 66 inches but less than 72 inches.
D) It is greater than or equal to 60 inches but less than 66 inches.
E) It is less than 60 inches.
98. A statistician at a metal manufacturing plant is sampling the thickness of metal plates. If an outlier occurs within
a particular sample, the statistician must check the configuration of the machine. The distribution of metal
thickness has mean 23.5 millimeters (mm) and standard deviation 1.4 mm. Based on the two-standard
deviations rule for outliers, of the following, which is the greatest thickness that would require the statistician to
check the configuration of the machine?
A) 19.3mm
B) 20.6mm
C) 22.1mm
D) 23.5mm
E) 24.9mm
99. For the three histograms above, which of the following correctly orders the histograms from the one with the
smallest proportion of data above its mean to the one with the largest proportion of data above its mean?

A) J, K, L
B) J, L, K
C) K, L, J
D) L, K, J
E) All three histograms
100. The number of siblings was recorded for each student of a group of 80 students. Some summary
statistics and a histogram displaying the results are shown below. An outlier is often defined as a number that is
more than 1.5 times the interquartile range below the first quartile or above the third quartile. Using the
definition of an outlier and the given information, which of the following can be concluded?

A) The median is greater than the mean, and the distribution has no outliers.
B) The median is greater than the mean, and the distribution has only one outlier.
C) The median is greater than the mean, and the distribution has two outliers.
D) The median is less than the mean, and the distribution has only one outlier.
E) The median is less than the mean, and the distribution has two outliers.
101. The following relative frequency table shows the political party affiliation for a sample of 500 people in a
certain town. Which of the following statements is supported by the table?

A) The number of people affiliated with the Republicans is 30.


B) The number of people affiliated with the Independents is 100.
C) Less than half of the people are affiliated with the Democrats or the Republicans.
D) At least 200 people are affiliated with the Democrats.
E) At least 80 people are affiliated with the Green Party or the Libertarians.
102. A penalty kick in soccer involves two players from different teams, the shooter and the goalie. During the
penalty kick the shooter will try to score a goal by kicking a soccer ball to the left or right of the goal area. To
prevent the shooter from scoring a goal, the goalie will move to the left or right of the goal area. The following
table summarizes the directions taken by the shooter and the goalie for 372 penalty kicks. Which of the following
indicates an association between the shooter's choice of direction and the goalie’s choice of direction?

A) The marginal relative frequencies for the shooter and the goalie are equal.
B) The marginal relative frequencies for the shooter and the goalie are not equal.
C) The row totals are not equal.
D) For the goalie, the relative frequency of a direction is equal to the relative frequency conditioned on the
shooter’s direction.
E) For the goalie, the relative frequency of a direction is not equal to the relative frequency conditioned on the
shooter’s direction.
103. At a photography contest, entries are scored on a scale from 1 to 100. At a recent contest with 1,000
entries, a score of 68 was at the 77th percentile of the distribution of all the scores. Which of the following is the
best description of the 77th percentile of the distribution?
A) There were 770 entries with a score less than or equal to 68.
B) There were at least 230 entries with a score of 77.
C) There were 23% of the entries with a score less than or equal to 68.
D) There were 77% of the entries with a score equal to 68.
E) There were at least 77% of the entries with a score greater than 68.
104. The following table summarizes the number of pies sold at a booth one day at a local farmers market.
Which of the following statements is supported by the table?

A) More cherry pies were sold than any other type of pie.
B) Twice as many apple pies as key lime pies were sold.
C) More than half the pies sold were apple.
D) Fewer than 50 pies were sold at the booth that day.
E) The combined percentage of key lime pies sold and pumpkin pies sold was less than 50%.
105. A sample of 100 students from Liberty High School and a sample of 60 students from Central High School
were asked what they planned to do after graduation. Responses fell into five categories: four-year university
(4Y), community college (CC), join the workforce (W), join the military (M), or undecided (UD). The results are
shown in the following bar chart. Which of the following statements is supported by the bar chart?

A. For the category four-year university, the number of students from Central High School was 10 greater
than the number of students from Liberty High School.
B. At Liberty High School, more students selected a four-year university than any other activity.
C. For the category join the workforce, the number of students from each school was equal.
D. At Central High School, the same number of students selected four-year university and military.
E. For the category undecided, the number of students from Liberty High School was 4 greater than the
number of students from Central High School.
106. A random sample of 1,092 people were asked whether color was a consideration in buying a new car.
They were also asked to identify one additional feature that was important. The responses are shown in the
table. Which of the following is closest to the proportion of people who responded no to color consideration and
who identified safety as the additional feature that was important?

A) 0.18
B) 0.34
C) 0.36
D) 0.49
E) 0.51
107. The following bar chart shows the relative frequency of days of rain for 30 days in four regions of a
certain state. Which of the following statements is not supported by the bar chart?

A) Region D had the greatest percentage of days of rain.


B) Region B had the least percentage of days of rain.
C) Region A had more than 15 days of rain.
D) Region C had more than 25 days of rain.
E) Region D had less than 23 days of rain.
108. The following table shows data for the 8 longest roller coasters in the world as of 2015. Which of the
following variables is categorical?

A) Length
B) Type
C) Speed
D) Height
E) Drop
109. A company wanted to determine the health care costs of its employees. A sample of 25 employees were
interviewed and their medical expenses for the previous year were determined. Later the company discovered
that the highest medical expense in the sample was mistakenly recorded as 10 times the actual amount.
However, after correcting the error, the corrected amount was still greater than or equal to any other medical
expense in the sample. Which of the following sample statistics must have remained the same after the
correction was made?
A) Mean
B) Median
C) Mode
D) Range
E) Variance
110. Which of the following questions about cars in a school parking lot will allow for the collection of a set of
categorical data?
A) How many blue cars are in the lot?
B) What are the gas mileages, in miles per gallon, of the cars in the lot?
C) What are the weights, in pounds, of the cars in the lot?
D) What is the number of cars in the lot with out-of-state license plates?
E) What are the colors of the cars in the lot?
111. A veterinarian collected data on the weights of 1,000 cats and dogs treated at a veterinary clinic. The
weight of each animal was classified as either healthy, underweight, or overweight. The data are summarized in
the table. Based on the data in the table, which of the following is the most appropriate type of graph to visually
show whether a relationship exists between the type of animal and the weight classification?

A) Back-to-back stemplots
B) Scatterplot
C) Side-by-side boxplots
D) Segmented bar chart
E) Dotplot
112. Researchers conducted a telephone survey of 427 adults living in a large city. The adults were asked
whether they planned to purchase a smart watch in the next year. The table shows the responses categorized by
the region of the city in which the residents live. Which of the following graphical displays is most appropriate for
comparing the proportions of those surveyed who plan to purchase a smart watch within the four regions?
A) Skewed to the left (negatively skewed)
B) Skewed to the right (positively skewed)
C) Bimodal
D) Uniform
E) Approximately normal
113. New employees at a large corporation go through a training program during their first week of
employment. The new employees take a written assessment at the completion of the program to determine how
well prepared they are for their jobs. A score greater than the mean indicates a well-prepared employee. Assume
the following distributions of new employee scores have the same mean score, the same maximum score, and
the same minimum score. Which distribution has a shape that is most likely to represent the greatest percent of
well-prepared employees?
A) The distribution of scores is skewed to the right.
B) The distribution of scores is skewed to the left.
C) The distribution of scores is bimodal and symmetric.
D) The distribution of scores is uniform.
E) The distribution of scores is approximately normal.
114. The following table shows data that were collected from a random sample of people, who indicated their
age and their favorite sporting event to watch on television. Based on the results above, what proportion of the
randomly sampled people are over age 12 years?

A) 900/3,500
B) 1,300/3,500
C) 1,200/3,500
D) 2,300/3,500
E) 1,000/3,500
115. Research indicates that the standard deviation of typical human body temperature is 0.4 degree Celsius
(C). Which of the following represents the standard deviation of typical human body temperature in degrees
Fahrenheit (F), where F = (9/5)C + 32?
A) (9/5)(0.4) + 32
B) (9/5)(0.4)
C) (9/5)(0.4)²
D) (9/5)²(0.4)
E) (9/5)²(0.4)²
116. The following boxplot summarizes the heights of a sample of 100 trees growing on a tree farm. Emily
claims that a tree height of 43 inches is an outlier for the distribution. Based on the 1.5×IQR rule for outliers, is
there evidence to support the claim?

A) Yes, because (max−Q3) is greater than (Q1−min).


B) Yes, because 43 is greater than (Q3+IQR)
C) Yes, because 43 is greater than (Q1−1.5×IQR).
D) No, because 43 is not greater than (Q3+1.5×IQR).
E) No, because 43 is greater than (Q1−1.5×IQR).
117. The following dotplot shows the scores of 25 people who played an online trivia game. Which of the
following statements is the best description of the distribution of scores?

A) The distribution is roughly symmetric.


B) The distribution is roughly uniform.
C) The distribution is skewed left.
D) The distribution is skewed right.
E) The distribution is bimodal.
118. Data were collected on 100 United States coins minted in 2018. Which of the following represents a
quantitative variable for the data collected?
A) The type of metal used in the coin
B) The value of the coin
C) The color of the coin
D) The person depicted on the face of the coin
E) The location where the coin was minted
119. The following list shows the number of video games sold at a game store each day for one week.
15, 43, 50, 39, 22, 16, 20
Which of the following is the best classification of the data in the list?
A) Categorical and continuous
B) Quantitative and continuous
C) Categorical and discrete
D) Quantitative and discrete
E) Neither categorical nor quantitative, and neither discrete nor continuous
120. A split ticket is a voting pattern in which a voter casts votes for candidates from more than one political
party. In a recent study, 1,000 men and women were asked whether they voted a split ticket in the last election.
The totals are shown in the following table. What value of a would indicate no association between gender and
voting pattern for the people in the sample?

A) 300
B) 400
C) 480
D) 500
E) 800
121. Scientists estimate that the distribution of the life span of the Galápagos Islands giant tortoise is
approximately normal with mean 100 years and standard deviation 15 years. Based on the estimate, which of
the following is closest to the age of a Galápagos Islands giant tortoise at the 90th percentile of the distribution?
A) 80 years
B) 115 years
C) 120 years
D) 125 years
E) 130 years
Explain:
We are given that the life span of the Galápagos Islands giant tortoise follows a normal distribution with a
mean of 100 years and a standard deviation of 15 years.
The 90th percentile represents the age at which 90% of the tortoises are younger and 10% are older.
To find the age at the 90th percentile, we need to calculate the z-score corresponding to the 90th percentile
in the standard normal distribution. You can use a z-score table or calculator to find that the z-score for the
90th percentile is approximately 1.28.
Now, we can use the z-score and the known mean and standard deviation to find the age at the 90th
percentile in the actual distribution:
Age at 90th percentile = Mean + (z-score * Standard deviation)
Age = 100 years + (1.28 * 15 years)
Age ≈ 120 years
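The percentile calculation above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and not part of the original question.

```python
from statistics import NormalDist

# Life span model from the question: normal, mean 100 years, SD 15 years.
life_span = NormalDist(mu=100, sigma=15)

# z-score that cuts off the lowest 90% of a standard normal distribution.
z_90 = NormalDist().inv_cdf(0.90)

# Age at the 90th percentile, computed two equivalent ways.
age_from_z = 100 + z_90 * 15
age_direct = life_span.inv_cdf(0.90)

print(round(z_90, 2))        # about 1.28
print(round(age_direct, 1))  # about 119.2, closest to choice C (120 years)
```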
122. The distribution of the number of transactions per day at a certain automated teller machine (ATM) is
approximately normal with a mean of 80 transactions and a standard deviation of 10 transactions. Which of the
following represents the parameters of the distribution?
A) x̄ = 80; s = 10
B) x̄ = 80; s² = 10
C) x̄ = 80; σ = 10
D) μ = 80; σ = 10
E) μ = 80; s = 10
123. At a small coffee shop, the distribution of the number of seconds it takes for a cashier to process an
order is approximately normal with mean 276 seconds and standard deviation 38 seconds. Which of the
following is closest to the proportion of orders that are processed in less than 240 seconds?
A) 0.17
B) 0.25
C) 0.36
D) 0.83
E) 0.95
124. A sleep time of 15.9 hours per day for a newborn baby is at the 10th percentile of the distribution of
sleep times for all newborn babies. Assuming the distribution is normal with standard deviation 0.5 hour,
approximately what is the mean sleep time, in hours per day, for newborn babies?
A) 15.1
B) 15.3
C) 16.3
D) 16.5
E) 16.7
Explain:
1. Identify the z-score: Since the given sleep time (15.9 hours) is at the 10th percentile, it corresponds to a z-score
of approximately -1.28 in the standard normal distribution. (A z-score table or calculator can be used to confirm
this).
2. Relate z-score to mean and standard deviation: In a normal distribution, the z-score represents the number of
standard deviations a specific value is away from the mean. Therefore:
z = (X - μ) / σ
where:
• z is the z-score (-1.28)
• X is the sleep time at the 10th percentile (15.9 hours)
• μ is the mean sleep time for all newborn babies (unknown)
• σ is the standard deviation (0.5 hours)
3. Solve for the mean: Rearranging the equation to solve for μ:
μ = X - z * σ
μ = 15.9 hours - (-1.28) * 0.5 hours
μ ≈ 16.54 hours
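As a quick numerical check of the steps above, here is a small sketch that recovers the mean from a known percentile. It assumes the normal model stated in the question; the variable names are illustrative.

```python
from statistics import NormalDist

x = 15.9      # sleep time at the 10th percentile (hours)
sigma = 0.5   # standard deviation (hours)

# z-score of the 10th percentile in a standard normal distribution.
z = NormalDist().inv_cdf(0.10)

# Rearranging z = (x - mu) / sigma gives mu = x - z * sigma.
mu = x - z * sigma

print(round(z, 2))   # about -1.28
print(round(mu, 2))  # about 16.54, closest to choice D (16.5)
```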

125. For a recent season in college football, the total number of rushing yards for that season is recorded for
each running back. The mean number of rushing yards for the running backs that season is 790 yards. One
running back had 1,637 rushing yards for the season, which is 2.42 standard deviations above the mean number
of rushing yards. What is the standard deviation of the number of rushing yards for the running backs that
season?
A) 250 yards
B) 300 yards
C) 350 yards
D) 400 yards
E) 450 yards
Explain:
1. Identify the z-score: The running back with 1,637 yards is 2.42 standard deviations above the mean. This means
their rushing yards have a z-score of 2.42.
2. Relate z-score to mean and standard deviation: We know the z-score formula:
z = (X - μ) / σ
where:
• z is the z-score (2.42)
• X is the rushing yards of the outlier (1,637 yards)
• μ is the mean number of rushing yards (790 yards)
• σ is the standard deviation (unknown)
3. Solve for the standard deviation: We can rearrange the formula to solve for σ:
σ = (X - μ) / z
σ = (1,637 yards - 790 yards) / 2.42
σ ≈ 350 yards
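The same rearrangement can be written as a short sketch in code. The variable names are illustrative only.

```python
x = 1637    # rushing yards for the running back in question
mu = 790    # mean rushing yards
z = 2.42    # number of standard deviations above the mean

# Rearranging z = (x - mu) / sigma gives sigma = (x - mu) / z.
sigma = (x - mu) / z

print(round(sigma))  # 350, which matches choice C
```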
126. Students in a large psychology class measured the time, in seconds, it took each of them to perform a
certain task. The times were later converted to minutes. If a student had a standardized score of z = 1.72 before
the conversion, what is the standardized score for the student after the conversion?
A) z = 0.26
B) z = 1.03
C) z = 1.72
D) z = 1.98
E) The standardized score for the student after the conversion cannot be determined.
127. Height, in meters, is measured for each person in a sample. After the data are collected, all the height
measurements are converted from meters to centimeters by multiplying each measurement by 100. Which of
the following statistics will remain the same for both units of measure?
A) The mean of the height measurements
B) The median of the height measurements
C) The standard deviation of the height measurements
D) The maximum of the height measurements
E) The z-scores of the height measurements
128. A distribution of test scores is not symmetric. Which of the following is the best estimate of the z-score
of the third quartile?
A) 0.67
B) 0.75
C) 1.00
D) 1.41
E) This z-score cannot be estimated from the information given.
129. A certain type of remote-control car has a fully charged battery at the time of purchase. The distribution
of running times of cars of this type, before they require recharging of the battery for the first time after its
period of initial use, is approximately normal with a mean of 80 minutes and a standard deviation of 2.5 minutes.
The shaded area in the figure below represents which of the following probabilities?

A) The probability that the running time of a randomly selected car of this type, before it requires recharging of
the battery for the first time after its period of initial use, is between 75 minutes and 82.5 minutes.
B) The probability that the running time of a randomly selected car of this type, before it requires recharging of
the battery for the first time after its period of initial use, is between 75 minutes and 85 minutes.
C) The probability that the running time of a randomly selected car of this type, before it requires recharging of
the battery for the first time after its period of initial use, is between 77.5 minutes and 82.5 minutes.
D) The probability that the running time of a randomly selected car of this type, before it requires recharging of
the battery for the first time after its period of initial use, is between 77.5 minutes and 85 minutes.
E) The probability that the running time of a randomly selected car of this type, before it requires recharging of
the battery for the first time after its period of initial use, is between 77.5 minutes and 87.5 minutes.
130. The height of 3-year-old boys is approximately normally distributed. Duncan and Shane are 3-year-old
boys. Duncan is 32.0 inches tall and is at the 32nd percentile of the distribution. Shane is 34.0 inches tall and is at
the 62nd percentile of the distribution. Which of the following is closest to the mean of the height distribution?
A) 32.50 inches
B) 32.79 inches
C) 33.00 inches
D) 33.21 inches
E) 36.53 inches
131. The distribution of monthly rent for one-bedroom apartments in a city is approximately normal with
mean $936 and standard deviation $61. A graduate student is looking for a one-bedroom apartment and wants
to pay no more than $800 in monthly rent. Of the following, which is the best estimate of the percent of one-
bedroom apartments in the city with a monthly rent of at most $800 ?
A) 1.3%
B) 2.5%
C) 50%
D) 95%
E) 97.5%
Explain:
1. Calculate the z-score of the desired rent:
z = (Target rent - Mean) / Standard deviation
z = ($800 - $936) / $61 ≈ -2.23
2. Find the area below the z-score in a standard normal distribution:
Using a standard normal table or calculator, look up the cumulative area (probability) below a z-score of -2.23. This represents the proportion of apartments with rent at or below $800.
3. Interpret the result:
The table or calculator gives a value of about 0.0129, which means about 1.29% of the
apartments have rent at or below $800.
Therefore, the best estimate for the percentage of apartments with rent at most $800 is:
A) 1.3%
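The table lookup can be replaced by a direct computation. This is a minimal sketch assuming the normal model given in the question; it uses the standard-library `statistics.NormalDist`, and the variable names are illustrative.

```python
from statistics import NormalDist

# Monthly rent model from the question: normal, mean $936, SD $61.
rent = NormalDist(mu=936, sigma=61)

# z-score of the $800 budget limit.
z = (800 - 936) / 61

# Proportion of apartments renting for at most $800.
proportion = rent.cdf(800)

print(round(z, 2))                 # about -2.23
print(round(100 * proportion, 1))  # about 1.3 (percent)
```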

132. Based on findings from a recent study on women's health, researchers created a 90 percent confidence
interval of (0.42, 0.48) to estimate the percent of all women who do not find time to focus on their own health.
Based on the confidence interval, which of the following claims is not supported?
A) Less than half of all women do not find time to focus on their own health.
B) More than 40 percent of all women do not find time to focus on their own health.
C) Approximately 45 percent of all women do not find time to focus on their own health.
D) More than 45 percent of all women do not find time to focus on their own health.
E) More than 25 percent of all women do not find time to focus on their own health.
Explain: A confidence interval provides a range of plausible values for the true percentage, here with a 90%
level of confidence based on the sample data.
Every value in the interval (0.42, 0.48) is less than 0.50 and greater than 0.40 and 0.25, so claims A, B, and E
are supported. The interval is centered at 0.45, so the claim of approximately 45 percent (C) is consistent
with it as well.
The claim that is not supported is D) More than 45 percent. The interval includes values from 42% to 45%,
so the true percentage could plausibly be 45% or less.
Note that the confidence interval does not give the exact percentage of women who do not find time to
focus on their own health. It only identifies the values that are plausible given the sample data.
133. A recent survey estimated that 19 percent of all people living in a certain region regularly use sunscreen
when going outdoors. The margin of error for the estimate was 1 percentage point. Based on the estimate and
the margin of error, which of the following is an appropriate conclusion?
A) Approximately 1% of all the people living in the region were surveyed.
B) Between 18% and 20% of all the people living in the region were surveyed.
C) All possible samples of the same size will result in between 18% and 20% of those surveyed indicating they
regularly use sunscreen.
D) The probability is 0.01 that a person living in the region will use sunscreen when going outdoors.
E) It is plausible that the percent of all people living in the region who regularly use sunscreen is 18.5%.
134. In a large school district, 16 of 85 randomly selected high school seniors play a varsity sport. In the same
district, 19 of 67 randomly selected high school juniors play a varsity sport. A 95 percent confidence interval for
the difference between the proportion of high school seniors who play a varsity sport in the school district and
high school juniors who play a varsity sport in the school district is to be calculated. What is the standard error of
the difference?
A) 0.0347
B) 0.0695
C) 0.1362
D) 0.9800
E) 1.6900
Explain:
To calculate the standard error of the difference between the proportions of high school seniors and juniors
who play a varsity sport, we can use the following formula for a two-sample confidence interval:
SE = √( p1(1 − p1)/n1 + p2(1 − p2)/n2 )
where p1 = 16/85 ≈ 0.1882, n1 = 85, p2 = 19/67 ≈ 0.2836, and n2 = 67.
SE = √( (0.1882)(0.8118)/85 + (0.2836)(0.7164)/67 )
SE ≈ √( 0.00180 + 0.00303 ) ≈ 0.0695
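For question 134, the standard error can be verified numerically. The sketch below uses the unpooled standard error that is standard for a confidence interval for a difference between two proportions; the function name `se_diff_proportions` is hypothetical and introduced only for illustration.

```python
from math import sqrt

def se_diff_proportions(x1, n1, x2, n2):
    """Unpooled standard error of the difference between two sample proportions."""
    p1 = x1 / n1
    p2 = x2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Seniors: 16 of 85 play a varsity sport; juniors: 19 of 67.
se = se_diff_proportions(16, 85, 19, 67)

print(round(se, 4))  # 0.0695, which matches choice B
```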

135. Environmentalists want to estimate the percent of trees in a large forest that are infested with a certain
beetle. The environmentalists will select a random sample of trees to inspect. Which of the following is the most
appropriate method for creating such an estimate?
a. A two-sample z-interval for a population proportion
b. A one-sample z-interval for a sample proportion
c. A one-sample z-interval for a population proportion
d. A two-sample z-interval for a difference between sample proportions
e. A two-sample z-interval for a difference between population proportions
136. A random sample of residents in city J were surveyed about whether they supported raising taxes to
increase bus service for the city. From the results, a 95 percent confidence interval was constructed to estimate
the proportion of people in the city who support the increase. The interval was (0.46, 0.52). Based on the
confidence interval, which of the following claims is supported?
A) More than 90 percent of the residents support the increase.
B) Fewer than 10 percent of the residents support the increase.
C) More than 40 percent of the residents support the increase.
D) More than 60 percent of the residents support the increase.
E) Fewer than 25 percent of the residents support the increase.
Explain:
Claims supported:
C) More than 40 percent of the residents support the increase: The lower bound of the confidence interval is
0.46, which is already above 40%, so this claim is true.
Note: The interval (0.46, 0.52) contains values both below and above 0.50, so it does not support a claim that a
majority of the residents support the increase, nor a claim that a majority oppose it.
Claims not supported:
A) More than 90 percent of the residents support the increase: The upper bound of the confidence interval is
0.52, which is far below 90%, so this claim is not supported.
B) Fewer than 10 percent of the residents support the increase: The lower bound of the confidence interval is
0.46, well above 10%, so this claim is not supported.
D) More than 60 percent of the residents support the increase: The upper bound of the confidence interval is
0.52, which is below 60%, so this claim is not supported.
E) Fewer than 25 percent of the residents support the increase: Similar to option B, the lower bound of the
confidence interval (0.46) is already above 25%, making this claim unsupported.
Therefore, the only claim definitively supported by the confidence interval is C) More than 40 percent of the
residents support the increase.
Remember, a confidence interval provides a range for the estimated proportion with a certain level of
certainty (95% in this case). It doesn't tell us the exact value, but it helps us rule out some values as
statistically unlikely based on the sample data.
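The reasoning used to sort the claims can be sketched in code. The helper names below are hypothetical, and treating a claim as supported only when the entire interval satisfies it is a simplifying assumption for illustration, not a formal hypothesis test.

```python
def supports_more_than(interval, threshold):
    """A 'more than threshold' claim is supported if the whole interval lies above it."""
    low, _high = interval
    return low >= threshold

def supports_fewer_than(interval, threshold):
    """A 'fewer than threshold' claim is supported if the whole interval lies below it."""
    _low, high = interval
    return high <= threshold

ci = (0.46, 0.52)  # 95 percent confidence interval from the question

print(supports_more_than(ci, 0.90))   # False (claim A)
print(supports_fewer_than(ci, 0.10))  # False (claim B)
print(supports_more_than(ci, 0.40))   # True  (claim C)
print(supports_more_than(ci, 0.60))   # False (claim D)
print(supports_fewer_than(ci, 0.25))  # False (claim E)
```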
137. A local arts council has 200 members. The council president wanted to estimate the percent of its
members who have had experience in writing grants. The president randomly selected 30 members and surveyed
the selected members on their grant-writing experience. Of the 30 selected members, 12 indicated that they did
have the experience. Have the conditions for inference with a one-sample z-interval been met?
A) Yes, all conditions for inference have been met.
B) No, because the sample size is not large enough to satisfy the conditions for normality.
C) No, because the sample was not selected at random.
D) No, because the sample size is not less than 10 percent of the population size.
E) No, because the sample is not representative of the population.
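Explain: Calculate the ratio of the number of randomly selected members (n) to the total number of council members (N):
n/N = 30/200 = 0.15 > 0.10
Because the sample is more than 10 percent of the population, the condition named in choice D is not met.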
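The condition checks for this question can be sketched as follows. The function name is hypothetical, and the threshold of at least 10 successes and 10 failures is a commonly taught convention assumed here rather than something stated in the question.

```python
def check_z_interval_conditions(successes, n, population_size):
    """Return which conditions hold for a one-sample z-interval for a proportion."""
    failures = n - successes
    return {
        # Sampling without replacement: sample should be at most 10% of the population.
        "ten_percent": n <= 0.10 * population_size,
        # Normality: assumed convention of at least 10 successes and 10 failures.
        "large_counts": successes >= 10 and failures >= 10,
    }

# 12 of 30 sampled members have grant-writing experience; the council has 200 members.
result = check_z_interval_conditions(successes=12, n=30, population_size=200)

print(result)  # the 10 percent condition fails; the large-counts condition holds
```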
138. In 2009 a survey of Internet usage found that 79 percent of adults age 18 years and older in the United
States use the Internet. A broadband company believes that the percent is greater now than it was in 2009 and
will conduct a survey. The company plans to construct a 98 percent confidence interval to estimate the current
percent and wants the margin of error to be no more than 2.5 percentage points. Assuming that at least 79
percent of adults use the Internet, which of the following should be used to find the sample size (n) needed?
A) $1.96\sqrt{\frac{0.5}{n}} \le 0.025$
B) $1.96\sqrt{\frac{(0.5)(0.5)}{n}} \le 0.025$
C) $2.33\sqrt{\frac{(0.5)(0.5)}{n}} \le 0.05$
D) $2.33\sqrt{\frac{(0.79)(0.21)}{n}} \le 0.025$
E) $2.33\sqrt{\frac{(0.79)(0.21)}{n}} \le 0.05$
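Explain: "No more than" means the margin of error must be ≤ 0.025. Because p is assumed to be at least 0.79, p(1 − p) is at most (0.79)(0.21).
Formula: $z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le \varepsilon$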
139. Consider a 90 percent confidence interval to estimate a population proportion that is constructed from a
sample proportion of 66 percent. If the width of the interval is 10 percent, what is the margin of error?
A) 2.5 percent
B) 10 percent
C) 20 percent
D) 45 percent
E) 5 percent
Explain: The width of a confidence interval is twice its margin of error, so the margin of error is 10% / 2 = 5%.
140. A commercial for a breakfast cereal is shown during a certain television program. The manufacturer of
the cereal wants to estimate the percent of television viewers who watch the program. The manufacturer wants
the estimate to have a margin of error of at most 0.02 at a level of 95 percent confidence. Of the following, which
is the smallest sample size that will satisfy the manufacturer's requirements?
A) 40
B) 50
C) 100
D) 1,700
E) 2,500
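Explain:
Margin of error ε = 0.02
At a level of 95 percent confidence => z = 1.96
Sample size: $n = \frac{z^2\,p(1-p)}{\varepsilon^2}$
p is the population proportion (taken as 0.5 when no estimate is available, because that value maximizes p(1 − p))
With these values, n = (1.96)²(0.25)/(0.02)² = 2,401, so the smallest listed sample size that is large enough is 2,500.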
141. Paul will select a random sample of students to create a 95 percent confidence interval to estimate the
proportion of students at his college who have a tattoo. Of the following, which is the smallest sample size that
will result in a margin of error of no more than 5 percentage points?
A) 73
B) 97
C) 271
D) 385
E) 1,537
Explain: Similar to above question
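The sample-size formula used in questions 140 and 141 can be checked with a short sketch. The function name `min_sample_size` is hypothetical, and p = 0.5 is assumed as the conservative default. The critical value comes from `statistics.NormalDist` rather than the rounded table value 1.96, so intermediate numbers may differ slightly from hand calculations.

```python
import math
from statistics import NormalDist


def min_sample_size(confidence, max_margin, p=0.5):
    """Smallest n with z * sqrt(p(1-p)/n) <= max_margin (p = 0.5 is the conservative choice)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p * (1 - p) / max_margin**2)


def smallest_adequate_option(options, required_n):
    """Pick the smallest listed answer choice that is at least the required n."""
    return min(option for option in options if option >= required_n)


n_140 = min_sample_size(confidence=0.95, max_margin=0.02)
n_141 = min_sample_size(confidence=0.95, max_margin=0.05)
print(n_140, smallest_adequate_option([40, 50, 100, 1700, 2500], n_140))
print(n_141, smallest_adequate_option([73, 97, 271, 385, 1537], n_141))
```

Rounding up matters: a fractional requirement such as 384.1 means 384 respondents are not quite enough.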
142. Consider a 90 percent confidence interval for a population proportion p. Which of the following is a
correct interpretation of the confidence level 90 percent?
A) The probability that the true difference in population proportions falls within the bounds of the confidence
interval is 0.90.
B) For repeated random sampling from the populations with samples of the same size, approximately 90% of
the sample proportions will fall within the bounds of the confidence interval.
C) If the sampling process is repeated 10 times, 9 intervals will capture the true difference between the
population proportions and 1 interval will not.
D) For repeated random sampling from the populations with samples of the same size, approximately 90% of
the confidence intervals constructed will capture the true difference between the population proportions.
E) For repeated random sampling from the populations with samples of the same size, approximately 90% of
the confidence intervals constructed will capture the sample difference between the population proportions.
Explain:
What is a confidence interval? A confidence interval is a range of plausible values for a population
parameter, computed from sample data. The confidence level describes the method: if we were to take many
random samples of the same size from the same population and compute a 90% confidence interval for each
sample, we would expect that about 90% of these intervals would contain the true population proportion.
Option B is incorrect because it refers to the sample proportions, not the population proportion.
Option A is incorrect because it implies that there is a probability that the population proportion falls within
the interval, which is a common misconception. The population proportion is fixed and is either in the
interval or not. The interval is a range of plausible values for the population proportion based on the sample
data and the level of confidence chosen.
Option C is incorrect because it implies that exactly 9 of every 10 confidence intervals contain the population
proportion, which is not necessarily true. We can only say that about 90% of the intervals will contain the
population proportion if we repeat the sampling and interval construction process many times.
Hence, the correct interpretation of the 90% confidence level is option D: for repeated random sampling with
samples of the same size, approximately 90% of the confidence intervals constructed will capture the true
population value.
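The repeated-sampling interpretation of a confidence level can be illustrated with a small simulation. This is only a sketch under assumptions: the true proportion 0.66, the sample size 200, the number of trials, and the seed are all hypothetical values chosen for illustration, and the function name `coverage_rate` is not from the source.

```python
import random
from statistics import NormalDist


def coverage_rate(p_true, n, level, trials, seed=0):
    """Fraction of simulated z-intervals that capture the fixed true proportion."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    hits = 0
    for _ in range(trials):
        # Draw one random sample and count successes.
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
        if p_hat - margin <= p_true <= p_hat + margin:
            hits += 1
    return hits / trials


rate = coverage_rate(p_true=0.66, n=200, level=0.90, trials=2000)
print(rate)  # expected to be close to 0.90, but not exactly 0.90
```

Each individual interval either contains the true proportion or does not; the 90 percent describes the long-run share of intervals that do.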
143. USA Today reported that speed skater Bonnie Blair had "won the USA's heart," according to a USA
Today/CNN/Gallup poll conducted on the final Thursday of the 1994 Winter Olympics. When asked who was the
hero of the Olympics, 65 percent of the respondents chose Blair, who won five gold medals. The poll of 615
adults, done by telephone, had a margin of error of 4 percent. Which of the following statements best describes
what is meant by the 4 percent margin of error?
A) About 4 percent of adults were expected to change their minds between the time of the poll and its
publication in USA Today.
B) About 4 percent of the adults sampled are not representative of the population.
C) About 4 percent of the 615 adults polled refused to answer the question.
D) The difference between the sample percentage and the population percentage is likely to be less than 4
percent.
144. A polling agency reported that 66 percent of adults living in the United States were satisfied with their
health care plans. The estimate was taken from a random sample of 1,542 adults living in the United States, and
the 95 percent confidence interval for the population proportion was calculated as (0.636, 0.684). Which of the
following statements is a correct interpretation of the 95 percent confidence level?
A) The probability is 0.95 that the percent of adults living in the United States who are satisfied with their
health care plans is between 63.6% and 68.4%.
B) Approximately 95% of random samples of the same size from the population will result in a confidence
interval that includes the proportion of all adults living in the United States who are satisfied with their
health care plans.
C) Approximately 95% of random samples of the same size from the population will result in a confidence
interval that includes the proportion of all adults in the sample who are satisfied with their health care plans.
D) Approximately 95% of all random samples of adults living in the United States will indicate that between
63.6% and 68.4% of the adults are satisfied with their health care plans.
E) Approximately 95% of all random samples of adults living in the United States will result in a sample
proportion of 0.66 adults living in the United States who are satisfied with their health care plans.
145. A 90 percent confidence interval is to be created to estimate the proportion of television viewers in a
certain area who favor moving the broadcast of the late weeknight news to an hour earlier than it is currently.
Initially, the confidence interval will be created using a simple random sample of 9,000 viewers in the area.
Assuming that the sample proportion does not change, what would be the relationship between the width of the
original confidence interval and the width of a second 90 percent confidence interval that is created based on a
sample of only 1,000 viewers in the area?
A) The second confidence interval would be 9 times as wide as the original confidence interval.
B) The second confidence interval would be 3 times as wide as the original confidence interval.
C) The width of the second confidence interval would be equal to the width of the original confidence interval.
D) The second confidence interval would be times as wide as the original confidence interval.
E) The second confidence interval would be times as wide as the original confidence interval.
146. A marketing company wants to estimate the proportion of consumers in a certain region of the country who
would react favorably to a new marketing campaign. Further, the company wants the estimate to have a margin
of error of no more than 5 percent with 90 percent confidence. Of the following, which is closest to the
minimum number of consumers needed to obtain the estimate with the desired precision?
A) 136
B) 271
C) 385
D) 542
E) 769
147. An environmental group wanted to estimate the proportion of fresh produce sales identified as organic in a
local grocery store. In the winter, the group obtained a random sample of sales from the store and used the data
to construct the 95 percent z-interval for a proportion (0.087, 0.133 ). Six months later in the summer, the group
obtained a second random sample of sales from the store. The second sample was the same size as the first, and
the proportion of sales identified as organic was 0.4. How does the 95 percent z-interval for a proportion
constructed from the summer sample compare to the winter interval?
A) The summer interval is wider and has a lesser point estimate.
B) The summer interval is wider and has a greater point estimate.
C) The summer interval is narrower and has a lesser point estimate.
D) The summer interval is narrower and has a greater point estimate.
E) The summer interval is the same width and has a greater point estimate.
Explain:
The point estimate for the winter sample is the midpoint of the winter interval, (0.087 + 0.133)/2 = 0.11, so
the summer point estimate of 0.4 is greater and the summer interval is shifted to the right.
The width of a z-interval for a proportion depends on the confidence level, the sample size, and the standard
error $\sqrt{\hat{p}(1-\hat{p})/n}$. The confidence level and the sample size are the same for both samples,
but $\hat{p}(1-\hat{p})$ is larger in summer: (0.4)(0.6) = 0.24, compared with (0.11)(0.89) ≈ 0.098 in winter.
The standard error, and therefore the margin of error, is larger for the summer sample, so the summer interval
is wider.
Bear in mind that a confidence interval is a range of plausible values for an unknown population parameter:
if we took repeated samples and computed a 95 percent z-interval each time, we would expect about 95 percent
of those intervals to capture the true proportion.
The empirical rule (68-95-99.7 rule) describes the spread of data in a normal distribution; for proportions,
we use z critical values to construct confidence intervals instead.
Therefore, the summer interval is wider and has a greater point estimate (option B).
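The width comparison in question 147 can be checked numerically without knowing the sample size, since n and the critical value are the same for both intervals and cancel in the ratio. The sketch below is illustrative only; the variable names are hypothetical.

```python
import math

# Winter interval from the question; its midpoint is the winter point estimate.
winter_low, winter_high = 0.087, 0.133
p_winter = (winter_low + winter_high) / 2
winter_margin = (winter_high - winter_low) / 2

p_summer = 0.4  # summer sample proportion from the question

# Same n and same z, so margins scale with sqrt(p_hat * (1 - p_hat)).
ratio = math.sqrt(p_summer * (1 - p_summer) / (p_winter * (1 - p_winter)))
summer_margin = winter_margin * ratio

print(round(p_winter, 3), round(winter_margin, 3))
print(round(ratio, 3), round(summer_margin, 3))
```

The ratio is greater than 1, so the summer interval is wider as well as centered on a greater point estimate.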
148. The management team of a company with 10,000 employees is considering installing charging stations for
electric cars in the company parking lots. In a random sample of 500 employees, 15 reported owning an electric
car. Which of the following is a 99 percent confidence interval for the proportion of all employees at the
company who own an electric car?
A) $0.03 \pm 2.326\sqrt{\frac{(0.03)(0.97)}{500}}$
B) $0.03 \pm 2.576\sqrt{\frac{(0.03)(0.97)}{500}}$
C) $0.15 \pm 2.326\sqrt{\frac{(0.03)(0.97)}{500}}$
D) $0.15 \pm 2.675\sqrt{\frac{(0.03)(0.97)}{500}}$
Explain: Formula: $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
149. A 99 percent one-sample z-interval for a proportion will be created from the point estimate obtained from
each of two random samples selected from the same population: sample R and sample S. Let R represent a
random sample of size 1,000, and let S represent a random sample of size 4,000. If the point estimate obtained
from R is equal to the point estimate obtained from S, which of the following must be true about the respective
margins of error constructed from those samples?
A) The margin of error for S will be 4 times the margin of error for R.
B) The margin of error for S will be 2 times the margin of error for R.
C) The margin of error for S will be equal to the margin of error for R.
D) The margin of error for R will be 4 times the margin of error for S.
E) The margin of error for R will be 2 times the margin of error for S.
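Explain: The margin of error is $\varepsilon = z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, which is proportional to $1/\sqrt{n}$. With the same point estimate and confidence level, quadrupling the sample size from 1,000 to 4,000 halves the margin of error, so the margin of error for R is 2 times the margin of error for S.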
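The square-root relationship between sample size and margin of error can be verified with a short sketch. The point estimate 0.3 below is an arbitrary assumed value (any shared point estimate gives the same ratio), and the function name `margin_of_error` is hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence):
    """Margin of error of a one-sample z-interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


p_hat = 0.3  # assumed common point estimate; it cancels in the ratios below

# Question 149: sample R has n = 1,000 and sample S has n = 4,000.
ratio_r_to_s = margin_of_error(p_hat, 1000, 0.99) / margin_of_error(p_hat, 4000, 0.99)

# Question 145: a sample of 1,000 compared with the original sample of 9,000.
ratio_small_to_large = margin_of_error(p_hat, 1000, 0.90) / margin_of_error(p_hat, 9000, 0.90)

print(round(ratio_r_to_s, 6), round(ratio_small_to_large, 6))
```

Multiplying the sample size by k divides the margin of error (and the interval width) by the square root of k.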
150. A random sample of 432 voters revealed that 100 are in favor of a certain bond issue. A 95 percent
confidence interval for the proportion of the population of voters who are in favor of the bond issue is
A) $100 \pm 1.96\sqrt{\frac{(0.5)(0.5)}{432}}$
B) $100 \pm 1.645\sqrt{\frac{(0.5)(0.5)}{432}}$
C) $100 \pm 1.96\sqrt{\frac{(0.231)(0.769)}{432}}$
D) $0.231 \pm 1.96\sqrt{\frac{(0.231)(0.769)}{432}}$
E) $0.231 \pm 1.645\sqrt{\frac{(0.231)(0.769)}{432}}$
151. A news organization conducted a survey about preferred methods for obtaining the news. A random sample
of 1,605 adults living in a certain state was selected, and 16.2 percent of the adults in the sample reported that
television was their preferred method. Which of the following is an appropriate margin of error for a 90 percent
confidence interval to estimate the population proportion of all adults living in the state who would report that
television is their preferred method for obtaining the news?
A) $1.645\sqrt{\frac{(0.162)(1-0.162)}{1{,}605}}$
B) $1.645\sqrt{\frac{(0.5)(1-0.5)}{1{,}605}}$
C) $1.96\sqrt{\frac{(0.162)(1-0.162)}{1{,}605}}$
D) $1.96\sqrt{\frac{(0.5)(1-0.5)}{1{,}605}}$
E) $1.83\sqrt{\frac{(0.162)(1-0.162)}{1{,}605}}$
152. A survey was conducted to determine what percentage of college seniors would have chosen to attend a
different college if they had known then what they know now. In a random sample of 100 seniors, 34 percent
indicated that they would have attended a different college. A 90 percent confidence interval for the percentage
of all seniors who would have attended a different college is
A) 24.7% to 43.3%
B) 25.8% to 42.2%
C) 26.2% to 41.8%
D) 30.6% to 37.4%
E) 31.2% to 36.8%
Explain: Formula: $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
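As a worked check of question 152, the interval formula can be evaluated directly. This is a minimal sketch; the function name `proportion_z_interval` is hypothetical, and the critical value is taken from `statistics.NormalDist` instead of a rounded table value.

```python
import math
from statistics import NormalDist


def proportion_z_interval(p_hat, n, confidence):
    """One-sample z-interval for a proportion: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Question 152: 34 percent of 100 seniors, 90 percent confidence.
low, high = proportion_z_interval(p_hat=0.34, n=100, confidence=0.90)
print(f"{low:.1%} to {high:.1%}")
```

The result agrees with choice C, 26.2% to 41.8%.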
153. On the day before an election in a large city, each person in a random sample of 1,000 likely voters is asked
which candidate he or she plans to vote for. Of the people in the sample, 55 percent say they will vote for
candidate Taylor. A margin of error of 3 percentage points is calculated. Which of the following statements is
appropriate?
A) The proportion of all likely voters who plan to vote for candidate Taylor must be the same as the
proportion of voters in the sample who plan to vote for candidate Taylor (55 percent), because the data
were collected from a random sample.
B) The sample proportion minus the margin of error is greater than 0.50, which provides evidence that
more than half of all likely voters plan to vote for candidate Taylor.
C) It is not possible to draw any conclusion about the proportion of all likely voters who plan to vote for
candidate Taylor because the 1,000 likely voters in the sample represent only a small fraction of all likely
voters in a large city.
D) It is not possible to draw any conclusion about the proportion of all likely voters who plan to vote for
candidate Taylor because this is not an experiment.
E) It is not possible to draw any conclusion about the proportion of all likely voters who plan to vote for
candidate Taylor because this is a random sample and not a census.
154. Researchers investigating a new drug selected a random sample of 200 people who are taking the drug. Of
those selected, 76 indicated they were experiencing side effects from the drug. If 5,000 people took the drug,
which of the following is closest to the interval estimate of the number of people who would indicate they were
experiencing side effects from the drug at a 90 percent level of confidence?
A) (0.313,0.447)
B) (0.324, 0.436)
C) (65,87)
D) (1565, 2235)
E) (1620, 2180)
Explain: The question asks for a confidence interval for the number of people who would indicate they were
experiencing side effects from the new drug. The sample proportion reporting side effects is 76/200 = 0.38.
We are asked for a 90 percent confidence level, so the critical value is 1.645. The interval for the proportion
is $0.38 \pm 1.645\sqrt{\frac{(0.38)(0.62)}{200}} \approx 0.38 \pm 0.056$, or about (0.324, 0.436).
Because the question asks for the number of people rather than the proportion, we multiply each end of the
interval by the total of 5,000 people who took the drug: (0.324)(5,000) = 1,620 and (0.436)(5,000) = 2,180.
The interval closest to this result is option E) (1620, 2180). Option D) (1565, 2235) corresponds to a
95 percent confidence level instead.
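The calculation for question 154 can be sketched as follows: build the 90 percent interval for the proportion, then scale both endpoints by the 5,000 people who took the drug. The function name `proportion_z_interval` is hypothetical, and the exact critical value from `statistics.NormalDist` is used, so the endpoints differ slightly from table-based rounding.

```python
import math
from statistics import NormalDist


def proportion_z_interval(p_hat, n, confidence):
    """One-sample z-interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


p_hat = 76 / 200          # sample proportion reporting side effects
total_people = 5000       # people who took the drug

low_90, high_90 = proportion_z_interval(p_hat, 200, 0.90)
low_95, high_95 = proportion_z_interval(p_hat, 200, 0.95)

# Scale the proportion interval to a count of people.
count_90 = (low_90 * total_people, high_90 * total_people)
count_95 = (low_95 * total_people, high_95 * total_people)
print([round(c) for c in count_90], [round(c) for c in count_95])
```

The 90 percent result is closest to (1620, 2180), while the 95 percent result is closest to (1565, 2235), which shows why the confidence level in the question matters.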
155. Jessica wanted to determine if the proportion of males for a certain species of laboratory animal is less
than 0.5. She was given access to appropriate records that contained information on 12,000 live births for the
species. To construct a 95 percent confidence interval, she selected a simple random sample of 100 births from
the records and found that 31 births were male. Based on the study, which of the following expressions is an
approximate 95 percent confidence interval estimate for p, the proportion of males in the 12,000 live births?
A) $0.31 \pm 1.96\sqrt{\frac{(0.31)(0.69)}{12{,}000}}$
B) $0.31 \pm 1.645\sqrt{\frac{(0.31)(0.69)}{12{,}000}}$
C) $0.31 \pm 1.96\sqrt{\frac{(0.5)(0.5)}{12{,}000}}$
D) $0.31 \pm 1.645\sqrt{\frac{(0.5)(0.5)}{100}}$
E) $0.31 \pm 1.96\sqrt{\frac{(0.31)(0.69)}{100}}$
156. A random sample of 80 people was selected, and 22 of the selected people indicated that it would be a
good idea to eliminate the penny from circulation. What is the 99 percent confidence interval constructed from
the sample proportion p̂?
A) $0.275 \pm 1.96\sqrt{\frac{(22)(58)}{80}}$
B) $0.22 \pm 2.576\sqrt{\frac{(0.275)(0.725)}{80}}$
C) $0.275 \pm 2.576\sqrt{\frac{(0.275)(0.725)}{80}}$
D) $0.275 \pm 1.96\sqrt{\frac{(0.275)(0.725)}{80}}$
E) $0.22 \pm 2.323\sqrt{\frac{(0.275)(0.725)}{80}}$
157. From a random sample of 1,005 adults in the United States, it was found that 32 percent own an e-reader.
Which of the following is the appropriate 90 percent confidence interval to estimate the proportion of all adults
in the United States who own an e-reader?
A) $0.32 \pm 1.960\left(\frac{(0.32)(0.68)}{\sqrt{1{,}005}}\right)$
B) $0.32 \pm 1.645\left(\frac{(0.32)(0.68)}{\sqrt{1{,}005}}\right)$
C) $0.32 \pm 2.575\sqrt{\frac{(0.32)(0.68)}{1{,}005}}$
D) $0.32 \pm 1.960\sqrt{\frac{(0.32)(0.68)}{1{,}005}}$
E) $0.32 \pm 1.645\sqrt{\frac{(0.32)(0.68)}{1{,}005}}$
158. Elly and Drew work together to collect data to estimate the percentage of their classmates who own a
particular brand of shoe. Using the same data, Elly will construct a 90 percent confidence interval and Drew will
construct a 99 percent confidence interval. Which of the following statements is true?
A) The midpoint of Elly's interval will be greater than the midpoint of Drew's interval.
B) The midpoint of Elly's interval will be less than the midpoint of Drew's interval.
C) The width of Elly's interval will be greater than the width of Drew's interval.
D) The width of Elly's interval will be less than the width of Drew's interval.
E) The width of Elly's interval will be equal to the width of Drew's interval
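Explain: Both intervals are built from the same data, so they share the same midpoint (the sample proportion). A 99 percent interval uses a larger critical value than a 90 percent interval (2.576 compared with 1.645), so Drew's interval is wider than Elly's.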
159. A school administrator is interested in estimating the proportion of students in the district who participate in
community service activities. From a random sample of 100 students in the district, the administrator will
construct a 99 percent confidence interval for the proportion of all district students who participate in
community service activities. Which of the following statements must be true?
A) The population proportion will be in the confidence interval.
B) The probability that the confidence interval will include the population proportion is 0.99.
C) The probability that the confidence interval will include the sample proportion is 0.99.
D) The population proportion and the sample proportion will be equal.
E) The probability that the population proportion and the sample proportion will be equal is 0.99.
160. A newspaper poll found that 52 percent of the respondents in a large random sample of likely voters in a
district intend to vote for candidate Smith rather than the opponent. A 95 percent confidence interval for the
population proportion was computed to be 0.52 ± 0.04. Based on the confidence interval, which of the following
should the newspaper report to its readers?
A) Smith will win because a majority of voters are in favor of Smith.
B) There is a 95% chance that Smith will win.
C) The poll predicts Smith will win, but there is a 5% chance that the prediction is incorrect due to sampling
error.
D) With 95% confidence, there is convincing evidence that Smith will win.
E) No prediction about who will win can be made with 95% confidence.
Explain:
Based on the confidence interval provided (0.52 ± 0.04), the estimated proportion of likely voters in the
district who intend to vote for Candidate Smith is between 0.48 and 0.56 at a 95 percent confidence level.
The confidence interval is calculated as follows:
Lower bound = 0.52 - 0.04 = 0.48
Upper bound = 0.52 + 0.04 = 0.56
A 95 percent confidence level means 95 percent confidence that the true proportion of likely voters who
intend to vote for Candidate Smith lies within this interval (0.48 to 0.56).
In other words, if we were to take many random samples and calculate their confidence intervals, we would
expect approximately 95 percent of those intervals to contain the true proportion.
Because the interval includes values below 0.50 as well as values above 0.50, it does not provide convincing
evidence that a majority of voters favor Smith, so no prediction about who will win can be made with 95%
confidence (option E).
161. Lila and Robert attend different high schools. They will estimate the population percentage of students at
their respective schools who have seen a certain movie. Lila and Robert each select a random sample of students
from their respective schools and use the data to create a 95 percent confidence interval. Lila's interval is (0.30,
0.35), and Robert's interval is (0.27, 0.34). Which of the following statements can be concluded from the
intervals?
A) Lila’s sample size is most likely greater than Robert’s sample size.
B) Robert’s sample size is most likely greater than Lila’s sample size.
C) Lila and Robert will both find the same sample proportion of students who have seen the movie.
D) Lila’s interval has a greater degree of confidence than that of Robert.
E) Robert’s interval has a greater degree of confidence than that of Lila.
162. A large-sample 98 percent confidence interval for the proportion of hotel reservations that are canceled on
the intended arrival day is (0.048, 0.112). What is the point estimate for the proportion of hotel reservations that
are canceled on the intended arrival day from which this interval was constructed?
A) 0.032
B) 0.064
C) 0.080
D) 0.160
E) It cannot be determined from the information given.
163. A 95 percent confidence interval of the form p̂ ± E will be used to obtain an estimate for an unknown
population proportion p. If p̂ is the sample proportion and E is the margin of error, which of the following is the
smallest sample size that will guarantee a margin of error of at most 0.08?
A) 25
B) 100
C) 140
D) 155
E) 175
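Explain: $n = \frac{z^2\,p(1-p)}{\varepsilon^2}$. With z = 1.96, ε = 0.08, and the conservative value p = 0.5, n = (1.96)²(0.25)/(0.08)² ≈ 150.06, so the smallest listed sample size that guarantees the margin of error is 155.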
164. Courtney has constructed a cricket out of paper and rubber bands. According to the instructions for
making the cricket, when it jumps it will land on its feet half of the time and on its back the other half of the time.
In the first 50 jumps, Courtney's cricket landed on its feet 35 times. In the next 10 jumps, it landed on its feet only
twice. Based on this experience, Courtney can conclude that
A) the cricket was due to land on its feet less than half the time during the final 10 jumps, since it had
landed too often on its feet during the first 50 jumps
B) a confidence interval for estimating the cricket's true probability of landing on its feet is wider after the
final 10 jumps than it was before the final 10 jumps
C) a confidence interval for estimating the cricket's true probability of landing on its feet after the final 10
jumps is exactly the same as it was before the final 10 jumps
D) a confidence interval for estimating the cricket's true probability of landing on its feet is more narrow
after the final 10 jumps than it was before the final 10 jumps
E) a confidence interval for estimating the cricket's true probability of landing on its feet based on the initial
50 jumps does not include 0.2, so there must be a defect in the cricket's construction to account for the
poor showing in the final 10 jumps
165. A random sample of 1,175 people in a certain country were asked whether they thought climate change
was a problem. The sample proportion of those who think climate change is a problem was calculated, and a 95
percent confidence interval was constructed as (0.146, 0.214). Which of the following is a correct interpretation
of the interval?
A) We are 95 percent confident that any sample of 1,175 people will produce a sample proportion between
0.146 and 0.214.
B) We are 95 percent confident that the proportion of all people in the country who think climate change is a
problem is between 0.146 and 0.214.
C) We are 95 percent confident that the proportion of people in the sample who think climate change is a
problem is between 0.146 and 0.214.
D) The probability that 95 percent of all people in the country who think climate change is a problem is
between 0.146 and 0.214.
E) The probability is 0.95 that the proportion of all people in the country who think climate change is a problem
is between 0.146 and 0.214.
166. A school librarian wanted to estimate the proportion of students in the school who had read a certain
book. The librarian sampled 50 students from the senior English classes, and 35 of the students in the sample
had read the book. Have the conditions for creating a confidence interval for the population proportion been
met?
A) Yes, because the sample was selected at random.
B) Yes, because sampling distributions of proportions are modeled with the normal model.
C) Yes, because the sample is large enough to satisfy the normality conditions.
D) No, because the sample is not large enough to satisfy the normality conditions.
E) No, because the sample was not selected using a random method.
167. A city planner wants to estimate the proportion of city residents who commute to work by subway each
day. A random sample of 30 city residents was selected, and 28 of those selected indicated that they rode the
subway to work. Is it appropriate to assume that the sampling distribution of the sample proportion is
approximately normal?
A) No, because the size of the population is not known.
B) No, because the sample is not large enough to satisfy the normality conditions.
C) Yes, because the sample is large enough to satisfy the normality conditions.
D) Yes, because the sample was selected at random.
E) Yes, because sampling distributions of proportions are modeled with a normal model.
168. The manager of a magazine wants to estimate the percent of magazine subscribers who approve of a
new cover format. To gather data, the manager will select a random sample of subscribers.
Which of the following is the most appropriate interval for the manager to use for such an estimate?
A) A two-sample z-interval for a difference between sample proportions
B) A two-sample z-interval for a difference between population proportions
C) A one-sample z-interval for a sample proportion
D) A one-sample z-interval for a population proportion
E) A one-sample z-interval for a difference between population proportions
169. The superintendent of a large school district wants to estimate the percent of district residents who
support the building of a new middle school. To gather data, the superintendent will select a random sample of
district residents. Which of the following is the most appropriate interval for the superintendent to use for
such an estimate?
A) A one-sample z -interval for a sample proportion
B) A two-sample z -interval for a difference between population proportions
C) A two-sample z -interval for a population proportion
D) A one-sample z -interval for a difference between population proportions
E) A one-sample z -interval for a population proportion
170. A box contains 10 tags, numbered 1 through 10, with a different number on each tag. A second box
contains 8 tags, numbered 20 through 27, with a different number on each tag. One tag is drawn at random from
each box. What is the expected value of the sum of the numbers on the two selected tags?
A) 13.5
B) 14.5
C) 15.0
D) 27.0
E) 29.0
171. A compact disc (CD) manufacturer wanted to determine which of two different cover designs for a
newly released CD will generate more sales. The manufacturer chose 70 stores to sell the CD. Thirty-five of these
stores were randomly assigned to sell CDs with one of the cover designs and the other 35 were assigned to sell
the CDs with the other cover design. The manufacturer recorded the number of CDs sold at each of the stores
and found a significant difference between the mean number of CDs sold for the two cover designs.
Which of the following gives the conclusion that should be made based on the results and provides the best
explanation for the conclusion?
A) It is not reasonable to conclude that the difference in sales was caused by the different cover designs
because this was not an experiment
B) It is not reasonable to conclude that the difference in sales was caused by the different cover designs
because there was no control group for comparison
C) It is not reasonable to conclude that the difference in sales was caused by the different cover designs
because the 70 stores were not randomly chosen
D) It is reasonable to conclude that the difference in sales was caused by the different cover designs because the
cover designs were randomly assigned to stores
E) It is reasonable to conclude that the difference in sales was caused by the different cover designs because
the sample size was large
Explain:
• Random assignment: The experiment used random assignment, which reduces the influence of confounding
variables and strengthens the causal inference. Assigning the cover designs randomly helps to ensure that any
difference in sales is more likely due to the cover design and not some other factor that differs between the two
groups of stores.
• Significant difference: The experiment found a significant difference in the mean number of CDs sold for the two
cover designs. This suggests that the observed difference is unlikely to be due to chance.
• Control group: While not explicitly mentioned, the two groups of stores can be considered as control groups for
each other. Comparing the sales between the two groups allows us to isolate the effect of the cover design.
Therefore, based on the information provided, it is reasonable to conclude that the difference in sales was
caused by the different cover designs. The random assignment of the cover designs strengthens the causal
inference and reduces the possibility of alternative explanations.
Let's analyze the other options:
A) Not an experiment: The study described uses the principles of experimentation, even though it may not be a
formal laboratory experiment. Random assignment and controlled comparison support the validity of the
conclusions.
B) No control group: As explained above, the two groups of stores can be considered as control groups for each
other.
C) Not randomly chosen: While it would be ideal if all stores were chosen randomly, the random assignment of
the cover designs within the chosen stores strengthens the causal inference.
E) Large sample size: While a large sample size can increase the confidence in the results, it alone is not enough
to conclude causality. Random assignment is crucial for causal inference.
Therefore, option D is the most accurate conclusion based on the information provided.
172. A company sells concrete in batches of 5 cubic yards. The probability distribution of X, the number of cubic
yards sold in a single order for concrete from this company, is shown in the table below.
X 10 15 20 25 30
Probability 0.15 0.25 0.25 0.30 0.05
The expected value of the probability distribution of X is 19.25 and the standard deviation is 5.76. There is a fixed
cost to deliver the concrete. The profit, Y, in dollars, for a particular order can be described by Y = 75X -100. What
is the standard deviation of Y?
A) $332
B) $532
C) $1,343.75
D) $432
Explain: The standard deviation of Y is $432. Multiplying a random variable by a constant multiplies its standard
deviation by the absolute value of that constant, while adding or subtracting a constant leaves the spread unchanged.
Given :
• A company sells concrete in batches of 5 cubic yards.
• The probability distribution of X, the number of cubic yards sold in a single order for concrete from this
company, is shown in the table below.
• The expected value of the probability distribution of X is 19.25 and the standard deviation is 5.76.
• There is a fixed cost to deliver the concrete. The profit Y, in dollars, for a particular order can be described
by (Y = 75X - 100).
For a linear transformation Y = aX + b, the standard deviation is SD(Y) = |a| × SD(X). The fixed cost of $100
shifts every profit by the same amount, so it does not affect the standard deviation.
SD(Y) = SD(75X - 100)
SD(Y) = 75 × SD(X)
SD(Y) = 75 × 5.76
SD(Y) = $432
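The rule SD(aX + b) = |a| × SD(X) can be checked numerically. The sketch below is only an illustration: it recomputes the mean and standard deviation of X from the table in the question, then obtains SD(Y) both from the rule and directly from the distribution of profits. The variable names are chosen for this example.

```python
import math

# Distribution of X (cubic yards per order), copied from the table in question 172.
values = [10, 15, 20, 25, 30]
probs = [0.15, 0.25, 0.25, 0.30, 0.05]

# Mean and standard deviation of X.
mean_x = sum(x * p for x, p in zip(values, probs))
var_x = sum((x - mean_x) ** 2 * p for x, p in zip(values, probs))
sd_x = math.sqrt(var_x)

# Profit model Y = 75X - 100.
a, b = 75, -100
mean_y = a * mean_x + b
sd_y = abs(a) * sd_x  # the shift b drops out of the standard deviation

# Cross-check: compute SD(Y) directly from the profit of each outcome.
profits = [a * x + b for x in values]
var_y_direct = sum((y - mean_y) ** 2 * p for y, p in zip(profits, probs))
sd_y_direct = math.sqrt(var_y_direct)

print(f"E(X) = {mean_x:.2f}, SD(X) = {sd_x:.2f}")
print(f"SD(Y) by rule = {sd_y:.2f}, SD(Y) directly = {sd_y_direct:.2f}")
```

Both routes should agree, which shows why the $100 delivery cost plays no role in the answer.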
173. Ecologists conducted a study to investigate the potential ecological impact of golf courses. Investigators
monitored the reproductive success of bluebirds in birdhouses at nine golf courses and ten similar birdhouses at
nongolf sites. Data on nests in birdhouses occupied only by bluebirds are shown in the table.
Observed Number of Nests per Birdhouse by Location

           0 nests   1 nest   2 or 3 nests   Total
Golf       30        42       8              80
Nongolf    40        58       22             120
Total      70        100      30             200
If the proportions of nests occupied are the same for golf and nongolf sites, what would be the expected count of
birdhouses with 1 nest in nongolf locations?
A) 40
B) 42
C) 50
D) 60
E) 80
Explain: To determine the expected count of birdhouses with 1 nest in nongolf locations, we must apply the
principles of probability in the context of a contingency table. Firstly, we need to calculate the proportion of
1-nest birdhouses in relation to the total number. The total number of birdhouses, both on golf and nongolf
sites, is 200. Out of these, the number of 1-nest birdhouses is 100. Thus, the proportion of birdhouses with 1
nest is 100/200 = 0.5, or 50%.
Secondly, we need to multiply this proportion by the total number of nongolf birdhouses to estimate the
expected count. The total number of birdhouses in nongolf sites is 120. Therefore, the expected count of
birdhouses with 1 nest in nongolf locations would be 0.5 (proportion) × 120 (count) = 60.
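The same calculation extends to every cell of the table: under the assumption of equal distributions, each expected count is (row total × column total) / (grand total). The following is a minimal sketch using the observed counts from the question; the dictionary layout is just one convenient choice for this example.

```python
# Observed counts from question 173, in column order: 0 nests, 1 nest, 2 or 3 nests.
observed = {
    "golf": [30, 42, 8],
    "nongolf": [40, 58, 22],
}

row_totals = {site: sum(counts) for site, counts in observed.items()}
col_totals = [sum(observed[site][j] for site in observed) for j in range(3)]
grand_total = sum(row_totals.values())

# Expected count = row total * column total / grand total.
expected = {
    site: [row_totals[site] * col_totals[j] / grand_total for j in range(3)]
    for site in observed
}

print(expected["nongolf"][1])  # expected nongolf birdhouses with 1 nest
```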
174. A fisheries biologist collected a random sample of fish from a lake and conducted a chi-square goodness-
of-fit test to see if the distribution of fish changed over time. The table below shows the distribution of fish that
were put into the lake when it was originally stocked.

Fish type   Trout   Bass   Perch   Sunfish   Catfish
Percent     25%     25%    20%     15%       15%
The biologist found evidence to reject the null hypothesis in favor of the alternative hypothesis. Which of the
following represents the alternative hypothesis of the test?
A) At least one of the fish proportions is different than the corresponding proportion when the lake was
originally stocked.
B) The proportions for the different fish types are the same as the corresponding proportions when the lake
was originally stocked.
C) The proportions are evenly distributed among fish types.
D) At least one of the fish proportions is the same as the corresponding proportion when the lake was stocked.
E) All of the fish proportions are different than the corresponding proportions when the lake was stocked.
175. A factory produces bags of rubber bands. A bag of rubber bands has five different sizes: extra large (XL),
large (L), medium (M), small (S), and extra small (XS). A quality control specialist collects a random sample of 450
rubber bands from the bagging machine and calculates a chi-square goodness-of-fit test to see if the frequencies
for each size in the sample match the hypothesized distribution. The quality control specialist will test his sample
against the following null hypothesis:
H0: pXL = 0.10, pL = 0.20, pM = 0.40, pS = 0.20, pXS = 0.10
How many medium rubber bands are expected in the random sample of 450 rubber bands?
A) 20
B) 40
C) 90
D) 180 (=0.4*450)
E) 1125
176. At a certain company, loan agents are paid based on the number of loans they close in a day. Based on
company records, the number of loans X that a randomly selected loan agent closes on a randomly selected day
has the probability distribution below.

x 1 2 3 4 5 6 7
P(x) 0.05 0.1 0.22 0.3 0.18 0.12 0.03
At the company, the daily salary of a loan agent is $150 plus $50 per loan closed. Let Y represent the amount of
money made by a randomly selected loan agent on a randomly selected day. Which of the following statements
is NOT true?
A) The mean of X is less than the mean of Y.
B) The standard deviation of Y is approximately $71.
C) The mean daily salary is greater than $350 per day.
D) The standard deviation of X is less than the standard deviation of Y.
E) The shape of the probability distribution of Y is unimodal and roughly symmetric.
Explain: The computation is shown below:
Y = a + bX
where
Y = money made by a randomly selected loan agent on a randomly selected day
a = $150
b = $50
X = number of loans closed
E(X) = (1 × 0.05) + (2 × 0.10) + (3 × 0.22) + (4 × 0.30) + (5 × 0.18) + (6 × 0.12) + (7 × 0.03) = 3.94
E(Y) = $150 + ($50 × 3.94) = $347
The mean daily salary is $347, which is less than $350, so statement C is the one that is NOT true.
So, the correct option is given by C) The mean daily salary is greater than $350 per day.
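The other statements can be checked the same way. The sketch below is illustrative only: it recomputes the mean and standard deviation of X from the table, then applies Y = 150 + 50X. The variable names are chosen for this example.

```python
import math

# Distribution of X (loans closed per day), copied from the table in question 176.
loans = [1, 2, 3, 4, 5, 6, 7]
probs = [0.05, 0.10, 0.22, 0.30, 0.18, 0.12, 0.03]

mean_x = sum(x * p for x, p in zip(loans, probs))
sd_x = math.sqrt(sum((x - mean_x) ** 2 * p for x, p in zip(loans, probs)))

# Daily salary Y = 150 + 50X: the mean is shifted and scaled, the SD is only scaled.
mean_y = 150 + 50 * mean_x
sd_y = 50 * sd_x

print(f"E(X) = {mean_x:.2f}, SD(X) = {sd_x:.2f}")
print(f"E(Y) = {mean_y:.2f}, SD(Y) = {sd_y:.2f}")
print("Mean salary greater than $350?", mean_y > 350)
```

The standard deviation of Y comes out near $71, consistent with statement B, while the mean stays below $350.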
177. A newspaper article indicated that 43 percent of cars with black seats are white, 46 percent of cars with
black seats are blue, 7 percent of cars with black seats are red, and 4 percent of cars with black seats are black. A
test was conducted to investigate whether the color of cars with black seats was consistent with the newspaper
article. A random sample of cars of these colors was selected, and the value of the chi-square test statistic was
χ2 = 8.2. Which of the following represents the p-value for the test?
A) P(χ2 ≥ 8.2) = 0.08
B) P(χ2 ≥ 8.2) = 0.04
C) P(χ2 ≤ 8.2) = 0.96
D) P (χ2= 8.2) = 0.00
E) The p-value cannot be calculated because the sample size is not given.
178. Let X be a random variable whose values are the number of dots that appear on the uppermost face
when a fair die is rolled. The possible values of X are 1, 2, 3, 4, 5, and 6. The mean of X is 7/2 and the variance of
X is 35/12. Let Y be the random variable whose value is the difference (first minus second) between the number
of dots that appear on the uppermost face for the first and second rolls of a fair die that is rolled twice. What is
the standard deviation of Y?
A) √(35/12)
B) √(35/12) + √(35/12)
C) √{(35/12) + (35/12)}
D) √(35/12) - √(35/12)
Explain:
Let X1 and X2 be the results of the first and second rolls, so Y = X1 - X2.
Each roll has mean 7/2 and variance 35/12.
E(X1 - X2) = E(X1) - E(X2) = 0
The two rolls are independent, so Cov(X1, X2) = 0 and the variances add, even for a difference:
Var(X1 - X2) = Var(X1) + Var(X2)
Var(Y) = 35/12 + 35/12
The standard deviation is the square root of the variance => SD(Y) = √{(35/12) + (35/12)}, which is option C.
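Because two dice have only 36 equally likely outcomes, the variance of the difference can be confirmed by brute-force enumeration. This is a small sketch using exact fractions; it is a check on the rule, not part of the original solution.

```python
import math
from fractions import Fraction
from itertools import product

# All 36 equally likely (first, second) outcomes and the difference first - second.
differences = [first - second for first, second in product(range(1, 7), repeat=2)]
prob = Fraction(1, 36)

mean_y = sum(Fraction(d) * prob for d in differences)
var_y = sum((d - mean_y) ** 2 * prob for d in differences)
sd_y = math.sqrt(var_y)

print("E(Y) =", mean_y)
print("Var(Y) =", var_y)
print("SD(Y) =", sd_y)
```

The enumerated variance equals 35/12 + 35/12, not 35/12 - 35/12, which is the point of the question.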
179. The National Park Service writes materials for students to use while in the parks. In a study of the
effectiveness of some of these materials, a random sample of students was selected to take a short quiz about
oak trees after using these materials. A random sample of park professionals also took the quiz. Investigators
compared classifications (low, medium, and high) of the crown shapes, the general shapes of the leafy parts of
the trees, made by students in grades 6 through 12 with classifications made by professionals. Data from the study
are shown in the table.

          Professionals   Students   Total
Low       54              43         97
Medium    48              39         87
High      7               9          16
Total     109             91         200
If the professionals and the students do not differ in the distributions of their responses, which of the following is
equal to the expected number of students who classify the crown shapes as medium?
Explain:
If the professionals and the students do not differ in the distributions of their responses, the expected count
for each cell is (row total × column total) / (grand total).
The total number of classifications in the medium category is 87 (from the table).
The total number of students is 91.
The grand total is 200.
Expected count = (87 × 91) / 200 = 7917 / 200 ≈ 39.6.
Equivalently, the overall proportion of medium classifications, 87/200 = 0.435, is applied to the 91 students:
0.435 × 91 ≈ 39.6.
Therefore, the expected number of students who classify the crown shapes as medium is (87 × 91)/200, or
approximately 39.6.
180. For which of the following is a chi-square goodness-of-fit test most appropriate?
A) Estimating a difference between two population means
B) Estimating a difference between two population proportions
C) Finding the expected value of a probability distribution
D) Determining whether a categorical variable has a significantly different distribution of proportions than the
expected distribution
E) Determining the best shape for a set of data
181. A company claims that each bag of grass seeds that it sells contains the following distribution of
grass-seed types.

Type of grass seed         Percent
Fescue (F)                 55%
Buffalo grass (B)          20%
Blue grama (BG)            10%
Indian grass (I)           7%
Green needlegrass (NG)     6%
A quality control specialist tests samples of the seed being packaged and uses a chi-square goodness-of-fit test to
see whether the proportions in the samples match what is claimed by the company. Which of the following best
describes the null hypothesis and the alternative hypothesis for the test?
A) Ho : p = 0.20
Ha: p ≠ 0.20
B) Ho : PF = 0.20, PB = 0.20, PBG = 0.20, PI = 0.20, PNG = 0.20
Ha: At least one of the proportions is different.
C) Ho : PF = 0.55, PB = 0.22, PBG = 0.10, PI = 0.07,PNG = 0.06
Ha : At least one of the proportions is different.
D) Ho : All proportions are equally likely.
Ha: All of the proportions are different.
E) Ho: There is no association between the grass seed types.
Ha: There is an association between the grass seed types.
182. A local restaurant claims that it gets 45 percent of its customers from Monday through Thursday, 20
percent on Friday, 20 percent on Saturday, and 15 percent on Sunday. How many degrees of freedom should be
used to conduct a chi-square goodness-of-fit test of the claim?
A) 1
B) 2
C) 3 (= number of categories - 1 = 4 - 1)
D) 4
183. Ms. Harper knows that her students in a computing course can choose from one of three operating
systems for the semester: Doors, Banana, or Duix. Ms. Harper wants to test the hypothesis that her students will
select the operating systems in the same proportion as students in other computing courses at the university.
She conducts a χ2 goodness-of-fit test and calculates χ2=3.79 with a corresponding p-value of 0.15. Which of the
following is correct at a 5-percent level of significance?
A) Reject the null hypothesis, since 3.79 > 2.
B) Fail to reject the null hypothesis, since 3.79 > 2.
C) Reject the null hypothesis, since 0.15 > 0.05.
D) Fail to reject the null hypothesis, since 0.15 > 0.05.
E) Reject the null hypothesis, since 0.15 < 3.79.
Explain:
P-value and decision regarding null hypothesis:
If the p-value is larger than the significance level, we do not reject the null hypothesis.
Otherwise, we reject the null hypothesis.
In this question:
The test statistic is χ2=3.79, with a p-value of 0.15.
The level of significance is 0.05.
Since 0.15 > 0.05, we fail to reject the null hypothesis, and the correct answer is given by option D.
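With three operating systems, the goodness-of-fit test has 3 - 1 = 2 degrees of freedom. For exactly 2 degrees of freedom the upper-tail chi-square probability has the closed form P(χ2 ≥ x) = exp(-x/2), so the quoted p-value can be reproduced with the standard library alone. The sketch below relies on that special case; for other degrees of freedom a chi-square table or a statistics library would be needed.

```python
import math

def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom.

    Valid only for df = 2, where the chi-square distribution is exponential with mean 2.
    """
    return math.exp(-x / 2)

chi2_stat = 3.79   # test statistic from question 183
alpha = 0.05       # significance level

p_value = chi2_upper_tail_df2(chi2_stat)
decision = "reject H0" if p_value < alpha else "fail to reject H0"

print(f"p-value = {p_value:.3f} -> {decision}")
```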
184. A spinner made for a game of chance has 8 equally likely spaces. Alfonso records the result of a sample
of 400 spins. Alfonso decides to calculate a chi-square test statistic for a goodness-of-fit test to see whether the
spinner is fair. Which of the following is the appropriate null hypothesis?
A) H0 :p1 = 0.125, p2 = 0.125, p3 = 0.125, p4 = 0.125, p5 = 0.125, p6 = 0.125, p7 = 0.125, p8 = 0.125
B) H0 : At least one proportion is different
C) H0 : p = 0.125
D) H0 :p1 ≠ 0.125, p2 ≠ 0.125, p3 ≠ 0.125, p4 ≠ 0.125, p5 ≠ 0.125, p6 ≠ 0.125, p7 ≠ 0.125, p8 ≠ 0.125
E) H0 :p1 = 0.08, p2 = 0.08, p3 = 0.08, p4 = 0.08, p5 = 0.08, p6 = 0.08, p7 = 0.08, p8 = 0.08
185. An administrator at a local high school wanted to investigate whether there is an association between
the amount of time a student studies for a test and the type of extracurricular activity the student is involved in.
Three hundred students selected at random were asked how long they had studied for the last math test and
how many extracurricular activities they are involved in. The times they had studied were recorded as either not
at all, less than 30 minutes, or more than 30 minutes. Each student also identified which extracurricular activity
(out of a total of 5 extracurricular activities) they were involved in. The calculated chi-square test statistic was
7.53 with a corresponding p-value of 0.4807. Based on this p-value, which of the following is the correct decision
for the appropriate hypothesis test at the α=0.05 significance level?
A) Reject the null hypothesis: The test is statistically significant because a p-value of 0.4807 is greater than a
significance level of 0.05
B) Reject the null hypothesis. The test is statistically significant because a p-value of 0.4807 is less than the test
statistic of 7.53
C) Fail to reject the null hypothesis. The test is not statistically significant because a p-value of 0.4807 is greater
than a significance level of 0.05.
D) Fail to reject the null hypothesis. The test is not statistically significant because a p-value of 0.4807 is less
than the test statistic of 7.53.
E) Accept the null hypothesis. The test is not statistically significant because a p-value of 0.4807 is greater than
a significance level of 0.05
186. Students in a high school statistics class wanted to see if the distribution of the colors of a popular
candy was different in the bags for different types of candies the company manufactures. The students
purchased several large bags of regular candies, tropical-flavored candies, and sour-flavored candies. For each
type of candy, the students took a random sample of 100 candies and recorded how many of each color (red,
green, yellow, or blue) were in the sample. The students verified the conditions for inference and calculated a
chi-square test statistic of 12.59 with a corresponding p-value of 0.05. Which of the following is the correct
interpretation of the p-value in the context of the test?
A) The hypothesis test has a significance level of α = 0.05.
B) There is a 5 percent chance that the distribution of colors is different for the different types of candies.
C) There is a 5 percent chance that the distribution of colors is the same for the different types of candies.
D) Assuming that the distribution of colors for the different types of candies is the same, there is a 5 percent
chance of finding a test statistic of 12.59 or larger.
E) Assuming that the distribution of colors for the different types of candies is different, there is a 5 percent
chance of finding a test statistic of 12.59 or larger.
187. A state highway commission is considering removing the lane that allows people to pay cash for a toll
on a toll road and requiring all people who use the toll road to pay with an electronic transponder that is
connected to their car. The commission wants to know whether the proportion of people who live in the
northern part of the state and are in favor of removing the cash lane is different from the proportion of people
who live in the southern part of the state and are in favor of removing the cash lane. Independent random
samples are selected from the northern and southern parts of the state. The table summarizes the responses of
those surveyed.

                     Northern   Southern   Total
Remove Cash Lanes    112        98         210
Keep Cash Lanes      89         105        194
Total                201        203        404
Which of the following is closest to the p-value of the appropriate test to investigate whether the proportion of
people living in the northern part of the state who are in favor of removing the cash lane is different from the
proportion of people living in the southern part of the state who are in favor of removing the cash lane?
A) 0.9429
B) 0.0671
C) 0.9401
D) 0.5235
E) 0.1342
Explain:
The appropriate test is a two-sample z-test for a difference in proportions (equivalently, a chi-square test of
homogeneity with 1 degree of freedom).
Sample proportions in favor of removing the cash lane: p^N = 112/201 ≈ 0.557 and p^S = 98/203 ≈ 0.483.
Pooled proportion: p^ = (112 + 98) / (201 + 203) = 210/404 ≈ 0.520.
Standard error: SE = √[p^(1 - p^)(1/201 + 1/203)] ≈ 0.0497.
Test statistic: z = (p^N - p^S)/SE ≈ 0.0745/0.0497 ≈ 1.50.
Because the alternative hypothesis is two-sided, the p-value is 2 × P(Z ≥ 1.50) ≈ 0.134.
Therefore, the closest answer is E) 0.1342.
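The p-value for question 187 can be computed directly with a pooled two-proportion z-test. The sketch below uses only the standard library; the helper name `two_proportion_z_test` is hypothetical, and the two-sided normal tail is obtained from `math.erfc`.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for H0: p1 = p2. Returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Counts in favor of removing the cash lane: 112 of 201 (north), 98 of 203 (south).
z, p_value = two_proportion_z_test(112, 201, 98, 203)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```

The result is close to 0.134, which matches option E rather than any of the other listed values.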
188. A company claims it audits its employees' transactions based on their job level. For entry-level
positions, the company claims that 50 percent get a basic audit, 30 percent get an enhanced audit, and 20
percent get a complete audit. The company tests this hypothesis using a random sample and finds χ2=0.771 with
a corresponding p-value of 0.68. Assuming conditions for inference were met, which of the following is the
correct interpretation of the p-value?
A) There is a 68 percent chance of obtaining a chi-square value of at least 0.771.
B) There is a 68 percent chance that the company's claim is correct.
C) If the null hypothesis were true, there would be a 68 percent chance that the company's claim is correct.
D) If the null hypothesis were true, there would be a 68 percent chance of obtaining a chi-square value of 0.771.
E) If the null hypothesis were true, there would be a 68 percent chance of obtaining a chi-square value of at
least 0.771.
189. A random variable X has a mean of 120 and a standard deviation of 15. A random variable Y has a mean
of 100 and a standard deviation of 9. If X and Y are independent, approximately what is the standard deviation of
X-Y?
𝝁x=120
𝝈x=15
𝝁y=100
𝝈y= 9
Now,
The standard deviation of X-Y is given by
SD(X-Y) = √Var(X-Y)
= √[Var(X) + (-1)^2 Var(Y) - 2Cov(X,Y)]
= √[15^2 + 9^2], since X and Y are independent and so Cov(X,Y) = 0
= √306 ≈ 17.49
190. A survey of a random sample of 210 male teens and 228 female teens, ages 13 years to 17 years, found
that 122 of the male teens and 160 of the female teens brush their teeth at least twice a day. If there is no
difference between the proportions in the population of all male and female teens ages 13 years to 17 years who
brush their teeth at least twice a day, approximately how many males and females in the sample would be
expected to brush their teeth at least twice a day?
A) 105 males and 114 females
B) 122 males and 160 females
C) 135 males and 147 females
D) 141 males and 141 females
E) 219 males and 219 females
Find the pooled sample proportion by combining the number of successes in each sample and combining
the sample sizes:
p^ = (122 + 160) / (210 + 228) = 282/438
Then multiply this pooled sample proportion by the two sample sizes:
Expected males = n1 × p^ = 210(282/438) = 135.21
Expected females = n2 × p^ = 228(282/438) = 146.79
Rounded to whole numbers, this gives about 135 males and 147 females, which is option C.
191. A chi-square goodness-of-fit test where all assumptions were met yielded the test statistic χ2=12.4.
Henry claims the corresponding p-value of 0.03 means that the probability of observing a test statistic of χ2=12.4
is 0.03, assuming the null hypothesis is true. Which of the following is a valid criticism of this interpretation of
the p-value?
A) The researcher did not state that the p-value is conditional on the null hypothesis being true.
B) The researcher interpreted the p-value as the probability of observing 1.92 exactly.
C) The alternative hypothesis is not stated.
D) The significance level is not stated.
E) The degrees of freedom are not stated.
192. A player pays $15 to play a game in which a chip is randomly selected from a bag of chips. The bag
contains 10 red chips, 4 blue chips, and 6 yellow chips. The player wins $5 if a red chip is selected, $10 if a blue
chip is selected, and $20 if a yellow chip is selected. Let the random variable X represent the amount won from
the selection of the chip, and let the random variable W represent the total amount won, where W = X - 15.
What is the mean of W?
A) 10.5
B) 4.5
C) -4.5
D) -6.5
E) -10.5
Explain:
We are given:
Winnings if a red chip is selected x1 = $5
Now, there are 10 red chips in a bag of 20 chips
Thus, the probability of this outcome is p1 = 10/20 = ½
Winnings if a blue chip is selected x2 = $10
Now, there are 4 blue chips in a bag of 20 chips
Thus, the probability of this outcome is p2 = 4/20 = 1/5
Winnings if a yellow chip is selected x3 = $20
Now, there are 6 yellow chips in a bag of 20 chips
Thus, the probability of this outcome is p3 = 6/20 = 3/10
The expected value is E(X) = 5*(1/2)+10*(1/5)+20*(3/10) = $10.5
Since W = X - 15, the mean of W is E(W) = E(X) - 15 = 10.5 - 15 = -4.5, so the answer is C.
193. A company that makes fleece clothing uses fleece produced from two farms, Northern Farm and
Western Farm. Let the random variable X represent the weight of fleece produced by a sheep from Northern
Farm. The distribution of X has mean 14.1 pounds and standard deviation 1.3 pounds. Let the random variable Y
represent the weight of fleece produced by a sheep from Western Farm. The distribution of Y has mean 6.7
pounds and standard deviation 0.5 pound. Assume X and Y are independent. Let W equal the total weight of
fleece from 10 randomly selected sheep from Northern Farm and 15 randomly selected sheep from Western
Farm. Which of the following is the standard deviation, in pounds, of W ?
A) 1.3+0.5
B) sqrt(1.3^2+0.5^2)
C) sqrt(10(1.3)^2+15(0.5)^2)
D) sqrt(10^2(1.3)^2+15^2(0.5)^2)
E) sqrt((1.3)^2/10 + (0.5)^2/15)
Explain: The question is asking for the standard deviation of the total weight of fleece produced by a certain
number of sheep from each farm. When you sum up multiple independent random variables, the variance of
the total is equal to the sum of the variances of each variable. Since standard deviation is the square root of
variance, we have to square the individual standard deviations, multiply each by the number of terms, sum
these up, and finally take the square root. Consequently, the correct answer is the formula that corresponds
to this explanation: sqrt(10(1.3)^2+15(0.5)^2)
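A common slip here is to treat the total as 10X + 15Y (one sheep's weight scaled up) instead of a sum of 25 separate, independent weights. The sketch below evaluates both expressions so the difference is visible; the variable names are chosen for this example.

```python
import math

sd_north, n_north = 1.3, 10   # SD of one Northern Farm fleece, number of sheep
sd_west, n_west = 0.5, 15     # SD of one Western Farm fleece, number of sheep

# Sum of independent weights: variances add once per sheep (option C).
var_total = n_north * sd_north ** 2 + n_west * sd_west ** 2
sd_total = math.sqrt(var_total)

# Scaling a single sheep's weight instead (option D) overstates the spread.
sd_scaled = math.sqrt((n_north * sd_north) ** 2 + (n_west * sd_west) ** 2)

print(f"SD of total weight (independent sheep) = {sd_total:.3f} pounds")
print(f"SD if one sheep were scaled up         = {sd_scaled:.3f} pounds")
```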
194. The manager of a restaurant tracks the types of dinners that customers order from the menu to ensure
that the correct amount of food is ordered from the supplier each week. Data from customer orders last year
suggest the following weekly distribution.

Type of dinner   Beef   Chicken   Fish   Pork   Vegetarian
Proportion       0.18   0.41      0.15   0.20   0.06
The manager believes that there might be a change in the distribution from last year to this year. A random
sample of 200 orders was taken from all customer orders placed last week. The following table shows the results
of the sample.

Type of dinner   Beef   Chicken   Fish   Pork   Vegetarian
Frequency        32     86        34     30     18
Assume each order is independent. For which type of dinner is the value of its contribution to the appropriate
test statistic the greatest?
A) beef
B) chicken
C) fish
D) pork
E) vegetarian
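Each category contributes (observed - expected)^2 / expected to the chi-square goodness-of-fit statistic, where the expected count is the sample size times last year's proportion. The following sketch computes every contribution from the two tables in the question so they can be compared; the dictionary names are chosen for this example.

```python
# Last year's proportions and this year's sample counts, from question 194.
proportions = {"beef": 0.18, "chicken": 0.41, "fish": 0.15, "pork": 0.20, "vegetarian": 0.06}
observed = {"beef": 32, "chicken": 86, "fish": 34, "pork": 30, "vegetarian": 18}

n = sum(observed.values())  # sample size

contributions = {}
for dinner, p in proportions.items():
    expected = n * p
    contributions[dinner] = (observed[dinner] - expected) ** 2 / expected

largest = max(contributions, key=contributions.get)
chi2_stat = sum(contributions.values())

for dinner, value in contributions.items():
    print(f"{dinner:<11} contribution = {value:.3f}")
print("largest contribution:", largest)
```

Note that the category with the largest raw difference between observed and expected counts is not automatically the one with the largest contribution, because each squared difference is divided by its expected count.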
195. A statistician is conducting a chi-square goodness-of-fit test and is limited by the cost, per individual, to
conduct the study. The statistician selects a sample of size 39, which is the smallest sample possible that will meet
the condition for large expected counts. Which of the following could not be the null hypothesis for the study?
A) H0 : p1 = 0.20, p2 = 0.20, p3 = 0.20, p4 = 0.20, p5 = 0.20
B) H0 : p1 = 0.15, p2 = 0.35, p3 = 0.22, p4 = 0.15, p5 = 0.13
C) H0 : p1 = 0.21, p2 = 0.23, p3 = 0.21, p4 = 0.18, p5 = 0.14
D) H0 : p1 = 0.34, p2 = 0.21, p3 = 0.14, p4 = 0.15, p5 = 0.16
E) H0 : p1 = 0.43, p2 = 0.23, p3 = 0.17, p4 = 0.09, p5 = 0.08
196. Julian is a manager at a clothing store for teens. He is analyzing the order for next season. Data for the
previous 10 years suggests that teens are willing to spend an average of $75 for a pair of designer jeans with a
standard deviation of $5. However, Julian thinks the average may have changed due to a recession. He finds that
the last three seasons of data show that teens spent an average of $68 on a pair of jeans. Therefore, he
performed a hypothesis test to see if the recent average is the same. Julian used a significance level of 5% to
perform the test. Which of the following statements is valid based on the results?
Answer : Julian's data shows that the recent seasons' average jean price is not $75.
197. Brett is performing a hypothesis test in which the population mean is 310 and the standard deviation is
20. His sample data has a mean of 295 and a sample size of 50. Which of the following correctly depicts the
z-statistic for Brett's data?
Answer: -5.3
How to find the value of the z-statistic for a population mean?
We have:
x̅ : the sample mean
µ : the population mean
σ : the population standard deviation
n : the sample size
The z-statistic for this data is found as: z = (x̅ - µ) / (σ/√n) = (295 - 310) / (20/√50) ≈ -5.3
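The one-sample z-statistic is easy to wrap in a small helper, which is handy because the same formula recurs in several later questions. This is a minimal sketch; the function name `z_statistic` is hypothetical.

```python
import math

def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """One-sample z-statistic: (sample mean - population mean) / (sigma / sqrt(n))."""
    standard_error = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Brett's data from question 197: mean 295, n = 50, population mean 310, sigma 20.
z = z_statistic(sample_mean=295, pop_mean=310, pop_sd=20, n=50)
print(f"z = {z:.2f}")
```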
198. A popular video game claims that the average time needed to reach level 10 Paladin is 3 hours with a
standard deviation of 0.4 hours. James thinks that he and his four friends are more skilled than the average
gamer because it took them an average of only 2.5 hours. Which of the following is the most restrictive level that
would validate his claim?
Answer: 1%
In order to know if the claim of James is correct or not, we first need to determine the standard error. The
standard error is the ratio of the standard deviation to the square root of the sample size. Therefore, the
standard error can be written as
Standard error: SE = σ/√n = 0.4/√5 ≈ 0.179
The z-score is defined as the distance from the sample mean to the population mean in units of standard error:
z = (x̅ - µ)/SE = (2.5 - 3)/0.179 ≈ -2.80
From the z-table, the area to the left of z = -2.80 is 1 - 0.9974 = 0.0026, so the p-value is about 0.26%.
Because 0.26% is less than 1%, the result is significant even at the 1% level.
Hence, 1% is the most restrictive level that supports the claim of James.
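The p-value in question 198 can be computed without a z-table by using the normal CDF expressed through `math.erfc`. In the sketch below, the list of candidate significance levels (10%, 5%, 1%) is an assumption, since the answer options are not shown in the question.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

pop_mean, pop_sd = 3.0, 0.4     # claimed mean and SD, in hours
sample_mean, n = 2.5, 5         # James and his four friends

standard_error = pop_sd / math.sqrt(n)
z = (sample_mean - pop_mean) / standard_error
p_value = normal_cdf(z)         # left-tailed test: are they faster than average?

# Assumed candidate levels; keep those at which the result is significant.
levels = [0.10, 0.05, 0.01]
significant = [alpha for alpha in levels if p_value < alpha]
most_restrictive = min(significant)

print(f"z = {z:.3f}, p-value = {p_value:.4f}, most restrictive level = {most_restrictive:.0%}")
```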
199. David wants to complete a hypothesis test with the least amount of probability for error. If he sets the
significance level to 1%, assuming his sample is truly random, what else could he adjust in the test in order to
reduce error?
Answer: He could increase the sample size.
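n = (z_(α/2) × σ / ε)^2, where σ is the population standard deviation and ε is the margin of error, so a larger n
corresponds to a smaller ε.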
200. Tyesha found that the z-statistic was 2.1 and that the critical z-values were -1.96 and 1.96. Which of the
following is a valid conclusion based on these results?
Answer: One can reject the null hypothesis.
201. Zachary completes a hypothesis test and finds that he rejects the null hypothesis. Which statement
gives a reason for rejecting the null hypothesis?
Answer: The z-statistic lies in the critical region.
202. A study investigated the job satisfaction of teachers allowed to choose supplementary curriculum for
their classes versus teachers who were assigned all curricular resources for use in their classes. The authors of
the study wanted to know if the two groups of teachers had different levels of job satisfaction. They will use a
significance level of 5% for their test.
Answer: -1.96 and 1.96
203. A consumer protection group randomly checks the volume of different beverages to ensure that
companies are packaging the stated amount. Each individual volume is not exact, but a volume of iced tea
beverages is supposed to average to 300 mL with a standard deviation of 3 mL. The consumer protection group
sampled 20 beverages and found the average to be 298.4 mL. Which of the following is the most restrictive level
of significance on a hypothesis test that would indicate the company is packaging less than the required average
300 mL?
Answer: 2.5%
Sample size n = 20
Sample mean x̅ = 298.4
σ = population standard deviation = 3
µ = population mean = 300 mL
t statistic = (x̅ - µ)/SE
Standard error = standard deviation/√n = 3/√20 ≈ 0.671
Hence t statistic = (298.4 - 300)/0.671 ≈ -2.38
df = 20 - 1 = 19
The p-value for this is 0.013 = 1.3%
Note that the p-value is greater than 1% but less than 2.5%, 5%, and 10%.
Hence the most restrictive level is 2.5%.
204. A recent survey of 8,000 high school students found that the mean price of a prom dress was $195.00
with a standard deviation of $12.00. Alyssa thinks that her school is more fashion conscious and that students
spent more than $195.00. She collected data from 20 people in her high school and found that the average price
spent on a prom dress was $208.00. Which of the following are the correct null hypothesis and alternate
hypothesis?
Answer: H0: µ = 195; Ha: µ > 195
205. A study investigated the job satisfaction of teachers allowed to choose supplementary curriculum for
their classes versus teachers who were assigned all curricular resources for use in their classes. On average,
when surveyed regarding job satisfaction, teachers give a score of 3.3 out of 5 with a standard deviation of 0.6.
When the authors of the study interviewed 40 teachers who supplemented with their own materials, they found
3.5 to be the mean. The authors wanted to know if the group of teachers that could choose supplementary
curriculum had a higher level of job satisfaction. They used a significance level of 1%. Which of the following
statements is valid based on the results of the test?
Answer: The data shows that the authors cannot make a determination either way with this data.
206. A researcher collated data on Americans' leisure time activities. She found the mean number of hours
spent watching television each weekday to be 2.7 hours with a standard deviation of 0.2 hours. Jonathan
believes that his football team buddies watch less television than the average American. He gathered data from
40 football teammates and found the mean to be 2.3. Which of the following are the correct null and alternate
hypotheses?
Answer: H0: µ = 2.7; Ha: µ < 2.7
207. Dion is performing a hypothesis test in which the population mean is 92 and the standard deviation is 2.
His sample size is 7 with a mean of 93.5. Which of the following correctly depicts the z-statistic for this data?
Answer: 1.98
z = (x̅ - µ) / (σ/√n) = (93.5 - 92) / (2/√7) ≈ 1.98
208. A researcher collated data on Americans' leisure time activities. She found the mean number of hours
spent watching television each weekday to be 2.7 hours with a standard deviation of 0.4 hours. Jonathan
believes that his football team buddies watch less television than the average American. He gathered data from
15 football teammates and found the mean to be 2.3. Which of the following shows the correct z-statistic for this
situation?
Answer: -3.87
209. The College Board states that the average math SAT score is 514 with a standard deviation of 117.
Colleen gathered data from 50 students in her graduating class and found the average score to be 523. She
thinks that her class's math SAT score is different from the average. Which of the following are the correct null
hypothesis and alternate hypothesis?
Answer: H0: µ = 514; Ha: µ ≠ 514
210. A real estate agent is working for a developer who claims that the average commute time to downtown
is 20 minutes with a standard deviation of 7 minutes. Stephon is an independent real estate agent and wants to
check the times for his client. He took a random sample of 15 commute times and found an average of 26
minutes. He did hypothesis testing using a significance level of 5%. Which conclusion could he make?
Answer: The z-statistic is 3.32, so the null hypothesis should be rejected.
211. When can a t-test not be used?
a) When there is an interaction term in the model
b) When among the independent variables there are seasonal dummies
c) When we are interested in the overall significance of the multiple regression model
d) When there is an intercept in the model
212. If the coefficient of determination is equal to 1 in a regression problem, the
a) Error sum of squares must be 0
b) Total sum of squares must be 0
c) Regression sum of squares must be 0
d) Residual sum of squares must be 1
(An R2 of 1 indicates that the regression predictions perfectly fit the data.)
$R^2 = 1 - \frac{\text{SSE}}{\text{SST}}$
213. In a regression analysis if r2= 1, then
a) SSE must also be equal to one
b) SSE must be equal to zero
c) SSE can be any positive value
d) SSE must be negative
214. If the coefficient of determination is equal to 0 in a regression problem:
a) Regression sum of squares must be 1
b) Total sum of squares must be 0
c) Error sum of squares must be 1
d) Regression sum of squares must be 0
Explain: The coefficient of determination, also known as R-squared, represents the proportion of the variance in
the dependent variable explained by the independent variable(s) in a regression model. It ranges from 0 to 1,
with 0 meaning no explanation and 1 meaning perfect explanation.
When the coefficient of determination is equal to 0, none of the variance in the dependent variable is explained
by the independent variable(s).
The fitted regression line then coincides with the horizontal line at the mean of the dependent variable, so every
predicted value equals that mean and the regression sum of squares (SSR) is 0.
The other options are incorrect:
o Regression sum of squares cannot be 1, because R-squared = SSR / SST = 0 forces SSR = 0.
o Total sum of squares (SST) cannot be 0, because R-squared = SSR / SST would then be undefined.
o Error sum of squares (SSE) equals SST when R-squared is 0 (since SST = SSR + SSE), so it need not be 1.
Therefore, when the coefficient of determination is equal to 0, the only correct option is that the regression sum
of squares must be 0.
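The decomposition SST = SSR + SSE behind questions 212 to 214 can be illustrated numerically. The sketch below fits a simple least squares line in plain Python; the two small data sets are hypothetical and were chosen only so that one gives R-squared = 0 and the other R-squared = 1.

```python
def sums_of_squares(x, y):
    """Fit y = b0 + b1*x by least squares and return (SSR, SSE, SST)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # explained
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total
    return ssr, sse, sst


x = [1, 2, 3, 4]  # hypothetical data

# No linear relationship: the fitted line is flat at the mean of y.
ssr0, sse0, sst0 = sums_of_squares(x, [2, 4, 4, 2])
r2_zero = ssr0 / sst0

# Perfect linear relationship: y = 1 + 2x.
ssr1, sse1, sst1 = sums_of_squares(x, [3, 5, 7, 9])
r2_one = ssr1 / sst1

print(r2_zero, sse0, sst0)
print(r2_one, sse1, sst1)
```

In the first case SSR is 0 and SSE equals SST; in the second SSE is 0 and SSR equals SST.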
215. If the coefficient of determination is equal to 1, then the correlation coefficient
a) Must be equal to 1
b) Can be either -1 or +1
c) Can be any value between -1 to +1
d) Must be -1
216. If the correlation coefficient is a positive value, then the slope of the regression line
a) must also be positive
b) can be either negative or positive
c) can be zero
d) can not be zero
217. If the coefficient of determination is 0.81, the correlation coefficient
a) is 0.6561
b) could be either + 0.9 or - 0.9
c) must be positive
d) must be negative
218. The correlation coefficient is used to determine:
a) A specific value of the y-variable given a specific value of the x-variable
b) A specific value of the x-variable given a specific value of the y-variable
c) The strength of the relationship between the x and y variables
d) None of these
219. In order to test the validity of a multiple regression model involving 5 independent variables, an
intercept and 50 observations, the statistic for assessing the overall significance of the model follows:
a) Student’s t-distribution with 45 degrees of freedom
b) Student’s t-distribution with 44 degrees of freedom
c) F-distribution with 5 and 44 degrees of freedom
d) F-distribution with 5, 44 and 45 degrees of freedom
Explain: In multiple regression, we test the overall significance of the model using the F-test. This test compares
the variance explained by the model (regression sum of squares) to the unexplained variance (error sum of
squares).
The degrees of freedom for the F-test depend on the number of independent variables (k) and the number of
observations (n). In this case, k = 5 and n = 50.
Therefore, the F-statistic will have degrees of freedom for numerator (k) = 5 and degrees of freedom for
denominator (n - k - 1) = 50 - 5 - 1 = 44.
The other options are incorrect:
o Student's t-distribution is used for testing individual coefficients in a regression model, not the overall
significance.
o An F-distribution has exactly two degrees-of-freedom parameters (numerator and denominator), so option d),
which lists three values, does not describe a valid distribution.
Therefore, to test the overall significance of a multiple regression model with 5 independent variables, an
intercept, and 50 observations, the F-distribution with 5 and 44 degrees of freedom is used.
220. Among OLS assumptions there is no assumption about:
a) Normally distributed error term
b) Serial correlation of residuals
c) Multicollinearity
d) Significant coefficients
221. You have estimated a model where you explain GDP in European countries with a number of
independent variables. You have quarterly data for GDP and you want to include quarterly dummies into the
model. How many dummies do you need to introduce in the model?
a) 11
b) 4
c) 12
d) 3
222. What are the degrees of freedom for a critical T-value in case of a regression analysis based on 6
independent variables and a sample containing 260 observations?
Answer: 253 (= n - k - 1 = 260 - 6 - 1)
223. What are the degrees of freedom for a critical T-value in case of a regression analysis based on 6
independent variables and a sample containing 214 observations?
Answer: 207 (= n - k - 1 = 214 - 6 - 1)
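The degrees-of-freedom rules used in questions 219, 222 and 223 can be written as two small helpers. This is a sketch that assumes a model with an intercept; the function names are illustrative only.

```python
def t_test_df(n, k):
    """Residual degrees of freedom for a t-test on one coefficient.

    n: number of observations, k: number of independent variables
    (the intercept accounts for the extra -1).
    """
    return n - k - 1


def overall_f_test_df(n, k):
    """(numerator, denominator) degrees of freedom for the overall F-test."""
    return k, n - k - 1


print(overall_f_test_df(50, 5))  # question 219
print(t_test_df(260, 6))         # question 222
print(t_test_df(214, 6))         # question 223
```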
224. The standard error of the estimate for a multiple regression model with two explanatory variables :
a) Has the same sign as B1
b) Measures the proportion of variation in Y that is explained by X1 and X2
c) Measures the proportion of variation in Y that is explained by X1 holding X2 constant.
d) Measures the variation around the predicted regression equation
225. You have a model with 7 independent variables and 152 observations.
Your ANOVA table is presented below.
               Sum of squares
Regression     30,501
Residual       36,154
Total          66,655
Please calculate the value of the adjusted coefficient of determination. Use four decimal places and a "dot" as the
separator e.g. 0.0003
Answer: 0.4312
$\text{Adjusted } R^2 = 1 - \frac{n-1}{n-k-1} \cdot \frac{\text{SSE}}{\text{SST}}$
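As a check on the answer to question 225, the adjusted R-squared formula can be evaluated directly. This is a minimal sketch; the function name is chosen for illustration.

```python
def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R^2 = 1 - ((n - 1) / (n - k - 1)) * (SSE / SST)."""
    return 1 - (n - 1) / (n - k - 1) * (sse / sst)


# Question 225: n = 152 observations, k = 7 independent variables,
# SSE = 36,154 (residual) and SST = 66,655 (total).
adj_225 = adjusted_r_squared(sse=36154, sst=66655, n=152, k=7)
plain_r2_225 = 1 - 36154 / 66655  # unadjusted R^2 for comparison

print(round(adj_225, 4), round(plain_r2_225, 4))
```

The adjusted value is smaller than the unadjusted R-squared because the factor (n - 1) / (n - k - 1) penalises each additional independent variable.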
226. You have a model with 4 independent variables and 293 observations.
Your ANOVA table is presented below.
               Sum of squares
Regression     7,781
Residual       9,060
Please calculate the model error variance. Answer: 31.4583
$\text{MSE} = \frac{\text{SSE}}{n-k-1}$
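The model error variance in question 226 is SSE divided by the residual degrees of freedom, and its square root is the standard error of the estimate. A short sketch, with illustrative function names:

```python
import math


def mean_squared_error(sse, n, k):
    """Model error variance: MSE = SSE / (n - k - 1)."""
    return sse / (n - k - 1)


def standard_error_of_estimate(sse, n, k):
    """Standard error of the estimate: square root of the MSE."""
    return math.sqrt(mean_squared_error(sse, n, k))


# Question 226: n = 293 observations, k = 4 independent variables, SSE = 9,060.
mse_226 = mean_squared_error(sse=9060, n=293, k=4)
see_226 = standard_error_of_estimate(sse=9060, n=293, k=4)

print(round(mse_226, 4), round(see_226, 4))
```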
227. You have the following model investigating the impact of crime, the number of rooms, teaching quality
and the area (town or countryside) on the price of newly built houses:
$\text{Price}_i = b_0 + b_1\,\text{crime}_i + b_2\,\text{rooms}_i + b_3\,\text{teaching\_quality}_i + b_4\,\text{area}_i + e_i$
The results for running that model are presented in tables 1 and 2 below. Is the coefficient on area significant at
5%? In order to answer please set up hypotheses, critical value, decision rule and conclusions.
Table 1.
Model          Sum of Squares   Df
1 Regression   255              4
  Residual     173              501
  Total        428              505
Table 2.
Model                Unstandardized coefficients
                     B            Std. Error
1 (Constant)         -4700.838    4048.906
  Crime              -209.042     32.098
  Rooms              7441.453     401.573
  Teaching_quality   -1046.638    132.737
  Area               991.284      524.198
a. Dependent Variable: price
Answer: H0: b4 = 0 against H1: b4 ≠ 0. The test statistic is t = 991.284 / 524.198 ≈ 1.89 with 501 degrees of
freedom, and the two-tailed 5% critical value is approximately 1.96. Decision rule: reject H0 if |t| > 1.96.
Since 1.89 < 1.96, we fail to reject H0 and conclude that the coefficient on area is insignificant at 5%.
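The t-statistics for all coefficients in Table 2 can be computed as B divided by its standard error. The sketch below hard-codes 1.96 as the two-tailed 5% critical value; that figure is an approximation for 501 degrees of freedom taken from a standard t table, not something given in the question.

```python
# (B, Std. Error) pairs copied from Table 2.
coefficients = {
    "Constant": (-4700.838, 4048.906),
    "Crime": (-209.042, 32.098),
    "Rooms": (7441.453, 401.573),
    "Teaching_quality": (-1046.638, 132.737),
    "Area": (991.284, 524.198),
}

# Assumed approximate two-tailed 5% critical value for 501 degrees of freedom.
CRITICAL_VALUE = 1.96

t_stats = {name: b / se for name, (b, se) in coefficients.items()}
significant = {name: abs(t) > CRITICAL_VALUE for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(f"{name:17s} t = {t:8.3f}  significant at 5%: {significant[name]}")
```

Area has a t-statistic of about 1.89, just below the critical value, while crime, rooms and teaching quality are all clearly significant.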
You have estimated a model where you explain sales (in thousands of dollars) with a number of independent
variables. You also want to include dummies for months in order to capture potential seasonality effect; as the
base reference you use January. What test would you set up in order to verify whether there is a significant
difference in sales between December and January?
a) T test for coefficient on December
b) T test for the intercept
c) T test for coefficient on January
d) F test for subgroup including December and January
228. You have estimated a model where you explain sales (in thousands of dollars) with a number of
independent variables. You also want to include dummies for months in order to capture potential seasonality
effect. What test would you need to set up in order to check whether these dummies are jointly significant?
a) F test for subgroup
b) T-test for all dummies together
c) T-test for each dummy
d) Overall F test
229. You have estimated a model where you explain sales (in thousands of dollars) with a number of
independent variables. You also want to include dummies for months in order to capture potential seasonality
effect. How many monthly dummies do you need to introduce in the model?
a) Impossible to tell
b) 11
c) 10
d) 12
230. You have a model with 6 independent variables and 104 observations.
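Your ANOVA table is presented below.
               Sum of squares
Regression     4,477
Residual       5,234
Please calculate the value of the standard error of the estimate. Answer: 7.3457
$S_{YX} = \sqrt{\frac{\text{SSE}}{n-k-1}} = \sqrt{\frac{5234}{97}} \approx 7.3457$
(Dividing by n - 2, as in simple regression, would give 7.1634. With 6 independent variables the residual degrees
of freedom are n - k - 1 = 104 - 6 - 1 = 97.)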
231. You have a model with 5 independent variables and 144 observations.
Your ANOVA table is presented below.
               Sum of squares
Regression     44,118
Residual       26,616
Please calculate the adjusted coefficient of determination. Answer: 0.6101
$\text{Adjusted } R^2 = 1 - \frac{n-1}{n-k-1} \cdot \frac{\text{SSE}}{\text{SST}}$
232. You have the following model investigating the impact of crime, the number of rooms, teaching quality
and the area (town or countryside) on the price of newly built houses:
$\text{Price}_i = b_0 + b_1\,\text{crime}_i + b_2\,\text{rooms}_i + b_3\,\text{teaching\_quality}_i + b_4\,\text{area}_i + e_i$
The results for running that model are presented in tables 1 and 2 below. What is the value of the coefficient of
determination (R squared) in this model?
Table 1.
Model          Sum of Squares   Df
1 Regression   255              4
  Residual     173              501
  Total        428              505
Table 2.
Model                Unstandardized coefficients
                     B            Std. Error
1 (Constant)         -4700.838    4048.906
  Crime              -209.042     32.098
  Rooms              7441.453     401.573
  Teaching_quality   -1046.638    132.737
  Area               991.284      524.198
a. Dependent Variable: price
a) 0.321
b) 0.404
c) 0.596
d) 0.687
$R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{255}{428} \approx 0.596$
233. A real estate broker is interested in identifying the factors that determine the price of a house. She
wants to run the following regression: y= B0 + B1x1 + B2x2 + B3x3 + u
where Y = price of the house in $1,000s, x1 = number of bedrooms, x2 = square footage of living space, and x3 =
number of miles from the beach. Taking a sample of 30 houses, the broker runs a multiple regression and gets
the following results: Y=123.2 + 4.59x1 + 0.125x2 - 6.04x3
Sb1=1.2
Sb2=2.13
Sb3=4.17
What is the 95% confidence interval for B1?
a) 4.59 +/- 2.47
b) 4.59 +/- 4.17
c) 2.13 +/- 4.38
d) 2.13 +/- 4.17
Explain:
1. Find the critical value:
We use a t-distribution with degrees of freedom (df) = n - k - 1, where n is the sample size (30) and k is the
number of independent variables (3). Therefore, df = 30 - 3 - 1 = 26.
Look up the t-value in a t-distribution table or use a software function for a two-tailed 95% confidence level and
26 degrees of freedom. You'll find a t-value of approximately 2.056.
2. Calculate the margin of error:
Multiply the standard error of B1 (Sb1 = 1.2) by the t-value: Margin of error = 1.2 * 2.056 ≈ 2.467.
3. Construct the confidence interval:
Lower bound: Estimated coefficient B1 (4.59) minus the margin of error: 4.59 - 2.467 ≈ 2.123.
Upper bound: Estimated coefficient B1 (4.59) plus the margin of error: 4.59 + 2.467 ≈ 7.057.
Therefore, the 95% confidence interval for B1 is (2.123, 7.057), that is 4.59 +/- 2.47 after rounding, which is
option a).
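The interval for B1 can be reproduced in a few lines. In this sketch the critical value 2.056 is hard-coded from a standard t table (two-tailed 5%, 26 degrees of freedom); it is an input assumed here, not something computed by the code.

```python
def confidence_interval(estimate, std_error, t_critical):
    """Return (lower, upper, margin) for estimate +/- t_critical * std_error."""
    margin = t_critical * std_error
    return estimate - margin, estimate + margin, margin


# Two-tailed 5% critical value for n - k - 1 = 30 - 3 - 1 = 26 degrees of
# freedom, taken from a t table (assumed input).
T_CRIT_26_DF = 2.056

lower, upper, margin = confidence_interval(4.59, 1.2, T_CRIT_26_DF)
print(round(lower, 3), round(upper, 3), round(margin, 2))
```

The interval excludes zero, which agrees with a t-statistic of 4.59 / 1.2 ≈ 3.83 exceeding the critical value.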
234. Arnar collected various characteristics of 348 houses in his city and used this dataset to set up a multiple
regression model with 4 independent variables. He then set up an F test for the overall significance of the
model and calculated that the F statistic is equal to: F = 12. What can he conclude?
a. He can reject H0 at all standard significance levels.
b. He fails to reject H0 at 5%.
c. His calculations are flawed. No conclusion can be reached.
d. He can reject H0 at 5% but fails to reject at 1%
Explain:
Here's the reasoning:
F-test for overall significance: This test checks whether the regression model explains a statistically significant
amount of variance in the dependent variable compared to just using the mean.
F-statistic = 12: A high F-statistic indicates evidence against the null hypothesis (H0), which states that all
regression coefficients are equal to zero (i.e., the model has no explanatory power).
Significance levels: Common significance levels used are 10%, 5% and 1%. A lower significance level requires stronger
evidence to reject H0.
Since the F-statistic (12) is far above the critical values for 4 and 343 degrees of freedom (roughly 2.4 at 5% and
3.4 at 1%), it provides strong evidence against H0. Therefore, Arnar can reject the
null hypothesis at all standard significance levels, including 5% and 1%. This means the regression model likely
has statistically significant explanatory power.
Here's why the other options are incorrect:
b) Fails to reject H0 at 5%: This would be the conclusion if the F-statistic was less than the critical value for a 5%
significance level and 4 and 343 degrees of freedom (k = 4 and n - k - 1 = 348 - 4 - 1 = 343).
c) Calculations are flawed: There's no indication of flawed calculations based on the information provided.
d) Rejects H0 at 5% but fails at 1%: This would require F to lie between the 5% and the 1% critical values. F = 12
exceeds both, so H0 is rejected at 1% as well.
235. Arnar collected various characteristics of 348 houses in his city and used this dataset to set up a multiple
regression model with 4 independent variables. He then set up an F test for the overall significance of the
model and calculated that the F statistic is equal to: F = -12. What can he conclude?
a) He can reject the null hypothesis that coefficients are jointly equal to 0.
b) He can accept the null hypothesis - there is no evidence that coefficients are jointly insignificant.
c) He can reject the null hypothesis that coefficients are jointly different from 0.
d) His calculations are flawed. No conclusion can be reached
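Explain: The F-statistic is a ratio of two mean squares, neither of which can be negative, so a value of -12 signals
a calculation error and d) is correct. Explanations for the other options: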
a) He can reject the null hypothesis that coefficients are jointly equal to 0: This option would hold true if the F-
statistic was positive and significant, indicating the model explains a statistically significant amount of variance.
However, with a negative F-statistic, this conclusion cannot be drawn.
b) He can accept the null hypothesis - there is no evidence that coefficients are jointly insignificant: This would
be the conclusion if the F-statistic was non-significant, but still positive. A negative F-statistic doesn't provide
valid information about the significance of the coefficients.
c) He can reject the null hypothesis that coefficients are jointly different from 0: Similar to option a), this
statement requires a positive and significant F-statistic, which isn't applicable in this situation.
Therefore, due to the impossible value of the F-statistic (-12), it's impossible to draw any valid conclusions about
the joint significance of the model coefficients. Further investigation into the calculations and potential errors is
necessary before making any inferences.

245. Arnar collected various characteristics of 348 houses in his city and used this dataset to set up a multiple
regression model with 4 independent variables. The t-statistic for size of the house (one of the independent
variables in the model) is equal to 0.89. What can he conclude?
a) Size of the house is significant at 10% but is not significant at 1% level (using a 2-tailed test)
b) Size of the house is significant at 10% but is not significant at 1% level (using a 1-tailed test)
c) Not enough data to answer this question
d) Size of the house is not significant at any usual alpha level
246. You have a model with 7 independent variables and 291 observations.
Your ANOVA table is presented below.
               Sum of squares
Regression     6,993
Residual       13,070
Please calculate the model error variance. Answer: 46.1837
$\text{MSE} = \frac{\text{SSE}}{n-k-1}$
247. The correct interpretation of the following model is: log(wage) = 0.584 + 0.083 education + 0.03 female
a) For females, if education increases by 1 unit, wage increases by 3*100%, ceteris paribus
b) Females on average earn 3% more than males, keeping education constant.
c) If Female increases by 1, log(wage) increases by 3%.
d) For females, if education increases by 1 year, wage increases by 8.3%, ceteris paribus
248. The correct interpretation of the following model is: log(wage) = 0.584 + 0.083 education
a) If education increases by 1 year, wage increases by 8.3%
b) If education increases by 1 year, wage increases by 0.083 per week.
c) If education increases by 1 year, wage increases by 8.3 per week.
d) If education increases by 1 year, wage increases by 0.083# per week.
249. You analyse a simple regression model investigating the effect of the size of the apartment (in square meters)
on the price of apartments (in thousands of Euros). The dependent variable is in natural logarithm. The coefficient
on the independent variable is equal to 0,013. What is the interpretation of that coefficient?
a) An extra square meter increases the price of an apartment by 13€
b) An extra square meter increases the price of an apartment by 0,13%
c) An extra square meter increases the price of an apartment by 1,3%
d) An extra square meter increases the price of an apartment by 130€
250. You are estimating a model with the price of a bicycle as the dependent variable (in €). There are four
independent variables in the model:
x1 age of the bicycle (in years),
x2 number of gears,
x3 number of previous owners,
x4 a dummy equal to 1 when a bicycle has a basket and 0 otherwise.
Model is of the following type:
Y = B0 + B1x1 + B2x2 + B3x3 + B4x4 + u
Estimation gives the following results:
Y_hat = 23 - 14.1x1 + 3x2 - 5.2x3 + 3.6x4
How would you interpret coefficient on the dummy variable (x4)?
a) If basket increases by 1 the price of the bike will increase by 3.6€ + 23€, ceteris paribus.
b) A bike with a basket is on average 3.6€ more expensive than a bike without a basket, keeping other variables
fixed
c) If basket increases by 1 the price of the bike will increase by 3.6€.
d) A bike with a basket is on average 36% more expensive than a bike without a basket, keeping other variables
fixed.
251. For a sample of 300 houses across France, we estimate a model relating the price of a house to various
house characteristics. We use the following variables:
log(price) is a natural logarithm of price, price is reported in €.
nox is the amount of nitrogen oxide in the air, in parts per million
rooms is the number of rooms in houses in the community.
The estimated model is of the following form: log(price) = 9.23 - 0.718 log(nox) + 0.306 rooms
Which is the correct interpretation?
a) If number of rooms increases by 1, price increases by 0.306€, keeping nox constant.
b) If number of rooms increases by 1, price increases by 0.306%, keeping nox constant.
c) If number of rooms increases by 1, price increases by 30.6%, keeping nox constant.
d) If number of rooms increases by 1%, price increases by 30.6%, keeping nox constant.
252. For a sample of 300 houses across France, we estimate a model relating the price of a house to various
house characteristics. We use the following variables:
log(price) is a natural logarithm of price, price is reported in €.
nox is the amount of nitrogen oxide in the air, in parts per million
rooms is the number of rooms in houses in the community
The correct interpretation of the following model is: log(price) = 9.23 - 0.718 log(nox) + 0.306 rooms
a) If the amount of nitrogen oxide in the air increases by 1%, the price of the house decreases by 71,8%, keeping
the number of rooms constant.
b) If the amount of nitrogen oxide in the air increases by 1 unit, the price of the house decreases by 0,718 unit,
keeping the number of rooms constant.
c) If the amount of nitrogen oxide in the air increases by 1 unit, the price of the house decreases by 71.8%,
keeping the number of rooms constant.
d) If the amount of nitrogen oxide in the air increases by 1%, the price of the house decreases by 0.72%, keeping
the number of rooms constant.
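The percentage interpretations in questions 247 to 252 rest on the approximation that a small change in log(y) is close to the relative change in y. The sketch below compares that approximation with the exact change for the model log(price) = 9.23 - 0.718 log(nox) + 0.306 rooms. The function names are illustrative, and the exact formulas are standard results rather than something stated in the questions.

```python
import math


def exact_percent_change_log_level(coefficient, delta=1.0):
    """Exact % change in y when x rises by delta in log(y) = ... + b*x."""
    return 100 * (math.exp(coefficient * delta) - 1)


def exact_percent_change_log_log(elasticity, pct_change_x):
    """Exact % change in y when x rises by pct_change_x % in a log-log term."""
    return 100 * ((1 + pct_change_x / 100) ** elasticity - 1)


# Log-level term: one extra room (coefficient 0.306).
approx_rooms = 100 * 0.306
exact_rooms = exact_percent_change_log_level(0.306)

# Log-log term: nox rises by 1% (elasticity -0.718).
approx_nox = -0.718 * 1.0
exact_nox = exact_percent_change_log_log(-0.718, 1.0)

print(round(approx_rooms, 1), round(exact_rooms, 1))
print(round(approx_nox, 3), round(exact_nox, 3))
```

The approximation is very close for the 1% change in nox but noticeably understates the effect of a whole extra room, because a coefficient of 0.306 is not small. The answer options in the questions use the approximate reading.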
253. For a sample of 300 houses across France, we estimate a model relating the price of a house to various
house characteristics. We use the following variables:
log(price) is a natural logarithm of price, price is reported in €.
nox is the amount of nitrogen oxide in the air, in parts per million
rooms is the number of rooms in houses in the community
The estimated model is of the following form: log(price) = 9.23 - 0.718 log(nox) + 0.306 rooms
What is the value of intercept in this model and how do we interpret it?
a) Intercept is equal to 9.23, we interpret it only if it is significant (we need to set up a t-test)
b) There is no intercept in this model.
c) Intercept is equal to 9.23, we do not interpret it.
d) Intercept is equal to -0.718, we interpret it as elasticity.
254.
Model            Unstandardized Coefficients
                 B        Std. error
1 (Constant)     5.331    0.114
  Education      0.075    0.006
  Experience     0.014    0.003
  Tenure         0.013    0.003
The above table presents partial output from a regression.
Please indicate what is the value of the t-statistic for education.
a) 12.5 %
b) 13.5
c) 13.5 %
d) 12.5
255. ANOVA
Model        Sum of squares   Df
Regression   27671848.87      4
Residual     120544319.4      930
Total        152716168.2      934
i. The above table presents ANOVA results from a regression. How many independent variables are in this
regression?
a) 3
b) 5
c) 4
d) impossible to tell
ii. The above table presents ANOVA results from a regression. How many observations are in the analysed
sample?
a) 930
b) 4
c) 935
d) 934
iii. The above table presents ANOVA results from a regression. What is the value of the coefficient of
determination from this model?
a) 18%
b) 82%
c) 0.82
d) 0.18%
256. What would you conclude if you fail to reject H0: β1 = β2 = ... = βk = 0?
a) the independent variables are good predictors of the dependent variable
b) no relationship exists between the dependent variable and the independent variables
c) a strong relationship exists among the independent variables
d) more information is needed to answer the question
257. What would you conclude if you reject H0: β1 = β2 = ... = βk = 0?
a) All independent variables are good predictors of the dependent variable
b) No relationship exists between the dependent variable and the independent variables
c) More information is needed to answer the question
d) At least one of the tested variables is a good predictor of the dependent variable.
258. In testing the validity of a multiple regression model, a large value of the F-test statistic indicates that:
a) most of the variation in y is unexplained by the regression equation.
b) most of the variation in the independent variables is explained by the variation in y.
c) the model has significant explanatory power because at least one slope coefficient is not zero.
d) the model provides a poor fit.
Explain: Here's a breakdown of why the other options are incorrect and why c) is the correct interpretation:
a) Most of the variation in y is unexplained by the regression equation: This is the opposite of what a large F-
statistic indicates. A high F-value suggests that the model explains a significant amount of variation in y, not the
other way around.
b) Most of the variation in the independent variables is explained by the variation in y: The F-test doesn't assess
the explanation of variation in independent variables. It focuses on how well the independent variables explain
the variation in the dependent variable (y).
d) The model provides a poor fit: A large F-statistic signals the opposite. It means the model accounts for
significantly more of the variability in y than the mean of y alone.
Why c) is correct:
F-test: The F-test, in multiple regression, examines the overall significance of the model. It compares the mean
square explained by the regression, MSR = SSR / k, to the mean square left unexplained, MSE = SSE / (n - k - 1).
Large F-statistic: A large F-statistic indicates that the ratio of MSR to MSE is substantial, meaning the model is
capturing a significant amount of the variation in y.
At least one slope coefficient not zero: A large F-statistic also implies that at least one of the slope coefficients
(betas) in the model is likely not zero. If all coefficients were zero, the model wouldn't explain any variation in y,
and the F-statistic would be small.
Significant explanatory power: Therefore, a large F-value suggests that the model has significant explanatory
power because it's effectively capturing relationships between the independent variables and the dependent
variable.
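To make the F-statistic concrete, the sketch below computes it from the ANOVA table used in questions 227 and 232 (regression sum of squares 255 with 4 degrees of freedom, residual sum of squares 173 with 501 degrees of freedom). It also shows the equivalent form based on R-squared. The function names are illustrative.

```python
def f_statistic(ssr, sse, n, k):
    """Overall F = MSR / MSE with MSR = SSR / k and MSE = SSE / (n - k - 1)."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


def f_from_r_squared(r_squared, n, k):
    """Same statistic written in terms of R^2."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))


# ANOVA table from questions 227 and 232: total df = 505, so n = 506.
n, k = 506, 4
ssr, sse = 255, 173

f_value = f_statistic(ssr, sse, n, k)
f_check = f_from_r_squared(ssr / (ssr + sse), n, k)

print(round(f_value, 2), round(f_check, 2))
```

An F value this large is far above any conventional critical value for 4 and 501 degrees of freedom, so the hypothesis that all slope coefficients are zero is rejected.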
259. You are estimating a model with price of a bicycle as the dependent variable. There are four
independent variables in the model: age of the bicycle, number of gears, number of previous owners, a dummy
equal to 1 when a bicycle has a basket and 0 otherwise. Model is of the following type:
Y = B0 + B1x1 + B2x2 + B3x3 + B4x4 + u
What additional model would you need to estimate in order to verify the following hypothesis.
H0: B1=B2=0
H1: otherwise
a) Model with B1 and B2 as independent variables.
b) Model with x3 and x4 as independent variables.
c) Model with x1 and x2 as independent variables.
d) Model with B3 and B4 as independent variables
260. What are the degrees of freedom for a critical T-value in case of a regression analysis based on 7
independent variables and a sample containing 167 observations?
Answer: 159 (=n-k-1)
261. In order to evaluate significance of individual coefficients one has to set up:
a) F-test
b) An F-test for a subgroup
c) Only a t-test
d) T-test or a confidence interval
e) Only a confidence interval
262. What are the degrees of freedom for a critical T-value in case of a regression analysis based on 3
independent variables and a sample containing 113 observations?
Answer: 109
263. If there is a very strong correlation between two variables then the correlation coefficient must be
a. any value larger than 1
b. much smaller than 0, if the correlation is negative
c. much larger than 0, regardless of whether the correlation is negative or positive
d. None of these alternatives is correct.
264. In regression, the equation that describes how the response variable (y) is related to the explanatory
variable (x) is:
a. the correlation model
b. the regression model
c. used to compute the correlation coefficient
d. None of these alternatives is correct.
265. The relationship between number of beers consumed (x) and blood alcohol content (y) was studied in
16 male college students by using least squares regression. The following regression equation was obtained from
this study:
Y = -0.0127 + 0.0180x
The above equation implies that:
a. each beer consumed increases blood alcohol by 1.27%
b. on average it takes 1.8 beers to increase blood alcohol content by 1%
c. each beer consumed increases blood alcohol by an average amount of 1.8%
d. each beer consumed increases blood alcohol by exactly 0.018
266. SSE can never be
a. larger than SST
b. smaller than SST
c. equal to 1
d. equal to zero
267. Regression modeling is a statistical framework for developing a mathematical equation that describes
how
a. one explanatory and one or more response variables are related
b. several explanatory and several response variables are related
c. one response and one or more explanatory variables are related
d. All of these are correct.
268. In regression analysis, the variable that is being predicted is the
a. response, or dependent, variable
b. independent variable
c. intervening variable
d. is usually x
269. Regression analysis was applied to return rates of sparrowhawk colonies. Regression analysis was
used to study the relationship between return rate (x: % of birds that return to the colony in a given
year) and immigration rate (y: % of new adults that join the colony per year). The following
regression equation was obtained.
y = 31.9 – 0.34x
Based on the above estimated regression equation, if the return rate were to decrease by 10% the rate of
immigration to the colony would:
a. increase by 34%
b. increase by 3.4%
c. decrease by 0.34%
d. decrease by 3.4%
270. In least squares regression, which of the following is not a required assumption about the error term ε?
a. The expected value of the error term is one.
b. The variance of the error term is the same for all values of x.
c. The values of the error term are independent.
d. The error term is normally distributed.
Explain: Option a) is not a required assumption. Required assumptions about the error term (ε):
Zero mean: The expected value of the error term should be zero, not one. This means that, on average, the
errors should balance out to zero across all observations.
Constant variance (homoscedasticity): The variance of the error term should be constant for all values of the
independent variable(s) (x). This ensures that the errors have a consistent spread throughout the range of x.
Independence: The errors should be independent of each other. This means that the error in one observation
should not influence the error in another observation.
Normality (optional): Although not strictly required for least squares estimation, assuming a normal distribution
for the error term justifies exact small-sample inference, such as hypothesis testing and confidence interval
construction.
271. Larger values of r2 (R2) imply that the observations are more closely grouped about the
a. average value of the independent variables
b. average value of the dependent variable
c. least squares line
d. origin
272. In a regression analysis if r2 = 1, then
a. SSE must also be equal to one
b. SSE must be equal to zero
c. SSE can be any positive value
d. SSE must be negative
273. The coefficient of correlation
a. is the square of the coefficient of determination
b. is the square root of the coefficient of determination
c. is the same as r-square
d. can never be negative
274. In regression analysis, the variable that is used to explain the change in the outcome of an experiment,
or some natural process, is called
a. the x-variable
b. the independent variable
c. the predictor variable
d. the explanatory variable
e. all of the above (a-d) are correct
f. none are correct
275. In the case of an algebraic model for a straight line, if a value for the x variable is specified, then
a. the exact value of the response variable can be computed
b. the computed response to the independent value will always give a minimal residual
c. the computed value of y will always be the best estimate of the mean response
d. none of these alternatives is correct.
276. A regression analysis between sales (in $1000) and price (in dollars) resulted in the following equation:
y = 50,000 - 8X
The above equation implies that an
a. increase of $1 in price is associated with a decrease of $8 in sales
b. increase of $8 in price is associated with an increase of $8,000 in sales
c. increase of $1 in price is associated with a decrease of $42,000 in sales
d. increase of $1 in price is associated with a decrease of $8000 in sales
277. In a regression and correlation analysis if r = 1, then
a. SSE = SST
b. SSE = 1
c. SSR = SSE
d. SSR = SST (SSE=0)
278. If the coefficient of determination is a positive value, then the regression equation
a. must have a positive slope
b. must have a negative slope
c. could have either a positive or a negative slope
d. must have a positive y intercept
279. If two variables, x and y, have a very strong linear relationship, then
a. there is evidence that x causes a change in y
b. there is evidence that y causes a change in x
c. there might not be any causal relationship between x and y
d. None of these alternatives is correct.
280. If the coefficient of determination is equal to 1, then the correlation coefficient
a. must also be equal to 1
b. can be either -1 or +1
c. can be any value between -1 to +1
d. must be -1
281. In regression analysis, if the independent variable is measured in kilograms, the dependent variable
a. must also be in kilograms
b. must be in some unit of weight
c. cannot be in kilograms
d. can be any units
282. The data are the same as for question 265 above. The relationship between number of beers consumed
(x) and blood alcohol content (y) was studied in 16 male college students by using least squares regression. The
following regression equation was obtained from this study:
y = -0.0127 + 0.0180x
Suppose that the legal limit to drive is a blood alcohol content of 0.08. If Ricky consumed 5 beers the model
would predict that he would be:
a. 0.09 above the legal limit
b. 0.0027 below the legal limit
c. 0.0027 above the legal limit
d. 0.0733 above the legal limit
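The arithmetic behind this question can be sketched in a few lines of Python. The function and constant names are illustrative assumptions; only the coefficients and the 0.08 limit come from the question.

```python
# Fitted line from the question: y = -0.0127 + 0.0180x
INTERCEPT = -0.0127
SLOPE = 0.0180
LEGAL_LIMIT = 0.08

def predict_bac(beers):
    """Predicted blood alcohol content for a given number of beers."""
    return INTERCEPT + SLOPE * beers

predicted = predict_bac(5)
margin = LEGAL_LIMIT - predicted  # positive means the prediction is below the limit
print(f"predicted BAC = {predicted:.4f}, margin below limit = {margin:.4f}")
```

The prediction for 5 beers is 0.0773, which is 0.0027 below the legal limit.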
283. In a regression analysis if SSE = 200 and SSR = 300, then the coefficient of determination is
a. 0.6667
b. 0.6000
c. 0.4000
d. 1.5000
Explain: r^2 = SSR / (SSR + SSE)
284. If the correlation coefficient is 0.8, the percentage of variation in the response variable explained by the
variation in the explanatory variable is
a. 0.80%
b. 80%
c. 0.64%
d. 64%
Explain: The correlation coefficient (r) tells us the strength and direction of a linear relationship between two
variables. However, it doesn't directly represent the percentage of variation explained. To find the percentage
explained, we need to square the correlation coefficient and multiply by 100.
Therefore, for a correlation coefficient of 0.8:
Explained variation = (0.8)^2 * 100
Explained variation = 0.64 * 100
Explained variation = 64%
So, 64% of the variation in the response variable is explained by the variation in the explanatory variable when
the correlation coefficient is 0.8.
285. If the correlation coefficient is a positive value, then the slope of the regression line
a. must also be positive
b. can be either negative or positive
c. can be zero
d. cannot be zero
286. If the coefficient of determination is 0.81, the correlation coefficient
a. is 0.6561
b. could be either + 0.9 or - 0.9
c. must be positive
d. must be negative
287. A fitted least squares regression line
a. may be used to predict a value of y if the corresponding x value is given
b. is evidence for a cause-effect relationship between x and y
c. can only be computed if a strong linear relationship exists between x and y
d. None of these alternatives is correct.
Explain: Here's why the other options are incorrect:
b. is evidence for a cause-effect relationship between x and y: Regression only shows correlation, not causation.
A strong fit doesn't imply that x causes y; other factors could be influencing the relationship.
c. can only be computed if a strong linear relationship exists between x and y: Least squares regression can be
computed even for weak or non-linear relationships. However, the quality of the predictions might be poor in
those cases.
d. None of these alternatives is correct: This option is incorrect because option a is indeed correct.
Key points about fitted least squares regression lines:
Prediction: Their primary purpose is to predict values of the dependent variable (y) based on the independent
variable (x).
No causation: They don't prove causation, only correlation. Caution is needed when interpreting the relationship
between x and y.
No strength requirement: They can be computed for any degree of linear relationship, but a stronger relationship
generally leads to better predictions.
288. Regression analysis was applied between $ sales (y) and $ advertising (x) across all the branches
of a major international corporation. The following regression function was obtained.
y = 5000 + 7.25x
If the advertising budgets of two branches of the corporation differ by $30,000, then what will be the predicted
difference in their sales?
a. $217,500
b. $222,500
c. $5000
d. $7.25
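A minimal sketch of why the intercept drops out of the comparison: the difference between two predictions depends only on the slope and the difference in x. The two budget figures are hypothetical; only their $30,000 gap matters.

```python
def predict_sales(advertising):
    """Fitted line from the question: y = 5000 + 7.25x."""
    return 5000 + 7.25 * advertising

def predicted_difference(adv_a, adv_b):
    """Difference in predicted sales between two branches."""
    return predict_sales(adv_a) - predict_sales(adv_b)

# Hypothetical budgets that differ by $30,000
diff = predicted_difference(80_000, 50_000)
print(diff)
```

The intercept of 5000 cancels, leaving 7.25 * 30,000 = $217,500.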
289. Suppose the correlation coefficient between height (as measured in feet) versus weight (as measured in
pounds) is 0.40. What is the correlation coefficient of height measured in inches versus weight measured in
ounces? [12 inches = one foot; 16 ounces = one pound]
a. 0.40
b. 0.30
c. 0.533
d. cannot be determined from information given
e. none of these
Explain:
The correlation coefficient is unit-free, so it remains 0.40 (option a).
Step-by-step explanation:
Converting feet to inches multiplies every height by 12, and converting pounds to ounces multiplies every weight by 16.
Multiplying a variable by a positive constant multiplies its deviations from the mean by that same constant.
The covariance in the numerator of r is therefore multiplied by 12 * 16, and the product of the two standard
deviations in the denominator is also multiplied by 12 * 16.
The two factors cancel, so r is unchanged:
r (inches, ounces) = r (feet, pounds) = 0.40
Note that 0.40 is a correlation, not a slope: H = 0.4 * W does not follow from r = 0.40, so scaling 0.4 by 12/16
to obtain 0.3 is not valid.
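As a numerical check, the sketch below computes Pearson's r on a small made-up sample in both unit systems. The data values are invented for illustration. Rescaling either variable by a positive constant leaves r unchanged, so in the question the correlation stays at 0.40 (option a).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up sample: heights in feet, weights in pounds
heights_ft = [5.0, 5.5, 5.8, 6.0, 6.2]
weights_lb = [150, 120, 170, 160, 200]

r_ft_lb = pearson_r(heights_ft, weights_lb)
# Same people, measured in inches and ounces
r_in_oz = pearson_r([12 * h for h in heights_ft], [16 * w for w in weights_lb])
print(r_ft_lb, r_in_oz)
```

Both calls print the same value apart from floating-point noise.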
290. Assume the same variables as in question 28 above; height is measured in feet and weight is measured
in pounds. Now, suppose that the units of both variables are converted to metric (meters and kilograms). The
impact on the slope is:
a. the sign of the slope will change
b. the magnitude of the slope will change
c. both a and b are correct
d. neither a nor b are correct
291. Suppose that you have carried out a regression analysis where the total variance in the response is
133452 and the correlation coefficient was 0.85. The residual sums of squares is:
a. 37032.92
b. 20017.8
c. 113434.2
d. 96419.07
e. 15%
f. 0.15
Explain: SST = 133452 and r = 0.85, so R^2 = r^2 = 0.7225.
R^2 = SSR / SST = 0.7225 => SSR = 0.7225 * 133452 = 96419.07 => SSE = SST – SSR = 37032.93 (option a, up to rounding)
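A short sketch of the computation, taking 0.85 as the correlation coefficient (as the question states) and treating the stated total variation as SST. The variable names are illustrative.

```python
sst = 133452          # total sum of squares given in the question
r = 0.85              # correlation coefficient given in the question

r_squared = r ** 2    # coefficient of determination
ssr = r_squared * sst # explained (regression) sum of squares
sse = sst - ssr       # residual sum of squares

print(f"R^2 = {r_squared:.4f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
```

Squaring r first matters: using 0.85 directly as R^2 would give 20017.8 instead of roughly 37032.93.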
292. This question is related to questions 4 and 21 above. The relationship between number of beers
consumed (x) and blood alcohol content (y) was studied in 16 male college students by using least squares
regression. The following regression equation was obtained from this study:
Y = - 0.0127 + 0.0180x
Another student, named Dudley, has the regression equation written on a scrap of paper in his pocket. Dudley goes
out drinking and has 4 beers. He calculates that he is under the legal limit (0.08) so he decides to drive to another
bar. Unfortunately Dudley gets pulled over and confidently submits to a road-side blood alcohol test. He scores a
blood alcohol of 0.085 and gets himself arrested. Obviously, Dudley skipped the lecture about residual variation.
Dudley’s residual is:
a. +0.005
b. -0.005
c. +0.0257
d. -0.0257
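A residual is the observed value minus the value predicted by the fitted line. A minimal sketch with the numbers from the question (the function name is an illustrative assumption):

```python
def predict_bac(beers):
    """Fitted line from the question: y = -0.0127 + 0.0180x."""
    return -0.0127 + 0.0180 * beers

observed = 0.085                  # Dudley's roadside test result
predicted = predict_bac(4)        # what the line predicts for 4 beers
residual = observed - predicted   # observed minus predicted

print(f"predicted = {predicted:.4f}, residual = {residual:+.4f}")
```

The predicted value is 0.0593, so the residual is +0.0257.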
293. You have carried out a regression analysis; but, after thinking about the relationship between variables,
you have decided you must swap the explanatory and the response variables. After refitting the regression model
to the data you expect that:
a. the value of the correlation coefficient will change
b. the value of SSE will change
c. the value of the coefficient of determination will change
d. the sign of the slope will change
e. nothing changes
294. Suppose you use regression to predict the height of a woman’s current boyfriend by using her own
height as the explanatory variable. Height was measured in feet from a sample of 100 women undergraduates,
and their boyfriends, at Dalhousie University. Now, suppose that the height of both the women and the men are
converted to centimeters. The impact of this conversion on the slope is:
a. the sign of the slope will change
b. the magnitude of the slope will change
c. both a and b are correct
d. neither a nor b are correct
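The sketch below checks the unit-conversion argument numerically on a small made-up sample. The heights are invented; only the feet-to-centimeters factor is a real constant. When x and y are rescaled by the same factor, the least squares slope (in y-units per x-unit) is unchanged. When the two factors differ, as with feet to meters and pounds to kilograms, the magnitude changes but the sign does not.

```python
def ols_slope(xs, ys):
    """Least squares slope of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

FT_TO_CM = 30.48

# Made-up heights in feet
women_ft = [5.0, 5.2, 5.4, 5.6, 5.9]
men_ft = [5.6, 5.9, 5.8, 6.1, 6.3]

slope_ft = ols_slope(women_ft, men_ft)
slope_cm = ols_slope([FT_TO_CM * w for w in women_ft],
                     [FT_TO_CM * m for m in men_ft])

# Different factors for x and y: slope is scaled by (y factor / x factor)
slope_mixed = ols_slope([FT_TO_CM * w for w in women_ft], men_ft)
print(slope_ft, slope_cm, slope_mixed)
```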
295. A residual plot:
a. displays residuals of the explanatory variable versus residuals of the response variable.
b. displays residuals of the explanatory variable versus the response variable.
c. displays explanatory variable versus residuals of the response variable.
d. displays the explanatory variable versus the response variable.
e. displays the explanatory variable on the x axis versus the response variable on the y axis.
296. When the error terms have a constant variance, a plot of the residuals versus the independent variable
x has a pattern that
a. fans out
b. funnels in
c. fans out, but then funnels in
d. forms a horizontal band pattern
e. forms a linear pattern that can be positive or negative
297. You studied the impact of the dose of a new drug treatment for high blood pressure. You think that the
drug might be more effective in people with very high blood pressure. Because you expect a bigger change in
those patients who start the treatment with high blood pressure, you use regression to analyze the relationship
between the initial blood pressure of a patient (x) and the change in blood pressure after treatment with the new
drug (y). If you find a very strong positive association between these variables, then:
a. there is evidence that the higher the patient's initial blood pressure, the bigger the impact of the new drug.
b. there is evidence that the higher the patient's initial blood pressure, the smaller the impact of the new drug.
c. there is evidence for an association of some kind between the patient's initial blood pressure and the impact of
the new drug on the patient's blood pressure
d. none of these are correct, this is a case of regression fallacy
298. A variety of summary statistics were collected for a small sample (10) of bivariate data, where the
dependent variable was y and an independent variable was x.
ΣX = 90
ΣY = 170
n = 10
Σ(Y – Y̅)(X – X̅) = 466
Σ(X – X̅)^2 = 234
Σ(Y – Y̅)^2 = 1434
SSE = 505.98
i. Use the formula below to compute the sample correlation coefficient:
r = Σ(xi – x̄)(yi – ȳ) / √(Σ(xi – x̄)^2 * Σ(yi – ȳ)^2)
a. 0.8045 (= 466 / √(234 * 1434))
b. -0.8045
c. 0
d. 1
ii. The least squares estimate of b1 equals
a. 0.923
b. 1.991
c. -1.991
d. -0.923
iii. The least squares estimate of b0 equals
a. 0.923
b. 1.991
c. -1.991
d. -0.923
iv. The sum of squares due to regression (SSR) is
a. 1434
b. 505.98
c. 50.598
d. 928.02
v. The coefficient of determination equals
a. 0.6471
b. -0.6471
c. 0
d. 1
vi. The point estimate of y when x = 0.55 is
a. 0.17205
b. 2.018
c. 1.0905
d. -2.018
e. -0.17205
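All six parts of this question follow from the summary statistics. The sketch below works through them in order; the variable names are illustrative assumptions, and the formulas are the standard simple linear regression ones.

```python
from math import sqrt

n = 10
sum_x, sum_y = 90, 170
sxy = 466      # sum of (Y - Ybar)(X - Xbar)
sxx = 234      # sum of (X - Xbar)^2
syy = 1434     # sum of (Y - Ybar)^2, i.e. SST
sse = 505.98

r = sxy / sqrt(sxx * syy)                # i.   correlation coefficient
b1 = sxy / sxx                           # ii.  slope
b0 = sum_y / n - b1 * sum_x / n          # iii. intercept = ybar - b1 * xbar
ssr = syy - sse                          # iv.  SSR = SST - SSE
r_squared = ssr / syy                    # v.   coefficient of determination
y_hat = b0 + b1 * 0.55                   # vi.  point estimate at x = 0.55

print(f"r = {r:.4f}, b1 = {b1:.3f}, b0 = {b0:.3f}")
print(f"SSR = {ssr:.2f}, R^2 = {r_squared:.4f}, y_hat = {y_hat:.4f}")
```

With unrounded coefficients the point estimate is about 0.1722; plugging in the rounded values b0 = -0.923 and b1 = 1.991 gives 0.17205, which is the figure listed among the options.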
299. Match the symbol α with the correct definition
a) The power of a test
b) The probability of a Type I error
c) The probability of a Type II error
d) The probability of failing to reject the null hypothesis when it is true
300. To test the validity of a multiple regression model, we test the null hypothesis that all the regression
coefficients are zero by applying the ____.
a) F-test
b) t-test
c) z-test
d) chi-square test
301. Which of the following is true of the least squares equation, y = 20 + 5x?
a) The y-intercept of the least squares line is 5
b) For each unit increase in x, the value of y increases by approximately 5
c) For each unit increase in y, the value of y increases by approximately 20
d) For each unit increase in y, the value of y increases by approximately 25
302. For the chi-square goodness-of-fit-test, the calculated chi-square value is 7.21. If the table chi-square
value is 10.645, what is the appropriate decision for this test?
a) Reject the null hypothesis
b) Fail to reject the null hypothesis
c) Accept both the null and the alternative hypotheses
d) It is impossible to determine anything from the given information
Explain:
Comparing Calculated and Critical Values:
The calculated chi-square value (7.21) is less than the table chi-square value (10.645) for the given
degrees of freedom and significance level.
Failing to Reject the Null Hypothesis:
When the calculated chi-square value is less than the critical value, it means there's not enough
evidence to reject the null hypothesis. In other words, the observed data doesn't significantly deviate
from the expected distribution under the null hypothesis.
Accepting Hypotheses:
In hypothesis testing, we never "accept" both the null and alternative hypotheses. We either reject or fail
to reject the null hypothesis based on the evidence.
Sufficient Information for Decision:
The given information (calculated and table chi-square values) is sufficient to make a decision about the
null hypothesis in this test.
303. If the coefficient of determination is 1.0 in a regression problem, the
a) Residual sum of squares must be 0.0
b) Error sum of squares must be 1.0
c) Regression sum of squares must be 1.0
d) Total sum of squares must be 0.0
304. How many degrees of freedom are associated with a multiple regression model involving K predictors
and n observations when running a t-test for the individual coefficients?
a) n-K
b) n-K-1
c) n-K+1
d) K-1
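The degrees-of-freedom rule for a coefficient t-test in a model with an intercept is n - K - 1: one degree of freedom is used per slope and one for the intercept. A small helper (the function name is an illustrative assumption) makes the rule easy to apply to the numbers that appear in this question bank:

```python
def t_test_df(n_obs, n_predictors):
    """Residual degrees of freedom for a regression with an intercept."""
    return n_obs - n_predictors - 1

# (observations, predictors) pairs taken from questions in this bank
for n_obs, k in [(30, 10), (25, 4), (30, 4), (120, 5), (50, 5)]:
    print(f"n = {n_obs}, K = {k}: df = {t_test_df(n_obs, k)}")
```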
305. If the Durbin-Watson statistic has a value close to 0 or 4, which assumption is violated?
a) Normality of the errors
b) Independence of errors
c) Homoscedasticity
d) Variance of errors
306. Suppose you are interested in examining the determinants of earnings. You have information on the age
of the individual as well as their level of education: high school graduate, college graduate or graduate degree.
Let Y = earnings, X1= age, X2= 1 if the person has only a high school degree and 0 otherwise, X3= 1 if the person
has a college degree and 0 otherwise, X4 = 1 if the person has a graduate degree and 0 otherwise. Which of the
following model specifications would work for this data?
a) Y = β0+ β1X1+ β2X2+ β3X3+ εi
b) Y = β0+ β1X1+ β2X2+ β3X3+ β4X4+ εi
c) Y = β1X1+ β2X2+ β3X3+ β4X4+ εi
d) Y = β0+ β1X1+ β2X2+ β3X3+ β4X1X4
Explain:
The best model specification for your data examining the determinants of earnings is a) Y = β0 + β1X1 + β2X2 +
β3X3 + εi. Here's why:
Intercept (β0): With X2 = X3 = 0 the person belongs to the omitted category (graduate degree), so β0 sets the
base level of earnings for that group.
Age (β1X1): This estimates the linear relationship between earnings and age.
Education Levels (β2X2, β3X3): These coefficients capture the effect of having only a high school degree (X2) or a
college degree (X3) on earnings compared to the baseline category, a graduate degree (X4).
All three dummies plus an intercept (option b): Because X2 + X3 + X4 = 1 for every person, the three dummies are
perfectly collinear with the intercept (the dummy variable trap), so one category has to be left out.
Interaction Term (β4X1X4): Option d replaces the graduate-degree dummy with an interaction between age and
graduate degree (X1X4) and has no error term. While an interaction is potentially interesting, it is not needed to
compare the three education levels.
307. A multiple regression model involves 10 independent variables and 30 observations. If we want to test
the parameter β4 at the 5% significance level, the critical value will be ____.
a) 1.697
b) 2.093
c) 2.228
d) 1.729
308. Suppose that the estimated regression equation of a College of Business graduates is given by: ŷ =
32,000 + 4,000x + 1,800D, where y is the starting salary, x is the grade point average and D is a dummy variable
which takes the value of 1 if the student is a finance major and 0 if not. An accountancy major graduate with a
3.5 grade point average would have a starting salary of
a) $47,800
b) $46,000
c) $37,800
d) $32,000
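A dummy variable simply switches a fixed amount on or off. The sketch below evaluates the fitted equation for an accountancy major (D = 0) and, for comparison, a finance major (D = 1) with the same grade point average. The function name is an illustrative assumption.

```python
def starting_salary(gpa, finance_major):
    """Fitted equation from the question: 32,000 + 4,000x + 1,800D."""
    d = 1 if finance_major else 0
    return 32_000 + 4_000 * gpa + 1_800 * d

accountancy = starting_salary(3.5, finance_major=False)
finance = starting_salary(3.5, finance_major=True)
print(accountancy, finance)
```

The accountancy graduate's predicted salary is $46,000; the finance premium of $1,800 applies only when D = 1.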
309. Suppose you want to estimate the model Y = β0+ β1X1+ β2X2+ β3X3+ β4X4+ εi. However, you cannot
measure X4, so you estimate Y = β0+ β1X1+ β2X2+ β3X3+ εi instead. This is an example of a regression result that
will be subject to ____.
a) Heteroscedasticity
b) Specification bias
c) Multicollinearity
d) Autocorrelation
310. What does the following plot of residuals (e) from a regression analysis suggest?
[Figure: residual plot, not reproduced]
a) Heteroscedasticity
b) Specification bias
c) Multicollinearity
d) Autocorrelation
311. Coefficient of determination equal to 0.78 means that:
a) The model provides a poor fit
b) Most of the variation in y is unexplained by the regression equation
c) Model has problems with heteroscedasticity
d) Most of the variation in y is explained by the variation in the independent variables used in the model
312. An analyst has set up a model explaining advertising expenditures (yt) with retail sales (xt) and previous
year’s advertising (yt-1). The general form of the model is below. Suppose that retail sales increase by $1 in the
current year. What is the expected impact on advertising in the current year; what is the total effect on all current
and future advertising expenditures?
yt = β0 + β1xt + β2yt-1 + εt
a) β2 in the first period; βj/(1-y)
b) β1 in the first period; β1/(1-β2)
c) β2 + β0 in the first period; βj/(1-y)
d) β2 * β1 in the first period; β1/(1-β2)
313. In order to test the validity of a multiple regression model involving 4 independent variables and 25
observations, the t-statistic for assessing the significance of individual coefficients follows a Student’s
t-distribution with:
a) 20 degrees of freedom
b) 21 degrees of freedom
c) 24 degrees of freedom
d) 19 degrees of freedom
314. You were asked to interpret the coefficients from the following model:
log(price)i = 10.879 - 2.76 log(crime)i - 0.09593 school_qualityi
Where:
log(price) is the natural logarithm of a price of a house in neighbourhood i
log(crime) is the natural logarithm of the number of violent crimes in neighbourhood i
School_quality is the ratio of the number of students to teacher in neighborhood i.
Which of the following interpretations is not true?
a) Other things kept constant, if log(crime) changes by 1, log(price) changes in the opposite direction by 2.7
b) Other things kept constant, when school_quality decreases by 1, price increases by 9%
c) Other things kept constant, if crime increases by 1%, price decreases by 2.76%
d) Other things kept constant, when school_quality decreases by 1, price increases by 9
315. What does the following plot of residuals (e) from a regression analysis suggest?
[Figure: residual plot, not reproduced]
a) Specification bias
b) Autocorrelation
c) Multicollinearity
d) Heteroscedasticity
316. Among OLS assumptions there is no assumption about:
a) Serial correlation of residuals
b) Multicollinearity
c) Normally distributed error term
d) Significant coefficients
317. Suppose you want to estimate a model for electricity demand in Northern Europe. Electricity is often
used for heating and so its use depends among other things on temperature T (as it gets cold more electricity is
demanded). Additionally you know that the amount of electricity used over a week depends on the day of the
week, as people tend to use less electricity over the weekend (as big industry typically does not work on
Saturdays and Sundays). Therefore you want to include dummies for the day of the week: X1 equal to 1 for
Mondays and 0 otherwise, X2 equal to 1 for Tuesdays and 0 otherwise and so on for the rest of the week with X6
and X7 for Saturday and Sunday.
a. Which of the following suggested models would be best to measure the demand over time?
a) Yt= β0+ β1x1t + β2x2t + β3x3t + β4x4t + β5x5t + β6Tt+ ε
b) Yt= β0+ β1x1t + β2x2t + β3x3t + β4x4t + β5x5t + β6x6t + εt
c) Yt= β0+ β2x2t + β3x3t + β4x4t + β5x5t + β6x6t + β7x7t + β8Tt+ εt
d) Yt= β0+ β1x1t + β2x2t + β3x3t + β4x4t + β5x5t + β6x6t + β7x7t + β8Tt+ εt
318. Which of the following is expected to occur in multiple regression analysis if an important variable is
omitted from the list of independent variables?
a) It will lead to unbiased least squares estimators
b) It will lead to problems with autocorrelation
c) It will lead to either overestimated or underestimated least square coefficients
d) It will lead to multicollinearity between independent variables
e) It will lead to general mayhem and bring on the end of the world or Trump presidency
319. The standard error of the estimate (se) for a multiple regression model with two explanatory variables
X1 and X2:
a) Measures the variation around the predicted regression equation
b) Measures the proportion of variation in Y that is explained by X1 and X2
c) Measures the proportion of variation in Y that is explained by X1 holding X2 constant
d) Has the same sign as b1
320. You have the following model investigating the impact of crime, the number of rooms, teaching quality
and the area (town or countryside) on the price of newly built houses:
Pricei = b0 + b1crimei + b2roomsi + b3teaching_qualityi + b4areai + ei
The results for running that model are presented in tables 1, 2 and 3 below. You want to test the model for the
overall significance of coefficients; What is the value of the test statistic used for this purpose?
Table 1.
Model | R | R square | Adjusted R square | Std. error of the estimate
1 | 0.772(a) | .596 | .593 | 5876.79613
a. Predictors: (Constant), area, statio, rooms, crime
Table 2.
Model | Sum of Squares | Df | Mean Square
1 Regression | 25522628047.570 | 4 | 6380657011.890
Residual | 17302903098.890 | 501 | 34536732.732
Total | 42825531146.451 | 505 |
Table 3. Coefficients(a)
Model | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig.
1 (Constant) | -4700.838 | 4048.906 | | -1.161 | .0245
Crime | -209.042 | 32.098 | -.195 | -6.513 | .000
Rooms | 7441.453 | 401.573 | .568 | 18.531 | .000
Teaching_quality | -1046.638 | 132.737 | -.246 | -7.885 | .000
Area | 991.284 | 524.198 | .054 | 1.891 | .059
a. Dependent Variable: price
a) -1.161
b) 184.75
c) 6380657011.890
d) 5876.79613
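The overall significance test uses the F-statistic, which is the regression mean square divided by the residual mean square. The sketch below recomputes it from the sums of squares and degrees of freedom in Table 2; the variable names are illustrative assumptions.

```python
# Values from the ANOVA table (Table 2)
ss_regression, df_regression = 25522628047.570, 4
ss_residual, df_residual = 17302903098.890, 501

ms_regression = ss_regression / df_regression   # mean square for regression
ms_residual = ss_residual / df_residual         # mean square error
f_statistic = ms_regression / ms_residual

print(f"F = {f_statistic:.2f} on ({df_regression}, {df_residual}) degrees of freedom")
```

The result, about 184.75, matches option b; the other options are a t-value, a mean square and the standard error of the estimate.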
321. In a multiple regression model described by the following equation: yi = β0 + β1Xi + β2Xi^2 + εi, where β1 is
the linear term coefficient and β2 is the coefficient of the squared term, the pattern as shown in figure 2 would
be best described with:
[Figure 2: not reproduced]
a) β1 > 0 and β2 > 0
b) β1 > 0 and β2 < 0
c) β1 < 0 and β2 > 0
d) β1 < 0 and β2 < 0
322. In order to test the validity of a multiple regression model involving 4 independent variables and 30
observations, the statistic for assessing the significance of an individual coefficient follows:
a) Student’s t-distribution with 26 degrees of freedom
b) Student’s t-distribution with 25 degrees of freedom
c) F-distribution with 25 degrees of freedom
d) Normal distribution with 26 degrees of freedom
323. In examining the determinants of income, data were collected regarding the characteristics of 45 adults,
and the regression
logY = β0+ β1logX1+ β2logX2+ β3X3+ ε
was used, where log is the natural logarithm, Y is the annual income (in thousands of dollars), X1 is the adult’s
age, X2 is his/her years of education, and X3 is the gender dummy variable which is equal to 1 for a female and
equal to 0 if the adult is male. You run the regression and obtain the equation
logY = 6.3 + 0.91logX1 + 1.3logX2 - 0.05X3 + ε
How would you interpret the coefficient on years of education?
a) For every additional year of his/her education, we would expect the income to increase on average by
$1,310, assuming that all the other independent variables in the model are held constant.
b) For every one percent increase in his/her years of education, we would expect the income to increase on
average by 1.3 percent, assuming that all the other independent variables in the model are held constant.
c) For every 1.3 percent increase in his/her years of education, we would expect the income to increase
approximately by $1,000, assuming that all the other independent variables in the model are held constant.
d) For every one percent increase in his/her years of education, we would expect the income to increase
approximately by $1,310, assuming that all the other independent variables in the model are held constant
324. What does the following plot of residuals (e) from a regression analysis suggest?
[Figure: residual plot, not reproduced]
a) Specification bias
b) Autocorrelation
c) Multicollinearity
d) Heteroscedasticity
325. A real estate broker is interested in identifying the factors that determine the price of a house. She
wants to run the following regression:
Y = β0+ β1X1+ β2X2+ β3X3+ ε
Where Y = price of the house in $1,000, X1 = number of bedrooms, X2 = square footage of living space, and X3 =
number of miles from the beach. Taking a sample of 30 houses, the broker runs a multiple regression and gets the
following results: Ŷ = 123.2 + 4.59X1 + 0.125X2 - 6.04X3
With the estimates of the standard error of the slopes equal to:
sb0 = 103.2, sb1 = 2.13, sb2 = 0.062, sb3 = 4.17
i. Determine the price that an individual has to pay for a 3 bedroom, 1,000 square foot house
that is located three miles away from the beach.
a) $201,422
b) $177,243
c) $243,850
d) $229,198
ii. What is the 95% confidence interval for β1?
a) 2.13 ± 4.17
b) 4.59 ± 4.38
c) 2.13 ± 4.38
d) 4.59 ± 4.17
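Both parts can be checked with a short sketch. The coefficients and standard errors come from the question. The critical value 2.056 is an assumption read from a t-table for a two-sided 95% interval with 30 - 3 - 1 = 26 degrees of freedom, and the function name is illustrative.

```python
T_CRIT = 2.056   # assumed t-table value: two-sided 95%, 26 degrees of freedom

def predict_price(bedrooms, sqft, miles, b3=-6.04):
    """Fitted equation from the question, in thousands of dollars."""
    return 123.2 + 4.59 * bedrooms + 0.125 * sqft + b3 * miles

# i. 3 bedrooms, 1,000 square feet, 3 miles from the beach
price_thousands = predict_price(3, 1000, 3)

# ii. 95% confidence interval for beta1: b1 +/- t * s_b1
b1, s_b1 = 4.59, 2.13
half_width = T_CRIT * s_b1

print(f"predicted price = ${price_thousands * 1000:,.0f}")
print(f"beta1 interval = {b1} +/- {half_width:.2f}")
```

The prediction is $243,850 and the interval is 4.59 ± 4.38. The `b3` parameter is exposed because a later variant of this question uses -7.04 for the beach coefficient.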
326. You know that the true relationship between independent variables X1, X2, X3 and the dependent
variable Y is described by the following regression
Y = β0+ β1X1+ β2X2+ β3X3+ εi
However, due to lack of data you decide to estimate a slightly different model
Y = β0+ β1X1+ β2X2+ ε
What problems will this new, shorter model face? Choose the best answer.
a) Heteroscedasticity
b) Specification bias
c) Normally distributed error term
d) Significant coefficients
327. When might autocorrelation be a problem in a model?
a) When the F-statistic is large
b) When Var(ei) is homoscedastic
c) When DW = 2
d) When the model is estimated over time, e.g. using daily data
328. What would you conclude if you fail to reject H0: β1 = β2 = ... = βk = 0?
a) no relationship exists between the dependent variable and the independent variables
b) a strong relationship exists among the independent variables
c) Some of the independent variables are good predictors of the dependent variable
d) more information is needed to answer the question
329. A value of Durbin - Watson statistic is d = 3.99. This means:
a) The assumption of independence of errors is violated
b) The assumption of serial correlation of errors is not violated
c) The null hypothesis of autocorrelation cannot be rejected
d) There is a positive serial correlation
330. Which of the following is expected to occur in multiple regression analysis if an important variable is
omitted from the list of independent variables?
a) It will lead to unbiased least squares estimators
b) It will lead to biased least squares estimators
c) It will lead to over- or under- estimated estimators of the variance
d) None of the above answers is correct
331. In regression models, multicollinearity arises when the _____.
a) Dependent variables are highly correlated with one another
b) Independent variables are highly correlated with one another
c) Independent variables are highly correlated with the dependent variable
d) Error terms do not have the same variance
332. A student is interested in measuring the price of cars and has set up the following model:
price = constant + α1petrol + α2age + α3summer + α4autumn + α5winter + εt
Where:
price - is the price paid for the car
petrol - is a dummy = 1 if car uses petrol and 0 otherwise
age - measures the age of the car in years
summer, autumn and winter - are quarterly dummies (spring is the reference season)
One of the checks that the student wants to run is to verify whether the seasonal dummies are jointly
significant. What are the hypotheses and what is the statistic that allows answering that question?
a) H0: each of the tested dummies is significantly different from 0; F-test for the overall significance of the tested
model
b) H0: each of the seasonal dummies separately is different from 0; t-test for each coefficient
c) H0: a subset of coefficients in the tested model is different from 0; t-test for a subset of coefficients
d) H0: a subset of coefficients in the tested model is equal to 0; F-test for a subset of coefficients
333. Which of the following will lead to the least squares estimates being biased?
a) Heteroskedasticity
b) Autocorrelated error terms
c) Exclusion of a relevant variable
d) Multicollinearity
334. Which is the correct interpretation of the following model?
log(wage) = 0.584 + 0.083education + 0.02female
Where:
wage is measured in dollars per week
education is measured in years
female is a dummy variable equal to 1 for a woman and 0 for a man
a) Holding gender constant, if education increases by 1% wage will increase by 8.3%
b) Holding education constant, a female earns 0.02% more than a male
c) When comparing males and females with the same education, males are earning 2% less than females
d) For both females and males an extra year of education increases their salary by 0.083%
e) The effect of education on wages is different for both sexes
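In a model with log(wage) on the left and untransformed regressors on the right, a coefficient b corresponds to an exact percentage change of 100 * (exp(b) - 1) per unit of the regressor, which is close to 100 * b when b is small. The sketch below applies this to the two coefficients in the question; the function name is an illustrative assumption.

```python
from math import exp

def percent_effect(coef):
    """Exact percentage change in wage for a one-unit increase in the regressor."""
    return 100 * (exp(coef) - 1)

education_effect = percent_effect(0.083)  # one extra year of education
female_effect = percent_effect(0.02)      # female relative to male

print(f"education: {education_effect:.2f}% per year")
print(f"female vs male: {female_effect:.2f}%")
```

The units matter here: the effects are roughly 8.3% per additional year (not per 1% increase in education) and roughly 2% (not 0.02%) for the gender dummy.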
335. In a multiple regression analysis with 5 independent variables and a sample size 120, the degrees of
freedom for a test of the significance of an individual coefficient are equal to:
a) 115
b) 114
c) 120
d) Impossible to determine
336. Which of the following is true of the error term used in linear regression?
a) It represents the joint influence of all the dependent variables in the regression model
b) It represents the joint influence of factors, other than the dependent and independent variables, on the
regression model
c) It represents the joint influence of all the independent variables in the regression model
d) It represents the combined effect of the dependent, independent, and nonrepresented factors on the
regression model
337. If an analyst is regressing individual independent variables on all other independent variables of a
regression model, he or she is testing for _____.
a) Specification bias
b) Sampling error
c) Heteroscedasticity
d) Multicollinearity
338. Kurt is trying to regress a model that studies market prices between similar products from different
companies. During his study he realized that one of the independent variables, the competitor’s price, was
causing multicollinearity issues with the model. If Kurt removes the variable from the model, which of the
following statistical phenomena could affect Kurt’s model?
a) Specification bias
b) Sampling error
c) Heteroscedasticity
d) Autocorrelation
339. In the following model:
yi= 2 + 2.5x1i - 8x2i
With sb1 = 2 and sb2 = 0.05 where the sample size is 15
a) Coefficient on x1 is significant at 1% level
b) Coefficient on x2 is statistically different from 0 only at 10% level
c) Coefficient on x2 is statistically different from 0 at any conventional significance level
d) None of the above is correct
340. A regression analysis with 5 independent variables and 120 observations has produced the following
results:
Residual sum of squares = 489
Total sum of squares = 700
What are the values of adjusted R squared and standard error of the estimate?
a) 28% and 4.27
b) 72% and 3.25
c) 28% and 4.07
d) 72% and 6.14
341. Total sum of squares:
a) The variation in Y explained by variation in X
b) The variation of observed Y values from the regression line
c) The variation of the Y values around their mean
d) The variation in the slope of regression lines from different possible samples
342. The plot of residuals can be used to test:
a) Multicollinearity
b) Autocorrelation
c) Specification bias
d) Autocorrelation and heteroscedasticity
343. An analyst has set up a model explaining advertising expenditures [y_t] with retail sales [x_t] and previous
year’s advertising [y_(t-1)]. The general form of the model is below. Suppose that retail sales increase by $1 in the
current year. What is the expected impact on advertising in the current year; what is the total effect on all current
and future advertising expenditures?
yt = β0 + β1xt + β2yt-1 + εt
a) β2 in the first period; βj/(1-y)
b) β1*β2+β0 in the first period; β1/(1-β2)
c) β1*β2+β0 in the first period; βj/(1-y)
d) β1 in the first period; β1/(1-β2)
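The long-run total β1/(1-β2) is the sum of a geometric series: a one-time $1 rise in x adds β1 in the current period, β1*β2 in the next, β1*β2^2 after that, and so on. The sketch below sums these terms for hypothetical coefficient values (0.5 and 0.6 are assumptions chosen for illustration, with |β2| < 1) and compares the result with the closed form.

```python
def total_effect(beta1, beta2, periods=200):
    """Sum the effect of a one-time $1 rise in x_t over current and future periods."""
    effect = beta1            # impact in the current period
    total = 0.0
    for _ in range(periods):
        total += effect
        effect *= beta2       # lagged advertising carries beta2 of the effect forward
    return total

beta1, beta2 = 0.5, 0.6       # hypothetical values, not from the question
simulated = total_effect(beta1, beta2)
closed_form = beta1 / (1 - beta2)

print(simulated, closed_form)
```

Both values come out at 1.25 up to floating-point error, with the first-period impact equal to β1.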
344. In order to test the validity of a multiple regression model involving 5 independent variables, an
intercept and 50 observations, the statistic for assessing the significance of an individual coefficient follows:
a) Student’s t-distribution with 45 degrees of freedom
b) Student’s t-distribution with 44 degrees of freedom
c) F-distribution with 44 degrees of freedom
d) Normal distribution with 50 degrees of freedom
345. A real estate broker is interested in identifying the factors that determine the price of a house.
She wants to run the following regression:
Y = β0 + β1X1 + β2X2 + β3X3 + ε
Where Y = price of the house in $1,000s, X1 = number of bedrooms, X2 = square footage of
living space, and X3 = number of miles from the beach. Taking a sample of 30 houses, the
broker runs a multiple regression and gets the following result:
Ŷ = 123.2 + 4.59X1 + 0.125X2 − 7.04X3
With the estimates of the standard error of the slopes equal to:
sb0 = 103.2, sb1 = 2.13, sb2 = 0.062, sb3 = 4.17
i) Determine the price that an individual has to pay for a 3 bedroom, 1,000 square foot house that is
located three miles away from the beach.
a) $201,422
b) $177,243
c) $243,850
d) $240,850
ii) What is the 95% confidence interval for β2?
a) 0.062 +/- 4.17
b) 0.125 +/- 4.38
c) 0.125 +/- 0.127
d) 0.062 +/- 0.127
346. When can the Durbin-Watson statistic not be used?
a) When we are interested in second order autocorrelation.
b) When among the independent variables there is no lag of the dependent variable
c) When there is an intercept in the model
d) When we analyse data that are over time.
347. When might multicollinearity be a problem in a model?
a) When F-statistic for the overall significance is large
b) When Variance Inflation Factor is equal to 3
c) When Variance Inflation Factor is equal to 8
d) When correlation between the independent variable and the dependent variable is equal to 0.899
348. Katrin analyses a multiple regression model with 7 independent variables. What would she conclude if
she fails to reject the null hypothesis stating that all coefficients in the model are equal to 0?
a) A strong relationship exists among the independent variables
b) Some of the independent variables are good predictors of the dependent variable
c) No relationship exists between the dependent variable and the independent variables
d) More information is needed to answer the question
349. Arnar collected various characteristics of 348 houses in his city and used this dataset to set up a
multiple regression model with 4 independent variables. The t-statistic for size of the house (one of the
independent variables in the model) is equal to 0.89. What can he conclude?
a) Size of the house is significant at 10% but is not significant at 1% level (using a 2-tailed test)
b) Size of the house is significant at 10% but is not significant at 1% level (using a 1-tailed test)
c) Size of the house is not significant at any usual alpha levels
d) Not enough data to answer this question
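A rough check of how a t-statistic of 0.89 compares with the usual significance levels. This sketch assumes the standard normal distribution as a large-sample stand-in for the t-distribution, which is reasonable here because the degrees of freedom (348 - 4 - 1 = 343) are large; the function name is illustrative.

```python
from statistics import NormalDist

def two_tailed_p_large_sample(t_stat):
    """Approximate two-tailed p-value using the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))

t_stat = 0.89
df = 348 - 4 - 1   # observations minus slope coefficients minus intercept
p_two_tailed = two_tailed_p_large_sample(t_stat)
p_one_tailed = p_two_tailed / 2

print(df, round(p_two_tailed, 3), round(p_one_tailed, 3))
```

Both p-values can then be compared directly with 1%, 5% and 10%.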
350. Among OLS assumptions there is no assumption about:
a) Serial correlation of residuals
b) Significant coefficients
c) Multicollinearity
d) Normally distributed error term
351. If the Durbin-Watson statistic is equal to 2, then the correlation coefficient between the current error and
the previous period error is equal to:
a) 1
b) -1
c) 0
d) Impossible to tell
Explanation: The Durbin-Watson statistic ranges from 0 to 4, with a value of 2 indicating no first-order
autocorrelation.
Values above 2 signal negative autocorrelation, while values below 2 suggest positive autocorrelation.
Correlation coefficients of 1 and -1 represent perfect positive and negative correlation, respectively.
Since DW ≈ 2(1 − r), where r is the correlation between the current and previous-period errors, DW = 2 implies
r = 0, meaning no linear relationship exists between them.
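The link between the Durbin-Watson statistic and the lag-1 correlation of the errors can be illustrated with a short sketch. The residuals below are made up for illustration, and the function names are hypothetical helpers rather than any library's API.

```python
def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def lag1_autocorrelation(residuals):
    """Simple estimate of rho: sum(e_t * e_(t-1)) / sum(e_t^2)."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Made-up residuals, for illustration only.
residuals = [0.5, -0.3, 0.8, -0.6, 0.1, 0.4, -0.7, 0.2, -0.1, 0.3]
dw = durbin_watson(residuals)
rho = lag1_autocorrelation(residuals)

print(round(dw, 3), round(2 * (1 - rho), 3))
```

The two printed numbers differ only by an end-point term, (e_1² + e_n²) / Σe², which shrinks as the sample grows; this is why DW = 2 corresponds to a correlation of roughly 0.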
352. Arnar runs a model with a monthly interest rate as a dependent variable. His dataset covers 10 years.
How many dummies does he need to include in the model in order to check whether seasonality is present in the data?
a) 10
b) 9
c) 12
d) 11
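A small sketch of how monthly seasonal dummies are usually built when the model keeps an intercept: one month is left out as the base category to avoid perfect collinearity with the intercept. The choice of January as the base month and the helper name are assumptions made for illustration.

```python
def monthly_dummies(month, base_month=1):
    """Return 11 dummies for a month in 1..12, leaving out the base month."""
    return [1 if month == m else 0 for m in range(1, 13) if m != base_month]

# 10 years of monthly observations: months cycle 1..12 ten times.
months = [m for _ in range(10) for m in range(1, 13)]
design_rows = [monthly_dummies(m) for m in months]

print(len(design_rows), len(design_rows[0]))
```

The number of years affects only the number of rows, not the number of dummy columns.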
353. Which of the following tests is used to check the joint statistical significance of all the lagged variables
in the distributed lag model with additional day of the week dummy variables?
a) T-test
b) F-test for overall significance
c) F-test for subsample
d) Dickey-Fuller test
354. The regressions Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε and Y = β0 + β1X1 + β2X2 + ε were run using a
sample of 30 observations. The SSE (Sum of Squared Errors) is 289.4 for the first regression and 382.32 for the
second regression. Test H0: β3 = β4 = 0.
a) Reject H0 at α = 0.01
b) Reject H1 at α = 0.025
c) Reject H0 at α = 0.05
d) Fail to reject H0 at α < 0.10
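The test of H0: β3 = β4 = 0 compares the restricted and unrestricted sums of squared errors. A hedged sketch of that calculation follows; the function name is illustrative, and the p-value uses a closed form that holds only when the numerator has exactly 2 degrees of freedom.

```python
def partial_f(sse_restricted, sse_unrestricted, num_restrictions, n, k_unrestricted):
    """F statistic for H0 that the dropped coefficients are all zero."""
    df_denom = n - k_unrestricted - 1
    numerator = (sse_restricted - sse_unrestricted) / num_restrictions
    denominator = sse_unrestricted / df_denom
    return numerator / denominator, df_denom

f_stat, df_denom = partial_f(382.32, 289.4, num_restrictions=2, n=30, k_unrestricted=4)

# With exactly 2 numerator df, the upper-tail probability of F(2, d) has the
# closed form (1 + 2F/d) ** (-d/2); this shortcut does not apply to other df.
p_value = (1 + 2 * f_stat / df_denom) ** (-df_denom / 2)

print(round(f_stat, 3), round(p_value, 3))
```

Comparing the p-value with 0.01, 0.025, 0.05 and 0.10 shows at which of the listed levels H0 is rejected.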
355. If in a model, all the points on a scatter diagram lie on a straight line, what is the value of the Sum of
Squared Errors?
a) 0
b) Infinity ∞
c) 1
d) Cannot be determined
356. What is the value of coefficient of determination for the following case:
∑_{i=1}^{N} (Yi − Ȳ)² = 25 and ∑_{i=1}^{N} (Yi − Ŷi)² = 5
a) 0.2
b) 0.8
c) 0.5
d) Cannot be calculated - not enough information
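A one-line check of the coefficient of determination, assuming the first sum in question 356 is the total sum of squares (deviations of Yi from its mean, squared) and the second is the sum of squared errors.

```python
def r_squared(sst, sse):
    """Coefficient of determination from total and residual sums of squares."""
    return 1 - sse / sst

sst = 25   # sum of (Yi - Ybar)^2
sse = 5    # sum of (Yi - Yhat_i)^2
r2 = r_squared(sst, sse)

print(round(r2, 3))
```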
357. What is the correct interpretation of the following model?
log(wage_i) = 0.584 + 0.083 education_i + 0.02 female_i + 0.004 (education_i × female_i)
Where:
Wage is measured in years
Education is measured in years
And female is a dummy variable equal to 1 for a woman and 0 for a man.
a) Holding gender constant, if education increases by 1%, wage will increase by 8.3%
b) An extra year of education increases a female's salary by 8.7%
c) Holding education constant, a female earns 0.02% more than male
d) When comparing males and females with the same education, males are earning 2% less than females
e) For both females and males an extra year of education increases their salary by 0.083%
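In a log-level model with an interaction term, the effect of one more year of education depends on the dummy. The sketch below uses the coefficients from question 357; the helper name is illustrative, and the "approximate" figure relies on the usual rule that 100 times a coefficient on a log outcome is read as a percent change.

```python
import math

# Coefficients from the model in the question.
b_educ, b_female, b_interact = 0.083, 0.02, 0.004

def education_effect(female):
    """Change in log(wage) for one more year of education."""
    return b_educ + b_interact * female

effect_male = education_effect(0)
effect_female = education_effect(1)

# Approximate percent change is 100 * coefficient; exact is 100 * (exp(c) - 1).
approx_female_pct = 100 * effect_female
exact_female_pct = 100 * (math.exp(effect_female) - 1)

print(round(approx_female_pct, 1), round(exact_female_pct, 2))
```

The gap between males and females at a given level of education is 0.02 + 0.004 × education in log points, so it is not a single constant either.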
358. Suppose you were to run a regression of leisure travel expenditures by households on household
income. We would expect that households with low incomes do not travel much. High-income households may or
may not travel much, depending on the household’s preference for travel. The results for this regression will be
subject to ___.
a) Multicollinearity
b) Specification bias
c) Autocorrelation
d) Heteroscedasticity
359. In examining the determinants of income, data were collected regarding the characteristics of 45
adults, and the regression Y = β0 + β1X1 + β2X2 + β3X3 + ε was used, where Y is
the annual income (in thousands of dollars), X1 is the person’s age, X2 is his/her years of education, and X3 is a
dummy variable = 1 if the adult is female.
If you get ŷ = 26.3 + 1.38x1 + 2.98x2 − 0.76x3 when you run the regression, how would you interpret the
coefficient on gender?
a) A woman earns 76% of a man’s earnings.
b) For each additional year of education, a woman earns $760 less than a man.
c) For each additional year in the age, a woman earns $760 less than a man.
d) On average, a woman earns $760 less than a man.
e) On average, a woman earns $76 less than a man.
360. A Variance Inflation Factor equal to 4 indicates:
a) That there is serial correlation in the model
b) That the errors are autocorrelated of degree 1
c) That there is a problem with multicollinearity in the model
d) That there is no problem with multicollinearity in the model.
361. Suppose the following scatter plot shows the relationship between X and Y. How might you model Y?
[Scatter plot not shown]
a) With dummy variables
b) With the log normal specification
c) With dummy variables and interactive terms
d) With a correction for heteroskedasticity
362. A time series is a:
a) Set of measurements on a variable collected at the same time or approximately the same period of time.
b) Set of measurements, ordered over time, on a particular quantity of interest
c) Model that attempts to analyze the relationship between a dependent variable and one or more independent
variables.
d) Model that attempts to forecast the future value of a variable.
