AS Multiple Choices
1. A field researcher who studies lions conjectured that the more time a cub spends playing, the sooner the cub
begins to hunt. Observational data were collected from 20 lion cubs. The researcher recorded how long each cub
spent playing and the age at which it began hunting. Because male and female lions have different hunting
behaviors, the researcher recorded the data for males and for females separately. The two scatterplots show the
data for the 10 female lions and 10 male lions. Based on the scatterplots, for which gender does there appear to
be evidence that the more time a lion cub spends playing, the sooner the lion cub is likely to begin hunting?
2. A tennis ball was thrown in the air. The height of the ball from the ground was recorded every millisecond from
the time the ball was thrown until it reached the height from which it was thrown. The correlation between the
time and height was computed to be zero. What does this correlation suggest about the relationship
between time and height?
4. Dairy farmers are aware that there is often a linear relationship between the age, in years, of a dairy cow and the
amount of milk produced, in gallons per week. The least-squares regression line produced from a random
sample is milk = 40.8 - 1.1(age). Based on the model, what is the difference in predicted amounts of milk
produced between a cow of 5 years and a cow of 10 years?
A) A cow of 5 years is predicted to produce 5.5 fewer gallons per week
B) A cow of 5 years is predicted to produce 5.5 more gallons per week
C) A cow of 5 years is predicted to produce 1.1 fewer gallons per week
D) A cow of 5 years is predicted to produce 1.1 more gallons per week
E) A cow of 5 years and a cow of 10 years are both predicted to produce 40.8 gallons per week
Explain:
Predicted amount of milk for a 5-year-old cow = 40.8 - 1.1 * 5 = 35.3 gallons per week
Predicted amount of milk for a 10-year-old cow = 40.8 - 1.1 * 10 = 29.8 gallons per week
We have: 35.3 - 29.8 = 5.5, so the 5-year-old cow is predicted to produce 5.5 more gallons per week => B
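The same arithmetic can be checked with a short Python sketch. The function name `predicted_milk` is hypothetical. It simply evaluates the fitted line milk = 40.8 - 1.1(age) from the question.

```python
def predicted_milk(age_years: float) -> float:
    """Predicted gallons per week from the fitted line milk = 40.8 - 1.1(age)."""
    return 40.8 - 1.1 * age_years

five = predicted_milk(5)    # about 35.3 gallons per week
ten = predicted_milk(10)    # about 29.8 gallons per week
difference = five - ten     # equals 1.1 * (10 - 5) = 5.5

print(f"5-year-old cow:  {five:.1f} gallons/week")
print(f"10-year-old cow: {ten:.1f} gallons/week")
print(f"Difference:      {difference:.1f} gallons/week")
```

Because the slope is -1.1 gallons per year of age, any 5-year difference in age changes the prediction by 5 × 1.1 = 5.5 gallons per week. The younger cow is predicted to produce more.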
5. A scatterplot of students' heights, in inches, versus corresponding arm span lengths, in inches, is shown below. One
of the points in the graph is labeled A. If point A is removed, which of the following statements would be true?
A) The slope of the least squares regression line is unchanged and the correlation coefficient increases.
B) The slope of the least squares regression line is unchanged and the correlation coefficient decreases.
C) The slope of the least squares regression line increases and the correlation coefficient increases.
D) The slope of the least squares regression line increases and the correlation coefficient decreases.
E) The slope of the least squares regression line decreases and the correlation coefficient increases.
6. At a large airport, data were recorded for one month on how many baggage items were unloaded from each
flight upon arrival as well as the time required to deliver all the baggage items on the flight to the baggage claim
area. A scatterplot of the two variables indicated a strong positive linear association between the variables.
Which of the following statements is a correct interpretation of the word “strong” in the description of the
association?
A) A least-squares model predicts that the more baggage items that are unloaded from a flight, the greater the
time required to deliver the items to the baggage claim area.
B) The actual time required to deliver all the items to the baggage claim area based on the number of items
unloaded will be very close to the time predicted by a least-squares model.
C) The time required to deliver an item to the baggage claim area is relatively constant, regardless of the
number of baggage items unloaded from a flight.
D) The variability in the time required to deliver all items to the baggage claim area is about the same for all
flights, regardless of the number of items unloaded from a flight.
E) The time required to unload baggage items from a flight is related to the time required to deliver the items
to the baggage claim area.
7. For a random sample of 20 professional athletes, there is a strong linear relationship between the number of
hours of exercise per week and the resting heart rate. For the athletes in the sample, those who exercise more
hours per week tend to have lower resting heart rates than those who exercise less. Which of the following is a
reasonable value for the correlation between the number of hours athletes exercise per week and their resting
heart rate?
A) 0.71
B) 0.00
C) −0.14
D) −0.87
E) −1.00
Explain: Athletes who exercise more hours per week tend to have lower resting heart rates than those who exercise
less => negative linear relationship. The relationship is also described as strong.
Analyzing the answer choices:
A) 0.71: This is a strong positive correlation, which contradicts the negative relationship described.
B) 0.00: This indicates no linear correlation, which contradicts the strong relationship described.
C) −0.14: This is a weak negative correlation, not compatible with a strong relationship.
D) −0.87: This is a strong negative correlation, consistent with the scenario.
E) −1.00: This would mean every point lies exactly on a line with negative slope. "Strong" does not mean perfect, and real heart-rate data are not perfectly linear.
=> D
8. An exponential relationship exists between the explanatory variable and the response variable in a set of
data. The common logarithm of each value of the response variable is taken, and the least-squares regression
line has the equation log(Y) = 7.3 - 1.5X. Which of the following is closest to the predicted value of the
response variable for X = 4.8?
A) 0.1
B) 0.68
C) 1.105
D) 1.26
E) 14.5
Explain: log(Y) = 7.3 - 1.5 * 4.8 = 7.3 - 7.2 = 0.1, so Y = 10^0.1 ≈ 1.26 => D
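A common trap is stopping at log(Y) = 0.1, which is choice A. Undoing the common logarithm requires raising 10 to that power. Here is a minimal Python check of the back-transformation:

```python
# Fitted model from the question: log10(Y) = 7.3 - 1.5 X
x = 4.8
log_y = 7.3 - 1.5 * x      # 7.3 - 7.2 = 0.1 (up to floating-point rounding)
y_hat = 10 ** log_y        # back-transform the common logarithm

print(round(log_y, 4), round(y_hat, 2))
```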
9. Which of the following is the best description of a positive association between two variables?
A) The values will create a line when graphed on a scatterplot.
B) The values will create a line with positive slope when graphed on a scatterplot.
C) As the value of one of the variables increases, the value of the other variable tends to decrease.
D) As the value of one of the variables increases, the value of the other variable tends to increase.
E) All values of both variables are positive.
10. For a specific species of fish in a pond, a wildlife biologist wants to build a regression equation to predict the
weight of a fish based on its length. The biologist collects a random sample of this species of fish and finds that
the lengths vary from 0.75 to 1.35 inches. The biologist uses the data from the sample to create a simple linear
regression model. Would it be appropriate to use this model to predict the weight of a fish of this species that is
3 inches long?
A) Yes, because 3 inches falls above the maximum value of lengths in the sample.
B) Yes, because the regression equation is based on a random sample.
C) Yes, because the association between length and weight is positive.
D) No, because 3 inches falls above the maximum value of lengths in the sample.
E) No, because there may not be any 3-inch fish of this species in the pond.
11. Which of the following statements about a least-squares regression analysis is true?
1) A point with a large residual is an outlier.
2) A point with high leverage has a y-value that is not consistent with the other y-values in the data set.
3) The removal of an influential point from a data set could change the value of the correlation coefficient.
A) 3 only
B) 2 only
C) 1 and 2 only
D) 1, 2, and 3
E) 1 only
12. A road runner is a desert bird that tends to run instead of fly. While running, the road runner uses its tail to
balance. A sample of 10 road runners was taken, and each bird's total length, in centimeters, and tail length, in
centimeters, were recorded. The output shown in the table is from a least-squares regression to predict tail
length given total length. Suppose a road runner has a total length of 59.0 cm and a tail length of
31.1 cm. Based on the residual, does the regression model overestimate or underestimate the tail length of this
road runner?
A) Underestimate, because the residual is positive.
B) Underestimate, because the residual is negative.
C) Overestimate, because the residual is positive.
D) Overestimate, because the residual is negative.
E) Neither, because the residual is 0.
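The decision rests on the sign of the residual, where residual = actual - predicted. The regression output table is not reproduced here, so the predicted tail length passed in below is a hypothetical placeholder. The helper name `judge_prediction` is also illustrative and does not come from any library.

```python
def judge_prediction(actual: float, predicted: float) -> str:
    """Classify a regression prediction using the residual (actual - predicted)."""
    residual = actual - predicted
    if residual > 0:
        return "underestimate"   # model predicted less than the observed value
    if residual < 0:
        return "overestimate"    # model predicted more than the observed value
    return "neither"

# Hypothetical predicted value; substitute the prediction from the regression output.
print(judge_prediction(actual=31.1, predicted=30.0))
```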
13. The distributions of four variables are shown in the following histograms. Which of the following shapes is not
represented by one of the four distributions?
A) Normal distribution
B) Uniform distribution
C) Exponential distribution
D) Bimodal distribution
Explain: Histogram 1 shows a bimodal distribution, histograms 2 and 3 show exponential distributions, and
histogram 4 shows a normal distribution. No histogram shows a uniform distribution => B
14. A company determines the mean and standard deviation of the number of sick days taken by its employees in
one year. Which of the following is the best description of the standard deviation?
A) Approximately the mean distance between the number of sick days taken by individual employees and the
mean number of sick days taken by all employees.
B) Approximately the median distance between the number of sick days taken by individual employees and the
median number of sick days taken by all employees.
C) The distance between the greatest number of sick days taken by an employee and the mean number of sick
days taken by all employees.
D) The number of days separating the fewest sick days taken and the most sick days taken when considering
all employees.
E) The number of days separating the fewest sick days taken and the most sick days taken when considering
the middle 50 percent of the distribution.
Explain: A is correct. The standard deviation measures roughly how far, on average, individual values fall from the
mean. Here's why the other options are not accurate:
B) This describes a typical distance from the median. The standard deviation is built from distances to the
mean, not the median.
C) This only considers the highest value and the mean, not the overall spread of the data.
D) This describes the range, the difference between the highest and lowest values. The range depends only on the
two extreme values and ignores how the rest of the data are spread around the mean.
E) This describes the interquartile range (IQR), which only focuses on the middle 50% of the data and
doesn't capture the entirety of the distribution like the standard deviation.
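To see why choice A is a reasonable description, the sketch below compares the sample standard deviation with the average absolute distance from the mean. It uses a small, hypothetical set of sick-day counts invented for illustration. The two values are not identical, but they are of similar size.

```python
import statistics

sick_days = [0, 2, 4, 6, 8]            # hypothetical sick-day counts for 5 employees

mean = statistics.mean(sick_days)       # 4
sd = statistics.stdev(sick_days)        # sample standard deviation, sqrt(40 / 4)
mean_abs_dev = sum(abs(x - mean) for x in sick_days) / len(sick_days)

print(f"mean = {mean}, standard deviation = {sd:.2f}, mean absolute deviation = {mean_abs_dev:.2f}")
```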
15. A graph (not shown) of the selling prices of homes in a certain city for the month of April reveals that the
distribution is skewed to the left. Which of the following statements is the most reasonable conclusion about the
selling prices based on the graph?
A) The mean is greater than the median.
B) The median is the average of the first quartile and the third quartile.
C) There are fewer selling prices between the first quartile and the median than there are between the median
and the third quartile.
D) There are more selling prices that are less than the mean than selling prices that are greater than the mean.
E) The value of maximum minus third quartile is less than the value of first quartile minus minimum.
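Explain: A) In a left-skewed distribution, the long tail of low prices pulls the mean below the median, so the mean
is typically smaller, not greater. B) The median equals the average of the quartiles when the distribution is
symmetric, which this one is not. C) About 25% of the prices lie between Q1 and the median and about 25% between
the median and Q3, so the counts are roughly equal. D) Because the mean is below the median, no more than half of
the prices are less than the mean. E) The long left tail makes Q1 - minimum larger than maximum - Q3 => E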
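A quick numerical check with a small, hypothetical left-skewed data set illustrates choices A and E. The prices below are invented and given in thousands of dollars. `statistics.quantiles` uses its default "exclusive" method here, so other quartile conventions can give slightly different cut points.

```python
import statistics

# Hypothetical left-skewed selling prices (thousands of dollars): long tail toward low values
prices = [10, 40, 60, 70, 75, 80, 82, 85, 88, 90]

mean = statistics.mean(prices)
median = statistics.median(prices)
q1, _, q3 = statistics.quantiles(prices, n=4)   # quartile cut points

print(f"mean = {mean}, median = {median}")
print(f"max - Q3 = {max(prices) - q3:.2f}, Q1 - min = {q1 - min(prices):.2f}")
```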
16. A local real estate magazine used the median instead of the mean when it reported the SAT score of the average
student who attends Groveland High School. A graphical display of SAT scores of students who attend Groveland High
School indicated that the data were strongly skewed to the right. Which of the following explains why, in this
situation, the median is a more accurate indicator of the SAT score of the average student than the mean is?
A) The mean is affected by the skewness, whereas the median is not.
B) The median is always the preferred statistic.
C) The mean will be less than the median when the data is strongly skewed to the right.
D) The mean should be used only when data are strongly skewed to the left.
E) The median is equal to one-half of the maximum and minimum SAT scores at Groveland High.
17. A marketing firm obtained random samples of 20 people in five regions of the country to investigate the level of
interest in a new product. People in the sample were asked to rate their level of interest on a scale from 1 to 10,
with 1 being the least amount of interest and 10 being the greatest. The histograms show the results for each
region. Which region's graph displays data for level of interest with the least standard deviation?
A) Region A
B) Region B
C) Region C
D) Region D
E) Region E
18. At a local ice-cream store, 210 people were surveyed on whether they preferred eating ice cream from a cone
or a cup. Of the 210 people surveyed, 70 were adults and 140 were children. Of the responses, 150 indicated the
cone as the preferred method of eating ice cream. For those surveyed, there was no association between age and
preferred method of eating ice cream. Which of the following tables shows the distribution of responses?
A) Table 1
B) Table 2
C) Table 3
D) Table 4
E) Table 5
19. As part of a science experiment, a student recorded 10 measurements of the temperature of a liquid. One of the
measurements was an outlier when compared with the other 9 measurements. Which of the following must be
true about the 9 measurements, excluding the outlier, when compared with the 10 measurements? (Note: An
outlier is any number that is greater than the upper quartile or less than the lower quartile by at least 1.5 times
the interquartile range.)
A) The median of the 9 measurements is less than the median of the 10 measurements.
B) The median of the 9 measurements is greater than the median of the 10 measurements.
C) The maximum of the 9 measurements is less than the maximum of the 10 measurements.
D) The maximum of the 9 measurements is greater than the maximum of the 10 measurements.
E) The standard deviation of the 9 measurements is less than the standard deviation of the 10 measurements.
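Explain: Use the formula for the sample variance and standard deviation:
Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$, and standard deviation: $s = \sqrt{s^2}$
An outlier lies far from the mean, so its squared deviation $(x_i - \bar{x})^2$ is by far the largest term in the sum.
Removing it reduces the spread of the remaining values, so the standard deviation of the 9 measurements is less than
that of the 10 measurements. The statements about the median and maximum cannot be guaranteed, because the outlier
could be either unusually high or unusually low. => E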
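The sketch below shows the same idea numerically. It uses 10 invented temperature readings (hypothetical data), and the last value is an outlier by the 1.5 × IQR rule stated in the question. Quartiles come from `statistics.quantiles` with its default method.

```python
import statistics

readings = [20.1, 20.3, 20.2, 20.4, 20.0, 20.2, 20.3, 20.1, 20.2, 25.0]  # hypothetical

q1, _, q3 = statistics.quantiles(readings, n=4)
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr
outliers = [x for x in readings if x > upper_fence or x < lower_fence]

without_outlier = [x for x in readings if x not in outliers]
sd_10 = statistics.stdev(readings)
sd_9 = statistics.stdev(without_outlier)

print(f"outliers: {outliers}")
print(f"SD with outlier = {sd_10:.3f}, SD without outlier = {sd_9:.3f}")
```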
20. A car rental agency has two locations in a city. The box plots below summarize the miles driven for one day of
single-day car rentals at each location. Based on the box plots, which statement provides the best comparison of
the two locations?
A) The number of single-day rentals is greater for location A than for location B.
B) The number of single-day rentals is less for location A than for location B.
C) Compared with location A, the miles driven for location B display more variability, and the median is greater.
D) Compared with location A, the miles driven for location B display less variability, and the median is greater.
E) Compared with location A, the miles driven for location B display less variability, and the median is about the
same.
21. The histogram shows the distribution of heights, in inches, of 100 adult men. Based on the histogram, which of
the following is closest to the interquartile range, in inches, of the distribution?
A) 2
B) 5
C) 9
D) 12
E) 15
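Explain: Position of the first quartile = (n + 1)/4 = 101/4 = 25.25; the 25.25th height on the histogram is about 66 inches.
Position of the third quartile = 3(n + 1)/4 = 303/4 = 75.75; the 75.75th height on the histogram is about 71 inches.
IQR ≈ 71 - 66 = 5 inches => B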
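The quartile positions here follow the (n + 1)p convention. Below is a short sketch of that arithmetic. The function name `quartile_positions` is hypothetical. Reading the corresponding heights still requires the histogram, which is not reproduced here, so the two heights are copied from the worked answer.

```python
def quartile_positions(n: int) -> tuple[float, float]:
    """Positions of Q1 and Q3 in ordered data using the (n + 1) * p convention."""
    return (n + 1) / 4, 3 * (n + 1) / 4

q1_pos, q3_pos = quartile_positions(100)
print(q1_pos, q3_pos)   # positions of the quartiles among 100 ordered heights

# Heights read from the histogram at those positions (taken from the worked answer)
q1_height, q3_height = 66, 71
print("IQR is about", q3_height - q1_height, "inches")
```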
22. Which of the following statements must be true about the data sets A and B displayed in histograms below?
A) 266.28
B) 779.42
C) 1008.02
D) 1083.38
E) 1311.98
27. The caffeine content of 8-ounce cans of a certain cola drink is approximately normally distributed with mean 33
milligrams (mg). A randomly selected 8-ounce can containing 35 mg of caffeine is 1.2 standard deviations above
the mean. Approximately what percent of 8-ounce cans of the cola have a caffeine content greater than 35 mg?
A) 1%
B) 8%
C) 12%
D) 16%
E) 99%
Explain:
Step 1: Find the standard deviation.
A randomly selected 8-ounce can containing 35 mg of caffeine is 1.2 standard deviations above the mean.
Mean = 33 mg
35 = Mean + 1.2 (standard deviation)
35 = 33 + 1.2 (standard deviation)
35 - 33 = 1.2 (standard deviation)
2 = 1.2 (standard deviation)
Standard deviation = 2/1.2 ≈ 1.67 mg
Step 2: Use the z-score formula.
z = (x - μ)/σ, where x = 35 mg, μ = mean = 33 mg, and σ = standard deviation ≈ 1.67 mg
z = (35 - 33)/1.67 ≈ 1.1976 (exactly 1.2 if the unrounded standard deviation is used)
Probability from the z-table: P(X < 35) ≈ 0.88446
P(X > 35) = 1 - P(X < 35) = 1 - 0.88446 = 0.11554
Converting to a percentage: 0.11554 × 100 ≈ 11.55% ≈ 12% => C
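Python's standard-library `statistics.NormalDist` can replace the table lookup. Using the exact standard deviation 2/1.2 instead of the rounded 1.67 gives z = 1.2 exactly. The resulting probability differs slightly from the table value but still rounds to 12%.

```python
from statistics import NormalDist

mean = 33
sd = (35 - 33) / 1.2            # 35 mg is 1.2 standard deviations above the mean
caffeine = NormalDist(mu=mean, sigma=sd)

p_greater = 1 - caffeine.cdf(35)  # proportion of cans above 35 mg
print(f"P(X > 35) = {p_greater:.4f} (about {p_greater:.0%})")
```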
28. The weight of adult male grizzly bears living in the wild in the continental United States is approximately
normally distributed with a mean of 500 pounds and a standard deviation of 50 pounds. The weight of adult
female grizzly bears is approximately normally distributed with a mean of 300 pounds and a standard deviation
of 40 pounds. Approximately, what would be the weight of a female grizzly bear with the same standardized
score (z-score) as a male grizzly bear with a weight of 530 pounds?
A) 276 pounds
B) 324 pounds
C) 330 pounds
D) 340 pounds
E) 530 pounds
Explain:
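Let the weight of male bears be X, with mean μ_M = 500 pounds and standard deviation σ_M = 50 pounds.
The z-score for X = 530 is $z = \frac{530 - 500}{50} = 0.6$.
Let the weight of female bears be Y, with mean μ_F = 300 pounds and standard deviation σ_F = 40 pounds.
The female bear has the same z-score as the 530-pound male bear (given), so
$0.6 = \frac{y - 300}{40} \quad\Rightarrow\quad y = 300 + 0.6 \times 40 = 324$ pounds => B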
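Matching z-scores across two normal distributions takes two steps: standardize in one distribution, then un-standardize in the other. The minimal sketch below uses illustrative function names.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

def value_from_z(z: float, mean: float, sd: float) -> float:
    """Invert the z-score formula: x = mean + z * sd."""
    return mean + z * sd

z_male = z_score(530, mean=500, sd=50)                 # 0.6
female_weight = value_from_z(z_male, mean=300, sd=40)  # 300 + 0.6 * 40
print(z_male, female_weight)
```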
29. Gina's doctor told her that the standardized score (z-score) for her systolic blood pressure, as compared to the
blood pressure of other women her age, is 1.50. Which of the following is the best interpretation of this
standardized score?
A) Gina's systolic blood pressure is 150.
B) Gina's systolic blood pressure is 1.50 standard deviations above the average systolic blood pressure of
women her age.
C) Gina's systolic blood pressure is 1.50 above the average systolic blood pressure of women her age.
D) Gina's systolic blood pressure is 1.50 times the average systolic blood pressure for women her age.
E) Only 1.5% of women Gina's age have a higher systolic blood pressure than she does.
30. A candy company produces individually wrapped candies. The quality control manager for the company believes
that the weight of the candies is approximately normally distributed with mean 720 milligrams (mg). If the
manager's belief is correct, which of the following intervals of weights will contain the largest proportion of the
candies in the distribution of weights?
A) 740 mg to 780 mg
B) 700 mg to 740 mg
C) 680 mg to 720 mg
D) 660 mg to 700 mg
E) 620 mg to 660 mg
31. Shalise competed in a jigsaw puzzle competition where participants are timed on how long they take to
complete puzzles of various sizes. Shalise completed a small puzzle in 75 minutes and a large jigsaw puzzle in 140
minutes. For all participants, the distribution of completion time for the small puzzle was approximately normal
with mean 60 minutes and standard deviation 15 minutes. The distribution of completion time for the large
puzzle was approximately normal with mean 180 minutes and standard deviation 40 minutes. Approximately
what percent of the participants had finishing times greater than Shalise's for each puzzle?
A) 16% on the small puzzle and 16% on the large puzzle
B) 16% on the small puzzle and 84% on the large puzzle
C) 32% on the small puzzle and 68% on the large puzzle
D) 84% on the small puzzle and 84% on the large puzzle
E) 84% on the small puzzle and 16% on the large puzzle
Explain: According to the empirical rule, approximately 68% of the completion times for the small puzzle are within
1 standard deviation of the mean of 60 minutes, that is, between 45 and 75 minutes. By symmetry, the remaining 32% is
split evenly: about 16% of the completion times are less than 45 minutes and about 16% are greater than 75 minutes.
So about 16% of participants were slower than Shalise on the small puzzle. For the large puzzle, the empirical rule
indicates that approximately 68% of the times are within 1 standard deviation of the mean of 180 minutes, that is,
between 140 and 220 minutes. By symmetry, about 16% of the times are less than 140 minutes and about 16% are greater
than 220 minutes. Therefore about 84% of the times are greater than Shalise's time of 140 minutes on the large
puzzle => B
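The empirical-rule percentages can be cross-checked with the exact normal CDF from `statistics.NormalDist`:

```python
from statistics import NormalDist

small = NormalDist(mu=60, sigma=15)
large = NormalDist(mu=180, sigma=40)

# Proportion of participants slower than Shalise on each puzzle
slower_small = 1 - small.cdf(75)    # 75 is 1 SD above the mean
slower_large = 1 - large.cdf(140)   # 140 is 1 SD below the mean

print(f"small: {slower_small:.1%}, large: {slower_large:.1%}")
```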
32. The distribution of heights of 6-year-old girls is approximately normally distributed with a mean of 46.0 inches
and a standard deviation of 2.7 inches. Aliyaah is 6 years old, and her height is 0.96 standard deviation above the
mean. Her friend Jayne is also 6 years old and is at the 93rd percentile of the height distribution. At what
percentile is Aliyaah's height, and how does her height compare to Jayne's height?
A) Aliyaah's height is at the 17th percentile of the distribution, and she is shorter than Jayne.
B) Aliyaah's height is at the 67th percentile of the distribution, and she is shorter than Jayne.
C) Aliyaah's height is at the 67th percentile of the distribution, and she is taller than Jayne.
D) Aliyaah's height is at the 83rd percentile of the distribution, and she is shorter than Jayne.
E) Aliyaah's height is at the 83rd percentile of the distribution, and she is taller than Jayne.
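You can read the percentile for a z-score of 0.96 from a z-table or compute it with `statistics.NormalDist`. Comparing that percentile with Jayne's 93rd percentile answers the second part of the question.

```python
from statistics import NormalDist

standard_normal = NormalDist()            # mean 0, standard deviation 1
aliyaah_pct = standard_normal.cdf(0.96)   # proportion of girls shorter than Aliyaah
jayne_pct = 0.93                          # Jayne is at the 93rd percentile

print(f"Aliyaah's percentile is about {aliyaah_pct * 100:.0f}")
print("Aliyaah is", "shorter" if aliyaah_pct < jayne_pct else "taller", "than Jayne")
```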
33. Which of the following is the correct order from least to greatest for the values of r, s, and t ?
A) r, s, t
B) r, t, s
C) s, t, r
D) t, r, s
E) t, s, r
34. A botanist found the correlation between the length of an aspen leaf and its surface area to be 0.94. Why does the
correlation value of 0.94 not necessarily indicate that a linear model is the most appropriate model for the
relationship between length of an aspen leaf and its surface area?
A) The value must be exactly 1 or −1 to indicate a linear model is the most appropriate model.
B) The value must be 0 to indicate a linear model is the most appropriate model.
C) A causal relationship should be established first before determining the most appropriate model.
D) The value of 0.94 implies that only 88% of the data have a linear relationship.
E) Even with a correlation value of 0.94, it is possible that the relationship could still be better represented by a
nonlinear model.
35. A family would like to build a linear regression equation to predict the amount of grain harvested per acre of
land on their farm. They subdivide their land into several smaller plots of land for testing and would like to select
an explanatory variable they can control. Which of the following is an appropriate explanatory variable that the
family could use to create a linear regression equation?
A) The total amount of rainfall recorded at their farm.
B) The type of crop planted in the plot the previous year.
C) The average daily temperature at their farm.
D) The variety of grain planted in the plot.
E) The amount of fertilizer applied to each plot of land.
36. A market researcher asked a group of men and women to choose their favorite color design from a sample of
advertisements. The results are shown in the following table. Which of the following statements is not supported
by the table?
A) More men than women chose the color design red with black.
B) More women than men chose the color design yellow with black.
C) For men, the number who chose a design with black was greater than the number who chose a design with
blue.
D) The color design chosen by the most people was green with blue.
E) The total number of men surveyed by the market researcher was equal to the total number of women
surveyed by the market researcher.
37. The following segmented bar chart shows the number of flights that were either on time or delayed at three
different airports on one day. Which of the following statements is supported by the bar chart?
A) Airport T has the greatest percentage of on-time flights compared to the other two airports.
B) Airport R has the least percentage of on-time flights compared to the other two airports.
C) The number of on-time flights at Airport S is half the number of on-time flights at Airport T.
D) The number of on-time flights at Airport R is less than the number of on-time flights at Airport S.
E) The number of flights at Airport T is equal to the total number of flights at Airports R and S combined.
38. A survey of 57 students was conducted to determine whether or not they held jobs outside of school. The two-
way table above shows the number of students by employment status (job, no job), and class (juniors, seniors).
Which of the following best describes the relationship between employment status and class?
A) There appears to be no association, since the same number of juniors and seniors have jobs.
B) There appears to be no association, since close to half of the students have jobs.
C) There appears to be an association, since there are more seniors than juniors in the survey.
D) There appears to be an association, since the proportion of juniors having jobs is much larger than the
proportion of seniors having jobs.
E) A measure of association cannot be determined from these data.
39. In northwest Pennsylvania, a zoologist recorded the ages, in months, of 55 bears and whether each bear was
male or female. The data are shown in the back-to-back stemplot below. Based on the stemplot, which of the
following statements is true?
A) The median age and the range of ages are both greater for female bears than for male bears.
B) The median age and the range of ages are both less for female bears than for male bears.
C) The median age is the same for female bears and male bears, and the range of ages is the same for female
bears and male bears.
D) The median age is less for female bears than for male bears, and the range of ages is greater for female bears
than for male bears.
E) The median age is greater for female bears than for male bears, and the range of ages is less for female bears
than for male bears.
40. A school is having a contest in which students guess the number of candies in a jar. The student whose guess is
closest to the correct number of candies in the jar wins a prize. The number of candies guessed by male and
female students is shown in the back-to-back stemplot below.
Which of the following statements is true about the distributions of guesses?
A) The distribution of guesses for male students is skewed to the left, and the distribution of guesses for female
students is skewed to the right.
B) The distribution of guesses for male students is skewed to the right, and the distribution of guesses for
female students is skewed to the left.
C) The distributions of guesses for male and female students are both skewed to the right.
D) The distributions of guesses for male and female students are both skewed to the left.
E) The distributions of guesses for male and female students are both symmetric.
41. Janelle collected data on the amount of time in minutes each person in a large sample of customers spent in a
local store. The data also included recording the gender of each customer. These data were used to generate the
boxplots shown below. Which of the following statements is true?
A) The range in the amount of time in minutes males in the sample of customers spent in the store is
approximately 40 minutes.
B) The mean amount of time in minutes males in the sample of customers spent in the store is approximately
20 minutes.
C) The third quartile of the amount of time in minutes males in the sample of customers spent in the store is
approximately 45 minutes.
D) The interquartile range of the amount of time in minutes females in the sample of customers spent in the
store is 15 minutes.
E) Approximately half of the males in the sample of customers spent at least as much time in the store as any
female in the sample of customers.
42. The boxplots above summarize two data sets, A and B. Which of the following must be true?
I. Set A contains more data than Set B.
II. The box of Set A contains more data than the box of Set B.
III. The data in Set A have a larger range than the data in Set B.
A) I only
B) III only
C) I and II only
D) II and III only
E) I, II, and III
43. The following bar chart displays the relative frequency of responses of students, by grade level, when asked, “Do
you volunteer in a community-service activity?” Which of the following statements is not supported by the bar
chart?
A) More than 60% of both tenth-grade and eleventh-grade students responded yes.
B) Twelfth-grade students had the least percentage of students respond yes.
C) Less than 40% of tenth-grade students responded no.
D) The number of tenth-grade students who responded yes was greater than the number of ninth-grade
students who responded yes.
E) The percentage of eleventh-grade students who responded no was less than the percentage of ninth-grade
students who responded no.
44. In a standard golf tournament, golfers play 18 holes of golf on each of 4 consecutive days. For each hole, golfers
keep track of the number of times they hit the ball (strokes) before the ball goes into the cup. A golfer’s score for
the tournament is the total number of strokes needed to complete the tournament. The boxplots below
summarize the scores for golfers who competed in tournament 1 and golfers who competed in tournament 2.
Based on the boxplots, which of the following statements must be true?
A) The number of people surveyed at the more than four-year college level is greater than the number of
people surveyed at the high school level.
B) The proportion of people surveyed from the first quartile to the third quartile at the four-year college level is
less than the respective proportion at the community college level.
C) The interquartile range (IQR) for the number of visits at the community college level.
D) The maximum number of visits at the community college level is greater than the maximum number of visits
at the high school level.
E) The median number of visits at the four-year college level is greater than the median number of visits at the
high school level.
46. Nutritionists examined the sodium content of different brands of potato chips. Each brand was classified as
either healthy or regular based on how the chips were marketed to the public. The sodium contents, in
milligrams (mg) per serving, of the chips are summarized in the boxplots below. Based on the boxplots, which
statement gives a correct comparison between the two classifications of the sodium content of the chips?
A) The number of brands classified as healthy is greater than the number of brands classified as regular.
B) The interquartile range (IQR) of the brands classified as healthy is greater than the IQR of the brands
classified as regular.
C) The range of the brands classified as healthy is less than the range of the brands classified as regular.
D) The median of the brands classified as healthy is more than twice the median of the brands classified as
regular.
E) The brand with the least sodium content and the brand with the greatest sodium content are both classified
as healthy.
47. The director of a technical school was curious about whether there is a relationship between students who
complete one of the school's most popular health sciences certificate programs and whether those students go
on to complete more advanced studies in the health sciences within two years of completing the certificate
program. She randomly selected 100 students who completed the program. Data collected on these students are
shown in the table below. Which of the following statements is true for these 100 students?
A) Being a person who completed more advanced studies is more likely than being a person who did not complete
more advanced studies.
B) Being a person who completed the program is less likely than being a person who did not complete the program.
C) Being a person who completed the program and completed more advanced studies is less likely than being a
person who did not complete the program and did not complete more advanced studies.
D) Being a person who did not complete the program but completed more advanced studies is less likely than being
a person who completed the program and completed more advanced studies.
E) Being a person who completed the program but did not complete more advanced studies is more likely than
being a person who did not complete the program and did not complete more advanced studies.
48. The figure above summarizes the heights, in centimeters, of approximately 400 pine seedlings six years after they
were planted at a center for environmental study. Approximately half of the trees were fertilized yearly, and the
remaining trees were never fertilized. Which of the following statements about the medians and interquartile
ranges (IQRs) of the heights of the two groups of trees 6 years after being planted is true?
A) The medians and IQRs are the same for the unfertilized trees and the fertilized trees.
B) The median for the unfertilized trees is greater than the median for the fertilized trees, and the IQR is also
greater for the unfertilized trees.
C) The median for the unfertilized trees is the same as the median for the fertilized trees, and the IQR is greater
for the unfertilized trees.
D) The median for the unfertilized trees is less than the median for the fertilized trees, and the IQR is greater for
the unfertilized trees.
E) The median for the unfertilized trees is less than the median for the fertilized trees, and the IQR is less for
the unfertilized trees.
49. Grain moisture is a characteristic of grain that affects the price paid for the grain. A random sample of 28 loads of
corn was evaluated for moisture as a percent of the total weight. A different random sample of 28 loads of
soybeans was also evaluated for moisture. The data are displayed in the dotplots below. Based on the dotplots,
which of the following is greater for the percent moisture of corn than for the percent moisture of soybeans?
A) $40
B) $21
C) $10
D) $5
E) $3
52. A random sample of 25 households from the Mountainview School District was surveyed. In this survey, data
were collected on the age of the youngest child living in each household. The histogram below displays the data
collected in the survey. In which of the following intervals is the median of these data located?
A) 0.242
B) 0.401
C) 0.438 (=347/793)
D) 0.554
E) 0.605
55. A sample of 942 homeowners is classified, in the two-way frequency table below, by the number of credit cards
they have and the number of years they have owned their current homes. Of the homeowners in the sample
who have four or more credit cards, what proportion have owned their current homes for at least one year?
A) 78/212
B) 78/258
C) 78/942
D) 212/942
E) 258/942
56. As part of a study on the relationship between the use of tanning booths and the occurrence of skin cancer,
researchers reviewed the medical records of 1,436 people. The table below summarizes tanning booth use for
people in the study who did and did not have skin cancer. Of the people in the study who had skin cancer, what
fraction used a tanning booth?
A) 190/265
B) 190/896
C) 190/1,436
D) 265/1,436
E) 896/1,436
57. The following question(s) refer to the following scenario and set of data. In the 1830s, land surveyors began to
survey the land acquired in the Louisiana Purchase. Part of their task was to note the sizes of trees they
encountered in their surveying. The table of data below is for bur oak trees measured during the survey. An
outlier may be defined as a data point that is more than 1.5 times the interquartile range below the lower
quartile or is more than 1.5 times the interquartile range above the upper quartile. According to this definition,
what is the diameter, in inches, of the smallest tree that is an outlier?
A) 4
B) 28
C) 30
D) 34
E) 36
58. Which of the following describes a continuous variable?
A) The number of items sold at a craft booth for one day
B) The number of apps downloaded from a website one day
C) The diameters of the tree trunks at an evergreen farm
D) The number of baskets made by a basketball player
E) The shoe sizes of all shoes on sale at a department store
59. Professor James gave the same test to his three sections of statistics students. On the 35-question test, the
highest score was 32 and the lowest was 15. Based on the information displayed in the boxplots above, which of
the following statements is true?
A) The distribution is uniform, is centered at about 200 seconds, and has a range of at most 250 seconds.
B) The distribution is skewed to the left, is centered at about 125 seconds, and has a range of at most 250
seconds.
C) The distribution is skewed to the right, is centered at about 260 seconds, and has a range of at most 250
seconds.
D) The distribution displays two clusters, has a range of at most 200 seconds, and includes outliers below 75
seconds and above 325 seconds.
E) The distribution displays two clusters, with one cluster centered at about 125 seconds and the other
centered at about 260 seconds, and has a range of at most 250 seconds.
62. A group of students played a game in which they earned points for answering questions correctly. The following
dotplot shows the total number of points earned by each student. Which of the following is the best description
of the distribution of points earned?
A) Approximately normal
B) Bimodal without a gap
C) Bimodal with a gap
D) Skewed to the right without a gap
E) Skewed to the right with a gap
63. The prices, in thousands of dollars, of 304 homes recently sold in a city are summarized in the histogram below.
Based on the histogram, which of the following statements must be true?
A) The minimum price is $250,000.
B) The maximum price is $2,500,000.
C) The median price is not greater than $750,000.
D) The mean price is between $500,000 and $750,000.
E) The upper quartile of the prices is greater than $1,500,000.
64. For a sample of 42 rabbits, the mean weight is 5 pounds and the standard deviation of weights is 3 pounds.
Which of the following is most likely true about the weights for the rabbits in this sample?
A) The distribution of weights is approximately normal because the sample size is 42, and therefore the central
limit theorem applies.
B) The distribution of weights is approximately normal because the standard deviation is less than the mean.
C) The distribution of weights is skewed to the right because the least possible weight is within 2 standard
deviations of the mean.
D) The distribution of weights is skewed to the left because the least possible weight is within 2 standard
deviations of the mean.
E) The distribution of weights has a median that is greater than the mean.
65. The number of hurricanes reaching the East Coast of the United States was recorded for each of the last ten
decades by the National Hurricane Center. Summary measures are shown below.
Min = 12 Max = 24
Lower quartile = 15 Upper quartile = 18
Median = 16 n = 10
Which of the following statements is true?
A) The smallest observation is 12 and it is an outlier. No other observations in the data set could be outliers.
B) The largest observation is 24 and it is an outlier. No other observations in the data set could be outliers.
C) Both 12 and 24 are outliers. It is possible that there are also other outliers.
D) 12 is an outlier and it is possible that there are other outliers at the low end of the data set. There are no
outliers at the high end of the data set.
E) 24 is an outlier and it is possible that there are other outliers at the high end of the data set. There are no
outliers at the low end of the data set.
66. Data will be collected on the following variables. Which variable can be considered discrete?
A) The height of a person
B) The weight of a person
C) The length of a person’s arm span
D) The time it takes for a person to solve a puzzle
E) The number of books a person finished reading last month
67. Data on homes recently sold in a certain town included the area of the home, reported in square feet. The table
below shows summary statistics of the reported areas, in square feet. An auditor determined that an error was
made in the reported areas and that all of the areas should have been 100 square feet greater than what was
reported. The areas were corrected and new summary statistics were reported. What are the interquartile range
(IQR) and the standard deviation of the corrected areas?
A) No. The physics majors’ mean GPA for juniors and seniors must be 3.0, while the chemistry majors’ mean
GPA for juniors and seniors must be 3.3.
B) No. There is not enough information to determine the mean GPA for each major, but it must be higher for
chemistry majors than for physics majors.
C) Yes. It could happen. Whether it does happen depends on the number of juniors and seniors in each major.
D) Yes. It could happen. Whether it does happen depends on the variability of the GPAs within each of the four
groups of students.
E) Yes. It could happen. Whether it does happen depends on the shapes of the distributions of the GPAs for
each of the four groups of students.
69. Each value in a sample has been transformed by multiplying by 3 and then adding 10. If the original sample had a
variance of 4, what is the variance of the transformed sample?
A) 4
B) 12
C) 16
D) 22
E) 36
Explain:
When you apply a linear transformation of the form y = ax + b to a dataset, the variance of the transformed
data (y) is related to the variance of the original data (x) by the following formula:
Var(y) = a^2 * Var(x)
In this case, the transformation is to multiply by 3 and then add 10, which can be expressed as y = 3x + 10.
Therefore, a = 3 in the formula. The added constant b = 10 shifts every value by the same amount, so it does not
change the variance.
We are given that the original variance (Var(x)) is 4. Plugging this into the formula:
Var(y) = 3^2 * 4 = 9 * 4 = 36
Therefore, the variance of the transformed sample is 36 => E
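The rule Var(y) = a^2 * Var(x) can be checked numerically. The sketch below uses Python's standard `statistics` module on a hypothetical two-value sample chosen only because its population variance is 4; it is an illustration, not data from the question. Adding 10 has no effect on the spread, while multiplying by 3 scales the variance by 9.

```python
import statistics

def linear_transform(values, a, b):
    """Apply y = a*x + b to each value."""
    return [a * x + b for x in values]

# Hypothetical sample, chosen so that its population variance is 4.
original = [2, 6]
transformed = linear_transform(original, a=3, b=10)  # multiply by 3, then add 10

var_x = statistics.pvariance(original)
var_y = statistics.pvariance(transformed)
# The ratio var_y / var_x equals a**2 = 9; the +10 does not appear in it.
print(var_x, var_y, var_y / var_x)
```

The same ratio of 9 appears with the sample variance (`statistics.variance`), so the answer does not depend on which definition of variance the question intends.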
70. A graduate student conducted a study of field mice in rural Kansas. The student obtained a sample of 100 field
mice and recorded the weight, in grams, of each mouse. After the measurements were taken, it was discovered
that the scale was not calibrated correctly. The student adjusted the 100 recorded measurements by subtracting
3 grams from each measurement. Which of the following statistics for the weight, in grams, of the field mice has
the same value before and after the adjustment?
A) The median
B) The mean
C) The first quartile
D) The third quartile
E) The interquartile range
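Subtracting the same constant from every measurement moves every measure of center and position by that constant but leaves differences between values unchanged. The sketch below illustrates this with hypothetical weights (not the student's data), using `statistics.quantiles` with its default method for the quartiles.

```python
import statistics

def summary(values):
    """Return a few summary statistics for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }

# Hypothetical recorded weights, in grams (illustrative only).
recorded = [18, 19, 20, 21, 22, 23, 25, 27]
# Correct the calibration error by subtracting 3 g from every measurement.
adjusted = [w - 3 for w in recorded]

before = summary(recorded)
after = summary(adjusted)
for name in before:
    # Mean, median, and quartiles drop by 3; the IQR is unchanged.
    print(f"{name:>6}: {before[name]:6.2f} -> {after[name]:6.2f}")
```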
71. A data set of test scores is being transformed by applying the following rule to each of the raw scores.
Transformed score = 3.5(raw score) + 6.2.
Which of the following is NOT true?
A) The mean transformed score equals 3.5(the mean raw score) + 6.2.
B) The median transformed score equals 3.5(the median raw score) + 6.2.
C) The range of the transformed scores equals 3.5(the range of the raw scores) + 6.2.
D) The standard deviation of the transformed scores equals 3.5(the standard deviation of the raw scores).
E) The IQR of the transformed scores equals 3.5(the IQR of the raw scores).
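Under y = 3.5x + 6.2, measures of center shift and scale, while measures of spread (range, standard deviation, IQR) are only scaled, because the added 6.2 cancels when one value is subtracted from another. A minimal sketch with hypothetical raw scores shows that the range is multiplied by 3.5 with no +6.2 term:

```python
import statistics

def transform(score):
    """The rule from the question: 3.5(raw score) + 6.2."""
    return 3.5 * score + 6.2

def value_range(values):
    return max(values) - min(values)

raw = [62, 70, 75, 81, 88, 94]  # hypothetical raw scores
new = [transform(s) for s in raw]

# Center: mean of transformed scores equals transform(mean of raw scores).
print(statistics.mean(new), transform(statistics.mean(raw)))
# Spread: range and standard deviation are multiplied by 3.5 only.
print(value_range(new), 3.5 * value_range(raw))
print(statistics.stdev(new), 3.5 * statistics.stdev(raw))
# Adding 6.2 to the scaled range does not match the actual range.
print(3.5 * value_range(raw) + 6.2)
```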
72. A local company is interested in supporting environmentally friendly initiatives such as carpooling among
employees. The company surveyed all of the 200 employees at the downtown offices. Employees indicated whether or not
they own a car and where they live. The results are shown in the
table below. Which of the following statements about a randomly chosen person from these 200 employees is
true?
A) If the person owns a car, he or she is more likely to live elsewhere in the city than to live in the downtown
area in the city.
B) If the person does not own a car, he or she is more likely to live outside the city than to live in the city
(downtown area or elsewhere).
C) The person is more likely to own a car if he or she lives in the city (downtown area or elsewhere) than if he
or she lives outside the city.
D) The person is more likely to live in the downtown area in the city than elsewhere in the city.
E) The person is more likely to own a car than not to own a car.
73. One statistic calculated for pitchers in baseball is called the earned run average, or ERA. The following boxplots
summarize the ERA for pitchers in two leagues, A and B. Based on the boxplots, which of the following statistics
is the same for both leagues?
A) The range
B) The interquartile range
C) The median
D) The minimum
E) The maximum
74. Roger claims that the two statistics most likely to change greatly when an outlier is added to a small data set are
the mean and the median. Is Roger’s claim correct?
A) Yes, both the mean and median are likely to change greatly.
B) No, only the mean is likely to change greatly.
C) No, only the median is likely to change greatly.
D) No, neither the mean nor the median is likely to change greatly.
E) There is not enough information to determine if the mean or the median is likely to change greatly.
75. A child psychologist asked 100 five-year-olds and 50 ten-year-olds to name their favorite color. Their results are
shown in the following table. Which of the following statements is supported by the table?
A) The percentage of five-year-olds who selected red or blue as their favorite color is greater than the
percentage of ten-year-olds who selected red or blue as their favorite color.
B) The percentage of five-year-olds who selected yellow as their favorite color is greater than the percentage of
ten-year-olds who selected yellow as their favorite color.
C) The percentage of children who selected red, yellow, or blue as their favorite color was equal for both ages.
D) Less than half of the five-year-olds selected red, yellow, or blue as their favorite color.
E) Less than half of the ten-year-olds selected red, yellow, or blue as their favorite color.
76. The following data were collected from a random sample of people, who identified their favorite type of juice.
The results are shown in the following two-way table. What proportion of the children identified orange as their
favorite type of juice?
A) 400/1,000
B) 400/700
C) 400/2,000
D) 600/1,300
E) 1,000/2,000
77. The following frequency table shows the responses from a group of college students who were asked to choose
their favorite flavor of ice cream. Which of the following statements is not supported by the table?
A) The median of the earthquake disturbances is equal to the median of the mining disturbances.
B) The median of the earthquake disturbances is less than the median of the mining disturbances.
C) The range of the earthquake disturbances is equal to the range of the mining disturbances.
D) The range of the earthquake disturbances is less than the range of the mining disturbances.
E) The mode of the earthquake disturbances is equal to the mode of the mining disturbances.
84. An amusement park attraction has a sign that indicates that a person must be at least 48 inches tall to ride the
attraction. The following boxplot shows the heights of a sample of people who entered the amusement park on
one day. Based on the boxplot, approximately what percent of the people who entered the amusement park met
the height requirement for the attraction?
A) 25%
B) 48%
C) 50%
D) 75%
E) 100%
85. The following histogram summarizes the amount spent on plane tickets to travel home, in dollars, for a group of
30 college students. If the interval size is decreased from $200 to $100, which of the following must remain the
same on the new histogram?
A) The proportion of holes created for drumming is the same for all three siding types.
B) The proportion of holes created for drumming is greatest for grooved plywood.
C) The proportion of holes created for drumming is least for grooved plywood.
D) The number of holes created for drumming is least for grooved plywood.
E) The number of holes created for drumming is greatest for nonwood.
87. Resting heart rates, in beats per minute, were recorded for two samples of people. One sample was from people
in the age-group of 20 years to 30 years, and the other sample was from people in the age-group of 40 years to
50 years. The five-number summaries are shown in the table. The values 60, 62, and 84 were common to both
samples. The three values are identified as outliers with respect to the age-group 20 years to 30 years because
they are either 1.5 times the interquartile range (IQR) greater than the upper quartile or 1.5 times the IQR less
than the lower quartile. Using the same method for identifying outliers, which of the three values are identified
as outliers for the age-group 40 years to 50 years?
A) Of all the students who chose activity B, the greatest number of students were in grade 6.
B) Grade 7 and grade 8 had the same number of students who did not choose activity A.
C) The grade with the greatest percentage of students who chose activity C was grade 8.
D) For students in grade 7, the number who chose activity C was greater than the number who chose activity B.
E) For students in grade 8, the number who chose activity A was greater than the number who chose activity B.
90. An airline recorded the number of on-time arrivals for a sample of 100 flights each day. The boxplot below
summarizes the recorded data for one year. Based on the boxplot, which of the following statements must be
true?
A) The range of the number of on-time arrivals is greater than 90.
B) The interquartile range of the number of on-time arrivals is 22.
C) The number of days that had at least 80 on-time arrivals is greater than the number of days that had at most
76 on-time arrivals.
D) The number of days that had from 76 to 80 on-time arrivals is equal to the number of days that had at most
76 on-time arrivals.
E) The difference between the median and the lower quartile for the number of on-time arrivals is less than 2.
91. The pulse rate for each person in a sample of 20 men and 20 women was recorded. The boxplots below
summarize the pulse rates for the men and the women in the sample. Which of the following statements about
the people in the sample must be true?
A) There are more people between the first and third quartiles for women than there are between the first and
third quartiles for men.
B) The person with the lowest pulse rate is a woman.
C) At least half of the women had higher pulse rates than three-fourths of the men.
D) More than half of the men had lower pulse rates than three-fourths of the women.
E) If a man and a woman were randomly selected from the 40 people, the man would have the lower pulse
rate.
92. Data were collected on the amount, in dollars, that individual customers spent on dinner in an Italian restaurant.
The quartiles for these data are given below. Which of the following statements must be true for these
customers?
A) At least half of the customers spent less than or equal to $44.27 and at least half spent greater than or equal
to $44.27.
B) Seventy-five percent of the customers spent between $36.27 and $58.97.
C) Twenty-five percent of the customers spent less than or equal to $58.97 and the remaining 75 percent spent
greater than or equal to $58.97.
D) The mean amount spent by customers is $44.27.
E) A majority of customers spent $44.27.
93. The back-to-back stem-and-leaf plot below gives the percentage of students who dropped out of school at each
of the 49 high schools in a large metropolitan school district. Which of the following statements is NOT justified
by these data?
A) The drop-out rate decreased in each of the 49 schools between the 1989-90 and 1992-1993 school years.
B) For the school years shown, most students in the 49 schools did not drop out of high school.
C) In general drop-out rates decreased between the 1989-90 and 1992-1993 school years.
D) The median drop-out rate of the 49 high schools decreased between the 1989-90 and 1992-1993 school
years.
E) The spread between the schools with the lowest drop-out rates and those with the highest drop-out rates
did not change much between the 1989-90 and 1992-1993 school years.
94. The following histogram shows the ages, in years, of the people who attended a documentary at a movie theater.
Based on the histogram, which of the following statements best describes the relationship between the mean
and the median of the distribution of ages?
A) The mean and the median are equal in value because the distribution is symmetric.
B) The mean is most likely less than the median because the distribution is skewed to the right.
C) The mean is most likely less than the median because the distribution is skewed to the left.
D) The mean is most likely greater than the median because the distribution is skewed to the right.
E) The mean is most likely greater than the median because the distribution is skewed to the left.
95. Which of the following statistics is defined as the 50th percentile?
A) The mean
B) The median
C) The mode
D) The interquartile range
E) The standard deviation
96. The following list shows the selling prices of 8 houses in a certain town. What is the median selling price of the
houses in the list?
A) $263,200
B) $283,300
C) $288,450
D) $290,600
E) $293,400
97. Heights, in inches, for the 200 graduating seniors from Washington High School are summarized in the frequency
table below. Which of the following statements about the median height is true?
A) It is greater than or equal to 78 inches.
B) It is greater than or equal to 72 inches but less than 78 inches.
C) It is greater than or equal to 66 inches but less than 72 inches.
D) It is greater than or equal to 60 inches but less than 66 inches.
E) It is less than 60 inches.
98. A statistician at a metal manufacturing plant is sampling the thickness of metal plates. If an outlier occurs within
a particular sample, the statistician must check the configuration of the machine. The distribution of metal
thickness has mean 23.5 millimeters (mm) and standard deviation 1.4 mm. Based on the two-standard-deviation
rule for outliers, of the following, which is the greatest thickness that would require the statistician to
check the configuration of the machine?
A) 19.3 mm
B) 20.6 mm
C) 22.1 mm
D) 23.5 mm
E) 24.9 mm
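The two-standard-deviation rule flags any thickness outside mean ± 2 SD. The sketch below assumes "outlier" means strictly more than 2 standard deviations from the mean (the question does not say how to treat a value exactly on the boundary); it computes the bounds and picks the greatest flagged choice.

```python
MEAN = 23.5  # mm, from the question
SD = 1.4     # mm, from the question

lower = MEAN - 2 * SD  # 20.7 mm
upper = MEAN + 2 * SD  # 26.3 mm

def is_outlier(thickness):
    """Assumed rule: more than 2 SD below or above the mean."""
    return thickness < lower or thickness > upper

choices = {"A": 19.3, "B": 20.6, "C": 22.1, "D": 23.5, "E": 24.9}
flagged = {label: t for label, t in choices.items() if is_outlier(t)}
# Among the flagged thicknesses, keep the largest one.
greatest = max(flagged, key=flagged.get)
print(f"bounds: {lower:.1f} mm to {upper:.1f} mm")
print("flagged:", flagged, "greatest:", greatest)
```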
99. For the three histograms above, which of the following correctly orders the histograms from the one with the
smallest proportion of data above its mean to the one with the largest proportion of data above its mean?
A) J, K, L
B) J, L, K
C) K, L, J
D) L, K, J
E) All three histograms
100. The number of siblings was recorded for each student in a group of 80 students. Some summary
statistics and a histogram displaying the results are shown below. An outlier is often defined as a number that is
more than 1.5 times the interquartile range below the first quartile or above the third quartile. Using the
definition of an outlier and the given information, which of the following can be concluded?
A) The median is greater than the mean, and the distribution has no outliers.
B) The median is greater than the mean, and the distribution has only one outlier.
C) The median is greater than the mean, and the distribution has two outliers.
D) The median is less than the mean, and the distribution has only one outlier.
E) The median is less than the mean, and the distribution has two outliers.
101. The following relative frequency table shows the political party affiliation for a sample of 500 people in a
certain town. Which of the following statements is supported by the table?
A) The marginal relative frequencies for the shooter and the goalie are equal.
B) The marginal relative frequencies for the shooter and the goalie are not equal.
C) The row totals are not equal.
D) For the goalie, the relative frequency of a direction is equal to the relative frequency conditioned on the
shooter’s direction.
E) For the goalie, the relative frequency of a direction is not equal to the relative frequency conditioned on the
shooter’s direction.
103. At a photography contest, entries are scored on a scale from 1 to 100. At a recent contest with 1,000
entries, a score of 68 was at the 77th percentile of the distribution of all the scores. Which of the following is the
best description of the 77th percentile of the distribution?
A) There were 770 entries with a score less than or equal to 68.
B) There were at least 230 entries with a score of 77.
C) There were 23% of the entries with a score less than or equal to 68.
D) There were 77% of the entries with a score equal to 68.
E) There were at least 77% of the entries with a score greater than 68.
104. The following table summarizes the number of pies sold at a booth one day at a local farmers market.
Which of the following statements is supported by the table?
A) More cherry pies were sold than any other type of pie.
B) Twice as many apple pies as key lime pies were sold.
C) More than half the pies sold were apple.
D) Fewer than 50 pies were sold at the booth that day.
E) The combined percentage of key lime pies sold and pumpkin pies sold was less than 50%.
105. A sample of 100 students from Liberty High School and a sample of 60 students from Central High School
were asked what they planned to do after graduation. Responses fell into five categories: four-year university
(4Y), community college (CC), join the workforce (W), join the military (M), or undecided (UD). The results are
shown in the following bar chart. Which of the following statements is supported by the bar chart?
A) For the category four-year university, the number of students from Central High School was 10 greater
than the number of students from Liberty High School.
B) At Liberty High School, more students selected a four-year university than any other activity.
C) For the category join the workforce, the number of students from each school was equal.
D) At Central High School, the same number of students selected four-year university and military.
E) For the category undecided, the number of students from Liberty High School was 4 greater than the
number of students from Central High School.
106. A random sample of 1,092 people were asked whether color was a consideration in buying a new car.
They were also asked to identify one additional feature that was important. The responses are shown in the
table. Which of the following is closest to the proportion of people who responded no to color consideration and
who identified safety as the additional feature that was important?
A) 0.18
B) 0.34
C) 0.36
D) 0.49
E) 0.51
107. The following bar chart shows the relative frequency of days of rain for 30 days in four regions of a
certain state. Which of the following statements is not supported by the bar chart?
A) Length
B) Type
C) Speed
D) Height
E) Drop
109. A company wanted to determine the health care costs of its employees. A sample of 25 employees were
interviewed and their medical expenses for the previous year were determined. Later the company discovered
that the highest medical expense in the sample was mistakenly recorded as 10 times the actual amount.
However, after correcting the error, the corrected amount was still greater than or equal to any other medical
expense in the sample. Which of the following sample statistics must have remained the same after the
correction was made?
A) Mean
B) Median
C) Mode
D) Range
E) Variance
110. Which of the following questions about cars in a school parking lot will allow for the collection of a set of
categorical data?
A) How many blue cars are in the lot?
B) What are the gas mileages, in miles per gallon, of the cars in the lot?
C) What are the weights, in pounds, of the cars in the lot?
D) What is the number of cars in the lot with out-of-state license plates?
E) What are the colors of the cars in the lot?
111. A veterinarian collected data on the weights of 1,000 cats and dogs treated at a veterinary clinic. The
weight of each animal was classified as either healthy, underweight, or overweight. The data are summarized in
the table. Based on the data in the table, which of the following is the most appropriate type of graph to visually
show whether a relationship exists between the type of animal and the weight classification?
A) Back-to-back stemplots
B) Scatterplot
C) Side-by-side boxplots
D) Segmented bar chart
E) Dotplot
112. Researchers conducted a telephone survey of 427 adults living in a large city. The adults were asked
whether they planned to purchase a smart watch in the next year. The table shows the responses categorized by
the region of the city in which the residents live. Which of the following graphical displays is most appropriate for
comparing the proportions of those surveyed who plan to purchase a smart watch within the four regions?
A) Skewed to the left (negatively skewed)
B) Skewed to the right (positively skewed)
C) Bimodal
D) Uniform
E) Approximately normal
113. New employees at a large corporation go through a training program during their first week of
employment. The new employees take a written assessment at the completion of the program to determine how
well prepared they are for their jobs. A score greater than the mean indicates a well-prepared employee. Assume
the following distributions of new employee scores have the same mean score, the same maximum score, and
the same minimum score. Which distribution has a shape that is most likely to represent the greatest percent of
well-prepared employees?
A) The distribution of scores is skewed to the right.
B) The distribution of scores is skewed to the left.
C) The distribution of scores is bimodal and symmetric.
D) The distribution of scores is uniform.
E) The distribution of scores is approximately normal.
114. The following table shows data that were collected from a random sample of people, who indicated their
age and their favorite sporting event to watch on television. Based on the results above, what proportion of the
randomly sampled people are over age 12 years?
A) 900/3,500
B) 1,300/3,500
C) 1,200/3,500
D) 2,300/3,500
E) 1,000/3,500
115. Research indicates that the standard deviation of typical human body temperature is 0.4 degree Celsius
(C). Which of the following represents the standard deviation of typical human body temperature in degrees
Fahrenheit (F), where F = (9/5)C + 32?
A) (9/5)(0.4) + 32
B) (9/5)(0.4)
C) (9/5)(0.4)²
D) (9/5)²(0.4)
E) (9/5)²(0.4)²
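Converting Celsius to Fahrenheit is a linear transformation, so the standard deviation is multiplied by 9/5 and the +32 drops out. A minimal check with hypothetical Celsius readings (not research data):

```python
import statistics

def c_to_f(c):
    """F = (9/5)C + 32"""
    return 9 / 5 * c + 32

temps_c = [36.4, 36.8, 37.0, 37.2, 37.6]  # hypothetical temperatures, degrees C
temps_f = [c_to_f(t) for t in temps_c]

# The ratio of standard deviations is 9/5; the +32 has no effect on spread.
ratio = statistics.stdev(temps_f) / statistics.stdev(temps_c)
sd_f = 9 / 5 * 0.4  # SD in degrees F when the SD in degrees C is 0.4
print(ratio, sd_f)
```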
116. The following boxplot summarizes the heights of a sample of 100 trees growing on a tree farm. Emily
claims that a tree height of 43 inches is an outlier for the distribution. Based on the 1.5 × IQR rule for outliers, is
there evidence to support the claim?
A) 300
B) 400
C) 480
D) 500
E) 800
121. Scientists estimate that the distribution of the life span of the Galápagos Islands giant tortoise is
approximately normal with mean 100 years and standard deviation 15 years. Based on the estimate, which of
the following is closest to the age of a Galápagos Islands giant tortoise at the 90th percentile of the distribution?
A) 80 years
B) 115 years
C) 120 years
D) 125 years
E) 130 years
Explain:
We are given that the life span of the Galápagos Islands giant tortoise follows a normal distribution with a
mean of 100 years and a standard deviation of 15 years.
The 90th percentile is the life span that 90% of the tortoises fall below and the remaining 10% exceed.
To find the age at the 90th percentile, we need to calculate the z-score corresponding to the 90th percentile
in the standard normal distribution. You can use a z-score table or calculator to find that the z-score for the
90th percentile is approximately 1.28.
Now, we can use the z-score and the known mean and standard deviation to find the age at the 90th
percentile in the actual distribution:
Age at 90th percentile = Mean + (z-score * Standard deviation)
Age = 100 years + (1.28 * 15 years) = 100 years + 19.2 years
Age ≈ 119.2 years, which is closest to 120 years => C
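The same percentile can be computed directly with `statistics.NormalDist` from Python's standard library, which avoids reading a z-table. This sketch simply restates the model given in the question (mean 100 years, SD 15 years):

```python
from statistics import NormalDist

# Life span model from the question: normal, mean 100 years, SD 15 years.
lifespan = NormalDist(mu=100, sigma=15)

z_90 = NormalDist().inv_cdf(0.90)  # z-score of the 90th percentile (about 1.28)
age_90 = lifespan.inv_cdf(0.90)    # equivalent to 100 + z_90 * 15

choices = [80, 115, 120, 125, 130]
closest = min(choices, key=lambda c: abs(c - age_90))
print(f"z = {z_90:.2f}, age = {age_90:.1f} years, closest choice = {closest}")
```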
122. The distribution of the number of transactions per day at a certain automated teller machine (ATM) is
approximately normal with a mean of 80 transactions and a standard deviation of 10 transactions. Which of the
following represents the parameters of the distribution?
A) x̄ = 80; s = 10
B) x̄ = 80; s² = 10
C) x̄ = 80; σ = 10
D) μ = 80; σ = 10
E) μ = 80; s = 10
123. At a small coffee shop, the distribution of the number of seconds it takes for a cashier to process an
order is approximately normal with mean 276 seconds and standard deviation 38 seconds. Which of the
following is closest to the proportion of orders that are processed in less than 240 seconds?
A) 0.17
B) 0.25
C) 0.36
D) 0.83
E) 0.95
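The proportion in question 123 is the area to the left of 240 seconds under the stated normal model. A minimal sketch with `statistics.NormalDist` standardizes 240 seconds and evaluates the cumulative distribution function:

```python
from statistics import NormalDist

# Processing-time model from the question: normal, mean 276 s, SD 38 s.
processing = NormalDist(mu=276, sigma=38)

z = (240 - 276) / 38     # standardized value of 240 seconds (about -0.95)
p = processing.cdf(240)  # P(time < 240), same as NormalDist().cdf(z)

choices = [0.17, 0.25, 0.36, 0.83, 0.95]
closest = min(choices, key=lambda c: abs(c - p))
print(f"z = {z:.2f}, proportion = {p:.3f}, closest choice = {closest}")
```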
124. A sleep time of 15.9 hours per day for a newborn baby is at the 10th percentile of the distribution of
sleep times for all newborn babies. Assuming the distribution is normal with standard deviation 0.5 hour,
approximately what is the mean sleep time, in hours per day, for newborn babies?
A) 15.1
B) 15.3
C) 16.3
D) 16.5
E) 16.7
Explain:
1. Identify the z-score: Since the given sleep time (15.9 hours) is at the 10th percentile, it corresponds to a z-score
of approximately -1.28 in the standard normal distribution. (A z-score table or calculator can be used to confirm
this).
2. Relate z-score to mean and standard deviation: In a normal distribution, the z-score represents the number of
standard deviations a specific value is away from the mean. Therefore:
z = (X - μ) / σ
where:
z is the z-score (-1.28)
X is the sleep time at the 10th percentile (15.9 hours)
μ is the mean sleep time for all newborn babies (unknown)
σ is the standard deviation (0.5 hours)
3. Solve for the mean: Rearranging the equation to solve for μ:
μ = X - z * σ
μ = 15.9 hours - (-1.28) * 0.5 hours
μ ≈ 16.54 hours ≈ 16.5 hours => D
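The rearranged formula μ = X - z * σ can be sketched in a few lines. This assumes Python's `statistics.NormalDist` for the 10th-percentile z-score instead of the rounded table value:

```python
from statistics import NormalDist

x = 15.9      # sleep time at the 10th percentile (hours)
sigma = 0.5   # standard deviation (hours)

z = NormalDist().inv_cdf(0.10)  # about -1.28
mu = x - z * sigma              # rearranged from z = (x - mu) / sigma
print(round(mu, 2))             # 16.54, closest to choice D (16.5)
```

As a sanity check, 15.9 hours should then sit at the 10th percentile of a normal distribution with this mean and SD 0.5.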
125. For a recent season in college football, the total number of rushing yards for that season is recorded for
each running back. The mean number of rushing yards for the running backs that season is 790 yards. One
running back had 1,637 rushing yards for the season, which is 2.42 standard deviations above the mean number
of rushing yards. What is the standard deviation of the number of rushing yards for the running backs that
season?
A) 250 yards
B) 300 yards
C) 350 yards
D) 400 yards
E) 450 yards
Explain:
1. Identify the z-score: The running back with 1,637 yards is 2.42 standard deviations above the mean. This means
their rushing yards have a z-score of 2.42.
2. Relate z-score to mean and standard deviation: We know the z-score formula:
z = (X - μ) / σ
where:
z is the z-score (2.42)
X is the number of rushing yards for that running back (1,637 yards)
μ is the mean number of rushing yards (790 yards)
σ is the standard deviation (unknown)
3. Solve for the standard deviation: We can rearrange the formula to solve for σ:
σ = (X - μ) / z
σ = (1,637 yards - 790 yards) / 2.42
σ = 847 yards / 2.42 = 350 yards => C
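The same rearrangement, σ = (X - μ) / z, as a short sketch with the values from the question:

```python
x, mu, z = 1637, 790, 2.42   # yards, mean yards, z-score from the question
sigma = (x - mu) / z         # rearranged from z = (x - mu) / sigma
print(round(sigma))          # 350
```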
126. Students in a large psychology class measured the time, in seconds, it took each of them to perform a
certain task. The times were later converted to minutes. If a student had a standardized score of z = 1.72 before
the conversion, what is the standardized score for the student after the conversion?
A) z = 0.26
B) z = 1.03
C) z = 1.72
D) z = 1.98
E) The standardized score for the student after the conversion cannot be determined.
127. Height, in meters, is measured for each person in a sample. After the data are collected, all the height
measurements are converted from meters to centimeters by multiplying each measurement by 100. Which of
the following statistics will remain the same for both units of measure?
A) The mean of the height measurements
B) The median of the height measurements
C) The standard deviation of the height measurements
D) The maximum of the height measurements
E) The z-scores of the height measurements
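Questions 126 and 127 rest on the same fact: converting units multiplies every value, the mean, and the standard deviation by the same positive constant, so each z-score stays the same (126: C; 127: E). A minimal sketch with made-up task times; the data values are hypothetical and chosen only for illustration:

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize each value: (value - mean) / standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical task times in seconds (illustrative data, not from the question).
seconds = [42.0, 55.0, 61.0, 48.0, 90.0, 73.0]
minutes = [t / 60 for t in seconds]

z_sec = z_scores(seconds)
z_min = z_scores(minutes)

# The mean and SD change with the units, but each z-score does not.
for a, b in zip(z_sec, z_min):
    print(f"{a:.4f}  {b:.4f}")
```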
128. A distribution of test scores is not symmetric. Which of the following is the best estimate of the z-score
of the third quartile?
A) 0.67
B) 0.75
C) 1.00
D) 1.41
E) This z-score cannot be estimated from the information given.
129. A certain type of remote-control car has a fully charged battery at the time of purchase. The distribution
of running times of cars of this type, before they require recharging of the battery for the first time after its
period of initial use, is approximately normal with a mean of 80 minutes and a standard deviation of 2.5 minutes.
The shaded area in the figure below represents which of the following probabilities?
A) The probability that the running time of a randomly selected car of this type, before it requires recharging of
the battery for the first time after its period of initial use, is between 75 minutes and 82.5 minutes.
B) The probability that the running time of a randomly selected car of this type, before it requires recharging of
the battery for the first time after its period of initial use, is between 75 minutes and 85 minutes.
C) The probability that the running time of a randomly selected car of this type, before it requires recharging of
the battery for the first time after its period of initial use, is between 77.5 minutes and 82.5 minutes.
D) The probability that the running time of a randomly selected car of this type, before it requires recharging of
the battery for the first time after its period of initial use, is between 77.5 minutes and 85 minutes.
E) The probability that the running time of a randomly selected car of this type, before it requires recharging of
the battery for the first time after its period of initial use, is between 77.5 minutes and 87.5 minutes.
130. The height of 3-year-old boys is approximately normally distributed. Duncan and Shane are 3-year-old
boys. Duncan is 32.0 inches tall and is at the 32nd percentile of the distribution. Shane is 34.0 inches tall and is at
the 62nd percentile of the distribution. Which of the following is closest to the mean of the height distribution?
A) 32.50 inches
B) 32.79 inches
C) 33.00 inches
D) 33.21 inches
E) 36.53 inches
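Two known percentiles give two equations, x = μ + zσ, in the two unknowns μ and σ. A minimal sketch that solves them with Python's `statistics.NormalDist` for the z-scores:

```python
from statistics import NormalDist

std = NormalDist()
z_duncan = std.inv_cdf(0.32)  # about -0.47
z_shane = std.inv_cdf(0.62)   # about 0.31

# Two equations: 32.0 = mu + z_duncan * sigma and 34.0 = mu + z_shane * sigma.
sigma = (34.0 - 32.0) / (z_shane - z_duncan)
mu = 32.0 - z_duncan * sigma
print(round(mu, 2), round(sigma, 2))
```

The mean comes out at about 33.21 inches, which matches choice D.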
131. The distribution of monthly rent for one-bedroom apartments in a city is approximately normal with
mean $936 and standard deviation $61. A graduate student is looking for a one-bedroom apartment and wants
to pay no more than $800 in monthly rent. Of the following, which is the best estimate of the percent of
one-bedroom apartments in the city with a monthly rent of at most $800?
A) 1.3%
B) 2.5%
C) 50%
D) 95%
E) 97.5%
Explain:
1. Calculate the z-score of the desired rent:
z = (Target rent - Mean) / Standard deviation
z = ($800 - $936) / $61 ≈ -2.23
2. Find the area below the z-score in a standard normal distribution:
Using a standard normal table or calculator, look up the cumulative area (probability) below a z-score of
-2.23. This represents the proportion of apartments with rent at or below $800.
3. Interpret the result:
The table or calculator gives a value of about 0.0129, which means about 1.3% of the
apartments have rent at or below $800.
Therefore, the best estimate for the percentage of apartments with rent at most $800 is:
A) 1.3%
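The three steps above can be checked with a short sketch using Python's `statistics.NormalDist` (the standard library's normal CDF stands in for the table lookup):

```python
from statistics import NormalDist

rent = NormalDist(mu=936, sigma=61)  # monthly rent model from the question
z = (800 - 936) / 61                 # step 1: about -2.23
p = rent.cdf(800)                    # step 2: proportion with rent at most $800
print(f"z = {z:.2f}, proportion = {p:.4f}")
```

The proportion is about 0.013, that is, roughly 1.3% of apartments.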
132. Based on findings from a recent study on women's health, researchers created a 90 percent confidence
interval of (0.42, 0.48) to estimate the percent of all women who do not find time to focus on their own health.
Based on the confidence interval, which of the following claims is not supported?
A) Less than half of all women do not find time to focus on their own health.
B) More than 40 percent of all women do not find time to focus on their own health.
C) Approximately 45 percent of all women do not find time to focus on their own health.
D) More than 45 percent of all women do not find time to focus on their own health.
E) More than 25 percent of all women do not find time to focus on their own health.
Explain: The interval gives a range of plausible values for the true percentage at a 90% level of
confidence, based on the sample data. A claim is supported only if it holds for every value in the interval (0.42, 0.48).
A) Every value in the interval is below 0.50, so this claim is supported.
B) The lower bound, 0.42, is above 0.40, so this claim is supported.
C) 0.45 is the center of the interval, so this claim is supported.
D) The interval contains plausible values from 0.42 to 0.45, which are not more than 45%, so this claim is not
supported.
E) The lower bound, 0.42, is well above 0.25, so this claim is supported.
Similarly, a claim that "between 42% and 48% of all women do not find time to focus on their own health" is
consistent with the interval, while any claim of less than 42% is not. Keep in mind that the interval does not give
the exact population percentage; it gives a range of plausible values for it based on the sample data.
=> D
133. A recent survey estimated that 19 percent of all people living in a certain region regularly use sunscreen
when going outdoors. The margin of error for the estimate was 1 percentage point. Based on the estimate and
the margin of error, which of the following is an appropriate conclusion?
A) Approximately 1% of all the people living in the region were surveyed.
B) Between 18% and 20% of all the people living in the region were surveyed.
C) All possible samples of the same size will result in between 18% and 20% of those surveyed indicating they
regularly use sunscreen.
D) The probability is 0.01 that a person living in the region will use sunscreen when going outdoors.
E) It is plausible that the percent of all people living in the region who regularly use sunscreen is 18.5%.
134. In a large school district, 16 of 85 randomly selected high school seniors play a varsity sport. In the same
district, 19 of 67 randomly selected high school juniors play a varsity sport. A 95 percent confidence interval for
the difference between the proportion of high school seniors who play a varsity sport in the school district and
high school juniors who play a varsity sport in the school district is to be calculated. What is the standard error of
the difference?
A) 0.0347
B) 0.0695
C) 0.1362
D) 0.9800
E) 1.6900
Explain:
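A minimal Python sketch of the standard-error calculation for this question, assuming the unpooled formula that is standard for a confidence interval for a difference in proportions (pooling is normally reserved for a significance test of equal proportions). The function name `se_diff` is illustrative:

```python
from math import sqrt

def se_diff(x1, n1, x2, n2):
    """Unpooled standard error of p1_hat - p2_hat, as used for a confidence interval."""
    p1, p2 = x1 / n1, x2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

se = se_diff(16, 85, 19, 67)  # seniors: 16 of 85; juniors: 19 of 67
print(round(se, 4))           # 0.0695
```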
To calculate the standard error of the difference between the proportions of high school seniors and juniors
who play a varsity sport, we can use the following formula:
SE = √(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂)
where p̂₁ = 16/85 ≈ 0.188 (seniors, n₁ = 85) and p̂₂ = 19/67 ≈ 0.284 (juniors, n₂ = 67).
SE = √((0.188)(0.812)/85 + (0.284)(0.716)/67) = √(0.00180 + 0.00303) ≈ 0.0695 => B
135. Environmentalists want to estimate the percent of trees in a large forest that are infested with a certain
beetle. The environmentalists will select a random sample of trees to inspect. Which of the following is the most
appropriate method for creating such an estimate?
A) A two-sample z-interval for a population proportion
B) A one-sample interval for a sample proportion
C) A one-sample z-interval for a population proportion
D) A two-sample z-interval for a difference between sample proportions
E) A two-sample z-interval for a difference between population proportions
136. A random sample of residents in city J were surveyed about whether they supported raising taxes to
increase bus service for the city. From the results, a 95 percent confidence interval was constructed to estimate
the proportion of people in the city who support the increase. The interval was (0.46, 0.52). Based on the
confidence interval, which of the following claims is supported?
A) More than 90 percent of the residents support the increase.
B) Fewer than 10 percent of the residents support the increase.
C) More than 40 percent of the residents support the increase.
D) More than 60 percent of the residents support the increase.
E) Fewer than 25 percent of the residents support the increase.
Explain:
Claims supported:
C) More than 40 percent of the residents support the increase: The lower bound of the confidence interval is
0.46, which is already above 40%, so this claim is supported.
Claims not supported:
A) More than 90 percent of the residents support the increase: The upper bound of the confidence interval is
0.52, which is far below 90%, so this claim is not supported.
B) Fewer than 10 percent of the residents support the increase: The lower bound of the confidence interval is
0.46, well above 10%, so this claim is not supported either.
D) More than 60 percent of the residents support the increase: The upper bound of the confidence interval is
0.52, which is below 60%, so this claim is not supported.
E) Fewer than 25 percent of the residents support the increase: Similar to option B, the lower bound of the
confidence interval (0.46) is already above 25%, making this claim unsupported.
Therefore, the only claim definitively supported by the confidence interval is C) More than 40 percent of the
residents support the increase.
Remember, a confidence interval provides a range for the estimated proportion with a certain level of
certainty (95% in this case). It doesn't tell us the exact value, but it helps us rule out some values as
statistically unlikely based on the sample data.
137. A local arts council has 200 members. The council president wanted to estimate the percent of its
members who have had experience in writing grants. The president randomly selected 30 members and surveyed
the selected members on their grant-writing experience. Of the 30 selected members, 12 indicated that they did
have the experience. Have the conditions for inference with a one-sample z-interval been met?
A) Yes, all conditions for inference have been met.
B) No, because the sample size is not large enough to satisfy the conditions for normality.
C) No, because the sample was not selected at random.
D) No, because the sample size is not less than 10 percent of the population size.
E) No, because the sample is not representative of the population.
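Explain: The large-counts condition holds (12 successes and 30 - 12 = 18 failures, both at least 10), but check the
10% condition by calculating the ratio of the sample size to the total number of council members:
n/N = 30/200 = 0.15 > 0.10
The sample is more than 10 percent of the population, so the conditions are not met => D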
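The condition checks above can be written out as a small sketch. The function name `check_one_prop_conditions` is hypothetical, and the random-sampling condition is passed in as a flag because it comes from the study design rather than the numbers:

```python
def check_one_prop_conditions(successes, n, population_size, random_sample=True):
    """Return a list of failed conditions for a one-sample z-interval for a proportion."""
    failures = []
    if not random_sample:
        failures.append("random")
    # Large counts: at least 10 successes and at least 10 failures.
    if successes < 10 or n - successes < 10:
        failures.append("large counts")
    # 10% condition: the sample is at most 10% of the population (sampling without replacement).
    if n > 0.10 * population_size:
        failures.append("10% condition")
    return failures

result = check_one_prop_conditions(successes=12, n=30, population_size=200)
print(result)  # ['10% condition']
```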
138. In 2009 a survey of Internet usage found that 79 percent of adults age 18 years and older in the United
States use the Internet. A broadband company believes that the percent is greater now than it was in 2009 and
will conduct a survey. The company plans to construct a 98 percent confidence interval to estimate the current
percent and wants the margin of error to be no more than 2.5 percentage points. Assuming that at least 79
percent of adults use the Internet, which of the following should be used to find the sample size (n) needed?
A) 1.96 √(0.5/n) ≤ 0.025
B) 1.96 √((0.5)(0.5)/n) ≤ 0.025
C) 2.33 √((0.5)(0.5)/n) ≤ 0.05
D) 2.33 √((0.79)(0.21)/n) ≤ 0.025
E) 2.33 √((0.79)(0.21)/n) ≤ 0.05
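Explain: "No more than 2.5 percentage points" means the margin of error must satisfy ≤ ε with ε = 0.025.
Formula: z*√(p(1 − p)/n) ≤ ε
For 98 percent confidence, z* ≈ 2.33. Because p(1 − p) shrinks as p moves away from 0.5, assuming that p is at
least 0.79 means using p = 0.79 gives a large enough sample size => D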
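Solving the inequality for n gives n ≥ z*² p(1 − p) / ε². A minimal sketch; the helper name `min_sample_size` is illustrative, and z* = 2.33 is the rounded table value used in the answer choices:

```python
from math import ceil

def min_sample_size(z_star, p_guess, margin):
    """Smallest n with z_star * sqrt(p_guess * (1 - p_guess) / n) <= margin."""
    n = z_star ** 2 * p_guess * (1 - p_guess) / margin ** 2
    # Subtract a tiny tolerance so floating-point noise cannot push an exact integer up by one.
    return ceil(n - 1e-9)

# 98% confidence (z* about 2.33), p guessed as 0.79, margin of 2.5 percentage points.
n_needed = min_sample_size(2.33, 0.79, 0.025)
print(n_needed)  # 1442
```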
139. Consider a 90 percent confidence interval to estimate a population proportion that is constructed from a
sample proportion of 66 percent. If the width of the interval is 10 percent, what is the margin of error?
A) 2.5 percent
B) 10 percent
C) 20 percent
D) 45 percent
E) 5 percent
Explain: The width of a confidence interval is twice its margin of error, so the margin of error is
10% / 2 = 5% => E
140. A commercial for a breakfast cereal is shown during a certain television program. The manufacturer of
the cereal wants to estimate the percent of television viewers who watch the program. The manufacturer wants
the estimate to have a margin of error of at most 0.02 at a level of 95 percent confidence. Of the following, which
is the smallest sample size that will satisfy the manufacturer's requirements?
A) 40
B) 50
C) 100
D) 1,700
E) 2,500
Explain:
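Margin of error ε = 0.02
At a level of 95 percent confidence => z* = 1.96
Sample size n = z*² p(1 − p) / ε²
p is the population proportion (taken as 0.5 when no prior estimate is available, because p(1 − p) is largest at p = 0.5)
n = 1.96² (0.5)(0.5) / 0.02² = 2,401, so the smallest listed sample size that works is 2,500 => E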
141. Paul will select a random sample of students to create a 95 percent confidence interval to estimate the
proportion of students at his college who have a tattoo. Of the following, which is the smallest sample size that
will result in a margin of error of no more than 5 percentage points?
A) 73
B) 97
C) 271
D) 385
E) 1,537
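Explain: Use the same formula as the previous question with z* = 1.96, p = 0.5, and ε = 0.05:
n = 1.96² (0.5)(0.5) / 0.05² = 384.16, so round up to 385 => D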
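For questions 140 and 141, another way to reach the answer is to test each listed sample size and keep the smallest one whose margin of error is small enough. A minimal sketch, assuming the conservative guess p = 0.5; the helper names are illustrative. The same check also fits question 146 below (z* = 1.645 for 90% confidence):

```python
from math import sqrt

def margin_of_error(z_star, p, n):
    return z_star * sqrt(p * (1 - p) / n)

def smallest_option(options, z_star, margin, p=0.5):
    """Smallest listed sample size whose margin of error is at most `margin`."""
    return min(n for n in options if margin_of_error(z_star, p, n) <= margin)

q140 = smallest_option([40, 50, 100, 1700, 2500], z_star=1.96, margin=0.02)
q141 = smallest_option([73, 97, 271, 385, 1537], z_star=1.96, margin=0.05)
q146 = smallest_option([136, 271, 385, 542, 769], z_star=1.645, margin=0.05)
print(q140, q141, q146)  # 2500 385 271
```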
142. Consider a 90 percent confidence interval for a population proportion p. Which of the following is a
correct interpretation of the confidence level 90 percent?
A) The probability that the true difference in population proportions falls within the bounds of the confidence
interval is 0.90.
B) For repeated random sampling from the populations with samples of the same size, approximately 90% of
the sample proportions will fall within the bounds of the confidence interval.
C) If the sampling process is repeated 10 times, 9 intervals will capture the true difference between the
population proportions and 1 interval will not.
D) For repeated random sampling from the populations with samples of the same size, approximately 90% of
the confidence intervals constructed will capture the true difference between the population proportions.
E) For repeated random sampling from the populations with samples of the same size, approximately 90% of
the confidence intervals constructed will capture the sample difference between the population proportions.
Explain:
The confidence level describes the method, not a single interval: if we were to take many random samples of
the same size from the same population and compute a 90% confidence interval from each sample, we would
expect about 90% of these intervals to contain the true population value.
Option A is incorrect because it implies that there is a probability that the population value falls within one
particular interval, which is a common misconception. The population value is fixed and is either in the
interval or not. The interval is a range of plausible values for the population value based on the sample
data and the level of confidence chosen.
Option B is incorrect because it refers to the sample proportions, not the population proportions.
Option C is incorrect because it claims that exactly 9 of 10 intervals will capture the true value. We can only
say that about 90% of the intervals will capture it if we repeat the sampling and interval construction process
many times.
Option E is incorrect because every interval is centered on its own sample difference, so it always captures it.
Hence, the correct interpretation of the 90% confidence level is option D: "For repeated random sampling from
the populations with samples of the same size, approximately 90% of the confidence intervals constructed will
capture the true difference between the population proportions."
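The "repeated sampling" meaning of the confidence level can be seen in a small simulation. This sketch uses a single proportion for simplicity; the true proportion (0.3), sample size (500), number of repetitions, and seed are arbitrary illustration values, not from the question:

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage_rate(p_true, n, conf=0.90, reps=2000, seed=1):
    """Fraction of simulated z-intervals for a proportion that capture p_true."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(0.5 + conf / 2)  # about 1.645 for 90%
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        moe = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - moe <= p_true <= p_hat + moe:
            hits += 1
    return hits / reps

rate = coverage_rate(p_true=0.3, n=500)
print(rate)  # close to 0.90, but not exactly 0.90
```

Each individual interval either captures 0.3 or it does not; only the long-run fraction is about 90%.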
143. USA Today reported that speed skater Bonnie Blair had "won the USA's heart," according to a USA
Today/CNN/Gallup poll conducted on the final Thursday of the 1994 Winter Olympics. When asked who was the
hero of the Olympics, 65 percent of the respondents chose Blair, who won five gold medals. The poll of 615
adults, done by telephone, had a margin of error of 4 percent. Which of the following statements best describes
what is meant by the 4 percent margin of error?
A) About 4 percent of adults were expected to change their minds between the time of the poll and its
publication in USA Today.
B) About 4 percent of the adults sampled are not representative of the population.
C) About 4 percent of the 615 adults polled refused to answer the question.
D) The difference between the sample percentage and the population percentage is likely to be less than 4
percent.
144. A polling agency reported that 66 percent of adults living in the United States were satisfied with their
health care plans. The estimate was taken from a random sample of 1,542 adults living in the United States, and
the 95 percent confidence interval for the population proportion was calculated as (0.636, 0.684). Which of the
following statements is a correct interpretation of the 95 percent confidence level?
A) The probability is 0.95 that the percent of adults living in the United States who are satisfied with their
health care plans is between 63.6% and 68.4%.
B) Approximately 95% of random samples of the same size from the population will result in a confidence
interval that includes the proportion of all adults living in the United States who are satisfied with their
health care plans.
C) Approximately 95% of random samples of the same size from the population will result in a confidence
interval that includes the proportion of all adults in the sample who are satisfied with their health care plans.
D) Approximately 95% of all random samples of adults living in the United States will indicate that between
63.6% and 68.4% of the adults are satisfied with their health care plans.
E) Approximately 95% of all random samples of adults living in the United States will result in a sample
proportion of 0.66 adults living in the United States who are satisfied with their health care plans.
145. A 90 percent confidence interval is to be created to estimate the proportion of television viewers in a
certain area who favor moving the broadcast of the late weeknight news to an hour earlier than it is currently.
Initially, the confidence interval will be created using a simple random sample of 9,000 viewers in the area.
Assuming that the sample proportion does not change, what would be the relationship between the width of the
original confidence interval and the width of a second 90 percent confidence interval that is created based on a
sample of only 1,000 viewers in the area?
A) The second confidence interval would be 9 times as wide as the original confidence interval.
B) The second confidence interval would be 3 times as wide as the original confidence interval.
C) The width of the second confidence interval would be equal to the width of the original confidence interval.
D) The second confidence interval would be times as wide as the original confidence interval.
E) The second confidence interval would be times as wide as the original confidence interval.
146. A marketing company wants to estimate the proportion of consumers in a certain region of the country who
would react favorably to a new marketing campaign. Further, the company wants the estimate to have a margin
of error of no more than 5 percent with 90 percent confidence. Of the following, which is closest to the
minimum number of consumers needed to obtain the estimate with the desired precision?
A) 136
B) 271
C) 385
D) 542
E) 769
147. An environmental group wanted to estimate the proportion of fresh produce sales identified as organic in a
local grocery store. In the winter, the group obtained a random sample of sales from the store and used the data
to construct the 95 percent z-interval for a proportion (0.087, 0.133). Six months later in the summer, the group
obtained a second random sample of sales from the store. The second sample was the same size as the first, and
the proportion of sales identified as organic was 0.4. How does the 95 percent z-interval for a proportion
constructed from the summer sample compare to the winter interval?
A) The summer interval is wider and has a lesser point estimate.
B) The summer interval is wider and has a greater point estimate.
C) The summer interval is narrower and has a lesser point estimate.
D) The summer interval is narrower and has a greater point estimate.
E) The summer interval is the same width and has a greater point estimate.
Explain:
The winter point estimate is the midpoint of the winter interval, (0.087 + 0.133)/2 = 0.11. The summer point
estimate, 0.4, is greater, so the summer interval is shifted to the right on the number line.
The width also changes, even though the confidence level and sample size are the same, because the standard
error √(p̂(1 − p̂)/n) depends on the observed proportion p̂. In summer p̂(1 − p̂) = 0.4 × 0.6 = 0.24, while in winter
it is 0.11 × 0.89 ≈ 0.098. The summer standard error is therefore larger, and the summer interval is wider.
Bear in mind that a confidence interval is a range of values between which an unknown population parameter is
likely to be located. A 95 percent z-interval can be interpreted this way: if we took repeated samples and
computed the interval each time, about 95 percent of the intervals would contain the true proportion.
The critical value z* = 1.96 comes from the standard normal distribution; the empirical rule (68-95-99.7 rule)
gives the rough approximation z ≈ 2 for 95 percent.
=> B
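A numeric sketch of the comparison. The sample size is not given in the question, so this assumes it can be backed out approximately from the winter interval's margin of error; everything else comes from the question:

```python
from math import sqrt

z_star = 1.96
lo, hi = 0.087, 0.133

# Winter: the point estimate is the midpoint; the margin of error is half the width.
p_winter = (lo + hi) / 2       # 0.11
moe_winter = (hi - lo) / 2     # 0.023
# Back out the (unreported) sample size from moe = z* sqrt(p(1 - p) / n).
n = round(z_star ** 2 * p_winter * (1 - p_winter) / moe_winter ** 2)

# Summer: same n, new point estimate 0.4.
p_summer = 0.4
moe_summer = z_star * sqrt(p_summer * (1 - p_summer) / n)

print(n, round(2 * moe_winter, 3), round(2 * moe_summer, 3))
```

With n of about 711, the summer interval is roughly 0.072 wide, compared with 0.046 in winter.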
148. The management team of a company with 10,000 employees is considering installing charging stations for
electric cars in the company parking lots. In a random sample of 500 employees, 15 reported owning an electric
car. Which of the following is a 99 percent confidence interval for the proportion of all employees at the
company who own an electric car?
A) 0.03 ± 2.326 √((0.03)(0.97)/500)
B) 0.03 ± 2.576 √((0.03)(0.97)/500)
C) 0.15 ± 2.326 √((0.03)(0.97)/500)
D) 0.15 ± 2.675 √((0.03)(0.97)/500)
Explain: Formula: p̂ ± z*√(p̂(1 − p̂)/n), where p̂ = 15/500 = 0.03 and z* = 2.576 for 99 percent confidence => B
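Evaluating choice B numerically, as a minimal sketch:

```python
from math import sqrt

x, n = 15, 500
p_hat = x / n     # 0.03
z_star = 2.576    # 99% confidence
moe = z_star * sqrt(p_hat * (1 - p_hat) / n)
print(round(p_hat - moe, 4), round(p_hat + moe, 4))
```

The interval is roughly 0.010 to 0.050. The sample has 15 successes and 485 failures, so the large-counts condition holds.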
149. A 99 percent one-sample z-interval for a proportion will be created from the point estimate obtained from
each of two random samples selected from the same population: sample R and sample S. Let R represent a
random sample of size 1,000, and let S represent a random sample of size 4,000. If the point estimate obtained
from R is equal to the point estimate obtained from S, which of the following must be true about the respective
margins of error constructed from those samples?
A) The margin of error for S will be 4 times the margin of error for R.
B) The margin of error for S will be 2 times the margin of error for R.
C) The margin of error for S will be equal to the margin of error for R.
D) The margin of error for R will be 4 times the margin of error for S.
E) The margin of error for R will be 2 times the margin of error for S.
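Explain: ε = z*√(p̂(1 − p̂)/n). With the same point estimate and confidence level, ε is proportional to 1/√n,
so ε_R / ε_S = √(4,000/1,000) = 2 => E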
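The 1/√n relationship can be checked directly. This sketch assumes a common point estimate of 0.4 (any value gives the same ratio) and also reproduces the factor of 3 from question 145 (1,000 versus 9,000 viewers):

```python
from math import sqrt

def margin(p_hat, n, z_star=2.576):
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.4  # illustrative common point estimate; the ratio does not depend on it
ratio_149 = margin(p_hat, 1_000) / margin(p_hat, 4_000)  # R's margin vs S's margin
ratio_145 = margin(p_hat, 1_000) / margin(p_hat, 9_000)  # 1,000 vs 9,000 viewers
print(ratio_149, ratio_145)  # about 2 and about 3
```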
150. A random sample of 432 voters revealed that 100 are in favor of a certain bond issue. A 95 percent
confidence interval for the proportion of the population of voters who are in favor of the bond issue is
A) 100 ± 1.96 √((0.5)(0.5)/432)
B) 100 ± 1.645 √((0.5)(0.5)/432)
C) 100 ± 1.96 √((0.231)(0.769)/432)
D) 0.231 ± 1.96 √((0.231)(0.769)/432)
E) 0.231 ± 1.645 √((0.231)(0.769)/432)
151. A news organization conducted a survey about preferred methods for obtaining the news. A random sample
of 1,605 adults living in a certain state was selected, and 16.2 percent of the adults in the sample reported that
television was their preferred method. Which of the following is an appropriate margin of error for a 90 percent
confidence interval to estimate the population proportion of all adults living in the state who would report that
television is their preferred method for obtaining the news?
A) 1.645 √((0.162)(1 − 0.162)/1,605)
B) 1.645 √((0.5)(1 − 0.5)/1,605)
C) 1.96 √((0.162)(1 − 0.162)/1,605)
D) 1.96 √((0.5)(1 − 0.5)/1,605)
E) 1.83 √((0.162)(1 − 0.162)/1,605)
152. A survey was conducted to determine what percentage of college seniors would have chosen to attend a
different college if they had known then what they know now. In a random sample of 100 seniors, 34 percent
indicated that they would have attended a different college. A 90 percent confidence interval for the percentage
of all seniors who would have attended a different college is
A) 24.7% to 43.3%
B) 25.8% to 42.2%
C) 26.2% to 41.8%
D) 30.6% to 37.4%
E) 31.2% to 36.8%
Explain: p̂ ± z*√(p̂(1 − p̂)/n) = 0.34 ± 1.645 √((0.34)(0.66)/100) = 0.34 ± 0.078, or 26.2% to 41.8% => C
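The same interval in a short sketch, reported as percentages:

```python
from math import sqrt

p_hat, n, z_star = 0.34, 100, 1.645   # 90% confidence
moe = z_star * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = 100 * (p_hat - moe), 100 * (p_hat + moe)
print(f"{lower:.1f}% to {upper:.1f}%")  # 26.2% to 41.8%
```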
153. On the day before an election in a large city, each person in a random sample of 1,000 likely voters is asked
which candidate he or she plans to vote for. Of the people in the sample, 55 percent say they will vote for
candidate Taylor. A margin of error of 3 percentage points is calculated. Which of the following statements is
appropriate?
A) The proportion of all likely voters who plan to vote for candidate Taylor must be the same as the
proportion of voters in the sample who plan to vote for candidate Taylor (55 percent), because the data
were collected from a random sample.
B) The sample proportion minus the margin of error is greater than 0.50, which provides evidence that
more than half of all likely voters plan to vote for candidate Taylor.
C) It is not possible to draw any conclusion about the proportion of all likely voters who plan to vote for
candidate Taylor because the 1,000 likely voters in the sample represent only a small fraction of all likely
voters in a large city.
D) It is not possible to draw any conclusion about the proportion of all likely voters who plan to vote for
candidate Taylor because this is not an experiment.
E) It is not possible to draw any conclusion about the proportion of all likely voters who plan to vote for
candidate Taylor because this is a random sample and not a census.
154. Researchers investigating a new drug selected a random sample of 200 people who are taking the drug. Of
those selected, 76 indicated they were experiencing side effects from the drug. If 5,000 people took the drug,
which of the following is closest to the interval estimate of the number of people who would indicate they were
experiencing side effects from the drug at a 90 percent level of confidence?
A) (0.313,0.447)
B) (0.324, 0.436)
C) (65,87)
D) (1565, 2235)
E) (1620, 2180)
Explain: The sample proportion of people who reported side effects is p̂ = 76/200 = 0.38. The question asks for a
90 percent level of confidence, so z* = 1.645:
0.38 ± 1.645 √((0.38)(0.62)/200) = 0.38 ± 0.056, or about (0.324, 0.436), which is option B for the proportion.
To estimate the number of people, multiply each end of the interval by the total population of 5,000:
(0.324 × 5,000, 0.436 × 5,000) ≈ (1620, 2180).
Options A and D come from using z* = 1.96, which gives a 95 percent interval rather than a 90 percent interval.
The interval that best fits these conditions is option E (1620, 2180).
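A minimal sketch of the two steps: build the 90 percent interval for the proportion, then scale both ends by the 5,000 people who took the drug:

```python
from math import sqrt

x, n, population = 76, 200, 5000
p_hat = x / n                      # 0.38
z_star = 1.645                     # 90% confidence
moe = z_star * sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - moe, p_hat + moe  # about (0.324, 0.436)
print(round(lo * population), round(hi * population))  # 1618 2182
```

The closest listed interval is (1620, 2180).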
155. Jessica wanted to determine if the proportion of males for a certain species of laboratory animal is less
than 0.5. She was given access to appropriate records that contained information on 12,000 live births for the
species. To construct a 95 percent confidence interval, she selected a simple random sample of 100 births from
the records and found that 31 births were male. Based on the study, which of the following expressions is an
approximate 95 percent confidence interval estimate for p, the proportion of males in the 12,000 live births?
A) 0.31 ± 1.96 √((0.31)(0.69)/12,000)
B) 0.31 ± 1.645 √((0.31)(0.69)/12,000)
C) 0.31 ± 1.96 √((0.5)(0.5)/12,000)
D) 0.31 ± 1.645 √((0.5)(0.5)/100)
E) 0.31 ± 1.96 √((0.31)(0.69)/100)
156. A random sample of 80 people was selected, and 22 of the selected people indicated that it would be a
good idea to eliminate the penny from circulation. What is the 99 percent confidence interval constructed from
the sample proportion p̂?
A) 0.275 ± 1.96 √((22)(58)/80)
B) 0.22 ± 2.576 √((0.275)(0.725)/80)
C) 0.275 ± 2.576 √((0.275)(0.725)/80)
D) 0.275 ± 1.96 √((0.275)(0.725)/80)
E) 0.22 ± 2.323 √((0.275)(0.725)/80)
157. From a random sample of 1,005 adults in the United States, it was found that 32 percent own an e-reader.
Which of the following is the appropriate 90 percent confidence interval to estimate the proportion of all adults
in the United States who own an e-reader?
A) 0.32 ± 1.960 ((0.32)(0.68)/√1,005)
B) 0.32 ± 1.645 ((0.32)(0.68)/√1,005)
C) 0.32 ± 2.575 √((0.32)(0.68)/1,005)
D) 0.32 ± 1.960 √((0.32)(0.68)/1,005)
E) 0.32 ± 1.645 √((0.32)(0.68)/1,005)
158. Elly and Drew work together to collect data to estimate the percentage of their classmates who own a
particular brand of shoe. Using the same data, Elly will construct a 90 percent confidence interval and Drew will
construct a 99 percent confidence interval. Which of the following statements is true?
A) The midpoint of Elly's interval will be greater than the midpoint of Drew's interval.
B) The midpoint of Elly's interval will be less than the midpoint of Drew's interval.
C) The width of Elly's interval will be greater than the width of Drew's interval.
D) The width of Elly's interval will be less than the width of Drew's interval.
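E) The width of Elly's interval will be equal to the width of Drew's interval.
Explain:
Both intervals are built from the same data, so they share the same sample proportion and therefore the same
midpoint. A 99 percent interval uses a larger critical value (z* ≈ 2.576) than a 90 percent interval (z* ≈ 1.645),
so Elly's interval will be narrower than Drew's => D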
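The effect of the confidence level on midpoint and width can be checked numerically. This sketch assumes hypothetical data (18 of 60 classmates own the brand), since question 158 gives none; the helper proportion_interval is our own name.

```python
import math
from statistics import NormalDist


def proportion_interval(successes: int, n: int, confidence: float) -> tuple[float, float]:
    """One-sample z-interval for a proportion at the given confidence level."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)  # two-sided critical value
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical data: 18 of 60 classmates own the brand (same data for both students).
elly = proportion_interval(18, 60, 0.90)
drew = proportion_interval(18, 60, 0.99)
for name, (low, high) in [("Elly", elly), ("Drew", drew)]:
    print(f"{name}: midpoint = {(low + high) / 2:.3f}, width = {high - low:.3f}")
```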
159. A school administrator is interested in estimating the proportion of students in the district who participate in
community service activities. From a random sample of 100 students in the district, the administrator will
construct a 99 percent confidence interval for the proportion of all district students who participate in
community service activities. Which of the following statements must be true?
A) The population proportion will be in the confidence interval.
B) The probability that the confidence interval will include the population proportion is 0.99.
C) The probability that the confidence interval will include the sample proportion is 0.99.
D) The population proportion and the sample proportion will be equal.
E) The probability that the population proportion and the sample proportion will be equal is 0.99.
160. A newspaper poll found that 52 percent of the respondents in a large random sample of likely voters in a
district intend to vote for candidate Smith rather than the opponent. A 95 percent confidence interval for the
population proportion was computed to be 0.52 ± 0.04. Based on the confidence interval, which of the following
should the newspaper report to its readers?
A) Smith will win because a majority of voters are in favor of Smith.
B) There is a 95% chance that Smith will win.
C) The poll predicts Smith will win, but there is a 5% chance that the prediction is incorrect due to sampling
error.
D) With 95% confidence, there is convincing evidence that Smith will win.
E) No prediction about who will win can be made with 95% confidence.
Explain:
Based on the confidence interval 0.52 ± 0.04, the newspaper should report to its readers that the estimated
proportion of likely voters in the district who intend to vote for Candidate Smith is between 0.48 and 0.56, at a
95 percent confidence level.
The confidence interval is calculated as follows:
Lower bound = 0.52 - 0.04 = 0.48
Upper bound = 0.52 + 0.04 = 0.56
A 95 percent confidence level means we are 95 percent confident that the true proportion of likely voters who
intend to vote for Candidate Smith lies within this interval (0.48 to 0.56).
In other words, if we were to take many random samples and calculate their confidence intervals, we would
expect approximately 95 percent of those intervals to contain the true population proportion.
Because the interval includes values below 0.50, the data are consistent with Smith receiving less than a
majority, so no prediction about who will win can be made with 95 percent confidence => E
161. Lila and Robert attend different high schools. They will estimate the population percentage of students at
their respective schools who have seen a certain movie. Lila and Robert each select a random sample of students
from their respective schools and use the data to create a 95 percent confidence interval. Lila's interval is (0.30,
0.35), and Robert's interval is (0.27, 0.34). Which of the following statements can be concluded from the
intervals?
A) Lila’s sample size is most likely greater than Robert’s sample size.
B) Robert’s sample size is most likely greater than Lila’s sample size.
C) Lila and Robert will both find the same sample proportion of students who have seen the movie.
D) Lila’s interval has a greater degree of confidence than that of Robert.
E) Robert’s interval has a greater degree of confidence than that of Lila.
162. A large-sample 98 percent confidence interval for the proportion of hotel reservations that are canceled on
the intended arrival day is (0.048, 0.112). What is the point estimate for the proportion of hotel reservations that
are canceled on the intended arrival day from which this interval was constructed?
A) 0.032
B) 0.064
C) 0.080
D) 0.160
E) It cannot be determined from the information given.
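Because a z-interval is symmetric about the point estimate, the point estimate in question 162 is the midpoint of the interval and the margin of error is half its width. A minimal arithmetic check:

```python
lower, upper = 0.048, 0.112
point_estimate = (lower + upper) / 2  # the interval is centred on p_hat
margin_of_error = (upper - lower) / 2
print(f"p_hat = {point_estimate:.3f}, margin = {margin_of_error:.3f}")  # p_hat = 0.080, margin = 0.032
```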
163. A 95 percent confidence interval of the form p̂ ± E will be used to obtain an estimate for an unknown
population proportion p. If p̂ is the sample proportion and E is the margin of error, which of the following is the
smallest sample size that will guarantee a margin of error of at most 0.08?
A) 25
B) 100
C) 140
D) 155
E) 175
n = z² · p(1 − p) / E², using the conservative value p = 0.5 when p is unknown
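A sketch of the sample-size calculation for question 163, assuming the worst case p = 0.5 and rounding n up to the next whole number; min_sample_size is a hypothetical helper name. It then picks the smallest listed option that meets the requirement.

```python
import math
from statistics import NormalDist


def min_sample_size(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Smallest n with z* * sqrt(p(1 - p)/n) <= margin; p = 0.5 is the worst case."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil(z_star**2 * p * (1 - p) / margin**2)


n_needed = min_sample_size(0.08)
options = [25, 100, 140, 155, 175]
print(n_needed, min(n for n in options if n >= n_needed))  # 151 155
```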
164. Courtney has constructed a cricket out of paper and rubber bands. According to the instructions for
making the cricket, when it jumps it will land on its feet half of the time and on its back the other half of the time.
In the first 50 jumps, Courtney's cricket landed on its feet 35 times. In the next 10 jumps, it landed on its feet only
twice. Based on this experience, Courtney can conclude that
A) the cricket was due to land on its feet less than half the time during the final 10 jumps, since it had
landed too often on its feet during the first 50 jumps
B) a confidence interval for estimating the cricket's true probability of landing on its feet is wider after the
final 10 jumps than it was before the final 10 jumps
C) a confidence interval for estimating the cricket's true probability of landing on its feet after the final 10
jumps is exactly the same as it was before the final 10 jumps
D) a confidence interval for estimating the cricket's true probability of landing on its feet is more narrow
after the final 10 jumps than it was before the final 10 jumps
E) a confidence interval for estimating the cricket's true probability of landing on its feet based on the initial
50 jumps does not include 0.2, so there must be a defect in the cricket's construction to account for the
poor showing in the final 10 jumps
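The widths before and after the final 10 jumps in question 164 can be compared directly. The question does not state a confidence level, so this sketch assumes a 95 percent z-interval (z* = 1.96); ci_width is a hypothetical helper.

```python
import math


def ci_width(successes: int, n: int, z_star: float = 1.96) -> float:
    """Full width of a one-sample z-interval for a proportion (assumed 95 percent level)."""
    p_hat = successes / n
    return 2 * z_star * math.sqrt(p_hat * (1 - p_hat) / n)


before = ci_width(35, 50)          # first 50 jumps: 35 on its feet
after = ci_width(35 + 2, 50 + 10)  # all 60 jumps: 37 on its feet
print(f"width before = {before:.3f}, width after = {after:.3f}")
```

The larger sample and the proportion moving toward 0.5 pull in opposite directions; here the larger n wins, so the interval gets narrower.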
165. A random sample of 1,175 people in a certain country were asked whether they thought climate change
was a problem. The sample proportion of those who think climate change is a problem was calculated, and a 95
percent confidence interval was constructed as (0.146, 0.214). Which of the following is a correct interpretation
of the interval?
A) We are 95 percent confident that any sample of 1,175 people will produce a sample proportion between
0.146 and 0.214.
B) We are 95 percent confident that the proportion of all people in the country who think climate change is a
problem is between 0.146 and 0.214.
C) We are 95 percent confident that the proportion of people in the sample who think climate change is a
problem is between 0.146 and 0.214.
D) The probability that 95 percent of all people in the country who think climate change is a problem is
between 0.146 and 0.214.
E) The probability is 0.95 that the proportion of all people in the country who think climate change is a problem
is between 0.146 and 0.214.
166. A school librarian wanted to estimate the proportion of students in the school who had read a certain
book. The librarian sampled 50 students from the senior English classes, and 35 of the students in the sample
had read the book. Have the conditions for creating a confidence interval for the population proportion been
met?
A) Yes, because the sample was selected at random.
B) Yes, because sampling distributions of proportions are modeled with the normal model.
C) Yes, because the sample is large enough to satisfy the normality conditions.
D) No, because the sample is not large enough to satisfy the normality conditions.
E) No, because the sample was not selected using a random method.
167. A city planner wants to estimate the proportion of city residents who commute to work by subway each
day. A random sample of 30 city residents was selected, and 28 of those selected indicated that they rode the
subway to work. Is it appropriate to assume that the sampling distribution of the sample proportion is
approximately normal?
A) No, because the size of the population is not known.
B) No, because the sample is not large enough to satisfy the normality conditions.
C) Yes, because the sample is large enough to satisfy the normality conditions.
D) Yes, because the sample was selected at random.
E) Yes, because sampling distributions of proportions are modeled with a normal model.
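Questions 166 and 167 both turn on the conditions for a one-proportion z procedure. A minimal check of the large-counts part, assuming the common threshold of at least 10 successes and 10 failures (some textbooks use 5); large_counts_ok is a hypothetical name. Randomness is a separate condition that no count can fix.

```python
def large_counts_ok(successes: int, n: int, threshold: int = 10) -> bool:
    """Large-counts condition: at least `threshold` successes and `threshold` failures."""
    return successes >= threshold and (n - successes) >= threshold


print(large_counts_ok(28, 30))  # False: only 2 failures (question 167)
print(large_counts_ok(35, 50))  # True: 35 and 15, yet question 166's sample was not random
```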
168. The manager of a magazine wants to estimate the percent of magazine subscribers who approve of a
new cover format. To gather data, the manager will select a random sample of subscribers.
Which of the following is the most appropriate interval for the manager to use for such an estimate?
A) A two-sample z-interval for a difference between sample proportions
B) A two-sample z-interval for a difference between population proportions
C) A one-sample z-interval for a sample proportion
D) A one-sample z-interval for a population proportion
E) A one-sample z-interval for a difference between population proportions
169. The superintendent of a large school district wants to estimate the percent of district residents who
support the building of a new middle school. To gather data, the superintendent will select a random sample of
district residents. Which of the following is the most appropriate interval for the superintendent to use for such
an estimate?
A) A one-sample z-interval for a sample proportion
B) A two-sample z-interval for a difference between population proportions
C) A two-sample z-interval for a population proportion
D) A one-sample z-interval for a difference between population proportions
E) A one-sample z-interval for a population proportion
170. A box contains 10 tags, numbered 1 through 10, with a different number on each tag. A second box
contains 8 tags, numbered 20 through 27, with a different number on each tag. One tag is drawn at random from
each box. What is the expected value of the sum of the numbers on the two selected tags?
A) 13.5
B) 14.5
C) 15.0
D) 27.0
E) 29.0
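Question 170 is an application of linearity of expectation: E(A + B) = E(A) + E(B) = 5.5 + 23.5. The sketch below computes it that way and confirms it by brute force over all 80 equally likely pairs of tags.

```python
from fractions import Fraction

box1 = range(1, 11)   # tags 1 through 10
box2 = range(20, 28)  # tags 20 through 27

# Linearity of expectation: E(A + B) = E(A) + E(B).
expected_sum = Fraction(sum(box1), len(box1)) + Fraction(sum(box2), len(box2))

# Brute-force check over all 10 x 8 equally likely pairs.
pairs_mean = Fraction(sum(a + b for a in box1 for b in box2), len(box1) * len(box2))
print(float(expected_sum), float(pairs_mean))  # 29.0 29.0
```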
171. A compact disc (CD) manufacturer wanted to determine which of two different cover designs for a
newly released CD will generate more sales. The manufacturer chose 70 stores to sell the CD. Thirty-five of these
stores were randomly assigned to sell CDs with one of the cover designs and the other 35 were assigned to sell
the CDs with the other cover design. The manufacturer recorded the number of CDs sold at each of the stores
and found a significant difference between the mean number of CDs sold for the two cover designs.
Which of the following gives the conclusion that should be made based on the results and provides the best
explanation for the conclusion?
A) It is not reasonable to conclude that the difference in sales was caused by the different cover designs
because this was not an experiment
B) It is not reasonable to conclude that the difference in sales was caused by the different cover designs
because there was no control group for comparison
C) It is not reasonable to conclude that the difference in sales was caused by the different cover designs
because the 70 stores were not randomly chosen
D) It is reasonable to conclude that the difference in sales was caused by the different cover designs because the
cover designs were randomly assigned to stores
E) It is reasonable to conclude that the difference in sales was caused by the different cover designs because
the sample size was large
Explain:
Random assignment: The experiment used random assignment, which reduces the influence of confounding
variables and strengthens the causal inference. Assigning the cover designs randomly helps to ensure that any
difference in sales is more likely due to the cover design and not some other factor that differs between the two
groups of stores.
Significant difference: The experiment found a significant difference in the mean number of CDs sold for the two
cover designs. This suggests that the observed difference is unlikely to be due to chance.
Control group: While not explicitly mentioned, the two groups of stores can be considered as control groups for
each other. Comparing the sales between the two groups allows us to isolate the effect of the cover design.
Therefore, based on the information provided, it is reasonable to conclude that the difference in sales was
caused by the different cover designs. The random assignment of the cover designs strengthens the causal
inference and reduces the possibility of alternative explanations.
Let's analyze the other options:
A) Not an experiment: The study is an experiment, because the manufacturer imposed the treatments (the cover
designs) on the stores and assigned them at random. Random assignment and the comparison between the two
groups support the validity of the conclusions.
B) No control group: As explained above, the two groups of stores can be considered as control groups for each
other.
C) Not randomly chosen: Because the 70 stores were not randomly selected, the results may not generalize to all
stores, but the random assignment of the cover designs to the chosen stores still supports a causal conclusion.
E) Large sample size: While a large sample size can increase the confidence in the results, it alone is not enough
to conclude causality. Random assignment is crucial for causal inference.
Therefore, option D is the most accurate conclusion based on the information provided.
172. A company sells concrete in batches of 5 cubic yards. The probability distribution of X, the number of cubic
yards sold in a single order for concrete from this company, is shown in the table below.
X 10 15 20 25 30
Probability 0.15 0.25 0.25 0.30 0.05
The expected value of the probability distribution of X is 19.25 and the standard deviation is 5.76. There is a fixed
cost to deliver the concrete. The profit, Y, in dollars, for a particular order can be described by Y = 75X -100. What
is the standard deviation of Y?
A) $332
B) $532
C) $1,343.75
D) $432
Explain: The standard deviation of Y is $432. Because Y is a linear function of X, SD(Y) can be found directly from
SD(X) without recomputing the distribution.
Given:
• A company sells concrete in batches of 5 cubic yards.
• The probability distribution of X, the number of cubic yards sold in a single order for concrete from this
company, is shown in the table below.
• The expected value of the probability distribution of X is 19.25 and the standard deviation is 5.76.
• There is a fixed cost to deliver the concrete. The profit Y, in dollars, for a particular order can be described
by (Y = 75X - 100).
The formula to obtain the standard deviation is given below: SD(Y) = SD(75X - 100)
Subtracting the constant 100 shifts every value of Y by the same amount and does not change the spread, so only
the multiplier 75 matters: SD(aX + b) = |a| × SD(X).
SD(Y) = 75 × SD(X)
SD(Y) = 75 × 5.76
SD(Y) = $432 => D
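The rule SD(aX + b) = |a| SD(X) can be checked against the table in question 172. This sketch recomputes E(X) and SD(X) from the distribution and then applies the linear transformation Y = 75X − 100.

```python
import math

x = [10, 15, 20, 25, 30]
p = [0.15, 0.25, 0.25, 0.30, 0.05]

mean_x = sum(xi * pi for xi, pi in zip(x, p))
sd_x = math.sqrt(sum((xi - mean_x) ** 2 * pi for xi, pi in zip(x, p)))

# Y = 75X - 100: the constant shifts the mean but not the spread.
mean_y = 75 * mean_x - 100
sd_y = 75 * sd_x
print(f"E(X) = {mean_x:.2f}, SD(X) = {sd_x:.2f}, E(Y) = {mean_y:.2f}, SD(Y) = {sd_y:.2f}")
```

Option C ($1,343.75) is E(Y), not SD(Y).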
173. Ecologists conducted a study to investigate the potential ecological impact of golf courses. Investigators
monitored the reproductive success of bluebirds in birdhouses at nine golf courses and ten similar birdhouses at
nongolf sites. Data on nests in birdhouses occupied only by bluebirds are shown in the table.
Observed Number of Nests per Birdhouse by Location
x 1 2 3 4 5 6 7
P(x) 0.05 0.1 0.22 0.3 0.18 0.12 0.03
At the company, the daily salary of a loan agent is $150 plus $50 per loan closed. Let Y represent the amount of
money made by a randomly selected loan agent on a randomly selected day. Which of the following statements
is NOT true?
A) The mean of X is less than the mean of Y.
B) The standard deviation of Y is approximately $71.
C) The mean daily salary is greater than $350 per day.
D) The standard deviation of X is less than the standard deviation of Y.
E) The shape of the probability distribution of Y is unimodal and roughly symmetric.
Explain: The computation is shown below:
Y = a + bX
where,
Y = money made by a randomly selected loan agent on a randomly selected day
a = $150
b = $50
X = number of loans closed
E(X) = (1 × 0.05) + (2 × 0.10) + (3 × 0.22) + (4 × 0.30) + (5 × 0.18) + (6 × 0.12) + (7 × 0.03) = 3.94
E(Y) = $150 + ($50 × 3.94) = $347
Since $347 is less than $350, statement C) The mean daily salary is greater than $350 per day is NOT true, so the
correct option is C.
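The loan-agent question can be checked end to end, including statement B about the standard deviation. A minimal sketch, treating the x and P(x) table as the distribution of X (loans closed per day) and Y = 150 + 50X:

```python
import math

loans = [1, 2, 3, 4, 5, 6, 7]
prob = [0.05, 0.10, 0.22, 0.30, 0.18, 0.12, 0.03]

mean_x = sum(x * p for x, p in zip(loans, prob))
sd_x = math.sqrt(sum((x - mean_x) ** 2 * p for x, p in zip(loans, prob)))

mean_y = 150 + 50 * mean_x  # daily salary: $150 base plus $50 per loan
sd_y = 50 * sd_x            # the $150 base does not affect the spread
print(f"E(X) = {mean_x:.2f}, E(Y) = ${mean_y:.2f}, SD(Y) = ${sd_y:.2f}")
```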
177. A newspaper article indicated that 43 percent of cars with black seats are white, 46 percent of cars with
black seats are blue, 7 percent of cars with black seats are red, and 4 percent of cars with black seats are black. A
test was conducted to investigate whether the color of cars with black seats was consistent with the newspaper
article. A random sample of cars of these colors was selected, and the value of the chi-square test statistic was
χ² = 8.2. Which of the following represents the p-value for the test?
A) P(χ² ≥ 8.2) = 0.08
B) P(χ² ≥ 8.2) = 0.04
C) P(χ² ≤ 8.2) = 0.96
D) P(χ² = 8.2) = 0.00
E) The p-value cannot be calculated because the sample size is not given.
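Assuming the usual goodness-of-fit setup with 4 colour categories (so df = 3), the p-value is the upper-tail area P(χ² ≥ 8.2). For exactly 3 degrees of freedom the chi-square survival function has the closed form erfc(√(x/2)) + √(2x/π)·e^(−x/2), so the Python standard library is enough; chi2_sf_df3 is our own helper name.

```python
import math


def chi2_sf_df3(x: float) -> float:
    """P(chi-square with 3 df >= x), using the closed form valid only for 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


categories = 4          # white, blue, red, black
df = categories - 1     # 3
p_value = chi2_sf_df3(8.2)
print(f"df = {df}, P(chi2 >= 8.2) = {p_value:.3f}")
```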
178. Let X be a random variable whose values are the number of dots that appear on the uppermost face
when a fair die is rolled. The possible values of X are 1, 2, 3, 4, 5, and 6. The mean of X is 7/2 and the variance of
X is 35/12. Let Y be the random variable whose value is the difference (first minus second) between the number
of dots that appear on the uppermost face for the first and second rolls of a fair die that is rolled twice. What is
the standard deviation of Y?
A) √(35/12)
B) √(35/12) + √(35/12)
C) √{(35/12) + (35/12)}
D) √(35/12) - √(35/12)
Explain:
Let X1 and X2 be the numbers of dots on the first and second rolls, so Y = X1 - X2. Each roll has mean 7/2 and
variance 35/12.
E(Y) = E(X1) - E(X2) = 7/2 - 7/2 = 0
Because the two rolls are independent, Cov(X1, X2) = 0, so the variances add even though the values are
subtracted:
Var(Y) = Var(X1) + Var(X2) = 35/12 + 35/12
The standard deviation is the square root of the variance => SD(Y) = √{(35/12) + (35/12)} => C
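The fact that variances add for a difference of independent rolls can be confirmed by enumerating all 36 equally likely (first, second) outcomes with exact fractions:

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
diffs = [a - b for a, b in product(faces, repeat=2)]  # all 36 equally likely outcomes

mean = Fraction(sum(diffs), len(diffs))
var = sum(Fraction(d) ** 2 for d in diffs) / len(diffs) - mean**2
print(mean, var, Fraction(35, 12) + Fraction(35, 12))  # 0 35/6 35/6
```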
179. The National Park Service writes materials for students to use while in the parks. In a study of the
effectiveness of some of these materials, a random sample of students was selected to take a short quiz about
oak trees after using these materials. A random sample of park professionals also took the quiz. Investigators
compared classifications (low, medium, and high) of the crown shapes—the general shapes of the leafy parts of
the trees—made by students in grades 6 through 12 with classifications made by professionals. Data from the study
are shown in the table.
Sum of squares
Regression 30,501
Residual 36,154
Total 66,655
Please calculate the value of adjusted coefficient of determination. Use four decimal places and a "dot" as the
separator e.g. 0.0003
Answer: 0.4312
Adjusted R-squared = 1 − ((n − 1)/(n − k − 1)) × (SSE/SST)
226. You have a model with 4 independent variables and 293 observations.
Your ANOVA table is presented below.
Sum of squares
Regression 7,781
Residual 9,060
Please calculate the model error variance. Answer: 31.4583
MSE = SSE / (n − k − 1)
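The model error variance for question 226 follows directly from MSE = SSE/(n − k − 1); a minimal sketch with a hypothetical helper name:

```python
def model_error_variance(sse: float, n: int, k: int) -> float:
    """MSE = SSE / (n - k - 1) for a regression with k slopes plus an intercept."""
    return sse / (n - k - 1)


mse = model_error_variance(9060, n=293, k=4)
print(round(mse, 4))  # 31.4583
```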
227. You have the following model investigating the impact of crime, the number of rooms, teaching quality
and the area (town or countryside) on the price of newly built houses:
Price_i = b0 + b1·crime_i + b2·rooms_i + b3·teaching_quality_i + b4·area_i + e_i
The results for running that model are presented in tables 1 and 2 below. Is the coefficient on area significant at
5%? In order to answer, please set up the hypotheses, critical value, decision rule and conclusions.
Table 1.
Sum of squares
Regression 4,477
Residual 5,234
Please calculate the value of the standard error of the estimate. Answer: 7.1633
SYX = √(SSE / (n − 2))
231. You have a model with 5 independent variables and 144 observations.
Your ANOVA table is presented below.
Sum of squares
Regression 44,118
Residual 26,616
Please calculate the adjusted coefficient of determination. Answer: 0.6101
Adjusted R-squared = 1 − ((n − 1)/(n − k − 1)) × (SSE/SST)
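A sketch of the adjusted R² formula for question 231, taking SST = SSR + SSE from the ANOVA table; adjusted_r2 is a hypothetical helper name.

```python
def adjusted_r2(ssr: float, sse: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (n - 1)/(n - k - 1) * SSE/SST, with SST = SSR + SSE."""
    sst = ssr + sse
    return 1 - (n - 1) / (n - k - 1) * (sse / sst)


adj = adjusted_r2(44118, 26616, n=144, k=5)
print(round(adj, 4))  # 0.6101
```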
232. You have the following model investigating the impact of crime, the number of rooms, teaching quality
and the area (town or countryside) on the price of newly built houses:
Price_i = b0 + b1·crime_i + b2·rooms_i + b3·teaching_quality_i + b4·area_i + e_i
The results for running that model are presented in tables 1 and 2 below. What is the value of the coefficient of
determination (R squared) in this model?
Table 1.
Sum of squares
Regression 6,993
Residual 13,070
Please calculate the model error variance. Answer: 46.1837
MSE = SSE / (n − k − 1)
247. The correct interpretation of the following model is: log(wage)_hat = 0.584 + 0.083education + 0.03female
a) For females, if education increases by 1 unit, wage increases by 3*100%, ceteris paribus
b) Females on average earn 3% more than males, keeping education constant.
c) If Female increases by 1, log(wage) increases by 3%.
d) For females, if education increases by 1 year, wage increases by 8.3%, ceteris paribus
248. The correct interpretation of the following model is: log(wage)_hat = 0.584 + 0.083education
a) If education increases by 1 year, wage increases by 8.3%
b) If education increases by 1 year, wage increases by 0.083 per week.
c) If education increases by 1 year, wage increases by 8.3 per week.
d) If education increases by 1 year, wage increases by 0.083# per week.
249. You analyse a simple regression model investigating the effect of the size of the apartment (in square meters)
on the price of apartments (in thousands of Euros). The dependent variable is in natural logarithm. The coefficient
on the independent variable is equal to 0.013. What is the interpretation of that coefficient?
a) An extra square meter increases the price of an apartment by 13€
b) An extra square meter increases the price of an apartment by 0.13%
c) An extra square meter increases the price of an apartment by 1.3%
d) An extra square meter increases the price of an apartment by 130€
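In a log-level model, a one-unit change in x multiplies y by e^β, so 100·β is an approximation to the percentage change that works well when β is small. A quick comparison for the coefficient in question 249:

```python
import math

beta = 0.013
approx_pct = 100 * beta                # rule-of-thumb interpretation
exact_pct = 100 * (math.exp(beta) - 1)  # exact percentage change in price
print(f"approx {approx_pct:.2f}%, exact {exact_pct:.2f}%")  # approx 1.30%, exact 1.31%
```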
250. You are estimating a model with the price of a bicycle as the dependent variable (in €). There are four
independent variables in the model:
x1 age of the bicycle (in years),
x2 number of gears,
x3 number of previous owners,
x4 a dummy equal to 1 when a bicycle has a basket and 0 otherwise.
Model is of the following type:
Y = B0 + B1x1 + B2x2 + B3x3 + B4x4 + u
Estimation gives the following results:
Y_hat = 23 - 14.1x1 + 3x2 - 5.2x3 + 3.6x4
How would you interpret coefficient on the dummy variable (x4)?
a) If basket increases by 1 the price of the bike will increase by 3.6€ + 23€, ceteris paribus.
b) A bike with a basket is on average 3.6€ more expensive than a bike without a basket, keeping other variables
fixed
c) If basket increases by 1 the price of the bike will increase by 3.6€.
d) A bike with a basket is on average 36% more expensive than a bike without a basket, keeping other variables
fixed.
251. For a sample of 300 houses across France, we estimate a model relating the price of a house to various
house characteristics. We use the following variables:
log(price) is a natural logarithm of price, price is reported in €.
nox is the amount of nitrogen oxide in the air, in parts per million
rooms is the number of rooms in houses in the community.
The estimated model is of the following form: log(price)_hat = 9.23 - 0.718log(nox) + 0.306rooms
Which is the correct interpretation?
a) If number of rooms increases by 1, price increases by 0.306€, keeping nox constant.
b) If number of rooms increases by 1, price increases by 0.306%, keeping nox constant.
c) If number of rooms increases by 1, price increases by 30.6%, keeping nox constant.
d) If number of rooms increases by 1%, price increases by 30.6%, keeping nox constant.
252. For a sample of 300 houses across France, we estimate a model relating the price of a house to various
house characteristics. We use the following variables:
log(price) is a natural logarithm of price, price is reported in €.
nox is the amount of nitrogen oxide in the air, in parts per million
rooms is the number of rooms in houses in the community
The correct interpretation of the following model is: log(price)_hat = 9.23 - 0.718log(nox) + 0.306rooms
a) If the amount of nitrogen oxide in the air increases by 1%, the price of the house decreases by 71.8%, keeping
the number of rooms constant.
b) If the amount of nitrogen oxide in the air increases by 1 unit, the price of the house decreases by 0.718 unit,
keeping the number of rooms constant.
c) If the amount of nitrogen oxide in the air increases by 1 unit, the price of the house decreases by 71.8%,
keeping the number of rooms constant.
d) If the amount of nitrogen oxide in the air increases by 1%, the price of the house decreases by 0.72%, keeping
the number of rooms constant.
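The two coefficients in questions 251 and 252 are read differently: 0.306 on rooms is a log-level (semi-elasticity) coefficient, and −0.718 on log(nox) is a log-log elasticity. The sketch below compares the textbook approximations (30.6% and −0.718%) with the exact implied changes; the larger the coefficient, the more the approximation drifts.

```python
import math

b_rooms, b_nox = 0.306, -0.718

# One extra room: price is multiplied by e^0.306 (approximation: +30.6%).
rooms_pct = 100 * (math.exp(b_rooms) - 1)

# A 1% rise in nox: price is multiplied by 1.01^(-0.718) (approximation: -0.718%).
nox_pct = 100 * (1.01**b_nox - 1)
print(f"rooms +1: {rooms_pct:.1f}%  nox +1%: {nox_pct:.2f}%")
```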
253. For a sample of 300 houses across France, we estimate a model relating the price of a house to various
house characteristics. We use the following variables:
log(price) is a natural logarithm of price, price is reported in €.
nox is the amount of nitrogen oxide in the air, in parts per million
rooms is the number of rooms in houses in the community
The estimated model is of the following form: log(price)_hat = 9.23 - 0.718log(nox) + 0.306rooms
What is the value of intercept in this model and how do we interpret it?
a) Intercept is equal to 9.23, we interpret it only if it is significant (we need to set up a t-test)
b) There is no intercept in this model.
c) Intercept is equal to 9.23, we do not interpret it.
d) Intercept is equal to -0.718, we interpret it as elasticity.
254.
a) Heteroscedasticity
b) Specification bias
c) Multicollinearity
d) Autocorrelation
311. Coefficient of determination equal to 0.78 means that:
a) The model provides a poor fit
b) Most of the variation in y is unexplained by the regression equation
c) Model has problems with heteroscedasticity
d) Most of the variation in y is explained by the variation in the independent variables used in the model
312. An analyst has set up a model explaining advertising expenditures (yt) with retail sales (xt) and previous
year’s advertising (yt-1). The general form of the model is below. Suppose that retail sales increase by $1 in the
current year. What is the expected impact on advertising in the current year, and what is the total effect on all
current and future advertising expenditures?
yt = β0 + β1xt + β2yt-1 + εt
a) β2 in the first period; βj/(1-y)
b) β1 in the first period; β1/(1-β2)
c) β2 + β0 in the first period; βj/(1-y)
d) β2 * β1 in the first period; β1/(1-β2)
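The total-effect formula in question 312 comes from a geometric series: a $1 increase in retail sales raises advertising by β1 now, by β1·β2 next year, by β1·β2² the year after, and so on, summing to β1/(1 − β2) when |β2| < 1. A simulation sketch with hypothetical coefficients (β1 = 0.5, β2 = 0.6, chosen only for illustration); advertising_response is our own helper name.

```python
def advertising_response(beta1: float, beta2: float, periods: int) -> list[float]:
    """Effect on y in each period of a one-time $1 rise in x in period 0,
    for the model y_t = b0 + b1*x_t + b2*y_{t-1} + e_t (assumes |b2| < 1)."""
    effects = [beta1]                       # immediate impact
    for _ in range(periods - 1):
        effects.append(beta2 * effects[-1])  # each period carries over a fraction b2
    return effects


# Hypothetical coefficients, for illustration only.
b1, b2 = 0.5, 0.6
effects = advertising_response(b1, b2, periods=200)
print(effects[0], round(sum(effects), 6), b1 / (1 - b2))
```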
313. In order to test the validity of a multiple regression model involving 4 independent variables and 25
observations, the t-statistic for assessing the significance of individual coefficients follows a Student’s
t-distribution with:
a) 20 degrees of freedom
b) 21 degrees of freedom
c) 24 degrees of freedom
d) 19 degrees of freedom
314. You were asked to interpret the coefficients from the following model:
log(price)_i = 10.879 - 2.76 log(crime)_i - 0.09593 school_quality_i
Where:
log(price) is the natural logarithm of a price of a house in neighbourhood i
log(crime) is the natural logarithm of the number of violent crimes in neighbourhood i
school_quality is the ratio of the number of students to teachers in neighbourhood i.
Which of the following interpretations is not true?
a) Other things kept constant, if log(crime) changes by 1, log(price) changes in the opposite direction by 2.7
b) Other things kept constant, when school_quality decreases by 1, price increases by 9%
c) Other things kept constant, if crime increases by 1%, price decreases by 2.76%
d) Other things kept constant, when school_quality decreases by 1, price increases by 9
315. What does the following plot of residuals (e) from a regression analysis suggest?
a) Specification bias
b) Autocorrelation
c) Multicollinearity
d) Heteroscedasticity
316. Among OLS assumptions there is no assumption about:
a) Serial correlation of residuals
b) Multicollinearity
c) Normally distributed error term
d) Significant coefficients
317. Suppose you want to estimate a model for electricity demand in Northern Europe. Electricity is often
used for heating and so its use depends among other things on temperature T (as it gets cold more electricity is
demanded). Additionally you know that the amount of electricity used over a week depends on the day of the
week, as people tend to use less electricity over the weekend (as big industry typically does not work on
Saturdays and Sundays). Therefore you want to include dummies for the day of the week: X1 equal to 1 for
Mondays and 0 otherwise, X2 equal to 1 for Tuesdays and 0 otherwise and so on for the rest of the week with X6
and X7 for Saturday and Sunday.
a. Which of the following suggested models would be best to measure the demand over time?
a) Yt= β0+ β1x1t + β2x2t + β3x3t + β4x4t + β5x5t + β6Tt+ ε
b) Yt= β0+ β1x1t + β2x2t + β3x3t + β4x4t + β5x5t + β6x6t + εt
c) Yt= β0+ β2x2t + β3x3t + β4x4t + β5x5t + β6x6t + β7x7t + β8Tt+ εt
d) Yt= β0+ β1x1t + β2x2t + β3x3t + β4x4t + β5x5t + β6x6t + β7x7t + β8Tt+ εt
318. Which of the following is expected to occur in multiple regression analysis if an important variable is
omitted from the list of independent variables?
a) It will lead to unbiased least squares estimators
b) It will lead to problems with autocorrelation
c) It will lead to either overestimated or underestimated least square coefficients
d) It will lead to multicollinearity between independent variables
e) It will lead to general mayhem and bring on the end of the world or Trump presidency
319. The standard error of the estimate (se) for a multiple regression model with two explanatory variables
X1 and X2:
a) Measures the variation around the predicted regression equation
b) Measures the proportion of variation in Y that is explained by X1 and X2
c) Measures the proportion of variation in Y that is explained by X1 holding X2 constant
d) Has the same sign as b1
320. You have the following model investigating the impact of crime, the number of rooms, teaching quality
and the area (town or countryside) on the price of newly built houses:
Price_i = b0 + b1·crime_i + b2·rooms_i + b3·teaching_quality_i + b4·area_i + e_i
The results for running that model are presented in tables 1, 2 and 3 below. You want to test the model for the
overall significance of coefficients. What is the value of the test statistic used for this purpose?
Table 1.
b) 184.75
c) 6380657011.890
d) 5876.79613
321. In a multiple regression model described by the following equation: y_i = β0 + β1X_i + β2X_i² + ε_i, where β1 is
the linear term coefficient and β2 is the coefficient of the squared term, the pattern as shown in figure 2 would
be best described with:
a) Specification bias
b) Autocorrelation
c) Multicollinearity
d) Heteroscedasticity
325. A real estate broker is interested in identifying the factors that determine the price of a house. She
wants to run the following regression:
Y = β0+ β1X1+ β2X2+ β3X3+ ε
Where Y = price of the house in $1,000, X1 = number of bedrooms, X2 = square footage of living space, and X3 =
number of miles from the beach. Taking a sample of 30 houses, the broker runs a multiple regression and gets the
following results: Ŷ = 123.2 + 4.59X1 + 0.125X2 - 6.04X3
With the estimates of the standard error of the slopes equal to:
sb0 = 103.2, sb1 = 2.13, sb2 = 0.062, sb3 = 4.17
i. Determine the price that an individual has to pay for a 3-bedroom, 1,000-square-foot house
that is located three miles away from the beach.
a) $201,422
b) $177,243
c) $243,850
d) $229,198
ii. What is the 95% confidence interval for β1?
a) 2.13 ± 4.17
b) 4.59 ± 4.38
c) 2.13 ± 4.38
d) 4.59 ± 4.17
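Both parts of question 325 can be sketched in a few lines. The prediction plugs the house into the fitted equation (Y is in $1,000). The interval for β1 is b1 ± t*·s_b1 with df = n − k − 1 = 26; the value t* = 2.056 is taken from a standard t table and is hard-coded here as an assumption rather than computed.

```python
b = [123.2, 4.59, 0.125, -6.04]  # intercept, bedrooms, square feet, miles from beach
x_new = [1, 3, 1000, 3]          # 1 for the intercept, then the house's characteristics
y_hat = sum(bi * xi for bi, xi in zip(b, x_new))  # predicted price in $1,000
print(f"predicted price = ${y_hat * 1000:,.0f}")

n, k = 30, 3
t_star = 2.056  # 95% two-sided t critical value for df = n - k - 1 = 26 (from a t table)
margin = t_star * 2.13  # s_b1 = 2.13
print(f"beta1: {b[1]} ± {margin:.2f}")
```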
326. You know that the true relationship between independent variables X1, X2, X3 and the dependent
variable Y is described by the following regression
Y = β0+ β1X1+ β2X2+ β3X3+ εi
However, due to lack of data you decide to estimate slightly different model
Y = β0+ β1X1+ β2X2+ ε
What problems will this new, shorter model face? Choose the best answer.
a) Heteroscedasticity
b) Specification bias
c) Normally distributed error term
d) Significant coefficients
327. When might autocorrelation be a problem in a model?
a) When the F-statistic is large
b) When Var(ei) is homoscedastic
c) When DW = 2
d) When the model is estimated over time, e.g. using daily data
328. What would you conclude if you fail to reject H0: β1 = β2 = ... = βk = 0?
a) No relationship exists between the dependent variable and the independent variables
b) A strong relationship exists among the independent variables
c) Some of the independent variables are good predictors of the dependent variable
d) More information is needed to answer the question
329. The Durbin-Watson statistic is d = 3.99. This means:
a) The assumption of independence of errors is violated
b) The assumption of serial correlation of errors is not violated
c) The null hypothesis of autocorrelation cannot be rejected
d) There is a positive serial correlation
330. Which of the following is expected to occur in multiple regression analysis if an important variable is
omitted from the list of independent variables?
a) It will lead to unbiased least squares estimators
b) It will lead to biased least squares estimators
c) It will lead to over- or underestimated estimates of the variance
d) None of the above answers is correct
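A minimal Python illustration of the bias caused by an omitted variable. The data are hypothetical and noise-free so the numbers come out exact: the true model is y = 1 + 2x1 + 3x2, x2 is correlated with x1 (x2 = 0.5x1 + u), and the short regression of y on x1 alone absorbs part of x2's effect.

```python
# Hypothetical data: x2 is correlated with x1 (x2 = 0.5*x1 + u)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
u = [1.0, -1.0, 0.0, -1.0, 1.0]          # uncorrelated with x1 in this sample
x2 = [0.5 * a + b for a, b in zip(x1, u)]

# True model (no noise, to keep the arithmetic exact): y = 1 + 2*x1 + 3*x2
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# The short model omits x2, so the slope on x1 picks up 3 * 0.5 = 1.5 of x2's effect
short_slope = ols_slope(x1, y)
print(short_slope)  # 3.5, not the true 2.0
```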
331. In regression models, multicollinearity arises when the _____.
a) Dependent variables are highly correlated with one another
b) Independent variables are highly correlated with one another
c) Independent variables are highly correlated with the dependent variable
d) Error terms do not have the same variance
332. A student is interested in explaining the price of cars and has set up the following model:
price = constant + α1petrol + α2age + α3summer + α4autumn + α5winter + εt
Where:
price - is the price paid for the car
petrol - is a dummy = 1 if car uses petrol and 0 otherwise
age - measures the age of the car in years
summer, autumn and winter - are quarterly dummies (spring is the reference season)
One of the checks that the student wants to run is whether the seasonal dummies are jointly significant. What are
the hypotheses, and which statistic answers that question?
a) H0: each of the tested dummies is significantly different from 0; F-test for the overall significance of the tested
model
b) H0: each of the seasonal dummies separately is different from 0; t-test for each coefficient
c) H0: a subset of coefficients in the tested model is different from 0; t-test for a subset of coefficients
d) H0: a subset of coefficients in the tested model is equal to 0; F-test for a subset of coefficients
333. Which of the following will lead to the least squares estimates being biased?
a) Heteroskedasticity
b) Autocorrelated error terms
c) Exclusion of a relevant variable
d) Multicollinearity
334. Which is the correct interpretation of the following model?
log(wage) = 0.584 + 0.083education + 0.02female
Where:
wage is measured in dollars per week
education is measured in years
female is a dummy variable equal to 1 for a woman and 0 for a man
a) Holding gender constant, if education increases by 1% wage will increase by 8.3%
b) Holding education constant, a female earns 0.02% more than a male
c) When comparing males and females with the same education, males are earning 2% less than females
d) For both females and males an extra year of education increases their salary by 0.083%
e) The effect of education on wages is different for both sexes
335. In a multiple regression analysis with 5 independent variables and a sample size of 120, the degrees of
freedom for a test of the significance of an individual coefficient are equal to:
a) 115
b) 114
c) 120
d) Impossible to determine
336. Which of the following is true of the error term used in linear regression?
a) It represents the joint influence of all the dependent variables in the regression model
b) It represents the joint influence of factors, other than the dependent and independent variables, on the
regression model
c) It represents the joint influence of all the independent variables in the regression model
d) It represents the combined effect of the dependent, independent, and nonrepresented factors on the
regression model
337. If an analyst is regressing each independent variable on all the other independent variables of a
regression model, he or she is testing for _____.
a) Specification bias
b) Sampling error
c) Heteroscedasticity
d) Multicollinearity
338. Kurt is estimating a regression model that studies market prices of similar products from different
companies. During his study he realized that one of the independent variables, the competitor’s price, was
causing multicollinearity issues in the model. If Kurt removes this variable from the model, which of the
following statistical phenomena could affect Kurt’s model?
a) Specification bias
b) Sampling error
c) Heteroscedasticity
d) Autocorrelation
339. In the following model:
yi = 2 + 2.5x1i − 8x2i
with sb1 = 2 and sb2 = 0.05, and a sample size of 15:
a) Coefficient on x1 is significant at the 1% level
b) Coefficient on x2 is statistically different from 0 only at the 10% level
c) Coefficient on x2 is statistically different from 0 at any conventional significance level
d) None of the above is correct
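The t-ratios can be checked in Python. The sketch assumes df = n − k − 1 = 15 − 2 − 1 = 12 and takes the two-tailed critical values for 12 df from a standard t table.

```python
b1, se_b1 = 2.5, 2.0
b2, se_b2 = -8.0, 0.05
n, k = 15, 2
df = n - k - 1                      # 12 degrees of freedom

# Two-tailed critical values of t with 12 df, from a standard t table
t_crit = {0.10: 1.782, 0.05: 2.179, 0.01: 3.055}

t1 = b1 / se_b1                     # 1.25
t2 = b2 / se_b2                     # about -160

for alpha, crit in t_crit.items():
    print(f"alpha={alpha}: x1 significant? {abs(t1) > crit}; "
          f"x2 significant? {abs(t2) > crit}")
```

The coefficient on x1 is not significant even at 10%, while |t| = 160 for x2 exceeds every conventional critical value, which points to option c).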
340. A regression analysis with 5 independent variables and 120 observations has produced the following
results:
Residual sum of squares = 489
Total sum of squares = 700
What are the values of the adjusted R squared and the standard error of the estimate?
a) 28% and 4.27
b) 72% and 3.25
c) 28% and 4.07
d) 72% and 6.14
341. Total sum of squares measures:
a) The variation in Y explained by variation in X
b) The variation of observed Y values from the regression line
c) The variation of the Y values around their mean
d) The variation in the slope of regression lines from different possible samples
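A small Python sketch, on hypothetical data, showing that the total sum of squares (variation of Y around its mean) splits into the part explained by the fitted line and the residual part around the line.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]        # hypothetical data

n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
intercept = my - slope * mx
fitted = [intercept + slope * a for a in x]

tss = sum((b - my) ** 2 for b in y)                  # variation of Y around its mean
ssr = sum((f - my) ** 2 for f in fitted)             # variation explained by the regression
sse = sum((b - f) ** 2 for b, f in zip(y, fitted))   # variation of Y around the line

print(tss, ssr + sse)
```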
342. The plot of residuals can be used to test:
a) Multicollinearity
b) Autocorrelation
c) Specification bias
d) Autocorrelation and heteroscedasticity
343. An analyst has set up a model explaining advertising expenditures [y_t] with retail sales [x_t] and the previous
year’s advertising [y_(t-1)]. The general form of the model is below. Suppose that retail sales increase by $1 in the
current year. What is the expected impact on advertising in the current year, and what is the total effect on all current
and future advertising expenditures?
y_t = β0 + β1x_t + β2y_(t-1) + ε_t
a) β2 in the first period; βj/(1-y)
b) β1*β2 + β0 in the first period; β1/(1-β2)
c) β1*β2 + β0 in the first period; βj/(1-y)
d) β1 in the first period; β1/(1-β2)
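The answer can be checked numerically. In the Python sketch below the coefficient values β1 = 0.5 and β2 = 0.6 are hypothetical; the current-year effect is β1, and the effects in later years shrink geometrically by a factor of β2, so their sum approaches β1/(1 − β2) as long as |β2| < 1.

```python
# Hypothetical coefficient values for y_t = b0 + b1*x_t + b2*y_(t-1) + e_t
beta1, beta2 = 0.5, 0.6             # requires |beta2| < 1 for the sum to converge

# A $1 increase in x at time t raises y_t by beta1,
# y_(t+1) by beta1*beta2, y_(t+2) by beta1*beta2**2, and so on.
effects = [beta1 * beta2 ** j for j in range(200)]

impact = effects[0]                 # current-year effect: beta1
total = sum(effects)                # approaches beta1 / (1 - beta2)

print(impact, total, beta1 / (1 - beta2))
```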
344. In order to test the validity of a multiple regression model involving 5 independent variables, an
intercept and 50 observations, the statistic for assessing the significance of an individual coefficient follows:
a) Student’s t-distribution with 45 degrees of freedom
b) Student’s t-distribution with 44 degrees of freedom
c) F-distribution with 44 degrees of freedom
d) Normal distribution with 50 degrees of freedom
345. A real estate broker is interested in identifying the factors that determine the price of a house.
She wants to run the following regression:
Y = β0 + β1X1 + β2X2 + β3X3 + ε
Where Y = price of the house in $1,000s, X1 = number of bedrooms, X2 = square footage of
living space, and X3 = number of miles from the beach. Taking a sample of 30 houses, the
broker runs a multiple regression and gets the following result:
Ŷ = 123.2 + 4.59X1 + 0.125X2 − 7.04X3
With the estimated standard errors of the coefficients equal to:
sb0 = 103.2, sb1 = 2.13, sb2 = 0.062, sb3 = 4.17
i) Determine the predicted price that an individual has to pay for a 3-bedroom, 1,000-square-foot house that is
located three miles away from the beach.
a) $201,422
b) $177,243
c) $243,850
d) $240,850
ii) What is the 95% confidence interval for β2?
a) 0.062 +/- 4.17
b) 0.125 +/- 4.38
c) 0.125 +/- 0.127
d) 0.062 +/- 0.127
346. When can the Durbin-Watson statistic not be used?
a) When we are interested in second order autocorrelation.
b) When among the independent variables there is no lag of the dependent variable
c) When there is an intercept in the model
d) When we analyse data that are over time.
347. When might multicollinearity be a problem in a model?
a) When F-statistic for the overall significance is large
b) When Variance Inflation Factor is equal to 3
c) When Variance Inflation Factor is equal to 8
d) When correlation between the independent variable and the dependent variable is equal to 0.899
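The VIF for a variable is 1/(1 − R²_j), where R²_j comes from regressing that variable on the other independent variables (the auxiliary regression described in question 337). A minimal Python sketch for the two-predictor case, where R²_j is simply the squared correlation, on hypothetical data. A common rule of thumb flags a VIF above 5 (some texts use 10).

```python
def r_squared_two_vars(x1, x2):
    """R^2 from regressing x1 on x2 (the squared correlation for two variables)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    return s12 ** 2 / (s11 * s22)

def vif(r2_aux):
    """Variance Inflation Factor from the auxiliary-regression R^2."""
    return 1.0 / (1.0 - r2_aux)

x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-1.0, -2.0, 0.0, 0.0, 3.0]    # hypothetical predictor correlated with x1

v = vif(r_squared_two_vars(x1, x2))
print(v)                            # about 3.5
print(vif(0.875), vif(0.75))        # auxiliary R^2 that give VIF = 8 and VIF = 4
```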
348. Katrin analyses a multiple regression model with 7 independent variables. What would she conclude if
she fails to reject the null hypothesis that all slope coefficients in the model are equal to 0?
a) A strong relationship exists among the independent variables
b) Some of the independent variables are good predictors of the dependent variable
c) No relationship exists between the dependent variable and the independent variables
d) More information is needed to answer the question
349. Arnar collected various characteristics of 348 houses in his city and used this dataset to set up a
multiple regression model with 4 independent variables. The t-statistic for size of the house (one of the
independent variables in the model) is equal to 0.89. What can he conclude?
a) Size of the house is significant at 10% but is not significant at the 1% level (using a 2-tailed test)
b) Size of the house is significant at 10% but is not significant at the 1% level (using a 1-tailed test)
c) Size of the house is not significant at any usual alpha levels
d) Not enough data to answer this question
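With n = 348 and 4 independent variables the test has 348 − 4 − 1 = 343 degrees of freedom, so the t distribution is practically standard normal. The Python sketch below uses that normal approximation (via `math.erfc`) to obtain the p-values for t = 0.89.

```python
import math

n, k = 348, 4
df = n - k - 1            # 343
t_stat = 0.89

# With 343 df the t distribution is very close to the standard normal,
# so the normal approximation is used for the p-values here.
def normal_sf(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p_one = normal_sf(t_stat)
p_two = 2 * p_one

print(f"df={df}, one-tailed p ≈ {p_one:.3f}, two-tailed p ≈ {p_two:.3f}")
```

Both p-values (roughly 0.19 and 0.37) exceed 0.10, so the coefficient is not significant at any usual alpha level.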
350. Among OLS assumptions there is no assumption about:
a) Serial correlation of residuals
b) Significant coefficients
c) Multicollinearity
d) Normally distributed error term
351. If the Durbin-Watson statistic is equal to 2, then the correlation coefficient between the current error and
the previous period error is equal to:
a) 1
b) -1
c) 0
d) Impossible to tell
Explain: The Durbin-Watson statistic ranges from 0 to 4, and DW ≈ 2(1 - r), where r is the correlation between
consecutive error terms. A value of 2 therefore represents no autocorrelation.
Values above 2 signify negative autocorrelation, while values below 2 suggest positive autocorrelation.
Correlation coefficients of 1 and -1 represent perfect positive and negative correlation, respectively.
Since DW = 2 implies no autocorrelation, the correlation coefficient between consecutive error terms is 0, meaning
no linear relationship exists between them.
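A short Python sketch of the statistic itself, using hypothetical residuals that alternate in sign (strong negative autocorrelation, the situation behind a DW value close to 4, as in question 329). It also prints the 2(1 - r) approximation, which differs slightly from the exact value because of the end terms of the sums.

```python
def durbin_watson(e):
    """DW = sum of squared successive differences / sum of squared residuals."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(r * r for r in e)

def lag1_autocorr(e):
    """Simple first-order autocorrelation estimate of the residuals."""
    return sum(e[t] * e[t - 1] for t in range(1, len(e))) / sum(r * r for r in e)

alternating = [1.0, -1.0] * 5    # hypothetical residuals that keep switching sign
dw = durbin_watson(alternating)
r = lag1_autocorr(alternating)
print(dw, r, 2 * (1 - r))
```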
352. Arnar runs a model with a monthly interest rate as the dependent variable. His dataset covers 10 years.
How many dummies does he need to include in the model to check whether seasonality is present in the data?
a) 10
b) 9
c) 12
d) 11
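With monthly data there are 12 seasons, so one month must serve as the reference category to avoid perfect collinearity with the intercept (the dummy variable trap), leaving 12 - 1 = 11 dummies regardless of how many years the data cover. A minimal Python sketch, assuming January as the (arbitrary, hypothetical) reference month:

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
REFERENCE = "Jan"   # hypothetical choice of the omitted (reference) month

def month_dummies(month):
    """One 0/1 dummy per non-reference month; the reference month is all zeros."""
    return [1 if month == m else 0 for m in MONTHS if m != REFERENCE]

# Ten years of monthly observations
rows = [month_dummies(MONTHS[t % 12]) for t in range(10 * 12)]
print(len(rows), len(rows[0]))   # 120 observations, 11 dummy columns
```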
353. Which of the following tests is used to check the joint statistical significance of all the lagged variables
in the distributed lag model with additional day-of-the-week dummy variables?
a) T-test
b) F-test for overall significance
c) F-test for a subset of coefficients
d) Dickey-Fuller test
354. The regressions Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε and Y = β0 + β1X1 + β2X2 + ε were run using a
sample of 30 observations. The SSE (sum of squared errors) is 289.4 for the first regression and 382.32 for the
second regression. Test H0: β3 = β4 = 0.
a) Reject H0 at α = 0.01
b) Reject H0 at α = 0.025
c) Reject H0 at α = 0.05
d) Fail to reject H0 at α < 0.10
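The partial F-test can be worked through in Python: F = [(SSE_R - SSE_F)/q] / [SSE_F/(n - k - 1)] with q = 2 restrictions and 30 - 4 - 1 = 25 denominator degrees of freedom. Because the numerator has 2 degrees of freedom, the upper-tail probability of F(2, d) has the closed form (1 + 2F/d)^(-d/2), so this sketch needs no statistics library.

```python
n = 30
sse_full, k_full = 289.4, 4        # unrestricted model (X1..X4)
sse_restricted = 382.32            # restricted model (X1, X2 only)
q = 2                              # restrictions tested: beta3 = beta4 = 0

df_denom = n - k_full - 1          # 25
f_stat = ((sse_restricted - sse_full) / q) / (sse_full / df_denom)

# For an F(2, d) distribution the upper-tail probability has the closed form
# P(F > f) = (1 + 2f/d) ** (-d/2).
p_value = (1 + 2 * f_stat / df_denom) ** (-df_denom / 2)

print(f"F = {f_stat:.3f}, p ≈ {p_value:.3f}")
for alpha in (0.01, 0.025, 0.05):
    print(f"alpha={alpha}: reject H0? {p_value < alpha}")
```

F is about 4.01 with p about 0.031, so H0 is rejected at α = 0.05 but not at 0.025 or 0.01, consistent with option c).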
355. If all the points on a scatter diagram lie on a straight line, what is the value of the sum of
squared errors?
a) 0
b) Infinity ∞
c) 1
d) Cannot be determined
356. What is the value of the coefficient of determination for the following case:
Σ_{i=1}^{N} …    Σ_{i=1}^{N} …
a) 0.2
b) 0.8
c) 0.5
d) Cannot be calculated - not enough information
357. What is the correct interpretation of the following model?
log(wage_i) = 0.584 + 0.083education_i + 0.02female_i + 0.004(education_i*female_i)
Where:
Wage is measured in dollars per week
Education is measured in years
Female is a dummy variable equal to 1 for a woman and 0 for a man.
a) Holding gender constant, if education increases by 1%, wage will increase by 8.3%
b) An extra year of education increases a female's salary by 8.7%
c) Holding education constant, a female earns 0.02% more than a male
d) When comparing males and females with the same education, males are earning 2% less than females
e) For both females and males an extra year of education increases their salary by 0.083%
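A quick Python check of the education effect by gender: the interaction term adds 0.004 to the education coefficient for women. The sketch also prints the exact percentage change, 100(e^b - 1), next to the usual 100b approximation used in the answer options.

```python
import math

b_educ, b_female, b_inter = 0.083, 0.02, 0.004

# Effect of one extra year of education on log(wage), by gender
slope_male = b_educ                 # female = 0
slope_female = b_educ + b_inter     # female = 1

# 100*coef is the usual approximate % change; 100*(exp(coef) - 1) is the exact one
for label, s in (("male", slope_male), ("female", slope_female)):
    print(f"{label}: approx {100 * s:.1f}%, exact {100 * (math.exp(s) - 1):.2f}%")
```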
358. Suppose you were to run a regression of leisure travel expenditures by households on household
income. We would expect that households with low incomes do not travel much. High-income households may or
may not travel much, depending on the household’s preference for travel. The results for this regression will be
subject to ___.
a) Multicollinearity
b) Specification bias
c) Autocorrelation
d) Heteroscedasticity
359. In examining the determinants of income, data were collected on the characteristics of 45
adults, and the regression Y = β0 + β1X1 + β2X2 + β3X3 + ε was used, where Y is the annual income (in thousands
of dollars), X1 is the person’s age, X2 is his/her years of education, and X3 is a dummy variable = 1 if the adult
is female.
If you get ŷ = 26.3 + 1.38x1 + 2.98x2 − 0.76x3 when you run the regression, how would you interpret the
coefficient on gender?
a) A woman earns 76% of a man’s earnings.
b) For each additional year of education, a woman earns $760 less than a man.
c) For each additional year in the age, a woman earns $760 less than a man.
d) On average, a woman earns $760 less than a man
e) On average, a woman earns $76 less than a man
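A minimal Python sketch of why option d) holds: since Y is in thousands of dollars, the dummy coefficient -0.76 corresponds to $760. The age and education values used are hypothetical; the gap does not depend on them.

```python
def predicted_income(age, educ, female):
    """Predicted annual income in $1,000s from the fitted model."""
    return 26.3 + 1.38 * age + 2.98 * educ - 0.76 * female

# Hypothetical person: 40 years old, 16 years of education
gap = predicted_income(40, 16, 1) - predicted_income(40, 16, 0)
print(f"Female minus male: {gap * 1000:,.0f} dollars")
```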
360. A Variance Inflation Factor equal to 4 indicates:
a) That there is serial correlation in the model
b) That the errors are autocorrelated of degree 1
c) That there is a problem with multicollinearity in the model
d) That there is no problem with multicollinearity in the model.
361. Suppose the following scatter plot shows the relationship between X and Y. How might you model Y?