243 Final Exam - Practice

Uploaded by

fualisa0

MULTIPLE CHOICE – Write your answer in the space provided, using
UPPERCASE letters. Illegible answers will be marked incorrect. You DO
NOT need to justify your answer.

1) When doing a significance test, a student gets a p-value of 0.06. This means: [1
mark]
I. Assuming Ho is true, the sample results are a likely event.
II. 94% of samples should give results that fall in this interval.
III. We reject Ho.
A. I only
B. III only
C. I and II
D. I and III
E. II and III
ANSWER: ____A______

2) What is the purpose of inferential statistics versus descriptive statistics? [1 mark]


A. Inferential statistics describe the outcome of random phenomena
B. Inferential statistics are used to make informed predictions about parameters
we don’t know
C. Inferential statistics describe samples that are normally distributed and large
enough (n>30)
D. Inferential statistics are used to generate samples of random data for a more
reliable analysis
E. Inferential statistics are used to predict outcomes for games of chance

ANSWER: _____B_____

STUDY A (questions 3 & 4) You are interested in the flowering time of tulips in early
spring and decided to keep track of the date that different gardens in your neighborhood
bloom. You use April 1st as a reference point and record the number of days past this date
for each garden. You sample 13 gardens and find that the mean number of days is 12.3
with a standard deviation of 3.1.

3) For STUDY A, what is the 95% confidence interval for the average days past April 1st
that the gardens bloomed based on this sample? [1 mark]


A. (10.4,14.2)
B. (5.1,18.1)
C. (5.5,19.1)
D. (-1.9,1.9)
E. (10.7,13.8)
ANSWER: ____A______
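The interval in question 3 can be checked numerically. The following is a minimal sketch, assuming a two-sided critical value of t = 2.179 for 12 degrees of freedom read from a standard t-table (the exam's own table is not reproduced here). The variable names are illustrative only.

```python
import math

n = 13            # gardens sampled
mean_days = 12.3  # sample mean, days past April 1st
sd_days = 3.1     # sample standard deviation
t_crit = 2.179    # assumed: two-sided 95% critical t for df = n - 1 = 12 (t-table value)

se = sd_days / math.sqrt(n)   # estimated standard error of the mean
margin = t_crit * se          # half-width of the confidence interval
ci = (mean_days - margin, mean_days + margin)

print(f"SE = {se:.3f}, margin = {margin:.3f}")
print(f"95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```

Because the sample is small (n = 13) and the population standard deviation is unknown, the sketch uses a t critical value rather than z = 1.96.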

4) For STUDY A, which of the following statements about the confidence interval is
true? [1 mark]
A. The confidence interval indicates the range over which you would fail to
reject the null hypothesis of a single-sample t-test.
B. The average number of days before flowering in the sample is between the
days given by the interval.
C. There is a 95% probability that the true average number of days before
flowering is between the days given by the interval.
D. The confidence interval is based on a z-distribution
E. None of the above.
ANSWER: ____B______

STUDY B (questions 5, 6 & 7) As a mushroom farmer in the Kingston region, you are
trying to find new opportunities to sell the range of goods you produce. You decide to try
developing homegrown mushroom kits, but want to do a market survey first to see if
the average homeowner could grow mushrooms successfully. You approach 23 random
people at the weekend farmers market and ask if they would be willing to try growing
Oyster mushrooms from your kit. You first ask them whether they have previous
experience growing mushrooms and then follow up with each participant each week over
the next three weeks and ask them whether the mushrooms successfully grew. You are
interested in evaluating whether success at growing mushrooms by the end of the 3 weeks
depended on previous experience.

5) For STUDY B, what statistical test would be most appropriate for testing your
question? [1 mark]
A. One-sample t-test
B. Two-sample t-test
C. Chi-square test
D. Single-factor ANOVA
E. Two-factor ANOVA
ANSWER: ____C______

6) For STUDY B, identify the type of study design. [1 mark]


A. Cohort study
B. Stratified survey
C. Case-control study
D. Cluster survey
E. Simple random survey
ANSWER: ____A______

7) For STUDY B, select the plot that is most appropriate for showing whether success at
growing mushrooms by the end of the 3 weeks depended on previous experience. [1
mark]
A. Bar chart
B. Box plot
C. Grouped bar chart
D. Grouped box chart
E. Scatter plot
ANSWER: _____C_____

8) Which of the following statements is FALSE? [1 mark]


A. We repeatedly sample a population in practice to create the sampling
distribution
B. The ‘true’ sampling distribution does not change among samples
C. The population distribution does not change among samples
D. The sample mean is variable among different samples
E. The estimated standard error is variable among different samples

ANSWER: _____A_____

9) Which of the following statements best explains why confidence intervals cannot be
used to test for significance in a t-test? [1 mark]
A. Confidence intervals do not share a common sampling distribution with t-
tests.
B. Confidence intervals indicate the range over which the true population value
lies.
C. Confidence intervals are a statement about the sampling distribution of your
population, not of the null distribution.
D. Confidence intervals do not use the standard error estimated from your
sample.
E. Confidence intervals are always two-tailed whereas t-test can be one- or two-
tailed.
ANSWER: _____C____

STUDY C (questions 10-12) You discover a pamphlet advertising final exam prep
sessions for statistics. Since exams are coming soon, you decide to read what they have to
say. Use your statistical knowledge to find the statistical and methodological errors.

Improve your stats grade by 50%! (entirely fictional)


Our company (StatsHeroes) has spent the last 3 years developing the most efficient exam
prep material to help you excel in your statistics course. We developed the material based
on a survey of students who had previously taken a statistics course. We selected 100
students who had the top grades in their course, and asked them a series of questions
about exam prep strategies. Using a linear regression, we found that students who used
YouTube videos versus taking notes from the lectures as a study method performed
statistically better on the exams. Our analysis was done using a t-test on the slope and we
rejected the null hypothesis (t_observed=0.3, t_critical=2.1). We found that students who
slept well the night before an exam felt less stressed heading into the exam (Chi-square
test, p<0.05). Working with our sample of students, we developed a series of practice
questions as part of the prep material that are designed to improve your grade. To test the
effectiveness of these practice questions, we asked all of the students in our sample to
take an exam, then let them use the practice questions to hone their statistical skills, and
finally asked the same students to retake the exam. We analyzed the results using a two-
sample t-test and found a significant result (p=0.012). The p-value indicates that our
practice questions are guaranteed to improve your score 99.8% of the time!

10) For STUDY C, which of the following is the most important methodological error
found in the passage? (Multiple answers possible: 1 mark for each correct response.
You will lose 0.5 marks for each incorrect answer, but only applied to this question.)
A. The study design has pseudoreplicated sampling units
B. The study design has biased sampling unit allocation
C. The study design uses non-random sampling units
D. The study design is missing a control level needed to compare.
E. There are no methodological errors
ANSWER: ____C_____

11) For STUDY C, which of the following is an error found in the passage? [1 mark]
I. Testing the study methods should not be done using linear regression.
II. Linear regression is not evaluated using a t-test.
III. There is a mismatch between the test scores and the statistical conclusion.
A. I only
B. III only
C. I and II
D. I and III
E. II and III

ANSWER: ____D______

12) For STUDY C, which of the following is an error found in the passage? [1 mark]
I. Evaluating the relationship between sleep and stress should not be done with a
Chi-square test.
II. Evaluating the effectiveness of the practice questions on exam success should
not be done using a two-sample t-test.
III. The researchers have drawn an incorrect scientific conclusion about the p-
value for the effectiveness of the practice questions on exam success.
A. I only
B. III only
C. I and II
D. I and III
E. II and III
ANSWER: ____E______

STUDY D Use the following study to answer questions 13, 14 & 15.
Airbnb is an accommodation-sharing platform that relies heavily on guest ratings.
Interestingly, the distribution of ratings (shown below) is not equal across the possible
scores and is also not Normally distributed (5.0 is excellent, 1.0 is poor). The proportion
of accommodations in each category is:

Rank 5.0 is 46.3%
Rank 4.5 is 41.1%
Rank 4.0 is 10.3%
Rank 3.5 is 1.7%
Rank 3.0 is 0.3%
Rank 2.5 is 0.2%
Rank 2.0 is 0.1%
Rank 1.5 is 0%
Rank 1.0 is 0%

13) If you took repeated samples of potential accommodations for STUDY D, which of
the following statements about the distribution of the mean ratings would be
INCORRECT? [1 mark]
A. Since the data are not Normally distributed, the distribution of mean ratings
cannot be Normally distributed.
B. The mean of the distribution is the same as the mean of the population
distribution.
C. The variance of the distribution depends on sample size.
D. Sampling error causes the variation in the distribution.
E. The standard deviation of the distribution can be estimated from a sample.

ANSWER: ____A_____
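The statements in question 13 can be explored by simulation. The sketch below is an illustration only: it resamples from the STUDY D rating proportions using Python's standard library, and the seed, the number of repetitions (2000) and the two sample sizes (8 and 32) are arbitrary assumptions chosen for the demonstration.

```python
import random
import statistics

# Rating distribution from STUDY D (proportions given as percentages).
ratings = [5.0, 4.5, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.0]
percent = [46.3, 41.1, 10.3, 1.7, 0.3, 0.2, 0.1, 0.0, 0.0]

# Mean of the population distribution.
pop_mean = sum(r * p for r, p in zip(ratings, percent)) / sum(percent)

def sample_means(n, reps, rng):
    """Draw `reps` random samples of size n and return the mean of each."""
    return [statistics.fmean(rng.choices(ratings, weights=percent, k=n))
            for _ in range(reps)]

rng = random.Random(243)  # fixed seed (arbitrary) so the run is repeatable
means_small = sample_means(8, 2000, rng)
means_large = sample_means(32, 2000, rng)

print(f"population mean      = {pop_mean:.3f}")
print(f"mean of sample means = {statistics.fmean(means_large):.3f}")
print(f"SD of means, n = 8   = {statistics.stdev(means_small):.3f}")
print(f"SD of means, n = 32  = {statistics.stdev(means_large):.3f}")
```

The mean of the simulated sample means should sit close to the population mean (statement B), and the spread of the means should shrink as the sample size grows (statement C), even though the ratings themselves are strongly skewed.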

14) You collect a sample of 32 accommodations for STUDY D and find the following
information about the ratings: Mean=4.2, Median=4.5, SD=0.2, IQR=1.5. Based on
this information, what is the estimated standard deviation of the distribution of mean
ratings? [1 mark]
A. 0 ≤ ANSWER < 0.05
B. 0.05 ≤ ANSWER < 0.1
C. 0.1 ≤ ANSWER < 0.2
D. 0.2 ≤ ANSWER < 0.5
E. 0.5 ≤ ANSWER < 1.0

ANSWER: _____A_____
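The standard deviation of the distribution of mean ratings in question 14 is the standard error, estimated as the sample SD divided by the square root of n. A minimal sketch of that arithmetic, with illustrative variable names:

```python
import math

n = 32    # accommodations sampled
sd = 0.2  # sample standard deviation of the ratings

# Estimated standard deviation of the distribution of mean ratings.
se_mean = sd / math.sqrt(n)
print(f"estimated standard error = {se_mean:.4f}")
```

The mean, median and IQR given in the question are not needed for this calculation.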

15) What is the probability that a randomly selected accommodation in STUDY D will
have a rating that is less than 4.0? [1 mark]
A. 0 ≤ ANSWER < 0.2
B. 0.2 ≤ ANSWER < 0.4
C. 0.4 ≤ ANSWER < 0.6
D. 0.6 ≤ ANSWER < 0.8
E. 0.8 ≤ ANSWER < 1.0

ANSWER: _____A_____
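For question 15, the probability is the sum of the proportions for every rating strictly below 4.0. A short sketch using the STUDY D table (the dictionary name is illustrative):

```python
# Percentage of accommodations at each rating, from STUDY D.
percent_by_rating = {
    5.0: 46.3, 4.5: 41.1, 4.0: 10.3, 3.5: 1.7, 3.0: 0.3,
    2.5: 0.2, 2.0: 0.1, 1.5: 0.0, 1.0: 0.0,
}

# Add up the categories strictly below 4.0 and convert percent to a probability.
p_below_4 = sum(p for rating, p in percent_by_rating.items() if rating < 4.0) / 100
print(f"P(rating < 4.0) = {p_below_4:.3f}")
```

Note that the 4.0 category itself is excluded because the question asks for ratings less than 4.0.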

16) You have been investigating the human health effects of a factory that has been
releasing contaminated water into a river from which all local communities draw
their drinking water. The toxin in question is known to cause liver cancer in lab
rats. To determine whether pollution in the river is affecting human health, you
randomly sample 200 people living in the community upstream from the factory
(who don’t drink the polluted water) and 200 individuals who live downstream
(and hence do drink the contaminated water). For each person sampled, you
perform a liver enzyme test to look for the telltale signs of liver cancer. What
type of statistical test would be best suited to analyze these data? [1 mark]
A. Chi-square test
B. Paired-sample t-test
C. 2-sample t-test
D. 1-factor analysis of variance (ANOVA)
E. Regression

ANSWER: _____C_____

17) You want to plant a butterfly garden on the balcony of your apartment, but are not
sure what species of milkweed is best suited for pots. You grow 10 plants of the
‘Common Milkweed’ and 10 plants of a related species called the ‘Butterfly Weed’
and measure their height after 2 months. Which of the following statements is about
the descriptive statistics for your data? [1 mark]
I. A boxplot showing that the interquartile range for the common milkweed is
greater than for the butterfly weed.
II. The difference in median height between the two species is 3.2 cm.
III. The height difference between the two species is unexpected from sampling
error.

A. I only.
B. II only.
C. I and II.
D. I and III.
E. II and III.
ANSWER: _____C_____

18) Which of the following questions should be analyzed using a 1-tailed t-test? [1
mark]
A. Is the fluorine concentration in Kingston’s water above the recommended
guide of 0.7 mg/L?
B. Does the mean patient wait time differ between an After-Hours health clinic
compared to a regular doctor’s office?
C. Has the first calendar date that stores put out their Christmas ornaments for
sale changed over the past 20 years?
D. Is the amount of sea ice in our Arctic waters different now from what it was
10 years ago?
ANSWER: _____A_____

19) Sara is an ornithologist and has been studying stress hormones (ug/ml) in
black-capped chickadees over the winter months. She collects blood samples from 10
random birds in a forest site, 10 random birds from a field site, and 10 random
birds from an urban site. What statistical test should be used to evaluate whether
there is a difference in stress hormones among the sites? [1 mark]
A. Chi-square test
B. Paired-sample t-test
C. 2-sample t-test
D. 1-factor analysis of variance (ANOVA)
E. Regression

ANSWER: _____D_____

20) The following R output is for a linear regression of brain mass in bats as a function of
their body mass. Which of the following values is used to test the hypothesis that
body mass can be used to predict brain mass? [1 mark]

A. The estimate value of 3.97821.


B. The t-value value of 44.44.
C. The multiple R-squared value of 0.9272
D. The t-value value of 27.88.
E. The estimate value of 0.75164
ANSWER: _____D_____

21) You have been given a dataset that contains the viral load for patients who are
sick with the flu. The dataset also includes whether the patients had been given a
flu vaccination (levels of yes/no) and their age group (levels of
child/teen/adult/senior). Which of the following figures would best illustrate
whether the efficacy of the flu vaccine for reducing viral load depends on patient
age? [1 mark]
A. Boxplot
B. Contingency table
C. Interaction plot
D. Histogram
E. Scatter plot

ANSWER: _____C_____

22) Select which of the following null and alternative hypotheses are most
appropriate for a two-sample t-test that answers the following question: “Are the
means of my samples different?”. [1 mark]
A. H0: μ ≤ 0 HA: μ > 0
B. H0: μA = μB HA: μA ≠ μB
C. H0: μA ≥ μB HA: μA < μB
D. H0: μ > 0 HA: μ ≤ 0
E. H0: μA ≠ μB HA: μA = μB
ANSWER: ____B_______

SHORT ANSWER - Write all your answers in the space provided.

23) Carbon tax is a hot political issue right now in Canada. The idea is that by taxing
activities or products that generate a lot of carbon dioxide, consumers will change
their behavior and opt for choices that generate less carbon dioxide. You have been
asked to conduct a survey looking at how people feel about carbon taxes in five key
areas: gasoline, cement manufacturing, electricity generation from natural gas plants,
home heating with fossil fuels, agriculture. You decide to run the survey online by
sending emails to 1000 people who have their email registered with Revenue Canada.
The survey asks participants to select their level of support (strongly support, support,
neutral, do not support, strongly do not support) for carbon taxes under each area.

A. Indicate the sampling unit and the statistical population (be specific). [2 marks]
[sampling unit: a person (or person’s email); statistical population: all the people
(or emails) who have an email registered with Revenue Canada. (Zero marks for
not identifying that the statistical population is just those that have an email
registered with revenue Canada.)]

B. Give one example of a hidden bias that could arise from this sampling method (be
specific). [1 mark]
[People without an email registered with Revenue Canada will not be included in
the sample, such as those who are older or who can’t afford a computer. These
people are likely to have a different perspective on carbon tax, which may cause a
bias in your answer]

C. Indicate the most appropriate statistical test (be as specific as possible) [1 mark]
[Chi-square test]

D. Explain why you selected this test [1 mark]


[There are two categorical variables, which can only be analyzed with a Chi-
square test.]

E. Indicate the null and alternative hypothesis (be mindful of direction in the test if
appropriate) [1 mark]
[The null hypothesis is H0: observed counts are not different from the expected
counts, HA: observed counts are different from the expected counts. *grade part c
based on the answer in part a, even if that was incorrect]

F. Name the appropriate test statistic (e.g., F-score) (You do not need to find its
value.) [1 mark]
[Chi-square score. *grade part d based on the answer in part a, even if that was
incorrect]

24) You work for a cosmetic company that develops tanning solutions designed to
modify skin color. You have been asked to evaluate the effectiveness of two
formulations (Dihydroxyacetone, Erythrulose) and three methods of
application (Cream, Gel, Solution) in terms of color change. After running the
experiments and collecting the data, you analyze the data and generate the
following R output. Use these results to answer the following questions.

A. Indicate the most appropriate statistical test (e.g., t-test, regression etc.) for this
data and explain your rationale. Be as specific as possible. [1 mark]
This is a two-factor ANOVA [0.5 marks], which is most appropriate because we are
studying a numerical response under two categorical explanatory variables [0.5
marks].

B. Indicate the statistical distribution you will use to test the null hypothesis. [1
mark]
It is an F-distribution.
C. Your employers are interested in whether the color change for each formulation
depends on the type of application. For this specific question, state:
a. The null and alternative hypothesis. [1 mark]
H0: δF1A1 = δF2A1 = δF1A2 = δF2A2 = δF1A3 = δF2A3 = 0, where δ is the difference of
each cell from additivity. HA: at least one δ is not equal to 0.
b. The observed test score (3 significant figures). [1 mark]
F_observed=4.677
c. The appropriate degrees of freedom and the associated critical test score
(3 significant figures) from the table provided at the end of the exam.
Assume a Type I error rate of 5%. [1 mark]
Numerator degrees of freedom is 2, denominator degrees of freedom is 18.
From the table, this gives a F_crit=3.555
d. Your statistical conclusion. [1 mark]
Since F_crit<F_obs, we reject the null hypothesis (0.5 marks) and conclude
that some of the cells are different from additivity (0.5 marks).

D. The following boxplot shows the results of the experiment. Labels are ‘D’ for
Dihydroxyacetone, ‘E’ for Erythrulose, ‘Cream’ for cream application, ‘Gel’
for gel application, and ‘Sol’ for solution application. Indicate directly on the
boxplot which groups are significantly different from each other based on
above R output. Use the letter scheme shown in lecture. [2 marks]
[Boxplot: Color Change (y-axis, 0 to 40) for each group. Answer letters above the boxes: D.Cream = a, E.Cream = b, D.Gel = a, E.Gel = b, D.Sol = c, E.Sol = b]

E. Use the medians in the above boxplot to draw the interaction plot below.
Draw the figure to scale and be as accurate as possible. [2 marks]

[Interaction plot axes: Color Change (y-axis, 10 to 30) against Formulation (x-axis: Dihydroxyacetone, Erythrulose), with one line for each Application (Solution, Gel, Cream)]

25) A study was published in the Lancet on the body mass index (BMI) of people from
around the globe. The following table shows a random subset of the data for 10
people. For this question, test the hypothesis that the mean BMI of people in this
sample is above the ‘healthy’ threshold of 25.

25.76 25.72 25.39 24.65 24.90 26.01 25.97 25.11 25.91 26.54

A. Indicate the statistical test that is most appropriate for this data and the
scientific question [1 mark] [single sample t-test]

B. Write the null and alternative hypotheses [1 mark]


[Ho: mu <= 25, Ha: mu > 25, where mu is the population mean BMI. The sign is
important here]
C. Calculate the observed test score. Make sure to show your work and report
your answer to three decimal places. [3 marks]
[m=25.596, sd=0.577, n=10, t=(m-25)/(sd/sqrt(n)), t=3.264]
D. If the critical score is 1.833, write your statistical conclusions and your
scientific conclusions. Include your rationale for the statistical conclusions. [2
marks]
[Since the t-observed (3.264) is greater than t-crit=1.833, we reject the null
hypothesis. We conclude that there is evidence that the mean BMI of the population
from which the sample was drawn is greater than 25.]
E. In the space below, draw the null distribution and null hypothesis for the
statistical test along with proper axis labels. Label both the null distribution
and the null hypothesis. Draw on the observed test score and shade in the
region that represents the p-value. [3 marks]
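The calculation in question 25 can be reproduced with a short script. This is a sketch using only the ten BMI values from the table and the critical score of 1.833 quoted in part D. The variable names are illustrative.

```python
import math
import statistics

bmi = [25.76, 25.72, 25.39, 24.65, 24.90, 26.01, 25.97, 25.11, 25.91, 26.54]
mu_0 = 25.0     # 'healthy' threshold used in the null hypothesis
t_crit = 1.833  # one-tailed critical score given in part D (df = 9)

n = len(bmi)
m = statistics.mean(bmi)    # sample mean
sd = statistics.stdev(bmi)  # sample SD (n - 1 in the denominator)
t_obs = (m - mu_0) / (sd / math.sqrt(n))

print(f"m = {m:.3f}, sd = {sd:.3f}, t = {t_obs:.3f}")
# One-tailed test: reject H0 only if the observed score exceeds the critical score.
print("reject H0" if t_obs > t_crit else "fail to reject H0")
```

The script reproduces the sample mean of 25.596 and an observed t-score of about 3.264, which lies in the upper tail beyond the critical score.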


26) For each of the following studies, identify and rationalize the most appropriate
statistical test (e.g., t-test, regression etc.). Include the null and alternative hypotheses
(be mindful of direction in the test), as well as the appropriate test statistic (e.g., F-
score for an F-test).

STUDY 1 A geneticist was interested in the degree to which behavior is determined
by gender. She conducted a survey of 40 random students. The scientist categorized
each student as being male or female, and their study behavior as diligent,
procrastinating or reactive. She wants to know whether study style depends on gender.

i. Identify and rationalize the most appropriate statistical test [1 mark]


This is a Chi-square analysis because both factors are categorical.

ii. State the null and alternative hypothesis [1 mark]


Ho: there is no difference between expected and observed frequencies.
HA: there is a difference between expected and observed frequencies.
(students could also use the independent/non independent terminology)

iii. Identify the appropriate test statistic [1 mark]


This is a Chi-square test and the test statistic is the Chi-square score.

STUDY 2 A sports medicine study was conducted looking at whether changing the
source of dietary proteins improved race times in marathon runners. The study asked
50 racers to increase the proportion of their dietary protein coming from plants for a
season. The researchers recorded the proportion of dietary protein derived from plants
(proportion) and the change in race time (minutes) for each racer. They wanted to test
the hypothesis that increasing the proportion of plant proteins could predict faster race
times.

i. Identify and rationalize the most appropriate statistical test [1 mark]


This is a linear regression because both variables are numerical and the
question is about prediction.

ii. State the null and alternative hypothesis [1 mark]


The null hypothesis is H0: b≥0, HA: b<0, where b is the slope.

iii. Identify the appropriate test statistic [1 mark]


The test is a t-test, and the test statistic is a t-score.


STUDY 3 Researchers are interested in whether a new form of biocontrol can
successfully manage pest insects on tea plantations compared to using pesticides.
They conduct an experimental study where 120 tea plants are randomly allocated to
either i) no biocontrol or pesticide, ii) biocontrol but no pesticide, or iii) pesticide
but no biocontrol. For each tea plant, they measured the mass consumed after
exposure to the insects.

i. Identify and rationalize the most appropriate statistical test [1 mark]


This is single-factor ANOVA because there is one categorical factor and a
numerical response variable.

ii. State the null and alternative hypothesis for the question whether there is
an effect of biocontrol. [1 mark]
H0: u1=u2=u3; HA: at least one of u1, u2, u3 differs from the others, where u is the mean
for each treatment.

iii. Identify the appropriate test statistic [1 mark]


The test statistic is an F-score.

27) Maple syrup production is looking to be good this year for sugar bush operators. You
are interested in how soil type impacts the amount of maple syrup produced per
hectare of forest, so conduct a study where you randomly sample 60 maple sugar bush
operators from each municipality in Ontario and Quebec. For each operator, you
record the soil type (humic, acidic, sandy, bog) and the amount of production (<100L
per ha, 100-200L per ha, >200L per ha).
A. Indicate the survey design used in the study [1 mark] [stratified sampling]

B. Indicate the sampling unit and observation unit [1 mark] [0.5 marks for each
of sampling unit: sugar bush, observation unit: sugar bush]

C. Indicate the type of data for each of the measurement variables in the study [1
mark] [0.5 marks for each of soil type: categorical, production: categorical]


D. The following table shows the data from the Maple Syrup Study for one of the
municipalities in question 27.
                   Humic soils   Acidic soils   Sandy soils   Bog soils
<100L per ha            8             12             5            4
100-200L per ha        13              5             1            1
>200L per ha            5              3             1            2

i. Indicate the statistical test that is most appropriate for this data and the
scientific question [1 mark] [Chi-square test]

ii. Write the null and alternative hypotheses [1 mark]


[Ho: soil type is independent of maple syrup production, Ha: soil type is not
independent of maple syrup production.]

iii. Calculate the missing expected counts in the table below under the null
hypothesis. Report your answers to one decimal place, showing your work
in the space below the table for full marks. [3 marks]
[For the first cell, it would be ((8+13+5)/60)*((8+12+5+4)/60)*60=12.6. Full
table is shown below]
Expected           Humic soils   Acidic soils   Sandy soils   Bog soils
<100L per ha          12.6            9.7           3.4          3.4
100-200L per ha        8.7            6.7           2.3          2.3
>200L per ha           4.8            3.7           1.3          1.3


iv. Calculate your observed test score to two decimal places. [2 marks]
[Observed test score is 7.70; df = (number of rows - 1) * (number of columns - 1) = 6]

v. Find the critical score and write both your statistical conclusions and
scientific conclusions. [2 marks]
[chisq-crit=12.592. Since the chisq-observed is less than chisq-crit, we fail to
reject the null hypothesis. We conclude that there is no evidence that maple
syrup production depends on soil type.]
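The expected counts and the test score in question 27 can be checked with the sketch below. It uses only the observed table from part D and the critical score of 12.592 quoted in part v. The helper function `chi_square` and the variable names are illustrative.

```python
# Observed counts: rows are production classes, columns are soil types
# (humic, acidic, sandy, bog).
observed = [
    [8, 12, 5, 4],   # <100 L per ha
    [13, 5, 1, 1],   # 100-200 L per ha
    [5, 3, 1, 2],    # >200 L per ha
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
expected_1dp = [[round(e, 1) for e in row] for row in expected]

def chi_square(obs, exp):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(obs, exp)
               for o, e in zip(obs_row, exp_row))

chi2_exact = chi_square(observed, expected)        # unrounded expected counts
chi2_rounded = chi_square(observed, expected_1dp)  # expected counts at 1 decimal place
df = (len(observed) - 1) * (len(observed[0]) - 1)
chi2_crit = 12.592  # critical score quoted in part v (df = 6, 5% level)

print(expected_1dp)
print(f"chi-square (rounded expected) = {chi2_rounded:.2f}")
print(f"chi-square (exact expected)   = {chi2_exact:.2f}, df = {df}")
print("reject H0" if chi2_exact > chi2_crit else "fail to reject H0")
```

With the expected counts rounded to one decimal place, as the question instructs, the score comes out at about 7.70. Carrying unrounded expected counts gives about 7.81. Both are well below the critical score, so the conclusion is the same either way.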

28) Explain what a null F-distribution represents. Include a clear definition of the F-score
(i.e., how is an observed F-score calculated), and then include a clear description of
what the null distribution for the F-score represents. Be as specific as possible. [3
marks]
[The F-score is the ratio of the variation among categorical groups divided by the
residual variation within a group. The null distribution for the F-score represents the
variation in that ratio you would expect from repeated sampling of a population where
there was no true difference in the means.]
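The idea in question 28 can be illustrated by simulation. The sketch below is not part of the exam: it assumes three groups of seven observations drawn from the same Normal population (mean 50, SD 10, all arbitrary choices), so the null hypothesis is true by construction and the degrees of freedom are 2 and 18. The function `f_score` and the seed are illustrative.

```python
import random
import statistics

def f_score(groups):
    """Among-group mean square divided by within-group (residual) mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.fmean(x for g in groups for x in g)
    ss_among = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    return (ss_among / (k - 1)) / (ss_within / (n_total - k))

rng = random.Random(243)  # fixed seed (arbitrary) so the run is repeatable

# Repeatedly sample three groups from ONE population, so any difference
# among group means is due to sampling error alone.
null_f = []
for _ in range(5000):
    groups = [[rng.gauss(50, 10) for _ in range(7)] for _ in range(3)]
    null_f.append(f_score(groups))

null_f.sort()
crit_95 = null_f[int(0.95 * len(null_f))]  # simulated 95th percentile
print(f"simulated 5% critical F (df = 2, 18) is roughly {crit_95:.2f}")
```

The sorted list `null_f` is an approximation of the null F-distribution. Its simulated 95th percentile should land near the tabulated critical value for 2 and 18 degrees of freedom (3.555), and an observed F-score larger than that would be unusual if there were no true difference in the means.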

FORMULAE
