Solutions Homework Week 6
Q1
a) Suppose an equal number of students is assigned to each design:
                  Design 1   Design 2   Total
More than 1 min         12          5      17
Less than 1 min         13         20      33
Total                   25         25      50
b)
         1st year   3rd year   Total
Yes            85        117     202
No            276        104     380
Total         361        221     582
Q2
a) See below for the marginals
                                GENDER
Lied at least once   MALE            FEMALE          Total
YES                  3 228           10 295          13 523
                     (exp=6268.3)    (exp=7254.7)
NO                   9 659           4 620           14 279
                     (exp=6618.7)    (exp=7660.3)
Total                12 887          14 915          27 802
Another problem is that percentages rather than counts are reported. Since we know the sample
size n, it is nevertheless possible to convert these percentages into counts.
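The expected counts in the Q2 table follow from the marginals: under independence, each expected count is (row total × column total) / n. A minimal Python sketch of that arithmetic, using only the observed counts from the table (the variable names are illustrative):

```python
# Observed counts from the Q2 table (rows: lied YES/NO; columns: MALE/FEMALE).
observed = [
    [3228, 10295],
    [9659, 4620],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(["YES", "NO"], expected):
    print(label, [round(value, 1) for value in row])
```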
INTEGRATIVE ASSIGNMENT
1.
a) The best two options are the mean and the median.
b) The extent to which the response time distributions are skewed would be an important thing
to know when deciding whether to use the median or the mean. With a heavily-skewed
distribution, the mean will be affected by values that are far away from the typical
observations; the median will not exhibit this behavior.
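To see the point in 1b) concretely, the sketch below compares the mean and the median before and after adding one very slow response. The response times are made up for illustration and are not data from the assignment.

```python
from statistics import mean, median

# Hypothetical response times in seconds (made up for illustration).
typical = [0.8, 0.9, 1.0, 1.1, 1.2]
with_outlier = typical + [9.0]  # one very slow response skews the data

print("mean:  ", mean(typical), "->", mean(with_outlier))
print("median:", median(typical), "->", median(with_outlier))
```

The single extreme value pulls the mean up by more than a second, while the median barely moves.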
2.
a) Assuming that a person’s sex is unrelated to whether they plan to have hip surgery, the
proportion of males in the sample should be the same as in the population at large. In order to
assess this, we can construct a table like this:
              Gender
              Male    Female    Total
Number                          80
Proportion                      1
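b) In order to do this, we would use the formula for a confidence interval for a single
population proportion:
$$CI = \frac{Y}{n} \pm z^* \sqrt{\frac{\frac{Y}{n}\left(1 - \frac{Y}{n}\right)}{n}}$$
where Y is the number of females in the sample and n is the sample size. This is the same as the formula
$$CI = \hat{p} \pm z^* \, SE_{\hat{p}}$$
with $\hat{p} = Y/n$.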
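The interval in 2b) can be computed in a few lines. This is a sketch under stated assumptions: the function name `proportion_ci` is illustrative, n = 80 is taken to be the sample total from the table in 2a), and the count y = 44 is made up, since the solutions do not report the number of females.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(y, n, confidence=0.95):
    """Confidence interval for a single population proportion."""
    p_hat = y / n
    # Critical value z* for a two-sided interval (about 1.96 for 95%).
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

# n = 80 read from the 2a) table; y = 44 is a hypothetical count.
lower, upper = proportion_ci(44, 80)
print(round(lower, 3), round(upper, 3))
```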
3.
a) Since both scores will be quantitative variables, the best plot in this case would
be a scatterplot.
b) In this case, a scatterplot would not yield as much information. The math scores
only have 4 potential values, so it would be more insightful to plot the average
IQ score for every math rating, along with standard errors. This would yield four
bars, one for each math rating.
c) Since the language-IQ plot involves two (roughly) continuous variables, a good statistic would be
either 𝑟, if the relationship is linear, or Kendall's 𝜏, if the relationship is not linear. The
math score is a little trickier because it takes only four values. The best answer, of the
statistics we have discussed so far, is Kendall's 𝜏.
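The difference between 𝑟 and Kendall's 𝜏 can be checked with a short sketch. The two functions below are plain-Python illustrations rather than any library's API; `kendall_tau_b` uses the tau-b variant, which adjusts for ties such as repeated math ratings. All data are made up.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def kendall_tau_b(x, y):
    """Kendall's tau-b, which adjusts for tied values."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: not counted
            elif dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + tied_x_only)
                 * (concordant + discordant + tied_y_only))
    return (concordant - discordant) / denom

# Monotone but non-linear relationship (hypothetical): tau is 1, r is below 1.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson_r(x, y), kendall_tau_b(x, y))

# Hypothetical IQ scores against a four-level math rating with ties.
iq = [95, 100, 105, 110, 120, 130]
math_rating = [1, 2, 2, 3, 3, 4]
print(kendall_tau_b(iq, math_rating))
```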
4.
a) You could use a box plot or a bar plot, with standard errors, for this purpose. The plot would
have two boxes/bars, and the y axis would show the RT score.
b) An independent samples t-test would probably be best, assuming that the scores
are approximately normally distributed. Otherwise, one of the non-parametric
alternatives (Wilcoxon rank sum test or permutation test) would be more appropriate.
For the Wilcoxon test, the test statistic is the sum of the ranks of all observations in one of
the groups.
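The Wilcoxon test statistic described above, the sum of the ranks of one group within the pooled sample, can be sketched as follows. The function name `rank_sum` and the RT values are hypothetical; tied values receive the average of the ranks they occupy, which is the usual convention.

```python
def rank_sum(group_a, group_b):
    """Sum of the ranks of group_a in the pooled sample (ties get average ranks)."""
    pooled = sorted(group_a + group_b)
    # Map each distinct value to the average of the 1-based positions it occupies.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of positions i+1 .. j
        i = j
    return sum(rank_of[value] for value in group_a)

# Hypothetical RT scores in milliseconds for two groups.
group_1 = [310, 295, 340]
group_2 = [280, 295, 300, 360]
w_1 = rank_sum(group_1, group_2)
w_2 = rank_sum(group_2, group_1)
print(w_1, w_2)
```

As a sanity check, the two rank sums always add up to N(N + 1)/2, where N is the pooled sample size.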
e) In this case, the appropriate statistic to build a confidence interval around is $\bar{X}_1 - \bar{X}_2$. The
formula for the confidence interval is $CI = (\bar{X}_1 - \bar{X}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$.
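The confidence interval in part e) can be sketched in Python as below. The function name `mean_difference_ci` and the data are made up. The critical value t* is passed in by hand because the solutions do not state the degrees of freedom. The value 4.303 used here is the 97.5th percentile of a t distribution with 2 degrees of freedom, a conservative choice (df = smaller sample size minus 1) that is an assumption of this example.

```python
from math import sqrt
from statistics import mean, variance

def mean_difference_ci(sample_1, sample_2, t_star):
    """CI for the difference in means: (mean_1 - mean_2) +/- t* * SE."""
    diff = mean(sample_1) - mean(sample_2)
    # variance() is the sample variance s^2 (divides by n - 1).
    se = sqrt(variance(sample_1) / len(sample_1)
              + variance(sample_2) / len(sample_2))
    return diff - t_star * se, diff + t_star * se

# Hypothetical scores for the two groups.
sample_1 = [10, 12, 14]
sample_2 = [7, 9, 11, 13]

# Assumed critical value: t* for 95% confidence with df = min(3, 4) - 1 = 2.
lower, upper = mean_difference_ci(sample_1, sample_2, t_star=4.303)
print(round(lower, 3), round(upper, 3))
```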
f) Assuming that the confidence interval is a 95% confidence interval, the corresponding one-sided test will be at α = 0.025 (a two-sided test would be at α = 0.05).
5.
a) A scatterplot would be the most appropriate plot to show whether participants
score worse one month after surgery. On the x axis would be the score before,
and on the y axis would be the score one month after. It would also be helpful
to place a line at y = x to show where we’d expect points to fall if there were no
change in the scores.