M346-201306 XXX
This examination is in TWO parts. Part 1 carries 25% of the total available marks
and Part 2 carries 75%.
You should attempt ONE question from Part 1: this question carries 25 marks. You
should attempt THREE questions from Part 2: each question in this part also carries
25 marks.
You are advised not to cross through any work until you have replaced it with another
solution to the same question (or part of question). Crossed through work will not be
marked.
In Part 1 of the paper, if you answer both questions, your better score will count
towards your result. In Part 2 of the paper, if you answer more than three questions,
your best three scores will count towards your final mark.
This question paper is rather long because of the inclusion of tranches of GenStat
output. Do not let its length put you off. In your initial reading of the paper,
you will be able to either ignore or pass over very quickly all such output.
Please start each question on a new page, and cross out rough working.
Put all your used answer books together with your signed desk record on top. Fasten
them in the top left corner with the round paper fastener. Attach this question paper
to the back of the answer books with the flat paper clip.
Copyright © 2013 The Open University
PART 1 (Questions 1 and 2)
You should attempt ONE question from this part of the examination,
which carries 25% of the total available marks. Each question carries
25 marks. A guide to mark allocation is shown beside each question
thus: [4].
In each question in Part 1 you are asked to write a short essay on a
topic from the course. By the word ‘essay’, we do not mean to imply
that your answer should be entirely text; formulae and mathematical
symbols, if appropriate, are allowed. However, you should think of
this as an essay question in the senses of structure and readability.
Indeed, 4 of the 25 marks will be awarded for putting the essay
together in a reasonably clear manner, including a reasonable
structure with beginning, middle and conclusion, and reasonably
concise use of language. References to specific data-based examples in
the course are not expected. However, it may be useful to illustrate
points by giving special cases, perhaps in mathematical form (e.g.
$Y \sim N(0, \sigma^2)$ is a special case of a distributional assumption, and
$\alpha + \beta_1 x_1 + \beta_2 x_2$ is a special case of a formula for a regression mean).
Question 1
Write an essay describing the role of blocking in the design and
analysis of experiments.
Your answer should include:
• a brief description of what a block is in this context; [2]
• an outline of the reasons for using blocks in the design of an
experiment; [4]
• a brief description of an experimental situation where more than
one kind of block is involved, making it clear why it is necessary
to have more than one kind; [6]
• a brief explanation of how linear models for experimental data
take account of blocks; [5]
• a brief explanation of the reason why the ANOVA commands in
software like GenStat do not routinely display p values for blocks. [4]
The remaining four marks are for the clarity and structure of your
essay. [4]
Question 3
Toughness and fibrousness of asparagus are major determinants of
quality. A method for quantifying asparagus texture is based on
measuring the maximum shear force necessary to cut through the
spears. A study was carried out where each of 18 randomly selected
spears of asparagus from a local market had its maximum shear force
to cut and its percentage content of dry fibre weight measured. The
data were recorded in a GenStat file containing the variables force and
weight with the measured shear forces (in kgf) and fibre dry weights
(in %) respectively.
(a) A scatterplot of the data is given in Figure 1.
Figure 1
(i) On the basis of this plot, would you say it is reasonable to fit
a simple linear regression model to the data? Briefly explain
your answer. [3]
(ii) The researcher who gathered the data considered
transforming the variables by taking the log of both force and
weight. Under what circumstances would that be helpful? [2]
Figure 2
Model C
Regression Analysis
Response variate: weight
Fitted terms: Constant + force + green + force.green
Summary of analysis
Source        d.f.      s.s.       m.s.      v.r.   F pr.
Regression       3    2.9200   0.973341    119.86   <.001
Residual        14    0.1137   0.008121
Total           17    3.0337   0.178454
Change          −1   −0.0001   0.000099      0.01   0.914
black
educ    −0.1458
lgest   −0.1627    0.0563
smoke    0.0524   −0.2257   −0.1426
grams   −0.2565    0.1187    0.7000   −0.2281
          black      educ     lgest     smoke     grams
Number of observations: 1115
Regression analysis
Response variate: grams
Fitted terms: Constant, black, educ, lgest, smoke
Summary of analysis
Source        d.f.         s.s.        m.s.     v.r.   F pr.
Regression       4   235948058.   58987015.   310.09   <.001
Residual      1110   211150093.     190225.
Total         1114   447098151.     401345.
Percentage variance accounted for 52.6
Standard error of observations is estimated to be 436.
Message: the following units have large standardized residuals.
Unit   Response   Residual
 748      1900.      −3.35
1045      4830.       3.60
Message: the following units have high leverage.
(Here GenStat gave a list of 24 units, with leverages between 0.0156
and 0.0696.)
Estimates of parameters
Parameter   estimate   s.e.   t(1110)   t pr.
Constant     −15929.   623.    −25.55   <.001
black 1       −178.0   27.2     −6.54   <.001
educ           10.45   6.46      1.62   0.106
lgest          5242.   168.     31.21   <.001
smoke 1       −176.3   31.6     −5.58   <.001
Parameters for factors are differences compared with the reference level:
Factor   Reference level
smoke    0
black    0
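As a cross-check on the summary quantities printed for the regression of grams, the following is a minimal Python sketch, assuming only the mean squares shown in the 'Summary of analysis' table. The variable names are illustrative and are not GenStat identifiers. It recomputes the variance ratio, the percentage variance accounted for (the adjusted R-squared expressed as a percentage) and the standard error of observations.

```python
import math

# Mean squares copied from the 'Summary of analysis' table for the grams regression.
regression_ms = 58987015.0
residual_ms = 190225.0
total_ms = 401345.0

# Variance ratio: regression mean square divided by residual mean square.
variance_ratio = regression_ms / residual_ms

# Percentage variance accounted for: 100 * (1 - residual m.s. / total m.s.).
pct_variance = 100.0 * (1.0 - residual_ms / total_ms)

# Standard error of observations: square root of the residual mean square.
se_obs = math.sqrt(residual_ms)

print(round(variance_ratio, 2), round(pct_variance, 1), round(se_obs))
```

The three values should agree with the printed 310.09, 52.6 and 436 up to rounding.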
block.plot stratum
cutdate                3         9.8605161   3.2868387   412.57   <.001
Residual               8   (1)   0.0637342   0.0079668     8.51
block.plot.subplot stratum
fertiliser             1         0.0794758   0.0794758    84.86   <.001
fertiliser.cutdate     3         0.0153184   0.0051061     5.45   0.018
Residual              10   (2)   0.0093659   0.0009366
Figure 3 Figure 4
Figure 5
(b) If a logistic regression model is fitted in GenStat using General
model within the Generalized Linear Models dialogue box, what
should be entered into the field Binomial Totals for this type of
data, and why? [2]
(c) Consider the following GenStat logistic regression output:
Model A
Regression analysis
Response variate: Y
Binomial totals: ????
Distribution: Binomial
Link function: Logit
Fitted terms: Constant + x1 + x2 + x3 + x4 + x5
             Yes    No
F  > 0.1      16    20
F  ≤ 0.1       8    22
M  > 0.1      42    12
M  ≤ 0.1      18    18
It was decided to analyse these data using log-linear analysis in
GenStat.
(a) Describe how these data should be entered into a GenStat
spreadsheet. Your answer should mention the number of rows and
columns that are required and describe the contents of each
column and each row. [4]
(b) The data were entered into a GenStat spreadsheet and then
analysed using Log-linear modelling in the Analysis field of the
Generalized Linear Models dialogue box. The following GenStat
output was obtained from fitting such a model (Model A).
Model A
Regression analysis
Response variate: count
Distribution: Poisson
Link function: Log
Fitted terms: Constant + gender + STdep + coronary
Summary of analysis
                                  mean   deviance   approx
Source       d.f.   deviance   deviance     ratio   chi pr
Regression      3      17.79      5.929      5.93    <.001
Residual        4      14.98      3.746
Total           7      32.77      4.681
Dispersion parameter is fixed at 1.00.
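The arithmetic behind the Model A deviance table can be checked with a short Python sketch, assuming only the deviances and degrees of freedom printed above. The dictionary name `table` is a hypothetical label chosen for illustration. Each mean deviance is the deviance divided by its degrees of freedom, and the regression and residual deviances sum to the total.

```python
# Deviances and degrees of freedom copied from the Model A summary table.
table = {
    "Regression": (3, 17.79),
    "Residual": (4, 14.98),
    "Total": (7, 32.77),
}

# Mean deviance = deviance / d.f. for each source.
mean_deviance = {source: dev / df for source, (df, dev) in table.items()}

# Regression and residual deviances should add up to the total deviance.
deviance_sum = table["Regression"][1] + table["Residual"][1]

for source, value in mean_deviance.items():
    print(f"{source:12s} {value:.3f}")
print(f"Regression + Residual = {deviance_sum:.2f}")
```

Small differences in the third decimal place from the printed mean deviances are expected, because the printed deviances are themselves rounded.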
Model C
Regression analysis
Response variate: count
Distribution: Poisson
Link function: Log
Fitted terms: Constant + gender + STdep + coronary + STdep.coronary
Summary of analysis
                                  mean   deviance   approx
Source       d.f.   deviance   deviance     ratio   chi pr
Regression      4      18.03      4.506      4.51    0.001
Residual        3      14.74      4.914
Total           7      32.77      4.681
Dispersion parameter is fixed at 1.00.
Message: deviance ratios are based on dispersion parameter with
value 1.
Message: the following units have large standardized residuals.
Unit   Response   Residual
   1      16.00      −3.06
   3      42.00       2.75
   5      20.00       2.42
   7      12.00      −2.77
Model D
Regression analysis
Response variate: count
Distribution: Poisson
Link function: Log
Fitted terms: Constant + gender + STdep + coronary + gender.STdep
Summary of analysis
                                  mean   deviance   approx
Source       d.f.   deviance   deviance     ratio   chi pr
Regression      4      18.25      4.562      4.56    0.001
Residual        3      14.52      4.840
Total           7      32.77      4.681
Dispersion parameter is fixed at 1.00.
Message: deviance ratios are based on dispersion parameter with
value 1.
Message: the following units have large standardized residuals.
Unit   Response   Residual
   1      16.00      −2.74
   3      42.00       2.71
   5      20.00       2.42
   7      12.00      −3.04
Estimates of parameters
                                                         antilog of
Parameter              estimate    s.e.    t(*)   t pr.    estimate
Constant                  2.412   0.210   11.48   <.001       11.15
gender M                  0.182   0.247    0.74   0.461       1.200
STdep >0.1                0.182   0.247    0.74   0.461       1.200
coronary Yes              0.525   0.166    3.17   0.002       1.690
gender M .STdep >0.1      0.223   0.328    0.68   0.496       1.250
Message: s.e.s are based on dispersion parameter with value 1.
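The 'antilog of estimate' column in the Model D parameter table is the exponential of each estimate, since the model uses a log link. The following minimal Python sketch, assuming only the printed estimates, recomputes that column. The dictionary names are illustrative and not part of the GenStat output.

```python
import math

# Estimates copied from the Model D 'Estimates of parameters' table (log scale).
estimates = {
    "Constant": 2.412,
    "gender M": 0.182,
    "STdep >0.1": 0.182,
    "coronary Yes": 0.525,
    "gender M .STdep >0.1": 0.223,
}

# Antilog under a log link: exponentiate each estimate.
antilogs = {name: math.exp(value) for name, value in estimates.items()}

for name, value in antilogs.items():
    print(f"{name:22s} {value:.3f}")
```

The results agree with the printed column up to rounding. The Constant differs slightly in the last digit because the printed estimate 2.412 is itself rounded.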