
M346-201306 XXX

The document outlines the structure and requirements for the M346/E Module Examination in Linear Statistical Modelling, scheduled for June 12, 2013. The exam consists of two parts, with Part 1 requiring one essay question and Part 2 requiring three questions, each focusing on various statistical modeling concepts. Specific guidelines for answering questions, including the importance of clarity and structure, are provided, along with detailed instructions for submitting completed work.

Uploaded by

mariuszwarczak91

Copyright

© All Rights Reserved


M346/E

Module Examination 2013

Linear Statistical Modelling

Wednesday 12 June 2013 10.00 am – 1.00 pm

Time allowed: 3 hours

This examination is in TWO parts. Part 1 carries 25% of the total available marks
and Part 2 carries 75%.

You should attempt ONE question from Part 1: this question carries 25 marks. You
should attempt THREE questions from Part 2: each question in this part also carries
25 marks.

You are advised not to cross through any work until you have replaced it with another
solution to the same question (or part of question). Crossed through work will not be
marked.

In Part 1 of the paper, if you answer both questions, your better score will count
towards your result. In Part 2 of the paper, if you answer more than three questions,
your best three scores will count towards your final mark.

This question paper is rather long because of the inclusion of tranches of GenStat
output. Do not let its length put you off. In your initial reading of the paper,
you will be able to either ignore or pass over very quickly all such output.

Please start each question on a new page, and cross out rough working.

At the end of the examination

Check that you have written your personal identifier and examination number on
each answer book used. (You may well have used only one answer book.) Failure to
do so will mean that your work cannot be identified.

Put all your used answer books together with your signed desk record on top. Fasten
them in the top left corner with the round paper fastener. Attach this question paper
to the back of the answer books with the flat paper clip.

Copyright © 2013 The Open University
PART 1 (Questions 1 and 2)
You should attempt ONE question from this part of the examination,
which carries 25% of the total available marks. Each question carries
25 marks. A guide to mark allocation is shown beside each question
thus: [4].
In each question in Part 1 you are asked to write a short essay on a
topic from the course. By the word ‘essay’, we do not mean to imply
that your answer should be entirely text; formulae and mathematical
symbols, if appropriate, are allowed. However, you should think of
this as an essay question in the senses of structure and readability.
Indeed, 4 of the 25 marks will be awarded for putting the essay
together in a reasonably clear manner, including a reasonable
structure with beginning, middle and conclusion, and reasonably
concise use of language. References to specific data-based examples in
the course are not expected. However, it may be useful to illustrate
points by giving special cases, perhaps in mathematical form (e.g.
$Y \sim N(0, \sigma^2)$ is a special case of a distributional assumption, and
$\alpha + \beta_1 x_1 + \beta_2 x_2$ is a special case of a formula for a regression mean).

Question 1
Write an essay describing the role of blocking in the design and
analysis of experiments.
Your answer should include:
• a brief description of what a block is in this context; [2]
• an outline of the reasons for using blocks in the design of an
experiment; [4]
• a brief description of an experimental situation where more than
one kind of block is involved, making it clear why it is necessary
to have more than one kind; [6]
• a brief explanation of how linear models for experimental data
take account of blocks; [5]
• a brief explanation of the reason why the ANOVA commands in
software like GenStat do not routinely display p values for blocks. [4]
The remaining four marks are for the clarity and structure of your
essay. [4]

M346 June 2013 2

Question 2
Write an essay in which you describe the role of transformations in
linear and generalized linear modelling.
Your essay should include:
• a brief definition of what transformation means in this context; [2]
• a brief explanation of the main reasons for transforming either
the response variable or the explanatory variable(s) in normal
linear models, making it clear which type of variable (response,
explanatory or both) is transformed in each case; [9]
• a brief explanation of the circumstances in which fitting a
generalized linear model might be a better alternative than
transforming the data and fitting a normal model; [5]
• a brief explanation of the circumstances in which you would still
want to transform the data in fitting a non-normal generalized
linear model. [5]
The remaining four marks are for the clarity and structure of your
essay. [4]


PART 2 (Questions 3 to 7)
You should attempt THREE questions from this part of the
examination, which carries 75% of the total available marks. Each
question carries 25 marks. The mark allocation for each part of each
question is shown beside each part thus: [4].

Question 3
Toughness and fibrousness of asparagus are major determinants of
quality. A method for quantifying asparagus texture is based on
measuring the maximum shear force necessary to cut through the
spears. In a study, 18 spears of asparagus were randomly selected
from a local market, and for each spear the maximum shear force
needed to cut it and its percentage dry fibre weight were measured.
The data were recorded in a GenStat file containing the variables force
and weight, which hold the measured shear forces (in Kgf) and fibre
dry weights (in %) respectively.
(a) A scatterplot of the data is given in Figure 1.

Figure 1
(i) On the basis of this plot, would you say it is reasonable to fit
a simple linear regression model to the data? Briefly explain
your answer. [3]
(ii) The researcher who gathered the data considered
transforming the variables by taking the log of both force and
weight. Under what circumstances would that be helpful? [2]


(b) The following output is generated by GenStat from fitting a simple
linear regression model, Model A, to the (untransformed) data.
Model A
Regression analysis
Response variate: weight
Fitted terms: Constant, force
Summary of analysis
Source d.f. s.s. m.s. v.r. F pr.
Regression 1 2.8320 2.83200 224.64 <.001
Residual 16 0.2017 0.01261
Total 17 3.0337 0.17845
Percentage variance accounted for 92.9
Standard error of observations is estimated to be 0.112.
Estimates of parameters
Parameter estimate s.e. t(16) t pr.
Constant 1.7588 0.0658 26.72 <.001
force 0.008340 0.000556 14.99 <.001
(i) What is the regression equation for Model A? On the basis of
this model, calculate a point estimate for the percentage fibre
dry weight of an asparagus spear for which the maximum
cutting shear force is 120 Kgf. [3]
(ii) The output for Model A gives the required information to
test the hypothesis that the slope of the regression line is
zero. Give the value of the test statistic and report the
results of this test, stating your conclusions clearly. [3]
(iii) A composite residual plot for Model A is given in Figure 2.
Explain whether the plot indicates that the assumptions of
the simple linear model hold for this model. [5]

Figure 2
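The columns of the Model A summary table are linked by simple arithmetic, which can help when reading this kind of output. The Python sketch below recomputes them from the printed s.s. and d.f. values. It assumes that GenStat's "Percentage variance accounted for" is the adjusted R², 100(1 − residual m.s./total m.s.), which agrees with the printed 92.9; because the inputs are rounded, the results match the output only to about the printed precision.

```python
import math

# Values copied from the Model A "Summary of analysis" table.
df_reg, ss_reg = 1, 2.8320
df_res, ss_res = 16, 0.2017
df_tot, ss_tot = 17, 3.0337

ms_reg = ss_reg / df_reg          # mean square = s.s. / d.f.
ms_res = ss_res / df_res
ms_tot = ss_tot / df_tot
variance_ratio = ms_reg / ms_res  # the v.r. column

# Assumed definition (adjusted R-squared), consistent with the printed 92.9.
pct_variance = 100 * (1 - ms_res / ms_tot)
se_obs = math.sqrt(ms_res)        # "Standard error of observations"

# t statistic for the slope: estimate / s.e. from "Estimates of parameters".
t_slope = 0.008340 / 0.000556

print(f"residual m.s. = {ms_res:.5f}")
print(f"v.r. = {variance_ratio:.2f}")
print(f"percentage variance accounted for = {pct_variance:.1f}")
print(f"s.e. of observations = {se_obs:.3f}")
print(f"t for slope = {t_slope:.2f}, t squared = {t_slope ** 2:.1f}")
```

For a single-d.f. term such as the slope here, the squared t statistic (about 15²) is close to the regression v.r. (224.64): in simple linear regression the F test and the t test of zero slope are equivalent.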


(c) The asparagus spears in the study were selected from two groups,
one with green stems and the other with purple stems. It was
thought that green colour contributed to a higher percentage
fibre content in asparagus. Stem colour was recorded
in a GenStat factor called green, in which those with purple colour
were coded “0” and those with green colour were coded “1”.
To investigate whether colour had an effect on percentage of fibre
dry weight, the following two models were fitted using GenStat:
Model B: Constant + force + green
Model C: Constant + force + green + force.green
The following output was obtained.
Model B
Regression Analysis
Response variate: weight
Fitted terms: Constant + force + green
Summary of analysis
Source d.f. s.s. m.s. v.r. F pr.
Regression 2 2.9199 1.459962 192.46 <.001
Residual 15 0.1138 0.007586
Total 17 3.0337 0.178454
Percentage variance accounted for 95.7
Standard error of observations is estimated to be 0.0871.
Message: the following units have large standardized residuals.
Unit Response Residual
17 3.4900 2.13
Estimates of parameters
Parameter estimate s.e. t(15) t pr.
Constant 1.7821 0.0515 34.59 <.001
force 0.007450 0.000505 14.77 <.001
green 1 0.1644 0.0483 3.40 0.004
Parameters for factors are differences compared with the reference
level:
Factor Reference level
green 0

Model C
Regression Analysis
Response variate: weight
Fitted terms: Constant + force + green + force.green
Summary of analysis
Source d.f. s.s. m.s. v.r. F pr.
Regression 3 2.9200 0.973341 119.86 <.001
Residual 14 0.1137 0.008121
Total 17 3.0337 0.178454
Change -1 -0.0001 0.000099 0.01 0.914


Percentage variance accounted for 95.4
Standard error of observations is estimated to be 0.0901.
Message: the following units have large standardized residuals.
Unit Response Residual
17 3.4900 2.12
Estimates of parameters
Parameter estimate s.e. t(14) t pr.
Constant 1.7874 0.0719 24.85 <.001
force 0.007388 0.000765 9.66 <.001
green 1 0.152 0.125 1.22 0.244
force.green 1 0.00012 0.00105 0.11 0.914
Parameters for factors are differences compared with the reference
level:
Factor Reference level
green 0
(i) Give a simple description of the size of the effect of having
green colour on the percentage of fibre dry weight of an
asparagus spear, as identified by Model B. [1]
(ii) What can you say about whether there is a difference
between the regression slopes for the purple and green stem
groups, on the basis of Model C? [1]
(iii) With Model C, unit 17 was flagged as having a large residual
and units 6 and 15 had the highest leverage, but an index
plot of Cook statistics shows that only unit 17 has a large
value. Explain why. [3]
(iv) Explain carefully which of the three models (Models A, B
and C) most appropriately describes the relationship between
the percentage of fibre dry weight of an asparagus spear and
the shear force required to cut it. [4]
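The parameterisation used for the factor green (differences from the reference level 0, purple) can be made concrete by writing Models B and C as functions. In this sketch the function names are hypothetical and the coefficients are copied from the output above. It shows that Model B gives two parallel lines a constant 0.1644 apart, that Model C gives the green group its own slope (force estimate plus the force.green estimate), and that the 1-d.f. "Change" F statistic for adding force.green is close to the square of that term's t value, 0.11.

```python
# Coefficients copied from the Model B and Model C "Estimates of parameters".
# green is 0 for purple stems (reference level) and 1 for green stems.


def weight_model_b(force: float, green: int) -> float:
    """Fitted % fibre dry weight under Model B (common slope)."""
    return 1.7821 + 0.007450 * force + 0.1644 * green


def weight_model_c(force: float, green: int) -> float:
    """Fitted % fibre dry weight under Model C (separate slopes)."""
    return 1.7874 + 0.007388 * force + 0.152 * green + 0.00012 * force * green


# Model B: the green-minus-purple gap is the same at every force.
gaps_b = [weight_model_b(f, 1) - weight_model_b(f, 0) for f in (50.0, 100.0, 150.0)]

# Model C: each group's slope is the change in fitted weight per 1 Kgf.
slope_purple = weight_model_c(1.0, 0) - weight_model_c(0.0, 0)
slope_green = weight_model_c(1.0, 1) - weight_model_c(0.0, 1)

# "Change" line of Model C: reduction in residual s.s. (1 d.f.)
# divided by Model C's residual m.s.
f_change = (0.1138 - 0.1137) / 1 / 0.008121

print("Model B gaps:", [round(g, 4) for g in gaps_b])
print(f"Model C slopes: purple {slope_purple:.6f}, green {slope_green:.6f}")
print(f"F for force.green = {f_change:.4f}; t squared = {0.11 ** 2:.4f}")
```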


Question 4
These are data based on a 5% sample of all births occurring in
Philadelphia, USA in 1990. The sample has 1115 observations on five
variables:
black Mother is black (a factor, with 1=yes, 0=no),
educ Mother’s years of education (whole years, ranging from 0 to 17),
smoke Whether mother smoked during pregnancy (a factor, with
1=yes, 0=no),
gestate Gestational age in weeks, and
grams Birth weight in grams.
The response variable is grams, and the rest are explanatory variables.
(a) On the basis of plots of the data, it was decided that gestate
should be transformed to lgest = log(gestate) (using base e for the
logs). Give a reason why this decision might have been made. [2]
(b) Explain why it does not matter whether black (or smoke) is
recorded as a factor or as a variate in GenStat. [2]
(c) Using lgest in place of gestate, the following GenStat output gives
the correlation matrix for the variables in the data set, the results
of a multiple regression analysis of the full four-variable model,
and the results of four individual simple linear regressions of
grams on each of the explanatory variables in turn. (The
correlation matrix and full multiple regression analysis are given
direct from GenStat, except that the details of the high leverage
observations have been omitted to save space; the individual
simple linear regression results have been edited into a single
table.)


Correlations

black
educ     −0.1458
lgest    −0.1627    0.0563
smoke     0.0524   −0.2257   −0.1426
grams    −0.2565    0.1187    0.7000   −0.2281
          black     educ      lgest     smoke     grams
Number of observations: 1115

Regression analysis
Response variate: grams
Fitted terms: Constant, black, educ, lgest, smoke
Summary of analysis
Source d.f. s.s. m.s. v.r. F pr.
Regression 4 235948058. 58987015. 310.09 <.001
Residual 1110 211150093. 190225.
Total 1114 447098151. 401345.
Percentage variance accounted for 52.6
Standard error of observations is estimated to be 436.
Message: the following units have large standardized residuals.
Unit Response Residual
748 1900. −3.35
1045 4830. 3.60
Message: the following units have high leverage.
(Here GenStat gave a list of 24 units, with leverages between 0.0156
and 0.0696.)
Estimates of parameters
Parameter estimate s.e. t(1110) t pr.
Constant −15929. 623. −25.55 <.001
black 1 −178.0 27.2 −6.54 <.001
educ 10.45 6.46 1.62 0.106
lgest 5242. 168. 31.21 <.001
smoke 1 −176.3 31.6 −5.58 <.001
Parameters for factors are differences compared with the reference
level:
Factor Reference level
smoke 0
black 0

Estimates of parameters for the individual simple regressions

Parameter estimate s.e. t(1113) t pr.
black 1 −330.7 37.4 −8.85 <.001
educ 35.87 8.99 3.99 <.001
lgest 5572. 170. 32.70 <.001
smoke 1 −337.7 43.2 −7.82 <.001


What does this output suggest about which of the four possible
explanatory variables should be included in a good regression
model based on a subset of the four available variables? Explain
your answer clearly, making explicit which parts of the output
relate to each of your conclusions. Comment on the presence of
observations with high leverage. [7]
(d) The stepwise search method provided by GenStat, with M346
default choices, was applied to the dataset. Both forward
selection and backward elimination methods resulted in a model
containing lgest, black and smoke, but excluding educ.
(i) What is meant by forward selection and backward
elimination in the stepwise method, and why do they
sometimes result in different models? [4]
(ii) Does the model finally obtained by GenStat seem reasonable
given the preliminary analysis you did in part (c)? Briefly
explain your answer. [2]
(e) GenStat output for Model 1, the regression of grams on lgest,
smoke and black only, is given next.
Model 1
Regression analysis
Response variate: grams
Fitted terms: Constant, lgest, smoke, black
Summary of analysis
Source d.f. s.s. m.s. v.r. F pr.
Regression 3 235450059. 78483353. 411.98 <.001
Residual 1111 211648092. 190502.
Total 1114 447098151. 401345.
Percentage variance accounted for 52.5
Standard error of observations is estimated to be 436.
Message: the following units have large standardized residuals.
Unit Response Residual
748 1900. −3.34
1045 4830. 3.68
Message: the following units have high leverage.
(Here GenStat gave a list of 21 units, with leverages between 0.0170
and 0.0692.)
Estimates of parameters
Parameter estimate s.e. t(1111) t pr.
Constant −15798. 619. −25.54 <.001
lgest 5243. 168. 31.19 <.001
smoke 1 −187.5 30.9 −6.07 <.001
black 1 −184.0 27.0 −6.82 <.001


Parameters for factors are differences compared with the reference
level:
Factor Reference level
smoke 0
black 0
Explain whether you consider Model 1 to be a good model. [2]
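Model 1 differs from the full four-variable model only by dropping educ, so the two fits can be compared with an extra-sum-of-squares F test. A minimal sketch using the residual s.s. and d.f. printed in the two outputs: with one d.f. dropped, √F should reproduce, up to rounding, the |t| value of 1.62 reported for educ in the full model.

```python
import math

# Residual s.s. and d.f. copied from the GenStat output.
rss_full, df_full = 211150093.0, 1110      # Constant, black, educ, lgest, smoke
rss_model1, df_model1 = 211648092.0, 1111  # Model 1: educ dropped

# Extra-sum-of-squares F: (increase in residual s.s. per dropped d.f.)
# divided by the residual m.s. of the larger model.
extra_ss = rss_model1 - rss_full
f_educ = (extra_ss / (df_model1 - df_full)) / (rss_full / df_full)

# With one d.f., sqrt(F) equals |t| for the dropped term (up to rounding).
t_equiv = math.sqrt(f_educ)

print(f"extra s.s. = {extra_ss:.0f}, F = {f_educ:.3f}, sqrt(F) = {t_equiv:.3f}")
```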
(f) Model 2 was found to be another sensible model. The resulting
GenStat output from fitting this model is as follows.
Model 2
Regression analysis
Response variate: grams
Fitted terms: Constant + lgest + smoke + black + smoke.black
Summary of analysis
Source d.f. s.s. m.s. v.r. F pr.
Regression 4 237440198. 59360049. 314.27 <.001
Residual 1110 209657954. 188881.
Total 1114 447098151. 401345.
Percentage variance accounted for 52.9
Standard error of observations is estimated to be 435.
Message: the following units have large standardized residuals.
Unit Response Residual
94 4830. 3.34
119 3438. 3.33
1045 4830. 3.74
Message: the following units have high leverage.
(Here GenStat gave a list of 19 units, with leverages between 0.0174
and 0.0762.)
Estimates of parameters
Parameter estimate s.e. t(1110) t pr.
Constant −15866. 616. −25.75 <.001
lgest 5269. 168. 31.45 <.001
smoke 1 −314.8 49.8 −6.32 <.001
black 1 −230.6 30.5 −7.57 <.001
smoke 1 .black 1 204.5 63.0 3.25 0.001
Parameters for factors are differences compared with the reference
level:
Factor Reference level
smoke 0
black 0
(i) Compare Model 2 with Model 1, the model considered in
part (e), and comment on their advantages and weaknesses. [2]
(ii) Write down the regression equation for Model 2. Explain this
equation in qualitative terms. [3]
(iii) Using Model 2, compute the expected birth weight in grams
for a white smoking woman with 39 weeks gestational age. [1]
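The smoke.black term means that the fitted effect of smoking depends on whether the mother is black. The sketch below evaluates the Model 2 equation with a hypothetical helper function and coefficients copied from the Model 2 table, taking lgest as the natural log of gestational age in weeks as in part (a). The gestational age of 40 weeks is an arbitrary illustrative choice; because lgest enters additively, the smoking differences do not depend on it. The last line is the extra-sum-of-squares F for adding smoke.black to Model 1, whose square root should match the interaction's t value of 3.25 up to rounding.

```python
import math

# Coefficients copied from the Model 2 "Estimates of parameters" table.
# smoke and black are 0/1 indicators; 0 is the reference level for both.


def grams_model2(weeks: float, smoke: int, black: int) -> float:
    """Fitted birth weight (g) under Model 2; hypothetical helper name."""
    lgest = math.log(weeks)  # natural log of gestational age, as in part (a)
    return (-15866 + 5269 * lgest - 314.8 * smoke - 230.6 * black
            + 204.5 * smoke * black)


weeks = 40.0  # arbitrary illustrative gestational age
smoke_effect_nonblack = grams_model2(weeks, 1, 0) - grams_model2(weeks, 0, 0)
smoke_effect_black = grams_model2(weeks, 1, 1) - grams_model2(weeks, 0, 1)

# Extra-sum-of-squares F for adding smoke.black to Model 1 (1 extra d.f.).
f_interaction = (211648092.0 - 209657954.0) / (209657954.0 / 1110)

print(f"smoking effect, black = 0: {smoke_effect_nonblack:.1f} g")
print(f"smoking effect, black = 1: {smoke_effect_black:.1f} g")
print(f"F for smoke.black = {f_interaction:.2f}, sqrt(F) = {math.sqrt(f_interaction):.2f}")
```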


Question 5
An agricultural experiment was carried out to investigate the effects
of date of cutting and of nitrogen fertiliser on the yield of forage. The
data were saved in a GenStat spreadsheet.
The experiment was laid out in four blocks (GenStat factor block),
each of which contained four plots (GenStat factor plot). Each plot
was divided into two subplots (GenStat factor subplot), one of which
was fertilised with a nitrogen fertiliser. The GenStat factor fertiliser
records whether each subplot was fertilised or not. The four plots
within each block were each harvested on a different date (11 June,
1 July, 22 July and 12 August), with these dates recorded in the
GenStat factor cutdate. The yield of forage (GenStat variate yield) on
each subplot was recorded. Unfortunately, no data could be obtained
for three of the subplots in the experiment.
Preliminary analysis indicated that some transformation of the data
was necessary, and a log transform (base e) of the yields was carried
out (variate lyield).
(a) Explain why, in some experiments, it is necessary to split the
plots and to apply one treatment factor to whole plots and
another to subplots. [2]
(b) How should the experimenters have decided which cutting date is
used for which plot within a block, and which subplot in each
plot should have the fertiliser? Give reasons for your answer. [3]
(c) Give a possible reason why the experimenters may have chosen to
divide the field into four blocks rather than just using each of the
four cutting dates for four plots chosen from anywhere in the field. [2]
(d) The data were analysed using GenStat’s analysis of variance
commands. The resulting output, together with the (GenStat
default) residual plots (Figure 3) and a potentially useful plot of
means (Figure 4), are as follows.
Analysis of variance
Variate: lyield

Source of variation d.f. (m.v.) s.s. m.s. v.r. F pr.

block stratum 3 0.0302010 0.0100670 1.26

block.plot stratum
cutdate 3 9.8605161 3.2868387 412.57 <.001
Residual 8 (1) 0.0637342 0.0079668 8.51

block.plot.subplot stratum
fertiliser 1 0.0794758 0.0794758 84.86 <.001
fertiliser.cutdate 3 0.0153184 0.0051061 5.45 0.018
Residual 10 (2) 0.0093659 0.0009366

Total 28 (3) 9.3620650


Message: the following units have large residuals.
block 1 plot 5 subplot 1 0.0452 approx. s.e. 0.0171
block 1 plot 5 subplot 2 −0.0452 approx. s.e. 0.0171
Tables of means
Variate: lyield
Grand mean 4.0373
fertiliser no yes
3.9874 4.0871
cutdate Jun11 Jul01 Jul22 Aug12
3.1120 4.1073 4.4105 4.5192
fertiliser cutdate Jun11 Jul01 Jul22 Aug12
no 3.0287 4.0650 4.3595 4.4965
yes 3.1953 4.1496 4.4616 4.5419
Standard errors of differences of means
Table     fertiliser   cutdate   fertiliser
                                 cutdate
rep.      16           8         4
s.e.d.    0.01082      0.04463   0.04718
d.f.      10           8         9.88
Except when comparing means with the same level(s) of
cutdate                          0.02164
d.f.                             10
(Not adjusted for missing values)

Figure 3 Figure 4
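The s.e.d. values above can be recovered from the two stratum residual mean squares using the usual formulas for a balanced split-plot design. This sketch assumes balance (the output itself notes that the s.e.d.s are not adjusted for missing values), with 4 blocks, 4 cutting dates and 2 fertiliser levels. The d.f. of 9.88 for comparing fertiliser.cutdate means at different cutting dates is taken here to be a Satterthwaite approximation combining the two strata; that is an assumption, but it reproduces the printed value.

```python
import math

# Stratum residual mean squares and d.f. copied from the ANOVA table.
ms_wholeplot, df_wholeplot = 0.0079668, 8  # block.plot stratum residual
ms_subplot, df_subplot = 0.0009366, 10     # block.plot.subplot stratum residual
n_blocks, n_dates, n_fert = 4, 4, 2        # assumes a complete, balanced layout


def sed(mean_square: float, reps: int) -> float:
    """s.e. of the difference between two means, each averaged over reps units."""
    return math.sqrt(2 * mean_square / reps)


sed_fert = sed(ms_subplot, n_blocks * n_dates)   # fertiliser means (rep. 16)
sed_date = sed(ms_wholeplot, n_blocks * n_fert)  # cutdate means (rep. 8)
sed_same_date = sed(ms_subplot, n_blocks)        # fertiliser at one cutdate (rep. 4)

# Combinations at different cut dates involve both error strata.
a = ms_wholeplot
b = (n_fert - 1) * ms_subplot
sed_diff_date = math.sqrt(2 * (a + b) / (n_blocks * n_fert))

# Satterthwaite approximation to the d.f. of that combined error (assumed method).
df_satt = (a + b) ** 2 / (a ** 2 / df_wholeplot + b ** 2 / df_subplot)

for label, value in [("fertiliser", sed_fert), ("cutdate", sed_date),
                     ("fertiliser.cutdate", sed_diff_date),
                     ("same cutdate", sed_same_date)]:
    print(f"{label:20s} s.e.d. = {value:.5f}")
print(f"approximate d.f. = {df_satt:.2f}")
```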


(i) Explain briefly the meaning of the numbers in brackets in the
(m.v.) column of the analysis of variance table. [2]
(ii) Explain briefly why the rows labelled cutdate and fertiliser
appear in different strata in the analysis of variance table. [2]
(iii) Explain why there is a row labelled fertiliser.cutdate in the
analysis of variance table, and what hypothesis the figures in
this row can be used to test. [3]
(iv) Are there any aspects of the output or plots that give you
cause to suspect that the assumptions of the analysis of
variance model may not hold? Briefly explain your answer.
Suggest what further investigation or analysis you might
carry out in the light of any suspicions you may report. [5]
(v) Suppose you are asked to report on this experiment to
agriculturalists who want to know how cutting date and
nitrogen fertiliser might affect their own forage yields. Write
a brief summary of the main findings from this experiment. [6]


Question 6
A GenStat data file contains records on 32 Tibetan skulls divided into
two groups (1-17 and 18-32), denoted on the file by Y = 0 for the first
group and Y = 1 for the second. On each skull five measurements (in
millimeters) were recorded: greatest length of skull (x1), greatest
horizontal breadth of skull (x2), height of skull (x3), upper face height
(x4), and face breadth, between outermost points of cheek bones (x5).
The goal of the analysis is to estimate the probability that a skull
belongs to one of the two groups, given the values of its five
measurements.
(a) Explain briefly why logistic regression is appropriate for analysing
these data. Explain briefly whether the scatter plot of x1 against
Y in Figure 5 supports this choice. Say, with a reason, whether
you could use log-linear modelling instead. [3]

Figure 5
(b) If a logistic regression model is fitted in GenStat using General
model within the Generalized Linear Models dialogue box, what
should be entered into the field Binomial Totals for this type of
data, and why? [2]
(c) Consider the following GenStat logistic regression output:
Model A
Regression analysis
Response variate: Y
Binomial totals: ????
Distribution: Binomial
Link function: Logit
Fitted terms: Constant + x1 + x2 + x3 + x4 + x5


Summary of analysis
mean deviance approx
Source d.f. deviance deviance ratio chi pr
Regression 5 24.07 4.8149 4.81 <.001
Residual 26 20.16 0.7755
Total 31 44.24 1.4270
Dispersion parameter is fixed at 1.00.
Message: deviance ratios are based on dispersion parameter with
value 1.
Message: the following units have large standardized residuals.
Unit Response Residual
25 1.00 2.12
Message: the residuals do not appear to be random; for example,
fitted values in the range 0.00 to 0.22 are consistently larger than
observed values and fitted values in the range 0.70 to 1.00 are
consistently smaller than observed values.
Message: the error variance does not appear to be constant:
intermediate responses are more variable than small or large
responses.
Message: the following units have high leverage.
Unit Response Leverage
1 0.00 0.57
12 0.00 0.66
14 0.00 0.44
25 1.00 0.42
26 1.00 0.64
Estimates of parameters
antilog of
Parameter estimate s.e. t(*) t pr. estimate
Constant −73.5 44.4 −1.66 0.098 *
x1 0.1289 0.0966 1.33 0.182 1.138
x2 −0.323 0.180 −1.80 0.072 0.7237
x3 0.104 0.122 0.85 0.394 1.110
x4 0.335 0.230 1.46 0.145 1.399
x5 0.424 0.270 1.57 0.116 1.528
Message: s.e.s are based on dispersion parameter with value 1.
Making use of the Model A output, answer the following
questions:
(i) Explain briefly why the regression degrees of freedom takes
the value 5. [2]
(ii) In the Model A output, what does approx chi pr <.001 mean
and what does it tell us about the usefulness of the terms in
the regression model? [2]
(iii) Does Model A fit the data well or not? Explain briefly what
evidence there is for your answer in the Model A output. [3]


(iv) Are there warnings in the Model A output that can be
disregarded, and, if so, why? [3]
(v) Does overdispersion appear to be a problem for Model A?
Briefly explain why or why not. [2]
(d) It was decided to discard the variable with the largest t pr. value
in Model A, x3, and consider the resulting reduced model. The
same process was then repeated. The models found and the
t pr. values of the non-constant terms involved in them are listed
in Table 1:
Table 1
model t pr. values
Constant + x1 + x2 + x4 + x5 0.144, 0.054, 0.172, 0.093
Constant + x1 + x2 + x5 0.036, 0.055, 0.041
Constant + x1 + x5 0.028, 0.164
Constant + x1 0.006
Which of the listed models do you consider best, and why? [2]
(e) The GenStat output for the last model of Table 1 is as follows:
Model B Regression analysis
Response variate: Y
Binomial totals: ????
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, x1
Summary of analysis
mean deviance approx
Source d.f. deviance deviance ratio chi pr
Regression 1 13.32 13.316 13.32 <.001
Residual 30 30.92 1.031
Total 31 44.24 1.427
Dispersion parameter is fixed at 1.00.
Message: deviance ratios are based on dispersion parameter with
value 1.
Message: the following units have large standardized residuals.
Unit Response Residual
1 0.00 −2.14
Message: the error variance does not appear to be constant:
intermediate responses are more variable than small or large
responses.
Estimates of parameters
antilog of
Parameter estimate s.e. t(*) t pr. estimate
Constant −34.5 12.6 −2.74 0.006 *
x1 0.1914 0.0703 2.72 0.006 1.211
Message: s.e.s are based on dispersion parameter with value 1.


Using Model B, write down the fitted equation for the probability
that a skull belongs to the group for which Y = 1. [2]
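Under the logit link the linear predictor is on the log-odds scale, and a fitted probability is obtained by inverting the link. The sketch below uses the coefficients copied from the Model B output; the function name is hypothetical, and the skull lengths of 170, 180 and 190 mm are illustrative values, not data from the file. It also shows that the "antilog of estimate" column is exp(estimate), the factor by which the odds of Y = 1 are multiplied for each extra millimetre of x1.

```python
import math

# Coefficients copied from the Model B "Estimates of parameters" table.
CONSTANT = -34.5
SLOPE_X1 = 0.1914


def prob_group1(x1: float) -> float:
    """Fitted P(Y = 1) for skull length x1 (mm) under Model B (inverse logit)."""
    eta = CONSTANT + SLOPE_X1 * x1  # linear predictor: log odds of Y = 1
    return 1.0 / (1.0 + math.exp(-eta))


# "antilog of estimate": odds of Y = 1 multiply by this per extra mm of x1.
odds_ratio_x1 = math.exp(SLOPE_X1)

# Length at which the fitted log odds are zero, i.e. P(Y = 1) = 0.5.
x1_half = -CONSTANT / SLOPE_X1

print(f"odds ratio per mm = {odds_ratio_x1:.3f}")
print(f"P(Y = 1) = 0.5 at x1 = {x1_half:.1f} mm")
for x1 in (170.0, 180.0, 190.0):  # illustrative lengths only
    print(f"x1 = {x1:.0f} mm: fitted P(Y = 1) = {prob_group1(x1):.3f}")
```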
(f) All five logistic regressions of Y on the five individual
measurements were performed in turn. The regression deviances,
regression d.f.s and p values are shown in Table 2:
Table 2
d.f. Deviance p value
x1 1 13.32 < .001
x2 1 0.07 0.795
x3 1 1.74 0.188
x4 1 15.47 < .001
x5 1 8.84 0.003
Which of the explanatory variables appear to be important
predictors for Y, and why? [2]
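For each single-variable model in Table 2, the approximate p value comes from referring the regression deviance to a chi-squared distribution on 1 d.f., with the dispersion parameter fixed at 1. The standard-library sketch below relies on the fact that a chi-squared variable on 1 d.f. is the square of a standard normal, so its upper tail is erfc(√(x/2)). Because the deviances in Table 2 are rounded to two decimal places, small discrepancies from the printed p values (for example for x2) are to be expected.

```python
import math

# Regression deviances (1 d.f. each) copied from Table 2.
deviances = {"x1": 13.32, "x2": 0.07, "x3": 1.74, "x4": 15.47, "x5": 8.84}


def chi2_sf_1df(x: float) -> float:
    """P(chi-squared on 1 d.f. > x).

    If Z ~ N(0, 1) then Z**2 is chi-squared on 1 d.f., so
    P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


p_values = {name: chi2_sf_1df(dev) for name, dev in deviances.items()}
for name, p in p_values.items():
    print(f"{name}: deviance {deviances[name]:5.2f}, p = {p:.4f}")
```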
(g) It was decided to perform backward elimination of variables
based on Table 2. The resulting models are listed in Table 3:
Table 3
model t pr. values
Constant + x1 + x4 + x5 0.218, 0.148, 0.371
Constant + x1 + x4 0.207, 0.073
Constant + x4 0.006
The GenStat output from the last model in Table 3 is as follows:
Model C Regression analysis
Response variate: Y
Binomial totals: ????
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, x4
Summary of analysis
mean deviance approx
Source d.f. deviance deviance ratio chi pr
Regression 1 15.47 15.4734 15.47 <.001
Residual 30 28.76 0.9588
Total 31 44.24 1.4270
Dispersion parameter is fixed at 1.00.
Message: deviance ratios are based on dispersion parameter with
value 1.
Message: the following units have large standardized residuals.
Unit Response Residual
32 1.00 2.13
Message: the error variance does not appear to be constant:
intermediate responses are more variable than small or large
responses.


Estimates of parameters
antilog of
Parameter estimate s.e. t(*) t pr. estimate
Constant −27.99 9.89 −2.83 0.005 *
x4 0.380 0.134 2.83 0.005 1.462
Message: s.e.s are based on dispersion parameter with value 1.
Which of Model B and Model C do you consider to be better, and
why? [2]


Question 7
A study of coronary artery disease recorded the electrocardiogram
(ECG) conditions of 156 patients who voluntarily attended a clinic
and requested an evaluation. Table 4 contains data from the study
where classifications were made according to gender of the patient
(gender), the ST segment depression measure from the patient’s ECG
(STdep) and whether the patient was diagnosed with coronary artery
disease (coronary). The number of patients in each category is the
variable count. The interest in analysing these data focuses on how
other variables might affect the chance that a patient is diagnosed
with coronary artery disease.
Table 4

Gender   ST depression   Coronary
                         Yes    No
F        > 0.1           16     20
F        ≤ 0.1            8     22
M        > 0.1           42     12
M        ≤ 0.1           18     18
It was decided to analyse these data using log-linear analysis in
GenStat.
(a) Describe how these data should be entered into a GenStat
spreadsheet. Your answer should mention the number of rows and
columns that are required and describe the contents of each
column and each row. [4]
(b) The data were entered into a GenStat spreadsheet and then
analysed using Log-linear modelling in the Analysis field of the
Generalized Linear Models dialogue box. The following GenStat
output was obtained from fitting such a model (Model A).
Model A
Regression analysis
Response variate: count
Distribution: Poisson
Link function: Log
Fitted terms: Constant + gender + STdep + coronary
Summary of analysis
mean deviance approx
Source d.f. deviance deviance ratio chi pr
Regression 3 17.79 5.929 5.93 <.001
Residual 4 14.98 3.746
Total 7 32.77 4.681
Dispersion parameter is fixed at 1.00.


Message: deviance ratios are based on dispersion parameter with
value 1.
Message: the following units have large standardized residuals.
Unit Response Residual
1 16.00 −2.61
3 42.00 2.62
7 12.00 −2.56
Estimates of parameters
antilog of
Parameter estimate s.e. t(*) t pr. estimate
Constant 2.340 0.186 12.57 <.001 10.38
gender M 0.310 0.162 1.92 0.055 1.364
STdep >0.1 0.310 0.162 1.91 0.056 1.364
coronary Yes 0.525 0.166 3.17 0.002 1.690
Message: s.e.s are based on dispersion parameter with value 1.
Parameters for factors are differences compared with the reference
level:
Factor Reference level
gender F
STdep <=0.1
coronary No
(i) What is the value of the test statistic for testing whether
Model A fits the data adequately? What distribution would
this test statistic be compared to for the test? The p value for
the test turns out to be 0.0047. What do you conclude? [3]
(ii) If Model A fitted the data adequately, what would that tell
you about the relationship between the main factors (gender,
STdep and coronary)? [2]
(c) Three further models, Model B, Model C and Model D were then
fitted to the data. The resulting output is as follows.
Model B
Regression analysis
Response variate: count
Distribution: Poisson
Link function: Log
Fitted terms: Constant + gender + STdep + coronary + gender.coronary
Summary of analysis
mean deviance approx
Source d.f. deviance deviance ratio chi pr
Regression 4 19.13 4.782 4.78 <.001
Residual 3 13.64 4.547
Total 7 32.77 4.681
Dispersion parameter is fixed at 1.00.
Message: deviance ratios are based on dispersion parameter with
value 1.
Message: the following units have large standardized residuals.


Unit Response Residual
1 16.00 −2.35
3 42.00 2.38
4 18.00 2.05
6 22.00 2.11
7 12.00 −2.31
8 18.00 −2.59
Estimates of parameters
antilog of
Parameter estimate s.e. t(*) t pr. estimate
Constant 2.472 0.211 11.73 <.001 11.85
gender M 0.069 0.263 0.26 0.793 1.071
STdep >0.1 0.310 0.162 1.91 0.056 1.364
coronary Yes 0.305 0.249 1.23 0.220 1.357
gender M .coronary Yes 0.388 0.335 1.16 0.246 1.474
Message: s.e.s are based on dispersion parameter with value 1.
Parameters for factors are differences compared with the reference
level:
Factor Reference level
gender F
STdep <=0.1
coronary No

Model C
Regression analysis
Response variate: count
Distribution: Poisson
Link function: Log
Fitted terms: Constant + gender + STdep + coronary + STdep.coronary
Summary of analysis
Source      d.f.  deviance  mean deviance  deviance ratio  approx chi pr
Regression     4     18.03          4.506            4.51          0.001
Residual       3     14.74          4.914
Total          7     32.77          4.681
Dispersion parameter is fixed at 1.00.
Message: deviance ratios are based on dispersion parameter with
value 1.
Message: the following units have large standardized residuals.
Unit Response Residual
1 16.00 −3.06
3 42.00 2.75
5 20.00 2.42
7 12.00 −2.77
Estimates of parameters
Parameter                 estimate   s.e.   t(*)  t pr.  antilog of estimate
Constant                     2.398  0.217  11.04  <.001  11.00
gender M                     0.310  0.162   1.92  0.055  1.364
STdep >0.1                   0.208  0.264   0.79  0.431  1.231
coronary Yes                 0.431  0.252   1.71  0.087  1.538
STdep >0.1 .coronary Yes     0.164  0.334   0.49  0.624  1.178
Message: s.e.s are based on dispersion parameter with value 1.
Parameters for factors are differences compared with the reference
level:
Factor Reference level
gender F
STdep <=0.1
coronary No

Model D
Regression analysis
Response variate: count
Distribution: Poisson
Link function: Log
Fitted terms: Constant + gender + STdep + coronary + gender.STdep
Summary of analysis
Source      d.f.  deviance  mean deviance  deviance ratio  approx chi pr
Regression     4     18.25          4.562            4.56          0.001
Residual       3     14.52          4.840
Total          7     32.77          4.681
Dispersion parameter is fixed at 1.00.
Message: deviance ratios are based on dispersion parameter with
value 1.
Message: the following units have large standardized residuals.
Unit Response Residual
1 16.00 −2.74
3 42.00 2.71
5 20.00 2.42
7 12.00 −3.04
Estimates of parameters
Parameter                 estimate   s.e.   t(*)  t pr.  antilog of estimate
Constant                     2.412  0.210  11.48  <.001  11.15
gender M                     0.182  0.247   0.74  0.461  1.200
STdep >0.1                   0.182  0.247   0.74  0.461  1.200
coronary Yes                 0.525  0.166   3.17  0.002  1.690
gender M .STdep >0.1         0.223  0.328   0.68  0.496  1.250
Message: s.e.s are based on dispersion parameter with value 1.
Parameters for factors are differences compared with the reference
level:
Factor Reference level
gender F
STdep <=0.1
coronary No
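Each Residual line can likewise be referred to a chi-squared distribution on its d.f. This is the same kind of goodness-of-fit comparison made for Model A earlier in the question. For 3 d.f. the upper tail has the closed form erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2).

The sketch below uses only the Python standard library and applies that formula to the residual deviances printed for Models B, C and D. If SciPy is available, `scipy.stats.chi2.sf(x, 3)` should compute the same quantity.

```python
import math


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-squared on 3 d.f.

    Closed form: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Residual deviances (each on 3 d.f.) as printed in the Model B, C and D output.
RESIDUAL_DEVIANCE = {"Model B": 13.64, "Model C": 14.74, "Model D": 14.52}

for model, deviance in RESIDUAL_DEVIANCE.items():
    p = chi2_sf_df3(deviance)
    print(f"{model}: residual deviance {deviance:.2f} on 3 d.f., p = {p:.4f}")
```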
(i) Explain why the regression degrees of freedom for Model B is
the same as for Model C and Model D. [2]
(ii) Model B, Model C and Model D can each be obtained from
Model A by adding one two-way interaction. By considering
how the deviance changes, explain which of Models B, C and
D seems to be the most appropriate model. [4]
(d) Which of the four Models A, B, C and D would you choose as the
most appropriate to assess the chance that a patient is diagnosed
with coronary artery disease? Briefly explain why. [3]
(e) Data which can be analysed by means of a log-linear model can
sometimes also be analysed by means of logistic regression.
(i) Give one advantage and one disadvantage of using logistic
regression rather than log-linear modelling. [2]
(ii) Explain why the data in this question could have been
analysed using logistic regression instead of fitting log-linear
models. [2]
(iii) Some log-linear models correspond exactly to a logistic
model. Explain which one of the four log-linear Models A, B,
C and D has a logistic model that corresponds to it exactly. [3]
[END OF QUESTION PAPER]