
Exercises

This document provides instructions and exercises for students taking the course "INTHE4020 Introduction to quantitative methods" using Stata. The exercises guide students through common quantitative analysis tasks like planning a study, entering data, descriptive statistics, frequencies, and recoding variables. Instructions are given for each analysis task with examples using sample health and survey data provided in Stata format. Students are asked questions to help them understand and evaluate the results of the different analyses.

Uploaded by

maddy.nesteby

Exercises for INTHE4020, Introduction to quantitative methods (version for Stata)


Datafiles for the exercises can be found in Canvas.

Exercise 1 (Wednesday 8.11): Planning of studies

You want to investigate the following rather broad research question:

• How does parents’ socioeconomic status influence children’s health?


a) Narrow the question down to something that is manageable in a single study.

b) How would you plan such a study? Here you will need to describe your study
design, who should be included in the study, how you would measure your exposure
and your outcome (you may have to look this up), etc.

The main idea behind this exercise is to spend some time discussing different
aspects of the planning of such a study, so don’t rush it!

Exercise 2 (Wednesday 8.11): Planning of studies


You are planning a study to investigate the effect of intake of antioxidants on the risk
of lung cancer among smokers. Antioxidants are nutrients typically found in fruits,
vegetables and berries, but also in coffee and dark chocolate. You can also take
antioxidant supplements in the form of pills. In planning the study, you have to
choose a study design. You can choose among the following designs:
• Cross-sectional
• Case-control
• Cohort
• Randomized trial

A
Which of these designs can be used to investigate the research question above? Give
pros and cons of each design.

B
Choose one of these designs, describe briefly what you would do and argue why you
find this most appropriate. Note that there are several possible answers here.

As for Exercise 1, spend some time discussing these different options.

Exercise 3: Entering data + descriptive analysis of continuous data (Friday 10.11)
Getting started…
Start the computer and log on to your VMware Horizon Client. Once inside, choose
Statistics fullscreen (Statistikk fullskjerm). Start Stata from the Start menu.
Alternatively, if you have downloaded Stata to your own computer, just start it from
the Start menu.

A
The following data contain the weight (in kilograms) of 20 students:

Girls:
50 75 70 74 68 83 65 66 65 53
Boys:
65 75 84 55 73 95 72 94 67 65

We are going to enter these data into Stata:

The first thing to do is to define the two variables Sex and Weight.

Click Data – Create or change data – Create new variable on the menu line.

Give a Variable name (Sex), choose "Fill with missing data" and click OK.

Notice that Stata produces the following code (commands are case-sensitive): generate Sex = .

Repeat for next variable.

Next, we will enter our observations.

Choose the Data Editor window (Edit mode) from the Stata menu.
You can now enter your data.
When you enter Sex, you may code girl = 1 and boy = 2.

I have also included an ID variable to identify each subject. It takes the values 1 to
20. This is not very important in this case, but it is a good habit and can prove very
useful when you deal with larger data sets.

Value labels:
We would like to be able to link the correct gender to the coding of the variable 'Sex'.

In the Data Editor window, right-click "Sex", choose Data – Value labels – Manage
value labels.
In the next window, click "Create label".
Next, give a label name (e.g. sexl) and give labels to the values '1' and '2' (girl and
boy).
Click OK
Finally, we need to link the labels to our variable 'Sex'. We do that by right-clicking
"Sex" again, choose Data – Value labels – Attach label to variable Sex. Choose 'sexl'.

Notice again that Stata has produced some code:


label define sexl 1 "Girl" 2 "Boy"
label values Sex sexl

It’s advisable to save the datafile now, to avoid losing it.
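
If you prefer typing commands to clicking through the Data Editor, the same data can be entered with the input command. This is only a sketch: the variable name ID and the file name weights.dta are assumptions and not part of the instructions above.

```stata
* Sketch: enter the 20 weights directly (ID and the file name are assumed)
clear
input ID Sex Weight
 1 1 50
 2 1 75
 3 1 70
 4 1 74
 5 1 68
 6 1 83
 7 1 65
 8 1 66
 9 1 65
10 1 53
11 2 65
12 2 75
13 2 84
14 2 55
15 2 73
16 2 95
17 2 72
18 2 94
19 2 67
20 2 65
end

* Attach the value labels: 1 = Girl, 2 = Boy
label define sexl 1 "Girl" 2 "Boy"
label values Sex sexl

* Save the file in the current working directory
save weights.dta, replace
```

Running list afterwards is a quick way to check that all 20 rows were entered as intended.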

B
So we have two variables: Sex which is categorical and Weight which is continuous.
We are mainly interested in the weight data, and will use sex to group the data.

We are going to do a simple descriptive analysis in Stata.

On the menu line, choose Statistics – Summaries, tables, and tests – Summary and
descriptive statistics – Summary statistics.

Choose Weight under Variables, by clicking the small arrow to the right. Click also
Display additional statistics under Options.

Choose by/if/in on top, click Repeat command by groups, and choose Sex as group.

Click OK.

We would also like to create a histogram of the weight data.
On the menu line, choose Graphics – Histogram. Choose Weight as Variable and
choose Percent under Y axis. Go to By on top, and click Draw subgraph …. Choose
Sex and click OK.

In the table of descriptive statistics, which values do you know the meaning of? Do
the boys and girls appear to be different?
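
The menu steps in this part correspond roughly to the two commands below. This is a sketch that assumes the variables are named Sex and Weight exactly as defined in part A.

```stata
* Descriptive statistics for Weight, separately for girls and boys
by Sex, sort : summarize Weight, detail

* Histogram of Weight with percent on the Y axis, one panel per sex
histogram Weight, percent by(Sex)
```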

C
You will often need to compare observed distributions with the theoretical normal
distribution. For that purpose, we add a normal curve to the histograms: Redo the
histogram as above, but in addition, choose Density plots on top and click Add
normal-density plot.

Do the data look like they are normally distributed?

D
Try to create a Box plot. Choose Graphics on the menu line and go from there. What
can we learn from the Box plot? (Use the Help graph box for a description of Box
plots – move down to Description.) Can you identify any “suspicious” observations?
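
For parts C and D, the following commands are one possible equivalent of the menu choices. They again assume the variable names Sex and Weight from part A.

```stata
* Histogram with a normal density curve overlaid, one panel per sex
histogram Weight, percent normal by(Sex)

* Box plot of Weight for girls and boys side by side
graph box Weight, over(Sex)
```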

Exercise 4 (Friday 10.11): Frequencies and Recoding of variables
Open the data file SURVEY.DTA in Stata: Choose File – Open …

This is a real data file, condensed from a study that was conducted by a group of
graduate students in Educational Psychology. The study was designed to explore the
factors that impact on respondents' psychological adjustment and wellbeing. The
survey contained a variety of validated scales measuring constructs that the extensive
literature on stress and coping suggests influence people's experience of stress. These
scales are, however, not included in the current data file. The survey was distributed
to members of the general public in Melbourne, Australia and surrounding districts.
The final sample size was 439, consisting of 42 percent males and 58 percent females,
with ages ranging from 18 to 82 (mean=37.4).

The variable source denotes each person’s main source of stress. To see the coding of
this variable, open the Data Editor window (Edit mode) from the Stata menu, right-
click source and choose Data – Value labels – Manage value labels and click on the
relevant 'plus'-sign.

We wish to assess important sources of stress, and whether there are gender differences.

A
We will first evaluate the common frequency distribution. Choose Statistics –
Summaries, tables, and tests – Frequency tables – One-way table. Choose source
under ‘Categorical variable’ and click OK.

It is also important to be aware of missing values. Redo the procedure above, but click
"Treat missing values like other values" in the panel.

Notice the differences in the Percent column.

Next, we want to create a Bar chart of the same data. Choose Graphics – Bar chart –
Tick: Graph of percent of frequencies within categories. Under “Categories”, tick:
Group 1 and add grouping variable source. Alternatively, you may run the command:
graph bar, over(source). Here, the different sources of stress appear a bit messy. We
can fix that by using the graph editor (in the graph, choose File – Start graph editor),
click the legend (the messy part below the graph) and choose 'Label angle 45°' on top.

What seems to be the main source of stress?
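
The steps in part A can also be run as commands. The sketch below uses only the variable source; rotating the labels directly in the command is an alternative to the graph editor, not the procedure described above.

```stata
* Frequency table of source, first without and then with missing values
tabulate source
tabulate source, missing

* Bar chart of percentages with the category labels rotated 45 degrees
graph bar, over(source, label(angle(45)))
```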

B
It is interesting to do the analyses separately for men and women. We can use the
following procedure:

Statistics – Summaries, tables, and tests – Frequency tables – Two-way tables with
measures of association. Choose source on the rows and sex on the columns. Tick the
"Within-column relative frequencies" box (why?). Click OK.

We can also ask for a Bar chart for each gender, separately by graph bar,
over(source) by(sex). In the menu, you do as above, but click also “By” on the top
line, tick “Draw subgraphs …” and choose sex under “Variables”.

What do you see?


Do there appear to be differences between the genders in what they consider to be
the most important source of stress?
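
As a sketch, the two-way table and the separate bar charts in part B can be produced with the commands below. The column option requests percentages within each column, that is, within each sex.

```stata
* Source of stress by sex, with percentages within each column (each sex)
tabulate source sex, column

* One bar chart per sex
graph bar, over(source) by(sex)
```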

C
You will find a variable named age which contains the age of all the subjects.

For practical data analysis, we will categorize age into three groups: 18-25 years, 26-
40 years and over 40 years.
This can be done by the following commands (make sure you understand what is
being done):

generate agegr = .
replace agegr = 1 if (age >= 18) & (age <=25)
replace agegr = 2 if (age >= 26) & (age <=40)
replace agegr = 3 if (age > 40)

Alternatively, by the menu: Data – Create or change data – Change contents of variable.
Under “New contents”, write “1 if (age >= 18) & (age <= 25)”, etc.

You may also add a Value label to each numerical category to show what each value
(1, 2, 3) represents, as we did in Exercise 3.

You can control the recoding by making a frequency table as in question B), choosing
age as row variable and agegr as column variable. Alternatively, run the command
tabulate age agegr.

Evaluate the frequency table: Was the coding done as you intended?
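
One detail worth checking is how missing ages are handled. Stata treats a missing value as larger than any number, so a condition such as age > 40 is also true when age is missing. The sketch below is a slightly more defensive variant of the recoding; the label name agegrl and the label texts are assumptions.

```stata
* Sketch: same recoding, but never assign a group when age is missing
capture drop agegr
generate agegr = .
replace agegr = 1 if (age >= 18) & (age <= 25)
replace agegr = 2 if (age >= 26) & (age <= 40)
replace agegr = 3 if (age > 40) & !missing(age)

* Value labels for the three groups (label name agegrl is assumed)
label define agegrl 1 "18-25 years" 2 "26-40 years" 3 "Over 40 years"
label values agegr agegrl

* Check the recoding, including any missing values
tabulate age agegr, missing
```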

Exercise 5: VAS (Friday 10.11)


Visual analogue scales (VAS) are commonly used to quantify different sorts of
subjective experience. Typical examples are pain and quality of life (or certain aspects
of quality of life).
An example of the use of such a scale could be the following:
“Indicate your current level of pain on the following scale”

No pain |--------------------------------------------------| Unbearable pain

An open question for discussion: What do you think about this type of measure?
The discussion can be guided by the following points:
i) What does it mean to have 3.1cm pain?
ii) If one person ticks 3.1cm pain and another person ticks 4.5cm pain,
what can we say about the level of pain in these two persons?
iii) If one person ticks 3.1 cm pain and then the next day ticks 4.5cm, what
can we say about the change in pain?
iv) Is a reduction from 2cm to 1cm pain the same as a reduction from 6cm
to 5cm?
What do we think about validity and reliability?

Exercise 6: More descriptive statistics (Friday 10.11 / Monday 13.11)
A
Open the data set BIRTH.DTA in Stata.

From the description of the study at the University of Massachusetts:


“The goal of this study was to identify risk factors associated with giving birth to a
low birth weight baby (weighing less than 2500 grams). Data were collected on 189
women, 59 of which had low birth weight babies and 130 of which had normal birth
weight babies. Four variables which were thought to be of importance were age,
weight of the subject at her last menstrual period, ethnicity, and the number of
physician visits during the first trimester of pregnancy.”

Create a histogram of the children’s birth weight, that is, of the variable bwt. Find
the mean and median, as well as the minimum and maximum values. Find the standard
deviation.

Go to Graphics – Histogram and choose the bwt variable. Alternatively, use the
command histogram bwt.

Do also a descriptive analysis by Statistics – Summaries, tables, and tests – Summary
and descriptive statistics – Summary statistics. Click also Display additional statistics
under Options.

Numerical answers: mean=2944.7, median=2977, min=709, max=4990, stdev=729.02.

B
Make histograms and find mean and standard deviation for bwt for smoking and
nonsmoking mothers, respectively. Repeat the procedures under A, but by smk (use
the menu or use commands). The commands will be
histogram bwt, by(smk)
and
by smk, sort : summarize bwt, detail

Does it seem like smoking during pregnancy affects the children’s birth weight?

C
Make histograms and find mean and standard deviation for bwt for the different
values of ht. Does it seem like hypertension affects the birth weight? How many of
the mothers had "History of hypertension"?

• What seems to be most important in explaining bwt: smk or ht?
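
For part C, the commands from part B carry over directly with ht in place of smk. A minimal sketch:

```stata
* Histograms of birth weight for each value of ht
histogram bwt, by(ht)

* Mean, standard deviation and further statistics for each value of ht
by ht, sort : summarize bwt, detail

* Number of mothers in each category of ht
tabulate ht
```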

Exercise 7 (Monday 13.11): Measure of central tendency
(Solve this exercise both with and without use of Stata)

Calculate mean and median for the following small dataset:

2,11,4,5,3

Comment on the result.
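
For the Stata part, one possible approach is sketched below. The variable name x is an assumption; note that clear removes any data currently in memory.

```stata
* Sketch: enter the five values and let Stata report mean and median
clear
input x
2
11
4
5
3
end

* The detail option adds percentiles; the 50% percentile is the median
summarize x, detail
display "mean = " r(mean) "   median = " r(p50)
```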

Exercise 8 (Not in use): Normal distribution and reference range
Medical laboratories define a reference range for what we can think of as "normal"
values for different parameters, typically based on an assumed normal distribution.
For the age 18 to 29 years, the normal population has a mean cholesterol level of 4.5
mmol/l with SD = 0.81 mmol/l. Assume a person gets his/her cholesterol level
measured as 6.8 mmol/l. Does this value lie inside the 95% reference range? What is
the reference range?
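
Assuming the 95% reference range is defined as the mean plus or minus 1.96 standard deviations of a normal distribution, the limits can be computed with display. This is a sketch of the arithmetic only.

```stata
* 95% reference range under a normal distribution: mean +/- z * SD
* invnormal(0.975) returns the 97.5th percentile of the standard normal (about 1.96)
display "lower limit: " (4.5 - invnormal(0.975)*0.81)
display "upper limit: " (4.5 + invnormal(0.975)*0.81)
```

Compare the measured value of 6.8 mmol/l with the two limits.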

Exercise 9 (Monday 13.11): Confidence intervals


Open the data set BIRTH.DTA in Stata (see Exercise 6 for a description of the data).

We are interested in the expected birth weight among smokers and non-smokers.
Calculate the mean birth weight with corresponding 95% confidence intervals for
smokers and non-smokers by going to Statistics - Summaries, tables, and tests –
Summary and descriptive statistics – Means. Choose bwt under ‘Variables’, click
if/in/over on top and tick ‘Group over subpopulations’. Include smk as Group
variable. Alternatively, run the command mean bwt, over(smk).

Comment on the confidence intervals. Do you understand how they are calculated
(look at mean and standard error)?
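
To see how such an interval is built from the mean and the standard error, the calculation can be repeated step by step. The sketch below assumes that smokers are coded smk == 1 and uses hypothetical scalar names. The result may differ slightly from the output of mean with over(), because that command may use different degrees of freedom.

```stata
* Sketch: 95% confidence interval for mean bwt among smokers (smk == 1)
quietly summarize bwt if smk == 1
scalar ci_m  = r(mean)                       // group mean
scalar ci_se = r(sd) / sqrt(r(N))            // standard error = SD / sqrt(n)
scalar ci_t  = invttail(r(N) - 1, 0.025)     // t quantile with n - 1 degrees of freedom

display "mean = " ci_m "   standard error = " ci_se
display "95% CI: " (ci_m - ci_t*ci_se) " to " (ci_m + ci_t*ci_se)
```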

Exercise 10 (Monday 13.11): More on confidence intervals
We are still working with BIRTH.DTA.
In Exercise 9, we were interested in the expected birth weight. Now, we will focus on
the probability of having a baby with low birth weight (<2500 grams), and we are
interested in calculating a 95% confidence interval for this probability.

Go to Statistics - Summaries, tables, and tests – Summary and descriptive statistics –
Proportions and move the relevant variable (low) to ‘Variables’ to estimate the
probability of having a baby with low birth weight. Alternatively, run the command
proportion low. Make sure you understand how the confidence interval is calculated
(see formula in today’s lecture).
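
The lecture formula is not reproduced here. Assuming it is the usual normal-approximation interval, p ± 1.96 × sqrt(p(1 − p)/n), the calculation can be sketched with the counts given in Exercise 6 (59 low birth weight babies among 189 women). The scalar names are hypothetical. Stata's proportion command may use a different method by default, so its interval need not match this one exactly.

```stata
* Sketch: normal-approximation 95% CI for the proportion with low birth weight
scalar p_low  = 59/189
scalar se_low = sqrt(p_low*(1 - p_low)/189)

display "estimate: " p_low
display "95% CI: " (p_low - invnormal(0.975)*se_low) " to " (p_low + invnormal(0.975)*se_low)
```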

Exercise 11 (Monday 13.11): Table 1


In a research paper, Table 1 is typically used to characterize the study participants /
study groups. An example, taken from a randomized trial comparing two
chemotherapy regimens in patients with newly diagnosed Ewing sarcoma, is the
following (where “VIDE” and “VDC plus IE” denote the two treatment groups):

Use the data in BIRTH.DTA to set up a Table 1 for a study where you are interested
in comparing smoking and non-smoking mothers with regard to the risk of low
birthweight babies. The idea is to describe the two groups (smokers and non-smokers)
with regard to a number of other characteristics that might influence the risk of low
birthweight. Use what we have learnt about descriptive statistics to pick relevant
summary measures, and use commands from the previous exercises to compute. You
may use the following table as a starting point:

                                          Smokers (n = )    Non-smokers (n = )
Age
Ethnicity
Hypertension
Weight
No. visits to physician, 1st trimester
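
As a starting point, the sketch below shows the kind of commands that could fill in some of the rows. It uses only variable names that appear elsewhere in these exercises (smk, ethn, ht, fvt); the names of the age and weight variables in BIRTH.DTA are not given here, so look them up with describe first.

```stata
* List the variables in the file to find the names for age and weight
describe

* Group sizes for the column headers
tabulate smk

* A count variable: number of physician visits in the first trimester
by smk, sort : summarize fvt, detail

* Categorical variables: counts and percentages within each smoking group
tabulate ethn smk, column
tabulate ht smk, column
```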

Exercise 12 (Tuesday 14.11): Independent samples t-tests
Stata is assumed to be running with the data set BIRTH.DTA open.

A
Is birth weight associated with the mother’s smoking habits? Formulate hypotheses
and perform an independent samples t-test comparing smokers and non-smokers! Try
to find your way through the menu (Start at Statistics – Summaries, tables, and tests –
Classical tests of hypotheses …), or just run the following command:

ttest bwt, by(smk)

• Formulate a conclusion!

• Discuss the validity of this conclusion.

Several aspects are relevant for this discussion. Tip: sample, representativeness,
assumptions in the model, e.g., if the data are normally distributed in each group. You
may have a look at histograms or normal probability plots (run e.g.

qnorm bwt if smk==1

for a normal probability plot for smokers).

Menu: Graphics – Distributional graphs – Normal quantile plot. Choose bwt as
Variable, go to if/in on top and specify ‘smk==1’ or ‘smk==0’.

• Why is the assumption of normally distributed groups not important in this
particular case?
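
To support the discussion of the model assumptions, a few additional checks are sketched below. They are not part of the exercise as given; they assume the coding smk == 0 for non-smokers and smk == 1 for smokers.

```stata
* Compare the spread of bwt in the two groups
sdtest bwt, by(smk)

* t-test that does not assume equal variances in the two groups
ttest bwt, by(smk) unequal

* Normal quantile plots for each group
qnorm bwt if smk == 0
qnorm bwt if smk == 1
```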

Exercise 13 (Tuesday 14.11): T-tests


In this exercise, we will be working with data from a randomized study on pain
management among cancer patients with pain from bone metastasis. The patients
were randomized to either an educational group with the goal to increase
knowledge about pain management, or a control group that received no intervention.
We will focus on pain. We have four different measurements of pain, before and after
treatment (pain now, average pain, worst pain, least pain), all measured on a 10 cm
VAS. We will focus on pain now and average pain. The data can be found in the file
PAIN.DTA.
We will start out by analyzing “pain now”. A useful starting point will be to have a
look at some descriptive statistics.
a) What is the mean score of the variable Pain_now_baseline in the two groups?
Do you find the groups comparable?
On the menu line, choose Statistics – Summaries, tables, and tests – Summary
and descriptive statistics – Summary statistics. Choose by/if/in and tick Repeat
command by groups. Include the variable group under Variables that define
groups.
Command: by group, sort : summarize Pain_now_baseline
b) Next, we are interested in whether there has been a change in pain from before
to after treatment in the two groups. We will formally test this by a paired-
samples t-test within each of the two treatment groups. This involves the two
variables Pain_now_baseline and Pain_now_retest.
Carry out the tests, describe your findings and conclude.
Menu: Statistics – Summaries, tables, and tests – Classical tests of hypotheses
- t-test. Choose Paired and include the two relevant variables.
Now we want to run the test by group, so you will have to do as in question a);
choose by/if/in …
Command: by group, sort : ttest Pain_now_baseline == Pain_now_retest
c) Notice that we can reproduce exactly the same test as above by doing a one-
sample t-test on the difference between baseline and after treatment (the
variable Pain_now_diff). Try this by choosing One-sample instead of Paired,
and specify ‘0’ under Hypothesized mean.
Command: by group, sort : ttest Pain_now_diff == 0
d) Next, we will perform what would be considered the main analysis of these
data, a comparison of the two treatment groups with regard to pain change.
We will do this by running an independent samples t-test.
Menu: As above, but choose Two-sample using groups (you might have to
remove the group under by/if/in).
Command: ttest Pain_now_diff, by(group)

e) How do you conclude? Emphasize not only the result of the hypothesis test,
but also the estimated changes and differences.
f) Repeat the same analysis for average pain.
g) How do you conclude about the effect of the educational program, based on
what you have now seen?

Exercise 14 (Not in use): Paired samples t-tests
Stata is assumed to be running with the data set BIRTH.DTA open.

A
Does the mean number of doctor visits in the first trimester differ from the mean
number of visits in the third trimester?

Again, you may try to find your way through the menu (you are supposed to run a
paired samples t-test with the variables fvt and ttv), or just run the command:

ttest fvt = ttv

What is the conclusion here? Why is it appropriate to use the paired samples test in
this case?

You may also have a look at normality here by creating a new variable that contains
the difference between fvt and ttv:

generate diff = fvt - ttv
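
One possible way to look at the normality of the differences is sketched below. The capture drop line only makes the sketch safe to rerun if diff already exists.

```stata
* Difference between first- and third-trimester visits
capture drop diff
generate diff = fvt - ttv

* Histogram with a normal curve, and a normal quantile plot
histogram diff, normal
qnorm diff
```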

Exercise 15 (Wednesday 15.11): Cross tables


Stata is assumed to be running with the data set BIRTH.DTA open.

A
Make a cross table of the association between low and smk. Does smoking seem to
affect the probability of giving birth to an underweight child (<2500g)?

A cross table is created by choosing Statistics – Summaries, tables, and tests –
Frequency tables – Two-way tables with measures of association.
Choose smk under Rows and low under Columns. Go to Cell contents and indicate
how you want the percentages to be created in this table. In this case, it is appropriate
to percentage by rows (tick ‘Within-row relative frequencies’).
Alternatively, by commands: tabulate smk low, row.

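Result:

smoking     |            low
status      |  bwt > 2500   bwt < 2500 |     Total
------------+--------------------------+----------
Non-smoker  |          86           29 |       115
            |       74.78        25.22 |    100.00
------------+--------------------------+----------
Smoker      |          44           30 |        74
            |       59.46        40.54 |    100.00
------------+--------------------------+----------
Total       |         130           59 |       189
            |       68.78        31.22 |    100.00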

• Why was it appropriate to percentage by the rows, that is, by the variable smk,
in this case?

B
Make a cross table of the association between low and ht.

• What seems to be more important in predicting low birth weight - smk or ht?

C
Create a cross table of the association between low and ethn. Comments?
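
Parts B and C follow the same pattern as part A. A minimal sketch, with the explanatory variable on the rows and row percentages:

```stata
* Low birth weight by history of hypertension
tabulate ht low, row

* Low birth weight by ethnicity
tabulate ethn low, row
```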

Exercise 16 (Wednesday 15.11): Cross tables


We will return to the file SURVEY.DTA that we used in Exercise 4. We are
interested in whether there is an association between gender and smoking in this
population.

A
To answer this, we will create a cross table in Stata and run a chi-square test:
Choose Statistics – Summaries, tables, and tests – Frequency tables – Two-way tables
with measures of association.
Choose sex under Rows and smoke under Columns. Go to Cell contents and indicate
how you want the percentages to be created in this table. Tick Pearson's chi-squared
under Test statistics.

Conclusion?

B
Are the assumptions behind the chi-square test met? Repeat the procedure above, but
tick Expected frequencies under Cell contents.

Are there any cells with expected frequency < 5?
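
Parts A and B can be combined in a single command. The sketch below assumes the variables are named sex and smoke, as in the menu instructions above.

```stata
* Cross table with row percentages, expected frequencies and Pearson's chi-squared test
tabulate sex smoke, row expected chi2
```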

C
What is the odds ratio of smoking for men vs women? What does this mean?

Here we have to go to a different procedure in Stata:
Statistics – Epidemiology and related – Tables for epidemiologists. Choose Cohort
study risk ratio etc.
Use sex as exposure variable and smoking status as case variable.
Go to Options and tick Report odds ratios.

Calculate also the odds ratio by hand to make sure you understand the calculation.

Notice: This Stata procedure will only work if the relevant variables are coded zero
and one!
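
The hand calculation can be sketched with the counts from the smoking and low birth weight table in Exercise 15. These are BIRTH.DTA counts used purely as an illustration; they are not the answer for SURVEY.DTA. With a = exposed cases, b = unexposed cases, c = exposed non-cases and d = unexposed non-cases, the odds ratio is (a × d)/(b × c).

```stata
* Illustration with the Exercise 15 table:
*   a = 30 smokers with low bwt,    b = 29 non-smokers with low bwt
*   c = 44 smokers with normal bwt, d = 86 non-smokers with normal bwt
display "odds ratio by hand: " (30*86)/(29*44)

* The immediate form of cs takes the four counts in the order a b c d
csi 30 29 44 86, or
```

The same pattern applies to sex and smoke once you have read the four cell counts off your own cross table.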

Exercise 17 (Monday 20.11): Correlation


Stata is assumed to be running with the file PULSE.DTA open.
These are data from an exercise in which a group of new students worked in pairs.
The data consist of two repeated measurements of the pulse of each student:
first the students were asked to measure their own pulse (minpuls), next the neighbor
was asked to measure this student’s pulse (nabopuls). The main idea behind the
exercise was to find out whether the pulse was higher when measured by a
“stranger”.

Create a scatter plot of the association between minpuls and nabopuls.

Graphics – Twoway graphs. Click ‘Create’, and then you will see that ‘Scatter’ is the
default. You just have to specify the x-variable and the y-variable. After that, click
‘Accept’ and OK.
Alternatively:
twoway (scatter nabopuls minpuls)

• Does there appear to be a linear association between the variables?

Calculate the Pearson correlation coefficient between nabopuls and minpuls. This can
again be done through the menu by Statistics – Summaries, tables, and tests –
Summary and descriptive statistics – Pairwise correlations. Specify the variables in
question and tick 'Print significance level for each entry'.
The easier solution is to run the code: pwcorr minpuls nabopuls, sig.

What is the correlation? Can you conclude that there is good agreement between the
measures?
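
A high correlation does not by itself show that the two measurements agree, so it can help to look at the differences directly. The sketch below goes beyond the exercise as given; the variable name pulsediff is an assumption.

```stata
* Difference between the neighbor's and the student's own measurement
capture drop pulsediff
generate pulsediff = nabopuls - minpuls
summarize pulsediff, detail

* Paired comparison of the two measurements
ttest nabopuls == minpuls

* Scatter plot with the line of perfect agreement (y = x) added
twoway (scatter nabopuls minpuls) (function y = x, range(minpuls))
```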

Exercise 18 (Monday 20.11): Simple linear regression
In the table below, you will find data describing the relationship between age and
blood pressure of 20 healthy adults. Enter the data in Stata.

Age Blood pressure

20 120
43 128
63 141
26 126
53 134
31 128
58 136
46 132
58 140
70 144
46 128
53 136
70 146
20 124
63 143
43 130
26 124
19 121
31 126
23 123

Find the correlation between age and blood pressure and test if it is significant. See
commands above.
Perform also a linear regression analysis with blood pressure as dependent variable
and age as independent variable.

Statistics – Linear models and related – Linear regression. Specify the dependent (Y)
and the independent (X) variables.

Alternatively, write the command reg BP Age (if you name the variables like this …).

Read off the 95% confidence interval for the regression parameter. Find also the
squared correlation coefficient between age and blood pressure. What does it mean?
What is the expected blood pressure for a person at age 40? For a person at age 80?
Comment.
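
The whole exercise can be run from commands, as sketched below. The variable names Age and BP follow the naming suggested above. Note that clear removes any data currently in memory, and that age 80 lies outside the observed age range (19 to 70).

```stata
* Sketch: enter the 20 observations (variable names Age and BP as suggested above)
clear
input Age BP
20 120
43 128
63 141
26 126
53 134
31 128
58 136
46 132
58 140
70 144
46 128
53 136
70 146
20 124
63 143
43 130
26 124
19 121
31 126
23 123
end

* Correlation with significance level
pwcorr Age BP, sig

* Linear regression of blood pressure on age
regress BP Age

* Expected blood pressure at age 40 and at age 80 from the fitted line
display "age 40: " (_b[_cons] + _b[Age]*40)
display "age 80: " (_b[_cons] + _b[Age]*80)
```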

Exercise 19 (Monday 20.11): Simple linear regression
We will study the association between heart rate (HR) and intake of oxygen (VO2) in
38 persons. The data are given in the file OXYGEN.DTA.

Create a scatter plot of the association between heart rate and intake of oxygen. Use
heart rate (HR) on the X-axis (see commands in Exercise 17).

Does there appear to be a linear association between the variables?

Compute the regression coefficient for the same association (with VO2 as the
dependent variable). Statistics – Linear models and related – Linear regression.
Specify the dependent (Y) and the independent (X) variables.

Code: reg VO2 HR

Is the association statistically significant? Give also a 95% confidence interval for the
coefficient.

Give an interpretation to the coefficient for heart rate.

Some of the subjects are wearing a mask, which might influence the measurements.
Create the scatter plot again, but use different markers for those with and
without a mask. Superimpose the regression lines for each of the two groups (with and
without mask) separately. What do you find?
This is a bit tricky in Stata, so we will just give the relevant command (or one version
of it):

twoway (scatter VO2 HR if MASK==0, msymbol(square)) (scatter VO2 HR if MASK==1, msymbol(circle)) (lfit VO2 HR if MASK==0, lcolor(black)) (lfit VO2 HR if MASK==1, lcolor(red))
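
If you run the command from a do-file, it can be split over several lines with the /// continuation marker, which is easier to read. This is a sketch; the legend option is an addition that simply labels the two marker types by their MASK code, and the continuation marker does not work when typing directly in the Command window.

```stata
* Same graph as above, written for a do-file with line continuation
twoway (scatter VO2 HR if MASK == 0, msymbol(square)) ///
       (scatter VO2 HR if MASK == 1, msymbol(circle)) ///
       (lfit VO2 HR if MASK == 0, lcolor(black))      ///
       (lfit VO2 HR if MASK == 1, lcolor(red)),       ///
       legend(order(1 "MASK = 0" 2 "MASK = 1"))
```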

Run also the linear regression analysis over again, with MASK as an additional
independent variable. What happens to the estimated effect of HR and its 95%
confidence interval? In the menu, just add MASK to the list of independent variables.

reg VO2 HR MASK

Exercise 20 (Wednesday 22.11): Linear regression and confounding
In the lectures, we have been working with data from the Oslo health study
(HSCL_EXER.DTA). In these data we have information about physical activity, and
we would like to investigate the association between physical activity and mental
health, as measured by the HSCL10 score. We have collapsed the original physical
activity question into three categories; those who don’t exercise, those who exercise
moderately and those who exercise a lot, coded 1, 2 and 3. This is the variable
NewEx1. In this exercise we will learn how to deal with categorical data in a linear
regression analysis.

We will run a linear regression model with the HSCL10 score as dependent variable
and physical activity as independent variable.
In the menu, do the following: Statistics – Linear models and related – Linear
regression. Specify the dependent variable (Y). For independent variables, click the
small box to the right (with three dots on it). Select ‘Factor variable’ as Type of
variable, and select the variable NewEx1 under Variable 1. Click Add to varlist and
OK.

Running the command is easier: reg HSCL10 i.NewEx1

Notice the i. notation, which allows Stata to treat the variable NewEx1 as categorical.

Run the model. You will get an output like this:

   HSCL10_T1   Coefficient   Std. err.       t    P>|t|     [95% conf. interval]

      NewEx1
           2     -.0674394    .1285768   -0.52    0.601    -.321003    .1861242
           3     -.1336821    .1415499   -0.94    0.346   -.4128297    .1454655

       _cons          1.56     .122239   12.76    0.000    1.318935    1.801065

What happens is that the lowest category (category 1) of the variable NewEx1 is chosen
as reference category, and the two other categories are compared to this one. Give an
interpretation to the findings. Focus on the estimated coefficients.

We have seen that there are gender differences with regard to physical activity, and
we also know there are gender differences when it comes to the mental health score.
This leads us to think that gender might act as a confounder in the mental health –
physical activity relationship. Add gender to the model and see what happens to the
estimated coefficients.
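
A sketch of the adjusted model is given below. The name of the gender variable in HSCL_EXER.DTA is not stated here, so Gender is a hypothetical placeholder; use describe to find the actual name, and use the dependent variable name as it appears in your data.

```stata
* Find the actual variable names in the file
describe

* Unadjusted model, as above
reg HSCL10 i.NewEx1

* Adjusted model (Gender is a hypothetical variable name)
reg HSCL10 i.NewEx1 i.Gender

* Joint test of the two physical activity coefficients in the adjusted model
testparm i.NewEx1
```

Comparing the NewEx1 coefficients between the two models shows how much of the crude association changes when gender is held fixed.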

Exercise 21 (Wednesday 22.11): Non-parametric tests
We will be using NAUSEA.DTA. The data are taken from a small study on patients
receiving chemotherapy. In addition, they were randomized to receive either an active
antiemetic treatment or placebo (taken from “Practical Statistics for Medical Research”
by D. Altman). The variable Nausea gives measurements of nausea on a 100 mm self-
assessment scale (0 being the best, 100 being the worst; remember our discussions in
Exercise 5).

We are interested in whether there is any difference between the two treatment groups.
First, have a look at the distribution of nausea in the two groups, e.g. by a histogram.

histogram Nausea, by(Trt)

We will analyze these data by a non-parametric test.

ranksum Nausea, by(Trt)

If you do this by the menu (Statistics – Summaries, tables, and tests – Non-parametric
tests of hypotheses – Wilcoxon rank-sum test), you will see that you can also ask for
the probability of values from one group being larger than from the other group. Try
this, and try also to reverse the order of the groups. To achieve this, you will have to
recode your variable. Trt is coded 1 for active treatment and 2 for placebo, so the
simplest way of reversing the order is to recode the placebo group to 0. You can
achieve this by running

replace Trt = 0 if Trt==2

How do you conclude?
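
The probability mentioned above can also be requested from the command line with the porder option. The sketch below reverses the group order in a copy of the variable instead of overwriting Trt; the name Trt_rev is an assumption.

```stata
* Rank-sum test plus the estimated probability that a value from the
* first group is larger than a value from the second group
ranksum Nausea, by(Trt) porder

* Reverse the group order in a copy, leaving the original Trt untouched
generate Trt_rev = Trt
replace Trt_rev = 0 if Trt == 2
ranksum Nausea, by(Trt_rev) porder
```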

Exercise 22 (Wednesday 22.11): Non-parametric tests

Again; BIRTH.DTA.
In Exercise 14, we used a paired-samples t-test to investigate whether the mean
number of doctor visits in the first trimester differs from the mean number of visits in
the third trimester. The analysis gave a highly significant difference, with a p-value <
0.001. Now, repeat this analysis by a non-parametric test. (Statistics – Summaries,
tables, and tests – Non-parametric tests of hypotheses – Wilcoxon matched-pairs
signed-rank test). Place one variable (e.g. fvt) in the ‘Variable’ box and the other
variable (ttv) in the ‘Expression’ box. Alternatively, run

signrank fvt = ttv (notice the similarity with the command in Exercise 14).

How do you conclude?
