Multiple Regression Project in Health

This document outlines a mini-project on multiple regression analysis of lung function and height in children, and of the effects of lead exposure on children's IQ. It includes detailed instructions for data analysis, exercises involving scatterplots and regression models, and hypothesis testing. The project consists of two main exercises, one analyzing lung function data and the other examining the impact of lead exposure on IQ scores, with specific tasks and points allocated for each part.

Mini-Project 2: Multiple Regression

Total Points: 80 points and 5 bonus points

Instructions/code for performing linear regression can be found in the relevant
software instruction document.

Data Analysis: Lung function and Height in children.

This exercise will continue using the lung function study that we used in mini-project 1.
The study description will be restated below.

A common measurement of lung function is the forced expiratory volume (FEV),
which measures how much air you can blow out of your lungs in a short period of
time. A higher FEV is usually associated with better respiratory function. The data
set fev_sample.dat contains a random sample of 549 children’s data in free field
format. Available data include measurements of age, height, sex, FEV, and whether
each child smokes or not. The format of the data is:

SEQNBR = case number
SUBJID = subject identification number
AGE = subject age in years at time of measurement
FEV = measured FEV (liters per second)
HEIGHT = subject height at time of measurement (inches)
SEX = subject sex (1 = male, 0 = female)
SMOKE = smoking habits (1 = yes, 0 = no)
LOG_FEV = log transformed FEV (log liters per second)
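
As a rough illustration of what "free field format" means in practice, the sketch below parses whitespace-separated records into named fields using only the Python standard library. It assumes the columns appear in the order listed above and that the file has no header row; neither is confirmed by this document. The two sample rows are made up and are not taken from fev_sample.dat.

```python
import io

# Column order is ASSUMED to follow the variable list above; the real file may differ.
COLUMNS = ["SEQNBR", "SUBJID", "AGE", "FEV", "HEIGHT", "SEX", "SMOKE", "LOG_FEV"]
INT_COLUMNS = {"SEQNBR", "SUBJID", "SEX", "SMOKE"}


def read_free_field(handle):
    """Parse whitespace-separated records into a list of dicts."""
    records = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
        record = {}
        for name, value in zip(COLUMNS, fields):
            record[name] = int(value) if name in INT_COLUMNS else float(value)
        records.append(record)
    return records


# Two made-up rows for illustration only; NOT taken from fev_sample.dat.
sample = io.StringIO("1 301 9 1.708 57.0 0 0 0.535\n2 451 15  3.500   68.5 1 1 1.253\n")
rows = read_free_field(sample)
print(len(rows), rows[0]["HEIGHT"], rows[1]["SMOKE"])
```

With the actual file, the same function could be called as `read_free_field(open("fev_sample.dat"))`, provided the column-order assumption holds.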

The primary objective is to determine if there is evidence to suggest that height is
associated with lung function in children.

Exercise 1. (40 points + 5 bonus points) In this problem, we will continue to
analyze the log-transformed FEV and smoking in children data. Therefore, the
interpretation should reflect that log-scaled FEV (not the raw FEV) is being
analyzed (please refer to the “transformation” slides on this).

a. (20 points) Recall that the scatter plot can be used to describe association between
continuous variables, and boxplot can be used to visualize difference in continuous
data between groups. Please produce scatterplot or boxplot when appropriate, and
comment on the relationships observed between:

ii. (4 points) height and age

iii. (4 points) height and smoke

iv. (4 points) age and smoke

v. (4 points) log_FEV and age

vi. (4 points) log_FEV and smoke
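
The plots themselves should be produced with the course software. As a numeric companion, the sketch below computes the Pearson correlation (the linear trend a scatterplot shows) and a five-number summary per group (the values a basic boxplot is built from). The function names are illustrative and every data value is made up, not taken from fev_sample.dat.

```python
import statistics


def pearson_r(x, y):
    """Pearson correlation between two continuous variables."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def five_number_summary(values):
    """Minimum, quartiles and maximum of one group."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Made-up values for illustration only; NOT taken from fev_sample.dat.
height = [50.0, 55.0, 60.0, 65.0, 70.0]
age = [6.0, 8.0, 11.0, 13.0, 16.0]
smoke = [0, 0, 0, 1, 1]

r_height_age = pearson_r(height, age)
height_by_smoke = {
    group: five_number_summary([h for h, s in zip(height, smoke) if s == group])
    for group in (0, 1)
}
print(round(r_height_age, 3))
print(height_by_smoke)
```

A continuous-by-continuous pair (such as height and age) suits the correlation, while a continuous-by-group pair (such as height and smoke) suits the per-group summary.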

b. (5 points) Fit a simple linear regression model with log transformed FEV value as
the outcome variable, and smoke as the predictor. What do you conclude about the
relationship between smoke and log transformed FEV value?
Note: please include the p-value if commenting on the significance of the relationship.
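
When the only predictor is a 0/1 indicator such as smoke, the least-squares intercept equals the mean outcome in the group coded 0 and the slope equals the difference between the two group means. The sketch below checks this with a hand-written least-squares fit; `simple_ols` is an illustrative helper, not the course software, and the data are made up.

```python
import statistics


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Made-up values for illustration only; NOT taken from fev_sample.dat.
smoke = [0, 0, 0, 0, 1, 1]
log_fev = [0.80, 0.90, 1.00, 1.10, 1.15, 1.25]

intercept, slope = simple_ols(smoke, log_fev)
mean_nonsmokers = statistics.fmean([v for s, v in zip(smoke, log_fev) if s == 0])
mean_smokers = statistics.fmean([v for s, v in zip(smoke, log_fev) if s == 1])

print(round(intercept, 3), round(mean_nonsmokers, 3))
print(round(slope, 3), round(mean_smokers - mean_nonsmokers, 3))
```

The p-value for the slope still has to come from the regression output of the course software; this sketch only illustrates what the coefficients measure.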

c. (5 points) Fit a multiple linear regression model with log transformed FEV value as
the outcome variable, and height, age and smoke as the predictors. What do you
conclude about the relationship between smoke and log transformed FEV value?
Note: please include the p-value if commenting on the significance of the relationship.

d. (4 points) Test the overall hypothesis that height, age and smoke when considered
together are significant predictors of log transformed FEV value. State the null and
alternative hypotheses, the p-value and your conclusion.
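
The overall F test compares the fitted model with an intercept-only model. A minimal sketch of how the F statistic relates to R Square is given below, assuming all 549 children enter the fit (no missing values). The R Square value used is hypothetical, and the p-value itself should be read from the software output.

```python
def overall_f(r_squared, n, k):
    """Overall F statistic for a linear model with k predictors and n observations."""
    df_model = k
    df_error = n - k - 1
    f_stat = (r_squared / df_model) / ((1.0 - r_squared) / df_error)
    return f_stat, df_model, df_error


# n = 549 comes from the study description; R Square = 0.80 is HYPOTHETICAL.
f_stat, df_model, df_error = overall_f(r_squared=0.80, n=549, k=3)
print(round(f_stat, 2), df_model, df_error)
```

The statistic is referred to an F distribution with `df_model` and `df_error` degrees of freedom.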

e. (6 points) Interpret the three estimated slope coefficients based on the multiple
linear regression in part c.
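
Because the outcome is on the log scale, a slope is often easier to read after back-transformation. The sketch below assumes LOG_FEV is the natural log of FEV, which this document does not state, and uses a hypothetical slope value.

```python
import math


def percent_change_per_unit(slope_on_log_scale):
    """Percent change in the raw outcome per one-unit increase in a predictor,
    holding the other predictors fixed. ASSUMES a natural-log outcome."""
    return (math.exp(slope_on_log_scale) - 1.0) * 100.0


# HYPOTHETICAL slope, not an estimate from the FEV data.
print(round(percent_change_per_unit(0.05), 2))
```

If the transformation used a different base, `math.exp` would need to be replaced by the matching inverse (for example `10 ** slope` for base 10).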

f. (bonus question) (5 points) Compare the results from part b and c, comment on the
difference in these two results and why this difference occurs.

Exercise 2. The effects of lead exposure on neurological and psychological function in
children (40 points total)

A group of children who lived near a lead smelter in El Paso, Texas, were identified and
their blood levels of lead were measured. The response variable is the Wechsler full-scale
IQ score (IQ). There were 46 children who were exposed to lead (LEAD = 1) based on
their blood-lead levels, and 78 children who were not exposed to lead (LEAD = 0). The
following variables were collected:

IQ: IQ score
LEAD: Lead exposure (0=No, 1=Yes)
SEX: Sex (1=MALE, 0=FEMALE)
AGE: Age of children (in years)

Figure 3: Descriptive Plots

a. (4 points) Based on the first two plots in Figure 3, comment on the relationship between
IQ and AGE, IQ and LEAD (Lead exposure).
b. (4 points) Based on the last two plots in Figure 3, do you see any possible interaction
between AGE and LEAD, or between AGE and SEX? Explain why.

c. (6 points) Table 3 provides regression model summaries for the following four models:
(1) IQ as a function of SEX, LEAD, AGE and three interaction terms: AGE and
SEX, AGE and LEAD, SEX and LEAD.
(2) IQ as a function of SEX, LEAD, AGE and two interaction terms: AGE and
SEX, SEX and LEAD.
(3) IQ as a function of SEX, LEAD, AGE and one interaction term: AGE and SEX.
(4) IQ as a function of LEAD and one interaction term: AGE and SEX.

Assume that all model assumptions are satisfied for each of the models. Which model
would you choose, and why? Considering the goal of the study, LEAD will be kept in the
model regardless of its p-value. Can you further simplify the model you chose? If yes,
how? If no, why not?

Hint: When a higher-order term (i.e., an interaction) is included in the model, the
lower-order terms (i.e., the individual terms) involved in it must also be included in the model.
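
An interaction predictor such as AgeBySex is the product of its two component variables. The sketch below builds such a column and checks the rule in the hint. The helper names are illustrative, the two records are made up, and the variable names x1 and x2 are placeholders.

```python
def add_interaction(rows, first, second, name):
    """Add a product column `name` = row[first] * row[second] to each record."""
    for row in rows:
        row[name] = row[first] * row[second]
    return rows


def respects_hierarchy(main_effects, interactions):
    """True if both components of every interaction are also main effects."""
    return all(a in main_effects and b in main_effects for a, b in interactions)


# Made-up records for illustration only.
children = [{"age": 8, "sex": 1, "lead": 0}, {"age": 10, "sex": 0, "lead": 1}]
add_interaction(children, "age", "sex", "AgeBySex")
print([c["AgeBySex"] for c in children])

print(respects_hierarchy({"x1", "x2"}, [("x1", "x2")]))
print(respects_hierarchy({"x1"}, [("x1", "x2")]))
```

For the group coded 0 the product column is always 0, so the interaction coefficient only changes the slope for the group coded 1.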

Table 3: Regression models

Model (1):

Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .279(a)   .078       .030                14.18325

a. Predictors: (Constant), SexByLead, age, AgeBySex, AgeByLead, sex, lead
b. Dependent Variable: iq

Coefficients(a)

              Unstandardized          Standardized
              Coefficients            Coefficients                     95% Confidence Interval for B
Model         B          Std. Error   Beta           t         Sig.    Lower Bound   Upper Bound
1 (Constant)  103.506    6.743                       15.349    .000    90.151        116.861
  sex         -17.828    7.885        -.605          -2.261    .026    -33.444       -2.212
  lead        -6.386     8.721        -.215          -.732     .465    -23.657       10.885
  age         -1.078     .631         -.265          -1.708    .090    -2.328        .172
  AgeBySex    1.892      .754         .644           2.508     .014    .398          3.386
  AgeByLead   -.006      .787         -.002          -.008     .994    -1.565        1.553
  SexByLead   3.170      5.642        .095           .562      .575    -8.005        14.344

a. Dependent Variable: iq

Model (2):

Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .279(a)   .078       .039                14.12303

a. Predictors: (Constant), SexByLead, age, AgeBySex, lead, sex
b. Dependent Variable: iq

Coefficients(a)

              Unstandardized          Standardized
              Coefficients            Coefficients                     95% Confidence Interval for B
Model         B          Std. Error   Beta           t         Sig.    Lower Bound   Upper Bound
1 (Constant)  103.526    6.209                       16.674    .000    91.230        115.821
  sex         -17.829    7.850        -.605          -2.271    .025    -33.375       -2.284
  lead        -6.444     4.333        -.217          -1.487    .140    -15.024       2.136
  age         -1.080     .573         -.265          -1.884    .062    -2.216        .055
  AgeBySex    1.892      .751         .644           2.519     .013    .405          3.379
  SexByLead   3.179      5.490        .095           .579      .564    -7.693        14.050

a. Dependent Variable: iq

Model (3):

Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .274(a)   .075       .044                14.08352

a. Predictors: (Constant), AgeBySex, lead, age, sex
b. Dependent Variable: iq

Coefficients(a)

              Unstandardized          Standardized
              Coefficients            Coefficients                     95% Confidence Interval for B
Model         B          Std. Error   Beta           t         Sig.    Lower Bound   Upper Bound
1 (Constant)  102.707    6.029                       17.036    .000    90.769        114.645
  sex         -16.271    7.353        -.552          -2.213    .029    -30.831       -1.710
  lead        -4.464     2.653        -.150          -1.682    .095    -9.718        .790
  age         -1.064     .571         -.261          -1.864    .065    -2.195        .067
  AgeBySex    1.844      .744         .628           2.477     .015    .370          3.318

a. Dependent Variable: iq
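
One way to read the coefficient tables is to note that each t value is the estimate B divided by its standard error. The short check below recomputes t for the Model (3) rows; small gaps are expected because B and Std. Error are printed to three decimals.

```python
# (B, Std. Error, reported t) copied from the Model (3) coefficient table.
model3 = {
    "(Constant)": (102.707, 6.029, 17.036),
    "sex": (-16.271, 7.353, -2.213),
    "lead": (-4.464, 2.653, -1.682),
    "age": (-1.064, 0.571, -1.864),
    "AgeBySex": (1.844, 0.744, 2.477),
}


def t_statistic(b, std_error):
    """t value for testing that a single coefficient equals zero."""
    return b / std_error


for term, (b, se, reported_t) in model3.items():
    print(term, round(t_statistic(b, se), 3), reported_t)

# Largest difference between recomputed and reported t values.
max_gap = max(abs(t_statistic(b, se) - t) for b, se, t in model3.values())
```

The Sig. column is the two-sided p-value for that t value, which the software obtains from a t distribution on the error degrees of freedom.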

Model (4):

Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .191(a)   .036       .020                14.25614

a. Predictors: (Constant), AgeBySex, lead
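
Adjusted R Square penalizes R Square for the number of predictors, which is why it can rise when a weak term is dropped. The sketch below recomputes it from the reported R Square values, assuming all 46 + 78 = 124 children were used in every fit (an assumption, since the tables do not print n). Small gaps are expected because R Square is printed to three decimals.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R Square for a model with k predictors fitted to n observations."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


N_CHILDREN = 46 + 78  # ASSUMES no child was dropped from any model

# model number: (R Square, number of predictors, reported Adjusted R Square)
table3 = {
    1: (0.078, 6, 0.030),
    2: (0.078, 5, 0.039),
    3: (0.075, 4, 0.044),
    4: (0.036, 2, 0.020),
}

for model, (r2, k, reported) in table3.items():
    print(model, round(adjusted_r_squared(r2, N_CHILDREN, k), 3), reported)

# Largest difference between recomputed and reported values.
max_gap = max(
    abs(adjusted_r_squared(r2, N_CHILDREN, k) - reported)
    for r2, k, reported in table3.values()
)
```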

d. (3 points) Based on model (3), write the estimated regression equation for males.

e. (4 points) Use the estimated regression equation from part (d) to interpret the effect of
LEAD and AGE for males.

f. (3 points) Based on model (3), write the estimated regression equation for females.

g. (4 points) Use the estimated regression equation from part (f) to interpret the
effect of LEAD and AGE for females.
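
A model with SEX, LEAD, AGE and an AGE-by-SEX product term collapses to a separate equation for each sex once the 0/1 code is substituted. The sketch below shows that mechanism with hypothetical round-number coefficients; they are deliberately not the Table 3 estimates, and `group_equation` is an illustrative helper.

```python
def group_equation(coef, sex):
    """Collapse iq = b0 + b_sex*sex + b_lead*lead + b_age*age + b_int*age*sex
    into an intercept and slopes for one fixed value of sex."""
    return {
        "intercept": coef["const"] + coef["sex"] * sex,
        "lead": coef["lead"],
        "age": coef["age"] + coef["age_by_sex"] * sex,
    }


def predict(equation, lead, age):
    """Predicted outcome from a collapsed group equation."""
    return equation["intercept"] + equation["lead"] * lead + equation["age"] * age


# HYPOTHETICAL coefficients for illustration; NOT the Table 3 estimates.
coef = {"const": 100.0, "sex": -10.0, "lead": -5.0, "age": -1.0, "age_by_sex": 2.0}

males = group_equation(coef, sex=1)    # SEX is coded 1 = male
females = group_equation(coef, sex=0)  # SEX is coded 0 = female
print(males)
print(females)
print(predict(females, lead=1, age=10))
```

The same substitution applies to the estimated coefficients: the sex coefficient shifts the intercept and the interaction coefficient shifts the age slope for the group coded 1.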

h. (3 points) Is the effect of lead different between males and females? Why does this
occur?

i. (4 points) Based on the estimated regression equation from part (f), predict the IQ score
of a girl at age 8 and exposed to lead.

j. (5 points) For model (2), is there evidence of a significant interaction between AGE and
SEX? State the null and alternative hypothesis, p-value and
