
Lab2-Markdown XFL (CLEAN)

This document outlines a lab exercise focused on linear regression using the Carseats dataset. It includes steps for analyzing relationships between variables, conducting linear regression models, and interpreting coefficients. The document also covers diagnostic plots, significance of variables, and additional regression models with different parameters.

Uploaded by

liu.7133

Lab 2 - Linear Regression (Exercises)

Jan 16, 2025

For this lab exercise, we will use the Carseats data. Make sure you have loaded all the
required packages and the dataset before starting with the questions.

library(ISLR)
library(MASS)

data("Carseats")
#?Carseats #use this command to learn more about the data

1. Let’s start by looking at the relationships among all the variables. Remember, we use
the pairs() command to view a scatter plot matrix of all the variables. Can you identify
which variables are factors? (There are 3 factor variables.)

pairs(Carseats) #figure below minimized for space

[Figure: scatter plot matrix from pairs(Carseats), with panels for Sales, CompPrice, Income, Advertising, Population, Price, ShelveLoc, Age, Education, Urban, and US]
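The factor columns can also be confirmed in code rather than read off the scatter plot matrix. The following is a minimal sketch, assuming the ISLR package is installed; the names is_factor and factor_vars are illustrative and not part of the lab.

```r
library(ISLR)  # provides the Carseats data frame

# TRUE for every column that R stores as a factor
is_factor <- sapply(Carseats, is.factor)

# Names of the factor columns
factor_vars <- names(Carseats)[is_factor]
print(factor_vars)

# Levels of each factor column, to see the categories involved
print(lapply(Carseats[factor_vars], levels))
```

Calling str(Carseats) gives the same information in a more compact form.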
2. Suppose you are interested in predicting the Sales of child car seats. You identify
Price, Advertising, and Age as the key variables. Fit a linear regression model
for this relationship. In other words, estimate:

Sales = β0 + β1 Price + β2 Advertising + β3 Age + ε

If you coded this properly, the coefficient for Price should be -0.058.

##
## Call:
## lm(formula = Sales ~ Price + Advertising + Age, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.6247 -1.5288 0.0148 1.5220 6.2925
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.003472 0.718754 22.266 < 2e-16 ***
## Price -0.058028 0.004827 -12.022 < 2e-16 ***
## Advertising 0.123106 0.017095 7.201 3.02e-12 ***
## Age -0.048846 0.007047 -6.931 1.70e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.269 on 396 degrees of freedom
## Multiple R-squared: 0.3595, Adjusted R-squared: 0.3547
## F-statistic: 74.1 on 3 and 396 DF, p-value: < 2.2e-16
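One way to produce the output above is sketched here. The formula is taken from the Call line of the output; the object name fit_q2 is an assumption made for illustration.

```r
library(ISLR)  # provides the Carseats data frame

# Multiple linear regression of Sales on Price, Advertising, and Age
fit_q2 <- lm(Sales ~ Price + Advertising + Age, data = Carseats)

# Full regression table: estimates, standard errors, t values, p-values
print(summary(fit_q2))

# Quick check against the expected Price coefficient
print(round(coef(fit_q2)["Price"], 3))
```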

3. Using the plot() command, graph the diagnostic plots for the linear regression you just
fit. Make sure you include par(mfrow=c(2,2)) so that all four plots appear together in a
2×2 grid. Do you think a linear regression model is adequate for this relationship?

[Figure: 2×2 grid of diagnostic plots. Panels: Residuals vs Fitted (Residuals against Fitted values), Q-Q Residuals (Standardized residuals against Theoretical Quantiles), Scale-Location (against Fitted values), and Residuals vs Leverage (Standardized residuals against Leverage, with Cook's distance). Labeled observations include 26, 51, and 353.]
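The four diagnostic panels can be drawn with a sketch like the following. The model is refit so that the block runs on its own; fit_q3 and old_par are illustrative names.

```r
library(ISLR)  # provides the Carseats data frame

# Same model as in question 2
fit_q3 <- lm(Sales ~ Price + Advertising + Age, data = Carseats)

# Arrange the next four plots in a 2x2 grid, remembering the old layout
old_par <- par(mfrow = c(2, 2))

# plot() on an lm object draws the four standard diagnostic plots
plot(fit_q3)

# Restore the previous plotting layout
par(old_par)
```

Saving and restoring the par() settings is optional, but it keeps later plots from inheriting the 2×2 layout.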

4. Are all the variables significant? What does being significant really mean? Write down
the interpretation of the Intercept and the Age coefficient.

Yes, all variables are significant (each p-value is less than 0.05). Being significant means that
we reject the null hypothesis that the coefficient equals 0: if the true coefficient were 0, an
estimate this far from 0 would be very unlikely. When Price, Advertising, and Age are all 0,
Sales is 16.00 on average (Intercept). Holding Price and Advertising fixed, a one-unit increase
in Age is associated with a decline in Sales of about 0.049 on average (Age coefficient).
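The significance check can also be done in code instead of by reading the stars in the summary table. This is a minimal sketch; fit_q4, coef_table, and p_values are illustrative names, and the 0.05 cutoff is the one used in the answer above.

```r
library(ISLR)  # provides the Carseats data frame

fit_q4 <- lm(Sales ~ Price + Advertising + Age, data = Carseats)

# Coefficient matrix: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(fit_q4))

# p-values for the test that each coefficient equals 0
p_values <- coef_table[, "Pr(>|t|)"]
print(p_values < 0.05)

# 95% confidence intervals; an interval that excludes 0 agrees
# with rejecting the null hypothesis at the 5% level
print(confint(fit_q4, level = 0.95))
```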

5. Just for practice, try to perform the linear regression:

Sales = β0 + β1 Price + ε

Then use abline() to plot the linear regression line. You should get a plot
like this:
[Figure: scatter plot of Sales (y-axis) against Price (x-axis) with the fitted regression line]
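A plot of this kind can be produced with a sketch along these lines. The line color and width are arbitrary choices, and fit_q5 is an illustrative name.

```r
library(ISLR)  # provides the Carseats data frame

# Simple linear regression of Sales on Price
fit_q5 <- lm(Sales ~ Price, data = Carseats)

# Scatter plot of the raw data
plot(Sales ~ Price, data = Carseats)

# abline() reads the intercept and slope from the fitted model
abline(fit_q5, col = "red", lwd = 2)
```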

6. Using the predict() command, calculate the confidence and prediction intervals
for the regression model in Q.5 when Price is 30, 60, and 90. If you coded this
properly, you should get:

Confidence intervals:

##         fit       lwr       upr
## 1 12.049725 11.112929 12.986521
## 2 10.457534  9.819637 11.095431
## 3  8.865344  8.496982  9.233705

Prediction intervals:

##         fit      lwr      upr
## 1 12.049725 6.983945 17.11550
## 2 10.457534 5.438426 15.47664
## 3  8.865344 3.873328 13.85736
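Both tables come from predict() with a different interval argument. The confidence interval covers the mean of Sales at a given Price, while the prediction interval covers a single new observation and is therefore wider. A minimal sketch follows, with fit_q6, new_prices, conf_int, and pred_int as illustrative names.

```r
library(ISLR)  # provides the Carseats data frame

fit_q6 <- lm(Sales ~ Price, data = Carseats)

# New data must use the same column name as the predictor in the formula
new_prices <- data.frame(Price = c(30, 60, 90))

# Interval for the mean response at each Price (default level is 0.95)
conf_int <- predict(fit_q6, newdata = new_prices, interval = "confidence")
print(conf_int)

# Interval for an individual new observation at each Price
pred_int <- predict(fit_q6, newdata = new_prices, interval = "prediction")
print(pred_int)
```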

7. Let’s try to add a few more terms to our regression model. Let us estimate this
model:
Sales = β0 + β1 log(Price) + β2 Income + β3 Income^2 + ε
Try to output only the coefficients this time. The solution is provided below. Make
sure your coefficients match the solution.

##   (Intercept)    log(Price)        Income   I(Income^2)
##  3.160110e+01 -5.343843e+00  2.261351e-02 -7.084581e-05
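The column names in the solution show how the terms are written in the formula. The squared term needs I(), because ^ inside a formula otherwise means crossing terms rather than arithmetic. A sketch, with fit_q7 as an illustrative name:

```r
library(ISLR)  # provides the Carseats data frame

# log() can be used directly in a formula; I() protects the arithmetic power
fit_q7 <- lm(Sales ~ log(Price) + Income + I(Income^2), data = Carseats)

# coef() returns only the estimated coefficients, without the full summary
print(coef(fit_q7))
```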

8. Now, let’s estimate this model:

Sales = β0 + β1 Price + β2 Income + β3 Urban + β4 (Income × Urban) + ε

The estimate of β4 should be −0.0159.

##
## Call:
## lm(formula = Sales ~ Price + Income * Urban, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.6964 -1.8640 -0.0938 1.6930 7.5875
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.968842 0.839695 14.254 < 2e-16 ***
## Price -0.052415 0.005318 -9.856 < 2e-16 ***
## Income 0.023487 0.007811 3.007 0.00281 **
## UrbanYes 1.082502 0.703561 1.539 0.12470
## Income:UrbanYes -0.015931 0.009546 -1.669 0.09594 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.507 on 395 degrees of freedom
## Multiple R-squared: 0.2196, Adjusted R-squared: 0.2117
## F-statistic: 27.79 on 4 and 395 DF, p-value: < 2.2e-16
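In the formula from the Call line, Income * Urban expands to Income + Urban + Income:Urban, so the two main effects and the interaction are all included. The sketch below refits the model and computes the implied Income slope for urban stores, which from the table is 0.023487 − 0.015931 ≈ 0.0076. Note that the interaction is not significant at the 5% level, so this difference in slopes should be read with caution. The names fit_q8 and b are illustrative.

```r
library(ISLR)  # provides the Carseats data frame

# Income * Urban adds Income, Urban, and the Income:Urban interaction
fit_q8 <- lm(Sales ~ Price + Income * Urban, data = Carseats)
print(summary(fit_q8))

b <- coef(fit_q8)

# Income slope when Urban is "No" (the baseline level)
print(b["Income"])

# Income slope when Urban is "Yes": main effect plus interaction
print(unname(b["Income"] + b["Income:UrbanYes"]))
```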

9. Finally, let’s estimate Sales = β0 + β1 Education + ε. Note that Education is stored
as a numeric variable. For this model, convert it to a factor first. If done
correctly, your regression output should show 8 Education coefficients (levels 11 through 18),
with level 10 absorbed into the intercept as the baseline.

##
## Call:
## lm(formula = Sales ~ Education, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.1317 -1.9725 -0.0588 1.8270 8.7988
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.2458 0.4077 20.223 <2e-16 ***
## Education11 -0.7746 0.5766 -1.343 0.1800
## Education12 -0.7781 0.5737 -1.356 0.1758
## Education13 -1.1935 0.5932 -2.012 0.0449 *
## Education14 -1.1706 0.6048 -1.936 0.0536 .
## Education15 -0.1142 0.6228 -0.183 0.8547
## Education16 -1.0014 0.5797 -1.727 0.0849 .
## Education17 -0.7430 0.5737 -1.295 0.1960
## Education18 -0.9693 0.6048 -1.603 0.1098
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.825 on 391 degrees of freedom
## Multiple R-squared: 0.0195, Adjusted R-squared: -0.0005622
## F-statistic: 0.972 on 8 and 391 DF, p-value: 0.4574
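The conversion can be sketched as follows. This version works on a copy of the data (the hypothetical name carseats_f) so that the original Carseats data frame stays numeric; the output above instead shows data = Carseats in its Call line, which suggests the column was converted in place there.

```r
library(ISLR)  # provides the Carseats data frame

# Work on a copy so the original numeric Education column is preserved
carseats_f <- Carseats
carseats_f$Education <- as.factor(carseats_f$Education)

# Number of distinct Education levels after conversion
print(nlevels(carseats_f$Education))

# With a factor predictor, lm() estimates one coefficient per
# non-baseline level; the first level is absorbed into the intercept
fit_q9 <- lm(Sales ~ Education, data = carseats_f)
print(summary(fit_q9))
```

Writing lm(Sales ~ factor(Education), data = Carseats) is an alternative that avoids modifying any data frame, at the cost of coefficient labels such as factor(Education)11.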
