Regression Modelling Ass

The document details a regression analysis performed using R on a dataset from 'grocery.csv', including scatter plots and correlation matrices that reveal relationships between variables Y, X1, X2, and X3. A multiple regression model is fitted, showing significant contributions from X3 and a weak contribution from X1, while X2 is not statistically significant. The analysis concludes that at least one predictor significantly explains the variability in Y, with an R-squared value of 0.688, meaning the model explains about 69% of the variability in Y.

Uploaded by

tpumba002

NAME: ANTONY MACHARIA

REG.NO. SCM222-0526/2021

FINANCIAL ENGINEERING

REGRESSION MODELLING 1

R-ASSIGNMENT

(a) To obtain the scatter plot matrix and the correlation matrix using R:

library(GGally)  # provides ggpairs(); there is no package named "ggpairs"

library(corrplot)

data <- read.csv("grocery.csv")

(i) pairs(data)

(ii) cor_matrix <- cor(data)

print(cor_matrix)

Scatter Plot Matrix: Provides a visual understanding of the pairwise relationships between Y, X1, X2 and X3.

Correlation Matrix:

• Y and X3 have a strong positive correlation (0.810)

• Y and X1 have a weak positive correlation (0.21)

• Y and X2 show almost no linear correlation (0.06)

[Figure: correlation matrix]

[Figure: scatter plots]
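A minimal sketch of how the correlations of Y with each predictor can be read off the matrix. Because grocery.csv is not reproduced here, the data frame `sim` and every number used to build it are simulated, illustrative assumptions rather than the assignment's data.

```r
# Simulated stand-in for grocery.csv (illustrative values only)
set.seed(1)
n <- 50
sim <- data.frame(
  X1 = rnorm(n, mean = 300000, sd = 50000),
  X2 = rnorm(n, mean = 7, sd = 1),
  X3 = rbinom(n, size = 1, prob = 0.2)
)
sim$Y <- 4000 + 0.001 * sim$X1 + 600 * sim$X3 + rnorm(n, sd = 150)

cor_matrix <- cor(sim)

# The row named "Y" holds the correlation of Y with each predictor
y_cors <- cor_matrix["Y", c("X1", "X2", "X3")]
print(round(y_cors, 3))

# Predictor with the largest absolute correlation with Y
strongest <- names(which.max(abs(y_cors)))
cat("Strongest linear association with Y:", strongest, "\n")
```

Indexing the matrix by name avoids relying on the column order of the file.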
(b) To fit a multiple regression model:

model <- lm(Y ~ X1 + X2 + X3, data = data)

summary(model)

The estimated regression function is Ŷ = 4149.89 + 0.0008 X1 - 13.166 X2 + 623.55 X3

Interpretation of the coefficients

Intercept (4149.89): When X1, X2 and X3 are all zero, the estimated total labor hours (Y) is 4149.89.

X1 (0.0008): For each additional case shipped (a one-unit increase in X1), total labor hours increase by approximately 0.0008 hours, holding other variables constant.

X2 (-13.166): For each one-point increase in indirect costs as a percentage of total labor hours (a one-unit increase in X2), total labor hours decrease by 13.166, holding other variables constant. However, X2 is not statistically significant (p = 0.571).

X3 (623.55): If the week includes a holiday (X3 = 1), total labor hours increase by 623.55 compared to non-holiday weeks, holding other variables constant. This coefficient is highly significant (p < 0.001).
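To make the interpretation concrete, the sketch below plugs values into the fitted equation using the rounded coefficients reported in part (b). The helper `predict_hours` and the input values (300000 cases shipped, 7 percent indirect costs) are hypothetical and chosen only for illustration.

```r
# Rounded coefficients as reported in the fitted equation in part (b)
b <- c(intercept = 4149.89, X1 = 0.0008, X2 = -13.166, X3 = 623.55)

# Hypothetical helper: evaluates the fitted regression function
predict_hours <- function(x1, x2, x3) {
  unname(b["intercept"] + b["X1"] * x1 + b["X2"] * x2 + b["X3"] * x3)
}

# Assumed input values, not taken from the data
non_holiday <- predict_hours(x1 = 300000, x2 = 7, x3 = 0)
holiday     <- predict_hours(x1 = 300000, x2 = 7, x3 = 1)

cat("Non-holiday week:", non_holiday, "\n")
cat("Holiday week:    ", holiday, "\n")
cat("Difference:      ", holiday - non_holiday, "\n")
```

With X1 and X2 held fixed, the two predictions differ by exactly the X3 coefficient, which is what "holding other variables constant" means.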

(c) residuals <- resid(model)

-32.06, 169.21, -21.83, -54.12, 75.93

The center line (median) of the box plot is close to zero, indicating symmetry. A few potential outliers exist at the higher and lower ends of the residual range.
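The box plot described here could be produced along the following lines. This is a sketch on simulated residuals (`sim_resid` is an assumption standing in for `resid(model)`), not the assignment's actual output.

```r
# Simulated residuals standing in for resid(model) (illustrative only)
set.seed(2)
sim_resid <- rnorm(50, mean = 0, sd = 140)

# Box plot of the residuals
boxplot(sim_resid, horizontal = TRUE, main = "Box Plot of Residuals")

# Numbers behind the plot: five-number summary and flagged points
stats <- boxplot.stats(sim_resid)
cat("Median residual:", stats$stats[3], "\n")
cat("Points beyond the whiskers:", length(stats$out), "\n")
```

`boxplot.stats()` returns the values the plot is drawn from, so the median and any flagged points can be reported as numbers rather than judged by eye.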

(d) (i) Residuals against Fitted Values

fitted_values <- fitted(model)

plot(fitted_values, residuals, main = "Residuals vs Fitted Values", xlab = "Fitted Values", ylab = "Residuals")

abline(h = 0, col = "red")

(ii) Residuals against Predictors

par(mfrow = c(1, 3)) # Set layout for 3 plots

plot(data$X1, residuals, main = "Residuals vs X1", xlab = "X1", ylab = "Residuals")

plot(data$X2, residuals, main = "Residuals vs X2", xlab = "X2", ylab = "Residuals")

plot(data$X3, residuals, main = "Residuals vs X3", xlab = "X3", ylab = "Residuals")

par(mfrow = c(1, 1))

(iii) Normal Probability Plot (Q-Q Plot)

qqnorm(residuals)

qqline(residuals, col = "red")


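(e) Brown-Forsythe test for constancy of the error variance

Divide the data into two groups based on the fitted values:

group <- ifelse(fitted_values <= median(fitted_values), 1, 2)

Absolute deviations of the residuals from their group medians (bf.test() compares group means, so it is applied to these deviations):

group1 <- abs(residuals[group == 1] - median(residuals[group == 1]))

group2 <- abs(residuals[group == 2] - median(residuals[group == 2]))

Brown-Forsythe test (α = 0.01):

library(onewaytests)

bf_data <- data.frame(abs_dev = c(group1, group2), grp = factor(rep(c("Group1", "Group2"), c(length(group1), length(group2)))))

bf_test <- bf.test(abs_dev ~ grp, data = bf_data, alpha = 0.01)

print(bf_test)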
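For comparison, the textbook form of the Brown-Forsythe test for constant error variance is a pooled-variance two-sample t-test on the absolute deviations of the residuals from their group medians. The sketch below uses base R only; `fitted_sim` and `resid_sim` are simulated stand-ins for the model's fitted values and residuals.

```r
# Simulated stand-ins for fitted(model) and resid(model) (illustrative only)
set.seed(3)
n <- 50
fitted_sim <- runif(n, min = 4000, max = 5000)
resid_sim  <- rnorm(n, mean = 0, sd = 140)

# Split the residuals at the median fitted value
grp <- ifelse(fitted_sim <= median(fitted_sim), 1, 2)
e1 <- resid_sim[grp == 1]
e2 <- resid_sim[grp == 2]

# Absolute deviations from each group's median residual
d1 <- abs(e1 - median(e1))
d2 <- abs(e2 - median(e2))

# Pooled-variance two-sample t-test on the absolute deviations
bf <- t.test(d1, d2, var.equal = TRUE)
t_bf <- unname(bf$statistic)

# Two-sided decision at alpha = 0.01
alpha <- 0.01
t_crit <- qt(1 - alpha / 2, df = length(d1) + length(d2) - 2)
reject <- abs(t_bf) > t_crit
cat("t* =", t_bf, " critical value =", t_crit, " reject H0:", reject, "\n")
```

A large |t*| indicates that the spread of the residuals differs between the low and high fitted-value groups, that is, that the error variance is not constant.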

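(f) Test for a regression relation (overall F-test)

anova_result <- anova(lm(Y ~ 1, data = data), model)  # assumed definition: intercept-only model against the full model

print(anova_result)

f_stat <- summary(model)$fstatistic

p_value <- pf(f_stat[1], f_stat[2], f_stat[3], lower.tail = FALSE)

cat("P-value of the test:", p_value, "\n")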

Null Hypothesis (Ho):

The regression coefficients are all equal to zero (β1 = β2 = β3 = 0).
This implies that the predictors X1, X2 and X3 do not explain the variability in Y.

Alternative Hypothesis (Ha):

At least one of the regression coefficients is not equal to zero (βj ≠ 0 for some j).
This implies that at least one predictor explains some of the variability in Y.

Decision Rule

To test the null hypothesis:

• Use a level of significance (α) = 0.05.

• If the p-value of the overall F-test is less than α (0.05), we reject Ho and conclude that there is a significant regression relationship.

Result and Conclusion

From the regression output, the p-value is approximately:

p-value = 3.315708e-12 = 3.315708 × 10^-12

• Since p-value < 0.05, we reject the null hypothesis.

Conclusion: There is strong evidence to suggest that at least one of the predictors X1, X2, or
X3 significantly explains the variability in Y (total labor hours).

P-value of the test: 3.315708e-12
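R prints very small p-values in scientific notation, where "e-12" means "times 10 to the power -12" and not a power of Euler's number. The short sketch below checks this for the reported value and applies the decision rule; the variable names are illustrative.

```r
p_value <- 3.315708e-12   # value reported by cat() in part (f)
alpha <- 0.05

# "e-12" is scientific notation for a factor of 10^-12
same_value <- isTRUE(all.equal(p_value, 3.315708 * 10^-12))

# Reading "e" as Euler's number gives a different, much larger number
euler_reading <- 3.315708 * exp(-12)

# Decision rule: reject Ho when the p-value is below alpha
reject_h0 <- p_value < alpha

cat("Same as 3.315708 * 10^-12:", same_value, "\n")
cat("Euler reading:", euler_reading, "\n")
cat("Reject Ho:", reject_h0, "\n")
```

Either reading leads to rejecting Ho at α = 0.05, but only the power-of-ten reading is the p-value R reported.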

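(g) confint(model, parm = c("X1", "X3"), level = 1 - 0.05 / 2)

Provides Bonferroni joint confidence intervals for β1 and β3 with a 95% family confidence coefficient: with two intervals, each is computed at level 1 - 0.05/2 = 0.975.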
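The reason `level = 1 - 0.05 / 2` yields Bonferroni intervals can be checked directly: for g = 2 joint intervals the Bonferroni multiplier is the t quantile at 1 - α/(2g), which is exactly what a two-sided interval at level 1 - α/g uses. The sketch below verifies this on simulated data; `sim` and `fit` are assumptions, not the grocery model.

```r
# Simulated data and model (illustrative only)
set.seed(4)
n <- 50
sim <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rbinom(n, 1, 0.3))
sim$Y <- 1 + 0.5 * sim$X1 + 2 * sim$X3 + rnorm(n)
fit <- lm(Y ~ X1 + X2 + X3, data = sim)

alpha <- 0.05  # family significance level
g <- 2         # number of intervals estimated jointly

# Intervals from confint() at the adjusted level
ci_confint <- confint(fit, parm = c("X1", "X3"), level = 1 - alpha / g)

# The same intervals built by hand with the Bonferroni multiplier
coefs <- coef(summary(fit))[c("X1", "X3"), ]
B <- qt(1 - alpha / (2 * g), df = df.residual(fit))
ci_manual <- cbind(coefs[, "Estimate"] - B * coefs[, "Std. Error"],
                   coefs[, "Estimate"] + B * coefs[, "Std. Error"])

print(ci_confint)
print(ci_manual)
```

The two matrices agree, so the adjusted `level` argument is all that is needed to obtain Bonferroni intervals from `confint()`.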

(h) r_squared <- summary(model)$r.squared

cat("R-squared:", r_squared, "\n")

R² is the proportion of the variance in Y explained by the predictors X1, X2 and X3.

R-squared: 0.6883342, so the model explains about 68.8% of the variation in Y; the corresponding multiple correlation coefficient is √0.6883342 ≈ 0.83.
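As a check on what R² measures, it can be recomputed from the sums of squares as 1 - SSE/SSTO, and its square root equals the correlation between Y and the fitted values. The sketch below shows both on simulated data; `sim` and `fit` are assumptions used only for illustration.

```r
# Simulated data and model (illustrative only)
set.seed(5)
n <- 50
sim <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rbinom(n, 1, 0.3))
sim$Y <- 1 + 0.5 * sim$X1 + 2 * sim$X3 + rnorm(n)
fit <- lm(Y ~ X1 + X2 + X3, data = sim)

# R-squared from the sums of squares
sse  <- sum(resid(fit)^2)               # error sum of squares
ssto <- sum((sim$Y - mean(sim$Y))^2)    # total sum of squares
r2_manual  <- 1 - sse / ssto
r2_summary <- summary(fit)$r.squared

# Multiple correlation: correlation between Y and the fitted values
multiple_r <- cor(sim$Y, fitted(fit))

cat("R-squared (manual): ", r2_manual, "\n")
cat("R-squared (summary):", r2_summary, "\n")
cat("Multiple R squared: ", multiple_r^2, "\n")
```

R² is therefore a proportion of explained variation; the correlation-type quantity is its square root.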
