Factor-Hair RV PDF

The document describes performing exploratory data analysis, factor analysis, and multiple linear regression on a dataset with customer satisfaction and 11 independent variables. 4 factors were extracted from factor analysis explaining 79.5% of variance: Procurement, Marketing, AfterSales, and Segment. Multiple linear regression was performed with customer satisfaction as the dependent variable and the 4 factors as independents, finding the model explained 70% of variance. Removing the insignificant AfterSales factor did not improve the model significantly. Adding an interaction term improved the model, explaining 75% of variance.

Project Factor-Hair-Revised

R Venkataraman
Objective
❖ Perform exploratory data analysis on the dataset.
Showcase some charts and graphs. Check for outliers and
missing values.
❖ Is there evidence of multicollinearity? Showcase your
analysis.
❖ Perform simple linear regression for the dependent
variable with every independent variable.
❖ Perform PCA/factor analysis by extracting 4 factors.
Interpret the output and name the factors.
❖ Perform multiple linear regression with customer
satisfaction as the dependent variable and the four factors
as independent variables. Comment on the model output
and validity. Your remarks should make it meaningful for
everybody.

Data file: Factor-Hair-Revised.csv


Exploratory Data Analysis
• The extract shows 13 variables with 100 rows.
• ID is the observation number, which can be ignored
or deleted for any analysis.
• Satisfaction is the dependent variable, which is modeled as a
function of 11 independent variables.
• All variables are of numeric data type.
• No variable has blank or NA values.
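The structural checks listed above can be sketched in R, the language the deck itself uses for its lm() call. This is only an illustration: the data frame is a synthetic stand-in for the CSV, and the column names and the object names `hair_raw` and `hair` are assumptions taken from the variable names mentioned in the slides.

```r
# Synthetic stand-in for Factor-Hair-Revised.csv (random numbers, NOT the real data).
# With the real file one would instead run:
#   hair_raw <- read.csv("Factor-Hair-Revised.csv")
set.seed(1)
vars <- c("ProdQual", "Ecom", "TechSup", "CompRes", "Advertising", "ProdLine",
          "SalesFImage", "CompPricing", "WartyClaim", "OrdBilling", "DelSpeed",
          "Satisfaction")  # names assumed from the slides
hair_raw <- data.frame(ID = 1:100, matrix(rnorm(100 * length(vars)), nrow = 100))
names(hair_raw)[-1] <- vars

dims <- dim(hair_raw)                             # number of rows and columns
all_numeric <- all(sapply(hair_raw, is.numeric))  # TRUE if every column is numeric
na_counts <- colSums(is.na(hair_raw))             # missing values per column

# ID is only an observation number, so drop it before any analysis.
hair <- hair_raw[, names(hair_raw) != "ID"]
str(hair)
```
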
Exploratory Data Analysis
• A boxplot of the dataset reveals outliers in 4
variables (Ecom, SalesFImage, OrdBilling & DelSpeed).
• Separate boxplots of these variables give a
better illustration.
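A minimal sketch of how boxplot outliers can be listed numerically, assuming the usual 1.5 × IQR whisker rule that R's boxplot() applies by default. The vector `x` is hypothetical, with two extreme values planted on purpose; it is not a column of the real dataset.

```r
# Hypothetical variable: 98 ordinary values plus two planted extremes.
set.seed(42)
x <- c(rnorm(98, mean = 5, sd = 1), 9.5, 0.2)

# boxplot.stats() returns the five-number summary and the points that
# fall beyond the whiskers (coef = 1.5 by default).
bs <- boxplot.stats(x)
outliers <- bs$out
outliers

# boxplot(x) would draw the same points as individual dots.
```
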

Customer Satisfaction (dependent variable)
• Negatively skewed
• Both the Shapiro-Wilk test and the density graph indicate that the
dependent variable is approximately normally distributed.
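The normality check can be sketched with base R's shapiro.test(). The `satisfaction` vector below is simulated for illustration only, so its p-value says nothing about the real data.

```r
# Hypothetical stand-in for the Satisfaction column.
set.seed(2)
satisfaction <- rnorm(100, mean = 6.9, sd = 1.2)

# Shapiro-Wilk test: the null hypothesis is that the data are normal,
# so a p-value above .05 means normality is not rejected.
sw <- shapiro.test(satisfaction)
sw$p.value

# Kernel density estimate; plot(dens) would draw the density graph.
dens <- density(satisfaction)
```
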
Exploratory Data Analysis
• Histogram with density graph of independent variables
Multicollinearity

The correlation data shows high values between:

• CompRes and DelSpeed
• OrdBilling and CompRes
• WartyClaim and TechSupport
• OrdBilling and DelSpeed
• Ecom and SalesFImage

More tests follow.
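One way to pull such pairs out of a correlation matrix is sketched below. The data are simulated so that two columns share a common component, and the 0.6 cutoff is an assumption chosen for illustration; the slides do not state the threshold that was used.

```r
# Simulated predictors: CompRes and DelSpeed share a common component.
set.seed(7)
n <- 100
shared <- rnorm(n)
X <- data.frame(
  CompRes     = shared + rnorm(n, sd = 0.5),
  DelSpeed    = shared + rnorm(n, sd = 0.5),
  Advertising = rnorm(n)
)

r <- cor(X)

# Keep each pair once (upper triangle) where |r| exceeds the assumed cutoff.
idx <- which(abs(r) > 0.6 & upper.tri(r), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r    = r[idx]
)
high_pairs
```
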
Multicollinearity
We will calculate the VIF (Variance Inflation Factor).

Regression Model

• The adjusted R-squared value indicates that 78% of the
variance is explained by the independent variables.

• Three variables are significant.

• A p-value < .05 implies the model is significant.

VIF
As a rule of thumb:
• 1 = not correlated.
• Between 1 and 5 = moderately correlated.
• Greater than 5 = highly correlated.

Inference:
• CompRes is moderately correlated with the other predictors.
• DelSpeed is highly correlated with the other predictors.
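For reference, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The sketch below computes it by hand in base R on simulated data; `vif_manual` is a hypothetical helper written for this illustration, and the slides do not say which function produced their VIF values.

```r
# Simulated predictors: CompRes and DelSpeed overlap, Advertising is independent.
set.seed(7)
n <- 100
shared <- rnorm(n)
X <- data.frame(
  CompRes     = shared + rnorm(n, sd = 0.5),
  DelSpeed    = shared + rnorm(n, sd = 0.5),
  Advertising = rnorm(n)
)

# Hypothetical helper: regress each predictor on the rest, then 1 / (1 - R^2).
vif_manual <- function(X) {
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(X)
vifs
```

A value near 1 means the predictor is almost unrelated to the others; packages such as car offer a vif() function that works directly on a fitted model.
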
Simple Linear Regression
Simple linear regression of Satisfaction was performed on each of the eleven independent variables in turn. The
intercepts and slopes are produced below, together with a graphical representation.
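Fitting one regression per predictor can be sketched as a loop over column names. The data frame `d` is simulated and uses only three predictors to keep the example short; the coefficients it prints are not the ones from the slides.

```r
# Simulated data with a known relationship, for illustration only.
set.seed(5)
n <- 100
d <- data.frame(ProdQual = rnorm(n), Ecom = rnorm(n), DelSpeed = rnorm(n))
d$Satisfaction <- 7 + 0.5 * d$ProdQual + 0.9 * d$DelSpeed + rnorm(n, sd = 0.8)

predictors <- setdiff(names(d), "Satisfaction")

# One simple regression per predictor; collect intercept and slope in a table.
fits <- t(sapply(predictors, function(v) {
  coef(lm(reformulate(v, response = "Satisfaction"), data = d))
}))
colnames(fits) <- c("intercept", "slope")
fits
```
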
PCA/Factor Analysis
• Bartlett test: a p-value < .05 confirms the possibility of data dimension reduction.
• Eigenvalues: the Kaiser rule suggests that components with an eigenvalue >= 1 can be
considered, so four components can be used.
• Proportion of variance: 4 PCs are able to explain 79.5% of the variance.
• Scree plot: the number of components that appear before the
elbow is again 4.
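The checks on this slide can be sketched in base R as follows. The data are simulated with two underlying factors, so the Kaiser rule keeps two components here rather than four. Bartlett's sphericity statistic is computed by hand from its standard formula; the psych package also offers cortest.bartlett(), and the slides do not say which route was taken.

```r
# Simulated data: six variables driven by two hidden factors.
set.seed(11)
n <- 100
f1 <- rnorm(n); f2 <- rnorm(n)
X <- cbind(a1 = f1 + rnorm(n, sd = 0.5), a2 = f1 + rnorm(n, sd = 0.5),
           a3 = f1 + rnorm(n, sd = 0.5), b1 = f2 + rnorm(n, sd = 0.5),
           b2 = f2 + rnorm(n, sd = 0.5), b3 = f2 + rnorm(n, sd = 0.5))
p <- ncol(X)
R <- cor(X)

# Bartlett's test of sphericity: H0 says R is an identity matrix.
chisq   <- -((n - 1) - (2 * p + 5) / 6) * log(det(R))
df      <- p * (p - 1) / 2
p_value <- pchisq(chisq, df, lower.tail = FALSE)

# Kaiser rule and cumulative proportion of variance from the eigenvalues.
ev      <- eigen(R)$values
n_keep  <- sum(ev >= 1)
cum_var <- cumsum(ev) / sum(ev)

# plot(ev, type = "b") would draw the scree plot.
```
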
PCA/Factor Analysis
Positively correlated variables point to the same side
and negatively correlated ones to opposite sides.
PCA/Factor Analysis
Factor analysis using 4 factors (principal axis factoring with no rotation)

Factor 1 accounts for 29.20% of the variance; Factor 2 accounts
for 20.20% of the variance; Factor 3 accounts for 13.60% of the
variance; Factor 4 accounts for 6.2% of the variance. Together, the 4
factors explain 69.2% of the variance.
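These percentages appear to follow from the four unrotated values reported in the summary (3.21, 2.22, 1.50, 0.68). Assuming those values are sums of squared loadings and that all p = 11 independent variables entered the analysis, each proportion is the value divided by 11:

```latex
\text{proportion}_k = \frac{\mathrm{SS}_k}{p}, \qquad p = 11

\frac{3.21}{11} \approx 0.292, \quad
\frac{2.22}{11} \approx 0.202, \quad
\frac{1.50}{11} \approx 0.136, \quad
\frac{0.68}{11} \approx 0.062

\frac{3.21 + 2.22 + 1.50 + 0.68}{11} = \frac{7.61}{11} \approx 0.692
```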
PCA/Factor Analysis
Factor analysis using 4 factors (principal axis factoring with varimax rotation)

Factor 1 accounts for 24% of the variance; Factor 2 accounts for
17.90% of the variance; Factor 3 accounts for 14.90% of the
variance; Factor 4 accounts for 12.5% of the variance. Together, the 4
factors explain 69.2% of the variance.
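The total is the same with and without rotation because an orthogonal rotation such as varimax only redistributes the explained variance among the factors. The sketch below illustrates this in base R on simulated data. It uses principal-component loadings as a simple stand-in for principal axis factoring, which is a swapped-in technique, not the method from the slides; the PA1 to PA4 labels suggest something like the psych package's fa() with fm = "pa", but that is an assumption.

```r
# Simulated data: six variables driven by two hidden factors.
set.seed(11)
n <- 100
f1 <- rnorm(n); f2 <- rnorm(n)
X <- cbind(a1 = f1 + rnorm(n, sd = 0.5), a2 = f1 + rnorm(n, sd = 0.5),
           a3 = f1 + rnorm(n, sd = 0.5), b1 = f2 + rnorm(n, sd = 0.5),
           b2 = f2 + rnorm(n, sd = 0.5), b3 = f2 + rnorm(n, sd = 0.5))

# Unrotated loadings from the eigen decomposition (PCA stand-in, not PAF).
e <- eigen(cor(X))
k <- 2
unrotated <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))
rownames(unrotated) <- colnames(X)

# Varimax rotation from the stats package.
rotated <- unclass(varimax(unrotated)$loadings)

# Sum of squared loadings per factor, and as a share of all variables.
ss_unrot   <- colSums(unrotated^2)
ss_rot     <- colSums(rotated^2)
prop_unrot <- ss_unrot / ncol(X)
prop_rot   <- ss_rot / ncol(X)

rbind(unrotated = c(prop_unrot, total = sum(prop_unrot)),
      rotated   = c(prop_rot,   total = sum(prop_rot)))
```
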
PCA/Factor Analysis
Summary

• The Bartlett test confirms that dimension reduction is possible.
• The scree plot and the Kaiser rule suggest 4 components, which together explain 79.5% of the variance.
• SS loadings (sums of squared loadings) without rotation: 3.21 2.22 1.50 0.68
• SS loadings (sums of squared loadings) with varimax rotation: 2.63 1.97 1.68 1.37
• Factor naming with variables:

| Factor | Independent Variables | Group | Business Activities |
|---|---|---|---|
| PA1 | DelSpeed, CompRes, OrdBilling | Procurement | PO, Invoice, Delivery |
| PA2 | SalesFImage, Ecom, Advertising | Marketing | All marketing activities |
| PA3 | WartyClaim, TechSup | AfterSales | Post-sales service |
| PA4 | ProdLine, ProdQual, CompPricing | Segment | Product positioning |

Multiple Linear Regression Equation:

Satisfaction = β0 + β1*Procurement + β2*Marketing + β3*AfterSales + β4*Segment


Multiple Linear Regression
Scores from the rotated factor analysis are used for this regression.

Model 1
• A p-value < .05 confirms that the relationship between the
independent and dependent variables is significant.
• The VIFs of the variables confirm no multicollinearity, as the
values are close to 1.
• The intercept and all independent variables except AfterSales
are highly significant.
• An R-squared value of 70% is the share of variation in the dependent
variable explained by the model.
• An adjusted R-squared value of 68% is that share after adjusting
for the number of independent variables in the model.
• Because AfterSales is not significant, there is scope for
improving the model by dropping it.
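As a check on the adjusted figure, the standard formula can be applied to the Model 1 values in the summary table (R² = 0.6971, residual degrees of freedom 95), assuming n = 100 observations and p = 4 predictors:

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}

\bar{R}^2_{\text{Model 1}} = 1 - (1 - 0.6971)\cdot\frac{99}{95}
  = 1 - 0.3029 \times 1.0421 \approx 0.684
```

This agrees with the reported 0.6844 up to rounding of R².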
Multiple Linear Regression
Model 1: predicted values are graphed against actual
values to visualize the prediction trend (legend: Predicted, Actuals).
Multiple Linear Regression
AfterSales (less significant) is removed from the regression equation.

Model 2
• A p-value < .05 confirms that the relationship between the
independent and dependent variables is significant.
• The VIFs of the variables confirm no multicollinearity, as the
values are close to 1.
• The intercept and all independent variables are highly
significant.
• An R-squared value of 70% is the share of variation in the dependent
variable explained by the model.
• An adjusted R-squared value of about 69% (0.6856) is that share after
adjusting for the number of independent variables in the model.
• There is no big difference between Model 1 and Model 2,
except that one independent variable is dropped.
Multiple Linear Regression
Model 2: predicted values are graphed against actual
values to visualize the prediction trend (legend: Predicted, Actuals).
Multiple Linear Regression
As a best practice, we check whether any significant interactions exist between the
variables that should be included in the model.

Model 3
• A p-value < .05 confirms that the relationship between the
independent and dependent variables is significant.
• The VIFs of the variables confirm no multicollinearity, as the
values are close to 1.
• The intercept and all independent variables are highly
significant; the Marketing*Segment interaction is moderately
significant.
• An R-squared value of 75% is the share of variation in the dependent
variable explained by the model.
• An adjusted R-squared value of 73% is that share after adjusting
for the number of independent variables in the model.
• Model 3 is a good improvement over
Model 1 and Model 2 because it considers interactions.
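A minimal sketch of comparing a main-effects model with an interaction model, using simulated factor scores in a hypothetical data frame `d` (not the `myfadata` object from the slides). In an R formula, `A * B * C` expands to all main effects plus every two-way and the three-way interaction, which gives 7 slope terms and, with 100 rows, 92 residual degrees of freedom.

```r
# Simulated factor scores with one built-in interaction, for illustration only.
set.seed(3)
n <- 100
d <- data.frame(Procurement = rnorm(n), Marketing = rnorm(n), Segment = rnorm(n))
d$Satisfaction <- 7 + 0.6 * d$Procurement + 0.6 * d$Marketing + 0.6 * d$Segment +
  0.3 * d$Marketing * d$Segment + rnorm(n, sd = 0.6)

# Main effects only, then main effects plus all interactions.
m2 <- lm(Satisfaction ~ Procurement + Marketing + Segment, data = d)
m3 <- lm(Satisfaction ~ Procurement * Marketing * Segment, data = d)

# F-test for the nested models and the two adjusted R-squared values.
cmp <- anova(m2, m3)
cmp
c(model2 = summary(m2)$adj.r.squared, model3 = summary(m3)$adj.r.squared)
```

The nested F-test is one way to judge whether the extra interaction terms earn their place, alongside the adjusted R-squared comparison used in the slides.
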
Multiple Linear Regression
Model 3: predicted values (with interaction) are graphed against
actual values to visualize the prediction trend (legend: Predicted, Actuals).
Multiple Linear Regression
Summary

| Measure | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| Description | All four factors included | Model 1 + factor AfterSales excluded | Model 2 + interaction between 3 factors included |
| Multiple R Square | 0.6971 | 0.6951 | 0.7494 |
| Adjusted R Square | 0.6844 | 0.6856 | 0.7303 |
| Standard Error | 0.6696 | 0.6683 | 0.6189 |
| Degrees of Freedom | 95 | 96 | 92 |
| Intercept | 6.9180 | 6.9180 | 6.9510 |
| Slopes | | | |
| Procurement | 0.5796 | 0.5794 | 0.7215 |
| Marketing | 0.6197 | 0.6206 | 0.5340 |
| AfterSales | 0.0569 | | |
| Segment | 0.6116 | 0.6148 | 0.6023 |
| Interaction | | | |
| Procurement*Marketing | | | 0.1352 |
| Procurement*Segment | | | 0.0811 |
| Marketing*Segment | | | 0.2307 |
| Procurement*Marketing*Segment | | | 0.2089 |

With Model 3, including the interactions improves the model performance measures, and this model can be used for predictive analysis.

MLR: lm(Satisfaction ~ Procurement + Marketing + Segment + Procurement*Marketing + Procurement*Segment + Marketing*Segment + Procurement*Marketing*Segment, data = myfadata)