Assignment AnjaliVats 244

1. Various decision tree models were generated using different combinations of independent variables from the Carseats dataset to predict the target variables.
2. The accuracy of the models was evaluated using measures such as mean squared error.
3. The most accurate model was the one predicting whether a person belongs to the US (Yes/No) category, achieving around 89% accuracy.


Assignment

On

Decision Tree Analysis

IN PARTIAL FULFILLMENT OF THE DEGREE OF

Master of Business Administration - Intelligent Data Science (MBA-IDS: 2018-2020)

UNDER GUIDANCE OF

Prof. Keerti Jain

Anjali Vats MB18GID244


Contents
Problem Statement
DataSet
Models of Decision Tree & Model Accuracy
    Using rpart
    Comparing different combinations of models using mean squared error values
    Using the tree package
Best Model Generated
    Comparing Accuracy
    Comparing Means
Problem Statement

1. Build various decision tree models using different combinations of independent variables.
2. Check the accuracy of the models.
3. Find the best model among those generated.
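The three steps above can be sketched end-to-end on a small synthetic data frame before touching the Carseats file. The data below is invented purely for illustration; only the workflow (fit, predict, score) mirrors the assignment:

```r
library(rpart)  # rpart ships with standard R installations

set.seed(1)
# toy data: the outcome depends mainly on x1
d <- data.frame(x1 = runif(200), x2 = runif(200))
d$y <- factor(ifelse(d$x1 > 0.5, "Yes", "No"))

# step 1: build a decision tree model
fit <- rpart(y ~ x1 + x2, data = d, method = "class")

# step 2: check its accuracy
pred <- predict(fit, d, type = "class")
accuracy <- mean(pred == d$y)
accuracy

# step 3: the best model would be chosen by repeating this
# for different predictor combinations and comparing accuracies
```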

DataSet

The Carseats dataset is a dataframe with 400 observations on the following 11 variables:

1. Sales: unit sales (in thousands) at each location
2. CompPrice: price charged by the competitor at each location
3. Income: community income level (in thousands of dollars)
4. Advertising: local advertising budget at each location (in thousands of dollars)
5. Population: regional population (in thousands)
6. Price: price charged for car seats at each site
7. ShelveLoc: quality of the shelving location (Bad, Good or Medium)
8. Age: age level of the population
9. Education: education level at each location
10. Urban: Yes/No
11. US: Yes/No
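For the classification models below, the categorical columns (ShelveLoc, Urban, US) need to be factors; recent R versions no longer convert strings automatically when reading a CSV. A minimal base-R sketch of the conversion, on a stand-in data frame with the same column names:

```r
# stand-in rows with the Carseats categorical columns
df <- data.frame(
  ShelveLoc = c("Bad", "Good", "Medium"),
  Urban     = c("Yes", "No", "Yes"),
  US        = c("No", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

# convert the character columns to factors
for (col in c("ShelveLoc", "Urban", "US")) df[[col]] <- factor(df[[col]])

str(df)
levels(df$ShelveLoc)   # factor levels are sorted alphabetically
```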
Models of Decision Tree & Model Accuracy

Using rpart
1. Predicting whether a person belongs to the US (Yes/No) based on the variables Income, Advertising, Population and Price.
############ CODE 1 ##################

# install (once) and load rpart
install.packages("rpart")
library(rpart)

getwd()

# read the Carseats data from a local CSV file
Carseats <- read.csv("C:/Users/intone/Desktop/MBA/T4/MA/After MidTerm/Carseats.csv")
attach(Carseats)
names(Carseats)

# classification tree for US (Yes/No) on four predictors
tree_analysis <- rpart(US ~ Income + Advertising + Population + Price,
                       data = Carseats, method = "class")
tree_analysis

# install (once) and load rpart.plot, then draw the tree
install.packages("rpart.plot")
library(rpart.plot)
rpart.plot(tree_analysis, extra = 1)

2) Predicting whether a store is Urban (Yes/No) based on the other parameters in the dataset, i.e. Income, Advertising, Education, Population, Price, Age, ShelveLoc, US and Sales.

# drop the first (row-index) column left over from the CSV
Carseats <- Carseats[, -1]

# ensure Urban is a Yes/No factor
Carseats$Urban <- factor(Carseats$Urban, levels = c("Yes", "No"))

print(summary(Carseats))

set.seed(1234)

# 70/30 split into training and validation sets
ind <- sample(2, nrow(Carseats), replace = TRUE, prob = c(0.7, 0.3))
trainData <- Carseats[ind == 1, ]
validationData <- Carseats[ind == 2, ]

tree <- rpart(Urban ~ ., data = trainData, method = "class")
rpart.plot(tree)

# evaluate on the validation set: confusion matrix and accuracy
pred <- predict(tree, validationData, type = "class")
table(Predicted = pred, Actual = validationData$Urban)
mean(pred == validationData$Urban)


3) Predicting ShelveLoc (Good, Medium or Bad) based on the other parameters in the dataset (Income, Advertising, Education, Population, Price, Age, Urban, US, Sales).

############ CODE ##################

tree_analysis <- rpart(ShelveLoc ~ Income + Advertising + Education + Population +
                         Price + Age + Urban + US + Sales,
                       data = Carseats, method = "class")

rpart.plot(tree_analysis, extra = 1)

Comparing different combinations of models using mean squared error values
Different combinations of independent variables were used to create models, and their mean squared error values were calculated from the difference between actual and predicted values.

The lowest mean error was obtained for model7 and model8, with 7 and 8 independent variables respectively, i.e. with the following combinations:

tree_model7 = tree(High ~ Advertising + Age + Price + Education + Income + Population + US + ShelveLoc, training_data)
tree_model8 = tree(High ~ Advertising + Age + Price + Education + Income + Population + US + ShelveLoc + Urban, training_data)
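The error measure behind this comparison can be shown in isolation: for classification models it is the proportion of mismatches between predicted and actual labels. The two vectors below are hypothetical, purely to demonstrate the computation:

```r
# hypothetical actual and predicted labels for five observations
actual    <- c("Yes", "No", "Yes", "Yes", "No")
predicted <- c("Yes", "No", "No",  "Yes", "No")

# misclassification rate: one mismatch out of five
error_rate <- mean(predicted != actual)
error_rate   # 0.2
```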
Using the tree package
a) Creating a decision tree model by splitting the dataset into training and test data.

# install (once) and load the tree package
install.packages("tree")
library(tree)

# High is assumed to be derived from Sales, the usual recoding for this dataset
High <- factor(ifelse(Carseats$Sales <= 8, "No", "Yes"))
Carseats <- data.frame(Carseats, High)

# split data into training and test sets
set.seed(2)
train <- sample(1:nrow(Carseats), nrow(Carseats)/2)
training_data <- Carseats[train, ]
testing_data <- Carseats[-train, ]
testing_High <- High[-train]

# fit the tree model using training data (Sales is excluded
# because High was derived from it)
tree_model <- tree(High ~ . - Sales, training_data)
plot(tree_model)
text(tree_model, pretty = 0)

# test-set misclassification rate
tree_pred <- predict(tree_model, testing_data, type = "class")
mean(tree_pred != testing_High)

#PRUNE the tree


## cross-validation to check where to stop pruning
set.seed(3)

cv_tree=cv.tree(tree_model, FUN=prune.misclass)
names(cv_tree)
plot(cv_tree$size, cv_tree$dev, type="b")
##prune the tree

pruned_model=prune.misclass(tree_model, best=9)

plot(pruned_model)

text(pruned_model, pretty=0)

##check how it is doing

tree_pred=predict(pruned_model, testing_data, type="class")

mean(tree_pred !=testing_High)
Best Model Generated

Comparing Accuracy
1) The best model generated is the one in which the US (Yes/No) labels were predicted from the other variables. This model has an accuracy of around 89%.
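To illustrate how an accuracy of roughly 89% arises on 400 observations, here is the computation on a confusion matrix. The counts below are invented for illustration; the report does not list the actual matrix:

```r
# hypothetical confusion matrix for the US (Yes/No) model on 400 observations
conf <- matrix(c(160,  25,
                  20, 195),
               nrow = 2, byrow = TRUE,
               dimnames = list(Predicted = c("No", "Yes"),
                               Actual    = c("No", "Yes")))

# accuracy = correctly classified / total
accuracy <- sum(diag(conf)) / sum(conf)
accuracy   # 0.8875, i.e. about 89%
```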

Comparing Means
The lowest mean error was obtained for model7 and model8, with 7 and 8 independent variables respectively, i.e. with the following combinations:

tree_model7 = tree(High ~ Advertising + Age + Price + Education + Income + Population + US + ShelveLoc, training_data)
tree_model8 = tree(High ~ Advertising + Age + Price + Education + Income + Population + US + ShelveLoc + Urban, training_data)
