
Lecture 16: Statistical Inference by Dr. Javed Iqbal
Decision Tree Models for Regression and Classification
Machine Learning is an AI technique that teaches computers to learn from experience. Machine
learning algorithms use computational methods to “learn” information directly from data
without relying on a predetermined equation as a model.

A decision tree is a supervised machine learning technique that segments or stratifies the predictor space into simple regions (each containing mostly one type of object or case). The splitting rules can be represented by an (upside-down) tree, hence the name decision tree method.

Regression Tree:
Consider modelling baseball players' salaries, a quantitative variable; the resulting model is therefore a regression tree. There are 322 observations (players) with 20 variables measured on each. Many variables could affect salary (in thousands of dollars, expressed in logs), but suppose we consider only two predictors (X1: years of experience and X2: hits made in the previous season). The resulting decision tree is represented as follows:

Overall, the tree stratifies or segments the players into three regions of the predictor space:
R1 = {X | Years < 4.5},  R2 = {X | Years ≥ 4.5, Hits < 117.5},  R3 = {X | Years ≥ 4.5, Hits ≥ 117.5}
Here we have two internal nodes and three leaves (terminal nodes). Within a given region, the predicted salary is the average salary of the players in that region.
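
As a minimal sketch, this tree can be grown in R with the same 'tree' package used later in this lecture, assuming the Hitters data from the ISLR package (which contains the Salary, Years and Hits variables used above; players with missing salary are dropped first):

# Sketch: regression tree for log-salary with two predictors
# (assumes the ISLR package's Hitters data)
library(tree)
library(ISLR)
hitters <- na.omit(Hitters)                    # drop players with missing Salary
salary_tree <- tree(log(Salary) ~ Years + Hits, data = hitters)
plot(salary_tree)
text(salary_tree, pretty = 0)
# prediction = mean log-salary of the region a new player falls into
predict(salary_tree, newdata = data.frame(Years = 6, Hits = 150))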
Interpretation of the tree outcome: Years is the most important predictor of salary: players with less experience (in particular, fewer than 4.5 years) have lower salaries.
Given that a player has low experience, the number of hits plays no role in salary. But among players with 4.5 or more years of experience, the number of hits also becomes important, with higher salaries paid to players with a high number of hits.
In general, the goal of the tree algorithm is to find regions (boxes) R_1, R_2, ..., R_J that minimize the residual sum of squares (RSS), given by:

$$\sum_{j=1}^{J} \sum_{i \in R_j} \left( y_i - \hat{y}_{R_j} \right)^2$$

where $\hat{y}_{R_j}$ is the mean response for the training observations within the jth box.

Which variable to split on first, and where to place the cut (e.g., Years < 4.5), is determined by the decision tree algorithm so that the residual sum of squares is minimized. Initially the residual sum of squares equals the sum of squared deviations of all observations y from the grand mean ȳ. For each predictor we consider all possible cut points, and the algorithm selects the cut that provides the greatest reduction in the residual sum of squares. After a cut is made, we keep segmenting the predictor space to look for further reductions in the residual sum of squares, as the sketch below illustrates.
The algorithm stops when a stopping criterion is reached, e.g., when only a certain number of cases remain in a region of the predictor space.
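
To make this greedy search concrete, here is a small sketch (a hypothetical helper written for illustration, not a function from any package) that scans all candidate cut points of a single predictor and returns the cut minimizing the RSS:

# Sketch: brute-force search for the best cut point of one predictor x
# for a quantitative response y (illustrative helper only)
best_cut <- function(x, y) {
  xs   <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2     # midpoints between adjacent values
  rss  <- sapply(cuts, function(cut) {
    left  <- y[x <  cut]
    right <- y[x >= cut]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  cuts[which.min(rss)]                          # cut giving the greatest RSS reduction
}
# e.g. best_cut(hitters$Years, log(hitters$Salary)) recovers a cut near 4.5

The full algorithm repeats this search over every predictor in every current region, a strategy known as recursive binary splitting.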
Ex1: Prepare a decision tree from the given rectangular predictor space (right panel), or, vice versa, given the decision tree (left panel), prepare a rectangular predictor space for the two predictors X1 and X2.
Ex2: Prepare a decision tree from the given rectangular predictor space for the two predictors X1 and X2.

Ex3: Prepare a rectangular predictor space corresponding to the decision tree given below for
the two predictors X1 and X2.

Classification Tree:
Used for qualitative dependent variables. The tree-building procedure is similar to that of the regression tree.
Here the prediction for a test case is given by the label of the most commonly occurring class (i.e., a majority vote) in the region into which the test case falls.
We keep segmenting the predictor space to minimize node impurity, as measured by, e.g., the Gini index G (here K is the number of classes of the response variable, e.g., K = 2 for binary classification):
$$G = \sum_{k=1}^{K} \hat{p}_{mk} \left( 1 - \hat{p}_{mk} \right)$$

Here $\hat{p}_{mk}$ is the proportion of training observations in the mth region that are from the kth class. G is small when a region contains mostly one class, so that each $\hat{p}_{mk}$ is close to either zero or one. When the two classes occur in roughly equal numbers in a region, the $\hat{p}_{mk}$ are close to 0.5 and G is large, as the sketch below shows.
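
A short R sketch shows both the impurity calculation and the majority-vote prediction for a node (the class labels here are made up for illustration):

# Sketch: Gini index of a node computed from its class labels
gini <- function(labels) {
  p <- table(labels) / length(labels)   # p_hat_mk for each class k
  sum(p * (1 - p))
}
gini(c("yes", "yes", "yes", "no"))      # mostly one class: G = 0.375 (small)
gini(c("yes", "yes", "no", "no"))       # 50/50 node: G = 0.5 (maximal for K = 2)
# majority-vote prediction for the node:
names(which.max(table(c("yes", "yes", "yes", "no"))))   # "yes"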
Ex4: Consider predicting the default status (yes or no) of a sample of clients from the following
classification tree. The predictors are whether the client owns a home, marital status (married
or single/divorced) and annual income (dollars).

a) Predict whether or not a married client who has income of $100,000 and who does not own
a house will default.
b) Predict whether a single client who has income of $60,000 and who owns a house will
default.
c) Predict whether a single client who has an income of $70,000 and who does not own a
house will default.
d) What are the two most important predictors in predicting default status?
e) What is the number of internal nodes and leaves in this problem?
[Ans: a) not default, b) not default, c) not default, d) home ownership and marital status, e) 3 internal nodes and 4 leaves]

Ex5: Consider the regression tree grown on the response variable 'pollution level' and seven predictors, as shown in the tree plot, corresponding to different test locations. The predictors include the 'number of industries' in the location, the 'population' (in thousands) of the location, the average number of 'wet days' per year, the average 'temperature' in Fahrenheit, and the 'wind speed' in km/hour. The number in each leaf node is the average pollution level in that predictor space.
a) Is this a regression or a classification tree?
b) What are the two most important variables determining the pollution level in an area?
c) Predict the average pollution level at a test location that has 500 industries, a population of 200 thousand, an average of 150 wet days, an average temperature of 50 Fahrenheit, and a wind speed of 8 km/hour.
[Ans: a) regression, b) number of industries and population in the test area, c) 33.88]

Ex6: Consider the classification tree grown to predict whether a client will buy a computer (yes or no). The predictors are age (with three levels: youth, middle-aged, and senior), whether the client is a student, and the credit rating of the individual (fair or excellent). The resulting classification tree is as follows:

a) Is this a regression or a classification problem?
b) Predict whether a youth who is not a student and has a fair credit rating will buy a computer.
c) Predict whether a senior who has an excellent credit rating will buy a computer.
[Ans: a) classification, b) not buy, c) buy]
[Further details in Bowerman's book, Ch. 3, p. 172 onwards]
Decision Tree Models in R: (using the hiring.csv data file)
# Decision tree for classification, using the hiring data
library(tree)                          # load the 'tree' package and access it via library
hiring <- read.csv(file.choose())      # interactively select hiring.csv
head(hiring)
hiring$hire <- as.factor(hiring$hire)  # the dependent variable must be a factor
set.seed(1)
hiring_model <- tree(hire ~ educ + exp + male, data = hiring)  # fit the model
# note: deviance (cross-entropy) is the default impurity measure
plot(hiring_model)
text(hiring_model, pretty = 0)
# pruning results in a smaller, more manageable and interpretable tree
hiring_pruned <- prune.tree(hiring_model, best = 4)  # a tree with 4 leaves
plot(hiring_pruned)
text(hiring_pruned, pretty = 0)
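
As a possible follow-up (a sketch building on the model fitted above), cross-validation with cv.tree can guide the choice of tree size instead of fixing best = 4 by hand:

# Sketch: choose the number of leaves by cross-validation
set.seed(1)
cv_hiring <- cv.tree(hiring_model, FUN = prune.misclass)  # CV on misclassification error
plot(cv_hiring$size, cv_hiring$dev, type = "b",
     xlab = "number of leaves", ylab = "CV misclassifications")
best_size <- cv_hiring$size[which.min(cv_hiring$dev)]
hiring_cv_pruned <- prune.misclass(hiring_model, best = best_size)
plot(hiring_cv_pruned)
text(hiring_cv_pruned, pretty = 0)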
