Lecture 16
Javed Iqbal
Decision Tree Models for Regression and Classification
Machine learning is an AI technique that teaches computers to learn from experience. Machine learning algorithms use computational methods to “learn” information directly from data, without relying on a predetermined equation as a model.
Regression Tree:
Consider modelling baseball players' salary, a quantitative variable; the resulting tree model is therefore a regression tree. There are 322 observations (players), with 20 variables measured on each. Many variables may affect salary (in thousands of dollars, expressed in logs), but suppose we consider only two predictors (X1: years of experience, and X2: hits made in the previous season). The resulting decision tree is represented as follows:
Overall, the tree stratifies or segments players into three regions of predictor space:
$R_1 = \{X \mid \text{Years} < 4.5\}$, $R_2 = \{X \mid \text{Years} \geq 4.5,\ \text{Hits} < 117.5\}$,
$R_3 = \{X \mid \text{Years} \geq 4.5,\ \text{Hits} \geq 117.5\}$
Here we have two internal nodes and three leaves (terminal nodes). Within a given region, the predicted salary is the average salary of the training players in that region.
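As a quick illustration, the following sketch fits such a regression tree with scikit-learn (a minimal example; the (Years, Hits) values and log-salaries below are made up, since the lecture's actual data set has 322 players):

import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical (Years, Hits) pairs with log(salary in $1000s) as response
X = np.array([[2, 50], [3, 120], [5, 80], [6, 150], [10, 90], [12, 170]])
y = np.array([4.5, 4.8, 5.5, 6.4, 5.6, 6.7])

# Limit the tree to two levels of splits, as in the lecture's tree
tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["Years", "Hits"]))

# The prediction for a new player is the mean response of the
# training players in the leaf (region) where that player falls
print(tree.predict([[5, 100]]))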
Interpretation of the tree outcome: Years is the most important predictor of salary, so players with fewer years of experience (in particular, less than 4.5 years) have lower salaries. Given that a player has low experience, the number of hits plays no role in determining salary. But among players with more than 4.5 years of experience, the number of hits also becomes important, with higher salaries paid to players with a high number of hits.
In general, the goal of the tree algorithm is to find regions (boxes) $R_1, R_2, \ldots, R_J$ that minimize the residual sum of squares (RSS), given by:

$$\sum_{j=1}^{J} \sum_{i \in R_j} \left( y_i - \hat{y}_{R_j} \right)^2$$

where $\hat{y}_{R_j}$ is the mean response for the training observations within the $j$th box.
Which variable to split on first, and where the cut is to be made (e.g., $\text{Years} < 4.5$), is determined by the decision tree algorithm so that the residual sum of squares is minimized. Initially, the residual sum of squares equals the sum of squared deviations of all observations $y$ from the grand mean $\bar{y}$. For each predictor we consider all possible cut points; the cut that provides the greatest reduction in the residual sum of squares is selected by the algorithm. After a cut is made, we keep segmenting the predictor space to look for further reductions in the residual sum of squares.
The algorithm stops when a stopping criterion is reached, e.g., when only a certain number of cases remain in each region of the predictor space.
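The greedy search just described can be sketched in a few lines of Python (an illustration under our own naming; the lecture does not prescribe an implementation):

import numpy as np

def best_split(X, y):
    """One step of recursive binary splitting: scan every predictor and
    every candidate cut point and return the cut that most reduces the
    residual sum of squares (RSS)."""
    best_j, best_s, best_rss = None, None, np.sum((y - y.mean()) ** 2)  # RSS with no split
    for j in range(X.shape[1]):          # each predictor in turn
        for s in np.unique(X[:, j]):     # each possible cut point for that predictor
            left, right = y[X[:, j] < s], y[X[:, j] >= s]
            if len(left) == 0 or len(right) == 0:
                continue                 # a cut must leave cases on both sides
            rss = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
            if rss < best_rss:           # keep the cut with the greatest RSS reduction
                best_j, best_s, best_rss = j, s, rss
    return best_j, best_s, best_rss

A full tree grower would call best_split recursively on the two resulting half-spaces until the stopping criterion above is reached.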
Ex1: Prepare a decision tree from the given rectangular predictor space (right panel), or vice versa: given the decision tree (left panel), prepare a rectangular predictor space for the two predictors X1 and X2.
Ex2: Prepare a decision tree from the given rectangular predictor space for the two predictors X1 and X2.
Ex3: Prepare a rectangular predictor space corresponding to the decision tree given below for
the two predictors X1 and X2.
Classification Tree:
Used for qualitative dependent variables. The tree-building procedure is similar to that of the regression tree.
Here the prediction for a test case is given by the label of the most commonly occurring class (i.e., by majority vote) in the region in which the test case falls.
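A minimal sketch of this majority-vote rule using scikit-learn (the toy predictors and labels below are made up for illustration):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical predictors (age, owns_home coded 0/1) and a binary response
X = np.array([[25, 0], [30, 1], [45, 0], [50, 1], [60, 0], [65, 1]])
y = np.array(["no", "no", "yes", "yes", "yes", "no"])

clf = DecisionTreeClassifier(max_depth=2).fit(X, y)

# predict() labels a test case with the most commonly occurring class
# among the training cases in the leaf where the test case falls
print(clf.predict([[40, 1]]))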
We keep segmenting the predictor space to minimize node impurity as measured by, e.g., the Gini index G. (Here K is the number of classes of the response variable, e.g., K = 2 for binary classification.)
$$G = \sum_{k=1}^{K} \hat{p}_{mk} \left( 1 - \hat{p}_{mk} \right)$$
Here $\hat{p}_{mk}$ is the proportion of training observations in the $m$th region that are from the $k$th class. G will be small when the region contains mostly one class of cases, so that $\hat{p}_{mk}$ is close to either zero or one. When the region contains equal numbers of cases of the two classes, $\hat{p}_{mk}$ will be close to 0.5 and G will be large.
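The behaviour of G can be checked with a small helper function (our own sketch, not from the lecture):

import numpy as np

def gini(labels):
    """Gini index of one region: G = sum over classes of p_mk * (1 - p_mk)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()   # class proportions p_mk
    return np.sum(p * (1 - p))

print(gini(["yes"] * 9 + ["no"] * 1))  # nearly pure region: G = 0.18
print(gini(["yes"] * 5 + ["no"] * 5))  # 50/50 region: G = 0.50, the maximum for K = 2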
Ex4: Consider predicting the default status (yes or no) of a sample of clients from the following classification tree. The predictors are whether the client owns a home, marital status (married or single/divorced), and annual income (in dollars).
a) Predict whether or not a married client who has income of $100,000 and who does not own
a house will default.
b) Predict whether a single client who has income of $60,000 and who owns a house will
default.
c) Predict whether a single client who has an income of $70,000 and who does not own a
house will default.
d) What are the two most important predictors in predicting default status?
e) What is the number of internal nodes and leaves in this problem?
[Ans: a) will not default, b) will not default, c) will not default, d) home ownership and marital status, e) 3 internal nodes and 4 leaves]
Ex5: Consider the regression tree grown on the response variable ‘pollution level’ and seven predictors, as shown in the tree plot, for different test locations. The predictors include the ‘number of industries’ in the location, the ‘population’ (in thousands) of that location, the average number of ‘wet days’ per year, the average ‘temperature’ in Fahrenheit, and the ‘wind speed’ in km/hour. The number in each leaf node is the average pollution level in that region of predictor space.
a) Is this a regression or a classification tree?
b) What are the two most important variables determining the pollution level in an area?
c) Predict the average pollution level in a test location which has 500 industries, a population of 200 thousand, an average of 150 wet days, an average temperature of 50 Fahrenheit, and a wind speed of 8 km/hour.
[Answer: a) regression b) number of industries and population in the test area c) 33.88]
Ex6: Consider the classification tree grown to predict whether a client will buy a computer (yes or no). The predictors are age (with three levels: youth, middle-aged, and senior), whether the client is a student, and the credit rating of the individual (fair or excellent). The resulting classification tree is as follows: