
Regression Trees

As we have already seen, decision trees can be used for classification, but we can also use them for regression, in which case they are commonly called regression trees.

The basic idea behind regression trees is to split our data into groups based on features, just as in classification, and to return as the prediction the average target value of the training samples that fall into each group.

Consider the housing data below, where we use ‘Age’ to predict the ‘Price’ of a house.

Here, we can see the effect age has on house prices. Houses aged between 0 and 10 have an average price of approximately $500,000, houses aged between 10 and 50 have an average price of approximately $380,000, and houses older than 50 years have an average price of approximately $100,000. Using these general ranges, we can predict the price of a house.
Using the data above, we can create the regression tree shown below. The prices were determined by calculating the average price of the houses in each age range.

As you can see, we use the features to group the houses and then calculate the average
price across these groupings.
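To make this concrete, here is a minimal sketch of the tree above as a plain Python function. The age cut-offs and the rounded average prices come from the ranges described earlier; the function name predict_price is my own.

    def predict_price(age):
        """Predict a house price from its age using the group averages above."""
        if age <= 10:          # youngest houses
            return 500_000
        elif age <= 50:        # middle-aged houses
            return 380_000
        else:                  # houses older than 50 years
            return 100_000

    print(predict_price(7))   # 500000
    print(predict_price(30))  # 380000
    print(predict_price(80))  # 100000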
Criterion

Regression trees are built in much the same way as classification trees, but with a different criterion: in classification trees we choose the feature that maximizes the information gain, while in regression trees we choose the feature that minimizes the error.

A popular criterion is the Mean Absolute Error (MAE), which we have also seen previously.
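As a concrete sketch, the MAE of a candidate split can be computed by taking the mean price of each group the split produces and averaging the absolute deviations over all samples. This is exactly how the worked examples below proceed; the function name split_mae is my own.

    def split_mae(groups):
        """MAE of a split: each sample is compared with the average of
        its own group, and the absolute errors are averaged over all
        samples in all groups."""
        errors = []
        for prices in groups:
            avg = sum(prices) / len(prices)
            errors.extend(abs(p - avg) for p in prices)
        return sum(errors) / len(errors)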

How Are Regression Trees Built?

Take the dataset sample shown below. The first step is to decide what the first decision should be. We do this by applying the criterion to every single feature in the dataset and checking which one produces the minimal error.

Categorical Features

Categorical features are simple. Here we have the ‘Near Water’ feature, so all we need to do is calculate the error we would get if we used it as the first decision. ‘Near Water’ has two categories, ‘Yes’ and ‘No’, so we calculate the average ‘Price’ of the houses in each category and then use those averages to calculate the error.

‘No’ Category:

Index of Houses in ‘No’ Category = [0, 1, 2, 3, 4]
Prices of Houses in ‘No’ Category = [260831.34, 222939.35, 101882.10, 226868.52, 94868.94]
Average House Price in ‘No’ Category = 181478.05
Absolute Error = [79353.29, 41461.30, 79595.95, 45390.47, 86609.11]

‘Yes’ Category:

Index of Houses in ‘Yes’ Category = [5, 6, 7, 8, 9]
Prices of Houses in ‘Yes’ Category = [197703.55, 347982.98, 343150.38, 206713.16, 329768.77]
Average House Price in ‘Yes’ Category = 285063.77
Absolute Error = [87360.22, 62919.22, 58086.61, 78350.61, 44705.00]

MAE

MAE = 66383.17

This is the MAE of the Near Water feature. We will compare it against the MAE of every other feature, and whichever feature produces the lowest MAE will establish the first decision in our regression tree.
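As a check, plugging the two category groups into the split_mae sketch from the Criterion section reproduces this value, up to rounding of the intermediate errors:

    no_prices  = [260831.34, 222939.35, 101882.10, 226868.52, 94868.94]
    yes_prices = [197703.55, 347982.98, 343150.38, 206713.16, 329768.77]

    # Prints 66383.18; the 66383.17 above rounds the per-house errors first.
    print(round(split_mae([no_prices, yes_prices]), 2))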

Numerical Features

Numerical features, like ‘Age’, are trickier to handle because we need to find a number to split the data by, rather than a category. We do this by creating a boundary between each pair of consecutive points and calculating the error for each boundary.

For example, we first create the boundary between the first two data points, (0, 260831.34) and (5, 347982.98), giving a boundary of x = 2.5 (the midpoint between the x components of the two points). We then find the average price of the houses on the left and right sides of this boundary and use those to calculate the MAE.
Left:

Index of Houses on the Left Side = [0]
Prices of Houses on the Left Side = [260831.34]
Average House Price of Left Side = 260831.34
Absolute Error = [0]

Right:

Index of Houses on the Right Side = [1, 2, 3, 4, 5, 6, 7, 8, 9]
Prices of Houses on the Right Side = [222939.35, 101882.10, 226868.52, 94868.94, 197703.55, 347982.98, 343150.38, 206713.16, 329768.77]
Average House Price of Right Side = 230208.64
Absolute Error = [7269.29, 128326.54, 3340.12, 135339.70, 32505.09, 117774.34, 112941.74, 23495.48, 99560.13]

MAE

Now we can find the MAE, using the absolute errors from the left and right sides.

MAE = 66055.24
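The same split_mae sketch reproduces this value from the left and right groups above (the single house on the left contributes an error of 0):

    left_prices  = [260831.34]
    right_prices = [222939.35, 101882.10, 226868.52, 94868.94, 197703.55,
                    347982.98, 343150.38, 206713.16, 329768.77]

    print(round(split_mae([left_prices, right_prices]), 2))  # 66055.24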

Further Steps

This process is then repeated for the boundary between each remaining pair of consecutive points.

This results in the following MAE for each boundary:

Boundary    MAE
2.5         66055.24
7.5         60871.09
15          50847.52
22.5        57918.20
35          49726.55
50          51792.86
57.5        51288.06
75          62616.49
95          66568.42

We can see that a boundary of 35 results in the lowest MAE for this feature.
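The scan itself can be sketched as follows, reusing split_mae from the Criterion section. The function name and the (age, price) layout are my own; the text lists only the boundaries and their MAE values, not the full age-to-price pairing, so with the actual dataset this scan would return (35, 49726.55).

    def best_boundary(ages, prices):
        """Try the midpoint between each consecutive pair of ages and
        return (boundary, MAE) for the split with the lowest MAE."""
        pairs = sorted(zip(ages, prices))
        best = None
        for i in range(len(pairs) - 1):
            boundary = (pairs[i][0] + pairs[i + 1][0]) / 2
            left  = [p for a, p in pairs if a < boundary]
            right = [p for a, p in pairs if a >= boundary]
            if not left or not right:
                continue  # skip degenerate splits (e.g. duplicate ages)
            mae = split_mae([left, right])
            if best is None or mae < best[1]:
                best = (boundary, mae)
        return best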

Choosing the Decision

Now, we compare the categorical MAE with the lowest numerical MAE: the categorical MAE is 66383.17 and the lowest numerical MAE is 49726.55. So, for the first decision, we will use the numerical ‘Age’ feature with a boundary of 35. We end up with a regression tree that looks like this:

When do we Stop?

With the regression tree above, we have two options: we can either stop here and use the average values of the ‘Yes’ (left) and ‘No’ (right) branches to predict house prices, or we can continue to add more decisions to either branch. A few conditions are commonly used to stop growing regression trees:

• Tree depth
• Number of remaining samples on a branch
• Number of samples on each branch if another decision is made

The depth of the tree above is 1, because there is a single decision, and the number of samples on each side is 5. Let’s add more decisions until the depth of the tree is 2. First, we start with the ‘Yes’ (left) side and calculate the MAE for the features using the houses that have ‘Age’ < 35.

Adding Decisions

Left

Like before, we use the Near Water feature and calculate the MAE on houses with index
0, 3, 6, 7, and 9.

Categorical Features

MAE = 11005.34

Numerical Features

Now, we find the MAE for the boundaries in the ‘Age’ feature.

We can see that the Near Water feature produces the lowest MAE on the ‘Yes’ (left) side of the regression tree, so we will add a decision on the Near Water feature there.

Right

Now we will find the feature that results in the lowest MAE using the houses with index 1, 2, 4, 5, and 8.
Categorical Features

MAE = 35018.94

Numerical Features

Here, we can see that the ‘Age’ feature results in the lowest MAE with the boundary set to 57.5, so on the ‘No’ (right) side of the tree we will add another decision on the ‘Age’ feature.

Final Result
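Combining the decisions above gives the final tree: the root splits on ‘Age’ < 35, the ‘Yes’ (left) branch then splits on ‘Near Water’, and the ‘No’ (right) branch splits on ‘Age’ < 57.5, with each leaf predicting the average price of the houses that reach it.

Pulling the pieces together, here is a minimal sketch of the whole building procedure, using tree depth and sample counts as the stopping conditions discussed earlier. The data layout (a list of dicts), the function names, and the illustrative rows are my own assumptions, not from the text.

    def split_mae(groups):  # from the Criterion section, repeated for completeness
        errors = [abs(p - sum(g) / len(g)) for g in groups for p in g]
        return sum(errors) / len(errors)

    def build_tree(rows, target, depth=0, max_depth=2, min_samples=2):
        """Recursively build a regression tree: pick the split with the
        lowest MAE, and stop at max_depth or when too few samples remain."""
        prices = [r[target] for r in rows]
        avg = sum(prices) / len(prices)
        if depth >= max_depth or len(rows) <= min_samples:
            return {"predict": avg}  # leaf: predict the group average

        best = None  # (mae, feature, split_value, test)
        for feature in rows[0]:
            if feature == target:
                continue
            values = sorted({r[feature] for r in rows})
            if isinstance(values[0], (int, float)):   # numerical: midpoints
                candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
                test = lambda r, c, f=feature: r[f] < c
            else:                                     # categorical: equality
                candidates = values
                test = lambda r, c, f=feature: r[f] == c
            for c in candidates:
                left  = [r[target] for r in rows if test(r, c)]
                right = [r[target] for r in rows if not test(r, c)]
                if not left or not right:
                    continue
                mae = split_mae([left, right])
                if best is None or mae < best[0]:
                    best = (mae, feature, c, test)

        if best is None:
            return {"predict": avg}
        _, feature, c, test = best
        yes_rows = [r for r in rows if test(r, c)]
        no_rows  = [r for r in rows if not test(r, c)]
        return {
            "feature": feature, "split": c,
            "yes": build_tree(yes_rows, target, depth + 1, max_depth, min_samples),
            "no":  build_tree(no_rows,  target, depth + 1, max_depth, min_samples),
        }

    # Hypothetical usage (illustrative rows, not the dataset from the text):
    rows = [
        {"Age": 5,  "Near Water": "Yes", "Price": 340_000.0},
        {"Age": 10, "Near Water": "No",  "Price": 250_000.0},
        {"Age": 60, "Near Water": "Yes", "Price": 200_000.0},
        {"Age": 90, "Near Water": "No",  "Price": 100_000.0},
    ]
    tree = build_tree(rows, target="Price")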
