
GIS SCIENCE JOURNAL ISSN NO : 1869-9391

House Price Prediction Using Ensembled Based Machine Learning Techniques
Hardi Joshi (a), Saket Swarndeep (b)

(a) Post Graduate Scholar, Dept. of Computer Engineering (Software Engineering), L J University, Ahmedabad, Gujarat, India
(b) Assistant Professor, Dept. of Computer Engineering (Software Engineering), L J University, Ahmedabad, Gujarat, India

Abstract: Owning a home is not only a basic requirement; it also signifies prestige. House prices
vary with a range of factors, such as location, size, number of bedrooms, lift availability, and
parking spaces. The goal of this study is to create a reliable house price prediction model using
the ensemble learning technique, with Decision Tree, Random Forest, and XGBoost as the base
models. Our primary goal in combining these base models is to increase prediction accuracy.

Keywords: Machine learning, Real estate, House price prediction, Machine learning algorithms, Ensemble learning,
Decision tree, Random Forest, XGBoost, MSE, MAE, R2 score.

1. Introduction
A house is an asset for many reasons. More than a place to call home, it provides security,
personal space, emotional attachment, stability, tax benefits, etc. Buying houses is also popular
among investors. Investing in property can be a viable way to build wealth over the long term, but
it is important to weigh the risks and benefits carefully before investing. Knowing the accurate
valuation of a property is therefore important whether you are buying or selling. The price of a
house is influenced by several factors, such as location, area, number of bedrooms, lift
availability, and parking slots.

Figure 1.1: House price prediction [9]

Machine learning (ML) is the study of algorithms that can recognise patterns and make predictions or
judgements without being explicitly programmed. Ensemble learning is a machine learning technique
that combines the predictions of several models to obtain a prediction that is more reliable and
accurate. [1]

VOLUME 10, ISSUE 5, 2023 PAGE NO: 1170



There are 2 main types of ensemble learning:

1. Bagging: This method trains the base models independently of one another and then combines
their predictions.
2. Boosting: This method is a sequential, iterative procedure in which each new model concentrates
on correcting the errors made by the earlier models.
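The difference between the two strategies can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in this study: the base learner is a hypothetical one-split regression stump on a single feature, the area and price values are invented toy data, and the function names (`fit_stump`, `bagging`, `boosting`) and the `learning_rate` parameter are assumptions made for the example.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression stump on one feature by minimising squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not right:  # the largest value cannot split the data
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all feature values identical: predict the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def bagging(xs, ys, n_models=10, seed=0):
    """Train each stump independently on a bootstrap sample, then average."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)

def boosting(xs, ys, n_models=10, learning_rate=0.5):
    """Train stumps one after another, each on the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    residuals = [y - base for y in ys]
    models = []
    for _ in range(n_models):
        stump = fit_stump(xs, residuals)
        models.append(stump)
        residuals = [r - learning_rate * stump(x) for x, r in zip(xs, residuals)]
    return lambda x: base + learning_rate * sum(m(x) for m in models)

# Toy data (hypothetical): area in sq. ft. and price in arbitrary units.
areas = [500, 650, 800, 1000, 1200, 1500]
prices = [40, 55, 70, 95, 120, 160]

bagged = bagging(areas, prices)
boosted = boosting(areas, prices)
print("bagging :", bagged(900))
print("boosting:", boosted(900))
```

In the bagging function the order of training does not matter, because no model sees another model's output; in the boosting function each stump depends on the residuals left by all earlier stumps.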

Figure 1.2: Bagging and Boosting [10]

In this research, we use the bagging ensemble learning technique, with Decision Tree, Random Forest,
and XGBoost as base models. Our main purpose in combining different base models is to improve
prediction and achieve higher accuracy. This paper is divided into 5 sections: Section 1 introduces
the topic, Section 2 reviews the literature, Section 3 describes the research methodology (dataset,
data exploration and transformation, and the proposed model), Section 4 presents the results, and
Section 5 gives the conclusion, followed by the references.

2. Literature Review
Adetunji et al. (2022) [2] observed that house prices depend on factors such as location and city.
The authors use the Random Forest algorithm for house price prediction on the “Boston housing”
dataset from the UCI Machine Learning Repository. Performance evaluation metrics are used to test
the performance of the model.

Ghosalkar et al. (2018) [3] focus on predicting house prices for people in light of their
financial plans and needs. The study predicts house prices in the Indian city of Mumbai. Linear
Regression is used because it can predict a numerical target value. MAE, MSE, and RMSE are used to
check the quality of the model.

Phan (2018) [4] studied historical data for house price prediction. The paper analyzes a real
historical transactional dataset, “Melbourne Housing Market”, to gain insight into the housing
market in Melbourne city. Several machine learning techniques are used, including Linear
Regression, Polynomial Regression, Regression Tree, Neural Network, and SVM.

Truong et al. (2020) [5] estimate changes in house prices using both traditional and advanced
machine learning methods on a dataset named “Housing Price in Beijing”. The traditional techniques
discussed are Random Forest, Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting
Machine (LightGBM); the advanced techniques are Hybrid Regression (65% Lasso and 35% XGBoost) and
Stacked Generalization (Level 1: Random Forest and LightGBM, Level 2: XGBoost).


Jain et al. (2020) [6] use a stacking algorithm over different regression algorithms. Other
algorithms such as SVM, decision tree, and Random Forest are also used. The authors report that the
final prediction is obtained with higher accuracy and precision.

Ahtesham et al. (2020) [7] examine data gathered from the Open Data Pakistan website. The Karachi
dataset was chosen specifically for predicting home prices. The paper uses the XGBoost machine
learning algorithm for house price prediction. Accuracy score and MAE are used to evaluate the
model's quality.

3. Research Methodology
3.1 Dataset

As noted above, the price of a house is influenced by several factors, such as location, area,
number of bedrooms, lift availability, and parking slots. We use data for the Indian city of
Mumbai, downloaded from Kaggle. The dataset has 6347 observations and 19 variables. It contains
the following information: Price, Area, Location, No. of Bedrooms, New/Resale, Gymnasium, Lift
Available, Car Parking, Maintenance Staff, 24x7 Security, Children's Play Area, Clubhouse, Intercom,
Landscaped Gardens, Indoor Games, Gas Connection, Jogging Track, Swimming Pool.

Name                    Type          Description

Price                   Numerical     House's Price
Area                    Numerical     House's Area
Location                Categorical   House's Location
No. of Bedrooms         Numerical     Number of Bedrooms
New/Resale              Numerical     New House/Resale
Gymnasium               Numerical     Gym available or not
Lift Available          Numerical     Lift available or not
Car Parking             Numerical     Car Parking available or not
Maintenance Staff       Numerical     Maintenance Staff available or not
24x7 Security           Numerical     24x7 Security available or not
Children's Play Area    Numerical     Children's Play Area available or not
Clubhouse               Numerical     Clubhouse available or not
Intercom                Numerical     Intercom available or not
Landscaped Gardens      Numerical     Landscaped Gardens available or not
Indoor Games            Numerical     Indoor Games available or not
Gas Connection          Numerical     Gas Connection available or not
Jogging Track           Numerical     Jogging Track available or not
Swimming Pool           Numerical     Swimming Pool available or not

Table 3.1: Features Description

3.2 Data Exploration and Transformation

To understand the dataset better, we use a heatmap. A heatmap is a data exploration and analysis
tool that is particularly useful for understanding the correlations between the variables in a
dataset, and it helps us gain insight into the relationships and dependencies within the data.
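The values shown in such a heatmap are pairwise correlation coefficients. The sketch below computes them in plain Python, assuming the Pearson coefficient is the one plotted (the text does not say which coefficient was used). The column values are invented toy data, and the function names are hypothetical. In practice the same matrix is usually obtained with pandas `DataFrame.corr` and drawn with seaborn `heatmap`.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return a nested dict {row_name: {col_name: correlation}} for all column pairs."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Toy columns (hypothetical values, not taken from the Mumbai dataset).
columns = {
    "Price": [40, 55, 70, 95, 120, 160],
    "Area": [500, 650, 800, 1000, 1200, 1500],
    "No. of Bedrooms": [1, 1, 2, 2, 3, 4],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

Each cell of the heatmap corresponds to one entry of `matrix`; values near 1 or -1 indicate a strong linear relationship, and values near 0 a weak one.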


Figure 3.2: Heatmap

From Figure 3.2 we observe that the area of the house and the number of bedrooms are strongly
correlated with the price of the house. Because Location is a categorical variable, we use one-hot
encoding to convert it into numerical variables.
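One-hot encoding replaces the single Location column with one 0/1 column per distinct location. The following is a minimal sketch in plain Python; the rows and location names are illustrative, and the helper `one_hot_encode` is hypothetical rather than part of any library. With pandas, `pandas.get_dummies` performs the same transformation.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one.
        new_row = {key: value for key, value in row.items() if key != column}
        # Add one indicator per category; exactly one of them is 1.
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded

# Illustrative rows (hypothetical values).
rows = [
    {"Area": 650, "Location": "Andheri"},
    {"Area": 900, "Location": "Bandra"},
    {"Area": 720, "Location": "Andheri"},
]

encoded = one_hot_encode(rows, "Location")
for row in encoded:
    print(row)
```

After encoding, every feature is numerical, so the rows can be passed to models that do not accept string categories.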

3.3 Proposed Model

As noted above, the price of a house is influenced by several factors, such as location, area,
number of bedrooms, lift availability, and parking slots. Numerous machine learning techniques are
available for prediction, and each has its benefits and drawbacks. We therefore use ensemble
learning, so that the strengths of several techniques are combined, with Decision Tree, Random
Forest, and XGBoost as base models.

1. Decision Tree
A decision tree is a supervised learning algorithm used for classification and prediction
tasks. It is a non-parametric technique that generates predictions by using a tree-like model of
choices and potential outcomes. The internal nodes of the tree represent decisions based
on the input features, and the branches represent the possible outcomes of those decisions. The
leaves of the tree represent the final output: a class label in a classification tree, or a
predicted value in a regression tree. [8]

There are two types of decision tree:


1. Classification tree
2. Regression tree
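
Because house price is a continuous target, the regression tree is the relevant variant here. As a sketch, assuming a CART-style regression tree (the splitting criterion is not stated in this paper, and the symbols below are notation introduced for the illustration), each node chooses the feature j and threshold s that minimise the squared error of the two resulting regions, and each leaf predicts the mean price of the training samples that reach it:

```latex
\min_{j,\,s}\left[\;\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 \;+\; \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right],
\qquad
R_1(j,s) = \{x \mid x_j \le s\},\quad
R_2(j,s) = \{x \mid x_j > s\},
\qquad
c_m = \frac{1}{|R_m|}\sum_{x_i \in R_m} y_i
```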


Figure 3.3.1: Decision Tree [8]

2. Random Forest
Random forest is a supervised ensemble learning technique that can be used for
classification and regression tasks. It constructs a large number of decision trees, each
trained on a random subset of the input features and of the training data. The final output
is the average of the individual tree outputs for regression, or their majority vote for
classification.

Figure 3.3.2: Random Forest [11]

3. XGBoost
XGBoost (Extreme Gradient Boosting) is a supervised ensemble learning algorithm used for
regression, classification, and ranking. It uses decision trees to make predictions. Each tree
is constructed to correct the errors of the trees built before it, and the final output is the
sum of the predictions made by all the trees in the ensemble. XGBoost provides parallel tree
boosting, which gives fast and accurate results. [12]
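
The additive structure described above can be written compactly. The notation is introduced here for illustration and follows the usual boosted-tree formulation: f_k denotes the k-th tree, K the number of trees, and the second expression shows how the prediction after round t is built from the prediction after round t-1.

```latex
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i),
\qquad
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)
```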


Figure 3.3.3: XGBoost [13]

Once all 3 models have produced their predictions, we have to combine them into one. Ensemble
learning offers 2 common techniques for combining multiple outputs: Averaging and Max voting.
Max voting selects the most frequent class label, so it applies to classification; because house
price is a continuous value, we use Averaging, that is, the final prediction is the mean of the
predictions made by the individual models.
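
The averaging step can be sketched as follows. The prediction values are invented for illustration, and `average_predictions` is a hypothetical helper, not a library function; equal weights for all models are assumed, as the text describes a plain mean.

```python
def average_predictions(*model_predictions):
    """Element-wise mean of the prediction lists produced by several models."""
    lengths = {len(predictions) for predictions in model_predictions}
    if len(lengths) != 1:
        raise ValueError("every model must predict the same number of houses")
    return [sum(values) / len(values) for values in zip(*model_predictions)]

# Hypothetical predictions for two houses from the three base models.
decision_tree_pred = [100.0, 200.0]
random_forest_pred = [110.0, 190.0]
xgboost_pred = [120.0, 210.0]

final_pred = average_predictions(decision_tree_pred, random_forest_pred, xgboost_pred)
print(final_pred)
```

Each entry of `final_pred` is the mean of the three model outputs for the same house, so a large error in one model is dampened by the other two.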

Figure 3.3.4: Proposed Model


4. Results

After averaging, we use performance evaluation metrics to evaluate the performance and
effectiveness of our system: R2 score, Mean Absolute Error (MAE), Mean Squared Error (MSE), and
Root Mean Squared Error (RMSE).

1. R2 Score (Coefficient of Determination):


R2 score measures the proportion of the variance in the dependent variable that is predictable from the
independent variables. A value of 1 indicates a perfect fit and 0 indicates that the model explains
none of the variance; the score can also be negative when the model performs worse than always
predicting the mean of the actual values.

2. Mean Absolute Error (MAE):


MAE calculates the average of the absolute differences between the predicted and actual values. MAE
is computed using the formula:

MAE = (1/n) * Σ|yᵢ - ŷᵢ|

3. Mean Squared Error (MSE):


MSE calculates the average of the squared differences between the predicted and actual values. MSE
is computed using the formula:

MSE = (1/n) * Σ(yᵢ - ŷᵢ)²

4. Root Mean Squared Error (RMSE):


RMSE is the square root of the mean squared error. It expresses the error in the same units as the
price, which makes it easier to interpret than MSE. RMSE is computed using the formula:

RMSE = √(MSE)
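
The four metrics can be computed directly from the formulas above. The sketch below uses plain Python and invented actual and predicted prices; the function name `regression_metrics` is hypothetical. R2 is computed as 1 minus the ratio of the residual sum of squares to the total sum of squares, which is the standard definition. scikit-learn provides equivalent functions in `sklearn.metrics` (`mean_absolute_error`, `mean_squared_error`, `r2_score`).

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for paired actual and predicted values."""
    n = len(y_true)
    errors = [actual - predicted for actual, predicted in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                   # residual sum of squares
    ss_tot = sum((y - mean_actual) ** 2 for y in y_true)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Invented actual and predicted prices for three houses.
actual_prices = [100, 200, 300]
predicted_prices = [110, 190, 300]

metrics = regression_metrics(actual_prices, predicted_prices)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```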

Figure 4.1: Result

MAE, MSE, and RMSE show the prediction error of our system, so lower values are better, while the
R2 score shows how well the predicted output agrees with the actual output, so the closer the score
is to 1, the better. Figure 4.2 shows a graph comparing the predicted price and the actual price.

Figure 4.2: Actual vs. Predicted price


5. Conclusion
The housing market is a pillar of economic growth and stability, and house price prediction plays a
significant role in shaping and influencing the economy. Existing systems mostly focus on a single
model, although numerous ML algorithms can be used for price prediction. We therefore use the
ensemble learning technique, in which multiple models are integrated for a better outcome. The main
purpose of this study is to build a model that improves house price prediction and achieves higher
accuracy.

References
[1] Polikar, R. (2012). Ensemble learning. Ensemble machine learning: Methods and applications, 1-34.
[2] Adetunji, A. B., Akande, O. N., Ajala, F. A., Oyewo, O., Akande, Y. F., & Oluwadara, G. (2022). House Price Prediction using Random Forest Machine Learning Technique. Procedia Computer Science, 199, 806-813.
[3] Ghosalkar, N. N., & Dhage, S. N. (2018, August). Real estate value prediction using linear regression. In 2018 fourth international conference on computing communication control and automation (ICCUBEA) (pp. 1-5). IEEE.
[4] Phan, T. D. (2018, December). Housing price prediction using machine learning algorithms: The case of Melbourne city, Australia. In 2018 International conference on machine learning and data engineering (iCMLDE) (pp. 35-42). IEEE.
[5] Truong, Q., Nguyen, M., Dang, H., & Mei, B. (2020). Housing price prediction via improved machine learning techniques. Procedia Computer Science, 174, 433-442.
[6] Jain, M., Rajput, H., Garg, N., & Chawla, P. (2020, July). Prediction of house pricing using machine learning with Python. In 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC) (pp. 570-574). IEEE.
[7] Ahtesham, M., Bawany, N. Z., & Fatima, K. (2020). House Price Prediction using Machine Learning Algorithm - The Case of Karachi City, Pakistan. In 2020 21st International Arab Conference on Information Technology (ACIT), Giza, Egypt (pp. 1-5). IEEE. doi: 10.1109/ACIT50332.2020.9300074.
[8] Thamarai, M., & Malarvizhi, S. P. (2020). House Price Prediction Modeling Using Machine Learning. International Journal of Information Engineering & Electronic Business, 12(2).
[9] RPubs - House Price Prediction with R. (2021, August 29). https://rpubs.com/Zetrosoft/lbb-rm
[10] Bagging & Boosting in Machine Learning world. (n.d.). https://www.linkedin.com/pulse/bagging-boosting-machine-learning-world-debaditya-chakravorty
[11] What is a Random Forest? (n.d.). TIBCO Software. https://www.tibco.com/reference-center/what-is-a-random-forest
[12] XGBoost Documentation - xgboost 1.7.3 documentation. (n.d.). https://xgboost.readthedocs.io/en/stable/
[13] Simplified structure of XGBoost. (n.d.). https://www.researchgate.net/figure/Simplified-structure-of-XGBoost_fig2_348025909
