
House Price Prediction

Data Exploration
In [1]: # importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
!pip install lazypredict
from sklearn.model_selection import train_test_split
from lazypredict.Supervised import LazyRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error

/opt/conda/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.5)
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
Collecting lazypredict
  Downloading lazypredict-0.2.12-py2.py3-none-any.whl (12 kB)
Requirement already satisfied: click in /opt/conda/lib/python3.10/site-packages (from lazypredict) (8.1.3)
Requirement already satisfied: scikit-learn in /opt/conda/lib/python3.10/site-packages (from lazypredict) (1.2.2)
Requirement already satisfied: pandas in /opt/conda/lib/python3.10/site-packages (from lazypredict) (1.5.3)
Requirement already satisfied: tqdm in /opt/conda/lib/python3.10/site-packages (from lazypredict) (4.65.0)
Requirement already satisfied: joblib in /opt/conda/lib/python3.10/site-packages (from lazypredict) (1.2.0)
Requirement already satisfied: lightgbm in /opt/conda/lib/python3.10/site-packages (from lazypredict) (3.3.2)
Requirement already satisfied: xgboost in /opt/conda/lib/python3.10/site-packages (from lazypredict) (1.7.6)
Requirement already satisfied: wheel in /opt/conda/lib/python3.10/site-packages (from lightgbm->lazypredict) (0.40.0)
Requirement already satisfied: numpy in /opt/conda/lib/python3.10/site-packages (from lightgbm->lazypredict) (1.23.5)
Requirement already satisfied: scipy in /opt/conda/lib/python3.10/site-packages (from lightgbm->lazypredict) (1.11.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.10/site-packages (from scikit-learn->lazypredict) (3.1.0)
Requirement already satisfied: python-dateutil>=2.8.1 in /opt/conda/lib/python3.10/site-packages (from pandas->lazypredict) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.10/site-packages (from pandas->lazypredict) (2023.3)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.10/site-packages (from python-dateutil>=2.8.1->pandas->lazypredict) (1.16.0)
Installing collected packages: lazypredict
Successfully installed lazypredict-0.2.12

In [2]: # loading dataset


df = pd.read_csv("/kaggle/input/housing-price-prediction/Housing.csv")

In [3]: # checking first 5 rows


df.head()

Out[3]:
       price  area  bedrooms  bathrooms  stories mainroad guestroom basement hotwaterheating  ...
0   13300000  7420         4          2        3      yes        no       no              no  ...
1   12250000  8960         4          4        4      yes        no       no              no  ...
2   12250000  9960         3          2        2      yes        no      yes              no  ...
3   12215000  7500         4          2        2      yes        no      yes              no  ...
4   11410000  7420         4          1        2      yes       yes      yes              no  ...

In [4]: # checking last 5 rows


df.tail()

Out[4]:
       price  area  bedrooms  bathrooms  stories mainroad guestroom basement hotwaterheating  ...
540  1820000  3000         2          1        1      yes        no      yes              no  ...
541  1767150  2400         3          1        1       no        no       no              no  ...
542  1750000  3620         2          1        1      yes        no       no              no  ...
543  1750000  2910         3          1        1       no        no       no              no  ...
544  1750000  3850         3          1        2      yes        no       no              no  ...

In [5]: # checking null values


df.isnull().sum()

Out[5]:
price               0
area                0
bedrooms            0
bathrooms           0
stories             0
mainroad            0
guestroom           0
basement            0
hotwaterheating     0
airconditioning     0
parking             0
prefarea            0
furnishingstatus    0
dtype: int64

In [6]: # checking duplicate values


df.duplicated().value_counts()

Out[6]:
False    545
dtype: int64

In [7]: # checking column names


df.columns

Out[7]:
Index(['price', 'area', 'bedrooms', 'bathrooms', 'stories', 'mainroad',
       'guestroom', 'basement', 'hotwaterheating', 'airconditioning',
       'parking', 'prefarea', 'furnishingstatus'],
      dtype='object')

In [8]: # checking data types


df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 545 entries, 0 to 544
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 price 545 non-null int64
1 area 545 non-null int64
2 bedrooms 545 non-null int64
3 bathrooms 545 non-null int64
4 stories 545 non-null int64
5 mainroad 545 non-null object
6 guestroom 545 non-null object
7 basement 545 non-null object
8 hotwaterheating 545 non-null object
9 airconditioning 545 non-null object
10 parking 545 non-null int64
11 prefarea 545 non-null object
12 furnishingstatus 545 non-null object
dtypes: int64(6), object(7)
memory usage: 55.5+ KB

In [9]: # checking unique values


df.nunique()

Out[9]:
price               219
area                284
bedrooms              6
bathrooms             4
stories               4
mainroad              2
guestroom             2
basement              2
hotwaterheating       2
airconditioning       2
parking               4
prefarea              2
furnishingstatus      3
dtype: int64

In [10]: # getting statistical summary


df.describe()

Out[10]:
              price      area  bedrooms  bathrooms  stories  parking
count        545.00    545.00    545.00     545.00   545.00   545.00
mean     4766729.25   5150.54      2.97       1.29     1.81     0.69
std      1870439.62   2170.14      0.74       0.50     0.87     0.86
min      1750000.00   1650.00      1.00       1.00     1.00     0.00
25%      3430000.00   3600.00      2.00       1.00     1.00     0.00
50%      4340000.00   4600.00      3.00       1.00     2.00     0.00
75%      5740000.00   6360.00      3.00       2.00     2.00     1.00
max     13300000.00  16200.00      6.00       4.00     4.00     3.00
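
The summary suggests 'price' and 'area' are right-skewed (their means sit well above their medians). A quick check of this, as a sketch:

In [ ]: # sketch: quantify the skew hinted at by the summary above
df[['price', 'area']].skew()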

Data Visualization
In [11]: # Visualizing 'price'
plt.hist(df['price'], color='r')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.title('Distribution of Prices')
plt.show()

In [12]: # Visualizing 'bedrooms'


df['bedrooms'].value_counts().plot(kind='bar', color='g')
plt.xlabel('Bedrooms')
plt.ylabel('Count')
plt.title('Number of Properties for each Number of Bedrooms')
plt.show()
In [13]: # Visualizing 'bathrooms'
df['bathrooms'].value_counts().plot(kind='barh', color='y')
plt.title('Number of Properties for each Number of Bathrooms')
plt.show()

In [14]: # Visualizing 'stories'


df['stories'].value_counts().plot(kind='barh', color='c')
plt.xlabel('Count')
plt.ylabel('Stories')
plt.title('Number of Properties for each Number of Stories')
plt.show()

In [15]: # Visualizing 'mainroad'


df['mainroad'].value_counts().plot(kind='pie', colors=['red', 'yellow'])
plt.title('Number of Properties for Availability of Main Road')
plt.show()

In [16]: # Visualizing 'guestrooms'


df['guestroom'].value_counts().plot(kind='pie', colors=['green', 'pink'])
plt.title('Number of Properties for Availability of Guestroom')
plt.show()

In [17]: # Visualizing 'basement'


df['basement'].value_counts().plot(kind='pie', colors=['grey', 'cyan'])
plt.title('Number of Properties for Availability of Basement')
plt.show()

In [18]: # Visualizing 'Hot Water Heating'


df['hotwaterheating'].value_counts().plot(kind='pie', colors=['brown', 'orange'])
plt.title('Number of Properties for Availability of Hot Water Heating')
plt.show()

In [19]: # Visualizing 'Air Conditioners'


df['airconditioning'].value_counts().plot(kind='pie', colors=['purple', 'magenta'])
plt.title('Number of Properties for Availability of Air Conditioners')
plt.show()

In [20]: # Visualizing 'parking'


df['parking'].value_counts().plot(kind='bar', color='m')
plt.xlabel('Parking')
plt.ylabel('Count')
plt.title('Number of Properties for each Number of Parking Spaces')
plt.show()

In [21]: # Visualizing 'prefarea'


df['prefarea'].value_counts().plot(kind='pie', colors=['darkgreen', 'lightgreen'])
plt.title('Number of Properties in a Preferred Area')
plt.show()

In [22]: # Visualizing 'furnishingstatus'


df['furnishingstatus'].value_counts().plot(kind='pie', colors=['skyblue', 'lightblue', 'lightsteelblue'])
plt.title('Number of Properties by Furnishing Status')
plt.show()

In [23]: # Visualizing 'area' vs. 'price'


plt.scatter(df['area'], df['price'], color='orange')
plt.xlabel('Area')
plt.ylabel('Price')
plt.title('Area vs. Price')
plt.show()
In [24]: # Creating a pair plot
sns.pairplot(df)
plt.show()

In [25]: # Calculating the correlation matrix


correlation_matrix = df.corr(numeric_only=True)  # restrict to numeric columns; the object columns are encoded below

# Creating a correlation heatmap


plt.figure(figsize=(10, 8)) # Adjust the figure size as per your preference
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
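
To read the heatmap at a glance, the same correlations can also be ranked against the target directly; a minimal sketch:

In [ ]: # sketch: rank numeric features by linear correlation with price
correlation_matrix['price'].sort_values(ascending=False)
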
Feature Engineering
In [26]: # Select the columns to encode
categorical_columns = ['mainroad', 'guestroom', 'basement', 'hotwaterheating',
                       'airconditioning', 'prefarea', 'furnishingstatus']

# Perform label encoding
label_encoder = LabelEncoder()
for col in categorical_columns:
    df[col] = label_encoder.fit_transform(df[col])

# Display the updated DataFrame
df

Out[26]:
        price  area  bedrooms  bathrooms  stories  mainroad  guestroom  basement  hotwaterheating  ...
0    13300000  7420         4          2        3         1          0         0                0  ...
1    12250000  8960         4          4        4         1          0         0                0  ...
2    12250000  9960         3          2        2         1          0         1                0  ...
3    12215000  7500         4          2        2         1          0         1                0  ...
4    11410000  7420         4          1        2         1          1         1                0  ...
..        ...   ...       ...        ...      ...       ...        ...       ...              ...  ...
540   1820000  3000         2          1        1         1          0         1                0  ...
541   1767150  2400         3          1        1         0          0         0                0  ...
542   1750000  3620         2          1        1         1          0         0                0  ...
543   1750000  2910         3          1        1         0          0         0                0  ...
544   1750000  3850         3          1        2         1          0         0                0  ...

545 rows × 13 columns
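One caveat with reusing a single LabelEncoder in the loop: each fit overwrites the previous mapping, so the integer codes cannot be mapped back to their labels afterwards. A sketch of the same loop that keeps one fitted encoder per column (an alternative, not what was run above):

In [ ]: # sketch: keep each fitted encoder so codes can be inverted later
encoders = {}
for col in categorical_columns:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])
# e.g. encoders['mainroad'].inverse_transform([0, 1]) recovers the original labels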

Machine Learning Model

Splitting the dataset


In [27]: X = df.drop('price', axis=1) # Features (excluding the target variable)
y = df['price'] # Target variable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [28]: regressor = LazyRegressor(verbose=0, ignore_warnings=True, custom_metric=None)


models, predictions = regressor.fit(X_train, X_test, y_train, y_test)

100%|██████████| 42/42 [00:14<00:00, 2.98it/s]

In [29]: print(models)

                               Adjusted R-Squared  R-Squared          RMSE  \
Model
GradientBoostingRegressor 0.62 0.66 1301871.87
PoissonRegressor 0.62 0.66 1303698.42
LassoLarsCV 0.61 0.65 1331071.42
LassoLarsIC 0.61 0.65 1331071.42
LarsCV 0.61 0.65 1331071.42
Lars 0.61 0.65 1331071.42
TransformedTargetRegressor 0.61 0.65 1331071.42
LinearRegression 0.61 0.65 1331071.42
Lasso 0.61 0.65 1331072.08
LassoLars 0.61 0.65 1331072.09
Ridge 0.61 0.65 1331290.05
SGDRegressor 0.60 0.65 1332795.68
LassoCV 0.60 0.65 1332883.21
RidgeCV 0.60 0.65 1333447.11
HistGradientBoostingRegressor 0.58 0.63 1369076.05
BaggingRegressor 0.58 0.63 1370335.03
XGBRegressor 0.58 0.63 1374945.10
LGBMRegressor 0.58 0.62 1381195.18
ExtraTreesRegressor 0.57 0.62 1391999.83
RandomForestRegressor 0.56 0.61 1400765.84
ElasticNet 0.55 0.60 1418765.63
HuberRegressor 0.55 0.60 1420233.36
KNeighborsRegressor 0.53 0.58 1451363.57
OrthogonalMatchingPursuitCV 0.52 0.57 1467413.32
AdaBoostRegressor 0.50 0.56 1497403.63
TweedieRegressor 0.49 0.55 1512162.75
GammaRegressor 0.49 0.55 1515460.98
RANSACRegressor 0.48 0.54 1527036.86
DecisionTreeRegressor 0.40 0.47 1639566.30
ExtraTreeRegressor 0.33 0.41 1729079.53
OrthogonalMatchingPursuit 0.18 0.27 1917103.70
ElasticNetCV -0.14 -0.02 2265132.23
BayesianRidge -0.15 -0.02 2268298.23
DummyRegressor -0.15 -0.02 2268298.23
NuSVR -0.17 -0.04 2294658.59
QuantileRegressor -0.24 -0.10 2356800.96
SVR -0.24 -0.10 2359647.74
KernelRidge -4.62 -4.00 5024858.77
PassiveAggressiveRegressor -4.78 -4.13 5094459.54
LinearSVR -5.71 -4.96 5488681.78
MLPRegressor -5.71 -4.96 5488874.14
GaussianProcessRegressor -12833.58 -11407.52 240135712.32

                               Time Taken
Model
GradientBoostingRegressor 0.19
PoissonRegressor 0.02
LassoLarsCV 0.03
LassoLarsIC 0.02
LarsCV 0.05
Lars 0.09
TransformedTargetRegressor 0.01
LinearRegression 0.01
Lasso 0.01
LassoLars 0.01
Ridge 0.01
SGDRegressor 0.01
LassoCV 0.08
RidgeCV 0.01
HistGradientBoostingRegressor 0.25
BaggingRegressor 0.05
XGBRegressor 0.13
LGBMRegressor 0.32
ExtraTreesRegressor 0.25
RandomForestRegressor 0.32
ElasticNet 0.02
HuberRegressor 0.02
KNeighborsRegressor 0.01
OrthogonalMatchingPursuitCV 0.02
AdaBoostRegressor 0.13
TweedieRegressor 0.02
GammaRegressor 0.02
RANSACRegressor 0.21
DecisionTreeRegressor 0.01
ExtraTreeRegressor 0.01
OrthogonalMatchingPursuit 0.03
ElasticNetCV 0.08
BayesianRidge 0.02
DummyRegressor 0.01
NuSVR 0.08
QuantileRegressor 9.24
SVR 0.02
KernelRidge 0.12
PassiveAggressiveRegressor 0.05
LinearSVR 0.01
MLPRegressor 1.87
GaussianProcessRegressor 0.13

In [30]: predictions

Out[30]:
                               Adjusted R-Squared  R-Squared          RMSE  Time Taken
Model
GradientBoostingRegressor                    0.62       0.66    1301871.87        0.19
PoissonRegressor                             0.62       0.66    1303698.42        0.02
LassoLarsCV                                  0.61       0.65    1331071.42        0.03
LassoLarsIC                                  0.61       0.65    1331071.42        0.02
LarsCV                                       0.61       0.65    1331071.42        0.05
Lars                                         0.61       0.65    1331071.42        0.09
TransformedTargetRegressor                   0.61       0.65    1331071.42        0.01
LinearRegression                             0.61       0.65    1331071.42        0.01
Lasso                                        0.61       0.65    1331072.08        0.01
LassoLars                                    0.61       0.65    1331072.09        0.01
Ridge                                        0.61       0.65    1331290.05        0.01
SGDRegressor                                 0.60       0.65    1332795.68        0.01
LassoCV                                      0.60       0.65    1332883.21        0.08
RidgeCV                                      0.60       0.65    1333447.11        0.01
HistGradientBoostingRegressor                0.58       0.63    1369076.05        0.25
BaggingRegressor                             0.58       0.63    1370335.03        0.05
XGBRegressor                                 0.58       0.63    1374945.10        0.13
LGBMRegressor                                0.58       0.62    1381195.18        0.32
ExtraTreesRegressor                          0.57       0.62    1391999.83        0.25
RandomForestRegressor                        0.56       0.61    1400765.84        0.32
ElasticNet                                   0.55       0.60    1418765.63        0.02
HuberRegressor                               0.55       0.60    1420233.36        0.02
KNeighborsRegressor                          0.53       0.58    1451363.57        0.01
OrthogonalMatchingPursuitCV                  0.52       0.57    1467413.32        0.02
AdaBoostRegressor                            0.50       0.56    1497403.63        0.13
TweedieRegressor                             0.49       0.55    1512162.75        0.02
GammaRegressor                               0.49       0.55    1515460.98        0.02
RANSACRegressor                              0.48       0.54    1527036.86        0.21
DecisionTreeRegressor                        0.40       0.47    1639566.30        0.01
ExtraTreeRegressor                           0.33       0.41    1729079.53        0.01
OrthogonalMatchingPursuit                    0.18       0.27    1917103.70        0.03
ElasticNetCV                                -0.14      -0.02    2265132.23        0.08
BayesianRidge                               -0.15      -0.02    2268298.23        0.02
DummyRegressor                              -0.15      -0.02    2268298.23        0.01
NuSVR                                       -0.17      -0.04    2294658.59        0.08
QuantileRegressor                           -0.24      -0.10    2356800.96        9.24
SVR                                         -0.24      -0.10    2359647.74        0.02
KernelRidge                                 -4.62      -4.00    5024858.77        0.12
PassiveAggressiveRegressor                  -4.78      -4.13    5094459.54        0.05
LinearSVR                                   -5.71      -4.96    5488681.78        0.01
MLPRegressor                                -5.71      -4.96    5488874.14        1.87
GaussianProcessRegressor                -12833.58  -11407.52  240135712.32        0.13
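
The leaderboard is already sorted by Adjusted R-Squared; a shortlist of candidates can also be pulled out programmatically, e.g. as a sketch:

In [ ]: # sketch: the five lowest-RMSE models from the LazyRegressor leaderboard
models.sort_values('RMSE').head(5)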


Gradient Boosting Regressor
In [31]: # Create and fit the Gradient Boosting Regression model
model = GradientBoostingRegressor()
model.fit(X_train, y_train)

# Predict on the training and testing sets


train_predictions = model.predict(X_train)
test_predictions = model.predict(X_test)
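
The model above uses scikit-learn's defaults. If tuning is wanted, a minimal grid search sketch (the grid values are illustrative assumptions, not tuned settings from this notebook):

In [ ]: # sketch: small, illustrative hyperparameter search
from sklearn.model_selection import GridSearchCV

param_grid = {'n_estimators': [100, 300],      # illustrative grid, not tuned values
              'learning_rate': [0.05, 0.1],
              'max_depth': [2, 3]}
search = GridSearchCV(GradientBoostingRegressor(random_state=42), param_grid,
                      scoring='neg_root_mean_squared_error', cv=5)
search.fit(X_train, y_train)
search.best_params_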

Model Evaluation
In [32]: # Evaluate the model using metrics
train_rmse = mean_squared_error(y_train, train_predictions, squared=False)
train_mae = mean_absolute_error(y_train, train_predictions)
test_rmse = mean_squared_error(y_test, test_predictions, squared=False)
test_mae = mean_absolute_error(y_test, test_predictions)

# Print the evaluation metrics


print("Training set - RMSE:", train_rmse)
print("Training set - MAE:", train_mae)
print("Testing set - RMSE:", test_rmse)
print("Testing set - MAE:", test_mae)

Training set - RMSE: 641817.0283186645
Training set - MAE: 476055.9856965125
Testing set - RMSE: 1304741.320665057
Testing set - MAE: 966476.970839526
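
The training RMSE is roughly half the testing RMSE, which points to some overfitting. r2_score puts the same comparison on a scale-free footing; a sketch:

In [ ]: # sketch: compare train/test fit on the R-squared scale
from sklearn.metrics import r2_score
print('Training set - R2:', r2_score(y_train, train_predictions))
print('Testing set - R2:', r2_score(y_test, test_predictions))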

In [33]: # Visualize the predicted values vs. actual values for the training set
plt.scatter(y_train, train_predictions, color='violet', alpha=0.5)
plt.plot([min(y_train), max(y_train)], [min(y_train), max(y_train)], color='cyan', linestyle='--')
plt.xlabel('Actual Price')
plt.ylabel('Predicted Price')
plt.title('Training Set - Actual vs. Predicted Price')
plt.show()

# Visualize the predicted values vs. actual values for the testing set
plt.scatter(y_test, test_predictions, color='violet', alpha=0.5)
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='cyan', linestyle='--')
plt.xlabel('Actual Price')
plt.ylabel('Predicted Price')
plt.title('Testing Set - Actual vs. Predicted Price')
plt.show()
Model Interpretation
In [34]: importances = model.feature_importances_
feature_names = X_train.columns
# Sort the feature importances in descending order
sorted_indices = importances.argsort()[::-1]
sorted_importances = importances[sorted_indices]
sorted_features = feature_names[sorted_indices]

# Plot the feature importances


plt.figure(figsize=(10, 6))
plt.bar(range(len(sorted_importances)), sorted_importances, tick_label=sorted_features)
plt.xlabel('Features')
plt.ylabel('Importance')
plt.title('Feature Importances')
plt.xticks(rotation=45)
plt.show()
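
Impurity-based importances from boosted trees can favour high-cardinality numeric features such as 'area'. Permutation importance on the held-out set is a common cross-check; a sketch:

In [ ]: # sketch: permutation importance as a cross-check on the test split
from sklearn.inspection import permutation_importance

result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
pd.Series(result.importances_mean, index=X_test.columns).sort_values(ascending=False)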

Predicting price for new house


In [35]: # Example input for a new house
new_house = np.array([[2000, 4, 3, 2, 1, 1, 2, 1, 1, 2, 3, 1]])

# Predict the price for the new house


predicted_price = model.predict(new_house)

print('Predicted Price:', predicted_price)

Predicted Price: [7152296.80587577]
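
Because the model was fitted on a DataFrame, predicting from a bare NumPy array triggers a feature-name warning in scikit-learn. Building the input with named columns avoids it; a sketch using the same illustrative values:

In [ ]: # sketch: same input, with column names matching the training data
new_house_df = pd.DataFrame(new_house, columns=X.columns)
model.predict(new_house_df)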
