
Exp 2 (Multiple Linear Regression)

The document discusses the application of multiple linear regression (MLR) to analyze datasets, particularly focusing on the Boston Housing dataset. It outlines the theory behind MLR, its limitations, and applications, followed by a code implementation for model training and evaluation. The results indicate that while both models perform well, the Boston Housing model demonstrates superior prediction accuracy based on mean squared error (MSE).

Uploaded by

piyushdohare143

Aim:

To perform multiple linear regression on multiple datasets and compare the results to determine which
model performs better.

Theory:

Multiple Linear Regression: Theory and Understanding

Multiple linear regression (MLR) is a statistical technique used to model the relationship between a
single dependent variable (what you want to predict) and multiple independent variables (features
that influence the dependent variable). It assumes a linear relationship between these variables and
builds a linear equation to capture this relationship.

Key Concepts:

Equation:

y_hat = β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ

• y_hat is the predicted value of the dependent variable.

• β₀ is the intercept term (the predicted value when all independent variables are zero).

• β_i is the coefficient of the independent variable x_i, for i = 1, ..., p.

• p is the number of independent variables.
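
As a minimal sketch of the equation above, the following fits the coefficients by ordinary least squares with NumPy and then evaluates y_hat for a new observation. The data, the "true" coefficients (4, 2, -3) and the variable names are made up for illustration; the code in this report uses scikit-learn's LinearRegression instead.

```python
import numpy as np

# Hypothetical data: 5 observations, p = 2 independent variables.
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0],
              [4.0, 3.0],
              [5.0, 5.0]])

# Assumed noise-free model used to generate y: y = 4 + 2*x1 - 3*x2.
y = 4.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient acts as the intercept b0.
X_design = np.column_stack([np.ones(len(X)), X])

# Solve the least-squares problem for beta = [b0, b1, b2].
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Prediction for a new observation: y_hat = b0 + b1*x1 + b2*x2.
x_new = np.array([2.0, 1.0])
y_hat = beta[0] + beta[1:] @ x_new

print("Estimated coefficients:", np.round(beta, 6))
print("Prediction for x_new:", round(float(y_hat), 6))
```

Because the toy data contain no noise, the estimated coefficients should match the assumed ones up to floating-point error; with real data they would only approximate the underlying relationship.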

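Limitations of MLR:

• Cannot capture non-linear relationships unless non-linear terms (such as polynomial features) are added.

• Relies on assumptions such as linearity, independent errors, constant error variance, and low
multicollinearity; violating them can lead to inaccurate results.

• Cannot establish causation; only identifies correlations.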
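
The first limitation can be illustrated with a small sketch. The data below are hypothetical (a purely quadratic relationship, y = x^2), and the helper r_squared is defined here only for the example. A straight-line fit explains essentially none of the variance, while the same least-squares procedure with x^2 added as a feature fits the data.

```python
import numpy as np

# Hypothetical data with a purely quadratic relationship: y = x^2.
x = np.linspace(-3.0, 3.0, 61)
y = x ** 2

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Straight-line fit: design matrix with columns [1, x].
A_lin = np.column_stack([np.ones_like(x), x])
coef_lin, *_ = np.linalg.lstsq(A_lin, y, rcond=None)
r2_lin = r_squared(y, A_lin @ coef_lin)

# Same least-squares fit, but with x^2 added as an extra feature.
A_quad = np.column_stack([np.ones_like(x), x, x ** 2])
coef_quad, *_ = np.linalg.lstsq(A_quad, y, rcond=None)
r2_quad = r_squared(y, A_quad @ coef_quad)

print(f"R2 with linear feature only: {r2_lin:.3f}")
print(f"R2 with squared feature added: {r2_quad:.3f}")
```

This is the idea behind the PolynomialFeatures step in the code of this report: the model stays linear in its coefficients while the feature set is expanded.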

Applications of MLR:

• Predicting house prices based on features like size, location, and amenities.

• Understanding how factors like age, income, and education affect job satisfaction.

• Analysing the impact of advertising campaigns on sales.


Code:

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

from sklearn.preprocessing import StandardScaler, PolynomialFeatures

from sklearn.feature_selection import SelectFromModel

from sklearn.ensemble import RandomForestRegressor

from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset (replace 'boston.csv' with the path to your own copy of the data)

df = pd.read_csv('boston.csv')

# Handle outliers using IQR (adjust based on your data's characteristics)

numeric_cols = df.select_dtypes(include=[np.number]).columns

Q1 = df[numeric_cols].quantile(0.25)

Q3 = df[numeric_cols].quantile(0.75)

IQR = Q3 - Q1

df = df[~((df[numeric_cols] < (Q1 - 1.5 * IQR)) | (df[numeric_cols] > (Q3 + 1.5 * IQR))).any(axis=1)]

# Extract features and target variable (using the provided column names)

X = df.drop(['TOWN', 'TRACT', 'LON', 'LAT', 'MEDV'], axis=1)

y = df['MEDV']

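# Train-test split (done before scaling so the scaler never sees the test data)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Feature scaling: fit on the training set only, then apply the same transform to both sets

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)

X_test = scaler.transform(X_test)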

# Feature selection (experiment with different thresholds and methods)

rf_model = RandomForestRegressor(random_state=0)

rf_model.fit(X_train, y_train)

sfm = SelectFromModel(rf_model, threshold=0.1, prefit=True) # rf_model is already fitted; adjust threshold if needed

X_train = sfm.transform(X_train)

X_test = sfm.transform(X_test)

# Polynomial features (consider different degrees)

poly = PolynomialFeatures(degree=2, include_bias=False) # Adjust degree if needed

X_train_poly = poly.fit_transform(X_train)

X_test_poly = poly.transform(X_test)

# Model fitting

regressor = LinearRegression()

regressor.fit(X_train_poly, y_train)

# Evaluation

y_pred = regressor.predict(X_test_poly)

mse = mean_squared_error(y_test, y_pred)

r2 = r2_score(y_test, y_pred)

print('Train Score: ', regressor.score(X_train_poly, y_train))

print('Test Score: ', regressor.score(X_test_poly, y_test))

print('Mean Squared Error (MSE): ', mse)

print('R-squared (R2): ', r2)

# Visualization (optional)

plt.scatter(y_test, y_pred)

plt.xlabel("Actual Medv")

plt.ylabel("Predicted Medv")

plt.title("Actual Medv vs Predicted Medv")

plt.show()
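
The IQR rule used in the outlier-handling step of the code above can be illustrated on a single toy column. The values below are hypothetical and chosen so that one entry is an obvious outlier; this is a sketch of the rule itself, not of the full DataFrame filter, which drops a row if any numeric column falls outside its range.

```python
import numpy as np

# Hypothetical column with one obvious outlier (100.0).
values = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 100.0])

# First and third quartiles (linear interpolation, as in the pandas default).
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Keep only values inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
kept = values[(values >= lower) & (values <= upper)]

print("Q1, Q3, IQR:", q1, q3, iqr)
print("Bounds:", lower, upper)
print("Kept values:", kept)
```

Here the bounds work out to 8.75 and 14.75, so only the value 100.0 is removed. On a real dataset the rule can discard a noticeable share of rows, which is worth checking before training.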
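Performance Metrics:

[Figure: performance metrics for the multiple regression dataset]

[Figure: performance metrics for the Boston Housing dataset]

Output:

[Figure: output for the multiple regression dataset]

[Figure: output for the Boston Housing dataset]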

Comparison:

Comparing the performance of models trained on a multiple regression dataset and the Boston
Housing dataset:

Train Score:

The multiple regression model achieves a very high train score (0.983), indicating an excellent fit to
the training data.

The Boston Housing model also demonstrates a reasonably high train score (0.822), suggesting a
good fit to its training data.

Test Score:

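Both models exhibit high test scores, with the multiple regression model at 0.887 and the Boston
Housing model at 0.877, indicating strong generalization performance. The multiple regression
model's score does drop from 0.983 on the training data to 0.887 on the test data, which points to
some overfitting.

Mean Squared Error (MSE):

The multiple regression model has a relatively high MSE of 2,611,228, meaning larger prediction
errors on average.

In contrast, the Boston Housing model shows a much lower MSE of 5.379, meaning smaller absolute
prediction errors. Note that MSE is expressed in squared units of the target variable, so the two
values are directly comparable only to the extent that the two targets share a scale.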
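
MSE depends on the units of the target, which matters when comparing MSE values across datasets. The sketch below uses made-up targets and predictions (not the data from this report) and hypothetical helper functions mse and r2 to show that expressing the same values in units 1,000 times smaller multiplies the MSE by 1,000,000 while leaving R-squared unchanged.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical targets and predictions, e.g. prices in thousands of dollars.
y_true = np.array([20.0, 25.0, 30.0, 35.0])
y_pred = np.array([22.0, 24.0, 29.0, 37.0])

# The same quantities expressed in dollars instead of thousands of dollars.
scale = 1000.0

mse_small = mse(y_true, y_pred)
mse_large = mse(y_true * scale, y_pred * scale)
r2_small = r2(y_true, y_pred)
r2_large = r2(y_true * scale, y_pred * scale)

print("MSE (thousands):", mse_small)
print("MSE (dollars):", mse_large)
print("R2 (thousands):", round(r2_small, 6))
print("R2 (dollars):", round(r2_large, 6))
```

For this reason a scale-free measure such as R-squared, or an error normalized by the spread of the target, is usually a safer basis for comparing models trained on different datasets.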

R-squared (R2):

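The multiple regression model and the Boston Housing model both achieve high R-squared values on the
test set (0.887 and 0.877 respectively), indicating good explanatory power over the variance in their
respective dependent variables. These values equal the test scores above because the score method of
LinearRegression returns R-squared.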

Conclusion:

While both models exhibit strong performance in terms of train and test scores, the Boston Housing
model has by far the lower MSE, suggesting smaller prediction errors.

The multiple regression model has the slightly higher R-squared value (0.887 against 0.877),
indicating a marginally better fit, but its much higher MSE means larger absolute errors on unseen
data. Some of that difference may stem from the scale of its target variable.

Therefore, for accurate prediction of housing prices, the Boston Housing model is preferred.
However, if the goal is to explain variance in the dependent variable, the multiple regression model
may be more suitable.
