
20MIS1025_Regression.ipynb - Colaboratory

This document introduces linear regression models for predicting continuous target variables. It lists the attributes of the UCI Housing dataset, then applies the workflow to the NSL-KDD dataset (KDD_Train.csv): visualizing feature relationships, fitting an ordinary least squares regression model with scikit-learn, computing MSE and R^2 performance metrics, and comparing ordinary least squares with lasso regression.



Predicting Continuous Target Variables with Regression Analysis

Overview

- Introducing a simple linear regression model
- Exploring the Housing Dataset
  - Visualizing the important characteristics of a dataset
- Implementing an ordinary least squares linear regression model
  - Solving regression for regression parameters with gradient descent
  - Estimating the coefficient of a regression model via scikit-learn
- Evaluating the performance of linear regression models
- Summary

from IPython.display import Image
%matplotlib inline

Introducing a simple linear regression model

Exploring the Housing dataset

Source: https://archive.ics.uci.edu/ml/datasets/Housing

Attributes:

1. CRIM: per capita crime rate by town
2. ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
3. INDUS: proportion of non-retail business acres per town
4. CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
5. NOX: nitric oxides concentration (parts per 10 million)
6. RM: average number of rooms per dwelling
7. AGE: proportion of owner-occupied units built prior to 1940
8. DIS: weighted distances to five Boston employment centres
9. RAD: index of accessibility to radial highways
10. TAX: full-value property-tax rate per $10,000
11. PTRATIO: pupil-teacher ratio by town
12. B: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13. LSTAT: % lower status of the population
14. MEDV: Median value of owner-occupied homes in $1000's
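
For reference, a minimal sketch of loading the Housing data with these attribute names (the notebook leaves this load commented out below and uses KDD_Train.csv instead; the UCI URL may no longer be served, and the column-name list here is a hypothetical convenience):

import pandas as pd

# Hypothetical: name the columns after the attribute list above
housing_cols = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
housing = pd.read_csv(
    'https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data',
    header=None, sep='\s+', names=housing_cols)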

import pandas as pd

# The original Housing load is left commented out; this run uses the NSL-KDD training set instead.
# df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data', header=None, sep='\s+')
df = pd.read_csv('KDD_Train.csv')
X = df.iloc[:, 22:26].values  # four numeric traffic features (columns 22-25)
y = df.iloc[:, 27].values     # target column
print(X)

[[  2.   2.   0.   0.]
 [ 13.   1.   0.   0.]
 [123.   6.   1.   1.]
 ...
 [  1.   1.   0.   0.]
 [144.   8.   1.   1.]
 [  1.   1.   0.   0.]]

Visualizing the important characteristics of a dataset

import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style='whitegrid', context='notebook')
cols = ['count', 'srv_count', 'serror_rate']  # three numeric features to inspect

# Pairwise scatter plots, with per-feature distributions on the diagonal
sns.pairplot(df[cols], height=2.5)
plt.tight_layout()
# plt.savefig('./figures/scatter.png', dpi=300)
plt.show()

import numpy as np

# Pearson correlation matrix of the selected features, rendered as an annotated heatmap
cm = np.corrcoef(df[cols].values.T)
sns.set(font_scale=1.5)
hm = sns.heatmap(cm,
                 cbar=True,
                 annot=True,
                 square=True,
                 fmt='.2f',
                 annot_kws={'size': 15},
                 yticklabels=cols,
                 xticklabels=cols)

# plt.tight_layout()
# plt.savefig('./figures/corr_mat.png', dpi=300)
plt.show()


sns.reset_orig()
%matplotlib inline

Estimating the coefficient of a regression model via scikit-learn

from sklearn.linear_model import LinearRegression

X = df[['count']].values    # single predictor: the 'count' feature
y = df['srv_count'].values  # target: the 'srv_count' feature

from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
sc_y = StandardScaler()
X_std = sc_x.fit_transform(X)

# The standardized copies are computed for illustration; the fit below uses the raw X and y.
y_std = sc_y.fit_transform(y[:, np.newaxis]).flatten()
y_std

array([-0.35434285, -0.36811021, -0.2992734 , ..., -0.36811021,
       -0.27173867, -0.36811021])

slr = LinearRegression()
slr.fit(X, y)
y_pred = slr.predict(X)
print('Slope: %.3f' % slr.coef_[0])
print('Intercept: %.3f' % slr.intercept_)

Slope: 0.299
Intercept: 2.605
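
A quick sanity check (not in the original notebook): predictions are intercept + slope * count, so the first sample, which has count = 2, should reproduce the first prediction shown below.

# Hypothetical check: 2.605 + 0.299 * 2 ≈ 3.203, matching y_pred[0]
print(slr.intercept_ + slr.coef_[0] * 2)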

y_pred

array([ 3.20265828,  6.48965824, 39.35965793, ...,  2.9038401 ,
       45.63483969,  2.9038401 ])

def lin_regplot(X, y, model):
    # Scatter the data and overlay the model's fitted regression line
    plt.scatter(X, y, c='lightblue')
    plt.plot(X, model.predict(X), color='red', linewidth=2)
    return

lin_regplot(X, y, slr)
plt.xlabel('[count]')
plt.ylabel('[srv_count]')
plt.tight_layout()
# plt.savefig('./figures/scikit_lr_fit.png', dpi=300)
plt.show()


Evaluating the performance of linear regression models

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

slr = LinearRegression()

slr.fit(X_train, y_train)
y_train_pred = slr.predict(X_train)
y_test_pred = slr.predict(X_test)

# Residual plot: residuals (prediction - true value) against predicted values
plt.scatter(y_train_pred, y_train_pred - y_train,
            c='blue', marker='o', label='Training data')
plt.scatter(y_test_pred, y_test_pred - y_test,
            c='lightgreen', marker='s', label='Test data')
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.legend(loc='upper left')
plt.hlines(y=0, xmin=-10, xmax=50, lw=2, color='red')  # zero-residual reference line
plt.xlim([-10, 50])
plt.tight_layout()
plt.xlim([-10, 50])
plt.tight_layout()

# plt.savefig('./figures/slr_residuals.png', dpi=300)
plt.show()
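
A good fit leaves the residuals scattered randomly around the red zero line; visible patterns or trends indicate structure the model has failed to capture.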

from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error

print('MSE train: %.3f, test: %.3f' % (
        mean_squared_error(y_train, y_train_pred),
        mean_squared_error(y_test, y_test_pred)))
print('R^2 train: %.3f, test: %.3f' % (
        r2_score(y_train, y_train_pred),
        r2_score(y_test, y_test_pred)))

MSE train: 4121.121, test: 4067.767
R^2 train: 0.222, test: 0.221
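
For reference, R^2 is 1 - SSE/SST; a minimal sketch (not in the original notebook) that should reproduce r2_score on the test split:

# Hand-computed coefficient of determination for the test split
sse = np.sum((y_test - y_test_pred) ** 2)      # residual sum of squares
sst = np.sum((y_test - np.mean(y_test)) ** 2)  # total sum of squares
print('R^2 test (manual): %.3f' % (1 - sse / sst))  # ~0.221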

Using regularized methods for regression

from sklearn.linear_model import Lasso

# Lasso adds an L1 penalty; alpha controls the regularization strength.
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
y_train_pred = lasso.predict(X_train)
y_test_pred = lasso.predict(X_test)
print(lasso.coef_)

[0.29923578]

print('MSE train: %.3f, test: %.3f' % (
        mean_squared_error(y_train, y_train_pred),
        mean_squared_error(y_test, y_test_pred)))
print('R^2 train: %.3f, test: %.3f' % (
        r2_score(y_train, y_train_pred),
        r2_score(y_test, y_test_pred)))

MSE train: 4121.121, test: 4067.767
R^2 train: 0.222, test: 0.221
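
With a single predictor and a small alpha, the L1 penalty barely shrinks the coefficient (0.29924 versus 0.299 for OLS), which is why these scores match the unregularized model. Other regularized estimators follow the same fit/predict pattern; a minimal ridge-regression sketch (L2 penalty, not part of the original notebook):

from sklearn.linear_model import Ridge

# Ridge applies an L2 penalty instead of lasso's L1; alpha again sets the strength.
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
print(ridge.coef_, ridge.intercept_)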


