
Emllab

Uploaded by

Pavan Kumar

7/19/24, 10:57 AM Untitled3.ipynb - Colab

from sklearn.datasets import fetch_california_housing

# Load the California housing data as a dict-like Bunch object
data = fetch_california_housing()
print(data)

{'data': array([[ 8.3252 , 41. , 6.98412698, ..., 2.55555556,
37.88 , -122.23 ],
[ 8.3014 , 21. , 6.23813708, ..., 2.10984183,
37.86 , -122.22 ],
[ 7.2574 , 52. , 8.28813559, ..., 2.80225989,
37.85 , -122.24 ],
...,
[ 1.7 , 17. , 5.20554273, ..., 2.3256351 ,
39.43 , -121.22 ],
[ 1.8672 , 18. , 5.32951289, ..., 2.12320917,
39.43 , -121.32 ],
[ 2.3886 , 16. , 5.25471698, ..., 2.61698113,
39.37 , -121.24 ]]), 'target': array([4.526, 3.585, 3.521, ..., 0.923, 0.847, 0.894]), 'frame': None, 'target_names'

print(data.DESCR)

.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

:Number of Instances: 20640

:Number of Attributes: 8 numeric, predictive attributes and the target

:Attribute Information:
- MedInc median income in block group
- HouseAge median house age in block group
- AveRooms average number of rooms per household
- AveBedrms average number of bedrooms per household
- Population block group population
- AveOccup average number of household members
- Latitude block group latitude
- Longitude block group longitude

:Missing Attribute Values: None

This dataset was obtained from the StatLib repository.
https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

The target variable is the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).

This dataset was derived from the 1990 U.S. census, using one row per census
block group. A block group is the smallest geographical unit for which the U.S.
Census Bureau publishes sample data (a block group typically has a population
of 600 to 3,000 people).

A household is a group of people residing within a home. Since the average
number of rooms and bedrooms in this dataset are provided per household, these
columns may take surprisingly large values for block groups with few households
and many empty houses, such as vacation resorts.

It can be downloaded/loaded using the
:func:`sklearn.datasets.fetch_california_housing` function.

.. topic:: References

- Pace, R. Kelley and Ronald Barry, Sparse Spatial Autoregressions,
  Statistics and Probability Letters, 33 (1997) 291-297

data.feature_names

['MedInc',
'HouseAge',
'AveRooms',
'AveBedrms',
'Population',
'AveOccup',
'Latitude',
'Longitude']

https://colab.research.google.com/drive/1ihyBCBb0Gx3Ajpj_XyPsrzNoDSjiskhN#scrollTo=OEGyszt0Z-j3&printMode=true 1/6
import pandas as pd
df=pd.DataFrame(data.data,columns=data.feature_names)
df.head()

MedInc HouseAge AveRooms AveBedrms Population AveOccup Latitude Longitude

0 8.3252 41.0 6.984127 1.023810 322.0 2.555556 37.88 -122.23

1 8.3014 21.0 6.238137 0.971880 2401.0 2.109842 37.86 -122.22

2 7.2574 52.0 8.288136 1.073446 496.0 2.802260 37.85 -122.24

3 5.6431 52.0 5.817352 1.073059 558.0 2.547945 37.85 -122.25

4 3.8462 52.0 6.281853 1.081081 565.0 2.181467 37.85 -122.25


df['Price']=data.target
df.head()

MedInc HouseAge AveRooms AveBedrms Population AveOccup Latitude Longitude Price

0 8.3252 41.0 6.984127 1.023810 322.0 2.555556 37.88 -122.23 4.52

1 8.3014 21.0 6.238137 0.971880 2401.0 2.109842 37.86 -122.22 3.58

2 7.2574 52.0 8.288136 1.073446 496.0 2.802260 37.85 -122.24 3.52

3 5.6431 52.0 5.817352 1.073059 558.0 2.547945 37.85 -122.25 3.4

4 3.8462 52.0 6.281853 1.081081 565.0 2.181467 37.85 -122.25 3.42


df.describe()

MedInc HouseAge AveRooms AveBedrms Population AveOccup

count 20640.000000 20640.000000 20640.000000 20640.000000 20640.000000 20640.000000

mean 3.870671 28.639486 5.429000 1.096675 1425.476744 3.070655

std 1.899822 12.585558 2.474173 0.473911 1132.462122 10.386050

min 0.499900 1.000000 0.846154 0.333333 3.000000 0.692308

25% 2.563400 18.000000 4.440716 1.006079 787.000000 2.429741

50% 3.534800 29.000000 5.229129 1.048780 1166.000000 2.818116

75% 4.743250 37.000000 6.052381 1.099526 1725.000000 3.282261

max 15.000100 52.000000 141.909091 34.066667 35682.000000 1243.333333

df.isnull().sum()

MedInc 0
HouseAge 0
AveRooms 0
AveBedrms 0
Population 0
AveOccup 0
Latitude 0
Longitude 0
Price 0
dtype: int64

import seaborn as sns

# Use a random 25% sample of the rows so the pair plot renders quickly
df_copy = df.sample(frac=0.25)
df_copy.shape

(5160, 9)

sns.pairplot(df_copy)


<seaborn.axisgrid.PairGrid at 0x7ebddafac040>


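# Split the dataset into independent variables (x) and the dependent variable (y)
x = df.iloc[:, :-1]  # every column except the last: the 8 features
y = df.iloc[:, -1]   # last column: Price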

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=36)
x.shape, x_train.shape, x_test.shape

((20640, 8), (13828, 8), (6812, 8))
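
The shapes above follow from how the split sizes are rounded. A minimal sketch of that arithmetic, assuming scikit-learn rounds the test fraction up and gives the remaining rows to the training set (an assumption about its rounding rule, not something stated in this notebook):

```python
import math

n_samples = 20640   # rows in the dataset
test_size = 0.33    # fraction passed to train_test_split

# Assumed rounding rule: the test count is rounded up, the rest is used for training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 13828 6812
```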

# feature scaling: learn the mean and standard deviation from the training set only
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_train

array([[ 2.94615603, 1.69257444, 0.65840852, ..., 0.00631568,
-0.67676958, 0.7291751 ],
[-0.53712836, 0.26178584, -0.61551141, ..., 0.10882511,
-0.78878315, 0.59982719],
[ 0.02597188, 0.50025061, -0.14510172, ..., -0.11288116,
0.79807576, -1.19114398],
...,
[-0.46390242, 0.57973886, -0.72060079, ..., 0.24671234,
-0.77478146, 0.69932558],
[ 0.03045085, -0.85104973, 0.29681501, ..., -0.04515438,
1.27880067, -1.66873629],
[ 0.48355594, -0.69207322, -0.02275645, ..., -0.12518059,
0.95676166, -1.25084302]])

x_test=sc.transform(x_test)
x_test

array([[ 0.07003245, -1.88439705, -0.14330011, ..., -0.10560014,
-0.74677806, 0.93812174],
[-1.04892878, 1.0566684 , -0.26431499, ..., -0.01249385,
1.40014871, -0.91752338],
[ 0.15617449, -0.13565543, -0.33324929, ..., -0.17835535,
-0.74211083, 0.55007799],
...,
[ 3.81372206, 0.26178584, 0.71929056, ..., -0.0768889 ,
-0.71877467, 0.55007799],
[ 2.52179968, 0.26178584, 0.88449779, ..., -0.03343845,
-0.69077128, 0.54510307],
[-0.37291683, 0.50025061, -0.01804122, ..., 0.18848079,
-0.74677806, 0.8037989 ]])
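
The scaler is fitted on x_train and only applied to x_test, so the test rows never influence the statistics. A minimal NumPy sketch of the same idea on hypothetical toy arrays (toy_train and toy_test are stand-ins, not the housing data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy data standing in for x_train and x_test
toy_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
toy_test = rng.normal(loc=5.0, scale=2.0, size=(40, 3))

# "fit": per-column mean and standard deviation from the training rows only
mean = toy_train.mean(axis=0)
std = toy_train.std(axis=0)  # population standard deviation (ddof=0)

# "transform": apply the same training statistics to both sets
train_scaled = (toy_train - mean) / std
test_scaled = (toy_test - mean) / std

print(train_scaled.mean(axis=0))  # approximately 0 in every column
print(train_scaled.std(axis=0))   # approximately 1 in every column
print(test_scaled.mean(axis=0))   # close to 0, but not exactly 0
```

The scaled test columns are not exactly zero-mean because they are shifted by the training mean, which is the intended behavior.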

#model training
from sklearn.linear_model import LinearRegression
lr=LinearRegression()
lr.fit(x_train,y_train)

▾ LinearRegression
LinearRegression()

lr.coef_

array([ 0.83700024, 0.12271899, -0.26347102, 0.30713139, -0.0081633 ,
-0.02764702, -0.90609856, -0.87576409])

lr.intercept_

2.0708259184263813
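
Because the features were standardized, each coefficient is the change in predicted Price (in units of $100,000) for a one-standard-deviation change in that feature. A minimal sketch of how a prediction is assembled from the numbers printed above; the two input rows are hypothetical, chosen only for illustration:

```python
# Coefficients and intercept copied from the outputs above
coef = [0.83700024, 0.12271899, -0.26347102, 0.30713139,
        -0.0081633, -0.02764702, -0.90609856, -0.87576409]
intercept = 2.0708259184263813

def predict_one(scaled_row):
    """Linear prediction: intercept plus the dot product of coefficients and features."""
    return intercept + sum(c * v for c, v in zip(coef, scaled_row))

# Hypothetical block group with every feature at its training mean (all zeros after scaling)
average_block = [0.0] * 8
# Same block group, but MedInc one standard deviation above the training mean
richer_block = [1.0] + [0.0] * 7

print(predict_one(average_block))  # equals the intercept
print(predict_one(richer_block))   # intercept + 0.83700024
```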

#prediction
y_pred=lr.predict(x_test)

from sklearn.metrics import mean_squared_error, mean_absolute_error
import numpy as np

mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(mse)           # mean squared error
print(mae)           # mean absolute error
print(np.sqrt(mse))  # root mean squared error

0.5335029155157139
0.540898948179417
0.730412839095613

# accuracy: R-squared and adjusted R-squared
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)

0.5875394343499214

# display adjusted R-squared
n, p = x_test.shape  # number of test rows and number of features
adj_r2 = 1 - (1 - r2_score(y_test, y_pred)) * (n - 1) / (n - p - 1)
print(adj_r2)

-0.0012124958000889752
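
The value printed above does not agree with the R-squared of about 0.5875 reported in the previous cell. With 6812 test rows and 8 features the correction factor is very close to 1, so the adjusted value should sit just below R-squared. A result near zero is what the formula gives when R-squared itself is near zero, which would be consistent with this cell having been run after y_pred was overwritten by a later model (an assumption; the execution order is not shown in the printout). A minimal check using only numbers printed in this notebook:

```python
r2 = 0.5875394343499214  # r2_score output for the linear regression above
n = 6812                 # rows in x_test
p = 8                    # number of features

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))  # 0.5871
```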

from sklearn.linear_model import Ridge

ridge = Ridge(alpha=20.0)
ridge.fit(x_train, y_train)

▾ Ridge
Ridge(alpha=20.0)

y_pred=ridge.predict(x_test)
y_pred

array([1.7411772 , 0.91515687, 2.4477038 , ..., 5.22643706, 4.15328118,
1.73922719])

mse=mean_squared_error(y_test,y_pred)
mae=mean_absolute_error(y_test,y_pred)
print(mse)
print(mae)
print(np.sqrt(mse))

0.5335706984910803
0.5408065397594831
0.7304592380763489

from sklearn.linear_model import Lasso

lasso = Lasso(alpha=20.0)
lasso.fit(x_train, y_train)

▾ Lasso
Lasso(alpha=20.0)

y_pred=lasso.predict(x_test)
mse=mean_squared_error(y_test,y_pred)
mae=mean_absolute_error(y_test,y_pred)
print(mse)
print(mae)
print(np.sqrt(mse))

1.2935112672240048
0.9000629779192262
1.1373263679454568

from sklearn.linear_model import ElasticNet

elastic = ElasticNet(alpha=20.0)
elastic.fit(x_train, y_train)

▾ ElasticNet
ElasticNet(alpha=20.0)

y_pred=elastic.predict(x_test)
mse=mean_squared_error(y_test,y_pred)
mae=mean_absolute_error(y_test,y_pred)
print(mse)
print(mae)
print(np.sqrt(mse))

1.2935112672240048
0.9000629779192262
1.1373263679454568
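
Lasso and ElasticNet report exactly the same errors, which suggests that with alpha=20.0 the L1 penalty has pushed every coefficient to zero, so both models predict the training mean for every row. A minimal sketch of the threshold behind that, on synthetic standardized data (the data and seed are assumptions; l1_ratio=0.5 is the ElasticNet default):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 8

# Synthetic standardized features and target (not the housing data)
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 2.0 + X @ rng.normal(size=p) + rng.normal(size=n)

# Smallest L1 penalty at which every coefficient is exactly zero
alpha_max = np.max(np.abs(X.T @ (y - y.mean()))) / n

l1_ratio = 0.5  # ElasticNet default: half of alpha goes to the L1 term
print(alpha_max)
print(20.0 >= alpha_max)             # Lasso(alpha=20.0) keeps no feature
print(20.0 * l1_ratio >= alpha_max)  # ElasticNet(alpha=20.0) keeps no feature either

# With all coefficients at zero the model predicts mean(y) everywhere,
# so its MSE equals the variance of y
constant_pred = np.full(n, y.mean())
mse_constant = np.mean((y - constant_pred) ** 2)
print(mse_constant, y.var())
```

A much smaller alpha (well below the threshold) would be needed for either model to keep any feature.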

df_copy.corr()

MedInc HouseAge AveRooms AveBedrms Population AveOccup Latitude Lo

MedInc 1.000000 -0.124239 0.336794 -0.063449 0.014632 0.044156 -0.078932 -0

HouseAge -0.124239 1.000000 -0.179126 -0.110799 -0.315880 0.024940 0.007281 -0

AveRooms 0.336794 -0.179126 1.000000 0.841736 -0.083540 -0.007071 0.101773 -0

AveBedrms -0.063449 -0.110799 0.841736 1.000000 -0.078531 -0.007916 0.066090 0

Population 0.014632 -0.315880 -0.083540 -0.078531 1.000000 0.104686 -0.104619 0

AveOccup 0.044156 0.024940 -0.007071 -0.007916 0.104686 1.000000 0.020495 -0

Latitude -0.078932 0.007281 0.101773 0.066090 -0.104619 0.020495 1.000000 -0

Longitude -0.018687 -0.108191 -0.011224 0.035848 0.097267 -0.012839 -0.923449 1

Price 0.688407 0.105085 0.153914 -0.057169 -0.023082 -0.017680 -0.145997 -0
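
The last row of the table shows how strongly each feature is linearly related to Price in the 25% sample. A minimal sketch that ranks the features by the absolute value of those correlations, using only numbers copied from the table:

```python
# Correlation of each feature with Price, copied from the last row of the table above.
# Longitude is left out because its value is cut off in the printout.
price_corr = {
    "MedInc": 0.688407,
    "HouseAge": 0.105085,
    "AveRooms": 0.153914,
    "AveBedrms": -0.057169,
    "Population": -0.023082,
    "AveOccup": -0.017680,
    "Latitude": -0.145997,
}

# Rank features by the strength (absolute value) of their linear relationship with Price
ranked = sorted(price_corr.items(), key=lambda item: abs(item[1]), reverse=True)
for name, corr in ranked:
    print(f"{name:<11}{corr:+.6f}")
```

MedInc stands out clearly; the remaining features have weak pairwise correlations with Price, although they can still contribute in a multivariate model.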
