ML PROGRAMS

The document provides a series of Python programs demonstrating the NumPy and pandas libraries (random number generation, statistical calculations, array manipulation, and DataFrame operations), followed by machine learning assignments covering regression, classification, clustering and dimensionality reduction with scikit-learn.

1. Assignment on practice of the NumPy library


In [1]: import numpy as np
x=np.random.random(5*5)
print(x)

[0.96011595 0.24347315 0.53785581 0.20718046 0.32130134 0.43911596
 0.95759849 0.24552722 0.9245766  0.10116944 0.74120665 0.85946428
 0.95166155 0.03005689 0.57937103 0.60647244 0.15993956 0.89931236
 0.18834676 0.86078237 0.64931501 0.0738019  0.88085663 0.26720987
 0.92823743]
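Note: np.random.random(5*5) returns a flat array of 25 values; pass a tuple such as size=(5,5) to get a 5×5 matrix instead.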

In [2]: import numpy as np


x=np.random.randint(1,100,20)
print(x)

[78 7 48 45 83 70 3 13 61 39 87 79 8 96 34 61 97 88 75 21]

In [4]: import numpy as np


x=np.random.randint(1,100,size=(10,10))
print(x)

[[43 75 44 29 8 10 37 42 45 56]
[86 96 81 85 26 56 14 96 43 53]
[92 37 4 20 25 84 94 72 83 45]
[10 12 77 37 4 54 25 35 79 95]
[99 77 58 60 14 15 33 87 76 31]
[51 6 63 91 36 18 96 53 55 7]
[32 34 23 46 88 18 20 19 67 13]
[46 26 45 94 49 91 90 31 31 32]
[ 1 21 30 24 52 51 54 48 5 61]
[69 15 10 42 94 99 38 82 2 96]]

In [5]: import numpy as np


x=np.array([1,2,3,4])
y=np.flip(x,0)

print(y)

[4 3 2 1]

In [6]: import numpy as np


x=np.array([1,2,3,4])
y=np.mean(x)
z=np.median(x)
p=np.std(x)
r=np.min(x)
t=np.max(x)
print("ARRAY ELEMETS ARE=",x)
print("MEAN IS=",y)
print("MEDIAN IS=",z)
print("STANDARD DEVATION=",p)
print("MINIMUM=",r)
print("MAXIMUM=",t)

ARRAY ELEMENTS ARE= [1 2 3 4]
MEAN IS= 2.5
MEDIAN IS= 2.5
STANDARD DEVIATION= 1.118033988749895
MINIMUM= 1
MAXIMUM= 4
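Note that np.std computes the population standard deviation (ddof=0) by default, so here it is sqrt(5/4) ≈ 1.118; pass ddof=1 for the sample value (≈ 1.291).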
In [7]: import numpy as np
y=np.ones([3,3])
print(y)

[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]

In [8]: import numpy as np


y=np.zeros([3,3])
print(y)

[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]

In [9]: import numpy as np


arr1=np.array([1,2,3,4])
arr2=np.array([5,6,7,8])
y=np.concatenate([arr1,arr2],0)
print(y)

[1 2 3 4 5 6 7 8]

In [10]: import numpy as np


x=np.arange(0,20,2)
print(x)

[ 0 2 4 6 8 10 12 14 16 18]

In [11]: import numpy as np


x=np.arange(1,20,2)
print(x)

[ 1 3 5 7 9 11 13 15 17 19]

In [12]: import numpy as np


# one random sample from a normal distribution with mean 5 and std 5
x=np.random.normal(5,5)
print(x)

-1.5761257093904577

In [13]: import numpy as np


x=np.array([1,2,3,4,5])
y=np.array([1,2,3,4,5])
z=np.multiply(x,y)
print(z)

[ 1 4 9 16 25]

In [14]: import numpy as np


x=np.array([1,2,3])
y=np.array([4,0,6])
z=np.multiply(x,y)
# np.resize(2,3) repeats the scalar 2 into an array of length 3
p=np.resize(2,3)
print(p)

[2 2 2]

In [15]: import numpy as np


x=np.array([1,2,3,4,5,6])
p=x.reshape(2,3)
print(p)

[[1 2 3]
[4 5 6]]

In [16]: import numpy as np


x=np.array([1,2,3])
y=np.array([4,0,6])
p=np.dot(x,y)
print(p)

22
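Here np.dot returns the scalar product 1·4 + 2·0 + 3·6 = 22.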

In [17]: import numpy as np


x=np.random.rand(3,4)
print(x)

[[0.89302682 0.86211757 0.90498356 0.15209972]


[0.6139432 0.8836222 0.38338498 0.78726926]
[0.58824787 0.24759667 0.0684686 0.50056723]]

In [18]: import numpy as np


x=np.array([[1,2,3],[4,5,6]])
p=np.transpose(x)
print(p)

[[1 4]
[2 5]
[3 6]]

In [19]: import pandas as pd


import numpy as np
from numpy.random import randn
np.random.seed(101)
df=pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())
df

Out[19]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [20]: df[['W','Z']]

Out[20]: W Z

A 2.706850 0.503826

B 0.651118 0.605965

C -2.018168 -0.589001

D 0.188695 0.955057

E 0.190794 0.683509

In [21]: df.W

Out[21]:
A 2.706850
B 0.651118
C -2.018168
D 0.188695
E 0.190794
Name: W, dtype: float64

In [22]: type(df['W'])

Out[22]:
pandas.core.series.Series
In [23]: df['new']=df['W']+df['Y']

In [24]: df

Out[24]: W X Y Z new

A 2.706850 0.628133 0.907969 0.503826 3.614819

B 0.651118 -0.319318 -0.848077 0.605965 -0.196959

C -2.018168 0.740122 0.528813 -0.589001 -1.489355

D 0.188695 -0.758872 -0.933237 0.955057 -0.744542

E 0.190794 1.978757 2.605967 0.683509 2.796762

In [25]: df.drop('new',axis=1)

Out[25]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509
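Note that drop returns a new DataFrame rather than modifying df, which is why df below still contains the new column until inplace=True is passed in In [27].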

In [26]: df

Out[26]: W X Y Z new

A 2.706850 0.628133 0.907969 0.503826 3.614819

B 0.651118 -0.319318 -0.848077 0.605965 -0.196959

C -2.018168 0.740122 0.528813 -0.589001 -1.489355

D 0.188695 -0.758872 -0.933237 0.955057 -0.744542

E 0.190794 1.978757 2.605967 0.683509 2.796762

In [27]: df.drop('new',axis=1,inplace=True)

In [28]: df

Out[28]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [29]: df.drop('E')

Out[29]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826


B 0.651118 -0.319318 -0.848077 0.605965
C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

In [30]: df.drop('E',axis=0)

Out[30]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

In [31]: df.loc['A']

Out[31]:
W 2.706850
X 0.628133
Y 0.907969
Z 0.503826
Name: A, dtype: float64

In [32]: df.iloc[2]

Out[32]:
W -2.018168
X 0.740122
Y 0.528813
Z -0.589001
Name: C, dtype: float64

In [33]: df.loc['B','Y']
Out[33]:
-0.8480769834036315

In [34]: df.loc[['A','B'],['W','Y']]

Out[34]: W Y

A 2.706850 0.907969

B 0.651118 -0.848077

In [35]: df

Out[35]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [36]: df>0

Out[36]: W X Y Z

A True True True True

B True False False True

C False True True False

D True False False True

E True True True True

In [37]: df[df>0]

Out[37]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 NaN NaN 0.605965

C NaN 0.740122 0.528813 NaN

D 0.188695 NaN NaN 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [38]: df[df['W']>0]

Out[38]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [39]: df

Out[39]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [40]: df[df['W']>0]['Y']

Out[40]:
A 0.907969
B -0.848077
D -0.933237
E 2.605967
Name: Y, dtype: float64

In [41]: df[(df['W']>0)&(df['Y']>1)]

Out[41]: W X Y Z

E 0.190794 1.978757 2.605967 0.683509

In [42]: df

Out[42]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826


B 0.651118 -0.319318 -0.848077 0.605965
C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [43]: df.reset_index()

Out[43]: index W X Y Z

0 A 2.706850 0.628133 0.907969 0.503826

1 B 0.651118 -0.319318 -0.848077 0.605965

2 C -2.018168 0.740122 0.528813 -0.589001

3 D 0.188695 -0.758872 -0.933237 0.955057

4 E 0.190794 1.978757 2.605967 0.683509

In [44]: newind='CA NY WY OR CO'.split()

In [45]: df['States']=newind

In [46]: df

Out[46]: W X Y Z States

A 2.706850 0.628133 0.907969 0.503826 CA

B 0.651118 -0.319318 -0.848077 0.605965 NY

C -2.018168 0.740122 0.528813 -0.589001 WY

D 0.188695 -0.758872 -0.933237 0.955057 OR

E 0.190794 1.978757 2.605967 0.683509 CO

In [47]: df.set_index('States')

Out[47]: W X Y Z

States

CA 2.706850 0.628133 0.907969 0.503826

NY 0.651118 -0.319318 -0.848077 0.605965

WY -2.018168 0.740122 0.528813 -0.589001

OR 0.188695 -0.758872 -0.933237 0.955057

CO 0.190794 1.978757 2.605967 0.683509

In [48]: df

Out[48]: W X Y Z States

A 2.706850 0.628133 0.907969 0.503826 CA

B 0.651118 -0.319318 -0.848077 0.605965 NY

C -2.018168 0.740122 0.528813 -0.589001 WY

D 0.188695 -0.758872 -0.933237 0.955057 OR

E 0.190794 1.978757 2.605967 0.683509 CO


In [49]: outside=['G1','G1','G1','G2','G2','G2']
inside=[1,2,3,1,2,3]
hier_index=list(zip(outside,inside))
hier_index=pd.MultiIndex.from_tuples(hier_index)

In [50]: hier_index

Out[50]:
MultiIndex([('G1', 1),
('G1', 2),
('G1', 3),
('G2', 1),
('G2', 2),
('G2', 3)],
)

In [51]: df=pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=['A','B'])
df

Out[51]: A B

G1 1 0.302665 1.693723

2 -1.706086 -1.159119

3 -0.134841 0.390528

G2 1 0.166905 0.184502

2 0.807706 0.072960

3 0.638787 0.329646

In [52]: df.loc['G1']

Out[52]: A B

1 0.302665 1.693723

2 -1.706086 -1.159119

3 -0.134841 0.390528

In [53]: df.loc['G1'].loc[1]

Out[53]:
A 0.302665
B 1.693723
Name: 1, dtype: float64

In [54]: df.index.names

Out[54]:
FrozenList([None, None])

In [55]: df.index.names=['Group','Num']

In [56]: df

Out[56]: A B

Group Num

G1 1 0.302665 1.693723

2 -1.706086 -1.159119

3 -0.134841 0.390528
G2 1 0.166905 0.184502

2 0.807706 0.072960

3 0.638787 0.329646

In [58]: df.xs(1,level='Num')

Out[58]: A B

Group

G1 0.302665 1.693723

G2 0.166905 0.184502
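Here xs takes a cross-section: it selects the rows where the Num level equals 1 across every group, something plain .loc indexing cannot express as directly.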

In [59]: import pandas as pd


import numpy as np
import matplotlib.pyplot as plt
data_frame=pd.read_csv("salary_data.csv")
data_frame

Out[59]: Age Gender Education Level Job Title Years of Experience Salary

0 32.0 Male Bachelor's Software Engineer 5.0 90000.0

1 28.0 Female Master's Data Analyst 3.0 65000.0

2 45.0 Male PhD Senior Manager 15.0 150000.0

3 36.0 Female Bachelor's Sales Associate 7.0 60000.0

4 52.0 Male Master's Director 20.0 200000.0

... ... ... ... ... ... ...

6699 49.0 Female PhD Director of Marketing 20.0 200000.0

6700 32.0 Male High School Sales Associate 3.0 50000.0

6701 30.0 Female Bachelor's Degree Financial Manager 4.0 55000.0

6702 46.0 Male Master's Degree Marketing Manager 14.0 140000.0

6703 26.0 Female High School Sales Executive 1.0 35000.0

6704 rows × 6 columns

In [60]: data_frame.head()

Out[60]: Age Gender Education Level Job Title Years of Experience Salary

0 32.0 Male Bachelor's Software Engineer 5.0 90000.0

1 28.0 Female Master's Data Analyst 3.0 65000.0

2 45.0 Male PhD Senior Manager 15.0 150000.0

3 36.0 Female Bachelor's Sales Associate 7.0 60000.0

4 52.0 Male Master's Director 20.0 200000.0

In [61]: data_frame.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6704 entries, 0 to 6703
Data columns (total 6 columns):
# Column Non-Null Count Dtype
0 Age 6702 non-null float64
1 Gender 6702 non-null object
2 Education Level 6701 non-null object
3 Job Title 6702 non-null object
4 Years of Experience 6701 non-null float64
5 Salary 6699 non-null float64
dtypes: float64(3), object(3)
memory usage: 314.4+ KB

3. Assignment on the Find-S algorithm. Apply it on the
Enjoy Sport data to find the specific hypothesis for it.
In [1]: import pandas as pd
import numpy as np

In [2]: data=pd.read_csv('ENJOYSPORT.csv')
data

Out[2]: Sky AirTemp Humidity Wind Water Forecast EnjoySport

0 Sunny Warm Normal Strong Warm Same 1

1 Sunny Warm High Strong Warm Same 1

2 Rainy Cold High Strong Warm Change 0

3 Sunny Warm High Strong Cool Change 1

In [4]: concept=np.array(data)[:,:-1]
target=np.array(data)[:,-1]

In [5]: for i,val in enumerate(target):


if val==1:
specific_h=concept[i].copy()
break

In [9]: for i,val in enumerate (concept):


if target[i]==1:
for x in range( len(specific_h)):
if val[x]!=specific_h[x]:
specific_h[x]='?'

In [10]: print(specific_h)

['Sunny' 'Warm' '?' 'Strong' '?' '?']
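Find-S generalises only on positive examples: any attribute whose value differs between positive examples is replaced by '?', which is why Humidity, Water and Forecast end up generalised while Sky, AirTemp and Wind stay specific.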

4. Assignment on the Candidate Elimination algorithm. Apply it
on the Enjoy Sport dataset to find the version space for it.
import numpy as np
import pandas as pd
data=pd.read_csv('ENJOYSPORT.csv')
data

Sky AirTemp Humidity Wind Water Forecast EnjoySport


0 Sunny Warm Normal Strong Warm Same 1
1 Sunny Warm High Strong Warm Same 1
2 Rainy Cold High Strong Warm Change 0
3 Sunny Warm High Strong Cool Change 1

concept=np.array(data)[:,:-1]
target=np.array(data)[:,-1]

for i,val in enumerate(target):


if val==1:
specific_h=concept[i].copy()
break
general_h=[['?' for i in range(len(specific_h))]for i in
range(len(specific_h))]
print('Initial specific hypothesis:',specific_h)
print('Initial general hypothesis:',general_h)

Initial specific hypothesis: ['Sunny' 'Warm' 'Normal' 'Strong' 'Warm'


'Same']
Initial general hypothesis: [['?', '?', '?', '?', '?', '?'], ['?',
'?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?',
'?', '?', '?']]

for i,val in enumerate(concept):


if target[i]==1:
for x in range(len(specific_h)):
if val[x]!=specific_h[x]:
specific_h[x]='?'
general_h[x][x]='?'
if target[i]==0:
for x in range(len(specific_h)):
if val[x]!=specific_h[x]:
general_h[x][x]=specific_h[x]
else:
general_h[x][x]='?'

print('step',i+1,'specific hypothesis=',specific_h)
print('step',i+1,'general hypothesis=',general_h)

step 1 specific hypothesis= ['Sunny' 'Warm' 'Normal' 'Strong' 'Warm'


'Same']
step 1 general hypothesis= [['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?',
'?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?',
'?', '?']]
step 2 specific hypothesis= ['Sunny' 'Warm' '?' 'Strong' 'Warm'
'Same']
step 2 general hypothesis= [['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?',
'?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?',
'?', '?']]
step 3 specific hypothesis= ['Sunny' 'Warm' '?' 'Strong' 'Warm'
'Same']
step 3 general hypothesis= [['Sunny', '?', '?', '?', '?', '?'], ['?',
'Warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?',
'?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', 'Same']]
step 4 specific hypothesis= ['Sunny' 'Warm' '?' 'Strong' '?' '?']
step 4 general hypothesis= [['Sunny', '?', '?', '?', '?', '?'], ['?',
'Warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?',
'?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', '?']]
indices=[]
for i,val in enumerate(general_h):
if val==['?','?','?','?','?','?']:
indices.append(i)
for i in indices:
general_h.remove(['?','?','?','?','?','?'])
print('Final specific hypothesis:',specific_h)
print('Final general hypothesis:',general_h)

Final specific hypothesis: ['Sunny' 'Warm' '?' 'Strong' '?' '?']


Final general hypothesis: [['Sunny', '?', '?', '?', '?', '?'], ['?',
'Warm', '?', '?', '?', '?']]

5. Assignment on Simple Regression. Build an application that
predicts salary from years of experience using single-variable
linear regression (use a dataset from Kaggle). Display the
coefficient and intercept. Also display the MSE and plot the
model on the testing data.
[1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

[2]: data=pd.read_csv('Salary_dataset.csv')

[3]: data

[3]: Unnamed: 0 YearsExperience Salary


0 0 1.2 39344.0
1 1 1.4 46206.0
2 2 1.6 37732.0
3 3 2.1 43526.0
4 4 2.3 39892.0
5 5 3.0 56643.0
6 6 3.1 60151.0
7 7 3.3 54446.0
8 8 3.3 64446.0
9 9 3.8 57190.0
10 10 4.0 63219.0
11 11 4.1 55795.0
12 12 4.1 56958.0
13 13 4.2 57082.0
14 14 4.6 61112.0
15 15 5.0 67939.0
16 16 5.2 66030.0
17 17 5.4 83089.0
18 18 6.0 81364.0
19 19 6.1 93941.0
20 20 6.9 91739.0

21 21 7.2 98274.0
22 22 8.0 101303.0
23 23 8.3 113813.0
24 24 8.8 109432.0
25 25 9.1 105583.0
26 26 9.6 116970.0
27 27 9.7 112636.0
28 28 10.4 122392.0
29 29 10.6 121873.0

[4]: x=data['YearsExperience'].values.reshape(-1,1)
y=data['Salary'].values.reshape(-1,1)

[5]: plt.scatter(x,y)
plt.xlabel('YearsExperience')
plt.ylabel('Salary')
plt.show()

[7]: from sklearn.model_selection import train_test_split


X_train,X_test,y_train,y_test=train_test_split(x,y)

[8]: from sklearn.linear_model import LinearRegression

[9]: model=LinearRegression()

[10]: model.fit(X_train,y_train)

[10]: LinearRegression()

[11]: pred=model.predict(X_test)

[12]: print(pred)

[[ 33808.97372662]
[ 75208.93931102]
[ 54508.95651882]
[117594.61836171]
[ 35780.40065921]
[ 62394.66424918]
[ 54508.95651882]
[ 59437.52385029]]
[13]: from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test,pred))

46671077.28879917
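As a sanity check, the same number can be computed directly from the residuals; a minimal sketch using the arrays above:

import numpy as np
# MSE is the mean of the squared differences between actual and predicted salaries
print(np.mean((y_test - pred)**2))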

[14]: print('Coefficient:',model.coef_)

Coefficient: [[9857.13466295]]

[18]: print('Intercept:',model.intercept_)

Intercept: [21980.41213108]

[20]: plt.scatter(X_train,y_train)
plt.plot(X_test,model.predict(X_test))
plt.show()

6. Assignment on Multiple Regression: Build an application that
predicts the price of a house using multiple-variable linear
regression (use the Housing dataset from Kaggle). Display all
the coefficients and the MSE.

[1]: import numpy as np


import pandas as pd

[3]: data=pd.read_csv('Housing.csv')
data

[3]: price area bedrooms bathrooms stories mainroad guestroom basement \


0 13300000 7420 4 2 3 yes no no
1 12250000 8960 4 4 4 yes no no
2 12250000 9960 3 2 2 yes no yes
3 12215000 7500 4 2 2 yes no yes
4 11410000 7420 4 1 2 yes yes yes
.. … … … … … … … …
540 1820000 3000 2 1 1 yes no yes
541 1767150 2400 3 1 1 no no no
542 1750000 3620 2 1 1 yes no no
543 1750000 2910 3 1 1 no no no
544 1750000 3850 3 1 2 yes no no

hotwaterheating airconditioning parking prefarea furnishingstatus


0 no yes 2 yes furnished
1 no yes 3 no furnished
2 no no 2 yes semi-furnished
3 no yes 3 yes furnished
4 no yes 2 no furnished
.. … … … … …
540 no no 2 no unfurnished
541 no no 0 no semi-furnished
542 no no 0 no unfurnished
543 no no 0 no furnished
544 no no 0 no unfurnished

[545 rows x 13 columns]

[4]: x=data[['area','bedrooms','bathrooms','stories','parking']].values
y=data['price']

[5]: from sklearn.model_selection import train_test_split

[6]: X_train,X_test,y_train,y_test=train_test_split(x,y)

[7]: from sklearn.linear_model import LinearRegression

[8]: model=LinearRegression()
model.fit(X_train,y_train)

[8]: LinearRegression()

[9]: pred=model.predict(X_test)

[10]: print('Coefficients:',model.coef_)

Coefficients: [3.47470179e+02 1.61382678e+05 1.03169790e+06 5.87966142e+05
 3.82066782e+05]
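Each coefficient is the model's predicted price change for a one-unit increase in that feature with the others held fixed: here roughly 347 per unit of area and about 1.03e6 per additional bathroom, for this particular train/test split.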

[11]: from sklearn.metrics import mean_squared_error

[12]: print('Mean squared error:',mean_squared_error(y_test,pred))

Mean squared error: 1102048313201.5125

7. Assignment on Binary Classification: Build an application to
decide whether to play tennis using a Decision Tree classifier.
Do the required data preprocessing. Display the accuracy score,
classification report and confusion matrix.

[17]: import numpy as np


import pandas as pd

[18]: data=pd.read_csv('PlayTennis.csv')
data

[18]: Outlook Temperature Humidity Wind Play Tennis


0 Sunny Hot High Weak No
1 Sunny Hot High Strong No
2 Overcast Hot High Weak Yes
3 Rain Mild High Weak Yes
4 Rain Cool Normal Weak Yes
5 Rain Cool Normal Strong No
6 Overcast Cool Normal Strong Yes
7 Sunny Mild High Weak No
8 Sunny Cool Normal Weak Yes
9 Rain Mild Normal Weak Yes
10 Sunny Mild Normal Strong Yes
11 Overcast Mild High Strong Yes
12 Overcast Hot Normal Weak Yes
13 Rain Mild High Strong No

[19]: dataset=pd.get_dummies(data)

[20]: dataset.astype(int)

[20]: Outlook_Overcast Outlook_Rain Outlook_Sunny Temperature_Cool \


0 0 0 1 0
1 0 0 1 0
2 1 0 0 0
3 0 1 0 0

4 0 1 0 1
5 0 1 0 1
6 1 0 0 1
7 0 0 1 0
8 0 0 1 1
9 0 1 0 0
10 0 0 1 0
11 1 0 0 0
12 1 0 0 0
13 0 1 0 0

Temperature_Hot Temperature_Mild Humidity_High Humidity_Normal \


0 1 0 1 0
1 1 0 1 0
2 1 0 1 0
3 0 1 1 0
4 0 0 0 1
5 0 0 0 1
6 0 0 0 1
7 0 1 1 0
8 0 0 0 1
9 0 1 0 1
10 0 1 0 1
11 0 1 1 0
12 1 0 0 1
13 0 1 1 0

Wind_Strong Wind_Weak Play Tennis_No Play Tennis_Yes


0 0 1 1 0
1 1 0 1 0
2 0 1 0 1
3 0 1 0 1
4 0 1 0 1
5 1 0 1 0
6 1 0 0 1
7 0 1 1 0
8 0 1 0 1
9 0 1 0 1
10 1 0 0 1
11 1 0 0 1
12 0 1 0 1
13 1 0 1 0

[21]: x=np.array(dataset)[:,1:10]

[22]: y=dataset['Play Tennis_No'].values

[23]: from sklearn.model_selection import train_test_split

[24]: X_train,X_test,y_train,y_test=train_test_split(x,y)

[25]: from sklearn.tree import DecisionTreeClassifier

[26]: model=DecisionTreeClassifier(criterion='entropy')

[27]: model.fit(X_train,y_train)

[27]: DecisionTreeClassifier(criterion='entropy')
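criterion='entropy' makes the tree pick splits by information gain, i.e. the reduction in the entropy H(S) = -Σ p·log2(p) of the class labels after each split.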

[28]: pred=model.predict(X_test)

[33]: from sklearn.metrics import classification_report,confusion_matrix,accuracy_score

[34]: print('Accuracy score:',accuracy_score(y_test,pred))
print('Classification report:',classification_report(y_test,pred))
print('Confusion matrix:',confusion_matrix(y_test,pred))

Accuracy score: 0.5


Classification report: precision recall f1-score support

False 0.67 0.67 0.67 3


True 0.00 0.00 0.00 1

accuracy 0.50 4
macro avg 0.33 0.33 0.33 4
weighted avg 0.50 0.50 0.50 4

Confusion matrix: [[2 1]


[1 0]]

8. Assignment on Binary Classification using Perceptron.
Implement the Perceptron model. Use this model to classify
whether a patient has cancer or not (use the Breast Cancer
dataset from sklearn). Display the accuracy score,
classification report and confusion matrix.
[16]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

[2]: from sklearn.datasets import load_breast_cancer

[3]: breast_cancer=load_breast_cancer()

[4]: x=breast_cancer.data

[5]: y=breast_cancer.target

[6]: data=pd.DataFrame(x,columns=breast_cancer.feature_names)

[7]: data['class']=y

[8]: data

[8]: mean radius mean texture mean perimeter mean area mean smoothness \
0 17.99 10.38 122.80 1001.0 0.11840
1 20.57 17.77 132.90 1326.0 0.08474
2 19.69 21.25 130.00 1203.0 0.10960
3 11.42 20.38 77.58 386.1 0.14250
4 20.29 14.34 135.10 1297.0 0.10030
.. … … … … …
564 21.56 22.39 142.00 1479.0 0.11100
565 20.13 28.25 131.20 1261.0 0.09780
566 16.60 28.08 108.30 858.1 0.08455
567 20.60 29.33 140.10 1265.0 0.11780
568 7.76 24.54 47.92 181.0 0.05263

mean compactness mean concavity mean concave points mean symmetry \
0 0.27760 0.30010 0.14710 0.2419
1 0.07864 0.08690 0.07017 0.1812
2 0.15990 0.19740 0.12790 0.2069
3 0.28390 0.24140 0.10520 0.2597
4 0.13280 0.19800 0.10430 0.1809
.. … … … …
564 0.11590 0.24390 0.13890 0.1726
565 0.10340 0.14400 0.09791 0.1752
566 0.10230 0.09251 0.05302 0.1590
567 0.27700 0.35140 0.15200 0.2397
568 0.04362 0.00000 0.00000 0.1587

mean fractal dimension … worst texture worst perimeter worst area \


0 0.07871 … 17.33 184.60 2019.0
1 0.05667 … 23.41 158.80 1956.0
2 0.05999 … 25.53 152.50 1709.0
3 0.09744 … 26.50 98.87 567.7
4 0.05883 … 16.67 152.20 1575.0
.. … … … … …
564 0.05623 … 26.40 166.10 2027.0
565 0.05533 … 38.25 155.00 1731.0
566 0.05648 … 34.12 126.70 1124.0
567 0.07016 … 39.42 184.60 1821.0
568 0.05884 … 30.37 59.16 268.6

worst smoothness worst compactness worst concavity \


0 0.16220 0.66560 0.7119
1 0.12380 0.18660 0.2416
2 0.14440 0.42450 0.4504
3 0.20980 0.86630 0.6869
4 0.13740 0.20500 0.4000
.. … … …
564 0.14100 0.21130 0.4107
565 0.11660 0.19220 0.3215
566 0.11390 0.30940 0.3403
567 0.16500 0.86810 0.9387
568 0.08996 0.06444 0.0000

worst concave points worst symmetry worst fractal dimension class


0 0.2654 0.4601 0.11890 0
1 0.1860 0.2750 0.08902 0
2 0.2430 0.3613 0.08758 0
3 0.2575 0.6638 0.17300 0
4 0.1625 0.2364 0.07678 0
.. … … … …

564 0.2216 0.2060 0.07115 0
565 0.1628 0.2572 0.06637 0
566 0.1418 0.2218 0.07820 0
567 0.2650 0.4087 0.12400 0
568 0.0000 0.2871 0.07039 1

[569 rows x 31 columns]

[9]: x=np.array(data)[:,:-1]
y=np.array(data)[:,-1]

[13]: from sklearn.model_selection import train_test_split


from sklearn.metrics import accuracy_score

[32]: X_train,X_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42,stratify=y)

[33]: class Perceptron:


    def __init__(self):
        self.w=None
        self.b=None
    def model(self,x):
        # fire when the weighted input reaches the threshold b
        return 1 if np.dot(self.w,x)>=self.b else 0
    def predict(self,X):
        Y=[]
        for x in X:
            result=self.model(x)
            Y.append(result)
        return np.array(Y)
    def fit(self,X,Y,epochs=1,lr=1):
        self.w=np.ones(X.shape[1])
        self.b=0
        wt_matrix=[]
        accuracy={}
        max_accuracy=0
        for i in range(epochs):
            for x,y in zip(X,Y):
                y_predict=self.model(x)
                if y==1 and y_predict==0:
                    self.w=self.w+lr*x
                    self.b=self.b-lr*1
                elif y==0 and y_predict==1:
                    self.w=self.w-lr*x
                    self.b=self.b+lr*1
            wt_matrix.append(self.w)
            accuracy[i]=accuracy_score(self.predict(X),Y)
            if accuracy[i]>max_accuracy:
                max_accuracy=accuracy[i]
                chkptw=self.w.copy()
                chkptb=self.b
        # keep the weights from the most accurate epoch (pocket-style checkpoint)
        self.w=chkptw
        self.b=chkptb
        plt.plot(accuracy.values())
        plt.ylim([0,1])
        return np.array(wt_matrix)

[34]: perceptron=Perceptron()

[35]: wt_matrix=perceptron.fit(X_train,y_train,10000,0.5)

[36]: plt.plot(wt_matrix[-1])

[36]: [<matplotlib.lines.Line2D at 0x1fa60adba10>]

[37]: pred=perceptron.predict(X_test)

[38]: from sklearn.metrics import accuracy_score

[39]: print(accuracy_score(y_test,pred))

0.9122807017543859
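The assignment also asks for the classification report and confusion matrix; a minimal sketch using the predictions above:

from sklearn.metrics import classification_report, confusion_matrix
# per-class precision/recall and the error counts on the test set
print(classification_report(y_test, pred))
print(confusion_matrix(y_test, pred))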

9. Assignment on Multiclass Classification using MLP
(Multilayer Perceptron). Build an application to classify a
given iris flower into its species using MLP (use the Iris
dataset from Kaggle / sklearn). Display the accuracy score,
classification report and confusion matrix.
[1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

[3]: iris=load_iris()

[4]: x=iris.data

[5]: y=iris.target

[6]: data=pd.DataFrame(x,columns=iris.feature_names)

[7]: data['class']=y

[8]: data

[8]: sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
.. … … … …
145 6.7 3.0 5.2 2.3
146 6.3 2.5 5.0 1.9
147 6.5 3.0 5.2 2.0
148 6.2 3.4 5.4 2.3
149 5.9 3.0 5.1 1.8

class

0 0
1 0
2 0
3 0
4 0
.. …
145 2
146 2
147 2
148 2
149 2

[150 rows x 5 columns]

[9]: x=np.array(data)[:,:-1]
y=np.array(data)[:,-1]

[10]: from sklearn.model_selection import train_test_split

[12]: X_train,X_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=1)

[14]: from sklearn.neural_network import MLPClassifier

[18]: model=MLPClassifier(hidden_layer_sizes=(10,10,10),max_iter=1000)
model.fit(X_train,y_train)

[18]: MLPClassifier(hidden_layer_sizes=(10, 10, 10), max_iter=1000)
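hidden_layer_sizes=(10,10,10) builds three hidden layers of ten neurons each, and max_iter=1000 caps the number of training iterations so the optimiser has room to converge.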

[19]: pred=model.predict(X_test)

[20]: from sklearn.metrics import accuracy_score,confusion_matrix,classification_report

[21]: print('Accuracy score:',accuracy_score(y_test,pred))
print('Classification report:',classification_report(y_test,pred))
print('Confusion matrix:',confusion_matrix(y_test,pred))

Accuracy score: 1.0
Classification report: precision recall f1-score support

0.0 1.00 1.00 1.00 14


1.0 1.00 1.00 1.00 18
2.0 1.00 1.00 1.00 13

accuracy 1.00 45
macro avg 1.00 1.00 1.00 45
weighted avg 1.00 1.00 1.00 45

Confusion matrix: [[14 0 0]
[ 0 18 0]
[ 0 0 13]]

10. Assignment on Regression using KNN. Build an application
that predicts salary based on years of experience using KNN
(use the salary dataset from Kaggle). Display the MSE.
[1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

[2]: data=pd.read_csv('Salary_dataset.csv')
data

[2]: Unnamed: 0 YearsExperience Salary


0 0 1.2 39344.0
1 1 1.4 46206.0
2 2 1.6 37732.0
3 3 2.1 43526.0
4 4 2.3 39892.0
5 5 3.0 56643.0
6 6 3.1 60151.0
7 7 3.3 54446.0
8 8 3.3 64446.0
9 9 3.8 57190.0
10 10 4.0 63219.0
11 11 4.1 55795.0
12 12 4.1 56958.0
13 13 4.2 57082.0
14 14 4.6 61112.0
15 15 5.0 67939.0
16 16 5.2 66030.0
17 17 5.4 83089.0
18 18 6.0 81364.0
19 19 6.1 93941.0
20 20 6.9 91739.0
21 21 7.2 98274.0
22 22 8.0 101303.0
23 23 8.3 113813.0

24 24 8.8 109432.0
25 25 9.1 105583.0
26 26 9.6 116970.0
27 27 9.7 112636.0
28 28 10.4 122392.0
29 29 10.6 121873.0

[5]: x=data['YearsExperience'].values.reshape(-1,1)
y=data['Salary'].values.reshape(-1,1)

[6]: from sklearn.model_selection import train_test_split

[7]: plt.scatter(x,y)
plt.xlabel('Experience')
plt.ylabel('Salary')
plt.show()

[8]: X_train,X_test,y_train,y_test=train_test_split(x,y)

[10]: from sklearn.neighbors import KNeighborsRegressor

[19]: model=KNeighborsRegressor(n_neighbors=3)
model.fit(X_train,y_train)

[19]: KNeighborsRegressor(n_neighbors=3)

[20]: pred=model.predict(X_test)

[21]: from sklearn.metrics import mean_squared_error

[22]: print('Mean squared error:',mean_squared_error(y_test,pred))

Mean squared error: 37878589.34722221
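A KNN regressor predicts the mean target of the k nearest training points; a hypothetical sanity check for the first test point (assuming no distance ties):

import numpy as np
d = np.abs(X_train - X_test[0]).ravel()   # distances on the single feature
nearest = np.argsort(d)[:3]               # indices of the 3 closest neighbours
# the manual average should agree with the model's prediction
print(y_train[nearest].mean(), model.predict(X_test[:1]))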

11. Assignment on Classification using KNN. Build an
application to classify an iris flower into its species using
KNN (use the Iris dataset from sklearn). Display the accuracy
score, classification report and confusion matrix.

[1]: import numpy as np


import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

[2]: iris=load_iris()

[3]: x=iris.data

[4]: y=iris.target

[5]: data=pd.DataFrame(x,columns=iris.feature_names)

[6]: data['class']=y

[7]: data

[7]: sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
.. … … … …
145 6.7 3.0 5.2 2.3
146 6.3 2.5 5.0 1.9
147 6.5 3.0 5.2 2.0
148 6.2 3.4 5.4 2.3
149 5.9 3.0 5.1 1.8

class

0 0
1 0
2 0
3 0
4 0
.. …
145 2
146 2
147 2
148 2
149 2

[150 rows x 5 columns]

[8]: x=np.array(data)[:,:-1]
y=np.array(data)[:,-1]

[9]: from sklearn.model_selection import train_test_split

[10]: X_train,X_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=1)

[12]: from sklearn.neighbors import KNeighborsClassifier

[14]: model=KNeighborsClassifier(n_neighbors=7)
model.fit(X_train,y_train)

[14]: KNeighborsClassifier(n_neighbors=7)

[15]: pred=model.predict(X_test)

[16]: from sklearn.metrics import classification_report,confusion_matrix,accuracy_score

[17]: print('Accuracy score:',accuracy_score(y_test,pred))


print('Classification report:',classification_report(y_test,pred))
print('Confusion matrix:',confusion_matrix(y_test,pred))

Accuracy score: 0.9777777777777777


Classification report: precision recall f1-score support

0.0 1.00 1.00 1.00 14


1.0 1.00 0.94 0.97 18
2.0 0.93 1.00 0.96 13

accuracy 0.98 45
macro avg 0.98 0.98 0.98 45
weighted avg 0.98 0.98 0.98 45

Confusion matrix: [[14 0 0]
[ 0 17 1]
[ 0 0 13]]

12. Assignment on Naive Bayes Classifier. Build an
application to classify a given text using a Naive Bayes
classifier. Use data from sklearn. Display the accuracy
score, classification report and confusion matrix.
In [16]: import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
from sklearn.datasets import fetch_20newsgroups

In [8]: data = fetch_20newsgroups()


data.target_names

Out[8]:
['alt.atheism',
'comp.graphics',
'comp.os.ms-windows.misc',
'comp.sys.ibm.pc.hardware',
'comp.sys.mac.hardware',
'comp.windows.x',
'misc.forsale',
'rec.autos',
'rec.motorcycles',
'rec.sport.baseball',
'rec.sport.hockey',
'sci.crypt',
'sci.electronics',
'sci.med',
'sci.space',
'soc.religion.christian',
'talk.politics.guns',
'talk.politics.mideast',
'talk.politics.misc',
'talk.religion.misc']

In [17]: categories = ['talk.religion.misc', 'soc.religion.christian',


'sci.space', 'comp.graphics']
train = fetch_20newsgroups(subset='train', categories=categories)
test = fetch_20newsgroups(subset='test', categories=categories)

In [18]: print(train.data[5])

From: dmcgee@uluhe.soest.hawaii.edu (Don McGee)


Subject: Federal Hearing
Originator: dmcgee@uluhe
Organization: School of Ocean and Earth Science and Technology
Distribution: usa
Lines: 10

Fact or rumor....? Madalyn Murray O'Hare an atheist who eliminated the


use of the bible reading and prayer in public schools 15 years ago is now
going to appear before the FCC with a petition to stop the reading of the
Gospel on the airways of America. And she is also campaigning to remove
Christmas programs, songs, etc from the public schools. If it is true
then mail to Federal Communications Commission 1919 H Street Washington DC
20054 expressing your opposition to her request. Reference Petition number

2493.
In [11]: from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

model = make_pipeline(TfidfVectorizer(), MultinomialNB())


model.fit(train.data, train.target)
labels = model.predict(test.data)
from sklearn.metrics import confusion_matrix
mat = confusion_matrix(test.target, labels)
sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False,
xticklabels=train.target_names, yticklabels=train.target_names)
plt.xlabel('true label')
plt.ylabel('predicted label');

In [12]: mat

Out[12]:
array([[344,  13,  32,   0],
       [  6, 364,  24,   0],
       [  1,   5, 392,   0],
       [  4,  12, 187,  48]], dtype=int64)
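Most of the error mass sits in the last row: 187 talk.religion.misc posts are predicted as soc.religion.christian, which is unsurprising given how much vocabulary the two categories share.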

In [13]: def predict_category(s, train=train, model=model):


pred = model.predict([s])
return train.target_names[pred[0]]
predict_category('sending a payload to the ISS')

Out[13]: 'sci.space'

In [14]: predict_category('discussing islam vs atheism')


Out[14]: 'soc.religion.christian'

In [15]: predict_category('determining the screen resolution')


Out[15]: 'comp.graphics'

13. Assignment on K-means clustering. Apply K-means
clustering on the Income dataset to form 3 clusters and
display these clusters using a scatter graph.
In [1]: from sklearn.cluster import KMeans
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from matplotlib import pyplot as plt
%matplotlib inline

In [2]: df = pd.read_csv("income.csv")
df.head()

Out[2]: Name Age Income($)

0 Rob 27 70000

1 Michael 29 90000

2 Mohan 29 61000

3 Ismail 28 60000

4 Kory 42 150000

In [3]: plt.scatter(df.Age,df['Income($)'])
plt.xlabel('Age')
plt.ylabel('Income($)')

Out[3]: Text(0,0.5,'Income($)')

In [4]: km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age','Income($)']])
y_predicted

Out[4]: array([0, 0, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 2])

In [5]: df['cluster']=y_predicted
df.head()

Out[5]: Name Age Income($) cluster


0 Rob 27 70000 0
1 Michael 29 90000 0

2 Mohan 29 61000 2

3 Ismail 28 60000 2

4 Kory 42 150000 1

In [6]: km.cluster_centers_

Out[6]: array([[3.40000000e+01, 8.05000000e+04],
       [3.82857143e+01, 1.50000000e+05],
       [3.29090909e+01, 5.61363636e+04]])

In [7]: df1 = df[df.cluster==0]


df2 = df[df.cluster==1]
df3 = df[df.cluster==2]
plt.scatter(df1.Age,df1['Income($)'],color='green')
plt.scatter(df2.Age,df2['Income($)'],color='red')
plt.scatter(df3.Age,df3['Income($)'],color='black')
plt.scatter(km.cluster_centers_[:,0],km.cluster_centers_[:,1],color='purple',marker='*',label='centroid')
plt.xlabel('Age')
plt.ylabel('Income ($)')
plt.legend()

Out[7]: <matplotlib.legend.Legend at 0x1ba914f7cc0>

In [8]: scaler = MinMaxScaler()


scaler.fit(df[['Income($)']])
df['Income($)'] = scaler.transform(df[['Income($)']])
scaler.fit(df[['Age']])
df['Age'] = scaler.transform(df[['Age']])
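Scaling matters here because K-means relies on Euclidean distance: without it, Income (tens of thousands) dwarfs Age (tens), so the unscaled clusters above split almost entirely on income.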

In [9]: df.head()

Out[9]: Name Age Income($) cluster

0 Rob 0.058824 0.213675 0

1 Michael 0.176471 0.384615 0

2 Mohan 0.176471 0.136752 2

3 Ismail 0.117647 0.128205 2

4 Kory 0.941176 0.897436 1

In [10]: plt.scatter(df.Age,df['Income($)'])
Out[10]: <matplotlib.collections.PathCollection at 0x1ba91605a58>

In [11]: km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age','Income($)']])
y_predicted

Out[11]: array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2])

In [12]: df['cluster']=y_predicted
df.head()

Out[12]: Name Age Income($) cluster

0 Rob 0.058824 0.213675 0

1 Michael 0.176471 0.384615 0

2 Mohan 0.176471 0.136752 0

3 Ismail 0.117647 0.128205 0

4 Kory 0.941176 0.897436 1

In [13]: km.cluster_centers_

Out[13]: array([[0.1372549 , 0.11633428],
       [0.72268908, 0.8974359 ],
       [0.85294118, 0.2022792 ]])

In [15]: df1 = df[df.cluster==0]


df2 = df[df.cluster==1]
df3 = df[df.cluster==2]
plt.scatter(df1.Age,df1['Income($)'],color='green')
plt.scatter(df2.Age,df2['Income($)'],color='red')
plt.scatter(df3.Age,df3['Income($)'],color='black')
plt.scatter(km.cluster_centers_[:,0],km.cluster_centers_[:,1],color='purple',marker='*',label='centroid')
plt.legend()

Out[15]: <matplotlib.legend.Legend at 0x1ba9166db00>

In [16]: sse = []
k_rng = range(1,10)
for k in k_rng:
km = KMeans(n_clusters=k)
km.fit(df[['Age','Income($)']])
sse.append(km.inertia_)

In [17]: sse

Out[17]:
[5.434011511988179,
2.091136388699078,
0.4750783498553095,
0.3491047094419565,
0.2755825568722977,
0.22443334487241418,
0.16869711728567788,
0.13265419827245162,
0.10497488680620906]


In [19]: plt.xlabel('K')
plt.ylabel('Sum of squared error')
plt.plot(k_rng,sse)

Out[19]: [<matplotlib.lines.Line2D at 0x1ba916ddeb8>]
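The SSE falls steeply up to k=3 (from 2.09 at k=2 to 0.48 at k=3) and flattens afterwards, so the elbow confirms 3 as a sensible number of clusters.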

14. Assignment on hierarchical clustering. Apply it on the
Mall Customers dataset to form 5 clusters and display these
clusters using a scatter graph.
In [1]: import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]: ourData = pd.read_csv('Mall_Customers.csv')


ourData.head()

Out[2]: CustomerID Gender Age Annual Income (k$) Spending Score (1-100)

0 1 Male 19 15 39

1 2 Male 21 15 81

2 3 Female 20 16 6

3 4 Female 23 16 77

4 5 Female 31 17 40

In [3]: newData = ourData.iloc[:, [3, 4]].values

In [4]: import scipy.cluster.hierarchy as sch


dendrogram = sch.dendrogram(sch.linkage(newData, method = 'ward'))
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Euclidean distances')
plt.show()
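A common heuristic is to cut the dendrogram across its tallest vertical gaps; doing so here leaves five branches, which motivates n_clusters=5 below.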

In [5]: from sklearn.cluster import AgglomerativeClustering


Agg_hc = AgglomerativeClustering(n_clusters = 5, affinity = 'euclidean', linkage = 'ward')
y_hc = Agg_hc.fit_predict(newData)

In [10]: # plotting cluster 1


plt.scatter(newData[y_hc == 0, 0], newData[y_hc == 0, 1], s = 100, c = 'red', label = 'Cluster 1')

Out[10]: <matplotlib.collections.PathCollection at 0x1a4d7225320>
In [11]:
plt.scatter(newData[y_hc == 0, 0], newData[y_hc == 0, 1], s = 100, c = 'red', label = 'Cluster 1')
plt.scatter(newData[y_hc == 1, 0], newData[y_hc == 1, 1], s = 100, c = 'blue', label = 'Cluster 2')
plt.scatter(newData[y_hc == 2, 0], newData[y_hc == 2, 1], s = 100, c = 'green', label = 'Cluster 3')
plt.scatter(newData[y_hc == 3, 0], newData[y_hc == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
plt.scatter(newData[y_hc == 4, 0], newData[y_hc == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')

plt.title('Clusters of customers')

plt.xlabel('Annual Income (k$)')

plt.ylabel('Spending Score (1-100)')

plt.legend()

plt.show()

15. Assignment on dimensionality reduction. Apply principal
component analysis (PCA) on the Iris dataset to reduce its
dimensionality to three principal components, and display the
data before and after reduction using scatter graphs.
In [2]:

import pandas as pd
import matplotlib.pyplot as plt

# load dataset into Pandas DataFrame


df = pd.read_csv("C:/Users/abhis/Desktop/SSY/deeplearning part1/iris-
flower-dataset/IRIS.csv")
df.head()

Out[2]:

sepal_length sepal_width petal_length petal_width species

0 5.1 3.5 1.4 0.2 Iris-setosa

1 4.9 3.0 1.4 0.2 Iris-setosa

2 4.7 3.2 1.3 0.2 Iris-setosa

3 4.6 3.1 1.5 0.2 Iris-setosa

4 5.0 3.6 1.4 0.2 Iris-setosa

In [3]:

fig = plt.figure(figsize = (8,8))


sepal = fig.add_subplot(1,1,1)
sepal.set_xlabel('sepal_length', fontsize = 15)
sepal.set_ylabel('sepal_width', fontsize = 15)
sepal.set_title('Original Data', fontsize = 20)
targets = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
colors = ['r', 'g', 'b']
for target, color in zip(targets,colors):
indicesToKeep = df['species'] == target
sepal.scatter(df.loc[indicesToKeep, 'sepal_length']
, df.loc[indicesToKeep, 'sepal_width']
, c = color
, s = 50)
sepal.legend(targets)
sepal.grid()

In [6]:

fig = plt.figure(figsize = (8,8))


petal = fig.add_subplot(1,1,1)
petal.set_xlabel('petal_length', fontsize = 15)
petal.set_ylabel('petal_width', fontsize = 15)
petal.set_title('Original Data', fontsize = 20)
targets = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
colors = ['r', 'g', 'b']
for target, color in zip(targets,colors):
indicesToKeep = df['species'] == target
petal.scatter(df.loc[indicesToKeep, 'petal_length']
, df.loc[indicesToKeep, 'petal_width']
, c = color
, s = 50)
petal.legend(targets)
petal.grid()

In [2]:

from sklearn.preprocessing import StandardScaler


features = ['sepal_length','sepal_width','petal_length','petal_width']
# Separating out the features
x = df.loc[:, features].values
# Separating out the target
y = df.loc[:,['species']].values
# Standardizing the features
x = StandardScaler().fit_transform(x)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3,
random_state=101)

In [3]:

from sklearn.decomposition import PCA


pca = PCA(n_components=3)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents,
columns = ['principal component 1', 'principal component 2', 'principal component 3'])

In [4]:
finalDf = pd.concat([principalDf, df[['species']]], axis = 1)

In [5]:

finalDf.head()

Out[5]:

principal component 1 principal component 2 principal component 3 species

0 -2.264542 0.505704 -0.121943 Iris-setosa

1 -2.086426 -0.655405 -0.227251 Iris-setosa

2 -2.367950 -0.318477 0.051480 Iris-setosa

3 -2.304197 -0.575368 0.098860 Iris-setosa

4 -2.388777 0.674767 0.021428 Iris-setosa

In [8]:

import matplotlib.pyplot as plt


fig = plt.figure(figsize = (8,8))
ax = fig.add_subplot(1,1,1)
ax.set_xlabel('Principal Component 1', fontsize = 15)
ax.set_ylabel('Principal Component 2', fontsize = 15)
ax.set_title('2 component PCA', fontsize = 20)
targets = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
colors = ['r', 'g', 'b']
for target, color in zip(targets,colors):
indicesToKeep = finalDf['species'] == target
ax.scatter(finalDf.loc[indicesToKeep, 'principal component 1']
, finalDf.loc[indicesToKeep, 'principal component 2']
, c = color
, s = 50)
ax.legend(targets)
ax.grid()

In [6]:

pca.explained_variance_ratio_

Out[6]:

array([0.72770452, 0.23030523, 0.03683832])
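The cumulative share of variance retained is just the running sum of these ratios; a minimal check:

import numpy as np
# variance retained by the first 1, 2 and 3 components (~0.728, 0.958, 0.995)
print(np.cumsum(pca.explained_variance_ratio_))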

In [7]:

from sklearn.ensemble import RandomForestClassifier

In [8]:

#using original data


model = RandomForestClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

C:\Users\abhis\Anaconda3\lib\site-
packages\sklearn\ensemble\forest.py:246: FutureWarning: The default
value of n_estimators will change from 10 in version 0.20 to 100 in
0.22.
"10 in version 0.20 to 100 in 0.22.", FutureWarning)
C:\Users\abhis\Anaconda3\lib\site-packages\ipykernel_launcher.py:3:
DataConversionWarning: A column-vector y was passed when a 1d array was
expected. Please change the shape of y to (n_samples,), for example
using ravel().
This is separate from the ipykernel package so we can avoid doing
imports until

In [9]:

predictions

Out[9]:

array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-virginica',


'Iris-versicolor', 'Iris-virginica', 'Iris-versicolor',
'Iris-versicolor', 'Iris-virginica', 'Iris-setosa',
'Iris-virginica', 'Iris-setosa', 'Iris-setosa', 'Iris-virginica',
'Iris-virginica', 'Iris-versicolor', 'Iris-versicolor',
'Iris-versicolor', 'Iris-setosa', 'Iris-versicolor',
'Iris-versicolor', 'Iris-setosa', 'Iris-versicolor',
'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
'Iris-versicolor', 'Iris-virginica', 'Iris-setosa', 'Iris-setosa',
'Iris-virginica', 'Iris-versicolor', 'Iris-virginica',
'Iris-versicolor', 'Iris-virginica', 'Iris-versicolor',
'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
'Iris-virginica', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
'Iris-versicolor', 'Iris-versicolor'], dtype=object)

In [10]:

from sklearn.metrics import accuracy_score


accuracy_score(y_test, predictions)

Out[10]:

0.9777777777777777

In [11]:

# Separating out the features


x = finalDf.drop(["species"], axis = 1)
x = StandardScaler().fit_transform(x)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3,
random_state=101)

In [12]:
#using PCA
model = RandomForestClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

C:\Users\abhis\Anaconda3\lib\site-
packages\sklearn\ensemble\forest.py:246: FutureWarning: The default
value of n_estimators will change from 10 in version 0.20 to 100 in
0.22.
"10 in version 0.20 to 100 in 0.22.", FutureWarning)
C:\Users\abhis\Anaconda3\lib\site-packages\ipykernel_launcher.py:3:
DataConversionWarning: A column-vector y was passed when a 1d array was
expected. Please change the shape of y to (n_samples,), for example
using ravel().
This is separate from the ipykernel package so we can avoid doing
imports until

In [13]:

predictions

Out[13]:

array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-virginica',


'Iris-virginica', 'Iris-virginica', 'Iris-versicolor',
'Iris-versicolor', 'Iris-versicolor', 'Iris-setosa',
'Iris-virginica', 'Iris-setosa', 'Iris-setosa', 'Iris-virginica',
'Iris-virginica', 'Iris-versicolor', 'Iris-versicolor',
'Iris-versicolor', 'Iris-setosa', 'Iris-versicolor',
'Iris-versicolor', 'Iris-setosa', 'Iris-versicolor',
'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
'Iris-versicolor', 'Iris-virginica', 'Iris-setosa', 'Iris-setosa',
'Iris-virginica', 'Iris-versicolor', 'Iris-virginica',
'Iris-versicolor', 'Iris-virginica', 'Iris-versicolor',
'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
'Iris-virginica', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
'Iris-virginica', 'Iris-versicolor'], dtype=object)

In [14]:

accuracy_score(y_test, predictions)

Out[14]:

0.9111111111111111
