ML PROGRAMS

The document provides a series of Python programs demonstrating the NumPy and pandas libraries (random number generation, statistical calculations, array manipulation, and DataFrame operations), followed by machine learning assignments covering regression, classification, clustering and dimensionality reduction with scikit-learn.

1. Assignment on practice of the NumPy library


In [1]: import numpy as np
x=np.random.random(5*5)
print(x)

[0.96011595 0.24347315 0.53785581 0.20718046 0.32130134 0.43911596
 0.95759849 0.24552722 0.9245766  0.10116944 0.74120665 0.85946428
 0.95166155 0.03005689 0.57937103 0.60647244 0.15993956 0.89931236
 0.18834676 0.86078237 0.64931501 0.0738019  0.88085663 0.26720987
 0.92823743]
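Note: np.random.random(5*5) returns a flat array of 25 values; pass a tuple such as size=(5,5) to get a 5×5 matrix instead.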

In [2]: import numpy as np


x=np.random.randint(1,100,20)
print(x)

[78 7 48 45 83 70 3 13 61 39 87 79 8 96 34 61 97 88 75 21]

In [4]: import numpy as np


x=np.random.randint(1,100,size=(10,10))
print(x)

[[43 75 44 29 8 10 37 42 45 56]
[86 96 81 85 26 56 14 96 43 53]
[92 37 4 20 25 84 94 72 83 45]
[10 12 77 37 4 54 25 35 79 95]
[99 77 58 60 14 15 33 87 76 31]
[51 6 63 91 36 18 96 53 55 7]
[32 34 23 46 88 18 20 19 67 13]
[46 26 45 94 49 91 90 31 31 32]
[ 1 21 30 24 52 51 54 48 5 61]
[69 15 10 42 94 99 38 82 2 96]]

In [5]: import numpy as np


x=np.array([1,2,3,4])
y=np.flip(x,0)

print(y)

[4 3 2 1]

In [6]: import numpy as np


x=np.array([1,2,3,4])
y=np.mean(x)
z=np.median(x)
p=np.std(x)
r=np.min(x)
t=np.max(x)
print("ARRAY ELEMETS ARE=",x)
print("MEAN IS=",y)
print("MEDIAN IS=",z)
print("STANDARD DEVATION=",p)
print("MINIMUM=",r)
print("MAXIMUM=",t)

ARRAY ELEMENTS ARE= [1 2 3 4]
MEAN IS= 2.5
MEDIAN IS= 2.5
STANDARD DEVIATION= 1.118033988749895
MINIMUM= 1
MAXIMUM= 4
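Note that np.std computes the population standard deviation (ddof=0) by default, so here it is sqrt(5/4) ≈ 1.118; pass ddof=1 for the sample value (≈ 1.291).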
In [7]: import numpy as np
y=np.ones([3,3])
print(y)

[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]

In [8]: import numpy as np


y=np.zeros([3,3])
print(y)

[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]

In [9]: import numpy as np


arr1=np.array([1,2,3,4])
arr2=np.array([5,6,7,8])
y=np.concatenate([arr1,arr2],0)
print(y)

[1 2 3 4 5 6 7 8]

In [10]: import numpy as np


x=np.arange(0,20,2)
print(x)

[ 0 2 4 6 8 10 12 14 16 18]

In [11]: import numpy as np


x=np.arange(1,20,2)
print(x)

[ 1 3 5 7 9 11 13 15 17 19]

In [12]: import numpy as np


# one random sample from a normal distribution with mean 5 and std 5
x=np.random.normal(5,5)
print(x)

-1.5761257093904577

In [13]: import numpy as np


x=np.array([1,2,3,4,5])
y=np.array([1,2,3,4,5])
z=np.multiply(x,y)
print(z)

[ 1 4 9 16 25]

In [14]: import numpy as np


x=np.array([1,2,3])
y=np.array([4,0,6])
z=np.multiply(x,y)
# np.resize(2,3) repeats the scalar 2 into an array of length 3
p=np.resize(2,3)
print(p)

[2 2 2]

In [15]: import numpy as np


x=np.array([1,2,3,4,5,6])
p=x.reshape(2,3)
print(p)

[[1 2 3]
[4 5 6]]

In [16]: import numpy as np


x=np.array([1,2,3])
y=np.array([4,0,6])
p=np.dot(x,y)
print(p)

22
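Here np.dot returns the scalar product 1·4 + 2·0 + 3·6 = 22.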

In [17]: import numpy as np


x=np.random.rand(3,4)
print(x)

[[0.89302682 0.86211757 0.90498356 0.15209972]


[0.6139432 0.8836222 0.38338498 0.78726926]
[0.58824787 0.24759667 0.0684686 0.50056723]]

In [18]: import numpy as np


x=np.array([[1,2,3],[4,5,6]])
p=np.transpose(x)
print(p)

[[1 4]
[2 5]
[3 6]]

In [19]: import pandas as pd


import numpy as np
from numpy.random import randn
np.random.seed(101)
df=pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())
df

Out[19]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [20]: df[['W','Z']]

Out[20]: W Z

A 2.706850 0.503826

B 0.651118 0.605965

C -2.018168 -0.589001

D 0.188695 0.955057

E 0.190794 0.683509

In [21]: df.W

Out[21]:
A 2.706850
B 0.651118
C -2.018168
D 0.188695
E 0.190794
Name: W, dtype: float64

In [22]: type(df['W'])

Out[22]:
pandas.core.series.Series
In [23]: df['new']=df['W']+df['Y']

In [24]: df

Out[24]: W X Y Z new

A 2.706850 0.628133 0.907969 0.503826 3.614819

B 0.651118 -0.319318 -0.848077 0.605965 -0.196959

C -2.018168 0.740122 0.528813 -0.589001 -1.489355

D 0.188695 -0.758872 -0.933237 0.955057 -0.744542

E 0.190794 1.978757 2.605967 0.683509 2.796762

In [25]: df.drop('new',axis=1)

Out[25]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509
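Note that drop returns a new DataFrame rather than modifying df, which is why df below still contains the new column until inplace=True is passed in In [27].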

In [26]: df

Out[26]: W X Y Z new

A 2.706850 0.628133 0.907969 0.503826 3.614819

B 0.651118 -0.319318 -0.848077 0.605965 -0.196959

C -2.018168 0.740122 0.528813 -0.589001 -1.489355

D 0.188695 -0.758872 -0.933237 0.955057 -0.744542

E 0.190794 1.978757 2.605967 0.683509 2.796762

In [27]: df.drop('new',axis=1,inplace=True)

In [28]: df

Out[28]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [29]: df.drop('E')

Out[29]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826


B 0.651118 -0.319318 -0.848077 0.605965
C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

In [30]: df.drop('E',axis=0)

Out[30]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

In [31]: df.loc['A']

Out[31]:
W 2.706850
X 0.628133
Y 0.907969
Z 0.503826
Name: A, dtype: float64

In [32]: df.iloc[2]

Out[32]:
W -2.018168
X 0.740122
Y 0.528813
Z -0.589001
Name: C, dtype: float64

In [33]: df.loc['B','Y']
Out[33]:
-0.8480769834036315

In [34]: df.loc[['A','B'],['W','Y']]

Out[34]: W Y

A 2.706850 0.907969

B 0.651118 -0.848077

In [35]: df

Out[35]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [36]: df>0

Out[36]: W X Y Z

A True True True True

B True False False True

C False True True False

D True False False True

E True True True True

In [37]: df[df>0]

Out[37]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 NaN NaN 0.605965

C NaN 0.740122 0.528813 NaN

D 0.188695 NaN NaN 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [38]: df[df['W']>0]

Out[38]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [39]: df

Out[39]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [40]: df[df['W']>0]['Y']

Out[40]:
A 0.907969
B -0.848077
D -0.933237
E 2.605967
Name: Y, dtype: float64

In [41]: df[(df['W']>0)&(df['Y']>1)]

Out[41]: W X Y Z

E 0.190794 1.978757 2.605967 0.683509

In [42]: df

Out[42]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826


B 0.651118 -0.319318 -0.848077 0.605965
C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [43]: df.reset_index()

Out[43]: index W X Y Z

0 A 2.706850 0.628133 0.907969 0.503826

1 B 0.651118 -0.319318 -0.848077 0.605965

2 C -2.018168 0.740122 0.528813 -0.589001

3 D 0.188695 -0.758872 -0.933237 0.955057

4 E 0.190794 1.978757 2.605967 0.683509

In [44]: newind='CA NY WY OR CO'.split()

In [45]: df['States']=newind

In [46]: df

Out[46]: W X Y Z States

A 2.706850 0.628133 0.907969 0.503826 CA

B 0.651118 -0.319318 -0.848077 0.605965 NY

C -2.018168 0.740122 0.528813 -0.589001 WY

D 0.188695 -0.758872 -0.933237 0.955057 OR

E 0.190794 1.978757 2.605967 0.683509 CO

In [47]: df.set_index('States')

Out[47]: W X Y Z

States

CA 2.706850 0.628133 0.907969 0.503826

NY 0.651118 -0.319318 -0.848077 0.605965

WY -2.018168 0.740122 0.528813 -0.589001

OR 0.188695 -0.758872 -0.933237 0.955057

CO 0.190794 1.978757 2.605967 0.683509

In [48]: df

Out[48]: W X Y Z States

A 2.706850 0.628133 0.907969 0.503826 CA

B 0.651118 -0.319318 -0.848077 0.605965 NY

C -2.018168 0.740122 0.528813 -0.589001 WY

D 0.188695 -0.758872 -0.933237 0.955057 OR

E 0.190794 1.978757 2.605967 0.683509 CO


In [49]: outside=['G1','G1','G1','G2','G2','G2']
inside=[1,2,3,1,2,3]
hier_index=list(zip(outside,inside))
hier_index=pd.MultiIndex.from_tuples(hier_index)

In [50]: hier_index

Out[50]:
MultiIndex([('G1', 1),
('G1', 2),
('G1', 3),
('G2', 1),
('G2', 2),
('G2', 3)],
)

In [51]: df=pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=['A','B'])
df

Out[51]: A B

G1 1 0.302665 1.693723

2 -1.706086 -1.159119

3 -0.134841 0.390528

G2 1 0.166905 0.184502

2 0.807706 0.072960

3 0.638787 0.329646

In [52]: df.loc['G1']

Out[52]: A B

1 0.302665 1.693723

2 -1.706086 -1.159119

3 -0.134841 0.390528

In [53]: df.loc['G1'].loc[1]

Out[53]:
A 0.302665
B 1.693723
Name: 1, dtype: float64

In [54]: df.index.names

Out[54]:
FrozenList([None, None])

In [55]: df.index.names=['Group','Num']

In [56]: df

Out[56]: A B

Group Num

G1 1 0.302665 1.693723

2 -1.706086 -1.159119

3 -0.134841 0.390528
G2 1 0.166905 0.184502

2 0.807706 0.072960

3 0.638787 0.329646

In [58]: df.xs(1,level='Num')

Out[58]: A B

Group

G1 0.302665 1.693723

G2 0.166905 0.184502
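Here xs takes a cross-section: it selects the rows where the Num level equals 1 across every group, something plain .loc indexing cannot express as directly.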

In [59]: import pandas as pd


import numpy as np
import matplotlib.pyplot as plt
data_frame=pd.read_csv("salary_data.csv")
data_frame

Out[59]: Age Gender Education Level Job Title Years of Experience Salary

0 32.0 Male Bachelor's Software Engineer 5.0 90000.0

1 28.0 Female Master's Data Analyst 3.0 65000.0

2 45.0 Male PhD Senior Manager 15.0 150000.0

3 36.0 Female Bachelor's Sales Associate 7.0 60000.0

4 52.0 Male Master's Director 20.0 200000.0

... ... ... ... ... ... ...

6699 49.0 Female PhD Director of Marketing 20.0 200000.0

6700 32.0 Male High School Sales Associate 3.0 50000.0

6701 30.0 Female Bachelor's Degree Financial Manager 4.0 55000.0

6702 46.0 Male Master's Degree Marketing Manager 14.0 140000.0

6703 26.0 Female High School Sales Executive 1.0 35000.0

6704 rows × 6 columns

In [60]: data_frame.head()

Out[60]: Age Gender Education Level Job Title Years of Experience Salary

0 32.0 Male Bachelor's Software Engineer 5.0 90000.0

1 28.0 Female Master's Data Analyst 3.0 65000.0

2 45.0 Male PhD Senior Manager 15.0 150000.0

3 36.0 Female Bachelor's Sales Associate 7.0 60000.0

4 52.0 Male Master's Director 20.0 200000.0

In [61]: data_frame.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6704 entries, 0 to 6703
Data columns (total 6 columns):
# Column Non-Null Count Dtype
0 Age 6702 non-null float64
1 Gender 6702 non-null object
2 Education Level 6701 non-null object
3 Job Title 6702 non-null object
4 Years of Experience 6701 non-null float64
5 Salary 6699 non-null float64
dtypes: float64(3), object(3)
memory usage: 314.4+ KB

3. Assignment on the Find-S algorithm. Apply it on the
Enjoy Sport data to find the specific hypothesis for it.
In [1]: import pandas as pd
import numpy as np

In [2]: data=pd.read_csv('ENJOYSPORT.csv')
data

Out[2]: Sky AirTemp Humidity Wind Water Forecast EnjoySport

0 Sunny Warm Normal Strong Warm Same 1

1 Sunny Warm High Strong Warm Same 1

2 Rainy Cold High Strong Warm Change 0

3 Sunny Warm High Strong Cool Change 1

In [4]: concept=np.array(data)[:,:-1]
target=np.array(data)[:,-1]

In [5]: for i,val in enumerate(target):


if val==1:
specific_h=concept[i].copy()
break

In [9]: for i,val in enumerate (concept):


if target[i]==1:
for x in range( len(specific_h)):
if val[x]!=specific_h[x]:
specific_h[x]='?'

In [10]: print(specific_h)

['Sunny' 'Warm' '?' 'Strong' '?' '?']
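Find-S generalises only on positive examples: any attribute whose value differs between positive examples is replaced by '?', which is why Humidity, Water and Forecast end up generalised while Sky, AirTemp and Wind stay specific.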

4. Assignment on the Candidate Elimination algorithm. Apply it
on the Enjoy Sport dataset to find the version space for it.
import numpy as np
import pandas as pd
data=pd.read_csv('ENJOYSPORT.csv')
data

Sky AirTemp Humidity Wind Water Forecast EnjoySport


0 Sunny Warm Normal Strong Warm Same 1
1 Sunny Warm High Strong Warm Same 1
2 Rainy Cold High Strong Warm Change 0
3 Sunny Warm High Strong Cool Change 1

concept=np.array(data)[:,:-1]
target=np.array(data)[:,-1]

for i,val in enumerate(target):


if val==1:
specific_h=concept[i].copy()
break
general_h=[['?' for i in range(len(specific_h))]for i in
range(len(specific_h))]
print('Initial specific hypothesis:',specific_h)
print('Initial general hypothesis:',general_h)

Initial specific hypothesis: ['Sunny' 'Warm' 'Normal' 'Strong' 'Warm'


'Same']
Initial general hypothesis: [['?', '?', '?', '?', '?', '?'], ['?',
'?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?',
'?', '?', '?']]

for i,val in enumerate(concept):


if target[i]==1:
for x in range(len(specific_h)):
if val[x]!=specific_h[x]:
specific_h[x]='?'
general_h[x][x]='?'
if target[i]==0:
for x in range(len(specific_h)):
if val[x]!=specific_h[x]:
general_h[x][x]=specific_h[x]
else:
general_h[x][x]='?'

print('step',i+1,'specific hypothesis=',specific_h)
print('step',i+1,'general hypothesis=',general_h)

step 1 specific hypothesis= ['Sunny' 'Warm' 'Normal' 'Strong' 'Warm'


'Same']
step 1 general hypothesis= [['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?',
'?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?',
'?', '?']]
step 2 specific hypothesis= ['Sunny' 'Warm' '?' 'Strong' 'Warm'
'Same']
step 2 general hypothesis= [['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?',
'?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?',
'?', '?']]
step 3 specific hypothesis= ['Sunny' 'Warm' '?' 'Strong' 'Warm'
'Same']
step 3 general hypothesis= [['Sunny', '?', '?', '?', '?', '?'], ['?',
'Warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?',
'?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', 'Same']]
step 4 specific hypothesis= ['Sunny' 'Warm' '?' 'Strong' '?' '?']
step 4 general hypothesis= [['Sunny', '?', '?', '?', '?', '?'], ['?',
'Warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?',
'?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', '?']]
indices=[]
for i,val in enumerate(general_h):
if val==['?','?','?','?','?','?']:
indices.append(i)
for i in indices:
general_h.remove(['?','?','?','?','?','?'])
print('Final specific hypothesis:',specific_h)
print('Final general hypothesis:',general_h)

Final specific hypothesis: ['Sunny' 'Warm' '?' 'Strong' '?' '?']


Final general hypothesis: [['Sunny', '?', '?', '?', '?', '?'], ['?',
'Warm', '?', '?', '?', '?']]

5. Assignment on Simple Regression. Build an application that
predicts salary from years of experience using single-variable
linear regression (use a dataset from Kaggle). Display the
coefficient and intercept. Also display the MSE and plot the
model on the testing data.
[1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

[2]: data=pd.read_csv('Salary_dataset.csv')

[3]: data

[3]: Unnamed: 0 YearsExperience Salary


0 0 1.2 39344.0
1 1 1.4 46206.0
2 2 1.6 37732.0
3 3 2.1 43526.0
4 4 2.3 39892.0
5 5 3.0 56643.0
6 6 3.1 60151.0
7 7 3.3 54446.0
8 8 3.3 64446.0
9 9 3.8 57190.0
10 10 4.0 63219.0
11 11 4.1 55795.0
12 12 4.1 56958.0
13 13 4.2 57082.0
14 14 4.6 61112.0
15 15 5.0 67939.0
16 16 5.2 66030.0
17 17 5.4 83089.0
18 18 6.0 81364.0
19 19 6.1 93941.0
20 20 6.9 91739.0

21 21 7.2 98274.0
22 22 8.0 101303.0
23 23 8.3 113813.0
24 24 8.8 109432.0
25 25 9.1 105583.0
26 26 9.6 116970.0
27 27 9.7 112636.0
28 28 10.4 122392.0
29 29 10.6 121873.0

[4]: x=data['YearsExperience'].values.reshape(-1,1)
y=data['Salary'].values.reshape(-1,1)

[5]: plt.scatter(x,y)
plt.xlabel('YearsExperience')
plt.ylabel('Salary')
plt.show()

[7]: from sklearn.model_selection import train_test_split


X_train,X_test,y_train,y_test=train_test_split(x,y)

[8]: from sklearn.linear_model import LinearRegression

[9]: model=LinearRegression()

[10]: model.fit(X_train,y_train)

[10]: LinearRegression()

[11]: pred=model.predict(X_test)

[12]: print(pred)

[[ 33808.97372662]
[ 75208.93931102]
[ 54508.95651882]
[117594.61836171]
[ 35780.40065921]
[ 62394.66424918]
[ 54508.95651882]
[ 59437.52385029]]
[13]: from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test,pred))

46671077.28879917
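As a sanity check, the same number can be computed directly from the residuals; a minimal sketch using the arrays above:

import numpy as np
# MSE is the mean of the squared differences between actual and predicted salaries
print(np.mean((y_test - pred)**2))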

[14]: print('Coefficient:',model.coef_)

Coefficient: [[9857.13466295]]

[18]: print('Intercept:',model.intercept_)

Intercept: [21980.41213108]

[20]: plt.scatter(X_train,y_train)
plt.plot(X_test,model.predict(X_test))
plt.show()

6. Assignment on Multiple Regression: Build an application that
predicts the price of a house using multiple-variable linear
regression (use the Housing dataset from Kaggle). Display all
the coefficients and the MSE.

[1]: import numpy as np


import pandas as pd

[3]: data=pd.read_csv('Housing.csv')
data

[3]: price area bedrooms bathrooms stories mainroad guestroom basement \


0 13300000 7420 4 2 3 yes no no
1 12250000 8960 4 4 4 yes no no
2 12250000 9960 3 2 2 yes no yes
3 12215000 7500 4 2 2 yes no yes
4 11410000 7420 4 1 2 yes yes yes
.. … … … … … … … …
540 1820000 3000 2 1 1 yes no yes
541 1767150 2400 3 1 1 no no no
542 1750000 3620 2 1 1 yes no no
543 1750000 2910 3 1 1 no no no
544 1750000 3850 3 1 2 yes no no

hotwaterheating airconditioning parking prefarea furnishingstatus


0 no yes 2 yes furnished
1 no yes 3 no furnished
2 no no 2 yes semi-furnished
3 no yes 3 yes furnished
4 no yes 2 no furnished
.. … … … … …
540 no no 2 no unfurnished
541 no no 0 no semi-furnished
542 no no 0 no unfurnished
543 no no 0 no furnished
544 no no 0 no unfurnished

[545 rows x 13 columns]

[4]: x=data[['area','bedrooms','bathrooms','stories','parking']].values
y=data['price']

[5]: from sklearn.model_selection import train_test_split

[6]: X_train,X_test,y_train,y_test=train_test_split(x,y)

[7]: from sklearn.linear_model import LinearRegression

[8]: model=LinearRegression()
model.fit(X_train,y_train)

[8]: LinearRegression()

[9]: pred=model.predict(X_test)

[10]: print('Coefficients:',model.coef_)

Coefficients: [3.47470179e+02 1.61382678e+05 1.03169790e+06 5.87966142e+05
 3.82066782e+05]
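Each coefficient is the model's predicted price change for a one-unit increase in that feature with the others held fixed: here roughly 347 per unit of area and about 1.03e6 per additional bathroom, for this particular train/test split.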

[11]: from sklearn.metrics import mean_squared_error

[12]: print('Mean squared error:',mean_squared_error(y_test,pred))

Mean squared error: 1102048313201.5125

7. Assignment on Binary Classification: Build an application to
decide whether to play tennis using a Decision Tree classifier.
Do the required data preprocessing. Display the accuracy score,
classification report and confusion matrix.

[17]: import numpy as np


import pandas as pd

[18]: data=pd.read_csv('PlayTennis.csv')
data

[18]: Outlook Temperature Humidity Wind Play Tennis


0 Sunny Hot High Weak No
1 Sunny Hot High Strong No
2 Overcast Hot High Weak Yes
3 Rain Mild High Weak Yes
4 Rain Cool Normal Weak Yes
5 Rain Cool Normal Strong No
6 Overcast Cool Normal Strong Yes
7 Sunny Mild High Weak No
8 Sunny Cool Normal Weak Yes
9 Rain Mild Normal Weak Yes
10 Sunny Mild Normal Strong Yes
11 Overcast Mild High Strong Yes
12 Overcast Hot Normal Weak Yes
13 Rain Mild High Strong No

[19]: dataset=pd.get_dummies(data)

[20]: dataset.astype(int)

[20]: Outlook_Overcast Outlook_Rain Outlook_Sunny Temperature_Cool \


0 0 0 1 0
1 0 0 1 0
2 1 0 0 0
3 0 1 0 0

4 0 1 0 1
5 0 1 0 1
6 1 0 0 1
7 0 0 1 0
8 0 0 1 1
9 0 1 0 0
10 0 0 1 0
11 1 0 0 0
12 1 0 0 0
13 0 1 0 0

Temperature_Hot Temperature_Mild Humidity_High Humidity_Normal \


0 1 0 1 0
1 1 0 1 0
2 1 0 1 0
3 0 1 1 0
4 0 0 0 1
5 0 0 0 1
6 0 0 0 1
7 0 1 1 0
8 0 0 0 1
9 0 1 0 1
10 0 1 0 1
11 0 1 1 0
12 1 0 0 1
13 0 1 1 0

Wind_Strong Wind_Weak Play Tennis_No Play Tennis_Yes


0 0 1 1 0
1 1 0 1 0
2 0 1 0 1
3 0 1 0 1
4 0 1 0 1
5 1 0 1 0
6 1 0 0 1
7 0 1 1 0
8 0 1 0 1
9 0 1 0 1
10 1 0 0 1
11 1 0 0 1
12 0 1 0 1
13 1 0 1 0

[21]: x=np.array(dataset)[:,1:10]

[22]: y=dataset['Play Tennis_No'].values

[23]: from sklearn.model_selection import train_test_split

[24]: X_train,X_test,y_train,y_test=train_test_split(x,y)

[25]: from sklearn.tree import DecisionTreeClassifier

[26]: model=DecisionTreeClassifier(criterion='entropy')

[27]: model.fit(X_train,y_train)

[27]: DecisionTreeClassifier(criterion='entropy')
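criterion='entropy' makes the tree pick splits by information gain, i.e. the reduction in the entropy H(S) = -Σ p·log2(p) of the class labels after each split.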

[28]: pred=model.predict(X_test)

[33]: from sklearn.metrics import classification_report,confusion_matrix,accuracy_score

[34]: print('Accuracy score:',accuracy_score(y_test,pred))
print('Classification report:',classification_report(y_test,pred))
print('Confusion matrix:',confusion_matrix(y_test,pred))

Accuracy score: 0.5


Classification report: precision recall f1-score support

False 0.67 0.67 0.67 3


True 0.00 0.00 0.00 1

accuracy 0.50 4
macro avg 0.33 0.33 0.33 4
weighted avg 0.50 0.50 0.50 4

Confusion matrix: [[2 1]


[1 0]]

8. Assignment on Binary Classification using Perceptron.
Implement the Perceptron model. Use this model to classify
whether a patient has cancer or not (use the Breast Cancer
dataset from sklearn). Display the accuracy score,
classification report and confusion matrix.
[16]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

[2]: from sklearn.datasets import load_breast_cancer

[3]: breast_cancer=load_breast_cancer()

[4]: x=breast_cancer.data

[5]: y=breast_cancer.target

[6]: data=pd.DataFrame(x,columns=breast_cancer.feature_names)

[7]: data['class']=y

[8]: data

[8]: mean radius mean texture mean perimeter mean area mean smoothness \
0 17.99 10.38 122.80 1001.0 0.11840
1 20.57 17.77 132.90 1326.0 0.08474
2 19.69 21.25 130.00 1203.0 0.10960
3 11.42 20.38 77.58 386.1 0.14250
4 20.29 14.34 135.10 1297.0 0.10030
.. … … … … …
564 21.56 22.39 142.00 1479.0 0.11100
565 20.13 28.25 131.20 1261.0 0.09780
566 16.60 28.08 108.30 858.1 0.08455
567 20.60 29.33 140.10 1265.0 0.11780
568 7.76 24.54 47.92 181.0 0.05263

mean compactness mean concavity mean concave points mean symmetry \
0 0.27760 0.30010 0.14710 0.2419
1 0.07864 0.08690 0.07017 0.1812
2 0.15990 0.19740 0.12790 0.2069
3 0.28390 0.24140 0.10520 0.2597
4 0.13280 0.19800 0.10430 0.1809
.. … … … …
564 0.11590 0.24390 0.13890 0.1726
565 0.10340 0.14400 0.09791 0.1752
566 0.10230 0.09251 0.05302 0.1590
567 0.27700 0.35140 0.15200 0.2397
568 0.04362 0.00000 0.00000 0.1587

mean fractal dimension … worst texture worst perimeter worst area \


0 0.07871 … 17.33 184.60 2019.0
1 0.05667 … 23.41 158.80 1956.0
2 0.05999 … 25.53 152.50 1709.0
3 0.09744 … 26.50 98.87 567.7
4 0.05883 … 16.67 152.20 1575.0
.. … … … … …
564 0.05623 … 26.40 166.10 2027.0
565 0.05533 … 38.25 155.00 1731.0
566 0.05648 … 34.12 126.70 1124.0
567 0.07016 … 39.42 184.60 1821.0
568 0.05884 … 30.37 59.16 268.6

worst smoothness worst compactness worst concavity \


0 0.16220 0.66560 0.7119
1 0.12380 0.18660 0.2416
2 0.14440 0.42450 0.4504
3 0.20980 0.86630 0.6869
4 0.13740 0.20500 0.4000
.. … … …
564 0.14100 0.21130 0.4107
565 0.11660 0.19220 0.3215
566 0.11390 0.30940 0.3403
567 0.16500 0.86810 0.9387
568 0.08996 0.06444 0.0000

worst concave points worst symmetry worst fractal dimension class


0 0.2654 0.4601 0.11890 0
1 0.1860 0.2750 0.08902 0
2 0.2430 0.3613 0.08758 0
3 0.2575 0.6638 0.17300 0
4 0.1625 0.2364 0.07678 0
.. … … … …

564 0.2216 0.2060 0.07115 0
565 0.1628 0.2572 0.06637 0
566 0.1418 0.2218 0.07820 0
567 0.2650 0.4087 0.12400 0
568 0.0000 0.2871 0.07039 1

[569 rows x 31 columns]

[9]: x=np.array(data)[:,:-1]
y=np.array(data)[:,-1]

[13]: from sklearn.model_selection import train_test_split


from sklearn.metrics import accuracy_score

[32]: X_train,X_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42,stratify=y)

[33]: class Perceptron:


    def __init__(self):
        self.w=None
        self.b=None
    def model(self,x):
        # fire when the weighted input reaches the threshold b
        return 1 if np.dot(self.w,x)>=self.b else 0
    def predict(self,X):
        Y=[]
        for x in X:
            result=self.model(x)
            Y.append(result)
        return np.array(Y)
    def fit(self,X,Y,epochs=1,lr=1):
        self.w=np.ones(X.shape[1])
        self.b=0
        wt_matrix=[]
        accuracy={}
        max_accuracy=0
        for i in range(epochs):
            for x,y in zip(X,Y):
                y_predict=self.model(x)
                if y==1 and y_predict==0:
                    self.w=self.w+lr*x
                    self.b=self.b-lr*1
                elif y==0 and y_predict==1:
                    self.w=self.w-lr*x
                    self.b=self.b+lr*1
            wt_matrix.append(self.w)
            accuracy[i]=accuracy_score(self.predict(X),Y)
            if accuracy[i]>max_accuracy:
                max_accuracy=accuracy[i]
                chkptw=self.w.copy()
                chkptb=self.b
        # keep the weights from the most accurate epoch (pocket-style checkpoint)
        self.w=chkptw
        self.b=chkptb
        plt.plot(accuracy.values())
        plt.ylim([0,1])
        return np.array(wt_matrix)

[34]: perceptron=Perceptron()

[35]: wt_matrix=perceptron.fit(X_train,y_train,10000,0.5)

[36]: plt.plot(wt_matrix[-1])

[36]: [<matplotlib.lines.Line2D at 0x1fa60adba10>]

[37]: pred=perceptron.predict(X_test)

[38]: from sklearn.metrics import accuracy_score

[39]: print(accuracy_score(y_test,pred))

0.9122807017543859
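The assignment also asks for the classification report and confusion matrix; a minimal sketch using the predictions above:

from sklearn.metrics import classification_report, confusion_matrix
# per-class precision/recall and the error counts on the test set
print(classification_report(y_test, pred))
print(confusion_matrix(y_test, pred))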

9. Assignment on Multiclass Classification using MLP
(Multilayer Perceptron). Build an application to classify a
given iris flower into its species using MLP (use the Iris
dataset from Kaggle / sklearn). Display the accuracy score,
classification report and confusion matrix.
[1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

[3]: iris=load_iris()

[4]: x=iris.data

[5]: y=iris.target

[6]: data=pd.DataFrame(x,columns=iris.feature_names)

[7]: data['class']=y

[8]: data

[8]: sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
.. … … … …
145 6.7 3.0 5.2 2.3
146 6.3 2.5 5.0 1.9
147 6.5 3.0 5.2 2.0
148 6.2 3.4 5.4 2.3
149 5.9 3.0 5.1 1.8

class

0 0
1 0
2 0
3 0
4 0
.. …
145 2
146 2
147 2
148 2
149 2

[150 rows x 5 columns]

[9]: x=np.array(data)[:,:-1]
y=np.array(data)[:,-1]

[10]: from sklearn.model_selection import train_test_split

[12]: X_train,X_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=1)

[14]: from sklearn.neural_network import MLPClassifier

[18]: model=MLPClassifier(hidden_layer_sizes=(10,10,10),max_iter=1000)
model.fit(X_train,y_train)

[18]: MLPClassifier(hidden_layer_sizes=(10, 10, 10), max_iter=1000)
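hidden_layer_sizes=(10,10,10) builds three hidden layers of ten neurons each, and max_iter=1000 caps the number of training iterations so the optimiser has room to converge.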

[19]: pred=model.predict(X_test)

[20]: from sklearn.metrics import accuracy_score,confusion_matrix,classification_report

[21]: print('Accuracy score:',accuracy_score(y_test,pred))
print('Classification report:',classification_report(y_test,pred))
print('Confusion matrix:',confusion_matrix(y_test,pred))

Accuracy score: 1.0
Classification report: precision recall f1-score support

0.0 1.00 1.00 1.00 14


1.0 1.00 1.00 1.00 18
2.0 1.00 1.00 1.00 13

accuracy 1.00 45
macro avg 1.00 1.00 1.00 45
weighted avg 1.00 1.00 1.00 45

Confusion matrix: [[14 0 0]
[ 0 18 0]
[ 0 0 13]]

10. Assignment on Regression using KNN. Build an application
that predicts salary based on years of experience using KNN
(use the salary dataset from Kaggle). Display the MSE.
[1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

[2]: data=pd.read_csv('Salary_dataset.csv')
data

[2]: Unnamed: 0 YearsExperience Salary


0 0 1.2 39344.0
1 1 1.4 46206.0
2 2 1.6 37732.0
3 3 2.1 43526.0
4 4 2.3 39892.0
5 5 3.0 56643.0
6 6 3.1 60151.0
7 7 3.3 54446.0
8 8 3.3 64446.0
9 9 3.8 57190.0
10 10 4.0 63219.0
11 11 4.1 55795.0
12 12 4.1 56958.0
13 13 4.2 57082.0
14 14 4.6 61112.0
15 15 5.0 67939.0
16 16 5.2 66030.0
17 17 5.4 83089.0
18 18 6.0 81364.0
19 19 6.1 93941.0
20 20 6.9 91739.0
21 21 7.2 98274.0
22 22 8.0 101303.0
23 23 8.3 113813.0

24 24 8.8 109432.0
25 25 9.1 105583.0
26 26 9.6 116970.0
27 27 9.7 112636.0
28 28 10.4 122392.0
29 29 10.6 121873.0

[5]: x=data['YearsExperience'].values.reshape(-1,1)
y=data['Salary'].values.reshape(-1,1)

[6]: from sklearn.model_selection import train_test_split

[7]: plt.scatter(x,y)
plt.xlabel('Experience')
plt.ylabel('Salary')
plt.show()

[8]: X_train,X_test,y_train,y_test=train_test_split(x,y)

[10]: from sklearn.neighbors import KNeighborsRegressor

[19]: model=KNeighborsRegressor(n_neighbors=3)
model.fit(X_train,y_train)

[19]: KNeighborsRegressor(n_neighbors=3)

[20]: pred=model.predict(X_test)

[21]: from sklearn.metrics import mean_squared_error

[22]: print('Mean squared error:',mean_squared_error(y_test,pred))

Mean squared error: 37878589.34722221
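A KNN regressor predicts the mean target of the k nearest training points; a hypothetical sanity check for the first test point (assuming no distance ties):

import numpy as np
d = np.abs(X_train - X_test[0]).ravel()   # distances on the single feature
nearest = np.argsort(d)[:3]               # indices of the 3 closest neighbours
# the manual average should agree with the model's prediction
print(y_train[nearest].mean(), model.predict(X_test[:1]))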

11. Assignment on Classification using KNN. Build an
application to classify an iris flower into its species using
KNN (use the Iris dataset from sklearn). Display the accuracy
score, classification report and confusion matrix.

[1]: import numpy as np


import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

[2]: iris=load_iris()

[3]: x=iris.data

[4]: y=iris.target

[5]: data=pd.DataFrame(x,columns=iris.feature_names)

[6]: data['class']=y

[7]: data

[7]: sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
.. … … … …
145 6.7 3.0 5.2 2.3
146 6.3 2.5 5.0 1.9
147 6.5 3.0 5.2 2.0
148 6.2 3.4 5.4 2.3
149 5.9 3.0 5.1 1.8

class

0 0
1 0
2 0
3 0
4 0
.. …
145 2
146 2
147 2
148 2
149 2

[150 rows x 5 columns]

[8]: x=np.array(data)[:,:-1]
y=np.array(data)[:,-1]

[9]: from sklearn.model_selection import train_test_split

[10]: X_train,X_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=1)

[12]: from sklearn.neighbors import KNeighborsClassifier

[14]: model=KNeighborsClassifier(n_neighbors=7)
model.fit(X_train,y_train)

[14]: KNeighborsClassifier(n_neighbors=7)

[15]: pred=model.predict(X_test)

[16]: from sklearn.metrics import classification_report,confusion_matrix,accuracy_score

[17]: print('Accuracy score:',accuracy_score(y_test,pred))


print('Classification report:',classification_report(y_test,pred))
print('Confusion matrix:',confusion_matrix(y_test,pred))

Accuracy score: 0.9777777777777777


Classification report: precision recall f1-score support

0.0 1.00 1.00 1.00 14


1.0 1.00 0.94 0.97 18
2.0 0.93 1.00 0.96 13

accuracy 0.98 45
macro avg 0.98 0.98 0.98 45
weighted avg 0.98 0.98 0.98 45

Confusion matrix: [[14 0 0]
[ 0 17 1]
[ 0 0 13]]

12. Assignment on Naive Bayes Classifier. Build an
application to classify a given text using a Naive Bayes
classifier. Use data from sklearn. Display the accuracy
score, classification report and confusion matrix.
In [16]: import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
from sklearn.datasets import fetch_20newsgroups

In [8]: data = fetch_20newsgroups()


data.target_names

Out[8]:
['alt.atheism',
'comp.graphics',
'comp.os.ms-windows.misc',
'comp.sys.ibm.pc.hardware',
'comp.sys.mac.hardware',
'comp.windows.x',
'misc.forsale',
'rec.autos',
'rec.motorcycles',
'rec.sport.baseball',
'rec.sport.hockey',
'sci.crypt',
'sci.electronics',
'sci.med',
'sci.space',
'soc.religion.christian',
'talk.politics.guns',
'talk.politics.mideast',
'talk.politics.misc',
'talk.religion.misc']

In [17]: categories = ['talk.religion.misc', 'soc.religion.christian',


'sci.space', 'comp.graphics']
train = fetch_20newsgroups(subset='train', categories=categories)
test = fetch_20newsgroups(subset='test', categories=categories)

In [18]: print(train.data[5])

From: dmcgee@uluhe.soest.hawaii.edu (Don McGee)


Subject: Federal Hearing
Originator: dmcgee@uluhe
Organization: School of Ocean and Earth Science and Technology
Distribution: usa
Lines: 10

Fact or rumor....? Madalyn Murray O'Hare an atheist who eliminated the


use of the bible reading and prayer in public schools 15 years ago is now
going to appear before the FCC with a petition to stop the reading of the
Gospel on the airways of America. And she is also campaigning to remove
Christmas programs, songs, etc from the public schools. If it is true
then mail to Federal Communications Commission 1919 H Street Washington DC
20054 expressing your opposition to her request. Reference Petition number

2493.
In [11]: from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

model = make_pipeline(TfidfVectorizer(), MultinomialNB())


model.fit(train.data, train.target)
labels = model.predict(test.data)
from sklearn.metrics import confusion_matrix
mat = confusion_matrix(test.target, labels)
sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False,
xticklabels=train.target_names, yticklabels=train.target_names)
plt.xlabel('true label')
plt.ylabel('predicted label');

In [12]: mat

Out[12]:
array([[344,  13,  32,   0],
       [  6, 364,  24,   0],
       [  1,   5, 392,   0],
       [  4,  12, 187,  48]], dtype=int64)
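Most of the error mass sits in the last row: 187 talk.religion.misc posts are predicted as soc.religion.christian, which is unsurprising given how much vocabulary the two categories share.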

In [13]: def predict_category(s, train=train, model=model):


pred = model.predict([s])
return train.target_names[pred[0]]
predict_category('sending a payload to the ISS')

Out[13]: 'sci.space'

In [14]: predict_category('discussing islam vs atheism')


Out[14]: 'soc.religion.christian'

In [15]: predict_category('determining the screen resolution')


Out[15]: 'comp.graphics'

13. Assignment on K-means clustering. Apply K-means
clustering on the Income dataset to form 3 clusters and
display these clusters using a scatter graph.
In [1]: from sklearn.cluster import KMeans
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from matplotlib import pyplot as plt
%matplotlib inline

In [2]: df = pd.read_csv("income.csv")
df.head()

Out[2]: Name Age Income($)

0 Rob 27 70000

1 Michael 29 90000

2 Mohan 29 61000

3 Ismail 28 60000

4 Kory 42 150000

In [3]: plt.scatter(df.Age,df['Income($)'])
plt.xlabel('Age')
plt.ylabel('Income($)')

Out[3]: Text(0,0.5,'Income($)')

In [4]: km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age','Income($)']])
y_predicted

Out[4]: array([0, 0, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 2])

In [5]: df['cluster']=y_predicted
df.head()

Out[5]: Name Age Income($) cluster


0 Rob 27 70000 0
1 Michael 29 90000 0

2 Mohan 29 61000 2

3 Ismail 28 60000 2

4 Kory 42 150000 1

In [6]: km.cluster_centers_

Out[6]: array([[3.40000000e+01, 8.05000000e+04],
       [3.82857143e+01, 1.50000000e+05],
       [3.29090909e+01, 5.61363636e+04]])

In [7]: df1 = df[df.cluster==0]


df2 = df[df.cluster==1]
df3 = df[df.cluster==2]
plt.scatter(df1.Age,df1['Income($)'],color='green')
plt.scatter(df2.Age,df2['Income($)'],color='red')
plt.scatter(df3.Age,df3['Income($)'],color='black')
plt.scatter(km.cluster_centers_[:,0],km.cluster_centers_[:,1],color='purple',marker='*',label='centroid')
plt.xlabel('Age')
plt.ylabel('Income ($)')
plt.legend()

Out[7]: <matplotlib.legend.Legend at 0x1ba914f7cc0>

In [8]: scaler = MinMaxScaler()


scaler.fit(df[['Income($)']])
df['Income($)'] = scaler.transform(df[['Income($)']])
scaler.fit(df[['Age']])
df['Age'] = scaler.transform(df[['Age']])
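Scaling matters here because K-means relies on Euclidean distance: without it, Income (tens of thousands) dwarfs Age (tens), so the unscaled clusters above split almost entirely on income.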

In [9]: df.head()

Out[9]: Name Age Income($) cluster

0 Rob 0.058824 0.213675 0

1 Michael 0.176471 0.384615 0

2 Mohan 0.176471 0.136752 2

3 Ismail 0.117647 0.128205 2

4 Kory 0.941176 0.897436 1

In [10]: plt.scatter(df.Age,df['Income($)'])
Out[10]: <matplotlib.collections.PathCollection at 0x1ba91605a58>

In [11]: km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age','Income($)']])
y_predicted

Out[11]: array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2])

In [12]: df['cluster']=y_predicted
df.head()

Out[12]: Name Age Income($) cluster

0 Rob 0.058824 0.213675 0

1 Michael 0.176471 0.384615 0

2 Mohan 0.176471 0.136752 0

3 Ismail 0.117647 0.128205 0

4 Kory 0.941176 0.897436 1

In [13]: km.cluster_centers_

Out[13]: array([[0.1372549 , 0.11633428],
       [0.72268908, 0.8974359 ],
       [0.85294118, 0.2022792 ]])

In [15]: df1 = df[df.cluster==0]


df2 = df[df.cluster==1]
df3 = df[df.cluster==2]
plt.scatter(df1.Age,df1['Income($)'],color='green')
plt.scatter(df2.Age,df2['Income($)'],color='red')
plt.scatter(df3.Age,df3['Income($)'],color='black')
plt.scatter(km.cluster_centers_[:,0],km.cluster_centers_[:,1],color='purple',marker='*',label='centroid')
plt.legend()

Out[15]: <matplotlib.legend.Legend at 0x1ba9166db00>

In [16]: sse = []
k_rng = range(1,10)
for k in k_rng:
km = KMeans(n_clusters=k)
km.fit(df[['Age','Income($)']])
sse.append(km.inertia_)

In [17]: sse

Out[17]:
[5.434011511988179,
2.091136388699078,
0.4750783498553095,
0.3491047094419565,
0.2755825568722977,
0.22443334487241418,
0.16869711728567788,
0.13265419827245162,
0.10497488680620906]


In [19]: plt.xlabel('K')
plt.ylabel('Sum of squared error')
plt.plot(k_rng,sse)

Out[19]: [<matplotlib.lines.Line2D at 0x1ba916ddeb8>]
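The SSE falls steeply up to k=3 (from 2.09 at k=2 to 0.48 at k=3) and flattens afterwards, so the elbow confirms 3 as a sensible number of clusters.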

14. Assignment on hierarchical clustering. Apply it on the
Mall Customers dataset to form 5 clusters and display these
clusters using a scatter graph.
In [1]: import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]: ourData = pd.read_csv('Mall_Customers.csv')


ourData.head()

Out[2]: CustomerID Gender Age Annual Income (k$) Spending Score (1-100)

0 1 Male 19 15 39

1 2 Male 21 15 81

2 3 Female 20 16 6

3 4 Female 23 16 77

4 5 Female 31 17 40

In [3]: newData = ourData.iloc[:, [3, 4]].values

In [4]: import scipy.cluster.hierarchy as sch


dendrogram = sch.dendrogram(sch.linkage(newData, method = 'ward'))
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Euclidean distances')
plt.show()
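A common heuristic is to cut the dendrogram across its tallest vertical gaps; doing so here leaves five branches, which motivates n_clusters=5 below.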

In [5]: from sklearn.cluster import AgglomerativeClustering


Agg_hc = AgglomerativeClustering(n_clusters = 5, affinity = 'euclidean', linkage = 'ward')
y_hc = Agg_hc.fit_predict(newData)

In [10]: # plotting cluster 1


plt.scatter(newData[y_hc == 0, 0], newData[y_hc == 0, 1], s = 100, c = 'red', label = 'Cluster 1')

Out[10]: <matplotlib.collections.PathCollection at 0x1a4d7225320>
In [11]:
plt.scatter(newData[y_hc == 0, 0], newData[y_hc == 0, 1], s = 100, c = 'red', label = 'Cluster 1')
plt.scatter(newData[y_hc == 1, 0], newData[y_hc == 1, 1], s = 100, c = 'blue', label = 'Cluster 2')
plt.scatter(newData[y_hc == 2, 0], newData[y_hc == 2, 1], s = 100, c = 'green', label = 'Cluster 3')
plt.scatter(newData[y_hc == 3, 0], newData[y_hc == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
plt.scatter(newData[y_hc == 4, 0], newData[y_hc == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')

plt.title('Clusters of customers')

plt.xlabel('Annual Income (k$)')

plt.ylabel('Spending Score (1-100)')

plt.legend()

plt.show()

15. Assignment on dimensionality reduction. Apply principal
component analysis (PCA) on the Iris dataset to reduce its
dimensionality to three principal components, and display the
data before and after reduction using scatter graphs.
In [2]:

import pandas as pd
import matplotlib.pyplot as plt

# load dataset into Pandas DataFrame


df = pd.read_csv("C:/Users/abhis/Desktop/SSY/deeplearning part1/iris-
flower-dataset/IRIS.csv")
df.head()

Out[2]:

sepal_length sepal_width petal_length petal_width species

0 5.1 3.5 1.4 0.2 Iris-setosa

1 4.9 3.0 1.4 0.2 Iris-setosa

2 4.7 3.2 1.3 0.2 Iris-setosa

3 4.6 3.1 1.5 0.2 Iris-setosa

4 5.0 3.6 1.4 0.2 Iris-setosa

In [3]:

fig = plt.figure(figsize = (8,8))


sepal = fig.add_subplot(1,1,1)
sepal.set_xlabel('sepal_length', fontsize = 15)
sepal.set_ylabel('sepal_width', fontsize = 15)
sepal.set_title('Original Data', fontsize = 20)
targets = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
colors = ['r', 'g', 'b']
for target, color in zip(targets,colors):
indicesToKeep = df['species'] == target
sepal.scatter(df.loc[indicesToKeep, 'sepal_length']
, df.loc[indicesToKeep, 'sepal_width']
, c = color
, s = 50)
sepal.legend(targets)
sepal.grid()

In [6]:

fig = plt.figure(figsize = (8,8))


petal = fig.add_subplot(1,1,1)
petal.set_xlabel('petal_length', fontsize = 15)
petal.set_ylabel('petal_width', fontsize = 15)
petal.set_title('Original Data', fontsize = 20)
targets = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
colors = ['r', 'g', 'b']
for target, color in zip(targets,colors):
indicesToKeep = df['species'] == target
petal.scatter(df.loc[indicesToKeep, 'petal_length']
, df.loc[indicesToKeep, 'petal_width']
, c = color
, s = 50)
petal.legend(targets)
petal.grid()

In [2]:

from sklearn.preprocessing import StandardScaler


features = ['sepal_length','sepal_width','petal_length','petal_width']
# Separating out the features
x = df.loc[:, features].values
# Separating out the target
y = df.loc[:,['species']].values
# Standardizing the features
x = StandardScaler().fit_transform(x)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3,
random_state=101)

In [3]:

from sklearn.decomposition import PCA


pca = PCA(n_components=3)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents,
columns = ['principal component 1', 'principal component 2', 'principal component 3'])

In [4]:
finalDf = pd.concat([principalDf, df[['species']]], axis = 1)

In [5]:

finalDf.head()

Out[5]:

principal component 1 principal component 2 principal component 3 species

0 -2.264542 0.505704 -0.121943 Iris-setosa

1 -2.086426 -0.655405 -0.227251 Iris-setosa

2 -2.367950 -0.318477 0.051480 Iris-setosa

3 -2.304197 -0.575368 0.098860 Iris-setosa

4 -2.388777 0.674767 0.021428 Iris-setosa

In [8]:

import matplotlib.pyplot as plt


fig = plt.figure(figsize = (8,8))
ax = fig.add_subplot(1,1,1)
ax.set_xlabel('Principal Component 1', fontsize = 15)
ax.set_ylabel('Principal Component 2', fontsize = 15)
ax.set_title('2 component PCA', fontsize = 20)
targets = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
colors = ['r', 'g', 'b']
for target, color in zip(targets,colors):
indicesToKeep = finalDf['species'] == target
ax.scatter(finalDf.loc[indicesToKeep, 'principal component 1']
, finalDf.loc[indicesToKeep, 'principal component 2']
, c = color
, s = 50)
ax.legend(targets)
ax.grid()

In [6]:

pca.explained_variance_ratio_

Out[6]:

array([0.72770452, 0.23030523, 0.03683832])
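The cumulative share of variance retained is just the running sum of these ratios; a minimal check:

import numpy as np
# variance retained by the first 1, 2 and 3 components (~0.728, 0.958, 0.995)
print(np.cumsum(pca.explained_variance_ratio_))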

In [7]:

from sklearn.ensemble import RandomForestClassifier

In [8]:

#using original data


model = RandomForestClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

C:\Users\abhis\Anaconda3\lib\site-
packages\sklearn\ensemble\forest.py:246: FutureWarning: The default
value of n_estimators will change from 10 in version 0.20 to 100 in
0.22.
"10 in version 0.20 to 100 in 0.22.", FutureWarning)
C:\Users\abhis\Anaconda3\lib\site-packages\ipykernel_launcher.py:3:
DataConversionWarning: A column-vector y was passed when a 1d array was
expected. Please change the shape of y to (n_samples,), for example
using ravel().
This is separate from the ipykernel package so we can avoid doing
imports until

In [9]:

predictions

Out[9]:

array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-virginica',


'Iris-versicolor', 'Iris-virginica', 'Iris-versicolor',
'Iris-versicolor', 'Iris-virginica', 'Iris-setosa',
'Iris-virginica', 'Iris-setosa', 'Iris-setosa', 'Iris-virginica',
'Iris-virginica', 'Iris-versicolor', 'Iris-versicolor',
'Iris-versicolor', 'Iris-setosa', 'Iris-versicolor',
'Iris-versicolor', 'Iris-setosa', 'Iris-versicolor',
'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
'Iris-versicolor', 'Iris-virginica', 'Iris-setosa', 'Iris-setosa',
'Iris-virginica', 'Iris-versicolor', 'Iris-virginica',
'Iris-versicolor', 'Iris-virginica', 'Iris-versicolor',
'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
'Iris-virginica', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
'Iris-versicolor', 'Iris-versicolor'], dtype=object)

In [10]:

from sklearn.metrics import accuracy_score


accuracy_score(y_test, predictions)

Out[10]:

0.9777777777777777

In [11]:

# Separating out the features


x = finalDf.drop(["species"], axis = 1)
x = StandardScaler().fit_transform(x)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3,
random_state=101)

In [12]:
#using PCA
model = RandomForestClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

C:\Users\abhis\Anaconda3\lib\site-
packages\sklearn\ensemble\forest.py:246: FutureWarning: The default
value of n_estimators will change from 10 in version 0.20 to 100 in
0.22.
"10 in version 0.20 to 100 in 0.22.", FutureWarning)
C:\Users\abhis\Anaconda3\lib\site-packages\ipykernel_launcher.py:3:
DataConversionWarning: A column-vector y was passed when a 1d array was
expected. Please change the shape of y to (n_samples,), for example
using ravel().
This is separate from the ipykernel package so we can avoid doing
imports until

In [13]:

predictions

Out[13]:

array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-virginica',


'Iris-virginica', 'Iris-virginica', 'Iris-versicolor',
'Iris-versicolor', 'Iris-versicolor', 'Iris-setosa',
'Iris-virginica', 'Iris-setosa', 'Iris-setosa', 'Iris-virginica',
'Iris-virginica', 'Iris-versicolor', 'Iris-versicolor',
'Iris-versicolor', 'Iris-setosa', 'Iris-versicolor',
'Iris-versicolor', 'Iris-setosa', 'Iris-versicolor',
'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
'Iris-versicolor', 'Iris-virginica', 'Iris-setosa', 'Iris-setosa',
'Iris-virginica', 'Iris-versicolor', 'Iris-virginica',
'Iris-versicolor', 'Iris-virginica', 'Iris-versicolor',
'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
'Iris-virginica', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
'Iris-virginica', 'Iris-versicolor'], dtype=object)

In [14]:

accuracy_score(y_test, predictions)

Out[14]:

0.9111111111111111
