ML PROGRAMS
[Outputs of the introductory NumPy cells; the code cells themselves were lost in extraction. The visible results correspond to: a random integer vector and a 10x10 random integer matrix, a reversed array [4 3 2 1], np.ones((3,3)) and np.zeros((3,3)), np.arange sequences (1-8, the even and the odd numbers below 20), a sample from np.random.randn, element-wise squares [1 4 9 16 25], a broadcast [2 2 2], and a 2x3 matrix with a scalar reduction (22) and its transpose.]
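The code cells behind these outputs did not survive; the following is a minimal, hypothetical reconstruction of the kind of NumPy warm-up cells that produce outputs like those above (all names and arguments are assumptions):

import numpy as np

print(np.random.randint(0, 100, 20))        # random integer vector
print(np.random.randint(0, 100, (10, 10)))  # 10x10 random integer matrix
y = np.array([1, 2, 3, 4])[::-1]            # reversed array -> [4 3 2 1]
print(y)
print(np.ones((3, 3)))                      # 3x3 matrix of ones
print(np.zeros((3, 3)))                     # 3x3 matrix of zeros
print(np.arange(1, 9))                      # [1 2 3 4 5 6 7 8]
print(np.arange(0, 20, 2))                  # even numbers below 20
print(np.arange(1, 20, 2))                  # odd numbers below 20
print(np.random.randn())                    # one standard-normal sample
print(np.array([1, 2, 3, 4, 5]) ** 2)       # element-wise squares
m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.T)                                  # transpose -> [[1 4] [2 5] [3 6]]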
Out[19]: W X Y Z
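The Out[19] header above belongs to a lost cell that created df. The values that follow match the common classroom example, which is presumably what was here (the seed and construction are assumptions):

import numpy as np
import pandas as pd

np.random.seed(101)  # assumed seed, consistent with the values printed below
df = pd.DataFrame(np.random.randn(5, 4), index='A B C D E'.split(), columns='W X Y Z'.split())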
In [20]: df[['W','Z']]
Out[20]: W Z
A 2.706850 0.503826
B 0.651118 0.605965
C -2.018168 -0.589001
D 0.188695 0.955057
E 0.190794 0.683509
In [21]: df.W
Out[21]:
A 2.706850
B 0.651118
C -2.018168
D 0.188695
E 0.190794
Name: W, dtype: float64
In [22]: type(df['W'])
Out[22]: pandas.core.series.Series
In [23]: df['new']=df['W']+df['Y']
In [24]: df
Out[24]: W X Y Z new
In [25]: df.drop('new',axis=1)
Out[25]: W X Y Z
In [26]: df
Out[26]: W X Y Z new
In [27]: df.drop('new',axis=1,inplace=True)
In [28]: df
Out[28]: W X Y Z
In [29]: df.drop('E')
Out[29]: W X Y Z
In [30]: df.drop('E',axis=0)
Out[30]: W X Y Z
In [31]: df.loc['A']
Out[31]:
W 2.706850
X 0.628133
Y 0.907969
Z 0.503826
Name: A, dtype: float64
In [32]: df.iloc[2]
Out[32]:
W -2.018168
X 0.740122
Y 0.528813
Z -0.589001
Name: C, dtype: float64
In [33]: df.loc['B','Y']
Out[33]: -0.8480769834036315
In [34]: df.loc[['A','B'],['W','Y']]
Out[34]: W Y
A 2.706850 0.907969
B 0.651118 -0.848077
In [35]: df
Out[35]: W X Y Z
In [36]: df>0
Out[36]: W X Y Z
In [37]: df[df>0]
Out[37]: W X Y Z
In [38]: df[df['W']>0]
Out[38]: W X Y Z
In [39]: df
Out[39]: W X Y Z
In [40]: df[df['W']>0]['Y']
Out[40]:
A 0.907969
B -0.848077
D -0.933237
E 2.605967
Name: Y, dtype: float64
In [41]: df[(df['W']>0)&(df['Y']>1)]
Out[41]: W X Y Z
In [42]: df
Out[42]: W X Y Z
In [43]: df.reset_index()
Out[43]: index W X Y Z
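The cell In [44] defining newind was lost; presumably a list of five new row labels, e.g. (hypothetical values):

In [44]: newind = ['CA', 'NY', 'WY', 'OR', 'CO']  # hypothetical labels, one per row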
In [45]: df['States']=newind
In [46]: df
Out[46]: W X Y Z States
In [47]: df.set_index('States')
Out[47]: W X Y Z
States
In [48]: df
Out[48]: W X Y Z States
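The cells that built hier_index were lost; the MultiIndex shown in Out[50] is presumably constructed along these lines:

outside = ['G1', 'G1', 'G1', 'G2', 'G2', 'G2']
inside = [1, 2, 3, 1, 2, 3]
hier_index = pd.MultiIndex.from_tuples(list(zip(outside, inside)))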
In [50]: hier_index
Out[50]:
MultiIndex([('G1', 1),
('G1', 2),
('G1', 3),
('G2', 1),
('G2', 2),
('G2', 3)],
)
In [51]: df=pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=['A','B'])
df
Out[51]: A B
G1 1 0.302665 1.693723
2 -1.706086 -1.159119
3 -0.134841 0.390528
G2 1 0.166905 0.184502
2 0.807706 0.072960
3 0.638787 0.329646
In [52]: df.loc['G1']
Out[52]: A B
1 0.302665 1.693723
2 -1.706086 -1.159119
3 -0.134841 0.390528
In [53]: df.loc['G1'].loc[1]
Out[53]:
A 0.302665
B 1.693723
Name: 1, dtype: float64
In [54]: df.index.names
Out[54]: FrozenList([None, None])
In [55]: df.index.names=['Group','Num']
In [56]: df
Out[56]: A B
Group Num
G1 1 0.302665 1.693723
2 -1.706086 -1.159119
3 -0.134841 0.390528
G2 1 0.166905 0.184502
2 0.807706 0.072960
3 0.638787 0.329646
In [58]: df.xs(1,level='Num')
Out[58]: A B
Group
G1 0.302665 1.693723
G2 0.166905 0.184502
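Here the notebook moves on to exploring a salary dataset. The cell In [59] that loaded data_frame was lost; presumably a read_csv of a Kaggle salary dataset (the filename is hypothetical):

In [59]: data_frame = pd.read_csv('Salary Data.csv')  # hypothetical filename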
Out[59]: Age Gender Education Level Job Title Years of Experience Salary
In [60]: data_frame.head()
Out[60]: Age Gender Education Level Job Title Years of Experience Salary
In [61]: data_frame.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6704 entries, 0 to 6703
Data columns (total 6 columns):
# Column Non-Null Count Dtype
0 Age 6702 non-null float64
1 Gender 6702 non-null object
2 Education Level 6701 non-null object
3 Job Title 6702 non-null object
4 Years of Experience 6701 non-null float64
5 Salary 6699 non-null float64
dtypes: float64(3), object(3)
memory usage: 314.4+ KB
3. Assignment on the FIND-S Algorithm. Apply it on the 'Enjoy Sport' dataset to find the most specific hypothesis for it.
In [1]: import pandas as pd
import numpy as np
In [2]: data=pd.read_csv('ENJOYSPORT.csv')
data
In [4]: concept=np.array(data)[:,:-1]
target=np.array(data)[:,-1]
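The cells implementing the algorithm (In [5] through In [9]) were lost; a minimal sketch of FIND-S consistent with the variables above, assuming the positive class label in the last column is 'yes':

# FIND-S: start from the first positive example and generalize on mismatches
specific_h = None
for i, example in enumerate(concept):
    if target[i] == 'yes':                 # assumed positive label
        if specific_h is None:
            specific_h = example.copy()
        else:
            for j in range(len(specific_h)):
                if specific_h[j] != example[j]:
                    specific_h[j] = '?'    # generalize the mismatched attribute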
In [10]: print(specific_h)
4. Assignment on the Candidate Elimination Algorithm. Apply it on the 'Enjoy Sport' dataset to find the Version Space for it.
import numpy as np
import pandas as pd
data=pd.read_csv('ENJOYSPORT.csv')
data
concept=np.array(data)[:,:-1]
target=np.array(data)[:,-1]
print('step',i+1,'specific hypothesis=',specific_h)
print('step',i+1,'general hypothesis=',general_h)
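The loop that produced these per-step prints was lost; a sketch of the standard classroom implementation they belong to (the 'yes' label and the first-example initialization are assumptions):

# Candidate Elimination: maintain the specific (S) and general (G) boundaries
specific_h = concept[0].copy()             # assumes the first example is positive
general_h = [['?' for _ in range(len(specific_h))] for _ in range(len(specific_h))]
for i, example in enumerate(concept):
    if target[i] == 'yes':                 # positive example: generalize S, prune G
        for j in range(len(specific_h)):
            if example[j] != specific_h[j]:
                specific_h[j] = '?'
                general_h[j][j] = '?'
    else:                                  # negative example: specialize G against S
        for j in range(len(specific_h)):
            if example[j] != specific_h[j]:
                general_h[j][j] = specific_h[j]
            else:
                general_h[j][j] = '?'
    print('step', i+1, 'specific hypothesis=', specific_h)
    print('step', i+1, 'general hypothesis=', general_h)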
5. Assignment on Simple Regression. Build an application that predicts salary from years of experience using single-variable Linear Regression (use the Salary dataset from Kaggle). Display the coefficient and intercept, display the MSE, and plot the model on the testing data.
[1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
[2]: data=pd.read_csv('Salary_dataset.csv')
[3]: data
21 21 7.2 98274.0
22 22 8.0 101303.0
23 23 8.3 113813.0
24 24 8.8 109432.0
25 25 9.1 105583.0
26 26 9.6 116970.0
27 27 9.7 112636.0
28 28 10.4 122392.0
29 29 10.6 121873.0
[4]: x=data['YearsExperience'].values.reshape(-1,1)
y=data['Salary'].values.reshape(-1,1)
[5]: plt.scatter(x,y)
plt.xlabel('YearsExperience')
plt.ylabel('Salary')
plt.show()
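The cell that split the data was lost (X_train and X_test appear below); presumably the standard split, with any test_size or random_state arguments unknown:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y)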
[8]: from sklearn.linear_model import LinearRegression
[9]: model=LinearRegression()
[10]: model.fit(X_train,y_train)
[10]: LinearRegression()
[11]: pred=model.predict(X_test)
[12]: print(pred)
[[ 33808.97372662]
[ 75208.93931102]
[ 54508.95651882]
[117594.61836171]
[ 35780.40065921]
[ 62394.66424918]
[ 54508.95651882]
[ 59437.52385029]]
[13]: from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test,pred))
46671077.28879917
[14]: print('Coefficient:',model.coef_)
Coefficient: [[9857.13466295]]
[18]: print('Intercept:',model.intercept_)
Intercept: [21980.41213108]
[20]: plt.scatter(X_train,y_train)
plt.plot(X_test,model.predict(X_test))
plt.show()
6. Assignment on Multiple Regression: Build an application that predicts the price of a house using multiple-variable Linear Regression (use the Housing dataset from Kaggle). Display all the coefficients and the MSE.
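The import cells of this section were lost; the code below uses pandas, train_test_split, and LinearRegression, so presumably:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression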
[3]: data=pd.read_csv('Housing.csv')
data
[545 rows x 13 columns]
[4]: x=data[['area','bedrooms','bathrooms','stories','parking']].values
y=data['price']
[6]: X_train,X_test,y_train,y_test=train_test_split(x,y)
[8]: model=LinearRegression()
model.fit(X_train,y_train)
[8]: LinearRegression()
[9]: pred=model.predict(X_test)
[10]: print('Coefficients:',model.coef_)
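The assignment also asks for the MSE, but that cell did not survive; a sketch:

from sklearn.metrics import mean_squared_error
print('MSE:', mean_squared_error(y_test, pred))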
[ ]:
7. Assignment on Binary Classification: Build an application to decide whether to play tennis using a Decision Tree classifier. Do the required data preprocessing. Display the accuracy score, classification report, and confusion matrix.
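The earlier cells of this section (imports) were lost; the code below uses at least:

import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier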
[18]: data=pd.read_csv('PlayTennis.csv')
data
[19]: dataset=pd.get_dummies(data)
[20]: dataset.astype(int)
4 0 1 0 1
5 0 1 0 1
6 1 0 0 1
7 0 0 1 0
8 0 0 1 1
9 0 1 0 0
10 0 0 1 0
11 1 0 0 0
12 1 0 0 0
13 0 1 0 0
[21]: x=np.array(dataset)[:,1:10]
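The cell defining y was lost; since x takes the dummy feature columns, y is presumably the one-hot target column of the PlayTennis data, e.g. (the exact column position is an assumption):

y = np.array(dataset)[:, -1]  # assumed: last dummy column as the target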
[23]: from sklearn.model_selection import train_test_split
[24]: X_train,X_test,y_train,y_test=train_test_split(x,y)
[26]: model=DecisionTreeClassifier(criterion='entropy')
[27]: model.fit(X_train,y_train)
[27]: DecisionTreeClassifier(criterion='entropy')
[28]: pred=model.predict(X_test)
accuracy 0.50 4
macro avg 0.33 0.33 0.33 4
weighted avg 0.50 0.50 0.50 4
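The cells that printed the metrics were lost; the report fragment above (its header rows were cut off by pagination) presumably came from:

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
print(accuracy_score(y_test, pred))
print(classification_report(y_test, pred))
print(confusion_matrix(y_test, pred))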
8. Assignment on Binary Classification using a Perceptron. Implement the Perceptron model. Use this model to classify whether a patient has cancer or not (use the Breast Cancer dataset from sklearn). Display the accuracy score, classification report, and confusion matrix.
[16]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer  # lost import, re-added: used by the next cell
[3]: breast_cancer=load_breast_cancer()
[4]: x=breast_cancer.data
[5]: y=breast_cancer.target
[6]: data=pd.DataFrame(x,columns=breast_cancer.feature_names)
[7]: data['class']=y
[8]: data
[8]: mean radius mean texture mean perimeter mean area mean smoothness \
0 17.99 10.38 122.80 1001.0 0.11840
1 20.57 17.77 132.90 1326.0 0.08474
2 19.69 21.25 130.00 1203.0 0.10960
3 11.42 20.38 77.58 386.1 0.14250
4 20.29 14.34 135.10 1297.0 0.10030
.. … … … … …
564 21.56 22.39 142.00 1479.0 0.11100
565 20.13 28.25 131.20 1261.0 0.09780
566 16.60 28.08 108.30 858.1 0.08455
567 20.60 29.33 140.10 1265.0 0.11780
568 7.76 24.54 47.92 181.0 0.05263
mean compactness mean concavity mean concave points mean symmetry \
0 0.27760 0.30010 0.14710 0.2419
1 0.07864 0.08690 0.07017 0.1812
2 0.15990 0.19740 0.12790 0.2069
3 0.28390 0.24140 0.10520 0.2597
4 0.13280 0.19800 0.10430 0.1809
.. … … … …
564 0.11590 0.24390 0.13890 0.1726
565 0.10340 0.14400 0.09791 0.1752
566 0.10230 0.09251 0.05302 0.1590
567 0.27700 0.35140 0.15200 0.2397
568 0.04362 0.00000 0.00000 0.1587
564 0.2216 0.2060 0.07115 0
565 0.1628 0.2572 0.06637 0
566 0.1418 0.2218 0.07820 0
567 0.2650 0.4087 0.12400 0
568 0.0000 0.2871 0.07039 1
[9]: x=np.array(data)[:,:-1]
y=np.array(data)[:,-1]
[32]: X_train,X_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42,stratify=y)
[Only these fragments of the Perceptron class survived extraction:]
max_accuracy=accuracy[i]
chkptw=self.w.copy()
chkptb=self.b
self.w=chkptw
self.b=chkptb
plt.plot(accuracy.values())
plt.ylim([0,1])
return np.array(wt_matrix)
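A sketch of the implementation those fragments come from, with the fit signature inferred from perceptron.fit(X_train, y_train, 10000, 0.5) below (all other details are assumptions):

from sklearn.metrics import accuracy_score

class Perceptron:
    def __init__(self):
        self.w = None
        self.b = None

    def model(self, x):
        # unit-step activation on the weighted sum
        return 1 if np.dot(self.w, x) >= self.b else 0

    def predict(self, X):
        return np.array([self.model(x) for x in X])

    def fit(self, X, Y, epochs=1, lr=1):
        self.w = np.ones(X.shape[1])
        self.b = 0
        accuracy = {}
        max_accuracy = 0
        wt_matrix = []
        for i in range(epochs):
            for x, y in zip(X, Y):
                y_pred = self.model(x)
                if y == 1 and y_pred == 0:      # false negative: strengthen
                    self.w = self.w + lr * x
                    self.b = self.b - lr
                elif y == 0 and y_pred == 1:    # false positive: weaken
                    self.w = self.w - lr * x
                    self.b = self.b + lr
            wt_matrix.append(self.w.copy())
            accuracy[i] = accuracy_score(self.predict(X), Y)
            if accuracy[i] > max_accuracy:      # checkpoint the best weights seen
                max_accuracy = accuracy[i]
                chkptw = self.w.copy()
                chkptb = self.b
        self.w = chkptw                         # restore the best checkpoint
        self.b = chkptb
        plt.plot(list(accuracy.values()))
        plt.ylim([0, 1])
        return np.array(wt_matrix)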
[34]: perceptron=Perceptron()
[35]: wt_matrix=perceptron.fit(X_train,y_train,10000,0.5)
[36]: plt.plot(wt_matrix[-1])
[37]: pred=perceptron.predict(X_test)
[39]: print(accuracy_score(y_test,pred))
0.9122807017543859
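The assignment also asks for the classification report and confusion matrix; those cells were lost. A sketch:

from sklearn.metrics import classification_report, confusion_matrix
print(classification_report(y_test, pred))
print(confusion_matrix(y_test, pred))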
9. Assignment on Multiclass Classification using MLP (Multilayer Perceptron). Build an application to classify a given iris flower into its species using an MLP (use the Iris dataset from Kaggle / sklearn). Display the accuracy score, classification report, and confusion matrix.
[1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
[3]: iris=load_iris()
[4]: x=iris.data
[5]: y=iris.target
[6]: data=pd.DataFrame(x,columns=iris.feature_names)
[7]: data['class']=y
[8]: data
[8]: sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
.. … … … …
145 6.7 3.0 5.2 2.3
146 6.3 2.5 5.0 1.9
147 6.5 3.0 5.2 2.0
148 6.2 3.4 5.4 2.3
149 5.9 3.0 5.1 1.8
class
0 0
1 0
2 0
3 0
4 0
.. …
145 2
146 2
147 2
148 2
149 2
[9]: x=np.array(data)[:,:-1]
y=np.array(data)[:,-1]
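The cells importing the splitter and the model were lost; presumably:

from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier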
[12]: X_train,X_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=1)
[18]: model=MLPClassifier(hidden_layer_sizes=(10,10,10),max_iter=1000)
model.fit(X_train,y_train)
[19]: pred=model.predict(X_test)
accuracy 1.00 45
macro avg 1.00 1.00 1.00 45
weighted avg 1.00 1.00 1.00 45
Confusion matrix: [[14 0 0]
[ 0 18 0]
[ 0 0 13]]
10. Assignment on Regression using KNN. Build an application that predicts salary based on years of experience using KNN (use the Salary dataset from Kaggle). Display the MSE.
[1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
[2]: data=pd.read_csv('Salary_dataset.csv')
data
24 24 8.8 109432.0
25 25 9.1 105583.0
26 26 9.6 116970.0
27 27 9.7 112636.0
28 28 10.4 122392.0
29 29 10.6 121873.0
[5]: x=data['YearsExperience'].values.reshape(-1,1)
y=data['Salary'].values.reshape(-1,1)
[7]: plt.scatter(x,y)
plt.xlabel('Experience')
plt.ylabel('Salary')
plt.show()
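The import cells for the split and the regressor were lost; presumably:

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor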
[8]: X_train,X_test,y_train,y_test=train_test_split(x,y)
[19]: model=KNeighborsRegressor(n_neighbors=3)
model.fit(X_train,y_train)
[19]: KNeighborsRegressor(n_neighbors=3)
[20]: pred=model.predict(X_test)
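The assignment asks for the MSE, but that cell did not survive; a sketch:

from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test, pred))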
11. Assignment on Classification using KNN. Build an application to classify an iris flower into its species using KNN (use the Iris dataset from sklearn). Display the accuracy score, classification report, and confusion matrix.
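The initial import cell was lost; the code below needs at least:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris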
[2]: iris=load_iris()
[3]: x=iris.data
[4]: y=iris.target
[5]: data=pd.DataFrame(x,columns=iris.feature_names)
[6]: data['class']=y
[7]: data
[7]: sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
.. … … … …
145 6.7 3.0 5.2 2.3
146 6.3 2.5 5.0 1.9
147 6.5 3.0 5.2 2.0
148 6.2 3.4 5.4 2.3
149 5.9 3.0 5.1 1.8
class
0 0
1 0
2 0
3 0
4 0
.. …
145 2
146 2
147 2
148 2
149 2
[8]: x=np.array(data)[:,:-1]
y=np.array(data)[:,-1]
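The cells importing the splitter and the classifier were lost; presumably:

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier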
[10]: X_train,X_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=1)
[14]: model=KNeighborsClassifier(n_neighbors=7)
model.fit(X_train,y_train)
[14]: KNeighborsClassifier(n_neighbors=7)
[15]: pred=model.predict(X_test)
accuracy 0.98 45
macro avg 0.98 0.98 0.98 45
weighted avg 0.98 0.98 0.98 45
Confusion matrix: [[14 0 0]
[ 0 17 1]
[ 0 0 13]]
12. Assignment on the Naive Bayes Classifier. Build an application to classify a given text using a Naive Bayes classifier. Use data from sklearn. Display the accuracy score, classification report, and confusion matrix.
In [16]: import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
from sklearn.datasets import fetch_20newsgroups
Out[8]: ['alt.atheism',
'comp.graphics',
'comp.os.ms-windows.misc',
'comp.sys.ibm.pc.hardware',
'comp.sys.mac.hardware',
'comp.windows.x',
'misc.forsale',
'rec.autos',
'rec.motorcycles',
'rec.sport.baseball',
'rec.sport.hockey',
'sci.crypt',
'sci.electronics',
'sci.med',
'sci.space',
'soc.religion.christian',
'talk.politics.guns',
'talk.politics.mideast',
'talk.politics.misc',
'talk.religion.misc']
In [18]: print(train.data[5])
[The printed newsgroup message was lost in extraction.]
In [11]: from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
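Most of the modeling cells were lost; the imports above and the 'sci.space' prediction below point to the standard TF-IDF + MultinomialNB pipeline, sketched here (the fetch arguments and the predict_category helper are assumptions):

train = fetch_20newsgroups(subset='train')
test = fetch_20newsgroups(subset='test')
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train.data, train.target)
labels = model.predict(test.data)

def predict_category(s, train=train, model=model):
    # map a raw string to its predicted newsgroup name
    pred = model.predict([s])
    return train.target_names[pred[0]]

The mat object in In [12] below is presumably a confusion matrix built from labels and plotted with the seaborn import above.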
In [12]: mat
Out[13]: 'sci.space'
In [ ]:
13. Assignment on K-Means Clustering. Apply K-Means clustering on the Income dataset to form 3 clusters and display these clusters using a scatter graph.
In [1]: from sklearn.cluster import KMeans
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from matplotlib import pyplot as plt
%matplotlib inline
In [2]: df = pd.read_csv("income.csv")
df.head()
Out[2]: Name Age Income($)
0 Rob 27 70000
1 Michael 29 90000
2 Mohan 29 61000
3 Ismail 28 60000
4 Kory 42 150000
In [3]: plt.scatter(df.Age,df['Income($)'])
plt.xlabel('Age')
plt.ylabel('Income($)')
Out[3]: Text(0,0.5,'Income($)')
In [4]: km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age','Income($)']])
y_predicted
Out[4]: array([0, 0, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 2])
In [5]: df['cluster']=y_predicted
df.head()
2 Mohan 29 61000 2
3 Ismail 28 60000 2
4 Kory 42 150000 1
In [6]: km.cluster_centers_
Out[6]: array([[3.40000000e+01, 8.05000000e+04],
[3.82857143e+01, 1.50000000e+05],
[3.29090909e+01, 5.61363636e+04]])
Out[7]: <matplotlib.legend.Legend at 0x1ba914f7cc0>
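The plotting cell In [7] was lost; the Legend output above suggests the usual per-cluster scatter, e.g.:

df1 = df[df.cluster == 0]
df2 = df[df.cluster == 1]
df3 = df[df.cluster == 2]
plt.scatter(df1.Age, df1['Income($)'], color='green')
plt.scatter(df2.Age, df2['Income($)'], color='red')
plt.scatter(df3.Age, df3['Income($)'], color='black')
plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1], color='purple', marker='*', label='centroid')
plt.xlabel('Age')
plt.ylabel('Income($)')
plt.legend()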
In [9]: df.head()
In [10]: plt.scatter(df.Age,df['Income($)'])
Out[10]: <matplotlib.collections.PathCollection at 0x1ba91605a58>
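Between In [10] and In [11] the scaling cells were lost; MinMaxScaler is imported at the top, and the much smaller SSE values in In [17] confirm the features were rescaled, presumably like this:

scaler = MinMaxScaler()
df['Income($)'] = scaler.fit_transform(df[['Income($)']])
df['Age'] = scaler.fit_transform(df[['Age']])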
In [11]: km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age','Income($)']])
y_predicted
Out[11]: array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2])
In [12]: df['cluster']=y_predicted
df.head()
In [13]: km.cluster_centers_
Out[15]: <matplotlib.legend.Legend at 0x1ba9166db00>
In [16]: sse = []
k_rng = range(1,10)
for k in k_rng:
km = KMeans(n_clusters=k)
km.fit(df[['Age','Income($)']])
sse.append(km.inertia_)
In [17]: sse
Out[17]:
[5.434011511988179,
2.091136388699078,
0.4750783498553095,
0.3491047094419565,
0.2755825568722977,
0.22443334487241418,
0.16869711728567788,
0.13265419827245162,
0.10497488680620906]
In [19]: plt.xlabel('K')
plt.ylabel('Sum of squared error')
plt.plot(k_rng,sse)
Out[19]: [<matplotlib.lines.Line2D at 0x1ba916ddeb8>]
In [ ]:
14. Assignment on Hierarchical Clustering. Apply it on the Mall Customers dataset to form 5 clusters and display these clusters using a scatter graph.
In [1]: import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Out[2]: CustomerID Gender Age Annual Income (k$) Spending Score (1-100)
0 1 Male 19 15 39
1 2 Male 21 15 81
2 3 Female 20 16 6
3 4 Female 23 16 77
4 5 Female 31 17 40
Out[10]: <matplotlib.collections.PathCollection at 0x1a4d7225320>
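The cells that built newData and y_hc were lost; presumably the standard dendrogram-plus-agglomerative workflow on the income and spending columns (the variable name, column positions, and parameters are assumptions):

import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering

newData = dataset.iloc[:, [3, 4]].values   # assumed: Annual Income, Spending Score; 'dataset' is an assumed name
dendrogram = sch.dendrogram(sch.linkage(newData, method='ward'))
hc = AgglomerativeClustering(n_clusters=5, affinity='euclidean', linkage='ward')  # 'affinity' in old sklearn; newer versions call it 'metric'
y_hc = hc.fit_predict(newData)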
In [11]:
plt.scatter(newData[y_hc == 0, 0], newData[y_hc == 0, 1], s = 100, c = 'red', label = 'Cluster 1')
plt.scatter(newData[y_hc == 1, 0], newData[y_hc == 1, 1], s = 100, c = 'blue', label = 'Cluster 2')
plt.scatter(newData[y_hc == 2, 0], newData[y_hc == 2, 1], s = 100, c = 'green', label = 'Cluster 3')
plt.scatter(newData[y_hc == 3, 0], newData[y_hc == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
plt.scatter(newData[y_hc == 4, 0], newData[y_hc == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
plt.title('Clusters of customers')
plt.legend()
plt.show()
In [ ]:
15. Assignment on Dimensionality Reduction. Apply Principal Component Analysis (PCA) on the Iris dataset to reduce its dimensionality into three principal components, and display the data before and after reduction using scatter graphs.
In [2]:
import pandas as pd
import matplotlib.pyplot as plt
[Several cells here (loading the Iris dataset, standardizing the features, fitting PCA, and their outputs) were lost in extraction.]
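A sketch of the lost cells, consistent with the principalDf and finalDf names below (the filename, column names, and component count are assumptions; the title asks for three components):

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

df = pd.read_csv('iris.csv')                      # assumed source with a 'species' column
features = df.columns[:-1]
x = StandardScaler().fit_transform(df[features])  # standardize before PCA
pca = PCA(n_components=3)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(principalComponents, columns=['PC1', 'PC2', 'PC3'])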
finalDf = pd.concat([principalDf, df[['species']]], axis = 1)
In [5]: finalDf.head()
Out[5]: [output table lost in extraction]
In [6]: pca.explained_variance_ratio_
Out[6]: [output lost in extraction]
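The cells In [7]/In [8] were lost; judging from the warnings below and the mirrored '#using PCA' block further down, they presumably split the data and trained a baseline RandomForest on the original features:

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# assumed split; the DataConversionWarning below matches passing a column-vector y
X_train, X_test, y_train, y_test = train_test_split(df[features], df[['species']], test_size=0.3, random_state=0)
model = RandomForestClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)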
C:\Users\abhis\Anaconda3\lib\site-
packages\sklearn\ensemble\forest.py:246: FutureWarning: The default
value of n_estimators will change from 10 in version 0.20 to 100 in
0.22.
"10 in version 0.20 to 100 in 0.22.", FutureWarning)
C:\Users\abhis\Anaconda3\lib\site-packages\ipykernel_launcher.py:3:
DataConversionWarning: A column-vector y was passed when a 1d array was
expected. Please change the shape of y to (n_samples,), for example
using ravel().
This is separate from the ipykernel package so we can avoid doing
imports until after removing the cwd from sys.path.
In [9]: predictions
Out[9]: [output lost in extraction]
In [10]: accuracy_score(y_test, predictions)
Out[10]: 0.9777777777777777
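The split on the PCA-reduced features was also lost; presumably the same call over principalDf (assumed):

X_train, X_test, y_train, y_test = train_test_split(principalDf, df[['species']], test_size=0.3, random_state=0)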
#using PCA
model = RandomForestClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
C:\Users\abhis\Anaconda3\lib\site-
packages\sklearn\ensemble\forest.py:246: FutureWarning: The default
value of n_estimators will change from 10 in version 0.20 to 100 in
0.22.
"10 in version 0.20 to 100 in 0.22.", FutureWarning)
C:\Users\abhis\Anaconda3\lib\site-packages\ipykernel_launcher.py:3:
DataConversionWarning: A column-vector y was passed when a 1d array was
expected. Please change the shape of y to (n_samples,), for example
using ravel().
This is separate from the ipykernel package so we can avoid doing
imports until after removing the cwd from sys.path.
In [13]: predictions
Out[13]: [output lost in extraction]
In [14]: accuracy_score(y_test, predictions)
Out[14]: 0.9111111111111111
In [ ]:
Note: the classifier trained on the PCA-reduced components scores slightly lower (0.911) than the one trained on the original features (0.978), the usual trade-off for reducing dimensionality.