ML Lab - 231009 - 210335
MACHINE LEARNING LABORATORY
JNTUA COLLEGE OF ENGINEERING (AUTONOMOUS)
ANANTAPUR
Prepared by
Potte Thumucherla Khasim Baba
Admission No: - 21001F0056
Experiment 1:
Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a .csv
file.
Code: -
import csv

a = []
print("The given Training Data Set")
with open('enjoysport.csv', 'r') as csvfile:
    for row in csv.reader(csvfile):
        a.append(row)
print(a)
print("\nThe total number of training instances is : ", len(a))

num_attribute = len(a[0]) - 1
print("\nThe initial hypothesis is : ")
hypothesis = ['0'] * num_attribute   # start from the most specific hypothesis
print(hypothesis)

for i in range(len(a)):
    # FIND-S generalizes the hypothesis only on positive ('yes') examples
    if a[i][num_attribute] == 'yes':
        for j in range(num_attribute):
            if hypothesis[j] == '0' or hypothesis[j] == a[i][j]:
                hypothesis[j] = a[i][j]
            else:
                hypothesis[j] = '?'
    print("\nThe hypothesis for the training instance {} is :\n".format(i + 1), hypothesis)
Dataset: -
enjoysport.csv:
Sky AirTemp Humidity Wind Water Forecast Enjoysport
sunny warm normal strong warm same yes
sunny warm high strong warm same yes
rainy cold high strong warm change no
sunny warm high strong cool change yes
Output: -
The given Training Data Set
[['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes'], ['sunny', 'warm', 'high', 'strong',
'warm', 'same', 'yes'], ['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no'], ['sunny', 'warm',
'high', 'strong', 'cool', 'change', 'yes']]
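Note: - The heart of FIND-S is the per-attribute generalization rule: keep an attribute value while every positive example agrees with it, otherwise widen it to '?'. The rule can be exercised on its own with a small sketch (the helper name generalize and the two sample calls are illustrative, not part of the lab code):

def generalize(hypothesis, example):
    # widen the hypothesis just enough to cover a positive example
    return [e if h in ('0', e) else '?' for h, e in zip(hypothesis, example)]

h = ['0'] * 6
h = generalize(h, ['sunny', 'warm', 'normal', 'strong', 'warm', 'same'])
h = generalize(h, ['sunny', 'warm', 'high', 'strong', 'warm', 'same'])
print(h)   # ['sunny', 'warm', '?', 'strong', 'warm', 'same']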
Experiment 2:
For a given set of training data examples stored in a .csv file, implement and demonstrate
the Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.
Code: -
import numpy as np
import pandas as pd

data = pd.read_csv('enjoysport.csv')
print("The training data is : \n", data)
concepts = np.array(data.iloc[:, 0:-1])
print("\nThe concepts are :\n", concepts)
target = np.array(data.iloc[:, -1])
print("\nThe targets of concepts are :\n", target)

def learn(concepts, target):
    print("\nInitialization of specific_h and general_h :")
    specific_h = concepts[0].copy()   # S: most specific boundary
    # G: most general boundary, one row per attribute
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print("General Hypothesis :\n", general_h)
    print("Specific Hypothesis :\n", specific_h)
    for i, h in enumerate(concepts):
        if target[i] == "yes":
            # positive example: generalize S and prune G
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "no":
            # negative example: specialize G
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print("\nStep", i + 1, "of Candidate Elimination Algorithm :")
        print("\nGeneral Hypothesis is :\n", general_h)
        print("\nSpecific Hypothesis is :\n", specific_h)
    # discard rows of G that stayed fully general
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("\nFinal General Hypothesis :", g_final, sep="\n")
print("\nFinal Specific Hypothesis :", s_final, sep="\n")
Dataset: -
enjoysport.csv:
Sky AirTemp Humidity Wind Water Forecast Enjoysport
sunny warm normal strong warm same yes
sunny warm high strong warm same yes
rainy cold high strong warm change no
sunny warm high strong cool change yes
Output: -
The training data is :
Sky AirTemp Humidity Wind Water Forecast Enjoysport
0 sunny warm normal strong warm same yes
1 sunny warm high strong warm same yes
2 rainy cold high strong warm change no
3 sunny warm high strong cool change yes
Specific Hypothesis is :
['sunny' 'warm' 'normal' 'strong' 'warm' 'same']
Specific Hypothesis is :
['sunny' 'warm' '?' 'strong' '?' '?']
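Note: - A quick way to validate the result is to confirm that the final specific hypothesis classifies every training example correctly; every hypothesis between S and G must do so. A minimal consistency check, assuming the concepts and target arrays from the code above (the helper name consistent is illustrative):

def consistent(hypothesis, concepts, target):
    # a hypothesis covers an example if every attribute is '?' or an exact match
    for row, t in zip(concepts, target):
        covers = all(hj in ('?', xj) for hj, xj in zip(hypothesis, row))
        if covers != (t == 'yes'):
            return False
    return True

print(consistent(['sunny', 'warm', '?', 'strong', '?', '?'], concepts, target))   # True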
Experiment 3:
Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge to classify
a new sample.
Code: -
import pandas as pd
import math
import numpy as np

data = pd.read_csv("decisiontreedataset.csv")
features = [feat for feat in data]
features.remove("answer")

class Node:
    def __init__(self):
        self.children = []
        self.value = ""
        self.isLeaf = False
        self.pred = ""

def entropy(examples):
    pos = 0.0
    neg = 0.0
    for _, row in examples.iterrows():
        if row["answer"] == "yes":
            pos += 1
        else:
            neg += 1
    if pos == 0.0 or neg == 0.0:
        return 0.0
    else:
        p = pos / (pos + neg)
        n = neg / (pos + neg)
        return -(p * math.log(p, 2) + n * math.log(n, 2))

def info_gain(examples, attr):
    uniq = np.unique(examples[attr])
    gain = entropy(examples)
    for u in uniq:
        subdata = examples[examples[attr] == u]
        gain -= (float(len(subdata)) / float(len(examples))) * entropy(subdata)
    return gain

def ID3(examples, attrs):
    root = Node()
    # choose the attribute with the highest information gain
    max_gain = 0
    max_feat = ""
    for feature in attrs:
        gain = info_gain(examples, feature)
        if gain > max_gain:
            max_gain = gain
            max_feat = feature
    root.value = max_feat
    uniq = np.unique(examples[max_feat])
    for u in uniq:
        subdata = examples[examples[max_feat] == u]
        if entropy(subdata) == 0.0:
            # pure subset: make a leaf
            newNode = Node()
            newNode.isLeaf = True
            newNode.value = u
            newNode.pred = np.unique(subdata["answer"])
            root.children.append(newNode)
        else:
            dummyNode = Node()
            dummyNode.value = u
            new_attrs = attrs.copy()
            new_attrs.remove(max_feat)
            child = ID3(subdata, new_attrs)
            dummyNode.children.append(child)
            root.children.append(dummyNode)
    return root
def printTree(root: Node, depth=0):
    for i in range(depth):
        print("\t", end="")
    print(root.value, end="")
    if root.isLeaf:
        print(" -> ", root.pred)
    print()
    for child in root.children:
        printTree(child, depth + 1)
def classify(root: Node, new):
    for child in root.children:
        if child.value == new[root.value]:
            if child.isLeaf:
                print("Predicted Label for new example", new, "is:", child.pred)
                return
            else:
                classify(child.children[0], new)

root = ID3(data, features)
print("Decision Tree is:")
printTree(root)
print("--------------------------------------------------------------------------------")
new = {"outlook": "sunny", "temperature": "hot", "humidity": "normal", "wind": "strong"}
classify(root, new)
Dataset: -
decisiontreedataset.csv:
outlook temperature humidity wind answer
sunny hot high weak no
sunny hot high strong no
overcast hot high weak yes
rain mild high weak yes
rain cool normal weak yes
rain cool normal strong no
overcast cool normal strong yes
sunny mild high weak no
sunny cool normal weak yes
rain mild normal weak yes
sunny mild normal strong yes
overcast mild high strong yes
overcast hot normal weak yes
rain mild high strong no
Output: -
Decision Tree is:
outlook
	overcast ->  ['yes']
	rain
		wind
			strong ->  ['no']
			weak ->  ['yes']
	sunny
		humidity
			high ->  ['no']
			normal ->  ['yes']
--------------------------------------------------------------------------------
Predicted Label for new example {'outlook': 'sunny', 'temperature': 'hot', 'humidity': 'normal',
'wind': 'strong'} is: ['yes']
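Note: - The root split can be verified by hand. The 14 examples contain 9 yes and 5 no, so entropy(S) = -(9/14)log2(9/14) - (5/14)log2(5/14) ≈ 0.940. Splitting on outlook gives a pure overcast subset and two mixed subsets of 5 examples each (entropy ≈ 0.971), so Gain(S, outlook) ≈ 0.940 - (5/14)(0.971) - (4/14)(0) - (5/14)(0.971) ≈ 0.247, the largest gain of the four attributes, which is why outlook becomes the root. A short check (the helper H is illustrative):

import math

def H(p, n):
    # binary entropy of a split with p positive and n negative examples
    if p == 0 or n == 0:
        return 0.0
    pp, pn = p / (p + n), n / (p + n)
    return -(pp * math.log(pp, 2) + pn * math.log(pn, 2))

gain_outlook = H(9, 5) - (5/14)*H(2, 3) - (4/14)*H(4, 0) - (5/14)*H(3, 2)
print(round(gain_outlook, 3))   # 0.247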
Experiment 4:
Write a Python program to implement k-Nearest Neighbour algorithm to classify the
iris data set. Print both correct and wrong predictions.
Code: -
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
dataset = pd.read_csv("irisdataset.csv", names=names)
x = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
print(x.head())

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10)
classifier = KNeighborsClassifier(n_neighbors=5).fit(xtrain, ytrain)
ypred = classifier.predict(xtest)

i = 0
print("\n-------------------------------------------------------------------------")
print('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print("-------------------------------------------------------------------------")
for label in ytest:
    print('%-25s %-25s' % (label, ypred[i]), end="")
    if label == ypred[i]:
        print(' %-25s' % ('Correct'))
    else:
        print(' %-25s' % ('Wrong'))
    i = i + 1
print ("-------------------------------------------------------------------------")
print("\nConfusion Matrix:\n",metrics.confusion_matrix(ytest, ypred))
print ("-------------------------------------------------------------------------")
print("\nClassification Report:\n",metrics.classification_report(ytest, ypred))
print ("-------------------------------------------------------------------------")
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest, ypred))
Dataset: -
irisdataset.csv: - (column headings are not required in the CSV file; the names are supplied in the code)
sepal-length sepal-width petal-length petal-width class
5.1 3.5 1.4 0.2 Iris-setosa
4.9 3 1.4 0.2 Iris-setosa
4.7 3.2 1.3 0.2 Iris-setosa
4.6 3.1 1.5 0.2 Iris-setosa
5 3.6 1.4 0.2 Iris-setosa
5.4 3.9 1.7 0.4 Iris-setosa
4.6 3.4 1.4 0.3 Iris-setosa
5 3.4 1.5 0.2 Iris-setosa
4.4 2.9 1.4 0.2 Iris-setosa
4.9 3.1 1.5 0.1 Iris-setosa
7 3.2 4.7 1.4 Iris-versicolor
6.4 3.2 4.5 1.5 Iris-versicolor
6.9 3.1 4.9 1.5 Iris-versicolor
5.5 2.3 4 1.3 Iris-versicolor
6.5 2.8 4.6 1.5 Iris-versicolor
5.7 2.8 4.5 1.3 Iris-versicolor
6.3 3.3 4.7 1.6 Iris-versicolor
4.9 2.4 3.3 1 Iris-versicolor
6.6 2.9 4.6 1.3 Iris-versicolor
5.2 2.7 3.9 1.4 Iris-versicolor
6.3 3.3 6 2.5 Iris-virginica
5.8 2.7 5.1 1.9 Iris-virginica
7.1 3 5.9 2.1 Iris-virginica
6.3 2.9 5.6 1.8 Iris-virginica
6.5 3 5.8 2.2 Iris-virginica
7.6 3 6.6 2.1 Iris-virginica
4.9 2.5 4.5 1.7 Iris-virginica
7.3 2.9 6.3 1.8 Iris-virginica
6.7 2.5 5.8 1.8 Iris-virginica
7.2 3.6 6.1 2.5 Iris-virginica
Output: -
sepal-length sepal-width petal-length petal-width
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
-------------------------------------------------------------------
Original Label Predicted Label Correct/Wrong
-------------------------------------------------------------------
Iris-virginica Iris-versicolor Wrong
Iris-virginica Iris-virginica Correct
Iris-versicolor Iris-versicolor Correct
-------------------------------------------------------------------
Confusion Matrix:
[[1 0]
[1 1]]
-------------------------------------------------------------------
Classification Report:
precision recall f1-score support
Iris-versicolor 0.50 1.00 0.67 1
Iris-virginica 1.00 0.50 0.67 2
accuracy 0.67 3
macro avg 0.75 0.75 0.67 3
weighted avg 0.83 0.67 0.67 3
-------------------------------------------------------------------
Accuracy of the classifier is 0.67
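Note: - With only three test instances (test_size=0.10 on 30 rows) the reported accuracy is very noisy, and k = 5 is simply a conventional default. One reasonable way to pick k is cross-validation on the full data; a sketch reusing the x and y from the code above:

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

for k in (1, 3, 5, 7, 9):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), x, y, cv=5)
    print('k=%d  mean accuracy=%.3f' % (k, scores.mean()))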
Experiment 5:
Build an Artificial Neural Network by implementing the Backpropagation algorithm
and test the same using appropriate data sets.
Code: -
import numpy as np

x = np.array(([1, 2], [3, 4], [5, 6]), dtype=float)
y = np.array(([30], [60], [90]), dtype=float)
x = x / np.amax(x, axis=0)   # normalise inputs column-wise
y = y / 100                  # scale outputs into [0, 1]

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derivatives_sigmoid(x):
    # expects x to be an already-activated sigmoid output
    return x * (1 - x)

epoch = 5000
lr = 0.1
inputlayer_neurons = 2
hiddenlayer_neurons = 3
output_neurons = 1

# random initialisation of weights and biases
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # forward pass
    hinp = np.dot(x, wh) + bh
    hlayer_act = sigmoid(hinp)
    outinp = np.dot(hlayer_act, wout) + bout
    output = sigmoid(outinp)
    # backward pass
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad
    # gradient-descent updates for weights and biases
    wout += hlayer_act.T.dot(d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += x.T.dot(d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n", str(x))
print("Actual Output: \n", str(y))
print("Predicted Output: \n", output)
Output: -
Input:
[[0.2 0.33333333]
[0.6 0.66666667]
[1. 1. ]]
Actual Output:
[[0.3]
[0.6]
[0.9]]
Predicted Output:
[[0.33497618]
[0.64084238]
[0.80288884]]
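Note: - derivatives_sigmoid receives the sigmoid activation itself, not the pre-activation: if s = sigmoid(z), then d(sigmoid)/dz = s(1 - s), which is why the function is simply x * (1 - x). A standalone numerical check of that identity:

import numpy as np

z = 0.7
s = 1 / (1 + np.exp(-z))
analytic = s * (1 - s)                 # the s(1-s) form used in the lab code
eps = 1e-6                             # central finite difference
numeric = ((1 / (1 + np.exp(-(z + eps)))) - (1 / (1 + np.exp(-(z - eps))))) / (2 * eps)
print(abs(analytic - numeric) < 1e-8)  # True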
Experiment 6:
Write a program to implement the naive Bayesian classifier for a sample training data
set stored as a .csv file. Compute the accuracy of the classifier, considering a few test data sets.
Code: -
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
data = pd.read_csv('playtennis.csv')
print("The first 5 values of data is :\n",data.head())
x = data.iloc[:,:-1].copy()   # copy so the in-place label encoding below is safe
print("\nThe First 5 values of train data is\n",x.head())
y = data.iloc[:,-1]
print("\nThe first 5 values of Train output is\n",y.head())
le_outlook = LabelEncoder()
x.Outlook = le_outlook.fit_transform(x.Outlook)
le_Temperature = LabelEncoder()
x.Temperature = le_Temperature.fit_transform(x.Temperature)
le_Humidity = LabelEncoder()
x.Humidity = le_Humidity.fit_transform(x.Humidity)
le_Windy = LabelEncoder()
x.Windy = le_Windy.fit_transform(x.Windy)
print("\nNow the Train data is :\n",x.head())
le_PlayTennis = LabelEncoder()
y = le_PlayTennis.fit_transform(y)
print("\nNow the Train output is\n",y)
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.20)
classifier = GaussianNB()
classifier.fit(x_train,y_train)
print("Accuracy is:",accuracy_score(classifier.predict(x_test),y_test))
Dataset: -
playtennis.csv:
Outlook Temperature Humidity Windy PlayTennis
rainy cold high weak no
rainy cold normal mild no
sunny warm normal weak yes
rainy cold normal weak no
sunny mild mild strong yes
cloudy cold high weak yes
rainy warm normal strong no
sunny mild mild mild yes
cloudy warm mild mild yes
cloudy cold high high no
Output: -
The first 5 values of data is :
Outlook Temperature Humidity Windy PlayTennis
0 rainy cold high weak no
1 rainy cold normal mild no
2 sunny warm normal weak yes
3 rainy cold normal weak no
4 sunny mild mild strong yes
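Note: - Because the targets are label-encoded, classifier.predict returns integers; LabelEncoder.inverse_transform maps them back to the original 'yes'/'no' strings. A short usage sketch following on from the code above (the printed labels depend on the random train/test split):

pred = classifier.predict(x_test)
print(le_PlayTennis.inverse_transform(pred))   # e.g. ['yes' 'no']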
Experiment 7:
Write a Python program to construct a Bayesian network considering medical data. Use
this model to demonstrate the diagnosis of heart patients using standard Heart Disease Data
Set.
Code: -
import numpy as np
import pandas as pd
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianNetwork
from pgmpy.inference import VariableElimination
heartDisease = pd.read_csv('naivebayesnetworkdataset.csv')
heartDisease = heartDisease.replace('?',np.nan)
print('Sample instances from the dataset are:')
print(heartDisease.head())
print('\n Attributes and datatypes of dataset are:')
print(heartDisease.dtypes)
model=BayesianNetwork([('age','heartdisease'),('sex','heartdisease'),('exang','heartdisease'),
('cp','heartdisease'),('heartdisease','restecg'),('heartdisease','chol')])
print('\nLearning CPD using Maximum Likelihood Estimator')
model.fit(heartDisease,estimator=MaximumLikelihoodEstimator)
print('\n Inferencing with Bayesian Network:')
HeartDiseaseTest_infer = VariableElimination(model)
print('\n 1. Probability of HeartDisease given evidence= restecg')
q1=HeartDiseaseTest_infer.query(variables=['heartdisease'],evidence={'restecg':1})
print(q1)
print('\n 2. Probability of HeartDisease given evidence= cp ')
q2=HeartDiseaseTest_infer.query(variables=['heartdisease'],evidence={'cp':2})
print(q2)
Dataset: -
naivebayesnetworkdataset.csv:
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal heartdisease
62 1 2 128 208 1 2 140 0 0 1 0 3 0
57 1 4 110 201 0 0 126 1 1.5 2 0 6 0
58 1 4 146 218 0 0 105 0 2 2 1 7 1
64 1 4 128 263 0 0 105 1 0.2 2 1 7 0
51 0 3 120 295 0 2 157 0 0.6 1 0 3 0
43 1 4 115 303 0 0 181 0 1.2 2 0 3 0
42 0 3 120 209 0 0 173 0 0 2 0 3 0
67 0 4 106 223 0 0 142 0 0.3 1 2 3 0
76 0 3 140 197 0 1 116 0 1.1 2 0 3 0
70 1 2 156 245 0 2 143 0 0 1 0 3 0
57 1 2 124 261 0 0 141 0 0.3 1 0 7 1
44 0 3 118 242 0 0 149 0 0.3 2 1 3 0
58 0 2 136 319 1 2 152 0 0 1 2 3 3
60 0 1 150 240 0 0 171 0 0.9 1 0 3 0
44 1 3 120 226 0 0 169 0 0 1 0 3 0
61 1 4 138 166 0 2 125 1 3.6 2 1 3 4
42 1 4 136 315 0 0 125 1 1.8 2 0 6 2
52 1 4 128 204 1 0 156 1 1 2 0 ? 2
59 1 3 126 218 1 0 134 0 2.2 2 1 6 2
40 1 4 152 223 0 0 181 0 0 1 0 7 1
58 0 4 130 197 0 0 131 0 0.6 2 0 3 0
57 1 4 110 335 0 0 143 1 3 2 1 7 2
47 1 3 130 253 0 0 179 0 0 1 0 3 0
55 0 4 128 205 0 1 130 1 2 2 1 7 3
35 1 2 122 192 0 0 174 0 0 1 0 3 0
61 1 4 148 203 0 0 161 0 0 1 1 7 2
58 1 4 114 318 0 1 140 0 4.4 3 3 6 4
58 0 4 170 225 1 2 146 1 2.8 2 2 6 2
58 1 2 125 220 0 0 144 0 0.4 2 ? 7 0
56 1 2 130 221 0 2 163 0 0 1 0 7 0
Output: -
Sample instances from the dataset are:
age sex cp trestbps chol ... oldpeak slope ca thal heartdisease
0 62 1 2 128 208 ... 0.0 1 0 3 0
1 57 1 4 110 201 ... 1.5 2 0 6 0
2 58 1 4 146 218 ... 2.0 2 1 7 1
3 64 1 4 128 263 ... 0.2 2 1 7 0
4 51 0 3 120 295 ... 0.6 1 0 3 0
[5 rows x 14 columns]
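Note: - After model.fit, the learned conditional probability tables can be inspected directly, which is a useful sanity check on the network structure before running queries. A brief sketch using the fitted model from above:

for cpd in model.get_cpds():
    print('CPD of', cpd.variable)
    print(cpd)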
Experiment 8:
Assuming a set of documents that need to be classified, use the naive Bayesian Classifier
model to perform this task. Built-in Java classes/API can be used to write the program.
Calculate the accuracy, precision and recall for your data set.
Code: -
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
msg = pd.read_csv('naivebayestextdoc.csv', encoding='unicode_escape', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
x = msg.message
y = msg.labelnum

xtrain, xtest, ytrain, ytest = train_test_split(x, y)
count_v = CountVectorizer()
xtrain_dm = count_v.fit_transform(xtrain)   # document-term matrix of the training split
xtest_dm = count_v.transform(xtest)
print(pd.DataFrame(xtrain_dm.toarray(), columns=count_v.get_feature_names_out())[0:5])

clf = MultinomialNB().fit(xtrain_dm, ytrain)
pred = clf.predict(xtest_dm)
print('Accuracy Metrics:')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix:\n', confusion_matrix(ytest, pred))
Dataset: -
naivebayestextdoc.csv: (column headings are not required in the CSV file; the names are supplied in the code)
Message Label
I love this sandwich pos
This is an amazing place pos
I feel very good about these beers pos
This is my best work pos
What an awesome view pos
I do not like this restaurant neg
I am tired of this stuff neg
I can’t deal with this neg
He is my sworn enemy neg
My boss is horrible neg
This is an awesome place pos
I do not like the taste of this juice neg
I love to dance pos
I am sick and tired of this place neg
What a great holiday pos
That is a bad locality to stay neg
We will have good fun tomorrow pos
I went to my enemy’s house today neg
Output: -
Total Instances of Dataset: 18
about am an and awesome bad ... view we what will with work
0 0 0 0 0 0 0 ... 0 0 0 0 0 1
1 0 0 0 0 0 0 ... 0 0 0 0 0 0
2 0 0 0 0 0 0 ... 0 0 0 0 0 0
3 1 0 0 0 0 0 ... 0 0 0 0 0 0
4 0 0 0 0 0 0 ... 0 0 1 0 0 0
[5 rows x 47 columns]
Accuracy Metrics:
Accuracy: 0.8
Recall: 1.0
Precision: 0.6666666666666666
Confusion Matrix:
[[2 1]
[0 2]]
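Note: - The reported metrics follow directly from the confusion matrix [[2 1], [0 2]] (rows are true labels neg/pos, columns are predictions): TP = 2, FP = 1, FN = 0, TN = 2, so precision = 2/3 ≈ 0.667, recall = 2/2 = 1.0 and accuracy = 4/5 = 0.8. A quick arithmetic check:

tn, fp, fn, tp = 2, 1, 0, 2                  # read off the confusion matrix above
print(tp / (tp + fp))                        # precision = 0.666...
print(tp / (tp + fn))                        # recall = 1.0
print((tp + tn) / (tn + fp + fn + tp))       # accuracy = 0.8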
Experiment 9:
Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set
for clustering using k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering using Python Programming.
Code: -
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
import sklearn.metrics as metrics
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
names = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width', 'Class']
dataset = pd.read_csv("irisdataset.csv", names=names)
x = dataset.iloc[:, :-1]
label = {'Iris-setosa': 0,'Iris-versicolor': 1, 'Iris-virginica': 2}
y = [label[c] for c in dataset.iloc[:, -1]]
plt.figure(figsize=(14,7))
colormap=np.array(['red','lime','black'])
plt.subplot(1,3,1)
plt.title('Real')
plt.scatter(x.Petal_Length,x.Petal_Width,c=colormap[y])
model=KMeans(n_clusters=3, random_state=0).fit(x)
plt.subplot(1,3,2)
plt.title('K-Means')
plt.scatter(x.Petal_Length,x.Petal_Width,c=colormap[model.labels_])
print('The accuracy score of K-Mean: ',metrics.accuracy_score(y, model.labels_))
print('The Confusion matrix of K-Mean:\n',metrics.confusion_matrix(y, model.labels_))
gmm=GaussianMixture(n_components=3, random_state=0).fit(x)
y_cluster_gmm=gmm.predict(x)
plt.subplot(1,3,3)
plt.title('GMM Classification')
plt.scatter(x.Petal_Length,x.Petal_Width,c=colormap[y_cluster_gmm])
print('The accuracy score of EM: ',metrics.accuracy_score(y, y_cluster_gmm))
print('The Confusion matrix of EM:\n ',metrics.confusion_matrix(y, y_cluster_gmm))
plt.show()
Dataset: -
irisdataset.csv: - (column headings are not required in the CSV file; the names are supplied in the code)
sepal-length sepal-width petal-length petal-width class
5.1 3.5 1.4 0.2 Iris-setosa
4.9 3 1.4 0.2 Iris-setosa
4.7 3.2 1.3 0.2 Iris-setosa
4.6 3.1 1.5 0.2 Iris-setosa
5 3.6 1.4 0.2 Iris-setosa
5.4 3.9 1.7 0.4 Iris-setosa
4.6 3.4 1.4 0.3 Iris-setosa
5 3.4 1.5 0.2 Iris-setosa
4.4 2.9 1.4 0.2 Iris-setosa
4.9 3.1 1.5 0.1 Iris-setosa
7 3.2 4.7 1.4 Iris-versicolor
6.4 3.2 4.5 1.5 Iris-versicolor
6.9 3.1 4.9 1.5 Iris-versicolor
5.5 2.3 4 1.3 Iris-versicolor
6.5 2.8 4.6 1.5 Iris-versicolor
5.7 2.8 4.5 1.3 Iris-versicolor
6.3 3.3 4.7 1.6 Iris-versicolor
4.9 2.4 3.3 1 Iris-versicolor
6.6 2.9 4.6 1.3 Iris-versicolor
5.2 2.7 3.9 1.4 Iris-versicolor
6.3 3.3 6 2.5 Iris-virginica
5.8 2.7 5.1 1.9 Iris-virginica
7.1 3 5.9 2.1 Iris-virginica
6.3 2.9 5.6 1.8 Iris-virginica
6.5 3 5.8 2.2 Iris-virginica
7.6 3 6.6 2.1 Iris-virginica
4.9 2.5 4.5 1.7 Iris-virginica
7.3 2.9 6.3 1.8 Iris-virginica
6.7 2.5 5.8 1.8 Iris-virginica
7.2 3.6 6.1 2.5 Iris-virginica
Output: -
The accuracy score of K-Mean: 0.4
The Confusion matrix of K-Mean:
[[10 0 0]
[ 0 0 10]
[ 0 8 2]]
The accuracy score of EM: 0.03333333333333333
The Confusion matrix of EM:
[[ 0 10 0]
[ 6 0 4]
[ 9 0 1]]
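Note: - accuracy_score is a poor yardstick here because cluster indices are arbitrary: the K-Means confusion matrix shows nearly perfect clusters whose index labels are merely permuted, and the EM score of 0.033 reflects the same permutation effect. A permutation-invariant measure such as the adjusted Rand index compares the groupings themselves; a short sketch reusing y, model and y_cluster_gmm from the code above:

from sklearn.metrics import adjusted_rand_score

print('ARI of K-Means:', adjusted_rand_score(y, model.labels_))
print('ARI of EM     :', adjusted_rand_score(y, y_cluster_gmm))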
Experiment 10:
Implement the non-parametric Locally Weighted Regression algorithm in order to fit
data points. Select appropriate data set for your experiment and draw graphs.
Code: -
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

def kernel(point, xmat, k):
    m, n = np.shape(xmat)
    weights = np.mat(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        # Gaussian kernel: points near the query get weights close to 1
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * k**2))
    return weights

def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    # weighted least-squares solution: (X^T W X)^-1 X^T W y
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred

data = pd.read_csv('lowweightregressiondataset.csv')
bill = np.array(data.total_bill)
tip = np.array(data.tip)
mbill = np.mat(bill)
mtip = np.mat(tip)
m= np.shape(mbill)[1]
one = np.mat(np.ones(m))
x= np.hstack((one.T,mbill.T))
ypred = localWeightRegression(x,mtip,3)
SortIndex = x[:,1].argsort(0)
xsort = x[SortIndex][:,0]
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.scatter(bill,tip, color='green')
ax.plot(xsort[:,1],ypred[SortIndex], color = 'red', linewidth=3)
plt.xlabel('Total Bill')
plt.ylabel('Tip')
plt.show()
Dataset: -
lowweightregressiondataset.csv:
Output: -
(The program displays a graph: a green scatter of total_bill versus tip with the locally weighted regression fit for k = 3 drawn as a red curve.)
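Note: - The bandwidth k controls how local the fit is: a small k tracks the data closely and can overfit noise, while a large k approaches an ordinary least-squares line. Since the tips dataset is not reproduced above, here is a self-contained sketch on synthetic data (every name and value in it is illustrative):

import numpy as np

def lwr_predict(xq, X, y, k):
    # locally weighted prediction at query point xq; X carries a leading column of ones
    w = np.exp(-np.sum((X - xq) ** 2, axis=1) / (2.0 * k ** 2))   # Gaussian weights
    W = np.diag(w)
    theta = np.linalg.pinv(X.T @ W @ X) @ (X.T @ W @ y)           # (X^T W X)^-1 X^T W y
    return xq @ theta

rng = np.random.default_rng(0)
xs = np.linspace(0, 2 * np.pi, 50)
ys = np.sin(xs) + 0.1 * rng.standard_normal(50)
X = np.column_stack([np.ones_like(xs), xs])
for k in (0.1, 0.5, 2.0):
    pred = np.array([lwr_predict(X[i], X, ys, k) for i in range(len(xs))])
    print('k=%.1f  RMSE vs sin(x): %.3f' % (k, np.sqrt(np.mean((pred - np.sin(xs)) ** 2))))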