
MACHINE LEARNING WITH PYTHON

LABORATORY
JNTUA COLLEGE OF ENGINEERING (AUTONOMOUS)
ANANTAPUR

Department of Computer Science and Engineering


Master of Computer Applications (MCA), R20

Prepared by
Potte Thumucherla Khasim Baba
Admission No: - 21001F0056



Experiment No.  Experiment Name

1   Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .csv file.

2   For a given set of training data examples stored in a .csv file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

3   Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

4   Write a Python program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions.

5   Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.

6   Write a program to implement the naive Bayesian classifier for a sample training data set stored as a .csv file. Compute the accuracy of the classifier, considering a few test data sets.

7   Write a Python program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.

8   Assuming a set of documents that need to be classified, use the naive Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision and recall for your data set.

9   Apply the EM algorithm to cluster a set of data stored in a .csv file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering using Python programming.

10  Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.


Experiment 1:
Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a .csv
file.

Code: -
import csv

a = []
print("The given Training Data Set")
with open('enjoysport.csv', 'r') as csvfile:
    for row in csv.reader(csvfile):
        a.append(row)
print(a)
print("\nThe total number of training instances are : ", len(a))

num_attribute = len(a[0]) - 1
print("\nThe initial hypothesis is : ")
hypothesis = ['0'] * num_attribute   # '0' marks the maximally specific (empty) value
print(hypothesis)

for i in range(0, len(a)):
    # generalise the hypothesis only on positive examples; FIND-S ignores negatives
    if a[i][num_attribute] == 'yes':
        for j in range(0, num_attribute):
            if hypothesis[j] == '0' or hypothesis[j] == a[i][j]:
                hypothesis[j] = a[i][j]
            else:
                hypothesis[j] = '?'
    print("\nThe hypothesis for the training instance {} is :\n".format(i + 1), hypothesis)

print("\nThe Maximally specific hypothesis for the training instance is ")
print(hypothesis)
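
As a quick check, the learned hypothesis can classify an unseen instance: an instance is positive exactly when it agrees with every attribute the hypothesis still constrains. A minimal sketch (the helper and the test instance below are made up for illustration, not part of the original listing):

def matches(hypothesis, instance):
    # '?' matches anything; a concrete value must agree exactly
    return all(h == '?' or h == v for h, v in zip(hypothesis, instance))

test = ['sunny', 'warm', 'high', 'strong', 'cool', 'change']
print("EnjoySport prediction:", 'yes' if matches(hypothesis, test) else 'no')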


Dataset: -
Sky AirTemp Humidity Wind Water Forecast Enjoysport

sunny warm normal strong warm same yes

sunny warm high strong warm same yes

rainy cold high strong warm change no

sunny warm high strong cool change yes

enjoysport.csv: -
sunny warm normal strong warm same yes

sunny warm high strong warm same yes

rainy cold high strong warm change no

sunny warm high strong cool change yes


Output: -
The given Training Data Set
[['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes'], ['sunny', 'warm', 'high', 'strong',
'warm', 'same', 'yes'], ['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no'], ['sunny', 'warm',
'high', 'strong', 'cool', 'change', 'yes']]

The total number of training instances are :  4

The initial hypothesis is :
['0', '0', '0', '0', '0', '0']

The hypothesis for the training instance 1 is :
['sunny', 'warm', 'normal', 'strong', 'warm', 'same']

The hypothesis for the training instance 2 is :
['sunny', 'warm', '?', 'strong', 'warm', 'same']

The hypothesis for the training instance 3 is :
['sunny', 'warm', '?', 'strong', 'warm', 'same']

The hypothesis for the training instance 4 is :
['sunny', 'warm', '?', 'strong', '?', '?']

The Maximally specific hypothesis for the training instance is
['sunny', 'warm', '?', 'strong', '?', '?']


Experiment 2:
For a given set of training data examples stored in a .csv file, implement and demonstrate
the Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.

Code: -
import numpy as np
import pandas as pd

data = pd.read_csv('enjoysport.csv')
print("The training data is : \n", data)
concepts = np.array(data.iloc[:, 0:-1])
print("\nThe concepts are :\n", concepts)
target = np.array(data.iloc[:, -1])
print("\nThe targets of concepts are :\n", target)

def learn(concepts, target):
    print("\nInitialization of specific_h and general_h :")
    specific_h = concepts[0].copy()
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print("General Hypothesis :\n", general_h)
    print("Specific Hypothesis :\n", specific_h)
    for i, h in enumerate(concepts):
        if target[i] == "yes":
            # positive example: generalise S and drop conflicting constraints from G
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "no":
            # negative example: specialise G just enough to exclude it
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print("\nStep", i + 1, "of Candidate Elimination Algorithm :")
        print("\nGeneral Hypothesis is :\n", general_h)
        print("\nSpecific Hypothesis is :\n", specific_h)
    # discard the fully general rows left over in G
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("\nFinal General Hypothesis :", g_final, sep="\n")
print("\nFinal Specific Hypothesis :", s_final, sep="\n")

Dataset: -

enjoysport.csv:
Sky AirTemp Humidity Wind Water Forecast Enjoysport

sunny warm normal strong warm same yes

sunny warm high strong warm same yes

rainy cold high strong warm change no

sunny warm high strong cool change yes


Output: -
The training data is :
Sky AirTemp Humidity Wind Water Forecast Enjoysport
0 sunny warm normal strong warm same yes
1 sunny warm high strong warm same yes
2 rainy cold high strong warm change no
3 sunny warm high strong cool change yes

The concepts are :
[['sunny' 'warm' 'normal' 'strong' 'warm' 'same']
 ['sunny' 'warm' 'high' 'strong' 'warm' 'same']
 ['rainy' 'cold' 'high' 'strong' 'warm' 'change']
 ['sunny' 'warm' 'high' 'strong' 'cool' 'change']]

The targets of concepts are :
['yes' 'yes' 'no' 'yes']

Initialization of specific_h and general_h :
General Hypothesis :
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Specific Hypothesis :
['sunny' 'warm' 'normal' 'strong' 'warm' 'same']

Step 1 of Candidate Elimination Algorithm :
General Hypothesis is :
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Specific Hypothesis is :
['sunny' 'warm' 'normal' 'strong' 'warm' 'same']

Step 2 of Candidate Elimination Algorithm :
General Hypothesis is :
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Specific Hypothesis is :
['sunny' 'warm' '?' 'strong' 'warm' 'same']

Step 3 of Candidate Elimination Algorithm :
General Hypothesis is :
[['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', 'same']]
Specific Hypothesis is :
['sunny' 'warm' '?' 'strong' 'warm' 'same']

Step 4 of Candidate Elimination Algorithm :
General Hypothesis is :
[['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Specific Hypothesis is :
['sunny' 'warm' '?' 'strong' '?' '?']

Final General Hypothesis :
[['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?']]

Final Specific Hypothesis :
['sunny' 'warm' '?' 'strong' '?' '?']


Experiment 3:
Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge to classify
a new sample.
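
For reference, the two quantities the code below computes are the entropy of a set of examples S and the information gain of an attribute A (these are the standard ID3 definitions, with p and n the fractions of positive and negative examples):

$Entropy(S) = -p \log_2 p - n \log_2 n$
$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v)$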

Code: -
import pandas as pd
import math
import numpy as np

data = pd.read_csv("decisiontreedataset.csv")
features = [feat for feat in data]
features.remove("answer")

class Node:
    def __init__(self):
        self.children = []
        self.value = ""
        self.isLeaf = False
        self.pred = ""

def entropy(examples):
    # entropy of a set of examples with a yes/no "answer" column
    pos = 0.0
    neg = 0.0
    for _, row in examples.iterrows():
        if row["answer"] == "yes":
            pos += 1
        else:
            neg += 1
    if pos == 0.0 or neg == 0.0:
        return 0.0
    else:
        p = pos / (pos + neg)
        n = neg / (pos + neg)
        return -(p * math.log(p, 2) + n * math.log(n, 2))

def info_gain(examples, attr):
    # reduction in entropy achieved by splitting on attr
    uniq = np.unique(examples[attr])
    gain = entropy(examples)
    for u in uniq:
        subdata = examples[examples[attr] == u]
        sub_e = entropy(subdata)
        gain -= (float(len(subdata)) / float(len(examples))) * sub_e
    return gain

def ID3(examples, attrs):
    # recursively grow the tree, always splitting on the highest-gain attribute
    root = Node()
    max_gain = 0
    max_feat = ""
    for feature in attrs:
        gain = info_gain(examples, feature)
        if gain > max_gain:
            max_gain = gain
            max_feat = feature
    root.value = max_feat
    uniq = np.unique(examples[max_feat])
    for u in uniq:
        subdata = examples[examples[max_feat] == u]
        if entropy(subdata) == 0.0:
            # pure subset: make a leaf
            newNode = Node()
            newNode.isLeaf = True
            newNode.value = u
            newNode.pred = np.unique(subdata["answer"])
            root.children.append(newNode)
        else:
            dummyNode = Node()
            dummyNode.value = u
            new_attrs = attrs.copy()
            new_attrs.remove(max_feat)
            child = ID3(subdata, new_attrs)
            dummyNode.children.append(child)
            root.children.append(dummyNode)
    return root

def printTree(root: Node, depth=0):
    for i in range(depth):
        print("\t", end="")
    print(root.value, end="")
    if root.isLeaf:
        print(" -> ", root.pred)
    print()
    for child in root.children:
        printTree(child, depth + 1)

def classify(root: Node, new):
    for child in root.children:
        if child.value == new[root.value]:
            if child.isLeaf:
                print("Predicted Label for new example", new, " is:", child.pred)
                return
            else:
                classify(child.children[0], new)

root = ID3(data, features)
print("Decision Tree is:")
printTree(root)
print("--------------------------------------------------------------------------------")
new = {"outlook": "sunny", "temperature": "hot", "humidity": "normal", "wind": "strong"}
classify(root, new)

Dataset: -
decisiontreedataset.csv:
outlook temperature humidity wind answer
sunny hot high weak no
sunny hot high strong no
overcast hot high weak yes
rain mild high weak yes
rain cool normal weak yes
rain cool normal strong no
overcast cool normal strong yes
sunny mild high weak no
sunny cool normal weak yes
rain mild normal weak yes
sunny mild normal strong yes
overcast mild high strong yes
overcast hot normal weak yes
rain mild high strong no


Output: -
Decision Tree is:
outlook
    overcast ->  ['yes']

    rain
        wind
            strong ->  ['no']

            weak ->  ['yes']

    sunny
        humidity
            high ->  ['no']

            normal ->  ['yes']

--------------------------------------------------------------------------------
Predicted Label for new example {'outlook': 'sunny', 'temperature': 'hot', 'humidity': 'normal', 'wind': 'strong'}  is: ['yes']


Experiment 4:
Write a Python program to implement the k-Nearest Neighbour algorithm to classify the
iris data set. Print both correct and wrong predictions.

Code: -
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
dataset = pd.read_csv("irisdataset.csv", names=names)
x = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
print(x.head())

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10)
classifier = KNeighborsClassifier(n_neighbors=5).fit(xtrain, ytrain)
ypred = classifier.predict(xtest)

i = 0
print("\n-------------------------------------------------------------------------")
print('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print("-------------------------------------------------------------------------")
for label in ytest:
    print('%-25s %-25s' % (label, ypred[i]), end="")
    if label == ypred[i]:
        print(' %-25s' % 'Correct')
    else:
        print(' %-25s' % 'Wrong')
    i = i + 1
print("-------------------------------------------------------------------------")
print("\nConfusion Matrix:\n", metrics.confusion_matrix(ytest, ypred))
print("-------------------------------------------------------------------------")
print("\nClassification Report:\n", metrics.classification_report(ytest, ypred))
print("-------------------------------------------------------------------------")
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest, ypred))
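
Once fitted, the same classifier can label a single new flower as well; a minimal sketch (the measurements below are invented for illustration):

sample = pd.DataFrame([[5.9, 3.0, 4.2, 1.5]], columns=names[:-1])   # one hypothetical flower
print("Predicted class for the sample is:", classifier.predict(sample)[0])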

Dataset: -
irisdataset.csv: - (the header row below is shown for reference only; the .csv file itself should not contain a heading row, since the program supplies column names via names=)
sepal-length sepal-width petal-length petal-width class
5.1 3.5 1.4 0.2 Iris-setosa
4.9 3 1.4 0.2 Iris-setosa
4.7 3.2 1.3 0.2 Iris-setosa
4.6 3.1 1.5 0.2 Iris-setosa
5 3.6 1.4 0.2 Iris-setosa
5.4 3.9 1.7 0.4 Iris-setosa
4.6 3.4 1.4 0.3 Iris-setosa
5 3.4 1.5 0.2 Iris-setosa
4.4 2.9 1.4 0.2 Iris-setosa
4.9 3.1 1.5 0.1 Iris-setosa
7 3.2 4.7 1.4 Iris-versicolor
6.4 3.2 4.5 1.5 Iris-versicolor
6.9 3.1 4.9 1.5 Iris-versicolor
5.5 2.3 4 1.3 Iris-versicolor
6.5 2.8 4.6 1.5 Iris-versicolor
5.7 2.8 4.5 1.3 Iris-versicolor
6.3 3.3 4.7 1.6 Iris-versicolor
4.9 2.4 3.3 1 Iris-versicolor
6.6 2.9 4.6 1.3 Iris-versicolor
5.2 2.7 3.9 1.4 Iris-versicolor
6.3 3.3 6 2.5 Iris-virginica
5.8 2.7 5.1 1.9 Iris-virginica
7.1 3 5.9 2.1 Iris-virginica
6.3 2.9 5.6 1.8 Iris-virginica
6.5 3 5.8 2.2 Iris-virginica
7.6 3 6.6 2.1 Iris-virginica
4.9 2.5 4.5 1.7 Iris-virginica
7.3 2.9 6.3 1.8 Iris-virginica
6.7 2.5 5.8 1.8 Iris-virginica
7.2 3.6 6.1 2.5 Iris-virginica

Output: -
sepal-length sepal-width petal-length petal-width
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
-------------------------------------------------------------------
Original Label Predicted Label Correct/Wrong
-------------------------------------------------------------------
Iris-virginica Iris-versicolor Wrong
Iris-virginica Iris-virginica Correct
Iris-versicolor Iris-versicolor Correct
-------------------------------------------------------------------
Confusion Matrix:
[[1 0]
[1 1]]
-------------------------------------------------------------------
Classification Report:
precision recall f1-score support
Iris-versicolor 0.50 1.00 0.67 1
Iris-virginica 1.00 0.50 0.67 2

accuracy 0.67 3
macro avg 0.75 0.75 0.67 3
weighted avg 0.83 0.67 0.67 3
-------------------------------------------------------------------
Accuracy of the classifier is 0.67


Experiment 5:
Build an Artificial Neural Network by implementing the Backpropagation algorithm
and test the same using appropriate data sets.
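
The listing below trains a 2-3-1 network with sigmoid units. As a sketch of the standard backpropagation equations it follows (with learning rate $\eta$):

$\sigma(z) = \frac{1}{1 + e^{-z}}$, whose derivative in terms of its output $a$ is $\sigma'(a) = a(1 - a)$
$\delta_{out} = (y - \hat{y}) \cdot \sigma'(output)$
$\delta_{hidden} = (\delta_{out} W_{out}^{T}) \cdot \sigma'(a_{hidden})$
$W \leftarrow W + \eta \, a^{T} \delta$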

Code: -
import numpy as np

x = np.array(([1, 2], [3, 4], [5, 6]), dtype=float)
y = np.array(([30], [60], [90]), dtype=float)
x = x / np.amax(x, axis=0)   # normalise each input column to [0, 1]
y = y / 100                  # scale the outputs to [0, 1]

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derivatives_sigmoid(x):
    # derivative of the sigmoid, expressed in terms of its output
    return x * (1 - x)

epoch = 5000
lr = 0.1
inputlayer_neurons = 2
hiddenlayer_neurons = 3
output_neurons = 1

wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # forward pass
    hinp = np.dot(x, wh) + bh
    hlayer_act = sigmoid(hinp)
    outinp = np.dot(hlayer_act, wout) + bout
    output = sigmoid(outinp)
    # backward pass
    EO = y - output
    d_output = EO * derivatives_sigmoid(output)
    EH = d_output.dot(wout.T)
    d_hiddenlayer = EH * derivatives_sigmoid(hlayer_act)
    wout += hlayer_act.T.dot(d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr       # update the output bias as well
    wh += x.T.dot(d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr    # update the hidden bias as well

print("Input: \n", str(x))
print("Actual Output: \n", str(y))
print("Predicted Output: \n", output)


Output: -
Input:
[[0.2 0.33333333]
[0.6 0.66666667]
[1. 1. ]]
Actual Output:
[[0.3]
[0.6]
[0.9]]
Predicted Output:
[[0.33497618]
[0.64084238]
[0.80288884]]


Experiment 6:
Write a program to implement the naive Bayesian classifier for a sample training data
set stored as a .csv file. Compute the accuracy of the classifier, considering a few test data sets.
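
The classifier rests on Bayes' theorem together with the "naive" assumption that attributes are conditionally independent given the class, so a new instance with attribute values $x_1, \dots, x_n$ is assigned the class

$\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)$

GaussianNB, used below, models each $P(x_i \mid c)$ as a normal distribution fitted to the label-encoded attribute values.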

Code: -
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
data = pd.read_csv('playtennis.csv')
print("The first 5 values of data is :\n",data.head())
x = data.iloc[:,:-1]
print("\nThe First 5 values of train data is\n",x.head())
y = data.iloc[:,-1]
print("\nThe first 5 values of Train output is\n",y.head())
le_outlook = LabelEncoder()
x.Outlook = le_outlook.fit_transform(x.Outlook)
le_Temperature = LabelEncoder()
x.Temperature = le_Temperature.fit_transform(x.Temperature)
le_Humidity = LabelEncoder()
x.Humidity = le_Humidity.fit_transform(x.Humidity)
le_Windy = LabelEncoder()
x.Windy = le_Windy.fit_transform(x.Windy)
print("\nNow the Train data is :\n",x.head())
le_PlayTennis = LabelEncoder()
y = le_PlayTennis.fit_transform(y)
print("\nNow the Train output is\n",y)
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.20)

classifier = GaussianNB()
classifier.fit(x_train,y_train)
print("Accuracy is:",accuracy_score(classifier.predict(x_test),y_test))

Dataset: -
playtennis.csv:
Outlook Temperature Humidity Windy PlayTennis
rainy cold high weak no
rainy cold normal mild no
sunny warm normal weak yes
rainy cold normal weak no
sunny mild mild strong yes
cloudy cold high weak yes
rainy warm normal strong no
sunny mild mild mild yes
cloudy warm mild mild yes
cloudy cold high high no


Output: -
The first 5 values of data is :
Outlook Temperature Humidity Windy PlayTennis
0 rainy cold high weak no
1 rainy cold normal mild no
2 sunny warm normal weak yes
3 rainy cold normal weak no
4 sunny mild mild strong yes

The First 5 values of train data is
  Outlook Temperature Humidity   Windy
0   rainy        cold     high    weak
1   rainy        cold   normal    mild
2   sunny        warm   normal    weak
3   rainy        cold   normal    weak
4   sunny        mild     mild  strong

The first 5 values of Train output is
0     no
1     no
2    yes
3     no
4    yes
Name: PlayTennis, dtype: object

Now the Train data is :
   Outlook  Temperature  Humidity  Windy
0        1            0         0      3
1        1            0         2      1
2        2            2         2      3
3        1            0         2      3
4        2            1         1      2

Now the Train output is
[0 0 1 0 1 1 0 1 1 0]
Accuracy is: 0.5


Experiment 7:
Write a Python program to construct a Bayesian network considering medical data. Use
this model to demonstrate the diagnosis of heart patients using the standard Heart Disease
Data Set.

Code: -
import numpy as np
import pandas as pd
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianNetwork
from pgmpy.inference import VariableElimination
heartDisease = pd.read_csv('naivebayesnetworkdataset.csv')
heartDisease = heartDisease.replace('?',np.nan)
print('Sample instances from the dataset are:')
print(heartDisease.head())
print('\n Attributes and datatypes of dataset are:')
print(heartDisease.dtypes)
model=BayesianNetwork([('age','heartdisease'),('sex','heartdisease'),('exang','heartdisease'),
('cp','heartdisease'),('heartdisease','restecg'),('heartdisease','chol')])
print('\nLearning CPD using Maximum Likelihood Estimator')
model.fit(heartDisease,estimator=MaximumLikelihoodEstimator)
print('\n Inferencing with Bayesian Network:')
HeartDiseaseTest_infer = VariableElimination(model)
print('\n 1. Probability of HeartDisease given evidence= restecg')
q1=HeartDiseaseTest_infer.query(variables=['heartdisease'],evidence={'restecg':1})
print(q1)
print('\n 2. Probability of HeartDisease given evidence= cp ')
q2=HeartDiseaseTest_infer.query(variables=['heartdisease'],evidence={'cp':2})
print(q2)
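
VariableElimination also accepts several evidence variables at once, so richer diagnostic questions can be posed. A hedged extra query (not part of the original experiment) conditioning on both restecg and cp:

q3 = HeartDiseaseTest_infer.query(variables=['heartdisease'], evidence={'restecg': 1, 'cp': 2})
print(q3)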


Dataset: -
naivebayesnetworkdataset.csv:

age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal heartdisease
62 1 2 128 208 1 2 140 0 0 1 0 3 0
57 1 4 110 201 0 0 126 1 1.5 2 0 6 0
58 1 4 146 218 0 0 105 0 2 2 1 7 1
64 1 4 128 263 0 0 105 1 0.2 2 1 7 0
51 0 3 120 295 0 2 157 0 0.6 1 0 3 0
43 1 4 115 303 0 0 181 0 1.2 2 0 3 0
42 0 3 120 209 0 0 173 0 0 2 0 3 0
67 0 4 106 223 0 0 142 0 0.3 1 2 3 0
76 0 3 140 197 0 1 116 0 1.1 2 0 3 0
70 1 2 156 245 0 2 143 0 0 1 0 3 0
57 1 2 124 261 0 0 141 0 0.3 1 0 7 1
44 0 3 118 242 0 0 149 0 0.3 2 1 3 0
58 0 2 136 319 1 2 152 0 0 1 2 3 3
60 0 1 150 240 0 0 171 0 0.9 1 0 3 0
44 1 3 120 226 0 0 169 0 0 1 0 3 0
61 1 4 138 166 0 2 125 1 3.6 2 1 3 4
42 1 4 136 315 0 0 125 1 1.8 2 0 6 2
52 1 4 128 204 1 0 156 1 1 2 0 ? 2
59 1 3 126 218 1 0 134 0 2.2 2 1 6 2
40 1 4 152 223 0 0 181 0 0 1 0 7 1
58 0 4 130 197 0 0 131 0 0.6 2 0 3 0
57 1 4 110 335 0 0 143 1 3 2 1 7 2
47 1 3 130 253 0 0 179 0 0 1 0 3 0
55 0 4 128 205 0 1 130 1 2 2 1 7 3
35 1 2 122 192 0 0 174 0 0 1 0 3 0
61 1 4 148 203 0 0 161 0 0 1 1 7 2
58 1 4 114 318 0 1 140 0 4.4 3 3 6 4
58 0 4 170 225 1 2 146 1 2.8 2 2 6 2
58 1 2 125 220 0 0 144 0 0.4 2 ? 7 0
56 1 2 130 221 0 2 163 0 0 1 0 7 0


Output: -
Sample instances from the dataset are:
age sex cp trestbps chol ... oldpeak slope ca thal heartdisease
0 62 1 2 128 208 ... 0.0 1 0 3 0
1 57 1 4 110 201 ... 1.5 2 0 6 0
2 58 1 4 146 218 ... 2.0 2 1 7 1
3 64 1 4 128 263 ... 0.2 2 1 7 0
4 51 0 3 120 295 ... 0.6 1 0 3 0

[5 rows x 14 columns]

Attributes and datatypes of dataset are:
age int64
sex int64
cp int64
trestbps int64
chol int64
fbs int64
restecg int64
thalach int64
exang int64
oldpeak float64
slope int64
ca object
thal object
heartdisease int64
dtype: object

Learning CPD using Maximum Likelihood Estimator

Inferencing with Bayesian Network:


1. Probability of HeartDisease given evidence= restecg
+-----------------+---------------------+
| heartdisease | phi(heartdisease) |
+=================+=====================+
| heartdisease(0) | 0.0793 |
+-----------------+---------------------+
| heartdisease(1) | 0.0000 |
+-----------------+---------------------+
| heartdisease(2) | 0.0000 |
+-----------------+---------------------+
| heartdisease(3) | 0.4404 |
+-----------------+---------------------+
| heartdisease(4) | 0.4803 |
+-----------------+---------------------+

2. Probability of HeartDisease given evidence= cp
+-----------------+---------------------+
| heartdisease | phi(heartdisease) |
+=================+=====================+
| heartdisease(0) | 0.2352 |
+-----------------+---------------------+
| heartdisease(1) | 0.2180 |
+-----------------+---------------------+
| heartdisease(2) | 0.1663 |
+-----------------+---------------------+
| heartdisease(3) | 0.2142 |
+-----------------+---------------------+
| heartdisease(4) | 0.1663 |
+-----------------+---------------------+


Experiment 8:
Assuming a set of documents that need to be classified, use the naive Bayesian Classifier
model to perform this task. Built-in Java classes/API can be used to write the program.
Calculate the accuracy, precision and recall for your data set.

Code: -
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

msg = pd.read_csv('naivebayestextdoc.csv', encoding='unicode_escape', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
x = msg.message
y = msg.labelnum

xtrain, xtest, ytrain, ytest = train_test_split(x, y)

count_v = CountVectorizer()
xtrain_dm = count_v.fit_transform(xtrain)   # learn the vocabulary from the training split
xtest_dm = count_v.transform(xtest)         # reuse the same vocabulary for the test split
df = pd.DataFrame(xtrain_dm.toarray(), columns=count_v.get_feature_names_out())
print('\nFeatures for first 5 training instances are:\n')
print(df[0:5])

clf = MultinomialNB()
clf.fit(xtrain_dm, ytrain)
pred = clf.predict(xtest_dm)

print('\nClassification results of testing samples are:\n')
for doc, p in zip(xtest, pred):             # pair each test document with its prediction
    p = 'pos' if p == 1 else 'neg'
    print("%s -> %s" % (doc, p))

print('\nAccuracy Metrics: \n')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix: \n', confusion_matrix(ytest, pred))
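
The fitted vectorizer and classifier can also score a brand-new sentence; a minimal sketch (the sentence is invented for illustration):

new_doc = ["I really enjoyed this place"]
new_dm = count_v.transform(new_doc)   # reuse the vocabulary learned from the training split
print(new_doc[0], '->', 'pos' if clf.predict(new_dm)[0] == 1 else 'neg')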

Dataset: -
naivebayestextdoc.csv: (the header row below is shown for reference only; the .csv file itself should not contain a heading row, since the program supplies column names via names=)

Message Label
I love this sandwich pos
This is an amazing place pos
I feel very good about these beers pos
This is my best work pos
What an awesome view pos
I do not like this restaurant neg
I am tired of this stuff neg
I can’t deal with this neg
He is my sworn enemy neg
My boss is horrible neg
This is an awesome place pos
I do not like the taste of this juice neg
I love to dance pos
I am sick and tired of this place neg
What a great holiday pos
That is a bad locality to stay neg
We will have good fun tomorrow pos
I went to my enemy’s house today neg


Output: -
Total Instances of Dataset: 18

Features for first 5 training instances are:

about am an and awesome bad ... view we what will with work
0 0 0 0 0 0 0 ... 0 0 0 0 0 1
1 0 0 0 0 0 0 ... 0 0 0 0 0 0
2 0 0 0 0 0 0 ... 0 0 0 0 0 0
3 1 0 0 0 0 0 ... 0 0 0 0 0 0
4 0 0 0 0 0 0 ... 0 0 1 0 0 0

[5 rows x 47 columns]

Classification results of testing samples are:

This is my best work -> pos
I do not like the taste of this juice -> neg
I love this sandwich -> pos
I feel very good about these beers -> neg
What a great holiday -> pos

Accuracy Metrics:

Accuracy: 0.8
Recall: 1.0
Precision: 0.6666666666666666
Confusion Matrix:
[[2 1]
[0 2]]


Experiment 9:
Apply the EM algorithm to cluster a set of data stored in a .csv file. Use the same data set
for clustering using the k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering using Python programming.

Code: -
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
import sklearn.metrics as metrics
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
names = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width', 'Class']
dataset = pd.read_csv("irisdataset.csv", names=names)
x = dataset.iloc[:, :-1]
label = {'Iris-setosa': 0,'Iris-versicolor': 1, 'Iris-virginica': 2}
y = [label[c] for c in dataset.iloc[:, -1]]
plt.figure(figsize=(14,7))
colormap=np.array(['red','lime','black'])
plt.subplot(1,3,1)
plt.title('Real')
plt.scatter(x.Petal_Length,x.Petal_Width,c=colormap[y])
model=KMeans(n_clusters=3, random_state=0).fit(x)
plt.subplot(1,3,2)
plt.title('K-Means')
plt.scatter(x.Petal_Length,x.Petal_Width,c=colormap[model.labels_])
print('The accuracy score of K-Mean: ',metrics.accuracy_score(y, model.labels_))
print('The Confusion matrix of K-Mean:\n',metrics.confusion_matrix(y, model.labels_))
gmm=GaussianMixture(n_components=3, random_state=0).fit(x)
y_cluster_gmm=gmm.predict(x)

plt.subplot(1,3,3)
plt.title('GMM Classification')
plt.scatter(x.Petal_Length,x.Petal_Width,c=colormap[y_cluster_gmm])
print('The accuracy score of EM: ',metrics.accuracy_score(y, y_cluster_gmm))
print('The Confusion matrix of EM:\n ',metrics.confusion_matrix(y, y_cluster_gmm))
plt.show()
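
A caveat when reading the scores above: accuracy_score compares raw cluster numbers against class labels, so a good clustering can score badly simply because the cluster IDs come out permuted. A permutation-invariant measure such as the adjusted Rand index gives a fairer comparison; a minimal sketch:

from sklearn.metrics import adjusted_rand_score
print('Adjusted Rand index of K-Means:', adjusted_rand_score(y, model.labels_))
print('Adjusted Rand index of EM:', adjusted_rand_score(y, y_cluster_gmm))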

Dataset: -
irisdataset.csv: - (the header row below is shown for reference only; the .csv file itself should not contain a heading row, since the program supplies column names via names=)
sepal-length sepal-width petal-length petal-width class
5.1 3.5 1.4 0.2 Iris-setosa
4.9 3 1.4 0.2 Iris-setosa
4.7 3.2 1.3 0.2 Iris-setosa
4.6 3.1 1.5 0.2 Iris-setosa
5 3.6 1.4 0.2 Iris-setosa
5.4 3.9 1.7 0.4 Iris-setosa
4.6 3.4 1.4 0.3 Iris-setosa
5 3.4 1.5 0.2 Iris-setosa
4.4 2.9 1.4 0.2 Iris-setosa
4.9 3.1 1.5 0.1 Iris-setosa
7 3.2 4.7 1.4 Iris-versicolor
6.4 3.2 4.5 1.5 Iris-versicolor
6.9 3.1 4.9 1.5 Iris-versicolor
5.5 2.3 4 1.3 Iris-versicolor
6.5 2.8 4.6 1.5 Iris-versicolor
5.7 2.8 4.5 1.3 Iris-versicolor
6.3 3.3 4.7 1.6 Iris-versicolor
4.9 2.4 3.3 1 Iris-versicolor
6.6 2.9 4.6 1.3 Iris-versicolor
5.2 2.7 3.9 1.4 Iris-versicolor
6.3 3.3 6 2.5 Iris-virginica
5.8 2.7 5.1 1.9 Iris-virginica
7.1 3 5.9 2.1 Iris-virginica
6.3 2.9 5.6 1.8 Iris-virginica
6.5 3 5.8 2.2 Iris-virginica
7.6 3 6.6 2.1 Iris-virginica
4.9 2.5 4.5 1.7 Iris-virginica
7.3 2.9 6.3 1.8 Iris-virginica
6.7 2.5 5.8 1.8 Iris-virginica
7.2 3.6 6.1 2.5 Iris-virginica

Output: -
The accuracy score of K-Mean: 0.4
The Confusion matrix of K-Mean:
[[10 0 0]
[ 0 0 10]
[ 0 8 2]]
The accuracy score of EM: 0.03333333333333333
The Confusion matrix of EM:
[[ 0 10 0]
[ 6 0 4]
[ 9 0 1]]


Experiment 10:
Implement the non-parametric Locally Weighted Regression algorithm in order to fit
data points. Select an appropriate data set for your experiment and draw graphs.
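
At each query point $x$, Locally Weighted Regression solves a separate weighted least-squares problem, with weights that decay with distance from $x$ ($k$ is the bandwidth). These are the standard LWR equations that the code below follows:

$w_j(x) = \exp\!\left(-\frac{(x - x_j)^2}{2k^2}\right)$
$\hat{\beta}(x) = (X^{T} W X)^{-1} X^{T} W y$
$\hat{y}(x) = x \, \hat{\beta}(x)$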

Code: -
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
def kernel(point, xmat, k):
    # build the diagonal weight matrix for one query point
    m, n = np.shape(xmat)
    weights = np.mat(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * k**2))
    return weights

def localWeight(point, xmat, ymat, k):
    # weighted least-squares solution at one query point
    wei = kernel(point, xmat, k)
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred

data = pd.read_csv('lowweightregressiondataset.csv')
bill = np.array(data.total_bill)
tip = np.array(data.tip)
mbill = np.mat(bill)
mtip = np.mat(tip)
m = np.shape(mbill)[1]
one = np.mat(np.ones(m))
x = np.hstack((one.T, mbill.T))   # design matrix with a bias column

ypred = localWeightRegression(x, mtip, 3)
SortIndex = x[:, 1].argsort(0)
xsort = x[SortIndex][:, 0]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=3)
plt.xlabel('Total Bill')
plt.ylabel('Tip')
plt.show()


Dataset: -
lowweightregressiondataset.csv:

total_bill tip sex smoker day time size


28.17 6.5 Female Yes Sat Dinner 3
12.9 1.1 Female Yes Sat Dinner 2
28.15 3 Male Yes Sat Dinner 5
11.59 1.5 Male Yes Sat Dinner 2
7.74 1.44 Male Yes Sat Dinner 2
30.14 3.09 Female Yes Sat Dinner 4
12.16 2.2 Male Yes Fri Lunch 2
13.42 3.48 Female Yes Fri Lunch 2
8.58 1.92 Male Yes Fri Lunch 1
15.98 3 Female No Fri Lunch 3
13.42 1.58 Male Yes Fri Lunch 2
16.27 2.5 Female Yes Fri Lunch 2
10.09 2 Female Yes Fri Lunch 2
20.45 3 Male No Sat Dinner 4
13.28 2.72 Male No Sat Dinner 2
13.51 2 Male Yes Thur Lunch 2
18.71 4 Male Yes Thur Lunch 3
12.74 2.01 Female Yes Thur Lunch 2
13 2 Female Yes Thur Lunch 2
16.4 2.5 Female Yes Thur Lunch 2
20.53 4 Male Yes Thur Lunch 4
16.47 3.23 Female Yes Thur Lunch 3
26.59 3.41 Male Yes Sat Dinner 3
38.73 3 Male Yes Sat Dinner 4
24.27 2.03 Male Yes Sat Dinner 2
12.76 2.23 Female Yes Sat Dinner 2
30.06 2 Male Yes Sat Dinner 3
25.89 5.16 Male Yes Sat Dinner 4
48.33 9 Male No Sat Dinner 4
13.27 2.5 Female Yes Sat Dinner 2


Output: -

[Figure: scatter plot of Total Bill vs Tip, with the data points in green and the fitted Locally Weighted Regression curve drawn in red]
