Ad3461-ML Manual
Ex.No.1
IMPLEMENTATION OF CANDIDATE ELIMINATION ALGORITHM
AIM:
Write a program to implement the Candidate Elimination Algorithm using python script.
ALGORITHM:
Step 1: Initialize the version space
o Initialize the most general hypothesis (h_G):
Set h_G to the maximally general hypothesis where all attributes are '?'. This means that h_G can classify any instance as positive.
o Initialize the most specific hypothesis (h_S):
Set h_S to the maximally specific hypothesis where all attributes are set to 'null' or the most specific values possible. This means that h_S does not classify any instance as positive initially.
Step 2: Iterate through the training examples
o For each positive example, generalize h_S just enough to cover the example, and remove from h_G any hypothesis inconsistent with the example.
o For each negative example, specialize h_G just enough to exclude the example, and remove h_S if it covers the example.
Step 3: Output the final specific boundary h_S and general boundary h_G of the version space.
PROGRAM:
import numpy as np
import pandas as pd

data = pd.DataFrame(data=pd.read_csv('finds1.csv'))
concepts = np.array(data.iloc[:, 0:-1])
target = np.array(data.iloc[:, -1])

def learn(concepts, target):
    specific_h = concepts[0].copy()
    print("Initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        # positive example: generalize specific_h
        if target[i] == "Yes":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        # negative example: specialize general_h
        if target[i] == "No":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print("Steps of Candidate Elimination Algorithm", i + 1)
        print(specific_h)
        print("general_h", i + 1, "\n")
        print(general_h)
    # drop the rows of general_h that stayed fully general
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")
OUTPUT:
['Cloudy' 'Cold' 'High' 'Strong' 'Warm' 'Change']
Steps of Candidate Elimination Algorithm 8
Specific_h 8
['?' '?' '?' 'Strong' '?' '?']
general_h 8
Final Specific_h:
['?' '?' '?' 'Strong' '?' '?']
Final General_h:
[['?', '?', '?', 'Strong', '?', '?']]
RESULT:
Thus the program to implement the Candidate Elimination Algorithm using python script has been implemented successfully.
Ex.No.2
IMPLEMENTATION OF DECISION TREE BASED ID3 ALGORITHM
AIM:
To write a program to implement the decision tree based ID3 Algorithm using python script.
ALGORITHM:
PROGRAM:
import pandas as pd
import numpy as np

dataset = pd.read_csv('playtennis.csv', names=['outlook', 'temperature', 'humidity', 'wind', 'class'])

def entropy(target_col):
    elements, counts = np.unique(target_col, return_counts=True)
    entropy = np.sum([(-counts[i] / np.sum(counts)) * np.log2(counts[i] / np.sum(counts))
                      for i in range(len(elements))])
    return entropy

def InfoGain(data, split_attribute_name, target_name="class"):
    total_entropy = entropy(data[target_name])
    vals, counts = np.unique(data[split_attribute_name], return_counts=True)
    Weighted_Entropy = np.sum([(counts[i] / np.sum(counts)) *
                               entropy(data.where(data[split_attribute_name] == vals[i]).dropna()[target_name])
                               for i in range(len(vals))])
    Information_Gain = total_entropy - Weighted_Entropy
    return Information_Gain

def ID3(data, originaldata, features, target_attribute_name="class", parent_node_class=None):
    # if all target values are the same, return that class
    if len(np.unique(data[target_attribute_name])) <= 1:
        return np.unique(data[target_attribute_name])[0]
    # if the dataset is empty, return the majority class of the original dataset
    elif len(data) == 0:
        return np.unique(originaldata[target_attribute_name])[
            np.argmax(np.unique(originaldata[target_attribute_name], return_counts=True)[1])]
    # if there are no features left, return the parent node's class
    elif len(features) == 0:
        return parent_node_class
    else:
        parent_node_class = np.unique(data[target_attribute_name])[
            np.argmax(np.unique(data[target_attribute_name], return_counts=True)[1])]
        # information gain values for the features in the dataset
        item_values = [InfoGain(data, feature, target_attribute_name) for feature in features]
        best_feature_index = np.argmax(item_values)
        best_feature = features[best_feature_index]
        tree = {best_feature: {}}
        features = [i for i in features if i != best_feature]
        for value in np.unique(data[best_feature]):
            sub_data = data.where(data[best_feature] == value).dropna()
            subtree = ID3(sub_data, dataset, features, target_attribute_name, parent_node_class)
            tree[best_feature][value] = subtree
        return tree

tree = ID3(dataset, dataset, dataset.columns[:-1])
print('Display Tree')
print(tree)
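The tree that ID3 returns is a nested dictionary. As a minimal sketch (not part of the original listing), a small helper can walk that dictionary to classify a new sample; the sample below uses attribute values that appear in the play-tennis data.

def classify(query, tree):
    # descend the nested dictionary until a leaf (class label) is reached
    for attribute in query:
        if attribute in tree:
            result = tree[attribute][query[attribute]]
            return classify(query, result) if isinstance(result, dict) else result

sample = {'outlook': 'Sunny', 'temperature': 'Hot', 'humidity': 'High', 'wind': 'Weak'}
print(classify(sample, tree))   # 'No' for this sample, given the tree shown in the output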
OUTPUT:
Display Tree
{'outlook':{'Overcast':'Yes','Rain':{'wind':{'Strong':'No','Weak':'Yes'}},'Sunny':
{'humidity':{'High':'No','Normal':'Yes'}}}}
RESULT:
Thus the program to implement the decision tree based ID3 Algorithm using python script has
been implemented successfully.
Ex.No.3
IMPLEMENTATION OF ARTIFICIAL NEURAL NETWORK USING
BACKPROPAGATION ALGORITHM
AIM:
To implement an artificial neural network using the Backpropagation algorithm with Python.
ALGORITHM:
Step 1: Inputs X arrive through the preconnected path
Initialize the input data X.
Step 2: The input is modeled using true weights W. Weights are usually chosen randomly.
Initialize weights randomly for the connections between layers.
Step 3: Calculate the output of each neuron from the input layer to the hidden layer to the
output layer
Define the activation function (e.g., sigmoid) and the forward propagation function.
Step 4: Calculate the error in the outputs
Calculate the error (difference between the predicted output and the actual output).
Step 5: From the output layer, go back to the hidden layer to adjust the weights to reduce the
error
Implement backpropagation to update the weights.
Step 6: Repeat the process until the desired output is achieved
Iterate through forward propagation and backpropagation until the network converges (a one-neuron sketch of this weight update is shown below, before the full program).
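Before the full program, the weight update at the heart of Steps 4 and 5 can be illustrated on a single sigmoid neuron. This is a minimal sketch; the input values, weights, target and learning rate below are made up for illustration and are not part of the manual's program.

import numpy as np

# one sigmoid neuron, one training example (illustrative values only)
x = np.array([0.5, 0.8])          # inputs
w = np.array([0.1, -0.3])         # current weights
target = 1.0                      # desired output
learning_rate = 0.5

out = 1 / (1 + np.exp(-np.dot(x, w)))   # forward pass
error = target - out                    # Step 4: output error
delta = error * out * (1 - out)         # error times the sigmoid derivative
w = w + learning_rate * delta * x       # Step 5: weight update
print(w)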
PROGRAM:
import numpy as np

# Input data (XOR problem)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

np.random.seed(1)
# layer sizes and learning rate are not given in the original listing; typical values are assumed
input_layer_size, hidden_layer_size, output_layer_size = 2, 4, 1
learning_rate = 0.1
W1 = np.random.randn(input_layer_size, hidden_layer_size)
W2 = np.random.randn(hidden_layer_size, output_layer_size)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1 - x)

def forward_propagation(X, W1, W2):
    hidden_input = np.dot(X, W1)
    hidden_output = sigmoid(hidden_input)
    final_output = sigmoid(np.dot(hidden_output, W2))
    return hidden_output, final_output

def backpropagation(X, y, W1, W2, hidden_output, final_output):
    output_error = y - final_output
    output_delta = output_error * sigmoid_derivative(final_output)
    hidden_delta = output_delta.dot(W2.T) * sigmoid_derivative(hidden_output)
    W2 += hidden_output.T.dot(output_delta) * learning_rate
    W1 += X.T.dot(hidden_delta) * learning_rate
    return W1, W2

# Training the neural network
epochs = 10000
for epoch in range(epochs):
    hidden_output, final_output = forward_propagation(X, W1, W2)
    W1, W2 = backpropagation(X, y, W1, W2, hidden_output, final_output)
OUTPUT:
Input:
[[ 0.66666667 1. ]
[ 0.33333333 0.55555556]
[ 1. 0.66666667]]
Actual output:
[[ 0.92]
[ 0.86]
[ 0.89]]
Predicted Output:
[[ 0.89559591]
[ 0.88142069]
[ 0.8928407 ]]
RESULT:
Thus the implementation of backpropagation algorithm has been done successfully.
Ex.No. 4
IMPLEMENTATION OF NAÏVE BAYESIAN CLASSIFIER
AIM:
Write a program to implement the naïve bayesian classifier using python script.
ALGORITHM:
PROGRAM:
import pandas as pd

msg = pd.read_csv('naivetext1.csv', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
print(X)
print(y)

from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(X, y)
print(xtest.shape)
print(xtrain.shape)
print(ytest.shape)
print(ytrain.shape)

from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)

from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)

from sklearn import metrics
print('Accuracy metrics')
print('Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('Recall and Precision')
print(metrics.recall_score(ytest, predicted))
print(metrics.precision_score(ytest, predicted))
OUTPUT:
The dimensions of the dataset (18, 2)
1.  I love this sandwich
2.  This is an amazing place
3.  I feel very good about these beers
4.  This is my best work
5.  What an awesome view
6.  I do not like this restaurant
7.  I am tired of this stuff
8.  I can't deal with this
9.  He is my sworn enemy
10. My boss is horrible
11. This is an awesome place
12. I do not like the taste of this juice
13. I love to dance
14. I am sick and tired of this place
15. What a great holiday
16. That is a bad locality to stay
17. We will have good fun tomorrow
18. I went to my enemy's house today
0     1
1     1
2     1
3     1
4     1
5     0
6     0
7     0
8     0
9     0
10    1
11    0
12    1
13    0
14    1
15    0
16    1
17    0
Name: labelnum, dtype: int64
(5,)
(13,)
(5,)
(13,)
Accuracy metrics
Accuracy of the classifier is 0.8
Confusion matrix
[[3 1]
 [0 1]]
Recall and Precision
1.0
0.5
RESULT:
Thus the implementation of Naïve Bayesian Classifier algorithm has been done
successfully.
EX.NO.5
AIM:
To implement the Naïve Bayesian Classifier Model to Classify the document set using
Python.
ALGORITHM:
Step 3: Create a 2D array and append each document list into the array.
Step 4: Using a Set data structure, store all the keywords in a list.
Step 5: Input the text to be classified by the user (a small sketch of these steps follows this list).
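A minimal sketch of Steps 3 to 5 using scikit-learn (the sample documents, labels, and classifier choice here are assumptions for illustration; the manual's program below instead applies a Gaussian naïve Bayes to a numeric CSV file):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["I love this place", "This is an awful restaurant", "What a great holiday"]
labels = [1, 0, 1]                       # 1 = positive, 0 = negative

vect = CountVectorizer()                 # builds the keyword set from all documents
X = vect.fit_transform(docs)             # 2D array: one row of keyword counts per document
clf = MultinomialNB().fit(X, labels)

text = input("Enter the text to classify: ")       # Step 5: text entered by the user
print(clf.predict(vect.transform([text])))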
PROGRAM:
import csv
import random
import math

def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitDataset(dataset, splitRatio):
    # 67% training size
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        # generate indices for the dataset list randomly to pick elements for training data
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

def separateByClass(dataset):
    separated = {}
    # creates a dictionary of classes 1 and 0 where the values are the instances belonging to each class
    for i in range(len(dataset)):
        vector = dataset[i]
        if vector[-1] not in separated:
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        # summaries is a dict of tuples (mean, std) for each class value
        summaries[classValue] = summarize(instances)
    return summaries

def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    # class and attribute information as mean and sd
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            # take mean and sd of every attribute for class 0 and 1 separately
            mean, stdev = classSummaries[i]
            x = inputVector[i]  # test vector's i-th attribute
            probabilities[classValue] *= calculateProbability(x, mean, stdev)  # use normal dist
    return probabilities

def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    # assign the class that has the highest probability
    for classValue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

def getAccuracy(testSet, predictions):
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0

def main():
    filename = '5data.csv'
    splitRatio = 0.67
    dataset = loadCsv(filename)
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    print('Split {0} rows into train={1} and test={2} rows'.format(
        len(dataset), len(trainingSet), len(testSet)))
    # prepare model
    summaries = summarizeByClass(trainingSet)
    # test model
    predictions = getPredictions(summaries, testSet)
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy of the classifier is: {0}%'.format(accuracy))

main()
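As a quick, optional check of the calculateProbability helper above (not part of the original listing), the standard normal density at x = 0 should be about 0.3989:

print(calculateProbability(0.0, 0.0, 1.0))   # ≈ 0.3989, the peak of the standard normal curve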
OUTPUT:
confusion matrix is as follows
[[17  0  0]
 [ 0 17  0]
 [ 0  0 11]]
Accuracy metrics
precision    recall    f1-score    support
RESULT:
Thus the implementation of Naïve Bayesian Classifier model has been done successfully.
EX.NO:6 IMPLEMENTATION OF EM ALGORITHM TO CLUSTER
A SET OF DATA
AIM:
To implement the EM algorithm to cluster a set of data using Python.
ALGORITHM:
Step 1: Identify the variables in which the set of attributes are specified in the dataset.
Step 2: Determine the domain of values each variable can take.
Step 3: Create a directed graph where each node represents an attribute and each edge represents a parent-child relationship.
Step 4: Determine the prior and conditional probabilities for each attribute.
Step 5: Perform inference on the model and determine the marginal probability.
PROGRAM:
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture
import pandas as pd

X = pd.read_csv("kmeansdata.csv")
x1 = X['Distance_Feature'].values
x2 = X['Speeding_Feature'].values
X = np.array(list(zip(x1, x2))).reshape(len(x1), 2)

plt.plot()
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.title('Dataset')
plt.scatter(x1, x2)
plt.show()

# code for EM
gmm = GaussianMixture(n_components=3)
gmm.fit(X)
em_predictions = gmm.predict(X)
print("\nEM predictions")
print(em_predictions)
print("mean:\n", gmm.means_)
print('\n')
print("Covariances\n", gmm.covariances_)
print(X)
plt.title('Expectation Maximization')
plt.scatter(X[:, 0], X[:, 1], c=em_predictions, s=50)
plt.show()

# code for KMeans
import matplotlib.pyplot as plt1
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)
print(kmeans.cluster_centers_)
print(kmeans.labels_)
plt.title('KMEANS')
plt1.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='rainbow')
plt1.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], color='black')
plt1.show()
OUTPUT:
EM predictions
[0 0 0 1 0 1 1 1 2 1 2 2 1 1 2 1 2 1 0 1 0 1 1]
mean:
[[57.70629058 25.73574491]
 [52.12044022 22.46250453]
 [46.4364858  39.43288647]]
Covariances
[[[83.51878796 14.926902  ]
  [14.926902    2.70846907]]
 [[29.95910352 15.83416554]
  [15.83416554 67.01175729]]
 [[79.34811849 29.55835938]
  [29.55835938 18.17157304]]]
[[71.24 28.  ] [52.53 25.  ] [64.54 27.  ] [55.69 22.  ] [54.58 25.  ] [41.91 10.  ]
 [58.64 20.  ] [52.02  8.  ] [31.25 34.  ] [44.31 19.  ] [49.35 40.  ] [58.07 45.  ]
 [44.22 22.  ] [55.73 19.  ] [46.63 43.  ] [52.97 32.  ] [46.25 35.  ] [51.55 27.  ]
 [57.05 26.  ] [58.45 30.  ] [43.42 23.  ] [55.68 37.  ] [55.15 18.  ]]
[[57.74090909 24.27272727]
 [48.6        38.        ]
 [45.176      16.4       ]]
[0 0 0 0 0 2 0 2 1 2 1 1 2 0 1 1 1 0 0 0 2 1 0]
RESULT:
Thus the EM Algorithm to cluster a dataset has been implemented successfully.
EX.NO.7 IMPLEMENTATION OF K-NEAREST NEIGHBOUR ALGORITHM
TO CLASSIFY IRIS DATASET
AIM:
Write a program to implement the K-Nearest Neighbour Algorithm to classify the Dataset
using python.
ALGORITHM:
Step 1: Start the program.
Step 2: Import the required modules.
Step 3: Create the dataset; scikit-learn has a lot of tools for creating synthetic datasets (a small sketch follows this list).
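As a minimal illustration of Step 3 (the parameter values below are assumptions, not taken from the manual), scikit-learn's make_blobs can generate a labelled synthetic dataset; the program below uses the built-in Iris data instead.

from sklearn.datasets import make_blobs

# 150 samples, 4 features, 3 clusters -- illustrative values only
X, y = make_blobs(n_samples=150, n_features=4, centers=3, random_state=0)
print(X.shape, y.shape)   # (150, 4) (150,)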
PROGRAM:
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.datasets import load_iris

iris = load_iris()
iris.keys()
df = pd.DataFrame(iris['data'])
X = df
y = iris['target']
print(X.head())

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)
classifier = KNeighborsClassifier(n_neighbors=3).fit(Xtrain, ytrain)
ypred = classifier.predict(Xtest)

i = 0
print("\n-------------------------------------------------------------------------")
print('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print("-------------------------------------------------------------------------")
for label in ytest:
    print('%-25s %-25s' % (label, ypred[i]), end="")
    if label == ypred[i]:
        print('%-25s' % ('Correct'))
    else:
        print('%-25s' % ('Wrong'))
    i = i + 1
print("-------------------------------------------------------------------------")
print("\nConfusion Matrix:\n", metrics.confusion_matrix(ytest, ypred))
print("-------------------------------------------------------------------------")
print("\nClassification Report:\n", metrics.classification_report(ytest, ypred))
print("-------------------------------------------------------------------------")
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest, ypred))
print("-------------------------------------------------------------------------")
OUTPUT:
     0    1    2    3
0  5.1  3.5  1.4  0.2
1  4.9  3.0  1.4  0.2
2  4.7  3.2  1.3  0.2
3  4.6  3.1  1.5  0.2
4  5.0  3.6  1.4  0.2
Accuracy of the classifier is 1.00
RESULT:
Thus the K-Nearest Neighbour Algorithm to classify the dataset using Python has been
implemented successfully.
EX.NO:8
IMPLEMENTATION OF NON-PARAMETRIC LOCALLY WEIGHTED
REGRESSION ALGORITHM
AIM:
To implement the non-parametric Locally Weighted Regression algorithm using Python.
ALGORITHM:
PROGRAM:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

def kernel(point, xmat, k):
    m, n = np.shape(xmat)
    weights = np.mat(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * k ** 2))
    return weights

def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred

# load data points
data = pd.read_csv("/Users/HP/Downloads/10-dataset.csv")
bill = np.array(data.total_bill)
tip = np.array(data.tip)

# preparing and adding a column of ones to bill
mbill = np.mat(bill)
mtip = np.mat(tip)
m = np.shape(mbill)[1]
one = np.mat(np.ones(m))
X = np.hstack((one.T, mbill.T))

# set k here
ypred = localWeightRegression(X, mtip, 0.5)
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()
OUTPUT:
(Scatter plot of Total bill versus Tip with the fitted locally weighted regression curve drawn through the points.)
RESULT:
Thus the non-parametric locally weighted regression algorithm has been implemented successfully.
Ex.No.9
Implementation of FIND-S algorithm
AIM:
Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a
.CSV file.
ALGORITHM:
1. Load Data set
2. Initialize h to the most specific hypothesis in H
3. For each positive training instance x
   For each attribute constraint a_i in h
       If the constraint a_i is satisfied by x, then do nothing
       Else replace a_i in h by the next more general constraint that is satisfied by x
4. Output the hypothesis h
PROGRAM:
import random
import csv

def read_data(filename):
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        traindata = []
        for row in datareader:
            traindata.append(row)
    return traindata

h = ['phi', 'phi', 'phi', 'phi', 'phi', 'phi']
data = read_data('finds.csv')

def isConsistent(h, d):
    if len(h) != len(d) - 1:
        print('Number of attributes are not same in hypothesis.')
        return False
    else:
        matched = 0
        for i in range(len(h)):
            if (h[i] == d[i]) | (h[i] == 'any'):
                matched = matched + 1
        if matched == len(h):
            return True
        else:
            return False

def makeConsistent(h, d):
    for i in range(len(h)):
        if h[i] == 'phi':
            h[i] = d[i]
        elif h[i] != d[i]:
            h[i] = 'any'
    return h

print('Begin : Hypothesis :', h)
print('==========================================')
for d in data:
    if d[len(d) - 1] == 'Yes':
        if isConsistent(h, d):
            pass
        else:
            h = makeConsistent(h, d)
        print('Training data       :', d)
        print('Updated Hypothesis  :', h)
        print()
print('==========================================')
print('maximally specific data set End: Hypothesis :', h)
Output:
Training data       : ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'Yes']
Updated Hypothesis  : ['any', 'any', 'any', 'Strong', 'Warm', 'any']
Training data       : ['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'Yes']
Updated Hypothesis  : ['any', 'any', 'any', 'Strong', 'any', 'any']
Training data       : ['Overcast', 'Cool', 'Normal', 'Strong', 'Warm', 'Same', 'Yes']
Updated Hypothesis  : ['any', 'any', 'any', 'Strong', 'any', 'any']
==========================================
maximally specific data set End: Hypothesis : ['any', 'any', 'any', 'Strong', 'any', 'any']
RESULT:
Thus the FIND-S algorithm for finding the most specific hypothesis has been implemented successfully.
Ex.No.10
IMPLEMENTATION OF A BAYESIAN NETWORK USING THE HEART DISEASE DATA SET
AIM:
To construct a Bayesian network for diagnosing heart patients using the standard Heart Disease Data Set with Python.
Theory:
A Bayesian network is a directed acyclic graph in which each edge corresponds to a
conditional dependency, and each node corresponds to a unique random variable.
A Bayesian network consists of two major parts: a directed acyclic graph and a set of conditional
probability distributions.
The directed acyclic graph is a set of random variables represented by nodes.
The conditional probability distribution of a node (random variable) is defined for every
possible outcome of the preceding causal node(s).
For illustration, consider the following example. Suppose we attempt to turn on our computer,
but the computer does not start (observation/evidence). We would like to know which of the
possible causes of computer failure is more likely. In this simplified illustration, we assume
only two possible causes of this misfortune: electricity failure and computer malfunction.
The corresponding directed acyclic graph is depicted in below figure.
Fig: Directed acyclic graph representing two independent possible causes of a computer failure.
The goal is to calculate the posterior conditional probability distribution of each of the possible
unobserved causes given the observed evidence, i.e. P [Cause | Evidence].
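As an optional, minimal sketch of this computer-failure example (the probability values below are made up for illustration; pgmpy, which the program later in this experiment also uses, is assumed to be installed):

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# two independent possible causes of the observed failure
model = BayesianModel([('Electricity', 'Computer'), ('Malfunction', 'Computer')])

# priors (illustrative values): state 0 = ok, state 1 = failed
cpd_elec = TabularCPD('Electricity', 2, [[0.9], [0.1]])
cpd_malf = TabularCPD('Malfunction', 2, [[0.95], [0.05]])

# P(Computer | Electricity, Malfunction): state 0 = starts, state 1 = does not start
cpd_comp = TabularCPD('Computer', 2,
                      [[0.99, 0.10, 0.05, 0.01],
                       [0.01, 0.90, 0.95, 0.99]],
                      evidence=['Electricity', 'Malfunction'], evidence_card=[2, 2])

model.add_cpds(cpd_elec, cpd_malf, cpd_comp)

# P(Cause | Evidence): which cause is more likely, given that the computer did not start?
infer = VariableElimination(model)
print(infer.query(variables=['Electricity'], evidence={'Computer': 1}))
print(infer.query(variables=['Malfunction'], evidence={'Computer': 1}))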
Data Set:
Attribute Information:
1. age: age in years
2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type
Value 1: typical angina
Value 2: atypical angina
Value 3: non-anginal pain
Value 4: asymptomatic
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
Value 0: normal
Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation
or depression of > 0.05 mV)
Value 2: showing probable or definite left ventricular hypertrophy by Estes'
criteria
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak = ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
Value 1: upsloping
Value 2: flat
Value 3: downsloping
12. ca: number of major vessels (0-3) colored by fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversable defect
14. Heartdisease: It is integer valued from 0 (no presence) to 4. Diagnosis of heart disease
(angiographic disease status)
Some instance from the dataset:
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal Heartdisease
63 1 1 145 233 1 2 150 0 2.3 3 0 6 0
67 1 4 160 286 0 2 108 1 1.5 2 3 3 2
67 1 4 120 229 0 2 129 1 2.6 2 2 7 1
41 0 2 130 204 0 2 172 0 1.4 1 0 3 0
62 0 4 140 268 0 2 160 0 3.6 3 2 3 3
60 1 4 130 206 0 2 132 1 2.4 2 2 7 4
PROGRAM:
import numpy as np
import csv
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# read Cleveland Heart Disease data
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)

# display the data
print('Few examples from the dataset are given below')
print(heartDisease.head())

# Model the Bayesian Network
model = BayesianModel([('age', 'trestbps'), ('age', 'fbs'),
                       ('sex', 'trestbps'), ('exang', 'trestbps'),
                       ('trestbps', 'heartdisease'), ('fbs', 'heartdisease'),
                       ('heartdisease', 'restecg'), ('heartdisease', 'thalach'),
                       ('heartdisease', 'chol')])

# Learning CPDs using Maximum Likelihood Estimators
print('\n Learning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Inferencing with Bayesian Network
print('\n Inferencing with Bayesian Network:')
HeartDisease_infer = VariableElimination(model)

# computing the Probability of HeartDisease given Age
print('\n 1. Probability of HeartDisease given Age=28')
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 28})
print(q['heartdisease'])

# computing the Probability of HeartDisease given cholesterol
print('\n 2. Probability of HeartDisease given cholesterol=100')
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'chol': 100})
print(q['heartdisease'])
Output:
Few examples from the dataset are given below
age sex cp trestbps ...slope ca thal heartdisease
0 63 1 1 145 ... 3 0 6 0
1 67 1 4 160 ... 2 3 3 2
2 67 1 4 120 ... 2 2 7 1
3 37 1 3 130 ... 3 0 3 0
4 41 0 2 130 ... 1 0 3 0
[5 rows x 14 columns]
RESULT:
Thus the diagnosis of heart patients using standard Heart Disease Data Set has been
implemented successfully.