
AD3461 Machine Learning Lab Manual
ExNo:1 IMPLEMENTATION OF CANDIDATE ELIMINATION ALGORITHM
AIM:

To write a program to implement the Candidate Elimination Algorithm using a Python script.

ALGORITHM:
Step 1: Initialize the version space
o Initialize the most general hypothesis (h_G): set h_G to the maximally general hypothesis in which every attribute is '?'. This hypothesis classifies every instance as positive.
o Initialize the most specific hypothesis (h_S): set h_S to the maximally specific hypothesis in which every attribute is 'null' (the most specific value possible). This hypothesis classifies no instance as positive.
Step 2: Iterate through the training examples
 For each positive example, update the boundaries as follows:
o Remove from the general boundary any hypothesis that does not cover the example.
o For each attribute of h_S that does not match the example, generalize that attribute (replace it with '?') so that h_S covers the example.
 For each negative example, update the boundaries as follows:
o Remove from the specific boundary any hypothesis that covers the example.
o Specialize h_G just enough to exclude the example, keeping only specializations that are still more general than h_S.
Step 3: Refine the version space
 Remove from the general boundary any hypothesis that is more specific than another hypothesis in it, and from the specific boundary any hypothesis that is more general than another hypothesis in it.
Step 4: Repeat Steps 2 and 3 until convergence
 Keep iterating through the training examples and refining the version space until it is consistent with every training example.
Step 5: Output the final version space
 Output the final specific and general boundaries, which together represent all hypotheses consistent with the training examples (see the short sketch after these steps).
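As a quick illustration of a single boundary update, the following minimal sketch uses a made-up two-attribute example (not part of the lab data) and mirrors the update the program below performs for a positive example:

# Minimal sketch of one Candidate Elimination update on a toy, made-up example.
# Attributes: (Sky, Temp); '?' means any value is accepted.
specific_h = ['Sunny', 'Warm']            # S after the first positive example
general_h = [['?', '?'], ['?', '?']]      # one general hypothesis per attribute

example, label = ['Rainy', 'Warm'], 'Yes' # a second (positive) training example
for i, value in enumerate(example):
    if label == 'Yes' and specific_h[i] != value:
        specific_h[i] = '?'               # generalize S to cover the positive example
        general_h[i][i] = '?'             # keep G consistent with the generalized S

print(specific_h)   # ['?', 'Warm']
print(general_h)    # [['?', '?'], ['?', '?']]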
PROGRAM:

import numpy as np
import pandas as pd

data = pd.DataFrame(data=pd.read_csv('finds1.csv'))
concepts = np.array(data.iloc[:, 0:-1])
target = np.array(data.iloc[:, -1])

def learn(concepts, target):
    specific_h = concepts[0].copy()
    print("initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)

    for i, h in enumerate(concepts):
        if target[i] == "Yes":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "No":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print("steps of Candidate Elimination Algorithm", i + 1)
        print("Specific_h", i + 1, "\n")
        print(specific_h)
        print("general_h", i + 1, "\n")
        print(general_h)

    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)

print("Final Specific_h:", s_final, sep="\n")

print("Final General_h:", g_final, sep="\n")

OUTPUT:

initialization of specific_h and general_h

['Cloudy' 'Cold' 'High' 'Strong' 'Warm' 'Change']

[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]

steps of Candidate Elimination Algorithm 8

Specific_h 8

['?' '?' '?' 'Strong' '?' '?']

general_h 8

[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', 'Strong', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]

Final Specific_h:

['?' '?' '?' 'Strong' '?' '?']

Final General_h:

[['?', '?', '?', 'Strong', '?', '?']]

RESULT:

Thus the Candidate Elimination Algorithm has been implemented successfully.
Ex.No:2 IMPLEMENTATION OF DECISION TREE BASED ID3 ALGORITHM
AIM:

To write a program to implement the decision tree based ID3 Algorithm using a Python script.

ALGORITHM:

Step 1: Start the program


 Initialize your programming environment and set up necessary libraries (e.g., Python
with libraries like pandas and numpy).
Step 2: Load the dataset and organize it into a table
 Load the dataset into a data structure such as a DataFrame using pandas.
 Ensure that rows represent instances and columns represent features.
 The last column should contain the class labels.
Step 3: Define a function to calculate the entropy of the dataset
 Entropy measures the uncertainty or impurity in the dataset.
Step 4: For each feature, calculate the information gain
 Information gain measures how much a feature reduces the uncertainty in the dataset (the entropy and information-gain formulas are sketched after these steps).
Step 5: Select the feature with the highest information gain
 Iterate through each feature and calculate the information gain.
 Select the feature with the highest information gain.
Step 6: Divide the dataset into subsets based on the best feature
 Split the dataset into subsets where each subset corresponds to a unique value of the
selected feature.
Step 7: Repeat Recursively
 Continue the process recursively for each subset until a stopping criterion is met (e.g., all
instances in a subset belong to the same class or no features remain).
Step 8: Build the decision tree
 Construct the decision tree by assigning the best feature as the splitting criterion at each
internal node.
 Assign the majority class as the class label for each leaf node.
Step 9: Use the created decision tree to classify new instances
 Traverse the tree from the root to the appropriate leaf node based on feature values of the
new instance.
Step 10: Evaluate the Model
 Test the decision tree on a test set or use cross-validation to evaluate its performance
Step 11: Stop the program
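
For reference, the two quantities used in Steps 3 and 4 can be computed as below. This is a minimal standalone sketch with a made-up label column, independent of the playtennis.csv data loaded in the program:

import numpy as np

def entropy(labels):
    # Entropy(S) = -sum_i p_i * log2(p_i), over the class proportions p_i.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(feature, labels):
    # Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)
    values, counts = np.unique(feature, return_counts=True)
    weighted = sum((c / counts.sum()) * entropy(labels[feature == v])
                   for v, c in zip(values, counts))
    return entropy(labels) - weighted

# Made-up toy data: a 'wind' feature against a yes/no class label.
wind = np.array(['Weak', 'Strong', 'Weak', 'Weak', 'Strong'])
play = np.array(['Yes', 'No', 'Yes', 'Yes', 'Yes'])
print(entropy(play))            # about 0.722
print(info_gain(wind, play))    # about 0.322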

PROGRAM:

import pandas as pd
import numpy as np

dataset = pd.read_csv('playtennis.csv', names=['outlook', 'temperature', 'humidity', 'wind', 'class'])

def entropy(target_col):
    elements, counts = np.unique(target_col, return_counts=True)
    entropy = np.sum([(-counts[i] / np.sum(counts)) * np.log2(counts[i] / np.sum(counts))
                      for i in range(len(elements))])
    return entropy

def InfoGain(data, split_attribute_name, target_name="class"):
    total_entropy = entropy(data[target_name])
    vals, counts = np.unique(data[split_attribute_name], return_counts=True)
    Weighted_Entropy = np.sum([(counts[i] / np.sum(counts)) *
                               entropy(data.where(data[split_attribute_name] == vals[i]).dropna()[target_name])
                               for i in range(len(vals))])
    Information_Gain = total_entropy - Weighted_Entropy
    return Information_Gain

def ID3(data, originaldata, features, target_attribute_name="class", parent_node_class=None):
    if len(np.unique(data[target_attribute_name])) <= 1:
        return np.unique(data[target_attribute_name])[0]
    elif len(data) == 0:
        return np.unique(originaldata[target_attribute_name])[
            np.argmax(np.unique(originaldata[target_attribute_name], return_counts=True)[1])]
    elif len(features) == 0:
        return parent_node_class
    else:
        parent_node_class = np.unique(data[target_attribute_name])[
            np.argmax(np.unique(data[target_attribute_name], return_counts=True)[1])]
        # information gain values for the remaining features in the dataset
        item_values = [InfoGain(data, feature, target_attribute_name) for feature in features]
        best_feature_index = np.argmax(item_values)
        best_feature = features[best_feature_index]
        tree = {best_feature: {}}
        features = [i for i in features if i != best_feature]
        for value in np.unique(data[best_feature]):
            sub_data = data.where(data[best_feature] == value).dropna()
            subtree = ID3(sub_data, dataset, features, target_attribute_name, parent_node_class)
            tree[best_feature][value] = subtree
        return tree

tree = ID3(dataset, dataset, dataset.columns[:-1])

print('\nDisplay Tree\n', tree)

OUTPUT:

Display Tree
{'outlook': {'Overcast': 'Yes', 'Rain': {'wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny': {'humidity': {'High': 'No', 'Normal': 'Yes'}}}}

RESULT:

Thus the program to implement the decision tree based ID3 Algorithm using a Python script has
been implemented successfully.
Ex.No.3
IMPLEMENTATION OF ARTIFICIAL NEURAL NETWORK USING
BACKPROPAGATION ALGORITHM

AIM:

To write a program to implement an artificial neural network using the Backpropagation Algorithm in a Python script.

ALGORITHM:
Step 1: Inputs X arrive through the preconnected path
 Initialize the input data X.
Step 2: The input is modeled using true weights W. Weights are usually chosen randomly.
 Initialize weights randomly for the connections between layers.
Step 3: Calculate the output of each neuron from the input layer to the hidden layer to the
output layer
 Define the activation function (e.g., sigmoid) and the forward propagation function.
Step 4: Calculate the error in the outputs
 Calculate the error (difference between the predicted output and the actual output).
Step 5: From the output layer, go back to the hidden layer to adjust the weights to reduce the
error
 Implement backpropagation to update the weights (a numeric sketch of one such update follows these steps).
Step 6: Repeat the process until the desired output is achieved
 Iterate through forward propagation and backpropagation until the network converges.
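
The update in Step 5 can be pictured with a single sigmoid output layer. The numbers below are made up and only illustrate the delta and weight-update rules the program below applies:

import numpy as np

# One backpropagation step for a single sigmoid output layer (made-up numbers).
y_true = np.array([[1.0]])
y_pred = np.array([[0.7]])          # assumed network output after forward propagation
hidden = np.array([[0.4, 0.9]])     # assumed hidden-layer activations
W2 = np.array([[0.5], [-0.3]])      # assumed hidden-to-output weights
lr = 0.1

delta_out = (y_true - y_pred) * y_pred * (1 - y_pred)   # error * sigmoid'(output)
W2_new = W2 + hidden.T.dot(delta_out) * lr              # move the weights to reduce the error
print(W2_new)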

PROGRAM:
import numpy as np

# Input data (XOR problem)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Initialize weights randomly
input_layer_size = X.shape[1]
hidden_layer_size = 2
output_layer_size = y.shape[1]

np.random.seed(1)
W1 = np.random.randn(input_layer_size, hidden_layer_size)
W2 = np.random.randn(hidden_layer_size, output_layer_size)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1 - x)

def forward_propagation(X, W1, W2):
    hidden_input = np.dot(X, W1)
    hidden_output = sigmoid(hidden_input)
    final_input = np.dot(hidden_output, W2)
    final_output = sigmoid(final_input)
    return hidden_output, final_output

def calculate_error(y, output):
    return y - output

def backpropagation(X, y, W1, W2, hidden_output, final_output, learning_rate=0.1):
    output_error = calculate_error(y, final_output)
    output_delta = output_error * sigmoid_derivative(final_output)
    hidden_error = output_delta.dot(W2.T)
    hidden_delta = hidden_error * sigmoid_derivative(hidden_output)

    W2 += hidden_output.T.dot(output_delta) * learning_rate
    W1 += X.T.dot(hidden_delta) * learning_rate
    return W1, W2

# Training the neural network
epochs = 10000
for epoch in range(epochs):
    hidden_output, final_output = forward_propagation(X, W1, W2)
    W1, W2 = backpropagation(X, y, W1, W2, hidden_output, final_output)

# Output after training
_, final_output = forward_propagation(X, W1, W2)
print("Output after training:")
print(final_output)

OUTPUT:
Input:
[[ 0.66666667 1. ]
[ 0.33333333 0.55555556]
[ 1. 0.66666667]]
Actual output:

[[ 0.92]
[ 0.86]
[ 0.89]]

Predicted Output:

[[ 0.89559591]
[ 0.88142069]
[ 0.8928407 ]]

RESULT:
Thus the Backpropagation Algorithm has been implemented successfully.
Ex.No. 4

IMPLEMENTATION OF NAÏVE BAYESIAN CLASSIFIER

AIM:

To write a program to implement the Naïve Bayesian Classifier using a Python script.

ALGORITHM:

Step 1: Data pre-processing step

Step 2: Fitting Naïve Bayes to the training set (a short sketch of the underlying Bayes rule follows these steps)

Step 3: Predicting the test result

Step 4: Test accuracy of the result (creation of the confusion matrix)

Step 5: Visualizing the test set result.
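
The classifier relies on Bayes' theorem: P(class | document) is proportional to P(class) multiplied by the product of P(word | class) over the words in the document. The following minimal sketch uses made-up priors and word probabilities, independent of the naivetext1.csv data used in the program below:

# Minimal sketch of the multinomial Naive Bayes decision rule with made-up counts.
from math import log

prior = {'pos': 0.5, 'neg': 0.5}                      # assumed class priors
word_prob = {                                         # assumed per-class word probabilities
    'pos': {'love': 0.30, 'boring': 0.05},
    'neg': {'love': 0.05, 'boring': 0.30},
}

doc = ['love', 'boring', 'love']                      # a made-up tokenized document
scores = {c: log(prior[c]) + sum(log(word_prob[c][w]) for w in doc)
          for c in prior}
print(max(scores, key=scores.get))                    # 'pos'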

PROGRAM:

import pandas as pd
msg = pd.read_csv('naivetext1.csv', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
print(X)
print(y)
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(X, y)
print(xtest.shape)
print(xtrain.shape)
print(ytest.shape)
print(ytrain.shape)
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)
from sklearn import metrics
print('Accuracy metrics')
print('Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('Recall and Precision')
print(metrics.recall_score(ytest, predicted))
print(metrics.precision_score(ytest, predicted))
OUTPUT:
The dimensions of the dataset (18, 2)
1. I love this sandwich
2. This is an amazing place
3. I feel very good about these beers
4. This is my best work
5. What an awesome view
6. I do not like this restaurant
7. I am tired of this stuff
8. I can't deal with this
9. He is my sworn enemy
10. My boss is horrible
11. This is an awesome place
12. I do not like the taste of this juice
13. I love to dance
14. I am sick and tired of this place
15. What a great holiday
16. That is a bad locality to stay
17. We will have good fun tomorrow
18. I went to my enemy's house today

Name: message, dtype: object

0     1
1     1
2     1
3     1
4     1
5     0
6     0
7     0
8     0
9     0
10    1
11    0
12    1
13    0
14    1
15    0
16    1
17    0
Name: labelnum, dtype: int64
(5,)
(13,)
(5,)
(13,)
Accuracy metrics
Accuracy of the classifier is 0.8
Confusion matrix
[[3 1]
 [0 1]]
Recall and Precision
1.0
0.5

RESULT:
Thus the Naïve Bayesian Classifier algorithm has been implemented successfully.
EX.NO.5

IMPLEMENTATION OF NAÏVE BAYESIAN CLASSIFIER MODEL TO CLASSIFY A SET OF DOCUMENTS

AIM:
To implement the Naïve Bayesian Classifier Model to Classify the document set using
Python.
ALGORITHM:

Step 1: Input the total number of documents from the user.

Step 2: Input the text and class of each document and split it into a list.

Step 3: Create a 2D array and append each document list into the array.
Step 4: Using a set data structure, store all the keywords in a list.
Step 5: Input the text to be classified by the user.
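
The classifier below models each numeric attribute with a per-class Gaussian, so the class-conditional probability it computes for a value x is the normal density. A quick standalone check of that density with made-up numbers (independent of the 5data.csv file used in the program):

import math

def gaussian_probability(x, mean, stdev):
    # N(x; mean, stdev) = exp(-(x - mean)^2 / (2 * stdev^2)) / (sqrt(2 * pi) * stdev)
    exponent = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return exponent / (math.sqrt(2 * math.pi) * stdev)

print(gaussian_probability(71.5, mean=73.0, stdev=6.2))   # about 0.062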

PROGRAM:
import csv
import random
import math

def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitDataset(dataset, splitRatio):
    # 67% training size
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        # generate indices for the dataset list randomly to pick elements for training data
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

def separateByClass(dataset):
    separated = {}
    # creates a dictionary of classes 1 and 0 where the values are the instances belonging to each class
    for i in range(len(dataset)):
        vector = dataset[i]
        if vector[-1] not in separated:
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        # summaries is a dict of tuples (mean, std) for each class value
        summaries[classValue] = summarize(instances)
    return summaries

def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        # class and attribute information as mean and sd
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            # take mean and sd of every attribute for class 0 and 1 separately
            mean, stdev = classSummaries[i]
            x = inputVector[i]  # test vector's i-th attribute
            probabilities[classValue] *= calculateProbability(x, mean, stdev)  # use normal dist
    return probabilities

def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():
        # assigns the class which has the highest probability
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

def getAccuracy(testSet, predictions):
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0

def main():
    filename = '5data.csv'
    splitRatio = 0.67
    dataset = loadCsv(filename)
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    print('Split {0} rows into train={1} and test={2} rows'.format(len(dataset), len(trainingSet), len(testSet)))
    # prepare model
    summaries = summarizeByClass(trainingSet)
    # test model
    predictions = getPredictions(summaries, testSet)
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy of the classifier is: {0}%'.format(accuracy))

main()

OUTPUT:
confusion matrix is as follows
[[17  0  0]
 [ 0 17  0]
 [ 0  0 11]]
Accuracy metrics
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        17
           1       1.00      1.00      1.00        17
           2       1.00      1.00      1.00        11

 avg / total       1.00      1.00      1.00        45

RESULT:
Thus the implementation of Naïve Bayesian Classifier model has been done successfully.
EX.NO:6 IMPLEMENTATION OF EM ALGORITHM TO CLUSTER
A SET OF DATA
AIM:

To write a program to implement the EM algorithm to cluster a dataset using Python.

ALGORITHM:
Step 1: Choose the number of clusters (mixture components) and initialize each component's parameters (mean, covariance and mixing weight), for example randomly or from a K-Means result.

Step 2: E-step: for every data point, compute the responsibility (posterior probability) of each component for that point using the current parameters (see the sketch after these steps).

Step 3: M-step: re-estimate each component's mean, covariance and mixing weight from the responsibility-weighted data.

Step 4: Evaluate the log-likelihood of the data under the updated mixture model.

Step 5: Repeat the E-step and M-step until the parameters (or the log-likelihood) converge, then assign each point to the component with the highest responsibility.
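
As a rough sketch of what the E-step computes, the snippet below uses made-up one-dimensional data and two components; in the program below, scikit-learn's GaussianMixture performs this computation internally:

import numpy as np
from scipy.stats import norm

x = 5.0                                  # one made-up data point
weights = np.array([0.6, 0.4])           # assumed mixing weights of two components
means = np.array([3.0, 8.0])             # assumed component means
stdevs = np.array([2.0, 2.0])            # assumed component standard deviations

# E-step: responsibility of each component for x, proportional to weight * N(x; mean, stdev)
likelihoods = weights * norm.pdf(x, means, stdevs)
responsibilities = likelihoods / likelihoods.sum()
print(responsibilities)                  # about [0.74, 0.26]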

PROGRAM:
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture
import pandas as pd

X = pd.read_csv("kmeansdata.csv")
x1 = X['Distance_Feature'].values
x2 = X['Speeding_Feature'].values
X = np.array(list(zip(x1, x2))).reshape(len(x1), 2)
plt.plot()
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.title('Dataset')
plt.scatter(x1, x2)
plt.show()

# code for EM
gmm = GaussianMixture(n_components=3)
gmm.fit(X)
em_predictions = gmm.predict(X)
print("\nEM predictions")
print(em_predictions)
print("mean:\n", gmm.means_)
print('\n')
print("Covariances\n", gmm.covariances_)
print(X)
plt.title('Expectation Maximization')
plt.scatter(X[:, 0], X[:, 1], c=em_predictions, s=50)
plt.show()

# code for KMeans
import matplotlib.pyplot as plt1
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)
print(kmeans.cluster_centers_)
print(kmeans.labels_)
plt1.title('KMEANS')
plt1.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='rainbow')
plt1.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], color='black')
plt1.show()

OUTPUT:
EM predictions
[0 0 0 1 0 1 1 1 2 1 2 2 1 1 2 1 2 1 0 1 0 1 1]
mean:
[[57.70629058 25.73574491] [52.12044022 22.46250453] [46.4364858 39.43288647]]

Covariances
[[[83.51878796 14.926902] [14.926902 2.70846907]] [[29.95910352 15.83416554] [15.83416554 67.01175729]]
 [[79.34811849 29.55835938] [29.55835938 18.17157304]]]
[[71.24 28.] [52.53 25.] [64.54 27.] [55.69 22.] [54.58 25.] [41.91 10.] [58.64 20.] [52.02 8.] [31.25 34.] [44.31 19.] [49.35 40.]
 [58.07 45.] [44.22 22.] [55.73 19.] [46.63 43.] [52.97 32.] [46.25 35.] [51.55 27.] [57.05 26.] [58.45 30.] [43.42 23.] [55.68 37.] [55.15 18.]]
[[57.74090909 24.27272727] [48.6 38.] [45.176 16.4]]
[0 0 0 0 0 2 0 2 1 2 1 1 2 0 1 1 1 0 0 0 2 1 0]
RESULT:
Thus the EM Algorithm to cluster a dataset has been implemented successfully.
EX.NO.7 IMPLEMENTATION OF K-NEAREST NEIGHBOUR ALGORITHM
TO CLASSIFY IRIS DATASET

AIM:
To write a program to implement the K-Nearest Neighbour Algorithm to classify the Iris dataset
using Python.

ALGORITHM:
Step1: Start the Program
Step2: Importing the Modules.
Step3: Load the dataset; scikit-learn provides built-in datasets such as Iris (load_iris).

Step 4: Visualize the dataset

Step5: Splitting data into training and testing dataset.


Step6: Build a KNN classifier object for the implementation (a minimal distance-and-vote sketch follows these steps).
Step7: Make predictions with the KNN classifier on the test set and compare the predicted target
values with the actual values.
Step8: Compute the accuracy of the predictions.

Step 9: Visualize Predictions

Step10: Stop the Program.
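
The core idea behind Steps 6 and 7 is a distance-based majority vote. The following minimal sketch uses made-up points; the program below relies on scikit-learn's KNeighborsClassifier instead:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]          # indices of the k closest points
    votes = [y_train[i] for i in nearest]
    return Counter(votes).most_common(1)[0][0]   # majority class among the neighbours

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [6.0, 6.0], [5.8, 6.2]])  # made-up data
y_train = ['A', 'A', 'B', 'B']
print(knn_predict(X_train, y_train, np.array([1.1, 0.9])))            # 'A'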

PROGRAM:
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.datasets import load_iris

iris = load_iris()
iris.keys()
df = pd.DataFrame(iris['data'])
X = df
y = iris['target']
print(X.head())
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)
classifier = KNeighborsClassifier(n_neighbors=3).fit(Xtrain, ytrain)
ypred = classifier.predict(Xtest)
i = 0
print("\n-------------------------------------------------------------------------")
print('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print("-------------------------------------------------------------------------")
for label in ytest:
    print('%-25s %-25s' % (label, ypred[i]), end="")
    if label == ypred[i]:
        print('%-25s' % ('Correct'))
    else:
        print('%-25s' % ('Wrong'))
    i = i + 1
print("-------------------------------------------------------------------------")
print("\nConfusion Matrix:\n", metrics.confusion_matrix(ytest, ypred))
print("-------------------------------------------------------------------------")
print("\nClassification Report:\n", metrics.classification_report(ytest, ypred))
print("-------------------------------------------------------------------------")
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest, ypred))
print("-------------------------------------------------------------------------")

OUTPUT:
     0    1    2    3
0  5.1  3.5  1.4  0.2
1  4.9  3.0  1.4  0.2
2  4.7  3.2  1.3  0.2
3  4.6  3.1  1.5  0.2
4  5.0  3.6  1.4  0.2

Original Label            Predicted Label           Correct/Wrong


0 2 Correct
1 1 Correct
2 2 Correct
3 0 Correct
0 0 Correct
1 1 Correct
2 2 Correct
2 2 Correct
0 0 Correct
0 0 Correct
0 0 Correct
1 1 Correct
2 2 Correct
1 1 Correct
1 1 Correct
Confusion Matrix:
[[5 0 0]
 [0 5 0]
 [0 0 5]]
Classification Report:
              precision    recall  f1-score   support
           0       1.00      1.00      1.00         5
           1       1.00      1.00      1.00         5
           2       1.00      1.00      1.00         5
    accuracy                           1.00        15
   macro avg       1.00      1.00      1.00        15
weighted avg       1.00      1.00      1.00        15

Accuracy of the classifier is 1.00

RESULT:
Thus the K-Nearest Neighbour Algorithm to classify the dataset using Python has been
implemented successfully.
EX.NO:8
IMPLEMENTATION OF NON-PARAMETRIC LOCALLY WEIGHTED
REGRESSION ALGORITHM
AIM:

To implement the non-parametric Locally Weighted Regression algorithm using Python.

ALGORITHM:

Step 1: Import Necessary Libraries

Step 2: Define the Kernel Function

Step 3: Fit the Model and Predict

Implement the LWR algorithm (a standalone sketch of the kernel weighting and weighted fit follows these steps).

Step 4: Create a Function to Plot the Results

Step 5: Generate Some Data and Run the Algorithm
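
For each query point x0, every training point x_i receives a Gaussian kernel weight exp(-(x0 - x_i)^2 / (2k^2)), and a local linear model is fitted by the weighted normal equations. The following minimal sketch uses made-up one-dimensional data, separate from the restaurant-tips dataset used in the program below:

import numpy as np

def lwr_predict(x0, X, y, k=0.5):
    # Locally weighted regression at a single query point x0 (1-D input with bias term).
    Xb = np.c_[np.ones(len(X)), X]                    # design matrix with intercept
    x0b = np.array([1.0, x0])
    w = np.exp(-((X - x0) ** 2) / (2 * k ** 2))       # Gaussian kernel weights
    W = np.diag(w)
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y   # weighted normal equations
    return x0b @ theta

# Made-up 1-D data: y roughly follows 2x + 1 with a little noise.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
print(lwr_predict(2.5, X, y))    # close to 6.0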

PROGRAM:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

def kernel(point, xmat, k):
    m, n = np.shape(xmat)
    weights = np.mat(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * k ** 2))
    return weights

def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred

# load data points
data = pd.read_csv("/Users/HP/Downloads/10-dataset.csv")
bill = np.array(data.total_bill)
tip = np.array(data.tip)

# preparing the design matrix: add a column of 1s in front of bill
mbill = np.mat(bill)
mtip = np.mat(tip)
m = np.shape(mbill)[1]
one = np.mat(np.ones(m))
X = np.hstack((one.T, mbill.T))

# set k here
ypred = localWeightRegression(X, mtip, 0.5)
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()

OUTPUT:

The program displays a scatter plot of total bill versus tip (green points) with the locally weighted regression curve drawn through them in red.

RESULT:

Thus the non-parametric locally weighted regression algorithm has been implemented
successfully.
Ex.No.9
Implementation of FIND-S algorithm
AIM:
Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a
.CSV file.

ALGORITHM:
1. Load Data set
2. Initialize h to the most specific hypothesis in H
3. For each positive training instance x

For each attribute constraint ai in h


If the constraint ai in h is satisfied by x then do nothing
Else replace ai in h by the next more general constraint that is satisfied by x
4. Output hypothesis h
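
A single FIND-S update can be pictured as below; this is a minimal sketch with a made-up two-attribute positive example, mirroring what makeConsistent in the program does:

# One FIND-S generalization step on a made-up positive example.
h = ['phi', 'phi']                 # most specific hypothesis: accepts nothing
example = ['Sunny', 'Warm']        # first positive example

for i in range(len(h)):
    if h[i] == 'phi':
        h[i] = example[i]          # adopt the attribute value of the first positive example
    elif h[i] != example[i]:
        h[i] = 'any'               # generalize when values disagree

print(h)                           # ['Sunny', 'Warm']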

PROGRAM:
import random
import csv

def read_data(filename):
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        traindata = []
        for row in datareader:
            traindata.append(row)
        return traindata

h = ['phi', 'phi', 'phi', 'phi', 'phi', 'phi']
data = read_data('finds.csv')

def isConsistent(h, d):
    if len(h) != len(d) - 1:
        print('Number of attributes are not same in hypothesis.')
        return False
    else:
        matched = 0
        for i in range(len(h)):
            if (h[i] == d[i]) | (h[i] == 'any'):
                matched = matched + 1
        if matched == len(h):
            return True
        else:
            return False

def makeConsistent(h, d):
    for i in range(len(h)):
        if h[i] == 'phi':
            h[i] = d[i]
        elif h[i] != d[i]:
            h[i] = 'any'
    return h

print('Begin : Hypothesis :', h)
print('==========================================')
for d in data:
    if d[len(d) - 1] == 'Yes':
        if isConsistent(h, d):
            pass
        else:
            h = makeConsistent(h, d)
        print('Training data        :', d)
        print('Updated Hypothesis   :', h)
        print()
print('==========================================')
print('maximally specific dataset End : Hypothesis :', h)

Output:

Begin : Hypothesis : ['phi', 'phi', 'phi', 'phi', 'phi', 'phi']
==========================================
Training data        : ['Cloudy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'Yes']
Updated Hypothesis   : ['Cloudy', 'Cold', 'High', 'Strong', 'Warm', 'Change']

Training data        : ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'Yes']
Updated Hypothesis   : ['any', 'any', 'any', 'Strong', 'Warm', 'any']

Training data        : ['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'Yes']
Updated Hypothesis   : ['any', 'any', 'any', 'Strong', 'Warm', 'any']

Training data        : ['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'Yes']
Updated Hypothesis   : ['any', 'any', 'any', 'Strong', 'any', 'any']

Training data        : ['Overcast', 'Cool', 'Normal', 'Strong', 'Warm', 'Same', 'Yes']
Updated Hypothesis   : ['any', 'any', 'any', 'Strong', 'any', 'any']

==========================================
maximally specific dataset End : Hypothesis : ['any', 'any', 'any', 'Strong', 'any', 'any']

Result:

Thus the FIND-S algorithm has been implemented successfully.


EX.NO:10
Construct a Bayesian network to demonstrate the diagnosis of heart patients
using standard Heart Disease Data Set
AIM:
To write a program to construct a Bayesian network considering medical data, and to use this
model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.
Java/Python ML library classes/APIs may be used.

Theory
A Bayesian network is a directed acyclic graph in which each edge corresponds to a
conditional dependency, and each node corresponds to a unique random variable.

Bayesian network consists of two major parts: a directed acyclic graph and a set of conditional
probability distributions
 The directed acyclic graph is a set of random variables represented by nodes.
 The conditional probability distribution of a node (random variable) is defined for every
possible outcome of the preceding causal node(s).

For illustration, consider the following example. Suppose we attempt to turn on our computer,
but the computer does not start (observation/evidence). We would like to know which of the
possible causes of computer failure is more likely. In this simplified illustration, we assume
only two possible causes of this misfortune: electricity failure and computer malfunction.
The corresponding directed acyclic graph is depicted in the figure below.

Fig: Directed acyclic graph representing two independent possible causes of a computer failure.

The goal is to calculate the posterior conditional probability distribution of each of the possible
unobserved causes given the observed evidence, i.e. P [Cause | Evidence].
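
As a quick illustration of P[Cause | Evidence], the following back-of-the-envelope computation uses made-up priors and conditional probabilities for the computer example; the heart-disease program below lets pgmpy perform the equivalent inference on real data:

from itertools import product

# Made-up priors and conditional probabilities, for illustration only.
p_elec = 0.1          # assumed P(electricity failure = true)
p_malf = 0.2          # assumed P(computer malfunction = true)

def p_not_start(elec, malf):
    # assumed conditional probability that the computer does not start
    if elec:
        return 1.0
    return 0.5 if malf else 0.0

# Enumerate the joint distribution and condition on the evidence "does not start".
joint = {}
for elec, malf in product([True, False], repeat=2):
    prior = (p_elec if elec else 1 - p_elec) * (p_malf if malf else 1 - p_malf)
    joint[(elec, malf)] = prior * p_not_start(elec, malf)

p_evidence = sum(joint.values())
p_elec_given_fail = sum(v for (e, m), v in joint.items() if e) / p_evidence
p_malf_given_fail = sum(v for (e, m), v in joint.items() if m) / p_evidence
print(round(p_elec_given_fail, 3), round(p_malf_given_fail, 3))   # 0.526 and 0.579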
Data Set:

Title: Heart Disease Databases


The Cleveland database contains 76 attributes, but all published experiments refer to using a
subset of 14 of them. In particular, the Cleveland database is the only one that has been used
by ML researchers to this date. The "Heartdisease" field refers to the presence of heart disease
in the patient. It is integer valued from 0 (no presence) to 4.
Database: 0 1 2 3 4 Total
Cleveland: 164 55 36 35 13 303

Attribute Information:
1. age: age in years
2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type
 Value 1: typical angina
 Value 2: atypical angina
 Value 3: non-anginal pain
 Value 4: asymptomatic
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholestoral in mg/dl
6. fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
 Value 0: normal
 Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation
or depression of > 0.05 mV)
 Value 2: showing probable or definite left ventricular hypertrophy by Estes'
criteria
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak = ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
 Value 1: upsloping
 Value 2: flat
 Value 3: downsloping
12. ca = number of major vessels (0-3) colored by flourosopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversable defect
14. Heartdisease: It is integer valued from 0 (no presence) to 4. Diagnosis of heart disease
(angiographic disease status)
Some instances from the dataset:

age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal Heartdisease
63 1 1 145 233 1 2 150 0 2.3 3 0 6 0
67 1 4 160 286 0 2 108 1 1.5 2 3 3 2
67 1 4 120 229 0 2 129 1 2.6 2 2 7 1
41 0 2 130 204 0 2 172 0 1.4 1 0 3 0
62 0 4 140 268 0 2 160 0 3.6 3 2 3 3
60 1 4 130 206 0 2 132 1 2.4 2 2 7 4

PROGRAM:

import numpy as np
import csv
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# read Cleveland Heart Disease data
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)

# display the data
print('Few examples from the dataset are given below')
print(heartDisease.head())

# Model Bayesian Network
model = BayesianModel([('age', 'trestbps'), ('age', 'fbs'),
                       ('sex', 'trestbps'), ('exang', 'trestbps'), ('trestbps', 'heartdisease'),
                       ('fbs', 'heartdisease'), ('heartdisease', 'restecg'),
                       ('heartdisease', 'thalach'), ('heartdisease', 'chol')])

# Learning CPDs using Maximum Likelihood Estimators
print('\n Learning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Inferencing with Bayesian Network
print('\n Inferencing with Bayesian Network:')
HeartDisease_infer = VariableElimination(model)

# computing the Probability of HeartDisease given Age
print('\n 1. Probability of HeartDisease given Age=28')
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 28})
print(q['heartdisease'])

# computing the Probability of HeartDisease given cholesterol
print('\n 2. Probability of HeartDisease given cholesterol=100')
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'chol': 100})
print(q['heartdisease'])
Output:
Few examples from the dataset are given below
age sex cp trestbps ...slope ca thal heartdisease
0 63 1 1 145 ... 3 0 6 0
1 67 1 4 160 ... 2 3 3 2
2 67 1 4 120 ... 2 2 7 1
3 37 1 3 130 ... 3 0 3 0
4 41 0 2 130 ... 1 0 3 0
[5 rows x 14 columns]

Learning CPD using Maximum likelihood estimators


Inferencing with Bayesian Network:
1. Probability of HeartDisease given Age=28
╒════════════════╤═════════════════════╕
│ heartdisease │ phi(heartdisease) │
╞════════════════╪═════════════════════╡
│ heartdisease_0 │ 0.6791 │
├────────────────┼─────────────────────┤
│ heartdisease_1 │ 0.1212 │
├────────────────┼─────────────────────┤
│ heartdisease_2 │ 0.0810 │
├────────────────┼─────────────────────┤
│ heartdisease_3 │ 0.0939 │
├────────────────┼─────────────────────┤
│ heartdisease_4 │ 0.0247 │
╘════════════════╧═════════════════════╛

2. Probability of HeartDisease given cholesterol=100


╒════════════════╤═════════════════════╕
│ heartdisease │ phi(heartdisease) │
╞════════════════╪═════════════════════╡
│ heartdisease_0 │ 0.5400 │
├────────────────┼─────────────────────┤
│ heartdisease_1 │ 0.1533 │
├────────────────┼─────────────────────┤
│ heartdisease_2 │ 0.1303 │
├────────────────┼─────────────────────┤
│ heartdisease_3 │ 0.1259 │
├────────────────┼─────────────────────┤
│ heartdisease_4 │ 0.0506 │
╘════════════════╧═════════════════════╛

RESULT:

Thus the diagnosis of heart patients using standard Heart Disease Data Set has been
implemented successfully.
