Outcome Based Lab Report
Submitted by
NAME – 191IG1__
BANNARI AMMAN INSTITUTE OF TECHNOLOGY
(An Autonomous Institution Affiliated to Anna University, Chennai)
BONAFIDE CERTIFICATE
19IS707 - MACHINE LEARNING LABORATORY
Course Outcomes
1. Implement machine learning algorithms using Java or Python.
2. Solve problems relevant to machine learning.
Table of Contents
S. No.   Name of the Experiment        Marks Awarded   Signature
1        Experiment 1
             2. Introduction
             6. Conclusion
             7. Reference
2        Experiment 2
             2. Introduction
             6. Conclusion
             7. Reference
EXPERIMENT 1
1.OBJECTIVE OF THE TASK:
To write a program to implement the FIND-S algorithm and the Candidate Elimination algorithm for a given set of training data stored in a .CSV file.
2.INTRODUCTION:
Two concept-learning algorithms that find hypotheses consistent with a set of training data are considered here: the FIND-S algorithm and the Candidate Elimination algorithm.
The FIND-S algorithm is a basic concept-learning algorithm in machine learning. It finds the most specific hypothesis that fits all the positive examples; note that the algorithm considers only the positive training examples. FIND-S starts with the most specific hypothesis and generalizes it each time it fails to classify an observed positive training example.
The Candidate Elimination algorithm incrementally builds the version space given a hypothesis space H and a set E of examples. The examples are added one by one; each example may shrink the version space by removing the hypotheses that are inconsistent with it. The algorithm does this by updating the general boundary G and the specific boundary S for each new example. For instance, on the classic EnjoySport training data used in section 4.2, the boundaries converge to S = ⟨sunny, warm, ?, strong, ?, ?⟩ and G = {⟨sunny, ?, ?, ?, ?, ?⟩, ⟨?, warm, ?, ?, ?, ?⟩}.
3.PROPOSED METHODOLOGY:
● The FIND-S algorithm builds on the following concepts:
1. Concept Learning
Any algorithm that supports concept learning requires the following:
● Training Data
● Target Concept
● Actual Data Objects
2. General Hypothesis
A hypothesis, in general, is an explanation for something. The general hypothesis states the most general relationship between the major variables.
G = {'?', '?', '?', ..., '?'}
3. Specific Hypothesis
The specific hypothesis fills in all the important details about the variables given in the general
hypothesis.
S = {'Φ', 'Φ', 'Φ', ..., 'Φ'}
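1. FIND-S ALGORITHM:
Step1: Initialize h to the most specific hypothesis in H.
Step2: For each positive training example x:
           For each attribute constraint in h:
               if the constraint is satisfied by x, do nothing;
               else replace the constraint in h with the next more general value satisfied by x.
Step3: Output the hypothesis h.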
2. CANDIDATE ELIMINATION ALGORITHM:
Step1: Load the data set.
Step2: Initialize the General Hypothesis (G) and the Specific Hypothesis (S).
Step3: For each training example:
Step4: If the example is positive:
           if attribute_value == hypothesis_value:
               do nothing
           else:
               replace the attribute value in S with '?' (generalizing it)
Step5: If the example is negative:
           make the generalizations in G more specific.
4.2 CODING:
○ FIND-S algorithm
import numpy as np   # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import os

# List the input files available in the Kaggle environment
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# https://www.kaggle.com/imvickykumar999/find-s-algorithm-dataset
import csv
num_attributes = 6
a = []
print("\n The Given Training Data Set \n")
file = '../input/find-s-algorithm-dataset/ws.csv'
with open(file, 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        a.append(row)
        print(row)

print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)

# Seed the hypothesis with the first training instance
for j in range(0, num_attributes):
    hypothesis[j] = a[0][j]

print("\n Find S: Finding a Maximally Specific Hypothesis\n")
for i in range(0, len(a)):
    if a[i][num_attributes] == 'Yes':      # consider positive examples only
        for j in range(0, num_attributes):
            print(a[i][j], end=' ')
            if a[i][j] != hypothesis[j]:
                hypothesis[j] = '?'        # generalize the mismatching attribute
            else:
                hypothesis[j] = a[i][j]
    print("\n\nFor Training instance No:{} the hypothesis is ".format(i), hypothesis)

print("\n The Maximally Specific Hypothesis for a given Training Examples :\n")
print(hypothesis)
# Repeat FIND-S on the Play-Tennis data set
import csv
num_attributes = 5
a = []
print("\n The Given Training Data Set \n")
with open('../input/tennis/tennis.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        a.append(row)
        print(row)

print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)

# a[0] is the header row, so a[1] is the first training instance
for j in range(0, num_attributes):
    hypothesis[j] = a[1][j]
print("\n The a[1] value of hypothesis: ")
print(hypothesis)

print("\n Find S: Finding a Maximally Specific Hypothesis\n")
for i in range(1, len(a)):
    if a[i][num_attributes] == 'Yes':
        for j in range(0, num_attributes):
            if a[i][j] != hypothesis[j]:
                hypothesis[j] = '?'
            else:
                hypothesis[j] = a[i][j]
    print(" For Training instance No:{} the hypothesis is ".format(i), hypothesis)

print("\n The Maximally Specific Hypothesis for a given Training Examples :\n")
print(hypothesis)
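○ Candidate elimination algorithm
The candidate elimination listing below relies on helper functions (g_0, s_0, more_general, fulfills, min_generalizations and min_specializations) whose definitions are missing from this report. The following sketch reconstructs them in the form the calls assume, with hypotheses represented as attribute tuples in which '?' matches any value and '0' matches none; it follows the common version-space implementation rather than the exact lost code.

def g_0(n):
    # Most general hypothesis: every attribute unconstrained
    return ('?',) * n

def s_0(n):
    # Most specific hypothesis: every attribute rejects all values
    return ('0',) * n

def more_general(h1, h2):
    # True if hypothesis h1 is more general than (or equal to) h2
    more_general_parts = []
    for x, y in zip(h1, h2):
        mg = x == '?' or (x != '0' and (x == y or y == '0'))
        more_general_parts.append(mg)
    return all(more_general_parts)

def fulfills(example, hypothesis):
    # An example satisfies a hypothesis when the hypothesis is more general than it
    return more_general(hypothesis, example)

def min_generalizations(h, x):
    # Minimally generalize h so that it covers example x
    h_new = list(h)
    for i in range(len(h)):
        if not fulfills(x[i:i+1], h[i:i+1]):
            h_new[i] = '?' if h[i] != '0' else x[i]
    return [tuple(h_new)]

def min_specializations(h, domains, x):
    # Minimally specialize h so that it no longer covers example x
    results = []
    for i in range(len(h)):
        if h[i] == '?':
            for val in domains[i]:
                if x[i] != val:
                    h_new = h[:i] + (val,) + h[i+1:]
                    results.append(h_new)
        elif h[i] != '0':
            h_new = h[:i] + ('0',) + h[i+1:]
            results.append(h_new)
    return results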
# Demonstrate min_specializations on a toy hypothesis
print(min_specializations(h=('?', 'x'),
                          domains=[['a', 'b', 'c'], ['x', 'y']],
                          x=('b', 'x')))
# Read the training examples for candidate elimination
with open('../input/wscecsv/wsce.csv') as csvFile:
    examples = [tuple(line) for line in csv.reader(csvFile)]
# examples = [('sunny', 'warm', 'normal', 'strong', 'warm', 'same', True),
#             ('sunny', 'warm', 'high', 'strong', 'warm', 'same', True),
#             ('rainy', 'cold', 'high', 'strong', 'warm', 'change', False),
#             ('sunny', 'warm', 'high', 'strong', 'cool', 'change', True)]
print(examples)
def get_domains(examples):
    # Collect the set of values observed for each attribute column
    d = [set() for i in examples[0]]
    for x in examples:
        for i, xi in enumerate(x):
            d[i].add(xi)
    return [list(sorted(x)) for x in d]

print(get_domains(examples))
def candidate_elimination(examples):
    domains = get_domains(examples)[:-1]
    G = set([g_0(len(domains))])
    S = set([s_0(len(domains))])
    i = 0
    print("\n G[{0}]:".format(i), G)
    print("\n S[{0}]:".format(i), S)
    for xcx in examples:
        i = i + 1
        x, cx = xcx[:-1], xcx[-1]  # Splitting data into attributes and decisions
        if cx == 'Y':  # x is a positive example
            G = {g for g in G if fulfills(x, g)}
            S = generalize_S(x, G, S)
        else:          # x is a negative example
            S = {s for s in S if not fulfills(x, s)}
            G = specialize_G(x, domains, G, S)
        print("\n G[{0}]:".format(i), G)
        print("\n S[{0}]:".format(i), S)
    return
def generalize_S(x, G, S):
    S_prev = list(S)
    for s in S_prev:
        if s not in S:
            continue
        if not fulfills(x, s):
            S.remove(s)
            Splus = min_generalizations(s, x)
            ## keep only generalizations that have a counterpart in G
            S.update([h for h in Splus if any([more_general(g, h)
                                               for g in G])])
            ## remove hypotheses less specific than any other in S
            S.difference_update([h for h in S if
                                 any([more_general(h, h1)
                                      for h1 in S if h != h1])])
    return S
def specialize_G(x, domains, G, S):
    G_prev = list(G)
    for g in G_prev:
        if g not in G:
            continue
        if fulfills(x, g):
            G.remove(g)
            Gminus = min_specializations(g, domains, x)
            ## keep only specializations that have a counterpart in S
            G.update([h for h in Gminus if any([more_general(h, s)
                                                for s in S])])
            ## remove hypotheses less general than any other in G
            G.difference_update([h for h in G if
                                 any([more_general(g1, h)
                                      for g1 in G if h != g1])])
    return G
candidate_elimination(examples)
5.OUTPUT:
● FIND-S ALGORITHM:
● CANDIDATE ELIMINATION ALGORITHM:
6.CONCLUSION:
Hence the given task was completed successfully: the FIND-S and Candidate Elimination algorithms were implemented and verified on the given training data, achieving the course outcome of the experiment.
7.REFERENCE:
● https://www.kaggle.com/code/imvickykumar999/candidate-elimination-algorithm
● https://www.kaggle.com/code/imvickykumar999/find-s-algorithm
EXPERIMENT 2
1.OBJECTIVE OF THE TASK:
To write a program to construct a Bayesian network from medical data and use the model to demonstrate the diagnosis of heart patients on the standard Heart Disease Data Set; to apply the EM algorithm to cluster a set of data stored in a .CSV file and to cluster the same data set with the k-Means algorithm; to implement the k-Nearest Neighbour algorithm to classify the Iris data set; and to apply reinforcement learning to develop a game.
2.INTRODUCTION:
A Bayesian network is a directed acyclic graph in which each edge corresponds to a conditional dependency and each node corresponds to a unique random variable. The Bayesian network consists of two major parts: a directed acyclic graph and a set of conditional probability distributions. The directed acyclic graph is a set of random variables represented by nodes.
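Formally, a Bayesian network over variables X1, ..., Xn factorizes the joint probability distribution as the product of each node's conditional distribution given its parents in the graph: P(X1, ..., Xn) = Π i P(Xi | Parents(Xi)). In the heart-disease model built in section 4.2, for example, 'trestbps' is conditioned on its parents 'age', 'sex' and 'exang'.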
3.PROPOSED METHODOLOGY:
There are two components involved in learning a Bayesian network: (i) structure learning,
which involves discovering the DAG that best describes the causal relationships in the data, and (ii)
parameter learning, which involves learning about the conditional probability distributions.
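In the program in section 4.2, the structure (the DAG) is specified by hand as a list of directed edges, and parameter learning is carried out by pgmpy's MaximumLikelihoodEstimator.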
k-means clustering is a method of vector quantization, originally from signal processing, that
aims to partition n observations into k clusters in which each observation belongs to the cluster with
the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster.
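Equivalently, k-means seeks cluster centers μ1, ..., μk that minimize the within-cluster sum of squares, J = Σi mink ||xi − μk||²; each iteration alternates between assigning points to the nearest center and recomputing each center as the mean of its assigned points.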
KNN works by finding the distances between a query and all the examples in the data, selecting the specified number of examples (K) closest to the query, and then voting for the most frequent label (in the case of classification) or averaging the labels (in the case of regression).
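The distance is usually the Euclidean distance, d(p, q) = √((p1 − q1)² + ... + (pn − qn)²), computed between the query point and every training example.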
2. K-MEANS ALGORITHM:
● Assigns each data point to its closest k-center; the data points that are near a particular k-center form a cluster.
3. K-NN ALGORITHM:
The working of K-NN can be explained on the basis of the below algorithm:
● Step-1: Select the number K of neighbors.
● Step-2: Calculate the Euclidean distance between the new data point and each training example.
● Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
● Step-4: Among these K neighbors, count the number of data points in each category.
● Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
4. DEEP REINFORCEMENT:
In model-based deep reinforcement learning algorithms, a forward model of the environment dynamics is estimated, usually by supervised learning using a neural network. Actions are then obtained by model predictive control with the learned model. Since the true environment dynamics usually diverge from the learned dynamics, the agent re-plans often while carrying out actions in the environment.
4.2 CODING:
1. BAYESIAN NETWORK:
import numpy as np
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Read the Cleveland Heart Disease data
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)

# Display the data
print('Few examples from the dataset are given below')
print(heartDisease.head())

# Model the Bayesian network structure
model = BayesianModel([('age', 'trestbps'), ('age', 'fbs'),
                       ('sex', 'trestbps'), ('exang', 'trestbps'),
                       ('trestbps', 'heartdisease'), ('fbs', 'heartdisease'),
                       ('heartdisease', 'restecg'), ('heartdisease', 'thalach'),
                       ('heartdisease', 'chol')])

# Learning CPDs using Maximum Likelihood Estimators
print('\n Learning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Inferencing with Bayesian Network
print('\n Inferencing with Bayesian Network:')
HeartDisease_infer = VariableElimination(model)

# Computing the probability of HeartDisease given age
print('\n 1. Probability of HeartDisease given Age=30')
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 30})
print(q)  # on recent pgmpy versions query() returns a factor, so print(q) shows the distribution

# Computing the probability of HeartDisease given cholesterol
print('\n 2. Probability of HeartDisease given cholesterol=100')
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'chol': 100})
print(q)
2. K-MEANS ALGORITHM
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn import datasets
import sklearn.metrics as metrics
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Data-loading setup (assumed, following the cited VTUPulse program):
# build the Iris feature frame X, target frame y and a colour map
iris = datasets.load_iris()
X = pd.DataFrame(iris.data, columns=['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'])
y = pd.DataFrame(iris.target, columns=['Targets'])
colormap = np.array(['red', 'lime', 'black'])
plt.figure(figsize=(14, 7))

# REAL PLOT
plt.subplot(1, 3, 1)
plt.title('Real')
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets])

# K-MEANS PLOT
model = KMeans(n_clusters=3, random_state=0).fit(X)
plt.subplot(1, 3, 2)
plt.title('KMeans')
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_])
print('The accuracy score of K-Means: ', metrics.accuracy_score(y.Targets, model.labels_))
print('The Confusion matrix of K-Means:\n', metrics.confusion_matrix(y.Targets, model.labels_))

# GMM (EM) PLOT
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
y_cluster_gmm = gmm.predict(X)
plt.subplot(1, 3, 3)
plt.title('GMM Classification')
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm])
print('The accuracy score of EM: ', metrics.accuracy_score(y.Targets, y_cluster_gmm))
print('The Confusion matrix of EM:\n', metrics.confusion_matrix(y.Targets, y_cluster_gmm))
3. K-NN ALGORITHM:
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Read the dataset into a pandas DataFrame
dataset = pd.read_csv("9-dataset.csv", names=names)
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
print(X.head())

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)
classifier = KNeighborsClassifier(n_neighbors=5).fit(Xtrain, ytrain)
ypred = classifier.predict(Xtest)

i = 0
print("\n-------------------------------------------------------------------------")
print('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print("-------------------------------------------------------------------------")
for label in ytest:
    print('%-25s %-25s' % (label, ypred[i]), end="")
    if label == ypred[i]:
        print(' %-25s' % 'Correct')
    else:
        print(' %-25s' % 'Wrong')
    i = i + 1
print("-------------------------------------------------------------------------")
print("\nConfusion Matrix:\n", metrics.confusion_matrix(ytest, ypred))
print("-------------------------------------------------------------------------")
print("\nClassification Report:\n", metrics.classification_report(ytest, ypred))
print("-------------------------------------------------------------------------")
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest, ypred))
print("-------------------------------------------------------------------------")
4. DEEP REINFORCEMENT:
import random
import numpy as np
from kaggle_environments import make, evaluate

# Create the game environment
# Set debug=True to see the errors if your agent refuses to run
env = make("connectx", debug=True)

# List of available default agents
print(list(env.agents))

# Two random agents play one game round
env.run(["random", "random"])

# Selects a random valid column
# (function body assumed: only the return statement survived in the original
# listing; the valid_moves line mirrors agent_leftmost below)
def agent_random(obs, config):
    valid_moves = [col for col in range(config.columns) if obs.board[col] == 0]
    return random.choice(valid_moves)

# Selects middle column
def agent_middle(obs, config):
    return config.columns // 2

# Selects leftmost valid column
def agent_leftmost(obs, config):
    valid_moves = [col for col in range(config.columns) if obs.board[col] == 0]
    return valid_moves[0]

# For the board pictured in the source notebook, obs.board would be
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 2, 1, 2,
#  0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 2, 1, 2, 0, 2, 0]

# Agents play one game round
env.run([agent_leftmost, agent_random])
env.render(mode="ipython")

def get_win_percentages(agent1, agent2, n_rounds=100):
    # Use default Connect Four setup
    config = {'rows': 6, 'columns': 7, 'inarow': 4}
    # Agent 1 goes first (roughly) half the time
    outcomes = evaluate("connectx", [agent1, agent2], config, [], n_rounds // 2)
    # Agent 2 goes first (roughly) half the time
    outcomes += [[b, a] for [a, b] in evaluate("connectx", [agent2, agent1],
                                               config, [], n_rounds - n_rounds // 2)]
    print("Agent 1 Win Percentage:",
          np.round(outcomes.count([1, -1]) / len(outcomes), 2))
    print("Agent 2 Win Percentage:",
          np.round(outcomes.count([-1, 1]) / len(outcomes), 2))
    print("Number of Invalid Plays by Agent 1:", outcomes.count([None, 0]))
    print("Number of Invalid Plays by Agent 2:", outcomes.count([0, None]))

get_win_percentages(agent1=agent_middle, agent2=agent_random)
get_win_percentages(agent1=agent_leftmost, agent2=agent_random)
5.OUTPUT:
1. BAYESIAN NETWORK:
2. K-MEANS ALGORITHM:
3. K-NN ALGORITHM:
4. DEEP REINFORCEMENT:
6. CONCLUSION:
Hence the given tasks were completed successfully: the Bayesian network, k-Means/EM clustering, k-NN classification and reinforcement learning programs were implemented and verified, achieving the course outcome of the experiment.
7. REFERENCE:
● https://deepakdvallur.weebly.com/uploads/8/9/7/5/89758787/lab_program_7.pdf
● https://www.vtupulse.com/machine-learning/k-means-and-em-algorithm-in-python/
● https://www.vtupulse.com/machine-learning/k-nearest-neighbour-algorithm-in-python/
● https://towardsdatascience.com/ultimate-guide-for-reinforced-learning-part-1-creating-a-game-956f1f2b0a91