
Outcome based Laboratory Record

19IS707 - MACHINE LEARNING LABORATORY

B.E & INFORMATION SCIENCE AND ENGINEERING


Semester - VII

Academic Year 2022-2023

Submitted by
NAME – 191IG1__

BANNARI AMMAN INSTITUTE OF TECHNOLOGY


(An Autonomous Institution Affiliated to Anna University, Chennai)

SATHYAMANGALAM - 638 401

BANNARI AMMAN INSTITUTE OF TECHNOLOGY
(An Autonomous Institution Affiliated to Anna University, Chennai)

SATHYAMANGALAM - 638 401


DEPARTMENT OF
INFORMATION SCIENCE AND ENGINEERING

BONAFIDE CERTIFICATE

This is a Certified Bonafide Record Book of Ms. ___NAME________ - 191IG1__ submitted


for Machine Learning Laboratory during the academic year 2022-2023.

Staff In-charge                                        Head of the Department

Mr. MANU RAJU                                          Ms. NANDHINI S S

19IS707 - MACHINE LEARNING LABORATORY

Course Outcomes
1. Implement machine learning algorithms using Java or Python.
2. Solve problems relevant to machine learning.

Table of Contents

S. No.   Name of the Experiment                        Marks Awarded   Signature

1        Experiment 1
         1. Objective of the experiment
         2. Introduction
         3. Proposed Methodology
         4. Algorithm and Coding
         5. Output screenshot
         6. Conclusion
         7. Reference

2        Experiment 2
         1. Objective of the experiment
         2. Introduction
         3. Proposed Methodology
         4. Algorithm and Coding
         5. Output screenshot
         6. Conclusion
         7. Reference
EXPERIMENT 1

1.OBJECTIVE OF THE EXPERIMENT:


To consider a set of training examples and implement algorithms that find the most specific
hypothesis and the set of all hypotheses consistent with those examples: to implement and
demonstrate the FIND-S algorithm, which finds the most specific hypothesis for a given set of
training data samples, and to implement and demonstrate the Candidate Elimination algorithm,
which outputs a description of the set of all hypotheses consistent with the training examples.

2.INTRODUCTION:
To learn from the given training data, two algorithms built around specific hypotheses are
implemented: the FIND-S algorithm and the Candidate Elimination algorithm.
The FIND-S algorithm is a basic concept-learning algorithm in machine learning. It finds the
most specific hypothesis that fits all the positive examples; note that the algorithm considers
only the positive training examples. FIND-S starts with the most specific hypothesis and
generalizes it each time it fails to classify an observed positive training example.
The candidate elimination algorithm incrementally builds the version space given a hypothesis
space H and a set E of examples. The examples are added one by one; each example possibly shrinks
the version space by removing the hypotheses that are inconsistent with the example. The candidate
elimination algorithm does this by updating the general and specific boundary for each new example.

● The Candidate Elimination algorithm can be considered an extended form of the Find-S algorithm.
● It considers both positive and negative examples.
● Positive examples are used as in the Find-S algorithm: the specific boundary is generalized to
cover them.
● Negative examples are used to specialize the general boundary so that it excludes them.

3.PROPOSED METHODOLOGY:
● The Find-S algorithm builds on the following concepts:
1. Concept Learning
Any algorithm that supports concept learning requires the following:
● Training Data
● Target Concept

● Actual Data Objects
2. General Hypothesis
Hypothesis, in general, is an explanation for something. The general hypothesis basically states
the general relationship between the major variables.
G = {'?', '?', '?', …, '?'}
3. Specific Hypothesis
The specific hypothesis fills in all the important details about the variables given in the general
hypothesis.
S = {'Φ', 'Φ', 'Φ', …, 'Φ'}
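For example, with six attributes, a partially learned hypothesis such as h = {'Sunny', 'Warm',
'?', 'Strong', '?', '?'} accepts any value for the attributes marked '?', whereas a 'Φ' entry
accepts no value at all.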

● Candidate Elimination Algorithm:


A concept is a well-defined collection of objects. For example, the concept “a bird” encompasses all
the animals that are birds and includes no animal that isn’t a bird.
Each concept has a definition that fully describes all the concept’s members and applies to no objects
that belong to other concepts. Therefore, we can say that a concept is a boolean function defined over
a set of all possible objects and that it returns true only if a given object is a member of the concept.
Otherwise, it returns false.
In concept learning, we have a dataset of objects labeled as either positive or negative. The positive
ones are members of the target concept, and the negative ones aren't. Our goal is to formulate a proper
concept function using the data, and the Candidate Elimination Algorithm (CEA) is a technique for
doing precisely that.

4.ALGORITHM AND CODING:


4.1 ALGORITHM
1. FIND-S ALGORITHM:
● Start with the most specific hypothesis.
h = {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}
● Take the next example and if it is negative, then no changes occur to the
hypothesis.
● If the example is positive and we find that our initial hypothesis is too specific
then we update our current hypothesis to a general condition.
● Keep repeating the above steps till all the training examples are complete.
● After we have processed all the training examples, we will have the final
hypothesis, which we can use to classify new examples.
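As an illustration of these steps, the following minimal sketch runs Find-S on a small
hard-coded dataset in the style of the classic EnjoySport examples (the dataset here is an
illustrative assumption, not the lab's actual CSV):

# Minimal Find-S sketch on an illustrative, hand-made dataset
data = [
    (['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'],   'Yes'),
    (['Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'],   'Yes'),
    (['Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'], 'No'),
    (['Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'], 'Yes'),
]

hypothesis = ['Φ'] * 6              # start with the most specific hypothesis
for attributes, label in data:
    if label != 'Yes':              # negative examples leave the hypothesis unchanged
        continue
    for j, value in enumerate(attributes):
        if hypothesis[j] == 'Φ':    # first positive example: adopt its values
            hypothesis[j] = value
        elif hypothesis[j] != value:
            hypothesis[j] = '?'     # mismatch: generalize this attribute

print(hypothesis)                   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']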

2. CANDIDATE ELIMINATION ALGORITHM:
Step 1: Load the data set.
Step 2: Initialize the general hypothesis boundary G and the specific hypothesis boundary S.
Step 3: For each training example:
Step 4: If the example is positive:
            if attribute_value == hypothesis_value:
                do nothing
            else:
                replace the attribute value in S with '?' (generalizing it)
Step 5: If the example is negative:
            make the general hypotheses more specific, choosing minimal specializations
            that still exclude the example.
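As a worked illustration (assuming the classic four EnjoySport training examples), the
boundaries evolve as follows: after the first two positive examples, S generalizes to
⟨Sunny, Warm, ?, Strong, Warm, Same⟩ while G remains fully general; the third (negative)
example specializes G to {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩, ⟨?, ?, ?, ?, ?, Same⟩};
and the final positive example generalizes S to ⟨Sunny, Warm, ?, Strong, ?, ?⟩ and removes
⟨?, ?, ?, ?, ?, Same⟩ from G, since it no longer covers S.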

4.2 CODING
● Find-S algorithm:
import numpy as np   # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import os

for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# https://www.kaggle.com/imvickykumar999/find-s-algorithm-dataset
import csv

num_attributes = 6
a = []

print("\n The Given Training Data Set \n")
file = '../input/find-s-algorithm-dataset/ws.csv'
with open(file, 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        a.append(row)
        print(row)

print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)

# Initialize the hypothesis from the first training example
for j in range(num_attributes):
    hypothesis[j] = a[0][j]

print("\n Find S: Finding a Maximally Specific Hypothesis\n")
for i in range(len(a)):
    if a[i][num_attributes] == 'Yes':      # use only the positive examples
        for j in range(num_attributes):
            print(a[i][j], end=' ')
            if a[i][j] != hypothesis[j]:   # mismatch: generalize this attribute
                hypothesis[j] = '?'
            else:
                hypothesis[j] = a[i][j]
    print("\n\nFor Training instance No:{} the hypothesis is ".format(i), hypothesis)

print("\n The Maximally Specific Hypothesis for a given Training Examples :\n")
print(hypothesis)
import csv

num_attributes = 5
a = []

print("\n The Given Training Data Set \n")
with open('../input/tennis/tennis.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        a.append(row)
        print(row)

print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)

# Initialize from a[1]: the notebook skips row 0, presumably a header row
for j in range(num_attributes):
    hypothesis[j] = a[1][j]
print("\n The a[1] value of hypothesis: ")
print(hypothesis)

print("\n Find S: Finding a Maximally Specific Hypothesis\n")
for i in range(len(a)):
    if a[i][num_attributes] == 'Yes':
        for j in range(num_attributes):
            if a[i][j] != hypothesis[j]:
                hypothesis[j] = '?'
            else:
                hypothesis[j] = a[i][j]
    print(" For Training instance No:{} the hypothesis is ".format(i), hypothesis)

print("\n The Maximally Specific Hypothesis for a given Training Examples :\n")
print(hypothesis)

● Candidate elimination algorithm:


import numpy as np   # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import os

for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

import random
import csv

def g_0(n):
    # Most general hypothesis: every attribute accepts any value
    return ("?",) * n

def s_0(n):
    # Most specific hypothesis: every attribute accepts no value
    return ('0',) * n

def more_general(h1, h2):
    # True if h1 is more general than (or equal to) h2, attribute by attribute
    more_general_parts = []
    for x, y in zip(h1, h2):
        mg = x == "?" or (x != "0" and (x == y or y == "0"))
        more_general_parts.append(mg)
    return all(more_general_parts)

def fulfills(example, hypothesis):
    # An example fulfills a hypothesis iff the hypothesis is more general than it
    return more_general(hypothesis, example)

def min_generalizations(h, x):
    # Minimally generalize h so that it covers the positive example x
    h_new = list(h)
    for i in range(len(h)):
        if not fulfills(x[i:i+1], h[i:i+1]):
            h_new[i] = '?' if h[i] != '0' else x[i]
    return [tuple(h_new)]

print(min_generalizations(h=('0', '0', 'sunny'),
                          x=('rainy', 'windy', 'cloudy')))

def min_specializations(h, domains, x):
    # Minimally specialize h so that it excludes the negative example x
    results = []
    for i in range(len(h)):
        if h[i] == "?":
            for val in domains[i]:
                if x[i] != val:
                    h_new = h[:i] + (val,) + h[i+1:]
                    results.append(h_new)
        elif h[i] != "0":
            h_new = h[:i] + ('0',) + h[i+1:]
            results.append(h_new)
    return results

print(min_specializations(h=('?', 'x'),
                          domains=[['a', 'b', 'c'], ['x', 'y']],
                          x=('b', 'x')))
with open('../input/wscecsv/wsce.csv') as csvFile:
    examples = [tuple(line) for line in csv.reader(csvFile)]
# examples = [('sunny', 'warm', 'normal', 'strong', 'warm', 'same', True),
#             ('sunny', 'warm', 'high', 'strong', 'warm', 'same', True),
#             ('rainy', 'cold', 'high', 'strong', 'warm', 'change', False),
#             ('sunny', 'warm', 'high', 'strong', 'cool', 'change', True)]
print(examples)

def get_domains(examples):
    # Collect the set of observed values for every attribute column
    d = [set() for i in examples[0]]
    for x in examples:
        for i, xi in enumerate(x):
            d[i].add(xi)
    return [list(sorted(x)) for x in d]

print(get_domains(examples))
def candidate_elimination(examples):
    domains = get_domains(examples)[:-1]   # drop the label column
    G = set([g_0(len(domains))])
    S = set([s_0(len(domains))])
    i = 0
    print("\n G[{0}]:".format(i), G)
    print("\n S[{0}]:".format(i), S)
    for xcx in examples:
        i = i + 1
        x, cx = xcx[:-1], xcx[-1]   # split into attributes and decision
        if cx == 'Y':               # x is a positive example
            G = {g for g in G if fulfills(x, g)}
            S = generalize_S(x, G, S)
        else:                       # x is a negative example
            S = {s for s in S if not fulfills(x, s)}
            G = specialize_G(x, domains, G, S)
        print("\n G[{0}]:".format(i), G)
        print("\n S[{0}]:".format(i), S)
    return

def generalize_S(x, G, S):
    S_prev = list(S)
    for s in S_prev:
        if s not in S:
            continue
        if not fulfills(x, s):
            S.remove(s)
            Splus = min_generalizations(s, x)
            ## keep only generalizations that have a counterpart in G
            S.update([h for h in Splus if any([more_general(g, h)
                                               for g in G])])
            ## remove hypotheses less specific than any other in S
            S.difference_update([h for h in S if
                                 any([more_general(h, h1)
                                      for h1 in S if h != h1])])
    return S

def specialize_G(x, domains, G, S):
    G_prev = list(G)
    for g in G_prev:
        if g not in G:
            continue
        if fulfills(x, g):
            G.remove(g)
            Gminus = min_specializations(g, domains, x)
            ## keep only specializations that have a counterpart in S
            G.update([h for h in Gminus if any([more_general(h, s)
                                                for s in S])])
            ## remove hypotheses less general than any other in G
            G.difference_update([h for h in G if
                                 any([more_general(g1, h)
                                      for g1 in G if h != g1])])
    return G

candidate_elimination(examples)

5.OUTPUT:
● FIND-S ALGORITHM:

● CANDIDATE ELIMINATION ALGORITHM:

6.CONCLUSION:
Hence the given task was completed successfully: the FIND-S and Candidate Elimination
algorithms were implemented and demonstrated, and the course outcome of the experiment was
attained through the lab exercises.
7.REFERENCE:
● https://www.kaggle.com/code/imvickykumar999/candidate-elimination-algorithm
● https://www.kaggle.com/code/imvickykumar999/find-s-algorithm

EXPERIMENT 2
1.OBJECTIVE OF THE TASK:
To write a program that constructs a Bayesian network from medical data and use the model to
demonstrate the diagnosis of heart patients on the standard Heart Disease Data Set; to apply the EM
algorithm to cluster a set of data stored in a .CSV file and cluster the same data set with the
k-Means algorithm; to write a program that implements the k-Nearest Neighbour algorithm to classify
the Iris data set; and to apply reinforcement learning to develop a game of your own.

2.INTRODUCTION:
A Bayesian network is a directed acyclic graph in which each edge corresponds to a
conditional dependency, and each node corresponds to a unique random variable. The Bayesian
network consists of two major parts: a directed acyclic graph and a set of conditional probability
distributions. The directed acyclic graph is a set of random variables represented by nodes.
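Concretely, a Bayesian network over variables X_1, …, X_n factorizes the joint distribution into
one conditional distribution per node:

P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Parents}(X_i)\big)

so in the heart-disease network constructed in Section 4.2 below, for instance, trestbps is
conditioned on its parents age, sex and exang.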

K-Means Clustering is an unsupervised learning algorithm which groups an unlabeled dataset
into different clusters. Here K defines the number of predefined clusters that need to be created
in the process: if K=2, there will be two clusters, and for K=3, there will be three clusters.
The k-nearest neighbors algorithm, also known as KNN or k-NN, is a non-parametric,
supervised learning classifier which uses proximity to make classifications or predictions about the
grouping of an individual data point.
Deep reinforcement learning (deep RL) is a subfield of machine learning that combines
reinforcement learning (RL) and deep learning. RL considers the problem of a computational agent
learning to make decisions by trial and error; deep RL incorporates deep learning into the solution.

3.PROPOSED METHODOLOGY:
There are two components involved in learning a Bayesian network: (i) structure learning,
which involves discovering the DAG that best describes the causal relationships in the data, and (ii)
parameter learning, which involves learning about the conditional probability distributions.
k-means clustering is a method of vector quantization, originally from signal processing, that
aims to partition n observations into k clusters in which each observation belongs to the cluster with
the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster.
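Formally, given cluster assignments S = {S_1, …, S_k} with centroids \mu_i, k-means seeks to
minimize the within-cluster sum of squares:

\arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2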

KNN works by finding the distances between a query and all the examples in the data,
selecting the specified number of examples (K) closest to the query, and then voting for the most
frequent label (in the case of classification) or averaging the labels (in the case of regression).
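The distance referred to here is usually the Euclidean distance between the query point q and a
stored example x over the n attributes:

d(x, q) = \sqrt{\sum_{j=1}^{n} (x_j - q_j)^2}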
In model-based deep reinforcement learning algorithms, a forward model of the environment
dynamics is estimated, usually by supervised learning using a neural network. Then, actions are
obtained by using model predictive control using the learned model.

4.ALGORITHM AND CODING:


4.1 ALGORITHM
1. BAYESIAN NETWORK:
A Bayesian network (BN) is a probabilistic graphical model for representing knowledge about
an uncertain domain where each node corresponds to a random variable and each edge
represents the conditional probability for the corresponding random variables [9]. BNs are also
called belief networks or Bayes nets.
2. K-MEANS ALGORITHM
The k-means clustering algorithm mainly performs two tasks:
● Determines the best value for K center points or centroids by an iterative process.

● Assigns each data point to its closest k-center. Those data points which are near to the
particular k-center, create a cluster.
3. K-NN ALGORITHM:
The K-NN working can be explained on the basis of the below algorithm:

● Step-1: Select the number K of the neighbors

● Step-2: Calculate the Euclidean distance of K number of neighbors

● Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.

● Step-4: Among these k neighbors, count the number of the data points in each category.

● Step-5: Assign the new data points to that category for which the number of the
neighbor is maximum.

● Step-6: Our model is ready.

4. DEEP REINFORCEMENT:
In model-based deep reinforcement learning algorithms, a forward model of the
environment dynamics is estimated, usually by supervised learning using a neural
network. Actions are then obtained by model predictive control using the learned
model. Since the true environment dynamics will usually diverge from the learned
dynamics, the agent re-plans often when carrying out actions in the environment.

4.2 CODING:
1. BAYESIAN NETWORK:
import numpy as np
import csv
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Read the Cleveland Heart Disease data
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)

# Display a few examples from the dataset
print('Few examples from the dataset are given below')
print(heartDisease.head())

# Model the Bayesian network structure
model = BayesianModel([('age', 'trestbps'), ('age', 'fbs'),
                       ('sex', 'trestbps'), ('exang', 'trestbps'),
                       ('trestbps', 'heartdisease'), ('fbs', 'heartdisease'),
                       ('heartdisease', 'restecg'), ('heartdisease', 'thalach'),
                       ('heartdisease', 'chol')])

# Learn the CPDs using Maximum Likelihood Estimation
print('\n Learning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Inference with the Bayesian network
print('\n Inferencing with Bayesian Network:')
HeartDisease_infer = VariableElimination(model)

# Compute the probability of HeartDisease given age
print('\n 1. Probability of HeartDisease given Age=30')
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 30})
print(q['heartdisease'])

# Compute the probability of HeartDisease given cholesterol
print('\n 2. Probability of HeartDisease given cholesterol=100')
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'chol': 100})
print(q['heartdisease'])
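Note: the imports above follow the older pgmpy API used in the original lab program. In recent
pgmpy releases (exact version boundaries are an assumption here), BayesianModel has been renamed
to BayesianNetwork and query() returns a DiscreteFactor that can be printed directly, so an
updated sketch of the inference step would be:

from pgmpy.models import BayesianNetwork  # newer name for BayesianModel
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 30})
print(q)  # the returned DiscreteFactor prints its probability table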

2. K-MEANS ALGORITHM
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
import sklearn.metrics as metrics
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

names = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'Class']
dataset = pd.read_csv("8-dataset.csv", names=names)
X = dataset.iloc[:, :-1]

# Encode the class labels as integers for plotting and scoring
label = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
y = [label[c] for c in dataset.iloc[:, -1]]

plt.figure(figsize=(14, 7))
colormap = np.array(['red', 'lime', 'black'])

# REAL PLOT
plt.subplot(1, 3, 1)
plt.title('Real')
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y])

# K-MEANS PLOT
model = KMeans(n_clusters=3, random_state=0).fit(X)
plt.subplot(1, 3, 2)
plt.title('KMeans')
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_])
print('The accuracy score of K-Means: ', metrics.accuracy_score(y, model.labels_))
print('The Confusion matrix of K-Means:\n', metrics.confusion_matrix(y, model.labels_))

# GMM (EM) PLOT
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
y_cluster_gmm = gmm.predict(X)
plt.subplot(1, 3, 3)
plt.title('GMM Classification')
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm])
print('The accuracy score of EM: ', metrics.accuracy_score(y, y_cluster_gmm))
print('The Confusion matrix of EM:\n ', metrics.confusion_matrix(y, y_cluster_gmm))
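A caveat on the scores above: metrics.accuracy_score compares the raw cluster indices from
KMeans and GaussianMixture against the encoded class labels, so the number is only meaningful
when the arbitrary cluster numbering happens to coincide with the label encoding. A
permutation-invariant measure such as metrics.adjusted_rand_score(y, model.labels_) gives a
fairer comparison between the two clusterings.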

3. K-NN ALGORITHM:
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Read dataset to pandas dataframe
dataset = pd.read_csv("9-dataset.csv", names=names)
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
print(X.head())

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)
classifier = KNeighborsClassifier(n_neighbors=5).fit(Xtrain, ytrain)
ypred = classifier.predict(Xtest)

i = 0
print("\n-------------------------------------------------------------------------")
print('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print("-------------------------------------------------------------------------")
for label in ytest:
    print('%-25s %-25s' % (label, ypred[i]), end="")
    if label == ypred[i]:
        print(' %-25s' % ('Correct'))
    else:
        print(' %-25s' % ('Wrong'))
    i = i + 1
print("-------------------------------------------------------------------------")
print("\nConfusion Matrix:\n", metrics.confusion_matrix(ytest, ypred))
print("-------------------------------------------------------------------------")
print("\nClassification Report:\n", metrics.classification_report(ytest, ypred))
print("-------------------------------------------------------------------------")
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest, ypred))
print("-------------------------------------------------------------------------")

4. DEEP REINFORCEMENT:
from kaggle_environments import make, evaluate

# Create the game environment
# Set debug=True to see the errors if your agent refuses to run
env = make("connectx", debug=True)

# List of available default agents
print(list(env.agents))

# Two random agents play one game round
env.run(["random", "random"])

# Show the game
env.render(mode="ipython")

# Defining agents
import random
import numpy as np

# Selects a random valid column
def agent_random(obs, config):
    valid_moves = [col for col in range(config.columns) if obs.board[col] == 0]
    return random.choice(valid_moves)

# Selects the middle column
def agent_middle(obs, config):
    return config.columns // 2

# Selects the leftmost valid column
def agent_leftmost(obs, config):
    valid_moves = [col for col in range(config.columns) if obs.board[col] == 0]
    return valid_moves[0]

# obs.board is the grid flattened row by row; for the board pictured in the
# source notebook it would be [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0,
# 2, 2, 0, 0, 0, 0, 2, 1, 2, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 2, 1, 2, 0, 2, 0]

# Agents play one game round
env.run([agent_leftmost, agent_random])
env.render(mode="ipython")

def get_win_percentages(agent1, agent2, n_rounds=100):
    # Use default Connect Four setup
    config = {'rows': 6, 'columns': 7, 'inarow': 4}
    # Agent 1 goes first (roughly) half the time
    outcomes = evaluate("connectx", [agent1, agent2], config, [], n_rounds // 2)
    # Agent 2 goes first (roughly) half the time
    outcomes += [[b, a] for [a, b] in evaluate("connectx", [agent2, agent1],
                                               config, [], n_rounds - n_rounds // 2)]
    print("Agent 1 Win Percentage:",
          np.round(outcomes.count([1, -1]) / len(outcomes), 2))
    print("Agent 2 Win Percentage:",
          np.round(outcomes.count([-1, 1]) / len(outcomes), 2))
    print("Number of Invalid Plays by Agent 1:", outcomes.count([None, 0]))
    print("Number of Invalid Plays by Agent 2:", outcomes.count([0, None]))

get_win_percentages(agent1=agent_middle, agent2=agent_random)
get_win_percentages(agent1=agent_leftmost, agent2=agent_random)
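As a small extension in the same spirit, the sketch below defines a hypothetical agent (not part
of the referenced notebook) that plays the middle column while it is still open and otherwise
falls back to a random valid move; it can be evaluated with the same helper:

# Hypothetical agent: prefer the middle column, otherwise play a random valid column
def agent_middle_or_random(obs, config):
    valid_moves = [col for col in range(config.columns) if obs.board[col] == 0]
    middle = config.columns // 2
    return middle if middle in valid_moves else random.choice(valid_moves)

get_win_percentages(agent1=agent_middle_or_random, agent2=agent_random)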

5.OUTPUT:

1. BAYESIAN NETWORK:

2. K-MEANS ALGORITHM:

3. K-NN ALGORITHM:

4. DEEP REINFORCEMENT:
6. CONCLUSION:
Hence the given tasks were completed successfully: the Bayesian network, k-Means/EM
clustering, k-NN classification and reinforcement-learning programs were implemented and
demonstrated, and the course outcome of the experiment was attained through the lab exercises.
7. REFERENCE:
● https://deepakdvallur.weebly.com/uploads/8/9/7/5/89758787/lab_program_7.pdf
● https://www.vtupulse.com/machine-learning/k-means-and-em-algorithm-in-python/
● https://www.vtupulse.com/machine-learning/k-nearest-neighbour-algorithm-in-python/
● https://towardsdatascience.com/ultimate-guide-for-reinforced-learning-part-1-creating-a-game-956f1f2b0a91

