Name: Dhruvil K Kotecha ID No.: 17CP024 Sub. Code: CP-402 Sub. Name: ADT Semester: 7 Year: 2020/21

The document contains the code and output for ten labs implementing data-mining algorithms: decision trees, naive Bayes, k-nearest neighbors, Apriori, FP-growth, k-means and k-medoids clustering. The first program generates 100 random integer or float values and calculates the mean, median, mode, range, variance and standard deviation. The second program applies min-max, z-score and decimal-scaling normalization to a weather dataset. The third program builds a decision tree using information gain on a given dataset, calculating the entropy at each node to determine the best attribute to split on.

Uploaded by

Dhruvil Kotecha

17CP024 CP402

Name : Dhruvil K Kotecha


ID No. : 17CP024
Sub. Code : CP-402
Sub. Name : ADT
Semester : 7th
Year : 2020/21
Index
LAB 1
AIM: WRITE A PROGRAM IN C++ TO READ RANDOM VALUE OF 100 INTEGER/FLOAT ARRAY AND FIND MEAN, MEDIAN, MODE, RANGE, VARIANCE AND STANDARD DEVIATION.
AIM: WRITE A PROGRAM IN C++ TO PERFORM MIN-MAX NORMALIZATION, Z-SCORE AND DECIMAL SCALING NORMALIZATION TECHNIQUE ON A GIVEN DATA-SET

LAB 2
AIM: WRITE A PROGRAM TO APPLY DECISION TREE METHOD USING INFORMATION GAIN ON GIVEN DATASET

LAB 3
AIM: WRITE A PROGRAM TO APPLY DECISION TREE METHOD USING GINI INDEX ON GIVEN DATASET

LAB 4
AIM: WRITE A PROGRAM TO PERFORM NAIVE-BAYES CLASSIFICATION ALGORITHM

LAB 5
AIM: WRITE A PROGRAM TO PERFORM K-NN CLASSIFICATION USING MIN-MAX NORMALIZATION.

LAB 6
AIM: SIMPLE K-NN CLASSIFICATION ALGORITHM

LAB 7
AIM: WRITE A PROGRAM TO PERFORM APRIORI ALGORITHM

LAB 8
AIM: WRITE A PROGRAM TO PERFORM FP GROWTH ALGORITHM

LAB 9
AIM: WRITE A PROGRAM TO PERFORM K-MEANS CLUSTERING ALGORITHM

LAB 10
AIM: WRITE A PROGRAM TO PERFORM K-MEDOIDS ALGORITHM


Lab 1 30/7/2020

Program 1
Aim: Write a program in C++ to read an array of 100 random integer/float values
and find the mean, median, mode, range, variance and standard deviation.
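For reference, the six quantities named in this aim can be cross-checked in a few lines of Python. This is only a minimal sketch built on the standard `statistics` module; the helper name `describe` and the sample values are illustrative assumptions, and the variance shown is the population form (dividing by n).

```python
import statistics


def describe(values):
    """Return the six summary statistics named in the aim for a list of numbers."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),           # most frequent value
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),  # population variance (divide by n)
        "std_dev": statistics.pstdev(values),      # square root of the variance
    }


# Hypothetical sample, chosen so the results are easy to verify by hand.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
print(summary)
```

Comparing such a cross-check against the C++ output on the same values is a quick way to catch mistakes such as integer division in the mean.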
Code:
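#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    int n;
    cin >> n;
    if (n <= 0)
        return 1;   // nothing to summarize

    // Fill the array with pseudo-random values in [0, 99].
    vector<int> a(n);
    for (int i = 0; i < n; i++)
        a[i] = rand() % 100;

    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    float mean = static_cast<float>(sum) / n;   // cast first to avoid integer division

    sort(a.begin(), a.end());
    // Average of the two middle elements (they coincide when n is odd).
    float median = (a[(n - 1) / 2] + a[n / 2]) / 2.0f;

    cout << "mean=" << mean << endl;
    cout << "median=" << median << endl;

    int maxVal = a[n - 1];   // the array is sorted
    int minVal = a[0];

    // Count how often each value occurs; the mode is the value with the highest count.
    vector<int> cnt(maxVal + 1, 0);
    for (int i = 0; i < n; i++)
        cnt[a[i]]++;
    int mode = static_cast<int>(max_element(cnt.begin(), cnt.end()) - cnt.begin());
    cout << "mode=" << mode << endl;

    // Population variance: mean of the squared deviations.
    float vari = 0;
    for (int i = 0; i < n; i++)
        vari += (a[i] - mean) * (a[i] - mean);
    float variance = vari / n;
    float sd = sqrt(variance);
    cout << "variance=" << variance << endl;
    cout << "standard deviation=" << sd << endl;
    cout << "range=" << maxVal - minVal;
}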

Output:


Program 2

Aim: Write a program in C++ to perform min-max normalization, Z-score and
decimal scaling normalization techniques on a given data-set
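As a compact reference for the three techniques in this aim, the sketch below applies each one to a short list. It is a minimal illustration under stated assumptions: the function names and the sample temperatures are hypothetical, the column is assumed not to be constant (so the min-max range and the standard deviation are non-zero), and the z-score uses the population standard deviation.

```python
import math


def min_max(values):
    """Rescale to [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer making every |v| < 1."""
    peak = max(abs(v) for v in values)
    j = 0
    while peak / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


# Hypothetical readings, not taken from the weather dataset.
temps = [10.0, 20.0, 30.0, 40.0, 50.0]
print(min_max(temps))
print(z_score(temps))
print(decimal_scaling(temps))
```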

Code:

#include <cmath>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <limits>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// Split one CSV line on commas (quoted fields are not handled).
vector<string> splitRow(const string &line)
{
    vector<string> row;
    string word;
    stringstream s(line);
    while (getline(s, word, ','))
        row.push_back(word);
    return row;
}

// Smallest power of ten that brings every value in [lo, hi] below 1 in magnitude.
int decimalDivisor(float lo, float hi)
{
    int x = static_cast<int>(fmax(fabs(lo), fabs(hi)));
    int div = 1;
    while (x > 0) {
        div *= 10;
        x /= 10;
    }
    return div;
}

int main()
{
    // All three passes read the same input file.
    const string inputPath =
        "D://sem7//cp402_lab//lab_1//road-weather-information-stations_Final.csv";
    fstream fin, fout;
    fin.open(inputPath, ios::in);
    fout.open("Practical.csv", ios::out);
    if (!fin || !fout) {
        cerr << "cannot open input or output file" << endl;
        return 1;
    }

    string line;
    int n = 0;   // number of data rows actually read
    float m1 = 0, m2 = 0, std1, std2;
    float min1 = numeric_limits<float>::max(), min2 = min1;
    float max1 = numeric_limits<float>::lowest(), max2 = max1;

    // Pass 1: sums, minima and maxima of columns 6 and 7.
    while (getline(fin, line)) {
        vector<string> row = splitRow(line);
        if (row.size() < 8 || row[0] == "StationName")
            continue;   // skip the header and malformed lines
        float temp = atof(row[6].c_str());
        float temp1 = atof(row[7].c_str());
        m1 += temp;
        m2 += temp1;
        if (temp < min1) min1 = temp;
        if (temp > max1) max1 = temp;
        if (temp1 < min2) min2 = temp1;
        if (temp1 > max2) max2 = temp1;
        n++;
    }
    if (n == 0) {
        cerr << "no data rows found" << endl;
        return 1;
    }
    m1 = m1 / n;
    m2 = m2 / n;
    int div1 = decimalDivisor(min1, max1);
    int div2 = decimalDivisor(min2, max2);
    fin.close();

    // Pass 2: population standard deviation of each column.
    fin.clear();
    fin.open(inputPath, ios::in);
    float var1 = 0, var2 = 0;
    while (getline(fin, line)) {
        vector<string> row = splitRow(line);
        if (row.size() < 8 || row[0] == "StationName")
            continue;
        float tmp = atof(row[6].c_str());
        float tmp1 = atof(row[7].c_str());
        var1 += (tmp - m1) * (tmp - m1);
        var2 += (tmp1 - m2) * (tmp1 - m2);
    }
    std1 = sqrt(var1 / n);
    std2 = sqrt(var2 / n);
    float dif1 = max1 - min1;
    float dif2 = max2 - min2;
    fin.close();

    // Pass 3: write raw, min-max, z-score and decimal-scaled values.
    fin.clear();
    fin.open(inputPath, ios::in);
    while (getline(fin, line)) {
        vector<string> row = splitRow(line);
        if (row.size() < 8 || row[0] == "StationName")
            continue;
        float tp = atof(row[6].c_str());
        float tp1 = atof(row[7].c_str());
        fout << row[6] << ",";
        fout << row[7] << ",";
        fout << (tp - min1) / dif1 << ",";
        fout << (tp1 - min2) / dif2 << ",";
        fout << (tp - m1) / std1 << ",";
        fout << (tp1 - m2) / std2 << ",";
        fout << tp / div1 << ",";
        fout << tp1 / div2 << "\n";
    }

    fin.close();
    fout.close();
}

Output:


Lab 2 13/8/2020

Program
Aim: Write a program to apply the decision tree method using information gain on
a given dataset.
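For reference, the entropy of a set S with class proportions p_i is H(S) = -sum(p_i * log2(p_i)), and the information gain of an attribute A is H(S) minus the size-weighted entropy of the subsets produced by splitting on A. The sketch below shows both on a tiny example; the function names and the four toy rows are illustrative assumptions, not the lab dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, attr_index, label_index=-1):
    """Entropy of the whole set minus the weighted entropy after splitting."""
    labels = [r[label_index] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_index], []).append(r[label_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical rows of the form (outlook, play).
toy = [("sunny", "No"), ("sunny", "No"), ("rain", "Yes"), ("rain", "Yes")]
print(entropy([r[-1] for r in toy]))   # evenly mixed labels
print(information_gain(toy, 0))        # the split separates the classes perfectly
```

The attribute with the largest gain is the one chosen for the split at each node.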

Code:

import copy
import math

import pandas as pd

dataset = pd.read_csv('dataset3.csv')
X = dataset.iloc[:, 1:].values
print(X)
attribute = ['outlook', 'temp', 'humidity', 'wind']


class Node(object):
    def __init__(self):
        self.value = None      # attribute name, or 'Yes'/'No' at a leaf
        self.decision = None   # attribute value on the edge from the parent
        self.childs = None


def findEntropy(data, rows):
    """Return (entropy, ans); ans is 1 if all rows are 'Yes', 0 if all 'No', else -1."""
    yes = 0
    no = 0
    ans = -1
    idx = len(data[0]) - 1
    entropy = 0
    for i in rows:
        if data[i][idx] == 'Yes':
            yes = yes + 1
        else:
            no = no + 1

    x = yes/(yes+no)
    y = no/(yes+no)
    if x != 0 and y != 0:
        entropy = -1 * (x*math.log2(x) + y*math.log2(y))
    if x == 1:
        ans = 1
    if y == 1:
        ans = 0
    return entropy, ans


def findMaxGain(data, rows, columns):
    """Return (best gain, index of the best column, ans from findEntropy)."""
    maxGain = 0
    retidx = -1
    entropy, ans = findEntropy(data, rows)
    if entropy == 0:
        # Pure node: nothing left to split.
        return maxGain, retidx, ans

    for j in columns:
        # Count how many of the current rows take each value of column j.
        mydict = {}
        idx = j
        for i in rows:
            key = data[i][idx]
            if key not in mydict:
                mydict[key] = 1
            else:
                mydict[key] = mydict[key] + 1
        gain = entropy

        for key in mydict:
            yes = 0
            no = 0
            for k in rows:
                if data[k][j] == key:
                    if data[k][-1] == 'Yes':
                        yes = yes + 1
                    else:
                        no = no + 1
            x = yes/(yes+no)
            y = no/(yes+no)
            if x != 0 and y != 0:
                # Weight by the share of the current rows, not the full dataset.
                gain += (mydict[key] * (x*math.log2(x) + y*math.log2(y)))/len(rows)
        if gain > maxGain:
            maxGain = gain
            retidx = j

    return maxGain, retidx, ans


def buildTree(data, rows, columns):
    maxGain, idx, ans = findMaxGain(data, rows, columns)
    root = Node()
    root.childs = []
    if maxGain == 0:
        # Leaf: label it with the class of the remaining rows.
        if ans == 1:
            root.value = 'Yes'
        else:
            root.value = 'No'
        return root

    root.value = attribute[idx]
    mydict = {}
    for i in rows:
        key = data[i][idx]
        if key not in mydict:
            mydict[key] = 1
        else:
            mydict[key] += 1

    newcolumns = copy.deepcopy(columns)
    newcolumns.remove(idx)
    for key in mydict:
        newrows = []
        for i in rows:
            if data[i][idx] == key:
                newrows.append(i)
        print(newrows)
        temp = buildTree(data, newrows, newcolumns)
        temp.decision = key
        root.childs.append(temp)
    return root


def traverse(root):
    print(root.decision)
    print(root.value)

    n = len(root.childs)
    if n > 0:
        for i in range(0, n):
            traverse(root.childs[i])


def calculate():
    rows = [i for i in range(0, len(X))]
    columns = [i for i in range(0, len(attribute))]
    root = buildTree(X, rows, columns)
    root.decision = 'Start'
    traverse(root)


calculate()

Lab 3 20/8/2020

Program
Aim: Write a program to apply the decision tree method using the Gini index on a
given dataset.
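For reference, the Gini index of a set with class proportions p_i is 1 - sum(p_i^2), and a binary split is scored by the size-weighted average of the two sides. The sketch below is a minimal illustration; the function names and the toy labels are assumptions for the example only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def gini_split(left, right):
    """Size-weighted Gini impurity of a binary split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical class labels on each side of a candidate split.
print(gini(["yes", "no"]))                        # maximally mixed for two classes
print(gini_split(["yes", "yes"], ["no", "no"]))   # both sides pure
print(gini_split(["yes", "no"], ["yes", "no"]))   # split does not help
```

The split with the smallest weighted index is preferred.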

Code:
import math
from collections import Counter

import pandas as pd


def sgi(data_frame, attribute, tr):
    """Smallest weighted Gini index over the candidate binary splits of `attribute`."""
    data_len = len(data_frame)
    sets = sub(data_frame[attribute])

    gini_indexes = []
    for pair in sets:
        index = 0
        for s in pair:
            # Rows whose attribute value falls on this side of the split.
            grp = data_frame[data_frame[attribute].isin(s)]
            gini_d = (len(grp) / data_len) * gi(grp[tr])
            index += gini_d
        gini_indexes.append(index)

    if not gini_indexes:
        # Only one distinct value: no split is possible, so use the node's own impurity.
        return gi(data_frame[tr])
    return min(gini_indexes)


def cart(data_frame, attributes, tr):
    """Recursively build the tree as nested dictionaries."""
    classes = Counter(c for c in data_frame[tr])

    if len(classes) == 1:
        return list(classes.keys())[0]
    if not attributes:
        # No attribute left to split on: fall back to the majority class.
        return classes.most_common(1)[0][0]

    ginis = [sgi(data_frame, attr, tr) for attr in attributes]
    print(ginis)
    min_gini_index = ginis.index(min(ginis))

    root_node = attributes[min_gini_index]
    print(root_node)
    tree = {root_node: {}}
    # Build a new list so sibling branches still see the same attributes.
    remaining = [a for a in attributes if a != root_node]

    for sub_attr, sub_data in data_frame.groupby(root_node):
        tree[root_node][sub_attr] = cart(sub_data, remaining, tr)

    return tree


def sub(values):
    """Candidate binary partitions [d1, d2] of the distinct values (contiguous slices only)."""
    subs = []
    classes = Counter(c for c in values)
    elements = [x for x in classes.keys()]

    for i in range(len(elements) + 1):
        for j in range(i + 1, len(elements) + 1):
            d1 = elements[i:j]
            d2 = [v for v in elements if v not in d1]
            if len(d1) != 0 and len(d1) != len(elements):
                if not any(all(item in ele for item in [d1, d2]) for ele in subs):
                    subs.append([d1, d2])
    return subs


def gi(td):
    """Gini index of a column of class labels."""
    classes = Counter(c for c in td)  # Counter tallies how often each class occurs
    ri = len(td) * 1.0
    probs = [math.pow(x / ri, 2) for x in classes.values()]
    return 1 - sum([prob for prob in probs])


data = pd.read_csv("Gini_Index_Dataset.csv")
features = list(data.columns)[1:5]
print("List of features", features)

tr = "BUY COMPUTER"
df = pd.DataFrame(data)
decision_tree = cart(df, features, tr)
print(decision_tree)

Output:


Lab 4 3/9/2020

Program

Aim: Write a program to perform Naive-Bayes classification algorithm
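For reference, naive Bayes picks the class c that maximizes P(c) multiplied by the product of P(x_i | c) over the features. The sketch below is a from-scratch categorical variant with add-one (Laplace) smoothing, which is a different flavor from a Gaussian model; the function names and the four toy rows are illustrative assumptions.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Collect class counts, per-class value counts and each feature's value set."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, class) -> value frequencies
    domains = defaultdict(set)            # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, label)][v] += 1
            domains[i].add(v)
    return class_counts, value_counts, domains


def predict_nb(model, row):
    """Return the class with the highest prior-times-likelihood score."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_c in class_counts.items():
        score = n_c / total
        for i, v in enumerate(row):
            # Add-one smoothing keeps unseen values from zeroing the product.
            score *= (value_counts[(i, label)][v] + 1) / (n_c + len(domains[i]))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical (outlook, temperature) rows with a play/no-play label.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
labels = ["No", "No", "Yes", "Yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("rain", "cool")))
print(predict_nb(model, ("sunny", "hot")))
```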

Code:
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder

dataset = pd.read_csv("nb.csv")
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values   # class label column, kept 1-D as LabelEncoder expects
print(y)

# Encode each categorical feature column as integers.
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])
X[:, 2] = labelencoder_X.fit_transform(X[:, 2])
X[:, 3] = labelencoder_X.fit_transform(X[:, 3])
c1 = np.array(X[:, 0])
c2 = np.array(X[:, 1])
c3 = np.array(X[:, 2])
c4 = np.array(X[:, 3])
print(c1)
print(c2)
print(c3)
print(c4)

labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)

# Rebuild the feature matrix row by row from the encoded columns.
features = zip(c1, c2, c3, c4)
k = list(features)
m = np.array(k)

model = GaussianNB()
model.fit(m, y)
answer = model.predict([[0, 2, 1, 1]])
print(answer)

Output:

Lab 5

Program

Aim: Write a program to perform k-NN classification using min-max
normalization.
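The idea in this aim can be sketched generically: scale every feature to [0, 1] with min-max normalization, measure Euclidean distance to the query, and take a majority vote among the k closest points. The function name `knn_predict` and the four sample points below are illustrative assumptions; each feature is assumed to take at least two distinct values so the scaling range is non-zero.

```python
import math
from collections import Counter


def knn_predict(points, labels, query, k):
    """Majority label among the k nearest points after min-max scaling each feature."""
    dims = range(len(query))
    lows = [min(p[d] for p in points) for d in dims]
    spans = [max(p[d] for p in points) - lows[d] for d in dims]

    def scale(p):
        return [(p[d] - lows[d]) / spans[d] for d in dims]

    q = scale(query)
    # Rank indices by distance so each label stays aligned with its point.
    ranked = sorted(range(len(points)),
                    key=lambda i: math.dist(scale(points[i]), q))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (age, loan) points with a Y/N label.
points = [(20, 20000), (25, 30000), (50, 150000), (60, 200000)]
labels = ["N", "N", "Y", "Y"]
print(knn_predict(points, labels, (55, 160000), 3))
print(knn_predict(points, labels, (22, 25000), 3))
```

Without the scaling step the loan amount, being several orders of magnitude larger than the age, would dominate the distance.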

Code:
import math

# Feature 1: age, min-max normalized to [0, 1].
arr = [25, 35, 45, 20, 35, 52, 23, 40, 60, 48, 33]
que = int(input("enter age = "))
mn = min(arr)
mx = max(arr)
d = mx - mn
ans = []
for i in arr:
    t = i - mn
    t = t / d
    ans.append(t)
que = (que - mn) / d

# Feature 2: loan amount, normalized the same way.
arr2 = [40000, 60000, 80000, 20000, 120000, 18000, 95000, 62000, 100000, 220000, 150000]
arr3 = ["N", "N", "N", "N", "N", "N", "Y", "Y", "Y", "Y", "Y"]
que2 = int(input("enter Loan = "))
mn2 = min(arr2)
mx2 = max(arr2)
d2 = mx2 - mn2
ans2 = []
for i in arr2:
    t = i - mn2
    t = t / d2
    ans2.append(t)
que2 = (que2 - mn2) / d2

No = 0
Yes = 0

# Euclidean distance from the query to every training point.
dist = []
for i in range(len(ans)):
    an = math.sqrt((ans[i] - que)**2 + (ans2[i] - que2)**2)
    dist.append(an)

k = int(input("enter value of k = "))

# Rank the indices by distance so each label stays aligned with its point
# (removing items from dist would shift the later indices).
order = sorted(range(len(dist)), key=lambda i: dist[i])
for i, idx in enumerate(order[:k]):
    print("min number " + str(i) + " is " + str(dist[idx]))
    x = arr3[idx]
    if x == "N":
        No = No + 1
    if x == "Y":
        Yes = Yes + 1
print("Yes is ", str(Yes))
print("No is ", str(No))
if Yes > No:
    print("Answer is Yes")
else:
    print("Answer is No")

Output:

Lab 6
Aim: Simple k-NN classification algorithm

import math

ans = [25, 35, 45, 20, 35, 52, 23, 40, 60, 48, 33]
que = 48
ans2 = [40000, 60000, 80000, 20000, 120000, 18000, 95000, 62000, 100000, 220000, 150000]
que2 = 142000
arr3 = ["N", "N", "N", "N", "N", "N", "Y", "Y", "Y", "Y", "Y"]
No = 0
Yes = 0

# Euclidean distance on the raw (unnormalized) features.
dist = []
for i in range(len(ans)):
    an = math.sqrt((ans[i] - que)**2 + (ans2[i] - que2)**2)
    dist.append(an)

k = int(input("enter value of k = "))

# Rank the indices by distance so each label stays aligned with its point
# (removing items from dist would shift the later indices).
order = sorted(range(len(dist)), key=lambda i: dist[i])
for i, idx in enumerate(order[:k]):
    print("min number " + str(i) + " is " + str(dist[idx]))
    x = arr3[idx]
    if x == "N":
        No = No + 1
    if x == "Y":
        Yes = Yes + 1
print("Yes is ", str(Yes))
print("No is ", str(No))
if Yes > No:
    print("Answer is Yes")
else:
    print("Answer is No")

Output:

Lab 7

Program

Aim: Write a program to perform Apriori algorithm
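Apriori rests on two measures: the support of an itemset (the share of transactions that contain it) and the confidence of a rule A => B (support of A and B together divided by the support of A). The sketch below computes both as fractions; the function names and the four baskets are illustrative assumptions, and a program may equally express the minimum support as an absolute count.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer"},
    {"milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "beer"},
]
print(support({"diaper", "beer"}, baskets))
print(confidence({"diaper"}, {"beer"}, baskets))
print(confidence({"bread"}, {"milk"}, baskets))
```

The pruning step relies on the fact that any superset of an infrequent itemset is itself infrequent, so only frequent itemsets are joined to form the next level of candidates.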

Code:
from itertools import chain, combinations
from collections import defaultdict


def subsets(arr):
    """All non-empty subsets of arr, as tuples."""
    return chain(*[combinations(arr, i + 1) for i, a in enumerate(arr)])


def minsupport(itemSet, transactionList, minSupport, freqSet):
    """Return the itemsets whose occurrence count is at least minSupport."""
    _itemSet = set()
    localSet = defaultdict(int)

    for item in itemSet:
        for transaction in transactionList:
            if item.issubset(transaction):
                freqSet[item] += 1
                localSet[item] += 1

    for item, count in localSet.items():
        if count >= minSupport:
            _itemSet.add(item)

    return _itemSet


def joinSet(itemSet, length):
    """Join the itemsets with themselves, keeping unions of the requested length."""
    return set([i.union(j) for i in itemSet for j in itemSet
                if len(i.union(j)) == length])


def getItemSetTransactionList(data_iterator):
    transactionList = list()
    itemSet = set()
    for record in data_iterator:
        transaction = frozenset(record)
        transactionList.append(transaction)
        for item in transaction:
            itemSet.add(frozenset([item]))
    return itemSet, transactionList


def runApriori(data_iter, minSupport, minConfidence):
    itemSet, transactionList = getItemSetTransactionList(data_iter)
    freqSet = defaultdict(int)
    largeSet = dict()
    currentLSet = minsupport(itemSet, transactionList, minSupport, freqSet)
    k = 2
    while currentLSet != set([]):
        largeSet[k-1] = currentLSet
        currentLSet = joinSet(currentLSet, k)
        currentCSet = minsupport(currentLSet, transactionList, minSupport, freqSet)
        currentLSet = currentCSet
        k = k + 1

    def retsupport(item):
        return float(freqSet[item])

    toRetItems = []
    for key, value in largeSet.items():
        toRetItems.extend([(tuple(item), retsupport(item))
                           for item in value])

    toRetRules = []
    for key, value in list(largeSet.items())[1:]:
        for item in value:
            _subsets = map(frozenset, [x for x in subsets(item)])
            for element in _subsets:
                remain = item.difference(element)
                # Only itemsets of the largest frequent size (k - 2) yield rules.
                if (len(element) + len(remain)) == (k - 2):
                    if len(remain) > 0:
                        confidence = retsupport(item)/retsupport(element)
                        if confidence >= minConfidence:
                            toRetRules.append(((tuple(element), tuple(remain)),
                                               confidence))
    return toRetItems, toRetRules


def printResults(rules):
    print("\n------------------------ RULES:")
    for rule, confidence in sorted(rules, key=lambda x: x[1]):
        pre, post = rule
        print("Rule: %s ==> %s , %.3f" % (str(pre), str(post), confidence))


if __name__ == "__main__":
    dataset = [
        ['bread', 'milk'],
        ['bread', 'diaper', 'beer', 'egg'],
        ['milk', 'diaper', 'beer', 'cola'],
        ['bread', 'milk', 'diaper', 'beer'],
        ['bread', 'milk', 'diaper', 'cola'],
    ]

    minSupport = 2   # absolute transaction count
    minConfidence = 0.60

    items, rules = runApriori(dataset, minSupport, minConfidence)
    printResults(rules)

Output:

Lab 8

Program

Aim: Write a program to perform FP growth algorithm
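The first phase of FP-growth is usually described as follows: count item frequencies, drop infrequent items, sort each transaction by descending frequency, and insert it into a prefix tree whose nodes carry counts. The sketch below covers only that tree-building phase, not the recursive mining of conditional trees; the class `FPNode`, the function `build_fp_tree` and the sample transactions are illustrative assumptions.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item, its count and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Build the FP-tree from transactions, ignoring items seen fewer than min_count times."""
    freq = Counter(item for t in transactions for item in set(t))
    keep = {item for item, c in freq.items() if c >= min_count}
    root = FPNode(None)
    for t in transactions:
        # Most frequent items first; ties are broken alphabetically.
        ordered = sorted((i for i in set(t) if i in keep),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root, freq


# Hypothetical transactions.
transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
root, freq = build_fp_tree(transactions, min_count=2)
b = root.children["b"]
print(b.count, {item: child.count for item, child in b.children.items()})
```

Shared prefixes collapse into a single path, which is what lets the tree represent the transaction database compactly.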

Code:

import csv
import numpy as np


def Frequency(data, mfreq, combi_values):
    """Items occurring in at least mfreq transactions, most frequent first."""
    rep = [0]*len(combi_values)
    for i in data:
        for j in range(len(combi_values)):
            if combi_values[j] in i:
                rep[j] += 1

    # Indices of the items that fall below the minimum frequency.
    rem = []
    for i in range(len(rep)):
        if rep[i] < mfreq:
            rem.append(i)

    combi_values = [combi_values[i] for i in range(len(combi_values)) if i not in rem]
    rep = [rep[i] for i in range(len(rep)) if i not in rem]
    new_list = []

    uniq_freq = list(set(rep))
    uniq_freq.sort(reverse=True)

    for i in uniq_freq:
        for j in range(len(rep)):
            if rep[j] == i:
                new_list.append(combi_values[j])
    return new_list


def Unique(data):
    """Distinct items across all transactions, ignoring empty cells."""
    val = []
    for i in data:
        for j in i:
            if j not in val:
                val.append(j)
    if '' in val:
        del val[val.index('')]
    return val


def fptree(Data, mfreq, uni_values):
    # Step 1: frequent items in descending order of frequency.
    fp1 = Frequency(Data, mfreq, uni_values)

    # Step 2: rewrite each transaction using only frequent items, in that order.
    freq_pattern = []
    maxl = 0
    for i in Data:
        entry = []
        for j in fp1:
            if j in i:
                entry.append(j)
        if maxl < len(entry):
            maxl = len(entry)
        freq_pattern.append(entry)

    # Step 3: per-level item counts (level j holds the j-th item of every transaction).
    ftree = []
    for i in range(maxl):
        ftree.append([])
    for i in freq_pattern:
        for j in range(len(i)):
            flag = 0
            if ftree[j] == []:
                ftree[j].append([i[j], 1])
            else:
                for k in ftree[j]:
                    if k[0] == i[j]:
                        k[1] += 1
                        flag = 1
                if flag == 0:
                    ftree[j].append([i[j], 1])

    # For every item after the first, count which items precede it in the transactions.
    values = []
    for i in range(1, len(fp1)):
        dic = {}
        for p in fp1:
            dic[p] = 0
        for j in freq_pattern:
            if fp1[i] in j:
                f = j.index(fp1[i])
                for k in dic.keys():
                    if k in j[:f]:
                        dic[k] += 1
        values.append(dic)

    # Derive the frequent patterns ending in each item.
    frp = []
    for pos, counts in enumerate(values):
        item = fp1[pos + 1]   # enumerate avoids values.index(), which fails on equal dicts
        ele = []
        cnt = 0
        for j, k in counts.items():
            if k >= mfreq:
                cnt += 1
                ele.append([j, item, k])
        if cnt > 1:
            rep = []
            for j, k in counts.items():
                if k >= mfreq:
                    rep.append(j)
            count = 0
            for p in freq_pattern:
                if item in p:
                    f = p.index(item)
                    flag = 0
                    for q in rep:
                        if q not in p[:f]:
                            flag = 1
                    if flag == 0:
                        count += 1
            if count >= mfreq:
                rep.extend([item, count])
                ele.append(rep)
        frp.append(ele)
    print("\n")
    print("-------------------Frequent Pattern--------------------")
    for i in frp[::-1]:
        print(i)

    return ftree


Data_Matrix = []
with open('Transactions.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        Data_Matrix.append(row)
fData = np.array(Data_Matrix)
min_freq = int(input("Enter Min. Frequency : "))
Values = Unique(fData)
ans = fptree(fData, min_freq, Values)
print("\n")
print("-------------------------FP Tree-------------------------")
for i in range(len(ans)):
    print("Level :", i, ans[i])

Output:


Lab 9

Program

Aim: Write a program to perform k-means clustering algorithm
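A common formulation of k-means (Lloyd's algorithm) alternates two steps until the centroids stop moving: assign every point to its nearest centroid, then replace each centroid with the mean of the points assigned to it. The sketch below is a minimal pure-Python version; the function name `k_means`, the choice of initial centroids and the sample points are illustrative assumptions.

```python
import math


def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and mean-update steps until stable."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, [points[0], points[3]])
print(centroids)
print(clusters)
```

The result depends on the initial centroids, so different seeds can give different clusterings.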

Code:
import math

import matplotlib.pyplot as plt
import numpy
import pandas as pd


def euclidean_distance(point1, point2):
    total = 0
    for i in range(len(point1)):
        total += math.pow(point1[i] - point2[i], 2)
    return math.sqrt(total)


def find_nearest_cluster(clusters, data_point, X, Y):
    """Position of the centroid row closest to data_point."""
    distances = [euclidean_distance((row[X], row[Y]), (data_point[X], data_point[Y]))
                 for index, row in clusters.iterrows()]
    return distances.index(min(distances))


data = pd.read_csv("kmedoids.csv")
X = "x1"
Y = "x2"
print(data[[X, Y]])

k = int(input("Enter number of clusters: "))

# Seed the clusters with the first k points.
for i in range(k):
    data.loc[i, 'cluster'] = i + 1

centroids = data.loc[0:k - 1].copy()

# Single pass over the remaining points: assign each one to the nearest centroid,
# then move that centroid to the midpoint of its old position and the new point.
for i in range(k, len(data)):
    cluster_index = find_nearest_cluster(centroids, data.loc[i], X, Y)
    data.loc[i, 'cluster'] = cluster_index + 1
    centroids.loc[cluster_index, X] = (centroids.loc[cluster_index, X] + data.loc[i, X]) / 2
    centroids.loc[cluster_index, Y] = (centroids.loc[cluster_index, Y] + data.loc[i, Y]) / 2

print(data)
print(centroids.loc[:, X:Y])

# random colors
colors = [numpy.random.random(3).reshape(1, -1) for i in range(k)]
# plot cluster data
plt.figure(figsize=(6, 6))
for i in range(1, k + 1):
    plt.scatter(data.loc[data['cluster'] == i, X], data.loc[data['cluster'] == i, Y],
                s=100, c=colors[i - 1])
plt.xlabel(X)
plt.ylabel(Y)
plt.title('Visualization of raw data')
plt.show()

Output:

Lab 10

Program

Aim: Write a program to perform k-medoids algorithm

Code:
from itertools import combinations


def distance(p1, p2):
    """Manhattan distance between two 2-D points."""
    x = p1[0] - p2[0]
    y = p1[1] - p2[1]
    ans = abs(x) + abs(y)
    return ans


dataset = [
    [2, 6], [3, 8], [4, 7], [6, 2],
    [6, 4], [7, 3], [8, 5], [7, 6],
    [2, 6], [3, 8], [4, 7], [6, 2],
    [6, 4], [7, 3], [8, 5], [7, 6]]

k = int(input("Enter value of k :"))

prev_cost = float('inf')
ans_dict = {}
# Exhaustive search: try every k-subset of the points as the medoids.
for j in combinations(dataset, k):
    # Each cluster starts with its medoid in position 0.
    clusters = {}
    for i in range(0, k):
        t_centroid = j[i]
        clusters[i + 1] = [t_centroid]
    cost = 0
    for data_point in dataset:
        # Find the medoid closest to this point.
        min_dist = 0
        current_near_centroid = 0
        for i in clusters:
            d = distance(data_point, clusters[i][0])
            if d < min_dist or i == 1:
                min_dist = d
                current_near_centroid = i
        t_var = clusters[current_near_centroid]
        t_var.append(data_point)
        cost += min_dist
    # Keep the medoid set with the lowest total distance.
    if cost < prev_cost:
        prev_cost = cost
        ans_dict = clusters.copy()

# Drop the medoid seed from position 0; the same point was also appended as a member.
for i in ans_dict:
    del ans_dict[i][0]
print(ans_dict)
print(prev_cost)
Output:
