ML Lab R18
(CS604PC)
DEPARTMENT VISION AND MISSION
COURSE OUTCOMES
CO1 Develop skills in data extraction and manipulation using Python.
CO2 Apply Machine Learning and Text Classification to model relationships between variables.
CO3 Analyze credit-worthiness classification data and calculate unconditional and conditional
probabilities using Python.
PROGRAM OUTCOMES (POs)
PO-1 Engineering knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering
problems.
PO-2 Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.
PO-3 Design/development of solutions: Design solutions for complex engineering problems and
design system components or processes that meet the specified needs with appropriate
consideration for the public health and safety, and the cultural, societal, and environmental
considerations.
PO-4 Conduct investigations of complex problems: Use research-based knowledge and research
methods including design of experiments, analysis and interpretation of data, and synthesis
of the information to provide valid conclusions.
PO-5 Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities
with an understanding of the limitations.
PO-6 The engineer and society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities relevant
to the professional engineering practice.
PO-7 Environment and sustainability: Understand the impact of the professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of, and need
for sustainable development.
PO-8 Ethics: Apply ethical principles and commit to professional ethics and responsibilities
and norms of the engineering practice.
PO-9 Individual and team work: Function effectively as an individual, and as a member or
leader in diverse teams, and in multidisciplinary settings.
PO-10 Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as, being able to comprehend and
write effective reports and design documentation, make effective presentations, and give and
receive clear instructions.
PO-11 Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.
PO-12 Life-long learning: Recognize the need for, and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological change.
PROGRAM SPECIFIC OUTCOMES
PSO 3 Apply core and advanced concepts of database management systems, data mining
and machine learning to devise engineering solutions for practical problems.
1. Calculate the conditional probability that a student is absent given that today is Friday, using Python.
Source Code:
PFIA=float(input('Enter probability that it is Friday and that a student is absent='))
PF=float(input('Enter probability that today is Friday='))
PABF=PFIA/PF
print('probability that a student is absent given that today is Friday using conditional probabilities=',PABF)
Output:
probability that a student is absent given that today is Friday using conditional probabilities= 0.15
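Note: the program applies the conditional probability rule P(A|F) = P(F and A) / P(F). Assuming the standard inputs for this exercise, P(F and A) = 0.03 and P(F) = 0.20 (1 of 5 school days is a Friday), so P(A|F) = 0.03 / 0.20 = 0.15, which matches the output above.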
2. Extract the data from the database using Python.
Source Code:
# Step 1: connect to the MySQL server and verify the connection
import mysql.connector
mydb=mysql.connector.connect(host="localhost",user="root",password="password")
print(mydb)
# Step 2: create a database
import mysql.connector
mydb=mysql.connector.connect(host="localhost",user="root",password="password")
cur=mydb.cursor()
cur.execute("CREATE DATABASE COLLEGE")
# Step 3: create a table in the new database
import mysql.connector
mydb=mysql.connector.connect(host="localhost",user="root",password="password",database="college")
cur=mydb.cursor()
s="CREATE TABLE student(rollno integer(4), name varchar(20))"
cur.execute(s)
# Step 4: insert rows with a parameterized query
import mysql.connector
mydb=mysql.connector.connect(host="localhost",user="root",password="password",database="college")
cur=mydb.cursor()
s="INSERT INTO student(rollno,name) VALUES(%s,%s)"
a1=[(1,"Suresh"),(2,"Ramesh")]
cur.executemany(s,a1)
mydb.commit()
print("Done")
# Step 5: extract the data
import mysql.connector
mydb=mysql.connector.connect(host="localhost",user="root",password="password",database="college")
cur=mydb.cursor()
s="SELECT * from student"
cur.execute(s)
result=cur.fetchall()
for rec in result:
    print(rec)
Output:
(1, 'Suresh')
(2, 'Ramesh')
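The same cursor API also supports filtered reads with placeholders. A minimal sketch, assuming the college database and student table created above:

import mysql.connector
mydb=mysql.connector.connect(host="localhost",user="root",password="password",database="college")
cur=mydb.cursor()
cur.execute("SELECT name FROM student WHERE rollno=%s",(1,))
print(cur.fetchone())   # expected: ('Suresh',)
mydb.close()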
3. Implement k-nearest neighbors classification using Python.
Source Code:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
irisData=load_iris()
x=irisData.data
y=irisData.target
print(irisData.feature_names)
print(irisData.target_names)
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)
knn=KNeighborsClassifier(n_neighbors=2)
knn.fit(x_train,y_train)
knn.predict([[3.2,5.4,4.1,2.5]])
Output:
array([1])
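To gauge how well the fitted model generalizes, a small addition (a sketch using the same split as above) is to score it on the held-out test set; KNeighborsClassifier exposes a score method that returns mean accuracy:

print('Test accuracy:', knn.score(x_test, y_test))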
4. Given the following data, which specifies classifications for nine combinations of
VAR1 and VAR2, predict a classification for a case where VAR1=0.906 and
VAR2=0.606, using the result of k-means clustering with 3 means (i.e., 3 centroids).
VAR1 VAR2 CLASS
1.713 1.586 0
0.180 1.786 1
0.353 1.240 1
0.940 1.566 0
1.486 0.759 1
1.266 1.106 0
1.540 0.419 1
0.459 1.799 1
0.773 0.186 1
Source Code:
import numpy as np
from sklearn.cluster import KMeans
x=np.array([[1.713,1.586],[0.180,1.786],[0.353,1.240],[0.940,1.566],[1.486,0.759],[1.266,1.106],
[1.540,0.419],[0.459,1.799],[0.773,0.186]])
y=np.array([0,1,1,0,1,0,1,1,1])
kmeans=KMeans(n_clusters=3,random_state=0).fit(x)
kmeans.predict([[0.906,0.606]])
Output:
array([0])
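For inspection, the fitted estimator exposes the learned centroids and the cluster assigned to each training point; a small sketch using the model above:

print(kmeans.cluster_centers_)   # 3 centroid coordinates in (VAR1, VAR2) space
print(kmeans.labels_)            # cluster index assigned to each of the 9 points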
5. The following training examples map descriptions of individuals onto high, medium
and low credit-worthiness.
medium skiing design single twenties no -> highRisk
high golf trading married forties yes -> lowRisk
low speedway transport married thirties yes -> medRisk
medium football banking single thirties yes -> lowRisk
high flying media married fifties yes -> highRisk
low football security single twenties no -> medRisk
medium golf media single thirties yes -> medRisk
medium golf transport married forties yes -> lowRisk
high skiing banking single thirties yes -> highRisk
low golf unemployed married forties yes -> highRisk
Input attributes are (from left to right) income, recreation, job, status, age-group, and
home-owner. Find the unconditional probability of 'golf' and the conditional
probability of 'single' given 'medRisk' in the dataset.
Source Code:
totalRecords=10
numGolfRecords=4
unConditionalprobGolf=numGolfRecords/totalRecords
print("Unconditional probability of golf: =",unConditionalprobGolf)
numMedRiskSingle=2
numMedRisk=3
probMedRiskSingle=numMedRiskSingle/totalRecords
probMedRisk=numMedRisk/totalRecords
conditionalProb=(probMedRiskSingle/probMedRisk)
print("Conditional probability of single given medRisk: =",conditionalProb)
Output:
Unconditional probability of golf: = 0.4
Conditional probability of single given medRisk: = 0.6666666666666666
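Note: by definition, P(single | medRisk) = P(single and medRisk) / P(medRisk) = (2/10) / (3/10) = 2/3 ≈ 0.667, which is exactly what conditionalProb evaluates to.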
6. Implement linear regression using Python.
Source Code:
import numpy as np
import matplotlib.pyplot as plt
def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)
def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color = "m", marker = "o", s = 30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color = "g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()
def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting the regression line
    plot_regression_line(x, y, b)
main()
Output:
Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
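As a quick cross-check (a sketch, not part of the original program), NumPy's polyfit fits the same least-squares line and should reproduce these coefficients; note it returns the slope first:

import numpy as np
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
b_1, b_0 = np.polyfit(x, y, 1)
print(b_0, b_1)   # approx 1.2364 and 1.1697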
7. Implement Naïve Bayes theorem to classify the English text.
Source Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
msg = pd.read_csv('document.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names())
clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)
print('Accuracy Metrics:')
print('Accuracy:', accuracy_score(ytest, pred))
print('Recall:', recall_score(ytest, pred))
print('Precision:', precision_score(ytest, pred))
print('Confusion Matrix:\n', confusion_matrix(ytest, pred))
document.csv:
I love this sandwich, pos
This is an amazing place, pos
I feel very good about these beers, pos
This is my best work, pos
What an awesome view, pos
I do not like this restaurant, neg
I am tired of this stuff, neg
I can't deal with this, neg
He is my sworn enemy, neg
My boss is horrible, neg
This is an awesome place, pos
I do not like the taste of this juice, neg
I love to dance, pos
I am sick and tired of this place, neg
What a great holiday, pos
That is a bad locality to stay, neg
We will have good fun tomorrow, pos
I went to my enemy's house today, neg
Output:
Total Instances of Dataset: 18
Accuracy Metrics:
Accuracy: 0.6
Recall: 0.6666666666666666
Precision: 0.6666666666666666
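Once trained, the same vectorizer/classifier pair can label unseen text; a minimal sketch, assuming clf and count_v from above and a made-up sentence:

new_doc = ['What a wonderful place']
new_dm = count_v.transform(new_doc)
print(clf.predict(new_dm))   # 1 means pos, 0 means neg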
8. Implement an algorithm to demonstrate the significance of the genetic
algorithm.
Source Code:
# Python3 program to create target string, starting from
# random string using Genetic Algorithm
import random
# Number of individuals in each generation
POPULATION_SIZE = 100
# Valid genes
GENES = '''abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 1234567890, .-;:_!"#%&/()=?@${[]}'''
# Target string to be generated
TARGET = "Ravindra Raman Cholla"
class Individual(object):
    '''Class representing individual in population'''
    def __init__(self, chromosome):
        self.chromosome = chromosome
        self.fitness = self.cal_fitness()
    @classmethod
    def mutated_genes(self):
        '''create random genes for mutation'''
        global GENES
        gene = random.choice(GENES)
        return gene
    @classmethod
    def create_gnome(self):
        '''create chromosome or string of genes'''
        global TARGET
        gnome_len = len(TARGET)
        return [self.mutated_genes() for _ in range(gnome_len)]
    def mate(self, par2):
        '''Perform mating and produce new offspring'''
        # chromosome for offspring
        child_chromosome = []
        for gp1, gp2 in zip(self.chromosome, par2.chromosome):
            # random probability
            prob = random.random()
            # if prob is less than 0.45, insert gene from parent 1
            if prob < 0.45:
                child_chromosome.append(gp1)
            # if prob is between 0.45 and 0.90, insert gene from parent 2
            elif prob < 0.90:
                child_chromosome.append(gp2)
            # otherwise insert a random gene (mutation), to maintain diversity
            else:
                child_chromosome.append(self.mutated_genes())
        # create new Individual (offspring) using generated chromosome
        return Individual(child_chromosome)
    def cal_fitness(self):
        '''Calculate fitness score: the number of characters
        that differ from the target string'''
        global TARGET
        fitness = 0
        for gs, gt in zip(self.chromosome, TARGET):
            if gs != gt:
                fitness += 1
        return fitness
# Driver code
def main():
    global POPULATION_SIZE
    # current generation
    generation = 1
    found = False
    population = []
    # create initial population
    for _ in range(POPULATION_SIZE):
        gnome = Individual.create_gnome()
        population.append(Individual(gnome))
    while not found:
        # sort the population in increasing order of fitness score
        population = sorted(population, key = lambda x: x.fitness)
        # if the fittest individual has fitness 0, the target has been reached
        if population[0].fitness <= 0:
            found = True
            break
        # otherwise generate offspring for the new generation
        new_generation = []
        # Elitism: 10% of the fittest population goes to the next generation
        s = int((10 * POPULATION_SIZE) / 100)
        new_generation.extend(population[:s])
        # the remaining 90% are produced by mating individuals
        # drawn from the 50 fittest members of the population
        s = int((90 * POPULATION_SIZE) / 100)
        for _ in range(s):
            parent1 = random.choice(population[:50])
            parent2 = random.choice(population[:50])
            child = parent1.mate(parent2)
            new_generation.append(child)
        population = new_generation
        print("Generation: {}\tString: {}\tFitness: {}".format(generation,
              "".join(population[0].chromosome), population[0].fitness))
        generation += 1
    print("Generation: {}\tString: {}\tFitness: {}".format(generation,
          "".join(population[0].chromosome), population[0].fitness))
main()
Output:
Generation: 1 String: qRIS Fitness: 3
Generation: 2 String: qRIS Fitness: 3
Generation: 3 String: qRIS Fitness: 3
Generation: 4 String: NR:n Fitness: 2
Generation: 5 String: NR:n Fitness: 2
Generation: 6 String: NRCn Fitness: 1
Generation: 7 String: NRCn Fitness: 1
Generation: 8 String: NRCn Fitness: 1
Generation: 9 String: NRCn Fitness: 1
Generation: 10 String: NRCn Fitness: 1
Generation: 11 String: NRCn Fitness: 1
Generation: 12 String: NRCn Fitness: 1
Generation: 13 String: NRCn Fitness: 1
Generation: 14 String: NRCM Fitness: 0
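The fitness used here counts the characters that differ from TARGET, so lower is better and 0 means the target string has been reproduced. A standalone sketch of that measure:

def fitness(s, target):
    # number of positions where s differs from target
    return sum(1 for gs, gt in zip(s, target) if gs != gt)
print(fitness("Ravindra Raman Chollx", "Ravindra Raman Cholla"))   # 1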
9. Implement the finite words classification system using Back-propagation
algorithm.
Source code:
# Back-propagation algorithm
import numpy as np
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X,axis=0) #maximum of X array longitudinally
y = y/100
#Sigmoid Function
def sigmoid(x):
    return 1/(1 + np.exp(-x))
#Derivative of Sigmoid Function
def derivatives_sigmoid(x):
    return x * (1 - x)
#Variable initialization
epoch=5 #Setting training iterations
lr=0.1 #Setting learning rate
inputlayer_neurons = 2 #number of features in data set
hiddenlayer_neurons = 3 #number of hidden layer neurons
output_neurons = 1 #number of neurons at output layer
#weight and bias initialization
wh=np.random.uniform(size=(inputlayer_neurons,hiddenlayer_neurons))
bh=np.random.uniform(size=(1,hiddenlayer_neurons))
wout=np.random.uniform(size=(hiddenlayer_neurons,output_neurons))
bout=np.random.uniform(size=(1,output_neurons))
#draws a random range of numbers uniformly of dim x*y
for i in range(epoch):
    #Forward Propagation
    hinp1=np.dot(X,wh)
    hinp=hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1=np.dot(hlayer_act,wout)
    outinp= outinp1+bout
    output = sigmoid(outinp)
    #Backpropagation
    EO = y-output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act) #how much hidden layer weights contributed to error
    d_hiddenlayer = EH * hiddengrad
    wout += hlayer_act.T.dot(d_output) * lr #dot product of next layer error and current layer output
    wh += X.T.dot(d_hiddenlayer) * lr
    print("-----------Epoch-", i+1, "Starts--------- ")
    print("Input: \n" + str(X))
    print("Actual Output: \n" + str(y))
    print("Predicted Output: \n", output)
    print("-----------Epoch-", i+1, "Ends --------- \n")
print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
Output:
Epoch- 1 Starts
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.7014538 ]
[0.68028913]
[0.69778034]]
Epoch- 1 Ends
Epoch- 2 Starts
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.70594364]
[0.68437414]
[0.70224318]]
Epoch- 2 Ends
Epoch- 3 Starts
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.71026427]
[0.68831216]
[0.70653908]]
Epoch- 3 Ends
Epoch- 4 Starts
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.71442409]
[0.69211026]
[0.71067628]]
Epoch- 4 Ends
Epoch- 5 Starts
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.71843105]
[0.69577512]
[0.71466255]]
Epoch- 5 Ends
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.71843105]
[0.69577512]
[0.71466255]]
ADDITIONAL PROGRAMS:
1. Write a Python Program to implement Hierarchical Clustering.
Source code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score
from IPython.display import Image
print('DENDROGRAM OVERVIEW :')
Image('EE76D8A0-92FA-4705-B1F6-748F38D83112.png')
# Dendrogram (Median Linkage)
# Silhouette analysis over a range of cluster counts
# ('data' holds the feature matrix of the first example; its loading code is not shown)
for k in range(2,10):
    cluster_H = AgglomerativeClustering(n_clusters=k, linkage='average')
    model_clt = cluster_H.fit(data)
    label = model_clt.labels_
    sil_coeff = silhouette_score(data, label, metric='euclidean')
    print('For cluster= {}, Silhouette Coefficient is {}'.format(k, sil_coeff))
    print('\n')
print('For Cluster = 2, it has highest Silhouette Value. So Number of Cluster = 2')
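Note: the silhouette coefficient lies between -1 and +1; values close to +1 mean samples sit well inside their own cluster and far from the neighbouring one, which is why the cluster count with the highest coefficient is preferred.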
# Let us take another example: the IRIS dataset
# Loading the Dataset
iris = datasets.load_iris()
iris_data = pd.DataFrame(iris.data)
iris_data.columns = iris.feature_names
iris_data['Type'] = iris.target
iris_data.head()
iris_X = iris_data.iloc[:, [0, 1, 2, 3]].values
print(iris_X[:5,:]) # Printing first 5 rows
iris_Y = iris_data['Type']
iris_Y = np.array(iris_Y)
print(iris_Y)
# Frequency count of the output clusters
unique, counts = np.unique(iris_Y, return_counts=True)
freq_1 = dict(zip(unique, counts))
freq_1
# Filtering Setosa
Setosa = iris_data['Type'] == 0
print("Filtering Setosa, True means its Setosa and False means Non Setosa")
print(Setosa.head())
print("Top 6 Rows of Setosa")
Setosa_v2 = iris_data[Setosa]
print(Setosa_v2[Setosa_v2.columns[0:2]].head())
print("Last 6 Rows of Setosa")
print(Setosa_v2[Setosa_v2.columns[0:2]].tail())
# Filtering Setosa for 2D Plot
print("Setosa for 2D Plot")
print("X Axis points")
print(iris_X[iris_Y == 0,0])
print("Y Axis Points")
print(iris_X[iris_Y == 0,1])
print('\n')
plt.scatter(iris_X[iris_Y == 0, 0], iris_X[iris_Y == 0, 1],
            s = 80, c = 'orange', label = 'Iris-setosa')
# Filtering Versicolour
Versi = iris_data['Type'] == 1
print("Filtering Versicolour, True means its Versicolour and False means Non Versicolour")
print(Versi.head())
print("Top 6 Rows of Versicolour")
Versi_v2 = iris_data[Versi]
print(Versi_v2[Versi_v2.columns[0:2]].head())
print("Last 6 Rows of Versicolour")
print(Versi_v2[Versi_v2.columns[0:2]].tail())
# Filtering Versicolour for 2D Plot
print("Versicolour for 2D Plot")
print("X Axis points")
print(iris_X[iris_Y == 1,0])
print("Y Axis Points")
print(iris_X[iris_Y == 1,1])
print('\n')
plt.scatter(iris_X[iris_Y == 1, 0], iris_X[iris_Y == 1, 1],
            s = 80, c = 'yellow', label = 'Iris-versicolour')
plt.xlim([4.5,8])
plt.ylim([2,4.5])
# Filtering Virginica
Virginica = iris_data['Type'] == 2
print("Filtering Virginica, True means its Virginica and False means Non Virginica")
print(Virginica.head())
print("Top 6 Rows of Virginica")
Virginica_v2 = iris_data[Virginica]
print(Virginica_v2[Virginica_v2.columns[0:2]].head())
print("Last 6 Rows of Virginica")
print(Virginica_v2[Virginica_v2.columns[0:2]].tail())
plt.scatter(iris_X[iris_Y == 2, 0], iris_X[iris_Y == 2, 1],
            s = 80, c = 'green', label = 'Iris-virginica')
plt.legend()
iris_X_1 = iris_data[['sepal length (cm)','sepal width (cm)',
'petal length (cm)','petal width (cm)']]
iris_X_1.head()
# Agglomerative Clustering with 3 clusters; pred1 holds the predicted labels
cluster_H = AgglomerativeClustering(n_clusters=3, linkage='average')
pred1 = cluster_H.fit_predict(iris_X)
# Frequency count of the original classes
unique, counts = np.unique(iris_Y, return_counts=True)
print('Original Cluster')
print(dict(zip(unique, counts)))
# Frequency count of the predicted clusters
unique, counts = np.unique(pred1, return_counts=True)
print('Hierarchical Clustering Output Cluster')
print(dict(zip(unique, counts)))
# Silhouette Score
print('Silhouette Score for 3 Clusters')
print(silhouette_score(iris_X,pred1))
print('\n')
# In the above output we got value labels: '0', '1' and '2'
# For a better understanding, we can visualize these clusters.
# We use the above-found class labels and visualise how the clusters have been formed.
plt.scatter(iris_X[pred1 == 0, 0], iris_X[pred1 == 0, 1],
            s = 80, c = 'orange', label = 'Iris-setosa')
plt.scatter(iris_X[pred1 == 1, 0], iris_X[pred1 == 1, 1],
            s = 80, c = 'yellow', label = 'Iris-versicolour')
plt.scatter(iris_X[pred1 == 2, 0], iris_X[pred1 == 2, 1],
            s = 80, c = 'green', label = 'Iris-virginica')
plt.legend()
for k in range(2,10):
    cluster_H = AgglomerativeClustering(n_clusters=k, linkage='average')
    model_clt = cluster_H.fit(iris_X)
    label = model_clt.labels_
    sil_coeff = silhouette_score(iris_X, label, metric='euclidean')
    print('For cluster= {}, Silhouette Coefficient is {}'.format(k, sil_coeff))
    print('\n')
print('For Cluster = 2, it has highest Silhouette Value')
print('But according to Visualization and data, Number of Cluster is 3')
Output:
Original Cluster
{0: 50, 1: 50, 2: 50}
Hierarchical Clustering Output Cluster
{0: 64, 1: 50, 2: 36}
Silhouette Score for 3 Clusters
0.5541608580282847
For cluster= 2, Silhouette Coefficient is 0.6867350732769776
For cluster= 3, Silhouette Coefficient is 0.5541608580282847
For cluster= 4, Silhouette Coefficient is 0.4719936084994249
For cluster= 5, Silhouette Coefficient is 0.4306699739542549
For cluster= 6, Silhouette Coefficient is 0.3419903827982995
For cluster= 7, Silhouette Coefficient is 0.3707424079292066
For cluster= 8, Silhouette Coefficient is 0.3658753388418643
For cluster= 9, Silhouette Coefficient is 0.3166806903618151
For Cluster = 2, it has highest Silhouette Value
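Since the true iris labels are known here, the clustering can also be checked against them; a sketch using scikit-learn's adjusted Rand index, assuming pred1 and iris_Y from above:

from sklearn.metrics import adjusted_rand_score
print(adjusted_rand_score(iris_Y, pred1))   # 1.0 would mean a perfect match with the true species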
2. Write a Python Program to implement Logistic Regression.
Objective: To implement Logistic Regression.
Outcome: Student will be able to implement the Logistic Regression method.
Input: User_Data.csv
Source code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
dataset = pd.read_csv('...\\User_Data.csv')
# input
x = dataset.iloc[:, [2, 3]].values
# output
y = dataset.iloc[:, 4].values
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size = 0.25, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
xtrain = sc_x.fit_transform(xtrain)
xtest = sc_x.transform(xtest)
print (xtrain[0:10, :])
Output :
[[ 0.58164944 -0.88670699]
[-0.60673761 1.46173768]
[-0.01254409 -0.5677824 ]
[-0.60673761 1.89663484]
[ 1.37390747 -1.40858358]
[ 1.47293972 0.99784738]
[ 0.08648817 -0.79972756]
[-0.01254409 -0.24885782]
[-0.21060859 -0.5677824 ]
[-0.21060859 -0.19087153]]
Here we can see that the Age and Estimated Salary feature values are scaled and now lie
in the range -1 to 1. Hence, each feature will contribute equally to decision making, i.e.,
finalizing the hypothesis.
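For reference, StandardScaler standardizes each column as z = (x - mean) / std, using the mean and standard deviation learned from the training split; that is why fit_transform is called on xtrain but only transform on xtest.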
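Between the scaling step and the output below, the model is fitted and evaluated; a minimal sketch of that step, assuming scikit-learn's LogisticRegression (the names classifier and y_pred are used by the later code):

from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(xtrain, ytrain)
y_pred = classifier.predict(xtest)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(ytest, y_pred)
print("Confusion Matrix : \n", cm)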
Output:
Confusion Matrix :
[[65 3]
[ 8 24]]
Out of 100 test samples:
TruePositive + TrueNegative = 65 + 24
FalsePositive + FalseNegative = 3 + 8
Performance measure – Accuracy:
from sklearn.metrics import accuracy_score
print ("Accuracy : ", accuracy_score(ytest, y_pred))
Output:
Accuracy : 0.89
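This agrees with the confusion matrix above: accuracy = (TP + TN) / total = (65 + 24) / 100 = 0.89.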
Visualizing the performance of our model.
from matplotlib.colors import ListedColormap
X_set, y_set = xtest, ytest
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1,
stop = X_set[:, 0].max() + 1, step = 0.01),
np.arange(start = X_set[:, 1].min() - 1,
stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(
np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Classifier (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
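To classify a single new applicant with the trained model, the raw feature values must pass through the same scaler first; a minimal sketch, assuming sc_x and classifier from above, with hypothetical values age=30 and salary=87000:

sample = sc_x.transform([[30, 87000]])
print(classifier.predict(sample))   # 0 or 1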