
N.G. ACHARYA & D.K. MARATHE COLLEGE OF
ARTS, SCIENCE & COMMERCE
(Affiliated to University of Mumbai)

PRACTICAL JOURNAL

PSCSP512
Machine Learning
SUBMITTED BY
KAMBLE YASH RAJESH
SEAT NO :

SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS


FOR QUALIFYING M.Sc. (CS) PART-I (SEMESTER – II) EXAMINATION.

2023-2024

DEPARTMENT OF COMPUTER SCIENCE


SHREE N.G. ACHARYA MARG, CHEMBUR

MUMBAI-400 071
N.G. ACHARYA & D.K. MARATHE COLLEGE OF

ARTS, SCIENCE & COMMERCE

(Affiliated to University of Mumbai)

CERTIFICATE

This is to certify that Mr. Kamble Yash Rajesh, Seat No.        , studying
in Master of Science in Computer Science Part I Semester II, has
satisfactorily completed the practicals of PSCSP512 Machine Learning
as prescribed by the University of Mumbai during the academic year
2023-24.

Signature Signature Signature

Internal Guide External Examiner Head Of Department

College Seal Date:


INDEX

Practical No.    Practical                                                                  Signature

1    Implement Linear Regression (Diabetes Dataset)
2    Implement Logistic Regression (Iris Dataset)
3    Implement Multinomial Logistic Regression (Iris Dataset)
4    Implement SVM Classifier (Iris Dataset)
5    Train and fine-tune a Decision Tree for the Moons Dataset
6    Train an SVM Regressor on the California Housing Dataset
7    Implement Batch Gradient Descent with Early Stopping for Softmax Regression
8    Implement MLP for Classification of Handwritten Digits (MNIST Dataset)
Practical No:- 01
AIM :- Implement Linear Regression (Diabetes Dataset).
# Import the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

diabetes = datasets.load_diabetes()
diabetes

print(diabetes.DESCR)
# columns
diabetes.feature_names

# Now we will split the data into the independent and dependent variables
X = diabetes.data
Y = diabetes.target
X.shape, Y.shape

# We will split the data into training and testing data


from sklearn.model_selection import train_test_split

train_x, test_x, train_y, test_y = train_test_split(X,Y,test_size=0.3,random_state=99)

train_x.shape, train_y.shape

# Linear Regression
from sklearn.linear_model import LinearRegression

le = LinearRegression()
le.fit(train_x,train_y)

y_pred = le.predict(test_x)
y_pred
result = pd.DataFrame({'Actual': test_y, 'Predict' : y_pred})
result

# we will check the accuracy


print('coefficient', le.coef_)
print('intercept', le.intercept_)
from sklearn.metrics import mean_squared_error, r2_score, explained_variance_score

# Variance score
explained_variance_score(test_y, y_pred)

>> 0.47737703777354545
# mean_squared_error
mean_squared_error(test_y,y_pred)

>> 3157.972848565651

# r2 score
r2_score(test_y,y_pred)

Inference:-
The model explains about 47.74% of the variance of the target with respect to the features.
The mean squared error of the model is 3157.97.
The R-squared score of the model is about 0.45.
Below are the coefficients and intercept of the regression equation as calculated by the
model.

coeff = pd.Series(le.coef_, index = diabetes.feature_names)


intercept = le.intercept_

print("Coefficients:\n")
print(coeff)
print("\n")
print("Intercept:\n")
print(intercept)
print("\n")

Coefficients:

age 54.820535
sex -260.930304
bmi 458.001802
bp 303.502332
s1 -995.584889
s2 698.811401
s3 183.095229
s4 185.698494
s5 838.503887
s6 96.441048
dtype: float64

Intercept:

154.42752615353518

The regression equation would be:

Diabetes Progression = Intercept + coeff(1) × age + coeff(2) × sex + ... + coeff(10) × s6
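As a quick check (this step is not in the original journal), the equation can be verified by
computing the prediction for one test sample by hand and comparing it with le.predict():

# Minimal sketch: intercept + sum(coef_i * feature_i) should match le.predict()
sample = test_x[0]
manual_prediction = le.intercept_ + np.dot(le.coef_, sample)
model_prediction = le.predict(sample.reshape(1, -1))[0]
print("Manual :", manual_prediction)
print("Predict:", model_prediction)  # the two values should agree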


Practical No 2
Aim :- Implement Logistic Regression (Iris Dataset)
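The journal records only the final accuracy check for this practical. A minimal sketch of the
steps that would produce Y_test and Y_predict (assuming the standard Iris data; the split
parameters below are illustrative, not taken from the original journal) could look like this:

# Hypothetical reconstruction of the training steps
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris.data
Y = iris.target

# Hold out 25% of the data for testing (assumed split)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

lr = LogisticRegression(max_iter=200)
lr.fit(X_train, Y_train)
Y_predict = lr.predict(X_test)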
# Check the accuracy of the model
from sklearn.metrics import accuracy_score
print(accuracy_score(Y_test, Y_predict))
0.973684
Inference:-
The logistic regression model's accuracy is 97.36%, which is a very good result, and the
confusion matrix (computed below) shows only one misclassified test sample.
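The confusion matrix itself is not listed in the journal; it could be computed with a sketch
like the following (assuming the Y_test and Y_predict produced above):

from sklearn.metrics import confusion_matrix

# Rows are true classes, columns are predicted classes;
# off-diagonal entries count the misclassified samples.
print(confusion_matrix(Y_test, Y_predict))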
Practical No 3
Aim:- Implement Multinomial Logistic Regression (Iris Dataset)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Importing Sklearn module and classes


from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn import metrics
from sklearn import datasets
from sklearn.model_selection import train_test_split

#Data Loading – IRIS dataset


iris = datasets.load_iris()
iris

{'data': array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4], ...

X = iris.data[:, [0, 2]]


Y = iris.target

#Create Training / Test Data


X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1,
stratify=Y)

X_train
X_train.shape

(105, 2)

X_test.shape

(45, 2)

#Perform Feature Scaling


#in order to make sure features are in fixed range irrespective of their values / units etc.

sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
X_train
X_train_std

#Train a Logistic Regression Model


# Create an instance of LogisticRegression classifier
lr = LogisticRegression(C=100.0, solver='lbfgs', multi_class='multinomial')

# Fit the model

lr.fit(X_train_std, Y_train)

#Measure model performance


# Create the predictions

Y_predict = lr.predict(X_test_std)
Y_predict

array([2, 0, 0, 1, 1, 1, 2, 1, 2, 0, 0, 2, 0, 1, 0, 1, 2, 1, 1, 2, 2, 0, 1, 1, 1, 1, 1, 2, 0, 2, 0, 0, 1, 1, 2,
2, 0, 0, 0, 1, 2, 2, 1, 0, 0])

# Use metrics.accuracy_score to measure the score


print("multinomial LogisticRegression Accuracy %.3f" %metrics.accuracy_score(Y_test,
Y_predict))

Output

multinomial LogisticRegression Accuracy 0.956

Inference – The accuracy score of the model is 95.6%
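To see the multinomial behaviour directly (this step is not in the original journal), the
per-class probabilities returned by the fitted model can be inspected; the predicted class is
simply the column with the highest probability:

# Per-class probabilities for the first few standardized test samples.
# Each row sums to 1; lr.predict() returns the argmax of each row.
proba = lr.predict_proba(X_test_std[:5])
print(np.round(proba, 3))
print(proba.argmax(axis=1), Y_predict[:5])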


Practical No 4
Aim:- Implement SVM classifier (Iris Dataset)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#Define the col names


colnames=["sepal_length_in_cm",
"sepal_width_in_cm","petal_length_in_cm","petal_width_in_cm", "class"]

#Read the dataset


dataset = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data", header=None, names=colnames)

#Data
dataset.head()

#Encoding the categorical column


dataset = dataset.replace({"class": {"Iris-setosa":1,"Iris-versicolor":2, "Iris-virginica":3}})
#Visualize the new dataset
dataset.head()

# Now we’re going to analyze our data


plt.figure(1)
sns.heatmap(dataset.corr())
plt.title('Correlation On iris Classes')

# Spliting the data


X = dataset.iloc[:,:-1]
y = dataset.iloc[:, -1].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

#Create the SVM model


from sklearn.svm import SVC
classifier = SVC(kernel = 'linear', random_state = 0)
#Fit the model for the data

classifier.fit(X_train, y_train)

#Make the prediction


y_pred = classifier.predict(X_test)

# And finally, check the accuracy of the model


from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

from sklearn.model_selection import cross_val_score


accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
print("Accuracy: {:.2f} %".format(accuracies.mean()*100))
print("Standard Deviation: {:.2f} %".format(accuracies.std()*100))

Output
[[13 0 0]
[ 0 15 1]
[ 0 0 9]]
Accuracy: 98.18 %
Standard Deviation: 3.64 %

Inference - The cross-validated accuracy of the model is 98.18% with a standard deviation of 3.64%.

The model achieves about 98% accuracy, which is very good, and the confusion matrix shows
only one misclassified test sample.
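In addition to the confusion matrix and cross-validated accuracy, a per-class report on the
held-out test set can be printed (a small addition, not part of the original journal):

from sklearn.metrics import classification_report, accuracy_score

print("Test accuracy:", accuracy_score(y_test, y_pred))
# Precision, recall and F1 for each of the three encoded classes (1, 2, 3)
print(classification_report(y_test, y_pred))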
Practical No 5
Aim:- Train and fine-tune a Decision Tree for the Moons Dataset

import numpy as np
import matplotlib.pyplot as plt
def plot_dataset(X, y, axes):
    plt.figure(figsize=(10, 6))
    plt.plot(X[:, 0][y==0], X[:, 1][y==0], "bs", alpha=0.5)
    plt.plot(X[:, 0][y==1], X[:, 1][y==1], "g^", alpha=0.2)
    plt.axis(axes)
    plt.grid(True, which='both')
    plt.xlabel(r"$x_1$", fontsize=20)
    plt.ylabel(r"$x_2$", fontsize=20, rotation=0)

from sklearn.datasets import make_moons

X, y = make_moons(n_samples=10000, noise=0.4, random_state=21)


plot_dataset(X, y, [-3, 5, -3, 3])

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test =train_test_split(X,y, test_size = 0.2)

from sklearn.tree import DecisionTreeClassifier

tree_clf = DecisionTreeClassifier()

from sklearn.model_selection import GridSearchCV

parameter = {
    'criterion': ["gini", "entropy"],
    'max_leaf_nodes': list(range(2, 50)),
    'min_samples_split': [2, 3, 4]
}

clf = GridSearchCV(tree_clf, parameter, cv=5, scoring="accuracy",
                   return_train_score=True, n_jobs=-1)

clf.fit(X_train, y_train)

clf.best_params_

{'criterion': 'gini', 'max_leaf_nodes': 37, 'min_samples_split': 2}

cvres = clf.cv_results_
for mean_score, params in zip(cvres["mean_train_score"], cvres["params"]):
    print(mean_score, params)

0.77859375 {'criterion': 'gini', 'max_leaf_nodes': 2, 'min_samples_split': 2}


0.77859375 {'criterion': 'gini', 'max_leaf_nodes': 2, 'min_samples_split': 3}
0.77859375 {'criterion': 'gini', 'max_leaf_nodes': 2, 'min_samples_split': 4}
0.8201562500000001 {'criterion': 'gini', 'max_leaf_nodes': 3, 'min_samples_split': 2}
0.8201562500000001 {'criterion': 'gini', 'max_leaf_nodes': 3, 'min_samples_split': 3}
0.8201562500000001 {'criterion': 'gini', 'max_leaf_nodes': 3, 'min_samples_split': 4}
0.8596875 {'criterion': 'gini', 'max_leaf_nodes': 4, 'min_samples_split': 2}

#Getting the training score:


clf.score(X_train, y_train)
0.875125

We have an accuracy of approximately 87%, but accuracy alone is sometimes not a good
measure, so let's look at the confusion matrix.

from sklearn.metrics import confusion_matrix


pred = clf.predict(X_train)
confusion_matrix(y_train,pred)

array([[3547, 481],
[ 518, 3454]])
Now from the confusion matrix let's get our precision and recall, which are better
metrics.
from sklearn.metrics import precision_score, recall_score

pre = precision_score(y_train, pred)


re = recall_score(y_train, pred)
print(f"Precision: {pre} Recall:{re}")

Precision: 0.8777636594663278 Recall:0.8695871097683786

Not bad: precision is slightly higher than recall, but let's combine the two metrics
into the F1 score.

from sklearn.metrics import f1_score


f1_score(y_train, pred)

0.8736562539521943
Our F1 score and accuracy are almost the same.

Getting the testing score:


clf.score(X_test, y_test)

0.8585

Inference:-
We have an accuracy of approximately 85% on the testing set.
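Precision, recall and F1 were computed above on the training set only; a similar check on the
held-out test set (not shown in the original journal) could look like:

# Evaluate the tuned tree on the test set as well
test_pred = clf.predict(X_test)
print(confusion_matrix(y_test, test_pred))
print("Test F1:", f1_score(y_test, test_pred))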
Practical No 6
Aim:- Train an SVM regressor on the California Housing dataset
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
housing_data = fetch_california_housing()
descr = housing_data['DESCR']
feature_names = housing_data['feature_names']
data = housing_data['data']
target = housing_data['target']
df1 = pd.DataFrame(data=data)
df1.rename(columns={0: feature_names[0], 1: feature_names[1], 2: feature_names[2],
                    3: feature_names[3], 4: feature_names[4], 5: feature_names[5],
                    6: feature_names[6], 7: feature_names[7]}, inplace=True)
df2 = pd.DataFrame(data=target)
df2.rename(columns={0: 'Target'}, inplace=True)
housing = pd.concat([df1, df2], axis=1)
print(housing.columns)
housing.head()

   MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  Longitude  Target
0  8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88    -122.23   4.526
1  8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86    -122.22   3.585
2  7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85    -122.24   3.521
3  5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85    -122.25   3.413
4  3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85    -122.25   3.422

print("dimension of housing data: {}".format(housing.shape))

dimension of housing data: (20640, 9)

housing.info()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(housing.loc[:, housing.columns != 'Target'],
housing['Target'], random_state=66)
from sklearn.svm import SVR
svr = SVR()
svr.fit(X_train, y_train)
s1 = svr.score(X_train, y_train)
s2 = svr.score(X_test, y_test)
print("R² of Support Vector Regressor on training set: {:.3f}".format(s1))
print("R² of Support Vector Regressor on test set: {:.3f}".format(s2))
O/P

R² of Support Vector Regressor on training set: -0.023

R² of Support Vector Regressor on test set: -0.033

Inference:-
The model underperforms quite substantially, with a negative R² score on both the training set
and the test set.
SVM requires all the features to vary on a similar scale, so we need to re-scale the data so that
all the features lie in approximately the same range:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # the scaler is fitted on the training data only
svr1 = SVR()
svr1.fit(X_train_scaled, y_train)
s3 = svr1.score(X_train_scaled, y_train)
s4 = svr1.score(X_test_scaled, y_test)
print("R² of Support Vector Regressor on training set: {:.3f}".format(s3))
print("R² of Support Vector Regressor on test set: {:.3f}".format(s4))

R² of Support Vector Regressor on training set: 0.659

R² of Support Vector Regressor on test set: 0.663

Inference:-
Scaling the data made a huge difference! The model is now underfitting slightly: training and
test set performance are similar, but both remain well below a perfect R² of 1.0. From here, we
can try increasing either gamma or C to fit a more complex model.
svr2 = SVR(gamma=10)
svr2.fit(X_train_scaled, y_train)
s5 = svr2.score(X_train_scaled, y_train)
s6 = svr2.score(X_test_scaled, y_test)
print("R² of Support Vector Regressor on training set: {:.3f}".format(s5))
print("R² of Support Vector Regressor on test set: {:.3f}".format(s6))

R² of Support Vector Regressor on training set: 0.702

R² of Support Vector Regressor on test set: 0.697

Inference:-
Here, increasing gamma improves the model, resulting in a test-set R² of about 0.70.
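Rather than trying gamma values by hand, C and gamma could also be tuned jointly with a small
grid search on the scaled data (a sketch, not part of the original journal; the parameter grid
is illustrative only):

from sklearn.model_selection import GridSearchCV

# Small illustrative grid; larger grids are slow because SVR scales poorly with sample count
param_grid = {"C": [1, 10], "gamma": [0.1, 1, 10]}
grid = GridSearchCV(SVR(), param_grid, cv=3, n_jobs=-1)
grid.fit(X_train_scaled, y_train)
print(grid.best_params_)
print("R² on test set: {:.3f}".format(grid.score(X_test_scaled, y_test)))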
Practical No 7
Aim:- Implement Batch Gradient Descent with early stopping for Softmax
Regression.

import numpy as np

class SoftmaxClassifier:
    def __init__(self, learning_rate=0.1, max_iter=1000):
        self.__learning_rate = learning_rate
        self.__max_iter = max_iter

    def __calculate_score(self, k, x):
        weight = self.__weights[k]
        return x.dot(weight)

    def train(self, x, y):
        self.__x = x
        self.__y = y
        self.__class_count = len(self.__y[0])
        self.__weights = np.random.rand(self.__class_count, x.shape[1])
        for i in range(self.__max_iter):
            for j in range(self.__class_count):
                self.__weights[j] = self.__calculate_new_weights(j)

    def __calculate_softmax(self, k, x):
        # Numerically stable softmax: exp(score_k - max) / sum_i exp(score_i - max)
        scores = np.array([self.__calculate_score(i, x) for i in range(self.__class_count)])
        exp_scores = np.exp(scores - scores.max())
        return exp_scores[k] / exp_scores.sum()

    def __calculate_cross_entropy_gradient(self, k):
        gradient_sum = 0
        for i in range(len(self.__x)):
            gradient_sum += (self.__calculate_softmax(k, self.__x[i]) - self.__y[i][k]) * self.__x[i]
        return gradient_sum

    def __calculate_new_weights(self, k):
        step_size = self.__calculate_cross_entropy_gradient(k) * self.__learning_rate
        return self.__weights[k] - step_size

    def predict(self, x):
        y = np.zeros((len(x), self.__class_count))
        for i in range(len(x)):
            max_score_index = 0
            max_score = 0
            for j in range(self.__class_count):
                score = self.__calculate_softmax(j, x[i])
                if score > max_score:
                    max_score = score
                    max_score_index = j
            y[i][max_score_index] = 1
        return y
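The aim mentions early stopping, which the listing above does not implement. One hedged sketch
of how it could be added is to track the cross-entropy loss on a held-out validation set after
each batch update and stop when it no longer improves. The helper names below (gradient,
softmax_loss) and the parameters are illustrative stand-ins, not part of the original journal:

# Illustrative pattern only: batch gradient descent with early stopping.
# W is the (n_classes, n_features) weight matrix; gradient and softmax_loss
# stand in for the class's cross-entropy gradient and loss computations.
def train_with_early_stopping(W, gradient, softmax_loss,
                              x_train, y_train, x_val, y_val,
                              learning_rate=0.1, max_iter=1000, patience=10):
    best_loss = float("inf")
    best_W = W.copy()
    rounds_without_improvement = 0
    for _ in range(max_iter):
        W = W - learning_rate * gradient(W, x_train, y_train)  # full-batch update
        val_loss = softmax_loss(W, x_val, y_val)
        if val_loss < best_loss:
            best_loss, best_W = val_loss, W.copy()
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= patience:
                break  # validation loss stopped improving: stop and keep the best weights
    return best_W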

from sklearn import datasets


from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

import numpy as np

def convert_to_one_hot(labels):
    class_count = len(set(labels))
    one_hot = np.zeros((len(labels), class_count))
    for i in range(len(labels)):
        one_hot[i][labels[i]] = 1
    return one_hot

def main():
    iris = datasets.load_iris()
    data = iris['data']
    labels_one_hot = convert_to_one_hot(iris['target'])
    rand = np.random.permutation(len(data))
    x_train, x_test, y_train, y_test = train_test_split(data[rand], labels_one_hot[rand], test_size=0.33)
    soft_clf = SoftmaxClassifier()
    soft_clf.train(x_train, y_train)
    y_pred = soft_clf.predict(x_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(accuracy)

if __name__ == "__main__":
    main()

O/P: Accuracy = 0.8
Inference:-
The accuracy of the model is 80%.
Practical No 8
Aim:- Implement MLP for classification of handwritten digits (MNIST
Dataset).

import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

import matplotlib.pyplot as plt

image_index = 7777  # You may select any index up to 59,999


print(y_train[image_index]) # The label is 8
plt.imshow(x_train[image_index], cmap='Greys')

8
<matplotlib.image.AxesImage at 0x7fa7325add50>

x_train.shape
(60000, 28, 28)

# Reshaping the array to 4-dims so that it can work with the Keras API
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
input_shape = (28, 28, 1)
# Making sure that the values are float so that we can get decimal points after division
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
# Normalizing the pixel values by dividing by the maximum value (255)
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print('Number of images in x_train', x_train.shape[0])
print('Number of images in x_test', x_test.shape[0])
x_train shape: (60000, 28, 28, 1)
Number of images in x_train 60000
Number of images in x_test 10000

# Importing the required Keras modules containing model and layers


from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Dropout, Flatten, MaxPooling2D
# Creating a Sequential Model and adding the layers
model = Sequential()
model.add(Conv2D(28, kernel_size=(3,3), input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten()) # Flattening the 2D arrays for fully connected layers
model.add(Dense(128, activation=tf.nn.relu))
model.add(Dropout(0.2))
model.add(Dense(10,activation=tf.nn.softmax))
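
Optionally (this step is not in the original journal), the layer shapes and parameter counts of
the network can be inspected before compiling:

model.summary()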

model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(x=x_train,y=y_train, epochs=10)
Epoch 1/10
1875/1875 [==============================] - 43s 22ms/step - loss: 0.2121 -
accuracy: 0.9357
Epoch 2/10
1875/1875 [==============================] - 42s 22ms/step - loss: 0.0878 -
accuracy: 0.9727
Epoch 3/10
1875/1875 [==============================] - 45s 24ms/step - loss: 0.0588 -
accuracy: 0.9815
Epoch 4/10
1875/1875 [==============================] - 43s 23ms/step - loss: 0.0447 -
accuracy: 0.9855
Epoch 5/10
1875/1875 [==============================] - 44s 23ms/step - loss: 0.0359 -
accuracy: 0.9886
Epoch 6/10
1875/1875 [==============================] - 42s 22ms/step - loss: 0.0307 -
accuracy: 0.9897
Epoch 7/10
1875/1875 [==============================] - 43s 23ms/step - loss: 0.0259 -
accuracy: 0.9911
Epoch 8/10
1875/1875 [==============================] - 45s 24ms/step - loss: 0.0230 -
accuracy: 0.9921
Epoch 9/10
1875/1875 [==============================] - 43s 23ms/step - loss: 0.0197 -
accuracy: 0.9932
Epoch 10/10
1875/1875 [==============================] - 42s 22ms/step - loss: 0.0183 -
accuracy: 0.9943
<keras.callbacks.History at 0x7fa72f2c22f0>

model.evaluate(x_test, y_test)

313/313 [==============================] - 4s 13ms/step - loss: 0.0644 - accuracy:


0.9849
[0.06441233307123184, 0.9848999977111816]

Inference:-
The test accuracy of the model is 98.49%.

Use the model to predict image at index position 4444

image_index = 4444
plt.imshow(x_test[image_index].reshape(28, 28),cmap='Greys')
pred = model.predict(x_test[image_index].reshape(1, 28, 28, 1))
print(pred.argmax())

1/1 [==============================] - 0s 115ms/step


9

Inference:- The model predicts the correct digit (9) for the image at index 4444.
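
As a further check (not part of the original journal), predictions for a whole batch of test
images can be compared against the true labels in a single call:

# Predict the first 10 test images at once and compare with the labels
batch_pred = model.predict(x_test[:10])
print("Predicted:", batch_pred.argmax(axis=1))
print("Actual   :", y_test[:10])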
