
ML Fat

The document outlines the FAT exam for the course 'Machine Learning for Data Science (LAB)', detailing the steps taken to preprocess a dataset, divide it into training, validation, and test sets, and apply machine learning models including Random Forest and an Artificial Neural Network (ANN). It includes code snippets for data handling, model training, hyperparameter tuning, and performance evaluation using accuracy scores and confusion matrices. The document emphasizes the importance of model validation and performance metrics in machine learning tasks.

Winter semester 23-24

Course code MDI4001


Course name Machine Learning for Data Science (LAB)

Submitted to:
Jyotismita Chaki
Jyotismita@vit.ac.in

FAT Exam

Submitted by :
Shiny. S (21MID0079)
Shiny.2021@vitstudent.ac.in

Date: 29 April 2024


a) Perform the preprocessing steps on the given dataset

CODE:

import pandas as pd

import numpy as np

data = pd.read_csv("agriculture_dataset.csv")

data.head()

data.info()

data.isnull().sum()

data.describe()

SCREENSHOT :
There are no null values in the dataset, so no further preprocessing steps are needed.
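Had the null check reported gaps, an imputation step would have been needed before training. The following is a minimal sketch under stated assumptions: the toy frame, its column names, and the helper impute_missing are hypothetical and are not taken from the exam dataset. It fills numeric columns with the median and other columns with the most frequent value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names are illustrative only.
df = pd.DataFrame({
    "soil_ph": [6.5, np.nan, 7.1, 6.8],
    "rainfall": [120.0, 95.0, np.nan, 110.0],
    "plant": ["rice", "wheat", None, "rice"],
})


def impute_missing(frame: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric gaps with the column median and other gaps with the mode."""
    out = frame.copy()
    for col in out.columns:
        if out[col].isnull().any():
            if pd.api.types.is_numeric_dtype(out[col]):
                out[col] = out[col].fillna(out[col].median())
            else:
                out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


clean = impute_missing(df)
remaining_nulls = int(clean.isnull().sum().sum())
print(remaining_nulls)
```

Median and mode are only one reasonable default; dropping incomplete rows is an alternative when few rows are affected.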

b) Divide the dataset into train, validation, and test sets.

CODE :

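from sklearn.model_selection import train_test_split

# Features: the first six columns; target: the 'Plant type' column
X = data.iloc[:,0:6]

y = data['Plant type']

# 80/20 split; the held-out 20% is also used for validation in part d
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)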
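The question asks for three sets, while the code above produces two. One common way to get a separate validation set is to call train_test_split twice. The sketch below uses synthetic stand-in data and hypothetical variable names (the _demo suffix), not the agriculture dataset, and assumes a 60/20/20 split is acceptable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 6 features, 3 classes.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 6))
y_demo = rng.integers(0, 3, size=100)

# Step 1: hold out 20% of all rows as the final test set.
X_rest, X_test_demo, y_rest, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Step 2: take 25% of the remaining 80% as validation (0.25 * 0.8 = 0.2 overall).
X_train_demo, X_val_demo, y_train_demo, y_val_demo = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42, stratify=y_rest
)

print(len(X_train_demo), len(X_val_demo), len(X_test_demo))
```

With this layout the validation set guides model selection and the test set is touched only once, for the final score.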

SCREENSHOT:
c) Use a suitable hyperparameter-tuned ML model to train the dataset.

Random Forest is chosen as a suitable model for the given dataset. It is first trained with fixed settings and its hyperparameters are tuned in part d.

CODE :

from sklearn.ensemble import RandomForestClassifier

rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

rf_classifier.fit(X_train, y_train)

y_pred = rf_classifier.predict(X_test)

SCREENSHOT:

d) After training, validate the model and test its performance

CODE:

from sklearn.metrics import accuracy_score

model = RandomForestClassifier(random_state=1, max_depth=10)

model.fit(X_train, y_train)

pred_train = model.predict(X_train)

train_score = accuracy_score(y_train,pred_train)

print('train_accuracy_score',train_score)

pred_val = model.predict(X_test)

val_score = accuracy_score(y_test,pred_val)

print('val_accuracy_score',val_score)
SCREENSHOT:

Tuning the hyperparameters for better validation accuracy:


CODE:

from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, ConfusionMatrixDisplay

from sklearn.model_selection import RandomizedSearchCV

from scipy.stats import randint

param_dist = {'n_estimators': randint(50,500),'max_depth': randint(1,20)}

rf = RandomForestClassifier()

rand_search = RandomizedSearchCV(rf, param_distributions=param_dist, n_iter=5, cv=5)

rand_search.fit(X_train, y_train)

# Create a variable for the best model

best_rf = rand_search.best_estimator_

# Print the best hyperparameters

print('Best hyperparameters:', rand_search.best_params_)

# Generate predictions with the best model

pred_train = best_rf.predict(X_train)

train_score = accuracy_score(y_train,pred_train)

print('train_accuracy_score',train_score)

pred_val = best_rf.predict(X_test)

val_score = accuracy_score(y_test,pred_val)
print('val_accuracy_score',val_score)
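RandomizedSearchCV already validates internally: with cv set, every candidate is scored on held-out folds of the training data, and best_score_ is the mean fold accuracy of the winning candidate. The sketch below illustrates this on synthetic data from make_classification, with smaller search ranges chosen only to keep it quick. The data, ranges, and variable names are assumptions, not the exam setup.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic 3-class problem with 6 features.
X_syn, y_syn = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=0
)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(10, 50),
        "max_depth": randint(1, 10),
    },
    n_iter=3,
    cv=3,
    random_state=0,
)
search.fit(X_tr, y_tr)

# Mean accuracy over the validation folds for the best candidate.
cv_accuracy = search.best_score_
# Accuracy of the refitted best model on the untouched test split.
test_accuracy = search.score(X_te, y_te)

print("best params:", search.best_params_)
print("cv accuracy:", cv_accuracy)
print("test accuracy:", test_accuracy)
```

Reporting the cross-validated score as the validation result leaves the test split free for a single final check.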

SCREENSHOT:

Creating the confusion matrix:

CODE:

cm = confusion_matrix(y_test,pred_val)

ConfusionMatrixDisplay(confusion_matrix=cm).plot()
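The confusion matrix also yields per-class precision and recall, which is what the imported precision_score and recall_score compute. The sketch below uses made-up labels (the class names are hypothetical) to show the relationship: precision divides the diagonal by the column sums, recall divides it by the row sums.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up true and predicted labels for three hypothetical classes.
y_true_demo = ["rice", "rice", "wheat", "wheat", "maize", "maize"]
y_pred_demo = ["rice", "wheat", "wheat", "wheat", "maize", "rice"]
labels = ["maize", "rice", "wheat"]

# Rows are true classes, columns are predicted classes.
cm_demo = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)

diag = np.diag(cm_demo)
precision_from_cm = diag / cm_demo.sum(axis=0)  # correct / all predicted as class
recall_from_cm = diag / cm_demo.sum(axis=1)     # correct / all truly in class

# The library functions give the same per-class values.
precision_lib = precision_score(y_true_demo, y_pred_demo, labels=labels, average=None)
recall_lib = recall_score(y_true_demo, y_pred_demo, labels=labels, average=None)

print(cm_demo)
print(precision_from_cm, recall_from_cm)
```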

SCREENSHOT:
a) Perform the pre-processing steps if needed. If the pre-processing steps are not needed

CODE:

import numpy as np

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

from keras.models import Sequential

from keras.layers import Dense

data = pd.read_csv('University_dataset.csv')

data.head()

data.isnull().sum()

SCREENSHOT:
b) Divide the dataset into train, validation, and test sets.

CODE:

X = data.iloc[:, 1:6].values

y = data.iloc[:, 6].values

SCREENSHOT:

c) Can we use an ANN to train the dataset? If yes, then create an ANN, train and validate the model using the dataset, and write a discussion of the model's performance in the answer booklet given. If no, then write your justification in the answer booklet given.

CODE:

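# Split first, then fit the scaler on the training data only,
# so that test-set statistics do not leak into training

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)

X_test = scaler.transform(X_test)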

# Build the ANN

model = Sequential()

model.add(Dense(128, input_dim=5, activation='relu'))

model.add(Dense(64, activation='relu'))

model.add(Dense(1, activation='linear'))

# Compile the model

model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_absolute_error'])

# Train the model

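# The test split doubles as the validation data here

model.fit(X_train, y_train, epochs=100, batch_size=32, validation_data=(X_test, y_test))

# Evaluate the model

# evaluate() returns the loss followed by the compiled metric (mean absolute error)

loss, mae = model.evaluate(X_test, y_test)

print(f'Loss: {loss}, Mean Absolute Error: {mae}')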

# Make predictions

predictions = model.predict(X_test)
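Since the network has a single linear output and is trained with mean squared error, it is a regression model, so its predictions can be summarised with regression metrics rather than accuracy. The sketch below uses made-up target and prediction values (hypothetical, not results from the university dataset) and assumes the predictions arrive as an (n, 1) array, as a Keras model with one output unit returns them.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up values for illustration only.
y_true_reg = np.array([0.60, 0.75, 0.82, 0.91])
pred_reg = np.array([[0.62], [0.70], [0.85], [0.90]])  # shape (n, 1)

# Flatten to shape (n,) so predictions line up with the targets.
pred_flat = pred_reg.ravel()

mae_demo = mean_absolute_error(y_true_reg, pred_flat)
rmse_demo = np.sqrt(mean_squared_error(y_true_reg, pred_flat))
r2_demo = r2_score(y_true_reg, pred_flat)

print(f"MAE: {mae_demo:.4f}, RMSE: {rmse_demo:.4f}, R^2: {r2_demo:.4f}")
```

MAE and RMSE are in the units of the target, while R^2 gives the share of target variance the model explains, which is convenient for the written discussion.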
SCREENSHOT:
