Lab Report 10 FDS

The lab report details a comprehensive machine learning workflow applied to the Iris Dataset, focusing on data loading, preprocessing, model training, evaluation, and prediction using Python and libraries like Scikit-learn and Pandas. The Random Forest classifier achieved an impressive accuracy of 95%, with high precision and recall across all flower species, demonstrating its effectiveness in classification tasks. Future work includes hyperparameter tuning, model comparison, and cross-validation to enhance model performance.

Uploaded by

mukeshreddy6766

Student Name: J. Mukesh Reddy
Student Registration Number: 231U1R1120
Class & Section: CSE & C
Study Level (UG/PG): UG
Year & Term: II & III
Subject Name: Foundations of Data Science
Name of the Assessment: Lab Report 10
Date of Submission: 10/05/2025

Lab Report 10: Case Study on Load, Pre-process, Split, Train, Evaluate, and Predict a
Dataset

Objective:

The objective of this lab report is to apply a comprehensive machine learning workflow that
includes data loading, preprocessing, splitting, model training, evaluation, and prediction.
The dataset chosen for this study is the well-known Iris Dataset, commonly used for
classification tasks. This case study focuses on performing these steps using Python, with the
help of machine learning libraries like Scikit-learn and Pandas.

Dataset Chosen:

The Iris Dataset is a classical dataset in machine learning, containing 150 instances of iris
flowers, with four features describing the physical attributes of the flowers: sepal length,
sepal width, petal length, and petal width. The target variable represents the species of the
flower, with three possible categories: Setosa, Versicolor, and Virginica.

Tools and Libraries:

The following tools and libraries were used in this case study:

• Python: The primary programming language.
• Pandas: For data manipulation and analysis.
• Scikit-learn: For machine learning algorithms and evaluation.
• Matplotlib: For visualizing data (if required).

1. Load the Dataset:


The first step in the machine learning workflow is loading the dataset. In this case, the Iris
Dataset was loaded directly from Scikit-learn's built-in library, which simplifies the process
of data acquisition. The dataset was separated into two parts: features (X) and target labels
(y), with the features containing the physical measurements of the flowers and the target
labels representing the species.
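
A minimal sketch of this loading step, assuming Scikit-learn's `load_iris` helper with `as_frame=True` so that the features arrive as a Pandas DataFrame. The variable names `iris`, `X`, and `y` are illustrative; the report does not show its own code.

```python
from sklearn.datasets import load_iris

# Load the bundled Iris data; as_frame=True returns Pandas objects.
iris = load_iris(as_frame=True)

X = iris.data    # DataFrame: one row per flower, four measurement columns
y = iris.target  # Series: integer class labels 0, 1, 2

print(X.shape)                   # rows and columns of the feature table
print(list(X.columns))           # feature names
print(list(iris.target_names))   # species names matching labels 0, 1, 2
```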

2. Pre-process the Data:

Before feeding the data into the model, preprocessing was necessary to ensure that the data is
clean, consistent, and suitable for training. In this step, we checked the dataset for missing
values and found none, as expected for the Iris dataset. The features were then rescaled
(standardized) so that all four share a common scale. Feature scaling is important for many
machine learning algorithms, particularly distance-based models; tree-based models such as
Random Forest are largely insensitive to it.
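
The check and the rescaling can be sketched as follows. The report does not name the scaler it used, so `StandardScaler` (zero mean, unit variance per feature) is an assumption, as are the variable names. The sketch follows the report's order and scales the full dataset before splitting; a common alternative is to fit the scaler on the training portion only.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True, as_frame=True)

# Count missing values in each feature column.
missing_per_column = X.isna().sum()
print(missing_per_column)

# Assumed scaler: subtract each column's mean and divide by its standard deviation.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # NumPy array with the same shape as X

print(X_scaled.mean(axis=0).round(6))  # per-feature means after scaling
print(X_scaled.std(axis=0).round(6))   # per-feature standard deviations after scaling
```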

3. Split the Data:

After preprocessing, the dataset was split into training and test sets. This was done to ensure
that the model could be trained on one portion of the data and tested on another, which helps
evaluate its performance on unseen data. The data was split in an 80-20 ratio, meaning 80%
of the data was used for training, and 20% was used for testing the model’s performance.
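
A sketch of the 80-20 split using Scikit-learn's `train_test_split`. The `random_state=42` seed and the `stratify=y` option (which keeps the species proportions equal in both parts) are assumptions added for reproducibility; the report does not state whether either was used.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# test_size=0.2 reserves 20% of the 150 rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,  # assumed seed, so the split is repeatable
    stratify=y,       # assumed: keep class proportions in both parts
)

print(len(X_train), len(X_test))
```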

4. Train the Model:

For this case study, we selected a Random Forest classifier as our model. Random Forest is
an ensemble learning method that combines multiple decision trees to improve accuracy and
reduce overfitting. The model was trained on the training set, learning to predict the species
of flowers based on the four feature measurements.
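
Training can be sketched as below. The hyperparameters are assumptions: `n_estimators=100` is Scikit-learn's default number of trees, and `random_state=42` is chosen only to make the run repeatable. The report does not list the settings it used.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Each tree is fit on a bootstrap sample of the training rows;
# the forest predicts by combining the trees' votes.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print(len(model.estimators_))      # number of fitted trees
print(model.feature_importances_)  # relative contribution of each feature
```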

5. Evaluate the Model:


Once the model was trained, we evaluated its performance on the test set. The evaluation was
done using multiple metrics:

• Accuracy: This indicates the overall correctness of the model’s predictions.
• Confusion Matrix: A confusion matrix was used to visualize the performance of the
model in predicting the correct class for each instance.
• Classification Report: This provided precision, recall, and F1-score metrics for each
class, giving a detailed view of how well the model performed for each flower
species.

The model achieved an accuracy of 95%, indicating that it correctly predicted the species of
flowers most of the time. The confusion matrix showed very few misclassifications, and those
that occurred were between the Versicolor and Virginica species, whose measurements overlap.
The classification report confirmed that the model had high precision and recall across all
three classes.
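
The three metrics can be computed as in the sketch below, which uses `accuracy_score`, `confusion_matrix`, and `classification_report` from `sklearn.metrics`. The seed, the stratified split, and the model settings are assumptions, so the numbers it prints need not match the figures reported above. With a 30-sample test set, accuracy can only move in steps of 1/30, so a reported figure such as 95% is presumably rounded.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

y_pred = model.predict(X_test)

# Fraction of test flowers whose species was predicted correctly.
accuracy = accuracy_score(y_test, y_pred)

# Rows are true species, columns are predicted species.
cm = confusion_matrix(y_test, y_pred)

# Precision, recall, and F1-score for each species.
report = classification_report(y_test, y_pred, target_names=iris.target_names)

print(f"Accuracy: {accuracy:.3f}")
print(cm)
print(report)
```

Off-diagonal entries of the confusion matrix are the misclassifications; accuracy equals the sum of the diagonal divided by the total number of test samples.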

6. Make Predictions:

To demonstrate the model’s ability to make predictions on new, unseen data, a prediction was
made for a hypothetical flower with specific feature measurements (sepal length, sepal width,
petal length, petal width). The model successfully predicted that the flower belonged to the
Versicolor species.
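
A sketch of this prediction step. The report does not give the measurements it used, so the values in `new_flower` are hypothetical, and the predicted species depends on them. Wrapping the scaler and the classifier in a pipeline (an assumption, not something the report describes) ensures the new sample is scaled the same way as the training data. For brevity the sketch trains on the full dataset.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()

# The pipeline applies the scaler, then the classifier, on both fit and predict.
model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
model.fit(iris.data, iris.target)

# Hypothetical flower: sepal length, sepal width, petal length, petal width (cm).
new_flower = np.array([[6.0, 2.9, 4.5, 1.5]])

predicted_index = model.predict(new_flower)[0]
predicted_name = iris.target_names[predicted_index]
class_probabilities = model.predict_proba(new_flower)[0]

print(predicted_name)
print(dict(zip(iris.target_names, class_probabilities.round(2))))
```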

Results:

1. Accuracy: The model achieved an impressive accuracy of 95% on the test dataset.
2. Confusion Matrix: The confusion matrix showed a strong performance, with only a
few misclassifications between species, particularly Versicolor and Virginica.
3. Classification Report: The report indicated high precision and recall values for all
three species (Setosa, Versicolor, and Virginica), suggesting the model performs well
in distinguishing between the species.
4. Prediction: The model predicted the species of a new flower sample (Versicolor),
illustrating how the trained model is applied to new instances.

Conclusion:

This lab report successfully demonstrated the key steps in a machine learning workflow:

• Data loading: The Iris dataset was loaded and prepared for modeling.
• Pre-processing: Data was standardized to ensure consistency across features.
• Model training: A Random Forest classifier was trained on the dataset.
• Model evaluation: The model was evaluated using accuracy, confusion matrix, and
classification report, showing a high performance.
• Prediction: The model was able to predict the species of a new, unseen flower based
on its features.

The results indicate that the Random Forest classifier is an effective model for classifying iris
species. The accuracy and evaluation metrics suggest that the model performs well, making it
suitable for similar classification tasks.

Future Work:

• Hyperparameter Tuning: To improve model performance further, hyperparameter
tuning could be performed using techniques like grid search or random search.
• Model Comparison: Other classifiers, such as Support Vector Machines or k-Nearest
Neighbors, could be tested to see if they offer better performance.
• Cross-Validation: Cross-validation could be used to obtain a more reliable estimate of
the model’s performance than a single train-test split and to help detect overfitting.
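
The three ideas above can be sketched together as follows. Everything specific here is an assumption chosen for illustration: 5-fold cross-validation, the default SVM and 5-neighbor k-NN settings, and the small parameter grid. None of these were run in the report, so no results are claimed.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Model comparison: the scale-sensitive models get a scaler in their pipeline.
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Cross-validation: average accuracy over 5 different train/validation splits.
cv_means = {}
for name, candidate in candidates.items():
    scores = cross_val_score(candidate, X, y, cv=5)
    cv_means[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")

# Hyperparameter tuning: grid search over a small, illustrative grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 3]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Putting the scaler inside each pipeline means it is refit on the training folds only, which avoids leaking information from the validation fold.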

This case study offers a strong foundation for applying machine learning models to
classification problems and can be adapted to other datasets with similar tasks.
