
Comparison of Classifiers

The document analyzes the performance of three classification algorithms (Decision Tree, K-Nearest Neighbors, and Logistic Regression) on the Iris dataset. Each classifier underwent 5-fold cross-validation, with KNN and Logistic Regression achieving the highest accuracy of 0.97, while the Decision Tree achieved 0.96. A summary of the cross-validation accuracies for all classifiers is provided at the end.

Uploaded by

pnagakalyan.aiml

Comparison_of_Classifiers

November 5, 2024

[1]: # Performance analysis of classification algorithms on the Iris dataset

[2]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

[3]: # The Iris dataset is a classic dataset in machine learning and statistics,
# widely used for classification tasks.

# Features of the Iris Dataset
# The dataset consists of four features measured for each flower sample:
# 1. Sepal Length: The length of the sepal in centimeters.
# 2. Sepal Width: The width of the sepal in centimeters.
# 3. Petal Length: The length of the petal in centimeters.
# 4. Petal Width: The width of the petal in centimeters.

# Species of Iris Flowers
# The dataset includes three species of the Iris flower:
# 1. Iris Setosa: Typically has the shortest and narrowest petals and the
#    shortest sepals, although its sepals are on average the widest of the three.
# 2. Iris Versicolor: Intermediate flower size compared to Setosa and Virginica.
# 3. Iris Virginica: Usually has the largest petals and the longest sepals
#    among the three species.

# - Total Samples: 150
# - Samples per Species: 50 for each species
# - Total Features: 4 (Sepal Length, Sepal Width, Petal Length, Petal Width)
# - Target Variable: Species of the Iris flower (Setosa, Versicolor, Virginica)
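
The dataset summary above can be checked directly against the copy that ships with scikit-learn. The following is a minimal sketch, not part of the original notebook; the variable names `iris` and `counts` are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

# Load scikit-learn's bundled copy of the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# 150 samples, 4 features
print("Shape:", X.shape)
print("Features:", iris.feature_names)

# Number of samples per species (labels are 0, 1, 2)
counts = np.bincount(y)
print("Samples per species:", dict(zip(iris.target_names, counts.tolist())))

# Mean of each feature per species, in the order of iris.feature_names
for label, name in enumerate(iris.target_names):
    means = X[y == label].mean(axis=0)
    print(f"{name:>10}: {np.round(means, 2)}")
```

The per-species means give a quick sense of how well the petal measurements separate the classes before any classifier is trained.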

[4]: # Load the Iris dataset
data = load_iris()
X = data.data # Features
y = data.target # Target labels

[5]: # Dictionary to store accuracy results for final comparison
accuracy_results = {}

[6]: # 1. Decision Tree Classifier
decision_tree = DecisionTreeClassifier()

# Perform 5-fold cross-validation
dt_scores = cross_val_score(decision_tree, X, y, cv=5)
dt_accuracy = dt_scores.mean()
accuracy_results["Decision Tree"] = dt_accuracy

# Train the model on the full dataset and predict on the same samples.
# Note: the report and confusion matrix below therefore describe the fit to
# the training data, not held-out performance; the cross-validation accuracy
# above is the generalization estimate.
decision_tree.fit(X, y)
dt_y_pred = decision_tree.predict(X)

# Print results
print("\nDecision Tree Results:")
print(f"Cross-Validation Accuracy: {dt_accuracy:.2f}")
print("Classification Report:\n",
      classification_report(y, dt_y_pred, target_names=data.target_names,
                            zero_division=1))

# Confusion matrix and plot
dt_conf_matrix = confusion_matrix(y, dt_y_pred)
plt.figure(figsize=(6, 4))
sns.heatmap(dt_conf_matrix, annot=True, fmt="d", cmap="Blues",
            xticklabels=data.target_names, yticklabels=data.target_names)
plt.title("Decision Tree - Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

Decision Tree Results:
Cross-Validation Accuracy: 0.96
Classification Report:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        50
  versicolor       1.00      1.00      1.00        50
   virginica       1.00      1.00      1.00        50

    accuracy                           1.00       150
   macro avg       1.00      1.00      1.00       150
weighted avg       1.00      1.00      1.00       150
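
The report above scores the tree on the same 150 samples it was trained on, which is why every entry is 1.00 even though the cross-validation accuracy is 0.96. One way to get a per-class report that matches the cross-validated estimate is to build it from out-of-fold predictions. The sketch below is not part of the original notebook; it uses scikit-learn's `cross_val_predict`, and `random_state=0` is an assumption added only to make the run repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X, y = data.data, data.target

# random_state is an assumption for reproducibility; the notebook leaves it unset
tree = DecisionTreeClassifier(random_state=0)

# Each sample is predicted by a model that did not see it during training
oof_pred = cross_val_predict(tree, X, y, cv=5)

print(classification_report(y, oof_pred, target_names=data.target_names))

# Rows are actual classes, columns are predicted classes
oof_conf_matrix = confusion_matrix(y, oof_pred)
print(oof_conf_matrix)
```

Any off-diagonal counts in this matrix are genuine held-out errors, which the training-set matrix cannot show.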

[7]: # 2. K-Nearest Neighbors Classifier
knn = KNeighborsClassifier(n_neighbors=3)

# Perform 5-fold cross-validation
knn_scores = cross_val_score(knn, X, y, cv=5)
knn_accuracy = knn_scores.mean()
accuracy_results["K-Nearest Neighbors"] = knn_accuracy

# Train the model on the full dataset and predict on the same samples
# (the report and confusion matrix below describe the training data)
knn.fit(X, y)
knn_y_pred = knn.predict(X)

# Print results
print("\nK-Nearest Neighbors Results:")
print(f"Cross-Validation Accuracy: {knn_accuracy:.2f}")
print("Classification Report:\n",
      classification_report(y, knn_y_pred, target_names=data.target_names,
                            zero_division=1))

# Confusion matrix and plot
knn_conf_matrix = confusion_matrix(y, knn_y_pred)
plt.figure(figsize=(6, 4))
sns.heatmap(knn_conf_matrix, annot=True, fmt="d", cmap="Blues",
            xticklabels=data.target_names, yticklabels=data.target_names)
plt.title("K-Nearest Neighbors - Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

K-Nearest Neighbors Results:
Cross-Validation Accuracy: 0.97
Classification Report:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        50
  versicolor       0.94      0.94      0.94        50
   virginica       0.94      0.94      0.94        50

    accuracy                           0.96       150
   macro avg       0.96      0.96      0.96       150
weighted avg       0.96      0.96      0.96       150
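
The notebook fixes `n_neighbors=3` without comparing alternatives. A minimal sketch of how the same 5-fold cross-validation could be repeated for a few neighbor counts follows; the candidate values in `k_values` are illustrative assumptions, not settings taken from the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate neighbor counts (illustrative choices)
k_values = [1, 3, 5, 7, 9]

k_scores = {}
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over the same 5 folds used elsewhere in the notebook
    k_scores[k] = cross_val_score(model, X, y, cv=5).mean()
    print(f"k={k}: {k_scores[k]:.3f}")
```

With only 30 samples per fold, small differences between values of k should be read with caution.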

[8]: # 3. Logistic Regression Classifier
logistic_regression = LogisticRegression(max_iter=200)

# Perform 5-fold cross-validation
lr_scores = cross_val_score(logistic_regression, X, y, cv=5)
lr_accuracy = lr_scores.mean()
accuracy_results["Logistic Regression"] = lr_accuracy

# Train the model on the full dataset and predict on the same samples
# (the report and confusion matrix below describe the training data)
logistic_regression.fit(X, y)
lr_y_pred = logistic_regression.predict(X)

# Print results
print("\nLogistic Regression Results:")
print(f"Cross-Validation Accuracy: {lr_accuracy:.2f}")
print("Classification Report:\n",
      classification_report(y, lr_y_pred, target_names=data.target_names,
                            zero_division=1))

# Confusion matrix and plot
lr_conf_matrix = confusion_matrix(y, lr_y_pred)
plt.figure(figsize=(6, 4))
sns.heatmap(lr_conf_matrix, annot=True, fmt="d", cmap="Blues",
            xticklabels=data.target_names, yticklabels=data.target_names)
plt.title("Logistic Regression - Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

Logistic Regression Results:
Cross-Validation Accuracy: 0.97
Classification Report:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        50
  versicolor       0.98      0.94      0.96        50
   virginica       0.94      0.98      0.96        50

    accuracy                           0.97       150
   macro avg       0.97      0.97      0.97       150
weighted avg       0.97      0.97      0.97       150
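
The notebook fits logistic regression on the raw centimeter measurements and raises `max_iter` to 200, presumably to give the solver room to converge. A common alternative is to standardize the features inside a pipeline so that the scaling is learned on each training fold only. The sketch below is an assumption-laden variant, not the notebook's method; `scaled_lr` is a name introduced here.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scaling lives inside the pipeline, so each fold fits the scaler
# on its own training split and never sees the held-out samples
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression())

scaled_lr_scores = cross_val_score(scaled_lr, X, y, cv=5)
print("Fold accuracies:", scaled_lr_scores.round(3))
print(f"Mean accuracy: {scaled_lr_scores.mean():.3f}")
```

Whether scaling helps on this dataset is an empirical question; the point of the pipeline is that the comparison stays free of leakage either way.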

[9]: # Print a summary of cross-validation accuracies for all classifiers
print("\nFinal Comparison of Cross-Validation Accuracies:")
for name, accuracy in accuracy_results.items():
    print(f"{name}: {accuracy:.2f}")

Final Comparison of Cross-Validation Accuracies:
Decision Tree: 0.96
K-Nearest Neighbors: 0.97
Logistic Regression: 0.97
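
On 150 samples, a difference of 0.01 in accuracy corresponds to roughly one or two flowers, so the ranking above is not conclusive on its own. A minimal sketch of a slightly more informative comparison follows: all three models are scored on the same shuffled, stratified folds and the spread across folds is reported next to the mean. The `StratifiedKFold` settings and both `random_state` values are assumptions added here, not taken from the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    # random_state on the tree is an assumption for repeatability
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
    "Logistic Regression": LogisticRegression(max_iter=200),
}

# One shared splitter so every model sees identical folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

summary = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv)
    summary[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the standard deviations are larger than the gaps between the means, the three classifiers are best described as performing comparably on this dataset.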
