
SLOT – L43+L44

SCHOOL OF COMPUTER SCIENCE AND ENGINEERING


ASSESSMENT – III – WINTER SEMESTER 2024-2025
Programme Name & Branch: B.Tech (CSE)
Course Name: Data Mining Lab
Course Code: BCSE208P

Objective: To implement and evaluate classification algorithms (Decision Tree and Naïve Bayes)
1. Implement and Evaluate Classification Algorithms:
Dataset: Use an external dataset containing real-world data, available from Kaggle or the
UCI Machine Learning Repository (e.g., customer churn prediction, medical diagnosis, or
credit risk classification). The dataset should include both categorical and numerical attributes.
Load and preprocess the dataset.
Split the dataset into training and testing sets (80% training, 20% testing).
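A minimal loading and splitting sketch is given below; the file name churn.csv and the Churn target column are placeholders for whichever dataset is chosen:

# Sketch: load, preprocess, and split an 80/20 train/test set.
# "churn.csv" and the "Churn" column are placeholder names.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("churn.csv")
df = df.dropna(subset=["Churn"])                  # drop rows with no label
df = df.fillna(df.median(numeric_only=True))      # impute numeric gaps
X = pd.get_dummies(df.drop(columns=["Churn"]))    # one-hot encode categoricals
y = df["Churn"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)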
Implement the following classification algorithms (a library-based sketch follows this list):
• ID3 Decision Tree: Implement the Iterative Dichotomiser 3 (ID3) algorithm
using an existing Python library or from scratch if desired.
• CART Decision Tree: Implement the Classification and Regression Tree
(CART) algorithm, which uses the Gini Index for splitting.
• Naïve Bayes Classifier: Implement the Gaussian Naïve Bayes or Multinomial
Naïve Bayes based on the dataset's characteristics.
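One possible library-based route is sketched below. Note that scikit-learn ships no literal ID3; an entropy-criterion decision tree is used here as a common stand-in, and GaussianNB is chosen on the assumption of numeric features (swap in MultinomialNB for count-style data).

# Sketch: three classifiers via scikit-learn (entropy tree as an ID3 stand-in).
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

models = {
    "ID3":  DecisionTreeClassifier(criterion="entropy", random_state=42),
    "CART": DecisionTreeClassifier(criterion="gini", random_state=42),
    "NaiveBayes": GaussianNB(),
}
for name, model in models.items():
    model.fit(X_train, y_train)   # X_train/y_train from the split above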
Evaluate Model Performance (a sketch follows this list):
• Compute accuracy, precision, recall, F1-score, and confusion matrix for each
classifier.
• Visualize the decision trees (ID3 and CART) to analyze how the models
make decisions.
• Use ROC-AUC curves to compare classifier performance.
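A compact evaluation sketch, reusing the fitted models dict and the split from above; weighted averaging is only one choice, and the ROC curve shown assumes a binary target:

# Sketch: accuracy/precision/recall/F1, confusion matrix, tree plot, ROC-AUC.
import matplotlib.pyplot as plt
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, RocCurveDisplay)
from sklearn.tree import plot_tree

for name, model in models.items():
    y_pred = model.predict(X_test)
    print(name,
          accuracy_score(y_test, y_pred),
          precision_score(y_test, y_pred, average="weighted"),
          recall_score(y_test, y_pred, average="weighted"),
          f1_score(y_test, y_pred, average="weighted"))
    print(confusion_matrix(y_test, y_pred))

plot_tree(models["CART"], max_depth=3, feature_names=list(X.columns))  # tree view
RocCurveDisplay.from_estimator(models["CART"], X_test, y_test)         # ROC-AUC plot
plt.show()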
2. Customer Segmentation using Clustering Algorithms
Dataset:
Use a combination of datasets to create a comprehensive customer profile. You
can source datasets from:
• Kaggle (e.g., customer transaction data, customer demographics)
• UCI Machine Learning Repository (e.g., online retail data)
Combine at least two datasets to enrich the feature set for each customer. Ensure
that the combined dataset has a minimum of 10,000 data points.
Objective:
Perform customer segmentation to identify distinct groups of customers based on
their purchasing behavior, demographics, and other relevant features.
Instructions:
Data Preparation (a sketch follows this list):
• Load and merge the datasets into a single Pandas DataFrame.
• Handle missing values appropriately (e.g., imputation or removal).
• Encode categorical variables using techniques like one-hot encoding or
label encoding.
• Scale the numerical features using StandardScaler or MinMaxScaler.
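A data-preparation sketch, assuming two hypothetical files (transactions.csv, demographics.csv) joined on a CustomerID key; every file and column name here is a placeholder:

# Sketch: merge two customer datasets, clean, encode, and scale.
import pandas as pd
from sklearn.preprocessing import StandardScaler

tx = pd.read_csv("transactions.csv")
demo = pd.read_csv("demographics.csv")
customers = tx.merge(demo, on="CustomerID", how="inner")

customers = customers.dropna(axis=1, thresh=int(0.5 * len(customers)))  # drop sparse columns
customers = customers.fillna(customers.median(numeric_only=True))       # impute numeric gaps

features = pd.get_dummies(customers.drop(columns=["CustomerID"]))  # one-hot encode categoricals
X_scaled = StandardScaler().fit_transform(features)                # scale for distance-based clustering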
Implement the following clustering algorithms (an instantiation sketch follows this list):
• K-Means
• K-Medoids
• Hierarchical Clustering
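One way to instantiate the three clusterers is sketched below. K-Medoids is not part of scikit-learn itself; the sketch assumes the separate scikit-learn-extra package.

# Sketch: instantiate the three clusterers for a candidate k.
# KMedoids assumes scikit-learn-extra (pip install scikit-learn-extra).
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn_extra.cluster import KMedoids

k = 4  # placeholder; choose k with the methods described in the next list
clusterers = {
    "KMeans": KMeans(n_clusters=k, n_init=10, random_state=42),
    "KMedoids": KMedoids(n_clusters=k, random_state=42),
    "Hierarchical": AgglomerativeClustering(n_clusters=k, linkage="ward"),
}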
For each algorithm (a per-algorithm sketch follows this list):
• Determine the optimal number of clusters using appropriate methods such
as the elbow method, silhouette score, or dendrograms.
• Train the model on the preprocessed dataset using the determined optimal
number of clusters.
• Assign each data point to a cluster.
• Visualize the clusters using dimensionality reduction techniques (e.g.,
PCA or t-SNE) for higher-dimensional data.
• Evaluate the clustering performance using appropriate metrics such as
the silhouette score.
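The per-algorithm loop might look like the sketch below (K-Means shown; repeat for K-Medoids and hierarchical clustering). It assumes X_scaled from the data-preparation sketch, uses the silhouette score to pick k, and PCA for a 2-D plot:

# Sketch: silhouette-based k selection, fitting, PCA projection, evaluation.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

scores = {k: silhouette_score(
              X_scaled,
              KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled))
          for k in range(2, 9)}
best_k = max(scores, key=scores.get)

labels = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit_predict(X_scaled)

coords = PCA(n_components=2).fit_transform(X_scaled)   # 2-D view for plotting
plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=5)
plt.title(f"KMeans, k={best_k}, silhouette={scores[best_k]:.2f}")
plt.show()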
Comparative Analysis:
Compare the performance of the three algorithms and visualize the results.
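A brief comparison sketch, reusing the clusterers dict (rebuilt with the chosen k) and X_scaled; the Davies-Bouldin index is added here only as a second example of an internal metric:

# Sketch: compare the three clusterings with internal validity metrics.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import silhouette_score, davies_bouldin_score

rows = []
for name, model in clusterers.items():
    labels = model.fit_predict(X_scaled)
    rows.append({"algorithm": name,
                 "silhouette": silhouette_score(X_scaled, labels),
                 "davies_bouldin": davies_bouldin_score(X_scaled, labels)})

comparison = pd.DataFrame(rows)
print(comparison)
comparison.plot(x="algorithm", kind="bar")   # visual comparison of the metrics
plt.show()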
