0% found this document useful (0 votes)

11 views

Data Mining

A leading bank wants to develop a customer segmentation to give promotional offers to its customers. They collected a sample that summarizes the activities of users during the past few months. You are given the task to identify the segments based on credit card usage.

Uploaded by

Mohseen Sayyed

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

11 views

Data Mining

Uploaded by

Mohseen Sayyed

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 27

PROJECT DATA MINING

BUSINESS REPORT
Mohseen Sayyed | Mohseen.Sayyed@gmail.com
PART I
CLUSTERING

Problem Statement …… 3

1.1 Read the data, do the necessary initial

steps, and exploratory data analysis
(Univariate, Bi-variate, and multivariate
analysis)……… 4 to 7

1.2 Do you think scaling is necessary for

clustering in this case? Justify……… 8

1.3 Apply hierarchical clustering to scaled

data. Identify the number of optimum
clusters using Dendrogram and briefly
describe them……… 8 to 9

1.4 Apply K-Means clustering on scaled

data and determine optimum clusters.
Apply elbow curve and silhouette score.
Explain the results properly. Interpret and
write inferences on the finalized clusters.
20XX PRESENTATION TITLE 2

……… 10

1.5 Describe cluster profiles for the clusters

defined. Recommend different promotional
strategies for different clusters. ……… 11
PROBLEM STATEMENT

A leading bank wants to develop a customer segmentation to

give promotional offers to its customers. They collected a sample
that summarizes the activities of users during the past few
months. You are given the task to identify the segments based
on credit card usage.

3
DATA ANALYSIS

DATA SHAPE
• The Data frame has 210 rows and 7 Columns

DATA DESCRIPTION
• The Dataset contains 7 Columns and 210 Rows
• No Categorical Variables
• We see for most of the variable, mean/medium are nearly equal
• Include a 90% to see variations and it looks distributed evenly
• Std Deviation is high for spending variable

DATA SAMPLE

• The Data is clean with no Null or NA values

• No Duplicated found in the data
EXPLORATORY DATA ANALYSIS

• We have performed Univariate, Bi-Variate and Multi-variate Analysis on all the Columns
of the data set
• We have found the Outliers if any in the data, the correlation between the columns
• You can see the skewness on the data basis the charts we have

spending advance_payments probability_of_full_payment current_balance

Range of values 10.59 4.84 0.1102 1.776
Minimum 10.59 12.41 0.8081 4.899
Maximum 21.18 17.25 0.9183 6.675
Mean value 14.8475 14.5593 0.871 5.6285
Median value 14.355 14.32 0.8735 5.5235
Standard deviation 2.9097 1.306 0.0236 0.4431
Null values False False False False
spending - 1st Quartile (Q1) 12.27 13.45 0.8569 5.2623
spending - 3st Quartile (Q3) 17.305 15.715 0.8878 5.9798
Interquartile range (IQR) 5.035 2.265 0.0309 0.7175

Range of values credit_limit min_payment_amt max_spent_in_single_shopping

Range of values 1.403 7.6909 2.031
Minimum 2.63 0.7651 4.519
Maximum 4.033 8.456 6.55
Mean value 3.2586 3.7002 5.4081
Median value 3.237 3.599 5.223
Standard deviation 0.3777 1.5036 0.4915
Null values False False False
spending - 1st Quartile (Q1) 2.944 2.5615 5.045
spending - 3st Quartile (Q3) 3.5618 4.7688 5.877
Interquartile range (IQR) 0.6177 2.2073 0.832

• The Table above gives us details on each columns Max and Min Value
• Shows Mean, Median and Inter Quartile range of the Column

OUTLIER ANALYSIS

Fig (a)

• Outliers Analysis on all Columns show that we have outliers only on

Probability_of_full_payment and min_payment_amt
SKEWNESS
• We see the distribution is skewed towards the right tail for all the variable except
probability_of_full_payment variable. (Fig b)

Fig (b)

Fig (c)
CORRELATION
• Strong positive correlation observed between the below Features: (Fig c , Pg 6)
• - spending & advance_payments,
• - advance_payments & current_balance,
• - credit_limit & spending
• - spending & current_balance
• - credit_limit & advance_payments
• - max_spent_in_single_shopping current_balance
PROJECT ANALYSIS

1.2 Do you think scaling is necessary for clustering in this case?

Justify

Yes Scaling is Needed

Inference
• Scaling needs to be performed only on Numerical Values. Z-Score method is used
for Scaling on this dataset
• The dataset has columns with variety of Data fetaures like
Credit_Limit,Min_payment, payment_amount, Spending. which would vary between
people. There can be higher Credit limit for one and lower for the other basis the
salary and Credit Score of a person. Hence we would need to scale the data.
• With Z-Score we would come to know how many std deviation is the point away from
mean. It also states the Direction.

1.3 Apply hierarchical clustering to scaled data. Identify the

number of optimum clusters using Dendrogram and briefly
describe them

We will use 2 Methods to do this

1. Ward
2. Agglomerative Clustering

Dendrogram on entire dataset Dendrogram on Last 25 Merge

WARD

• We have used Ward method here, Ward´s linkage is a method for hierarchical cluster
analysis.
• The idea has much in common with analysis of variance (ANOVA). The linkage
function specifying the distance between two clusters is computed as the increase in
the "error sum of squares" (ESS) after fusing two clusters into a single cluster.
• In the above dendrogram, we locate the largest vertical difference between nodes, and
in the middle pass an horizontal line. The number of vertical lines intersecting it is the
optimal number of clusters
• I assume the number of clusters basis this Dendrogram as 3
PROJECT ANALYSIS

AGGLOMERATIVE CLUSTERING

• Agglomerative Clustering is a type of hierarchical clustering algorithm. It is an

unsupervised machine learning technique that divides the population into several
clusters such that data points in the same cluster are more similar and data points in
different clusters are dissimilar
• I assume the number of clusters basis this Dendrogram as 3

WARD

AGGLOMERATIVE CLUSTERING

• Inference
• Both the method are almost similer means , minor variation, which we know it occurs.
• We for cluster grouping based on the dendrogram, 3 or 4 looks good. Did the further
analysis, and based on the dataset had gone for 3 group cluster solution based on the
hierarchical clustering
• Also in real time, there colud have been more variables value captured - tenure,
BALANCE_FREQUENCY, balance, purchase, installment of purchase, others.
• And three group cluster solution gives a pattern based on high/medium/low spending
with max_spent_in_single_shopping (high value item) and
probability_of_full_payment(payment made)
PROJECT ANALYSIS

1.4 Apply K-Means clustering on scaled data and determine

optimum clusters. Apply elbow curve and silhouette score.
Explain the results properly. Interpret and write inferences on the
finalized clusters.

• k-means clustering is a method of vector quantization, originally from signal processing, that
aims to partition n observations into k clusters in which each observation belongs to the
cluster with the nearest mean, serving as a prototype of the cluster.
• Inertia measures how well a dataset was clustered by K-Means. It is calculated by measuring
the distance between each data point and its centroid, squaring this distance, and summing
these squares across one cluster.

• Basis the Silhoutte Score, Cluster 3 or 4 could be the Optimum Clusters

• This Dataset can be explained in terms of Spending Pattern of an Individual and Risk they
bring with the Spend basically as Low Spend - High Risk, Medium Spend - Low Risk and High
Spend - Low Risk
PROJECT ANALYSIS

1.5 Describe cluster profiles for the clusters defined. Recommend

different promotional strategies for different clusters.

• Cluster 0 : High Spending

• Cluster 1 : Low Spending
• Cluster 2 - Medium Spending

Cluster 0: High Spending

Observations:
• High Spending Customers, have lower Credit Limit thus their Spend gets limited.
• Most of them prefer to have full Payment done, to maintain their Credit Score
Recommendations:
• Increase their Credit Limit, giver promotional Offers and introduce a Reward scheme to
encourage Spend as the max spent in Single shopping is higher in this category
• Give Loan on Credit Card as they have a good repayment history

Cluster 1: Low Spending

Observations:
• Low Spending Customers, have higher Credit Limit. They have the highest Spent amount in
Single Shopping
• Most of them prefer to have full Payment done, to maintain their Credit Score
Recommendations:
• Offers to be provided to increase their Spend
• Tie up with multiple vendors for groceries, Utlities that would help them spend more
• Advertisement emails to be sent by studying their Spend Pattern

Cluster 2: Medium Spending

Observations:
• Medium Spending Customers, have higher Credit Limit.
• Most of them prefer to have full Payment done, to maintain their Credit Score
Recommendations:
• These are our Potentional High Spenders
• Promote premium Card or Loyalty Program to increase the Transaction.
• Tie up with sirport Lounge, Travel partners
• Advertisement emails of Premium Sites, Brands can help boost their Spend
PART II
CART-RF-ANN

Problem Statement …… 13

2.1 Read the data, do the necessary initial

steps, and exploratory data analysis
(Univariate, Bi-variate, and multivariate
analysis)…… 14 to 17

2.2 Data Split: Split the data into test and

train, build classification model CART,
Random Forest, Artificial Neural
Network…… 18

2.3 Performance Metrics: Comment and

Check the performance of Predictions on
Train and Test sets using Accuracy,
Confusion Matrix, Plot ROC curve and get
ROC_AUC score, classification reports for
each model…… 19 to 24

2.4 Final Model: Compare all the models

20XX PRESENTATION TITLE 12

and write an inference which model is

best/optimized….. 25

2.5 Inference: Based on the whole

Analysis, what are the business insights
and recommendations ….. 26
PROBLEM STATEMENT

An Insurance firm providing tour insurance is facing higher claim

frequency. The management decides to collect data from the
past few years. You are assigned the task to make a model
which predicts the claim status and provide recommendations to
management. Use CART, RF & ANN and compare the models'
performances in train and test sets.

13
DATA ANALYSIS

DATA SHAPE
• The Data frame has 210 rows and 9 Columns

DATA DESCRIPTION
• 10 features within the dataset
• 3000 rows, no null Values
• Apart from Age, Commision, Durationand Sales rest are Object data type
• Target variable is Claimed
• Agency_code has 4 unique Values
• There are 2 Types of Insurance
• 2 Channels
• 5 Products and 3 Destinations

DATA SAMPLE
14

• The Data is clean with no Null or NA values

• The dataset shows 139 Duplicate values, we cannot choose to remove it. This data does
not provide us with the Customer ID, there can be a possibility that the data belongs to
different Customers. Hence i choose not to drop these rows
EXPLORATORY DATA ANALYSIS

Age Commision Duration Sales

Range of values 76 210.21 4581 539
Minimum 8 0 -1 0
Maximum 84 210.21 4580 539
Mean value 38.091 14.5292 70.0013 60.2499
Median value 36 4.63 26.5 33
Standard deviation 10.4635 25.4815 134.0533 70.734
Null values False False False False
spending - 1st Quartile (Q1) is 32 0 11 20
spending - 3st Quartile (Q3) is 42 17.235 63 69
Interquartile range (IQR) 10 17.235 52 49

• The Table above gives us details on each columns Max and Min Value
• Shows Mean, Median and Inter Quartile range of the Column

OUTLIER ANALYSIS

• There are Outliers in every Column of the data

PAIRPLOT AND CORRELATION
CATEGORICAL FEATURES
• There is a need to convert all Categorical variable to Integer

DATA SAMPLE
PROJECT ANALYSIS

The Data Need to be Scaled

Inference
• Scaling needs to be performed only on Numerical Values. Z-Score method is used
for Scaling on this dataset
• With Z-Score we would come to know how many std deviation is the point away from
mean. It also states the Direction.

Scaled Dataset

Scaled Dataset is used for Test and Train Split. 30% Test and 70% Train

X_train (2100, 9) | X_test (900, 9) | train_labels (2100,) | test_labels (900,)

PROJECT ANALYSIS

2.3 Performance Metrics: Comment and Check the performance of Predictions

on Train and Test sets using Accuracy, Confusion Matrix, Plot ROC curve and get
ROC_AUC score, classification reports for each model.

We have tried 3 Models, Decision Tree, Random Forest and Artificial Neural Network to
predict the Accuracy

DECISION TREE
A decision tree is a decision support tool that uses a tree-like model of decisions and their
possible consequences, including chance event outcomes, resource costs, and utility. It is
one way to display an algorithm that only contains conditional control statements.

Decisin Tree
Decision Tree Image

The above 2 Attachment has the Decision tree before estimating the best Cluster and Leaf Size.
Considering the Original dataset, below is the feature Importance details we get to see.

We tried a few Combinations to arrive at the best values in Grid Search to estimate the Accuracy
GridSearchCV(cv=3, estimator=DecisionTreeClassifier(),
param_grid={'max_depth': [3, 4, 4.5, 5, 5.5],
'min_samples_leaf': [30, 35, 36, 38],
'min_samples_split': [90, 105, 112, 116]})

Best Parameters we found basis the above combination is :

{'max_depth': 5, 'min_samples_leaf': 35, 'min_samples_split': 116}

Feature Importance graph basis the

best parameters of Decision Tree
looks like this.
Agency_code gained highest
importance followed by Sales.
PROJECT ANALYSIS

Scores basis train Model Scores basis test Model

Train Model

Test Model
PROJECT ANALYSIS

RANDOM FOREST
Random forest is a Supervised Machine Learning Algorithm that is used widely in
Classification and Regression problems. It builds decision trees on different samples and
takes their majority vote for classification and average in case of regression

We tried a few Combinations to arrive at the best values in Grid Search to estimate the Accuracy

GridSearchCV(cv=3, estimator=RandomForestClassifier(),
param_grid={'max_depth': [8, 10, 12], 'max_features': [2, 3, 4, 5],
'min_samples_leaf': [8, 9, 10, 11],
'min_samples_split': [24, 36, 40, 44],
'n_estimators': [101, 301]})

Best Parameters we found basis the above combination is :

{'max_depth': 8, 'max_features': 5, 'min_samples_leaf': 8, 'min_samples_split': 36,
'n_estimators': 301}

Feature Importance graph basis the best parameters of Random Forest looks like this.
Agency_code gained highest importance followed by Sales.

Scores basis train Model Scores basis test Model

PROJECT ANALYSIS

Train Model

Test Model
PROJECT ANALYSIS

ARTIFICIAL NEURAL NETWORK

Artificial neural networks, usually simply called neural networks, are computing systems
inspired by the biological neural networks that constitute animal brains. An ANN is based
on a collection of connected units or nodes called artificial neurons, which loosely model
the neurons in a biological brain

We tried a few Combinations to arrive at the best values in Grid Search to estimate the Accuracy

GridSearchCV(cv=3, estimator=MLPClassifier(),
param_grid={'activation': ['logistic', 'relu'],
'hidden_layer_sizes': [(100, 100, 100)],
'max_iter': [10000], 'solver': ['sgd', 'adam'],
'tol': [0.1, 0.01]})

Best Parameters we found basis the above combination is :

{'activation': 'relu’, 'hidden_layer_sizes': (100, 100, 100), 'max_iter': 10000,
'solver': 'adam’, 'tol': 0.01}

Feature Importance graph won’t be seen in ANN as it is termed as black box algorithm

Scores basis train Model Scores basis test Model

PROJECT ANALYSIS

Train Model

Test Model
PROJECT ANALYSIS

MODEL COMPARISON
2.4 Final Model: Compare all the models and write an
inference which model is best/optimized.
We have compared the outputs of the 3 models to study, Precision, F1 Score, Accuracy and
Recall. This Should help us decide the optimum model
PROJECT ANALYSIS

MODEL COMPARISON
2.5 Inference: Based on the whole Analysis, what are the
business insights and recommendations

• The Analysis calls for gathering more real-time Data and Historical data as
the accuracy and Recall across the model does not fluctuate big time.
• Given set of information are the primary columns but other passive
parameters can fluctuate the data to a large extent like Weather,
Diseases, Types of Vehicles.
• Streamlining online experiences benefitted customers, leading to an
increase in conversions, which subsequently raised profits.
• As per the data 90% of insurance is done by online channel.
• We need to find why, Other interesting fact, is almost all the offline
business has a claimed associated
• Need to train the JZI agency resources to pick up sales as they are in
bottom, need to run promotional marketing campaign or evaluate if we
need to tie up with alternate agency
• Also based on the model we are getting 80%accuracy, so we need
customer books airline tickets or plans, cross sell the insurance based on
the claim data pattern.
• Other interesting fact is more sales happen via Agency than Airlines and
the trend shows the claim are processed more at Airline. So we may need
to deep dive into the process to understand the workflow and why?
• Key performance indicators (KPI) The KPI’s of insurance claims are:
Reduce claims cycle time, Increase customer satisfaction, Combat fraud,
Optimize claims recovery, Reduce claim handling costs Insights gained
from data and AI-powered analytics could expand the boundaries of
insurability, extend existing products, and give rise to new risk transfer
solutions in areas like a non-damage business interruption and
reputational damage.
THANK YOU

20XX PRESENTATION TITLE 27

Data Mining Business Report Hansraj Yadav
83% (12)
Data Mining Business Report Hansraj Yadav
34 pages
2023 Open Source Tech Stack Guide For Msps
No ratings yet
2023 Open Source Tech Stack Guide For Msps
42 pages
Data Mining Business Report-Clustering & CART
100% (4)
Data Mining Business Report-Clustering & CART
57 pages
DM Gopala Satish Kumar Business Report G8 DSBA
100% (2)
DM Gopala Satish Kumar Business Report G8 DSBA
26 pages
Data Mini Proj
100% (2)
Data Mini Proj
44 pages
Data Mininig Project
67% (3)
Data Mininig Project
28 pages
Project Questions
No ratings yet
Project Questions
4 pages
Project On Data Mining: Prepared by Ashish Pavan Kumar K PGP-DSBA at Great Learning
No ratings yet
Project On Data Mining: Prepared by Ashish Pavan Kumar K PGP-DSBA at Great Learning
50 pages
Jupyter Notebook Project DM Nikita Chaturvedi 25.07.2021
100% (5)
Jupyter Notebook Project DM Nikita Chaturvedi 25.07.2021
83 pages
Credit Card Segmentation
No ratings yet
Credit Card Segmentation
3 pages
Clustering Analysis: Reading The Data
100% (1)
Clustering Analysis: Reading The Data
15 pages
DataMining Aug2021
100% (2)
DataMining Aug2021
49 pages
Data Mining Project Anshul
100% (1)
Data Mining Project Anshul
48 pages
Data Mining Project
100% (2)
Data Mining Project
20 pages
Data Mining Graded Assignment: Problem 1: Clustering Analysis
100% (3)
Data Mining Graded Assignment: Problem 1: Clustering Analysis
39 pages
Clustering Analysis: Prepared by Muralidharan N
100% (1)
Clustering Analysis: Prepared by Muralidharan N
16 pages
Data Mining - Assignment: Girish Nayak
100% (1)
Data Mining - Assignment: Girish Nayak
21 pages
Project Report - Data Mining
0% (1)
Project Report - Data Mining
52 pages
Data Mining Assignment: Sudhanva Saralaya
100% (1)
Data Mining Assignment: Sudhanva Saralaya
16 pages
Data Mining
No ratings yet
Data Mining
28 pages
Data Mining Project Shivani Pandey
100% (1)
Data Mining Project Shivani Pandey
40 pages
Data Mining Business Report
No ratings yet
Data Mining Business Report
38 pages
Answer Report: Data Mining
No ratings yet
Answer Report: Data Mining
32 pages
M4 Data Mining W4 Business Report
No ratings yet
M4 Data Mining W4 Business Report
22 pages
Data Mining Project Ragunathan
No ratings yet
Data Mining Project Ragunathan
21 pages
Machine Learning - Project
80% (10)
Machine Learning - Project
14 pages
Data Mining Project Report - Reshma
No ratings yet
Data Mining Project Report - Reshma
23 pages
Bank Customer Segmentation
No ratings yet
Bank Customer Segmentation
14 pages
Python Machine Learning
No ratings yet
Python Machine Learning
19 pages
AllLife Bank Customer Segmentation Unsupervised Learning-Coded-Project-Business-Report
No ratings yet
AllLife Bank Customer Segmentation Unsupervised Learning-Coded-Project-Business-Report
10 pages
Atelier N5 PDF
No ratings yet
Atelier N5 PDF
5 pages
Data Mining Project Report
100% (1)
Data Mining Project Report
98 pages
Business Report On Data Mining: By: Aditya Janardan Hajare Batch: PGPDSBA Mar'C21 Group 1
100% (1)
Business Report On Data Mining: By: Aditya Janardan Hajare Batch: PGPDSBA Mar'C21 Group 1
12 pages
Assignment ....
No ratings yet
Assignment ....
8 pages
(2018) Data Analysis of Consumer Complaints in Banking
No ratings yet
(2018) Data Analysis of Consumer Complaints in Banking
5 pages
unsupervised learning
No ratings yet
unsupervised learning
19 pages
3. Chapter 5 CLUSTERING
No ratings yet
3. Chapter 5 CLUSTERING
36 pages
Joseph Xavier J - FML
No ratings yet
Joseph Xavier J - FML
15 pages
Project 4 Data Mining Final v2
100% (1)
Project 4 Data Mining Final v2
19 pages
final-khdl
No ratings yet
final-khdl
32 pages
Business Report
No ratings yet
Business Report
18 pages
Description: Bank - Marketing - Part1 - Data - CSV
No ratings yet
Description: Bank - Marketing - Part1 - Data - CSV
4 pages
Artificial Intelligence Report
No ratings yet
Artificial Intelligence Report
23 pages
Data Mining Project
No ratings yet
Data Mining Project
34 pages
Customer Segmentation Report
No ratings yet
Customer Segmentation Report
31 pages
Report-Yifan_Lu.1
No ratings yet
Report-Yifan_Lu.1
13 pages
Project - Data Mining: Bank - Marketing - Part1 - Data - CSV
No ratings yet
Project - Data Mining: Bank - Marketing - Part1 - Data - CSV
4 pages
Chapter 14 - Cluster Analysis: Data Mining For Business Intelligence
No ratings yet
Chapter 14 - Cluster Analysis: Data Mining For Business Intelligence
31 pages
LP I Assignment A4 Clustering
No ratings yet
LP I Assignment A4 Clustering
13 pages
Customer Clustering Using RFM Analysis
No ratings yet
Customer Clustering Using RFM Analysis
5 pages
doc_A5
No ratings yet
doc_A5
3 pages
Credit Card Usage Analysis Using KMeans Clustering Report
No ratings yet
Credit Card Usage Analysis Using KMeans Clustering Report
16 pages
Data Segmentation
No ratings yet
Data Segmentation
27 pages
_DWDM_PPT
No ratings yet
_DWDM_PPT
13 pages
Customer Categorization by Data Analysis Using Clustering Algorithms of Machine Learning
No ratings yet
Customer Categorization by Data Analysis Using Clustering Algorithms of Machine Learning
4 pages
IEEE Conference Template 5
No ratings yet
IEEE Conference Template 5
5 pages
Ass6(DMDS)
No ratings yet
Ass6(DMDS)
7 pages
Credit Card Customer Segmentation by Clustering: Bennett NG Teng Seng
No ratings yet
Credit Card Customer Segmentation by Clustering: Bennett NG Teng Seng
6 pages
De-Mystifying Math and Stats for Machine Learning: Mastering the Fundamentals of Mathematics and Statistics for Machine Learning
From Everand
De-Mystifying Math and Stats for Machine Learning: Mastering the Fundamentals of Mathematics and Statistics for Machine Learning
Seaport AI Madhavan
No ratings yet
High-Dimensional Covariance Estimation: With High-Dimensional Data
From Everand
High-Dimensional Covariance Estimation: With High-Dimensional Data
Mohsen Pourahmadi
No ratings yet
Decision Tree Pruning: Fundamentals and Applications
From Everand
Decision Tree Pruning: Fundamentals and Applications
Fouad Sabry
No ratings yet
Anh 10 Be0b5
No ratings yet
Anh 10 Be0b5
6 pages
How To Investigate and Optimize LTE Throughput in 5 Steps
100% (1)
How To Investigate and Optimize LTE Throughput in 5 Steps
12 pages
Dslrbooth Photo Booth Pro 6
No ratings yet
Dslrbooth Photo Booth Pro 6
2 pages
Get Embedded System Design: Embedded Systems Foundations of Cyber-Physical Systems, and the Internet of Things. 4th Edition Peter Marwedel PDF ebook with Full Chapters Now
100% (7)
Get Embedded System Design: Embedded Systems Foundations of Cyber-Physical Systems, and the Internet of Things. 4th Edition Peter Marwedel PDF ebook with Full Chapters Now
55 pages
Q 3:-Development of Static Pages Using HTML of An Online Department Store - The Website Should Be User Friendly and Should Have The Following Pages
No ratings yet
Q 3:-Development of Static Pages Using HTML of An Online Department Store - The Website Should Be User Friendly and Should Have The Following Pages
12 pages
Sharing Files, Data, and Information. in A Network
No ratings yet
Sharing Files, Data, and Information. in A Network
17 pages
Digital Humanities (DH) Is An Area of
No ratings yet
Digital Humanities (DH) Is An Area of
19 pages
DX Diag
No ratings yet
DX Diag
34 pages
Counters
No ratings yet
Counters
56 pages
Presentation Format
No ratings yet
Presentation Format
13 pages
Introduction To XHTML
No ratings yet
Introduction To XHTML
26 pages
Hands On Contiki OS and Cooja Simulator: Exercises (Part II)
No ratings yet
Hands On Contiki OS and Cooja Simulator: Exercises (Part II)
15 pages
The Control Unit Unit 4
No ratings yet
The Control Unit Unit 4
18 pages
Petrel 2010 Property Modeling - 5047465 - 01
No ratings yet
Petrel 2010 Property Modeling - 5047465 - 01
37 pages
InfinityMaxPro UserManual en
No ratings yet
InfinityMaxPro UserManual en
74 pages
SE Notes
No ratings yet
SE Notes
9 pages
Social Scripts and Expectancy Violations Evaluating Communication With Human or AI Chatbot Interactants
No ratings yet
Social Scripts and Expectancy Violations Evaluating Communication With Human or AI Chatbot Interactants
17 pages
Splash PRO: User Manual
No ratings yet
Splash PRO: User Manual
28 pages
First Periodical Test in g9 Math
No ratings yet
First Periodical Test in g9 Math
5 pages
Hasee Pricelist On Januray
No ratings yet
Hasee Pricelist On Januray
3 pages
The Motherboard
No ratings yet
The Motherboard
9 pages
On Dell by Kartik and Shivani
No ratings yet
On Dell by Kartik and Shivani
18 pages
Report Card Making: A Project Report On
No ratings yet
Report Card Making: A Project Report On
34 pages
Web Result With Site Links: Google
No ratings yet
Web Result With Site Links: Google
6 pages
OFM Forecast Limits and Forecast Schedule Implementation
No ratings yet
OFM Forecast Limits and Forecast Schedule Implementation
2 pages
MJ 9120 User Manual 102.5x145mm
No ratings yet
MJ 9120 User Manual 102.5x145mm
56 pages
Eye Blink Sensor Project Glasses and Python for Eye Tracking.pdf 20241021 224110 0000
No ratings yet
Eye Blink Sensor Project Glasses and Python for Eye Tracking.pdf 20241021 224110 0000
10 pages
10.2.6-Packet-Tracer - Use-Lldp-To-Map-A-Network
No ratings yet
10.2.6-Packet-Tracer - Use-Lldp-To-Map-A-Network
6 pages
QTA1 - TranscribeMe Sessions Handbook - Version 1.1 20191202
No ratings yet
QTA1 - TranscribeMe Sessions Handbook - Version 1.1 20191202
39 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Data Mining

Uploaded by

Data Mining

Uploaded by

PROJECT DATA MINING

1.1 Read the data, do the necessary initial

1.2 Do you think scaling is necessary for

1.3 Apply hierarchical clustering to scaled

1.4 Apply K-Means clustering on scaled

1.5 Describe cluster profiles for the clusters

A leading bank wants to develop a customer segmentation to

• The Data is clean with no Null or NA values

spending advance_payments probability_of_full_payment current_balance

Range of values credit_limit min_payment_amt max_spent_in_single_shopping

• Outliers Analysis on all Columns show that we have outliers only on

1.2 Do you think scaling is necessary for clustering in this case?

Yes Scaling is Needed

1.3 Apply hierarchical clustering to scaled data. Identify the

We will use 2 Methods to do this

Dendrogram on entire dataset Dendrogram on Last 25 Merge

• Agglomerative Clustering is a type of hierarchical clustering algorithm. It is an

1.4 Apply K-Means clustering on scaled data and determine

• Basis the Silhoutte Score, Cluster 3 or 4 could be the Optimum Clusters

1.5 Describe cluster profiles for the clusters defined. Recommend

• Cluster 0 : High Spending

Cluster 0: High Spending

Cluster 1: Low Spending

Cluster 2: Medium Spending

2.1 Read the data, do the necessary initial

2.2 Data Split: Split the data into test and

2.3 Performance Metrics: Comment and

2.4 Final Model: Compare all the models

and write an inference which model is

2.5 Inference: Based on the whole

An Insurance firm providing tour insurance is facing higher claim

• The Data is clean with no Null or NA values

Age Commision Duration Sales

• There are Outliers in every Column of the data

The Data Need to be Scaled

X_train (2100, 9) | X_test (900, 9) | train_labels (2100,) | test_labels (900,)

2.3 Performance Metrics: Comment and Check the performance of Predictions

Best Parameters we found basis the above combination is :

Feature Importance graph basis the

Scores basis train Model Scores basis test Model

Best Parameters we found basis the above combination is :

Scores basis train Model Scores basis test Model

ARTIFICIAL NEURAL NETWORK

Best Parameters we found basis the above combination is :

Scores basis train Model Scores basis test Model

20XX PRESENTATION TITLE 27

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.