Final - Bank Customer Response Prediction Model
A. Introduction
This project predicts bank customer responses as a classification task. After data collection, the dataset is preprocessed: missing data are handled, categorical features are encoded, and numeric features are rescaled with a Min-Max scaler.
Missing Value Handling: To handle missing data, we use methods such as mode() and mean() to fill the gaps with the most frequent or average values of the respective features. This approach retains the completeness of the dataset and prevents biased analysis due to missing values. In this step, we first check whether the dataset contains any missing data; if we find missing values, we fill them in.
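A minimal sketch of this step, assuming a pandas DataFrame loaded from a file named bank_data.csv, with illustrative column names such as balance and job (the actual file and column names may differ):

import pandas as pd

df = pd.read_csv("bank_data.csv")  # assumed file name

# Count missing values per column
print(df.isnull().sum())

# Fill numeric gaps with the mean, categorical gaps with the mode
df["balance"] = df["balance"].fillna(df["balance"].mean())  # 'balance' is a hypothetical column
df["job"] = df["job"].fillna(df["job"].mode()[0])           # 'job' is a hypothetical column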
Encoding Data: Next, we encode the dataset, separating the dependent and independent variables. Here we apply the label encoding technique to convert categorical values into integers.
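A label-encoding pass over all categorical columns might look like the following sketch, using scikit-learn's LabelEncoder on the DataFrame from the previous step:

from sklearn.preprocessing import LabelEncoder

# Convert every object-typed (categorical) column to integer labels
le = LabelEncoder()
for col in df.select_dtypes(include="object").columns:
    df[col] = le.fit_transform(df[col])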
Min Max Scaler: In our dataset, the ranges of the features vary widely, which hurts model performance. The Min-Max scaler rescales these large-range values into a common range, which is why we used it.
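A sketch of the scaling step; the target column name 'deposit' is an assumption, and X/y simply denote the independent features and the label:

from sklearn.preprocessing import MinMaxScaler

X = df.drop(columns=["deposit"])  # 'deposit' is an assumed target column
y = df["deposit"]

# Rescale every feature into the [0, 1] range
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)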
The dataset changes in several ways after preprocessing. Before preprocessing, we had 17 columns and a large number of missing values. After preprocessing, we have only 13 columns and no missing values, because we dropped the irrelevant columns and handled all the missing entries. The dataset before preprocessing is shown here:
Fig.C.1.1: Data Set
Data Scaling:
Correlation: Correlation is a statistical measure that describes the extent to which two
variables change together. In other words, it quantifies the degree to which a change in
one variable is associated with a change in another variable. Correlation does not imply
causation; it only measures the strength and direction of a linear relationship between two
variables.
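One way to inspect correlations in the preprocessed data is the pandas correlation matrix; the target column name below is the same assumption as above:

# Pearson correlation matrix of the numeric features
corr = df.corr(numeric_only=True)
print(corr["deposit"].sort_values(ascending=False))  # 'deposit' is an assumed target column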
Precision:
Precision is a metric that measures the proportion of positive predictions that are correct. It is calculated as the ratio of true positives to the sum of true positives and false positives. Precision is especially valuable when the consequences of false positives are significant, since maximizing it minimizes incorrect positive predictions.
Precision = True Positives / (True Positives + False Positives)
Recall(Sensitivity):
Recall, also known as sensitivity or true positive rate, quantifies the ratio of correctly predicted positive instances to the total number of actual positive instances. It is calculated as the proportion of true positives relative to the combined total of true positives and false negatives. Recall matters most in situations where the consequences of false negatives are significant, since it is desirable to minimize missed positive examples.
Recall = True Positives / (True Positives + False Negatives)
F1 Score:
The F1-score is calculated as the harmonic mean of precision and recall. This approach offers a harmonious
equilibrium between the aforementioned indicators, proving particularly advantageous when seeking to
account for both false positives and false negatives throughout the evaluation process. The F1-score is
computed using the following formula:
F1-Score = 2 × (Precision × Recall) / (Precision + Recall)
Support:
Support is the number of actual occurrences of each class in the dataset; it gives context for the other per-class metrics.
Here Fig. C.1.3 shows the model's classification report with precision, recall, F1-score, and support.
Here Fig. C.1.4 shows the results after computing sensitivity and specificity.
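Scikit-learn can produce all four per-class metrics at once; a sketch, assuming y_test holds the true labels and y_pred the predictions of a fitted classifier:

from sklearn.metrics import classification_report

# Prints precision, recall, F1-score and support for each class
print(classification_report(y_test, y_pred))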
E. Model Selection
SVM:
Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and
regression. The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional
space that can separate the data points in different classes in the feature space. SVM works by mapping data
to a high-dimensional feature space so that data points can be categorized, even when the data are not
otherwise linearly separable. A separator between the categories is found, then the data are transformed in
such a way that the separator can be drawn as a hyperplane. In this project we use SVM because it is effective in high-dimensional cases. In addition, different kernel functions can be specified for the decision function, and it is possible to specify custom kernels.
Equation:
f(x) = β0 + β1·x1 + … + βn·xn
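A minimal training sketch with scikit-learn's SVC; the RBF kernel, split ratio, and random seed are assumptions, and X_scaled and y come from the preprocessing steps above:

from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42)  # assumed split settings

svm_model = SVC(kernel="rbf")  # kernel choice is an assumption
svm_model.fit(X_train, y_train)
print("SVM test accuracy:", svm_model.score(X_test, y_test))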
Random Forest Classifier:
Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset
and takes the average to improve the predictive accuracy of that dataset. Random Forest grows multiple
decision trees which are merged together for a more accurate prediction. The logic behind the Random
Forest model is that multiple uncorrelated models (the individual decision trees) perform much better as a
group than they do alone.
Equation:
ŷ = (1/N) · Σ_{i=1}^{N} y_i
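A comparable sketch for the Random Forest model; n_estimators=100 is an assumed setting:

from sklearn.ensemble import RandomForestClassifier

rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
print("Random Forest test accuracy:", rf_model.score(X_test, y_test))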
K-Nearest Neighbors:
The k-nearest neighbors algorithm, also known as KNN or k-NN, is a nonparametric, supervised learning
classifier, which uses proximity to make classifications or predictions about the grouping of an individual
data point. While it can be used for either regression or classification problems, it is typically used as a
classification algorithm, working off the assumption that similar points can be found near one another. KNN
classifier operates by finding the k nearest neighbors to a given data point, and it takes the majority vote to
classify the data point. The value of k is crucial, and one needs to choose it wisely to prevent overfitting or
underfitting the model.
Equation:
ŷ = mode(y1, y2, …, yk)
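A sketch of the classifier; weights="distance" mirrors the weighted k-NN described in the results discussion, and n_neighbors=5 is an assumed value:

from sklearn.neighbors import KNeighborsClassifier

# With weights="distance", closer neighbours get larger votes
knn_model = KNeighborsClassifier(n_neighbors=5, weights="distance")
knn_model.fit(X_train, y_train)
y_pred = knn_model.predict(X_test)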
F. Classification Metrics Accuracy Verification
Here we check the Decision Tree classifier and KNN classifier models with the Recall and F1 performance measurement metrics.
Recall: Recall, also known as sensitivity or true positive rate, is a crucial metric in classification tasks. If
you focus only on accuracy, a model might predict the majority class all the time and still have a high
accuracy, but it would miss important predictions of the minority class. Recall specifically focuses on the
ability of the model to find all relevant cases within a dataset, especially the minority class.
F1: Our F1 score is a strong result for the classifier model. The F1 score ranges from 0 to 1: a value near 0 indicates that the model (or its preprocessing) is performing poorly, while a value near 1 indicates a near-perfect balance between precision and recall.
Here we check the KNN regression model with the MAE, MSE, RMSE, and R² performance measurement metrics.
MAE (Mean Absolute Error): Measures the average absolute difference between the predicted and actual values; lower values indicate better performance.
Equation:
MAE = (1/n) · Σ |y_i − ŷ_i|
This metric is computed on the test-set targets and the KNN regressor's predictions, giving an MAE of 0.36.
MSE (Mean Square Error): Measures the average of the squared differences between predicted and actual values; lower values indicate better performance, with 0 being a perfect score.
Equation:
MSE = (1/n) · Σ (y_i − ŷ_i)²
Computed on the test-set targets and the KNN regressor's predictions, the MSE is 0.095 (see the chart below). Since lower MSE means less error, this is a reasonable score.
[Chart: MSE Score — KNN: 0.095]
RMSE (Root Mean Square Error): The square root of MSE; it provides an interpretable error measure in the same unit as the target variable.
Equation:
RMSE = √( (1/n) · Σ (y_i − ŷ_i)² )
Computed on the test-set targets and the KNN regressor's predictions, the RMSE is 0.30 (see the chart below). Lower RMSE means less error between predicted and actual values. Note that there is no fixed threshold for a "good" RMSE; it depends on the specific problem and the context of the analysis.
[Chart: RMSE Score — KNN: 0.30]
R² (R-squared, the coefficient of determination): Measures the proportion of the variance in the dependent variable that is predictable from the independent variables. The best possible score is 1.0, and it can be negative.
Equation:
R² = 1 − Σ(y_i − ŷ_i)² / Σ(y_i − ȳ)²
Computed on the test-set targets and the KNN regressor's predictions, the R² is 0.84, which is a satisfactory score given that the best possible value is 1.0.
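All four regression metrics can be computed with scikit-learn; a sketch, assuming y_test and y_pred_reg come from a fitted KNN regressor:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

mae = mean_absolute_error(y_test, y_pred_reg)
mse = mean_squared_error(y_test, y_pred_reg)
rmse = np.sqrt(mse)  # RMSE is the square root of MSE
r2 = r2_score(y_test, y_pred_reg)
print(mae, mse, rmse, r2)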
Jaccard Score:
The Jaccard score is a metric used to evaluate the similarity and diversity of sample sets; it is defined as the ratio of the intersection to the union. The Jaccard coefficient is a statistic for comparing two finite sample sets, calculated by dividing the size of the sets' intersection by the size of their union. The Jaccard score chart and percentages for each algorithm in this model are shown in the figure:
[Chart: Jaccard Score (%) — KNN: 0.82]
In this figure, KNN had the highest Jaccard score at 0.82, followed by the Random Forest Classifier (0.79), Decision Tree Classification (0.74), and Logistic Regression (0.70); Support Vector Machine had the lowest score at 0.68.
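The per-model scores can be computed with scikit-learn's jaccard_score; a sketch for the KNN predictions from above:

from sklearn.metrics import jaccard_score

# Jaccard similarity between predicted and true positive labels
print(jaccard_score(y_test, y_pred))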
Cross_Val_Score:
cross_val_score is a function in the scikit-learn package that trains and tests a model over multiple folds of your dataset. This cross-validation method gives you a better understanding of model performance over the whole dataset than a single train/test split does.
In cross-validation, KNN again performed best, with fold scores around 0.95.
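A sketch of the cross-validation run; cv=5 folds is an assumed setting:

from sklearn.model_selection import cross_val_score

scores = cross_val_score(knn_model, X_scaled, y, cv=5)
print(scores, scores.mean())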
Sensitivity:
[Chart: Sensitivity Score (%) — KNN: 98.93]
Specificity:
[Chart: Specificity Score (%)]
Support Vector Machine: 71.89
Logistic Regression: 71.89
Decision Tree Classification: 45.31
Random Forest Classifier: 54.68
KNN: 7.34
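Sensitivity and specificity both follow directly from the confusion matrix; a sketch for a binary target, reusing y_test and y_pred from above:

from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
sensitivity = tp / (tp + fn)  # true positive rate (recall)
specificity = tn / (tn + fp)  # true negative rate
print(sensitivity, specificity)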
G. Results Discussion
After applying the different algorithms to our bank deposit project, we compared the models. The results indicate that the k-NN algorithm performed best in this approach.
From the table we see that Logistic Regression achieved a training-set accuracy of 80.95% and a test-set accuracy of 79%, which means it fits well and produces a good result. The Decision Tree Classifier achieved 100% accuracy on the training set but only 83.9% on the test set; the large gap between the two shows that the training set is overfitted. If a model fits well, the test result should be close to the training-set accuracy, so this model is not the best fit for the bank deposit data. The Random Forest Classifier behaved similarly, with a training-set accuracy of 99.9% (close to 100%) and a test-set accuracy of 86.13%; like the Decision Tree Classifier it is overfitted, but its test-set result is considerably better. The Support Vector Machine (SVM) achieved a training-set accuracy of 83.47% and a test-set accuracy of 81.37%, so this model is well fitted and also produces a promising result. Among these models, the K-Neighbors Classifier (KNN) produced the most promising result: a training-set accuracy of 90.92% and a test-set accuracy of 90.55%. Compared with the other models implemented in this project, this model fits best. In this model we implemented weighted k-nearest neighbors (k-NN), which assigns different weights to the neighbors based on their distances from the query point: neighbors closer to the query point are given higher weights in the decision-making process.
For that reason, it fits best. Looking at the Root Mean Squared Error (RMSE) in the table, the lowest value is 0.30, and it belongs to the KNN model; a low RMSE means a low error. The lowest Mean Square Error (MSE), 0.095, also belongs to the KNN model. In the sensitivity analysis, the KNN model again scores best (98.93%). In the context of binary classification, sensitivity, also known as true positive rate, recall, or hit rate, measures the proportion of actual positive cases that a classification model identifies correctly. A high sensitivity value indicates that the model is effective at identifying positive instances, but it may come at the cost of an increased false positive rate. Sensitivity is particularly important in scenarios where the cost of missing positive instances (false negatives) is high, and it is often used alongside other metrics such as specificity, precision, and F1-score for a more comprehensive evaluation of a classifier. The Jaccard Index is a measure of similarity between two sets; in binary classification, it is often used to evaluate the similarity between the predicted positive instances and the true positive instances. In the table, the highest Jaccard score again belongs to the KNN model. The Jaccard Index ranges from 0 to 1, with 1 indicating perfect similarity between the sets; it is a useful metric when dealing with imbalanced datasets or when you want to focus on the intersection of positive instances. We also checked cross-validation, where KNN performs best: using random subsets of the data, the scores at all points were around 0.95. We also plotted the ROC curve and its AUC for visualization, shown below.
In summary, the Decision Tree Classifier and Random Forest algorithms exhibited overfitting, with 100% and 99.9% accuracy on the training set but noticeably lower accuracy on the test set. K-Nearest Neighbors showed the best overall performance, achieving 90.92% accuracy on the training set and 90.55% on the test set. Logistic Regression and the Support Vector Machine (SVM) also achieved promising accuracies of about 79% and 81% on this dataset. Overall, the K-Neighbors Classifier (KNN), Logistic Regression, and Support Vector Machine (SVM) are promising choices for a bank deposit model.