Credit Card Fraud Detection Using Machine Learning: Ruttala Sailusha V. Gnaneswar
Create f child nodes
For i = 1 to f do
    Set contents of Ni to Di
    Call BuildTree(Ni)
End for
End

B. Adaboost Algorithm
Boosting is one of the ensemble techniques. It builds a strong classifier from weak classifiers by adding weak models in series. Initially, a model is built from the training data. A second model is then built that corrects the errors made by the first model. This process is repeated until either the maximum number of models has been added or the complete training dataset is predicted correctly. Adaboost was one of the most successful boosting algorithms developed for binary classification.

Adaboost Algorithm
Algorithm Adaboost:
INPUT: dataset of n samples
Initialize weights, $w_1(i) = 1/n$
Create a decision tree for each feature
Select the one that has the lowest entropy
If incorrectly classified
    Calculate Total Error (TE) = sum of the weights of the incorrectly classified samples
    Calculate Performance, $\text{Performance} = \frac{1}{2}\ln\left(\frac{1 - TE}{TE}\right)$
    For each sample
        Incorrectly classified, increase the weights:
            $w_{\text{incorrect}} = w_{\text{old}} \times e^{\text{Performance}}$
        Correctly classified, decrease the weights:
            $w_{\text{correct}} = w_{\text{old}} \times e^{-\text{Performance}}$
        Normalized weight of each sample:
            $w_{\text{normalized}} = \frac{w_i}{\sum_{j=1}^{n} w_j}$
    End for
End if
Figure.6 Adaboost Algorithm

Adaboost is short for adaptive boosting. It is best used with weak learners. This boosting technique [Figure.6] combines multiple weak classifiers into a strong classifier. Adaboost can be used with short decision trees. First, the nodes are created and the tree is built; then the performance of the tree on each training instance is checked and a weight is assigned to each instance. Training instances that are hard to predict are given more weight.

Adaboost is a powerful classifier that works well on both basic and complex problems. Its disadvantage is that it is sensitive to noisy data and to outliers.

Steps for Adaboost Algorithm
1. The Kaggle credit card fraud dataset is taken for training, and some of the sample data is selected at random.
2. Using this randomly selected sample, decision trees are created sequentially to classify the fraud and non-fraud cases.
3. Each decision tree is formed by splitting on the node with the highest information gain, making it the root node, and classifying the fraud and non-fraud cases.
4. The error rate and performance are calculated, and the weights are updated so that incorrectly classified fraud and non-fraud transactions receive more weight.
5. A majority vote, weighted by each tree's performance, is performed; an output of 0 indicates a non-fraud case.
6. An output of 1 indicates a fraud case.
7. Finally, the accuracy, precision, recall, and F1-score are computed for both the fraud and non-fraud cases.
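The Adaboost pseudocode in Figure 6 can be sketched from scratch in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the weak learners are one-level decision stumps chosen by lowest weighted entropy, the Performance and weight updates follow the standard discrete Adaboost rules (Performance $= \frac{1}{2}\ln\frac{1-TE}{TE}$, weights multiplied by $e^{\pm\text{Performance}}$, then normalized), and TE is clipped away from 0 so that a perfect stump does not produce an infinite Performance. All function names are hypothetical. Labels follow the dataset's `Class` convention (1 = fraud, 0 = non-fraud).

```python
import numpy as np


def weighted_entropy(y, w):
    """Binary entropy of labels y (values -1/+1) under sample weights w."""
    total = w.sum()
    if total == 0:
        return 0.0
    p = w[y == 1].sum() / total
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))


def leaf_label(y, w, mask):
    """Weighted-majority label (+1 or -1) of the samples selected by mask."""
    return 1 if w[mask & (y == 1)].sum() >= w[mask & (y == -1)].sum() else -1


def fit_stump(X, y, w):
    """Choose the (feature, threshold) split with the lowest weighted entropy."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            right = ~left
            h = (w[left].sum() * weighted_entropy(y[left], w[left])
                 + w[right].sum() * weighted_entropy(y[right], w[right]))
            if best is None or h < best[0]:
                best = (h, j, t, leaf_label(y, w, left), leaf_label(y, w, right))
    return best[1:]  # (feature, threshold, left label, right label)


def stump_predict(stump, X):
    j, t, lab_left, lab_right = stump
    return np.where(X[:, j] <= t, lab_left, lab_right)


def adaboost_fit(X, y01, n_rounds=10, eps=1e-10):
    y = np.where(y01 == 1, 1, -1)        # 1 = fraud -> +1, 0 = non-fraud -> -1
    n = len(y)
    w = np.full(n, 1.0 / n)              # w_1(i) = 1/n
    model = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        wrong = stump_predict(stump, X) != y
        te = np.clip(w[wrong].sum(), eps, 1 - eps)  # Total Error (clipped)
        perf = 0.5 * np.log((1 - te) / te)          # Performance
        # Increase weights of misclassified samples, decrease the rest.
        w = w * np.exp(np.where(wrong, perf, -perf))
        w = w / w.sum()                             # normalize
        model.append((perf, stump))
    return model


def adaboost_predict(model, X):
    """Performance-weighted vote of the stumps; returns 1 (fraud) or 0."""
    score = sum(perf * stump_predict(stump, X) for perf, stump in model)
    return (score > 0).astype(int)


if __name__ == "__main__":
    X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
    y = np.array([0, 0, 0, 1, 1, 1])
    model = adaboost_fit(X, y, n_rounds=5)
    print(adaboost_predict(model, X))  # prints [0 0 0 1 1 1]
```

The vote is weighted by each stump's Performance, so a stump with a low Total Error has more say in the final fraud/non-fraud decision than a weak one.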
A. Dataset
The credit card fraud dataset is taken from a European credit card company and was obtained from Kaggle. It holds transactions made by credit cardholders in September 2013, over a period of two days. The dataset contains 284,807 transactions, of which 492 are fraudulent; these fraud transactions account for only 0.172% of all transactions. For confidentiality reasons, the input variables have been converted into numerical values by a PCA transformation. The features 'Time' and 'Amount' are not PCA-transformed. 'Time' represents the seconds elapsed between each transaction and the first transaction in the dataset, and 'Amount' represents the transaction amount. Another important feature, 'Class', shows whether a transaction is fraudulent: 1 indicates a fraud transaction and 0 indicates a non-fraud transaction.

Figure.7 Output for Random Forest
The evaluation results for Random Forest are shown in [Figure.7]. The precision, recall, and F1-score are the same for the non-fraud cases but differ for the fraud cases.
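The class imbalance described for the dataset can be checked by counting the 'Class' values. This is a minimal sketch, assuming pandas and assuming the Kaggle CSV has been saved locally with a `Class` column (1 = fraud, 0 = non-fraud); `summarize_classes` is a hypothetical helper name, not part of any library.

```python
import pandas as pd


def summarize_classes(df, label_col="Class"):
    """Return (total transactions, fraud count, fraud share in percent)."""
    counts = df[label_col].value_counts()
    n_total = int(len(df))
    n_fraud = int(counts.get(1, 0))  # label 1 = fraud, 0 = non-fraud
    return n_total, n_fraud, 100.0 * n_fraud / n_total
```

For example, `summarize_classes(pd.read_csv("creditcard.csv"))` (file name assumed) returns the total, the fraud count, and the fraud percentage. Because fraud is so rare, accuracy alone says little: a classifier that always predicts non-fraud would already be about 99.8% accurate, which is why precision, recall, and F1-score for the fraud class are also reported.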
B. Evaluation Criteria
To compare the algorithms, we evaluate metrics such as accuracy, precision, recall, and F1-score, and we also plot the confusion matrix. The confusion matrix is a 2×2 matrix containing four counts: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), from which the rates TPR, TNR, FPR, and FNR are computed. Measures such as sensitivity, specificity, accuracy, and error rate can be derived from the confusion matrix, and these measures are used to choose the algorithm best suited to detecting credit card fraud.

The rates obtained from the confusion matrix are:
1. True Positive Rate, defined as the proportion of fraudulent transactions that the system classifies as fraudulent.
2. True Negative Rate, defined as the proportion of legitimate transactions that the system classifies as legitimate.
3. False Positive Rate, defined as the proportion of legitimate transactions that are wrongly classified as fraud.
4. False Negative Rate, defined as the proportion of fraudulent transactions that are wrongly classified as legitimate.

Figure.8 Confusion Matrix for Random Forest
Taking fraud as the positive class, the confusion matrix [Figure.8] shows that for the training data there are 330 true positives, 0 false positives, 190490 true negatives, and 0 false negatives. For the test data, there are 125 true positives, 7 false positives, 93818 true negatives, and 37 false negatives.
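The metrics and rates listed in the evaluation criteria all follow from the four confusion-matrix counts. This minimal sketch treats fraud as the positive class; `confusion_metrics` is a hypothetical helper, and the counts in the usage line are made up for illustration and are not the paper's results.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive accuracy, error rate, precision, recall, F1 and the four
    rates from confusion-matrix counts, with fraud as the positive class."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # same as TPR / sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "TPR": recall,
        "TNR": tn / (tn + fp) if tn + fp else 0.0,  # specificity
        "FPR": fp / (fp + tn) if fp + tn else 0.0,
        "FNR": fn / (fn + tp) if fn + tp else 0.0,
    }


# Hypothetical counts for illustration only.
print(confusion_metrics(tp=5, fp=2, tn=90, fn=3))
```

Note that TPR and FNR sum to 1, as do TNR and FPR, so a classifier that misses many frauds can still show a very high accuracy on this imbalanced data.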
Now the comparison of the Random Forest and Adaboost algorithms is shown [Figure.12]. The two algorithms have the same accuracy, but their precision, recall, and F1-score differ: the Random Forest algorithm has the higher precision, recall, and F1-score.
Figure.13 Comparison of Algorithms
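The Random Forest versus Adaboost comparison can be outlined with scikit-learn. This is a sketch under assumptions, not the authors' setup: the Kaggle file is replaced by a synthetic, similarly imbalanced dataset from `make_classification` (about 2% positives standing in for fraud), the 33% stratified test split and the default hyperparameters are assumed because they are not stated here, and precision, recall, and F1-score are computed for the positive (fraud) class only. `compare_models` is a hypothetical helper name.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split


def compare_models(random_state=0):
    # Synthetic stand-in for the Kaggle data: about 2% positives (1 = "fraud").
    X, y = make_classification(
        n_samples=5000, n_features=10, n_informative=5,
        weights=[0.98], random_state=random_state,
    )
    # Stratify so the rare class keeps the same proportion in both splits.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.33, stratify=y, random_state=random_state,
    )
    models = {
        "Random Forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state),
        # By default, AdaBoostClassifier boosts depth-1 decision trees (stumps).
        "Adaboost": AdaBoostClassifier(
            n_estimators=50, random_state=random_state),
    }
    results = {}
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        results[name] = {
            "accuracy": accuracy_score(y_te, pred),
            # Precision, recall and F1 for the fraud class (label 1).
            "precision": precision_score(y_te, pred, zero_division=0),
            "recall": recall_score(y_te, pred, zero_division=0),
            "f1": f1_score(y_te, pred, zero_division=0),
        }
    return results
```

Calling `compare_models()` returns one dictionary of scores per model. On heavily imbalanced data the two accuracies tend to be close because both models classify the majority class well, so the fraud-class precision, recall, and F1-score are where the models are more likely to differ.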