Credit Card Fraud Detection

Uploaded by

Karthick-5018-IT
MAHENDRA INSTITUTE OF TECHNOLOGY

(AUTONOMOUS)
DEPARTMENT OF INFORMATION TECHNOLOGY

CREDIT CARD FRAUD DETECTION USING STATE OF THE ART MACHINE
LEARNING AND DEEP LEARNING ALGORITHMS

2nd REVIEW - DATE: 01/03/2023

PRESENTED BY:
S.KARTHICK (611619205018)
K.MAHESHWARAN (611619205023)
B.NAGARAJAN (611619205027)
A.SARONJOHNSON (611619205046)

GUIDED BY:
Mr.M.PREMKUMAR, M.E
ASSISTANT PROFESSOR-IT

OBJECTIVE
To determine whether a credit card transaction is fraudulent or legitimate.

ABSTRACT
 Credit card transaction datasets are collected.

 The models are evaluated based on sensitivity and precision.

 The best accuracy obtained with Random Forest is 98.6%.
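
The two evaluation measures named above can be sketched in plain Python. This is only an illustration, assuming the usual definitions (sensitivity = TP / (TP + FN), precision = TP / (TP + FP)) with fraud as the positive class; the function name and the toy labels are hypothetical and do not come from the project data.

```python
def sensitivity_and_precision(y_true, y_pred, positive=1):
    """Return (sensitivity, precision) for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against empty denominators (no frauds, or no fraud predictions).
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision

# Hypothetical labels: 1 = fraud, 0 = legitimate.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
sens, prec = sensitivity_and_precision(y_true, y_pred)
print(f"sensitivity={sens:.3f} precision={prec:.3f}")
```

On a highly unbalanced dataset these two numbers are usually more informative than plain accuracy, because a model that never predicts fraud can still reach a very high accuracy.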

LITERATURE SURVEY
1. BLAST-SSAHA Hybridization for Credit Card Fraud Detection
(Authors: Amlan Kundu, Suvasini Panigrahi, Shamik Sural, Senior
Member, IEEE, and Arun K. Majumdar, Senior Member, IEEE)
Abstract
The need to locate objects and to be situated in space, whether
inside or outside, has long been the focus of a substantial amount
of research. Especially in Wireless Sensor Networks, indoor
localization has become an important issue in many fields of
application. In this paper, we propose an indoor location solution
based on Support Vector Machine (SVM). SVM is a class of
learning algorithms defined to resolve discrimination and
regression problems.

2. Detecting Credit Card Fraud by Decision Trees and
Support Vector Machines (Authors: Y. Sahin and E. Duman)
Abstract
◦ With the developments in information technology and
improvements in communication channels, fraud is spreading
all over the world, resulting in huge financial losses. Though
fraud prevention mechanisms such as CHIP&PIN have been developed,
these mechanisms do not prevent the most common fraud types,
such as fraudulent credit card usage over virtual POS terminals
or mail orders. As a result, fraud detection is the essential tool and
probably the best way to stop such fraud types. In this study,
classification models based on decision trees and support vector
machines (SVM) are developed and applied to the credit card fraud
detection problem.

EXISTING SYSTEM
 Data normalization is applied before cluster analysis.

 The aim is to detect fraud and to increase the accuracy of the results.

 The reported improvement in results was 23%.

 The algorithm used was Bayes minimum risk.
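
As a rough illustration of how a Bayes minimum risk decision can work, the sketch below compares the expected cost of flagging a transaction with the expected cost of accepting it. The cost model is an assumption made for this example only: flagging always costs a fixed administrative fee, and accepting a fraudulent transaction costs its full amount. The function name, parameters and numbers are hypothetical.

```python
def bayes_minimum_risk(p_fraud, amount, admin_cost):
    """Return 1 (flag as fraud) or 0 (accept), whichever has lower expected cost."""
    # Assumed cost model: investigating a flagged transaction costs
    # admin_cost whether or not it turns out to be fraud.
    risk_flag = admin_cost
    # Assumed cost model: accepting loses the whole amount if it is fraud,
    # and nothing if it is legitimate.
    risk_accept = p_fraud * amount
    return 1 if risk_flag < risk_accept else 0

# Same estimated fraud probability, different amounts (hypothetical values).
print(bayes_minimum_risk(p_fraud=0.10, amount=500.0, admin_cost=5.0))  # expected loss 50 > 5
print(bayes_minimum_risk(p_fraud=0.10, amount=20.0, admin_cost=5.0))   # expected loss 2 < 5
```

The point of the example is that the decision depends on the transaction amount as well as on the predicted probability, which is what makes the approach cost-sensitive.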

DISADVANTAGES OF EXISTING SYSTEM
 A comparison measure representing the monetary gains and losses due to fraud detection is proposed.

 A cost-sensitive method based on Bayes minimum risk is presented using the proposed cost measure (PCM).

PROPOSED SYSTEM
 We apply the random forest algorithm to the credit card dataset.

 Random Forest is an algorithm for classification and regression.

 It is a collection of decision tree classifiers.

ADVANTAGES
 Training is extremely fast in random forest, even for large data sets with many features and data instances.

 The random forest algorithm (RFA) has been found to provide a good estimate of the generalization error.

BLOCK DIAGRAM
Credit card data set -> Pre-processing -> Feature extraction -> ML (Random Forest)
Trained data -> Pre-processing -> Feature extraction -> ML (Random Forest)
ML (Random Forest) -> Secured transaction or Fraudulent notification

MODULE

MODULE 1 - DATA SET
MODULE 2 - PRE-PROCESSING
MODULE 3 - FEATURE EXTRACTION
MODULE 4 - TRAINED DATA
MODULE 5 - RFM

DATA SET
 The dataset contains transactions made by credit cards and is highly
unbalanced.
 It contains only numerical input variables which are the result
of a PCA transformation.
 The transaction amount feature can be used, for example, for
cost-sensitive learning.
 Feature 'Class' is the response variable; it takes the value 1 in
case of fraud and 0 otherwise.
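
To make the imbalance concrete, the sketch below measures the fraud rate of a label column and derives per-class weights with the common "balanced" heuristic, weight = n_samples / (n_classes * class_count). The toy label list is hypothetical and the helper name is only illustrative; the real dataset's class counts are not reproduced here.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical 'Class' column: 98 legitimate (0) and 2 fraudulent (1) rows.
labels = [0] * 98 + [1] * 2
fraud_rate = labels.count(1) / len(labels)
weights = balanced_class_weights(labels)
print(f"fraud rate: {fraud_rate:.2%}")
print(weights)
```

With weights like these, a misclassified fraud row counts far more during training than a misclassified legitimate row, which is one simple way to compensate for the rare class.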

PRE-PROCESSING
 Data preprocessing mainly includes data cleaning, integration,
transformation and reduction.
 It is a data mining technique that transforms raw data into an
understandable format.

STEPS IN DATA PREPROCESSING
1. Import libraries
2. Read data
3. Checking for missing values
4. Checking for categorical data
5. Standardize the data
6. PCA transformation
7. Data splitting
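
A few of the steps above (missing-value check, standardization, data splitting) can be sketched with the standard library alone. This is a simplified illustration on hypothetical toy rows, not the project's pipeline; the categorical check and the PCA step are left out, and the helper names are made up for the example.

```python
import random
import statistics

def has_missing(rows):
    """Step 3: report whether any cell is missing (None)."""
    return any(value is None for row in rows for value in row)

def standardize(rows):
    """Step 5: rescale every column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def split(rows, labels, test_size=0.3, seed=42):
    """Step 7: shuffle the indices and hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    take = lambda data, idx: [data[i] for i in idx]
    return (take(rows, train_idx), take(rows, test_idx),
            take(labels, train_idx), take(labels, test_idx))

# Hypothetical toy data: two numeric features, label 1 = fraud.
rows = [[10.0 * i, 2.0 * i - 1.0] for i in range(1, 11)]
labels = [0] * 8 + [1] * 2
print("missing values:", has_missing(rows))
scaled = standardize(rows)
X_train, X_test, y_train, y_test = split(scaled, labels)
print(len(X_train), len(X_test))
```

The sketch follows the order of the list, so the scaling statistics are computed on all rows. When the test set must stay unseen, the statistics are usually computed on the training split only and then applied to the test split.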

FEATURE EXTRACTION
 Well-selected features not only optimize the classification
accuracy but also reduce the amount of data required to achieve an
optimum level of performance.
 A search strategy is used to produce a subset of candidate
features for assessment.
 An assessment measure is applied to evaluate the quality of
the subset.
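
The combination of a search strategy and an assessment measure can be illustrated as follows. The choices here are assumptions for the sake of the example, since the slides do not name specific ones: the search is greedy forward selection, and the assessment measure is the training accuracy of a nearest-centroid rule. The data and function names are hypothetical.

```python
def centroid_accuracy(rows, labels, subset):
    """Assessment measure: accuracy of a nearest-centroid rule using only `subset`."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(r[j] for r in members) / len(members) for j in subset]
    correct = 0
    for r, y in zip(rows, labels):
        point = [r[j] for j in subset]
        nearest = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
        )
        correct += nearest == y
    return correct / len(rows)

def forward_selection(rows, labels, max_features):
    """Search strategy: add one feature at a time while the score improves."""
    selected, best_score = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining and len(selected) < max_features:
        scored = [(centroid_accuracy(rows, labels, selected + [j]), j) for j in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no candidate subset is better than the current one
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score

# Hypothetical data: feature 0 is noise, feature 1 separates the classes.
rows = [[5.0, 0.1], [1.0, 0.2], [4.0, 0.0], [2.0, 0.9], [5.0, 1.0], [1.0, 1.1]]
labels = [0, 0, 0, 1, 1, 1]
selected, score = forward_selection(rows, labels, max_features=2)
print(selected, score)  # [1] 1.0
```

The search stops after picking feature 1, because adding the noisy feature 0 does not raise the score any further.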

TRAINED DATA
 The quality, variety, and quantity of your training data determine
the success of your machine learning models.
 Training data annotated with the expected output is often referred
to as labeled data.
 Human-labeled data is designed to train specific ML models
with an end application in mind.

RANDOM FOREST ALGORITHM
 It is an ensemble classifier that builds a large number of decision
trees as base classifiers.
 It is an effective prediction tool widely used in data mining.
 The idea used to create a classifier model is to construct multiple
decision trees.
 Candidate split point: one of the first m structure points to arrive in
a leaf.
 Candidate children: each candidate split point in a leaf induces two
candidate children for that leaf.
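
The idea of building many trees and letting them vote can be shown with a deliberately simplified stand-in: bagged decision stumps, each trained on a bootstrap sample with one randomly chosen feature, combined by majority vote. This is not a full random forest (real implementations grow deeper trees and sample features at every split), and all names and data below are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Find the threshold on one feature that misclassifies the fewest samples."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        for above in (0, 1):  # class predicted when the value exceeds the threshold
            errors = sum(
                (above if r[feature] > threshold else 1 - above) != y
                for r, y in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, above)
    return best[1:]

def predict_stump(stump, row):
    feature, threshold, above = stump
    return above if row[feature] > threshold else 1 - above

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))  # one random feature per stump
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps, each with equal weight."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical data: small values are legitimate (0), large values are fraud (1).
rows = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [2.5, 2.5],
        [8.0, 9.0], [9.0, 8.0], [8.5, 9.5], [9.5, 8.5]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict_forest(forest, [0.5, 0.5]), predict_forest(forest, [9.0, 9.0]))
```

Because every stump sees a different bootstrap sample and a random feature, individual stumps differ, and the vote smooths out their individual mistakes.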

FRAUDULENT NOTIFICATION
 The card issuer may be able to send credit card fraud alert
notifications via text message to help you detect unauthorized
charges quickly.

RAW CODE:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Load the training and test transaction files.
fraudTrain = pd.read_csv(r"C:\Users\Python_Pc_2\Music\Credit Card Transactions Fraud\Dataset\fraudTrain.csv")
fraudTest = pd.read_csv(r"C:\Users\Python_Pc_2\Music\Credit Card Transactions Fraud\Dataset\fraudTest.csv")
fraudTrain.head(5)
fraudTest.head(5)
print(fraudTrain.columns)
print('--------'*10)
print(fraudTest.columns)

# The first column, "Unnamed: 0", is not useful for data analysis, so drop it.
fraudTrain.drop("Unnamed: 0",axis=1,inplace=True)
fraudTest.drop("Unnamed: 0",axis=1,inplace=True)
# Drop identifier columns that carry no predictive signal.
fraudTrain = fraudTrain.drop(['cc_num','first','last','trans_num'],axis=1)
fraudTest = fraudTest.drop(['cc_num','first','last','trans_num'],axis=1)

# Parse the transaction timestamp and derive year, month and day (training set).
fraudTrain["trans_date_trans_time"] = pd.to_datetime(fraudTrain["trans_date_trans_time"])
fraudTrain["trans_date"] = fraudTrain["trans_date_trans_time"].dt.date
fraudTrain["trans_date"] = pd.to_datetime(fraudTrain["trans_date"])
fraudTrain['year'] = fraudTrain['trans_date'].dt.year
fraudTrain['month'] = fraudTrain['trans_date'].dt.month
fraudTrain['day'] = fraudTrain['trans_date'].dt.day

# Same date features for the test set.
fraudTest["trans_date_trans_time"] = pd.to_datetime(fraudTest["trans_date_trans_time"])
fraudTest["trans_date"] = fraudTest["trans_date_trans_time"].dt.date
fraudTest["trans_date"] = pd.to_datetime(fraudTest["trans_date"])
fraudTest['year'] = fraudTest['trans_date'].dt.year
fraudTest['month'] = fraudTest['trans_date'].dt.month
fraudTest['day'] = fraudTest['trans_date'].dt.day

print(fraudTrain.shape)
print(fraudTrain.columns)
print('--------'*10)
print(fraudTest.shape)
print(fraudTest.columns)
fraudTrain.dtypes
fraudTest.dtypes
fraudTrain["is_fraud"].value_counts()
fraudTrain.groupby("is_fraud")['amt'].mean()
pd.crosstab(fraudTrain["category"],fraudTrain["is_fraud"])
pd.crosstab(fraudTrain["category"],fraudTrain["is_fraud"],normalize='index')
ax = sns.countplot(x="gender",data=fraudTrain[fraudTrain["is_fraud"]==1])

# Create dummy (one-hot) columns for the transaction category.
# dtype=int keeps them numeric so select_dtypes(include='number') below retains
# them (newer pandas versions default to bool dummies, which would be dropped).
fraudTrain = pd.get_dummies(fraudTrain,columns=['category'],drop_first=True,dtype=int)
fraudTest = pd.get_dummies(fraudTest,columns=['category'],drop_first=True,dtype=int)
fraudTrain.columns = fraudTrain.columns.str.replace(' ', '')
fraudTest.columns = fraudTest.columns.str.replace(' ', '')

# Keep numeric columns only and combine both files before re-splitting.
train = fraudTrain.select_dtypes(include='number')
test = fraudTest.select_dtypes(include='number')
total = pd.concat([train, test])
total.shape
X = total.drop("is_fraud",axis=1)
y = total["is_fraud"]
print(sum(y))
X.dtypes
X = X.drop(['zip','lat','long','unix_time','merch_lat','merch_long'],axis=1)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Scale features to [0, 1]; the scaler is fitted on the training split only.
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

from sklearn.tree import DecisionTreeClassifier
dtc = DecisionTreeClassifier()
dtc.fit(X_train,y_train)

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

print("Score the X-train with Y-train is : ", dtc.score(X_train,y_train))
print("Score the X-test with Y-test is : ", dtc.score(X_test,y_test))
y_pred = dtc.predict(X_test)
print("Accuracy score " , accuracy_score(y_test,y_pred))

report = classification_report(y_test,y_pred,labels=[1,0])
print('Classification report : \n',report)

# confusion_matrix expects (y_true, y_pred): rows are true classes, columns are predictions.
cf_mat = confusion_matrix(y_test, y_pred)
fig, ax = plt.subplots(figsize=(15,10))
sns.heatmap(cf_mat, linewidths=1, annot=True, ax=ax, fmt='g')
SCREENSHOTS
CONCLUSION
 This paper has examined the performance of two kinds of
random forest models. A real-life dataset on credit card
transactions is used in our experiment.
 Although random forest obtains good results on small data
sets, there are still some problems, such as imbalanced data.
Our future work will focus on solving these problems.
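
One common way to approach the imbalanced-data problem is random oversampling of the minority class, sketched below. This is offered only as an illustration of the kind of technique that could be explored; the slides do not say which method the authors intend to use, and the helper name and toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equally frequent."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical training split: six legitimate rows and two fraud rows.
rows = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [90.0], [95.0]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

Resampling of this kind is normally applied to the training split only, so that the test split keeps the real class proportions.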
CONCLUSION
 The random forest algorithm itself should be improved.
For example, the voting mechanism assumes that each of the
base classifiers has equal weight, but some of them may be
more important than others.
 Therefore, we will also try to improve this algorithm.
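
The difference between equal-weight and weighted voting can be seen in a small sketch. The weights below are hypothetical placeholders (for instance, they could come from each tree's validation accuracy); the slides do not specify how the weights would be chosen.

```python
from collections import defaultdict

def weighted_vote(predictions, weights=None):
    """Return the label with the largest total weight; equal weights by default."""
    if weights is None:
        weights = [1.0] * len(predictions)
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

# Hypothetical votes from five base classifiers for one transaction.
tree_predictions = [0, 0, 1, 1, 1]
print(weighted_vote(tree_predictions))                              # 1: three votes against two
print(weighted_vote(tree_predictions, [0.9, 0.8, 0.3, 0.2, 0.4]))   # 0: total 1.7 against 0.9
```

With equal weights the majority wins, while with weights the two more trusted classifiers outvote the other three.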
THANK YOU

Any Queries
