Credit Card Fraud Detection
(AUTONOMOUS)
DEPARTMENT OF INFORMATION TECHNOLOGY
2nd REVIEW-DATE-01/03/2023
PRESENTED BY:
S.KARTHICK (611619205018)
K.MAHESHWARAN (611619205023)
B.NAGARAJAN (611619205027)
A.SARONJOHNSON (611619205046)
GUIDED BY:
Mr.M.PREMKUMAR, M.E
ASSISTANT PROFESSOR-IT
OBJECTIVE
To check whether a credit card transaction is fraudulent or not.
ABSTRACT
Collect the credit card datasets.
LITERATURE SURVEY
1. BLAST-SSAHA Hybridization for Credit Card Fraud Detection
(Authors: Amlan Kundu, Suvasini Panigrahi, Shamik Sural, Senior
Member, IEEE, and Arun K. Majumdar, Senior Member, IEEE )
Abstract
The need to locate objects and to be situated in the space, whether
inside or outside, has long been the focus of a substantial amount
of research. Especially in Wireless Sensor Networks, indoor
localization has become an important issue in many fields of
applications. In this paper, we propose an indoor location solution
based on Support Vector Machine (SVM). SVM is a class of
learning algorithms defined to resolve discrimination and
regression problems.
2. Detecting Credit Card Fraud by Decision Trees and
Support Vector Machines (Authors: Y. Sahin and E. Duman)
Abstract
◦ With the developments in Information Technology and
improvements in the communication channels, fraud is spreading
all over the world, resulting in huge financial losses. Though
fraud prevention mechanisms such as CHIP&PIN are developed,
these mechanisms do not prevent the most common fraud types
such as fraudulent credit card usages over virtual POS terminals
or mail orders. As a result, fraud detection is the essential tool and
probably the best way to stop such fraud types. In this study,
classification models based on decision trees and support vector
machines (SVM) are developed and applied on credit card fraud
detection problem.
EXISTING SYSTEM
Data normalization is applied before Cluster Analysis.
DISADVANTAGES OF EXISTING SYSTEM
The gains and losses due to fraud detection are proposed.
PROPOSED SYSTEM
We apply the random forest algorithm to the credit card dataset.
ADVANTAGES
It works well even for large data sets with many features.
BLOCK DIAGRAM
[Figure: block diagram. Two input branches, "Credit card data set" and "Trained data", each pass through Pre-processing and Feature Extraction into the ML Random Forest block, which leads to either "Secured Transaction" or "Fraudulent Notification".]
MODULE
DATA SET
The dataset contains transactions made by credit cards and is highly
unbalanced.
It contains only numerical input variables which are the result
of a PCA transformation.
The 'Amount' feature can be used, for example, for cost-sensitive learning.
Feature 'Class' is the response variable and it takes value 1 in
case of fraud and 0 otherwise.
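The imbalance and the cost-sensitive idea can be illustrated with a minimal sketch. It assumes a synthetic table in place of the real data: the column names V1, V2, Amount and Class, the 2% fraud rate, and the cost model (a missed fraud costs about its amount) are all assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in for the real data: about 2% of rows are fraud (Class == 1).
df = pd.DataFrame({
    "V1": rng.normal(size=n),
    "V2": rng.normal(size=n),
    "Amount": rng.exponential(scale=50.0, size=n),
})
df["Class"] = (rng.random(n) < 0.02).astype(int)
# Shift V1 for fraud rows so the two classes differ to some degree.
df.loc[df["Class"] == 1, "V1"] += 3.0

# How unbalanced is the data? Share of each class among all rows.
class_share = df["Class"].value_counts(normalize=True)
print(class_share)

# Assumed cost model: a fraud row is weighted by its amount (at least 1),
# a genuine row gets unit weight.
sample_weight = np.where(df["Class"] == 1, df["Amount"].clip(lower=1.0), 1.0)

# Any estimator that accepts sample_weight can be trained this way.
model = LogisticRegression(max_iter=1000)
model.fit(df[["V1", "V2"]], df["Class"], sample_weight=sample_weight)
```

Weighting by amount pushes the classifier to care more about expensive frauds than about cheap ones, which plain accuracy does not capture on unbalanced data.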
PRE-PROCESSING
Data preprocessing mainly includes data cleaning, integration,
transformation and reduction.
It is a data mining technique that transforms raw data into an
understandable format.
STEPS IN DATA PREPROCESSING
1. Import libraries
2. Read data
3. Checking for missing values
4. Checking for categorical data
5. Standardize the data
6. PCA transformation
7. Data splitting
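The seven steps above can be sketched as follows. This is only an illustration under stated assumptions: a small synthetic frame stands in for the real CSV file, the column names and the injected missing value are hypothetical, and the split is done before scaling and PCA so that both are fitted on training rows only.

```python
# 1. Import libraries.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 2. Read data (a synthetic frame stands in for pd.read_csv here).
rng = np.random.default_rng(42)
n = 500
data = pd.DataFrame({
    "amt": rng.exponential(scale=70.0, size=n),
    "city_pop": rng.integers(100, 100_000, size=n).astype(float),
    "age": rng.integers(18, 90, size=n).astype(float),
    "gender": rng.choice(["F", "M"], size=n),
    "is_fraud": (rng.random(n) < 0.05).astype(int),
})
data.loc[0, "amt"] = np.nan  # one missing value, injected for illustration

# 3. Check for missing values, then fill them with the column median.
missing_before = int(data.isna().sum().sum())
data["amt"] = data["amt"].fillna(data["amt"].median())

# 4. Check for categorical (non-numeric) columns and one-hot encode them.
categorical_cols = data.select_dtypes(exclude="number").columns.tolist()
data = pd.get_dummies(data, columns=categorical_cols, drop_first=True, dtype=int)

X = data.drop("is_fraud", axis=1)
y = data["is_fraud"]

# 7. Data splitting (stratified, so both parts keep the fraud ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# 5. Standardize the data: fit on the training part, reuse on the test part.
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)

# 6. PCA transformation down to two components.
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)
```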
FEATURE EXTRACTION
Well-selected features not only optimize the classification accuracy
but also reduce the amount of data required to achieve an
optimum level of performance.
A search strategy is used to produce a subset of candidate
features for assessment.
An assessment measure is applied to evaluate the quality of
the subset.
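A search strategy combined with an assessment measure can be sketched with scikit-learn's SequentialFeatureSelector. This is one possible choice, not necessarily the method used in the project: forward search is the assumed search strategy, the cross-validated F1 score is the assumed assessment measure, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: 8 features, of which 3 carry information about the class.
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

# Search strategy: forward selection, adding one feature at a time.
# Assessment measure: F1 score from 3-fold cross-validation.
selector = SequentialFeatureSelector(
    DecisionTreeClassifier(max_depth=3, random_state=0),
    n_features_to_select=3,
    direction="forward",
    scoring="f1",
    cv=3,
)
selector.fit(X, y)

selected_mask = selector.get_support()  # boolean mask over the 8 features
X_reduced = selector.transform(X)       # keeps only the selected columns
print(selected_mask)
```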
TRAINED DATA
The quality, variety, and quantity of your training data determine
the success of your machine learning models.
Depending on its form and content, training data is often referred to as
labeled data.
Human-labeled data is designed to train specific ML models
with an end application in mind.
RANDOM FOREST ALGORITHM
It is an ensemble classifier that builds a large number of decision trees.
It is an effective prediction tool widely used in data mining.
The idea used to create the classifier model is to construct multiple
decision trees.
Candidate split point: one of the first m structure points to arrive in
a leaf.
Candidate children: each candidate split in a leaf induces two candidate children
for that leaf.
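The idea of building many decision trees and combining their votes can be sketched with scikit-learn's RandomForestClassifier. The data below is synthetic, and the settings (100 trees, balanced class weights, a 5% fraud rate) are assumptions chosen for illustration rather than the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, unbalanced data: about 5% of the rows belong to the fraud class.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Each of the 100 trees is trained on a bootstrap sample and considers a
# random subset of features at every split; the forest combines their votes.
forest = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",
    class_weight="balanced",  # compensates for the rare fraud class
    random_state=42,
)
forest.fit(X_train, y_train)

predictions = forest.predict(X_test)
fraud_probability = forest.predict_proba(X_test)[:, 1]
```

The predicted probability can be thresholded at a value other than 0.5 when missing a fraud is more costly than raising a false alarm.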
FRAUDULENT NOTIFICATION
The card issuer may be able to send credit card fraud alert
notifications via text message to help you detect unauthorized charges
quickly.
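How a model prediction might be turned into such an alert is sketched below. The function name build_fraud_alert, its parameters, and the message wording are hypothetical; actually sending the text message would depend on the card issuer's messaging service and is not shown.

```python
from typing import Optional


def build_fraud_alert(is_fraud: int, amount: float, merchant: str) -> Optional[str]:
    """Return an alert text for a flagged transaction, or None if it looks genuine."""
    if is_fraud != 1:
        return None
    return (f"Fraud alert: a charge of {amount:.2f} at {merchant} looks unusual. "
            "Contact your card issuer if you did not make it.")


# Hypothetical transactions: the first was classified as fraud, the second was not.
alert = build_fraud_alert(1, 249.5, "Example Store")
no_alert = build_fraud_alert(0, 12.0, "Example Store")
print(alert)
```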
RAW CODE:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
fraudTrain = pd.read_csv(r"C:\Users\Python_Pc_2\Music\Credit Card Transactions Fraud\Dataset\fraudTrain.csv")
fraudTest = pd.read_csv(r"C:\Users\Python_Pc_2\Music\Credit Card Transactions Fraud\Dataset\fraudTest.csv")
fraudTrain.head(5)
fraudTest.head(5)
print(fraudTrain.columns)
print('--------'*10)
print(fraudTest.columns)
# The first column, "Unnamed: 0", is not useful for data analysis,
# so it is dropped.
fraudTrain.drop("Unnamed: 0",axis=1,inplace=True)
fraudTest.drop("Unnamed: 0",axis=1,inplace=True)
fraudTrain = fraudTrain.drop(['cc_num','first','last','trans_num'],axis=1)
fraudTest = fraudTest.drop(['cc_num','first','last','trans_num'],axis=1)
from datetime import datetime as dt
fraudTrain["trans_date_trans_time"] = pd.to_datetime(fraudTrain["trans_date_trans_time"])
fraudTrain["trans_date"] = fraudTrain["trans_date_trans_time"].dt.date
fraudTrain["trans_date"]= pd.to_datetime(fraudTrain["trans_date"])
fraudTrain['year'] = fraudTrain['trans_date'].dt.year
fraudTrain['month'] = fraudTrain['trans_date'].dt.month
fraudTrain['day'] = fraudTrain['trans_date'].dt.day
fraudTest["trans_date_trans_time"] = pd.to_datetime(fraudTest["trans_date_trans_time"])
fraudTest["trans_date"] = fraudTest["trans_date_trans_time"].dt.date
fraudTest["trans_date"]= pd.to_datetime(fraudTest["trans_date"])
fraudTest['year'] = fraudTest['trans_date'].dt.year
fraudTest['month'] = fraudTest['trans_date'].dt.month
fraudTest['day'] = fraudTest['trans_date'].dt.day
print(fraudTrain.shape)
print(fraudTrain.columns)
print('--------'*10)
print(fraudTest.shape)
print(fraudTest.columns)
fraudTrain.dtypes
fraudTest.dtypes
fraudTrain["is_fraud"].value_counts()
fraudTrain.groupby("is_fraud")['amt'].mean()
pd.crosstab(fraudTrain["category"],fraudTrain["is_fraud"])
pd.crosstab(fraudTrain["category"],fraudTrain["is_fraud"],normalize='index')
ax = sns.countplot(x="gender",data=fraudTrain[fraudTrain["is_fraud"]==1])
# creating dummy values
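# dtype=int keeps the dummy columns numeric, so select_dtypes(include='number')
# below does not drop them on pandas versions where dummies default to bool
fraudTrain = pd.get_dummies(fraudTrain,columns=['category'],drop_first=True,dtype=int)
fraudTest = pd.get_dummies(fraudTest,columns=['category'],drop_first=True,dtype=int)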
fraudTrain.columns = fraudTrain.columns.str.replace(' ', '')
fraudTest.columns = fraudTest.columns.str.replace(' ', '')
train = fraudTrain.select_dtypes(include='number')
test = fraudTest.select_dtypes(include='number')
total = pd.concat([train, test])
total.shape
X = total.drop("is_fraud",axis=1)
y = total["is_fraud"]
print(sum(y))
X.dtypes
X = X.drop(['zip','lat','long','unix_time','merch_lat','merch_long'],axis=1)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
from sklearn.tree import DecisionTreeClassifier
dtc = DecisionTreeClassifier()
dtc.fit(X_train,y_train)
from sklearn.metrics import (accuracy_score, mean_absolute_error,
    mean_squared_error, confusion_matrix, median_absolute_error,
    classification_report, f1_score, recall_score, precision_score)
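The listing stops after importing the metrics. A minimal sketch of how some of them might be applied to a fitted decision tree is given here; it runs on synthetic data rather than the fraudTrain and fraudTest files, so the printed numbers say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (classification_report, confusion_matrix,
                             f1_score, precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, unbalanced stand-in for the prepared feature matrix and labels.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

dtc = DecisionTreeClassifier(random_state=42)
dtc.fit(X_train, y_train)
y_pred = dtc.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, zero_division=0))

# On unbalanced data, precision, recall and F1 for the fraud class are
# more informative than plain accuracy.
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)
f1 = f1_score(y_test, y_pred, zero_division=0)
```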
Any Queries