Credit Card Fraud Detection Report

Uploaded by

jyothibg

Credit Card Fraud Detection with Data Visualization in R 2022-23

CREDIT CARD FRAUD DETECTION IN R

Chapter 1

INTRODUCTION

1.1 Overview
There has long been a need to display massive amounts of data in a way that is easily
accessible and understandable. Organizations generate data every day, and the amount of
data available on the Web has increased dramatically. It is difficult for users to visualize, explore,
and use this enormous volume of data. The ability to visualize data is crucial to scientific research. Today,
computers can process large amounts of data. Data visualization is concerned with the
design, development, and application of computer-generated graphical representations of data.
It provides an effective representation of data originating from different sources. This enables
decision makers to see analytics in visual form and makes it easier for them to make sense of the
data. It helps them discover patterns, comprehend information, and form an opinion.
Data visualization is also referred to as information visualization or scientific visualization. Human
beings have always used visualizations to make messages or information last over time. What
cannot be touched, smelled, or tasted can be represented visually [1].
Data visualization is the technique of delivering insights from data using visual cues such as
graphs, charts, and maps. It makes large quantities of data easier to understand intuitively
and thereby supports better decisions.

Data Visualization in R Programming Language

Popular data visualization tools include Tableau, Plotly, R, Google Charts,
Infogram, and Kibana. These platforms differ in their capabilities,
functionality, and use cases, and they require different skill sets. This report discusses the use
of R for data visualization.

R is a language designed for statistical computing, graphical data analysis, and scientific
research. It is often preferred for data visualization because its packages offer flexibility and
require little code.

Visualization Techniques

Dept. of CSE, BGSIT, BG Nagar Page 1



Visualization is the use of computer-supported visual representations of data. Unlike static data
visualization, interactive data visualization allows users to specify the format used to display
data.
Common visualization techniques are shown in Figure 1.1 and include [2]:
• Line graph: shows the relationship between items and can be used to compare changes
over a period of time.
• Bar chart: compares quantities across different categories.
• Scatter plot: a two-dimensional plot showing how two variables vary together.
• Pie chart: compares the parts of a whole.
Graphs and charts can therefore take the form of a bar chart, pie chart, line graph, and so on. It is
important to understand which chart or graph suits your data.
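The four chart types listed above can each be drawn with a single base R graphics call. The following is a minimal sketch under stated assumptions: the toy vectors (months, sales, amount, items, counts) and the output file name are hypothetical and are not taken from the project data.

```r
# Hypothetical toy data; none of these values come from the report.
months <- 1:6
sales  <- c(12, 15, 14, 19, 22, 25)
amount <- c(5, 20, 35, 50, 80, 120)
items  <- c(1, 2, 2, 4, 5, 7)
counts <- c(Food = 40, Travel = 25, Retail = 35)

# Draw into a PDF file so the script also runs without a display.
out_file <- file.path(tempdir(), "chart_types.pdf")
pdf(out_file, width = 8, height = 8)
par(mfrow = c(2, 2))  # 2 x 2 grid of panels

# Line graph: change over a period of time
plot(months, sales, type = "l", main = "Line graph",
     xlab = "Month", ylab = "Sales")

# Bar chart: quantities of different categories
barplot(counts, main = "Bar chart", ylab = "Transactions")

# Scatter plot: how two variables vary together
plot(amount, items, main = "Scatter plot",
     xlab = "Amount", ylab = "Items")

# Pie chart: parts of a whole
pie(counts, main = "Pie chart")

invisible(dev.off())  # close the device and write the file
```

Packages such as ggplot2 offer the same chart types with more control over appearance; base graphics is used here only to keep the sketch free of dependencies.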
Data visualization uses computer graphics to show patterns, trends, and relationships among
elements of the data. Many tools can generate pie charts, bar charts, scatter plots, and other types of
graphs with simple pull-down menus and mouse clicks. When color is used to represent data,
colors must be chosen carefully so that data elements can be told apart.
In data visualization, data is abstracted and summarized. Spatial variables such as position, size,
and shape represent key elements in the data. A visualization system should reduce the data,
then transform and project the original dataset onto a screen. It should present results in the
form of charts and graphs in a user-friendly way.

Figure 1.1 Commonly used data visualization techniques (line graph, bar chart, pie chart, scatter plot)

Applications

Most visualization designs are to aid decision making and serve as tools that augment
cognition. In designing and building a data visualization prototype, one must be guided by
how the visualization will be applied. Data visualization is more than just representing
numbers; it involves selecting and rethinking the numbers on which the visualization is
based [3].
Visualization of data is an important branch of computer science and has a wide range of
application areas. Several application-specific tools have been developed to analyze
individual datasets in many fields of medicine and science.

Public Health: The ability to analyze and present data in an understandable manner is
critical to the success of public health surveillance. Health researchers need useful and
intelligent tools to aid their work [4]. Security is important in cloud-based medical data
visualizations. Open any medical or health magazine today, and you will see all kinds of
graphical representations.

Renewable Energy: Calculating energy consumption against production is important
for finding an optimum solution [5].

Environmental Science: As environmental managers are required to make decisions based
on highly complex data, they require visualization. Visualization applications within
applied environmental research are beginning to emerge [6]. It is desirable to have
different programs at one's disposal for displaying results.

Fraud Detection: Data visualization is important in the early stages of fraud investigation.
Fraud investigators may use data visualization as a proactive detection approach, using it to
see patterns that suggest fraudulent activity [7].

Library Decision Making: Data visualization software gives librarians the flexibility to
better manage and present information collected from different sources. It enables them to
present information in a creative, compelling way [8]. Visualization of library data
highlights purchasing decisions, future library needs, and goals. Librarians, as de facto
experts in data visualization, can help students, faculty, and researchers visualize their
data [9].

Several information visualization algorithms and associated software tools have been developed.
This software enables users to interpret data more rapidly than ever before. Examples include
ManyEyes from IBM, SmartMoney for the stock market, Insights from Facebook
Corporation, Visual Analytics from SAS, Thoth from the California Institute of
Technology, Tableau, and TOPCAT [10, 11]. They make data visualizations easy to
interpret and quick to produce. Each tool has its own strengths and limitations.
Visualization of large-scale multidimensional data sets can be combined with new
approaches to interacting with a computer, such as Web applications delivered as a service.

Challenges

Large, time-varying datasets pose a great challenge for data visualization because of the
enormous data volume. Real-time data visualization can enable users to respond proactively
to issues that arise. An animation-generation approach is used for interactive
exploration of time-varying data. It visualizes temporal events by mimicking the
composition of storytelling techniques [12].
Users differ in their ability to use data visualization and make decisions under tight time
constraints. It is hard to quantify the merit of a data visualization technique, which is one
reason for the multitude of visualization algorithms and associated software. Most of
this software has not taken advantage of the multi-touch interaction and direct
manipulation capabilities of newer devices.
Big data, structured and unstructured, introduces a unique set of challenges for developing
visualizations, because the speed, size, and diversity of the data must all be taken into
account. A new set of issues related to performance, operability, and degree
of discrimination challenges large-scale data visualization and analysis [13]. It is difficult and
time-consuming to create a large simulated data set. It is also difficult to decide which
visual is best to use.

1.2 Problem Statement


Credit Card Fraud Detection
Credit card fraud is increasing sharply compared with earlier
times. Criminals use fake identities and various technologies to trap users and take
their money. It is therefore essential to find a solution to these types of
fraud. In this project we design a method or model to detect fraudulent activity
in credit card transactions. As technology becomes more
advanced day by day, it is becoming more difficult to track the behavior and
patterns of criminal activity. This project provides a solution
that uses technologies such as Machine Learning and Data Visualization in
R to ease the detection of fraudulent card transactions.

1.3 Objectives
The basic objectives of the project are listed below:
• To study the unauthorized and unwanted ‘fraud’ in credit card transactions.
• To monitor the activities of the population of users in order to perceive or avoid
objectionable behavior.
• To collect data from a trusted source and analyze the data.
• To visualize the ongoing trend in such frauds by using advanced visualization tools
such as R.
1.4 Project Scope

This is a very relevant problem that demands the attention of communities such as machine
learning and data science, where the solution can be automated. Fraud detection
involves monitoring the activities of populations of users in order to estimate, perceive, or avoid
objectionable behaviour, which includes fraud, intrusion, and defaulting.
This problem is particularly challenging from a learning perspective, as it is characterized by
factors such as class imbalance: valid transactions far outnumber fraudulent
ones. Also, transaction patterns often change their statistical properties over
time. These are not the only challenges in implementing a real-world fraud detection
system, however. In real-world settings, the massive stream of payment requests is quickly
scanned by automatic tools that determine which transactions to authorize.
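The effect of class imbalance can be illustrated with a short simulation. This is only a sketch: the labels are simulated, and the 0.2% fraud rate is an illustrative assumption rather than a figure from the project dataset. It shows why plain accuracy is a misleading measure when valid transactions dominate.

```r
set.seed(42)  # make the simulation reproducible

# Simulated labels: 0 = valid, 1 = fraud. The 0.2% fraud rate is an
# illustrative assumption, not a figure taken from the report.
n <- 100000
class_label <- rbinom(n, size = 1, prob = 0.002)

# Class counts and proportions
class_counts <- table(class_label)
class_share  <- prop.table(class_counts)
print(class_counts)
print(round(class_share, 4))

# A "model" that always predicts "valid" is right almost every time,
# yet it catches no fraud at all.
always_valid <- rep(0, n)
accuracy <- mean(always_valid == class_label)
recall   <- sum(always_valid == 1 & class_label == 1) / sum(class_label == 1)
cat("Accuracy:", round(accuracy, 4), " Recall:", recall, "\n")
```

For this reason, measures such as precision and recall on the fraud class are usually more informative than accuracy for this kind of problem.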

1.5 Existing System


An existing system for credit card fraud detection with data visualization typically involves a
combination of software and technologies to detect and visualize fraudulent activities
in credit card transactions. These systems are designed to help financial institutions and
businesses reduce fraud and protect their customers. Here is an overview of the components and
functionalities you might find in such a system:
Machine Learning Models:
Machine learning algorithms are used for fraud detection. Common algorithms include logistic
regression, decision trees, random forests, neural networks, and anomaly detection models. These
models are trained on historical transaction data to learn patterns of legitimate and fraudulent
behavior.
Real-time Monitoring:
The system continuously monitors incoming transactions in real-time to identify suspicious
activities. It uses the trained machine learning models to flag transactions that exhibit
characteristics of potential fraud.
Alerting and Reporting:
When a potentially fraudulent transaction is detected, the system generates alerts for review.
Alerts are sent to authorized personnel, such as fraud analysts or investigators, for further action.
Detailed reports are created to document suspicious transactions and provide evidence for
investigation and reporting.
Data Visualization:
Data visualization tools and libraries, which may include R, are used to create interactive charts,
graphs, and dashboards. Visualizations help analysts and decision-makers quickly understand
trends and patterns in transaction data. Common visualization types include time series plots,
geographical heatmaps, and network graphs.
User Interface:
The system typically offers a user-friendly web-based interface where authorized users can log in
to access alerts, reports, and visualizations. User interfaces may also support filtering and
drill-down capabilities for in-depth analysis.
Compliance:
The system must comply with relevant data protection and security regulations, such as GDPR
and PCI DSS (and HIPAA where health-related data is involved), to protect customer data and
ensure data privacy.
Integration:
The system often needs to integrate with external services and databases to access additional
information for fraud detection, such as blacklists, watchlists, and customer profiles.
Performance and Scalability:
These systems must be able to handle a large volume of transactions and scale to accommodate
increased data flows.

Security:
Robust security measures are implemented to protect the system against cyber threats and
unauthorized access.
Maintenance and Support:
Regular maintenance and updates are essential to keep the system effective against evolving fraud
tactics.
The specific technology stack and architecture of an existing system may vary from one
organization to another, but the core functionalities described above are common in credit card
fraud detection and data visualization systems. The choice of technologies, algorithms, and tools
depends on the organization's needs, resources, and technological capabilities.
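The model, monitoring, and alerting components described above can be sketched in a few lines of base R. This is a rough illustration under stated assumptions, not a description of any particular production system: the features (amount, night), the simulated fraud rule, and the 0.5 alert threshold are all hypothetical.

```r
set.seed(1)

# Simulated historical transactions (hypothetical features).
n <- 5000
history <- data.frame(
  amount = rexp(n, rate = 1 / 50),  # transaction amount
  night  = rbinom(n, 1, 0.2)        # 1 if made at night
)

# Assumed ground truth: fraud is more likely for large night-time amounts.
logit <- -6 + 0.02 * history$amount + 1.5 * history$night
history$fraud <- rbinom(n, 1, plogis(logit))

# Train a logistic regression model on the historical data.
model <- glm(fraud ~ amount + night, data = history, family = binomial)

# Score a small batch of "incoming" transactions.
incoming <- data.frame(amount = c(12, 480, 35, 900),
                       night  = c(0, 1, 1, 1))
incoming$score <- predict(model, newdata = incoming, type = "response")

# Raise an alert when the score exceeds an assumed threshold.
alert_threshold <- 0.5
incoming$alert <- incoming$score > alert_threshold
print(incoming)
```

In practice the threshold is a business decision: lowering it catches more fraud at the cost of more false alerts for analysts to review.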

1.6 Proposed System

We propose to build a Credit Card Fraud Detection System in the R language using
Machine Learning and advanced R concepts. We will incorporate various algorithms
such as Decision Trees, Artificial Neural Networks, Logistic Regression, and Gradient Boosting
Classifier. To carry out the task of credit card fraud detection, we will use
a Credit Card Transactions dataset consisting of a mix of fraudulent and non-fraudulent
transactions.
Method Used
We referred to various research papers to identify the components that might be
required in our project. We also referred to a few websites to understand the packages
and libraries in R that are to be used.

1.7 Organization of the Project


This project is organized into the following chapters:
Chapter 1: Introduction. Gives an overview of the project and the technology used.
Chapter 2: Literature Survey. Presents the background work and collection of data.
Chapter 3: Software Requirement Specification for the project.
Chapter 4: System Analysis and Design. Provides the architecture and analysis.
Chapter 5: Provides the detailed technology view and the pseudocode.
Chapter 6: Testing. Provides several testing benchmarks to check whether the requirements
are met.
Chapter 7: Briefly presents the results and the visualization.

Chapter 2
Literature Survey

| Authors | Method | Purpose | Advantages | Disadvantages |
|---|---|---|---|---|
| Patil, S., Nemade, V., & Soni, P. K. [1] | Proposed interfacing of SAS with the Hadoop framework. Used decision trees and ROC curves. | To detect fraudulent credit card transactions | Tuned analytical server with the most optimal model for fraud detection | Limited to machine learning approaches only |
| Awoyemi, J. O., Adetunmbi, A. O., & Oluwadare, S. A. [2] | Used naïve Bayes, k-nearest neighbor, and logistic regression on highly skewed credit card fraud data | Credit card fraud detection using machine learning techniques | Naïve Bayes and k-nearest neighbor reach accuracies as high as 97.92% and 97.69% | Logistic regression has an accuracy of 54.86% |
| Roy, A., Sun, J., Mahoney, R., Alonzi, L., Adams, S., & Beling, P. [3] | Used an ANN powered by cloud computing and fine-tuned various parameters for better results | Deep learning for detecting fraud in credit card transactions | Utilized a high-performance, distributed cloud computing environment to navigate past common fraud detection. | Results comparable to machine learning approaches |
| Xuan, S., Liu, G., Li, Z., Zheng, L., Wang, S., & Jiang, C. [4] | Random forest for credit card fraud detection | To detect fraudulent credit card transactions using random forest | Two kinds of random forests are used to train the behavior features of normal and abnormal transactions | Only tested on datasets pertaining to China |
| Jurgovsky, J., Granitzer, M., Ziegler, K., Calabretto, S., Portier, P. E., He-Guelton, L. [5] | Compared random forest and LSTM | To study and compare LSTM and random forest on the CCFD problem | Concludes that a combination of both should be used | Concluded their study with a discussion of two practical scientific challenges |

Table 2.1: Literature Survey

Chapter 3
Software Requirement Specification
3.1 Purpose
The purpose of this document is to outline the requirements for a Credit Card Fraud Detection
System with data visualization capabilities using the R programming language. This system is
designed to detect and visualize potential fraudulent credit card transactions, helping financial
institutions to reduce fraud and protect their customers.

3.2 Scope
The system will analyze transaction data, identify suspicious activities, and provide interactive
data visualization using R. It will be capable of handling a large volume of transaction data and
generate meaningful insights for fraud detection.

3.3 Functional Requirements


User Authentication
The system shall provide user authentication to ensure that only authorized personnel can access
and use the system. Users shall have different roles, including administrators and analysts, with
varying levels of access and permissions.
Data Ingestion
The system shall be able to ingest credit card transaction data from various sources, including
databases and external data feeds. It shall support batch and real-time data ingestion.
Data Preprocessing
Data preprocessing shall include cleaning, transformation, and feature extraction.
The system shall handle missing data, outliers, and anomalies in the transaction data.
Fraud Detection
The system shall use machine learning algorithms for fraud detection, including but not limited to
logistic regression, decision trees, and neural networks.
It shall continuously monitor incoming transactions for potential fraud and trigger alerts when
suspicious activities are detected.
Alerting and Reporting
The system shall generate alerts for detected fraud cases and notify authorized personnel.
It shall provide detailed reports and visualizations of suspicious transactions.
Data Visualization
The system shall use R for data visualization, creating interactive plots and charts. Visualization
shall include transaction trends, geographical heatmaps, and other relevant fraud detection
visuals.
3.4 Non-Functional Requirements
Performance
The system shall be capable of handling a large volume of transactions efficiently and in real time.
Response times for fraud detection and visualization shall not exceed predefined thresholds.
Security
The system shall implement robust security measures to protect sensitive transaction data and
user information. Access to the system shall be logged and auditable.
Scalability
The system shall be scalable to accommodate the growth in transaction volume and data sources.
Reliability
The system shall have a high level of availability, with minimal downtime.
It shall be capable of recovering from failures and data loss.
System Constraints
The system shall be developed using the R programming language and relevant R packages for
machine learning and data visualization.
It shall run on a dedicated server or cloud infrastructure.
User Interface
The system shall have a user-friendly web-based interface for users to interact with the system,
view alerts, and access data visualization.
Compliance
The system shall comply with relevant data privacy and security regulations, such as GDPR
and PCI DSS (and HIPAA where health-related data is involved).
3.5 Software Requirements
RStudio, and various R packages and libraries relating to ML.
3.6 Hardware Requirements
Computer with 8GB+ RAM

Chapter 4
System Analysis and Design

4.1 System Analysis

System analysis is a critical phase in the development of a credit card fraud detection system with data
visualization using R. During this phase, you define the requirements, identify constraints, and
plan the architecture and design of the system. Here is a step-by-step breakdown of the system
analysis for such a project:
Define Objectives and Scope:
Clearly state the objectives of the system, such as reducing credit card fraud and providing data
visualization for decision-making. Define the scope by specifying what the system will and will
not do.
Gather Requirements:
Collect requirements from stakeholders, including end-users, business analysts, fraud analysts,
and compliance teams. Identify functional and non-functional requirements, considering data
sources, data preprocessing, machine learning models, real-time monitoring, alerting, reporting,
data visualization, user interface, compliance, security, and scalability.
Data Collection and Analysis:
Analyze the types of data sources available, including transaction logs, databases, external data
feeds, and historical data. Determine the format, volume, and quality of data to understand the
challenges of data preprocessing.
Data Preprocessing Requirements:
Specify the data preprocessing steps, including data cleaning, transformation, and feature
extraction. Define how to handle missing data, outliers, and data normalization.
Machine Learning Requirements:
Outline the machine learning algorithms that will be used for fraud detection. Specify the
requirements for model training, validation, and deployment.
Real-time Monitoring:
Describe the requirements for real-time transaction monitoring, including the monitoring
frequency and response time.
Alerting and Reporting:
Define the criteria for generating alerts when potential fraud is detected. Specify reporting
requirements, including the format, content, and delivery method of reports.
Data Visualization:
Specify the types of visualizations required, such as time series plots, geographical heatmaps, and
network graphs. Define the interactivity and user-friendliness of the visualizations.
User Interface:
Detail the requirements for the user interface, including user roles, access levels, and user
interactions. Specify features like filtering, drill-down capabilities, and data export options.
Compliance and Security:
Define the system's compliance requirements with data protection regulations (e.g., GDPR and
PCI DSS, plus HIPAA where health-related data is involved). Specify security measures to
protect sensitive transaction data and user access.
Scalability:
Describe how the system will scale to accommodate increased transaction volume and data
sources.
Technology Stack:
Specify the technology stack, tools, and frameworks that will be used for system development.
Discuss the choice of R packages and libraries for data visualization and analysis.
Performance Requirements:
Set performance requirements for response times, throughput, and system availability.
Historical Data Retention:
Specify the requirements for storing historical transaction data for analysis and reporting
purposes.
Model Retraining:
Define the frequency and process for retraining machine learning models to adapt to evolving
fraud patterns.
Once the system analysis is completed, we have a comprehensive understanding of the project's
requirements and constraints. This analysis serves as the foundation for the design and
development phases of the credit card fraud detection system with data visualization using R.
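The data preprocessing requirements listed in this analysis (missing data, outliers, and normalization) can be sketched in base R as follows. The amounts are hypothetical, and median imputation and capping at the 1.5 × IQR fences are only one reasonable set of choices, assumed here for illustration.

```r
# Hypothetical transaction amounts with one missing value and one outlier.
amount <- c(12.5, 40.0, NA, 18.2, 25.0, 5000.0, 33.1, 21.7)

# 1. Missing data: replace NA with the median of the observed values.
amount_filled <- ifelse(is.na(amount), median(amount, na.rm = TRUE), amount)

# 2. Outliers: cap values outside the 1.5 * IQR fences.
q     <- unname(quantile(amount_filled, c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
amount_capped <- pmin(pmax(amount_filled, q[1] - fence), q[2] + fence)

# 3. Normalization: standardize to mean 0 and standard deviation 1.
amount_scaled <- as.numeric(scale(amount_capped))

print(data.frame(amount, amount_filled, amount_capped, amount_scaled))
```

Whether outliers should be capped at all is a design decision in fraud detection, since unusually large amounts may themselves be a signal of fraud.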

4.2 System Architecture


The key components in the system:
Data Sources: Credit card transactions, Databases, Logs, External data feeds
Data Ingestion: Data integration, Data cleansing and validation, Transformation.
Data Storage: Database for historical and real-time data, Data warehouse for large-scale data
management.
Machine Learning Models: Model training with historical data, Model deployment for real-time
monitoring.
Real-time Transaction Monitoring: Stream processing, Alert generation
Alerting and Reporting: Alert management, Report generation
Data Visualization: R-based visualization, Dashboards, Geographical heatmaps, Time series
plots.

Figure 4.1 System Architecture

Chapter 5
Implementation

5.1 Methodology
Implementing a credit card fraud detection system with data visualization using R involves
several steps. Here's a high-level outline of the implementation process:
Data Collection and Preprocessing:
Collect historical credit card transaction data, including both legitimate and fraudulent
transactions. Preprocess the data by cleaning, transforming, and extracting relevant features.
Handle missing values and outliers.
Data Visualization with R:
Use R and relevant packages (e.g., ggplot2, plotly, Shiny) for data visualization.
Create interactive data visualizations, such as time series plots, geographical heatmaps, and
network graphs, to gain insights into the data.
Machine Learning Model Development:
Train machine learning models on historical data to detect fraudulent transactions. Common
algorithms include logistic regression, decision trees, random forests, neural networks, and
clustering methods. Evaluate model performance using appropriate metrics (e.g., precision, recall,
F1-score).
Real-time Monitoring and Alerting:
Implement a real-time monitoring system to continuously analyze incoming credit card
transactions. Generate alerts when a transaction is detected as potentially fraudulent based on the
trained machine learning models.
User Interface Development:
Create a web-based user interface using R and Shiny or other web development tools.
Design a user-friendly dashboard where users can view alerts, reports, and visualizations.
Compliance and Security:
Ensure the system complies with relevant data protection regulations (e.g., GDPR and PCI
DSS, plus HIPAA where health-related data is involved). Implement robust security measures
to protect sensitive transaction data and user access.
Integration:
Integrate the system with external data sources and fraud databases to enhance fraud detection
accuracy. Implement APIs for external systems to interact with the fraud detection and
visualization components.
The implementation of a credit card fraud detection system with data visualization is a complex
process that requires expertise in data analysis, machine learning, data visualization, and web
development.
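The evaluation metrics named in the model development step (precision, recall, and F1-score) can be computed directly from predicted and actual labels. The following is a small sketch with hypothetical label vectors, not output from the project's models.

```r
# Hypothetical true labels and model predictions (1 = fraud, 0 = valid).
actual    <- c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0)
predicted <- c(1, 0, 1, 1, 0, 0, 0, 0, 1, 0)

# Confusion-matrix cells
tp <- sum(predicted == 1 & actual == 1)  # fraud correctly flagged
fp <- sum(predicted == 1 & actual == 0)  # valid flagged as fraud
fn <- sum(predicted == 0 & actual == 1)  # fraud that was missed

precision <- tp / (tp + fp)  # share of alerts that are real fraud
recall    <- tp / (tp + fn)  # share of fraud that was caught
f1        <- 2 * precision * recall / (precision + recall)

cat("Precision:", precision, " Recall:", recall, " F1:", f1, "\n")
```

The caret package loaded later in this chapter provides confusionMatrix(), which reports these and related measures; the manual version is shown only to make the definitions explicit.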

5.2 Pseudo Code


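Step 1: Getting the Dataset:
> library(ranger)
> library(caret)
> library(data.table)
> # Forward slashes are used in the path: backslashes such as "\U" are read as escape sequences in R strings
> creditcard_data <- read.csv("C:/Users/Arpitha/Desktop/creditcard.csv")
> creditcard_data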


Time V1 V2 V3 V4 V5 V6
1 0 -1.359807134 -7.278117e-02 2.536346738 1.3781552243 -3.383208e-01 4.623878e-01
2 0 1.191857111 2.661507e-01 0.166480113 0.4481540785 6.001765e-02 -8.236081e-02
3 1 -1.358354062 -1.340163e+00 1.773209343 0.3797795930 -5.031981e-01 1.800499e+00
4 1 -0.966271712 -1.852260e-01 1.792993340 -0.8632912750 -1.030888e-02 1.247203e+00
5 2 -1.158233093 8.777368e-01 1.548717847 0.4030339340 -4.071934e-01 9.592146e-02
6 2 -0.425965884 9.605230e-01 1.141109342 -0.1682520798 4.209869e-01 -2.972755e-02
7 4 1.229657635 1.410035e-01 0.045370774 1.2026127367 1.918810e-01 2.727081e-01
8 7 -0.644269442 1.417964e+00 1.074380376 -0.4921990185 9.489341e-01 4.281185e-01
9 7 -0.894286082 2.861572e-01 -0.113192213 -0.2715261301 2.669599e+00 3.721818e+00
10 9 -0.338261752 1.119593e+00 1.044366552 -0.2221872767 4.993608e-01 -2.467611e-01
11 10 1.449043781 -1.176339e+00 0.913859833 -1.3756666550 -1.971383e+00 -6.291521e-01
12 10 0.384978215 6.161095e-01 -0.874299703 -0.0940186260 2.924584e+00 3.317027e+00
13 10 1.249998742 -1.221637e+00 0.383930151 -1.2348986877 -1.485419e+00 -7.532302e-01
14 11 1.069373588 2.877221e-01 0.828612727 2.7125204296 -1.783980e-01 3.375437e-01
15 12 -2.791854766 -3.277708e-01 1.641750161 1.7674727439 -1.365884e-01 8.075965e-01
16 12 -0.752417043 3.454854e-01 2.057322913 -1.4686432984 -1.158394e+00 -7.784983e-02
17 12 1.103215435 -4.029621e-02 1.267332089 1.2890914696 -7.359972e-01 2.880692e-01
18 13 -0.436905071 9.189662e-01 0.924590774 -0.7272190536 9.156787e-01 -1.278674e-01
19 14 -5.401257663 -5.450148e+00 1.186304631 1.7362388001 3.049106e+00 -1.763406e+00
20 15 1.492935977 -1.029346e+00 0.454794734 -1.4380258799 -1.555434e+00 -7.209611e-01
21 16 0.694884776 -1.361819e+00 1.029221040 0.8341592992 -1.191209e+00 1.309109e+00
22 17 0.962496070 3.284610e-01 -0.171479054 2.1092040677 1.129566e+00 1.696038e+00
23 18 1.166616382 5.021201e-01 -0.067300314 2.2615692395 4.288042e-01 8.947352e-02
24 18 0.247491128 2.776656e-01 1.185470842 -0.0926025499 -1.314394e+00 -1.501160e-01
25 22 -1.946525131 -4.490051e-02 -0.405570068 -1.0130573370 2.941968e+00 2.955053e+00
26 22 -2.074294672 -1.214818e-01 1.322020630 0.4100075142 2.951975e-01 -9.595372e-01
27 23 1.173284610 3.534979e-01 0.283905065 1.1335633179 -1.725772e-01 -9.160537e-01
28 23 1.322707269 -1.740408e-01 0.434555031 0.5760376524 -8.367580e-01 -8.310834e-01
29 23 -0.414288810 9.054373e-01 1.727452944 1.4734712666 7.442741e-03 -2.003307e-01
30 23 1.059387115 -1.753192e-01 1.266129643 1.1861099547 -7.860018e-01 5.784353e-01
31 24 1.237429030 6.104258e-02 0.380525880 0.7615641114 -3.597707e-01 -4.940841e-01
32 25 1.114008595 8.554609e-02 0.493702487 1.3357599851 -3.001886e-01 -1.075378e-02
33 26 -0.529912284 8.738916e-01 1.347247329 0.1454566766 4.142089e-01 1.002231e-01
34 26 -0.529912284 8.738916e-01 1.347247329 0.1454566766 4.142089e-01 1.002231e-01
35 26 -0.535387763 8.652678e-01 1.351076288 0.1475754745 4.336802e-01 8.698294e-02
36 26 -0.535387763 8.652678e-01 1.351076288 0.1475754745 4.336802e-01 8.698294e-02
37 27 -0.246045949 4.732669e-01 1.695737554 0.2624114880 -1.086641e-02 -6.108359e-01
38 27 -1.452187279 1.765124e+00 0.611668541 1.1768249842 -4.459799e-01 2.468265e-01
……………...and so on
Step 2: Understanding the structure of the dataset
> dim(creditcard_data)  # dimensions of the dataset (rows, columns)
[1] 284807 31
> head(creditcard_data, 6)  # first 6 entries in the dataset
> tail(creditcard_data, 6)  # last 6 entries in the dataset
> summary(creditcard_data$Amount)  # statistical summary of the Amount column
> names(creditcard_data)  # names of the columns in the dataset
> var(creditcard_data$Amount)  # variance of the Amount column
> sd(creditcard_data$Amount)  # standard deviation of the Amount column
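The inspection commands in this step can be tried on a small stand-in data frame before running them on the full dataset. The sketch below assumes a made-up `toy_data` with only four columns; its values are illustrative and are not taken from the real data.

```r
# Synthetic stand-in for creditcard_data (hypothetical values, not the real dataset)
set.seed(1)
toy_data <- data.frame(
  Time   = 0:9,
  V1     = rnorm(10),
  Amount = c(149.62, 2.69, 378.66, 123.50, 69.99, 3.67, 4.99, 40.80, 93.20, 3.68),
  Class  = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

dim(toy_data)             # number of rows and columns
names(toy_data)           # column names
head(toy_data, 3)         # first 3 rows
summary(toy_data$Amount)  # min, quartiles, mean and max of one column

# The standard deviation is the square root of the variance
amount_var <- var(toy_data$Amount)
amount_sd  <- sd(toy_data$Amount)
all.equal(amount_sd, sqrt(amount_var))

# Counting the labels shows how imbalanced the two classes are
class_counts <- table(toy_data$Class)
class_counts
```

Checking the class counts early is useful in fraud data because the fraudulent class is usually a small minority of the rows.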

Step 3: We scale the Amount column with the scale() function in R so that its large values do not
dominate the model. scale() standardizes the data: it subtracts the column mean and divides by the
column standard deviation, giving values with mean 0 and standard deviation 1. It does not remove
extreme values.

> creditcard_data$Amount = scale(creditcard_data$Amount)  # standardize the Amount column
> NewData = creditcard_data[, -c(1)]  # drop the first column (Time)
> head(NewData)  # recheck the data after scaling
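A minimal sketch of what scale() does to a single column, using made-up amounts with one extreme value. The vector `amount` is an assumption for illustration. The `as.numeric()` call is optional and only flattens the one-column matrix that scale() returns.

```r
# Hypothetical transaction amounts with one extreme value
amount <- c(10, 20, 30, 40, 5000)

# scale() returns a one-column matrix; as.numeric() flattens it to a plain vector
scaled_amount <- as.numeric(scale(amount))

# The same result computed by hand: (x - mean) / sd
manual <- (amount - mean(amount)) / sd(amount)
all.equal(scaled_amount, manual)

mean(scaled_amount)       # approximately 0
sd(scaled_amount)         # 1
which.max(scaled_amount)  # the extreme value is still the largest one after scaling
```

Standardizing changes the units of the column but keeps the ordering of the values, so outliers stay outliers.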
Step 4: Now that the data is scaled, it is ready for training. We extract two sets from the
existing data: train_data and test_data.
> library(caTools)  # provides sample.split()
> set.seed(123)  # fixes the random seed so that the split is reproducible
> data_sample = sample.split(NewData$Class, SplitRatio = 0.80)  # logical vector marking 80% of the rows TRUE and 20% FALSE
> train_data = subset(NewData, data_sample == TRUE)  # rows of NewData where data_sample is TRUE
> test_data = subset(NewData, data_sample == FALSE)  # rows of NewData where data_sample is FALSE
> dim(train_data)  # dimensions of the training data
[1] 227846 30
> dim(test_data)  # dimensions of the test data
[1] 56961 30
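sample.split() from caTools splits the labels in the requested ratio while preserving the share of each class. The base R sketch below illustrates that idea on synthetic labels. `stratified_mask()` is a hypothetical helper written for this example and is not part of caTools or of the project code.

```r
set.seed(123)  # fix the random seed so the split is reproducible

# Synthetic labels: 1000 rows, 2% positive (hypothetical, not the real data)
toy <- data.frame(x = rnorm(1000), Class = rep(c(0, 1), times = c(980, 20)))

# Hypothetical helper: within each class, mark a given share of the rows as TRUE
stratified_mask <- function(labels, ratio) {
  mask <- logical(length(labels))
  for (lev in unique(labels)) {
    idx <- which(labels == lev)
    n_train <- round(ratio * length(idx))
    mask[idx[sample.int(length(idx), n_train)]] <- TRUE
  }
  mask
}

in_train  <- stratified_mask(toy$Class, 0.80)
toy_train <- toy[in_train, ]
toy_test  <- toy[!in_train, ]

dim(toy_train)         # 800 rows
dim(toy_test)          # 200 rows
mean(toy_train$Class)  # positive share in the training set: 0.02
mean(toy_test$Class)   # positive share in the test set: 0.02
```

Keeping the class share equal in both parts matters for fraud data, where a purely random split could leave very few fraudulent rows in the smaller part.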
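Step 5: In this step, we perform logistic regression. Logistic regression models the probability
that a binary dependent variable takes the value 1 as a function of one or more independent
variables. Unlike linear regression, which fits a trend line through the data points, it passes a
linear combination of the predictors through the logistic function, so its output always lies
between 0 and 1. In our project, we use it to estimate whether a transaction is fraud or not fraud.
> Logistic_Model = glm(Class ~ ., test_data, family = binomial())  # fits a binomial generalized linear model (logistic regression)
> summary(Logistic_Model)  # coefficient estimates and deviance statistics
> plot(Logistic_Model)  # diagnostic plots of the fitted model
Note that this call fits the model on test_data. A model meant to be evaluated on held-out data
would normally be fitted on train_data, with test_data reserved for assessing its predictions.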
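As a rough illustration of how a logistic regression turns predictors into fraud probabilities, the sketch below fits glm() on synthetic data and predicts on held-out rows. The data frame `toy`, its columns `x1` and `x2`, the true coefficients and the 0.5 cutoff are all assumptions made for this example, not values from the project.

```r
set.seed(42)

# Synthetic data (hypothetical): the chance of Class = 1 rises with x1 and falls with x2
n  <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)
p  <- plogis(-2 + 1.5 * x1 - 1.0 * x2)  # inverse logit maps a linear score to a probability
toy <- data.frame(x1 = x1, x2 = x2, Class = rbinom(n, size = 1, prob = p))

# Fit on the first 80% of the rows and hold out the rest
toy_train <- toy[1:1600, ]
toy_test  <- toy[1601:2000, ]

toy_model <- glm(Class ~ ., data = toy_train, family = binomial())
coef(toy_model)  # estimates on the log-odds scale

# Predicted probabilities for unseen rows, then a 0/1 label at a 0.5 cutoff
toy_prob  <- predict(toy_model, newdata = toy_test, type = "response")
toy_label <- as.integer(toy_prob > 0.5)
table(predicted = toy_label, actual = toy_test$Class)
```

Fitting on one part of the data and predicting on the other gives an estimate of how the model behaves on transactions it has not seen.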

Then, to assess the performance of our model, we plot the ROC (Receiver Operating
Characteristic) curve.
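An ROC curve plots the true positive rate against the false positive rate as the classification cutoff is varied. The report does not show how its curve was produced, so the following is only a base R sketch on made-up scores. The vectors `actual` and `score` are hypothetical.

```r
# Hypothetical predicted scores and true labels (1 = fraud, 0 = genuine)
actual <- c(0, 0, 1, 0, 1, 0, 1, 1, 0, 0)
score  <- c(0.10, 0.35, 0.40, 0.20, 0.80, 0.05, 0.65, 0.90, 0.50, 0.15)

# Sweep the cutoff from high to low and record both rates at each step
cutoffs <- c(Inf, sort(unique(score), decreasing = TRUE))
tpr <- sapply(cutoffs, function(k) sum(score >= k & actual == 1) / sum(actual == 1))
fpr <- sapply(cutoffs, function(k) sum(score >= k & actual == 0) / sum(actual == 0))

# Area under the curve by the trapezoidal rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc

plot(fpr, tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate",
     main = "ROC curve (toy data)")
abline(0, 1, lty = 2)  # the diagonal corresponds to random guessing
```

An area close to 1 means that fraudulent transactions tend to receive higher scores than genuine ones. An area near 0.5 means the scores carry little information.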

Chapter 6
Testing
Testing is a crucial step in the development of a credit card fraud detection system with data
visualization using R. It ensures that the system functions as intended, accurately detects
fraudulent transactions, and provides meaningful data visualization. Here are various types of
testing that should be conducted:
Unit Testing:
Test individual components and functions in isolation to ensure they work correctly.
Verify that data preprocessing, machine learning algorithms, alerting, and visualization functions
produce the expected results.
Integration Testing:
Test the interactions between different system components to verify that they work together
harmoniously. Check data flow between modules, APIs, and external data sources.
System Testing:
Test the system as a whole to ensure all components function cohesively. Verify that the end-to-
end process from data ingestion to data visualization is working as expected.
Functional Testing:
Verify that all system functions meet their specified requirements. Test the real-time fraud
detection, alerting, reporting, and data visualization features.
Non-Functional Testing:
Test non-functional aspects of the system, including performance, security, and scalability. Check
that the system can handle a high volume of transactions without performance degradation.
User Acceptance Testing (UAT):
Involve end-users and stakeholders in testing the system. Ensure that the system meets their
requirements and is user-friendly.
Regression Testing:
Continuously test the system as new features are added or changes are made to existing ones.
Ensure that new updates do not introduce issues or break existing functionality.
Security Testing:
Perform security testing, including penetration testing, to identify vulnerabilities in the system.
Ensure that sensitive data is protected, and there are no security breaches.
Performance Testing:
Evaluate the system's performance under different loads and conditions. Measure response times,
throughput, and scalability.
Data Quality Testing:
Test the quality of the data by verifying that data preprocessing steps handle missing values and
outliers appropriately. Ensure that transformed data maintains its integrity.
Visualization Testing:
Validate the accuracy and interactivity of data visualizations created with R. Ensure that charts,
graphs, and dashboards provide meaningful insights.
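The unit and data quality checks described above can be sketched with base R's stopifnot(). The function `preprocess()` and the data frame `raw` are hypothetical and written only for this example. They mirror the scaling and column-dropping steps of the implementation but are not the project's own code.

```r
# Hypothetical preprocessing helper: standardize Amount and drop the first (Time) column
preprocess <- function(df) {
  df$Amount <- as.numeric(scale(df$Amount))
  df[, -1]
}

# Small hand-made input so that the expected result is known in advance
raw <- data.frame(
  Time   = c(0, 1, 2, 3),
  V1     = c(-1.36, 1.19, -1.36, -0.97),
  Amount = c(10, 20, 30, 40),
  Class  = c(0, 0, 1, 0)
)
clean <- preprocess(raw)

# Unit checks: each stopifnot() aborts with an error if its condition is FALSE
stopifnot(!"Time" %in% names(clean))               # Time column removed
stopifnot(nrow(clean) == nrow(raw))                # no rows lost
stopifnot(abs(mean(clean$Amount)) < 1e-12)         # Amount centred at 0
stopifnot(isTRUE(all.equal(sd(clean$Amount), 1)))  # Amount has unit spread

# Data quality checks: no missing values, labels restricted to 0 and 1
stopifnot(!any(is.na(clean)))
stopifnot(all(clean$Class %in% c(0, 1)))
```

Running such a script after every change to the preprocessing code is a simple form of regression testing.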

Testing should be an iterative process, with issues identified, documented, and addressed before
moving on to the next stage of development. It is essential to involve end-users and stakeholders
in the testing process to gather their feedback and ensure that the system meets their needs and
expectations.

Chapter 7
Results and Snapshots
Coefficients:
              Estimate  Std. Error  z value  Pr(>|z|)
(Intercept) -1.261e+01   1.024e+01   -1.231    0.2182
V1          -1.730e-01   1.274e+00   -0.136    0.8920
V2           1.445e+00   4.231e+00    0.342    0.7327
V3           1.790e-01   2.406e-01    0.744    0.4569
V4           3.136e+00   7.178e+00    0.437    0.6622
V5           1.490e+00   3.804e+00    0.392    0.6952
V6          -1.243e-01   2.220e-01   -0.560    0.5756
V7           1.409e+00   4.226e+00    0.333    0.7388
V8          -3.525e-01   1.746e-01   -2.019    0.0435 *
V9           3.022e+00   8.673e+00    0.348    0.7275
V10         -2.896e+00   6.624e+00   -0.437    0.6620
V11         -9.769e-02   2.827e-01   -0.346    0.7297
V12          1.980e+00   6.567e+00    0.301    0.7630
V13         -7.167e-01   1.256e+00   -0.570    0.5684
V14          1.932e-01   3.289e+00    0.059    0.9532
V15          1.039e+00   2.893e+00    0.359    0.7195
V16         -2.982e+00   7.114e+00   -0.419    0.6751
V17         -1.818e+00   4.998e+00   -0.364    0.7160
V18          2.748e+00   8.132e+00    0.338    0.7354
V19         -1.632e+00   4.772e+00   -0.342    0.7323
V20         -6.993e-01   1.151e+00   -0.607    0.5436
V21         -4.508e-01   1.992e+00   -0.226    0.8209
V22         -1.404e+00   5.190e+00   -0.271    0.7868
V23          1.903e-01   6.119e-01    0.311    0.7559
V24         -1.289e-01   4.470e-01   -0.288    0.7731
V25         -5.784e-01   1.950e+00   -0.297    0.7668
V26          2.659e+00   9.350e+00    0.284    0.7761
V27         -4.540e-01   8.150e-01   -0.557    0.5775
V28         -6.639e-02   3.573e-01   -0.186    0.8526
Amount       9.026e-04   2.874e-03    0.314    0.7535
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1443.40 on 56960 degrees of freedom
Residual deviance:  378.59 on 56931 degrees of freedom
AIC: 438.59

Number of Fisher Scoring iterations: 17
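The figures in this summary can be cross-checked against each other with a few lines of arithmetic. The sketch below copies the numbers from the output above. The variable names are illustrative, and the observation count is inferred from the null degrees of freedom.

```r
# Values copied from the summary output above
null_deviance     <- 1443.40
residual_deviance <- 378.59
n_obs             <- 56961  # rows in the fitted data (null degrees of freedom + 1)
n_coef            <- 30     # intercept + V1..V28 + Amount

# Degrees of freedom reported by glm
null_df     <- n_obs - 1       # 56960
residual_df <- n_obs - n_coef  # 56931

# For 0/1 responses the residual deviance equals -2 * log-likelihood,
# so AIC = residual deviance + 2 * number of coefficients
aic <- residual_deviance + 2 * n_coef  # 438.59

# Likelihood-ratio test of the full model against the intercept-only model
lr_stat <- null_deviance - residual_deviance
lr_df   <- null_df - residual_df  # 29
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# Wald z value and two-sided p-value for V8, the only term flagged at the 0.05 level
z_v8 <- -3.525e-01 / 1.746e-01
p_v8 <- 2 * pnorm(-abs(z_v8))

# Odds ratio for a one-unit increase in V8
or_v8 <- exp(-3.525e-01)
```

The drop in deviance is large relative to its 29 degrees of freedom, so the predictors taken together improve on the intercept-only model even though only V8 is individually flagged.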

7.1 Predicted Values v/s Residuals
7.2 Theoretical Quantiles v/s Std. Deviance Residuals
7.3 Predicted Values v/s sqrt. Std. Deviance Residuals
7.4 Leverage v/s Std. Pearson Residuals

Conclusion and Future Work


Conclusion
It can be concluded that a credit card fraud detection system is essential for any bank or
organization that needs to keep track of fraudulent activity on its customers' credit cards.
Such detection can be performed using machine learning techniques, and the results can be
analyzed efficiently using data visualization techniques in R.

Future Work
The current fraud detection system can be expanded by adding further ways to secure the data
and by applying a wider range of machine learning techniques. In the near future, we will study
more research projects to identify techniques that can make the current model more efficient.

References
[1] Patil, S., Nemade, V., & Soni, P. K. (2018). Predictive modelling for credit card
fraud detection using data analytics. Procedia computer science, 132, 385-395.

[2] Awoyemi, J. O., Adetunmbi, A. O., & Oluwadare, S. A. (2017, October). Credit card
fraud detection using machine learning techniques: A comparative analysis. In 2017
International Conference on Computing Networking and Informatics (ICCNI) (pp. 1-9).
IEEE.

[3] Roy, A., Sun, J., Mahoney, R., Alonzi, L., Adams, S., & Beling, P. (2018, April).
Deep learning detecting fraud in credit card transactions. In 2018 Systems and Information
Engineering Design Symposium (SIEDS) (pp. 129-134). IEEE.

[4] Xuan, S., Liu, G., Li, Z., Zheng, L., Wang, S., & Jiang, C. (2018, March). Random
forest for credit card fraud detection. In 2018 IEEE 15th International Conference on
Networking, Sensing and Control (ICNSC) (pp. 1-6). IEEE.

[5] Jurgovsky, J., Granitzer, M., Ziegler, K., Calabretto, S., Portier, P. E., He-Guelton,
L., & Caelen, O. (2018). Sequence classification for credit-card fraud detection. Expert
Systems with Applications, 100, 234-245.

[6] Elgendy, N., & Elragal, A. (2014, July). Big data analytics: a literature review paper.
In Industrial Conference on Data Mining (pp. 214-227). Springer, Cham.

[7] Gamon, M. (2004, August). Sentiment classification on customer feedback data:
noisy data, large feature vectors, and the role of linguistic analysis. In Proceedings of the
20th international conference on Computational Linguistics (p. 841). Association
for Computational Linguistics.

[8] Leppäaho, E., Ammad-ud-din, M., & Kaski, S. (2017). GFA: exploratory analysis of
multiple data sources with group factor analysis. The Journal of Machine Learning
Research, 18(1), 1294-1298.

[9] Andrienko, G., Andrienko, N., Drucker, S., Fekete, J. D., Fisher, D., Idreos, S., ... &
Stonebraker, M. Big Data Visualization and Analytics: Future Research Challenges and
Emerging Applications.

[10] Kamaruddin, S., & Ravi, V. (2016, August). Credit card fraud detection using big
data analytics: use of PSOAANN based one-class classification. In Proceedings of the
International Conference on Informatics and Analytics (pp. 1-8).

[11] Maniraj, S & Saini, Aditya & Ahmed, Shadab & Sarkar, Swarna. (2019). Credit Card
Fraud Detection using Machine Learning and Data Science. International Journal of
Engineering Research and. 08. 10.17577/IJERTV8IS090031.

[12] Varmedja, Dejan & Karanovic, Mirjana & Sladojevic, Srdjan & Arsenovic, Marko &
Anderla, Andras. (2019). Credit Card Fraud Detection - Machine Learning methods. 1-5.
10.1109/INFOTEH.2019.8717766.

[13] Maniraj, S & Saini, Aditya & Ahmed, Shadab & Sarkar, Swarna. (2019). Credit Card
Fraud Detection using Machine Learning and Data Science. International Journal of
Engineering Research and. 08. 10.17577/IJERTV8IS090031

[14] Varmedja, Dejan & Karanovic, Mirjana & Sladojevic, Srdjan & Arsenovic, Marko &
Anderla, Andras. (2019). Credit Card Fraud Detection - Machine Learning methods. 1-5.
10.1109/INFOTEH.2019.8717766.

[15] F. Carcillo, Y.A. Le Borgne, O. Caelen, Y. Kessaci, F. Oblé, G. Bontempi.
Combining unsupervised and supervised learning in credit card fraud detection. Inf. Sci.
(Ny). (2019), 10.1016/j.ins.2019.05.042
