Presentation 1

Uploaded by

shahreartonmoy
Copyright
© © All Rights Reserved
MACHINE LEARNING

Credit Card
Fraud Detection
MEMBERS

Hiruni Dinesha Ekanayaka – 21053413
Tharushi Dewmini – 21053426
Ahmat Senoussi – 21053376
Tariq Hossain – 21053293
Overview & Main Challenges

The challenge is to recognize fraudulent credit card transactions so that the customers of credit card companies are not charged for items that they did not purchase.

Main challenges involved in credit card fraud detection are:
• Enormous amounts of data are processed every day, and the model built must be fast enough to respond to the scam in time.
• Imbalanced data, i.e. most of the transactions (99.8%) are not fraudulent, which makes it really hard to detect the fraudulent ones.
• Data availability, as the data is mostly private.
• Misclassified data can be another major issue, as not every fraudulent transaction is caught and reported.
• Adaptive techniques used against the model by the scammers.
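To see why the imbalance is a problem, here is a minimal sketch on made-up labels (not the real dataset). The 0.2% fraud rate mirrors the 99.8% figure above; the "model" that flags nothing is purely hypothetical.

```python
# Synthetic labels: 1 = fraud, 0 = normal. The counts are invented so that
# 99.8% of the transactions are normal, as in the bullet above.
n_total = 10_000
n_fraud = 20
y_true = [1] * n_fraud + [0] * (n_total - n_fraud)

# A useless "model" that labels every transaction as normal.
y_pred = [0] * n_total

# Accuracy: share of transactions labelled correctly.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / n_total

# Recall: share of the fraudulent transactions that were caught.
caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = caught / n_fraud

print(f"accuracy = {accuracy:.3f}")  # 0.998
print(f"recall   = {recall:.3f}")    # 0.000
```

Accuracy looks excellent although no fraud is caught, so accuracy alone is not a good measure on this kind of data.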
How to tackle these challenges?
• The model used must be simple and fast enough to
detect the anomaly and classify it as a fraudulent
transaction as quickly as possible.
• Imbalance can be dealt with by properly using some
methods which we will talk about in the next paragraph.

• For protecting the privacy of the user the dimensionality
of the data can be reduced.
• A more trustworthy source that double-checks the data must be used, at least
for training the model.
• We can make the model simple and interpretable so that
when the scammer adapts to it with just some tweaks we
can have a new model up and running to deploy.
Random Forest Algorithm
Random forest is a popular machine learning algorithm that belongs to the supervised learning
technique. It can be used for both classification and regression problems in machine learning. It
is based on the concept of ensemble learning, which is a process of combining multiple
classifiers to solve a complex problem and to improve the performance of a model.
“Random forest is a classifier that contains a number of decision trees on various subsets of the
given dataset and takes the average to improve the predictive accuracy of the dataset.”

We use Random Forest Algorithm here for Credit Card fraud detection.
How it works?
Step 1 (K data points): Select K random data points from the training set.
Step 2 (Decision trees): Build the decision tree associated with the selected data points (subset).
Step 3 (Number N): Choose the number N of decision trees that you want to build.
Step 4 (Repetition): Repeat steps 1 and 2.
Step 5 (Predictions): For new data points, find the prediction of each decision tree, and assign the new data points to the category that wins the majority vote.
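The five steps can be sketched in plain Python. This is only an illustration under simplifying assumptions: each "tree" is a one-split decision stump rather than a full decision tree, the data is a made-up toy set, and the names train_stump, train_forest and predict are hypothetical helpers, not a library API.

```python
import random
from collections import Counter

def train_stump(points, labels):
    """Fit a one-split tree: pick the feature/threshold with the fewest errors."""
    best = None
    for f in range(len(points[0])):
        for thr in sorted({p[f] for p in points}):
            # Try both ways of assigning classes to the two sides of the split.
            for left, right in ((0, 1), (1, 0)):
                errors = sum(
                    (left if p[f] <= thr else right) != y
                    for p, y in zip(points, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, thr, left, right)
    _, f, thr, left, right = best
    return lambda p: left if p[f] <= thr else right

def train_forest(points, labels, n_trees, k, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):  # Steps 3 and 4: repeat until N trees are built
        # Step 1: select K random data points (with replacement).
        idx = [rng.randrange(len(points)) for _ in range(k)]
        # Step 2: build a tree on the selected subset.
        forest.append(train_stump([points[i] for i in idx],
                                  [labels[i] for i in idx]))
    return forest

def predict(forest, point):
    # Step 5: every tree votes, the majority class wins.
    votes = Counter(tree(point) for tree in forest)
    return votes.most_common(1)[0][0]

# Toy data: two features per transaction, label 1 = fraud, 0 = normal.
points = [(1.0, 0.2), (2.0, 0.1), (1.5, 0.4), (2.5, 0.3),
          (8.0, 0.9), (9.0, 0.8), (8.5, 0.7), (9.5, 0.6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = train_forest(points, labels, n_trees=15, k=8)
print(predict(forest, (0.5, 0.1)))
print(predict(forest, (9.8, 0.95)))
```

A real random forest also picks a random subset of features at each split and grows deeper trees; the sketch only shows the sampling and voting idea.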
Why use Random Forest
• It takes less training time as compared to other algorithms.
• It predicts output with high accuracy.
• It runs efficiently even on a large dataset.
• It can also maintain accuracy when a large proportion of data is missing.
Understanding Credit Card Fraud Detection Using Random Forest
1. Import all packages and read the CSV file.
2. Use print(data.describe()) to describe the data.
3. Only 0.17% of all the transactions are fraudulent. The data is highly unbalanced.
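A minimal sketch of how such a share can be computed with pandas. The inline CSV is a made-up stand-in for the real file, and the column names "Amount" and "Class" (1 = fraud, 0 = normal) are assumptions; on this toy sample the share is 10%, not the 0.17% of the real data.

```python
import io
import pandas as pd

# Tiny invented sample standing in for the real CSV file.
csv_text = """Amount,Class
12.50,0
3.99,0
250.00,1
41.20,0
8.75,0
19.99,0
5.00,0
72.10,0
15.30,0
9.45,0
"""

data = pd.read_csv(io.StringIO(csv_text))

# Split the rows by label and compare the sizes.
fraud = data[data["Class"] == 1]
valid = data[data["Class"] == 0]
fraud_share = len(fraud) / len(data)

print(f"Fraud cases: {len(fraud)}")
print(f"Valid cases: {len(valid)}")
print(f"Share of fraud: {fraud_share:.2%}")
```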
4. Print the amount details for fraudulent transactions.
5. Print the amount details for normal transactions.

As we can clearly notice from this, the average transaction amount for the fraudulent ones is higher. This makes this problem crucial to deal with.
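Steps 4 and 5 could look roughly like this. The numbers are invented for illustration, and the column names "Amount" and "Class" are assumptions about the dataset.

```python
import pandas as pd

# Made-up transactions: Class 1 = fraud, Class 0 = normal.
data = pd.DataFrame({
    "Amount": [20.0, 35.0, 15.0, 50.0, 120.0, 300.0],
    "Class":  [0,    0,    0,    0,    1,     1],
})

# Select the Amount column for each group of transactions.
fraud_amounts = data.loc[data["Class"] == 1, "Amount"]
normal_amounts = data.loc[data["Class"] == 0, "Amount"]

print("Amount details of the fraudulent transactions")
print(fraud_amounts.describe())
print("Amount details of the normal transactions")
print(normal_amounts.describe())
```

describe() reports the count, mean, standard deviation, minimum, quartiles and maximum, so the two means can be compared directly.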
6. In the heatmap we can clearly see that most of the features do not correlate with other features, but there are some features that have either a positive or a negative correlation with each other. For example, V2 and V5 are highly negatively correlated with the feature called amount. We also see some correlation between V20 and amount.
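The numbers behind such a heatmap come from a correlation matrix. A small sketch with pandas, on invented values chosen so that V2 falls as Amount rises; the real dataset has many more columns, and the column spelling "Amount" is an assumption.

```python
import pandas as pd

# Made-up values for three columns only.
data = pd.DataFrame({
    "V2":     [1.0, 0.5, 0.0, -0.5, -1.0],
    "V20":    [0.1, 0.3, 0.2, 0.5, 0.4],
    "Amount": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Pairwise Pearson correlation between all numeric columns.
corr = data.corr()
print(corr.round(2))
```

The resulting matrix can then be passed to a plotting function, for example seaborn's heatmap, to draw the picture.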
7. Separating the X and Y values.
Dividing the data into input parameters and output values.
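A sketch of this separation, assuming the label column is called "Class" and every other column is an input; the values are made up.

```python
import pandas as pd

# Made-up rows; "Class" is the output, the rest are inputs.
data = pd.DataFrame({
    "V1":     [0.3, -1.2, 2.4, 0.1],
    "Amount": [12.5, 40.0, 310.0, 7.9],
    "Class":  [0, 0, 1, 0],
})

X = data.drop(columns=["Class"])  # input parameters
Y = data["Class"]                 # output values

# NumPy arrays, in case the model code expects plain arrays.
x_data = X.values
y_data = Y.values
print(x_data.shape, y_data.shape)
```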
8. Training and testing data bifurcation.
We will be dividing the dataset into two main groups: one for training the model and the other for testing the trained model's performance.
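A sketch of the split with scikit-learn's train_test_split. The 80/20 ratio, the fixed random_state and the stratify option are assumptions; the slides do not state which settings were used. The data is synthetic.

```python
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, 10 of them labelled as fraud (1).
X = [[i, i * 2] for i in range(100)]
y = [1 if i % 10 == 0 else 0 for i in range(100)]

# Hold out 20% of the rows for testing. stratify=y keeps the fraud share
# the same in both groups, which matters when fraud is rare.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(len(X_train), len(X_test))
print(sum(y_train), sum(y_test))
```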
9. Building a Random Forest model using scikit-learn.
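A sketch of this step with scikit-learn's RandomForestClassifier. The data is generated with make_classification as a stand-in for the real transactions, and the settings (100 trees, 5% positive class, fixed seeds) are assumptions, not the values from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the transaction data.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Build the forest, train it, and predict on the held-out rows.
rfc = RandomForestClassifier(n_estimators=100, random_state=0)
rfc.fit(X_train, y_train)
y_pred = rfc.predict(X_test)

print(len(rfc.estimators_), "trees built")
print(y_pred[:10])
```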
10. Computing all kinds of evaluation metrics.
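The usual metrics can be computed by hand to see what they mean. The labels and predictions below are invented, and evaluate is a hypothetical helper; scikit-learn offers the same measures as accuracy_score, precision_score, recall_score and f1_score.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, left alone

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up example: 4 fraud cases, 3 of them caught, 1 false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = evaluate(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
# accuracy: 0.80, precision: 0.75, recall: 0.75, f1: 0.75
```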
11. Visualizing the confusion matrix.
As you can see, with our Random Forest model we are
getting a better result even for recall, which is the
trickiest part.
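For reference, a confusion matrix can be built and printed with a few lines of plain Python. The labels are invented, and the layout (rows = actual class, columns = predicted class) follows the convention scikit-learn uses.

```python
def confusion_matrix(y_true, y_pred):
    """2x2 count table: rows are the actual class, columns the predicted class."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

# Made-up example: 1 = fraud, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

matrix = confusion_matrix(y_true, y_pred)

# Simple text rendering of the matrix.
names = ["Normal", "Fraud"]
print(f"{'':>14}{'pred Normal':>12}{'pred Fraud':>12}")
for name, row in zip(names, matrix):
    print(f"{'actual ' + name:>14}{row[0]:>12}{row[1]:>12}")

# Recall is read from the "actual Fraud" row: caught / all actual fraud.
recall = matrix[1][1] / sum(matrix[1])
print(f"recall = {recall:.2f}")  # 0.75
```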
THANKS
