
Imac Pretty 1

The document discusses the evolution of cybersecurity threats and the importance of anomaly detection in protecting the banking sector from fraud. It emphasizes the use of machine learning techniques, such as neural networks and data mining, to identify and predict fraudulent activities in banking transactions. The study highlights the challenges of data acquisition and the necessity of continuous model retraining to adapt to evolving fraud patterns.

RESEARCH IN APPLIED MATHEMATICS AND COMPUTATION

ANOMALOUS PATTERNS IN
CYBERSECURITY: PROTECTING THE
BANKING SECTOR THROUGH ADVANCED
DETECTION
WRITTEN BY: KAROL NOLASCO, JESUS
RODRIGUEZ & DEMIAN PIÑA

The cybersecurity landscape has undergone a radical metamorphosis in response to the rapid advancement of digital threats. From the early incidents of malware to high-level intrusions and large-scale coordinated attacks, the nature and complexity of threats have evolved exponentially.

INTRODUCTION

The cunning and sophistication of malicious actors have given rise to increasingly elusive tactics. The emergence of stealth attacks, the exploitation of unknown vulnerabilities, and the camouflage within conventional data traffic have challenged conventional security approaches. In response to this ever-evolving landscape, anomaly detection has emerged as a beacon of hope in proactive defense against emerging threats. This approach goes beyond the identification of known threats; it delves into the deep analysis of unusual behaviors or deviations from normal patterns, utilizing intelligent algorithms and advanced models to detect suspicious activities before they cause significant harm.

The detection of anomalous patterns not only aims to identify intrusions or malicious activities but also addresses the challenge of internal threats, where legitimate users may behave unexpectedly due to human errors or compromised credentials.

After analyzing all the risks, we identified the following:

Information Security: Intrusion Detection
Network Monitoring: Fault and Issue Detection
Finance: Fraud Detection in Financial Transactions
Health: Disease Detection through Medical Data
Manufacturing: Quality Control in Manufacturing Processes
Internet of Things (IoT): Monitoring of IoT Devices
Digital Marketing: Detection of Fraudulent Activity in Online Advertising
Telecommunications: Network Issue Detection in Telecommunications
Environment: Environmental Monitoring to Identify Significant Changes
Traffic and Transport: Detection of Congestion and Anomalies in Traffic Patterns

However, we will focus on the financial area, which is a common concern for everyone. To address this, we will use resources related to computing and intelligent systems, such as data mining, machine learning, learning algorithms, and neural networks.

Today, credit cards and banking applications play a crucial role in daily financial management. They are convenient and secure tools that facilitate access to financial services and simplify both consumption and fund administration. Credit cards allow transactions without cash, providing flexibility in payments by offering options for full settlement or installment payments.

On the other hand, banking applications have transformed the way people manage their finances. They offer immediate access to balances, transactions, and banking services from mobile devices, facilitating constant monitoring of financial activity. Additionally, they enable transfers, bill payments, and investments, granting users greater control over their resources.

Despite their advantages, the use of credit cards and banking applications carries potential risks, especially fraud. Account theft through phishing (where scammers impersonate legitimate entities to obtain confidential information) and carding (unauthorized use of credit card information) poses a significant threat.

Detecting fraud presents a complex challenge, given the constant change and diversity of methods this phenomenon has experienced. In the global financial sphere, statistics are employed alongside data mining tools and machine learning to identify patterns of fraudulent behavior.

Current detection systems typically offer two types of alerts: one based on probabilistic ratings and another on rule compliance. The former relies heavily on predictive models that generate a "score," while the latter employs filters based on SQL commands.

In this context, the aim is to apply machine learning techniques to develop and compare models that can anticipate fraud related to banking cards. These techniques, such as data mining and machine learning, leverage efficient probabilistic models like generalized regression models, artificial neural networks, decision trees, and Bayesian belief networks. These models seek to determine and predict the possibility of fraud through a probability rating or "score."

Achieving this involves an autonomous learning system that identifies patterns and trends based on historical data, primarily transactions conducted by customers. This data allows for the rapid identification of anomalous behaviors, that is, signals that differ from a customer's usual behavior and could indicate fraudulent activity.

Fraud, whether against businesses or individuals, often arises from mishandling personal or corporate information. Cybercriminals employ various methods and techniques, such as phishing, to steal information through malicious emails.

Hence, the implementation of machine learning techniques has been chosen to develop and compare specific models for predicting fraud related to banking cards. These models are based on a self-learning system that identifies patterns and trends from historical transactional data provided by customers. The aim is to detect irregular circumstances in the use of bank cards early and effectively.

The objective is to acquire knowledge of machine learning techniques and apply them to financial data, with the aim of creating an application capable of anticipating potential fraudulent activities in banking card transactions. This process involves:

Thoroughly researching fraud in banking card transactions and its connection to machine learning.
Evaluating and comparing different models for fraud analysis in banking card transactions by implementing various machine learning techniques.
Developing a predictive algorithm capable of identifying potential fraudulent activities by analyzing historical transaction data of customers.

Phishing: Online deception where scammers impersonate legitimate entities to fraudulently obtain confidential information.
Machine learning: A field of artificial intelligence that develops algorithms and models enabling computers to learn and enhance their performance in specific tasks based on experience, without explicit programming for each situation.
Data mining: The process of discovering patterns, trends, or relevant information within large and complex datasets to gain useful insights and make decisions.

Carding: Criminal activity involving the unauthorized use of credit or debit card information to make fraudulent online purchases.
Fraud: Deceptive or dishonest actions carried out with the intention of gaining personal benefits.
Learning algorithms: Set of rules and logical procedures used by computers to learn patterns from data and improve their performance in specific tasks.
Naive Bayes: A classification algorithm based on Bayes' theorem, assuming independence among features. It's commonly used in classification problems and text analysis.
Random Forest: A machine learning method that uses multiple decision trees to make predictions. Each tree "votes" for the most popular classification, and the final prediction is determined by majority.
Neural Networks: Artificial neural networks that mimic the functioning of the human brain. They consist of layers of interconnected nodes (neurons) that process and transmit information for machine learning tasks like pattern recognition or predictions.

MATERIALS AND METHODS

The dataset under consideration contains records of transactions carried out by European credit cardholders in September 2018. The dataset covers transactions made over two days, totaling 283,012 transactions, among which 285 cases of fraud have been identified, representing 0.172% of the total. It's important to highlight that this sample exhibits a significant imbalance, with the fraud class being a small minority.

The input variables are predominantly numerical and have been transformed through principal component analysis (PCA). The features V1 to V28 represent the principal components derived from this transformation, while the variables that have not undergone PCA are 'time', 'class', and 'amount' (the transaction amount in euros). The variable 'time' records the elapsed time in seconds since the first transaction in the dataset, while 'amount' represents the monetary value of each transaction, the latter being useful for a sample-dependent learning approach. The 'class' variable determines the nature of the transaction, taking the value 1 for fraud and 0 for legitimate transactions.

To analyze the optimal efficiency of the probabilistic model, the artificial neural network algorithm will be employed through the 'scikit-learn' library in Python. The focus of this analysis is on the dataset stratified by fraud class and segmented according to monetary values in euros. The process is divided into two phases: in the first phase, using a training dataset ('train'), the parameters of the probabilistic models will be estimated; in the second phase, employing a test dataset ('test'), predictions will be made.

Start process

Load and evaluate the data

The dataset is extracted from the 'training.csv' file, detailing the class distribution. Findings reveal the following results (which can be seen in Figure 1):

Examples of non-fraudulent transactions: 274,229.
Examples of fraudulent transactions: 476.

This dataset reflects that non-fraudulent transactions represent 99.83%, while fraudulent transactions constitute 0.17% of the total.
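The class distribution reported for 'training.csv' can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' code: the label list is rebuilt from the two counts quoted in the text, and the function name `class_distribution` is hypothetical. In practice the labels would be read from the 'class' column of the file.

```python
from collections import Counter


def class_distribution(labels):
    """Return {label: (count, percentage)} for an iterable of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


# Counts quoted in the text for 'training.csv': 0 = legitimate, 1 = fraud.
# The list is rebuilt from those counts; it is not the real file content.
labels = [0] * 274_229 + [1] * 476

dist = class_distribution(labels)
for label, (n, pct) in sorted(dist.items()):
    print(f"class {label}: {n} transactions ({pct:.2f}%)")
```

Rounded to two decimals, the percentages agree with the 99.83% and 0.17% figures given above.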

Standardizing the dataset

Classifying our columns into two distinct categories, categorical and numerical, is crucial. This allows us to perform individualized preprocessing on each column type, facilitating the standardization of our data into a coherent matrix suitable for subsequent analysis.

Feature and Target Acquisition

The features are collected in the variable 'X', which includes the scaled time and amount. The 'V' components serve as the input values for the model and are linked to the targets contained in the variable 'y'. This methodology allows the model to learn to distinguish between fraudulent and non-fraudulent records.

Training and Testing

The data is divided into training and testing sets, allocating 100% of the data to the training set initially. Later on, cross-validation will be employed to create different training and testing sets.

Over-sampling SMOTE

Given the significant imbalance in the data distribution between fraud cases (0.17%) and non-fraud cases (99.83%), there is a considerable risk that the model overfits to the majority class. This situation can lead the model to incorrectly assume that the majority of cases belong to the 'non-fraud' category.

To mitigate this issue, the 'oversampling' technique is employed, which involves increasing the amount of data in the minority class to match it with the majority class. This approach aims to prevent the model from being biased towards the majority class and achieves a better balance in the model's learning process.

Results

Estimation of original unprocessed data

Predictions were made using Neural Networks and Naive Bayes Bernoulli estimators, employing the data in its original state, without any additional processing. The obtained results were as follows:

Neural Networks

Neural Networks classified non-fraudulent data effectively; however, they failed to identify instances of fraud, presumably because the model assumed that all data belonged to the non-fraud category. This led to an inaccurate classification of instances that were indeed fraudulent. (Neural Network Confusion Matrix)
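The oversampling step described under 'Over-sampling SMOTE' can be illustrated with a small standard-library sketch of the SMOTE idea: each synthetic fraud record is placed at a random point on the segment between a real minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the toy data are assumptions made for illustration; the text does not say which implementation the authors used, and a library version such as the `SMOTE` class in imbalanced-learn would normally be preferred.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical toy data: 8 legitimate rows and 3 fraud rows, two features each.
legit = [(float(i), float(i % 3)) for i in range(8)]
fraud = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

new_fraud = smote(fraud, n_synthetic=len(legit) - len(fraud), k=2)
balanced_fraud = fraud + new_fraud
print(len(legit), len(balanced_fraud))
```

Because every synthetic point lies between two real fraud records, the minority class grows without simply duplicating rows. Oversampling is usually applied to the training portion only, so that evaluation data still reflects the real class balance.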

Gaussian Naive Bayes

The effectiveness of the analyzed models is compromised because the data was used without proper processing. The metrics reveal poor performance in both cases, characterized by low precision: among the transactions flagged as fraud, the relevant results are notably outnumbered by those lacking relevance. In particular, when considering the 'recall score,' the Naive Bayes Bernoulli model achieves superior performance, with a value of 0.67, representing the proportion of elements correctly identified as positives compared to the total real positives. These metrics emphasize that the performance of the Naive Bayes Bernoulli model significantly surpasses that of neural networks in this evaluation. (Confusion matrix NB)

Estimation of original data through processing

Predictions were made using estimators such as Neural Networks and Naive Bayes using previously processed data. The obtained results were as follows:

Neural Networks

It has been observed that neural networks have achieved high accuracy in correctly classifying all fraudulent transactions. However, they show a tendency to make errors by classifying legitimate transactions as fraudulent (Neural Network Confusion Matrix).

Gaussian Naive Bayes
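The precision and recall values discussed in this section follow directly from the four cells of a confusion matrix. The sketch below is illustrative only: the label vectors are hypothetical and do not come from the study, and the helper names are assumptions. It also shows why accuracy alone is misleading on imbalanced data, since a model that never predicts fraud still scores well on accuracy while its recall is zero.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the fraud class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels: 3 frauds among 12 transactions (not the study's data).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.2f} recall={recall:.2f}")

# A model that always answers "not fraud": decent accuracy, zero recall.
always_legit = [0] * len(y_true)
tp0, fp0, fn0, tn0 = confusion_counts(y_true, always_legit)
accuracy0 = (tp0 + tn0) / len(y_true)
recall0 = precision_recall(tp0, fp0, fn0)[1]
print(f"always-legit accuracy={accuracy0:.2f} recall={recall0:.2f}")
```

In the first case two of the three frauds are caught (recall 2/3), but four legitimate transactions are flagged as well, so precision is only 1/3.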

Estimation of processed data

In order to estimate the previously processed data, a grid search process was conducted to determine the optimal hyperparameters for training our model. These hyperparameters were evaluated using the accuracy metric for both model configurations. In this process, a dictionary was initialized where parameters to investigate were assigned along with their respective potential values for each estimator. To ensure a rigorous evaluation, the grid search employed 5-fold cross-validation for both estimators. The training was conducted with the aim of identifying the most suitable parameters for each estimator, which turned out to be those observed in the image named 'Best Parameters'.

The highest score achieved by the most effective estimator using optimal parameters is determined. (Image: 'Best Score')

Cross-validation was employed using optimized parameters through a grid search, evaluating the average accuracy of each model across 10 training iterations. Among the considered models, Neural Networks demonstrated the best performance, standing out for their closer approximation to the desired outcomes. Subsequently, predictions were made using the optimal model obtained through the grid search, which is based on the Neural Networks architecture. (Images "Confusion Matrix NN")

Predictions from the best model obtained through grid search using Naive Bayes Bernoulli. (Confusion Matrix Images for NBB)
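The grid search with 5-fold cross-validation described under 'Estimation of processed data' can be sketched in plain Python. Everything here is an illustrative assumption: a tiny one-dimensional k-nearest-neighbour classifier stands in for the neural network and Naive Bayes estimators, the data is synthetic, and the function names are hypothetical. Only the procedure matches the text: a dictionary of candidate values, mean accuracy over five folds, and selection of the best-scoring combination.

```python
import itertools
import random
from collections import Counter


def knn_predict(train, x, k):
    """Predict the label of x by majority vote of its k nearest training rows."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(data, params, n_folds=5):
    """Mean accuracy of the classifier over n_folds train/test splits."""
    scores = []
    for fold in range(n_folds):
        test = data[fold::n_folds]  # rows whose index % n_folds == fold
        train = [row for i, row in enumerate(data) if i % n_folds != fold]
        hits = sum(knn_predict(train, x, params["k"]) == y for x, y in test)
        scores.append(hits / len(test))
    return sum(scores) / n_folds


def grid_search(data, param_grid, n_folds=5):
    """Evaluate every parameter combination; return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, -1.0
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cross_val_accuracy(data, params, n_folds)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Synthetic 1-D data (an assumption): class 1 tends to have larger values.
rng = random.Random(42)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(60)]
data += [(rng.gauss(3.0, 1.0), 1) for _ in range(60)]
rng.shuffle(data)

best_params, best_score = grid_search(data, {"k": [1, 3, 5, 7]})
print(best_params, round(best_score, 3))
```

With scikit-learn, which the study reports using, the same procedure is available through `GridSearchCV` with `cv=5`, which exposes the selected combination and its score as `best_params_` and `best_score_`.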

DISCUSSION

As mentioned earlier, fraud detection becomes a last resort strategy when preventive measures fail. Currently, traditional tools such as statistics and advanced Data Mining techniques, including Neural Networks, Bayesian Belief Networks, and Decision Trees, are employed. These techniques have allowed the creation of more sophisticated models to identify fraudulent behaviors.

Since fraud patterns constantly evolve, the involvement of experts in rule formulation is crucial. Analysts, by daily monitoring potential fraudulent behaviors, regularly discover new cases. Therefore, it becomes imperative to frequently retrain Data Mining models to update them with more recent data.

Data Mining provides various technologies to detect fraudulent operations, often requiring the combination of several of these to yield improved results. The precise selection and combination of these technologies largely depend on the particularities of the available data.

During our project, we encountered one of the most significant limitations: obtaining data, specifically banking data. This task became a considerable obstacle, presenting challenges both in acquiring the data and in the subsequent use of the gathered information.

The complexity of obtaining banking data stemmed from various factors, including the highly sensitive and confidential nature of this information. Financial institutions impose rigorous security and privacy policies, making it challenging to acquire data for research or model development purposes, even when aiming to contribute to enhancing security systems such as detecting credit card fraud.

Furthermore, once overcoming the barriers to obtaining the data, we are faced with the additional challenge of effectively using them in our analyses and models. The inherent complexity in handling banking data demanded a meticulous approach to ensure compliance with regulations and standards, as well as the preservation of information integrity and confidentiality.

CONCLUSION

After analyzing the results, it can be affirmed that the neural network model, specifically the multi-layer perceptron, shows higher efficacy in predicting fraudulent and non-fraudulent transactions compared to Naive Bayes. In the realm of Machine Learning, it is crucial to normalize input variables to optimize the performance of many algorithms. Normalization involves adjusting variable values to a specific range, but incorrect implementation or improper method selection can distort data and affect analysis. There's no one-size-fits-all approach for all variable forms; understanding data distribution, identifying anomalies, and evaluating ranges are essential steps to select the most appropriate technique without distorting information.

For those interested in delving deeper into this field, it is recommended not only to address data normalization but also to proactively seek optimal methodologies. Normalization is a crucial component in data analysis, enabling effective comparison of variables with different scales and units. An effective normalization strategy could involve transforming data to fit a specific distribution or applying techniques like min-max scaling to adjust values to a predefined range. This approach ensures coherence and comparability of data, particularly in heterogeneous and complex datasets.

In addition to normalization, it is imperative to consider and apply advanced techniques for fraud detection. The integration of machine learning models, artificial intelligence, and pattern analysis greatly enriches a system's ability to identify anomalous and potentially fraudulent behaviors.
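The two normalization strategies mentioned in the conclusion, min-max scaling to a predefined range and rescaling to a standard distribution, can be compared on a small example. This is a sketch under stated assumptions: the transaction amounts are hypothetical, and the helper names `min_max_scale` and `standardize` are illustrative rather than taken from the study.

```python
from statistics import mean, pstdev


def min_max_scale(values, lo=0.0, hi=1.0):
    """Linearly map values so the minimum becomes lo and the maximum hi."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:  # constant column: nothing to spread out
        return [lo for _ in values]
    span = v_max - v_min
    return [lo + (hi - lo) * (v - v_min) / span for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical transaction amounts in euros, with one very large payment.
amounts = [2.5, 10.0, 40.0, 120.0, 2500.0]

scaled = min_max_scale(amounts)
z_scores = standardize(amounts)

print([round(v, 3) for v in scaled])
print([round(v, 3) for v in z_scores])
```

The single large payment pushes every other min-max value below 0.05, which is one concrete way an ill-chosen normalization can distort the data, as the conclusion warns. Inspecting the distribution first helps decide whether a different scaling is more appropriate.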
