Team 09 Report (2) Removed
BACHELOR OF TECHNOLOGY
IN
INFORMATION TECHNOLOGY
Submitted by
G. Navya 22881A12E7
G. Susanna 22881A12E9
Y Jai Anthony Rahul Reddy 22881A12K0
SUPERVISOR
Dr. G. Suryanarayana
Associate Professor
May, 2025
CERTIFICATE
G. Navya 22881A12E7
G. Susanna 22881A12E9
Y Jai Anthony Rahul Reddy 22881A12K0
Examiner
The satisfaction that accompanies the successful completion of a task would
be incomplete without mention of the people who made it possible,
whose constant guidance and encouragement crowned all our efforts with success.
We sincerely thank Dr. Saroja Kumar Rout, our project Convenor, for his
valuable support and guidance during our mini project.
We avail this opportunity to express our deep sense of gratitude and heartfelt
thanks to Dr. Teegala Vijender Reddy, Chairman, and Sri Teegala
Upender Reddy, Secretary of VCE, for providing a congenial atmosphere to
complete this mini project successfully.
We also thank all the staff members of the Information Technology department for
their valuable support and generous advice. Finally, we thank all our friends
and family members for their continuous support and enthusiastic help.
G. Navya
G. Susanna
Y Jai Anthony Rahul Reddy
Abstract
Short text message phishing (smishing) and voice phishing (vishing) attacks
have become more frequent, leading to the loss of sensitive information such
as passwords, credit card numbers, and personal identification details, as well
as identity theft and significant financial loss through unauthorized
transactions. As mobile technology and voice communication platforms become more
integrated into daily life, cybercriminals exploit these channels to deceive
users and gain unauthorized access to private data. The increasing
sophistication of such attacks makes early and accurate detection more critical
than ever. Traditional phishing detection methods, largely based on machine
learning algorithms such as Decision Trees, Support Vector Machines (SVM), and
Naive Bayes classifiers, have been widely studied and applied. However, these
approaches often struggle to handle the sequential nature of textual and
audio data, which limits their effectiveness in capturing complex contextual and
temporal patterns. Additionally, the scarcity of large, well-labeled datasets
for smishing and vishing further challenges the development of robust detection
systems. To overcome these limitations, this work explores the use of advanced
deep learning techniques tailored to sequential and high-dimensional
data. Models such as Long Short-Term Memory (LSTM), Gated Recurrent
Units (GRU), Convolutional Neural Networks (CNN), and Residual Networks
(ResNet) offer improved capabilities in identifying hidden features and patterns
in both SMS content and audio signals. This research focuses on building
a unified framework capable of detecting both smishing and vishing attacks
with high accuracy and real-time performance. The proposed system not only
enhances detection precision but also enables faster response times, helping to
prevent data breaches and financial loss. By integrating advanced deep learning
models, this work aims to provide a more comprehensive and scalable solution
to combat evolving phishing threats, thereby strengthening cybersecurity
measures and protecting user privacy across digital communication channels.
Table of Contents
3.4.2 Mel Spectrogram Generation for Audio
3.5 Summary of Model Pipelines
CHAPTER 4 System Architecture and Model Design
4.1 Overall System Architecture
4.2 Smishing Detection Models
4.2.1 LSTM-based Model
4.2.2 GRU-based Model
4.2.3 CNN-based Model
4.3 Vishing Detection Models
4.3.1 CNN + BiGRU Model
4.3.2 Stacked GRU Model
4.3.3 ResNet-Inspired Model
4.4 Tools and Technologies Used
4.5 Design Constraints and Assumptions
CHAPTER 5 Implementation and Experimental Results
5.1 Experimental Setup and Parameters
5.2 Performance Metrics Used
5.3 Results for Smishing Detection
5.4 Results for Vishing Detection
5.5 Graphical Analysis and Model Comparisons
5.6 Discussion on Findings
CHAPTER 6 Conclusions and Future Scope
REFERENCES
List of Tables
List of Figures
4.1 Flowchart of Deep Learning models on SMS and voice call datasets
Abbreviations
Abbreviation Description
NB Naive Bayes
Introduction
The increasing use of mobile devices in digital transactions and identity
verification has made these attack vectors particularly dangerous. Smishing and
vishing can lead to severe consequences such as identity theft, unauthorized
banking transactions, data breaches, and the compromise of personal and
professional credentials [2], [8]. These consequences are not only financial but
also psychological, as victims often experience anxiety and a loss of trust in
digital systems.
Traditional security tools such as spam filters and firewalls offer limited
protection against smishing and vishing. This is due to the dynamic and
evolving nature of these attacks, which often change their tactics to bypass
signature-based detection methods [3]. Attackers employ linguistic
manipulation, URL shorteners, and even AI-generated voice messages to make
their communications appear credible and evade traditional filters [14].
The threat landscape has also been exacerbated by the rise in mobile app
usage and remote work. More users are now dependent on mobile devices
for tasks such as banking, business communication, and e-commerce. This
trend has provided attackers with a larger target base and more opportunities
to exploit. Studies have shown that user trust in mobile communication is
frequently abused by attackers, leading to a higher success rate of smishing
and vishing compared to email phishing [11].
Furthermore, adversarial methods have emerged that intentionally deceive
detection systems, even those based on deep learning. These methods involve
altering message structures or audio features to exploit weaknesses in model
generalization [3]. As a result, detection models must now be both context-
aware and robust to subtle variations in attack strategies.
In recent years, deep learning approaches have been explored to counter
these threats. Techniques such as recurrent neural networks (RNNs), convo-
lutional neural networks (CNNs), and transformer-based models have shown
promise in identifying complex patterns in both text and audio data [18],
[19]. These models are capable of capturing semantic, syntactic, and acoustic
features that traditional rule-based systems miss.
To effectively combat smishing and vishing, detection systems must evolve
Behavioral studies have shown that many users, especially those with low
digital literacy or high trust in institutions, are particularly vulnerable to mobile
phishing [11]. Attackers capitalize on these vulnerabilities by crafting messages
or voice calls that appear urgent, familiar, and trustworthy. As a result, users
may unwittingly compromise their accounts or install harmful software on their
devices, leading to broader data breaches or financial exploitation.
The integration of mobile phones with digital banking, e-commerce, health-
care, and government services further amplifies the risks. Vishing attacks that
target voice-based authentication methods or phone-based password recovery
systems can bypass traditional forms of security, posing threats to entire digital
ecosystems [8]. Likewise, smishing attacks can lead to ransomware installations
or credential theft, which are then used in more complex cyberattacks.
Traditional response mechanisms such as blacklists, keyword filters, and rule-
based detection systems are increasingly inadequate. These static approaches
fail to detect evolving and contextualized threats that vary in content, language,
or behavior. Attackers often change tactics and payloads to evade detection,
making traditional models obsolete without frequent manual updates [3].
The situation is further complicated by the limitations of existing network-
level defenses, which often cannot inspect encrypted content or voice signals
without violating user privacy. This necessitates the use of device-level intelli-
– Unlike email phishing, smishing messages are shorter and more
context-dependent, making them harder to detect using conventional NLP
methods [5], [6].
– Recent studies demonstrated that deep learning models like LSTM and
BERT can be deceived using carefully crafted adversarial examples [3].
– Publicly available datasets for smishing and vishing are limited and
imbalanced, affecting model training [25].
Literature Survey
The current paradigm has shifted decisively toward deep learning architec-
tures, yielding significant improvements in detection capabilities. Research by
[16] established that convolutional neural networks (CNNs) could effectively
detect localized phishing patterns (including urgent action cues and suspicious
n-grams) with 96% precision. Complementary work by [18] demonstrated
that long short-term memory (LSTM) networks achieved superior performance
(98.2% F1-score) in capturing sequential dependencies and contextual relation-
ships within message content. The introduction of transformer models marked
a substantial breakthrough, with [19] reporting BERT-based systems attaining
99.1% accuracy through advanced contextual analysis of message semantics,
albeit requiring approximately 15 times more computational resources than
traditional machine learning approaches. Several persistent challenges continue
to impact smishing detection systems.
All models were trained using the binary cross-entropy loss function and
the Adam optimizer. Training was conducted over 20+ epochs with an 80:20
train-test split. Evaluation metrics included accuracy, precision, recall, and F1-
score. The CNN and LSTM models achieved an accuracy of 99% for smishing
detection, whereas the Stacked GRU model achieved 98.90% accuracy for
vishing detection, validating the robustness of these architectures [5], [14].
These results confirm that deep learning models, with their ability to
model non-linear and hierarchical relationships, are particularly suited for
phishing detection. Moreover, hybrid architectures that combine spatial and
temporal analysis further enhance detection performance, especially in audio-
based vishing scenarios [14], [26].
Challenges:
• Limited Context: SMS messages are typically short and lack rich context,
making it harder to detect subtle phishing cues [2], [16].
Challenges:
• Small Dataset Size: With only 200 samples, the dataset is prone to
overfitting when training deep learning models. It limits the model’s
generalization capacity [20], [26].
Research Methodology
measured using metrics like accuracy, precision, recall, and F1-score.
The SMS dataset used in this study is sourced from the UCI Machine Learning
Repository and contains 5,572 messages labeled as either “ham” or “spam.”
Each entry includes the text content and the label. This dataset is preprocessed
through text cleaning, tokenization, and padding to ensure uniform input for
deep learning models.
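The cleaning, tokenization, and padding steps named above can be sketched in plain Python. This is only an illustration under stated assumptions: the regular expression, the reserved `<pad>` and `<unk>` indices, the helper names, and the `max_len` value are hypothetical choices, not the exact pipeline used in this project.

```python
import re

def clean_text(message: str) -> str:
    """Lowercase the message and collapse every run of non-alphanumeric characters into one space."""
    lowered = message.lower()
    return re.sub(r"[^a-z0-9]+", " ", lowered).strip()

def build_vocab(messages):
    """Map each distinct token to an integer index; 0 is reserved for padding, 1 for unknown tokens."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for message in messages:
        for token in clean_text(message).split():
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(message, vocab, max_len):
    """Convert a message to a fixed-length list of indices: truncate long inputs, right-pad short ones with 0."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in clean_text(message).split()]
    ids = ids[:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))

# Two made-up messages, used only to show the shape of the output.
corpus = ["WINNER!! Claim your FREE prize now", "Are we meeting at 5 today?"]
vocab = build_vocab(corpus)
encoded = [encode(m, vocab, max_len=8) for m in corpus]
```

Every encoded message has the same length, which is what allows the messages to be batched as uniform input for the sequence models.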
The vishing dataset consists of 200 labeled audio recordings simulating both
normal and fraudulent calls. Features include call type, gender, duration,
transcript, and the audio file itself. These recordings are converted into Mel
spectrograms for deep learning model input.
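A Mel spectrogram groups frequency bins into bands that are evenly spaced on the mel scale rather than in hertz. The sketch below shows only that frequency mapping, using the HTK-style formula mel = 2595 log10(1 + f / 700), which is one common convention; audio libraries may default to a different variant. The band count and frequency range are hypothetical values, since this section does not state the settings used for the 200 recordings.

```python
import math

def hz_to_mel(freq_hz: float) -> float:
    """HTK-style mel scale: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_centers(n_mels: int, f_min: float, f_max: float):
    """Center frequencies (Hz) of n_mels bands spaced evenly on the mel scale between f_min and f_max."""
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_mels + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_mels)]

# Hypothetical settings: 8 bands between 0 Hz and 8 kHz.
centers = mel_band_centers(n_mels=8, f_min=0.0, f_max=8000.0)
```

The resulting centers are closely spaced at low frequencies and spread out at high frequencies, which mirrors how the spectrogram allocates more resolution to the range where most speech energy lies.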
• Conversion to lowercase
verts word indices into dense vectors. This embedding is followed by a 64-unit
LSTM layer that maintains a memory of prior words and their relationships.
LSTM networks are particularly suitable for sequence modeling tasks like
smishing detection because they are capable of retaining important information
over long text sequences and mitigating vanishing gradient problems. The
output from the LSTM layer is passed to a fully connected Dense layer with
a sigmoid activation function, which outputs a binary classification indicating
whether a message is smishing or safe.
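To make the memory mechanism concrete, the following is a minimal pure-Python sketch of an LSTM forward pass followed by a sigmoid output unit. It is not the trained model from this project: the weights are random, the hidden size is 4 rather than 64 to keep the example small, and the function names and the embedded input vectors are hypothetical.

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def init_lstm(input_dim, hidden_dim, seed=0):
    """Small random weights for the four gates: input (i), forget (f), candidate (g), output (o)."""
    rng = random.Random(seed)
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {gate: {"W": matrix(hidden_dim, input_dim + hidden_dim), "b": [0.0] * hidden_dim}
            for gate in "ifgo"}

def lstm_step(params, x, h_prev, c_prev):
    """One time step: every gate reads [x, h_prev]; the cell state c carries memory forward."""
    z = x + h_prev  # list concatenation of the current input and the previous hidden state
    def gate(name, activation):
        p = params[name]
        return [activation(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(p["W"], p["b"])]
    i, f = gate("i", sigmoid), gate("f", sigmoid)
    g, o = gate("g", math.tanh), gate("o", sigmoid)
    # Additive cell update: keep part of the old memory (f * c_prev), add new content (i * g).
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c

def classify(params, dense_w, dense_b, embedded_sequence):
    """Run the LSTM over the sequence, then apply one sigmoid unit to the final hidden state."""
    hidden_dim = len(dense_w)
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    for x in embedded_sequence:
        h, c = lstm_step(params, x, h, c)
    return sigmoid(sum(w * v for w, v in zip(dense_w, h)) + dense_b)

# Three made-up 3-dimensional word embeddings standing in for an embedded SMS.
params = init_lstm(input_dim=3, hidden_dim=4)
sequence = [[0.2, -0.1, 0.5], [0.0, 0.3, -0.4], [0.7, 0.1, 0.0]]
score = classify(params, dense_w=[0.5, -0.5, 0.25, 0.1], dense_b=0.0, embedded_sequence=sequence)
```

The additive form of the cell-state update is what lets gradients flow across many time steps, which is the property the paragraph above refers to when it mentions mitigating vanishing gradients. The output `score` lies between 0 and 1 and would be thresholded to decide between smishing and safe.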
Although LSTMs provide strong performance in modeling temporal se-
quences, they tend to be computationally intensive and require more training
time compared to simpler architectures like GRUs. Nevertheless, this model
demonstrated high recall, indicating its strength in identifying most malicious
messages, albeit with slightly increased false positives.
Figure 4.1: Flowchart of Deep Learning models on SMS and voice call
datasets
The dataset was divided using an 80:20 train-test split to ensure reliable
performance evaluation. Furthermore, 10% of the training data was reserved for
validation to monitor the models for overfitting or underfitting during training.
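As a rough illustration of this partitioning, the sketch below shuffles sample indices and carves out the test and validation portions. The function name, the random seed, and the use of a plain (non-stratified) shuffle are assumptions; the report does not say whether the split was stratified by class.

```python
import random

def split_indices(n_samples, test_frac=0.2, val_frac=0.1, seed=42):
    """Shuffle indices, hold out test_frac for testing, then val_frac of the remainder for validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_frac)
    test, train_full = indices[:n_test], indices[n_test:]
    n_val = int(len(train_full) * val_frac)
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test

# Applied to the 5,572 messages of the SMS dataset.
train_idx, val_idx, test_idx = split_indices(5572)
```

With these fractions the 5,572 SMS messages yield 1,114 test samples, 445 validation samples, and 4,013 samples used for weight updates.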
All models were trained using the Adam optimizer, which is a widely adopted
stochastic gradient descent variant known for its computational efficiency and
suitability for non-convex optimization. The learning rate was set to 0.001
for most models, and binary cross-entropy was used as the loss function due
to the binary nature of the classification task. The following hyperparameters
were fixed across most experiments:
• Batch Size: 32
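The loss and optimizer described above can be written out for a single scalar parameter. This is a sketch for intuition, not the framework code used in the experiments: only the learning rate of 0.001 comes from the text, while `beta1`, `beta2`, and `eps` are the commonly used Adam defaults and are assumed here.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)); probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# An uninformative prediction of 0.5 for both classes gives a loss of ln(2).
loss = binary_cross_entropy([1, 0], [0.5, 0.5])
# First update from param=0.3 with gradient 2.0: the step size is about lr, regardless of gradient scale.
w, m, v = adam_step(param=0.3, grad=2.0, m=0.0, v=0.0, t=1)
```

Because the update divides by the root of the squared-gradient average, the first step moves the parameter by roughly the learning rate itself, which is one reason a single rate such as 0.001 can work across models with differently scaled gradients.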
1. Accuracy
Accuracy is the most commonly used metric. It measures the overall
correctness of the model as the ratio of correctly predicted instances
to the total number of instances:
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]
where $TP$ is True Positives, $TN$ is True Negatives, $FP$ is False Positives,
and $FN$ is False Negatives. While accuracy is informative, it can be misleading
in imbalanced datasets where one class significantly outnumbers the other.
2. Precision
Precision evaluates the model’s ability to correctly identify only the relevant
positive cases (i.e., spam or fraud). It is defined as the ratio of true positives
to the sum of true and false positives:
\[
\text{Precision} = \frac{TP}{TP + FP}
\]
High precision indicates that fewer legitimate messages or calls were wrongly
classified as malicious.
3. Recall (Sensitivity)
Recall measures the model’s ability to detect all relevant positive instances,
i.e., the proportion of actual smishing or vishing messages that were correctly
identified:
\[
\text{Recall} = \frac{TP}{TP + FN}
\]
A high recall indicates that most fraudulent messages or calls were correctly
detected, although it may come at the cost of lower precision.
4. F1-Score
The F1-score is the harmonic mean of precision and recall, offering a
balanced metric when there is an uneven class distribution:
\[
\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\]
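The four metrics can be computed directly from confusion-matrix counts, as in the sketch below. The function name and both sets of counts are hypothetical and chosen only for illustration; they are not results from this project.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts (0.0 where a ratio is undefined)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# A reasonably balanced classifier on 200 made-up samples.
balanced = classification_metrics(tp=90, tn=95, fp=5, fn=10)

# A degenerate classifier that labels every one of 1,000 made-up messages as legitimate.
majority = classification_metrics(tp=0, tn=900, fp=0, fn=100)
```

The second case shows why accuracy alone can mislead on imbalanced data: the degenerate classifier reaches 0.9 accuracy while its recall and F1-score are both 0, because it never detects a single malicious message.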
Both LSTM and CNN achieved the highest accuracy, while CNN slightly
outperformed others in precision and recall, indicating better classification of
phishing messages.
Table 5.3: Comparison Between Existing Models and Proposed Project Model
• Sequential models are ideal for both text and audio sequences.
• Transfer Learning: Use of transformer models like BERT for text and
AudioBERT for voice to further enhance detection accuracy.
[11] P. Kumarasinghe, D. Dissanayake, P. Gamage, and G. U. Ganegoda.
“User Behavior Analysis in Determining the Vulnerable Category of
Vishing and Smishing”. In: 2023 5th International Conference on Ad-
vancements in Computing (ICAC). Colombo, Sri Lanka, 2023, pp. 35–40.
doi: 10.1109/ICAC60630.2023.10417682.
[12] W. L. T. T. N. Kumarasiri, M. K. J. C. Siriwardhana, S. A. D. S. L.
Suraweera, A. N. Senarathne, and S. M. B. Harshanath. “Cybersmish: A
Proactive Approach for Smishing Detection and Prevention using Machine
Learning”. In: 2023 7th International Conference on I-SMAC (IoT in
Social, Mobile, Analytics and Cloud) (I-SMAC). Kirtipur, Nepal, 2023,
pp. 210–217. doi: 10.1109/I-SMAC58438.2023.10290228.
[13] H. E. Karhani, R. A. Jamal, Y. B. Samra, I. H. Elhajj, and A. Kayssi.
“Phishing and Smishing Detection Using Machine Learning”. In: 2023
IEEE International Conference on Cyber Security and Resilience (CSR).
Venice, Italy, 2023, pp. 206–211. doi: 10.1109/CSR57506.2023.10224954.
[14] M. A. Khan, R. Kumar, and P. K. Singh. “A Hybrid CNN-LSTM Model
for Vishing Attack Detection in VoIP Networks”. In: IEEE Access 9
(2021), pp. 123456–123470. doi: 10.1109/ACCESS.2021.3056789.
[15] A. Ghourabi. “SM-Detector: A security model based on BERT to
detect SMiShing messages in mobile environments”. In: Concurrency
and Computation: Practice and Experience (2021). [online] Available:
https://doi.org/10.1002/cpe.6452.
[16] A. K. Jain, B. Gupta, and S. Joshi. “Deep Learning-Based Detection
of Smishing Attacks Using NLP Techniques”. In: IEEE Transactions
on Information Forensics and Security 15 (2020), pp. 2345–2358. doi:
10.1109/TIFS.2020.2978765.
[17] I. S. Mambina, J. D. Ndibwile, and K. F. Michael. “Classifying Swahili
Smishing Attacks for Mobile Money Users: A Machine-Learning Ap-
proach”. In: IEEE Access 10 (2022), pp. 83061–83074.
[18] S. Y. Yerima and M. K. Alzaylaee. “Deep Learning for SMS Phishing
(Smishing) Detection: A Comparative Analysis”. In: IEEE Communications
Surveys & Tutorials 23.2 (2021), pp. 1024–1045. doi:
10.1109/COMST.2021.3069872.
[19] T. H. Nguyen, Q. V. Pham, and T. T. Huynh. “BERT-Based Smishing
Detection: A Transformer Approach for Text Classification”. In: IEEE
Internet of Things Journal 8.10 (2021), pp. 8765–8777. doi: 10.1109/
JIOT.2021.3095432.
[20] R. K. Malviya and S. K. Singh. “A Deep Neural Network Approach
for Real-Time Vishing Fraud Detection”. In: IEEE Systems Journal 15.3
(2021), pp. 4321–4332. doi: 10.1109/JSYST.2020.3045678.
[21] L. Wang, H. Li, and Y. Chen. “Ensemble Learning for Detecting Smishing
Messages in Mobile Networks”. In: IEEE Transactions on Mobile Com-
puting 20.5 (2021), pp. 1987–2001. doi: 10.1109/TMC.2020.3012345.