D (1) (1) Report2 Srushti
2024-2025
DEPARTMENT OF ELECTRONICS AND COMMUNICATION
ENGINEERING
SRI TARALABALU JAGADGURU INSTITUTE OF TECHNOLOGY
RANEBENNUR-581115
(Affiliated to Visvesvaraya Technological University, Belagavi)
CERTIFICATE
This is to certify that the Seminar on “A Dynamic Reward-Based Deep Reinforcement
Learning for IoT Intrusion Detection” has been successfully presented at STJ INSTITUTE
OF TECHNOLOGY by Srushti Siddappa Menasinakai, 2SR21EC050, in partial fulfilment of
the requirements for the degree of Bachelor of Electronics and Communication Engineering
of Visvesvaraya Technological University, Belagavi during the academic year 2024-2025. It is
certified that all corrections indicated for Internal Assessment have been incorporated in the
report deposited in the department library. The Seminar report satisfies the academic
requirements in respect of Seminar work for the above degree.
Submitted by
Date:
Place: Ranebennur
The rapid proliferation of Internet of Things (IoT) technology has enhanced human quality of life
while simultaneously introducing significant cybersecurity challenges. IoT devices are constrained
by limited computational resources, storage capacity, and power supply, rendering them susceptible
to botnet exploitation and Distributed Denial of Service (DDoS) attacks. Conventional signature-
based intrusion detection systems (IDS) are frequently deployed to mitigate network attacks in IoT
environments. However, these systems heavily rely on manual expert knowledge and exhibit limited
adaptability to emerging threats, particularly when confronted with novel attacks such as zero-day
exploits. Deep reinforcement learning (DRL), which enables agents to make autonomous decisions
through policy function approximation, has demonstrated promising results in network attack
identification. This paper proposes a DRL-based intrusion detection system for identifying diverse
attack vectors in IoT environments. To enhance the model’s sensitivity to multi-class samples, we
design a dynamic reward function, thereby improving the overall network attack recognition
capabilities. The proposed model is validated using the Bot-IoT dataset. Experimental results
indicate that the model achieves 99% accuracy in classification.
1 INTRODUCTION
1.1 HISTORY
1.2 PROBLEM STATEMENT
1.3 OBJECTIVES
2 LITERATURE SURVEY
3 SYSTEM ARCHITECTURE
4 ADVANTAGES, DISADVANTAGES AND APPLICATIONS
5 CONCLUSION AND FUTURE WORK
REFERENCES
A Dynamic Reward-Based Deep Reinforcement Learning for IoT Intrusion Detection
CHAPTER 1:
INTRODUCTION
The rapid proliferation of IoT technology has enhanced efficiency in smart cities, homes, and industrial
automation. However, sophisticated cyber attacks pose unprecedented challenges to IoT security.
Existing intrusion detection systems lack scalability and adaptability for the increasing number of
connected devices and emerging attack vectors, necessitating more robust defensive mechanisms.
Intrusion Detection Systems (IDS) originated from Denning's intrusion-detection model [1]. Conventional IDS
include signature-based and anomaly-based approaches. Signature-based IDS identify known threats but struggle
with zero-day attacks, while anomaly-based IDS detect deviations from baseline behavior, enabling discovery of
unknown threats. Machine learning (ML) has advanced intrusion detection by analyzing network traffic
patterns. Deep learning (DL) leverages multi-layer neural networks to approximate complex functions,
demonstrating superior capabilities in attack classification.
1.1 History:
The field of intrusion detection has evolved significantly over the years, beginning with rule-based expert
systems in the early 1980s. These early IDS models relied on predefined signatures of known attacks to detect
intrusions. While effective against well-documented threats, they lacked the capability to detect zero-day
attacks and sophisticated malware.
In the 1990s and early 2000s, the rise of network-based intrusion detection systems (NIDS) introduced
anomaly detection techniques, which analyzed deviations from normal network behavior. These methods
employed statistical models, clustering algorithms, and basic machine learning classifiers like decision trees
and support vector machines (SVM). However, as IoT networks became more complex, these approaches
struggled with scalability and false positive rates.
With the rapid growth of deep learning in the 2010s, researchers began integrating neural networks into IDS.
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) improved the accuracy of
anomaly detection, but these models required large labeled datasets and extensive computational resources.
More recently, reinforcement learning has gained attention due to its ability to adapt to evolving attack
strategies without extensive retraining.
The latest advancements in Deep Reinforcement Learning (DRL) have paved the way for highly adaptive and
self-learning IDS models. These models leverage real-time feedback loops to adjust their detection strategies
dynamically, making them well-suited for IoT environments where security threats evolve continuously.
The proposed dynamic reward-based DRL approach builds on this historical progression, introducing a more
efficient and adaptable intrusion detection mechanism that addresses the limitations of previous systems.
It uses the Bot-IoT dataset, a benchmark dataset for IoT security, and achieves 99% accuracy in attack
detection by addressing class imbalance with negative sampling and SARSA encoding.
Modern Advancements:
Dynamic reward-based Deep Reinforcement Learning (DRL) for IoT intrusion detection has evolved
significantly over the past decade. In 2015, early works focused on basic anomaly detection using traditional
machine learning models. By 2018, researchers integrated Deep Q-Networks (DQN) and Convolutional Neural
Networks (CNN) to improve accuracy and adaptability. In 2020, Principal Component Analysis (PCA) and
Recursive Feature Elimination (RFE) were introduced for efficient feature selection, reducing computational
overhead. From 2021 onwards, adaptive reward mechanisms and hybrid AI approaches (e.g., combining DRL
with anomaly-based IDS) have enhanced real-time intrusion detection. Recent works in 2023-2024 emphasize
cloud-based IDS solutions and federated learning, enabling scalable and privacy-preserving security in smart
homes, industrial IoT, and critical infrastructure.
1.2 Problem statement:
With the increasing number of IoT devices, cybersecurity threats such as malware, unauthorized access, and
denial-of-service (DoS) attacks are becoming more common. Traditional intrusion detection systems (IDS)
struggle to handle the dynamic and complex nature of IoT networks due to their limited adaptability and
predefined rule-based mechanisms. These systems often produce high false positive rates and fail to detect
new or evolving threats efficiently.
1.3 Objectives:
1. Improve detection accuracy by leveraging reinforcement learning to adapt to new attack patterns.
2. Reduce false positives through an optimized reward mechanism that refines the learning process.
3. Enhance adaptability by allowing the IDS to learn and evolve dynamically without manual intervention.
CHAPTER 2:
LITERATURE SURVEY
[1] D. Denning, “An Intrusion-Detection Model,” IEEE Transactions on Software Engineering, vol. SE-13,
pp. 222–232, Feb. 1987.
[2] B. Sudharsan, J. G. Breslin, and M. I. Ali, “ML-MCU: A Framework to Train ML Classifiers on MCU-
Based IoT Edge Devices,” IEEE Internet of Things Journal, vol. 9, pp. 15007–15017, Aug. 2022.
[3] Z. Bao, Y. Lin, S. Zhang, Z. Li, and S. Mao, “Threat of Adversarial Attacks on DL-Based IoT Device
Identification,” IEEE Internet of Things Journal, vol. 9, pp. 9012–9024, June 2022.
[4] A. Uprety and D. B. Rawat, “Reinforcement Learning for IoT Security: A Comprehensive Survey,” IEEE
Internet of Things Journal, vol. 8, pp. 8693–8706, June 2021.
[5] Y. Wang, Y. Jia, Y. Tian, and J. Xiao, “Deep reinforcement learning with the confusion-matrix-based
dynamic reward function for customer credit scoring,” Expert Systems with Applications, vol. 200, p. 117013,
Aug. 2022.
[6] M. Shafiq, Z. Tian, A. K. Bashir, X. Du, and M. Guizani, “CorrAUC: A Malicious Bot-IoT Traffic
Detection Method in IoT Network Using Machine-Learning Techniques,” IEEE Internet of Things Journal,
vol. 8, pp. 3242–3254, Mar. 2021.
[7] H. Qiu, Q. Zheng, T. Zhang, M. Qiu, G. Memmi, and J. Lu, “Toward Secure and Efficient Deep Learning
Inference in Dependable IoT Systems,” IEEE Internet of Things Journal, vol. 8, pp. 3180–3188, Mar. 2021.
[8] N. Koroniotis, N. Moustafa, E. Sitnikova, and B. Turnbull, “Towards the development of realistic botnet
dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset,” Future Generation Computer
Systems, vol. 100, pp. 779–796, Nov. 2019.
CHAPTER 3:
SYSTEM ARCHITECTURE
The Dynamic Reward-Based Intrusion Detection System enhances IoT security by learning and
adapting to cyber threats. It uses deep reinforcement learning (PPO, DQN) with a dynamic reward mechanism
to improve detection accuracy. The system analyzes real-time network traffic, detects intrusions like DDoS
and botnets, and reduces false positives. A self-learning feedback loop ensures continuous improvement for
proactive threat defense.
The architecture describes an IoT Intrusion Detection System (IDS) based on Deep Reinforcement
Learning (DRL). The system starts with a ToN_IoT dataset, which undergoes preprocessing
(normalization and scaling) to clean and standardize the data. After sampling, key features are
extracted using log-likelihood sliding principal component analysis, followed by training a Dynamic
Reward-Based Reinforcement Learning model to detect cyber threats.
3.1 Working:
The IoT ecosystem encompasses a diverse array of devices, including cameras, sensors, vehicles,
and industrial equipment, all of which are vulnerable to network attacks. The proposed model
aggregates both malicious and benign network traffic to construct a comprehensive dataset for IDS
training. During the preprocessing phase, the data undergoes a series of operations including
integration, cleaning, transformation, and normalization (as formulated in equation 1). The
processed data is subsequently structured in the SARSA format $[s_t, a_t, r, s_{t+1}, a_{t+1}]$ to conform to
the input requirements of the reinforcement learning paradigm. The preprocessed data serves as
input for the Deep Reinforcement Learning (DRL) model during the training phase. The model
iteratively optimizes its parameters based on the reward signals provided by the simulated
environment while concurrently making predictions on the input data to classify potential network
attacks. Leveraging neural networks, the model approximates the policy function intrinsic to
reinforcement learning, thereby achieving efficient and adaptive attack identification.
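To make the SARSA format concrete, the following is a minimal sketch of how consecutive traffic records could be paired into transition tuples. It assumes that each record is one state, that the action is the agent's predicted class, and that the reward is 1 for a correct prediction and 0 otherwise. The function name `to_sarsa` and the toy data are hypothetical and are not taken from the original paper.

```python
from typing import List, Sequence, Tuple

# One transition: (s_t, a_t, r, s_{t+1}, a_{t+1})
Transition = Tuple[Sequence[float], int, float, Sequence[float], int]


def to_sarsa(features: Sequence[Sequence[float]],
             actions: Sequence[int],
             labels: Sequence[int]) -> List[Transition]:
    """Pair each record with its successor to form SARSA tuples."""
    transitions: List[Transition] = []
    for t in range(len(features) - 1):
        # Assumed reward: 1 for a correct prediction, 0 otherwise.
        reward = 1.0 if actions[t] == labels[t] else 0.0
        transitions.append(
            (features[t], actions[t], reward, features[t + 1], actions[t + 1])
        )
    return transitions


# Toy data: three normalized records, their true labels,
# and the classes the agent predicted for them.
features = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.5]]
labels = [0, 1, 1]
actions = [0, 0, 1]

transitions = to_sarsa(features, actions, labels)
for transition in transitions:
    print(transition)
```

Three records yield two tuples, because the last record has no successor to supply the next state and next action.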
The Bot-IoT dataset was collaboratively developed by the University of New South Wales, Australia, and
the Cyber Security Cooperative Research Centre, with a primary focus on botnet attack research in IoT
environments. This comprehensive dataset incorporates both authentic and simulated network traffic,
encompassing a diverse range of attack vectors. It comprises over 72 million records, each characterized by
42 features. The high-fidelity labeling of this dataset renders it an invaluable resource for research in intrusion
detection and network anomaly detection.
Using the Bot-IoT dataset as a case study, the preprocessing pipeline shown in the figure above includes data
integration, feature selection, data cleaning, data transformation, and normalization. Approximately 3.5
million records are extracted and consolidated from the original log files, with about 20 features selected
based on the recommendations from the original paper [8]. After cleaning, approximately 3.3 million
valid records are retained, with categorical variables encoded using a label encoder. The "attack" and
"daddr" columns, which exhibit high correlation with attack traffic, are excluded to enhance model
generalization. Finally, all numerical values are scaled to the range 0-1 using min-max normalization.
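The last three preprocessing steps (dropping the excluded columns, label encoding, and min-max normalization) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the pipeline used in the study: the column names `proto` and `bytes` and the three sample records are made up for the example, and only `attack` and `daddr` come from the text.

```python
DROPPED_COLUMNS = {"attack", "daddr"}  # excluded per the description above


def label_encode(values):
    """Map each distinct category to an integer code, in sorted order."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]


def min_max_scale(column):
    """Scale numeric values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def preprocess(records):
    """Drop excluded columns, encode categorical ones, then normalize."""
    columns = [c for c in records[0] if c not in DROPPED_COLUMNS]
    table = {}
    for c in columns:
        col = [r[c] for r in records]
        if isinstance(col[0], str):
            col = label_encode(col)
        table[c] = min_max_scale(col)
    return table


# Hypothetical records; real Bot-IoT rows carry about 20 selected features.
records = [
    {"proto": "tcp", "bytes": 100, "daddr": "10.0.0.1", "attack": 1},
    {"proto": "udp", "bytes": 300, "daddr": "10.0.0.2", "attack": 1},
    {"proto": "tcp", "bytes": 200, "daddr": "10.0.0.1", "attack": 0},
]
scaled = preprocess(records)
print(scaled)
```

Each value is mapped as (x - min) / (max - min), so the smallest value in a column becomes 0 and the largest becomes 1.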
The DQN algorithm combines Q-Learning with deep neural networks to approximate the action-value
function $Q(s, a)$, and optimizes for maximum cumulative reward based on the Bellman equation
(Equation 2). The policy is derived from the Q-function as follows: $\pi(s_{t+1}) = \arg\max_a Q(s_{t+1}, a)$.
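As a rough illustration of the two ingredients named above, the sketch below computes the greedy action from a list of Q-values and the standard Q-learning target r + gamma * max_a' Q(s', a'). The discount factor, learning rate, and Q-values are assumed for the example. In an actual DQN a neural network produces the Q-values and is trained by gradient descent toward this target, so the scalar update here is only a simplified stand-in.

```python
from typing import Sequence


def greedy_action(q_values: Sequence[float]) -> int:
    """pi(s) = argmax_a Q(s, a): index of the largest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward: float, next_q_values: Sequence[float],
              gamma: float = 0.5, done: bool = False) -> float:
    """Bellman target: r + gamma * max_a' Q(s', a'), or r at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def td_update(q_sa: float, target: float, lr: float = 0.5) -> float:
    """Move the current estimate Q(s, a) a fraction lr toward the target."""
    return q_sa + lr * (target - q_sa)


# Assumed Q-values for the next state, one entry per class/action.
next_q = [0.25, 0.5, 0.125]

action = greedy_action(next_q)         # action with the highest Q-value
target = td_target(1.0, next_q)        # 1.0 + 0.5 * 0.5
updated = td_update(0.25, target)      # 0.25 + 0.5 * (1.25 - 0.25)
print(action, target, updated)
```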
The dynamic reward function aims to augment the model’s capability to detect underrepresented attack vectors and
improve overall classification accuracy in the context of anomaly detection. The procedure is shown in Algorithm 1.
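The exact form of the dynamic reward is not reproduced in this report, so the following is only one plausible sketch of the idea. It assumes that the reward magnitude is scaled by the inverse frequency of the true class observed so far, so that correct or incorrect decisions on rare classes weigh more than those on common ones. The class `DynamicReward` and its weighting rule are hypothetical and should not be read as the authors' formula.

```python
from collections import Counter


class DynamicReward:
    """Hypothetical reward that grows for classes seen less often so far."""

    def __init__(self, num_classes: int):
        self.num_classes = num_classes
        # Start every count at 1 so the weight is defined before any sample.
        self.counts = Counter({c: 1 for c in range(num_classes)})

    def __call__(self, action: int, label: int) -> float:
        self.counts[label] += 1
        total = sum(self.counts.values())
        # Inverse-frequency weight: equals 1.0 when classes are balanced.
        weight = total / (self.num_classes * self.counts[label])
        return weight if action == label else -weight


reward_fn = DynamicReward(num_classes=2)

# Feed eight correctly classified samples of the common class 0.
for _ in range(8):
    reward_fn(0, 0)

common_hit = reward_fn(0, 0)   # correct decision on the common class
rare_hit = reward_fn(1, 1)     # correct decision on the rare class
rare_miss = reward_fn(0, 1)    # rare class misclassified as common
print(common_hit, rare_hit, rare_miss)
```

Because the counts are updated on every call, the reward changes during training: a correct decision on the rare class earns more than one on the common class, and missing a rare sample is penalized more heavily.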
3.2 Challenges:
The study highlights several key challenges in developing an effective intrusion detection system using a
DQN-based deep reinforcement learning model. The main challenges include:
The Bot-IoT dataset used in the study has an inherent imbalance, where certain attack categories (such
as Information Theft) have significantly fewer samples than others (like DDoS and DoS).
This imbalance creates difficulties in training the model effectively, leading to reduced accuracy for
underrepresented attack types.
The model is tested in both binary classification (benign vs. malicious) and multi-class classification
(benign, DDoS, DoS, Reconnaissance, Information Theft).
While binary classification achieves high accuracy (close to 1.0 for benign and malicious traffic),
multi-class classification introduces additional complexity due to overlapping attack patterns.
The model performs well for most attack types but struggles with Information Theft, which has fewer
samples in the dataset.
This leads to a lower recognition rate for this category compared to others like DDoS and
Reconnaissance.
Training the model on a specific dataset may not ensure optimal performance when applied to different real-
world IoT environments.
IoT devices vary in architecture, behavior, and security vulnerabilities, making it difficult to generalize the
model’s performance across different networks.
Deep reinforcement learning models require high computational power and large training times, making
deployment challenging for resource-constrained IoT devices.
Optimizing the model for real-time detection while maintaining efficiency remains a challenge.
3.3 Results:
This study utilizes the Bot-IoT dataset for empirical evaluation, where the inherent class imbalance poses
significant challenges for model training and performance assessment. We train the proposed DQN-based
model using preprocessed data and evaluate its efficacy. The experimental framework encompasses two
classification paradigms: binary classification (discriminating between benign and malicious traffic) and
multi-class classification (categorizing traffic into five distinct classes: benign, DDoS, DoS, Reconnaissance,
and Information Theft). This experimental design enables a comprehensive evaluation of the model’s
performance across tasks of varying complexity and granularity.
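The two classification paradigms differ only in how the traffic category is turned into a target label. A minimal sketch is given below, assuming the categories are stored as strings and that the integer order of the five classes is arbitrary. The names `CLASSES`, `multiclass_label`, and `binary_label` are illustrative and not from the study.

```python
# The five categories named in the text; the integer order is an assumption.
CLASSES = ["Benign", "DDoS", "DoS", "Reconnaissance", "Information Theft"]


def multiclass_label(category: str) -> int:
    """Five-way target: index of the category in CLASSES."""
    return CLASSES.index(category)


def binary_label(category: str) -> int:
    """Two-way target: 0 for benign traffic, 1 for any attack."""
    return 0 if category == "Benign" else 1


categories = ["Benign", "DoS", "Information Theft"]
multi = [multiclass_label(c) for c in categories]
binary = [binary_label(c) for c in categories]
print(multi, binary)
```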
The model achieves performance scores exceeding 0.99 for the Normal, DDoS, DoS, and Reconnaissance
categories, while the Information Theft category shows slightly lower but still robust performance above
0.95, potentially attributable to the limited sample size in this class. The results are presented as confusion
matrices, with panel (b) illustrating multi-class classification. In these matrices, rows represent predicted
classes, columns denote true classes, diagonal elements indicate correct classification proportions, and color
intensity corresponds to proportion levels. In binary classification, the model attains recognition rates of 1.0
and 0.99 for benign and malicious traffic, respectively.
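A confusion matrix in the layout described above (rows for predicted classes, columns for true classes) can be computed as in the sketch below. It assumes each column is normalized by the number of true samples in that class, so that the diagonal holds the per-class recognition rate. The toy labels are invented for the example and are unrelated to the reported results.

```python
def confusion_proportions(y_true, y_pred, num_classes):
    """Return matrix[p][t]: share of true-class-t samples predicted as p."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        counts[p][t] += 1  # row = predicted class, column = true class

    # Number of samples per true class (column totals).
    col_totals = [sum(counts[p][t] for p in range(num_classes))
                  for t in range(num_classes)]

    return [
        [counts[p][t] / col_totals[t] if col_totals[t] else 0.0
         for t in range(num_classes)]
        for p in range(num_classes)
    ]


# Toy example: class 0 = benign, class 1 = malicious.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 0]

matrix = confusion_proportions(y_true, y_pred, num_classes=2)
for row in matrix:
    print(row)
```

In this toy case the diagonal reads 1.0 for benign and 0.75 for malicious traffic, and the off-diagonal entry in the first row shows the quarter of malicious samples that were predicted as benign.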
CHAPTER 4:
ADVANTAGES, DISADVANTAGES AND APPLICATIONS
Advantages:
Disadvantages:
1. Computational Complexity:
Training DRL models requires high processing power and memory, which may not be feasible for
low-power IoT devices.
3. Data Dependency:
The model's performance depends on large labeled datasets like Bot-IoT. If real-world IoT traffic
differs significantly, retraining is required.
4. Overfitting Risks:
The model may become too specialized to training data, leading to poor generalization in real-world
IoT networks.
Applications:
CHAPTER 5:
CONCLUSION AND FUTURE WORK
Conclusion:
This study proposes a novel Network Intrusion Detection System (NIDS) for IoT environments, leveraging
deep reinforcement learning techniques. The system employs a dynamic reward function to enhance the
model’s capability in addressing class imbalance and improving multi-class attack detection. Empirical
evaluations on the Bot-IoT dataset demonstrate the model’s exceptional performance in both binary and multi-
class classification tasks, achieving detection accuracy exceeding 99% for the majority of attack vectors. The
enhanced DQN model exhibits superior performance compared to conventional methodologies in mitigating
class imbalance issues, particularly in the detection of DDoS and DoS attacks, thus offering a robust solution
for IoT network security.
Future work:
Future work on dynamic reward-based deep reinforcement learning for IoT intrusion detection focuses on enhancing
the model’s performance, scalability, and real-world applicability. Key directions include improving the deep
reinforcement learning framework with advanced architectures, optimizing computational efficiency for large-
scale IoT networks, and testing the model in real-time environments with evolving cyber threats. Additionally,
future research aims to generalize the system across diverse IoT communication protocols (e.g., MQTT,
CoAP) and integrate hybrid detection mechanisms such as federated learning or blockchain for enhanced
security. Addressing adversarial attacks and refining the reward mechanism for better adaptability are also
crucial areas for further exploration.