Doc - Dissertation
I hereby declare that the work presented in this dissertation entitled “Malicious
URL Detection System”, submitted to Jharkhand Raksha Shakti University in partial
fulfillment of the requirements for the award of the degree of Bachelor of Science (Honours) in
Computer Application and Cyber Security, is an authentic record of my own research work
carried out under the supervision of Mr. Vikash Kumar Agarwal, Assistant Professor in the
Department of Computer Application and Cyber Security, JRSU, Ranchi.
CERTIFICATE
This is to certify that the research work entitled “Malicious URL Detection System” is an
authentic record of research work carried out by Mr. Pinku Kumar (JRSU/BCACS/30/2020), a student of
Bachelor of Science (Honours) in Computer Application and Cyber Security at Jharkhand
Raksha Shakti University, Merus Road, Ranchi. His work is original and has not been submitted
anywhere else. His conduct was very good during the course of his association. The dissertation
has not previously served as the basis for the award of any other degree, diploma, fellowship, or
similar title.
_________________ _________________
(Internal Examiner) (External Examiner)
__________________
(Course Coordinator)
TABLE OF CONTENTS
Candidate’s Declaration i
Certificate by Guide ii
Table of Contents iii
List of Figures iv
List of Tables v
Abstract vi
CHAPTER 1
Introduction 1
CHAPTER 2
Literature Review 9
CHAPTER 3
Methodology 13
CHAPTER 4
System Design and Analysis 27
CHAPTER 5
Implementation 34
CHAPTER 6
Results and Evaluation 42
CHAPTER 7
Conclusion 47
CHAPTER 8
Future Work 50
REFERENCES 73
LIST OF FIGURES
1. URL Features 1
2. Malicious URL Detection System Framework 3
3. URL Distribution 36
4. Feature Correlation Heatmap 39
5. Voting Classification Report 44
6. Confusion Matrix Analysis 45
7. Main Menu 52
8. CyberTraq (URL Generation System) 52
9. Location Prompt 53
10. Location Data 53
11. Sandboxing 54
12. URLTraq (URL Detection System) 54
LIST OF TABLES
1. Classification Report 43
ABSTRACT
In the dynamic landscape of cybersecurity, the contest between malicious actors and defenders
continues, and the URLs users follow every day are a primary attack surface. This dissertation
presents a Malicious URL Generation and Detection System designed to address the evolving threats
hidden behind innocent-looking links. The system combines machine learning, URL manipulation,
and real-time analysis to protect users and systems from potential cyberattacks.
Data preprocessing and storage lay the groundwork for accurate and robust URL classification.
Collecting, cleaning, and encoding the data creates the foundation on which the machine
learning models are trained. Feature engineering extracts discriminative characteristics from
URLs, while sandbox logs and historical records preserve the details of each analysis, giving
users transparency and insight.
Techniques for malicious URL generation simulate potential attack scenarios, demonstrating
the system's capabilities without causing harm. Using methods ranging from URL manipulation to
redirection, the system generates URLs that mimic real-world threats within a
controlled environment.
Central to the system are its URL detection algorithms and models. Data collection,
preprocessing, model selection, and training together produce classifiers that
distinguish benign URLs from malicious ones. Through the command-line interface (CLI), users
can run real-time analysis, review historical results, and adjust
personalized settings.
This work is a step towards protecting users against malicious
URLs. By combining adaptable algorithms with user-centric design, the
Malicious URL Generation and Detection System provides a practical defense
against the threats that can lie behind any link.