BreachSeek: A Multi-Agent Automated Penetration Tester

Ibrahim AlShehri 1* (ibrahimalshehri@pm.me)
Adnan AlShehri 1* (adnan66b@gmail.com)
Abdulrahman AlMalki 1* (almalki abdulrahman@outlook.com)
Majed Bamardouf 1* (majedTB12@gmail.com)
Alaqsa Akbar 1* (alaqsaakbar@hotmail.com)

1 King Fahd University of Petroleum and Minerals (KFUPM)
* Equal contribution

arXiv:2409.03789v1 [cs.CR] 31 Aug 2024

Abstract

The increasing complexity and scale of modern digital environments have exposed significant gaps in traditional cybersecurity penetration testing methods, which are often time-consuming, labor-intensive, and unable to rapidly adapt to emerging threats. There is a critical need for an automated solution that can efficiently identify and exploit vulnerabilities across diverse systems without extensive human intervention. BreachSeek addresses this challenge by providing an AI-driven multi-agent software platform that leverages Large Language Models (LLMs) integrated through LangChain and LangGraph in Python. This system enables autonomous agents to conduct thorough penetration testing by identifying vulnerabilities, simulating a variety of cyberattacks, executing exploits, and generating comprehensive security reports. In preliminary evaluations, BreachSeek successfully exploited vulnerabilities in exploitable machines within local networks, demonstrating its practical effectiveness. Future developments aim to expand its capabilities, positioning it as an indispensable tool for cybersecurity professionals.

1 Introduction

The rapid evolution of cyber threats has underscored the limitations of traditional cybersecurity practices, particularly in the domain of penetration testing. Manual penetration testing, while thorough, is inherently time-consuming and increasingly ineffective in keeping pace with the growing sophistication and diversity of cyberattacks. In an era where networks and applications are constantly exposed to new vulnerabilities, there is a pressing need for automated solutions that can efficiently identify, exploit, and report on these weaknesses.

Recent advancements in Artificial Intelligence (AI) and Natural Language Processing (NLP) have opened up new possibilities for automating complex tasks, including cybersecurity. Large Language Models (LLMs), renowned for their capabilities in natural language understanding and generation, have demonstrated the potential to perform tasks that traditionally required significant human expertise. Despite these advancements, the application of LLMs in cybersecurity, particularly for automating penetration testing, remains largely underexplored, presenting an opportunity to revolutionize how security assessments are conducted.

BreachSeek addresses this critical gap by introducing an AI-driven, multi-agent software platform specifically designed to automate penetration testing for websites and networks. The platform leverages the power of LLMs through LangChain and LangGraph in Python, allowing autonomous agents to identify vulnerabilities, simulate a variety of sophisticated cyberattacks, and execute exploits with minimal human intervention. By automating these processes, BreachSeek not only accelerates the penetration testing workflow but also enhances the accuracy and comprehensiveness of
the results, providing a robust solution to the ever-evolving landscape of cybersecurity threats.

One of the key technical innovations in BreachSeek is the use of multiple AI agents, each with a distinct focus, to manage the complexity and breadth of tasks involved in penetration testing. This approach ensures that the system avoids running out of context window, a common limitation in LLMs, and allows for the separation of concerns. Each agent is tasked with a specific aspect of the testing process, ensuring a high level of specialization and accuracy. This design principle not only optimizes the performance of individual agents but also contributes to the overall efficiency and effectiveness of the platform.

The platform’s scalability further enhances its utility, enabling it to be deployed in a wide range of environments, from small to large-scale networks. By deploying multiple agents in different containers, BreachSeek can efficiently manage large volumes of data and complex network architectures, making it adaptable to various cybersecurity needs. This scalability is particularly beneficial for organizations that operate in sectors with high security demands, such as finance, healthcare, and government, where the ability to rapidly and accurately identify vulnerabilities is crucial.

In summary, BreachSeek represents a significant advancement in the field of automated cybersecurity penetration testing. By combining the power of AI-driven agents with the flexibility and scalability required in modern network environments, BreachSeek offers a comprehensive solution that addresses the limitations of traditional penetration testing methods. As cyber threats continue to evolve, tools like BreachSeek will become increasingly vital in ensuring the security and resilience of digital infrastructure.

2 Literature Review

Recent advancements in large language models (LLMs) have significantly impacted the field of cybersecurity, particularly in the automation of penetration testing. Traditionally, penetration testing has been a manual and labor-intensive process, requiring significant expertise and time. However, the introduction of tools like PentestGPT marks a turning point in how these tasks can be automated. PentestGPT leverages the extensive knowledge embedded in LLMs to perform tasks traditionally handled by human penetration testers. This tool has been evaluated using a benchmark created from popular platforms like HackTheBox and VulnHub, which includes 182 sub-tasks aligned with OWASP’s top 10 vulnerabilities. The results indicate a remarkable improvement in task completion rates, with PentestGPT outperforming previous models like GPT-3.5 and GPT-4 by significant margins. This underscores its effectiveness in maintaining context throughout complex testing scenarios, a critical challenge in the application of LLMs to penetration testing tasks [1].

In a broader context, the use of generative AI in penetration testing offers both opportunities and challenges. On one hand, generative models can quickly identify vulnerabilities and generate test scenarios that might be missed by human testers. For example, tools like Mayhem utilize techniques such as fuzzing and symbolic execution to uncover vulnerabilities in a fraction of the time it would take a human tester. These models also bring a level of creativity to the process, simulating novel attack vectors that enhance the robustness of penetration testing. On the other hand, challenges remain, particularly regarding the models’ ability to fully grasp the broader context of testing scenarios. This can lead to incomplete or inaccurate results, highlighting the need for further refinement of these models to ensure they meet the specific needs of different organizations [2]. BreachSeek addresses some of these challenges by employing multiple AI agents to manage context windows, ensuring a more comprehensive understanding throughout the penetration testing process. Unlike other tools, BreachSeek doesn’t just generate text-based outputs but also executes commands within a terminal, directly interacting with the target environment.

LLMs are not only transforming penetration testing but are also being integrated into various aspects of cybersecurity. Their applications extend to defensive measures, such as risk management and automated vulnerability fixing. In these areas, LLMs help automate complex tasks, reducing the need for human intervention and allowing for faster, more efficient responses to security threats. However, the effectiveness of LLMs is often limited by their ability to maintain context over extended interactions, a challenge that continues to be a focal point in ongoing research. Future advancements are expected to improve the adaptability of LLMs to specific organizational environments, enabling them to continuously learn and remain effective against evolving cybersecurity threats [3]. Additionally, BreachSeek uniquely contributes to this space by generating a comprehensive, formatted PDF report that captures the entire journey of the penetration testing process, providing valuable insights that are automatically documented and ready for review.

The integration of LLMs into cybersecurity, particularly in automated penetration testing, represents a significant step forward in enhancing security measures. However, these advancements
come with their own set of challenges that researchers and practitioners must continue to address to fully realize the potential of these technologies. The continued refinement of tools like PentestGPT, alongside broader applications of generative AI in cybersecurity, will likely shape the future of how organizations defend against increasingly sophisticated cyber threats.

3 Model Architecture and Implementation

3.1 Graph-Based Approach Using LangGraph

Our model employs a graph-based architecture implemented using LangGraph, enabling the creation of multiple specialized nodes that communicate with each other. This distributed approach offers several advantages:

1. Enhanced performance through task distribution across multiple nodes/agents
2. Flexibility in customizing logic for individual nodes
3. Mitigation of context window limitations by distributing tasks

3.2 General Model Workflow

The general workflow of our model, as illustrated in Figure 1, consists of the following components:

• Supervisor: Oversees the entire process, generating action plans and identifying subsequent steps
• Specialized agents: Execute specific tasks within their domains of expertise
• Evaluator: Assesses the output quality and task completion accuracy

Figure 1: The general workflow of such models

3.3 Specific Architecture for Penetration Testing

For this study, we implemented a specialized architecture (Figure 2) that adheres to the general workflow while incorporating task-specific agents:

1. Recorder: Maintains a summary of actions and generates a final report when prompted
2. Pentester: Accesses tools including a shell tool and a Python tool, enabling the utilization of popular penetration utilities in a Kali Linux environment. Its primary role is to execute commands generated by the supervisor and report the output to the evaluator.

3.4 Implementation Environment

The model was deployed in a Docker-based Kali Linux environment hosted on RunPod. Key implementation details include:

• Development phase: Utilized Anthropic’s Claude 3.5 Sonnet model
• Testing and future deployment: Plans to use Llama 3.1, an open-source model allowing for customized fine-tuning

This architecture and implementation approach allow for a flexible, scalable, and efficient system for automated penetration testing using large language models. The combination of specialized agents, a robust evaluation mechanism, and a supervisory component enables complex, multi-step operations while maintaining coherence and goal-directedness throughout the penetration testing process.

3.5 Testing Methodology

For evaluation purposes, a Metasploitable 2 machine was hosted on the same local network as the model. The model was then tasked with exploiting
vulnerabilities on this machine, providing a realistic scenario for assessing its penetration testing capabilities.

Figure 2: The specific workflow used by our model

3.6 Web UI

As part of the product suite we offer, a web UI was developed using NextJS for the front-end and FastAPI for the back-end. A sample from the web UI can be seen in the appendix.

4 Results

The efficacy of our model was initially evaluated through qualitative assessment. Future work will incorporate quantitative measures using established benchmarks and standardized examinations.

Potential benchmarks may include the OWASP Web Security Testing Guide (WSTG) [4]. Additionally, we plan to utilize the Offensive Security Certified Professional (OSCP) [5] exam content as a standardized measure of performance.

In our preliminary testing, the model successfully exploited a Metasploitable 2 machine, achieving root access with approximately 150,000 tokens. This demonstrates the model’s capability to perform complex penetration testing tasks autonomously.

Moreover, our findings suggest that minor adjustments to the workflow and system prompts enable the creation of systems capable of addressing challenges in diverse domains. This versatility indicates the potential for developing general-purpose workflows based on our approach.

5 Future Work

5.1 Human Intervention

To enhance the safety and control of BreachSeek during penetration testing, future work will focus on integrating a user permission system that prompts for approval before executing specific tools or commands. This feature will allow users to maintain oversight and intervene when necessary, ensuring that critical actions are only performed with explicit consent. This approach not only increases the security of the testing process but also provides a safeguard against unintended or potentially harmful operations.

5.2 Fine Tuning

Further development of BreachSeek will involve fine-tuning the model using specialized cybersecurity data. By implementing web scraping techniques to gather cybersecurity write-ups and detailed penetration testing reports, BreachSeek can be trained on a vast array of real-world scenarios and methodologies. This training will enable the system to become more adept at identifying vulnerabilities and recommending effective testing strategies, ultimately improving its performance and reliability in diverse environments.

5.3 Retrieval-Augmented Generation (RAG)

To enhance the decision-making process during penetration testing, BreachSeek will incorporate a Retrieval-Augmented Generation (RAG) system. This approach will allow BreachSeek to reference a vector database containing useful penetration testing techniques, strategies, and past experiences. By accessing this database, the system can provide more informed and contextually relevant recommendations, thereby increasing the effectiveness of the testing process.

5.4 Dynamic and Engaging Responses for Enhanced Interaction

In future versions of BreachSeek, a key enhancement will be the introduction of dynamic response modes that cater to a variety of user preferences. Instead of restricting the system to purely security-related prompts, BreachSeek will offer users the ability to engage with the model in different styles, including fun and relaxed modes. This flexibility will not only make interactions more enjoyable but also allow users to choose their preferred character or style for chatting.

For those who prefer a more focused approach, BreachSeek will also include a mode that prioritizes task-related communication, minimizing any off-topic chatter. This option ensures that users who need to stay concentrated on security testing can do so without distractions, with the model
only responding when the conversation is directly related to the task at hand.

Whether a user prefers a witty companion, a laid-back conversationalist, or a task-focused professional, BreachSeek will adapt to meet these preferences while still providing top-notch security testing services. By expanding the scope of responses and introducing a range of interaction styles, BreachSeek will maintain both its relevance to security tasks and its appeal to a broader audience.

5.5 Multi-Modality

To further expand the capabilities of BreachSeek, future iterations will introduce multi-modal input support, allowing users to submit images and videos as part of the penetration testing process. This feature will enable the system to analyze visual content, such as screenshots of network setups or video recordings of security camera feeds, providing a more comprehensive analysis and enabling more sophisticated testing scenarios. By incorporating multiple data types, BreachSeek will be better equipped to handle a broader range of penetration testing challenges.

6 Conclusion

BreachSeek, a multi-agent automated penetration testing platform, addresses critical gaps in traditional cybersecurity practices by leveraging Large Language Models through LangGraph. Its graph-based architecture, comprising specialized agents like the supervisor, pentester, and recorder, enables efficient task distribution and mitigates context window limitations. Deployed in a Docker-based Kali Linux environment, BreachSeek demonstrated its effectiveness by successfully exploiting a Metasploitable 2 machine within 150,000 tokens. While initially evaluated qualitatively, future work will incorporate quantitative measures using benchmarks like OWASP WSTG and OSCP exam content. Planned enhancements include a user permission system for human oversight, fine-tuning with specialized cybersecurity data, integration of Retrieval-Augmented Generation (RAG), enhanced dynamic and responsive interactions according to user preference, and multi-modal input support. These advancements, coupled with BreachSeek’s ability to generate comprehensive security reports, position it as a powerful, adaptable tool in the evolving landscape of AI-driven cybersecurity solutions, promising continued innovation in automated penetration testing and defense against sophisticated cyber threats.

The code used for the model can be found here: https://github.com/snow10100/pena/

References

[1] G. Deng, Y. Liu, V. Mayoral-Vilches, et al., “Pentestgpt: An llm-empowered automatic penetration testing tool,” arXiv (Cornell University), Jan. 1, 2023. doi: 10.48550/arxiv.2308.06782. [Online]. Available: https://arxiv.org/abs/2308.06782.

[2] E. Hilario, S. Azam, J. Sundaram, K. I. Mohammed, and B. Shanmugam, “Generative ai for pentesting: The good, the bad, the ugly,” International Journal of Information Security, vol. 23, no. 3, pp. 2075–2097, Mar. 15, 2024. doi: 10.1007/s10207-024-00835-x. [Online]. Available: https://doi.org/10.1007/s10207-024-00835-x.

[3] F. N. Motlagh, M. Hajizadeh, M. Majd, P. Najafi, F. Cheng, and C. Meinel, “Large language models in cybersecurity: State-of-the-art,” arXiv (Cornell University), Jan. 30, 2024. doi: 10.48550/arxiv.2402.00891. [Online]. Available: https://arxiv.org/abs/2402.00891.

[4] “Owasp web security testing guide - owasp foundation.” (Dec. 3, 2020), [Online]. Available: https://owasp.org/www-project-web-security-testing-guide/.

[5] OffSec. “Pen-200: Penetration testing certification with kali linux - offsec.” (Aug. 26, 2024), [Online]. Available: https://www.offensive-security.com/pwk-oscp/.

A Appendix

Figure 3: The clean web UI when you start chatting with the model

Figure 4: The AI agents performing a task
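
The supervisor, pentester, evaluator, and recorder roles described in Sections 3.2 and 3.3 can be illustrated with a small control loop. The following is a minimal sketch in plain Python, not the authors' LangGraph implementation: the `State` layout, the function names, the substring-based success check, and the `fake_shell` stub are all assumptions made for illustration, and no real commands are executed.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class State:
    """Shared state passed between nodes (hypothetical layout)."""
    goal: str                                          # text the evaluator looks for
    plan: list[str]                                    # commands still to be issued
    history: list[str] = field(default_factory=list)   # recorder's step summaries
    done: bool = False


def supervisor(state: State) -> Optional[str]:
    """Return the next command, or None when finished or out of plan."""
    if state.done or not state.plan:
        return None
    return state.plan.pop(0)


def pentester(command: str, run: Callable[[str], str]) -> str:
    """Run one command through an injected runner and return its output."""
    return run(command)


def evaluator(state: State, output: str) -> bool:
    """Toy check: the goal text appears in the command output."""
    return state.goal in output


def recorder(state: State, command: str, output: str) -> None:
    """Store a short summary of the step rather than the full output."""
    state.history.append(f"{command} -> {output[:40]}")


def run_workflow(state: State, run: Callable[[str], str], max_steps: int = 10) -> State:
    """Supervisor -> pentester -> recorder -> evaluator, repeated until done."""
    for _ in range(max_steps):
        command = supervisor(state)
        if command is None:
            break
        output = pentester(command, run)
        recorder(state, command, output)
        state.done = evaluator(state, output)
    return state


def fake_shell(command: str) -> str:
    """Stand-in for a real shell tool: returns canned output only."""
    canned = {"whoami": "root", "id": "uid=0(root) gid=0(root)"}
    return canned.get(command, "")
```

Calling `run_workflow(State(goal="root", plan=["hostname", "whoami", "id"]), fake_shell)` stops after the second command, because the evaluator marks the goal as reached and the supervisor then issues nothing further. Injecting the runner keeps the loop testable without touching a real terminal.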

Figure 5: The web UI when the task is done
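
Section 3.3 describes a recorder that maintains a summary of actions and produces a final report, and Section 3.1 lists mitigation of context window limitations as a design goal. One way these two ideas could fit together is sketched below. This is an assumption-laden illustration: the `Recorder` class, its `max_entries` and `max_chars` parameters, and the plain-text report are hypothetical, and the paper's system produces a formatted PDF rather than text.

```python
from collections import deque


class Recorder:
    """Keeps a bounded window of step summaries plus a total step count."""

    def __init__(self, max_entries: int = 3, max_chars: int = 60) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full,
        # so the text handed back to a model stays bounded in size.
        self.entries = deque(maxlen=max_entries)
        self.max_chars = max_chars
        self.total_steps = 0

    def record(self, command: str, output: str) -> None:
        """Summarize a step as its command plus the first line of output."""
        self.total_steps += 1
        stripped = output.strip()
        first_line = stripped.splitlines()[0] if stripped else "(no output)"
        summary = f"{self.total_steps}. {command}: {first_line[:self.max_chars]}"
        self.entries.append(summary)

    def context(self) -> str:
        """Compact history suitable for inclusion in the next prompt."""
        return "\n".join(self.entries)

    def report(self, target: str) -> str:
        """Plain-text stand-in for the final report."""
        lines = [
            f"Penetration test summary for {target}",
            f"Steps executed: {self.total_steps}",
            "Most recent steps:",
        ]
        lines.extend(f"  {entry}" for entry in self.entries)
        return "\n".join(lines)
```

The trade-off in this sketch is that older steps are discarded outright; a fuller design might condense them into a running summary instead of dropping them.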

Figure 6: Web UI Dark mode
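
The user permission system proposed in Section 5.1 can be pictured as a gate placed in front of every tool call. The sketch below is illustrative only and is not taken from the BreachSeek code base: `SENSITIVE_TOOLS`, `gated_call`, and `console_approver` are hypothetical names, and the policy of gating the shell and Python tools is an assumption.

```python
from typing import Callable

# Hypothetical policy: tools that always need explicit user approval.
SENSITIVE_TOOLS = {"shell", "python"}


def gated_call(
    tool: str,
    argument: str,
    execute: Callable[[str, str], str],
    approve: Callable[[str, str], bool],
) -> str:
    """Run the tool only if it is not sensitive or the user approves it."""
    if tool in SENSITIVE_TOOLS and not approve(tool, argument):
        # Return a message instead of raising, so an agent can replan.
        return f"DENIED: {tool} {argument!r} was not approved"
    return execute(tool, argument)


def console_approver(tool: str, argument: str) -> bool:
    """Ask on the terminal; anything other than 'y' counts as a refusal."""
    answer = input(f"Run {tool} with {argument!r}? [y/N] ")
    return answer.strip().lower() == "y"
```

Passing the approver in as a function means the same gate could be driven by a terminal prompt, a web UI dialog, or a fixed policy in tests. Defaulting to refusal keeps an accidental keypress from authorizing a command.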
