Understanding Hackers’ Work:

An Empirical Study of Offensive Security Practitioners


Andreas Happe
andreas.happe@tuwien.ac.at
TU Wien
Vienna, Austria

Jürgen Cito
juergen.cito@tuwien.ac.at
TU Wien
Vienna, Austria

ABSTRACT
Offensive security-tests are a common way to pro-actively discover potential vulnerabilities. They are performed by specialists, often called penetration-testers or white-hat hackers. The chronic lack of available white-hat hackers prevents sufficient security test coverage of software. Research into automation tries to alleviate this problem by improving the efficiency of security testing. To achieve this, researchers and tool builders need a solid understanding of how hackers work, their assumptions, and pain points.
In this paper, we present a first data-driven exploratory qualitative study of twelve security professionals, their work and problems occurring therein. We perform a thematic analysis to gain insights into the execution of security assignments, hackers’ thought processes and encountered challenges.
This analysis allows us to conclude with recommendations for researchers and tool builders to increase the efficiency of their automation and identify novel areas for research.

KEYWORDS
software testing, offensive security testing, ethical hacking

ACM Reference Format:
Andreas Happe and Jürgen Cito. 2023. Understanding Hackers’ Work: An Empirical Study of Offensive Security Practitioners. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’23), December 3–9, 2023, San Francisco, CA, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3611643.3613900

arXiv:2308.07057v1 [cs.SE] 14 Aug 2023

ESEC/FSE ’23, December 3–9, 2023, San Francisco, CA, USA
© 2023 Copyright held by the owner/author(s).
This is the author’s version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’23), December 3–9, 2023, San Francisco, CA, USA, https://doi.org/10.1145/3611643.3613900.

1 INTRODUCTION
For convenience and efficiency reasons, more and more devices are being connected and thus exposed to public networks. While beneficial, this has a dark undercurrent: the respective system’s attack surface is increased and could be exploited by malicious actors.
In a perfect world, all created software would be free from faults. As recent [19, 20], and not so recent [29], news implies, we are sadly not there yet. While secure software development, enabled by defensive security testing [44, 52, 57, 58], is the long-term goal, short-term interventions are needed. In addition, there is an ever-increasing abundance of legacy software whose security needs to be verified too. A pragmatic approach¹ is to perform security assessments, also known as penetration tests (pen-tests), to identify vulnerabilities and remediate them before they are discovered and exploited by malicious actors. This approach is limited by the availability of skilled offensive security professionals. While this situation should be remediated through increased enrollment in IT security educational programs, improving the efficiency of penetration testers through tooling is an equally important measure. To accomplish this, research and tooling should be well-aligned with security professionals’ activities and needs.

¹ Pen-tests are not the only approach. Another pro-active approach, involved during software conception and development, has become known as “left-shifting” security.

However, to the best of our knowledge, there has been no empirical research into what types of security assessments are performed, what actions are regularly performed within those, or how professionals select attacks to be run against their targets. Without this, developments might be swift but potentially misguided, and thus eventually irrelevant.

Research Questions & Structure of this Work. We used three research questions to drive the development of this work; the applied research method is described in the Methodology section.
Our first research question was “What do common security tests look like?” We present the gathered information in Section Performing Security Tests, detailing different types of assignments, their particularities, common actions performed during assignments, and the role of automation.
The second research question, “How do Hackers perform their work?”, focused on the inner world of our participants. Education is an important part of socialization; therefore, results about this aspect are included in Section Becoming A Hacker. In Section How do Hackers think? we present recurring themes detected during our analysis. We focus on thought processes during assignment execution, target and attack selection, dealing with uncertainty, and internal quality assurance.
The Discussions and Implications Section is the response to the final question, “What tedious or time-consuming areas could be improved?” We grouped the identified research and development opportunities according to our target audience of researchers and tool builders.

2 RELATED WORK
While there has been ample research on secure software development and defensive security testing [44, 52, 57, 58], the focus of our study is offensive security testing. To the best of our knowledge, this is the first work that focuses on how hackers work, i.e., the
context within which a security professional moves and the processes that influence their decisions during security assignments.
Huaman et al. [39] performed a large-scale interview study of German small-to-medium enterprises (SMEs). While SMEs make up a third of Germany’s GDP, they often lack resources for establishing an effective cyber-security posture. The study analyzes their preconceptions with regard to cybercrime, their adoption of security measures and their experiences with attacks. In contrast to our study, it focuses upon the potential “victims”, not upon security operators. One interesting finding was that 45.1% of interviewed companies had a cybersecurity incident warranting manual response in the preceding 12 months, further highlighting the need for trained personnel.
Smith, Theisen and Barik [49] describe Red Teams working at Microsoft. They cover a wide range of topics including how corporate culture and red teaming interact. They also lightly touch on how people became security professionals and the interactions in their daily work. Their interviewees were recruited from within Microsoft, a single large-scale company, and thus might not reflect an industry which, cf. the previously mentioned paper, consists to a large part of SMEs. In contrast, our publication focuses upon the execution of security assignments, highlights hackers’ thought processes and details challenges in academic and automation research. Furthermore, this paper is not limited to the discipline of red-teaming.
Van den Hout [54] investigated the impact of different penetration test methodologies on the quality of the tests performed and concluded that only one reviewed methodology had widespread adoption, yet its recommendations for a structured approach were not taken into account. This could indicate a gap between “real” penetration testing and codified methodologies.
Multiple papers describe aspects of penetration-testing without focusing on the operator’s mindset or their decision processes. Munaiah et al. [43] analyze event datasets and manually map attack patterns to MITRE ATT&CK Enterprise. This is used to show a-posteriori attack patterns but does not analyze how hackers select the attacks to execute. MITRE ATT&CK itself is a taxonomy of TTPs (Tactics, Techniques and Procedures) and not a full attack methodology. Bhuiyan et al. [23] use GitHub security bug reports to identify the origins of bug reports. Examples of these origins are software source code, software log files, binary files, etc. This details what data are used during reporting, but does not explain how a security professional identifies potential vulnerabilities for research in the first place, e.g., why a security professional analyzes a mentioned log file for relevant security information.
Other papers focus upon narrow sub-disciplines of hacking which cannot be projected upon the hacking industry at large. Ceccato et al. [28] describe how “hackers” perform attacks against protected software, i.e., how software protection mechanisms in provided binary files are analyzed through reverse engineering. Based upon the responses of our interview series, reverse-engineering is not representative of the activities performed by offensive operators at large (the single interview partner who mentioned reverse engineering, well-renowned for publicly disclosed high-impact vulnerabilities, said that he is leaving reversing due to time and requirement constraints).
The PhD thesis “How Hackers Think” [51] is a high-level treatise on hacker history, culture and their thought processes. It identifies multiple characteristics of hackers, e.g., being highly self-motivated and curious, being able to tolerate ambiguity, and their use of mental models and patterning. Its focus lies on a high conceptual level and it does not analyze how hackers actually identify and choose vulnerabilities to test. Neither does the study identify how different areas of penetration-testing, e.g., OT or red-teaming, might impact a hacker’s mindset.

Table 1: Participants

Participant      Primary           Secondary
Participant 1    web               infrastructure, iso27001
Participant 2    web               infrastructure, mobile
Participant 3    red-team          AD, OT, web
Participant 4    web               social engineering
Participant 5    red-teaming, IoT  OT, web, social engineering
Participant 6    web               AD, social engineering
Participant 7    infrastructure    web, tool development
Participant 8    web               infrastructure
Participant 9    infrastructure    AD
Participant 10   red-teaming, AD
Participant 11   OT, IoT           web
Participant 12   web

3 METHODOLOGY
In general, our research follows a pragmatist approach [42, 46] combining methods from the empiricist and summarist interpretist traditions [33]. We used semi-structured interviews to gather insights into hackers’ work and thought processes.
Ethical Considerations. Our institution does not have a formal IRB process but offers voluntary submission to a Pilot Research Ethics Committee. As human interviews were conducted, the committee was consulted, and topics were discussed, including ethically relevant methodological clarifications, more specifically questions related to the involvement of voluntary participants in the research, as well as mitigating the risk of contextual identification. Participants gave their informed consent before the interviews took place; all data collected were anonymized by researchers prior to analysis. All data storage and processing complied with strict national privacy regulations and the EU’s General Data Protection Regulation (GDPR).
Recruitment. We define the target population as offensive-security practitioners that work directly with customer systems. Previous research has found that security professionals are reluctant to communicate with outsiders [41], especially when it comes to their methodology and techniques. To counteract this, we reached out to public figures: the initial seed was populated by contacting security companies, finalists of public security challenges, and security conference participants. We used snowball sampling to improve the interview pool: at the end of each interview, we asked the current interviewee to connect us with other offensive security professionals. In addition, we cold-called both a hacking education youtuber and a public hacking collective that is well known for
publishing vulnerability disclosures. Both were mentioned by the participants during the interviews; neither reacted to the contact attempt, further reinforcing the idea of a close-knit community [41].
We sampled new interview participants until theoretical saturation was reached, that is, no new information was obtained during the interviews. When considering theoretical saturation we differentiated between common themes and themes specific to the interviewee’s specialty area. We continued interviewing until two consecutive interviews contributed no new specialty-area information and three consecutive interviews contributed no new common themes. Theoretical saturation was reached after the 12th interview, which fits recommendations [31, 34].
Participants. Participants had to work primarily in an offensive security field; we excluded participants that primarily worked within social engineering or physical security. If participants were working in a hybrid field, such as reverse-engineering or source-code analysis, their primary focus had to be offensive.
We reached out to offensive security professionals with at least four years of experience in the IT security field.
To our dismay, we were not able to recruit any offensive security professionals that identified as non-male. While we come from a culture that naively prides itself on blind meritocracy [24], we found this contradiction disturbing. As we did not deem it relevant, we did not ask about our participants’ religious or cultural affiliations, but in hindsight, we can assume diversity in that area.
To protect the anonymity of the participants, we cannot detail their employment status, ethnicity, work experience before security work, time of employment within the security field, etc. When excluding education and CTF-participation, participants had an average work experience of 9 years (μ = 9.0, σ = 6.5, median = 8).
Interview Protocol. Interviews were conducted as semi-structured interviews utilizing video conferencing software. All but two interviewees enabled both video and audio transmission. The average duration of an interview was 55 minutes. Before the main interview started, the participants were informed about data processing and their rights, and asked for their informed consent.
The interviews were opened with questions about the interviewee’s job description and how they acquired the needed skill set. Those were followed up by talking about the types of security assignments the participants are involved with. For 1–3 of these areas, detailed questions about particularities, procedures, automation, and problems were asked; since the questions were open-ended, the interviews branched out to subtopics organically from there. The interviews were closed with questions about grievances and additional thoughts related to the field of IT security.
We recorded and manually transcribed all interviews. During the transcription, sensitive data was scrubbed from the interview; the transcribed interview was then submitted for confirmation to the interviewee. Scrubbed interviews were loaded into delve [5] for thematic analysis.
Analysis. Reflexive Thematic Analysis [26] was chosen to perform a data-driven exploratory analysis of interview transcriptions. In summary, when performing thematic analysis, the researchers initially familiarize themselves with the data, and extracts of the data are tagged with codes. These codes are then used to create clusters that identify or construct underlying themes. Then, those themes are reviewed, defined, and named. The findings are presented in Sections 4–6.
Threats to Validity. Any interview-based study faces the threat of selection bias (internal threat). To counteract this, we performed snowball sampling, recruited random security professionals during security conferences, and explicitly invited security professionals from different disciplines.
For ethical reasons, interview participation was limited to white-hat hackers, i.e., ethical hackers (internal threat). According to prior analysis, the activities of black-hat hackers, e.g., Ransomware groups, can be seen as a subset of the activities performed by ethical red-teams [3, 4], which are covered in this work.
Another potential bias would be experimenter bias (internal threat). To reduce the risk, all the data collected was analyzed separately by the different authors, their respective labeling results were compared for differences, and ambiguities were discussed and resolved.
Hacking contains multiple disciplines. Our results might only capture common themes of a subset of those (external validity). We tried to counteract this by inviting interviewees from various hacking fields, as is reflected in Table 1.
The geographical distribution covered roughly Central Europe. Other geographic regions might be more advanced when it comes to the utilization of the different types of security assignment. If so, the relative importance of red-teaming would be diminished within this study while the techniques, processes, and problems described would remain valid.

4 BECOMING A HACKER
The interview responses reveal several interesting themes regarding the path to becoming a hacker.
Academic Education. All but one participant attended at least a single university-level class. Nine completed bachelor’s degree studies in IT; of those, all continued to add a master’s level degree. The percentage of interviewees enrolled in IT security specific programs increased from 55% (n = 5) for bachelor’s studies to 78% (n = 7) for master’s studies, indicating that participants felt drawn toward IT security. This fits the perceived lack of IT-Security and Secure Development lectures during non IT-security centric programs, which was partially addressed by attending CTFs or enrolling in non-mandatory security classes. Classes were often taken in an extra-occupational capacity. All of this fits a common theme of “fascination with IT security” combined with high intrinsic motivation.
Experience before IT-Security. Having 2–3 years of non-security IT exposure before entering the IT security field was found to be advantageous. Another related recommendation was to have a broad IT security base combined with one or two specialization areas. Within our group of interviewees, the common base was web security or internal network assessments; examples of specializations were red teaming or cloud-specific knowledge.
Staying relevant. All interviewees perceived a need for ongoing education. The ubiquitous information source was Twitter, followed by other online services such as YouTube channels, blog posts, Reddit, GitHub, or paid-for online courses. In the physical world, colleagues and conferences were mentioned. The quality of online material was considered high, although an interviewee had
qualms publishing information due to potential misuse. A single participant regularly used the Darknet as a news source.
To CTF or not. CTF attendance was a common theme. Participants saw a bidirectional information transfer: skills learned in CTFs were applicable at work and vice versa. Tasks in CTFs were considered very targeted in that they narrowly target a vulnerability, and solving the challenge or reading a write-up were considered efficient ways of gathering knowledge about the respective vulnerability. Specialized security practitioners, e.g., from the OT or ICS area, found CTFs to be introductory and shallow.

5 HOW DO HACKERS WORK?
While we encountered the common muttering of “every project is different”, these sections identify types of penetration tests, each with distinct requirements, strategies, and particular actions. When looking at a pen-tester’s work, this is the external view, i.e., how a pen-tester’s work is perceived from the outside.

5.1 Types of Security Tests and their Differences
Although different assignments have a similar project organization, their execution differs due to the respective client and target environment. Table 2 shows the main types of security assignments encountered during interviews.

Table 2: Types of Security Assessments

Type                      Covert       Team-Size  Effort in Days
Vulnerability Assessment  not typical  1          2-4
Penetration Test          optional     1-2        5-10
Internal Network Test     optional     1-2        7-10
IoT Test                  optional     1-2        7-10
OT Test                   never        1-2        7-10
Red-Teaming               always       3-4        30+

Vulnerability Assessments focus upon achieving a high coverage of the targeted assets, which are typically external IP-ranges (including web servers) or internal networks (including clients and internal infrastructure). Enumerating targets, e.g., through web crawling or network scans, leads to the creation of important inventory databases. Those are subsequently used to test against known vulnerability databases, known configuration errors or generic vulnerability classes such as SQL injections. As assignments typically include large amounts of potential targets, a high level of automation is necessary.
Web-, Infrastructure- or Mobile-Penetration Tests share similarities with vulnerability assessments. The demarcation point between those two varied between interviewees. The situation is further complicated as vulnerability scans are often used as an initial step during pen-testing. Generally speaking, while vulnerability assessments focus on breadth, pen-testing focuses on depth, i.e., thoroughly breaking a single target. Pen-tests are within the realm of application security: in addition to well-known vulnerabilities or configuration errors, new vulnerabilities are hunted within the software under test. Penetration tests are often performed against custom-written software where no prior vulnerabilities are published in vulnerability databases. As the scope is tight, customers commonly provide dedicated test environments against which destructive tests can be performed. Another benefit of the limited scope is that the execution of a penetration test can be highly structured; some (n = 2) interviewees went as far as calling them “catalog-based”. Pen-tests are primarily performed manually.
Internal Network Penetration Tests verify the security and resilience of internal networks. Their basic assumption is “assumed breach”, i.e., the adversary is already within the local network and now attempts to gain sensitive data or achieve higher privileges, emulating Ransomware scenarios which have recently scourged companies. Microsoft Active Directory (AD) is ubiquitous in corporate networks; thus, if present, it is the main target. In these cases, the security assignment’s intent is to obtain domain administrator privileges. The focus lies on exploiting known vulnerabilities, product features, mis-configurations, and insufficient access-control or hardening measures. Another big aspect is Lateral Movement, i.e., using compromised systems to pivot to new targets. Assignments are made against productive environments.
IoT Tests are often performed as product tests and target concrete products such as Smart-Home or smart medical devices. The scope of typical IoT pen-tests is broadened by the inclusion of hardware-specific tests, as well as testing if regulatory safety requirements are upheld during attacks. IoT pen-tests often include tests of connected web- or mobile applications. Together with the hardware angle, this makes for a broader scope compared to a Web-, Infrastructure-, or Mobile Penetration Test.
OT Tests target Operational Technology (OT) such as SCADA, ICS or utility networks. They can be differentiated into product tests and in-situ network tests of already configured systems. As solutions consist of off-the-shelf software that is highly customized for usage within the corresponding client network, the latter are often preferred by the customer. Tested subjects often use proprietary protocols; therefore, reverse engineering is a common practice in OT tests.
OT facilities, e.g., power plants, are expensive and often hard to come by, thus a dedicated testing environment is rarely available. Testing commonly occurs during scheduled down-times; this severely impacts the available test window. Another related particularity: availability often trumps the breadth or depth of performed security tests. As test subjects are “connected to the real world”, negative side effects are potentially catastrophic. Security tests are therefore highly coordinated with customers to prevent any negative fallout. This often prohibits any covert action.
Regulatory requirements [21] lead to a convergence between IoT and OT devices. In addition, Microsoft Active Directory is starting to creep into OT networks, thus creating an overlap with Internal Network Tests.
Compared to other approaches, in Red-Teaming the attackers have a concrete mission, e.g., gain access to a defined subset of computers or a source code repository. While in Internal Network Penetration Tests gaining Domain Admin is often the final goal, this is only a means for achieving the mission during Red-Teaming. Attackers holistically target a company and employ additional techniques such as Open Source Intelligence (OSINT), Persistence, Command&Control (C2) and Social Engineering; Post-Exploitation
is more prominent compared to other disciplines. Red teaming is not concerned with broad coverage, but with achieving the team’s well-defined objective. Red-Teaming does not only attack the target’s technical security posture but also the response of the blue team, i.e., defenders. Thus covert operations, hidden persistence, command&control systems and evasion of defensive techniques enter the picture.
Assignments are often performed in larger teams and over extensive time frames, making information transfer between participants more important. Adding additional team members to speed up an ongoing operation is problematic as the new team members do not share the existing members’ target system knowledge.

5.2 Black- vs. Gray-Box Security Testing
When it comes to test execution, an important distinction is the amount of information and support provided by the customer. During black-box tests, practitioners go in “blind”; no information except the scope is given. During white-box tests, full system access or even the source-code of the tested application is given. Gray-box tests lie in-between: often access credentials or system architecture descriptions are provided before testing commences.
Pure white-box tests, as in “source-code reviews”, are rarely performed due to their prohibitive costs. The type of assignment is also of importance: red-teaming is almost always performed as a black-box test as the target’s personnel is not involved beneficially. OT tests are often performed in tight lock-step with customers (to reduce the potential fallout) and thus are gray-boxed. Interviewees overwhelmingly recommended moving from black-box towards white-box testing. The reasons given were time and thus cost efficiency, as well as potential for improved test coverage.
In other areas, customers are helping pen-testers to improve efficiency too. “Assumed breach” scenarios in Internal Network Penetration Testing conceptually assume that a client computer will be breached eventually and thus use a breached computer as a starting point for investigations. During web pen-tests or during external scans, rate limits or firewalls are commonly disabled to allow swift pen-test execution. During web application pen-tests, internal details, such as used technologies, are commonly provided to reduce the search space.

5.3 Typical Testing Workflows
Participants were asked to detail the execution of the different types of assignments. This section describes the peculiarities of the different areas.
Activities performed during Web Penetration Tests can be separated into exploratory intuitive testing and exhaustive testing against checklists or standards. All interviewees utilized both; no specific ordering between those two was detected, although if the checklist-verification was automated, it often was run in parallel to exploratory testing. If a high level of automation is achieved, the manual (exploratory) testing can be integrated into the automation: one interviewee detailed a multi-stage automated test-setup containing multiple enumeration steps, e.g., website crawling, where the result of each step was manually verified, rectified and used to instrument subsequent automated steps. Manual testing, e.g., manual crawling, was integrated as additional input into the tested steps.
According to interviewees, most time and effort are spent upon authorization tests. An application typically has multiple user groups with different access rights. During testing, penetration testers request one or more users per existing group and try to perform unauthorized data access with one user using data of another user. To verify responses, testers need documentation about the implemented access groups. If none was given, interviewees approximate a model of the access rules through probing/testing and experience.
With the exception of testing for authentication or authorization, automated testing was deemed well-established and automated tooling was commonly employed. Common injection attack vectors were well covered by tooling, for example sqlmap [18] for testing for SQL injections. Multiple Web-Application-Testers “complained” that typical injection-based attacks which were common 10 years ago are now seldom seen and are rather used for illustrative purposes during education. Their suspected “culprit” is the rise of web application frameworks with sane defaults that automatically prevent many attack classes. Multiple interviewees considered switching their area of interest due to this development.
Multiple interviewees described API-based tests as tedious. Typically an API test is performed by calling a sequence of operations. Each operation is detailed through an API specification provided by the customer, e.g., through OpenAPI/Swagger or WSDL files. In theory, directly testing the back-end API reduces the pen-testing overhead as the tester can focus upon the core functionality; in practice, API tests become time-consuming due to a lack of documentation with sufficient quality. API documentation only describes single operations, often lacking detailed descriptions of valid input formats and their semantics. In addition, to achieve good test coverage, test cases need to perform a sequence of causally dependent API calls, potentially reusing and refining data between operations. While performing a traditional web application pen-test, this causality and examples of input data can be derived from the captured web traffic. When performing API tests, these have to be derived from the API specifications or, more realistically, by pestering the customer’s liaison contact.
Internal Network Tests often occur in phases which are ordered from “quiet” to “loud” when it comes to visibility. A typical assignment targeting a Microsoft Active Directory might include the following phases: initially, only network access is granted. The attacker either sniffs the network for exposed access credentials or utilizes MitM- and spoofing attacks to gain user credentials or tokens. In addition, anonymously accessible network shares are investigated for “juicy” information such as user or admin credentials. Exploits are used against vulnerable network services if the risks of detection and to stability are deemed acceptable. In the second phase, an attacker has either already gathered user credentials or has been provided with those by the customer. These credentials are typically for non-privileged domain users, and attackers utilize them to further enumerate shares, gain access to additional domain accounts or computers, or to gain local administrative privileges. Lateral Movement often occurs during this phase. In the next phase, the attacker has either gained or is provided local administrative
ESEC/FSE ’23, December 3–9, 2023, San Francisco, CA, USA Andreas Happe and Jürgen Cito

privileges and tries to perform further Lateral Movement until a domain administrative account is compromised. With that, the whole network is owned.
Please note that the phases do not follow a traditional waterfall model. According to interviewees (𝑛 = 2), the domain admin credentials can often be gathered during the initial phase. This is then noted, and additional attacks are performed until the agreed-upon timebox is reached.
Many automated attacks, e.g., EternalBlue [25] or certify [7], were described as “too loud” or “unstable” for use during the initial phases. Another automation topic was the identification of “juicy” files within network shares: this activity is performed primarily manually as the identified data are context specific. In addition, creating a full copy of a network share is time- and network-intensive as well as easily detectable, and countermeasure systems using honey-tokens are beginning to be deployed at customers’ sites.
IoT-Tests contain a wide range of potential targets. This includes both the tested IoT hardware, which can range from connected toys to life-sustaining IoT devices, as well as the wider ecosystem where IoT devices communicate with cloud infrastructure, web applications or mobile applications. During the initial enumeration phase, a model of the different communication channels as well as their responsibilities is often created. During hardware analysis, optical inspection is used to identify the processing, memory and storage components in use as well as their interaction and potential uses.
All interviewees in the IoT area mentioned applying industrial standards as well as the usage of checklists that included the OWASP IoT [36] and OWASP Firmware Testing guides [35].
Red-Teaming is special due to its evasion- and deception-based methods as well as its objective-based approach. A red team initially has knowledge of its objective, e.g., gain access to a special server in department X, as well as a broad allowed scope, e.g., the targeted company. Teams initially model how to breach the company, e.g., by identifying potential social engineering victims. After the breach, low-key enumeration is used to covertly model “how a company works” and then abuse that knowledge to derive attacks that mirror expected traffic and behavior patterns. Throughout a red-teaming campaign, a map of known or breached elements is built and compared to the imagined map of the company that includes the final objective: if both converge, the objective should be achieved.
Automation employed for network lateral movement or breaching web applications originates from the other pen-testing disciplines, but has to be re-evaluated against its chance of being detected. As red-team assignments are performed against real and live systems, the scope of destructive operations might be limited.
OT-Tests have their own challenges. Due to the prevalence of proprietary protocols, time-consuming reverse engineering of those protocols must initially occur. Experiences mentioned by our interviewees indicate that Security-by-Obscurity is still common within this area; this would match the perceived resistance of some Industrial Control System (ICS) suppliers when faced with responsible disclosure requests. Because reverse engineering is so time-consuming, it frequently has to be aborted due to the timeboxed nature of testing.
Due to the potentially catastrophic side-effects of testing, a risk-based approach is often applied: together with the customer, a threat model workshop can be performed and potential scenarios that warrant testing identified. Those scenarios, and only those, are subsequently manually executed against the OT system. As the available amount of time is fixed, threat modeling and performing the derived tests compete for the same time budget.

5.4 Automation
All interviewees used pre-made tooling, while few (𝑛 = 3) wrote additional tooling on their own. Overall, the tooling situation for specific testing areas was seen in a positive light. In contrast, “all-in-one” tools were seen in a negative light. Multiple interviewees remarked that a “fully automated tool cannot replace a pen-tester” or, as one interviewee cynically replied, “yeah, I want a tool where I can click a button and magically I get a finished pen-test report”. Practitioners relied on multiple small tools for different areas, e.g., gobuster [8] for content discovery or sqlmap [18] for testing SQL injections. PortSwigger’s BURP Proxy Suite [12] was used by every web application pen-tester interviewed. See Table 3 for a list of commonly named automated tools.

Table 3: Commonly Named Tools. # denotes the interviewee count.
Tool | Area | Availability | #
PortSwigger BURP Suite [12] | Web-Testing | free, commercial | 7
BloodHound [2] | AD Enumeration | OSS | 5
SQLMap.py [18] | Web/SQLi | OSS | 3
nmap [14] | Network | OSS | 7
nessus [13] | Network | commercial | 8
gobuster [8], dirbuster [6] | Network | OSS | 2
certify [7] | AD Exploitation | OSS | 4
metasploit [11] | Exploitation | OSS | 3
nuclei [15] | Exploitation | OSS | 3

Problems with tooling. Interviewees remarked that the setup overhead of automation tools can be problematic. Especially for short-term projects, such as vulnerability assessments or tightly-timed web application pen-tests, the initial setup overhead and processing time can be prohibitive for deploying tooling. Another problem was coverage: even within the same problem area, the coverage of different tools diverges widely, and the situation is made worse as commonly no tool provides full coverage of a testing area. To counteract this, practitioners commonly use multiple tools redundantly, which adds processing time and requires manual merging of the different tools’ results.
Some areas were described as not suitable for automation. As OT systems are finicky and the potential fallout catastrophic, automated tests are often not feasible. Additionally, when performing social engineering during red-team assignments, fully automated tools are avoided both for fear of detection and because of ethical qualms, as they would be used on human targets.
Extendability and Community were identified as important discriminators by practitioners. Both are related to fast-paced
tant discriminator by practitioners. Both are related to fast-paced

Table 4: Excerpt of sub-themes of “Identifying Vulnerable Areas or Operations”. Subthemes mentioned by interviewees; # denotes the interviewee count.
Subtheme | # | Representative Quotes
High-Level Targeting | 7 | “We select the attack that would be the most cost-effective for the attacker”
 | | “. . . and windows environments. . . before we attack proprietary protocols we’ll attack a windows domain server missing updates.”
Experience | 12 | “How we actually work? We look for obvious vulnerabilities, those that jump out immediately”
 | | “I know that from my time programming C/C++. . . I find the errors that I made back then”
 | | “I search for vulnerabilities that I have seen and exploited before.”
 | | “. . . often I see systems that I have already seen when doing CTFs. . . then I already know how to attack it”
Familiarity with Target | 4 | “If it is a repeat customer then you already know how they tick and what their problems are”
Observed Features | 10 | “One runs through the web applications and sees a feature and thinks ‘this looks interesting, could it be implemented weirdly’?”
 | | “If there’s an upload function, I am interested.”
Observed Technology | 11 | “Some things cannot be done securely, for example PHP.”
 | | “Well, you always feel happy when the application is somehow a PHP application.”
Modeling Behavior | 9 | “Testing is manual, as you need to get a feel how the application is supposed to work and answer”
 | | “You search for unexpected behavior. . . for example a database that throws an error when you enter a ’.”
Intuition | 8 | “This will be esoteric. . . but I believe there is some organ that tingles if an operation looks fishy”

developments within the exploit community: if a tool can be proactively extended or scripted by the community, it and its implemented methods can evolve faster compared to reactive development within walled gardens. An example of an OSS tool utilizing community-provided detection rules is nuclei [15]; an example of a commercial tool with good OSS extendability is the PortSwigger BURP Proxy Suite [12] with its integrated BApp Store.
Manual fine-tuning to reduce search space. Multiple interviewees mentioned that they adjust the tooling according to their ongoing findings. Examples of this feedback loop would be limiting tested vulnerability classes to feasible ones, e.g., not testing a static website for SQL injections, or limiting tested database queries to concrete database dialects.

6 HOW DO HACKERS THINK?
While Section 5 describes the externally visible types of assignments and the activities performed during security testing, this section focuses on the inner workings and thoughts of security professionals during security testing, detailing their decision processes and potential sources of their intrinsic motivation.

6.1 Exploiting Configuration vs. Applications
A recurring theme was the distinction between searching for known vulnerabilities and hunting for new vulnerabilities. Examples of the former would be executing a vulnerability scan against off-the-shelf software, or investigating a Microsoft Active Directory for misconfigurations. An example of the latter would be searching for unknown SQL injection vulnerabilities within a custom-written web application or discovering a new vulnerability class. Synonyms given for “searching for known vulnerabilities vs. hunting for new vulnerabilities” were “vulnerability assessments vs. application security” or “hacking configuration vs. hacking programs”.
These two categories are fluid. For example, findings from “hunting for bugs”, i.e., a new 0-day exploit against a piece of software, can end up within “searching for known vulnerabilities”, i.e., when a rule for detecting that 0-day is added to a web vulnerability scanner.
While not stated explicitly during the interviews, we assume that our interviewees’ mental model is primed through their understanding of this divide, and that it highly impacts tool and technique selection. As an interviewee mentioned, “you don’t hunt for 0-days during an Active Directory assignment”. This implies that pen-testers will not consider spending days fuzzing a domain controller for new vulnerabilities during an internal network scan.

6.2 Identifying Vulnerable Areas or Operations
Participants often described exploratory testing during which they were guided by intuition. Through follow-up questions, further information about this intuition was gathered.
All interviewees were analyzing requests and responses; the former for conspicuous parameters and the latter for occurrences of error messages or other suspicious behavior, that is, behavior that does not fulfill the testers’ expectations.
During the interviews, multiple areas were identified where security testers possessed a mental model of the expected behavior of the software-under-test; during testing, security testers were trying to find operations that could trigger unexpected behavior which, in turn, might turn into a security vulnerability. Those mental models were built from experience, e.g., prior assignments or experience within the specific business area, as well as adapted during the security test itself, e.g., “learning how the application works”. A summary of multiple observed mental models can be seen in Table 5.
Pen-testers attributed their intuition to experience which could be built from performed penetration tests, participation in CTF events, prior engagements with the same client or industry area,

Table 5: Excerpt of observed models
Area | Input | Identified Elements | Describes | Used for
Web Testing | Web Traffic | Access Rules | ACL model | Authentication Checks
Red-Teaming | Network Traffic | Communication Patterns | Expected Communication | Covert Channels
Network Tests | local data and network shares | File data and metadata | Company Data | find juicy information
OT tests | data flows | data flow model | system architecture | identify test scenarios
OT tests | network traffic | network commands | network protocol | protocol reversing
Web Testing | web traffic, context | used technologies | technology stack | potential vulnerabilities
Web Testing | web traffic | HTTP requests and responses | input model | generate tests
IoT | network traffic and PCB images | data flows | system architecture | identify attack vectors

or by implementing similar software solutions during their former life as software developers. Participants remarked that during testing, they are triggered by vulnerabilities or exploits they had recently read about and, in response, would start additional research. One penetration tester explicitly mentioned creating a topic map during everyday research which they then refer back to during assignments.
Related to experience, practitioners had preconceptions about the technologies used or features implemented. Some functionality, e.g., file uploads or XML processing, was thought to be hard to implement in a secure manner; to quote a participant, “there are some things that just cannot be implemented correctly”. Similar resentments were discovered about the technologies used. Some programming languages were deemed to increase the probability of an application containing defects; an interviewee mentioned thinking “let’s see how developers have been fooled again” when going into assignments. As cynical as it may be, PHP was often mentioned as such a technology.
It is important to note that participants may be subject to selection and survivor bias. They might find vulnerabilities in the areas they focus on while overlooking plentiful vulnerabilities in other areas they have historically ignored. After a vulnerability has been found in an area, the increased attention upon that area often yields multiple subsequent vulnerabilities [9].
Two distinct positions were encountered regarding the learnability of this intuition. On the one hand, “nobody is born a super hacker”; on the other hand, one interviewee mentioned that the best penetration testers in their peer group already exhibited hacking-style behavior during kindergarten. Debating nature-vs-nurture or art-vs-craft would go beyond the scope of this publication. Regardless of this, there was consensus that hacking skills are improved through practice.

6.3 Dealing with Uncertainty
Pen-testers routinely have to deal with uncertainty as they lack transparency of the tested system: pen-testers must make assumptions about requirements, the tested system’s architecture, as well as about accepted input values and the corresponding expected output parameters. They evaluate the system’s behavior against their expectations and, if the system deviates, examine the deviation for exploitability. When in doubt, testers can escalate and query their clients, but this is deemed to be time-inefficient and thus minimized.
A concrete example of uncertainty would be a pen-tester issuing an HTTP request for which they expect an “access denied” response but instead receiving a successful response containing data which cannot be clearly classified as belonging to the current user or not. Another example would be testing for time-based blind SQL injection vulnerabilities where the measured latency is not sufficiently deterministic for verifying the vulnerability. Similarly, second-order attacks cannot easily be pinned on the invoking background process.
Penetration testers modify existing valid requests to include malicious payloads. When these requests produce errors, the reason can be uncertain: was it a potential vulnerability? A successful input filtering algorithm? Or an application error that cannot be exploited? This classification impacts the selection of subsequent requests and attacks.
Another instance of uncertainty occurs during tool optimization: tool output is continuously used to further optimize subsequent tool invocations. Interviewees performed a sanity check on whether reported system fingerprints were plausible and discarded them otherwise. In addition, some high-impact decisions, such as limiting the expectations to a single DBMS type, were verified with the client before incorporating them into tooling selection or configuration.

6.4 Don’t waste my time
One theme discovered was that interviewees feel the need to be time-efficient. This might be related to tight time budgets or very constrained test-bed availability being anathema to good test coverage. Shortcuts were taken to reduce menial tasks. For example, during internal network tests, a breach is already assumed. The interviewees defended this decision through “this will eventually happen through social engineering anyways”. A similar argument was given for being provided accounts with local administrative privileges: “a real attacker can just wait for the next 0-day”, or for disabling Anti-Virus solutions as evading them “takes time not skill”. Tests with foregone conclusions were considered tedious; one example given was testing an Anti-Virus solution embedded within a web application with different payloads. The repetitiveness of this task might contribute to this too. Interviewees’ aversion to responsible disclosure procedures might be correlated with bad experiences during prior disclosures: the vendor’s responses were mostly “wasting” the interviewee’s time.
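The timing uncertainty named in Section 6.3 (latency that is “not sufficiently deterministic” when testing for time-based blind SQL injection) can be illustrated with a minimal sketch. It is not a method reported by the interviewees. The function name `classify_delay`, the `tolerance` parameter, and the three verdict labels are hypothetical, and the thresholds are assumptions chosen for illustration. The idea is to compare latency samples of a benign request against samples of a request carrying a delay payload, and to refuse a verdict when the baseline jitter is too large relative to the injected delay.

```python
from statistics import median


def classify_delay(baseline, injected, expected_delay, tolerance=0.25):
    """Classify latency samples (in seconds) from a suspected time-based blind SQLi.

    baseline: latencies of the unmodified request
    injected: latencies of the request carrying a delay payload
    expected_delay: the delay the payload asks the database for
    tolerance: hypothetical fraction of expected_delay accepted as noise
    """
    if len(baseline) < 3 or len(injected) < 3:
        return "uncertain"  # too few samples to say anything

    base_median = median(baseline)
    # Median absolute deviation: a robust estimate of baseline timing jitter.
    jitter = median([abs(x - base_median) for x in baseline])
    shift = median(injected) - base_median

    if jitter > expected_delay * tolerance:
        return "uncertain"  # latency is not deterministic enough
    if shift >= expected_delay * (1 - tolerance):
        return "likely vulnerable"
    if shift <= expected_delay * tolerance:
        return "likely not vulnerable"
    return "uncertain"  # partial shift: needs manual review


stable = [0.20, 0.22, 0.21, 0.19, 0.23]
print(classify_delay(stable, [5.21, 5.19, 5.25, 5.20, 5.22], expected_delay=5))
print(classify_delay([0.2, 3.0, 6.0, 0.5, 4.0], [5.1, 0.3, 7.0, 2.0, 5.5], expected_delay=5))
```

The explicit “uncertain” verdict mirrors the classification problem described above: rather than forcing a yes/no answer, the tester is told when more samples or manual verification are needed.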

6.5 Quality Control
Pen-testers were concerned about the quality of their work, especially when working with high-stakes data such as health records: “nobody wants to be that pen-tester that overlooked a vulnerability that was later exploited”. A tester’s attention is also a limited resource: at least one pen-tester remarked that web application tests can be monotonous and that after 3–4 days their motivation degrades. Usage of checklists, automated baseline scans and working in teams were encountered as quality improvement measures.
Applicability of checklists depends upon the testing domain. Some domains, e.g., Web-Applications or Mobile Applications, were seen as narrow and thus supporting the creation of security checklists. Other domains such as IoT were described as diverse and impeding the creation of a unified security checklist.
Checklists were often derived from open industry standards; they were maintained and extended by companies, but the resulting in-house checklists were seldom given back to the community and published. A common base for checklists was the OWASP trifecta of Vulnerability Top 10, Software Verification Standard and Testing Guide; instances of those are provided by OWASP for multiple domains such as Web-Applications [45, 55, 56], Mobile Applications [38, 47], IoT [36] or Firmware [35]. Surprisingly, neither MITRE ATT&CK® [50] nor PTES [17] was mentioned by our interviewees. Working in teams or asking colleagues can be seen as a broadening of the available experience pool or as employing a “human checklist”. Usage of automated tools as baseline scans that uphold minimal quality standards can also be interpreted as quality control. Interviewees mentioned usage of fully automated commercial web vulnerability scanners such as NetSparker [10] or Acunetix [1] for this purpose. Some HTTP interception proxies, for example PortSwigger BURP [12] or OWASP ZAP [16], have gained similar scanning capabilities. Those were used by some of the interviewees and encroach on terrain traditionally taken by web vulnerability scanners. In defense of testers, full coverage of the software-under-test is not feasible due to the black- to gray-box nature of security assignments.

6.6 Dealing with Change
Security is in a constant state of flux. Compared to other disciplines, the existence of active adversaries (the struggle between red and blue teams) leads to a Red Queen’s Race: participants must run to stand still [27, 37]. A side that does not evolve will be overcome by its adversary.
One example: interviewees lamented that some areas (breaking into web applications, breaching external infrastructure/perimeters and reverse-engineering) have become harder due to boosted defenses such as usage of frameworks, improved default configurations, and heightened awareness of security posture. They are partially switching work areas, i.e., turning towards OT or internal network testing. The security of those areas is currently being slowly improved, yielding the following quote: “in the future we might use social engineering not only for gaining the initial foothold, but also for lateral movement”. If Zero-Trust Architectures become common, web-hacking might reemerge during Internal Network Testing.

7 DISCUSSION AND IMPLICATIONS
We review our findings following the structure of our initial research questions to formulate points of discussion and implications for security researchers and practitioners.

7.1 Alignment between Research and Industry
We started this study with two questions, “What do common security tests look like?” and “How do Hackers perform their work?”. Those questions were broadly formulated to gain insight into what common assignments for practitioners look like, and how practitioners navigate their tasks within those assignments. These questions were particularly motivated by the fact that existing work is not grounded in the realities of offensive security practice (i.e., the work of penetration testers).

7.1.1 Research must match a Project’s Scope. During interviews, we identified typical security assignments with their respective typical resource allocations. Research should heed those allocations. For example, for web vulnerability assessments, a typical project was described as 2–4 days of manual effort. Setting up a fuzzing pipeline, running the fuzzer, and analyzing its results is not feasible in this short time frame, thus rendering generic fuzzing rather infeasible for web security practitioners. Still, searching Google Scholar for “fuzzing web applications” yields 23000 results. Given that interviewees mentioned the prevalence of web application frameworks and their preference for gray-box testing, SBOM-based solutions should be a better fit and would warrant additional research.
Most assignment types were done solo or as a paired team, indicating that research into collaborative solutions might be of limited use. The one exception using larger teams was Red-Teaming, although here collaborative solutions integrated into C2-frameworks are already commonly used.
Automation with direct target interaction was deemed problematic in the Red-Teaming and OT areas due to the sensitivity of their targets. Within the OT area, security by obscurity still seems to be common, limiting the opportunities for approaches based on source-code analysis. On the other hand, improvements to reverse-engineering binaries or protocols would be appreciated by practitioners.

7.1.2 Security Researchers and Security Practitioners. Separating security into academic research and industry creates a false dichotomy. Industry itself is, at least, separated into security practitioners and security researchers. The former perform customer-specific assignments: those are the people that typically perform short-term penetration tests and directly communicate with clients to improve their security. In contrast, security researchers do not exclusively work on short-term client projects but spend time researching new attack techniques and vectors. An example of the former would be an anonymous pen-tester working on a different web application every week; an example of the latter would be James Kettle investigating and documenting a new attack class, HTTP Request Smuggling, over many years. Security researchers search for new attack vectors or analyze a software product for a prolonged time to release exploits or be awarded

Table 6: Excerpt of Dealing with Change: how is Security-Testing changing?
Sub-Theme | # | Representative Quotes
Impact of Frameworks | 5 | “Security improves because frameworks help developers write secure code”
 | | “Pen-Testing has become boring as critical vulnerabilities are found less often”
 | | “Usage of secure frameworks pushed vulnerability hunting towards business logic.”
Defensive Mindset | 3 | “Developer awareness about security has become better.”
Changing targets | 7 | “In the future we might use social engineering not only for gaining the initial foothold, but also for lateral movement”
 | | “Rich-client applications are still fun. . . they feel like web applications twenty years ago.”
 | | “Active-Directory: I moved into this area because it is fun to break into a system within days.”
 | | “The situation in OT will stay the same. It’s hard to modernize all the legacy hard- and software.”
 | | “Some OT networks are ransomware-ready.”

CVEs. Security practitioners are more focused upon hunting configuration errors, exploiting well-known vulnerabilities, or identifying new instances of known attack classes. They utilize information and tools from security researchers for that.
Tools such as fuzzers are thus more applicable to security researchers than to security practitioners. The large amount of research into fuzzing indicates that academic research is targeting security researchers rather than practitioners and thus only indirectly improves the security landscape, when information from security researchers trickles down to practitioners.

7.2 Opportunities for Research
We now want to answer the important final question, “What tedious or time-consuming areas could be improved?”, throughout the rest of this section and frame the answers as opportunities for future research that directly benefits security practitioners.

7.2.1 Automating Authorization Testing. For security tests with a relatively restricted scope such as web application tests, we suggest research into covering additional vulnerability classes. Authorization Testing is currently performed manually and was named one of the most time-consuming parts of testing; it would thus be a fruitful target for automation research. Current gaps are manifold: detection of potential operations, accepted parameters, and potentially malicious parameters; generation of payloads; and the assessment of an attack’s success. A subtle problem is the classification of returned web pages and downloads into authorized and unauthorized content, as this is highly context specific.

7.2.2 Gray-box Testing. The preference for gray-box testing in the work practice of software security professionals was surprising and can have a significant impact on software testing design: if security testing solutions can access configuration or the target’s source code (or if the customer is willing to instrument the target software through sensors as is done in IAST), automated software testing approaches using source code or configuration become increasingly feasible for security testing. Potential synergy effects, as well as automated source code and configuration file analysis from a security perspective, are currently underexplored and ripe for investigation. Research in this area yields dual-use tools, aiding both offensive security professionals searching for vulnerabilities as well as defensive software developers trying to prevent vulnerabilities from entering their code in the first place.

7.2.3 API Workflow Discovery for Security Test Generation. Interviewees lamented that the manual creation of API security test cases is a tedious and time-consuming process. While the automation of API test generation would be advantageous, the following gaps currently prevent this: discovery of API endpoints and operations, generation of benign requests as baseline, combining single requests into test flows using social and semantic information, deriving malicious test cases, and finally evaluating test outcomes. The automatic generation of security test suites based upon API definitions and traffic patterns would reduce testers’ odium for utilizing this important class of testing. While there have been several works that propose approaches for API discovery [53, 60], the kind of discovery we envision would focus on maximizing coverage for security tests.

7.2.4 Information Discovery for Security Testing. Internal Network Tests and Red-Teaming are highly dependent on discovering and utilizing client-specific information. Information gathering from compromised systems or network shares is currently performed manually; automated and stealthy information gathering could improve its efficiency. The goal is the efficient identification of “juicy” information while reducing the number of read requests to minimize network impact and the chance of triggering intrusion detection systems. Research in this area would also benefit defenders as it would make forensic work, e.g., analyzing data breaches, more efficient.

7.2.5 Scaling Personalized Phishing with ML. Phishing is an important part of the red-teaming workflow and is commonly done manually, due to the customization that proper phishing requires. We see an opportunity to investigate increasing the scalability of social engineering through machine learning techniques. To create highly effective phishing mails, mails are currently customized manually to fit the respective recipient. Machine learning techniques could automate this and thus provide Spear Phishing at Scale, as they have already been shown to personalize natural language communication in other domains [30, 40, 59].

An additional avenue for research is the identification of potential targets for social engineering, both from an external perspective (identifying initial recipients within a company) and by detecting informal networks within companies to enrich subsequent social-engineering campaigns. This is an example of the red-teaming theme of “understanding how companies function”.

7.2.6 Human-in-the-loop for OT testing. OT professionals were wary of fully automated security tests due to the potential negative impact upon stability and thus availability. We suggest research into supplemental areas while letting humans decide which attacks to execute. One example would be to reduce the pain and effort of reverse engineering protocols: OT tests are very time-bound, so there is little time for fuzzing or reverse-engineering OT protocols, while the potential benefit might be immense because security is provided by the obscurity of those protocols. Combining fuzzing with automatic reverse-engineering should yield large benefits [32]. The fear of potential fallout has other consequences too: OT tests are often performed by executing scenarios in lockstep with the customer. The scenarios are identified through threat modeling components and their data flows. To reduce the time spent on this effort, ways of automatically deriving scenarios, including attack paths, from system and data flow diagrams should be investigated.

Both OT professionals and red-teams were wary of fully automated testing solutions due to the potential negative impact upon stealth (red-teaming) or stability (OT). To facilitate the deployment of automated systems, research into Human-Computer Interaction to bolster the acceptance of ML and automated systems is needed. We assume that important topics will include humans-in-the-loop as well as the explainability of automated reasoning.

7.2.7 Studying Knowledge Communities for Security Testers. Our interview participants unsurprisingly felt the need for ongoing education regarding new vulnerabilities and security trends. They synthesized information from multiple sources, the pivotal one being Twitter. Research on how developers stay current [48] and how development communities shape around news outlets [22] should be extended to the security arena, especially now that recent stewardship changes at Twitter might impact its reach. Automated recommender systems utilizing diverse hacking news sources such as news outlets, social media, and the “darknet” should enable security professionals to stay up to date more easily.

8 DATA AVAILABILITY
The data used in this study was collected through interviews with a close-knit community of ethical hackers. Deanonymization would likely not be preventable. In accordance with ethical guidelines and our agreement with the interview participants, we decided not to release the interview data. All meta-information related to the interviews, including the interview guide and consent forms, is part of our replication package. The data protection and privacy of the participants is a top priority, and all ethical guidelines have been followed in the collection and handling of the data.

ACKNOWLEDGMENT
We thank the anonymous interview participants for their time and Loren Kohnfelder for providing feedback.

REFERENCES
[1] Acunetix: Web vulnerability scanner. https://www.acunetix.com/. Accessed: 2022-09-30.
[2] Bloodhoundad: Six degrees of domain admin. https://github.com/BloodHoundAD/BloodHound. Accessed: 2022-09-30.
[3] Conti cyber attack on the HSE, independent post incident review. https://www.hse.ie/eng/services/publications/conti-cyber-attack-on-the-hse-full-report.pdf. Accessed: 2022-09-30.
[4] Conti’s hacker manuals - read, reviewed & analyzed. https://www.akamai.com/blog/security/conti-hacker-manual-reviewed. Accessed: 2022-09-30.
[5] Delve: Software tool to analyze qualitative data. https://delvetool.com/. Accessed: 2022-10-01.
[6] Dirbuster. https://www.kali.org/tools/dirbuster/. Accessed: 2022-09-30.
[7] Ghostpack/certify: Active directory certificate abuse. https://github.com/GhostPack/Certify. Accessed: 2022-09-30.
[8] gobuster: Directory/file, dns and vhost busting tool written in go. https://github.com/OJ/gobuster. Accessed: 2022-09-30.
[9] https://nakedsecurity.sophos.com/2021/07/16/more-printnightmare-we-told-you-not-to-turn-the-print-spooler-back-on/. https://nakedsecurity.sophos.com/2021/07/16/more-printnightmare-we-told-you-not-to-turn-the- Accessed: 2022-10-03.
[10] Invicti: Web application security for enterprise. https://www.invicti.com/. Accessed: 2022-09-30.
[11] Metasploit: Penetration testing software. https://github.com/rapid7/metasploit-framework. Accessed: 2022-09-30.
[12] Methodology for top 10. https://groups.google.com/a/owasp.org/g/leaders/c/pFLxDLE28ZA. Accessed: 2022-09-30.
[13] Nessus vulnerability assessment solution. https://www.tenable.com/products/nessus/nessus-profes Accessed: 2022-09-30.
[14] Nmap: the network mapper - free security scanner. https://nmap.org. Accessed: 2022-09-30.
[15] Nuclei: Fast and customizable vulnerability scanner based on simple yaml based dsl. https://github.com/projectdiscovery/nuclei. Accessed: 2022-09-30.
[16] Owasp zed attack proxy (zap). https://www.zapproxy.org/. Accessed: 2022-09-30.
[17] Ptes technical guidelines. http://www.pentest-standard.org/index.php/PTES_Technical_Guidelines. Accessed: 2022-09-30.
[18] sqlmap: automatic sql injection and database takeover tool. https://sqlmap.org/. Accessed: 2022-09-30.
[19] Windows print spooler remote code execution vulnerability (cve-2021-34527). https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-34527. Accessed: 2022-09-30.
[20] Zero day initiative. https://www.zerodayinitiative.com/blog. Accessed: 2022-09-30.
[21] Directive (eu) 2016/1148 of the european parliament and of the council of 6 july 2016 concerning measures for a high common level of security of network and information systems across the union. Official Journal of the European Union L 194 (2016-07-06), 1–30.
[22] Aniche, M., Treude, C., Steinmacher, I., Wiese, I., Pinto, G., Storey, M.-A., and Gerosa, M. A. How modern news aggregators help development communities shape and share knowledge. In Proceedings of the 40th International conference on software engineering (2018), pp. 499–510.
[23] Bhuiyan, F. A., Rahman, A., and Morrison, P. Vulnerability discovery strategies used in software projects. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering Workshops (2020), pp. 13–18.
[24] Blankenship, L. The conscience of a hacker. Phrack 7 (Jan. 1986).
[25] Boyanov, P. Educational exploiting the information resources and invading the security mechanisms of the operating system windows 7 with the exploit eternalblue and backdoor doublepulsar. Association Scientific and Applied Research 14 (2018), 34.
[26] Braun, V., and Clarke, V. Reflecting on reflexive thematic analysis. Qualitative research in sport, exercise and health 11, 4 (2019), 589–597.
[27] Bukac, V., Lorenc, V., and Matyáš, V. Red queen’s race: Apt win-win game. In Cambridge International Workshop on Security Protocols (2014), Springer, pp. 55–61.
[28] Ceccato, M., Tonella, P., Basile, C., Falcarin, P., Torchiano, M., Coppens, B., and De Sutter, B. Understanding the behaviour of hackers while performing attack tasks in a professional setting and in a public challenge. Empirical Software Engineering 24 (2019), 240–286.
[29] Durumeric, Z., Li, F., Kasten, J., Amann, J., Beekman, J., Payer, M., Weaver, N., Adrian, D., Paxson, V., Bailey, M., et al. The matter of heartbleed. In Proceedings of the 2014 conference on internet measurement conference (2014), pp. 475–488.
[30] Ferretti, S., Mirri, S., Prandi, C., and Salomoni, P. Automatic web content personalization through reinforcement learning. Journal of Systems and Software 121 (2016), 157–169.
[31] Francis, J. J., Johnston, M., Robertson, C., Glidewell, L., Entwistle, V., Eccles, M. P., and Grimshaw, J. M. What is an adequate sample size? operationalising data saturation for theory-based interview studies. Psychology and health 25, 10 (2010), 1229–1245.
[32] Gascon, H., Wressnegger, C., Yamaguchi, F., Arp, D., and Rieck, K. Pulsar: Stateful black-box fuzzing of proprietary network protocols. In Security and Privacy in Communication Networks: 11th EAI International Conference, SecureComm 2015, Dallas, TX, USA, October 26-29, 2015, Proceedings 11 (2015), Springer, pp. 330–347.
[33] Guba, E. G., Lincoln, Y. S., et al. Competing paradigms in qualitative research. Handbook of qualitative research 2, 163-194 (1994), 105.
[34] Guest, G., Bunce, A., and Johnson, L. How many interviews are enough? an experiment with data saturation and variability. Field methods 18, 1 (2006), 59–82.
[35] Guzman, A. Owasp firmware security testing methodology. https://scriptingxss.gitbook.io/firmware-security-testing-methodology/. Accessed: 2022-09-30.
[36] Guzman, A., and Bassem, C. Owasp iot security verification standard. https://github.com/OWASP/IoT-Security-Verification-Standard-ISVS/releases/download/1.0RC/OWASP_ISVS-1.0RC-en_WIP_.pdf, Dec 2020.
[37] Harang, R., and Ducau, F. N. Measuring the speed of the red queen’s race. BlackHat: Las Vegas, NV, USA (2018).
[38] Holguera, C., Müller, B., Schleier, S., and Willemsen, J. Owasp mobile application security verification standard. https://github.com/OWASP/owasp-masvs/releases/latest/download/OWASP_MASVS-v1.4.2-en.pdf, Jan 2022.
[39] Huaman, N., von Skarczinski, B., Wermke, D., Stransky, C., Acar, Y., Dreissigacker, A., and Fahl, S. A large-scale interview study on information security in and attacks against small and medium-sized enterprises. In 30th USENIX Security Symposium (2021).
[40] Katakis, I., Tsoumakas, G., Banos, E., Bassiliades, N., and Vlahavas, I. An adaptive personalized news dissemination system. Journal of intelligent information systems 32 (2009), 191–212.
[41] Kotulic, A. G., and Clark, J. G. Why there aren’t more information security research studies. Information & Management 41, 5 (2004), 597–607.
[42] Mackenzie, N., and Knipe, S. Research dilemmas: Paradigms, methods and methodology. Issues in educational research 16, 2 (2006), 193–205.
[43] Munaiah, N., Rahman, A., Pelletier, J., Williams, L., and Meneely, A. Characterizing attacker behavior in a cybersecurity penetration testing competition. In 2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) (2019), IEEE, pp. 1–6.
[44] Potter, B., and McGraw, G. Software security testing. IEEE Security & Privacy 2, 5 (2004), 81–85.
[45] Saad, E., and Mitchell, R. Owasp web security testing guide. https://github.com/OWASP/wstg/releases/download/v4.2/wstg-v4.2.pdf, Dec 2020.
[46] Saunders, M., and Tosey, P. The layers of research design. Tech. rep., University of Surrey, 2013.
[47] Schleier, S., Mueller, B., Holguera, C., and Willemsen, J. Owasp mobile application security testing guide. https://github.com/OWASP/owasp-mastg/releases/latest/download/OWASP_MASTG-v1.5.0.pdf, Sep 2022.
[48] Singer, L., Figueira Filho, F., and Storey, M.-A. Software engineering at the speed of light: how developers stay current using twitter. In Proceedings of the 36th International Conference on Software Engineering (2014), pp. 211–221.
[49] Smith, J., Theisen, C., and Barik, T. A case study of software security red teams at microsoft. In 2020 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) (2020), IEEE, pp. 1–10.
[50] Strom, B. E., Applebaum, A., Miller, D. P., Nickels, K. C., Pennington, A. G., and Thomas, C. B. Mitre att&ck: Design and philosophy. In Technical report. The MITRE Corporation, 2018.
[51] Summers, T. C. How hackers think: A mixed method study of mental models and cognitive patterns of high-tech wizards. Case Western Reserve University, 2015.
[52] Takanen, A., Demott, J. D., Miller, C., and Kettunen, A. Fuzzing for software security testing and quality assurance. Artech House, 2018.
[53] Torres, R., Tapia, B., et al. Improving web api discovery by leveraging social information. In 2011 IEEE International Conference on Web Services (2011), IEEE, pp. 744–745.
[54] van den Hout, N. J. Standardised Penetration Testing? Examining the Usefulness of Current Penetration Testing Methodologies. PhD thesis, 09 2019.
[55] van der Stork, A., Glas, B., Smithline, N., and Gigler, T. Owasp top 10:2021. https://owasp.org/Top10/0x00-notice/, Sep 2021.
[56] van der Stork, A., Grossman, J., Cuthbert, D., Lang, E., and Manico, J. Owasp application security verification standard. https://raw.githubusercontent.com/OWASP/ASVS/v4.0.3/4.0/OWASP%20Application%20Security% Oct 2021.
[57] Wysopal, C., Nelson, L., Dustin, E., and Dai Zovi, D. The art of software security testing: identifying software security flaws. Pearson Education, 2006.
[58] Wysopal, C., Nelson, L., Dustin, E., and Dai Zovi, D. The art of software security testing: identifying software security flaws. Pearson Education, 2006.
[59] Xu, M., Qian, F., Mei, Q., Huang, K., and Liu, X. Deeptype: On-device deep learning for input personalization service with minimal privacy concern. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2, 4 (2018), 1–26.
[60] Yessenov, K., Kuraj, I., and Solar-Lezama, A. Demomatch: Api discovery from demonstrations. ACM SIGPLAN Notices 52, 6 (2017), 64–78.
