CyBOK v1.1.0-4

The Cyber Security Body Of Knowledge

www.cybok.org

5. Establish Design Requirements. Design requirements guide the implementation of ’secure features’ (i.e., features that are well engineered with respect to security). Additionally, the architecture and design must be resistant to known threats in the intended operational environment.
The design of secure features involves abiding by the timeless security principles set
forth by Saltzer and Schroeder [8] in 1975 and restated by Viega and McGraw [1469] in
2002. The eight Saltzer and Schroeder principles are:
• Economy of mechanism. Keep the design of the system as simple and small as
possible.
• Fail-safe defaults. Base access decisions on permissions rather than exclusion; the
default condition is lack of access and the protection scheme identifies conditions
under which access is permitted. Design a security mechanism so that a failure
will follow the same execution path as disallowing the operation.
• Complete mediation. Every access to every object must be checked for authorisa-
tion.
• Open design. The design should not depend upon the ignorance of attackers but
rather on the possession of keys or passwords.
• Separation of privilege. A protection mechanism that requires two keys to unlock
is more robust than one that requires a single key when two or more decisions
must be made before access should be granted.
• Least privilege. Every program and every user should operate using the least set
of privileges necessary to complete the job.
• Least common mechanism. Minimise the amount of mechanisms common to
more than one user and depended on by all users.
• Psychological acceptability. The human interface should be designed for ease of
use so that users routinely and automatically apply the mechanisms correctly and
securely.
Two other important secure design principles include the following:
• Defense in depth. Layer multiple security controls to provide redundancy in the event of a security breach.
• Design for updating. The software's security must be designed for change, such as for security patches and security property changes.
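To make the fail-safe defaults and complete mediation principles concrete, the following is a minimal sketch; every name, user and permission here is hypothetical. Every request is checked, and anything not explicitly granted is denied, so a failure follows the same path as a denial.

```python
# Illustrative permission table (hypothetical users and objects):
# anything absent from this table is implicitly denied.
PERMISSIONS = {
    ("alice", "report.pdf"): {"read"},
    ("bob", "report.pdf"): {"read", "write"},
}

def is_allowed(user: str, obj: str, action: str) -> bool:
    """Complete mediation: called on every access.

    Fail-safe default: missing entries, unknown users and internal
    errors all fall through to the same outcome, access denied."""
    try:
        return action in PERMISSIONS.get((user, obj), set())
    except Exception:
        return False  # a failure follows the same path as a denial

assert is_allowed("bob", "report.pdf", "write")
assert not is_allowed("alice", "report.pdf", "write")   # not granted
assert not is_allowed("mallory", "report.pdf", "read")  # unknown user
```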
Design requirements also involve the selection of security features, such as cryptography,
authentication and logging to reduce the risks identified through threat modelling. Teams
also take actions to reduce the attack surface of their system design. The attack surface,
a concept introduced by Howard [1582] in 2003, can be thought of as the sum of the
points where attackers can try to enter data to or extract data from a system [1589, 1590].
In 2014, the IEEE Center for Secure Design [1583] enumerated the top ten security design
flaws and provided guidelines on techniques for avoiding them. These guidelines are as
follows:
(a) Earn or give, but never assume, trust.

KA Secure Software Lifecycle | July 2021 Page 563



(b) Use an authentication mechanism that cannot be bypassed or tampered with.


(c) Authorise after you authenticate.
(d) Strictly separate data and control instructions, and never process control instruc-
tions received from untrusted sources.
(e) Define an approach that ensures all data are explicitly validated.
(f) Use cryptography correctly.
(g) Identify sensitive data and how they should be handled.
(h) Always consider the users.
(i) Understand how integrating external components changes your attack surface.
(j) Be flexible when considering future changes to objects and actors.
6. Define and Use Cryptography Standards. The use of cryptography is an important design
feature for a system to ensure security- and privacy-sensitive data is protected from
unintended disclosure or alteration when it is transmitted or stored. However, an incorrect
choice in the use of cryptography can render the intended protection weak or ineffective.
Experts should be consulted on the use of clear encryption standards that provide specifics on every element of the encryption implementation, and only properly vetted encryption libraries should be used. Systems should be designed to allow the encryption libraries to be easily replaced, if needed, in the event a library is broken by an attacker, as was done to the Data Encryption Standard (DES) by ’Deep Crack’9, a machine designed by Paul Kocher, president of Cryptography Research, to brute-force search every possible key.
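The replaceability goal can be sketched as a thin abstraction over a vetted library, so the primitive is configured in exactly one place. This is an illustrative sketch using the Python standard-library hmac and hashlib modules, not a vetted design; the class and parameter names are assumptions.

```python
import hashlib
import hmac

class MacProvider:
    """Hide the concrete MAC primitive behind one small interface so the
    underlying algorithm can be swapped without touching calling code."""

    def __init__(self, algorithm: str = "sha256"):
        # The only place the primitive is named: replace it here if the
        # algorithm is ever broken.
        self.algorithm = algorithm

    def tag(self, key: bytes, message: bytes) -> bytes:
        return hmac.new(key, message, getattr(hashlib, self.algorithm)).digest()

    def verify(self, key: bytes, message: bytes, tag: bytes) -> bool:
        # Constant-time comparison avoids timing side channels.
        return hmac.compare_digest(self.tag(key, message), tag)

mac = MacProvider()            # today: HMAC with SHA-256
t = mac.tag(b"key", b"hello")
assert mac.verify(b"key", b"hello", t)
assert not mac.verify(b"key", b"tampered", t)
```

Callers depend only on `tag` and `verify`, so migrating to a stronger primitive is a one-line configuration change rather than a code audit.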
7. Manage the Security Risk of Using Third-Party Components. The vast majority of soft-
ware projects are built using proprietary and open-source third-party components. The
Black Duck On-Demand audit services group [1584] conducted open-source audits on
over 1,100 commercial applications and found open-source components in 95% of the applications, with an average of 257 components per application. Each of these components can have vulnerabilities upon adoption or in the future. An organisation should
have an accurate inventory of third-party components [1598], continuously use a tool
to scan for vulnerabilities in its components, and have a plan to respond when new
vulnerabilities are discovered. Freely available and proprietary tools can be used to
identify project component dependencies and to check if there are any known, publicly
disclosed, vulnerabilities in these components.
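The continuous-scanning idea can be sketched as below. The component names and the advisory list are hypothetical; a real tool would query an advisory feed such as the NVD or OSV rather than a hard-coded dictionary.

```python
# Hypothetical advisory data: component name -> affected versions.
KNOWN_VULNERABLE = {
    "examplelib": {"1.0.1", "1.0.2"},
}

def audit(inventory: dict) -> list:
    """Return the components whose pinned version has a known advisory.

    `inventory` maps component name to the version in use, i.e. the
    accurate inventory of third-party components the text calls for."""
    return [name for name, version in inventory.items()
            if version in KNOWN_VULNERABLE.get(name, set())]

findings = audit({"examplelib": "1.0.2", "otherlib": "3.4.0"})
assert findings == ["examplelib"]   # flagged for the response plan
```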
8. Use Approved Tools. An organisation should publish a list of approved tools and their
associated security checks and settings such as compiler/linker options and warnings.
Engineers should use the latest version of these tools, such as compiler versions, and
take advantage of new security analysis functionality and protections. Often, the resultant
software must be backward compatible with previous versions.
9. Perform Static Analysis Security Testing (SAST). SAST tools can be used for an auto-
mated security code review to find instances of insecure coding patterns and to help
ensure that secure coding policies are being followed. SAST can be integrated into the
commit and deployment pipeline as a check-in gate to identify vulnerabilities each time
the software is built or packaged. For increased efficiency, SAST tools can integrate into
9. https://w2.eff.org/Privacy/Crypto/Crypto_misc/DESCracker/HTML/19980716_eff_des_faq.html


the developer environment and be run by the developer during coding. Some SAST tools
spot certain implementation bugs, such as the use of unsafe or otherwise banned functions, and automatically replace them with (or suggest) safer alternatives as the developer
is actively coding. See also the Software Security Knowledge Area (Section 15.3.1).
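The banned-function check described above can be illustrated with a toy scanner. The banned list and suggested replacements are common C examples chosen for illustration; real SAST tools parse the code properly instead of pattern-matching text.

```python
import re

# Hypothetical coding-standard rule set: banned function -> safer suggestion.
BANNED = {"strcpy": "strncpy", "gets": "fgets", "sprintf": "snprintf"}

def scan(source: str) -> list:
    """Report (line number, banned function, suggested replacement)
    for each call to a banned function found in the source text."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for func, safer in BANNED.items():
            if re.search(rf"\b{func}\s*\(", line):
                findings.append((lineno, func, safer))
    return findings

code = "char d[8];\nstrcpy(d, user_input);\n"
assert scan(code) == [(2, "strcpy", "strncpy")]
```

Wired into a commit gate, a check like this fails the build whenever `scan` returns a non-empty list.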
10. Perform Dynamic Analysis Security Testing (DAST). DAST performs run-time verifica-
tion of compiled or packaged software to check functionality that is only apparent when
all components are integrated and running. DAST often involves the use of a suite of
pre-built attacks and malformed strings that can detect memory corruption, user priv-
ilege issues, injection attacks, and other critical security problems. DAST tools may
employ fuzzing, an automated technique of inputting known invalid and unexpected test
cases at an application, often in large volume. Similar to SAST, DAST can be run by the
developer and/or integrated into the build and deployment pipeline as a check-in gate.
DAST can be considered to be automated penetration testing. See also the Software
Security Knowledge Area (Section 15.3.2).
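As a toy illustration of the fuzzing idea, the sketch below throws randomly combined hostile inputs at a stand-in parsing routine and records the outcomes; real DAST fuzzers drive a running, integrated system rather than a single function, and the seed inputs here are assumptions.

```python
import random

def parse_age(text: str) -> int:
    """Stand-in routine under test: parses and range-checks an age."""
    value = int(text)
    if not 0 <= value <= 150:
        raise ValueError("out of range")
    return value

# Known invalid and unexpected cases, per the fuzzing description above.
SEEDS = ["0", "42", "-1", "", "NaN", "9" * 100, "\x00", "0x10", " 7 "]

def fuzz(runs: int = 200) -> dict:
    random.seed(1)  # deterministic for reproducible results
    outcomes = {"ok": 0, "rejected": 0}
    for _ in range(runs):
        case = random.choice(SEEDS) + random.choice(["", "!", "\n"])
        try:
            parse_age(case)
            outcomes["ok"] += 1
        except (ValueError, OverflowError):
            outcomes["rejected"] += 1  # controlled rejection, not a crash
    return outcomes

results = fuzz()
assert results["ok"] + results["rejected"] == 200
```

Any exception outside the caught, expected types would surface as a crash, which is exactly the kind of critical problem the fuzzer exists to find.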
11. Perform Penetration Testing. Manual penetration testing is black box testing of a run-
ning system to simulate the actions of an attacker. Penetration testing is often performed
by skilled security professionals, who can be internal to an organisation or consultants,
opportunistically simulating the actions of a hacker. The objective of a penetration test
is to uncover any form of vulnerability, from small implementation bugs to major design flaws, resulting from coding errors, system configuration faults or other operational deployment weaknesses. Tests should attempt both unauthorised misuse
of and access to target assets and violations of the assumptions. A widely-referenced
resource for structuring penetration tests is the OWASP Top 10 Most Critical Web Application Security Risks10. Penetration testing can find the broadest variety of vulnerabilities, although usually less efficiently than SAST and DAST [1585].
Penetration testers can be referred to as white hat hackers or ethical hackers. In the
penetration and patch model, penetration testing was the only line of security analysis
prior to deploying a system.
12. Establish a Standard Incident Response Process. Despite a secure software lifecycle,
organisations must be prepared for inevitable attacks. Organisations should proactively
prepare an Incident Response Plan (IRP). The plan should identify who to contact in case of a security emergency and establish protocols for efficient vulnerability mitigation, for customer response and communication, and for the rapid deployment of a fix. The
IRP should include plans for code inherited from other groups within the organisation
and for third-party code. The IRP should be tested before it is needed. Lessons learned through responses to actual attacks should be factored back into the SDL.
10. https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project


17.2.1.2 Touchpoints

International software security consultant, Gary McGraw, provided seven Software Security
Touchpoints [1578] by codifying extensive industrial experience with building secure products.
McGraw uses the term touchpoint to refer to software security best practices which can
be incorporated into a secure software lifecycle. McGraw differentiates vulnerabilities that
are implementation bugs and those that are design flaws [1583]. Implementation bugs are localised errors, such as buffer overflow and input validation errors, in a single piece of code, making them easier to spot and comprehend. Design flaws are systemic problems at the
design level of the code, such as error-handling and recovery systems that fail in an insecure
fashion or object-sharing systems that mistakenly include transitive trust issues [1578]. Kuhn
et al. [1598] analysed the 2008 - 2016 vulnerability data from the US National Vulnerability
Database (NVD)11 and found that 67% of the vulnerabilities were implementation bugs. The
seven touchpoints help to prevent and detect both bugs and flaws.
These seven touchpoints are described below in order of effectiveness, based upon McGraw’s experience of the utility of each practice over many years; the ordering is therefore prescriptive:
1. Code Review (Tools).
Code review is used to detect implementation bugs. Manual code review may be used,
but requires that the auditors are knowledgeable about security vulnerabilities before
they can rigorously examine the code. ’Code review with a tool’ (a.k.a. the use of static
analysis tools or SAST) has been shown to be effective and can be used by engineers who do not have expert security knowledge. For further discussion on static analysis,
see Section 2.1.1 bullet 9.
2. Architectural Risk Analysis.
Architectural Risk Analysis, which can also be referred to as threat modelling (see Section
2.1.1 bullet 4), is used to prevent and detect design flaws. Designers and architects
provide a high-level view of the target system and documentation for assumptions, and
identify possible attacks. Through architectural risk analysis, security analysts uncover
and rank architectural and design flaws so mitigation can begin. For example, risk
analysis may identify a possible attack type, such as the ability for data to be intercepted
and read. This identification would prompt the designers to look at all their code’s traffic flows to see if interception was a worry, and whether adequate protection (e.g., encryption) was in place. The review prompted by the analysis is what uncovers design flaws, such as sensitive data being transported in the clear.
No system can be perfectly secure, so risk analysis must be used to prioritise secu-
rity efforts and to link system-level concerns to probability and impact measures that
matter to the business building the software. Risk exposure is computed by multiply-
ing the probability of occurrence of an adverse event by the cost associated with that
event [1599].
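The risk exposure calculation cited above can be shown with made-up figures; the probabilities and costs below are purely illustrative.

```python
# Risk exposure = probability of the adverse event x cost of that event.
def risk_exposure(probability: float, cost: float) -> float:
    """Expected loss from an adverse event, used to rank mitigation work."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * cost

# A 5%-likelihood breach costing 200,000 outranks a 30% event costing 20,000,
# so the rarer-but-costlier risk gets mitigated first.
high_impact = risk_exposure(0.05, 200_000)   # expected loss 10,000
frequent = risk_exposure(0.30, 20_000)       # expected loss 6,000
assert high_impact > frequent
```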
McGraw proposes three basic steps for architectural risk analysis:
• Attack resistance analysis. Attack resistance analysis uses a checklist/systematic
approach of considering each system component relative to known threats, as is
done in Microsoft threat modelling discussed in Section 2.1.1 bullet 4. Information
11. http://nvd.nist.gov


about known attacks and attack patterns is used during the analysis to identify risks in the architecture and to understand the viability of known attacks. Threat
modelling with the incorporation of STRIDE-based attacks, as discussed in Section
2.1.1 bullet 4, is an example process for performing attack resistance analysis.
• Ambiguity analysis. Ambiguity analysis is used to capture the creative activity re-
quired to discover new risks. Ambiguity analysis requires two or more experienced
analysts who carry out separate analysis activities in parallel on the same system.
Through unifying the understanding of the multiple analyses, disagreements between the analysts can uncover ambiguity, inconsistency and new flaws.
• Weakness analysis. Weakness analysis is focused on understanding risk related
to security issues in other third-party components (see Section 2.1.1 bullet 7). The
idea is to understand the assumptions being made about third-party software and
what will happen when those assumptions fail.
Risk identification, ranking, and mitigation is a continuous process through the software
lifecycle, beginning with the requirement phase.
3. Penetration Testing.
Penetration testing can be guided by the outcome of architectural risk analysis (See
Section 2.1.2 bullet 2). For further discussion on penetration testing, see Section 2.1.1,
bullet 11.
4. Risk-based Security Testing.
Security testing must encompass two strategies: (1) testing of security functionality with
standard functional testing techniques; and (2) risk-based testing based upon attack
patterns and architectural risk analysis results (see Section 2.1.2 bullet 2), and abuse
cases (see Section 2.1.2 bullet 5). For web applications, testing of security functionality
can be guided by the OWASP Application Security Verification Standard (ASVS) Project12
open standard for testing application technical security controls. ASVS also provides
developers with a list of requirements for secure development.
Guiding tests with knowledge of the software architecture and construction, common
attacks, and the attacker’s mindset is extremely important. Using the results of archi-
tectural risk analysis, the tester can properly focus on areas of code where an attack is
likely to succeed.
The difference between risk-based testing and penetration testing is the level of the
approach and the timing of the testing. Penetration testing is done when the software is
complete and installed in an operational environment. Penetration tests are outside-in,
black box tests. Risk-based security testing can begin before the software is complete
and even pre-integration, including the use of white box unit tests and stubs. The two are
similar in that they both should be guided by risk analysis, abuse cases and functional
security requirements.
5. Abuse Cases.
This touchpoint codifies ’thinking like an attacker’. Use cases describe the desired
system’s behaviour by benevolent actors. Abuse cases [1586] describe the system’s
behaviour when under attack by a malicious actor. To develop abuse cases, an analyst
12. https://www.owasp.org/index.php/Category:OWASP_Application_Security_Verification_Standard_Project#tab=Home


enumerates the types of malicious actors who would be motivated to attack the system.
For each bad actor, the analyst creates one or more abuse case(s) for the functionality the
bad actor desires from the system. The analyst then considers the interaction between
the use cases and the abuse cases to fortify the system. Consider an automobile
example. An actor is the driver of the car, and this actor has a use case ’drive the car’.
A malicious actor is a car thief whose abuse case is ’steal the car’. This abuse case
threatens the use case. To prevent the theft, a new use case ’lock the car’ can be added
to mitigate the abuse case and fortify the system.
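The automobile example can be sketched as data, with abuse cases linked to the use cases they threaten and to the mitigating use cases added to fortify the system; the class and field names here are illustrative, not part of any standard notation.

```python
from dataclasses import dataclass, field

@dataclass
class UseCase:
    name: str
    actor: str
    threatened_by: list = field(default_factory=list)

@dataclass
class AbuseCase:
    name: str
    actor: str
    mitigated_by: list = field(default_factory=list)

drive = UseCase("drive the car", actor="driver")
steal = AbuseCase("steal the car", actor="car thief")
drive.threatened_by.append(steal)      # the abuse case threatens the use case

# Fortify: add a mitigating use case for the identified abuse case.
lock = UseCase("lock the car", actor="driver")
steal.mitigated_by.append(lock)

unmitigated = [a.name for a in drive.threatened_by if not a.mitigated_by]
assert unmitigated == []   # every identified threat has a mitigation
```

Representing the analysis this way makes the gap check mechanical: any abuse case with an empty `mitigated_by` list still needs a fortifying use case.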
Human error is responsible for a large number of breaches. System analysts should
also consider actions by benevolent users, such as being the victim of a phishing attack,
that result in a security breach. These actions can be considered misuse cases [1587]
and should be analysed similarly to abuse cases, considering what use case the misuse
case threatens and the fortification to the system to mitigate the misuse case.
The attacks and mitigations identified by the abuse and misuse case analysis can be
used as input into the security requirements (Section 2.1.1 bullet 2.); penetration testing
(Section 2.1.1 bullet 11); and risk-based security testing (Section 2.1.2 bullet 4).
6. Security Requirements.
For further discussion on security requirements, see Section 2.1.1 bullet 2.
7. Security Operations.
Network security can integrate with software security to enhance the security posture.
Inevitably, attacks will happen, regardless of the applications of the other touchpoints.
Understanding attacker behaviour and the software that enabled a successful attack is
an essential defensive technique. Knowledge gained by understanding attacks can be
fed back into the six other touchpoints.
The seven touchpoints are intended to be cycled through multiple times as the software
product evolves. The touchpoints are also process agnostic, meaning that the practices can
be included in any software development process.

17.2.1.3 SAFECode

The Software Assurance Forum for Excellence in Code (SAFECode)13 is a non-profit, global,
industry-led organisation dedicated to increasing trust in information and communications
technology products and services through the advancement of effective software assurance
methods. The SAFECode mission is to promote best practices for developing and delivering
more secure and reliable software, hardware and services. The SAFECode organisation pub-
lishes the ’Fundamental practices for secure software development: Essential elements of a
secure development lifecycle program’ [1600] guideline to foster the industry-wide adoption of
fundamental secure development practices. The fundamental practices deal with assurance
– the ability of the software to withstand attacks that attempt to exploit design or imple-
mentation errors. The eight fundamental practices outlined in their guideline are described
below:
1. Application Security Control Definition. SAFECode uses the term Application Security
Controls (ASC) to refer to security requirements (see Section 2.1.1 bullet 2). Similarly,
13. https://safecode.org/


NIST 800-53 [53] uses the phrase security control to refer to security functionality and
security assurance requirements.
The inputs to ASC include the following: secure design principles (see Section 2.1.3 bullet
3); secure coding practices; legal and industry requirements with which the application
needs to comply (such as HIPAA, PCI, GDPR, or SCADA); internal policies and standards;
incidents and other feedback; threats and risk. The development of ASC begins before
the design phase and continues throughout the lifecycle to provide clear and actionable
controls and to be responsive to changing business requirements and the ever-evolving
threat environment.
2. Design. Software must incorporate security features to comply with internal security
practices and external laws or regulations. Additionally, the software must resist known
threats based upon the operational environment. (see Section 2.1.1 bullet 5.) Threat
modelling (see Section 2.1.1 bullet 4), architectural reviews, and design reviews can be
used to identify and address design flaws before their implementation into source code.
The system design should incorporate an encryption strategy (see Section 2.1.1 bullet 6)
to protect sensitive data from unintended disclosure or alteration while the data are at
rest or in transit.
The system design should use a standardised approach to identity and access man-
agement to perform authentication and authorisation. The standardisation provides
consistency between components and clear guidance on how to verify the presence of
the proper controls. Authenticating the identity of a principal (be it a human user, other
service or logical component) and verifying the authorisation to perform an action are
foundational controls of the system. Several access control schemes have been devel-
oped to support authorisation: mandatory, discretionary, role-based or attribute-based.
Each of these has benefits and drawbacks and should be chosen based upon project
characteristics.
Log files provide the evidence needed for forensic analysis when a breach occurs and help mitigate repudiation threats. In a well-designed application, system and security log files
provide the ability to understand an application’s behaviour and how it is used at any
moment, and to distinguish benevolent user behaviour from malicious user behaviour.
Because logging affects the available system resources, the logging system should be
designed to capture the critical information while not capturing excess data. Policies and
controls need to be established for the storage, tamper prevention and monitoring of log files. OWASP provides valuable resources on designing and implementing logging14,15.
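As an illustrative sketch of these points, the logger below captures the critical fields (who, what, outcome) as structured records while keeping secrets out of the log; the field names and the simple redaction rule are assumptions, not OWASP guidance verbatim.

```python
import json
import logging

security_log = logging.getLogger("security")
security_log.setLevel(logging.INFO)

def log_event(user: str, action: str, outcome: str, **details) -> str:
    """Emit one structured security event and return the serialised line."""
    # Never log credentials: crude redaction of password-like fields.
    details = {k: "[REDACTED]" if "password" in k.lower() else v
               for k, v in details.items()}
    record = {"user": user, "action": action, "outcome": outcome, **details}
    line = json.dumps(record, sort_keys=True)
    security_log.info(line)
    return line

entry = log_event("alice", "login", "failure",
                  source_ip="198.51.100.7", password="hunter2")
assert "hunter2" not in entry          # the secret never reaches the log
assert "[REDACTED]" in entry
```

Structured records like these are what make it possible to distinguish benevolent from malicious behaviour mechanically, rather than by reading free-text messages.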
3. Secure Coding Practices. Unintended code-level vulnerabilities are introduced by pro-
grammer mistakes. These types of mistakes can be prevented and detected through the
use of coding standards; selecting the most appropriate (and safe) languages, frame-
works and libraries, including the use of their associated security features (see Section
2.1.1 bullet 8); using automated analysis tools (see Section 2.1.1 bullets 9 and 10); and
manually reviewing the code.
Organisations provide standards and guidelines for secure coding, for example:
(a) OWASP Secure Coding Practices, Quick Reference Guide 16
14. https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html
15. https://www.owasp.org/images/e/e0/OWASP_Logging_Guide.pdf
16. https://www.owasp.org/images/0/08/OWASP_SCP_Quick_Reference_Guide_v2.pdf


(b) Oracle Secure Coding Guidelines for Java SE 17


(c) Software Engineering Institute (SEI) CERT Secure Coding Standards 18
Special care must also be given to handling unanticipated errors in a controlled and
graceful way through generic error handlers or exception handlers that log the events.
If the generic handlers are invoked, the application should be considered to be in an unsafe state, and further execution should no longer be trusted.
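A minimal sketch of this guidance, with illustrative names: a generic top-level handler that records the unexpected error and moves the application into an untrusted state in which further work is refused.

```python
import traceback

class App:
    """Toy application that tracks whether it is still in a trusted state."""

    def __init__(self):
        self.trusted = True

    def run_task(self, task) -> str:
        if not self.trusted:
            return "refused: application is in an unsafe state"
        try:
            return task()
        except Exception:
            _stack = traceback.format_exc()  # would go to the security log
            self.trusted = False             # unexpected error: assume unsafe
            return "error: entering safe shutdown"

app = App()
assert app.run_task(lambda: "ok") == "ok"
assert app.run_task(lambda: 1 / 0).startswith("error")    # generic handler
assert app.run_task(lambda: "ok").startswith("refused")   # no longer trusted
```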
4. Manage Security Risk Inherent in the Use of Third-Party Components. See Section
2.1.1 bullet 7.
5. Testing and Validation. See Section 2.1.1 bullets 9-11 and Section 2.1.2 bullets 1, 3 and 4.
6. Manage Security Findings. The first five practices produce artifacts that contain or
generate findings related to the security of the product (or lack thereof). The findings in
these artifacts should be tracked and actions should be taken to remediate vulnerabilities,
such as is laid out in the Common Criteria (see Section 4.3) flaw remediation procedure
[1601]. Alternatively, the team may consciously accept the security risk when the risk is
determined to be acceptable. Acceptance of risk must be tracked, including a severity rating; a remediation plan; an expiration or re-review deadline; and the area for re-review/validation.
Clear definitions of severity are important to ensure that all participants have and com-
municate with a consistent understanding of a security issue and its potential impact.
A possible starting point is mapping to the severity levels, attributes and thresholds used by the Common Vulnerability Scoring System (CVSS)19, for example treating scores of 10–8.5 as critical, 8.4–7.0 as high, and so on. The severity levels are used to prioritise mitigations based upon
their complexity of exploitation and impact on the properties of a system.
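Such a mapping can be sketched as below, using the example thresholds from the text for the top two bands; the lower bands are filled in as assumptions, and a real programme would adopt the published CVSS qualitative ratings instead.

```python
def severity(cvss_score: float) -> str:
    """Map a CVSS base score to a shared severity label.

    Top-band thresholds follow the example in the text (10-8.5 critical,
    8.4-7.0 high); the medium/low cut-off is an assumed value."""
    if not 0.0 <= cvss_score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if cvss_score >= 8.5:
        return "critical"
    if cvss_score >= 7.0:
        return "high"
    if cvss_score >= 4.0:   # assumed threshold
        return "medium"
    return "low"

assert severity(9.8) == "critical"
assert severity(7.5) == "high"
assert severity(2.1) == "low"
```

Defining the mapping once, in code, is one way to give all participants the consistent understanding of severity the text calls for.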
7. Vulnerability Response and Disclosure. Even with following a secure software lifecycle,
no product can be ’perfectly secure’ because of the constantly changing threat land-
scapes. Vulnerabilities will be exploited and the software will eventually be compromised.
An organisation must develop a vulnerability response and disclosure process to help
drive the resolution of externally discovered vulnerabilities and to keep all stakeholders
informed of progress. ISO provides industry-proven standards20 for vulnerability dis-
closure and handling. To prevent vulnerabilities from re-occurring in new or updated
products, the team should perform a root cause analysis and feed the lessons learned
into the secure software lifecycle practices. For further discussion, see Sections 2.1.1
bullet 12 and 2.1.2 bullet 7.
8. Planning the Implementation and Deployment of Secure Development. A healthy and
mature secure development lifecycle includes the above seven practices but also an
integration of these practices into the business process and the entire organisation,
including program management, stakeholder management, deployment planning, met-
rics and indicators, and a plan for continuous improvement. The culture, expertise and
skill level of the organisation needs to be considered when planning to deploy a secure
software lifecycle. Based upon past history, the organisation may respond better to a
corporate mandate, to a bottom-up groundswell approach or to a series of pilot programs.
17. https://www.oracle.com/technetwork/java/seccodeguide-139067.html
18. https://wiki.sei.cmu.edu/confluence/display/seccode/SEI+CERT+Coding+Standards
19. https://www.first.org/cvss/
20. https://www.iso.org/standard/45170.html and https://www.iso.org/standard/53231.html


Training will be needed (see Section 2.1.1 bullet 1). The specification of the organisation’s
secure software lifecycle including the roles and responsibilities should be documented.
Plans for compliance and process health should be made (see Section 17.4).

17.2.2 Comparing the Secure Software Lifecycle Models


In 2009, De Win et al. [1602] compared CLASP, Microsoft’s originally-documented SDL [1572],
and Touchpoints (see Section 2.1.2) for the purpose of providing guidance on their commonalities and the specificity of each approach, and making suggestions for improvement. The authors
mapped the 153 possible activities of each lifecycle model into six software development
phases: education and awareness; project inception; analysis and requirements; architectural
and detailed design; implementation and testing; and release, deployment and support. The
activities took the practices in Sections 2.1.1–2.1.3 into much finer granularity. The authors
indicated whether each model includes each of the 153 activities and provides guidance on
the strengths and weaknesses of each model. The authors found no clear comprehensive
’winner’ among the models, so practitioners could consider using guidelines for the desired
fine-grained practices from all the models.
Table 17.1 places the practices of Sections 2.1.1–2.1.3 into the six software development phases used by De Win et al. [1602]. Similar to the prior work [1602], the models demonstrate strengths and weaknesses in terms of guidance for the six software development phases. No model can be considered perfect for all contexts. Security experts can customise a model for their organisations, considering the spread of practices across the six software development phases.


Education and awareness
• Microsoft SDL: Provide training
• SAFECode: Planning the implementation and deployment of secure development

Project inception
• Microsoft SDL: Define metrics and compliance reporting; Define and use cryptography standards; Use approved tools
• SAFECode: Planning the implementation and deployment of secure development

Analysis and requirements
• Microsoft SDL: Define security requirements; Perform threat modelling
• Touchpoints: Abuse cases; Security requirements
• SAFECode: Application security control definition

Architectural and detailed design
• Microsoft SDL: Establish design requirements
• Touchpoints: Architectural risk analysis
• SAFECode: Design

Implementation and testing
• Microsoft SDL: Perform static analysis security testing (SAST); Perform dynamic analysis security testing (DAST); Perform penetration testing; Define and use cryptography standards; Manage the risk of using third-party components
• Touchpoints: Code review (tools); Penetration testing; Risk-based security testing
• SAFECode: Secure coding practices; Manage security risk inherent in the use of third-party components; Testing and validation

Release, deployment, and support
• Microsoft SDL: Establish a standard incident response process
• Touchpoints: Security operations
• SAFECode: Vulnerability response and disclosure

Table 17.1: Comparing the Software Security Lifecycle Models


17.3 ADAPTATIONS OF THE SECURE SOFTWARE LIFECYCLE
[1600, 1603, 1604, 1605, 1606, 1607, 1608, 1609, 1610, 1611]
The secure software lifecycle models discussed in Section 17.2.1 can be integrated with any
software development model and are domain agnostic. In this section, information on six adaptations to the secure software lifecycle is provided.

17.3.1 Agile Software Development and DevOps


Agile and continuous software development methodologies are highly iterative, with new
functionality being provided to customers frequently - potentially as quickly as multiple times
per day or as ’slowly’ as every two to four weeks.
Agile software development methodologies can be functional requirement-centric, with the
functionality being expressed as user stories. SAFECode [1603] provides practical software
security guidance to agile practitioners. This guidance includes a set of 36 recommended
security-focused stories that can be handled similarly to the functionality-focused user stories.
These stories are based upon common security issues such as those listed in the OWASP
Top 10 Most Critical Web Application Security Risks21. The stories are mapped to Common
Weakness Enumeration (CWE) identifiers22, as applicable. The security-focused stories
are worded in a format similar to functionality stories (i.e., As a [stakeholder], I want to [new
functionality] so that I can [goal]). For example, a security-focused story using this format
is: As Quality Assurance, I want to verify that all users have access to the specific resources
they require and are authorised to use; this story is mapped to CWE-862 and CWE-863.
The security-focused stories are further broken down into manageable and concrete
tasks that are owned by team roles, including architects, developers, testers and security
experts, and are mapped to SAFECode Fundamental Practices [1600]. Finally, 17 operational
security tasks were specified by SAFECode. These tasks are not directly tied to stories but
are handled as continuous maintenance work (such as, Continuously verify coverage of static
code analysis tools) or as an item requiring special attention (such as, Configure bug tracking
to track security vulnerabilities).
With a DevOps approach to developing software, development and operations are tightly inte-
grated to enable fast and continuous delivery of value to end users. Microsoft has published
a DevOps secure software lifecycle model [1604] that includes activities for operations engi-
neers to provide fast and early feedback to the team to build security into DevOps processes.
The Secure DevOps model contains eight practices, including eight of the 12 practices in the
Microsoft Security Development Lifecycle discussed in Section 2.1.1:
1. Provide Training. The training, as outlined in Section 2.1.1 bullet 1, must include the
operations engineers. The training should encompass attack vectors made available
through the deployment pipeline.
2. Define Requirements. See Section 2.1.1 bullet 2.
21 https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project
22 https://cwe.mitre.org/; CWE is a community-developed list of common software security weaknesses. It serves as a common language, a measuring stick for software security tools, and as a baseline for weakness identification, mitigation, and prevention efforts.


3. Define Metrics and Compliance Reporting. See Section 2.1.1 bullet 3.


4. Use Software Composition Analysis (SCA) and Governance. When selecting both com-
mercial and open-source third-party components, the team should understand the impact
that a vulnerability in the component could have on the overall security of the system
and consider performing a more thorough evaluation before using them. Software
Composition Analysis (SCA) tools, such as WhiteSource23 can assist with licensing
exposure, provide an accurate inventory of components, and report any vulnerabilities
with referenced components. See also Section 2.1.1 bullet 7.
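The kind of check an SCA tool automates can be sketched as below; the advisory data here is a hypothetical hard-coded dictionary, whereas real tools such as WhiteSource draw on large, continuously updated vulnerability databases.

```python
# Hypothetical advisory data; a real SCA tool maintains a continuously
# updated vulnerability database rather than a hard-coded dictionary.
KNOWN_VULNERABLE = {
    ("liblog", "2.14.0"): "example advisory: update to 2.15.0 or later",
}

def sca_report(inventory):
    """Report advisories for any referenced components in the inventory."""
    return {component: KNOWN_VULNERABLE[component]
            for component in inventory if component in KNOWN_VULNERABLE}
```

For example, `sca_report([("liblog", "2.14.0"), ("libfoo", "1.0.2")])` flags only the first component.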
5. Perform Threat Modelling. See Section 2.1.1 bullet 4. Threat modelling may be perceived
as slowing down the rapid DevOps pace. However, products that are deployed rapidly
under a DevOps deployment process should have a defined overall architecture within
which the DevOps process makes changes and adds features. That architecture should
be threat modelled, and when the team needs to change the architecture the threat model
should also be updated. New features that do not have an architectural impact represent
a null change to the threat model.
6. Use Tools and Automation. See Section 2.1.1 bullets 8, 9 and 10. The team should
carefully select tools that can be integrated into the engineer’s Integrated Development
Environment (IDE) and workflow such that they cause minimal disruption. The goal of
using these tools is to detect defects and vulnerabilities and not to overload engineers
with too many tools or alien processes outside of their everyday engineering experience.
The tools used as part of a secure DevOps workflow should adhere to the following
principles:
(a) Tools must be integrated into the Continuous Integration/Continuous Delivery
(CI/CD) pipeline.
(b) Tools must not require security expertise beyond what is imparted by the training.
(c) Tools must avoid a high false-positive rate of reporting issues.
7. Keep Credentials Safe. Scanning for credentials and other sensitive content in source
files is necessary during pre-commit to reduce the risk of propagating the sensitive
information through the CI/CD process, such as through Infrastructure as Code or
other deployment scripts. Tools, such as CredScan24 , can identify credential leaks,
such as those in source code and configuration files. Some commonly found types of
credentials include default passwords, hard-coded passwords, SQL connection strings
and Certificates with private keys.
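A pre-commit credential scan of this kind can be sketched with a few regular expressions; the patterns below are illustrative only, whereas a tool such as CredScan ships a much larger, maintained rule set.

```python
import re

# Illustrative patterns only; production scanners maintain far larger rule sets.
PATTERNS = {
    "hard-coded password": re.compile(r"password\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE),
    "SQL connection string": re.compile(r"Server=.+;.*Password=[^;]+", re.IGNORECASE),
    "private key block": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_source(text):
    """Return the names of any credential patterns found, so a pre-commit
    hook can block the commit before secrets enter the CI/CD pipeline."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))
```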
8. Use Continuous Learning and Monitoring. Rapidly-deployed systems often monitor the
health of applications, infrastructure and networks through instrumentation to ensure
the systems are behaving ’normally’. This monitoring can also help uncover security
and performance issues which are departures from normal behaviour. Monitoring is
also an essential part of supporting a defense-in-depth strategy and can reduce an
organisation’s Mean Time To Identify (MTTI) and Mean Time To Contain (MTTC) an
attack.
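The 'departure from normal behaviour' monitoring in practice 8 can be illustrated with a toy anomaly rule over a metric stream; real systems use far richer instrumentation and baselining than this sketch.

```python
import statistics

def anomalies(samples, window=20, sigmas=3.0):
    """Flag indices whose value deviates from the trailing window's mean by
    more than `sigmas` standard deviations, a toy stand-in for the
    behavioural monitoring that can shorten MTTI."""
    flagged = []
    for i in range(window, len(samples)):
        past = samples[i - window:i]
        mean, spread = statistics.fmean(past), statistics.pstdev(past)
        if spread and abs(samples[i] - mean) > sigmas * spread:
            flagged.append(i)
    return flagged
```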
23 https://www.whitesourcesoftware.com/
24 https://secdevtools.azurewebsites.net/helpcredscan.html


17.3.2 Mobile
Security concerns for mobile apps differ from traditional desktop software in some important
ways, including local data storage, inter-app communication, proper usage of cryptographic
APIs and secure network communication. The OWASP Mobile Security Project [1605] is a
resource for developers and security teams to build and maintain secure mobile applications;
see also the Web & Mobile Security Knowledge Area (Chapter 16).
Four resources are provided to aid in the secure software lifecycle of mobile applications:
1. OWASP Mobile Application Security Verification Standard (MASVS) Security Require-
ments and Verification. The MASVS defines a mobile app security model and lists
generic security requirements for mobile apps. The MASVS can be used by architects,
developers, testers, security professionals, and consumers to define and understand
the qualities of a secure mobile app.
2. Mobile Security Testing Guide (MSTG). The guide25 is a comprehensive manual for
mobile application security testing and reverse engineering for iOS and Android mobile
security testers. The guide provides the following content:
(a) A general mobile application testing guide that contains a mobile app security test-
ing methodology and general vulnerability analysis techniques as they apply to
mobile app security. The guide also contains additional technical test cases that are
operating system independent, such as authentication and session management,
network communications, and cryptography.
(b) Operating system-dependent testing guides for mobile security testing on the An-
droid and iOS platforms, including security basics; security test cases; reverse
engineering techniques and prevention; and tampering techniques and prevention.
(c) Detailed test cases that map to the requirements in the MASVS.
3. Mobile App Security Checklist. The checklist26 is used for security assessments and
contains links to the MSTG test case for each requirement.
4. Mobile Threat Model. The threat model [1606] provides a checklist of items that should
be documented, reviewed and discussed when developing a mobile application. Five
areas are considered in the threat model:
(a) Mobile Application Architecture. The mobile application architecture describes
device-specific features used by the application, wireless transmission protocols,
data transmission medium, interaction with hardware components and other appli-
cations. The attack surface can be assessed through a mapping to the architecture.
(b) Mobile Data. This section of the threat model defines the data the application
stores, transmits and receives. The data flow diagrams should be reviewed to
determine exactly how data are handled and managed by the application.
(c) Threat Agent Identification. The threat agents are enumerated, including humans
and automated programs.
(d) Methods of Attack. The most common attacks utilised by threat agents are defined
so that controls can be developed to mitigate attacks.
25 https://www.owasp.org/index.php/OWASP_Mobile_Security_Testing_Guide
26 https://github.com/OWASP/owasp-mstg/tree/master/Checklists


(e) Controls. The controls to mitigate attacks are defined.

17.3.3 Cloud Computing


The emergence of cloud computing brings unique security risks and challenges. In conjunction
with the Cloud Security Alliance (CSA)27 , SAFECode has provided a ’Practices for Secure
Development of Cloud Applications’ [1607] guideline as a supplement to the ’Fundamental
Practices for Secure Software Development’ guideline [1600] discussed in Section 17.2.1.3;
see also the Distributed Systems Security Knowledge Area (Chapter 12). The Cloud guideline
provides additional secure development recommendations to address six threats unique to
cloud computing and to identify specific security design and implementation practices in the
context of these threats. These threats and associated practices are provided:
1. Threat: Multitenancy. Multitenancy allows multiple consumers or tenants to maintain a
presence in a cloud service provider’s environment, but in a manner where the computa-
tions, processes, and data (both at rest and in transit) of one tenant are isolated from
and inaccessible to another tenant. Practices:
(a) Model the application’s interfaces in threat models. Ensure that the multitenancy
threats, such as information disclosure and privilege elevation, are modelled for each
of these interfaces, and ensure that these threats are mitigated in the application
code and/or configuration settings.
(b) Use a ’separate schema’ database design and tables for each tenant when building
multitenant applications rather than relying on a ’TenantID’ column in each table.
(c) When developing applications that leverage a cloud service provider’s Platform as
a Service (PaaS) services, ensure common services are designed and deployed in
a way that ensures that the tenant segregation is maintained.
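The 'separate schema' practice in (b) can be sketched using SQLite's ATTACH as a stand-in for per-tenant schemas: each tenant's tables live under its own schema name, so a query cannot reach another tenant's rows through a forgotten 'TenantID' filter. The schema names here are fixed by the application and must never be user-supplied.

```python
import sqlite3

def provision_tenant(conn, schema):
    """Create an isolated schema for one tenant. `schema` must come from a
    trusted allow-list, never from user input, as it is interpolated into SQL."""
    conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
    conn.execute(f"CREATE TABLE {schema}.orders (id INTEGER, item TEXT)")

conn = sqlite3.connect(":memory:")
for tenant in ("tenant_a", "tenant_b"):
    provision_tenant(conn, tenant)

conn.execute("INSERT INTO tenant_a.orders VALUES (1, 'widget')")
# tenant_b's schema holds no trace of tenant_a's data
rows_for_b = conn.execute("SELECT * FROM tenant_b.orders").fetchall()
```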
2. Tokenisation of Sensitive Data. An organisation may not wish to generate and store
intellectual property in a cloud environment not under its control. Tokenisation is a
method of removing sensitive data from systems where they do not need to exist or
disassociating the data from the context or the identity that makes them sensitive.
The sensitive data are replaced with a token for those data. The token is later used to
rejoin the sensitive data with other data in the cloud system. The sensitive data are
encrypted and secured within an organisation’s central system which can be protected
with multiple layers of protection and appropriate redundancy for disaster recovery and
business continuity. Practices:
(a) When designing a cloud application, determine if the application needs to process
sensitive data and if so, identify any organisational, government, or industry regu-
lations that pertain to that type of sensitive data and assess their impact on the
application design.
(b) Consider implementing tokenisation to reduce or eliminate the amount of sensitive
data that need to be processed and or stored in cloud environments.
(c) Consider data masking, an approach that can be used in pre-production test and
debug systems in which a representative data set is used, but does not need to
have access to actual sensitive data. This approach allows the test and debug
systems to be exempt from sensitive data protection requirements.
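A minimal sketch of tokenisation follows, in which an in-process dictionary stands in for the organisation's protected central vault; in practice the vault would be encrypted, access-controlled, and replicated for disaster recovery and business continuity.

```python
import secrets

class Tokeniser:
    """Swap sensitive values for opaque random tokens; only the vault,
    held outside the cloud environment, can reverse the mapping."""

    def __init__(self):
        self._vault = {}  # stand-in for the protected central store

    def tokenise(self, sensitive_value):
        token = secrets.token_urlsafe(16)
        self._vault[token] = sensitive_value
        return token

    def detokenise(self, token):
        return self._vault[token]
```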
27 https://cloudsecurityalliance.org/


3. Trusted Compute Pools. Trusted Compute Pools are either physical or logical groupings
of compute resources/systems in a data centre that share a security posture. These
systems provide measured verification of the boot and runtime infrastructure for mea-
sured launch and trust verification. The measurements are stored in a trusted location
on the system (referred to as a Trusted Platform Module (TPM)) and verification occurs
when an agent, service or application requests the trust quote from the TPM. Practices:
(a) Ensure the platform for developing cloud applications provides trust measurement
capabilities and the APIs and services necessary for your applications to both
request and verify the measurements of the infrastructure they are running on.
(b) Verify the trust measurements as either part of the initialisation of your application
or as a separate function prior to launching the application.
(c) Audit the trust of the environments your applications run on using attestation
services or native attestation features from your infrastructure provider.
4. Data Encryption and Key Management. Encryption is the most pervasive means of
protecting sensitive data both at rest and in transit. When encryption is used, both
providers and tenants must ensure that the associated cryptographic key materials are
properly generated, managed and stored. Practices:
(a) When developing an application for the cloud, determine if cryptographic and key
management capabilities need to be directly implemented in the application or
if the application can leverage cryptographic and key management capabilities
provided by the PaaS environment.
(b) Make sure that appropriate key management capabilities are integrated into the
application to ensure continued access to data encryption keys, particularly as the
data move across cloud boundaries, such as enterprise to cloud or public to private
cloud.
5. Authentication and Identity Management. As an authentication consumer, the appli-
cation may need to authenticate itself to the PaaS to access interfaces and services
provided by the PaaS. As an authentication provider, the application may need to authen-
ticate the users of the application itself. Practices:
(a) Cloud application developers should implement the authentication methods and
credentials required for accessing PaaS interfaces and services.
(b) Cloud application developers need to implement appropriate authentication meth-
ods for their environments (private, hybrid or public).
(c) When developing cloud applications to be used by enterprise users, developers
should consider supporting Single Sign On (SSO) solutions.
6. Shared-Domain Issues. Several cloud providers offer domains that developers can use
to store user content, or for staging and testing their cloud applications. As such, these
domains, which may be used by multiple vendors, are considered ’shared domains’ when
running client-side script (such as JavaScript) and from reading data. Practices:
(a) Ensure that your cloud applications are using custom domains whenever the cloud
provider’s architecture allows you to do so.
(b) Review your source code for any references to shared domains.


The European Union Agency for Cybersecurity (ENISA) [1609] conducted an in-depth and
independent analysis of the information security benefits and key security risks of cloud
computing. The analysis reports that the massive concentrations of resources and data in the
cloud present a more attractive target to attackers, but cloud-based defences can be more
robust, scalable and cost-effective.

17.3.4 Internet of Things (IoT)


The Internet of Things (IoT) is utilised in almost every aspect of our daily life, including the
extension into industrial sectors and applications (i.e., the Industrial IoT (IIoT)). IoT and IIoT
constitute an area of rapid growth that presents unique security challenges. [From this point
forth we include IIoT when we use IoT.] Some of these are considered in the Cyber-Physical
Systems Security Knowledge Area (Chapter 21), but we consider specifically software lifecycle
issues here. Devices must be securely provisioned, connectivity between these devices and
the cloud must be secure, and data in storage and in transit must be protected. However, the
devices are small, cheap, and resource-constrained. Building security into each device may not be
considered to be cost effective by its manufacturer, depending upon the value of the device
and the importance of the data it collects. An IoT-based solution often has a large number of
geographically-distributed devices. As a result of these technical challenges, trust concerns
exist with the IoT, most of which currently have no resolution and are in need of research.
However, the US National Institute of Standards and Technology (NIST) [1608] recommends
four practices for the development of secure IoT-based systems.
1. Use of Radio-Frequency Identification (RFID) tags. Sensors and their data may be tam-
pered with, deleted, dropped, or transmitted insecurely. Counterfeit ’things’ exist in the
marketplace. Unique identifiers can mitigate this problem by attaching Radio-Frequency
Identification (RFID) tags to devices. Readers activate a tag, causing the device to
broadcast radio waves within a bandwidth reserved for RFID usage by governments
internationally. The radio waves transmit identifiers or codes that reference unique
information associated with the device.
2. Not using or allowing the use of default passwords or credentials. IoT devices are often
not developed to require users and administrators to change default passwords during
system set up. Additionally, devices often lack intuitive user interfaces for changing
credentials. Recommended practices are to require passwords to be changed or to
design in intuitive interfaces. Alternatively, manufacturers can randomise passwords
per device rather than having a small number of default passwords.
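The per-device randomisation alternative can be sketched in a few lines; the password length and alphabet below are illustrative choices.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits

def factory_password(length=16):
    """Generate a unique random factory password for each manufactured
    device, so no fleet-wide default exists for attackers to guess."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```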
3. Use of the Manufacturer Usage Description (MUD) specification. The Manufacturer
Usage Description (MUD)28 specification allows manufacturers to specify authorised
and expected user traffic patterns to reduce the threat surface of an IoT device by
restricting communications to/from the device to sources and destinations intended by
the manufacturer.
4. Development of a Secure Upgrade Process. In non-IoT systems, updates are usually
delivered via a secure process in which the computer can authenticate the source
pushing the patches and feature and configuration updates. IoT manufacturers have,
generally, not established such a secure upgrade process, which enables attackers to
conduct a man-in-the-middle push of their own malicious updates to the devices. The IoT
Firmware Update Architecture29 provides guidance on implementing a secure firmware
update architecture, including hard rules defining how device manufacturers should
operate.

28 https://tools.ietf.org/id/draft-ietf-opsawg-mud-22.html
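A device-side verification step might look like the sketch below. For brevity it authenticates images with an HMAC under a hypothetical provisioned shared secret; a real update architecture such as the one referenced above would use asymmetric signatures so that devices hold no signing key.

```python
import hashlib
import hmac

DEVICE_KEY = b"provisioned-at-manufacture"  # hypothetical per-device secret

def sign_image(image: bytes) -> bytes:
    """Build-server side: compute the authentication tag for a firmware image."""
    return hmac.new(DEVICE_KEY, image, hashlib.sha256).digest()

def verify_before_flashing(image: bytes, tag: bytes) -> bool:
    """Device side: refuse any image whose tag does not authenticate,
    defeating man-in-the-middle pushes of malicious updates."""
    expected = hmac.new(DEVICE_KEY, image, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```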
Additionally, the UK Department for Digital, Culture, Media and Sport has provided the Code
of Practice for consumer IoT security30 . Included in the code of practice are 13 guidelines
for improving the security of consumer IoT products and associated services. Two of the
guidelines overlap with NIST bullets 2 and 4 above. The full list of guidelines includes the
following: (1) No default passwords; (2) Implement a vulnerability disclosure policy; (3) Keep
software updated; (4) Securely store credentials and security-sensitive data; (5) Communicate
securely (i.e. use encryption for sensitive data); (6) Minimise exposed attack surfaces; (7)
Ensure software integrity (e.g. use of a secure boot); (8) Ensure that personal data is protected
(i.e. in accordance with GDPR); (9) Make systems resilient to outages; (10) Monitor system
telemetry data; (11) Make it easy for consumers to delete personal data; (12) Make installation
and maintenance of devices easy; and (13) Validate input data. Finally, Microsoft has provided
an Internet of Things security architecture.31

17.3.5 Road Vehicles


A hacker that compromises a connected road vehicle's braking or steering system could
cause a passenger or driver to lose their lives. Attacks such as these have been demonstrated,
beginning with the takeover of a Ford Escape and a Toyota Prius by white-hat hackers Charlie
Miller and Chris Valasek in 201332 . Connected commercial vehicles are part of the critical
infrastructure in complex global supply chains. In 2018, the number of reported attacks on
connected vehicles rose to six times the number reported just three years earlier [1610],
due both to the increase in connected vehicles and to their increased attractiveness as a target
of attackers [1611]. Broader issues with Cyber-Physical Systems are addressed in the Cyber-
Physical Systems Security Knowledge Area (Chapter 21).
The US National Highway Traffic Safety Administration (NHTSA) defines road vehicle cyber
security as the protection of automotive electronic systems, communication networks, control
algorithms, software, users and underlying data from malicious attacks, damage, unauthorised
access or manipulation33 . The HTSA provides four guidelines for the automotive industry for
consideration in their secure software development lifecycle:
1. The team should follow a secure product development process based on a systems-
engineering approach with the goal of designing systems free of unreasonable safety
risks including those from potential cyber security threats and vulnerabilities.
2. The automotive industry should have a documented process for responding to incidents,
vulnerabilities and exploits. This process should cover impact assessment, containment,
recovery and remediation actions, the associated testing, and should include the creation
of roles and responsibilities for doing so. The industry should also establish metrics to
periodically assess the effectiveness of their response process.
29 https://tools.ietf.org/id/draft-moran-suit-architecture-02.html
30 https://www.gov.uk/government/publications/code-of-practice-for-consumer-iot-security/code-of-practice-for-consumer-iot-security
31 https://docs.microsoft.com/en-us/azure/iot-fundamentals/iot-security-architecture
32 https://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/
33 https://www.nhtsa.gov/crash-avoidance/automotive-cybersecurity#automotive-cybersecurity-overview


3. The automotive industry should document the details related to their cyber security
process, including the results of risk assessment, penetration testing, and organisations’
decisions related to cyber security. Essential documents, such as cyber security
requirements, should follow a robust version control protocol.
4. These security requirements should be incorporated into the product’s security require-
ments, as laid out in Section 2.1.1 bullet 2, Section 2.1.2 bullet 6, and Section 2.1.3 bullet
1:
(a) Limit developer/debugging access to production devices, such as through an open
debugging port or through a serial console.
(b) Keys (e.g., cryptographic) and passwords which can provide an unauthorised, el-
evated level of access to vehicle computing platforms should be protected from
disclosure. Keys should not provide access to multiple vehicles.
(c) Diagnostic features should be limited to a specific mode of vehicle operation
which accomplishes the intended purpose of the associated feature. For example,
a diagnostic operation which may disable a vehicle’s individual brakes could be
restricted to operating only at low speeds or not disabling all the brakes at the same
time.
(d) Encryption should be considered as a useful tool in preventing the unauthorised
recovery and analysis of firmware.
(e) Limit the ability to modify firmware and/or employ signing techniques to make it
more challenging for malware to be installed on vehicles.
(f) The use of network servers on vehicle ECUs should be limited to essential func-
tionality, and services over these ports should be protected to prevent use by
unauthorised parties.
(g) Logical and physical isolation techniques should be used to separate processors,
vehicle networks, and external connections as appropriate to limit and control
pathways from external threat vectors to cyber-physical features of vehicles.
(h) Sending safety signals as messages on common data buses should be avoided, but
when used should employ a message authentication scheme to limit the possibility
of message spoofing.
(i) An immutable log of events sufficient to enable forensic analysis should be main-
tained and periodically scrutinised by qualified maintenance personnel to detect
trends of cyber-attack.
(j) Encryption methods should be employed in any IP-based operational communi-
cation between external servers and the vehicle, and should not accept invalid
certificates.
(k) Plan for and design-in features that could allow for changes in network routing
rules to be quickly propagated and applied to one, a subset, or all vehicles.
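Item (h)'s message authentication scheme can be sketched as follows, assuming a hypothetical per-vehicle key (per item (b)) and a monotonic counter to resist replay; a production scheme must also fit the tight payload budgets of automotive data buses.

```python
import hashlib
import hmac

VEHICLE_KEY = b"per-vehicle key"  # hypothetical; never shared across vehicles

def make_frame(counter: int, payload: bytes) -> bytes:
    """Sender side: append a truncated MAC over a counter and payload."""
    body = counter.to_bytes(4, "big") + payload
    tag = hmac.new(VEHICLE_KEY, body, hashlib.sha256).digest()[:8]
    return body + tag

def accept_frame(frame: bytes, last_counter: int) -> bool:
    """Receiver side: the MAC must verify and the counter must advance."""
    body, tag = frame[:-8], frame[-8:]
    counter = int.from_bytes(body[:4], "big")
    expected = hmac.new(VEHICLE_KEY, body, hashlib.sha256).digest()[:8]
    return counter > last_counter and hmac.compare_digest(tag, expected)
```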
The International Organization for Standardization (ISO)34 and the Society of Automotive
Engineers (SAE) International35 are jointly developing an international standard, ISO 21434
Road vehicles - cyber security engineering36. The standard will specify minimum requirements
on security engineering processes and activities, and will define criteria for assessment.
Explicitly, the goal is to provide a structured process to ensure cyber security is designed in
upfront and integrated throughout the lifecycle process for both hardware and software.

34 https://www.iso.org/standard/70918.html
35 www.sae.org
The adoption of a secure software lifecycle in the automotive industry may be driven by
legislation, such as through the US SPY Car Act37 or China and Germany’s Intelligent and
Connected Vehicles (ICVs) initiative38 .

17.3.6 ECommerce/Payment Card Industry


The ability to steal large quantities of money makes the Payment Card Industry (PCI) an
especially attractive target for attackers. In response, the PCI created the Security Standards
Council, a global forum for the ongoing development, enhancement, storage, dissemination,
and implementation of security standards for account data protection. The Security Standards
Council established the Data Security Standard (PCI DSS), which must be upheld by any
organisations that handle payment cards, including debit and credit cards. PCI DSS contains
12 requirements39 that are a set of security controls that businesses are required to implement
to protect credit card data. These specific requirements are incorporated into the product’s
security requirements, as laid out in Section 2.1.1 bullet 2, Section 2.1.2 bullet 6, and Section
2.1.3 bullet 1. The 12 requirements are as follows:
1. Install and maintain a firewall configuration to protect cardholder data.
2. Do not use vendor-supplied defaults for system passwords and other security parame-
ters.
3. Protect stored cardholder data.
4. Encrypt transmission of cardholder data across open, public networks.
5. Use and regularly update antivirus software.
6. Develop and maintain secure systems and applications, including detecting and mitigat-
ing vulnerabilities and applying mitigating controls.
7. Restrict access to cardholder data by business need-to-know.
8. Assign a unique ID to each person with computer access.
9. Restrict physical access to cardholder data.
10. Track and monitor all access to network resources and cardholder data.
11. Regularly test security systems and processes.
12. Maintain a policy that addresses information security.
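As a small example of requirement 3, displayed cardholder data is typically masked: PCI DSS permits showing at most the first six and last four digits of a primary account number (PAN), and this sketch keeps only the last four.

```python
def mask_pan(pan: str) -> str:
    """Mask a primary account number for display, keeping the last four digits."""
    digits = [c for c in pan if c.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])
```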
36 https://www.iso.org/standard/70918.html
37 https://www.congress.gov/bill/115th-congress/senate-bill/680
38 http://icv.sustainabletransport.org/
39 https://searchsecurity.techtarget.com/definition/PCI-DSS-12-requirements


17.4 ASSESSING THE SECURE SOFTWARE LIFECYCLE


[1612, 1613]
Organisations may wish to or be required to assess the maturity of their secure development
lifecycle. Three assessment approaches are described in this section.

17.4.1 SAMM
The Software Assurance Maturity Model (SAMM)40 is an open framework to help organisations
formulate and implement a strategy for software security that is tailored to the specific risks
facing the organisation. Resources are provided for the SAMM to enable an organisation to
do the following:
1. Define and measure security-related activities within an organisation.
2. Evaluate their existing software security practices.
3. Build a balanced software security program in well-defined iterations.
4. Demonstrate improvements in a security assurance program.
Because each organisation utilises its own secure software process (i.e., its own unique
combination of the practices laid out in Sections 2 and 3), the SAMM provides a framework to
describe software security initiatives in a common way. The SAMM designers enumerated
activities executed by organisations in support of their software security efforts. Some ex-
ample activities include: build and maintain abuse case models per project; specify security
requirements based upon known risks; and identify the software attack surface. These activi-
ties are categorised into one of 12 security practices. The 12 security practices are further
grouped into one of four business functions. The business functions and security practices
are as follows:
1. Business Function: Governance
(a) Strategy and metrics
(b) Policy and compliance
(c) Education and guidance
2. Business Function: Construction
(a) Threat assessment
(b) Security requirements
(c) Secure architecture
3. Business Function: Verification
(a) Design review
(b) Code review
(c) Security testing
40 https://www.opensamm.org/ and https://www.owasp.org/images/6/6f/SAMM_Core_V1-5_FINAL.pdf


4. Business Function: Deployment


(a) Vulnerability management
(b) Environment hardening
(c) Operational enablement
The SAMM assessments are conducted through self-assessments or by a consultant chosen
by the organisation. Spreadsheets are provided by SAMM for scoring the assessment, giving
the organisation information on its current maturity level:
• 0: Implicit starting point representing the activities in the Practice being unfulfilled.
• 1: Initial understanding and ad hoc provision of the Security Practice.
• 2: Increase efficiency and/or effectiveness of the Security Practice.
• 3: Comprehensive mastery of the Security Practice at scale.
Assessments may be conducted periodically to measure improvements in an organisation’s
security assurance program.

17.4.2 BSIMM
Gary McGraw, Sammy Migues, and Brian Chess desired to create a descriptive model of the
state of the practice in the secure software development lifecycle. As a result, they forked an early
version of SAMM (see Section 4.1) to create the original structure of the Building Security In
Maturity Model (BSIMM) [1612, 1613] in 2009. Since that time, the BSIMM has been used to
structure a multi-year empirical study of the current state of software security initiatives in
industry.
Because each organisation utilises its own secure software process (i.e., its own unique
combination of the practices laid out in Sections 2 and 3), the BSIMM provides a framework
to describe software security initiatives in a common way. Based upon their observations,
the BSIMM designers enumerated 113 activities executed by organisations in support of
their software security efforts. Some example activities include: build and publish security
features; use automated tools along with a manual review; and integrate black-box security
tools into the quality assurance process. Each activity is associated with a maturity level and
is categorised into one of 12 practices. The 12 practices are further grouped into one of four
domains. The domains and practices are as follows:
1. Domain: Governance
(a) Strategy and metrics
(b) Compliance and policy
(c) Training
2. Domain: Intelligence
(a) Attack models
(b) Security features and design
(c) Standards and requirements


3. Domain: Secure software development lifecycle touchpoints


(a) Architecture analysis
(b) Code review
(c) Security testing
4. Domain: Deployment
(a) Penetration testing
(b) Software environment
(c) Configuration management and vulnerability management
BSIMM assessments are conducted through in-person interviews by software security profes-
sionals at Cigital (now Synopsys) with security leaders in a firm. Via the interviews, the firm obtains a scorecard indicating which of the 113 software security activities it uses. After the firm completes the interviews, it is provided with information comparing itself with the
other organisations that have been assessed. BSIMM assessments have been conducted
since 2008. Annually, the overall results of the assessments from all firms are published,
resulting in the BSIMM1 through BSIMM9 reports. Since the BSIMM study began in 2008,
167 firms have participated in BSIMM assessments, sometimes multiple times, comprising
389 distinct measurements. To ensure the continued relevance of the data reported, the
BSIMM9 report excluded measurements older than 42 months and reported on 320 distinct
measurements collected from 120 firms.
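The comparative output of such a scorecard can be sketched as follows. All activity identifiers, catalogue entries and population averages below are invented for illustration (real figures come from the published BSIMM study data): a firm's per-practice activity count is simply compared with the average observed across previously assessed firms.

```python
# Hypothetical sketch of a BSIMM-style scorecard comparison. The activity
# identifiers and population averages are invented, not taken from BSIMM.

def scorecard(firm_activities, catalogue, population_avg):
    """For each practice, report (observed, total, delta vs population)."""
    rows = {}
    for practice, activities in catalogue.items():
        observed = sum(1 for a in activities if a in firm_activities)
        rows[practice] = (observed, len(activities),
                          round(observed - population_avg[practice], 1))
    return rows

catalogue = {
    "Security testing": ["ST-A", "ST-B", "ST-C"],
    "Code review": ["CR-A", "CR-B"],
}
population_avg = {"Security testing": 2.1, "Code review": 1.5}
firm = {"ST-A", "CR-A", "CR-B"}

for practice, (got, total, delta) in scorecard(firm, catalogue, population_avg).items():
    print(f"{practice}: {got}/{total} activities ({delta:+} vs average)")
```

A positive delta marks a practice where the firm does more than its peers; a negative delta highlights a gap worth prioritising.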

17.4.3 The Common Criteria


The purpose of the Common Criteria (CC)41 is to provide a vehicle for international recognition of a secure information technology (IT) product (where the SAMM and BSIMM were
assessments of a development process). The objective of the CC is for IT products that
have earned a CC certificate from an authorised Certification/Validation Body (CB) to be
procured or used with no need for further evaluation. The Common Criteria seek to provide
grounds for confidence in the reliability of the judgments on which the original certificate
was based by requiring that a CB issuing Common Criteria certificates should meet high and
consistent standards. A developer of a new product range may provide guidelines for the
secure development and configuration of that product. This guideline can be submitted as a
Protection Profile (the pattern for similar products that follow on). Any other developer can
add to or change this guideline. Products that earn certification in this product range use the
protection profile as the delta against which they build.
Based upon the assessment of the CB, a product receives an Evaluation Assurance Level
(EAL). A product or system must meet specific assurance requirements to achieve a particular
EAL. Requirements involve design documentation, analysis and functional or penetration
testing. The highest level provides the highest guarantee that the system’s principal security
features are reliably applied. The EAL indicates to what extent the product or system was
tested:
• EAL 1: Functionally tested. Applies when security threats are not viewed as serious. The
evaluation provides evidence that the system functions in a manner consistent with its
41: https://www.commoncriteriaportal.org/ccra/index.cfm


documentation and that it provides useful protection against identified threats.


• EAL 2: Structurally tested. Applies when stakeholders require low-to-moderate
independently-assured security but the complete development record is not readily
available, such as with securing a legacy system.
• EAL 3: Methodically tested and checked. Applies when stakeholders require a moderate
level of independently-assured security and a thorough investigation of the system and
its development, without substantial re-engineering.
• EAL 4: Methodically designed, tested and reviewed. Applies when stakeholders require
moderate-to-high independently-assured security in commodity products and are pre-
pared to incur additional security-specific engineering costs.
• EAL 5: Semi-formally designed and tested. Applies when stakeholders require high,
independently-assured security in a planned development and require a rigorous de-
velopment approach that does not incur unreasonable costs from specialist security
engineering techniques.
• EAL 6: Semi-formally verified design and tested. Applies when developing systems in
high-risk situations where the value of the protected assets justifies additional costs.
• EAL 7: Formally verified design and tested. Applies when developing systems in ex-
tremely high-risk situations and when the high value of the assets justifies the higher
costs.
The CC provides a set of security functional and security assurance requirements. These
requirements, as appropriate, are incorporated into the product’s security requirements, as
laid out in Section 2.1.1 bullet 2, Section 2.1.2 bullet 6, and Section 2.1.3 bullet 1.

17.5 ADOPTING A SECURE SOFTWARE LIFECYCLE


[1612, 1613, 1614]
This knowledge area has provided a myriad of possible practices an organisation can include
in its secure software lifecycle. Some of these practices, such as those discussed in Section 2,
potentially apply to any product. Other practices are domain specific, such as those discussed
in Section 3.
Organisations adopting new practices often like to learn from and adopt practices that are
used by organisations similar to themselves [1614]. When choosing which security practices
to include in a secure software lifecycle, organisations can consider looking at the latest
BSIMM [1612, 1613] results which provide updated information on the adoption of practices in
the industry.


DISCUSSION
[1615]
This chapter has provided an overview of three prominent and prescriptive secure software
lifecycle processes and six adaptations of these processes that can be applied in a specified
domain. However, the cybersecurity landscape in terms of threats, vulnerabilities, tools, and
practices is ever evolving. For example, a practice that has not been mentioned in any of
these nine processes is the use of a bug bounty program for the identification and resolution
of vulnerabilities. With a bug bounty program, organisations compensate individuals and/or
researchers for finding and reporting vulnerabilities. These individuals are external to the
organisation producing the software and may work independently or through a bug bounty
organisation, such as HackerOne42.
While the majority of this knowledge area focuses on technical practices, the successful
adoption of these practices involves organisational and cultural changes in an organisation.
The organisation, starting from executive leadership, must support the extra training, resources,
and steps needed to use a secure development lifecycle. Additionally, every developer must
uphold his or her responsibility to take part in such a process.
A team and an organisation need to choose the appropriate software security practices to de-
velop a customised secure software lifecycle based upon team and technology characteristics
and upon the security risk of the product.
While this chapter has provided practices for developing secure products, information insecurity is often due to economic disincentives [1615], which drive software organisations
to choose the rapid deployment and release of functionality over the production of secure
products. As a result, governments and industry groups are increasingly imposing cyber
security standards on organisations as a matter of legal compliance or as a condition for
being considered as a vendor. Compliance requirements may lead to faster adoption of a
secure development lifecycle. However, this compliance-driven adoption may divert efforts
away from the real security issues by driving an over-focus on compliance requirements rather
than on the pragmatic prevention and detection of the most risky security concerns.

42: https://www.hackerone.com


CROSS-REFERENCE OF TOPICS VS REFERENCE MATERIAL

[1600] [1469] [1572] [1577]
17.1 Motivation c1 c1 c1
17.2 Prescriptive Secure Software Lifecycle Processes
17.2.1 Secure Software Lifecycle Processes c2 c2 c2 c2
17.2.2 Comparing the Secure Software Lifecycle Models
17.3 Adaptations of the Secure Software Lifecycle
17.3.1 Agile Software Development and DevOps c3
17.3.2 Mobile
17.3.3 Cloud Computing
17.3.4 Internet of Things (IoT)
17.3.5 Road Vehicles
17.3.6 ECommerce/Payment Card Industry
17.4 Assessing the Secure Software Lifecycle
17.5 Adopting a Secure Software Lifecycle

FURTHER READING

Building Secure Software: How to Avoid Security Problems the Right Way
[1469]
This book introduces the term software security as an engineering discipline for building
security into a product. This book provides essential lessons and expert techniques for
security professionals who understand the role of software in security problems and for
software developers who want to build secure code. The book also discusses risk assessment,
developing security tests, and plugging security holes before software is shipped.

Writing Secure Code, Second Edition. [1577]


The first edition of this book was internally published in Microsoft and was required reading for
all members of the Windows team during the Windows Security Push. The second edition was
made publicly available in the 2003 book and provides secure coding techniques to prevent
vulnerabilities, to detect design flaws and implementation bugs, and to improve test code and
documentation.


Software Security: Building Security In [1578]


This book discusses seven software securing best practices, called touchpoints. It also
provides information on software security fundamentals and contexts for a software security
program in an enterprise.

The Security Development Lifecycle (Original Book) [1572]


This seminal book provides the foundation for the other processes laid out in this knowledge
area, and was customised over the years by other organisations, such as Cisco43. The book
lays out 13 stages for integrating practices into a software development lifecycle such that
the product is more secure. This book is out of print, but is available as a free download44.

The Security Development Lifecycle (Current Microsoft Resources) [1579]


The Microsoft SDL are practices that are used internally to build secure products and services,
and address security compliance requirements by introducing security practices throughout
every phase of the development process. This webpage is a continuously-updated version of
the seminal book [1572] based on Microsoft’s growing experience with new scenarios such
as the cloud, the Internet of Things (IoT) and Artificial Intelligence (AI).

Software Security Engineering: A Guide for Project Managers [1592]


This book is a management guide for selecting from among sound software development
practices that have been shown to increase the security and dependability of a software
product, both during development and subsequently during its operation. Additionally, this
book discusses governance and the need for a dynamic risk management approach for
identifying priorities throughout the product lifecycle.

Cyber Security Engineering: A Practical Approach for Systems and Software


Assurance [1616]
This book provides a tutorial on the best practices for building software systems that exhibit
superior operational security, and for considering security throughout your full system de-
velopment and acquisition lifecycles. This book provides seven core principles of software
assurance, and shows how to apply them coherently and systematically. This book addresses
important topics, including the use of standards, engineering security requirements for acquir-
ing COTS software, applying DevOps, analysing malware to anticipate future vulnerabilities,
and planning ongoing improvements.
43: https://www.cisco.com/c/en/us/about/trust-center/technology-built-in-security.html#~stickynav=2
44: https://blogs.msdn.microsoft.com/microsoft_press/2016/04/19/free-ebook-the-security-development-lifecycle/


SAFECode’s Fundamental Practices for Secure Software Development: Es-


sential Elements of a Secure Development Lifecycle Program, Third Edition
[1600]
Eight practices for secure development are provided based upon the experiences of member
companies of the SAFECode organisation.

OWASP’s Secure Software Development Lifecycle Project (S-SDLC) [1580]


Based upon a committee of industry participants, the Secure-Software Development Lifecycle
Project (S-SDLC) defines a standard Secure Software Development Life Cycle and provides
resources to help developers know what should be considered or best practices at each
phase of a development lifecycle (e.g., Design Phase, Coding Phase, Maintenance Phase). The
committee of industry participants are members of the Open Web Application Security Project
(OWASP)45 , an international not-for-profit organisation focused on improving the security of
web application software. The earliest secure software lifecycle contributions from OWASP
were referred to as the Comprehensive, Lightweight Application Security Process (CLASP).

Security controls
Government and standards organizations have provided security controls to be integrated in
a secure software or systems lifecyle:
1. The Trustworthy Software Foundation46 provides the Trustworthy Software Framework (TSFr)47, a collection of good practice, existing guidance and relevant standards
across the five main facets of trustworthiness: Safety; Reliability; Availability; Resilience;
and Security. The purpose of the TSFr is to provide a minimum set of controls such that,
when applied, all software (irrespective of implementation constraints) can be specified,
realised and used in a trustworthy manner.
2. The US National Institute of Standards and Technology (NIST) has authored the Systems Security Engineering: Cyber Resiliency Considerations framework (NIST SP 800-160) [1617]. This framework provides guidance for achieving identified cyber resiliency outcomes based on a systems engineering perspective on system life cycle processes.
3. The Software Engineering Institute (SEI) has collaborated with professional organisa-
tions, industry partners and institutions of higher learning to develop freely-available
curricula and educational materials. Included in these materials are resources for a
software assurance program48 to train professionals to build security and correct func-
tionality into software and systems.
4. The UK National Cyber Security Centre (NCSC)49 provide resources for secure software
development:
45: https://www.owasp.org/
46: https://tsfdn.org
47: https://tsfdn.org/ts-framework/
48: https://www.sei.cmu.edu/education-outreach/curricula/software-assurance/index.cfm
49: https://www.ncsc.gov.uk/


(a) Application development50 : recommendations for the secure development, pro-


curement, and deployment of generic and platform-specific applications.
(b) Secure development and deployment guidance51 : more recommendations for the
secure development, procurement, and deployment of generic and platform-specific
applications.
(c) The leaky pipe of secure coding52 : a discussion of how security can be woven more
seamlessly into the development process, particularly by developers who are not
security experts.

Training materials
Training materials are freely-available on the Internet. Some sites include the following:
1. The Trustworthy Software Foundation provides a resource library 53 of awareness mate-
rials and guidance targeted for those who teach trustworthy software principles, those
who seek to learn about Trustworthy Software and those who want to ensure that the
software they use is trustworthy. The resources available include a mixture of documents,
videos, animations and case studies.
2. The US National Institute of Standards and Technology (NIST) has created the NICE Cybersecurity Workforce Framework [1618]. This framework provides resources on cyber security Knowledge, Skills and Abilities (KSAs), and tasks for a number of work roles.
3. The Software Engineering Institute (SEI) has collaborated with professional organisa-
tions, industry partners and institutions of higher learning to develop freely-available
curricula and educational materials. Included in these materials are resources for a
software assurance program54 to train professionals to build security and correct func-
tionality into software and systems.
4. SAFECode offers free software security training courses delivered via on-demand web-
casts55 .

50: https://www.ncsc.gov.uk/collection/application-development
51: https://www.ncsc.gov.uk/collection/developers-collection
52: https://www.ncsc.gov.uk/blog-post/leaky-pipe-secure-coding
53: https://tsfdn.org/resource-library/
54: https://www.sei.cmu.edu/education-outreach/curricula/software-assurance/index.cfm
55: https://safecode.org/training/



V Infrastructure Security

Chapter 18
Applied Cryptography
Kenneth G. Paterson
ETH Zürich


INTRODUCTION
This document provides a broad introduction to the field of cryptography, focusing on ap-
plied aspects of the subject. It complements the Cryptography Knowledge Area (Chapter 10)
which focuses on formal aspects of cryptography (including definitions and proofs) and on
describing the core cryptographic primitives. That said, formal aspects are highly relevant
when considering applied cryptography. As we shall see, they are increasingly important when
it comes to providing security assurance for real-world deployments of cryptography.
The overall presentation assumes a basic knowledge of either first-year undergraduate mathe-
matics, or that found in a discrete mathematics course of an undergraduate Computer Science
degree. Good cryptography textbooks that cover the required material include [963, 1619, 1620].
We begin by informally laying out the key themes that we will explore in the remainder of the
document.

Cryptography is a Mongrel
Cryptography draws from a number of fields including mathematics, theoretical computer
science and software and hardware engineering. For example, the security of many public key
algorithms depends on the hardness of mathematical problems which come from number
theory, a venerable branch of mathematics. At the same time, to securely and efficiently
implement such algorithms across a variety of computing platforms requires a solid under-
standing of the engineering aspects. To make these algorithms safely usable by practitioners,
one should also draw on usability and Application Programming Interface (API) design. This
broad base has several consequences. Firstly, almost no-one understands all aspects of the
field perfectly (including the present author). Secondly, this creates gaps — between theory
and practice, between design and implementation (typically in the form of a cryptographic
library, a collection of algorithm and protocol implementations in a specific programming
language) and between implementations and their eventual use by potentially non-expert
developers. Thirdly, these gaps lead to security vulnerabilities. In fact, it is rare that standard-
ised, widely-deployed cryptographic algorithms directly fail when they are properly used. It is
more common that cryptography fails for indirect reasons — through unintentional misuse
of a library API by a developer, on account of bad key management, because of improper
combination of basic cryptographic algorithms in a more complex system, or due to some
form of side-channel leakage. All of these topics will be discussed in more detail.

Cryptography ≠ Encryption
In the popular imagination cryptography equates to encryption; a cryptographic mechanism
providing confidentiality services. In reality, cryptography goes far beyond this to provide an
underpinning technology for building security services more broadly. Thus, secure communi-
cations protocols like Transport Layer Security (TLS) rely on both encryption mechanisms
(Authenticated Encryption, AE) and integrity mechanisms (e.g. digital signature schemes) to
achieve their security goals. In fact, in its most recent incarnation (version 1.3), TLS relies
exclusively on Diffie-Hellman key exchange to establish the keying material that it consumes,
whereas earlier versions allowed the use of public key encryption for this task. We will discuss
TLS more extensively in Section 18.5; here the point is that, already in the literally classic
application of cryptography, encryption is only one of many techniques used.
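The Diffie-Hellman mechanics underlying TLS 1.3 key establishment can be illustrated with a toy example. The sketch below uses a finite-field group with a deliberately small Mersenne prime purely for readability; real deployments use standardised groups (elliptic curves such as X25519, or 2048-bit-plus MODP groups), so this code must never be used for actual security.

```python
import secrets

# Toy finite-field Diffie-Hellman: each side publishes g^x mod p and combines
# the peer's share with its own secret. The prime below (2^127 - 1) is far
# too small for real use; TLS 1.3 sends key shares over standardised groups.
p = 2**127 - 1   # a Mersenne prime, insecure at this size
g = 3

a = secrets.randbelow(p - 2) + 2      # client's ephemeral secret
b = secrets.randbelow(p - 2) + 2      # server's ephemeral secret
A = pow(g, a, p)                      # client's key share (sent in ClientHello)
B = pow(g, b, p)                      # server's key share (sent in ServerHello)

client_secret = pow(B, a, p)          # client computes (g^b)^a mod p
server_secret = pow(A, b, p)          # server computes (g^a)^b mod p
assert client_secret == server_secret # both sides hold the same keying material
```

In TLS 1.3 the resulting shared secret is not used directly: it is fed through a key-derivation schedule to produce the traffic keys the record layer consumes.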

KA Applied Cryptography | July 2021 Page 594



Moreover, since the boom in public research in cryptography starting in the late 1970s, re-
searchers have been incredibly fecund in inventing new types of cryptography to solve seem-
ingly impossible tasks. Whilst many of these new cryptographic gadgets were initially of purely
theoretical interest, the combination of Moore’s law and the growth of technologies such as
cloud computing has made some of them increasingly important in practice. Researchers
have developed some of these primitives to the point where they are efficient enough to be
used in large-scale applications. Some examples include the use of zero-knowledge proofs
in anonymous cryptocurrencies, the use of Multi-Party Computation (MPC) techniques to
enable computations on sensitive data in environments where parties are mutually untrusting,
and the (to date, limited) use of Fully Homomorphic Encryption (FHE) for privacy-preserving
machine learning.

Cryptography is Both Magical and Not Magical


Cryptography can seem magical in what it can achieve. Consider the millionaires’ problem:
two millionaires want to find out who is richer, without either telling the other how much they
are worth. This seems impossible, but it can be done in a reasonably efficient manner and
under mild assumptions. But there is no such thing as cryptographic “fairy dust” that can be
sprinkled on a bad (i.e. insecure) system to make it good (i.e. secure). Rather, at the outset of a
system design activity, cryptography should be thought of as being something that will work in
concert with other security building blocks to build a secure system (or more pessimistically, a
system in which certain risks have been mitigated). In this sense, cryptography makes systems
stronger by making certain attack vectors infeasible or just uneconomic to attack. Consider
again the secure communications protocol TLS. When configured properly, TLS offers end-to-
end security for pairs of communicating devices, providing strong assurances concerning
the confidentiality and integrity of messages (and more). However, it says nothing about the
security of the endpoints where the messages are generated and stored. Moreover, TLS does
not prevent traffic analysis attacks, based on analysing the number, size and direction of flow
of TLS-encrypted messages. So the use of cryptography here reduces the “attack surface” of
the communication system, but does not eliminate all possible attacks on the system, nor
does it aim to do so.
Developing the systemic view further, it is unfortunate that cryptography is generally brittle
and can fail spectacularly rather than gracefully. This can be because of a breakthrough in
cryptanalysis, in the sense of breaking one of the core cryptographic algorithms in use. For
example, there could be an unexpected public advance in algorithms for integer factorisation
that renders our current choices of key-sizes for certain algorithms totally insecure. This seems
unlikely — the last significant advance at an algorithmic level here was in the early 1990s with
the invention of the Number Field Sieve. So more likely this is because the cryptography is
provided in a way that makes it easy for non-experts to make mistakes, or the realisation of a
new attack vector enabling an attacker to bypass the cryptographic mechanism in use.
It is also an unfortunate fact that, in general, cryptography is non-composable, in the sense
that a system composed of cryptographic components that are individually secure (according
to some suitable formal definitions for each component) might itself fail to be secure in
the intended sense. A simple example arises in the context of so-called generic composition
of symmetric encryption and MAC algorithms to build an overall encryption scheme that
offers both confidentiality and integrity. Here, the “E&M” scheme obtained by encrypting the
plaintext and, in parallel, applying a MAC to the plaintext, fails to offer even a basic level of
confidentiality. This is because the MAC algorithm, being deterministic, will leak plaintext


equality across multiple encryptions. This example, while simple, is not artificial: the SSH
protocol historically used such an E&M scheme and only avoided the security failure due
to the inclusion of a per-message sequence number as part of the plaintext (this sequence
number was also needed to achieve other security properties of the SSH secure channel).
This example generalises, in the sense that even small and seemingly trivial details can have
a large effect on security: in cryptography, every bit matters.
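The E&M failure and the SSH-style mitigation can be demonstrated concretely. The sketch below uses only the Python standard library: the XOR "cipher" is merely a stand-in for a real encryption scheme, while the MAC is a genuine HMAC; the point is that a deterministic tag computed over the bare plaintext leaks plaintext equality.

```python
import hashlib, hmac, os

def toy_encrypt(key, plaintext):
    """Stand-in for a real cipher: XOR with a hash-derived keystream."""
    nonce = os.urandom(16)
    stream = hashlib.sha256(key + nonce).digest()
    return nonce, bytes(p ^ s for p, s in zip(plaintext, stream))

def encrypt_and_mac(enc_key, mac_key, plaintext):
    # "E&M": encrypt the plaintext and, in parallel, MAC the plaintext.
    tag = hmac.new(mac_key, plaintext, hashlib.sha256).digest()
    return toy_encrypt(enc_key, plaintext), tag

enc_key, mac_key = os.urandom(32), os.urandom(32)
msg = b"attack at dawn!!"

(_, ct1), tag1 = encrypt_and_mac(enc_key, mac_key, msg)
(_, ct2), tag2 = encrypt_and_mac(enc_key, mac_key, msg)
print(ct1 != ct2)    # True: fresh randomness makes the ciphertexts differ
print(tag1 == tag2)  # True: equal plaintexts give equal tags, leaking equality

# SSH-style mitigation: include a per-message sequence number under the MAC.
tag3 = hmac.new(mac_key, (1).to_bytes(4, "big") + msg, hashlib.sha256).digest()
tag4 = hmac.new(mac_key, (2).to_bytes(4, "big") + msg, hashlib.sha256).digest()
print(tag3 != tag4)  # True: identical plaintexts now yield distinct tags
```

The standard remedy is Encrypt-then-MAC (MAC the ciphertext, not the plaintext), which avoids this leak because the ciphertext is already randomised.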
In view of the above observations, applied cryptography is properly concerned with a broader
sweep of topics than just the low-level cryptographic algorithms. Of course these are still
crucial and we will cover them briefly. However, applied cryptography is also about the inte-
gration of cryptography into systems and development processes, the thorny topic of key
management and even the interaction of cryptography with social processes, practices and
relations. We will touch on all of these aspects.

Cryptography is Political
Like many other technologies, cryptography can be used for good or ill. It is used by human
rights campaigners to securely organise their protests using messaging apps like Telegram
and Signal [1621]. It is used by individuals who wish to maintain their privacy against the incur-
sions of tech companies. It enables whistle-blowers to securely communicate with journalists
when disclosing documents establishing company or governmental wrong-doing (see Privacy
& Online Rights Knowledge Area (Section 5.4)). But it can also be used by terrorists to plan
attacks or by child-abusers to share illegal content. Meanwhile cryptocurrencies can be used
by drug dealers to launder money [1622] and as a vehicle for extracting ransom payments.1
These examples are chosen to highlight that cryptography, historically the preserve of govern-
ments and their militaries, is now in everybody’s hands — or more accurately, on everybody’s
phone. This is despite intensive, expensive efforts over decades on the part of governments
to regulate the use of cryptography and the distribution of cryptographic technology through
export controls. Indeed, such laws continue to exist, and violations of them can produce
severe negative consequences so practitioners should be cautious to research applicable
regulation (see Law & Regulation Knowledge Area (Section 3.11.3) for further discussion of
this topic).
But the cryptographic genie has escaped the bottle and is not going back in. Indeed, crypto-
graphic software of reasonable quality is now so widespread that attempts to prevent its use
or to introduce government-mandated back-doors are rendered irrelevant for anyone with a
modicum of skill. This is to say nothing as to whether it is even possible to securely engineer
cryptographic systems that support exceptional access for a limited set of authorised parties,
something which experts doubt, see for example [1624]. Broadly, these efforts at control and
the reaction to them by individual researchers, as well as companies, are colloquially known
as The Crypto Wars. Sometimes, these are enumerated, though it is arguable that the First
Crypto War never ended, but simply took on another, less-intense, less-visible form, as became
apparent from the Snowden revelations [1625].
1: On the other hand, a 2019 RAND report [1623] concluded there is little evidence for use of cryptocurrencies by terrorist groups.


The Cryptographic Triumvirate


A helpful classification of cryptographic applications arises from considering what is happen-
ing to the data. The classical cryptographic applications relate to data in transit, i.e. secure
communications. Cryptography can be applied to build secure data storage systems, in which
case we talk about data at rest. In fact these two application domains are quite close in terms
of the techniques they use. This is because, to a first-order approximation, one can regard
a secure storage system as a kind of communications channel in time. Finally, in the era of
cloud computing, outsourced storage and privacy-preserving computation, we have seen the
emergence of cryptography for data under computation. This refers to a broad set of tech-
niques enabling computations to be done on data that is encrypted — imagine outsourcing an
entire database to an untrusted server in such a way that database operations (insert, delete,
search queries, etc) can still be carried out without leaking anything about those operations —
or the data itself — to the server. Keeping in mind this triumvirate — data in transit, data at
rest, data under computation — can be useful when understanding what to expect in terms
of the security, performance and maturity of systems using cryptography. In short, systems
providing security for data in transit and data at rest are more mature, are performant at
scale and tend to be standardised. By contrast, systems providing security for data under
computation are largely in an emergent phase.
This classification focuses on data and therefore fails to capture some important applications
of cryptography such as user authentication2 and attestation.3

Organisation
Having laid out the landscape of Applied Cryptography, we now turn to a more detailed
consideration of sub-topics. The next section is concerned with cryptographic algorithms
and schemes — the building blocks of cryptography. It also discusses protocols, which typi-
cally combine multiple algorithms into a more complex system. In Section 18.2 we discuss
implementation aspects of cryptography, addressing what happens when we try to turn a
mathematical description of a cryptographic algorithm into running code. Cryptography simply
translates the problem of securing data into that of securing and managing cryptographic
keys, following Wheeler’s aphorism that every problem in computer science can be solved by
another level of indirection. We address the topic of key management in Section 18.3. Sec-
tion 18.4 covers a selection of issues that may arise for non-expert consumers of cryptography,
while Section 18.5 discusses a few core cryptographic applications as a means of showing
how the different cryptographic threads come together in specific cases. Finally, Section 18.6
looks to the future of applied cryptography and conveys closing thoughts.
² Here, for example, FIDO is developing open specifications of interfaces for authenticating users to web-based applications and services using public key cryptography.
³ This concept refers to methods by which a hardware platform can provide security guarantees to third parties about how it will execute code. It is addressed in the Hardware Security Knowledge Area (Chapter 20).


18.1 ALGORITHMS, SCHEMES AND PROTOCOLS


[963, 1619, 1620]

18.1.1 Basic Concepts


In this subsection, we provide a brief summary of some basic concepts in cryptography. A
more detailed and formal introduction to this material can be found in Cryptography Knowledge
Area (Chapter 10).
Cryptographic algorithms are at the core of cryptography. There are many different classes of
algorithm and many examples within each class. Moreover, it is common to group algorithms
together to form cryptographic primitives or schemes. For example, a Public Key Encryption
(PKE) scheme consists of a collection of three algorithms: a key generation algorithm, an
encryption algorithm and a corresponding decryption algorithm.
Unfortunately, there is no general agreement on the terminology and the meanings of the terms
algorithm, scheme, primitive and even protocol overlap and are sometimes used interchange-
ably. We will reserve the term algorithm for an individual algorithm (in the computer science
sense — a well-defined procedure with specific inputs and outputs, possibly randomised).
We will use scheme to refer to a collection of algorithms providing some functionality (e.g.
as above, a PKE scheme) and protocol for interactive systems in which two or more parties
exchange messages.⁴ Such protocols are usually built by combining different cryptographic
schemes, themselves composed of multiple algorithms.
Most cryptographic algorithms are keyed (the main exception being hash functions). This means
that they have a special input, called a key, which controls the operation of the algorithm.
The manner in which the keying is performed leads to the fundamental distinction between
symmetric and asymmetric schemes. In a symmetric scheme the same key is used for two
operations (e.g. encryption and decryption) and the confidentiality of this key is paramount
for the security of the data which it protects. In an asymmetric scheme, different keys are
used for different purposes (e.g. in a PKE scheme, a public key is used for encryption and a
corresponding private key is used for decryption, with the key pair of public and private keys
being output by the PKE scheme’s key generation algorithm). The usual requirement is that
the private key remains confidential to the party who runs the key generation algorithm, while
the public key (as the name suggests) can be made public and widely distributed.
We now turn to discussion of the most important (from the perspective of applied cryptog-
raphy) cryptographic primitives and schemes. Our treatment is necessarily informal and
incomplete. Any good textbook will provide missing details. Martin’s book [1620] provides an
accessible and mostly non-mathematical treatment of cryptography. Smart’s book [963] is
aimed at Computer Science undergraduates with some background in mathematics. The text
by Boneh and Shoup [1619] is more advanced and targets graduate students.
⁴ Alternatively, a protocol is a distributed algorithm.


18.1.2 Hash functions


A hash function is usually an unkeyed function H which takes as input bit-strings of variable
length and produces short outputs of some fixed length, n bits say.
A crucial security property is that it should be hard to find collisions for a hash function, that
is, pairs of inputs (m₀, m₁) resulting in the same hash value, i.e. such that H(m₀) = H(m₁)
(such collisions must exist because the output space is much smaller than the input space).
If the output size is n bits, then there is a generic attack based on the birthday paradox that
will find collisions with effort about 2^(n/2) hash function evaluations.⁵
Other important security properties are pre-image resistance and second pre-image resistance.
Informally, pre-image resistance says that it is hard, given an output h of a hash function H,
to find an input m such that H(m) = h. Second pre-image resistance says that, given an m, it
is difficult to find m′ ≠ m such that H(m) = H(m′).
Hash functions are often modelled as random functions in formal security analyses, leading
to the Random Oracle Model. Of course, a given hash function is a fixed function and not
a random one, so this is just a heuristic, albeit a very useful one when performing formal
security analyses of cryptographic schemes making use of hash functions.
SHA-1 with n = 160 is a widely-used hash function. However, collisions for SHA-1 can be
found using much less than 2^80 effort, so it is now considered unsuitable for applications
requiring collision-resistance. Other common designs include SHA-256 (with n = 256 and still
considered very secure) and SHA-3 (with variable length output and based on different design
principles from the earlier SHA families). The SHA families are defined in a series of NIST
standards [954, 1626].
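The basic hash interface and its fixed-length-output property can be illustrated with SHA-256 from Python's standard hashlib module (a minimal sketch; the wrapper name H and the messages are ours):

```python
import hashlib

def H(data: bytes) -> bytes:
    # SHA-256: variable-length input, fixed n = 256-bit (32-byte) output.
    return hashlib.sha256(data).digest()

d = H(b"a short message")
assert len(d) == 32                    # fixed-length digest
assert len(H(b"x" * 10_000)) == 32     # same output length for any input size
assert H(b"a short message") == d      # hashing is deterministic
assert H(b"a short messagf") != d      # changing one byte changes the digest
```

Finding a collision for this function by brute force would take on the order of 2^128 evaluations, which is why SHA-256 is still considered very secure.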

18.1.3 Block ciphers


A block cipher is a function taking as input a symmetric key K of k bits and a plaintext P
with n bits and producing as output a ciphertext C also of n bits. For each choice of K, the
resulting function, often written E_K(·), is a permutation mapping n-bit strings to n-bit strings.
Since E_K(·) is a permutation, it has an inverse, which we denote D_K(·). We require that both
E_K(·) and D_K(·) be fast to compute.
Many security properties can be defined for block ciphers, but the most useful one for formal
security analysis demands that the block cipher be a Pseudo-Random Permutation (PRP).
Informally this means, if K is chosen uniformly at random, then no efficient adversary can tell
the difference between outputs of the block cipher and the outputs of a permutation selected
uniformly at random from the set of all permutations of n bits.
It is notable that block cipher designers do not typically target such a goal, but rather resistance
to a range of standard attacks. One such attack is exhaustive key search: given a few known
plaintext/ciphertext pairs for the given key, try each possible K to test if it correctly maps
the plaintexts to the ciphertexts. This generic attack means that a block cipher’s key length k
must be big enough to make the attack infeasible. The Data Encryption Standard (DES) had
⁵ The birthday paradox is a generalisation of the initially surprising observation that in a group of 23 randomly selected people there is a 50-50 chance of two people sharing a birthday; more generally, if we make √N selections uniformly at random from a set of N objects, then there is a constant probability of two of the selections being the same.


k = 56, which was already considered too short by experts when the algorithm was introduced
by the US government in the mid-1970s.
The Advanced Encryption Standard (AES) [1627] is now the most-widely used block cipher. The
AES was the result of a design competition run by the US government agency NIST. It has a
128-bit block (n = 128) and its key-length k is either 128, 192, or 256, precluding exhaustive key
search. Fast implementation of AES is supported by hardware instructions on many Central
Processing Unit (CPU) models. Fast and secure implementation of AES is challenging in
environments where an attacker may share memory resources with the victim, for example
a cache. Still, with its widespread support and lack of known security vulnerabilities, it is
rarely the case that any block cipher other than AES is needed. One exception to this rule is
constrained computing environments.
Except in very limited circumstances, block ciphers should not be used directly for encrypting
data. Rather, they are used in modes of operation [1628]. Modes are discussed further below
under Authenticated Encryption schemes.
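Exhaustive key search needs nothing more than the ability to run the cipher forward on a known plaintext/ciphertext pair. The sketch below uses a toy, hash-based stand-in for a block cipher with an artificially short 16-bit key (our own construction, not a real cipher and not even invertible) purely to show the structure of the generic attack:

```python
import hashlib

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Toy stand-in for a block cipher with a 16-bit key. NOT a real cipher;
    # it only provides a keyed mapping to illustrate the attack's structure.
    return hashlib.sha256(key + plaintext).digest()[:8]

def exhaustive_key_search(plaintext: bytes, ciphertext: bytes):
    # Try every 16-bit key until one maps the known plaintext to the ciphertext.
    for k in range(2**16):
        key = k.to_bytes(2, "big")
        if toy_encrypt(key, plaintext) == ciphertext:
            return key
    return None

secret = (4242).to_bytes(2, "big")
pt = b"known-plaintext!"
ct = toy_encrypt(secret, pt)
assert exhaustive_key_search(pt, ct) == secret  # 2^16 trials at most
```

With a 16-bit key the loop finishes instantly; at k = 128 (as in AES) the same attack would require 2^128 trials, which is infeasible.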

18.1.4 Stream ciphers


Stream ciphers are algorithms that can encrypt a stream of bits (as opposed to a block of n
bits in the case of block ciphers) under the control of a k-bit key K. Most stream ciphers use
the key to generate a key-stream and then combine it in a bit-by-bit fashion with the plaintext
using an XOR operation, to produce the ciphertext stream.
Keystream reuse is fatal to security, since the XOR of two ciphertexts created using the same
keystream reveals the XOR of the plaintexts, from which recovering the individual plaintexts
becomes possible given enough plaintext redundancy [1629]. To avoid this, stream ciphers
usually employ an Initialisation Vector (IV) along with the key; the idea is that each choice of
IV should produce an independent keystream. IVs need not be kept secret and so can be sent
along with the ciphertext or derived by sender and receiver from context.
The main security requirement for a stream cipher is that, for a random choice of key K, and
each choice of IV, it should be difficult for an adversary to distinguish the resulting keystream
from a truly random bit-string of the same length.
It is easy to build a stream cipher from a block cipher by using a mode of operation; these
are discussed in Section 18.1.6.4. Dedicated stream ciphers suitable for implementation in
hardware have traditionally relied on combining simpler but insecure components such as
Linear Feedback Shift Registers. The A5/1 and A5/2 stream ciphers, once widely used in
mobile telecommunications systems, are of this type. The RC4 stream cipher was well-suited
for implementation in software and has a very simple description making it attractive to
developers. RC4 became widely used in IEEE wireless communications systems (WEP, WPA)
and in Secure Socket Layer/Transport Layer Security (SSL/TLS). It is now considered obsolete
because of a variety of security vulnerabilities that it presents in these applications.
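The fatal effect of keystream reuse is easy to demonstrate. The keystream generator below is a toy hash-counter construction of our own (not a real stream cipher); the point is only the XOR relation that reuse exposes:

```python
import hashlib

def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Toy keystream generator (counter-mode hashing; illustration only).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key, iv = b"k" * 16, b"iv0"
p1, p2 = b"attack at dawn!!", b"retreat at dusk!"
c1 = xor(p1, keystream(key, iv, len(p1)))
c2 = xor(p2, keystream(key, iv, len(p2)))   # keystream reuse: same (key, IV)!

# The XOR of the two ciphertexts equals the XOR of the two plaintexts,
# leaking plaintext structure without any knowledge of the key:
assert xor(c1, c2) == xor(p1, p2)
```

Using a fresh IV for the second message would have produced an independent keystream and destroyed this relation.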


18.1.5 Message Authentication Code (MAC) schemes


MAC schemes are used to provide authentication and integrity services. They are keyed. A
MAC scheme consists of three algorithms. The first is called KeyGen for key generation. It is
randomised and usually consists of selecting a key K at random from the set of bit-strings of
some fixed length k. The second algorithm is called Tag. Given as input K and a message
m encoded as a bit-string, Tag produces a MAC tag τ , usually a short string of some fixed
length t. The third algorithm is called Verify. This algorithm is used to verify the validity of
a message/MAC tag combination (m, τ ) under the key K. So Verify takes as input triples
(K, m, τ ) and produces a binary output, with “1” indicating validity of the input triple.
The main security property required of a MAC scheme is that it should be hard for an adversary
to come up with a new pair (m, τ ) which Verify accepts with key K, even when the adversary
has already seen many pairs (m1 , τ1 ), (m2 , τ2 ), . . . produced by Tag with the same key K for
messages of its choice m1 , m2 , . . .. The formal security notion is called Strong Unforgeability
under Chosen Message Attack (SUF-CMA for short).
SUF-CMA MAC schemes can be used to provide data origin authentication and data integrity
services. Suppose two parties Alice and Bob hold a shared key K; then no party other than
those two can come up with a correct MAC tag τ for a message m. So if Alice attaches MAC
tags τ to her messages before sending them to Bob, then Bob, after running Verify and
checking that it accepts, can be sure the messages only came from Alice (or maybe himself,
depending on how restrictive he is in using the key) and were not modified by an attacker.
MAC schemes can be built from hash functions and block ciphers. HMAC [1630] is a popular
MAC scheme which, roughly speaking, implements its Tag algorithm by making two passes
of a hash function applied to the message m prefixed with the key K. The security analysis of
HMAC requires quite complex assumptions on the underlying hash function [1619, Section
8.7]. MAC schemes can also be built from so-called universal hash functions in combination
with a block cipher. Security then relies only on the assumption that the block cipher is a PRP.
Since universal hash functions can be very fast, this can lead to very efficient MACs. A widely
used MAC of this type is GMAC, though it is usually used only as part of a more complex
algorithm called AES-GCM that is discussed below.
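HMAC is available in Python's standard library, so the KeyGen/Tag/Verify interface described above can be sketched directly (the wrapper names tag and verify are ours):

```python
import hashlib
import hmac
import secrets

# KeyGen: sample a random k-bit key (here k = 256).
K = secrets.token_bytes(32)

def tag(key: bytes, message: bytes) -> bytes:
    # Tag: HMAC-SHA-256 over the message.
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, t: bytes) -> bool:
    # Verify: recompute the tag and compare in constant time.
    return hmac.compare_digest(tag(key, message), t)

m = b"transfer 100 to Bob"
t = tag(K, m)
assert verify(K, m, t)                              # valid (m, tag) accepted
assert not verify(K, b"transfer 900 to Bob", t)     # modified message rejected
```

The constant-time comparison in Verify matters in practice: a naive byte-by-byte comparison can leak, via timing, how many tag bytes an attacker has guessed correctly.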

18.1.6 Authenticated Encryption (AE) schemes


An Authenticated Encryption (AE) scheme [1631] is a symmetric scheme which transforms
plaintexts into ciphertexts and which simultaneously offers confidentiality and integrity prop-
erties for the plaintext data. After a long period of debate, AE has emerged as a powerful
and broadly applicable primitive for performing symmetric encryption. In most cases where
symmetric encryption is needed, AE is the right tool for the job.
An AE scheme consists of three algorithms: KeyGen, Enc and Dec. The first of these is
responsible for key generation. It is randomised and usually consists of selecting a key K at
random from the set of bit-strings of some fixed length k. Algorithm Enc performs encryption.
It takes as input a key K and a plaintext M to be encrypted. Practical AE schemes allow
M to be a bit-string of variable length. In the nonce-based setting, Enc has an additional
input called the nonce, denoted N , and usually selected from a set of bit-strings of some
fixed size n. In this setting, Enc is deterministic (i.e. it needs no internal randomness). In the
randomised setting, Enc is a randomised algorithm and does not take a nonce input. The
third algorithm in an AE scheme is the decryption algorithm, Dec. It takes as input a key K, a


ciphertext string C and, in the nonce-based setting, a nonce N . It returns either a plaintext
M or an error message indicating that decryption failed. Correctness of a nonce-based AE
scheme demands that, for all keys K, all plaintexts M and all nonces N , if running Enc on
input (K, M, N ) results in ciphertext C, then running Dec on input (K, C, N ) results in plaintext
M . Informally, correctness means that, for a given key, decryption “undoes” encryption.

18.1.6.1 AE Security

Security for nonce-based AE is defined in terms of the combination of a confidentiality property
and an integrity property.
Confidentiality for AE says, roughly, that an adversary learns nothing that it does not already
know from encryptions of messages. Slightly more formally, we consider an adversary that
has access to a left-or-right (LoR) encryption oracle. This oracle takes as input a pair of equal-
length plaintexts (M₀, M₁) and a nonce N selected by the adversary; internally the oracle
maintains a randomly generated key K and a randomly sampled bit b. It selects message
M_b, encrypts it using Enc on input (K, M_b, N) and returns the resulting ciphertext C to the
adversary. The adversary’s task is to make an estimate of the bit b, given repeated access
to the oracle while, in tandem, performing any other computations it likes. The adversary is
considered successful if, at the end of its attack, it outputs a bit b′ such that b′ = b. An AE
scheme is said to be IND-CPA secure (“indistinguishability under chosen plaintext attack”)
if no adversary, consuming reasonable resources (quantified in terms of the computational
resources it uses, the number of queries it makes and sometimes the bit-length of those
queries) is able to succeed with probability significantly greater than 1/2. An adversary can
always just make a complete guess b′ for the bit b and will succeed half of the time; hence
we penalise the adversary by demanding it do significantly better than this trivial guessing
attack. The intuition behind IND-CPA security is that an adversary, even with perfect control
over which pairs of messages get encrypted, cannot tell from the ciphertext which one of the
pair — the left message or the right message — gets encrypted each time. So the ciphertexts
do not leak anything about the messages, except perhaps their lengths.⁶
The integrity property says, roughly, that an adversary cannot create new ciphertexts that
decrypt successfully, i.e. that yield plaintexts rather than decryption failures. Slightly more formally, we give
an adversary an encryption oracle; this oracle internally maintains a randomly generated key K
and on input (M, N ) from the adversary, runs Enc on input (K, M, N ) and returns the resulting
ciphertext C to the adversary. We also give the adversary a decryption oracle, to which it can
send inputs (C, N ). In response to such inputs, the oracle runs Dec on input (K, C, N ) (for the
same key K used in the encryption oracle) and gives the resulting output to the adversary
— this output could be a message or an error message. The adversary against integrity is
considered to be successful if it at some point sends an input (C, N ) to its decryption oracle
which results in an output that is a plaintext M and not an error message. Here, we require that
(C, N ) be such that C is not a ciphertext output by the encryption oracle when it was given
some input (M, N ) during the attack — otherwise the adversary can win trivially. Under this
restriction, for the adversary to win, the pair (C, N ) must be something new that the adversary
could not have trivially obtained from its encryption oracle. In this sense, C is a ciphertext
forgery for some valid plaintext, which the adversary may not even know before it sends (C, N )
for decryption. An AE scheme is said to be INT-CTXT secure (it has “integrity of ciphertexts”)
⁶ But note that hiding the length of data can be a more complicated task than hiding the data itself. Moreover, just knowing the length of data can lead to significant attacks. Such attacks, more broadly covered under side-channel attacks, are discussed in Section 18.2.3.


if no adversary, consuming reasonable resources (quantified as before), is able to succeed
with probability significantly greater than 0.
Similar security notions can be developed for the randomised setting.
An AE scheme is said to be secure (or AE secure) if it is both IND-CPA and INT-CTXT secure.
This combination of security properties is very strong. It implies, for example, IND-CCA security,
a traditional security notion which extends IND-CPA security by also equipping the adversary
with a decryption capability, capturing the capabilities that a practical attacker may have. It
also implies a weaker “integrity of plaintexts” notion, which roughly states that it should be
hard for the adversary to create a ciphertext that encrypts a new plaintext.

18.1.6.2 Nonces in AE

It is a requirement in the IND-CPA security definition that the nonces used by the adversary
be unique across all calls to its LoR encryption oracle. Such an adversary is called a nonce
respecting adversary. In practice, it is usually the responsibility of the application using an AE
scheme to ensure that this condition is met across all invocations of the Enc algorithm for a
given key K. Note that the nonces do not need to be random. Indeed choosing them randomly
may result in nonce collisions, depending on the quality of the random bit source used, the
size of the nonce space and the number of encryptions performed. For example, the nonces
could be generated using a stateful counter. The core motivation behind the nonce-based setting
for AE is that it is easier for a cryptographic implementation to maintain state across all
uses of a single key than it is to securely generate the random bits needed to ensure security
in the randomised setting. This is debatable and nonce repetitions have been observed in
practice [1632]. For some AE schemes such as the widely deployed AES-GCM scheme, the
security consequences of accidental nonce repetition are severe, e.g. total loss of integrity
and/or partial loss of confidentiality. For this reason, misuse-resistant AE schemes have
been developed. These are designed to fail more gracefully under nonce repetitions, revealing
less information in this situation than a standard AE scheme might. They are generally more
computationally expensive than standard AE schemes. AES-GCM-SIV [1633] is an example of
such a scheme.

18.1.6.3 AE Variants

Many variants of the basic AE formulation and corresponding security notions have been devel-
oped. As an important example, Authenticated Encryption with Associated Data (AEAD) [1634]
refers to an AE extension in which an additional data field, the Associated Data (AD), is crypto-
graphically bound to the ciphertext and is integrity protected (but not made confidential). This
reflects common use cases. For example, we have a packet header that we wish to integrity
protect but which is needed in the clear to deliver data, and a packet body that we wish to both
integrity protect and make confidential. Even the basic AE security notion can be strengthened
by requiring that ciphertexts be indistinguishable from random bits or by considering security
in the multi-user setting, where the adversary interacts with multiple AE instantiations under
different, random keys and tries to break any one of them. The latter notion is important when
considering large-scale deployments of AE schemes. The two separate notions, IND-CPA and
INT-CTXT, can also be combined into a single notion [1635].


18.1.6.4 Constructing AE Schemes

Secure AE (and AEAD) schemes can be constructed generically from simpler encryption
schemes offering only IND-CPA security and SUF-CMA secure MAC schemes. There are three
basic approaches: Encrypt-then-MAC (EtM), Encrypt-and-MAC (E&M) and MAC-then-Encrypt
(MtE). Of these, the security of EtM is the easiest to analyse and provides the most robust
combination, because it runs into fewest problems in implementations. Both MtE and E&M
have been heavily used in widely-deployed secure communications protocols such as SSL/TLS
and Secure Shell (SSH) with occasionally disastrous consequences [1636, 1637, 1638]. Broad
discussions of generic composition can be found in [1639] (in the randomised setting) and
[1640] (more generally).
This generic approach then leaves the question of how to obtain an encryption scheme
offering IND-CPA security. This is easily achieved by using a block cipher in a suitable mode of
operation [1628], for example, counter (CTR) mode or CBC mode. Such a mode takes a block
cipher and turns it into a more general encryption algorithm capable of encrypting messages
of variable length, whereas a block cipher can only encrypt messages of length n bits for some
fixed n. The IND-CPA security of the mode can then be proved based on the assumption that
the used block cipher is a PRP. Indeed, the nonce-based AEAD scheme AES-GCM [1628] can be
seen as resulting from a generic EtM construction applied using AES in a specific nonce-based
version of CTR mode and an SUF-CMA MAC constructed from a universal hash function based
on finite field arithmetic. AES-GCM is currently used in about 90% of all TLS connections on
the web. It has excellent performance on commodity CPUs from Intel and AMD because of
their hardware support for the AES operations and for the finite field operations required by
the MAC. A second popular AEAD scheme, ChaCha20-Poly1305 [1641], arises in a similar way
from different underlying building blocks. The CAESAR competition⁷ was a multi-year effort to
produce a portfolio of AEAD schemes for three different use cases: lightweight applications,
high-performance applications and defence in depth (essentially, misuse-resistant AE).
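The Encrypt-then-MAC composition can be sketched in a few lines, using a toy hash-counter keystream for the IND-CPA component and HMAC-SHA-256 as the SUF-CMA MAC (the keystream construction and function names are ours, for illustration only; real deployments use vetted schemes such as AES-GCM):

```python
import hashlib
import hmac
import secrets

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy CTR-style keystream (illustration only, not a vetted cipher).
    out, ctr = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:length]

def etm_encrypt(enc_key, mac_key, nonce, plaintext):
    # Encrypt first, then MAC the nonce together with the ciphertext.
    ks = _keystream(enc_key, nonce, len(plaintext))
    ct = bytes(p ^ k for p, k in zip(plaintext, ks))
    t = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return ct, t

def etm_decrypt(enc_key, mac_key, nonce, ct, t):
    # Verify the tag before touching the ciphertext contents.
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, t):
        raise ValueError("decryption failed")
    ks = _keystream(enc_key, nonce, len(ct))
    return bytes(c ^ k for c, k in zip(ct, ks))

ek, mk = secrets.token_bytes(16), secrets.token_bytes(16)
ct, t = etm_encrypt(ek, mk, b"nonce-0001", b"hello AE")
assert etm_decrypt(ek, mk, b"nonce-0001", ct, t) == b"hello AE"
```

Note that decryption rejects any tampered ciphertext before doing any decryption work, which is precisely the robustness that makes EtM easier to implement safely than MtE or E&M.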

18.1.7 Public Key Encryption Schemes and Key Encapsulation Mechanisms

A Public Key Encryption (PKE) scheme consists of three algorithms: KeyGen, Enc and Dec.
The first of these is responsible for key generation. It is randomised and outputs key pairs
(sk, pk) where sk denotes a private key (often called the secret key) and pk denotes a public
key. The algorithm Enc performs encryption. It takes as input the public key pk and a plaintext
M to be encrypted and returns a ciphertext C. In order to attain desirable security notions
(introduced shortly), Enc is usually randomised. The plaintext M comes from some set of
possible plaintexts that the scheme can handle. Usually this set is limited and dictated by
the mathematics from which the PKE scheme is constructed. This limitation is circumvented
using hybrid encryption, which combines PKE and symmetric encryption to allow a more
flexible set of plaintexts. The third algorithm in a PKE scheme is the decryption algorithm,
Dec. It takes as input the private key sk and a ciphertext C. It returns either a plaintext M or
an error message indicating that decryption failed. Correctness of a PKE scheme requires
that, for all key pairs (sk, pk) and all plaintexts M , if running Enc on input (pk, M ) results in
ciphertext C, then running Dec on input (sk, C) results in plaintext M . Informally, correctness
means that, for a given key, decryption “undoes” encryption. Notice here the fundamental
asymmetry in the use of keys in a PKE scheme: pk is used during encryption and sk during
⁷ See https://competitions.cr.yp.to/caesar-submissions.html.


decryption.

18.1.7.1 PKE Security

There are many flavours of security for PKE. We focus on just one, which is sufficient for many
applications, and provide a brief discussion of some others.
Recall the definition of IND-CPA and IND-CCA security for AE schemes from Section 18.1.6.
Analogous notions can be defined for PKE. In the IND-CPA setting for PKE, we generate a key
pair (sk, pk) by running KeyGen and give the public key pk to an adversary (since public keys
are meant to be public!). The adversary then has access to an LoR encryption oracle which, on
input a pair of equal-length messages (M₀, M₁), performs encryption of M_b under the public
key pk, i.e. runs the randomised algorithm Enc on input (pk, M_b), to get a ciphertext C which
is then returned to the adversary. The adversary’s task is to make an estimate of the bit b,
given repeated access to the oracle while, in tandem, performing any other computations it
likes. The adversary is considered successful if, at the end of its attack, it outputs a bit b′ such
that b′ = b. A PKE scheme is said to be IND-CPA secure (“indistinguishability under chosen
plaintext attack”) if no adversary, consuming reasonable resources (quantified in terms of the
computational resources it uses and the number of queries it makes) is able to succeed with
probability significantly greater than 1/2. The intuition behind IND-CPA security for PKE is the
same as that for AE: even with perfect control over which pairs of messages get encrypted,
an adversary cannot tell from the ciphertext which one of the pairs is encrypted each time.
Note that in order to be IND-CPA secure, a PKE scheme must have a randomised encryption
algorithm (if Enc was deterministic, then an adversary that first makes an encryption query on
the pair (M₀, M₁) with M₀ ≠ M₁ and then an encryption query on (M₀, M₀) could easily break
the IND-CPA notion). If a PKE scheme is IND-CPA secure, then it must be computationally
difficult to recover sk from pk, since, if this were possible, then an adversary could first recover
sk and then decrypt one of the returned ciphertexts C and thereby find the bit b.
IND-CCA security for PKE is defined by extending the IND-CPA notion to also equip the adver-
sary with a decryption oracle. The adversary can submit (almost) arbitrary bit-strings to this
oracle. The oracle responds by running the decryption algorithm and returning the resulting
plaintext or error message to the adversary. To prevent trivial wins for the adversary and
therefore avoid a vacuous security definition, we have to restrict the adversary to not make
decryption oracle queries for any of the outputs obtained from its encryption oracle queries.
We do not generally consider integrity notions for PKE schemes. This is because, given the
public key pk, an adversary can easily create ciphertexts of its own, so no simple concept of
“ciphertext integrity” would make sense for PKE. Integrity in the public key setting, if required,
usually comes from the application of digital signatures, as discussed in Section 18.1.9. Digital
signatures and PKE can be combined in a cryptographic primitive called signcryption. This
can be a useful primitive in some use-cases, e.g. secure messaging (see Section 18.5.2).
In some applications, such as anonymous communications or anonymous cryptocurrencies,
anonymity of PKE plays a role. Roughly speaking, this says that a PKE ciphertext should not
leak anything about the public key pk that was used to create it. This is an orthogonal property
to IND-CPA/IND-CCA security. A related concept is robustness for PKE, which informally says
that a ciphertext generated under one public key pk should not decrypt correctly under the
private key sk′ corresponding to a second public key pk′. Such a property, and stronger variants
of it, are needed to ensure that trial decryption of anonymous ciphertexts does not produce
unexpected results [1642].


18.1.7.2 Key Encapsulation Mechanisms

A Key Encapsulation Mechanism (KEM) is a cryptographic scheme that simplifies the design
and use of PKE. Whereas a PKE scheme can encrypt arbitrary messages, a KEM is limited to
encrypting symmetric keys. One can then build a PKE scheme from a KEM and an AE scheme
(called a Data Encapsulation Mechanism, DEM, in this context): first use the KEM to encrypt a
symmetric key K, then use K in the AE scheme to encrypt the desired message; ciphertexts
now consist of two components: the encrypted key and the encrypted message.
We can define IND-CPA and IND-CCA security notions for KEMs. These are simpler to work with
than the corresponding notions for PKE and this simplifies security analysis (i.e. generating
and checking formal proofs). Moreover, there is a composition theorem for KEMs which says
that if one takes an IND-CCA secure KEM and combines it with an AE-secure DEM (AE scheme)
as above, then one gets an IND-CCA secure PKE scheme. As well as simplifying design and
analysis, this KEM-DEM or hybrid viewpoint on PKE reflects how PKE is used in practice.
Because PKE generally has slow algorithms and large ciphertext overhead (compared to
symmetric alternatives like AE), we do not use it directly to encrypt messages. Instead, we use
a KEM to encrypt a short symmetric key and then use that key to encrypt our bulk messages.
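This KEM-DEM composition can be sketched in code. The example below is a stand-in chosen for brevity, not a production design: it pairs a toy Diffie-Hellman-style KEM (anticipating Section 18.1.8, with group parameters far too small for real use) with a toy encrypt-then-MAC DEM built from Python's standard hmac and hashlib modules; a deployment would use a vetted AE scheme such as AES-GCM instead.

```python
import hashlib
import hmac
import secrets

# Toy group parameters (far too small for real use).
P = (1 << 127) - 1  # a Mersenne prime
G = 3

def kem_keygen():
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)

def kem_encap(pk):
    """Encapsulate a fresh symmetric key K; the KEM ciphertext is a group element."""
    x = secrets.randbelow(P - 2) + 1
    K = hashlib.sha256(str(pow(pk, x, P)).encode()).digest()
    return K, pow(G, x, P)

def kem_decap(sk, kem_ct):
    return hashlib.sha256(str(pow(kem_ct, sk, P)).encode()).digest()

def _keystream(key, nonce, n):
    out = b""
    for i in range((n + 31) // 32):  # HMAC-SHA256 in counter mode
        out += hmac.new(key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
    return out[:n]

def dem_encrypt(key, msg):
    """Toy DEM: XOR a keystream into the message, then append an HMAC tag."""
    nonce = secrets.token_bytes(16)
    ct = bytes(m ^ s for m, s in zip(msg, _keystream(key, nonce, len(msg))))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def dem_decrypt(key, blob):
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("authentication failure")
    return bytes(c ^ s for c, s in zip(ct, _keystream(key, nonce, len(ct))))

def hybrid_encrypt(pk, msg):
    K, kem_ct = kem_encap(pk)           # first, encapsulate a symmetric key K
    return kem_ct, dem_encrypt(K, msg)  # then, use K to encrypt the message

def hybrid_decrypt(sk, ct):
    kem_ct, dem_ct = ct                 # two components, as described above
    return dem_decrypt(kem_decap(sk, kem_ct), dem_ct)
```

Note how the ciphertext is a pair: the encapsulated key and the DEM output, exactly as in the composition described above.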

18.1.7.3 Some common PKE schemes and KEMs

Perhaps the most famous PKE scheme is the RSA scheme. In its textbook form, the public
key consists of a pair (e, N) where N is a product p · q of two large primes and the private
key consists of a value d such that de = 1 mod (p − 1)(q − 1). Encryption of a message M,
seen as a large integer modulo N, sets C = M^e mod N. On account of the mathematical
relationship between d and e, it can be shown that then M = C^d mod N. So encryption is done
by “raising to the power of e mod N” and decryption is done by “raising to the power of d mod
N”. These operations can be carried out efficiently using the square-and-multiply algorithm
and its variants. Decryption can be accelerated by working separately modulo p and q and
then combining the results using the Chinese Remainder Theorem (CRT).
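These operations, including CRT-accelerated decryption, can be sketched with the classic toy parameters p = 61, q = 53 (illustration only: real moduli are thousands of bits long, and textbook RSA itself must never be deployed, as explained next):

```python
# Toy parameters for illustration only; real RSA uses primes of 1024+ bits.
p, q = 61, 53
N = p * q                          # public modulus
e = 17                             # public exponent
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent: d*e = 1 mod (p-1)(q-1)

def encrypt(M):                    # "raise to the power e mod N"
    return pow(M, e, N)

def decrypt_plain(C):              # "raise to the power d mod N"
    return pow(C, d, N)

def decrypt_crt(C):                # CRT acceleration: work mod p and q separately
    m_p = pow(C, d % (p - 1), p)
    m_q = pow(C, d % (q - 1), q)
    q_inv = pow(q, -1, p)          # combine via the Chinese Remainder Theorem
    h = (q_inv * (m_p - m_q)) % p
    return m_q + h * q
```

Working modulo the half-size primes p and q roughly quarters the cost of each exponentiation, which is why the CRT trick is standard in RSA implementations.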
The security of RSA, informally, depends on the hardness of the Integer Factorisation Problem
(IFP): if an adversary can recover p and q from N , then it can recompute d from e, p and q
using the extended Euclidean algorithm. But what we really want is a converse result: if an
adversary can break RSA, then it should be possible to use that adversary in a black-box
manner as a subroutine to create an algorithm that factors the modulus N or solves some
other presumed-to-be-hard problem.
This textbook version of RSA is completely insecure and must not be used in practice. Notice,
for example, that it is not randomised, so it certainly cannot be IND-CPA secure. Instead RSA
must be used as the basis for constructing more secure alternatives. This is usually done by
performing a keyless encoding step, represented as a function µ(·), on the message before
applying the RSA transform. Thus we have C = µ(M)^e mod N. Decryption then involves
applying the reverse transform and decoding.
In order to achieve modern security notions, µ(·) must be randomised. One popular encoding
scheme, called PKCS#1 v1.5 and specified in [1643], became very common in applications due
to its relatively early standardisation and its ease of implementation. Unfortunately, RSA with
PKCS#1 v1.5 encoding does not achieve IND-CCA security, as demonstrated by the famous
Bleichenbacher attack [1644]. Despite now being more than 20 years old, variants of the
Bleichenbacher attack still regularly affect cryptographic deployments, see for example [1645].


A better encoding scheme is provided in PKCS#1 v2.1 (also specified in [1643]), but it has not
fully displaced the earlier variant. RSA with PKCS#1 v2.1 encoding — also called RSA-OAEP
— can be proven to yield an IND-CCA secure PKE scheme but the best security proof we
have [1646] is unsatisfactory in various technical respects: its proof is not tight and requires
making a strong assumption about the hardness of a certain computational problem. Improved
variants with better security analyses do exist in the literature but have not found their way
into use.
One can easily build an IND-CCA secure KEM from the RSA primitive, as follows: select M at
random from {0, 1, . . . , N − 1}, set C = M^e mod N (as in textbook RSA) and define K = H(M)
to be the encapsulated symmetric key. Here H is a cryptographic hash function (e.g. SHA-256).
This scheme can be proven secure by modelling H as a random oracle, under the assumption
that the RSA inversion problem is hard. The RSA inversion problem is, informally, given e, N
and M^e mod N for a random M, to recover M. The RSA inversion problem is not harder than
the IFP, since any algorithm to solve IFP can be used to construct an algorithm that solves the
RSA inversion problem. But the RSA inversion problem could be easier than the IFP and it is
an open problem to fully decide this question.
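This RSA-KEM construction is short enough to sketch directly (with deliberately tiny toy parameters; real moduli are 2048+ bits):

```python
import hashlib
import secrets

# Deliberately tiny toy RSA parameters (real moduli are 2048+ bits).
p, q = 61, 53
N, e = p * q, 17
d = pow(e, -1, (p - 1) * (q - 1))

def rsa_kem_encap():
    M = secrets.randbelow(N)                           # random M in {0, ..., N-1}
    C = pow(M, e, N)                                   # C = M^e mod N, as in textbook RSA
    K = hashlib.sha256(M.to_bytes(2, "big")).digest()  # K = H(M), the encapsulated key
    return K, C

def rsa_kem_decap(C):
    M = pow(C, d, N)                                   # recover M = C^d mod N
    return hashlib.sha256(M.to_bytes(2, "big")).digest()
```

The encapsulated key K is never transmitted; both sides derive it by hashing M, which only the private-key holder can recover from C.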
Note that RSA encryption is gradually being displaced in applications by schemes using Elliptic
Curve Cryptography (ECC) because of its superior performance and smaller key-sizes for a
given target security level. See Section 18.1.13 for further discussion.
We will discuss another class of PKE schemes, based on the Discrete Logarithm Problem
(DLP) after discussing Diffie-Hellman Key Exchange.

18.1.8 Diffie-Hellman Key Exchange


Diffie-Hellman Key Exchange (DHKE) is a fundamental tool in cryptography and was introduced
by Diffie and Hellman in their seminal paper on Public Key Cryptography [1647]. DHKE allows
two parties to set up a shared key based only on a public exchange of messages. First the
two parties (let us follow convention and call them Alice and Bob) agree on some common
parameters: a group G of prime order q and a generator g of that group. Typically, these are
agreed during the exchange of messages or are pre-agreed; ideally, they are standardised so
that well-vetted parameters of well understood cryptographic strength are used. Alice then
chooses a value x uniformly at random from {0, 1, . . . , q − 1}, computes g^x using the group
operation in G and sends g^x to Bob. Similarly Bob chooses a value y uniformly at random from
{0, 1, . . . , q − 1}, computes g^y and sends it to Alice. Now Bob can compute (g^x)^y = g^xy while
Alice can compute (g^y)^x = g^yx = g^xy. Thus the value g^xy is a common value that both Alice
and Bob can compute.
An adversary Eve who eavesdrops on the communications between Alice and Bob can see g^x
and g^y and can be assumed to know the public parameters g, q and a description of the group
G. This adversary is then faced with the problem of computing g^xy from the triple (g, g^x, g^y).
Informally, this is known as the Computational Diffie-Hellman Problem (CDHP). One way for
Eve to proceed is to try to compute x from g^x and then follow Alice’s computation. Computing
x from g and g^x is known as the Discrete Logarithm Problem (DLP). The CDHP is not harder
than the DLP (since an algorithm to solve the latter can be used to solve the former) but the
exact relationship is not known.
The traditional setting for DHKE is to select large primes p and q such that q divides p − 1 and
then take g to be an element of multiplicative order q modulo p; such a g generates a group of


order q. By choosing q and p of an appropriate size, we can make the DLP, and presumably the
CDHP, hard enough to attain a desired security level (see further discussion in Section 18.1.13).
An alternative that has largely now displaced this traditional “finite field Diffie-Hellman” setting
in applications is to use as G the group of points on an elliptic curve over a finite field. This
allows more efficient implementation and smaller key sizes at high security levels, but comes
with additional implementation pitfalls.
The raw DHKE protocol as described directly above is rarely used in practice, because it
is vulnerable to active Man-in-the-Middle (MitM) attacks, in which the adversary replaces
the values g x and g y exchanged in the protocol with values for which it knows the discrete
logarithms. However, the core idea of doing “multiplication in the exponent” is used repeatedly
in cryptographic protocols. MitM attacks are generally prevented by adding some form of
authentication (via MACs or digital signatures) to the protocol. This leads to the notion of
Authenticated Key Exchange (AKE) protocols — see [1648] for a comprehensive treatment.
It is important also for Alice and Bob to use trusted parameters, or to verify the cryptographic
strength of the parameters that they receive, in DHKE. This can be a complex undertaking
even in the traditional setting, since a robust primality test is needed [1649, 1650]. In the
elliptic curve setting, it is not reasonable to expect the verification to be done by protocol
participants and we use one of a small number of standardised curves. It is also important
for the respective parties to check that the received values g^y and g^x do lie in the expected
group, otherwise the protocol may be subject to attacks such as the small sub-group attack.
These checks may be computationally costly.

18.1.8.1 From Diffie-Hellman to ElGamal

It is easy to build a KEM from the DHKE primitive. We simply set the KeyGen algorithm to
output a key pair (sk, pk) = (y, g^y) (where y is generated as in DHKE), while Encrypt selects x
as in DHKE and then simply outputs the group element g^x as the ciphertext. Finally, Decrypt
takes as input a group element h and outputs KDF(h^y) where KDF(·) denotes a suitable key
derivation function (as covered in Section 18.3.2). So the symmetric key encapsulated by
group element g^x is KDF(g^xy).
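A direct transcription of this KEM into code might look as follows (toy group parameters, with SHA-256 standing in for the KDF; both are illustrative assumptions, not the standardised choices):

```python
import hashlib
import secrets

# Toy group parameters (illustration only).
p = (1 << 127) - 1
g = 3

def kdf(group_elem):               # SHA-256 standing in for a real KDF
    return hashlib.sha256(str(group_elem).encode()).digest()

def keygen():
    y = secrets.randbelow(p - 2) + 1
    return y, pow(g, y, p)         # (sk, pk) = (y, g^y)

def encapsulate(pk):
    x = secrets.randbelow(p - 2) + 1
    return kdf(pow(pk, x, p)), pow(g, x, p)   # key KDF(g^xy), ciphertext g^x

def decapsulate(sk, h):
    return kdf(pow(h, sk, p))      # KDF(h^y) = KDF(g^xy)
```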
From this KEM, using the standard KEM-DEM construction, we obtain a variant of the ElGamal
encryption scheme [1651] called the Diffie-Hellman Integrated Encryption Scheme (DHIES)
and analysed in [1652]. In the elliptic curve setting, the scheme is known as ECIES. It is a
particularly neat PKE scheme with compact ciphertexts and strong security properties. It
avoids many of the implementation issues associated with standardised variants of RSA.

18.1.9 Digital Signatures


Digital signature schemes are used to provide authentication, integrity and non-repudiation
services. They are the asymmetric analogue of MACs. A digital signature scheme consists
of three algorithms: KeyGen, Sign and Verify. The first, KeyGen, is responsible for key
generation. It is randomised and outputs key pairs (sk, vk) where sk denotes a private key
(often called the secret key or signing key) and vk denotes a verification key. The second
algorithm Sign takes as input sk and a message m encoded as a bit-string and produces a
signature σ, usually a short string of some fixed length t. The third algorithm is called Verify.
This algorithm is used to verify the validity of a message/signature combination (m, σ) under
the key vk. So Verify takes as input triples (vk, m, σ) and produces a binary output, with “1”


indicating validity of the input triple.


The main security property required of a digital signature scheme is that it should be hard
for an adversary to come up with a new pair (m, σ) which Verify accepts with key vk, even
when the adversary has already seen many pairs (m1 , σ1 ), (m2 , σ2 ), . . . produced by Sign with
the matching signing key sk for messages of its choice m1 , m2 , . . .. As for MACs, the formal
security notion is called Strong Unforgeability under Chosen Message Attack (SUF-CMA for
short).
SUF-CMA digital signature schemes can be used to provide data origin authentication and
data integrity services as per MACs. In addition, they can offer non-repudiation: if Alice is
known to be associated with a key pair (sk, vk) and provided sk has not been compromised,
then Alice cannot deny having created a valid message/signature pair (m, σ) which Verify
accepts. In practice, making this non-repudiation property meaningfully binding (e.g. for it
to have legal force) is difficult. See further discussion in Law & Regulation Knowledge Area
(Section 3.10.4).
A common pitfall is to assume that a signature σ must bind a message m and a verification
key vk; that is, if Verify accepts on input (vk, m, σ), then this implies that σ must have been
produced using the corresponding signing key sk on message m. In fact, the SUF-CMA security
definition does not imply this, as it only refers to security under a single key pair (sk, vk) but
does not rule out the possibility that, given as input a valid triple (vk, m, σ), an adversary can
concoct an alternative key pair (sk′, vk′) such that Verify also accepts on input (vk′, m, σ).
This gap leads to Duplicate Signature Key Selection (DSKS) attacks, see [1653] for a detailed
treatment. Such attacks can lead to serious vulnerabilities when using digital signatures in
more complex protocols.
Signature schemes can be built from the same kinds of mathematics as PKE schemes.
For example, the textbook RSA signature scheme signs a message m by computing σ =
H(m)^d mod N where H is a cryptographic hash function. Here d is the signing key and the
verification key is a pair (e, N = pq) such that de = 1 mod (p − 1)(q − 1). Then verification of a
purported signature σ on a message m involves checking the equality σ^e = H(m) mod N. The
similarity to RSA encryption, wherein signing uses the same operation of “raising to the power
d mod N ” as does decryption in textbook RSA, is not coincidental. It is also a source of much
confusion, since in the case of general signature schemes, signing is not related to any PKE
decryption operation. A variation of textbook RSA in which H(m) is replaced by a randomised
encoding of the hashed message according to the PKCS#1 version 1.5 encoding scheme for
signatures [1643] is widely used in practice but has significant security shortcomings and
lacks a formal security proof. The RSA-PSS encoding scheme, another randomised variant, is
also specified in [1643]; it does permit a formal security proof of security [1654].
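The textbook hash-then-sign construction described above can be sketched with toy parameters. Note two illustrative artifacts: the primes are tiny, and the hash is reduced mod N only because the toy modulus is smaller than the 256-bit hash output; a deployed scheme would instead use a proper (ideally randomised) encoding such as RSA-PSS.

```python
import hashlib

# Deliberately tiny toy parameters (real moduli are 2048+ bits).
p, q = 61, 53
N, e = p * q, 17                     # (e, N) is the verification key
d = pow(e, -1, (p - 1) * (q - 1))    # d is the signing key

def H(m):
    # Hash the message to an integer; the reduction mod N is a toy-size artifact.
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % N

def sign(m):
    return pow(H(m), d, N)           # sigma = H(m)^d mod N

def verify(m, sigma):
    return pow(sigma, e, N) == H(m)  # check sigma^e = H(m) mod N
```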
The DSA scheme [1655] works in the finite field Diffie-Hellman setting, while ECDSA translates
DSA to the elliptic curve setting. Notably, despite their standardised form and widespread
use, neither DSA nor ECDSA enjoys a fully satisfactory UF-CMA security proof. The EdDSA
signature scheme [1656] is a variant of ECDSA that arguably enjoys better implementation
security than ECDSA. In particular, ECDSA has a randomised signing algorithm and its security
fails spectacularly (allowing recovery of the signing key) if the same random value is used to
sign two different messages; ECDSA is also vulnerable to attacks exploiting partial leakage
of the random values used during signing. By contrast, EdDSA is a deterministic scheme
and avoids the worst of these failure modes. EdDSA, being more closely based on Schnorr
signatures than ECDSA, also enjoys security proofs based on assumptions that are milder
than those needed for proving the security of ECDSA.


Interestingly, signature schemes can be built from symmetric primitives, specifically hash
functions. The original idea goes back to Lamport [1657]: to be able to sign a single bit message,
commit in the verification key to two values h0 = H(M0 ) and h1 = H(M1 ), where M0 encodes
a zero-bit and M1 encodes a one-bit. So the verification key is (h0 , h1 ) and the signing key is
(M0 , M1 ). Now to sign a single bit b, the signer simply outputs Mb ; the verification algorithm
checks the relation hb = H(Mb ), outputting “1” if this holds. The original Lamport scheme
is one-time use only and only signs a single-bit message. But many enhancements have
been made to it over the years bringing it to a practically usable state. A specific hash-based
signature scheme SPHINCS+ is an “alternate candidate” in the NIST PQC process for selecting
post-quantum secure schemes, see Section 18.1.16 for further discussion.
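Lamport's original one-bit, one-time scheme is simple enough to sketch in a few lines (SHA-256 standing in for the hash function H):

```python
import hashlib
import secrets

def H(x):
    return hashlib.sha256(x).digest()

def keygen():
    M0, M1 = secrets.token_bytes(32), secrets.token_bytes(32)
    sk = (M0, M1)            # signing key: the two preimages
    vk = (H(M0), H(M1))      # verification key: (h0, h1)
    return sk, vk

def sign(sk, b):
    return sk[b]             # sign the single bit b by revealing M_b

def verify(vk, b, sig):
    return vk[b] == H(sig)   # check the relation h_b = H(M_b)
```

Revealing M_b consumes the key: once both preimages have been shown, anyone could forge, which is why the basic scheme is strictly one-time.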
Many forms of signature scheme with advanced security properties have been researched
and sometimes find their way into use, especially in privacy-oriented applications. For ex-
ample, blind signatures [1658] allow a party to obtain signatures on messages without the
signer knowing which message is being signed. Blind signature schemes can be used as a
building block in electronic cash and electronic voting schemes. As a second example, group
signatures allow one of many parties to sign messages in such a way that the verifier cannot
tell exactly which party produced the signature; meanwhile a group manager can “break” this
anonymity property. Group signatures have been used in the Trusted Computing Group’s Direct
Anonymous Attestation protocol [1396] to enable remote authentication of Trusted Platform
Modules whilst preserving privacy. A third example is provided by ring signatures [1659], which
have functionality similar to group signatures but without the opening capability possessed by
a group manager. The cryptocurrency Monero has used ring signatures to provide anonymity.

18.1.10 Cryptographic Diversity


In the preceding subsections we have focused on the main algorithms and schemes that
will be encountered in today’s deployed cryptographic systems. However, a brief glance at
the literature will reveal that this is just the tip of the iceberg in terms of what cryptographic
ideas have been researched. But the vast majority of these ideas are not standardised, or
do not have readily available implementations of production quality, or are not fully fleshed
out in a way that a software engineer could easily implement them. There is a long road
from cryptography as presented in most research papers to cryptography as it needs to be
packaged for easy deployment.

18.1.11 The Adversary


So far, we have referred to the adversary as an abstract entity. In practice there are adversaries
with different motivations, levels of skill and available computational resources. In applied
cryptography we generally want to design cryptographic components that are usable across
a wide variety of applications, so we should be conservative in how we model the adversary,
assuming it is capable of marshalling significant resources.
At the same time, for the most part, we cannot achieve unconditional security in practical
systems, that is security against adversaries with unbounded computational power. This is
because such systems consume large amounts of key material that cannot be established
using public key methods. So we have to place some limits on the adversary’s computational
capabilities. A typical objective is to make the adversary expend work comparable to an
exhaustive key search for a block cipher like AES with 128-bit keys, in which case we speak of


a “128-bit security level”.8 Such a computational feat seems still well beyond the horizon of even
state security agencies. It’s naturally hard to estimate their capabilities but, for comparison,
global Bitcoin mining currently carries out around 2^67 hash computations per second, with
the electricity consumption of a small sovereign state. At that
rate, and assuming the cost of computing a hash is the same as that of testing an AES key,
an exhaustive key search would still require 2^36, or about 10^11, years.9 If we are even more
conservative, or want a very large security margin over a long period of time (during which
large-scale quantum computers may become available), or are concerned about concrete
security in the context of multi-target attacks, then aiming for 192-bit or 256-bit security may
be attractive.
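The back-of-envelope estimate above is easy to check directly:

```python
# Rough check of the exhaustive-search estimate: 2^128 keys at ~2^67 guesses
# per second (treating one Bitcoin hash as roughly one AES key test).
keys = 2**128
rate = 2**67                         # guesses per second
seconds = keys // rate               # = 2^61 seconds
years = seconds / (60 * 60 * 24 * 365)
print(f"about {years:.1e} years")    # on the order of 10^11 years
```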
We should also be conservative in rejecting algorithms and schemes that have known weak-
nesses, even if seemingly minor. It is a truism that attacks in cryptography only get stronger
with time, either due to computational advances or the introduction of new cryptanalytic
ideas. This conservatism is in tension with the fact that replacing one cryptographic scheme
with another can be costly and time-consuming, unless cryptographic agility is built into our
system (see Section 18.1.14 for further discussion). We are often encumbered with legacy
cryptographic systems that cannot be easily updated, or where the system owners do not see
the value in doing so until a practical break is exhibited.

18.1.12 The Role of Formal Security Definitions and Proofs


We have described in the preceding subsections, in an informal manner, syntax and security
definitions for the main cryptographic schemes. These informal definitions are backed by
fully formal ones, see for example [1619, 1660]. The value of such definitions is manifold.
Syntax and correctness definitions enable one to be precise about what behaviour to expect
from a cryptographic scheme and to build schemes out of simpler components in a precise
way. Security definitions allow different schemes to be compared and to check whether
application requirements will be met. Strong security definitions can rule out many classes
of practical attack. A security proof — showing that a given scheme satisfies a given formal
security definition under certain assumptions — then gives assurance as to the soundness of
a scheme’s design and makes it clear what assumptions security rests on.
Indeed, formal security analysis in Applied Cryptography has reached a maturity level where
there should be no need to ever deploy a cryptographic scheme or protocol that does not
come with a clear syntax and have a rigorous security proof under clearly stated assumptions.
Unfortunately, this rule is still often ignored in practice. This does leave opportunities for
post hoc analysis by researchers, either to find flaws or to provide positive security results. In
an ideal world, one would first gather all the requirements for any new scheme or protocol,
then design the system and simultaneously develop security proofs for it. In reality, one is
often trapped in a design-release-break-fix cycle. An intermediate approach of design-break-
fix-release was used for TLS 1.3. Further discussion comparing these different models of
cryptographic development can be found in [1661].
8 It is difficult to be precise about concrete security levels. Issues arise because of different cost models for
computation, e.g. counting the unit of cost as being an AES operation versus a single CPU instruction; on some
platforms, an AES round operation can be performed via a single CPU instruction! One also needs to account
for the use of CPUs, GPUs and special purpose hardware for cryptanalysis, such as might be within the reach
of a well-funded security agency.
9 It is worth noting that the sun is expected to consume the earth in a supernova in about 10^10 years, assuming
it is not swallowed by a black hole first.


Notice also that such proofs are not unconditional, unlike most proofs in mathematics. A
typical proof shows that a given scheme or protocol satisfies a particular security definition
under some assumptions. Such proofs are often stated in a reductive fashion (i.e. as with
reductions from complexity theory): given any adversary in the form of an arbitrary algorithm
that can break the scheme according to the security definition, the proof shows that the
adversary A can be used as a subroutine in building an algorithm B that can break one of the
components of the scheme (e.g. find a collision in a hash function) or in building a different
algorithm C that can break some underlying hardness assumption (e.g. solve the IFP for
moduli n with distribution given by the KeyGen algorithm of a PKE scheme). For Applied
Cryptography, concrete reductions are to be preferred. In our example, these are ones in
which we eschew statements describing B or C as simply being “polynomial time” but
in which the resources (computation, storage, etc) consumed by the adversary A (and its
advantage in breaking the scheme) are carefully related to those of algorithms B and C.
Furthermore, it is preferable that proofs should be tight. That is, we would like to have proofs
showing a close relationship between the resources consumed by and advantage of adversary
A on the one hand, and the resources consumed by and advantages of algorithms B and
C constructed from A on the other. The result of having a tight proof is that the scheme’s
security can be meaningfully related to that of its underlying components. This is not always
achieved, resulting in proofs that may be technically vacuous.
For complex cryptographic schemes and protocols, the security statements can end up being
difficult to interpret, as they may involve many terms and each term may relate to a different
component in a non-trivial way. Such security statements typically arise from proofs with
many hand-written steps that can hide errors or be difficult for humans to verify. Typically
though, such proofs are modularised in a sequence of steps that can be individually checked
and updated if found to be in error. A popular approach called “game hopping” or “sequences
of games”, as formalised in [1662, 1663] in two slightly different ways, lends itself to the
generation of such proofs. An alternative approach to taming the complexity of proofs comes
from the use of formal and automated analysis methods, see Formal Methods for Security
Knowledge Area (Chapter 13) for an extensive treatment.
The proofs are usually for mathematically tractable pseudo-code descriptions of the schemes,
not for the schemes as implemented in some high-level programming language and certainly
not for schemes as implemented in a machine language. So there is a significant gap in terms
of what artefacts the proofs actually make statements about. Researchers have had some
success in developing tools that can prove the security of running code and some code of this
type has been deployed in practice; for a good overview, see [1664]. Furthermore, a security
proof only gives guarantees concerning the success of attacks that lie within the scope of the
model and says nothing about what happens beyond that. For example, an adversary operating
in the real world may have greater capabilities than are provided to it in the security model
used for proving security. We shall return to these issues in the next section on cryptographic
implementation.
A sustained critique of the provable security approach has been mounted by Koblitz and
Menezes in their “Another look at . . .” series of papers, see [1665] for a retrospective. This
critique has not always been welcomed by the theoretical cryptography research community,
but any serious field should be able to sustain, reflect on and adapt to such critique. In our view,
the work of Koblitz and Menezes has helped to bridge the gap between theory and practice
in cryptography, since it has helped the community to understand and begin to address the
limitations of its formal foundations.


18.1.13 Key Sizes


We have discussed why aiming for the 128-bit security level makes sense — it provides
a sufficient margin of security in almost every conceivable circumstance, at least within
the realm of conventional computing. Another reason is that except in specific application
domains such as constrained environments, it is efficiently achievable. In other words, there
is no good reason not to aim this high.
Algorithms and their keys do have finite lifetimes. Advances in cryptanalysis may render once
secure algorithms or key sizes insecure. Moreover, the longer an individual key is in use, the
more likely it is to become compromised. These issues for individual keys are discussed in
more detail in Section 18.3. Changing algorithms and key sizes can be inconvenient and costly.
This provides arguments in favour of making conservative choices in the first place.
The target security level of 128 bits brings efficiency considerations into play, especially for
asymmetric algorithms. For example, it is estimated that forcing a direct attack on the IFP to
break RSA to cost 2^128 effort would require the use of a 3072-bit modulus [1666, Table 2].10 This is
because of the sub-exponential complexity of the best known algorithm for solving the IFP (the
Number Field Sieve). Such a modulus size is large enough to significantly slow down the basic
RSA operations (in the general case, the complexity of modular exponentiation grows with the
cube of the modulus bit length). The same is true for finite-field DLP-based cryptosystems
(e.g. Diffie-Hellman and ElGamal). By contrast, because only square-root speed-ups exist for
the ECDLP, we can escape with much smaller parameters when targeting 128-bit security for
ECC: a curve over a 256-bit prime field suffices. So, for the standard 128-bit security level,
ECC-based schemes become more efficient than IFP-based or finite field DLP-based ones.
The contrast is even more stark if one targets a 256-bit security level: at that level, 15360-bit RSA
public keys are recommended by NIST [1666, Table 2], while the required size for ECC only
moves up to 512 bits.
These considerations explain the recent rise in popularity of ECC and may lead to the slow
death of RSA-based cryptosystems. The US National Security Agency (NSA) has recommended
organisations who have not already done so to not make a significant expenditure to transition
from RSA to ECC, but to wait for post-quantum algorithms (i.e. algorithms that aim to be
secure against large-scale quantum computers) that should result from the on-going NIST
standardisation effort.11

18.1.14 Cryptographic Agility


Occasionally it is necessary in some system or protocol to exchange one algorithm for
another in the same class. One reason might be that the original algorithm is broken. The
history of hash functions provides notable examples, with once secure hash functions like
MD5 now being considered trivially insecure. Another reason might be that a more efficient
alternative becomes available — consider the switch from RSA-based algorithms to ECC-
based ones discussed in Section 18.1.13. A third reason is the introduction of a technology
shift — for example, a precautionary shift to post-quantum algorithms as a hedge against the
development of large-scale quantum computers.
This exchange is made easier if the system or protocol is cryptographically agile — that is,
10 Other estimates are available, see summary at https://www.keylength.com/en/, but all estimates are in the
same ballpark.
11 See https://apps.nsa.gov/iad/programs/iad-initiatives/cnsa-suite.cfm.


if it has an in-built capability to switch one algorithm for another and/or from one version
to another. This facility is enabled in secure communications protocols like IPsec, SSH and
SSL/TLS through cipher suite and version negotiation: the algorithms that will be used and the
protocol version are negotiated between the participating parties during the protocol execution
itself. Adding this facility to an already complex protocol may introduce security vulnerabilities,
since the negotiation mechanisms themselves may become a point of weakness. An example
of this is downgrade attacks in the context of SSL/TLS, which have exploited the co-existence
of different protocol versions [1667] as well as support for deliberately weakened “EXPORT”
cipher suites [438, 1668]. Cryptographic agility may also induce software bloat as there is an
incipient temptation to add everyone’s favourite algorithm.
At the opposite end of the spectrum from cryptographic agility lie systems (and their
designers) that are cryptographically opinionated, that is, where a single set of algorithms is
selected and hard-coded. WireGuard [1669] is an example of such a protocol: it has no facility
to change algorithms and not even a protocol version field.
There is a middle way: support cryptographic agility where possible, but with tight control over
which algorithms and legacy versions are supported.
For more information on cryptographic agility, especially in the post-quantum setting, we
recommend [1670].

18.1.15 Development of Standardised Cryptography


Standardisation plays an important role in cryptography. Standards provide a suite of carefully
vetted primitives, higher-level protocols and cryptographic best practices that can be used by
non-expert developers. They can also help to ensure interoperability, not only by providing
complete specifications of algorithms but also by helping to identify a smaller set of algorithms
on which implementers can focus.
There are multiple standardisation bodies producing cryptographic standards. Most prominent
are ISO/IEC, the US National Institute for Standards and Technology (NIST) and the Internet
Engineering Task Force (IETF). ISO/IEC and NIST cryptographic standards tend to focus
(though not exclusively) on lower-level primitives, while IETF works more at the protocol level.
The three bodies work in quite different ways.
ISO/IEC uses a closed model, where country representatives come together to produce
standards through a multi-stage drafting and voting process. ISO/IEC working groups can
and do invite external experts to attend their meetings and provide input.
NIST employees directly write some of their standards, with open calls for comment from
the wider community. NIST also runs cryptographic competitions in which external teams of
researchers compete to meet a design specification. The AES and SHA-3 algorithms were
produced in this way. Although NIST is a US federal government body, its standards tend to
become de facto international standards. Its competitions are also entered by teams from all
around the world and the winners are frequently not from the US.
The IETF model is completely open. Anyone can join an IETF mailing list and join a technical
discussion or, given financial resources, attend IETF meetings where its standards are devel-
oped. For a document to become an IETF standard (technically, a “Request for Comments” or
RFC), the key requirement is “rough consensus and running code”. Multiple levels of review
and consensus-building are involved before a draft document becomes an RFC. The Internet


Research Task Force (IRTF) is a sister-organisation to the IETF and its Crypto Forum Research
Group (CFRG)12 acts as a repository of expertise on which the IETF can draw. CFRG also
produces its own RFCs.
Standards bodies are not perfect. Too many bodies — and standards produced by them — can
lead to cryptographic proliferation, which makes inter-operation harder to achieve. They can
also lead to subtle incompatibilities between different versions of the same algorithms. Even
completely open standards bodies may fail to gather input from the right set of stakeholders.
Standards bodies can act prematurely and standardise a version of a scheme that is later
found to be deficient in some way, or where improved options only emerge later. Once the
standard is set, in the absence of serious attacks, there may be little incentive to change it.
The history of PKE schemes based on RSA illustrates this well (see Section 18.1.7.3): RSA with
PKCS#1 v1.5 encoding has led to many security issues and the introduction of attack-specific
work-arounds; RSA with PKCS#1 v2.1 encoding (RSA-OAEP) has been standardised for many
years but has not become widely used; meanwhile even better ways of turning RSA into a
KEM or a PKE have been discovered but have not become mainstream.
Standards bodies are also subject to “regulatory capture”, whereby groups representing
specific national or commercial interests have the potential to influence the work of a standards
body. For example, NSA had a role in the design of the DES algorithm [1671, pp. 232-233], and, on
another occasion, supplied the overall design of the Dual_EC_DBRG pseudorandom generator
that was specified in a NIST standard [1672], along with certain critical parameters [1673, p.
17]. In such contexts, transparency as to the role of any national or commercial stakeholders
is key. For instance, NIST have reviewed their cryptographic standardisation process [1673] to
increase transparency and decrease reliance on external organisations.
Other standards bodies relevant for cryptography include ETSI (which is active in post-quantum
cryptographic standardisation, as discussed immediately below) and IEEE (which developed
an early series of standards for Public Key Cryptography, IEEE P1363).

18.1.16 Post-quantum Cryptography


Large-scale quantum computers, if they could be built, would severely threaten RSA-based
and discrete-log-based cryptographic algorithms in both finite field and elliptic curve settings.
This includes almost all of the Public Key Cryptography in use today. This is because of Shor’s
algorithm [1674], a quantum algorithm that leads to polynomial-time algorithms for both the IFP
and the DLP in both the finite field and elliptic curve settings. This stands in strong contrast to
the situation with the best known classical, non-quantum, algorithms for these problems,
which are super-polynomial, but sub-exponential for the IFP and the DLP in finite fields (and fully
exponential for the DLP in the elliptic curve setting). Quantum computers also have some
consequences for symmetric algorithms due to Grover’s algorithm [1675], which in theory
provides a quadratic speed-up for exhaustive key search. However, the impact there is less
substantial — as a rule of thumb, doubling the key size is sufficient to thwart quantum attacks.
Post-quantum cryptography (PQC) refers to the study of cryptographic algorithms and
schemes that are plausibly secure against large-scale quantum algorithms. Such algorithms
and schemes are still classical: they are designed to run on classical computers. We use the
term “plausibly” here, since the field of quantum algorithms is still young and it is hard to
anticipate future developments. Note that PQC is sometimes referred to as quantum-safe,
12 See https://irtf.org/cfrg.


quantum resistant, or quantum-immune cryptography. PQC has been an active but niche
research field for many years.
In late 2016, in response to projected progress in scaling quantum computing and recognising
the long transition times needed for introducing new cryptographic schemes, NIST launched
a process to define a suite of post-quantum schemes.13 The focus of the NIST process is
on KEMs and digital signature schemes, since the threat quantum computing poses for
symmetric schemes is relatively weaker than it is for public key schemes. At the time of
writing in mid 2021, the process (actually a competition) has entered its third round and a set
of finalist schemes has been selected, alongside a set of alternate, or back-up schemes. The
NIST process should result in new NIST standards in the mid 2020s.
It will be a significant challenge to integrate the new schemes into widely-used protocols and
systems in a standardised way. This is because the NIST finalists have quite different (and
usually worse) performance profiles, in terms of key sizes, ciphertext or signature size and
computation requirements, from existing public key schemes. Work is underway to address
this challenge in IETF and ETSI and some deployment experiments have been carried out for
the TLS protocol by Google and CloudFlare.14 It is likely that post-quantum schemes will initially
be deployed in hybrid modes alongside classical public key algorithms, to mitigate against
immaturity of implementation and security analysis. The recent deployment experiments used
hybrid modes.
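The idea behind such hybrid modes can be sketched as a key combiner: both the classical and the post-quantum shared secrets feed a single derivation step, so the session key remains safe as long as either component does. The construction below is purely illustrative (real protocols use a vetted KDF such as HKDF with carefully specified labels; the label string here is an invented placeholder):

```python
import hashlib
import hmac

def hybrid_key(classical_ss: bytes, pq_ss: bytes, context: bytes) -> bytes:
    # Illustrative combiner: an attacker must recover BOTH shared secrets
    # to compute the session key, mitigating immaturity of either scheme.
    return hmac.new(classical_ss + pq_ss, b"hybrid-combiner|" + context,
                    hashlib.sha256).digest()
```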

18.1.17 Quantum Key Distribution


PQC should be distinguished from quantum cryptography, which attempts to harness quan-
tum effects to build secure cryptographic schemes. The most mature branch of quantum
cryptography is Quantum Key Distribution (QKD). A QKD protocol typically uses polarised
photons to transmit information from one party to another, in such a way that an eavesdropper
trying to intercept the signals will disturb them in a detectable way. The two parties can engage
in a resolution protocol over a classical, authenticated channel that leads to them agreeing
on keying material about which even a computationally unbounded adversary has minimal
information.
QKD is often marketed as offering unconditional security assuming only the correctness of
the known laws of physics. In terms of commercial deployment, QKD faces severe challenges.
The main one is that it does not solve a problem that we cannot solve satisfactorily using
other means. Moreover, those means are already commoditised. Subsidiary issues are: the
need for expensive special-purpose hardware, the need for an authentic channel (if we have
such a channel, then we could use it to distribute public keys instead), limitations on range
that stand in opposition to standard end-to-end security requirements, limitations on the rate
at which secure keys can be established (leading to hybrid QKD/classical systems, thereby
obviating any unconditional security guarantees), and the fact that theoretical guarantees of
unconditional security are hard to translate into practice.
13 See https://csrc.nist.gov/projects/post-quantum-cryptography.
14 See https://blog.cloudflare.com/the-tls-post-quantum-experiment.


18.1.18 From Schemes to Protocols


In this section, we have focused on low-level cryptographic algorithms and schemes. In real
systems, these are combined to build more complex systems. Often, the result is a collection
of interactive algorithms, commonly called a cryptographic protocol. An overall cryptographic
system may use a number of sub-systems and protocols. We will look briefly at a few specific
systems in Section 18.5. Here, we restrict ourselves to general comments about such systems.
First, to reiterate from Section 18.1.12, even complex cryptographic protocols and systems,
and their security properties, are amenable to rigorous definitions and analysis. The approach
is to analyse the security of such systems in terms of simpler, easier to analyse security
properties of their components. We have seen a simple example of this in our treatment of
AE in Section 18.1.6, where a generic composition approach allows the AE(AD) security of the
EtM composition to be established based on the IND-CPA security of its “E” component and
the SUF-CMA security of its “M” component.
This process could be carried to the next level. Consider, for example, building a unidirectional
secure channel protocol, assuming a suitable symmetric key is already in place at the sender
and receiver. First we need a definition of what functionality and security such a protocol
should provide. For example, we could demand integrity and confidentiality of plaintexts sent
and that an adversary that tries to reorder, drop, or replay ciphertexts can be detected. Suitable
formal definitions capturing these requirements can be found in [1676].
Then we can try to realise the unidirectional secure channel protocol from a nonce-based
AEAD scheme and prove that its security follows from (or can be reduced to) the standard
security definition for AEAD security. Here, a candidate construction is to make the sender
stateful, by having it maintain a counter for the number of plaintexts encrypted. That counter
is encoded as the nonce for the AEAD scheme. The receiver maintains an independent copy
of the counter, using it as the nonce when performing decryption. Intuitively, confidentiality
and integrity for individual ciphertexts in the secure channel follows immediately from the
AEAD security definition. Meanwhile, an adversary tampering with the order of ciphertexts will
lead to the receiver using the wrong counter value when decrypting, which leads to an error by
the integrity properties of the AEAD scheme. These ideas can and should be formalised.
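The stateful construction just described can be sketched as follows. To keep the example runnable with only the standard library, a toy AEAD built from HMAC stands in for a real scheme such as AES-GCM; it is for illustration only and must not be used as actual encryption:

```python
import hashlib
import hmac

# Toy AEAD stand-in (NOT secure encryption): an HMAC-derived keystream
# XORed with the plaintext, plus an HMAC tag over nonce and ciphertext.
def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    out, block = b"", 0
    while len(out) < n:
        out += hmac.new(key, nonce + block.to_bytes(4, "big"), hashlib.sha256).digest()
        block += 1
    return out[:n]

def aead_seal(key: bytes, nonce: bytes, pt: bytes) -> bytes:
    ct = bytes(a ^ b for a, b in zip(pt, _keystream(key, b"enc" + nonce, len(pt))))
    tag = hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest()
    return ct + tag

def aead_open(key: bytes, nonce: bytes, ct_tag: bytes) -> bytes:
    ct, tag = ct_tag[:-32], ct_tag[-32:]
    if not hmac.compare_digest(tag, hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest()):
        raise ValueError("decryption failed")
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, b"enc" + nonce, len(ct))))

class Sender:
    def __init__(self, key: bytes):
        self.key, self.ctr = key, 0
    def send(self, pt: bytes) -> bytes:
        nonce = self.ctr.to_bytes(12, "big")  # counter encoded as the nonce
        self.ctr += 1
        return aead_seal(self.key, nonce, pt)

class Receiver:
    def __init__(self, key: bytes):
        self.key, self.ctr = key, 0
    def recv(self, ct: bytes) -> bytes:
        nonce = self.ctr.to_bytes(12, "big")  # independent copy of the counter
        pt = aead_open(self.key, nonce, ct)   # raises on reorder, drop or replay
        self.ctr += 1
        return pt
```

Delivering ciphertexts out of order makes the receiver derive the wrong nonce, so the tag check fails and the tampering is detected, exactly as argued above.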
We can go even further and build a bidirectional secure channel protocol from a unidirectional
one. Here, additional considerations arise from the possibility of reflection attacks and whether
the channel should preserve the joint ordering of ciphertexts in both directions. We can also
consider secure channel protocols in which the protocol recovers from accidental packet
losses or reordering arising from the underlying network transport, or where such errors are
fatal and lead to termination of the protocol. We can try to remove the assumption of having
pre-established symmetric keys by bringing a key exchange component into play as a separate
or integrated sub-protocol.
Again, all these aspects can be formally defined and analysed. However, the challenges in
dealing with the complexity inherent in such systems should already be apparent, especially
when the models and proofs are all hand-generated. For this reason, it is common to make
some kind of “cryptographic core” of a system the focus of analysis and to abstract away many
details. For example, a typical analysis will completely ignore all key management aspects,
including PKI (which we discuss in Section 18.3.8). Instead, it is simply assumed that all keys
are authentic and where they need to be, as we did in the example above. However, these
details are relevant to the overall security of the system. So too much abstraction brings the
risk of missing important facets or making assumptions that are not warranted in practice.


It is then important to be clear about what is — and is not — being formally specified and
analysed.
An alternative approach to taming complexity is to use mechanised tools, letting a computer
do the heavy lifting. However, the currently available tools are quite difficult to use and require
human intervention and, more often than not, input from the tool designer. One of the more
successful approaches here is to use a symbolic model of the cryptographic primitives
rather than a computational one as we have been considering so far. This provides a level of
abstraction that enables more complex protocols to be considered, but which misses some
of the subtleties of the computational approach. Symbolic approaches to formal analysis are
covered in more detail in Formal Methods for Security Knowledge Area (Chapter 13).

18.2 CRYPTOGRAPHIC IMPLEMENTATION


[405, 1453, 1637, 1677]
So far, we have focused on describing cryptographic algorithms and schemes in terms of
abstract algorithms or in mathematical terms. In research papers, new schemes are usually
presented as pseudo-code. Of course, this all needs to be translated into actual code (or
hardware) for practical use. In this section, we briefly discuss some of the considerations that
arise in this process.

18.2.1 Cryptographic Libraries


From the perspective of the developer, cryptography is usually consumed via a cryptographic
library, that is, a collection of algorithms and schemes accessed through an API. Many different
cryptographic libraries are available, in many different programming languages, though ‘C’
and Java libraries are most common. Different libraries have different licence restrictions,
though many are available under non-restrictive “open source” licences of various kinds.
Some libraries (e.g. OpenSSL15 or BouncyCastle16) are richly featured, supporting many differ-
ent cryptographic schemes and processes and can be used across a wide range of applications.
Others are much more restrictive and designed to support only certain use cases. In part, this
reflects the taste of the libraries’ authors, but also age and available development resources.
Some libraries are better maintained than others. For example, prior to the Heartbleed vul-
nerability (discussed in Section 18.3.5), OpenSSL had fallen into a state of some ossification
and disrepair. Consequently, Google and OpenBSD separately decided to “fork” OpenSSL, that
is to create entirely separate development branches of the library, resulting in the BoringSSL
and LibreSSL libraries. Heartbleed also resulted in a broader realisation of how important
OpenSSL was to the whole Internet ecosystem. A sequence of reforms of the OpenSSL project
followed and today the project is in much better shape, with a larger team of core developers,
better funding and more active development.
Some cryptographic libraries are developed by professional software engineers with consid-
erable experience in avoiding some of the pitfalls we discuss in this section and elsewhere.
Others are not. In many cases, the developers are working on a volunteer basis; most of
15 https://www.openssl.org/
16 https://www.bouncycastle.org/


OpenSSL’s code development is done in this way. As part of its support model, a crypto-
graphic library should have a clear process for notifying its maintainers of bugs and security
vulnerabilities. The library’s developers should commit to address these in a timely manner.

18.2.2 API Design for Cryptographic Libraries


The API that a cryptographic library presents to its consumers is critical. There is a delicate
balance to be struck between flexibility (allowing developers to use the library in a wide variety
of ways, thereby making it more useful) and security (restricting the API in an effort to prevent
developers from using the library in insecure ways). Consider the problem of providing an
API for symmetric encryption. Should the library allow direct access to a raw block cipher
capability? Possibly, since some developers may need that functionality at some point. But
also perhaps not — since it’s probable that an inexperienced developer will use the block
cipher in ECB mode to perform bulk encryption, with predictably insecure results.17 This simple
example is not an isolated one. It could be replaced, for example, with one involving nonces in
an API for AEAD, or one involving the selection of parameters for a primality test.
Green and Smith [405] present ten principles for API design for cryptographic libraries. Their
principles are derived empirically from their analysis of an ad hoc collection of examples and
from interviews with developers. More systematic approaches, relying on standard method-
ologies from social science, have followed, see [1678] for an illustrative example of this line
of work.
Green and Smith observe that developers’ mistakes affect many users, so it makes sense to
focus on them and not the actual end users, who are typically the target of usable security
research. They point out that cryptographic libraries appear to be uniquely prone to misuse by
developers, with even subtle misuse leading to catastrophic security failures. They also argue
that as the use of cryptography in applications becomes more common, so cryptographic
libraries are increasingly used by developers without cryptographic expertise.
Green and Smith’s ten principles can be summarised as follows:
1. Integrate cryptographic functionality into standard APIs; that is, hide cryptography from
developers where possible.
2. Make APIs sufficiently powerful to satisfy both security and non-security requirements.
The argument here is that developers ultimately don’t have a security goal in mind
and satisfying their actual requirements, ascertained through interviewing them, will
encourage them to use an API rather than writing their own cryptographic code.
3. Design APIs that are easy to learn without cryptographic expertise.
4. Don’t break the developer’s paradigm (or mental model) of what the API should look like.
5. Design APIs that are easy to use even without reading the documentation (since devel-
opers will not read it!).
6. Create APIs that are hard to misuse — visible errors should result from incorrect usage.
7. APIs should have safe and unambiguous defaults.
8. APIs should have testing modes, because otherwise developers will hack the API to turn
off security during development to ease testing, but then may fail to properly remove
17 See https://blog.filippo.io/the-ecb-penguin/ for a vivid illustration of the limitations of ECB mode.


their hacks. An issue here is that the resulting code could be released with the testing
mode still enabled, but one would hope that regular software assurance would detect
this before release.
9. Code that uses the API should be easy to read and maintain. For example, iteration
counts for password hashing should not be set by a developer via the API, but instead
internally in the library. One issue here is that the internal defaults may be overkill and
hurt performance in some use cases. This relates to the tension between flexibility and
security.
10. The API should assist with or handle end user interaction, rather than leave the entire
burden of this to the developer using the API. Here, error messages are highlighted
as a particular concern by Green and Smith: the API and the library documentation
should help developers understand what failure modes the library has, what the security
consequences of these are and how the resulting errors should be handled by the calling
code.
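Several of these principles can be seen in a small password-hashing API sketch: the salt and iteration count are handled inside the library rather than exposed through the API (principles 7 and 9), and verification returns a simple boolean (principle 6). The iteration count below is an illustrative figure for this sketch, not a normative recommendation:

```python
import hashlib
import secrets

_ITERATIONS = 200_000  # internal default, deliberately not an API parameter

def hash_password(password: str) -> bytes:
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, _ITERATIONS)
    return salt + digest            # opaque blob for the caller to store

def check_password(password: str, stored: bytes) -> bool:
    salt, digest = stored[:16], stored[16:]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, _ITERATIONS)
    return secrets.compare_digest(candidate, digest)
```

The trade-off noted under principle 9 is visible here: a caller who genuinely needs a different cost setting has no way to express that through this API.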
For additional references and discussion, see Human Factors Knowledge Area (Section 4.6.2).

18.2.3 Implementation Challenges


Having discussed libraries and their APIs, we now turn to challenges arising in securely
implementing the algorithms and schemes within these libraries.
The main problem is to translate a purely mathematical or pseudo-code description of a
scheme (the typical unit of analysis in formal security proofs arising in research papers)
into running code on a real computer in such a way that the abstraction level involved in the
security analysis is still properly respected by the running code. Put another way, the challenge
is to ensure there are no mechanisms through which sensitive information can leak that are
not already anticipated and eliminated by the security analysis. There are multiple ways in
which such leakage can arise. We consider a representative selection here.

18.2.3.1 Length Side Channels

As we noted in Section 18.1.6, the usual security goal of an AEAD scheme does not guarantee
that the length of plaintexts will be hidden. Indeed, AEAD schemes like AES-GCM make it
trivial to read off the plaintext length from the ciphertext length. However, it is clear that length
leakage can be fatal to security. Consider a simplistic secure trading system where a user
issues only two commands, “BUY” or “SELL”, with these commands being encoded in simple
ASCII and sent over a network under the protection of AES-GCM encryption. An adversary
sitting on the network who can intercept the encrypted communications can trivially infer what
commands a user is issuing, just by looking at ciphertext lengths (the ciphertexts for “SELL”
will be one byte longer than those for “BUY”). More generally, attacks based on traffic analysis
and on the analysis of metadata associated with encrypted data can result in significant
information leaking to an adversary.
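A simple countermeasure, discussed further in Section 18.2.4, is to pad every plaintext to one of a set of predetermined sizes before encryption. A sketch for the trading example (the 16-byte bucket size is arbitrary, chosen only for this illustration):

```python
BUCKET = 16  # illustrative bucket size; all short commands share one length

def pad_to_bucket(msg: bytes) -> bytes:
    if len(msg) >= BUCKET:
        raise ValueError("message too long for a single bucket")
    # One 0x80 delimiter byte, then zeros up to the bucket boundary.
    return msg + b"\x80" + b"\x00" * (BUCKET - len(msg) - 1)

def unpad(padded: bytes) -> bytes:
    body = padded.rstrip(b"\x00")
    if not body.endswith(b"\x80"):
        raise ValueError("bad padding")
    return body[:-1]
```

After padding, the plaintexts for “BUY” and “SELL” have identical lengths, so their ciphertexts do too and this particular channel is closed (though traffic analysis of message timing and counts may still leak information).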


18.2.3.2 Timing Side Channels

The amount of time that it takes to execute the cryptographic code may leak information about
the internal processing steps of the algorithm. This may in turn leak sensitive information,
e.g. information about keys. The first public demonstration of this problem was made by
Kocher [1453] with the attacker having direct access to timing information. Later it was shown
that such attacks were even feasible remotely, i.e. could be carried out by an attacker located
at a different network location from the target, with timing information being polluted by
network noise [1679].
Consider for example a naive elliptic curve scalar multiplication routine which is optimised
to ignore leading zeros in the most significant bits of the scalar. Here we imagine the scalar
multiplication performing doubling and adding operations on points, with the operations being
determined by the bits of the scalar from most significant to least significant. If the adversary
can somehow time the execution of the scalar multiplication routine, it can detect cases where
the code finishes early and infer which scalars have some number of most significant bits
equal to zero. Depending on how the routine is used, this may provide enough side channel
information to enable a key to be recovered. This is the case, for example, for the ECDSA
scheme, where even partial leakage of random values can be exploited. Recent systematic
studies in this specific setting [1680, 1681] show that timing attacks are still pertinent today.
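A software analogue of the same early-exit problem, small enough to show in full, is MAC tag comparison: a loop that returns at the first mismatching byte runs longer the more prefix bytes the attacker has guessed correctly, allowing a tag to be recovered byte by byte. The standard remedy is a constant-time comparison:

```python
import hmac

def leaky_equal(a: bytes, b: bytes) -> bool:
    # Early exit: running time grows with the length of the matching
    # prefix, which an attacker can measure and exploit.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def constant_time_equal(a: bytes, b: bytes) -> bool:
    # hmac.compare_digest examines the inputs without data-dependent
    # early exits, removing this particular timing signal.
    return hmac.compare_digest(a, b)
```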

18.2.3.3 Error Side Channels

Errors arising during cryptographic processing can also leak information about internal pro-
cessing steps. Padding oracle attacks on CBC mode encryption, originally introduced in [1636],
provide a classic and persistent example of this phenomenon. CBC mode uses a block cipher
to encrypt plaintext data that is a multiple of the block cipher’s block length. But in applications,
we typically want to encrypt data of arbitrary length. This implies that data needs to be padded
to a block boundary of the block cipher before it can be encrypted by CBC mode. Vaudenay
observed that, during decryption, this padding needs to be removed, but the padding may be
invalidly formatted and the decryption code may produce an error message in this case. If the
adversary can somehow observe the error message, then it can infer something about the
padding’s validity. By carefully constructing ciphertexts and observing errors arising during
their decryption, an adversary can mount a plaintext recovery attack via this error side channel.
In practice, the error messages may themselves be encrypted, but then revealed via a sec-
ondary side channel, e.g. a timing side channel (since an implementation might abort fur-
ther processing once a padding error is encountered). For examples of this in the con-
text of SSL/TLS and which illustrate the difficulty of removing this class of side channel,
see [1637, 1682].
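The heart of the problem can be shown with PKCS#7-style unpadding code. The leaky version below reports distinct errors, turning the decryption routine into a padding oracle; the uniform version collapses them into a single failure (a real implementation must also equalise the time taken on each path):

```python
def unpad_pkcs7_leaky(data: bytes, block: int = 16) -> bytes:
    # Distinct errors let an attacker who can observe them learn whether
    # a forged ciphertext decrypted to structurally valid padding -- the
    # side channel exploited by padding oracle attacks.
    if not data or len(data) % block != 0:
        raise ValueError("bad length")
    n = data[-1]
    if n < 1 or n > block:
        raise ValueError("bad padding length")
    if data[-n:] != bytes([n]) * n:
        raise ValueError("bad padding bytes")
    return data[:-n]

def unpad_pkcs7_uniform(data: bytes, block: int = 16) -> bytes:
    # One indistinguishable failure for every problem.
    try:
        return unpad_pkcs7_leaky(data, block)
    except ValueError:
        raise ValueError("decryption failed") from None
```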


18.2.3.4 Attacks Arising from Shared Resources

The cryptographic code may not be running in perfect isolation from potential adversaries.
In particular, in modern CPUs, there is a memory cache hierarchy in which the same fast
memory is shared between different processes, with each process potentially overwriting
portions of the cache used by other processes. For example, in a cloud computing scenario,
many different users’ processes may be running in parallel on the same underlying hardware,
even if they are separated by security techniques like virtualisation. So an attacker, running in
a separate process in the CPU, could selectively flush portions of the cache and then, after
the victim process has run some critical code, observe by timing its own cache accesses,
whether that part of the cache has been accessed by the victim process or not. If the victim
process has a pattern of memory access that is key-dependent, then this may indirectly leak
information about the victim’s key. This particular attack is known as a Flush+Reload attack
and was introduced in [1683]; several other forms of cache-based attack are known. The
possibility of such attacks was first introduced in [1684]; later such attacks were shown to be
problematic for AES in particular [1685].18 In the last few years, researchers have had a field
day developing cache-based and related micro-architectural attacks against cryptographic
implementations. These attacks arise in general from designers of modern CPUs making
architectural compromises in search of speed.

18.2.3.5 Implementation Weaknesses

More prosaically, cryptographic keys may be improperly deleted after use, or accidentally
written to backup media. Plaintext may be improperly released to a calling application before
its integrity has been verified. This can occur in certain constructions where MAC verification
is done after decryption and also in streaming applications where only a limited-size buffer is
available for holding decrypted data.

18.2.3.6 Attacks Arising from Composition

A system making use of multiple cryptographic components may inadvertently leak sensitive
information through incorrect composition of those components. So we have leakage at a
system level rather than directly from the individual cryptographic components. Consider the
case of Zcash,19 an anonymous cryptocurrency. Zcash uses a combination of zero-knowledge
proofs, a PKE scheme and a commitment scheme in its transaction format. The PKE scheme
is used as an outer layer and is anonymous, so the identity of the intended recipient is shielded.
How then should a Zcash client decide if a transaction is intended for it? It has to perform a
trial decryption using its private key; if this fails, no further processing is carried out. Otherwise,
if decryption succeeds, then further cryptographic processing is done (e.g. the commitment
is checked). This creates a potentially observable difference in behaviour that breaks the
intended anonymity properties of Zcash [1686]. The PKE scheme used may be IND-CCA secure
and anonymous, but these atomic security properties do not suffice if the overall system’s
behaviour leaks the critical information.
18 See also https://cr.yp.to/antiforgery/cachetiming-20050414.pdf for contemporaneous but unpublished work.
19 See https://z.cash/.


18.2.3.7 Hardware Side Channels

Cryptography is often implemented directly in hardware. For example, hardware acceleration
of cryptographic functions was once common, both in low-cost environments such as payment
cards and in higher-end applications, such as server-side SSL/TLS operations. Today,
Internet-of-Things (IoT) deployments may use hardware components to implement expensive
cryptographic functions. Hardware-based cryptography can also be found in Trusted Platform
Modules (TPMs) as specified by the Trusted Computing Group and in systems like Intel Soft-
ware Guard eXtensions (SGX) and ARM TrustZone. As noted in Section 18.1.3, modern CPUs
have instructions to enable high performance implementation of important cryptographic
algorithms like AES.
There are additional sources of leakage in hardware implementations of cryptography. For
example, an attacker against a smartcard might be able to observe how much power the
smartcard draws while carrying out its cryptographic operations at a fine-grained time resolu-
tion and this might reveal the type of operation being carried out at each moment in time. To
give a more specific example, in an implementation of RSA decryption using a basic “square
and multiply” approach, the two possible operations for each private key bit — either square or
square & multiply — could consume different amounts of power and thus the private key can
be read off bit-by-bit from a power trace. The electromagnetic emissions from a hardware
implementation might also leak sensitive information. Even sonic side channels are possible.
For example the first working QKD prototype [1687] reportedly had such a side channel, since
an observer could listen to the optical components physically moving and thereby learn which
polarisation was being used for each signal being sent. This highlights just one of the many
challenges in achieving unconditional security according to the laws of physics.
For a fuller discussion of hardware side channels, we refer the reader to Hardware Security
Knowledge Area (Chapter 20).
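The square-and-multiply leak described above is easy to make concrete. In the sketch below, the recorded sequence of operations stands in for what a power trace of a naive implementation would show: a multiply appears exactly where the exponent has a 1-bit, so the trace spells out the private exponent:

```python
def square_and_multiply(base: int, exponent: int, modulus: int, trace: list) -> int:
    # Left-to-right binary exponentiation. Every exponent bit costs a
    # square ("S"); 1-bits additionally cost a multiply ("M"), so the
    # operation sequence is key-dependent.
    result = 1
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        trace.append("S")
        if bit == "1":
            result = (result * base) % modulus
            trace.append("M")
    return result
```

For exponent 0b1011 the trace groups per bit as S M, S, S M, S M: reading off where the multiplies occur recovers the bits 1, 0, 1, 1. Constant-time implementations avoid this by performing the same operations for every bit (for example, via the Montgomery ladder).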

18.2.3.8 Fault Attacks

Hardware implementations may also be vulnerable to fault or glitch attacks, where an error
is introduced into cryptographic computations at a precise moment resulting in leakage of
sensitive data (typically keys) via the output of the computation. The first such attack focused
on implementations of RSA using the CRT [1688]. A more recent incarnation of this form of
attack called Rowhammer targets the induction of faults in memory locations where keys are
stored by repeatedly writing to adjacent locations [989].

18.2.4 Defences
General techniques for defending against cryptographic implementation vulnerabilities (as
opposed to weaknesses in the algorithms and schemes themselves) come from the fields of
software and hardware security and are well-summarised in [1689, 1690]. Indeed, it can be
argued that conventional software security may be more important for cryptographic code
than for other forms of code. For hardware, blinding, masking, threshold techniques and
physical shielding are commonly used protections. For software, common techniques include
formal specification and verification of software and hardware designs, static and dynamic
analysis of code, fuzzing, information flow analysis, the use of domain-specific languages for
generating cryptographic code and the use of strong typing to model and enforce security
properties. Most of the software techniques are currently supported only by experimental

KA Applied Cryptography | July 2021 Page 623


The Cyber Security Body Of Knowledge
www.cybok.org

tools and are not at present widely deployed in production environments. Additionally, the
objects they analyse — and therefore the protections they offer — only extend so far, down to
code at Instruction Set Architecture level at best.
Length side channels can be closed by padding plaintexts to one of a set of predetermined
sizes before encryption and by adding cover or dummy traffic. Secure communications
protocols like SSL/TLS and IPsec have features supporting such operations, but these features
are not widely used in practice.
A set of coding practices aim to achieve what is loosely called Constant-Time Cryptography.
The core idea is to remove, through careful programming, any correlation between the values
of sensitive data such as keys or plaintexts, and variables that can be observed by an adversary
such as execution time. This entails avoiding, amongst other things, key-dependent memory
accesses, key-dependent branching and certain low-level instructions whose running time
is operand-dependent. It may also require writing high-level code in particular ways so as to
prevent the compiler from optimising away constant-time protections.20 Writing constant-time
code for existing algorithms is non-trivial. In some cases, cryptographic designers have taken
it into account from the beginning when designing their algorithms. For example, Bernstein's
ChaCha20 algorithm21 does so, and the use of certain coordinate systems makes it easier to
achieve constant-time implementations of elliptic curve algorithms [1691].
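The core coding pattern can be illustrated with byte-string comparison (a Python sketch; an interpreted language cannot give genuine constant-time guarantees, and real code should use a vetted routine such as the standard library's hmac.compare_digest):

```python
import hmac

def leaky_equal(a: bytes, b: bytes) -> bool:
    # Early-exit comparison: running time depends on the position of the
    # first mismatching byte, leaking information about a secret value.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def constant_time_equal(a: bytes, b: bytes) -> bool:
    # Accumulate differences with XOR/OR so that every byte is always
    # processed, removing the data-dependent branch.
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0

# Production code should prefer the library routine:
# hmac.compare_digest(a, b)
```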

18.2.5 Random Bit Generation


Cryptography relies on randomness in a crucial way. Most obviously, random bits are needed
for symmetric keys, and for more complex key generation algorithms in the public key setting.
But to achieve standard security notions such as IND-CPA security, PKE schemes need to have
a randomised encryption algorithm. Fortunately, in the nonce-based AE setting, we can avoid
the need for randomness during encryption. Some signature schemes have a randomised
signing algorithm. This is the case for RSA PSS, DSA and ECDSA, for example.22
A failure to supply suitable randomness to such algorithms can have disastrous consequences.
We already remarked on this in the context of DSA and ECDSA in Section 18.1.9. We will discuss
examples for asymmetric key pair generation in Section 18.3.4.
So our cryptographic algorithms need to have access to “strong” random bit sources. To be
generally applicable, such a source should offer a plentiful supply of bits that are independent
and uniformly distributed, such that the adversary has no information about them. Specific
algorithms may reveal their random bits, but general usage requires that they remain hidden.
In an ideal world, every computing device would be equipped with a True Random Bit Generator
(TRBG)23 whose output is hidden from potential adversaries. In practice, this has proven to be
very difficult to achieve. Intel and AMD CPUs do offer access to the post-processed output of
a TRBG via the RDRAND instruction. However, the designs of these TRBGs are not fully open.
In the absence of a TRBG, common practice is for the operating system to gather data from
weak, local entropy sources such as keyboard timings, disk access times, process IDs and
packet arrival times, to mix this data together in a so-called entropy pool and then to extract
pseudo-random bits from the pool as needed using a suitable cryptographic function (a
Pseudo-Random Number Generator, PRNG, using a seed derived from the entropy pool).
Designs of this type are standardised by NIST in [1672]. They are also used in most operating
systems, but with a variety of ad hoc and hard-to-analyse constructions. Mature formal security
models and constructions for random bit generators do exist; see [1677] for a survey. But this
is yet another instance where practice initially got ahead of theory, then useful theory was
developed, and now practice has yet to fully catch up again.

20. An introduction to the paradigm can be found at https://www.bearssl.org/constanttime.html.
21. See https://cr.yp.to/chacha.html.
22. But signature schemes can always be derandomised by using a standard method; see [1692].
23. Also called a True Random Number Generator (TRNG).
It is challenging to estimate how much true randomness can be gathered from the
aforementioned weak entropy sources. In some computing environments, such as embedded systems,
some or all of the sources may be absent, leading to slow filling of the entropy pool after
a reboot — leaving a “boot time entropy hole” [1693, 1694]. A related issue arises in Virtual
Machine (VM) environments, where repeated random bits may arise if they are extracted from
the Operating System too soon after a VM image is reset [1695].
There has been a long-running debate on whether such random bit generators should be
blocking or non-blocking: if the OS keeps a running estimate of how much true entropy remains
in the pool as output is consumed, then should the generator block further output being taken
if the entropy estimate falls below a certain threshold? The short answer is no, if we believe
we are using a cryptographically-secure PRNG to generate the output, provided the entropy
pool is properly initialised with enough entropy after boot. This is because we should trust our
PRNG to do a good job in generating output that is computationally indistinguishable from
random, even if not truly random. Some modern operating systems now offer an interface to
a random bit generator of this “non-blocking-if-properly-seeded” type.
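Python's secrets module (backed by os.urandom) is one example of an interface to an OS generator of this kind (a brief sketch):

```python
import os
import secrets

# 16 random bytes suitable for a 128-bit symmetric key, drawn from the
# operating system's cryptographically secure generator.
key = secrets.token_bytes(16)

# os.urandom is the lower-level equivalent; on modern Linux it uses the
# getrandom() system call, which blocks only until the entropy pool has
# been initialised once after boot, and never again thereafter.
iv = os.urandom(12)

# The random module, by contrast, is a deterministic PRNG (Mersenne
# Twister) and must never be used for cryptographic purposes.
```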

18.3 KEY MANAGEMENT


[1620, 1666, 1696, 1697]
Cryptographic schemes shift the problem of securing data to that of securing and managing
keys. Therefore no treatment of applied cryptography can ignore the topic of key management.
As explained by Martin [1620, Chapter 10], cryptographic keys are in the end just data, albeit of
a special and particularly sensitive kind. So key management must necessarily involve all the
usual processes involved in Information Security management, including technical controls,
process controls and environmental controls.
An introductory treatment of key management can be found in the aforementioned [1620,
Chapter 10]. A more detailed approach can be found in the NIST three part series [1666, 1696,
1697].

18.3.1 The Key Life-cycle


Keys should be regarded as having a life-cycle, from creation all the way to destruction.
Keys first need to be generated, which may require cryptographically secure sources of
randomness, or even true random sources, in order to ensure keys have sufficient entropy to
prevent enumeration attacks.
Keys may then need to be securely distributed to where they will be used. For example, a
key may be generated as part of a smartcard personalisation process and injected into a
smartcard from the personalisation management system through a physically secure channel;
or a symmetric key for protecting a communication session may be established at a client
and server in a complex cryptographic protocol, perhaps with one party choosing the session
key and then making use of PKE to transport it to the other party.
Keys may also be derived from other keys using suitable cryptographic algorithms known as
Key Derivation Functions.
Keys need to be stored securely until they are needed. We discuss some of the main key
storage options in more detail in the sequel.
Then keys are actually used to protect data in some way. It may be necessary to impose limits
on how much data the keys are used to protect, due to intrinsic limitations of the cryptographic
scheme in which they are being used. Keys may then need to be changed or updated. For
example, the TLS specification in its latest version, TLS 1.3 [1480], contains recommendations
about how much data each AEAD key in the protocol can be used to protect. These are set
by analysing the security bounds for the employed AEAD schemes. TLS also features a key
update sub-protocol enabling new keys to be established within a secure connection.
Keys may need to be revoked if they are discovered to have been compromised. The revocation
status of keys must then be communicated to parties relying on those keys in a timely and
reliable manner.
Keys may also need to be archived — put into long-term, secure storage — enabling the data
they protect to be retrieved when needed. This may involve encrypting the keys under other
keys, which themselves require management. Finally, keys should be securely deleted at the
end of their lifetime. This may involve physical destruction of storage media, or carefully
overwriting keys.
Given the complexity in the key life-cycle, it should be apparent that the key life-cycle and its
attendant processes need to be carefully considered and documented as part of the design
process for any system making use of cryptography.
We have already hinted that keys in general need to remain secret in order to be useful (public
keys are an exception; as we discuss below, the requirement for public keys is that they be
securely bound to the identity of the key owner and their function). Keys can leak in many ways —
through the key generation procedure due to poor randomness, whilst being transported to
the place where they will be needed, through compromise of the storage system on which they
reside, through side-channel attacks while in use, or because they are not properly deleted once
exhausted. So it may be profitable for attackers to directly target keys and their management
rather than the algorithms making use of them when trying to break a cryptographic system.
Additionally, it is good practice that keys come with what Martin [1620] calls assurance of
purpose — which party (or parties) can use the key, for which purposes and with what limits.
Certain storage formats — for example, digital certificates — encode this information along
with the keys. This relates to the principle of key separation which states that a given key
should only ever be used for one purpose (or in one cryptographic algorithm). This principle is
perhaps more often broken than observed and has led to vulnerabilities in deployed systems,
see, for example [1667, 1698].


18.3.2 Key Derivation


Key derivation refers to the process of creating multiple keys from a single key. The main
property required is that exposure of any of the derived keys should not compromise the
security of any of the others, nor the root key from which they are derived. This is impossible
in an information theoretic sense, since given enough derived keys, the root key must be
determinable through an exhaustive search. But it can be assured under suitable computational
assumptions. For example, suppose we have a Pseudo-Random Function (PRF) F which takes
as input a key K and a “message” input m; then the outputs F(K, mi) on distinct inputs m1, m2, ... will
appear to be random values to an adversary, and even giving the adversary many input/output
pairs (mi, F(K, mi)) will not help it in determining K nor any further input/output pairs. Thus
a pseudo-random function can be securely used as a Key Derivation Function (KDF) with root
key K. We then refer to the mi as labels for key derivation.
In many situations, the root key K may itself not be of a suitable length for use in a PRF or come
from some non-uniform distribution. For example, the key may be a group element resulting
from a Diffie-Hellman key exchange. In this case, one should first apply an entropy extraction
step to make a suitable key K, then apply a PRF. One may also desire a key derivation function
with variable length output (different functions may require keys of different sizes) or variable
size label inputs. So the general requirements on a KDF go beyond what a simple PRF can
offer. HKDF is one general-purpose KDF that uses the HMAC algorithm as a variable-input PRF.
It is defined in [1699]. As well as key and label inputs and variable length output, it features
an optional non-secret salt input which strengthens security across multiple, independent
invocations of the algorithm. In legacy or constrained systems, one can find many ad hoc
constructions for KDFs, using for example block ciphers or hash functions.
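The extract-then-expand structure of HKDF can be sketched with the standard library's HMAC (a simplified rendering of RFC 5869 for illustration; production code should use a maintained implementation, and the root key and labels below are placeholders):

```python
import hashlib
import hmac

HASH_LEN = 32  # output size of SHA-256

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # Entropy-extraction step: concentrate the (possibly non-uniform)
    # input keying material ikm into a fixed-length pseudorandom key.
    if not salt:
        salt = b"\x00" * HASH_LEN
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, label: bytes, length: int) -> bytes:
    # Expansion step: use HMAC as a PRF keyed with prk to produce as
    # many output blocks as needed for the requested length.
    okm, block = b"", b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + label + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

# Derive separate encryption and MAC keys from one root key, in line
# with the principle of key separation.
root = bytes(32)  # placeholder root key, for illustration only
prk = hkdf_extract(b"example-salt", root)
enc_key = hkdf_expand(prk, b"encryption", 16)
mac_key = hkdf_expand(prk, b"mac", 32)
```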
Using a KDF allows for key diversification and makes it easier to comply with the principle of key
separation. For example, one can use a single symmetric key to derive separate encryption and
MAC keys to be used in an EtM construction. Modern AE schemes avoid this need, effectively
performing key derivation internally. In the extreme, one might use a KDF in combination with a
specific label to derive a fresh key for each and every application of a cryptographic algorithm,
a process known in the financial cryptography context as unique key per transaction. One can
also choose to derive further keys from a derived key, creating a key hierarchy or tree of keys
of arbitrary (but usually bounded) depth. Such key hierarchies are commonly seen in banking
systems. For example, a bank may have a payment-system-wide master secret, from which
individual card secrets are derived; in turn, per transaction keys are derived from the card
secrets. Of course, in such a system, protection of the master secret — the key to the kingdom!
— is paramount. Generally specialised hardware is used for storing and operating with such
keys.
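A key hierarchy of this kind can be sketched by iterating a PRF (HMAC-SHA256 here stands in for the KDF; the labels and three-level structure are illustrative, not those of any real payment scheme):

```python
import hashlib
import hmac

def kdf(key: bytes, label: bytes) -> bytes:
    # HMAC-SHA256 used as a PRF: deriving with distinct labels yields
    # independent-looking keys, none of which reveals the parent key.
    return hmac.new(key, label, hashlib.sha256).digest()

# Illustrative hierarchy: scheme master key -> card key -> transaction key.
master_key = bytes(32)  # placeholder; in reality held only inside an HSM
card_key = kdf(master_key, b"card:0001")   # per-card secret
txn_key_1 = kdf(card_key, b"txn:0001")     # unique key per transaction
txn_key_2 = kdf(card_key, b"txn:0002")

# By the PRF security of HMAC, compromise of txn_key_1 does not help an
# attacker compute txn_key_2, card_key or master_key.
```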

18.3.3 Password-Based Key Derivation


A very common practice, arising from the inevitable involvement of humans in cryptographic
systems, is to derive cryptographic keys from passwords (or passphrases). Humans cannot
remember passwords that have the high entropy required to prevent exhaustive searches,
so special-purpose KDFs should be used in such applications, called password-based KDFs
(PBKDFs). The idea of these is to deliberately slow-down the KDF to limit the speed at which
exhaustive searches can be carried out by an attacker (standard KDFs do not need to be
slowed in this way, since they should only operate on high-entropy inputs). Memory-hard
PBKDFs such as scrypt (defined in [1700] and analysed in [1701]) or Argon2 (the winner
of a password hashing design competition24), which are designed to force an attacker to
expend significant memory resources when carrying out exhaustive searches, should be
used in preference to older designs. These PBKDFs have adjustable hardness parameters
and permit salting, important for preventing pre-computation attacks on the collections of
hashed passwords that are typically stored in authentication databases. It is worth keeping
in mind that commercial password cracking services exist. These cater for many different
formats and make extensive use of Graphics Processing Units (GPUs). They are impressively
fast, rendering human-memorable passwords on the verge of being obsolete and forcing the
adoption of either very long passwords or entirely different authentication methods.
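Python's hashlib exposes scrypt directly, so salted password-based derivation with tunable hardness can be sketched as follows (the cost parameters shown are illustrative only; deployed values should follow current guidance):

```python
import hashlib
import os

def derive_key_from_password(password: str, salt: bytes) -> bytes:
    # scrypt parameters: n is the CPU/memory cost (a power of two), r the
    # block size, p the parallelisation factor. Larger n forces an
    # attacker to spend more memory per password guess.
    return hashlib.scrypt(password.encode("utf-8"), salt=salt,
                          n=2**14, r=8, p=1, dklen=32)

# A fresh random salt per password defeats precomputation attacks on
# collections of hashed passwords:
salt = os.urandom(16)
key = derive_key_from_password("correct horse battery staple", salt)
```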
An alternative to using a password directly to derive a key is to use the password as an
authentication mechanism in a key exchange protocol, leading to the concept of Password
Authenticated Key Exchange (PAKE). When designed well, a PAKE can limit an attacker to
making a single password guess in each execution of the protocol. PAKE has not seen
widespread adoption yet, but is currently undergoing standardisation in the IRTF.25

18.3.4 Key Generation


The main requirement for symmetric keys that are being generated from scratch is that they
should be chosen close to uniformly at random from the set of all bit-strings of the appropriate
key length. This requires access to good sources of random bits at the time when the key
is selected. These bits may come from true random sources (e.g. from noise in electronic
circuits or from quantum effects) or from an operating-system supplied source of pseudo-
random bits, which may in turn be seeded and refreshed using random bits gathered from
the local environment. This topic is discussed in more detail in Section 18.2. An alternative is
to use a Physically Unclonable Function (PUF) to generate keying material from the intrinsic
properties of a piece of hardware. Depending on what properties are measured, significant post-
processing may be needed to correct for errors and to produce output with good randomness
properties. An overview of some of the challenges arising for PUFs can be found in [1702].
For asymmetric algorithms there may be additional requirements, for example the key pairs
may require special algebraic structure or need to lie in certain numerical ranges. However,
the KeyGen algorithms for such schemes should handle this internally and themselves only
require access to a standard random bit source.
There are (in)famous cases where key generation processes were not appropriately
randomised [1693, 1703] (see also the Debian incident26). Another challenge is that in some
cases keys need to be generated in constrained environments, e.g. on smartcards that will be
used as personal identity cards, where generating the keys “off-card” and then injecting them
into the card would not meet the security requirements of the application. There may then
be a temptation to over-optimise the key generation process. This can result in significant
security vulnerabilities [1704].
24. See https://www.password-hashing.net/.
25. Details at https://github.com/cfrg/pake-selection.
26. See https://www.debian.org/security/2008/dsa-1571.


18.3.5 Key Storage


Once keys are generated, they typically need to be stored. An exception is where the keys can
be generated on the fly from a human-memorable password, as discussed in Section 18.3.3.
In general, the storage medium needs to be appropriately secured to prevent the keys
becoming available to adversaries. In many cases, the only security for stored keys comes
from that provided by the local operating system’s access control mechanisms, coupled with
hardware-enforced memory partitioning for different processes. Such security mechanisms
can be bypassed by local attackers using low-level means that exploit the presence of shared
resources between different executing processes, e.g. a shared memory cache. These are
discussed in Section 18.2. This is a particular issue in multi-tenant computing environments,
e.g. cloud computing, where a customer has little control over the processes running on the
same CPU as its own. But it is also an issue in single-tenant computing environments when
one considers the possibility of malicious third-party software. An acute challenge arises
for “crypto in the browser”, where users must rely on the browser to enforce separation and
non-interference between code running in different browser tabs, and where the code running
in one tab may be loaded from a malicious website “evil.example.com” that is trying to
extract cryptographic secrets from code running in another tab “your-bank.example.com”.
The Heartbleed incident [1559] was so damaging precisely because it enabled a fully remote
attacker (i.e. one not running on the same machine as the victim process but merely able
to connect over a network to that machine) to read out portions of a TLS server’s memory,
potentially including the server’s private key. It shows that any amount of strong cryptography
can easily be undermined thanks to a simple software vulnerability. In the case of Heartbleed,
this was a buffer over-read in the OpenSSL implementation of the TLS/DTLS Heartbeat
protocol.
Developers who store cryptographic keys in memory (or in software) often use software
obfuscation techniques to try to make it harder to identify and extract the keys. Correspondingly,
there are de-obfuscation tools which try to reverse this process and there is a long-running
arms race in this domain. The research topic of white-box cryptography [1705] attempts to
formalise this game of attack and defence.
The alternative to in-memory storage of keys is to rely on special-purpose secure storage for
keys. Devices for this exist at all scales and price points. For example, a smartcard may cost a
few cents to manufacture, but can be well-protected against passive and invasive attacks that
attempt to recover the keys that it stores. A well-designed smartcard will have an interface
which carefully limits how an external card reader communicating with the card can interact
with it. In particular, there will not be any commands that a reader can send to the card that
would allow its keys to directly leave the card. Rather, a reader will only be able to send data to
the card and have it cryptographically transformed locally using the stored keys, then receive
the results of the cryptographic computations. Thus, the reader will be able to interact with
the secrets on the card only via a cryptographic API. Smartcards are typically bandwidth- and
computation-limited.
At the other end of the scale are Hardware Security Modules (HSMs), which may cost many
thousands of US dollars, offer much greater capabilities and be tested to a high level of security
using industry-standard evaluation methodologies (e.g. the NIST FIPS 140 series). HSMs are
frequently used in the financial sector, and are also now offered in cloud environments, see
Section 18.4. As with smartcards, an HSM offers key storage and key usage functions via a
carefully controlled API. The API provided by an HSM can be used to extend the security that
the HSM offers to keys that it directly stores to a larger collection of keys. Consider the simple
case of using the HSM to store a Key Encryption Key (KEK), such that the HSM’s API allows
that key to be used internally for authorised encryption and decryption functions. Then the
HSM can be used to wrap and, when needed, unwrap many Data Encryption Keys (DEKs) using
a single KEK that is stored inside the HSM. Here wrapping means encrypting and unwrapping
means decrypting. Assuming the used encryption mechanism is strong, the wrapped DEKs
can be stored in general, unprotected memory. Further details on HSMs can be found in the
Hardware Security Knowledge Area (Chapter 20).
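The wrap/unwrap pattern can be demonstrated with a toy encrypt-then-MAC construction (a deliberately simplified sketch built only from the standard library; real systems use a standardised key-wrap algorithm such as AES-KW inside the HSM, and would derive separate encryption and MAC keys rather than reusing the KEK as done here):

```python
import hashlib
import hmac
import os

def _keystream(kek: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: counter-mode use of SHA-256 over (kek, nonce, i).
    # Illustration only — not a vetted cipher.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(kek + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def wrap(kek: bytes, dek: bytes) -> bytes:
    # Encrypt the DEK under the KEK, then append a MAC tag so that
    # tampering with the wrapped blob is detected on unwrap.
    nonce = os.urandom(16)
    ct = bytes(a ^ b for a, b in zip(dek, _keystream(kek, nonce, len(dek))))
    tag = hmac.new(kek, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def unwrap(kek: bytes, blob: bytes) -> bytes:
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(kek, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrapped key failed authentication")
    return bytes(a ^ b for a, b in zip(ct, _keystream(kek, nonce, len(ct))))

kek = os.urandom(32)      # key encryption key: stays inside the HSM
dek = os.urandom(32)      # data encryption key
wrapped = wrap(kek, dek)  # safe to store in general, unprotected memory
```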
TPMs also provide hardware-backed key storage. Technologies aiming to provide secure
execution environments, such as Intel SGX and ARM Trustzone, enable secure storage of
keys but also possess much more general capabilities. On mobile devices, the Android and
iOS operating systems offer similar key storage features through Android Keystore and iOS
Secure Enclave — both of these are essentially mini-HSMs.

18.3.6 Key Transportation


Depending on system design, keys may be generated at one place but need to be transported
to another place where they will subsequently be used. Traditionally, secure couriers were
used to physically transport keys stored on paper tape or magnetic media. Of course, this is
costly and time-consuming. A related method, used often in low-value consumer applications
(e.g. domestic wireless routers), is to simply print the key on the back of the device. Then
security is reduced to that of the physical security of the device.
An alternative approach, widely used in the mobile telecommunications and consumer finance
sector, is to inject symmetric keys into low-cost security modules (e.g. a Subscriber Identity
Module, SIM, card in the case of mobile telephones or a bank card in the case of finance)
and distribute those to customers. At the same time, copies of the symmetric keys may be
kept en masse at a centralised location. In the case of mobile telecommunications, this is
at a logical network component called the Authentication Centre (AuC); the symmetric keys
are then used to perform authentication of SIM cards to the network and as the basis for
deriving session keys for encrypting voice and data on the wireless portion of the network. In
the financial setting, the symmetric keys injected into cards are typically already derived from
master keys using a KDF, with the master keys being held in an HSM.
If a key is already in place, then that key can be used to transport fresh keys. Here, the idea
is to sparingly use an expensive but very secure algorithm to transport the keys. With the
advent of Public Key Cryptography, the problem of key transportation can be reduced to the
problem of establishing an authentic copy of the public key of the receiver at the sender. As
we discuss in Section 18.3.8, this is still a significant problem. Earlier versions of the TLS
protocol used precisely this mechanism (with RSA encryption using PKCS#1 v1.5 padding) to
securely transport a master key from a client in possession of a server’s public key to that
server. In TLS, various session keys are derived from this master key (using an ad hoc KDF
construction based on iteration of MD5 and SHA-1).


18.3.7 Refreshing Keys and Forward Security


Consider a scenario where a symmetric key is being used to protect communications between
two parties A and B, e.g. by using the key in an AEAD scheme as was described in
Section 18.1.18. Suppose that key is compromised by some attack — for example, a side-channel
attack on the AEAD scheme. Then the security of all the data encrypted under that key is
potentially lost. This motivates the idea of regularly refreshing keys. Another reason to do
this, beyond key compromise, is that the formal security analysis of the AEAD scheme will
indicate that any given key can only be used to encrypt a certain amount of data before the
security guarantees are no longer meaningful.
Several mechanisms exist to refresh symmetric keys. A simple technique is to derive, using
a KDF, a new symmetric key from the existing symmetric key at regular intervals, thereby
creating a chain of keys. The idea is that if the key is compromised at some point in time, then
it should still be hard to recover all previous keys (but of course possible to compute forward
using the KDF to find all future keys). This property, where the security of old keys remains
after the compromise of a current key, is somewhat perversely known as forward security.
In our example it follows from standard security properties of a KDF. In practice, hashing is
often improperly used in place of a KDF, leading to the notion of a hash chain. This approach
does not require direct interaction between the different users of a given symmetric key (i.e.
A and B in our example). But it does need a form of synchronisation so that the two parties
know which version of the key to use.
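The key-chain construction can be sketched in a few lines (HMAC-SHA256 again stands in for the KDF; the label and initial key are illustrative):

```python
import hashlib
import hmac

def ratchet(key: bytes) -> bytes:
    # One-way step: derive the next epoch key from the current one.
    # Inverting this requires breaking the PRF, so compromise of the
    # current key does not expose earlier keys (forward security).
    return hmac.new(key, b"chain-step", hashlib.sha256).digest()

k0 = bytes(32)  # initial shared key (placeholder value)
k1 = ratchet(k0)
k2 = ratchet(k1)
# Both parties step their copy of the key in lockstep; all they need to
# synchronise is the current epoch number. An attacker holding k2 can
# compute all future keys, but not k1 or k0.
```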
Another idea is to use a PKE scheme (more properly a KEM) to regularly transport a fresh
symmetric key from one party to the other, e.g. from A to B under the protection of B’s public
key and then use that session key in the AEAD scheme (as discussed above was done in earlier
versions of TLS). This creates fresh session keys whenever needed. Now to fully evaluate
the system under key compromises, we should consider the effect of compromise of B’s
PKE private key too. This takes us back to square one: if the attacker can intercept the PKE
ciphertexts as they are sent, store them and later learn B’s PKE private key by some means
(e.g. by factorising the modulus in the case of RSA, or simply as a result of an order made
by a lawful authority), then it can recover all the session keys that were ever transported and
thereby decrypt all the AEAD ciphertexts. In short, if the capabilities of the adversary include
PKE private key compromise, then the use of PKE does not achieve forward security.
To improve the situation, we can make use of ephemeral Diffie-Hellman key exchange, in
combination with appropriate authentication to prevent active MiTM attacks, to set up fresh
session keys. Here, ephemeral refers to the Diffie-Hellman values used being freshly
generated by both parties in each key exchange instance. We might use digital signatures for
the authentication, as TLS 1.3 does. Now what happens in the event of compromise of the
equivalent of the KEM private key? This is the signing key of the digital signature scheme.
We see, informally, that an active MiTM attacker who knows the signing key could spoof the
authentication and fool the honest protocol participants into agreeing on fresh session keys
with the attacker rather than each other. This cannot be prevented. However, a compromise at
some point in time of the signing key has no effect on the security of previously established,
old session keys. If the authentication was sound at the time of the exchange, so that active
attacks were not possible, then an adversary has to break a Diffie-Hellman key exchange in
order to learn a previously established session key. Moreover breaking one Diffie-Hellman
key exchange does not affect the security of the others.27 In comparison to the use of PKE
to transport session keys, Diffie-Hellman key exchange achieves strictly stronger forward
security properties.

27. Except in the case where the discrete logarithm algorithm used allows reuse of computation when solving
multiple DLPs. This is the case for the best known finite field algorithms; see [438] for the significant real-world
Complementing forward security is the notion of backward security, aka post-compromise
security [1706]. This refers to the security of keys established after a key compromise has
occurred. Using Diffie-Hellman key exchange can help here too, to establish fresh session
keys, but only in the event that the adversary is restricted to being passive for at least one run
of the Diffie-Hellman protocol.
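The mechanics of an ephemeral exchange can be sketched with textbook finite-field Diffie-Hellman (toy parameters, far too small for real security, and with the essential authentication omitted):

```python
import secrets

# Toy group parameters, purely for illustration — deployed systems use
# standardised groups (e.g. RFC 7919) or elliptic curves.
P = 2**61 - 1   # a Mersenne prime
G = 3

def keygen():
    # A fresh (ephemeral) exponent per exchange is what buys forward
    # security: it is deleted as soon as the session key is derived.
    x = secrets.randbelow(P - 2) + 1
    return x, pow(G, x, P)

# Each run of the protocol generates brand-new values on both sides.
a_priv, a_pub = keygen()
b_priv, b_pub = keygen()
shared_a = pow(b_pub, a_priv, P)   # A computes g^(ab)
shared_b = pow(a_pub, b_priv, P)   # B computes g^(ab)

# In a real protocol the shared value is fed through a KDF, and the
# exchange is authenticated (e.g. with signatures, as in TLS 1.3) to
# prevent active MiTM attacks.
```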

18.3.8 Managing Public Keys and Public Key Infrastructure


The main security requirement for a public key is that it be authentically bound to the identity
of the party who is in legitimate possession of the corresponding private key. Otherwise
impersonation attacks become possible. For example, a party A might encrypt a session key
to what it thinks is the public key of B, but the session key can be recovered by some other
party C. Or a party A might receive a digital signature on some message M purporting to be
generated by B using its private key, but where the signature was actually generated by C;
here A would verify the signature using C’s (public) verification key thinking it was using B’s
key.
A second requirement is that parties who rely on the soundness of the binding between public
key and identity are also able to gain assurance that the binding is still valid (i.e. has not
expired or been revoked).
These two requirements have many implications. Meeting them leads to the introduction
of additional infrastructure and management functions. Conventionally, this collection of
components is called a Public Key Infrastructure (PKI).

18.3.8.1 Binding Public Keys and Identities via Certificates

A suitable mechanism to bind public keys and identities — and possibly other information —
is needed. In early proposals for deploying Public Key Cryptography, it was proposed that this
could take the form of a trusted bulletin board, where all the public keys and corresponding
identities are simply listed. But this of course requires trust in the provider of the bulletin board
service.
A non-scalable and inflexible solution, but one that is commonly used in mobile applications
and IoT deployments, is to hard-code the required public key into the software of the party that
needs to use the public key. Here, security rests on the inability of an adversary to change the
public key by over-writing it in a local copy of the software, substituting it during a software
update, changing it in the code repository of the software provider, or by other means.
Another solution is to use public key digital certificates (or just certificates for short). These
are data objects in which the data includes identity, public key, algorithm type, issuance and
expiry dates, key usage restrictions and potentially other fields. In addition, the certificate
contains a digital signature, over all the other fields, of some Trusted Third Party (TTP) who
attests to the correctness of the information. This TTP is known as a Certification Authority
(CA). The most commonly used format for digital certificates is X.509 version 3 [1707].
The use of certificates moves the problem of verifying the binding implied by the digital
signature in the certificate to the authentic distribution of the CA’s public key. In practice, the
problem may be deferred several times via a certificate chain, with each TTP’s public key in
the chain being attested to via a certificate issued by a higher authority. Ultimately, this chain
is terminated at the highest level by a root certificate that is self-signed by a root CA. That is,
the root certificate contains the public verification key of the root CA and a signature that is
created using the matching private signing key. A party wishing to make use of a user-level
public key (called a relying party) must now verify a chain of certificates back to the root and
also have means of assuring that the root public key is valid. This last step is usually solved
by an out of band distribution of the root CA’s public key. Root CAs may also cross-sign each
other’s root certificates.
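The chain-walking logic described above can be sketched as follows. Here a "certificate" is a plain dictionary and its "signature" is a keyed hash with signing key equal to verification key — purely illustrative stand-ins for X.509 and a real signature scheme, not a secure construction:

```python
import hashlib

# Toy certificate-chain walk; all names and keys below are invented.
def toy_sign(key: bytes, data: bytes) -> bytes:
    return hashlib.sha256(key + data).digest()

def cert_body(cert) -> bytes:
    return cert["subject"].encode() + b"|" + cert["pubkey"]

def make_cert(subject: str, pubkey: bytes, issuer_key: bytes) -> dict:
    cert = {"subject": subject, "pubkey": pubkey}
    cert["sig"] = toy_sign(issuer_key, cert_body(cert))
    return cert

def verify_chain(chain, trusted_root_key: bytes) -> bool:
    # Walk from just below the root down to the leaf; each certificate must
    # verify under the key certified one step above it.
    issuer_key = trusted_root_key   # distributed out of band
    for cert in reversed(chain):
        if toy_sign(issuer_key, cert_body(cert)) != cert["sig"]:
            return False
        issuer_key = cert["pubkey"]
    return True

root_key = b"root-ca-key"
inter = make_cert("Intermediate CA", b"intermediate-key", root_key)
leaf = make_cert("example.org", b"server-key", b"intermediate-key")
print(verify_chain([leaf, inter], root_key))  # True
```

The structure mirrors the text: trust flows from the out-of-band root key, through each attested public key, down to the leaf; any broken link invalidates the whole chain.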
As an important and visible example of a PKI, consider the Web PKI. Web browser vendors
embed a list of the public keys of a few hundred different root CAs in their software and
update the list from time to time via their software update mechanisms, which in turn may
rely for its security on a separate PKI. Website owners pay to obtain certificates binding their
sites’ URLs to their public keys from subordinate CAs. Then, when running the TLS protocol
for secure communications between a web browser and a website, the website’s server sends
a certificate chain to the web browser client. The chain provides the web browser with a
copy of the server’s public key (in the lowest certificate from the chain, the leaf or end-entity
certificate) as well as a means of verifying the binding between the web site name in the form
of its URL, and that public key. The operations and conventions of the Web PKI are managed
by the CA/Browser Forum.28

18.3.8.2 Reliance on Naming, CA Operations and Time

In addition to needing a suitable binding mechanism, there must be a stable, controlled naming
mechanism for parties. Moreover, parties need to have means of proving to CAs that they own
a specific identity and CAs need to check such assertions. Equally, CAs need to be trusted
to only issue certificates to the correct parties. This aspect of PKI intersects heavily with
legal and regulatory aspects of Information Security and is covered in more detail in the Law &
Regulation Knowledge Area (Section 3.10.3).
For the Web PKI, there have been numerous incidents where CAs were found to have mis-
issued certificates, either because they were hacked (e.g. DigiNotar29 ), because of poor
control over the issuance process (e.g. TurkTrust30 ), or because they were under the control
of governments who wished to gain surveillance capabilities over their citizens. This can lead
to significant commercial impacts for affected CAs: in DigiNotar’s case, the company went
bankrupt. In other cases, CAs were found to not be properly protecting their private signing
keys, leaving them vulnerable to hacking.31 In response to a growing number of such incidents,
Google launched the Certificate Transparency (CT) effort. CT provides an open framework for
monitoring and auditing certificates; it makes use of multiple, independent public logs in an
attempt to record all the certificates issued by browser-trusted CAs. The protocols and data
formats underlying CT are specified in [1708].32
Relying parties (i.e. parties verifying certificates and then using the embedded public keys)
need access to reliable time sources to be sure that the certificate’s lifetime, as encoded in
28 See https://cabforum.org/.
29 See https://en.wikipedia.org/wiki/DigiNotar.
30 See https://nakedsecurity.sophos.com/2013/01/08/the-turktrust-ssl-certificate-fiasco-what-happened-and-what-happens-next/.
31 See for example the case of CNNIC, https://techcrunch.com/2015/04/01/google-cnnic/.
32 See also https://certificate.transparency.dev/ for the project homepage.


the certificate, is still valid. Otherwise, an attacker could send an expired certificate for which
it has compromised the corresponding private key to a relying party and get the relying party
to use the certificate’s public key. This requirement can be difficult to fulfil in low-cost or
constrained environments, e.g. IoT applications.
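As a minimal sketch of this lifetime check, assuming a dictionary standing in for a parsed X.509 certificate (the field names are merely suggestive):

```python
from datetime import datetime, timezone

# Sketch of the relying party's certificate lifetime check.
def lifetime_valid(cert: dict, now: datetime) -> bool:
    # 'now' must come from a reliable clock, or an attacker holding the
    # private key of an expired certificate can still impersonate its owner.
    return cert["not_before"] <= now <= cert["not_after"]

cert = {
    "not_before": datetime(2021, 1, 1, tzinfo=timezone.utc),
    "not_after":  datetime(2022, 1, 1, tzinfo=timezone.utc),
}
print(lifetime_valid(cert, datetime(2021, 7, 1, tzinfo=timezone.utc)))  # True
print(lifetime_valid(cert, datetime(2023, 7, 1, tzinfo=timezone.utc)))  # False
```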

18.3.8.3 Reliance on Certificate Status Information

Relying parties verifying certificates also need access to reliable, timely sources of information
about the status of certificates — whether the certificate is still valid or has been revoked for
some security or operational reason. This can be done by regularly sending lists of revoked
certificates to relying parties (known as Certificate Revocation Lists, CRLs), or having the
relying parties perform a real-time status check with the issuing CA before using the public
key using the Online Certificate Status Protocol, OCSP [1709]. The former approach is more
private for relying parties, since the check can be done locally, but implies the existence of a
window of exposure for relying parties between the time of revocation and the time of CRL
distribution. The latter approach provides more timely information but implies that large CAs
issuing many certificates need to provide significant bandwidth and computation to serve the
online requests.
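The CRL approach can be sketched in a few lines; the serial numbers below are invented for illustration:

```python
# The relying party holds a locally cached copy of the latest revocation
# list and checks against it without any network round trip.
cached_crl = {0x1A2B, 0x3C4D}   # serials parsed from the most recent CRL

def is_revoked(serial: int) -> bool:
    # Local check: private (the CA learns nothing about which certificates
    # the relying party uses) but only as fresh as the last CRL download.
    return serial in cached_crl

print(is_revoked(0x1A2B))   # True: listed in the cached CRL
print(is_revoked(0x9999))   # False: not listed, though it may have been
                            # revoked since the CRL was issued (the window
                            # of exposure mentioned above)
```

OCSP inverts this trade-off: the status query is sent to the CA at connection time, which is timelier but reveals the relying party's browsing activity and costs the CA bandwidth.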
In the web context, OCSP has become the dominant method for checking revocation status of
certificates. OCSP’s bandwidth issue is ameliorated by the practice of OCSP stapling, wherein
a web server providing a certificate regularly performs its own OCSP check and includes
the certified response from its CA along with its certificate. In an effort to further improve
user privacy, in 2020, Mozilla experimentally deployed33 an approach called CRLite developed
in [1710] in their Firefox browser. CRLite uses CT logs and other sources of information to
create timely and compact CRLs for regular distribution to web browsers.

18.3.8.4 Reliance on Correct Software and Unbroken Cryptography

The software at a relying party that validates certificate chains needs to work properly. This
is non-trivial, given the complexity of the X.509 data structures involved, the use of complex
encoding languages and the need to accurately translate security policy into running code.
There have been numerous failures. A prominent and entirely avoidable example is Apple’s
“goto fail” from 2014. Here a repeated line of code34 for error handling in Apple’s certificate
verification code in its SSL/TLS implementation caused all certificate checking to be bypassed.
This made it trivial to spoof a web server’s public key in a fake certificate to clients running
Apple’s code. This resulted in a total bypass of the server authentication in Apple’s SSL/TLS
implementation, undermining all security guarantees of the protocol.35
The certificate industry has been slow to react to advances in the cryptanalysis of algorithms
and slow to add support for new signature schemes. The story of SHA-1 and its gradual
removal from the Web PKI is a prime example. This relates to the discussion of cryptographic
agility in Section 18.1.14. The first cracks in SHA-1 appeared in 2005 [1711]. Already at this
point, cryptographers, taking their standard conservative approach, recommended that SHA-1
be deprecated in applications requiring collision resistance. From 2005 onwards, the
cryptanalysis of SHA-1 was refined and improved. Finally, in 2017, the first public collisions for
33 See https://blog.mozilla.org/security/2020/01/09/crlite-part-1-all-web-pki-revocations-compressed/.
34 The offending line of code was literally “goto fail”.
35 See https://dwheeler.com/essays/apple-goto-fail.html for a detailed write-up of the incident and its implications for Apple’s software development processes.


SHA-1 were exhibited [1712]. This was followed in 2019 by a chosen-prefix collision attack
that directly threatened the application of SHA-1 in certificates [1713]. However, despite the
direction of travel having been clear for more than a decade, it took until 2017 before the
major web browsers finally stopped accepting SHA-1 in Web PKI certificates. Today, SHA-1
certificates are still to be found in payment systems and elsewhere. The organisations run-
ning these systems are inherently change-averse because they have to manage complex
systems that must continue to work across algorithm and technology changes. In short, these
organisations are not cryptographically agile, as discussed in Section 18.1.14.

18.3.8.5 Other Approaches to Managing Public Keys

The web of trust is an alternative to hierarchical PKIs in which the users in a system vouch for
the authenticity of one another’s public keys by essentially cross-certifying each other’s keys.
It was once popular in the PGP community but did not catch on elsewhere. Such a system
poses significant usability challenges for ordinary users [347].
Identity-Based Cryptography (IBC) [1714] offers a technically appealing alternative to traditional
Public Key Cryptography in which users’ private keys are derived directly from their identities
by a TTP called the Trusted Authority (TA) in possession of a master private key. The benefit
is that there is no need to distribute public keys; a relying party now needs only to know an
identity and have an authentic copy of the TA’s public key. The down-side for many applica-
tion domains is that trust in the TA is paramount, since it has the capability to forge users’
signatures and decrypt ciphertexts intended for them through holding the master private
key. On the other hand, IBC’s built-in key escrow property may be useful in corporate security
applications. Certificateless Cryptography [1715] tries to strike a balance between traditional
PKI and IBC. These and other related concepts have sparked a lot of scientific endeavour, but
little deployment to date.

18.4 CONSUMING CRYPTOGRAPHY


The content of this section is based on the author’s personal experience.

18.4.1 The Challenges of Consuming Cryptography


Numerous people, including those who are well educated in computer science and math-
ematics, can and do regularly make fundamental errors when attempting to devise novel
cryptographic schemes or to implement existing mechanisms.
Perhaps one reason for this is that many people receive informal exposure to cryptography
through popular culture, where it is often shown in a simplified way, for example using pen
and paper ciphers or watching a password being revealed character-by-character through
some unspecified search process. The subject also has basic psychological appeal — after
all, who does not love receiving secret messages?
The issue shows up in several different ways. Least dangerous, the author has seen that
cryptographic conferences and journals regularly receive submissions from people who are
not aware of the advanced state-of-the-art. A classic trope is papers on encryption algorithms
for digital images, where usually some low-level manipulation of pixel data is involved and
security rests on taking a standard benchmark picture from the image processing community


and showing that it is visually scrambled by the algorithm. (Of course, image data is ultimately
represented by bits and standard cryptographic algorithms operate on those bits.) Other topics
common in such papers are chaos-based cryptography and combining multiple schemes
(RSA, ElGamal, etc.) to make a stronger one. The activity of generating such papers is a waste
of time for the authors and reviewers alike, while it misleads students involved in writing the
papers about the true nature of cryptography as a research topic.
This author has seen multiple examples where complete outsiders to the field have been
persuaded to invest in cryptographic technologies which either defy the laws of information
theory or which fall to the “kitchen sink” fallacy of cryptographic design — push the data
through enough complicated steps and it must be secure. Another classic design error is for
an inventor to fall under the spell of the “large keys” fallacy: if an algorithm has a very large
key space, then surely it must be secure? Certainly a large enough key space is necessary
for security, but it is far from sufficient. A third fallacy is that of “friendly cryptanalysis”: the
inventor has tried to break the new algorithm themselves, so it must be secure. There is no
substitute for independent analysis.
Usually these technologies are invented by outsiders to the field. They may have received
encouragement from someone who is a consumer of cryptography but not themselves an
expert or someone who is too polite to deliver a merciful blow. Significant effort may be
required to dissuade the original inventors and their backers from taking the technology
further. A sometimes useful argument to deploy in such cases is that, while the inventor’s
idea may or may not be secure, we already have available standardised, carefully-vetted,
widely-deployed, low-cost solutions to the problem and so it will be hard to commercialise the
invention in a heavily commoditised area.
Another set of issues arises when software developers, perhaps with the best of intentions and
under release deadline pressure, “roll their own crypto”. Maybe having taken an introductory
course in Information Security or Cryptography at Bachelor’s level, they have accrued enough
knowledge not to try to make their own low-level algorithms and they know they can use
an API to a cryptographic library to get access to basic encryption and signing functions.
However, with today’s cryptographic libraries, it is easy to accidentally misuse the API and
end up with an insecure system. Likely the developer wants to do something more complex
than simply encrypting some plaintext data, and instead needs to plug together a collection of
cryptographic primitives to achieve it. This can lead to the “kitchen sink”
fallacy at the system level. Then there is the question of how the developer’s code should deal
with key management — recall that cryptographic schemes only shift the problem of securing
data to that of securing keys. Unfortunately, key management is rarely taught to Bachelor’s
students as a first class issue and this author has seen that basic issues like hard-coded keys
are still found on a regular basis in deployed cryptographic software.


18.4.2 Addressing the Challenges


How can individuals and businesses acting as consumers of cryptography avoid these prob-
lems? Some general advice follows.
There is no free lunch in cryptography. If someone without a track record or proper credentials
comes bearing a new technological breakthrough, or just a new algorithm: beware. Consumers
can look for independent analyses by reputable parties — there are companies and individuals
who can provide meaningful cryptographic review. Any reputable consultant can supply a
list of recent consulting engagements and contact details for references. Consumers can
also check for scientific publications in reputable academic venues that support any new
technology; very few crank ideas survive peer review, and consumers should look for clear
indications of publication quality. Consumers can learn to detect cryptographic snake-oil, for
example, by looking for instances of the kitchen sink, large keys and friendly cryptanalysis
fallacies.36
Developers should not roll their own cryptographic algorithms. They should rely on vetted,
standardised algorithms already available in packaged forms through cryptographic libraries.
Ideally, they should not roll their own higher-level cryptographic systems and protocols, but rely
on existing design patterns and standards. As just one example, there is usually no need to build
a new, secure, application-layer communications protocol when SSL/TLS support is ubiquitous.
Developers who are developing cloud-based solutions (an extremely common use-case)
should make use of cryptographic services from cloud providers — an emerging approach
called “Cryptography-as-a-Service” (CaaS). This is a natural extension of the “Software-as-
a-Service” paradigm, but commonly extends it to include key management services. A key
feature in commercial CaaS offerings is “HSM-as-a-service”, allowing service users to avoid
the cost and expertise needed to maintain on-premise HSMs.37
When these options are not possible, developers should seek expert advice (and not Crypto
Stack Exchange [1716]). Very large organisations should have an in-house cryptography devel-
opment team who vet all uses of cryptography before they go into production, ideally with
involvement from the design stage. There is at least one large software company that today
runs detection tools across its entire codebase to find instances where cryptography is being
misused, and sends email alerts to the cryptography development team.
Smaller organisations for whom cryptography is a core technology should employ applied
cryptographers with proven credentials in the field. Alternatively such medium-sized organisa-
tions — and the smallest companies — should cultivate relationships with trusted external
partners who can provide cryptographic consulting services. The cost of such services is
relatively small compared to the potential damage to reputation and shareholder value in the
event of a major security incident arising from the improper use of cryptography.
36 A more extensive list of snake-oil indicators can be found at: http://www.interhack.net/people/cmcurtin/snake-oil-faq.html.
37 Without wishing to favour any particular vendors, interested readers can learn more about the typical features of such services at https://aws.amazon.com/cloudhsm/ or https://www.entrust.com/digital-security/certificate-solutions/products/pki/managed-services/entrust-cryptography-as-a-service.


18.4.3 Making Cryptography Invisible


Ultimately, the best kind of cryptography is that which is as invisible as possible to its end
users. Five years ago web browsers displayed locks of different colours to indicate whether
SSL/TLS connections were secure or not, but users could always click-through to reach the
website regardless. Today, web browsers like Google Chrome and Mozilla Firefox are more
aggressive in how they handle SSL/TLS security failures and in some cases simply do not
allow a connection to be made to the website.38 There is a continuous tension here between
protecting users and enabling functionality that users demand. For example, at the time of
writing, users can still freely visit websites that do not offer SSL/TLS connections in both
Chrome and Firefox, receiving a “not secure” warning in the browser bar; this behaviour may
change in the future as more and more websites switch to supporting SSL/TLS.
Meanwhile, for more than 25 years, in most jurisdictions, mobile telephone calls have been
encrypted between the mobile device and the base-station (or beyond) in order to prevent
eavesdropping on the broadcast communications medium. However, even in the latest 5G
systems, the encryption is optional [1717, Annex D] and can be switched off by the network
operator. Moreover, very few mobile telephones actually display any information to the user
about whether encryption is enabled or not, so it is debatable to what extent users are really
protected.
By contrast, secure messaging services like Signal offer end-to-end security by default and
with no possibility of turning these security features off. Signal’s servers are not able to read
users’ messages — as a January 2021 advert for Signal pointedly stated, “we know nothing”.
Cryptographic security and user privacy are core to the company’s business model. So much
so that if a significant cryptographic flaw were to be found in Signal’s design, then, at the very
least, it would lead to a significant loss of customers and possibly even lead to the company’s
downfall. Yet there is very little in-app security messaging — there is no equivalent of a browser
lock that users are asked to interpret, for example.

18.5 APPLIED CRYPTOGRAPHY IN ACTION


[444, 1480, 1718]
Having explored the landscape of applied cryptography from the bottom up, we now take a
brief look at three different application areas, using them to draw out some of the key themes
of the KA.
38 Readers interested in learning more about what different browsers do and do not allow can visit https://badssl.com/ or http://example.com.


18.5.1 Transport Layer Security


The Transport Layer Security (TLS) protocol has already been mentioned several times in
passing. The protocol has a long history, arriving at TLS 1.3 [1480], completed in 2018. It
is a complex protocol with several interrelated sub-protocols: the TLS Handshake Protocol
uses (usually) asymmetric cryptography to establish session keys; these are then used in the
TLS Record Protocol in conjunction with symmetric techniques to provide confidentiality and
integrity for streams of application data; meanwhile the TLS Alert Protocol can be used to
transport management information and error alerts. TLS is now used to protect roughly 95%
of all HTTP traffic.39 In this application domain, TLS relies heavily on the Web PKI for server
authentication.
TLS 1.3 represents a multi-year effort by a coalition of industry-based engineers and academics
working under the aegis of the IETF to build a general-purpose secure communications
protocol. The main drivers for starting work on TLS 1.3 were, firstly, to improve the security of
the protocol by updating the basic design and removing outdated cryptographic primitives
(e.g. switching the MtE construction for AEAD only; removing RSA key transport in favour of
DHKE to improve forward security) and, secondly, to improve the performance of the protocol
by reducing the number of communication round trips needed to establish a secure channel.
Specifically, in earlier versions of TLS, two round trips were needed, while in TLS 1.3 only one
is needed in most cases and even a zero round trip mode is supported in TLS 1.3 when keys
have been established in a previous session.
The design process for TLS 1.3 is described and contrasted with the process followed for
previous versions in [1661]. A key difference is the extensive reliance on formal security
analysis during the design process. This resulted in many research papers being published
along the way, amongst which we highlight [1719, 1720, 1721]. Some of the formal analysis was
able to detect potential security flaws in earlier versions of the protocol, thereby influencing
the protocol design in crucial ways — for example, the use of HKDF in a consistent, hierarchical
manner to process keying material from different stages of the protocol. The protocol was
also modified to make it more amenable to formal security analysis. It is an almost complete
success story in how academia and industry can work together. Only a few criticisms can be
levelled at the process and the final design:
• Not all of the security and functionality requirements were elicited at the outset, which
meant many design changes along the way, with attendant challenges for researchers
doing formal analysis. However, such an iterative approach seems to be unavoidable in
a complex protocol designed to serve many use cases.
• The formal analyses missed a few corner cases, especially in the situation where au-
thentication is based on pre-shared symmetric keys. An attack in one such corner case
was subsequently discovered [1722].
• The zero round trip mode of TLS 1.3 has attractive latency properties but achieves these
at the expense of forward security. Defending against replay attacks in this mode is
difficult in general and likely impossible in distributed server settings when interactions
with typical web clients are taken into account. Recent research showing how to add
forward security to the zero round trip mode of TLS 1.3 can be found in [1723, 1724, 1725,
1726].
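The hierarchical HKDF-based key derivation mentioned above follows the extract-then-expand pattern of RFC 5869. The sketch below implements it for SHA-256; the salt, secret and labels are invented for illustration and are not the actual TLS 1.3 labels:

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output length

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # Extract: concentrate the input keying material into one pseudorandom key.
    return hmac.new(salt or b"\x00" * HASH_LEN, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    # Expand: stretch the PRK into as many output bytes as needed, bound to
    # a context label 'info' so keys for different purposes stay independent.
    okm, block = b"", b""
    for counter in range(1, -(-length // HASH_LEN) + 1):
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
    return okm[:length]

prk = hkdf_extract(b"handshake-salt", b"shared-secret-from-key-exchange")
client_key = hkdf_expand(prk, b"client traffic", 32)
server_key = hkdf_expand(prk, b"server traffic", 32)
assert client_key != server_key   # distinct labels give independent keys
```

Deriving every stage's keys through labelled HKDF invocations in a fixed hierarchy is what makes the TLS 1.3 key schedule both consistent and amenable to formal analysis.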
39 See https://transparencyreport.google.com/https/overview?hl=en for Google data supporting this statistic.


18.5.2 Secure Messaging


On the surface, secure messaging seems to have a very similar set of requirements to TLS:
we have pairs of communicating parties who wish to securely exchange messages. However,
there are significant differences, leading to different cryptographic solutions. First, secure
messaging systems are asynchronous, meaning that the communicating parties cannot easily
run an equivalent of the TLS Handshake Protocol to establish symmetric keys. Second, there
is no infrastructure analogous to the Web PKI that can be leveraged to provide authentication.
Third, group communication, rather than simply pairwise communication, is an important
use case. Fourth, there is usually a bespoke server that is used to relay messages between
the pairs of communicating parties but which should not have access to the messages
themselves.
We discuss three different secure messaging systems: Apple’s iMessage, Signal (used in
WhatsApp) and Telegram. We focus on the two-party case for brevity.

18.5.2.1 Apple iMessage

Apple’s iMessage system historically used an ad hoc signcryption scheme that was shown to
have significant vulnerabilities in [1727]. This was despite signcryption being a well-known
primitive with well-established models, generic constructions from PKE and digital signatures
and security proofs in the academic literature. Conjecturally, the designers of Apple’s scheme
were constrained by the functions available in their cryptographic library. The Apple system
relied fully on trust in Apple’s servers to distribute authentic copies of users’ public keys — a
PKI by fiat. The system was designed to be end-to-end secure, meaning that without active
impersonation via key substitution, Apple could not read users’ messages. It did not enjoy
any forward-security properties, however: once a user’s private decryption key was known, all
messages intended for that user could be read. Note that Apple’s iMessage implementation
is not open source; the above description is based on the reverse engineering carried out
in [1727] and so may no longer be accurate.

18.5.2.2 Signal

The Signal design, which is used in both Signal and WhatsApp, takes a slightly different ap-
proach in the two-party case. It uses a kind of asynchronous DHKE approach called ratcheting.
At a high level, every time Alice sends user Bob a new message, she also includes a Diffie-
Hellman (DH) value and updates her symmetric key to one derived from that DH value and the
DH value she most recently received from Bob. On receipt, Bob combines the incoming DH
value with the one he previously sent to make a new symmetric key on his side. This key is
called a chaining key.
For each message that Alice sends to Bob without receiving a reply from Bob, she derives
two new keys from the current chaining key by applying a KDF (based on HKDF) to it; one key
is used as the next chaining key, the other is used to encrypt the current message. This is
also called ratcheting by the Signal designers and the combination of ratcheting applied to
both DH values and symmetric keys is called double ratcheting.40 This mechanism provides
forward security for Signal messages, despite its asynchronous nature. It also provides post
compromise security. The use of ratcheting, however, entails problems with synchronisation:
40 See https://signal.org/docs/specifications/doubleratchet/ for a concise overview of the process.


if a message is lost between Alice and Bob, then their keys will end up in different states. This
is solved by keeping caches of recent chaining keys.
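The symmetric half of the ratchet can be sketched as follows; the one-byte labels and zero-valued starting key are illustrative, not Signal's actual constants:

```python
import hashlib
import hmac

# Each send derives (i) the next chaining key and (ii) a one-time message
# key from the current chaining key, using HMAC as the KDF.
def ratchet_step(chain_key: bytes):
    next_chain = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    message_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain, message_key

chain = b"\x00" * 32     # stands in for a chaining key from the DH ratchet
message_keys = []
for _ in range(3):       # three messages sent without hearing from Bob
    chain, mk = ratchet_step(chain)
    message_keys.append(mk)

# Forward security comes from deleting each chaining key after stepping:
# the KDF cannot be run backwards to recover earlier keys.
assert len(set(message_keys)) == 3
```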
For symmetric encryption, Signal uses a simple generic AE construction based on EtM relying
on CBC mode using AES with 256 bit keys for the “E” component and HMAC with SHA-256 for
the “M” component. This is a conservative and well-understood design.
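The generic EtM composition can be sketched as below. To keep the sketch self-contained, a SHA-256 counter keystream stands in for AES-CBC, so this illustrates the EtM pattern only, not Signal's actual cipher:

```python
import hashlib
import hmac
import secrets

# Toy keystream cipher for the "E" component -- NOT a real cipher.
def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    out = b""
    for i in range(-(-n // 32)):
        out += hashlib.sha256(key + nonce + bytes([i])).digest()
    return out[:n]

def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    body = bytes(p ^ k for p, k in
                 zip(plaintext, keystream(enc_key, nonce, len(plaintext))))
    ct = nonce + body
    tag = hmac.new(mac_key, ct, hashlib.sha256).digest()  # MAC the ciphertext
    return ct + tag

def open_(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    ct, tag = blob[:-32], blob[-32:]
    # Verify the MAC before touching the ciphertext.
    if not hmac.compare_digest(hmac.new(mac_key, ct, hashlib.sha256).digest(), tag):
        raise ValueError("authentication failure")
    nonce, body = ct[:16], ct[16:]
    return bytes(c ^ k for c, k in zip(body, keystream(enc_key, nonce, len(body))))

enc_key, mac_key = secrets.token_bytes(32), secrets.token_bytes(32)  # independent keys
blob = seal(enc_key, mac_key, b"attack at dawn")
assert open_(enc_key, mac_key, blob) == b"attack at dawn"
```

The two design choices visible here are the ones that make EtM robust: the MAC is computed over the ciphertext and checked before any decryption, and the encryption and MAC keys are independent.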
Authentication in Signal is ultimately the same as in iMessage: it depends on trust in the server.
The idea is that users register a collection of DH values at the server; these are fetched by other
users and used to establish initial chaining keys. However, a malicious server could replace
these values and thereby mount a MitM attack. The use of human-readable key fingerprints
provides mitigation against this attack.
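A minimal sketch of such a fingerprint, assuming a simple hash-and-group format (the format is invented for illustration; Signal's real "safety numbers" use their own encoding over both parties' identity keys):

```python
import hashlib

# Hash the public key bytes and render the digest in short groups that two
# users can read out to each other over a separate channel.
def fingerprint(pubkey: bytes, groups: int = 8) -> str:
    digest = hashlib.sha256(pubkey).hexdigest()
    return " ".join(digest[i:i + 4] for i in range(0, 4 * groups, 4))

# Both users compute this over the key material their apps actually hold;
# a mismatch exposes a key-substitution attack by a malicious server.
print(fingerprint(b"illustrative-public-key-bytes"))
```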
A formal security analysis of the double ratcheting process used by Signal can be found
in [444]. Note that, in order to tame complexity, this analysis does not treat the composition of
the double ratchet with the symmetric encryption component. The Signal design has spurred a
spate of recent research into the question of what is the best possible security one can achieve
in two-party messaging protocols and how that security interacts with the synchronisation
issues.

18.5.2.3 Telegram

A third design is that used by Telegram.41 It is notable for the way it combines various crypto-
graphic primitives (RSA, finite field DHKE, a hash-based key derivation function, a hash-based
MAC and a non-standard encryption mode called IGE). Moreover, it does not have proper
key separation: keys used to protect messages from Alice to Bob share many overlapping
bits with keys used in the opposite direction; moreover those key bits are taken directly from
“raw” DHKE values. These features present significant barriers to formal analysis and violate
cryptographic best practices. Furthermore, Telegram does not universally feature end-to-end
encryption; rather it has two modes, one of which is end-to-end secure, the other of which
provides secure communications only from each client to the server. The latter seems to
be much more commonly used in practice, but is of course subject to interception. This
is concerning, given that Telegram is frequently used by higher-risk users in undemocratic
countries.
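For contrast, the key-separation best practice that Telegram's design violates is straightforward to follow: run the raw DH output through a KDF and derive independent keys per direction (and per purpose) under distinct labels. A sketch using an HKDF-like extract-and-expand built from HMAC-SHA-256; the labels and zero salt are illustrative assumptions:

```python
import hmac
import hashlib

def derive_direction_keys(raw_dh_secret: bytes):
    """Derive independent per-direction keys from a shared DH secret,
    instead of slicing overlapping bits out of the raw value."""
    # Extract: concentrate the secret's entropy into a fixed-size PRK.
    prk = hmac.new(b"\x00" * 32, raw_dh_secret, hashlib.sha256).digest()
    # Expand: one independent key per direction, bound to a label.
    k_a2b = hmac.new(prk, b"alice-to-bob" + b"\x01", hashlib.sha256).digest()
    k_b2a = hmac.new(prk, b"bob-to-alice" + b"\x01", hashlib.sha256).digest()
    return k_a2b, k_b2a
```

Because HMAC behaves as a PRF, the two derived keys are computationally independent: learning one reveals nothing about the other, which is precisely the property lost when key bits overlap.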

18.5.3 Contact Tracing à la DP-3T


The DP-3T project42 was formed by a group of academic researchers in response to the COVID-
19 pandemic, with the core aim of rapidly developing automated contact tracing technology
based on mobile phones and Bluetooth Low Energy beacons. A central objective of DP-3T was
to enable automated contact tracing in a privacy-preserving way, that is, without using location data
and without storing lists of contacts at a central server. DP-3T’s approach directly influenced
the Google-Apple Exposure Notification (GAEN)43 system that forms the basis for dozens of
national contact tracing apps around the world, including (after a false start) the UK system.
An overview of the DP-3T proposal is provided in [1718].
The DP-3T design uses cryptography at its heart. Each phone generates a symmetric key and
uses this as the root of a chain of keys, one key per day. Each day key in the chain is then used
41 See https://telegram.org/.
42 https://github.com/DP-3T/documents
43 See https://www.google.com/covid19/exposurenotifications/ and https://covid19.apple.com/contacttracing.


to generate, using a pseudo-random generator built from AES in CTR mode, a sequence of 96
short pseudo-random strings called beacons. At each 15-minute time interval during the day,
the next beacon from the sequence is selected and broadcast using BLE. Other phones in the
vicinity pick up and record the beacon-carrying BLE signals and store them in a log along with
metadata (time of day, received signal strength). Notice that the beacons are indistinguishable
from random strings, under the assumption that AES is a good block cipher.
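The key schedule can be sketched as follows. DP-3T hashes each day key with SHA-256 to obtain the next one and expands a day key into beacons with AES in CTR mode; since AES is not in Python's standard library, an HMAC-based counter-mode pseudo-random generator stands in for the beacon expansion here.

```python
import hmac
import hashlib

def next_day_key(day_key: bytes) -> bytes:
    # The day-key chain: the next day's key is the hash of today's.
    return hashlib.sha256(day_key).digest()

def day_beacons(day_key: bytes, n: int = 96, size: int = 16):
    # Expand the day key into n short beacons, one per 15-minute slot
    # (DP-3T uses AES-CTR; this HMAC-based PRG is a stand-in).
    stream = b""
    counter = 0
    while len(stream) < n * size:
        stream += hmac.new(day_key, counter.to_bytes(4, "big"),
                           hashlib.sha256).digest()
        counter += 1
    return [stream[i * size:(i + 1) * size] for i in range(n)]
```

Note that the chain is one-way: publishing today's day key lets anyone regenerate today's (and later days') beacons, but reveals nothing about earlier days' keys.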
When a user of the system receives a positive COVID-19 test, they instruct their phone to upload
the recent day keys to a central server, possibly along with sent signal strength information.44
All phones in the system regularly poll the server for the latest sets of day keys, use them to
regenerate beacons and look in their local logs to test if at some point they came into range of
a phone carried by a person later found to be infected. Using sent and received signal strength
information in combination with the number and closeness (over time) of matching beacons,
the phone can compute a risk score. If the score is above a threshold, then the phone user can
be instructed to get a COVID-19 test themselves. Setting the threshold in practice to balance
false positives against false negatives is a delicate exercise made more difficult by the fact
that BLE permits only inaccurate range estimation.
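The phone-side matching reduces to regenerating beacons from published day keys and intersecting them with the local log. A schematic sketch: the regeneration function is passed in as a parameter, and the threshold rule is a stand-in for the real risk score, which also weighs signal strength and exposure duration.

```python
def matching_beacons(published_day_keys, regenerate, local_log):
    """Return every regenerated beacon that appears in the local log.
    `regenerate` maps a day key to that day's list of beacons."""
    seen = set(local_log)  # beacons this phone overheard via BLE
    hits = []
    for day_key in published_day_keys:
        hits.extend(b for b in regenerate(day_key) if b in seen)
    return hits

def at_risk(hits, threshold: int = 3) -> bool:
    # Stand-in rule: flag the user once enough beacons match.
    return len(hits) >= threshold
```

All of this runs locally on the phone: the server only ever sees uploaded day keys, never the log of overheard beacons.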
Notice that the central server in DP-3T stores only day keys released from phones by infected
parties. The central server is not capable of computing which users were in proximity to which
other users, nor even the identity of users who uploaded keys (though this information will
become visible to the health authority because of the issuance of authorisation codes). All
the comparison of beacons and risk computations are carried out on users’ phones. One
can contend that a fully centralised system could provide more detailed epidemiological
information — some epidemiologists unsurprisingly made this argument. On the other hand,
the strict purpose of the DP-3T system was to enable automated contact tracing, not to provide
an epidemiological research tool. A more detailed privacy analysis of DP-3T can be found
in [1718].
The DP-3T design was produced, analysed, prototyped and deployed under test conditions all
in the space of a few weeks. After adoption and adaptation by Google and Apple, it made its
way into national contact tracing apps within a few months. Given the pace of development,
simplicity of the core design was key. Only “off the shelf” cryptographic techniques available
in standard cryptographic libraries could be used. Given the likely scaling properties of the
system (possibly tens of millions of users per country) and the constraints of BLE message
sizes, using Public Key Cryptography was not an option; only symmetric techniques could
be countenanced. Many follow-up research papers have proposed enhanced designs using
more complex cryptographic techniques. The DP-3T team did not have this luxury and instead
stayed resolutely pragmatic in designing a system that balances privacy, functionality and
ease of deployment, and that resists repurposing.
44 To prevent spurious uploads, the day keys can only be uploaded after entering an authorisation code issued by the local health authority into the app.


18.6 THE FUTURE OF APPLIED CRYPTOGRAPHY


The first 2000 years of applied cryptography were mostly about securing data in transit,
starting from the Caesar cipher and ending with today’s mass deployment of TLS, secure
messaging and privacy-preserving contact tracing systems. Today, cryptography is also heavily
used to protect data at rest, the second element in our cryptographic triumvirate from the
KA’s introduction. These two application domains will continue to develop and become more
pervasive in our everyday lives — consider for example the rise of the Internet of Things and
its inherent need for communications security.
We anticipate significant developments in the following directions:
• The debate around lawful access to encrypted data will continue. However, we reiterate
that in the face of determined but not even particularly sophisticated users, this is a
lost battle. For example, it seems unlikely that state security agencies can break the
cryptography that is used in Signal today. Instead, they will have to continue to bypass
cryptographic protections through exploitation of vulnerabilities in end systems. Legal
frameworks to enable this exist in many countries. Of course, governments could pass
legislation requiring developers to include mechanisms to enable lawful access, or could
pressure vendors into removing secure messaging applications from app stores.
• We expect that the current focus on cryptocurrencies and blockchains will result in a core
set of useful cryptographic technologies. For example, anonymous cryptocurrencies
have already been an important vehicle for forcing innovation in and maturation of
zero-knowledge proofs.
• The third element in our cryptographic triumvirate was cryptography for data under
computation. This area is undergoing rapid technical development and there is a healthy
bloom of start-up companies taking ideas like FHE and MPC to market. A good overview
of the status for “applied MPC” can be found in [463], while [1728] provides insights into
deployment challenges specific to FHE. The idea of being able to securely outsource
one’s data to third party providers and allow them to perform computations on it (as
FHE does) is very alluring. However, some 15 years after its invention, FHE still incurs
something like a 10⁶ to 10⁸ times overhead compared to computing on plaintext data.
This limits its application to all but the most sensitive data at small scales. Meanwhile,
the core applications become ever more data- and computation-hungry, so remain
out of reach of FHE for now. FHE, MPC and related techniques also face a significant
challenge from trusted execution technologies like SGX. Approaches like SGX still rely on
cryptography for attesting to the correct execution of code in secure enclaves, but can
effectively emulate FHE functionality without such large overheads. On the other hand,
SGX and related technologies have themselves been the subject of a series of security
vulnerabilities so may offer less protection in practice than cryptographic approaches.
• One area where computing on encrypted data may have a medium-term commercial
future is in specialised applications such as searching over encrypted data and more
generally, database encryption. Current state-of-the-art solutions here trade efficiency
(in terms of computation and bandwidth overheads) for a certain amount of leakage.
Quantifying the amount of leakage and its impact on security is a challenging research
problem that must be solved before these approaches can become widely adopted.45
45 But see https://docs.mongodb.com/manual/core/security-client-side-encryption/ for details of MongoDB's use of deterministic symmetric encryption to enable searchable field-level encryption.


• Another growth area for applied cryptography is in privacy-preserving techniques for data-
mining and data aggregation. Google’s privacy-preserving advertising framework [1729]
provides one prominent example. Another is the Prio system [461] that allows privacy-
preserving collection of telemetry data from web browsers. Prio has been experimentally
deployed in Mozilla’s Firefox browser.46
• Electronic voting (e-voting) has long been touted as an application area for cryptography.
There is a large scientific literature on the problem. However, the use of e-voting in
local and national elections has proved problematic, with confidence-sapping security
vulnerabilities having been found in voting software and hardware. For example, a recent
Swiss attempt to develop e-voting was temporarily abandoned after severe flaws were
found in some of the cryptographic protocols used in the system during a semi-open
system audit [1730]. The Estonian experience has been much more positive, with a
system built on Estonia’s electronic identity cards having been in regular use (and having
seen regular upgrades) since 2005. Key aspects of the Estonian success are openness,
usability and the population’s broad acceptance of and comfort with online activity.
• We may see a shift in how cryptography gets researched, developed and then deployed.
The traditional model is the long road from research to real-world use. Ideas like MPC
have been travelling down this road for decades. Out of sheer necessity, systems like DP-
3T have travelled down the road much more quickly. A second model arises when practice
gets ahead of theory and new theory is eventually developed to analyse what is being
done in practice; often this leads to a situation where the practice could be improved by
following the new theory, but the improvements are slow in coming because of the drag
of legacy code and the difficulty of upgrading systems in operation. Sometimes a good
attack is needed to stimulate change. A third model is represented by TLS 1.3: academia
and industry working together to develop a complex protocol over a period of years.
• Cryptography involves a particular style of thinking. It involves quantifying over all adver-
saries in security proofs (and not just considering particular adversarial strategies), being
conservative in one’s assumptions, and rejecting systems even if they only have “certifi-
cational flaws”. Such adversarial thinking should be more broadly applied in security
research. Attacks on machine learning systems are one area where this cross-over is already bearing fruit.

CROSS-REFERENCE OF TOPICS VS REFERENCE MATERIAL

Topics Cites
18.1 Algorithms, Schemes and Protocols [963, 1619, 1620]
18.2 Cryptographic Implementation [405, 1453, 1637, 1677]
18.3 Key Management [1620, 1666, 1696, 1697]
18.4 Consuming Cryptography
18.5 Applied Cryptography in Action [444, 1480, 1718]
18.6 The Future of Applied Cryptography

46 See https://blog.mozilla.org/security/2019/06/06/next-steps-in-privacy-preserving-telemetry-with-prio/.



Chapter 19
Network Security
Christian Rossow, CISPA Helmholtz Center for Information Security
Sanjay Jha, University of New South Wales

645

INTRODUCTION
The ubiquity of networking allows us to connect all sorts of devices and gain unprecedented
access to a whole range of applications and services anytime, anywhere. However, our heavy
reliance on networking technology also makes it an attractive target for malicious users who
are willing to compromise the security of our communications and/or cause disruption to
services that are critical for our day-to-day survival in a connected world. In this chapter, we
will explain the challenges associated with securing a network under a variety of attacks for a
number of networking technologies and widely used security protocols, along with emerging
security challenges and solutions. This chapter aims to provide the necessary background in
order to understand other knowledge areas, in particular the Security Operations & Incident
Management Knowledge Area (Chapter 8) which takes a more holistic view of security and
deals with operational aspects. An understanding of the basic networking protocol stack
and popular network protocols is assumed. Standard networking textbooks explain the
fundamentals of the layered Internet Protocol suite [1731, 1732].
This chapter is organized as follows. In Section 19.1, we lay out the foundations of this chapter
and define security goals in networked systems. As part of this, we also outline attackers and
their capabilities that threaten these goals. In Section 19.2, we describe six typical networking
scenarios that nicely illustrate why security in networking is important, and achieving it can be
non-trivial. We then discuss the security of the various networking protocols in Section 19.3,
structured by the layered architecture of the Internet protocol stack. In Section 19.4, we
present and discuss several orthogonal network security tools such as firewalls, monitoring
and Software Defined Networking (SDN). We complete this chapter with a discussion on how
to combine the presented mechanisms in Section 19.5.

CONTENT

19.1 SECURITY GOALS AND ATTACKER MODELS


[1731, c8] [1733, c1] [1734, c1] [1735, c6] [1732, c8]
We want and need secure networks. But what does secure actually mean? In this chapter, we
define the fundamentals of common security goals in networking. Furthermore, we discuss
capabilities, positions and powers of attackers that aim to threaten these security goals.

19.1.1 Security Goals in Networked Systems


When designing networks securely, we aim for several orthogonal security goals [1733]. The
most commonly used security goals are summarized in the CIA triad: confidentiality, integrity
and availability. Confidentiality ensures that untrusted parties cannot leak or infer sensitive
information from communication. For example, in a confidential email communication system,
(i) only the sender and recipient of an email can understand the email content, not anyone
on the communication path (e.g., routers or email providers), and (ii) no one else ought to
learn that email was sent from the sender to the recipient. Integrity ensures that untrusted
parties cannot alter information without the recipient noticing. Sticking to our email example,
integrity guarantees that any in-flight modification to an email (e.g., during its submission,
or on its way between email providers) will be discovered as such by the recipient. Finally,

KA Network Security | July 2021 Page 646



availability ensures that data and services should be accessible by their designated users all
the time. In our email scenario, a Denial of Service (DoS) attacker may aim to threaten the
availability of email servers in order to prevent or delay email communication.
Next to the CIA triad, there are more subtle security goals, not all of which apply in each and
every application scenario. Authenticity is ensured if the recipient can reliably attribute the
origin of communication to the sender. For example, an email is authentic if the recipient can
ensure that the claimed sender actually sent this email. Non-repudiation extends authenticity
such that we can prove authenticity to arbitrary third parties, i.e., allowing for public verification.
In our email scenario, non-repudiation allows the email recipient to prove to anyone else that
a given email stems from a given sender. Anonymity means that communication cannot
be traced back to its sender (sender anonymity) and/or recipient (recipient anonymity). For
example, if an attacker sends a spoofed email that cannot be reliably traced back to its actual
sender (e.g., the correct personal identity of the attacker), it is anonymous. There are further
privacy-related guarantees such as unlinkability that go beyond the scope of this chapter and
are defined in the Privacy & Online Rights Knowledge Area (Chapter 5).
To achieve security goals, we will heavily rely on cryptographic techniques such as public and
symmetric keys for encryption and signing, block and stream ciphers, hashing, and digital signatures, as described in the Cryptography Knowledge Area (Chapter 10) and the Applied Cryptography Knowledge Area (Chapter 18). Before showing how we can use these techniques
for secure networking, though, we will discuss attacker models that identify capabilities of
possible attackers of a networked system.

19.1.2 Attacker Models


Attacker models are vital to understand the security guarantees of a given networked system.
They define the capabilities of attackers and determine their access to the network.
Often, the Dolev-Yao [1230] attacker model is used for a formal analysis of security protocols
in the research literature. The Dolev-Yao model assumes that an attacker has complete control
over the entire network, and concurrent executions of the protocol between the same set
of two or more parties can take place. The Dolev-Yao model describes the worst possible
attacker: an attacker that sees all network communication, allowing the attacker to read any
message, prevent or delay delivery of any message, duplicate any message, or otherwise
synthesise any message for which the attacker has the relevant cryptographic keys (if any).
Depending on the context, real attackers may have less power. For example, we can distin-
guish between active and passive attackers. Active attackers, like Dolev-Yao, can manipulate
packets. In contrast, passive attackers (eavesdroppers) can observe but not alter network
communication. For example, an eavesdropper could capture network traffic using packet
sniffing tools in order to extract confidential information such as passwords, credit card details
and many other types of sensitive information from unprotected communication. But even
if communication is encrypted, attackers may be able to leverage communication patterns
statistics to infer sensitive communication content (“traffic analysis”, see Section 19.3.1.6).
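As a deliberately simple illustration of this point: even when payloads are perfectly encrypted, observed ciphertext lengths alone can identify which of a set of known resources was fetched. The function and its `overhead` parameter below are hypothetical, invented for this sketch.

```python
def guess_resource(observed_len: int, known_sizes: dict, overhead: int = 0):
    """Toy traffic analysis: match an observed ciphertext length against
    the known plaintext sizes of candidate resources."""
    return [name for name, size in known_sizes.items()
            if size + overhead == observed_len]
```

Real traffic-analysis attacks combine many such signals (packet sizes, directions, timing), but the principle is the same: encryption hides content, not patterns.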
We furthermore distinguish between on-path and off-path attackers. The prime example for
an on-path attacker is a person-in-the-middle (PITM), where an attacker is placed between two
communication parties. In contrast, off-path attackers can neither see nor directly manipulate
the communication between parties. Still, off-path attackers can cause severe harm. For
example, an off-path attacker could spoof TCP packets aiming to maliciously terminate a


suspected TCP connection between two parties. Similarly, off-path attackers could abuse high-
bandwidth links and forge Internet Protocol (IP) headers (IP spoofing, see Section 19.3.2.4)
to launch powerful and anonymous Denial of Service (DoS) attacks. Using forged routing
protocol messages, off-path attackers may even try to become on-path attackers.
In addition, the position of attackers heavily influences their power. Clearly, a single Internet
user has less power than an entire rogue Internet Service Provider (ISP). The single user can
leverage their relatively small bandwidth to launch attacks, while an ISP can generally also sniff
on and alter communication, abuse much larger bandwidths, and correlate traffic patterns.
Then again, as soon as attackers aggregate the power of many single users/devices (e.g., in
form of a botnet), their overall power amplifies. Attackers could also be in control of certain
Internet services, routers, or any combination thereof. We also distinguish between insider
and outsider attackers, which are either inside or outside of a trusted domain, respectively.
Overall, we model (i) where attackers can be positioned, (ii) who they are, and (iii) which capa-
bilities they have. Unfortunately, in strong adversarial settings, security guarantees diminish
way too easily. For example, strong anonymity may not hold against state actors who can
(theoretically) control major parts of the Internet, such as Tier-1 providers. Similarly, availability is hard
to maintain for spontaneous and widely distributed DoS incidents.

19.2 NETWORKING APPLICATIONS


[1731, c1] [1732, c1]
We now turn to a selection of popular concrete networking applications that help us illustrate
why achieving security is far from trivial. This section is less about providing solutions, which are discussed in the two subsequent sections, and more about outlining how security
challenges differ by application and context. To this end, we will introduce various network-
ing applications, starting with typical local networks and the Internet, and continuing with
similarly ubiquitous architectures such as bus networks (e.g., for cyber-physical systems),
fully-distributed networks, wireless networks, and finally, Software Defined Networking (SDN).

19.2.1 Local Area Networks (LANs)


Arguably, a Local Area Network (LAN) is the most intuitive and by far predominant type of
network with the (seemingly) lowest security requirements. Local networks connect systems
within an internal environment, shaping billions of home networks and corporate networks
alike. A typical fallacy of LAN operators is to blindly trust their network. However, the more
clients (can possibly) connect to a LAN, the harder it is to secure a LAN environment. In fact,
the clients themselves may undermine the assumed default security of a LAN. For example,
without further protection, existing clients can find and access unauthorized services on
a LAN, and possibly exfiltrate sensitive information out of band. This becomes especially
problematic if the clients are no longer under full control of the network/system operators.
A prime example is the recent rise of the Bring Your Own Device (BYOD) principle, which
allows employees to integrate their personal untrusted devices into corporate networks—in
the worst case, even creating network bridges to the outside world. Similarly, attackers may
be able to add malicious clients, such as by gaining physical access to network plugs. Finally,
attackers may impersonate other LAN participants by stealing their loose identity such as
publicly-known hardware addresses (e.g., cloning MAC addresses in Ethernet).


Consequently, even though a LAN is conceptually simple, we still have uncertainties regarding
LAN security: Can we control which devices become part of a network to exclude untrusted
clients and/or device configurations? (Sections 19.3.4.1 and 19.4.5) Can we monitor their
actions to identify attackers and hold them accountable? (Section 19.4.3) Can we partition
larger local networks into multiple isolated partitions to mitigate potential damage? (Sec-
tion 19.3.4.5)

19.2.2 Connected Networks and the Internet


Securing communication becomes significantly more challenging when connecting local
networks. The most typical scenario is connecting a LAN to the Internet, yet also LAN-to-LAN
communication (e.g., two factories part of a joint corporate network using a Virtual Private
Network (VPN)) is common. With such connected networks, we suddenly face an insecure end-to-end channel between the networks, one that is no longer under the control of blindly trusted network
operators. When communicating over the Internet, the traffic will pass several Autonomous
Systems (ASs), each of which can in principle eavesdrop and manipulate the communication.
Worse, senders typically have little to no control over the communication paths. In fact, even
ASs cannot fully trust all paths in their routing table. Without further precautions, any other
(misbehaving) AS on the Internet can reroute a target’s network traffic by hijacking Internet
routes. For example, a state actor interested in sniffing on the communication sent to a
particular network can send malicious route announcements to place itself as PITM between
all Internet clients and the target network. Finally, corporate networks in particular may choose
to “include” third-party services into their network, such as data centers for off-site storage or
clouds for off-site computations. Suddenly, external networks are integrated into corporate
networks, and organizations may send sensitive data to servers outside of their control.
This leads to several questions: Can two parties securely communicate over an insecure
channel, i.e., with guaranteed confidentiality and integrity? (Section 19.3.2.1) How should we
securely connect networks, e.g., in a corporate setting? (Section 19.3.3.1) Can we ensure that
the underlying routing is trusted, or at least detect attacks? (Section 19.3.3.3)

19.2.3 Bus Networks


Cyber-physical systems regularly use bus networks to communicate. For example, industrial
control systems may use Modbus [1736] to exchange status information and steer cyber-
physical devices. Home automation networks (e.g., Konnex Bus (KNX) [1737]) make it possible to sense inputs in houses (temperature, brightness, activity), which in turn govern actuators (e.g., heating/cooling, light, doors). Vehicular networks (e.g., Controller Area Network (CAN) [1738])
connect dozens if not hundreds of components that frequently communicate, e.g., sensors such as a radar whose readings control actuators such as the engine or brakes. All these bus networks
typically also form local networks, yet with subtle differences to the aforementioned LANs.
Frequently, bus clients require real-time guarantees for the arrival and processing time of
data. Clearly, even safety is affected if the signal of a car’s brake pedal arrives “too late” at its
destination (the brake system). This need for real-time guarantees is often in direct conflict with simple security demands such as authenticity, which may add “expensive” computations.
Furthermore, by design, bus networks introduce a shared communication medium, which allows clients both to sniff and to manipulate communication. On top of that, many of the bus protocols were designed without paying particular attention to security, partially because they predate security best practices we know today. Finally, bus clients can have limited computing resources, having only limited capability to perform expensive security operations.


We summarize our discussion with a set of questions on bus security: Can we retrospectively
add security to unprotected bus networks without breaking compatibility? Is it possible to
gain security without violating real-time guarantees? (Section 19.3.4.7) Which additional
mechanisms can we use to reduce the general openness of bus networks? (Section 19.3.4.5)

19.2.4 Wireless Networks


When moving from wired to wireless networks, we do not necessarily face fundamentally new
threats, but increase the likelihood of certain attacks to occur. In particular, wireless networks
(e.g., Wireless LAN) are prone to eavesdropping due to the broadcast nature of their media.
Similarly, while simple physical access control may still partially work to protect a cable-connected LAN, it fails terribly for wireless networks that, by construction, have no
clear boundary for potential intruders. For example, walls around a house do not stop signals
from and to wireless home networks, and without further protection, attackers can easily join
and sniff on networks. In wireless settings, we thus have to pay particular attention to both
access control and secure communication (including securing against traffic analysis). As it
turns out, though, defining secure wireless standards is far from trivial, and several wireless
and cellular standards have documented highly-critical vulnerabilities or design flaws.
To better understand their deficiencies, we will study the following questions: How can we
enforce access control in wireless networks, while not sacrificing usability? How can we
prevent eavesdropping attacks in wireless communication? (Section 19.3.4.6)

19.2.5 Fully-Distributed Networks: DHTs and Unstructured P2P Networks

Centralized client-server architectures have clear disadvantages from an operational point of view. They have a single point of failure, and require huge data centers in order to scale to
many clients. Furthermore, centralized architectures have a central entity that can control or
deliberately abandon them. Fully-distributed networks provide scalability and resilience by
design. Not surprisingly, several popular networks chose to follow such distributed designs,
including cryptocurrencies such as Bitcoin or file sharing networks. Decentralized networks can be roughly grouped into two types. Distributed Hash Tables (DHTs) such as Kademlia [1069] and
Freenet [569] have become the de-facto standard for structured Peer to Peer (P2P) networks,
and offer efficient and scalable message routing to millions of peers. In unstructured P2P
networks, peers exchange data hop-by-hop using gossip protocols. Either way, P2P networks
are designed to welcome new peers at all times. This lack of peer authentication leads to
severe security challenges. A single entity may be able to aggressively overpopulate a network
if no precautions are taken. DHTs further risk that attackers can potentially disrupt routing
towards certain data items. Similarly, gossip networks risk that single entities flood the network
with malicious data. Finally, due to the open nature of the networks, we also face privacy
concerns, as network participants can infer which data items a peer retrieves.
All in all, fully-distributed networks raise fundamentally different security concerns: Can we
reliably route data (or data requests) to their target? Can we authenticate peers even in
distributed networks? What are the security implications of storing data in and retrieving data
from a publicly accessible network? (Section 19.3.1.5)


19.2.6 Software-Defined Networking and Network Function Virtualisation

Software Defined Networking (SDN) is our final use case. Strictly speaking, SDN is not a
networking application, but rather a technology that enables dynamic and efficient network
configuration. Yet it raises similarly many security implications as other applications do.
SDN aims to ease network management by decoupling packet forwarding (data plane) and
packet routing (control plane). This separation and the underlying flow handling have enabled
drastic improvements from a network management perspective, especially in highly-dynamic
environments such as data centers. The concept of Network Functions Virtualisation (NFV)
complements SDN and allows network node functions such as load balancers or firewalls to
be virtualised.
We will revisit SDN by discussing the following questions: How can SDN help to design and
monitor networks better? Can NFV help secure networks by virtualising security functions?
Are there new, SDN-specific threats? (Section 19.4.4)

19.3 NETWORK PROTOCOLS AND THEIR SECURITY

[Figure: client and server four-layer protocol stacks (Application, Transport, Internet, Link)
connected via network devices; data physically traverses the lower layers while logically
travelling peer-to-peer within each layer.]
Figure 19.1: Four-Layer Internet Protocol Stack (a.k.a. TCP/IP stack).

After having introduced several networking applications, we now turn to the security of net-
working protocols. To guide this discussion, we stick to a layered architecture that categorizes
protocols and applications. Indeed, a complex system such as distributed applications running
over a range of networking technologies is best understood when viewed as a layered
architecture. Figure 19.1 shows the 4-layer Internet protocol suite and the interaction between
the various layers. For each layer, we know several network protocols—some of which are
quite generic, and others that are tailored for certain network architectures. The Internet is


the predominant architecture today, and nicely maps to the TCP/IP model. It uses the Internet
Protocol (IP) (and others) at the Internet layer, and UDP/TCP (and others) at the transport
layer—hence, the Internet protocol suite is also known as the TCP/IP stack.
Other networking architectures such as automotive networks use completely different sets of
protocols. It is not always possible to directly map their protocols to the layered architecture
of the TCP/IP model. Consequently, more fine-grained abstractions such as the ISO/OSI model
extend this layered architecture. For example, the ISO/OSI model splits the link layer into two
parts, namely the data link layer (node-to-node data transfer) and the physical layer (physical
transmission and reception of raw data). The ISO/OSI model defines a network layer instead
of an Internet layer, which is more inclusive to networks that are not connected to the Internet.
Finally, ISO/OSI defines two layers below the application layer (presentation and session).
The vast majority of topics covered in this chapter do not need the full complexity of the
ISO/OSI model. In the following, we therefore describe the security issues and according
countermeasures at each layer of the TCP/IP model. We thereby follow a top-down approach,
starting with application-layer protocols, and slowly going down to the lower layers until the
link layer. Whenever possible, we abstract from the protocol specifics, as many discussed
network security principles can be generically applied to other protocols.

19.3.1 Security at the Application Layer


[1731, c8] [1734, c6,c15,c19–c22] [1732, c8]
We first seek to answer how application protocols can be secured, or how application-layer
protocols can be leveraged to achieve certain security guarantees.

19.3.1.1 Email and Messaging Security

As a first example of an application-layer security protocol, we will look at secure email. Given
its age, the protocol for exchanging emails, the Simple Mail Transfer Protocol (SMTP), was not
designed with security in mind. Yet email remains in widespread business use today. Communication parties
typically want to prevent others from reading (confidentiality) or altering (integrity) their emails.
Furthermore, they want to verify the sender’s identity when reading an email (authenticity).
Schemes like Pretty Good Privacy (PGP) and Secure Multipurpose Internet Mail Extensions
(SMIME) provide such end-to-end security for email communication. Their basic idea is that
each email user has their own private/public key pair; see the Cryptography Knowledge Area
(Chapter 10) for the cryptographic details, and Section 19.3.2.2 for a discussion of how this key
material can be shared. The sender signs the hash of a message using the sender's private key,
and sends the signed hash along with the (email) message to the recipient. The recipient can then
validate the email’s signature using the sender’s public key. Checking this signature allows for
an integrity check and authentication at the same time, as only the sender knows their private
key. Furthermore, this scheme provides non-repudiation as it can be publicly proved that the
hash (i.e., the message) was signed by the sender’s private key. To gain confidentiality, the
sender encrypts the email before submission using “hybrid encryption”. That is, the sender
creates a fresh symmetric key used for message encryption, which is significantly faster than
using asymmetric cryptography. The sender then shares this symmetric key with the recipient,
encrypted under the recipient’s public key.
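The sign-then-encrypt flow described above can be sketched as follows. This is a toy: Python's standard library has no asymmetric primitives, so an HMAC stands in for the RSA/ECDSA signature, a SHA-256 keystream stands in for the symmetric cipher, and the recipient's "public" and "private" keys are the same value, which a real asymmetric scheme of course avoids. All names are illustrative.

```python
import hashlib
import hmac
import os

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy stream cipher: SHA-256 in counter mode; XOR twice with the
    # same key restores the plaintext. Illustrative only, not secure.
    out = bytearray()
    for block in range(0, len(data), 32):
        pad = hashlib.sha256(key + block.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[block:block + 32], pad))
    return bytes(out)

def send_email(body: bytes, sender_sign_key: bytes, recipient_pub_key: bytes):
    # 1. Sign the hash of the message (HMAC stands in for an RSA signature).
    digest = hashlib.sha256(body).digest()
    signature = hmac.new(sender_sign_key, digest, hashlib.sha256).digest()
    # 2. Hybrid encryption: a fresh symmetric key encrypts the (large) message...
    session_key = os.urandom(32)
    ciphertext = keystream_xor(session_key, body + signature)
    # 3. ...and only the small session key is "encrypted" under the
    #    recipient's public key (simulated here with the same XOR toy).
    wrapped_key = keystream_xor(recipient_pub_key, session_key)
    return ciphertext, wrapped_key

def receive_email(ciphertext, wrapped_key, recipient_priv_key, sender_verify_key):
    # Unwrap the session key, decrypt, then verify integrity/authenticity.
    session_key = keystream_xor(recipient_priv_key, wrapped_key)
    plaintext = keystream_xor(session_key, ciphertext)
    body, signature = plaintext[:-32], plaintext[-32:]
    digest = hashlib.sha256(body).digest()
    expected = hmac.new(sender_verify_key, digest, hashlib.sha256).digest()
    assert hmac.compare_digest(signature, expected), "integrity/authenticity check failed"
    return body
```

The structure, not the toy crypto, is the point: the expensive asymmetric operation touches only the 32-byte session key, while the bulk message is protected symmetrically.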
This very same scheme can be applied to other client-to-client communication. For example,
instant messengers (e.g., WhatsApp, Threema or Signal) or video conference systems can use


this general principle to achieve the same end-to-end guarantees. One remaining challenge
for such strong guarantees to hold is that user identities (actually, their corresponding key
material) have to be reliably validated [1739]. The Applied Cryptography Knowledge Area
(Chapter 18) has more details.
Not all email users leverage such client-to-client security schemes, though. Both PGP and
SMIME have usability challenges (e.g., key distribution, difficulty of indexed searches, etc.) that
hamper wide adoption [347]. To address this issue, we can secure mail protocols (SMTP, but
also Internet Message Access Protocol (IMAP) and Post Office Protocol (POP)) with the help
of TLS (see Section 19.3.2.1). By wrapping them in TLS, we at least achieve hop-by-hop security,
e.g., between a client and their mail submission server or between mail servers during email
transfer. Consequently, we can protect email submission, retrieval and transport from on-path
adversaries. However, even though communication is protected hop-by-hop, curious mail
server operators can still read emails in plain text. Only end-to-end security schemes like PGP/SMIME
protect against untrusted mail server operators.
There are other challenges to secure email, such as phishing and spam detection, which are
described in depth in the Adversarial Behaviours Knowledge Area (Chapter 7).

19.3.1.2 Hyper Text Transfer Protocol Secure (HTTPS)

The most prominent application-layer protocol, the Hypertext Transfer Protocol (HTTP), was
designed without any security considerations. Yet, the popularity of HTTP and its unprece-
dented adoption for e-commerce imposed strict security requirements on HTTP later on.
Its secure counterpart HTTPS wraps HTTP using a security protocol at the transport layer
(TLS, see Section 19.3.2.1), which can be used to provide confidentiality and integrity for the
entire HTTP communication—including URL, content, forms and cookies. Furthermore, HTTPS
allows clients to implicitly authenticate web servers using certificates. HTTPS is described in
much greater detail in the Web & Mobile Security Knowledge Area (Chapter 16).

19.3.1.3 DNS Security

In its primary use case, the Domain Name System (DNS) translates host names to their
corresponding IP addresses. A hierarchy of authoritative name servers (NSs) maintains this
mapping. Resolving NSs (resolvers) iteratively look up domain names on behalf of clients. In
such an iterative lookup, the resolver would first query the root NSs, which then redirect the
resolver to NSs lower in the DNS hierarchy, until the resolver contacts an NS that is authoritative
for the queried domain. For example, in a lookup for a domain sub.example.com, a root
NS would redirect the resolver to a NS that is authoritative for the .com zone, which in turn
tells the resolver to contact the NS authoritative for *.example.com. To speed up these
lookups, resolvers cache DNS records according to a lifetime determined by their authoritative
NSs. To minimize privacy leaks towards NSs in the upper hierarchy, resolvers can minimize
query names such that NSs higher up in the hierarchy do not learn the fully-qualified query
name [1740].
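Query name minimisation (standardised in RFC 7816) can be sketched as follows; the function name is our own.

```python
def minimised_queries(fqdn: str) -> list[str]:
    """Query names a minimising resolver reveals while walking the DNS
    hierarchy for `fqdn`, least-specific first (RFC 7816 style sketch)."""
    labels = fqdn.rstrip(".").split(".")
    # Reveal only one additional label per delegation level.
    return [".".join(labels[-i:]) for i in range(1, len(labels) + 1)]

# The root NS only ever sees "com"; the .com NS sees "example.com";
# only the authoritative NS learns the fully-qualified name.
print(minimised_queries("sub.example.com"))
# → ['com', 'example.com', 'sub.example.com']
```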
Unfortunately, multiple attacks aim to abuse the lack of authentication in plain text DNS. A
PITM attacker can impersonate a resolver, return bogus DNS records and divert traffic to a
malicious server, thus allowing them to collect user passwords and other credentials. In a DNS
cache poisoning attack, adversaries aim to implant bogus name records, thus diverting a user’s
traffic towards the target domain to attacker-controlled servers. Learning from these attacks,
the IETF introduced the DNS Security Extensions (DNSSEC). DNSSEC allows authoritative


name servers to sign DNS records using their private key. The authenticity of the DNS records
can be verified by a requester using the corresponding public key. In addition, a digital signature
provides integrity for the response data. The overall deployment of DNSSEC at the top-level
domain root name servers—a fundamental requirement to deploy DNSSEC at lower levels in
the near future—steadily increases [1741].
DNSSEC explicitly does not aim to provide confidentiality, i.e., DNS records are still communi-
cated unencrypted. DNS over TLS (DoT) and DNS over HTTPS (DoH) address this problem.
They provide end-to-end security between the DNS client and its chosen resolver, by tunneling
DNS via secure channels, namely TLS (see Section 19.3.2.1) or HTTPS (see Section 19.3.1.2),
respectively. More and more popular Web browsers (e.g., Chrome, Firefox) enable DoH by
default, using selected resolvers preconfigured by the browser vendors. This has resulted in a
massive centralization of DNS traffic towards just a few resolvers. Such centralization puts
resolvers in a unique position with the power of linking individual clients (by
IP addresses) to their lookups. Oblivious DNS Over HTTPS (ODoH) addresses this issue by
adding trusted proxies between DNS clients and their chosen resolvers [1742].
Irrespective of these security protocols, resolvers are in a unique situation to monitor name
resolutions of their clients. Resolver operators can leverage this in order to protect clients by
offering some sort of blocklist of known “misbehaving” domains which have a bad reputation.
Such DNS filtering has the potential to mitigate cyber threats, e.g., by blocking phishing
domains or command & control domains of known malware variants.
Finally, DNS is prone to Distributed Denial of Service (DDoS) attacks [1743]. DNS authoritative
servers can be targeted by NXDOMAIN attacks, in which an IP-spoofing client looks up many
unassigned subdomains of a target domain at public (open) resolvers. Subdomains are
typically chosen at random and are therefore not cached; hence this is sometimes referred to
as a random subdomain attack. Consequently, the resolvers have to forward the lookups and thus
flood the target authoritative name server. In another type of DDoS attack, DNS servers (both
resolvers and authoritative servers) are regularly abused for amplification DDoS attacks, in which
they reflect IP-spoofed DNS requests with significantly larger responses [1744]. Reducing the
number of publicly-reachable open DNS resolvers [1745] and DNS rate limiting can mitigate
these problems.

19.3.1.4 Network Time Protocol (NTP) Security

The Network Time Protocol (NTP) is used to synchronise devices (hosts, servers, routers, etc.)
to within a few milliseconds of Coordinated Universal Time (UTC). NTP clients request times
from NTP servers, taking into account the round-trip times of this communication. In principle,
NTP servers use a hierarchical security model implementing digital signatures and other
standard application-layer security mechanisms to prevent transport-layer attacks such as
replay or PITM. However, these security mechanisms are rarely enforced, easing attacks that
shift the time of a target system [1746], both on-path and off-path. In fact, such time
shifting attacks may have severe consequences, as they, e.g., allow attackers to use outdated
certificates or force cache flushes. The many NTP clients that rely on just a few
NTP servers on the Internet to obtain their time are especially prone to this attack [1747]. To
counter this threat, network operators should install local NTP servers that use and compare
multiple trusted NTP server peers. Alternatively, hosts can use NTP client implementations
that offer provably secure time crowdsourcing algorithms [1748].
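The multi-peer comparison suggested above can be sketched as a median over reported offsets, so that a minority of lying servers cannot shift the clock. This is an illustrative simplification (the function and thresholds are our own; real NTP clients also weigh round-trip delay and jitter).

```python
from statistics import median

def robust_time(server_times, local, max_offset=128.0):
    """Combine times reported by several NTP peers; the median tolerates
    up to half of the peers lying (illustrative sketch only)."""
    offsets = [t - local for t in server_times.values()
               if abs(t - local) < max_offset]   # drop absurd outliers
    if not offsets:
        raise RuntimeError("no plausible time source")
    return local + median(offsets)

# One malicious peer shifting time by +60 s is outvoted:
now = 1_000_000.0
peers = {"ntp-a": now + 0.01, "ntp-b": now - 0.02, "evil": now + 60.0}
print(robust_time(peers, now))   # → 1000000.01
```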


19.3.1.5 Distributed Hash Table (DHT) Security

There are two main threats to DHTs: (i) Eclipse and (ii) Sybil attacks. An Eclipse attacker aims
to poison routing tables to isolate target nodes from other, benign overlay peers. Redundancy
helps best against Eclipse attacks. For example, systems like Kademlia foresee storage and
routing redundancy, which mitigate some low-profile attacks against DHTs. In the extreme,
DHT implementations can use dedicated routing tables with verified entries [1749]. Central
authorities—which lower the degree of distribution of DHTs, though—can solve the underlying
root problem and assign stable node identifiers [1088].
In a Sybil attack, an adversary introduces malicious nodes with self-chosen identifiers to
subvert DHT protocol redundancy [1084]. To prevent such Sybils, one can limit the number
of nodes per entity, e.g., based on IP addresses [1749]—yet causing collateral damage to
nodes sharing this entity (e.g., multiple peers behind a NAT gateway). Others have suggested
using peer location as an identifier validation mechanism, which, however, prevents nodes
from relocating [1750]. Computational puzzles can slow down the pace at which attackers can
inject malicious peers [1751], but are ineffective against distributed botnet attacks. Finally,
reputation systems enable peers to learn trust profiles of their neighbors, which ideally
discredit malicious nodes [1752].
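The IP-based limit mentioned above can be sketched as a routing table that refuses more than a few entries from the same /24 prefix; the class and the exact policy are assumptions for illustration, and the collateral damage to NATed peers is visible in the code.

```python
import ipaddress
from collections import Counter

class SybilLimitedTable:
    """Toy routing table that caps entries per /24 prefix, a simple
    Sybil defence (assumed policy; real DHTs differ in the details)."""
    def __init__(self, max_per_prefix=2):
        self.max_per_prefix = max_per_prefix
        self.peers = {}                   # node_id -> ip
        self.prefix_counts = Counter()

    def _prefix(self, ip):
        return str(ipaddress.ip_network(f"{ip}/24", strict=False))

    def add(self, node_id, ip):
        prefix = self._prefix(ip)
        if self.prefix_counts[prefix] >= self.max_per_prefix:
            return False                  # reject suspected Sybil
        self.peers[node_id] = ip
        self.prefix_counts[prefix] += 1
        return True

table = SybilLimitedTable()
assert table.add("n1", "203.0.113.10")
assert table.add("n2", "203.0.113.20")
assert not table.add("n3", "203.0.113.30")  # third peer from same /24: refused
assert table.add("n4", "198.51.100.7")      # different prefix: accepted
```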
Unfortunately, all these countermeasures either restrict the generality of DHTs, or introduce a
centralized component. Therefore, most defenses have not fully evolved from academia
into practice. A more complete treatment of DHT security is provided by Urdaneta and
van Steen [1088] and in the Distributed Systems Security Knowledge Area (Chapter 12).

19.3.1.6 Anonymous and Censorship-Free Communication

Anonymity in communication is a double-edged sword. On the one hand, we want to hold
malicious communication parties accountable. On the other hand, some scenarios may indeed
warrant anonymous communication. For example, democratic minds may want to enable
journalists to communicate even under repressive regimes.
Anonymity is best achieved by mixing one's communication with that of others and rerouting it
over multiple parties. Onion routing represents the de facto standard for such Anonymous
Communication Networks (ACNs). Tor is the most popular implementation of this general
idea, in which parties communicate via (typically three) onion routers in a multi-layer encrypted
overlay network [1753]. A Tor client first selects (typically three) onion routers, the entry node,
middle node(s), and exit node. The client then establishes an end-to-end secure channel (using
TLS) to the entry node, which the client then uses to create another end-to-end protected
stream to the middle node. Finally, the client uses this client-to-middle channel to
initiate an end-to-end protected stream to the exit node. The resulting path is called a circuit.
Tor clients communicate over their circuit(s) to achieve sender anonymity. Any data passed
via this circuit is encrypted three times, and each node is responsible for decrypting one layer.
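This layering can be illustrated with a toy cipher. A SHA-256 XOR keystream stands in for Tor's real AES-based layer encryption; with XOR the layers even commute, which is purely an artefact of the toy.

```python
import hashlib

def xor_layer(key: bytes, data: bytes) -> bytes:
    # Toy layer cipher (SHA-256 counter-mode XOR); applying it twice
    # with the same key removes the layer again. Real Tor uses AES.
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[i:i + 32], pad))
    return bytes(out)

def onion_wrap(payload: bytes, circuit_keys) -> bytes:
    """Client encrypts once per hop; the innermost layer is the exit node's."""
    for key in reversed(circuit_keys):
        payload = xor_layer(key, payload)
    return payload

# Entry, middle and exit node each strip exactly one layer:
keys = [b"entry-key", b"middle-key", b"exit-key"]
cell = onion_wrap(b"GET /index.html", keys)
for hop_key in keys:                 # cell travels entry -> middle -> exit
    cell = xor_layer(hop_key, cell)  # each hop removes its own layer
print(cell)   # → b'GET /index.html'
```

Note that no single hop can read the payload before all three layers are removed, mirroring the guarantee discussed below that only the exit node sees the plaintext.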
Onion routing provides quite strong security guarantees. First of all, the entry and middle
node cannot decrypt the communication passed over the circuit, only the exit node can.
Furthermore, none of the proxies can infer both communication endpoints. In fact, only the
entry node knows the client, and only the exit node knows the server. This also means that
servers connected via onion routing networks do not learn the true addresses of their clients.
This general concept can be expanded to achieve recipient anonymity, i.e., protect the identity
of servers. For example, Tor allows for so-called onion services which can only be contacted


by Tor clients that know the server identity (a hash over their public key). As onion services
receive data via Tor circuits and can never be contacted directly, their identity remains hidden.
While Tor gives strong anonymity guarantees, it is not fully immune to deanonymisation.
In particular, traffic analysis and active traffic delay may help to infer the communication
partners, especially if entry and exit node collaborate. In fact, it is widely accepted that power-
ful adversaries can link communication partners by correlating traffic entering and leaving
the Tor network [515, 1754]. Furthermore, patterns such as inter-arrival times or cumulative
packet sizes were found sufficient to attribute encrypted communication to a particular web-
site [1755]. Consequently, attackers may be able to predict parts of the communication content
even though communication is encrypted and padded. As a response, researchers explored
countermeasures such as constant rate sending or more efficient variants of it [1756].
Orthogonal to ACNs, censorship-resistant networks aim to prevent attackers from suppressing
communication. The typical methodology here is to blend blocklisted communication into
allowed traffic. For example, decoy routing uses on-path routers to extract covert (blocklisted)
information from an overt (allowed) channel and redirects this hidden traffic to the true
destination [581]. Similarly, domain fronting leverages allowed TLS endpoints to forward a
covert stream—hidden in an allowed TLS stream—to the actual endpoint [578]. Having said
this, nation state adversaries have the power to turn off major parts (or even all) of the
communication to radically subvert these schemes at the expense of large collateral damage.

19.3.2 Security at the Transport Layer


[1731, c8] [1733, c4,c6] [1732, c8]
In this subsection, we discuss security at the transport layer, which sits below the application
layer in the protocol stack.

19.3.2.1 TLS (Transport Layer Security)

Application-layer protocols rely on the transport layer to provide confidentiality, integrity and
authentication mechanisms. These capabilities are provided by a shim layer between the
application and transport layers, called Transport Layer Security (TLS). In this section,
our discussion is greatly simplified and covers just the basics of the TLS protocol. For a
more detailed discussion, including the history of TLS and past vulnerabilities, see the Applied
Cryptography Knowledge Area (Section 18.5).
We discuss the most recent and popular TLS versions 1.2 and 1.3, with a particular focus
on their handshakes. Irrespective of the TLS version, the handshake takes care of crypto-
graphic details that application-layer protocols otherwise would have to deal with themselves:
authenticating each other, agreeing on cryptographic cipher suites, and deriving key material.
The handshakes differ between the two TLS versions, as shown in Figure 19.2. We start
discussing TLS 1.2, as shown on the left-hand side of the figure. First, client and server
negotiate which TLS version and cipher suites to use in order to guarantee compatibility
even among heterogeneous communication partners. Second, server and client exchange
certificates to authenticate each other, although client authentication is optional (and, for
brevity, omitted in Figure 19.2). Certificates contain communication partner identifiers such
as domain names for web servers, and include their vetted public keys (see Section 19.3.2.2
for details). Third, the communication partners derive a symmetric key that can be used to


[Figure: message flows of the TLS handshake between client and server, following a TCP
3-way handshake and preceding the secure channel that carries application data; TLS 1.2
needs two round trips, TLS 1.3 one.]
Figure 19.2: TLS Handshake: Comparison between TLS 1.2 (on the left) and TLS 1.3 (on the
right), excluding the optional steps for client authentication.

secure the data transfer. To derive a key, the client can encrypt a freshly generated symmetric
key under the server’s public (e.g., RSA) key. Alternatively, the partners can derive a key using
a Diffie-Hellman Key Exchange (DHKE). The DHKE provides TLS with perfect forward secrecy
that prevents attackers from decrypting communication even if the server’s private key leaks.
As a final step, the handshake then validates the integrity of the handshake session. From now
on, as part of the data transfer phase, TLS partners use the derived key material to encrypt
and authenticate the subsequent communication.
TLS 1.3, as shown on the right of Figure 19.2, designs this handshake more efficiently. Without
sacrificing security guarantees, TLS 1.3 reduces the number of round-trip times to one (1-
RTT). TLS 1.3 no longer supports RSA-based key exchanges in favor of DHKE. The client
therefore guesses the chosen key agreement protocol (e.g., DHKE) and sends its key share
right away in the first step. The server would then respond with the chosen protocol, its
key share, certificate and a signature over the handshake (in a CertificateVerify message). If
the client was connected to the server before, TLS 1.3 even supports a handshake without
additional round-trip time (0-RTT)—at the expense of weakening forward secrecy and replay
prevention. Finally, as Formal Methods for Security Knowledge Area (Chapter 13) explores,
TLS 1.3 has the additional benefit that it is formally verified to be secure [1719, 1757].
We now briefly discuss how TLS successfully secures against common network attacks.
First, consider an eavesdropper that wants to obtain secrets from captured TLS-protected
traffic. As the user data is encrypted, no secrets can be inferred. Second, in an IP spoofing
attack, attackers may try to trick any of the TLS partners into accepting bogus data. However,
attackers lack the secret key required to inject validly encrypted content. Third, data cannot
be altered either, as TLS protects data integrity using authenticated encryption or message
authentication codes.
Finally, even a strong PITM attack is prevented by the help of certificates that authenticate
the parties—unless the PITM attacker can issue certificates that the TLS partners trust, as
discussed next. The TLS protocol also guarantees that payload arrives at the application in


order, detects dropped and modified content, and also effectively prevents replay attacks that
resend the same encrypted traffic to duplicate payload. Having said this, TLS does not prevent
attackers from delaying parts or all of the communication.
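From an application's point of view, these guarantees are obtained by wrapping a TCP socket in a TLS context, as sketched here with Python's standard `ssl` module. The HEAD request and the host name are illustrative; the network call itself is only shown, not executed.

```python
import socket
import ssl

# A client-side context with sane defaults: certificate validation and
# hostname checking enabled, legacy protocol versions disabled.
context = ssl.create_default_context()           # loads the system trust store
context.minimum_version = ssl.TLSVersion.TLSv1_2
assert context.verify_mode == ssl.CERT_REQUIRED  # server must present a valid cert
assert context.check_hostname                    # cert identity must match the host

def fetch_head(host: str) -> bytes:
    # Wrap a TCP socket; the TLS handshake (version/cipher negotiation,
    # certificate check, key derivation) happens inside wrap_socket().
    with socket.create_connection((host, 443), timeout=5) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(b"HEAD / HTTP/1.1\r\nHost: " + host.encode() + b"\r\n\r\n")
            return tls.recv(4096)

# fetch_head("example.com")  # network call; illustration only
```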

19.3.2.2 Public Key Infrastructure

So far we have simply assumed that communication partners can reliably obtain trustworthy
public keys from each other. However, in the presence of active on-path attackers, how can one
trust public keys exchanged via an insecure channel? The fundamental “problem” is that,
conceptually, everyone can create public/private key pairs. Public-Key Infrastructure (PKI)
provides a solution for managing trustworthy public keys (and, implicitly, their private key
counterparts). Government agencies or standard organisations appoint registrars who issue
and keep track of so-called certificates on behalf of entities (individuals, servers, routers etc).
Assume a user wants to obtain a trusted certificate and the corresponding key material.
To this end, the user first generates a public/private key pair on their own hardware. The
private key is never shared with anyone. The public key becomes part of a certificate signing
request (CSR) that the user sends to a registration authority. Before this authority signs the
certificate as requested, the user has to prove their identity (e.g., possession of a domain
name for an HTTPS certificate, or personal identifiers for an S/MIME certificate) to registrars.
The registrar’s signature prevents forgery, as anyone can now verify the certificate using the
(publicly known or similarly verifiable) registrar’s public key. The resulting certificate contains
the user’s identity and public key, as well as CA information and a period of certificate validity.
Its format and PKI management specifications are specified in RFC 1422 and the ITU-X.509
standard.
The existing PKI model has faced several challenges, as evidenced by cases where CAs
have issued certificates in error, or under coercion, or through their own infrastructure being
attacked. As a response, CAs publish a list of revoked/withdrawn certificates, which can be
queried using the Online Certificate Status Protocol (OCSP) as defined in RFC 6960, or is
piggy-backed (“stapled”) in TLS handshakes. To avoid wrong (but validated) certificates being
issued, browsers temporarily started “pinning” them. However, this practice was quickly
abandoned and deprecated in major browsers, as it turned out to be prone to human errors
(in case of key theft or key loss). Instead, big players such as Google or Cloudflare started
collecting any observed and valid certificates in public immutable logs. TLS clients such as
browsers can then opt to refuse non-logged certificates. This scheme, known as Certificate
Transparency (CT) [1758], forces attackers to publish their rogue certificates. Consequently,
certificate owners can notice whether malicious parties have started abusing their identities
(e.g., domains).
The web of trust is an alternative, decentralized PKI scheme where users can create a com-
munity of trusted parties by mutually signing certificates without needing a registrar. The
PGP scheme we discussed in Section 19.3.1.1 and its prominent implementation GNU Privacy
Guard (GPG) is a good example, in which users certify each others’ key authenticity.
A more detailed PKI discussion is part of the Applied Cryptography Knowledge Area (Sec-
tion 18.3.8).


19.3.2.3 TCP Security

TLS does a great deal to protect TCP payloads and prevent session hijacks and packet
injection. Yet what about the security of TCP headers of TLS connections or other, non-TLS
connections? In fact, attackers could try launching TCP reset attacks that aim to maliciously
tear down a target TCP connection. To this end, they guess or brute-force valid sequence
numbers, and then spoof TCP segments with the RST flag set. If the spoofed sequence
numbers hit the sliding window, the receiving party will terminate the connection. There are
mainly two orthogonal solutions to this problem deployed in practice: (i) TCP/IP stacks have to
ensure strong randomness for (initial) sequence number generation. (ii) Deny RST segments
with sequence numbers that fall in the middle of the sliding window. Conceptually, these
defenses are ineffective against on-path attackers that can reliably manipulate TCP segments
(e.g., dropping payload and setting the RST flag). Having said this, Weaver et al. [1759] show
that race conditions allow for detecting RST attacks launched by off-path attackers even if
they can infer the correct sequence number.
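The second defence can be sketched in the style of RFC 5961: a sequence number that exactly matches the next expected one resets the connection, while an in-window guess only triggers a challenge ACK, which a spoofing off-path attacker cannot answer. Sequence-number wraparound is ignored here for brevity.

```python
def handle_rst(seq: int, rcv_nxt: int, window: int) -> str:
    """RFC 5961-style RST handling (simplified sketch).
    Only a sequence number exactly at the left window edge resets the
    connection; a merely in-window guess triggers a challenge ACK so
    the peer can prove it really sent the reset."""
    if seq == rcv_nxt:
        return "reset"            # exact match: accept the reset
    if rcv_nxt < seq < rcv_nxt + window:
        return "challenge-ack"    # plausible but not exact: verify peer
    return "drop"                 # outside the window: ignore silently

assert handle_rst(seq=1000, rcv_nxt=1000, window=65535) == "reset"
assert handle_rst(seq=30000, rcv_nxt=1000, window=65535) == "challenge-ack"
assert handle_rst(seq=999999, rcv_nxt=1000, window=65535) == "drop"
```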
A SYN Flooding attacker keeps sending TCP SYN segments and forces a server to allocate
resources for half-opened TCP connections. When servers limit the number of half-opened
connections, benign clients can no longer establish TCP connections to the server. To mitigate
this session exhaustion, servers can delete a random half-opened session whenever a new
session needs to be created—potentially deleting benign sessions, though. A defence known
as SYN Cookies has been implemented by operating systems as a more systematic response
to SYN floods [RFC4987]. When enabled, the server does not half open a connection right
away on receiving a TCP connection request. It selects an Initial Sequence Number (ISN)
using a hash function over source and destination IP addresses, port numbers of the SYN
segment, a timestamp with a resolution of 64 seconds, as well as a secret number only known
to the server. The server then sends the client this ISN in the SYN/ACK message. If the request
is from a legitimate sender, the server receives an ACK message with an acknowledgment
number which is ISN plus 1. To verify if an ACK is from a benign sender, the server thus again
computes the SYN cookie using the above-mentioned data, and checks if the acknowledgement
number in the ACK segment minus one corresponds to the SYN cookie. If so, the server opens
a TCP connection, and only then starts using resources. A DoS attacker would have to waste
resources themselves and reveal the true sending IP address to learn the correct ISN, hence,
leveling the fairness of resource consumption.
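The cookie computation described above can be sketched as follows. This is a simplification: real stacks also encode the negotiated MSS in the cookie, and the secret shown is an example value.

```python
import hashlib
import hmac
import time

SERVER_SECRET = b"only-the-server-knows-this"   # example value

def syn_cookie(src_ip, dst_ip, sport, dport, now=None):
    """Derive an ISN from the connection 4-tuple, a coarse timestamp and
    a server-side secret, so no per-connection state is needed."""
    t = int((time.time() if now is None else now) // 64)   # 64 s resolution
    msg = f"{src_ip}|{dst_ip}|{sport}|{dport}|{t}".encode()
    digest = hmac.new(SERVER_SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")               # 32-bit ISN

def ack_is_valid(ack, src_ip, dst_ip, sport, dport, now=None):
    # The final ACK must carry cookie + 1; only then allocate state.
    return (ack - 1) % 2**32 == syn_cookie(src_ip, dst_ip, sport, dport, now)

isn = syn_cookie("198.51.100.7", "203.0.113.1", 50000, 443, now=1_000_000)
assert ack_is_valid(isn + 1, "198.51.100.7", "203.0.113.1", 50000, 443,
                    now=1_000_000)
```

Because the cookie is recomputable from the ACK alone, a spoofing attacker who never sees the SYN/ACK cannot produce a valid acknowledgement number.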

19.3.2.4 UDP Security

What TLS is for TCP, Datagram TLS (DTLS) is for UDP. Yet again there are additional security
considerations for UDP that we briefly discuss next. In contrast to its big brother TCP, UDP is
designed such that application-layer protocols have to handle key mechanisms themselves
(or tolerate their absence), including reordering, reliable transport, or identifier recognition.
Furthermore, as UDP is a connection-less protocol, its endpoints do not implicitly verify each
other's IP address before communication starts. Consequently, if not handled at the application
layer, UDP protocols are prone to IP spoofing attacks. We already showcased the
consequences of this at the example of DNS spoofing. In general, to protect against this threat,
any UDP-based application protocol must gauge the security impact of IP spoofing.
Reflective DDoS attacks are a particular subclass of IP spoofing attacks. Here, attackers send
IP packets in which the source IP address corresponds to a DDoS target. If the immediate
recipients (called reflectors) reply to such packets, their answers overload the victim with


undesired replies. We mentioned this threat already in the context of DNS (Section 19.3.1.3).
The general vulnerability boils down to the lack of IP address validation in UDP. Consequently,
several other UDP-based protocols are similarly vulnerable to reflection [1744]. Reflection
attacks turn into amplification attacks, if the responses are significantly larger than the
requests, which effectively amplifies the attack bandwidth. Unless application-level protocols
validate addresses, or enforce authentication, reflection for UDP-based protocols will remain
possible. If protocol changes would break compatibility, implementations are advised to
rate-limit the frequency with which clients can trigger high-amplification responses. Alternatively,
non-mandatory instances of amplifying services can be taken offline [1745].
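To make the amplification metric concrete, the following Python sketch computes the bandwidth amplification factor of a reflective exchange (response bytes divided by request bytes). The packet sizes are illustrative examples, not measured values from any particular service.

```python
# Sketch: bandwidth amplification factor (BAF) of a reflective service.
# The BAF is the ratio of response bytes to request bytes; the sizes
# below are illustrative, not measurements.

def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Return the bandwidth amplification factor of one exchange."""
    return response_bytes / request_bytes

# A spoofed 60-byte query triggering a 3000-byte response amplifies the
# attacker's bandwidth 50-fold at the victim.
baf = amplification_factor(60, 3000)
print(f"BAF: {baf:.1f}x")  # -> BAF: 50.0x
```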

19.3.2.5 QUIC

QUIC is a new transport-level protocol that saw rapid deployment by popular Web browsers.
QUIC offers faster communication by using UDP instead of TCP, e.g., as the substrate for HTTP.
QUIC was originally designed by Google, and was then standardized by the IETF in 2021 [1760]. Its main goal is
increasing communication performance using multiplexed connections. In contrast to older
protocols, QUIC was designed to be secure from the start. Technically,
QUIC uses most of the concepts described in TLS 1.3, but replaces the TLS Record Layer with
its own format. This way, QUIC can encrypt not only the payload, but also most of the header data.
Being UDP-based, QUIC replaces the TCP three-way handshake with its own handshake, which
integrates the TLS handshake. This eliminates any round-trip time overhead of TLS. With
reference to Figure 19.2 (page 657), QUIC integrates the two TLS 1.3 handshake messages
into its own handshake. When serving certificates and additional data during the handshake,
QUIC servers run the risk of being abused for amplification attacks (cf. Section 19.3.2.4), as
server responses are significantly larger than initial client requests. To mitigate this problem,
QUIC servers verify addresses during the handshake, and must not exceed a certain amplification
factor before addresses are verified (the current IETF standard defines a factor of three).
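The anti-amplification limit described above can be sketched as follows. This is a simplified illustration of the rule in the IETF QUIC standard (at most three times the received bytes may be sent before address validation); the class and method names are illustrative, not taken from any implementation.

```python
# Sketch of QUIC's anti-amplification limit: before the client's address is
# validated, a server may send at most 3x the bytes it has received on the
# path. Structure and names are illustrative.

AMPLIFICATION_LIMIT = 3

class ServerPath:
    def __init__(self):
        self.bytes_received = 0
        self.bytes_sent = 0
        self.address_validated = False

    def on_receive(self, n: int):
        self.bytes_received += n

    def can_send(self, n: int) -> bool:
        if self.address_validated:
            return True
        return self.bytes_sent + n <= AMPLIFICATION_LIMIT * self.bytes_received

path = ServerPath()
path.on_receive(1200)          # client's Initial packet
print(path.can_send(3600))     # True: exactly at the 3x limit
print(path.can_send(3601))     # False: blocked until address validation
```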

19.3.3 Security at the Internet Layer


[1731, c8] [1733, c5,c9] [1734, c17] [1732, c8]
Although application-layer and transport-layer security help to provide end-to-end security,
there is also merit in adding security mechanisms to the network layer. First, higher-layer
security mechanisms do not necessarily protect an organisation’s internal network links from
malicious traffic. If and when malicious traffic is detected at the end hosts, it is too late, as
the bandwidth has already been consumed. The second major issue is that the higher-layer
security mechanisms described earlier (e.g., TLS) do not conceal or protect IP headers. This
makes the IP addresses of the communicating end hosts visible to eavesdroppers and even
modifiable to PITM attackers.


19.3.3.1 IPv4 Security

IP Spoofing: IP spoofing, as discussed for UDP and DNS (sections 19.3.2.4 and 19.3.1.3,
respectively), finds its root in the Internet Protocol (IP) and affects both IPv4 and IPv6. In
principle, malicious clients can freely choose to send traffic with any arbitrary IP address.
Thankfully, most providers perform egress filtering and discard traffic from IP addresses
outside of their domain [1761]. Furthermore, Unicast Reverse Path Forwarding (uRPF) enables
on-path routers to drop traffic from IP addresses that they would have expected to arrive on
other interfaces [1761].
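The idea behind strict-mode uRPF can be sketched in a few lines: a packet is accepted only if the route back to its source address points out of the interface it arrived on. The forwarding table below is a hypothetical example with disjoint prefixes (a real router would do longest-prefix matching).

```python
# Sketch of strict-mode Unicast Reverse Path Forwarding (uRPF).
# The FIB below is illustrative; real routers use longest-prefix match.

from ipaddress import ip_address, ip_network

# Hypothetical FIB: prefix -> egress interface
fib = {
    ip_network("192.0.2.0/24"): "eth0",
    ip_network("198.51.100.0/24"): "eth1",
}

def urpf_accept(src_ip: str, arrival_iface: str) -> bool:
    """Strict uRPF: drop if no route to src, or route uses another iface."""
    src = ip_address(src_ip)
    for prefix, iface in fib.items():
        if src in prefix:
            return iface == arrival_iface
    return False  # no route back to the source at all: drop

print(urpf_accept("192.0.2.7", "eth0"))  # True: source reachable via eth0
print(urpf_accept("192.0.2.7", "eth1"))  # False: likely spoofed
```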
Fragmentation Attacks: IPv4 has to fragment packets that do not fit the network's Maximum
Transmission Unit (MTU). While fragmentation is trivial, reassembly is not, and has
led to severe security problems in the past. For example, a Teardrop attack abuses the fact
that operating systems may try to retain huge amounts of payload when trying to reassemble
highly-overlapping fragments of a synthetic TCP segment. Fragmentation also eases DNS
cache poisoning attacks in that attackers need to brute-force only a reduced search space by
attacking the non-starting fragments [1762]. Finally, fragmentation may assist attackers
in evading simple payload matches by scattering payload over multiple fragments.
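A defensive reassembler can flag the kind of fragment overlap that Teardrop-style attacks rely on. The following sketch models fragments as (offset, length) pairs; real reassembly code must additionally handle gaps, duplicates, and timeouts.

```python
# Sketch: flag suspicious overlap between IPv4 fragments, as abused by
# Teardrop-style attacks. Fragments are (offset, length) pairs.

def has_overlap(fragments):
    """Return True if any two fragments overlap in the reassembly buffer."""
    prev_end = 0
    for offset, length in sorted(fragments):  # sort by offset
        if offset < prev_end:
            return True
        prev_end = offset + length
    return False

print(has_overlap([(0, 1480), (1480, 1480)]))  # False: contiguous fragments
print(has_overlap([(0, 1480), (100, 1480)]))   # True: heavy overlap
```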
VPNs and IPsec: Many organisations prefer their traffic to be fully encrypted as it leaves
their network. For example, they may want to connect several islands of private networks
owned by an organisation via the Internet. Also, employers and employees want a flexible work
environment where people can work from home, or connect from a hotel room or an airport
lounge without compromising their security. If only individual, otherwise-internal web hosts
need to be made available, administrators can deploy web proxies that tunnel traffic (sometimes
referred to as WebVPN). In contrast, a full-fledged Virtual Private Network (VPN) connects
two or more otherwise-separated networks, and not just individual hosts.
There are plenty of security protocols that enable VPNs, such as the Point-to-Point Tunneling
Protocol (PPTP) (deprecated), TLS (used by, e.g., OpenVPN [1763]), or the Secure Socket Tunneling
Protocol (SSTP). We will illustrate the general VPN concept using the example of the Internet
Protocol Security (IPsec) protocol suite. Figure 19.4 shows an employee working from
home accessing a server at work: the VPN client on their host encapsulates IPv4 datagrams into
IPsec and encrypts the IPv4 payload containing TCP or UDP segments, or other control messages.
The corporate gateway detects the IPsec datagram, decrypts it and decapsulates it back
to the IPv4 datagram before forwarding it to the server. Every response from the server is
also encrypted by the gateway. IPsec also provides data integrity, origin authentication and
replay attack prevention. These guarantees depend on the chosen IPsec protocol, though.
Only the recommended and widely-deployed Encapsulating Security Payload (ESP) protocol
(part of IPsec) provides these guarantees, including confidentiality and origin authentication.
In contrast, the less popular Authentication Header (AH) protocol just provides integrity.
Similarly, several tunneling protocols such as Generic Routing Encapsulation (GRE), the Layer
2 Tunneling Protocol (L2TP) or Multiprotocol Label Switching (MPLS) do not provide CIA
guarantees. Should those be required in untrusted networks, e.g., due to GRE's multi-protocol
or multi-casting functionality, it is advisable to use them in combination with IPsec.
The entire set of modes/configurations/standards provided by IPsec is extensive [1764].
Here, we only briefly introduce IPsec's two modes of operation: tunnel mode
and transport mode, as compared in Figure 19.3. In transport mode, only the IP payload—not
the original IP header—is protected. The tunnel mode represents a viable alternative if the
edge devices (routers/gateways) of two networks are IPsec-aware. Then, the rest of the
servers/hosts need not worry about IPsec. The edge devices encapsulate every IP packet


Transport mode:
  Original:  [IP header][IP payload]
  In IPsec:  [IP header][IPsec hdr][IP payload]                (IP payload protected)

Tunnel mode:
  Original:  [IP header][IP payload]
  In IPsec:  [new IP hdr][IPsec hdr][IP header][IP payload]    (entire original packet protected)

Figure 19.3: Comparison between IPsec transport mode and tunnel mode. The parts marked
as protected are those covered by IPsec's protection in the respective mode.

[Figure: an IPsec-compliant host in a home network exchanges packets over the public
Internet with the IPsec-compliant gateway router of an enterprise network. Each packet
carries an IP header, an IPsec header, and an encrypted TCP/UDP header plus data payload.]

Figure 19.4: IPsec client-server interaction in transport mode (no protection of IP headers).

including the header. This virtually creates a secure tunnel between the two edge devices. The
receiving edge device then decapsulates the IPv4 datagram and forwards it within its network
using standard IP forwarding. Tunnel mode simplifies key negotiation, as two edge devices
can handle connections on behalf of all hosts in their respective networks. An additional
advantage is that the IP headers (including source/destination addresses) are also encrypted.
When a large number of endpoints use IPsec, manually distributing the IPsec keys becomes
challenging. RFC 7296 [1765] defines the Internet Key Exchange protocol (IKEv2). Readers
will observe a similarity between TLS (Section 19.3.2) and IKE, in that IKE also requires an
initial handshake process to negotiate cryptographic algorithms and other values such as
nonces, and to exchange identities and certificates. We will skip the details of the complex two-
phase protocol exchange, which results in the establishment of a quantity called SKEYSEED.
The SKEYSEED is used to generate the keys used during a session within Security Associa-
tions (SAs). IKEv2 uses the Internet Security Association and Key Management Protocol
(ISAKMP) [1766], which defines the procedures for authenticating the communicating peer,
the creation and management of SAs, and the key generation techniques.


NAT: Due to the shortage of IPv4 address space, Network Address Translation (NAT) was
designed so that private IP addresses could be mapped onto an externally routable IP address
by the NAT device [1731]. For an outgoing IP packet, the NAT device changes the private source
IP address to a public IP address of the outgoing link. This has implicit, yet unintentional
security benefits. First, NAT obfuscates the internal IP address from the outside world. To a
potential attacker, the packets appear to be coming from the NAT device, not the real host
behind the NAT device. Second, unless loopholes are opened via port forwarding or via UPnP,
NAT gateways such as home routers prevent attackers from reaching internal hosts.
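The translation step, and why unsolicited inbound traffic is dropped, can be sketched as follows. The addresses and port numbers are illustrative (taken from private and documentation ranges), and the data structures are a simplification of a real NAT device's connection table.

```python
# Sketch of NAT translation: outgoing packets get the gateway's public
# address and a fresh port; the mapping is remembered so replies can be
# translated back. Addresses are illustrative (RFC 1918 / documentation).

PUBLIC_IP = "203.0.113.5"

class Nat:
    def __init__(self):
        self.next_port = 40000
        self.out = {}   # (private_ip, private_port) -> public_port
        self.back = {}  # public_port -> (private_ip, private_port)

    def translate_out(self, src_ip, src_port):
        key = (src_ip, src_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return PUBLIC_IP, self.out[key]

    def translate_in(self, dst_port):
        # Unsolicited inbound traffic has no mapping and is dropped (None).
        return self.back.get(dst_port)

nat = Nat()
print(nat.translate_out("192.168.1.10", 51515))  # ('203.0.113.5', 40000)
print(nat.translate_in(40000))                   # ('192.168.1.10', 51515)
print(nat.translate_in(40001))                   # None: no mapping, dropped
```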

19.3.3.2 IPv6 Security

Although conceptually very similar from a security perspective, IPv6 brings a few advantages
over IPv4. For example, IPv6’s 128-bit address space slows down port scans, as opposed
to IPv4, where the entire 32-bit address space can be scanned in less than an hour [1767].
Similarly, IPv6 comes with built-in encryption in the form of IPsec, which was initially mandated
in the early IPv6 standard. Yet nowadays, due to implementation difficulties, IPsec support remains
a recommendation only. Furthermore, in contrast to IPv4, IPv6 has no options in its header—
these were used for attacks/exploits in IPv4.
The community has debated many years over the potential security pitfalls with IPv6. As a
quite drastic change, the huge address space in IPv6 obsoletes NATing within the IPv6 world,
including all its implicit security benefits. In particular, NAT requires state tracking, which
devices often couple with a stateful firewall (which we will discuss in Section 19.4.1) that brings
additional security. Furthermore, NAT hides the true IP addresses and therefore complicates IP-
based tracking—providing some weak form of anonymity. Having said this, experts argue that
these perceived advantages also come with lots of complexity and disadvantages (e.g., single
point of failure), and that eliminating NAT by no means implies that Internet-connected devices
no longer have firewalls [1768]. Furthermore, having large networks to choose addresses from,
IPv6 may allow clients to rotate IP addresses more frequently to complicate address-based tracking.
Summarizing this debate, as long as we do not drop firewalls, and are careful with IP address
assignment policies, IPv6 does not weaken security.
Finally, another important aspect to consider is that we are still in a steady transition from IPv4
to IPv6. Hence, many devices feature a so-called dual stack, i.e., both IPv4 and IPv6 connectivity.
This naturally calls for protecting both network stacks simultaneously.

19.3.3.3 Routing Security

IPv4/IPv6 assume that Internet routers reliably forward packets from source to destination.
Unfortunately, a network can easily be disrupted if either the routers themselves are compro-
mised or they accept spurious routing exchange messages from malicious actors. We will
discuss these threats in the following, distinguishing between internal and external routing.
Within an Autonomous System (AS): Interior Gateway Protocols (IGPs) are used for exchang-
ing routing information within an Autonomous System (AS). Two such protocols, Routing
Information Protocol (RIPv2) and Open Shortest Path First (OSPFv2), are in widespread use
within ASs for IPv4 networks. The newer RIPng and OSPFv3 versions support IPv6. These
protocols support no security by default, but can be configured to support either plain-text-
based or MD5-based authentication. Authentication can prevent several kinds of attacks,
such as the insertion of bogus routes or the modification of routes by a rogue neighbour. Older
routing protocols, including RIPv1 or Cisco's proprietary Interior Gateway Routing Protocol


(IGRP)—unlike its more secure successor, the Enhanced Interior Gateway Routing Protocol
(EIGRP)—do not offer any kind of authentication, and hence, should be used with care.
Across ASs: The Internet uses a hierarchical system where each AS exchanges routing infor-
mation with other ASs using the Border Gateway Protocol (BGP) [1769, 1770]. BGP is a path
vector routing protocol. We distinguish between External BGP used across ASs, and Internal
BGP that is used to propagate routes within an AS. From now on, when referring to BGP, we
talk about External BGP, as it comes with the most interesting security challenges. In BGP, ASs
advertise their IP prefixes (IP address ranges of size /24 or larger) to peers, upstreams and
customers [1731]. BGP routers append their AS information before forwarding these prefixes
to their neighbors. Effectively, this creates a list of ASs that have to be passed to reach the
prefix, commonly referred to as the AS path.
High-impact attacks in the past have highlighted the security weakness in BGP due to its
lack of integrity and authentication [1771]. In particular, in a BGP prefix hijacking attack [1772],
a malicious router could advertise an IP prefix, saying that the best route to a service is
through its network. Once the traffic starts to flow through its network, it can drop (DoS,
censorship), sniff on (eavesdrop) or redirect traffic in order to overload an unsuspecting AS.
As a countermeasure, the Resource Public Key Infrastructure (RPKI) [1773], as operated by
the five Regional Internet Registries (RIRs), maps IP prefixes to ASs in so-called Route Origin
Authorizations (ROAs). When neighbors receive announcements, RPKI allows them to discard
BGP announcements that are not backed by an ROA or that are more specific than allowed by the
ROA. This process, called Route Origin Validation (ROV), enables routers to drop advertisements
in which the AS that owns the advertised prefix is not the origin of the advertised path.
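The validation logic can be sketched with a simplified three-state outcome (valid, invalid, not-found), in the spirit of RFC 6811. The ROA data and AS numbers below are hypothetical, and the prefix matching is simplified.

```python
# Sketch of Route Origin Validation (ROV): a BGP announcement is checked
# against ROAs binding a prefix (up to a maximum length) to an origin AS.
# ROAs and AS numbers below are illustrative.

from ipaddress import ip_network

# Hypothetical ROAs: (prefix, max_length, authorized origin AS)
roas = [(ip_network("192.0.2.0/24"), 24, 64500)]

def rov(prefix: str, origin_as: int) -> str:
    net = ip_network(prefix)
    covered = False
    for roa_prefix, max_len, roa_as in roas:
        if net.subnet_of(roa_prefix):
            covered = True
            if origin_as == roa_as and net.prefixlen <= max_len:
                return "valid"
    return "invalid" if covered else "not-found"

print(rov("192.0.2.0/24", 64500))     # valid: matches the ROA
print(rov("192.0.2.0/24", 64666))     # invalid: wrong origin (hijack attempt)
print(rov("192.0.2.0/25", 64500))     # invalid: more specific than allowed
print(rov("198.51.100.0/24", 64666))  # not-found: no covering ROA
```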
RPKI cannot detect bogus advertisements where the owning AS is on path, but a malicious
AS aims to reroute the target’s AS traffic as an intermediary. BGPsec partially addresses this
remaining security concern [1774]. Two neighbouring routers can use IPsec mechanisms for
point-to-point security to exchange updates. Furthermore, BGPsec enables routers to verify
the incremental updates of an announced AS path. That is, they can verify which on-path
AS has added itself to the AS path, preventing bogus paths that include a malicious AS
lacking the corresponding cryptographic secrets. However, BGPsec entails large overheads, such
as verifying a larger number of signatures on booting, and splitting up bulk announcements
into many smaller ones. Furthermore, BGPsec only adds security if all systems on the AS
path support it. Hence, not many routers deploy BGPsec yet, fueled by the lack of short-term
benefits [1775]—and it is likely to take years until it will find wide adoption, if ever.
Despite the fact that BGP prefix hijacks are a decade-old problem, fixing them retroactively
remains one of the great unsolved challenges in network security. In fact, one camp argues
that the BGP design is inherently flawed [1776], and entire (yet not widely deployed) Internet
redesigns such as SCION [1777] indeed provide much stronger guarantees. Others have not given
up yet, and hope to further strengthen the trust in AS paths with the help of ongoing initiatives
such as Autonomous System Provider Authorization (ASPA) [1778].


19.3.3.4 ICMP Security

Internet Control Message Protocol (ICMP) is a supportive protocol mainly used for exchanging
status or error information. Unfortunately, it has introduced several orthogonal security risks,
most of which are no longer present but are still worth mentioning. Most notably, there are many
documented cases in which ICMP was an enabler for Denial of Service (DoS) attacks. The Ping
of Death abused a malformed ICMP packet that triggered a software bug in earlier versions
of the Windows operating system, typically leading to a system crash at the packet recipient.
In an ICMP flood, an attacker sends massive amounts of ICMP packets to swamp a target
network/system. Such floods can be further amplified in so-called smurf attacks, in which an
attacker sends IP-spoofed ICMP ping messages to the broadcast address of an IP network.
If the ICMP messages are relayed to all network participants using the (spoofed) address
of the target system as source, the target receives ping responses from all active devices.
Smurf attacks can be mitigated by dropping ICMP packets from outside of the network, or by
dropping ICMP messages destined to broadcast addresses.
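The two smurf mitigations just described can be sketched as a simple border filter. The network addresses are illustrative, and the filter is a simplification: real deployments would express these rules in firewall or router configuration rather than application code.

```python
# Sketch of a border filter against smurf attacks: drop ICMP addressed to
# the local broadcast address, and drop ICMP arriving from external sources.
# Addresses are illustrative.

from ipaddress import ip_address, ip_network

LOCAL_NET = ip_network("192.0.2.0/24")

def accept_icmp(src: str, dst: str) -> bool:
    """Reject external ICMP and ICMP destined to the broadcast address."""
    if ip_address(dst) == LOCAL_NET.broadcast_address:
        return False  # would be relayed to every host (smurf)
    if ip_address(src) not in LOCAL_NET:
        return False  # external ICMP dropped outright
    return True

print(accept_icmp("192.0.2.9", "192.0.2.255"))    # False: broadcast ping
print(accept_icmp("198.51.100.1", "192.0.2.10"))  # False: external source
print(accept_icmp("192.0.2.9", "192.0.2.10"))     # True: internal unicast
```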
ICMP is also worth considering from a security perspective outside of the DoS context.
Insider attackers can abuse ICMP as a covert channel to leak sensitive data unless ICMP is
closely monitored or forbidden. ICMP reachability tests allow attackers to perform network
reconnaissance during network scans (see also Section 19.4.3). Many network operators
thus balance the pros and cons of ICMP in their networks, often deciding to drop external ICMP
messages using a firewall (see also Section 19.4.1).

19.3.4 Security on the Link Layer


[1731, c8] [1733, c7] [1732, c8]
In this section, we are confining our attention to the security of the link layer. We mostly focus
on the logical part of the link layer. The physical part is addressed in the Physical Layer and
Telecommunications Security Knowledge Area (Chapter 22).

19.3.4.1 Port-based Network Access Control (IEEE 802.1X)

IEEE 802.1X is a port-based authentication standard for securing both wired and wireless networks.
Before users can access a network at the link layer, they must authenticate to the switch or access
point (AP) they are attempting to connect to, either physically or via a wireless channel. As with
most standards bodies, this group has its own jargon. Figure 19.5 shows a typical 802.1X setup.
A user is called a supplicant and a switch or AP is called an authenticator. Supplicant software
is typically available on various OS platforms, or it can also be provided by chipset vendors.
A supplicant (client) wishing to access a network must use the Extensible Authentication
Protocol (EAP) to connect to the Authentication Server (AuthS) via an authenticator. EAP
is an end-to-end (client to authentication server) protocol. When a new client (supplicant)
is connected to an authenticator, the port on the authenticator is set to the 'unauthorised'
state, allowing only 802.1X traffic. Other higher-layer traffic, such as TCP/UDP, is blocked. The
authenticator sends the EAP-Request identity to the supplicant. The supplicant responds
with an EAP-Response packet, which is forwarded to the AuthS and typically proves that the
supplicant possesses its credentials. After successful verification, the authenticator unblocks
the port to let higher-layer traffic through. When the supplicant logs off, an EAP-Logoff to the
authenticator sets the port to block all non-EAP traffic.
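The port state transitions described above can be sketched as a minimal state machine. This is a heavy simplification of the 802.1X authenticator state machine, and all names are illustrative.

```python
# Sketch of the authenticator's port state in IEEE 802.1X: until the
# supplicant is verified via EAP, only EAPOL traffic passes the port.
# Heavily simplified; names are illustrative.

class AuthenticatorPort:
    def __init__(self):
        self.authorized = False  # ports start in the 'unauthorised' state

    def forward(self, frame_type: str) -> bool:
        """Only EAPOL (802.1X) traffic is allowed on an unauthorised port."""
        return self.authorized or frame_type == "EAPOL"

    def eap_success(self):
        # AuthS verified the supplicant's credentials: unblock the port.
        self.authorized = True

    def eap_logoff(self):
        # Supplicant logged off: block all non-EAP traffic again.
        self.authorized = False

port = AuthenticatorPort()
print(port.forward("TCP"))    # False: port still unauthorised
print(port.forward("EAPOL"))  # True: authentication traffic passes
port.eap_success()
print(port.forward("TCP"))    # True: higher-layer traffic unblocked
```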


[Figure: a wireless supplicant device connects to the authenticator (switch/AP) via EAP over
LAN (EAPOL) on IEEE 802.11 or the wired LAN; the authenticator relays the authentication
over RADIUS/LDAP on UDP/IP to the RADIUS server. EAP (optionally EAP-TLS) runs
end-to-end between supplicant and authentication server.]

Figure 19.5: Extensible Authentication Protocol (EAP)

There are a couple of pitfalls when deploying EAP, most of them related to choosing the wrong
mode of operation. Certain EAP modes are prone to PITM attacks, especially in a wireless setting.
It is therefore advised to use a TLS-based EAP variant, such as EAP-TLS [1779] or the Protected
Extensible Authentication Protocol (PEAP). Similarly, dictionary attacks can weaken the security
guarantees of certain EAP modes (e.g., EAP-MD5), which should be avoided.

19.3.4.2 WAN Link-Layer Security

What IEEE 802.1X is for local networks, protocols like the Point-to-Point Protocol (PPP), its sibling
PPP over Ethernet (PPPoE), or High-Level Data Link Control (HDLC) are for Wide Area Networks
(WANs). They offer clients the means to connect to the Internet with the help of their ISPs. PPP(oE) is
the most widely used protocol in this context, used by billions of broadband devices worldwide.
Although optional in its standard, in practice, ISPs usually mandate client authentication to
keep unauthorized users out. Popular examples of such authentication protocols within PPP
are the Password Authentication Protocol (PAP), the Challenge Handshake Authentication Protocol
(CHAP), or any of the authentication protocols supported by EAP. Usage of PAP is discouraged,
as it transmits client credentials in plaintext. Instead, CHAP uses a reasonably secure challenge-
response authentication, which is, however, susceptible to offline brute-force attacks against
recorded authentication sessions that involve weak credentials.


19.3.4.3 Attacks On Ethernet Switches

Ethernet switches maintain forwarding table entries in a Content Addressable Memory (CAM).
As a switch learns about a new destination host, the switch includes this host's address and
its physical port in the CAM. For all future communications, this table entry is looked up to
forward a frame to the correct physical port. MAC spoofing allows attackers to manipulate this
mapping by forging their Media Access Control (MAC) addresses when sending traffic.
For example, to poison the forwarding table, an attacker crafts frames with random addresses
to populate the entire CAM. If successful, switches have to flood all incoming data frames
to all outgoing ports, as they can no longer enter new address-to-port mappings. This
makes the data available to an attacker attached to any of the switch ports.
Such MAC spoofing attacks can also be more targeted. Assume attackers want to steal traffic
destined to one particular target host only, instead of seeing all traffic. The attacker then
copies the target’s MAC address. This way, the attacker may implicitly rewrite the target’s entry
in the switch forwarding table. If so, the switch will falsely forward frames to the attacking
host that were actually destined for the target host.
Mitigating these MAC spoofing attacks requires authenticating the MAC addresses before
populating the forwarding table entry. For example, IEEE 802.1X (see Section 19.3.4.1) mitigates
such attacks by vetting hosts before they can connect. Furthermore, switches may limit the
number of MAC addresses per interface or enforce MAC bindings, as described next.
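The per-interface MAC limit (often marketed as "port security") can be sketched as follows. The limit, the data structures, and the violation handling are illustrative; real switches typically shut the port down or raise an alarm on a violation.

```python
# Sketch of switch port security: each physical port may learn only a
# bounded number of MAC addresses, blunting CAM-flooding attacks.
# The limit and data structures are illustrative.

MAX_MACS_PER_PORT = 2

class Switch:
    def __init__(self):
        self.learned = {}  # port -> set of learned MAC addresses

    def learn(self, port: str, mac: str) -> bool:
        """Return False on a violation (too many MACs on one port)."""
        macs = self.learned.setdefault(port, set())
        if mac in macs:
            return True
        if len(macs) >= MAX_MACS_PER_PORT:
            return False  # violation: e.g., shut the port or raise an alarm
        macs.add(mac)
        return True

sw = Switch()
print(sw.learn("Gi0/1", "aa:aa:aa:aa:aa:01"))  # True
print(sw.learn("Gi0/1", "aa:aa:aa:aa:aa:02"))  # True
print(sw.learn("Gi0/1", "aa:aa:aa:aa:aa:03"))  # False: flood attempt blocked
```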

19.3.4.4 Address Resolution Protocol (ARP) / Neighbor Discovery Protocol (NDP)

The Address Resolution Protocol (ARP) translates IPv4 addresses to link layer addresses
(e.g., MAC addresses in Ethernet). ARP spoofing is similar to MAC spoofing, yet does not
(only) target the switch's address mappings. Instead, ARP spoofing targets the IP-to-MAC
address mappings of all network participants (possibly including the switch) in the same
segment. To this end, ARP spoofing attackers send fake ARP messages over a LAN. For
example, they can broadcast crafted ARP requests and hope participants learn wrong IP-to-
MAC mappings on the fly, or reply with forged replies to ARP requests. Either way, attackers
aim to (re-)bind the target’s IP address to their own MAC address. If successful, attackers
will receive data that were intended for the target’s IP address. ARP spoofing is particularly
popular for session hijacking and PITM attacks. Similar attacks are possible for the Reverse
Address Resolution Protocol (RARP), which—by now rarely used—allows hosts to discover
their IP address. To mitigate ARP spoofing, switches employ (or learn) a trusted database of
static IP-to-MAC address mappings, and refuse to relay any ARP traffic that contradicts these
trusted entries. Alternatively, network administrators can spot ARP anomalies [1780], e.g.,
searching for suspicious cases in which one IP address maps to multiple MAC addresses.
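The anomaly check just mentioned can be sketched in a few lines: record observed IP-to-MAC bindings and flag any IP address that is suddenly claimed by a second MAC address. The addresses below are illustrative.

```python
# Sketch of ARP anomaly detection: flag IP addresses that map to more than
# one MAC address in observed ARP replies, a common sign of ARP spoofing.

def find_conflicts(arp_replies):
    """arp_replies: iterable of (ip, mac). Return IPs claimed by >1 MAC."""
    seen = {}        # ip -> first MAC observed
    conflicts = set()
    for ip, mac in arp_replies:
        if ip in seen and seen[ip] != mac:
            conflicts.add(ip)
        seen.setdefault(ip, mac)
    return conflicts

replies = [
    ("192.0.2.1", "aa:aa:aa:aa:aa:01"),
    ("192.0.2.2", "aa:aa:aa:aa:aa:02"),
    ("192.0.2.1", "aa:aa:aa:aa:aa:99"),  # gateway IP rebound: suspicious
]
print(find_conflicts(replies))  # {'192.0.2.1'}
```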
What ARP is for IPv4, the Neighbor Discovery Protocol (NDP) is for IPv6. NDP is based on
ICMPv6 and is more feature-rich than ARP. Conceptually, though, NDP is subject to the same
spoofing risks as ARP, and requires the same countermeasures. Furthermore, there is one
more caveat due to automatic IPv6 address assignment. In IPv6's most basic (yet common)
IP address autoconfiguration scheme, layer-3 addresses are derived directly from layer-2
addresses without any need for address resolution. Knowledge of the MAC address may allow
attackers to infer information about the hosts/servers, which can be handy when launching
attacks, or to track devices even if they change network prefixes. Using a hash function for
address generation is recommended as a mitigation technique. Further, RFC 4982 extends
IPv6 by allowing for a Cryptographically Generated Address (CGA), where an address is bound


to a public signature key. Orthogonal to this, RFC 7217 proposes to have stable addresses
within a network prefix, and change them when clients switch networks to avoid cross-network
tracking.

19.3.4.5 Network Segmentation

MAC spoofing and ARP spoofing nicely illustrate how fragile security on the link layer
is. Consequently, network architects aim to split their physical network into several smaller
networks—a practice known as network segmentation. Highly-critical environments such as
sensitive military or control networks use physical segmentation. As the required change
of cables and wires is quite expensive, virtual network segmentation has become a popular
alternative. Virtual LANs (VLANs) are the de facto standard for virtual segmentation in
Ethernet networks. VLANs can split sensitive (e.g., internal servers) from less sensitive (e.g., guest
WiFi) network segments. VLANs ensure that routers can see and act upon traffic between
segments, and limit the harm attackers can do to the entire LAN. It is important to note that
network segmentation (e.g., via VLANs) does not necessarily require VPNs to bridge the
networks. If all network segments are local, a router that is part of multiple subnetworks can
connect them, ideally augmented with secure firewall policies (cf. Section 19.4.1) that control
inter-network communication at the IP layer.
VLAN hopping attacks allow an attacking host on a VLAN to gain access to resources on
other VLANs that would normally be restricted. There are two primary methods of VLAN
hopping: switch spoofing and double tagging. In a switch spoofing attack, an attacking host
impersonates a trunking switch by responding to the tagging and trunking protocols (e.g., IEEE
802.1Q or the Dynamic Trunking Protocol) typically used in a VLAN environment. The attacker
thereby succeeds in accessing traffic for multiple VLANs. Vendors mitigate these attacks through
proper switch configuration: for example, trunking ports are assigned their role explicitly and
all others are configured as access ports only. Also, any automatic trunk negotiation protocol
can be disabled. In a double tagging attack, an attacker succeeds in sending its frame to more
than one VLAN by inserting two VLAN tags to a frame it transmits. However, this attack does
not allow them to receive a response. Again, vendors provide recommended configuration
methods to deal with these possible attacks. A comprehensive survey of Ethernet attacks
and defence can be found in [1781].
Organizations like hosting providers that heavily virtualize services quickly reach the limit
of slightly fewer than 4096 VLANs when trying to isolate their services.
Virtual eXtensible LAN (VXLAN) tackles this limitation by introducing an encapsulation scheme
for multi-tenant environments [1782]. Unlike VLANs, which work on the link layer, VXLANs
strictly speaking operate at the network layer to emulate link-layer networks. VXLAN allows
creating up to ≈16M virtually separated networks, which can additionally be combined with
VLAN functionality. Like VLANs, VXLANs also do not aim to provide confidentiality or integrity
in general. Instead, they are a means to segment networks. Worse, however, being on the network
layer, VXLAN packets can traverse the Internet, and may allow attackers to inject spoofed
VXLAN packets into "remote" networks. Thus, care has to be taken, e.g., by ingress filtering at
the network edge to drop external VXLAN packets that carry a valid VXLAN endpoint IP address.


19.3.4.6 Wireless Security

Wireless LANs are more vulnerable to security risks due to the broadcast nature of the medium,
which simplifies eavesdropping. There have been several failed attempts to add integrity and
confidentiality to WLAN communication. First, the Wired Equivalent Privacy (WEP) protocol
used a symmetric key encryption method where the host shares a key with an Access Point
(AP) out of band. WEP had several design flaws. First, its 24-bit IV introduced a weakness in that
the ≈16 million unique IVs can be exhausted on high-speed links in less than 2 hours. Given that IVs
are sent in plaintext, an eavesdropper can easily detect this reuse and mount a known-plaintext
attack. Furthermore, using RC4 allowed for the Fluhrer, Mantin and Shamir (FMS) attacks,
in which an attacker can recover the key in an RC4-encrypted stream by capturing a large
number of messages in that stream [1783, 1784]. Finally, WEP's linear CRC was great for
detecting random link errors, but failed to reliably reveal malicious message modifications.
An interim standard called Wi-Fi Protected Access (WPA) was quickly developed for
backward hardware compatibility while WPA2 was being worked out. WPA uses the Temporal
Key Integrity Protocol (TKIP) but maintains RC4 for compatibility. The Pre-Shared Key (PSK),
also known as WPA-Personal, is similar to the WEP key. However, the PSK is used differently: a
nonce and the PSK are hashed to generate a temporal key. Following this, a cryptographic mixing
function combines this temporal key, the Temporal MAC (TMAC), and the sequence
counter, resulting in one key for encryption (128 bits) and another key for integrity (64 bits). As
a consequence, every packet is encrypted with a unique encryption key to avoid FMS-style
attacks. Also, WPA extends the WEP IV to 48 bits. Several new fields were added, including a Frame
Check Sequence (FCS) field with a CRC-32 checksum for error detection and a hash function for a
proper integrity check. Due to the compromises it made with respect to backwards compatibility,
WPA has had its own share of attacks, though [1785].
The Wi-Fi Alliance then standardized WPA2 in 2004. WPA2 relies on more powerful hardware
supporting a 128-bit AES Counter Mode with the Cipher Block Chaining Message Authentica-
tion Code Protocol (CCMP), obsoleting RC4. It also provides an improved 4-way handshake
and temporary key generation method (which does not feature forward secrecy, though).
While implementations of this handshake were shown insecure [1786], the general handshake
methodology was formally verified and is still believed to be secure [1787, 1788].
In 2018, the new WPA3 standard was accepted to enable a gradual transition away from, and eventually replace, WPA2. WPA3 overcomes the lack of perfect forward secrecy in WPA and WPA2.
The PSK is replaced with a new key distribution scheme called Simultaneous Authentication of Equals (SAE), based on the IETF Dragonfly key exchange. The WPA3-Personal mode uses 128-bit encryption, whereas WPA3-Enterprise uses 192-bit encryption.
The discussion so far assumed that there is a shared secret between WLAN users and APs
from which session keys can be derived. In fact, enterprise settings usually handle WLAN access control by performing strong authentication such as 802.1X (Section 19.3.4.1). Ideally,
WLAN users have their own client certificates that provide much stronger security than any
reasonably user-friendly password. In contrast, for openly accessible networks such as at
airports or restaurants, there may neither be PSKs nor certificates. Consequently, the lack of
strong encryption would leave communication unprotected. Opportunistic Wireless Encryption
(OWE) tackles this open problem [1789]. Instead of using a PSK during the WPA2/3 four-way handshake, the client and AP use a pairwise secret derived from an initial Diffie-Hellman Key Exchange (DHKE).
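The core idea of OWE, deriving a pairwise secret from an ephemeral Diffie-Hellman exchange instead of a PSK, can be sketched as follows. This is a toy illustration only: the small Mersenne-prime group and the plain SHA-256 key derivation are illustrative assumptions, not OWE's actual construction.

```python
import hashlib
import secrets

# Toy finite-field Diffie-Hellman group. The 127-bit Mersenne prime is far
# too small for real security; it merely keeps the example readable.
P = 2**127 - 1
G = 3

def dh_keypair():
    """Generate an ephemeral private/public Diffie-Hellman key pair."""
    priv = secrets.randbelow(P - 2) + 1
    pub = pow(G, priv, P)
    return priv, pub

# Client and AP each generate ephemeral keys and exchange the public
# values in the clear during association.
c_priv, c_pub = dh_keypair()
ap_priv, ap_pub = dh_keypair()

# Both sides arrive at the same shared secret without any pre-shared key,
# which then takes the PSK's place as input to the 4-way handshake.
c_secret = pow(ap_pub, c_priv, P)
ap_secret = pow(c_pub, ap_priv, P)
assert c_secret == ap_secret
pairwise_key = hashlib.sha256(c_secret.to_bytes(16, "big")).hexdigest()
```

A passive eavesdropper observing only the public values cannot recover the secret; lacking authentication, however, OWE does not protect against an active PITM.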

KA Network Security | July 2021 Page 669


The Cyber Security Body Of Knowledge
www.cybok.org

19.3.4.7 Bus Security

Bus networks follow a special topology in that all nodes are directly connected to a shared
medium (the bus). Securing a bus is inherently complex, especially if we assume that an insider
attacker is connected to the bus. In order to illustrate this, we will focus on the Controller Area
Network (CAN) standard, which despite its age is still quite commonly used in cars today.
CAN nicely reveals many issues that can arise on bus networks in general. CAN connects so-called Electronic Control Units (ECUs), such as those controlling a car’s wheels, brake pedal or radio.
CAN is a real-time protocol designed to give priority to more urgent ECUs (e.g., brake pedal)
over less pressing ones (e.g., multimedia control). Sadly, CAN suffers from severe security
vulnerabilities. These become especially problematic if ECUs are, or become, malicious (e.g., after compromise). First, CAN does not authenticate messages, i.e., any compromised ECU (e.g.,
multimedia system) can easily spoof messages of critical components (e.g., wheel speed
sensor). Second, compromised bus components can receive and invalidate the messages of arbitrary other ECUs on the same bus. For example, a compromised ECU could suppress
the signals sent by an activated brake pedal. Finally, and a little less concerning than the
previous examples, CAN is unencrypted, providing no confidentiality against sniffing.
A radical protocol change could solve all these problems. In fact, there are new standards
like AUTomotive Open System ARchitecture (AUTOSAR) [1790] that provide improved security
principles. Yet, as always, such radical changes take a long time in practice, as they break compatibility with existing devices. Moreover, devices have years-long development cycles and usage times. Vendors are aware of these issues and aim to mitigate the problem by segmenting critical
components from less critical ones (segmentation in general is discussed in Section 19.3.4.5).
While certainly a vital step, as it physically disconnects more complex and vulnerable devices
such as multimedia systems from safety-critical devices, this only reduces and not entirely
eliminates the attack surface. A star topology would solve many of these issues, as the
medium is no longer shared and address spoofing could be validated by a central entity. Yet star topologies require significantly more physical cabling, and thus higher costs, manufacturing
complexity, and weight. Academia has explored several approaches to add message authenticity to CAN and to prevent spoofing without breaking backwards compatibility [1791, 1792].
None of them has found wide deployment in practice yet, though, possibly due to costs and the
need to adapt ECUs. Alternative approaches aim to detect spoofed messages by learning and
modeling the per-ECU voltage of bus messages. Unfortunately, such classifiers were proven
unreliable [1793]. Wider adoption of CAN-FD [1738], which offers a flexible data rate and larger messages (64 B instead of 8 B in CAN), will decrease the overhead of security add-ons and may thus ease the development of more secure CAN communication in the future.
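To illustrate how such backwards-compatible authenticity add-ons work, the sketch below appends a truncated HMAC and a replay counter to a CAN payload. The 3-byte tag, the 4-byte counter encoding and the key handling are our own illustrative assumptions, not any specific published scheme.

```python
import hmac
import hashlib

def authenticate_can_frame(key: bytes, can_id: int, payload: bytes, counter: int) -> bytes:
    """Append a truncated MAC to a CAN payload.

    Classic CAN frames carry at most 8 bytes of payload, so a scheme using
    5 data bytes leaves 3 bytes (24 bits) for a truncated tag. The counter
    prevents replay of previously captured frames.
    """
    assert len(payload) <= 5
    msg = can_id.to_bytes(4, "big") + counter.to_bytes(4, "big") + payload
    tag = hmac.new(key, msg, hashlib.sha256).digest()[:3]
    return payload + tag

def verify_can_frame(key: bytes, can_id: int, frame: bytes, counter: int) -> bool:
    """Recompute the truncated MAC and compare it in constant time."""
    payload, tag = frame[:-3], frame[-3:]
    msg = can_id.to_bytes(4, "big") + counter.to_bytes(4, "big") + payload
    expected = hmac.new(key, msg, hashlib.sha256).digest()[:3]
    return hmac.compare_digest(tag, expected)

key = b"shared-ecu-key-0"
frame = authenticate_can_frame(key, 0x244, b"\x01\x02\x03", counter=7)
assert verify_can_frame(key, 0x244, frame, counter=7)
assert not verify_can_frame(key, 0x244, frame, counter=8)  # replayed counter fails
```

The sketch also shows the cost pressure mentioned above: the tag consumes scarce payload bytes, which is exactly the overhead that CAN-FD's larger frames would relieve.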
Many of the observed problems generalize beyond CAN to any bus system or even shared-
medium network. Rogue components on a bus can suppress messages by invalidating them,
anyone on a bus can see all messages, and there is no built-in protection against spoofing.
Physical separation and segmentation of bus networks remains one of the key concepts to
securing them. In addition, to retrofit security guarantees onto insecure bus protocols, we sometimes see complete protocol overhauls that typically break backward compatibility. For example, the insecure Modbus standard from 1979 [1736] has had a secure alternative (the Modbus/TCP Security Protocol [1794]) since 2018, which wraps bus messages in secure TLS-protected channels.


19.4 NETWORK SECURITY TOOLS


[1731, c8] [1733, c5,c8,c11,c12] [1734, c23] [1735, c6] [1732, c8]
Until now, we have discussed attacks and defenses at the protocol level. We will now introduce additional tools orthogonal to these protocol-level defenses. Many of these tools have become de facto standards on top of the aforementioned security schemes at the protocol
level. We only provide a brief overview here. The effective deployment of these tools is covered
in detail in the Security Operations & Incident Management Knowledge Area (Chapter 8).

19.4.1 Firewalling
Firewalls can be co-located with routers or implemented as specialised servers. In either case,
they are gatekeepers, inspecting all incoming/outgoing traffic. Firewall systems are typically
configured as bastion hosts, i.e., minimal systems hardened against attacks. They apply traffic
filters based on a network’s security policy and treat all network packets accordingly. The
term filter is used for a set of rules configured by an administrator to inspect a packet and perform a matching action, e.g., let the packet through, drop it silently, or drop it and notify the sender via an ICMP message. Packets may be filtered according to their
source and destination network addresses, protocol type (TCP, UDP, ICMP), TCP or UDP
source/destination port numbers, TCP Flag bits (SYN/ACK), rules for traffic from a host or
leaving the network via a particular interface, and so on. Traditionally, firewalls were pure packet filters, which inspected header fields only. Nowadays, firewalls can also be stateful, i.e., they retain state information about flows and can map packets to streams. While stateful firewalls allow related traffic to be monitored and communication to be mapped to flows, this comes at the cost of maintaining (possibly large amounts of) state.

Rule State Src IP Src Port Dst IP Dst Port Proto Action
#1 NEW 172.16.0.0/24 * * 80, 443 TCP ACCEPT
#2 NEW * * 172.16.20.5 22 TCP ACCEPT
#3 ESTABLISHED * * * * TCP ACCEPT
#4 * * * * * * DROP

Figure 19.6: Firewalling example. Rule #1 allows outgoing HTTP(S), rule #2 allows incoming
SSH.

Figure 19.6 shows a simple example firewall configuration. All internal hosts (here, in network
172.16.0.0/24) are allowed to communicate to TCP ports 80/443 for HTTP/HTTPS to
external hosts (rule #1). External hosts can connect to an internal SSH server via TCP on port
22 (rule #2). All follow-up communication of these connections is granted (rule #3). Any other
communication is dropped (rule #4). In reality, firewall configurations can become vastly more complex than this minimal example. Specifying complete and coherent policies is typically hard. It often helps to first lay out a firewall decision diagram, which is then, ideally automatically, transferred into concrete, functionally equivalent firewall policies [1795]. Tools
like Firewall Builder [1796] or Capirca [1797] can assist in this process.
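The first-match semantics of a rule set like Figure 19.6 can be sketched in a few lines; the rule encoding below is our own simplification and omits the stateful connection tracking needed for rule #3.

```python
import ipaddress

# Simplified, stateless encoding of Figure 19.6; '*' acts as a wildcard.
# Rule #3 (ESTABLISHED) would require connection tracking and is omitted.
RULES = [
    {"src": "172.16.0.0/24", "dst": "*", "dport": {80, 443}, "proto": "tcp", "action": "ACCEPT"},
    {"src": "*", "dst": "172.16.20.5/32", "dport": {22}, "proto": "tcp", "action": "ACCEPT"},
    {"src": "*", "dst": "*", "dport": "*", "proto": "*", "action": "DROP"},
]

def match(rule, src, dst, dport, proto):
    """Check whether a packet's header fields satisfy one rule."""
    def in_net(ip, net):
        return net == "*" or ipaddress.ip_address(ip) in ipaddress.ip_network(net)
    return (in_net(src, rule["src"]) and in_net(dst, rule["dst"])
            and (rule["dport"] == "*" or dport in rule["dport"])
            and rule["proto"] in ("*", proto))

def filter_packet(src, dst, dport, proto="tcp"):
    """Return the action of the first matching rule (first-match semantics)."""
    for rule in RULES:
        if match(rule, src, dst, dport, proto):
            return rule["action"]
    return "DROP"  # default-deny if no rule matches
```

For example, `filter_packet("172.16.0.5", "93.184.216.34", 443)` is accepted under rule #1, while unsolicited inbound HTTP to an internal host falls through to the final DROP rule.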
Application Gateway (AG): Application gateways, aka application proxies, perform access
control and thus facilitate any additional requirements of user authentication before a session is admitted. These AGs can also inspect content at the application layer, unless fully encrypted. In a typical setting, the application gateway will use a firewall’s services after
performing authentication and policy enforcement. A client wanting to access an external
service would connect to the AG first. The AG would prompt them for authentication before
initiating a session to the external server. The AG would now establish the connection with
the destination acting as a relay on behalf of the client, essentially creating two sessions like
a PITM. Another interesting application of an AG is TLS termination. An incoming webserver TLS connection could be terminated at the AG, so that the AG performs the resource-intensive encryption/decryption and passes the unencrypted traffic to the back-end servers. In practice,
AGs are also configured to inspect encrypted outbound traffic where the clients are configured
with corresponding certificates installed at the AG.
Circuit-level Gateway (CG): A CG is a proxy that functions as a relay for TCP connections,
thus allowing hosts from a corporate Intranet to make TCP connections over the Internet. CGs
are typically co-located with a firewall. The most widely used CG today is SOCKS. For end user
applications, it runs transparently as long as the hosts are configured to use SOCKS in place
of a standard socket interface. A CG is simple to implement compared to an AG, as it does
not need to understand application layer protocols.
DMZ: Network designs ensure careful firewall placement by segmenting networks. Typically, a Demilitarised Zone (DMZ) (aka a perimeter network) is created. External untrusted users are restricted to the services available in this zone. Usually, an organisation’s
public web server and authoritative DNS would reside in the DMZ. The rest of the network
is partitioned into several security zones by a security architect. For example, a payment database would be deployed to an isolated network, as would an internal file server.

19.4.2 Intrusion Detection and Prevention Systems


Intrusion Detection Systems (IDSs) can provide valuable information about anomalous net-
work behaviour. They inspect payload, higher-layer information and many more attributes of
sessions beyond what a firewall can do. An IDS monitors network traffic with the help of agents/sensors/monitors on the network and sets off alarms when it detects (or believes it has detected) suspicious activity. Essentially, the IDS compares the traffic against what it considers normal traffic and, using a range of techniques, generates an alert. IDSs can operate
purely on traffic statistics that can be derived from header data. Alternatively, Deep Packet Inspection (DPI) allows the inspection of transport- or application-layer payloads to recognise known malicious communication patterns (e.g., of malware). There are several widely-used IDSs
such as Snort [859], Zeek [1798] or Suricata [1799]. IDSs have been used in numerous contexts,
such as for detecting malware [671, 1800, 1801, 1802], attacks in automotive networks [1803]
or against unmanned vehicles [1804], software exploits [1805], DoS attacks [1806] or attacks
in wireless ad-hoc networks [1807].
The accuracy of an IDS remains a fundamental operational challenge. False alarms are a
huge problem for network/security administrators despite decades of research. An IDS may
generate false positives for legitimate hosts carrying out suspicious yet benign behaviour.
Conversely, a false negative occurs when malicious activity remains undetected.
Signature-based IDSs compare monitored traffic against a database of known malicious
communication patterns. The database has to be continually updated, and despite all efforts,
will never be complete. Signatures can be as simple as a source/destination IP address or
other protocol headers, or match payload patterns. Rule specification goes beyond the scope
of this chapter and is covered by more detailed textbooks [1808, 1809]. The following toy rule generates an alert if the payload of a TCP/80 connection to network 192.168.5.7/24 contains a ‘GET’ string:

alert tcp any any -> 192.168.5.7/24 80 (content:"GET"; msg:"GET has been detected";)

A signature-based IDS generates a heavy workload, as it has to compare huge numbers of signatures. Speed of detection plays a key role in preventing these attacks. Several systems
deploy parallel and distributed detection systems that can cope with high traffic rates on large
networks and allow online detection; others exploit parallelism at the hardware level in order
to overcome processing delays so that packets and flows can be processed at high speeds.
Anomaly-based IDSs compare monitored traffic to behavioural models built from “normal” traffic observed during a learning phase. Instead of blocking certain patterns, anomaly detection allows all communication it deems benign, and blocks the rest. The fundamentally hard
problem here is to capture normal traffic that is both clean (i.e., does not contain malicious
behavior) and sufficiently representative (i.e., it also captures benign behavior that will arise
in the future). For example, an anomaly-based system based on statistical features could
capture bandwidth usage, protocols, ports, arrival rate and burstiness [869]. In this example,
a large percentage of port scans would be anomalous and generate an alert. Despite using machine learning techniques, the accuracy of anomaly-based IDSs remains unsatisfactory in practical deployments [1810].
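A minimal statistical anomaly detector over one such feature, the per-interval packet rate, might look as follows; the three-standard-deviation threshold is an arbitrary illustrative choice.

```python
import statistics

class RateAnomalyDetector:
    """Flag intervals whose packet count deviates from a learned baseline."""

    def __init__(self, threshold_sigmas: float = 3.0):
        self.threshold = threshold_sigmas
        self.mean = 0.0
        self.stdev = 0.0

    def train(self, baseline_counts):
        # Learning phase: assumes the baseline is both clean and
        # representative -- the hard part in practice.
        self.mean = statistics.fmean(baseline_counts)
        self.stdev = statistics.stdev(baseline_counts)

    def is_anomalous(self, count: int) -> bool:
        # Detection phase: z-score against the learned distribution.
        if self.stdev == 0:
            return count != self.mean
        return abs(count - self.mean) / self.stdev > self.threshold

detector = RateAnomalyDetector()
detector.train([100, 110, 95, 105, 98, 102, 107, 99])  # packets per interval
assert not detector.is_anomalous(104)   # within the normal range
assert detector.is_anomalous(500)       # e.g., a flood or scan burst
```

The example also hints at the false-alarm problem discussed above: any benign burst exceeding the threshold would raise an alert just the same.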
Another way of classifying IDSs is the point of monitoring for malicious behaviour. A Host
Intrusion Detection System (HIDS) runs on individual hosts in the network. Most virus scanners include this feature, monitoring inbound and outbound traffic in addition to the usual virus scanning. This can be particularly helpful if hosts have been compromised and form part of a botnet attacking other servers/networks. In contrast, a network
intrusion detection system is deployed at strategic locations within the network to monitor
inbound and outbound traffic to and from the devices in various segments of the network.
Intrusion Prevention System (IPS): An IPS distinguishes itself from an IDS in that it can be
configured to block potential threats by setting filtering criteria on routers/switches at various
locations in the network. IPSs monitor traffic in real time, dropping suspected malicious packets, blocking traffic from malicious source addresses, or resetting suspect connections. In most cases, an IPS would also have IDS capabilities.

19.4.3 Network Security Monitoring


Several other network monitoring tools help to understand the security situation of a network.
We will briefly discuss these monitoring methodologies and mention example use cases.
Flow monitoring standards such as NetFlow [1811] or IPFIX [1812] aggregate statistical infor-
mation of all communication streams within a network. They provide a sweet spot between
recording all network communication and nothing at all. Flow aggregation typically requires few computational resources and has low storage demands, enabling long-term storage. Flow data comes in handy during network forensics, or even as input for anomaly detection [1813].
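Conceptually, a flow record is an aggregate keyed by the 5-tuple of source/destination address, ports and protocol. The following minimal sketch (the field names are our own, not the NetFlow/IPFIX wire format) shows why flow data is so compact: per flow, only counters and timestamps survive.

```python
from collections import defaultdict

def aggregate_flows(packets):
    """Aggregate packets into NetFlow-style records keyed by the 5-tuple.

    Each packet is (src_ip, dst_ip, src_port, dst_port, proto, nbytes, ts);
    each flow record keeps only counters and first/last timestamps, which
    is what makes flow data cheap enough for long-term storage.
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0, "first": None, "last": None})
    for src, dst, sport, dport, proto, nbytes, ts in packets:
        rec = flows[(src, dst, sport, dport, proto)]
        rec["packets"] += 1
        rec["bytes"] += nbytes
        rec["first"] = ts if rec["first"] is None else min(rec["first"], ts)
        rec["last"] = ts if rec["last"] is None else max(rec["last"], ts)
    return dict(flows)

pkts = [
    ("10.0.0.1", "93.184.216.34", 51000, 443, "tcp", 1500, 0.0),
    ("10.0.0.1", "93.184.216.34", 51000, 443, "tcp", 400, 0.3),
    ("10.0.0.2", "8.8.8.8", 53001, 53, "udp", 80, 0.1),
]
flows = aggregate_flows(pkts)
assert flows[("10.0.0.1", "93.184.216.34", 51000, 443, "tcp")]["packets"] == 2
```

Three packets collapse into two flow records here; on a busy network the reduction factor is far larger.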
Network forensics enables administrators to extract application payloads observed on their networks. When used as a sniffer or when applied to recorded traffic, frameworks such as NetworkMiner [1814] or Xplico [1815] can, e.g., extract files, emails and HTTP sessions. They
often come with further functionality to fingerprint the network hosts and to map network hosts to locations. Having said this, unless augmented with the corresponding key material, these forensic tools are limited to analyzing non-secure communication.
Network scans allow network administrators to enumerate hosts and services within their
network (or, optionally, the entire Internet). There are numerous tools such as Nmap [1816] or
Zmap [1767] that can send, e.g., ICMP and SYN probes at scale.
IP telescopes are publicly reachable network ranges that do not host any service or client.
Given that these networks are still routed, one can monitor any traffic sent to them and derive interesting observations. For example, IP telescopes help to observe network scans by others [1817]. Similarly, they allow one to spot backscatter [1818], i.e., responses to traffic that attackers have provoked by using the telescope’s IP addresses in IP spoofing attacks (e.g., when assigning random source IP addresses during SYN floods).
Honeypots are systems used by defenders to trap attackers. They are intentionally vulnerable
yet well-isolated client or server systems that are exposed to attackers. There is a wide diversity
of client-side honeypots (e.g., to emulate browser vulnerabilities [1819]) and server-side hon-
eypots (e.g., to emulate service vulnerabilities [1820, 1821], or to attract DDoS attacks [1822]).
Observing the techniques attackers use to exploit these honeypots gives valuable insights
into tactics and procedures.
Network reputation services can help to assess the trustworthiness of individual network
entities such as IP addresses or domain names. Based on past behaviour observed from
an entity, these mostly commercial providers publish a score that serves as reputation for
others. Identifying badly reputed hosts in network traffic can help to detect known attackers
or connections to botnets. Reputation services are, however, limited in coverage and accuracy due to the volatile domain and IP address usage of attacking hosts.
Finally, Security Information and Event Management (SIEM) systems collect events from
security-critical sensors (e.g., IDSs, firewalls, host-based sensors, system log files). A SIEM system then analyses these events to distill and raise security-critical incidents for further
inspection. It is particularly the combination of multiple data sources (system log files, host-
based anomaly sensors, firewall or IDS events) that makes SIEM so successful in detecting,
e.g., brute force attacks, worm propagations, or scans.

19.4.4 SDN and NFV Security


The SDN architecture enables novel threat detection and attack prevention capabilities [1823, 1824]. For example, the central SDN controller(s) can more accurately infer DDoS attacks, and automate mitigation strategies by dynamically reprogramming switches to drop malicious traffic flows. Furthermore, infected hosts can automatically be routed to an isolated
network region (sometimes called a walled garden) issuing quarantine notifications to users of
infected systems. SDN allows such immediate network isolation via software-controlled changes in the network, which previously required tedious manual reconfiguration. Similarly, network designs can be rapidly scaled to higher loads.
At the same time, the SDN management plane offers a unique attack vector. Intruders gaining control over the SDN controller undermine any security guarantees that SDN brings, and additionally gain the power to reconfigure networks at will. It is therefore of utmost importance that
the SDN controller software and the underlying platform follow strong security best practices,
including hardened software and strict access control. Furthermore, the SDN platform has
to be protected against new SDN-specific threats. For example, SDN controllers use a Spanning Tree Algorithm (SPTA) for topology updates. In a DoS attack, an adversary could
advertise a fake link and force the SPTA to block legitimate ports. Similarly, being in a central
position, SDN controllers can be target of DoS attacks [1825]. Furthermore, Hong et al. [1826]
provide a number of attack vectors on practical SDN switch implementations. SDN switches
are also prone to a timing side channel attack [1827]. For example, attackers can send a packet
and measure the time it takes the switch to process this packet. For a new packet, the switch
will need to fetch a new rule from the controller, thus resulting in additional delay over the
flows that already have rules installed at the switch. Consequently, the attacker can determine
whether an exchange between an IDS and a database server has taken place, or whether a
host has visited a particular website. A possible countermeasure would introduce delay for
the first few packets of every flow even if a rule exists [1828]. A more extensive analysis of
SDN vulnerabilities in general can be found in a study by Zerkane et al. [1829].
Network Functions Virtualisation (NFV) aims to reduce capital expenditure and allow for the rapid introduction of new services to the market. Specialised network middleboxes such as firewalls,
encoders/decoders, DMZs and deep packet inspection units are typically closed black box
devices running proprietary software [1830]. NFV researchers have proposed deploying these middleboxes entirely as virtualised software modules managed via standardised and open APIs. These modules are called Virtual Network Functions (VNFs). A large
number of possible attacks concern the virtual machine hypervisor as well as the configuration of virtual functions. Lal et al. [1831] provide a table of NFV security issues and best practices
for addressing them. For example, an attacker can compromise a VNF and spawn other
new VNFs to change the configuration of a network by blocking certain legitimate ports. The
authors suggest hypervisor introspection and security zoning as mitigation techniques. Yang
et al. [1832] provide a comprehensive survey on security issues in NFV.

19.4.5 Network Access Control


Networks are typically quite lax about which devices can become part of them. Section 19.3.4.1 described port-based device authentication, which demands secrets from trusted devices. Unfortunately, this still gives no security guarantees about the trustworthiness of network devices. For example, while a device may have been deemed trustworthy at some point, system compromises may have caused it to boot into an untrusted state. Network Access
Control, usually implemented using the Trusted Network Connect (TNC) architecture [1833],
enforces configurable security policies of devices when they join networks. These policies
enforce that network clients have been booted into trustworthy configurations. This may even
enable firewalls to precisely attribute which client software has caused certain traffic [1834].
Technically, to perform remote attestation, the verifier relies on unforgeable trusted hardware
on the proving device. Technical details can be found in the Hardware Security Knowledge
Area (Chapter 20).
The main practical drawback of such policies is that they attest only the initial system state. Detecting malicious runtime modifications to the system is not possible with TNC. Validating whether a device was compromised after it entered a trusted boot state is still subject to research, e.g., in runtime-attestation schemes via remote control-flow enforcement [1835, 1836].


19.4.6 Zero Trust Networking


Zero trust networks radically give up the idea of blindly trusting devices within an assumed-to-be-trusted part of a network. In zero trust networks, all devices are untrusted unless proven otherwise. This represents a paradigm shift which arose out of the challenges in defining centralised perimeters (e.g., a firewall) that split networks into trusted/untrusted domains. The
main motivation here is that traditional networks keep losing control over which devices join
the seemingly trusted side of a network (fueled by, e.g., bring-your-own-device). At the same
time, devices temporarily jump from trusted to untrusted networks (e.g., work-from-home),
before rejoining the trusted networks in a potentially modified state.
Migrating traditional network designs to zero trust networks is not trivial. NCSC provides a
comprehensive tutorial which can serve as a great starting point [1837]. In essence, a transition
to zero trust networks first requires a deep understanding of the assets in a network, such as
users, devices, services and data. Similarly, administrators require capabilities to measure
the security state of these assets. Any request to services must be authorized using strong
multi-factor authentication (described in the Authentication, Authorisation & Accountability
Knowledge Area (Chapter 14)), ideally paired with a single sign-on scheme so as not to frustrate users. Not all legacy services can readily be plugged into such a zero-trust setting, requiring
adaptations to make them compatible to standard authentication schemes (e.g., OpenID
Connect, OAuth, SAML).
One popular example of such a zero trust network design is BeyondCorp [1838]. It leverages
network access control (see Section 19.4.5) to identify devices, and rigorously enforces user
identification using a centralized single sign-on system. In BeyondCorp, previously internal
application services become external ones, protected by an access proxy that rigorously
enforces strong encryption and access control.

19.4.7 DoS Countermeasures


Denial of Service (DoS) attacks can roughly be divided into two categories, depending on which resources they aim to exhaust. First, in volumetric DoS attacks, adversaries aim to
exhaust the network bandwidth of a victim. Amplification attacks (see Section 19.3.2.4) are
the most dominant instance of such attacks, but also large-scale Distributed Denial of Service
(DDoS) attacks from remote-controlled botnets can leverage high attack bandwidths. Attack
targets are typically individual services or networks, yet can also be entire links in the upper Internet hierarchy (and the ASs depending on them) that become congested [1839, 1840]. Volumetric
attacks can be mitigated most effectively when traffic is stopped as early as possible before
it reaches the target network. For example, commercial so-called scrubbing services help to
filter malicious network traffic before it reaches the target. Technically, scrubbing services are
high-bandwidth network providers that—with the help of their customers—place themselves
between the Internet and an organization’s perimeter. Alternatively, attack victims can null
route traffic towards certain subnetworks via BGP advertisements to drop their traffic, or use
BGP FlowSpec to filter traffic at powerful edge routers.
Second, in application-level DoS attacks, miscreants aim to cripple resources at the software
layer. They typically aim to exhaust memory or computation resources (e.g., CPU). Here,
defenses are quite application specific. For example, SYN cookies (see Section 19.3.2.3) and
rate limiting protect TCP-based applications against connection floods. Also, CAPTCHAs may help to distinguish between human- and computer-generated communication, which is especially useful in the Web context.
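A common application-level building block for such rate limiting is a per-client token bucket, sketched below with arbitrarily chosen parameters.

```python
class TokenBucket:
    """Token-bucket rate limiter: refills at `rate` tokens per second up to
    `capacity`; a request is admitted only if a token is available."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill proportionally to the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # request dropped (or delayed) to protect the server

# Per-client bucket: 5 requests/s sustained, bursts of up to 10 tolerated.
bucket = TokenBucket(rate=5.0, capacity=10.0)
burst = [bucket.allow(now=0.0) for _ in range(12)]
assert burst.count(True) == 10 and burst.count(False) == 2
assert bucket.allow(now=1.0)  # tokens refilled after one second
```

The design choice here is to tolerate short legitimate bursts (the capacity) while bounding the sustained rate any single client can impose on memory or CPU.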


19.5 CONCLUSION
[1733, c5] [1735, c8,c11] [1732, c6]

19.5.1 The Art of Secure Networking


We covered a broad arsenal of network security instruments and schemes that network
architects, network operators and communication partners can employ. It is important to
understand that there is no silver bullet to obtain “a secure” network or communication.
Unfortunately, it is rarely even possible to guarantee that none of the security goals will be
broken ever. Consequently, it requires a thorough combination of these principles to obtain a
reasonable and satisfactory level of security.
Having said this, there are fundamentals that we should strive for, and for which we have
proven and standardised means. Endpoints can communicate securely using TLS. Sometimes, however, these endpoints do not represent the final recipients of sensitive data, as is the case for email or messenger servers, which may “buffer” messages until the recipient fetches them. In these cases, we can deploy asynchronous end-to-end security schemes at the application layer
(e.g., PGP/SMIME) that tolerate untrusted middle hops. Network operators must be prepared
for attackers from within their network, and from outsiders. To protect against external threats,
they can deploy zero trust networking, or use firewalls for more centralized architectures. On
top, an IDS helps to identify threats at the payload level that went unnoticed by the firewall, and network monitoring in general allows for a posteriori network forensics. Insider attacks are
much harder to mitigate, especially if trusted devices have been compromised by attackers,
yet port-based authentication or even network access control are good starting points. In
any case, decent network defenses require a fundamental interplay of several security best
practices.

19.5.2 Further Network Security Topics


Network security is too broad to be fully captured at reasonable detail within a single knowledge
area. Therefore, we have excluded several topics that are closely related, yet relatively specialised. We very briefly outline them in the following.
Cloud and Data Center Security: As soon as organizations outsource computations (cloud) or
information (data center), this immediately triggers security demands. These are only partially connected to network security, though. For example, to avoid exposing data and computations to cloud operators, clients can store data in hardware-backed secure containers such as Intel SGX [1841]. Data centers and clouds run the risk that adversaries may abuse side channels to leak sensitive data from co-located services or systems, a topic that is discussed in detail in the Operating Systems & Virtualisation Knowledge Area (Chapter 11).
Delay-Tolerant Networks and Ad-hoc Sensor Networks: Not all networks guarantee that communication partners are online, reachable and responsive all the time. Sensor networks are one such example, where energy-constrained sensors wake up only periodically to exchange information and hibernate otherwise. Similarly, the speed of light implies that devices in networks
in space have significant message inter-arrival times (e.g., already over 2 seconds between
Earth and Moon). In general, this requires delay-tolerant networks, which are incompatible
with many of the aforementioned security principles, most of which assume reliable and
quick responsiveness of communication endpoints. A detailed treatment of this subject goes beyond this KA. A great starting point for further reading is Ivancic’s security analysis of delay-tolerant networks [1842].
Network Covert Channels: Network covert channels aim to hide the pure existence of commu-
nication, e.g., using steganography. They allow two or more collaborating attacker processes
to leak sensitive information despite network policies that should prevent such leakage. For example, attackers may encode sensitive information in TCP headers that will remain unnoticed by an IDS [1843]. Similar covert channels are possible for other protocols, such as DNS [1844] or
IP [1845]. Covert channels can be confined by carefully modeling and observing all protocols
fields or patterns in general that could be abused for hiding information [1846].
Payment Networks: The banking sector maintains its own proprietary standards and network
protocols. Exploring those in detail goes beyond the scope of this document, particularly
because protocols can even be specific to certain regions (e.g., FinTS in Germany) or special
purposes (e.g., 3-D Secure for securing credit card transactions). The rise of digital currencies
such as Bitcoin, which implement several protocols of their own, adds further complexity. Finally,
stock exchanges nowadays heavily depend on reliable networks, and are extremely sensitive
to timing attacks, which requires careful Quality-of-Service assurances [1847, 1848].
Physical-Layer Security: Our security analyses stopped at the logical part of the link layer.
The physical part of this layer deserves further attention and is indeed a subject in its own right. In
fact, we witnessed several recent advancements in this field, such as Bluetooth Low Energy,
distance bounding and positioning protocols, Near-Field Communication (NFC) or cellular
networks. For a detailed treatment of this subject, we refer to the Physical Layer and Telecom-
munications Security Knowledge Area (Chapter 22).
Networking Infrastructure Security: We have so far assumed that networking components are
fully trusted. However, with global supply chains that involve dozens of parties and countries
in the manufacturing of a component, such an assumption may easily be invalidated in practice.
What happens if network infrastructure, which often is part of critical infrastructures, cannot
be trusted, e.g., due to backdoors or software vulnerabilities? Answering this question is far
from trivial, as it depends on which components and which security guarantees are at stake.
One recent real-world example of such an analysis is in 5G networks, where some
countries ban hardware delivered by certain other countries, simply because of a lack of
trust. This quickly turns into a non-networking issue that finds its solutions in other chapters,
such as the Software Security Knowledge Area (Chapter 15), the Secure Software Lifecycle
Knowledge Area (Chapter 17) or the Hardware Security Knowledge Area (Chapter 20). Discussing
non-trustworthy networking components goes beyond the scope of this chapter.
Cross-Border Regulations: Networks that span several countries, and thus several legislations, are
quite interesting from a legal perspective. There may be conflicts of law, e.g., regarding patents
or export restrictions, or simply the question of whether or not a digital signature is legally binding.
These topics are addressed in depth in the Law & Regulation Knowledge Area (Chapter 3).


CROSS-REFERENCE OF TOPICS VS REFERENCE MATERIAL

[1733]  [1734]  [1735]  [1732]  [1731]
19.1 Security Goals and Attacker Models c8 c1 c1 c6 c8
19.2 Networking Applications c1 c1
19.3.1 Security at the Application Layer c8 c6,c15,c19–c22 c8
19.3.2 Security at the Transport Layer c8 c4,c6 c8
19.3.3 Security at the Internet Layer c8 c5,c9 c17 c8
19.3.4 Security on the Link Layer c8 c7 c8
19.4 Network Security Tools c8 c5,c8,c11,c12 c23 c6 c8
19.5 Conclusion c5 c8,c11 c6



Chapter 20
Hardware Security
Ingrid Verbauwhede KU Leuven


INTRODUCTION
Hardware security covers a broad range of topics, from trusted computing to Trojan circuits. To
classify these topics we follow the different hardware abstraction layers as introduced by the Y-
chart of Gajski & Kuhn. The different layers of the hardware design process are introduced
in section 20.1, linked to the important concept of a root of trust and the associated
threat models in the context of hardware security. Next follows section 20.2 on measuring
and evaluating hardware security. The subsequent sections gradually reduce the abstraction level.
Section 20.3 describes secure platforms, i.e. a complete system or system-on-chip as trusted
computing base. Next, section 20.4 covers hardware support for software security: which
features should a programmable processor include to support software security? This section
is closely related to the Software Security Knowledge Area (Chapter 15). The register transfer
level is the next abstraction level down, covered in section 20.5. The focus at this level is typically
the efficient and secure implementation of cryptographic algorithms so that they can be
mapped onto an ASIC or FPGA. This section is closely related to the Cryptography Knowledge
Area (Chapter 10). All implementations also need protection against physical attacks, most
importantly against side-channel and fault attacks. Physical attacks and countermeasures are
described in section 20.6. Section 20.7 describes entropy sources at the lowest abstraction
level, close to CMOS technology. It includes the design of random number generators and
physically unclonable functions. The last technical section describes aspects related to the
hardware design process itself. The chapter ends with a conclusion and an outlook on
hardware security.

20.1 HARDWARE DESIGN CYCLE AND ITS LINK TO HARDWARE SECURITY
Hardware security is a very broad field, and many topics fall under its umbrella. In this section,
these seemingly unrelated topics are grouped and ordered according to the design levels of
abstraction introduced by the Y-chart of Gajski & Kuhn [1849]. While Gajski & Kuhn propose
a general approach to hardware design, in this chapter it is applied to the security aspects of
hardware design and linked to threat models and the associated roots of trust.

20.1.1 Short background on the hardware design process


Design abstraction layers are introduced in hardware design to reduce the complexity of the
design. As indicated in figure 20.1, the lowest abstraction level a designer considers is that of
individual transistors, at the center of the figure. These transistors are composed to form basic
logic gates, such as NAND and NOR gates or flip-flops, called the logic level. Going one abstraction
layer up, at the register transfer level gates are grouped together to form modules, registers, ALUs,
etc., whose operation is synchronized by a clock. These modules are then composed to
form processors, specified by instruction sets, upon which applications and algorithms can
be implemented.
By going up in the abstraction layers, details of underlying layers are hidden. This reduces
design complexity at higher abstraction layers. The abstraction layers are represented by
concentric circles in figure 20.1. Upon these circles, the Y-chart of Gajski & Kuhn introduces
three design activities, represented by three axes: a behavioral axis, describing the behavior or
what needs to be implemented (aka the specifications); a structural axis, describing how
something is implemented; and a physical axis, describing how the layouts are composed
together at gate, module, chip and board level. An actual design activity is a ‘walk’ through
this design space. Typically one starts with the specifications at the top of the behavioral
domain. These specifications (= what) are decomposed into components at the same level of
abstraction (= how), moving from the behavioral axis to the structural axis. A structural
component at one abstraction level becomes a behavioral component one level down.

KA Hardware Security | July 2021 Page 682

[Figure 20.1: Gajski-Kuhn Y-chart. Concentric circles represent the abstraction layers (systems,
algorithms, processors, register transfers, logic, transistors); three axes cross them: the
behavioural domain, the structural domain (ALUs, RAM, gates, flip-flops, currents and voltages)
and the physical domain (transistor layout, cell layout, module layout, floorplans, physical
partitions).]
As an example of a walk through the design space: assume a hardware designer is requested
to implement a light-weight, low-power security protocol for an Internet of Things (IoT) device.
This designer will only receive specifications of what needs to be designed: a security protocol
aiming at providing confidentiality and integrity (= what) and a set of cryptographic algorithms
(= components) to support the protocol. The crypto-algorithms are provided as a behavioral
specification to the hardware designer, who has the choice of implementing them as a dedicated
co-processor, as an assembly program, or supporting them with a set of custom instructions. De-
pending on costs and volumes, a choice of a target CMOS technology or an FPGA platform
is made. This behavioral level will be translated into a more detailed register-transfer level
description (e.g. VHDL or Verilog). At the Register Transfer Level (RTL), decisions need to be
made on whether this will be a parallel or a sequential version, a dedicated or a programmable
design, with or without countermeasures against side-channel and fault attacks, etc.
Essential to the division into design abstraction layers is the creation of models of how
components behave. E.g. to simulate the throughput or energy consumption of an arithmetic
unit, quality models of the underlying gates need to be available. Similarly, the Instruction Set
Architecture is a model of a processor made available to the programmer.


20.1.2 Root of trust


In the context of security, a root of trust is a model of an underlying component for the purpose
of security evaluation. According to Anderson [1013]: "A root of trust is a component used
to realize a security function, upon which a designer relies but of which the trustworthiness
can not be explicitly verified." The designer uses one or multiple components to construct a
security function, which then defines the trusted computing base. It is defined by the trusted
computing group as follows: “An entity can be trusted if it always behaves in the expected
manner for the intended purpose.” [1850].
E.g. for an application developer, a Trusted Platform Module (TPM) or a Subscriber Identity
Module (SIM) is a root of trust which the developer uses to construct a security application.
For the TPM designer, the TPM is a composition of smaller components that together
provide the security functionality. At the lowest hardware abstraction layers, basic
roots of trust are the secure storage of the key in memory or the quality of the True Random
Number Generator.
Hardware security is used as an enabler for software and system security. For this reason,
hardware provides basic security services such as secure storage, isolation or attestation. The
software or system considers the hardware as the trusted computing base. And thus from a
systems or application view point, hardware has to behave as a trusted component. However,
the hardware implementation can violate the trust assumption. E.g. Trojan circuits or side-
channel attacks could leak the key or other sensitive data to an attacker. Hence, hardware itself
also needs security. Moreover hardware needs security at all abstraction layers. Therefore, at
every abstraction layer, a threat model and associated trust assumptions need to be made. An
alternative definition for a root of trust in the context of design abstraction layers is therefore:
“A root of trust is a component at a lower abstraction layer, upon which the system relies for
its security. Its trustworthiness can either not be verified, or it is verified at a lower hardware
design abstraction layer. "

20.1.3 Threat model


A threat model is associated with each root of trust. When using a root of trust, it is assumed
that the threat model is not violated. This means that the threat model is also linked to the
hardware abstraction layers. If we consider a root of trust at a particular abstraction layer,
then all components that constitute this root of trust are also considered trusted.
Example 1: security protocols assume that the secret key is securely stored and not accessible
to the attacker. The root of trust, upon which the protocol relies, is the availability of secure
memory to guard this key. For the protocol designer, this secure memory is a black box. The
hardware designer has to decompose this requirement for a secure memory into a set of
requirements at a lower abstraction layer. What type of memory will be used? On which busses
will the key travel? Which other hardware components or software have access to the storage?
Can there be side-channel leaks?
Example 2: It is during this translation of higher abstraction layer requirements from protocol
or security application developers into lower abstraction layers for the hardware designers
that many security vulnerabilities occur. Implementations of cryptographic algorithms used
to be considered black boxes to the attacker: only inputs/outputs at the algorithm level are
available to mount mostly mathematical cryptanalysis attacks. However, with the appearance
of side-channel attacks (see section 20.6) this black box assumption no longer holds. Taking


side-channel leakage into account, the attacker has the algorithm-level information as well as
the extra timing, power and electro-magnetic information observable from the outside of the
chip. Thus the attacker model moves from black box to gray box. It is still assumed that the
attacker does not know the details of the internals, e.g. the contents of the key registers.
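The gray-box timing leak can be illustrated in software with the classic early-exit comparison. The sketch below is a simplified Python illustration, not a hardware attack: the naive comparison returns as soon as a byte mismatches, so its running time reveals how many leading bytes of a guess are correct, whereas a constant-time comparison (here Python's hmac.compare_digest) examines every byte regardless.

```python
import hmac

def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: running time depends on the first mismatching byte,
    which a gray-box attacker can exploit to recover a secret byte by byte."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False          # leaks the position of the first mismatch
    return True

def ct_equal(a: bytes, b: bytes) -> bool:
    """Constant-time comparison: timing does not depend on where bytes differ."""
    return hmac.compare_digest(a, b)

secret = b"s3cret-mac-value"
assert naive_equal(secret, secret) and ct_equal(secret, secret)
assert not naive_equal(secret, b"s3cret-mac-valuX")
assert not ct_equal(secret, b"s3cret-mac-valuX")
```

In practice, exploiting such leaks requires statistical averaging over noisy measurements, but the principle is the same for timing, power and electro-magnetic channels.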
Example 3: for programmable processors, the model between hardware and software is
traditionally considered the Instruction Set Architecture (ISA). The ISA is what is visible to the
software programmer and the implementation of the ISA is left to the hardware designer. The
ISA used to be considered the trust boundary for the software designer. Yet, with the discovery
of micro-architectural side-channel attacks, such as Spectre, Meltdown and Foreshadow, this
ISA model is no longer a black box, as micro-architectural information and leakage are also
available to the attacker [1851].

20.1.4 Root of trust, threat model and hardware design abstraction layers

The decomposition in abstraction layers, in combination with Electronic Design Automation
(EDA) tools, is one of the main reasons that the exponential growth of Moore’s law was
sustainable in the past decades and it still is. This approach works well when optimizing for
performance, area, energy or power consumption. Yet for hardware security, no such general
decomposition exists.
In this chapter, we propose to organise the different hardware security topics, their associated
threat models and roots of trust according to the hardware design abstraction layers, as no
other general body of knowledge is available to organize these topics. This organization
has the advantage that it can be used to identify the state of the art in the different subtopics of
hardware security. For example, in the specific context of hardware implementations of
cryptographic algorithms, the state of the art is well advanced, and robust countermeasures
exist to protect cryptographic implementations against a wide range of side-channel attacks,
as shown in detail in section 20.5. Yet in the context of general processor security, e.g. to
isolate process-related data or to provide secure execution, new security hazards continue to
be discovered on a regular basis.
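One of the robust countermeasures referred to above is masking. The following minimal sketch illustrates first-order Boolean masking for a linear (XOR) operation: the secret is split into two random shares, and computation proceeds on the shares so that no single intermediate value equals the unmasked secret. Real masked implementations must also handle non-linear operations such as S-boxes, which require dedicated schemes; this toy omits that.

```python
import secrets

def mask(value: int) -> tuple:
    """Split an 8-bit secret into two Boolean shares: value = s0 ^ s1."""
    s0 = secrets.randbits(8)
    return (s0, value ^ s0)

def masked_xor_key(shares: tuple, key_byte: int) -> tuple:
    """AddRoundKey-style step computed on shares: XOR the key into one share only,
    so no intermediate ever equals the unmasked secret."""
    s0, s1 = shares
    return (s0, s1 ^ key_byte)

def unmask(shares: tuple) -> int:
    s0, s1 = shares
    return s0 ^ s1

plain, key = 0x3A, 0x5C
assert unmask(masked_xor_key(mask(plain), key)) == plain ^ key
```

Because each share is uniformly random on its own, the power consumed while processing a single share is statistically independent of the secret, which is what defeats first-order side-channel analysis.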
In an attempt to order the topics, table 20.1 summarizes this organization. The different
abstraction layers are identified (first column) from a hardware perspective. The highest level
(system and software) sits on top of the hardware platform. E.g. a system designer assumes
that a secure platform is available. Thus the secure platform is the root of trust, providing
security functionality. The second column describes the functionality provided by the root of
trust. The third column describes how this functionality might be implemented. E.g. at the
highest abstraction layer this might be by providing a Trusted Execution Module or a secure
element, etc. The fourth column describes the threat models and attack categories at that
abstraction layer. E.g. at system level, the system designer assumes that they will receive a
module that provides isolation, integrity, attestation, etc. The last column describes typical
design activities at this particular design abstraction layer.
This exercise is repeated for each abstraction layer and described in detail in each of the
following sections.
At the processor level, one can distinguish general purpose programmable processors and
domain specific processors. General purpose processors should support a wide range of
applications, which unfortunately typically include software vulnerabilities. Hardware features


| Abstraction level | Root of trust - functionality | Structural (how) - examples | Example threats | Typical HW design activities |
| System and application | Secure platforms | e.g. Trusted Execution (Trustzone, SGX, TEE), HSM, Secure Element | isolation, integrity, attestation, ... | to support security application development |
| Processor | general purpose | e.g. shadow stack | SW vulnerabilities | ISA, HW/SW co-design |
| Processor | domain specific | Crypto specific RTL | Timing attacks | Constant number of clock cycles |
| Register transfer | Crypto specific | Building blocks | Side-channel attack | Logic synthesis |
| Logic | Resistance to SCA: power, EM, fault | Masking, circuit styles | Side-channel attack, fault | FPGA tools, standard cell design |
| Circuit and technology | Source of entropy | TRNG, PUF, secure SRAM | Temperature, glitches | SPICE simulations |
| Physical | Tamper resistance | Shields, sensors | Probing, heating | Layout activities |

Table 20.1: Design abstraction layers linked to threat models, root of trust and design activities

are added to address these software vulnerabilities, such as a shadow stack or measures
to support hardware control-flow integrity. Domain specific processors focus on a limited
functionality and are typically developed as co-processors in larger systems-on-chip.
Typical examples are co-processors to support public key or secret key cryptographic
algorithms. Time at the processor level is typically measured in instruction cycles.
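The shadow-stack idea can be illustrated with a toy software model (illustrative only; a hardware shadow stack keeps the protected copy in storage that ordinary stores cannot reach): every call records the return address twice, and every return checks the attacker-writable copy against the protected one.

```python
class ShadowStackViolation(Exception):
    pass

class CallStack:
    """Toy model: a normal stack an attacker may corrupt, plus a protected
    shadow copy of return addresses checked on every return."""
    def __init__(self):
        self.stack = []          # attacker-writable (return addresses and data)
        self.shadow = []         # hardware-protected copy

    def call(self, return_addr: int):
        self.stack.append(return_addr)
        self.shadow.append(return_addr)

    def ret(self) -> int:
        addr = self.stack.pop()
        if addr != self.shadow.pop():
            raise ShadowStackViolation("return address was tampered with")
        return addr

cs = CallStack()
cs.call(0x4005D0)
cs.stack[-1] = 0xDEADBEEF        # simulated buffer-overflow overwrite
try:
    cs.ret()
    tampering_detected = False
except ShadowStackViolation:
    tampering_detected = True
assert tampering_detected
```

A real hardware shadow stack detects the mismatch before the corrupted return address is ever used, defeating return-oriented programming on that path.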
Both general purpose and domain specific processors are composed of computational
units, multipliers and ALUs, memory and interconnect. These modules are typically
described at the register transfer level: constant-time operation and resistance against side-channel
attacks become the focus. Time at this level is typically measured in clock cycles.
Multipliers, ALUs, memories, interconnect and the bus infrastructure are created from gates
and flip-flops at the logic level. At this design abstraction level, the focus is on leakage through
physical side-channels (power, electro-magnetic) and on fault attacks. Time is typically measured
in absolute time (nsec), based on the available standard cell libraries or FPGA platforms.
The design of entropy sources requires knowledge of and insights into the behavior of transistors
and the underlying Complementary Metal-Oxide-Semiconductor (CMOS) technology. The de-
sign of these hardware security primitives is therefore positioned at the circuit and transistor
level. Similarly, the design of sensors and shields against physical tampering requires insight
into the technology. At the circuit and technology level, time is measured in absolute terms, e.g.
nsec delay or GHz clock frequency.
Table 20.1 does not aim to be complete. The idea is to illustrate each abstraction layer
with an example. In the next sections, the hardware security goals and their associated threat
models are discussed in detail in relation to, and their relevance for, each abstraction layer.


20.2 MEASURING HARDWARE SECURITY


Depending on the commercial application domain, several industrial and government orga-
nizations have issued standards or evaluation procedures. The most well-known are
FIPS 140-2 (and the older FIPS 140-1), the Common Criteria (CC) evaluation and, in the
financial world, EMVCo. FIPS 140-2 mostly focuses on the implementation security of
cryptographic algorithms, while the Common Criteria are applicable to IT security in general.

20.2.1 FIPS 140-2
FIPS 140-2 is a US NIST standard used for the evaluation of cryptographic modules. FIPS 140-2
defines security levels from 1 to 4 (1 being the lowest). The following gives a description of the
four levels from a physical hardware security point of view. Next to the physical requirements,
there are also roles, services and authentication requirements (for more details see [1852]
and other KAs).
Security level 1 only requires that an approved cryptographic algorithm be used, e.g. AES or
SHA-3, but does not impose physical security requirements. Hence a software implementation
can meet level 1. Level 2 requires a first level of tamper evidence. Level 3 also requires
tamper evidence and, on top of that, tamper resistance.
NIST defines tampering as an intentional but unauthorized act resulting in the modification of
a system, components of systems, its intended behavior, or data [1853].
Tamper evidence means that there is proof or testimony that tampering with a hardware
module has happened. E.g. a broken seal indicates that a device was opened, or a light sensor
might observe that the lid of a chip package was lifted.
Tamper resistance means that, on top of tamper evidence, protection mechanisms are added
to the device. E.g. extra coatings or dense metal layers make it difficult to probe the key registers.
Level 4 increases the requirements such that the cryptographic module can operate in physi-
cally unprotected environments. In this context, physical side-channel attacks pose an
important threat: if any observable physical quantity depends on the sensitive data being pro-
cessed, information is leaked. Since the device is under normal operation, a classic tamper
evidence mechanism will not realize that the device is under attack. See section 20.6.

20.2.2 Common criteria and EMVCo


“Common Criteria for Information Technology Security Evaluation” is an international standard
for IT product security (ISO/IEC 15408), in short known as the Common Criteria (CC). CC is a
very generic procedure applicable to the security evaluation of IT products. Several parties
are involved in this procedure. The customer defines a set of security specifications for
its product. The manufacturer designs a product according to these specifications. An
independent evaluation lab verifies whether the product fulfills the claims made in the security
requirements. Certification bodies issue a certification that the procedure was correctly
followed and that the evaluation lab indeed confirmed the claims made. The set of security
specifications is collected in a so-called protection profile.
Depending on the amount of effort put into the security evaluation, the CC defines different
Evaluation Assurance Levels (EALs). These range from basic functional testing, corresponding


to EAL1, up to a formally verified and tested design, corresponding to the highest level, EAL7. CC
further subdivides the process of evaluation into several classes, where most of the classes
verify the conformity of the device under test. The fifth class (AVA) deals with the actual
vulnerability assessment. It is the most important class from a hardware security viewpoint,
as it searches for vulnerabilities and the associated tests. It assigns a rating to the difficulty
of executing the test, called the identification, and to the possible benefit an attacker can gain
from the penetration, called the exploitation. The difficulty is a function of the time required
to perform the attack, the expertise of the attacker (from layman to multiple experts), how
much knowledge of the device is required (from simple public information to detailed hardware
source code), the number of samples required, and the cost and availability of the equipment
needed to perform the attack. A high difficulty level results in a high score and a high level for the
AVA class. The highest score one can obtain is an AVA level of 5, which is required to obtain a
top EAL score.
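The AVA rating can be sketched as a simple scoring function over the factors listed above. Note that the factor names and point values below are hypothetical, chosen only to illustrate the principle; the official CC/JIL rating tables define their own factors and points.

```python
# Hypothetical weights for illustration only -- the official CC/JIL tables
# assign their own point values per factor and per level.
FACTOR_POINTS = {
    "elapsed_time":  {"<1 day": 0, "<1 month": 2, ">6 months": 8},
    "expertise":     {"layman": 0, "proficient": 2, "multiple experts": 8},
    "knowledge":     {"public": 0, "restricted": 2, "hw source code": 7},
    "samples":       {"one": 0, "tens": 2, "hundreds": 5},
    "equipment":     {"standard": 0, "specialised": 3, "bespoke": 7},
}

def attack_potential(attack: dict) -> int:
    """Sum per-factor points; a higher total means the attack is harder to mount,
    which translates into a higher AVA rating for the product."""
    return sum(FACTOR_POINTS[factor][level] for factor, level in attack.items())

# A hypothetical invasive probing attack needing long preparation, a team of
# experts, hardware source code, many samples and bespoke lab equipment:
probing_attack = {
    "elapsed_time": ">6 months",
    "expertise": "multiple experts",
    "knowledge": "hw source code",
    "samples": "hundreds",
    "equipment": "bespoke",
}
assert attack_potential(probing_attack) == 8 + 8 + 7 + 5 + 7
```

The higher the total an attack requires, the higher the resistance the product demonstrates when the attack nevertheless fails within the evaluation budget.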
Its usage is well established in the field of smartcards and secure elements, as they are used
in telecom, financial and government ID applications. It is also used in the field of Hardware
Security Modules, Trusted Platform Modules and others [1854]. For certain classes of
applications, minimum sets of requirements are defined in protection profiles. There exist
protection profiles for Trusted Platform Modules (TPMs), Javacards, biometric passports, SIM
cards, secure elements, etc.
Since certification comes from one body, there exist agreements between countries so that
the certifications of one country are recognized in other countries. As an exception, EMVCo
is a private organization that sets the specifications for the worldwide interoperability of payment
transactions. It has its own certification procedure, similar to CC.
Please note that the main purpose of a Common Criteria evaluation is to verify that an IT product
delivers the claims promised in the profile. It does not mean that there are no vulnerabilities
left. A good introduction to the topic can be found in [1855] and a list of certified products in
[1854].

20.2.3 SESIP: Security Evaluation Standard for IoT Platforms


In the context of IoT security evaluation, a recent initiative is the SESIP security evaluation
scheme [1856], currently at version 1.2. IoT devices are typically small, light-weight ’things’
with limited accessibility via the internet. Several levels of threat model are possible for IoT: from
remote internet access only, over various remote software attack options, to physical
attack resistance as well. A comprehensive set of security functional requirements is defined:
identification and attestation, product lifecycle, secure communication, software and physical
attack resistance, cryptographic functionality including random number generation, and some
compliance functionality, e.g. to provide secure encrypted storage or reliable time.
Similar to the Common Criteria, SESIP provides several levels of assurance. Level 1 is the lowest
level and consists of a self-assessment. The highest level of SESIP consists of a full CC
evaluation, similar to that of smart cards or secure elements. The levels in between range from
black-box penetration testing to white-box penetration testing, with or without time limitations.


20.3 SECURE PLATFORMS


This section describes the goals and the state of the art in secure platforms. At this high level
of abstraction, the system designer receives a complete chip or board as the trusted computing
base, and assumes that this trusted root delivers a set of cryptographic functions,
protected by the hardware and software inside the physical enclosure. Common
to these platforms is that they are stand-alone pieces of silicon with a strict access policy.
Depending on the provided functionality, the hardware tamper resistance and protection levels,
and the communication interface, these secure platforms are used in different application
fields (automotive, financial, telecom). Three important platforms are the Hardware Security
Module (HSM), the Subscriber Identity Module (SIM) and the Trusted Platform Module
(TPM). These are briefly described next.

20.3.1 HSM Hardware Security Module


An HSM typically provides cryptographic operations, e.g. a set of public key and
secret key algorithms, together with secure key management, including the secure generation,
storage and deletion of keys. Essential to HSMs is that these operations occur in a hardened
and tamper-resistant environment. A TRNG and a notion of a real-time clock are usually also
included. HSMs are mostly used in server back-end systems to manage keys or payment
systems, e.g. in banking systems.
An HSM is used as a co-processor attached to a host system. Its architecture typically includes
a micro-processor/micro-controller, a set of crypto co-processors, secure volatile and non-
volatile memory, a TRNG, a real-time clock, and I/O. The operations typically occur inside a
tamper-resistant casing. In previous generations, multiple components resided on
one board inside the casing.
Recently, in some application domains, such as automotive, HSM functionality is no longer
provided as a stand-alone module but is now integrated as a secure co-processor in a larger
System on a Chip (SoC). Indeed Moore’s law enables higher integration into one SoC. What
exactly is covered under HSM functionality depends on the application domain. Therefore,
compliance with security levels is also evaluated by specialized independent evaluation labs
according to specific protection profiles.

20.3.2 Secure Element and Smartcard


Similar to an HSM, a Secure Element and a smart card provide a set of cryptographic algorithms
(public key, secret key, HMAC, etc.) together with secure key storage, generation and deletion.
The main differences with an HSM are cost, size and form factor. They are typically implemented
as one single integrated circuit and have a much smaller form factor, from around 50 cm2 down to
less than 1 cm2. The main difference between a smart card and a secure element sits in the
form factor and the different markets they address: secure element is the more generic term,
while smart cards have the very specific form factor of a banking card. They are produced in
large volumes and need to be very cheap, as they are used as SIM cards in cell phones and
smart phones. They are also used in banking cards, pay-TV system access cards, national
identity cards and passports, and recently in IoT devices, vehicular systems and so on. Tamper
resistance and physical protection are essential to secure elements. They are a clear instance
of what in the computer architecture domain is called a ’domain specific processor’. Specific

protection profiles exist depending on the application domain: financial, automotive, pay-TV, etc.
A typical embedded secure element is one integrated circuit with no external components. It
consists of a small micro-controller with cryptographic co-processors, secure volatile and
non-volatile storage, a TRNG, etc. I/O is usually limited, either through a specific set of pins or
through an NFC wireless connection. Building a secure element is a challenge for a hardware
designer, as one needs to combine security with the non-security requirements of embedded
circuits: a small form factor (no external memory) and low power and/or low energy consumption,
in combination with tamper resistance and resistance against physical attacks such as
side-channel and fault attacks (see section 20.6).

20.3.3 Trusted Platform Module (TPM)


The TPM has been defined by the Trusted Computing Group (TCG), an industry
association, to provide specific security functions to the Personal Computer (PC) platform.
More specifically, the TPM is a root of trust embedded on the PC platform, so that the PC+TPM
platform can identify itself and its current configuration and running software [1850]. The
TPM provides three specific roots of trust: the Root of Trust for Measurement (RTM), the
Root of Trust for Storage (RTS) and the Root of Trust for Reporting (RTR). Besides these three
basic functions, other functionality of TPMs is also used: access to specific cryptographic
functions, secure key storage, support for secure login, etc.
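The RTM accumulates boot-time measurements by ‘extending’ a Platform Configuration Register (PCR): the new PCR value is the hash of the old value concatenated with the measurement, so a single register commits to an entire, ordered boot sequence. A simplified sketch of this extend operation, using SHA-256:

```python
import hashlib

def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: the new PCR value hashes the old value together with
    the measurement, so the register accumulates the whole boot sequence."""
    return hashlib.sha256(pcr + measurement).digest()

pcr = bytes(32)                             # PCRs start as all-zero bytes
for component in [b"firmware", b"bootloader", b"kernel"]:
    pcr = pcr_extend(pcr, hashlib.sha256(component).digest())

# The final value depends on every component *and* on their order:
reordered = bytes(32)
for component in [b"bootloader", b"firmware", b"kernel"]:
    reordered = pcr_extend(reordered, hashlib.sha256(component).digest())
assert pcr != reordered
```

Because the hash chains the measurements, any change to a component or to the boot order yields a different final PCR value, which the RTR can later report to a verifier.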
The TPM is implemented as a separate security module, much like a secure element but with a
specific bus interface to a PC platform, e.g. through the LPC or I2C bus interface. Its architecture
at minimum consists of an embedded micro-controller, several crypto coprocessors, secure
volatile and non-volatile storage for root keys and a high quality true random number generator.
It includes hardware engines for hash functions (SHA1 and SHA256), public key (RSA and
ECC), secret key (AES) and HMAC calculations. Since a TPM is a separate module, physical
protection and tamper resistance is essential for security. Next to its main scope of integrity
protection, TPM also has applications in disk encryption, digital rights management, etc.
The most recent TPM2.0 version broadens the application scope from PC oriented to also
supporting networking, embedded, automotive, IoT, and so on. It also provides a more flexible
approach in the functionality included. Four types of TPM are identified: the dedicated inte-
grated circuit ‘discrete element’ TPM provides the highest security level. One step lower in
protection level is the ‘integrated TPM’ as an IP module in a larger SoC. The lowest levels of
protection are provided by firmware and software TPMs.
The adoption of TPMs has evolved differently from what was originally the focus of the TCG.
Originally, the main focus was the support of a secure boot and the associated software stack,
so that a complete measurement of the software installed could be made. The problem is
that the complexity of this complete software base grows too quickly, making it too difficult to
measure completely all variations in valid configurations. Thus TPMs are less used to protect
a complete software stack up to the higher layers of software. Still, most new PCs have
TPMs but they are used to protect the encryption keys, avoid firmware roll-back, and assist
the boot process in general.
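The extend operation at the heart of the Root of Trust for Measurement can be illustrated in a few lines of Python. This is a sketch of the hash-chaining principle only, not the TPM command interface, and the component names are invented:

```python
import hashlib

def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: new PCR = H(old PCR || H(measurement))."""
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(pcr + digest).digest()

# Measured boot: each stage measures the next component before handing over
# control; PCRs are reset to a known value at platform reset.
pcr = b"\x00" * 32
for component in [b"bootloader", b"kernel", b"initrd"]:
    pcr = pcr_extend(pcr, component)

# The final value commits to the whole sequence: changing or reordering any
# component yields a different PCR value, which a verifier can detect.
```

Because the chain is built from one-way hashes, software can only extend a PCR, never set it to a chosen value.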
Starting from the original TPM, the Trusted Computing Group has broadened its scope and
now has working groups on many different applications, such as cloud, embedded systems,
IoT, mobile, network equipment, and so on [1857].


20.4 HARDWARE SUPPORT FOR SOFTWARE SECURITY AT ARCHITECTURE LEVEL
At the secure platform level, the complete module, i.e. the hardware and its enclosed embedded
software, is part of the trusted computing base. One level down on the abstraction layers,
we make the assumption that all hardware is trusted, while software is no longer trusted.
Indeed, software vulnerabilities are a major source of security weaknesses (see the Software
Security Knowledge Area (Chapter 15)). To prevent the exploitation or to mitigate the effects of
software vulnerabilities, a large variety of hardware modifications/additions to the processor
architecture have been proposed in literature and have been included in commercial proces-
sors. We call this abstraction layer the hardware/software boundary: hardware forms the
trust boundary, while software is no longer trusted. These security additions to the hardware
typically have a cost in extra area and loss in performance.
The most important security objectives at this design abstraction level are to support protec-
tion, isolation and attestation for the software running on a processor platform [1858], [1859],
[1860].
• Protection: "A set of mechanisms for ensuring that multiple processes sharing the
processor, memory, or I/O devices cannot interfere, intentionally or unintentionally, with
one another by reading or writing each others’ data. These mechanisms also isolate the
operating system from the user process" [1858]. In a traditional computer architecture,
usually the OS kernel is part of the Trusted Computing Base (TCB), but the rest of the
software is not.
• With isolation, a hardware mechanism is added that controls access to pieces of soft-
ware and associated data. Isolation separates two parties. In one case, a software
module might need protection from the surrounding software: a Protected Module
Architecture (PMA) then provides a hardware guarantee that a piece of software runs
unhindered by unwanted outside influences. In the opposite case, if we want to limit
the effects of possibly tainted software on its environment, the software is sandboxed
or put into a ‘compartment’. Protected Module Architectures are a hardware-only
solution: the OS is not part of the TCB. More details are described in section 20.4.4.
• With attestation, there is hardware support to demonstrate to a third party that the
system, e.g. the code installed and/or running on a processor, is in a particular state.
Attestation can be local or remote. Local attestation means that one software module
can attest its state to another one on the same compute platform. Remote attestation
means that a third party, outside the compute platform can get some guarantee about
the state of a processor.
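The attestation principle above can be sketched as follows. The device key, report format and HMAC construction are illustrative simplifications; real remote attestation typically uses asymmetric signatures so that the verifier does not need to share the device key:

```python
import hmac, hashlib

# Hypothetical device-unique key, fused at manufacturing time and never
# readable by software; only the attestation logic can use it.
DEVICE_KEY = b"fused-at-manufacturing"

def attest(measured_state: bytes, nonce: bytes) -> bytes:
    """Produce a report binding the current measured state to a
    verifier-chosen nonce (the nonce prevents replay of old reports)."""
    return hmac.new(DEVICE_KEY, measured_state + nonce, hashlib.sha256).digest()

def verify(measured_state: bytes, nonce: bytes, report: bytes) -> bool:
    expected = hmac.new(DEVICE_KEY, measured_state + nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, report)
```

A verifier that trusts the hardware key can thus conclude that the platform was in the claimed state when the report was produced.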
In the context of general purpose computing, Virtual Machines (VMs) and Hypervisors have
been introduced to support multiple operating systems on one physical processor. This sharing
of resources improves efficiency and reuse. It can however only be realized by a secure and
efficient sharing of physical memory: virtual machines should only be allowed to use the
portions of physical memory assigned to them. The organization and details of virtual memory
are out of scope of hardware security and part of the Operating Systems & Virtualisation
Knowledge Area (Chapter 11). The hardware supports protection by providing privileged
instructions, control and status registers and sometimes support for multiple parallel threads.
In the context of embedded micro-controllers, with no operating system and only one application,
the hardware support could be limited to machine-level support only. Memory protection
could be added as an optional hardware module to the processor.
Other more advanced security objectives to support software security might include:
• Sealed storage is the process of wrapping code and/or data with certain configuration,
process or status values. Only under the correct configuration (e.g. program counter
value, nonce, secret key, etc.) can the data be unsealed. Dynamic root of trust in combi-
nation with a late launch guarantees that even if the processor starts from an unknown
state, it can enter a fixed known piece of code and known state. This typically requires
special instructions to enter and exit the protected partition.
• Memory protection refers to the protection of data when it travels between the processor
unit and the on-chip or off-chip memory. It protects against bus snooping or side-channel
attacks or more active fault injection attacks.
• Control flow integrity is a security mechanism to prevent malware attacks from redirect-
ing the flow of execution of a program. In hardware, the control flow of the program is
compared on-the-fly at runtime with the expected control flow of the program.
• Information flow analysis is a security mechanism to follow the flow of sensitive data
while it travels through the different components, from memory to cache over multiple
busses into register files and processing units and back. This is important in the context
of micro-architectural and physical side-channel attacks.
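As an illustration of the first objective above, sealed storage can be modelled as deriving the unsealing key from the platform configuration, so that data only becomes available again under the same configuration. The following stdlib-only sketch uses an XOR keystream for brevity; it illustrates the principle, not a real TPM seal operation:

```python
import hashlib

def derive_key(config: bytes) -> bytes:
    # Hypothetical: the unsealing key depends on the configuration value
    # (e.g. a PCR state), so a different configuration yields a different key.
    return hashlib.sha256(b"seal-key|" + config).digest()

def seal(data: bytes, config: bytes) -> bytes:
    key = derive_key(config)
    stream = hashlib.sha256(key + b"|stream").digest()
    ct = bytes(d ^ s for d, s in zip(data, stream))  # demo only: data <= 32 bytes
    tag = hashlib.sha256(key + ct).digest()          # integrity check value
    return ct + tag

def unseal(blob: bytes, config: bytes):
    key = derive_key(config)
    ct, tag = blob[:-32], blob[-32:]
    if hashlib.sha256(key + ct).digest() != tag:
        return None  # wrong configuration: unsealing fails
    stream = hashlib.sha256(key + b"|stream").digest()
    return bytes(c ^ s for c, s in zip(ct, stream))
```

A production implementation would use an authenticated cipher and a hardware-held root key rather than this toy construction.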
In the next subsections a representative set of hardware approaches to address the above
software security challenges are presented. Some hardware techniques address multiple se-
curity objectives. Some are large complex approaches, others are simple dedicated hardware
features.
As a side note: a large body of knowledge on software-only approaches is available in literature.
Mostly, they offer a weaker level of security as they are not rooted in a hardware root of trust.
For control flow integrity, for example, software-only approaches might instrument the code
to check branches or jumps, while hardware support might calculate MACs on the fly and
compare these to stored reference MACs.

20.4.1 Trusted Execution Environment (TEE)


TEE was originally an initiative of Global Platform, a consortium of companies, to standardize a
part of the processor as a trusted secure part. TEE has since evolved and covers in general the
hardware modifications made to processors to provide isolation and attestation to software
applications. There is a large body of knowledge both from the industrial side as well as from
the academic side.
TEE is a concept that provides a secure area of the main processor “to provide end-to-end
security by protecting the execution of authenticated code, confidentiality, authenticity, privacy,
system integrity and data access rights” [1861]. It is important that the TEE is isolated from the
so-called Rich Execution Environment (REE), which includes the untrusted OS. The reasoning
behind this split is that it is impossible to guarantee secure execution and to avoid malware in
the normal world due to the complexity of the OS and all other applications running there. The
rich resources are accessible from the TEE, while the opposite is not possible. Global Platform
does not specify the specifics on how these security properties should be implemented. Three
main hardware options are suggested. Option 1 assumes that every processor component on
the IC can be split into a trusted and a rich part, i.e. the processor core, the crypto accelerators,
the volatile and non-volatile memory are all split. Option 2 assumes that there is a separate
secure co-processor area on the SoC with a well-defined hardware interface to the rest of the
SoC. Option 3 assumes a dedicated off-chip secure co-processor, much like a secure element.
Global Platform also defines a Common Criteria based protection profile (see section 20.2.2)
for the TEE. It assumes that the package of the integrated circuit is a black box [1861]; secure
storage is thus assumed to follow from the fact that the secure assets remain inside the SoC. It
follows the procedures of the Common Criteria assurance package EAL2 with some extra features.
It pays extra attention to the evaluation of the random number generator and the concept of
monotonic increasing time.

20.4.2 IBM 4758 Secure Coprocessor


An early example, even before the appearance of the Global Platform TEE, is the IBM 4758
secure coprocessor. Physical hardware security was essential for this processor: it contained a
board with a general purpose processor, DRAM, separate battery backed-DRAM, Flash ROM,
crypto accelerator (for DES), a random number generator and more. All of these components
were enclosed in a box with tamper resistant and tamper evidence measures. It was certified
to FIPS 140-1, level 4 at that time [1862].

20.4.3 ARM TrustZone


ARM TrustZone is one well-known instantiation of a TEE. It is part of a system of ARM
processors integrated into System on a Chips (SoCs) mostly used for smartphones. The
TEE is the secure part of the processor and it runs a smaller trusted OS. It is isolated from
the non-secure world, called the Rich Execution Environment, which runs the untrusted rich
OS. The main hardware feature to support this split is the Non-Secure (NS) bit. The AXI bus
transactions are enhanced with an NS bit so that access to secure-world resources by
non-secure masters can be blocked. Each AXI transaction comes with this bit set or reset.
When the processor runs in the secure mode, then the transaction comes with the NS bit set
to zero, which gives it access to both secure and non-secure resources. When the processor
runs in normal mode, it can only access resources from the normal world. This concept is
extended to the level 1 and level 2 cache. These caches store an extra information bit to
indicate if the code can be accessed by a secure or non-secure master. Special procedures
are foreseen to jump from secure to non-secure and vice-versa. This is supported by a special
monitor mode which exists in the secure world.
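The NS-bit check can be modelled behaviourally as a filter on bus transactions. The resource table and function names below are invented for illustration; the real logic is implemented in the AXI interconnect and in the wrappers around memories and peripherals:

```python
# Each resource (memory region, peripheral) is marked secure or non-secure
# at SoC integration time.
RESOURCES = {
    "crypto_keys": {"secure": True},    # secure-world only
    "framebuffer": {"secure": False},   # normal-world accessible
}

def axi_access_allowed(resource: str, ns_bit: int) -> bool:
    """NS=0: the master runs in the secure world and may access everything.
    NS=1: the master runs in the normal world; secure resources are blocked."""
    if ns_bit == 0:
        return True
    return not RESOURCES[resource]["secure"]
```

The same check, extended with an ownership bit per cache line, is what allows the level 1 and level 2 caches to hold secure and non-secure data simultaneously.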
The split applied by ARM TrustZone is, however, a binary split. Applications from different
vendors could co-exist in the secure world, so if one trusted component violates the
system’s security, security can no longer be guaranteed. To address this issue, protected
module architectures were introduced.
Trusted Execution Environments are also being created in an open-source context, more specifi-
cally in the context of the RISC-V architecture.


20.4.4 Protected Module Architectures and HWSW co-design solutions


If multiple software applications want to run on the same platform isolated from each other,
then hardware needs to isolate them from each other at a more fine granularity. This can
be done by so-called protected module architectures. The basic idea is that small software
modules can run protected from all other software running on the processor. And because
they are small, their properties and behavior can be verified more thoroughly. The protection
is provided by extra features added to the hardware in combination with an extremely small
trusted software base if needed. In the Flicker project, the software TCB relies on only 250
lines of code but requires a dedicated TPM chip [1863]. Table 12 of the review work of [1862]
provides an in-depth comparison of several general purpose secure processor projects with
their hardware and software TCB. The hardware TCB distinguishes between the complete
mother board as TCB, e.g. for TPM usage, to CPU package only for SGX and other projects. The
software TCB varies from a complete secure world as is the case for TrustZone to privileged
containers in the case of SGX or a trusted hypervisor, OS or security monitor.
Even more advanced are solutions with a zero trusted software base: only the hardware is
trusted. This is the case for the Sancus project [1864]. It implements a program counter based
memory access control system. Extra hardware is provided to compare the current program
counter with stored boundaries of the protected module. Access to data is only possible if
the program counter is in the correct range of the code section. Progress of the program in
the code section is also controlled by the hardware so that correct entry, progress and exit of
the module can be guaranteed.
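The program-counter based access control of Sancus can be sketched as a range check applied by the hardware on every data access. The addresses and field names below are invented for illustration:

```python
class ProtectedModule:
    """Boundaries of one protected module, stored in hardware registers."""
    def __init__(self, code_start, code_end, data_start, data_end):
        self.code = range(code_start, code_end)
        self.data = range(data_start, data_end)

def data_access_allowed(pc: int, addr: int, module: ProtectedModule) -> bool:
    """The module's private data is accessible only while the program
    counter lies inside the module's own code section."""
    return addr not in module.data or pc in module.code

# A module with code at 0x1000-0x2000 and private data at 0x8000-0x9000.
mod = ProtectedModule(0x1000, 0x2000, 0x8000, 0x9000)
```

The real hardware additionally enforces that the code section can only be entered at a single designated entry point, so the checks above cannot be bypassed by jumping into the middle of the module.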
Intel’s Software Guard Extensions (SGX) are also a protection mechanism at a small granularity.
Software modules of an application are placed in memory enclaves. Enclaves are defined in
the address space of a process, but access to enclaves is restricted. Enclaves are created,
initialized, and cleared by possibly untrusted system software, but operating in the enclave
can only be done by the application software. Minimizing the extra hardware to support SGX,
and especially avoiding performance degradation is an important goal. The details of the
hardware micro-architecture have not been disclosed; yet its most important parts are a
memory encryption unit, a series of hardware enforced memory access checks and secure
memory range registers [1862].

20.4.5 Light-weight and individual solutions


The above listed solutions are mostly suited for general purpose computing, i.e. for platforms
on which a complex software stack will run. In the literature, more proposals exist for
extremely light-weight solutions that support specific security requests. SMART is one
early example: it includes a small immutable piece of boot ROM, considered the root of trust,
to support remote attestation [1865].
To protect against specific software attacks, more individual hardware countermeasures
have been introduced. One example is a hardware shadow stack: to avoid buffer overflow
attacks and to protect control flow integrity, return addresses are put on both the stack and
the shadow stack. When a function loads a return address, the hardware will compare the
return address of the stack to that of the shadow stack. They should agree for a correct return.
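The shadow stack mechanism can be modelled as follows. This is a behavioural sketch: the shadow stack is assumed to be writable only by the call/return logic itself, which is exactly the property the hardware provides and software-only schemes struggle to guarantee:

```python
class ShadowStackFault(Exception):
    """Raised when the hardware detects a corrupted return address."""

stack, shadow = [], []   # architectural stack (attacker-writable) / shadow copy

def call(return_addr: int):
    stack.append(return_addr)    # normal stack, reachable via buffer overflows
    shadow.append(return_addr)   # shadow stack, hardware-protected

def ret() -> int:
    addr = stack.pop()
    if addr != shadow.pop():     # hardware comparison on every return
        raise ShadowStackFault("return address corrupted")
    return addr
```

An overflow that overwrites the return address on the normal stack then trips the comparison instead of redirecting control flow.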
Another example is the protection of jump and return addresses to avoid buffer overflow
attacks and other abuses of pointers. A simple but restrictive option is to use read-only
memory, which fixes the pointer. A novel recent technique is the use of pointer authentication.
The authentication code relies on cryptographic primitives. A challenge for these algorithms
is that they should create the authentication tag with very low latency to fit into the critical
path of a microprocessor. The ARMv8-A architecture therefore uses a dedicated low-latency
crypto algorithm, QARMA [1866]. In this approach the unused bits in a 64-bit pointer are used
to store a tag. This tag is calculated based on a key and on the program state, i.e. current
address and function. These tags are calculated and verified on the fly.
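The idea of storing a tag in the unused upper pointer bits can be sketched as below. HMAC-SHA-256 stands in for the low-latency cipher purely for illustration (a real implementation uses QARMA precisely because a full hash would be far too slow), and the 48-bit address space and 16-bit tag width are assumptions:

```python
import hmac, hashlib

KEY = b"per-process pointer key"  # held in hardware registers, not memory
TAG_SHIFT = 48                    # assumed: virtual addresses fit in 48 bits
ADDR_MASK = (1 << TAG_SHIFT) - 1

def compute_tag(addr: int, context: int) -> int:
    """16-bit tag over (address, context); context models the program state
    mixed into the tag, e.g. the stack pointer value."""
    msg = addr.to_bytes(8, "little") + context.to_bytes(8, "little")
    return int.from_bytes(hmac.new(KEY, msg, hashlib.sha256).digest()[:2], "little")

def sign_pointer(addr: int, context: int) -> int:
    return addr | (compute_tag(addr, context) << TAG_SHIFT)

def auth_pointer(ptr: int, context: int) -> int:
    addr = ptr & ADDR_MASK
    if (ptr >> TAG_SHIFT) != compute_tag(addr, context):
        raise ValueError("pointer authentication failure")
    return addr  # strip the tag; the pointer may now be dereferenced
```

An attacker who overwrites a signed pointer without knowing the key produces, with high probability, a tag mismatch and a fault instead of a hijacked control flow.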
Address Space Layout Randomization and stack canaries are general software techniques:
their aim is to make it hard to predict the destination address of a jump. A detailed description
can be found in the Software Security Knowledge Area (Chapter 15).

20.5 HARDWARE DESIGN FOR CRYPTOGRAPHIC ALGORITHMS AT RTL LEVEL
The hardware features discussed so far are added to general purpose compute platforms,
i.e. to a programmable micro-processor or micro-controller. General purpose means that a
platform is created of which the hardware designer does not know the future applications
that will run on it. Flexibility, reflected in the instruction set, is then of importance. A second
class of processors are domain-specific processors: they have limited or no programmability
and are designed for one or a small class of applications.

20.5.1 Design process from RTL to ASIC or FPGA


When a dedicated processor is built for one or a class of cryptographic algorithms, this gives
a lot of freedom to the hardware designer. Typically, the hardware designer will, starting
from the cryptographic algorithm description, come up with hardware architectures at the
Register Transfer Level (RTL) taking into account a set of constraints. Area is measured
by gate count at RTL level. Throughput is measured by bits/sec. Power consumption is
important for cooling purposes and measured in Watt. Energy, measured in Joule, is important
for battery operated devices. It is often expressed in the amount of operations or amount
of bits that can be processed per unit energy. Hence the design goal is to maximize the
operations/Joule or bits/Joule. The resistance to side channel attacks is measured by the
number of measurements or samples required to disclose the key or other sensitive material.
Flexibility and programmability are difficult to measure and are typically imposed by the
application or class of applications that need to be supported: will the hardware support
only one or a few algorithms, encryption and/or decryption, modes of operation, initialization,
requirements for key storage, and so on.
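As a back-of-the-envelope illustration of these metrics, consider a hypothetical block cipher core; all numbers below are invented, not taken from any real design:

```python
# Hypothetical AES-like core: 128-bit blocks, 11 cycles per block, 100 MHz clock.
block_bits, cycles_per_block, f_clk = 128, 11, 100e6

throughput = block_bits * f_clk / cycles_per_block   # bits per second
power = 5e-3                                         # 5 mW average power
energy_per_bit = power / throughput                  # joules per bit
bits_per_joule = throughput / power                  # the figure to maximize

print(f"throughput: {throughput / 1e6:.0f} Mbit/s")
print(f"efficiency: {bits_per_joule / 1e9:.1f} Gbit/J")
```

Doubling the clock frequency doubles the throughput but, to first order, also doubles the power, leaving bits/Joule unchanged; genuine efficiency gains require architectural changes such as fewer cycles per block.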
A hardware architecture is typically described in a Hardware Description Language such as
Verilog or VHDL. Starting from this description the two most important hardware platforms
available to a hardware designer are ASIC and FPGA. An Application Specific Integrated Circuit
(ASIC) is a dedicated circuit fabricated in silicon. Once fabricated (baked) it cannot be modified
anymore. A Field Programmable Gate Array (FPGA) is a special type of programmable device:
it consists of regular arrays of 1-bit cells that can be programmed by means of a bitstream. This
special bitstream programs each cell to a specific function, e.g. a one bit addition, a register,
a multiplexer, and so on. By changing the bit-stream the functionality of the FPGA changes.
From the viewpoint of the Register Transfer Level (RTL) the actual design process for either
FPGA or ASIC doesn’t differ that much. Similar design options are available: the designer can
decide to go for serial or parallel architectures, making use of multiple design tricks to match
the design with the requirements. The most well-known tricks are to use pipelining to increase
throughput, unrolling to reduce latency, time multiplexing to reduce area, etc.
From implementation viewpoint, at this register transfer abstraction level, a large body of
knowledge and a large set of Electronic Design Automation (EDA) tools exist to map an
application onto a FPGA or ASIC platform [1849]. Implementation results should be compared
not only on the number of operations, but also on memory requirements (program memory
and data memory), throughput and latency requirements, energy and power requirements,
bandwidth requirements and the ease with which side-channel and fault attack countermea-
sures can be added. Please note that this large body of knowledge exists for implementations
that focus on efficiency. However, when combining efficiency with security requirements, such
as constant time execution or other countermeasures, there is a huge lack of supporting EDA
tools (see section 20.8).

20.5.2 Cryptographic algorithms at RTL level


Cryptographic implementations are subdivided in several categories, enumerated below.
The details of the cryptographic algorithms themselves are discussed in the Cryptography
Knowledge Area (Chapter 10); here, only remarks specific to the hardware implementations
at RTL level are made.
• Secret key algorithms: both block ciphers and stream ciphers usually result in compact
and fast implementations. Feistel ciphers are chosen for very area constrained designs
as the encryption and decryption hardware is the same. This is e.g. not the case for the
AES algorithm for which encryption and decryption require different units.
• Secret key: light-weight algorithms. For embedded devices, over the years, many light-
weight algorithms have been developed and implemented, e.g. Present, Prince, Rectangle,
Simon or Speck cipher. Focus in these cases is mostly on area cost. However, lately
light-weight has been extended to include also low power, low energy and especially
low-latency. Latency is defined as the time difference between input clear text and
corresponding encrypted output or MAC. Having a short latency is important in real-
time control systems, automotive, industrial IoT but also in memory encryption, control
flow integrity applications etc. More knowledge will follow from the recent NIST call on
light-weight crypto [1867].
• Secret key: block ciphers by themselves are not directly applicable in security applications.
They need to be combined with modes of operation to provide confidentiality
or integrity, etc. (see the Cryptography Knowledge Area (Chapter 10)). In this context
efficient implementations of authenticated encryption schemes are required: this is
the topic of the CAESAR competition [1868]. From an implementation viewpoint, the
sequential nature of the authenticated encryption schemes makes it very difficult to
obtain high throughputs as pipelining cannot directly be applied.
• Hash algorithms require typically a much larger area compared to secret key algorithms.
Especially the SHA3 algorithm and its different versions are large in area and slow in
execution. Therefore, light-weight hash algorithms are a topic of active research.
• One important hardware application of hash functions is the mining of cryptocurrencies,
such as Bitcoin, Ethereum, Litecoin and others, based on SHA-2 (e.g. SHA-256), SHA-3, etc. To
obtain the required high throughputs, massive parallelism and pipelining is applied. This
is however limited as hash algorithms are recursive algorithms and thus there is an
upper bound on the amount of pipelining that can be applied [1869]. Cryptocurrencies
form part of the more general technology of distributed ledgers, which is discussed in
the Distributed Systems Security Knowledge Area (Chapter 12).
• The computational complexity of public key algorithms is typically 2 or 3 orders of
magnitude higher than secret key and thus its implementation 2 to 3 orders slower or
larger. Especially for RSA and Elliptic curve implementations, a large body of knowledge
is available, ranging from compact [1870] to fast, for classic and newer curves [1871].
• Algorithms resistant to attacks of quantum computers, aka post-quantum secure algo-
rithms, are the next generation algorithms requiring implementation in existing CMOS
ASIC and FPGA technology. Computational bottlenecks are the large multiplier structures,
with/without the Number Theoretic Transform, the large memory requirements
and the requirements on random numbers that follow specific distributions. Currently,
NIST is holding a competition on post-quantum cryptography [1872]. Thus it is expected
that after the algorithms are decided, implementations in hardware will follow.
• Currently, the most demanding implementations for cryptographic algorithms are those
used in homomorphic encryption schemes: the computational complexity, the size of
the multipliers and especially the large memory requirements are the challenges to
address [1873].

20.6 SIDE-CHANNEL ATTACKS, FAULT ATTACKS AND


COUNTERMEASURES
This section first provides an overview of physical attacks on implementations of cryptographic
algorithms. The second part discusses a wide range of countermeasures and some open
research problems. Physical attacks, mostly side-channel and fault attacks, were originally of
great concern to the developers of small devices that are in the hands of attackers, especially
smart-cards and pay-TV systems. The importance of these attacks and countermeasures is
growing as more electronic devices are easily accessible in the context of the IoT.

20.6.1 Attacks
At the current state of knowledge, cryptographic algorithms have become very secure against
mathematical and cryptanalytical attacks: this is certainly the case for algorithms that are
standardized or that have received an extensive review in the open research literature. Cur-
rently, the weak link is mostly the implementation of algorithms in hardware and software.
Information leaks from the hardware implementation through side-channel and fault attacks.
A distinction is made between passive or side-channel attacks versus active or fault attacks.
A second distinction can be made based on the distance of the attacker to the device: attacks
can occur remotely, close to the device still non-invasive to actual invasive attacks. More
details on several classes of attacks are below.
Passive Side Channel Attacks General side-channel attacks are passive observations of a
compute platform. Through data dependent variations of execution time, power consumption
or electro-magnetic radiation of the device, the attacker can deduce information about secret
internals. Variations of execution time, power consumption or electro-magnetic radiations
are typically picked up in close proximity of the device, while it is operated under normal
conditions. It is important to note that the normal operation of the device is not disturbed.
Thus the device is not aware that it is being attacked, which makes this attack quite powerful
[980].
Side channel attacks based on variations on power consumption have been extensively
studied. They are performed close to the device with access to the power supply or the power
pins. One makes a distinction between Simple Power Analysis (SPA), Differential and Higher
Order Power Analysis (DPA), and template attacks. In SPA, the idea is to first study the target
for features that depend on the key. A typical target in timing and power attacks is, e.g., an
if-then-else branch that depends on key bits. In public key algorithm implementations,
such as RSA or ECC, the algorithm runs sequentially through all key bits. When the if-branch
takes more or less computation time than the else-branch, this can be observed from outside
the chip. SPA attacks are not limited to public key algorithms, they have also been applied to
secret key algorithms, or algorithms to generate prime numbers (in case they need to remain
secret). So, with knowledge of the internal operation of the device, SPA requires the collection
of only one or a few traces for analysis.
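The classic SPA target can be simulated in software: in left-to-right square-and-multiply exponentiation, each 1-bit of the exponent triggers an extra multiplication, so the sequence of operations visible in a single power or timing trace spells out the key bits. A sketch of the leakage:

```python
def square_and_multiply(base: int, exponent: int, modulus: int):
    """Left-to-right binary exponentiation, returning the result and the
    operation sequence an SPA attacker would read off a single trace."""
    result, trace = 1, []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        trace.append("S")
        if bit == "1":
            result = (result * base) % modulus  # executed only for 1-bits
            trace.append("M")
    return result, trace

_, trace = square_and_multiply(7, 0b1011, 101)
# An "S" followed by an "M" marks a 1-bit; an "S" alone marks a 0-bit,
# so the trace directly reveals the exponent 1011.
```

Constant-time variants such as the Montgomery ladder perform the same operations for every bit, precisely to remove this leak.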
With DPA, the attacker collects multiple traces, ranging from a few tens for unprotected
implementations to millions in case of protected hardware implementations. In this situation,
the attacker exploits the fact that the instantaneous power consumption depends on the
data that is processed. The same operation, depending on the same unknown sub-key, will
result in different power consumption profiles if the data is different. The attacker will also
build a statistical model of the device to estimate the power consumption as a function of
the data and the different values of the subkey. Statistical analysis on these traces based on
correlation analysis, mutual information and other statistical tests are applied to correlate the
measured values to the statistical model.
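The statistical core of such a correlation-based DPA (often called CPA) can be sketched on simulated traces. The Hamming-weight leakage model, the use of the PRESENT S-box and the noise level are illustrative choices:

```python
import random

SBOX = [0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2]  # PRESENT S-box
HW = [bin(x).count("1") for x in range(16)]  # Hamming weight table

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

# Simulate a device whose instantaneous power leaks the Hamming weight of
# SBOX[plaintext ^ key], plus Gaussian measurement noise.
rng = random.Random(0)
SECRET = 0x9
pts = [rng.randrange(16) for _ in range(500)]
traces = [HW[SBOX[p ^ SECRET]] + rng.gauss(0, 0.5) for p in pts]

# For every key guess, correlate the predicted leakage with the measurements;
# the correct guess produces the highest absolute correlation.
scores = {k: abs(pearson([HW[SBOX[p ^ k]] for p in pts], traces))
          for k in range(16)}
best_guess = max(scores, key=scores.get)
```

The attack recovers the key one small chunk at a time, which is what makes it practical: each 4- or 8-bit sub-key is searched exhaustively and independently.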
Side channel attacks based on Electro-Magnetic radiations have been recognized early-on
in the context of military communication and radio equipment. As a reaction, NATO and the
governments of many countries have issued TEMPEST [1874]. It consists of specifications on
the protection of equipment against unintentional electro-magnetic radiation but also against
leakage of information through vibrations or sound. Electro-Magnetic radiation attacks can
be mounted from a distance, as explained above, but also at close proximity to the integrated
circuit. Electro-Magnetic probing on top of an integrated circuit can release very localized
information of specific parts of an IC by using a 2D stepper and fine electro-magnetic probers.
Thus electro-magnetic evaluation has the possibility to provide more fine grained leakage
information compared to power measurements.
Timing attacks are another subclass of side-channel attacks [1453]. When the execution time
of a cryptographic calculation or a program handling sensitive data, varies as a function of the
sensitive data, then this time difference can be picked up by the attacker. A timing attack can
be as simple as a key dependent different execution time of an if-branch versus an else-branch
in a finite state machine. Cache attacks, which abuse the time difference between a cache hit
and a cache miss, are an important class of timing attacks [1875], [1876].
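The root cause of many timing attacks can be illustrated in software: a comparison that exits at the first mismatching byte reveals, through its running time, how many leading bytes of a guess are correct, whereas a constant-time comparison inspects every byte regardless of the data:

```python
import hmac

def leaky_compare(secret: bytes, guess: bytes) -> bool:
    """Functionally correct, but the early exit makes the running time
    depend on the position of the first mismatch."""
    if len(secret) != len(guess):
        return False
    for s, g in zip(secret, guess):
        if s != g:
            return False  # early exit: leaks the mismatch position
    return True

def constant_time_compare(secret: bytes, guess: bytes) -> bool:
    # hmac.compare_digest is designed so that the running time does not
    # depend on where (or whether) the inputs differ.
    return hmac.compare_digest(secret, guess)
```

Against the leaky version, an attacker can recover a secret byte by byte by timing guesses; the constant-time version removes that signal.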
With a template attack, the attacker will first create a copy or template of the target device
[1877]. This template is used to study the behavior of the device for all or a large set of inputs
and secret data values. One or a few samples of the target device are then compared to the
templates in the database to deduce secret information from the device. Template attacks
are typically used when the original device has countermeasures against multiple executions.
E.g. it might have an internal counter to log the number of failed attempts. Templates can be
made based on timing, power or electro-magnetic information. As machine learning and AI
techniques become more powerful, so will the attack possibilities of template attacks.
Micro-architectural Side-channels Processor architectures are very vulnerable to timing at-
tacks. The problem of information leaks and the difficulty of confinement between programs
was already identified early on in [1878]. Later timing variations in cache hits and misses
became an important class of timing attacks [1879]. Micro-architectural side-channel attacks,
such as Spectre, Meltdown and Foreshadow, have recently gained a lot of attention. They are
also based on the observation of timing differences [1851, 1879]. The strength of the attacks
sits in the fact that they can be mounted remotely from software. Modern processors include
multiple optimization techniques to boost performance not only with caches, but also specu-
lative execution, out-of-order execution, branch predictors, etc. When multiple processes run
on the same hardware platform, virtualization and other software techniques isolate the data
of the different parties in separate memory locations. Yet, through the out-of-order execution
or speculative execution (or many other variants) the hardware of the processor will access
memory locations not intended for the process by means of so-called transient instructions.
These instructions are executed but never committed. They have, however, touched memory
locations, which might create side-channel effects, such as variations in access time, and
thus leak information.
Active fault attacks Fault attacks are active manipulations of hardware compute platforms
[1880]. The result is that the computation itself or the program control flow is disturbed. Faulty
or no outputs are released. Even if no output is released or the device resets itself, this decision
might leak sensitive information. One famous example is published in [1881]: it describes an
RSA signature implementation which makes use of the Chinese Remainder Theorem (CRT).
With one faulty and one correct result signature, and some simple mathematical calculations,
the secret signing key can be derived. Physical fault attacks include simple clock glitching,
power glitching, and heating up or cooling down the device. These attacks require close
proximity to the device but are non-invasive.
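The CRT fault attack of [1881] (often called the Bellcore attack) can be reproduced with toy numbers: a single faulty half-signature lets the attacker factor the modulus with one gcd computation. The parameters below are illustrative only and far too small to be secure.

```python
from math import gcd

# toy RSA-CRT parameters (illustrative only)
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))
m = 42  # message (hash) to be signed

# CRT half-signatures and Garner recombination
sp = pow(m, d % (p - 1), p)
sq = pow(m, d % (q - 1), q)
qinv = pow(q, -1, p)
s = sq + q * ((qinv * (sp - sq)) % p)
assert pow(s, e, n) == m  # the correct signature verifies

# inject a fault into the half-signature mod q (flip one bit)
sq_faulty = sq ^ 1
s_faulty = sq_faulty + q * ((qinv * (sp - sq_faulty)) % p)

# the faulty signature is still correct mod p but wrong mod q,
# so gcd(s'^e - m, n) reveals the secret factor p
factor = gcd(pow(s_faulty, e, n) - m, n)
assert factor == p
```

The last two lines are the "simple mathematical calculations" mentioned in the text: one correct and one faulty result suffice to recover a secret prime factor.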
As memories scale down, more attack surfaces appear. A well-known attack on DRAM
memories is the RowHammer attack [989, 1882]. By repeatedly reading specific locations in
DRAM memory, neighboring locations lose their values: hammering certain rows causes bit
flips in nearby rows.
With more expensive equipment, and with opening the lid of the integrated circuit or etching
the silicon down, even more detailed information of the circuit can be obtained. Equipment
that has been used includes optical fault injection [1883], laser attacks [1884], Focused Ion Beam
(FIB), and Scanning Electron Microscope (SEM) tools. The latter were typically designed for
chip reliability and failure analysis, but the same equipment can also be misused for reverse
engineering.


20.6.2 Countermeasures
There are no generic countermeasures that resist all classes of side-channel attacks. De-
pending on the threat model (remote/local access, passive/active, etc.) and the assumptions
made on the trusted computing base (i.e. what is and what is not included in the root of trust),
countermeasures have been proposed at several levels of abstraction. The most important
categories are summarized below.
To resist timing attacks, the first objective is to provide hardware that executes the application
or program in constant time independent of secret inputs, keys and internal state. Depending
on the time granularity of the measurement equipment of the attacker, constant time coun-
termeasures also need to be more fine grained. At the processor architecture level, constant
time means a constant number of instructions. At the RTL level, constant time means a
constant number of clock cycles. At logic and circuit level, constant time means a constant
logic depth or critical path independent of the input data. At instruction level, constant time
can be obtained by balancing execution paths and adding dummy instructions. Sharing of
resources, e.g. through caches, makes constant-time implementations extremely difficult to
obtain.
At RTL level, we need to make sure that all instructions run in the same number of clock
cycles, which can be achieved by adding dummy operations or dummy gates, depending on
the granularity level. Providing constant-time RTL level and gate level descriptions is however
a challenge, as design tools, both hardware and software compilers, will for performance
reasons synthesize away the dummy operations or logic which were added to balance the
computations.
As many side-channel attacks rely on a large number of observations or samples, randomisa-
tion is a popular countermeasure. It is used to protect against power, electro-magnetic and
timing side-channel attacks. Randomisation is a technique that can be applied at algorithm
level: it is especially popular for public key algorithms, which apply techniques such as scalar
blinding, or message blinding [1885]. Randomisation applied at register transfer and gate level
is called masking. Masking schemes randomise intermediate values in the calculations so
that their power consumption can no longer be linked with the internal secrets. A large set of
papers on gate level masking schemes is available, ranging from simple Boolean masking
to threshold implementations that are provably secure under certain leakage models [1886].
Randomisation has been effective in practice especially as a public key implementation pro-
tection measure. The protection of secret key algorithms by masking is more challenging.
Some masking schemes require a huge amount of random numbers, others assume leakage
models that do not always correspond to reality. In this context, novel cryptographic
techniques, summarized under the label of leakage-resilient cryptography, are being developed
that are inherently resistant against side-channel attacks [1887, 1888]. At this stage, there is still a gap
between theory and practice.
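A first-order Boolean masking scheme of the kind discussed above can be sketched in a few lines: a sensitive value is split into random XOR-shares, so that any single share (and its power footprint) is statistically independent of the secret. This is an illustrative software model, not a hardened hardware implementation; function names are made up.

```python
import secrets

def boolean_mask(x: int, n_shares: int = 2, bits: int = 8):
    """Split x into n_shares XOR-shares; any n_shares - 1 of them are
    uniformly random and independent of x."""
    shares = [secrets.randbits(bits) for _ in range(n_shares - 1)]
    last = x
    for s in shares:
        last ^= s  # the final share is chosen so all shares XOR back to x
    shares.append(last)
    return shares

def unmask(shares):
    x = 0
    for s in shares:
        x ^= s
    return x

# recombining the shares recovers the sensitive value
shares = boolean_mask(0xA5)
assert unmask(shares) == 0xA5
```

The difficulty in practice, as the text notes, is not the sharing itself but computing through non-linear operations (e.g. S-boxes) on shares without ever recombining them.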
Hiding is another major class of countermeasures. The idea is to reduce the signal to noise
ratio by reducing the signal strength. Shielding in the context of TEMPEST is one such example.
Similarly, at gate level, reducing the power signature or electro-magnetic signature of standard
cells or logic modules, will increase the resistance against power or electro-magnetic attacks.
Simple techniques such as using a jittery or drifting clock, and large decoupling capacitances
will also reduce the signal to noise ratio.
Sometimes leakage at one abstraction level, e.g. power side channels, can be addressed at
a different abstraction level. For example, if there is a risk that an encryption key leaks from an
embedded device, a cryptographic protocol that changes the key at a sufficiently
high frequency will also avoid side-channel information leakage.


General purpose processors such as CPUs, GPUs, and micro-controllers cannot be modified
once fabricated. Thus protecting against micro-architectural attacks after fabrication by
means of software patches and updates is extremely difficult and mostly at the cost of reduced
performance [1851]. Micro-code updates are also a form of software, i.e. firmware updates
rather than hardware updates. The main difference is that the translation from instructions to
micro-code is a company secret, and thus to the user it looks like a hardware update. Providing
generic solutions to programmable hardware is a challenge as it is unknown beforehand which
application will run. Solutions to this problem will be a combined effort between hardware
and software techniques.
Protection against fault attacks is implemented at the register transfer level as well as at the
circuit level. At RTL, protection against fault attacks is mostly based on redundancy, either in
space or in time, and on adding checks based on coding, such as parity checks. The cost is high,
as calculations are performed multiple times. One problem with adding redundancy is that it
increases the attack surface of side-channels. Indeed, due to the redundant calculations, the
attacker has more traces available to perform time, power or electro-magnetic side-channel
attacks [1885]. At circuit level, monitors on the clock or power supply, might detect deviations
from normal operations and raise an alarm.
Many types of circuit level sensors are added to integrated circuits. Examples are light sensors
that detect that the lid of a package has been opened. Metal mesh sensors, which are laid out
in top-level metal layers, can detect probing attacks. Temperature sensors detect heating or
cooling of the integrated circuit. Antenna sensors to detect electro-magnetic probes close to
the surface have been developed: these sensors measure a change in electro-magnetic fields.
And sensors that detect manipulation of the power supply or clock can be added to the device.
Note that adding sensors to detect active manipulation can again leak extra information to
the side-channel attacker.
Joint countermeasures against side-channel and fault attacks are challenging and an active
area of research.

20.7 ENTROPY GENERATING BUILDING BLOCKS: RANDOM NUMBERS, PHYSICALLY UNCLONABLE FUNCTIONS
Sources of entropy are essential for security and privacy protocols. In this section two impor-
tant sources of entropy related to silicon technology are discussed: random number generators
and physically unclonable functions.


20.7.1 Random number generation


Security and privacy rely on strong cryptographic algorithms and protocols. A source of
entropy is essential in these protocols: random numbers are used to generate session keys,
nonces, initialization vectors, to introduce freshness, etc. Random numbers are also used to
create masks in masking countermeasures, random shares in multi party computation, zero-
knowledge proofs, etc. In this section the focus is on cryptographically secure random numbers
as used in security applications. Random numbers are also used outside cryptography, e.g. in
gaming, lottery applications, stochastic simulations, etc.
In general, random numbers are subdivided in two major classes: the Pseudo Random Number
Generator (PRNG) also called Deterministic Random Bit Generator (DRBG) and the True
Random Number Generator (TRNG) or Non-Deterministic Random Bit Generator (NRBG). The
design, properties and testing of random numbers are described in detail in important standards
issued in the US by NIST: NIST800-90A for deterministic random number generators,
NIST800-90B for entropy sources, and NIST800-90C for random bit generation
constructions [1672, 1889, 1890] (footnote 1). In Germany and by extension in most of Europe, the
German BSI has issued two important standards: the AIS-20 for functionality classes and
evaluation criteria for deterministic random number generators and the AIS-31 for physical
random number generators [1891, 1892, 1893].
An ideal RNG should, firstly, generate all numbers with equal probability. Secondly, these
numbers should be independent of the previous and next numbers generated by the RNG,
properties called forward and backward secrecy. The probabilities are verified with statistical
tests. Each standard
includes a large set of statistical tests aimed at finding statistical weaknesses. Not being
able to predict future values or derive previous values is important not only in many security
applications, e.g. when this is used for key generation, but also in many gaming and lottery
applications.
Pseudo-random number generators are deterministic algorithms that generate a sequence of
bits or numbers that look random but are generated by a deterministic process. Since a PRNG
is a deterministic process, when it starts with the same initial value, then the same sequence of
numbers will be generated. Therefore it is essential that the PRNG starts with a different start-up
value each time it is initiated. This initial seed can either be generated by a slow
true random number generator or, at minimum, by a non-repeating value, e.g. as provided by
a monotonically increasing counter. A PRNG is called cryptographically secure if the attacker,
who learns part of the sequence, is not able to compute any previous or future outputs.
Cryptographically secure PRNGs rely on cryptographic algorithms to guarantee this forward
and backward secrecy. Forward secrecy additionally requires regular reseeding to introduce
fresh entropy into the generator. Hybrid RNGs have an additional non-deterministic input to the
PRNG.
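The forward and backward secrecy requirements can be illustrated with a simplified, hypothetical hash-based PRNG in the spirit of a hash DRBG. This sketch does not follow the actual NIST 800-90A specification; class and method names are made up. The state is updated with a one-way function after every output, and reseeding mixes in fresh entropy.

```python
import hashlib

class HashDRBGSketch:
    """Hypothetical, simplified hash-based PRNG (NOT NIST SP 800-90A)."""

    def __init__(self, seed: bytes):
        self.state = hashlib.sha256(b"init" + seed).digest()

    def generate(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            out += hashlib.sha256(b"out" + self.state).digest()
            # one-way state update: an attacker who learns the current state
            # still cannot recover earlier states or outputs (backward secrecy)
            self.state = hashlib.sha256(b"upd" + self.state).digest()
        return out[:n]

    def reseed(self, entropy: bytes):
        # mixing in fresh entropy restores security after a state compromise;
        # forward secrecy requires such regular reseeding
        self.state = hashlib.sha256(self.state + entropy).digest()
```

Note that the generator is fully deterministic given the seed, which is why the seed itself must come from a true entropy source or a non-repeating value.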
PRNGs provide conditional security based on the computational complexity of the underlying
cryptographic algorithms. See the Cryptography Knowledge Area (Chapter 10) for more details.
In contrast, ideal true random number generators provide unconditional security as they are
based on unpredictable physical phenomena. Thus their security is guaranteed independent
of progress in mathematics and cryptanalysis.
The core of a true random number generator consists of an entropy source, which is a physical
phenomenon with random behavior. In electronic circuits, noise or entropy sources are
usually based on thermal noise, jitter and metastability. These noise sources are never perfect:
(Footnote 1: NIST800-90C does not exist as a standard yet.)
the bits they generate might show bias or correlation or other variations. Hence they do not
have full entropy. Therefore, they are typically followed by entropy extractors or conditioners.
These building blocks improve the entropy per bit of output. But as entropy extractors are
deterministic processes, they cannot increase the total entropy, so the output length will be
shorter than the input length.
Due to environmental conditions, e.g. temperature or voltage variations, the quality
of the generated numbers might vary over time. Therefore, the standards describe specific
tests that should be applied at the start and continuously during the process of generating
numbers. One can distinguish three main categories of tests. The first is the total failure
test, applied at the source of entropy. The second consists of online health tests that monitor the
quality of the entropy extractors. The third consists of tests of the post-processed bits. The
requirements for these tests are well described in the different standards and specialized text
books [1894].
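As an example of the first category, the repetition count test from NIST SP 800-90B detects a total failure in which the noise source gets stuck on one value. The sketch below follows the standard's cutoff formula but is otherwise simplified for illustration; the parameter values are assumptions.

```python
import math

def repetition_count_test(samples, h_min=0.5, alpha=2**-20):
    """Alarm (return False) if any value repeats too many times in a row.

    h_min: assumed min-entropy per sample; alpha: false-positive probability.
    Cutoff C = 1 + ceil(-log2(alpha) / H), as in NIST SP 800-90B.
    """
    cutoff = 1 + math.ceil(-math.log2(alpha) / h_min)
    run = 1
    for prev, cur in zip(samples, samples[1:]):
        run = run + 1 if cur == prev else 1
        if run >= cutoff:
            return False  # total failure of the noise source
    return True
```

With `h_min=0.5` the cutoff is 41, so a run of 41 identical samples raises an alarm while healthy, varying output passes.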
The challenge in designing TRNGs is, first, to provide a clear and convincing proof of the entropy
source and, second, to design online tests which are at the same time compact and able to detect a
wide range of defects [1895]. The topic of attacks, countermeasures and sensors for TRNGs,
especially in the context of IoT and embedded devices, is an active research topic.

20.7.2 Physically Unclonable Functions


From a hardware perspective, Physically Unclonable Functions (PUFs) are circuits and
techniques to derive unique features from silicon circuits, similar to human biometrics [1896].
The manufacturing of silicon circuits results in unique process variations which cannot be
physically cloned. The basic idea of PUFs is that these unique manufacturing features are
magnified and digitized so that they can be used in security applications similar to the use of
fingerprints or other biometrics. Process and physical variations such as doping fluctuations,
line or edge widths of interconnect wires, result in variations of threshold voltages, transistor
dimensions, capacitances, etc. Thus circuits are created that are sensitive to and amplify
these variations.
The major security application for PUFs is to derive unique device-specific keys, e.g. for usage
in an IoT device or smart card. Traditionally, storage of device-unique keys is done in
non-volatile memory, as the key has to remain in the chip even when the power is turned-off.
Non-volatile memory however requires extra fabrication steps, which makes chips with non-volatile
memory more expensive than regular standard CMOS chips. Thus PUFs are promising
as a cheap alternative to secure non-volatile memory, because the unique silicon fingerprint
is available without the extra processing steps. Indeed, each time the key is needed, it can
be read from the post-processed PUF and directly used in security protocols. PUFs can also
replace fuses, which are large and whose state is relatively easy to detect under a microscope.
The second security application is to use PUFs in identification applications, e.g. for access
control or tracking of goods. The input to a PUF is called a challenge, the output the response.
The ideal PUF has an exponential number of unique challenge response pairs, exponential in
the number of circuit elements. The uniqueness of PUFs is measured by the inter-distance
between different PUFs seeing the same challenge. The ideal PUF has stable responses:
it replies with the same response, i.e. there is no noise in the responses. Moreover, PUF
responses should be unpredictable and physically unclonable.
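The inter-distance (and the stability, or intra-distance) mentioned above is usually measured as a fractional Hamming distance between response bit-strings. A small sketch, with made-up byte strings in the usage note, is:

```python
def hamming_distance_fraction(r1: bytes, r2: bytes) -> float:
    """Fraction of differing bits between two equal-length PUF responses."""
    assert len(r1) == len(r2)
    diff = sum(bin(x ^ y).count("1") for x, y in zip(r1, r2))
    return diff / (8 * len(r1))

# ideal behavior: ~0.5 between responses of different PUFs (uniqueness),
# ~0.0 between repeated readings of the same PUF (stability)
```

An inter-distance near 0.5 indicates unique devices, while intra-distances well below 0.5 leave a margin for the error correction discussed below.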
The ideal PUF unfortunately does not exist. In literature, two main classes of PUFs are defined,
characterized by the number of challenge-response pairs they can generate. So-called weak
PUFs are circuits with a finite number of elements, with each element providing a high amount
of entropy. The number of possible challenge-response pairs typically grows linearly with the
area of the integrated circuit; hence they are called weak PUFs. The most well-known example
is the SRAM PUF [1897]. These PUFs are typically used for key generation. The raw PUF
output material is not directly usable for key generation as the PUF responses are affected
by noise. Indeed, subsequent readings of the same PUF might result in slightly varying noisy
responses, typically up to 20%. Thus, entropy extraction is followed by secure sketch circuits
(similar to error correction) that eliminate the noise and compress the entropy to generate a
full-entropy key [1898]. The challenge for the PUF designer is to come up with process variations
and circuits that can be used as key material, but which are not sensitive to transient noise.
A second challenge is to keep all the post-processing modules compact so that the key-
generation PUF can be included in embedded IoT devices.
The second class are the so-called strong PUFs. In this case, the number of challenge-response
pairs grows large, ideally exponentially, with the silicon area. The most well-known example
is the arbiter PUF [1899]. A small number of silicon elements are combined together, e.g. to
create a chain of multiplexers or comparators, so that simple combinations of the elements
create the large challenge-response space. Also in this case, the effects of noise in the circuits
needs to be taken into account. Strong PUFs promise to be useful in authentication
applications, e.g. for access control. Each time a challenge is applied to the PUF, a response
unique to the chip will be sent. The verifier will accept the response if it can be uniquely tied
to the prover. This requires that the PUF responses are registered in a form of a database
beforehand during an enrollment phase.
The problem with strong PUFs is that there is a strong correlation between different challenge-
response pairs of most circuits proposed in literature. Hence all of these circuits are broken
with machine learning techniques [1900] and cannot be used for authentication purposes.
The fundamental problem is that very basic, mostly linear operations are used to combine PUF
elements, which makes them easy targets for machine learning attacks. Ideally, these should
be cryptographic or other computationally hard operations resistant to machine learning:
unfortunately these cannot tolerate noise. Lightweight PUF-based security protocols are an
active area of research.

20.8 HARDWARE DESIGN PROCESS


In this section, several hardware security topics are described which are directly related to the
lower design abstraction layers. One is the trust in the hardware design process itself. Directly
related to this, is the problem of Trojan circuits. Also part of the hardware design process are
circuit level techniques for camouflaging, logic locking, etc.


20.8.1 Design and fabrication of silicon integrated circuits


It is important to note that the hardware design process itself also needs to be trusted.
Because of its design complexity, design at each abstraction layer relies on Electronic Design
Automation (EDA) tools. The design, fabrication, packaging and testing of silicon integrated
circuits is an international endeavour: silicon foundries are mostly located in Asia, silicon
design tools are mostly developed in the US, and silicon testing and packaging occur all
over the world. For chips that end up in critical infrastructure, such as telecommunication,
military, aviation, trust and verification of the complete design cycle is essential.
Since silicon foundries and mask making are extremely expensive, very few countries and
companies can still afford them, and a huge consolidation has taken and is still taking place in the industry.
For critical infrastructure, governments demand more tools and techniques to increase the
trustworthiness of this international design process. On this topic, large research projects
have been defined to come up with methods and tools to increase the trustworthiness of the design
process and especially to assess the risk of Trojan insertions during the design process.

20.8.2 Trojan circuits


Trojan circuits are logic or gates added to large integrated circuits. As they are not part of the
specified functionality, they are difficult to detect. They rely on the fact that they are extremely
small in comparison with the large size of integrated circuits and SoCs. Trojan circuits are
classified according to three main criteria [1901, 1902]. The first one is the physical
characteristics of the Trojan, i.e. how the Trojan is inserted into the circuit, e.g. whether it requires logic
modifications or only layout modifications. The second one is the activation characteristic: will
the Trojan be turned on by an internal or external event, etc. The third characteristic classifies
the type of action taken by the Trojan, e.g. will it leak information or will it destroy functionality,
etc. The knowledge area on this topic is summarized in [1901, 1902].

20.8.3 Circuit level techniques


To thwart visual inspection, circuit level camouflaging techniques have been introduced [1903].
These are standard cells or other modules that visually look the same, or that are camouflaged by
random extra material, so as to prevent reverse engineering based on visual inspection.
Another technique to prevent loss of intellectual property is logic locking [1904]. With this
technique, extra gates are added to a circuit with a secret input. Only when the correct key is
applied to the secret gates, will the circuit perform the correct functionality. This is an active
research topic, with logic locking schemes being proposed and attacked; SAT solvers have
proven to be a very useful tool for attacking these circuits.


20.8.4 Board Level Security


Integrated circuits are placed together on Printed Circuit Boards (PCBs). Many of the attacks
and countermeasures mentioned before for integrated circuits can be repeated for PCBs,
albeit at a different scale. While integrated circuits provide some level of protection because
they are encapsulated in packages and use much smaller CMOS technologies, PCBs are
less complex and somewhat easier to access. Therefore, PCBs can be given special coatings
and mechanical tamper-evident and tamper-resistant protection mechanisms.
There have been some concerns that Trojan circuits could also be included at the board level.

20.8.5 Time
The concept of time and the concept of sequence of events are essential in security protocols.
The TCG identifies three types of sequencing: a monotonic counter, a tick counter and actual
trusted time [1850]. A monotonic counter always increases, but the wall clock time between
two increments is unknown. The tick counter increases with a set frequency. It only increases
when the power is on. At power-off the tick counter will reset. Therefore the tick counter is
linked with a nonce and methods are foreseen to link this with a real wall clock time. Trusted
time is the most secure. It makes sure that there is a link between the tick counter and the
real wall clock time. From a hardware viewpoint it will require non-volatile memory, counters,
crystals, continuous power, and an on chip clock generator. The connection to a real wall
clock will require synchronization and an actual communication channel.
The importance of time is placed in a wider context in the Distributed Systems Security
Knowledge Area (Chapter 12).

20.9 CONCLUSION
Hardware security is a very broad field, covering many different topics. In this chapter, a
classification is made based on the different design abstraction layers. At each abstraction
layer, the threat model, root of trust and security goals are identified.
Because of the growth of IoT, edge and cloud computing, the importance of hardware security
is growing. Yet, in many cases hardware security conflicts with other design optimisations,
such as low-power operation under limited battery conditions. In these circumstances,
performance optimisation is the most important design task. Yet it is also the most important
cause of information leakage. This is the case at all abstraction layers: instruction level,
architecture level and logic and circuit level.
Another trend is that hardware is becoming more ‘soft’. This is an important trend in processor
architecture, where FPGA functionality is added to processor architectures. The fundamental
assumption that hardware is immutable is lost here. This will create a whole new class of
attacks.
A last big challenge for hardware security is the lack of EDA tools to support hardware security.
EDA tools are made for performance optimization and security is usually an afterthought. An
added challenge is that it is difficult to measure security and thus difficult to balance security
versus area, throughput or power optimisations.



Chapter 21
Cyber-Physical Systems
Security
Alvaro Cardenas University of California Santa Cruz


INTRODUCTION
Cyber-Physical Systems (CPSs) are engineered systems that are built from, and depend upon,
the seamless integration of computation, and physical components. While automatic control
systems like the steam governor have existed for several centuries, it is only in the past
decades that the automation of physical infrastructures like the power grid, water systems,
or chemical reactions have migrated from analogue controls to embedded computer-based
control, often communicating through computer-based networks. In addition, new advances
in medical implantable devices, or autonomous self-driving vehicles are increasing the role of
computers in controlling even more physical systems.
While computers give us new opportunities and functionalities for interacting with the physical
world, they can also enable new forms of attacks. The purpose of this Knowledge Area is to
provide an overview of the emerging field of CPS security.
In contrast with other Knowledge Areas within CyBOK that can trace the roots of their field
back several decades, the work on CPS security is relatively new, and our community has
not yet developed the same consensus on best security practices as the cyber security
fields described in other KAs. Therefore, in this document, we focus on providing an overview
of research trends and unique characteristics in this field.
CPSs are diverse and can include a variety of technologies, for example, industrial control
systems can be characterised by a hierarchy of technology layers (the Purdue model [1905]).
However, the security problems in the higher layers of this taxonomy are more related to
classical security problems covered in other KAs. Therefore, the scope of this document
focuses on the aspects of CPSs more closely related to the sensing, control, and actuation of
these systems (e.g., the lower layers of the Purdue model).
The rest of the Knowledge Area is organised as follows. In Section 21.1 we provide an intro-
duction to CPSs and their unique characteristics. In Section 21.2, we discuss crosscutting
security issues in CPSs generally applicable to several domains (e.g., the power grid or vehicle
systems); in particular we discuss efforts for preventing, detecting, and responding to attacks.
In Section 21.3, we summarise the specific security challenges in a variety of CPS domains,
including the power grid, transportation systems, autonomous vehicles, robotics, and medical
implantable devices. Finally, in Section 21.4, we examine the unique challenges CPS security
poses to regulators and governments. In particular, we outline the role of governments in
incentivising security protections for CPSs, and how CPS security relates to national security
and the conduct of war.

CONTENT

21.1 CYBER-PHYSICAL SYSTEMS AND THEIR SECURITY RISKS
[1906, 1907, 1908]
The term Cyber-Physical Systems (CPSs) emerged just over a decade ago as an attempt to
unify the common research problems related to the application of embedded computer and
communication technologies for the automation of physical systems, including aerospace,

KA Cyber-Physical Systems Security | July 2021 Page 708

automotive, chemical production, civil infrastructure, energy, healthcare, manufacturing, new


materials, and transportation. CPSs are usually composed of a set of networked agents
interacting with the physical world; these agents include sensors, actuators, control processing
units, and communication devices, as illustrated in Figure 21.1.
The term CPSs was coined in 2006 by Helen Gill from the National Science Foundation
(NSF) in the United States [1906]. In their program announcement, NSF outlined their goal
for considering various industries (such as water, transportation, and energy) under a unified
lens: by abstracting from the particulars of specific applications in these domains, the goal of
the CPS program is to reveal crosscutting fundamental scientific and engineering principles
that underpin the integration of cyber and physical elements across all application sectors.

[Figure: a physical system with sensors (s1–s4) and actuators (a1–a3), connected through a network to distributed controllers (c1–c3)]
Figure 21.1: General architecture of cyber-physical systems [1909].

Soon after the CPS term was coined, several research communities rallied to outline and
understand how CPSs cyber security research is fundamentally different when compared
to conventional IT cyber security. Because of the crosscutting nature of CPSs, the background
of early security position papers from 2006 to 2009 using the term CPSs ranged from
real-time systems [1910, 1911] to embedded systems [1912, 1913], control theory [1909], and
cybersecurity [1908, 1913, 1914, 1915, 1916].
While cyber security research had been previously considered in other physical domains—most
notably in the Supervisory Control and Data Acquisition (SCADA) systems of the power
grid [1917]—these previous efforts focused on applying well-known IT cyber security best
practices to control systems. What differentiated the early CPS security position papers was
their crosscutting nature, focusing on a multi-disciplinary perspective for CPS security (going
beyond classical IT security). For example, while classical intrusion detection systems monitor
purely cyber events (network packets, operating system information, etc.), early CPS papers
bringing in control theory elements [1908] suggested that intrusion detection systems for CPSs
could also monitor the physical evolution of the system and then check it against a model of
the expected dynamics as a way to improve attack detection.
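A minimal sketch of this idea in code: each new reading is compared against the prediction of a model of the expected dynamics, and an alarm is raised when the residual is too large. The scalar plant model x[k] ≈ a·x[k−1] + u[k−1], its coefficient, and the alarm threshold are all hypothetical choices for illustration.

```python
def physics_based_detector(measurements, inputs, a=0.95, tau=0.5):
    """Flag samples whose deviation (residual) from the model prediction
    x[k] ~ a * x[k-1] + u[k-1] exceeds the threshold tau."""
    alarms = []
    for k in range(1, len(measurements)):
        predicted = a * measurements[k - 1] + inputs[k - 1]
        residual = abs(measurements[k] - predicted)
        alarms.append(residual > tau)
    return alarms

# A trajectory consistent with the model raises no alarm; an injected
# spike in the last sample does.
readings = [1.0, 0.95, 0.9025, 5.0]           # last value is falsified
print(physics_based_detector(readings, [0.0, 0.0, 0.0, 0.0]))
```

A purely cyber intrusion detection system would see nothing unusual here; the alarm fires only because the reported value is physically inconsistent with the modelled dynamics.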
CPS is related to other popular terms including the Internet of Things (IoT), Industry 4.0,


or the Industrial Internet of Things, but as pointed out by Edward Lee, the term “CPS” is
more foundational and durable than all of these, because it does not directly reference either
implementation approaches (e.g., “Internet” in IoT) or particular applications (e.g., “Industry”
in Industry 4.0). It focuses instead on the fundamental intellectual problem of conjoining the
engineering traditions of the cyber and physical worlds [1906].
The rest of this section is organised as follows: in Section 21.1.1, we introduce general proper-
ties of CPS, then in Section 21.1.2, we discuss how physical systems have been traditionally
protected from accidents and failures, and how these protections are not enough to protect
the system against cyber-attacks. We finalise this section by discussing the security and
privacy risks in CPSs along with summarising some of the most important real-world attacks
on control systems in Section 21.1.3.

21.1.1 Characteristics of CPS


CPSs embody several aspects of embedded systems, real-time systems, (wired and wireless)
networking, and control theory.
Embedded Systems: One of the most general characteristics of CPSs is that, because several
of the computers interfacing directly with the physical world (sensors, controllers, or actuators)
perform only a few specific actions, they do not need the general computing power of classical
computers—or even mobile systems—and therefore they tend to have limited resources. Some
of these embedded systems do not even run operating systems, but rather run only on firmware,
which is a specific class of software that provides low-level control of the device hardware;
devices without an operating system are also known as bare metal systems. Even when
embedded systems have an operating system, they often run a stripped-down version to
concentrate on the minimal tools necessary for the platform.
Real-Time Systems: For safety-critical systems, the time in which computations are per-
formed is important in order to ensure the correctness of the system [1918]. Real-time pro-
gramming languages can help developers specify timing requirements for their systems, and
a Real-Time Operating System (RTOS) guarantees the time to accept and complete a task from
an application [1919].
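A classic result from this literature is the Liu and Layland utilisation bound for rate-monotonic scheduling of periodic tasks: n tasks are guaranteed to meet their deadlines if total CPU utilisation does not exceed n(2^(1/n) − 1). A small sketch of this sufficient (not necessary) test, with a made-up task set:

```python
def rm_schedulable(tasks):
    """Liu & Layland schedulability test for rate-monotonic scheduling.
    tasks is a list of (worst_case_execution_time, period) pairs."""
    n = len(tasks)
    utilisation = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1.0 / n) - 1)   # ~0.78 for n=3, ->ln(2) as n grows
    return utilisation <= bound

# Three hypothetical periodic tasks (execution time, period) in milliseconds.
print(rm_schedulable([(1.0, 8.0), (2.0, 16.0), (4.0, 32.0)]))
```

The test is conservative: task sets above the bound may still be schedulable, but those below it always are.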
Network Protocols: Another characteristic of CPSs is that these embedded systems com-
municate with each other, increasingly over IP-compatible networks. While many critical
infrastructures such as power systems have used serial communications to monitor remote
operations in their SCADA systems, it is only in the past two decades that the information
exchange between different parts of the system has migrated from serial communications
to IP-compatible networks. For example, the serial communications protocol Modbus was
released by Modicon in 1979, and subsequent serial protocols with more capabilities included
IEC 60870-5-101 and DNP3 in the 1990s. All these serial protocols were later adapted to
support IP networks in the late 1990s and early 2000s with standards such as Modbus/TCP,
and IEC 60870-5-104 [1920, 1921].
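To give a concrete flavour of these serial protocols, the sketch below computes the CRC-16 checksum that terminates every Modbus RTU frame (initial value 0xFFFF, reflected polynomial 0xA001). This is a minimal illustration of one field of the frame, not a full protocol implementation.

```python
def crc16_modbus(data: bytes) -> int:
    """CRC-16 as used by Modbus RTU: init 0xFFFF, reflected poly 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            lsb = crc & 1
            crc >>= 1
            if lsb:
                crc ^= 0xA001
    return crc

# The standard check value for the ASCII string "123456789" is 0x4B37.
assert crc16_modbus(b"123456789") == 0x4B37
```

Notably, when Modbus was adapted to IP networks as Modbus/TCP, this CRC field was dropped, since TCP provides its own integrity checking; note also that a CRC protects only against transmission errors, not against a deliberate attacker, who can simply recompute it.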
Wireless: While most of the long-distance communications are done over wired networks,
wireless networks are also a common characteristic of CPSs. Wireless communications for
embedded systems attracted significant attention from the research community in the early
2000s in the form of sensor networks. The challenge here is to build networks on top of low-
powered and lossy wireless links, where traditional concepts for routing like the “hop distance”
to a destination are no longer applicable, and other link quality metrics are more reliable, e.g.,


the expected number of times a packet has to be sent before a one-hop transmission is
successful. While most of the research on wireless sensor networks was done in abstract
scenarios, one of the first real-world successful applications of these technologies was in
large process control systems with the advent of WirelessHART, ISA100, and ZigBee [1922,
1923]. These three communications technologies were developed on top of the IEEE 802.15.4
standard, whose original version defined frame sizes so small that they could not carry the
header of IPv6 packets. Since Internet-connected embedded systems are expected to grow
to billions of devices in the coming years, vendors and standards organisations see the need to
create embedded devices compatible with IPv6. To be able to send IPv6 packets in wireless
standards, several efforts tried to tailor IPv6 to embedded networks. Most notably, the Internet
Engineering Task Force (IETF) launched the 6LoWPAN effort, originally to define a standard to
send IPv6 packets on top of IEEE 802.15.4 networks, and later to serve as an adaptation layer
for other embedded technologies. Other popular IETF efforts include the RPL routing protocol
for IPv6 sensor networks, and CoAP for application-layer embedded communications [1924].
In the consumer IoT space some popular embedded wireless protocols include Bluetooth,
Bluetooth Low Energy (BLE), ZigBee, and Z-Wave [1925, 1926].
Control: Finally, most CPSs observe and attempt to control variables in the physical world.
Feedback control systems have existed for over two centuries, including technologies like
the steam governor, which was introduced in 1788. Most of the literature in control theory
attempts to model a physical process with differential equations and then design a controller
that satisfies a set of desired properties such as stability and efficiency. Control systems
were initially designed with analogue sensing and analogue control, meaning that the control
logic was implemented in an electrical circuit, including a panel of relays, which usually
encoded ladder logic controls. Analogue systems also allowed the seamless integration of
control signals into a continuous-time physical process. The introduction of digital electronics
and the microprocessor led to work on discrete-time control [1927], as microprocessors
and computers cannot control a system in continuous time because sensing and actuation
signals have to be sampled at discrete-time intervals. More recently, the use of computer
networks allowed digital controllers to be further away from the sensors and actuators (e.g.,
pumps, valves, etc.), and this originated the field of networked control systems [1928].
Another recent attempt to combine the traditional models of physical systems (like differential
equations) and computational models (like finite-state machines) is encapsulated in the field
of hybrid systems [1929]. Hybrid systems played a fundamental role in the motivation towards
creating a CPS research program, as they were an example of how combining models of
computation and models of physical systems can generate new theories that enable us to
reason about the properties of cyber- and physical-controlled systems.
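The shift to discrete-time control described above can be sketched with a sampled proportional controller driving a hypothetical scalar first-order plant; all coefficients here are illustrative.

```python
# Plant: x[k+1] = a*x[k] + b*u[k]; controller: u[k] = K*(setpoint - x[k]),
# computed once per sampling interval rather than continuously.
a, b, K, setpoint = 0.9, 0.1, 4.0, 1.0
x = 0.0
for _ in range(200):
    u = K * (setpoint - x)    # control signal computed at each sample
    x = a * x + b * u         # plant evolves between samples

# Proportional-only control settles at b*K/(1 - a + b*K) = 0.8, not at the
# setpoint 1.0 -- the steady-state error that integral action removes.
print(round(x, 3))
```

Even this toy loop exposes the core discrete-time issues: the controller only sees sampled values, and the closed-loop behaviour depends on the gain and the sampled dynamics.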
Having discussed these general characteristics of CPSs, one caveat is that CPSs are diverse,
and they include modern vehicles, medical devices, and industrial systems, all with differ-
ent standards, requirements, communication technologies, and time constraints. Therefore,
the general characteristics we associate with CPSs might not hold true in all systems or
implementations.
Before we discuss cyber security problems, we describe how physical systems operating
under automatic control systems have been protected from accidents and natural failures, and
how these protections against non-malicious adversaries are not enough against strategic
attackers (i.e., attackers that know that these protections are in place and try to either bypass
them or abuse them).


21.1.2 Protections Against Natural Events and Accidents


Failures in the control equipment of physical infrastructures can cause irreparable harm
to people, the environment, and other physical infrastructures. Therefore, engineers have
developed a variety of protections against accidents and natural causes, including safety
systems, protection, fault-detection, and robustness.

[Figure omitted: concentric layers of protection around the process design, escalating from
basic controls, process alarms and operator supervision, to critical alarms and manual
intervention, to automatic action by a Safety Interlock System or Emergency Shutdown, to
physical protection (relief devices, dikes), and finally to plant and community emergency
response.]

Figure 21.2: Layers of protection for safety-critical ICS.

Safety: The basic principle recommended by the general safety standard for control systems
(IEC 61508) is to obtain requirements from a hazard and risk analysis including the likelihood
of a given failure, and the consequence of the failure, and then design the system so that the
safety requirements are met when all causes of failure are taken into account. This generic
standard has served as the basis for many other standards in specific industries, for example,
the process industry (refineries, chemical systems, etc.) use the IEC 61511 standard to design
a Safety Instrumented System (SIS). The goal of a SIS is to prevent an accident by, e.g., closing
a fuel valve whenever a high-pressure sensor raises an alarm. A more general defence-in-depth
safety analysis uses Layers of Protection [1930], where hazards are mitigated by a set of layers
starting from (1) basic low priority alarms sent to a monitoring station, to (2) the activation of
SIS systems, to (3) mitigation safeguards such as physical protection systems (e.g., dikes) and
(4) organisational response protocols for a plant emergency response/evacuation. Figure 21.2
illustrates these safety layers of protection.
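The escalation across these layers can be caricatured as a simple lookup; the pressure thresholds below are purely illustrative and are not taken from IEC 61508 or IEC 61511.

```python
def protection_layer(pressure_bar: float) -> str:
    """Map a process reading to the (illustrative) layer that responds."""
    if pressure_bar < 6.0:
        return "basic control: normal operation"
    if pressure_bar < 8.0:
        return "process alarm: operator supervision and manual intervention"
    if pressure_bar < 10.0:
        return "SIS trip: safety interlock closes the fuel valve"
    return "physical protection: relief device / containment"

for p in (3.0, 7.0, 9.0, 12.0):
    print(p, "->", protection_layer(p))
```

The point of the layered design is that each layer only has to act when the one below it has failed, so the probability of reaching the outermost layers is kept very low.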
Protection: A related concept to safety is that of protection in electric power grids. These
protection systems include:
• Protection of Generators: when the frequency of the system is too low or too high, the
generator will be automatically disconnected from the power grid to prevent permanent
damage to the generator.
• Under Frequency Load Shedding (UFLS): if the frequency of the power grid is too low,
controlled load shedding will be activated. This disconnection of portions of the electric
distribution system is done in a controlled manner, while avoiding outages in safety-


critical loads like hospitals. UFLS is activated in an effort to increase the frequency of
the power grid, and prevent generators from being disconnected.
• Overcurrent Protection: if the current in a line is too high, a protection relay will be
triggered, opening the line, and preventing damage to equipment on each side of the
lines.
• Over/Under Voltage Protection: if the voltage of a bus is too low or too high, a voltage
relay will be triggered.
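A staged UFLS scheme like the one above can be sketched as a lookup; the frequency thresholds and shed fractions are made-up illustrative values (real settings come from regional reliability rules).

```python
# Each stage: (frequency threshold in Hz, fraction of load to shed).
UFLS_STAGES = ((59.3, 0.10), (59.0, 0.15), (58.7, 0.20))  # illustrative

def ufls_shed_fraction(freq_hz: float) -> float:
    """Total fraction of (non-safety-critical) load shed at this frequency.

    Every stage whose threshold the frequency has fallen below contributes
    its fraction, so deeper frequency dips shed progressively more load."""
    return sum(frac for threshold, frac in UFLS_STAGES if freq_hz < threshold)

print(ufls_shed_fraction(59.1))   # one stage triggered
print(ufls_shed_fraction(58.5))   # all three stages triggered
```

The staged design is what makes the disconnection "controlled": load is removed in small increments until generation and demand rebalance, rather than all at once.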
Reliability: While safety and protection systems try to prevent accidents, other approaches
try to maintain operations even after failures in the system have occurred. For example, the
electric system is designed and operated to satisfy the so-called N-1 security criterion, which
means that the system could lose any one of its N components (such as one generator,
substation, or transmission line) and continue operating with the resulting transients dying
out to result in a satisfactory new steady-state operating condition, meaning that the reliable
delivery of electric power will continue.
Fault Tolerance: A similar, but data-driven approach to detect and prevent failures falls under
the umbrella of Fault Detection, Isolation, and Reconfiguration (FDIR) [1931]. Anomalies are
detected using either a model-based detection system, or a purely data-driven system; this
part of the process is also known as Bad Data Detection. Isolation is the process of identifying
which device is the source of the anomaly, and reconfiguration is the process of recovering
from the fault, usually by removing the faulty sensor (if there is enough sensor redundancy in
the system).
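A toy version of the FDIR loop with redundant sensors: the reading farthest from the median is flagged (detection and isolation), and the estimate is rebuilt from the remaining sensors (reconfiguration). The threshold and the median heuristic are illustrative simplifications of real FDIR schemes.

```python
def fdir(readings, tol=0.5):
    """Detect a faulty sensor, isolate it, and reconfigure the estimate."""
    median = sorted(readings)[len(readings) // 2]
    deviations = [abs(r - median) for r in readings]
    worst = max(range(len(readings)), key=deviations.__getitem__)
    if deviations[worst] > tol:                  # detection
        rest = [r for i, r in enumerate(readings) if i != worst]
        return worst, sum(rest) / len(rest)      # isolation + reconfiguration
    return None, sum(readings) / len(readings)   # no fault: fuse everything

print(fdir([10.1, 10.0, 14.2]))   # third sensor flagged, estimate from the rest
```

Note that this only works while there is enough redundancy: with a single sensor there is nothing to compare against, and nothing to fall back on.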
Robust Control: Finally, another related concept is robust control [1932]. Robust control deals
with the problem of uncertainty in the operation of a control system. These sources of unknown
operating conditions can come from the environment (e.g., gusts of wind in the operation of
planes), sensor noise, dynamics of the system not modelled by the engineers, or degradation
of system components with time. Robust control systems usually take the envelope of least
favourable operating conditions, and then design control algorithms so that the system
operates safely, even in the worst-case uncertainty.
These mechanisms are not sufficient to provide security: Before CPS security was a main-
stream field, there was a lot of confusion about whether safety, protection, fault-tolerance, and
robust control were enough to protect CPSs from cyber-attacks. However, as argued over a
decade ago [1909], these protection systems generally assume independent, non-malicious
failures, and in security, incorrect model assumptions are the easiest way for the adversary to
bypass any protection. Since then, there have been several examples that show why these
mechanisms do not provide security. For example Liu et al. [1933] showed how fault-detection
(bad data detection) algorithms in the power grid can be bypassed by an adversary that sends
incorrect data that is consistent with plausible power grid configurations but, at the same time,
deviates enough from the real values to cause problems for the system. A similar example
for dynamic systems (systems with a “time” component) considers stealthy attacks [1934].
These are attacks that inject small false data in sensors so that the fault-detection system
does not identify them as anomalies but, over a long period of time, these attacks can drive the
system to dangerous operating conditions. Similarly, the N-1 security criterion in the electric
power grid assumes that if there is a failure, all protection equipment will react as configured,
but an attacker can change the configuration of protection equipment in the power grid. In
such a case, the outcome of an N-1 failure in the power grid will be completely unexpected,
as equipment will react in ways that were unanticipated by the operators of the power grid,


leading to potential cascading failures in the bulk power system. Finally, in Section 21.1.3.1, we
will describe how real-world attacks are starting to target some of these protections against
accidents; for example, the Triton malware specifically targeted safety systems in a process
control system.
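The essence of the bad-data-detection bypass described above can be reproduced for a toy DC state-estimation problem with a fixed, illustrative 3×2 measurement matrix H: crude bad data inflates the detector's residual, but any attack vector of the form a = Hc leaves the residual untouched while still shifting the estimated state by c.

```python
# Toy state estimation: z = H x, three measurements of a 2-dimensional state.
H = ((1.0, 0.0), (0.0, 1.0), (1.0, 1.0))

def estimate(z):
    """Least-squares estimate; (H^T H)^-1 = [[2,-1],[-1,2]]/3 for this H."""
    s0 = sum(h[0] * zi for h, zi in zip(H, z))
    s1 = sum(h[1] * zi for h, zi in zip(H, z))
    return ((2 * s0 - s1) / 3.0, (2 * s1 - s0) / 3.0)

def residual(z):
    """Largest gap between a measurement and its value under the estimate."""
    x = estimate(z)
    return max(abs(zi - (h[0] * x[0] + h[1] * x[1])) for h, zi in zip(H, z))

z = [1.0, 2.0, 3.0]                    # consistent with true state x = (1, 2)
naive = [1.0, 2.0, 10.0]               # crude bad data: residual jumps
c = (0.5, -0.3)                        # attacker's desired state shift
stealthy = [zi + h[0]*c[0] + h[1]*c[1] for h, zi in zip(H, z)]   # a = H c
```

Here `residual(naive)` is large and would be flagged, while `residual(stealthy)` stays essentially zero even though the estimated state has silently moved from (1, 2) to (1.5, 1.7).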
Safety vs. Security: The addition of new security defences may pose safety concerns; for
example, a power plant was shut down because a computer rebooted after a patch [1935].
Software updates and patching might violate safety certifications, and preventing unauthorised
users from accessing a CPS might also prevent first responders from accessing the system
in the case of an emergency (e.g., paramedics might need access to a medical device that
prevents unauthorised connections). Security solutions should take these CPS safety concerns
into account when designing and deploying new security mechanisms.

21.1.3 Security and Privacy Concerns


CPSs are at the core of health-care devices, energy systems, weapons systems, and transporta-
tion management. Industrial Control Systems, in particular, perform vital functions
in critical national infrastructures, such as electric power distribution, oil and natural gas
distribution, water and waste-water treatment, and intelligent transportation systems. The
disruption of these CPSs could have a significant impact on public health and safety, and
lead to large economic losses.
For example, attacks on the power grid can cause blackouts, leading to interdependent
cascading effects in other vital critical infrastructures such as computer networks, medical
systems, or water systems creating potential catastrophic economic and safety effects in our
society [1936]. Attacks on ground vehicles can create highway accidents [1937], attacks on
GPS systems can mislead navigation systems and make drivers reach a destination desired by
the attacker [1938], and attacks on consumer drones can let attackers steal them, cause accidents,
or surreptitiously turn on cameras and microphones to monitor victims [1939].

[Figure omitted: a controller receives measurements from sensors observing a physical
process and sends commands to actuators; the controller also communicates with a
supervision/configuration device.]

Figure 21.3: General Architecture of a CPS.


21.1.3.1 Attacks Against CPSs

In general, a CPS has a physical process under its control, a set of sensors that report the
state of the process to a controller, which in turn sends control signals to actuators (e.g., a
valve) to maintain the system in a desired state. The controller often communicates with a
supervisory and/or configuration device (e.g., a SCADA system in the power grid, or a medical
device programmer) which can monitor the system or change the settings of the controller.
This general architecture is illustrated in Figure 21.3.
Attacks on CPSs can happen at any point in the general architecture, as illustrated in Figure 21.4,
which considers eight attack points.

[Figure omitted: the architecture of Figure 21.3 annotated with eight attack points: (1) the
sensors, (2) the sensor-to-controller link, (3) the controller, (4) the controller-to-actuator link,
(5) the actuators, (6) the physical process, (7) the links to the supervision/configuration
device, and (8) the supervision/configuration device itself.]

Figure 21.4: Attack Points in a CPS.

1. Attack 1 represents an attacker who has compromised a sensor (e.g., if the sensor data
is unauthenticated or if the attacker has the key material for the sensors) and injects
false sensor signals, causing the control logic of the system to act on malicious data.
An example of this type of attack is considered by Huang et al. [1940].
2. Attack 2 represents an attacker in the communication path between the sensor and the
controller, who can delay or even completely block the information from the sensors
to the controller, so the controller loses observability of the system (loss of view), thus
causing it to operate with stale data. Examples of these attacks include denial-of-service
attacks on sensors [1941] and stale data attacks [1942].
3. Attack 3 represents an attacker who has compromised the controller and sends incorrect
control signals to the actuators. An example of this attack is the threat model considered
by McLaughlin [1943].
4. Attack 4 represents an attacker who can delay or block any control command, thus
causing a denial of control to the system. This attack has been considered as a denial-
of-service to the actuators [1941].
5. Attack 5 represents an attacker who can compromise the actuators and execute a
control action that is different from what the controller intended. Notice that this attack is
different from one that directly compromises the controller, as it can lead to zero-dynamics
attacks. These types of attacks are considered by Teixeira et al. [1944].


6. Attack 6 represents an attacker who can physically attack the system (e.g., physically
destroying part of the infrastructure and combining this with a cyber-attack). This type
of joint cyber and physical attack has been considered by Amin et al. [1945].
7. Attack 7 represents an attacker who can delay or block communications to and from the
supervisory control system or configuration devices. This attack has been considered in
the context of SCADA systems [1946].
8. Attack 8 represents an attacker who can compromise or impersonate the SCADA system
or the configuration devices, and send malicious control or configuration changes to
the controller. These types of attacks have been illustrated by the attacks on the power
grid in Ukraine where the attackers compromised computers in the control room of the
SCADA system [1947] and attacks where the configuration device of medical devices
has been compromised [1948].
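Attacks 1 and 2 motivate authenticating sensor data and checking its freshness. A sketch using an HMAC over the reading and a timestamp follows; the key, message layout, function names, and one-second freshness window are all illustrative choices, not a standard.

```python
import hashlib
import hmac

KEY = b"demo-shared-key"   # illustrative; real systems need key management

def seal(value, ts):
    """Sensor side: bind the reading to a timestamp with an HMAC tag."""
    payload = f"{value}|{ts}".encode()
    return payload, hmac.new(KEY, payload, hashlib.sha256).hexdigest()

def accept(payload, tag, now, max_age=1.0):
    """Controller side: reject forged (Attack 1) or stale (Attack 2) data."""
    expected = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return False                        # integrity check failed
    ts = float(payload.rsplit(b"|", 1)[1].decode())
    return now - ts <= max_age              # freshness check
```

In practice `now` would come from the controller's clock (e.g., `time.time()`); because the timestamp is covered by the tag, a replayed or delayed message verifies but fails the freshness check, while a modified reading fails the integrity check.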
While traditionally most of the considered attacks on CPSs have been software-based, another
property of CPSs is that the integrity of these systems can be compromised even without a
computer-based exploit in what has been referred to as transduction attacks [1949] (these
attacks represent a physical way to inject false signals, as covered by Attack 1 in Figure 21.4).
By targeting the way sensors capture real-world data, the attacker can inject a false sensor
reading or even a false actuation action, by manipulating the physical environment around
the sensor [1949, 1950]. For example, attackers can use speakers to affect the gyroscope of
a drone [1951], exploit unintentional receiving antennas in the wires connecting sensors to
controllers [1952], use intentional electromagnetic interference to cause a servo (an actuator)
to follow the attacker’s commands [1952], or inject inaudible voice commands to digital
assistants [1953].
In addition to security and safety-related problems, CPSs can also have profound privacy
implications unanticipated by designers of new systems. Warren and Brandeis stated in their
seminal 1890 essay The right to privacy [132] that they saw a growing threat from recent inven-
tions, like “instantaneous photographs” that allowed people to be unknowingly photographed,
and new media industries, such as newspapers, that would publish photographs without their
subjects’ consent. The rise of CPS technologies in general, and consumer IoT in particular,
are similarly challenging cultural assumptions about privacy.
CPS devices can collect physical data of diverse human activities such as electricity con-
sumption, location information, driving habits, and biosensor data at unprecedented levels of
granularity. In addition, the passive manner of collection leaves people generally unaware of
how much information about them is being gathered. Furthermore, people are largely unaware
that such collection exposes them to possible surveillance or criminal targeting, as the data
collected by corporations can be obtained by other actors through a variety of legal or illegal
means. For example, automobile manufacturers are remotely collecting a wide variety of
driving history data from cars in an effort to increase the reliability of their products. Data
known to be collected by some manufacturers include speed, odometer information, cabin
temperature, outside temperature, battery status, and range. This paints a very detailed map
of driving habits that can be exploited by manufacturers, retailers, advertisers, auto insurers,
law enforcement, and stalkers, to name just a few.
Having presented the general risks and potential attacks to CPSs we finalise our first section by
describing some of the most important real-world attacks against CPSs launched by malicious
attackers.


21.1.3.2 High-Profile, Real-World Attacks Against CPSs

Control systems have been at the core of critical infrastructures, manufacturing and industrial
plants for decades, and yet, there have been few confirmed cases of cyber-attacks (here we
focus on attacks from malicious adversaries as opposed to attacks created by researchers
for illustration purposes).
Non-targeted attacks are incidents caused by the same attacks that classical IT computers
may suffer, such as the Slammer worm, which indiscriminately targeted Windows servers
but inadvertently infected the Davis-Besse nuclear power plant [1954], affecting the ability
of engineers to monitor the state of the system. Another non-targeted attack example was a
controller being used to send spam in a water filtering plant [1955].
Targeted attacks are those where adversaries know that they are targeting a CPS, and there-
fore, tailor their attack strategy with the aim of leveraging a specific CPS property. We look in
particular at attacks that had an effect in the physical world, and do not focus on attacks used
to do reconnaissance of CPSs (such as Havex or BlackEnergy [1956]).
The first publicly reported attack on a SCADA system was the 2000 attack on Maroochy
Shire Council's sewage control system¹ in Queensland, Australia [1958], where a contractor
who wanted to be hired for a permanent position maintaining the system used commercially
available radios and stolen SCADA software to make his laptop appear as a pumping station.
During a 3-month period the attacker caused more than 750,000 gallons of untreated sewage
water to be released into parks, rivers, and hotel grounds causing loss of marine life, and
jeopardising public health. The incident cost the city council $176,000 in repairs, monitor-
ing, clean-ups and extra security, and the contractor company spent $500,000 due to the
incident [1959].
In the two decades since the Maroochy Shire attack there have been other confirmed attacks
on CPSs [1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968]. However, no other attack has
demonstrated the new, sophisticated threats that CPSs face like the Stuxnet worm (discovered
in 2010) targeting the nuclear enrichment programme in Natanz, Iran [701]. Stuxnet intercepted
requests to read, write, and locate blocks on a Programmable Logic Controller (PLC). By
intercepting these requests, Stuxnet was able to modify the data sent to, and returned from, the
PLC, without the knowledge of the PLC operator. The more popular attack variant of Stuxnet
consisted of sending incorrect rotation speeds to the motors powering the centrifuges enriching
uranium, causing the centrifuges to break down and need replacement. As a result, centrifuge
equipment had to be replaced regularly, slowing down the rate at which the Natanz plant
could produce enriched uranium.
Two other high-profile confirmed attacks on CPSs were the December 2015 and 2016 attacks
against the Ukrainian power grid [1969, 1970]. These attacks caused power outages and
clearly illustrate the evolution of attack vectors. While the attacks in 2015 leveraged a remote
access program that the attackers had on computers in the SCADA systems of the distribution
power companies, and as such a human was involved in sending the malicious commands,
the attacks in 2016 were more automated thanks to the Industroyer malware [1971], which had
knowledge of the industrial control protocols these machines use to communicate and could
automatically craft malicious packets.
The most recent example in the arms race of malware creation targeting control systems
¹ There are prior reported attacks on control systems [1957], but there is no public information
corroborating these incidents, and the veracity of some earlier attacks has been questioned.


is the Triton malware [1972] (discovered in 2017 in the Middle-East) which targeted safety
systems in industrial control systems. It was responsible for at least one process shutting
down. Stuxnet, Industroyer, and Triton demonstrate a clear arms race in CPS attacks believed
to be state sponsored. These attacks will have a profound impact on the way cyber-conflicts
evolve in the future and will play an essential part in how wars may be waged, as we discuss
in the last section of this chapter.

21.2 CROSSCUTTING SECURITY


[1973, 1974, 1975]
The first step in securing a CPS is to identify the risks that these systems may have, and then
prioritise how to address these risks with a defence-in-depth approach. Risk assessment
consists of identifying assets in a CPS [1976], understanding their security exposure, and
implementing countermeasures to reduce the risks to acceptable levels [1917, 1977, 1978, 1979,
1980]. Penetration testing is perhaps the most common way to understand the level of risk
of the system and can be used to design a vulnerability management and patching strategy.
The supply chain is also another risk factor, discussed further in the Risk Management &
Governance Knowledge Area (Chapter 2).
One new area in CPSs is to identify the actuators or sensors that give the attacker maximum
controllability of the CPS if they are compromised [1934, 1981, 1982, 1983, 1984], and then
prioritise the protection of these devices.
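A common way to prioritise is to rank assets by a likelihood × impact score; the asset names and scores below are entirely hypothetical, for illustration only.

```python
# (asset, likelihood of compromise 0-1, impact 1-10) -- hypothetical values
assets = [
    ("engineering workstation",    0.8, 6),
    ("historian server",           0.7, 4),
    ("turbine controller PLC",     0.3, 10),
    ("safety instrumented system", 0.1, 10),
]

# Rank by risk score, highest first, to decide where to spend defences.
ranked = sorted(assets, key=lambda a: a[1] * a[2], reverse=True)
for name, likelihood, impact in ranked:
    print(f"{name:28s} risk = {likelihood * impact:.1f}")
```

Even this crude model shows why CPS risk assessment differs from IT practice: a high-impact, low-likelihood device (like the safety system) may still deserve protection out of proportion to its score, since its compromise is what enables catastrophic physical consequences.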
Once the risks have been identified, a general defence-in-depth approach includes prevention,
detection, and mitigation mechanisms. In this section we look at crosscutting security efforts
to prevent, detect, and mitigate attacks, and the next section will look at specific CPS domains
such as the power grid and intelligent transportation systems. This section is divided into
three parts: (1) preventing attacks (Section 21.2.1), (2) detecting attacks (Section 21.2.2), and
(3) mitigating attacks (Section 21.2.3).

21.2.1 Preventing Attacks


The classical way to protect the first computer-based control systems was to have them
isolated from the Internet, and from the corporate networks of the asset owners. As business
practices changed, and efficiency reasons created more interconnections of control systems
with other information technology networks, the concept of sub-network zone isolation was
adopted by several CPS industries, most notably in the nuclear energy sector. This network
isolation is usually implemented with the help of firewalls and data diodes [1985].
On the other hand, there are several ways to break the air gap, including insider attacks, or
adding new connectivity to the network via mobile devices. Therefore, to prevent attacks in
modern CPSs, designers and developers have to follow the same best security practices as
classical IT systems; i.e., they need to follow a secure development life cycle to minimise soft-
ware vulnerabilities, implement access control mechanisms, and provide strong cryptographic
protections along with a secure key management system [1986].
While the best security practices of classical IT systems can give the necessary mechanisms
for the security of control systems, these mechanisms alone are not sufficient for the defence-
in-depth of CPSs. In this section we will discuss how, by understanding the interactions of the


CPS with the physical world, we should be able to:


1. better understand the consequences of an attack.
2. design novel attack-detection algorithms.
3. design new attack-resilient algorithms and architectures.
In the rest of this subsection we will focus on illustrating the challenges for implementing
classical IT security best practices in CPSs, including the fact that several CPSs are composed
of legacy systems, are operated by embedded devices with limited resources, and face new
vulnerabilities such as analogue attacks.
Securing Legacy Systems: The life cycle of CPS devices can be an order of magnitude larger
than regular computing servers, desktops, or mobile systems. Consumers expect that their
cars last longer than their laptops, hospitals expect medical equipment to last over a decade,
the assets of most industrial control systems last for at least 25 years [1987], and most of
these devices will not be replaced until they are fully depreciated. Some of these devices were
designed and deployed assuming a trusted environment that no longer exists. In addition,
even if these devices were deployed with security mechanisms at the time, new vulnerabilities
will eventually emerge and if the devices are no longer supported by the manufacturer, then
they will not be patched. For example, after the Heartbleed vulnerability was discovered, major
manufacturers pushed updates to mitigate this problem; however most embedded devices
monitoring or controlling the physical world will not be patched (patching some safety-critical
systems might even violate their safety certification). So even if a vendor used OpenSSL to
create a secure communication channel between CPS devices originally, they also need to
consider supporting the device over a long-time frame.
Therefore, to prevent attacks in CPSs we have to deal with (1) designing systems where
security can be continuously updated, and (2) retrofitting security solutions for existing legacy
systems [1988].
Some devices cannot be updated with these new secure standards, and therefore a pop-
ular way to add security to legacy networks is to add a bump-in-the-wire [1989]. Typically
a bump-in-the-wire is a network appliance that is used to add integrity, authentication, and
confidentiality to network packets exchanged between legacy devices. The legacy device
thus sends unencrypted and unauthenticated packets and the network appliance will tun-
nel them over a secure channel to another bump-in-the-wire system at the other end of the
communication channel that then removes the security protections and gives the insecure
packet to the final destination. Note that a bump-in-the-wire can only protect the system from
untrusted parties on a network, but if the end-point is compromised, a bump-in-the-wire won’t
be effective.
A similar concept has been proposed for wireless devices like implantable medical devices.
Because some of these wireless devices communicate over insecure channels, attackers can
listen or inject malicious packets. To prevent this, a wireless shield [1990] can be used near the
vulnerable devices. The wireless shield will jam any communication attempt to the vulnerable
devices except the ones from devices authorised by the owner of the shield. Wireless shields
have also been proposed for other areas, such as protecting the privacy of consumers using
BLE devices [1991]. Because of their disruptive nature, it is not clear if wireless shields will
find practical applications in consumer applications.
Lightweight Security: While several embedded devices support classical cryptography, for
some devices the performance of cryptographic algorithms in terms of energy consumption,


or latency, may not be acceptable [1992]. For symmetric cryptography, NIST has plans for the
standardisation of a portfolio of lightweight cryptographic algorithms [1993] and the current
CAESAR competition for an authenticated-encryption standard is evaluating the performance
of their submissions in resource-constrained devices [1994]. For public-key algorithms, Elliptic
Curve Cryptography generally offers the best balance of performance and security guarantees,
but other lightweight public-key algorithms might be more appropriate depending on the
requirements of the system [1995]. When it comes to exploit mitigation, the solutions are
less clear. Most deeply embedded devices do not have support for data execution prevention,
address space layout randomisation, stack canaries, virtual memory support, or cryptographi-
cally secure random number generators. In addition system-on-chip devices have no way to
expand their memory, and real-time requirements might pose limitations on the use of virtual
memory. However, there are some efforts to give embedded OS better exploit mitigation
tools [1996].
Secure Microkernels: Another OS security approach is to try to formally prove the security
of the kernel. The design of secure operating systems with formal proofs of security is an
effort dating back to the Orange Book [1014]. Because the increasing complexity of code in
monolithic kernels makes it hard to prove that operating systems are free of vulnerabilities,
microkernel architectures that provide a minimal core of the functionality of an operating
system have been on the rise. One example of such a system is the seL4 microkernel, which is
notable because several security properties have been machine-checked with formal proofs of
security [1034]. DARPA’s HACMS program [1997] used this microkernel to build a quadcopter
with strong safety and security guarantees.
Preventing Transduction Attacks: As introduced in the previous section, transduction attacks
represent one of the novel ways in which CPS security is different from classical IT security.
Sensors are transducers that translate a physical signal into an electrical one, but these
sensors sometimes have a coupling between the property they want to measure, and another
analogue signal that can be manipulated by the attacker. For example, sound waves can affect
accelerometers in wearable devices and make them report incorrect movement values [1998],
and radio waves can trick pacemakers into disabling pacing shocks [1999]. Security counter-
measures to prevent these attacks include the addition of better filters in sensors, improved
shielding from external signals, anomaly detection, and sensor fusion [1950]. Some specific
proposals include: drilling holes differently in a circuit board to shift the resonant frequency
out of the range of the sensor, adding physical trenches around boards containing speakers
to reduce mechanical coupling, using microfibre cloths for acoustic isolation, implementing
low-pass filters that cut off coupled signals, and using secure amplifiers that prevent signal
clipping [1949, 1998].

21.2.2 Detecting Attacks


Detecting attacks can be done by observing the internal state of a CPS device, by monitoring the
interaction among devices to spot anomalous activities, or even using out-of-band channels.
In the first category, Remote Attestation is a field that has received significant attention for
detecting malware in embedded systems because they usually do not have strong malware
protections themselves [2000, 2001, 2002, 2003]. Remote attestation relies on the verification
of the current internal state (e.g., RAM) of an untrusted device by a trusted verifier. There
are three variants of remote attestation: software-based attestation, hardware-assisted at-
testation, and hybrid attestation. Software-based attestation does not rely on any special


security hardware in the device, but it has weak security guarantees and usually requires
wireless range between the verifier and the device being checked. In contrast, hardware-based
attestation (e.g., attestation with the support from a TPM, TrustZone or SGX) provides stronger
security, but requires dedicated secure hardware in CPS devices, which in turn increases their
cost and might not be affordable in some low-end embedded systems. Hybrid approaches
attempt to find a middle ground by reducing the secure hardware requirements
while overcoming the security limitations of pure software-based approaches [1865, 2004].
The minimal secure hardware requirements include a secure place to store the secret key,
and safe code that has exclusive access to that key. A challenge for hybrid attestation is
the fact that it needs to be non-interruptible and atomic (it has to run from the beginning
to the end), and the (so far) relatively long (5-7 seconds [1865, 2004]) secure measurement
of embedded memory might not be applicable for safety-critical real-time applications. In
addition to academic work, industry is also developing standards to enhance the security of
embedded systems with minimal silicon requirements. For example, the Trusted Computing
Group (TCG) Device Identifier Composition Engine (DICE) is working on combining simple
hardware capabilities to establish strong identity, attest software, and security policy, and
assist in deploying software updates. We finalise our description of attestation by pointing
out that most of the practical proposals for attestation work for initialisation, but building
practical run-time attestation solutions remains a difficult challenge.
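The challenge-response structure underlying these attestation schemes can be illustrated with a toy sketch. Below, a verifier that knows the device's expected firmware image and a shared key sends a fresh nonce, and the device must return an HMAC over the nonce and its actual memory contents. The key and firmware bytes are invented, and a real hybrid scheme would additionally guarantee that the measurement code runs atomically with exclusive access to the key.

```python
import hmac, hashlib, os

# Illustrative shared key and expected firmware image known to the verifier.
KEY = os.urandom(16)
EXPECTED_FIRMWARE = b"\x01\x02\x03\x04" * 256

def device_attest(memory: bytes, nonce: bytes) -> bytes:
    # Runs inside the device's (assumed) protected measurement code.
    return hmac.new(KEY, nonce + memory, hashlib.sha256).digest()

def verifier_check(report: bytes, nonce: bytes) -> bool:
    expected = hmac.new(KEY, nonce + EXPECTED_FIRMWARE, hashlib.sha256).digest()
    return hmac.compare_digest(report, expected)

nonce = os.urandom(16)                     # fresh nonce prevents replaying old reports
assert verifier_check(device_attest(EXPECTED_FIRMWARE, nonce), nonce)   # clean device passes
infected = EXPECTED_FIRMWARE[:-4] + b"\xde\xad\xbe\xef"                 # malware in memory
assert not verifier_check(device_attest(infected, nonce), nonce)        # detected
```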
Network Intrusion Detection: The second category of solutions for detecting attacks relies
on monitoring the interactions of CPS devices. In contrast with classical IT systems, where
simple Finite-State models of network communications will fail, CPSs exhibit comparatively
simpler network behaviour: servers change less frequently, there is a more stable network
topology, a smaller user population, regular communication patterns, and networks host
a smaller number of protocols. Therefore, intrusion detection systems, anomaly detection
algorithms, and whitelisting access controls are easier to design and deploy than in classical
IT systems [2005]. If the CPS designer can give a specification of the intended behaviour of the
network, then any non-specified traffic can be flagged as an anomaly [2006]. Because most
of the communications in CPS networks are between machines (with no human intervention),
they happen automatically and periodically, and given their regularity, these communication
patterns may be captured by finite state models like Deterministic Finite Automata [2007, 2008]
or via Discrete-Time Markov Chains [2009, 2010]. While network specification is in general
easier in CPS environments when compared to IT, it is still notoriously difficult to maintain.
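To illustrate how such a specification-based monitor can work, the sketch below learns the set of allowed message-to-message transitions (a simple automaton in the spirit of the DFA approaches cited above) from hypothetical attack-free traces, and flags any transition not seen during learning. The message types and traces are invented for illustration.

```python
# Learn allowed (previous message, next message) transitions from
# attack-free traffic traces on a control network.
def learn_transitions(traces):
    allowed = set()
    for trace in traces:
        prev = "START"
        for msg in trace:
            allowed.add((prev, msg))
            prev = msg
    return allowed

# Flag every transition in a new trace that was never observed in training.
def detect(trace, allowed):
    alerts, prev = [], "START"
    for msg in trace:
        if (prev, msg) not in allowed:
            alerts.append((prev, msg))
        prev = msg
    return alerts

normal = [["POLL", "REPLY", "POLL", "REPLY"], ["POLL", "REPLY", "WRITE", "ACK"]]
allowed = learn_transitions(normal)
assert detect(["POLL", "REPLY", "WRITE", "ACK"], allowed) == []
assert detect(["POLL", "WRITE", "WRITE"], allowed) == [("POLL", "WRITE"), ("WRITE", "WRITE")]
```

The regularity of machine-to-machine traffic is what makes this simple model workable in CPS networks, whereas the same approach would drown in false alarms on a general IT network.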
Physics-Based Attack Detection: The major distinction of control systems with respect to
other IT systems is the interaction of the control system with the physical world. In contrast
to work in CPS intrusion detection that focuses on monitoring “cyber” patterns, another line of
work studies how monitoring sensor (and actuation) values from physical observations, and
control signals sent to actuators, can be used to detect attacks; this approach is usually called
physics-based attack detection [1974]. The models of the physical variables in the system
(their correlations in time and space) can be purely data-driven [2011], or based on physical
models of the system [1934]. There are two main classes of physical anomalies: historical
anomalies and physical-law anomalies.
Historical Anomalies: these identify physical configurations we have not seen before. A typical
example is to place limits on the observed behaviour of a variable [2012]. For example, if during
the learning phase, a water level in a tank is always between 1m and 2m, then if the water
level ever goes above or below these values we can raise an alert. Machine learning models
of the historical behaviour of the variables can also capture historical correlations of these
variables. For example, they can capture the fact that when the water level of one tank is high, the


water level of a second tank in the process is always low [2013]. One problem with historical
anomalies is that they might generate a large number of false alarms.
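A minimal version of the limit-based historical detector from the water-tank example might look as follows; the training values and the variable being monitored are illustrative.

```python
# Learn the envelope of a variable during an attack-free training phase.
def learn_limits(training_values):
    return min(training_values), max(training_values)

# Alert whenever an observation falls outside the learned envelope.
def is_anomalous(value, limits):
    lo, hi = limits
    return value < lo or value > hi

training = [1.2, 1.5, 1.9, 1.4, 1.8]   # water level in metres during learning
limits = learn_limits(training)
assert not is_anomalous(1.6, limits)   # inside the historical envelope
assert is_anomalous(2.4, limits)       # above anything seen before -> alert
```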
Physical-Law Anomalies: A complementary approach to historical observations, one that may
have fewer false alarms, is to create models of the physical evolution of the system. For
example, if we have a sensor that monitors the height of a bouncing ball, then we know that
this height follows the differential equations from Newton’s laws of mechanics. Thus, if a
sensor reports a trajectory that is not plausible given the laws of physics, we can immediately
identify that something is not right with the sensor (a fault or an attack). Similarly, the physical
properties of water systems (fluid dynamics) or the power grid (electromagnetic laws) can be
used to create time series models that we can then use to confirm that the control commands
sent to the field were executed correctly and that the information coming from sensors is
consistent with the expected behaviour of the system. For example, if we open an intake valve
we should expect that the water level in the tank should rise, otherwise we may have a problem
with the control, actuator, or the sensor. Models of the physical evolution of the system have
been shown to be better at limiting the short-term impact of stealthy attacks (i.e., attacks
where the attacker creates a malicious signal that is within the margin of error of our physical
models) [2014]. However, if the attack persists for a long time and drives the system to an
unsafe region by carefully selecting a physically plausible trajectory, then historical models
can help in detecting this previously unseen state [2015].
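The intake-valve example can be sketched as a residual test: a simple mass-balance model predicts the next water level, and a measurement that deviates from the prediction by more than a threshold raises an alert. The tank geometry, inflow rate, and threshold below are illustrative values, not taken from any real system.

```python
# Toy mass-balance model: if the intake valve is open, the level rises by
# (inflow / tank area) per time step; otherwise it stays flat.
def predict_level(level, valve_open, dt=1.0, inflow=0.5, area=2.0):
    return level + (inflow / area) * dt if valve_open else level

# Physics-based residual test: alert when measurement and model disagree.
def check(measured, predicted, threshold=0.1):
    return abs(measured - predicted) > threshold   # True -> alert

level = 1.0
predicted = predict_level(level, valve_open=True)  # model expects 1.25 m
assert not check(1.24, predicted)  # sensor agrees with physics: no alert
assert check(1.0, predicted)       # valve open but level flat: sensor, actuator, or control problem
```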
In addition to the physics of the system being controlled, devices (such as actuators) have
dynamics as well, and these physical properties can also be used to monitor the proper
behaviour of devices [2016].
Out-of-band Detection: Another way to passively monitor the physical system is through
out-of-band channels [2017]. For example, Radio Frequency-based Distributed Intrusion Detec-
tion [2018] monitors radio frequency emissions from a power grid substation in order to check
for malicious circuit breaker switching, transformer tap changes, or any activation of
protective relays without a direct request from the SCADA server. The basic idea is to
correlate control commands sent by the SCADA server, with the radio frequency emissions
observed in the substation. A potential drawback with this approach is that attackers can
launch RF attacks mimicking the activation of a variety of electric systems, which can lead to
security analysts losing confidence in the veracity of the alerts.
Active Detection: In addition to passively monitoring a CPS, an intrusion detection system can
actively query devices to detect anomalies in how devices respond to these requests [2019].
In addition to a network query, the intrusion detection system can also send a physical chal-
lenge to change the system’s physical behaviour. This approach is also known as physical
attestation [2013, 2020, 2021], where a control signal is used to alter the physical world, and
in response, it expects to see the changes done in the physical world reflected in the sensor
values. For example, we can send signals to change the network topology of the power grid
to see if the sensors report this expected change [2022], use a change in the field of vision of
a camera to detect hacked surveillance cameras [2023], or use a watermarking signal in a
control algorithm [2024]. The concept of active detection is related to research on moving
target defence applied to cyber-physical systems [2025, 2026, 2027, 2028]. However, both
active detection and moving target defence might impose unnecessary perturbations on a sys-
tem by changing the physical world for security purposes, which can make these techniques
too invasive and costly; the practicality of some of these approaches therefore remains
uncertain.


21.2.3 Mitigating Attacks


Most of the efforts for mitigating faults in CPSs have focused on safety and reliability (the
protection of the system against random and/or independent faults). Attack mitigation is
an extension of safety and reliability protections for when the faults in the systems are not
created at random by nature, but by an adversary.
Attack mitigation is related to the concept of resilient control systems, defined as those
that maintain state awareness and an accepted level of operational normalcy in response to
disturbances, including threats of an unexpected and malicious nature [2029].
There are two main types of mitigating technologies: i) proactive and ii) reactive. Proactive
mitigation considers design choices deployed in the CPS prior to any attack. On the other hand,
reactive responses only take effect once an attack has been detected, and they reconfigure
the system online in order to minimise the impact of the attack. We first describe proactive
approaches.
Conservative Control: One of the first ideas for mitigating the impact of attacks was to operate
the system with enough safety margins so that if an attack ever occurred, it would be harder
for the attacker to reach an unsafe region. One intuitive idea for this type of control algorithm is
to use Model Predictive Control (MPC) to design a control strategy that predicts that an attack
will happen starting at the next time step [1941], and therefore plans an optimal control action
that will attempt to keep the system safe if the attack happens. Operating a CPS conservatively
usually comes at the cost of suboptimal operation and extra costs when the system is not
under attack.
Resilient Estimation: Resilient estimation algorithms attempt to obtain the state of a sys-
tem, even if a subset of sensors is compromised [2030, 2031]. The basic idea is to use the
knowledge of a CPS and the correlations of all sensor values. With enough redundancy in
sensor measurements, a resilient estimation algorithm can reject attempted attacks and still
obtain an accurate state estimate. This idea is similar to error correcting codes in information
theory, where a subset of the bits transmitted can be corrupted, but the error correcting code
reconstructs the original message. The drawback, however, is that not all CPSs will have a
variety of correlated sensors to check the consistency of others, so this approach depends on
the properties of the system.
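A toy illustration of this redundancy argument, assuming several sensors measure the same quantity, is a median-style estimator that tolerates up to f arbitrarily corrupted readings out of 2f+1 sensors; the readings are invented for the example.

```python
# Median-style resilient estimate: with 2f+1 sensors measuring the same
# quantity, up to f arbitrarily corrupted readings cannot move the median
# outside the range of the honest sensors.
def resilient_estimate(readings):
    ordered = sorted(readings)
    return ordered[len(ordered) // 2]

honest = [20.1, 19.9, 20.0, 20.2]   # four honest temperature readings
attacked = honest + [95.0]          # one compromised sensor reports garbage
assert abs(resilient_estimate(attacked) - 20.1) < 0.2   # estimate stays near truth
```

Real resilient estimators exploit correlations across different physical variables rather than identical copies of one measurement, but the error-correction intuition is the same.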
Sensor Fusion: Resilient estimation algorithms usually assume a variety of multi-modal sen-
sors to achieve their security guarantees. This is also the idea behind sensor fusion, where sen-
sors of different types can help “confirm” the measurement of other sensors [2032, 2033, 2034].
A basic example of sensor fusion in automotive systems is to verify that both the LiDAR read-
ings and the camera measurements report consistent observations.
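The automotive example can be sketched as a simple cross-modality consistency check; the tolerance value is illustrative.

```python
# Only trust a distance reading that the other modality confirms.
def fused_distance(lidar_m, camera_m, tolerance_m=2.0):
    if abs(lidar_m - camera_m) > tolerance_m:
        raise ValueError("sensors disagree: possible fault or spoofing attack")
    return (lidar_m + camera_m) / 2   # simple average once both agree

assert fused_distance(30.0, 31.0) == 30.5   # consistent readings are fused
```

An attacker would now need to spoof both the LiDAR and the camera consistently, which raises the cost of the attack.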
Virtual Sensors: When we use physical-laws anomaly detection systems, we have, in effect, a
model of the physical evolution of the system. Therefore, one way to mitigate attacks on the
sensors of a CPS is to use a physical model of the system to come up with the expected sensor
values that can then be provided to the control algorithm [1934, 2015, 2035]. By replacing
a sensor value with its expected value obtained from the system model, we are effectively
controlling a system using open-loop control, which might work in the short-term, but may be
risky as a long-term solution, as no physical model is perfect, and the error between the
real-world and the model simulation can increase over time. Another important consideration
when designing virtual sensors as an attack-response mechanism, is to evaluate the safety of
the system whenever the system is activated due to a false alarm [1934].
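In its simplest form, a virtual sensor is just a switch between the physical sensor and the model prediction, driven by the anomaly detector; the values below are illustrative.

```python
# Use the real sensor while it is trusted; fall back to the model's
# prediction (open-loop control) once the detector flags the sensor.
def select_measurement(sensor_value, model_prediction, sensor_trusted):
    return sensor_value if sensor_trusted else model_prediction

# Under attack the sensor reports a spoofed 0.2 m level; the virtual sensor
# substitutes the model's expected 1.5 m so the controller stays sensible.
assert select_measurement(0.2, 1.5, sensor_trusted=False) == 1.5
assert select_measurement(1.49, 1.5, sensor_trusted=True) == 1.49
```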


Constraining Actuation: A similar principle of operating conservatively is to physically
constrain the actuators of a CPS so that if the attacker ever succeeds in gaining access to the
system, it is restricted in how fast it can change the operation of the system. This approach
can guarantee, for example, the safety of vehicle platooning systems, even when the attacker
has complete control of one of the vehicles [2036].
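One simple realisation of this idea is a rate limiter placed between the controller and the actuator, so that even a malicious command can only move the operating point slowly; the limit value is illustrative.

```python
# Clamp how far any command (benign or malicious) can move the
# actuator in a single control step.
def rate_limited(previous_cmd, requested_cmd, max_step=5.0):
    delta = requested_cmd - previous_cmd
    delta = max(-max_step, min(max_step, delta))
    return previous_cmd + delta

assert rate_limited(50.0, 52.0) == 52.0    # normal small adjustment passes through
assert rate_limited(50.0, 100.0) == 55.0   # attacker's jump is slowed down
```

This buys the detection and response mechanisms time to act before the system can be driven to an unsafe state.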
Inertial Resets: Another idea to mitigate attacks is to reset and diversify the system as
frequently as possible so that attackers are unable to gain persistent control of the sys-
tem [2037, 2038]. The basic idea is that a full software reset of the system will make the
system boot again in a trusted state, eliminating the presence of an attacker. This requires the
system to have a trusted computing base that can boot the system in a secure state where the
malware is not loaded yet. However, turning off a system that is in operation is a potentially
dangerous action, and it is not clear if this proposal will be practical.
Reactive Control Compensation: When sensors or controllers are under attack, new actions
are generated in order to maintain the safety of the system. Inspired by the literature on
fault-tolerant control, one idea is to attempt to estimate the attack signal, and then generate a
compensating action to eliminate it [2039]. The problem with this approach is that it does
not consider strategic adversaries; however game-theoretic approaches can address that
limitation. In game-theoretic models, an attacker compromises a set of control signals u_k^a ∈
R^{m_a} and the defender uses the remaining controllers u_k^d ∈ R^{m_d} to deploy a defence action. The
game between the attacker and the defender can be simultaneous (zero-sum or minimax
game) [2040, 2041, 2042] or sequential (e.g., Stackelberg game) [2043, 2044, 2045]. One of
the challenges with game theory is that, in order to model and prove results, the formulation
needs to be simplified, and in addition, models need to add a number of extra assumptions
that might not hold in practice.
Safe Control Actions: Another reactive approach is to change or even prevent a potentially
malicious control action from acting on the system. The idea of having a High Assurance
Controller (HAC) as a backup to a High Performance Controller (HPC) predates work on CPS
security, and was proposed as a safety mechanism to prevent complex and hard-to-verify
HPCs from driving the system to unsafe states [2046]. A more recent and security-oriented
approach is to use the concept of a reference monitor to check if the control action will
result in any unsafe behaviour before it is allowed to go into the field [1943]. The proposed
approach depends on a controller of controllers (C2), which mediates all control signals sent
by the controller to the physical system. In particular, there are three main properties that C2
attempts to hold: 1) safety (the approach must not introduce new unsafe behaviours, i.e., when
operations are denied the ‘automated’ control over the plant, it should not lead the plant to an
unsafe behaviour); 2) security (mediation guarantees should hold under all attacks allowed by
the threat model); and 3) performance (control systems must meet real-time deadlines while
imposing minimal overhead).
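The mediation idea can be sketched as follows: a monitor (in the spirit of C2, though this toy version is our own construction) forwards a control command only if a one-step model predicts the plant will stay inside its safety envelope, and otherwise substitutes a fallback action. The plant model, limits, and fallback are invented for illustration.

```python
SAFE_MIN, SAFE_MAX = 0.5, 2.5   # allowed water-level range in metres (illustrative)

# Toy one-step plant model: each unit of pump command moves the level 0.3 m.
def predicted_next_level(level, pump_cmd):
    return level + 0.3 * pump_cmd

# Reference monitor: deny any command whose predicted effect is unsafe.
def mediate(level, pump_cmd, fallback_cmd=0.0):
    nxt = predicted_next_level(level, pump_cmd)
    if SAFE_MIN <= nxt <= SAFE_MAX:
        return pump_cmd          # safe: pass through to the field
    return fallback_cmd          # unsafe: deny and substitute a safe action

assert mediate(2.0, 1.0) == 1.0  # predicted 2.3 m is still safe
assert mediate(2.4, 1.0) == 0.0  # predicted 2.7 m would breach the envelope: denied
```

Even this toy version shows the tension between the three properties: the quality of the mediation depends entirely on the fidelity of the one-step model and on the fallback action itself being safe.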
All the security proposals for preventing, detecting, and responding to attacks presented in this
section are generally applicable to CPSs. However, there are unique properties of each CPS
application that can make a difference in how these solutions are implemented. Furthermore,
some unique properties of a particular CPS domain can lead to new solutions (such as
the touch-to-access principle proposed for implantable medical devices [2047]). In the next
section we change focus from general and abstract CPS descriptions, to domain-specific
problems and solutions.


21.3 CPS DOMAINS


[1973, 2048, 2049, 2050, 2051, 2052, 2053, 2054]
Having presented general principles for securing CPSs, in this section we discuss domain-
specific security problems for CPSs. In particular we focus on industrial control systems,
electrical power grids, transportation systems, vehicles, robots, medical devices, and con-
sumer IoT.

21.3.1 Industrial Control Systems


Industrial control systems represent a wide variety of networked information technology
systems connected to the physical world [2055]. Depending on the application, these control
systems are also called Process Control Systems (PCSs) in the chemical industry, or Dis-
tributed Control Systems (DCSs) if the devices used for supervision and control are procured
using a monolithic architecture.
Control systems are usually composed of a set of networked agents, consisting of sensors,
actuators, control processing units such as Programmable Logic Controllers (PLCs), Remote
Terminal Units (RTUs), and communication devices. For example, the oil and gas industry uses
integrated control systems to manage refining operations at plant sites, remotely monitor the
pressure and flow of gas pipelines, and control the flow and pathways of gas transmission.
Water utilities can remotely monitor well levels and control the wells’ pumps; monitor flows,
tank levels, or pressure in storage tanks; monitor pH, turbidity, and chlorine residual; and
control the addition of chemicals to the water.

Figure 21.5: Bottom Layers of Industrial Control Systems [1908].

Control systems have a layered hierarchy [1905], which can be used for network segmentation
and to ensure access control. Figure 21.5 shows an illustration of the lower layers of this
hierarchy.
The top layers operate using mostly traditional Information Technology: computers, operating
systems, and related software. They control the business logistic system, which manages the
basic plant production schedule, material use, shipping and inventory levels, and also plant
performance, and keep data historians for data-driven analytics (e.g., predictive maintenance).
The supervisory control layer is where the Supervisory Control and Data Acquisition (SCADA)
systems and other servers communicate with remote control equipment like Programmable
Logic Controllers (PLCs) and Remote Terminal Units (RTUs). The communication between
servers in a control room and this control equipment is done via a Supervisory Control
Network (SCN).


Regulatory control is done at the lower layer, which involves instrumentation in the field, such
as sensors (thermometers, tachometers, etc.) and actuators (pumps, valves, etc.). While
traditionally this interface has been analogue (e.g., 4-20 milliamperes), the growing numbers
of sensors and actuators, as well as their increased intelligence and capabilities, have given rise
to new Field Communication Networks (FCNs) where the PLCs and other types of controllers
interface with remote Input/Output boxes or directly with sensors and actuators using new
Ethernet-based industrial protocols like ENIP and PROFINET, and wireless networks like
WirelessHART. Several ring topologies have also been proposed to avoid a single point of
failure for these networks, such as the use of Device Level Ring (DLR) over ENIP.
SCN and FCN networks represent Operational Technology (OT) networks, and they have different
communication requirements and different industrial network protocols. While SCN can
tolerate delays of up to the order of seconds, FCNs typically require communication delays
an order of magnitude lower, typically enabling communications between devices with a
period of 400 µs.
Intrusion detection is a popular research topic for protecting control systems, and this includes
using network security monitors adapted to industrial protocols [2005, 2007, 2008, 2009, 2010,
2056, 2057], and physics-based anomaly detection [1934, 2011, 2012, 2014, 2058, 2059]. The
layer where we monitor the physics of the system can have a significant impact on the types
of attacks that can be detected [2060].
In particular the adversary can compromise and launch attacks from (1) SCADA servers [2061],
(2) controllers/PLCs [2062], (3) sensors [1933], and (4) actuators [2063], and each of these
attacks can be observable at different layers of the system.
Most of the work on network security monitoring for industrial control systems has deployed
network intrusion detection systems at the SCN. However, if an anomaly detection system
is only deployed in the supervisory control network then a compromised PLC can send ma-
nipulated data to the field network, while pretending to report that everything is normal back
to the supervisory control network. In the Stuxnet attack, the attacker compromised a PLC
(Siemens 315) and sent a manipulated control signal ua (which was different from the original
u, i.e., ua ≠ u). Upon reception of ua, the frequency converters periodically increased and
decreased the rotor speeds well above and below their intended operation levels. While the
status of the frequency converters y was then relayed back to the PLC, the compromised
PLC reported a manipulated value ya ≠ y to the control centre (claiming that devices were
operating normally). A similar attack was performed against the Siemens 417 controller [2062],
where attackers captured 21 seconds of valid sensor variables at the PLC, and then replayed
them continuously for the duration of the attack, ensuring that the data sent through the SCN
to the SCADA monitors would appear normal [2062]. A systematic study of the detectability of
various ICS attacks (controller, sensor, or actuator attacks) was given by Giraldo et al. [2060],
and the final recommendation is to deploy system monitors at the field network, as well as at
the supervisory network, and across different loops of the control system.
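The reason monitoring only at the SCN is insufficient can be made concrete with a toy model of a Stuxnet-style compromised PLC, which sends a manipulated command downstream while reporting a fabricated status upstream; all values are illustrative.

```python
# A compromised PLC forwards a manipulated command u_a to the field while
# fabricating an "all normal" status y_a for the SCADA monitors.
def compromised_plc(u):
    u_a = u * 3.0          # manipulated command actually sent to the actuator
    y_a = u                # fabricated status reported up to the SCN
    return u_a, y_a

u = 10.0                   # intended rotor-speed command
u_a, y_a = compromised_plc(u)

scn_view = y_a             # what a supervisory-network monitor observes
fcn_view = u_a             # what a field-network monitor observes
assert scn_view == u       # SCN-only monitoring sees nothing wrong
assert fcn_view != u       # a field-network monitor can flag the manipulation
```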
In addition to attack detection, preventing the system from reaching unsafe states is also
an active area of research [1943, 2064, 2065, 2066, 2067]. The basic idea is to identify that
a control action can cause a problem in the system, and therefore a reference monitor will
prevent this control signal from reaching the physical system. Other research areas include
the retrofitting of security in legacy systems [1988, 2068], and malware in industrial control
devices [2069, 2070]. A concise survey of research in ICS security was given by Krotofil and
Gollmann [2071], and reviews of state-of-the-art practices in the field of ICS security include
the work of Knowles et al. and Cherdantseva et al. [1980, 2072].


A problem for studying industrial control systems is the diversity of platforms, including the
diversity of devices (different manufacturers with different technologies) and applications
(water, chemical systems, oil and gas, etc.). Therefore one of the big challenges in this space
is the reproducibility of results and the generality of industrial control testbeds [2073].

21.3.2 Electric Power Grids


At the turn of the century, the US National Academy of Engineering selected the top 20
engineering achievements of the twentieth century (the achievements that most improved
people’s quality of life) and at the top of this list, was the power grid [2074]. In the approximately
140 years since their inception, electric grids have extended transmission lines to 5 billion
people around the world, bringing light, refrigeration, and many other basic services to people
across the globe.
The power grid has three major parts: (1) generation, (2) transmission, and (3) distribution.
Electric power is generated wherever it is convenient and economical, and then it is transmitted
at high voltages (100kV-500kV) in order to minimise energy losses: electrical power is equal
to voltage times electrical current (P = V I), so for a constant power, high-voltage lines
carry less electrical current, and therefore less energy is lost as heat as the current
moves through the transmission lines. Geographically, a distribution system is located in a
smaller region, so energy losses are less of a concern, while safety (preventing accidents,
fires, electrocutions, etc.) is more important; distribution systems are therefore operated at lower voltages.
The transmission system is an interconnected, redundant network that spans large regions
(usually one country). Large generation plants and the transmission network (the first two
parts of the power grid) are usually referred to as the Bulk Power System, and this bulk power
system is responsible for the reliable delivery of electricity to large areas. A disruption in
the bulk power grid can cause a country-level blackout that would require a black start period
of several days to restart the system. In contrast, distribution systems (the third part of the
grid) are much smaller, their networks are radial (non-redundant), and a failure in their system
usually only causes a localised outage (e.g., a blackout in a neighborhood). This is the reason
most government and industry efforts have prioritised the creation of standards for security
in the bulk power system [2050].
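
The voltage/current trade-off described above (P = V I) can be sketched numerically; the
power and line-resistance figures below are illustrative assumptions, not values from the text:

```python
# Illustrative sketch (numbers assumed): resistive line loss is P_loss = I^2 * R, and
# since I = P / V for a delivered power P, transmitting the same power at a higher
# voltage lowers the current and hence the heat lost in the line.

def line_loss_watts(power_w: float, voltage_v: float, resistance_ohm: float) -> float:
    """Resistive loss on a line delivering `power_w` at `voltage_v`."""
    current_a = power_w / voltage_v           # I = P / V
    return current_a ** 2 * resistance_ohm    # P_loss = I^2 * R

# The same 100 MW over a line with 5 ohms of resistance:
loss_100kv = line_loss_watts(100e6, 100e3, 5)   # 1000 A of current
loss_500kv = line_loss_watts(100e6, 500e3, 5)   # 200 A of current

# Raising the voltage 5x cuts the resistive loss by 5^2 = 25x.
print(loss_100kv / loss_500kv)
```

This quadratic scaling in the current is why the transmission system operates in the
100kV-500kV range while distribution systems do not need to.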
One of the most popular lines of work related to the security of power systems is the study of
false data injection attacks that aim to cause the algorithms in the power grid to misbehave.
The most popular of these attacks are false data injection attacks against state estimation.
In the power grid, operators need to estimate the phase angles xk from the measured power
flow yk in the transmission grid. As mentioned in the section about CPS safety, bad data
detection algorithms were meant to detect random sensor faults, not strategic attacks, and
as Liu et al. [1933, 2075] showed, it is possible for an attacker to create false sensor signals
that will not raise an alarm (experimental validation in software used by the energy sector was
later confirmed [2076]). There has been a significant amount of follow up research focusing
on false data injection for state estimation in the power grid, including the work of Dán and
Sandberg [2077], who study the problem of identifying the best k sensors to protect in order
to minimise the impact of attacks, and Kosut et al. [2078], who consider attackers trying to
minimise the error introduced in the estimate, and defenders with a new detection algorithm
that attempts to detect false data injection attacks. Further work includes [1982, 2022, 2079,
2080, 2081].
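
The stealthy attack of Liu et al. can be illustrated with a toy linearised (DC) state estimation
model; the matrix sizes and noise level below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy DC state estimation (illustrative, after Liu et al.): measurements y = H x + e,
# where x are the phase angles and H is the measurement matrix (assumed known to the
# attacker in the strongest threat model).
H = rng.normal(size=(8, 3))            # 8 sensors, 3 state variables
x_true = np.array([0.1, -0.2, 0.05])
y = H @ x_true + rng.normal(scale=0.01, size=8)

def estimate(meas):
    """Least-squares state estimate x_hat from measurements."""
    return np.linalg.lstsq(H, meas, rcond=None)[0]

def residual_norm(meas):
    """Bad data detector statistic: norm of the residual r = y - H x_hat."""
    return np.linalg.norm(meas - H @ estimate(meas))

# A random fault on one sensor inflates the residual, so it is flagged.
y_fault = y.copy()
y_fault[0] += 1.0

# A stealthy false data injection a = H c shifts the estimate by c but leaves the
# residual unchanged, so the bad data detector raises no alarm.
c = np.array([0.5, 0.0, 0.0])
y_attack = y + H @ c

print(residual_norm(y), residual_norm(y_fault), residual_norm(y_attack))
```

Because the attack vector a = Hc lies in the column space of H, the residual is identical to
the attack-free case while the operator's estimate is silently shifted by c.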


21.3.2.1 Smart Grids

While the current power grid architecture has served well for many years, there is a growing
need to modernise the world’s electric grids to address new requirements and to take
advantage of new technologies. This modernisation includes the integration of renewable
sources of energy, the deployment of smart meters, the exchange of electricity between
consumers and the grid, etc. Figure 21.6 illustrates some of these concepts. The rationale for
modernising the power grid includes the following reasons:

[Figure omitted: a diagram spanning bulk generation, transmission, distribution and
customers, showing renewable integration, wide-area monitoring with phasor measurement
units, smart relays, large-capacity batteries, smart meters, energy management systems,
smart appliances, plug-in vehicles, and one-way versus two-way electricity flows.]

Figure 21.6: Modernization of the power grid [2082].

Efficiency: One of the main drivers of the smart grid programs is the need to make more
efficient use of the current assets. The peak demand for electricity is growing every year, so
utility companies need to spend more money each year on new power plants and their
associated infrastructure. However, the peak demand is only reached 16% of the time, so the
equipment required to satisfy this peak demand remains idle for the rest of the time.
One of the goals for the smart grid is to change the grid from load following to load shaping
by giving incentives to consumers for reducing electricity consumption at the times of peak
demand. Reducing peak demand – in addition to increasing the grid stability – can enable
utilities to postpone or avoid the construction of new power stations. The control or incentive
actions used to shape the load are usually called Demand Response.
Efficiency also involves the integration of new renewable generation sources, such as wind
and solar power, with the aim of reducing the carbon footprint.
Reliability: The second main objective of modernising the power grid is reliability, especially at
the distribution layer (the transmission layer is more reliable). By deploying new sensors and


actuators throughout the power grid, operators can receive real-time, fine-grained data about
the status of the power grid, which enables better situational awareness, faster detection of
faults (or attacks), and better control of the system, resulting in fewer outages. For example,
the deployment of smart meters is allowing distribution utilities to automatically identify the
location and source of an outage.
Consumer choice: The third objective is to address the lack of transparency the current power
grid provides to consumers. Currently, most consumers receive only monthly updates about
their energy usage. In general, consumers do not know their electricity consumption or the
prices they are paying at different times of the day. They are also not informed about
other important aspects of their consumption, such as the proportion of electricity that was
generated through renewable resources. Such information can be used to shape the usage
pattern (i.e., the load). One of the goals of the smart grid is to offer consumers real-time data
and analytics about their energy use. Smart appliances and energy management systems will
automate homes and businesses according to consumer preferences, such as saving costs
or ensuring that more renewable energy is consumed.
To achieve these objectives, the major initiatives associated with the smart grid are the ad-
vanced metering infrastructure, demand response, transmission and distribution automation,
distributed energy resources, and the integration of electric vehicles.
While modernising the power grid will bring many advantages, it can also create new threat
vectors. For example, by increasing the amount of collected consumer information, new
forms of attack will become possible [2083]. Smart grid technologies can be used to infer the
location and behaviour of users, including whether they are at home, the amount of energy
they consume, and the type of devices they own [2084, 2085].
In addition to new privacy threats, another potential new attack has been referred to as the
load-altering attack. Load-altering attacks have been previously studied in demand-response
systems [2086, 2087, 2088, 2089, 2090, 2091]. Demand-response programs provide a new
mechanism for controlling the demand of electricity to improve power grid stability and
energy efficiency. In their basic form, demand-response programs provide incentives (e.g., via
dynamic pricing) for consumers to reduce electricity consumption during peak hours. Currently,
these programs are mostly used by large commercial consumers and government agencies
managing large campuses and buildings, and their operation is based on informal incentive
signals via phone calls by the utility or by the demand-response provider (e.g., a company
such as Enel X) asking the consumer to lower their energy consumption during the peak
times. As these programs become more widespread (targeting residential consumers) and
automated (giving utilities or demand-response companies the ability to directly control the
load of their customers remotely) the attack surface for load-altering attacks will increase. The
attacks proposed consider that the adversary has gained access to the company controlling
remote loads and can change a large amount of the load to affect the power system and
cause either inefficiencies to the system, economic profits for the attacker, or potentially
cause enough load changes to change the frequency of the power grid and cause large-scale
blackouts. Demand-response systems can be generalised by transactive energy markets,
where prosumers (consumers with energy generation and storage capabilities) can trade
energy with each other, bringing their own privacy and security challenges [2092].
More recently, Soltan et al. [2093] studied the same type of load-altering attacks but when the
attacker creates a large-scale botnet with hundreds of thousands of high-energy IoT devices
(such as water heaters and air conditioners). With such a big botnet the attacker can cause (i)
frequency instabilities, (ii) line failures, and (iii) increased operating costs. A follow-up work


by Huang et al. [2094] showed that creating a system blackout—which would require a black
start period of several days to restart the grid—or even a blackout of a large percentage of the
bulk power grid can be very difficult in part because the power grid has several protections to
load changes, including under-frequency load shedding.
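
A rough sense of why such blackouts are hard to cause can be given with a heavily simplified
one-bus frequency model; the inertia, damping, and shedding threshold below are assumed,
illustrative values, not parameters from the cited studies:

```python
# Illustrative (assumed parameters): a first-order swing-equation model of grid
# frequency under a sudden botnet-driven load increase, with under-frequency load
# shedding (UFLS) as one of the protections mentioned above.

F_NOMINAL = 50.0        # Hz
H_INERTIA = 5.0         # system inertia constant in seconds (assumed)
D_DAMPING = 1.0         # per-unit load damping (assumed)
UFLS_THRESHOLD = 49.0   # Hz: shed the offending load below this frequency (assumed)

def simulate(load_step_pu, dt=0.01, t_end=10.0):
    """Frequency after a step load increase; UFLS disconnects the excess load."""
    f = F_NOMINAL
    imbalance = -load_step_pu       # generation minus load, per-unit
    shed = False
    for _ in range(int(t_end / dt)):
        # d(delta_f)/dt from the swing equation with load damping
        df = (imbalance - D_DAMPING * (f - F_NOMINAL) / F_NOMINAL) / (2 * H_INERTIA)
        f += df * F_NOMINAL * dt
        if not shed and f < UFLS_THRESHOLD:
            imbalance = 0.0         # UFLS sheds the attacked load; frequency recovers
            shed = True
    return f, shed

f_small, shed_small = simulate(0.01)   # 1% load step: frequency sags, no UFLS needed
f_large, shed_large = simulate(0.10)   # 10% load step: UFLS triggers and arrests the drop
print(f_small, shed_small, f_large, shed_large)
```

Even in this crude model, the protection mechanism arrests the frequency decline, which is
consistent with the observation that large load changes alone do not easily cause a
system-wide blackout.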

21.3.3 Transportation Systems and Autonomous Vehicles


Modern vehicular applications leverage ubiquitous sensing and actuation capabilities to im-
prove transportation operations [2095] thanks to technologies such as smart phones [2096],
participatory sensing [2097], and wireless communication networks [2098]. Modern
functionalities include traffic flow control, with ramp metering at freeway on-ramps and signal
timing plans at signalised intersections to reduce congestion; demand management, which
focuses on reducing excess traffic during peak hours; incident management, which targets
resources to alleviate incident hot spots; and traveler information, which is used to reduce
traveler buffer time, i.e., the extra time travelers must account for when planning trips.
While this large-scale collection of sensor data can enable various societal advantages, it also
raises significant privacy concerns. To address these emerging privacy concerns from sensor
data, many techniques have been proposed, including differential privacy [509].
Although privacy is an important concern for these systems, it is unfortunately not the only
one. Widespread vulnerabilities such as those from traffic sensors [1968, 2099, 2100] can
be readily exploited [2101, 2102, 2103, 2104]. For example, Wang et al. [2103] showed that
attackers can inject false data in crowdsourced services to cause false traffic congestion
alarms and fake accidents, triggering the services to automatically reroute traffic.
Similar problems can be found in commercial aviation. Not only does the modernisation of
airplanes introduce potentially new attack vectors, such as attempts to reach avionic
systems through the entertainment network [2105], but air traffic systems might also be
vulnerable to attacks. A new technology complementing (or potentially replacing) radar systems
is the Automatic Dependent Surveillance-Broadcast (ADS-B) system. ADS-B consists of air-
planes sharing their GPS coordinates with each other and with air traffic control systems, but
these systems are currently unauthenticated and unencrypted, posing security and privacy
problems [2106].

21.3.3.1 Ground, Air, and Sea Vehicles

Software problems in the sensors of vehicles can cause notorious failures, such as the Ariane 5
rocket accident [2107], in which a software fault shut down the inertial navigation system,
causing incorrect signals to be sent to the engines. With advances in manufacturing
and modern sensors, we are starting to see the proliferation of Unmanned Vehicles (UVs) in
the consumer market as well as across other industries. Devices that were only available to
government agencies have diversified their applications ranging from agricultural manage-
ment to aerial mapping and freight transportation [2108]. Out of all the UVs available in the
commercial market (aerial, ground and sea vehicles) unmanned aerial vehicles seem to be
the most popular kind with a projected 11.2 billion dollar global market by 2020 [2109].
The expansion of unmanned aerial vehicles has increased security and privacy concerns. In
general, there is a lack of security standards for drones, and it has been shown that they are
vulnerable to attacks that target their cyber and/or physical elements [2052, 2110]. From


the point of view of privacy, drones can let users spy on neighbours [2111, 2112], and enable
literal helicopter parenting [2113].
Reported incidents include attacks remotely accessing someone else’s drone (e.g., a
neighbour’s) to take photos or videos, stealing drones wirelessly (e.g., an attacker in a vehicle
taking over a drone and commanding it to follow the vehicle), and taking down a drone
operated by someone else (which can lead to charges such as mishandling a drone in public,
and has resulted in reckless endangerment convictions) [1939].
UVs have multiple sensors that help them assess their physical environment, such as
accelerometers, gyroscopes, barometers, GPS and cameras. While reliance on sensor data
without any form of validation has proven to be an effective trade-off to meet the
efficiency demands of real-time systems, it is not a sustainable practice as UVs become more
pervasive. Transduction attacks on sensors have shown that accelerometers, gyroscopes,
and even cameras used by drones for stabilisation can be easily attacked, causing the drone
to malfunction, crash, or even be taken over by the attacker [1951, 1998, 2114].
Even on many operational warships, remote monitoring of equipment is now done over a
hardwired LAN by systems such as the Integrated Condition Assessment System (ICAS) [2115].
ICAS are generally installed with connections to external Programmable Logic Controllers
(PLCs), which are used in Supervisory Control and Data Acquisition (SCADA) systems to direct
the movement of control equipment that performs actual manipulation of physical devices in
the ship such as propulsion and steering (rudder) devices [2115, 2116]. Therefore, the secure
operation of ships is highly related to the security of industrial control systems.
For ground vehicles, one of the areas of interest is the security of the Controller Area Network
(CAN). The CAN system is a serial broadcast bus designed by Bosch in 1983 to enable the
communication of Electronic Control Units (ECUs) in cars. Examples of ECUs include brake
systems, the central timing module, telematic control units, gear control, and engine control.
The CAN protocol, however, does not have any security mechanism, and therefore an attacker
who can access the CAN bus in a vehicle (e.g., through a local or remote exploit) can spoof
any ECU, ignore the input from drivers, and disable the brakes or stop the engine [2117].
Therefore, research has considered ways to retrofit lightweight security mechanisms for
CAN systems [1791], or how to detect spoofed CAN messages based on the physical-layer
characteristics of the signal [2118] (voltage level profiles, timing, frequency of messages, etc.).
However, the security of some of these systems remains in question [2119].
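
As a simplified illustration of such detection ideas (a timing-only cousin of the
physical-layer detectors cited above), periodic CAN IDs can be monitored for anomalously
short inter-arrival gaps; the message period and tolerance below are hypothetical:

```python
# Illustrative sketch (hypothetical values): many ECUs broadcast CAN frames with a
# fixed period, so an injected frame for that CAN ID shows up as an anomalously short
# inter-arrival time relative to the expected cadence.

EXPECTED_PERIOD_S = 0.010   # e.g., a 10 ms periodic message (assumed)
TOLERANCE = 0.5             # flag gaps shorter than half the expected period

def flag_spoofed(timestamps):
    """Return indices of frames arriving suspiciously soon after the previous one."""
    suspicious = []
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if gap < EXPECTED_PERIOD_S * TOLERANCE:
            suspicious.append(i)
    return suspicious

# A normal 10 ms cadence with one injected frame squeezed in at t = 0.0305 s.
arrivals = [0.000, 0.010, 0.020, 0.030, 0.0305, 0.040]
print(flag_spoofed(arrivals))   # the injected frame at index 4 is flagged
```

A real detector would also need to handle jitter, aperiodic IDs, and attackers who silence
the legitimate ECU first, which is partly why voltage-profile detectors were proposed.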
Autonomous vehicles will also face new threats, for example, a malicious vehicle in an auto-
mated platoon can cause the platoon to behave erratically, potentially causing accidents [2120].
Finally, new functionalities like a remote kill-switch can be abused by attackers, for example,
an attacker remotely deactivated hundreds of vehicles in Austin, Texas, leaving their owners
without transportation [2121].


21.3.4 Robotics and Advanced Manufacturing


Security in manufacturing has for many years been a part of critical infrastructure security
but, as manufacturing processes have become more sophisticated, the threats have increased. Wells
et al. [2053] give a high-level view about the concerns of this industry. They also mention that
quality control techniques traditionally used in the manufacturing industry can be leveraged
to detect attacks.
Attacks can target the structural integrity (scale, indent, or vertex) or material integrity (strength,
roughness, or colour) of the manufactured products [2122]. Physical tests can help detect
such attacks: non-destructive tests such as visual inspection, weight measurement, dimension
measurement, 3D laser scanning, interferometry, X-ray and CT, as well as destructive
mechanical tests such as measuring the tensile and yield properties of the material.
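
The idea of leveraging quality control to detect attacks can be sketched with a
Shewhart-style control chart; the part measurements and limits below are hypothetical:

```python
import statistics

# Sketch of the quality-control idea above: a control chart over part dimension
# measurements. Baseline limits come from known-good parts; measurements outside
# mean +/- 3 sigma are flagged, whether caused by a fault or by sabotage.

def control_limits(baseline):
    """Lower/upper control limits (mean +/- 3 sample standard deviations)."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return mu - 3 * sigma, mu + 3 * sigma

def out_of_control(measurements, limits):
    """Indices of measurements falling outside the control limits."""
    lo, hi = limits
    return [i for i, m in enumerate(measurements) if not (lo <= m <= hi)]

# Hypothetical part-length data (mm): a tight process, then a maliciously scaled part.
baseline = [10.01, 9.99, 10.00, 10.02, 9.98, 10.00, 10.01, 9.99]
limits = control_limits(baseline)
print(out_of_control([10.00, 10.01, 10.35, 9.99], limits))   # index 2 flagged
```

This only catches attacks that push measured properties outside the normal process
variation; stealthier manipulations require the destructive tests mentioned above.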
Robotic systems in automated assembly lines can also be used to create damaged parts or
cause safety problems [2123]. Safety accidents with robots date back to 1979, when a worker
at the Ford Motor Company was killed by a robot. As pointed out by P.W. Singer, the Ford worker
might have been the first, but he would be far from the last, as robots have killed various other
people [2124]. Beyond manufacturing, robotic weapons also pose significant challenges. For
example, in 2007, a software glitch in an antiaircraft system sporting two cannons caused it to
fire hundreds of high-explosive rounds, and by the time its magazines were emptied, nine
soldiers were dead and fourteen seriously injured [2124]. We will discuss later in this document how new
advances in CPSs may change the way nations wage future wars.

21.3.5 Medical Devices


Due to their safety and privacy risks, embedded medical devices are another CPS domain that
has received significant attention in the literature.
While not an attack, the software error of the Therac-25 is one of the most well-known classical
examples of how software problems can harm and even kill people. The Therac-25 was a
computer-controlled radiation therapy machine that gave massive radiation overdoses to
patients, resulting in deaths and injuries [2125]. The concern here is what could happen if
such problems were not accidental but malicious.
Modern Implantable Medical Devices (IMDs) include pacemakers, defibrillators, neurostimula-
tors, and drug delivery systems. These devices can usually be queried and reprogrammed by a
doctor, but this also opens these devices up to security and privacy threats, in particular when
an attacker can impersonate the device used by the doctor to modify the settings of IMDs.
Rushanan et al. [2054] and Camara et al. [2126] describe the types of adversaries that medical
devices will be subject to, including the ability to eavesdrop on all communication channels
(passive) or read, modify and inject data (active). In order to mitigate possible attacks in the
telemetry interface, they propose authentication (e.g., biometric, distance bounding, out of
band channels, etc.), and the use of an external wearable device that allows or denies access
to the medical device depending on whether this extra wearable device is present. In addition
to prevention, they also discuss attack detection by observing patterns to distinguish between
safe and unsafe behaviour.
In particular, a novel proposal to study proper authentication of the programmer with the IMD
is the touch-to-access principle [2047, 2127]. The basic idea is that the patient has a biometric
signal (such as the time between heart beats) that should only be available to other devices


in direct contact with the patient. This “secret” information is then used by the programmer
and the IMD as a fuzzy password to bootstrap their security association.
A key challenge is to make sure that the biometric signal used to grant access via
touch-to-access is not remotely observable. However, heart beats can be inferred with side
information, including a webcam [2128] and an infrared laser [2129].
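
A minimal sketch of the touch-to-access idea follows, with a deliberately naive encoding
(real schemes use error-tolerant fuzzy extractors; the inter-pulse interval values below are
hypothetical):

```python
# Hypothetical sketch: both the programmer and the IMD observe the patient's
# inter-pulse intervals (IPIs) and quantise them into bits that serve as a shared
# fuzzy password. The encoding here is illustrative only; deployed schemes must
# tolerate measurement noise between the two devices.

def ipi_bits(ipis_ms, n_bits=4):
    """Take the `n_bits` least significant bits of each IPI (in ms) as key material."""
    bits = []
    for ipi in ipis_ms:
        for k in range(n_bits):
            bits.append((int(ipi) >> k) & 1)
    return bits

# Both devices measure the same heart beats, so they derive the same bit string.
patient_ipis = [812, 795, 843, 820, 801]
assert ipi_bits(patient_ipis) == ipi_bits(list(patient_ipis))

# A remote attacker guessing typical IPIs derives a different bit string.
guess = [800, 800, 800, 800, 800]
print(ipi_bits(patient_ipis) != ipi_bits(guess))   # True
```

The low-order bits of IPIs are used precisely because they are hard to predict remotely,
which is why the webcam and infrared laser inference results above are concerning.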
Security goes beyond implantable devices. As healthcare computer and software infrastructure
introduces new technology, the industry will need to increase its security efforts. Medical data
is a prime target for theft and privacy violations, as well as for denial of service attacks in
the form of ransomware [2130].

21.3.6 The Internet of Things


Consumer Internet of Things (IoT) devices are found everywhere: in our houses as voice-
assistant devices, home automation smart devices, smart appliances, and surveillance sys-
tems; in healthcare as wearable technology including fitness devices and health-monitoring
devices; in education, including Internet-connected educational children’s toys; and for
entertainment, including remote-controlled Wi-Fi devices.
As our lives become more dependent on these systems, their security has become an impor-
tant, growing concern. The security of these devices depends on the integrity of the software
and firmware they execute and the security mechanisms they implement.
New attack vectors make IoT devices attractive to criminals, like bad actors using vulnerable
IoT devices to orchestrate massive Distributed Denial of Service (DDoS) attacks (the Mirai
botnet) [665, 2131], attackers who compromised a fish tank to penetrate the internal network
of a casino [2132], or attackers demanding a ransom from a hotel before they would let its
guests enter their rooms [1961].
A large number of the IoT devices included in large IoT botnets [665, 2131] include Internet-
connected cameras. Internet-connected cameras have given rise to multiple reports of unau-
thorised access by attackers [2133], and video feeds of multiple cameras are openly available
online and discoverable through IoT web indexing platforms like Shodan [2134], potentially
compromising the privacy of consumers who do not check the default configuration mech-
anisms. The threats to IoT go beyond privacy fears and DDoS attacks. Vulnerabilities in
consumer IoT products including drones, IoT cameras, smart toys for children, and intimate
devices can lead not only to privacy invasions but also to physical damage (drones being
used to harm people), abuse, and harassment [2135]. Understanding the consequences of
these new types of physical and mental abuse will require the involvement of more social
scientists and legal scholars to help us define a framework for reasoning about them.
An area that has attracted significant attention from the research community is the security
of voice-activated digital assistants. For example, researchers leveraged microphone non-
linearities to inject inaudible voice commands to digital assistants [1953]. Other recent work
includes the use of new attacks like “voice squatting” or “voice masquerading” to take over
voice-controlled applications [2136]. For example, the consumer might want to open the
application “Capital One”, but an attacker can make available an application called “Capital
Won”, and the voice-controlled personal assistant might open the attacker’s application instead. In the
“voice masquerading” attack, an attacker application might remain in control of the system
and pretend to be following the consumer’s commands to open other functionalities, while in
reality it is impersonating the desired functionalities.
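
The “voice squatting” risk can be illustrated with a crude screening check that flags
confusable invocation names; string similarity is only a rough stand-in for the phonetic
similarity such attacks exploit, and the skill names below are hypothetical:

```python
import difflib

# Rough sketch: flag new skill invocation names whose spelling is suspiciously close
# to an already-registered name, such as the hypothetical "Capital Won" vs.
# "Capital One" example above. A real vetting pipeline would compare pronunciations
# (phoneme sequences), not spellings.

def confusable(name, existing, threshold=0.8):
    """Return registered names whose spelling is suspiciously close to `name`."""
    return [
        other for other in existing
        if difflib.SequenceMatcher(None, name.lower(), other.lower()).ratio() >= threshold
    ]

registered = ["Capital One", "Smart Lights", "Daily Weather"]
print(confusable("Capital Won", registered))   # ['Capital One']
print(confusable("Pizza Order", registered))   # []
```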


Several of the security solutions for consumer IoT have proposed the idea of having a cen-
tralised IoT secure hub that mediates the communications between IoT devices in a home,
and the Internet [2137]. One of the problems of relying on an external device to mediate
IoT communications is that the connections between IoT devices and their cloud servers may
be encrypted, and therefore this hub will need to make security decisions with encrypted
traffic [2138]. On the other hand, end-to-end encrypted communications can also prevent
consumers from auditing their IoT devices to make sure they are not violating their privacy
expectations. One option to address this problem is to ask the vendor of the IoT device to
disclose their key (and rotate their key) to a trusted third party (called “auditor”) that can
decrypt and show the results to the owners of the data [2139].
In short, the proliferation of vulnerable IoT devices is raising new security and privacy concerns,
while making IoT devices attractive to attackers. Insecurities in these devices range from
insecure-by-design implementations (e.g., devices that have backdoors for troubleshooting)
to their inability to apply software updates to patch vulnerable firmware. One of the biggest
problems for improving the security of IoT and CPSs is that market forces do not incentivise
vendors to compete for better security. In the next section we will discuss the causes of this
lack of security and some potential solutions.

21.4 POLICY AND POLITICAL ASPECTS OF CPS SECURITY


[2124, 2140, 2141]
In this final section we summarise some of the industry- and government-led
efforts to try to improve the security of CPSs, and how to leverage the new field of CPS security
for attacks and wars.

21.4.1 Incentives and Regulation


Most industries in the CPS domain have rarely seen attacks sabotaging their physical process,
in part because CPS attacks are hard to monetise by criminals. In addition to being rare,
attacks on CPSs are not openly reported, and this lack of actuarial data leads to low quality
risk estimates; as the US Department of Energy (DoE) stated in their Energy Delivery Systems
Cyber Security Roadmap [2142]: “Making a strong business case for cyber security investments
is complicated by the difficulty of quantifying risk in an environment of (1) rapidly changing,
(2) unpredictable threats, (3) with consequences that are hard to demonstrate.”
In summary, market incentives alone are insufficient to improve the security posture of CPSs,
and as a result, our CPS infrastructures remain fairly vulnerable to computer attacks and
with security practices that are decades behind the current security best practices used in
enterprise IT domains. This market failure for improving the security of CPSs has resulted in
several calls for government intervention [2143, 2144, 2145].
Regulation: Mandating cyber security standards that the CPS industries have to follow is a
possible government intervention, and there is some precedent for this idea. Before 2003, the
North American Electric Reliability Corporation (NERC) merely suggested standards to the
power systems operators in the US but after the August 2003 blackout, regulations that were
once optional are now mandatory [2049]. However, CPS industries have pushed back against
regulation, arguing that regulations (e.g., mandating compliance to specific security standards)


will stifle innovation, and that more regulation tends to create a culture of compliance instead
of a culture of security.
Some states in the US are starting to take regulation into their hands; for example, the recently
proposed California Senate Bill SB-327 will make California the first state in the US with an
IoT cyber security law—starting in 2020, any manufacturer of a device that connects “directly
or indirectly” to the Internet must equip it with “reasonable” security features, designed to
prevent unauthorised access, modification, or information disclosure.
The European Union Agency for cyber security proposed the EU Network and Information
Security directive [2146] as the first piece of EU-wide cyber security legislation, where operators
of essential services such as those outlined in this KA have to comply with these new sets of
standards.
Another alternative to imposing regulation broadly is to use the governments’ “power of the
purse” by mandating cyber security standards only for companies that want to do business
with the government. The goal would be that once the best security practices are developed
to meet the standards for working with the government, then they will spread to other markets
and products. This approach is a reasonable balance between incentives and regulation. Only
CPS and IoT vendors working with the Federal government will have to follow specific security
standards, but once they are implemented, the same security standards will benefit other
markets where they reuse the technologies.
One of the notable exceptions to the lack of regulation is the nuclear energy industry. Because
of the highly safety-critical nature of this industry, nuclear energy is highly regulated in general,
and in cyber security standards in particular, with processes such as the Office for Nuclear
Regulation (ONR) Security Assessment Principles in the UK [2147].
Incentives: A complementary way to nudge companies to improve their cyber security posture
is for governments to nurture a cyber-insurance market for CPS protection. So, instead of
asking companies to follow specific standards, governments would require firms to have
cyber-insurance for their operations [2148, 2149, 2150, 2151]. There is a popular view that under
certain conditions, the insurance industry can incentivise investments in protection [2152]. The
idea is that premiums charged by the insurance companies would reflect the cyber security
posture of CPS companies; if a company follows good cyber security practices, the insurance
premiums would be low, otherwise, the premiums would be very expensive (and this would in
principle incentivise the company to invest more in cyber security protections). It is not clear
if this cyber-insurance market will grow organically, or if it would need to be mandated by the
government.
It is unclear if government incentives to improve security in CPSs will require first a catastrophic
cyber-attack, but it appears that, in the future, the choice will no longer be between government
regulation and no government regulation, but between smart government regulation and stupid
regulation [2140].


21.4.2 Cyber-Conflict
Computer networks extend the way we interact with others, and any conflict in the real
world will have its representation in cyberspace, including (cyber-)crime, activism,
bullying, espionage, and war [1916].
Cybercriminals compromise computers anywhere they can find them (even in control sys-
tems). These attacks may not be targeted (i.e., they do not have the intention of harming
control systems), but may cause negative side effects: control systems infected with malware
may operate inappropriately. The most famous non-targeted attack on control systems oc-
curred in 2003, when the Slammer worm affected the computerised safety monitoring system
at the Davis-Besse nuclear power plant in the US. While the plant was not connected to the
Internet, the worm entered the plant network via a contractor’s infected computer connected by
telephone directly to the plant’s network, thereby bypassing the firewall [1954]. A more recent
example of a non-targeted attack occurred in 2006, when a computer system that managed
the water treatment operations of a water filtering plant near Harrisburg, Pennsylvania, was
compromised and used to send spam and redistribute illegal software [1955]. More recently,
ransomware has also been used to attack CPSs, like the attack on the Austrian hotel [1961],
where guests were unable to get their room keys activated until the hotel paid the ransom.
Disgruntled employees are a major source of targeted computer attacks against control
systems [780, 1960, 1963]. These attacks are important from a security point of view because
they are caused by insiders: individuals with authorised access to computers and networks
used by control systems. So, even if the systems had proper authentication and authorisation,
as well as little information publicly available about them, attacks by insiders would still be
possible. Because disgruntled employees generally act alone, the potential consequences
of their attacks may not be as damaging as the potential harm caused by larger organised
groups such as terrorists and nation states.
Terrorists, and activists are another potential threat to control systems. While there is no
concrete evidence that terrorists or activists have targeted control systems via cyber-attacks,
there is a growing threat of such an attack in the future.
Nation states are establishing military units with computer security expertise for any future
conflicts. For example, the US established Cyber Command [2153] to conduct full spectrum
operations (offensive capabilities) in 2009, and several other countries also announced similar
efforts around the same time. The role of computer networks in warfare has been a topic of
academic discussion since 1998 [2154], and CPSs are playing a foundational role in how
how wars are waged, from robotic units and unmanned vehicles supporting soldiers in the
field, to discussions of cyberwar [2155].
In addition to land, air, sea and space, cyberspace is now considered by many nations as an
additional theatre of conflict. International treaties have developed public international law
concerning two main principles in the law of war: (1) jus ad bellum, the right to wage a war, and
(2) jus in bello, acceptable wartime conduct. Two sources have considered how the law of
war applies to cyberspace [2141]: (1) The Tallinn Manual, and (2) the Koh Speech.
The Tallinn Manual is a non-binding study by NATO’s Cooperative Cyber Defence Centre of
excellence, on how the law of war applies to cyber conflicts, and the Koh Speech was a
speech given by Harold Koh, a US State Department legal advisor, which explained how the
US interprets international law applied to cyberspace. Both of these sources agree that a key
reason to authorise the use of force (jus ad bellum) as a response to a cyber operation, is


when the physical effects of a cyber-attack are comparable to kinetic effects of other armed
conflicts, for example, when a computer attack triggers a nuclear plant meltdown, opens a dam
upriver, or disables air-traffic control. The argument is that the effects of any of these attacks
are similar to what a missile strike from an enemy would look like. In contrast, when there is
no physical harm, the problem of determining when a cyber-attack can be considered a use of
force by the enemy is unresolved, so cyber-attacks on the financial or election infrastructure
of a nation may not clear the bar to be considered an act of war.
Once nations are engaged in war, the question is how to leverage computer attacks in a way
that is consistent with acceptable wartime conduct (jus in bello). The conventional norm is
that attacks must distinguish between military and non-military objectives. Military objectives
can include war-fighting, war-supporting, and war-sustaining efforts. The problem in attacking
critical infrastructures is that some of the infrastructures supporting these efforts are in
dual-use by the military as well as by the civilian population. For example, a large percentage
of military communications in the US use civilian networks at some stage, and the power grid
supports military as well as civilian infrastructures.
Another factor to consider in designing CPS attacks is that the “law of war” in general prohibits
uncontrollable or unpredictable attacks, in particular those that deprive the civilian population
of indispensable objects, such as food or water. While physical weapons have a limited
geographical area of impact, cyberweapons can have more uncontrollable side-effects; for
example, worms can replicate and escape their intended target network and infect civilian
infrastructures. Therefore, nations will have to extensively test any cyberweapon to minimise
unpredictable consequences.
In short, any future conflict in the physical world will have enabling technologies in the cyber-
world, and computer attacks may be expected to play an integral part in future conflicts. There
is a large grey area regarding what types of computer attacks can be considered an act of
force, and a future challenge will be to design cyber-attacks that only target military objectives
and minimise civilian side effects. At the same time, attack attribution in cyber-space will
be harder, and nation-states might be able to get away with sabotage operations without
facing consequences. It is a responsibility of the international community to design new legal
frameworks to cover cyber-conflicts, and for nation states to outline new doctrines covering
how to conduct cyber-operations with physical side effects.
Finally, cyberwar is also related to the discussion in the last section about cyber-insurance. For
example, after the NotPetya cyberattack in 2017 [2156], several companies who had purchased
cyber-insurance protections sought to get help from their insurance companies to cover part
of their losses. However, some insurance companies denied the claims, citing a war exclusion
which protects insurers from being saddled with costs related to damage from war. Since
then, insurers have been applying the war exclusion to avoid paying claims related to digital attacks².
This type of collateral damage from cyber-attacks might be more common in the future, and
presents a challenge for insurance industries in their quest to quantify the risk of correlated
large-scale events.
² https://www.nytimes.com/2019/04/15/technology/cyberinsurance-notpetya-attack.html


21.4.3 Industry Practices and Standards


We finalise the CPS Security KA by referencing various industry and government-led efforts
to improve the security of CPSs and control systems. One of the most important security standards in this
space started with the International Society of Automation (ISA) standard ISA-99, which later
became a US standard as ANSI/ISA-62443 and finally an international cyber security standard for control
systems known as IEC 62443 [2157].
The US National Institute of Standards and Technology (NIST) has guidelines for security
best practices for general IT in Special Publication 800-53. US Federal agencies must meet
NIST SP 800-53, but industry in general (and industry dealing with the US government in
particular) uses these recommendations as a basis for their security posture. To address
the security of control systems in particular, NIST has also published a Guide to Industrial
Control System (ICS) Security [2048], a guideline to smart grid security in NISTIR 7628 [2158],
and a guideline for IoT security and privacy [1973]. Although these recommendations are not
enforceable, they can provide guidance for analysing the security of most utility companies. A
more recent effort is the NIST cyber security framework for protecting critical infrastructure,
which was initiated by an Executive Order from then US President Obama [2159], as an effort
to improve the security posture of critical infrastructures.
Another notable industry-led effort for protecting critical infrastructures is the North American
Electric Reliability Corporation (NERC) cyber security standards for control systems [2050].
NERC is authorised to enforce compliance to these standards, and it is expected that all
electric utilities operating the bulk power system in North America are fully compliant with
these standards.
All of these standards are general and flexible. Instead of prescribing specific technology
solutions, they give a high-level overview of the variety of security technologies available
(e.g., authentication, access control, network segmentation, etc.), and then give a set of
general procedures for protecting systems, starting with (1) gathering data to identify the
attack surface of a given system (this includes a basic network enumeration procedure that
seeks to enumerate all devices and services available in the network of the asset owner),
(2) building a security policy based on the attack surface of the system, and (3) deploying the
security countermeasures, including network segmentation or network security monitoring.
In addition to these general security standards for control systems, the industries that develop
and maintain specific industrial control protocols, such as those used for SCADA, e.g., IEC 104,
or those in the process industry, e.g., PROFINET, have also released standards and documen-
tation for securing industrial networks. Recall that most of these industrial protocols were
developed before security was a pressing concern for industrial control systems, therefore
the communication links were not authenticated or encrypted. The new standard IEC 62351 is
meant to guide asset owners on how to deploy a secure network to authenticate and encrypt
network links, and other organisations have released similar support, such as providing security
extensions for PROFINET³. Instead of (or in addition to) using these end-to-end application
layer security recommendations, some operators might prefer to use lower-layer security
protections of IP networks, including TLS and IPSec.
In the IoT domain, ETSI, the European Standards Organisation developed the first globally-
applicable security standard for consumer IoT. ETSI TS 103 645 establishes a security baseline
for Internet-connected consumer products and provides a basis for future IoT certification. This
³ https://www.profibus.com/download/pi-white-paper-security-extensions-for-profinet/


standard builds closely on the UK’s Code of Practice for Consumer IoT Security [2160]. Another
more specific IoT standard by the Internet Engineering Task Force (IETF) for IoT devices is
the Manufacturer Usage Description (MUD) standard [2161]. The goal of this standard is to
automate the creation of network white lists, which are used by network administrators to
block any unauthorised connection by the device. Other IoT security standards being devel-
oped by the IETF include protocols for communications security, access control, restricting
communications, and firmware and software updates [2162].
All these industry efforts and standards have essentially three goals: (1) create awareness of
security issues in control systems, (2) help operators of control systems and security officers
design a security policy, and (3) recommend basic security mechanisms for prevention
(authentication, access controls, etc), detection, and response to security breaches. For the
most part industry efforts for protecting CPSs are based on the same technical principles
from general Information Technology systems. Therefore, industry best practices are behind
general IT security best practices and the most recent CPS security research discussed in
this KA. We hope that in the next decade CPS security research becomes mature enough to
start having an impact on industry practices.

CONCLUSIONS
As technology continues to integrate computing, networking, and control elements in new
cyber-physical systems, we also need to train a new generation of engineers, computer
scientists, and social scientists to be able to capture the multidisciplinary nature of CPS
security, like transduction attacks. In addition, as the technologies behind CPS security mature,
some of them will become industry-accepted best practices while others might be forgotten. In
2018, one of the areas with the greatest momentum was the industry for network security monitoring
(intrusion detection) in cyber-physical networks. Several start-up companies in the US, Europe,
and Israel offer services for profiling and characterising industrial networks, to help operators
better understand what is allowed and what should be blocked. On the other hand, there are
other CPS security research areas that are just starting to be analysed, like the work on attack
mitigation, and in particular, the response to alerts from intrusion detection systems.
We are only at the starting point for CPS security research, and the decades to come will bring
new challenges as we continue to integrate physical things with computing capabilities.


CROSS-REFERENCE OF TOPICS VS REFERENCE MATERIAL

Topic                                                        [2163]     Other
21.1 Cyber-Physical Systems and their Security Risks
  21.1.1 Characteristics of CPS                              c1         [1906]
  21.1.2 Protections Against Natural Events and Accidents               [1907]
  21.1.3 Security and Privacy Concerns                                  [1908]
21.2 Crosscutting Security
  21.2.1 Preventing Attacks                                  c6,c9      [1973]
  21.2.2 Detecting Attacks                                   c18        [1974]
  21.2.3 Mitigating Attacks                                             [1975]
21.3 CPS Domains
  21.3.1 Industrial Control Systems                                     [2048]
  21.3.2 Electric Power Grids                                c25        [2049, 2050]
  21.3.3 Transportation Systems and Autonomous Vehicles      c26, c29   [2051, 2052]
  21.3.4 Robotics and Advanced Manufacturing                            [2053]
  21.3.5 Medical Devices                                     c27        [2054]
  21.3.6 The Internet of Things                                         [1973]
21.4 Policy and Political Aspects of CPS Security
  21.4.1 Incentives and Regulation                                      [2140]
  21.4.2 Cyber-Conflict                                                 [2124, 2141]
  21.4.3 Industry Practices and Standards                               [2048]



Chapter 22
Physical Layer and
Telecommunications
Security
Srdjan Čapkun ETH Zurich


INTRODUCTION
This Knowledge Area is a review of the most relevant topics in wireless physical layer security.
The physical phenomenon utilised by the techniques presented in this Knowledge Area is the
radiation of electromagnetic waves. The frequencies considered hereinafter consist of the
entire spectrum that ranges from a few Hertz to frequencies beyond those of visible light
(optical spectrum). This Knowledge Area covers concepts and techniques that exploit the
way these signals propagate through the air and other transmission media. It is organised
into sections that describe security mechanisms for wireless communication methods as
well as some implications of unintended radio frequency emanations.
Since most frequencies used for wireless communication reside in the radio frequency spec-
trum and follow the well-understood laws of radio propagation theory, the majority of this
Knowledge Area is dedicated to security concepts based on physical aspects of radio fre-
quency transmission. The chapter therefore starts with an explanation of the fundamental
concepts and main techniques that were developed to make use of the wireless communi-
cation layer for confidentiality, integrity, access control and covert communication. These
techniques mainly use properties of physical layer modulations and signal propagation to
enhance the security of systems.
After having presented schemes to secure the wireless channel, the Knowledge Area continues
with a review of security issues related to the wireless physical layer, focusing on those aspects
that make wireless communication systems different from wired systems, most notably signal
jamming, signal annihilation and jamming resilience. The section on jamming is followed
by a review of techniques capable of performing physical device identification (i.e., device
fingerprinting) by extracting unique characteristics from the device’s (analogue) circuitry.
Following this, the chapter continues to present approaches for performing secure distance
measurements and secure positioning based on electromagnetic waves. Protocols for dis-
tance measurements and positioning are designed in order to thwart threats on the physical
layer as well as the logical layer. Those attack vectors are covered in detail, together with
defense strategies and the requirements for secure position verification.
Then, the Knowledge Area covers unintentional wireless emanations from devices such as
from computer displays and summarises wireless side-channel attacks studied in literature.
This is followed by a review on spoofing of analogue sensors. Unintentional emissions are
in their nature different from wireless communication systems, especially because these
interactions are not structured. They are not designed to carry information, however, they also
make use of—or can be affected by—electromagnetic waves.
Finally, after having treated the fundamental concepts of wireless physical security, this
Knowledge Area presents a selection of existing communication technologies and discusses
their security mechanisms. It explains design choices and highlights potential shortcomings
while referring to the principles described in the earlier sections. Included are examples from
near-field communication and wireless communication in the aviation industry, followed by
the security considerations of cellular networks. Security of global navigation systems and of
terrestrial positioning systems is covered last since the security goals of such systems are
different from communication systems and are mainly related to position spoofing resilience.


CONTENT

22.1 PHYSICAL LAYER SCHEMES FOR CONFIDENTIALITY,


INTEGRITY AND ACCESS CONTROL
[1990, 2164, 2165, 2166, 2167, 2168]
Securing wireless networks is challenging due to the shared broadcast medium which makes
it easy for remote adversaries to eavesdrop, modify and block the communication between
devices. However, wireless communication also offers some unique opportunities. Radio
signals are affected by reflection, diffraction, and scattering, all of which contribute to a
complex multi-path behaviour of communicated signals. The channel response, as measured
at the receiver, can therefore be modelled as having frequency and position dependent random
components. In addition, within the short time span and in the absence of interference,
communicating parties will measure highly correlated channel responses. These responses
can therefore be used as shared randomness, unavailable to the adversary, and form a basis
of secure communication.
It should be noted that modern-day cryptography provides many different protocols to assure
the confidentiality, integrity and authenticity of data transmitted using radio signals. If the
communicating parties are associated with each other or share a mutual secret, cryptographic
protocols can effectively establish secure communication by making use of cryptographic
keying material. However, if mere information exchange is not the only goal of a wireless
system (e.g., in a positioning system), or if no pre-shared secrets are available, cryptographic
protocols operating at higher layers of the protocol stack are not sufficient and physical-layer
constructs can be viable solutions. The main physical layer schemes are presented in the
following sections.

22.1.1 Key Establishment based on Channel Reciprocity


The physical-layer randomness of a wireless channel can be used to derive a shared secret.
One of the main security assumptions of physical-layer key establishment schemes is that
the attacker is located at least half a wavelength away from the communicating parties.
According to wireless communication theory, it can be assumed that the attacker’s channel
measurements will be de-correlated from those computed by the communicating parties if
they are at least half a wavelength apart. The attacker will therefore likely not have access
to the measured secret randomness. If the attacker injects signals during the key genera-
tion, the signal that it transmits will, due to channel distortions, be measured differently at
communicating parties, resulting in key disagreement.
Physical layer key establishment schemes operate as follows. The communicating parties
(Alice and Bob) first exchange pre-agreed, non-secret, data packets. Each party then measures
the channel response over the received packets. The key agreement is then typically executed
in three phases.
Quantisation Phase: Alice and Bob create a time series of channel properties that are measured
over the received packets. Example properties include the Received Signal Strength Indicator (RSSI) and the Channel Impulse Response (CIR). Any property that
is believed to be non-observable by the attacker can be used. The measured time series are
then quantised by both parties independently. This quantisation is typically based on fixed or


dynamic thresholds.
Information reconciliation phase: Since the quantisation phase is likely to result in disagreeing
sequences at Alice and Bob, they need to reconcile their sequences to correct for any errors.
This is typically done leveraging error correcting codes and privacy amplification techniques.
Most schemes use simple level-crossing algorithms for quantisation and do not use coding
techniques. However, if the key derivation uses methods based on channel states whose
distributions are not necessarily symmetric, more sophisticated quantisation methods, such
as approximating the channel fading phenomena as a Gaussian source, or (multi-level) coding
is needed [2165].
Key Verification Phase: In this last phase, communicating parties confirm that they established
a shared secret key. If this step fails, the parties need to restart key establishment.
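The three phases above can be sketched in Python. This is an illustrative toy, not a concrete protocol from the literature: the mean-threshold quantiser with a guard band, the example RSSI values and the hash-based verification are all assumptions, and the information reconciliation step is omitted.

```python
import hashlib

def quantise(series, guard=2.0):
    """Quantise a time series of channel measurements (e.g., RSSI in dBm)
    into bits: values well above the mean become 1, well below become 0,
    and values inside the guard band are dropped to reduce disagreement."""
    mean = sum(series) / len(series)
    bits = []
    for v in series:
        if v > mean + guard:
            bits.append(1)
        elif v < mean - guard:
            bits.append(0)
    return bits

def verify(bits_a, bits_b):
    """Key verification via hash comparison (a real protocol would
    exchange MACs so the derived key itself is never revealed)."""
    digest = lambda bits: hashlib.sha256(bytes(bits)).hexdigest()
    return digest(bits_a) == digest(bits_b)

# Highly correlated measurements at Alice and Bob over the same packets
# (reciprocal channel); the values are invented for illustration:
alice_rssi = [-60.1, -71.3, -55.0, -69.8, -54.2, -72.5]
bob_rssi   = [-60.4, -71.0, -55.3, -70.1, -54.0, -72.2]
a_bits = quantise(alice_rssi)
b_bits = quantise(bob_rssi)
assert verify(a_bits, b_bits)  # both sides derive the same bit string
```

In practice the parties would also exchange the indices of the retained samples and run information reconciliation (e.g., error-correcting codes) before the verification phase.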
Most of the research in physical-layer techniques has been concerned with the choice of
channel properties and of the quantisation technique. Even if physical-layer key establishment
techniques seem attractive, many of them have been shown to be vulnerable to active, physi-
cally distributed and multi-antenna adversaries. However, in a number of scenarios where the
devices are mobile, and where the attacker is restricted, they can be a valuable replacement
or enhancement to traditional public-key key establishment techniques.

22.1.2 MIMO-supported Approaches: Orthogonal Blinding, Zero-Forcing

Initially, physical-layer key establishment techniques were proposed in the context of single-
antenna devices. However, with the emergence of MIMO devices and beam-forming, re-
searchers have proposed to leverage these new capabilities to further secure communication.
Two basic techniques that were proposed in this context are orthogonal blinding and zero
forcing. Both of these techniques aim to enable the transmitter to wirelessly send confidential
data to the intended receiver, while preventing the co-located attacker from receiving this
data. This might seem infeasible since, like the intended receiver, the attacker can receive all
transmitted packets. However, MIMO systems allow transmitters to ’steer’ the signal towards
the intended receiver. For beam-forming to be effective, the transmitter
needs to know some channel information for the channels from its antennas to the anten-
nas of the receiver. As described in [2167], these channels are considered to be secret from
the attacker. In Zero-Forcing, the transmitter knows the channels to the intended receiver
as well as to the attacker. This allows the transmitter to encode the data such that it can
be measured at the receiver, whereas the attacker measures nothing related to the data. In
many scenarios, assuming the knowledge of the channel to the attackers is unrealistic. In
Orthogonal Blinding, the transmitter does not know the channel to the attacker, but knows the
channels to the receiver. The transmitter then encodes the data in such a way that the receiver
can decode it, whereas the attacker will receive the data mixed with random noise. The
attacker therefore cannot decode the data. In order to communicate securely, the transmitter
and the receiver do not need to share any secrets. Instead, the transmitter only needs to know
(or measure) the channels to the intended receivers. Like physical-layer key establishment
techniques, these techniques have been shown to be vulnerable to multi-antenna and physically
distributed attackers. They were further shown to be vulnerable to known-plaintext attacks.
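A minimal numerical sketch of Zero-Forcing with NumPy follows. The two-antenna setup and the flat-fading channel values are invented for illustration: the precoder is chosen in the null space of the eavesdropper's channel, so the attacker's received sample is (numerically) zero while the intended receiver still obtains the symbol.

```python
import numpy as np

# Illustrative flat-fading channels from a 2-antenna transmitter
# (assumed values; in practice these would be estimated):
h_r = np.array([0.8 + 0.3j, -0.5 + 0.9j])  # channel to intended receiver
h_e = np.array([0.2 - 0.7j, 1.1 + 0.4j])   # channel to eavesdropper (assumed known)

# Zero-forcing precoder: start from the matched filter for the receiver
# and project out the component seen by the eavesdropper, so h_e @ w = 0.
v = h_r.conj()
w = v - h_e.conj() * (h_e @ v) / (np.linalg.norm(h_e) ** 2)

s = 1 + 1j            # one data symbol
x = w * s             # per-antenna transmit signals
y_receiver = h_r @ x  # non-zero: the receiver can decode s
y_attacker = h_e @ x  # ~0: the attacker measures nothing data-related

assert abs(y_attacker) < 1e-12
assert abs(y_receiver) > 1e-3
```

The construction only works if the channels are not collinear; Orthogonal Blinding replaces the known eavesdropper channel with random noise transmitted in the subspace orthogonal to the receiver's channel.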


22.1.3 Secrecy Capacity


Secrecy capacity is an information-theoretical concept that attempts to determine the maximal
rate at which a wireless channel can be used to transmit confidential information without
relying on higher-layer encryption, even if there is an eavesdropper present. A famous result by
Shannon [2169] says that, for an adversary with unbounded computing power, unconditionally
secure transmission can only be achieved if a one-time-pad cipher is used to encrypt the
transmitted information. However, Wyner later showed that if the attacker’s channel slightly
degrades the information, that is, the channel is noisy, the secrecy capacity can indeed be
positive under certain conditions [2170]. This means it is possible to convey a secret message
without leaking any information to an eavesdropper. Csiszár and Körner extended Wyner’s
result by showing that the secrecy capacity is non-zero, unless the adversary’s channel (wiretap
channel) is less noisy than the channel that carries the message from the legitimate transmitter
to the receiver [2171]. These theoretical results have been refined for concrete channel models
by assuming a certain type of noise (e.g., Gaussian) and channel layout (e.g., SIMO and MIMO).
Researchers have managed to derive explicit mathematical expressions and bounds even
when taking into account complex phenomena such as fading which is present in wireless
channels [2172].
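As a concrete illustration of these results (a textbook formula, stated here under the assumption of additive white Gaussian noise on both the main and the wiretap channel), the secrecy capacity of the Gaussian wiretap channel is the clipped difference of the two channel capacities:

```latex
C_s = \left[\log_2\!\left(1 + \mathrm{SNR}_{\mathrm{main}}\right)
          - \log_2\!\left(1 + \mathrm{SNR}_{\mathrm{eve}}\right)\right]^{+},
\qquad [x]^{+} = \max(x, 0)
```

Consistent with the Csiszár–Körner result, $C_s$ is positive exactly when the eavesdropper's channel is noisier than the legitimate channel.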
A practical implementation of the concept of secrecy capacity can mainly be achieved using
the two methods described above. Either the communicating parties establish a secret key by
extracting features from the wireless channel (see 22.1.1) or they communicate with each other
using intelligent coding and transmission strategies possibly relying on multiple antennas
(see 22.1.2). Therefore, the study of secrecy capacity can be understood as the information-
theoretical framework for key establishment and MIMO-supported security mechanisms in
the context of wireless communication.

22.1.4 Friendly Jamming


Similar to Orthogonal Blinding, Friendly Jamming schemes use signal interference generated
by collaborating devices to either prevent an attacker from communicating with the protected
device, or to prevent the attacker from eavesdropping on messages sent by protected devices.
Friendly Jamming can therefore be used for both confidentiality and access control. Unlike
Orthogonal Blinding, Friendly Jamming does not leverage knowledge of the channel to the
receiver. If a collaborating device (i.e., the friendly jammer) wants to prevent unauthorised
communication with the protected device it will jam the receiver of the protected device. If it
wants to prevent eavesdropping, it will transmit jamming signals in the vicinity of the protected
device. Preventing communication with a protected device requires no special assumptions on
the location of the collaborating devices. However, protecting against eavesdropping requires
that the eavesdropper is unable to separate the signals from the protected device from those
originating at the collaborating device. For this to hold, the channel from the protected device
to the attacker should not be correlated to the channel from the collaborating device to the
attacker. To ensure this, the protected device and the collaborating device typically need to
be placed less than half a carrier wavelength apart. This assumption is based on the fact that, in
theory, an attacker with multiple antennas who tries to tell apart the jamming signal from the
target signal requires the two transmitters to be separated by more than half a wavelength.
However, signal deterioration is gradual and it has been shown that under some conditions,
a multi-antenna attacker will be able to separate these signals and recover the transmitted
messages.
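The half-wavelength condition translates into very small distances at typical carrier frequencies; a quick back-of-the-envelope check (the example frequencies are chosen for illustration):

```python
C = 299_792_458  # speed of light in vacuum, m/s

def half_wavelength_m(carrier_hz):
    """Half a carrier wavelength: the separation scale below which the
    jammer's and the protected device's signals are hard to tell apart."""
    return C / (2 * carrier_hz)

# ~6.2 cm at a 2.4 GHz carrier, ~37 cm at 400 MHz
assert abs(half_wavelength_m(2.4e9) - 0.0625) < 1e-3
assert 0.36 < half_wavelength_m(400e6) < 0.38
```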


Friendly jamming was originally proposed for the protection of those medical implants (e.g.,
already implanted pacemakers) that have no abilities to perform cryptographic operations.
The main idea was that the collaborating device (i.e. ’the shield’) would be placed around the
user’s neck, close to the pacemaker. This device would then simultaneously receive and jam
all communication from the implant. The shield would then be able to forward the received
messages to any other authorised device using standard cryptographic techniques.

22.1.5 Using Physical Layer to Protect Data Integrity


Research into the use of physical layer for security is not only limited to the protection of data
confidentiality. Physical layer can also be leveraged to protect data integrity. This is illustrated
by the following scenario. Assuming that two entities (Alice and Bob) share a common radio
communication channel, but do not share any secrets or authentication material (e.g., shared
keys or authenticated public keys), how can the messages exchanged between these entities
be authenticated and how can their integrity be preserved in the presence of an attacker? Here,
by message integrity, we mean that the message must be protected against any malicious
modification, and by message authentication we mean that it should be clear who is the
sender of the message.
One basic technique that was proposed in this context is integrity codes, a modulation scheme
that provides a method of ensuring the integrity (and a basis for authentication) of a message
transmitted over a public channel. Integrity codes rely on the observation that, in a mobile
setting and in a multi-path rich environment, it is hard for the attacker to annihilate randomly
chosen signals.
Integrity codes assume a synchronised transmission between the transmitter and a receiver, as
well as the receiver being aware that it is in the range of the transmitter. To transmit a message,
the sender encodes the binary message using a unidirectional code (e.g., a Manchester code),
resulting in a known ratio of 1s and 0s within an encoded message (for Manchester code, the
number of 1s and 0s will be equal). This encoded message is then transmitted using on-off
keying, such that each 0 is transmitted as an absence of signal and each 1 as a random signal.
To decode the message and check its integrity, the receiver simply measures the energy of
the signal. If the energy in a time slot is above a fixed threshold, the bit is interpreted as a 1
and if it is below the threshold, it is interpreted as a 0. If the ratio of 1s and 0s corresponds to
the encoding scheme, the integrity of the message is validated. Integrity codes assume that
the receiver knows when the transmitter is transmitting. This means that their communication
needs to be scheduled or the transmitter needs to always be transmitting.
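The scheme can be sketched abstractly in Python. This is a toy model: slots are represented by their measured energy, the threshold value is an assumption, and a real receiver would integrate energy over noisy samples.

```python
def icode_encode(bits):
    """Manchester-encode the message, then map to on-off keying:
    chip 1 -> a slot carrying a random signal (high energy),
    chip 0 -> a silent slot (no transmission)."""
    chips = []
    for b in bits:
        chips += [1, 0] if b == 1 else [0, 1]
    return [1.0 if c else 0.0 for c in chips]  # per-slot received energy

SLOT_ENERGY_THRESHOLD = 0.5  # assumed fixed receiver threshold

def icode_decode(slot_energies):
    """Threshold each slot's energy, verify the Manchester invariant
    (equally many on- and off-slots, only valid chip pairs), then decode."""
    chips = [1 if e > SLOT_ENERGY_THRESHOLD else 0 for e in slot_energies]
    if 2 * sum(chips) != len(chips):
        raise ValueError("integrity violated: on/off ratio is wrong")
    bits = []
    for i in range(0, len(chips), 2):
        if (chips[i], chips[i + 1]) == (1, 0):
            bits.append(1)
        elif (chips[i], chips[i + 1]) == (0, 1):
            bits.append(0)
        else:
            raise ValueError("integrity violated: invalid chip pair")
    return bits

# An attacker can add energy to a silent slot but cannot reliably
# annihilate a random signal; any added energy breaks the invariant.
msg = [1, 0, 1, 1]
assert icode_decode(icode_encode(msg)) == msg
```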

22.1.6 Low probability of intercept and Covert Communication


Low Probability of Intercept (LPI) signals are signals that are difficult for an unintended recipient to detect. The
simplest form of LPI is communication at a reduced power and with high directionality. Since
such communication limits the range and the direction of communication, more sophisticated
techniques were developed: Frequency Hopping, Direct Sequence Spread Spectrum and
Chirping. In Frequency Hopping the sender and the receiver hop between different frequency
channels thus trying to avoid detection. In Direct Sequence Spread Spectrum the information
signal is modulated with a high rate (and thus high bandwidth) digital signal, thus spreading
across a wide frequency band. Finally, Chirps are high speed frequency sweeps that carry
information. The hopping sequence or chirp sequence constitutes a secret shared between

KA Physical Layer and Telecommunications Security | July 2021 Page 746



receiver and transmitter. This allows the legitimate receiver to recombine the signal while an
eavesdropper is unable to do so.
Covert communication is parasitic and leverages legitimate and expected transmissions
to enable unobservable communication. Typically, such communication hides within the
expected and tolerated deviations of the signal from its nominal form. One prominent example
is embedding of communicated bits within the modulation errors.

22.2 JAMMING AND JAMMING-RESILIENT COMMUNICATION
[2173, 2174]
Communication jamming is an interference that prevents the intended receiver(s) from suc-
cessfully recognising and decoding the transmitted message. It happens when the jammer
injects a signal which, when combined with the legitimate transmission, prevents the re-
ceiver from extracting the information contained in the legitimate transmission. Jamming
can be surgical and affect only the message preamble thus preventing decoding, or can be
comprehensive and aim to affect every symbol in the transmission.
Depending on their behaviour, jammers can be classified as constant or reactive. Constant
jammers transmit permanently, irrespective of the legitimate transmission. Reactive jammers
are the most agile as they sense for a transmission and then jam. This allows them to save energy
as well as to stay undetected. Jammer strength is typically expressed in terms of their output
power and their effectiveness as the jamming-to-signal ratio at the receiver. Beyond a certain
jamming-to-signal ratio, the receiver will not be able to decode the information contained in
the signal. This ratio is specific to particular receivers and communication schemes. The main
parameters that influence the success of jamming are transmission power of the jammer
and benign transmitter, their antenna gains, communication frequency, and their respective
distances to the benign receiver. These parameters will determine the jamming-to-signal ratio.
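The dependence on these parameters can be illustrated with a small link-budget sketch. It assumes the free-space (Friis) propagation model, under which the receiver's antenna gain and the wavelength terms cancel because both signals arrive at the same receiver on the same frequency; real channels add fading and multipath, so the numbers are illustrative only.

```python
import math

def jamming_to_signal_db(p_j, g_j, d_j, p_s, g_s, d_s):
    """Free-space jamming-to-signal ratio at the receiver, in dB.

    p_j, p_s : jammer / legitimate transmitter output power (W)
    g_j, g_s : their antenna gains (linear)
    d_j, d_s : their respective distances to the benign receiver (m)
    """
    # Received power scales with P * G / d^2 under the free-space model.
    js = (p_j * g_j / d_j**2) / (p_s * g_s / d_s**2)
    return 10 * math.log10(js)

# A 10 W jammer at 1 km vs. a 1 W transmitter at 100 m, equal antenna gains:
# J/S = 10 * (100 / 1000)^2 = 0.1, i.e. -10 dB at this receiver.
print(round(jamming_to_signal_db(10, 1, 1000, 1, 1, 100), 1))  # -10.0
```

Whether a given J/S value defeats reception depends on the receiver and the communication scheme, as noted above.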
Countermeasures against jamming involve concealing from the adversary which frequencies
are used for communication at which time. This uncertainty forces the adversary to jam a wider
portion of the spectrum and therefore weakens their impact on the legitimate transmission,
effectively reducing the jamming-to-signal ratio. Most common techniques include Chirp,
FHSS and DSSS. Typically, these techniques rely on pre-shared secret keys, in which case we
call the communication ’coordinated’. Recently, to enable jamming resilience in scenarios in
which keys cannot be pre-shared (e.g., broadcast), uncoordinated FHSS and DSSS schemes
were also proposed.


[Figure 22.1 — Uncoordinated Frequency Hopping (transmitter): M := m, sig(m), …; 1. Fragmentation into M1 … Ml; 2. Fragment linking (protects against insertion); 3. Packet encoding with ECC (protects against jamming); 4. Repeated transmission.]
Figure 22.1: In UFH, fragment linking protects against message insertion attacks.

22.2.1 Coordinated Spread Spectrum Techniques
Coordinated Spread Spectrum techniques are prevalent jamming countermeasures in a num-
ber of civilian and military applications. They are used not only to increase resilience to
jamming, but also to cope with interference from neighbouring devices. Spreading is used in
practically all wireless communication technologies, e.g., 802.11, cellular, Bluetooth, and global
satellite positioning systems.
Spread spectrum techniques are typically effective against jammers that cannot cover the
entire communication spectrum at all times. These techniques make a sender spread a
signal over the entire available band of radio frequencies, which might require a considerable
amount of energy. The attacker’s ability to impact the transmission is limited by the achieved
processing gain of the spread-spectrum communication. This gain is the ratio by which
interference can be suppressed relative to the original signal, and is computed as a ratio of the
spread signal radio frequency bandwidth to the un-spread information (baseband) bandwidth.
Spread-spectrum techniques use randomly generated sequences to spread information sig-
nals over a wider band of frequencies. The resulting signal is transmitted and then de-spread
at the receivers by correlating it with the spreading sequence. For this to work, it is essential
that the transmitter and receiver share the same secret spreading sequence. In FHSS, this
sequence is the set of central frequencies and the order in which the transmitter and receiver
switch between them in synchrony. In DSSS, the data signal is modulated with the spreading
sequence; this process effectively mixes the carrier signal with the spreading sequence, thus
increasing the frequency bandwidth of the transmitted signal. This process allows for both
narrow band and wide band jamming to be suppressed at the receiver. Unless the jammer
can guess the spreading code, its jamming signal will be spread at the receiver, whereas
the legitimate transmission will be de-spread, allowing for its detection. The secrecy of the
spreading codes is therefore crucial for the jamming resilience of spread spectrum systems.
This is why a number of civilian systems that use spreading with public spreading codes, such
as the GPS and 802.11b, remain vulnerable to jamming.
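A minimal sketch of DSSS-style spreading and despreading follows. The chip length of 31, the binary chip values, and the wideband random jammer are all assumptions for illustration; real systems modulate analogue carriers and use carefully engineered spreading sequences.

```python
import math
import random

random.seed(0)
code = [random.choice([-1, 1]) for _ in range(31)]   # secret spreading sequence

def spread(bits, code):
    # Each data bit (+1/-1) is multiplied by the whole chip sequence,
    # widening the occupied bandwidth by a factor of len(code).
    return [b * c for b in bits for c in code]

def despread(samples, code):
    # Correlate each chip-length window against the code: the legitimate data
    # de-spreads, while a signal not matching the code stays spread.
    n = len(code)
    return [1 if sum(s * c for s, c in zip(samples[i:i+n], code)) >= 0 else -1
            for i in range(0, len(samples), n)]

data = [1, -1, 1, 1, -1]
# A jammer that does not know the secret code adds an interfering signal.
jam = [random.choice([-1, 1]) for _ in range(len(data) * len(code))]
rx = [s + j for s, j in zip(spread(data, code), jam)]
assert despread(rx, code) == data
# Processing gain = spread bandwidth / baseband bandwidth = chip length here:
print(round(10 * math.log10(len(code)), 1), "dB")  # 14.9 dB
```

The correlation at the receiver accumulates the data contribution coherently (±31 per bit) while the jammer's contribution averages out, which is exactly the interference-suppression ratio described by the processing gain.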

22.2.2 Uncoordinated Spread Spectrum Techniques


In broadcast applications and in applications in which communication cannot be anticipated
as scheduled, there is still a need to protect such communication from jamming.
To address such scenarios, uncoordinated spread spectrum techniques were proposed: UFH
and UDSSS. These techniques enable anti-jamming broadcast communication without pre-
shared secrets. Uncoordinated Frequency Hopping relies on the fact that even if the sender
hops in a manner that is not coordinated with the receiver, the throughput of this channel will


be non-zero. In fact, if the receiver is broadband, it can recover all the messages transmitted by
the sender. UFH, however, introduces new challenges. Given that the sender and the receiver
are not synchronised, and short message fragments transmitted within each hop are not
authenticated, the attacker can inject fragments that make the reassembly of the packets
infeasible. To prevent this, UFH includes fragment linking schemes that make this reassembly
possible even under poisoning.
UDSSS follows the principle of DSSS in terms of spreading the data using spreading sequences.
However, in contrast to anti-jamming DSSS where the spreading sequence is secret and shared
exclusively by the communication partners, in UDSSS, a public set of spreading sequences is
used by the sender and the receivers. To transmit a message, the sender repeatedly selects a
fresh, randomly selected spreading sequence from the public set and spreads the message
with this sequence. Hence, UDSSS neither requires message fragmentation at the sender
nor message reassembly at the receivers. The receivers record the signal on the channel
and despread the message by applying sequences from the public set, using a trial-and-error
approach. The receivers are not synchronised to the beginning of the sender’s message and
thus record for (at least) twice the message transmission time. After the sampling, the receiver
tries to decode the data in the buffer by using code sequences from the set and by applying a
sliding-window protocol.
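The trial-and-error decoding idea can be sketched as follows. The code length, set size, and zero-padded recording buffer are invented parameters; a real UDSSS receiver samples an analogue channel and records for at least twice the message transmission time.

```python
import random

CODE_LEN, NUM_CODES = 15, 4
random.seed(1)
PUBLIC_SET = []          # public set of spreading sequences, known to everyone
while len(PUBLIC_SET) < NUM_CODES:
    c = [random.choice([-1, 1]) for _ in range(CODE_LEN)]
    if c not in PUBLIC_SET and [-x for x in c] not in PUBLIC_SET:
        PUBLIC_SET.append(c)

def udsss_send(bits):
    # The sender picks a fresh, randomly selected sequence from the public set.
    code = random.choice(PUBLIC_SET)
    return [b * c for b in bits for c in code]

def udsss_receive(buffer, msg_len):
    # Trial-and-error despreading with a sliding window: the receiver knows
    # neither the start time nor which public sequence the sender chose.
    for start in range(len(buffer) - msg_len * CODE_LEN + 1):
        for code in PUBLIC_SET:
            bits = []
            for i in range(msg_len):
                win = buffer[start + i*CODE_LEN : start + (i+1)*CODE_LEN]
                corr = sum(s * c for s, c in zip(win, code))
                if abs(corr) < CODE_LEN:    # imperfect despread: wrong guess
                    break
                bits.append(1 if corr > 0 else -1)
            else:
                return bits                 # every window despread cleanly
    return None

msg = [1, -1, -1, 1]
buffer = [0.0] * 7 + udsss_send(msg) + [0.0] * 5   # unknown start offset
assert udsss_receive(buffer, len(msg)) == msg
```

Note that the attacker also knows `PUBLIC_SET`; the scheme's jamming resilience comes from the sender's random, last-moment choice of sequence, not from secrecy of the set.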

22.2.3 Signal Annihilation and Overshadowing


Unlike jamming where the primary goal of the attacker is to prevent information from being
decoded at the receiver, signal annihilation suppresses the signal at the receiver by introduc-
ing destructive interference. The attacker’s goal is to insert a signal which cancels out the
legitimate transmitter’s signal at the antenna of the receiver. This typically means that the
attacker will generate a signal identical to the legitimate transmission only with a different
polarity. Jamming attacks typically increase the energy on the channel and thus are more
easily detected than signal annihilation which reduces the energy typically below the threshold
of signal detection.
The goal of overshadowing is similar to jamming and signal annihilation in the sense that
the attacker aims to prevent the receiver from decoding a legitimate signal. However, instead
of interfering with the signal by adding excessive noise to the channel or cancelling out the
signal (i.e., signal annihilation), the attacker emits their own signal at the same time and
overshadows the legitimate signal. As a result, the receiver only registers the adversarial
signal which is often orders of magnitude higher in amplitude than the legitimate signal.
Practical overshadowing attacks were shown to be effective against QPSK modulation [2175]
and more recently against cellular LTE systems [2176].
Malicious signal overshadowing can not only deceive the receiver into decoding different data
than intended, it can also be used to alter any physical properties the receiver may extract
during signal reception, such as angle of arrival or time of arrival. Overshadowing attacks have
been shown to be particularly effective against systems that rely on physical layer properties
including positioning and ranging systems.


22.3 PHYSICAL-LAYER IDENTIFICATION


[2177]
Physical-Layer Identification techniques enable the identification of wireless devices by unique
characteristics of their analogue (radio) circuitry; this type of identification is also referred to
as Radio Fingerprinting. More precisely, physical-layer device identification is the process of
fingerprinting the analogue circuitry of a device by analysing the device’s communication at
the physical layer for the purpose of identifying a device or a class of devices. This type of
identification is possible due to hardware imperfections in the analogue circuitry introduced at
the manufacturing process. These imperfections are remotely measurable as they appear in
the transmitted signals. While more precise manufacturing and quality control could minimise
such artefacts, it is often impractical due to significantly higher production costs.
Physical-layer device identification systems aim at identifying (or verifying the identity of)
devices or their affiliation classes, such as their manufacturer. Such systems can be viewed
as pattern recognition systems typically composed of: an acquisition setup to acquire signals
from devices under identification, also referred to as identification signals, a feature extraction
module to obtain identification-relevant information from the acquired signals, also referred
to as fingerprints, and a fingerprint matcher for comparing fingerprints and notifying the ap-
plication system requesting the identification of the comparison results. Typically, there are
two modules in an identification system: one for enrollment and one for identification. During
enrollment, signals are captured either from each device or each (set of) class-representative
device(s) considered by the application system. Fingerprints obtained from the feature extrac-
tion module are then stored in a database (each fingerprint may be linked with some form
of unique ID representing the associated device or class). During identification, fingerprints
obtained from the devices under identification are compared with reference fingerprints stored
during enrollment. The task of the identification module can be twofold: either recognise
(identify) a device or its affiliation class from among many enrolled devices or classes (1:N
comparisons), or verify that a device identity or class matches a claimed identity or class (1:1
comparison).
The identification module uses statistical methods to perform the matching of the fingerprints.
These methods are classifiers trained with Machine Learning techniques during the enrollment
phase. If the module has to perform a 1:1 comparison, the classifier is referred to as binary. It
tries to verify a newly acquired signal against a stored reference pattern established during
enrollment. If the classifier performs a 1:N comparison, on the other hand, it attempts to find
the reference pattern in a database that best matches the acquired signal. Often, these
classifiers are designed to return a list of candidates ranked according to a similarity metric
or likelihood that denotes the confidence for a match.
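A toy sketch of the two matching modes follows. The feature vectors (e.g., frequency offset, I/Q origin offset, phase error), the enrolled devices, and the fixed distance threshold are all invented for illustration; real systems use classifiers trained with machine-learning techniques rather than a hand-set threshold.

```python
import math

# Hypothetical enrolled fingerprints: one feature vector per device.
ENROLLED = {
    "device-A": (1.2, 0.031, 2.1),
    "device-B": (-0.4, 0.012, 3.5),
    "device-C": (2.9, 0.054, 1.0),
}

def similarity(fp1, fp2):
    # Negative Euclidean distance: higher means more similar.
    return -math.dist(fp1, fp2)

def identify(observed, threshold=-1.0):
    """1:N identification: rank all enrolled devices by similarity and
    return the best candidate if it is confident enough."""
    ranked = sorted(ENROLLED.items(),
                    key=lambda kv: similarity(observed, kv[1]), reverse=True)
    best_id, best_fp = ranked[0]
    return best_id if similarity(observed, best_fp) >= threshold else None

def verify(claimed_id, observed, threshold=-1.0):
    """1:1 verification: accept or reject a claimed identity."""
    return similarity(observed, ENROLLED[claimed_id]) >= threshold

print(identify((1.1, 0.030, 2.2)))            # device-A
print(verify("device-C", (1.1, 0.030, 2.2)))  # False
```

The `ranked` list mirrors the candidate list mentioned above: candidates ordered by a similarity score that denotes the confidence of a match.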


22.3.1 Device under Identification


Physical-layer device identification is based on fingerprinting the analogue circuitry of devices
by observing their radio communication. Consequently, any device that uses radio communi-
cation may be subject to physical-layer identification. So far, it has been shown that a number
of devices (or classes of devices) can be identified using physical-layer identification. These
include analogue VHF, Bluetooth, WiFi, RFID and other radio transmitters.
Although it is known that what enables a device or a class of devices to be uniquely identified
among other devices or classes is the set of imperfections introduced during the manufacturing
of the analogue circuitry, the actual components causing these imperfections have
not always been clearly identified. For example, VHF identification systems are
based on the uniqueness of transmitters’ frequency synthesisers (local oscillators), while in
RFID systems some studies only suggested that the proposed identification system may rely
on imperfections caused by the RFID device’s antennas and charge pumps. Identifying the
exact components may become more difficult when considering relatively-complex devices. In
these cases, it is common to attribute the cause of imperfections to the whole analogue circuitry
or to a specific sub-circuit. For example, IEEE 802.11 transceivers were identified considering
modulation-related features; the cause of hardware artefacts can be then located in the mod-
ulator subcircuit of the transceivers. Knowing the components that make devices uniquely
identifiable may have relevant implications for both attacks and applications, which makes
the investigation of such components an important open problem and research direction.

22.3.2 Identification Signals


Considering devices communicating through radio signals, that is, sending data according to
some defined specification and protocol, identification at the physical layer aims at extracting
unique characteristics from the transmitted radio signals and to use those characteristics to
distinguish among different devices or classes of devices. We define identification signals as
the signals that are collected for the purpose of identification. Signal characteristics are mainly
based on observing and extracting information from the properties of the transmitted signals,
like amplitude, frequency, or phase over a certain period of time. These time-windows can
cover different parts of the transmitted signals. Mainly, we distinguish between data and non-
data related parts. The data parts of signals directly relate to data (e.g., preamble, midamble,
payload) transmission, which leads to considered data-related properties such as modulation
errors, preamble (midamble) amplitude, frequency and phase, spectral transformations. Non-
data-related parts of signals are not associated with data transmission. Examples include the
turn-on transients, near-transient regions, and RF burst signals. These have been used to identify
active wireless transceivers (IEEE 802.11, 802.15.4) and passive transponders (ISO 14443 HF
RFID).
The characteristics extracted from identification signals are called features. Those can be
predefined or inferred. Predefined features relate to well-understood signal characteristics.
Those can be classified as in-specification and out-specification. Specifications are used for
quality control and describe error tolerances. Examples of in-specification characteristics
include modulation errors such as frequency offset, I/Q origin offset, magnitude and phase
errors, as well as time-related parameters such as the duration of the response. Examples of
out-specification characteristics include clock skew and the duration of the turn-on transient.
Unlike predefined features, where the considered characteristics are known prior to record-
ing the signals, features are said to be inferred when they are extracted


from signals, for example, by means of some spectral transformations such as Fast Fourier
Transform (FFT) or Discrete Wavelet Transform (DWT), without a-priori knowledge of a spe-
cific signal characteristic. For instance, wavelet transformations have been applied on signal
turn-on transients and different data-related signal regions. The Fourier transformation has
also been used to extract features from the turn-on transient and other technology-specific
device responses. Both predefined and inferred features can be subject to further statistical
analysis in order to improve their quality and distinguishing power.

22.3.3 Device Fingerprints


Fingerprints are sets of features (or combinations of features) that are used to identify devices.
The properties that fingerprints need to present in order to achieve practical implementations
are (similar to those of biometrics):
1. Universality. Every device (in the considered device-space) should have the considered
features.
2. Uniqueness. No two devices should have the same fingerprints.
3. Permanence. The obtained fingerprints should be invariant over time.
4. Collectability. It should be possible to capture the identification signals with existing
(available) equipment.
When considering physical-layer identification of wireless devices, we further consider:
5. Robustness. Fingerprints should not be subject, or at least, they should be evaluated with
respect to external environmental aspects that directly influence the collected signal like
radio interference due to other radio signals, surrounding materials, signal reflections,
absorption, etc., as well as positioning aspects like the distance and orientation between
the devices under identification and the identification system. Furthermore, fingerprints
should be robust to device-related aspects like temperature, voltage level, and power
level. Many types of robustness can be acceptable for a practical identification system.
Generally, obtaining robust features helps in building more reliable identification systems.
6. Data-Dependency. Fingerprints can be obtained from features extracted from a specific
bit pattern (data-related part of the identification signal) transmitted by a device under
identification (e.g., the claimed ID sent in a packet frame). This dependency has partic-
ularly interesting implications if the fingerprints can be associated with both devices
and data transmitted by those devices. This might strengthen authentication and help
prevent replay attacks.


22.3.4 Attacks on Physical Layer Identification


The large majority of research works have focused on exploring feature extraction and match-
ing techniques for physical-layer device identification. Only recently the security of these
techniques started being addressed. Different studies showed that their identification system
may be vulnerable to hill-climbing attacks if the set of signals used for building the device
fingerprint is not carefully chosen. This attack consists of repeatedly sending signals to the
device identification system with modifications that gradually improve the similarity score be-
tween these signals and a target genuine signal. They also demonstrated that transient-based
approaches could easily be disabled by jamming the transient part of the signal while still
enabling reliable communication. Furthermore, impersonation attacks on modulation-based
identification techniques were developed and showed that low-cost software-defined radios
as well as high end signal generators could be used to reproduce modulation features and
impersonate a target device with a success rate of 50-75%. Modulation-based techniques are
vulnerable to impersonation with high accuracy, while transient-based techniques are likely to
be compromised only from the location of the target device. The authors pointed out that this
is mostly due to presence of wireless channel effects in the considered device fingerprints;
therefore, the channel needed to be taken into consideration for successful impersonation.
Generally, these attacks can be divided into two groups: signal replay and feature replay
attacks. In a signal replay attack, the attacker’s goal is to observe analogue identification
signals of a target device, capture them in a digital form (digital sampling), and then transmit
(replay) these signals towards the identification system by some appropriate means. The
attacker does not modify the captured identification signals, that is, the analogue signal and
the data payload are preserved. This attack is similar to message replay in the Dolev-Yao
model in which an attacker can observe and manipulate information currently in the air at
will. Unlike in signal replay attacks, where the goal of the attack is to reproduce the captured
identification signals in their entirety, a feature replay attack creates, modifies or composes
identification signals that reproduce only the features considered by the identification system.
The analogue representation of the forged signals may be different, but the features should
be the same (or at the least very similar).

22.4 DISTANCE BOUNDING AND SECURE POSITIONING


[2178, 2179, 2180, 2181, 2182, 2183, 2184, 2185]
Secure distance measurement (i.e., distance bounding) protocols were proposed to address
the issue of the verification of proximity between (wireless) devices. Their use is broad and
ranges from the prevention of relay attacks to enabling secure positioning.
Securing distance measurement requires secure protocols on the logical layer and a distance
measurement technique resilient to physical layer attacks. To attack distance measurement,
an attacker can exploit both data-layer as well as physical-layer weaknesses of distance mea-
surement techniques and protocols. Data-layer attacks can be, to a large extent, prevented by
implementing distance bounding protocols. However, physical-layer attacks are of significant
concern as they can be executed independently of any higher-layer cryptographic primitive
that is implemented.


22.4.1 Distance Bounding Protocols


Secure distance measurement protocols aim at preventing distance shortening and enlarge-
ment attacks. When they only prevent distance shortening, they are also called distance
bounding protocols, where at the end of the protocol a secure upper bound on the distance is
calculated. These protocols are typically executed with different trust assumptions. Devices
measuring the distance (typically named verifier and prover) can be mutually trusted, in which
case the protocol aims at preventing distance manipulation by an external attacker. If one
of the devices, the prover, is untrusted, it will try to manipulate the measured distance. Other
scenarios include the untrusted prover being helped by third parties to cheat on its distance.
Distance bounding literature describes four main types of attacks (’frauds’) corresponding to
the above scenarios: distance fraud, mafia fraud, terrorist fraud and distance hijacking.
First investigations of distance bounding protocols started with the work of Beth and
Desmedt [2179], and by Brands and Chaum [2180]. These protocols, as well as many that
followed, are designed as cryptographic challenge-response protocols with round-trip time-of-flight
measurements. One of the key insights of Brands and Chaum was to minimise the processing
at the prover so that the prover cannot cheat on its distance to the verifier. Namely, this
protocol requires that the prover only computes single bit XOR during the time-critical phase
of the protocol. This translates into strong security guarantees as long as the prover cannot
implement a faster XOR than assumed by the verifier. Hancke and Kuhn [2186] proposed an
alternative protocol that uses register selection as a prover processing function. This design
reduces the number of protocol steps by allowing the verifier and the prover to pre-agree on
the nonces that will be used in the protocol exchange. Many protocols followed these two
designs, notably addressing other types of frauds (especially terrorist fraud), as well as the
robustness to message loss, performance in terms of protocol execution time, and privacy of
distance measurement.
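The rapid bit exchange at the core of a Brands–Chaum style protocol can be sketched as follows. This is an idealised model: the commitment phase, the final signed transcript, and real clock measurement are abstracted away, and the prover's XOR processing delay is assumed to be zero, so the numbers are illustrative only.

```python
import secrets

C = 299_792_458  # speed of light, m/s

def rapid_bit_exchange(prover_bits, one_way_delay_ns, n_rounds=32):
    """Return a secure upper bound (m) on the prover's distance."""
    max_rtt_ns = 0.0
    for i in range(n_rounds):
        challenge = secrets.randbits(1)          # fresh random challenge bit
        # The prover computes only a single-bit XOR in the time-critical phase;
        # the verifier would check this response against the committed bits.
        response = challenge ^ prover_bits[i % len(prover_bits)]
        max_rtt_ns = max(max_rtt_ns, 2 * one_way_delay_ns)   # out + back
    # Signals cannot travel faster than light, so the prover is no further
    # than half the worst-case round trip times c.
    return C * (max_rtt_ns * 1e-9) / 2

# A prover ~30 m away (one-way propagation delay ~100 ns):
bound_m = rapid_bit_exchange([1, 0, 1, 1], one_way_delay_ns=100.0)
print(round(bound_m, 1))  # 30.0
```

The security argument is visible in the arithmetic: a prover can delay its response (enlarging the bound) but, as long as it cannot compute the XOR faster than assumed, it cannot reply early and shorten it.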

22.4.2 Distance Measurement Techniques


Establishing proximity requires estimating the physical distance between two or more wireless
entities. Typically, the distance is estimated either by observing the changes in the signal’s
physical properties (e.g., amplitude, phase) that occur as the signal propagates or by estimat-
ing the time taken for the signal to travel between the entities.
A radio signal experiences a loss in its signal strength as it travels through the medium.
The amount of loss or attenuation in the signal’s strength is proportional to the square of
the distance travelled. The distance between the transmitter and the receiver can therefore
be calculated based on the free space path loss equation. In reality, the signal experiences
additional losses due to its interaction with the objects in the environment which are difficult
to account for accurately. This directly affects the accuracy of the computed distance and
therefore advanced models such as the Rayleigh fading and log-distance path loss models are
typically used to improve the distance estimation accuracy. Bluetooth-based proximity sensing
tags (e.g., Apple iBeacon) and Passive Keyless Entry and Start systems use the strength of
the received Bluetooth signal also referred to as the Received Signal Strength Indicator (RSSI)
value as a measure of proximity.
Alternatively, the devices can measure the distance between them by estimating the phase
difference between a received continuous wave signal and a local reference signal. The need
for keeping track of the number of whole cycles elapsed is eliminated by using signals of
different frequencies, typically referred to as multi-carrier phase-based ranging. Due to its low


complexity and low power consumption, phase-based ranging is used in several commercial
products.
Finally, the time taken for the radio waves to travel from one point to another can be used
to measure the distance between the devices. In RF-based RTT distance estimation,
the distance d between two entities is given by d = (t_rx − t_tx) × c, where c is the speed
of light, and t_tx and t_rx represent the times of transmission and reception, respectively. The
measured time-of-flight can either be a one-way time-of-flight or a round-trip time-of-flight. One-way
time-of-flight measurement requires the clocks of the measuring entities to be tightly synchro-
nised. The errors due to mismatched clocks are compensated in the round-trip time-of-flight
measurement.
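The time-of-flight relations above can be written out directly (the timing values below are illustrative):

```python
C = 299_792_458  # speed of light, m/s

def one_way_distance(t_tx, t_rx):
    # d = (t_rx - t_tx) * c; requires tightly synchronised clocks at both ends.
    return (t_rx - t_tx) * C

def round_trip_distance(rtt_s, turnaround_s=0.0):
    # Round-trip ToF cancels the clock offset between the two devices; the
    # prover's (known, agreed) turnaround time is subtracted before halving.
    return (rtt_s - turnaround_s) * C / 2

# A 1 ns timing error maps to roughly 30 cm of distance error:
print(round(one_way_distance(0.0, 1e-9), 2))             # 0.3 (metres)
# A ~150 m target: RTT of ~1.5007 us with a 0.5 us turnaround at the prover.
print(round(round_trip_distance(1.5007e-6, 0.5e-6), 1))  # 150.0
```

The 1 ns → ~30 cm conversion is the same one used later in the discussion of time-of-flight attacks.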
The precise distance measurement largely depends on the system’s ability to estimate the
time of arrival and the physical characteristics of the radio frequency signal itself. The ranging
precision is roughly proportional to the bandwidth of the ranging signal. Depending on the
required level of accuracy, time-of-flight based distance measurement systems use either
Impulse-Radio Ultra Wideband (IR-UWB) or Chirp-Spread Spectrum (CSS) signals. IR-UWB
systems provide centimeter-level precision while the precision of CSS systems is of the order
of 1–2m. There are a number of commercially available wireless systems that use chirp and
UWB round-trip time-of-flight for distance measurement today.

22.4.3 Physical Layer Attacks on Secure Distance Measurement


With the increasing availability of low-cost software-defined radio systems, an attacker can
eavesdrop, modify, compose, and (re)play radio signals with ease. This means that the at-
tacker has full control of the wireless communication channel and therefore is capable of
manipulating all messages transmitted between the two entities. In RSSI-based distance
estimation, an attacker can manipulate the measured distance by manipulating the received
signal strength at the verifier. The attacker can simply amplify the signal transmitted by the
prover before relaying it to the verifier. This will result in an incorrect distance estimation at
the verifier. Commercially available solutions claim to secure against relay attacks by simply
reducing or attenuating the power of the transmitted signal. However, an attacker can trivially
circumvent such countermeasures by using higher gain amplifiers and receiving antennas.
Similarly, an attacker can also manipulate the estimated distance between the verifier and the
prover in systems that use the phase or frequency property of the radio signal. For instance, the
attacker can exploit the maximum measurable property of phase or frequency-based distance
measurement systems and execute distance reduction attacks. The maximum measurable
distance, i.e., the largest value of distance dmax that can be estimated using a phase-based
proximity system, directly depends on the maximum measurable phase. Given that the phase
value ranges from 0 to 2π and then rolls over, the maximum measurable distance also rolls over
after a certain value. An attacker can leverage this maximum measurable distance property
of the system in order to execute the distance decreasing relay attack. During the attack, the
attacker simply relays (amplifies and forwards) the verifier’s interrogating signal to the prover.
The prover determines the phase of the interrogating signal and re-transmits a response signal
that is phase-locked with the verifier’s interrogating signal. The attacker then receives the
prover’s response signal and forwards it to the verifier, however with a time delay. The attacker
chooses the time delay such that the measured phase difference reaches its maximum value
of 2π and rolls over. In other words, the attacker is able to prove to the verifier that the prover
is in close proximity (e.g., 1m away) even though the prover was far from the verifier.
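A hedged numerical model of this rollover makes the attack concrete. The two-carrier setup, the 1 MHz tone spacing, and the idealised round-trip phase measurement are assumptions; real multi-carrier rangers combine many tone pairs.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def measured_distance(true_distance_m, attacker_delay_s=0.0, delta_f_hz=1e6):
    """Two-carrier phase ranging (hypothetical model): the round-trip phase
    difference between two tones spaced delta_f apart is
        dphi = 4*pi*delta_f*d / c   (mod 2*pi),
    so the maximum unambiguous distance is d_max = c / (2*delta_f)."""
    # A relay delay inserted by the attacker adds apparent distance *before*
    # the phase wraps modulo 2*pi.
    apparent = true_distance_m + attacker_delay_s * C / 2
    dphi = (4 * math.pi * delta_f_hz * apparent / C) % (2 * math.pi)
    return C * dphi / (4 * math.pi * delta_f_hz)

d_max = C / (2 * 1e6)   # ~149.9 m for a 1 MHz tone spacing
print(round(measured_distance(100.0), 1))                         # 100.0
# Delaying the prover's response by 400 ns pushes the phase past 2*pi, so a
# prover that is really 100 m away appears to be only ~10 m away:
print(round(measured_distance(100.0, attacker_delay_s=4e-7), 1))  # 10.1
```

Counter-intuitively, the attacker *adds* delay to *reduce* the measured distance: everything beyond d_max wraps around to a small value.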


In Time of Flight (ToF) based ranging systems, the distance is estimated based on the time
elapsed between the verifier transmitting a ranging packet and receiving an acknowledgement
back from the prover. In order to reduce the distance measured, an attacker must decrease
the signal’s round trip time of flight. Based on the implementation, an attacker can reduce the
estimated distance in a time-of-flight based ranging system in more than one way. Given that
the radio signals travel at a speed of light, a 1 ns decrease in the time estimate can result in a
distance reduction of 30cm.
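The 30 cm figure follows directly from the speed of light; a minimal sketch of the arithmetic:

```python
C = 299_792_458.0  # speed of light (m/s)

def distance_reduction(time_decrease_s):
    """Distance gained by shaving time off the estimated one-way flight time."""
    return C * time_decrease_s

print(round(distance_reduction(1e-9), 2))  # 0.3 m per nanosecond
```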
The first type of attack on time-of-flight ranging leverages the predictable nature of the data
contained in the ranging and the acknowledgement packets. A number of time-of-flight ranging
systems use pre-defined data packets for ranging, making it trivial for an attacker to predict
and generate their own ranging or acknowledgment signal. An attacker can transmit the
acknowledgment packet even before receiving the challenge ranging packet. Several works
have shown that the de facto standard for IR-UWB, IEEE 802.15.4a, does not automatically
provide security against distance decreasing attacks. In [2187] it was shown that an attacker
can potentially decrease the measured distance by as much as 140 meters by predicting the
preamble and payload data with more than 99% accuracy even before receiving the entire
symbol. In a ’Cicada’ attack, the attacker continuously transmits a pulse with a power greater
than that of the prover. This degrades the performance of energy detection based receivers,
resulting in reduction of the distance measurements. In order to prevent such attacks it is
important to avoid predefined or fixed data during the time critical phase of the distance
estimation scheme.
In addition to having the response packet dependent on the challenge signal, the way in
which these challenge and response data are encoded in the radio signals affects the security
guarantees provided by the ranging or localisation system. An attacker can predict the bit
(early detect) even before receiving the symbol completely. Furthermore, the attacker can
leverage the robustness of modern receivers and transmit an arbitrary signal until the
correct symbol is predicted. Once the bit is predicted (early detection), the attacker stops
transmitting the arbitrary signal and switches to transmitting the bit corresponding to the
predicted symbol, i.e., the attacker ’commits’ to the predicted symbol, commonly known as
late commit. In such a scenario, the attacker need not wait for the entire series of pulses to be
received before detecting the data being transmitted: after only a fraction of the symbol
duration, the attacker is able to correctly predict the symbol.
As described previously, round-trip time-of-flight systems are implemented either using chirp or
impulse radio ultrawideband signals. Due to their long symbol lengths, both implementations
have been shown to be vulnerable to early-detect and late-commit attacks. In the case of
chirp-based systems, an attacker can decrease the distance by more than 160 m and in some
scenarios even up to 700 m. Although IR-UWB pulses are of short duration (typically 2–3 ns
long), data symbols are typically composed of a series of UWB pulses. Furthermore, IEEE
802.15.4a IR-UWB standard allows long symbol lengths ranging from 32 ns to as large as 8µs.
Therefore, even the smallest symbol length of 32 ns allows an attacker to reduce the distance
by as much as 10 m by performing early-detect and late-commit attacks. Thus, it is clear that
in order to guarantee proximity and secure a wireless proximity system against early detect
and late-commit attacks, it is necessary to keep the symbol length as short as possible.
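The relation between symbol length and attacker gain can be sketched with a back-of-the-envelope bound. This is a simplification that ignores receiver details: an attacker that classifies each symbol after a fraction of its duration can advance its response by the remaining symbol time, and the corresponding distance gain is that time multiplied by the speed of light.

```python
C = 299_792_458.0  # speed of light (m/s)

def edlc_max_gain_m(symbol_len_s, detect_fraction=0.0):
    """Upper bound on the distance reduction from early-detect/late-commit:
    the attacker gains up to the unneeded remainder of each symbol."""
    return C * symbol_len_s * (1.0 - detect_fraction)

# IEEE 802.15.4a allows symbol lengths from 32 ns up to 8 us:
print(round(edlc_max_gain_m(32e-9), 1))  # ~9.6 m, in line with the ~10 m figure
print(round(edlc_max_gain_m(8e-6)))      # ~2398 m upper bound for the longest symbols
```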
Design of a physical layer for secure distance measurement remains an open topic. However,
research so far has yielded some guiding principles for its design. Only radio RTT with single-
pulse or multi-pulse UWB modulation has been shown to be secure against physical layer
attacks. As a result, the IEEE 802.15.4z working group started the standardization of a new
physical layer for UWB secure distance measurement.

Figure 22.2: If the computed location of the prover is in the verification triangle, the verifiers
conclude that this is a correct location. To spoof the position of the prover inside the triangle,
the attacker would need to reduce at least one of the distance bounds.


The first attempt at formalizing the requirements for secure distance measurement based
on the Time of Arrival (ToA) of transmitted messages can be found in [2185]. Said work
presents a formal definition of Message Time of Arrival Codes (MTACs), the core primitive in
the construction of systems for secure ToA measurement. If implemented correctly, MTACs
provide the ability to withstand reduction and enlargement attacks on distance measurements.
It is shown that systems based on UWB modulation can be implemented such that the stated
security requirements are met and therefore constitute examples of MTAC schemes.

22.4.4 Secure Positioning


Secure positioning systems allow positioning anchors (also called verifiers) to compute the
correct position of a node (also called the prover) or allow the prover to determine its own
position correctly despite manipulations by the attacker. This means that the attacker cannot
convince the verifiers or the prover that the prover is at a position that is different from its
true position. This is also called spoofing-resilience. A related property is the one of secure
position verification which means that the verifiers can verify the position of an untrusted
prover. It is generally assumed that the verifiers are trusted. No restrictions are placed on the
attacker, as it fully controls the communication channel between the provers and the verifiers.
The analysis of broadcast positioning techniques such as GNSS has shown that such tech-
niques are vulnerable to spoofing if the attacker controls the signals at the antenna of the
GNSS receiver.
Two types of approaches have been proposed to address this issue: Verifiable Multilateration
and Secure Positioning based on Hidden Stations.
Verifiable Multilateration relies on secure distance measurement / distance bounding. It
consists of distance bound measurements to the prover from at least three verifiers (in 2D)
and four verifiers (in 3D) and of subsequent computations performed by the verifiers or
by a central system. Verifiable Multilateration has been proposed to address both secure
positioning and position verification. In the case of secure positioning, the prover is trusted
and mafia-fraud-resilient distance bounding is run between the prover and each of the verifiers.
The verifiers form verification triangles / triangular pyramids (in 3D) and verify the position of
the prover within the triangle / pyramid. For the attacker to spoof a prover from position P to
P’ within a triangle/pyramid, the attacker would need to reduce at least one of the distance
bounds that are measured to P. This follows from the geometry of the triangle/pyramid. Since
distance bounding prevents distance reduction attacks, Verifiable Multilateration prevents
spoofing attacks within the triangle/pyramid. The attacker can only spoof P to P’ that is
outside of the triangle/pyramid, causing the prover and the verifiers to reject the computed
position. Namely, the verifiers and the prover only accept the positions that are within the area
of coverage, defined as the area covered by the verification triangles/pyramids. Given this,
when the prover is trusted, Verifiable Multilateration is resilient to all forms of spoofing by the
attacker. Additional care needs to be given to the management of errors and the computation
of the position when distance measurement errors are taken into account.
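The position computation and the in-triangle check can be sketched as follows. This is a 2D sketch under the assumption of exact, noise-free distance bounds; the verifier coordinates and the prover position are made up for illustration.

```python
import math

def trilaterate(p1, p2, p3, d1, d2, d3):
    """2-D trilateration: subtracting the three circle equations pairwise
    gives two linear equations in (x, y), solved here by Cramer's rule."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    A = 2 * (x2 - x1); B = 2 * (y2 - y1)
    K1 = d1**2 - d2**2 - x1**2 + x2**2 - y1**2 + y2**2
    D = 2 * (x3 - x2); E = 2 * (y3 - y2)
    K2 = d2**2 - d3**2 - x2**2 + x3**2 - y2**2 + y3**2
    det = A * E - B * D
    return ((K1 * E - B * K2) / det, (A * K2 - K1 * D) / det)

def inside_triangle(p, a, b, c):
    """Sign test: p is inside (or on) triangle abc if it lies on the same
    side of all three edges."""
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])
    s1, s2, s3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    return (s1 >= 0 and s2 >= 0 and s3 >= 0) or (s1 <= 0 and s2 <= 0 and s3 <= 0)

verifiers = [(0.0, 0.0), (100.0, 0.0), (50.0, 90.0)]
prover = (40.0, 30.0)
bounds = [math.dist(prover, v) for v in verifiers]   # ideal distance bounds

pos = trilaterate(*verifiers, *bounds)
accept = inside_triangle(pos, *verifiers)            # True: position verified
```

Since distance bounding only prevents reductions, an attacker can enlarge individual bounds, but any consistent enlargement pushes the computed position outside the verification triangle, where it is rejected.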
When used for position verification, Verifiable Multilateration is run with an untrusted prover.
Each verifier runs a distance-fraud resilient distance bounding protocol with the prover. Based
on the obtained distance bounds, the verifiers compute the provers’ position. If this position
(within some distance and position error bounds) falls within the verification triangle/pyramid,
the verifiers accept it as valid. Given that the prover is untrusted, it can enlarge any of the
measured distances, but cannot reduce them since this is prevented by the use of distance
bounding protocols. Like in the case of secure positioning, the geometry of the triangle/pyramid
then prevents the prover from claiming a false position. Unlike in the case of secure positioning,
position verification is vulnerable to cloning attacks, in which the prover shares its key with its
clones. These clones can then be strategically placed close to the verifiers and fake any position
by enlarging the distances to each individual verifier. This attack can possibly be addressed by
tamper-resistant hardware or device fingerprinting.
Another approach to secure positioning and position verification is to prevent the attacker
from deterministically spoofing the computed position by making the positions of the verifiers
unpredictable for the attacker (either a malicious prover or an external attacker). Verifier
positions can therefore be hidden or the verifiers can be mobile. When the verifiers are hidden,
they should only listen to the beacons sent by the nodes so as not to disclose their positions. Upon
receiving the beacons, the base stations compute the node’s location with TDOA and check if
this location is consistent with the measured time differences.

22.5 COMPROMISING EMANATIONS AND SENSOR SPOOFING
[1953, 1998, 1999, 2188, 2189, 2190, 2191, 2192, 2193]
Electronic devices emit electromagnetic waves and audio signals, produce
heat and create vibrations, all of which can correlate with confidential information that the
devices process or store. Such emanations, more generally referred to as side channels,
are prevalent and have been extensively studied.
Remote sensor spoofing is the (physical) opposite of compromising emanations. Instead of
eavesdropping on electromagnetic leakage, an attacker injects signals that spoof the value
measured by a sensor or receiver and thereby (adversely) affects the system relying on the
sensor readings and measurements. This is particularly critical in autonomous and other
cyber-physical systems that have direct consequences on the safety of the surrounding people
and infrastructure.


22.5.1 Compromising Emanations


In the military context, techniques for exploiting and protecting against unwanted emissions
in communication systems date back to World War II and have over time been
collected under the umbrella term TEMPEST. The first public demonstration of low-cost
attacks on commercial systems using compromising emanations was done in 1985 by Wim
van Eck [2194]. This attack demonstrated that information displayed on CRT monitors can
be successfully eavesdropped from a distance of hundreds of meters. This demonstration
prompted research into the sources of such emanations as well as into protective measures.
It also highlighted that not only radio emissions leak information. In general, there are four
categories of such emanations: acoustic, optical, thermal, and electromagnetic.
Detailed studies of the sources and features that lead to such compromises have been
carried out over the years, and on multiple occasions, it was demonstrated that compromising
emanations from analogue and digital displays resulted from information being transmitted
through analogue video cables and through high-speed Digital Visual Interface (DVI) cables.
However, more recent works show that such emanations are not restricted to cables and, to
aggravate the situation, compromising emissions are not necessarily caused by analogue or
digital displays only.
Some attacks described in research showed that high-frequency sounds caused by vibration of
electronic components (capacitors and coils) in the computer’s voltage regulation circuit can
be used to infer prime factors and therefore derive RSA encryption keys. Sounds emanating
from key presses on a keyboard were used to infer what a user is typing. The resulting
vibrations can, for instance, be sensed by the accelerometer of a phone located nearby. Finally,
reflections from different objects in the vicinity of computer screens, such as spoons, bottles
and the user’s retina, were used to infer information shown on a display.
The increasing availability of phones that integrate high quality sensors, such as cameras,
microphones and accelerometers makes it easier to mount successful attacks since no
dedicated sensor equipment needs to be covertly put in place.
To avoid unwanted signal emissions, devices can be held at a distance, can be shielded and
signals that are transmitted should be filtered in order to remove high-frequency components
that might reflect switching activity in the circuitry. Moreover, it is generally advised to place a
return wire close to the transmission wire in order to avoid exploitation of the return current.
In general, wires and communication systems bearing confidential information should be
separated (air-gapped) from non-confidential systems.

22.5.2 Sensor Compromise


Analogue sensors have been shown to be particularly vulnerable to spoofing attacks. Similar to
compromising emanations, sensor spoofing depends on the type of the physical phenomena
the sensor captures. It can be acoustic, optical, thermal, mechanical or electromagnetic.
Nowadays, many electronic devices, including self-driving cars, medical devices and closed-
loop control systems, feature analogue sensors that help observe the environment and make
decisions in a fully autonomous way. These systems are equipped with sophisticated pro-
tection mechanisms to prevent unauthorised access or compromise via the device’s com-
munication interfaces, such as encryption, authentication and access control. Unfortunately,
when it comes to data gathered by sensors, the same level of protection is often not available
or difficult to achieve since adversarial interactions with a sensor can be hard to model and
predict. As a result, unintentional and especially intentional EMI targeted at analogue sensors
can pose a realistic threat to any system that relies on readings obtained from an affected
sensor.
EMI has been used to manipulate the output of medical devices as well as to compromise
ultrasonic ranging systems. Research has shown that consumer electronic devices equipped
with microphones are especially vulnerable to the injection of fabricated audio signals [1999].
Ultrasonic signals were used to inject silent voice commands, and acoustic waves were used
to affect the output of MEMS accelerometers. Accelerometers and inertial systems based
on MEMS are, for instance, used extensively in (consumer-grade) drones and multi-copters.
Undoubtedly, sensor spoofing attacks have gained a lot of attention and will likely impact
many future cyber-physical devices. System designers therefore have to take great care
and protect analogue sensors from adversarial input as an attacker might trigger a critical
decision on the application layer of such a device by exposing it to intentional EMI. Potential
defence strategies include, for example, (analogue) shielding of the devices, measuring signal
contamination using various metrics, or accommodating dedicated EMI monitors to detect
and flag suspicious sensor readings.
A promising strategy that follows the approach of quantifying signal contamination to detect
EMI sensor spoofing is presented in [2193]. The sensor output can be turned on and off
according to a pattern unknown to the attacker. Adversarial EMI in the wires between sensor
and the circuitry converting the reading to a digital value, i.e., the ADC, can be detected
during the times the sensor is off since the sensor output should be at a known level. In case
there are fluctuations in the readings, an attack is detected. Such an approach is thought
to be especially effective when used to protect powered or non-powered passive sensors.
It has been demonstrated to successfully thwart EMI attacks against a microphone and a
temperature sensor system. The only modification required is the addition of an electronic
switch that can be operated by the control unit or microcontroller to turn the sensor on and off.
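The on/off scheme can be sketched as a small simulation. The sensor model, baseline, and tolerance below are illustrative assumptions; the point is only that EMI coupled into the wiring remains visible while the sensor element itself is powered down.

```python
import random

class SimulatedSensor:
    """Toy model (not a real driver): injected EMI couples into the wiring
    between sensor and ADC, so it shows up even when the sensor is off."""
    def __init__(self, true_value, emi=0.0):
        self.powered = True
        self.true_value = true_value
        self.emi = emi
    def read(self):
        return (self.true_value if self.powered else 0.0) + self.emi

def emi_detected(sensor, rounds=64, off_baseline=0.0, tol=0.01, rng=None):
    """Power the sensor down in secret, randomly chosen slots; a reading that
    deviates from the known 'off' baseline in such a slot flags an attack."""
    rng = rng or random.Random(1234)   # fixed seed only for reproducibility here
    for _ in range(rounds):
        if rng.random() < 0.5:         # off-slots unpredictable to the attacker
            sensor.powered = False
            deviates = abs(sensor.read() - off_baseline) > tol
            sensor.powered = True
            if deviates:
                return True
    return False

print(emi_detected(SimulatedSensor(0.7)))            # False: clean sensor
print(emi_detected(SimulatedSensor(0.7, emi=0.3)))   # True: EMI injection
```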
A similar sensor spoofing detection scheme can be implemented for active sensors, such as
ultrasonic and infrared sensors, by incorporating a challenge-response like mechanism into
the measurement acquisition process [2195]. An active sensor often has an emitting element
and a receiving element. The emitter releases a signal that is reflected and captured by the
receiver. Based on the properties of the received signal, the sensor can infer information about
the entity or the object that reflected the signal. The emitter can be turned off randomly and
during that time the receiver should not be able to register any incoming signal. Otherwise, an
attack is detected and the sensor reading is discarded.
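The challenge-response idea for active sensors can be sketched in a few lines; the emit/listen callbacks and the toy attacker below are hypothetical stand-ins for real transducer drivers.

```python
import random

def measurement_trusted(emit, listen, rounds=32, rng=None):
    """Challenge-response for an active (emit/receive) sensor: stay silent in
    randomly chosen slots; an echo received during silence reveals injection."""
    rng = rng or random.Random(42)     # fixed seed only for reproducibility
    for _ in range(rounds):
        send = rng.random() < 0.5
        emit(send)
        if listen() and not send:      # echo with no emission -> spoofing
            return False
    return True

# Toy environment: an honest reflector echoes only what was emitted, while
# an attacker transmits continuously to overshadow the genuine echo.
state = {"sent": False}
def emit(on):
    state["sent"] = on

honest = lambda: state["sent"]
attacker = lambda: True

print(measurement_trusted(emit, honest))    # True
print(measurement_trusted(emit, attacker))  # False
```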

22.6 PHYSICAL LAYER SECURITY OF SELECTED COMMUNICATION TECHNOLOGIES
[2196, 2197, 2198, 2199]
This section presents security mechanisms of a selection of existing wireless communication
techniques that are in use today. The main focus is on physical-layer security constructs
as well as any lack thereof. The communication techniques that are discussed in detail are
near-field communication, air traffic communication networks, cellular networks and global
navigation satellite systems.


22.6.1 Near-field communication (NFC)


Near-field communication commonly refers to wireless communication protocols between two
small (portable) electronic devices. The standard is used for contact-less payment and mobile
payment systems in general. NFC-enabled devices can also exchange identity information,
for example when used as keycards for access control, and negotiate parameters to establish
a subsequent high-bandwidth wireless connection using more capable protocols.
NFC is designed to transmit and receive data only over a distance of up to a few centimeters.
Vanilla NFC protocols do not offer secure communication and, even if higher-layer crypto-
graphic protocols are used, cannot guarantee that two communicating devices are indeed only
a short distance apart. NFC is vulnerable to eavesdropping, man-in-the-middle attacks and
message relay attacks.
Even nowadays, standard NFC is deployed in security-critical contexts due to the assumption
that communicating devices are in close proximity. Research has shown, however, that this
assumption cannot be verified reliably using NFC protocols. The distance can be made almost
arbitrarily large by relaying messages between NFC-enabled devices. The attack works as
follows: The benign NFC devices are made to believe that they are communicating with each
other, but they are actually exchanging data with a modified smartphone. An adversary can
strategically place a smartphone next to each benign NFC device while the smartphones
themselves use a communication method that can cover long distances, such as WiFi. They
simply forward the messages the benign devices are sending to each other. Such an attack is
also referred to as a wormhole attack where communicating parties are tricked into assuming
that they are closer than they actually are. This is a problem that cannot be solved using
techniques on the logical layer or on the data layer.
Obviously, most of the described attacks can be mitigated by shielding the NFC devices or by
enhancing the protocol with two-factor authentication, for example. Such mechanisms unfortu-
nately transfer security-relevant decisions to the user of an NFC system. Countermeasures
that do not impose user burden can roughly be categorised into physical layer methods and
the augmentation with context- or device-specific identifiers [2196].
Protocol augmentation entails context-aware NFC devices that incorporate location infor-
mation into the NFC system to verify proximity. The location sensing can be implemented
with the help of a variety of different services, each with its own accuracy and granularity.
Conceivable are, for instance, GNSS/GPS based proximity verification or leveraging the cell-ID
of the base station to which the NFC device is currently closest in order to infer a notion of
proximity.
Physical layer methods that have been suggested in research literature are timing restrictions
and distance bounding. Enforcing strict timing restraints on the protocol messages can be
understood as a crude form of distance bounding. As discussed in Section 4.1, distance
bounding determines an upper bound on the physical distance between two communicating
devices. While distance bounding is considered the most effective approach, it still remains to
be shown if secure distance bounding can be implemented in practice for small NFC-enabled
devices.
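The crude timing restraint can be sketched as follows; the 1 µs processing delay and the relay latency are illustrative assumptions. The round-trip time, minus the prover's assumed processing delay, upper-bounds the distance, and a relay over WiFi adds latency that inflates this bound by orders of magnitude.

```python
C = 299_792_458.0  # speed of light (m/s)

def distance_upper_bound_m(rtt_s, processing_s):
    """Upper bound on the prover's distance from a measured round-trip time,
    assuming a known, fixed processing delay at the prover."""
    return C * (rtt_s - processing_s) / 2

PROC = 1e-6  # assumed 1 us processing time in the NFC tag

honest_rtt = PROC + 2 * 0.05 / C          # card 5 cm away
relayed_rtt = PROC + 2e-3                 # assumed WiFi relay adds ~2 ms latency

print(round(distance_upper_bound_m(honest_rtt, PROC), 3))   # 0.05 m
print(round(distance_upper_bound_m(relayed_rtt, PROC)))     # ~300 km -> reject
```

The sketch also shows why this remains a crude defence: a single microsecond of uncertainty in the processing delay already translates to about 150 m of distance ambiguity, which is why secure distance bounding requires much tighter timing than typical NFC hardware provides.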


22.6.2 Air Traffic Communication Networks


Throughout different flight phases, commercial and non-commercial aviation uses several
wireless communication technologies to exchange information with aviation authorities on
the ground as well as between airborne vehicles. Often, legacy systems are still in use, and
security was never part of the design of such systems.
While new proposals suggest overhauling these systems and tightly integrating security
measures such as encryption and message authentication into the data layer, air traffic
communication networks are not only used for information transmission, but also to extract
physical layer features from the signal in order to determine aircraft locations.
A prominent example is ADS-B. An ADS-B transponder periodically (or when requested) broad-
casts the aircraft’s position information, such as coordinates, that have been obtained through
an on-board GNSS receiver. Most versions of ADS-B only support unauthenticated messages
and therefore, this technology suffers from active and passive attacks, i.e., eavesdropping,
modifying, injecting and jamming messages. It is, for instance, possible to prevent an aircraft’s
location from being tracked by Air Traffic Control (ATC) by simply jamming the respective mes-
sages. Similarly, an adversary could create ghost planes by emitting fabricated transponder
messages. A sophisticated attacker could even fully distort the view ATC has of its airspace.
Multilateration (MLAT) can be seen as a technology that mitigates some of the shortcomings
of unauthenticated ADS-B and is therefore usually deployed in conjunction with ADS-B. MLAT
does not rely on the transmitted information encapsulated in the message, but makes use of
the physical and geometrical constellation between the transmitter (i.e., transponder of the air-
craft) and several receivers. MLAT systems extract physical layer properties from the received
messages. The time of arrival of a message is recorded at different co-located receivers and,
using the propagation speed of the signal, the location of the aircraft’s transponder can be
estimated. Multilateration techniques infer the aircraft’s location even if the contents of the
ADS-B messages are incorrect and thus MLAT provides a means to crosscheck the location
information disseminated by the aircraft’s transponder.
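The cross-check can be sketched as a TDOA consistency test; the receiver layout, aircraft position, and tolerance below are illustrative assumptions.

```python
import math

C = 299_792_458.0  # speed of light (m/s)

def toas(tx, receivers, t0=0.0):
    """Times of arrival of one transmission at each receiver."""
    return [t0 + math.dist(tx, rx) / C for rx in receivers]

def claim_consistent(claimed, receivers, arrival_times, tol_s=1e-8):
    """Cross-check a claimed (e.g., ADS-B) position against measured TDOAs:
    pairwise arrival-time differences must match the claimed geometry
    (a 10 ns tolerance corresponds to roughly 3 m)."""
    ref_rx, ref_t = receivers[0], arrival_times[0]
    for rx, t in zip(receivers[1:], arrival_times[1:]):
        expected = (math.dist(claimed, rx) - math.dist(claimed, ref_rx)) / C
        if abs((t - ref_t) - expected) > tol_s:
            return False
    return True

receivers = [(0.0, 0.0), (30_000.0, 0.0), (0.0, 30_000.0), (30_000.0, 30_000.0)]
aircraft = (12_000.0, 9_000.0)
times = toas(aircraft, receivers)

print(claim_consistent(aircraft, receivers, times))              # True
print(claim_consistent((20_000.0, 20_000.0), receivers, times))  # False: ghost
```

A single-antenna attacker cannot make a fabricated position pass this test, since the injected signal arrives at the receivers with time differences determined by the attacker's own location, not the claimed one.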
Although MLAT offers additional security based on physical layer properties, a distributed
adversary can still manipulate ADS-B messages. In addition to altering the location information,
an attacker can modify or inject signals that affect the time-of-arrival measurement at the
receivers. If the attacker has access to multiple distributed antennas and is able to coordinate
adversarial signal emission precisely, attacks similar to those on standard ADS-B are feasible.
However, the more receivers used to record the signals, the more difficult such attacks
become. Unfortunately, MLAT is not always an effective solution in aviation as strategic
receiver placement is crucial and time of arrival calculations can be susceptible to multi-path
interference [2197].
