NIST SP 800-233
Ramaswamy Chandramouli
Computer Security Division
Information Technology Laboratory
Zack Butcher
Tetrate, Inc.
James Callaghan
control-plane.io, Inc.
October 2024
Certain commercial equipment, instruments, software, or materials, commercial or non-commercial, are identified
in this paper in order to specify the experimental procedure adequately. Such identification does not imply
recommendation or endorsement of any product or service by NIST, nor does it imply that the materials or
equipment identified are necessarily the best available for the purpose.
There may be references in this publication to other publications currently under development by NIST in
accordance with its assigned statutory responsibilities. The information in this publication, including concepts and
methodologies, may be used by federal agencies even before the completion of such companion publications.
Thus, until each publication is completed, current requirements, guidelines, and procedures, where they exist,
remain operative. For planning and transition purposes, federal agencies may wish to closely follow the
development of these new publications by NIST.
Organizations are encouraged to review all draft publications during public comment periods and provide feedback
to NIST. Many NIST cybersecurity publications, other than the ones noted above, are available at
https://csrc.nist.gov/publications.
Authority
This publication has been developed by NIST in accordance with its statutory responsibilities under the Federal
Information Security Modernization Act (FISMA) of 2014, 44 U.S.C. § 3551 et seq., Public Law (P.L.) 113-283. NIST is
responsible for developing information security standards and guidelines, including minimum requirements for
federal information systems, but such standards and guidelines shall not apply to national security systems
without the express approval of appropriate federal officials exercising policy authority over such systems. This
guideline is consistent with the requirements of the Office of Management and Budget (OMB) Circular A-130.
Nothing in this publication should be taken to contradict the standards and guidelines made mandatory and
binding on federal agencies by the Secretary of Commerce under statutory authority. Nor should these guidelines
be interpreted as altering or superseding the existing authorities of the Secretary of Commerce, Director of the
OMB, or any other federal official. This publication may be used by nongovernmental organizations on a voluntary
basis and is not subject to copyright in the United States. Attribution would, however, be appreciated by NIST.
Publication History
Approved by the NIST Editorial Review Board on 2024-10-11
Contact Information
sp800-233-comments@nist.gov
Additional Information
Additional information about this publication is available at https://csrc.nist.gov/pubs/sp/800/233/final, including
related content, potential updates, and document history.
All comments are subject to release under the Freedom of Information Act (FOIA).
NIST SP 800-233: Service Mesh Proxy Models for Cloud-Native Applications
October 2024
Abstract
The service mesh has become the de facto application services infrastructure for cloud-native
applications. It enables the various runtime functions of an application through proxies that
form the data plane of the service mesh. Depending on the distribution of the network layer
functions and the granularity of association of the proxies to individual services and computing
nodes, different proxy models or data plane architectures have emerged. This document
describes a threat profile for each of the data plane architectures with a detailed threat analysis
to make recommendations on their applicability for cloud-native applications with different
security risk profiles.
Keywords
cloud-native application; data plane architecture; proxy model; service mesh; threat profile.
Table of Contents
Executive Summary
1. Introduction
2. Typical Service Mesh Data Plane Capabilities and Associated Proxy Functions
3. Proxy Models (Data Plane Architectures) in Service Mesh Implementations
References
Acknowledgments
The authors would like to express their thanks to Francesco Beltramini of control-plane.io for
participating in discussions and providing his valuable perspective. The authors would also like
to express their thanks to Isabel Van Wyk of NIST for her detailed editorial review of both the
public comment version and the final publication.
Executive Summary
A centralized infrastructure called a service mesh can provide run-time services for cloud-native
applications that consist of multiple loosely coupled components called microservices. These
services include secure communication, service discovery, resiliency, and authorization of
application communication. These services are mainly provided through proxies that form the
data plane of the service mesh, which is the layer that handles application traffic at runtime
and enforces policy.
The functions that the proxies provide can be broadly categorized into two groups based on the
Open Systems Interconnection (OSI) model layer to which those functions pertain:
Layer 4 (“L4”) and Layer 7 (“L7”). In most service mesh deployments in production
environments today, all proxy functions that provide services in both L4 and L7 layers are
packed into a single proxy that is assigned to a single microservice. This service mesh proxy
model is called a sidecar proxy model since the proxy is not only associated with a single service
but is implemented to execute in the same network space as the service.
However, performance and resource considerations have led to the exploration of alternate
proxy models that involve splitting L4 and L7 functions into different proxies and the
association or assignment of these proxies to either a single service or a group of services. This
enables the proxies to be implemented at different locations at the granularity of a node rather
than at the level of services. Though different models are theoretically possible, this document
only considers service mesh proxy models in the data plane implementation of commonly used
service mesh offerings at different stages.
Various potential or likely threats to proxy functions may result in different types of exploits in
different proxy models. This variation is due to several factors, such as the attack surface (i.e.,
communication patterns to which a particular proxy is exposed), the number of clients
(services) served, and the OSI layer functions that they provide (e.g., L7 functions are more
complicated and likely to have more vulnerabilities than L4 functions). The two main
contributions of this document are the following:
1. The nature of the exploits that are possible for each threat in each of the proxy models
is characterized by assigning scores to the impact and likelihood of each of the threats in
each of the proxy models or architectural patterns, resulting in a threat profile that is
associated with each architectural pattern or proxy model of service mesh.
2. Each threat profile has an inherent set of security trade-offs at an architectural level.
The implications of these trade-offs in meeting the requirements associated with the
security risk profiles of different cloud-native applications are analyzed to make a broad
set of recommendations toward specific architectural patterns that are appropriate for
applications with different security risk profiles.
1. Introduction
“Cloud-native” refers to an architectural philosophy for building scalable, resilient systems that
are designed to leverage the advantages of cloud computing environments. Cloud-native
applications can run both on-premises and in public cloud platforms and are normally built
using agile development methodologies, such as continuous integration/continuous delivery
(CI/CD). Typically, technologies such as containerization and virtual machines (VMs) are used,
and resilience and fail-safe features are built in.
Microservices-based applications use an architectural approach in which the entire application
is broken into loosely coupled components that can be independently updated and scaled. The
implementation of microservices is enabled using containers that in turn require orchestration
tools and often employ a centralized services infrastructure (e.g., service mesh) to provide all
runtime application services, including network connectivity, security, resiliency, and
monitoring capabilities. Microservices-based applications can be implemented and deployed as
cloud-native, though they represent an independent architectural approach.
The infrastructure services or functions provided by a service mesh during application runtime
are provided by entities called proxies, which constitute the data plane of the service mesh. In
addition, the service mesh consists of another architectural component called the control
plane, which supports the functions of the data plane through interfaces to define
configurations, inject software programs, and provide security artifacts (e.g., certificates).
Various configurations for proxies are being developed and tested based on the performance
and security assurance data obtained during the deployment of service mesh over the last
several years. These configurations are proxy (implementation) models that are based on the
OSI layer functions that they provide (described in the following paragraphs) and the
granularity of association between a proxy and services. Since proxies are the predominant
entities of the data plane of a service mesh, these various proxy models are also called data
plane architectures.
The OSI model [1] is a useful abstraction for thinking about the functions required to serve an
application over the network. It describes seven “layers,” from the physical wires that connect
two machines (i.e., Layer 1 – L1, the physical layer) to the application itself (i.e., Layer 7 – L7,
the application layer).
Layers 3, 4, and 7 are key to facilitating communication between cloud-native applications (e.g.,
two microservices making Hypertext Transfer Protocol (HTTP)/REST calls to each other):
• Layer 3 (“L3”), the network layer, facilitates baseline connectivity between two
workloads or service instances. In nearly all cases, the Internet Protocol (IP) is used as
the L3 implementation.
• Layer 4 (“L4”), the transport layer, facilitates the reliable transmission of data between
workloads on the network. It also includes capabilities like encryption. Transmission Control
Protocol (TCP) and User Datagram Protocol (UDP) are commonly used L4
implementations, and Transport Layer Security (TLS) provides encryption.
• Layer 7 (“L7”), the application layer, is where protocols like HTTP operate — in user
applications themselves (e.g., HTTP web servers, Secure Shell (SSH) servers).
With respect to the layers above, a service mesh’s proxies in cloud-native environments behave
as follows:
• At Layer 3 (L3): Agnostic, provided that the microservice instances can communicate at L3
and the proxy can communicate with the mesh’s control plane.
• At Layer 4 (L4): Connection establishment, management, and resiliency (e.g.,
connection-level retries); TLS (encryption in transit); application identity, authentication,
and authorization; access policy based on the network 5-tuple (i.e., source IP address and
port, destination IP address and port, and transport protocol).
• At Layer 7 (L7): Service discovery; request-level resiliency (e.g., retries, circuit breakers,
outlier detection); and application observability.
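As a rough illustration of the difference between the two layers, the sketch below evaluates an L4 access policy purely on the network 5-tuple, while the L7 check needs parsed request attributes. The class and function names, the rule format, and the sample addresses are hypothetical and are not drawn from any service mesh implementation.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class FiveTuple:
    """Connection metadata available to an L4 proxy."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # "TCP" or "UDP"


@dataclass(frozen=True)
class L4Rule:
    """Hypothetical allow rule: source CIDR plus destination port and protocol."""
    src_cidr: str
    dst_port: int
    protocol: str


def l4_allows(conn: FiveTuple, rules: list[L4Rule]) -> bool:
    """Allow the connection if any rule matches the 5-tuple fields it names."""
    return any(
        ip_address(conn.src_ip) in ip_network(rule.src_cidr)
        and conn.dst_port == rule.dst_port
        and conn.protocol == rule.protocol
        for rule in rules
    )


def l7_allows(method: str, path: str, allowed: set[tuple[str, str]]) -> bool:
    """An L7 decision needs parsed request data (here, HTTP method and path prefix)."""
    return any(method == m and path.startswith(prefix) for m, prefix in allowed)


rules = [L4Rule(src_cidr="10.0.1.0/24", dst_port=8443, protocol="TCP")]
conn = FiveTuple("10.0.1.17", 51234, "10.0.2.5", 8443, "TCP")
print(l4_allows(conn, rules))                                # connection-level decision
print(l7_allows("GET", "/orders/42", {("GET", "/orders")}))  # request-level decision
```

The L4 check never inspects payload bytes, which is why it can be made before any application data is parsed.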
2. Typical Service Mesh Data Plane Capabilities and Associated Proxy Functions
This document’s methodology examines the security trade-offs of the proxy models (i.e., data
plane architectures) and how the various capabilities are implemented as L4 and L7
functions in proxies. Determining the totality of proxy functions requires an analysis of each
capability, the category it falls under, and the granularity of the function that it provides at L4
and L7 levels.
Table 1 - Security Capabilities [15]
L7 functions that are carried out by proxies are much more complex than L4 functions as the
latter are carried out in lower layers of the OSI stack and involve protocols such as IP and TCP.
For example, parsing a TCP stream for L4 functionality simply requires decoding a fixed set of
bytes as integers (i.e., the packet header), while handling HTTP requests for L7 functionality
requires decoding HTTP headers, including complex string parsing and compression with
variable amounts of data. Additionally, the data dealt with in an L7 function are user-supplied
(i.e., can be controlled by an attacker), while the TCP data at L4 are typically system-supplied as
part of routing a request to the infrastructure. This means that there is less room at L4 to embed
malicious data without breaking the system itself. As a case in point, the Envoy proxy is used as
the data plane by several service mesh implementations, and historically, the majority of Envoy
vulnerabilities have been in L7 function-related code rather than L4 function-related code.
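The contrast in parsing effort can be sketched as follows. The first 20 bytes of a TCP header have a fixed layout that a single `struct.unpack` call decodes, whereas even a simplified HTTP/1.1 header parser has to deal with variable-length, sender-controlled text. The function names and sample values are illustrative assumptions, and a production proxy handles far more (e.g., TCP options, HTTP/2 header compression, malformed input).

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (field layout per RFC 9293)."""
    (src_port, dst_port, seq, ack,
     offset_reserved, flags, window, checksum, urgent) = struct.unpack(
        "!HHIIBBHHH", segment[:20]
    )
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": (offset_reserved >> 4) * 4,  # data offset counts 32-bit words
        "flags": flags,
        "window": window,
    }


def parse_http_headers(raw: bytes) -> dict:
    """Simplified HTTP/1.1 header parsing: variable-length, user-supplied strings."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:  # skip the request line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


# Hypothetical sample inputs.
tcp = struct.pack("!HHIIBBHHH", 51234, 443, 1, 0, 5 << 4, 0x02, 65535, 0, 0)
print(parse_tcp_header(tcp)["dst_port"])
http = b"GET /orders HTTP/1.1\r\nHost: shop.example\r\nX-Trace:  abc \r\n\r\n"
print(parse_http_headers(http))
```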
Fig. 1. Sidecar model: L4 and L7 Proxy per Service Instance (DPA-1)
(The combined L4 and L7 proxy is deployed for each application instance, and no proxy is shared.)
(An L4 proxy is deployed on each node and shared by all applications on that node. A single L7 proxy instance is
deployed on behalf of "Application 3").
Fig. 4. L4 and L7 as Part of the Application Model (gRPC proxyless Model) (DPA-4)
4. Compromised L7 proxy
5. Compromised shared L7 proxy
6. Outdated client libraries in applications
7. Denial of service (DoS)
8. Resource consumption
9. Privileged L4 proxy
10. Bypassing traffic interception
5.1. Threat Analysis for L4 and L7 Proxy per Service Instance (DPA-1) — Sidecar Model
The threats to the data plane of the service mesh are denoted using the mnemonic TR-x, where
TR stands for threat, and x stands for the threat sequence number.
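One way to picture the resulting threat profile is as a table of impact and likelihood scores keyed by architecture and threat mnemonic. The sketch below is a hypothetical data structure, not tooling from this publication: the class names are invented, the 1-to-3 score range is an assumption inferred from the scores that appear in the threat analysis, and only the DPA-1 scores for TR-5 given in Sec. 5.1 are filled in.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ThreatScore:
    impact: int       # assumed scale: 1 (low) to 3 (high)
    likelihood: int   # assumed scale: 1 (low) to 3 (high)

    def __post_init__(self):
        for name in ("impact", "likelihood"):
            value = getattr(self, name)
            if value not in (1, 2, 3):
                raise ValueError(f"{name} must be 1, 2, or 3, got {value}")


@dataclass
class ThreatProfile:
    """Scores for one data plane architecture, keyed by threat mnemonic (TR-x)."""
    architecture: str
    scores: dict[str, ThreatScore] = field(default_factory=dict)

    def add(self, threat_number: int, impact: int, likelihood: int) -> None:
        self.scores[f"TR-{threat_number}"] = ThreatScore(impact, likelihood)


sidecar = ThreatProfile("DPA-1")
sidecar.add(5, impact=1, likelihood=1)  # compromised shared L7 proxy, per Sec. 5.1
print(sidecar.scores["TR-5"])
```

Keeping the two scores separate, rather than collapsing them into one number, mirrors how the analysis reports them.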
Impact Score = 1: For noisy neighbors (i.e., other L7 proxies on the same host that are
compromised), the impact is limited by the underlying scheduling and resource constraint
system (e.g., k8s, VM sizing). The following are identical across all architectures: for a shared
ingress gateway, all services exposed on that gateway would be impacted (Impact 2); for a
shared egress gateway, all services that utilize the egress gateway would be impacted (Impact
3). Typically, only a single deployment of egress gateways is used.
Likelihood Score = 1: The sidecar itself is not a shared proxy. By its nature, it is dedicated to an
individual application. In this case, TR-5 refers to noisy neighbors (i.e., other proxies on the same
node that cause a denial of service [DoS]) and to shared ingress or egress gateways. Noisy
neighbors are mitigated based on the degree of isolation of the host (i.e., container versus
micro-VM versus VM). The likelihood of exploiting a shared L7 ingress or egress gateway is the
same across all architectures.
Likelihood Score = 2: Because the proxy runs in user space in the same control groups as the
application, there are a variety of attacks available that are not relevant or applicable to other
implementations.
Proxy Function Impacted: A DoS executed at L4 has the same impact as the centralized per-
node model because the L4 process is centralized per node. All applications on the node are
impacted. A DoS executed at L7 impacts all application instances of the target application since
a set of dedicated L7 proxies is deployed per app. The number of proxies that implement L7
functionality is typically less than the number of application instances, making them an easier
target for DoS than every instance of the target application.
Impact Score = 2: An L4 DoS would impact all application instances on the target host.
Likelihood Score = 2: The L4 proxy is deployed once per node, so it presents a better target for
DoS than DPA-1 or DPA-4. This is mitigated somewhat by the simplified functionality of an L4
proxy compared to a combined L4 and L7 proxy. The L7 proxy is shared by multiple instances of
the same application and presents an easier DoS target than the application itself. Therefore, a
successful DoS is more likely than in the sidecar model (DPA-1).
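The difference in blast radius can be made concrete with a small sketch. Assuming the arrangement described above (one L4 proxy per node and one set of L7 proxies per application), the functions below list which application instances a DoS would affect. The placement data and function names are hypothetical.

```python
# Hypothetical placement: node name -> list of (application, instance id).
placement = {
    "node-a": [("orders", 1), ("billing", 1), ("orders", 2)],
    "node-b": [("orders", 3), ("catalog", 1)],
}


def affected_by_l4_dos(placement: dict, node: str) -> list:
    """A DoS on a node's shared L4 proxy affects every instance on that node."""
    return list(placement[node])


def affected_by_l7_dos(placement: dict, app: str) -> list:
    """A DoS on an application's L7 proxies affects all of its instances on any node."""
    return [
        inst
        for instances in placement.values()
        for inst in instances
        if inst[0] == app
    ]


print(affected_by_l4_dos(placement, "node-a"))
print(affected_by_l7_dos(placement, "orders"))
```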
encapsulated as a CNI provider that runs in a privileged context by default. In a sidecar case,
privilege is needed only at startup to establish traffic interception rules. Depending on the
implementation (e.g., Kubernetes init containers), this can ensure that privileged code runs only
during initialization and not alongside the application. In all cases, the Linux capability
CAP_NET_ADMIN (granted in Kubernetes through a container's security context) is typically the
only privilege required for mesh data plane functionality.
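A check along the following lines can confirm that the privilege is confined to initialization. The sketch walks a pod specification, represented as a plain dictionary that uses the standard Kubernetes `initContainers`, `containers`, and `securityContext.capabilities.add` fields, and reports which containers request NET_ADMIN (Kubernetes lists capabilities without the CAP_ prefix). The pod contents, container names, and image names are made up for illustration.

```python
def containers_with_capability(pod_spec: dict, capability: str = "NET_ADMIN") -> dict:
    """Return names of init and long-running containers that add the capability."""
    def names(key: str) -> list:
        return [
            c["name"]
            for c in pod_spec.get(key, [])
            if capability
            in c.get("securityContext", {}).get("capabilities", {}).get("add", [])
        ]
    return {"init": names("initContainers"), "runtime": names("containers")}


# Hypothetical pod: only the init container sets up traffic interception.
pod_spec = {
    "initContainers": [
        {"name": "mesh-init", "image": "example/mesh-init",
         "securityContext": {"capabilities": {"add": ["NET_ADMIN"]}}},
    ],
    "containers": [
        {"name": "app", "image": "example/app"},
        {"name": "sidecar-proxy", "image": "example/proxy",
         "securityContext": {"capabilities": {"drop": ["ALL"]}}},
    ],
}

result = containers_with_capability(pod_spec)
print(result)
```

An empty `runtime` list indicates that no long-running container holds the capability alongside the application.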
Likelihood Score = 3: L7 code may be enabled for another server that can be exploited to affect
all applications on the host.
configuration (e.g., service discovery) needs to be sent only once to each node rather than to
each and every application instance. Overall, this results in the lowest rate of change, the least
data transferred, and a lower runtime footprint (e.g., RAM, CPU).¹
Impact Score = 1: This has the lowest overall resource utilization of all available architectures.
Likelihood Score = 1: Because there is only a single proxy instance per node, rather than a data
plane instance per service or per service instance, resource consumption should be the lowest
of all out-of-process (i.e., non-gRPC) models.
¹ Some implementations do not fully deduplicate configurations (e.g., due to implementation choices or as a security measure to provide some degree of
isolation) and consume RAM more similarly to a sidecar case than might otherwise appear.
5.4. Threat Analysis for L4 and L7 as Part of the Application Model (gRPC Proxyless Model) (DPA-4)
probability for vulnerabilities to arise in this part of the stack, as supported by historical CVE
data.
Proxy Function Impacted: Compromising the L7 processing stack results in compromising the
entire application and in more risk of compromise beyond runtime identity and DoS for other
users.
Impact Score = 3: The application itself is compromised, including non-mesh credentials (e.g.,
database credentials that permit a command such as TRUNCATE TABLE users) that are not
available if only the proxy is compromised.
Likelihood Score = 3: Since the L7 processing code runs inside the application, the attack surface
is much larger.
Proxy Function Impacted: A DoS threat to the mesh data plane (L4 or L7) is a threat to the
application itself. In all other respects, it is very similar to the sidecar model.
Impact Score = 1: Only a single instance of a single application is affected, although an attack
could be repeated across all applications (see Sec. 5.1.7).
Likelihood Score = 2: Both the mesh data plane functionality and the application code
itself are susceptible to DoS.
Likelihood Score = 1: Due to the nature of RPC frameworks and in-process enforcement, it
should not be possible to bypass mesh data plane policy.
• HIGH-REQ3: All traffic management capabilities are required at the request level and
should involve application layer parameters in addition to those at the network
connection level.
These capabilities reveal that a complete suite of L7 functions is required:
• As with applications that have low-risk and medium-risk profiles, all four data plane
architectures can theoretically be used.
• Based on the requirements, this class of applications belongs to highly critical
applications that require a great degree of isolation and in which any compromise
should be limited to only one service instance rather than multiple service instances.
Data plane architectures that deploy an L7 proxy for each service (e.g., sidecar model
[DPA-1]) are most applicable. A shared L7 proxy per service (e.g., DPA-2) can be an
acceptable trade-off for some organizations if they have other mechanisms for
mitigating shared-fate failures of all instances of the service that the shared service
mesh L7 proxy brings (e.g., mitigating a DoS attack via L3 controls outside of the mesh).
However, tightly integrating both L4 and L7 functions with the service instance provides
a greater degree of isolation, so DPA-1 is highly recommended.
L4 and L7 proxies will each collect all relevant network traffic data for the level of the stack at
which they operate. This guidance does not make any distinction between these
proxy types as far as network data collection ability is concerned. Additionally, all proxies will
be configured to send the collected data to the appropriate monitoring tools in the enterprise
infrastructure. The tools for aggregating and filtering traffic are beyond the scope of the data
plane architecture components considered in this document. The same applies to the
monitoring tools that provide the dashboard, analyze the traffic, generate the required metrics,
and send alerts regarding the threats detected. Hence, nothing in this recommendation
compromises the observability requirements of applications in the enterprise.
Moreover, this guidance does not recommend a particular data plane architecture for the
entire enterprise. Applications within an enterprise have different levels of security
requirements, and some do not require L7 proxy functions. Therefore, an enterprise
can choose to deploy DPA-1 (i.e., sidecar proxies) for some selected applications while using
different architectures (e.g., DPA-2, DPA-3, or DPA-4) for others to leverage their increased
performance and still meet the enterprise’s security needs.
References
[1] Wikipedia (2024) OSI Model. Available at https://en.wikipedia.org/wiki/OSI_model
[2] Chandramouli R, Butcher Z (2020) Building Secure Microservices-based Applications Using
Service-Mesh Architecture. (National Institute of Standards and Technology, Gaithersburg,
MD), NIST Special Publication (SP) NIST SP 800-204A.
https://doi.org/10.6028/NIST.SP.800-204A
[3] Chandramouli R, Butcher Z, Aradhna C (2021) Attribute-based Access Control for
Microservices-based Applications using a Service Mesh. (National Institute of Standards
and Technology, Gaithersburg, MD), NIST Special Publication (SP) NIST SP 800-204B.
https://doi.org/10.6028/NIST.SP.800-204B
[4] Chandramouli R (2022) Implementation of DevSecOps for a Microservices-based
Application with Service Mesh. (National Institute of Standards and Technology,
Gaithersburg, MD), NIST Special Publication (SP) NIST SP 800-204C.
https://doi.org/10.6028/NIST.SP.800-204C
[5] Chandramouli R, Butcher Z (2023) A Zero Trust Architecture Model for Access Control in
Cloud-Native Applications in Multi-Cloud Environments. (National Institute of Standards
and Technology, Gaithersburg, MD), NIST Special Publication (SP) NIST SP 800-207A.
https://doi.org/10.6028/NIST.SP.800-207A
[6] Jackson E, Kohavi Y, Pettit J, Posta C (2022) Ambient Mesh Security Deep Dive. (Istio)
Available at https://istio.io/latest/blog/2022/ambient-security/
[7] Howard J, Jackson EJ, Kohavi Y, Levine I, Pettit J, Sun L (2022) Introducing Ambient Mesh.
(Istio) Available at https://istio.io/latest/blog/2022/introducing-ambient-mesh/#what-about-security
[8] Turner M (2022) eBPF and Sidecars - Getting the Most Performance and Resiliency out of
the Service Mesh. (Tetrate) Available at
https://tetrate.io/blog/ebpf-and-sidecars-getting-the-most-performance-and-resiliency-out-of-the-service-mesh/
[9] Graf T (2021) How eBPF will solve Service Mesh - Goodbye Sidecars. (Isovalent) Available
at https://isovalent.com/blog/post/2021-12-08-ebpf-servicemesh/
[10] Song J (2022) Transparent Traffic Intercepting and Routing in the L4 Network of Istio
Ambient Mesh. (Tetrate) Available at
https://tetrate.io/blog/transparent-traffic-intercepting-and-routing-in-the-l4-network-of-istio-ambient-mesh/
[11] Song J (2022) L7 Traffic Path in Ambient Mesh. (Tetrate) Available at
https://tetrate.io/blog/l7-traffic-path-in-ambient-mesh/
[12] Cilium (2024) Threat Model — Cilium 1.15.6 documentation. (Cilium) Available at
https://docs.cilium.io/en/stable/security/threat-model/
[13] Istio (2024) Ambient mode overview: ztunnel. Available at
https://istio.io/latest/docs/ambient/overview/#ztunnel
[14] Landow S (2021) gRPC Proxyless Service Mesh. (Istio) Available at
https://istio.io/v1.15/blog/2021/proxyless-grpc/
[15] Butcher Z (2024) Ambient Mesh: What you need to know about this experimental new
deployment model for Istio. (Tetrate) Available at
https://tetrate.io/blog/ambient-mesh-what-you-need-to-know-about-this-experimental-new-deployment-model-for-istio/
[16] Spring (2024) Spring Framework. Available at https://spring.io/projects/spring-framework