A Data Protection Approach For Cloud-Native Applications
Ramaswamy Chandramouli
Computer Security Division
Information Technology Laboratory
Wesley Hales
Leak Signal Inc.
June 2024
Certain equipment, instruments, software, or materials, commercial or non-commercial, are identified in this
paper in order to specify the experimental procedure adequately. Such identification does not imply
recommendation or endorsement of any product or service by NIST, nor does it imply that the materials or
equipment identified are necessarily the best available for the purpose.
There may be references in this publication to other publications currently under development by NIST in
accordance with its assigned statutory responsibilities. The information in this publication, including concepts and
methodologies, may be used by federal agencies even before the completion of such companion publications.
Thus, until each publication is completed, current requirements, guidelines, and procedures, where they exist,
remain operative. For planning and transition purposes, federal agencies may wish to closely follow the
development of these new publications by NIST.
Organizations are encouraged to review all draft publications during public comment periods and provide feedback
to NIST. Many NIST cybersecurity publications, other than the ones noted above, are available at
https://csrc.nist.gov/publications.
Publication History
Approved by the NIST Editorial Review Board on YYYY-MM-DD [Will be added to final publication]
Submit Comments
nistir-8505-comments@nist.gov
All comments are subject to release under the Freedom of Information Act (FOIA).
NIST IR 8505 ipd (Initial Public Draft), June 2024
A Data Protection Approach for Cloud-Native Applications
Abstract
This document addresses the need for effective data protection strategies in the evolving realm
of cloud-native network architectures, including multi-cloud environments, service mesh
networks, and hybrid infrastructures. By extending foundational data categorization concepts,
it provides a framework for aligning data protection approaches with the unknowns of data in
transit. Specifically, it explores service mesh architecture, leveraging and emphasizing the
capabilities of WebAssembly (WASM) in ensuring robust data protection as sensitive data is
transmitted through east-west and north-south communication paths.
Keywords
data governance; data privacy; data protection; data security; in-transit data categorization;
WASM.
Table of Contents
1. Introduction
2. Web Assembly Background
2.2.1. Development and Deployment Process
2.4.1. Role of WASM in Different Service Mesh Architectures
3. Data Protection in Transit
3.2.1. Web Traffic Data Protection
3.2.2. API Security
3.2.3. Microsegmentation
3.2.4. Log Traffic Data Protection
3.2.5. LLM Traffic Data Protection
3.2.6. Credit Card-Related Data Protection
3.2.7. Monitoring Tools to Visualize Sensitive Data Flows
4. Security Analysis of WASM Modules
4.1.1. User-Level Security Features
4.1.2. Security Primitives for Developers
1. Introduction
In the constantly evolving landscape of cloud-native application architectures, where data
resides in multiple locations (i.e., on-premises and in the cloud), ensuring data security involves
more than simply specifying and granting authorization during service requests. It also involves
a comprehensive strategy to categorize and analyze data access and leakage as data travels
across various protocols (e.g., gRPC, REST-based), especially within ephemeral and scalable
microservices applications. As organizations find themselves governing hundreds to tens of
thousands of services and the inter-service calls between them, a security void has been
identified in observing and protecting sensitive data in transit.
(b) Run using a WASM runtime in an isolated virtual machine (VM) within the proxy, which
allows developers to enhance applications with necessary functionality and run them as
efficiently as native code in the proxies.
Over the last few years, the Envoy WASM VM has enabled new types of compute and traffic
processing capabilities and allowed for custom WASM modules to be built and deployed in a
sandboxed and fault-tolerant manner.
Additionally, the following features of WebAssembly modules make them particularly effective
for data protection:
• Data Discovery and Categorization: WASM modules can dynamically identify and
categorize data as it traverses the network, ensuring that sensitive information is
recognized and handled appropriately.
• Dynamic Data Masking (DDM): WASM modules can apply DDM techniques to redact or
mask sensitive information in transit, enhancing privacy and security.
• User and Entity Behavior Analytics (UEBA): WASM modules can analyze user and entity
behaviors in real time, detecting anomalies and potential security threats.
• Data Loss Prevention (DLP): WASM modules can enforce DLP policies by monitoring and
controlling data transfers to prevent unauthorized data exfiltration.
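The data discovery and categorization feature listed above can be illustrated with a minimal sketch. It is written in Python only for readability; a real proxy module would be compiled to WASM from a language such as Rust or C++. The label names, the patterns in PATTERNS, and the categorize function are hypothetical and not taken from this document. The Luhn checksum is a common way to reduce false positives on card-like digit runs and is an assumption here, not a stated requirement.

```python
import re

# Hypothetical label -> pattern table. Production rules would be tuned and far
# more extensive; these three exist only to illustrate the idea.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "payment_card": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
}


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def categorize(payload: str) -> set:
    """Return the set of sensitive-data labels detected in a payload."""
    labels = set()
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(payload):
            if label == "payment_card":
                digits = re.sub(r"\D", "", match.group())
                if not luhn_valid(digits):
                    continue  # digit run that is not a plausible card number
            labels.add(label)
            break  # one confirmed hit is enough to assign the label
    return labels
```

For example, categorize("contact alice@example.com") yields the single label "email", which a downstream policy could then use to decide how the payload is handled.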
data protection, API security, microsegmentation, log traffic data protection, LLM traffic
data protection, and integration with monitoring tools for the visualization of sensitive
data flows.
• Section 4 presents a detailed security analysis of a WASM module by examining its
development, deployment, and execution environment to ensure that the module
satisfies the properties of a security kernel and can provide the necessary security
assurance.
• Section 5 provides a summary of the topics covered in this document and discusses how
WASM module functionality must continuously evolve to provide the security assurance
needed to protect against data breaches and exfiltration in the context of increasingly
sophisticated attacks on data.
The steps involved in developing a WASM module and running it using a WASM runtime are [6]:
• Source code writing: Programs are written in languages (e.g., C++, C#, Rust) that
have target WASM compilers available.
• Parsing: The code is parsed into an abstract syntax tree (AST) structure.
• Compiling: The code in AST structure is then compiled into a WASM module using AOT
or JIT. The WASM module is generated in a binary format that can be executed by a
WASM runtime.
• WASM runtime loading: The WASM runtime loads the WASM module (with the file name
extension .wasm). If JIT is used, the compilation takes place after loading into the WASM
runtime.
• Preparation for execution (i.e., instantiation): The WASM runtime creates an
executable instance from the WASM module by allocating memory, importing functions
and objects, and establishing the execution environment for the module.
• Code optimization: During execution of the byte code, profiling is employed to identify
frequently executed code, and a progressive optimization/re-optimization process takes
place to gradually enhance performance until the code runs efficiently.
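As a small illustration of the loading step above, the sketch below checks the fixed 8-byte header that begins every WebAssembly binary: the magic bytes \0asm followed by a little-endian 32-bit version number, currently 1. It covers only this first sanity check, not parsing, validation, or instantiation, and the function name check_wasm_header is hypothetical.

```python
import struct

WASM_MAGIC = b"\x00asm"  # first four bytes of every WebAssembly binary
WASM_VERSION = 1         # binary format version, stored as a little-endian u32


def check_wasm_header(data: bytes) -> int:
    """Return the binary format version, or raise ValueError if the header is bad."""
    if len(data) < 8:
        raise ValueError("too short to be a WASM module")
    if data[:4] != WASM_MAGIC:
        raise ValueError("missing WASM magic number")
    (version,) = struct.unpack("<I", data[4:8])
    if version != WASM_VERSION:
        raise ValueError(f"unsupported WASM version {version}")
    return version


# The smallest well-formed module is the 8-byte header with no sections.
EMPTY_MODULE = WASM_MAGIC + struct.pack("<I", WASM_VERSION)
```

A deployment pipeline could run a check like this before handing a .wasm file to the runtime, rejecting files that are clearly not WASM binaries.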
Figure 1 shows the ability to develop programs in different languages, convert them into WASM
code, and run them under different processor architectures [4]. The execution model for WASM
modules in the server environment and their comparison with the container execution model
are described in Appendix B.
Fig. 1. Generating WASM modules and their execution [4]
authenticated identities. WASM modules can use regex and ML models to identify sensitive
data patterns in HTTP payloads and redact, mask, or block classified data transmissions based
on configured policies. Example applications include:
• E-commerce websites: Monitoring credit card details and personal information during
transactions to ensure that they are properly encrypted and masked, preventing
unauthorized access.
• Healthcare applications: Protecting patient data by detecting and encrypting sensitive
information, such as medical records and personal identifiers, before they are
transmitted between systems.
• Corporate communications: Scanning and securing internal emails and messages to
prevent data breaches and ensure compliance with internal data protection policies.
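The redact, mask, and block actions described above can be sketched as a small policy table applied to an HTTP body. This is an illustrative sketch in Python, not a WASM filter. The rule names, the patterns, the tok_ token format, and the Blocked exception are all hypothetical, and only the regex-based detection mentioned in the text is shown (no ML model).

```python
import re


class Blocked(Exception):
    """Raised when policy says the transmission must not be forwarded."""


# Hypothetical policy: (label, pattern, action). Order matters: rules run top-down.
RULES = [
    ("internal_token", re.compile(r"\btok_[A-Za-z0-9]{16}\b"), "block"),
    ("payment_card",
     re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?(\d{4})\b"), "mask"),
    ("us_ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "redact"),
]


def apply_policy(body: str) -> str:
    """Return the body with masking/redaction applied, or raise Blocked."""
    for label, pattern, action in RULES:
        if action == "block":
            if pattern.search(body):
                raise Blocked(label)
        elif action == "mask":
            # Keep only the last four digits so the value stays recognizable.
            body = pattern.sub(lambda m: "**** **** **** " + m.group(1), body)
        elif action == "redact":
            body = pattern.sub(f"[REDACTED:{label}]", body)
    return body
```

Masking keeps part of the value for usability, redaction removes it entirely, and blocking stops the transmission; which action applies to which category is a policy decision that this sketch hard-codes only for brevity.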
In contrast, in-transit data categorization offers a dynamic and granular approach. By analyzing
data flows in real time, organizations gain actionable insights into the content and context of
network traffic. This enables the precise enforcement of security controls based on data
attributes, such as sensitivity levels or compliance requirements. Example applications include:
• Financial institutions: Implementing microsegmentation to protect critical systems that
handle transaction processing to ensure that only authorized services can access
sensitive financial data.
• Healthcare providers: Segregating networks within a hospital to ensure that medical
devices and patient data systems are isolated from less secure administrative networks.
• Retail chains: Using real-time data categorization to manage data flows between
point-of-sale systems and backend inventory systems to prevent unauthorized access to sales
data and customer information.
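One way to picture enforcement based on data attributes is a decision function that permits a flow only when every data category detected in it is allowed between that pair of services. The sketch below is a simplified Python illustration. The Flow type, the service names, the category labels, and the ALLOWED table are hypothetical, and the default-deny handling of unknown categories is a design assumption rather than something the text prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str        # identity of the calling service
    destination: str   # identity of the called service
    labels: frozenset  # data categories detected in the payload


# Hypothetical policy: for each data category, the (source, destination)
# service pairs that may exchange it.
ALLOWED = {
    "payment_card": {("pos", "payments")},
    "patient_record": {("ehr", "billing"), ("ehr", "lab")},
}


def is_allowed(flow: Flow) -> bool:
    """Permit the flow only if every detected category is allowed on this path.

    A category missing from ALLOWED is denied (default deny). A flow with no
    detected categories carries nothing sensitive and is permitted.
    """
    path = (flow.source, flow.destination)
    return all(path in ALLOWED.get(label, set()) for label in flow.labels)
```

With this table, a point-of-sale service sending card data to the payments service is permitted, while the same payload sent to an inventory service is denied even though the two services may otherwise communicate.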
• Content Moderation: Ensuring that data processed by LLMs for content moderation is
handled in compliance with privacy regulations to protect user information.
• Data Analysis Services: Classifying and securing data used by LLMs in analytics platforms
to prevent unauthorized access to sensitive business insights and customer data.
space, execution stack, and runtime data structure [11], and prevents WASM modules from
accessing other memory areas. These other memory regions are isolated from the internal
memory of the runtime and are set to zero by default unless otherwise initialized. However,
modules can access the data stored on the execution stack via dedicated instructions. The
actual data address on the execution stack is never shown to the module. A compliant runtime
ensures that the module does not break WASM's memory model [12]. This is done by
bounds-checking access to the linear memory at the region level. If a module attempts to access
memory outside of its linear memory, the program traps, which prevents the module from
accessing data outside of its allocated memory [11].
Another common class of memory safety error involves unsafe pointer usage and undefined
behavior. This includes dereferencing pointers to unallocated memory (e.g., NULL) or freed
memory allocations. In WebAssembly, the semantics of pointers have been eliminated for
function calls and variables with a fixed static scope, allowing references to invalid indexes in
any index space to trigger a validation error at load time or, at worst, a trap at runtime.
Bounds-checking, however, is performed only at the level of the linear memory, so a module
can access its entire linear memory without restriction, and linear memory is not
protected by standard techniques like stack canaries or guard pages. Buffer overflows,
which occur when data exceeds the boundaries of an object and accesses adjacent
memory regions, cannot affect local or global variables stored in index space. Data stored in
linear memory, by contrast, can overwrite adjacent objects because bounds-checking is
performed at linear memory region granularity and is not context-sensitive.
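The difference between region-level and object-level bounds-checking can be made concrete with a toy model of linear memory. The sketch below is a Python simulation, not a WASM runtime. The LinearMemory class, the Trap exception, and the buffer and flag layout are hypothetical. The 64 KiB page size is the WebAssembly page size.

```python
PAGE_SIZE = 65536  # WebAssembly linear memory is sized in 64 KiB pages


class Trap(Exception):
    """Stands in for a WebAssembly trap."""


class LinearMemory:
    """Toy linear memory whose only check is against the whole region."""

    def __init__(self, pages: int = 1):
        self.data = bytearray(pages * PAGE_SIZE)  # zero-initialized

    def store(self, addr: int, payload: bytes) -> None:
        if addr < 0 or addr + len(payload) > len(self.data):
            raise Trap("out-of-bounds memory access")
        self.data[addr:addr + len(payload)] = payload

    def load(self, addr: int, length: int) -> bytes:
        if addr < 0 or length < 0 or addr + length > len(self.data):
            raise Trap("out-of-bounds memory access")
        return bytes(self.data[addr:addr + length])


mem = LinearMemory()

# Two hypothetical adjacent objects, as a module's own allocator might place them.
BUF_ADDR, BUF_LEN = 0, 8  # an 8-byte buffer
FLAG_ADDR = 8             # a 1-byte flag stored directly after the buffer

mem.store(FLAG_ADDR, b"\x00")
mem.store(BUF_ADDR, b"A" * (BUF_LEN + 1))  # one byte too many, yet no trap
overwritten = mem.load(FLAG_ADDR, 1)       # the flag now holds b"A"
```

The oversized write succeeds because it stays inside the region, silently corrupting the neighboring flag, whereas a write that would cross the end of the region (for example, at address PAGE_SIZE) raises Trap.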
• Even for making those calls, only clusters known to the proxy can be used. Similarly,
responses coming from clusters already known to the proxy are examined.
References
[1] Doerrfeld B (2023) Wasm: The Next Generation Beyond Kubernetes? Available at https://cloudnativenow.com/features/wasm-the-next-generation-beyond-kubernetes/?utm_medium=email&_hsmi=293085224&_hsenc=p2ANqtz-9qlZw1MWD8-AHNH54OoPyWB7vOLe0uG0KZSIe2uH1sbXAD_rmmhyXHMThd0GMdMLUb-w8axL_Gv1L2RM9Nq55L2eCysg&utm_content=293086040&utm_source=hs_email
[2] Krasnov M (2020) Web Assembly is the End of Internet as we know it. Available at https://betterprogramming.pub/webassembly-is-the-end-of-the-internet-as-we-know-it-9085a49cbc7b
[3] WebAssembly (2024) WebAssembly Concepts. Available at https://developer.mozilla.org/en-US/docs/WebAssembly/Concepts#see_also
[4] TechTarget (2022) Server-side WebAssembly prepares for takeoff in 2023. Available at https://www.techtarget.com/searchitoperations/news/252527414/Server-side-WebAssembly-prepares-for-takeoff-in-2023
[5] Medium (2023) WASM and Kubernetes – A new era of application development. Available at https://medium.com/@seifeddinerajhi/wasm-and-kubernetes-a-new-era-of-cloud-native-application-deployment-b3c59b39f640
[6] Podobnik TJ (2023) WASM Runtimes Vs Containers: Cold Start Delays (Part 1). Available at https://levelup.gitconnected.com/wasm-runtimes-vs-containers-performance-evaluation-part-1-454cada7da0b
[7] ITPro (2024) WASM Today, AI Tomorrow: KubeCon Extends its Reach. Available at https://www.itprotoday.com/cloud-computing-and-edge-computing/wasm-today-ai-tomorrow-kubecon-expands-its-reach?_mc=NL_DR_EDT__20240401&cid=NL_DR_EDT__20240401&utm_rid=CPNET000059406774&utm_campaign=57716&utm_medium=email&elq2=a6cba5014e5b49bb9a1fe0c3e0351bd2&sp_eh=87aea8874bbd0a1985055c93c957744c11570c6718777eca378db1b4436de815
[8] Security.md (2018) WebAssembly Security. Available at https://github.com/WebAssembly/design/blob/main/Security.md
[9] Huang W, Paradies M (2021) An Evaluation of WebAssembly and eBPF as Offloading Mechanisms in the Context of Computational Storage. Available at https://marcusparadies.github.io/files/ebpf_vs_wasm_report.pdf
[10] Lehmann D, Kinder J, Pradel M (2020) Everything Old is New Again: Binary Security of WebAssembly. In USENIX Security.
[11] Haas A, et al. (2017) Bringing the web up to speed with WebAssembly. In PLDI.
[12] WebAssembly Community Group (2023) WebAssembly Specification. Draft Release 2.0 (Draft 2023-04-24). Available at https://webassembly.github.io/spec/
[13] WebAssembly Community Group (2023) WebAssembly System Interface. Available at https://github.com/WebAssembly/WASI
[14] Johnson E, et al. (2023) WaVe: A verifiably secure WebAssembly sandboxing runtime. In Proceedings of IEEE Symposium on Security and Privacy (SP).
[15] Wasmtime (2023) Security - Wasmtime. Available at https://docs.wasmtime.dev/security.html
Fig. 2. WASM Module Development & Execution in Browsers
The source program is run through a compiler (also called a WebAssembly target
compiler) that takes code written in an LLVM-supported language as input and produces a
binary .wasm file. That file is loaded into the existing JavaScript code by the JavaScript Interop
layer and executed by the WebAssembly runtime [2]. The .wasm file is a low-level assembly
language file in binary format.
The WASM compiler for C, C++, and Rust takes the source code written in those languages and
compiles it into a WASM module. The necessary JavaScript “glue” code is then generated for
loading and running the module, and an HTML document is used to display the results of the
code. The details of this process are explained in [3].
Appendix B. Comparison of Execution Models for Containers and WASM Modules
Fig. 3. Comparison of Execution Stack for Containers & WASM Modules
Container images are created by combining the program containing the application logic with
its dependencies (e.g., runtime libraries) in a container runtime (e.g., Docker). The container is a
full file system (i.e., utilities, binary), and the generated image must target a designated OS
kernel and processor architecture (e.g., Intel, Arm). For example, if a Raspberry Pi OS is
running a Docker image, then an image for the C/C++ application based on a Linux image must
be created and compiled for the Arm processor architecture. Otherwise, the container will not
run as expected [5].
In contrast, WASM modules and binaries are precompiled C/C++ applications that do not rely
on being coupled with a host OS or system architecture because they do not contain a
precompiled file system or low-level OS primitives. Every directory and system resource is
attached to a WASM module at runtime through WASI, and the module is then run using a WASM
runtime. In other words, WASI is used to access all resources under the control of the OS,
essentially decoupling the code from its dependency on the platform architecture.