All content following this page was uploaded by Nenad Petrovic on 15 April 2023.
This work is licensed under a Creative Commons Attribution International 4.0 License.
ICPE ’23 Companion, April 15–19, 2023, Coimbra, Portugal
© 2023 Copyright held by the owner/author(s).
ACM ISBN 979-8-4007-0072-9/23/04.
https://doi.org/10.1145/3578245.3584943
2 BACKGROUND AND RELATED WORK
2.1 DevSecOps workflow
Life-cycle management of the application from the configuration and deployment phase to the daily repetitive updates and upgrades has been recognised as a Dev(Sec)Ops workflow [2] containing
ICPE ’23 Companion, April 15–19, 2023, Coimbra, Portugal Matija Cankar et al.
multiple stages where different roles of experts and technicians need to collaborate. The stages can be divided into a design phase (including Plan, Create, Verify and Package) and a run-time phase (including Release, Configure and Monitor). Phases are meant to be repetitive and focused on special tasks. Considering the possibility of security and other issues in the design phase, verification of IaC script trustworthiness is applied as a measure to tackle them. On the other side, in the run-time phase, the real-time behavior of a system is monitored in order to detect deviations from normal operation. In this paper, our focus is on two phases aiming to ensure trustworthiness of the application by improving: a) static verification and security analysis before the deployment in the design phase and b) adoption of a machine learning-based approach to anomaly detection within logs at run-time.
2.2 Security in design phase of DevSecOps
Before the deployment, still in the design phase of the application, the verification stage checks consistency and inspects for vulnerabilities by applying SAST tools. Our goal is to detect known vulnerabilities, often caused by improper configuration, parameter values, type errors and non-compliance with common good practices and particular specification language standards. However, it is quite often the case that IaC-based deployment depends on multiple components, provided by either the community or specific organizations that work on IaC languages and standards. Therefore, it is also highly relevant to take into account the previously mentioned concerns and apply them to IaC-related libraries, templates and collections (as for Ansible). For that reason, additional steps are performed, such as verification of dependencies or checks for the usage of outdated, vulnerable libraries, which is referred to as component inspection within the scope of this paper. The component inspection can find new vulnerabilities even if the IaC has not changed, therefore this task is continuously repeated also after the application is running. The available off-the-shelf solutions (also called checks) can be based on the expert knowledge integrated in the tool [6] or rely on machine learning approaches [4].
Some of the notable existing SAST solutions considering component and security inspection are the following: Mega-Linter1 - open-source, offers quality and consistency analysis checks for a wide set of languages, outputs detailed reports and supports auto-fixing; Super-linter2 - GitHub-integrated workflow that represents a combination of multiple linter tools; Snyk3 - continuous component checking of dependencies covering project build tools such as Maven for Java and npm for Node.js. Moreover, the Open Web Application Security Project (OWASP)4 holds an extensive list of open source and commercial SAST tools.
The service we are proposing combines code scans and also component inspections, which means that it also holds the knowledge of potential issues that can materialise when a specific component is in use. Beside existing scans, we focused on implementing component scans, which have not yet been covered by the community.
1 https://megalinter.io/latest/, accessed on 18 January 2023
2 https://github.com/github/super-linter, accessed on 18 January 2023
3 https://snyk.io/, accessed on 18 January 2023
4 https://owasp.org/www-community/Source_Code_Analysis_Tools/, (January 2023)
2.3 Security in run-time phase of DevSecOps
Once the deployment defined by IaC scripts is done, applications and services go live. While they are up and running, during their usage life-cycle phase, various events (such as malicious behavior, attacks and anomalies) can occur and have an impact on many aspects related to their availability, performance, data safety and overall integrity [2]. A set of methods aiming to detect such occasions is referred to as Dynamic Application Security Testing (DAST).
For this purpose, the monitoring components record various messages, errors and information about state changes, generating logs as output. Usually, log analysis methods are applied in order to discover events which could affect the system, such as security threats or failures. In order to achieve this goal, various tools and approaches are used. Some significant examples are: WALASAM [8] - web server log analysis using data mining methods based on classification and clustering; nfer [9] - rule-based event-stream abstraction and processing; CMS-NLP [10] - NLP-based technique for log analysis aiming to reduce the operational workload and delays within a CMS solution. Other prominent methodologies in the state of the art of anomaly detection on logs use BERT-based pre-trained methods [3] and the Masked Language Model (MLM) in combination with BERT [12], as well as deep learning based on auto-encoder networks [5].
Our approach leverages an existing machine learning approach, complementing a traditional DAST methodology, to log monitoring relying on deep learning techniques designed for Natural Language Processing (NLP), which has already been proven by enabling robust distinction between normal system operation and abnormalities [10].
3 APPROACH
In this paper, we adopt an approach which tackles the previously mentioned issues related to security and trustworthiness within the scope of DevOps workflows in both the design and run-time phase, based on the DevSecOps philosophy [2]. In the design phase, we rely on (i) a service for static code scanning, integrating many independent tools, and in the run-time phase, we developed (ii) an NLP-based service for detecting anomalies and therefore potential issues in system logs as presented in Section 2.3. Both services are applicable to a wide set of IaC-related formats and standards and provide a summary of the scans to the final user.
Figure 1 depicts the proposed DevSecOps workflow covering the aspects of both design and run-time trustworthiness, leveraging the proposed (i) and (ii) approaches in synergy with other DevSecOps steps. In the first step, when the user has already designed an application, she provides the desired archive containing IaC scripts and submits it for static scanning. Here, the user is able to notice if issues exist and correct the code accordingly. After user intervention and code correction, the IaC archive is checked once again and deployed in case no problems were detected. After the successful deployment, when the infrastructure is up and running the services, the infrastructure or application logs are acquired. These logs unveil a lot of potential security issues that are known and described by experts. To identify unknown problems and to label potential issues, an additional AI-based analysis service is processing the logs and detecting anomalies, ranking them with
Security in DevSecOps: Applying Tools and Machine
Learning to Verification and Monitoring Steps ICPE ’23 Companion, April 15–19, 2023, Coimbra, Portugal
an evaluation score, so users can focus only on the parts of the historic logs that potentially present a threat.
Figure 1: Workflow enabling design and run-time trustworthiness in DevSecOps: 1 – upload IaC; 2 – archive scan; 3 – list of compatible check tools; 4 – generate HTML summary and persist results; 5 – correcting with respect to reported problems; 6 – IaC deployment; 7 – run-time events and logging; 8 – log analysis; 9 – event detection and alerts
4 THE SERVICES OF INSPECTION
We developed design-time and run-time services for security inspection. The design-time service can do a multi-pass over IaC by initiating known open-source and proprietary IaC scan tools together with our own implemented inspection services for deep component check of Ansible IaC. The design service resulted in two parts, IaC Inspector and IaC Component inspector, that we combined in the RESTful service. For the run-time services, an AI-powered log inspection tool, called LOMOS, is developed.
4.1 Design-time IaC and component inspection
The design-time service we developed works as presented in Figure 1 and covers steps 1-6 as mentioned in previous sections. The user provides the IaC archive that is about to be scanned by the IaC Inspector. The IaC Scan Runner analyzes the user-selected checks together with the archive and automatically recognizes the compatible checks to be performed. In this step, our crucial contribution is the development of the Ansible component inspector. Our gap analysis revealed that in the case of Ansible, IaC code relies on multiple Ansible Collections that provide specific functionality. However, inclusion of each collection presents new potential risks, as collections could be outdated and/or vulnerable. This led us to develop a tool with the following set of features: 1) parameter checking - wrong configuration identification, making sure that the correct parameters are used, considering their relationships; 2) best practices adoption - ensures that anti-patterns are avoided; 3) module checking - identifies name changes and redirects, checks for fully qualified names, and ensures that only certified and approved modules are used; 4) correction recommendations - an error assistant guides the user through hard-to-catch errors, while errors and warnings can be distinguished by colors.
This framework combining IaC and component check is implemented in the Python programming language and offers both a web-based REST interface relying on FastAPI and a command-line interface (CLI) for easier integration with DevOps pipelines. The OpenAPI specification5 can be used with the SwaggerUI graphical interface to interact with the deployed service. A variety of check tools is covered - from basic linters (pylint – for Python, YAMLlint – for TOSCA and Ansible YAML files, Hadolint – for Dockerfiles) to more sophisticated security-related tools (such as Terrascan and tfsec for Terraform; Steampunk Spotter for Ansible; xOpera TOSCA parser for TOSCA YAML). Apart from that, informational tools that provide IaC archive-related statistics are included as well, such as cloc. The list of currently supported static analysis tools for specific IaC-related file types can be found here6 .
We named the presented service IaC Scan Runner [11]. It is open-source software, publicly available on GitHub7 , with a goal to aggregate various types of IaC-related static script scanning tools into a unified web-based API. To ease the usage, the component inspection tool is integrated in the IaC Scan Runner and can be initiated among other supported scans. The professional version of the component inspection tool is also available separately under the commercial name Steampunk Spotter8 . Beside the static IaC analysis, it provides assisted automation code writing and offers recommendations for Ansible Playbooks. The tool can be simply integrated within GitHub CI/CD workflows using the command-line interface.
4.2 Design-time IaC Scan results
When the IaC is processed by all scans, the outputs are ranked and displayed to the user. The output lists the result summary in four levels: 1) Passed – the IaC is clear with no problems, 2) Error – issues found, 3) Warnings and 4) Not performed – scan not performed as there was no associated file. To ease the management of all scan results for the user, the ranks define a prioritisation list, displaying checks with more serious issues on top and less important issues later.
The list of scans is limited to the ones that the current version of IaC Scan Runner supports; however, we are aware that the DevOps paradigm is evolving and new scans will appear in the future to cover new vulnerabilities. To make the tool more future-proof, we prepared instructions on our GitHub location9 that can be used to add any new scan to the IaC Scan Runner.
5 https://xlab-si.github.io/iac-scanner-docs/
6 https://xlab-si.github.io/iac-scanner-docs/02-runner.html
7 https://github.com/xlab-si/iac-scan-runner, accessed on 13 January 2023
8 https://steampunk.si/blog/how-to-use-steampunk-spotter-cli-to-audit-your-playbook/, accessed on 13 January 2023
9 https://github.com/xlab-si/iac-scan-runner#readme, accessed on 20 January 2023
4.3 Run-time service inspection
The run-time inspection is focused on vulnerability assessment tools, including the continuous checking of the system safety, and on system information management systems that check historic system logs. In the following sections we will present the VAT and LOMOS approach.
4.3.1 VAT - static vulnerability assessment from rule matching. In complex systems, verification is not only a matter concerning the
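The automatic recognition of compatible checks described in Section 4.1 can be illustrated with a minimal sketch. The mapping tables and the `compatible_checks` function are hypothetical and only mirror the tool and file-type pairs named in the text; they are not the actual configuration or API of IaC Scan Runner.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file kinds to check names (illustrative only,
# not the real IaC Scan Runner configuration).
CHECKS_BY_SUFFIX = {
    ".py": ["pylint"],
    ".yaml": ["yamllint"],
    ".yml": ["yamllint"],
    ".tf": ["terrascan", "tfsec"],
}
CHECKS_BY_NAME = {"Dockerfile": ["hadolint"]}


def compatible_checks(file_names, selected=None):
    """Return the sorted checks that have at least one matching file.

    `selected` optionally restricts the result to user-chosen checks.
    """
    found = set()
    for name in file_names:
        path = PurePosixPath(name)
        found.update(CHECKS_BY_SUFFIX.get(path.suffix, []))
        found.update(CHECKS_BY_NAME.get(path.name, []))
    if selected is not None:
        found &= set(selected)
    return sorted(found)


archive = ["main.tf", "playbooks/site.yml", "app/Dockerfile", "README.md"]
print(compatible_checks(archive))
print(compatible_checks(archive, selected=["tfsec", "pylint"]))
```

In this sketch a check selected by the user but without any matching file is simply left out, which corresponds to the "Not performed" case of the result summary.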
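The prioritisation of scan results from Section 4.2 can be sketched as a simple sort over the four result levels. The numeric priorities below, in particular the relative order of "Not performed" and "Passed", are an assumption made for illustration; the text only states that more serious issues are listed first.

```python
from enum import IntEnum


class Outcome(IntEnum):
    """Result levels from the text; the numeric priority is an assumption."""
    ERROR = 0
    WARNING = 1
    NOT_PERFORMED = 2
    PASSED = 3


def prioritise(results):
    """Sort (check_name, outcome) pairs so the most serious come first.

    Ties are broken alphabetically by check name for a stable listing.
    """
    return sorted(results, key=lambda item: (item[1], item[0]))


results = [
    ("yamllint", Outcome.PASSED),
    ("tfsec", Outcome.ERROR),
    ("hadolint", Outcome.NOT_PERFORMED),
    ("pylint", Outcome.WARNING),
]
for name, outcome in prioritise(results):
    print(f"{outcome.name:<14}{name}")
```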
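As an illustration of the module checking feature listed in Section 4.1 (fully qualified names and redirects), the following minimal sketch flags Ansible tasks whose module is given by a short name. It assumes tasks are already parsed into Python dictionaries; the keyword list is partial and the redirect table is an illustrative assumption. This is not how Steampunk Spotter is implemented.

```python
# Task keys that are Ansible keywords rather than module names (partial list).
TASK_KEYWORDS = {"name", "when", "become", "vars", "register", "loop", "tags"}

# Illustrative redirect table (an assumption for this sketch).
REDIRECTS = {"docker_container": "community.docker.docker_container"}


def is_fqcn(module_name):
    """A fully qualified name has the form namespace.collection.module."""
    parts = module_name.split(".")
    return len(parts) >= 3 and all(parts)


def check_modules(tasks):
    """Return one finding per task key that is not a fully qualified module."""
    findings = []
    for index, task in enumerate(tasks, start=1):
        for key in task:
            if key in TASK_KEYWORDS or is_fqcn(key):
                continue
            message = f"task {index}: '{key}' is not a fully qualified name"
            if key in REDIRECTS:
                message += f"; consider '{REDIRECTS[key]}'"
            findings.append(message)
    return findings


tasks = [
    {"name": "Copy config", "copy": {"src": "app.conf", "dest": "/etc/app.conf"}},
    {"name": "Start service", "ansible.builtin.service": {"name": "app", "state": "started"}},
    {"name": "Run container", "docker_container": {"name": "web", "image": "nginx"}},
]
for finding in check_modules(tasks):
    print(finding)
```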
pre-operation steps, but it also encompasses the operation of the system. In particular, it is of paramount importance to make sure that a system undergoes continuous run-time verification for any security violation. Such violations can be recognized by monitoring various security metrics (e.g., file integrity, network configuration changes, usage of software reported as vulnerable, malware detection), integrated with the system’s monitoring component to recognize violations of defined security policies and alert DevSecOps teams to address and eliminate threats as fast as possible. Through security verification and threat detection on multiple levels, the PIACERE framework empowers deployed applications by helping to prevent abuse and leakage of data.
We have developed a toolset able to verify any security violation at run-time, feeding self-learning and self-healing mechanisms. The monitoring system is capable of detecting security-related events and incidents in the deployed application’s environment10 . It is (to the extent possible) deployable automatically and notifies users about security alerts. The system is then able to automatically deploy security monitoring agents, integrated into the monitoring mechanisms, and notify about security threats according to the policies.
10 https://medina-project.eu/blog/tools-and-techniques-collecting-evidence-technical-and-organisational-measures, accessed on 20 January 2023
4.3.2 LOMOS - dynamic security with AI-powered log analysis. To complement and enhance the static analysis with VAT as discussed above in Section 4.3.1, we have developed a log analysis tool - LOMOS - that provides automatic analysis of system or application logs and valuable insights regarding the current and past status of the monitored assets. It is based on LogBERT [7] and implements self-supervised NLP methods, such as Masked Language Modelling, relying on deep learning techniques and taking into account various aspects relevant to logs, such as semantics of their messages and sequential information. The adopted AI-based approach is able to perform automatic message analysis based on historical log records, taking into account factors such as their severity and occurrence frequency. It allows for unsupervised distinction between the normal flow and abnormalities, while corresponding notifications are sent to the user when unexpected behavior or an incident happens (see the workflow in Figure 2).
Figure 2: Workflow of log anomaly detection with LOMOS
Traditional log monitoring solutions, such as the one discussed in Section 4.3.1, are limited to rule-based (manual) analysis of time series data. In contrast, LOMOS makes use of state-of-the-art NLP methods in order to model log streams and capture their normal operating conditions. This enables the implementation of a monitoring system that does not depend on any manually defined rules or human intervention, but rather on a behavioral model which is able to detect deviations that would represent any kind of abnormal situation, including potential security threats. The following relevant use cases are covered: (i) the insightful monitoring of application logs; (ii) the automated alert system for any deviation from normal execution workflows; (iii) the easy root cause analysis via integration of logs from several components; and (iv) the identification of specific events, such as security incidents, performance drops and system failures. Additionally, LOMOS can be integrated with other systems such as the Grafana UX interface, the Slack alerting system, Security information and event management (SIEM) systems and extended detection and response (XDR) tools.
5 USE-CASES AND APPLICATIONS
Table 1: Fields of security service application, IaC Scan Runner (IaC and Component check), VAT and LOMOS
Columns: IaC check, Component check, VAT, LOMOS
Public administration ✓ ✓ ✓
Transport infra ✓ ✓
Critical infrastructure ✓ ✓
Supply-chain Security ✓ ✓
Connected Cars ✓ ✓
Smart Agriculture ✓ ✓
Health Devices ✓
The presented services have already been applied in the domains where users are evaluating them. The overall mapping between the domains of interest and the presented tools is shown in Table 1. The IaC Scan Runner is currently evaluated by the Slovenian Public Administration Ministry, which will use the tool to inspect their production services deployed on the state-internal network in the sense of IaC security and component (Ansible collection) verification. In the area of telecommunication infrastructure, Ericsson will evaluate the usage of the IaC Scan Runner on the IaC code for network configuration and deploying edge applications. Prodevelop, a maritime critical infrastructure manager, will evaluate the IaC Scan Runner for IaC supporting the migration of their application from private to the public cloud. All three mentioned domains are using the SAST tools through the PIACERE IDE developed over the Eclipse IDE framework. Our Ansible component check tool called Steampunk Spotter, which is included in the IaC Scan Runner scan set, is also available as commercial software for DevOps developers11 . The overall chain of tools used for implementation of our approach is depicted in Figure 3.
11 https://steampunk.si/spotter/
On the one side, the VAT and LOMOS applications are under evaluation by food chain producers, connected car producers and smart
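The idea behind the log anomaly detection in Section 4.3.2 can be hinted at with a deliberately simplified sketch: a model learns which log keys are expected in normal operation, and a sequence is scored by how many observed keys fall outside the top candidates. Here a plain successor-count table stands in for the masked language model. This substitution, the function names and the toy log keys are all assumptions for illustration and do not represent the LOMOS or LogBERT implementation.

```python
from collections import Counter, defaultdict


def train(normal_sequences):
    """Count which log key follows which during normal operation."""
    following = defaultdict(Counter)
    for seq in normal_sequences:
        for prev, cur in zip(seq, seq[1:]):
            following[prev][cur] += 1
    return following


def anomaly_score(model, seq, top_g=2):
    """Number of keys that are not among the top_g expected successors."""
    misses = 0
    for prev, cur in zip(seq, seq[1:]):
        expected = model.get(prev, Counter())
        candidates = [key for key, _ in expected.most_common(top_g)]
        if cur not in candidates:
            misses += 1
    return misses


# Toy log-key sequences representing normal behaviour (hypothetical data).
normal = [
    ["login", "read", "write", "logout"],
    ["login", "read", "logout"],
    ["login", "write", "logout"],
]
model = train(normal)

print(anomaly_score(model, ["login", "read", "logout"]))
print(anomaly_score(model, ["login", "delete", "delete", "logout"]))
```

A sequence could then be reported as anomalous when its score exceeds a chosen threshold, so that users only need to inspect the highest-ranked parts of the log history.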