Cloud Incident Response Framework - A Quick Guide
Key Contributors:
Christopher Hughes
Ashish Kurmi
Larry Marks
Michael Roza
Saan Vandendriessche
In today’s connected era, a comprehensive incident response strategy is integral to any
organization aiming to manage and lower its risk profile. A good incident response strategy needs
to be useful not only when dealing with incidents caused by malicious threat actors, but also in a
variety of other situations, such as downtime caused by an unexpected power outage or an internet
fiber cut during roadworks. There are, however, different considerations when it comes to incident
response strategies for cloud-based infrastructure and systems, due in part to the shared
responsibility inherent in the cloud.
Standards bodies, government agencies, cloud service providers (CSPs), research institutes and security
experts have developed various incident response frameworks and best practices to help organizations
be better prepared when dealing with cloud incidents. These frameworks and best practices provide
methodical, step-by-step response plans to various types of cloud incidents, which in turn, help
manage and minimize damage to businesses.
With the abundance of Cloud Incident Response (CIR) standards, frameworks and guidelines available
in the industry, CSA’s CIR Working Group (WG) aims to provide a holistic and consistent view across
widely used frameworks for the user, be it CSPs or cloud customers. Ultimately, the WG hopes to
develop a holistic Cloud Incident Response (CIR) framework that covers the major causes of cloud
incidents (both security and non-security related), and their handling and mitigation strategies. The
aim is to serve as a go-to guide for cloud users to effectively prepare for and manage the aftermath
of cloud incidents, along with serving as a transparent and common framework for CSPs to share
cloud incident response practices with their customers.
This Quick Guide distills the main objectives and gives readers an overview of the key contributions
and efforts currently underway inside the CIR WG. As we move towards a comprehensive CIR
framework, the CIR WG hopes to take this opportunity to encourage volunteers to participate in the
WG’s efforts and provide valuable feedback to the ongoing work.
Migrating systems to the cloud is not a lift-and-shift process – which also applies to the Incident
Response (IR) process. Cloud is a different realm altogether, and expectedly, CIR is too. The three
key aspects that set CIR apart from traditional IR processes are governance, visibility, and the shared
responsibility of the cloud.
Governance
Data in the cloud resides in multiple locations, sometimes with different CSPs. Getting the various
organizations together to investigate an incident is a major challenge and can be resource draining on
large CSPs that have a colossal client pool.
Shared responsibility
Cloud customers, CSPs and/or third-party providers all have different roles to play when ensuring
security in the cloud. Generally, customers are responsible for their own data, while CSPs are
responsible for the cloud infrastructure and services that they provide. CIR should always be
coordinated across all parties.
It is important to discuss roles and governance in granular detail with your CSP to ensure that
they are clear. Be sure not to create or settle for any policy that you are unable to enforce.
Organizations should understand that they can never outsource governance or shared responsibilities.
Visibility
Lack of visibility in the cloud means that incidents that could have been resolved quickly are at
risk of escalating. When leveraged properly, the cloud can make IR easier, faster, cheaper and more
effective. It is important to take great care when developing IR processes and documentation,
taking full advantage of cloud architectures as opposed to traditional data center models. Many
tools, services, and capabilities provided by CSPs greatly enhance detection, reaction, recovery
and forensic abilities in ways that are tailored to, and only possible in, the cloud. CIR has to be
proactive and architected for failure throughout the process.
1 NIST Computer Security Resource Center, Glossary - Incident, https://csrc.nist.gov/glossary/term/incident
Through this quick guide, we hope to highlight the essence of the CIR so readers can expect more
comprehensive coverage in the upcoming CIR Framework. The CIR WG also hopes to take this
opportunity to call for any contributors interested in developing the framework with the community.
There is an abundance of Incident Response (IR) standards, frameworks and guidelines available
in the industry today, which can be overwhelming for organizations to comprehend. The following
IR lifecycle diagram provides a clear understanding of how various chapters and sections across
different frameworks fit into an IR lifecycle. This is especially helpful when the user needs to zoom in
and plan for specific phases in the response process.
Figure: IR lifecycle phases (Preparation; Detection and Analysis; Containment, Eradication and
Recovery; Post-Mortem; Coordination and Information Sharing) mapped to the relevant sections of
each framework. The sections shown in the diagram are listed below by framework.
NIST 800-61r2: 3.1 Preparation; 3.2 Detection and Analysis; 3.3 Containment, Eradication and Recovery; 3.4 Post-Incident Activity; 4 Coordination and Information Sharing
TR 62: 0.1 Cloud Outage Risks; 4.2 COIR Categories; 5.1 Before Cloud Outage (CSCs); 5.2 During Cloud Outage (CSCs); 5.3 After Cloud Outage (CSCs); 6.1 Before Cloud Outage (CSPs); 6.2 During Cloud Outage (CSPs); 6.3 After Cloud Outage (CSPs)
FedRAMP Incident Communication Procedure: 2 Stakeholder Communications; 5.1 Preparation; 5.2 Detection and Analysis; 5.3 Containment, Eradication and Recovery; Post-Incident Activity
NIST (SP) 800-53 r4: 3.1 Selecting Security Control Baselines; Appendix F-IR IR-1, IR-2, IR-3, IR-4, IR-6, IR-7, IR-8, IR-9, AT-2, SC-5, SI-4
NIST (SP) 800-150: 4 Participating in Sharing Relationships
CSA Security Guidance v4.0: 9.1.2.1 Preparation; 9.1.2.2 Detection and Analysis; 9.1.2.3 Containment, Eradication and Recovery; 9.1.2.4 Post-mortem
ENISA Cloud Computing Security Risk Assessment: Business Continuity Management, page 79
The Incident Handlers Handbook: 2 Preparation; 3 Identification; 4 Containment; 5 Eradication; 6 Recovery; 7 Lessons Learned; 8 Incident Handlers Checklist
Solid preparation can improve an incident response team’s readiness and efficiency, ensuring they are
sufficiently prepared in the face of threats. Organizations should work towards having more than one
mechanism in place to avoid single points of failure.
A good CIR plan should clearly establish everyone’s roles and responsibilities. A list of emergency
contacts and the various methods of communication should also be ready for reaching out to key
parties within or beyond the organization for assistance. If third-party IR providers are engaged,
the CIR plan should be built around those vendors. During the preparation phase, organizations
should consider vetting third-party IR providers in advance so that resources can be accessed
quickly should they be needed in an emergency response situation.
Because every cloud platform is slightly different, there is no one-size-fits-all CIR plan. One concept to
consider is Chaos Engineering [4], the goal of which is to build more resilient systems by experimenting
on a system in order to build confidence in the system’s capability to withstand turbulent conditions in
production.
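As a rough illustration of the idea, the sketch below injects random faults into a stand-in service call and measures whether a retry-with-fallback mechanism keeps the success rate acceptable. It is a toy, in-process experiment under stated assumptions: the function names (inject_faults, call_with_retry, run_experiment), the fault rate and the three-attempt retry policy are all hypothetical and are not taken from the Principles of Chaos Engineering or from any CSP tooling.

```python
import random
from typing import Callable


def inject_faults(call: Callable[[], str], failure_rate: float,
                  rng: random.Random) -> Callable[[], str]:
    """Wrap a call so that it fails with the given probability (the injected fault)."""
    def wrapped() -> str:
        if rng.random() < failure_rate:
            raise ConnectionError("injected fault")
        return call()
    return wrapped


def call_with_retry(call: Callable[[], str], attempts: int) -> str:
    """Resilience mechanism under test: retry, then fall back to a degraded answer."""
    for _ in range(attempts):
        try:
            return call()
        except ConnectionError:
            continue
    return "fallback"


def run_experiment(failure_rate: float, trials: int = 1000, seed: int = 0) -> float:
    """Return the fraction of requests that received a full (non-fallback) answer."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    primary = inject_faults(lambda: "ok", failure_rate, rng)
    results = [call_with_retry(primary, attempts=3) for _ in range(trials)]
    return results.count("ok") / trials


if __name__ == "__main__":
    # Hypothetical steady-state hypothesis: most requests still succeed
    # even when 30% of individual calls fail.
    print(f"success rate under 30% faults: {run_experiment(0.3):.3f}")
```

In a real environment the fault would be injected into actual infrastructure (for example, a dependency or a network path) with a limited blast radius, and the steady-state measure would come from production monitoring rather than a counter.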
In addition to offline backups, organizations should make full use of their CSPs’ business continuity
and disaster recovery capabilities, and be familiar with them so they can invoke them in the event of
incidents.
2 Cloud Security Alliance 2017, Security Guidance for Critical Areas of Focus in Cloud Computing v4, https://cloudsecurityalliance.org/artifacts/security-guidance-v4/
3 Cloud Security Alliance 2017, Security Guidance for Critical Areas of Focus in Cloud Computing v4, https://cloudsecurityalliance.org/artifacts/security-guidance-v4/
4 Chaos Community Google Group 2018, Principles of Chaos Engineering, https://principlesofchaos.org/?lang=ENcontent
Although detection and analysis may differ from one cloud environment to another, the monitoring
scope must cover the cloud management plane in addition to deployed assets. In-cloud monitoring
and alerts can be leveraged to kick off an automated response workflow. Cloud logs, preferably
complete logs of all management activities and API calls, can help to answer the questions that
arise during an investigation, although they might not be available from all CSPs or across all
service models (SaaS, PaaS and IaaS).
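To illustrate how management-plane logs can support an investigation, the sketch below builds a per-actor timeline of management actions within a time window. The record layout (AuditEvent with time, actor, action and resource fields) and the sample action names are hypothetical; real CSP audit logs each have their own schema, so this is a minimal sketch rather than a parser for any specific provider.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical, simplified management-plane log record."""
    time: datetime
    actor: str
    action: str
    resource: str


def actions_by_actor(
    events: Iterable[AuditEvent], start: datetime, end: datetime
) -> Dict[str, List[Tuple[datetime, str, str]]]:
    """Group events inside [start, end] by actor, in chronological order."""
    timeline: Dict[str, List[Tuple[datetime, str, str]]] = defaultdict(list)
    for event in sorted(events, key=lambda e: e.time):
        if start <= event.time <= end:
            timeline[event.actor].append((event.time, event.action, event.resource))
    return dict(timeline)


if __name__ == "__main__":
    sample = [
        AuditEvent(datetime(2024, 1, 1, 10, 5), "alice", "CreateAccessKey", "user/bob"),
        AuditEvent(datetime(2024, 1, 1, 9, 55), "alice", "ConsoleLogin", "account"),
        AuditEvent(datetime(2024, 1, 1, 23, 0), "carol", "DeleteSnapshot", "snap-1"),
    ]
    window = actions_by_actor(sample, datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 0))
    for actor, actions in window.items():
        print(actor, [a for _, a, _ in actions])
```

A timeline of this kind only works if the underlying logs are complete, which is why the availability of management-activity and API-call logging should be confirmed with the CSP during preparation.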
Incident classification scales are used in several industry best practices and guidelines to help users
gauge the severity of impact and/or the relative importance of cloud services availability to business
operations. The WG is developing an Incident Classification Scale of 5 categories, from Level 1 to
Level 5, with impact-increments at each level. Listed below are some mappings which are to be
expected in the subsequent deliverable, CIR Framework:
5 FedRAMP PMO 2017, FedRAMP Incident Communication Procedure, https://www.fedramp.gov/assets/resources/documents/CSP_Incident_Communications_Procedures.pdf
Table 1. Example of Table with Mapping of Incident Severity Level to Relevant Frameworks
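The five-level scale described above could be represented in code along the following lines. This is only a sketch: the WG's actual criteria for each level are not given in this guide, so the 0-100 impact score, the equal 20-point bands and the escalation threshold are placeholder assumptions chosen purely for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Five-level scale; impact increases with the level number."""
    LEVEL_1 = 1
    LEVEL_2 = 2
    LEVEL_3 = 3
    LEVEL_4 = 4
    LEVEL_5 = 5


def classify(impact_score: float) -> Severity:
    """Map a hypothetical 0-100 impact score onto the scale using equal bands.

    The banding is an assumption for illustration, not the WG's definition.
    """
    if not 0 <= impact_score <= 100:
        raise ValueError("impact_score must be between 0 and 100")
    # 0-19 -> Level 1, 20-39 -> Level 2, ..., 80-100 -> Level 5
    return Severity(min(int(impact_score // 20) + 1, 5))


def requires_escalation(level: Severity,
                        threshold: Severity = Severity.LEVEL_4) -> bool:
    """Hypothetical policy: escalate at or above the threshold level."""
    return level >= threshold


if __name__ == "__main__":
    for score in (5, 45, 90):
        level = classify(score)
        print(score, level.name, requires_escalation(level))
```

Using an ordered enumeration keeps comparisons such as "Level 4 or above" explicit, which is convenient when response guidelines differ by level.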
When an incident is discovered, the predefined CIR plans established in Phase 1, Preparation, should
be executed (e.g., taking systems offline, quarantining systems, restricting connectivity). It is of
the utmost importance not to remove the threat by blind deletion, as this destroys evidence that is
required for forensics and for revising the CIR plan. Once evidence has been preserved, the key is to
be meticulous in removing every trace of malware, threats or issues. It is also important to evaluate
the trade-off between data loss and service availability. To prevent incidents from recurring,
systems should be hardened and patched following an immutable infrastructure paradigm. This means
servers are never modified after deployment, and any changes or fixes are built into a new image
that replaces the old one.
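The ordering described above (preserve evidence, isolate, then replace rather than patch in place) can be sketched with a small in-memory model. No real cloud API is called here; Server, ContainmentWorkflow and their methods are hypothetical names, and in practice each step would map onto whatever snapshot, network isolation and image deployment services the CSP provides.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Server:
    """Hypothetical stand-in for a deployed cloud server."""
    server_id: str
    image: str
    isolated: bool = False
    terminated: bool = False


@dataclass
class ContainmentWorkflow:
    """Enforces 'preserve evidence first' before any eradication step."""
    preserved: Set[str] = field(default_factory=set)

    def preserve_evidence(self, server: Server) -> None:
        # Stand-in for taking a disk/memory snapshot for forensics.
        self.preserved.add(server.server_id)

    def isolate(self, server: Server) -> None:
        # Stand-in for quarantining the server or restricting connectivity.
        server.isolated = True

    def replace_with_patched_image(self, server: Server, patched_image: str,
                                   new_id: str) -> Server:
        # Immutable infrastructure: never patch in place; deploy a new
        # server from a fixed image and retire the compromised one.
        if server.server_id not in self.preserved:
            raise RuntimeError("preserve evidence before eradicating")
        server.terminated = True
        return Server(server_id=new_id, image=patched_image)


if __name__ == "__main__":
    workflow = ContainmentWorkflow()
    compromised = Server("web-1", image="web-v1")
    workflow.preserve_evidence(compromised)
    workflow.isolate(compromised)
    replacement = workflow.replace_with_patched_image(compromised, "web-v2", "web-2")
    print(compromised, replacement)
```

Making the replacement step refuse to run without preserved evidence is one way to encode the rule against blind deletion directly into the response tooling.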
Whenever possible, make full use of cloud backups, mirroring or restoration services provided by your
CSPs for a quick and seamless recovery. These CSP and cloud-centric services can often ensure a
more robust recovery (tailored for cloud) compared to traditional on-prem or third-party solutions.
The working group aims to develop containment, eradication, and recovery guidelines that are specific
to each level in the Incident Classification Scale. This will be published in the subsequent deliverable by
the CIR WG. Individuals who are interested in participating in this work may join the WG here.
Once the storm is over, the IR team should gather to analyze and document the incident and
determine what went well and what could be improved. The lessons learned will help in revising
and solidifying the CIR plan. The post-mortem phase is essential and should be performed as soon
as possible, while the lessons are still fresh in everyone’s mind. Otherwise, important details
that could make a valuable difference in preventing a future incident may be lost or forgotten.
Coordinating with key partners, IR teams in other departments, and law enforcement agencies on their
specific roles and responsibilities greatly reinforces CIR capabilities. This communication should be
set up from the start – at the planning phase – and maintained throughout the entire CIR process as
necessary.
The IR phases discussed above are mapped at a high level, in Chapter 7 of the upcoming CIR
Framework document, to the incident response sections of five of the most well-known cloud
security standards (CSA CCM, FedRAMP, NIST, ISO, and CIS). This suggests that if the CSA CIR
Framework is used as a basis for creating an IR system, that system should comply at least partially
with these well-known standards. Detailed mappings (with the exception of CIS) can be found in the
CSA CCM Cloud Matrix 3.01 Full Version [9]. The CCM and these mappings are important for complying
with the CSA Security Trust Assurance and Risk (STAR) Certification [10].
5. Conclusion
In the event of a critical incident, there is no time to waste figuring out a game plan – every second
that goes by puts data at risk of being potentially compromised. The CSA CIR WG is developing a
sequel to this document, the Cloud Incident Response Framework, which delves into each chapter in
greater depth. Readers can expect a step-by-step guide, from preparation to post-mortem, with CIR
guidelines curated for different levels of incident severity. Key ideas and concepts are covered in each
phase and should apply to all cloud incidents.
As this is a work in progress, the CIR WG welcomes individuals who are interested in contributing
to join the WG by registering here.
8 More information on CloudCISC: https://cloudsecurityalliance.org/research/working-groups/cloud-cisc/
9 More information on CSA CCM: https://cloudsecurityalliance.org/research/cloud-controls-matrix/
10 More information on CSA STAR: https://cloudsecurityalliance.org/star/