Self-Managed Services Conceptual Model in Trustworthy Clouds' Infrastructure
Abstract
Current cloud infrastructure does not deliver the full potential of automated
self-managed services. Cloud infrastructure management is supported by the
cloud provider’s internal employees and contractors (e.g. enterprise
architects, system and security administrators). Such a manual management
process, which requires human intervention, is not adequate considering the
cloud’s promising future as an Internet-scale critical infrastructure. This
paper explores and analyzes automated self-managed services for the cloud’s
virtual resources. We propose a conceptual model of the interdependencies
between self-managed services and identify the static and dynamic factors
affecting their automated actions in the context of cloud computing. Next, we
identify the challenges involved in providing secure and reliable self-managed
services. We have just started work in this area as part of the EU-funded
Trusted Cloud (TCloud) project (see footnote 1).
1 Introduction
Cloud is a recent buzzword in computing that has various definitions. One
example is ‘Cloud is an elastic execution environment of resources involving
multiple stakeholders and providing a metered service and multiple
granularities for specified level of quality’ [4]. Another is ‘Cloud computing
is a model for enabling convenient, on-demand network access to a shared pool
of configurable computing resources (e.g., networks, servers, storage,
applications, and services) that can be rapidly provisioned and released with
minimal management effort or service provider interaction.’ [5]. Cloud
computing supports three main service models: Software as a Service (SaaS),
Platform as a Service (PaaS), and Infrastructure as a Service (IaaS) [5]. IaaS
is the most flexible model, suited to cloud users who prefer to have the
greatest control over their resources, while SaaS is the most restrictive
model, in which the cloud provider has full control over the virtual
resources. In other words, cloud computing provides full outsourcing support
for SaaS, partial outsourcing support for PaaS (more specifically, it provides
the virtual environment and software tools that help users develop and deploy
their applications), and minimal outsourcing support for IaaS (more
specifically, the cloud provider mainly manages the infrastructure components
running the virtual machines). In this paper we focus mainly on the IaaS cloud
type. Cloud users for IaaS would typically be organizations.
1 http://www.tclouds-project.eu/
The two main characteristics of a potential cloud critical infrastructure that
differentiate it from current enterprise infrastructure are the pay-per-use
payment model and automated self-managed services [4]. In this paper we focus
mainly on self-managed services for the infrastructure’s virtual resources.
Such services provide cloud infrastructure with exceptional capabilities and
new features, for example scaling per use, hiding the complexity of the
infrastructure, and automated higher reliability, availability, scalability,
dependability, and resilience. These should result in reduced infrastructure
maintenance costs.
The technologies behind current cloud infrastructure are not new, as they
have been used in enterprise infrastructure for many years [7]. The current
understanding of cloud computing became popular with Amazon EC2 in 2006 [1],
whose infrastructure is built from technologies and processes based on
in-house solutions. Although current cloud infrastructure has been around for
some time, we are still far from achieving the cloud’s potential features, for
several reasons [4]; in this paper we discuss those related to self-managed
services.
Cloud computing originated in industry (from commercial requirements and
needs) and has recently moved into research because of its promising potential
as an Internet-scale computing infrastructure [2, 4]. The lack of academic
research that formally analyzes current cloud infrastructure causes confusion
about the cloud’s potential features, such as overestimating some of them
(e.g. using the keywords “immediate” and “unlimited” when describing some
self-managed services). It also leads to underestimating the challenges
involved in providing automated management of cloud infrastructure. For
example, some people interpret the NIST phrase “resources rapidly provisioned
and released” as if the cloud should provide unconditionally immediate and
unlimited services, e.g. immediate and unlimited scalability. This is not a
feasible requirement with today’s technologies. There will always be a limit
on hardware resources. Such a strong claim also ignores many other factors.
Should the cloud provide unlimited resources when the demand is caused by
application software bugs? Should resources be made available immediately upon
application request without the user’s prior agreement? What about financial
control measures? The NIST definition does not mean immediate and unlimited;
in our understanding, “rapid provision and release” is bounded by limits
agreed in advance in a Service Level Agreement (SLA). For example, scalability
should always be agreed in advance between cloud user and provider as upper
and lower limits defined in an SLA. If the organization wants to increase or
decrease either limit, it needs to update its SLA accordingly (automated APIs
can simplify this process). When the upper limit is increased, the service
provider also needs to check whether its internal infrastructure can cover the
additional resources. When the lower limit is decreased, the service provider
must ensure that customers are not overcharged for unused resources. These
limits protect both the cloud provider (e.g. by giving it an expectation of
the overall upper limit on resources) and the cloud user (e.g. by not paying
for resources consumed by software bugs that generate illegitimate demand for
virtual resources).
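The bounded reading of “rapid provision and release” can be illustrated with a
minimal sketch. It is not part of the TCloud design: the ScalingSLA class, its
field names, and the grant and raise_upper_limit functions are hypothetical
names introduced here only to show how pre-agreed limits could constrain an
automated scaling request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingSLA:
    """Hypothetical pre-agreed scaling limits for one cloud user."""
    min_vms: int  # lower limit agreed in the SLA
    max_vms: int  # upper limit agreed in the SLA

    def __post_init__(self):
        if not 0 <= self.min_vms <= self.max_vms:
            raise ValueError("SLA limits must satisfy 0 <= min_vms <= max_vms")


def grant(requested_vms: int, sla: ScalingSLA) -> int:
    """Return the number of VMs to provision, clamped to the SLA limits."""
    return max(sla.min_vms, min(requested_vms, sla.max_vms))


def raise_upper_limit(sla: ScalingSLA, new_max: int, provider_free_vms: int) -> ScalingSLA:
    """Return an updated SLA only if the provider can cover the extra VMs."""
    extra = new_max - sla.max_vms
    if extra > provider_free_vms:
        raise ValueError("provider cannot cover the additional resources")
    return ScalingSLA(sla.min_vms, new_max)
```

With limits of 2 and 10 VMs, a runaway request for 500 VMs (for example one
caused by a software bug) is capped at 10, which is the protection for both
parties described above.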
1.1 Objectives
The main objective of this paper is to define and explore the cloud’s
self-managed services for virtual resources. To this end we provide a
conceptual model of self-managed services, identify the factors that affect
management service decisions, and then discuss the interdependencies between
management services in the cloud computing context. Based on this we discuss
the challenges involved in providing secure and reliable self-managed
services, which we have just started working on as part of the TCloud project.
2 Self-Managed Services
In this section we briefly define self-managed services, identify the factors
that affect their decisions, and provide a conceptual model of the functions
required to support them.
2.1 Definition
One of the cloud’s main potential features is the provision of automated
self-managed services. Self-managed services provide cloud infrastructure with
exceptional capabilities that enable it to manage the infrastructure’s virtual
resources automatically (i.e. without human intervention) and to take
appropriate actions in emergencies. Self-managed services are not the same as
autonomic computing [3]. Autonomic computing is concerned with self-management
of physical resources (e.g. physical servers), and it does not change
dynamically in response to changes in end-user requirements. Self-managed
services, in contrast, provide self-management of virtual resources, which run
on top of physical resources, and are driven by many static and dynamic
factors, including end-user requirements and infrastructure properties. In
other words, self-managed services could run on top of autonomic computing but
the opposite is not true (discussing this in further detail is outside the
scope of this paper).
For the IaaS cloud type, self-managed services are concerned with supporting
the availability, reliability, scalability, resilience, and adaptability of
the cloud’s virtual resources. We now give common definitions of these
services from the literature and then discuss how they fit into the cloud
computing context. To the best of our knowledge, this paper is the first to
discuss the interdependencies of these services in the cloud computing
context.
Reliability is a statistical measure that is difficult and time consuming to
obtain. Reliability in general is related to the average time a component
takes to fail (i.e. Mean Time Between Failures (MTBF) or Mean Time To Failure
(MTTF)) [11]. A failure does not include a planned maintenance window: if a
component is brought down because of a fault, its reliability is negatively
affected; if it is brought down for planned maintenance, its reliability is
not affected at all.
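The distinction between faults and planned maintenance can be made concrete
with a small sketch. It assumes a hypothetical outage log in which each record
carries a "planned" flag, and it uses the common estimate of MTBF as operating
time divided by the number of failures; neither the log format nor the
function name comes from the cited literature.

```python
def mtbf_hours(operating_hours: float, outages: list) -> float:
    """Estimate MTBF, counting only unplanned outages as failures.

    `outages` is a list of dicts with a boolean "planned" key
    (a hypothetical record format used for illustration).
    """
    failures = sum(1 for outage in outages if not outage["planned"])
    if failures == 0:
        # No faults observed, so the estimate is unbounded.
        return float("inf")
    return operating_hours / failures
```

A component with one planned maintenance window and two faults over 1,000
operating hours has an estimated MTBF of 500 hours; the maintenance window
does not enter the calculation.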
Unlike individual component reliability, end-to-end service reliability is
related to how successfully a service functions [8]. End-to-end service
reliability depends on the resilience of the architecture of the components
that support the service. High end-to-end service reliability implies that a
service always provides correct results and guarantees no data loss. Higher
reliability of individual components, together with an excellent architecture
and well-defined management processes, helps in supporting higher resilience.
This in turn increases end-to-end service reliability and availability.
Availability of a service is the fraction of time during which the service
provides its intended functions. It is based on two main factors: (a) Mean
Time Between Failures (MTBF) and (b) Mean Time To Repair (MTTR) [11].
Availability is then calculated as MTBF/(MTBF+MTTR). For example, 99.5%
availability over a year means the service can be down for up to 43.8 hours in
that year (0.5% of 8,760 hours), regardless of the reason (e.g. planned
maintenance or unplanned failure). High levels of availability are the result
of an excellent technical architecture that combines well-crafted procedures,
redundant components, and highly reliable components; i.e. a resilient design.
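The availability formula and the 43.8-hour figure can be checked with a short
sketch. The function names are illustrative, and a year is assumed to be 365
days (8,760 hours).

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability(mtbf: float, mttr: float) -> float:
    """Availability as MTBF / (MTBF + MTTR); both in the same time unit."""
    return mtbf / (mtbf + mttr)


def allowed_downtime_hours(availability_fraction: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime budget implied by an availability target over a period."""
    return (1.0 - availability_fraction) * period_hours
```

For instance, an MTBF of 199 hours with an MTTR of 1 hour gives an
availability of 0.995, and a 0.995 target over one year leaves a downtime
budget of about 43.8 hours.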
Resilience is the ability of a system to maintain its features (e.g.
serviceability and security) despite a number of sub-system and component
failures [11]. High resilience can be achieved by providing redundancy
together with careful design (eliminating single points of failure) and
well-planned procedures. Resilient design helps in achieving higher
availability and end-to-end service reliability, as it focuses on tolerating
and surviving the inevitable failures rather than trying to reduce them. The
complexity of cloud infrastructure means that a large number of sub-systems
have to work perfectly together to keep operations running. In addition,
multiple different groups need to cooperate, exchange critical messages, and
coordinate among themselves when taking a self-managed decision, as explained
in section 2.3.
Adaptability is the ability of a system to react to system changes in a timely
and efficient manner. Examples of such changes include: (1) an increase or
decrease in service requests affecting overall system load, (2) the size of
resources, (3) security requirements, (4) environmental conditions (e.g.
different types of resources), and (5) component failures. Adaptability should
always take the overall system architecture into account, ensuring that the
main properties of the system are preserved (e.g. security, resilience,
availability and reliability).
Scalability is the ability of a system to support a dynamic environment by
adding and removing resources quickly and efficiently. For example, in peak
periods the system should scale resources up, and in off-peak periods it
should release unneeded resources. Scaling should not affect fundamental
system properties and should always reflect user-defined requirements, as
defined in the SLA and QoS.
Scalability can be of either or a combination of two types: horizontal
scalability and vertical scalability. Horizontal scalability is about the
number of instances that need to be added to or removed from a system to
satisfy an increase or decrease in demand. Vertical scalability is about
increasing or decreasing the size of the instances themselves to accommodate
an increase or decrease in demand. Scalability must not affect user-defined
security and privacy requirements. For example, adding a Virtual Machine (VM)
to a group of VMs must preserve the overall system security; adding a less
secure VM could, for instance, expose sensitive content.
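The two scaling types and the security constraint can be sketched as follows.
This is an illustration under simplifying assumptions, not a description of
any cloud provider’s scheduler: demand and capacity are single numbers (e.g.
requests per second), instance sizes are a hypothetical name-to-capacity
mapping, and security is reduced to a hypothetical integer level.

```python
import math


def horizontal_scale(demand: float, capacity_per_instance: float) -> int:
    """Number of equally sized instances needed to serve the demand."""
    return max(1, math.ceil(demand / capacity_per_instance))


def vertical_scale(demand: float, instances: int, sizes: dict) -> str:
    """Smallest instance size whose capacity covers each instance's share."""
    share = demand / instances
    for name, capacity in sorted(sizes.items(), key=lambda item: item[1]):
        if capacity >= share:
            return name
    raise ValueError("no instance size is large enough; add instances instead")


def may_join_group(vm_security_level: int, group_required_level: int) -> bool:
    """A VM may join a group only if it meets the group's security level."""
    return vm_security_level >= group_required_level
```

A scaling decision would call may_join_group before adding a VM, so that a
scale-out step never weakens the security of the existing group.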
Figure 1: Factors Affecting Self-Managed Services Behavior
Figure 2: Self-Managed Services Conceptual Model In Cloud Context
importantly it ensures that the end-to-end service integrity is maintained
(i.e. no data loss and correct service execution). If service integrity is
affected in any way and cannot be quickly recovered, the reliability service
notifies the availability service to bring the service down immediately. This
ensures that data integrity is always protected. Simultaneously, the
adaptability and resilience processes should automatically attempt to recover
the system and notify system administrators if a decision cannot be made
automatically (e.g. data corruption that requires manual intervention by an
expert domain administrator).
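The interaction described in this item can be sketched as a simple decision
flow. The sketch is an assumption-laden simplification: the function name and
the two recovery callbacks are hypothetical, the services are reduced to
log messages, and the recovery attempt runs after the shutdown request rather
than simultaneously with it.

```python
from typing import Callable, List


def handle_integrity_violation(quick_recover: Callable[[], bool],
                               full_recover: Callable[[], bool]) -> List[str]:
    """Return the ordered actions taken after an integrity violation."""
    actions: List[str] = []
    if quick_recover():
        # Integrity restored quickly, so the service can stay up.
        actions.append("service kept up: integrity restored")
        return actions
    # Reliability asks availability to take the service down to protect data.
    actions.append("availability: bring service down")
    # Adaptability and resilience then attempt an automated recovery.
    if full_recover():
        actions.append("adaptability/resilience: system recovered")
    else:
        actions.append("notify administrators: manual intervention required")
    return actions
```

Passing the recovery steps as callbacks keeps the flow independent of how a
particular infrastructure detects or repairs corruption.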
3.1 Challenges
Providing self-managed services requires careful consideration and analysis,
not only because of their complexity and interdependencies but also for the
following reasons.
right decision to provide redundant active/active resources across distant
locations and in other cases it is wiser to provide active/passive.
3.2 Requirements
We are still working on identifying the requirements.
4 Conclusion
Current cloud infrastructure does not provide the full potential of automated
self-managed services, and relies on the cloud provider’s employees (system
architects, system and security administrators) to support the virtual
infrastructure. In this paper we present a conceptual model of self-managed
services in the cloud. The model helps in understanding the required functions
and their interdependencies when providing self-managed services in cloud
infrastructure. It also helps in recognizing the challenges involved in
providing automated management functions for the cloud’s virtual
infrastructure. We have just started working on this as part of the EU-funded
TCloud (Trusted Cloud) project.
5 Acknowledgment
This research has been supported by the TCloud project (see footnote 2), which
is funded by the EU’s Seventh Framework Programme ([FP7/2007-2013]) under
grant agreement number ICT-257243.
The author would like to thank Andrew Martin and Matthias Schunter for their
discussions and valuable input.
2 http://www.tclouds-project.eu
References
[1] Amazon. Amazon Elastic Compute Cloud (Amazon EC2), 2010.
http://aws.amazon.com/ec2/.
[2] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph,
Randy H. Katz, Andrew Konwinski, Gunho Lee, David A. Patterson,
Ariel Rabkin, Ion Stoica, and Matei Zaharia. Above the Clouds: A
Berkeley View of Cloud Computing, 2009.
http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf.
[3] IBM. Autonomic computing, 2001. http://www.research.ibm.com/autonomic/.
[4] Keith Jeffery and Burkhard Neidecker-Lutz. The Future of Cloud
Computing: Opportunities for European Cloud Computing Beyond 2010.
[5] Peter Mell and Tim Grance. The NIST Definition of Cloud Computing.
[6] Microsoft. Microsoft System Center IT Infrastructure Server Management
Solutions, 2010. http://www.microsoft.com/systemcenter/.