UNIT I INTRODUCTION
● Over the last three decades, businesses that use computing resources have had to contend with a vast array of buzzwords such as grid computing, utility computing, autonomic computing, on-demand computing, and so on.
● Historically, the term cloud has been used as a metaphor for the Internet.
● This usage of the term was originally derived from the common practice of drawing the network in diagrams as a cloud outline, a symbol used to represent the transport of data across the network to an endpoint location on the other side.
● Figure 1.1 illustrates a network diagram that includes this symbolic representation of the cloud.
● The cloud computing concepts were initiated in 1961, when Professor John McCarthy
suggested that computer time-sharing technology might lead to a future where
computing power and specific applications might be sold through a utility-type business
model.
● This idea became very popular in the late 1960s, but by the mid-1970s it had faded away when it became clear that the IT industries of the day were unable to sustain such an innovative computing model. However, since the turn of the millennium, the concept has been revived.
● In the early days, enterprises used the utility computing model primarily for non-mission-critical requirements, but that is quickly changing as trust and reliability issues are resolved.
● Research analysts and technology vendors tend to define cloud computing very narrowly, as a new type of utility computing that basically uses virtual servers made available to third parties via the Internet.
● Others describe the term cloud computing using a very broad, all-inclusive notion of the virtual computing platform. They contend that anything beyond the network firewall is in the cloud.
● The cloud sees no borders and thus has made the world a much smaller place. Similarly, the Internet is global in scope but respects only established communication paths.
● People from everywhere now have access to other people from anywhere else.
● Globalization of computing assets may be the major contribution the cloud has made to
date. For this reason, the cloud is the subject of many complex geopolitical issues.
● Cloud computing is viewed as a resource available as a service for virtual data centers, but cloud computing and virtual data centers are not the same thing.
● For example, Amazon’s S3 (Simple Storage Service) is a data storage service designed for use across the Internet. It is designed to make web-scale computing easier for developers.
● Another example is Google Apps. This provides online access via a web browser to the
most common office and business applications used today. The Google server stores all
the software and user data.
● Managed service providers (MSPs) offer one of the oldest forms of cloud computing.
● Grid computing is often confused with cloud computing. Grid computing is a form of distributed computing that implements a virtual supercomputer made up of a cluster of networked or internetworked computers working together to perform very large tasks.
● Most of the cloud computing deployments in the market today are powered by grid computing implementations and are billed like utilities, but the cloud computing paradigm has evolved a step beyond the grid utility model.
● The majority of cloud computing infrastructure consists of time-tested and highly reliable services built on servers with varying levels of virtualization technology, delivered via large-scale data centers operating under service level agreements that require 99.9999% uptime.
Definition of cloud
● Cloud computing is a model for delivering IT services in which resources are retrieved
from the internet through web based tools and applications rather than a direct
connection to the server.
● In other words, cloud computing is a distributed computing model over a network that provides the ability to run a program on many connected components at the same time.
● In the cloud computing environment, real server machines are replaced by virtual machines. Such virtual machines do not physically exist and can therefore be moved around and scaled up or down on the fly without affecting the cloud user, much like a natural cloud.
● Cloud refers to software, platform, and infrastructure that are sold as a service. These services are accessed remotely through the Internet.
● The cloud users can simply log on to the network without installing anything. They do not
pay for hardware and maintenance. But the service providers pay for physical equipment
and maintenance.
● The concept of cloud computing becomes much more understandable when one considers what modern IT environments always require: scalable capacity or additional capabilities added to their infrastructure dynamically, without investing money in new infrastructure, without training new personnel, and without licensing new software.
● Hardware is a part of the evolutionary process. As hardware evolved, so did software. As networking evolved, so did the rules for how computers communicate. The development of such rules, or protocols, helped to drive the evolution of Internet software.
● Establishing a common protocol for the Internet led directly to rapid growth in the
number of users online.
● Today, enterprises discuss the use of IPv6 (Internet Protocol version 6) to ease addressing concerns and to improve the methods used to communicate over the Internet.
● The use of web browsers led to a steady migration away from the traditional data center model toward a cloud computing based model.
● The first step along the evolutionary path of computers occurred in 1930, when binary arithmetic was developed and became the foundation of computer processing technology, terminology, and programming languages.
● Calculating devices date back to at least as early as 1642, when a device that could
mechanically add numbers was invented.
● Adding devices evolved from the abacus. This evolution was one of the most significant milestones in the history of computers.
● In 1939, John Atanasoff and Clifford Berry invented an electronic computer capable of operating on digital values. The computations were performed using vacuum tube technology.
● In 1941, the introduction of the Z3 at the German Laboratory for Aviation in Berlin was one of the most significant events in the evolution of computers, because the Z3 machine supported both binary arithmetic and floating-point computation. Because it was a “Turing complete” device, it is considered to be the very first fully operational computer.
● The first generation of modern computers can be traced to 1943, when the Mark I and Colossus computers were developed for fairly different purposes.
● With financial support from IBM, the Mark I was designed and developed at Harvard University. It was a general-purpose, electromechanical, programmable computer.
● Colossus was an electronic computer built in Britain at the end of 1943. Colossus was the world’s first programmable, digital, electronic computing device.
● In general, First generation computers were built using hard-wired circuits and vacuum
tubes.
● ENIAC was composed of 18,000 thermionic valves, weighed over 60,000 pounds, and consumed 25 kilowatts of electrical power. ENIAC was capable of performing 100,000 calculations a second.
● The integrated circuit, or microchip, was developed by Jack St. Clair Kilby, an achievement for which he received the Nobel Prize in Physics in 2000.
● Kilby’s invention initiated an explosion in third generation computers. Even though the first integrated circuit was produced in 1958, microchips were not used in programmable computers until 1963.
● In 1971, Intel released the world’s first commercial microprocessor called Intel 4004.
● Intel 4004 was the first complete CPU on one chip and became the first commercially
available microprocessor. It was possible because of the development of new silicon
gate technology that enabled engineers to integrate a much greater number of
transistors on a chip that would perform at a much faster speed.
● The fourth generation computers that were being developed at this time utilized a
microprocessor that put the computer’s processing capabilities on a single integrated
circuit chip.
● The first commercially available personal computer was the MITS Altair 8800, released
at the end of 1974. What followed was a flurry of other personal computers to market,
such as the Apple I and II, the Commodore PET, the VIC-20, the Commodore 64, and
eventually the original IBM PC in 1981. The PC era had begun in earnest by the mid-
1980s.
● Even though microprocessing power, memory and data storage capacities have
increased by many orders of magnitude since the invention of the 4004 processor, the
technology for Large Scale Integration (LSI) or Very Large Scale Integration (VLSI)
microchips has not changed all that much.
● For this reason, most of today’s computers still fall into the category of fourth generation
computers.
● The Internet takes its name from the Internet Protocol, the standard communications protocol used by every computer on the Internet.
● Vannevar Bush wrote a visionary description of the potential uses for information technology with his description of an automated library system called MEMEX.
● Bush introduced the concept of the MEMEX in the late 1930s as a microfilm-based device in which an individual could store all of his books and records.
● The second individual who shaped the Internet was Norbert Wiener.
● Wiener was an early pioneer in the study of stochastic and noise processes. His work in stochastic and noise processes was relevant to electronic engineering, communication, and control systems.
● SAGE refers to the Semi-Automatic Ground Environment. SAGE was the most ambitious computer project of its day; it started in the mid-1950s and became operational by 1963. It remained in continuous operation for over 20 years, until 1983.
● A minicomputer was invented specifically to realize the design of the Interface Message
Processor (IMP). This approach provided a system independent interface to the
ARPANET.
● The IMP would handle the interface to the ARPANET network. The physical layer, the
data link layer, and the network layer protocols used internally on the ARPANET were
implemented using IMP.
● Using this approach, each site would only have to write one interface to the commonly
deployed IMP.
● The first networking protocol that was used on the ARPANET was the Network Control
Program (NCP). The NCP provided the middle layers of a protocol stack running on an
ARPANET connected host computer.
● The lower-level protocol layers were provided by the IMP host interface; the NCP essentially provided a transport layer consisting of the ARPANET Host-to-Host Protocol (AHHP) and the Initial Connection Protocol (ICP).
● The AHHP defines how to transmit a unidirectional and flow controlled stream of data
between two hosts.
● The ICP specifies how to establish a bidirectional pair of data streams between a pair of
connected host processes.
● Robert Kahn and Vinton Cerf built on what was learned with NCP to develop the TCP/IP networking protocol commonly used nowadays. TCP/IP quickly became the most widely used network protocol in the world.
● Over time, there evolved four increasingly better versions of TCP/IP (TCP v1, TCP v2, a
split into TCP v3 and IP v3, and TCP v4 and IPv4). Now, IPv4 is the standard protocol,
but it is in the process of being replaced by IPv6.
● The amazing growth of the Internet throughout the 1990s caused a huge reduction in the
number of free IP addresses available under IPv4. IPv4 was never designed to scale to
global levels. To increase available address space, it had to process data packets that
were larger.
● After examining a number of proposals, the Internet Engineering Task Force (IETF)
settled on IPv6, which was released in early 1995 as RFC 1752. IPv6 is sometimes
called the Next Generation Internet Protocol (IPNG) or TCP/IP v6.
● The creation and management of virtual machines has often been called platform
virtualization.
● In time-sharing systems, the CPU is allocated to each task for a small time slice in turn. This approach is called round robin scheduling (RR scheduling). It is one of the oldest, simplest, fairest, and most widely used scheduling algorithms, designed especially for time-sharing systems.
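The following is a minimal sketch (not from the source) of how a round robin scheduler cycles tasks through a fixed time quantum; the task names and burst times are made-up examples.

from collections import deque

def round_robin(tasks, quantum):
    """Simulate round robin scheduling.

    tasks: list of (name, burst_time) pairs; quantum: time slice per turn.
    Returns the order in which the tasks finish.
    """
    ready = deque(tasks)            # FIFO ready queue
    finished = []
    while ready:
        name, remaining = ready.popleft()
        if remaining > quantum:
            # Task used its whole slice; put it back at the tail of the queue.
            ready.append((name, remaining - quantum))
        else:
            # Task completes within this slice.
            finished.append(name)
    return finished

# Example: three tasks with different CPU bursts and a quantum of 2 time units.
print(round_robin([("A", 5), ("B", 2), ("C", 4)], quantum=2))   # ['B', 'C', 'A']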
● The two fundamental and dominant models of computing are sequential and parallel. The sequential computing era began in the 1940s; the parallel and distributed computing era followed within a decade.
● The four key elements of computing developed during these eras are architectures,
compilers, applications, and problem solving environments.
● The terms parallel computing and distributed computing are often used interchangeably, even though they mean somewhat different things.
● The term parallel implies a tightly coupled system, whereas distributed refers to a wider
class of system which includes tightly coupled systems.
● More specifically, the term parallel computing refers to a model in which the computation is divided among several processors that share the same memory.
● In parallel computing paradigm, each processor is of the same type and it has the same
capability. The shared memory has a single address space, which is accessible to all the
processors.
● A given task is divided into multiple subtasks using a divide and conquer technique, and
each subtask is processed on a different Central Processing Unit (CPU).
● The term distributed computing encompasses any architecture or system that allows the
computation to be broken down into units and executed concurrently on different
computing elements, whether these are processors on different nodes, processors on
the same computer, or cores within the same processor.
● The core elements of parallel processing are CPUs. Based on the number of instruction streams and data streams that can be processed simultaneously, computing systems are classified into four categories (SISD, SIMD, MISD, and MIMD), a taxonomy proposed by Michael J. Flynn in 1966.
● MIMD systems are broadly categorized into shared memory MIMD and distributed
memory MIMD based on the way processing elements are coupled to the main memory.
● In the shared memory MIMD model, all the processing elements are connected to a
single global memory and they all have access to it.
● In the distributed memory MIMD model, all processing elements have a local memory.
Systems based on this model are also called loosely coupled multiprocessor systems.
● In general, a failure in a shared memory MIMD system affects the entire system, whereas this is not the case in the distributed model, in which each of the processing elements can be easily isolated.
● Parallelism within a program is typically exploited through one of the following approaches:
○ Data parallelism
○ Process parallelism
○ Farmer-and-worker model
● In data parallelism, the divide and conquer methodology is used to split data into multiple
sets, and each data set is processed on different processing elements using the same
instruction.
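As an illustration of data parallelism, the sketch below applies the same operation to chunks of a data set using a pool of worker processes; the square() function, chunk size, and data are illustrative assumptions, not part of the source.

from multiprocessing import Pool

def square(x):
    # The same instruction is applied to every element of the data set.
    return x * x

if __name__ == "__main__":
    data = list(range(16))
    with Pool(processes=4) as pool:
        # The pool splits the data into chunks and processes them in parallel.
        results = pool.map(square, data, chunksize=4)
    print(results)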
● In process parallelism, a given operation has multiple distinct tasks that can be
processed on multiple processors.
● In the farmer-and-worker model, a job distribution approach is used in which one processor is configured as the master and all other processing elements are designated as slaves. The master assigns jobs to the slave processing elements and, on completion, they inform the master, which in turn collects the results.
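A sketch of the farmer-and-worker model under the same assumption of illustrative job data: the master (farmer) pushes jobs onto a queue, worker processes execute them, and the master collects the results.

from multiprocessing import Process, Queue

def worker(tasks, results):
    # Each worker repeatedly takes a job from the master's queue,
    # processes it, and reports the result back.
    while True:
        job = tasks.get()
        if job is None:              # sentinel: no more work
            break
        results.put((job, job * job))

if __name__ == "__main__":
    tasks, results = Queue(), Queue()
    workers = [Process(target=worker, args=(tasks, results)) for _ in range(3)]
    for w in workers:
        w.start()
    for job in range(10):            # the master (farmer) assigns jobs
        tasks.put(job)
    for _ in workers:                # one sentinel per worker
        tasks.put(None)
    collected = [results.get() for _ in range(10)]   # master collects results
    for w in workers:
        w.join()
    print(sorted(collected))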
● Parallelism within an application can be detected at several levels: large grain (or task level), medium grain (or control level), fine grain (data level), and very fine grain (multiple-instruction issue).
● A distributed system is a collection of independent computers that appears to its users as a single coherent system.
● A distributed system is the result of the interaction of several components that pass
through the entire computing stack from hardware to software.
● At the very bottom layer, computer and network hardware constitute the physical
infrastructure.
● The hardware components are directly managed by the operating system, which
provides the basic services for inter process communication (IPC), process scheduling
and management, and resource management in terms of file system and local devices.
● The use of well-known standards at the operating system level and even more at the
hardware and network levels allows easy harnessing of heterogeneous components and
their organization into a coherent and uniform system.
● The middleware layer leverages such services to build a uniform environment for the
development and deployment of distributed applications.
● The top of the distributed system stack is represented by the applications and services
designed and developed to use the middleware.
● A second class of architectural styles describes the physical organization of distributed software systems in terms of their major components.
● According to Garlan and Shaw, architectural styles are classified as shown in Table 1.1
Table 1.1 Software architectural styles
Data-centered: Repository, Blackboard
Data flow: Pipe and filter
Virtual machine: Interpreter
Call and return: Object-oriented, Layered
Independent components: Event systems
● The repository architectural style is the most relevant reference model in this category. It
is characterized by two main components: the central data structure, which represents
the current state of the system, and a collection of independent components, which
operate on the central data.
● The pipe and filter style is a variation of the previous style for expressing the activity of a
software system as a sequence of data transformations. Each component of the
processing chain is called a filter, and the connection between one filter and the next is
represented by a data stream.
● The core feature of the interpreter style is the presence of an engine that is used to
interpret a pseudo code expressed in a format acceptable for the interpreter. The
interpretation of the pseudo-program constitutes the execution of the program itself.
● Object Oriented Style encompasses a wide range of systems that have been designed
and implemented by leveraging the abstractions of object oriented programming
● The layered system style allows the design and implementation of software systems in
terms of layers, which provide a different level of abstraction of the system.
● Each layer generally interacts with at most two other layers: the one that provides a lower abstraction level and the one that provides a higher abstraction level.
● In the event systems architectural style, on the other hand, the components of the system are loosely coupled and connected through events.
● The client/server model features two major components: a server and a client. These
two components interact with each other through a network connection using a given
protocol. The communication is unidirectional. The client issues a request to the server,
and after processing the request the server returns a response.
● The important operations in the client-server paradigm are request, accept (client side),
and listen and response (server side).
● In general, multiple clients are interested in such services and the server must be
appropriately designed to efficiently serve requests coming from different clients. This
consideration has implications on both client design and server design.
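As a rough sketch of the listen/accept/request/response cycle using standard TCP sockets (the port number and the echo-style protocol are assumptions for illustration only):

import socket
import threading
import time

HOST, PORT = "127.0.0.1", 5050          # illustrative address and port

def server():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen()                     # server side: listen for connections
        conn, _ = srv.accept()           # accept an incoming connection
        with conn:
            request = conn.recv(1024)            # read the client's request
            conn.sendall(b"reply: " + request)   # return a response

threading.Thread(target=server, daemon=True).start()
time.sleep(0.5)                          # give the server a moment to start

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect((HOST, PORT))            # client side: connect to the server
    cli.sendall(b"hello")                # issue a request
    print(cli.recv(1024))                # b'reply: hello'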
● For the client design, there are two models: Thin client model and Fat client model.
● In the thin client model, the load of data processing and transformation is put on the server side, and the client has a light implementation that is mostly concerned with retrieving and returning the data it is asked for, with no considerable further processing.
● In the fat client model, the client component is also responsible for processing and transforming the data before returning it to the user, whereas the server features a fairly light implementation that is mostly concerned with the management of access to the data.
● The three major components in the client-server model are presentation, application
logic, and data storage.
● Presentation, application logic, and data maintenance can be seen as conceptual layers,
which are more appropriately called tiers.
● The mapping between the conceptual layers and their physical implementation in
modules and components allows differentiating among several types of architectures,
which go under the name of multi-tiered architectures.
● Two-tier architecture partitions the systems into two tiers, which are located one in the
client component and the other on the server. The client is responsible for the
presentation tier by providing a user interface. The server concentrates the application
logic and the data store into a single tier.
● Three-tier architecture separates the presentation of data, the application logic, and the
data storage into three tiers. This architecture is generalized into an N-tier model in case
it is necessary to further divide the stages composing the application logic and storage
tiers.
● The peer-to-peer model introduces a symmetric architecture in which all the components, called peers, play the same role and incorporate both the client and server capabilities of the client/server model.
● There are several different models in which processes can interact with each other;
these map to different abstractions for IPC. Among the most relevant models are shared
memory, remote procedure call (RPC), and message passing.
● Message passing introduces the concept of a message as the main abstraction of the model. The entities exchanging information explicitly encode the data to be exchanged in the form of a message. The structure and the content of a message vary according to the model. The most important example of this model is the Message-Passing Interface (MPI).
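A minimal sketch of the message-passing idea using a pipe between two local processes; this illustrates the general model only (it is not MPI itself), and the message contents are made up.

from multiprocessing import Process, Pipe

def child(conn):
    msg = conn.recv()                          # receive an explicit message
    conn.send({"ack": True, "echo": msg})      # reply with another message
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=child, args=(child_end,))
    p.start()
    parent_end.send({"op": "greet", "payload": "hello"})   # data encoded as a message
    print(parent_end.recv())                   # {'ack': True, 'echo': {...}}
    p.join()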
● The remote procedure call (RPC) paradigm extends the concept of a procedure call beyond the boundaries of a single process, thus triggering the execution of code in remote processes. In this case, an underlying client/server architecture is implied. A remote process hosts a server component, allowing client processes to request the invocation of methods, and returns the results of the execution.
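A sketch of the RPC idea using Python's standard xmlrpc modules; the add() method, host, and port are illustrative assumptions.

# Server process: hosts a component and exposes a method for remote invocation.
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    return a + b

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
server.register_function(add, "add")
server.serve_forever()

# Client process (run separately): invokes the remote procedure as if it were local.
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))   # the call is marshalled, executed remotely, and the result returned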
Point-to-point message model
● This model organizes the communication among single components. Each message is sent from one component to another, and there is direct addressing to identify the message receiver. In a point-to-point communication model it is necessary to know the location of, or how to address, the other component in the system.
Publish-subscribe message model
● This model introduces a different strategy, one that is based on notification among components.
● There are two major roles: the publisher and the subscriber.
● There are two major strategies for dispatching events to the subscribers:
○ Push strategy. In this case it is the responsibility of the publisher to notify all the subscribers, for example with a method invocation.
○ Pull strategy. In this case the publisher simply makes the message available for a specific event, and it is the responsibility of the subscribers to check whether there are messages for the events to which they have registered.
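A toy dispatcher contrasting the two strategies, under the assumption that push subscribers register a callable and pull subscribers poll a per-event mailbox; all names and the event used are illustrative.

from collections import defaultdict

class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)   # event -> callbacks (push)
        self.mailboxes = defaultdict(list)     # event -> queued messages (pull)

    def subscribe(self, event, callback=None):
        if callback is not None:
            self.subscribers[event].append(callback)   # push: publisher will notify
        else:
            _ = self.mailboxes[event]                  # pull: create a mailbox to poll

    def publish(self, event, message):
        for cb in self.subscribers[event]:
            cb(message)                            # push strategy: notify every subscriber
        if event in self.mailboxes:
            self.mailboxes[event].append(message)  # pull strategy: just make it available

    def poll(self, event):
        msgs, self.mailboxes[event] = self.mailboxes[event], []
        return msgs                                # subscribers check for messages themselves

bus = EventBus()
bus.subscribe("price-update", callback=lambda m: print("pushed:", m))
bus.subscribe("price-update")                      # a pull-style subscriber
bus.publish("price-update", {"symbol": "XYZ", "price": 42})
print("pulled:", bus.poll("price-update"))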
Request-reply message model
● The request-reply message model identifies all communication models in which, for each
message sent by a process, there is a reply.
● This model is quite popular and provides a different classification that does not focus on
the number of the components involved in the communication but rather on how the
dynamic of the interaction evolves.
● RPC allows extending the concept of a procedure call beyond the boundaries of a
process and a single memory address space.
● The called procedure and calling procedure may be on the same system or they may be
on different systems in a network.
Service-oriented computing
● SOA encompasses a set of design principles that structure system development and
provide means for integrating components into a coherent and decentralized system.
● There are two major roles within SOA: the service provider and the service consumer.
From the various definitions of cloud computing, a certain set of key characteristics emerges. Figure 1.15 illustrates the key characteristics of the cloud computing paradigm.
On-demand provisioning
● Users request resources on demand; these demands are then automatically granted by the cloud provider’s service, and the users are charged only for their usage, i.e., the time they were in possession of the resources.
● The reactivity of a cloud solution, with regard to resource provisioning is indeed of prime
importance as it is closely related to the cloud’s pay-as-you-go business model.
● One important and valuable feature of cloud computing is that the user can continuously monitor server uptime, capabilities, and allotted network storage, as well as the computing capabilities in use.
Universal access
● Resources in the cloud need to be not only provisioned rapidly but also accessed and managed universally, using standard Internet protocols, typically via RESTful web services.
● This enables the users to access their cloud resources using any type of devices,
provided they have an Internet connection.
● Universal access is a key feature behind the cloud’s widespread adoption, not only by
professional actors but also by the general public that is nowadays familiar with cloud
based solutions such as cloud storage or media streaming.
● Capabilities are available over the network and accessed through standard mechanisms
that promote use by heterogeneous thin or thick client platforms such as mobile phones,
tablets, laptops, and workstations.
● As Figure 1.15 shows, the key characteristics of cloud computing include on-demand provisioning, universal access, enhanced reliability, measured services, multitenancy, resource pooling, elasticity, scalability, high availability, maintenance, and security.
Enhanced reliability
● Cloud computing enables the users to enhance the reliability of their applications.
● Cloud providers usually have more than one data center and further reliability can be
achieved by backing data up in different locations.
● This can also be used to ensure service availability, in the case of routine maintenance
operations or the rarer case of a natural disaster.
● The user can achieve further reliability using the services of different cloud providers.
Measured services
● The customers are entitled to a certain quality of service, guaranteed by the Service Level Agreement, which they should be able to supervise.
● Therefore, cloud providers offer monitoring tools, either using a graphical interface or via
an API.
● These tools also help the providers themselves for billing and management purposes.
1.5.5 Multitenancy
● As with the grid before it, the cloud’s resources are shared by different simultaneous users. Grid users, however, had to reserve a fixed number of physical machines in advance for a fixed amount of time.
● Cloud users’ provisioned resources can run alongside other users’ resources, thus requiring fewer physical resources overall. Consequently, significant energy savings can be made by shutting down unused resources or putting them into energy-saving mode.
Resource pooling
● The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand.
● There is a sense of location independence in that the customer generally has no control
or knowledge over the exact location of the provided resources but may be able to
specify location at a higher level of abstraction (e.g., country, state, or datacenter).
Elasticity
● Elasticity is the ability of a system to add and remove resources, such as CPU cores, memory, and virtual machine or container instances, to adapt to load variations in real time.
● Elasticity is a dynamic property of cloud computing. There are two types of elasticity: horizontal and vertical.
● Other terms, such as scalability and efficiency, are associated with elasticity; although they are sometimes used interchangeably, their meanings differ from that of elasticity.
● Scalability is the ability of a system to sustain increasing workloads by making use of additional resources. It is time independent: it resembles the provisioning phase of elasticity, but time has no effect on the system (it is a static property).
● In short, elasticity can be summarized as scalability combined with the speed at which resources are provisioned and released over time.
● To the consumer, the capabilities available for provisioning often appear to be unlimited
and can be appropriated in any quantity at any time.
Maintenance
● The servers are easily maintained and the downtime is very low; in some cases there is no downtime at all.
● Cloud platforms are updated regularly and gradually improve: updates tend to be more compatible with devices, perform faster than older versions, and fix known bugs.
● The capabilities of the cloud can be adjusted to actual use and extended considerably. The provider analyzes storage usage and allows the user to buy extra cloud storage if needed for a small fee.
1.5.10 Security
● Cloud security is one of the important features of cloud computing. Providers create snapshots of the stored data so that data is not lost even if one of the servers is damaged.
● The data is stored within storage devices that cannot easily be accessed or misused by other parties. The storage service is quick and reliable.
4. Define Cloud.
● Cloud refers to software, platform, and infrastructure that are sold as a service. These services are accessed remotely through the Internet.
● The cloud users can simply log on to the network without installing anything. They do
not pay for hardware and maintenance. But the service providers pay for physical
equipment and maintenance.
● The first networking protocol that was used on the ARPANET was the Network
Control Program (NCP).
● The NCP provided the middle layers of a protocol stack running on an ARPANET
connected host computer.
● The four key elements of computing developed during these eras are architectures,
compilers, applications, and problem solving environments.
● The terms parallel computing and distributed computing are often used interchangeably, even though they mean somewhat different things. Parallel implies a tightly coupled system, whereas distributed refers to a wider class of system, which includes tightly coupled systems.
● The term distributed computing encompasses any architecture or system that allows
the computation to be broken down into units and executed concurrently on different
computing elements, whether these are processors on different nodes, processors
on the same computer, or cores within the same processor.
● Data parallelism
● Process parallelism
● Farmer-and-worker model
● Data-centered
● Data flow
● Virtual machine
● Call and return
● Independent components
● The repository architectural style is the most relevant reference model in this
category.
● It is characterized by two main components: the central data structure, which
represents the current state of the system, and a collection of independent
components, which operate on the central data.
● In the thin client model, the load of data processing and transformation is put on the server side, and the client has a light implementation that is mostly concerned with retrieving and returning the data it is asked for, with no considerable further processing.
● In the fat client model, the client component is also responsible for processing and transforming the data before returning it to the user.
● Two-tier architecture partitions the systems into two tiers, which are located one in
the client component and the other on the server.
● Three-tier architecture separates the presentation of data, the application logic, and
the data storage into three tiers.
20. List the strategies for dispatching the event to the subscribers
● Push strategy. In this case it is the responsibility of the publisher to notify all the
subscribers.
● Pull strategy. In this case the publisher simply makes available the message for a
specific event.
● The request-reply message model identifies all communication models in which, for
each message sent by a process, there is a reply.
● This model is quite popular and provides a different classification that does not focus
on the number of the components involved in the communication but rather on how
the dynamic of the interaction evolves.
-------------------------------------------------------------------------------------------------------------------------------
UNIT II
Service Oriented Architecture – REST and Systems of Systems – Web Services – Publish-Subscribe Model – Basics of Virtualization – Types of Virtualization – Implementation Levels of Virtualization – Virtualization Structures – Tools and Mechanisms – Virtualization of CPU – Memory – I/O Devices – Virtualization Support and Disaster Recovery.
-------------------------------------------------------------------------------------------------------------------------------
● A service encapsulates a software component that gives a set of coherent and related
functionalities that can be reused and integrated into larger and more complex
applications.
● Don Box identifies four major characteristics that help to identify a service.
● SOA encompasses a set of design principles that structure system development and
provide means for integrating components into a coherent and decentralized system.
● There are two major roles within SOA:
○ Service provider
○ Service consumer
● First, the service provider is the maintainer of the service and the organization that
makes available one or more services for others to use.
● To advertise services, the provider can publish them in a registry along with a service
contract that specifies the nature of the service, how to use the service, the requirements
for the service and the fees charged.
● Second, the service consumer can locate the service metadata in the registry and
develop the required client components to bind and use the service.
● It is very common in SOA based computing systems that components play the roles of
both service provider and service consumer.
● Services might aggregate information and data retrieved from other services or create workflows of services to satisfy the request of a given service consumer. This practice is called service orchestration, which more generally describes the automated arrangement, coordination, and management of complex computer systems, middleware, and services.
● SOA provides a reference model for architecting several software systems primarily for
enterprise business applications and systems.
○ Loose coupling
○ Abstraction
○ Reusability
○ Autonomy
■ Services have control over the logic they encapsulate, and from the service consumer’s point of view there is no need to know about their implementation.
○ Lack of state
○ Discoverability
■ Services are defined by description documents that constitute
supplemental metadata through which they can be effectively discovered.
■ Service discovery provides an effective means for utilizing third party
resources.
○ Composability
● Together with these principles, other resources guide the use of SOA for enterprise
application integration (EAI).
● The SOA manifesto integrates the previously described principles with general considerations about the overall goals of a service oriented approach to enterprise application software design and about what is valued in SOA.
● CORBA has been a suitable platform for realizing SOA systems because it provides
interoperability among different implementations and has been designed as a
specification supporting the development of industrial applications.
● Nowadays, SOA is mostly realized through Web services technology, which provides an
interoperable platform for connecting systems and applications.
● Web services are the prominent technology for implementing SOA systems and
applications.
● They leverage Internet technologies and standards for building distributed systems.
Several aspects make Web services the technology of choice for SOA.
○ First, they allow for interoperability across different platforms and programming
languages.
○ Second, they are based on well-known and vendor independent standards such
as HTTP, SOAP, XML and WSDL.
○ Third, they provide an intuitive and simple way to connect heterogeneous
software systems, enabling the quick composition of services in a distributed
environment.
○ Finally, they provide the features required by enterprise business applications to
be used in an industrial environment.
● They define facilities for enabling service discovery, which allows the system architect to
more efficiently compose SOA applications and service metering to assess whether a
specific service complies with the contract between the service provider and the service
consumer.
● Using as a basis the object oriented abstraction, a Web service exposes a set of
operations that can be invoked by leveraging Internet based protocols.
● The semantics for invoking Web service methods is expressed through interoperable
standards such as XML and WSDL, which also provide a complete framework for
expressing simple and complex types in a platform independent manner.
● HTTP is the most popular transport protocol used for interacting with Web services.
Figure 2.1 (overview): A WS client queries the UDDI registry, retrieves the Web service’s WSDL description, and then invokes the Web service application deployed on a Web server.
● Figure 2.1 describes the common use case scenarios for Web services.
● System architects develop a Web service with their technology of choice and deploy it in
compatible Web or application servers.
● Service consumers can look up and discover services in global catalogs using Universal
Description Discovery and Integration (UDDI).
● Web services are now extremely popular, so bindings exist for any mainstream
programming language in the form of libraries or development support tools.
● This makes the use of Web services seamless and straightforward with respect to
technologies such as CORBA that require much more integration effort.
● Moreover, being interoperable, Web services constitute a better solution for SOA with
respect to several distributed object frameworks, such as .NET Remoting, Java RMI,
and DCOM/COM1, which limit their applicability to a single platform or environment.
● Besides the main function of enabling remote method invocation by using Web based
and interoperable standards, Web services encompass several technologies that put
together and facilitate the integration of heterogeneous applications and enable service
oriented computing.
● Figure 2.2 shows the Web service technologies stack that lists all the components of the
conceptual framework describing and enabling the Web services abstraction.
● These technologies cover all the aspects that allow Web services to operate in a
distributed environment, from the specific requirements for the networking to the
discovery of services.
Figure 2.2: Web service technologies stack — Network (HTTP, FTP, Email, ...), XML-based messaging (SOAP), Service Description (WSDL), and Service Publication (UDDI), with QoS, Security, and Management as cross-cutting concerns.
● The backbone of all these technologies is XML, which is also one of the causes of Web
service’s popularity and ease of use.
● XML based languages are used to manage the low level interaction for Web service
method calls (SOAP), for providing metadata about the services (WSDL), for discovery
services (UDDI), and other core operations.
● In practice, the core components that enable Web services are SOAP and WSDL.
● Simple Object Access Protocol (SOAP), an XML-based language for exchanging structured information in a platform-independent manner, constitutes the protocol used for Web service method invocation.
● SOAP structures the interaction in terms of messages that are XML documents
mimicking the structure of a letter, with an envelope, a header, and a body.
● The header is optional and contains relevant information on how to process the
message.
Host: www.sample.com
Content-Type: application/soap+xml; charset=utf-8
Content-Length: <Size>
● The main uses of SOAP messages are method invocation and result retrieval.
● Figure 2.3 shows an example of a SOAP message used to invoke a Web service
method that retrieves the price of a given stock and the corresponding reply.
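Since Figure 2.3 is not reproduced here, the sketch below shows what such an invocation could look like: a SOAP envelope for a hypothetical GetPrice operation posted over HTTP with Python’s standard http.client module. The namespace, endpoint path, and element names are assumptions for illustration only.

import http.client

# A minimal SOAP envelope invoking a hypothetical GetPrice method for a stock.
envelope = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Header/>
  <soap:Body>
    <m:GetPrice xmlns:m="http://www.sample.com/stock">
      <m:StockName>XYZ</m:StockName>
    </m:GetPrice>
  </soap:Body>
</soap:Envelope>"""

conn = http.client.HTTPConnection("www.sample.com")
conn.request(
    "POST", "/stockquote",                       # assumed endpoint path
    body=envelope.encode("utf-8"),
    headers={"Content-Type": "application/soap+xml; charset=utf-8"},
)
response = conn.getresponse()                    # the reply carries a SOAP envelope too
print(response.status, response.read().decode())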
● Despite the fact that XML documents are easy to produce and process in any platform or
programming language, SOAP has often been considered quite inefficient because of
the excessive use of markup that XML imposes for organizing the information into a well-
formed document.
● Therefore, lightweight alternatives to the SOAP/XML pair have been proposed to support
Web services.
● In a RESTful system, a client sends a request over HTTP using the standard HTTP
methods (PUT, GET, POST, and DELETE) and the server issues a response that
includes the representation of the resource.
● The GET, PUT, POST, and DELETE methods constitute a minimal set of operations for
retrieving, adding, modifying and deleting the data.
● Together with an appropriate URI organization to identify resources, all the atomic
operations required by a Web service are implemented.
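As a sketch of these atomic operations, the snippet below exercises POST/GET/PUT/DELETE against a hypothetical resource URI using the third-party requests library; the URL and payload are illustrative, not a real service.

import requests

BASE = "https://api.example.com/stocks"          # hypothetical resource collection

# Create a resource, retrieve its representation, modify it, then delete it.
requests.post(BASE, json={"symbol": "XYZ", "price": 42})        # add
resource = f"{BASE}/XYZ"
print(requests.get(resource).json())                            # retrieve
requests.put(resource, json={"symbol": "XYZ", "price": 43})     # modify
requests.delete(resource)                                       # delete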
● The content of data is still transmitted using XML as part of the HTTP content, but the
additional markup required by SOAP is removed.
● For this reason, REST represents a lightweight alternative to SOAP, which works
effectively in contexts where additional aspects beyond those manageable through
HTTP are absent.
● This is not a great limitation, and RESTful Web services are quite popular and used to
deliver functionalities at enterprise scale:
○ Twitter
○ Yahoo! (search APIs, maps, photos, etc)
○ Flickr
○ Amazon.com
● Web Service Description Language (WSDL) is an XML based language for the
description of Web services.
● It is used to define the interface of a Web service in terms of methods to be called and
types and structures of the required parameters and return values.
● In Figure 2.3 we notice that the SOAP messages for invoking the GetPrice method and
receiving the result do not have any information about the type and structure of the
parameters and the return values.
● This information is stored within the WSDL document attached to the Web service.
● Therefore, Web service consumer applications already know which types of parameters
are required and how to interpret results.
● As an XML based language, WSDL allows for the automatic generation of Web service
clients that can be easily embedded into existing applications.
● Moreover, XML is a platform and language independent specification, so clients for web
services can be generated for any language that is capable of interpreting XML data.
● This is a fundamental feature that enables Web service interoperability and one of the
reasons that make such technology a solution of choice for SOA.
● Besides those directly supporting Web services, other technologies characterize Web 2.0 and contribute to enriching and empowering Web applications and, in turn, SOA-based systems.
● These fall under names such as Asynchronous JavaScript and XML (AJAX) and JavaScript Object Notation (JSON), among others.
● AJAX transforms simple Web pages into complete applications and is used to enrich the user experience.
● AJAX uses XML to exchange data with Web services and applications
Publish-Subscribe Model
● In this model there are two major roles, the publisher and the subscriber:
■ The publisher provides facilities for the subscriber to register its interest in a specific topic or event.
■ Specific conditions holding true on the publisher side can trigger the creation of messages that are attached to a specific event.
■ A message will be available to all the subscribers that registered for the corresponding event.
● There are two major strategies for dispatching the event to the subscribers:
○ Push strategy: the publisher notifies all the subscribers, for example through a method invocation.
○ Pull strategy: the publisher simply makes the message available for a specific event, and it is the responsibility of the subscribers to check whether there are messages for the events to which they have registered.
● Publish and subscribe model is very suitable for implementing systems based on the
one to many communication model and simplifies the implementation of indirect
communication patterns.
● It is, in fact, not necessary for the publisher to know the identity of the subscribers to
make the communication happen.
● Virtualization is a large umbrella of technologies and concepts that are meant to provide
an abstract environment whether virtual hardware or an operating system to run
applications.
● The term virtualization is often synonymous with hardware virtualization, which plays a
fundamental role in efficiently delivering Infrastructure as a Service (IaaS) solutions for
cloud computing.
● Several factors have contributed to the renewed interest in virtualization, including:
○ Lack of space
○ Greening initiatives
○ Rise of administrative costs
● A virtualized environment consists of three major components:
○ Guest
○ Host
○ Virtualization layer
● The guest represents the system component that interacts with the virtualization layer
rather than with the host, as would normally happen.
● The host represents the original environment where the guest is supposed to be
managed.
● The virtualization layer is responsible for recreating the same or a different environment
where the guest will operate.
● Increased security is one of the key benefits of virtualization.
● In particular, sharing, aggregation, emulation, and isolation are the most relevant features.
● Sharing
● Aggregation
○ Not only is it possible to share physical resources among several guests, but virtualization also allows aggregation, which is the opposite process.
○ A group of separate hosts can be tied together and represented to guests as a single virtual host.
● Emulation
● Isolation
○ Benefits of Isolation
■ First it allows multiple guests to run on the same host without interfering
with each other.
■ Second, it provides a separation between the host and the guest.
● This feature is a reality at present, given the considerable advances in hardware and
software supporting virtualization.
● It becomes easier to control the performance of the guest by finely tuning the properties
of the resources exposed through the virtual environment.
● Portability
○ The concept of portability applies in different ways according to the specific type
of virtualization considered.
○ In the case of a hardware virtualization solution, the guest is packaged into a
virtual image that, in most cases, can be safely moved and executed on top of
different virtual machines
● Execution virtualization techniques can be divided into two major categories by considering the type of host they require.
● Process level techniques are implemented on top of an existing operating system, which
has full control of the hardware.
● System level techniques are implemented directly on hardware and require no support, or only minimal support, from an existing operating system.
● Within these two categories we can list various techniques that offer the guest a different
type of virtual computation environment:
○ Bare hardware
○ Operating system resources
○ Low level programming language
○ Application libraries
● All these techniques concentrate their interest on providing support for the execution of
programs, whether these are the operating system, a binary specification of a program
compiled against an abstract machine model or an application.
● Modern computing systems can be expressed in terms of the reference model described
in Figure 2.4.
Figure 2.4: Machine reference model — applications call libraries through the API; applications and libraries invoke the operating system through the ABI (system calls); the operating system accesses the hardware through the ISA.
● At the bottom layer, the model for the hardware is expressed in terms of the Instruction Set Architecture (ISA), which defines the instruction set for the processor, registers, memory, and interrupt management.
● ISA is important to the operating system (OS) developer (System ISA) and developers of
applications that directly manage the underlying hardware (User ISA).
● The application binary interface (ABI) separates the operating system layer from the
applications and libraries, which are managed by the OS.
● ABI covers details such as low level data types, alignment, call conventions and defines
a format for executable programs.
● This interface allows portability of applications and libraries across operating systems
that implement the same ABI.
● For this purpose, the instruction set exposed by the hardware has been divided into
different security classes that define who can operate with them.
● The first distinction can be made between privileged and non privileged instructions.
○ Non privileged instructions are those instructions that can be used without
interfering with other tasks because they do not access shared resources.
○ This category contains all the floating-point, fixed-point, and arithmetic instructions.
● Privileged instructions are those that are executed under specific restrictions and are
mostly used for sensitive operations, which expose (behavior-sensitive) or modify
(control-sensitive) the privileged state.
● Some types of architecture feature more than one class of privileged instructions and
implement a finer control of how these instructions can be accessed.
○ Ring 0 is the most privileged level and Ring 3 the least privileged level.
○ Ring 0 is used by the kernel of the OS, rings 1 and 2 are used by the OS level
services, and Ring 3 is used by the user.
○ Recent systems support only two levels, with Ring 0 for supervisor mode and
Ring 3 for user mode.
Figure: Security rings — Ring 0 (most privileged mode) through Ring 3 (least privileged mode).
● All the current systems support at least two different execution modes: supervisor mode
and user mode.
○ The supervisor mode denotes an execution mode in which all the instructions
(privileged and non privileged) can be executed without any restriction.
○ This mode, also called master mode or kernel mode, is generally used by the
operating system (or the hypervisor) to perform sensitive operations on hardware
level resources.
○ In user mode, there are restrictions on controlling machine-level resources.
● The distinction between user and supervisor mode allows us to understand the role of
the hypervisor and why it is called that.
● Conceptually, the hypervisor runs above the supervisor mode, which is where the prefix “hyper” comes from.
● In reality, hypervisors are run in supervisor mode and the division between privileged
and non privileged instructions has posed challenges in designing virtual machine
managers.
● In this model, the guest is represented by the operating system, the host by the physical
computer hardware, the virtual machine by its emulation and the virtual machine
manager by the hypervisor.
● Hardware level virtualization is also called system virtualization, since it provides ISA to
virtual machines, which is the representation of the hardware interface of a system.
● This is to differentiate it from process virtual machines, which expose ABI to virtual
machines.
● There are two major types of hypervisor: Type I and Type II. Figure 2.6 shows the different types of hypervisors.
○ Type I hypervisors run directly on top of the hardware; this type of hypervisor is also called a native virtual machine since it runs natively on the hardware.
○ Type II hypervisors require the support of a host operating system and are also called hosted virtual machines.
Figure 2.6: A Type I virtual machine manager runs directly on the hardware through the ISA and exposes virtual machines on top of it, whereas a Type II virtual machine manager runs on top of an operating system (through the ABI), which in turn runs on the hardware.
● At present, examples of hardware assisted virtualization are the extensions to the x86
architecture introduced with Intel-VT (formerly known as Vanderpool) and AMD-V
(formerly known as Pacifica).
● These extensions, which differ between the two vendors, are meant to reduce the
performance penalties experienced by emulating x86 hardware with hypervisors.
● The reason for this is that by design the x86 architecture did not meet the formal
requirements introduced by Popek and Goldberg and early products were using binary
translation to trap some sensitive instructions and provide an emulated version.
● After 2006, Intel and AMD introduced processor extensions and a wide range of
virtualization solutions took advantage of them: Kernel-based Virtual Machine (KVM),
VirtualBox, Xen, VMware, Hyper-V, Sun xVM, Parallels, and others.
● Full virtualization refers to the ability to run a program, most likely an operating system,
directly on top of a virtual machine and without any modification, as though it were run
on the raw hardware.
● To make this possible, virtual machine managers are required to provide a complete
emulation of the entire underlying hardware.
● Whereas it is a desired goal for many virtualization solutions, full virtualization poses
important concerns related to performance and technical implementation.
● A simple solution to achieve full virtualization is to provide a virtual environment for all
the instructions, thus posing some limits on performance.
2.6.2.3 Paravirtualization
● Paravirtualization exposes to the guest a software interface that is slightly modified from the host’s, so the guest operating system needs to be adapted. This allows a simpler implementation of virtual machine managers, which simply transfer the execution of these hard-to-virtualize operations directly to the host.
● This is possible when the source code of the operating system is available, and this is
the reason that paravirtualization was mostly explored in the opensource and academic
environment.
● This technique has been successfully used by Xen for providing virtualization solutions
for Linux-based operating systems specifically ported to run on Xen hypervisors.
● Operating systems that cannot be ported can still take advantage of paravirtualization by using ad hoc device drivers that remap the execution of critical instructions to the paravirtualization APIs exposed by the hypervisor.
● Xen provides this solution for running Windows based operating systems on x86
architectures.
● Other solutions using paravirtualization include VMWare, Parallels, and some solutions
for embedded and real-time environments such as TRANGO, Wind River, and XtratuM.
● Partial virtualization provides a partial emulation of the underlying hardware, thus not
allowing the complete execution of the guest operating system in complete isolation.
● Partial virtualization allows many applications to run transparently, but not all the
features of the operating system can be supported as happens with full virtualization.
● Historically, partial virtualization has been an important milestone for achieving full
virtualization, and it was implemented on the experimental IBM M44/44X.
● Operating system level virtualization offers the opportunity to create different and
separated execution environments for applications that are managed concurrently.
● The kernel is also responsible for sharing the system resources among instances and for
limiting the impact of instances on each other.
● A user space instance in general contains a proper view of the file system which is
completely isolated and separate IP addresses, software configurations and access to
devices.
● Operating systems supporting this type of virtualization are general purpose, timeshared
operating systems with the capability to provide stronger namespace and resource
isolation.
● The chroot operation changes the file system root directory for a process and its children
to a specific directory.
● As a result, the process and its children cannot have access to other portions of the file
system than those accessible under the new root directory.
● Because Unix systems also expose devices as parts of the file system, by using this
method it is possible to completely isolate a set of processes.
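A brief sketch of the chroot operation using Python’s standard os module; it must be run as root on a Unix system, and the jail directory is an assumed example path.

import os

# Confine this process (and its future children) to an assumed jail directory.
# After the call, "/" refers to /srv/jail and nothing outside it is reachable
# through the file system.
jail = "/srv/jail"                 # must already contain the files the process needs
os.chroot(jail)                    # requires root privileges (Unix only)
os.chdir("/")                      # move inside the new root
print(os.listdir("/"))             # only the jail's contents are visible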
● Following the same principle, operating system level virtualization aims to provide
separated and multiple execution containers for running applications.
● This technique is an efficient solution for server consolidation scenarios in which multiple
application servers share the same technology: operating system, application server
framework, and other components.
● Programming language level virtualization is based on a virtual machine that executes the byte code of a program, which is the result of the compilation process.
● At runtime, the byte code can be either interpreted or compiled on the fly against the
underlying hardware instruction set.
● Programming language level virtualization has a long history in computer science and was originally used in 1966 for the implementation of the Basic Combined Programming Language (BCPL), a language for writing compilers and one of the ancestors of the C programming language.
● Other important examples of the use of this technology have been the UCSD Pascal and
Smalltalk.
● Virtual machine programming languages became popular again with Sun’s introduction of the Java platform in 1996.
● The Java virtual machine was originally designed for the execution of programs written in the Java language, but other languages such as Python, Pascal, Groovy, and Ruby have also been made to run on it.
● The ability to support multiple programming languages has been one of the key
elements of the Common Language Infrastructure (CLI) which is the specification behind
.NET Framework.
● Application level virtualization allows applications to run in runtime environments that do not natively support them: the applications are not installed in the expected runtime environment but are run as though they were.
● In general, these techniques are mostly concerned with partial file systems, libraries, and
operating system component emulation.
● Emulation can also be used to execute program binaries compiled for different hardware
architectures.
● Application virtualization is a good solution in the case of missing libraries in the host
operating system.
● In this case a replacement library can be linked with the application or library calls can
be remapped to existing functions available in the host system.
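● The remapping idea can be sketched as follows (a hypothetical example: the names
legacy_hash and sha256_digest are invented here to show how a missing library call might
be redirected to a function the host system already provides):

```python
import hashlib

# Function already available through the host system's libraries
def sha256_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Remapping table: calls the application expects -> functions the host provides
CALL_REMAP = {
    "legacy_hash": sha256_digest,   # the original library is missing on this host
}

def call(name, *args):
    """Resolve a library call either to a remapped host function or fail loudly."""
    target = CALL_REMAP.get(name)
    if target is None:
        raise RuntimeError(f"unresolved library call: {name}")
    return target(*args)

print(call("legacy_hash", b"application data"))
```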
● Another advantage is that in this case the virtual machine manager is much lighter since
it provides a partial emulation of the runtime environment compared to hardware
virtualization.
● Compared to programming level virtualization, which works across all the applications
developed for that virtual machine, application level virtualization works for a specific
environment.
● One of the most popular solutions implementing application virtualization is Wine, which
is a software application allowing Unix-like operating systems to execute programs
written for the Microsoft Windows platform.
● Wine features a software application acting as a container for the guest application and
a set of libraries, called Winelib, that developers can use to compile applications to be
ported on Unix systems.
● Wine takes its inspiration from a similar product from Sun, Windows Application Binary
Interface (WABI) which implements the Win 16 API specifications on Solaris.
● A similar solution for the Mac OS X environment is CrossOver, which allows running
Windows applications directly on the Mac OS X operating system.
● VMware ThinApp is another product in this area; it allows capturing the setup of an
installed application and packaging it into an executable image isolated from the hosting
operating system.
● Storage virtualization decouples the physical organization of the hardware from its logical
representation. Using this technique, users do not have to worry about the specific location of
their data, which can be identified using a logical path.
● There are different techniques for storage virtualization, one of the most popular being
network based virtualization by means of storage area networks (SANs).
● SANs use a network accessible device through a large bandwidth connection to provide
storage facilities.
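● A minimal sketch of the logical-path idea follows (the mapping table and node names are
hypothetical; real storage virtualization is implemented inside SAN controllers or volume
managers, not in application code):

```python
# Logical path -> (storage node, physical location); users only see the logical path.
VOLUME_MAP = {
    "/projects/reports": ("san-node-1", "/lun07/blockgroup-42"),
    "/projects/media":   ("san-node-2", "/lun12/blockgroup-05"),
}

def resolve(logical_path):
    """Translate a user-visible logical path to its current physical placement.
    Moving data between nodes only requires updating this table."""
    node, physical = VOLUME_MAP[logical_path]
    return node, physical

print(resolve("/projects/reports"))   # ('san-node-1', '/lun07/blockgroup-42')
```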
● Network virtualization combines hardware appliances and specific software for the
creation and management of a virtual network.
● Network virtualization can aggregate different physical networks into a single logical
network (external network virtualization) or provide network like functionality to an
operating system partition (internal network virtualization).
● A VLAN is an aggregation of hosts that communicate with each other as though they
were located under the same broadcasting domain.
● Internal network virtualization is generally applied together with hardware and operating
system-level virtualization, in which the guests obtain a virtual network interface to
communicate with.
○ The guest can share the same network interface of the host and use Network
Address Translation (NAT) to access the network;
○ The virtual machine manager can emulate, and install on the host, an additional
network device, together with the driver.
○ The guest can have a private network only with the host.
● Desktop virtualization provides the same outcome as hardware virtualization but serves a
different purpose.
● Moreover, desktop virtualization addresses the problem of making the same desktop
environment accessible from everywhere.
● Although the term desktop virtualization strictly refers to the ability to remotely access a
desktop environment, generally the desktop environment is stored in a remote server or
a data center that provides a high availability infrastructure and ensures the accessibility
and persistence of the data.
● A specific desktop environment is stored in a virtual machine image that is loaded and
started on demand when a client connects to the desktop environment.
● This is a typical cloud computing scenario in which the user leverages the virtual
infrastructure for performing the daily tasks on his computer.
● The basic services for remotely accessing a desktop environment are implemented in
software components such as Windows Remote Services, VNC, and X Server.
● Infrastructures for desktop virtualization based on cloud computing solutions include Sun
Virtual Desktop Infrastructure (VDI), Parallels Virtual Desktop Infrastructure (VDI), Citrix
XenDesktop, and others.
● Application server virtualization abstracts a collection of application servers into a single
virtual application server. This particular form of virtualization serves the same purpose as
storage virtualization: providing a better quality of service rather than emulating a different
environment.
● The idea is to separate the hardware from the software to yield better system efficiency.
For example, computer users gained access to much enlarged memory space when the
concept of virtual memory was introduced.
● With sufficient storage, any computer platform can be installed in another host computer,
even if they use processors with different instruction sets and run with distinct operating
systems on the same hardware.
● A traditional computer runs with a host operating system specially tailored for its
hardware architecture, as shown in Figure 2.7(a).
● After virtualization, different user applications managed by their own operating systems
(guest OS) can run on the same hardware, independent of the host OS. This is often
done by adding additional software, called a virtualization layer as shown in Figure
2.7(b).
● This virtualization layer is known as hypervisor or virtual machine monitor (VMM). The
VMs are shown in the upper boxes, where applications run with their own guest OS over
the virtualized CPU, memory, and I/O resources.
● The main function of the software layer for virtualization is to virtualize the physical
hardware of a host machine into virtual resources to be used by the VMs, exclusively.
This can be implemented at various operational levels, as we will discuss shortly.
● Common virtualization layers include the instruction set architecture (ISA) level,
hardware level, operating system level, library support level, and application level.
● At the ISA level, virtualization is performed by emulating a given ISA by the ISA of the
host machine. For example, MIPS binary code can run on an x86-based host machine
with the help of ISA emulation.
● With this approach, it is possible to run a large amount of legacy binary code written for
various processors on any given new hardware host machine. Instruction set emulation
leads to virtual ISAs created on any hardware machine.
● One source instruction may require tens or hundreds of native target instructions to
perform its function. This process is relatively slow.
● For better performance, dynamic binary translation is desired. This approach translates
basic blocks of dynamic source instructions to target instructions.
● The basic blocks can also be extended to program traces or super blocks to increase
translation efficiency.
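● The difference between per-instruction emulation and dynamic binary translation can be
sketched with a translation cache (the two-instruction "source ISA" below is invented purely
for illustration; a real translator emits native machine code rather than Python closures):

```python
# Dynamic binary translation, sketched: translate a basic block of "source ISA"
# instructions once, cache the result, and reuse it on later executions.
translation_cache = {}

def translate_block(block):
    """Compile a basic block into a single host-callable function."""
    def translated(state):
        for op, reg, val in block:
            if op == "MOVI":          # load immediate into register
                state[reg] = val
            elif op == "ADDI":        # add immediate to register
                state[reg] += val
        return state
    return translated

def run_block(block_id, block, state):
    if block_id not in translation_cache:          # translate only on first execution
        translation_cache[block_id] = translate_block(tuple(block))
    return translation_cache[block_id](state)

state = {"r1": 0, "r2": 0}
hot_block = [("MOVI", "r1", 10), ("ADDI", "r1", 5), ("MOVI", "r2", 1)]
for _ in range(3):                                 # repeated executions hit the cache
    state = run_block("block_0", hot_block, state)
print(state)                                       # {'r1': 15, 'r2': 1}
```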
Figure 2.8 Virtualization ranging from hardware to applications in five abstraction levels
(instruction set architecture level, hardware level, operating system level, library support level
and application level)
● Hardware-level virtualization is performed right on top of the bare hardware. On the one
hand, this approach generates a virtual hardware environment for a VM. On the other
hand, the process manages the underlying hardware through virtualization.
● The idea is to virtualize a computer’s resources, such as its processors, memory, and
I/O devices.
● The intention is to upgrade the hardware utilization rate by multiple users concurrently.
● The idea was implemented in the IBM VM/370 in the 1960s. Xen hypervisor has been
applied to virtualize x86-based machines to run Linux or other guest OS applications.
● Operating system level virtualization refers to an abstraction layer between the traditional
OS and user applications. OS-level virtualization creates isolated containers on a single
physical server, with OS instances that utilize the hardware and software in data centers.
● The containers behave like real servers. OS-level virtualization is commonly used in
creating virtual hosting environments to allocate hardware resources among a large
number of mutually distrusting users.
● This kind of VM is often called a virtual execution environment (VE), Virtual Private
System (VPS), or simply container.
● For an OS-level VM, it is possible for a VM and its host environment to synchronize state
changes when necessary.
● All OS-level VMs on the same physical machine share a single operating system kernel
○ The virtualization layer can be designed in a way that allows processes in VMs to
access as many resources of the host machine as possible, but never to modify
them.
● Most applications use APIs exported by user level libraries rather than using lengthy
system calls by the OS. Since most systems provide well-documented APIs, such an
interface becomes another candidate for virtualization.
● The software tool WINE has implemented this approach to support Windows
applications on top of UNIX hosts. Another example is the vCUDA which allows
applications executing within VMs to leverage GPU hardware acceleration.
● Library level virtualization is also known as user-level Application Binary Interface (ABI)
or API emulation.
● This type of virtualization can create execution environments for running alien programs
on a platform rather than creating a VM to run the entire operating system.
● API call interception and remapping are the key functions performed. The WABI offers
middleware to convert Windows system calls to Solaris system calls.
● Lxrun is really a system call emulator that enables Linux applications written for x86
hosts to run on UNIX systems.
● Similarly, Wine offers library support for virtualizing x86 processors to run Windows
applications on UNIX hosts.
● Visual MainWin offers a compiler support system to develop Windows applications using
Visual Studio to run on some UNIX hosts.
● vCUDA virtualizes the CUDA library and can be installed on guest OSes. When CUDA
applications run on a guest OS and issue a call to the CUDA API, vCUDA intercepts the
call and redirects it to the CUDA API running on the host OS.
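● The interception-and-redirection idea can be sketched generically (this is a toy proxy, not
the actual vCUDA or WABI implementation; HostBackend and vector_add are invented
names, and a real system would marshal the call over a guest/host channel instead of a
direct method call):

```python
# Generic sketch of library-level API interception: a guest-side stub object
# forwards every call to a host-side backend instead of executing it locally.
class HostBackend:
    """Stands in for the real library running on the host OS."""
    def vector_add(self, a, b):
        return [x + y for x, y in zip(a, b)]

class InterceptingStub:
    def __init__(self, backend):
        self._backend = backend
    def __getattr__(self, name):
        real = getattr(self._backend, name)      # locate the host-side implementation
        def forward(*args, **kwargs):
            print(f"intercepted call: {name}")   # a real system would marshal and ship the call here
            return real(*args, **kwargs)
        return forward

api = InterceptingStub(HostBackend())
print(api.vector_add([1, 2, 3], [4, 5, 6]))      # executed by the host-side backend
```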
● In user application level virtualization, the virtualization layer sits as an application program
on top of the operating system, and the layer exports an abstraction of a VM that can run
programs written and compiled to a particular abstract machine definition.
● Any program written in the HLL and compiled for this VM will be able to run on it. The
Microsoft .NET CLR and Java Virtual Machine (JVM) are two good examples of this
class of VM.
● The process involves wrapping the application in a layer that is isolated from the host
OS and other applications. The result is an application that is much easier to distribute
and remove from user workstations.
Relative merits of virtualization at the various levels of implementation:

Level of implementation           Higher        Application   Implementation  Application
                                  Performance   Flexibility   Complexity      Isolation
Instruction Set Architecture      Very Low      Very High     Moderate        Moderate
Hardware-level virtualization     Very High     Moderate      Very High       High
OS-level virtualization           Very High     Low           Moderate        Low
Library support level             Moderate      Low           Low             Low
User application level            Low           Low           Very High       Very High
● The virtualization layer is responsible for converting portions of the real hardware into
virtual hardware.
● Therefore, different operating systems such as Linux and Windows can run on the same
physical machine, simultaneously.
● Depending on the position of the virtualization layer, there are several classes of VM
architectures, namely the hypervisor architecture, paravirtualization and host based
virtualization.
● The hypervisor is also known as the VMM (Virtual Machine Monitor). They both perform
the same virtualization operations.
● The hypervisor supports hardware level virtualization on bare metal devices like CPU,
memory, disk and network interfaces.
● The hypervisor software sits directly between the physical hardware and its OS. This
virtualization layer is referred to as either the VMM or the hypervisor.
● The hypervisor provides hypercalls for the guest OSes and applications.
● Depending on the functionality, a hypervisor can assume micro kernel architecture like
the Microsoft Hyper-V.
● It can assume monolithic hypervisor architecture like the VMware ESX for server
virtualization.
● A micro kernel hypervisor includes only the basic and unchanging functions (such as
physical memory management and processor scheduling).
● The device drivers and other changeable components are outside the hypervisor.
● Essentially, a hypervisor must be able to convert physical devices into virtual resources
dedicated for the deployed VM to use.
● Xen is a microkernel hypervisor, which separates the policy from the mechanism.
● The Xen hypervisor implements all the mechanisms, leaving the policy to be handled by
Domain 0. Figure 2.9 shows architecture of Xen hypervisor.
● Xen does not include any device drivers natively. It just provides a mechanism by which
a guest OS can have direct access to the physical devices.
● Xen provides a virtual environment located between the hardware and the OS.
Figure 2.9 Xen domain 0 for control and I/O and guest domains for user applications
(applications run inside Domain 0 and the guest domains, which sit above the Xen hypervisor
and the hardware devices)
● The core components of a Xen system are the hypervisor, kernel, and applications.
● Like other virtualization systems, many guest OSes can run on top of the hypervisor.
● However, not all guest OSes are created equal, and one in particular controls the others.
● The guest OS, which has control ability, is called Domain 0, and the others are called
Domain U.
● Domain 0 is a privileged guest OS of Xen. It is first loaded when Xen boots without any
file system drivers being available.
● Domain 0 is designed to access hardware directly and manage devices. Therefore, one
of the responsibilities of Domain 0 is to allocate and map hardware resources for the
guest domains (the Domain U domains).
● For example, Xen is based on Linux and its security level is C2. Its management VM is
named Domain 0 which has the privilege to manage other VMs implemented on the
same host.
● If Domain 0 is compromised, the hacker can control the entire system. So, in the VM
system, security policies are needed to improve the security of Domain 0.
● Domain 0, behaving as a VMM, allows users to create, copy, save, read, modify, share,
migrate and roll back VMs as easily as manipulating a file, which flexibly provides
tremendous benefits for users.
● Full virtualization does not need to modify the host OS. It relies on binary translation to
trap and to virtualize the execution of certain sensitive, non virtualizable instructions.
● The guest OSes and their applications consist of noncritical and critical instructions.
● A virtualization software layer is built between the host OS and guest OS.
● With full virtualization, noncritical instructions run on the hardware directly while critical
instructions are discovered and replaced with traps into the VMM to be emulated by
software.
● Both the hypervisor and VMM approaches are considered full virtualization.
● The VMM scans the instruction stream and identifies the privileged, control and behavior
sensitive instructions. When these instructions are identified, they are trapped into the
VMM, which emulates the behavior of these instructions.
● In host-based virtualization, the guest OSes are installed and run on top of the
virtualization layer, which is itself installed on the host OS. Dedicated applications may run on
the VMs, and some other applications can also run on the host OS directly. This architecture
has some distinct advantages:
○ First, the user can install this VM architecture without modifying the host OS.
○ Second, the host-based approach appeals to many host machine configurations.
● According to the x86 ring definitions, the virtualization layer should also be installed at
Ring 0, and running different instructions at Ring 0 may cause some problems. Although
paravirtualization reduces this overhead, it incurs other problems:
○ First, its compatibility and portability may be in doubt, because it must support
the unmodified OS as well.
○ Second, the cost of maintaining paravirtualized OSes is high, because they may
require deep OS kernel modifications.
○ Finally, the performance advantage of paravirtualization varies greatly due to
workload variations.
● Compared with full virtualization, paravirtualization is relatively easy and more practical.
The main problem in full virtualization is its low performance in binary translation.
● KVM is a Linux paravirtualization system. It is a part of the Linux version 2.6.20 kernel.
● In KVM, Memory management and scheduling activities are carried out by the existing
Linux kernel.
● The KVM does the rest, which makes it simpler than the hypervisor that controls the
entire machine.
● Unlike the full virtualization architecture which intercepts and emulates privileged and
sensitive instructions at runtime, paravirtualization handles these instructions at compile
time.
● The guest OS kernel is modified to replace the privileged and sensitive instructions with
hypercalls to the hypervisor or VMM. Xen assumes such paravirtualization architecture.
● The guest OS running in a guest domain may run at Ring 1 instead of at Ring 0. This
implies that the guest OS may not be able to execute some privileged and sensitive
instructions. The privileged instructions are implemented by hypercalls to the hypervisor.
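● The compile-time versus runtime distinction can be sketched conceptually as follows (this
is a toy illustration, not real Xen or VMM code; hypercall and set_page_table are invented
names):

```python
# Conceptual sketch: full virtualization traps sensitive operations at runtime,
# while a paravirtualized guest kernel is modified to issue explicit hypercalls.
class Hypervisor:
    def hypercall(self, name, *args):
        print(f"hypervisor handling '{name}' with {args}")   # emulate the privileged effect safely

hv = Hypervisor()

def guest_set_page_table_full_virt(base_addr):
    # Unmodified guest: the privileged instruction is discovered and trapped by the VMM.
    try:
        raise PermissionError("privileged instruction in user mode")   # stands in for the hardware trap
    except PermissionError:
        hv.hypercall("set_page_table", base_addr)                      # VMM emulates the instruction

def guest_set_page_table_paravirt(base_addr):
    # Modified guest kernel: the privileged instruction was replaced at compile time.
    hv.hypercall("set_page_table", base_addr)

guest_set_page_table_full_virt(0x1000)
guest_set_page_table_paravirt(0x2000)
```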
● To support virtualization, processors such as the x86 employ a special running mode
and instructions known as hardware assisted virtualization.
● For the x86 architecture, Intel and AMD have proprietary technologies for hardware
assisted virtualization.
● All processors have at least two modes, user mode and supervisor mode, to ensure
controlled access of critical hardware.
● At the time of this writing, many hardware virtualization products were available.
● The VMware Workstation is a VM software suite for x86 and x86-64 computers.
● This software suite allows users to set up multiple x86 and x86-64 virtual computers and
to use one or more of these VMs simultaneously with the host operating system.
● Xen is a hypervisor for use in IA-32, x86-64, Itanium and PowerPC 970 hosts.
● KVM can support hardware assisted virtualization and paravirtualization by using the
Intel VT-x or AMD-v and VirtIO framework, respectively.
● The VirtIO framework includes a paravirtual Ethernet card, a disk I/O controller and a
balloon device for adjusting guest memory usage and a VGA graphics interface using
VMware drivers.
● The unprivileged instructions of VMs run directly on the host machine for higher
efficiency.
● The critical instructions are divided into three categories: privileged instructions, control
sensitive instructions, and behavior sensitive instructions.
● CPU architecture is virtualizable if it supports the ability to run the VM’s privileged and
unprivileged instructions in the CPU’s user mode while the VMM runs in supervisor
mode.
● RISC CPU architectures can be naturally virtualized because all control and behavior
sensitive instructions are privileged instructions.
● The x86 CPU architectures are not primarily designed to support virtualization.
● Intel and AMD add an additional mode called privilege mode level (some people call it
Ring -1) to x86 processors.
● Therefore, operating systems can still run at Ring 0 and the hypervisor can run at Ring -1.
● All the privileged and sensitive instructions are trapped in the hypervisor automatically.
● Virtual memory virtualization is similar to the virtual memory support provided by modern
operating systems.
● All modern x86 CPUs include a memory management unit (MMU) and a translation
lookaside buffer (TLB) to optimize virtual memory performance.
● That means a two stage mapping process should be maintained by the guest OS and
the VMM, respectively: virtual memory to physical memory and physical memory to
machine memory.
● The guest OS continues to control the mapping of virtual addresses to the physical
memory addresses of VMs.
● But the guest OS cannot directly access the actual machine memory.
● The VMM is responsible for mapping the guest physical memory to the actual machine
memory.
● Since each page table of the guest OSes has a separate page table in the VMM
corresponding to it, the VMM page table is called the shadow page table.
● The MMU already handles virtual-to-physical translations as defined by the OS. Then
the physical memory addresses are translated to machine addresses using another set
of page tables defined by the hypervisor.
● Processors use TLB hardware to map the virtual memory directly to the machine
memory to avoid the two levels of translation on every access.
● When the guest OS changes the virtual memory to a physical memory mapping, the
VMM updates the shadow page tables to enable a direct lookup.
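● A minimal sketch of the two-stage mapping and its shadow table (toy page numbers held in
dictionaries; a real MMU walks hardware page tables maintained by the guest OS and the
VMM):

```python
# Two-stage mapping: guest virtual -> guest "physical" (guest OS page table),
# guest "physical" -> machine frame (VMM table). The shadow table caches the
# combined guest-virtual -> machine mapping so a single lookup suffices.
guest_page_table = {0: 7, 1: 3}     # guest virtual page -> guest physical page
vmm_table        = {7: 42, 3: 19}   # guest physical page -> machine frame

shadow_page_table = {}

def update_shadow(vpage):
    """Called by the VMM whenever the guest changes its virtual-to-physical mapping."""
    shadow_page_table[vpage] = vmm_table[guest_page_table[vpage]]

for vpage in guest_page_table:
    update_shadow(vpage)
print(shadow_page_table)        # {0: 42, 1: 19} -> one-step translation on memory access

# Guest remaps virtual page 1 to guest physical page 7; the VMM refreshes the shadow entry.
guest_page_table[1] = 7
update_shadow(1)
print(shadow_page_table[1])     # 42
```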
● The AMD Barcelona processor has featured hardware assisted memory virtualization
since 2007.
● I/O virtualization involves managing the routing of I/O requests between virtual devices
and the shared physical hardware.
● There are three ways to implement I/O virtualization: full device emulation,
paravirtualization, and direct I/O.
● Full device emulation is the first approach for I/O virtualization. Generally, this approach
emulates well known and real world devices.
● The I/O access requests of the guest OS are trapped in the VMM which interacts with
the I/O devices.
● A single hardware device can be shared by multiple VMs that run concurrently.
However, software emulation runs much slower than the hardware it emulates.
(Figure: full device emulation for I/O virtualization. The guest device driver in the guest OS
issues requests to virtual hardware, which are handled by device emulation in the
virtualization layer and passed through the I/O stack and the real device driver to the physical
hardware.)
● The frontend driver is running in Domain U and the backend driver is running in Domain
0. They interact with each other via a block of shared memory.
● The frontend driver manages the I/O requests of the guest OSes and the backend driver
is responsible for managing the real I/O devices and multiplexing the I/O data of different
VMs.
● Para I/O-virtualization achieves better device performance than full device emulation, but it
comes with a higher CPU overhead.
● Direct I/O virtualization lets the VM access devices directly. It can achieve close-to-
native performance without high CPU costs.
● For example, when a physical device is reclaimed (required by workload migration) for
later reassignment, it may have been set to an arbitrary state (e.g., DMA to some
arbitrary memory locations) that can function incorrectly or even crash the whole system.
● Since software based I/O virtualization requires a very high overhead of device
emulation, hardware-assisted I/O virtualization is critical.
● Intel VT-d supports the remapping of I/O DMA transfers and device generated interrupts.
The architecture of VT-d provides the flexibility to support multiple usage models that
may run unmodified, special-purpose, or “virtualization-aware” guest OSes.
● Another way to help I/O virtualization is via self virtualized I/O (SV-IO).
● The key idea of SV-IO is to harness the rich resources of a multicore processor. All tasks
associated with virtualizing an I/O device are encapsulated in SV-IO.
● It provides virtual devices and an associated access API to VMs and a management API
to the VMM.
● SV-IO defines one virtual interface (VIF) for every kind of virtualized I/O device, such as
virtual network interfaces, virtual block devices (disks), virtual camera devices, and others.
● Multi-core virtualization has raised some new challenges to computer architects, compiler
constructors, system designers, and application programmers.
● There are mainly two difficulties: Application programs must be parallelized to use all
cores fully, and software must explicitly assign tasks to the cores, which is a very
complex problem.
○ To address the first challenge, new programming models, languages, and libraries are
needed to make parallel programming easier.
○ The second challenge has spawned research involving scheduling algorithms
and resource management policies.
● Dynamic heterogeneity is emerging to mix the fat CPU core and thin GPU cores on the
same chip, which further complicates the multi core or many core resource
management.
● The dynamic heterogeneity of hardware infrastructure mainly comes from less reliable
transistors and increased complexity in using the transistors.
● One proposed multicore virtualization method gives designers an abstraction of the
low-level details of the processor cores. This technique alleviates the burden and inefficiency
of managing hardware resources by software.
● It is located under the ISA and remains unmodified by the operating system or VMM
(hypervisor).
(Figure 2.11: software-visible VCPUs V0, V1 and V3 of the guest VMs mapped by the system
software onto physical cores C0, C1 and C3 of the chip.)
● Figure 2.11 illustrates the technique of software visible VCPU moving from one core to
another and temporarily suspending execution of a VCPU when there are no appropriate
cores on which it can run.
● The emerging many core chip multiprocessors (CMPs) provide a new computing
landscape.
● Instead of supporting time sharing jobs on one or a few cores, we can use the abundant
cores in a space sharing, where single threaded or multithreaded jobs are
simultaneously assigned to separate groups of cores for long time intervals.
● To optimize for space-shared workloads, researchers have proposed using virtual
hierarchies to overlay a coherence and caching hierarchy onto a physical processor.
● A virtual hierarchy is a cache hierarchy that can adapt to fit the workload or mix of
workloads.
● The hierarchy’s first level locates data blocks close to the cores needing them for faster
access, establishes a shared-cache domain and establishes a point of coherence for
faster communication.
● When a miss leaves a tile, it first attempts to locate the block (or sharers) within the first
level. The first level can also provide isolation between independent workloads. A miss
at the L1 cache can invoke the L2 access.
● Space sharing is applied to assign three workloads to three clusters of virtual cores:
○ Namely VM0 and VM3 for database workload, VM1 and VM2 for web server
workload and VM4–VM7 for middleware workload.
● Each VM operates in an isolated fashion at the first level. This minimizes both miss
access time and performance interference with other workloads or VMs.
● The shared resources of cache capacity, inter-connect links, and miss handling are
mostly isolated between VMs. The second level maintains a globally shared memory.
● One very distinguishing feature of cloud computing infrastructure is the use of system
virtualization and the modification to provisioning tools.
● The user will not care about the computing resources that are used for providing the
services.
● Cloud users do not need to know and have no way to discover physical resources that
are involved while processing a service request.
● In addition, application developers do not care about some infrastructure issues such as
scalability and fault tolerance. Application developers focus on service logic.
Figure 2.12 Virtualized servers, storage, and network for cloud platform construction
(infrastructure services provided on top of a virtualized infrastructure, which is built from
virtualized platforms through VM management agents and a virtualized integration manager)
● Cloud computing systems use virtualization software as the running environment for
legacy software such as old operating systems and unusual applications.
● Virtualization software is also used as the platform for developing new cloud applications
that enable developers to use any operating systems and programming environments
they like.
● The development environment and deployment environment can now be the same,
which eliminates some runtime problems.
● VMs provide flexible runtime services to free users from worrying about the system
environment.
● Using VMs in a cloud computing platform ensures extreme flexibility for users. As the
computing resources are shared by many users, a method is required to maximize the
user’s privileges and still keep them separated safely.
● Traditional sharing of cluster resources depends on the user and group mechanism on a
system.
● An environment that meets one user’s requirements often cannot satisfy another user.
Virtualization allows us to have full privileges while keeping them separate.
● Users have full access to their own VMs, which are completely separate from other
user’s VMs.
● Multiple VMs can be mounted on the same physical server. Different VMs may run with
different OSes.
● The virtualization is carried out by special servers dedicated to generating the virtualized
resource pool.
● The virtualized infrastructure (black box in the middle) is built with many virtualizing
integration managers.
● These managers handle loads, resources, security, data, and provisioning functions.
Figure 2.13 shows two VM platforms.
● Each platform carries out a virtual solution to a user job. All cloud services are managed
in the boxes at the top.
Figure 2.13 Conventional disaster recovery scheme (configure hardware, install OS,
configure OS, install backup agent, automatic recovery) versus live migration of VMs
● AWS provides extreme flexibility (VMs) for users to execute their own applications.
● GAE provides limited application level virtualization for users to build applications only
based on the services that are created by Google.
● Microsoft provides programming level virtualization (.NET virtualization) for users to build
their applications.
● The Microsoft tools are used on PCs and some special servers.
● This has enabled users to create customized environments atop physical infrastructure
for cloud computing.
● As shown in the top timeline of Figure 2.13, traditional disaster recovery from one
physical machine to another is rather slow, complex, and expensive.
● Total recovery time is attributed to the hardware configuration, installing and configuring
the OS, installing the backup agents and the long time to restart the physical machine.
● To recover a VM platform, the installation and configuration times for the OS and backup
agents are eliminated.
● The idea is to make a clone VM on a remote server for every running VM on a local
server.
● A cloud control center should be able to activate this clone VM in case of failure of the
original VM, taking a snapshot of the VM to enable live migration in a minimal amount of
time.
● The migrated VM can run on a shared Internet connection. Only updated data and
modified states are sent to the suspended VM to update its state.
● The Recovery Point Objective (RPO) and Recovery Time Objective (RTO) are
affected by the number of snapshots taken.
2. Define SOA.
● Web services are the prominent technology for implementing SOA systems and
applications.
● The concept behind a Web service is very simple.
● Using as a basis the object oriented abstraction, a Web service exposes a set of
operations that can be invoked by leveraging Internet based protocols.
6. List the aspects that make Web services the technology of choice for SOA.
● First, they allow for interoperability across different platforms and programming
languages.
● Second, they are based on well-known and vendor-independent standards such as
HTTP, SOAP, XML, and WSDL.
● Third, they provide an intuitive and simple way to connect heterogeneous software
systems
● Finally, they provide the features required by enterprise business applications to be
used in an industrial environment.
8. What is SOAP?
● Increased security
○ Sharing
○ Aggregation
○ Emulation
○ Isolation
● Performance tuning.
● Portability
● Non privileged instructions are those instructions that can be used without interfering
with other tasks because they do not access shared resources.
● Privileged instructions are those that are executed under specific restrictions and are
mostly used for sensitive operations, which expose (behavior-sensitive) or modify
(control-sensitive) the privileged state.
● The x86 architecture defines protection rings from Ring 0 (the most privileged mode)
through Ring 1 and Ring 2 to Ring 3 (the least privileged mode).
● Full virtualization refers to the ability to run a program, most likely an operating
system, directly on top of a virtual machine and without any modification, as though it
were run on the raw hardware.
● Paravirtualization is a not-transparent virtualization solution that allows implementing
thin virtual machine managers.
● Partial virtualization provides a partial emulation of the underlying hardware, thus not
allowing the complete execution of the guest operating system in complete isolation.
● Partial virtualization allows many applications to run transparently, but not all the
features of the operating system can be supported, as happens with full
virtualization.
● A micro-kernel hypervisor includes only the basic and unchanging functions (such as
physical memory management and processor scheduling).
● A monolithic hypervisor implements all the aforementioned functions, including those
of the device drivers.
-------------------------------------------------------------------------------------------------------------------------------
Layered Cloud Architecture Design – NIST Cloud Computing Reference Architecture – Public,
Private and Hybrid Clouds – IaaS – PaaS – SaaS – Architectural Design Challenges – Cloud
Storage – Storage-as-a-Service – Advantages of Cloud Storage – Cloud Storage Providers – S3.
-------------------------------------------------------------------------------------------------------------------------------
● The layered cloud architecture consists of three development layers: the infrastructure
layer, the platform layer and the application layer. These three layers are implemented with
virtualization and standardization of hardware and software resources provisioned in the cloud.
● The services to public, private and hybrid clouds are conveyed to users through
networking support over the Internet and intranets involved.
● It is clear that the infrastructure layer is deployed first to support IaaS services.
● This infrastructure layer serves as the foundation for building the platform layer of the
cloud for supporting PaaS services.
● In turn, the platform layer is a foundation for implementing the application layer for SaaS
applications.
● The infrastructure layer is built with virtualized compute, storage and network resources.
● The platform layer is for general purpose and repeated usage of the collection of
software resources.
● This layer provides users with an environment to develop their applications, to test
operation flows and to monitor execution results and performance.
● The platform should be able to assure users that they have scalability, dependability,
and security protection.
● In a way, the virtualized cloud platform serves as a “system middleware” between the
infrastructure and application layers of the cloud.
● The application layer is formed with a collection of all needed software modules for SaaS
applications.
● Service applications in this layer include daily office management work such as
information retrieval, document processing and calendar and authentication services.
● The application layer is also heavily used by enterprises in business marketing and
sales, consumer relationship management (CRM), financial transactions and supply
chain management.
● From the provider’s perspective, the services at various layers demand different
amounts of functionality support and resource management by providers.
● In general, SaaS demands the most work from the provider, PaaS is in the middle, and
IaaS demands the least.
● For example, Amazon EC2 provides not only virtualized CPU resources to users but
also management of these provisioned resources.
● The best example of this is the Salesforce.com CRM service in which the provider
supplies not only the hardware at the bottom layer and the software at the top layer but
also the platform and software tools for user application development and monitoring.
○ Users or brokers acting on user’s behalf submit service requests from anywhere
in the world to the data center and cloud to be processed.
○ The request examiner ensures that there is no overloading of resources whereby
many service requests cannot be fulfilled successfully due to limited resources.
○ The Pricing mechanism decides how service requests are charged. For instance,
requests can be charged based on submission time (peak/off-peak), pricing rates
(fixed/changing), or availability of resources (supply/demand); a small pricing sketch
is given after this list.
○ The VM Monitor mechanism keeps track of the availability of VMs and their
resource entitlements.
○ The Accounting mechanism maintains the actual usage of resources by requests
so that the final cost can be computed and charged to users.
○ In addition, the maintained historical usage information can be utilized by the
Service Request Examiner and Admission Control mechanism to improve
resource allocation decisions.
○ The Dispatcher mechanism starts the execution of accepted service requests on
allocated VMs.
○ The Service Request Monitor mechanism keeps track of the execution progress
of service requests.
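● As referenced above, here is a small sketch of how a pricing and accounting mechanism of
this kind might charge requests (the peak window, rates and user name are invented purely
for illustration):

```python
from datetime import datetime

# Hypothetical rates: peak hours are charged at a higher fixed rate.
PEAK_RATE, OFF_PEAK_RATE = 0.12, 0.05      # currency units per core-hour
PEAK_HOURS = range(9, 18)                  # 09:00-17:59 counts as peak

def charge(submission_time: datetime, core_hours: float) -> float:
    """Price a service request based on its submission time (peak/off-peak)."""
    rate = PEAK_RATE if submission_time.hour in PEAK_HOURS else OFF_PEAK_RATE
    return rate * core_hours

# Accounting: accumulate per-user usage so the final cost can be computed.
usage_ledger = {}
def account(user, submission_time, core_hours):
    usage_ledger[user] = usage_ledger.get(user, 0.0) + charge(submission_time, core_hours)

account("alice", datetime(2024, 1, 8, 10, 30), 4.0)   # peak request
account("alice", datetime(2024, 1, 8, 23, 0), 4.0)    # off-peak request
print(round(usage_ledger["alice"], 2))                # 0.68
```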
● The goal is to achieve effective and secure cloud computing to reduce cost and improve
services
● In general, NIST generates reports for future reference, which include surveys and
analyses of existing cloud computing reference models, vendors and federal agencies.
● The conceptual reference architecture shown in Figure 3.2 involves five actors. Each
actor is an entity that participates in cloud computing.
(Figure 3.2 NIST cloud computing reference architecture: the cloud provider's service layer
(SaaS, PaaS, IaaS), resource abstraction and control layer, and physical resource layer;
cloud service management covering business support, provisioning and configuration, and
portability and interoperability; the cloud auditor's security audit, privacy impact audit and
performance audit; the cloud broker's service aggregation and service arbitrage; and the
cloud carrier.)
● Cloud broker: An entity that manages the performance and delivery of cloud services
and negotiates relationship between cloud provider and consumer.
● Cloud carrier: An intermediary that provides connectivity and transport of cloud services
from cloud providers to consumers.
● Figure 3.3 illustrates the common interactions between a cloud consumer and a cloud
provider, where the broker provides services to the consumer and the auditor collects the
audit information.
● The interactions between the actors may lead to different use case scenarios.
● Figure 3.4 shows one kind of scenario in which the Cloud consumer may request service
from a cloud broker instead of contacting service provider directly. In this case, a cloud
broker can create a new service by combining multiple services.
● Figure 3.5 illustrates the usage of different kind of Service Level Agreement (SLA)
between consumer, provider and carrier.
(Figure 3.5: SLA #1 between the cloud consumer and the cloud provider, to maintain a
consistent level of service; SLA #2 between the cloud provider and the cloud carrier, to
specify capacity and functionality.)
● Figure 3.6 shows the scenario where the Cloud auditor conducts independent
assessment of operation and security of the cloud service implementation.
● The cloud consumer is a principal stakeholder for the cloud computing service and
requires service level agreements to specify the performance requirements to be fulfilled by a
cloud provider.
● The service level agreement covers Quality of Service and Security aspects.
● There are three kinds of cloud consumers: SaaS consumers, PaaS Consumers and
IaaS consumers.
● SaaS consumers are members who directly access the software application, for example
document management, content management, social networking and financial billing
applications.
● PaaS consumers deploy, test, develop and manage applications hosted in a cloud
environment; database application deployment, development and testing is an example of
this kind of consumption.
● IaaS consumers can access virtual computers, storage and network infrastructure, for
example using an Amazon EC2 instance to deploy a web application.
● On the other hand, Cloud Providers have complete rights to access software
applications.
● Normally, the service layer defines the interfaces for cloud consumers to access the
computing services.
● The resource abstraction and control layer contains the system components that a cloud
provider uses to provide and manage access to the physical computing resources through
software abstraction.
● The control layer focuses on resource allocation, access control and usage monitoring.
● Physical resource layer includes physical computing resources such as CPU, Memory,
Router, Switch, Firewalls and Hard Disk Drive.
● In cloud service management, business support entails the set of business related
services dealing with consumer and supporting services which includes content
management, contract management, inventory management, accounting service,
reporting service and rating service.
● Portability is the ability to work in more than one computing environment without major
rework. Similarly, interoperability means the ability of a system to work with other systems.
● Privacy applies to a cloud consumer's right to safeguard his or her information from other
consumers or parties.
● The main aim of security and privacy in cloud service management is to protect the
system and its consumers from vulnerabilities and threats.
● The cloud auditor performs independent assessments of the services, while the cloud
broker acts as an intermediary.
● Service aggregation provides data integration: the cloud broker combines and integrates
multiple services into one or more new services.
● With service arbitrage, the cloud broker has the flexibility to choose services from multiple
providers.
● Cloud carrier is an intermediary that provides connectivity and transport of cloud service
between cloud consumer and cloud provider.
● It provides access to cloud consumers with the help of networks, telecommunications and
other access devices, whereas distribution is done through a transport agent.
● A transport agent is a business organization that provides physical transport of storage
media.
● The deployment models are differentiated based on how exclusive the computing
resources are made to a Cloud Consumer.
● A public cloud is one in which the cloud infrastructure and computing resources are
made available to the general public over a public network.
● A public cloud is owned by an organization selling cloud services, and serves a diverse
pool of clients.
● Figure 3.7 presents a simple view of a public cloud and its customers.
● One of the main benefits that come with using public cloud services is near unlimited
scalability.
● The resources are pretty much offered based on demand. So any changes in activity
level can be handled very easily.
● Because a public cloud allows pooling of a large number of resources, users benefit from
the savings of large-scale operations.
● There are many services like Google Drive which are offered for free.
● Finally, the vast network of servers involved in public cloud services means that it can
benefit from greater reliability.
● Even if one data center were to fail entirely, the network simply redistributes the load
among the remaining centers, making it highly unlikely that the public cloud would ever
fail.
○ Easy scalability
○ Cost effectiveness
○ Increased reliability
● At the top of the list is the fact that the security of data held within a public cloud is a
cause for concern.
● It is often seen as an advantage that the public cloud has no geographical restrictions
making access easy from everywhere, but on the flip side this could mean that the
server is in a different country which is governed by an entirely different set of security
and/or privacy regulations.
● This could mean that your data is not all that secure making it unwise to use public cloud
services for sensitive data.
● A private cloud gives a single Cloud Consumer’s organization the exclusive access to
and usage of the infrastructure and computational resources.
● It may be managed either by the Cloud Consumer organization or by a third party, and
may be hosted on the organization’s premises (i.e. on-site private clouds) or outsourced
to a hosting company (i.e. outsourced private clouds).
● Figure 3.8 presents an on-site private cloud and an outsourced private cloud,
respectively.
Figure 3.8 (a) On-site Private Cloud (b) Out-sourced Private Cloud
● The main benefit of choosing a private cloud is the greater level of security offered
making it ideal for business users who need to store and/or process sensitive data.
● A good example is a company dealing with financial information such as bank or lender
who is required by law to use secure internal storage to store consumer information.
● With a private cloud this can be achieved while still allowing the organization to benefit
from cloud computing.
● Private cloud services also offer some other benefits for business users including more
control over the server allowing it to be tailored to your own preferences and in house
styles.
● While this can remove some of the scalability options, private cloud providers often offer
what is known as cloud bursting which is when non sensitive data is switched to a public
cloud to free up private cloud space in the event of a significant spike in demand until
such times as the private cloud can be expanded.
○ Improved security
○ Greater control over the server
○ Flexibility in the form of Cloud Bursting
● The downsides of private cloud services include a higher initial outlay, although in the
long term many business owners find that this balances out and it actually becomes more
cost effective than public cloud use.
● It is also more difficult to access the data held in a private cloud from remote locations
due to the increased security measures.
● A community cloud serves a group of Cloud Consumers which have shared concerns
such as mission objectives, security, privacy and compliance policy, rather than serving
a single organization as does a private cloud.
● Figure 3.9 (a) depicts an on-site community cloud comprised of a number of participant
organizations.
● A cloud consumer can access the local cloud resources, and also the resources of other
participating organizations through the connections between the associated
organizations.
● Figure 3.9 (b) shows an outsourced community cloud, where the server side is
outsourced to a hosting company.
● In this case, an outsourced community cloud builds its infrastructure off premise, and
serves a set of organizations that request and consume cloud services.
● Figure 3.10 illustrates a simple view of a hybrid cloud that could be built with a set of
clouds in the five deployment model variants.
● IaaS providers can offer the bare metal in terms of virtual machines where PaaS
solutions are deployed.
● When there is no need for a PaaS layer, it is possible to directly customize the virtual
infrastructure with the software stack needed to run applications.
● This is the case of virtual Web farms: a distributed system composed of Web servers,
database servers and load balancers on top of which prepackaged software is installed
to run Web applications.
● Other solutions provide prepackaged system images that already contain the software
stack required for the most common uses: Web servers, database servers or LAMP
stacks.
● Besides the basic virtual machine management capabilities, additional services can be
provided, generally including the following:
○ Physical infrastructure
○ Software management infrastructure
○ User interface
● At the top layer the user interface provides access to the services exposed by the
software management infrastructure.
● Such an interface is generally based on Web 2.0 technologies: Web services, RESTful
APIs and mash ups.
● Web services and RESTful APIs allow programs to interact with the service without
human intervention, thus providing complete integration within a software system.
● The core features of an IaaS solution are implemented in the infrastructure management
software layer.
● A central role is played by the scheduler, which is in charge of allocating the execution of
virtual machine instances.
● The bottom layer is composed of the physical infrastructure, on top of which the
management layer operates.
● From an architectural point of view, the physical layer also includes the virtual resources
that are rented from external IaaS providers.
● In the case of complete IaaS solutions, all three levels are offered as service.
● This is generally the case with public clouds vendors such as Amazon, GoGrid, Joyent,
Rightscale, Terremark, Rackspace, ElasticHosts, and Flexiscale, which own large
datacenters and give access to their computing infrastructures using an IaaS approach.
3.4.1 IaaS
● Infrastructure or Hardware as a Service (IaaS/HaaS) solutions are the most popular and
developed market segment of cloud computing.
● The available options within the IaaS offering umbrella range from single servers to
entire infrastructures, including network devices, load balancers, database servers and
Web servers.
● The main technology used to deliver and implement these solutions is hardware
virtualization: one or more virtual machines opportunely configured and interconnected
define the distributed system on top of which applications are installed and deployed.
● Virtual machines also constitute the atomic components that are deployed and priced
according to the specific features of the virtual hardware: memory, number of processors
and disk storage.
● From the perspective of the service provider, IaaS/HaaS allows the IT infrastructure to be
better exploited and provides a more secure environment for executing third-party
applications.
● From the perspective of the customer, it reduces the administration and maintenance
cost as well as the capital costs allocated to purchase hardware.
● At the same time, users can take advantage of the full customization offered by
virtualization to deploy their infrastructure in the cloud.
3.4.2 PaaS
● A general overview of the features characterizing the PaaS approach is given in Figure
3.12.
● The core middleware is in charge of managing the resources and scaling applications on
demand or automatically, according to the commitments made with users.
● From a user point of view, the core middleware exposes interfaces that allow
programming and deploying applications on the cloud.
● Some implementations provide a completely Web based interface hosted in the cloud
and offering a variety of services.
● Other implementations of the PaaS model provide a complete object model for
representing an application and provide a programming language-based approach.
● Developers generally have the full power of programming languages such as Java,
.NET, Python and Ruby with some restrictions to provide better scalability and security.
● PaaS solutions can offer middleware for developing applications together with the
infrastructure or simply provide users with the software that is installed on the user
premises.
● In the first case, the PaaS provider also owns large datacenters where applications are
executed
● In the second case, sometimes referred to as Pure PaaS, the middleware constitutes
the core value of the offering.
○ PaaS-I
○ PaaS-II
○ PaaS-III
● The first category identifies PaaS implementations that completely follow the cloud
computing style for application development and deployment.
● The second class is focused on providing a scalable infrastructure for Web applications,
mostly websites.
○ In this case, developers generally use the provider’s APIs, which are built on top
of industrial runtimes, to develop applications.
○ Google AppEngine is the most popular product in this category.
○ It provides a scalable runtime based on the Java and Python programming
languages, which have been modified for providing a secure runtime
environment and enriched with additional APIs and components to support
scalability.
● The third category consists of all those solutions that provide a cloud programming
platform for any kind of application, not only Web applications.
○ Among these, the most popular is Microsoft Windows Azure, which provides a
comprehensive framework for building service oriented cloud applications on top
of the .NET technology, hosted on Microsoft’s datacenters.
○ Other solutions in the same category, such as Manjrasoft Aneka, Apprenda
SaaSGrid, Appistry Cloud IQ Platform, DataSynapse, and GigaSpaces DataGrid,
provide only middleware with different services.
○ Runtime framework: This framework represents the software stack of the PaaS
model and the most intuitive aspect that comes to people’s minds when they
refer to PaaS solutions.
○ Abstraction: PaaS solutions are distinguished by the higher level of abstraction
that they provide.
○ Automation: PaaS environments automate the process of deploying applications
to the infrastructure, scaling them by provisioning additional resources when
needed.
○ Cloud services: PaaS offerings provide developers and architects with services
and APIs, helping them to simplify the creation and delivery of elastic and highly
available cloud application.
3.4.3 SaaS
● It provides a means to free users from complex hardware and software management by
offloading such tasks to third parties, which build applications accessible to multiple
users through a Web browser.
● On the provider side, the specific details and features of each customer’s application are
maintained in the infrastructure and made available on demand.
● The SaaS model is appealing for applications serving a wide range of users and that can
be adapted to specific needs with little further customization.
● This is the case of CRM and ERP applications that constitute common needs for almost
all enterprises, from small to medium-sized and large business.
● Every enterprise will have the same requirements for the basic features concerning CRM
and ERP and different needs can be satisfied with further customization.
● On the customer side, such costs constitute a minimal fraction of the usage fee paid for
the software.
● The analysis carried out by the Software and Information Industry Association (SIIA) was
mainly oriented to cover application service providers (ASPs) and all their variations.
● ASPs provided access to packaged software solutions that addressed the needs of a
variety of customers.
● Initially this approach was affordable for service providers, but it later became
inconvenient when the cost of customizations and specializations increased.
● The SaaS approach introduces a more flexible way of delivering application services that
are fully customizable by the user by integrating new services, injecting their own
components and designing the application and information workflows.
● Initially the SaaS model was of interest only for lead users and early adopters.
○ Software cost reduction and total cost of ownership (TCO) were paramount
○ Service level improvements
○ Rapid implementation
○ Standalone and configurable applications
○ Rudimentary application and data integration
○ Subscription and pay as you go (PAYG) pricing
● With the advent of cloud computing there has been an increasing acceptance of SaaS
as a viable software delivery model.
● This led to the transition to SaaS 2.0, which does not introduce a new technology but
transforms the way in which SaaS is used.
● Software as a Service based applications can serve different needs. CRM, ERP, and
social networking applications are definitely the most popular ones.
● SalesForce.com is probably the most successful and popular example of a CRM service.
● It provides a wide range of services for applications: customer relationship and human
resource management, enterprise resource planning, and many other features.
● SalesForce.com builds on top of the Force.com platform, which provides a fully featured
environment for building applications.
● In particular, through AppExchange customers can publish, search and integrate new
services and features into their existing applications.
● Other than providing the basic features of networking, they allow incorporating and
extending their capabilities by integrating third-party applications.
○ Google Documents and Zoho Office are examples of Web based applications
that aim to address all user needs for documents, spreadsheets and presentation
management.
○ These applications offer a Web based interface for creating, managing, and
modifying documents that can be easily shared among users and made
accessible from anywhere.
● The management of a cloud service by a single company is often the source of single
points of failure.
● Even if a company has multiple data centers located in different geographic regions, it
may have common software infrastructure and accounting systems.
● Therefore, using multiple cloud providers may provide more protection from failures.
● Criminals threaten to cut off the incomes of SaaS providers by making their services
unavailable.
● Some utility computing services offer SaaS providers the opportunity to defend against
DDoS attacks by using quick scale ups.
● Software stacks have improved interoperability among different cloud platforms, but the
APIs themselves are still proprietary. Thus, customers cannot easily extract their data and
programs from one site to run on another.
● The obvious solution is to standardize the APIs so that a SaaS developer can deploy
services and data across multiple cloud providers.
● This would prevent the loss of all data due to the failure of a single company.
● Such an option could enable surge computing, in which the public cloud is used to
capture the extra tasks that cannot be easily run in the data center of a private cloud.
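● One way to picture such a standardized layer is a thin provider-agnostic interface behind
which concrete providers can be swapped (the adapter classes below are hypothetical
placeholders using in-memory dictionaries, not calls to any real vendor SDK):

```python
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """Provider-agnostic storage interface a SaaS developer could target."""
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...

class PrivateCloudStore(ObjectStore):
    def __init__(self): self._blobs = {}
    def put(self, key, data): self._blobs[key] = data
    def get(self, key): return self._blobs[key]

class PublicCloudStore(ObjectStore):
    # Placeholder: a real adapter would call the public provider's SDK here.
    def __init__(self): self._blobs = {}
    def put(self, key, data): self._blobs[key] = data
    def get(self, key): return self._blobs[key]

def replicate(source: ObjectStore, target: ObjectStore, keys):
    """Because both stores expose the same API, data can move between providers."""
    for key in keys:
        target.put(key, source.get(key))

primary, surge = PrivateCloudStore(), PublicCloudStore()
primary.put("report.csv", b"q1,q2\n1,2\n")
replicate(primary, surge, ["report.csv"])   # e.g. push work to the public cloud during a surge
print(surge.get("report.csv"))
```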
● Current cloud offerings are essentially public (rather than private) networks, exposing the
system to more attacks.
● Many obstacles can be overcome immediately with well understood technologies such
as encrypted storage, virtual LANs, and network middle boxes (e.g., firewalls, packet
filters).
● For example, the end user could encrypt data before placing it in a cloud. Many nations
have laws requiring SaaS providers to keep customer data and copyrighted material
within national boundaries.
● Traditional network attacks include buffer overflows, DoS attacks, spyware, malware,
rootkits, Trojan horses, and worms.
● In a cloud environment, newer attacks may result from hypervisor malware, guest
hopping and hijacking or VM rootkits.
● Active attacks, on the other hand, may manipulate kernel data structures, causing major
damage to cloud servers.
● Multiple VMs can share CPUs and main memory in cloud computing, but I/O sharing is
problematic.
● For example, to run 75 EC2 instances with the STREAM benchmark requires a mean
bandwidth of 1,355 MB/second.
● However, for each of the 75 EC2 instances to write 1 GB files to the local disk requires a
mean disk write bandwidth of only 55 MB/second.
● If we assume applications to be pulled apart across the boundaries of clouds, this may
complicate data placement and transport.
● Cloud users and providers have to think about the implications of placement and traffic
at every level of the system, if they want to minimize costs.
● This kind of reasoning can be seen in Amazon’s development of its new CloudFront
service.
● Therefore, data transfer bottlenecks must be removed, bottleneck links must be widened
and weak servers should be removed.
● The opportunity is to create a storage system that will not only meet this growth but also
combine it with the cloud advantage of scaling arbitrarily up and down on demand.
● Data consistence checking in SAN connected data centers is a major challenge in cloud
computing.
● Large scale distributed bugs cannot be reproduced, so the debugging must occur at a
scale in the production data centers.
● No data center will provide such a convenience. One solution may be a reliance on
using VMs in cloud computing.
● The level of virtualization may make it possible to capture valuable information in ways
that are impossible without using VMs.
● Debugging over simulators is another approach to attacking the problem, if the simulator
is well designed.
● The pay as you go model applies to storage and network bandwidth; both are counted in
terms of the number of bytes used.
● GAE automatically scales in response to load increases or decreases and the users are
charged by the cycles used.
● AWS charges by the hour for the number of VM instances used, even if the machine is
idle.
● The opportunity here is to scale quickly up and down in response to load variation, in
order to save money, but without violating SLAs.
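● To make the pay-as-you-go arithmetic concrete, the following Python sketch estimates a monthly bill from instance-hours, stored gigabytes and transferred gigabytes; all prices are hypothetical placeholders, not actual AWS or GAE rates.
```python
# Illustrative sketch of pay-as-you-go billing arithmetic.
# All prices below are made-up placeholders, not real provider rates.

def monthly_cost(vm_hours, gb_stored, gb_transferred,
                 price_per_vm_hour=0.10,      # hypothetical $/instance-hour
                 price_per_gb_month=0.03,     # hypothetical $/GB-month of storage
                 price_per_gb_transfer=0.09): # hypothetical $/GB of bandwidth
    """Storage and bandwidth are billed by bytes (here GB); VMs by the hour,
    even if idle, which is why scaling down quickly saves money."""
    return (vm_hours * price_per_vm_hour
            + gb_stored * price_per_gb_month
            + gb_transferred * price_per_gb_transfer)

# Running 10 instances around the clock for 30 days vs. scaling to 2 at night:
always_on = monthly_cost(vm_hours=10 * 24 * 30, gb_stored=500, gb_transferred=200)
scaled = monthly_cost(vm_hours=(10 * 12 + 2 * 12) * 30, gb_stored=500, gb_transferred=200)
print(f"always on: ${always_on:.2f}, scaled down at night: ${scaled:.2f}")
```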
● Open Virtualization Format (OVF) describes an open, secure, portable, efficient and
extensible format for the packaging and distribution of VMs.
● This VM format does not rely on the use of a specific host platform, virtualization
platform or guest operating system.
● The approach is to address virtual-platform-agnostic packaging, with certification and
integrity of the packaged software.
● The package supports virtual appliances to span more than one VM.
● OVF also defines a transport mechanism for VM templates and the format can apply to
different virtualization platforms with different levels of virtualization.
● In terms of cloud standardization, OVF aims to enable virtual appliances to run on any
virtualization platform.
● Users also need to be able to run VMs on hypervisors across heterogeneous hardware
platforms.
● In addition, users need to realize cross-platform live migration between x86 Intel and
AMD technologies, and to support legacy hardware for load balancing.
● Many cloud computing providers originally relied on open source software because the
licensing model for commercial software is not ideal for utility computing.
● The primary opportunity is either for open source to remain popular or simply for
commercial software companies to change their licensing structure to better fit cloud
computing.
● One can consider using both pay for use and bulk use licensing schemes to widen the
business coverage.
● Cloud storage means storing the data with a cloud service provider rather than on a local
system.
● The end user can access the data stored on the cloud using an Internet link.
● If users store data in the cloud, they can access it from any location that has
Internet access.
● Workers do not need to use the same computer to access data nor do they have to carry
around physical storage devices.
● Also, if any organization has branch offices, they can all access the data from the cloud
provider.
● There are hundreds of different cloud storage systems, and some are very specific in
what they do.
● Some are niche-oriented and store just email or digital pictures, while others store any
type of data. Some providers are small, while others are huge and fill an entire
warehouse.
● At the most rudimentary level, a cloud storage system just needs one data server
connected to the Internet.
● A subscriber copies files to the server over the Internet, which then records the data.
When a client wants to retrieve the data, the client accesses the data server with a web
based interface and the server then either sends the files back to the client or allows the
client to access and manipulate the data itself.
● More typically, however, cloud storage systems utilize dozens or hundreds of data
servers.
● Because servers require maintenance or repair, it is necessary to store the saved data
on multiple machines, providing redundancy.
● Without that redundancy, cloud storage systems could not assure clients that they could
access their information at any given time.
3.6.1 Storage-as-a-Service
● Figure 3.13 illustrates Storage as a Service, where data is stored in cloud storage provided by a third party.
● It is also ideal when technical personnel are not available or have inadequate knowledge
to implement and maintain that storage infrastructure.
● Storage service providers are nothing new, but given the complexity of current backup,
replication, and disaster recovery needs, the service has become popular, especially
among small and medium sized businesses.
● The end user does not have to pay for infrastructure. They simply pay for how much they
transfer and save on the provider’s servers.
● A customer uses client software to specify the backup set and then transfers data across
a WAN.
● Examples:
● Authorization practices: The client lists the people who are authorized to access
information stored on the cloud system. Many corporations have multiple levels of
authorization.
● If a cloud storage system is unreliable, it becomes a liability. No one wants to save data
on an unstable system, nor would they trust a company that is financially unstable.
● Most cloud storage providers try to address the reliability concern through redundancy,
but the possibility still exists that the system could crash and leave clients with no way to
access their saved data.
● Cloud storage providers balance server loads and move data among various data centers,
ensuring that information is stored close to where it is used and is therefore available
quickly.
● Storing data on the cloud is advantageous, because it allows the user to protect the data
in case there’s a disaster.
● Having the data stored off-site can be the difference between closing the door for good
or being down for a few days or weeks.
● Which storage vendor to go with can be a complex issue, and how the end user
technology interacts with the cloud can be complex.
● For instance, some products are agent based and the application automatically transfers
information to the cloud via FTP.
● But others employ a web front end and the user has to select local files on their
computer to transmit.
● Amazon S3 is the best known storage solution, but other vendors might be better for
large enterprises.
● For instance, vendors that offer service-level agreements and direct access to customer
support can be critical for a business moving storage to a service provider.
● This is simply a listing of what some of the big players in the game have to offer and
anyone can use it as a starting guide to determine if their services match user’s needs.
● Amazon and Nirvanix are the current industry top dogs, but many others are in the field,
including some well known names.
● EMC is readying a storage solution and IBM already has a number of cloud storage
options called Blue Cloud.
3.6.4 S3
● The best-known cloud storage service is Amazon’s Simple Storage Service (S3), which
was launched in 2006.
● Amazon S3 provides a simple web services interface that can be used to store and
retrieve any amount of data, at any time, from anywhere on the Web.
● It gives any developer access to the same highly scalable data storage infrastructure
that Amazon uses to run its own global network of web sites.
● The service aims to maximize benefits of scale and to pass those benefits on to
developers.
● Amazon S3 is intentionally built with a minimal feature set that includes the following
functionality:
○ Write, read, and delete objects containing from 1 byte to 5 gigabytes of data
each. The number of objects that can be stored is unlimited.
○ Each object is stored and retrieved via a unique developer assigned key.
○ Objects can be made private or public and rights can be assigned to specific
users.
○ Uses standards based REST and SOAP interfaces designed to work with any
Internet development toolkit.
○ Scalable: Amazon S3 can scale in terms of storage, request rate and users to
support an unlimited number of web-scale applications.
○ Reliable: Store data durably with 99.99 percent availability. Amazon says it does
not allow any downtime.
○ Fast: Amazon S3 was designed to be fast enough to support high-performance
applications. Server-side latency must be insignificant relative to Internet latency.
○ Inexpensive: Amazon S3 is built from inexpensive commodity hardware
components.
○ Simple: Building highly scalable, reliable, fast and inexpensive storage is difficult; Amazon S3 aims to make doing so simple for any application.
● Design principles: Amazon used the following principles of distributed system design to
meet the Amazon S3 requirements:
○ Decentralization
○ Autonomy
○ Local responsibility
○ Controlled concurrency
○ Failure toleration
○ Controlled parallelism
○ Symmetry
○ Simplicity
● Amazon discloses few details about how S3 works internally, but according to Amazon, S3’s
design aims to provide scalability, high availability and low latency at commodity costs.
● Data in S3 is organized into buckets. Each bucket is owned by an AWS account, and the
objects within it are identified by unique, user-assigned keys.
● Buckets and objects are created, listed and retrieved using either a REST or SOAP
interface.
● Objects can also be retrieved using the HTTP GET interface or via BitTorrent.
● An access control list restricts who can access the data in each bucket.
● Bucket names and keys are formulated so that objects can be addressed with HTTP URLs,
for instance: http://s3.amazonaws.com/samplebucket/samplekey
● Requests are authorized using an access control list associated with each bucket and
object.
● The Amazon AWS Authentication tools allow the bucket owner to create an
authenticated URL with a set amount of time that the URL will be valid.
● Bucket items can also be accessed via a BitTorrent feed, enabling S3 to act as a seed
for the client.
● Buckets can also be set up to save HTTP log information to another bucket.
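● As an illustration of the bucket/key model described above, the following sketch uses the boto3 SDK (assuming AWS credentials are already configured; the bucket name is hypothetical and must be globally unique) to create a bucket, store and read an object, and generate a time-limited authenticated URL.
```python
# Minimal sketch of S3's bucket/object model using the boto3 SDK.
# Assumes AWS credentials are configured; the bucket name is hypothetical.
import boto3

s3 = boto3.client("s3")

# Outside us-east-1, a CreateBucketConfiguration with a LocationConstraint
# must also be supplied.
s3.create_bucket(Bucket="samplebucket-demo-12345")

# Write an object under a developer-assigned key, then read it back.
s3.put_object(Bucket="samplebucket-demo-12345", Key="samplekey",
              Body=b"hello cloud storage")
obj = s3.get_object(Bucket="samplebucket-demo-12345", Key="samplekey")
print(obj["Body"].read())

# Generate an authenticated URL that is valid only for a limited time,
# similar to the time-limited URLs described above.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "samplebucket-demo-12345", "Key": "samplekey"},
    ExpiresIn=3600)  # seconds
print(url)
```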
Provisioning of resources
● As consumers rely on cloud providers to meet more of their computing needs, they
will require a specific level of QoS to be maintained by their providers, in order to
meet their objectives and sustain their operations.
● Market-oriented resource management is necessary to regulate the supply and
demand of cloud resources to achieve market equilibrium between supply and
demand.
● A public cloud is one in which the cloud infrastructure and computing resources are
made available to the general public over a public network.
● A public cloud is owned by an organization selling cloud services, and serves a
diverse pool of clients.
● A private cloud gives a single Cloud Consumer’s organization the exclusive access
to and usage of the infrastructure and computational resources.
● It may be managed either by the Cloud Consumer organization or by a third party,
and may be hosted on the organization’s premises (i.e. on-site private clouds) or
outsourced to a hosting company (i.e. outsourced private clouds).
Merits:
○ Ability to easily share and collaborate
○ Lower cost
Demerits:
○ Not the right choice for every organization
○ Slow adoption to date
● SaaS 2.0 is not a new technology but transforms the way in which SaaS is used.
● Cloud storage means storing the data with a cloud service provider rather than on a
local system. The end user can access the data stored on the cloud using an Internet
link.
● Cloud storage has a number of advantages over traditional data storage.
● If users store data in the cloud, they can access it from any location that has
Internet access.
● The term Storage as a Service means that a third-party provider rents space on its
storage to end users who lack the operating or capital budget to pay for it on their own.
● It is also ideal when technical personnel are not available or have inadequate
knowledge to implement and maintain that storage infrastructure.
● Web email providers like Gmail, Hotmail, and Yahoo! Mail store email messages on
their own servers.
● Flickr and Picasa host millions of digital photographs. YouTube hosts millions of
user-uploaded video files.
● Hostmonster and GoDaddy store files and data for many client web sites.
● Facebook and MySpace are social networking sites and allow members to post
pictures and other content.
● MediaMax and Strongspace offer storage space for any kind of digital data.
● Storing data on the cloud is advantageous, because it allows you to protect your data
in case there’s a disaster.
● Having your data stored off-site can be the difference between closing your door for
good or being down for a few days or weeks.
● Which storage vendor to go with can be a complex issue, and how the end user
technology interacts with the cloud can be complex.
● The best-known cloud storage service is Amazon’s Simple Storage Service (S3),
which launched in 2006.
● Amazon S3 is designed to make web-scale computing easier for developers.
● Amazon S3 provides a simple web services interface that can be used to store and
retrieve any amount of data, at any time, from anywhere on the Web.
● It gives any developer access to the same highly scalable data storage infrastructure
that Amazon uses to run its own global network of web sites.
21. What are the design requirements considered by Amazon to build S3?
● Scalable
● Reliable
● Fast
● Inexpensive
● Simple
22. What are the design principles considered by Amazon to meet the S3 requirements?
● Decentralization
● Autonomy
● Local responsibility
● Controlled concurrency
● Failure toleration
● Controlled parallelism
● Symmetry
● Simplicity
-------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------
● Figure 4.1 shows six layers of cloud services, ranging from hardware, network, and
collocation to infrastructure, platform and software applications.
● The cloud platform provides PaaS, which sits on top of the IaaS infrastructure. The top
layer offers SaaS.
● The next layer is for interconnecting all the hardware components and it is simply called
Network as a Service (NaaS).
● The next layer up offers Location as a Service (LaaS), which provides a collocation
service to house, power and secure all the physical hardware as well as network
resources.
● The cloud infrastructure layer can be further subdivided as Data as a Service (DaaS)
and Communication as a Service (CaaS) in addition to compute and storage in IaaS.
● From the provider perspective, cloud infrastructure performance is the primary concern.
● From the end users perspective, the quality of services, including security, is the most
important.
● SaaS tools also apply to distributed collaboration, financial and human resources
management. These cloud services have been growing rapidly in recent years.
● Based on the observations of some typical cloud computing instances, such as Google,
Microsoft, and Yahoo!, the overall software stack structure of cloud computing software
can be viewed as layers.
● Each layer has its own purpose and provides the interface for the upper layers just as
the traditional software stack does. However, the lower layers are not completely
transparent to the upper layers.
● The platform for running cloud computing services can be either physical servers or
virtual servers.
● By using VMs, the platform can be flexible; that is, the running services are not bound
to specific hardware platforms.
● The software layer on top of the platform is the layer for storing massive amounts of
data.
● This layer acts like the file system in a traditional single machine. Other layers running
on top of the file system are the layers for executing cloud computing applications.
● As in a cluster environment, there are also some runtime supporting services in the
cloud computing environment.
● Cluster monitoring is used to collect the runtime status of the entire cluster.
● The scheduler queues the tasks submitted to the whole cluster and assigns the tasks to
the processing nodes according to node availability.
● The distributed scheduler for the cloud application has special characteristics that can
support cloud applications, such as scheduling the programs written in MapReduce
style.
● The runtime support system keeps the cloud cluster working properly with high
efficiency.
● The SaaS model provides software applications as a service, rather than requiring users
to purchase the software.
● On the provider side, costs are rather low, compared with conventional hosting of user
applications.
● The customer data is stored in the cloud that is either vendor proprietary or a publicly
hosted cloud supporting PaaS and IaaS.
● The SLAs must commit sufficient resources such as CPU, memory and bandwidth that
the user can use for a preset period.
● Resource provisioning schemes also demand fast discovery of services and data in
cloud computing infrastructures.
● To deploy VMs, users treat them as physical hosts with customized operating systems
for specific applications.
● For example, Amazon’s EC2 uses Xen as the virtual machine monitor (VMM). The same
VMM is used in IBM’s Blue Cloud.
● In the EC2 platform, some predefined VM templates are also provided. Users can
choose different kinds of VMs from the templates.
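● As a sketch of how a user provisions a VM from a predefined template, the following boto3 call launches an EC2 instance from an AMI; the AMI ID, instance type, key pair and region are placeholders.
```python
# Sketch: launching a VM from a predefined template (an AMI) on EC2 with boto3.
# The AMI ID and key pair name below are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical template ID
    InstanceType="t3.micro",          # the kind of VM chosen from the catalog
    KeyName="my-keypair",             # hypothetical key pair for SSH access
    MinCount=1,
    MaxCount=1)

instance_id = resp["Instances"][0]["InstanceId"]
print("launched", instance_id)

# The user then treats the VM like a physical host with a customized OS.
```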
● Figure 4.2 shows three cases of static cloud resource provisioning policies.
● In case (a), over provisioning with the peak load causes heavy resource waste (shaded
area).
● In case (b), under provisioning (along the capacity line) of resources results in losses by
both user and provider in that paid demand by the users (the shaded area above the
capacity) is not served and wasted resources still exist for those demanded areas below
the provisioned capacity.
● In case (c), the constant provisioning of resources with fixed capacity to a declining user
demand could result in even worse resource waste.
● The user may give up the service by canceling the demand, resulting in reduced
revenue for the provider.
● Both the user and provider may be losers in resource provisioning without elasticity.
● The demand-driven method provides static resources and has been used in grid
computing for many years.
● This method adds or removes computing instances based on the current utilization level
of the allocated resources.
● For example, the demand-driven method automatically allocates two Xeon processors to a
user application if the user has been using one Xeon processor more than 60 percent of
the time for an extended period.
● In general, when a resource has surpassed a threshold for a certain amount of time, the
scheme increases that resource based on demand.
● When a resource is below a threshold for a certain amount of time, that resource could
be decreased accordingly.
● Amazon implements such an auto-scale feature in its EC2 platform. This method is easy
to implement.
● The scheme does not work out right if the workload changes abruptly.
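● A toy sketch of such a threshold-based (demand-driven) scheme is shown below; the thresholds, window size and instance limits are illustrative values, not Amazon’s actual auto-scaling logic.
```python
# Toy sketch of the demand-driven (threshold-based) scheme described above:
# add an instance when utilization stays above a threshold for a while,
# remove one when it stays below a lower threshold. Numbers are illustrative.

def scale(instances, utilization_history,
          upper=0.60, lower=0.20, window=5,
          min_instances=1, max_instances=20):
    """utilization_history: most recent per-interval utilization samples (0..1)."""
    recent = utilization_history[-window:]
    if len(recent) < window:
        return instances  # not enough history yet
    if all(u > upper for u in recent) and instances < max_instances:
        return instances + 1   # sustained high load -> scale up
    if all(u < lower for u in recent) and instances > min_instances:
        return instances - 1   # sustained low load -> scale down
    return instances

history = [0.65, 0.70, 0.72, 0.68, 0.75]
print(scale(2, history))  # -> 3: load exceeded 60% for the whole window
```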
● The event-driven method adds or removes machine instances based on a specific time event.
● The scheme works better for seasonal or predicted events such as Christmastime in the
West and the Lunar New Year in the East.
● During these events, the number of users grows before the event period and then
decreases during the event period.
● The method results in a minimal loss of QoS, if the event is predicted correctly.
● Otherwise, wasted resources are even greater due to events that do not follow a fixed
pattern.
● In the popularity-driven method, the provider searches the Internet for the popularity of
certain applications and creates instances based on that popularity.
● Again, the scheme has a minimal loss of QoS, if the predicted popularity is correct.
● In order to support a large number of application service consumers from around the
world, cloud infrastructure providers (i.e., IaaS providers) have established data centers
in multiple geographical locations to provide redundancy and ensure reliability in case of
site failures.
● For example, Amazon has data centers in the United States (e.g., one on the East Coast
and another on the West Coast) and Europe.
● However, currently Amazon expects its cloud customers (i.e., SaaS providers) to
express a preference regarding where they want their application services to be hosted.
● Amazon does not provide seamless/automatic mechanisms for scaling its hosted
services across multiple geographically distributed data centers. This approach has two
shortcomings:
○ First, it is difficult for cloud customers to determine in advance the best location
for hosting their services as they may not know the origin of consumers of their
services.
○ Second, SaaS providers may not be able to meet the QoS expectations of their
service consumers originating from multiple geographical locations.
● Figure 4.3 shows the high-level components of the Melbourne group’s proposed Inter
Cloud architecture.
● In addition, no single cloud infrastructure provider will be able to establish its data
centers at all possible locations throughout the world.
● As a result, cloud application service (SaaS) providers will have difficulty in meeting QoS
expectations for all their consumers.
● They need to be able to utilize market-based utility models as the basis for provisioning
of virtualized software services and federated hardware infrastructure among users with
heterogeneous applications.
● They consist of client brokering and coordinator services that support utility-driven
federation of clouds:
○ Application scheduling
○ Resource allocation
○ Migration of workloads
● The Cloud Exchange (CEx) acts as a market maker for bringing together service
producers and consumers.
● It aggregates the infrastructure demands from application brokers and evaluates them
against the available supply currently published by the cloud coordinators.
● An SLA specifies the details of the service to be provided in terms of metrics agreed
upon by all parties, and incentives and penalties for meeting and violating the
expectations, respectively.
● The availability of a banking system within the market ensures that financial transactions
pertaining to SLAs between participants are carried out in a secure and dependable
environment.
● Cloud service providers must learn from the managed service provider (MSP) model and
ensure that their customer’s applications and data are secure if they hope to retain their
customer base and competitiveness.
● Today, enterprises are looking toward cloud computing horizons to expand their on-
premises infrastructure, but most cannot afford the risk of compromising the security of
their applications and data.
● For example, IDC recently conducted a survey (Figure 4.4) of 244 IT executives/CIOs
and their line-of-business (LOB) colleagues to gauge their opinions and understand their
companies’ use of IT cloud services.
● Moving critical applications and sensitive data to public and shared cloud environments
is of great concern for those corporations that are moving beyond their data center’s
network perimeter defense.
● To alleviate these concerns, a cloud solution provider must ensure that customers will
continue to have the same security and privacy controls over their applications and
services.
● In addition, the solution provider must give evidence to customers that its organization
and their data are secure, that it can meet its service-level agreements, and that it can
prove compliance to auditors.
● Although virtualization and cloud computing can help companies accomplish more by
breaking the physical bonds between an IT infrastructure and its users, heightened
security threats must be overcome in order to benefit fully from this new computing
paradigm.
● Enterprise security is only as good as the least reliable partner, department and vendor.
● With the cloud model, cloud consumers lose control over physical security.
● In a public cloud, the consumers are sharing computing resources with other companies.
● In a shared pool outside the enterprise, users do not have any knowledge or control of
where the resources run.
● Storage services provided by one cloud vendor may be incompatible with another
vendor’s services should you decide to move from one to the other.
● Ensuring the integrity of the data really means that it changes only in response to
authorized transactions.
● Since access to logs is required for Payment Card Industry Data Security Standard (PCI
DSS) compliance and may be requested by auditors and regulators, security managers
need to make sure to negotiate access to the provider’s logs as part of any service
agreement.
● Cloud applications undergo constant feature additions and users must keep up to date
with application improvements to be sure they are protected.
● The speed at which applications will change in the cloud will affect both the SDLC and
security.
● Security needs to move to the data level, so that enterprises can be sure their data is
protected wherever it goes.
● Sensitive data is the domain of the enterprise, not the cloud computing provider.
● There is a huge body of standards that apply for IT security and compliance, governing
most business interactions that will, over time, have to be translated to the cloud.
● SaaS makes the process of compliance more complicated, since it may be difficult for a
customer to discern where its data resides on a network controlled by its SaaS provider,
or a partner of that provider, which raises all sorts of compliance issues of data privacy,
segregation, and security.
● Security managers will need to pay particular attention to systems that contain critical
data such as corporate financial information or source code during the transition to
server virtualization in production environments.
● Outsourcing means losing significant control over data, and while this is not a good idea
from a security perspective, the business ease and financial savings will continue to
increase the usage of these services.
● Security managers will need to work with their company’s legal staff to ensure that
appropriate contract terms are in place to protect corporate data and provide for
acceptable service level agreements.
● Cloud based services will result in many mobile IT users accessing business data and
services without traversing the corporate network.
● This will increase the need for enterprises to place security controls between mobile
users and cloud based services.
● Although traditional data center security still applies in the cloud environment, physical
segregation and hardware based security cannot protect against attacks between virtual
machines on the same server.
● Administrative access is through the Internet rather than the controlled and restricted
direct or on-premises connection that is adhered to in the traditional data center model.
● This increases risk and exposure and will require stringent monitoring for changes in
system control and access control restriction.
● Proving the security state of a system and identifying the location of an insecure virtual
machine will be challenging.
● The co-location of multiple virtual machines increases the attack surface and risk of
virtual machine to virtual machine compromise.
● Localized virtual machines and physical servers use the same operating systems as well
as enterprise and web applications in a cloud server environment, increasing the threat
of an attacker or malware exploiting vulnerabilities in these systems and applications
remotely.
● Virtual machines are vulnerable as they move between the private cloud and the public
cloud.
● A fully or partially shared cloud environment is expected to have a greater attack surface
and therefore can be considered to be at greater risk than a dedicated resources
environment.
● Data is fluid in cloud computing and may reside in on-premises physical servers, on-
premises virtual machines, or off-premises virtual machines running on cloud computing
resources and this will require some rethinking on the part of auditors and practitioners
alike.
● To establish zones of trust in the cloud, the virtual machines must be self-defending,
effectively moving the perimeter to the virtual machine itself.
● In the cloud computing world, the cloud computing provider is in charge of customer data
security and privacy.
● Cloud computing models of the future will likely combine the use of SaaS (and other
XaaS’s as appropriate), utility computing and Web 2.0 collaboration technologies to
leverage the Internet to satisfy their customer needs.
● New business models being developed as a result of the move to cloud computing are
creating not only new technologies and business operational processes but also new
security requirements and challenges as described previously.
● As the most recent evolutionary step in the cloud service model (Figure 4.5), SaaS will
likely remain the dominant cloud service model for the predictable future and the area
where the most critical need for security practices and oversight will reside.
● The technology analyst and consulting firm Gartner lists seven security issues which one
should discuss with a cloud computing vendor.
● Privileged user access: inquire about who has specialized access to data and about the
hiring and management of such administrators.
● Regulatory compliance: make sure that the vendor is willing to undergo external audits
and/or security certifications.
● Data location: does the provider allow any control over the location of data?
● Data segregation: make sure that encryption is available at all stages and that these
encryption schemes were designed and tested by experienced professionals.
● Recovery: find out what will happen to data in the case of a disaster, and whether
complete restoration can be performed.
● Investigative support: does the vendor have the ability to investigate any inappropriate or
illegal activity?
● Long-term viability: what happens to the data if the company goes out of business, and in
what format and by what process will the data be returned?
● To address the security issues listed above, SaaS providers will need to incorporate and
enhance security practices used by the managed service providers and develop new
ones as the cloud computing environment evolves.
● A charter for the security team is typically one of the first deliverables from the steering
committee.
● This charter must clearly define the roles and responsibilities of the security team and
other groups involved in performing information security functions.
● Lack of a formalized strategy can lead to an unsustainable operating model and security
level as it evolves.
● In addition, lack of attention to security governance can result in key needs of the
business not being met, including but not limited to, risk management, security
monitoring, application security, and sales support.
● Lack of proper governance and management of duties can also result in potential
security risks being left unaddressed and opportunities to improve the business being
missed because the security team is not focused on the key security functions and
activities that are critical to the business.
● In the cloud environment, physical servers are consolidated to multiple virtual machine
instances on virtualized servers.
● Not only can data center security teams replicate typical security controls for the data
center at large to secure the virtual machines, they can also advise their customers on
how to prepare these machines for migration to a cloud environment when appropriate.
● Firewalls, intrusion detection and prevention, integrity monitoring and log inspection can
all be deployed as software on virtual machines to increase protection as well as
maintain compliance integrity of servers and applications as virtual resources move from
on-premises to public cloud environments.
● By deploying this traditional line of defense to the virtual machine itself, the user can
enable critical applications and data to be moved to the cloud securely.
● To facilitate the centralized management of a server firewall policy, the security software
loaded onto a virtual machine should include a bidirectional stateful firewall that enables
virtual machine isolation and location awareness, thereby enabling a tightened policy
and the flexibility to move the virtual machine from on-premises to cloud resources.
● Integrity monitoring and log inspection software must be applied at the virtual machine
level.
● This approach to virtual machine security, which connects the machine back to the
mother ship, has some advantages in that the security software can be put into a single
software agent that provides for consistent control and management throughout the
cloud while integrating seamlessly back into existing security infrastructure investments,
providing economies of scale, deployment, and cost savings for both the service
provider and the enterprise.
4.10 IAM
● Identity and access management is a critical function for every organization and a
fundamental expectation of SaaS customers is that the principle of least privilege is
granted to their data.
● The principle of least privilege states that only the minimum access necessary to
perform an operation should be granted, and that access should be granted only for the
minimum amount of time necessary.
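● A minimal sketch of least privilege in practice, using AWS IAM through boto3, is shown below: the user is granted read-only access to a single bucket and nothing more (user, policy and bucket names are hypothetical).
```python
# Sketch of the least-privilege principle using AWS IAM via boto3:
# grant a user read-only access to a single bucket and nothing else.
# User, policy and bucket names are hypothetical.
import json
import boto3

iam = boto3.client("iam")

policy_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],  # only the operation actually needed
        "Resource": "arn:aws:s3:::samplebucket-demo-12345/*"
    }]
}

policy = iam.create_policy(
    PolicyName="read-only-samplebucket",
    PolicyDocument=json.dumps(policy_document))

iam.attach_user_policy(
    UserName="report-service",
    PolicyArn=policy["Policy"]["Arn"])
```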
● However, business and IT groups will need and expect access to systems and
applications.
● The advent of cloud services and services on demand is changing the identity
management landscape.
● Most of the current identity management solutions are focused on the enterprise and
typically are architected to work in a very controlled, static environment.
● In the cloud environment, where services are offered on demand and they can
continuously evolve, aspects of current models such as trust assumptions, privacy
implications, and operational aspects of authentication and authorization, will be
challenged.
● Meeting these challenges will require a balancing act for SaaS providers as they
evaluate new models and management processes for IAM to provide end-to-end trust
and identity throughout the cloud and the enterprise.
● Another issue will be finding the right balance between usability and security. If a good
balance is not achieved, both business and IT groups may be affected by barriers to
completing their support and maintenance activities efficiently.
● Security standards define the processes, procedures, and practices necessary for
implementing a security program.
● These standards also apply to cloud related IT activities and include specific steps that
should be taken to ensure a secure environment is maintained that provides privacy and
security of confidential information in a cloud environment.
● Security standards are based on a set of key principles intended to protect this type of
trusted environment.
● Messaging standards, especially for security in the cloud, must also include nearly all the
same considerations as any other IT security endeavor.
● This means having overlapping systems designed to provide security even if one system
fails. An example is a firewall working in conjunction with an intrusion-detection system
(IDS).
● Defense in depth provides security because there is no single point of failure and no
single entry vector at which an attack can occur.
● For this reason, a choice between implementing network security in the middle part of a
network (i.e., in the cloud) or at the endpoints is a false dichotomy.
● No single security system is a solution by itself, so it is far better to secure all systems.
● This type of layered security is precisely what we are seeing develop in cloud computing.
● Traditionally, security was implemented at the endpoints, where the user controlled
access.
● An organization had no choice except to put firewalls, IDSs, and antivirus software inside
its own network.
● Today, with the advent of managed security services offered by cloud providers,
additional security can be provided inside the cloud.
● SAML is built on a number of existing standards, namely, SOAP, HTTP and XML. SAML
relies on HTTP as its communications protocol and specifies the use of SOAP (currently,
version 1.1).
● Both SAML 1.1 and SAML 2.0 use digital signatures (based on the XML Signature
standard) for authentication and message integrity.
● XML encryption is supported in SAML 2.0, though SAML 1.1 does not have encryption
capabilities.
● SAML defines XML based assertions and protocols, bindings and profiles.
● The term SAML Core refers to the general syntax and semantics of SAML assertions as
well as the protocol used to request and transmit those assertions from one system
entity to another.
● A SAML binding determines how SAML requests and responses map to standard
messaging protocols. An important (synchronous) binding is the SAML SOAP binding.
● SAML standardizes queries for, and responses that contain, user authentication,
entitlements and attribute information in an XML format.
● This format can then be used to request security information about a principal from a
SAML authority.
● The relying party (or assertion consumer or requesting party) is a partner site that
receives the security information.
● SAML assertions are usually transferred from identity providers to service providers.
● Assertions contain statements that service providers use to make access control
decisions.
○ Authentication statements
○ Attribute statements
○ Authorization decision statements
<Authentication>
</Authentication>
<Attribute>
</Attribute>
<AuthorizationDecision>
</AuthorizationDecision>
● Authentication statements assert to a service provider that the principal did indeed
authenticate with an identity provider at a particular time using a particular method of
authentication.
● Other information about the authenticated principal (called the authentication context)
may be disclosed in an authentication statement.
● A SAML protocol describes how certain SAML elements (including assertions) are
packaged within SAML request and response elements
● A service provider makes a query directly to an identity provider over a secure back
channel. For this reason, query messages are typically bound to SOAP.
● Corresponding to the three types of statements, there are three types of SAML queries:
○ Authentication query
○ Attribute query
○ Authorization decision query.
● Of these, the attribute query is perhaps most important. The result of an attribute query
is a SAML response containing an assertion, which itself contains an attribute statement.
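● The following simplified sketch builds a SAML 2.0 attribute query with Python’s standard library; a real deployment would use a SAML toolkit, sign the message with XML Signature, and bind it to SOAP over a secure back channel.
```python
# Simplified, not schema-complete, sketch of a SAML 2.0 attribute query.
import uuid
import datetime
import xml.etree.ElementTree as ET

SAMLP = "urn:oasis:names:tc:SAML:2.0:protocol"
SAML = "urn:oasis:names:tc:SAML:2.0:assertion"
ET.register_namespace("samlp", SAMLP)
ET.register_namespace("saml", SAML)

query = ET.Element(f"{{{SAMLP}}}AttributeQuery", {
    "ID": "_" + uuid.uuid4().hex,
    "Version": "2.0",
    "IssueInstant": datetime.datetime.utcnow().isoformat() + "Z",
})
subject = ET.SubElement(query, f"{{{SAML}}}Subject")
ET.SubElement(subject, f"{{{SAML}}}NameID").text = "alice@example.org"
ET.SubElement(query, f"{{{SAML}}}Attribute", {"Name": "department"})

print(ET.tostring(query, encoding="unicode"))
# The SAML authority's response would contain an assertion with an
# attribute statement for this principal.
```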
● OAuth is an open protocol, initiated by Blaine Cook and Chris Messina, to allow secure
API authorization in a simple, standardized method for various types of web applications.
● Cook and Messina had concluded that there were no open standards for API access
delegation.
● The OAuth discussion group was created in April 2007, for the small group of
implementers to write the draft proposal for an open protocol.
● DeWitt Clinton of Google learned of the OAuth project and expressed interest in
supporting the effort.
● In July 2007, the team drafted an initial specification and it was released in October of
the same year.
● For developers, OAuth provides users access to their data while protecting account
credentials.
● OAuth allows users to grant access to their information, which is shared by the service
provider and consumers without sharing all of their identity.
● The Core designation is used to stress that this is the baseline, and other extensions
and protocols can build on it.
● By design, OAuth Core 1.0 does not provide many desired features (e.g., automated
discovery of endpoints, language support, support for XML-RPC and SOAP, standard
definition of resource access, OpenID integration, signing algorithms, etc.).
● This intentional lack of feature support is viewed by the authors as a significant benefit.
● The Core deals with fundamental aspects of the protocol, namely, to establish a
mechanism for exchanging a user name and password for a token with defined rights
and to provide tools to protect the token.
● It is important to understand that security and privacy are not guaranteed by the
protocol.
● In fact, OAuth by itself provides no privacy at all and depends on other protocols such as
SSL to accomplish that.
● In fact, the specification includes substantial security considerations that must be taken
into account when working with sensitive data.
● With OAuth, sites use tokens coupled with shared secrets to access resources.
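● The following sketch shows the consumer side of OAuth 1.0-style access using the third-party requests-oauthlib package; keys, secrets and the URL are placeholders, and it is assumed that the tokens were already obtained through the normal authorization flow.
```python
# Sketch of OAuth 1.0-style access: the consumer holds tokens and shared
# secrets and signs each request instead of sending the user's password.
# Keys, secrets and the URL below are hypothetical placeholders.
from requests_oauthlib import OAuth1Session

session = OAuth1Session(
    client_key="consumer-key",
    client_secret="consumer-shared-secret",
    resource_owner_key="access-token",
    resource_owner_secret="access-token-secret")

# The request is signed with the shared secrets; the user's credentials are
# never revealed to the consumer. Privacy still depends on using SSL/TLS.
response = session.get("https://api.example.com/protected/resource")
print(response.status_code)
```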
4.11.3 OpenID
● OpenID is an open, decentralized standard for user authentication and access control
that allows users to log onto many services using the same digital identity.
● The original OpenID authentication protocol was developed in May 2005 by Brad
Fitzpatrick, creator of the popular community web site LiveJournal.
● In late June 2005, discussions began between OpenID developers and other developers
from an enterprise software company named NetMesh.
● The direct result of the collaboration was the Yadis discovery protocol, which was
announced on October 24, 2005.
● The Yadis specification provides a general-purpose identifier for a person and any other
entity, which can be used with a variety of services.
● Yadis discovery protocol is used for obtaining a resource description document, given
that identifier.
● Together these enable coexistence and interoperability of a rich variety of services using
a single identifier.
● The identifier uses a standard syntax and a well established namespace and requires no
additional namespace administration infrastructure.
● An OpenID is in the form of a unique URL and is authenticated by the entity hosting the
OpenID URL.
● The OpenID protocol does not rely on a central authority to authenticate a user’s identity.
● Neither the OpenID protocol nor any web sites requiring identification can mandate that
a specific type of authentication be used; nonstandard forms of authentication such as
smart cards, biometrics, or ordinary passwords are allowed.
● With OpenID 2.0, the client discovers the identity provider service URL by requesting the
XRDS document (also called the Yadis document) with the content type
application/xrds+xml, which may be available at the target URL but is always available
for a target XRI.
● There are two modes by which the relying party can communicate with the identity
provider: checkid_immediate and checkid_setup.
● In checkid_immediate, the relying party requests that the provider not interact with the
user. All communication is relayed through the user’s browser without explicitly notifying
the user.
● In checkid_setup, the user communicates with the provider server directly using the
same web browser as is used to access the relying party site.
● OpenID does not provide its own authentication methods, but if an identity provider uses
strong authentication, OpenID can be used for secure transactions.
● SSL/TLS Transport Layer Security (TLS) and its predecessor, Secure Sockets Layer
(SSL), are cryptographically secure protocols designed to provide security and data
integrity for communications over TCP/IP.
● TLS and SSL encrypt the segments of network connections at the transport layer.
● Several versions of the protocols are in general use in web browsers, email, instant
messaging and Voice-over-IP (VoIP).
● TLS is an IETF standard protocol; TLS 1.2 is defined in RFC 5246 and the latest version, TLS 1.3, in RFC 8446.
● Typical TLS authentication is one-way: only the server is authenticated, so the client
knows the server’s identity while the client itself remains unauthenticated.
● TLS also supports a more secure bilateral connection mode whereby both ends of the
connection can be assured that they are communicating with whom they believe they
are connected.
● Mutual authentication requires the TLS client side to also maintain a certificate.
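● The following sketch shows one-way TLS authentication with Python’s standard ssl module; enabling mutual authentication would additionally require loading a client certificate, as indicated in the comment.
```python
# Sketch of one-way TLS authentication with Python's standard ssl module:
# the client verifies the server's certificate; the client itself stays
# unauthenticated unless a client certificate is also loaded.
import socket
import ssl

context = ssl.create_default_context()  # verifies the server by default
# For mutual (bilateral) authentication the client would also present
# its own certificate, e.g.:
# context.load_cert_chain(certfile="client.crt", keyfile="client.key")

with socket.create_connection(("www.example.com", 443)) as sock:
    with context.wrap_socket(sock, server_hostname="www.example.com") as tls:
        print("negotiated:", tls.version(), tls.cipher()[0])
```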
● Cluster monitoring is used to collect the runtime status of the entire cluster.
● The scheduler queues the tasks submitted to the whole cluster and assigns the tasks
to the processing nodes according to node availability.
● The distributed scheduler for the cloud application has special characteristics that
can support cloud applications, such as scheduling the programs written in
MapReduce style.
● The runtime support system keeps the cloud cluster working properly with high
efficiency.
● Runtime support refers to the software needed by browser-initiated applications used by
thousands of cloud customers.
● This method adds or removes computing instances based on the current utilization
level of the allocated resources.
● For example, the demand-driven method automatically allocates two Xeon processors to a
user application if the user has been using one Xeon processor more than 60 percent of
the time for an extended period.
● Cloud service providers must learn from the managed service provider (MSP) model
and ensure that their customer’s applications and data are secure if they hope to
retain their customer base and competitiveness.
● Security ranked first as the greatest challenge or issue of cloud computing.
8. List the seven security issues one should discuss with a cloud computing vendor.
● Privileged user access
● Regulatory compliance
● Data location
● Data segregation
● Recovery
● Investigative support
● Long-term viability
● Firewalls, intrusion detection and prevention, integrity monitoring, and log inspection
can all be deployed as software on virtual machines to increase protection and
maintain compliance integrity of servers and applications as virtual resources move
from on-premises to public cloud environments.
● Integrity monitoring and log inspection software must be applied at the virtual
machine level.
● Identity and access management is a critical function for every organization, and a
fundamental expectation of SaaS customers is that the principle of least privilege is
granted to their data.
● Security standards define the processes, procedures, and practices necessary for
implementing a security program.
● These standards also apply to cloud related IT activities and include specific steps
that should be taken to ensure a secure environment is maintained that provides
privacy and security of confidential information in a cloud environment.
● Authentication statements
● Attribute statements
● Authorization decision statements
● A SAML protocol describes how certain SAML elements (including assertions) are
packaged within SAML request and response elements
● SAML protocol is a simple request–response protocol.
● The most important type of SAML protocol request is a query.
● Authentication query
● Attribute query
● Authorization decision query.
● OAuth (Open Authorization) is an open protocol, initiated by Blaine Cook and Chris
Messina, to allow secure API authorization in a simple, standardized method for
various types of web applications.
● OAuth is a method for publishing and interacting with protected data.
● OAuth allows users to grant access to their information, which is shared by the
service provider and consumers without sharing all of their identity.
● An OpenID is in the form of a unique URL and is authenticated by the entity hosting
the OpenID URL.
● SSL/TLS Transport Layer Security (TLS) and its predecessor, Secure Sockets Layer
(SSL), are cryptographically secure protocols designed to provide security and data
integrity for communications over TCP/IP.
● TLS and SSL encrypt the segments of network connections at the transport layer.
● TLS also supports a more secure bilateral connection mode whereby both ends of
the connection can be assured that they are communicating with whom they believe
they are connected. This is known as mutual authentication.
● Mutual authentication requires the TLS client side to also maintain a certificate.
-------------------------------------------------------------------------------------------------------------------------------
Hadoop – MapReduce – Virtual Box -- Google App Engine – Programming Environment for
Google App Engine – OpenStack – Federation in the Cloud – Four Levels of Federation –
Federated Services and Applications – Future of Federation.
-------------------------------------------------------------------------------------------------------------------------------
5.1 Hadoop
● The Hadoop implementation of MapReduce uses the Hadoop Distributed File System
(HDFS) as its underlying layer rather than GFS. Hadoop consists of two fundamental layers:
○ MapReduce engine
○ HDFS
● The MapReduce engine is the computation engine running on top of HDFS as its data
storage manager.
● HDFS is a distributed file system inspired by GFS that organizes files and stores their
data on a distributed computing system.
● To store a file in this architecture, HDFS splits the file into fixed-size blocks (e.g., 64 MB)
and stores them on workers (DataNodes).
● The NameNode (master) also manages the file system’s metadata and namespace.
● In such systems, the namespace is the area maintaining the metadata and metadata
refers to all the information stored by a file system that is needed for overall
management of all files.
● For example, NameNode in the metadata stores all information regarding the location of
input splits/blocks in all DataNodes.
● Each DataNode, usually one per node in a cluster, manages the storage attached to the
node. Each DataNode is responsible for storing and retrieving its file blocks.
● However, because HDFS is not a general purpose file system, as it only executes
specific types of applications, it does not need all the requirements of a general
distributed file system.
● One of the main aspects of HDFS is its fault tolerance characteristic. Since Hadoop is
designed to be deployed on low-cost hardware by default, a hardware failure in this
system is considered to be common rather than an exception.
● Hadoop considers the following issues to fulfill the reliability requirements of the file system:
○ Block replication: To reliably store data in HDFS, file blocks are replicated in this
system. The replication factor is set by the user and is three by default.
● HDFS also uses large, fixed-size blocks (e.g., 64 MB) to improve throughput:
■ The list of blocks per file will shrink as the size of individual blocks
increases.
■ Keeping large amounts of data sequentially within a block provides fast
streaming reads of data.
● HDFS Operation: The control flow of HDFS operations such as read and write can
properly highlight the roles of the NameNode and DataNodes in managing these operations:
○ To read a file in HDFS, a user sends an “open” request to the NameNode to get
the location of file blocks.
○ For each file block, the NameNode returns the address of a set of DataNodes
containing replica information for the requested file.
○ The number of addresses depends on the number of block replicas. Upon
receiving such information, the user calls the read function to connect to the
closest DataNode containing the first block of the file.
○ After the first block is streamed from the respective DataNode to the user, the
established connection is terminated and the same process is repeated for all
blocks of the requested file until the whole file is streamed to the user.
○ To write a file in HDFS, a user sends a “create” request to the NameNode to
create a new file in the file system namespace.
○ If the file does not exist, the NameNode notifies the user and allows him to start
writing data to the file by calling the write function.
○ The first block of the file is written to an internal queue termed the data queue
while a data streamer monitors its writing into a DataNode.
○ Since each file block needs to be replicated by a predefined factor, the data
streamer first sends a request to the NameNode to get a list of suitable
DataNodes to store replicas of the first block.
○ The streamer then stores the block in the first allocated DataNode.
○ Afterward, the block is forwarded to the second DataNode by the first DataNode.
○ The process continues until all allocated DataNodes receive a replica of the first
block from the previous DataNode.
○ Once this replication process is finalized, the same process starts for the second
block and continues until all blocks of the file are stored and replicated on the file
system.
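● The read flow above can be observed through the WebHDFS REST gateway, as in the following sketch: the NameNode redirects the client to a DataNode holding a replica, which then streams the data (host, port, path and user name are placeholders).
```python
# Sketch of the HDFS read flow over the WebHDFS REST gateway: the NameNode is
# asked to "open" a file and redirects the client to a DataNode holding a
# replica, which then streams the data. Host, port and path are placeholders.
import requests

NAMENODE = "http://namenode.example.com:9870"   # default WebHDFS port in Hadoop 3.x
path = "/user/alice/input/part-00000"

# Step 1: the NameNode answers with an HTTP redirect to a suitable DataNode.
r = requests.get(f"{NAMENODE}/webhdfs/v1{path}",
                 params={"op": "OPEN", "user.name": "alice"},
                 allow_redirects=False)
datanode_url = r.headers["Location"]

# Step 2: the client streams the file content directly from that DataNode.
data = requests.get(datanode_url).content
print(len(data), "bytes read from", datanode_url)
```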
5.2 MapReduce
● The topmost layer of Hadoop is the MapReduce engine that manages the data flow and
control flow of MapReduce jobs over distributed computing systems.
● Figure 5.1 shows the MapReduce engine architecture cooperating with HDFS.
● Similar to HDFS, the MapReduce engine also has a master/slave architecture consisting
of a single JobTracker as the master and a number of TaskTrackers as the slaves
(workers).
● The JobTracker manages the MapReduce job over a cluster and is responsible for
monitoring jobs and assigning tasks to TaskTrackers.
● The TaskTracker manages the execution of the map and/or reduce tasks on a single
computation node in the cluster.
● Each TaskTracker node has a number of simultaneous execution slots, each executing
either a map or a reduce task.
● Slots are defined as the number of simultaneous threads supported by CPUs of the
TaskTracker node.
● For example, a TaskTracker node with N CPUs, each supporting M threads, has M * N
simultaneous execution slots.
● It is worth noting that each data block is processed by one map task running on a single
slot.
● The data flow of running a MapReduce job in Hadoop involves three components:
○ User node
○ JobTracker
○ TaskTrackers
● The data flow starts by calling the runJob (conf) function inside a user program running
on the user node, in which conf is an object containing some tuning parameters for the
MapReduce framework and HDFS.
● The runJob (conf) function and conf are comparable to the MapReduce (Spec, &Results)
function and Spec in the first implementation of MapReduce by Google.
● Figure 5.2 depicts the data flow of running a MapReduce job in Hadoop.
● Job submission: Each job is submitted from a user node to the JobTracker node, which
might be situated in a different node within the cluster, through the following procedure:
○ A user node asks for a new job ID from the JobTracker and computes input file
splits.
○ The user node copies some resources, such as the job’s JAR file, configuration
file, and computed input splits, to the JobTracker’s file system.
○ The user node submits the job to the JobTracker by calling the submitJob()
function.
○ Task assignment: The JobTracker creates one map task for each input split
computed by the user node and assigns the map tasks to the execution slots of the
TaskTrackers.
■ The JobTracker considers the localization of the data when assigning the
map tasks to the TaskTrackers.
■ The JobTracker also creates reduce tasks and assigns them to the
TaskTrackers.
■ The number of reduce tasks is predetermined by the user, and there is
no locality consideration in assigning them.
○ Task execution: The control flow to execute a task (either map or reduce) starts
inside the TaskTracker by copying the job JAR file to its file system.
○ Instructions inside the job JAR file are executed after launching a Java Virtual
Machine (JVM) to run its map or reduce task.
○ Task running check: A task running check is performed through periodic
heartbeat messages sent to the JobTracker by the TaskTrackers.
○ Each heartbeat notifies the JobTracker that the sending TaskTracker is alive, and
whether the sending TaskTracker is ready to run a new task.
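● A classic word-count example in the Hadoop Streaming style is sketched below: map tasks emit (word, 1) pairs, the framework sorts by key, and reduce tasks sum the counts. For brevity the mapper and reducer are shown in one script; with Hadoop Streaming they would normally be separate programs fed through standard input and output.
```python
# Word-count in the MapReduce style: the mapper emits (word, 1) pairs, the
# framework sorts by key (simulated here with sorted()), and the reducer
# sums the counts per word.
import sys
from itertools import groupby

def mapper(lines):
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    # pairs must arrive sorted by key, as the MapReduce engine guarantees
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    intermediate = sorted(mapper(sys.stdin))   # stand-in for the shuffle/sort phase
    for word, total in reducer(intermediate):
        print(f"{word}\t{total}")
```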
5.3 Virtual Box
● VirtualBox is a cross-platform virtualization application. For one thing, it installs on
existing Intel or AMD-based computers, whether they are running Windows, Mac OS X,
Linux, or Oracle Solaris operating systems (OSes).
● Secondly, it extends the capabilities of the existing computer so that it can run multiple
OSes, inside multiple virtual machines, at the same time.
● As an example, the end user can run Windows and Linux on your Mac, run Windows
Server 2016 on your Linux server, run Linux on your Windows PC, and so on, all
alongside the existing applications.
● It can run everywhere from small embedded systems or desktop class machines all the
way up to datacenter deployments and even Cloud environments.
● VirtualBox was created by Innotek, which was acquired by Sun Microsystems; in 2010, Oracle acquired Sun and, with it, VirtualBox.
● VirtualBox runs on Windows, macOS, Linux, Solaris, and OpenSolaris hosts.
● The user can independently configure each VM and run it under a choice of software-
based virtualization or hardware assisted virtualization if the underlying host hardware
supports this.
● The host OS and guest OSs and applications can communicate with each other through
a number of mechanisms including a common clipboard and a virtualized network
facility.
● Guest VMs can also directly communicate with each other if configured to do so.
● Software-based virtualization was dropped starting with VirtualBox 6.1. In earlier versions, in the absence of hardware-assisted virtualization, VirtualBox adopted a standard software-based virtualization approach.
● This mode supports 32-bit guest OSes, which run in rings 0 and 3 of the Intel ring architecture.
○ The system reconfigures the guest OS code, which would normally run in ring 0,
to execute in ring 1 on the host hardware.
○ Because this code contains many privileged instructions which cannot run
natively in ring 1, VirtualBox employs a Code Scanning and Analysis Manager
(CSAM) to scan the ring 0 code recursively before its first execution to identify
problematic instructions and then calls the Patch Manager (PATM) to perform in-
situ patching.
○ This replaces the instruction with a jump to a VM-safe equivalent compiled code
fragment in hypervisor memory.
○ The guest user mode code, running in ring 3, generally runs directly on the host
hardware in ring 3.
● In both cases, VirtualBox uses CSAM and PATM to inspect and patch the offending
instructions whenever a fault occurs.
● VirtualBox also contains a dynamic recompiler, based on QEMU to recompile any real
mode or protected mode code entirely.
● Starting with version 6.1, VirtualBox supports only hardware-assisted virtualization.
● VirtualBox supports both Intel VT-X and AMD-V hardware assisted virtualization.
● Making use of these facilities, VirtualBox can run each guest VM in its own separate
address-space.
● The guest OS ring 0 code runs on the host at ring 0 in VMX non-root mode rather than in
ring 1.
● Before that, VirtualBox supported some guests (including 64-bit guests, SMP guests, and certain proprietary OSes) only on hosts with hardware-assisted virtualization.
● The system emulates hard disks in one of three disk image formats:
○ VDI: This format is the VirtualBox-specific VirtualBox Disk Image and stores data in files bearing a ".vdi" filename extension.
○ VMDK: This open format is used by VMware products and stores data in one or
more files bearing ".vmdk" filename extensions. A single virtual hard disk may
span several files.
○ VHD: This format is used by Windows Virtual PC and Hyper-V and it is the native
virtual disk format of the Microsoft Windows operating system. Data in this format
are stored in a single file bearing the ".vhd" filename extension.
● A VirtualBox virtual machine can, therefore, use disks previously created in VMware or
Microsoft Virtual PC, as well as its own native format.
● VirtualBox can also connect to iSCSI targets and to raw partitions on the host, using
either as virtual hard disks.
● For guest networking, VirtualBox virtualizes common Ethernet network interface cards (NICs) and presents them to the guest.
● A USB controller is emulated so that any USB devices attached to the host can be seen
in the guest.
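Although VirtualBox is normally driven through its GUI, the same operations can be scripted through the VBoxManage command-line front end. The sketch below assumes VirtualBox is installed and VBoxManage is on the PATH; the VM name, disk file, memory size, and disk size are illustrative choices only.

```python
import subprocess

def vbox(*args):
    """Run one VBoxManage command and fail loudly if it errors."""
    cmd = ['VBoxManage'] + list(args)
    print('+', ' '.join(cmd))
    subprocess.run(cmd, check=True)

VM = 'demo-vm'          # hypothetical VM name
DISK = 'demo-vm.vdi'    # hypothetical disk image in VirtualBox's VDI format

# Register a new VM, then give it memory and a NAT network adapter.
vbox('createvm', '--name', VM, '--register')
vbox('modifyvm', VM, '--memory', '2048', '--nic1', 'nat')

# Create a 10 GB VDI disk and attach it through a SATA controller.
vbox('createmedium', 'disk', '--filename', DISK, '--size', '10240')
vbox('storagectl', VM, '--name', 'SATA', '--add', 'sata')
vbox('storageattach', VM, '--storagectl', 'SATA',
     '--port', '0', '--device', '0', '--type', 'hdd', '--medium', DISK)

# Boot the VM without opening a GUI window.
vbox('startvm', VM, '--type', 'headless')
```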
● When the Oracle VM VirtualBox graphical user interface (GUI) is opened and a VM is started, at least three processes are running:
○ One of these is a background service responsible for bookkeeping, maintaining the state of all VMs, and providing communication between Oracle VM VirtualBox components.
● The Main API of Oracle VM VirtualBox exposes the entire feature set of the virtualization
engine.
● The Main API is made available to C++ clients through COM on Windows hosts or
XPCOM on other hosts. Bridges also exist for SOAP, Java and Python.
● Google has extensive experience in massive data processing that has led to new insights into data-center design and novel programming models that scale to incredible sizes.
● Google has hundreds of data centers and has installed more than 460,000 servers
worldwide.
● For example, 200 Google data centers are used at one time for a number of cloud
applications.
● Data items are stored as text, images, and video and are replicated to tolerate faults or failures.
● Google App Engine (GAE) offers a PaaS platform supporting various cloud and web applications.
● Google has pioneered cloud development by leveraging the large number of data
centers it operates.
● For example, Google pioneered cloud services in Gmail, Google Docs, and Google
Earth, among other applications.
● These applications can support a large number of users simultaneously with HA.
● Notable technology achievements include the Google File System (GFS), MapReduce,
BigTable, and Chubby.
● In 2008, Google announced the GAE web application platform which is becoming a
common platform for many small cloud service providers.
● GAE enables users to run their applications on a large number of data centers
associated with Google’s search engine operations.
● Figure 5.4 shows the major building blocks of the Google cloud platform which has been
used to deliver the cloud services highlighted earlier.
● Users can interact with Google applications via the web interface provided by each
application.
● Third-party application providers can use GAE to build cloud applications for providing
services.
● The applications all run in data centers under tight management by Google engineers. Inside each data center, there are thousands of servers forming different clusters.
[Figure 5.4: Google cloud infrastructure, showing clusters of nodes that run the GFS master and chunk servers, BigTable servers, the MapReduce job scheduler and slave nodes, Chubby, and application/user nodes.]
● The building blocks of Google’s cloud computing application include the Google File
System for storing large amounts of data, the MapReduce programming framework for
application developers, Chubby for distributed application lock services, and BigTable as
a storage service for accessing structural or semistructural data.
● With these building blocks, Google has built many cloud applications.
● Figure 5.4 shows the overall architecture of the Google cloud infrastructure.
● A typical cluster configuration can run the Google File System, MapReduce jobs, and BigTable servers for structured data.
● Extra services such as Chubby for distributed locks can also run in the clusters.
● GAE runs the user program on Google’s infrastructure. As it is a platform running third-
party programs, application developers now do not need to worry about the maintenance
of servers.
● At the time of this writing, GAE supports Python and Java programming environments.
The applications can run similar to web application containers.
● The frontend can be used as the dynamic web serving infrastructure which can provide
the full support of common technologies.
● Google offers essentially free GAE services to all Gmail account owners.
● Users can register for a GAE account or use their Gmail account name to sign up for the service.
● If a user exceeds the free quota, the page explains how to pay for the service. The user can then download the SDK and read the Python or Java guide to get started.
● Note that GAE accepts only a small set of supported programming languages (primarily Python and Java at the time of this writing).
● The platform does not provide any IaaS services, unlike Amazon, which offers IaaS and
PaaS.
● This model allows the user to deploy user-built applications on top of the cloud
infrastructure that are built using the programming languages and software tools
supported by the provider (e.g., Java, Python).
● Azure does this similarly for .NET. The user does not manage the underlying cloud
infrastructure.
● The cloud provider facilitates support of application development, testing, and operation
support on a well-defined service platform.
● Best-known GAE applications include the Google Search Engine, Google Docs, Google
Earth and Gmail.
● GAE provides several functional building blocks. One is a storage service for storing application-specific data in the Google infrastructure.
● The data can be persistently stored in the backend storage server while still providing
the facility for queries, sorting and even transactions similar to traditional database
systems.
● GAE also provides Google specific services, such as the Gmail account service. This
can eliminate the tedious work of building customized user management components in
web applications.
● Figure 5.5 summarizes some key features of GAE programming model for two
supported languages: Java and Python.
● A client environment that includes an Eclipse plug-in for Java allows developers to debug their GAE applications on a local machine.
● The Google Web Toolkit (GWT) is also available for Java web application developers. Developers can use Java, or any other language with a JVM-based interpreter or compiler, such as JavaScript or Ruby.
● Python is often used with frameworks such as Django and CherryPy, but Google also
supplies a built in webapp Python environment.
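As a taste of that built-in environment, here is a minimal request handler in the webapp2 flavor of the framework; the route and greeting text are arbitrary placeholders.

```python
import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        # Respond to HTTP GET on "/" with a plain-text greeting.
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Hello from the GAE webapp environment!')

# The WSGI application object that App Engine serves.
app = webapp2.WSGIApplication([('/', MainPage)], debug=True)
```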
● There are several powerful constructs for storing and accessing data.
● The data store is a NOSQL data management system for entities that can be, at most, 1
MB in size and are labeled by a set of schema-less properties.
[Figure: the GAE environment, with users and GAE applications (including the Data Store and Google corporate apps) connecting through a firewall to a secure intranet.]
● Queries can retrieve entities of a given kind filtered and sorted by the values of the
properties.
● Java offers the Java Data Objects (JDO) and Java Persistence API (JPA) interfaces, implemented by the open source DataNucleus Access Platform, while Python has a SQL-like query language called GQL.
● The data store is strongly consistent and uses optimistic concurrency control.
● The user application can execute multiple data store operations in a single transaction
which either all succeed or all fail together.
● The data store implements transactions across its distributed network using entity
groups.
● Entities of the same group are stored together for efficient execution of transactions.
● The user GAE application can assign entities to groups when the entities are created.
● The performance of the data store can be enhanced by in-memory caching using the
memcache, which can also be used independently of the data store.
● Recently, Google added the blobstore which is suitable for large files as its size limit is 2
GB.
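To make the datastore model concrete, the sketch below (written against the classic Python db API; the Greeting kind, property names, and cache key are illustrative assumptions) defines an entity kind, queries it with GQL, caches results in memcache, and wraps a write in a transaction.

```python
from google.appengine.ext import db
from google.appengine.api import memcache

class Greeting(db.Model):
    """Illustrative entity kind; properties are defined per application."""
    author = db.StringProperty()
    content = db.StringProperty(multiline=True)
    date = db.DateTimeProperty(auto_now_add=True)

def latest_greetings(author):
    # Check the in-memory cache first to avoid a datastore round trip.
    cache_key = 'greetings:%s' % author
    greetings = memcache.get(cache_key)
    if greetings is None:
        # GQL: a SQL-like query language over datastore entities.
        query = db.GqlQuery(
            "SELECT * FROM Greeting WHERE author = :1 "
            "ORDER BY date DESC LIMIT 10", author)
        greetings = list(query)
        memcache.set(cache_key, greetings, time=60)  # cache for 60 seconds
    return greetings

def store_greeting(author, content):
    # Datastore operations grouped in a transaction either all succeed
    # or all fail together (entities in one entity group).
    def txn():
        Greeting(author=author, content=content).put()
    db.run_in_transaction(txn)
```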
● The Google SDC (Secure Data Connector) can tunnel through the Internet and link an intranet to an external GAE application.
● The URL Fetch operation provides the ability for applications to fetch resources and
communicate with other hosts over the Internet using HTTP and HTTPS requests.
● There is a specialized mail mechanism to send e-mail from your GAE application.
● Applications can access resources on the Internet, such as web services or other data,
using GAE’s URL fetch service.
● The URL fetch service retrieves web resources using the same high-speed Google
infrastructure that retrieves web pages for many other Google products.
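A minimal sketch of the URL Fetch service follows; the target URL and timeout are placeholders.

```python
from google.appengine.api import urlfetch

def fetch_page(url='https://www.example.com/'):
    # Issue an outbound HTTP GET through GAE's URL Fetch service.
    result = urlfetch.fetch(url, deadline=10)  # deadline in seconds
    if result.status_code == 200:
        return result.content                  # response body
    raise RuntimeError('Fetch failed with HTTP %d' % result.status_code)
```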
● There are dozens of Google “corporate” facilities including maps, sites, groups,
calendar, docs, and YouTube, among others.
● These support the Google Data API which can be used inside GAE.
● An application can use Google Accounts for user authentication. Google Accounts
handles user account creation and sign-in, and a user that already has a Google
account (such as a Gmail account) can use that account with your app.
● GAE provides the ability to manipulate image data using a dedicated Images service
which can resize, rotate, flip, crop and enhance images. An application can perform
tasks outside of responding to web requests.
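A brief sketch of the Google Accounts and Images services is given below; the thumbnail size is arbitrary, and the surrounding request handler is omitted.

```python
from google.appengine.api import images, users

def greet_current_user():
    # Google Accounts handles account creation and sign-in; the
    # application only consumes the result.
    user = users.get_current_user()
    if user is None:
        return 'Please sign in: %s' % users.create_login_url('/')
    return 'Hello, %s' % user.nickname()

def make_thumbnail(image_bytes):
    # The Images service can resize, rotate, flip, crop, and enhance images.
    return images.resize(image_bytes, width=120, height=120)
```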
● GFS was built primarily as the fundamental storage service for Google’s search engine.
● As the size of the web data that was crawled and saved was quite substantial, Google
needed a distributed file system to redundantly store massive amounts of data on cheap
and unreliable computers.
● In addition, GFS was designed for Google applications and Google applications were
built for GFS.
● In traditional file system design, such a philosophy is not attractive, as there should be a
clear interface between applications and the file system such as a POSIX interface.
● GFS typically will hold a large number of huge files, each 100 MB or larger, with files that
are multiple GB in size quite common. Thus, Google has chosen its file data block size
to be 64 MB instead of the 4 KB in typical traditional file systems.
● Files are typically written once, and the write operations are often the appending data
blocks to the end of files.
● BigTable was designed to provide a service for storing and retrieving structured and
semi structured data.
● BigTable applications include storage of web pages, per-user data, and geographic
locations.
● The scale of such data is incredibly large. There will be billions of URLs, and each URL
can have many versions, with an average page size of about 20 KB per version.
● There are hundreds of millions of users and there will be thousands of queries per
second.
● The same scale occurs in the geographic data, which might consume more than 100 TB
of disk space.
● It is not practical to manage such a large volume of structured or semi-structured data using a commercial database system.
● This is one reason to rebuild the data management system and the resultant system can
be applied across many projects for a low incremental cost.
● The other motivation for rebuilding the data management system is performance.
● Low level storage optimizations help increase performance significantly which is much
harder to do when running on top of a traditional database layer.
● The design and implementation of the BigTable system had the following goals.
● BigTable can be viewed as a distributed multilevel map. It provides a fault-tolerant and persistent database offered as a storage service.
● The BigTable system is scalable, which means it can span thousands of servers and handle terabytes of in-memory data, petabytes of disk-based data, millions of reads/writes per second, and efficient scans.
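The "distributed multilevel map" view can be illustrated with a toy, single-machine model of BigTable's data model, in which each value is indexed by a row key, a column key, and a timestamp. This is only a conceptual sketch, not the real system.

```python
import time
from collections import defaultdict

class ToyBigTable(object):
    """Conceptual sketch of BigTable's multilevel map:
    (row key, column key, timestamp) -> value."""

    def __init__(self):
        # row -> column -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts=None):
        self._rows[row][column][ts or time.time()] = value

    def get(self, row, column):
        # Return the most recent version of the cell, if any.
        versions = self._rows[row][column]
        return versions[max(versions)] if versions else None

table = ToyBigTable()
table.put('com.example.www', 'contents:html', '<html>...</html>')
print(table.get('com.example.www', 'contents:html'))
```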
● Chubby can also store small files; it provides a simple namespace organized as a file system tree.
● The files stored in Chubby are quite small compared to the huge files in GFS.
5.6 OpenStack
● The OpenStack project is an open source cloud computing platform for all types of
clouds, which aims to be simple to implement, massively scalable and feature rich.
● Developers and cloud computing technologists from around the world create the
OpenStack project.
● OpenStack provides an Infrastructure-as-a-Service (IaaS) solution through a set of interrelated services. Each service offers an application programming interface (API) that facilitates this integration.
● The OpenStack system consists of several key services that are separately installed.
● These services work together depending on your cloud needs and include the Compute,
Identity, Networking, Image, Block Storage, Object Storage, Telemetry, Orchestration,
and Database services.
● The administrator can install any of these projects separately and configure them
standalone or as connected entities.
● To design, deploy, and configure OpenStack, administrators must understand the logical
architecture.
● OpenStack consists of several independent parts, named the OpenStack services. All
services authenticate through a common Identity service.
● Individual services interact with each other through public APIs, except where privileged
administrator commands are necessary.
● All services have at least one API process, which listens for API requests, preprocesses
them and passes them on to other parts of the service.
● With the exception of the Identity service, the actual work is done by distinct processes.
● For communication between the processes of one service, an AMQP message broker is
used.
● When deploying and configuring the OpenStack cloud, administrator can choose among
several message broker and database solutions, such as RabbitMQ, MySQL, MariaDB,
and SQLite.
● Users can access OpenStack via the web-based user interface implemented by the
Horizon Dashboard, via command-line clients and by issuing API requests through tools
like browser plug-ins or curl.
● For applications, several SDKs are available. Ultimately, all these access methods issue
REST API calls to the various OpenStack services.
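As an illustration of that REST path, the sketch below requests a token from the Identity service (Keystone) using the Identity API v3 and the requests library; the endpoint URL, user name, project, and password are placeholder assumptions for a hypothetical deployment.

```python
import requests

KEYSTONE = 'http://controller:5000/v3'   # placeholder Identity endpoint

def get_token(username, password, project):
    # Identity API v3 password authentication, scoped to a project.
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": username,
                        "domain": {"id": "default"},
                        "password": password,
                    }
                },
            },
            "scope": {
                "project": {"name": project, "domain": {"id": "default"}}
            },
        }
    }
    resp = requests.post(KEYSTONE + '/auth/tokens', json=body)
    resp.raise_for_status()
    # Keystone returns the token in the X-Subject-Token response header.
    return resp.headers['X-Subject-Token']

# Example (placeholder credentials): token = get_token('admin', 'secret', 'admin')
```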
● The controller node runs the Identity service, Image service, Placement service,
management portions of Compute, management portion of Networking, various
Networking agents, and the Dashboard.
○ Optionally, the controller node runs portions of the Block Storage, Object
Storage, Orchestration, and Telemetry services.
● The compute node runs the hypervisor portion of Compute that operates instances. By
default, Compute uses the KVM hypervisor.
● The compute node also runs a Networking service agent that connects instances to
virtual networks and provides firewalling services to instances via security groups.
● Administrator can deploy more than one compute node. Each node requires a minimum
of two network interfaces.
● The optional Block Storage node contains the disks that the Block Storage and Shared
File System services provision for instances.
● For simplicity, service traffic between compute nodes and this node uses the
management network.
● Administrator can deploy more than one block storage node. Each node requires a
minimum of one network interface.
● The optional Object Storage node contains the disks that the Object Storage service
uses for storing accounts, containers, and objects.
● For simplicity, service traffic between compute nodes and this node uses the
management network.
● This service requires two nodes. Each node requires a minimum of one network
interface. Administrator can deploy more than two object storage nodes.
● The provider networks option deploys the OpenStack Networking service in the simplest
way possible with primarily layer 2 (bridging/switching) services and VLAN segmentation
of networks.
● A key opportunity for the emerging cloud industry will be in defining a federated cloud
ecosystem by connecting multiple cloud computing providers using a common standard.
● A notable research project conducted by Microsoft, called the Geneva Framework, focuses on issues involved in cloud federation.
● Geneva has been described as a claims-based access platform and is said to help simplify access to applications and other systems.
● The concept allows multiple providers to interact seamlessly with one another, and it enables developers to incorporate various authentication models that will work with any corporate identity system, including Active Directory, LDAPv3-based directories, application-specific databases, and newer user-centric identity models such as LiveID, OpenID, and InfoCard systems.
● Federation in cloud is implemented by the use of Internet Engineering Task Force (IETF)
standard Extensible Messaging and Presence Protocol (XMPP) and inter domain
federation using the Jabber Extensible Communications Platform (Jabber XCP).
● This protocol is already used by a wide range of existing services offered by providers as diverse as Google Talk, LiveJournal, Earthlink, Facebook, ooVoo, Meebo, Twitter, the U.S. Marine Corps, the Defense Information Systems Agency (DISA), the U.S. Joint Forces Command (USJFCOM), and the National Weather Service.
● Jabber XCP is a highly programmable platform, which makes it ideal for adding
presence and messaging to existing applications or services and for building next-
generation, presence based solutions.
● Over the last few years there has been a controversy brewing in web services
architectures.
● Cloud services are being talked up as a fundamental shift in web architecture that
promises to move us from interconnected silos to a collaborative network of services
whose sum is greater than its parts.
● The problem is that the protocols powering current cloud services, SOAP (Simple Object
Access Protocol) and a few other assorted HTTP based protocols, are all one-way
information exchanges.
● As a result, such cloud services are not real time, do not scale well, and often cannot traverse firewalls.
● Many believe that those barriers can be overcome by XMPP (also called Jabber) as the
protocol that will fuel the Software as a Service (SaaS) models of tomorrow.
● Google, Apple, AOL, IBM, LiveJournal, and Jive have all incorporated this protocol into their cloud-based solutions in the last few years.
● Since the beginning of the Internet era, if the user wanted to synchronize services
between two servers, the most common solution was to have the client “ping” the host at
regular intervals, which is known as polling.
● XMPP’s profile has been steadily gaining since its inception as the protocol behind the
open source instant messenger (IM) server jabberd in 1998.
● Robust security is supported via Simple Authentication and Security Layer (SASL) and
Transport Layer Security (TLS).
● XMPP is a good fit for cloud computing because it allows for easy two-way communication.
● XMPP eliminates the need for polling and focuses on rich publish-subscribe functionality.
● It is XML-based and easily extensible, perfect for both new IM features and custom cloud services.
● It is efficient and has been proven to scale to millions of concurrent users on a single service (such as Google's GTalk), and it has a built-in worldwide federation model.
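To give a feel for XMPP's two-way, event-driven style, here is a minimal echo client sketch using the SleekXMPP library as one example; the JID and password are placeholders, and any XMPP client library with similar event handlers would do.

```python
import sleekxmpp

class EchoBot(sleekxmpp.ClientXMPP):
    """Minimal XMPP client: answers every chat message it receives."""

    def __init__(self, jid, password):
        super(EchoBot, self).__init__(jid, password)
        self.add_event_handler('session_start', self.on_start)
        self.add_event_handler('message', self.on_message)

    def on_start(self, event):
        self.send_presence()     # announce availability
        self.get_roster()        # fetch the contact list (roster)

    def on_message(self, msg):
        if msg['type'] in ('chat', 'normal'):
            msg.reply('Echo: %s' % msg['body']).send()

if __name__ == '__main__':
    bot = EchoBot('user@example.com', 'password')  # placeholder credentials
    if bot.connect():
        bot.process(block=True)  # run the event loop
```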
● Of course, XMPP is not the only pub-sub enabler getting a lot of interest from web
application developers.
● An Amazon EC2-backed server can run Jetty and Cometd from Dojo.
● Unlike XMPP, Comet is based on HTTP and in conjunction with the Bayeux Protocol,
uses JSON to exchange data.
● XMPP and XCP already enjoy wide market penetration and extensive use for federation in the cloud, and XMPP is the dominant open protocol in that space.
● The ability to exchange data used for presence, messages, voice, video, files, notifications, and so on with people, devices, and applications gains more power when that data can be shared across organizations and with other service providers.
● Federation differs from peering, which requires a prior agreement between parties
before a server-to-server (S2S) link can be established.
● In the past, peering was more common among traditional telecommunications providers
(because of the high cost of transferring voice traffic).
● In the brave new Internet world, federation has become a de facto standard for most
email systems because they are federated dynamically through Domain Name System
(DNS) settings and server configurations.
● Federation is the ability for two XMPP servers in different domains to exchange XML
stanzas.
● According to the XEP-0238: XMPP Protocol Flows for Inter-Domain Federation, there
are at least four basic types of federation:
● Permissive federation
○ This type of federation occurs when a server accepts a connection from a peer without verifying the peer's identity using DNS lookups or certificate checks.
○ The lack of verification leaves the connection open to domain spoofing.
● Verified federation
○ This type of federation occurs when a server accepts a connection from a peer
after the identity of the peer has been verified.
○ It uses information obtained via DNS and by means of domain-specific keys
exchanged beforehand.
○ The connection is not encrypted, and the use of identity verification effectively
prevents domain spoofing.
○ To make this work, federation requires proper DNS setup and that is still subject
to DNS poisoning attacks.
○ Verified federation has been the default service policy on the open XMPP network since the release of the open-source jabberd 1.2 server.
● Encrypted federation
○ In this mode, a server accepts a connection from a peer if and only if the peer
supports Transport Layer Security (TLS) as defined for XMPP in Request for
Comments (RFC) 3920.
○ The peer must present a digital certificate.
○ The certificate may be self signed, but this prevents using mutual authentication.
○ If this is the case, both parties proceed to weakly verify identity using Server
Dialback.
○ XEP-0220 defines the Server Dialback protocol, which is used between XMPP
servers to provide identity verification.
○ Server Dialback uses the DNS as the basis for verifying identity
○ The basic approach is that when a receiving server receives a server-to-server
connection request from an originating server, it does not accept the request until
it has verified a key with an authoritative server for the domain asserted by the
originating server.
○ Although Server Dialback does not provide strong authentication or trusted
federation, and although it is subject to DNS poisoning attacks, it has effectively
prevented most instances of address spoofing on the XMPP network since its
release in 2000.
○ This results in an encrypted connection with weak identity verification.
● Trusted federation
○ In this federation, a server accepts a connection from a peer only under the
stipulation that the peer supports TLS and the peer can present a digital
certificate issued by a root certification authority (CA) that is trusted by the
authenticating server.
○ The list of trusted root CAs may be determined by one or more factors, such as
the operating system, XMPP server software or local service policy.
○ In trusted federation, the use of digital certificates results not only in channel encryption but also in strong authentication.
○ The use of trusted domain certificates effectively prevents DNS poisoning attacks
but makes federation more difficult, since such certificates have traditionally not
been easy to obtain.
● Clouds typically consist of all the users, devices, services, and applications connected to
the network.
● In order to fully leverage the capabilities of this cloud structure, a participant needs the
ability to find other entities of interest.
● Such entities might be end users, multiuser chat rooms, real-time content feeds, user
directories, data relays, messaging gateways, etc.
● XMPP uses service discovery (as defined in XEP-0030) to find the aforementioned
entities.
● The discovery protocol enables any network participant to query another entity regarding
its identity, capabilities and associated entities.
● When a participant connects to the network, it queries the authoritative server for its
particular domain about the entities associated with that authoritative server.
● In response to a service discovery query, the authoritative server informs the inquirer
about services hosted there and may also detail services that are available but hosted
elsewhere.
● XMPP includes a method for maintaining personal lists of other entities, known as roster
technology, which enables end users to keep track of various types of entities.
● Usually, these lists are composed of other entities the users are interested in or interact with regularly.
● Most XMPP deployments include custom directories so that internal users of those
services can easily find what they are looking for.
● These mechanisms have provided a stable, secure foundation for growth of the XMPP
network and similar real time technologies.
1. What is Hadoop?
● HDFS is a Hadoop distributed file system inspired by GFS that organizes files and
stores their data on a distributed computing system.
● HDFS has a master/slave architecture containing a single NameNode as the master
and a number of DataNodes as workers (slaves).
● To store a file in this architecture, HDFS splits the file into fixed-size blocks (e.g., 64
MB) and stores them on workers (DataNodes).
● The mapping of blocks to DataNodes is determined by the NameNode.
● One of the main aspects of HDFS is its fault tolerance characteristic. Since Hadoop
is designed to be deployed on low-cost hardware by default, a hardware failure in
this system is considered to be common rather than an exception.
5. List the mechanisms Hadoop uses to fulfill the reliability requirements of the file system.
● Block replication
● Replica placement
● Heartbeat and Block report messages
● The list of blocks per file will shrink as the size of individual blocks increases, and by
keeping large amounts of data sequentially within a block, HDFS provides fast
streaming reads of data.
8. Define MapReduce.
● The topmost layer of Hadoop is the MapReduce engine that manages the data flow
and control flow of MapReduce jobs over distributed computing systems.
● Similar to HDFS, the MapReduce engine also has a master/slave architecture
consisting of a single JobTracker as the master and a number of TaskTrackers as
the slaves (workers).
● The JobTracker manages the MapReduce job over a cluster and is responsible for
monitoring jobs and assigning tasks to TaskTrackers.
● The TaskTracker manages the execution of the map and/or reduce tasks on a single
computation node in the cluster.
● a user node
● a JobTracker
● TaskTrackers
● VDI: This format is the VirtualBox-specific VirtualBox Disk Image and stores data in
files bearing a “.vdi”.
● VMDK: This open format is used by VMware products and stores data in one or
more files bearing “.vmdk” filename extensions.
● VHD: This format is used by Windows Virtual PC and Hyper-V, and is the native
virtual disk format of the Microsoft Windows operating system.
● Google App Engine (GAE) offers a PaaS platform supporting various cloud and web applications.
● Datastore
● Application runtime environment
● Software development kit (SDK)
● Administration console
● GAE web service infrastructure
● Well-known GAE applications include the Google Search Engine, Google Docs,
Google Earth, and Gmail.
● These applications can support large numbers of users simultaneously.
● Users can interact with Google applications via the web interface provided by each
application.
● Third-party application providers can use GAE to build cloud applications for
providing services.
17. Mention the goals for design and implementation of the BigTable system.
● The OpenStack project is an open source cloud computing platform for all types of
clouds, which aims to be simple to implement, massively scalable, and feature rich.
● Developers and cloud computing technologists from around the world create the
OpenStack project.
● OpenStack provides an Infrastructure-as-a-Service (IaaS) solution through a set of
interrelated services.
● The OpenStack system consists of several key services that are separately installed.
● Compute, Identity, Networking, Image, Block Storage, Object Storage, Telemetry,
Orchestration and Database services.
● Permissive federation
● Verified federation
● Encrypted federation
● Trusted federation
Seventh Semester
(Regulation 2017)
1. Define Cloud.
2. List the components of cloud model.
3. Mention the four characteristics to identify the service.
4. Differentiate between Full virtualization and Paravirtualization.
5. What are advantages of cloud storage?
6. What is Hardware as a Service?
7. What is the purpose of runtime support service named cluster monitoring?
8. Compare over provisioning and under provisioning?
9. Illustrate the architecture of VirtualBox.
10. List the merits of XMPP.
PART B – (5 X 16 = 80 marks)
13. (a) (i) Explain about layered architectural design of cloud computing. (8)
(ii) Explain about cloud deployment models. (8)
Or
(b) Explain about major architectural design challenges in cloud. (16)
14. (a) (i) Explain about inter cloud resource management with neat diagram. (8)
(ii) Explain about resource provisioning methods. (8)
Or
(b) (i) Explain about Identity Access Management. (8)
(ii) Explain about Virtual Machine Security. (8)
15. (a) Explain about HDFS and MapReduce in Hadoop framework. (16)
Or
(b) (i) Explain about Programming environment for Google AppEngine (8)
(ii) Explain about the levels of federation. (8)
Seventh Semester
(Regulation 2017)
PART B – (5 X 16 = 80 marks)
11. (a) Explain about the principles of Parallel and Distributed Computing. (16)
Or
(b) Explain about characteristics of cloud computing. (16)
13. (a) Explain about NIST reference architecture with neat diagram. (16)
Or
(b) (i) Explain about cloud service model. (8)
(ii) Explain about Storage-as-a-Service. (8)
14. (a) (i) Explain about global exchange of cloud resources (8).
(ii) Explain about runtime support services in inter cloud management. (8)
Or
(b) Explain about cloud security and its challenges. Elaborate some standards
specific to cloud security. (16)