
Cloud computing:

Chapter 1: Cloud computing Fundamentals:

Definition of cloud computing:


1) According to the definition given by Armbrust:
Cloud computing refers to both the applications delivered as
services over the Internet and the hardware and system
software in the datacenters that provide those services.

2) According to the definition proposed by the U.S.


National Institute of Standards and Technology
(NIST):
Cloud computing is a model for enabling ubiquitous,
convenient, on-demand network access to a shared
pool of configurable computing resources (e.g.,
networks, servers, storage, applications, and services)
that can be rapidly provisioned and released with
minimal management effort or service provider
interaction.
3) Rajkumar Buyya defined cloud computing based on
the nature of utility computing
A cloud is a type of parallel and distributed system
consisting of a collection of interconnected and
virtualized computers that are dynamically
provisioned and presented as one or more unified
computing resources based on service-level
agreements established through negotiation between
the service provider and consumers.

Q) What is the evolution of cloud computing?

Distributed Systems:
It is a composition of multiple independent systems but all of
them are depicted as a single entity to the users. The purpose
of distributed systems is to share resources and also use them
effectively and efficiently. Distributed systems possess
characteristics such as scalability, concurrency, continuous
availability, heterogeneity, and independence in failures. But
the main problem with this system was that all the systems
were required to be present at the same geographical location.

Mainframe computing:
Mainframes which first came into existence in 1951 are
highly powerful and reliable computing machines. These are
responsible for handling large data such as massive
input-output operations. Even today these are used for bulk
processing tasks such as online transactions etc. These
systems have almost no downtime with high fault tolerance.
After distributed computing, these increased the processing
capabilities of the system.
Cluster computing:
In the 1980s, cluster computing came as an alternative to
mainframe computing. Each machine in the cluster was
connected to each other by a network with high bandwidth.
These were way cheaper than those mainframe systems.
These were equally capable of high computations. Also, new
nodes could easily be added to the cluster if it was required.

Grid computing:
In the 1990s, the concept of grid computing was introduced. It
means that different systems were placed at entirely different
geographical locations and these all were connected via the
internet. These systems belonged to different organizations
and thus the grid consisted of heterogeneous nodes. Although
it solved some problems, new problems emerged as the
distance between the nodes increased.

Virtualization:
It was introduced nearly 40 years ago. It refers to the process
of creating a virtual layer over the hardware which allows the
user to run multiple instances simultaneously on the
hardware. It is a key technology used in cloud computing. It is
the base on which major cloud computing services such as
Amazon EC2, VMware vCloud, etc., are built.

Web 2.0:
It is the interface through which the cloud computing services
interact with the clients. It is because of Web 2.0 that we have
interactive and dynamic web pages. It also increases
flexibility among web pages. Popular examples of Web 2.0
include Google Maps and Facebook.
Service orientation:
It acts as a reference model for cloud computing. It supports
low-cost, flexible, and evolvable applications. Two important
concepts were introduced in this computing model.

Utility computing:
It is a computing model that defines service
provisioning techniques for services such as compute
services along with other major services such as
storage, infrastructure, etc., which are provisioned on a
pay-per-use basis.

1. What is virtualisation ?

Virtualization is technology that you can use to create virtual


representations of servers, storage, networks, and other
physical machines. Virtual software mimics the functions of
physical hardware to run multiple virtual machines machin
simultaneously on a single physical machine.
One of the main cost-effective, hardware-reducing, and
energy-saving techniques used by cloud providers is
virtualization. Virtualization allows sharing of a single
physical instance of a resource or an application among
multiple customers and organizations at one time.
A hypervisor is software that creates and runs virtual machines.
The term virtualization is often synonymous with hardware
virtualization, which plays a fundamental role in efficiently
delivering Infrastructure-as-a-Service (IaaS) solutions for
cloud computing. Moreover, virtualization technologies
provide a virtual environment not only for executing
applications but also for storage, memory, and networking.

Advantages of Virtualization
More flexible and efficient allocation of resources.
Enhance development productivity.
It lowers the cost of IT infrastructure.
Remote access and rapid scalability.
High availability and disaster recovery.
Pay-per-use of the IT infrastructure on demand.
Enables running multiple operating systems.
All VMs work independently of one another.

Virtualization involves the creation of a virtual version of
something physical, including virtual computer hardware,
virtual storage devices and virtual computer networks.

Software called a hypervisor is used for hardware
virtualization. With the help of a hypervisor, virtual machine
software is incorporated into the server hardware component.
The role of the hypervisor is to control the physical hardware
that is shared between the client and the provider. Hardware
virtualization is done using a Virtual Machine Monitor
(VMM) to abstract away the physical hardware. There are
several processor extensions that help to speed up
virtualization activities and increase hypervisor performance.
When this virtualization is done for the server platform, it is
called server virtualization.
The hypervisor creates an abstraction layer between the
software and the hardware in use. After a hypervisor is
installed, workloads run on virtual representations of the
hardware, such as virtual processors, rather than using the
physical processors directly. There are several popular
hypervisors, including ESXi-based VMware vSphere
and Hyper-V.

FIGURE 1.14 Hardware Virtualization
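To make the hypervisor's role concrete, here is a minimal sketch (an addition for illustration, not part of the original notes) that queries a local hypervisor through the libvirt management API. It assumes a Linux host running QEMU/KVM with the libvirt daemon and the libvirt-python package installed; the connection URI is the usual local one.

# Minimal sketch: querying a local hypervisor via libvirt.
# Assumes a QEMU/KVM host with the libvirt daemon running and
# the libvirt-python package installed (pip install libvirt-python).
import libvirt

# Open a read-only connection to the local system hypervisor.
conn = libvirt.openReadOnly("qemu:///system")
if conn is None:
    raise SystemExit("Failed to connect to the hypervisor")

# Each libvirt "domain" is a virtual machine managed by the hypervisor.
for dom in conn.listAllDomains():
    state, _reason = dom.state()
    running = (state == libvirt.VIR_DOMAIN_RUNNING)
    print(f"VM {dom.name()}: {'running' if running else 'not running'}")

conn.close()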

Virtual machine instances are typically represented by one or
more files, which can be easily transported between physical
systems. In addition, they are also self-contained, since they
have no dependencies for their use other than the virtual
machine manager.

A Process virtual machine, sometimes known as an


application virtual machine, runs inside a host OS as
a common application, supporting a single process. It is
created when the process is started and destroyed when the
process ends. Its aim is to provide a platform-independent
programming environment which abstracts away the details
of the underlying hardware or operating system and allows a
program to run in the same way on any platform. For
example, the Wine software on Linux helps run Windows
applications.
A process VM provides a high-level abstraction, that of a
high-level programming language (compared with the
low-level ISA abstraction of a system VM). Process VMs are
implemented by means of an interpreter; just-in-time
compilation is used to achieve performance comparable to
that of compiled programming languages.

This form of VM became popular with the introduction of
the Java programming language and the Java Virtual
Machine. The .NET Framework, which runs on a VM called
the Common Language Runtime, is another example.
FIGURE 1.15 Process virtual machine design
(Reference: "Mastering Cloud Computing: Foundations and
Applications Programming" by Rajkumar Buyya)
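The notes above use the JVM and the CLR as examples of process VMs. As an analogous illustration (added here, not from the reference), CPython itself behaves as a process VM: source code is compiled to platform-independent bytecode that an interpreter executes, which is what lets the same program run unchanged on any host operating system.

# CPython as a process VM: a function is compiled to platform-independent
# bytecode, which the interpreter (the VM) executes on any operating system.
import dis

def add_tax(price, rate=0.18):
    # Trivial function whose bytecode we inspect.
    return price * (1 + rate)

# Show the bytecode instructions the CPython VM will interpret.
dis.dis(add_tax)

# The same bytecode semantics apply on Windows, Linux, or macOS.
print(add_tax(100))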

Q) Properties and characteristics of cloud computing?


1.0.1 Characteristics and benefits
As cloud computing services mature both commercially and
technologically, it becomes easier for companies to maximize
their potential benefits. However, it is equally important to
know what cloud computing is and what it does.
FIGURE 1.7 Features of Cloud Computing
Following are the characteristics of Cloud Computing:
1. Resource Pooling
This means that the cloud provider uses a multi-tenant model
to deliver computing resources to various customers. Physical
and virtual resources are dynamically allocated and
reassigned, depending on customer demand. In general, the
customer has no control or information about the exact
location of the resources provided, but may be able to choose
a location at a higher level of abstraction.
2. On-Demand Self-Service
This is one of the main and useful advantages of
Cloud Computing as the user can track server
uptimes, capability and network storage on an
ongoing basis. The user can also monitor
computing functionalities with this feature.
3. Easy Maintenance
The servers are easily managed and downtime is low; there
is almost no downtime except in some cases. Cloud
computing offers frequent updates that continually enhance
it. The updates are more system friendly and work with
patched bugs faster than the older versions.
4. Large Network Access
The user may use a device and an Internet
connection to access the cloud data or upload it to
the cloud from anywhere. Such capabilities can
be accessed across the network and through the
internet.
5. Availability
The cloud capabilities can be changed and expanded
according to usage. This allows the consumer to buy
additional cloud storage for a very small price, if necessary.
6. Automatic System
Cloud computing automatically analyzes the data required
and supports metering capabilities at some level of service.
It is possible to track, manage and report usage. This
provides accountability for both the host and the customer.
7. Economical
It is a one-off investment, since the company (host) buys
the storage once and can make it available to many
companies, saving them from monthly or annual costs. Only
the amount spent on basic maintenance and a few additional
costs has to be paid, and these are much smaller.
8. Security
Cloud Security is one of cloud computing's best
features. It provides a snapshot of the data stored
so that even if one of the servers is damaged, the
data cannot get lost. The information is stored on
the storage devices, which no other person can
hack or use. The service of storage is fast and
reliable.
9. Pay as you go
Users only have to pay for the service or the space they use
in cloud computing. There are no hidden or additional
charges to be paid. The service is economical, and some
space is often allocated free of charge.
10. Measured Service
The cloud computing resources that the company uses are
monitored and recorded. This resource usage is analyzed
using charge-per-use capabilities, which means that resource
use can be measured and reported by the service provider,
for example per virtual server instance running in the cloud.
You pay based on actual consumption (a small metering
sketch follows this list).
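To illustrate the measured-service and pay-as-you-go ideas above, the small sketch below computes a bill from metered usage. The unit rates and usage figures are invented for the example only; they do not come from these notes or from any real provider's price list.

# Illustrative pay-per-use billing from metered usage.
# All rates and usage numbers below are made up for the example.
UNIT_RATES = {
    "vm_hours": 0.05,          # price per VM-hour
    "storage_gb_month": 0.02,  # price per GB stored for a month
    "egress_gb": 0.09,         # price per GB of outbound traffic
}

def monthly_bill(usage):
    # Multiply each metered quantity by its unit rate and sum the result.
    return sum(UNIT_RATES[item] * amount for item, amount in usage.items())

metered_usage = {"vm_hours": 720, "storage_gb_month": 50, "egress_gb": 10}
print(f"Estimated bill: ${monthly_bill(metered_usage):.2f}")  # 37.90 for this usage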
Q) Challenges and risks in cloud computing?
Everything has advantages and challenges. We have seen
many cloud features, and it is now time to identify the
challenges and risks of cloud computing, along with tips and
techniques for dealing with them. Nearly all companies use
cloud computing because they need to store data, and they
generate and store a tremendous amount of it. Thus, they
face many security issues. Companies need to establish
processes to streamline and optimize cloud computing
management.
This is a list of cloud computing threats and challenges:
1. Security & Privacy
2. Interoperability & Portability
3. Reliable and Flexible
4. Cost
5. Downtime
6. Lack of Resources
7. Dealing with Multi-Cloud Environments
8. Cloud Migration
9. Vendor Lock-In
10. Privacy and Legal Issues

1. Security and Privacy of Cloud


The data stored in the cloud must be secure and
confidential, and clients depend on the cloud provider for
this. In other words, the cloud provider must take the
security measures necessary to secure customer data.
Security is also the customer's responsibility: customers
must choose strong passwords, avoid sharing passwords with
others, and update their passwords regularly. If the data are
outside the firewall, certain problems may occur that the
cloud provider must address. Hacking and malware are also
among the biggest problems, because they can affect many
customers; they can lead to data loss, disrupt the encrypted
file system, and cause several other issues.
2. Interoperability and Portability
Migration services into and out of the cloud should be
provided to the customer. No lock-in period should be
imposed, as this can hamper customers. The cloud should
also be able to work with on-premises facilities. Remote
access is another requirement: the customer should be able
to access the cloud from anywhere.
3. Reliable and Flexible
Reliability and flexibility are indeed a difficult task for
cloud customers: leakage of the data provided to the cloud
must be prevented and the provider must earn the customer's
trust. To overcome this challenge, third-party services should
be monitored and their performance, robustness, and
dependability supervised.
4. Cost
Cloud computing is affordable, but adapting the cloud to
customer demand can sometimes be expensive. This can
hinder small businesses, since altering the cloud as demand
changes can cost more. Furthermore, it is sometimes costly
to transfer data from the cloud back to on-premises systems.
5. Downtime
Downtime is the most common cloud computing challenge,
as no cloud provider guarantees a platform entirely free from
downtime. The internet connection also plays an important
role: if a company has an unreliable internet connection, it
will face downtime.
6. Lack of resources
The cloud industry also faces a lack of resources
and expertise, with many businesses hoping to
overcome it by hiring new, more experienced
employees. These employees will not only help
solve the challenges of the business but will also
train existing employees to benefit the company.
Currently, many IT employees are working to enhance their
cloud computing skills, but it is difficult for executives
because few employees are fully qualified. Employees with
exposure to the latest innovations and associated technology
will become increasingly important to businesses.



7. Dealing with Multi-Cloud Environments
Today, hardly any business operates on a single cloud.
According to the RightScale report, almost 84 percent of
enterprises adopt a multi-cloud approach and 58 percent have
hybrid cloud approaches that mix public and private clouds.
On average, organizations use five different public and
private clouds.

FIGURE 1.8 RightScale 2019 report findings


IT infrastructure teams find it increasingly difficult to make
long-term predictions about the future of cloud computing
technology. Professionals have suggested strategies to
address this problem, such as rethinking processes, training
personnel, adopting the right tools, actively managing vendor
relations, and carrying out studies.

8. Cloud Migration
While it is very simple to release a new app in the
cloud, transferring an existing app to a cloud
computing environment is harder. 62% said their
cloud migration projects are harder than they
expected, according to the report. In addition, 64%
of migration projects took longer than expected and
55% surpassed their budgets. In particular, organizations
that migrated their applications to the cloud reported
migration downtime (37%), data synchronization issues
before cutover (40%), problems getting migration tooling to
work well (40%), slow migration of data (44%), security
configuration issues (40%), and time-consuming
troubleshooting (47%). To solve these problems, close to
42% of IT experts said that they wanted their budgets
increased, around 45% wanted to work with an in-house
professional, 50% wanted to give the project more time, and
56% wanted more pre-migration testing.
9. Vendor lock-in
The problem of vendor lock-in in cloud computing is that
clients become reliant (i.e., locked in) on a single cloud
provider's implementation and cannot switch to another
vendor in the future without significant costs, regulatory
restrictions or technological incompatibilities. The lock-in
situation can be seen in applications built for specific cloud
platforms, such as Amazon EC2 or Microsoft Azure, that are
not easily transferred to any other cloud platform, leaving
users vulnerable to changes made by their providers; this is
especially visible through the lens of a software developer.
In fact, the issue of lock-in arises when, for example, a
company decides to change cloud providers (or perhaps to
integrate services from different providers) but cannot move
its applications or data across different cloud services,
because the semantics of the cloud providers' resources and
services do not correspond. This heterogeneity of cloud
semantics and APIs creates technological incompatibility,
which in turn leads to interoperability and portability
challenges. It makes it very complicated and difficult to
interoperate, cooperate, port, handle and maintain data and
services. For these reasons, from the company's point of
view it is important to maintain the flexibility to change
providers according to business needs, or even to keep
in-house certain components that are less critical to safety,
because of the risks. The issue of vendor lock-in hinders
interoperability and portability between cloud providers;
addressing it is the way for cloud providers and clients to
become more competitive.
10. Privacy and Legal issues
Apparently, the main problem regarding cloud privacy and
data security is the 'data breach.' A data breach can be
generically defined as the loss of electronically stored
personal information. A breach of information could lead to
a multitude of losses both for the provider and for the
customer: identity theft and debit/credit card fraud for the
customer, loss of credibility and future prosecutions for the
provider, and so on. In the event of a data breach, American
law requires notification of the affected persons; nearly
every state in the USA now requires data breaches to be
reported to the affected persons. Problems arise when data
are subject to several jurisdictions whose data privacy laws
differ. For example, the Data Privacy Directive of the
European Union explicitly states that data can only leave the
EU if it goes to a country that provides an adequate level of
protection. This rule, while simple to state, limits the
movement of data and thus decreases what can be done with
the data, and the EU can enforce these regulations.

Q) Explain hardware virtualization


Hardware virtualization is the method used to create
virtual versions of physical desktops and operating
systems. It uses a virtual machine manager (VMM) called
a hypervisor to provide abstracted hardware to multiple
guest operating systems, which can then share the
physical hardware resources more efficiently
Hardware virtualization, also known as platform
virtualization, is a technology that enables the creation and
operation of virtual machines (VMs) on a physical
computing system. It allows multiple operating systems
and applications to run simultaneously on a single
hardware platform, as if they were running on separate
physical machines.
In hardware level virtualization, a software layer called a
hypervisor, also known as a virtual machine monitor
(VMM), is installed on the host machine. The hypervisor
acts as an intermediary between the physical hardware and
the virtual machines, managing the allocation of hardware
resources such as CPU, memory, storage, and network
interfaces between those machines.
The hypervisor creates virtual instances of the underlying
hardware, including virtual CPUs, memory spaces, and
disk storage, which are then assigned to each virtual
machine. This enables each VM to operate independently,
within its own isolated environment, as if it were running on
dedicated hardware.
Isolation: Hardware-based virtualization provides strong
isolation between virtual machines, which means that any
problems in one virtual machine will not affect other
virtual machines running on the same physical host.
Resource allocation: Hardware-based virtualization allows
for flexible allocation of hardware resources such as CPU,
memory, and I/O bandwidth to virtual machines.

Snapshot and migration: Hardware-based virtualization


allows for the creation of snapshots, which can be used for
backup and recovery purposes. It also allows for live
migration of virtual machines between physical hosts,
which can be used for load balancing and other purposes.

Support for multiple operating systems: Hardware-based


virtualization supports multiple operating systems, which
allows for the consolidation of workloads onto fewer
physical machines, reducing hardware and maintenance
costs.

Compatibility: Hardware-based virtualization is


compatible with most modern operating systems, making
it easy to integrate into existing IT infrastructure.

Advantages of hardware-based virtualization –


It reduces the maintenance overhead of paravirtualization
because it reduces (ideally, eliminates) the modifications
needed in the guest operating system. It also makes it
considerably easier to attain enhanced performance. Practical
benefits of hardware-based virtualization have been reported
by VMware engineers and Virtual Iron.
Disadvantages of hardware-based virtualization –
Hardware-based virtualization requires explicit support in
the host CPU, which may not be available on all x86/x86_64
processors. A “pure” hardware-based virtualization
approach, including the entire unmodified guest operating
system, involves many VM traps, and thus a rapid
increase in CPU overhead occurs which limits the
scalability and efficiency of server consolidation. This
performance hit can be mitigated by the use of para-
virtualized drivers; the combination has been called
“hybrid virtualization”.
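Because hardware-based virtualization needs explicit CPU support, here is a small Linux-only sketch (added for illustration, not part of the notes) that checks /proc/cpuinfo for the Intel VT-x (vmx) or AMD-V (svm) feature flags.

# Linux-only check for hardware virtualization extensions.
# Intel CPUs advertise the "vmx" flag, AMD CPUs the "svm" flag, in /proc/cpuinfo.
def has_virtualization_support(cpuinfo_path="/proc/cpuinfo"):
    try:
        with open(cpuinfo_path) as f:
            tokens = f.read().split()
    except OSError:
        return False  # not a Linux host, or the file is unreadable
    return "vmx" in tokens or "svm" in tokens

if __name__ == "__main__":
    if has_virtualization_support():
        print("CPU exposes hardware virtualization extensions (VT-x/AMD-V).")
    else:
        print("No VT-x/AMD-V flags found; hardware-assisted virtualization is unavailable.")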

What are Different Types of Hardware Virtualization


Full Virtualization
With full virtualization, one of the different hardware
virtualization types, VMs run their own operating systems
and applications, just as if they were on separate physical
machines. This allows for great flexibility and
compatibility. You can have VMs running different
operating systems, like Windows, Linux, or even exotic
ones, all coexisting peacefully on the same physical
hardware.
Advantages
One of the key advantages is isolation. Each VM operates
in its own virtual bubble, protected from the chaos that
might arise from other VMs sharing the same hardware.
Furthermore, full virtualization enables the migration of
VMs between physical hosts. Imagine the ability to move
a running VM from one physical server to another, like a
teleportation trick. This live migration feature allows for
workload balancing, hardware maintenance without
downtime, and disaster recovery.
Full virtualization also plays a vital role in testing and
development environments. It allows developers to create
different VMs for software testing, without the need for
dedicated physical machines. This helps them save a lot of
money, time, and efforts in the long run.

Emulation Virtualization
Emulation virtualization, the next one in different types of
hardware virtualization, relies on a clever technique
known as hardware emulation. Through hardware
emulation, a virtual machine monitor, or hypervisor,
creates a simulated hardware environment within each
virtual machine.
This simulated environment replicates the characteristics
and behaviour of the desired hardware platform, even if
the underlying physical hardware is different. It's like
putting on a digital costume that makes the virtual
machine look and feel like it's running on a specific type
of hardware.

Advantages
But how does this aid in enabling hardware virtualization?
Well, the main advantage of emulation virtualization lies
in its flexibility and compatibility. It enables virtual
machines to run software that may be tied to a specific
hardware platform, without requiring the exact hardware
to be present.
This flexibility is particularly useful in scenarios where
legacy software or operating systems need to be preserved
or migrated to modern hardware. Emulation virtualization
allows these legacy systems to continue running on virtual
machines, ensuring their longevity and compatibility with
new hardware architectures.
It is a powerful tool in the virtualization magician's
arsenal, allowing us to transcend the limitations of
physical hardware and embrace a world of endless
possibilities.

Para-Virtualization
Unlike other types of hardware virtualization, para-
virtualization requires some special coordination between
the virtual machine and the hypervisor. The guest
operating system running inside the virtual machine
undergoes slight modifications. These modifications
introduce specialised API calls, allowing the guest
operating system to communicate directly with the
hypervisor.

Advantages
This direct communication eliminates the need for certain
resource-intensive tasks, such as hardware emulation,
which is required in full virtualization. By bypassing these
tasks, para-virtualization can achieve higher performance
and efficiency compared to other virtualization
techniques.
Para-virtualization shines in scenarios where performance
is paramount. It's like having a race car driver and a
skilled navigator working together to achieve the fastest
lap times. By leveraging the direct communication
between the guest operating system and the hypervisor,
para-virtualization minimises the overhead and latency
associated with traditional virtualization approaches.
This performance boost is particularly beneficial for high-
performance computing, real-time systems, and I/O-
intensive workloads. It's like having a turbocharger that
boosts the virtual machine's performance, enabling it to
handle demanding tasks with efficiency and precision.

Advantages of Hardware Virtualization


Improved Resource Utilisation
With hardware virtualization, you can maximise the
utilisation of physical resources such as CPU, memory,
and storage.
By running multiple virtual machines (VMs) on a single
physical server, you can effectively make use of the
available resources.
Enhanced Scalability
Hardware virtualization enables you to easily scale your
infrastructure to meet changing demands. Whether you
need to add more virtual machines or allocate additional
resources to existing VMs, virtualization allows for
seamless scalability. It's like having the ability to expand
your stage and accommodate more performers as the
audience grows.
Increased Flexibility and Agility
Virtualization offers flexibility by decoupling the software
from the underlying hardware.
You can run different operating systems and applications
on the same physical server, allowing for diverse
workloads and environments.
Cost Savings
One of the major benefits of hardware virtualization is
significant cost savings. By consolidating multiple
physical servers into a virtualized environment, you
reduce the need for additional hardware, power
consumption, and cooling costs. It enables optimising
your expenses by sharing resources efficiently.
Improved Disaster Recovery and Business Continuity
Virtualization provides robust disaster recovery
capabilities. With features like live migration and
snapshots, you can easily move virtual machines between
physical hosts or create point-in-time backups. In the
event of hardware failure or a disaster, you can quickly
restore operations, minimising downtime and ensuring
business continuity. It's like having an emergency plan
that allows you to seamlessly switch venues and continue
with the work.
Simplified Testing and Development
Virtualization simplifies the process of testing and
development. You can create isolated virtual environments
to test new software, configurations, or updates without
impacting production systems. This also can help you save
a lot of time you’d have invested in gathering all the
hardware for different machines.
Enhanced Security
Hardware virtualization can improve security by isolating
virtual machines from each other. Even if one VM is
compromised, the others remain unaffected.
Green IT and Environmental Benefits

Chapter 2: Cloud Architecture and Cloud Service Management

2. Draw the architecture of cloud computing

1. Frontend :
Frontend of the cloud architecture refers to the client side
of the cloud computing system. This means it contains all the user
interfaces and applications which are used by the client to
access the cloud computing services/resources. For
example, use of a web browser to access the cloud
platform.
Client Infrastructure – Client Infrastructure is a part of the
frontend component. It contains the applications and user
interfaces which are required to access the cloud platform.
In other words, it provides a GUI( Graphical User
Interface ) to interact with the cloud.
2. Backend :
Backend refers to the cloud itself which is used by the
service provider. It contains the resources as well as
manages the resources and provides security mechanisms.
Along with this, it includes huge storage, virtual
applications, virtual machines, traffic control mechanisms,
deployment models, etc.

Application –
Application in the backend refers to the software or platform
that the client accesses. This means it provides the service in
the backend as per the client's requirement.
Service –
Service in backend refers to the major three types of cloud
based services like SaaS, PaaS and IaaS. Also manages
which type of service the user accesses.
Runtime Cloud-
Runtime cloud in backend provides the execution and
Runtime platform/environment to the Virtual machine.
Storage –
Storage in backend provides flexible and scalable storage
service and management of stored data.
Infrastructure –
Cloud Infrastructure in backend refers to the hardware and
software components of cloud like it includes servers,
storage, network devices, virtualization software etc.
Management –
Management in backend refers to management of backend
components like application, service, runtime cloud,
storage, infrastructure, and other security mechanisms etc.
Security –
Security in backend refers to implementation of different
security mechanisms in the backend for secure cloud
resources, systems, files, and infrastructure to end-users.
Internet –
Internet connection acts as the medium or a bridge
between frontend and backend and establishes the
interaction and communication between frontend and
backend.
Database – Database in the backend refers to providing
databases for storing data, such as SQL and NoSQL
databases. Examples of database services include Amazon
RDS, Microsoft Azure SQL Database and Google Cloud
SQL.
Networking – Networking in the backend refers to services
that provide networking infrastructure for applications in the
cloud, such as load balancing, DNS and virtual private
networks.
Analytics – Analytics in the backend refers to services that
provide analytics capabilities for data in the cloud, such as
data warehousing, business intelligence and machine
learning.
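To illustrate how the frontend reaches the backend over the internet, here is a minimal client-side sketch (added for illustration; the URL is a placeholder, not a real service) that calls a cloud-hosted REST API using the third-party requests library.

# Minimal frontend-style client calling a cloud backend over HTTP.
# Requires the third-party "requests" package; the URL below is a placeholder.
import requests

API_BASE = "https://api.example-cloud-service.com"  # hypothetical backend endpoint

def fetch_status():
    # Send a GET request across the internet to the backend service.
    response = requests.get(f"{API_BASE}/v1/status", timeout=10)
    response.raise_for_status()   # surface HTTP errors to the caller
    return response.json()        # assume the backend replies with JSON

if __name__ == "__main__":
    print(fetch_status())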

CLOUD SERVICE MODELS:

A cloud delivery model represents a specific, pre-packaged
combination of IT resources offered by a cloud provider.
Three common cloud delivery models have become widely
established and formalized:
• Infrastructure-as-a-Service (IaaS)
• Platform-as-a-Service (PaaS)
• Software-as-a-Service (SaaS)
2.1.4.1. Infrastructure-as-a-Service (IaaS)
The IaaS delivery model represents a self-contained IT
environment comprised of infrastructure-centric IT resources
that can be accessed and managed via cloud service-based
interfaces and tools. This environment can include hardware,
network, connectivity, operating systems, and other "raw" IT
resources. In contrast to traditional hosting or outsourcing
environments, with IaaS, IT resources are typically
virtualized and packaged into bundles that simplify up-front
runtime scaling and customization of the infrastructure.
The general purpose of an IaaS environment is to provide
cloud consumers with a high level of control and
responsibility over its configuration and utilization. The IT
resources provided by IaaS are generally not pre-configured,
placing the administrative responsibility directly upon the
cloud consumer. This model is therefore used by cloud
consumers that require a high level of control over the
cloud-based environment they intend to create. Sometimes
cloud providers will contract IaaS offerings from other cloud
providers in order to scale their own cloud environments.
The types and brands of the IT resources provided by IaaS
products offered by different cloud providers can vary. IT
resources available through IaaS environments are generally
offered as freshly initialized virtual instances. A central and
primary IT resource within a typical IaaS environment is the
virtual server. Virtual servers are leased by specifying server
hardware requirements, such as processor capacity, memory,
and local storage space, as shown in the figure.
Figure 2.1.8. A cloud consumer is using a virtual server
within an IaaS environment. Cloud consumers are provided
with a range of contractual guarantees by the cloud provider,
relating to characteristics such as capacity, performance, and
availability.
(Reference: Cloud Computing: Concepts, Technology &
Architecture by Thomas Erl, Zaigham Mahmood, and
Ricardo Puttini)
2.1.4.2. Platform-as-a-Service (PaaS)
The PaaS delivery model represents a pre-defined
"ready-to-use" environment typically comprised of already
deployed and configured IT resources. Specifically, PaaS
relies on the usage of a ready-made environment that
establishes a set of pre-packaged products and tools used to
support the entire delivery lifecycle of custom applications.
Common reasons a cloud consumer would use and invest in
a PaaS environment include:
• The cloud consumer wants to extend on-premise
environments into the cloud for scalability and economic
purposes.
• The cloud consumer uses the ready-made environment to
entirely substitute an on-premise environment.
• The cloud consumer wants to become a cloud provider and
deploys its own cloud services to be made available to other
external cloud consumers.
By working within a ready-made platform, the cloud
consumer is spared the administrative burden of setting up
and maintaining the bare infrastructure IT resources provided
via the IaaS model. Conversely, the cloud consumer is
granted a lower level of control over the underlying IT
resources that host and provision the platform (Figure
2.1.9).

Figure 2.1.9. A cloud consumer is accessing a ready-made
PaaS environment. The question mark indicates that the
cloud consumer is intentionally shielded from the
implementation details of the platform.
(Reference: Cloud Computing: Concepts, Technology &
Architecture by Thomas Erl, Zaigham Mahmood, and
Ricardo Puttini)



PaaS products are available with different
development stacks. For example, Google App
Engine offers a Java and Python-based
environment.
2.1.4.3. Software-as-a-Service (SaaS)
A software program positioned as a shared cloud service and
made available as a "product" or generic utility represents
the typical profile of a SaaS offering. The SaaS delivery
model is typically used to make a reusable cloud service
widely available (often commercially) to a diverse range of
cloud consumers. An entire marketplace exists around SaaS
products, which can be leased and used for different
purposes and on different terms.

Figure 2.1.10. The cloud service consumer is given access
to the cloud service contract, but not to any underlying IT
resources or implementation details.
(Reference: Cloud Computing: Concepts, Technology &
Architecture by Thomas Erl, Zaigham Mahmood, and
Ricardo Puttini)
A cloud consumer is generally granted very limited
administrative control over a SaaS implementation. It is most
often provisioned by the cloud provider, but it can be
formally owned by whichever entity assumes the cloud
service owner role. For example, an organization acting as a
cloud consumer while using and working with a PaaS
environment can build a cloud service that it decides to
deploy in that same environment as a SaaS offering. The
same organization then effectively assumes the cloud
provider role, as the SaaS-based cloud service is made
available to other organizations that act as cloud consumers
when using that cloud service.
Q) What are the layers of cloud computing?

1) Infrastructure as a Service (IaaS): IaaS is the basic


layer of the cloud that comprises hardware and
network. That said, IaaS is different from a regular
server as it comes with two key features of cloud
technology-virtualisation and scalability. IaaS
service providers scale this layer in such a manner
that the additional cost of adding more storage or
bandwidth is minimal. Owing to virtualisation, these
providers are able to use up to 90% of their
computing resources in contrast to traditional
hosting services where servers may lay idle at times.

IaaS provides infrastructure used by system administrators
and network architects. It provides the underlying:
OS
Security
Networking
Servers
It offers services where you can see IP addresses, store data
on virtual machines, and use virtual local area networks and
load balancers.
Advantages of IaaS
The resources can be deployed by the provider to a
customer’s environment at any given time.
Its ability to offer the users to scale the business based
on their requirements.
The provider has various options when deploying
resources including virtual machines, applications,
storage, and networks.
It has the potential to handle an immense number of
users.
It is easy to expand and saves a lot of money.
Companies can avoid the huge costs associated with the
in-house implementation of advanced technologies.
Cloud provides the architecture.
Enhanced scalability and quite flexible.
Dynamic workloads are supported.
Disadvantages of IaaS
Security issues exist.
Service and network delays can be quite an issue in IaaS.
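As a concrete IaaS illustration, the sketch below provisions a virtual server on AWS with the boto3 SDK. This is an added example, not part of the original notes; it assumes boto3 is installed and AWS credentials are configured, and the AMI ID, key pair name and region are placeholders to be replaced with real values.

# IaaS sketch: renting a virtual server (EC2 instance) through boto3.
# Assumes configured AWS credentials; the AMI ID and key name are placeholders.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
    InstanceType="t2.micro",          # requested hardware tier (CPU/memory)
    KeyName="my-keypair",             # placeholder SSH key pair name
    MinCount=1,
    MaxCount=1,
)

instance = instances[0]
instance.wait_until_running()         # block until the provider reports "running"
instance.reload()                     # refresh attributes such as the public IP
print(f"Provisioned {instance.id} with public IP {instance.public_ip_address}")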
2) Platform as a Service (PaaS): This layer of the cloud
caters to the requirements of software developers as it
is the place where new applications are developed. You
can use PaaS services to build and test your
applications on the cloud before deploying them. In
fact, it is designed in such a manner that it supports the
entire lifecycle of an application, right from building,
testing and deployment to maintenance and updating.
Like IaaS, PaaS includes infrastructure, but also
includes development tools, database management
systems and much more.
Developers use it.
It gives developers more control than SaaS.
It provides a platform and environment that allows
developers to build applications and services over the
internet.
Services are hosted in the cloud and accessed by users via a
web browser.
Users have no control over the underlying infrastructure;
they interact with the user interface provided by the vendor,
who hosts the hardware and software on its own
infrastructure.
Advantages of PaaS
Programmers need not worry about what specific
database or language the application has been
programmed in.
It offers developers the ability to build applications without
the overhead of managing the underlying operating system
or infrastructure.
Provides the freedom to developers to focus on the
application’s design while the platform takes care of
the language and the database.
It is flexible and portable.
It is quite affordable.
It manages application development phases in the
cloud very efficiently.

Disadvantages of PaaS
Data is not secure and is at big risk.
As data is stored both in local storage and cloud, there
are high chances of data mismatch while integrating
the data.
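To show what developer code typically looks like on a PaaS, here is a minimal Flask web application (an illustrative addition; platforms differ) of the kind that could be deployed to a Python PaaS runtime such as Google App Engine, where the platform supplies the operating system, runtime, scaling and infrastructure.

# Minimal web app a developer might deploy to a PaaS (for example a Python
# runtime on Google App Engine). The platform provides the OS, runtime and
# scaling; the developer only ships this application code.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/")
def index():
    return jsonify(message="Hello from a PaaS-hosted app")

if __name__ == "__main__":
    # Local test run; on the platform, the provider's web front end serves the app.
    app.run(host="127.0.0.1", port=8080)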
3) Software as a Service (SaaS): The third and final
layer of the cloud comes with a complete software
solution. Here, organizations rent the usage of a SaaS
application, and users connect to it by means of the
internet. The application, therefore, needs to be web-based
so that it can be accessed from anywhere.
In this case, the service provider offers both the
software and the hardware

Advantages of SaaS
It is a cloud computing service category providing a
wide range of hosted capabilities and services. These
can be used to build and deploy web-based software
applications.
It provides a lower cost of ownership than on-premises
software. The reason is it does not require the purchase
or installation of hardware or licenses.
It can be easily accessed through a browser or a thin
client.
No cost is required for initial setup.
Low maintenance costs.
Installation time is less, so time is managed properly.
Disadvantages of SaaS
Low performance.
It has limited customization options.
It has security and data concerns.

Layered Architecture of Cloud

Application Layer
1. The application layer, which is at the top of
the stack, is where the actual cloud apps are
located. Cloud applications, as opposed to
traditional applications, can take advantage
of the automatic-scaling functionality to
gain greater performance, availability, and
lower operational costs.
2. This layer consists of different Cloud
Services which are used by cloud users.
Users can access these applications
according to their needs. Applications are
divided into Execution
layers and Application layers.
3. In order for an application to transfer data,
the application layer determines whether
communication partners are available.
Whether enough cloud resources are
accessible for the required communication
is decided at the application layer.
Applications must cooperate in order to
communicate, and an application layer is in
charge of this.
4. The application layer, in particular, is
responsible for processing IP traffic
handling protocols like Telnet and FTP.
Other examples of application layer systems
include web browsers, SNMP protocols,
HTTP protocols, or HTTPS, which is
HTTP’s successor protocol.
Platform Layer
1. The operating system and application
software make up this layer.
2. Users should be able to rely on the platform to provide
them with scalability, dependability, and security protection.
It gives users a space to create their apps, test operational
processes, and keep track of execution outcomes and
performance. The platform layer is the foundation on which
the application layer of SaaS applications is implemented.
3. The objective of this layer is to deploy
applications directly on virtual machines.
4. Operating systems and application
frameworks make up the platform layer,
which is built on top of the infrastructure
layer. The platform layer's goal is to lessen the difficulty of
deploying programs directly into VM containers.
5. By way of illustration, Google App Engine
functions at the platform layer to provide
API support for implementing storage,
databases, and business logic of ordinary
web apps.
Infrastructure Layer
1. It is a layer of virtualization where physical
resources are divided into a collection of
virtual resources using virtualization
technologies like Xen, KVM, and VMware.
2. This layer serves as the Central Hub of
the Cloud Environment, where resources
are constantly added utilizing a variety of
virtualization techniques.
3. It is the base upon which the platform layer is created,
constructed using the virtualized network, storage, and
computing resources, and it gives users the flexibility they
want.
4. Automated resource provisioning is made
possible by virtualization, which also
improves infrastructure management.
5. The infrastructure layer sometimes referred
to as the virtualization layer, partitions the
physical resources using virtualization
technologies like Xen, KVM, Hyper-V,
and VMware to create a pool of compute
and storage resources.
6. The infrastructure layer is crucial to cloud
computing since virtualization technologies
are the only ones that can provide many
vital capabilities, like dynamic resource
assignment.
Datacenter Layer
 In a cloud environment, this layer is
responsible for Managing Physical
Resources such as servers, switches,
routers, power supplies, and cooling
systems.
 Providing end users with services requires
all resources to be available and managed in
data centers.
 Physical servers connect through high-speed
devices such as routers and switches to the
data center.
 In software application designs, the division
of business logic from the persistent data it
manipulates is well-established. This is due
to the fact that the same data cannot be
incorporated into a single application
because it can be used in numerous ways to
support numerous use cases. The
requirement for this data to become a
service has arisen with the introduction of
microservices.
 A single database used by many
microservices creates a very close coupling.
As a result, it is hard to deploy new or
emerging services separately if such
services need database modifications that
may have an impact on other services. A
data layer containing many databases, each
serving a single microservice or perhaps a
few closely related microservices, is needed
to break complex service interdependencies.

Q) What are types of cloud computing

Private Cloud: Here, computing resources are


deployed for one particular organization. This method
is more used for intra-business interactions. Where the
computing resources can be governed, owned and
operated by the same organization.
Community Cloud: Here, computing resources are
provided for a community and organizations.
Public Cloud: This type of cloud is used usually for
B2C (Business to Consumer) type interactions. Here
the computing resource is owned, governed and
operated by government, an academic or business
organization.
Hybrid Cloud: This type of cloud can be used for both
type of interactions – B2B (Business to Business) or
B2C ( Business to Consumer). This deployment
method is called hybrid cloud as the computing
resources are bound together by different clouds.
Comparison of IaaS, PaaS and SaaS:

Stands for:
  IaaS: Infrastructure as a Service.
  PaaS: Platform as a Service.
  SaaS: Software as a Service.

Uses:
  IaaS is used by network architects.
  PaaS is used by developers.
  SaaS is used by the end user.

Access:
  IaaS gives access to resources like virtual machines and virtual storage.
  PaaS gives access to the runtime environment and to deployment and development tools for applications.
  SaaS gives the end user access to the application itself.

Model:
  IaaS is a service model that provides virtualized computing resources over the internet.
  PaaS is a cloud computing model that delivers the tools used for the development of applications.
  SaaS is a service model in cloud computing that hosts software to make it available to clients.

Technical understanding:
  IaaS requires technical knowledge.
  PaaS requires some knowledge for the basic setup.
  SaaS requires no understanding of technicalities; the company handles everything.

Popularity:
  IaaS is popular among developers and researchers.
  PaaS is popular among developers who focus on the development of apps and scripts.
  SaaS is popular among consumers and companies, for uses such as file sharing, email, and networking.

Percentage rise:
  IaaS has around a 12% increment.
  PaaS has around a 32% increment.
  SaaS has about a 27% rise in the cloud computing model.

Usage:
  IaaS is used by skilled developers to develop unique applications.
  PaaS is used by mid-level developers to build applications.
  SaaS is used among end users, for example for entertainment.

Cloud services (examples):
  IaaS: Amazon Web Services, Sun, vCloud Express.
  PaaS: Facebook, Google search engine.
  SaaS: MS Office Web, Facebook and Google Apps.

Enterprise services (examples):
  IaaS: AWS Virtual Private Cloud.
  PaaS: Microsoft Azure.
  SaaS: IBM cloud analysis.

Outsourced cloud services (examples):
  IaaS: Salesforce.
  PaaS: Force.com, Gigaspaces.
  SaaS: AWS, Terremark.

User controls:
  IaaS: operating system, runtime, middleware, and application data.
  PaaS: data of the application.
  SaaS: nothing.

Others:
  IaaS is highly scalable and flexible.
  PaaS is highly scalable to suit different businesses according to their resources.
  SaaS is highly scalable to suit small, mid-size and enterprise-level businesses.

Q) Cloud service management policies and mechanisms?
Cloud computing is an internet based computing,
providing the on demand self services over the internet
for the use of servers, storage space or disk, different
platforms and applications to any cloud user. The
cloud computing services are `Pay as per your usage'
based on the agreement between the Cloud Service
Provider and Cloud customer. Service Level
Agreement (SLA) is a contract between service
provider and the third party such as Cloud user or
Broker(agent), where service conditions are formally
or legally defined. Often the Service Level Agreement
(SLA) is used to define the terms of the delivery period of
the service and the different performance parameters of the
service to be provided by the provider.
Importance of SLA:
● The consumer can get the information about
the service providers.
● SLA describes the complete information about
the service and the type of services (SaaS,
PaaS, IaaS) that are provided to a particular
consumer.
● SLA describes the purpose and objectives
based on business level policies, which
includes the part of the service provider and
the customer.
● The consumers will be able to identify the key
security and management strategies of
agreement.
● SLA is used to monitor the quality of service,
performance, response time from the service
point of view.
● The consumer can get the idea about the
requirements for the management of the
service in case of poor performance.
A Service Level Agreement (SLA) is the bond for
performance negotiated between the cloud services
provider and the client. Earlier, in cloud computing all
Service Level Agreements were negotiated between a
client and the service consumer. Nowadays, with the
initiation of large utility-like cloud computing
providers, most Service Level Agreements are
standardized until a client becomes a large consumer
of cloud services. Service level agreements are also
defined at different levels which are mentioned below:
● Customer-based SLA
● Service-based SLA
Multi level SLA

TYPES OF SLA:
Service-level agreement provides a framework within
which both seller and buyer of a service can pursue a
profitable service business relationship. It outlines the
broad understanding between the service provider and
the service consumer for conducting business and
forms the basis for maintaining a mutually beneficial
relationship. From a legal perspective, the necessary
terms and conditions that bind the service provider to
provide services continually to the service consumer
are formally defined in SLA.
There are two types of SLAs from the perspective of
application hosting.
1. Infrastructure SLA
2.Application SLA

Infrastructure SLA.
The infrastructure provider manages and offers
guarantees on availability of the infrastructure,
namely, server machine, power,network connectivity,
and so on. Enterprises manage their own applications,
which are deployed on these server
machines. The machines are leased to the customers
and are isolated from machines of other customers.

Application SLA.
In the application co-location hosting model, the
server capacity is made available to the applications based
solely on their resource demands. Hence, the service
providers are flexible in allocating and de-allocating
computing resources among the co-located
applications. Therefore, the service providers are also
responsible for ensuring to meet their customer’s
application SLOs. For example, an enterprise can
have the following application SLA with a service
provider for one of its application, as shown in Table
Each SLA goes through a sequence of steps, starting from
identification of terms and conditions, activation and
monitoring of the stated terms and conditions, and eventual
termination of the contract once the hosting relationship
ceases to exist. Such a sequence of steps is called the SLA
life cycle and consists of the following five phases:
1. Contract definition
2. Publishing and discovery
3. Negotiation
4. Operationalization
5. De-commissioning
An SLA is a legal, formal and negotiated document that
defines the service in terms of quantitative and qualitative
metrics. The metrics involved in an SLA should be capable
of being measured on a consistent basis, and the SLA should
be evaluated using those metrics. The SLA plays a role
throughout the life cycle of the service. An SLA by itself
cannot guarantee that the consumer can access the service as
described in the SLA document; further work is needed on
approaches to certify that a service is provided at the level
of quality specified in the SLA.
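Since SLA metrics must be measurable on a consistent basis, the small sketch below checks a monthly availability objective from recorded downtime. The 99.9% target and the downtime figure are invented examples, not values from these notes or any real SLA.

# Checking an availability objective of the kind an SLA might specify.
# The 99.9% target and the downtime figure below are invented examples.
HOURS_PER_MONTH = 30 * 24

def availability_percent(downtime_hours, total_hours=HOURS_PER_MONTH):
    # Availability = time the service was up divided by total time, as a percentage.
    return (total_hours - downtime_hours) / total_hours * 100

target = 99.9                                        # agreed service-level objective
measured = availability_percent(downtime_hours=0.5)  # 30 minutes of downtime

print(f"Measured availability: {measured:.3f}%")
print("SLA met" if measured >= target else "SLA violated")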

Chapter 3: Cloud Data Storage:

What is Cloud Storage?


Cloud Storage is technology that allows us to save
files in storage, and then access those files via the
Cloud. Let's break down this definition. First, storage is
the computer's ability to save files and other
resources for later use. When you restart a computer,
the files that are still available after the computer
turns back on are saved and read from storage. Such
storage commonly consists of a hard drive, a USB
Flash drive, or another type of drive. How Cloud
Storage Works

How Cloud Storage Works?
Cloud storage is saving data to an off-site storage system
maintained by a third party. Rather than storing information
on your computer's hard drive or another local storage
device, you save it to a remote database. The Internet
provides the connection between your computer and the
database.
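To make "saving data to an off-site storage system" concrete, here is a minimal sketch (an added illustration) that uploads a local file to object storage using the boto3 SDK for Amazon S3. It assumes configured AWS credentials; the bucket name, file names and key are placeholders.

# Cloud storage sketch: uploading a local file to remote object storage (Amazon S3).
# Assumes configured AWS credentials; bucket, file and key names are placeholders.
import boto3

s3 = boto3.client("s3")

def save_to_cloud(local_path, bucket, key):
    # Copy a local file to the remote storage system over the internet.
    s3.upload_file(local_path, bucket, key)
    print(f"Uploaded {local_path} to s3://{bucket}/{key}")

if __name__ == "__main__":
    save_to_cloud("notes.txt", "my-example-bucket", "backups/notes.txt")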
Cloud Storage Architecture: Cloud Computing Reference Architecture

1.Cloud Consumer The cloud consumer is the principal


stakeholder for the cloud computing service. A cloud
consumer represents a person or organization that
maintains a business relationship with, and uses the
service from a cloud provider. A cloud consumer
browses the service catalog from a cloud provider,
requests the appropriate service, sets up service
contracts with the cloud provider, and uses the
service. The cloud consumer may be billed for the
service provisioned, and needs to arrange payments
accordingly. A cloud provider may also list in the SLAs
a set of promises explicitly not made to consumers,
i.e. limitations, and obligations that cloud consumers
must accept. A cloud consumer can freely choose a
cloud provider with better pricing and more favorable
terms.
2. Cloud Provider
A cloud provider is a person or an
organization; it is the entity responsible for making a
service available to interested parties. A Cloud
Provider acquires and manages the computing
infrastructure required for providing the services, runs
the cloud software that provides the services, and
makes arrangement to deliver the cloud services to
the Cloud Consumers through network access. For
Software as a Service, the cloud provider deploys,
configures, maintains and updates the operation of
the software applications on a cloud infrastructure so
that the services are provisioned at the expected
service levels to cloud consumers.
3. Cloud Auditor
A cloud auditor is a party that can
perform an independent examination of cloud service
controls with the intent to express an opinion
thereon. Audits are performed to verify conformance
to standards through review of objective evidence. A
cloud auditor can evaluate the services provided by a
cloud provider in terms of security controls, privacy
impact, performance, etc. For security auditing, a
cloud auditor can make an assessment of the security
controls in the information system to determine the
extent to which the controls are implemented
correctly, operating as intended, and producing the
desired outcome with respect to the security
requirements for the system.
4. Cloud Broker
As cloud computing evolves, the
integration of cloud services can be too complex for
cloud consumers to manage. A cloud consumer may
request cloud services from a cloud broker, instead of
contacting a cloud provider directly. A cloud broker is
an entity that manages the use, performance and
delivery of cloud services and negotiates relationships
between cloud providers and cloud consumers. In
general, a cloud broker can provide services in three
categories [9]: Service Intermediation: A cloud broker
enhances a given service by improving some specific
capability and providing value-added services to cloud
consumers.
5. Cloud Carrier
A cloud carrier acts as an
intermediary that provides connectivity and transport
of cloud services between cloud consumers and cloud
providers. Cloud carriers provide access to consumers
through network, telecommunication and other
access devices.

Types of Cloud Storage
There are four main types of cloud storage:
1. Public cloud
2. Private cloud
3. Community cloud
4. Hybrid cloud

Public Cloud
Public cloud storage is where the enterprise and
storage service provider are separate and there aren't
any cloud resources stored in the enterprise's data
center.
The cloud storage provider fully manages the
enterprise's public cloud storage.

Private Cloud
Companies that look for cost efficiency and greater
control over data and resources will find the private
cloud a more suitable choice. The private cloud offers
bigger opportunities for customization to meet a
specific organization's requirements.

Community Cloud
The community cloud operates in a way that is similar
to the public cloud. There is just one difference: it
allows access only to a specific set of users who share
common objectives and use cases. This type of cloud
computing deployment model is managed and hosted
internally or by a third-party vendor.

Hybrid Cloud
A hybrid cloud (or cloud bursting) is a combination of
two or more cloud architectures. While each model in
the hybrid cloud functions differently, all of them are
part of the same architecture. Further, as part of this
cloud computing deployment model, resources can be
offered by internal or external providers. A company
with critical data will prefer storing it on a private
cloud, while less sensitive data can be stored on a
public cloud.

Risks of Cloud Storage
1. Requires a high-speed internet connection most of
the time.
2. Data is stored on third-party servers.
3. When a provider closes its service for maintenance,
you may find it troublesome to access your data.
4. If your provider closes its service permanently, you
may lose your valuable data.
5. Premium services cost a considerable amount for
the storage volume.

Advantages of Cloud Storage
1. Usability: All cloud storage services reviewed in this
topic have desktop folders for Macs and PCs. This
allows users to drag and drop files between the cloud
storage and their local storage.
2. Bandwidth: You can avoid emailing files to
individuals and instead send a web link to recipients
through your email.
3. Accessibility: Stored files can be accessed from
anywhere via Internet connection.
4. Disaster Recovery: It is highly recommended that
businesses have an emergency backup plan ready in
the case of an emergency. Cloud storage can be used
as a back-up plan by businesses by providing a second
copy of important files.
5. Cost Savings: Businesses and organizations can
often reduce annual operating costs by using cloud
storage; cloud storage costs about 3 cents per
gigabyte to store data internally.
Disadvantages of Cloud Storage
1. Usability: Be careful when using drag/drop to move
a document into the cloud storage folder. This will
permanently move your document from its original
folder to the cloud storage location.
2. Bandwidth: Several cloud storage services have a
specific bandwidth allowance.
3. Accessibility: If you have no internet connection,
you have no access to your data.
4. Data Security: There are concerns with the safety
and privacy of important data stored remotely. The
possibility of private data commingling with other
organizations makes some businesses uneasy.
5. Software: If you want to be able to manipulate your
files locally through multiple devices, you’ll need to
download the service on all devices.

Chapter 4: Data management using Cloud Computing:

What is data pipelining?

A data pipeline is a process that moves data from one
system or format to another. The data pipeline typically
includes a series of steps for extracting data from a
source, transforming and cleaning it, and loading it
into a destination system, such as a database or a data
warehouse. Data pipelines can be used for a variety of
purposes, including data integration, data warehousing,
automating data migration, and analytics. A minimal
code sketch of such a pipeline follows.
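To make this concrete, the following is a minimal Python sketch of a pipeline that extracts rows from a hypothetical CSV file, cleans and transforms them, and loads them into a local SQLite database standing in for a data warehouse; the file name, column names and table name are illustrative assumptions only.

```python
# Minimal data pipeline sketch: extract -> transform/clean -> load.
# Assumes a hypothetical "orders.csv" with columns order_id, customer, amount.
import csv
import sqlite3

def extract(path):
    """Read raw rows from the source CSV file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Clean and reshape the raw rows (drop incomplete records, cast types)."""
    cleaned = []
    for row in rows:
        if not row.get("order_id") or not row.get("amount"):
            continue  # skip dirty input
        cleaned.append((row["order_id"],
                        row["customer"].strip().title(),
                        float(row["amount"])))
    return cleaned

def load(records, db_path="warehouse.db"):
    """Load the transformed records into the destination table."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT, customer TEXT, amount REAL)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("orders.csv")))
```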

What is the purpose of data pipelining?

The data pipeline is a key element in the overall data
management process. Its purpose is to automate and
scale repetitive data flows and the associated data
collection, transformation and integration tasks. A
properly constructed data pipeline can accelerate the
processing that's required as data is gathered, cleansed,
filtered, enriched and moved to downstream systems
and applications. Well-designed pipelines also enable
organizations to take advantage of big data assets that
often include large amounts of structured, unstructured
and semi-structured data. In many cases, some of that is
real-time data generated and updated on an ongoing
basis. As the volume, variety and velocity of data
continue to grow in big data systems, the need for data
pipelines that can linearly scale -- whether in
on-premises, cloud or hybrid cloud environments -- is
becoming increasingly critical to analytics initiatives
and business operations.

Who needs data pipelining?

A data pipeline is needed for any analytics application
or business process that requires regular aggregation,
cleansing, transformation and distribution of data to
downstream data consumers. Typical data pipeline users
include the following:
1) Data scientists and other members of data science teams.
2) Business intelligence (BI) analysts and developers.
3) Business analysts.
4) Senior management and other business executives.
5) Marketing and sales teams.
6) Operational workers.
To make it easier for business users to access relevant
data, pipelines can also be used to feed it into BI
dashboards and reports, as well as operational
monitoring and alerting systems.

How a data pipeline works

The data pipeline development process starts by
defining what, where and how data is generated or
collected. That includes capturing source system
characteristics, such as data formats, data structures,
data schemas and data definitions -- information that's
needed to plan and build a pipeline. Once it's in place,
the data pipeline typically involves the steps listed
further below.
Many data pipelines are built by data engineers or big
data engineers. To create effective pipelines, it's critical
that they develop their soft skills -- meaning their
interpersonal and communication skills. This helps
them collaborate with data scientists, other analysts and
business stakeholders to identify user requirements and
the data that's needed to meet them before launching a
data pipeline development project. Such skills are also
necessary for ongoing conversations to prioritize new
development plans and manage existing data pipelines.

Other best practices on data pipelines include the following:
1) Manage the development of a data pipeline as a
project, with defined goals and delivery dates.
2) Document data lineage information so the history,
technical attributes and business meaning of data can
be understood.
3) Ensure that the proper context of data is maintained
as it's transformed in a pipeline.
4) Create reusable processes or templates for data
pipeline steps to streamline development.
5) Avoid scope creep that can complicate pipeline
projects and create unrealistic expectations among users.

1. Data ingestion. Raw data from one or more source
systems is ingested into the data pipeline. Depending on
the data set, data ingestion can be done in batch or
real-time mode.
2. Data integration. If multiple data sets are being pulled
into the pipeline for use in analytics or operational
applications, they need to be combined through data
integration processes.
3. Data cleansing. For most applications, data quality
management measures are applied to the raw data in the
pipeline to ensure that it's clean, accurate and consistent.
4. Data filtering. Data sets are commonly filtered to
remove data that isn't needed for the particular
applications the pipeline was built to support.
5. Data transformation. The data is modified as needed
for the planned applications. Examples of data
transformation methods include aggregation,
generalization, reduction and smoothing.
6. Data enrichment. In some cases, data sets are
augmented and enriched as part of the pipeline through
the addition of more data elements required for
applications.
7. Data validation. The finalized data is checked to
confirm that it is valid and fully meets the application
requirements.
8. Data loading. For BI and analytics applications, the
data is loaded into a data store so it can be accessed by
users. Typically, that's a data warehouse, a data lake or a
data lakehouse, which combines elements of the other
two platforms.
A minimal sketch of these steps chained together follows.
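Below is a compact Python sketch of how these eight steps might be chained together over in-memory records; all field names, sources and rules are hypothetical stand-ins for real source systems and a real data store.

```python
# Hypothetical sketch of the eight pipeline steps chained over in-memory records.
def ingest():        # 1. pull raw rows from one or more (here: fake) sources
    return [{"id": 1, "amt": "10.5", "region": "eu"},
            {"id": 2, "amt": None, "region": "us"}]

def integrate(*sources):            # 2. combine multiple data sets into one
    return [row for src in sources for row in src]

def cleanse(rows):                  # 3. drop records that fail quality checks
    return [r for r in rows if r["amt"] is not None]

def filter_rows(rows):              # 4. keep only what the target app needs
    return [r for r in rows if r["region"] == "eu"]

def transform(rows):                # 5. reshape / cast values for the planned use
    return [{**r, "amt": float(r["amt"])} for r in rows]

def enrich(rows):                   # 6. add extra data elements (here: a flag)
    return [{**r, "high_value": r["amt"] > 100} for r in rows]

def validate(rows):                 # 7. confirm the finalized data meets requirements
    assert all(r["amt"] >= 0 for r in rows)
    return rows

def load(rows):                     # 8. write to the destination data store (stubbed)
    print(f"loading {len(rows)} rows into the warehouse")

load(validate(enrich(transform(filter_rows(cleanse(integrate(ingest())))))))
```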
Many data pipelines also apply machine
learning and neural network algorithms to create more
advanced data transformations and
enrichments. This includes segmentation, regression
analysis, clustering and the creation of advanced
indices and propensity scores.
In addition, logic and algorithms can be built into a
data pipeline to add intelligence.
As machine learning -- and, especially, automated
machine learning
(AutoML) -- processes become more prevalent, data
pipelines likely will
become increasingly intelligent. With these processes,
intelligent data
pipelines could continuously learn and adapt based on
the characteristics
of source systems, required data transformations and
enrichments, and
evolving business and application requirements.

There are several types of data pipeline architecture,
each with its own set of characteristics and use cases.
Some of the most common types include:
1) Batch Processing: Data is processed in batches at set
intervals, such as daily or weekly.
2) Real-Time Streaming: Data is processed as soon as it
is generated, with minimal delay.
3) Lambda Architecture: A combination of batch and
real-time processing, where data is first processed in
batch and then updated in real time.
4) Kappa Architecture: Similar to Lambda architecture,
but data is processed only once, and all data is ingested
in real time.
5) Microservices Architecture: Data is processed using
loosely coupled, independently deployable services.
6) ETL (Extract, Transform, Load) Architecture: Data is
extracted from various sources, transformed to fit the
target system, and loaded into the target system.
A simple contrast between batch and streaming
processing is sketched after this list.
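The following small, self-contained Python sketch contrasts batch processing (accumulate records and process them at an interval) with streaming processing (handle each record as it arrives); the event generator is a hypothetical stand-in for a real message queue or source system.

```python
# Illustrative contrast of batch vs. streaming processing over a fake event source.
def event_source(n=10):
    """Stand-in for a real source system or message queue."""
    for i in range(n):
        yield {"event_id": i, "value": i * 2}

# Batch processing: accumulate events, then process them together at an interval.
def run_batch(batch_size=5):
    batch = []
    for event in event_source():
        batch.append(event)
        if len(batch) == batch_size:
            print("batch of", len(batch), "events, total =",
                  sum(e["value"] for e in batch))
            batch = []
    if batch:  # flush any remaining events
        print("final batch of", len(batch), "events")

# Streaming processing: handle each event as soon as it is generated.
def run_streaming():
    for event in event_source():
        print("processed event", event["event_id"], "with minimal delay")

run_batch()
run_streaming()
```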
A data pipeline architecture is essential for several reasons:
1) Scalability: Data pipeline architecture should allow for
the efficient processing of large amounts of data, enabling
organizations to scale their data processing capabilities
as their data volume increases.
2) Reliability: A well-designed data pipeline architecture
ensures that data is processed accurately and reliably.
This reduces the risk of errors and inaccuracies in the data.
3) Efficiency: Data pipeline architecture streamlines the
data processing workflow, making it more efficient and
reducing the time and resources required to process data.
4) Flexibility: It allows for the integration of different data
sources and the ability to adapt to changing business
requirements.
5) Security: Data pipeline architecture enables
organizations to implement security measures, such as
encryption and access controls, to protect sensitive data.
6) Data Governance: Data pipeline architecture allows
organizations to implement data governance practices
such as data lineage, data quality, and data cataloging
that help maintain data accuracy, completeness, and
reliability.

Data pipelines can be compared to the plumbing system
in the real world. Both are crucial channels that meet
basic needs, whether it's moving data or water. Both
systems can malfunction and require maintenance. In
many companies, a team of data engineers will design
and maintain data pipelines.
Data pipelines should be automated as much as possible
to reduce the need for manual supervision. However,
even with data automation, businesses may still face
challenges with their data pipelines:
1) Complexity: In large companies, there could be a large
number of data pipelines in operation. Managing and
understanding all these pipelines at scale can be difficult,
such as identifying which pipelines are currently in use,
how current they are, and what dashboards or reports
rely on them. In an environment with multiple data
pipelines, tasks such as complying with regulations and
migrating to the cloud can become more complicated.
2) Cost: Building data pipelines at a large scale can be
costly. Advancements in technology, migration to the
cloud, and demands for more data analysis may all
require data engineers and developers to create new
pipelines. Managing multiple data pipelines may lead to
increased operational expenses as time goes by.
3) Efficiency: Data pipelines may lead to slow query
performance depending on how data is replicated and
transferred within an organization. When there are many
simultaneous requests or large amounts of data, pipelines
can become slow, particularly in situations that involve
multiple data replicas or use data virtualization techniques.
What are data pipeline design patterns?
Data pipeline design patterns are templates used as a
foundation for creating data pipelines. The choice of
design pattern depends on various factors, such as how
data is received, the business use cases, and the data
volume. Some common design patterns include:
1) Raw Data Load: This pattern involves moving and
loading raw data from one location to another, such as
between databases or from an on-premises data center
to the cloud. However, this pattern only covers the
extraction and loading process and can be slow and
time-consuming with large data volumes. It works well
for one-time operations but is not suitable for recurring
situations.
2) Extract, Transform, Load (ETL): This is a widely used
pattern for loading data into data warehouses, lakes, and
operational data stores. It involves the extraction,
transformation, and loading of data from one location to
another. However, most ETL processes use batch
processing, which can introduce latency to operations.
3) Streaming ETL: Similar to the standard ETL pattern
but with data streams as the origin, this pattern uses
tools like Apache Kafka or StreamSets Data Collector
Engine for the complex ETL processes.
4) Extract, Load, Transform (ELT): This pattern is similar
to ETL, but the transformation process happens after the
data is loaded into the target destination, which can
reduce latency. However, this design can affect data
quality and violate data privacy rules.
5) Change Data Capture (CDC): This pattern introduces
freshness to data processed using the ETL batch
processing pattern by detecting changes that occur
during the ETL process and sending them to message
queues for downstream processing.
6) Data Stream Processing: This pattern is suitable for
feeding real-time data to high-performance applications
such as IoT and financial applications. Data is
continuously received from devices, parsed and filtered,
processed, and sent to various destinations like
dashboards for real-time applications.
A minimal sketch of the change data capture idea follows
this list.
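As one illustration, here is a minimal, generic Python sketch of the change data capture idea: poll a source table for rows changed since the last run and hand only those to a downstream consumer. The table, columns and in-memory "queue" are hypothetical placeholders for real systems.

```python
# Minimal change-data-capture (CDC) sketch: pick up only rows changed since the
# last poll and hand them to a downstream queue.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 10.0, "2024-01-01"), (2, 20.0, "2024-01-03")])

def run_cdc_cycle(last_seen_ts, downstream_queue):
    """One CDC cycle: detect rows changed after the watermark and send them on."""
    rows = con.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at", (last_seen_ts,)).fetchall()
    downstream_queue.extend(rows)                  # stand-in for a real message queue
    return rows[-1][2] if rows else last_seen_ts   # advance the watermark

queue = []
watermark = run_cdc_cycle("2024-01-02", queue)  # picks up only the row changed after Jan 2
print(queue, watermark)
```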

Difference between an ETL pipeline and a data pipeline

Both data pipelines and ETL are responsible for
transferring data between sources and storage solutions,
but they do so in different ways. Data pipelines can work
with ongoing data streams in real time. An ETL pipeline,
by contrast, refers to a set of integration-related batch
processes that run on a scheduled basis. ETL jobs extract
data from one or more systems, do basic data
transformations and load the data into a repository for
analytics or operational uses. A data pipeline, on the
other hand, involves a more advanced set of data
processing activities for filtering, transforming and
enriching data to meet user needs. As mentioned above,
a data pipeline can handle batch processing but can also
run in real-time mode, either with streaming data or
triggered by a predetermined rule or set of conditions.
As a result, an ETL pipeline can be seen as one form of a
data pipeline.

Difference between ETL and ELT

ETL focuses more on individual "batches" of data for
more specific purposes. It transforms data before loading
it into the data warehouse and is best for predefined
applications with known transformation requirements.
ELT transforms data after loading it into the data
warehouse and is best for supporting a wide range of
applications with different transformation requirements.
A small code sketch contrasting the two approaches follows.
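To illustrate the difference, the hedged Python sketch below contrasts the two: in the ETL branch the rows are transformed in application code before loading, while in the ELT branch the raw rows are loaded first and the transformation is pushed down to the warehouse (here a local SQLite database standing in for a real warehouse); all table and column names are hypothetical.

```python
# ETL vs. ELT sketch; SQLite stands in for the target data warehouse.
import sqlite3

raw_rows = [("1", " alice ", "10.5"), ("2", " bob ", "20.0")]

def etl(con):
    # ETL: transform first (in a separate processing step), then load the result.
    transformed = [(int(i), name.strip().title(), float(amt)) for i, name, amt in raw_rows]
    con.execute("CREATE TABLE IF NOT EXISTS orders_etl (id INTEGER, customer TEXT, amount REAL)")
    con.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", transformed)

def elt(con):
    # ELT: load the raw data as-is, then transform inside the warehouse with SQL.
    con.execute("CREATE TABLE IF NOT EXISTS orders_raw (id TEXT, customer TEXT, amount TEXT)")
    con.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", raw_rows)
    con.execute("""
        CREATE TABLE IF NOT EXISTS orders_elt AS
        SELECT CAST(id AS INTEGER)   AS id,
               TRIM(customer)        AS customer,
               CAST(amount AS REAL)  AS amount
        FROM orders_raw
    """)

con = sqlite3.connect(":memory:")
etl(con)
elt(con)
```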
Q) What is an ETL pipeline?

Q) Can you describe the components of a typical data pipeline?

1. Storage
One of the first components of a data pipeline is storage.
Storage provides the foundation for all other components,
as it sets up the pipeline for success. It simply acts as a
place to hold big data until the necessary tools are
available to perform more in-depth tasks. The main
function of storage is to provide cost-effective large-scale
storage that scales as the organization's data grows.
2. Preprocessing
The next component of a data pipeline is preprocessing.
This part of the process prepares big data for analysis
and creates a controlled environment for downstream
processes. The goal of preprocessing is to "clean up"
data, which means correcting dirty inputs, unraveling
messy data structures, and transforming unstructured
information into a structured format (like putting all
customer names in the same field rather than keeping
them in separate fields). It also includes identifying and
tagging relevant subsets of the data for different types
of analysis.

3. Analysis
The third component of a data pipeline is analysis,
which provides useful insights into the collected
information and makes it possible to compare new data
with existing big data sets. It also helps organizations
identify relationships between variables in large
datasets to eventually create models that represent
real-world processes.
4. Applications
The fourth component of a data pipeline is applications,
which are specialized tools that provide the necessary
functions to transform processed data into valuable
information. Software such as business intelligence (BI)
tools can help customers quickly build applications on
top of their data. For example, an organization may use
statistical software to analyze big data and generate
reports for business intelligence purposes.

5. Delivery
The final component of a data pipeline is delivery, which
is the final presentation piece used to deliver valuable
information to those who need it. For example, a
company may use web-based reporting tools, SaaS
applications or a BI solution to deliver the content to
end users.
CHAPTER 5: VIRTUALIZATION & CONTAINERIZATION & ELASTICITY IN CLOUD COMPUTING:

What is Docker, and why is it used for containerization?

1) Docker is a containerization platform that is used to
package your application and all its dependencies
together in the form of containers, to make sure that
your application works seamlessly in any environment,
whether development, testing or production.
2) Docker is a tool designed to make it easier to create,
deploy, and run applications by using containers.
3) Docker is the world's leading software container
platform. It was launched in 2013 by a company called
Dotcloud, Inc., which was later renamed Docker, Inc. It
is written in the Go language.
Docker architecture consists of the Docker client, the
Docker Daemon running on the Docker Host, and the
Docker Hub repository. Docker has a client-server
architecture in which the client communicates with the
Docker Daemon running on the Docker Host using a
combination of APIs, Socket.IO, and TCP.
What are the components of Docker?
1) Docker Clients and Servers – Docker has a
client-server architecture. The Docker Daemon/Server
consists of all containers. The Docker Daemon/Server
receives the request from the Docker client through the
CLI or REST APIs and processes the request accordingly.
The Docker client and Daemon can be present on the
same host or on different hosts.
2) Docker Images – Docker images are used to build
Docker containers by using a read-only template.
3) Docker File – A Dockerfile is a text file that contains a
series of instructions on how to build your Docker image.
4) Docker Registries – A Docker Registry is a storage
component for Docker images. We can store the images
in either public or private repositories so that multiple
users can collaborate in building the application.
5) Docker Containers – Docker Containers are runtime
instances of Docker images. Containers contain the
whole kit required for an application, so the application
can be run in an isolated way.
A hedged code sketch showing these components in use
follows this list.
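As a hedged illustration of how these pieces fit together, the sketch below uses the Docker SDK for Python (the docker package) to pull an image from a registry and run it as a container; it assumes a local Docker daemon is running and the docker package is installed, and the image name is just an example.

```python
# Hedged sketch: driving the Docker daemon from Python via the Docker SDK.
# Assumes a running local Docker daemon and `pip install docker`.
import docker

client = docker.from_env()          # Docker client talking to the daemon

image = client.images.pull("alpine:latest")   # fetch an image from a registry
print("pulled image:", image.tags)

# Run a container (a runtime instance of the image) and capture its output.
output = client.containers.run("alpine:latest", "echo hello from a container", remove=True)
print(output.decode().strip())
```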

Advantages of Docker
1. Speed – The speed of Docker containers compared to
a virtual machine is very fast. The time required to build
a container is very short because they are tiny and
lightweight.
2. Portability – The applications that are built inside
Docker containers are extremely portable. These portable
applications can easily be moved anywhere as a single
element and their performance also remains the same.
3. Scalability – Docker can be deployed on several
physical servers, data servers, and cloud platforms. It can
also be run on every Linux machine. Containers can
easily be moved from a cloud environment to a local host
and from there back to the cloud again at a fast pace.
4. Density – Docker uses the available resources more
efficiently because it does not use a hypervisor. This is
the reason that more containers can be run on a single
host as compared to virtual machines. Docker containers
have higher performance because of their high density
and no overhead wastage of resources.
Q) Explain the difference between a Docker container
and a virtual machine.
A VM lets you run a virtual machine on any hardware.
Docker lets you run an application on any operating
system. It uses isolated user-space instances known as
containers. Docker containers have their own file system,
dependency structure, processes, and network
capabilities. The application has everything it requires
inside the container and can run anywhere. Docker
container technology uses the underlying host operating
system resources directly.
What is containerization?
Containerization is a method of virtualizing an operating
system so that multiple isolated applications can run on
a single host operating system. It packages applications
and their dependencies into isolated, self-contained
containers, allowing them to run securely and
independently from each other on the same host. It
provides an efficient way to deploy, manage, and scale
applications across different platforms.
For example, Docker is a popular form of
containerization that allows software developers to
package their applications into standardized isolated
containers. Docker makes it easier for applications to
run on any system, regardless of its underlying
infrastructure. The most significant benefit of
containerization is increased efficiency. Containers allow
applications to run in isolated, secure environments,
improving resource utilization and allowing for more
flexibility in deployment. Additionally, containers make
deploying, scaling, and managing applications easier,
resulting in improved operational agility. The steps are:
1) Package the application and dependencies into a
standard file format, such as a Docker image.
2) Deploy the packaged application and its dependencies
into a container.
3) Execute the containerized application in the container
runtime environment.

Difference between repository and registry

The terms repository and registry may be easily confused
when talking about containers. A container repository is
used to store related images for setup and deployment.
Container repositories can be used to manage, pull or
push images. Container registries store multiple
repositories of container images, as well as storing API
paths and access control rules. Container registries also
have the option of being hosted publicly or privately.

What is a container registry?
A container registry is a collection of repositories made
to store container images. A container image is a file
comprised of multiple layers which can execute
applications in a single instance.
Types of registries
A container registry is a type of tool that can host and
distribute container images.
A container image is a binary file that serves as the
blueprint for executing applications as containers.
Container images aren’t containers themselves; to
create a container, you have to run a container based
on a container image. But container images tell your
container runtime which processes to execute when it
starts a container.

Thus, the role of a container registry is not to run
containers, but rather to provide an efficient,
centralized solution for storing the data that is
necessary for running containers. By allowing teams to
host a virtually unlimited number of container images
in a single place, container registries make it easy for
developers to publish their applications as container
images, and for users to access those images.
1) Public container registries are generally the faster and
easier route when initiating a container registry.
2) Public registries are also seen to be easier to use.
3) However, they may also be less secure than private
registries.
4) They suit smaller teams and work well for standard
and open-sourced images from public registries.

1) A private container registry is set up by the
organization using it.
2) Private registries are either hosted or on-premises and
popular with larger organizations or enterprises that are
more set on using a container registry.
3) Having complete control over the registry in
development allows an organization more freedom in
how they choose to manage it.
4) Private registries are seen to be the more secure option.
What is container security?
Public containers are seen as less secure because
individual container images may contain malicious or
outdated code which, if it goes unpatched, could lead to
a data breach. It may also be unknown who has read or
write access to an image. If an organization's priority is
security when it comes to container registries, then it
should implement a private registry. Other security
approaches for container registries include:
1) Assigning role-based access control (RBAC).
2) Scanning for vulnerabilities in images.
3) Digitally signing images to ensure each image is trusted.
4) Using authentication methods such as access tokens or
JSON key files, similar to how Google's container registry
works.
5) Using Identity and Access Manager (IAM) settings, like
how IBM's Cloud Container Registry does.

Explain elastic resources

1) Elastic resources are applications and infrastructure
that can be summoned on demand when traffic or
workloads get high.
2) Cloud computing businesses such as AWS and Google
Cloud rely on elastic resources as a business model to
bill customers on demand, like a utility bill.
3) This supply-side and demand-side economic cycle is
the underpinning of the cloud ecosystem.
A good example of an elastic resource is an EC2 server.
If a business only requires 2 servers to run its website,
but sees a holiday traffic spike, it can simply allocate
additional elastic resources by increasing the EC2
servers from 2 to 4 to handle the holiday traffic load.
Once that traffic dies down, it can deprovision the
servers back to 2. This is an elastic resource. A minimal
sketch of this scaling decision follows.
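Here is a generic Python sketch of that scaling decision; the thresholds, server counts and capacity figure are hypothetical placeholders for a real cloud provider's auto-scaling API.

```python
# Generic sketch of the elastic scaling decision described above.
# Capacity per server, min/max server counts and request rates are hypothetical.
def desired_server_count(requests_per_second, capacity_per_server=100,
                         min_servers=2, max_servers=10):
    """Scale out when traffic exceeds capacity, scale back in when it drops."""
    needed = -(-requests_per_second // capacity_per_server)  # ceiling division
    return max(min_servers, min(max_servers, needed))

print(desired_server_count(150))  # normal load -> 2 servers
print(desired_server_count(380))  # holiday spike -> scale out to 4 servers
print(desired_server_count(90))   # traffic dies down -> deprovision back to 2
```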
Chapter 6: Managed Machine learning systems:
Q) Compare commercial and open source ML systems:

Summary of differences: ETL vs. ELT

Stands for
ETL: Extract, transform, and load.
ELT: Extract, load, and transform.

Process
ETL: Takes raw data, transforms it into a predetermined format, then loads it into the target data warehouse.
ELT: Takes raw data, loads it into the target data warehouse, then transforms it just before analytics.

Transformation and load locations
ETL: Transformation occurs in a secondary processing server.
ELT: Transformation takes place in the target data warehouse.

Data compatibility
ETL: Best with structured data.
ELT: Can handle structured, unstructured, and semi-structured data.

Speed
ETL: ETL is slower than ELT.
ELT: ELT is faster than ETL as it can use the internal resources of the data warehouse.

Costs
ETL: Can be time-consuming and costly to set up depending on the ETL tools used.
ELT: More cost-efficient depending on the ELT infrastructure used.

Security
ETL: May require building custom applications to meet data protection requirements.
ELT: You can use built-in features of the target database to manage data protection.

ELT vs. ETL

ELT: ELT tools do not require additional hardware.
ETL: ETL tools require specific hardware with their own engines to perform transformations.

ELT: Mostly Hadoop or a NoSQL database is used to store data; rarely an RDBMS.
ETL: An RDBMS is used exclusively to store data.

ELT: As all components are in one system, loading is done only once.
ETL: As ETL uses a staging area, extra time is required to load the data.

ELT: Time to transform data is independent of the size of data.
ETL: The system has to wait for large sizes of data; as the size of data increases, transformation time also increases.

ELT: It is cost effective and available to all businesses using a SaaS solution.
ETL: Not cost effective for small and medium businesses.

ELT: The transformed data is used by data scientists and advanced analysts.
ETL: The transformed data is used by users reading reports and by SQL coders.

ELT: Creates ad hoc views; low cost for building and maintaining.
ETL: Views are created based on multiple scripts; deleting a view means deleting data.

ELT: Best for unstructured and non-relational data; ideal for data lakes; suited for very large amounts of data.
ETL: Best for relational and structured data; better for small to medium amounts of data.
