Cloud Unit 2 22.01.2024
Before proceeding:
This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document through
email in error, please notify the system manager. This document contains
proprietary information and is intended only for the respective group / learning
community as intended. If you are not the addressee, you should not disseminate,
distribute, or copy it through e-mail. Please notify the sender immediately by e-mail
if you have received this document by mistake and delete this document from your
system. If you are not the intended recipient, you are notified that disclosing,
copying, distributing, or taking any action in reliance on the contents of this
information is strictly prohibited.
22CS907
CLOUD FOUNDATIONS
Department : CSE
Created by:
Date : 22.01.2024
1. CONTENTS

S.No.  Contents
1      Contents
2      Course Objectives
3      Pre-Requisites
4      Syllabus
5      Course Outcomes
7      Lecture Plan
9      Lecture Notes
10     Assignments
12     Part B Questions
13     Online Certifications
15     Assessment Schedule
• Pre-requisite Chart: 22IT202 – DATABASE MANAGEMENT SYSTEMS
4. SYLLABUS
22CS907 CLOUD FOUNDATIONS                                L T P C
                                                         2 0 2 3
UNIT I INTRODUCTION TO CLOUD 6+6
Cloud Computing - Cloud Versus Traditional Architecture - IaaS, PaaS, and SaaS - Cloud
Architecture - The GCP Console - Understanding projects - Billing in GCP - Install and
configure Cloud SDK - Use Cloud Shell - APIs - Cloud Console Mobile App.
List of Exercise/Experiments:
1. Install and configure cloud SDK.
2. Connect to computing resources hosted on Cloud via Cloud Shell.
UNIT II COMPUTE AND STORAGE 6+6
Compute options in the cloud - Exploring IaaS with Compute Engine - Configuring elastic
apps with autoscaling - Exploring PaaS - Event driven programs - Containerizing and
orchestrating apps - Storage options in the cloud - Structured and unstructured storage in
the cloud - Unstructured storage using Cloud Storage - SQL managed services - NoSQL
managed services.
List of Exercise/Experiments:
1. Create virtual machine instances of various machine types using the Cloud Console and
the command line. Connect an NGINX web server to your virtual machine.
2. Create a small App Engine application that displays a short message.
3. Create, deploy, and test a cloud function using the Cloud Shell command line.
4. Deploy a containerized application.
5. Create a storage bucket, upload objects to it, create folders and subfolders in it, and make
objects publicly accessible using the Cloud command line.
UNIT III APIs AND SECURITY IN THE CLOUD 6+6
The purpose of APIs – API Services - Managed message services - Introduction to security
in the cloud - The shared security model - Encryption options - Authentication and
authorization with Cloud IAM - Identify Best Practices for Authorization using Cloud IAM.
List of Exercise/Experiments:
1. Deploy a sample API with any of the API services.
2. Publish messages with managed message service using the Python client library.
3. Create two users. Assign a role to a second user and remove assigned roles associated
with Cloud IAM. Explore how granting and revoking permissions works from Cloud Project
Owner and Viewer roles.
UNIT IV NETWORKING, AUTOMATION AND MANAGEMENT TOOLS 6+6
Introduction to networking in the cloud - Defining a Virtual Private Cloud - Public and private
IP address basics - Cloud network architecture - Routes and firewall rules in the cloud -
Multiple VPC networks - Building hybrid clouds using VPNs - Different options for load
balancing - Introduction to Infrastructure as Code - Terraform - Monitoring and management
tools.
List of Exercise/Experiments:
1. Create several VPC networks and VM instances and test connectivity across networks.
2. Create two nginx web servers and control external HTTP access to the web servers using
tagged firewall rules.
3. Configure an HTTP Load Balancer with global backends. Stress test the Load Balancer and
denylist the stress test IP.
4. Create two managed instance groups in the same region. Then, configure and test an
Internal Load Balancer with the instance groups as the backends.
5. Monitor a Compute Engine virtual machine (VM) instance with Cloud Monitoring by
creating uptime check, alerting policy, dashboard and chart.
UNIT V BIG DATA AND MACHINE LEARNING SERVICES 6+6
Introduction to big data managed services in the cloud - Leverage big data operations - Build
Extract, Transform, and Load pipelines - Enterprise Data Warehouse Services - Introduction
to machine learning in the cloud - Building bespoke machine learning models with AI Platform
- Pre-trained machine learning APIs.
List of Exercise/Experiments:
1. Create a cluster, run a simple Apache Spark job in the cluster, then modify the number
of workers in the cluster.
2. Create a streaming pipeline using one of the cloud services.
3. Set up your Python development environment, get the relevant SDK for Python, and run
an example pipeline using the Cloud Console.
4. Use a cloud-based data preparation tool to manipulate a dataset. Import datasets, correct
mismatched data, transform data, and join data.
5. Utilize a cloud-based data processing and analysis tool for data exploration and use a
machine learning platform to train and deploy a custom TensorFlow Regressor model for
predicting customer lifetime value.
TOTAL: 60 PERIODS
5. COURSE OUTCOME

CO    Level  PO-1 PO-2 PO-3 PO-4 PO-5 PO-6 PO-7 PO-8 PO-9 PO-10 PO-11 PO-12 PSO1 PSO2 PSO3
CO1   K3     2    1    1    -    -    -    -    -    -    2     2     2     2    2    2
CO2   K3     3    3    3    -    -    -    -    2    2    2     2     2     2    2    2
CO3   K3     3    3    3    -    -    2    -    2    2    2     2     2     2    2    2
CO4   K3     3    3    3    -    -    -    -    2    2    2     2     2     2    2    2
CO5   K3     3    3    3    -    -    2    -    -    2    2     2     2     2    2    2

Correlation Level:
1. Slight (Low)
2. Moderate (Medium)
3. Substantial (High)
If there is no correlation, put “-“.
7. LECTURE PLAN

Sl.No.  Topic                           Number of  Proposed    Actual  CO   Taxonomy  Mode of
                                        Periods    Date        Date         Level     Delivery
1       Cloud Computing - Cloud Versus  1          03.01.2024          CO1  K1        Chalk & talk
        Traditional Architecture
2       IaaS, PaaS, and SaaS            1          03.01.2024          CO1  K2        Chalk & talk
In cloud computing, the term “compute” describes concepts and objects related to
software computation. It is a generic term used to reference processing power, memory,
networking, storage, and other resources required for the computational success of any
program.
GCP offers a range of compute services, from those that give users full control (e.g.,
Compute Engine) to highly abstracted ones (e.g., Firebase and Cloud Functions), letting
Google take care of more and more of the management and operations along the way.
App Engine: App Engine is a Platform as a Service (PaaS) cloud computing platform for
developing and hosting web applications. It helps build highly scalable web and mobile
backend applications on a fully managed serverless platform, so developers can focus on
writing code without having to manage the underlying infrastructure.
Cloud Functions: It offers scalable, pay-as-you-go Functions as a Service (FaaS) to run
your code with zero server management.
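In the FaaS model, a function is just a small piece of code invoked in response to an event or HTTP request. The sketch below is illustrative only (the `FakeRequest` class is a made-up stand-in; a deployed function would receive a real request object from the platform):

```python
# Minimal sketch of an HTTP-triggered function in the FaaS style.
# We only assume the request object has an .args mapping, so the
# example stays self-contained and runnable without any cloud SDK.
def hello_http(request):
    """Return a greeting, reading an optional 'name' query parameter."""
    name = request.args.get("name", "World") if request else "World"
    return f"Hello, {name}!"

class FakeRequest:
    # Hypothetical stand-in for the platform's real request object.
    args = {"name": "Cloud"}

print(hello_http(FakeRequest()))  # Hello, Cloud!
print(hello_http(None))           # Hello, World!
```

The point of the model is that only the function body is yours; provisioning, scaling, and routing of the request to the function are handled by the platform.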
Cloud Run: Cloud Run provides a serverless managed compute platform to run stateless,
isolated containers without orchestration that can be invoked via web requests or
Pub/Sub events.
Compute Engine is a computing and hosting service that lets you create and run
virtual machines on Google infrastructure
It can run thousands of virtual CPUs on a system that offers quick, consistent
performance.
Compute Engine instances can run the public images for Linux and Windows Server
that Google provides as well as private custom images that you can create or
import from your existing systems
Compute Engine can also run Docker containers, which are automatically launched on
instances running the Container-Optimized OS public image
Google Compute Engine (GCE) is an infrastructure as a service (IaaS) offering that
allows clients to run workloads on Google's physical hardware.
Google Compute Engine provides a scalable number of virtual machines (VMs) to
serve as large compute clusters for that purpose.
Secure and customizable compute service that lets you create and run virtual
machines on Google’s infrastructure.
GCE can be managed through a RESTful application program interface (API),
command line interface or web console.
An instance is a virtual machine (VM) hosted on Google's infrastructure. You can create
an instance or create a group of managed instances by using the Google Cloud console,
the Google Cloud CLI, or the Compute Engine API.
Machine Properties
You can choose the machine properties of your instances, such as the number of virtual
CPUs and the amount of memory, by using a set of predefined machine types or by
creating your own custom machine types.
When you create a VM, you select a machine type from a machine family that determines
the resources available to that VM. There are several machine families you can choose
from and each machine family is further organized into machine series and predefined
machine types within each series.
Machine family: A curated set of processor and hardware configurations optimized for
specific workloads. When you create a VM instance, you choose a predefined or custom
machine type from your preferred machine family.
Series: Machine families are further classified by series and generation. For example,
the N1 series within the general-purpose machine family is the older version of the N2
series. Generally, generations of a machine series use a higher number to describe the
newer generation. For example, the N2 series is the newer generation of the N1 series.
Machine type: Every machine series has predefined machine types that provide a set
of resources for your VM. If a predefined machine type does not meet your needs, you
can also create a custom machine type.
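The family / series / type hierarchy is visible in the machine type name itself. As an illustrative sketch (not an official tool, and ignoring special cases such as shared-core types like e2-small), a predefined name such as "n2-standard-4" encodes the series, the type class, and the vCPU count:

```python
# Illustrative parser for common predefined machine type names of the
# form <series>-<class>-<vCPUs>, e.g. "n2-standard-4" or "c2d-highcpu-112".
def parse_machine_type(name: str) -> dict:
    series, type_class, vcpus = name.split("-")
    return {"series": series, "class": type_class, "vcpus": int(vcpus)}

print(parse_machine_type("n2-standard-4"))
# {'series': 'n2', 'class': 'standard', 'vcpus': 4}
```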
Machine Families
General Purpose
Compute-optimized
Memory-optimized
Accelerator-optimized
General purpose machine family:
The general-purpose machine family has the best price-performance with the most
flexible vCPU to memory ratios, and provides features that target most standard and
cloud-native workloads.
The general-purpose machine family has predefined and custom machine types to align
with your workload, depending on your requirements. E2, E2 shared-core, N2, N2D, and
Tau T2D are second-generation machine series in this family; N1 and its related
shared-core machine types are first-generation machine series.
E2 machine series
E2 Limitations: The E2 series does not support GPUs, local SSDs, sole-tenant nodes, or
nested virtualization.
Tau T2D machine series
The Tau T2D machine series runs on the third generation AMD EPYC Milan processor with
a base frequency of 2.45 GHz, an effective frequency of 2.8 GHz, and a max boost
frequency of 3.5 GHz. This series has predefined machine types of up to 60 vCPUs,
supports 4 GB of memory per vCPU, and offers up to 32 Gbps of network egress. It also
supports committed use discounts and reservations.
Compute-optimized machine family:
C2 machine series
The C2 machine series provides full transparency into the architecture of the underlying
server platforms, letting you fine-tune the performance. Machine types in this series offer
much more computing power, and are generally more robust for compute-intensive
workloads compared to N1 high-CPU machine types.
The C2 series comes in different machine types ranging from 4 to 60 vCPUs, and offers
up to 240 GB of memory. You can attach up to 3TB of local storage to these VMs for
applications that require higher storage performance.
The C2 series also supports 50 Gbps and 100 Gbps high-bandwidth network
configurations.
This series also delivers a greater than 40% performance improvement compared to
the previous generation N1 machines, and offers higher performance per thread and
isolation for latency-sensitive workloads.
C2D machine series
The C2D machine series provides the largest VM sizes and is best suited for
high-performance computing (HPC). The C2D series also has the largest available
last-level cache (LLC) per core.
The C2D machine series comes in different machine types ranging from 2 to 112 vCPUs,
and offers up to 896 GB of memory. You can attach up to 3 TB of local storage to these
machine types for applications that require higher storage performance.
Memory-optimized machine family:
The memory-optimized machine family provides the most compute and memory
resources of any Compute Engine machine family offering. These machine types are ideal
for workloads that require higher memory-to-vCPU ratios than the high-memory machine
types in the general-purpose N1 machine series.
The M1 machine series has up to 4TB of memory, while the M2 machine series has up to
12TB of memory. These machine series are well-suited for large in-memory databases
such as SAP HANA, as well as in-memory data analytics workloads.
Both M1 and M2 machine series offer the lowest cost per GB of memory on Compute
Engine, making them a great choice for workloads that utilize higher memory
configurations with low compute resources requirements.
M1 machine series
The M1 machine series is the older generation memory-optimized machine series,
offering 14.9 to 24 GB of memory per vCPU. This series offers the m1-ultramem and
m1-megamem machine types and is only available in specific regions and zones.
M1 Limitations
The M1 machine series is only available as predefined machine types. This series offers
from 14 GB to 28 GB of memory per vCPU. The following restrictions apply:
M2 machine series
With the addition of 6 TB and 12 TB machine types to the M2 machine series, SAP
customers can run their largest SAP HANA databases on Google Cloud. The M2 series is
available with on-demand pricing for an evaluation period only.
M2 Limitations
The M2 machine series is only available as predefined machine types. This series offers
from 14 GB to 28 GB of memory per vCPU. The following restrictions apply:
You cannot use regional persistent disks with the M2 machine series.
The M2 machine series are only available in select zones and regions on specific
CPU processors.
Accelerator-optimized machine family:
The accelerator-optimized machine family features NVIDIA's Ampere A100 GPUs and
is a new machine family available on Compute Engine. Machine types in this family are
optimized for massively parallelized Compute Unified Device Architecture (CUDA)
workloads, such as machine learning (ML) and high performance computing (HPC).
Cloud storage. Persistent disks feature high-performance block storage that lets users
take snapshots and create new persistent disks from the snapshot.
Confidential VMs. These VMs enable users to encrypt data while it's being processed
without negatively affecting performance.
Custom machine types. Users can customize VMs to suit business needs and optimize
cost effectiveness.
Global load balancing. This feature distributes workloads across multiple instance
regions to improve performance, throughput, and availability.
GPU accelerators. Users can add GPUs to speed up computationally intensive workloads
like virtual workstation applications and machine learning. Customers pay for GPU
resources only while using them.
Instance groups. These VM clusters run a single application and automatically manage
updates.
Live migration for VMs. VMs can migrate between host machines without rebooting.
This feature enables applications to continue running during maintenance.
Local solid-state drives. These local SSDs are always encrypted and physically
attached to the host server. They have low latency compared to persistent disks.
Operating system (OS) support. Users can run a number of different OSes, including
Debian, CentOS, Red Hat Enterprise Linux, SUSE, Ubuntu and Windows Server. GCE also
includes patch management for OSes.
Payment. GCE offers per-second billing and committed use discounts with no upfront
costs or instance lock-in.
Sole-tenant nodes. These nodes are GCE servers dedicated to one tenant. They make
it easier to deploy bring-your-own-license (BYOL) applications and allow the same
machine types and VM configurations as standard compute instances.
Spot VMs. These are affordable instance options used for fault-tolerant workloads and
batch jobs. They help users cut costs, but they can be prone to service interruptions.
Spot VMs come with the same capabilities and machine types as standard VMs.
Virtual machine manager. GCE comes with the VM manager, which helps users
manage OSes for large collections of VMs. GCE also provides right-sizing
recommendations to help customers use resources efficiently.
There are many reasons organizations use Google Compute Engine, including the
following:
AUTOSCALING
Managed instance groups (MIGs) let you operate apps on multiple identical VMs.
You can make your workloads scalable and highly available by taking advantage
of automated MIG services, including: autoscaling, autohealing, regional (multiple
zone) deployment, and automatic updating.
Unmanaged instance groups let you load balance across a fleet of VMs that you
manage yourself.
Autoscaling works with managed instance groups (MIGs) only. Unmanaged
instance groups are not supported.
Autoscaling works by adding more VMs to your MIG when there is more load
(scaling out, sometimes referred to as scaling up), and deleting VMs when the
need for VMs is lowered (scaling in or down).
Autoscaling lets your apps gracefully handle increases in traffic, and it reduces cost
when the need for resources is lower.
You can autoscale a MIG based on its CPU utilization, Cloud Monitoring metrics,
schedules, or load balancing serving capacity.
You define the autoscaling policy and the autoscaler performs automatic scaling
based on the measured load and the options you configure.
When you set up an autoscaler to scale based on load balancing serving capacity,
the autoscaler watches the serving capacity of an instance group and scales when
the VM instances are over or under capacity. The serving capacity of an instance
can be defined in the load balancer's backend service and can be based on either
utilization or requests per second.
Autoscaling policy:
When you define an autoscaling policy for your group, you specify one or more signals
that the autoscaler uses to scale the group. When you set multiple signals in a policy, the
autoscaler calculates the recommended number of VMs for each signal and sets your
group's recommended size to the largest number.
You can autoscale based on one or more metrics that reflect the load of the instance
group, such as CPU utilization, Cloud Monitoring metrics, or load balancing serving
capacity.
The autoscaler continuously collects usage information based on the selected utilization
metric, compares actual utilization to your desired target utilization, and uses this
information to determine whether the group needs to remove instances (scale in) or add
instances (scale out).
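The target-utilization behaviour described above can be sketched in a few lines. This is a simplified illustration of the documented sizing rule, not the actual autoscaler code: for each signal the recommendation is roughly ceil(current instances × actual utilization ÷ target utilization), and with multiple signals the largest recommendation wins.

```python
import math

# Simplified sketch of MIG autoscaler sizing. Utilizations are given as
# integer percentages to keep the arithmetic exact.
def recommended_size(current_instances, signals):
    """signals: list of (actual_pct, target_pct) pairs, one per signal."""
    recs = [math.ceil(current_instances * actual / target)
            for actual, target in signals]
    # With multiple signals, the group takes the largest recommendation.
    return max(recs)

# 4 VMs running at 90% CPU against a 60% target -> scale out to 6.
print(recommended_size(4, [(90, 60)]))            # 6
# Two signals: the busier one determines the group size.
print(recommended_size(4, [(90, 60), (30, 60)]))  # 6
```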
Schedules:
You can use schedule-based autoscaling to allocate capacity for anticipated loads. You
can have up to 128 scaling schedules per instance group. For each scaling schedule, you
specify the required number of instances, a start time, and a duration.
Each scaling schedule is active from its start time and for the configured duration. During
this time, autoscaler scales the group to have at least as many instances as defined by
the scaling schedule.
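The schedule semantics above can be sketched as follows. This is an illustrative model only (the tuple layout for a schedule is an assumption for the example, not the real API): while a schedule's window is active, the group must have at least that schedule's required instances, and overlapping schedules take the maximum.

```python
from datetime import datetime

# Toy model of schedule-based autoscaling.
# Each schedule: (start, duration_seconds, required_instances).
def min_required_instances(schedules, now):
    active = [req for start, duration, req in schedules
              if start <= now and (now - start).total_seconds() < duration]
    # No active schedule means the schedules impose no minimum.
    return max(active, default=0)

business_hours = (datetime(2024, 1, 22, 9, 0), 8 * 3600, 10)
nightly_batch = (datetime(2024, 1, 22, 22, 0), 2 * 3600, 4)
schedules = [business_hours, nightly_batch]

print(min_required_instances(schedules, datetime(2024, 1, 22, 11, 0)))  # 10
print(min_required_instances(schedules, datetime(2024, 1, 22, 23, 0)))  # 4
```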
App Engine is a fully managed, serverless platform for developing and hosting web
applications at scale. You can choose from several popular languages, libraries, and
frameworks to develop your apps, and then let App Engine take care of provisioning
servers and scaling your app instances based on demand.
Google App Engine (often referred to as GAE or simply App Engine) is a cloud
computing platform as a service for developing and hosting web applications in
Google-managed data centers.
Applications are sandboxed and run across multiple servers. App Engine offers
automatic scaling for web applications: as the number of requests increases for
an application, App Engine automatically allocates more resources for the web
application to handle the additional demand.
Google App Engine primarily supports Go, PHP, Java, Python, Node.js, .NET, and
Ruby applications, although it can also support other languages via "custom
runtimes".
The service is free up to a certain level of consumed resources, but only in the standard
environment, not in the flexible environment. Fees are charged for additional storage,
bandwidth, or instance hours required by the application.
You can run your applications in App Engine by using the App Engine flexible environment
or the App Engine standard environment. You can also choose to simultaneously use both
environments for your application and allow your services to take advantage of each
environment's individual benefits.
App Engine is well suited to applications that are designed using a microservice
architecture, especially if you decide to utilize both environments. Use the following
sections to learn and understand which environment best meets your application's needs.
Standard environment
The standard environment is optimal for applications with the following characteristics:
1. Source code is written in specific versions of the supported programming
languages:
Python 2.7, Python 3.7, Python 3.8, Python 3.9, and Python 3.10
Java 8, Java 11, and Java 17
Node.js 10, Node.js 12, Node.js 14, Node.js 16
PHP 5.5, PHP 7.2, PHP 7.3, PHP 7.4, and PHP 8.1
Ruby 2.5, Ruby 2.6, Ruby 2.7, and Ruby 3.0
Go 1.11, Go 1.12, Go 1.13, Go 1.14, Go 1.15, and Go 1.16
2. Intended to run for free or at very low cost, where you pay only for what you need
and when you need it. For example, your application can scale to 0 instances when
there is no traffic.
3. Experiences sudden and extreme spikes of traffic which require immediate
scaling.
Flexible environment
App Engine allows developers to focus on what they do best: writing code. Based on
Compute Engine, the App Engine flexible environment automatically scales your app up
and down while also balancing the load.
The flexible environment is optimal for applications that need the following features:
• Customizable infrastructure
• Performance options
• Native feature support
An App Engine app is created under your Google Cloud project when you create an
application resource. The App Engine application is a top-level container that includes the
service, version, and instance resources that make up your app. When you create your
App Engine app, all your resources are created in the region that you choose, including
your app code along with a collection of settings, credentials, and your app's metadata.
Each App Engine application includes at least one service, the default service, which can
hold many versions, depending on your app's billing status.
The following diagram illustrates the hierarchy of an App Engine app running with multiple
services. In this diagram, the app has two services that contain multiple versions, and
two of those versions are actively running on multiple instances:
Services
Use services in App Engine to factor your large apps into logical components that can
securely share App Engine features and communicate with one another. Generally, your
App Engine services behave like microservices. Therefore, you can run your whole app
in a single service, or you can design and deploy multiple services to run as a set of
microservices.
Versions
Having multiple versions of your app within each service allows you to quickly switch
between different versions of that app for rollbacks, testing, or other temporary events.
You can route traffic to one or more specific versions of your app by migrating or
splitting traffic.
Instances
The versions within your services run on one or more instances. By default, App Engine
scales your app to match the load. Your apps will scale up the number of instances that
are running to provide consistent performance, or scale down to minimize idle instances
and reduce costs. For more information about instances, see How Instances are
Managed.
Application requests
Each of your app's services and each of the versions within those services must have a
unique name. You can then use those unique names to target and route traffic to
specific resources using URLs, for example:
https://VERSION-dot-SERVICE-dot-PROJECT_ID.REGION_ID.r.appspot.com
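As an illustrative helper (the concrete version, service, project, and region values below are made-up placeholders), the URL pattern above can be assembled like this:

```python
# Build the URL that targets a specific version of a specific App Engine
# service, following the VERSION-dot-SERVICE-dot-PROJECT_ID pattern.
def app_engine_url(version, service, project_id, region_id):
    return (f"https://{version}-dot-{service}-dot-"
            f"{project_id}.{region_id}.r.appspot.com")

print(app_engine_url("v2", "api", "my-sample-project", "uc"))
# https://v2-dot-api-dot-my-sample-project.uc.r.appspot.com
```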
Limits
The maximum number of services and versions that you can deploy depends on your
app's pricing:
There is also a limit to the number of instances for each service with basic or manual
scaling:
There is also a limit to the number of characters in the URL of your application.
CLOUD FUNCTIONS:
Key features
Cloud Functions has a simple and intuitive developer experience. Just write your code
and let Google Cloud handle the operational infrastructure. Develop faster by writing and
running small code snippets that respond to events. Streamline challenging orchestration
problems by connecting Google Cloud products to one another or third-party services
using events.
Use open source FaaS (function as a service) framework to run functions across multiple
environments and prevent lock-in. Supported environments include Cloud Functions, local
development environment, on-premises, Cloud Run, and other Knative-based serverless
environments.
Containers
Containers make programs independent of the underlying operating system. They are
deployed on compatible operating systems, and a single host can run multiple
containers.
Containers essentially are a simple way to deploy and use cloud-based services.
However, in practical application, one may end up with a great many containers, and
managing them manually might get too taxing.
Deploying, managing, connecting, and updating those many containers would
need a separate department or a dedicated team- which would make the process
inefficient.
When you run a GKE cluster, you also gain the benefit of advanced cluster management
features that Google Cloud provides. GKE offers two modes of operation:
Standard mode: This is the original mode of operation that came out with GKE and is
still used today. Here the user gets node configuration flexibility and full control over
managing the clusters and node infrastructure. It is best suited for those looking to have
full control over every little aspect of their GKE experience.
Autopilot mode: In this mode, the entire management of node and cluster infrastructure
is handled by Google, providing a more hands-off approach. However, it comes with some
restrictions to keep in mind: the choice of operating system is currently limited to just
two, and most features are available only via the CLI.
Use GKE when you want to provide developers architectural flexibility or minimize
operational costs. While GKE is the best-managed offering, you’ll need in-house resources
to manage your Kubernetes clusters.
Google provides horizontal pod autoscaling, which adjusts the number of pods based on
CPU utilization and other custom metrics, and vertical pod autoscaling, which adjusts
the CPU and memory allocated to individual pods based on usage.
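The core of horizontal pod autoscaling is a single documented formula: the desired replica count is the current count scaled by the ratio of the observed metric to its target, rounded up. A minimal sketch:

```python
import math

# Kubernetes HPA sizing rule (simplified):
# desired = ceil(current_replicas * current_metric / target_metric)
def desired_replicas(current_replicas, current_metric, target_metric):
    return math.ceil(current_replicas * current_metric / target_metric)

print(desired_replicas(3, 80, 50))  # 5  (scale out under load)
print(desired_replicas(6, 20, 50))  # 3  (scale in when idle)
```

In practice the real autoscaler adds tolerances and stabilization windows around this rule so small metric fluctuations do not cause constant resizing.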
Kubernetes Applications:
Google provides pre-built applications that are enterprise-ready, with included licensing,
billing, and portability. Such applications increase the productivity of the user by
reducing the setup work required.
Logging and Monitoring:
GKE offers Cloud Logging and Cloud Monitoring with simple checkbox configurations,
which makes it easier to gain insight into how an application is running.
Fully Managed:
GKE clusters are fully managed by Google Site Reliability Engineers (SREs), ensuring that
the cluster is available and up to date.
CLOUD RUN
Build and deploy scalable containerized apps using your favorite language (Go, Python,
Java, Node.js, .NET) on a fully managed serverless platform.
You can deploy code written in any programming language on Cloud Run if you can build
a container image from it. In fact, building container images is optional. If you're using
Go, Node.js, Python, Java, .NET Core, or Ruby, you can use the source-based deployment
option that builds the container for you, using the best practices for the language you're
using.
Google has built Cloud Run to work well together with other services on Google Cloud,
so you can build full-featured applications.
In short, Cloud Run allows developers to spend their time writing their code, and very
little time operating, configuring, and scaling their Cloud Run service. You don't have to
create a cluster or manage infrastructure in order to be productive with Cloud Run.
On Cloud Run, your code can either run continuously as a service or as a job. Both
services and jobs run in the same environment and can use the same integrations with
other services on Google Cloud.
1. Cloud Run services. Used to run code that responds to web requests, or events.
2. Cloud Run jobs. Used to run code that performs work (a job) and quits when the
work is done.
Scale to zero is attractive for economic reasons since you're charged for the CPU and
memory allocated to a container instance with a granularity of 100ms. If you don't
configure minimum instances, you're not charged if your service is not used.
Request-based
If a container instance is not processing requests, the CPU is not allocated, and you're
not charged. Additionally, you pay a per-request fee.
Instance-based
You're charged for the entire lifetime of a container instance and the CPU is always
allocated. There's no per-request fee.
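A back-of-the-envelope comparison makes the difference between the two billing models concrete. The rates below are made-up placeholders, not real Cloud Run prices; the shape of the calculation is what matters:

```python
# Illustrative comparison of the two billing models for one container
# instance over an hour (all prices hypothetical).
def request_based_cost(busy_seconds, requests, cpu_rate, req_fee):
    # CPU is billed only while requests are being processed,
    # plus a small per-request fee.
    return busy_seconds * cpu_rate + requests * req_fee

def instance_based_cost(lifetime_seconds, cpu_rate):
    # CPU is always allocated: billed for the instance's entire lifetime.
    return lifetime_seconds * cpu_rate

CPU_RATE = 0.000024   # hypothetical $ per vCPU-second
REQ_FEE = 0.0000004   # hypothetical $ per request

# One-hour lifetime, but only 5 minutes actually serving 1000 requests:
print(request_based_cost(300, 1000, CPU_RATE, REQ_FEE))
print(instance_based_cost(3600, CPU_RATE))
# Request-based billing is far cheaper for this mostly idle service.
```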
STORAGE OPTIONS IN THE CLOUD
Cloud Storage allows world-wide storage and retrieval of any amount of data at any time.
You can use Cloud Storage for a range of scenarios including serving website content,
storing data for archival and disaster recovery, or distributing large data objects to users
via direct download.
The storage options that GCP provides us are: Google Cloud Storage, Google Cloud
Bigtable, Google Cloud SQL, Google Cloud Spanner, Cloud Datastore, and Cloud Firestore.
STRUCTURED AND UNSTRUCTURED DATA STORAGE IN THE CLOUD
The Google storage and database services can be put into two categories: structured
and unstructured.
Structured data: If the data can be organized in a structural format like rows and
columns, it is known as structured data. It comes in various sizes, latencies, and costs
based on the requirement.
Unstructured data: This is a sequence of bytes that could be from a video, image, or
document. The data is stored as objects in buckets, and no insight can be gained from
unstructured data directly. Google Cloud Storage and Cloud Firestore are used to store
unstructured data in the Google Cloud Platform.
CLOUD STORAGE
Google Cloud Storage is the object storage service offered by Google Cloud. It provides
some interesting features such as object versioning or fine-grain permissions (per object
or bucket), that can make development easy and help reduce operational overheads. It
also serves as the foundation of several different services.
Google Cloud Storage offers developers and IT organizations durable and highly available
object storage. It assesses no minimum fee; you pay only for what you use. Prior
provisioning of capacity isn’t necessary.
Cloud Storage is a service for storing your objects in Google Cloud. An object is an
immutable piece of data consisting of a file of any format. You store objects in containers
called buckets. All buckets are associated with a project, and you can group your projects
under an organization. Each project, bucket, and object in Google Cloud is a resource in
Google Cloud, as are things such as Compute Engine instances.
After you create a project, you can create Cloud Storage buckets, upload objects to your
buckets, and download objects from your buckets. You can also grant permissions to
make your data accessible to principals you specify, or - for certain use cases such as
hosting a website - accessible to everyone on the public internet.
Cloud Storage is one of the many storage options on GCP. It stores and serves object
data, also known as blob data; you can store an unlimited number of objects, each up
to 5 terabytes in size.
Cloud Storage is well suited for binary or object data such as images, media serving,
and backups. It is the same storage technology used for images in Google Photos, Gmail
attachments, Google Docs, and so on.
Users have a variety of storage requirements across a multitude of use cases. To cater
to these requirements, Google Cloud offers different classes of Cloud Storage based on
how often the data is accessed: Standard, Nearline, Coldline, and Archive.
Cloud Storage organizes files into buckets. When you create a bucket, you give it a
globally unique name and a geographic location where the bucket and its contents are
stored, and you choose one of the default storage classes.
There are several ways to control user access to your objects and buckets. For most
purposes, Cloud IAM is sufficient; roles are inherited from project to bucket to object.
If you need finer control, you can create access control lists (ACLs), which define who
has access to your buckets and objects, as well as what level of access they have.
Each ACL consists of two pieces of information: a scope, which defines who can perform
the specified actions, and a permission, which defines what actions can be performed
(for example, read or write).
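The scope-plus-permission structure of an ACL entry can be modeled directly. This is a simplified sketch: real Cloud Storage ACL scopes include specific users, groups, and the special allUsers identifier, and the permission names below mirror the READER/WRITER/OWNER roles.

```python
# Each ACL entry pairs a scope (who) with a permission (what).
acl = [
    ("user:alice@example.com", "OWNER"),
    ("group:devs@example.com", "WRITER"),
    ("allUsers", "READER"),
]

# Permission ordering: OWNER implies WRITER, which implies READER.
LEVELS = {"READER": 1, "WRITER": 2, "OWNER": 3}

def can(acl, scope, needed):
    """Return True if the scope's granted level covers the needed level."""
    granted = max((LEVELS[p] for s, p in acl if s in (scope, "allUsers")), default=0)
    return granted >= LEVELS[needed]

print(can(acl, "user:alice@example.com", "WRITER"))    # True (OWNER covers it)
print(can(acl, "user:mallory@example.com", "WRITER"))  # False (only public READER)
```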
If you want, you can turn on object versioning on your buckets. Cloud Storage then
keeps a history of modifications (overwrites and deletes) for all objects in the bucket.
You can list the archived versions of an object, restore an object to an older state,
or permanently delete a version as needed.
If you don't turn on object versioning, new data always overwrites the old.
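The versioning behavior can be sketched as a toy Python model (conceptual only, not the Cloud Storage API): overwrites and deletes archive the previous generation instead of destroying it.

```python
class VersionedBucket:
    """Toy model of a bucket with object versioning enabled."""
    def __init__(self):
        self.live = {}      # object name -> current data
        self.archived = {}  # object name -> list of older generations

    def upload(self, name, data):
        if name in self.live:  # an overwrite archives the old generation
            self.archived.setdefault(name, []).append(self.live[name])
        self.live[name] = data

    def delete(self, name):
        # A delete archives the live generation rather than destroying it.
        self.archived.setdefault(name, []).append(self.live.pop(name))

    def restore(self, name):
        """Restore the most recent archived generation as the live object."""
        self.upload(name, self.archived[name].pop())

b = VersionedBucket()
b.upload("report.txt", b"v1")
b.upload("report.txt", b"v2")   # v1 is archived, not lost
b.delete("report.txt")          # v2 is archived too
b.restore("report.txt")         # live again as v2
print(b.live["report.txt"])     # b'v2'
print(b.archived["report.txt"]) # [b'v1']
```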
Buckets
Buckets are the basic containers that hold your data. Everything that you store in Cloud
Storage must be contained in a bucket. You can use buckets to organize your data and
control access to your data, but unlike directories and folders, you cannot nest buckets.
There is no limit to the number of buckets you can have in a project or location.
There are, however, limits to the rate you can create or delete buckets.
When you create a bucket, you give it a globally unique name and a geographic location
where the bucket and its contents are stored. The name and location of the bucket cannot
be changed after creation, though you can delete and re-create the bucket to achieve a
similar result. There are also optional bucket settings that you can configure during bucket
creation and change later.
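The "globally unique name" constraint can be illustrated with a simplified validity check in Python. This covers only a subset of the published bucket-naming rules (the full rules also handle dots, "goog" prefixes, and IP-address-like names), and it cannot check global uniqueness, which only the service itself can verify.

```python
import re

# Simplified check based on Cloud Storage's published bucket-naming rules:
# 3-63 characters; lowercase letters, digits, dashes, and underscores;
# must start and end with a letter or number.
NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]$")

def is_valid_bucket_name(name: str) -> bool:
    return bool(NAME_RE.match(name))

print(is_valid_bucket_name("my-app-backups-2024"))  # True
print(is_valid_bucket_name("My_Bucket"))            # False (uppercase)
print(is_valid_bucket_name("ab"))                   # False (too short)
```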
Storage classes
The storage class set for an object affects the object's availability and pricing model.
When you create a bucket, you can specify a default storage class for the bucket. When
you add objects to the bucket, they inherit this storage class unless explicitly set
otherwise.
Cloud Storage lets you choose among four storage classes: Standard, Nearline, Coldline,
and Archive. Standard storage is high-performance object storage for frequently accessed
data, whereas Nearline, Coldline, and Archive are progressively lower-cost options for
backup and archival data. All of the storage classes are accessed in analogous ways
using the Cloud Storage API, and they all offer millisecond access times.
Standard storage
Standard storage is best for data that is frequently accessed ("hot" data) and/or
stored for only brief periods of time.
When used in a region, Standard storage is appropriate for storing data in the
same location as Google Kubernetes Engine clusters or Compute Engine
instances that use the data. Co-locating your resources maximizes the
performance for data-intensive computations and can reduce network charges.
When used in a dual-region, you still get optimized performance when accessing
Google Cloud products that are located in one of the associated regions, but you
also get the improved availability that comes from storing data in geographically
separate locations.
When used in a multi-region, Standard storage is appropriate for storing data
that is accessed around the world, such as serving website content, streaming
videos, executing interactive workloads, or serving data supporting mobile and
gaming applications.
Nearline storage
Nearline storage is a low-cost, highly durable storage service for storing infrequently
accessed data. Nearline storage is a better choice than Standard storage in scenarios
where slightly lower availability, a 30-day minimum storage duration, and costs for data
access are acceptable trade-offs for lowered at-rest storage costs.
Nearline storage is ideal for data you plan to read or modify on average once per month
or less. For example, if you want to continuously add files to Cloud Storage and plan to
access those files once a month for analysis, Nearline storage is a great choice.
Nearline storage is also appropriate for data backup, long-tail multimedia content, and
data archiving. Note, however, that for data accessed less frequently than once a
quarter, Coldline storage or Archive storage are more cost-effective, as they offer lower
storage costs.
Coldline storage
Coldline storage is a very-low-cost, highly durable storage service for storing infrequently
accessed data. Coldline storage is a better choice than Standard storage or Nearline
storage in scenarios where slightly lower availability, a 90-day minimum storage duration,
and higher costs for data access are acceptable trade-offs for lowered at-rest storage
costs.
Coldline storage is ideal for data you plan to read or modify at most once a quarter. Note,
however, that for data being kept entirely for backup or archiving purposes, Archive
storage is more cost-effective, as it offers the lowest storage costs.
Archive storage
Archive storage is the lowest-cost, highly durable storage service for data archiving,
online backup, and disaster recovery. Unlike the "coldest" storage services offered by
other Cloud providers, your data is available within milliseconds, not hours or days.
Like Nearline storage and Coldline storage, Archive storage has a slightly lower availability
than Standard storage. Archive storage also has higher costs for data access and
operations, as well as a 365-day minimum storage duration. Archive storage is the best
choice for data that you plan to access less than once a year.
For example:
Cold data storage - Archived data, such as data stored for legal or regulatory reasons,
can be stored at low cost as Archive storage, yet still be available if you need it.
Disaster recovery - In the event of a disaster recovery event, recovery time is key. Cloud
Storage provides low latency access to data stored as Archive storage.
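The access-frequency guidance in the class descriptions above can be condensed into a small helper function. The thresholds follow the monthly/quarterly/yearly rules of thumb from the text; the function itself is illustrative and not part of any GCP SDK.

```python
def suggest_storage_class(accesses_per_year: float) -> str:
    """Map expected access frequency to a storage class, per the guidance
    above: Standard for hot data, Nearline for roughly monthly access,
    Coldline for at most quarterly, Archive for less than once a year."""
    if accesses_per_year > 12:
        return "Standard"
    if accesses_per_year > 4:
        return "Nearline"
    if accesses_per_year >= 1:
        return "Coldline"
    return "Archive"

print(suggest_storage_class(365))  # daily reads        -> Standard
print(suggest_storage_class(12))   # monthly reads      -> Nearline
print(suggest_storage_class(4))    # quarterly reads    -> Coldline
print(suggest_storage_class(0.5))  # every other year   -> Archive
```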
Project: Example Inc. is building several applications, and each one is associated with
a project. Each project has its own set of enabled APIs, as well as other resources.
Bucket: Each project can contain multiple buckets, which are containers to store your
objects. For example, you might create a photos bucket for all the image files your app
generates and a separate videos bucket.
Here are some basic ways you can interact with Cloud Storage:
Console: The Google Cloud console provides a visual interface for you to manage
your data in a browser.
gsutil: gsutil is a command-line tool that allows you to interact with Cloud
Storage through a terminal. If you use other Google Cloud services, you can
download the Google Cloud CLI, which includes gsutil along with the gcloud tool
for other services.
Client libraries: The Cloud Storage client libraries allow you to manage your data
using one of your preferred languages, including C++, C#, Go, Java, Node.js,
PHP, Python, and Ruby.
REST APIs: Manage your data using the JSON or XML API.
Identity and Access Management: Use IAM to control who has access to the
resources in your Google Cloud project. Resources include Cloud Storage buckets and
objects, as well as other Google Cloud entities such as Compute Engine instances. You
can grant principals certain types of access to buckets and objects, such as update,
create, or delete.
Data encryption: Cloud Storage uses server-side encryption to encrypt your data by
default. You can also use supplemental data encryption options such as customer-
managed encryption keys and customer-supplied encryption keys.
Authentication: Ensure that anyone who accesses your data has proper credentials.
Bucket Lock: Govern how long objects in buckets must be retained by specifying a
retention policy.
Key Features
Fully managed relational database service for MySQL, PostgreSQL, and SQL Server with
rich extension collections, configuration flags, and developer ecosystems.
Google Cloud SQL offers the flexibility to set up database infrastructure without
managing it yourself. If you have existing databases running in MySQL, SQL Server,
or PostgreSQL, you can conveniently migrate them to Cloud SQL.
Cloud SQL supports three database engines:
1. MySQL - provided by Oracle.
2. PostgreSQL - an open-source object-relational database.
3. SQL Server - provided by Microsoft.
SQL Server is a relational database management software product that helps store and
extract data when requested by the applications. These applications can be running on
the same system or a network of systems spread across a network.
CLOUD SQL
Cloud SQL is a fully managed database service that helps you set up, maintain, manage,
and administer your relational databases on Google Cloud Platform.
You can use Cloud SQL with MySQL, PostgreSQL, or SQL Server.
Cloud SQL is an easy-to-use service that delivers fully managed relational databases. It
lets you keep your focus on building your application rather than on tedious database
management tasks such as applying patches and updates, managing backups, and configuring
replication. If you are a startup in its initial phase with a small DevOps team and need
a relational database, Cloud SQL is a good fit.
Cloud SQL offers many services so you don't have to build and maintain them yourself.
You can focus on your data and let Cloud SQL handle the following operations:
Backups
High availability and failover
Network connectivity
Export and import
Maintenance and updates
Monitoring
Logging
Each Cloud SQL instance is powered by a virtual machine (VM) running on a host Google
Cloud server. Each VM operates the database program, such as MySQL Server,
PostgreSQL, or SQL Server, and service agents that provide supporting services, such as
logging and monitoring. The high availability option also provides a standby VM in another
zone with a configuration that's identical to the primary VM.
The database is stored on a scalable, durable network storage device called a persistent
disk that attaches to the VM. A static IP address sits in front of each VM to ensure that
the IP address an application connects to persists throughout the lifetime of the Cloud
SQL instance.
Cloud SQL pricing varies with your configuration settings.
Configuration updates
As your database's usage grows and new workloads are added, you might want to update
your database configuration to adapt accordingly.
System updates
Keeping the database instance up and running requires operational effort beyond
configuration updates. Servers and disks need to be replaced and upgraded. Operating
systems need to be patched as new vulnerabilities are discovered. Database programs
need to be upgraded as the database software provider releases new features and bug
fixes.
The process Cloud SQL uses to perform system updates varies based on which part of
the system is getting updated. In general, Cloud SQL system updates are divided into
three categories: hardware updates, online updates, and maintenance.
Cloud Spanner
Cloud Spanner is a fully managed, mission-critical, relational database service that offers
transactional consistency at global scale, automatic, synchronous replication for high
availability, and support for two SQL dialects: Google Standard SQL and PostgreSQL.
Cloud Spanner pricing is based on three components:
1. Nodes
2. Storage
3. Networking
Cloud Spanner pricing for nodes (or processing) is set on an hourly basis, based on the
maximum number of nodes used within any given hour in a project. Pricing for Cloud
Spanner storage is set on a per-month basis, based on the average amount of data in
Cloud Spanner tables and secondary indexes during that month. Google Cloud Spanner
pricing for network bandwidth is set on a per-month basis, based on the amount used
during that month.
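The three-component pricing model can be turned into a small worked example. The rates below are placeholders for illustration, not real Cloud Spanner prices; consult the current pricing page for actual values.

```python
# Illustrative monthly cost model for the three Spanner pricing components.
# All rates are HYPOTHETICAL placeholders, not real prices.
NODE_RATE_PER_HOUR = 0.90          # per node-hour (hypothetical)
STORAGE_RATE_PER_GB_MONTH = 0.30   # per GB-month (hypothetical)
NETWORK_RATE_PER_GB = 0.01         # per GB of bandwidth (hypothetical)

def monthly_cost(max_nodes_each_hour, avg_storage_gb, network_gb):
    """Nodes are billed on the per-hour maximum, storage on the monthly
    average, and network on the total amount used during the month."""
    node_cost = sum(max_nodes_each_hour) * NODE_RATE_PER_HOUR
    storage_cost = avg_storage_gb * STORAGE_RATE_PER_GB_MONTH
    network_cost = network_gb * NETWORK_RATE_PER_GB
    return round(node_cost + storage_cost + network_cost, 2)

# 3 nodes for every hour of a 730-hour month, 500 GB average data, 100 GB network:
print(monthly_cost([3] * 730, 500, 100))  # -> 2122.0
```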
Before you begin:
1. Create an account
2. Create a new Google Cloud project
3. Make sure that billing is enabled for your Cloud project.
4. Enable the Cloud Spanner API for the project.
5. In Choose your configuration, retain the default option Regional and select a
configuration from the drop-down menu.
Your instance configuration determines the geographic location where your
instances are stored and replicated.
6. In Allocate compute capacity, retain the default value of 1000 processing units.
7. Click Create.
The instance appears in the instances list.
NOSQL-BASED MANAGED SERVICES
CLOUD FIRESTORE
Firestore is a NoSQL document database built for automatic scaling, high performance,
and ease of application development. While the Firestore interface has many of the same
features as traditional databases, as a NoSQL database it differs from them in the way it
describes relationships between data objects.
Cloud Firestore is a flexible, scalable database for mobile, web, and server development
from Firebase and Google Cloud. Like Firebase Realtime Database, it keeps the data in-
sync across client apps through real-time listeners and offers offline support for mobile
and web so users can build responsive apps that work regardless of network latency or
Internet connectivity.
Following Cloud Firestore's NoSQL data model, you store data in documents that
contain fields mapping to values. These documents are stored in collections, which
are containers for your documents that you can use to organize your data and
build queries.
Documents support many different data types, from simple strings and numbers,
to complex, nested objects. You can also create subcollections within documents
and build hierarchical data structures that scale as your database grows.
The Cloud Firestore data model supports whatever data structure works best for
your app.
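The documents-in-collections hierarchy can be sketched as nested Python dictionaries with a slash-style path lookup. This is a conceptual model only; the real client libraries expose Client, CollectionReference, and DocumentReference objects instead.

```python
# Toy model of the Firestore hierarchy: collections hold documents,
# documents hold fields and may hold subcollections.
db = {
    "users": {                      # collection
        "alice": {                  # document
            "fields": {"name": "Alice", "age": 30},
            "subcollections": {
                "orders": {         # subcollection
                    "order1": {"fields": {"total": 9.99}, "subcollections": {}},
                },
            },
        },
    },
}

def get_document(db, path):
    """Resolve a slash-style path like 'users/alice/orders/order1'."""
    parts = path.split("/")
    node = {"subcollections": db}
    # Paths alternate collection / document at each level of nesting.
    for coll, doc in zip(parts[::2], parts[1::2]):
        node = node["subcollections"][coll][doc]
    return node["fields"]

print(get_document(db, "users/alice"))               # {'name': 'Alice', 'age': 30}
print(get_document(db, "users/alice/orders/order1")) # {'total': 9.99}
```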
Protect access to your data in Cloud Firestore with Firebase Authentication and Cloud
Firestore Security Rules for Android, Apple platforms, and JavaScript, or Identity and
Access Management (IAM) for server-side languages.
1. If you haven't already, create a Firebase project: In the Firebase console, click Add
project, then follow the on-screen instructions to create a Firebase project or to
add Firebase services to an existing GCP project.
2. From the Firebase console's navigation pane, select Firestore, then click Create
database for Firestore.
Secure your data
Use Firebase Authentication and Firestore Security Rules to secure your data in
Firestore.
Here are some basic rule sets you can use to get started. You can modify your
security rules in the Rules tab of the Firebase console.
Cloud Spanner is a good choice when you need:
A SQL relational database management system with joins and secondary indexes
Built-in high availability
Strong global consistency
Database sizes that exceed 2 TB
High numbers of input/output operations per second (tens of thousands of
reads/writes per second or more)
CLOUD BIGTABLE
Cloud Bigtable is Google's fully managed NoSQL Big Data database service. It's the same
database that powers many core Google services, including Search, Analytics, Maps, and
Gmail.
Bigtable should be used if your requirements include one or more of the following:
Cloud Bigtable is a sparsely populated table that can scale to billions of rows and
thousands of columns, enabling you to store terabytes or even petabytes of data. A single
value in each row is indexed; this value is known as the row key.
Bigtable is ideal for storing large amounts of single-keyed data with low latency. It
supports high read and write throughput at low latency, and it's an ideal data source for
MapReduce operations.
Bigtable is ideal for applications that need high throughput and scalability for key/value
data, where each value is typically no larger than 10 MB. Bigtable also excels as a storage
engine for batch MapReduce operations, stream processing/analytics, and machine-
learning applications.
You can use Bigtable to store and query all of the following types of data:
Time-series data, such as CPU and memory usage over time for multiple servers.
Marketing data, such as purchase histories and customer preferences.
Financial data, such as transaction histories, stock prices, and currency exchange
rates.
Internet of Things data, such as usage reports from energy meters and home
appliances.
Graph data, such as information about how users are connected to one another.
Bigtable stores data in massively scalable tables, each of which is a sorted key/value map.
The table is composed of rows, each of which typically describes a single entity, and
columns, which contain individual values for each row. Each row is indexed by a single
row key, and columns that are related to one another are typically grouped into a column
family. Each column is identified by a combination of the column family and a column
qualifier, which is a unique name within the column family.
Each row/column intersection can contain multiple cells. Each cell contains a unique
timestamped version of the data for that row and column. Storing multiple cells in a
column provides a record of how the stored data for that row and column has changed
over time. Bigtable tables are sparse; if a column is not used in a particular row, it does
not take up any space.
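The row-key / column-family / timestamped-cell model can be sketched as a small Python structure (conceptual only, not the Bigtable client API). Rows are keyed by a single row key, columns are named "family:qualifier", each cell keeps multiple timestamped versions, and absent columns cost nothing.

```python
import time

# Toy model of a Bigtable table: a map from row key to columns, where each
# column is "family:qualifier" and each cell is a list of (timestamp, value)
# versions. The table is sparse: unused columns simply never appear.
table = {}

def write(row_key, family, qualifier, value, ts=None):
    col = f"{family}:{qualifier}"
    cells = table.setdefault(row_key, {}).setdefault(col, [])
    cells.append((ts if ts is not None else time.time(), value))

def read_latest(row_key, family, qualifier):
    cells = table.get(row_key, {}).get(f"{family}:{qualifier}", [])
    return max(cells)[1] if cells else None  # newest timestamp wins

# Time-series style row key: entity id plus an ordered timestamp component.
write("server042#2024-01-22T10:00", "metrics", "cpu", 0.71, ts=1)
write("server042#2024-01-22T10:00", "metrics", "cpu", 0.73, ts=2)  # new version
print(read_latest("server042#2024-01-22T10:00", "metrics", "cpu"))  # 0.73
# Range scans are iteration over sorted row keys:
print(sorted(table))  # ['server042#2024-01-22T10:00']
```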
Bigtable architecture
All client requests go through a frontend server before they are sent to a Bigtable
node. (In the original Bigtable paper, these nodes are called "tablet servers.") The
nodes are organized into a Bigtable cluster, which belongs to a Bigtable instance, a
container for the cluster.
Each node in the cluster handles a subset of the requests to the cluster. By adding nodes
to a cluster, you can increase the number of simultaneous requests that the cluster can
handle. Adding nodes also increases the maximum throughput for the cluster. If you
enable replication by adding additional clusters, you can also send different types of traffic
to different clusters. Then if one cluster becomes unavailable, you can fail over to another
cluster.
A Bigtable table is sharded into blocks of contiguous rows, called tablets, to help balance
the workload of queries. (Tablets are similar to HBase regions.) Tablets are stored on
Colossus, Google's file system, in SSTable format. An SSTable provides a persistent,
ordered immutable map from keys to values, where both keys and values are arbitrary
byte strings. Each tablet is associated with a specific Bigtable node. In addition to the
SSTable files, all writes are stored in Colossus's shared log as soon as they are
acknowledged by Bigtable, providing increased durability.
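The key properties of an SSTable — an ordered, immutable map from byte-string keys to byte-string values — can be captured in a short in-memory sketch. The real format is a persistent on-disk file; this toy keeps everything in sorted lists and uses binary search for lookups and range scans.

```python
import bisect

class SSTable:
    """Minimal sketch of an SSTable: an ordered, immutable map from
    byte-string keys to byte-string values."""
    def __init__(self, items):
        # Sort once at build time; the table is never mutated afterwards.
        pairs = sorted(items.items())
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def scan(self, start, end):
        """Ordered range scan over [start, end)."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, end)
        return list(zip(self.keys[lo:hi], self.values[lo:hi]))

sst = SSTable({b"row3": b"c", b"row1": b"a", b"row2": b"b"})
print(sst.get(b"row2"))            # b'b'
print(sst.scan(b"row1", b"row3"))  # [(b'row1', b'a'), (b'row2', b'b')]
```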
Load balancing
Each Bigtable zone is managed by a primary process, which balances workload and data
volume within clusters. This process splits busier/larger tablets in half and merges less-
accessed/smaller tablets together, redistributing them between nodes as needed. If a
certain tablet gets a spike of traffic, Bigtable splits the tablet in two, then moves one of
the new tablets to another node. Bigtable manages the splitting, merging, and
rebalancing automatically, saving you the effort of manually administering your tablets.
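The split-and-move behavior can be sketched as a toy rebalancing routine. This is illustrative only; Bigtable's actual balancing also considers data volume, merges quiet tablets, and runs continuously.

```python
# Toy rebalancing sketch: each tablet is represented by its traffic load and
# assigned to a node. A hot tablet is split in half and one half moves to
# the least-loaded node, mirroring the split/move behavior described above.
def rebalance(nodes, hot_threshold):
    """nodes: {node_name: [tablet_traffic, ...]}. Mutates nodes in place."""
    for name, tablets in list(nodes.items()):
        for i, traffic in enumerate(tablets):
            if traffic > hot_threshold:
                tablets[i] = traffic / 2            # split the tablet in half...
                target = min(nodes, key=lambda n: sum(nodes[n]))
                nodes[target].append(traffic / 2)   # ...and move one half away
    return nodes

nodes = {"node-a": [100, 10], "node-b": [5]}
rebalance(nodes, hot_threshold=50)
print(nodes)  # {'node-a': [50.0, 10], 'node-b': [5, 50.0]}
```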
7. What are the various ways to authenticate the Google Compute Engine
API?
Different methods are available for Google Compute Engine API authentication:
Using OAuth 2.0
Through a client library
Directly with an access token
14. Which storage class in GCP is best for long-term archives? Archive
storage is the lowest-cost option, with a 365-day minimum storage duration,
and is best for data accessed less than once a year. Coldline storage is a
better choice than Standard or Nearline storage when slightly lower
availability, a 90-day minimum storage duration, and higher costs for data
access are acceptable trade-offs for lowered at-rest storage costs.
https://onlinecourses.nptel.ac.in/noc20_cs55/preview
https://learndigital.withgoogle.com/digitalgarage/course/gcloud-computing-foundations
14. REAL TIME APPLICATIONS
Cloud Storage:
Dropbox
Gmail
Facebook
Marketing:
Maropost
HubSpot
Adobe Marketing Cloud
Education:
SlideRocket
Ratatype
Amazon Web Services
Healthcare:
ClearData
Dell’s Secure Healthcare Cloud
IBM Cloud
15. ASSESSMENT SCHEDULE
S.No | Name of the Assessment | Start Date | End Date | Portion
REFERENCES:
1. https://cloud.google.com/docs
2. https://www.cloudskillsboost.google/course_templates/153
3. https://nptel.ac.in/courses/106105223
17. MINI PROJECT
You have started a new role as a Junior Cloud Engineer for Jooli, Inc. You are expected to help
manage the infrastructure at Jooli. Common tasks include provisioning resources for projects.
You are expected to have the skills and knowledge for these tasks, so step-by-step guides are not
provided.
1. Create all resources in the default region or zone, unless otherwise directed. (CO2, K2)
Containerized Applications
Application containerization is an OS-level virtualization method for running and
deploying distributed applications without launching an entire virtual machine (VM) for
each app. Multiple isolated services and applications run on a single host and share
the same OS kernel.
Containers can run on virtual machines, cloud instances, and bare-metal systems,
across macOS, Linux, and select Windows versions.
It is an evolving technology that is changing how developers build, run, and test
application instances in the cloud.
The container virtualization layer can scale up microservices to match growing demand
for various components of the application and distribute load. With virtualization,
developers can define a set of physical resources as disposable VMs, and this setup
also encourages flexibility. For example, when developers want a variation on a
standard image, they can create a container that holds the newly created library
within the virtualized environment.
To update an application, developers make modifications to the code within a container
image, then redeploy the image to run on the host OS.
ECR (Amazon Elastic Container Registry): a product of Amazon Web Services that stores,
manages, and deploys Docker images. Amazon Elastic Container Registry hosts the images
within a highly available and scalable architecture, enabling developers to deploy
containers for their applications dependably.