▪ Enhanced accuracy
▪ Increased productivity
Types of SLAs
● Off-the-Shelf/Non-Negotiable/Direct SLAs: Predefined SLAs that are not
open to negotiation. Common in public cloud offerings.
● Negotiable SLAs: SLAs that are customized through negotiation, potentially
involving external agents.
Service Level Management
● Monitoring and Management: Ongoing processes to ensure service
performance aligns with SLOs.
● Provider Perspective: Decision-making based on business objectives and
technical constraints.
● Consumer Perspective: Evaluation of cloud services suitability for
organizational needs.
SLA Considerations
● Business Objectives: Alignment of the SLA with the business goals of both
the provider and consumer.
● Responsibilities: Clear definition of the division of responsibilities between
the provider and consumer, varying based on the service type.
● Business Continuity and Disaster Recovery: Consumer assurance of
adequate provider protection in case of disasters.
● Maintenance: SLA provisions regarding infrastructure maintenance impact.
● Data Location: Ability for the consumer to audit data location compliance.
● Data Seizure: Potential impact on other consumers in a multi-tenant
environment if law enforcement targets data of a specific consumer.
● Provider Failure: Consequences of provider failure.
● Jurisdiction: Location for dispute resolution.
Additional SLA Requirements
● Security: Data encryption and key management.
● Privacy: Data isolation in multi-tenant environments.
● Data Retention and Deletion: Policies for data retention and deletion.
● Hardware Erasure: Secure disposal of hardware.
● Regulatory Compliance: Adherence to relevant regulations.
● Transparency: Open communication with the consumer.
● Certification: Provider adherence to recognized standards.
● Monitoring: Robust monitoring systems.
● Auditability: Consumer right to audit provider systems and procedures.
Key Performance Indicators (KPIs)
● KPIs: Low-level resource metrics used to measure service performance and
contribute to higher-level SLOs.
● Examples: Downtime, uptime, data transfer rates.
● KPIs are directly measured from system parameters and inform the
calculation of SLOs, which are then aggregated into the overall SLA.
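For example (illustrative numbers, not from the lecture): if monitoring records 43 minutes of downtime in a 30-day month (43,200 minutes), the measured availability KPI is (43,200 − 43) / 43,200 × 100 ≈ 99.90%, which just satisfies an SLO of 99.9% availability; the SLA then attaches service credits or penalties to that SLO.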
Industry-Defined KPIs and Monitoring
● Monitoring Responsibility: The role of third-party organizations in
monitoring provider performance to ensure neutrality and eliminate conflicts of
interest.
● Common Metrics: Throughput, availability, reliability, load balancing,
durability, elasticity, linearity.
Examples of Cloud Provider SLAs
● Amazon EC2 (IaaS): 99.95% availability.
● Amazon S3 (Storage as a Service): 99.9% availability.
● Google, Salesforce, Microsoft (IaaS, PaaS): 99.9% availability.
Lecture 12 Notes: Economics of Cloud Computing
Economic Drivers of Cloud Computing
● Common Infrastructure: Shared, standardized resources with benefits
derived from statistical multiplexing.
● Multiplexing benefits: Leveraging shared resources across multiple
workloads to increase utilization and reduce costs.
● Location Independence: Ubiquitous availability, leading to latency reduction
and enhanced user experience.
● Online Connectivity: Enabling attribute for cost-effective service delivery.
● Utility Pricing: Pay-as-you-go model for resource consumption.
● On-Demand Resources: Scalable and elastic provisioning and
de-provisioning with minimal management overhead.
Economies of Scale
● Reduced Overhead Costs: Benefits of bulk purchasing and shared
infrastructure.
● Statistical Multiplexing: Increased utilization through the aggregation of
diverse workloads, leading to lower costs compared to unconsolidated
workloads.
● Reduced SLA Violations: Multiplexing can help mitigate service disruptions,
minimizing revenue loss and SLA penalties.
Coefficient of Variation (CV)
● Coefficient of Variation (CV): A statistical measure that quantifies the degree
of dispersion in data around the mean. Calculated as the ratio of standard
deviation (σ) to the mean (μ): CV=σ/μ.
● Importance in Cloud Economics: Helps assess the risk associated with
variable demand for cloud services.
● Smoother Curves: A lower CV indicates a smoother demand curve, implying
more predictable resource needs and potentially lower costs.
● Multiplexing Benefits: Aggregating demand from multiple sources can
reduce the CV and contribute to a smoother, more predictable overall demand
pattern.
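A minimal simulation of this effect in Python (workload sizes, counts, and distribution below are illustrative assumptions): for n independent, identically distributed demands, the aggregate mean grows as n while the aggregate standard deviation grows only as √n, so the aggregate CV falls to CV/√n. Multiplexing 16 such workloads should therefore cut the CV by a factor of about 4.

import random
import statistics

def coefficient_of_variation(xs):
    # CV = sigma / mu, as defined above
    return statistics.pstdev(xs) / statistics.fmean(xs)

random.seed(42)
# 16 independent workloads, each sampled 10,000 times (mean 100, sd 30)
workloads = [[random.gauss(100, 30) for _ in range(10_000)] for _ in range(16)]
aggregate = [sum(sample) for sample in zip(*workloads)]

print(f"CV of one workload:       {coefficient_of_variation(workloads[0]):.3f}")  # ~0.30
print(f"CV of the multiplexed 16: {coefficient_of_variation(aggregate):.3f}")     # ~0.075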
Real-World Considerations
● Correlated Demands: The reality that demands for cloud services are often
correlated, impacting the effectiveness of economies of scale.
● Location Value: Latency considerations associated with the distance
between users and cloud resources.
● Global User Base: The need for a distributed service architecture to support
a globally dispersed user base.
Value of Utility Pricing
● Economic Viability: Cloud services need not always be cheaper to be
economically advantageous.
● Demand Variability: Utility pricing aligns costs with usage patterns, making it
beneficial for scenarios with fluctuating demand.
● Peak-to-Average Demand Ratio: A key factor in determining the economic
benefit of cloud services, with higher ratios favoring cloud adoption.
● Additional Cost Considerations: Network costs, interoperability overhead,
reliability, and accessibility need to be factored into economic assessments.
Value of On-Demand Services
● Demand Matching: On-demand provisioning eliminates penalties associated
with owning resources that don't match instantaneous demand.
● Penalty Calculation: Penalties are incurred for both underutilized resources
and for failing to meet service delivery due to insufficient resources.
● Demand Characteristics: The nature of demand (flat, linear, non-linear)
influences the effectiveness of on-demand provisioning.
● Challenges with Non-Linear Demand: Exponential demand growth poses
significant challenges for resource provisioning, as even with fixed
provisioning intervals, the system may fall behind, leading to exponentially
growing penalties.
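A sketch of why exponential demand defeats lag-based provisioning, under the usual assumption (the symbols are ours, not the lecture's) that penalties accrue in proportion to the gap between demand D(t) and provisioned resources R(t): if D(t) = e^(kt) and provisioning lags demand by a fixed interval τ, then R(t) = D(t − τ) = e^(k(t − τ)), and the shortfall is

D(t) − R(t) = e^(kt) × (1 − e^(−kτ))

which grows exponentially in t for any fixed lag τ > 0, so the accumulated penalty grows exponentially as well. For linear demand, by contrast, the same lag produces only a constant shortfall.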
Lecture 13 Notes: Managing Data in the Cloud
Challenges of Data Management in the Cloud
● Data Security and Control: Ensuring data security and maintaining control
over data stored in a third-party environment.
● Scalability: Handling massive data volumes and ensuring system scalability
to accommodate growth.
● Query Efficiency: Optimizing data access and query execution in a
distributed cloud environment.
Relational Databases (RDBMS)
● Interaction via SQL: User applications interact with RDBMS using Structured
Query Language (SQL).
● Query Optimization: The RDBMS parses each SQL statement, and its query
optimizer chooses an execution plan that minimizes query execution time.
● Disk Space Management: Data is stored in pages for efficient retrieval.
● Database File System: Independent file system for optimized data
management.
● Parallel I/O: Support for parallel input/output operations for enhanced
performance.
● Row-Oriented Storage: Traditional storage format suitable for write-intensive
operations and transaction processing applications.
● Column-Oriented Storage: Efficient for data warehouse workloads and
aggregate operations.
Parallel Database Architectures
● Shared Memory: Suitable for servers with multiple CPUs sharing the same
memory address space.
● Shared Disk: Independent servers share storage through a high-speed
network (e.g., NAS, SAN).
● Shared Nothing: Independent servers with their own disk space connected
via a network.
Advantages of Parallel Databases
● Efficient Query Execution: Leveraging multiple processors to enhance SQL
query performance.
● Data Partitioning and Distribution: Distributing data across processors in
shared-nothing architectures.
● Distributed Query Optimization: The SQL optimizer handles distributed
joins.
● Transaction Isolation: Mechanisms such as two-phase locking (with
two-phase commit for transactions that span nodes) ensure data consistency.
● Fault Tolerance: Failover mechanisms to handle system failures and ensure
data availability.
Cloud File Systems
● Google File System (GFS): A distributed file system designed for managing
large files across clusters of commodity servers.
● Hadoop Distributed File System (HDFS): An open-source implementation of
GFS.
● Key Features:
o Fault tolerance and failure handling.
o Support for parallel reads, writes, and appends.
o Large file storage through data chunking (GFS) or blocking (HDFS).
o Data replication for redundancy and availability.
Bigtable
● Bigtable: A distributed structured storage system built on GFS.
● Data Access: Data is accessed using row key, column key, and timestamp.
● Data Model (illustrated in the sketch after this section):
o Column families and labels for storing name-value pairs.
o Dynamic label creation within column families.
o Multiple data versions stored with timestamps.
● Tablets and Tablet Servers: Tables are divided into tablets, each managed
by a tablet server.
● SSTables: Column families for a given row range are stored in separate
distributed files called SSTables.
● Metadata Management: Metadata is managed by metadata servers and can
also be split into tablets.
● Parallel Operations: Supports concurrent reads and inserts.
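A toy in-memory model of this data-access pattern (a sketch only; the names are illustrative, and real Bigtable persists column families as SSTables on GFS):

from collections import defaultdict

# row key -> {(column family:qualifier, timestamp): value}
table = defaultdict(dict)

def put(row, column, timestamp, value):
    table[row][(column, timestamp)] = value

def get_versions(row, column):
    """All stored versions of one cell, newest first."""
    versions = [(ts, v) for (col, ts), v in table[row].items() if col == column]
    return sorted(versions, reverse=True)

put("com.example/index.html", "anchor:link1", 1, "Example Home")
put("com.example/index.html", "anchor:link1", 2, "Example Homepage")
print(get_versions("com.example/index.html", "anchor:link1"))
# [(2, 'Example Homepage'), (1, 'Example Home')]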
Dynamo
● Dynamo: A key-value store developed by Amazon.
● Key Features:
o Supports large volumes of concurrent updates.
o Handles bulk reads and writes.
o Data Model: Key-value pairs suitable for e-commerce applications.
o Independent of underlying distributed file systems.
o Uses consistent hashing for data distribution and replication (see the
sketch below).
o Quorum protocol for maintaining consistency.
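A minimal sketch of consistent hashing with virtual nodes (illustrative only; Dynamo's real ring also handles failure detection, hinted handoff, and vector clocks):

import bisect
import hashlib

def _hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, replicas=3, vnodes=100):
        self.replicas = replicas
        # each physical node owns many points ("virtual nodes") on the ring
        self.ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.points = [h for h, _ in self.ring]

    def preference_list(self, key):
        """First `replicas` distinct nodes clockwise from the key's position."""
        idx = bisect.bisect(self.points, _hash(key))
        nodes = []
        while len(nodes) < self.replicas:
            node = self.ring[idx % len(self.ring)][1]
            if node not in nodes:
                nodes.append(node)
            idx += 1
        return nodes

ring = ConsistentHashRing(["A", "B", "C", "D"])
print(ring.preference_list("cart:alice"))  # the key's coordinator plus two replicas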
Datastore
● Datastore: A key-value database store offered by Google (Google App
Engine Datastore) and Amazon (SimpleDB).
● Key Features:
o Column-oriented storage.
o Multiple index tables for efficient querying.
o Horizontal partitioning (sharding) for scalability.
o Lexicographical sorting for key-value storage.
o Entity grouping for transactions.
o Automatic and configurable index creation.
o Query execution optimization based on index selectivity.
Lecture 14 Notes: Introduction to MapReduce
MapReduce: A Parallel Programming Paradigm
● MapReduce: A programming model for processing and generating large data
volumes using parallel computation. Developed by Google for large-scale text
processing.
● Key Features:
o Designed for massively scalable data processing.
o Utilizes tens of thousands of processors.
o Fault-tolerant design handles processor and network failures.
● Hadoop: An open-source implementation of MapReduce.
Parallel Computing Models
● Shared Memory: Processors share the same memory address space.
● Distributed Memory: Processors have their own separate memory.
● Shared Disk: A hybrid model where processors share storage but have their
own memory.
● Shared Nothing: Processors have their own memory and storage.
Challenges in Parallel Computing
● Synchronization: Coordinating operations among multiple processors.
● Communication Overheads: Costs associated with message passing
between processors.
● Work Distribution: Ensuring balanced workload distribution among
processors.
● Scalability: Maintaining efficiency as data size and processor count increase.
Case Study: Word Frequency Counting
● Problem: Determine the frequency of each word in a vast collection of
documents.
● Approaches:
o Approach 1: Divide words among processors (each processor handles
a subset of words).
o Approach 2: Divide documents among processors (each processor
handles all words across a subset of documents).
● Analysis: Approach 2 is more efficient and scalable because it ensures every
read operation contributes to the final result.
The MapReduce Model
● Map Phase:
o Mappers read a portion of the input data.
o Transform key-value pairs into a new set of key-value pairs.
o Write results to intermediate files, one per reducer.
● Reduce Phase:
o Reducers fetch intermediate files from mappers.
o Group results by key and apply a reduce function.
o Write final results back to the distributed file system.
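A single-machine sketch of both phases applied to the word-frequency case study above (in a real deployment each mapper runs on its own node against a chunk of the distributed file system, and the shuffle ships intermediate files to reducers):

from collections import defaultdict

def map_phase(chunk):
    """Mapper: emit (word, 1) for every word in its chunk of input."""
    for word in chunk.split():
        yield word.lower(), 1

def reduce_phase(word, counts):
    """Reducer: aggregate all counts emitted for one word."""
    return word, sum(counts)

chunks = ["the cloud stores data", "the cloud scales", "data scales"]
intermediate = defaultdict(list)        # the shuffle: group pairs by key
for chunk in chunks:                    # one mapper per chunk, conceptually
    for word, count in map_phase(chunk):
        intermediate[word].append(count)

print(dict(reduce_phase(w, c) for w, c in intermediate.items()))
# {'the': 2, 'cloud': 2, 'stores': 1, 'data': 2, 'scales': 2}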
MapReduce Fault Tolerance
● Heartbeat Messages: Periodic checks for processor liveness.
● Task Duplication: The master duplicates tasks assigned to unresponsive
processors.
● Mapper Failure Handling: Reassignment of tasks to other nodes upon
mapper failure.
● Reducer Failure Handling: Reassignment of only remaining tasks upon
reducer failure, as completed tasks are already written to the file system.
MapReduce Efficiency
● Parallel Efficiency: A measure of how effectively the MapReduce
implementation utilizes parallel processing.
● Factors Influencing Efficiency: Data size, computation time, read/write
times, and communication overhead.
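The standard definition (the symbols are ours; the lecture's exact notation may differ): parallel efficiency = T1 / (p × Tp), where T1 is the execution time on one processor and Tp the time on p processors. An efficiency of 1 means perfect speedup; in MapReduce, the transfer of intermediate data between mappers and reducers is the main overhead that pushes efficiency below 1 as p grows.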
MapReduce Applications
● Document Indexing: Creating an index of words and their locations in
documents, essential for web search and structured data handling.
● Relational Operations: Executing SQL statements, including joins and
group-by operations, on large datasets.
Lecture 15 Notes: OpenStack
What is OpenStack?
● OpenStack: An open-source cloud operating system for managing compute,
storage, and networking resources within a data center.
● Key Features:
o Provides a dashboard for administrator control.
o Enables users to provision resources through a web interface.
o Primarily used for Infrastructure as a Service (IaaS).
● Meghamala: An experimental cloud based on OpenStack implemented at IIT
Kharagpur.
OpenStack Capabilities
● Infrastructure as a Service (IaaS):
o On-demand virtual machine provisioning and snapshotting.
o Networking and storage services.
o Multi-tenancy support.
o Resource quotas for users and projects.
OpenStack Architecture
● Major Components:
o Horizon (Dashboard): Provides a web-based interface for interacting
with OpenStack services.
o Neutron (Networking): Enables network connectivity as a service.
o Cinder (Block Storage): Provides persistent block storage for VMs.
o Nova (Compute): Manages the lifecycle of compute instances (VMs).
o Glance (Image Services): Stores and retrieves VM disk images.
o Swift (Object Storage): Stores and retrieves unstructured data objects.
o Ceilometer (Telemetry): Monitors and meters cloud usage for billing
and analysis.
o Keystone (Identity Services): Provides authentication and
authorization services.
OpenStack Workflow
1. User logs into Horizon and requests VM creation.
2. Keystone authenticates the request.
3. Nova initiates VM provisioning.
4. Nova Scheduler selects a suitable host.
5. Neutron configures networking.
6. Cinder provisions block storage.
7. Glance provides the VM image.
8. Swift supplies the stored image data when Glance uses it as a backing store.
9. A hypervisor on the chosen host creates the VM.
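The same workflow can be driven programmatically; a hedged sketch using the openstacksdk Python library (the cloud name, image, flavor, and network names below are placeholders for a particular deployment):

import openstack

# credentials come from clouds.yaml or OS_* environment variables
conn = openstack.connect(cloud="mycloud")

image = conn.compute.find_image("ubuntu-22.04")   # assumed image name
flavor = conn.compute.find_flavor("m1.small")     # assumed flavor name
network = conn.network.find_network("private")    # assumed network name

# Keystone authenticates the call; Nova schedules and builds the VM,
# pulling the image from Glance and wiring networking through Neutron.
server = conn.compute.create_server(
    name="demo-vm",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)  # block until ACTIVE
print(server.status)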
OpenStack Storage Concepts
● Ephemeral Storage: Temporary storage that exists only for the duration of
the VM instance. Managed by Nova.
● Block Storage: Persistent block-level storage that can be attached to VMs.
Managed by Cinder.
● Object Storage: Persistent storage for unstructured data objects, accessible
from anywhere. Managed by Swift.
● App Services:
o View the deployed application using the provided URL, which includes
the project's unique identifier.
o Shut down the project if needed.
● The lecture emphasizes the step-by-step procedures for hosting a basic
webpage and building a web application using GCP.
● It encourages exploration of additional services and tools available on GCP
for developing various applications.
● The lecture aims to provide a practical understanding of how cloud platforms
operate from a user's point of view.
Example 1:
Example 2:
● Company X uses cloud services from Provider P with the following SLA:
○ Availability: 99.95%
○ Service period: 30 days
○ Maximum service hours per day: 12 hours
○ Cost: $50/day
● Penalty:
○ 10% service credit if uptime is between 99% and 99.95%.
○ 25% service credit if uptime is less than 99%.
● Five outages occurred during the service period, totaling 9 hours and 25
minutes of downtime.
● Calculations:
○ Total cost: $50/day * 30 days = $1500.
○ Service availability: (1 − downtime / total service hours) × 100 = (1 − 9.417 / 360) × 100 ≈ 97.384%.
○ Service credit: 25% of total cost = $375.
○ Effective cost: $1500 - $375 = $1125.
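The same arithmetic as a small Python function (the credit tiers are exactly the SLA terms above; everything else is plain arithmetic):

def sla_effective_cost(days, rate_per_day, hours_per_day, downtime_hours):
    total_cost = days * rate_per_day
    total_hours = days * hours_per_day          # 30 * 12 = 360 service hours
    availability = (1 - downtime_hours / total_hours) * 100
    if availability >= 99.95:
        credit = 0.0
    elif availability >= 99.0:
        credit = 0.10 * total_cost              # 10% tier
    else:
        credit = 0.25 * total_cost              # 25% tier
    return availability, total_cost - credit

availability, cost = sla_effective_cost(30, 50, 12, 9 + 25 / 60)
print(f"{availability:.3f}% availability -> effective cost ${cost:.0f}")
# 97.384% availability -> effective cost $1125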
Cost Comparison:
● Total Cloud Cost (CT) follows actual demand D(t): paying cloud unit cost (C)
per unit of demand over a period T gives CT = C × A × T, where A is the
average demand.
● Total Baseline Cost (BT) means owning enough capacity for the peak:
BT = P × B × T, where P is peak demand and B the baseline (owned) unit cost.
● Utility Premium (U) is the ratio of cloud unit cost to baseline unit cost: U = C/B.
● Cloud is cheaper when:
○ CT < BT
○ U < P/A (utility premium is less than the ratio of peak demand to
average demand).
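A minimal sketch of this break-even rule (the rule follows from CT = C × A × T and BT = B × P × T above; the numbers in the example are illustrative):

def cloud_is_cheaper(peak, average, utility_premium):
    """Cloud wins when U = C/B is below the peak-to-average ratio P/A."""
    return utility_premium < peak / average

# Demand peaks at 100 servers but averages 25 (P/A = 4). Even if cloud
# capacity costs 3x as much per unit as owned capacity (U = 3), cloud wins.
print(cloud_is_cheaper(peak=100, average=25, utility_premium=3))  # True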
Real-World Considerations:
● Demand spikes can occur due to events like news stories or promotions.
● Hybrid models can be used, where organizations own resources for baseline
usage and use cloud resources for peak demand.
● Factors like network costs, interoperability overhead, reliability, and
accessibility should also be considered.
Penalty Cost:
Example 1:
Example 2:
Example 3:
Two Phases:
● Three mappers process chunks of text data and count word occurrences.
● Two reducers aggregate counts for specific words from the mappers.
Types of Resources:
VM Control Techniques:
● Scheduling:
○ Greedy algorithms to consolidate VMs on multi-core nodes (see the
first-fit sketch after this list).
○ Live migration to move VMs to underutilized nodes and shut down idle
nodes.
● Management:
○ Minimizing VM instances and removing unnecessary packages and
services.
○ Optimizing Linux kernel for cloud environments.
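A first-fit-decreasing sketch of greedy consolidation (the loads and capacities are illustrative; real schedulers weigh CPU, memory, and network together):

def consolidate(vm_loads, node_capacity):
    """Pack VM loads onto as few nodes as possible (first-fit decreasing)."""
    free = []          # remaining capacity of each powered-on node
    placement = {}
    for vm, load in sorted(vm_loads.items(), key=lambda kv: -kv[1]):
        for i, capacity in enumerate(free):
            if load <= capacity:
                free[i] -= load
                placement[vm] = i
                break
        else:
            free.append(node_capacity - load)   # power on a new node
            placement[vm] = len(free) - 1
    return placement, len(free)

placement, active = consolidate(
    {"vm1": 0.6, "vm2": 0.3, "vm3": 0.5, "vm4": 0.4, "vm5": 0.2},
    node_capacity=1.0,
)
print(active, placement)  # 2 active nodes instead of 5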
Metrics:
● Reliability.
● Ease of deployment.
● Quality of service.
● Control.
Key Considerations:
● Resource management approaches should be tailored to specific application
requirements and user needs.
● Consider the trade-offs between different metrics to achieve optimal resource
utilization without compromising performance.
● Effective resource management is essential for achieving the benefits of cloud
computing, such as scalability, cost-effectiveness, and high performance.
Security Attacks
Threats
Security Goals
● Fully Secure: All reachable system states are within the set of secure states.
● Precise Security: Reachable states exactly match secure states.
● Broad Security: Some reachable states fall outside the secure set.
Assurance
Human Issues
Attack Types
Security Services
Network Security
Vulnerability Scanning
Penetration Testing
Post-Attack Investigation
● Traditional security focuses on keeping attackers out, while cloud security also
addresses internal threats.
● Cloud security risks are unique and require specific considerations.
● Security.
● Performance.
● Availability.
Co-Tenancy
● Multiple users sharing the same physical infrastructure, posing security risks.
Inter-Cloud Communication
Case Study: "Hey, You, Get Off of My Cloud: Exploring Information Leakage in
Third-Party Compute Clouds"
● Trust and dependency: Customers must trust cloud providers with their data
and computations.
● Multi-tenancy: Risks associated with sharing resources with other customers.
Attack Model
● Placement: Attacker gains co-residency with the victim on the same physical
hardware.
● Extraction: Attacker exploits side channels to extract confidential information.
Threat Model
● Assumes that the cloud provider and infrastructure are trusted.
● Does not consider attacks that subvert administrator functions or exploit
hypervisor vulnerabilities.
● Adversaries: Malicious parties not affiliated with the provider.
● Victims: Users running confidential services on the cloud.
Co-Residency Attack
Determining Co-Residency
Causing Co-Residency
Instance Flooding
Exploiting Co-Residency
● Cross-VM attacks: Gaining information, creating covert channels, influencing
resource usage.
● Cache timing attacks: Analyzing cache usage patterns to infer information.
Preventive Measures
Summary
Security Responsibilities
Challenges
● Data integrity: Ensuring the integrity of data shared among multiple users.
● Choosing an ideal vendor: Selecting a trustworthy and secure service
provider.
Types of Collaboration
Objectives
SelCSP Framework
● Utilizes a fuzzy inference system for distributed tracking and access control.
● Maps permissions to local roles using a heuristic for the IDRM availability
problem.
Conflict Removal
● An exactly matching role set exists: use existing roles to resolve conflicts.
● No exactly matching role set exists: create virtual or collaborating roles to
address conflicts.
Summary
Motivations
Objectives
Approaches
Provider Offerings
Migration Decision
● Fuzzy inference engine: Used to determine the need for migration based on
SLA satisfaction.
● Input: Factors influencing SLA satisfaction.
● Output: Degree of SLA satisfaction.
● Migration threshold: If satisfaction falls below the threshold, migration is
initiated.
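An illustrative sketch of the threshold logic only (the lecture's engine is a fuzzy inference system; the weighted average, factor names, and threshold below are our stand-ins, not the actual model):

MIGRATION_THRESHOLD = 0.6  # assumed threshold, not from the notes

def sla_satisfaction(factors, weights):
    """Aggregate normalized SLA factors (0..1) into one degree of satisfaction."""
    return sum(weights[k] * factors[k] for k in factors) / sum(weights.values())

factors = {"availability": 0.7, "response_time": 0.3, "throughput": 0.4}
weights = {"availability": 0.5, "response_time": 0.3, "throughput": 0.2}

degree = sla_satisfaction(factors, weights)
if degree < MIGRATION_THRESHOLD:
    print(f"satisfaction {degree:.2f} below threshold -> initiate migration")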
Case Studies
Results
Future Scope
Conclusion
Lecture 49 Notes: Docker Container - Demo
● Docker Engine installation: The first step for running Docker containers is to
install the Docker engine on the target system.
● Docker Hub: A repository containing a vast collection of pre-built Docker
images, including popular software like MySQL and phpMyAdmin.
● Docker image: A lightweight, standalone, and executable package containing
all necessary components to run an application.
● Demo objectives (sketched in code at the end of this section):
○ Install the Docker engine on an Ubuntu system.
○ Download MySQL and phpMyAdmin Docker images from Docker Hub.
○ Modify the MySQL database and create tables and records.
○ Package the modified MySQL image and transfer it to a Windows
system with Docker engine installed.
○ Run the transferred MySQL and phpMyAdmin containers on the
Windows system without requiring separate installations.
● Advantages of using Docker:
○ Eliminates the need for separate installations and configurations on
different systems.
○ Simplifies application deployment and ensures consistent execution
across environments.
○ Preserves data and application state during transfer.
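A hedged sketch of the demo using the Docker SDK for Python (pip install docker); the lecture itself drives the docker CLI, and the image tags, names, password, and port below are illustrative:

import docker

client = docker.from_env()               # talk to the local Docker engine

client.images.pull("mysql", tag="8.0")
client.images.pull("phpmyadmin/phpmyadmin")

net = client.networks.create("demo-net", driver="bridge")
mysql = client.containers.run(
    "mysql:8.0",
    name="demo-mysql",
    environment={"MYSQL_ROOT_PASSWORD": "example"},
    network="demo-net",
    detach=True,
)
ui = client.containers.run(
    "phpmyadmin/phpmyadmin",
    name="demo-phpmyadmin",
    environment={"PMA_HOST": "demo-mysql"},  # resolved via the shared network
    network="demo-net",
    ports={"80/tcp": 8080},                  # phpMyAdmin at http://localhost:8080
    detach=True,
)
# After modifying the database, the container can be packaged for transfer:
# mysql.commit(repository="my-mysql", tag="modified")
print(mysql.status, ui.status)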
Lecture 50 Notes: Docker Container - Demo (Continued)
● 5G: The fifth generation of mobile networks, a new global wireless standard
designed to connect various devices, machines, and objects.
● Key Features of 5G:
○ Higher data speeds.
○ Ultra-low latency.
○ Increased reliability.
○ Massive network capacity.
○ Improved availability.
○ More uniform user experience.
● Benefits of 5G for Cloud Computing:
○ Enhanced distribution and diversity of computing and storage
resources.
○ Improved support for heterogeneous environments.
○ Closing the gap between resource-constrained devices and distant
cloud centers.
● Edge Computing in 5G: Brings cloud capabilities closer to the user, enabling
faster processing and lower latency.
● Mobile Edge Computing (MEC): A key enabler for 5G, providing cloud
resources at the edge of the network.
● 5G's Role in Meeting Network Traffic Needs:
○ Handles massive amounts of data generated by mobile devices and
IoT.
○ Supports the stringent QoS requirements of interactive applications.
○ Provides a heterogeneous environment for interoperability among
diverse devices.
● Applications of Edge Computing:
○ Healthcare.
○ Entertainment and multimedia.
○ Smart cities, transportation, and logistics.
○ Industrial automation.
○ AR/VR applications.
● Mobile Cloud Computing (MCC): Integrates cloud computing with mobile
networks, enabling resource sharing and service delivery to mobile devices.
● Synergy Between 5G and Cloud Computing: 5G's high bandwidth, low
latency, and reliability create a strong foundation for a more powerful and
efficient cloud computing ecosystem.