Topics:
• Elastic Load Balancing
• Amazon EC2 Auto Scaling
• Amazon Route 53

Lab 3: Using Auto Scaling in AWS
In a traditional data center environment, the scalability of your system is bound by your
hardware. Take the example of a tax preparation business in the United States. US
taxpayers must file their taxes by April 15. Online tax preparation companies know that
they will experience a steady flow of traffic starting near the middle of January, with
traffic peaking close to the April 15 deadline. In a data center, anticipating this four-
month period of heavy utilization requires spinning up enough physical servers to
handle the anticipated load. But what happens to those servers the rest of the year?
They sit idle in the data center.
What's required to implement a scalable, on-demand alternative? Let's see how several AWS services can be used together to create such an architecture.
Amazon EC2 Auto Scaling helps you maintain application availability and allows you to dynamically scale your capacity up or down automatically according to conditions you define.
Amazon Route 53 is a highly available and scalable cloud Domain Name System (DNS)
web service.
Elastic Load Balancing features:
• High availability
• Health checks
• Security features
• TLS termination
• Layer 4 or Layer 7 load balancing
• Operational monitoring
High Availability
Elastic Load Balancing automatically distributes traffic across multiple targets –
Amazon EC2 instances, containers and IP addresses – in a single Availability Zone or
multiple Availability Zones.
Health Checks
Elastic Load Balancing can detect unhealthy targets, stop sending traffic to them, and
then spread the load across the remaining healthy targets.
Security Features
Use Amazon Virtual Private Cloud (Amazon VPC) to create and manage security
groups associated with load balancers to provide additional networking and security
options. You can also create an internal (non-internet-facing) load balancer.
TLS Termination
Elastic Load Balancing provides integrated certificate management and SSL
decryption, allowing you the flexibility to centrally manage the SSL settings of the
load balancer and offload CPU intensive work from your application.
Layer 4 or Layer 7 Load Balancing
You can load balance HTTP/HTTPS applications for layer 7-specific features, or use
strict layer 4 load balancing for applications that rely purely on the TCP protocol.
Operational Monitoring
Elastic Load Balancing provides integration with Amazon CloudWatch metrics and
request tracing in order to monitor performance of your applications in real time.
Application Load Balancer:
• Flexible application management
• Advanced load balancing of HTTP and HTTPS traffic
• Operates at the request level (Layer 7)

Network Load Balancer:
• Extreme performance and static IP for your application
• Load balancing of TCP traffic
• Operates at the connection level (Layer 4)

Classic Load Balancer:
• For applications that use the EC2-Classic network
• Operates at both the request level and connection level
Elastic Load Balancing supports three types of load balancers: Application Load Balancers,
Network Load Balancers, and Classic Load Balancers. You can select a load balancer based on
your application needs.
An Application Load Balancer (ALB) functions at the application layer, the seventh layer of the Open Systems Interconnection (OSI) model. Application Load Balancers support content-based routing and applications that run in containers. They support a pair of industry-standard protocols (WebSocket and HTTP/2) and also provide additional visibility into the health of the target instances and containers. Websites and mobile apps, running in containers or on EC2 instances, benefit from the use of Application Load Balancers. The Application Load Balancer is ideal for advanced load balancing of HTTP and HTTPS traffic; it provides advanced request routing that supports modern application architectures, including microservices and container-based applications.
The Network Load Balancer (NLB) is designed to handle tens of millions of requests per second while maintaining high throughput at ultra-low latency, with no effort on your part. The Network Load Balancer operates at the connection level (Layer 4), routing connections to targets (Amazon EC2 instances, containers, and IP addresses) based on IP protocol data. The Network Load Balancer is API-compatible with the Application Load Balancer, including full programmatic control of target groups and targets. It is ideal for load balancing of TCP traffic and is optimized to handle sudden and volatile traffic patterns while using a single static IP address per Availability Zone.
The Classic Load Balancer (CLB) provides basic load balancing across multiple Amazon EC2 instances and operates at both the request level and connection level. The Classic Load Balancer is intended for applications that were built within the EC2-Classic network.
If you only provide one subnet, ELB will launch two nodes in the Availability
Zone of your subnet.
If you provide two subnets in different Availability Zones, ELB will launch
one node in each Availability Zone.
[Diagram: The load balancer endpoint (name.region.elb.amazonaws.com) resolves to the ENI IP addresses of the load balancer nodes, e.g., 54.234.123.234 and 54.234.123.235.]
Load Distribution
Load is distributed to each load balancer node via DNS round robin. Load balancer nodes distribute HTTP traffic to instances using the least outstanding requests algorithm.
Load Balancer
A load balancer serves as the single point of contact for clients. The load balancer
distributes incoming application traffic across multiple targets, such as EC2 instances,
in multiple Availability Zones. This increases the availability of your application. You
add one or more listeners to your load balancer.
A listener checks for connection requests from clients, using the protocol and port
that you configure, and forwards requests to one or more target groups, based on the
rules that you define. Each rule specifies a target group, condition, and priority. When
the condition is met, the traffic is forwarded to the target group. You must define a
default rule for each listener, and you can add rules that specify different target
groups based on the content of the request (also known as content-based routing).
Each target group routes requests to one or more registered targets, such as EC2
instances, using the protocol and port number that you specify. You can register a
target with multiple target groups. You can configure health checks on a per target
group basis. Health checks are performed on all targets registered to a target group
that is specified in a listener rule for your load balancer.
The accompanying diagram illustrates the basic components: each listener contains a default rule, and one listener contains another rule that routes requests to a different target group. One target is registered with two target groups.
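As a minimal sketch (all names, IDs, and ARNs below are placeholders rather than values from this lab), these components can be created with the AWS CLI:

  aws elbv2 create-target-group \
    --name my-targets --protocol HTTP --port 80 \
    --vpc-id vpc-0123456789abcdef0

  aws elbv2 create-load-balancer \
    --name my-alb \
    --subnets subnet-0123456789abcdef0 subnet-0fedcba9876543210

  # Substitute the ARNs returned by the two commands above.
  aws elbv2 create-listener \
    --load-balancer-arn <load-balancer-arn> --protocol HTTP --port 80 \
    --default-actions Type=forward,TargetGroupArn=<target-group-arn>

  aws elbv2 register-targets \
    --target-group-arn <target-group-arn> --targets Id=i-0123456789abcdef0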
AWS Auto Scaling monitors your applications and automatically adjusts capacity to
maintain steady, predictable performance at the lowest possible cost. Using AWS
Auto Scaling, it’s easy to set up application scaling for multiple resources across
multiple services in minutes. The service provides a simple, powerful user interface
that lets you build scaling plans for resources including Amazon EC2 instances and
Spot Fleets, Amazon ECS tasks, Amazon DynamoDB tables and indexes, and Amazon
Aurora Replicas. AWS Auto Scaling makes scaling simple with recommendations that
allow you to optimize performance, costs, or balance between them. If you’re already
using Amazon EC2 Auto Scaling to dynamically scale your Amazon EC2 instances, you
can now combine it with AWS Auto Scaling to scale additional resources for other
AWS services. With AWS Auto Scaling, your applications always have the right
resources at the right time.
It’s easy to get started with AWS Auto Scaling using the AWS Management Console,
Command Line Interface (CLI), or SDK. AWS Auto Scaling is available at no additional
charge. You pay only for the AWS resources needed to run your applications and
Amazon CloudWatch monitoring fees.
Launch Configuration / Launch Template:
• Instance configuration to be launched:
  • AMI
  • Instance type
  • Security group
  • Instance key pair
  • Storage
  • IAM roles
  • User data
• Only one active launch configuration at a time.

EC2 Auto Scaling Group:
• Logical group of EC2 instances
• Automatically scale between:
  • Min
  • Desired (optional)
  • Max
• Integration with Elastic Load Balancing (optional)
• Health checks to maintain group size
• Distribute and balance instances across Availability Zones.

EC2 Auto Scaling Policy:
• Parameters for performing an Amazon EC2 auto scaling action
• How to trigger policies?
  • Amazon CloudWatch-driven
  • Instance failure (health check)
  • Scheduled
  • Manually
• Scale out/in and by how much:
  • ChangeInCapacity (+/-#)
  • ExactCapacity (#)
  • ChangeInPercent (+/-%)
• Cooldown period (simple scaling)
• Warmup period (step scaling)
Creating a launch configuration works much like creating an individual instance: you
must specify the same characteristics—IAM roles, security groups, storage, instance
type, user data, key pairs, etc. However, you do not specify the VPC or subnet in which
your instances will launch. That will be specified by the Auto Scaling group that uses
your launch configuration.
Launch configurations do allow you to specify one networking option: whether or not
to automatically assign a public IP address to each new instance that is created from
the launch configuration. Note that it is not necessary to select this option if your
instances will be launched in a private subnet behind a public Elastic Load Balancing
load balancer.
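A minimal sketch (the AMI, key pair, security group, and subnet IDs are placeholders) of a launch configuration and an Auto Scaling group that launches instances from it:

  aws autoscaling create-launch-configuration \
    --launch-configuration-name my-lc \
    --image-id ami-0123456789abcdef0 \
    --instance-type t3.micro \
    --key-name my-key \
    --security-groups sg-0123456789abcdef0 \
    --associate-public-ip-address

  # The group, not the launch configuration, specifies the subnets
  # (and therefore the VPC) in which instances launch.
  aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name my-asg \
    --launch-configuration-name my-lc \
    --min-size 2 --max-size 10 --desired-capacity 2 \
    --vpc-zone-identifier "subnet-0123456789abcdef0,subnet-0fedcba9876543210"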
There are two basic ways to trigger changes to an EC2 Auto Scaling group. First, you can
define a scaling policy that either scales out or scales in based on a CloudWatch Alarm.
You can define a CloudWatch Alarm—e.g., "Average CPU Utilization > 50%"—that calls
a scaling policy. The policy will specify to either add or remove a fixed number of
instances or to adjust the number of running instances as a percentage of the desired
capacity of the EC2 Auto Scaling group.
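As an illustrative sketch (group name, policy name, and thresholds are placeholders), the CloudWatch alarm and scaling policy are wired together like this:

  # Create the policy; note the PolicyARN in the response.
  aws autoscaling put-scaling-policy \
    --auto-scaling-group-name my-asg \
    --policy-name add-two-instances \
    --adjustment-type ChangeInCapacity \
    --scaling-adjustment 2

  # Alarm when average CPU exceeds 50% for two consecutive 5-minute
  # periods; the alarm action is the PolicyARN returned above.
  aws cloudwatch put-metric-alarm \
    --alarm-name my-asg-cpu-high \
    --namespace AWS/EC2 --metric-name CPUUtilization \
    --dimensions Name=AutoScalingGroupName,Value=my-asg \
    --statistic Average --period 300 --evaluation-periods 2 \
    --threshold 50 --comparison-operator GreaterThanThreshold \
    --alarm-actions <policy-arn>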
Second, you can define a scheduled action. Scheduled actions set a new desired
capacity value at a specific time. You can specify a scheduled action to trigger on a
specific date and time or specify a recurring action that is executed at specific times
throughout a week, month, or year. Scheduled actions are an excellent way to pre-
warm capacity in response to anticipated traffic spikes.
Surge Queue Length (a Classic Load Balancer CloudWatch metric) is the number of requests queued by the load balancer while waiting for a back-end instance to accept connections and process the request.
To manually change an EC2 Auto Scaling group's size, use the set-desired-capacity operation in the following manner:

  aws autoscaling set-desired-capacity --auto-scaling-group-name my-asg \
    --desired-capacity 2 --honor-cooldown
EC2 Auto Scaling maintains health state for instances and terminates instances
marked Unhealthy.
By default, it uses Amazon EC2 instance status checks.
If an Auto Scaling group is behind a load balancer, either the load balancer's instance
checks or the Amazon EC2 instance checks are used.
External scripts can trigger the recycling of an instance with the aws autoscaling set-
instance-health command.
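For example, an external script might mark an instance for replacement like this (the instance ID is a placeholder):

  aws autoscaling set-instance-health \
    --instance-id i-0123456789abcdef0 --health-status Unhealthy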
Note:
• If you call TerminateInstance and then SetDesiredCapacity, you risk having the Auto Scaling group relaunch the "failed" instance.
• If you call SetDesiredCapacity and then TerminateInstance, you risk having Auto Scaling terminate an instance other than the one with the least user sessions, followed by the TerminateInstance call terminating the intended instance, in which case the Auto Scaling group will relaunch another instance.
Termination policy example: ClosestToNextInstanceHour terminates the instance closest to the next billable hour (default).
EC2 Auto Scaling health checks allow us to create a "steady state" group that ensures
that a single instance is always running. This is useful for situations such as a Network
Address Translation (NAT) server, which is a single point of failure in a standard
public/private subnet architecture.
To create a steady state group for an instance, first create a launch configuration that
creates the instance. Then, create an EC2 Auto Scaling group with a minimum,
maximum, and desired size of 1. Whenever such an instance is marked as unhealthy
(e.g., an instance check fails, or an external script marks the instance as unhealthy
with a call to EC2 Auto Scaling set-instance-health), the EC2 Auto Scaling group will
terminate the existing instance and create a new one from the group's launch
configuration.
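A sketch of such a steady state group, assuming a launch configuration named nat-lc already exists (all names and IDs are placeholders):

  aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name nat-steady-state \
    --launch-configuration-name nat-lc \
    --min-size 1 --max-size 1 --desired-capacity 1 \
    --vpc-zone-identifier subnet-0123456789abcdef0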
Note that in cases such as deploying a NAT instance, the NAT is still a single point of
failure, and you can still experience significant downtime while a failed NAT instance
is recycling. We cover more advanced strategies for high-availability NAT architecture
in our Advanced Architecting course.
To configure your Auto Scaling group to scale based on a schedule, you create a
scheduled action, which tells Amazon EC2 Auto Scaling to perform a scaling action at
specified times. To create a scheduled scaling action, you specify the start time when
the scaling action should take effect, and the new minimum, maximum, and desired
sizes for the scaling action. At the specified time, Amazon EC2 Auto Scaling updates
the group with the values for minimum, maximum, and desired size specified by the
scaling action.
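Returning to the tax-season example, a sketch of one-time and recurring scheduled actions (names, dates, and sizes are illustrative):

  # One-time action for a specific date and time (UTC).
  aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name my-asg \
    --scheduled-action-name tax-season-scale-out \
    --start-time 2026-01-15T08:00:00Z \
    --min-size 10 --max-size 40 --desired-capacity 20

  # Recurring action using a cron expression (UTC).
  aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name my-asg \
    --scheduled-action-name weekday-morning-scale-out \
    --recurrence "0 8 * * 1-5" \
    --desired-capacity 10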
Predictive Scaling:
• Forecast load
• Schedule minimum capacity
• Use with:
  • Dynamic scaling (target tracking)
  • Applications that have periodic spikes
AWS also provides Predictive Scaling. You can use Predictive Scaling to scale your
Amazon EC2 capacity in advance of traffic changes. Auto Scaling enhanced with
Predictive Scaling delivers faster, simpler, and more accurate capacity provisioning,
resulting in lower cost and more responsive applications.
Predictive Scaling predicts future traffic based on daily and weekly trends, including
regularly-occurring spikes, and provisions the right number of EC2 instances in
advance of anticipated changes. Provisioning the capacity just in time for an
impending load change makes Auto Scaling faster than ever before. Predictive
Scaling’s machine learning algorithms detect changes in daily and weekly patterns,
automatically adjusting their forecasts. This removes the need for manual adjustment
of Auto Scaling parameters over time, making Auto Scaling simpler to configure and
consume.
Predictive Scaling can be configured through the AWS Auto Scaling console, the AWS Auto Scaling APIs via the SDK/CLI, and CloudFormation. To get started, navigate to the AWS Auto Scaling page and create a scaling plan for Amazon EC2 resources that includes Predictive Scaling. Once enabled, you can visualize the forecasted traffic and the generated scaling actions within a few seconds.
You can use predictive scaling, dynamic scaling, or both. Predictive scaling works by
forecasting load and scheduling minimum capacity; dynamic scaling uses target
tracking to adjust a designated CloudWatch metric to a specific target. The two
models work well together because of the scheduled minimum capacity already set
by predictive scaling.
Predictive scaling is a great match for web sites and applications that undergo
periodic traffic spikes. It is not designed to help in situations where spikes in load are
not cyclic or predictable.
[Chart: Scale-out policy example: add 2 instances when 80% < average CPU < 100%.]
[Chart: When CPU utilization is more than 60% and less than 80%, the scale-out alarm is triggered.]
[Chart: The newly launched instance enters its warm-up period.]
[Chart: When CPU utilization is more than 80% and less than 100%:
• The scale-out alarm is triggered.
• The alarm occurs during the instance warm-up period.
• Because the alarm occurred during an instance warm-up period, two instances are launched in total (toward the group maximum of 20).]
[Chart: When CPU utilization is more than 40% and less than 60%, no alarm is triggered.]
Cooldown Period
Upon executing a scale-in or scale-out, further scaling activities are suspended for a cooldown period (used for simple scaling policies). Example: suspend scaling for 5 minutes.
Thrashing occurs when your scaling settings cause capacity to be removed and then quickly re-added. Alarms invoke actions for sustained state changes only: CloudWatch alarms will not invoke actions simply because they are in a particular state; the state must have changed and been maintained for a specified number of periods.
For more information about general Amazon CloudWatch concepts, see http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/cloudwatch_concepts.html.
The Auto Scaling cooldown period is a configurable setting for your EC2 Auto Scaling
group that helps to ensure that EC2 Auto Scaling doesn't launch or terminate
additional instances before the previous scaling activity takes effect. After the EC2
Auto Scaling group dynamically scales using a simple scaling policy, EC2 Auto Scaling
waits for the cooldown period to complete before resuming scaling activities. When
you manually scale your EC2 Auto Scaling group, the default is not to wait for the
cooldown period, but you can override the default and honor the cooldown period.
Note that if an instance becomes unhealthy, EC2 Auto Scaling does not wait for the
cooldown period to complete before replacing the unhealthy instance. EC2 Auto
Scaling supports both default cooldown periods and scaling-specific cooldown
periods. Amazon EC2 Auto Scaling supports cooldown periods when using simple
scaling policies, but not when using target tracking policies, step scaling policies, or
scheduled scaling.
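A sketch of both cooldown settings (values are illustrative): a default cooldown on the group, and a scaling-specific cooldown that overrides it for one simple scaling policy:

  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name my-asg --default-cooldown 300

  aws autoscaling put-scaling-policy \
    --auto-scaling-group-name my-asg \
    --policy-name remove-one-instance \
    --adjustment-type ChangeInCapacity \
    --scaling-adjustment -1 \
    --cooldown 180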
Instance Warmup
With step scaling policies, you can specify the number of seconds that it takes for a
newly launched instance to warm up. Until its specified warm-up time has expired, an
instance is not counted toward the aggregated metrics of the Auto Scaling group.
While scaling out, AWS also does not consider instances that are warming up as part
of the current capacity of the group. Therefore, multiple alarm breaches that fall in
the range of the same step adjustment result in a single scaling activity. This ensures
that we don't add more instances than you need.
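A sketch of a step scaling policy with a warm-up time (the step boundaries are relative to the alarm threshold; all values are illustrative):

  # With an alarm threshold of 60% CPU: add 1 instance for 60-80%,
  # and 2 instances above 80%. New instances warm up for 300 seconds.
  aws autoscaling put-scaling-policy \
    --auto-scaling-group-name my-asg \
    --policy-name cpu-step-scale-out \
    --policy-type StepScaling \
    --adjustment-type ChangeInCapacity \
    --metric-aggregation-type Average \
    --estimated-instance-warmup 300 \
    --step-adjustments MetricIntervalLowerBound=0,MetricIntervalUpperBound=20,ScalingAdjustment=1 \
                       MetricIntervalLowerBound=20,ScalingAdjustment=2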
Scale Out:
1. Launch the instance.
2. Send notification.
3. Act upon the instance (e.g., install software).
4. Add it to the group.

Scale In:
1. Remove instance from the Auto Scaling group.
2. Send notification.
3. Act upon the instance (e.g., retrieve logs).
4. Terminate it.
In some cases, you may need to intervene before an EC2 Auto Scaling action adds to
or subtracts from your EC2 Auto Scaling group. EC2 Auto Scaling group lifecycle hooks
make this easy.
For example, during a scale-out event, EC2 Auto Scaling will launch an instance and
send out a pre-configured notification to a person or application and take no further
action. The receiver of the notification can then perform an action on the instance,
such as install software on it, before adding it to the EC2 Auto Scaling group.
Alternatively, during a scale-in event, EC2 Auto Scaling lifecycle hooks can be used to
remove the instance from service and again send out a notification. Upon receipt of
the notification, the agent can perform an action on the instance, such as retrieve
logs, before terminating the instance.
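A sketch of a launch lifecycle hook and the completion call the notified agent makes when it finishes (names, ARNs, and the account ID are placeholders):

  aws autoscaling put-lifecycle-hook \
    --lifecycle-hook-name install-software \
    --auto-scaling-group-name my-asg \
    --lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING \
    --notification-target-arn arn:aws:sns:us-west-2:123456789012:my-topic \
    --role-arn arn:aws:iam::123456789012:role/my-hook-role \
    --heartbeat-timeout 900 --default-result ABANDON

  # After acting on the instance (e.g., installing software), the agent
  # tells Auto Scaling to proceed with adding it to the group.
  aws autoscaling complete-lifecycle-action \
    --lifecycle-hook-name install-software \
    --auto-scaling-group-name my-asg \
    --lifecycle-action-result CONTINUE \
    --instance-id i-0123456789abcdef0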
For more information about the EC2 Auto Scaling lifecycle, see http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/AutoScalingGroupLifecycle.html.
Elastic Load Balancing and Auto Scaling allow you to achieve highly flexible, scalable,
and resilient architectural designs. But what if you want to distribute traffic across
regions? There are various reasons you would want to distribute traffic across regions,
including disaster recovery (for widespread outages) and reduced latency (providing
services closer to where users are located).
Amazon Route 53 is a highly available and scalable cloud Domain Name System (DNS)
web service. It is designed to give developers and businesses an extremely reliable and
cost-effective way to route end users to internet applications by translating names like
www.example.com into the numeric IP addresses that computers use to connect to each other, e.g., 192.0.2.1.
[Diagram: A user's DNS query is answered by Amazon Route 53, which returns either some-elb-name.us-west-2.elb.amazonaws.com or some-elb-name.ap-southeast-2.elb.amazonaws.com depending on latency.]
Assume that you want to distribute your architecture across several regions around the
world and provide the fastest response time. Often, but not always, the region closest
to the user provides the fastest response times.
You can use Amazon Route 53 to perform what is known as latency-based routing
(LBR), which allows you to use DNS to route user requests to the AWS Region that will
give your users the fastest response.
For example, assume that you have load balancers in the US West (Oregon) Region and
in the Asia Pacific (Sydney) Region, and you've created a latency resource record set in
Amazon Route 53 for each load balancer. A user in Barcelona enters the name of your
domain in a browser, and DNS routes the request to an Amazon Route 53 name server.
Amazon Route 53 refers to its data on latency between the different regions and routes
the request appropriately.
In most cases, this means your request is routed to the nearest geographical location:
Australia for a user in New Zealand, or Oregon for a user in Canada.
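A sketch of one of the two latency records (the hosted zone IDs are placeholders; the second record would name the Sydney load balancer with Region ap-southeast-2):

  aws route53 change-resource-record-sets \
    --hosted-zone-id Z0EXAMPLE \
    --change-batch '{
      "Changes": [{
        "Action": "CREATE",
        "ResourceRecordSet": {
          "Name": "www.example.com",
          "Type": "A",
          "SetIdentifier": "us-west-2",
          "Region": "us-west-2",
          "AliasTarget": {
            "HostedZoneId": "<elb-hosted-zone-id>",
            "DNSName": "some-elb-name.us-west-2.elb.amazonaws.com",
            "EvaluateTargetHealth": true
          }
        }
      }]
    }'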
Note that you now have all of the components of a scalable architecture, which
provides you with resiliency and scalability at different levels:
• Auto Scaling provides scalability of resources across subnets and Availability
Zones within an Amazon VPC.
• An Elastic Load Balancing load balancer handles addressing and health checks
across one or more Auto Scaling groups, routing requests to healthy instances.
• Amazon Route 53 can route traffic to the closest Elastic Load Balancing load balancer and re-route traffic to a different Amazon VPC or an entirely separate AWS Region in the event of a slowdown or catastrophe in a region.
[Diagram: Users reach www.example.com over the internet. Amazon Route 53 weighted routing shifts traffic from the existing (Blue) system's load balancer to the new (Green) system's load balancer by gradually adjusting record weights (e.g., from 100/0 toward 0/100).]
Technologies such as CloudWatch and CloudWatch Logs can be used to monitor the
green environment. If problems are found anywhere in the new environment,
weighted routing can be deployed to shift users back to the running blue servers.
When the new green environment is fully up and running without issues, the blue
environment can gradually be shut down. Due to the potential latency of DNS
records, a full shutdown of the blue environment can take anywhere from a day to a
week.
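A sketch of the weighted pair behind such a shift (zone ID, record names, and weights are illustrative); re-running UPSERT with new weights gradually moves traffic:

  aws route53 change-resource-record-sets \
    --hosted-zone-id Z0EXAMPLE \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "www.example.com", "Type": "CNAME",
          "SetIdentifier": "blue", "Weight": 95, "TTL": 60,
          "ResourceRecords": [{"Value": "blue-elb.us-west-2.elb.amazonaws.com"}]
        }
      }, {
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "www.example.com", "Type": "CNAME",
          "SetIdentifier": "green", "Weight": 5, "TTL": 60,
          "ResourceRecords": [{"Value": "green-elb.us-west-2.elb.amazonaws.com"}]
        }
      }]
    }'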
Load-testing tools:
Grinder: http://grinder.sourceforge.net/
JMeter: http://jmeter.apache.org/
Bees with Machine Guns: https://github.com/newsapps/beeswithmachineguns