AWS Game Tech - Intro Guide to Scalable Game Development
INTRODUCTORY GUIDE TO SCALABLE GAME DEVELOPMENT ON AWS
Not only do you want your game to be compelling, but you also want to give players the wide range of online features they expect. This includes friend lists, leaderboards, weekly challenges, various multiplayer modes, ongoing content releases, and more.

To successfully execute a game launch, you need to create momentum. Favorable app store ratings and reviews on popular e-retail channels are critical for promoting awareness and boosting sales, just like with the first weekend of a movie release. To increase favorable ratings, it’s important to deliver features that excite players. Supporting these features requires a server backend. The server backend can consist of the actual game servers for multiplayer games or servers that power game services like chat and matchmaking. In the event that your game goes viral and suddenly explodes from 100 to 100,000 players, you’ll need a server backend that can scale up at a moment’s notice. At the same time, you want a cost-effective solution, so you don’t overpay for unused server capacity.

Amazon Web Services (AWS) is a flexible, cost-effective, easy-to-use cloud service. By running your game on AWS, you can use on-demand capacity to scale up and down with your players instead of guessing at server demands and potentially over- or under-purchasing hardware. Some of the world’s leading mobile, AAA, and indie developers, including Rovio, Epic Games, and Gearbox Software, have recognized the advantages of AWS and are successfully running their games on the AWS Cloud.

This guide is broken into sections that cover the different features of modern games, including friend lists, leaderboards, game servers, messaging, and user-generated content. You can start small using the AWS components and services you need. Then, you can revisit this guide to evaluate additional AWS features as your game evolves.
Quick jump

1.0 Before you start
    Game design decisions
    Game client considerations
2.0 Launching a game backend on AWS
    High availability, scalability, and security
    Binary game data with Amazon S3
    Reference architecture for a full game backend
3.0 Scaling game servers on AWS
    Games as REST APIs
    HTTP load balancing
    Game servers
    Targeted and group messages
    Final thoughts on game servers
4.0 Scaling data storage for games on AWS
    Relational vs. NoSQL databases
    MySQL
    Redis
    Amazon DynamoDB
    Other NoSQL options
    Binary game content with Amazon S3
5.0 Scaling game services as asynchronous jobs
    Leaderboards and avatars
    FIFO queues
6.0 Getting started
DEVICE-TO-DEVICE PLAY
Players expect their saved games, profiles, and other data to be stored online, allowing them to easily move from device to device. This operation typically involves synchronizing and merging local data as you move from one device to another, so a local data storage solution isn’t always the right fit.

LEADERBOARDS AND RANKINGS
Players continue to look for a competitive experience similar to classic arcade games. However, the focus is increasingly on friends’ leaderboards rather than just a single global high score list. This requires a leaderboard that can sort in multiple dimensions while maintaining good performance.

For example, if you have a simple top 10 leaderboard, you might be able to store it in a single MySQL or Amazon Aurora database table. If, instead, you have complex leaderboards with multiple sort dimensions, it might be necessary to use a NoSQL technology, such as Amazon ElastiCache or Amazon DynamoDB (which are discussed later in this guide).

FREE-TO-PLAY MODEL
One of the biggest shifts over the past few years has been the widespread move to the free-to-play model. In this model, games are free to download and play, and games earn money through advertising and in-app purchases (IAP) for items such as weapons, outfits, power-ups, and boost points. Free-to-play games are funded by a small group of players who purchase these items, while the vast majority of users play for free. This means your game backend needs to be as cost-effective as possible, and it must be able to scale up and down as needed. Even for premier AAA games, a large percentage of revenue comes from content updates and in-game purchases.

ANALYTICS
Maximizing long-tail revenue requires games to collect and analyze a large number of metrics regarding gameplay patterns, favorite items, purchase preferences, and more. To ensure the success of in-game purchases, it’s important that new game features target those areas of the game where players are spending their time and money. Analytics can also provide insights on how to improve gameplay and drive more player engagement.

CONTENT UPDATES
Games that achieve the highest player retention tend to have a continuous release cycle of new items, levels, challenges, and achievements. The trend of games becoming more of a service than a single product reinforces the need for constant post-launch changes and frequent updates with new data and game assets. However, it’s important to find a balance in how much new content you launch and when, so as not to overwhelm players. You can cut costs and increase download speed by using a content delivery network (CDN) to distribute this game content.

SYNCHRONOUS GAMEPLAY
Synchronous multiplayer features enable players to enjoy real-time interactions with other players. However, moderation is key here, as real-time interactions require a constant connection to the server, which can affect your game’s performance.

ASYNCHRONOUS GAMEPLAY
While larger games generally include a real-time online multiplayer mode, developers of all kinds of games are realizing the importance of keeping players engaged with asynchronous features. An example of asynchronous play is competing against your friends by tracking points, unlocks, badges, or similar achievements. This gives players the feel of a connected game experience, even if they aren’t online all the time or are using slower networks (like 3G or 4G) for mobile games.

PUSH NOTIFICATIONS
A common method for bringing players back to a game is to send targeted push notifications to their mobile devices. For example, a user might get a notification that their friend beat their score or that a new challenge or level is available, drawing them back into the core game experience.

UNPREDICTABLE CLIENTS
Modern games run on a wide variety of platforms, including mobile devices, consoles, PCs, and browsers. A player roaming on their portable device can compete against a console user on Wi-Fi, and both would expect a consistent experience. That’s why it’s necessary to use stateless protocols (for example, HTTP) and asynchronous calls as much as possible.
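The note on keeping a simple top 10 leaderboard in a single relational table can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely as a local stand-in for MySQL or Amazon Aurora; the table name `leaderboard`, its columns, and the demo data are assumptions made for illustration and do not come from this guide.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with 25 hypothetical players."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE leaderboard ("
        " player TEXT PRIMARY KEY,"
        " score INTEGER NOT NULL)"
    )
    # An index on score keeps the top-N query cheap as the table grows.
    conn.execute("CREATE INDEX idx_leaderboard_score ON leaderboard (score DESC)")
    conn.executemany(
        "INSERT INTO leaderboard (player, score) VALUES (?, ?)",
        [(f"player{i:02d}", i * 100) for i in range(1, 26)],
    )
    conn.commit()
    return conn


def top_scores(conn, limit=10):
    """Return (player, score) rows, highest score first."""
    cur = conn.execute(
        "SELECT player, score FROM leaderboard "
        "ORDER BY score DESC, player ASC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()
```

A single sort dimension like this fits one indexed table. Once a game needs per-friend or per-region rankings, the query patterns multiply, which is where the NoSQL options mentioned above may become attractive.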
1.0 BEFORE YOU START
This guide focuses on the architecture you can deploy on AWS. However, your game client implementation can also impact your game’s scalability. And because frequent network requests from the client use more bandwidth and require more server resources, it affects the cost of running your game backend. Here are a few important guidelines to follow:

• All network calls should be asynchronous and non-blocking. This means that when a network request is initiated, the game client continues on without waiting for a response from the server. When the server responds, this triggers an event on the client, which is handled by a callback of some kind in the client code. On iOS, AFNetworking is one popular approach. Browser games should use a call such as jQuery.ajax() or the equivalent. And C++ clients should consider libcurl, std::async, or similar libraries. Popular game engines usually include an asynchronous method for network and web requests. For example, Unity offers UnityWebRequest and Unreal Engine has HttpRequest.

• Use JSON to transport data. It’s compact, cross-platform, fast to parse, has tons of library support, and contains data type information. If you have large payloads, use gzip format, as the majority of web servers and mobile clients have native support for gzip. Don’t waste time with over-optimization; any payload in the range of hundreds of kilobytes should be adequate. Developers also use Apache Avro and MessagePack depending on their use case, comfort level with the formats, and library availability. (Note: An exception to this is multiplayer gameplay packets, which typically use UDP protocol, but this is a separate topic.)

• Use HTTP/1.1 with keepalives, and reuse HTTP connections between requests. This minimizes the overhead your game incurs when making network requests. Each time you have to open a new HTTP socket, it requires a TCP three-way handshake, which can add upwards of 50 milliseconds. Additionally, repeatedly opening and closing TCP connections will accumulate large numbers of sockets in the TIME_WAIT state on your server, which consumes valuable server resources.

• Always POST any sensitive data from the client to the server over SSL. This includes login, stats, saved data, unlocks, and purchases. Because modern computers are efficient at handling SSL and the overhead is low, the same applies for any GET, PUT, and DELETE requests. AWS offers Elastic Load Balancing to handle the SSL workload, completely offloading it from your servers. Multiplayer traffic is generally transmitted over UDP, but it’s encrypted and decrypted at each end by the developer.

• Never store security-critical data (like AWS access keys or other tokens) on the client device as part of your game data or user data. Possessors of access key IDs and secret access keys can make direct HTTP calls using the APIs for individual AWS services or programmatic calls to AWS from the AWS Command Line Interface (AWS CLI), AWS Tools for Windows PowerShell, or the AWS SDKs. If a person roots or jailbreaks their device, there’s a risk that they can gain access to your server code, user data, and even your AWS billing account. With PC games, your keys likely exist in memory when the game client is running. Pulling those keys out isn’t difficult for someone with the technical know-how. It’s safe to assume that anything you store on a game client will be compromised. If you want your game client to directly access AWS services, consider using Amazon Cognito federated identities, which allows your application to obtain temporary, limited-privilege credentials.

• As a precaution, never trust what a game client sends you. It’s an untrusted source, making it important to always validate what you receive. It can be something as trivial as a device clock set to a past time, but it can also be malicious traffic, such as SQL injection or XSS.

Many of these concerns are not specific to AWS and are typical client/server safety issues, but keeping them in mind will help you design a game that performs well and is reasonably secure.
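Two of the guidelines above, sending compact gzip-compressed JSON and never trusting what the client sends, can be sketched together using only the Python standard library. The field names (`player_id`, `score`), the `MAX_SCORE` bound, and the length limit are hypothetical values chosen for illustration; a real game would define its own schema and limits.

```python
import gzip
import json
import zlib

MAX_SCORE = 1_000_000  # hypothetical upper bound for a legitimate score


def encode_payload(data: dict) -> bytes:
    """Client side: serialize to compact JSON and gzip it."""
    text = json.dumps(data, separators=(",", ":"))
    return gzip.compress(text.encode("utf-8"))


def decode_and_validate(raw: bytes) -> dict:
    """Server side: decompress, parse, and validate an untrusted payload."""
    try:
        data = json.loads(gzip.decompress(raw).decode("utf-8"))
    except (OSError, EOFError, zlib.error, ValueError) as exc:
        raise ValueError("malformed payload") from exc
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")

    player_id = data.get("player_id")
    score = data.get("score")
    if not isinstance(player_id, str) or not 0 < len(player_id) <= 64:
        raise ValueError("invalid player_id")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int):
        raise ValueError("invalid score")
    if not 0 <= score <= MAX_SCORE:
        raise ValueError("score out of range")

    # Return only the fields the server expects; silently drop the rest.
    return {"player_id": player_id, "score": score}
```

Returning a fresh dictionary with only the expected keys means unexpected client fields never reach the database layer. Database writes should still use parameterized queries, since validation alone does not prevent SQL injection.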
2.0 LAUNCHING A GAME BACKEND ON AWS
Initial game backend

To ensure your game can scale out as it grows in popularity, use stateless protocols as much as possible.

Creating an HTTP/JSON API for the bulk of your game features allows you to dynamically add instances and easily recover from transient network issues. Our game backend (Figure 1) consists of a server that talks in HTTP/JSON, stores data in MySQL, and uses Amazon Simple Storage Service (Amazon S3) for binary content. This type of backend is easy to develop and scales effectively.

A common pattern for game developers is to run a web server locally on a laptop or desktop for development and push the server code to the cloud when it’s time to deploy. This pattern is best suited for stateless workloads (such as leaderboards, player data management, and more). It isn’t the best option for stateful workloads like game servers. If you follow this pattern, AWS Elastic Beanstalk can significantly simplify the process of deploying your code to AWS.

AWS Elastic Beanstalk is a deployment management service that sits on top of other AWS services, including Amazon Elastic Compute Cloud (Amazon EC2), Elastic Load Balancing, and Amazon Relational Database Service (Amazon RDS). Amazon EC2 is a web service that provides secure, resizable compute capacity in the cloud. It’s designed to make at-scale cloud computing easier for developers. The Amazon EC2 simple web service interface allows you to obtain and configure computing capacity with minimal friction. It reduces the time required to obtain and boot new server instances to merely minutes, allowing you to quickly scale up or down as your computing requirements change.

[Figure 1: Elastic Load Balancing in front of HTTP/JSON servers in Availability Zones A and B]
Elastic Load Balancing automatically distributes incoming application traffic across multiple Amazon EC2 instances. It enables you to achieve fault tolerance in your applications. Elastic Load Balancing offers two types of load balancers that feature high availability, automatic scaling, and robust security:

CLASSIC LOAD BALANCER
This load balancer routes traffic based on application- or network-level information. It’s ideal for simple load balancing of traffic across multiple Amazon EC2 instances.

APPLICATION LOAD BALANCER
This load balancer routes traffic based on advanced application-level information that includes the content of the request. It’s ideal for applications that need advanced routing capabilities, microservices, and container-based architectures.

Amazon RDS makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while automating time-consuming administration tasks, such as hardware provisioning, database setup, patching, and backups. Amazon RDS supports many familiar database engines, including Amazon Aurora, PostgreSQL, MySQL, and more.

You can push a zip, web application resource (WAR), or git repository of server code to AWS Elastic Beanstalk, which launches Amazon EC2 server instances, attaches a load balancer, sets up Amazon CloudWatch monitoring alerts, and deploys your application to the cloud. In short, AWS Elastic Beanstalk can automatically set up most of the architecture shown in Figure 1. This is covered in detail in the AWS Elastic Beanstalk Developer Guide.

To see AWS Elastic Beanstalk in action, sign in to the AWS Management Console and follow the Getting started using Elastic Beanstalk tutorial to create a new environment with the programming language of your choice. This will launch the sample application and boot a default configuration. You can use this environment to get a feel for the AWS Elastic Beanstalk control panel, how to update code, and how to modify environment settings. If you’re new to AWS, you can use the AWS Free Tier to set up these sample environments. (Note: The sample production environment described in this guide will incur costs because it includes AWS resources that aren’t covered under the AWS Free Tier.)

With the sample application up, we can create a new AWS Elastic Beanstalk application for our game and two new environments, one for development and one for production, that we’ll customize for our game. Use the following table to determine which settings to change based on the environment type. For detailed instructions, see Managing and configuring Elastic Beanstalk applications. Then, follow the instructions for Creating an Elastic Beanstalk environment in the AWS Elastic Beanstalk Developer Guide.
2.1 INITIAL GAME BACKEND
By using two environments, you can enable a simple, effective workflow. As you integrate new game backend features, you push your updated code to the development environment. This triggers AWS Elastic Beanstalk to restart the environment and create a new version. In your game client code, create two configurations, one that points to development and one that points to production. Use the development configuration to test your game, and use the production configuration when you want to create a new game version to publish to the appropriate app stores.

When your new game client is ready for release, choose the correct server code version from the development environment. Then, deploy it to the production environment. By default, deployments incur a brief period of downtime while your app is being updated and restarted. To avoid downtime for production deployments, you can follow a pattern known as swapping URLs or blue/green deployment. In this pattern, you deploy to a standby production environment and update DNS to point to the new environment. For more details on this approach, see Blue/Green deployments with Elastic Beanstalk in the AWS Elastic Beanstalk Developer Guide.

For more information, see Advanced environment customization with configuration files (.ebextensions) in the AWS Elastic Beanstalk Developer Guide.
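The two client configurations described above might look like the following minimal sketch. The class name `BackendConfig`, the build-type strings, and the `.invalid` URLs are all placeholders invented for illustration; your own endpoints would be the URLs of your development and production environments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendConfig:
    """Connection settings for one backend environment."""
    name: str
    base_url: str


# Hypothetical endpoints; replace with your own environment URLs.
CONFIGS = {
    "development": BackendConfig(
        "development", "https://dev.example-game.invalid/api"
    ),
    "production": BackendConfig(
        "production", "https://prod.example-game.invalid/api"
    ),
}


def select_config(build_type: str) -> BackendConfig:
    """Pick the backend for this build; fail loudly on an unknown type."""
    try:
        return CONFIGS[build_type]
    except KeyError:
        raise ValueError(f"unknown build type: {build_type!r}") from None


def endpoint(config: BackendConfig, path: str) -> str:
    """Join the base URL and a request path with exactly one slash."""
    return f"{config.base_url.rstrip('/')}/{path.lstrip('/')}"
```

Because the client addresses the backend by DNS name rather than by IP address, a blue/green swap on the server side needs no client release.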
For the production environment, you should ensure that your game backend is deployed in a fault-tolerant manner. Amazon EC2 is hosted in multiple AWS Regions worldwide. Choose a Region that’s near the bulk of your game’s customers to provide a low-latency experience. For more information and a list of the latest AWS Regions, see the AWS Global Infrastructure web page.

Within each Region are multiple, isolated locations known as Availability Zones, which you can think of as logical data centers. Each of the Availability Zones within a given Region is isolated physically but connected via high-speed networking, so they can be used together (Figure 2).

[Figure 2: Availability Zones A, B, and C within a Region]

Balancing servers across two or more Availability Zones within a Region is a simple way to increase your game’s availability. You can maintain a good balance between reliability and cost by pairing server instances, database instances, and cache instances together.

AWS Elastic Beanstalk can automatically deploy across multiple Availability Zones. To use multiple Availability Zones with AWS Elastic Beanstalk, see Auto Scaling group for your Elastic Beanstalk environment in the AWS Elastic Beanstalk Developer Guide. For additional scalability, you can use Auto Scaling to add and remove instances from these Availability Zones. For best results, consider modifying the Auto Scaling trigger to specify a metric (such as CPU usage) and threshold based on your application’s performance profile. If the specified threshold is reached, AWS Elastic Beanstalk automatically launches additional instances.

A single Availability Zone is usually adequate for development and test environments. This helps you keep costs low, assuming you can tolerate a bit of downtime in the event of a failure. However, if your development environment is used by QA testers late at night to validate builds, you probably want to treat this more like a production environment. In that case, you should use multiple Availability Zones.

Finally, set up the load balancer to handle Secure
With your core game backend up and running, the next step is to examine other AWS services that could be useful for your game. Before continuing, let’s review the following reference architecture for a horizontally scalable game backend (Figure 3). This diagram depicts a game backend that supports a broad set of game features, including login, leaderboards,

[Figure 3: Reference architecture for a horizontally scalable game backend. Labeled components: Elastic Load Balancing (HTTP/S and stateful TCP socket traffic), stateful game servers (security group), HTTP/JSON servers (Auto Scaling group), cache, RDS MySQL, Amazon S3 for binary game assets, CloudFront CDN, SQS for job queues, SNS for push messages]
Figure 3 may seem overwhelming at first, but it's really just an evolution of the initial game backend launched using AWS Elastic Beanstalk. This key explains the numbers in the diagram:

1. The diagram shows two Availability Zones that are set up with identical functionality for redundancy. Due to space constraints, not all components are shown in both Availability Zones, but both would function the same way. These Availability Zones could be the same as the two Availability Zones you initially chose using AWS Elastic Beanstalk.

2. The HTTP/JSON servers and primary/standby databases can be the same ones you launched using AWS Elastic Beanstalk. Continue to build out as much of your game functionality in the HTTP/JSON layer as possible. You can use HTTP Auto Scaling to automatically add and remove Amazon EC2 HTTP instances in response to user demand. For more information on HTTP Auto Scaling, read the Games as REST APIs section of this guide.

3. You can use the same S3 bucket you initially created for binary data. Amazon S3 is built to be highly scalable and needs little tuning over time. As your game assets and user traffic continue to expand, you can add Amazon CloudFront in front of S3 to boost download performance and save costs.

4. If your game has features that require stateful sockets, such as chat or multiplayer gameplay, game servers typically run code specifically for those features. These servers run on Amazon EC2 instances separate from your HTTP instances. For more information on stateful game servers, read the Scaling game servers on AWS section of this guide.

5. As your game grows and your database load increases, the next step is to add caching. This is typically done using Amazon ElastiCache, the AWS-managed caching service. Caching frequently accessed items in ElastiCache offloads read queries from your database. For more information on caching data, read the Caching section of this guide.

6. The next step is to consider moving some of your server tasks to asynchronous jobs. You can use Amazon Simple Queue Service (SQS) to coordinate this work. Amazon SQS eliminates dependencies on the other components in a loosely coupled system, in which two or more components exist and interoperate to achieve a specific purpose, each with little or no knowledge of other participating components. For example, if your game allows players to upload and share assets like photos or custom characters, you should execute time-intensive tasks, such as image resizing, in a background job. This will result in quicker response times for your game while decreasing the load on your HTTP server instances.

7. As your database load continues to grow, you can add Amazon RDS read replicas to help scale out your database reads even further. Because you can read from the replica and you only access the master database to write, this also helps reduce the load on your main database. For more information on read replicas, read the Relational vs. NoSQL databases section of this guide.

7B. At some point, you may decide to introduce a NoSQL service, such as Amazon DynamoDB, to supplement your main database and support functionality (like leaderboards). Or, you may choose to take advantage of NoSQL features, such as atomic counters. For more information on NoSQL options, read the Relational vs. NoSQL databases section of this guide. (Note: This isn’t shown in Figure 3.)

8. If your game includes push notifications, you can use Amazon Simple Notification Service (SNS). Amazon SNS supports mobile push notifications to simplify the process of sending push messages across multiple mobile platforms. Your Amazon EC2 instances can also receive Amazon SNS messages. This enables you to do things like broadcast messages to all players who are currently connected to your game servers.

Look at a single Availability Zone in Figure 3 and compare it to the core game backend we launched with AWS Elastic Beanstalk. You can see how scaling your game builds on the initial backend pieces with the addition of caching, database replicas, and background jobs.
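The background-job idea described above (respond to the player immediately and let a separate worker do the slow part) can be sketched locally as follows. This is only an illustration of the decoupling pattern: Python's in-process `queue.Queue` and a thread stand in for Amazon SQS and a worker instance, and `handle_upload`, `resize_image`, and the job fields are hypothetical names rather than any AWS API.

```python
import queue
import threading


def resize_image(job):
    """Placeholder for time-intensive work such as image resizing."""
    return {"asset_id": job["asset_id"], "size": job["target_size"],
            "status": "resized"}


def worker(jobs, results):
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            results.append(resize_image(job))
        finally:
            jobs.task_done()


def handle_upload(jobs, asset_id):
    """HTTP-side handler: enqueue the slow task and reply right away."""
    jobs.put({"asset_id": asset_id, "target_size": (128, 128)})
    return {"asset_id": asset_id, "status": "accepted"}


def run_demo():
    jobs = queue.Queue()
    results = []
    thread = threading.Thread(target=worker, args=(jobs, results))
    thread.start()
    responses = [handle_upload(jobs, f"avatar-{i}") for i in range(3)]
    jobs.put(None)   # tell the worker to stop after draining the queue
    jobs.join()      # wait until every queued job has been processed
    thread.join()
    return responses, results
```

The handler knows only the shape of the job message, not who processes it, which is the loose coupling the text describes. With a real queue service the worker would run on separate instances and could be scaled independently of the HTTP tier.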
3.0 SCALING GAME SERVERS ON AWS
Games as REST APIs

To make use of horizontal scalability, implement most of your game’s features using an HTTP/JSON API, which typically follows the REST architectural pattern.

Game clients, whether on mobile devices, tablets, PCs, or consoles, send HTTP requests to your servers for data, such as logins, sessions, friends, leaderboards, and trophies. Clients don’t maintain long-lived server connections. This makes it easy to scale horizontally by adding HTTP server instances. Clients can recover from network issues by simply retrying the HTTP request.

When properly designed, a REST API can scale to hundreds of thousands of concurrent players. RESTful servers are simple to deploy on AWS. And they benefit from the wide variety of HTTP development, debugging, and analysis tools available on AWS.

Nevertheless, some modes of gameplay (like real-time online multiplayer games, chat, and game invites) benefit from a stateful two-way socket that can receive server-initiated messages. If your game doesn’t have these features, you can implement all your functionality using a REST API. We’ll discuss stateful servers later in this guide. First, let’s focus on our REST layer.

Deploying a REST layer to Amazon EC2 typically consists of an HTTP server, such as Nginx or Apache, plus a language-specific application server. The following table lists some of the popular packages game developers use to build REST APIs:

Language    Package
Node.js     Express, Restify, Sails
Python      Eve, Flask, Bottle
Java        Spring, Jersey
Go          Gorilla Mux, Gin
PHP         Slim, Silex
Ruby        Rails, Sinatra, Grape

This is just a sampling; you can build a REST API in any web-friendly programming language. Amazon EC2 gives you complete root access to the instance, so you can deploy any of these packages. There are some restrictions on supported packages for AWS Elastic Beanstalk. For details, see the AWS Elastic Beanstalk FAQs.

Amazon API Gateway is a service for creating, publishing, maintaining, monitoring, and securing REST, HTTP, and WebSocket APIs at any scale. WebSocket APIs enable real-time communication between the server and client, making this an excellent choice for multiplayer games. For more information on how to use Amazon API Gateway to create WebSocket APIs, see our Amazon API Gateway guide.

RESTful servers benefit from medium-sized instances because more can be deployed horizontally at the same price point. General-purpose, medium-sized instances (for example, M5) or compute-optimized instances (for example, C5) are a good match for RESTful servers.
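To make the stateless HTTP/JSON idea above concrete, here is a minimal sketch built only on Python's standard library rather than any of the packages in the table. The routes (`/leaderboard`, `/players/<name>`), the in-memory `SCORES` dictionary (a stand-in for a real database), and the port number are all assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data; a real backend would query a database
# so that any server instance can answer any request.
SCORES = {"alice": 4200, "bob": 3100}


def route(method, path):
    """Map a request to (status, payload) using no per-client state."""
    if method == "GET" and path == "/leaderboard":
        ranked = sorted(SCORES.items(), key=lambda kv: kv[1], reverse=True)
        rows = [{"player": p, "score": s} for p, s in ranked]
        return 200, {"leaderboard": rows}
    if method == "GET" and path.startswith("/players/"):
        player = path[len("/players/"):]
        if player in SCORES:
            return 200, {"player": player, "score": SCORES[player]}
        return 404, {"error": "player not found"}
    return 404, {"error": "unknown route"}


class GameApiHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper that serializes route() results as JSON."""

    def do_GET(self):
        status, payload = route("GET", self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the API locally; call this manually to try it out."""
    HTTPServer(("127.0.0.1", port), GameApiHandler).serve_forever()
```

Because each response depends only on the request and the shared data store, a load balancer can send any request to any instance, and a client that hits a network error can simply retry.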
19
3.1 GAMES AS REST APIS
HTTP load balancing

Follow these guidelines to get the most out of Elastic Load Balancing:

• Always configure Elastic Load Balancing to balance between at least two Availability Zones for redundancy and fault tolerance. Elastic Load Balancing balances traffic between the EC2 instances in the Availability Zones you specify. If you want an equal distribution of traffic across servers, enable cross-zone load balancing, even if there are an unequal number of servers per Availability Zone. This ensures optimal usage of servers in your fleet.

• Configure Elastic Load Balancing to handle SSL encryption and decryption. This offloads SSL from your HTTP servers, meaning there’s more CPU for your application code. For more information, see Create a Classic Load Balancer with an HTTPS Listener in the Classic Load Balancers Guide.

• Each load balancer you deploy is assigned a unique Domain Name System (DNS) name. To set up a custom DNS name for your game, you can use a DNS alias (CNAME) to point your game’s domain name to the load balancer. For detailed instructions, see Configure a custom domain name for your Classic Load Balancer in the Classic Load Balancers Guide. Note that when your load balancer scales up or down, the IP addresses that the load balancer uses change. So, it’s important to use a DNS CNAME alias and to avoid referencing the load balancer’s current IP addresses in your DNS domain.

For more information, see What is Elastic Load Balancing?
Application Load Balancer

Our Application Load Balancer is a second-generation load balancer that provides more granular control over traffic routing at the HTTP/HTTPS layer. The following features that come with the Application Load Balancer can be highly beneficial for a gaming workload:

EXPLICIT SUPPORT FOR AMAZON ECS
The Application Load Balancer can be configured to load balance containers across multiple ports on a single EC2 instance. Dynamic ports can be specified in an ECS task definition, giving the container an unused port when scheduled on EC2 instances.

HTTP/2 SUPPORT
HTTP/2 (a revised edition of the older HTTP/1.1 protocol), together with the Application Load Balancer, delivers additional network performance as a binary protocol instead of a textual one. Binary protocols can improve stability, as they’re inherently more efficient to process and are much less prone to errors than textual protocols. And HTTP/2 supports multiplexing, which enables the reuse of TCP connections for downloading content from multiple origins. It also cuts down on network overhead.

NATIVE IPV6 SUPPORT
With the near exhaustion of IPv4 addresses, many application providers are changing to a model that rejects applications without IPv6 support. The Application Load Balancer natively supports IPv6 endpoints and routing to virtual private cloud (VPC) IPv6 addresses. Many platforms require IPv6 as a fallback option.

WEBSOCKETS SUPPORT
As with HTTP/2, the Application Load Balancer supports the WebSocket protocol, enabling you to set up a long-lived TCP connection between a client and server. This is a much more efficient method than standard HTTP connections that are usually held open with a sort of heartbeat that contributes to network traffic. WebSocket is a great fit for delivering dynamic data (like updated leaderboards) while minimizing traffic and power usage on mobile devices. Elastic Load Balancing enables the support of WebSockets by changing the listener from HTTP to TCP. In TCP mode, Elastic Load Balancing enables the Upgrade header when a connection is established. Then, the load balancer terminates any connection that’s idle for more than 60 seconds (for example, when a packet isn’t sent within that timeframe). This means the client has to reestablish the connection. WebSocket negotiations fail if the load balancer sends an upgrade request and establishes a WebSocket connection to other backend instances.

If you need specific features or metrics that Elastic Load Balancing doesn’t provide, you can deploy your own load balancer to Amazon EC2. Popular choices for games include HAProxy and F5’s BIG-IP Virtual Edition, both of which can run on EC2.

If you decide to deploy your own load balancer, keep in mind that there are several aspects you need to handle on your own. First, if your load surpasses what your load balancer instances can handle, you need to launch additional EC2 instances. Second, new auto-scaled application instances aren’t automatically registered with your load balancer instances. So, you need to write a script that updates the load balancer configuration files and restarts the load balancers.

If you’re interested in HAProxy as a managed service, consider AWS OpsWorks, which uses Chef Automate to manage EC2 instances and can deploy HAProxy as an alternative to Elastic Load Balancing.
3.1 GAMES AS REST APIS
Auto Scaling enables you to scale the number of EC2 instances in one or more
Availability Zones based on system metrics like CPU utilization or network throughput.
For an overview of Auto Scaling functionality, see What is Amazon EC2 Auto Scaling? in
the Amazon EC2 User Guide. Then, walk through Getting started with Amazon EC2 Auto
Scaling.

You can use Auto Scaling with any type of EC2 instance, including HTTP, a game server,
or a background worker. HTTP servers are the easiest to scale because they sit behind a
load balancer that distributes requests across server instances. Auto Scaling
dynamically handles the registration or deregistration of HTTP-based instances from
Elastic Load Balancing. This means traffic will be routed to a new instance as soon as
it’s available.

The ability to dynamically grow and shrink your server resources in response to user
patterns is a primary benefit of running on AWS.

To use Auto Scaling effectively, choose good metrics to trigger scale-up and scale-down
activities. Use the following guidelines to determine your metrics:

MONITOR CPUUTILIZATION
This is a good Amazon CloudWatch metric. Web servers tend to be CPU limited, whereas
memory remains fairly constant when the server processes are running. A higher
percentage of CPUUtilization tends to indicate the server is becoming overloaded with
requests. For finer granularity, pair CPUUtilization with NetworkIn or NetworkOut.

BENCHMARK YOUR SERVERS
This helps you determine good values to scale on. For HTTP servers, you can use a tool
like Apache HTTP server benchmarking tool or httperf to measure server response times.
Increase the load on your servers while monitoring CPU or other metrics. Then, make note
of the point at which your server response times degrade, and see how it correlates to
system metrics.

USE TWO AVAILABILITY ZONES
You should also choose a minimum of two servers when configuring your Auto Scaling
group. This will ensure your game server instances are properly distributed across
multiple Availability Zones for high availability. Elastic Load Balancing takes care of
balancing the load between multiple Availability Zones.
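As a rough illustration of the benchmarking guideline above, the following Python sketch picks a scale-up threshold from benchmark results. The sample numbers, the 100 ms latency budget, and the 0.8 headroom factor are assumptions made for this example, not values recommended by this guide.

```python
from typing import List, Optional, Tuple

# One benchmark step: (average CPUUtilization percent, response time in milliseconds).
Sample = Tuple[float, float]


def find_scale_up_threshold(
    samples: List[Sample],
    max_latency_ms: float,
    headroom: float = 0.8,
) -> Optional[float]:
    """Return a CPU percentage to scale up on, or None if latency never degraded.

    The threshold is the CPU level of the first step whose response time exceeds
    max_latency_ms, multiplied by a headroom factor so scaling starts early.
    """
    for cpu_percent, latency_ms in sorted(samples):
        if latency_ms > max_latency_ms:
            return round(cpu_percent * headroom, 1)
    return None


# Hypothetical results gathered while increasing load with a benchmarking tool.
benchmark = [(20.0, 45.0), (40.0, 48.0), (60.0, 55.0), (75.0, 140.0), (90.0, 600.0)]
threshold = find_scale_up_threshold(benchmark, max_latency_ms=100.0)
print(f"Scale up when CPUUtilization exceeds {threshold}%")
```

The resulting value could serve as a starting point for a CloudWatch alarm on CPUUtilization, to be refined once real player traffic is available.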
When you use Auto Scaling with AWS Elastic Beanstalk, it takes care of installing
your application code on new EC2 instances as they scale up. This is one of the
advantages of the managed container that AWS Elastic Beanstalk provides. However,
this approach is only for application servers, not game servers.
If you’re using Auto Scaling without AWS Elastic Beanstalk, you need to get your
application code onto your EC2 instances to implement automatic scaling. If you’re
not already using Chef or Puppet, consider using one of these tools to deploy
application code on your instances. AWS OpsWorks Auto Scaling uses Chef to
configure instances and offers a variant of Auto Scaling that provides both time-
based and load-based automatic scaling. With AWS OpsWorks, you can set up
custom start-up and shut-down steps for your instances as they scale. This is a
great alternative to managing automatic scaling when you’re already using Chef
or if you’re interested in using Chef to manage your AWS resources. For more
information, see Managing Load with Time-based and Load-based Instances in the
AWS OpsWorks User Guide.
If you’re not using any of these packages, you can use the Ubuntu cloud-init
package as a simple way to pass shell commands directly to EC2 instances. With
cloud-init, you can run a simple shell script that fetches the latest application code
and starts up the appropriate services. This solution is supported by the official
Amazon Linux AMI and the Canonical Ubuntu AMIs. For more details on these
approaches, see the AWS Architecture Center.
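A minimal sketch of the cloud-init approach follows. It renders a shell script that could be passed as EC2 user data and base64-encodes it, which is the form the EC2 API expects. The bucket name, artifact name, install path, and service name are hypothetical, and the script assumes the AWS CLI is installed on the AMI and that the instance role can read the bucket.

```python
import base64


def build_user_data(code_bucket: str, artifact: str, service: str) -> str:
    """Render a shell script that fetches the latest build and starts the service."""
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        "mkdir -p /opt/game",
        # Assumes the AWS CLI is present and the instance role can read the bucket.
        f"aws s3 cp s3://{code_bucket}/{artifact} /opt/game/{artifact}",
        f"tar -xzf /opt/game/{artifact} -C /opt/game",
        f"systemctl start {service}",
    ]
    return "\n".join(lines) + "\n"


def encode_user_data(script: str) -> str:
    """Base64-encode the script for use as EC2 user data."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


# Hypothetical names, for illustration only.
script = build_user_data("example-game-builds", "server-latest.tar.gz", "game-api")
encoded = encode_user_data(script)
print(script)
```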
3.2
Game servers

There are some gameplay scenarios that work well with an event-driven RESTful model.
For example, turn-based play and appointment games that don’t require constant
real-time updates can be built as stateless game servers using the techniques mentioned
in the previous section.

However, sometimes a game server’s approach needs to be the opposite of a RESTful
approach. Clients establish a stateful two-way connection to the game server via UDP,
TCP, or WebSockets, enabling both the client and server to initiate messages. If the
network connection is interrupted, the client must perform reconnect logic and possibly
logic to reset its state. Because clients can’t simply be round-robin load balanced
across a pool of servers, stateful game servers introduce challenges for automatic
scaling.

Historically, many games have used stateful connections and long-running server
processes for game functionality, especially in the case of larger AAA and massively
multiplayer online (MMO) games. If you have a game that’s architected in this manner,
you can run it on AWS. For new games, however, we encourage you to use HTTP as much as
possible for stateless functions. And we only recommend using stateful sockets (like
UDP) for aspects of your game that really need it, such as online multiplayer.

The following table lists several packages that allow you to build event-driven servers:

Language    Package
Python      Gevent, Twisted
Node.js     Core, Socket.io, Async
Erlang      Core
Java        JBoss, Netty
Ruby        EventMachine
Go          Socket.io

C++ isn’t listed in the table you see here because it tends to be the language of choice
for multiplayer game servers. Many commercial game engines, such as Amazon Lumberyard
and Unreal Engine, are written in C++. This enables you to take existing game code from
the client and reuse it on the server. It’s particularly valuable when running physics
or other frameworks on the server, such as Havok, which frequently only support C++.

Regardless of programming language, stateful socket servers generally benefit from as
large an instance as possible because they’re more sensitive to issues like network
latency. Consider your game server’s bandwidth requirements when determining the best
Amazon EC2 instance type. The largest instances in the EC2 compute-optimized instance
family (for example, C5) are often the best options. This new generation of instances
uses enhanced networking via single-root I/O virtualization (SR-IOV). SR-IOV provides
high packets per second, low latency, and low jitter, making this an ideal solution for
game servers.
3.2 GAME SERVERS
Matchmaking

Matchmaking is a feature that draws players in.

Game servers require long-lived processes, and they can’t be round-robin load balanced
like with an HTTP request. After a player is on a given server, they remain on that
server until the game is over, which could be minutes or hours.

In a modern cloud architecture, you should minimize your usage of long-running game
server processes to the gameplay elements that require it. For example, imagine an
open-world or MMO game. Some of the functionality, such as running around the world and
interacting with other players, requires long-running game server processes. However,
the rest of the API operations, like listing friends, altering inventory, updating
stats, and finding games to play, can be easily mapped to a REST web API.

1. Ask the user about the type of game they would like to join (one-on-one or teams,
   for example).
2. Look at what game modes are currently being played online.
3. Factor in variables like the user’s geolocation (for latency) or ping time, language,
   and overall ranking.
4. Place the user on a game server that contains a matching game.

In this approach, game clients first connect to your REST API and request a stateful
game server. Next, the REST API performs matchmaking logic and gives clients an IP
address and server port to connect to. The game client then connects directly to that
game server’s IP address. This hybrid approach gives you the best performance for your
socket servers because clients can directly connect to the EC2 instances. And you still
get the benefits of using HTTP-based calls for your main entry point.

For most matchmaking needs, Amazon GameLift provides a matchmaking system called
FlexMatch. You can control GameLift FlexMatch via your REST API and make calls to the
Amazon GameLift API to initiate matching and return results. You can learn more about
FlexMatch in the Amazon GameLift Developer Guide. If this solution doesn’t suit your
matchmaking needs, you can find more information about implementing matchmaking in a
serverless custom environment in Fitting the Pattern: Serverless Custom Matchmaking
with Amazon GameLift on the Amazon Game Tech Blog.
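The matchmaking steps and the hybrid approach described in this section can be sketched in a few lines of Python. This is only an illustration of the selection logic a REST API might run before returning an IP address and port. The `GameServer` and `Player` types, their fields, and the ranking rule are assumptions made for this example. This is not how FlexMatch works.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GameServer:
    ip: str
    port: int
    mode: str        # game mode currently hosted, for example "teams"
    region: str      # used as a rough stand-in for latency
    avg_rank: int    # average ranking of players already on the server
    open_slots: int


@dataclass
class Player:
    mode: str
    region: str
    rank: int


def find_server(player: Player, servers: List[GameServer]) -> Optional[GameServer]:
    """Pick a server with a matching mode, preferring same region, then closest rank."""
    candidates = [s for s in servers if s.mode == player.mode and s.open_slots > 0]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda s: (s.region != player.region, abs(s.avg_rank - player.rank)),
    )


# Hypothetical fleet and player.
fleet = [
    GameServer("10.0.1.10", 7777, "teams", "us-east-1", 1500, 4),
    GameServer("10.0.2.20", 7777, "teams", "eu-west-1", 1210, 2),
    GameServer("10.0.1.11", 7777, "teams", "us-east-1", 1250, 1),
    GameServer("10.0.1.12", 7777, "one-on-one", "us-east-1", 1200, 1),
]
match = find_server(Player(mode="teams", region="us-east-1", rank=1200), fleet)
print(match.ip, match.port)
```

The REST API would return the selected `ip` and `port` to the client, which then connects directly to that game server.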
and group messages

Amazon SNS can help route messages between EC2 server instances. For example, let’s
assume player 1 on server A wants to send a message to player 2 on server C (Figure 4).
In this scenario, server A can look at locally connected players. When server A can’t
find player 2, it can forward the message to an SNS topic to propagate the message to
other servers.

Apache ActiveMQ, or a similar package on Amazon EC2. The advantage of Amazon SNS is that
you don’t have to spend time administering and maintaining queue servers and software
on your own. For more information about Amazon SNS, see What is Amazon SNS? and Creating
an Amazon SNS topic in the Amazon Simple Notification Service Developer Guide.

Amazon SQS and Amazon SNS are AWS messaging services that provide different benefits to
developers. A common pattern is to use Amazon SNS to publish messages to Amazon SQS
queues to reliably and asynchronously send messages to one or many system components.
See the Amazon SQS section later in this guide to learn more about Amazon SQS use cases
for games.

Unlike the previous use case, which is designed to handle near-real-time in-game
messaging, mobile push is the best choice for sending a message to draw a user back in
when they’re out of a game. An example might be a user-specific event (such as a friend
beating your high score) or a broader game event (like a Double-XP Weekend).

[Figure 4 labels: Player 1, Player 2]
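The routing idea in this section (check locally connected players first, then forward to a topic) can be sketched as follows. `InMemoryTopic` is a stand-in written for this example so that it runs without AWS credentials. In a real deployment, the publish step would go through Amazon SNS, and each server would receive messages through its own subscription. All class and method names here are hypothetical.

```python
from typing import Dict, List


class InMemoryTopic:
    """Stand-in for an SNS topic: fans each message out to every other subscriber."""

    def __init__(self) -> None:
        self.subscribers: List["ChatServer"] = []

    def publish(self, message: Dict[str, str], sender: "ChatServer") -> None:
        for server in self.subscribers:
            if server is not sender:
                server.on_topic_message(message)


class ChatServer:
    """A game server that tracks the inboxes of its locally connected players."""

    def __init__(self, name: str, topic: InMemoryTopic) -> None:
        self.name = name
        self.topic = topic
        self.inboxes: Dict[str, List[str]] = {}
        topic.subscribers.append(self)

    def connect(self, player_id: str) -> None:
        self.inboxes[player_id] = []

    def deliver_local(self, to_player: str, text: str) -> bool:
        inbox = self.inboxes.get(to_player)
        if inbox is None:
            return False
        inbox.append(text)
        return True

    def send(self, to_player: str, text: str) -> str:
        # Look at locally connected players first; otherwise forward to the topic.
        if self.deliver_local(to_player, text):
            return "local"
        self.topic.publish({"to": to_player, "text": text}, sender=self)
        return "forwarded"

    def on_topic_message(self, message: Dict[str, str]) -> None:
        # Servers that don't host the recipient simply ignore the message.
        self.deliver_local(message["to"], message["text"])


topic = InMemoryTopic()
server_a, server_c = ChatServer("A", topic), ChatServer("C", topic)
server_a.connect("player1")
server_c.connect("player2")
result = server_a.send("player2", "Good game!")
print(result, server_c.inboxes["player2"])
```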
It’s easy to become obsessed with finding the perfect programming framework or
pattern. Both RESTful and stateful game servers have their place. And any of the
languages discussed in this guide work well when programmed thoughtfully. When
making your choice, consider your overall game data architecture—where data lives,
how to query it, and how to efficiently update it.
4.0
Relational vs.
NoSQL databases
With modern game applications that scale horizontally and globally with your players,
the traditional approach of using a single, large relational database becomes less
tenable.

It’s important to spend time thinking about your overall game data architecture: where
data lives, how to query it, and how to efficiently update it. A number of new databases
have become popular that eschew traditional atomicity, consistency, isolation, and
durability (ACID) concepts in favor of lightweight access, distributed storage, and
eventual consistency. These NoSQL databases can be especially beneficial for games,
where data structures tend to be lists and sets (like friends, levels, and items) as
opposed to complex relational data.

As a general rule, the biggest bottleneck for online games tends to be database
performance. A typical web-based app has a high number of reads and few writes (think of
reading blogs, watching videos, and so forth). Games are quite the opposite, with reads
and writes frequently hitting the database due to constant state changes in the game.

There are many database options out there for both relational and NoSQL flavors, but the
ones used most frequently for games on AWS are Amazon Aurora, Amazon ElastiCache for
Redis, Amazon DynamoDB, Amazon RDS for MySQL, and Amazon DocumentDB (with MongoDB
compatibility).

First, we’ll cover MySQL because it’s both popular and applicable to gaming.
Combinations such as MySQL and Redis or MySQL and Amazon DynamoDB are especially
successful on AWS. All database alternatives described in this section support atomic
operations, such as increment and decrement, which are crucial for gaming.
4.1 RELATIONAL VS. NOSQL DATABASES
MySQL

MySQL is the most widely adopted open-source relational database.

With more than 20 years of community-backed development and support, MySQL is a
reliable, stable, and secure SQL-based database management system.

SINGLE SOURCE OF TRUTH
MySQL guarantees internal data consistency. Part of what makes many NoSQL solutions
faster is distributed storage and eventual consistency. Eventual consistency means you
can write a key on one node, fetch that key on another node, and have it not appear
there immediately.

EXTENSIVE TOOLS
MySQL has been around since the 1990s, and there are extensive debugging and data
analysis tools available for it. In addition, SQL is a general-purpose language that’s
widely understood.

Amazon EC2 instances, configuring MySQL, attaching Amazon Elastic Block Store (Amazon
EBS) volumes, setting up replication, running nightly backups, and so on. In addition,
Amazon RDS offers advanced features, including synchronous Multi-AZ replication for high
availability, automated primary/standby failover, and read replicas for increased
performance. To get started with Amazon RDS, see Getting Started with RDS in the Amazon
RDS User Guide.
The following are some configuration options that we recommend you implement when you
create your RDS MySQL DB instances:

DB INSTANCE CLASS
• You should use a micro instance for development/test environments and medium or
  larger instances for production environments.

ALLOCATED STORAGE
• We recommend 5 GB of storage in development/test environments and 100 GB minimum in
  production environments to enable provisioned IOPS.

MULTI-AZ DEPLOYMENT
• This isn’t needed for development/test environments, but it’s recommended for
  production environments to enable synchronous Multi-AZ replication and failover. For
  best performance, always launch production on an RDS DB instance that’s separate from
  any of your Amazon RDS development/test DB instances.

PROVISIONED IOPS
• This is recommended for production environments. Provisioned IOPS guarantees a
  certain level of disk performance, which is important for large write loads. For more
  information, see Provisioned IOPS Storage in the Amazon RDS User Guide.

• This is recommended for hands-off upgrades.

• It’s best to schedule Amazon RDS backup snapshots and upgrades during your low player
  count times, such as early in the morning. If possible, avoid running background jobs
  or nightly reports during this window to prevent a query backlog.

SLOW SQL QUERIES
To find and analyze slow SQL queries in production, you should enable the MySQL slow
query log in Amazon RDS (as shown in the following list). These settings are configured
using Amazon RDS DB parameter groups. (Note: There’s a minor performance penalty for the
slow query log.)

• Set SLOW_QUERY_LOG = 1 to enable. In Amazon RDS, slow queries are written to the
  MYSQL.SLOW_LOG table.
• Consider decreasing the default LONG_QUERY_TIME value to 5, 3, or even 1. (The default
  is 10.) The value set in LONG_QUERY_TIME determines that only queries that take longer
  than the specified number of seconds are included.

As your game grows and your write load increases, resize your RDS DB instances to scale
up.

Resizing an RDS DB instance requires some downtime. However, if you deploy the instance
in Multi-AZ mode (as you would for production), downtime is limited to the time it takes
to initiate a failover (typically a few minutes). For more information, see Modifying an
Amazon RDS DB Instance in the Amazon RDS User Guide. In addition, you can add one or
more Amazon RDS read replicas to offload reads from your primary RDS instance, leaving
more cycles for database writes. For instructions on deploying replicas with Amazon RDS,
see Working with Read Replicas.
Amazon Aurora

Amazon Aurora is a MySQL-compatible relational database engine that combines the speed
and availability of high-end commercial databases with the simplicity and
cost-effectiveness of open-source databases.

There are several key features that Amazon Aurora brings to a gaming workload:

In Amazon Aurora, each 10 GB chunk of your database volume is replicated six ways across
three Availability Zones. This allows for the loss of two copies of data without
affecting database write availability and three copies without affecting read
availability. Backups to Amazon S3 are automatic and continuous, offering 99.999999999
percent durability with a retention period of up to 35 days. You can restore your
database to any second (up to the last five minutes) during the retention period.

SCALABILITY
Amazon Aurora is capable of automatically scaling its storage subsystem out to 64 TB of
storage. This storage is automatically provisioned for you, so you don’t have to
provision storage ahead of time. This means you pay only for what you use, reducing the
costs of scaling. Amazon Aurora can deploy up to 15 read replicas in any combination of
Availability Zones, including cross-region where Amazon Aurora is available. This allows
for seamless failover in case of an instance failure.

The following are recommendations for using Amazon Aurora in your gaming workload:

upgrades during low player count times. If possible, avoid running jobs or reports
against the database during this window to prevent backlogging.

If your game grows beyond the bounds of a traditional relational database, like MySQL or
Amazon Aurora, we recommend that you complete a performance evaluation, including tuning
parameters and sharding. And consider a NoSQL offering, such as Redis or Amazon
DynamoDB, to offload some workloads from MySQL. In the following sections, we’ll cover a
few popular NoSQL offerings.
Redis
tools for Redis are limited. Redis isn’t suitable as your only data store. However, when
used in conjunction with a disk-backed database (such as MySQL or Amazon DynamoDB),
Redis can provide a highly scalable solution for game data. Redis plus MySQL is a
popular solution for gaming.

Best described as an atomic data
MongoDB
Progress is saved to MongoDB at logical points (for example,
at the end of a level or when a new achievement is unlocked).
Redis yields high-speed access for latency-sensitive game
data, and MongoDB provides simplified persistence.
DynamoDB
Range key store for leaderboards, scores, and
date-ordered data
PARTITION KEY
The partition key is a single attribute that Amazon DynamoDB uses as input to an
internal hash function. This could be a player name, game ID, UUID, or similar unique
key. Amazon DynamoDB builds an unordered hash index on this key.

[Figure labels: Stateful game servers (security group), HTTP/JSON servers (Auto Scaling
group)]
Amazon DynamoDB shards your data behind the scenes to give you the throughput you
requested using the concept of read and write units. One read capacity unit represents
one strongly consistent read per second (or two eventually consistent reads per second)
for an item up to 4 KB in size. One write capacity unit represents one write per second
for an item up to 1 KB in size. The defaults are five read and five write units,
equating to 20 KB of strongly consistent reads per second and 5 KB of writes per second.
You can increase your read and/or write capacity at any time and by any amount up to
your account limits. You can also decrease the read and/or write capacity by any amount,
but you can’t make more than four decreases in one day. You can scale using the AWS
Management Console or AWS CLI by selecting the table and modifying it appropriately. You
can also take advantage of the Amazon DynamoDB Auto Scaling service to dynamically
adjust provisioned throughput capacity on your behalf in response to actual traffic
patterns. Amazon DynamoDB Auto Scaling works in conjunction with Amazon CloudWatch
alarms that monitor the capacity units (Figure 6). This service scales according to your
defined rules.

[Figure 6 labels: Amazon SNS, Amazon CloudWatch, DynamoDB table, Application Auto
Scaling, Update table; steps 1 to 6]
Figure 6: A high-level overview of how Amazon DynamoDB Auto Scaling manages throughput
capacity for a table

There’s a delay before the new provisioned throughput is available while data is
repartitioned in the background. This doesn’t cause downtime, but it does mean that
Amazon DynamoDB Auto Scaling is best suited for changes over time, such as the number of
players increasing from 1,000 to 10,000. It’s not designed to handle hourly user spikes.
For this, as with other databases, you need some form of caching to add resiliency.

To get the best performance from Amazon DynamoDB, make sure your reads and writes are
spread as evenly as possible across your keys. Using a hexadecimal string, such as a
hash key or checksum, is one easy strategy to inject randomness.

For more details on optimizing Amazon DynamoDB performance, see Best Practices for
Designing and Architecting with DynamoDB in the Amazon DynamoDB Developer Guide.

When you’re not sure how much read and write capacity you’re going to need for your
DynamoDB table, you can use Amazon DynamoDB on-demand. You can pay for what you end up
using without explicitly setting the capacity you’ll use. So, when you’re first
launching your new game or new content, you can absorb much of the unpredictable nature
of player activity without the risk of limited capacity or slow Auto Scaling responses
(or even wasted, over-provisioned capacity). Once you know your players’ read and write
patterns, you can always switch back to provisioned capacity during normal operations to
save costs and switch again to the on-demand option for event releases.

Amazon DynamoDB Accelerator

Amazon DynamoDB Accelerator (DAX) allows you to provision a fully managed, in-memory
cache that speeds up the responsiveness of your Amazon DynamoDB tables from
millisecond-scale latency to microseconds. This acceleration comes without the need for
any major changes in your game code, which simplifies deployment. All you need to do is
reinitialize your Amazon DynamoDB client with a new endpoint that points to DAX, and the
rest of the code can remain untouched. DAX handles cache invalidation and data
population without your intervention. This cache can help speed responsiveness when
running events that might cause a spike in players, such as a seasonal downloadable
content (DLC) offering or a new patch release.
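The read and write capacity unit definitions in this section translate into simple arithmetic. The following Python sketch estimates the provisioned units a workload needs. The function names and the example item sizes are assumptions made for illustration.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # one read unit covers an item up to 4 KB
WRITE_UNIT_BYTES = 1024      # one write unit covers an item up to 1 KB


def read_capacity_units(item_bytes: int, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """Estimate the read capacity units needed for a steady read rate."""
    units_per_read = max(1, math.ceil(item_bytes / READ_UNIT_BYTES))
    total = units_per_read * reads_per_second
    # Eventually consistent reads cost half as much as strongly consistent reads.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_bytes: int, writes_per_second: int) -> int:
    """Estimate the write capacity units needed for a steady write rate."""
    return max(1, math.ceil(item_bytes / WRITE_UNIT_BYTES)) * writes_per_second


# Hypothetical workload: 6 KB player records read 100 times per second and
# 2.5 KB score updates written 50 times per second.
print(read_capacity_units(6 * 1024, 100))                             # 200
print(read_capacity_units(6 * 1024, 100, strongly_consistent=False))  # 100
print(write_capacity_units(2560, 50))                                 # 150
```

With the default of five read and five write units, the same functions confirm the figures in the text: five 4 KB strongly consistent reads (20 KB) and five 1 KB writes (5 KB) per second.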
Other NoSQL options

There are a number of other NoSQL alternatives you can use for gaming, including Riak,
Couchbase, and Cassandra.

RIAK
Riak KV is a flexible key-value data model for web scale profile and session management,
real-time big data, data cataloging, content management, 360-degree customer data
management, digital messaging, and more.

COUCHBASE
Couchbase Cloud is a fully managed, automated database that simplifies database
management for deploying, managing, and operating Couchbase Server across multi-cloud
environments.

CASSANDRA
Apache Cassandra is an open-source, distributed, NoSQL database that presents a
partitioned wide-column storage model with eventually consistent semantics.
Caching

For gaming, adding a caching layer in front of your database for frequently used data
can alleviate a significant number of scalability problems.

Even a short-lived cache (with just a few seconds for data such as leaderboards, friend
lists, and recent activity) can offload your database significantly. Plus, adding cache
servers is more cost-effective than adding additional database servers.

Memcached is a high-speed, memory-based key-value store. It’s the gold standard for
caching. In recent years, Redis has also become extremely popular because it offers
advanced data types and similar performance to Memcached. Both Memcached and Redis
perform well on AWS. You can install Memcached or Redis on EC2 instances, or you can use
Amazon ElastiCache for Redis, the AWS managed caching service. Like Amazon RDS and
Amazon DynamoDB, Amazon ElastiCache completely automates the installation,
configuration, and management of Memcached and Redis on AWS. For more details on setting
up Amazon ElastiCache, see What Is Amazon ElastiCache for Redis? in the Amazon
ElastiCache User Guide.

To simplify management, ElastiCache servers are grouped in a cluster. Most ElastiCache
operations (like configuration, security, and parameter changes) are performed at the
cache cluster level. Despite the use of the cluster terminology, ElastiCache nodes don’t
talk to each other or share cache data. And it deploys the same versions of Memcached
and Redis that you would download yourself, so existing client libraries written in
Ruby, Java, PHP, Python, and more are compatible with ElastiCache.

The typical approach to caching is known as lazy population or cache aside. This means
the cache is checked, and if the value isn’t in cache (a cache miss), the record is
retrieved, stored in cache, and returned. Lazy population is the most prevalent caching
strategy because it only populates the cache when a client requests the data. This way,
it avoids extraneous writes to the cache for records that are infrequently (or never)
accessed or that change before being read. This pattern is so ubiquitous that most major
web development frameworks, such as Ruby on Rails, Django, and Grails, include plugins
that wrap this strategy. The downside is that when data changes, the next client that
requests it incurs a cache miss, resulting in a slower response time. This is because
the new record needs to be queried from the database and populated into cache.

This leads us to the second-most prevalent caching strategy. For data you know will be
accessed frequently, populate the cache when records are saved to avoid unnecessary
cache misses. This results in faster, more uniform client response times. Simply
populate the cache when you update the record rather than when the next client queries
it. The tradeoff here is that if your data is changing rapidly, it can result in an
unnecessarily high number of cache writes. And writes to the database can appear slower
to users because the cache also needs to be updated.

To choose between these two strategies, you need to know how often your data is changing
versus how often it’s being queried.

The final popular caching alternative is a timed refresh. This is beneficial for data
feeds that span multiple different records, such as leaderboards or friend lists. In
this strategy, you would have a background job that queries the database and refreshes
the cache every few minutes. This decreases the write load on your cache and enables
additional caching to take place upstream (for example, at the CDN layer) because pages
remain stable longer.
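The first two strategies described in this section, lazy population and populating the cache on save, can be compared in a short Python sketch. The cache and database here are plain in-memory dictionaries standing in for Memcached or Redis and for your database. The class names, the five-second lifetime, and the `db_reads` counter are assumptions made for this example.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """A tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries count as a cache miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (value, self._clock() + self._ttl)


class PlayerStore:
    """Wraps a database (a dict here) with the two caching strategies."""

    def __init__(self, cache: TTLCache) -> None:
        self.db: Dict[str, Any] = {}
        self.cache = cache
        self.db_reads = 0

    def get_lazy(self, key: str) -> Optional[Any]:
        # Lazy population: check the cache, fall back to the database on a miss.
        value = self.cache.get(key)
        if value is None:
            self.db_reads += 1
            value = self.db.get(key)
            if value is not None:
                self.cache.set(key, value)
        return value

    def save_and_populate(self, key: str, value: Any) -> None:
        # Populate on save: write the database and the cache together.
        self.db[key] = value
        self.cache.set(key, value)


store = PlayerStore(TTLCache(ttl_seconds=5))
store.db["player1:score"] = 1200
store.get_lazy("player1:score")   # miss: reads the database
store.get_lazy("player1:score")   # hit: served from the cache
store.save_and_populate("player2:score", 900)
store.get_lazy("player2:score")   # hit: the cache was filled when saving
print(store.db_reads)
```

A timed refresh would replace the lazy read path with a background job that calls `cache.set` on a schedule. That job is omitted here to keep the sketch short.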
Amazon ElastiCache simplifies the process of scaling your cache instances up and down.
It provides access to a number of Memcached metrics in Amazon CloudWatch at no
additional charge. Based on these metrics, you can set Amazon CloudWatch alarms to alert
you to cache performance issues. You can configure these alarms to send emails when the
cache memory is almost full or when cache nodes are taking a long time to respond. We
recommend monitoring the following metrics:

CPUUTILIZATION
This is the amount of CPU used by Memcached or Redis. Elevated CPU may be indicative of
an issue.

EVICTIONS
This is the number of keys that must be forced out of memory due to lack of space. This
number should be zero. If it’s not near zero, you need a larger ElastiCache instance.

GETHITS/CACHEHITS AND GETMISSES/CACHEMISSES
This is a measure of how frequently your cache has the keys you need. The higher the
percentage of hits, the more you’re offloading your database.

CURRCONNECTIONS
This is the number of clients currently connected. It excludes connections from read
replicas.

In general, monitoring hits, misses, and evictions is sufficient for most applications.
If the ratio of hits to misses is too low, you should revisit your application code to
ensure your cache code is working as expected. As mentioned, evictions should typically
be zero 100 percent of the time. If this isn’t the case, either scale up your
ElastiCache nodes to provide more memory capacity or revisit your caching strategy to
ensure you’re only caching what you need to.

You can also configure your cache node cluster to span multiple Availability Zones,
providing high availability for your game’s caching layer. In the event of an
Availability Zone being unavailable, this prevents your database from being overwhelmed
by a sudden spike in requests. When creating a cache cluster or adding nodes to an
existing cluster, you can choose the Availability Zones for the new nodes. You can
either specify the requested number of nodes in each Availability Zone or select the
option to spread nodes across zones.

With Amazon ElastiCache for Redis, you can create a read replica in another Availability
Zone. If a primary node fails, AWS provisions a new one. And when a primary node can’t
be provisioned, you can decide which read replica to promote to be the new primary.

Amazon ElastiCache for Redis version 3 and higher supports sharded clusters. You can
create clusters with up to 15 shards, expanding the overall in-memory data store to more
than 3.5 TiB. Each shard can have up to five read replicas, allowing you to handle 20
million reads and 4.5 million writes per second.

The sharded model, in conjunction with the read replicas, improves overall performance
and availability. Data is spread across multiple nodes, and the read replicas support
rapid, automatic failover in the event that a primary node has an issue.

To take advantage of the sharded model, you need to use a cluster-aware Redis client.
The client will treat the cluster as a hash table with 16,384 slots spread equally
across the shards, and it will map the incoming keys to the proper shard. Amazon
ElastiCache for Redis treats the entire cluster as a unit for backup and restore
purposes, so you don’t have to think about or manage backups for the individual shards.
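To make the 16,384-slot mapping concrete, the following Python sketch computes the slot for a key the way Redis Cluster defines it: the CRC16 (XMODEM variant) of the key, modulo 16,384. The `shard_for_key` helper assumes slots are divided into equal contiguous ranges, which is a simplification for illustration. A real cluster-aware client asks the cluster for its actual slot assignments, and this sketch does not handle hash tags (the `{...}` key syntax).

```python
import binascii

SLOT_COUNT = 16384


def key_slot(key: str) -> int:
    """Return the Redis Cluster hash slot for a key (hash tags not handled)."""
    # binascii.crc_hqx with an initial value of 0 computes CRC16/XMODEM.
    return binascii.crc_hqx(key.encode("utf-8"), 0) % SLOT_COUNT


def shard_for_key(key: str, shard_count: int) -> int:
    """Map a key to a shard index, assuming equal contiguous slot ranges."""
    return key_slot(key) * shard_count // SLOT_COUNT


# Hypothetical keys for a 15-shard cluster.
for key in ("player:1001:inventory", "leaderboard:weekly", "session:abc123"):
    print(key, key_slot(key), shard_for_key(key, shard_count=15))
```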
4.2
Content delivery and CloudFront

From an engagement perspective, DLC is a huge aspect of modern games, and it’s becoming a primary revenue stream.

Players expect an ongoing stream of new characters, levels, and challenges for months, if not years, after a game’s release. The ability to deliver this content quickly and cost-effectively has a big impact on the profitability of a DLC strategy.

The game client itself is typically distributed through a given platform’s app store. Pushing a new version of a game just to make a new level available can be onerous and time consuming. Promotional or time-limited content, such as Halloween-themed assets or a long weekend tournament, is usually easier to manage yourself in a workflow that mirrors the rest […]

[…] clients (for example, a game patch, expansion, or beta), we recommend using Amazon CloudFront in front of Amazon S3. CloudFront has points of presence (POPs) located throughout the world, which improves download performance. You can also optimize costs by choosing which Regions CloudFront serves. For more information, access the Amazon CloudFront FAQs and refer to the question: How does CloudFront lower my costs?

If you anticipate significant Amazon CloudFront usage, contact our sales team. Amazon offers reduced pricing that’s even lower than our on-demand pricing for high-usage customers.

Easy versioning with ETag

Amazon S3 supports HTTP ETag and the If-None-Match HTTP header, both of which are well known to web developers but frequently overlooked by game developers. These headers enable you to send a request for a piece of Amazon S3 content and include the MD5 checksum of the version you already have. If you already have the latest version, Amazon S3 responds with an HTTP 304 Not Modified status code. If you need a newer version, it responds with an HTTP 200 status code along with the file data. For an overview of this call flow, read about typical usage of HTTP ETag.

CloudFront supports the Amazon S3 ETag. For more information, see Request and Response Behavior for Amazon S3 Origins in the Amazon CloudFront Developer Guide.

Amazon CloudFront offers a Geo Targeting feature that allows you to restrict access to your content. It detects the country your customers are located in and forwards the country code to your origin servers. Your origin server can then determine the type of personalized content that will be returned to the customer based on their geographic location. This content can be anything from a localized dialog file for a role-playing game to localized asset packs for your game.
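The ETag check described in this section can be sketched in a few lines of Python. This is a minimal illustration, not a definitive client: the function names are hypothetical, and it assumes the object was uploaded in a single PUT without KMS encryption, so that its ETag is the quoted MD5 hex digest of the content (multipart uploads produce a different ETag form).

```python
import hashlib
import urllib.error
import urllib.request
from typing import Optional


def local_etag(data: bytes) -> str:
    """Quoted MD5 hex digest of the cached copy (assumed to match the S3 ETag form)."""
    return '"' + hashlib.md5(data).hexdigest() + '"'


def conditional_headers(cached: Optional[bytes]) -> dict:
    """Send If-None-Match only when there is a cached copy to compare against."""
    return {"If-None-Match": local_etag(cached)} if cached is not None else {}


def fetch_if_changed(url: str, cached: Optional[bytes]) -> bytes:
    """Return the cached bytes on HTTP 304, otherwise the freshly downloaded body."""
    request = urllib.request.Request(url, headers=conditional_headers(cached))
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()      # HTTP 200: the server sent new content
    except urllib.error.HTTPError as err:
        if err.code == 304 and cached is not None:
            return cached               # HTTP 304: the cached copy is current
        raise
```

A game client could call `fetch_if_changed` for each asset at startup and download only the files whose content has changed.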
Content upload with Amazon S3

[…] and can even be accomplished directly from a web browser. For more information, see Browser-based uploads using POST (AWS signature version 2) in the Amazon S3 Developer Guide. You can also create secure URLs for players to upload content (say from […]

To protect against corruption, consider calculating an MD5 checksum of the file and including it in the Content-MD5 header. This will enable Amazon S3 to […]

[Figure: Using Amazon S3 POST (labels: Amazon S3, File transfer, HTTP/JSON Data)]

In this example, you PUT the binary game asset (for example, the avatar or level) to Amazon S3, which creates a new object in Amazon S3. After you receive a success response from Amazon S3, you make a POST request to our REST API layer with the metadata for that asset. The REST API needs to have a service that accepts the Amazon S3 key name plus any metadata you want to keep. Then, it stores the key name and the metadata in the database. The game’s other REST services can then query the database to find new content, popular downloads, and so on.

This simple call flow handles the case where the asset data is stored verbatim in Amazon S3, which is usually true of user-generated levels or characters. This same pattern works for game saves as well: you store the game save data in Amazon S3 and index it in your database by user_id, date, and any other important metadata. If you need to do additional processing of an Amazon S3 upload (for example, generating preview thumbnails), make sure to read about asynchronous jobs in the next section. There, you’ll learn about adding Amazon SQS to queue jobs to handle these types of tasks.
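The two halves of the upload flow above (PUT the file with a checksum, then POST its metadata) can be sketched as follows. This is an illustrative sketch only: the helper names and the metadata fields (`s3_key`, `user_id`, `title`) are assumptions, and the actual HTTP calls are left out. The one fixed detail is that the Content-MD5 header carries the base64 encoding of the raw 16-byte MD5 digest, not the hex string.

```python
import base64
import hashlib
import json


def content_md5(data: bytes) -> str:
    """Base64-encode the raw MD5 digest, the form the Content-MD5 header expects."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def put_headers(data: bytes) -> dict:
    """Headers to send with the PUT of the binary asset to Amazon S3."""
    return {"Content-MD5": content_md5(data), "Content-Length": str(len(data))}


def metadata_payload(s3_key: str, user_id: str, title: str) -> str:
    """JSON body for the follow-up POST to a (hypothetical) REST metadata service."""
    return json.dumps({"s3_key": s3_key, "user_id": user_id, "title": title},
                      sort_keys=True)
```

Sending the metadata only after the PUT succeeds keeps the database from pointing at objects that were never stored.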
Analytics and A/B testing

Collecting data about your game is one of the most important, easiest things you can do. Perhaps the trickiest part is deciding what to collect. Because Amazon S3 storage is cheap, consider keeping track of any reasonable player metrics you can think of (for example, total hours played, favorite characters or items, and current and highest level) if you’re not sure what to measure or have a client that’s not easily updated.

However, if you’re able to formulate questions you want answered beforehand (or if client updates are easy), you can focus on gathering the data that helps you answer those specific questions.

After you identify the data, follow these steps to track it:

1. Collect metrics in a local data file on the player’s device (for example, mobile, console, or PC). To make things easier, we recommend using a CSV format and a unique file name. For example, a given user might have their data tracked in 241-game_name-user_idYYYYMMDDHHMMSS.csv or something similar.

2. Periodically persist the data by having the client upload the metrics file directly to Amazon S3. Alternatively, you can integrate with Amazon Kinesis and adopt a loosely coupled architecture, as discussed in the next chapter. When you go to upload a given data file to Amazon S3, open a new local file with a new file name. This simplifies the upload loop.

3. For each file you upload, put a record somewhere indicating there’s a new file to process. Amazon S3 event notifications provide an excellent way to support this. To enable notifications, first add a notification configuration identifying the events you want Amazon S3 to publish, such as a file upload, and the destinations where you want Amazon S3 to send the event notifications. We recommend Amazon SQS, as you can have a background worker listen to Amazon SQS for new files and process them as they arrive. For more details, see the Amazon SQS section.

4. As part of a background job, process the data using a framework like Amazon Elastic MapReduce (Amazon EMR) or another framework that you choose to run on Amazon EC2. This background process can look at new data files that have been uploaded since the last run and perform aggregation or other operations on the data. (Note: If you’re using Amazon EMR, you may be able to skip step 3 because Amazon EMR has built-in support for streaming new files.)

5. Optionally, feed the data into Amazon Redshift for additional data warehousing and analytics flexibility. Amazon Redshift is an ANSI SQL-compliant, columnar data warehouse that you pay for by the hour. This enables you to perform queries across large volumes of data, such as sums and min/max, using familiar SQL-compliant tools.

Repeat these steps in a loop, uploading and processing data asynchronously (Figure 9).

For both analytics and A/B testing, the data flow tends to be unidirectional. That is, metrics flow in from users and are processed, and then a human makes decisions that impact future content releases or game features. With A/B testing, when you present players with different items, screens, and so forth, you can make a record of the choice they were given along with their subsequent actions (such as purchase or cancel). Then, you can periodically upload this data to Amazon S3 and use Amazon EMR to create reports. In the simplest use case, you can generate cleaned up data from Amazon EMR in CSV format in another Amazon S3 bucket and load it into a spreadsheet program.

The proper treatment of analytics and Amazon EMR is beyond the scope of this guide. For more information, see Data Lakes and Analytics on AWS and the Best Practices for Amazon EMR guide. To contact us, please fill out the form at the AWS Game Tech website.

[Figure 9 labels: Upload (1. Write local metrics file; 2. PUT file to Amazon S3 bucket; 3. HTTP 200 OK from Amazon S3; 4. PUT new Amazon S3 key is ready, to SQS or DynamoDB). Processing (5a. EMR cluster, or 5b. Non-EMR workflow with EC2 workers).]
Figure 9: A simple pipeline for analytics and A/B testing
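Steps 1 and 2 of the tracking procedure (write metrics to a uniquely named local CSV file, then switch to a new file whenever one is handed off for upload) might look like the sketch below. The class and function names are hypothetical, the file name layout is only one possible reading of the guide's example, and the upload to Amazon S3 itself is left out.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone
from typing import Optional


def metrics_filename(game: str, user_id: str, now: datetime) -> str:
    """Unique per-user, per-timestamp CSV name (layout is an assumption)."""
    return f"{game}-{user_id}-{now.strftime('%Y%m%d%H%M%S')}.csv"


class MetricsLog:
    """Appends rows to the current CSV and hands back finished files for upload."""

    def __init__(self, game: str, user_id: str, directory: Optional[str] = None):
        self.game = game
        self.user_id = user_id
        self.directory = directory or tempfile.gettempdir()
        self.path: Optional[str] = None

    def _start_file(self, now: datetime) -> None:
        name = metrics_filename(self.game, self.user_id, now)
        self.path = os.path.join(self.directory, name)

    def record(self, event: str, value: str, now: Optional[datetime] = None) -> None:
        """Append one metric row, creating the current file on first use."""
        now = now or datetime.now(timezone.utc)
        if self.path is None:
            self._start_file(now)
        with open(self.path, "a", newline="") as handle:
            csv.writer(handle).writerow([now.isoformat(), event, value])

    def rotate(self, now: Optional[datetime] = None) -> Optional[str]:
        """Switch to a new file name and return the finished file, ready to upload."""
        finished = self.path
        self._start_file(now or datetime.now(timezone.utc))
        return finished
```

Because the name has one-second resolution, this sketch assumes rotation happens at most once per second. A real client would likely add a counter or random suffix to rule out collisions.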
4.2 BINARY GAME CONTENT WITH AMAZON S3
Amazon Athena
There are, however, a few things to keep in mind to optimize performance while using Athena for your queries, including:
Leaderboards and avatars

Many gaming tasks can be decoupled and handled in the background.

For example, a player needs updated stats in real time, so they won’t lose progress if they exit and re-enter the game. However, re-ranking the global top 100 leaderboard doesn’t need to occur every time a player posts a new high score. Instead, the ranking process could be decoupled from score posting and performed in the background every few minutes. Because game ranks are highly volatile in any active online game, this would have minimal impact on the game experience.

As another example, consider allowing players to upload a custom avatar for their character. In this scenario, your front-end servers place a message into a queue (like Amazon SQS) about the new avatar upload. You write a background job that runs periodically, pulls avatars off the queue, processes them, and marks them as available in MySQL, Amazon Aurora, Amazon DynamoDB, or whichever database you’re using. The background job runs on a different set of EC2 instances that can be set up to automatically scale just like your front-end servers.

To help you get started quickly, AWS Elastic Beanstalk provides worker environments that simplify this process by managing the Amazon SQS queue and running a daemon process on each instance that reads from the queue for you.

This is a highly effective way to decouple your front-end servers from backend processing, and it enables you to scale the two independently. For example, if the image resizing is taking too long, you can add additional job instances without the need to scale your REST servers.

Choosing a serverless approach can also help alleviate backend management through event-driven triggers. AWS Lambda functions are used here as an entry point to all of your AWS resources, providing a layer of abstraction between your game and AWS features. For more information about serverless use cases for games, see our reference architectures for mobile real-time analytics and push notifications.

(Note: You can also implement this pattern with an alternative such as RabbitMQ or Apache ActiveMQ deployed to Amazon EC2.)
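The leaderboard example, where score posting stays real-time while ranking runs in the background, can be illustrated with a small in-memory sketch. The class and method names are hypothetical, and a production system would keep this state in a database or cache rather than in a Python dictionary.

```python
import heapq
from typing import Dict, List, Tuple


class Leaderboard:
    """Keeps each player's best score current; the top-N list is rebuilt separately."""

    def __init__(self, top_n: int = 100):
        self.top_n = top_n
        self.best: Dict[str, int] = {}            # updated on every score post
        self.ranked: List[Tuple[str, int]] = []   # refreshed only by the background job

    def post_score(self, player: str, score: int) -> None:
        """Real-time path: a cheap update with no re-ranking."""
        if player not in self.best or score > self.best[player]:
            self.best[player] = score

    def rerank(self) -> None:
        """Background path: intended to run every few minutes from a worker."""
        self.ranked = heapq.nlargest(self.top_n, self.best.items(),
                                     key=lambda item: item[1])
```

Between runs of `rerank`, readers see a slightly stale top list, which is the trade-off the text describes as having minimal impact on the game experience.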
5.0 SCALING GAME SERVICES AS ASYNCHRONOUS JOBS
Amazon SQS

Amazon SQS is a fully managed queue solution with a long-polling HTTP API, making it easy to interface with regardless of the server languages you’re using.

SCALE HORIZONTALLY
Amazon SQS is designed to scale horizontally. An Amazon SQS client can process about 50 requests per second. The more Amazon SQS client processes you add, the more messages you can process concurrently. For tips on adding additional worker processes and EC2 instances, see Increasing throughput using horizontal scaling and action batching in the Amazon Simple Queue Service Developer Guide.

REDUCE COSTS
You can save money using Amazon EC2 Spot Instances for your job workers. Amazon SQS is designed to redeliver messages that aren’t explicitly deleted, which protects against EC2 instances disappearing mid-job. You should only delete messages after you have completed processing them, so another EC2 instance can retry the job if a given instance fails while running.

Amazon SQS has the following caveats:

• Messages aren’t guaranteed to arrive in order. You might receive messages in random order (for example, 2, 3, 5, 1, 7, 6, 4, 8). If you need strict ordering of messages, review the First-In, First-Out (FIFO) queues section that follows.

• Messages typically arrive quickly, but a message might occasionally be delayed by a few minutes.

• Messages can be duplicated, and it's the responsibility of the client to de-duplicate messages.

This means you should ensure your asynchronous jobs are coded to be idempotent and resilient to delays. Resizing and replacing an avatar is a good example of idempotence because doing this twice would yield the same result.
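The two rules above (make the job idempotent, and delete a message only after processing succeeds) can be shown with an in-memory stand-in for the queue. This sketch does not call the Amazon SQS API: the message tuple, `resize_avatar`, and the `delete` callback are hypothetical placeholders for the receive and delete operations a real worker would use.

```python
from typing import Callable, Dict, List, Tuple

# (receipt handle, avatar id): a simplified stand-in for a received queue message.
Message = Tuple[str, str]


def resize_avatar(store: Dict[str, str], avatar_id: str) -> None:
    """Idempotent job: sets a final state instead of incrementing or appending."""
    store[avatar_id] = "resized"


def run_worker(messages: List[Message], store: Dict[str, str],
               delete: Callable[[str], None]) -> None:
    """Process each delivery; delete only after success so failures are redelivered."""
    for receipt, avatar_id in messages:
        try:
            resize_avatar(store, avatar_id)
        except Exception:
            continue         # leave the message on the queue for another worker
        delete(receipt)      # acknowledge only after the work is done
```

Because the job overwrites state rather than accumulating it, a duplicated or out-of-order delivery of the same avatar leaves the store unchanged.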
Finally, if your job workload scales up and down over time (for example, perhaps more avatars are uploaded when more players are online), consider using Auto Scaling to launch Spot Instances.

Amazon SQS offers multiple metrics that you can use for Auto Scaling, the best being ApproximateNumberOfMessagesVisible. The number of visible messages is basically your queue backlog.

For example, depending on the number of jobs you can process each minute, you could scale up when the number of visible messages hits 100 and then scale back down when that number falls below 10. For more information about Amazon SQS, see Monitoring Amazon SQS queues using CloudWatch in the Amazon Simple Queue Service Developer Guide.

FIFO queues

The recommended method for using Amazon SQS is to engineer and architect your application to be resilient to out-of-order and duplicate messages. However, you might have certain tasks where duplicates can’t be tolerated or where the ordering of messages is absolutely critical to proper functioning. For example, you might allow micro-transactions for a player who wants to buy a particular item once, and this action must be strictly regulated. To support this type of requirement, FIFO queues are available in select AWS Regions. FIFO queues can process messages in order and exactly once. Due to the emphasis on message order and delivery, there are additional limitations when working with FIFO queues. For more details about FIFO queues, see Amazon SQS FIFO queues in the Amazon Simple Queue Service Developer Guide.

Other queue options

In addition to Amazon SQS and Amazon SNS, there are dozens of other message queue approaches (including RabbitMQ, ActiveMQ, and Redis) that can run effectively on Amazon EC2. With all of these approaches, you’re responsible for launching and configuring a set of EC2 instances, which is outside the scope of this guide. Keep in mind that running a reliable queue is much like running a highly available database. You should consider high-throughput disk (such as Amazon EBS PIOPS), snapshots, redundancy, replication, failover, and more. Ensuring the uptime and durability of a custom queue solution can be time consuming, and such a solution can fail at the worst times (for instance, during your highest load peaks).
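The backlog-based scaling rule described earlier (scale up when visible messages reach 100, scale down when they fall below 10) amounts to a simple threshold check with a gap between the two limits. The function below is only a sketch of that decision. The one-instance step and the minimum and maximum worker counts are assumptions, and in practice the rule would be expressed as CloudWatch alarms on ApproximateNumberOfMessagesVisible attached to Auto Scaling policies.

```python
def desired_workers(backlog: int, current: int, minimum: int = 1, maximum: int = 20,
                    scale_up_at: int = 100, scale_down_below: int = 10) -> int:
    """Return the worker count to aim for, given the visible-message backlog."""
    if backlog >= scale_up_at:
        return min(current + 1, maximum)   # backlog is growing: add a worker
    if backlog < scale_down_below:
        return max(current - 1, minimum)   # queue is nearly empty: remove a worker
    return current                         # between thresholds: hold steady
```

Keeping the two thresholds far apart avoids repeatedly launching and terminating instances when the backlog hovers around a single value.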
6.0 Getting started
The following are major takeaways of scalable game development patterns and some simple steps you can take to begin your game’s journey on AWS:

Store binary content, such as game data, assets, and patches, on Amazon S3. Use Amazon S3 to offload network-intensive downloads from your game servers. If you’re distributing these assets globally, consider Amazon CloudFront.

Always deploy your EC2 instances and databases to multiple Availability Zones for the best availability. It’s as easy as splitting your instances across two Availability Zones to start.

As your server load grows, add caching via Amazon ElastiCache. Create at least one ElastiCache node in each Availability Zone where you have application servers.

At extreme loads, determine whether advanced strategies (such as event-driven servers or sharded databases) are necessary. However, wait to implement these until it’s absolutely necessary to avoid adding complexity to development, deployment, and debugging.

AWS has a team of business and technical pros who are dedicated to supporting our gaming customers. If you’re ready to talk to us about building your game on AWS, complete our contact form. A member of the AWS Game Tech team will reach out to discuss your requirements and AWS Support options.