AWS Solutions Architect Associate Notes
Contents
MODULE 1 – INTRODUCTION & SCENARIO
MODULE 2 – COURSE FUNDAMENTALS & AWS ACCOUNT
MODULE 3 – CLOUD COMPUTING FUNDAMENTALS
MODULE 4 – AWS FUNDAMENTALS
MODULE 5 – IAM, ACCOUNTS AND AWS ORGANISATIONS
MODULE 6 – VIRTUAL PRIVATE CLOUD (VPC) BASICS
MODULE 7 – ELASTIC COMPUTE CLOUD (EC2) BASICS
MODULE 8 – CONTAINERS & ECS
MODULE 9 – ADVANCED EC2
MODULE 10 – ROUTE 53 – GLOBAL DNS
MODULE 11 – RELATIONAL DATABASE SERVICE (RDS)
MODULE 12 – NETWORK STORAGE
MODULE 13 – HA & SCALING
MODULE 14 – SERVERLESS AND APPLICATION SERVICES
MODULE 15 – GLOBAL CONTENT DELIVERY AND OPTIMIZATION
MODULE 16 – ADVANCED VPC NETWORKING
MODULE 17 – HYBRID ENVIRONMENTS AND MIGRATION
MODULE 18 – SECURITY, DEPLOYMENT & OPERATIONS
MODULE 19 – NOSQL DATABASE & DYNAMODB
MODULE 20 – EXAM
MODULE 1 – INTRODUCTION & SCENARIO
To configure and enable a virtual MFA device for use with your root user (console)
1) Sign in to the AWS Management Console.
2) On the right side of the navigation bar, choose your account name, and choose My Security
Credentials. If necessary, choose Continue to Security Credentials. Then expand the Multi-
Factor Authentication (MFA) section on the page.
3) Choose Activate MFA.
4) In the wizard, choose Virtual MFA device, and then choose Continue.
IAM generates and displays configuration information for the virtual MFA device, including a QR
code graphic. The graphic is a representation of the secret configuration key that is available for
manual entry on devices that do not support QR codes.
5) Open the virtual MFA app on the device.
If the virtual MFA app supports multiple virtual MFA devices or accounts, choose the option to
create a new virtual MFA device or account.
6) The easiest way to configure the app is to use the app to scan the QR code. If you cannot scan the
code, you can type the configuration information manually. The QR code and secret configuration
key generated by IAM are tied to your AWS account and cannot be used with a different account.
They can, however, be reused to configure a new MFA device for your account in case you lose
access to the original MFA device.
• To use the QR code to configure the virtual MFA device, from the wizard, choose Show QR
code. Then follow the app instructions for scanning the code. For example, you might need to
choose the camera icon or choose a command like Scan account barcode, and then use the
device's camera to scan the QR code.
• In the Manage MFA Device wizard, choose Show secret key, and then type the secret key into
your MFA app.
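The console flow above is for the root user. For completeness, a minimal boto3 sketch of the equivalent API flow for an IAM user is shown below; the user and device names are illustrative, and the two authentication codes come from the MFA app once it has been seeded with the returned secret.

    import boto3

    iam = boto3.client("iam")

    # Create a virtual MFA device; the response includes the Base32 seed / QR PNG
    # that you load into the authenticator app.
    device = iam.create_virtual_mfa_device(VirtualMFADeviceName="my-user-mfa")
    serial = device["VirtualMFADevice"]["SerialNumber"]

    # Enable the device for a user with two consecutive codes from the app.
    iam.enable_mfa_device(
        UserName="my-user",              # illustrative user name
        SerialNumber=serial,
        AuthenticationCode1="123456",
        AuthenticationCode2="654321",
    )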
1. Sign in to the AWS Management Console and open the IAM console
at https://console.aws.amazon.com/iam/
2. In the navigation pane, choose Users.
3. Choose the name of the user whose access keys you want to create, and then choose the Security
credentials tab.
4. In the Access keys section, choose Create access key.
5. To view the new access key pair, choose Show. You will not have access to the secret access key
again after this dialog box closes. Your credentials will look something like this:
• Access key ID: AKIAIOSFODNN7EXAMPLE
• Secret access key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
6. To download the key pair, choose Download .csv file. Store the keys in a secure location. You will
not have access to the secret access key again after this dialog box closes.
Keep the keys confidential in order to protect your AWS account and never email them. Do not share
them outside your organization, even if an inquiry appears to come from AWS or Amazon.com. No
one who legitimately represents Amazon will ever ask you for your secret key.
7. After you download the .csv file, choose Close. When you create an access key, the key pair is active
by default, and you can use the pair right away.
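If you prefer to script this, a rough boto3 equivalent of the steps above is sketched here (the user name is illustrative); as with the console, the secret access key is only returned once, at creation time.

    import boto3

    iam = boto3.client("iam")

    # Create a new access key pair for an IAM user.
    resp = iam.create_access_key(UserName="my-user")
    key = resp["AccessKey"]

    # The secret is only available in this response - store it securely now.
    print(key["AccessKeyId"])
    print(key["SecretAccessKey"])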
MODULE 3 – CLOUD COMPUTING FUNDAMENTALS
o More flexibility—your organisation can customise its cloud environment to meet specific
business needs.
o More control—resources are not shared with others, so higher levels of control and privacy
are possible.
o More scalability—private clouds often offer more scalability compared to on-premises
infrastructure.
• Hybrid cloud is any environment that uses both public and private clouds.
Advantages of the hybrid cloud:
o Availability zones are connected through redundant and isolated metro fibres.
Edge Locations
o Edge locations are the endpoints for AWS used for caching content.
o Edge locations consist of CloudFront, Amazon's Content Delivery Network (CDN).
o There are many more edge locations than regions; currently, there are over 150 edge locations.
o An edge location is not a region but a smaller site that AWS operates, used for caching content.
o Edge locations are mainly located in most of the major cities to distribute content to end users with
reduced latency.
o For example, if a user accesses your website from Singapore, the request is redirected to the edge
location closest to Singapore, where cached data can be read.
Regional Edge Cache
o AWS announced a new type of edge location in November 2016, known as a Regional Edge Cache.
o Regional Edge cache lies between CloudFront Origin servers and the edge locations.
o A regional edge cache has a larger cache than an individual edge location.
o Data is removed from the cache at the edge location while the data is retained at the Regional Edge
Caches.
o When a user requests data that is no longer available at the edge location, the edge location retrieves
the cached data from the regional edge cache instead of the origin servers, which have higher latency.
1. In a terminal window, use the ssh command to connect to the instance. You specify the path and file
name of the private key (.pem), the user name for your instance, and the public DNS name or IPv6
address for your instance. For more information about how to find the private key, the user name for
your instance, and the DNS name or IPv6 address for an instance, see Locate the private key and Get
information about your instance. To connect to your instance, use one of the following commands.
• (Public DNS) To connect using your instance's public DNS name, enter the following
command.
ssh -i /path/my-key-pair.pem my-instance-user-name@my-instance-public-dns-name
2. (Optional) Verify that the fingerprint in the security alert matches the fingerprint that you previously
obtained in (Optional) Get the instance fingerprint. If these fingerprints don't match, someone might
be attempting a "man-in-the-middle" attack. If they match, continue to the next step.
3. Enter yes.
AMI
➢ An AMI, or Amazon Machine Image, as the name suggests, is an image of an EC2 instance. An AMI
can be used to create an EC2 instance, and an AMI can be created from an EC2 instance.
An AMI contains a few important things which you should be aware of.
➢ Firstly, an AMI contains attached permissions, and these permissions control which accounts can and
can't use the AMI.
➢ An AMI can be set as a public AMI, in which case everyone is allowed to launch instances from that
AMI.
➢ Secondly, the owner of an AMI is implicitly allowed to create EC2 instances from the AMI, because
he or she owns it.
➢ And finally, you can add explicit permissions to the AMI. So an AMI is either private, so only the
owner can make use of it, or the owner explicitly adds other AWS accounts so they're allowed access,
or it can be set to public, where everyone is allowed.
As well as permissions, an AMI handles two other important things:
- It contains the boot volume of the instance. So, this is the C drive in Windows and the root volume in
Linux.
- It also provides what's known as a block device mapping, and this is a configuration which links the
volumes that the AMI has and how they're presented to the OS. It determines which volume is the boot
volume and which are data volumes.
EC2 instances can run different operating systems: a distribution or version of Linux, as well as various
versions of Windows. You can connect to Windows instances using RDP, the Remote Desktop Protocol,
which runs on port 3389. With Linux instances you use the SSH protocol, which runs on port 22.
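As a rough illustration of the AMI concepts above, the boto3 sketch below creates an AMI from an existing instance and then grants another account explicit launch permission on it; the instance ID and account ID are placeholders, not real resources.

    import boto3

    ec2 = boto3.client("ec2")

    # Create an AMI (boot volume + block device mapping) from an existing instance.
    image = ec2.create_image(
        InstanceId="i-0123456789abcdef0",   # placeholder instance ID
        Name="my-app-ami",
        NoReboot=True,
    )

    # Grant another AWS account explicit launch permission on the AMI
    # (the account ID is a placeholder).
    ec2.modify_image_attribute(
        ImageId=image["ImageId"],
        LaunchPermission={"Add": [{"UserId": "111122223333"}]},
    )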
My First EC2 Instance [Practical]
Important: This is the only chance for you to save the private key file.
Warning: For security reasons, do not choose Anywhere for Source with a rule for RDP. This would
allow access to your instance from all IP addresses on the internet. This is acceptable for a short time
in a test environment, but it is unsafe for production environments.
7. For Outbound rules, keep the default rule, which allows all outbound traffic.
8. Choose Create security group.
To launch an instance
Warning: Don't select Proceed without a key pair. If you launch your instance without a key pair, then you
can't connect to it.
When you are ready, select the acknowledgement check box, and then choose Launch Instances.
9. A confirmation page lets you know that your instance is launching. Choose View Instances to close
the confirmation page and return to the console.
10. On the Instances screen, you can view the status of the launch. It takes a short time for an instance to
launch. When you launch an instance, its initial state is pending. After the instance starts, its state
changes to running and it receives a public DNS name. (If the Public IPv4 DNS column is hidden,
choose the settings icon in the top-right corner, toggle on Public IPv4 DNS, and choose Confirm.)
11. It can take a few minutes for the instance to be ready so that you can connect to it. Check that your
instance has passed its status checks; you can view this information in the Status check column.
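A hedged boto3 equivalent of the launch wizard steps above; the AMI ID, key pair and security group are placeholders you would substitute with your own.

    import boto3

    ec2 = boto3.client("ec2")

    # Launch one instance with a key pair (for SSH/RDP) and a security group.
    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",      # placeholder AMI ID
        InstanceType="t2.micro",
        KeyName="my-key-pair",                # the key pair saved earlier
        SecurityGroupIds=["sg-0123456789abcdef0"],
        MinCount=1,
        MaxCount=1,
    )

    instance_id = resp["Instances"][0]["InstanceId"]

    # Wait until the instance is running and its status checks pass.
    ec2.get_waiter("instance_status_ok").wait(InstanceIds=[instance_id])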
Simple Storage Service (S3) Basics
➢ S3 is a Global Storage Platform. It's global, because it runs from all of the AWS regions and can be
accessed from anywhere with an internet connection.
➢ It's a public service. It's region-based because your data is stored in a specific region, and it never
leaves that region unless you explicitly configure it to.
➢ S3 is regionally resilient, meaning the data is replicated across availability zones in that region.
➢ S3 can tolerate the failure of an AZ and it also has some ability to replicate data between regions.
➢ S3 might initially appear confusing: if you use it from the UI, you don't appear to have to select a
region. Instead, you select the region when you create things inside S3.
➢ S3 is a public service, so it can be accessed from anywhere as long as you have an internet
connection.
➢ The service itself runs from the AWS public zone. It can cope with unlimited data amounts, and it’s
designed for multi user usage of that data.
➢ S3 is perfect for hosting large amounts of data, so think movies, audio distribution, large-scale photo
storage like stock images, large textual data or big datasets.
➢ S3 is economical & accessed via CLI/UI/API/HTTP
➢ S3 delivers two main things: objects and buckets.
➢ Objects are the data that S3 stores: your cat pictures, the latest episode of Game of Thrones, which
you have stored legally. You can think of objects as files; conceptually, most of the time they're
interchangeable.
➢ Buckets are containers for objects.
➢ An object in S3 is made up of two main components plus some associated metadata.
First, there is the object key, and for now, you can think of the object key as similar to a file name.
The key identifies the object in a bucket, so you can uniquely access the object using it, assuming that
you have permissions. Remember, by default, even for public services, there is no access in AWS
initially, except for the account root user.
The other main component of an object is its value, and the value is the data or the contents of the
object. The value of an object, in essence how large the object is, can range from zero bytes up to five
terabytes in size. So, you can have an empty object or you can have one that is a huge 5 TB.
Objects also have a version ID, metadata, some access control, as well as sub-resources.
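To make the object key/value idea concrete, a small boto3 sketch follows; the bucket and key names are illustrative.

    import boto3

    s3 = boto3.client("s3")

    # The key identifies the object within the bucket; the body is its value.
    s3.put_object(
        Bucket="my-example-bucket",
        Key="photos/cat.jpg",
        Body=open("cat.jpg", "rb"),
    )

    # Read it back: the response body is the object's value (0 bytes to 5 TB).
    obj = s3.get_object(Bucket="my-example-bucket", Key="photos/cat.jpg")
    data = obj["Body"].read()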
My First S3 Bucket
In this [DEMO] Lesson I step through the process of creating a simple S3 bucket and uploading objects. I
demonstrate the block public access settings, talk about the bucket ARN and go into some detail about
permissions on objects and how folders are really objects :)
First, you need to create an Amazon S3 bucket where you will store your objects.
3. From the Amazon S3 console dashboard, choose Create Bucket.
4. In Create a Bucket, type a bucket name in Bucket Name.
The bucket name you choose must be globally unique across all existing bucket names in Amazon
S3 (that is, across all AWS customers).
5. In Region, choose Oregon.
6. Choose Create.
When Amazon S3 successfully creates your bucket, the console displays your empty bucket in
the Buckets pane.
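A scripted version of this demo might look like the sketch below; the bucket name is illustrative and must be globally unique, and Oregon corresponds to the us-west-2 region.

    import boto3

    s3 = boto3.client("s3")

    # Bucket names are globally unique across all AWS customers.
    s3.create_bucket(
        Bucket="my-globally-unique-bucket-name-2024",
        CreateBucketConfiguration={"LocationConstraint": "us-west-2"},  # Oregon
    )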
➢ All templates have a list of resources, at least one. It's the Resources section of a CloudFormation
template that tells CloudFormation what to do: if resources are added to it, then CloudFormation
creates resources; if resources are updated, then it updates those resources; and if resources are removed
from a template and that template is reapplied, then the physical resources are removed.
➢ The Resources section of a template is the only mandatory part of a CloudFormation template,
which makes sense, because without resources the template wouldn't actually do anything.
➢ The Description section is a free-text field which lets the author of the template add, as the name
suggests, a description. Generally, you would use this to give some details about what the template
does, what resources get changed, and the cost of applying the template.
➢ The Metadata section in the template is the next part that I want to talk about. It has many functions,
including some pretty advanced ones, but one of the things it does is control how the different
things in a CloudFormation template are presented through the console UI if you are applying the
template from the AWS console. You can specify groupings, control the ordering, and add
descriptions and labels; it's a way you can control how the UI presents the template. Generally, the
bigger your template and the wider its audience, the more likely it is to have a Metadata section.
➢ The Parameters section of a template is where you can add fields which prompt the user for more
information. If you apply the template from the console UI, you will see boxes that you need to type
in or select from a dropdown. Things that you might use this for: which size of instance to create,
the name of something, the number of Availability Zones to use. Parameters can even have settings
for which entries are valid, so you can apply criteria for the values that can be provided as parameters,
and you can also set default values.
➢ The next section is Mappings, and this is another optional section of a CloudFormation template
and something that we won't use as much, especially when you're just getting started with
CloudFormation. It allows you to create lookup tables.
➢ The Conditions section allows decision making in the template, so you can set certain things in a
template that will only occur if a condition is met. Using conditions is a two-step process: step
one is to create the condition, and step two is to reference that condition from the resources it
should control.
➢ The Outputs section can present outputs based on what's being created, updated or deleted.
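A minimal sketch tying these sections together: a template with the mandatory Resources section plus optional Description, Parameters and Outputs, applied with boto3. The stack, bucket and parameter names are illustrative.

    import json
    import boto3

    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Minimal example: creates a single S3 bucket",
        "Parameters": {
            "BucketName": {"Type": "String", "Default": "my-demo-bucket-2024"}
        },
        "Resources": {  # the only mandatory section
            "DemoBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {"BucketName": {"Ref": "BucketName"}},
            }
        },
        "Outputs": {
            "BucketArn": {"Value": {"Fn::GetAtt": ["DemoBucket", "Arn"]}}
        },
    }

    cfn = boto3.client("cloudformation")
    cfn.create_stack(StackName="demo-stack", TemplateBody=json.dumps(template))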
➢ Now, the second part of CloudWatch is called CloudWatch Logs, and this allows for the collection,
monitoring and actioning of logging data. This might be Windows event logs, web server logs,
firewall logs or Linux server logs; almost anything which is logged can be ingested by CloudWatch
Logs.
➢ Lastly, we have CloudWatch Events, and this functions as an event hub. CloudWatch Events
provides two powerful features. Firstly, if an AWS service does something, maybe an EC2 instance is
terminated, started or stopped, then CloudWatch Events will generate an event which can perform
another action. Secondly, CloudWatch Events can generate an event to do something at a certain time
of day, or on a certain day of the week.
➢ Now, because CloudWatch manages lots of different services, it needs a way of keeping things
separated. So the first concept I want to talk about is a Namespace, and you can think of a namespace
as a container for monitoring data. It's a way to keep things from becoming messy, a way to separate
things into different areas. Namespaces have a name, and this can be almost anything as long as it
stays within the naming rules set for namespaces.
➢ Namespaces contain related metrics. A metric is a collection of related data points in a time-ordered
structure. To give you a few examples, we might have CPU utilisation, network in and out, or disk
utilisation; they are all metrics. If you imagine a set of servers logging data for CPU utilisation, it will
be time ordered: it will start when you enable monitoring and finish when you disable it. A metric,
and this is a fairly nuanced point to understand, is not for a specific server. CPU utilisation is the
metric, and that metric might be receiving data for lots of different EC2 instances.
➢ I want to talk about data points. Let's say we have a metric called CPU utilisation. Every time any
server measures its utilisation and sends it into CloudWatch, that goes into the CPU utilisation metric,
and each one of those measurements, so every time a server reports its CPU, is called a data point.
Now, a data point in its simplest form consists of two things: first, a timestamp, which includes the
year, month, day, hour, minute, second and time zone when the measurement was taken, and second,
a value, in this case 98.3, which represents 98.3% CPU utilisation. I mentioned earlier that the CPU
utilisation metric could contain data for many servers (a code sketch after these bullets shows how a
single data point is published).
➢ Dimensions separate datapoints for different things or perspectives within the same metric.
➢ CloudWatch also allows us to take actions based on metrics, and this is done using alarms. As a
concept they're pretty simple: alarms are created and linked to a specific metric, then, based on how
you configure the alarm, it will take an action based on that metric.
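To make namespace, metric, dimension and data point concrete, here is a minimal boto3 sketch publishing a custom data point; the namespace and dimension values are illustrative.

    import datetime
    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # One data point: a timestamp plus a value, published into the CPUUtilisation
    # metric inside a custom namespace, separated by an InstanceId dimension.
    cloudwatch.put_metric_data(
        Namespace="MyApp",                       # illustrative namespace
        MetricData=[{
            "MetricName": "CPUUtilisation",
            "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
            "Timestamp": datetime.datetime.utcnow(),
            "Value": 98.3,
            "Unit": "Percent",
        }],
    )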
To create an M out of N alarm, specify a lower number for the first value than you specify for
the second value. For more information, see Evaluating an alarm.
d. For Missing data treatment, choose how to have the alarm behave when some data points
are missing. For more information, see Configuring how CloudWatch alarms treat missing
data.
e. If the alarm uses a percentile as the monitored statistic, a Percentiles with low samples box
appears. Use it to choose whether to evaluate or ignore cases with low sample rates. If you
choose ignore (maintain alarm state), the current alarm state is always maintained when the
sample size is too low. For more information, see Percentile-based CloudWatch alarms and
low data samples.
10. Choose Next.
11. Under Notification, choose In alarm and select an SNS topic to notify when the alarm is
in ALARM state
To have the alarm send multiple notifications for the same alarm state or for different alarm states,
choose Add notification.
Shared Controls – Controls which apply to both the infrastructure layer and customer layers, but in
completely separate contexts or perspectives. In a shared control, AWS provides the requirements for the
infrastructure and the customer must provide their own control implementation within their use of AWS
services. Examples include:
• Patch Management – AWS is responsible for patching and fixing flaws within the infrastructure,
but customers are responsible for patching their guest OS and applications.
• Configuration Management – AWS maintains the configuration of its infrastructure devices, but a
customer is responsible for configuring their own guest operating systems, databases, and
applications.
• Awareness & Training - AWS trains AWS employees, but a customer must train their own
employees.
Customer Specific – Controls which are solely the responsibility of the customer based on the application
they are deploying within AWS services. Examples include:
• Service and Communications Protection or Zone Security which may require a customer to route or
zone data within specific security environments.
High-Availability vs Fault-Tolerance vs Disaster Recovery
High-Availability (HA):
➢ Formally, the definition is that high availability aims to ensure an agreed level of operational
performance, usually uptime, for a higher-than-normal period.
➢ Most students assume that making a system highly available means ensuring that the system never
fails, or that users of the system never experience any outages, and that is not true.
➢ HA isn't aiming to stop failure, and it definitely doesn't mean that customers won't experience
outages.
➢ A highly available system is one designed to be online and providing services as often as possible.
it's a system designed so that when it fails its components can be replaced or fixed as quickly as
possible, often using automation to bring systems back into service.
➢ High availability is about maximizing a system's online time, and that's it.
➢ High availability is about keeping a system operational. It's about fast or automatic recovery from
issues. It's not about preventing user disruption; while that's a bonus, a highly available system can
still cause disruption to your user base when there is a failure.
➢ Now, high availability has costs required to implement it. It needs some design decisions to be made
in advance, and it requires a certain level of automation.
➢ Sometimes high availability needs redundant servers or redundant infrastructure to be in place,
ready to switch customers over to in the event of a disaster, to minimize downtime.
Fault Tolerance (FT):
➢ Now let's take this a step further and talk about fault tolerance and how it differs from high
availability. When most people think of high availability, they're actually mixing it up with fault
tolerance.
➢ Fault tolerance in some ways is very similar to high availability, but it is much more.
➢ Fault tolerance is defined as the property that enables a system to continue operating properly in
the event of the failure of some of its components, so one or more faults within the system.
➢ Fault tolerance means that if a system has faults, and this could be one fault or multiple faults, then it
should continue to operate properly even while those faults are present and being fixed. It means the
system can handle failure without impacting customers.
➢ HA is just about maximizing uptime; fault tolerance is what it means to operate through failure.
➢ Fault tolerance can be expensive because it's much more complex to implement than high
availability.
Disaster Recovery (DR):
➢ The definition of disaster recovery is a set of policies, tools and procedures to enable the recovery
or continuation of vital technology infrastructure and systems following a natural or human-
induced disaster.
➢ So, while high availability and fault tolerance are ways of designing systems to cope with or operate
through a disaster, disaster recovery is about what to plan for and do when a disaster occurs which
knocks out a system.
➢ The worst time for any business is recovering in the event of a major disaster. In that type of
environment bad decisions are made, decisions based on shock, lack of sleep and fear of how to
recover.
➢ So, a good set of DR processes need to pre-plan for everything in advance.
➢ Build a set of processes and documentation.
➢ Plan for staffing and physical issues. When a disaster happens, if you have business premises with
some staff, then part of a good DR plan might be to have standby premises ready, and these standby
premises can be used in the event of a disaster. That way, because it was planned in advance, your
staff, unaffected by the disaster, know exactly where to go.
➢ You might need space for IT systems, or you might use a cloud platform such as AWS as a backup
location. But in any case, you need the idea of backup premises or a backup location that's ready to
go in the event of a disaster. If you have local infrastructure, then make sure you have resilience, and
make sure you have plans in place and ready during a disaster.
➢ This might be extra hardware sitting at the backup site ready to go, or it might be virtual machines or
instances operating in a cloud environment, ready when you need them.
➢ A good DR plan means taking regular backups. This is essential. But the worst thing you can do is to
store these backups at the same site as your systems; it's dangerous.
➢ If your main site is damaged, your primary data and your backups are damaged at the same time, and
that's a huge problem.
Summary:
High availability - minimise any outages
Fault tolerance - operate through faults
Disaster recovery - used when these don't work
Domain Name System (DNS) Fundamentals
DNS
• DNS is a discovery service
• It translates machine-readable information into human-readable information and vice versa
• When using, say, amazon.com, you might access it using www.amazon.com. But this isn't what your
computer uses; that requires IP addresses. And so, one function of DNS is to find the IP address for
a given domain name.
• For example, www.amazon.com. Now, there are two crucial things to realize about DNS and its
requirements. Because of the number of services on the internet, and on private networks, and
because of the importance of DNS, it's actually a huge database, and it has to be distributed and
resilient.
• DNS is a huge-scale database; it's distributed and it's global.
• It translates the information which machines need to and from the information which humans need.
• When you use your laptop to browse www.amazon.com, from your perspective it just loads, but that
is a massive abstraction and simplification. Computers don't work with domain names; for your laptop
to communicate with the www.amazon.com web server it uses IP addresses, and how this conversion
happens is transparent to you. That's because either your laptop or device is communicating directly
with the DNS system, or it's been configured to talk to a DNS resolver server on your behalf, and
potentially this resolver server is running within your internet provider or on your internet router.
DNS is a huge-scale database, distributed and global, but somewhere in that global platform is one
piece of information, a single database, which has the information we need to convert between the
name and IP addresses, in this example for amazon.com.
• Now, the piece of information in the database that we are looking for is called a zone, and the way
that zone is stored is often referred to as a zone file. Somewhere on the internet is one zone file
for amazon.com, and that amazon.com zone file has a record inside it, a DNS record, which
links the name www to the IP address that your laptop needs to communicate with that website.
• This zone file is hosted by a DNS server that's known as a name server, or NS for short. So if you can
query this zone for the record www.amazon.com and then use the result of that query, which is an IP
address, your laptop can communicate with the web server.
• This zone file could be located anywhere, on potentially one or two out of millions of DNS name
servers, so one of the core pieces of functionality that DNS provides is that it allows a DNS resolver
server, sitting either on your internet router or in your internet provider, to find this zone.
• Let's quickly summarise. Firstly, we have the DNS client, and the DNS client refers to the device or
thing which wants the data the DNS has, so it wants the IP address for amazon.com. Generally, the
DNS client is a piece of software running inside the operating system on a device that you use, so a
laptop, mobile phone, tablet or PC.
• Next we have the DNS resolver, and that could be a piece of software also running on the DNS
client, your laptop, PC or tablet, or it could be a separate service running inside your internet router, or
a physical server running inside your internet provider, and it is the DNS resolver that queries the
DNS system on your behalf. So generally, the DNS client talks to the DNS resolver and asks the DNS
resolver to query DNS on its behalf.
• Next we have a DNS zone, and a DNS zone is a part of the global DNS data. For example,
amazon.com and netflix.com are both examples of zones, and they live inside the DNS system. A zone
file is how the data for that zone is physically stored, so there is going to be a zone file for
amazon.com, one for netflix.com and so on. If I talk about a DNS zone, I'm referring to what the data
is, its substance; if I talk about a zone file, I'm talking about how that data is physically stored.
• Lastly, we have a name server, or DNS server. This is a server which hosts these zone files. So the
point that we need to get to when using DNS is to find the name server which hosts the particular
zone file, and then query that name server for a record that is in that zone file. That's the job of DNS:
the DNS resolver server needs to locate the correct name server for a given zone, query that name
server, retrieve the information it needs, and then pass it back to the DNS client. That is the flow of
DNS.
Remember these!
• DNS Client=> Your laptop, phone, tablet, PC.
• Resolver=> software on your device, or a server which queries DNS on your behalf.
• DNS zone => A part of the DNS databases (e.g., amazon.com)
• Zone file => physical database for a zone
• Nameserver => where zone files are hosted
• DNS has to have a starting point, and that point is the DNS root. DNS is structured like a tree.
• The DNS root is hosted on 13 special name servers, known as the root servers. The root servers are
operated by 12 different large global companies or organizations.
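You can watch the resolver do this work with nothing more than Python's standard library; this sketch simply asks the configured resolver for the address behind a name.

    import socket

    # The OS hands this query to the configured DNS resolver, which walks the
    # DNS hierarchy (root -> .com -> amazon.com) and returns the IP address
    # from the record in the amazon.com zone.
    ip = socket.gethostbyname("www.amazon.com")
    print(ip)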
Route53 (R53) Fundamentals
➢ Route 53 provides two main services: first, it's a service in AWS which allows you to register
domains; second, it can host zone files for you on managed name servers which it provides.
➢ Route 53 is a global service with a single database. It's one of very few AWS services which
operates as a single global service, so you don't need to pick a region when using it from the console UI.
➢ The data that Route 53 stores and manages is distributed globally as a single set and is replicated
between regions, so it's a globally resilient service.
➢ Route 53 can tolerate the failure of one or more regions and continue to operate without any problems.
It's one of the most important AWS products: it needs to be able to scale, stay highly performant
and remain reliable, and continue to work through failure.
➢ Route 53 provides DNS zones, as well as hosting for those zones. It's basically DNS as a service.
➢ It lets you create and manage zone files, and these zone files are called hosted zones in Route 53
terminology, because they're hosted on AWS managed name servers.
➢ From Route 53's perspective, every hosted zone also has a number of allocated managed name servers.
Now, a hosted zone can be public, which means that the data is accessible on the public internet. The
name servers for a public hosted zone live logically in the AWS public zone and are accessible from
anywhere with a public internet connection.
➢ A hosted zone can also be private, which means it is linked to one or more VPCs and only
accessible from within those VPCs. You might use this type of zone if you want to host sensitive
DNS records that you don't want to be publicly accessible.
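A hedged boto3 sketch of the two Route 53 jobs described above: creating a hosted zone and then adding a record to it. The domain name and IP address are illustrative.

    import boto3

    r53 = boto3.client("route53")

    # Create a public hosted zone; Route 53 allocates managed name servers for it.
    zone = r53.create_hosted_zone(Name="example.com", CallerReference="demo-001")
    zone_id = zone["HostedZone"]["Id"]

    # Add an A record to the zone file it manages.
    r53.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "www.example.com",
                "Type": "A",
                "TTL": 300,
                "ResourceRecords": [{"Value": "203.0.113.10"}],
            },
        }]},
    )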
An IAM identity policy, or IAM policy, is just a set of security statements to AWS. It grants or denies access
to AWS products and features for any identity which uses that policy. Identity policies, also known as policy
documents, are created using JSON.
An identity policy document is the type of thing that you would use with a user, group or role. At the highest
level, a policy document is just one or more statements; inside the Statement block there can be multiple
statements, each of them inside a pair of curly braces.
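A minimal sketch of the kind of two-statement policy document described here, expressed as a Python dict; the catpics bucket name is illustrative, and the Sids match those discussed below.

    policy_document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "FullAccess",           # allow everything in S3...
                "Effect": "Allow",
                "Action": ["s3:*"],
                "Resource": ["*"],
            },
            {
                "Sid": "DenyCatBucket",        # ...except one bucket (illustrative name)
                "Effect": "Deny",
                "Action": ["s3:*"],
                "Resource": [
                    "arn:aws:s3:::catpics",
                    "arn:aws:s3:::catpics/*",
                ],
            },
        ],
    }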
When an identity attempts to access AWS resources, that identity needs to prove who it is to AWS, a process
known as authentication. Once authenticated, that identity is known as an authenticated identity.
AWS knows which policies an identity has, and there could be multiple, and each of these policies can have
multiple statements in it, so AWS has a collection of all of the statements which apply to the given identity.
AWS also knows which resource or resources you are attempting to interact with, as well as what actions
you want to perform on those resources.
Let's step through what makes up a statement. The first part of a statement is a statement ID, or Sid, and this
is an optional field which lets you identify a statement and what it does; see how in the example the first
statement states FullAccess and the second states DenyCatBucket. This is just a way that we can inform the
reader.
Every interaction that you have with AWS is a combination of two main things: the resource that you're
interacting with, and the action that you're attempting to perform on that resource.
A statement only applies if the interaction that you're having with AWS matches both the action and the
resource. The Action part of a statement matches one or more actions; it can be very specific and list a
specific individual action.
Effect controls what AWS does if the Action and Resource parts of a statement match the operation that you
are attempting to perform with AWS.
Root
➢ The parent container for all the accounts for your organization. If you apply a policy to the root, it
applies to all organizational units (OUs) and accounts in the organization.
➢ Organizational unit (OU)
➢ A container for accounts within a root. An OU also can contain other OUs, enabling you to create a
hierarchy that resembles an upside-down tree, with a root at the top and branches of OUs that reach
down, ending in accounts that are the leaves of the tree. When you attach a policy to one of the nodes
in the hierarchy, it flows down and affects all the branches (OUs) and leaves (accounts) beneath it.
An OU can have exactly one parent, and currently each account can be a member of exactly one OU.
Account
An account in Organizations is a standard AWS account that contains your AWS resources and the identities
that can access those resources.
There are two types of accounts in an organization: a single account that is designated as the management
account, and one or more member accounts.
➢ The management account is the account that you use to create the organization. From the
organization's management account, you can do the following:
o Create accounts in the organization
o Invite other existing accounts to the organization
o Remove accounts from the organization
o Manage invitations
o Apply policies to entities (roots, OUs, or accounts) within the organization
o Enable integration with supported AWS services to provide service functionality across all of
the accounts in the organization.
The management account has the responsibilities of a payer account and is responsible for paying all
charges that are accrued by the member accounts. You can't change an organization's management account.
➢ Member accounts make up all of the rest of the accounts in an organization. An account can be a
member of only one organization at a time. You can attach a policy to an account to apply controls to
only that one account.
Invitation
➢ The process of asking another account to join your organization. An invitation can be issued only by
the organization's management account.
➢ The invitation is extended to either the account ID or the email address that is associated with the
invited account.
➢ After the invited account accepts an invitation, it becomes a member account in the organization.
Invitations also can be sent to all current member accounts when the organization needs all members
to approve the change from supporting only consolidated billing features to supporting all features in
the organization.
➢ Invitations work by accounts exchanging handshakes. You might not see handshakes when you work
in the AWS Organizations console. But if you use the AWS CLI or AWS Organizations API, you
must work directly with handshakes.
Handshake
➢ A multi-step process of exchanging information between two parties. One of its primary uses in
AWS Organizations is to serve as the underlying implementation for invitations.
➢ Handshake messages are passed between and responded to by the handshake initiator and the
recipient. The messages are passed in a way that helps ensure that both parties know what the current
status is.
➢ Handshakes also are used when changing the organization from supporting only consolidated billing
features to supporting all features that AWS Organizations offers. You generally need to directly
interact with handshakes only if you work with the AWS Organizations API or command line tools
such as the AWS CLI.
AWS Organizations DEMO
➢ The GENERAL account will become the MANAGEMENT account for the organisation
➢ We will invite the PRODUCTION account as a MEMBER account and create the DEVELOPMENT
account as a MEMBER account.
➢ Finally - we will create an OrganizationAccountAccessRole in the production account, and use this
role to switch between accounts.
➢ WARNING: If you get an error "You have exceeded the allowed number of AWS Accounts" then
you can go here https://console.aws.amazon.com/servicequotas/home?region=us-east-
1#!/services/organizations/quotas/L-29A0C5DF and request a quota increase for the number of
member accounts in an ORG
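A rough boto3 sketch of the same demo flow; the account IDs and email addresses are placeholders. It invites an existing account, creates a new member account, and assumes the OrganizationAccountAccessRole to switch into it.

    import boto3

    org = boto3.client("organizations")

    # Invite the existing PRODUCTION account by email (placeholder address).
    org.invite_account_to_organization(
        Target={"Id": "production@example.com", "Type": "EMAIL"}
    )

    # Create the DEVELOPMENT account as a new member account; AWS creates the
    # OrganizationAccountAccessRole in it automatically.
    org.create_account(
        Email="development@example.com",
        AccountName="DEVELOPMENT",
        RoleName="OrganizationAccountAccessRole",
    )

    # Switch role into a member account (account ID is a placeholder).
    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn="arn:aws:iam::111122223333:role/OrganizationAccountAccessRole",
        RoleSessionName="org-admin",
    )["Credentials"]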
➢ Service control policies inherit down the organizational tree. This means if they're attached to the
organization as a whole, then they affect all of the accounts inside the organization.
➢ If they're attached to an organizational unit, then they impact all accounts directly inside that
organizational unit.
➢ The management account is special because it is never affected by service control policies.
➢ It's the only AWS account within AWS Organizations which can't be restricted using service control
policies.
➢ Service control policies are account permissions boundaries, meaning they limit what the AWS account
can do, including the account root user within that account.
➢ You can't directly restrict what the account root user of an AWS account can do; the account root
user always has full permissions over that entire AWS account. But with a service control policy you
restrict the account itself, which indirectly restricts every identity within that account, including the root user.
➢ Service control policies define the limit of what is, and isn’t allowed just like a boundary, but they
don’t grant permissions.
➢ You still need to give identities within that AWS account permissions to AWS resources but any
SCPs will limit the permissions that can be assigned to individual identities.
➢ You can use SCPs in two ways: you can block by default and allow certain services, which is an
allow list, or you can allow by default and block access to certain services, which is a deny list.
➢ When you enable SCPs on your organization, AWS applies a default policy called
FullAWSAccess. This is applied to the organization and all OUs within that organization.
• Allow list strategy – You explicitly specify the access that is allowed. All other access is
implicitly blocked. By default, AWS Organizations attaches an AWS managed policy
called FullAWSAccess to all roots, OUs, and accounts. This helps ensure that, as you build
your organization, nothing is blocked until you want it to be. In other words, by default all
permissions are allowed. When you are ready to restrict permissions,
you replace the FullAWSAccess policy with one that allows only the more limited, desired
set of permissions. Users and roles in the affected accounts can then exercise only that
level of access, even if their IAM policies allow all actions. If you replace the default
policy on the root, all accounts in the organization are affected by the restrictions. You
can't add permissions back at a lower level in the hierarchy because an SCP never grants
permissions; it only filters them.
• Deny list strategy – You explicitly specify the access that is not allowed. All other access
is allowed. In this scenario, all permissions are allowed unless explicitly blocked. This is
the default behavior of AWS Organizations. By default, AWS Organizations attaches an
AWS managed policy called FullAWSAccess to all roots, OUs, and accounts. This allows
any account to access any service or operation with no AWS Organizations–imposed
restrictions. Unlike the allow list technique described above, when using deny lists, you
leave the default FullAWSAccess policy in place (that allow "all"). But then you attach
additional policies that explicitly deny access to the unwanted services and actions. Just as
with IAM permission policies, an explicit deny of a service action overrides any allow of
that action.
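A minimal sketch of the deny-list strategy described above: leave FullAWSAccess in place and attach an additional SCP that explicitly denies an unwanted service. The OU ID and the denied service are illustrative.

    import json
    import boto3

    org = boto3.client("organizations")

    # Deny-list SCP: everything stays allowed except the explicitly denied actions.
    scp = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Deny", "Action": "s3:*", "Resource": "*"}
        ],
    }

    policy = org.create_policy(
        Name="DenyS3",
        Description="Deny all S3 actions in attached accounts",
        Type="SERVICE_CONTROL_POLICY",
        Content=json.dumps(scp),
    )

    # Attach it to an OU; it inherits down to every account in that OU.
    org.attach_policy(
        PolicyId=policy["Policy"]["PolicySummary"]["Id"],
        TargetId="ou-abcd-12345678",     # placeholder OU ID
    )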
CloudWatch Logs
• It is a public service usable from AWS or on-premises.
• It stores, monitors and provides access to logging data.
• It is also integrated with AWS services like EC2, VPC Flow Logs, Lambda, CloudTrail, R53 and more.
• It is often the default place where AWS services output their logging to.
• CloudWatch Logs is a public service and can also be utilised in an on-premises environment and
even from other public cloud platforms.
• CloudWatch Logs is also capable of taking logging data and generating a metric from it; this is
known as a metric filter.
• Now, the starting point is logging sources, which can include AWS products and services, mobile
or server-based applications, external compute services, virtual or physical servers, databases, or
even external APIs. These sources inject data into CloudWatch Logs as log events.
• Log events have a timestamp and a message block. CloudWatch Logs treats the message as a raw
block of data.
• Log events are stored inside log streams, and log streams are essentially a sequence of log events
from the same source.
• We also have log groups. Log groups are containers for multiple log streams for the same type of
logging. A log group is also the place where configuration settings are stored, where we define things
like retention settings and permissions.
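A small boto3 sketch showing the log group / log stream / log event hierarchy described above; the names are illustrative.

    import time
    import boto3

    logs = boto3.client("logs")

    # A log group is the container (and where retention/permissions live).
    logs.create_log_group(logGroupName="/myapp/web")
    logs.put_retention_policy(logGroupName="/myapp/web", retentionInDays=30)

    # A log stream is a sequence of events from one source.
    logs.create_log_stream(logGroupName="/myapp/web", logStreamName="server-1")

    # A log event is a timestamp plus a raw message block.
    logs.put_log_events(
        logGroupName="/myapp/web",
        logStreamName="server-1",
        logEvents=[{
            "timestamp": int(time.time() * 1000),
            "message": "GET /index.html 200",
        }],
    )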
CloudTrail
• CloudTrail is a product which logs API calls and account events. It logs API calls and activities as
CloudTrail Events.
• 90 days stored by default in Event History
• It's very often used to diagnose security or performance issues, or to provide quality account level
traceability.
• It is enabled by default in AWS accounts and logs free information with a 90-day retention.
• It can be configured to store data indefinitely in S3 or CloudWatch Logs.
• CloudTrail events can be one of two types: management events or data events.
• Management events provide information about management operations that are performed on
resources in your AWS account.
• These are also known as control plane operations. Think of things like creating an EC2 instance,
terminating an EC2 instance and creating a VPC, these are all control plane operations.
• Now, data events contain information about resource operations performed on or in a resource. So,
examples of this might be object being uploaded to S3, or object being accessed from S3, or when a
lambda function is being invoked.
• By default, CloudTrail only logs management events because data events are often much higher
volume.
• A CloudTrail trail is the unit of configuration within the CloudTrail product. It's the way you provide
configuration to CloudTrail on how to operate.
• A trail logs events for the AWS region that it's created in.
• CloudTrail is a regional service, but when you create a trail, it can be configured to operate in one of
two ways.
• You can create a trail which is a one-region trail, or a trail can be set to all regions.
• A single-region trail only logs events for that region.
• An all-regions trail you can think of as a collection of trails in every AWS region, but it's managed as
one logical trail.
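A hedged boto3 sketch of creating an all-regions trail that stores events in S3 and also enables S3 data events; the bucket names are placeholders.

    import boto3

    cloudtrail = boto3.client("cloudtrail")

    # Create an all-regions trail delivering events to an S3 bucket.
    cloudtrail.create_trail(
        Name="org-trail",
        S3BucketName="my-cloudtrail-bucket",   # bucket policy must allow CloudTrail
        IsMultiRegionTrail=True,
    )
    cloudtrail.start_logging(Name="org-trail")

    # Data events (e.g. S3 object-level operations) are off by default.
    cloudtrail.put_event_selectors(
        TrailName="org-trail",
        EventSelectors=[{
            "ReadWriteType": "All",
            "IncludeManagementEvents": True,
            "DataResources": [{
                "Type": "AWS::S3::Object",
                "Values": ["arn:aws:s3:::my-data-bucket/"],
            }],
        }],
    )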
S3 Security
• S3 is private by default.
• The only identity which has any initial access to an S3 bucket is the account root user of the account
which owns that bucket, so the account which created it.
• Anything else, so any other permissions, have to be explicitly granted, and there are a few ways that
this can be done.
• The first is an S3 bucket policy, and an S3 bucket policy is a type of resource policy.
• A resource policy is just like an identity policy. but as the name suggests, they're attached to
resources instead of identities.
• Resource policies provide a resource perspective on permissions.
• The difference between resource policies and identity policies is all about this perspective.
• with identity policies, you're controlling what that identity can access, with resource policies you're
controlling who can access that resource.
• So, it's an inverse perspective: one is about identities and one is about resources.
• Identity policies have one pretty significant limitation: you can only attach identity policies to
identities in your own account. So, identity policies can only control security inside your account.
• Resource policies can allow access from the same account or different accounts, because the policy is
attached to the resource and it can reference any other identities inside that policy. So, by attaching
the policy to the resource, and then having the flexibility to reference any other identity,
whether in the same account or different accounts, resource policies are a great way of controlling
access for a particular resource, no matter what the source of that access is.
• They also have another benefit: resource policies can allow or deny anonymous principals. Identity
policies, by design, have to be attached to a valid identity in AWS; you can't have one attached to
nothing. Resource policies can be used to open a bucket to the world by referencing all principals,
even those not authenticated by AWS.
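A minimal sketch of a resource policy: a bucket policy that grants read access to everyone, including anonymous principals. The bucket name is illustrative, and the bucket's Block Public Access settings would also need to permit this.

    import json
    import boto3

    s3 = boto3.client("s3")

    bucket_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",                 # all principals, even unauthenticated
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-example-bucket/*",
        }],
    }

    s3.put_bucket_policy(Bucket="my-example-bucket", Policy=json.dumps(bucket_policy))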
S3 Performance Optimization
Understanding Performance characteristics of S3 is essential as a solution architect.
We know from the Animal4life scenario that remote workers need to upload large data sets and do so
frequently.
By default, when you upload an object to S3, it is uploaded as a single blob of data in a single stream.
A file becomes an object, it's uploaded using the PutObject API call and placed in a bucket, and this all
happens in a single stream. This method has a problem: if the stream fails, the whole upload fails, and the
only recovery is a full restart of the entire upload.
If the upload fails at 4.5 GB of a 5 GB upload, that's 4.5 GB of data wasted and probably a significant amount
of time.
S3 performance optimization
➢ When using the single PUT method, the speed and reliability of the upload will always be limited
because of this single stream of data.
➢ Data transfer protocols such as BitTorrent have been developed to allow speedy, distributed transfer
of data.
➢ Transferring large amounts of data with only a single stream is just a bad idea. There is also a limit within
AWS: if you use a single PUT upload then you are limited to 5 GB of data as a maximum.
➢ The solution is multipart upload. Multipart upload improves the speed and reliability of uploads to S3,
and it does this by breaking data up into individual parts (see the sketch after this list).
➢ The minimum size for using multipart upload is 100 MB; you can't use multipart upload if you are
uploading data smaller than this.
➢ Multipart upload is so effective because each individual part is treated as its own isolated upload. Each
individual part can fail in isolation and be restarted in isolation, rather than needing to restart the
whole thing.
➢ This means that the risk of uploading large amounts of data to S3 is significantly reduced.
➢ The transfer rate of the whole upload is the sum of all the individual parts, so you get much better
transfer rates by splitting the original blob of data into smaller individual parts and then uploading
them in parallel.
➢ Transfer Acceleration uses the network of AWS edge locations, which are located in lots of convenient
locations globally. An S3 bucket needs to be enabled for Transfer Acceleration; the default is that it's
switched off, and there are some restrictions on enabling it.
➢ The bucket name cannot contain periods, and it needs to be DNS compatible in its naming, so keep in
mind those two restrictions.
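A sketch of how the high-level boto3 transfer manager applies the multipart behaviour described above, switching to parallel multipart parts once a file crosses a size threshold; the file and bucket names are illustrative.

    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client("s3")

    # Above 100 MB, split the upload into parts and send them in parallel;
    # a failed part can be retried on its own instead of restarting everything.
    config = TransferConfig(
        multipart_threshold=100 * 1024 * 1024,
        multipart_chunksize=25 * 1024 * 1024,
        max_concurrency=10,
    )

    s3.upload_file("large-dataset.bin", "my-example-bucket",
                   "uploads/large-dataset.bin", Config=config)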
Encryption 101
Encryption at rest is designed to protect against physical theft and physical tampering.
Encryption at rest is also commonly used within cloud environments, where your data is stored on shared
hardware, and it's stored in an encrypted form.
The other approach to encryption is known as encryption in transit, and this is aimed at protecting data
while it's being transferred between two places.
Plaintext: information that can be directly read by humans or a machine (this article is an example of
plaintext). Plaintext is a historic term pre-dating computers, when encryption was only used for hardcopy
text; nowadays it is associated with many formats including music, movies and computer programs.
Algorithm: algorithms are used for important tasks such as data encryption, authentication, and digital
signatures
Key: a key is a variable value that is applied using an algorithm to a string or block of unencrypted text to
produce encrypted text, or to decrypt encrypted text.
Ciphertext: Ciphertext is the unreadable output of an encryption algorithm
Encryption: Encryption is the process of converting normal message (plaintext) into meaningless message
(Ciphertext).
Decryption: Decryption is the process of converting meaningless message (Ciphertext) into its original form
(Plaintext).
Symmetric Key Cryptography: It is an encryption system where the sender and receiver of a message use a
single common key to encrypt and decrypt messages. Symmetric key systems are faster and simpler, but the
problem is that the sender and receiver have to somehow exchange the key in a secure manner. The most
popular symmetric key cryptography system is the Data Encryption Standard (DES).
Asymmetric encryption: Under this system a pair of keys is used to encrypt and decrypt information. A
public key is used for encryption and a private key is used for decryption. The public key and private key are
different. Even if the public key is known by everyone, only the intended receiver can decode the message,
because only they know the private key.
Steganography: Steganography is the practice of hiding a secret message inside of (or even on top of)
something that is not secret. That something can be just about anything you want. These days, many
examples of steganography involve embedding a secret piece of text inside of a picture. Or hiding a secret
message or script inside of a Word or Excel document.
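For illustration only, a tiny symmetric encryption example using the third-party cryptography package: one shared key both encrypts plaintext into ciphertext and decrypts it back.

    from cryptography.fernet import Fernet   # pip install cryptography

    key = Fernet.generate_key()       # the single shared (symmetric) key
    cipher = Fernet(key)

    ciphertext = cipher.encrypt(b"attack at dawn")    # unreadable without the key
    plaintext = cipher.decrypt(ciphertext)            # back to the original message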
Key Management Service (KMS)
➢ KMS is a regional and public service. It's a public service, which means it occupies the AWS
public zone and so it can be connected to from anywhere with access to that zone, but
you need permissions in order to access it.
➢ KMS lets you create, store and manage cryptographic keys. These are the keys which can be used to
convert plaintext to ciphertext and vice versa.
➢ KMS is capable of handling both symmetric and asymmetric keys.
➢ KMS is also capable of performing actual cryptographic operations, which include encryption and
decryption but also many others.
➢ Now, the fundamental thing to understand about KMS is that cryptographic keys never leave the
product. KMS can create keys, keys can be imported, it manages keys, and it can use those keys to perform
operations, but the keys themselves are locked inside KMS.
➢ KMS provides a FIPS 140-2 compliant service. That's a US security standard.
➢ Now, the main things that KMS manages are known as CMKs, or customer master keys.
➢ These CMKs are used by KMS within cryptographic operations.
➢ CMKs are logical; think of them as a container for the actual physical master key. So a CMK is
logical, and it contains a few things: a key ID, which is a unique identifier for the key, a creation date,
a key policy (which is a type of resource policy), a description, and the state of the key, whether it's
active or not.
➢ A CMK is backed by physical key material, which can actually be used to encrypt and decrypt.
➢ A CMK can only be used to directly encrypt or directly decrypt data that is a maximum of 4 KB in
size.
➢ KMS does not store the data encryption key in any way. It provides it to you, or the service using
KMS, and then discards it.
➢ The reason it discards it is that KMS doesn't actually perform the encryption and decryption of data
using data encryption keys; you do that, or the service using KMS does that.
➢ Let's look at how this works. When a data encryption key is generated, KMS provides you with two
versions of that data encryption key.
➢ First, a plaintext version of the key, and also a ciphertext, or encrypted, version of that same data
encryption key.
➢ The data encryption key is encrypted by the customer master key that generated it, so in the future it
can be decrypted by KMS using that same customer master key.
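A sketch of the flow just described, using boto3; the key alias is illustrative. KMS returns both a plaintext and an encrypted copy of the data key, and you perform the actual data encryption yourself.

    import boto3

    kms = boto3.client("kms")

    # Ask KMS to generate a data encryption key under a CMK.
    resp = kms.generate_data_key(KeyId="alias/my-app-key", KeySpec="AES_256")
    plaintext_key = resp["Plaintext"]        # use this to encrypt data, then discard it
    encrypted_key = resp["CiphertextBlob"]   # store this alongside the encrypted data

    # Later: KMS decrypts the stored data key using the same CMK.
    recovered_key = kms.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]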
key concept
• CMKs are isolated to a region & never leave.
• AWS Managed or Customer Managed CMKs
• Customer managed keys are more configurable.
• CMKs support rotation.
• The CMK itself contains the current backing key, the physical material that's used to encrypt and
decrypt, as well as previous backing keys created by rotating that material.
• You can create an alias, which is essentially a shortcut to a particular CMK.
Key policy and Security
The starting point for KMS security is the key policy, which is a type of resource policy. It's like a
bucket policy on S3 bucket, only a key policy is on a key. Every customer master key has a key policy.
For CMKs you can adjust that policy.
Now there are two components to server-side encryption. The first is the actual encryption and decryption
process itself: taking plaintext, a key and an algorithm and generating ciphertext, and the reverse, which is
taking that ciphertext and the key and using the algorithm to produce the plaintext.
So, one half of the process that server-side encryption can handle is the actual encryption operation, and the
second part is the generation and management of the cryptographic keys.
The three methods below handle each of these differently.
Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)
When you use Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3), each object is
encrypted with a unique key. As an additional safeguard, it encrypts the key itself with a master key that
it regularly rotates. Amazon S3 server-side encryption uses one of the strongest block ciphers available,
256-bit Advanced Encryption Standard (AES-256), to encrypt your data.
Server-Side Encryption with Customer Master Keys (CMKs) Stored in AWS Key Management
Service (SSE-KMS)
Server-Side Encryption with Customer Master Keys (CMKs) Stored in AWS Key Management Service
(SSE-KMS) is similar to SSE-S3, but with some additional benefits and charges for using this service.
There are separate permissions for the use of a CMK that provides added protection against unauthorized
access of your objects in Amazon S3. SSE-KMS also provides you with an audit trail that shows when
your CMK was used and by whom. Additionally, you can create and manage customer managed CMKs
or use AWS managed CMKs that are unique to you, your service, and your Region.
Server-Side Encryption with Customer-Provided Keys (SSE-C)
With Server-Side Encryption with Customer-Provided Keys (SSE-C), you manage the encryption keys
and Amazon S3 manages the encryption, as it writes to disks, and decryption, when you access your
objects
Object Encryption
• Encryption in S3 is simple.
• Buckets are not encrypted; you define encryption at an object level, and each object inside a
bucket could be using different encryption settings.
• S3 is capable of supporting two main encryption approaches: client-side encryption and
server-side encryption. Both of these refer to encryption at rest.
• There are three types of server-side encryption available for S3 objects (a boto3 sketch follows below):
• Server-side Encryption with Customer-provided keys (SSE-C)
• Server-side Encryption with Amazon S3-managed keys (SSE-S3)
• Server-side Encryption with Customer Master keys (CMKs) stored in AWS Key Management
Service (SSE-KMS)
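A minimal boto3 sketch of requesting SSE-S3 and SSE-KMS at upload time; the bucket name, object keys and KMS alias are placeholders.

import boto3

s3 = boto3.client("s3")

# SSE-S3: S3 creates and manages the keys.
s3.put_object(Bucket="my-example-bucket", Key="report.txt",
              Body=b"hello", ServerSideEncryption="AES256")

# SSE-KMS: S3 performs the encryption, but the key is a KMS CMK you control.
s3.put_object(Bucket="my-example-bucket", Key="secret-report.txt",
              Body=b"hello", ServerSideEncryption="aws:kms",
              SSEKMSKeyId="alias/my-app-key")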
S3 Object Storage Classes
Amazon S3 offers a range of storage classes designed for different use cases. These include S3 Standard for
general-purpose storage of frequently accessed data; S3 Intelligent-Tiering for data with unknown or
changing access patterns; S3 Standard-Infrequent Access (S3 Standard-IA) and S3 One Zone-
Infrequent Access (S3 One Zone-IA) for long-lived, but less frequently accessed data; and Amazon S3
Glacier (S3 Glacier) and Amazon S3 Glacier Deep Archive (S3 Glacier Deep Archive) for long-term archive
and digital preservation. If you have data residency requirements that can’t be met by an existing AWS
Region, you can use the S3 Outposts storage class to store your S3 data on-premises. Amazon S3 also offers
capabilities to manage your data throughout its lifecycle. Once an S3 Lifecycle policy is set, your data will
automatically transfer to a different storage class without any changes to your application.
Amazon S3 Standard (S3 Standard): S3 Standard offers high durability, availability, and performance
object storage for frequently accessed data. Because it delivers low latency and high throughput, S3
Standard is appropriate for a wide variety of use cases, including cloud applications, dynamic websites,
content distribution, mobile and gaming applications, and big data analytics.
Key features:
➢ Low latency and high throughput performance
➢ Designed for durability of 99.999999999% of objects across multiple Availability Zones
➢ Resilient against events that impact an entire Availability Zone
➢ Designed for 99.99% availability over a given year
➢ Backed with the Amazon S3 Service Level Agreement for availability
➢ Supports SSL for data in transit and encryption of data at rest
➢ S3 Lifecycle management for automatic migration of objects to other S3 Storage Classes
Amazon S3 Intelligent-Tiering (S3 Intelligent-Tiering): Amazon S3 Intelligent-Tiering (S3 Intelligent-
Tiering) is the only cloud storage class that delivers automatic cost savings by moving objects between four
access tiers when access patterns change.
Key features:
➢ Automatically optimizes storage costs for data with changing access patterns
➢ Stores objects in four access tiers, optimized for frequent, infrequent, archive, and deep archive
access
➢ Frequent and Infrequent Access tiers have same low latency and high throughput performance of S3
Standard
➢ Activate optional automatic archive capabilities for objects that become rarely accessed
➢ Archive access and deep Archive access tiers have same performance as Glacier and Glacier Deep
Archive
➢ Designed for durability of 99.999999999% of objects across multiple Availability Zones
➢ Designed for 99.9% availability over a given year
➢ Backed with the Amazon S3 Service Level Agreement for availability
➢ Small monthly monitoring and auto-tiering fee
➢ No operational overhead, no retrieval fees, no additional tiering fees apply when objects are moved
between access tiers within the S3 Intelligent-Tiering storage class.
Amazon S3 Standard-Infrequent Access (S3 Standard-IA): S3 Standard-IA is for data that is accessed
less frequently, but requires rapid access when needed. S3 Standard-IA offers the high durability, high
throughput, and low latency of S3 Standard, with a low per GB storage price and per GB retrieval fee. This
combination of low cost and high performance make S3 Standard-IA ideal for long-term storage, backups,
and as a data store for disaster recovery files. S3 Storage Classes can be configured at the object level and a
single bucket can contain objects stored across S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA, and S3
One Zone-IA. You can also use S3 Lifecycle policies to automatically transition objects between storage
classes without any application changes.
Key Features:
➢ Same low latency and high throughput performance of S3 Standard
➢ Designed for durability of 99.999999999% of objects across multiple Availability Zones
➢ Resilient against events that impact an entire Availability Zone
➢ Data is resilient in the event of one entire Availability Zone destruction
➢ Designed for 99.9% availability over a given year
➢ Backed with the Amazon S3 Service Level Agreement for availability
➢ Supports SSL for data in transit and encryption of data at rest
➢ S3 Lifecycle management for automatic migration of objects to other S3 Storage Classes
Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA): S3 One Zone-IA is for data that is accessed
less frequently, but requires rapid access when needed. Unlike other S3 Storage Classes which store data in
a minimum of three Availability Zones (AZs), S3 One Zone-IA stores data in a single AZ and costs 20%
less than S3 Standard-IA. S3 One Zone-IA is ideal for customers who want a lower-cost option for
infrequently accessed data but do not require the availability and resilience of S3 Standard or S3 Standard-
IA. It’s a good choice for storing secondary backup copies of on-premises data or easily re-creatable data.
You can also use it as cost-effective storage for data that is replicated from another AWS Region using S3
Cross-Region Replication.
Key Features:
➢ Same low latency and high throughput performance of S3 Standard
➢ Designed for durability of 99.999999999% of objects in a single Availability Zone†
➢ Designed for 99.5% availability over a given year
➢ Backed with the Amazon S3 Service Level Agreement for availability
➢ Supports SSL for data in transit and encryption of data at rest
➢ S3 Lifecycle management for automatic migration of objects to other S3 Storage Classes
➢ † Because S3 One Zone-IA stores data in a single AWS Availability Zone, data stored in this storage
class will be lost in the event of Availability Zone destruction.
Amazon S3 Glacier (S3 Glacier): S3 Glacier is a secure, durable, and low-cost storage class for data
archiving. You can reliably store any amount of data at costs that are competitive with or cheaper than on-
premises solutions. To keep costs low yet suitable for varying needs, S3 Glacier provides three retrieval
options that range from a few minutes to hours. You can upload objects directly to S3 Glacier, or use S3
Lifecycle policies to transfer data between any of the S3 Storage Classes for active data (S3 Standard, S3
Intelligent-Tiering, S3 Standard-IA, and S3 One Zone-IA) and S3 Glacier.
Key Features:
➢ Designed for durability of 99.999999999% of objects across multiple Availability Zones
➢ Data is resilient in the event of one entire Availability Zone destruction
➢ Supports SSL for data in transit and encryption of data at rest
➢ Low-cost design is ideal for long-term archive
➢ Configurable retrieval times, from minutes to hours
➢ S3 PUT API for direct uploads to S3 Glacier, and S3 Lifecycle management for automatic
migration of objects.
Amazon S3 Glacier Deep Archive (S3 Glacier Deep Archive): S3 Glacier Deep Archive is Amazon S3’s
lowest-cost storage class and supports long-term retention and digital preservation for data that may be
accessed once or twice in a year. It is designed for customers — particularly those in highly-regulated
industries, such as the Financial Services, Healthcare, and Public Sectors — that retain data sets for 7-10
years or longer to meet regulatory compliance requirements. S3 Glacier Deep Archive can also be used for
backup and disaster recovery use cases, and is a cost-effective and easy-to-manage alternative to magnetic
tape systems, whether they are on-premises libraries or off-premises services.
Key Features:
➢ Designed for durability of 99.999999999% of objects across multiple Availability Zones
➢ Lowest cost storage class designed for long-term retention of data that will be retained for 7-10 years
➢ Ideal alternative to magnetic tape libraries
➢ Retrieval time within 12 hours
➢ S3 PUT API for direct uploads to S3 Glacier Deep Archive, and S3 Lifecycle management for
automatic migration of objects
S3 Outposts storage class: Amazon S3 on Outposts delivers object storage to your on-premises AWS
Outposts environment. Using the S3 APIs and features available in AWS Regions today, S3 on Outposts
makes it easy to store and retrieve data on your Outpost, as well as secure the data, control access, tag, and
report on it. S3 on Outposts provides a single Amazon S3 storage class, named S3 Outposts, which uses the
S3 APIs, and is designed to durably and redundantly store data across multiple devices and servers on your
Outposts. S3 Outposts storage class is ideal for workloads with local data residency requirements, and to
satisfy demanding performance needs by keeping data close to on-premises applications.
Key Features:
➢ S3 Object compatibility and bucket management through the S3 SDK
➢ Designed to durably and redundantly store data on your Outposts
➢ Encryption using SSE-S3 and SSE-C
➢ Authentication and authorization using IAM, and S3 Access Points
➢ Transfer data to AWS Regions using AWS DataSync
➢ S3 Lifecycle expiration actions
S3 Lifecycle Configuration
To manage your objects so that they are stored cost effectively throughout their lifecycle, configure their
Amazon S3 Lifecycle. An S3 Lifecycle configuration is a set of rules that define actions that Amazon S3
applies to a group of objects. There are two types of actions:
Transition actions—Define when objects transition to another storage class. For
example, you might choose to transition objects to the S3 Standard-IA storage class 30 days after you
created them, or archive objects to the S3 Glacier storage class one year after creating them.
Expiration actions—Define when objects expire. Amazon S3 deletes expired objects on your behalf. The
lifecycle expiration costs depend on when you choose to expire objects. There are costs associated with the
lifecycle transition requests.
Managing object lifecycle
Define S3 Lifecycle configuration rules for objects that have a well-defined lifecycle. For example:
➢ If you upload periodic logs to a bucket, your application might need them for a week or a month.
After that, you might want to delete them.
➢ Some documents are frequently accessed for a limited period of time. After that, they are
infrequently accessed. At some point, you might not need real-time access to them, but your
organization or regulations might require you to archive them for a specific period. After that, you
can delete them.
➢ You might upload some types of data to Amazon S3 primarily for archival purposes. For example,
you might archive digital media, financial and healthcare records, raw genomics sequence data, long-
term database backups, and data that must be retained for regulatory compliance.
With S3 Lifecycle configuration rules, you can tell Amazon S3 to transition objects to less expensive storage
classes, or archive or delete them.
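A minimal boto3 sketch of a lifecycle configuration with a single rule combining transition and expiration actions; the bucket name and the logs/ prefix are placeholders.

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "log-retention",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},   # transition action
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 730},                        # expiration action
        }]
    },
)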
S3 Replication
➢ S3 replication, a feature which allows you to configure the replication of objects between a source
and destination S3 bucket.
➢ There are two types of replication supported by S3.
➢ The first type, which has been available for some time, is Cross-Region Replication or CRR, which
allows the replication of objects from a source bucket to a destination bucket in different AWS
regions.
➢ The second type of replication is Same-Region Replication or SRR, which, as the name suggests,
is the same process where both the source and destination buckets are in the same AWS region.
Replication configuration is applied to the source bucket. The replication configuration configures S3 to
replicate from this source bucket to a destination bucket.
Another thing that's configured in the replication configuration is an IAM role to use for the replication
process.
The role is configured to allow the S3 service to assume it so that's defined in its trust policy.
The roles permission policy gives it the permission to read objects on the source bucket and permissions to
replicate those objects to the destination bucket.
This is how replication is configured between source and destination buckets, and of course that replication
is encrypted.
There is one crucial difference between replication which occurs in the same AWS account versus different
AWS accounts. Inside one account, both S3 buckets are owned by the same AWS account, so they both trust
the AWS account that they are in.
That means they both trust IAM as a service, which means that they both trust the IAM role. For the same
account, that means the IAM role automatically has access to the source and destination buckets, as long as
the role's permissions policy grants that access. If you're configuring replication between different AWS
accounts though, that's not enough. The destination bucket, because it is in a different AWS account, doesn't
trust the source account or the role that's used to replicate that bucket's contents, so a bucket policy on the
destination bucket is also needed to grant access to the replication role.
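For illustration, a minimal boto3 sketch of a same-account replication configuration; the bucket names and the replication role ARN are placeholders, and both buckets are assumed to already have versioning enabled.

import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/s3-replication-role",   # placeholder role
        "Rules": [{
            "ID": "replicate-everything",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},                                    # empty filter = whole bucket
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
        }],
    },
)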
S3 replication considerations
• Replication is not retroactive. You enable replication on a pair of buckets, a source and a destination.
• If you enable replication on a bucket which already has objects, those existing objects will not be replicated.
• In order to enable replication on a bucket, both the source and destination bucket need to have
versioning enabled.
• It is one-way replication process only objects are replicated from the source to the destination.
• It is capable of handling objects which are encrypted using SSE-S3 and SSE-KMS, but this is an extra
piece of configuration that you will need to enable. (Extra permissions are required because KMS is
involved.)
• Replication also requires that the owner of the source bucket has permissions on the objects which will
replicate.
• Another limitation is it will not replicate system events. So, if any changes are made in the source
bucket by Lifecycle Management, they will not be replicated to the destination bucket, so only user
events are replicated.
• In addition to that, it can't replicate any objects inside a bucket that are using the Glacier or Glacier
Deep Archive storage classes.
• DELETEs are not replicated. If you perform any DELETEs in the source bucket, they are not
replicated to the destination.
Why use replication?
➢ For Same-Region Replication specifically, you might use this process for Log Aggregation. So, if you
have got multiple different S3 buckets which store logs for different systems, then you could use
this to aggregate those logs into a single S3 bucket.
➢ You might want to use Same-Region Replication to configure some sort of synchronization
between Production and Test accounts. maybe you want to replicate data from PROD to TEST
periodically, or maybe you want to replicate some testing data into your PROD account.
➢ Use Same-Region Replication to implement resilience if you have strict sovereignty requirements.
➢ If you don’t have sovereignty requirements, then you can use Cross-region Replication and use
replication to implement global resilience improvements, so you can have backups of your data
copied to different AWS regions, to cope with large scale failure, you can also replicate data into
different regions to reduce latency.
S3 PreSigned URLs
➢ PreSigned URLs are a way that you can give another person or application access to an object inside
an S3 bucket using your credentials in a safe and secure way.
➢ All objects by default are private. Only the object owner has permission to access these objects.
However, the object owner can optionally share objects with others by creating a presigned URL,
using their own security credentials, to grant time-limited permission to download the objects.
➢ Anyone with valid security credentials can create a presigned URL. However, in order to
successfully access an object, the presigned URL must be created by someone who has permission to
perform the operation that the presigned URL is based upon.
➢ The credentials that you can use to create a presigned URL include:
• IAM instance profile: Valid up to 6 hours
• AWS Security Token Service: Valid up to 36 hours when signed with permanent credentials,
such as the credentials of the AWS account root user or an IAM user
• IAM user: Valid up to 7 days when using AWS Signature Version 4
• To create a presigned URL that's valid for up to 7 days, first designate IAM user credentials
(the access key and secret access key) to the SDK that you're using. Then, generate a
presigned URL using AWS Signature Version 4.
➢ If you created a presigned URL using a temporary token, then the URL expires when the token
expires, even if the URL was created with a later expiration time.
➢ Since presigned URLs grant access to your Amazon S3 buckets to whoever has the URL, we
recommend that you protect them appropriately. For more details about protecting presigned URLs,
see Limiting presigned URL capabilities.
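A minimal boto3 sketch of generating a presigned GET URL; the bucket and key are placeholders, and the URL inherits the permissions and lifetime limits of whichever credentials sign it.

import boto3

s3 = boto3.client("s3")

# Generates a time-limited GET URL using the caller's own credentials.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-example-bucket", "Key": "private/report.pdf"},
    ExpiresIn=3600,   # seconds; capped by how long the signing credentials remain valid
)
print(url)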
S3 Select and Glacier Select
• S3 and Glacier Select allow you to use a SQL-Like statement to retrieve partial objects from S3
and Glacier.
• S3 can store HUGE objects (up to 5TB)
• You often end up retrieving the entire object, even when you only need part of it
• Retrieving a 5TB object takes time and uses 5TB of data transfer
• Filtering at the client side doesn't reduce this
• S3/Glacier Select let you use SQL-like statements to select part of the object, pre-filtered by S3.
• S3 Select and Glacier Select support a number of file formats such as CSV, JSON and Parquet, with
BZIP2 compression for CSV and JSON.
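A minimal boto3 sketch of S3 Select against an assumed CSV object with a header row; the bucket, key and SQL expression are placeholders.

import boto3

s3 = boto3.client("s3")

resp = s3.select_object_content(
    Bucket="my-example-bucket",
    Key="sales.csv",
    ExpressionType="SQL",
    Expression="SELECT s.product, s.amount FROM S3Object s WHERE s.region = 'EU'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"},
    OutputSerialization={"CSV": {}},
)

# Only the filtered rows are streamed back, not the whole object.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())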
S3 Events
S3 event notification
➢ The S3 event notification feature allows you to create event notification configurations on a bucket.
➢ When enabled, a notification is generated when a certain thing occurs within a bucket, and these can
be delivered to different destinations including SNS topics, SQS queues, or Lambda functions.
➢ Various different types of events are supported. For example, you can generate event notifications
when objects are created, which means Put, Post, Copy, and when multipart upload operations
complete.
➢ You can also set event notifications to trigger on object deletion, so you can match any type of
deletion using the wildcard.
➢ You can also have it trigger for object restores. So, if you have objects in S3 glacier or glacier deep
archive and you perform a restore operation, you can be notified when it starts and completes.
➢ You can get notifications relating to replication.
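A minimal boto3 sketch of an event notification configuration that sends created-object events to an SQS queue and deletions to a Lambda function; the ARNs are placeholders, and the destinations must already allow S3 to publish to or invoke them.

import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="my-example-bucket",
    NotificationConfiguration={
        "QueueConfigurations": [{
            "QueueArn": "arn:aws:sqs:us-east-1:111122223333:new-objects",      # placeholder
            "Events": ["s3:ObjectCreated:*"],
        }],
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:on-delete",  # placeholder
            "Events": ["s3:ObjectRemoved:*"],
        }],
    },
)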
S3 Access Logs
➢ Server access logging provides detailed records for the requests that are made to a bucket. Server
access logs are useful for many applications. For example, access log information can be useful in
security and access audits. It can also help you learn about your customer base and understand your
Amazon S3 bill.
➢ By default, Amazon S3 doesn't collect server access logs. When you enable logging, Amazon S3
delivers access logs for a source bucket to a target bucket that you choose. The target bucket must be
in the same AWS Region as the source bucket and must not have a default retention period
configuration.
➢ An access log record contains details about the requests that are made to a bucket. This information
can include the request type, the resources that are specified in the request, and the time and date that
the request was processed.
➢ Important: There is no extra charge for enabling server access logging on an Amazon S3 bucket.
However, any log files that the system delivers to you will accrue the usual charges for storage. (You
can delete the log files at any time.) We do not assess data transfer charges for log file delivery, but
we do charge the normal data transfer rate for accessing the log files.
➢ You can enable or disable server access logging by using the Amazon S3 console, Amazon S3 API,
the AWS Command Line Interface (AWS CLI), or AWS SDKs.
➢ Before you enable server access logging, consider the following:
• In Amazon S3, you can grant permission to deliver access logs through bucket access control
lists (ACLs), but not through bucket policy.
• Adding deny conditions to a bucket policy might prevent Amazon S3 from delivering access
logs.
• You can use default bucket encryption on the target bucket only if AES256 (SSE-S3) is selected.
SSE-KMS encryption is not supported.
• You can't enable S3 Object Lock on the target bucket.
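A minimal boto3 sketch of enabling server access logging; the source and target bucket names and the prefix are placeholders, and the target bucket must already grant the log delivery permissions described above.

import boto3

s3 = boto3.client("s3")

# The target bucket must be in the same Region as the source bucket.
s3.put_bucket_logging(
    Bucket="my-example-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-example-logs",
            "TargetPrefix": "access-logs/my-example-bucket/",
        }
    },
)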
MODULE 6 - VIRTUAL PRIVATE CLOUD (VPC) BASICS
Network Refresh
• An IP address (internet protocol address) is a numerical representation that uniquely identifies a
specific interface on the network.
• Addresses in IPv4 are 32 bits long. This allows for a maximum of 4,294,967,296 (2^32) unique
addresses. Addresses in IPv6 are 128 bits, which allows for 3.4 x 10^38 (2^128) unique addresses.
• The total usable address pool of both versions is reduced by various reserved addresses and other
considerations.
• IP addresses are binary numbers but are typically expressed in decimal form (IPv4) or hexadecimal
form (IPv6) to make reading and using them easier for humans.
• The Internet Protocol (IP) is part of the Internet layer of the Internet protocol suite. In the OSI model,
IP would be considered part of the network layer. IP is traditionally used in conjunction with a
higher-level protocol, most notably TCP. The IP standard is governed by RFC 791.
Subnet masks
• A single IP address identifies both a network and a unique interface on that network. A subnet mask
can also be written in dotted decimal notation and determines where the network part of an IP
address ends and the host portion of the address begins.
• When expressed in binary, any bit set to one means the corresponding bit in the IP address is part of
the network address. All the bits set to zero mark the corresponding bits in the IP address as part of
the host address.
• The bits marking the subnet mask must be consecutive ones. Most subnet masks start with 255. and
continue on until the network mask ends.
Private addresses
Within the address space, certain networks are reserved for private networks. Packets from these networks
are not routed across the public internet. This provides a way for private networks to use internal IP
addresses without interfering with other networks. The private networks are
• 10.0.0.0 - 10.255.255.255
• 172.16.0.0 - 172.31.255.255
• 192.168.0.0 - 192.168.255.255
Special addresses
Certain IPv4 addresses are set aside for specific uses:
• 127.0.0.0/8 Loopback addresses (the host’s own interface, typically 127.0.0.1)
• 224.0.0.0/4 IP Multicast
• 255.255.255.255 Broadcast (sent to all interfaces on the network)
CIDR (Classless Inter-Domain Routing) -- CIDR (Classless Inter-Domain Routing) also known as
supernetting -- is a method of assigning Internet Protocol (IP) addresses that improves the efficiency of
address distribution and replaces the previous system based on Class A, Class B and Class C networks. The
initial goal of CIDR was to slow the increase of routing tables on routers across the internet and decrease the
rapid exhaustion of IPv4 addresses. As a result, the number of available internet addresses has greatly
increased.
Amazon Virtual Private Cloud VPC
Amazon Virtual Private Cloud (Amazon VPC) enables you to launch AWS resources into a virtual network
that you've defined. This virtual network closely resembles a traditional network that you'd operate in your
own data center, with the benefits of using the scalable infrastructure of AWS.
Amazon VPC is the networking layer for Amazon EC2.
The following are the key concepts for VPCs:
• Virtual private cloud (VPC) — A virtual network dedicated to your AWS account.
• Subnet — A range of IP addresses in your VPC.
• Route table — A set of rules, called routes, that are used to determine where network traffic is
directed.
• Internet gateway — A gateway that you attach to your VPC to enable communication between
resources in your VPC and the internet.
• VPC endpoint — Enables you to privately connect your VPC to supported AWS services and VPC
endpoint services powered by PrivateLink without requiring an internet gateway, NAT device, VPN
connection, or AWS Direct Connect connection. Instances in your VPC do not require public IP
addresses to communicate with resources in the service. Traffic between your VPC and the other
service does not leave the Amazon network.
• CIDR block —Classless Inter-Domain Routing. An internet protocol address allocation and route
aggregation methodology.
VPC considerations
• What size should the VPC be?
• Are there any networks we can't use?
• VPCs, Cloud, on-premises, Partners & Vendors
• Try to predict the future
• VPC structure - Tiers & Resiliency (Availability) Zones
Access Amazon VPC
You can create, access, and manage your VPCs using any of the following interfaces:
• AWS Management Console — Provides a web interface that you can use to access your VPCs.
• AWS Command Line Interface (AWS CLI) — Provides commands for a broad set of AWS
services, including Amazon VPC, and is supported on Windows, Mac, and Linux.
• AWS SDKs — Provides language-specific APIs and takes care of many of the connection details,
such as calculating signatures, handling request retries, and error handling.
• Query API — Provides low-level API actions that you call using HTTPS requests. Using the Query
API is the most direct way to access Amazon VPC, but it requires that your application handle low-
level details such as generating the hash to sign the request, and error handling.
VPCs and subnets
• A virtual private cloud (VPC) is a virtual network dedicated to your AWS account. It is logically
isolated from other virtual networks in the AWS Cloud. You can launch your AWS resources, such
as Amazon EC2 instances, into your VPC. You can specify an IP address range for the VPC, add
subnets, associate security groups, and configure route tables.
• A subnet is a range of IP addresses in your VPC. You can launch AWS resources into a specified
subnet. Use a public subnet for resources that must be connected to the internet, and a private subnet
for resources that won't be connected to the internet.
• To protect the AWS resources in each subnet, you can use multiple layers of security, including
security groups and network access control lists (ACL).
• You can optionally associate an IPv6 CIDR block with your VPC, and assign IPv6 addresses to the
instances in your VPC.
What can we do with a VPC?
• Launch instances into a subnet of your choice
• Assign custom IP address ranges in each subnet
• Configure route tables between subnets
• Create an internet gateway and attach it to your VPC
• Much better security control over your AWS resources
• Instance security groups
• Subnet network access control lists (NACLs) (see the sketch below)
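To make this concrete, a minimal boto3 sketch of creating a VPC with two subnets in different Availability Zones; the CIDR ranges and AZ names are placeholders.

import boto3

ec2 = boto3.client("ec2")

# Create the VPC, then carve it into subnets (one per AZ for resilience).
vpc_id = ec2.create_vpc(CidrBlock="10.16.0.0/16")["Vpc"]["VpcId"]

subnet_a = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.16.0.0/20",
                             AvailabilityZone="us-east-1a")["Subnet"]["SubnetId"]
subnet_b = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.16.16.0/20",
                             AvailabilityZone="us-east-1b")["Subnet"]["SubnetId"]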
Default VPC vs Custom VPC
• The default VPC is user friendly, allowing you to immediately deploy instances
• All subnets in the default VPC have a route out to the internet.
• Each EC2 instance has both a public and private IP address
VPC Peering
• Allows you to connect one VPC with another via a direct network route using private IP addresses.
• Instances behave as if they were on the same private network
• You can peer VPCs with other AWS accounts as well as with other VPCs in the same account.
• Peering is in a star configuration: i.e., 1 central VPC peers with 4 others. NO TRANSITIVE
PEERING
Network Address Translation (NAT)
You can now use Network Address Translation (NAT) Gateway, a highly available AWS managed service
that makes it easy to connect to the Internet from instances within a private subnet in an AWS Virtual
Private Cloud (VPC). Previously, you needed to launch a NAT instance to enable NAT for instances in a
private subnet.
Amazon VPC NAT Gateway is available in the US East (N. Virginia), US West (Oregon), US West (N.
California), EU (Ireland), Asia Pacific (Tokyo), Asia Pacific (Singapore), and Asia Pacific (Sydney) regions.
Remember the following:
NAT instances:
• When creating a NAT instance, disable the Source/Destination Check on the instance.
• NAT instances must be in a public subnet.
• There must be a route out of the private subnets to the NAT instance in order for this to work.
• The amount of traffic that NAT instances can support depends on the instance size. If you are
bottlenecking, increase the instance size.
• You can create high availability using Auto Scaling Groups, multiple subnets in different AZs, and a
script to automate failover.
• A NAT instance sits behind a Security Group.
NAT gateways:
• Redundant inside the Availability Zone.
• Preferred by the enterprise.
• No need to patch.
• Not associated with Security Groups.
• Automatically assigned a public IP address.
• Remember to update your route tables.
Security groups (SGs)
A security group acts as a virtual firewall for your EC2 instances to control incoming and outgoing traffic.
Inbound rules control the incoming traffic to your instance, and outbound rules control the outgoing traffic
from your instance. When you launch an instance, you can specify one or more security groups.
• It supports only allow rules; anything not explicitly allowed is denied by default. You cannot create an
explicit deny rule to block a connection.
• It is stateful, which means return traffic for an allowed connection is automatically permitted. For
example, if you allow incoming port 80, the response traffic is allowed back out without needing an
explicit outbound rule.
• It is associated with an EC2 instance.
• All the rules are evaluated before deciding whether to allow the traffic.
• Security Group is applied to an instance only when you specify a security group while launching an
instance.
• It is the first layer of defense.
• Security Groups (SGs) are another security feature of AWS VPC ... only unlike NACLs they are
attached to AWS resources, not VPC subnets.
• SGs offer a few advantages vs NACLs in that they can recognize AWS resources and filter based on
them, they can reference other SGs and also themselves.
• But, SGs are not capable of explicitly blocking traffic - so often require assistance from NACLs
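A minimal boto3 sketch of a security group that allows inbound HTTPS; the VPC ID is a placeholder.

import boto3

ec2 = boto3.client("ec2")

# The VPC ID is assumed to come from an earlier step.
sg_id = ec2.create_security_group(
    GroupName="web-sg", Description="Allow inbound HTTPS", VpcId="vpc-0123456789abcdef0"
)["GroupId"]

# Only allow rules exist; responses to this inbound traffic are allowed automatically (stateful).
ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }],
)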
Network Access control Lists (NACL)
A network access control list (NACL) is an optional layer of security for your VPC that acts as a firewall for
controlling traffic in and out of one or more subnets. You might set up network ACLs with rules similar to
your security groups in order to add an additional layer of security to your VPC.
• Your VPC automatically comes with a default network ACL, and by default it allows all outbound
and inbound traffic.
• You can create custom network ACLs. By default, each custom network ACL denies all inbound and
outbound traffic until you add rules.
• Each subnet in your VPC must be associated with a network ACL. If you don’t explicitly associate a
subnet with a network ACL, the subnet is automatically associated with the default network ACL.
• Block IP Addresses using network ACLs not Security Groups.
• You can associate a network ACL with multiple subnets; however, a subnet can be associated with
only one network ACL at a time. When you associate a network ACL with a subnet, the previous
association is removed.
• network ACLs contain a numbered list of rules that is evaluated in order, starting with the lowest
numbered rule.
• network ACLs have separate inbound and outbound rules, and each rule can either allow or deny
traffic.
• network ACLs are stateless; responses to allowed inbound traffic are subject to the rules for
outbound traffic (and vice versa).
• Network Access Control Lists (NACLs) are a type of security filter (like firewalls) which can filter
traffic as it enters or leaves a subnet.
• NACLs are attached to subnets and only filter data as it crosses the subnet boundary.
• NACLs are stateless, so the initiation and response phases of a connection are seen as one inbound and
one outbound stream, requiring two rules (one IN, one OUT)
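A minimal boto3 sketch of a custom NACL; because NACLs are stateless, the inbound request port and the outbound ephemeral response ports each need their own rule. The VPC ID, rule numbers and port choices are placeholders.

import boto3

ec2 = boto3.client("ec2")

nacl_id = ec2.create_network_acl(VpcId="vpc-0123456789abcdef0")["NetworkAcl"]["NetworkAclId"]

# Inbound rule for the HTTPS request...
ec2.create_network_acl_entry(NetworkAclId=nacl_id, RuleNumber=100, Protocol="6",
                             RuleAction="allow", Egress=False,
                             CidrBlock="0.0.0.0/0", PortRange={"From": 443, "To": 443})

# ...and a separate outbound rule for the response on ephemeral ports.
ec2.create_network_acl_entry(NetworkAclId=nacl_id, RuleNumber=100, Protocol="6",
                             RuleAction="allow", Egress=True,
                             CidrBlock="0.0.0.0/0", PortRange={"From": 1024, "To": 65535})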
Internet Gateways (IG)
• An internet gateway is a horizontally scaled, redundant, and highly available VPC component that
allows communication between your VPC and the internet.
• An internet gateway serves two purposes: to provide a target in your VPC route tables for internet-
routable traffic, and to perform network address translation (NAT) for instances that have been
assigned public IPv4 addresses.
• An internet gateway supports IPv4 and IPv6 traffic. It does not cause availability risks or bandwidth
constraints on your network traffic. There's no additional charge for having an internet gateway in
your account.
• If a subnet is associated with a route table that has a route to an internet gateway, it's known as a
public subnet.
• If a subnet is associated with a route table that does not have a route to an internet gateway, it's
known as a private subnet.
• When you add a new subnet to your VPC, you must set up the routing and security that you want for
the subnet.
• An Internet Gateway is region resilient and is attached to a VPC.
• There's a one-to-one relationship between internet gateways and the VPC. A VPC can have no
internet gateway, which makes it entirely private, or it can have one internet gateway; those are the
two choices. An IGW can be created and not attached to a VPC, so it can have zero attachments, but it
can only ever be attached to one VPC at a time, at which point it's valid in all AZs that the VPC uses.
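A minimal boto3 sketch of attaching an internet gateway and making a subnet public via a default route; the VPC and subnet IDs are placeholders.

import boto3

ec2 = boto3.client("ec2")
vpc_id, subnet_id = "vpc-0123456789abcdef0", "subnet-0123456789abcdef0"   # placeholders

igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)   # one IGW per VPC

# A subnet whose route table has a 0.0.0.0/0 route to the IGW is a public subnet.
rt_id = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
ec2.create_route(RouteTableId=rt_id, DestinationCidrBlock="0.0.0.0/0", GatewayId=igw_id)
ec2.associate_route_table(RouteTableId=rt_id, SubnetId=subnet_id)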
Virtualization 101
EC2 provides virtualization as a service. It’s an infrastructure as a service, or IaaS, product.
Virtualization is the process of running more than one operating system on a piece of physical hardware, a
server.
EC2 Architecture and Resilience
EC2 Architecture
• EC2 instances are virtual machines (OS + resources)
• EC2 instances run on EC2 Hosts
• EC2 provides Shared Hosts or Dedicated Hosts
• Hosts = 1 AZ - if the AZ fails, the host fails and the instances fail
What's EC2 Good for?
• Traditional OS + Application Compute
• Long Running Compute
• Server style application
• burst or steady-state load
• Monolithic application stacks
• Migrated application workloads or disaster recovery
EC2 Instance Types
• At a high level, when you choose an EC2 instance type, you are doing so to influence a few different
things.
• First, logically, the raw amount of resources that you get. like CPU, Memory, Local Storage
Capacity & Type.
• Resources Ratio.
• Storage and Data Network Bandwidth
• System Architecture / Vendor
• Additional Features and Capabilities.
EC2 Categories
• General Purpose - Default - Diverse workload, equal resource ratio.
• Compute Optimized - Media Processing, HPC, Scientific Modelling, gaming, Machine Learning.
• Memory Optimized - Processing large in-memory datasets, some database workloads.
• Accelerated Computing - Hardware GPU, field programmable gate arrays (FPGAs).
• Storage Optimized - Sequential and Random IO - scale-out transactional databases, data
warehousing, Elastic search, analytics workloads.
Decoding EC2 Types
• R5dn.8xlarge is known as the instance type.
• The letter at the start is the instance family. There are lots of examples of this: the T family, the M
family, the I family and the R family. There are lots more, but each is designed for a specific
type or types of computing.
• The next part is the generation. So, the number 5, in this case, is the generation.
• Generally, with AWS, always select the most recent generation; it usually provides the best price-to-
performance option.
• The letters after the generation (the "dn" here) indicate additional capabilities, for example d for
NVMe instance storage and n for network optimisation.
• 8xlarge, or eight extra large, is the instance size.
Instance Store
• Instance store backed instance is an EC2 instance using an Instance store as root device volume
created from a template stored in S3.
• An instance store is ephemeral storage that provides temporary block level storage for your instance.
Instance store is ideal for temporary storage like buffers, caches, and other temporary content.
• Instance store volumes access storage from disks that are physically attached to the host computer.
• When an Instance stored instance is launched, the image that is used to boot the instance is copied to
the root volume (typically sda1).
• Instance store provides temporary block-level storage for instances.
• Data on an instance store volume persists only during the life of the associated instance; if an
instance is stopped or terminated, any data on instance store volumes is lost.
Key points for Instance store backed Instance
• Boot time is slower than for EBS-backed volumes, usually less than 5 min
• Can be selected as Root Volume and attached as additional volumes
• Instance store backed Instances can be of maximum 10GiB volume size
• Instance store volume can be attached as additional volumes only when the instance is being
launched and cannot be attached once the Instance is up and running
• The data in an instance store persists only during the lifetime of its associated instance. If an instance
reboots (intentionally or unintentionally), data in the instance store persists
• Instance store backed Instances cannot be stopped, as when stopped and started AWS does not
guarantee the instance would be launched in the same host and hence the data is lost
• AMI creation requires use of the AMI tools and needs to be executed from within the running
instance
• Instance store backed Instances cannot be upgraded
• For EC2 instance store-backed instances AWS recommends to:
1. Distribute the data on the instance stores across multiple AZs
2. Back up critical data from the instance store volumes to persistent storage on a regular basis
• Data on Instance store volume is LOST in following scenarios:
1. Underlying disk drive fails
2. Instance stops
3. Instance terminates
4. Instance hibernates
Therefore, do not rely on instance store for valuable, long-term data.
Amazon Elastic Block Store (EBS)
• An “EBS-backed” instance means that the root device for an instance launched from the AMI is an
EBS volume created from an EBS snapshot
• An EBS volume behaves like a raw, unformatted, external block device that can be attached to a
single instance and are not physically attached to the Instance host computer (more like a network
attached storage).
• Volume persists independently from the running life of an instance. After an EBS volume is attached
to an instance, you can use it like any other physical hard drive.
• EBS volume can be detached from one instance and attached to another instance.
• EBS volumes can be created as encrypted volumes using the EBS encryption feature.
• EBS is a block store which is attached separately to EC2. It is also designed in such a way that it is
replicated within its Availability Zone, so it provides high availability and durability.
• An additional advantage is that you can have backups for EBS by creating snapshots, which is
not possible with the instance store. Whenever you want to retrieve the data, you can just create an
EBS volume from the snapshot.
Key points for EBS backed Instance
• Boot time is very fast usually less than a min.
• Can be selected as Root Volume and attached as additional volumes.
• EBS backed Instances can be of maximum 16TiB volume size depending upon the OS.
• EBS volume can be attached as additional volumes when the Instance is launched and even when the
Instance is up and running.
• When EBS-backed instance is in a stopped state, various instance– and volume-related tasks can be
done for e.g., you can modify the properties of the instance, you can change the size of your instance
or update the kernel it is using, or you can attach your root volume to a different running instance for
debugging or any other purpose.
• EBS volumes are AZ scoped and tied to a single AZ in which created.
• EBS volumes are automatically replicated within that zone to prevent data loss due to failure of any
single hardware component.
• AMI creation is easy using a Single command.
• EBS backed Instances can be upgraded for instance type, Kernel, RAM disk and user data.
• Data on the EBS volume is LOST:
1. For EBS Root volume, if delete on termination flag is enabled (enabled, by default).
2. For attached EBS volumes, if the Delete on termination flag is enabled (disabled, by
default).
• Data on EBS volume is NOT LOST in following scenarios:
1. Reboot on the Instance.
2. Stopping an EBS-backed instance.
3. Termination of the Instance for the additional EBS volumes. Additional EBS volumes
are detached with their data intact.
Snapshots, Restore & Fast Snapshot Restore (FSR)
Amazon EBS Snapshots provide a simple and secure data protection solution that is designed to protect your
block storage data such as EBS volumes, boot volumes, as well as on-premises block data. EBS Snapshots
are a point in time copy of your data, and can be used to enable disaster recovery, migrate data across
regions and accounts, and improve backup compliance.
EBS Snapshots
• EBS Snapshots are backups of data consumed within EBS Volumes - Stored on S3.
• Snapshots are incremental, the first being a full back up - and any future snapshots being
incremental.
• Snapshots can be used to migrate data to different availability zones in a region, or to different
regions of AWS.
• Snapshots exist on S3. Think of snapshots as a photograph of the disk.
• Snapshots are point in time copies of Volumes.
• Snapshots are incremental - this means that only the blocks that have changed since your last
Snapshot are moved to S3.
• If this is your first snapshot, it may take some time to create.
• To create a snapshot for Amazon EBS volumes that serve as root devices, you should stop the
instance before taking the snapshot.
• However, you can take a snapshot while the instance is running.
• You can create AMI's from both Volumes and Snapshots.
• You can change EBS volume sizes on the fly, including changing the EC2 size and storage type.
• Volumes will ALWAYS be in the same availability zones as the EC2 instance.
Migrating EBS
• To move an EC2 volume from one AZ to another, take a snapshot of it, create an AMI from the
snapshot and then use the AMI to launch the EC2 instance in a new AZ.
• To move an EC2 volume from one region to another, take a snapshot of it, create an AMI from the
snapshot and then copy the AMI from one region to the other. Then use the copied AMI to launch
the new EC2 instance in the new region.
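A minimal boto3 sketch of the region-to-region snapshot path for moving a data volume (rather than a full AMI); the volume ID, Regions and AZ are placeholders, and each snapshot or copy should be allowed to complete before the next step.

import boto3

# Snapshot in the source Region...
ec2_src = boto3.client("ec2", region_name="us-east-1")
snap_id = ec2_src.create_snapshot(VolumeId="vol-0123456789abcdef0",
                                  Description="migration snapshot")["SnapshotId"]

# ...copy it to the destination Region (the copy is requested from the destination side)...
ec2_dst = boto3.client("ec2", region_name="eu-west-1")
copy_id = ec2_dst.copy_snapshot(SourceRegion="us-east-1",
                                SourceSnapshotId=snap_id)["SnapshotId"]

# ...then restore it as a new volume in whichever destination AZ you need.
ec2_dst.create_volume(SnapshotId=copy_id, AvailabilityZone="eu-west-1a")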
Amazon EBS fast snapshot restore
• Amazon EBS fast snapshot restore enables you to create a volume from a snapshot that is fully
initialized at creation. This eliminates the latency of I/O operations on a block when it is accessed for
the first time. Volumes that are created using fast snapshot restore instantly deliver all of their
provisioned performance.
• To get started, enable fast snapshot restore for specific snapshots in specific Availability Zones. Each
snapshot and Availability Zone pair refers to one fast snapshot restore. When you create a volume
from one of these snapshots in one of its enabled Availability Zones, the volume is restored using
fast snapshot restore.
• You can enable fast snapshot restore for snapshots that you own and for public and private snapshots
that are shared with you.
Fast snapshot restore quotas
• You can enable up to 50 snapshots for fast snapshot restore per Region. The quota applies to
snapshots that you own and snapshots that are shared with you. If you enable fast snapshot restore
for a snapshot that is shared with you, it counts towards your fast snapshot restore quota. It does not
count towards the snapshot owner's fast snapshot restore quota.
Fast snapshot restore states
After you enable fast snapshot restore for a snapshot, it can be in one of the following states.
enabling — A request was made to enable fast snapshot restore.
optimizing — Fast snapshot restore is being enabled. It takes 60 minutes per TiB to optimize a snapshot.
Snapshots in this state offer some performance benefit when restoring volumes.
enabled — Fast snapshot restore is enabled. Snapshots in this state offer the full performance benefit when
restoring volumes.
disabling — A request was made to disable fast snapshot restore, or a request to enable fast snapshot restore
failed.
disabled — Fast snapshot restore is disabled. You can enable fast snapshot restore again as needed.
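A minimal boto3 sketch of enabling fast snapshot restore for one snapshot in one Availability Zone; the snapshot ID, Region and AZ are placeholders.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Enable FSR for one snapshot/AZ pair (each pair counts towards the 50-per-Region quota).
ec2.enable_fast_snapshot_restores(
    AvailabilityZones=["us-east-1a"],
    SourceSnapshotIds=["snap-0123456789abcdef0"],
)

# Once the state is "enabled", volumes created from that snapshot in us-east-1a
# are fully initialized at creation.
ec2.create_volume(SnapshotId="snap-0123456789abcdef0", AvailabilityZone="us-east-1a")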
EBS Encryption
• EBS encryption uses KMS keys (the AWS managed aws/ebs key by default, or a customer managed
CMK that you choose) to encrypt volumes and the snapshots taken from them.
• Each volume is encrypted using its own unique data encryption key; data is encrypted at rest on the
volume and in transit between the volume and the EC2 host, while the OS sees plaintext as normal.
• Snapshots of encrypted volumes are encrypted, and volumes created from those snapshots are also
encrypted. You cannot remove encryption from an encrypted volume.
Status Checks & Recovery Alarms
To be notified when an instance status check fails, you can create a CloudWatch alarm for the instance (for
example, via the plus sign in the Alarm status column of the EC2 console):
• On the Manage CloudWatch alarms page, choose Create an alarm.
• To receive an email when the alarm is triggered, for Alarm notification, choose an existing Amazon
SNS topic. You first need to create an Amazon SNS topic using the Amazon SNS console.
• Note: Users must subscribe to the specified SNS topic to receive email notifications when the alarm
is triggered. The AWS account root user always receives email notifications when automatic instance
recovery actions occur, even if an SNS topic is not specified or the root user is not subscribed to the
specified SNS topic.
Introduction to Containers
Containers are an operating system virtualization technology used to package applications and their
dependencies and run them in isolated environments. They provide a lightweight method of packaging and
deploying applications in a standardized way across many different types of infrastructure.
• Containers are created from container images: bundles that represent the system, applications, and
environment of the container.
• Container images act like templates for creating specific containers, and the same image can be used
to spawn any number of running containers.
What is Docker?
Docker is an open platform for developing, shipping, and running applications. Docker enables you to
separate your applications from your infrastructure so you can deliver software quickly. With Docker, you
can manage your infrastructure in the same ways you manage your applications. By taking advantage of
Docker’s methodologies for shipping, testing, and deploying code quickly, you can significantly reduce the
delay between writing code and running it in production.
Container Key Concepts
• Dockerfiles are used to build images
• Images are portable - self-contained, and always run as expected
• Containers are lightweight - the parent OS kernel is used, and filesystem layers are shared
• A container only runs the application & environment it needs.
• Containers provide much of the isolation VMs do
• Ports are 'exposed' to the host and beyond.
• it's important to understand that some more complex application stacks can consist of multiple
containers. you can use multiple containers in a single architecture, either to scale a specific part of
the application, or when you're using multiple tiers.
ECS - Concepts
Amazon Elastic Container Service (Amazon ECS) is a fully managed container orchestration service that
helps you easily deploy, manage, and scale containerized applications. It deeply integrates with the rest of
the AWS platform to provide a secure and easy-to-use solution for running container workloads in the cloud
and now on your infrastructure with Amazon ECS Anywhere.
Amazon ECS leverages serverless technology from AWS Fargate to deliver autonomous container
operations, which reduces the time spent on configuration, patching, and security. Instead of worrying about
managing the control plane, add-ons, and nodes, Amazon ECS enables you to rapidly build applications and
grow your business.
The following sections dive into these individual elements of the Amazon ECS architecture in more detail.
Containers and images
To deploy applications on Amazon ECS, your application components must be architected to run in
containers. A container is a standardized unit of software development that contains everything that your
software application needs to run, including relevant code, runtime, system tools, and system libraries.
Containers are created from a read-only template called an image.
Images are typically built from a Dockerfile, which is a plaintext file that specifies all of the components
that are included in the container. After being built, these images are stored in a registry where they then can
be downloaded and run on your cluster.
Task definitions
To prepare your application to run on Amazon ECS, you must create a task definition. The task definition is
a text file (in JSON format) that describes one or more containers (up to a maximum of ten) that form your
application. The task definition can be thought of as a blueprint for your application. It specifies various
parameters for your application. For example, these parameters can be used to indicate which containers
should be used, which ports should be opened for your application, and what data volumes should be used
with the containers in the task. The specific parameters available for your task definition depend on the
needs of your specific application.
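A minimal boto3 sketch of registering a single-container Fargate task definition; the family name, image, CPU/memory values and execution role ARN are placeholders.

import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="web-app",
    networkMode="awsvpc",
    requiresCompatibilities=["FARGATE"],
    cpu="256",
    memory="512",
    executionRoleArn="arn:aws:iam::111122223333:role/ecsTaskExecutionRole",   # placeholder
    containerDefinitions=[{
        "name": "web",
        "image": "nginx:latest",
        "essential": True,
        "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
    }],
)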
Tasks and scheduling
A task is the instantiation of a task definition within a cluster. After you have created a task definition for
your application within Amazon ECS, you can specify the number of tasks to run on your cluster.
The Amazon ECS task scheduler is responsible for placing tasks within your cluster. There are several
different scheduling options available. For example, you can define a service that runs and maintains a
specified number of tasks simultaneously.
Clusters
An Amazon ECS cluster is a logical grouping of tasks or services. You can register one or more Amazon
EC2 instances (also referred to as container instances) with your cluster to run tasks on them. Or, you can
use the serverless infrastructure that Fargate provides to run tasks. When your tasks are run on Fargate, your
cluster resources are also managed by Fargate.
When you first use Amazon ECS, a default cluster is created for you. You can create additional clusters in
an account to keep your resources separate.
Container agent
The container agent runs on each container instance within an Amazon ECS cluster. The agent sends
information about the resource's current running tasks and resource utilization to Amazon ECS. It starts and
stops tasks whenever it receives a request from Amazon ECS.
Amazon ECS can be used along with the following AWS services:
AWS Identity and Access Management: IAM (Identity and Access Management) is an access
management service that helps you securely control access to AWS resources. You can use IAM to control
who is authenticated (signed in) and authorized (has permissions) to view or perform specific actions on
resources.
Amazon EC2 Auto Scaling: Auto Scaling is a service that enables you to automatically scale out or in your
tasks based on user-defined policies, health status checks, and schedules. You can use Auto Scaling with a
Fargate task within a service to scale in response to a number of metrics or with an EC2 task to scale the
container instances within your cluster.
Elastic Load Balancing: The Elastic Load Balancing service automatically distributes incoming application
traffic across the tasks in your Amazon ECS service. It enables you to achieve greater levels of fault
tolerance in your applications, seamlessly providing the required amount of load-balancing capacity needed
to distribute application traffic. You can use Elastic Load Balancing to create an endpoint that balances
traffic across services in a cluster.
Amazon Elastic Container Registry: Amazon ECR is a managed AWS Docker registry service that is
secure, scalable, and reliable. Amazon ECR supports private Docker repositories with resource-based
permissions using IAM so that specific users or tasks can access repositories and images. Developers can
use the Docker CLI to push, pull, and manage images
AWS CloudFormation: AWS CloudFormation gives developers and systems administrators an easy way to
create and manage a collection of related AWS resources. More specifically, it makes resource provisioning
and updating more orderly and predictable. You can define clusters, task definitions, and services as entities
in an AWS CloudFormation script.
ECS - Cluster Mode
• An Amazon ECS cluster is a logical grouping of tasks or services. Your tasks and services are run on
infrastructure that is registered to a cluster.
• The infrastructure capacity can be provided by AWS Fargate, which is serverless infrastructure that
AWS manages, Amazon EC2 instances that you manage, or an on-premises server or virtual machine
(VM) that you manage remotely.
• In most cases, Amazon ECS capacity providers can be used to manage the infrastructure the tasks in
your clusters use.
• When you first use Amazon ECS, a default cluster is created for you, but you can create multiple
clusters in an account to keep your resources separate.
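A minimal boto3 sketch of creating a cluster and running the task definition registered earlier on Fargate capacity; the cluster name, subnet and security group IDs are placeholders.

import boto3

ecs = boto3.client("ecs")

ecs.create_cluster(clusterName="demo-cluster")

# Run one copy of the task on Fargate capacity.
ecs.run_task(
    cluster="demo-cluster",
    launchType="FARGATE",
    taskDefinition="web-app",     # family (or family:revision) registered earlier
    count=1,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],
            "securityGroups": ["sg-0123456789abcdef0"],
            "assignPublicIp": "ENABLED",
        }
    },
)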
Cluster concepts
The following are general concepts about Amazon ECS clusters.
➢ Clusters are Region-specific.
➢ The following are the possible states that a cluster can be in.
➢ ACTIVE: The cluster is ready to accept tasks and, if applicable, you can register container instances
with the cluster.
➢ PROVISIONING: The cluster has capacity providers associated with it and the resources needed
for the capacity provider are being created.
➢ DEPROVISIONING: The cluster has capacity providers associated with it and the resources
needed for the capacity provider are being deleted.
➢ FAILED: The cluster has capacity providers associated with it and the resources needed for the
capacity provider have failed to create.
➢ INACTIVE: The cluster has been deleted. Clusters with an INACTIVE status may remain
discoverable in your account for a period of time. However, this behaviour is subject to change in the
future, so you should not rely on INACTIVE clusters persisting.
➢ A cluster may contain a mix of tasks hosted on AWS Fargate, Amazon EC2 instances, or external
instances.
➢ A cluster may contain a mix of both Auto Scaling group capacity providers and Fargate capacity
providers, however when specifying a capacity provider strategy, they may only contain one or the
other but not both.
➢ For tasks using the EC2 launch type, clusters can contain multiple different container instance types,
but each container instance may only be registered to one cluster at a time.
➢ Custom IAM policies may be created to allow or restrict user access to specific clusters.
Simple Routing
Simple routing lets you configure standard DNS records, with no special Route 53 routing such as weighted
or latency. With simple routing, you typically route traffic to a single resource, for example, to a web server
for your website.
• Simple Routing supports 1 record per name (www)
• Each Record can have multiple values
• All values are returned in a random order
• Simple Routing doesn’t support health checks - all values are returned for a record when queried
• Use Simple Routing when you want to route requests toward one service, such as a web server.
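A minimal boto3 sketch of a simple-routing record with multiple values; the hosted zone ID, record name and IP addresses are placeholders.

import boto3

r53 = boto3.client("route53")

# One record name with several values; all values are returned when queried.
r53.change_resource_record_sets(
    HostedZoneId="Z0123456789EXAMPLE",
    ChangeBatch={"Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "www.example.com",
            "Type": "A",
            "TTL": 300,
            "ResourceRecords": [{"Value": "203.0.113.10"}, {"Value": "203.0.113.11"}],
        },
    }]},
)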
R53 Health Checks
Amazon Route 53 health checks monitor the health and performance of your web applications, web servers,
and other resources. Each health check that you create can monitor one of the following:
• The health of a specified resource, such as a web server
• The status of other health checks
• The status of an Amazon CloudWatch alarm
• Health checkers located globally
• Health checkers check every 30s (every 10s cost extra)
• You can view the current and recent status of your health checks on the Route 53 console. You can
also work with health checks programmatically through one of the AWS SDKs, the AWS Command
Line Interface, AWS Tools for Windows PowerShell, or the Route 53 API.
• If you want to receive a notification when the status of a health check changes, you can configure an
Amazon CloudWatch alarm for each health check.
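A minimal boto3 sketch of creating an HTTPS health check; the domain name, path, interval and threshold are placeholders.

import boto3, uuid

r53 = boto3.client("route53")

# Checks https://www.example.com/health every 30 seconds (every 10s costs extra).
check = r53.create_health_check(
    CallerReference=str(uuid.uuid4()),
    HealthCheckConfig={
        "Type": "HTTPS",
        "FullyQualifiedDomainName": "www.example.com",
        "Port": 443,
        "ResourcePath": "/health",
        "RequestInterval": 30,
        "FailureThreshold": 3,
    },
)
health_check_id = check["HealthCheck"]["Id"]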
Failover Routing
Failover routing lets you route traffic to a resource when the resource is healthy or to a different resource
when the first resource is unhealthy.
• If the target of the health check is Healthy the Primary record is used.
• If the target of the health check is Unhealthy then any queries return the secondary record of the
same name.
• A common architecture is to use failover for an "out of band" failure/maintenance page for a service (e.g., EC2/S3)
• Use Failover Routing when you want to configure active-passive failover (see the sketch below).
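An active-passive pair can be sketched as two records of the same name: a PRIMARY tied to a health check and a SECONDARY that is only returned when the primary is unhealthy. All IDs and addresses below are placeholder assumptions; a real maintenance page on S3/CloudFront would normally use an alias record rather than the plain A record shown here.

import boto3

r53 = boto3.client("route53")

r53.change_resource_record_sets(
    HostedZoneId="Z0EXAMPLE",  # placeholder hosted zone ID
    ChangeBatch={"Changes": [
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "app.example.com", "Type": "A", "TTL": 60,
                "SetIdentifier": "primary",
                "Failover": "PRIMARY",
                "HealthCheckId": "11111111-2222-3333-4444-555555555555",  # placeholder health check ID
                "ResourceRecords": [{"Value": "203.0.113.10"}],
            },
        },
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "app.example.com", "Type": "A", "TTL": 60,
                "SetIdentifier": "secondary",
                "Failover": "SECONDARY",  # returned only when the primary is unhealthy
                "ResourceRecords": [{"Value": "198.51.100.20"}],  # placeholder maintenance host
            },
        },
    ]},
)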
Multi Value Routing
Multi value answer routing lets you configure Amazon Route 53 to return multiple values, such as IP
addresses for your web servers, in response to DNS queries. You can specify multiple values for almost any
record, but multi value answer routing also lets you check the health of each resource, so Route 53 returns
only values for healthy resources.
• Multi Value Routing supports multiple records with the same name.
• Each record is independent and can have an associated health check.
• Any records which fail health checks won’t be returned when queried.
• Up to 8 'healthy' records are returned. If more exist, 8 are randomly selected.
• Multi Value Routing improves availability; it is NOT a replacement for load balancing.
Latency Routing
If your application is hosted in multiple AWS Regions, you can improve performance for your users by
serving their requests from the AWS Region that provides the lowest latency.
• Use Latency-based routing when optimising for performance & user experience.
• AWS maintains a database of latency between the user's general location and the AWS Regions tagged in records.
• The record returned is the one which offers the lowest estimated latency & is healthy
• Latency-Based routing supports one record with the same name in each AWS Region
Geolocation Routing
Geolocation routing lets you choose the resources that serve your traffic based on the geographic location of
your users, meaning the location that DNS queries originate from.
• R53 checks for records in this order:
1) the state,
2) the country,
3) the continent, and
4) (optionally) a default record.
It returns the most specific matching record, or "NO ANSWER" if nothing matches.
• Can be used for regional restrictions, language specific content or load balancing across regional
endpoints
• With Geolocation, records are tagged with a location: either a US state, a country, a continent, or default.
• An IP check verifies the location of the user (normally the resolver)
Geoproximity Routing
Geoproximity routing lets Amazon Route 53 route traffic to your resources based on the geographic location
of your users and your resources. You can also optionally choose to route more traffic or less to a given
resource by specifying a value, known as a bias. A bias expands or shrinks the size of the geographic region
from which traffic is routed to a resource.
• "+" or "-" bias can be added to rules. "+" increases a region size and decreases neighbouring regions
• Records can be tagged with an AWS Region or latitude & longitude coordinates
• Routing is distance based (including bias)
R53 Interoperability
• R53 normally has 2 jobs - Domain registrar and Domain Hosting
• R53 can do BOTH, or either Domain Registrar or Domain Hosting
• R53 Accepts your money (Domain Registration fee)
• R53 allocates 4 Name Servers (NS) (Domain Hosting)
• R53 Creates a zone file (Domain Hosting) on the above NS
• R53 communicates with the registry of the TLD (Domain Registrar)
• and sets the NS records for the domain to point at the 4 name servers above
MODULE 11 - RELATIONAL DATABASE SERVICE (RDS)
Database Refresher
Relational databases
A relational database, also called Relational Database Management System (RDBMS) or SQL database,
stores data in tables and rows also referred to as records. A relational database works by linking information
from multiple tables through the use of “keys.” A key is a unique identifier which can be assigned to a row
of data contained within a table. This unique identifier, called a “primary key,” can then be included in a
record located in another table when that record has a relationship to the primary record in the main table.
When this unique primary key is added to a record in another table, it is called a “foreign key” in the
associated table. The connection between the primary and foreign key then creates the “relationship”
between records contained across multiple tables.
What you need to know about relational databases:
• They work with structured data.
• Relationships in the system have constraints, which promotes a high level of data integrity.
• They offer extensive indexing capabilities, which results in faster query response times.
• They are excellent at keeping data transactions secure.
• They provide the ability to write complex SQL queries for data analysis and reporting.
• Their models can ensure and enforce business rules at the data layer adding a level of data integrity
not found in a non-relational database.
• They are table and row oriented.
• They use SQL (Structured Query Language) for shaping and manipulating data, which is very powerful.
• SQL database examples: MySQL, Oracle, SQLite, PostgreSQL and Microsoft SQL Server. NoSQL database examples: MongoDB, Bigtable, Redis, RavenDB, Cassandra, HBase, Neo4j and CouchDB.
• SQL databases are best fit for heavy duty transactional type applications.
Non-relational databases
A non-relational database, or NoSQL database, also stores data; however, unlike a relational database, there are no tables, rows, primary keys or foreign keys. Instead, the non-relational database uses a storage model optimized for the specific requirements of the type of data being stored.
Some of the more popular NoSQL databases are MongoDB, Apache Cassandra, Redis, Couchbase and
Apache HBase. There are four popular non-relational types: document data store, column-oriented database,
key-value store and graph database. Often combinations of these types are used for a single application.
• They have the ability to store large amounts of data with little structure.
• They provide scalability and flexibility to meet changing business requirements.
• They provide schema-free or schema-on-read options.
• They have the ability to capture all types of data “Big Data” including unstructured data.
• They are document oriented.
• NoSQL or non-relational databases examples: MongoDB, Apache Cassandra, Redis, Couchbase and
Apache HBase.
• They are best for Rapid Application Development. NoSQL is the best selection for flexible data
storage with little to no structure limitations.
• They provide flexible data model with the ability to easily store and combine data of any structure
without the need to modify a schema.
ACID vs BASE
ACID and BASE are DB transactional models.
The CAP theorem states that a partition-tolerant distributed system (i.e., a system which continues to work in cases of temporary communication breakdowns) cannot guarantee both consistency and availability at the same time.
The fundamental difference between ACID and BASE database models is the way they deal with this
limitation.
• The ACID model provides a consistent system.
• The BASE model provides high availability.
ACID stands for:
Atomic – Each transaction is either properly carried out or the process halts and the database reverts back to
the state before the transaction started. This ensures that all data in the database is valid.
Consistent – A processed transaction will never endanger the structural integrity of the database.
Isolated – Transactions cannot compromise the integrity of other transactions by interacting with them
while they are still in progress.
Durable – The data related to the completed transaction will persist even in the cases of network or power
outages. If a transaction fails, it will not impact the manipulated data.
ACID Use Case Example
Financial institutions will almost exclusively use ACID databases. Money transfers depend on the atomic
nature of ACID. An interrupted transaction which is not immediately removed from the database can cause a
lot of issues. Money could be debited from one account and, due to an error, never credited to another.
BASE stands for:
Basically Available – Rather than enforcing immediate consistency, BASE-modelled NoSQL databases will
ensure availability of data by spreading and replicating it across the nodes of the database cluster.
Soft State – Due to the lack of immediate consistency, data values may change over time. The BASE model moves away from the idea of a database that enforces its own consistency, delegating that responsibility to developers.
Eventually Consistent – The fact that BASE does not enforce immediate consistency does not mean that it
never achieves it. However, until it does, data reads are still possible (even though they might not reflect the
reality).
BASE Use Case Example
Marketing and customer service companies who deal with sentiment analysis will prefer the elasticity of
BASE when conducting their social network research. Social network feeds are not well structured but
contain huge amounts of data which a BASE-modelled database can easily store.
Databases on EC2
• The Relational Database Service (RDS) is a database-server-as-a-service product from AWS which allows the creation of managed database instances.
• RDS is more accurately described as a database-server-as-a-service product: it provides managed database instances which can themselves hold one or more databases.
• With RDS, what you're paying for and consuming is a database server that you have access to.
• The benefits of RDS are that you don’t need to manage the physical hardware, or the server OS, or
the database system itself. AWS handle all of that behind the scenes.
• RDS supports the most popular database types that you’ve probably encountered.
MySQL, MariaDB, PostgreSQL, Oracle, and even MS SQL Server. You can pick whatever works
for your given application requirements. Each of them comes with features and limitations which
you should be aware of for real-world usage, but that you don’t need a detailed understanding of for
the exam.
• Amazon Aurora is a database engine which AWS have created, and you can select when creating
RDS instances.
RDS High-Availability (Multi AZ)
Amazon RDS provides high availability and failover support for DB instances using multi-AZ deployments.
Multi-AZ is a feature of RDS which provisions a standby replica that is kept in sync synchronously with the primary instance.
• In a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous
standby replica in a different Availability Zone.
• The primary DB instance is synchronously replicated across Availability Zones to a standby replica
to provide data redundancy, eliminate I/O freezes, and minimize latency spikes during system
backups.
• Running a DB instance with high availability can enhance availability during planned system
maintenance, and help protect your databases against DB instance failure and Availability Zone
disruption.
• The standby replica cannot be used for any performance scaling ... only availability.
• Backups, software updates and restarts can take advantage of MultiAZ to reduce user disruption.
RDS Automatic Backup, RDS Snapshots and Restore
➢ By default, Amazon RDS creates and saves automated backups of your DB instance securely in
Amazon S3 for a user-specified retention period.
➢ In addition, you can create snapshots, which are user-initiated backups of your instance that are kept
until you explicitly delete them.
➢ You can create a new instance from a database snapshot whenever you desire. Although database snapshots serve operationally as full backups, you are billed only for incremental storage use.
➢ RDS is capable of performing Manual Snapshots and Automatic backups
➢ Manual snapshots are performed manually and live past the termination of an RDS instance
➢ Automatic backups can be taken of an RDS instance with a 0 (Disabled) to 35 Day retention.
➢ Automatic backups also use S3 for storing transaction logs every 5 minutes - allowing for point in
time recovery.
➢ Snapshots can be restored, but doing so creates a new RDS instance.
Automated Backups
➢ Turned on by default, the automated backup feature of Amazon RDS will backup your databases and
transaction logs. Amazon RDS automatically creates a storage volume snapshot of your DB instance,
backing up the entire DB instance and not just individual databases.
➢ This backup occurs during a daily user-configurable 30-minute period known as the backup window.
Automated backups are kept for a configurable number of days (called the backup retention period).
Your automatic backup retention period can be configured to up to thirty-five days.
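A brief sketch of configuring these settings through the API (the instance identifier and backup window below are assumptions):

import boto3

rds = boto3.client("rds")

rds.modify_db_instance(
    DBInstanceIdentifier="myapp-db",      # placeholder instance identifier
    BackupRetentionPeriod=14,             # 0 disables automated backups, maximum is 35 days
    PreferredBackupWindow="03:00-03:30",  # the daily 30-minute backup window
    ApplyImmediately=True,
)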
Point-in-time Restores
➢ You can restore your DB instance to any specific time during the backup retention period, creating a
new DB instance. To restore your database instance, you can use the AWS Console or Command
Line Interface.
➢ To determine the latest restorable time for a DB instance, use the AWS Console or Command Line
Interface to look at the value returned in the LatestRestorableTime field for the DB instance. The
latest restorable time for a DB instance is typically within 5 minutes of the current time.
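For instance, the LatestRestorableTime can be read and a point-in-time restore started with boto3 (identifiers are placeholders); note that the restore always creates a new DB instance.

import boto3

rds = boto3.client("rds")

# Inspect how recently the instance can be restored to (typically within ~5 minutes of now).
info = rds.describe_db_instances(DBInstanceIdentifier="myapp-db")
print(info["DBInstances"][0]["LatestRestorableTime"])

# Restore to the latest restorable time - this creates a brand new DB instance.
rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="myapp-db",
    TargetDBInstanceIdentifier="myapp-db-restored",
    UseLatestRestorableTime=True,
)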
Database Snapshots
➢ Database snapshots are user-initiated backups of your instance stored in Amazon S3 that are kept
until you explicitly delete them. You can create a new instance from a database snapshot whenever
you desire. Although database snapshots serve operationally as full backups, you are billed only for
incremental storage use.
Snapshot Copies
➢ With Amazon RDS, you can copy DB snapshots and DB cluster snapshots. You can copy automated
or manual snapshots. After you copy a snapshot, the copy is a manual snapshot. You can copy a
snapshot within the same AWS Region, you can copy a snapshot across AWS Regions, and you can
copy a snapshot across AWS accounts.
Snapshot Sharing
➢ Using Amazon RDS, you can share a manual DB snapshot or DB cluster snapshot with other AWS
accounts. Sharing a manual DB snapshot or DB cluster snapshot, whether encrypted or unencrypted,
enables authorized AWS accounts to copy the snapshot.
➢ Sharing an unencrypted manual DB snapshot enables authorized AWS accounts to directly restore a
DB instance from the snapshot instead of taking a copy of it and restoring from that. This isn't
supported for encrypted manual DB snapshots.
➢ Sharing a manual DB cluster snapshot, whether encrypted or unencrypted, enables authorized AWS
accounts to directly restore a DB cluster from the snapshot instead of taking a copy of it and
restoring from that.
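As a sketch, sharing a manual snapshot with another account is a single attribute change (the snapshot name and target account ID are placeholders):

import boto3

rds = boto3.client("rds")

# Allow another AWS account to copy or restore this manual snapshot.
rds.modify_db_snapshot_attribute(
    DBSnapshotIdentifier="myapp-db-snapshot-2024-01-01",  # placeholder snapshot name
    AttributeName="restore",
    ValuesToAdd=["111122223333"],                          # placeholder target account ID
)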
RDS Read-Replicas
➢ Amazon RDS Read Replicas provide enhanced performance and durability for RDS database (DB)
instances.
➢ They make it easy to elastically scale out beyond the capacity constraints of a single DB instance for
read-heavy database workloads.
➢ You can create one or more replicas of a given source DB Instance and serve high-volume
application read traffic from multiple copies of your data, thereby increasing aggregate read
throughput.
➢ Read replicas can also be promoted when needed to become standalone DB instances.
➢ Read replicas are available in Amazon RDS for MySQL, MariaDB, PostgreSQL, Oracle, and SQL
Server as well as Amazon Aurora.
(Read) Performance Improvements
➢ 5x direct read-replicas per DB instance
➢ Each providing an additional instance of read performance
➢ Global performance improvements
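A minimal sketch of creating and later promoting a read replica (identifiers and instance class are assumptions):

import boto3

rds = boto3.client("rds")

# Create a read replica of an existing source instance (asynchronous replication).
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="myapp-db-replica-1",
    SourceDBInstanceIdentifier="myapp-db",
    DBInstanceClass="db.r5.large",
)

# Later, if needed, promote the replica to a standalone, writable DB instance.
rds.promote_read_replica(DBInstanceIdentifier="myapp-db-replica-1")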
RDS Data Security
➢ SSL/TLS (in transit) is available for RDS, can be mandatory
➢ RDS supports EBS volume encryption - KMS
➢ Handled by HOST/EBS
➢ AWS or Customer Managed CMK generates data keys
➢ Data Keys used for encryption operations
➢ Storage, Logs, Snapshots & replicas are encrypted and encryption can’t be removed.
➢ RDS MSSQL and RDS Oracle support TDE (Transparent Data Encryption)
➢ Encryption handled within the DB engine
➢ RDS Oracle supports integration with CloudHSM
➢ Much stronger key controls (even from AWS)
Amazon RDS IAM Authentication
➢ You can authenticate to your DB instance using AWS Identity and Access Management (IAM)
database authentication.
➢ IAM database authentication works with MySQL and PostgreSQL. With this authentication method,
you don't need to use a password when you connect to a DB instance. Instead, you use an
authentication token.
➢ An authentication token is a unique string of characters that Amazon RDS generates on request.
Authentication tokens are generated using AWS Signature Version 4. Each token has a lifetime of 15
minutes.
➢ You don't need to store user credentials in the database, because authentication is managed externally
using IAM. You can also still use standard database authentication.
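For example, an application can request a short-lived token and use it in place of a password. The endpoint, user and the use of the third-party pymysql client below are assumptions purely for illustration.

import boto3
import pymysql  # assumed third-party MySQL client, used here only to show the connection step

rds = boto3.client("rds", region_name="us-east-1")

# Generate a 15-minute authentication token instead of using a stored password.
token = rds.generate_db_auth_token(
    DBHostname="myapp-db.abc123.us-east-1.rds.amazonaws.com",  # placeholder endpoint
    Port=3306,
    DBUsername="app_user",  # DB user configured for IAM authentication
)

# Connect using the token as the password (SSL/TLS is required for IAM authentication).
conn = pymysql.connect(
    host="myapp-db.abc123.us-east-1.rds.amazonaws.com",
    user="app_user",
    password=token,
    port=3306,
    ssl={"ca": "/path/to/rds-ca-bundle.pem"},  # placeholder CA bundle path
)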
Aurora Architecture
An Amazon Aurora DB cluster consists of one or more DB instances and a cluster volume that manages the
data for those DB instances. An Aurora cluster volume is a virtual database storage volume that spans
multiple Availability Zones, with each Availability Zone having a copy of the DB cluster data. Two types of
DB instances make up an Aurora DB cluster:
1. Primary DB instance – Supports read and write operations, and performs all of the data
modifications to the cluster volume. Each Aurora DB cluster has one primary DB instance.
2. Aurora Replica – Connects to the same storage volume as the primary DB instance and supports
only read operations. Each Aurora DB cluster can have up to 15 Aurora Replicas in addition to the
primary DB instance. Maintain high availability by locating Aurora Replicas in separate Availability
Zones. Aurora automatically fails over to an Aurora Replica in case the primary DB instance
becomes unavailable. You can specify the failover priority for Aurora Replicas. Aurora Replicas can
also offload read workloads from the primary DB instance.
Lambda@Edge
➢ Lambda@Edge allows CloudFront to run Lambda functions at CloudFront edge locations to modify traffic between the viewer and the edge location, and between edge locations and origins.
➢ Lambda@Edge is a feature of Amazon CloudFront that lets you run code closer to users of your
application, which improves performance and reduces latency.
➢ With Lambda@Edge, you don't have to provision or manage infrastructure in multiple locations
around the world. You pay only for the compute time you consume - there is no charge when your
code is not running.
➢ With Lambda@Edge, you can enrich your web applications by making them globally distributed and
improving their performance — all with zero server administration. Lambda@Edge runs your code
in response to events generated by the Amazon CloudFront content delivery network (CDN).
➢ Just upload your code to AWS Lambda, which takes care of everything required to run and scale
your code with high availability at an AWS location closest to your end user.
➢ You can run lightweight Lambda at edge locations
➢ Adjust data between the Viewer & Origin
➢ Currently supports Node.js and Python
➢ Run in the AWS Public Space (Not VPC)
➢ Layers are not supported.
➢ You can use Lambda functions to change CloudFront requests and responses at the following points:
o After CloudFront receives a request from a viewer (viewer request)
o Before CloudFront forwards the request to the origin (origin request)
o After CloudFront receives the response from the origin (origin response)
o Before CloudFront forwards the response to the viewer (viewer response)
Lambda@Edge Use cases:
➢ Redirecting Viewer Requests to a Country-Specific URL
➢ Serving Different Versions of an Object Based on the Device
➢ Content-Based Dynamic Origin Selection
➢ Using an Origin-Request Trigger to Change From a Custom Origin to an Amazon S3 Origin
➢ Using an Origin-Request Trigger to Gradually Transfer Traffic From One Amazon S3 Bucket to
Another.
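As an illustration of the origin-response trigger point, below is a minimal Python Lambda@Edge handler that adds a security header to every response before CloudFront returns it to the viewer (the specific header and policy value are assumptions):

def lambda_handler(event, context):
    # Lambda@Edge origin-response event: modify the response before it is cached and returned.
    response = event["Records"][0]["cf"]["response"]
    headers = response["headers"]

    # Add an HSTS header (example policy value - an assumption, tune for your site).
    headers["strict-transport-security"] = [{
        "key": "Strict-Transport-Security",
        "value": "max-age=63072000; includeSubDomains; preload",
    }]

    return response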
AWS Global Accelerator
➢ AWS Global Accelerator is designed to improve global network performance by offering an entry point onto the AWS global network as close to customers as possible, using anycast IP addresses.
➢ AWS Global Accelerator is a service in which you create accelerators to improve the performance of
your applications for local and global users. Depending on the type of accelerator you choose, you
can gain additional benefits.
• By using a standard accelerator, you can improve availability of your internet applications
that are used by a global audience. With a standard accelerator, Global Accelerator directs
traffic over the AWS global network to endpoints in the nearest Region to the client.
• By using a custom routing accelerator, you can map one or more users to a specific
destination among many destinations.
➢ Global Accelerator is a global service that supports endpoints in multiple AWS Regions, which are
listed in the AWS Region Table.
➢ By default, Global Accelerator provides you with two static IP addresses that you associate with your
accelerator.
➢ With a standard accelerator, instead of using the IP addresses that Global Accelerator provides, you
can configure these entry points to be IPv4 addresses from your own IP address ranges that you bring
to Global Accelerator. The static IP addresses are anycast from the AWS edge network.
➢ Global Accelerator Components
➢ AWS Global Accelerator includes the following components:
• Static IP addresses
• Accelerator
• DNS Name
• Network Zone
• Listener
• Endpoint Group
Static IP addresses: By default, Global Accelerator provides you with two Static IP addresses that you
associate with your accelerator. OR you can bring your own.
Accelerator: An accelerator directs traffic to optimal endpoints over the AWS global network to improve the availability and performance of your internet applications. Each accelerator includes one or more listeners.
DNS Name:
• Global Accelerator assigns each accelerator a default Domain Name System (DNS) name which points to the static IP addresses that Global Accelerator assigns to you.
• Depending on the use case, you can use your accelerator's static IP addresses or DNS name to route
traffic to your accelerator, or set up DNS records to route traffic using your own custom domain
name.
Network Zone:
• A Network Zone services the static IP addresses for your accelerator from a unique IP subnet.
Similar to an AWS AZ, a Network Zone is an isolated unit with its own set of physical infrastructure.
• When you configure an accelerator, Global Accelerator allocates two IPv4 addresses for it by default. If one IP address from a Network Zone becomes unavailable due to IP address blocking by certain client networks, or due to network disruptions, client applications can retry on the healthy static IP address from the other isolated Network Zone.
Listener:
• A Listener processes inbound connections from clients to Global Accelerator, based on the port (or port range) and protocol that you configure.
• Global Accelerator supports both TCP and UDP protocols. Each Listener has one or more endpoint
groups associated with it, and traffic is forwarded to endpoints in one of the groups.
• You associate endpoint groups with listeners by specifying the Regions that you want to distribute
traffic to. Traffic is distributed to optimal endpoints within the endpoint group associated with a
Listener
Endpoint group
• Each endpoint group is associated with a specific AWS Region.
• Endpoint groups include one or more endpoints in the Region.
• You can increase or reduce the percentage of traffic that would be otherwise directed to an endpoint
group by adjusting a setting called a traffic dial.
• The traffic dial lets you easily do performance testing or blue/green deployment testing for new
releases across different AWS Regions
Know what a Global Accelerator is and where you would use it.
• AWS Global Accelerator is a service in which you create accelerators to improve availability and
performance of your applications for local and global users.
• You can assign two static IP addresses (or alternatively you can bring your own).
• You can control traffic using traffic dials. This is done within the endpoint group.
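A rough sketch of wiring these components together with boto3 follows; the Global Accelerator API is called in us-west-2, and the names, endpoint Region and ALB ARN are placeholder assumptions.

import boto3

# The Global Accelerator API endpoint lives in us-west-2, even though the service is global.
ga = boto3.client("globalaccelerator", region_name="us-west-2")

acc = ga.create_accelerator(Name="demo-accelerator", IpAddressType="IPV4", Enabled=True)
acc_arn = acc["Accelerator"]["AcceleratorArn"]

listener = ga.create_listener(
    AcceleratorArn=acc_arn,
    Protocol="TCP",
    PortRanges=[{"FromPort": 443, "ToPort": 443}],
)

ga.create_endpoint_group(
    ListenerArn=listener["Listener"]["ListenerArn"],
    EndpointGroupRegion="eu-west-1",   # the Region whose endpoints receive this traffic
    TrafficDialPercentage=100,         # the "traffic dial" described above
    EndpointConfigurations=[
        # Placeholder ALB ARN used as the endpoint.
        {"EndpointId": "arn:aws:elasticloadbalancing:eu-west-1:111122223333:loadbalancer/app/demo/abc", "Weight": 128},
    ],
)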
MODULE 16 - ADVANCED VPC NETWORKING
For critical production workloads that require high resiliency, it is recommended to have one connection at multiple AWS Direct Connect locations. Such a topology ensures resilience to connectivity failure due to a fiber cut or a device failure, as well as to a complete location failure. You can use Direct Connect Gateway to access any AWS Region (except AWS Regions in China) from any AWS Direct Connect location.
Maximum Resiliency for Critical Workloads
Maximum resilience is achieved by separate connections terminating on separate devices in more than one location. This configuration offers maximum resilience to failure: such a topology provides resilience to device failure, connectivity failure, and complete location failure. You can use Direct Connect Gateway to access any AWS Region (except AWS Regions in China) from any AWS Direct Connect location.
AWS Managed VPN connections as a backup for the Direct Connect
Some AWS customers would like the benefits of one or more AWS Direct Connect connections for their
primary connectivity to AWS, coupled with a lower-cost backup connection. To achieve this objective, they
can establish AWS Direct Connect connections with a VPN backup.
It is important to understand that AWS Managed VPN supports up to 1.25 Gbps throughput per VPN tunnel
and does not support Equal Cost Multi Path (ECMP) for egress data path in the case of multiple AWS
Managed VPN tunnels terminating on the same VGW. Thus, we do not recommend customers use AWS
Managed VPN as a backup for AWS Direct Connect connections with speeds greater than 1 Gbps.
Transit Gateway
➢ AWS Transit Gateway provides a hub and spoke design for connecting VPCs and on-premises
networks as a fully managed service without requiring you to provision virtual appliances like the
Cisco CSRs. No VPN overlay is required, and AWS manages high availability and scalability.
➢ Transit Gateway enables customers to connect thousands of VPCs. You can attach all your hybrid
connectivity (VPN and Direct Connect connections) to a single Transit Gateway— consolidating and
controlling your organization's entire AWS routing configuration in one place.
➢ Transit Gateway controls how traffic is routed among all the connected spoke networks using route tables. This hub-and-spoke model simplifies management and reduces operational costs because VPCs only connect to the Transit Gateway to gain access to the connected networks.
➢ Transit Gateway is a regional resource and can connect thousands of VPCs within the same AWS
Region.
➢ You can create multiple Transit Gateways per Region, but Transit Gateways within an AWS Region
cannot be peered, and you can connect to a maximum of three Transit Gateways over a single Direct
Connect Connection for hybrid connectivity.
➢ For these reasons, you should restrict your architecture to just one Transit Gateway connecting all
your VPCs in a given Region, and use Transit Gateway routing tables to isolate them wherever
needed. There is a valid case for creating multiple Transit Gateways purely to limit misconfiguration
blast radius.
➢ Use AWS Resource Access Manager (RAM) to share a Transit Gateway for connecting VPCs across
multiple accounts in your AWS Organization within the same Region.
Transit Gateway Considerations
➢ Supports transitive routing
➢ Can be used to create global networks
➢ Share between accounts using AWS RAM
➢ Peer with different regions... same or cross account
➢ Less complexity
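For example, creating a Transit Gateway and attaching a VPC can be sketched as follows (the ASN, VPC and subnet IDs are placeholders):

import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

tgw = ec2.create_transit_gateway(
    Description="demo hub",
    Options={
        "AmazonSideAsn": 64512,
        "DefaultRouteTableAssociation": "enable",
        "DefaultRouteTablePropagation": "enable",
    },
)
tgw_id = tgw["TransitGateway"]["TransitGatewayId"]

# Attach a spoke VPC (one subnet per AZ the Transit Gateway should reach).
ec2.create_transit_gateway_vpc_attachment(
    TransitGatewayId=tgw_id,
    VpcId="vpc-0123456789abcdef0",                     # placeholder VPC ID
    SubnetIds=["subnet-0aaa1111", "subnet-0bbb2222"],  # placeholder subnet IDs
)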
Storage gateway
AWS Storage Gateway connects an on-premises software appliance with cloud-based storage to provide
seamless integration with data security features between your on-premises IT environment and the AWS
storage infrastructure. You can use the service to store data in the Amazon Web Services Cloud for scalable
and cost-effective storage that helps maintain data security.
AWS Storage Gateway offers file-based file gateways (Amazon S3 File and Amazon FSx File), volume-
based (Cached and Stored), and tape-based storage solutions:
Amazon S3 File Gateway
Amazon S3 File Gateway supports a file interface into Amazon Simple Storage Service (Amazon S3) and
combines a service and a virtual software appliance. By using this combination, you can store and retrieve
objects in Amazon S3 using industry-standard file protocols such as Network File System (NFS) and Server
Message Block (SMB).
Tape Gateway
➢ A tape gateway provides cloud-backed virtual tape storage. The tape gateway is deployed into your
on-premises environment as a VM running on VMware ESXi, KVM, or Microsoft Hyper-V
hypervisor.
➢ With a tape gateway, you can cost-effectively and durably archive backup data in GLACIER or
DEEP_ARCHIVE. A tape gateway provides a virtual tape infrastructure that scales seamlessly with
your business needs and eliminates the operational burden of provisioning, scaling, and maintaining
a physical tape infrastructure.
Volume Gateway
➢ A volume gateway provides cloud-backed storage volumes that you can mount as Internet Small
Computer System Interface (iSCSI) devices from your on-premises application servers.
➢ The volume gateway is deployed into your on-premises environment as a VM running on VMware
ESXi, KVM, or Microsoft Hyper-V hypervisor.
➢ The gateway supports the following volume configurations:
Stored volumes – If you need low-latency access to your entire dataset, first configure your on-premises
gateway to store all your data locally. Then asynchronously back up point-in-time snapshots of this data
to Amazon S3. This configuration provides durable and inexpensive offsite backups that you can recover
to your local data center or Amazon Elastic Compute Cloud (Amazon EC2). For example, if you need
replacement capacity for disaster recovery, you can recover the backups to Amazon EC2.
Cached volumes – You store your data in Amazon Simple Storage Service (Amazon S3) and retain a copy
of frequently accessed data subsets locally. Cached volumes offer a substantial cost savings on primary
storage and minimize the need to scale your storage on-premises. You also retain low-latency access to your
frequently accessed data.
Snowball / Edge / Snowmobile
Snowball, Snowball Edge and Snowmobile are three parts of the same product family designed to allow the
physical transfer of data between business locations and AWS.
Key Concepts
➢ Move large amounts of data IN and OUT of AWS
➢ Physical storage, ranging from a suitcase-sized device to a truck
➢ Ordered from AWS empty, loaded up, then returned (data in)
➢ Or ordered from AWS with data, emptied, then returned (data out)
➢ For the exam, know which to use.
Snowball
Snowball is a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of
data into and out of the AWS cloud. Using Snowball addresses common challenges with large-scale data
transfers including high network costs, long transfer times, and security concerns.
➢ Ordered from AWS, log a Job, Device Delivered (not instant)
➢ Data Encryption uses KMS
➢ 50TB or 80TB Capacity
➢ 1Gbps (RJ45 1GBase-TX) or 10Gbps (LR/SR) network
➢ 10TB to 10PB economical range (multiple devices)
➢ Multiple devices to multiple premises
➢ Only storage
Snowball Edge
Snowball Edge Storage Optimized devices provide both block storage and Amazon S3-compatible object
storage, and 40 vCPUs. They are well suited for local storage and large scale-data transfer. Snowball Edge
Compute Optimized devices provide 52 vCPUs, block and object storage, and an optional GPU for use cases
like advanced machine learning and full motion video analysis in disconnected environments.
➢ Support Storage and Compute
➢ Larger capacity as compared to Snowball
➢ 1Gbps (RJ45), 10/25Gbps (SFP), 45/50/100Gbps (QSFP+)
➢ Storage optimized (with EC2) - 80TB, 24 vCPU, 32 GiB RAM, 1TB SSD
➢ Compute optimized - 100TB + 7.68TB NVMe, 52 vCPU and 208 GiB RAM
➢ Compute with GPU - same as above
➢ Ideal for remote sites or where data processing on ingestion is needed.
Snowmobile
AWS Snowmobile is an Exabyte-scale data transfer service used to move extremely large amounts of data to
AWS. You can transfer up to 100PB per Snowmobile, a 45-foot-long ruggedized shipping container, pulled
by a semi-trailer truck. Snowmobile makes it easy to move massive volumes of data to the cloud, including
video libraries, image repositories, or even a complete data center migration. Transferring data with
Snowmobile is more secure, fast and cost effective.
➢ Portable DC within a shipping container on a truck
➢ Special order
➢ Ideal for single location when 10 PB+ is required
➢ Up to 100PB per snowmobile
➢ Not economical for multi-site (Unless huge)
➢ Literally a Truck
Directory Service
What's a Directory?
• Stores objects (e.g., Users, Groups, Computers, Servers, File Shares) with a structure (domain/tree)
• Multiple trees can be grouped into a forest
• Commonly used in Windows Environments
• Signing in to multiple devices with the same username/password provides centralised management for assets
• Microsoft Active Directory Domain Services (AD DS)
• AD DS is the most popular; open-source alternatives exist (e.g., Samba)
Directory Service
➢ AWS Directory Service for Microsoft Active Directory, also known as AWS Managed Microsoft
Active Directory (AD), enables your directory-aware workloads and AWS resources to use managed
Active Directory (AD) in AWS.
➢ AWS Managed Microsoft AD is built on actual Microsoft AD and does not require you to
synchronize or replicate data from your existing Active Directory to the cloud.
➢ You can use the standard AD administration tools and take advantage of the built-in AD features,
such as Group Policy and single sign-on. With AWS Managed Microsoft AD, you can easily join
Amazon EC2 and Amazon RDS for SQL Server instances to your domain, and use AWS End User
Computing (EUC) services, such as Amazon WorkSpaces, with AD users and groups.
➢ It runs within a VPC
➢ To implement HA... deploy into multiple AZs.
➢ Some AWS services NEED a directory e.g., Amazon Workspaces.
➢ It can be isolated or integrated with an existing on-premises system, or act as a 'proxy' back to on-premises.
AWS Directory Service includes several directory types to choose from.
Simple AD
Simple AD is a Microsoft Active Directory–compatible directory from AWS Directory Service that is
powered by Samba 4. Simple AD supports basic Active Directory features such as user accounts, group memberships, joining Linux or Windows-based EC2 instances to a domain, Kerberos-based SSO, and group policies. AWS provides monitoring, daily snapshots, and recovery as part of the service.
Simple AD is a standalone directory in the cloud, where you create and manage user identities and manage
access to applications. You can use many familiar Active Directory–aware applications and tools that
require basic Active Directory features.
Simple AD does not support multi-factor authentication (MFA), trust relationships, DNS dynamic update,
schema extensions, communication over LDAPS, PowerShell AD cmdlets, or FSMO role transfer.
Simple AD is not compatible with RDS SQL Server. Customers who require the features of an actual
Microsoft Active Directory, or who envision using their directory with RDS SQL Server should use AWS
Managed Microsoft AD instead.
You can use Simple AD as a standalone directory in the cloud to support Windows workloads that need
basic AD features, compatible AWS applications, or to support Linux workloads that need LDAP service.
AWS Managed Microsoft AD
➢ AWS Directory Service lets you run Microsoft Active Directory (AD) as a managed service. AWS
Directory Service for Microsoft Active Directory, also referred to as AWS Managed Microsoft AD,
is powered by Windows Server 2012 R2. When you select and launch this directory type, it is
created as a highly available pair of domain controllers connected to your virtual private cloud
(VPC). The domain controllers run in different Availability Zones in a Region of your choice. Host
monitoring and recovery, data replication, snapshots, and software updates are automatically
configured and managed for you.
➢ With AWS Managed Microsoft AD, you can run directory-aware workloads in the AWS Cloud,
including Microsoft SharePoint and custom .NET and SQL Server-based applications. You can also
configure a trust relationship between AWS Managed Microsoft AD in the AWS Cloud and your
existing on-premises Microsoft Active Directory, providing users and groups with access to
resources in either domain, using single sign-on (SSO).
➢ AWS Directory Service makes it easy to set up and run directories in the AWS Cloud, or connect
your AWS resources with an existing on-premises Microsoft Active Directory. Once your directory
is created, you can use it for a variety of tasks:
• Manage users and groups
• Provide single sign-on to applications and services
• Create and apply group policy
• Simplify the deployment and management of cloud-based Linux and Microsoft
Windows workloads
• You can use AWS Managed Microsoft AD to enable multi-factor authentication by
integrating with your existing RADIUS-based MFA infrastructure to provide an
additional layer of security when users access AWS applications.
• Securely connect to Amazon EC2 Linux and Windows instances
AD Connector
➢ AD Connector is a proxy service that provides an easy way to connect compatible AWS applications,
such as Amazon WorkSpaces, Amazon QuickSight, and Amazon EC2 for Windows Server
instances, to your existing on-premises Microsoft Active Directory. With AD Connector, you can
simply add one service account to your Active Directory. AD Connector also eliminates the need for directory synchronization or the cost and complexity of hosting a federation infrastructure.
➢ You can also use AD Connector to enable multi-factor authentication (MFA) for your AWS
application users by connecting it to your existing RADIUS-based MFA infrastructure. This provides
an additional layer of security when users access AWS applications.
➢ AD Connector is your best choice when you want to use your existing on-premises directory with
compatible AWS services.
Picking between Modes
1. Simple AD - The default. Simple requirements. A directory in AWS.
2. Microsoft AD - Applications in AWS which need MS AD DS, or you need to TRUST AD DS
3. AD Connector - Use AWS Services which need a directory without storing any directory info in the
cloud... proxy to your on-premises Directory
DataSync
➢ AWS DataSync is an online data transfer service that simplifies, automates, and accelerates moving
data between on-premises storage systems and AWS Storage services, as well as between AWS
Storage services. You can use DataSync to migrate active datasets to AWS, archive data to free up
on-premises storage capacity, replicate data to AWS for business continuity, or transfer data to the
cloud for analysis and processing.
➢ Writing, maintaining, monitoring, and troubleshooting scripts to move large amounts of data can
burden your IT operations and slow migration projects.
➢ DataSync eliminates or automatically handles this work for you.
➢ DataSync provides built-in security capabilities such as encryption of data in-transit, and data
integrity verification in-transit and at-rest.
➢ It optimizes use of network bandwidth, and automatically recovers from network connectivity
failures.
➢ In addition, DataSync provides control and monitoring capabilities such as data transfer scheduling
and granular visibility into the transfer process through Amazon CloudWatch metrics, logs, and
events.
➢ DataSync can copy data between Network File System (NFS) shares, Server Message Block (SMB)
shares, self-managed object storage, AWS Snowcone, Amazon Simple Storage Service (Amazon S3)
buckets, Amazon Elastic File System (Amazon EFS) file systems, and Amazon FSx for Windows
File Server file systems.
Key Features
➢ Scalable - 10Gbps per agent (~ 100TB per day)
➢ Bandwidth Limiters (avoid link saturation)
➢ Incremental and scheduled transfer options
➢ Compression and encryption
➢ Automatic recovery from transit errors
➢ AWS Service integration - S3, EFS, FSx
➢ Pay as you use... per GB cost for data moved
DataSync Components
➢ Task - A 'job' within DataSync defines what is being synced, how quickly, FROM where and TO
where
➢ Agent - Software used to read or write to on-premises data stores using NFS or SMB
➢ Location - every task has two locations FROM and TO. E.g., Network File System (NFS), Server
Message Block (SMB), Amazon EFS, Amazon FSx and Amazon S3
FSx for Windows Servers
➢ Amazon FSx for Windows File Server provides fully managed, highly reliable, and scalable file
storage that is accessible over the industry-standard Server Message Block (SMB) protocol.
➢ It is built on Windows Server, delivering a wide range of administrative features such as user quotas,
end-user file restores, and Microsoft Active Directory (AD) integration.
➢ It offers single-AZ and multi-AZ deployment options, fully managed backups, and encryption of
data at rest and in transit. You can optimize cost and performance for your workload needs with SSD
and HDD storage options; and you can scale storage and change the throughput performance of your
file system at any time.
➢ Amazon FSx file storage is accessible from Windows, Linux, and MacOS compute instances and
devices running on AWS or on premises.
➢ Fully managed native windows file servers/shares
➢ Designed for integration with windows environment
➢ Integrates with Directory Service or Self-Managed AD
➢ Single or Multi-AZ within a VPC
➢ On-demand and Scheduled Backups
➢ Accessible using VPC, Peering, VPN, Direct Connect
FSx Key Features and Benefits
➢ VSS - User-Driven Restores
➢ Native file system accessible over SMB
➢ Windows permission model
➢ Supports DFS... Scale-out file shares structure
➢ Managed - no file server admin
➢ Integrates with DS and your own directory
DynamoDB - Architecture
➢ At a high level, Amazon DynamoDB is designed for high availability, durability, and consistently
low latency (typically in the single digit milliseconds) performance.
➢ Amazon DynamoDB runs on a fleet of AWS managed servers that leverage solid state drives (SSDs)
to create an optimized, high-density storage platform. This platform decouples performance from
table size and eliminates the need for the working set of data to fit in memory while still returning
consistent, low latency responses to queries. As a managed service, Amazon DynamoDB abstracts its
underlying architectural details from the user.
➢ In DynamoDB, tables, items, and attributes are the core components that you work with. A table is a
collection of items, and each item is a collection of attributes. DynamoDB uses primary keys to
uniquely identify each item in a table and secondary indexes to provide more querying flexibility.
You can use DynamoDB Streams to capture data modification events in DynamoDB tables.
➢ The following are the basic DynamoDB components:
o Tables – Similar to other database systems, DynamoDB stores data in tables. A table is a
collection of data. For example, see the example table called People that you could use to
store personal contact information about friends, family, or anyone else of interest. You could
also have a Cars table to store information about vehicles that people drive.
o Items – Each table contains zero or more items. An item is a group of attributes that is
uniquely identifiable among all of the other items. In a People table, each item represents a
person. For a Cars table, each item represents one vehicle. Items in DynamoDB are similar in
many ways to rows, records, or tuples in other database systems. In DynamoDB, there is no
limit to the number of items you can store in a table.
o Attributes – Each item is composed of one or more attributes. An attribute is a fundamental
data element, something that does not need to be broken down any further. For example, an
item in a People table contains attributes called PersonID, LastName, FirstName, and so on.
For a Department table, an item might have attributes such as DepartmentID, Name,
Manager, and so on. Attributes in DynamoDB are similar in many ways to fields or columns
in other database systems.
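To make tables, items and attributes concrete, here is a brief boto3 sketch against a hypothetical People table keyed on PersonID (all names and values are illustrative assumptions):

import boto3

dynamodb = boto3.resource("dynamodb")
people = dynamodb.Table("People")  # hypothetical table with PersonID as the partition key

# An item is just a collection of attributes; only the key attributes are mandatory.
people.put_item(Item={
    "PersonID": 101,
    "LastName": "Smith",
    "FirstName": "Fred",
    "Phone": "555-0100",
})

# Retrieve the item by its primary key.
item = people.get_item(Key={"PersonID": 101}).get("Item")
print(item)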
DynamoDB Concepts
➢ It is a NoSQL public Database-as-a-Service (DBaaS) - key/value & document
➢ No self-managed servers or infrastructure
➢ Manual/automatic provisioned performance IN/OUT, or on-demand
➢ Highly resilient across AZs and optionally global
➢ Really fast, single-digit millisecond latency (SSD based)
➢ Backups, point-in-time recovery, encryption at rest
➢ Event-driven integration: do things when data changes
DynamoDB Tables
DynamoDB Indexes
Amazon DynamoDB provides fast access to items in a table by specifying primary key values. However,
many applications might benefit from having one or more secondary (or alternate) keys available, to allow
efficient access to data with attributes other than the primary key. To address this, you can create one or
more secondary indexes on a table and issue Query or Scan requests against these indexes.
A secondary index is a data structure that contains a subset of attributes from a table, along with an alternate
key to support Query operations. You can retrieve data from the index using a Query, in much the same way
as you use Query with a table. A table can have multiple secondary indexes, which give your applications
access to many different query patterns.
• Query is the most efficient operation in DDB
• Query can only work on one PK (partition key) value at a time,
• and optionally a single SK value or a range of SK values
• Indexes are alternative views on table data
• Different SK (LSI) or different PK and SK (GSI)
• Some or all attributes (projection)
Every secondary index is automatically maintained by DynamoDB. When you add, modify, or delete items
in the base table, any indexes on that table are also updated to reflect these changes.
DynamoDB supports two types of secondary indexes:
1. Global secondary index — An index with a partition key and a sort key that can be different from
those on the base table. A global secondary index is considered "global" because queries on the
index can span all of the data in the base table, across all partitions. A global secondary index is
stored in its own partition space away from the base table and scales separately from the base table.
2. Local secondary index — An index that has the same partition key as the base table, but a different
sort key. A local secondary index is "local" in the sense that every partition of a local secondary
index is scoped to a base table partition that has the same partition key value.
Local Secondary Indexes (LSI)
➢ Some applications only need to query data using the base table's primary key. However, there might
be situations where an alternative sort key would be helpful. To give your application a choice of sort
keys, you can create one or more local secondary indexes on an Amazon DynamoDB table and issue
Query or Scan requests against these indexes.
➢ A local secondary index maintains an alternate sort key for a given partition key value. A local
secondary index also contains a copy of some or all of the attributes from its base table. You specify
which attributes are projected into the local secondary index when you create the table. The data in a
local secondary index is organized by the same partition key as the base table, but with a different
sort key. This lets you access data items efficiently across this different dimension. For greater query
or scan flexibility, you can create up to five local secondary indexes per table.
➢ Every local secondary index must meet the following conditions:
• The partition key is the same as that of its base table.
• The sort key consists of exactly one scalar attribute.
• The sort key of the base table is projected into the index, where it acts as a non-key attribute.
➢ LSI is an alternative view for a table
➢ MUST be created with a table
➢ 5 LSIs per base table
➢ Alternative SK on the table
➢ Shares the RCU and WCU with the table
➢ When you create a secondary index, you need to specify the attributes that will be projected into the
index. DynamoDB provides three different options for this:
1. KEYS_ONLY – Each item in the index consists only of the table partition key and sort key
values, plus the index key values. The KEYS_ONLY option results in the smallest possible
secondary index.
2. INCLUDE – In addition to the attributes described in KEYS_ONLY, the secondary index will
include other non-key attributes that you specify.
3. ALL – The secondary index includes all of the attributes from the source table. Because all of the
table data is duplicated in the index, an ALL projection results in the largest possible secondary
index.
Global Secondary Indexes (GSI)
➢ Some applications might need to perform many kinds of queries, using a variety of different
attributes as query criteria. To support these requirements, you can create one or more global
secondary indexes and issue Query requests against these indexes in Amazon DynamoDB.
➢ To speed up queries on non-key attributes, you can create a global secondary index. A global
secondary index contains a selection of attributes from the base table, but they are organized by a
primary key that is different from that of the table. The index key does not need to have any of the
key attributes from the table. It doesn't even need to have the same key schema as a table.
➢ Every global secondary index must have a partition key, and can have an optional sort key. The
index key schema can be different from the base table schema. You could have a table with a simple
primary key (partition key), and create a global secondary index with a composite primary key
(partition key and sort key)—or vice versa.
➢ Can be created at any time
➢ Default limit of 20 per base table
➢ Alternative PK and SK
➢ GSIs have their own RCU and WCU allocations
➢ When you create a global secondary index, you specify which attributes are projected into it using the same three options described above for local secondary indexes: KEYS_ONLY, INCLUDE, or ALL.
LSI and GSI Considerations
➢ Careful with Projection (KEYS_ONLY, INCLUDE, ALL)
➢ Queries on attributes NOT projected are expensive
➢ Use GSIs as default, LSI only when strong consistency is required
➢ Use indexes for alternative access patterns
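As a sketch of the "alternative access pattern" idea, the query below targets a hypothetical GSI (the table, index and attribute names are assumptions) rather than the base table's keys:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical table

# Query a hypothetical GSI keyed on CustomerEmail instead of the base table's key.
response = table.query(
    IndexName="CustomerEmail-index",
    KeyConditionExpression=Key("CustomerEmail").eq("fred@example.com"),
)
for item in response["Items"]:
    print(item)  # only attributes projected into the index are returned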
DAX Considerations
➢ Primary NODE (Writes) and Replicas (Read)
➢ Nodes are HA - a primary failure triggers an election of a new primary
➢ In-memory cache - scales for much faster reads and reduces costs
➢ Scale UP and Scale OUT (Bigger or More)
➢ Supports write-through
➢ DAX Deployed WITHIN a VPC
Amazon Athena
Amazon Athena is an interactive query service that makes it easy to analyse data directly in Amazon Simple
Storage Service (Amazon S3) using standard SQL. With a few actions in the AWS Management Console,
you can point Athena at your data stored in Amazon S3 and begin using standard SQL to run ad-hoc queries
and get results in seconds.
Athena is serverless, so there is no infrastructure to set up or manage, and you pay only for the queries you
run. Athena scales automatically—running queries in parallel—so results are fast, even with large datasets
and complex queries.
➢ Serverless Interactive Querying Service
➢ Ad-hoc queries on data - you pay only for the data scanned by each query
➢ Schema-on-read - table like translation.
➢ Original data never changed - remains on S3.
➢ Schema translates data => relational-like when read
➢ Output can be sent to other services
➢ Athena is an underrated service capable of working with unstructured, semi-structured or structured
data.
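A minimal sketch of running an ad-hoc query with boto3 (the database, table and results bucket are placeholder assumptions):

import time
import boto3

athena = boto3.client("athena")

run = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM access_logs GROUP BY status",  # hypothetical table
    QueryExecutionContext={"Database": "weblogs"},                            # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},        # placeholder bucket
)
query_id = run["QueryExecutionId"]

# Poll until the query finishes, then fetch and print the result rows.
while athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"] in ("QUEUED", "RUNNING"):
    time.sleep(1)

for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
    print([col.get("VarCharValue") for col in row["Data"]])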
Elasticache
Amazon ElastiCache allows you to seamlessly set up, run, and scale popular open-source compatible in-
memory data stores in the cloud. Build data-intensive apps or boost the performance of your existing
databases by retrieving data from high throughput and low latency in-memory data stores. Amazon
ElastiCache is a popular choice for real-time use cases like Caching, Session Stores, Gaming, Geospatial
Services, Real-Time Analytics, and Queuing.
Amazon ElastiCache supports the Memcached and Redis cache engines. Each engine provides some
advantages.
➢ In-memory database... high performance
➢ Managed Redis or Memcached as a service
➢ Can be used to cache data - for READ HEAVY workloads with low latency requirements
➢ Reduces database workloads (expensive)
➢ Can be used to store Session Data (Stateless Servers)
➢ Requires application code changes!
➢ An in-memory cache allows cost effective scaling of read-heavy workloads - and Performance
improvement at scale
Memcached                    | Redis
Simple data structures       | Advanced structures
No replication               | Multi-AZ
Multiple nodes (sharding)    | Replication (scale reads)
No backups                   | Backup & restore
Multi-threaded               | Transactions
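The "application code changes" point can be illustrated with a cache-aside read path. The sketch below assumes the third-party redis-py client, a placeholder ElastiCache Redis endpoint and a hypothetical load_user_from_db() helper.

import json
import redis  # assumed third-party client (redis-py)

cache = redis.Redis(host="my-cluster.abc123.0001.euw1.cache.amazonaws.com", port=6379)  # placeholder endpoint

def get_user(user_id, load_user_from_db):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit - no database read needed

    user = load_user_from_db(user_id)        # hypothetical database lookup
    cache.setex(key, 300, json.dumps(user))  # cache the result for 5 minutes
    return user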
Redshift Architecture
➢ Amazon Redshift integrates with various data loading and ETL (extract, transform, and load) tools and business intelligence (BI) reporting, data mining, and analytics tools. Amazon Redshift is based on industry-standard PostgreSQL, so most existing SQL client applications will work with only minimal changes, although there are important differences between Amazon Redshift SQL and PostgreSQL to be aware of.
➢ Amazon Redshift communicates with client applications by using industry-standard JDBC and
ODBC drivers for PostgreSQL.
➢ The core infrastructure component of an Amazon Redshift data warehouse is a cluster.
➢ A cluster is composed of one or more compute nodes. If a cluster is provisioned with two or more
compute nodes, an additional leader node coordinates the compute nodes and handles external
communication. Your client application interacts directly only with the leader node. The compute
nodes are transparent to external applications.
➢ The leader node manages communications with client programs and all communication with
compute nodes. It parses and develops execution plans to carry out database operations. The leader
node distributes SQL statements to the compute nodes only when a query references tables that are
stored on the compute nodes.
➢ The leader node compiles code for individual elements of the execution plan and assigns the code to
individual compute nodes. The compute nodes execute the compiled code and send intermediate
results back to the leader node for final aggregation.
➢ A compute node is partitioned into slices. Each slice is allocated a portion of the node's memory and
disk space, where it processes a portion of the workload assigned to the node. The leader node
manages distributing data to the slices and apportions the workload for any queries or other database
operations to the slices. The slices then work in parallel to complete the operation.
Remember this:
➢ Petabyte-scale Data warehouse
➢ OLAP (Column based) not OLTP (row/transaction)
➢ Pay as you use ... similar structure to RDS
➢ Direct Query S3 using Redshift Spectrum
➢ Direct Query other DBs using federated query
➢ Integrates with AWS tooling such as QuickSight
➢ SQL-like interface JDBC/ODBC connections
➢ Server based (not serverless)
➢ One AZ in a VPC - network cost/performance
➢ Leader Node - Query input, planning and aggregation
➢ Compute Node - Performing queries of data
➢ VPC Security, IAM Permissions, KMS at rest Encryption, CW Monitoring
➢ Redshift Enhanced VPC Routing - VPC Networking!
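For example, a query can be submitted through the Redshift Data API without managing JDBC/ODBC connections yourself; the cluster, database, user and SQL below are placeholder assumptions.

import time
import boto3

rsd = boto3.client("redshift-data")

run = rsd.execute_statement(
    ClusterIdentifier="demo-cluster",  # placeholder cluster
    Database="analytics",              # placeholder database
    DbUser="analyst",                  # placeholder DB user
    Sql="SELECT event_date, COUNT(*) FROM sales GROUP BY event_date ORDER BY event_date",
)

# Poll for completion, then read the aggregated result rows.
while rsd.describe_statement(Id=run["Id"])["Status"] not in ("FINISHED", "FAILED", "ABORTED"):
    time.sleep(1)

for record in rsd.get_statement_result(Id=run["Id"])["Records"]:
    print(record)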
Exam Technique
➢ Shared Exam room, Kiosk or at home
➢ 130 Minutes as standard
➢ ESL (English as a Second Language) + 30 Minutes
➢ 65 Questions = 2 Minutes Per Question
➢ 720 /1000 Pass Mark
➢ Multiple Choice (Pick 1 / many)
➢ Multi-select (Pick correct)
➢ If it’s your first exam, assume you will run out of time
➢ The way to succeed is to be efficient
➢ 2 minutes to read Q, Answers and make a decision
➢ Don't guess until the end... later questions may remind you of something important from earlier
➢ Use the Mark for Review!
➢ Take ALL the practice tests you can
➢ Aim for 90%+ before you do the real exam
➢ Try and eliminate any crazy answers
➢ Find what matters in the question
➢ Highlight and remove any question fluff
➢ Identify what matters in the answers
➢ DON’T PANIC! mark for review and come back later