AWS Cloud
Cloud Foundations
INTRODUCTION TO COMPUTING
Code: a programming language that forms a computer program (Java, Python, C++).
Motherboard: holds all of the core computer hardware components together on a printed circuit board (PCB).
Central Processing Unit (CPU): runs the instructions it receives from applications and the OS. Performs the
basic arithmetic, logical, control and input/output (I/O) operations. Can have multiple cores (to increase
performance).
Memory: holds program instructions and data for the CPU to run/use. Memory is temporary storage (data lost
when computer turned off). Also called random access memory (RAM).
Storage Drive: used to store and retrieve digital data (documents, programs, application preferences etc.).
Drive storage is persistent (data preserved when computer turned off).
Either hard disk drive (HDD) or solid state drive (SSD).
Network Interface Card/Adapter: connects a computer to a computer network (internet).
Computer Network
Connects multiple devices to share data and resources (the internet).
Wired: using an Ethernet cable to connect to a router
Wireless: connected to router using a Wi-Fi signal
BASIC COMPUTING CONCEPTS
Server
A computer that provides data or services to other computers over a network. It differs from desktop hardware
in that it typically supports more memory, multiple CPUs, redundant power supplies, and a smaller form factor.
Types of server
Web: used by web applications to serve Hypertext Markup Language (HTML) pages to a requesting client.
Database: hosts database software that applications use to store/retrieve data.
Mail: used to send/receive email from and to clients.
Data Centre
Hosts all of an organisation's computer and networking equipment (servers, storage devices, network devices,
cooling equipment, uninterruptible power supplies) in a physical location.
Virtual Machines
Runs on a physical computer (host). A software layer (hypervisor) provides access to the resources of the
physical computer (CPU, RAM, disk, network) to the VM. Multiple VMs can be provisioned on a single host. A
fundamental unit of computing in the cloud.
Virtualisation: the ability to create multiple VMs, each with its own OS and applications, on a single physical
machine.
Benefits
- Cost savings
- Efficiency
- Reusability and portability (able to duplicate a VM image on one or more physical hosts)
Software development team roles
Project Manager: Develops plan, recruits staff, leads and manages team.
Analyst: Defines purpose of project, gathers and organises requirements into tasks for developers.
Quality Assurance: Runs all tests and investigates any failures.
Software Developer: Writes the code that makes up the application according to the specifications.
Data Administrator: Maintains the data that is needed in the application.
WHAT IS CLOUD COMPUTING
Cloud Computing
The on-demand delivery of IT resources online with pay-as-you-go pricing. The cloud is a computer that is
located somewhere else, accessed through the internet, and used in some way. The cloud comprises server
computers in large data centres in different locations around the world. It enables organisations to not have to
build, operate, and improve infrastructure of their own.
Uses
- Backup and storage
- Content delivery with high speed worldwide
- Hosting static/dynamic websites
Cloud
A cloud-based application that is fully deployed in the cloud (all parts of the application run in the cloud).
Hybrid
A way to connect infrastructure and applications between cloud-based resources and existing resources that
are not in the cloud. Infrastructure is often located on-premises of an organisation's data centre and the hybrid
model extends this into the cloud.
Private (on-premises)
Cloud infrastructure run from your own data centre which can use application management and virtualisation
to increase resource utilisation.
If you have a business, how can cloud computing benefit your business?
Cloud computing or cloud services providers like Amazon Web Services (AWS) provide rapid access to flexible
and low-cost IT resources. With cloud computing, you don’t need to make large upfront investments in
hardware. As a business owner, you don’t need to purchase a physical location, servers, storage, or databases.
Fixed expense: Funds that a company uses to acquire, upgrade, and maintain physical assets, such as property,
industrial buildings, or equipment.
Variable expense: An expense that the person who bears the cost can alter or avoid.
By using the cloud, businesses don’t need to invest money into data centres and servers. They can pay for only
what they use, and they pay only when they use these resources (which are also known as pay as you go).
Businesses save money on technology. They can adapt to new applications with as much space as they need in
minutes instead of weeks or days. Maintenance is reduced so that the business can focus on its core goals.
Increased innovation
•Perform quick, low-cost experimentation.
•Use prefabricated functionality without requiring in-house expertise (such as data warehousing and
analytics).
Increased experimentation
•Explore new avenues of business with minimal risk and expense.
•Test with different configurations.
AWS is a cloud services provider. AWS offers a broad set of global cloud-based products—which are also
known as services—that are designed to work together.
AWS offers three different models of cloud services: infrastructure as a service (IaaS), platform as a service
(PaaS), and software as a service (SaaS). All of these services are on the AWS Cloud.
With IaaS, you manage the server, which can be physical or virtual, and the operating system (Microsoft
Windows or Linux). In general, the data centre provider has no access to your server.
Basic building blocks for cloud IT include the following:
•Networking features
•Compute
•Data storage space
With PaaS, someone else manages the underlying hardware and operating systems. Thus, you can run
applications without managing underlying infrastructure (patching, updates, maintenance, hardware, and
operating systems). PaaS also provides a framework for developers that they can build on to create
customized applications.
With SaaS, you manage your files, and the service provider manages all data centres, servers, networks,
storage, maintenance, and patching. Your concern is only the software and how you want to use it. You are
provided with a complete product that the service provider runs and manages. Facebook and Dropbox are
examples of SaaS. You manage your Facebook contacts and Dropbox files, and the service providers manage
the systems.
•Access control lists (ACLs)
•Amazon Elastic Block Store (Amazon EBS)
•Amazon Elastic File System (Amazon EFS)
•Amazon Machine Image (AMI)
•Amazon Relational Database Service (Amazon RDS)
•Amazon Simple Storage Service (Amazon S3)
•Amazon Virtual Private Cloud (Amazon VPC)
•AWS Identity and Access Management (IAM)
•Direct-attached storage (DAS)
•Network access control lists (network ACLs)
•Network-attached storage (NAS)
•Storage area network (SAN)
•Relational database management system (RDBMS)
What are web services?
A web service is any piece of software that makes itself available over the internet. It uses a standardized
format, either Extensible Markup Language (XML) or JavaScript Object Notation (JSON), for the request and
the response of an application programming interface (API) interaction.
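As a hedged illustration, the following sketch calls a hypothetical JSON web service with curl; the URL, path, and response fields are placeholders, not a real AWS endpoint.
# Request a resource from a hypothetical web service (placeholder URL)
curl -s -H "Accept: application/json" "https://api.example.com/v1/orders/42"
# A JSON response from such an API might look like this:
# {"orderId": 42, "status": "shipped", "total": 19.99}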
AWS services
Compute
•Calculated either by the hour or the second.
•Varies by instance type
Storage
•Charged typically per GB
Data transfer
•Outbound is aggregated and charged
•Inbound has no charge (with some exceptions)
•Charged typically per GB
In most cases, you won’t be charged for inbound data transfer or for data transfer between other AWS
services in the same AWS Region.
•As AWS grows, AWS focuses on lowering the cost of doing business.
•This practice results in AWS passing savings from economies of scale to you.
•Since 2006, AWS has lowered pricing more than 75 times and continues to do so.
•Future higher-performing resources replace current resources for no extra charge.
Note: Some charges might be associated with other AWS services that are used with these services.
TCO considerations
AWS INFRASTRUCTURE OVERVIEW
AWS Regions
Points of presence
AWS provides a global network of 216 PoP locations.
•The PoPs consist of 205 edge locations and 11 Regional edge caches.
•PoPs are used with Amazon CloudFront, a global content delivery network (CDN) that delivers content to end
users with reduced latency.
•Regional edge caches are used for content with infrequent access.
Fault-tolerant:
•Continues operating properly in the presence of a failure
•Includes built-in redundancy of components
Highly available:
•High level of operational performance with reduced downtime
AWS SERVICES AND SERVICE CATEGORIES
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers scalability,
data availability, security, and performance. Use it to store and protect any amount of data for
websites and mobile apps. It is also used for backup and restore, archive, enterprise
applications, Internet of Things (IoT) devices, and big data analytics.
Amazon Elastic Block Store (Amazon EBS) is high-performance block storage that is designed
for use with Amazon EC2 for both throughput-intensive and transaction-intensive workloads. It
is used for various workloads, such as relational and non-relational databases, enterprise
applications, containerized applications, big data analytics engines, file systems, and media
workflows.
Amazon Elastic File System (Amazon EFS) provides a scalable, fully managed elastic Network
File System (NFS) file system for AWS Cloud services and on-premises resources. It is built to
scale on demand to petabytes, growing and shrinking automatically as you add and remove
files. Using Amazon EFS reduces the need to provision and manage capacity to accommodate
growth.
Amazon Simple Storage Service Glacier is a secure, durable, and low-cost Amazon S3 cloud
storage class for data archiving and long-term backup. It is designed to deliver 11 9s
(99.999999999 percent) of durability and to provide comprehensive security and compliance
capabilities to meet stringent regulatory requirements.
Amazon Elastic Compute Cloud (Amazon EC2) provides resizable compute capacity as virtual
machines in the cloud.
Amazon EC2 Auto Scaling gives you the ability to automatically add or remove EC2 instances
according to conditions that you define.
AWS Elastic Beanstalk is a service for deploying and scaling web applications and services. It
deploys them on familiar servers such as Apache HTTP Server and Microsoft Internet Information Services (IIS).
AWS Lambda gives you the ability to run code without provisioning or managing servers. You
pay for only the compute time that you consume, so you won’t be charged when your code
isn’t running.
Amazon Elastic Container Registry (Amazon ECR) is a fully managed Docker container registry
that facilitates storing, managing, and deploying Docker container images.
Amazon Elastic Kubernetes Service (Amazon EKS) facilitates deploying, managing, and scaling
containerized applications that use Kubernetes on AWS.
AWS Fargate is a compute engine for Amazon ECS that you can use to run containers without
managing servers or clusters.
Amazon Relational Database Service (Amazon RDS) facilitates setting up, operating, and scaling
a relational database in the cloud. It provides resizable capacity while automating time-
consuming administration tasks, such as hardware provisioning, database setup, patching, and
backups.
Amazon Aurora is a relational database that is compatible with MySQL and PostgreSQL. It is up
to five times faster than standard MySQL databases and three times faster than standard
PostgreSQL databases.
Amazon Redshift gives you the ability to run analytic queries against petabytes of data that is
stored locally in Amazon Redshift. You can also run queries directly against exabytes of data that
are stored in Amazon S3. Amazon Redshift delivers fast performance at any scale.
Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond
performance at any scale with built-in security, backup and restore, and in-memory caching.
Amazon Virtual Private Cloud (Amazon VPC) gives you the ability to provision logically isolated
sections of the AWS Cloud.
Elastic Load Balancing automatically distributes incoming application traffic across multiple
targets, such as Amazon EC2 instances, containers, IP addresses, and Lambda functions.
Amazon CloudFront is a fast content delivery network (CDN) service that securely delivers data,
videos, applications, and application programming interfaces (APIs) to customers globally. It has
low latency and high transfer speeds.
AWS Transit Gateway is a service that customers can use to connect their virtual private clouds
(VPCs) and their on-premises networks to a single gateway.
Amazon Route 53 is a scalable, cloud Domain Name System (DNS) web service. It is designed to
give you a reliable way to route end users to internet applications. Route 53 translates names
(like www.example.com) into the numeric IP addresses (like 192.0.2.1) that computers use to
connect to each other.
AWS Direct Connect provides a way to establish a dedicated private network connection from
your data centre or office to AWS. Using AWS Direct Connect can reduce network costs and
increase bandwidth throughput.
AWS Client VPN provides a secure private tunnel from your network or device to the AWS global
network.
AWS Identity and Access Management (IAM) gives you the ability to manage access to AWS
services and resources securely. By using IAM, you can create and manage AWS users and groups.
You can use IAM permissions to allow and deny user and group access to AWS resources.
AWS Organizations permits you to restrict what services and actions are allowed in your accounts.
Amazon Cognito gives you the ability to add user sign-up, sign-in, and access control to your web
and mobile apps.
AWS Artifact provides on-demand access to AWS security and compliance reports and select
online agreements.
AWS Key Management Service (AWS KMS) provides the ability to create and manage keys. You
can use AWS KMS to control the use of encryption across a wide range of AWS services and in
your applications.
AWS Shield is a managed distributed denial of service (DDoS) protection service that safeguards
applications running on AWS.
AWS Cost and Usage Report contains the most comprehensive set of AWS cost and usage data
available. It includes additional metadata about AWS services, pricing, and reservations.
AWS Budgets provides the ability to set custom budgets that alert you when your costs or usage
exceeds (or will likely exceed) your budgeted amount.
AWS Cost Explorer has an easy-to-use interface that you can use to visualize, understand, and
manage your AWS costs and usage over time.
Management and Governance service category
AWS Management Console provides a web-based user interface for accessing your AWS
account.
AWS Config provides a service that helps you track resource inventory and changes.
Amazon CloudWatch gives you the ability to monitor resources and applications.
AWS Auto Scaling provides features that you can use to scale multiple resources to meet
demand.
AWS Command Line Interface (AWS CLI) provides a unified tool to manage AWS services.
AWS Well-Architected Tool provides help in reviewing and improving your workloads.
AWS handles the security of the physical infrastructure that hosts your resources. This infrastructure includes
the following:
•Physical security of data centres with controlled, need-based access, located in nondescript facilities. The
physical security includes 24/7 security guards, two-factor authentication, access logging and review, video
surveillance, and disk degaussing and destruction.
•Hardware infrastructure, which includes servers, storage devices, and other appliances that AWS services rely
on.
•Software infrastructure that hosts operating systems, service applications, and virtualization software.
•Network infrastructure, which includes routers, switches, load balancers, firewalls, and cabling. This facet of
security includes nearly continuous network monitoring at external boundaries, secure access points, and
redundant infrastructure with intrusion detection.
•Virtualization infrastructure, including instance isolation.
When customers use AWS services, they maintain complete control over their content. Customers are
responsible for managing critical content security requirements, including the following:
•Which content they choose to store on AWS.
•Which AWS services are used with the content.
•In which country that content is stored.
•The format and structure of that content and whether it is masked, anonymized, or encrypted.
•Who has access to that content and how those access rights are granted, managed, and revoked.
Customers retain control of the security that they choose to protect their data, environment, applications,
AWS Identity and Access Management (IAM) settings, and operating systems. Thus, the shared responsibility
model changes depending on the AWS services that the customer decides to implement.
Service characteristics and security responsibility
Amazon S3 features
•Amazon S3 Standard
Designed to provide high-durability, high-availability, and high-performance object storage for frequently
accessed data. Because it delivers low latency and high throughput, Amazon S3 Standard is appropriate for
many use cases. These use cases include cloud applications, dynamic websites, content distribution, mobile
and gaming applications, and big data analytics.
•Amazon S3 Intelligent-Tiering
Designed to optimize costs. It automatically moves data to the most cost-effective access tier without affecting
performance or operational overhead. For a small monthly monitoring and automation fee per object, Amazon
S3 monitors access patterns of the objects in Amazon S3 Intelligent-Tiering. It then moves the objects that
haven’t been accessed for 30 consecutive days to the Infrequent Access tier. If an object in the Infrequent
Access tier is accessed, it is automatically moved back to the Frequent Access tier. The Amazon S3 Intelligent-
Tiering storage class doesn’t charge retrieval fees when you use it. Also, it doesn’t charge additional fees when
objects are moved between access tiers. It works well for long-lived data with access patterns that are
unknown or unpredictable.
Redundancy in Amazon S3
When you create a bucket in Amazon S3, it is associated with a specific AWS Region. Whenever you store data
in the bucket, it is redundantly stored across multiple AWS facilities in your selected Region.
Amazon S3 is designed to durably store your data even if two AWS facilities experience concurrent data loss.
Seamless scaling
Amazon S3 automatically manages the storage behind your bucket even when your data grows. Because of
this system, you can get started immediately and have your data storage grow with your application needs.
Amazon S3 also scales to handle a high volume of requests. You don’t need to provision the storage or
throughput, and you are billed for only what you use.
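A minimal AWS CLI sketch of working with Amazon S3; the bucket and file names are placeholders, and the commands assume that the AWS CLI is installed and configured with appropriate IAM permissions.
# Create a bucket in a specific Region (bucket names must be globally unique)
aws s3 mb s3://my-example-bucket-12345 --region eu-west-2
# Upload a local file (the whole file is stored as one object)
aws s3 cp backup.tar.gz s3://my-example-bucket-12345/backups/backup.tar.gz
# List the objects in the bucket
aws s3 ls s3://my-example-bucket-12345 --recursive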
Common use cases for Amazon S3
Amazon S3 pricing
Storage class type
- Standard storage is designed for the following:
  » 11 9s of durability
  » 4 9s of availability
- Standard-Infrequent Access (S-IA) is designed for the following:
  » 11 9s of durability
  » 3 9s of availability
Data transfer
- Pricing is based on the amount of data that is transferred out of the S3 Region.
- Data transfer in is free, but you incur charges for data that is transferred out.
AMAZON ELASTIC COMPUTE CLOUD (AMAZON EC2)
Amazon EC2
1. AMI
An AMI:
•Is a template that is used to create an EC2 instance (which is a virtual machine, or VM, that runs in the AWS
Cloud)
•Contains a Windows or Linux operating system
•Often also has some software preinstalled
AMI choices:
•Quick Start –Linux and Windows AMIs that are provided by AWS
•My AMIs –Any AMIs that you created
•AWS Marketplace –Preconfigured templates from third parties
•Community AMIs –AMIs shared by others; use at your own risk
AMI Benefits
Repeatability
•Use an AMI to launch instances repeatedly with efficiency and precision.
Reusability
•Instances that are launched from the same AMI are identically configured.
Recoverability
•You can create an AMI from a configured instance as a restorable backup.
•You can replace a failed instance by launching a new instance from the same AMI
2. Instance type
The instance type that you choose determines the following:
•Memory (RAM)
•Processing power (CPU)
•Disk space and disk type (storage)
•Network performance
Naming
•Example: t3.large
•t is the family name.
•3 is the generation number.
•large is the size.
Instance families and example use cases
T3 - websites/applications, build servers, test and staging environments, microservices, code repositories.
C5 - scientific modelling, batch processing, ad serving, highly scalable multiplayer gaming, video encoding.
R5 - high-performance databases, data mining, applications that perform real-time processing of big data.
3. Network settings
4. IAM role
•Does software on the EC2 instance need to interact with other AWS services?
•If yes, attach an appropriate AWS Identity and Access Management (IAM) role.
•An IAM role that is attached to an EC2 instance is kept in an instance profile.
5. User data
•Use user data scripts to customize the runtime environment of your instance.
•Script runs the first time the instance starts.
6. Storage options
•Configure the root volume where the guest operating system is installed.
7. Tags
•Potential benefits of tagging include filtering, automation, cost allocation, and access control.
8. Security group
•A security group is a set of firewall rules that control traffic to the instance.
•It exists outside of the instance's guest OS.
•Create rules that specify the source and the ports that network communications can use.
•Specify the port number and the protocol, such as Transmission Control Protocol (TCP), User
Datagram Protocol (UDP), or Internet Control Message Protocol (ICMP).
•Specify the source (for example, an IP address or another security group) that is allowed to use the
rule.
9. Key pair
•At instance launch, you specify an existing key pair or create a new key pair.
•For Windows AMIs, use the private key to obtain the administrator password that you need to log in to your
instance.
•For Linux AMIs, use the private key to use SSH to securely connect to your instance.
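The following is a hedged sketch of launching an instance and connecting to it; the AMI ID, security group ID, key pair name, and IP address are placeholders, and the commands assume a configured AWS CLI.
# Launch one t3.micro instance from an AMI (all IDs below are placeholders)
aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t3.micro --key-name my-key-pair --security-group-ids sg-0123456789abcdef0
# For a Linux AMI, connect over SSH by using the private key of the key pair
ssh -i my-key-pair.pem ec2-user@203.0.113.25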
EC2 instance lifecycle
On-Demand Instances
•Pay by the hour
•No long-term commitments
•Eligible for the AWS Free Tier
Dedicated Hosts
•A physical server with EC2 instance capacity that is fully dedicated to your use
Dedicated Instances
•Instances that run in a VPC on hardware that is dedicated to a single customer
Reserved Instances
•Full, partial, or no upfront payment for the instance that you reserve
•Discount on hourly charge for that instance
•1-year or 3-year term
Spot Instances
•Run when they are available and your bid is above the market price
•Can be interrupted by AWS with a 2-minute notification
•Include the following interruption options: terminated, stopped, or hibernated
•Can be significantly less expensive than On-Demand Instances
•Are a good choice when you have flexibility in when your applications can run
Per-second billing is available for On-Demand Instances, Reserved Instances, and Spot Instances that run
Amazon Linux or Ubuntu.
Amazon EC2 pricing models: Benefits
AWS re/start
Linux
INTRODUCTION TO LINUX
Distributions
A Linux distribution is a packaged version of Linux that a group of individuals or a company develops. It
includes the core operating system functionality (kernel) and additional complementary tools and software
applications.
Typically downloaded and installed in various formats: for example, by using an Amazon Machine Image (AMI)
for Amazon Linux 2.
Examples: Fedora (sponsored by Red Hat and the source of RHEL, from which Amazon Linux 2 is derived), Debian, and OpenSUSE.
Linux Components
Kernel: refers to the core component of an operating system. Controls everything in the operating system,
including the allocation of CPU time and memory storage to running programs, and access to peripheral
devices.
Daemons: a computer program that runs in the background and is not under the control of an interactive user.
It typically provides a service to other running programs. Daemon process names traditionally end with the letter d.
Examples: syslogd (system message logging), sshd (secure communication between client and server).
Applications: software that provides a set of functions that help a user perform a type of task or activity (word
processor/web browser/email client/media player).
Data files: contain the information that programs use and can have different types of data (music file/text
file/image file). Typically grouped in directories.
Each data file has a name that uniquely identifies it: [directoryName/]fileName[.extension]
Configuration files: a special type of file that stores initial settings or important values for a system program.
These values configure the behaviour of the associated program or capture the data that the program uses.
When you use the CLI, the shell that you select defines the list of commands and functions that you can run. A
shell interprets the command that you type and invokes the appropriate kernel component that runs the
command.
Bash Shell is the default Linux shell.
Manual pages
Contain a description of the purpose, syntax, and options for a Linux command. You access the man pages by
using the man command.
The man command displays documentation information for the command that you specify as its argument.
- Name: The name and a brief description of the purpose of the command
- Synopsis: The syntax of the command
- Description: A detailed description of the command’s usage and functions
- Options: An explanation of the command’s options
You can also search a command’s man page by using the forward slash (/) character: /<searchString>
To exit the manual pages, enter q
Linux Distributions
A Linux distribution includes the Linux kernel and the tools, libraries, and other software applications that the
vendor has packaged. Most widely used distributions are derived from the following sources:
•Fedora: Red Hat, an IBM company, mainly sponsors this distribution. Fedora is used to develop, test, and
mature the Linux operating system. Fedora is the source of the commercially distributed RHEL, from which the
Amazon Linux 2 and CentOS distributions are derived.
•Debian: This Linux distribution adheres strongly to the free software principles of open source. The Ubuntu
distribution is derived from Debian, and the British company Canonical Ltd. maintains it.
•OpenSUSE: The German company SUSE sponsors this distribution, which is used as the basis for the
company’s commercially supported SUSE Enterprise Linux offering.
Amazon Linux 2
Latest Linux operating system that AWS offers. It is designed to provide a stable, secure, and high-performance
runtime environment for applications that run on Amazon Elastic Compute Cloud (Amazon EC2). It supports
the latest EC2 instance type features and includes packages that facilitate integration with AWS.
LINUX COMMAND LINE
The Linux login workflow consists of the following three main steps:
1. After a network connection is made, you connect by using a client program such as PuTTY or the terminal on
macOS.
2. You enter a user name and password at the login prompt.
3. The name is checked against the /etc/passwd file, and the password is checked against the /etc/shadow file.
NOTE: To copy and paste into the CLI use the right mouse click
The Linux command prompt or command line is a text interface to your Linux computer. It is commonly
referred to as the shell, terminal, console, or prompt.
Useful Commands
whoami: show the current user's user name when the command is invoked
id: helps identify the user and group name and numeric IDs (UID or group ID) of the current user or any other
user on the server.
hostname: displays the TCP/IP hostname. The hostname is used to either set or display the system's current
host, domain, or node name.
uptime: indicates how long the system has been up since the last boot.
date: displays the current date and time in a given format. It can also set the system date.
cal: used to display a calendar. If no arguments are specified, the current month is displayed.
clear: used to clear the terminal screen (clears all text on the terminal screen and displays a new prompt).
echo: places specified text on the screen. It is useful in scripts to provide users with information as the script
runs. It directs selected text to standard output or in scripts to provide descriptions of displayed results.
history: views the history file.
- It displays the current user's history file
- Up and down arrow keys cycle through the output or the history file
- This command can be run by using an event number: for example, !143
Note: If you make a mistake when writing a command, don't re-enter it. Use the history command to call the
command back to the prompt, and then edit the command to correct the mistake.
You should use the history command in the following use cases:
- Accessing history between sessions
- Removing sensitive information from the history: for example, a password that is entered
into a command argument
touch: can be used to create, change, or modify timestamps on existing files. Also used to create a new empty
file in a directory.
To create a new file using the touch command, enter touch file_example_1
cat: used to show contents of files.
stdin: Standard Input is the device through which input is normally received: for example, a keyboard or a
scanner.
stdout: Standard Output is the device through which output is normally delivered: for example, the display
monitor or a mobile device screen.
stderr: Standard Error is used to print the error messages on the output screen or window terminal.
The Tab key automatically completes commands and file or directory names. Bash tab completion saves time
and provides greater accuracy. To use this feature, you must enter enough of the command or file name for it to
be unique.
Pressing the Tab key twice shows all matching options.
USERS AND GROUPS
Managing users
- User information can be stored locally or on another server accessible through a network.
- When information is stored locally, Linux stores it in the /etc/passwd file.
- Best practice is to assign one user per account (do not share accounts).
Managing Groups
A group is a set of accounts and a convenient way to associate user accounts with similar security needs
(easier to grant permissions to a group than to individual users).
The storage location for groups is the /etc/group file.
User Permissions
The root user has the permissions to access and modify anything on the system (administrator).
A standard user is a user that only has rights and file permissions that the user’s role requires.
Note: use caution with root. Do not log in to the system with administrative permissions. Log in as a standard
user, and then elevate permissions only when necessary (mistakes can make system inoperable).
Root user command prompt ends with #
Standard user command prompt ends with $
su command: stands for substitute user and can be used to log in as any user (not just root user) to accomplish
administrative tasks (allows switch to root user’s environment).
Delegate specific commands to specific users by adding them to /etc/sudoers
sudo command: to run a command with one-time root permissions. sudo requires the password of the current
user whereas su requires the password of the substitute account. sudo is safer as it does not require password
exchange.
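A brief sketch contrasting su and sudo; the user name is a placeholder, and sudo works only for accounts listed in /etc/sudoers.
su - mary             # switch to user mary; prompts for mary's password
sudo cat /etc/shadow  # run one command with root permissions; prompts for the current user's password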
IAM is an AWS service that is used to manage users and access to resources. It determines who can launch,
configure, manage, and remove resources. It provides control over access permissions for people and for
systems or other applications that might make programmatic calls to AWS resources.
EDITING FILES
Vim: Command file editor (default text editor for virtually all Linux distributions)
nano: Command file editor
gedit: GUI application (requires GNOME, Xfce, or K Desktop Environment (KDE))
Modes
- Command: Keystrokes issue commands to Vim
- Insert: Keystrokes enter content into the text file
Command Mode
Keystroke Effect
x Delete the character at the cursor
G Move the cursor to the bottom of the file
gg Move the cursor to the top of the file
42G Move the cursor to line 42 of the file
/keyword Search the file for keyword
y Yank text (copy)
p Put text (paste)
r Replace character under the cursor
i Move to insert mode
ZZ Save changes and exit Vim
dd Delete the line at the cursor
u Undo the last command
/g Global
:s/old/new/g Globally find old and replace with new
O Enter insert mode and open a new line above the cursor
A Enter insert mode and append text at the end of the line
h, j, k, l Move cursor left, down, up, and right
Keystroke Effect
:w Writes file (save)
:q Quits Vim
:wq Writes file then quits Vim
:wq! Writes file and forces quit
:q! Quits Vim without saving changes
nano commands
Command Effect
CTRL+X Quit nano
CTRL+O Save the file
CTRL+K Cut text
CTRL+U Paste text
CTRL+G Get help
^G Display help text
^X Close current file buffer and exit nano
^O Write the current file to disk
^W Search for a string or a regular expression
^Y Move to previous screen
^V Move to next screen
^K Cut the current line and store it in cutbuffer
^U Uncut from cutbuffer into the current line
^C Display the position of the cursor
^_ Go to the line and column number
^\ Replace a string or a regular expression
M-W Repeat the last search
M-^ or M-6 Copy the current line and store it in the cutbuffer
^E Move to the end of the current line
M-] Move to the matching bracket
M-< or M-, Switch to the previous file buffer
M-> or M-. Switch to the next file buffer
WORKING WITH THE FILE SYSTEM
File names
- They are case sensitive
- They must be unique within the directory
- They should not contain / or spaces
Extensions
Optional and not necessarily mapped to applications
File Systems
A way of naming, retrieving, and organising data on the storage disk. A file is located inside a directory.
Directory Function
/ Root of the file system
/boot Boot files and kernel
/dev Devices
/etc Configuration files
/home Standard users' home directories
/media Removable media
/mnt Network drives
/root Root user home directory
/var Log files, print spool, network services
ls command options
Option Description
-l Long format (shows permissions)
-h File sizes reported in a human-friendly format
-a Shows all files, including hidden files
-R Lists subdirectories
--sort=extension or -X Sorts alphabetically by file extension
--sort=size or -S Sorts by file size
--sort=time or -t Sorts by modification time
--sort=version or -v Sorts by version number
You can combine options: ls -al displays hidden files and file details.
Exploring files and directories
Command Description
cat Shows contents of a file
cp Copies a file
rm Removes a file
mkdir Creates a directory (several with one command: mkdir dir1 dir2 dir3)
mv Moves a file from one directory to another: mv source destination
rmdir Deletes existing empty directories: rmdir <DirectoryName>
If directory not empty, use rm -r <DirectoryName>
pwd Current location in the file system
less Views file contents one page at a time; allows scrolling both backwards and forwards
more Views file contents one page at a time, scrolling forwards only
cp command
Option Description
cp -a Archive files
cp -f Force copy by overwriting the destination file if needed
cp -i Interactive - Ask before overwrite
cp -l Link files instead of copy
cp -n No file overwrite
cp -u Update - copy when source is newer than destination
rm command
Option Description
rm -d Removes a directory; the directory must be empty: rm -d dir
rm -r Allows you to remove a non-empty directory: rm -r dir
rm -f Never prompt user (useful when deleting a directory with many files)
rm -i Prompts user for confirmation for each file
rm -v Display the names of deleted files
Absolute: The complete path to the resource from the root of the file system (shows entire folder structure)
Relative: The path to the resource from the current directory
cd command
Used to move from one directory to another
Use ../ to go up a single directory at a time
WORKING WITH FILES
Important Commands
hash: Used to see a history of programs and commands that are run from the command line. The information is
maintained by the shell in a hash table.
Syntax: hash [-lr] [-p pathName] [-dt] [commandName ...]
find: Searches for files by using criteria such as the file name, the size, and the owner.
Syntax: find [directory to start from][options][what to find]
tar: Bundles multiple files into one file (created bundle is a tarball): tar -cvf tarball.tar <file1 file2>
Links
Can use links to refer to the same file by using different names and to access the same file from more than one
location in the file system.
Every file has an inode object that uniquely identifies its data location and attributes. Identified with a unique
number.
Hard Link - Points to the original file's inode. If the file is deleted, the data still exists until the hard link is
deleted.
Symbolic - Points to the original file name or a hard link. If the file is deleted, the soft link is broken until you
create a new file with the original name.
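A minimal sketch of find, tar, and link creation; the file names are placeholders.
# Find all .log files under /var
find /var -name "*.log"
# Bundle two files into a tarball, then list its contents
tar -cvf archive.tar notes.txt report.txt
tar -tvf archive.tar
# Create a hard link and a symbolic link to the same file
ln notes.txt notes-hard.txt     # hard link: points to the same inode
ln -s notes.txt notes-soft.txt  # symbolic link: points to the file name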
MANAGING FILE PERMISSIONS
Permission Types
Read (r): Gives the user control to open and read a file
Write (w): Gives the user the control to change the file's content
Execute (x): Gives the user the ability to run a program
(-): In a long listing, the first character indicates the file type; a dash (-) indicates a regular file
Permission Modes
Symbolic: A combination of letters and symbols to add permissions or to remove any set permissions.
Note: with the chmod command, the user can change the permissions on a file.
chmod allows the user to set permissions for files and directories.
ls command is used to list files and directories. Option -l (lowercase L) shows the file or directory, size,
modified date and time, file or folder name, and owner of the file and its permissions.
Ownership
A user is the owner of the file. Every file in the system belongs to one user name/file owner.
User: can create a new file or directory. Ownership is set to the user ID of the user who created the file.
Group: can contain multiple users. Users who belong to that group will have the same permissions to access
the file.
Other: means the user did not create the file and does not belong to a user group that could own the file.
Default Permissions
The root account has superuser privileges and can run any command. Non-root users can run commands with
root privileges by using the sudo command.
chown command
The user (owner) and associated group for a file or directory can be changed using the chown command to
change ownership.
[options] - The chown command can be used with or without options
[user] - indicates the user name/ID of the new owner of the file or directory being altered
[:] - use when changing a group of the file or directory
[group] - changing the ownership of the associated group is optional
[file(s)] - the source file or directory that will be altered
chmod command
The command that is used to change permissions is the chmod command. The change mode or chmod
command is used to set permissions on files and directories.
Permission Value
Read 4
Write 2
Execute 1
All permissions 7
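A short sketch of chown and chmod in numeric and symbolic modes; the user, group, and file names are placeholders.
sudo chown mary:developers report.sh  # change the owner to mary and the group to developers
chmod 754 report.sh                   # numeric: owner rwx (7), group r-x (5), other r-- (4)
chmod u+x,g-w report.sh               # symbolic: add execute for the user, remove write for the group
ls -l report.sh                       # verify the new permissions and ownership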
Bash Metacharacters
Special characters that have a meaning to the shell and that users can use for faster and more powerful
interactions with Bash (especially useful when writing scripts).
Control output, wildcards, and chaining commands
Metacharacter Description
* (star) Any number of any character (wildcard)
? (hook) Any one character (wildcard)
[characters] Any matching characters between brackets (wildcard)
`cmd` or $(cmd) Command substitution - uses backticks (`) or $( ), not single quotation marks (' ')
; Chain commands together, all written on a single line
~ Represents the home directory of the user
- Represents the previous working directory
Redirection operators
Operator Description
> Sends the output of a command to a file (by default will overwrite existing file
content)
< Receives the input for a command from a file
| Runs a command and redirects its output as input to another command. You can
chain several commands by using pipes (multi-stage piping), which is referred to as a pipeline.
>> Appends the output of a command to the existing contents of a file
2> Redirects errors that are generated by a command to a file
2>> Appends errors that are generated by a command to the existing contents of a file
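A small sketch that combines wildcards, the redirection operators from the table above, and a pipe; the file names are placeholders.
ls *.txt > files.txt       # overwrite files.txt with the list of .txt files
date >> files.txt          # append the current date to the same file
sort < files.txt           # use files.txt as the input to sort
ls /missing 2> errors.txt  # send error output to errors.txt
ls /etc | wc -l            # pipe: count the entries in /etc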
Using | grep
grep is commonly used after another command, along with a pipe ( | ). Used to search the piped output of a
previous command.
cut command
Cuts sections from lines of text by character, byte position, or delimiter.
sed command
Non-interactive text editor and edits data based on the rules that are provided (can insert, delete, search, and
replace).
sort command
Sorts file contents in a specified order: alphabetical, reverse order, number, or month.
awk command
Used to write small programs to transform data
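A hedged pipeline sketch that applies these text-processing commands to /etc/passwd, a file that exists on most Linux systems.
# List user names (field 1) and shells (field 7), sorted alphabetically
cut -d: -f1,7 /etc/passwd | sort
# Show only the lines that mention bash
grep "bash" /etc/passwd
# Replace ':' with a space, then print the first field with awk
sed 's/:/ /g' /etc/passwd | awk '{print $1}'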
MANAGING PROCESSES
Programs
A series of instructions given to a computer that tell it which actions to take.
Process
A running program, identified by a process ID number (PID).
States of a process
ps command
The ps (process status) command gives an overview of the processes that are currently running in the OS.
Displays: - Process ID (PID)
- Terminal type (TTY)
- Time process has been running
- Command (CMD) which launched the process
Option Description
-e List all current processes
-b Use batch mode
-fp <number> List processes by PID
pidof command
Shows the PID (process ID) of the current running program (pidof sshd will show the PID of sshd).
pstree command
Displays the current running processes in a tree format. Identical branches are merged and shown in square
brackets [ ], and threads appear in curly braces { } under their parent process.
Example: pstree [options] [pid, user]
top command
You can use the top command to examine the processes that run on a server and to examine resource
utilization. Displays a real-time system summary and information of a running system. It displays information
and a list of processes or threads being managed.
Options Description
-h and -v Displays usage and version information
-b Starts top in Batch mode
kill command
Explicitly ends processes usually when the process fails to quit on its own.
The following are kill command signals that you can use:
• -9 SIGKILL–Stops any process immediately with no graceful exit
• -15 SIGTERM–Requests a graceful exit so the process can clean up before terminating (the default signal)
• -19 SIGSTOP–Pauses the process; it can be resumed later
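A minimal sketch of finding and ending a process; the PID is a placeholder obtained from ps or pidof.
ps -ef | grep sshd  # find the process and note its PID
pidof sshd          # or get the PID directly
kill -15 2537       # ask the process to exit gracefully (SIGTERM)
kill -9 2537        # force it to stop if it does not exit (SIGKILL)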
crontab command
Stands for cron table, is made up of a list of commands, and is also used to manage this table (minute, hour,
day of the month, month of the year, day of the week, command).
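A short example crontab entry; the script path is a placeholder. Edit the table with crontab -e and list it with crontab -l.
# minute hour day-of-month month day-of-week command
# Run a backup script at 02:30 every day
30 2 * * * /usr/local/bin/backup.sh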
MANAGING SERVICES
systemctl command
Has many subcommands, including status, start, stop, restart, enable, and disable. Services provide
functionality such as networking, remote administration, and security.
Restarting the entire server would mean that the reboot would also stop all the properly running services on
the server. Restarting only the failing service means that the healthy services can continue to run.
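A brief sketch of managing a service with systemctl, using sshd as the example service name.
sudo systemctl status sshd   # check whether the service is running
sudo systemctl restart sshd  # restart only the failing service, not the whole server
sudo systemctl enable sshd   # start the service automatically at boot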
Monitoring Services
Command Description
lscpu List CPU information
lshw List hardware
du Check file and directory sizes
df Display disk size and free space (df -h displays in human-readable format)
fdisk List and modify partitions on the hard drive
vmstat Indicate use of virtual memory
free Indicate use of physical memory
top Display system’s processes and resource usage (use to determine what
process is responsible for high CPU usage)
uptime Indicate the amount of time that the system has been up, number of users, and central
processing unit (CPU) wait time
Amazon CloudWatch
Amazon CloudWatch monitors the health and performance of your AWS resources and applications.
•It offers monitoring of Amazon Elastic Compute Cloud (Amazon EC2) instances, such as CPU usage, disk reads,
and writes.
•You can create alarms. For example, when CPU usage exceeds a certain threshold, a notification is sent
through Amazon Simple Notification Service (Amazon SNS).
THE BASH SHELL
What is a shell?
- The primary purpose of a shell is to allow the user to interact with the computer operating system
- A shell accepts and interprets commands
- A shell is an environment in which commands, programs, and shell scripts are run
- Bash is the default Linux shell
Shell variables
A variable is used to store values. Variables can be a name, a number, or special characters; by default,
variables are strings. Scripts or other commands can call shell variables. The values that these shell variables
represent are then substituted into the script or command.
By convention and as a good practice, the name of a variable that a user has created is in lowercase.
Environment (system) variable names are capitalized. Also, there is no space before or after the equal (=) sign,
and the variable name must contain no spaces or special characters. A variable name can contain
only letters (a to z or A to Z), numbers (0 to 9), or the underscore character ( _ ).
A value can be assigned as a number, text, file name, device, or other data type. Variables are assigned by
using the = operator. The value of the variable is located to the right of the = operator.
To display the value of a variable, use echo $VARIABLE_NAME. Also use the echo command to view the
output from environment variables or system-wide variables.
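A minimal sketch of assigning and reading variables; the names and values are arbitrary examples.
city="London"         # no spaces around the = operator
echo $city            # displays: London
echo "Home is $HOME"  # environment variables are referenced the same way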
Environment variables
Environment variables are structurally the same as shell variables; they are no different from each other. Both
use the key-value pair, and they are separated by the equal (=) sign.
Common Environment Variables
env command
The env command is a shell command for Linux that displays the environment variables. You use this command
to display your current environment, and it can be useful in testing.
alias command
By using aliases, you can define new commands by substituting a long command with a short one. Aliases can
be set temporarily in the current shell, but it is more common to set them in the user's .bashrc file so that they
are permanent.
How it works: Enter the alias command, the desired alias name, and then the command to run. Ensure that the
value of the command is in single quotation marks.
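A short example of defining and using an alias; the alias name is arbitrary, and adding the same line to ~/.bashrc makes it permanent.
alias ll='ls -al'  # define ll as a shorthand; note the single quotation marks
ll /etc            # runs ls -al /etc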
unalias command
The unalias command removes the configured alias if it is not configured in the .bashrc file.
BASH SHELL SCRIPTS
Shell scripts
The # character
Bash ignores lines that are preceded with #. The # character is used to define comments or notes to the user
that might provide instructions or options.
- The first line defines the interpreter to use (it gives the path and name of the interpreter)
- Scripts must begin with the directive for which shell will run them
- The location and shell can be different
- Each shell has its own syntax, which tells the system what syntax to expect
Useful commands
Command Description
echo Displays information on the console
read Reads a user input
subStr Gets the substring of a string
+ Adds two numbers or combines strings
file Opens a file
mkdir Creates a directory
cp Copies files
mv Moves or renames files
chmod Sets permissions on files
rm Deletes files, folders, etc.
ls Lists directories
You can use all the shell commands that you saw earlier in the course, such as grep, touch, and redirectors ( >,
>>, <, << )
Arguments
Arguments are values that you want the script to act on and are passed to the script by including them in a
script invocation command separated by spaces.
Expressions
Expressions are evaluated and usually assigned to a variable for reference later in the script.
Conditional statements
Conditional statements allow for different courses of action for a script depending on the success or failure of
a test.
Shell scripts with conditional statements can be used when:
- The script asks the user a question that has only a few answer choices
- Deciding whether the script must be run
- Ensuring that a command ran correctly and taking action if it failed
if statement
If the first command succeeds with an exit code of 0 (success), then the subsequent command runs. An if
statement must end with the fi keyword.
if - else statement
if <condition>; then <command>; elif <other condition>; then <other command>; else <default command>; fi
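A hedged sketch of a small script that uses an argument and an if-else statement; the file-existence check is illustrative.
#!/bin/bash
# Usage: ./check_file.sh <filename>
if [ -f "$1" ]; then
    echo "$1 exists"
else
    echo "$1 does not exist"
fi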
test command
Evaluates an expression and returns an exit status of 0 (true) or nonzero (false); it is commonly used as the
condition of an if statement or a loop.
Loop statements
break: Stops running the entire loop (exits before the condition of the loop is met, for example because the
counter reaches the value passed as a parameter).
continue: Terminates the current loop iteration and returns control back to the top of the loop.
true and false commands: Are used with loops to manage their conditions. These commands return a
predetermined exit status (either a status of true or a status of false).
exit command
- Useful in testing
- Can return a code status. Each code can be associated with a specific error
  o For example: exit 0: The program has completed without any error
    exit 1: The program has an error
    exit n: The program has a specific error
- $? returns the processing status of the last command that ran (see the loop sketch below).
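A brief loop sketch showing break, continue, and the $? status; the values are arbitrary.
#!/bin/bash
for n in 1 2 3 4 5; do
    if [ "$n" -eq 2 ]; then
        continue  # skip this iteration and return to the top of the loop
    fi
    if [ "$n" -eq 4 ]; then
        break     # stop running the entire loop
    fi
    echo "n is $n"
done
echo "Exit status of the last command: $?"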
Security! Check that the script contains only the required functionality
Test! Test all scripts to confirm that they function as expected
SOFTWARE MANAGEMENT
Managing software
The approach for managing software varies depending on the Linux distribution type. Features such as the
software package format and the utility tools used to install, update, and delete packages are different
depending on the source of the distribution.
Debian Method
A package manager installs, updates, and deletes software that is bundled in a package. A package contains
everything that is needed to install the software, including the precompiled code, documentation, and
installation instructions.
Repositories
Are servers that contain software packages. Software packages are retrieved from a repository that can be
hosted in an online or local system. When you use a package manager, you define the location of the
repositories that contain the software packages that the manager can access. This repository information is
typically defined in a package manager configuration file. For example, for the YUM package manager, the
repository information is stored in the /etc/yum.conf file.
Example repositories for Amazon Linux 2:
- amzn2-core
- amzn2extra-docker
Using the YUM package manager
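A hedged sketch of typical YUM commands on Amazon Linux 2; the package name httpd is only an example.
sudo yum update -y               # update installed packages from the configured repositories
sudo yum install httpd -y        # install a package (example: Apache HTTP Server)
sudo yum remove httpd -y         # remove the package again
yum list installed | grep httpd  # check whether a package is installed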
Installing software from source code
The following are the typical steps involved in installing software from source code:
1. Download the source code package: Software source code packages are typically compressed archive files
called a tarball.
2. Unarchive the package file: Tarballs usually have the .tar.gz file extension and can be unarchived and
decompressed using the tar command.
3. Compile the source code: A GCC compiler can be used to compile the source code into binary code.
4. Install the software: Once the source code has been compiled, install the software by following the
instructions that are typically included in the package.
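A minimal sketch of those four steps; the package name, URL, and build commands are placeholders, because real packages document their own build instructions.
curl -O https://www.example.com/mySoftware-1.0.tar.gz  # 1. download the tarball (placeholder URL)
tar -xzf mySoftware-1.0.tar.gz                          # 2. unarchive and decompress it
cd mySoftware-1.0
./configure && make                                     # 3. compile the source code
sudo make install                                       # 4. install the software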
wget and curl are commonly used to download files to a server. Both support the HTTP, HTTPS, and File
Transfer Protocol (FTP) protocols and provide additional capabilities of their own:
wget https://www.example.com/mySoftware.zip
curl https://www.example.com/mySoftware.zip
CURL example: Installing the AWS CLI
1. Download the AWS CLI installation file using the curl command. The -o option specifies the file name
that the downloaded package is written to (in this case, awscliv2.zip).
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
2. Unzip the installation file. When the file is unzipped, a directory named aws is created under the
current directory.
unzip awscliv2.zip
3. Run the installation program. The installation command uses a file named install in the newly
unzipped aws directory.
sudo ./aws/install
MANAGING LOG FILES
What is logging?
• Logs keep records of events on the system, which helps with auditing.
Logging can help troubleshoot issues: What or who caused an error? Did anyone wrongfully access a file, a
database, or a server?
Logs are a key to security audits (gathering information about the system) and service-level agreements
(troubleshooting must start within x hours after an issue occurs).
The /var/log/yum.log file lists programs that were installed or updated; view it with sudo cat /var/log/yum.log.
(YUM is a package management utility to install, update, and remove software.)
Logging levels
System Logs
cat, less, more, tail, and head are all commands that are useful to read logs. Using the pipe redirector | with
grep is an efficient way to look for a specific pattern in a log.
You can also open the files using editors such as vi or gedit.
The grep command searches the given file for lines that contain a match to the specified strings or words.
Add the grep command when you look for a specific string of text in log files.
cat yourlog.log | grep ERROR
Important Log files
The lastlog command retrieves user information from the /var/log/lastlog file and outputs it in the console
Log rotation
Log rotation can help with the following in regard to bulky logs:
- It is a way to limit the total size of the logs that are retained.
- It still helps analysis of recent events.
Log rotation is an automated process that is used in system administration where dated log files are archived.
AWS re/start
Networking
INTRODUCTION TO NETWORKING
A computer network is a collection of computing devices that are logically connected together to communicate
and share resources.
A network has hosts. A host is a node that has a unique function. Other devices connect to hosts so they can access
data or other services. An example of a host is a server, because a server can provide access to data, run an
application, or provide a service.
What is Data?
In computing, data is bits and bytes; each bit has a value of zero or one. Data can be sent over a network and
saved.
1. Data starts from the source computer. The source computer sends data to the target computer. As the data
leaves the source computer, it is processed and transformed by the different functions in the OSI layers, from
layer 7 down to layer 1. During transmission of data, it can also be encrypted, and additional transmission-
related information, which are called headers, are added to it.
2. Next, the data travels to the application layer (layer 7). This layer provides the interface that enables
applications to access the network services.
3. Next is the presentation layer (layer 6). This layer ensures that the data is in a usable format and handles
encryption and decryption.
4. The data moves to the session layer (layer 5). This layer maintains the distinction between the data of
separate applications.
5. Next is the transport layer (layer 4). This establishes a logical connection between the source and
destination. It also specifies the transmission protocol to use, such as Transmission Control Protocol (TCP).
6. The network layer (layer 3) decides which physical path the data will take. At layer 3 (the network layer), a
message or data is called a packet. Packets are associated with Internet Protocol (IP) addresses.
7. The data link layer (layer 2) defines the format of the data on the network. At layer 2 (the data link layer), a
message or data is called a frame. Frames are associated with a Media Access Control (MAC) address, which is
known as a physical address.
8. Finally, the data travels to the physical layer (layer 1), which transmits the raw bitstream over the physical
network.
9. After the data has been transformed through the OSI layers, it is in a package format that is ready to be
transmitted over the physical network.
10. Once the target computer receives the data package, it unpacks the data through the OSI layers, but in
reverse, from layer 1 (physical) through 7 (application).
Layer 1 – Physical (cables, physical transmission of bits and volts); in AWS, this layer maps to Regions and Availability Zones.
Networking components
Client
A client is a computer hardware device that allows users to access data and a network.
The client makes the request to the server.
Server
A server provides data or services to other computers over the network in response to client requests.
Cables
Are used to physically connect networks together. Most network nodes are linked together by using some type
of cabling. The three common cable types are coaxial, fiber optic, and twisted pair.
Switch
Router
A router is a network device that connects multiple network segments into one network.
•It connects multiple switches and their respective networks to form a larger network (that is, it acts as a
switch between networks).
•It can also filter the data that goes through it, which enables data to be routed differently.
•This device operates at layer 3 (the network layer) of the OSI model.
• A router can filter traffic, while a switch can only switch traffic.
Modem
From the standpoint of geographical span, two of the most common types of computer networks are local
area networks (LANs) and wide area networks (WANs).
A LAN connects devices in a limited geographical area, such as a floor, building, or campus.
LANs commonly use the Ethernet standard for connecting devices, and they usually have high data-transfer
rates.
A WAN connects devices in a large geographical area, such as multiple cities or countries.
WANs use technologies such as fiber-optic cables and satellites to transmit data, and they are typically used to connect
LANs together.
Network topologies
• A topology is a pattern (or diagram) that shows how nodes connect to each other.
• Computer networks use different topologies to share information.
• The two types of topology are:
• Physical topology – Refers to the physical layout of wires in the network.
• Logical topology – Refers to how data moves through the network.
Physical topologies
Bus topology: All nodes are connected to a single cable. The logical topology and data flow on the network follow the
route of the cable, moving in one direction.
Star topology: The logical topology works with the central switch managing data transmission. Data that is
sent from any node on the network must pass through the central switch to reach its destination. The central
switch can also function as a repeater to prevent data loss.
Mesh topology: A mesh topology is a complex structure of connections that are similar to peer-to-peer, where
the nodes are interconnected. Mesh networks can be full mesh or partial mesh.
Hybrid topology: A hybrid topology combines two or more different topology structures.
VPC topology: A virtual private cloud (VPC) is a virtual network in which you can launch AWS resources that you define.
It is a logical network.
A network management model is a representation of how data is managed, and how applications are hosted in
a network.
The two most common models for a LAN are:
• Client-server
• Peer-to-peer
Client-server model
The data management and application hosting are centralized at the server and distributed to the clients.
All clients on the network must use the designated server to access shared files and information that are
stored on the serving computer.
Peer-to-peer model
In this model, each node has its own data and applications and is responsible for its own management and
security.
The peer-to-peer model is a distributed architecture that shares tasks or workloads among peers.
Network Protocols
A network protocol defines the rules for formatting and transmitting data between devices on a network.
It typically operates at layer 3 (Network) and layer 4 (Transport) of the OSI model.
Connection-oriented protocol
• It establishes a session between the sender and the receiver before it transmits data and confirms that the
destination received the data.
• TCP is an example of a connection-oriented protocol.
Connectionless protocol
• It sends a message from one endpoint to the other, without ensuring that the destination is available and
ready to receive the data.
• It does not require a session between the sender and receiver.
• It uses asynchronous communication.
TCP/IP
When TCP and IP are combined they form the TCP/IP protocol suite. TCP/IP implements the set of protocols
that the internet runs on.
TCP uses a three-way handshake (SYN, SYN/ACK, ACK) to establish a connection. During this handshake, the protocol
establishes parameters that support the data transfer between two hosts.
There are also flags called reset (RST) flags, which are sent when a connection closes abruptly and causes an error.
INTERNET PROTOCOL (IP)
IP is a network protocol that establishes the rules for relaying and routing data in the internet. It uses IP
addresses to identify devices and port numbers to identify endpoints.
Supports subnetting to subdivide a network.
IP is a critical standard within the larger TCP/IP protocol suite when it is combined with the connection-
oriented Transmission Control Protocol (TCP). TCP/IP implements the set of protocols that provides a crucial
service for the internet because it enables the successful routing of network traffic among devices on a
network.
IP addresses
An IP address uniquely identifies a device on a network. Each device on a network has an IP address, and it
serves two main functions: • It identifies a host and a network.
• It is also used for location addressing.
An IP address works at layer 3 (the network layer) of the OSI model. IP addresses can be assigned to devices in a
dynamic or static way. IP addresses can also be made public or private.
Certain ranges are reserved for private IP addresses; they are defined in RFC 1918:
• 10.0.0.0 –10.255.255.255
• 172.16.0.0 –172.31.255.255
• 192.168.0.0 –192.168.255.255
A private IP address, such as 10.0.0.0, can only be accessed within a logically isolated private network.
With a public IP address such as 54.239.28.85 [amazon.com], anyone can publicly access this over the internet.
An IPv4 address uniquely identifies a device within a network. This address is a 32-bit number, written as four decimal
numbers (octets) separated by periods.
IP addresses - IPv6
The IPv6 standard extends the 32-bit IPv4 address space to 128 bits, which vastly increases the number of available
addresses. An IPv6 address is written as eight groups of hexadecimal numbers separated by colons (:).
• Increases security
• Handles packets more efficiently
• Improves performance
• The numbers identify both the network and device on the network
Dynamic IP addresses can change – useful for devices that leave and rejoin a network.
Static IP addresses do not change – useful for devices that are connected to frequently, such as printers.
EC2 instances can be assigned a static or dynamic IP address depending on the use case. If the instance is used
as a server, the best address to assign to it is a static IP address, also known as an Elastic IP address (EIP).
Otherwise, it is assigned a dynamic IP address; when the instance is stopped and restarted, the IP address
will change.
When a network is assigned a range of IP addresses, such as 10.0.0.0-10.255.255.255, a few addresses have a
special purpose. They are not assigned as host addresses.
• The default router address is typically the second address in the range: 10.0.0.1.
• The broadcast address is the last address in the range: 10.255.255.255
Converting an IP address into binary
To understand IP addressing, you can convert the number into binary. A binary number is expressed in the
base-2 numeral system, and it consists of only zeroes and ones:
• The value of 0 or 1 is known as a binary digit, or bit.
• In an IPv4 address, each of the four numbers between the dots is an 8-bit binary number. This means the
entire address is a 32-bit binary number.
• To convert between an 8-bit binary number and a decimal number, use the place values of each bit, from the most
significant to the least significant: 128, 64, 32, 16, 8, 4, 2, and 1.
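As a small illustration (not part of the course material), the following Python sketch converts an IPv4 address to its 32-bit binary form, one 8-bit octet at a time.

    # Convert an IPv4 address to its binary representation, octet by octet.
    def ipv4_to_binary(address):
        octets = address.split(".")          # e.g. ["192", "168", "0", "1"]
        return ".".join(format(int(octet), "08b") for octet in octets)

    print(ipv4_to_binary("192.168.0.1"))     # 11000000.10101000.00000000.00000001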
Port numbers
A port number allows a device in a network to further identify the other devices or applications that
communicate with it. It is also known as an endpoint.
• A port can be blocked by a firewall or, in a VPC, by an AWS feature such as a security group or network access control
list (network ACL). Depending on the rules, the source will not be able to send or receive traffic. Essentially, ports can
be blocked or opened to certain traffic for security reasons.
• When troubleshooting issues, you can use commands such as netstat, ss, and telnet. These commands are
used at layer 4 of the OSI, but some can be used at layer 7. These will be covered more into detail in later
sections.
• The command netstat confirms established connections, so if a port is blocked, it will not show as an
established connection.
• The command telnet confirms TCP connections to a server, such as a web server. Note that it can also be used at
layer 7 of the OSI model.
• The command ss is very similar to netstat and displays socket statistics for current connections.
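As a simple illustration of checking whether a port is reachable (similar to testing a connection with telnet), the following sketch is an assumed example; example.com and port 80 are placeholders.

    import socket

    def port_is_open(host, port, timeout=3):
        """Return True if a TCP connection to host:port can be established."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True               # TCP handshake completed
        except OSError:
            return False                  # blocked, closed, or unreachable

    print(port_is_open("example.com", 80))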
NETWORKING IN THE AWS CLOUD
Amazon VPC
Amazon VPC is a service that you can use to provision a logically isolated section of the AWS Cloud.
This service is called a virtual private cloud, or Amazon VPC. With an Amazon VPC, you can launch your AWS
resources in a virtual network that you define.
• You can spin up a logical environment of what was previously in a data center within minutes in the cloud.
• It is more cost-effective than maintaining equipment in a company data center; you pay for only the
resources that you use.
• It is designed so that companies can migrate and use AWS Cloud services easily.
• It’s secure, scalable, and reliable.
• It works with many innovative AWS and third-party services.
• You can create multiple Amazon VPCs and create test environments before they go live.
When you create a VPC, you must specify the IPv4 address range by choosing a CIDR block, such as
10.0.0.0/16.
• An Amazon VPC address range could be as large as /16 (65,536 addresses) or as small as /28 (16 addresses).
• Private IP ranges should be used according to RFC 1918.
Within each subnet CIDR block, AWS reserves the first four IP addresses and the last IP address:
• 10.0.0.0 – Network address
• 10.0.0.1 – VPC router
• 10.0.0.2 – Domain Name System (DNS) server
• 10.0.0.3 – Reserved for future use
• 10.0.0.255 – Network broadcast, not supported in the VPC, but still reserved
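As an illustration (the CIDR blocks are arbitrary examples), the following sketch uses Python's ipaddress module to show the size of a /16 VPC CIDR block and the five addresses that AWS reserves in a /24 subnet.

    import ipaddress

    vpc = ipaddress.ip_network("10.0.0.0/16")
    print(vpc.num_addresses)                  # 65536 addresses in a /16

    subnet = ipaddress.ip_network("10.0.0.0/24")
    hosts = list(subnet.hosts())              # excludes network and broadcast addresses
    reserved = [subnet.network_address, hosts[0], hosts[1], hosts[2],
                subnet.broadcast_address]     # the five addresses AWS reserves
    print([str(address) for address in reserved])
    # ['10.0.0.0', '10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.255']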
To determine the CIDR range, you can use a third-party calculator such as https://www.subnet-calculator.com/.
To determine the recommended range of private IP addresses that you can use, you can refer to the following
guide: https://datatracker.ietf.org/doc/html/rfc1918.
A virtual private cloud (VPC) is like a data center, but in the cloud. It is logically isolated from other virtual
networks, and you can spin up and launch your AWS resources within it in minutes.
Private Internet Protocol (IP) addresses are how resources within the VPC communicate with each other. An
instance needs a public IP address for it to communicate outside the VPC. The VPC will need networking
resources such as an Internet Gateway (IGW) and a route table in order for the instance to reach the internet.
An Internet Gateway (IGW) is what makes it possible for the VPC to have internet connectivity. It has two jobs:
perform network address translation (NAT) and act as the target for internet-bound traffic from the VPC. On a
route table, the route that points to an IGW has a destination of 0.0.0.0/0.
A route table contains routes for your subnet and directs traffic using the rules defined within the route table.
You associate the route table to a subnet. If an IGW was on a route table, the destination would be 0.0.0.0/0
and the target would be IGW.
Security groups and network access control lists (NACLs) work as the firewall within your VPC. Security
groups work at the instance level and are stateful: return traffic is automatically allowed, and by default all inbound
traffic is denied until you add allow rules. NACLs work at the subnet level and are stateless: return traffic must be
explicitly allowed by rules.
INTRODUCTION TO IP SUBNETTING
IP address
Networking is how you connect computers and other devices around the world so that they can communicate
with one another. Each one has an IP address so that traffic (data packets) can be directed to and from each
device. The internet uses these IP addresses to deliver a packet of information to each address it is routed to.
Subnetting
Subnetting is the technique for logically partitioning a single physical network into multiple smaller
subnetworks or subnets.
Organizations use subnets to divide large networks into smaller, more interconnected networks to increase
speed, minimize security threats, and reduce network traffic.
IP subnetting is a method for dividing a single, physical network into smaller subnetworks (subnets).
Subnetting an IPv4 address gives you 32 bits to divide into two parts: a network ID and a host ID. Depending
on the number of bits you assign to the network ID, subnetting provides either a greater number of total
subnetworks or more hosts. (Hosts are devices that can be part of each subnet.)
Each IP address belongs to a class of IP addresses depending on the number in the first octet.
Standard IPv4 address classes have three network ID sizes: 8 bits for Class A (which allows for more hosts), 16
bits for Class B, and 24 bits for Class C (which can have more subnetworks). However, in many cases, standard
sizes do not fit all. With subnetting, you can have more control over the length of the network ID portion of an
IP address. Your options go beyond the bounds of the standard 8-bit, 16-bit, or 24-bit lengths. Therefore, you
can create more Host IDs for host devices per subnetwork.
The opposite of subnetting is supernetting, where you combine two or more subnets to create a single
supernet. You can refer to this supernet by using a CIDR prefix.
First octet value, class, example IP address, and IPv4 bits for the network ID:
• 0–126: Class A – example 34.126.35.125 – 8 network ID bits
• 128–191: Class B – example 134.23.45.123 – 16 network ID bits
• 192–223: Class C – example 212.11.123.3 – 24 network ID bits
• 224–239: Class D – example 225.2.3.40 – used for multicast and cannot be used for regular internet traffic
• 240–255: Class E – example 245.192.1.123 – reserved and cannot be used on the public internet
A 32-bit IP address uniquely identifies a single device on an IP network. The subnet mask divides the 32 binary
bits into the host and network sections, but they are also broken into four 8-bit octets. Each subnet is a
network inside a network and contains the following parts:
• Network ID: This portion of the IP address identifies the network and makes it unique.
• Subnet mask: A subnet mask defines the range of IP addresses that can be used within a network or subnet.
It also separates an IP address into two parts: network bits and host bits.
• Host ID range: This range consists of all of the IP addresses between the network (subnet) address and the broadcast
address. These are the addresses that can be assigned to hosts.
• Number of usable host IDs: This number depends on the class and prefix of the subnet; for common prefixes it runs
between 30 and 254. It is the total number of addresses in the subnet minus 2 (the network address and the broadcast
address).
• Broadcast ID: This IP address is used to target all systems on a specific subnet instead of a single host. It
permits traffic to be sent to all devices on a specific subnet rather than a specific host.
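The parts listed above can be derived programmatically. The following short sketch is an assumed example using Python's ipaddress module; the /26 subnet is arbitrary.

    import ipaddress

    subnet = ipaddress.ip_network("192.168.1.0/26")
    print("Network ID:      ", subnet.network_address)       # 192.168.1.0
    print("Subnet mask:     ", subnet.netmask)               # 255.255.255.192
    print("Broadcast ID:    ", subnet.broadcast_address)     # 192.168.1.63
    hosts = list(subnet.hosts())
    print("Host ID range:   ", hosts[0], "-", hosts[-1])     # 192.168.1.1 - 192.168.1.62
    print("Usable host IDs: ", subnet.num_addresses - 2)     # 62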
Subnet Masks
Subnet masks split IP addresses into host and network sections based on four 8-bit octets. In this way, a subnet mask
defines which part of the IP address belongs to the device and which part belongs to the network. It also helps hide the
internal structure of a subnet from outside networks, except for allowed traffic.
A subnet mask is a 32-bit number created by setting host bits to all 0s and setting network bits to all 1s. In this
way, the subnet mask separates the IP address into the network and host addresses.
Within each subnet, the highest address (all host bits set to 1, for example .255 in a /24 network) is reserved as the
broadcast address, and the lowest address (all host bits set to 0) is reserved as the network address. Neither one can be
assigned to hosts because they are reserved for these special purposes.
A subnet mask uses its own 32-bit number to mask the IP address and further enable the subnetting process.
Subnet masks:
• Determine which hosts are on the local network and which hosts are outside the network. Hosts can talk
directly to hosts on the same network, but they must communicate with a router to talk to hosts on external
networks.
• Hide network size information for IPv4 addresses.
• Are used for special purposes:
• Class D IPv4 addresses are used for multicast addressing.
• In computer networking, multicast refers to group communication where information is addressed
to a group of destination computers simultaneously. For example, multicast addressing is used in
internet television and multipoint video conferences.
• Class E IPv4 addresses cannot be used in real applications because they are used only in
experimental ways.
CIDR
An IP addressing scheme that improves the allocation of IP addresses. Using CIDR, you can create not only
subnets but also supernets.
The general rule is that subnets are used at the organizational level, and CIDRs are used at the internet service
provider (ISP) level and higher.
• Subnets: When you place a mask over the subnet, you instantly create an entire subnetwork that is a
subordinate network of the internet. The subnet mask signals to the router which part of the IP address is
assigned to the hosts (individual participants of the network). It also signals which address determines the
network.
• CIDRs: This scheme adds suffixes and then integrates them directly into the IP address. Using CIDRs, you can
create not only subnets but also supernets. In addition, you can use CIDR to subdivide a network into several
networks.
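The following illustrative sketch (the example networks are arbitrary) uses Python's ipaddress module to divide a network into subnets and to combine adjacent subnets back into a supernet.

    import ipaddress

    # Divide one /24 network into four /26 subnets.
    network = ipaddress.ip_network("10.1.0.0/24")
    subnets = list(network.subnets(prefixlen_diff=2))
    print([str(s) for s in subnets])
    # ['10.1.0.0/26', '10.1.0.64/26', '10.1.0.128/26', '10.1.0.192/26']

    # Combine two adjacent /25 subnets back into a single /24 supernet.
    supernet = ipaddress.collapse_addresses(
        [ipaddress.ip_network("10.1.0.0/25"), ipaddress.ip_network("10.1.0.128/25")])
    print([str(n) for n in supernet])         # ['10.1.0.0/24']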
ADDITIONAL NETWORKING PROTOCOLS
A communication protocol is a system of rules. These rules permit two or more entities of a communications
system to transmit information through any variation of a physical quantity. The different types of
communication protocols include transport, application, management, and support protocols.
Transport protocols run over the best-effort IP layer to provide a mechanism for applications to communicate
with each other. The two general types of transport protocols are a connectionless protocol (User Datagram
Protocol) and a connection-oriented protocol (Transmission Control Protocol).
Application protocols govern various processes, from downloading a webpage to sending an email. Examples
include HTTP, SSL, TLS, mail protocols (SMTP, POP, and IMAP), and remote desktop protocols (RDP and SSH).
Management protocols are used to configure and maintain network equipment. Support protocols facilitate
and improve network communications.
Transport protocols
TCP
TCP is a connection-oriented protocol. It defines how to establish and maintain network communications
where application programs can exchange data. Data that is sent through this protocol is divided into smaller
chunks called packets.
In terms of the OSI model, TCP is a transport-layer protocol. It provides reliable virtual-circuit connection
between applications; that is, a connection is established before data transmission begins. Data is sent without
errors or duplication and is received in the same order as it is sent. No boundaries are imposed on the data;
TCP treats the data as a stream of bytes.
UDP
UDP uses a simple, connectionless communication model to deliver data over an IP network. Compared to
TCP, UDP provides only a minimum set of functions. It is considered to be unreliable because it does not
guarantee the delivery or ordering of data. Its advantages are that it has a lower overhead, and it is faster than
TCP.
In terms of the OSI model, UDP is also a transport-layer protocol and is an alternative to TCP. It provides an
unreliable datagram connection between applications. Data is transmitted link by link; there is no end-to-end
connection. The service provides no guarantees. Data can be lost or duplicated, and datagrams can arrive out
of order.
Because TCP and UDP use ports for communication, most layer 4 transport problems revolve around ports
being blocked. When troubleshooting layer 4 communications issues, first make sure that no access lists or
firewalls are blocking TCP/UDP ports. Remember that the transport layer controls the reliability of any given
link through flow control, segmentation and desegmentation, and error control. Some protocols can keep
track of the segments and retransmit the ones that fail. The transport layer acknowledges successful data
transmission and sends the next data if no errors have occurred. The transport layer creates packets from the
data that it receives from the upper layers.
TCP is well suited to transferring important files because delivery is guaranteed, even though it has a larger
overhead (time). It is connection oriented.
TCP has something that is called the TCP handshake. This handshake comprises three messages:
• Synchronize (SYN)
• Synchronize/Acknowledge (SYN/ACK)
• Acknowledge (ACK)
During this handshake, the protocol establishes parameters that support the data transfer between two hosts.
For example:
• Host A sends a SYN packet to Host B.
• Host B replies with a SYN/ACK packet to acknowledge that it received the SYN from Host A.
• Host A sends a final ACK message to Host B to confirm that it received the SYN/ACK message.
Another process gracefully closes the communication between the sender and receiver (similar to saying
goodbye to someone) with three messages:
• Finish (FIN)
• Finish/Acknowledge (FIN/ACK)
• Acknowledge (ACK)
There are also flags called reset (RST) flags when a connection closes abruptly and causes an error.
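The following minimal sketch (illustrative only; example.com is a placeholder host) shows that the operating system performs the three-way handshake when a TCP socket is opened and the FIN/ACK exchange when it is closed.

    import socket

    # Opening the connection triggers SYN, SYN/ACK, ACK under the hood.
    with socket.create_connection(("example.com", 80)) as sock:
        sock.sendall(b"HEAD / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
        print(sock.recv(1024).decode(errors="replace"))
    # Leaving the "with" block closes the socket (FIN, FIN/ACK, ACK).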
Application protocols
HTTP
HTTP is the protocol that is used to reach webpages. A full HTTP address is expressed as a uniform resource
locator (URL).
Secure Hypertext Transfer Protocol (HTTPS) is a combination of HTTP with the SSL/TLS protocol.
SSL is a standard for securing and safeguarding communications between two systems by using encryption.
TLS is an updated version of SSL that is more secure. Many security and standards organizations—such as
Payment Card Industry Security Standards Council (PCI SSC)—require organizations to use TLS version 1.2 to
retain certification.
A TLS handshake is the process that initiates a communication session that uses TLS encryption. During a TLS
handshake, the two communicating sides exchange messages to acknowledge each other and verify each
other. They establish the encryption algorithms that they will use, and agree on session keys. TLS handshakes
are a foundational part of how HTTPS works.
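As an illustration of a TLS handshake in practice, the following minimal Python sketch (not part of the course material; example.com is a placeholder host) opens a TCP connection, performs the handshake, and prints the negotiated protocol version and cipher suite.

    import socket
    import ssl

    hostname = "example.com"
    context = ssl.create_default_context()       # verifies the server certificate

    with socket.create_connection((hostname, 443)) as sock:
        # wrap_socket performs the TLS handshake with the server.
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            print(tls.version())                  # e.g. TLSv1.3
            print(tls.cipher())                   # negotiated cipher suite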
SSL/TLS creates a secure channel between a user’s computer and other devices as they exchange information
over the internet. They use three main concepts (encryption, authentication, and integrity) to accomplish
this result. Encryption hides data that is being transferred from any third parties. Without SSL/TLS, data gets
sent as plain text, and malicious actors can eavesdrop or alter this data. SSL/TLS offers point-to-point
protection to ensure that the data is secure during transport.
To provision, manage, and deploy public and private SSL/TLS certificates for use with AWS services and internal
connected resources, you need AWS Certificate Manager (ACM).
RDP and SSH are both used to remotely access machines and other servers. They’re both essential for securely
accessing cloud-based servers, and they also aid remote employees in using infrastructure on premises.
RDP is a protocol that is used to access the desktop of a remote Microsoft Windows computer. It uses port 3389,
and clients are available for different operating systems.
SSH is a protocol that opens a secure command line interface (CLI) on a remote Linux or Unix computer. The
standard TCP port for SSH is 22.
DNS
DNS is a database for domain names. It is similar to the contacts list on a mobile phone. The contacts list
matches people’s (or organization’s) names with phone numbers. DNS functions like a contacts list for the
internet.
DNS translates human-readable domain names (for example, www.amazon.com) to machine-readable IP
addresses (for example, 192.0.2.44). DNS servers automatically map IP addresses to domain names.
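As a simple illustration (assumed example, not from the course), the following sketch asks the operating system's resolver to translate a domain name into an IP address, which is the lookup that DNS performs.

    import socket

    # Resolve a human-readable domain name to a machine-readable IP address.
    print(socket.gethostbyname("www.amazon.com"))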
ICMP
Network devices use the Internet Control Message Protocol (ICMP) to diagnose network communication issues and generate responses to errors in IP
networks. A good example is the ping utility, which uses an ICMP request and ICMP reply message. When a
certain host or port is unreachable, ICMP might send an error message to the source.
DHCP
DHCP automatically assigns IP addresses, subnet masks, gateways, and other IP parameters to devices that are
connected to a network.
Some examples of DHCP options are router (default gateway), DNS servers, and DNS domain name.
FTP
FTP is a network protocol that is used to transfer files from one computer to another. FTP performs two
basic functions: PUT and GET. If you have downloaded something such as an image or a file, then you have probably
used an FTP server.
Network Utilities
When you work with networks, it is important to check network performance, bandwidth usage, and network
configurations.
A few common network utilities, such as ping, traceroute, nslookup, netstat, and telnet, can be used to quickly
troubleshoot network issues. These tools can help ensure uninterrupted service and prevent long delays.
Wireless Technologies
• Wired Equivalent Privacy (WEP): WEP offers wireless protection and added security to wireless networks by
encrypting data.
• Wi-Fi Protected Access (WPA): WPA was introduced as a replacement for WEP. Although WEP and WPA
share similarities, WPA offers improvements in handling security keys and user authorization. WPA uses a 256-bit
key.
• Bluetooth Low Energy (BLE): BLE optimizes energy consumption. You might also hear Bluetooth Low Energy
referred to as BLE, Bluetooth LE, or Bluetooth Smart. BLE technology is primarily used in mobile applications,
and it is suitable for IoT. It was initially developed for the periodic transfer of small chunks of data over short
ranges. BLE is used in solutions that span a range of domains, including healthcare, fitness, beacons, security,
and home entertainment.
• 5G cellular systems: The complete rollout of this technology will take a couple of years. It will eventually
provide download speeds up to 10 Gbps.
The Internet of Things refers to physical devices, or things, that are connected to the internet so that they can
share and collect data.
The primary goal of IoT is for devices to self-report in real time, which can improve efficiency. It can also
surface important information more quickly than a system that depends on human intervention.
What is IoT?
• Is about the ability to transfer data over a network without requiring human-to-human or human-to-
computer interaction
• Is about expanding product capabilities (usage)
• Is about creating value from generated data (analysis)
Examples:
• Smartphones
• Wearables (smart watches, smart glasses, and others)
• Connected cars
• Thermostats
A device can be thought of as a combination of sensors and actuators that gather vast amounts of information
from its environment. An example is a temperature sensor that captures the temperature-related details from
a living room (which is its environment).
These devices are purpose built and do not come with many compute abilities. For these devices to
communicate, they use lightweight protocols such as MQ Telemetry Transport (MQTT), which do not have a
big footprint on the device.
The data from the device is typically sent to a device gateway, which then pushes this data onto the cloud-
based platform by using the internet (HTTPS). As soon as the data gets to the cloud, software performs some
kind of processing on it.
This processing could be very simple, such as checking that the temperature reading is within an acceptable
range. It could also be complex, such as using computer vision on video to identify objects (such as intruders in
your house).
This data is processed in the platform, and actions are run in a rule-based fashion. Among many other
offerings, the platform mainly provides services for data management, analytics, support enhancements of the
IoT application, and a user interface.
The user interface, typically a mobile device or a computer, is your window to the IoT world.
• You can use AWS IoT Core to connect billions of IoT devices and route trillions of messages to different AWS
services.
• With AWS IoT Core, users can choose their preferred communication protocol.
• Communication protocols include MQTT, Secure Hypertext Transfer Protocol (HTTPS), MQTT over
WebSocket Secure (WSS), and long-range wide area network (LoRaWAN).
Devices communicate with cloud services by using various technologies and protocols. Examples include:
• Wi-Fi and broadband internet
• Broadband cellular data
• Narrow-band cellular data
• Long-range wide area network (LoRaWAN)
• Proprietary radio frequency (RF) communications
Enterprise Mobility
Enterprise mobility is a growing trend for businesses. This approach supports remote-working options, which
use your personal laptops and mobile devices for business. Remote workers can connect and access data
through cloud technology.
Customers can use Amazon WorkSpaces to provision virtual, cloud-based Microsoft Windows or
Amazon Linux desktops, known as WorkSpaces, for their users.
Amazon WorkSpaces eliminates the need to procure and deploy hardware or install complex
software. Customers can quickly add or remove users as their needs change. Users can access
their virtual desktops from multiple devices or web browsers.
• Simple to manage: Customers can deploy and manage applications for their WorkSpaces by using Amazon
WorkSpaces Application Manager (Amazon WAM). They can also use the same tools to manage WorkSpaces
that they use to manage on-premises desktops.
• Secure: Amazon WorkSpaces uses either AWS Directory Service or AWS Managed Microsoft AD to
authenticate users. Customers can add multi-factor authentication (MFA) for additional security. They can use
AWS Key Management Service (AWS KMS) to encrypt data at rest, disk I/O, and volume snapshots. Customers
can also control the IP addresses of users that are allowed to access their WorkSpaces.
• Scale consistently: Customers can increase the size of the root and user volumes for a WorkSpace, up to
1000 GB each. They can expand these volumes whether they are encrypted or unencrypted. Customers can
request a volume expansion one time in a 6-hour period.
To ensure that data is preserved, customers cannot decrease the size of the root or user volumes after they
launch a WorkSpace.
The use cases for Amazon WorkSpaces are nearly endless; common examples include supporting remote and mobile workers, contractors, and developers who need secure access to desktops.
Security
INTRODUCTION TO SECURITY
Security Basics
Security is the practice of protecting valuable assets.
• Assets can be physical or digital and include people, buildings, computers, software applications, and data.
• Cybersecurity is concerned with protecting networks, devices, systems, and digital information from the
following:
– Unauthorized access
– Malicious modification, theft, or destruction
– Disruption of intended use
• The primary goal of cybersecurity is to ensure the confidentiality, integrity, and availability of digital
information.
• Confidentiality protects the privacy of the information by preventing unauthorized access to it. A common
method to ensure confidentiality, for example, is to first ask users to identify themselves before they are
allowed to use a system. This process is known as authentication.
• Integrity ensures that the information is always accurate and correct where it is stored and whenever it is
moved. The data cannot be altered by unauthorized users as it moves inside and outside its containing system
or when it reaches its final storage location. Hashing is an example of a technique that can be used to ensure
that data has not been tampered with during transit.
• Availability ensures that the information is accessible to users when they need it. Businesses typically
address availability requirements by creating plans such as a business continuity plan (BCP) and a disaster
recovery plan (DRP). These plans define processes and procedures to maintain or quickly restore the
availability of the systems containing the information in the event of failure or disruption.
• Virus: A program that can corrupt or delete data and propagate itself from one system to another.
• Spyware: Code that secretly gathers information on a system and reports it to the attacker.
• Worm: A program that spreads itself and consumes resources destructively on a computer.
• Remote access Trojan (RAT): A software tool used to gain unauthorized access to a computer in order to
control it.
Security Strategy
Security controls are measures that protect against threats and eliminate vulnerabilities. There are three types
of security controls: preventive, detective, and corrective. For each type of control, you can implement
physical, technical, and administrative security measures to ensure information confidentiality, integrity, and
availability.
A preventive security control protects a system from security threats before they can happen. A detective
security control helps find a vulnerability early or quickly alert when a breach has happened. A corrective
security control remediates a security breach.
Each type of control provides protection in three different security areas: physical, administrative, and
technical. A physical control is a device or object, such as a security camera. An administrative control is
usually a policy or a procedure that must be followed. Finally, a technical control is usually some software that
provides security functions.
Security lifecycle
An effective security strategy addresses security in stages of a lifecycle. These stages consist of prevention,
detection, response, and analysis. Note that the first three stages correspond to the three types of security
controls.
In the prevention stage, you identify the assets to be protected, assess their vulnerabilities, and implement
measures to remove any discovered vulnerability.
In the detection stage, you implement monitoring solutions to quickly identify and generate alerts if a breach
is detected.
In the response (or corrective) stage, you perform the corrective tasks to eliminate the breach and restore
normal operations.
Finally, in the analysis stage, you review the steps used to resolve the issue and identify any lessons learned. If
necessary, you update your security policies and procedures to make adjustments based on the result of the
analysis.
SECURITY LIFECYCLE: PREVENTION
Prevention strategy
• Physical layer: Network devices and equipment are protected from physical access to keep intruders out.
• Data link layer: Filters applied to network switches help prevent attacks based on media access control
(MAC) addresses.
• Network and transport layers: Implementing firewalls and access control lists (ACLs) helps to mitigate
unauthorized access to internal systems.
• Session and presentation layers: By using authentication and encryption methods, you can prevent
unauthorized data accesses.
• Application layer: Solutions, such as virus scanners and an IDS, help protect applications.
Types of prevention measures
• Hosts include workstations, servers, or other devices that run services and applications or store data.
• Examples of systems hardening measures include the following:
– Apply operating system (OS) patches and security updates regularly.
– Remove unused applications and services.
– Monitor and control configuration changes.
Identity management
Implement controls for user authentication and authorization.
Network hardening is the activity in the layered security prevention strategy that focuses on protecting the
network.
• The goal is to stop the unauthorized access, misuse, modification, or destruction of a computer network and
its resources. Network hardening combines policies and procedures with hardware and software solutions to
achieve this goal.
• A network security threat is any attempt to expose, alter, disable, or gain unauthorized access to an
organization’s network. Its purpose is to steal data or perform a malicious activity.
• Network attacks start by discovering information about a network and then exploiting a vulnerability in the
network.
• Types of network security discovery threats include the following:
– Network mapping
– Port scanning
– Traffic sniffing
Network mapping
Network mapping exposes the topology of a network.
• Attackers can use it to find out which devices and hosts are present in the network.
• Examples of network mapping commands and tools include the following:
– ping: Determines the IP address of a host
– traceroute: Identifies the network path and devices that a message traverses to reach a host
destination
– Nmap: Discovers which hosts are on a network
Port scanning
Port scanning exposes the available protocols and services in a network.
• Port scanning sends packets sequentially to ports on a host to determine which ports are open.
• Attackers can use it to find out which protocols and services are implemented on the network.
• Nmap is an example of a port scanning tool and does the following:
– Determines which protocols are supported by a host
– Determines which ports are open on a host
– Identifies which services are connected to open ports
Traffic sniffing
Traffic sniffing exposes the information that is traveling through a network.
• Traffic sniffing reads the data in all of the packets that pass through a network interface card (NIC) or
network device.
• Attackers can use it to read any unencrypted data that is passing through a network.
• Wireshark is an example of a traffic sniffing tool.
– It captures network traffic data for multiple protocols.
– You can use it to interactively browse the data.
– It saves the data in multiple formats.
• To protect against network mapping and port scanning, restrict the use of, or disable, network discovery
protocols.
– Internet Control Message Protocol (ICMP)
– Simple Network Management Protocol (SNMP)
• To protect against traffic sniffing, consider the following measures:
– Disable promiscuous mode on NICs
– Use switches instead of hubs in a network
– Encrypt sensitive data in transit
• In the AWS Cloud, use the Amazon Inspector service to discover unintended network exposure vulnerabilities.
• Segmenting a network
– Creating private subnets
– Using network access control lists (network ACLs)
Network firewall
A network firewall is a protection mechanism to filter incoming and outgoing traffic in a network.
• Security groups:
– Act like a built-in firewall for instances
– Are associated with network interfaces on an instance
– Define allow rules that determine the access to an instance
» Inbound traffic rules
» Outbound traffic rules
– Are stateful (if you allow a certain type of traffic into an instance, the same type of traffic is allowed
out of the instance).
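A hedged sketch of adding an inbound allow rule to a security group with the boto3 SDK; the security group ID is a placeholder, and AWS credentials and a Region are assumed to be configured.

    import boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId="sg-0123456789abcdef0",       # placeholder security group ID
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "Allow HTTPS"}],
        }],
    )
    # Because security groups are stateful, response traffic for this inbound
    # rule is automatically allowed back out.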
Intrusion prevention system (IPS)
• Monitors network traffic, detects threats, and automatically defends against them
• Uses different types of threat detection mechanisms including the following:
– Anomaly-based detection: The IPS compares the current traffic pattern against established
baselines for any deviation.
– Signature-based detection: The IPS monitors and analyzes the traffic for known patterns of attack.
• Can be a hardware or software solution
• Is usually placed behind a network firewall
Segmenting a network
You can use network segmentation to apply different security controls to different parts of a network.
• Segmenting creates multiple smaller logical networks from a large network. Each logical network is called a
subnet.
• Each subnet is assigned a contiguous subset of the IP addresses of the large network.
• Each subnet can be configured with its own security controls to meet the requirements of the different types
of resources in the network.
• Classless Inter-Domain Routing (CIDR) notation is used to specify subnet IP address ranges. This notation
provides a shorthand for describing the size of a network.
• Other benefits of segmentation include the following:
– Easier network management
– Improved network performance
• A network ACL:
– Filters inbound and outbound network traffic.
– Is typically implemented in a switch or router.
– Is stateless (inbound and outbound rules are independent of each other).
• In the AWS Cloud, a network ACL:
– Is associated with a subnet in a VPC.
– Allows or denies traffic in and out of a subnet based on rules.
– Hardens security as a secondary level of defence at the subnet level.
PREVENTION: SYSTEMS HARDENING
Systems hardening is the action of securing computing systems to protect them from attacks.
Physical security
Physical security is essential to systems hardening.
Security baselines
A baseline defines the expected conditions of a system.
Patching
A patch:
• Is applied on a system where a weakness was discovered
• Fixes a performance or feature issue
• Reduces the types of methods that can infiltrate the system
• Makes a system more reliable and secure
• Comes as an update for the software or as part of a collection of updates (service pack)
Common systems hardening recommendations
Client:
• Turn on antivirus and firewalls
• Run fewer applications
• Apply updates when they are released
• Limit removable media
• Control downloads
• Restrict terminal services
• Monitor the environment
Server:
• Restrict physical access
• Use dedicated roles
• Secure file systems
• Use encryption and PKI
• Use alerts
• Apply updates when they are released
• Limit administrative access
AWS Tools
Trusted Advisor provides recommendations that help you follow AWS best practices. Trusted Advisor
evaluates your account by using checks.
GuardDuty is a threat detection service. GuardDuty continuously monitors your AWS accounts and workloads
for malicious activity and delivers detailed security findings for visibility and remediation.
Shield is a managed DDoS protection service that safeguards applications that run on AWS.
CloudTrail offers auditing, security monitoring, and operational troubleshooting by tracking user activity and
API usage.
PREVENTION: DATA SECURITY
• Data in motion travels from and to the internet; from and to devices such as smartphones, servers, personal
computers; or directly between these devices.
• Data at rest stays inside devices, such as smartphones, servers, USB keys, and hard drives.
You should use cryptographic techniques, encryption, and controls to secure data based on whether it is in
motion or at rest.
• Cryptography is the discipline that embodies the principles and techniques for providing data security,
including confidentiality and data integrity.
• Encryption is the process of using a code, called a cipher, to turn readable data into unreadable data for
another party. The cipher contains both algorithms to encrypt and to decrypt the data.
The goal of encryption is to achieve data confidentiality.
• A key is a series of numbers and letters that the algorithm uses to encrypt and decrypt data. Only the owners
of the keys can encrypt and decrypt data.
Encryption
• Uses of encryption:
– Algorithms to encrypt and decrypt data
– A secret key to ensure that only the key owners can encrypt and decrypt the data
• Types of encryption: symmetric, asymmetric, and hybrid
Symmetric encryption uses the same key to encrypt and decrypt the data. The key is a shared secret between
the sender and the receiver. Symmetric encryption is fast and reliable and is used for bulk data.
Asymmetric encryption uses both a private key and a public key (a key pair) to encrypt and decrypt the data.
Every user in the conversation has a key pair. Asymmetric encryption is more complex and much slower than
symmetric encryption. However, it provides more capabilities in the way that keys are managed.
A hybrid encryption approach uses both symmetric encryption and asymmetric encryption to protect the data
further.
Symmetric vs. asymmetric encryption:
• Encryption – Symmetric: fast and straightforward. Asymmetric: complex and time-consuming.
• Process speed – Symmetric: fast (even for large amounts of data). Asymmetric: slow.
• Keys – Symmetric: one key (128 or 256 bits). Asymmetric: two keys (length can be 2048 bits or higher).
• Level of security – Symmetric: extremely secure, but there is a risk of compromise if the shared key is lost.
Asymmetric: provides additional security services; the key is not shared.
• Manageability – Symmetric: becomes complex with more keys. Asymmetric: includes an easy-to-manage key system.
• Security services provided – Symmetric: only confidentiality. Asymmetric: non-repudiation, authentication, and more.
• Use cases – Symmetric: used to securely transmit large amounts of data or encrypt databases. Asymmetric: used for
authentication or digital signatures.
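As a minimal illustration of symmetric encryption, the following sketch assumes the third-party Python cryptography package is installed; the same shared key both encrypts and decrypts the message.

    from cryptography.fernet import Fernet

    key = Fernet.generate_key()                   # the shared secret key
    cipher = Fernet(key)

    token = cipher.encrypt(b"Sensitive data")     # unreadable without the key
    print(cipher.decrypt(token))                  # b'Sensitive data'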
AWS CloudHSM and AWS KMS
AWS CloudHSM is a cloud-based hardware security module (HSM). You can use CloudHSM to generate and use
your own encryption keys on the AWS Cloud.
With AWS Key Management Service (AWS KMS), you can create and manage cryptographic keys and control
their use across a wide range of AWS services and in your applications.
Hashing
Hashing is a one-way encryption to create a signature of the file.
• Data integrity means ensuring that the data remains accurate and consistent when it is stored or travels over
a network.
• The data must not have been corrupted or tampered with.
• The data that you receive remains the same as the data that was sent.
• One way to determine data integrity is by using a hash mechanism.
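A short illustrative sketch (not from the course) of using a hash to check data integrity: the same input always produces the same SHA-256 digest, so any tampering changes the digest.

    import hashlib

    original = b"transfer $100 to account 42"
    digest = hashlib.sha256(original).hexdigest()

    received = b"transfer $900 to account 42"     # tampered with in transit
    # The comparison fails, revealing that the data was altered.
    print(digest == hashlib.sha256(received).hexdigest())   # False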
Permissions
A permission grants a specific type of access to a resource (for example, write access to a file). Permissions are
classified into two types: discretionary (based on identity or other criteria) and role-based (based on an
assigned role).
A permission is assigned to a subject (a person, device, or system) to give the subject the resource access
ability defined by the permission.
PREVENTION: PUBLIC KEY INFRASTRUCTURE
Public key infrastructure (PKI) is a collection of technologies that are used to apply cryptography principles to
transfer information securely between two entities. It is based on a practical distribution and implementation
of keys, with a set of tools to achieve confidentiality, integrity, non-repudiation, and authenticity.
PKIs are used to implement the encryption of public keys but also to manage public-key-associated certificates.
Certificates are digital documents that are used in PKI to prove the ownership of a public key. Certificates
contain information about the entity that provided and verified the certificate, the entity to which the
certificate belongs, and the public key.
The entity that issues the certificate is called the issuer or the certificate authority. The entity that receives the
certificate is called the subject.
Enabling trust
Trust
• Prevents rogue systems from inserting themselves between two computers that would like to exchange information
• Is achieved through the exchange of public keys that validate and identify the parties
Public keys
• Ensure that trust exists throughout your entire hierarchy
• Are located in the following:
– System that is requesting a certificate
– System that is offering a service
PKI components
Certificates
Digital certificates are electronic credentials that are used to represent online identities of individuals,
computers, and other entities on a network. Digital certificates are like personal identification cards.
Two types of certificates are available: certificates signed by a CA and self-signed certificates.
A certificate with public key and corresponding private key can be used for encryption and decryption. When
only the public key is used, the certificate establishes trust and performs encryption.
PREVENTION: IDENTITY MANAGEMENT
• Identity management is the active administration of subjects, objects, and their relationships regarding access permissions.
• It ensures that identities receive appropriate access to resources.
• It ensures that systems remain scalable in granting access to resources.
Authentication
Multi-factor authentication (MFA) is an authentication method that requires more than one method of
authentication to verify a user's identity.
Dictionary attacks
A dictionary attack attempts to systematically enter each word in the dictionary as a password until it finds a
match. Countermeasures for dictionary attacks include enforcing a strong password policy and locking out
access after a fixed number of unsuccessful attempts.
Password managers
A password manager securely stores and manages the passwords that are used to sign in to systems and services.
AWS Single Sign-On (AWS SSO)
• A cloud-based service that you can use to centrally manage SSO access to all Amazon Web Services (AWS)
accounts, including user permissions and AWS Organizations
Federated users
Federation is a type of SSO implementation that is used between web identities. It uses a token to verify
user identity between distinct systems.
With SSO, individuals can sign into different networks or services by using the same group or personal
credentials. For example, by using SSO, you can use your Google account credentials to sign into Facebook.
Amazon Cognito
• It is an Amazon service that provides user management, authentication, and authorization for your web and
mobile apps.
• With Amazon Cognito, users can sign in directly with a user name and password or through a third party, such as a social identity provider.
AWS Identity and Access Management (IAM) is a service that helps you control access to AWS resources in a
secure way by using authentication and authorization.
PREVENTION: AWS IDENTITY AND ACCESS MANAGEMENT (IAM)
IAM is a service that helps securely control access to AWS resources. You can use it to manage access to AWS
services and resources securely. Using IAM, you can create and manage AWS users and groups (to support
authentication). You can also use IAM for permissions to allow or deny their access to AWS resources (to
support authorization).
IAM uses access control concepts that you already know—such as users, groups, and permissions—so that you
can specify which users can access specific services.
Authentication
Use IAM to configure authentication, which is the first step because it controls who can access AWS resources.
IAM is used for user authentication, and applications and other AWS services also use it for access.
Authorization
IAM is used to configure authorization based on the user. Authorization determines which resources users can
access and what they can do to or with those resources. Authorization is defined through the use of policies. A
policy is an object in AWS that, when associated with an identity or resource, defines their permissions.
IAM reduces the need to share passwords or access keys when granting access rights to other people or
systems. It also makes it easy to enable or disable a user’s access.
Use IAM to centrally manage access regarding who can launch, configure, manage, and delete resources. It
provides granular control over access permissions for users, systems, or other applications that might make
programmatic calls to other AWS resources.
IAM features
Identity federation is a system of trust between two parties. Its purpose is to authenticate users and convey
the information needed to authorize their access to resources. In this system, an identity provider (IdP) is
responsible for user authentication. A service provider (SP), such as a service or an application, controls access
to resources.
Security credentials
IAM: Authorization
IAM is global. It is not on a per-Region basis. It applies across all AWS Regions.
IAM policies
An IAM policy is a formal statement of one or more permissions.
Best practice
When you attach the same policy to multiple IAM users, put the users in a group and attach the policy to the
group instead.
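The following hedged sketch illustrates that best practice with the boto3 SDK; it assumes credentials are configured and that the named users already exist, and the group name, user names, and policy ARN are placeholders.

    import boto3

    iam = boto3.client("iam")

    # Create a group and attach the policy once, to the group.
    iam.create_group(GroupName="Developers")
    iam.attach_group_policy(
        GroupName="Developers",
        PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    )

    # Add the users to the group instead of attaching the policy to each user.
    for user in ["alice", "bob"]:                 # assumed existing IAM users
        iam.add_user_to_group(GroupName="Developers", UserName=user)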
DETECTION
Antivirus software
Antivirus software scans a system for known malware, such as viruses and worms, and removes or quarantines it.
Intrusion detection system (IDS)
An IDS can detect an attack by using different mechanisms, including the following:
• Anomaly-detection based – The IDS compares the current traffic pattern or system activity against
established baselines for any deviation.
• Signature-based detection – The IDS monitors and analyses the traffic for known patterns of attack.
Amazon GuardDuty
GuardDuty is a threat detection service that continuously monitors your AWS accounts and workloads for
malicious activity. It delivers detailed security findings for visibility and remediation.
When you activate GuardDuty and configure it to monitor your account, GuardDuty automatically detects
threats by using anomaly detection and machine learning techniques. You can view the security findings that
GuardDuty produces in the GuardDuty console or through Amazon CloudWatch Events.
GuardDuty detects unauthorized and unexpected activity in your AWS environment by analysing and
processing data from different AWS service logs. These logs include the following:
• AWS CloudTrail event logs
• Virtual private cloud (VPC) flow logs
• Domain Name System (DNS) logs
GuardDuty extracts various fields from these logs and uses them for profiling and anomaly detection.
AWS CLOUDTRAIL
CloudTrail is an auditing, compliance monitoring, and governance tool from AWS. It is classified as a
Management and Governance tool in the AWS Management Console.
CloudTrail logs, continuously monitors, and retains account activity related to actions across your AWS
infrastructure, which gives you control over storage, analysis, and remediation actions.
CloudTrail benefits
• It increases your visibility into user and resource activity. With this visibility, you can identify who did what
and when in your AWS account.
• Compliance audits are simplified because activities are automatically recorded and stored in event logs.
Because CloudTrail logs activities, you can search through log data, identify actions that are noncompliant,
accelerate investigations into incidents, and then expedite a response.
• Because you are able to capture a comprehensive history of changes that are made in your account, you can
analyse and troubleshoot operational issues in your account.
• CloudTrail helps discover changes made to an AWS account that have the potential of putting the data or the
account at heightened security risk. At the same time, it expedites AWS audit request fulfilment. This action
helps to simplify auditing requirements, troubleshooting, and compliance.
AWS Config is a service used for assessing, auditing, and evaluating the configuration of your AWS resources.
• Provides AWS resource inventory, configuration history, and configuration change notifications
• Provides details on all configuration changes
• Can be used with AWS CloudTrail to gain additional details on a configuration change
• Is useful for the following:
– Compliance auditing
– Security analysis
– Resource change tracking
– Troubleshooting
With AWS Config, you can perform the following configuration management tasks:
• Retrieve an inventory of AWS resources.
• Discover new and deleted resources.
• Record configuration changes continuously. You can determine overall compliance against the configurations
that your internal guidelines specify.
• Get notified when configurations change and analyse detailed resource configuration histories.
AWS Config:
• Monitors resource usage activity and configurations to detect vulnerabilities
• Continuously evaluates the configuration of resources against the AWS Config rules that you define:
– Security prevention rules
– Compliance rules
•Helps troubleshoot security configuration issues
AWS Config rules
An AWS Config rule represents a desired configuration for a resource and is evaluated against configuration
changes on the resource.
RESPONSE
• When a security alert is activated, the alert must be verified because false positives can happen, especially
with a system such as an automated intrusion detection system (IDS).
• If the alert is verified, then the event must be investigated. What is the scope of the attack?
• The first step to respond to the attack is to contain infected elements if there are any, such as hosts infected
by a virus. Then, block access to network addresses.
• Notify the departments or the teams that will be affected that they might have limited access to the systems
that they use. Stakeholders might be customers that won’t be able to use a website.
• Recover to get back to business as soon as possible: add security rules, rebuild infected systems, recover
data, and take other appropriate steps.
• Finally, see whether there is a way to strengthen the system to avoid another attack or recover faster. You
can also implement new procedures for the team in case of an attack.
Understanding the business continuity plan (BCP) and disaster recovery plan (DRP)
• BCP: How to keep critical business operations running during and after a disruption.
• DRP: How to recover from an outage or loss and return to a normal situation as quickly as possible.
- Primary goal: Restore business functionality quickly and with minimum impact.
- Security goal: Do not lower the level of controls or safeguards that are in place.
- Follow-on goal: Prevent this threat, exploit, or disaster from happening again.
Disaster recovery: Understanding recovery time objective (RTO) and recovery point objective (RPO)
• Recovery time objective (RTO): The maximum acceptable delay between the interruption of service and
restoration of service. The RTO determines an acceptable length of time for service downtime.
How quickly do you need to recover IT infrastructure to maintain business continuity?
• Recovery point objective (RPO): The maximum acceptable amount of time since the last data recovery point.
The RPO is directly linked to how much data will be lost and how much will be retrieved.
How much data can you lose before the business suffers?
Work recovery time (WRT) involves recovering or restoring data, testing processes, and then making the
system live for production. It corresponds to the time between systems and resource recovery, and the start of
normal processing.
The maximum tolerable downtime (MTD) is the sum of the RTO and the WRT. In other words, MTD = RTO +
WRT.
MTD is the total time that a business can be disrupted after a disaster without causing any unacceptable
consequences from a break in business continuity. Include the MTD value as part of the BCP and DRP.
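As a small worked example (the values are hypothetical), the following snippet applies the MTD = RTO + WRT relationship described above.

    rto_hours = 4      # time allowed to recover the IT infrastructure
    wrt_hours = 2      # time to restore data, test, and return to normal processing
    mtd_hours = rto_hours + wrt_hours
    print(f"Maximum tolerable downtime: {mtd_hours} hours")   # 6 hours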
Analysis is the final phase of the security lifecycle. In the analysis phase, you review the cause of security
incidents and analyse current security controls to determine weaknesses. The objective is to improve and
strengthen those controls to better protect your network, facilities, and organization.
Ensure that each threat yields a better security solution even if no breach occurred.
Have flexibility when considering options to add to the solution.
Maintain a testing environment to test solutions to potential threats.
Risk assessment
Risk is the likelihood of a threat occurring against a particular asset and the possible impact to that asset if the
threat occurs.
A risk assessment helps to identify and rank risk.
1. Identify threats
2. Identify vulnerabilities
3. Determine likelihood
4. Determine impact
5. Determine risk
Risk response strategies
Environment monitoring
A company’s Acceptable Use Policy (AUP) defines how employees or users can be monitored on a company’s
network:
– At work
– Remotely
– On mobile devices
Types of monitoring
Logging Policy
Identify which resources and activities in your enterprise must be logged. Capture this information in a logging
policy. Also, define how logs are managed.
It is important to protect log information from unauthorized access and to back up logs regularly. To ensure
that analysis results are correct, keep the clocks on all log servers accurate and synchronized.
Trusted Advisor is an online resource to help you reduce cost, increase performance, and improve security by
optimizing your AWS environment.
The status of the check is shown by using colour coding on the dashboard page:
• Trusted Advisor notifications – Stay up to date with your AWS resource deployment. You will receive a
weekly notification email message when you opt in for this service.
• Access management – Control access to specific checks or check categories.
• AWS Support application programming interface (API) – Retrieve and refresh Trusted Advisor results
programmatically.
• Action links – Access items in a Trusted Advisor report from hyperlinks that take you directly to the console.
From the console, you can implement the Trusted Advisor recommendations.
• Recent changes – Track recent changes of check status on the console dashboard. The most recent changes
appear at the top of the list to bring them to your attention.
• Exclude items – Customize the Trusted Advisor report. You can exclude items from the check result if they
are not relevant.
• Refresh all – Refresh individual checks or refresh all the checks at once by choosing Refresh All in the upper-
right corner of the summary dashboard. A check becomes eligible for refresh 5 minutes after it was last refreshed.
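As a rough illustration of the AWS Support API bullet above, the following Python sketch uses the boto3 library (an assumption here; it also requires a Business or Enterprise Support plan) to list Trusted Advisor checks and request a refresh of one of them.

    import boto3

    # The AWS Support API endpoint lives in us-east-1 and needs a Business or Enterprise Support plan.
    support = boto3.client("support", region_name="us-east-1")

    # List all Trusted Advisor checks and print their categories and names.
    checks = support.describe_trusted_advisor_checks(language="en")["checks"]
    for check in checks:
        print(check["category"], "-", check["name"])

    # Request a refresh of the first check (only works if the check is eligible for refresh).
    support.refresh_trusted_advisor_check(checkId=checks[0]["id"])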
Trusted Advisor security checks
Trusted Advisor provides popular performance and security recommendations to all AWS customers. The
following Trusted Advisor checks are available to all customers at no cost:
1. AWS Identity and Access Management (IAM) use: Checks for the existence of at least one IAM user to
discourage the use of root access
2. Multi-factor authentication (MFA) on root account: Checks the root account and warns you if MFA is not
activated
3. Security groups – Specific ports unrestricted: Checks security groups for rules that allow unrestricted access
(0.0.0.0/0) to specific ports
4. Amazon Simple Storage Service (Amazon S3) bucket permissions: Checks buckets in Amazon S3 that have
open access permissions or that allow access to any authenticated AWS user.
5. Amazon Elastic Block Store (Amazon EBS) public snapshots: Checks the permission settings for your
Amazon EBS volume snapshots and alerts you if any snapshots are marked as public
6. Amazon Relational Database Service (Amazon RDS) public snapshots: Checks the permission settings for
your Amazon RDS database (DB) snapshots and alerts you if any snapshots are marked as public.
SECURITY BEST PRACTICES FOR CREATING AN AWS ACCOUNT
The AWS services that you use will determine your responsibility. In addition, you are responsible for other
factors, including your data's sensitivity, your company's requirements, and applicable laws and regulations.
The following tasks require that you sign in to AWS with your root user credentials:
• Change your account settings
• Restore AWS Identity and Access Management (IAM) user permissions
• Change your AWS Support plan or cancel your AWS Support plan
• Activate IAM access
• View certain tax invoices
• Close your AWS account
• Register as a seller
• Configure an Amazon Simple Storage Service (Amazon S3) bucket
• Edit or delete an S3 bucket
Security best practices for your AWS account: Stop using root user
AWS recommends that if you have access keys for your account root user, you remove them as soon as
possible. Before you remove the access keys, confirm that they are not being used anywhere in your
applications.
Activate billing report, such as the AWS Cost and Usage Report:
To receive billing reports, you must have an S3 bucket in your AWS account to receive and store reports.
When you set up the report, you can select an existing S3 bucket or create a new one. Whichever you choose,
limit access to only the ones who need it.
AWS COMPLIANCE PROGRAM
PCI DSS
• Payment Card Industry (PCI) Data Security Standard (DSS) is an internationally recognized set of requirements
intended to maintain a secure environment for payment card transactions.
AWS compliance program
Recall that in the AWS Cloud, security is a shared responsibility between the customer and AWS. AWS is
responsible for the security OF the cloud, and the customer is responsible for the security IN the cloud.
Specifically, AWS handles the security of the physical infrastructure that hosts customer resources, and
customers are responsible for the security of everything that they put in the cloud.
Similarly, compliance is a shared responsibility between you (the customer) and AWS. To aid your compliance
efforts, AWS regularly achieves third-party validation for thousands of global compliance requirements. AWS
continually monitors these requirements to help you meet security and compliance standards for finance,
retail, healthcare, government, and beyond. You inherit the latest security controls operated by AWS,
strengthening your own compliance and certification programs. You also receive access to tools that you can
use to reduce the cost and time to run your own specific security assurance requirements. AWS supports
security standards and compliance certifications, including PCI DSS, HIPAA, and GDPR.
AWS supports many security standards and compliance certifications. These standards and certifications help
customers satisfy compliance requirements for most regulatory agencies around the world.
AWS SECURITY RESOURCES
Amazon Web Services (AWS) communicates its security and control environment, which is relevant to
customers, in the following ways:
• Obtaining industry certifications and independent third-party attestations
• Providing information about AWS security and control practices in whitepapers and web content
• Providing certificates, reports, and other documentation directly to AWS customers under a nondisclosure
agreement (NDA)
• AWS provides basic support plans for all AWS customers. The basic support plan includes the following:
– Customer service and communities
– AWS Trusted Advisor
– AWS Personal Health Dashboard
Python
Programming
INTRODUCTION TO PROGRAMMING
Automation
Automation refers to any technology that removes human interaction from a system, equipment, or process.
Scripts are often written to automate labour-intensive tasks and streamline workflows.
Some text editors include features that help programmers write code.
Examples: – Microsoft Visual Studio Code
– Sublime Text
– Vi or Vim
– nano
– GNU emacs
– Notepad++
– TextEdit
Examples: – Python
– JavaScript
– C#
– C/C++
Compilers and interpreters take the high-level language that you are developing in, and turn it into low-level
machine code.
Compilers do this process all at once, after changes are made but before the code runs.
Examples: C/C++, Basic, GoLang.
Interpreters do this process one step at a time while the code is running.
Examples: Python, Ruby, JavaScript.
What is a variable?
A variable is an identifier in your code that represents a value in memory.
The variable name helps humans to remember what the value means.
Note: Assignment operator is an equal sign (=).
Primitive data type: Data types that are built into a coding language with no modification.
Composite data type: Combines multiple data types into a single unit.
Functions
Functions are collections of instructions that can be called repeatedly in a program.
- Functions do something useful.
- Functions can return a value (to be stored in a variable).
- Functions can return a value based on input values.
- Functions can accept values as input.
- Functions can accept many values as input.
•The programmer must be able to predict what those steps will be...
– When they write the code initially
– When they debug problems that they encounter
Version control
A version control system is software that tracks versions of your code and documents as you update them.
Version control can be done locally on your computer or by using a website that is dedicated to saving these
versions.
Collaboration is doing version control, but in the cloud or on a dedicated website so that multiple people can
work on a project.
Advantages
- Ease of access to project changes
- Error tracking
- Security
Python is a free, easy-to-learn, general-purpose programming language. It has a simpler syntax compared to
other programming languages. Python is an interpreted language (its source code is compiled to bytecode, which an interpreter then runs).
Why Python?
- The interpreter enables fast exploratory programming.
- Dynamic typing makes it easy to write quick scripts.
- Python syntax is simple when it is compared to other languages.
- Python can support object-oriented, structural, and functional programming styles.
Another reason to use Python is that it works across platforms. It works on macOS, Linux, Microsoft Windows,
and other platforms.
AWS Cloud9 provides a few key benefits:
• Start projects quickly and code with only a web browser.
• Code together in real time.
• Build serverless applications with ease.
AWS Lambda
• Upload your code to AWS Lambda.
• Set up your code to trigger from an event, such as a user who is visiting your webpage.
• Lambda runs your code only when it is triggered, and it uses only the compute resources that are
needed.
• You pay only for the compute time that you use.
• Multiple languages are supported.
• AWS Cloud9 is included in the Lambda interface, so you can share code with developers.
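A minimal sketch of what Python code hosted on Lambda can look like. The event fields are hypothetical, but lambda_handler(event, context) is the standard handler signature that Lambda invokes when the function is triggered.

    import json

    def lambda_handler(event, context):
        # Lambda passes the triggering event (for example, details about a webpage visit)
        # as a dictionary, plus a context object with runtime information.
        visitor = event.get("visitor", "anonymous")
        return {
            "statusCode": 200,
            "body": json.dumps({"message": f"Hello, {visitor}!"})
        }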
Other tools: Shell scripting
Shell scripting commands are run directly from the command line of an operating system. They are available
on any machine and on any operating system without the need to install new software. Different
environments require different syntax or types of shell scripting, such as Bash and Zshell.
• Shell scripting can be a powerful tool for system administration and command-line work, but it can be
challenging when you want to use more complicated data structures.
• For example: Python can perform some actions—such as creating an HTTP server—in a single line. However,
it could require many lines of code to do the same action in Bash.
• Python has many external libraries and resources. It is a complete programming language.
PYTHON BASICS
Identifiers
In Python, an identifier is the name given to entities such as classes, functions, and variables. It differentiates one
entity from another.
When you name objects in Python, you must observe some rules:
- An identifier cannot start with a digit. 1variable is invalid, but variable1 is valid.
- Keywords cannot be used as identifiers.
- You cannot use special symbols like !, @, #, $, and % in your identifiers.
- Identifiers can be any length.
Functions
Comments
Data types
Data types determine what kind of data is stored in an object.
Python stores the type of an object with the object. When an operation is performed, Python checks whether that
operation makes sense for that object. (This technique is called dynamic typing.)
Basic Data types
Note: Using int() is not the same as rounding. It only removes any decimal digits that the number has. It does
not round the decimal numbers up or down, so be careful.
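For example, the following lines show the difference between int() truncation and round():

    print(int(3.9))    # 3  - int() simply drops the decimal digits
    print(int(-3.9))   # -3 - it truncates towards zero, it does not round down
    print(round(3.9))  # 4  - round() rounds to the nearest whole number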
Strings
The triple quote notation allows a string to span multiple lines and include non-printable formatting characters
such as Newline and Tab.
String concatenation
• Strings are immutable. When you manipulate them, you create a new string.
• You can concatenate or add strings together, which creates a new string.
• It is possible to create an empty string. – Example: x = ""
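A short example of the points above (the values are arbitrary):

    greeting = "Hello"
    name = "world"

    # Concatenation creates a brand-new string; the originals are unchanged (immutable).
    message = greeting + ", " + name + "!"
    print(message)          # Hello, world!

    note = """Triple-quoted strings
    can span multiple lines."""
    print(note)

    empty = ""              # an empty string is valid
    print(len(empty))       # 0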
Variables
• When you do something with variables, Python looks up the values to understand what to do.
• You can have as many variables as you want.
• You can name your variables whatever you want, within certain restrictions.
• Restrictions are: – A variable name can only contain letters, numbers, and underscores ( _ ).
– A variable name cannot begin with a number.
– A variable name cannot be a keyword in Python.
Operators
Operators in Python are used for math, equivalence tests, and string operations.
Operator precedence
Operator : Description
(expressions...), [expressions...], {key: value...}, {expressions...} : Parenthesized expression, list, dictionary, set
** : Exponentiation
+, -, ~ : Positive, negative, bitwise NOT (unary operators)
*, /, %, // : Multiplication, division, remainder, floor division
+, - : Addition, subtraction
<<, >> : Left and right bitwise shift
in, not in, is, is not, <, <=, >, >=, !=, == : Membership operators, identity operators, comparison operators
and, or, not : Logical (Boolean) operators
=, +=, -=, *=, /=, %=, //=, **=, &=, |=, ^=, >>=, <<= : Assignment operators
Statements
• A statement is usually a line of code, and each line of code that you have seen so far is a single statement.
• A statement is an individual instruction to Python.
Exceptions
Tuple
Tuples are used to store multiple items in a single variable.
A tuple is one of four built-in data types in Python that are used to store collections of data; the other three are the
list, set, and dictionary, all with different qualities and usage.
A tuple is a collection which is ordered and unchangeable.
Tuples are written with round brackets.
Conditionals in code
Some things to note:
• The colons after the conditionals are important.
• You can also use the print function.
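A minimal sketch of an if/elif/else block (the temperature values are made up for illustration):

    temperature = 18

    if temperature > 25:
        print("It is hot.")        # note the colon and the indented block
    elif temperature > 15:
        print("It is mild.")
    else:
        print("It is cold.")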
Loops
• Loops are a technique that tells Python to run a block of code repeatedly.
• Python runs loops for a certain number of times, or while a certain condition is true.
• The two types of Python loops are called for and while.
While loops
• While loops can run indefinitely, so you must include the condition for the loop to stop. Otherwise, the code
creates an infinite loop.
• It is common to use an iterative counter with loops. As the loop completes, the counter increases (or
decreases). When the counter reaches a specific number, the loop stops.
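A short example of a while loop with an iterative counter, so the loop cannot run forever:

    counter = 0

    while counter < 3:          # the stop condition prevents an infinite loop
        print("Iteration", counter)
        counter += 1            # the counter increases on every pass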
Lists
• Lists are a mutable data type. Lists can contain multiple data types (strings, ints, floats, and even other lists).
• Lists are denoted with brackets ([ ]) on each end.
• Values are enclosed in brackets, and they are separated with commas.
• Any number of items can be in a list—even zero (no) items.
For loops
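A small sketch that ties the two topics together: a list holding mixed data types, and a for loop that visits each item (the values are hypothetical):

    cities = ["Dublin", "Cairo", 3, 2.5]   # lists can mix strings, ints, and floats

    cities.append("Lima")                  # lists are mutable, so items can be added

    for item in cities:                    # the for loop runs once per list item
        print(item)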
Dictionaries
• Dictionaries contain immutable keys, which are associated with their values. Keys must be immutable data
types.
• Dictionaries can be nested inside each other.
• To create an empty dictionary, use a pair of braces with nothing inside: {}
• Keys are separated from their values with a colon: {"Key":"Value"}
• Retrieve a value in the dictionary by its key: myDict.get("key") or myDict["key"]
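A brief example of the dictionary operations described above (the key/value pairs are made up):

    instance = {"name": "web-server-1", "cpu_cores": 2, "tags": {"env": "test"}}   # nested dictionary

    print(instance["name"])           # web-server-1
    print(instance.get("cpu_cores"))  # 2
    print(instance.get("memory"))     # None - get() avoids an error for a missing key

    empty = {}                        # an empty dictionary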
Input
• The input() function asks the user to enter text and saves the result to a variable.
• One optional argument is a prompt for the user, as a string.
FUNCTIONS
In Python, a function is a named sequence of statements that belong together. Their primary purpose is to
help organize programs into chunks that match how you think about the solution to the problem.
First, define the function. Name it and put placeholders for arguments. Indent lines of code inside the function,
like code in loops.
• Functions are used when you must perform the same task multiple times in a program.
• Functions are called by name and the function call often includes arguments that the function code needs for
processing.
• Python includes many built-in functions, such as print and help.
Types of functions
Function arguments enable developers to pass values to a function. For example, a function that is called
setColor could include a string for the color that is being passed. Such a call would appear as setColor("red").
Functions enable developers to use the same code many times without retyping the statements.
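A minimal sketch of defining and calling a function, following the hypothetical setColor example mentioned above:

    def set_color(color):
        """Return a short confirmation message for the requested colour."""
        return "Colour set to " + color

    message = set_color("red")   # the argument "red" is passed into the function
    print(message)               # Colour set to red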
MODULES AND LIBRARIES
Standard library
The Python standard library is a collection of script modules that a Python program can access. It simplifies the
programming process and reduces the need to rewrite commonly used commands.
Python libraries can also consist of modules that are written in C.
Modules can come from three sources:
1. Created by you. You might need Python to complete a specific set of tasks or functions that are grouped
together. If so, it can be easier to bind them together into a module.
2. External sources (created by others).
3. Pre-packaged with Python and part of the standard library (examples: math, time, and random).
Importing modules
Standard library modules are imported by using the import command. You can also import specific functions
or constants from a module by using the from command.
Examples include:
You must import the module or the function before you can use it, even if the module is part of the standard
library.
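For example (filling in the kind of examples the notes refer to), importing a whole standard library module and importing a single function from a module look like this:

    import math                   # import the whole module
    from random import randint    # import one function from a module

    print(math.sqrt(16))          # 4.0
    print(randint(1, 6))          # a random integer between 1 and 6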
Creating modules
File handlers
File handlers enable Python to read and write to files.
Exception handling
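A small sketch combining the two topics above: writing and reading a file with a file handler, plus exception handling for a file that might not exist. The file name is arbitrary.

    # Write to a file; the with statement closes the file automatically.
    with open("notes.txt", "w") as f:
        f.write("first line\n")

    # Read it back, handling the case where the file is missing.
    try:
        with open("notes.txt") as f:
            print(f.read())
    except FileNotFoundError:
        print("The file does not exist.")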
OS capabilities
• Four useful functions in the JSON module are: dump, dumps, load, loads
dump and dumps: Turn various kinds of structured data into a string, which can be written to a file.
load and loads: Turn a string back into structured data.
dump and load work directly with files.
dumps and loads work with strings.
–The s at the end of the name is for string.
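For example:

    import json

    data = {"name": "Maria", "roles": ["admin", "dev"]}

    text = json.dumps(data)          # structured data -> string
    print(text)

    restored = json.loads(text)      # string -> structured data
    print(restored["roles"])

    with open("data.json", "w") as f:
        json.dump(data, f)           # dump and load work directly with file objects

    with open("data.json") as f:
        print(json.load(f))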
What is pip?
pip is the package manager for Python, and it is similar to apt in Linux.
• It is used to install third-party packages. A package holds one or more Python modules that you can use in
your code.
• It is installed along with Python.
It is not called from within Python—pip is called from the command line, like Python itself is.
PYTHON FOR SYSTEM ADMINISTRATION
In Python 3, the use of os.system() is discouraged; the subprocess module is the recommended replacement.
Python can improve system administration by running code that makes complex decisions, and then calling
os.system() and subprocess.run() to manage the system.
How subprocess.run() compares with os.system():
• Module deprecation: By default, subprocess.run() does not use a shell. Instead, it tries to run a program with the
given string as its name. You must pass in a list to run a command with arguments.
– Example: subprocess.run(["python", "--version"])
• Safety: Developers often pass an input string to os.system() without checking the actual commands. This
practice can be dangerous. For example, a malicious user can pass in a string to delete your files.
• Separate process: subprocess.run() is implemented by a class that is called Popen, which is run as a separate
process.
• Additional functionality: Because subprocess.run() is really the Popen class, it has useful, new methods such as
poll(), wait(), and terminate().
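A minimal runnable sketch of subprocess.run() (the command is just an example):

    import subprocess

    # Pass the command and its arguments as a list; no shell is involved by default.
    result = subprocess.run(["python", "--version"], capture_output=True, text=True)

    print(result.returncode)                 # 0 on success
    print(result.stdout or result.stderr)    # the version string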
DEBUGGING AND TESTING
Debugging
Assertions
• Assertions are conditions, such as if statements, that check the values in the application.
• Dynamic analysis uses assertion statements during runtime to raise errors when certain conditions occur.
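A short example of an assertion guarding a value at runtime (the function is hypothetical):

    def set_discount(percent):
        # Stop immediately if the value is outside the expected range.
        assert 0 <= percent <= 100, "percent must be between 0 and 100"
        return percent / 100

    print(set_discount(25))   # 0.25
    # set_discount(150)       # would raise AssertionError: percent must be between 0 and 100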
Log monitoring
Developers add code that writes to a text file, often referred to as a log file. As certain conditions happen in the
running application, the logging code writes information to the log file.
With log monitoring, you get a full view of the application as it is running. The application can be exercised by a
user, and the developer can inspect the log file for a "real world" use of the application.
What to log?
In Python, the default, native log monitoring tool is the logging library.
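A minimal sketch using the standard logging library to write to a log file (the file name and messages are arbitrary):

    import logging

    logging.basicConfig(filename="app.log", level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")

    logging.info("Application started")
    logging.warning("Disk usage is at 85 percent")
    logging.error("Could not connect to the database")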
Software Testing
Unit tests
A unit is the smallest testable part of any software. It is the most basic level of testing. A unit usually has only
one or a few inputs with a single output.
An example of a unit test is verifying each individual function of a program. Developers are responsible for their
own unit testing.
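A minimal sketch of a unit test for a single function, written so that pytest (or any runner that discovers test_ functions) could pick it up; the function itself is hypothetical:

    def add(a, b):
        return a + b

    def test_add():
        # Each assertion checks one expectation for the smallest testable unit.
        assert add(2, 3) == 5
        assert add(-1, 1) == 0

    if __name__ == "__main__":
        test_add()
        print("All unit tests passed")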
Integration tests
Individual units are combined and tested as a group. You test the interaction between the different parts of
the software so that you can identify issues.
Analogy: When a pen is manufactured, all the pieces of the pen (units) are produced separately—the cap, the
body, the ink cartridge, and so on. All components are tested individually (unit testing). When more than one
unit is ready, you can test them to see how they interact with each other.
A developer can perform integration tests, but dedicated testers are frequently used to do these tests.
System tests
A complete and integrated application is tested. This level determines whether the software meets specific
requirements.
Analogy, continued: When the pen is assembled, testing is performed to see if the pen works. Does it write in
the correct colour? Is it the specified size?
Acceptance testing
Acceptance testing is formalized testing that considers user needs, business needs, and whether the software
is acceptable for delivery to the final user.
DEVOPS AND CONTINUOUS INTEGRATION
DevOps
DevOps is a software engineering culture and practice that aims to unify
software development (Dev) and software operation (Ops).
Goals of DevOps
• DevOps is meant to bridge the gaps between traditional IT, software development, and quality assurance
(QA).
– The most difficult part for beginners is the QA part. The appearance of your code is important.
• DevOps is meant to bridge or reduce the need for specialized individual work.
– When you begin to develop, you might notice that it is easy to become immersed in your own work.
DevOps is meant to make it easier to do development work in a team.
Automation
The goal of automation is to create efficiency. However, automation has several risks that can undermine this
goal:
Automation: Risks
Over-automation: happens when you automate steps in the development process to the point that it reduces creativity.
If you must think about and consider specific steps in a different way each time that you do them, you
probably should not automate them—for example, analysing, planning, and designing.
Under-automation: occurs when you avoid automation to make sure that things are handled correctly, or
because it is helpful to find exactly where code stops working. Processes that are good to automate include
building, testing, and deploying.
Bad automation: happens when you automate a process that does not work well. Bad automation can be
fixed by revisiting the planning stage of development.
Continuous delivery (CD) also ensures that at any point in the development process, a working version of the code can be produced
immediately. – This part is the deployment automation.
With CI/CD, development teams can make code changes to the main branch, while ensuring that they do not
affect any changes that other developers make.
CONFIGURATION MANAGEMENT
Project infrastructure
Project infrastructure is the way that a project is organized. An architect organizes the infrastructure of a
bridge. Software developers organize the infrastructure of code.
Project infrastructure is a critical discipline for helping to ensure that projects reach their goals. It also includes
ensuring that Python code is styled properly and that it functions as expected.
Code organization
For style: Utilities like pylint can be run to ensure that code blocks are indented correctly, and to fix the code
blocks that are not formatted well.
For logic: Utilities like pytest can be used to run tests to make sure that code changes still meet the
requirements.
• Developers check out code from a repository like AWS CodeCommit or GitHub.
• When they are finished with the code, developers upload their changes to the repository.
• When the new code passes all tests, it can be merged back into the main project.
•The process of checking code in and out can be done by:
– Running Git from the command line
– Using tools that are built into integrated development environments (IDEs), such as PyCharm
•Running tools — such as pylint and pytest — can also be a part of the check-in process.
• As developers update the code, the release managers who are responsible for distributing new versions of
software can monitor the changes.
• After all tests and functionality are verified, the release manager creates a new distribution of the software
based on the contents of the repository.
• Release managers can now version the software, which helps manage customer issues when problems occur.
(This point is where rolling back to a previous version becomes important.)
•In most cases, versioning takes the form of a numeric value (for example: Version 3.2.1).
AWS re/start
Databases
INTRODUCTION TO DATABASES
What is data?
• Data is raw bits and pieces of information.
– Images, words, and phone numbers are examples of data.
What is a database?
• A database is a collection of data that is organized into files that are called tables.
– Tables are a logical way of accessing, managing, and updating data.
Data models
• A data model represents the logical structure of the data that is stored in a database.
• Data models are used to determine how data can be stored and organized.
• The following items are examples of data models: – Relational
– Semi-structured
– Entity relationship
– Object-based
Schema
• A database schema defines the organization of a database.
• It is based on the data model.
• It describes the elements of a database’s design: – Tables
– Columns
– Relationships
– Constraints
Relational databases
• A relational database is a collection of data items that have predefined relationships between them.
• A relational database is often referred to as a structured query language (SQL) database.
• The database requires a fixed definition of the structure of the data.
• The data is stored in tables with rows and columns.
Use cases:
• Ecommerce
• Customer relationship management (CRM): Managing interactions with customers
• Business intelligence (BI) tools: Finance reporting and data analysis
Use cases:
• Fraud detection
• Internet of Things (IoT)
• Social networks
DBMS
• A DBMS is software or database as a service (DBaaS) that provides database functionality.
• The primary benefit of a DBaaS is to avoid the cost of installing and maintaining servers.
The following are two variations of DBMSs:
Locations
DBaaS
•Reduced cost:
– These databases reduce the cost of installing and maintaining servers.
•Fully managed:
– For example, with managed AWS databases, you don’t need to manage database management
tasks, such as server provisioning, patching, setup, configuration, backups, or recovery.
•Faster:
– With these databases, you can use companies, such as AWS, that offer large amounts of storage and
processing power in their data centres
DBaaS examples
Amazon Aurora
A part of Amazon RDS, Aurora is a fully managed relational database engine.
Amazon DynamoDB
DynamoDB is a fully managed NoSQL database service.
DATA INTERACTION AND DATABASE TRANSACTION
• Application developer
– Creates applications that populate and manipulate the data within a database according to the application’s
functional requirements
• End user
– Uses reports that are created from the information within the database
– Typically accesses a database through a client-server application or a three-tier web application
• Data analyst
– Collects, cleans, and interprets data within a database system
• Database administrator
– Designs, implements, administers, and monitors data in database systems
– Ensures consistency, quality, and security of the database
1. Users use computers and devices that run client applications, which use SQL to request data.
2. The applications use SQL that is sent to the server over a network to communicate with the
database.
3. The server runs a database management system, which receives the requests, processes the
SQL, and returns the response.
1. The user uses a client computer or device that runs a web browser. A webpage
that is running in the web browser captures the user’s input and sends a request to
the web server.
2. The web server gathers the information on the webpage and forwards the
request to the application server for processing.
3. A web application component that is running on the application server receives
the request. It contains the SQL commands to access the database to satisfy the
request. The component sends the commands to the database server.
4. The DBMS that runs on the database server receives and processes the SQL
commands. The DBMS returns the results to the application server.
5. The web application component on the application server processes the results
and returns them to the web server.
6. The web server formats the results into a webpage.
7. The web browser on the client device displays the webpage that contains the SQL
results to the user.
Embedded SQL in application code
• In both interaction models, an application contains the SQL commands that the user requires.
• An application developer embeds SQL statements in the application code so that the application can perform
database tasks.
• The application is installed on a user computer or an application server.
Transactions in databases
A transaction is a collection of changes made to a database that must be performed as a unit.
A transaction is also called a logical unit of work. In other words, either all of its operations succeed, and the
transaction succeeds, or if one or more operations fail, then the entire transaction fails.
At the database level, either all the database changes related to the transaction are performed, or no change is
made to the database at all. It’s an all or nothing modification.
Status of a transaction
• Active state: In the initial state of every transaction and when the transaction is being run, the status is
active.
• Partially committed: A transaction is in a partially committed state when it is completing its final operation.
• Failed state: A transaction is in a failed state when any checks made by the database recovery system fail.
• Aborted state: An aborted transaction occurs if the transaction is in a failed state, and the database rolls
back to its original state before running the transaction.
• Committed state: When all of the operations within a transaction have been successfully performed, the
transaction is considered committed.
• Run a set of operations so that the database never contains the result of partial operations.
– If one operation fails, the database is restored to its original state.
– If no errors occur, the full set of statements changes the database.
Properties of transactions
Transactions follow four standard properties — atomicity, consistency, isolation, and durability — which are
known as ACID.
Atomicity ensures that changes are successfully completed all at once or not at all.
Consistency ensures that any changes will not violate the integrity of the database, including any constraints.
Isolation keeps all transactions in isolation. Transactions are isolated so that they do not interfere with the
other transactions.
Durability ensures that as soon as a transaction is committed, the change is permanent.
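A small sketch of the all-or-nothing (atomicity) behaviour, using Python's built-in sqlite3 module to stay consistent with the Python examples earlier in these notes; the table and values are made up:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
    conn.commit()

    try:
        # Both updates belong to one transaction: move 30 from alice to bob.
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
        conn.commit()            # committed: both changes become permanent
    except sqlite3.Error:
        conn.rollback()          # any failure rolls back the whole unit of work

    print(conn.execute("SELECT * FROM accounts").fetchall())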
CREATING TABLES AND LEARNING DIFFERENT DATA TYPES
SQL
SQL (pronounced SEE-kwell) is the language that is used for querying and manipulating data. SQL is also used
for defining structures in databases. SQL is a standard programming language for relational databases.
DML
Description
• Views, changes, and manipulates data in a table
• Includes commands to select, update, and insert data into a table, and to delete data from a table
• Is typically used by data analysts, report authors, and programmers who write client applications
Statements
SELECT: Retrieves data from a table
INSERT: Adds rows to a table
UPDATE: Modifies rows in a table
DELETE: Deletes rows from a table
DDL
Description
• Creates and defines the database and the objects in it
• Includes commands to create and delete tables
• Is typically used by database administrators and programmers
Statements
CREATE: Creates a database or a table
ALTER TABLE: Adds, deletes, or modifies columns in a table; also adds or deletes constraints
DROP: Deletes a database object, such as a table or a constraint
DCL
Description
• Controls access to the data in a database
• Includes commands to grant or revoke database permissions
• Is typically used by database administrators and programmers
Statements
REVOKE: Revokes permissions from a database user
GRANT: Grants permissions to a database user
The following table contains examples of commonly used SQL built-in data types:
Identifiers
• Identifiers represent the names of the objects that the user creates, in contrast to language keywords or
statements.
• As a recommended practice, capitalize language keywords and commands, and define identifiers in
lowercase.
• It is important to remember that different database management systems handle capitalization conventions
differently.
Different database management systems handle capitalization conventions differently. The following are some
examples:
• IBM and Oracle: When you are processing code that you write, IBM and Oracle database management
systems automatically convert identifiers to uppercase. (That is, they will ignore the case that you used.) To
retain the case that you used for your identifiers, you must enclose them in double quotation marks (" ").
• Microsoft SQL Server: SQL Server can be configured to be case sensitive or not case sensitive; by default, it is
not case sensitive. Case sensitivity is associated with the collation properties of SQL Server, which
determine the sorting rules, case, and accent sensitivity properties for your data.
• MySQL Server: MySQL Server is case sensitive by default except in Microsoft Windows.
Constraints on data
Constraints enforce limits on the type of data that can go into a table.
• NOT NULL: ensures that a column does not hold a NULL value.
• UNIQUE: requires a column or set of columns to have values that are unique to those columns.
• DEFAULT: If no value was provided for the column, DEFAULT provides a value when the DBMS inserts a row
into a table.
• Reserved terms are SQL keywords or symbols that have specific meanings when being processed.
• For clarity and to avoid errors, do not use reserved terms in the names of databases, tables, columns, or
other database objects.
Tables
Naming tables
• Certain factors should drive your naming conventions. For example, for the database, tables, and columns,
consider the following factors:
– Rules and limitations that the DBMS imposes
– Naming conventions that the organization adopts
– Clarity
Some additional recommended practices when naming tables and table elements include the following:
• Use descriptive names. The name should be meaningful and should describe the entity that is being
modelled.
• Be consistent when choosing to use singular or plural table names.
• Use a consistent format in table names. For example, if you create a table to store order details, you can use
camel case (orderDetails) or underscore (order_details) for the compound table name. Whichever convention
you select, use it consistently.
Primary key
A primary key is a special column in a table that has a unique value for each row and uniquely identifies the
row.
Foreign key
A foreign key is a special column in a table that holds the primary key value from another table. A foreign key
creates a relationship between the two tables.
A table can have zero or one primary key (PK). The PK can consist of one column or multiple columns
(compound PK). The PK of one table can be defined as a foreign key (FK) of another table to establish a
relationship between the two tables.
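A minimal sketch of a primary key and foreign key definition, run here through Python's sqlite3 module; the country/city tables echo the sample schema these notes refer to, and exact syntax can differ slightly between database engines:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # SQLite only enforces foreign keys when this is enabled

    conn.executescript("""
    CREATE TABLE country (
        code      TEXT PRIMARY KEY,            -- primary key: unique value for each row
        name      TEXT NOT NULL,
        continent TEXT
    );

    CREATE TABLE city (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        country_code TEXT,
        FOREIGN KEY (country_code) REFERENCES country (code)  -- relationship to the country table
    );
    """)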
Referential integrity
A database quality where every non-NULL foreign key value matches an existing primary key value.
Numeric types
Numeric data types represent numerical values.
The INTEGER, SMALLINT, and BIGINT data types represent whole numbers that can be positive, negative, or
zero. They are exact numeric data types.
When choosing which integer data type to use for a numeric column, select the type with the smallest range
that is enough to accommodate the values that will be stored. In other words, do not over-allocate and waste
storage.
Note that the INTEGER data type can also be abbreviated as INT.
A DECIMAL data type represents an exact fixed-point number. It has two arguments: precision and scale.
Precision defines the total number of digits in the number. Scale defines the number of digits after the decimal
point. The scale cannot exceed the precision. An example use case for a DECIMAL is to store monetary values.
The FLOAT and REAL data types represent approximate numbers. They are stored more efficiently and can
generally be processed faster than DECIMAL values. They work well for scientific calculations that must be fast
but not necessarily exact to a digit.
• .csv files can be opened in any program that works with plain text.
• .csv files have the following format: –Each line contains the same sequence of data.
–Each data point is separated by a comma.
• These files are most commonly used to import or export data in databases and spreadsheets.
Importing a CSV
• Verify that the .csv file has data that matches the number of columns of the table and the type of data in
each column.
• Create a table in MySQL with a table name that corresponds to the .csv file that you want to import.
• Import by using the LOAD DATA statement.
–If the first row of the file contains column headers, use the IGNORE 1 ROWS clause to ignore the first row.
–If the rows in the file are terminated by a newline character, use the TERMINATED BY '\n' clause to indicate
so.
This statement imports data from the temporary file into the city table:
Exporting a CSV
This statement exports data from the city table and places it into the temporary city.csv file:
Cleaning data
As changes are made to databases over time, issues can arise due to disorganization or errors in the data. Data
should be cleaned for a number of reasons, but the following list contains the main reasons:
• Increased productivity
• Improved data quality
• Fewer errors
To combat these issues, data can be cleaned by using the following SQL string functions:
• LEFT, RIGHT, and TRIM: Use these functions to select only certain elements of strings and remove certain
characters.
• CONCAT: Combine strings from several columns and put them together.
• LOWER: Force every character in a string to be lowercase.
• UPPER: Force every character in a string to be uppercase.
DESCRIBE statement
The DESCRIBE statement provides a description of the specified table or view. Usually, tables have more than
one column.
INSERT statement
• When you insert a row, you must specify a column where the data will go.
• When you insert values, you must enclose the values with single quotation marks (' ') for character or date
values.
tableName: The table where data will be inserted
col_1, col_2, col_3, ...: Each column of the table where the data is going
val_1, val_2, val_3, ...: The value for each corresponding column
First, specify the table name and a list of comma-separated columns inside parentheses after the INSERT INTO
clause. Then, put a comma-separated list of values of the corresponding columns inside the parentheses that
follow the VALUES keyword.
You can use the INSERT INTO statement in two ways. In the first example, when you add values for all the
columns of the table, you do not need to specify the column names. In the second example, both the column
name and the values are written. The number of columns and values must be the same. In addition, the
positions of columns must correspond to the positions of their values.
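A short sketch of both INSERT forms, using sqlite3; the table and rows are made up:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE city (id INTEGER, name TEXT, population INTEGER)")

    # Form 1: values for every column, so the column names can be omitted.
    conn.execute("INSERT INTO city VALUES (1, 'Dublin', 1000000)")

    # Form 2: name the columns explicitly; the value positions match the columns.
    conn.execute("INSERT INTO city (id, name, population) VALUES (2, 'Lima', 2000000)")

    conn.commit()
    print(conn.execute("SELECT * FROM city").fetchall())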
NULL statement
NULL statements are used as placeholders or to represent a missing value to improve readability. They also
clarify the meaning and actions of conditional statements.
The INSERT statement can insert a NULL value into a column. You can insert a NULL value into an int column
provided that the column does not have a NOT NULL constraint.
SELECT statement
You use the SELECT statement to select one or more columns from a table. You can also use the SELECT
statement when you want to access a subset of rows, columns, or both. When you query tables, you must
include the FROM clause in your syntax. The result of the SELECT statement is called a result set. It lists rows
that contain the same number of columns.
How it works
The syntax for selecting data follows a precise order. The required clauses must precede the optional clauses.
The first clause contains SELECT and the column names, and the FROM clause with the table name
immediately follows it.
All optional clauses will follow these first two required clauses.
Considerations
• Enclose literal strings, text, and literal dates with single quotation marks (' ').
• As a best practice to improve readability, capitalize SQL keywords (for example, SELECT, FROM, and WHERE).
• Depending on the database engine or configuration, data values that you provide in conditions might be case
sensitive.
Different ways to SELECT columns
Basics
• The clause is followed by the item or items being acted on. In this example, SELECT is followed by the column
names.
• Brackets ([ ]) enclose optional parameters.
• With the SELECT clause, you must specify one or more columns or use an asterisk (*) to request all columns.
Optional clauses
WHERE
Request only certain rows from a table.
In SQL, you can use the WHERE clause to apply a filter that selects only certain rows from a table. In a SELECT
statement, the WHERE clause is optional. The SELECT-FROM-WHERE block can be useful for locating certain
information in rows. You could use this construct if you needed a list of all the cities that are located within a
country.
GROUP BY
Use a column identifier to organize the data in the result set into groups.
Here, the SELECT statement selects the rows from the country table, groups the rows by continent, and counts
the number of rows in each group. The result is a listing of the number of countries in each continent.
HAVING
Use with GROUP BY to specify which groups to include in results.
The HAVING clause filters the results of a GROUP BY clause in a SELECT statement. In this example, the query
selects only the continents that have more than one country after the rows in the table are grouped by
continent.
ORDER BY
Sort query results by one or more columns and in ascending or descending order.
Use the ORDER BY clause to sort query results by one or more columns and in ascending or descending order.
If the items in the table are needed in a specific order of importance, you might need to order the results in
ascending or descending order.
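A combined sketch of WHERE, GROUP BY, HAVING, and ORDER BY, run through sqlite3 against a tiny made-up country table (echoing the continent/country example above); the population figures are illustrative only:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE country (name TEXT, continent TEXT, population INTEGER)")
    conn.executemany("INSERT INTO country VALUES (?, ?, ?)", [
        ("Ireland", "Europe", 5),          # populations in millions, made up for the example
        ("France",  "Europe", 68),
        ("Kenya",   "Africa", 54),
        ("Peru",    "South America", 34),
    ])

    rows = conn.execute("""
        SELECT continent, COUNT(*) AS countries   -- alias for the aggregate column
        FROM country
        WHERE population > 1                      -- filter rows before grouping
        GROUP BY continent                        -- one group per continent
        HAVING COUNT(*) > 1                       -- keep only groups with more than one country
        ORDER BY countries DESC                   -- sort the result set
    """).fetchall()

    print(rows)   # [('Europe', 2)]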
Comment Syntax
Comments begin with specific characters to denote that they are to be ignored and not run.
Single-line comment
• This type of comment begins with a double dash (--).
• Any text between the double dash and the end of the line will be
ignored and not performed.
Inline comment
• This type of comment begins with a double dash (--).
• This comment is similar to the single-line comment in that any text
between the double dash and the end of the line will be ignored
and not performed. This comment differs in that it is preceded by
syntax within the same line, which is not ignored.
Multiple-line comment
• This type of comment begins with /* and ends with */
• Any text between the /* and */ will be ignored.
PERFORMING A CONDITIONAL SEARCH
Types of operators
These operators can be used in SELECT, INSERT, UPDATE, and DELETE statements.
Arithmetic operators
Comparison operators
Operator precedence
SQL operators are evaluated in a defined order in a SQL statement.
Aliases
An alias is used to assign a temporary name to a table or column within a SQL query.
• The alias exists only while the SQL statement is running.
• Aliases are useful for assigning names to obscurely named tables and columns.
• Aliases are also used to assign meaningful names to query output that uses arithmetic SQL operators.
• Aliases can be specified in a SQL statement by using the optional AS keyword.
• If spaces are desired in an alias name, the alias should be defined in quotation marks.
You can use aliases to include a column header of your choosing in a query result set.
In some situations, aliases can make your SQL statements simpler to write and easier to read.
NULL values
Databases use NULL to represent the absence of value for a data item.
• Because they have no value, NULL values cannot be compared to each other by using typical comparison
operators.
• Because they have no value, NULL values are not equal to one another.
• Use IS NULL and IS NOT NULL when working with NULL values in a WHERE clause.
• Tables can be designed so that NULL values are not allowed.
WORKING WITH FUNCTIONS
Built-in functions
Some common functions include aggregate functions, conversion functions, date functions, string functions,
mathematical functions, and control flow and window functions.
Aggregate functions
• Sorting is the practice of organizing the sequence of the data returned by a query so that the data can be
analysed effectively.
• Structured query language (SQL) statements use the ORDER BY clause to sort query output in a specified
order.
• Query output can be sorted in either ascending or descending order.
• SQL statements use the GROUP BY clause to combine query output into groups.
• SQL statements use the HAVING clause to apply filter conditions to aggregated group data.
Set operators
Set operators are used to combine the results of multiple queries into a single result set. You can use different
tables to compare or unite the results into one result set. Queries that contain set operations are referred to
as compound queries.
You can use the UNION operator to combine the results of two or more SELECT statements into a single result
set. Using UNION without the ALL operator will remove duplicate rows from the resulting set. The keyword
ALL lists duplicate rows and displays them in the result set.
JOINs
JOIN clauses (inner, left, right, and full) are used to combine rows from two or more tables.
• INNER JOIN: This JOIN returns only the overlapping data between the two tables.
• LEFT JOIN: This JOIN returns the overlapping data between the two tables and the non-matching data from
the left table.
• RIGHT JOIN: This JOIN is the opposite of LEFT JOIN. It returns the overlapping data between the two tables
and the non-matching data from the right table.
• FULL JOIN: This JOIN returns the overlapping data between the two tables and the non-matching data from
both the left and right tables.
The critical thing to remember is that JOINs are clauses in SQL that link two tables together. A JOIN is usually
based on the key or common value that defines the relationship between those two tables.
You can use a SELF JOIN to join a table to itself by using either a LEFT JOIN or an INNER JOIN.
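A brief INNER JOIN and LEFT JOIN sketch with sqlite3; the two tables and their rows are hypothetical:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE city (name TEXT, country_code TEXT);

    INSERT INTO country VALUES ('IE', 'Ireland'), ('JP', 'Japan');
    INSERT INTO city VALUES ('Dublin', 'IE'), ('Cork', 'IE');
    """)

    # INNER JOIN: only countries with matching cities appear.
    print(conn.execute("""
        SELECT country.name, city.name
        FROM country
        INNER JOIN city ON city.country_code = country.code
    """).fetchall())

    # LEFT JOIN: Japan still appears, with None (NULL) for the missing city.
    print(conn.execute("""
        SELECT country.name, city.name
        FROM country
        LEFT JOIN city ON city.country_code = country.code
    """).fetchall())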
AMAZON RELATIONAL DATABASE SERVICE (AMAZON RDS)
Amazon RDS is a managed database service that sets up and operates a relational database in the cloud.
Running an unmanaged, standalone relational database can be time-consuming and have limited scope. To
address these challenges, AWS provides a service that sets up, operates, and scales the relational database
without any on-going administration.
Amazon RDS provides cost-efficient and resizable capacity while automating time-consuming administrative
tasks.
Amazon RDS frees you to focus on your applications so that you can give them the performance, high
availability, security, and compatibility that they need. With Amazon RDS, your primary focus is your data and
optimizing your application.
Use cases:
• Web and mobile applications: high throughput, massive storage scalability, high availability
• Ecommerce applications: low-cost database, data security, fully managed solution
• Mobile and online games: rapid growth capacity, automatic scaling, database monitoring
DB instance
A DB instance is an isolated database environment that runs in the cloud. It is the basic building block of
Amazon RDS.
Automatic: Creates automated backups (data and transaction logs) of DB instances during the backup window.
Manual: Creates storage volume snapshots of your DB instances.
Aurora
Aurora is a relational database engine.
Benefits of Aurora
• Aurora uses the same code, tools, and applications as existing MySQL and PostgreSQL databases.
• Aurora includes a high-performance storage subsystem. Its database engine is customized to take advantage
of that fast distributed storage.
• An Aurora DB has clusters that consist of one or more DB instances and a cluster volume that manages the
data for those DB instances.
Aurora DB cluster
An Aurora DB cluster consists of one or more DB instances and a cluster volume that manages the data for
those DB instances.
Primary DB instance
The primary DB is the main instance. The instance allows read and write operations and allows for data
modification.
Aurora Replica
The Aurora Replica connects to the same storage volume as the primary DB instance and supports only read
operations.
Enterprise applications
Compared to commercial databases, Aurora can help cut down your database costs by 90 percent or more
while improving the database’s reliability and availability.
Relational databases:
• Data is stored in tables by using predefined columns of specific data types.
• Relationships can be defined between tables by using table foreign keys.
• Better performance is achieved by adding compute or memory capacity to a single database server.
Non-relational databases (such as DynamoDB):
• Data is stored in tables with a flexible column structure.
• Each item stored in a table can have a different number and type of data elements.
• Better performance is achieved by adding a new server to an existing pool of database servers.
• Tables in DynamoDB: Similar to relational database systems, DynamoDB stores data in tables.
– The table name and primary key must be specified at table creation.
– Each DynamoDB table has at least one column that acts as the primary key.
• The primary key is the only data that is required when storing a row in a DynamoDB table. Any other data is
optional.
• An item is a group of attributes that is uniquely identifiable among all of the other items.
– Each item consists of one or more attributes.
– Each item is uniquely identified by its primary key attribute.
– This concept is similar to a table row in a relational database.
REMEMBER: Primary keys uniquely identify each item in the table, so no two items can have the same primary
key.
• Using the global table option creates a DynamoDB table that is automatically replicated across your choice of
AWS Regions worldwide.
– Deliver fast, local read and write performance for global applications.
– Your applications can stay highly available in the unlikely event of isolation or degradation of an
entire AWS Region.
• Global tables eliminate the difficult work of replicating data between Regions and resolving update conflicts
between copies.
AWS re/start
AWS Architecture
AWS CLOUD ADOPTION FRAMEWORK (AWS CAF)
The AWS CAF provides guidance and best practices to help organizations build a comprehensive approach to
cloud computing across the organization.
• The AWS CAF guidance also helps organizations throughout the IT lifecycle to accelerate successful cloud
adoption.
• The AWS CAF is organized into perspectives (sets of business or technology capabilities that are the
responsibility of key stakeholders).
• Perspectives consist of sets of capabilities.
For any organization to successfully migrate its IT portfolio to the cloud, three elements—people, process, and
technology—must be aligned. The AWS CAF provides guidance to support a successful migration to the cloud.
Core perspectives
Business perspective
• IT finance
• IT strategy
• Benefits realization
• Business risk management
Stakeholders from the Business perspective include business managers, finance managers, budget owners, and
strategy stakeholders. They can use the AWS CAF to create a strong business case for cloud adoption and
prioritize cloud adoption initiatives. Stakeholders should ensure that an organization’s business strategies and
goals align with its IT strategies and goals.
People perspective
• Resource management
• Incentive management
• Career management
• Training management
• Organizational change management
Stakeholders from the People perspective include human resources, staffing, and people managers. They can
use the AWS CAF to evaluate organizational structures and roles, assess new skill and process requirements,
and identify gaps. Performing an analysis of needs and gaps can help prioritize training, staffing, and
organizational changes to build an agile organization.
Governance perspective
• Portfolio management
• Program and project management
• Business performance measurement
• License management
Stakeholders from the Governance perspective include the chief information officer (CIO), program managers,
enterprise architects, business analysts, and portfolio managers. They can use the AWS CAF to focus on the
skills and processes needed to align IT strategy and goals with business strategy and goals. This focus helps the
organization maximize the business value of its IT investment and minimize the business risks.
Platform perspective
• Compute provisioning
• Network provisioning
• Storage provisioning
• Database provisioning
• Systems and solution architecture
• Application development
Stakeholders from the Platform perspective include the chief technology officer (CTO), IT managers, and
solutions architects. They use a variety of architectural dimensions and models to understand and
communicate the nature of IT systems and their relationships. They must be able to describe the architecture
of the target state environment in detail. The AWS CAF includes principles and patterns for implementing new
solutions on the cloud and for migrating on-premises workloads to the cloud.
Security perspective
Stakeholders from the Security perspective include the chief information security officer (CISO), IT security
managers, and IT security analysts. They must ensure that the organization meets security objectives for
visibility, auditability, control, and agility. Security perspective stakeholders can use the AWS CAF to structure
the selection and implementation of security controls that meet the organization’s needs.
Operations perspective
• Service monitoring
• Application performance monitoring
• Resource inventory management
• Release management or change management
• Reporting and analytics
• Business continuity or disaster recovery (DR)
• IT service catalogue
Stakeholders from the Operations perspective (for example, IT operations managers and IT support managers)
define how day-to-day, quarter-to-quarter, and year-to-year business is conducted. Stakeholders from the
Operations perspective align with and support the operations of the business. The AWS CAF helps these
stakeholders define current operating procedures. It also helps them identify the process changes and training
that are needed to implement successful cloud adoption.
AWS WELL-ARCHITECTED FRAMEWORK
The Well-Architected Framework describes key concepts, design principles, and architectural best practices for
designing and running workloads in the AWS Cloud.
Features
The Well-Architected Framework provides a set of foundational questions that help you to understand
whether a specific architecture aligns well with cloud best practices. It also includes information about services
and solutions that are relevant to each question and references to relevant resources.
Provides
• Questions that are centred on critically understanding architectural decisions
• Domain-specific lenses
• Hands-on labs
• AWS Well-Architected Tool
• AWS Well-Architected Partner Program
The Well-Architected Framework helps you design your architecture from different perspectives, or pillars. The
pillars are operational excellence, security, reliability, performance efficiency, cost optimization, and
sustainability. Each pillar contains a set of design principles and best practices.
Operational excellence
Key topics:
• Manage and automate changes.
• Respond to events.
• Define standards to manage daily operations.
This pillar includes how your organization supports your business objectives and your ability to run workloads
effectively. It also includes how your organization supports your ability to gain insight into their operations and
to continuously improve supporting processes and procedures to deliver business value.
An example of an operational excellence best practice is to continuously monitor the health and performance
of your workloads using a service such as Amazon CloudWatch. You can use this service to initiate automated
responses to adjust the resources that your workloads use and to prevent performance issues or failures.
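As a rough illustration of that practice, the following sketch uses the AWS SDK for Python (Boto3) to create a
CloudWatch alarm on EC2 CPU utilization; the instance ID and the notification topic ARN used as the alarm
action are placeholders, not values from this course.

import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when average CPU utilization of one instance exceeds 80% for two
# consecutive 5-minute periods. Instance ID and topic ARN are placeholders.
cloudwatch.put_metric_alarm(
    AlarmName="web-server-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:ops-alerts"],
)

The alarm action could instead reference an Amazon EC2 Auto Scaling policy so that capacity is adjusted
automatically when the alarm fires.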
Make frequent, small, reversible changes: Design workloads that are scalable and loosely coupled to permit
components to be updated regularly. Automated deployment techniques together with smaller, incremental
changes reduce the blast radius and allow for faster reversal when failures occur. This increases confidence
to deliver beneficial changes to your workload while maintaining quality and adapting quickly to changes in
market conditions.
Refine operations procedures frequently: As you evolve your workloads, evolve your operations
appropriately. As you use operations procedures, look for opportunities to improve them. Hold regular reviews
and validate that all procedures are effective and that teams are familiar with them. Where gaps are
identified, update procedures accordingly. Communicate procedural updates to all stakeholders and teams.
Gamify your operations to share best practices and educate teams.
Anticipate failure: Perform “pre-mortem” exercises to identify potential sources of failure so that they can be
removed or mitigated. Test your failure scenarios and validate your understanding of their impact. Test your
response procedures to ensure they are effective and that teams are familiar with their process. Set up regular
game days to test workload and team responses to simulated events.
Learn from all operational failures: Drive improvement through lessons learned from all operational events
and failures. Share what is learned across teams and through the entire organization.
Security
Key topics:
• Identify and manage who can do what.
• Establish controls to detect security events.
• Protect systems and services.
• Protect the confidentiality and integrity of data.
The security pillar involves the ability to monitor and protect systems while delivering business value through
risk assessments and mitigation strategies. An example of security in the cloud would be staying up to date
with AWS and industry recommendations and threat intelligence. Automation can be used for security
processes, testing, and validation to scale security operations.
Maintain traceability: Monitor, alert, and audit actions and changes to your environment in real time.
Integrate log and metric collection with systems to automatically investigate and take action.
Apply security at all layers: Apply a defence in depth approach with multiple security controls. Apply to all
layers (for example, edge of network, VPC, load balancing, every instance and compute service, operating
system, application, and code).
Automate security best practices: Automated software-based security mechanisms improve your ability to
securely scale more rapidly and cost-effectively. Create secure architectures, including the implementation of
controls that are defined and managed as code in version-controlled templates.
Protect data in transit and at rest: Classify your data into sensitivity levels and use mechanisms, such as
encryption, tokenization, and access control where appropriate.
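For example, a minimal Boto3 sketch (the bucket name is a placeholder) that turns on default server-side
encryption so that new objects are encrypted at rest:

import boto3

s3 = boto3.client("s3")

# Require server-side encryption (SSE-S3) by default for all new objects
# written to the bucket. "my-sensitive-data-bucket" is a placeholder name.
s3.put_bucket_encryption(
    Bucket="my-sensitive-data-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    },
)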
Keep people away from data: Use mechanisms and tools to reduce or eliminate the need for direct access or
manual processing of data. This reduces the risk of mishandling or modification and human error when
handling sensitive data.
Prepare for security events: Prepare for an incident by having incident management and investigation policy
and processes that align to your organizational requirements. Run incident response simulations and use tools
with automation to increase your speed for detection, investigation, and recovery.
Reliability
The reliability pillar encompasses the ability of a workload to perform its intended function correctly and
consistently when it is expected to. This ability includes operating and testing the workload through its total
lifecycle.
Performance efficiency
The performance efficiency pillar refers to using computing resources efficiently while meeting system
requirements. At the same time, it is important to maintain that efficiency as demand fluctuates and
technologies evolve. To implement performance efficiency, take a data-driven approach to building a high-
performance architecture. Gather data on all aspects of the architecture from the high-level design to the
selection and configuration of resource types.
Reviewing your choices on a regular basis helps ensure that you are taking advantage of the continually
evolving AWS Cloud. Monitoring helps ensure that you are aware of any deviance from expected performance.
Make trade-offs in your architecture to improve performance, such as using compression or caching, or
relaxing consistency requirements.
Factors that influence performance efficiency in the cloud include the following:
• Selection: It is important to choose the best solution that will optimize your architecture. Solutions vary
based on the kind of workload that you have, and you can use AWS to customize your solutions in many
different ways and configurations.
• Review: You can continually innovate your solutions and take advantage of the newer technologies and
approaches that become available. Any of these newer releases could improve the performance efficiency of
your architecture.
• Monitoring: After you implement your architecture, you must monitor performance to help ensure that you
can remediate any issues before customers are affected and aware of them. With AWS, you can use
automation and monitor your architecture with tools such as Amazon CloudWatch, Amazon Kinesis, Amazon
Simple Queue Service (Amazon SQS), and AWS Lambda.
• Trade-offs: An example of a trade-off that helps ensure an optimal approach is trading consistency,
durability, and space against time or latency to deliver higher performance.
Go global in minutes: Deploying your workload in multiple AWS Regions around the world allows you to
provide lower latency and a better experience for your customers at minimal cost.
Use serverless architectures: Serverless architectures remove the need for you to run and maintain physical
servers for traditional compute activities. For example, serverless storage services can act as static websites
(removing the need for web servers) and event services can host code. This removes the operational burden of
managing physical servers, and can lower transactional costs because managed services operate at cloud scale.
Experiment more often: With virtual and automatable resources, you can quickly carry out comparative
testing using different types of instances, storage, or configurations.
Consider mechanical sympathy: Use the technology approach that aligns best with your goals. For example,
consider data access patterns when you select database or storage for your workload.
Cost optimization
Cost optimization refers to the ability to avoid or eliminate unneeded expenses and resources. It is a continual
process of refinement and improvement over the span of a workload’s lifecycle.
Similar to the other pillars within the Well-Architected Framework, cost optimization has trade-offs to
consider. For example, you want to consider whether to optimize for speed-to-market or for cost. In some
cases, it’s best to optimize for speed—going to market quickly, shipping new features, or meeting a deadline—
rather than investing in upfront cost optimization.
Adopt a consumption model: Pay only for the computing resources you consume, and increase or decrease
usage depending on business requirements. For example, development and test environments are typically
only used for eight hours a day during the work week. You can stop these resources when they’re not in use
for a potential cost savings of 75% (40 hours versus 168 hours).
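As a hedged sketch of that idea using Boto3, the following stops any running EC2 instances tagged as
development or test resources, for example at the end of the working day; the tag key and values are
assumptions.

import boto3

ec2 = boto3.client("ec2")

# Find running instances tagged Environment=dev or Environment=test
# (the tag key and values are assumptions for this example).
reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:Environment", "Values": ["dev", "test"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]

instance_ids = [
    instance["InstanceId"]
    for reservation in reservations
    for instance in reservation["Instances"]
]

# Stop them so that compute time is not billed outside working hours.
if instance_ids:
    ec2.stop_instances(InstanceIds=instance_ids)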
Measure overall efficiency: Measure the business output of the workload and the costs associated with
delivery. Use this data to understand the gains you make from increasing output, increasing functionality, and
reducing cost.
Stop spending money on undifferentiated heavy lifting: AWS does the heavy lifting of data centre operations
like racking, stacking, and powering servers. It also removes the operational burden of managing operating
systems and applications with managed services. This allows you to focus on your customers and business
projects rather than on IT infrastructure.
Analyse and attribute expenditure: The cloud makes it easier to accurately identify the cost and usage of
workloads, which then allows transparent attribution of IT costs to revenue streams and individual workload
owners. This helps measure return on investment (ROI) and gives workload owners an opportunity to optimize
their resources and reduce costs.
Sustainability
The discipline of sustainability addresses the long-term environmental, economic, and societal impact of your
business activities. When building cloud workloads, the practice of sustainability includes the following:
• Understanding the impacts of the services used
• Quantifying impacts through the entire workload lifecycle
• Applying design principles and best practices to reduce these impacts
This pillar focuses on environmental impacts, especially energy consumption and efficiency, which are
important levers that architects can use to inform direct action to reduce resource usage.
You can use the AWS Cloud to run workloads designed to support your wider sustainability challenges.
Examples of these challenges include reducing carbon emissions, lowering energy consumption, recycling
water, or reducing waste in other areas of your business or organization.
Establish sustainability goals: For each cloud workload, establish long-term sustainability goals such as
reducing the compute and storage resources required per transaction. Model the return on investment of
sustainability improvements for existing workloads, and give owners the resources they need to invest in
sustainability goals. Plan for growth, and architect your workloads so that growth results in reduced impact
intensity measured against an appropriate unit, such as per user or per transaction. Goals help you support the
wider sustainability goals of your business or organization, identify regressions, and prioritize areas of
potential improvement.
Maximize utilization: Right-size workloads and implement efficient design to ensure high utilization and
maximize the energy efficiency of the underlying hardware. Two hosts running at 30% utilization are less
efficient than one host running at 60% due to baseline power consumption per host. At the same time,
eliminate or minimize idle resources, processing, and storage to reduce the total energy required to power
your workload.
Anticipate and adopt new, more efficient hardware and software offerings: Support the upstream
improvements your partners and suppliers make to help you reduce the impact of your cloud workloads.
Continually monitor and evaluate new, more efficient hardware and software offerings. Design for flexibility to
allow for the rapid adoption of new efficient technologies.
Use managed services: Sharing services across a broad customer base helps maximize resource utilization,
which reduces the amount of infrastructure needed to support cloud workloads. For example, customers can
share the impact of common data centre components like power and networking by migrating workloads to
the AWS Cloud and adopting managed services, such as AWS Fargate for serverless containers, where AWS
operates at scale and is responsible for their efficient operation. Use managed services that can help minimize
your impact, such as automatically moving infrequently accessed data to cold storage with Amazon S3
Lifecycle configurations or Amazon EC2 Auto Scaling to adjust capacity to meet demand.
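For example, the following Boto3 sketch (the bucket name and prefix are placeholders) applies an S3 Lifecycle
rule that moves infrequently accessed log objects to colder storage classes over time:

import boto3

s3 = boto3.client("s3")

# Transition objects under the logs/ prefix to S3 Standard-IA after 30 days
# and to S3 Glacier Flexible Retrieval after 90 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-log-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)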
Reduce the downstream impact of your cloud workloads: Reduce the amount of energy or resources required
to use your services. Reduce or eliminate the need for customers to upgrade their devices to use your services.
Test using device farms to understand expected impact and test with customers to understand the actual
impact from using your services.
WELL-ARCHITECTED PRINCIPLES
The AWS Well-Architected Framework identifies a set of general design principles to facilitate good design in
the cloud:
Reliability
• Is the probability that an entire system functions for a specified period of time
• Includes hardware, firmware, and software
• Measures how long the item performs its intended function
• Mean time between failure (MTBF): Total time in service divided by the number of failures
• Failure rate: Number of failures divided by the total time in service
Availability
Availability is a measure of the percentage of time that a resource is operating normally.
• Availability is a percentage of uptime (such as 99.9 percent) over a period of time (commonly a year).
• Availability is equal to the normal operation time divided by the total time.
• A common shorthand refers to only the number of 9s.
• For example, five 9s is 99.999 percent available.
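To make the arithmetic concrete, this short Python sketch converts an availability percentage into the
maximum downtime allowed per year (availability equals normal operation time divided by total time):

HOURS_PER_YEAR = 365 * 24  # 8,760 hours

for availability in (99.0, 99.9, 99.99, 99.999):
    downtime_hours = HOURS_PER_YEAR * (1 - availability / 100)
    print(f"{availability}% availability allows about {downtime_hours:.2f} hours of downtime per year")

# 99.9% (three 9s) allows about 8.76 hours of downtime per year.
# 99.999% (five 9s) allows about 0.09 hours (roughly 5 minutes) per year.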
Traditional, or on-premises IT
In traditional, on-premises IT, HA is the following:
• Expensive
• Suitable for only mission-critical applications
AWS Cloud
AWS expands availability and recoverability options by providing the ability to use the following:
• Multiple servers
• Isolated redundant data centres within each Availability Zone
• Multiple Availability Zones within each AWS Region
• Multiple Regions around the world
• Fault-tolerant services
TRANSITIONING A DATA CENTER TO THE CLOUD
A traditional on-premises infrastructure (or corporate data center) might include a setup that is similar to this
example. This diagram represents a three-tier, client-server architecture in a corporate data center. The box
labelled Corporate Data Center indicates what is contained in the data center.
The bottom of this diagram includes the database servers with attached tape backup devices. This tier is
responsible for the database logic.
The middle of the diagram contains the application servers. An application server is a component-based
product that resides in the middle tier of a server-centric architecture. It provides middleware services for
security and state maintenance and also provides data access and persistence. The application servers also
contain the business logic. The middle section also contains network-attached storage (NAS). NAS devices are
file servers that provide a centralized location for users on a network to store, access, edit, and share files.
The web servers are located at the top of the diagram. The web servers are responsible for the presentation
logic. They are accompanied by load balancers. Load balancers are responsible for efficiently distributing
incoming network traffic across a group of backend servers.
The Microsoft Active Directory or Lightweight Directory Access Protocol (LDAP) server is like a phone book that
anyone can use to locate organizations, individuals, and other resources (such as files and devices in a
network) on the public internet or on a corporate intranet.
The box labelled Storage Area Network (SAN) with the attached external disks refers to storage that is outside
the corporate data center. A SAN is a specialized, high-speed network that provides block-level network access
to storage. SANs are often used to improve application availability (for example, multiple data paths). SANs are
also used to enhance application performance (for example, off-load storage functions, separate networks,
and so on).
You could replace a traditional on-premises or corporate data center with the following in the AWS Cloud:
• You can replace servers, such as the on-premises web servers and app servers, with Amazon Elastic Compute
Cloud (Amazon EC2) instances that run all the same software. Because EC2 instances can run a variety of
Microsoft Windows Server, Red Hat, SuSE, Ubuntu, or Amazon Linux operating systems, you can run many
server applications on EC2 instances.
• You can replace the LDAP server with AWS Directory Service, which supports LDAP authentication. With
Directory Service, you can set up and run Microsoft Active Directory in the cloud or connect your AWS
resources with existing on-premises Microsoft Active Directory.
• You can replace software-based load balancers with Elastic Load Balancing (ELB) load balancers. ELB is a fully
managed load balancing solution that scales automatically as needed. It can perform health checks on
attached resources and redistribute a load away from unhealthy resources as necessary.
• Amazon Elastic Block Store (Amazon EBS) is a storage service to use with Amazon EC2. You can replace SAN
solutions with EBS volumes. You can attach these volumes to application servers to store data long-term and
share the data between instances.
• You can use Amazon Elastic File System (Amazon EFS) to replace your NAS file server. Amazon EFS is a file
storage service for EC2 instances. It offers a user-friendly interface that you can use to create and configure file
systems. It also grows and shrinks your storage automatically as you add and remove files so that you always
use exactly the amount of storage that you need. Another solution could be to run an NAS solution on an EC2
instance.
• You can replace databases with Amazon Relational Database Service (Amazon RDS). With this service, you
can run Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and Microsoft SQL Server on a platform that is
managed by AWS.
• Finally, you can automatically back up RDS instances to Amazon Simple Storage Service (Amazon S3). Using
Amazon S3 replaces the need for on-premises, database backup hardware. Amazon S3 provides object storage
through a web service interface. Objects can be up to 5 TB in size, and you can turn on versioning for your objects.
After transitioning to the AWS Cloud, the example data center might look like this diagram.
The ELB load balancer distributes traffic to the web servers that are now located on EC2 instances. The LDAP
server is now Directory Service. ELB has replaced software-based load balancers and distributes traffic to the
servers, which are now EC2 instances. Amazon EBS has replaced SAN solutions. Amazon EFS has replaced the
NAS file server, and Amazon RDS has replaced the databases.
AWS re/start
Systems
Operations
SYSTEMS OPERATIONS ON AWS
Systems operations (SysOps) is concerned with the deployment, administration, and monitoring of systems
and network resources in an automatable and reusable manner.
SysOps contains critical tasks that keep many companies running today. SysOps supports technical systems by
monitoring them and helping ensure that their performance meets expectations and is trouble-free. SysOps
typically requires understanding a system's entire environment.
SysOps involves the responsibilities and tasks required to build (create), test, deploy, monitor, maintain, and
safeguard complex computing systems.
SysOps professionals typically use automation because of the large size of the infrastructure.
Systems operations in the cloud
• Cloud computing provides organizations the ability to automate the development, testing, and deployment
of complex IT operations.
IAM
• Centrally manage authentication and access to Amazon Web Services (AWS) resources.
• Create users, groups, and roles.
• Apply policies to them to control their access to AWS resources.
Use IAM to configure authentication, which is the first step, because it controls who can access AWS
resources. IAM can also be used to authenticate resources. For example, applications and other AWS services
use it for access.
IAM is used to configure authorization, which is based on knowing who the user is. Authorization controls
which resources users can access and what they can do to or with those resources. IAM reduces the need to
share passwords or access keys when you grant access rights. It also makes it easy to turn on or turn off a
user’s access over time and as appropriate.
Programmatic access:
• Authenticates by using an access key ID and a secret access key
• Provides access to APIs, AWS Command Line Interface (AWS CLI), SDKs, and other development tools
AWS Management Console access:
• Uses account ID or alias, IAM user name, and password
• Prompts the user for an authentication code if multi-factor authentication (MFA) is turned on
1. Identity-based policies allow a user to attach managed and inline policies to IAM identities, such as users or
the groups that users belong to. A user can also attach identity-based policies to roles. Identity-based policies
are defined and stored as JSON documents.
2. Resource-based policies allow a user to attach inline policies to resources. The most common examples of
resource-based policies are Amazon Simple Storage Service (Amazon S3) bucket policies and IAM role trust
policies. Resource-based policies are JSON policy documents.
3. AWS Organizations service control policies (SCPs) apply permissions boundaries to AWS Organizations,
organizational units (OUs), or accounts. SCPs use the JSON format.
4. Access control lists (ACLs) can also be used to control which principals (that is, users or resources) can
access a resource. ACLs are similar to resource-based policies although they are the only policy type that does
not use the JSON policy document structure.
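As an illustration of an identity-based policy defined as a JSON document, the following Boto3 sketch creates a
policy that allows read-only access to one S3 bucket and attaches it to an IAM user; the policy name, bucket,
and user name are placeholders.

import json
import boto3

iam = boto3.client("iam")

# Identity-based policy document (JSON) allowing read-only access to one bucket.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*",
            ],
        }
    ],
}

response = iam.create_policy(
    PolicyName="ExampleBucketReadOnly",
    PolicyDocument=json.dumps(policy_document),
)

# Attach the managed policy to an IAM user (the user name is a placeholder).
iam.attach_user_policy(
    UserName="analyst-user",
    PolicyArn=response["Policy"]["Arn"],
)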
Best practices
1. Avoid using the account root user credentials for daily administration. Instead, when you set up a new AWS
account, define at least one new IAM user. Then, grant the user or users access so that they can do most daily
tasks by using these IAM user credentials.
2. Delegate administrative functions by following the principle of least privilege. Grant access to only services
that are needed, and limit permissions within those services to only the parts that are needed. You can always
grant additional rights over time if the need arises.
3. Use IAM roles to provide cross-account access. Other best practices for IAM mentioned earlier in this lesson
include configuring strong password policies, turning on MFA for any privileged users, and rotating credentials
regularly.
The AWS Management Console provides a rich graphical interface for a majority of the products and services
that AWS offers. Occasionally, new features might not have all of their capabilities available through the
console when the feature initially launches.
The AWS CLI provides a suite of utilities that can be run from a command program in Linux, macOS, or
Microsoft Windows.
The software development kits (SDKs) are packages that AWS provides. The SDKs provide access to AWS
services by using popular programming languages, such as Python, Ruby, .NET, or Java. The SDKs make it
straightforward to use AWS in existing applications. You can also use them to create applications to deploy
and monitor complex systems entirely through code.
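For instance, a minimal sketch with the AWS SDK for Python (Boto3), which uses the credentials and default
Region already configured for your environment, lists the S3 buckets in the account:

import boto3

# Boto3 reads credentials and the default Region from your environment
# (for example, the settings created by the aws configure command).
s3 = boto3.client("s3")

for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])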
AWS CLI
• The AWS CLI is available for Linux, Microsoft Windows, and macOS.
• After installing the AWS CLI, you can use the aws configure command to specify default settings.
1. Use the curl command to download the installer package. The -o option specifies the file name that the
downloaded package is written to; for example, -o awscliv2.zip writes the downloaded file to the current
directory with the local name awscliv2.zip.
2. Use the unzip command to extract the installer package. If unzip is not available, use an equivalent program.
The command extracts the package and creates a directory named aws under the current directory.
3. Next, run the install program. By default, the files are all installed to /usr/local/aws-cli, and a symbolic link is
created in /usr/local/bin. The command includes sudo to give you write permissions to those directories.
4. Finally, verify the installation by running aws --version, which displays the version of the AWS CLI and its
software dependencies that you just installed.
Command-line format
The command line format can be broken down into several parts: the aws program name, the top-level
command (the AWS service), the subcommand (the operation to perform), and any parameters or options.
Query option
Use the --query option to limit fields displayed in the result set.
Dry-run option
The --dry-run option:
• This option checks for required permissions without making a request.
• It also provides an error response if unauthorized.
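The SDKs expose similar behaviour. As a hedged sketch, the following Boto3 calls mirror these two options: the
response dictionary can be filtered in code (like --query), and DryRun=True checks permissions without
performing the action (like --dry-run); the instance ID is a placeholder.

import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client("ec2")

# Equivalent of --query: pull only the fields you need out of the response.
response = ec2.describe_instances()
instance_ids = [
    instance["InstanceId"]
    for reservation in response["Reservations"]
    for instance in reservation["Instances"]
]
print(instance_ids)

# Equivalent of --dry-run: check permissions without starting the instance.
try:
    ec2.start_instances(InstanceIds=["i-0123456789abcdef0"], DryRun=True)
except ClientError as error:
    # "DryRunOperation" means the request would have succeeded;
    # "UnauthorizedOperation" means the caller lacks permission.
    print(error.response["Error"]["Code"])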
Tooling &
Automation
AWS SYSTEMS MANAGER
Systems Manager is a collection of capabilities that help you manage your applications and infrastructure
running in the AWS Cloud.
Capabilities overview
Documents
A Systems Manager document defines the actions that Systems Manager performs on your managed
instances.
Automation
Safely automate common and repetitive IT operations and management tasks across AWS resources.
Run Command
The Systems Manager Run Command provides an automated way to run predefined commands against EC2
instances.
Patch Manager
Deploy operating system and software patches automatically across large groups of EC2 instances or on-
premises machines.
Maintenance Windows
Schedule windows of time to run administrative and maintenance tasks across your instances.
State Manager
Maintain consistent configuration of Amazon EC2 or on-premises instances.
Parameter Store
Parameter Store provides a centralized store to manage configuration data or secrets.
Inventory
The Inventory capability collects information about instances and the software that is installed on them.
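As a small example of one capability, the following Boto3 sketch stores a configuration value in Parameter
Store as an encrypted secret and reads it back; the parameter name and value are placeholders.

import boto3

ssm = boto3.client("ssm")

# Store a secret centrally instead of embedding it in application code.
ssm.put_parameter(
    Name="/myapp/database/password",
    Value="example-password",
    Type="SecureString",
    Overwrite=True,
)

# Retrieve and decrypt the value at run time.
parameter = ssm.get_parameter(
    Name="/myapp/database/password",
    WithDecryption=True,
)
print(parameter["Parameter"]["Value"])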
ADMINISTRATION TOOLS
AWS CloudFormation
You can use CloudFormation to create, update, and delete a set of AWS resources as a single unit.
• You define the resources in a template, which can be written in JSON or YAML.
• CloudFormation provisions the resources defined in a template as a single unit called a stack.
• Key features of CloudFormation include the ability to do the following:
• Preview how proposed changes to a stack will impact the existing environment.
• Detect drift.
• Invoke an AWS Lambda function.
Benefits of CloudFormation
CloudFormation provides the following benefits: reusability, repeatability, and maintainability.
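As a minimal sketch of these ideas, the following Boto3 call provisions a stack from a small YAML template that
defines a single S3 bucket; the stack and bucket names are placeholders.

import boto3

# A minimal CloudFormation template written in YAML. It defines one resource:
# an S3 bucket. The bucket name is a placeholder and must be globally unique.
template_body = """
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  ExampleBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: example-cfn-demo-bucket
"""

cloudformation = boto3.client("cloudformation")

# Provision every resource in the template as a single unit (a stack).
cloudformation.create_stack(
    StackName="example-stack",
    TemplateBody=template_body,
)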
Servers
HOSTING A STATIC WEBSITE ON AMAZON S3
• Amazon S3 stores the HTML, CSS, and JavaScript pages of the static website.
• Amazon S3 automatically assigns an endpoint URL that you can use to access the website.
The benefits of using the Amazon S3 website hosting feature include the following:
Use cases
One limitation of Amazon S3 is that it can serve only HTTP requests to a website. If you need to support HTTPS,
you can use Amazon CloudFront to serve the static website hosted on Amazon S3.
In the website endpoint URL, the type of separator between "s3-website" and the Region code depends on the
Region that contains the bucket. For example, if the bucket is created in the US West (Oregon) Region, the
separator character is a dash. However, if the bucket is created in the Europe (Frankfurt) Region, the separator
character is a period.
• The S3 bucket should store the website in a folder hierarchy that reflects the content structure of the
website.
• The S3 bucket must include an index document that you define during bucket configuration. The default
name is index.html.
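A rough Boto3 sketch of turning on the website hosting feature for an existing bucket (the bucket name is a
placeholder, and the bucket must also allow public read access through its policy or be fronted by Amazon
CloudFront):

import boto3

s3 = boto3.client("s3")
bucket = "example-static-site-bucket"  # placeholder; must be globally unique

# Turn on static website hosting and define the index and error documents.
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)

# Upload the home page with the correct content type so browsers render it.
s3.put_object(
    Bucket=bucket,
    Key="index.html",
    Body=b"<html><body><h1>Hello from Amazon S3</h1></body></html>",
    ContentType="text/html",
)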
AMAZON EC2
EC2 instances run as virtual machines on host computers that are located in AWS Availability Zones. Each
virtual machine runs an operating system (OS), such as Amazon Linux or Microsoft Windows. You can install
and run applications on the OS in each virtual machine or even run enterprise applications that span multiple
virtual machines.
The virtual machines run on top of a hypervisor layer that AWS maintains. The hypervisor is the operating
platform layer that provides the EC2 instances with access to the actual hardware that the instances need to
run. This hardware includes processors, memory, and storage. Each EC2 instance receives a particular number
of virtual CPUs for processing and an amount of memory, or RAM.
Some EC2 instances use an instance store. The instance store is also known as ephemeral storage. It is storage
that is physically attached to the host computer and provides temporary block-level storage for use with an
instance. The data in an instance store persists only during the lifetime of the instance that uses it. If an
instance reboots, data in the instance store persists. If the instance stops or terminates, data in the instance
store is lost and cannot be recovered.
Amazon EBS optimized instances minimize input/output (I/O) contention between Amazon EBS and other
traffic from your instance, which provides better performance. I/O contention occurs when virtual machines
compete for I/O resources because there is limited network bandwidth.
EC2 instances can also connect to the internet at large, other EC2 instances, and Amazon Simple Storage
Service (Amazon S3) object storage. You can configure the degree of network access to suit your needs and to
balance accessibility needs with security requirements. Different instance types provide different levels of
network performance.
AMI
Template that contains information used to create an EC2 instance
Components:
• Template for root volume: includes an OS, and perhaps an application server and other applications.
• Block device mapping: specifies the default EBS volumes and instance store volumes to attach to the
instance when it is launched.
• Launch permissions: control which AWS accounts can use the AMI.
Benefits:
• Repeatable
• Reusable
• Available from multiple sources
Instance types
Defines a combination of CPU, memory, storage, and networking capacity. Many instance types exist and give
you the flexibility to choose the appropriate mix of resources for your applications. Some are general purpose,
and others are designed to provide extra CPU (processing power), extra RAM (memory), or extra I/O network
performance. Instance types are grouped by categories and families. You should choose the most cost-
effective instance type that supports your workload’s requirements.
Key pairs
Amazon EC2 uses public key cryptography to encrypt and decrypt login information. Public key cryptography
uses a public key to encrypt a piece of data, such as a password. The recipient then uses a private key to
decrypt the data. The public and private keys are known as a key pair.
A key pair is necessary to log in to your instance. You need a key pair that is known and registered in the SSH
settings of the OS that you are connecting to. Typically, you specify the name of the key pair when you launch
the instance. You can create a new key pair and download it as part of the instance launch process.
Alternatively, when you launch the instance, you can specify a key pair that you already have access to. When
the instance is launched, AWS handles the process of configuring the instance to accept the key pair that you
specify. After the instance has booted and you want to connect to it, you can use the private key to connect to
the instance.
VPC
A VPC provides the networking environment for an EC2 instance.
When you launch an EC2 instance, you launch it into a network environment. Typically, you launch it into a
VPC that is created with Amazon Virtual Private Cloud (Amazon VPC). The VPC defines a virtual network in your
own logically isolated area within the AWS Cloud. You can then launch AWS resources, such as instances, into
the VPC. Your VPC closely resembles a traditional network that you might operate in your own data centre.
In the VPC, you define one or more subnets. Subnets are logical network segments within the VPC, and each
subnet exists within a single Availability Zone. Another part of the network configuration is an internet
gateway. An internet gateway is a horizontally scaled, redundant, and highly available VPC component that
handles the communication between the instances in your VPC and the internet.
A virtual private gateway is an optional component that supports virtual private network (VPN) connections.
The virtual private gateway sits on the Amazon side of the VPN connection. You create a virtual private
gateway and attach it to the VPC that you want to create the VPN connection from. The customer side of the
VPN connection has a customer gateway, which is a physical device or software application. Notice that the
diagram shows only one possible VPN solution.
Security groups are also in this network diagram. Each security group defines a set of firewall rules that allow
or block inbound and outbound traffic to or from an instance. Security groups act at the instance level, not the
subnet level. Therefore, each instance in a subnet in your VPC can be assigned to a different set of security
groups. If you do not specify a security group at launch time, the instance will be automatically assigned to the
default security group for the VPC.
Types of IP addresses
A private IP address is always assigned to each instance when it is launched. It is allocated to the instance from
the pool of private IP addresses that are available in the subnet. EC2 instances in the VPC can use private IP
addresses to communicate with each other.
A public IP address can be optionally assigned to an EC2 instance. It is generated dynamically from a pool of
available AWS public IP addresses. Clients can use the public IP address to connect to the instance from the
internet. If you stop an instance and then start it again, it receives a new public IP address. However, if you
reboot an instance, it retains the same public IP address.
An Elastic IP address is a publicly accessible IP address that is allocated from an AWS pool of public IP
addresses. An Elastic IP address can optionally be provisioned and then assigned to an EC2 instance. Elastic IP
addresses are similar to public IP addresses, except that an Elastic IP address is static. You can reassign an
Elastic IP address to another instance at any time.
Security groups
Each instance must have at least one security group that is associated with it. Security groups are essentially
stateful firewalls that surround one or more EC2 instances to give you control over network traffic. A stateful
firewall is a firewall that monitors the full state of active network connections. You can control Internet Control
Message Protocol (ICMP), Transmission Control Protocol (TCP), and User Datagram Protocol (UDP) network
traffic that can pass to the instance. It’s important to understand that security groups are applied to specific
instances rather than at the entry point to your network.
In addition to restricting which ports traffic can flow through, you can also restrict which IP addresses that
traffic can originate from. If you set the source IP address range as 0.0.0.0/0, traffic on that port will be
allowed from any source. However, you can also specify a specific IP address or a Classless Inter-Domain
Routing (CIDR) range. Alternatively, you can allow access only from sources within the AWS Cloud that have a
specific security group assigned to them. By default, when you create a new security group in a VPC, all
outbound traffic is open.
You can assign multiple security groups to a single instance. For example, you can create an administrative
security group, which would allow traffic on TCP port 22. You can also create a database server security group,
which would allow traffic on TCP port 3306. Then, you can assign both of those security groups to one
instance. You can apply a single security group to multiple instances.
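The following Boto3 sketch creates the two security groups described above and opens the matching ports; the
VPC ID and the allowed source CIDR ranges are placeholders.

import boto3

ec2 = boto3.client("ec2")
vpc_id = "vpc-0123456789abcdef0"  # placeholder

# Administrative security group: allow SSH (TCP port 22) from one CIDR range.
admin_sg = ec2.create_security_group(
    GroupName="admin-sg",
    Description="Allow SSH from the admin network",
    VpcId=vpc_id,
)
ec2.authorize_security_group_ingress(
    GroupId=admin_sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
        "IpRanges": [{"CidrIp": "203.0.113.0/24"}],
    }],
)

# Database security group: allow MySQL traffic (TCP port 3306) from the VPC.
db_sg = ec2.create_security_group(
    GroupName="db-sg",
    Description="Allow MySQL traffic from the application tier",
    VpcId=vpc_id,
)
ec2.authorize_security_group_ingress(
    GroupId=db_sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 3306, "ToPort": 3306,
        "IpRanges": [{"CidrIp": "10.0.0.0/16"}],
    }],
)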
Instance profile
• You use an instance profile to attach an AWS Identity and Access Management (IAM) role to an EC2 instance.
• The role supplies temporary permissions that applications running on the instance use to authenticate when
they make calls to AWS resources.
User data
• You can pass user data to an instance to perform customization and configuration tasks when the instance
starts.
• The format of user data varies depending on the OS:
• A shell script or cloud-init directives on a Linux instance
• A batch script or a PowerShell script on a Windows instance
• By default, a user data script runs only the first time you launch an instance.
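For example, a hedged Boto3 sketch that launches a Linux instance and passes a shell script as user data to
install a web server on first boot; the AMI ID, key pair name, and security group ID are placeholders, and the
script assumes an Amazon Linux style package manager.

import boto3

ec2 = boto3.client("ec2")

# Shell script passed as user data; it runs once, on the first boot.
user_data = """#!/bin/bash
yum update -y
yum install -y httpd
systemctl enable --now httpd
"""

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",             # placeholder AMI ID
    InstanceType="t2.micro",
    KeyName="my-key-pair",                       # placeholder key pair name
    SecurityGroupIds=["sg-0123456789abcdef0"],   # placeholder security group
    MinCount=1,
    MaxCount=1,
    UserData=user_data,
)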
Instance security
• Protect the default user account (ec2-user on Linux and Administrator on Windows) because it has
administrative permissions.
• Create additional accounts for new users to access the instance.
• Create a key pair or use an existing key pair for the new user.
• For a Linux instance, add new user accounts with SSH access to the instance, and do not use
password logins.
• For a Windows instance, use Active Directory or AWS Directory Service to tightly and centrally
control user and group access.
• Apply security patches regularly.
Remote connection to an instance
Use EC2 Instance Connect or Session Manager, a capability of AWS Systems Manager, to connect to your
instances without the need to manage SSH keys.
• Use the instance console screenshot capability to troubleshoot launch or remote connection problems.
• Turn on termination protection to protect an instance from accidental termination.
• Turn off source and destination check on a network address translation (NAT) instance.
MANAGING AWS EC2 INSTANCES
This diagram shows the lifecycle of an instance. The arrows show actions that you can take, and the boxes
show the state that the instance will enter after that action. An instance can be in one of the following states:
• Pending: When an instance is first launched from an Amazon Machine Image (AMI) or when you start a
stopped instance, it first enters the pending state. At this point, the instance is booted and deployed to a host
computer. The instance type that you specified at launch determines the hardware of the host computer for
your instance.
• Running: When the instance is fully booted and ready, it exits the pending state and enters the running
state. At this time, you can connect over the internet to your running instance.
• Rebooting: An instance temporarily goes into a rebooting state as a result of a reboot action. AWS
recommends that you reboot an instance by using the Amazon EC2 console, AWS Command Line Interface
(AWS CLI), or AWS SDKs instead of invoking a reboot from within the guest operating system (OS). A rebooted
instance stays on the same physical host and maintains the same public Domain Name System (DNS) name and
public IP address. If the instance has instance store volumes, they retain their data.
• Shutting-down: This state is an intermediary state between running and terminated. A terminate action on
the instance initiates this state.
• Terminated: An instance reaches the terminated state as the result of a terminate action. A terminated
instance remains visible in the Amazon EC2 console until the virtual machine is deleted. However, you can’t
connect to or recover a terminated instance.
• Stopping: Instances that are backed by Amazon Elastic Block Store (Amazon EBS) can be stopped or
hibernated. They enter the stopping state before they attain the fully stopped state.
• Stopped: A stopped instance is not billed for usage. While the instance is in the stopped state, you can
modify certain attributes of the instance (for example, the instance type). Starting a stopped instance puts it
back into the pending state, which typically moves the instance to a new host machine. You can also terminate
a stopped instance.
Instance hibernation
Hibernation stops an instance so that its memory and processes can be restored when you start it again:
• Saves the contents of the instance memory (RAM) to the EBS root volume
• Reloads the RAM contents and resumes previously running processes when the instance is started
Hibernation is useful when you have an instance that you must quickly restart but that takes a long time to
warm up if you stop and start it.
Many situations in the cloud require building a new server, including the following:
• Automatic scaling: You might have solutions that must be able to deploy new instances without human
intervention.
• Cost savings: You might decide that you do not need to keep an instance right now. However, perhaps you
do need the ability to recreate it on a short notice. Batch processing use cases typically are in this category.
• Downgrading: You might want to downgrade an instance to save on costs. For example, you can downgrade
an instance that runs on hardware that is dedicated to you, a single customer, to hardware that has shared
tenancy. Alternatively, you might want to downgrade the size of an instance (for example, from t2.xlarge to
t2.large).
• Repairing impaired instances: The underlying hardware supporting an EC2 instance can fail. Booting an
instance will place the new EC2 instance on healthy infrastructure.
• Upgrading: Upgrading the OS architecture or image type might require you to launch a new instance.
Resizing an instance
To change the size of an instance, do the following:
1. Stop the instance.
2. Modify the instance’s instance type.
3. Restart the instance.
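A minimal Boto3 sketch of these three steps (the instance ID and the target instance type are placeholders):

import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"  # placeholder

# 1. Stop the instance and wait until it is fully stopped.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

# 2. Modify the instance type attribute.
ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={"Value": "t2.large"},
)

# 3. Start the instance again.
ec2.start_instances(InstanceIds=[instance_id])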
Updating an instance
You are responsible for periodically updating the OS and security of your instances.
AMI deprecation
You can deprecate an AMI to indicate that it should no longer be used. A deprecated AMI is hidden from AMI
searches for users who do not own it, but anyone with the AMI ID and launch permissions can still use it.
AWS ELASTIC BEANSTALK
Elastic Beanstalk is a platform as a service (PaaS) that facilitates the quick deployment, scaling, and
management of your applications.
To use Elastic Beanstalk, you upload your code and provide information about your application. Elastic
Beanstalk automatically launches an environment and creates and configures the AWS resources to run your
application. These resources include EC2 instances, HTTP servers, and application servers. Elastic Beanstalk
runs on the Amazon Linux AMI and the Windows Server AMI.
You can deploy your code through the AWS Management Console, the AWS Command Line Interface (AWS
CLI), or an integrated development environment (IDE) such as Visual Studio or Eclipse.
• Elastic Beanstalk supports web applications written for common platforms, including Java, .NET, PHP,
Node.js, Python, Ruby, Go, and Docker.
• It gives you control over key runtime configuration options and resources, such as the following:
• EC2 instance type • Database • Amazon EC2 Auto Scaling options
• There is no charge to use the Elastic Beanstalk service itself. You pay for only the resources used by the
underlying services that store and run your applications.
Scaling overview
• Scaling is the ability to increase or decrease compute capacity to meet fluctuating demand.
• Scale out when demand increases.
• Scale in when capacity needs decrease.
• You can scale manually or automatically (auto scaling).
Amazon EC2 Auto Scaling helps ensure that you have the correct number of instances available to handle the
load for your application. You can specify capacity settings such as the minimum, maximum, and desired
number of instances to help ensure that your solution meets demand while staying within your limits.
Better fault tolerance: Amazon EC2 Auto Scaling can detect when an instance is unhealthy, terminate it, and
launch an instance to replace it. You can also configure Amazon EC2 Auto Scaling to use multiple Availability
Zones. If one Availability Zone becomes unavailable, Amazon EC2 Auto Scaling can launch instances in another
one to compensate.
Better availability: Amazon EC2 Auto Scaling helps ensure that your application always has the right amount of
capacity to handle the current traffic demand.
Better performance: When traffic increases, having more instances gives you the ability to distribute and
share the work to maintain a good response time.
Better cost management: Amazon EC2 Auto Scaling can dynamically increase and decrease capacity as
needed. Because you pay for the EC2 instances you use, you save money by launching instances when they are
needed and terminating them when they aren't.
Route 53 is a highly available and scalable cloud Domain Name System (DNS) web service. It is designed to give
developers and businesses a reliable way to route users to internet applications. It translates names (such as
www.example.com) into the numeric IP addresses (such as 192.0.2.1) that computers use to connect to each
other. Route 53 entries are often configured to point to an ELB load balancer.
ELB automatically distributes incoming traffic across multiple targets, such as EC2 instances, containers, and IP
addresses. ELB load balancers are often configured to point to Amazon EC2 Auto Scaling groups.
Each Amazon EC2 Auto Scaling group contains a collection of EC2 instances. These instances share similar
characteristics and are treated as a logical grouping for the purposes of scaling and management. They help
you maintain application availability and give you the ability to dynamically scale capacity up or down
automatically according to conditions that you define. Any instances launched or terminated within the Auto
Scaling group are automatically registered with the load balancer.
ELB service
Use cases
A load balancer works as the single point of contact for clients and directs traffic in front of your servers. It
distributes the incoming application traffic across multiple targets, such as Amazon Elastic Compute Cloud
(Amazon EC2) instances. The load balancer improves performance and availability by monitoring the capacity
and health status of targets that are located in multiple Availability Zones.
Before you can use a chosen load balancer and benefit from its features, you must add listeners and register
your targets (or target groups).
ELB components
Load balancers can have more than one listener. This example shows two listeners:
• Each listener checks for connection requests from clients, by using the protocol and port that were
configured.
• The listener forwards requests to one or more target groups, based on the defined rules.
Rules are attached to each listener, and each rule specifies a target group, condition, and priority:
• When the condition is met, the traffic is forwarded to the target group.
• You must define a default rule for each listener, and you can add rules that specify different target groups
based on the content of the request.
• This configuration is also known as content-based routing.
Each target group routes requests to one or more registered targets, such as EC2 instances, by using the
protocol and port number that you specify:
• You can register a target with multiple target groups.
• You can configure health checks for each target group.
Health checks, which are shown as attached to each target group, are performed on all targets that are
registered to a target group that is specified in a listener rule for your load balancer.
Notice that each listener contains a default rule, and one listener contains another rule that routes requests to
a different target group. As the diagram implies, you can register a target with multiple target groups.
Listeners
• A listener is a process that defines the port and protocol that the load balancer listens on.
• Each load balancer needs at least one listener to accept traffic.
• Up to 50 listeners can be created on a load balancer.
• Routing rules are defined on listeners
Target groups
• A target group contains registered targets that provide support to resources such as the following:
• Amazon Elastic Compute Cloud (Amazon EC2) instances
• Amazon Elastic Container Service (Amazon ECS) container instances
• A single target can have multiple target group registrations.
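A hedged Boto3 sketch of these components: it creates a target group, registers an EC2 instance, and adds an
HTTP listener with a default forward rule; the load balancer ARN, VPC ID, and instance ID are placeholders.

import boto3

elbv2 = boto3.client("elbv2")

# Target group with a health check on "/" for instances in a placeholder VPC.
target_group = elbv2.create_target_group(
    Name="web-targets",
    Protocol="HTTP",
    Port=80,
    VpcId="vpc-0123456789abcdef0",
    TargetType="instance",
    HealthCheckPath="/",
)["TargetGroups"][0]

# Register an EC2 instance as a target (the instance ID is a placeholder).
elbv2.register_targets(
    TargetGroupArn=target_group["TargetGroupArn"],
    Targets=[{"Id": "i-0123456789abcdef0"}],
)

# Listener on port 80 whose default rule forwards traffic to the target group.
# The load balancer ARN below is a truncated placeholder.
elbv2.create_listener(
    LoadBalancerArn="arn:aws:elasticloadbalancing:...:loadbalancer/app/example/123",
    Protocol="HTTP",
    Port=80,
    DefaultActions=[{
        "Type": "forward",
        "TargetGroupArn": target_group["TargetGroupArn"],
    }],
)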
Capacity: Capacity limits represent the minimum and maximum group size that you want for your auto scaling
group. The group's desired capacity represents the initial capacity of the auto scaling group at the time of
creation. For example, in the diagram, the auto scaling group has a minimum size of one instance, a desired
capacity of two instances, and a maximum size of four instances. The scaling policies that you define adjust the
number of instances within your minimum and maximum ranges.
Scaling in and out: An increase in CPU utilization outside the desired range could cause the auto scaling group
to scale out (adding two instances to the auto scaling group in the example shown). Then when the CPU
utilization decreases, the auto scaling group would scale in, potentially returning to the minimum desired
capacity by terminating instances.
Instance health: The health status of an auto scaling instance indicates whether it is healthy or unhealthy. This
notification can come from sources such as Amazon EC2, Elastic Load Balancing (ELB), or custom health checks.
When Amazon EC2 Auto Scaling detects an unhealthy instance, it terminates the instance and launches a new
one.
Termination policy: Amazon EC2 Auto Scaling uses termination policies to determine which instances it
terminates first during scale-in events.
Launch template: A launch template specifies instance configuration information. It includes the ID of the
Amazon Machine Image (AMI), the instance type, a key pair, security groups, and other parameters used to
launch EC2 instances. When auto scaling groups scale out, the new instances are launched according to the
configuration information specified in the latest version of the launch template.
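A rough Boto3 sketch that ties these pieces together, creating an auto scaling group from a launch template
with the capacity limits used in the example above (minimum of 1, desired capacity of 2, maximum of 4); the
launch template name, subnet IDs, and target group ARN are placeholders.

import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    LaunchTemplate={
        "LaunchTemplateName": "web-launch-template",  # placeholder name
        "Version": "$Latest",
    },
    MinSize=1,
    DesiredCapacity=2,
    MaxSize=4,
    # Subnets in two Availability Zones (placeholder IDs).
    VPCZoneIdentifier="subnet-0123456789abcdef0,subnet-0fedcba9876543210",
    # Register new instances with a load balancer target group and use its
    # health checks in addition to the EC2 status checks.
    TargetGroupARNs=["arn:aws:elasticloadbalancing:...:targetgroup/web-targets/123"],
    HealthCheckType="ELB",
    HealthCheckGracePeriod=300,
)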
Scheduled scaling
By scaling based on a schedule, you can scale your application in response to predictable load changes.
Dynamic scaling
Dynamic scaling scales the capacity of your auto scaling group as traffic changes occur.
Predictive scaling
Predictive scaling uses machine learning models to predict your expected traffic (and Amazon EC2 usage),
including daily and weekly patterns. These predictions use data that is collected from your actual Amazon EC2
usage and data points that are drawn from your own observations. The model needs historical data from at
least 1 day to start making predictions. The model is re-evaluated every 24 hours to create a forecast for the
next 48 hours.
• Increase the capacity of your auto scaling group in advance of daily and weekly patterns in traffic flows.
• Forecast load.
• Schedule minimum capacity.
Instance health
Amazon EC2 Auto Scaling periodically checks the health status of all instances within the auto scaling group to
make sure that they're running and in good condition. If Amazon EC2 Auto Scaling detects that an instance is
no longer in the running state, it treats this as an immediate failure, marks the instance as unhealthy, and
replaces it.
Types of health checks:
• Amazon EC2 status checks and scheduled events (default)
• Elastic Load Balancing (ELB) health checks
• Custom health checks
Termination policy
A termination policy specifies the criteria that Amazon EC2 Auto Scaling uses to choose an instance for
termination when scaling in. There are various predefined termination policies, including a default policy.
The default termination policy is designed to help ensure that your instances span Availability Zones evenly for
high availability. When Amazon EC2 Auto Scaling terminates instances, it first determines which Availability
Zones have the most instances, and it finds at least one instance that is not protected from scale in.
Amazon EC2 Auto Scaling provides other predefined termination policies, including the following:
OldestInstance: This policy terminates the oldest instance in the group. This option is useful when you are
upgrading the instances in the auto scaling group to a new EC2 instance type. You can gradually replace
instances of the old type with instances of the new type.
NewestInstance: This policy terminates the newest instance in the group. This policy is useful when you’re
testing a new launch template but do not want to keep it in production.
OldestLaunchTemplate: This policy terminates instances that have the oldest launch template. This choice is
good when you are updating a group and phasing out the instances from a previous template configuration.
ClosestToNextInstanceHour: This policy terminates instances that are closest to the next billing hour. Using
this policy is a good way to maximize the use of your instances and manage your Amazon EC2 usage costs.
Lifecycle hooks
Lifecycle hooks provide an opportunity to perform a user action before the completion of a scale-in or scale-
out event.
Launch Templates
A launch template specifies instance configuration information (including the ID of the AMI, the instance type,
a key pair, security groups, and other parameters) and gives you the option to have multiple versions. The
Amazon EC2 Auto Scaling group then maintains the right number of EC2 instances defined by the launch
template depending on your needs. Together, both the launch template and the auto scaling group policies
determine what to launch within the auto scaling group and how to manage them.
Best practices
Steady-state group:
• You set an Amazon EC2 Auto Scaling group with the same min, max, and desired values.
• An instance is recreated automatically if it becomes unhealthy or if an Availability Zone fails.
• There is still potential downtime while an instance recycles.
Avoid thrashing
Thrashing is the condition in which there is excessive use of a computer’s virtual memory, and the computer is
no longer able to service the resource needs of applications that run on it. When you configure automatic
scaling, make sure that you avoid thrashing. Thrashing could occur if instances are removed and added—or
added and removed—in succession too quickly.
AMAZON ROUTE 53
With this service you can do the following:
• Register or transfer a domain name.
• Resolve domain names to IP addresses.
• Connect to infrastructure.
• Distribute traffic across Regions.
• Support high availability and lower latency.
• By default, AWS assigns a hostname to your load balancer that resolves to a set of IP addresses.
• Assign your own hostname by using an alias resource record set.
• Create a Canonical Name Record (CNAME) that points to your load balancer.
Routing policies
1. Simple routing policy: Use for a single resource that performs a given function for your domain—for
example, a web server that serves content for the example.com website.
2. Weighted routing policy: Use to route traffic to multiple resources in proportions that you specify.
3. Latency routing policy: Use when you have resources in multiple AWS Regions and you want to route traffic
to the Region that provides the lowest latency.
4. Failover routing policy: Use when you want to configure active-passive failover.
5. Geolocation routing policy: Use when you want to route traffic based on the location of your users.
6. Geoproximity routing policy: Use to route traffic based on the location of your resources and, optionally,
shift traffic from resources in one location to resources in another location.
7. Multivalue answer routing policy: Use when you want Route 53 to respond to DNS queries with up to eight
healthy records that are selected at random.
8. IP-based routing policy: Use when you want to route traffic based on the location of your users and have
the IP addresses that the traffic originates from.
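As a hedged sketch of the weighted routing policy (which also underpins the blue/green example that follows),
the following Boto3 call sends 90 percent of traffic to one environment and 10 percent to another; the hosted
zone ID, record name, and target DNS names are placeholders.

import boto3

route53 = boto3.client("route53")

def weighted_record(identifier, target, weight):
    # One weighted CNAME record; records that share a name and type are
    # chosen in proportion to their weights.
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "www.example.com",
            "Type": "CNAME",
            "SetIdentifier": identifier,
            "Weight": weight,
            "TTL": 60,
            "ResourceRecords": [{"Value": target}],
        },
    }

route53.change_resource_record_sets(
    HostedZoneId="Z0123456789ABCDEFGHIJ",  # placeholder hosted zone ID
    ChangeBatch={
        "Changes": [
            weighted_record("blue", "blue-env-1234.us-east-1.elb.amazonaws.com", 90),
            weighted_record("green", "green-env-5678.us-east-1.elb.amazonaws.com", 10),
        ]
    },
)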
A blue/green deployment is a deployment that reduces the risk of the site or application becoming unavailable
because you run two matching production environments. One environment is referred to as the blue
environment, and the other environment is referred to as the green environment.
The diagram shows an example of a blue/green deployment. Notice the two parallel environments, each with
its own ELB load balancer and Amazon EC2 Auto Scaling configuration. The Route 53 weighted routing feature
is then used to begin shifting users over from the existing (blue) environment to the new (green) environment.
This process might be done to migrate users to the new or upgraded green environment.
You can use services such as Amazon CloudWatch and Amazon CloudWatch Logs to monitor the green
environment. If problems are found anywhere in the new environment, Route 53 weighted routing can be
deployed to shift users back to the running blue servers.
When the new green environment is fully up and running without issues, the blue environment can gradually
be shut down. Because of the potential latency of DNS records, a full shutdown of the blue environment can
take anywhere from a day to a week.
AMAZON CLOUDFRONT
CloudFront is a web service that speeds up the distribution of static and dynamic web content
(such as .html, .css, .js, and image files) to users.
CloudFront delivers content through a worldwide network of data centers that are called edge
locations.
To deliver content to end users with lower latency, Amazon CloudFront uses a global network of more than
450 edge locations and 13 Regional edge caches in more than 90 cities across 48 countries.
CloudFront edge locations (also known as points of presence, or POPs) make sure that popular content can be
served quickly to viewers. CloudFront also has Regional edge caches that bring more content closer to viewers,
even when the content is not popular enough to stay at an edge location, to help improve performance for
that content.
Key features
Security
• Protects against network and application layer attacks
• Delivers content, APIs, or applications over HTTPS using the latest TLS version (TLSv1.3) to encrypt
and secure communication between viewer clients and CloudFront
• Supports multiple methods of access control
• Is compliant with major industry standards, including PCI-DSS, HIPAA, and ISO/IEC, to help ensure
secure delivery for sensitive data
Availability
• Automatically serves content from a backup origin when the primary origin is unavailable by using
its native origin failover capability.
Edge computing
• Offers programmable and secure edge CDN computing capabilities through CloudFront Functions
and Lambda@Edge
Continuous deployment
• Gives you the ability to deploy two separate but identical environments—called a blue/green
deployment—and support integration with the ability to roll out releases gradually without any
Domain Name System (DNS) changes
Cost-effectiveness
• Offers personalized pricing options, including pay-as-you-go, the CloudFront security savings bundle,
and custom pricing. With CloudFront, there are no upfront payments or fixed platform fees, no long-
term commitments, no premiums for dynamic content, and no requirements for professional
services to get started.
• Offers free data transfer between AWS Cloud services and CloudFront.
The diagram on the slide demonstrates what happens when users request objects after CloudFront has been
configured to deliver your content. Here is a description of each step:
1. A user accesses your website or application and sends a request for an object, such as an image file or an
.html file.
2. DNS routes the request to the CloudFront POP (edge location) that can best serve the request—typically the
nearest CloudFront POP in terms of latency—and routes the request to that edge location.
3. CloudFront checks its cache for the requested object. If the object is in the cache, CloudFront returns it to
the user. If the object is not in the cache, CloudFront does the following:
A. CloudFront compares the request with the specifications in your distribution and forwards the
request to your origin server for the corresponding object (for example, to your S3 bucket or your
HTTP server).
B. The origin server sends the object back to the edge location.
C. As soon as the first byte arrives from the origin, CloudFront begins to forward the object to the
user. CloudFront also adds the object to the cache for the next time someone requests it.
Cost estimation
Traffic distribution: Pricing varies across geographic Regions based on the edge location.
Requests: The number and type of requests and the geographic Region in which they are made.
Data transfer out: The amount of data transferred out of CloudFront edge locations.
AWS re/start
Serverless and
Containers
AWS LAMBDA
With Lambda, you run your code only when needed, and the service scales automatically to thousands of
requests per second.
With Lambda, you can run code without provisioning or managing servers. The steps to use Lambda are as
follows:
1. Upload your code to Lambda, and Lambda takes care of everything that is required to run and scale your
code with high availability.
2. Set up your code to invoke from other AWS services, or invoke your code directly from any web or mobile
application, or HTTP endpoint.
3. AWS Lambda runs your code only when invoked. You pay only for the compute time that you consume. You
pay nothing when your code is not running.
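As an illustration only, a minimal Lambda function handler in Python could look like the following; the event fields are hypothetical and depend on how the function is invoked:
import json
def lambda_handler(event, context):
    # Lambda invokes this handler with the event payload; there are no servers to manage.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }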
AWS Lambda use case
• An API provides programmatic access to an application and is often used by programs that communicate
with each other.
- The client application sends a request to the server application by using the server application’s API.
- The server application returns a response to the client application.
APIs make computer software accessible to developers through code so that developers can build software
programs that interact with other software programs. APIs work as an intermediary so that applications are
able to communicate with each other.
RESTful APIs
• A RESTful API is an interface that two computer systems use to exchange information securely over the
internet
- Is designed for loosely coupled network-based applications
- Communicates over HTTP
- Exposes resources at specific URIs
Uniform Interface: A request should be made to a single endpoint or URI when it interacts with each distinct
resource that is part of the service.
Stateless: The server does not track which requests the connecting client has made over time. It also does not
keep track of which step the client might have completed in terms of a series of actions. Instead, any session
information about the client is known only to the client itself.
Cacheable: REST clients should be able to cache the responses that they receive from the REST server.
Layered System: RESTful services support layered systems, where the client might connect to an intermediate
server. The REST server can be distributed, which supports load balancing.
Code on Demand: The server could pass code (that can be run) to the client, such as some JavaScript. This
feature extends the functionality of the REST client.
RESTful components
Client: Clients are users who want to access information from the web. The client can be a person or a
software system that uses the API. For example, developers can write programs that access weather data from
a weather system. You can also access the same data from your browser when you visit the weather website
directly.
Resource: Resources are the information that different applications provide to their clients. Resources can be
images, videos, text, numbers, or any type of data. The machine that gives the resource to the client is also
called the server. Organizations use APIs to share resources and provide web services while maintaining
security, control, and authentication. In addition, APIs help them to determine which clients get access to
specific internal resources.
Request: The client sends requests to the server. The request is formatted so that the server can understand
it.
Response: The response is what the server sends back to reply to the request from the client. This information
includes a status message of success or failure, a message body containing the resource representation, and
metadata about the response.
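A small sketch of these components from the client side, using only the Python standard library; the weather API URL is a hypothetical placeholder:
import json
import urllib.request
# The client sends a request to a resource exposed at a specific URI over HTTPS.
url = "https://api.example.com/v1/weather?city=London"
with urllib.request.urlopen(url) as response:
    status = response.status                             # status message (for example, 200)
    body = json.loads(response.read().decode("utf-8"))   # resource representation in the response body
print(status, body)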
API Gateway is a service that developers can use to create and maintain APIs. For example, you can create a
REST API for an application that you run on AWS.
Amazon API Gateway handles all the tasks that are involved in accepting and processing concurrent API calls at
scale. These tasks include traffic management, authorization and access control, monitoring, and API version
management. You pay for only the API calls that you receive and the amount of data that is transferred out.
Efficient API development: Run multiple versions of the same API simultaneously with API Gateway, which
gives you the ability to quickly iterate, test, and release new versions.
Performance at any scale: Provide end users with the lowest possible latency for API requests and responses
by taking advantage of the AWS global network of edge locations using Amazon CloudFront.
Cost savings at scale: Decrease your costs as your API usage increases per Region across your AWS accounts
using the tiered pricing model for API requests.
Monitoring: Monitor performance metrics and information on API calls, data latency, and error rates from the
API Gateway dashboard.
Flexible security controls: Authorize access to your APIs with AWS Identity and Access Management (IAM) and
Amazon Cognito.
RESTful API options: HTTP APIs are the best way to build APIs for a majority of use cases because they can be
significantly cheaper than REST APIs.
After your API is deployed, API Gateway provides you with a dashboard to visually monitor calls to a service.
The API interfaces that you develop have a frontend and a backend. The client uses the frontend applications
to make requests. The parts of the API implementation that communicate with the other AWS services are
referred to as the backend.
In API Gateway, the frontend is encapsulated by method requests and method responses. The backend is
encapsulated by requests and responses that work with the other AWS services. These AWS services provide
the functionality that the API exposes, and they take action accordingly.
The diagram illustrates how APIs can be built for various applications. For example, these applications include
web and mobile applications, Internet of Things (IoT) devices, and other applications that use API Gateway. In
API Gateway, you can create, publish, maintain, and monitor APIs. These APIs can integrate with other AWS
serverless applications.
The application uses Amazon Simple Storage Service (Amazon S3) to host its presentation code and Amazon
Cognito for authentication and authorization. The application also stores its data in a DynamoDB database.
The application's user interface invokes the RESTful API exposed by API Gateway. This API forwards the user's
request to a Lambda function, which performs the application's functions, accesses the database, and returns
a response.
1. Amazon S3 hosts static web resources—including HTML, CSS, JavaScript, and image files—that are loaded in
the user's browser.
2. Amazon Cognito provides user management and authentication functions to secure the backend API.
3. The browser runs JavaScript that sends and receives data by communicating with API Gateway through REST
web services. The data that is sent through API Gateway uses the backend API that was built with Lambda.
4. DynamoDB provides the persistence layer in this example. The Lambda function that the API uses can store
data in the DynamoDB database.
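A minimal sketch of the Lambda backend in this architecture, assuming boto3, an API Gateway proxy integration event, and a hypothetical DynamoDB table named Notes:
import json
import boto3
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Notes")  # placeholder table name
def lambda_handler(event, context):
    if event.get("httpMethod") == "POST":
        item = json.loads(event.get("body") or "{}")
        table.put_item(Item=item)                      # persist the data in DynamoDB
        return {"statusCode": 201, "body": json.dumps(item)}
    # Default: return all items (a table scan is acceptable only for a small demo table)
    items = table.scan().get("Items", [])
    return {"statusCode": 200, "body": json.dumps(items, default=str)}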
AWS STEP FUNCTIONS
Step Functions is a serverless orchestration service. You can use it to combine Lambda functions and other
AWS services to build business-critical applications. With Step Functions, you can quickly create distributed
applications that leverage AWS services in addition to your own microservices.
Orchestration centrally manages a workflow by breaking it into multiple steps, adding flow logic, and tracking
the inputs and outputs between the steps.
You can use AWS Step Functions to coordinate AWS services into serverless workflows. Workflows consist of a
series of steps. The output of one step is the input to the next step.
As your applications run, Step Functions maintains the application state, tracking exactly which workflow step
your application is in, and stores an event log of data that is passed between application components. That
means that if the workflow is interrupted for any reason, your application can pick up right where it left off.
Core concepts
Step Functions is based on workflows (or state
machines) and tasks.
Benefits
Features
Step Functions is a managed serverless service. Its main features include the following:
• Automatic scaling
• High availability
• Pay per use
• Security and compliance
Use cases
Step Functions is useful for creating end-to-end workflows to manage jobs with dependent components and
for dividing business processes into a series of steps.
You can use Step Functions to implement a business process as a series of steps that make up a workflow. The
individual steps in the workflow can invoke a Lambda function that has some business logic. This slide shows
an example.
In this example of a banking system, a new bank account is created after validating a customer’s name and
address by using the account-processing-workflow AWS Step Functions workflow. The workflow begins with
two Lambda functions—CheckName and CheckAddress—running in parallel as task states. Once both are
complete, the workflow initiates the OpenNewAccount Lambda function. You can define retry and catch
clauses to handle errors from task states. You can use predefined system errors or handle custom errors
thrown by these Lambda functions in your workflow. Because your workflow code takes on error handling, the
Lambda functions can focus on the business logic and have less code.
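A sketch of how such a workflow could be defined with the Amazon States Language and created with boto3; the Lambda and IAM role ARNs are placeholders:
import json
import boto3
sfn = boto3.client("stepfunctions")
# Run CheckName and CheckAddress in parallel, then OpenNewAccount.
definition = {
    "StartAt": "ValidateCustomer",
    "States": {
        "ValidateCustomer": {
            "Type": "Parallel",
            "Branches": [
                {"StartAt": "CheckName",
                 "States": {"CheckName": {"Type": "Task",
                                          "Resource": "arn:aws:lambda:us-east-1:111122223333:function:CheckName",
                                          "End": True}}},
                {"StartAt": "CheckAddress",
                 "States": {"CheckAddress": {"Type": "Task",
                                             "Resource": "arn:aws:lambda:us-east-1:111122223333:function:CheckAddress",
                                             "End": True}}},
            ],
            "Next": "OpenNewAccount",
        },
        "OpenNewAccount": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:111122223333:function:OpenNewAccount",
            "End": True,
        },
    },
}
sfn.create_state_machine(
    name="account-processing-workflow",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::111122223333:role/StepFunctionsExecutionRole",  # placeholder role
)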
CONTAINERS ON AWS
A container is an application and its dependencies, which can be run in resource-isolated processes. Containers
provide a common interface for migrating applications between environments. Containers are isolated but
share an OS and, where appropriate, bins/libraries.
Containers can run on any Linux system with appropriate kernel-feature support and the Docker daemon
present. This ability makes containers portable. Your laptop, your VM, your Amazon Elastic Compute Cloud
(Amazon EC2) instance, and your bare metal server are all potential hosts. The lack of a hypervisor
requirement also results in almost no noticeable performance overhead. The processes communicate
directly with the kernel and are largely unaware of their container silo. Most containers boot in only a couple of
seconds.
Benefits of containers
Environmental consistency: the application’s code, configurations, and dependencies are packaged into a
single object.
Process isolation: they have no shared dependencies or incompatibilities because each container is isolated
from the other. Process isolation provides operational efficiency.
Operational efficiency: you can run multiple applications on the same instance.
Developer productivity: increase developer productivity by removing cross-service dependencies and
conflicts.
Version control: you can track versions of your application code and their dependencies. Docker container
images have a manifest file (Dockerfile).
Docker
Benefits of Docker
Microservices architecture: Public documentation on Docker recommends that you run one service per
container.
Stateless: They consist of read-only layers. This means that after the container image has been created, it does
not change.
Portable: Your application is independent from the configurations of low-level resources, such as networking,
storage, and OS details. This feature provides portability. For example, if your application runs in a Docker
container, it will run anywhere.
Single, immutable artifact: Docker also assists with packaging your applications and dependencies in a single,
immutable artifact.
Reliable deployments: When a developer finishes writing and testing code, they can wrap it in a container and
publish it directly to the cloud, and it will instantly work because the environment is the same.
Components of Docker
AWS CONTAINER SERVICES
Registry: Amazon Elastic Container Registry (Amazon ECR), where you can store your container images.
Management: the deployment, scheduling, and scaling of containerized applications.
AWS services are Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service
(Amazon EKS). Amazon ECS provisions new application container instances and compute resources. Use
Amazon EKS to deploy, manage, and scale containerized applications using Kubernetes on AWS.
Hosting: where the containers run. You can currently run your containers on Amazon ECS using the Amazon
EC2 launch type (where you get to manage the underlying instances on which your containers run), or you can
choose to run your containers in a serverless manner with the AWS Fargate launch type.
Amazon ECR
A fully managed Docker container registry that developers can use to store, manage, and deploy Docker
container images.
Amazon ECS
A highly scalable, high-performance container management service that supports Docker containers.
With Amazon ECS, you can run applications on a managed cluster of EC2 instances. It provides flexible
scheduling. Amazon ECS uses a built-in scheduler or it uses a third-party scheduler, such as Apache Mesos. You
can also perform task, service, or daemon scheduling.
Amazon EKS
A managed service that you can use to run Kubernetes on AWS without needing to install and operate your
own Kubernetes clusters.
With Amazon EKS, AWS manages upgrades and high availability services for you. Amazon EKS runs three
Kubernetes managers across three Availability Zones to provide high availability. Amazon EKS automatically
detects and replaces unhealthy managers and provides automated version upgrades and patching for the
managers.
AWS Fargate
A compute engine for Amazon ECS that you can use to run containers without needing to manage servers or
clusters.
Deploying to AWS
Deploying your managed container solutions on AWS involves selecting an orchestration tool and a launch
type.
Amazon ECS is a fully managed container orchestration service that provides the most secure, reliable, and
scalable way to run containerized applications.
Amazon EKS is a fully managed Kubernetes service that provides the most secure, reliable, and scalable way to
run containerized applications using Kubernetes.
AWS re/start
AWS Database
Services
INTRODUCTION TO DATABASES ON AWS
Data warehouses
A data warehouse is a central repository of information that can be analysed to make more informed
decisions. A data warehouse can contain multiple databases.
• The top tier is the frontend client that presents results through reporting, analysis, and data mining tools.
• The middle tier consists of the analytics engine that is used to access and analyse the data.
• The bottom tier is the database server, where data is loaded and stored.
Amazon Redshift is a fully managed data warehouse service in the cloud and is scalable with virtually no
downtime.
• You can use it to run complex analytic queries against petabytes of structured data.
• It uses sophisticated query optimization, columnar storage on high-performance local disks, and parallel
query execution.
• Amazon Redshift monitors clusters automatically and nearly continuously and has encryption built in.
Big data
• Incur a low price point for small customers.
• Ease deployment and maintenance via managed service.
• Focus more on data and less on database management.
• Aurora is a highly available, resilient, and cost-effective managed relational database. Amazon Relational
Database Service (Amazon RDS) fully manages Aurora.
• The Aurora database engine is fully compatible with existing MySQL and PostgreSQL open-source databases
and regularly adds compatibility for new releases.
• Aurora can provide up to five times the throughput of standard MySQL and up to three times the throughput
of standard PostgreSQL that runs on the same hardware.
• It combines the performance and availability of high-end commercial databases with the simplicity and cost-
effectiveness of open source databases.
Use cases
AWS DATABASE MIGRATION SERVICE (AWS DMS)
AWS DMS is a service that migrates databases to Amazon Web Services (AWS) quickly and securely.
Database consolidation
Database migration process
The AWS DMS architecture consists of the following components: • Replication instance
• Task
• Source
• Target
• The AWS Schema Conversion Tool (AWS SCT) converts your existing database schema and code objects from
one database engine to another.
• It is used for heterogeneous migrations.
• You can convert a relational schema or a data warehouse schema.
• The database objects that the AWS SCT converts include the source database schema, views, stored
procedures, and functions.
• The AWS SCT can also scan your application source code for embedded SQL statements and convert them so
that they are compatible with the target database.
AWS Networking
Services
AWS CLOUD NETWORKING AND AMAZON VIRTUAL PRIVATE CLOUD
In its most basic form, a cloud-based network is a private IP address space where you can deploy computing
resources. In Amazon Web Services (AWS), a virtual private cloud (VPC) component provides this private
network space. A VPC enables you to define a virtual network in your own logically isolated area within the
AWS Cloud. Inside this virtual network, you can deploy AWS computing resources. These resources include, for
example, Amazon Elastic Compute Cloud (Amazon EC2) or Amazon Relational Database Service (Amazon RDS)
instances. You can also define how—and whether—your private network space connects to endpoints in your
network topology.
In the example, a VPC contains three EC2 instances and an RDS instance. It is connected to the internet and the
corporate data center. It is also connected to a secondary VPC.
A VPC is a virtual network that is provisioned in a logically isolated section of the AWS Cloud:
• Supports logical separation with subnets
• Offers fine-grained security
• Supports an optional hardware virtual private network (VPN)
• Valid private IP address ranges are defined by Request for Comment (RFC) 1918.
• In a VPC, you can only define networks between /16 and /28.
Amazon VPC reserved IP addresses
In each subnet CIDR block, AWS reserves the first four IP addresses and the last IP address (the network
address, the VPC router, the DNS server, an address reserved for future use, and the network broadcast
address). These addresses cannot be assigned to instances.
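A minimal sketch of defining a VPC and a subnet within the allowed CIDR range, assuming boto3; the Availability Zone is a placeholder:
import boto3
ec2 = boto3.client("ec2")
# Create a VPC with a /16 private range (RFC 1918) and one /24 subnet inside it.
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]
subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24",
                           AvailabilityZone="us-east-1a")
print(vpc_id, subnet["Subnet"]["SubnetId"])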
VPC CONNECTIVITY OPTIONS
A NAT device forwards traffic from an instance that is in a private subnet to the internet or other AWS
services and then sends the response back to the instance.
• VPC peering connects two VPCs so that you route traffic between them using private addresses.
• A Site-to-Site VPN connection establishes a secure connection between your on-premises equipment and
your VPCs.
• A VPC endpoint privately connects your VPC to supported AWS services and to services that are powered by
PrivateLink without leaving the AWS network.
• Transit Gateway establishes a network transit hub that you can use to interconnect your VPCs and on-
premises networks without using the public internet.
SECURING AND TROUBLESHOOTING YOUR NETWORK
Network ACLs
Security groups
A security group allows traffic to or from an elastic network interface and has the following characteristics:
• It defines traffic rules in an inbound rules table and an outbound rules table.
• It is configured by default to do the following:
– Deny all inbound traffic.
– Allow all outbound traffic.
– Allow traffic between resources that are assigned to the same security group.
• It is stateful. If rules allow traffic to flow in one direction, responses can automatically flow in the opposite
direction.
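For example, the following sketch creates a security group and adds a single inbound HTTPS rule, assuming boto3; the VPC ID and group name are placeholders:
import boto3
ec2 = boto3.client("ec2")
# Inbound traffic is denied until rules are added; this rule allows HTTPS from anywhere.
sg = ec2.create_security_group(GroupName="web-sg",
                               Description="Allow inbound HTTPS",
                               VpcId="vpc-0123456789abcdef0")
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }],
)
Because the security group is stateful, the responses to this allowed inbound traffic flow out automatically.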
Bastion host
A bastion host is an instance in a public subnet that you connect to first (for example, over SSH) so that you
can then reach instances in private subnets.
Check the following if you cannot connect to an instance:
• Verify that the instance is up and running. Check that it has passed both the System Status and Instance
Status checks.
• Verify that the security groups that are associated with the instance allow connections for the required
protocols and ports.
• Verify that the network ACLs that are associated with the subnet allow traffic from the necessary ports and
protocols.
• Verify that the route table that is associated with the subnet has destination rules that point to the
appropriate targets.
Check the following if you cannot connect to an instance through the internet:
• Verify that the public IP address or Domain Name System (DNS) name that you are using is correct.
• Verify that the instance has a public IP address or Elastic IP address.
• Verify that an internet gateway is attached to the instance’s VPC.
• Verify that the route table of the instance’s subnet has a route rule for the destination 0.0.0.0/0 through the
internet gateway.
Check the following if you cannot connect to an instance through Secure Shell (SSH):
• Verify that the instance's IP address or hostname is correct.
• Verify the instance connection credentials: instance private key, or username and password.
• Run the AWSSupport-TroubleshootSSH automation document to help you find and resolve the problem.
Troubleshooting NAT
Storage and
Archiving
CLOUD STORAGE OVERVIEW
Cloud storage is a service that stores data on the internet through a cloud computing provider that manages
and operates data storage as a service.
The AWS storage services can be grouped into the following general categories:
• Object storage
• File storage
• Block storage
• Hybrid storage
Throughput-optimized (HDD) volumes are suited to the following:
• Streaming workloads that require consistent, fast throughput at a low price
• Big data
• Data warehouses
• Log processing
They cannot be used as boot volumes.
Instance stores provide temporary block-level storage for your EC2 instance. This storage is located on disks
that are physically attached to the host computer.
• You use the block device mapping feature of the Amazon EC2 API and the AWS Management Console to
attach an instance store to an instance.
• Instance store data persists for only the lifetime of its associated instance.
• You cannot create or destroy instance store volumes independently from their instances.
• You can control the following:
– Whether instance stores are exposed to the EC2 instance
– What device name is used
• Features are available for many instance types but not all instance types.
• The number, size, and type—such as hard disk drive (HDD) compared with solid state drive (SSD)—differ by
instance type.
Use cases
• Instance store volumes are used for temporary storage of information that is continually changing, such as
the following:
– Buffers
– Caches
– Scratch data
– Other temporary content
• Instance store volumes are used for data that is replicated across a fleet of instances, such as a load-
balanced pool of web servers.
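A sketch of exposing an instance store volume through block device mapping at launch, assuming boto3; the AMI ID is a placeholder and the instance type must be one that actually offers instance store volumes:
import boto3
ec2 = boto3.client("ec2")
# Map the first instance store volume (ephemeral0) to the device name /dev/sdb.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="m5d.large",
    MinCount=1, MaxCount=1,
    BlockDeviceMappings=[
        {"DeviceName": "/dev/sdb", "VirtualName": "ephemeral0"},
    ],
)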
AMAZON EFS
Amazon EFS is scalable, fully managed, elastic Network File System (NFS) storage for use with AWS Cloud
services and on-premises resources.
Amazon EFS is a petabyte-scale, low-latency file system that does the following:
• Supports NFS
• Is compatible with multiple AWS services
• Is compatible with all Linux-based instances and servers
• Uses tags
Benefits
Performance attributes
Storage classes
Standard storage classes:
• EFS Standard
• EFS Standard-Infrequent Access (Standard-IA)
Use cases
Amazon EFS is designed to provide performance for a broad spectrum of workloads and applications, including
the following:
• Home directories
• File system for enterprise applications
• Application testing and development
• Database backups
• Web serving and content management
• Media workflows
• Big data analytics
Amazon EFS architecture
Amazon S3 is an object storage service that provides secure, durable, and highly available data storage in the
AWS Cloud.
You can use Amazon S3 to store and retrieve any amount of data (objects) at any time from anywhere on the
web.
Amazon S3 features
• Storage classes
• Storage management
• Access management and security
• Data processing
• Storage logging and monitoring
• Analytics and insights
• Strong consistency
Buckets
A bucket is a container for objects that are stored in Amazon S3. Every object is contained in a bucket.
Objects
Objects are the fundamental entities that are stored in Amazon S3. Objects consist of object data and
metadata.
Object keys
The unique identifier for an object within a bucket
Regions
The geographical area where Amazon S3 will store the buckets that you create
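A minimal sketch that stores and retrieves an object by bucket and key, assuming boto3; the bucket name and key are placeholders (bucket names are globally unique):
import boto3
s3 = boto3.client("s3")
bucket = "example-notes-bucket"
key = "reports/2024/summary.txt"   # the object key uniquely identifies the object in the bucket
s3.put_object(Bucket=bucket, Key=key, Body=b"hello from Amazon S3")
obj = s3.get_object(Bucket=bucket, Key=key)
print(obj["Body"].read().decode("utf-8"))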
Additional Amazon S3 features
S3 Intelligent-Tiering
S3 Intelligent-Tiering is designed to optimize cost by automatically storing objects in three access tiers:
• Frequent Access
• Infrequent Access
• Archive Instant Access
How it works
Amazon S3 Glacier is a storage service purpose-built for data archiving. It provides high performance, flexible
retrieval, and low-cost archive storage in the cloud.
Vault
A vault is a container for storing archives.
Unique URI form: https://region-specific-endpoint/account-id/vaults/vault-name
Archive
An archive is any data, such as a photo, video, or document.
Unique URI form: https://region-specific-endpoint/account-id/vaults/vault-name/archives/archive-id
Job
An Amazon S3 Glacier job can retrieve an archive or get an inventory of a vault.
Unique URI form: https://region-specific-endpoint/account-id/vaults/vault-name/jobs/job-id
Notification configuration
An Amazon S3 Glacier notification configuration can notify you when a job is completed.
Unique URI form: https://region-specific-endpoint/account-id/vaults/vault-name/notification-configuration
Amazon S3 access to archives
Amazon S3 Glacier provides three archive retrieval options:
• Expedited: 1–5 minutes • Standard: 3–5 hours • Bulk: 5–12 hours
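A sketch of starting an archive retrieval job with one of these tiers, assuming boto3; the vault name and archive ID are placeholders:
import boto3
glacier = boto3.client("glacier")
# Tier can be "Expedited", "Standard", or "Bulk".
job = glacier.initiate_job(
    accountId="-",                 # "-" means the account that owns the credentials
    vaultName="my-photo-vault",
    jobParameters={
        "Type": "archive-retrieval",
        "ArchiveId": "EXAMPLE-ARCHIVE-ID",
        "Tier": "Standard",
    },
)
print(job["jobId"])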
Security features
Security comparison
Storage Gateway is a hybrid storage service that enables on-premises applications to use AWS Cloud storage.
You can use Storage Gateway for backup and archiving, disaster recovery (DR), cloud data processing, storage
tiering, and migration.
Storage Gateway supports file, volume, and tape interfaces.
File Gateway
Store and retrieve files as objects in Amazon S3 through standard file protocols (NFS and SMB).
Volume Gateway
Access block storage as volumes on Amazon S3.
Tape Gateway
Back up and archive data to virtual tape on Amazon S3.
Use Storage Gateway for hybrid scenarios where some storage is needed on premises but some storage can be
offloaded to cloud storage services (Amazon S3, Amazon S3 Glacier, or Amazon EBS).
AWS TRANSFER FAMILY AND OTHER MIGRATION SERVICES
The Transfer Family is a secure transfer service that you can use to transfer files into and out of AWS storage
services.
The Transfer Family supports transferring data from or to the following AWS storage services:
• Amazon Simple Storage Service (Amazon S3) storage buckets
• Amazon Elastic File System (Amazon EFS) Network File System (NFS) file systems
Migration services
DataSync features
• Synchronizes between on premises and AWS
• Is efficient and fast
• Is a managed service
• Connects over the internet or AWS Direct Connect
• Includes AWS DataSync Agent (NFS protocol)
Snowball features
• Sensors or machines
• Data collection in remote locations
• Media and entertainment content aggregation
AWS re/start
Jumpstart on
AWS
AMAZON CLOUDWATCH
• Amazon CloudWatch monitors the state and utilization of most resources that you can manage under AWS.
CloudWatch helps with performance monitoring. However, by itself, it will not add or remove EC2 instances.
Amazon EC2 Auto Scaling can help with this situation.
With Amazon EC2 Auto Scaling, you can maintain the health and availability of your fleet. You can also
dynamically scale your EC2 instances to meet demands during spikes and lulls.
• Basic Monitoring for Amazon EC2 instances: Seven pre-selected metrics at a 5-minute frequency and three
status check metrics at a 1-minute frequency, for no additional charge.
• Detailed Monitoring for Amazon EC2 instances: All metrics that are available to Basic Monitoring at a 1-
minute frequency, for an additional charge. Instances with detailed monitoring enabled provide data
aggregation by Amazon EC2, Amazon Machine Image (AMI) ID, and instance type.
CloudWatch alarms
• Test a selected metric against a specific threshold (greater than or equal to, less than or equal to)
• The ALARM state is not necessarily an emergency condition
Metric components
Metrics are the fundamental concept in CloudWatch. A metric represents a time-ordered set of data points
that are published to CloudWatch. Think of a metric as a variable to monitor, and the data points represent the
values of that variable over time.
Metrics are uniquely defined by a name, a namespace, and zero or more dimensions.
Namespace is a container for CloudWatch metrics. Metrics in different namespaces are isolated from each
other, so that metrics from different applications are not mistakenly aggregated into the same statistics.
Dimension is a name-value pair that uniquely identifies a metric. You can assign up to 10 dimensions to a
metric. Each metric has specific characteristics that describe it, and you can think of dimensions as categories
for those characteristics. Dimensions help you design a structure for your statistics plan. You can use
dimensions to filter the results that CloudWatch returns.
Period is the length of time that is associated with a specific CloudWatch statistic. Periods are defined in
numbers of seconds. You can adjust how the data is aggregated by varying the length of the period. A period
can be as short as 1 second or as long as 1 day (86,400 seconds).
Standard metrics:
• Grouped by service name
• Display graphically so that selected metrics can be compared
• Only appear if you have used the service in the past 15 months
• Reachable programmatically through the AWS Command Line Interface (AWS CLI) or application
programming interface (API)
Custom metrics:
• Grouped by user-defined namespaces
• Publish to CloudWatch by using the AWS CLI, an API, or a CloudWatch agent
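A minimal sketch of publishing a custom metric with a namespace and one dimension, assuming boto3; the namespace, metric name, and dimension are placeholders:
import boto3
cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_data(
    Namespace="MyApp/Orders",        # user-defined namespace keeps this metric isolated
    MetricData=[{
        "MetricName": "OrdersProcessed",
        "Dimensions": [{"Name": "Environment", "Value": "production"}],
        "Value": 42,
        "Unit": "Count",
    }],
)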
Events – An event indicates a change in your AWS environment. AWS resources can generate events when
their state changes. For example, Amazon Elastic Compute Cloud (Amazon EC2) generates an event when the
state of an EC2 instance changes from pending to running. You can generate custom application-level events
and publish them to CloudWatch Events. You can also set up scheduled events that are generated on a
periodic basis.
Targets – A target processes events. Example targets include EC2 instances, AWS Lambda functions, Amazon
Simple Notification Service (Amazon SNS) topics, and Amazon Simple Queue Service (Amazon SQS) queues.
Rules – A rule matches incoming events and routes them to targets for processing. A single rule can route to
multiple targets, all of which are processed in parallel. This enables different parts of an organization to look
for and process the events that are of interest to them.
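A sketch of a scheduled rule with a Lambda function as its target, assuming boto3; the rule name and function ARN are placeholders, and in practice the function also needs a resource policy that lets the rule invoke it:
import boto3
events = boto3.client("events")
# Fire every day at 20:00 UTC.
events.put_rule(
    Name="nightly-shutdown",
    ScheduleExpression="cron(0 20 * * ? *)",
    State="ENABLED",
)
events.put_targets(
    Rule="nightly-shutdown",
    Targets=[{
        "Id": "stopinator-lambda",
        "Arn": "arn:aws:lambda:us-east-1:111122223333:function:stopinator",
    }],
)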
Configure – Decide what information you need to capture in your logs, and where and how it will be stored.
Collect – Instances are provisioned and removed in a cloud environment. You need a strategy for periodically
uploading a server’s log files so that this valuable information is not lost when an instance is eventually
terminated.
Analyse – After all the data is collected, it is time to analyse it. Using log data gives you greater visibility into
the daily health of your systems. It can also provide information on upcoming trends in customer behaviour,
and insight into how customers currently use your system.
• Logs, continuously monitors, and retains account activity that is related to actions across your AWS
infrastructure
• Records application programming interface (API) calls for most AWS services
– AWS Management Console and AWS Command Line Interface (AWS CLI) activity are also recorded
• Is supported for a growing number of AWS services
• Automatically pushes logs to Amazon Simple Storage Service (Amazon S3) after it is configured
• Will not track events within an Amazon Elastic Compute Cloud (Amazon EC2) instance
– Example: Manual shutdown of an instance
CloudTrail can help you answer questions that require detailed analysis.
Configure a trail
By default, when you access the CloudTrail event history for the Region that you are viewing, CloudTrail shows
only the results from the last 90 days. These events are limited to management events with create, modify,
and delete API calls; and also account activity. For a complete record of account activity—including all
management events, data events, and read-only activity—you must configure a CloudTrail trail.
1. Configure a new or existing Amazon Simple Storage Service (Amazon S3) bucket for
uploading log files.
2. Define a trail to log desired events (all management events are logged by default).
3. Create an Amazon Simple Notification Service (Amazon SNS) topic to receive notifications.
4. Configure Amazon CloudWatch Logs to receive logs from CloudTrail (optional).
5. Turn on log file encryption and integrity validation for log files (optional).
6. Add tags to your trail (optional).
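A sketch of steps 1, 2, and 5 with boto3; the trail and bucket names are placeholders, and the bucket needs a policy that allows CloudTrail to write to it:
import boto3
cloudtrail = boto3.client("cloudtrail")
cloudtrail.create_trail(
    Name="org-activity-trail",
    S3BucketName="example-cloudtrail-logs",
    IsMultiRegionTrail=True,
    EnableLogFileValidation=True,   # optional log file integrity validation
)
cloudtrail.start_logging(Name="org-activity-trail")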
Monitoring and security
When you monitor the activity on your account and secure your resources and data, the features of
CloudWatch and CloudTrail are complementary. Using both services is a best practice. For example, you can
examine the logs from CloudWatch Logs and also examine CloudTrail entries to detect potential unauthorized
use.
To use Amazon Athena, point to your data in Amazon Simple Storage Service (Amazon S3), define the schema,
and start querying by using standard SQL. Most results are delivered within seconds. With Athena, you do not
need complex ETL jobs to prepare your data for analysis. Athena makes it easy for anyone with SQL skills to
quickly analyze large-scale datasets.
Athena works with various standard data formats, including comma-separated values (CSV), JavaScript Object
Notation (JSON), Optimized Row Columnar (ORC), Apache Avro, and Apache Parquet. Athena is ideal for quick,
ad hoc querying. However, it can also handle complex analysis, including large joins and arrays. Athena uses
Amazon S3 as its underlying data store, which makes your data highly available and durable.
Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
You can quickly query your data without needing to set up and manage any servers or data warehouses.
Athena enables you to query all your data in Amazon S3 without needing to set up complex processes to
extract, transform, and load (ETL) the data.
Also, Athena offers fast, interactive query performance. Athena automatically runs queries in parallel, so most
results come back within seconds.
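A minimal sketch of submitting a query, assuming boto3; the database, table, and results location are placeholders, and the data itself already sits in Amazon S3:
import boto3
athena = boto3.client("athena")
query = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM access_logs GROUP BY status",
    QueryExecutionContext={"Database": "weblogs"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(query["QueryExecutionId"])   # use this ID to fetch results when the query finishes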
AWS Organizations is an account management service that enables you to consolidate multiple AWS accounts
into an organization that you create and centrally manage. AWS Organizations includes consolidated billing and
account management capabilities that help you to better meet the budgetary, security, and compliance needs
of your business.
The diagram shows a basic organization, or root. This example organization consists of seven accounts that are
organized into six organizational units (OUs). An OU is a container for accounts within a root. An OU can also
contain other OUs, which enables you to create a hierarchy that looks like an upside-down tree. The tree has a
root at the top and branches of OUs that reach down, ending in accounts that are the leaves of the tree.
When you attach a policy to one of the nodes in the hierarchy, it flows down and affects all the branches and
leaves. This organization has several policies that are attached to some of the OUs or directly to accounts.
An OU can have only one parent and, currently, each account can be a member of exactly one OU. An account
is a standard AWS account that contains your AWS resources. You can attach a policy to an account to apply
controls to only that one account.
Organizations setup
A tag:
• Is a key-value pair that can be attached to an AWS resource.
• Enables you to identify and categorize resources.
Tag characteristics
AWS Config provides AWS managed rules, which are predefined, customizable rules that AWS Config uses to
evaluate whether your AWS resources comply with common best practices. You can customize the behavior of
a managed rule to suit your needs. For example, you could use the required-tags managed rule to quickly
assess whether a specific tag is applied to your resources. This rule enables you to specify the key of the
required tag and, optionally, its value. After you activate the rule, AWS Config compares your resources to the
defined conditions and reports any non-compliant resources. The evaluation of a managed rule can occur
when a resource changes, or on a periodic basis.
Use an IAM policy to require the use of specific tags on a resource, and AWS Config to periodically verify that
all resources are tagged.
• Pay only for what you need, when you need it.
• Create scripts or templates to shut down environments.
• Can turn off unused resources
– Specific services after business hours and during holidays
– Development or test environments
– Disaster recovery (DR) environments
– Instances that are tagged as temporary
• Cost and Usage reports – These reports enable you to understand your costs and usage for all services. For
example, the Monthly costs by service report (displayed in the screen capture) shows your costs for the last 3
months, grouped by service. The top five services are shown by themselves, and the rest are grouped into one
bar (labelled Others).
• Reserved Instance (RI) reports – These reports are specific to your Reserved Instances usage. They provide an
understanding of your comparative utilization costs for Reserved Instances versus On-Demand Instances.
You can view data for up to the last 13 months, forecast how much you are likely to spend for the next 3
months, and get recommendations for which Reserved Instances to purchase.
If you have many accounts and have enabled consolidated billing for AWS Organizations, you can use AWS
Cost Explorer to view costs across all your linked accounts. You can also monitor the individual daily and
monthly spend for each linked account.
AWS Budgets
AWS Budgets enables you to set custom budgets that alert you when costs or usage exceed (or are forecasted
to exceed) your budgeted amount. AWS Budgets uses the cost visualization that is provided by Cost Explorer to
show you the status of your budgets and to provide forecasts of your estimated costs. You can also use
Budgets to create notifications if you go over your budgeted amounts, or when your estimated costs exceed
your budgets. Budgets can be tracked at the monthly, quarterly, or yearly level. You can customize the start
and end dates. Budget alerts can be sent through email or through an Amazon Simple Notification Service
(Amazon SNS) topic.
Writing and using a stopinator script is a technique for automating the shutdown of instances. A stopinator is a
generic term for any script or application that is written against the AWS Cloud, and that looks for and stops
unused instances.
Serverless stopinator
You do not need to create or use an Amazon Elastic Compute Cloud (Amazon EC2) instance to run a stopinator.
A simple and efficient design is to use a combination of a Lambda function and an Amazon CloudWatch Events
event in a serverless solution. The logic to stop and start an instance is implemented as a Lambda function.
This function is then triggered by a CloudWatch Events event according to the desired schedule.
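A sketch of such a stopinator Lambda function, assuming boto3; the tag key and value used to mark temporary instances are placeholders and should match your own tagging convention:
import boto3
ec2 = boto3.client("ec2")
def lambda_handler(event, context):
    # Find running instances tagged as temporary and stop them.
    response = ec2.describe_instances(
        Filters=[
            {"Name": "tag:environment", "Values": ["temporary"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    instance_ids = [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
    return {"stopped": instance_ids}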
AWS Trusted Advisor is an online resource to help you reduce cost, increase performance, and improve
security by optimizing your AWS environment.
AWS Trusted Advisor analyses your AWS environment and provides recommendations for best practices in five
categories:
• Cost optimization – Advice about how you can save money by eliminating unused and idle resources, or
making commitments to reserved capacity.
• Performance – Advice about how to improve the performance of your services by checking your service
limits, ensuring that you use provisioned throughput, and monitoring for overutilized instances.
• Security – Advice about how to improve the security of your applications by closing gaps, enabling various
AWS security features, and examining your permissions.
• Fault tolerance – Advice about how to increase the availability and redundancy of your AWS applications by
using automatic scaling, health checks, Multi-AZ deployment, and backup capabilities.
• Service limits – Advice about the services whose usage exceeds 80 percent of their service limit.
AWS Trusted Advisor cost optimization features
You can use AWS Trusted Advisor to identify idle resources, such as EC2 instances, underused load balancers
and volumes, and unused Elastic IP addresses. Trusted Advisor is also a good tool for cost optimization. It
provides checks and recommendations that enable you to achieve cost savings.
AWS Support provides a mix of tools and technology, people, and programs. The AWS Support resources are
designed to proactively help you optimize performance, lower costs, and innovate faster.
Types of support
• Support is provided for –
– Experimenting with AWS
– Production use of AWS
– Business-critical use of AWS
Build faster – Use AWS experts to quickly build knowledge and expertise.
Mitigate risks – AWS Support can help you maintain the strictest security standards and proactively alert you
to issues that require attention.
Management resources – Proactively monitor your environment and automate remediation.
Get expert help – Cloud support engineers work at the same standards for technical aptitude as the AWS
software development organization.
Technology
• AWS Personal Health Dashboard provides alerts and remediation guidance if AWS experiences events that
might impact customers.
• AWS Trusted Advisor is an online resource that checks for opportunities to reduce monthly expenditures
and increase productivity.
• AWS Health API provides programmatic access to the AWS Health information that is in the Personal Health
Dashboard.
Programs
• AWS Infrastructure Event Management (IEM) provides guidance for architecture and scaling. It also offers
operational support during planned events, such as shopping holidays.
• Architectural reviews with AWS solutions architects are included with Enterprise Support.
– AWS Well-Architected helps cloud architects build secure, resilient, and efficient infrastructure for their
applications and workloads.
• Proactive services that are delivered by AWS Support experts are included with Enterprise Support.
AWS Technical Support tiers cover development and production issues for AWS products and services.
How-to – Find resources to assist customers and answer their questions about AWS services and features
Best practices – Help customers successfully integrate, deploy, and manage applications in the cloud
Troubleshooting – Help customers with issues about application programming interfaces (APIs) and AWS
software development kits (SDKs)
Troubleshooting – Help customers with operational or systemic issues with AWS resources
Issues – Identify issues with the AWS Management Console or other AWS tools
Problems detected – Help customers with issues that were detected by Amazon Elastic Compute Cloud
(Amazon EC2) health checks
Trusted Advisor does not focus on only one service, and it is not only a security tool. For example, Trusted
Advisor can tell you how the infrastructure is performing and when security groups have been left open. It can
tell you whether you are using fault tolerance and if you are at risk with all the resources that you deployed in
an Availability Zone. It can also tell you if you have deployed resources that you are not using, but are still
being charged for.
• Increase efficiency
• Validate every change before release
• Reduce cost by removing unwanted resources
• Enforce security at every layer
• Deploy configuration changes to running instances
• Make configuration automated and repeatable
User data – Enables you to author scripts that are run on instance launch.
Amazon Machine Images (AMIs) – By creating base images that are customized to the needs of your
organization, you can pre-deploy installations and configurations into the EC2 instances that are launched
from the AMI.
Configuration and deployment frameworks – Technologies such as Chef, Puppet, and Ansible enable you to
configure new instances by using templates.
AWS OpsWorks – A configuration management service that provides managed instances of Chef and Puppet.
AWS CloudFormation – An AWS service that enables you to configure architectures for repeatable
deployments.
The new custom AMI then becomes the AMI that is used to create all new instances in the organization.
To enforce the policy that all new instances are launched only from the new base AMI, do the following:
• Create processes that scan the running Amazon EC2 instances in your account.
• Terminate any instances that are not using the standard AMIs.
Another option is to configure instances at boot time. An example of configuring an instance at boot time is
the use of the user data option to run a script when you launch an EC2 instance.
Creating AMIs
To create an AMI, you can use any one of the following tools:
• AWS Management Console
• AWS Command Line Interface (AWS CLI)
• AWS application programming interface (API)
• Costs are incurred for Amazon EBS snapshots of volumes stored in Amazon S3.
• Create Linux AMIs directly from an Amazon EC2 instance root volume snapshot using one of two tools:
– AWS Management Console
– AWS CLI command: aws ec2 register-image
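Alternatively, you can create an image directly from a running or stopped instance. A sketch of that path with boto3 (the instance ID and image name are placeholders):
import boto3
ec2 = boto3.client("ec2")
# create_image snapshots the instance's EBS volumes and registers the AMI in one step.
image = ec2.create_image(
    InstanceId="i-0123456789abcdef0",
    Name="base-web-server-v2",
    Description="Hardened base image with organisation agents pre-installed",
    NoReboot=False,   # allow a reboot so the file system is in a consistent state
)
print(image["ImageId"])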
AMAZON EC2 LAUNCH TEMPLATES
• Use infrastructure as code (IaC) to automate resource provisioning in the cloud consistently and reliably.
• AWS CloudFormation is one of the AWS tools for managing resources with IaC.
• After deployment, you can manage the infrastructure by using other tools, such as AWS Systems Manager or
AWS OpsWorks. In both cases, use code to ensure consistency and reliability.
• Infrastructure as code (IaC) is a method to declare the resources that are needed in the cloud by using text
files.
• JavaScript Object Notation (JSON) and YAML Ain't Markup Language (YAML) syntaxes are used by AWS to
declare resources in the cloud.
• Enables the definition of simple to complex infrastructures in text
• Understanding the syntaxes of JSON and YAML is required to build infrastructure as code
JSON
– Syntax for storing and transporting data.
– Text-based format, so it is human-readable.
– Documents are easily written.
– Stores key-value pairs and arrays of data.
Key – Unique identifier for an item of data.
Value – Data that is identified or a pointer to the location of that data.
Advantages:
• Lightweight (minimal syntax and markup), which is good for application programming interfaces (APIs).
• Easy for humans to read and write.
• Easy for machines to parse and generate.
Disadvantages:
• No native support for binary data (such as image files).
YAML
– Syntax for storing data.
– Text-based format, so it is human-readable.
– Documents are easily written.
– Store key-value pairs, lists, and associative arrays of data.
– Store complex data structures in a single YAML document.
AWS CLOUDFORMATION
AWS CloudFormation
A stack is a collection of AWS resources that were created from a template. You might provision (create) a
stack many times.
When a stack is provisioned, the AWS resources that are specified by the stack template are created. Any
charges incurred from using these services will start accruing when they are created as part of the AWS
CloudFormation stack.
When a stack is deleted, the resources that are associated with that stack are deleted. The order of deletion is
determined by AWS CloudFormation. You do not have direct control over what gets deleted when.
• Parameters enable you to input custom values to your template each time you create or update a stack.
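A minimal sketch of provisioning a stack and passing parameter values, assuming boto3; the stack name, template URL, and parameter keys are placeholders that would be defined by your template:
import boto3
cloudformation = boto3.client("cloudformation")
cloudformation.create_stack(
    StackName="web-tier-dev",
    TemplateURL="https://example-bucket.s3.amazonaws.com/web-tier.yaml",
    Parameters=[
        {"ParameterKey": "InstanceType", "ParameterValue": "t3.micro"},
        {"ParameterKey": "EnvironmentName", "ParameterValue": "dev"},
    ],
)
Reusing the same template with different parameter values is what makes the deployment repeatable across environments.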
AWS re/start
Additional
Services
CLOUD ADOPTION FRAMEWORK (CAF)
The AWS CAF leverages AWS experience and best practices to help you digitally transform and accelerate your
business outcomes through the innovative use of AWS.
Benefits
How it works
Analytics services
AWS Data Exchange is the world’s most comprehensive service for third-party datasets. AWS Data Exchange is
the only data marketplace with more than 3,500 products from over 300 providers delivered—through files,
APIs, or Amazon Redshift queries—directly to the data lakes, applications, analytics, and machine learning (ML)
models that use it. With AWS Data Exchange, the user can streamline all third-party data consumption, from
existing subscriptions—which the user can migrate at no additional cost—to future data subscriptions in one
place. As an AWS service, AWS Data Exchange is secure and compliant, integrated with AWS and third-party
tools and services, and offers consolidated billing and subscription management.
Amazon EMR is a web service that efficiently processes vast amounts of data by using Apache Hadoop and
AWS services.
AWS Glue is a scalable, serverless data integration service to discover, prepare, and combine data for
analytics, ML, and application development.
Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service to build and run
applications that use Apache Kafka to process streaming data without needing Apache Kafka infrastructure
management expertise. Apache Kafka is an open source platform for building real-time streaming data
pipelines and applications. However, Apache Kafka is difficult for users to architect, operate, and manage on
their own.
Amazon OpenSearch Service is a managed service to deploy, operate, and scale OpenSearch Service clusters in
the AWS Cloud. OpenSearch Service supports OpenSearch and legacy Elasticsearch OSS (up to 7.10, the final
open source version of the software).
Application integration
Amazon EventBridge is used to route events from sources such as homegrown applications, AWS services, and
third-party software to consumer applications across the organization. EventBridge provides a consistent way
to ingest, filter, transform, and deliver events so users can build new applications quickly. EventBridge event
buses are well suited for many-to-many routing of events between event-driven services. EventBridge Pipes is
intended for point-to-point integrations between these sources and targets, with support for advanced
transformations and enrichment.
AWS Step Functions is a serverless orchestration service for integrating with AWS Lambda functions and other
AWS services to build business-critical applications. Through the Step Functions graphical console, the user
sees their application’s workflow as a series of event-driven steps. Step Functions is based on state machines
and tasks. In Step Functions, a workflow is called a state machine, which is a series of event-driven steps. Each
step in a workflow is called a state. A Task state represents a unit of work that another AWS service, such as
Lambda, performs. A Task state can call any AWS service or API.
Business productivity
Amazon Connect is an omnichannel cloud contact center. The user can set up a contact center in a few steps,
add agents who are located anywhere, and start engaging with customers.
Amazon Simple Email Service (Amazon SES) is an email platform that provides a cost-effective way for users
to send and receive email messages by using their own email addresses and domains.
Compute
AWS Local Zones are a type of infrastructure deployment that places compute, storage, database, and other
select AWS services close to large population and industry centers.
AWS Outposts is a family of fully managed solutions delivering AWS infrastructure and services to virtually any
on-premises or edge location for a truly consistent hybrid experience. With Outposts solutions, the user can
extend and run AWS services on premises, and Outposts is available in a variety of form factors. With
Outposts, the user can run some AWS services locally and connect to a broad range of services available in the
local AWS Region. Users can also use Outposts to run applications and workloads on premises by using familiar
AWS services, tools, and APIs. Outposts supports workloads and devices that require low latency access to on-
premises systems, local data processing, data residency, and application migration with local system
interdependencies.
With AWS Wavelength, developers can build applications that deliver ultra-low latencies to mobile devices
and end users. AWS Wavelength deploys standard AWS compute and storage services to the edge of
communications service providers' 5G networks. The user can extend a virtual private cloud (VPC) to one or
more Wavelength Zones. The user can then use AWS resources such as Amazon Elastic Compute Cloud
(Amazon EC2) instances to run the applications that require ultra-low latency and a connection to AWS
services in the Region.
Containers
Amazon Elastic Container Registry (Amazon ECR) is an AWS managed container image registry service that is
secure, scalable, and reliable. It supports private repositories with resource-based permissions by using AWS
Identity and Access Management (IAM) so that specified users or EC2 instances can access their container
repositories and images. The user can use their preferred command line interface (CLI) to push, pull, and
manage Docker images, Open Container Initiative (OCI) images, and OCI-compatible artifacts. Amazon ECR also
supports public container image repositories. The AWS container services team maintains a public road map
on GitHub. It contains information about what the teams are working on and gives all AWS customers the
ability to provide direct feedback.
Customer engagement
AWS Activate for startups provides eligible startups with free tools, resources, and content designed to help
startups reach their goals.
Professionals use AWS IQ to find and engage experts on AWS. All experts on AWS IQ who respond to custom
requests are AWS Certified and must maintain a high success rate.
Databases
Amazon MemoryDB for Redis is a Redis-compatible, durable, in-memory database service that delivers ultra-
fast performance. It is purpose-built for modern applications with microservices architectures.
Amazon Neptune is a fast, reliable, fully managed graph database service used to build and run applications
that work with highly connected datasets.
Developer tools
AWS AppConfig is a capability of AWS Systems Manager to create, manage, and quickly deploy application
configurations. A configuration is a collection of settings that influence the behavior of an application. AWS
AppConfig can be used with applications hosted on EC2 instances, Lambda, containers, mobile applications, or
Internet of Things (IoT) devices. AWS AppConfig deploys application configuration in a managed and monitored
way, much like a code deployment, but without the need to redeploy code when a configuration value changes.
With AWS AppConfig, users can update configurations through the API or the AWS Management Console.
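A minimal sketch of how an application might fetch its configuration at runtime with the AppConfig data API
(boto3); the application, environment, and profile identifiers below are placeholders.
    import boto3

    appconfig = boto3.client("appconfigdata")

    session = appconfig.start_configuration_session(
        ApplicationIdentifier="my-app",                   # placeholder
        EnvironmentIdentifier="prod",                     # placeholder
        ConfigurationProfileIdentifier="feature-flags",   # placeholder
    )
    token = session["InitialConfigurationToken"]

    response = appconfig.get_latest_configuration(ConfigurationToken=token)
    config_bytes = response["Configuration"].read()  # empty if the configuration has not changed
    print(config_bytes.decode() or "configuration unchanged")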
AWS CloudShell is a browser-based shell to securely manage, explore, and interact with AWS resources.
CloudShell is pre-authenticated with the user’s console credentials. Common development and operations
tools are pre-installed, so there’s no need to install or configure software on the local machine. With
CloudShell, users can quickly run scripts with the AWS Command Line Interface (AWS CLI), experiment with
AWS service APIs by using the AWS SDKs, or use a range of other tools to be more productive.
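The kind of quick experiment CloudShell suits looks like the snippet below, assuming the Python SDK (boto3) is
available in the shell; because CloudShell is pre-authenticated, no credentials need to be configured first.
    import boto3

    print(boto3.client("sts").get_caller_identity()["Account"])  # which account am I signed in to?
    for bucket in boto3.client("s3").list_buckets()["Buckets"]:
        print(bucket["Name"])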
AWS CodeArtifact is a fully managed artifact repository service that organizations of any size can use to
securely store, publish, and share software packages used in their software development process. CodeArtifact
works with commonly used package managers and build tools such as Maven and Gradle (Java), npm and yarn
(JavaScript), pip and twine (Python), and NuGet (.NET).
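A hedged sketch of the authentication step: an application or build script fetches a CodeArtifact authorization
token with boto3 and hands it to a package manager. The domain name and account ID are placeholders.
    import boto3

    codeartifact = boto3.client("codeartifact")
    token = codeartifact.get_authorization_token(
        domain="my-domain",           # placeholder
        domainOwner="111122223333",   # placeholder account ID
        durationSeconds=900,
    )["authorizationToken"]
    # The token is then supplied to pip, npm, Maven, etc., typically via the repository
    # URL or the package manager's own login/configuration mechanism.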
AWS X-Ray is a service that collects data about requests that the user’s application serves, and provides tools
to view, filter, and gain insights into that data to identify issues and opportunities for optimization. For any
traced request to an application, users can see detailed information, not only about the request and response,
but also about calls that the application makes to downstream AWS resources, microservices, databases, and
web APIs.
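A minimal sketch with the X-Ray SDK for Python (aws-xray-sdk); the segment and subsegment names are arbitrary,
and it assumes the X-Ray daemon (or a built-in integration such as Lambda's) is available to receive trace data.
patch_all() instruments supported libraries such as boto3 and requests so downstream calls appear in the trace.
    from aws_xray_sdk.core import xray_recorder, patch_all

    patch_all()  # instrument supported libraries

    xray_recorder.begin_segment("checkout-request")
    try:
        xray_recorder.begin_subsegment("calculate-total")
        # ... application work happens here ...
        xray_recorder.end_subsegment()
    finally:
        xray_recorder.end_segment()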
End-user computing
Amazon AppStream 2.0 is an AWS End User Computing (EUC) service that can be configured for software as a
service (SaaS) application streaming or delivery of virtual desktops with selective persistence.
Amazon WorkSpaces is a fully managed desktop virtualization service for Windows, Linux, and Ubuntu that
gives the user the ability to access resources from any supported device.
Amazon WorkSpaces Web is a low-cost, fully managed, Linux-based service that is designed to facilitate secure
browser access to internal websites and SaaS applications from existing web browsers without the
administrative burden of appliances, managed infrastructure, specialized client software, or virtual private
network (VPN) connections.
Frontend web and mobile
AWS Amplify is a complete solution for frontend web and mobile developers to build, ship, and host full-stack
applications on AWS with the flexibility to leverage the breadth of AWS services as use cases evolve. No cloud
expertise is needed.
AWS AppSync creates serverless GraphQL and Pub/Sub APIs that simplify application development through a
single endpoint to securely query, update, or publish data.
AWS Device Farm is an application testing service for users to improve the quality of their web applications
and mobile apps by testing them across an extensive range of desktop browsers and real mobile devices. With
Device Farm, users don’t have to provision and manage any testing infrastructure.
Internet of Things
AWS IoT Core connects billions of Internet of Things (IoT) devices and routes trillions of messages to AWS
services without managing infrastructure.
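As an illustration of message routing, the boto3 call below publishes a message to an MQTT topic through IoT
Core; the topic name is a placeholder, and real devices would normally use an MQTT client with X.509
certificates rather than the AWS SDK.
    import json
    import boto3

    iot_data = boto3.client("iot-data")
    iot_data.publish(
        topic="sensors/greenhouse-1/temperature",   # placeholder topic
        qos=1,
        payload=json.dumps({"celsius": 21.4}),
    )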
AWS IoT Greengrass is an open source edge runtime and cloud service for building, deploying, and managing
device software.
Machine learning
Amazon Comprehend is a natural-language processing (NLP) service that uses ML to uncover valuable insights
and connections in text.
Amazon Kendra is an intelligent enterprise search service that helps the user search across different content
repositories with built-in connectors.
Amazon Lex is a fully managed artificial intelligence (AI) service with advanced natural language models to
design, build, test, and deploy conversational interfaces in applications.
Amazon Polly uses deep learning technologies to synthesize natural-sounding human speech so that the user
can convert articles to speech. With dozens of lifelike voices across a broad set of languages, Amazon Polly
helps users build speech-activated applications.
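A minimal boto3 sketch of text-to-speech with Polly: the voice ID is one of Polly's built-in voices, and the
audio stream is written to a local MP3 file.
    import boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text="Hello from Amazon Polly.",
        OutputFormat="mp3",
        VoiceId="Joanna",
    )
    with open("hello.mp3", "wb") as f:
        f.write(response["AudioStream"].read())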
Amazon Rekognition offers pre-trained and customizable computer vision (CV) capabilities to extract
information and insights from images and videos.
Amazon SageMaker is a fully managed ML service. With SageMaker, data scientists and developers can quickly
build and train ML models and then deploy them into a production-ready hosted environment.
Amazon Textract is an ML service that automatically extracts text, handwriting, and data from scanned
documents. It goes beyond optical character recognition (OCR) to identify, understand, and extract data from
forms and tables.
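A hedged sketch of basic text detection with boto3; the file name is a placeholder. For forms and tables, the
analyze_document operation with FeatureTypes=["FORMS", "TABLES"] is used instead of detect_document_text.
    import boto3

    textract = boto3.client("textract")
    with open("receipt.png", "rb") as f:   # placeholder file
        result = textract.detect_document_text(Document={"Bytes": f.read()})

    for block in result["Blocks"]:
        if block["BlockType"] == "LINE":
            print(block["Text"])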
Amazon Transcribe provides transcription services for audio files and audio streams. It uses advanced ML
technologies to recognize spoken words and transcribe them into text.
Amazon Translate is a neural machine translation service that delivers fast, high-quality, affordable, and
customizable language translation.
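A minimal boto3 sketch translating a sentence from English to Spanish with Amazon Translate:
    import boto3

    translate = boto3.client("translate")
    result = translate.translate_text(
        Text="Cloud computing is the on-demand delivery of IT resources.",
        SourceLanguageCode="en",
        TargetLanguageCode="es",
    )
    print(result["TranslatedText"])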
Management and governance
AWS Compute Optimizer recommends optimal AWS compute resources for workloads. By using ML to analyze historical
utilization metrics, it helps the user choose the optimal resource configuration, which can reduce costs and
improve performance.
With AWS Control Tower, users can enforce and manage governance rules for security, operations, and
compliance at scale across all their organizations and accounts in the AWS Cloud.
The AWS Health Dashboard is the single place to learn about the availability and operations of AWS services.
The user can view the overall status of AWS services, and they can sign in to view personalized
communications about their particular AWS account or organization. The account view provides deeper
visibility into resource issues, upcoming changes, and important notifications.
AWS Launch Wizard offers a guided way of sizing, configuring, and deploying AWS resources for third-party
applications, such as Microsoft SQL Server Always On and HANA-based SAP systems, without the need to
manually identify and provision individual AWS resources.
With AWS Resource Groups, a user can organize their AWS resources and manage or automate tasks on large numbers
of resources at one time. Tags are key-value pairs that act as metadata for organizing those resources.
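To illustrate tagging (the tag keys, values, and instance ID below are placeholders), the boto3 call tags an EC2
instance; a resource group can then be defined from a tag-based query such as Project=website.
    import boto3

    ec2 = boto3.client("ec2")
    ec2.create_tags(
        Resources=["i-0123456789abcdef0"],   # placeholder instance ID
        Tags=[
            {"Key": "Project", "Value": "website"},
            {"Key": "Environment", "Value": "production"},
        ],
    )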
With AWS Service Catalog, IT administrators can create, manage, and distribute portfolios of approved
products to end users, who can then access the products they need in a personalized portal. Typical products
include servers, databases, websites, or applications that are deployed by using AWS resources (for example,
an EC2 instance or an Amazon Relational Database Service [Amazon RDS] database).
The AWS Application Discovery Service helps systems integrators quickly and reliably plan application
migration projects by automatically identifying applications running in on-premises data centres, their
associated dependencies, and their performance profile.
AWS Application Migration Service is a highly automated lift-and-shift (rehost) solution that simplifies,
expedites, and reduces the cost of migrating applications to AWS. Companies can use this service to lift and
shift a large number of physical, virtual, or cloud servers without compatibility issues, performance disruption,
or long cutover windows.
AWS Migration Hub provides a single location to track migration tasks across multiple AWS tools and partner
solutions. With Migration Hub, users can choose the AWS and partner migration tools that best fit their needs
while Migration Hub provides visibility into the status of their migration projects.
AWS Transfer Family is a secure transfer service for moving files into and out of AWS storage services over
protocols such as SFTP, FTPS, and FTP.
Security, identity, and compliance
AWS Audit Manager helps users continually audit their AWS usage to simplify how they manage risk and
compliance with regulations and industry standards. Audit Manager automates evidence collection so users
can assess whether their policies, procedures, and activities—also known as controls—are operating
effectively.
AWS Directory Service provides multiple ways to set up and run Microsoft Active Directory with other AWS
services, such as Amazon EC2, Amazon RDS for SQL Server, Amazon FSx for Windows File Server, and AWS IAM
Identity Center (successor to AWS Single Sign-On).
AWS Firewall Manager simplifies a user’s AWS WAF administration and maintenance tasks across multiple
accounts and resources. With Firewall Manager, users set up their firewall rules only once. The service
automatically applies these rules across accounts and resources, even as new resources are added.
With AWS IAM Identity Center (successor to AWS Single Sign-On), a user can manage sign-in security for their
workforce identities, also known as workforce users. IAM Identity Center provides one place where users can
create or connect workforce users and centrally manage their access across all their AWS accounts and
applications. Users can use multi-account permissions to assign their workforce users access to AWS accounts.
AWS Key Management Service (AWS KMS) is an encryption and key management service scaled for the cloud. AWS KMS
keys and functionality are used by other AWS services, and a user can use them to protect data in their own
applications that use AWS.
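A hedged sketch of encrypting and decrypting a small payload with a KMS key (the key alias is a placeholder);
in practice KMS is typically used to protect data keys rather than bulk data.
    import boto3

    kms = boto3.client("kms")
    ciphertext = kms.encrypt(
        KeyId="alias/my-app-key",        # placeholder alias
        Plaintext=b"database password",
    )["CiphertextBlob"]

    plaintext = kms.decrypt(CiphertextBlob=ciphertext)["Plaintext"]
    print(plaintext)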
AWS Network Firewall is a stateful, managed, network firewall and intrusion detection and prevention service
for a user’s VPC that is created in Amazon Virtual Private Cloud (Amazon VPC). With Network Firewall, a user
can filter traffic at the perimeter of a VPC. This includes filtering traffic going to and coming from an internet
gateway, NAT gateway, or over VPN or AWS Direct Connect.
AWS Resource Access Manager (AWS RAM) helps users securely share their resources across AWS accounts,
within their organization or organizational units (OUs) in AWS Organizations, and with IAM roles and IAM users
for supported resource types. A user can use AWS RAM to share resources with other AWS accounts.
AWS Secrets Manager helps a user to securely encrypt, store, and retrieve credentials for databases and other
services. Instead of hardcoding credentials in applications, a user can make calls to Secrets Manager to retrieve
credentials whenever needed. Secrets Manager helps protect access to IT resources and data by giving users
the ability to rotate and manage access to their secrets.
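A minimal sketch of retrieving a secret at runtime instead of hardcoding it; the secret name is a placeholder,
and the stored value is assumed to be a JSON string.
    import json
    import boto3

    secrets = boto3.client("secretsmanager")
    secret_value = secrets.get_secret_value(SecretId="prod/orders/db-credentials")  # placeholder name
    credentials = json.loads(secret_value["SecretString"])
    print(credentials["username"])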
AWS Security Hub provides users with a comprehensive view of their security state in AWS and helps them
check their environment against security industry standards and best practices. Security Hub collects security
data from across AWS accounts, services, and supported third-party partner products and helps users analyze
their security trends and identify the highest priority security issues.
Storage
AWS Elastic Disaster Recovery minimizes downtime and data loss with fast, reliable recovery of on-premises
and cloud-based applications by using affordable storage, minimal compute, and point-in-time recovery.
Amazon FSx makes it cost-effective to launch, run, and scale feature-rich, high-performance file systems in the
cloud. It supports a wide range of workloads with its reliability, security, scalability, and broad set of
capabilities.