Digitalcloud - Training-Amazon S3 and Glacier
digitalcloud.training/amazon-s3-and-glacier
January 5, 2022
Amazon S3 is object storage built to store and retrieve any amount of data from
anywhere on the Internet.
It’s a simple storage service that offers an extremely durable, highly available, and
infinitely scalable data storage infrastructure at very low costs.
Keys can be any string, and they can be constructed to mimic hierarchical attributes.
Alternatively, you can use S3 Object Tagging to organize your data across all your S3 buckets and/or prefixes.
Amazon S3 provides a simple, standards-based REST web services interface that is designed to work with any Internet-development toolkit.
For objects larger than 100 megabytes use the Multipart Upload capability.
Updates to an object are atomic – when reading an updated object you will either get the new object or the old one, you will never get partial
or corrupt data.
It is recommended to access S3 through SDKs and APIs (the console uses APIs).
Event notifications for specific actions can send alerts or trigger actions. Destinations include:
SNS Topics.
SQS Queue.
Lambda functions.
Need to configure SNS/SQS/Lambda before S3.
No extra charges from S3 but you pay for SNS, SQS and Lambda.
The Requester Pays feature makes the requester, rather than the bucket owner, pay for requests and data transfer (it removes anonymous access).
Amazon S3 now provides strong read-after-write consistency for all PUTs and DELETEs of objects (overwrite PUTs and DELETEs were previously only eventually consistent).
An object consists of:
Key (name).
Value (data).
Version ID.
Metadata.
Access Control Lists.
For example, your application can achieve at least 3,500 PUT/POST/DELETE and 5,500 GET requests per second per prefix in a bucket.
There are no limits to the number of prefixes in a bucket.
For read intensive requests you can also use CloudFront edge locations to offload from S3.
Additional Capabilities
Additional capabilities offered by Amazon S3 include:
Requester Pays – The requester rather than the bucket owner pays for requests and data transfer.
Events – Trigger notifications to SNS, SQS, or Lambda when certain events happen in your bucket.
BitTorrent – Use the BitTorrent protocol to retrieve any publicly available object by automatically generating a .torrent file.
Storage Class Analysis – Analyzes storage access patterns to help you decide when to transition the right data to the right storage class.
Storage Lens – Delivers organization-wide visibility into object storage usage, activity trends, and makes actionable recommendations to improve cost-efficiency and apply data protection best practices.
S3 Object Lambda – Add your own code to S3 GET requests to modify and process data as it is returned to an application.
Use Cases
Typical use cases include:
Backup and Storage – Provide data backup and storage services for others.
Application Hosting – Provide services that deploy, install, and manage web applications.
Media Hosting – Build a redundant, scalable, and highly available infrastructure that hosts video, photo, or music uploads and
downloads.
Software Delivery – Host your software applications that customers can download.
Static Website – you can configure a static website to run from an S3 bucket.
Persistent data stores are non-volatile storage systems that retain data when powered off.
This contrasts with transient data stores and ephemeral data stores which lose the data when powered off.
The following table provides a description of persistent, transient, and ephemeral data stores and which AWS service to use:
Persistent data store – Data is durable and sticks around after reboots, restarts, or power cycles – S3, Glacier, EBS, EFS.
Transient data store – Data is just temporarily stored and passed along to another process or persistent store – SQS, SNS.
Ephemeral data store – Data is lost when the system is stopped – EC2 Instance Store, Memcached (ElastiCache).
Buckets
Files are stored in buckets:
You can use an object key name (prefix) to mimic folders.
You can create folders in your buckets (only available through the Console).
Bucket names are part of the URL used to access the bucket.
Bucket naming: bucket names must be unique across all of AWS, be 3–63 characters long, and consist of lowercase letters, numbers, and hyphens.
For better performance, lower latency, and lower cost, create the bucket closer to your clients.
Objects
Each object is stored and retrieved by a unique key (ID or name).
Objects are addressed through a URL made up of the:
Service endpoint.
Bucket name.
Object key (name).
Optionally, an object version.
Objects stored in a bucket will never leave the region in which they are stored unless you move them to another region or enable cross-
region replication.
You can define permissions on objects when uploading and at any time afterwards using the AWS Management Console.
Subresources
Sub-resources are subordinate to objects, they do not exist independently but are always associated with another entity such as an object or
bucket.
Sub-resources associated with objects include:
Cross-origin-resource-sharing (CORS)
Used to allow requests to a different origin when connected to the main origin.
The request will fail unless the origin allows the requests using CORS headers (e.g. Access-Control-Allow-Origin).
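As a concrete illustration, a CORS configuration can be applied to a bucket with the SDK. The sketch below uses Python (boto3); the bucket name and allowed origin are placeholder values.

```python
import boto3

s3 = boto3.client("s3")

# Allow GET/HEAD requests from a single web origin (hypothetical example values).
cors_configuration = {
    "CORSRules": [
        {
            "AllowedOrigins": ["https://www.example.com"],
            "AllowedMethods": ["GET", "HEAD"],
            "AllowedHeaders": ["*"],
            "ExposeHeaders": ["ETag"],
            "MaxAgeSeconds": 3000,
        }
    ]
}

s3.put_bucket_cors(Bucket="my-example-bucket", CORSConfiguration=cors_configuration)
```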
Storage Classes
There are six S3 storage classes.
Durability and availability vary by storage class; all classes store objects across three or more Availability Zones, except S3 One Zone-IA, which uses a single Availability Zone.
Objects stored in the S3 One Zone-IA storage class are stored redundantly within a single Availability Zone in the AWS Region you select.
Access to buckets and objects can be controlled using:
IAM policies.
Bucket policies.
Access Control Lists (ACLs).
Query string authentication (URL to an Amazon S3 object which is only valid for a limited time).
Access auditing can be configured by configuring an Amazon S3 bucket to create access log records for all requests made against it.
For capturing IAM/user identity information in logs configure AWS CloudTrail Data Events.
By default a bucket, its objects, and related sub-resources are all private.
The resource owner refers to the AWS account that creates the resource.
With IAM the account owner rather than the IAM user is the owner.
Within an IAM policy you can grant either programmatic access or AWS Management Console access to Amazon S3 resources.
Amazon Resource Names (ARN) are used for specifying resources in a policy.
arn:partition:service:region:namespace:relative-id.
For S3 resources:
arn:aws:s3:::bucket_name.
arn:aws:s3:::bucket_name/key_name.
A bucket owner can grant cross-account permissions to another AWS account (or users in an account) to upload objects.
Permissions can be granted to:
Individual users.
AWS accounts.
Everyone (public/anonymous).
All authenticated users (AWS users).
Access policies define access to resources and can be associated with resources (buckets and objects) and users.
You can use the AWS Policy Generator to create a bucket policy for your Amazon S3 bucket.
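For example, a simple bucket policy granting public read access to objects can be applied programmatically. This is only a sketch using boto3 with a placeholder bucket name; adjust the statement to your own requirements.

```python
import json
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # placeholder

# Bucket policy allowing anyone to GET objects in the bucket (public read).
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }
    ],
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```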
Resource-based policies: bucket policies and access control lists (ACLs) are attached to the resources themselves (buckets and objects).
User policies:
You cannot grant anonymous permissions in an IAM user policy as the policy is attached to a user.
User policies can grant permissions to a bucket and the objects in it.
ACLs:
You grant permission to another AWS account using the email address or the canonical user ID.
However, if you provide an email address in your grant request, Amazon S3 finds the canonical user ID for that account and adds it to
the ACL.
Grantee accounts can then delegate the access provided by other accounts to their individual users.
Pre-defined Groups
Authenticated Users group:
Access permission to this group allows any AWS account to access the resource; all requests must be signed (authenticated).
All Users group:
Access permission to this group allows anyone in the world access to the resource.
The requests can be signed (authenticated) or unsigned (anonymous).
Unsigned requests omit the authentication header in the request.
AWS recommends that you never grant the All Users group WRITE, WRITE_ACP, or FULL_CONTROL permissions.
Log Delivery group:
Providing WRITE permission to this group on a bucket enables S3 to write server access logs.
Not applicable to objects.
The following table lists the set of permissions that Amazon S3 supports in an ACL.
The set of ACL permissions is the same for an object ACL and a bucket ACL.
Depending on the context (bucket ACL or object ACL), these ACL permissions grant permissions for specific buckets or object
operations.
The table lists the permissions and describes what they mean in the context of objects and buckets.
READ – Bucket: allows grantee to list the objects in the bucket. Object: allows grantee to read the object data and its metadata.
WRITE – Bucket: allows grantee to create, overwrite, and delete any object in the bucket. Object: N/A.
READ_ACP – Bucket: allows grantee to read the bucket ACL. Object: allows grantee to read the object ACL.
WRITE_ACP – Bucket: allows grantee to write the ACL for the applicable bucket. Object: allows grantee to write the ACL for the applicable object.
FULL_CONTROL – Bucket: allows grantee the READ, WRITE, READ_ACP, and WRITE_ACP permissions on the bucket. Object: allows grantee the READ, WRITE, READ_ACP, and WRITE_ACP permissions on the object.
WRITE is only applicable to the bucket level (except for ACP).
The only recommended use case for the bucket ACL is to grant write permissions to the S3 Log Delivery group.
When granting other AWS accounts the permissions to upload objects, permissions to these objects can only be managed by the object
owner using object ACLs.
For an IAM user to access resources in another account the following must be provided: permission from the parent account (through an IAM user policy) and permission from the account that owns the resource (through a bucket policy or ACL).
If an AWS account owns a resource it can grant permissions to another account, that account can then delegate those permissions or a
subset of them to users in the account (permissions delegation).
An account that receives permissions from another account cannot delegate permissions cross-account to a third AWS account.
Charges
Charges are applied for storage (per GB/month), requests, data retrieval, and data transfer out.
There is no charge for data transferred between EC2 and S3 in the same region.
Data retrieval charges apply to S3 Standard-IA, S3 One Zone-IA, S3 Glacier, and S3 Glacier Deep Archive.
Requester pays:
The bucket owner will only pay for object storage fees.
The requester will pay for requests (uploads/downloads) and data transfers.
Can only be enabled at the bucket level.
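Requester Pays is enabled with a single API call on the bucket, and requesters must then acknowledge the charge on their requests. A minimal boto3 sketch with a placeholder bucket name:

```python
import boto3

s3 = boto3.client("s3")

# Enable Requester Pays on the bucket (the bucket owner still pays for storage).
s3.put_bucket_request_payment(
    Bucket="my-example-bucket",
    RequestPaymentConfiguration={"Payer": "Requester"},
)

# A requester must acknowledge the charges on each request.
obj = s3.get_object(Bucket="my-example-bucket", Key="data.csv", RequestPayer="requester")
```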
Multipart upload
Can be used to speed up uploads to S3.
Multipart upload uploads objects in parts independently, in parallel and in any order.
Performed using the S3 Multipart upload API.
Improves throughput.
Can begin upload before you know the final object size.
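The high-level transfer manager in the SDKs uses multipart upload automatically above a configurable threshold. A boto3 sketch (file path, bucket name, and threshold values are illustrative):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Use multipart upload for files over 100 MB, with parts uploaded in parallel.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # switch to multipart above 100 MB
    multipart_chunksize=16 * 1024 * 1024,   # 16 MB parts
    max_concurrency=10,                     # parallel part uploads
)

s3.upload_file("backup.tar.gz", "my-example-bucket", "backups/backup.tar.gz", Config=config)
```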
S3 Copy
You can create a copy of objects up to 5GB in size in a single atomic operation.
For files larger than 5GB you must use the multipart upload API.
Once uploaded to S3 some object metadata cannot be changed, copying the object can allow you to modify this information.
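The SDK's managed copy operation switches to the multipart copy API automatically for large objects, so a single call works both below and above the 5 GB limit. A boto3 sketch with placeholder bucket and key names:

```python
import boto3

s3 = boto3.client("s3")

# Managed copy: a single atomic copy for small objects and
# multipart copy for objects larger than 5 GB.
s3.copy(
    CopySource={"Bucket": "source-bucket", "Key": "large-video.mp4"},
    Bucket="destination-bucket",
    Key="large-video.mp4",
)
```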
Transfer acceleration
Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and your Amazon
S3 bucket.
S3 Transfer Acceleration leverages Amazon CloudFront’s globally distributed AWS Edge Locations.
Bucket names must be DNS compliant and must not contain periods.
Transfer Acceleration is used via distinct endpoints:
bucketname.s3-accelerate.amazonaws.com.
bucketname.s3-accelerate.dualstack.amazonaws.com (dual-stack option).
S3 Transfer Acceleration supports all bucket level features including multipart uploads.
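Transfer Acceleration is switched on per bucket and then used by pointing the client at the accelerate endpoint. A boto3 sketch with a placeholder bucket name:

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# Enable Transfer Acceleration on the bucket (name must be DNS compliant, no periods).
s3.put_bucket_accelerate_configuration(
    Bucket="my-example-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Create a client that sends requests to the s3-accelerate endpoint.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("data.bin", "my-example-bucket", "uploads/data.bin")
```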
Static Websites
S3 can be used to host static websites.
Cannot use dynamic content such as PHP, .Net etc.
Automatically scales.
You can use a custom domain name with S3 using a Route 53 Alias record.
When using a custom domain name the bucket name must be the same as the domain name.
Can enable redirection for the whole domain, pages, or specific objects.
Key differences between the REST API endpoint and the website endpoint:
Access control – REST endpoint: supports both public and private content. Website endpoint: supports only publicly readable content.
Redirection support – REST endpoint: not applicable. Website endpoint: supports both object-level and bucket-level redirects.
Requests supported – REST endpoint: supports all bucket and object operations. Website endpoint: supports only GET and HEAD requests on objects.
Responses to GET and HEAD requests at the root of the bucket – REST endpoint: returns a list of the object keys in the bucket. Website endpoint: returns the index document that is specified in the website configuration.
SSL support – REST endpoint: supports SSL connections. Website endpoint: does not support SSL connections.
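Static website hosting is configured on the bucket by specifying index and error documents. A boto3 sketch (bucket and document names are placeholders; the bucket must also allow public reads):

```python
import boto3

s3 = boto3.client("s3")

# Configure the bucket as a static website with index and error documents.
s3.put_bucket_website(
    Bucket="my-example-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)
# The site is then served from the website endpoint, e.g.
# http://my-example-bucket.s3-website-<region>.amazonaws.com
```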
Pre-Signed URLs
Pre-signed URLs can be used to provide temporary access to a specific object to those who do not have AWS credentials.
By default all objects are private and can only be accessed by the owner.
To share an object you can either make it public or generate a pre-signed URL.
These can be generated using the SDKs (for example Java and .NET) and the AWS Explorer for Visual Studio.
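As an illustration, a boto3 sketch that shares a private object for one hour (bucket and key names are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Generate a URL that grants temporary GET access to a private object.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-example-bucket", "Key": "reports/q1.pdf"},
    ExpiresIn=3600,  # valid for one hour
)
print(url)  # anyone with this URL can download the object until it expires
```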
Versioning
Versioning stores all versions of an object (including all writes and even if an object is deleted).
Old versions count as billable size until they are permanently deleted.
Enabling versioning does not replicate existing objects.
Cross Region Replication requires versioning to be enabled on the source and destination buckets.
Only the S3 bucket owner can permanently delete objects once versioning is enabled.
When you try to delete an object with versioning enabled a DELETE marker is placed on the object.
You can delete the DELETE marker and the object will be available again.
Deletion with versioning replicates the delete marker. But deleting the delete marker is not replicated.
A bucket can be in one of three versioning states:
Un-versioned (the default).
Versioning-enabled.
Versioning-suspended.
Objects that existed before enabling versioning will have a version ID of NULL.
Suspension:
If you suspend versioning the existing objects remain as they are however new versions will not be created.
While versioning is suspended new objects will have a version ID of NULL and uploaded objects of the same name will overwrite the
existing object.
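Versioning is enabled (or later suspended) through the bucket's versioning configuration. A boto3 sketch with a placeholder bucket name:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # placeholder

# Enable versioning; to suspend later, set Status to "Suspended".
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# List all versions (and delete markers) of a single object key.
versions = s3.list_object_versions(Bucket=bucket, Prefix="reports/q1.pdf")
for v in versions.get("Versions", []):
    print(v["Key"], v["VersionId"], v["IsLatest"])
```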
A lifecycle configuration is a set of rules that define actions that Amazon S3 applies to a group of objects. There are two types of actions:
Transition actions—Define when objects transition to another storage class. For example, you might choose to transition objects to the
STANDARD_IA storage class 30 days after you created them, or archive objects to the GLACIER storage class one year after creating
them.
There are costs associated with the lifecycle transition requests. For pricing information, see Amazon S3 Pricing.
Expiration actions—Define when objects expire. Amazon S3 deletes expired objects on your behalf.
Can be applied to specific objects within a bucket: objects with a specific tag or objects with a specific prefix.
Amazon S3 supports the following lifecycle transitions between storage classes using a lifecycle configuration:
You can transition from the STANDARD storage class to any other storage class.
You can transition from any storage class to the GLACIER or DEEP_ARCHIVE storage classes.
You can transition from the STANDARD_IA storage class to the INTELLIGENT_TIERING or ONEZONE_IA storage classes.
You can transition from the INTELLIGENT_TIERING storage class to the ONEZONE_IA storage class.
You can transition from the GLACIER storage class to the DEEP_ARCHIVE storage class.
You can’t transition from any storage class to the STANDARD storage class.
You can’t transition from any storage class to the REDUCED_REDUNDANCY storage class.
You can’t transition from the INTELLIGENT_TIERING storage class to the STANDARD_IA storage class.
You can’t transition from the ONEZONE_IA storage class to the STANDARD_IA or INTELLIGENT_TIERING storage classes.
You can transition from the GLACIER storage class to the DEEP_ARCHIVE storage class only.
You can’t transition from the DEEP_ARCHIVE storage class to any other storage class.
From the STANDARD or STANDARD_IA storage class to INTELLIGENT_TIERING. The following constraints apply:
For larger objects, there is a cost benefit for transitioning to INTELLIGENT_TIERING. Amazon S3 does not transition objects that
are smaller than 128 KB to the INTELLIGENT_TIERING storage class because it’s not cost effective.
From the STANDARD storage classes to STANDARD_IA or ONEZONE_IA. The following constraints apply:
For larger objects, there is a cost benefit for transitioning to STANDARD_IA or ONEZONE_IA. Amazon S3 does not transition
objects that are smaller than 128 KB to the STANDARD_IA or ONEZONE_IA storage classes because it’s not cost effective.
Objects must be stored at least 30 days in the current storage class before you can transition them to STANDARD_IA or
ONEZONE_IA. For example, you cannot create a lifecycle rule to transition objects to the STANDARD_IA storage class one day
after you create them.
Amazon S3 doesn’t transition objects within the first 30 days because newer objects are often accessed more frequently or
deleted sooner than is suitable for STANDARD_IA or ONEZONE_IA storage.
If you are transitioning noncurrent objects (in versioned buckets), you can transition only objects that are at least 30 days
noncurrent to STANDARD_IA or ONEZONE_IA storage.
From the STANDARD_IA storage class to ONEZONE_IA. The following constraints apply:
Objects must be stored at least 30 days in the STANDARD_IA storage class before you can transition them to the ONEZONE_IA class.
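A lifecycle configuration combining a transition action and an expiration action might look like the following boto3 sketch (the prefix, day counts, and bucket name are illustrative):

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under the "logs/" prefix to STANDARD_IA after 30 days,
# to GLACIER after 365 days, and delete them after two years.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 730},
            }
        ]
    },
)
```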
Encryption
You can securely upload/download your data to Amazon S3 via SSL endpoints using the HTTPS protocol (In Transit – SSL/TLS).
Encryption options:
SSE-S3 – Server-side encryption with S3-managed keys.
SSE-KMS – Server-side encryption with AWS KMS-managed keys (CMKs).
SSE-C – Upload your own AES-256 encryption key which S3 uses when it writes objects.
Client-side – Encrypt objects using your own local encryption process before uploading to S3.
If you need server-side encryption for all the objects that are stored in a bucket, use a bucket policy.
To request server-side encryption using the object creation REST APIs, provide the x-amz-server-side-encryption request header.
Note: You need the kms:Decrypt permission when you upload or download an Amazon S3 object encrypted with an AWS Key Management
Service (AWS KMS) customer master key (CMK), and that is in addition to kms:ReEncrypt, kms:GenerateDataKey, and kms:DescribeKey
permissions.
There are three options for using server-side encryption: SSE-S3, SSE-KMS, and SSE-C. These are detailed below.
When you use Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3), each object is encrypted with a unique key.
As an additional safeguard, it encrypts the key itself with a master key that it regularly rotates.
Amazon S3 server-side encryption uses one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256), to
encrypt your data.
With SSE-KMS there are separate permissions for the use of a CMK, which provides added protection against unauthorized access of your objects in Amazon S3.
SSE-KMS also provides you with an audit trail that shows when your CMK was used and by whom.
Additionally, you can create and manage customer managed CMKs or use AWS managed CMKs that are unique to you, your service, and
your Region.
When using server-side encryption with customer-provided encryption keys (SSE-C), you must provide encryption key information using the
following request headers:
x-amz-server-side-encryption-customer-algorithm – Use this header to specify the encryption algorithm. The header value must be “AES256”.
x-amz-server-side-encryption-customer-key – Use this header to provide the 256-bit, base64-encoded encryption key for Amazon S3 to use
to encrypt or decrypt your data.
x-amz-server-side-encryption-customer-key-MD5 – Use this header to provide the base64-encoded 128-bit MD5 digest of the encryption key
according to RFC 1321. Amazon S3 uses this header for a message integrity check to ensure that the encryption key was transmitted without
error.
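With boto3 the SSE-C headers above are set through request parameters, and the SDK calculates the MD5 digest for you. A sketch with a locally generated key and placeholder bucket/key names:

```python
import os
import boto3

s3 = boto3.client("s3")
key = os.urandom(32)  # 256-bit customer-provided key; you must store this yourself

# Upload: S3 encrypts the object with the supplied key and does not store the key.
s3.put_object(
    Bucket="my-example-bucket",
    Key="secret.txt",
    Body=b"sensitive data",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=key,
)

# Download: the same key must be supplied again or the request fails.
obj = s3.get_object(
    Bucket="my-example-bucket",
    Key="secret.txt",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=key,
)
```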
Client-side encryption
This is the act of encrypting data before sending it to Amazon S3.
1. Use a customer master key (CMK) stored in AWS Key Management Service (AWS KMS).
2. Use a master key you store within your application.
Option 1. Use a customer master key (CMK) stored in AWS Key Management Service (AWS KMS)
When uploading an object—Using the customer master key (CMK) ID, the client first sends a request to AWS KMS for a CMK that it can use
to encrypt your object data. AWS KMS returns two versions of a randomly generated data key:
A plaintext version of the data key that the client uses to encrypt the object data.
A cipher blob of the same data key that the client uploads to Amazon S3 as object metadata.
When downloading an object—The client downloads the encrypted object from Amazon S3 along with the cipher blob version of the data key
stored as object metadata. The client then sends the cipher blob to AWS KMS to get the plaintext version of the data key so that it can
decrypt the object data.
Option 2. Use a master key you store within your application
When uploading an object—You provide a client-side master key to the Amazon S3 encryption client. The client uses the master key only to encrypt the data encryption key that it generates randomly. The process works like this:
1. The Amazon S3 encryption client generates a one-time-use symmetric key (also known as a data encryption key or data key) locally. It
uses the data key to encrypt the data of a single Amazon S3 object. The client generates a separate data key for each object.
2. The client encrypts the data encryption key using the master key that you provide. The client uploads the encrypted data key and its
material description as part of the object metadata. The client uses the material description to determine which client-side master key to
use for decryption.
3. The client uploads the encrypted data to Amazon S3 and saves the encrypted data key as object metadata (x-amz-meta-x-amz-key) in
Amazon S3.
When downloading an object—The client downloads the encrypted object from Amazon S3. Using the material description from the object’s
metadata, the client determines which master key to use to decrypt the data key. The client uses that master key to decrypt the data key and
then uses the data key to decrypt the object.
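The KMS-based flow described above can be sketched directly with boto3 and a local cipher. This is a simplified illustration rather than the official S3 encryption client, and the KMS key alias, bucket, and key names are placeholders.

```python
import base64
import os
import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

kms = boto3.client("kms")
s3 = boto3.client("s3")

# 1. Ask KMS for a data key: a plaintext copy for local encryption
#    and an encrypted copy to store alongside the object.
data_key = kms.generate_data_key(KeyId="alias/my-app-key", KeySpec="AES_256")

# 2. Encrypt the object data locally with the plaintext data key.
nonce = os.urandom(12)
ciphertext = AESGCM(data_key["Plaintext"]).encrypt(nonce, b"my secret payload", None)

# 3. Upload the ciphertext; store only the encrypted data key as object metadata.
s3.put_object(
    Bucket="my-example-bucket",
    Key="encrypted-object",
    Body=nonce + ciphertext,
    Metadata={"x-amz-key": base64.b64encode(data_key["CiphertextBlob"]).decode()},
)

# On download, send the encrypted data key back to KMS (kms.decrypt) to recover
# the plaintext data key, then decrypt the object locally.
```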
(The original article includes a diagram depicting the options for enabling encryption, where the encryption is applied, and where the keys are managed.)
Event Notifications
Amazon S3 event notifications can be sent in response to actions in Amazon S3 like PUTs, POSTs, COPYs, or DELETEs.
Amazon S3 event notifications enable you to run workflows, send alerts, or perform other actions in response to changes in your objects
stored in S3.
To enable notifications, you must first add a notification configuration that identifies the events you want Amazon S3 to publish and the
destinations where you want Amazon S3 to send the notifications.
You can configure notifications to be filtered by the prefix and suffix of the key name of objects.
Notifications can be published for events including object created, object removed, and restore events, as well as:
Reduced Redundancy Storage (RRS) object lost events.
Replication events.
Notification destinations:
Publish event messages to an Amazon Simple Notification Service (Amazon SNS) topic.
Publish event messages to an Amazon Simple Queue Service (Amazon SQS) queue.
Publish event messages to AWS Lambda by invoking a Lambda function and providing the event message as an argument.
Need to grant Amazon S3 permissions to post messages to an Amazon SNS topic or an Amazon SQS queue.
Need to also grant Amazon S3 permission to invoke an AWS Lambda function on your behalf.
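A notification configuration identifies the events, an optional prefix/suffix filter, and the destination. A boto3 sketch that sends object-created events for .jpg files to an SQS queue (the queue ARN and bucket name are placeholders, and the queue policy must already allow S3 to send messages):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="my-example-bucket",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:image-events",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": "images/"},
                            {"Name": "suffix", "Value": ".jpg"},
                        ]
                    }
                },
            }
        ]
    },
)
```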
Object Tags
S3 object tags are key-value pairs applied to S3 objects which can be created, updated, or deleted at any time during the lifetime of the
object.
Allow you to create Identity and Access Management (IAM) policies, setup S3 Lifecycle policies, and customize storage metrics.
Up to ten tags can be added to each S3 object and you can use either the AWS Management Console, the REST API, the AWS CLI, or the
AWS SDKs to add object tags.
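Tags can be set when the object is uploaded or added to an existing object afterwards. A boto3 sketch with placeholder names and tag values:

```python
import boto3

s3 = boto3.client("s3")

# Tag an existing object; up to ten tags per object.
s3.put_object_tagging(
    Bucket="my-example-bucket",
    Key="reports/q1.pdf",
    Tagging={
        "TagSet": [
            {"Key": "project", "Value": "alpha"},
            {"Key": "classification", "Value": "internal"},
        ]
    },
)
```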
Amazon S3 CloudWatch Metrics
You can enable request metrics in the AWS Management Console or, alternatively, call the S3 PUT Bucket Metrics API to enable and configure publication of S3 request metrics.
CloudWatch Request Metrics will be available in CloudWatch within 15 minutes after they are enabled.
CloudWatch Storage Metrics are enabled by default for all buckets and reported once per day.
Available metrics include:
S3 requests.
Bucket storage.
Bucket size.
All requests.
HTTP 4XX/5XX errors.
Cross Region Replication (CRR)
With CRR, every object uploaded to an S3 bucket is automatically replicated to a destination bucket in a different AWS Region that you choose.
You enable a CRR configuration on your source bucket by specifying a destination bucket in a different Region for replication.
You can use either the AWS Management Console, the REST API, the AWS CLI, or the AWS SDKs to enable CRR.
Versioning must be enabled for both the source and destination buckets.
With CRR you can only replicate between Regions, not within a Region (see SRR below for same-region replication).
You can configure separate S3 Lifecycle rules on the source and destination buckets.
You can replicate KMS-encrypted objects by providing a destination KMS key in your replication configuration.
You can set up CRR across AWS accounts to store your replicated data in a different account in the target region.
Provides low latency access for data by copying objects to buckets that are closer to users.
To activate CRR you need to configure the replication on the source bucket:
The replicas will be exact replicas and share the same key names and metadata.
You can specify a different storage class (by default the source storage class will be used).
Bucket owners must have permission to read the object and object ACL.
Can be used across accounts but the source bucket owner must have permission to replicate objects into the destination bucket.
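Replication is configured on the source bucket by supplying an IAM role that S3 assumes and one or more rules pointing at the destination bucket. A boto3 sketch (the role ARN, bucket names, and storage class are placeholders; versioning must already be enabled on both buckets):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-everything",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix = all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::destination-bucket",
                    "StorageClass": "STANDARD_IA",
                },
            }
        ],
    },
)
```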
What is replicated:
New objects created after replication is enabled, along with their metadata, ACL updates, and object tags.
What is not replicated:
Objects that existed before enabling replication (can use the copy API).
Objects created with SSE-C, and SSE-KMS encrypted objects unless KMS replication is specifically enabled.
Objects to which the bucket owner does not have permissions.
Updates to bucket-level subresources.
Actions from lifecycle rules are not replicated.
Objects in the source bucket that are themselves replicas created by another replication rule.
Deletion behavior:
If a DELETE request is made without specifying an object version ID a delete marker will be added and replicated.
If a DELETE request is made specifying an object version ID the object is deleted but the delete marker is not replicated.
Charges: you pay for storage in the destination Region, replication PUT requests, and inter-Region data transfer.
New objects uploaded to an Amazon S3 bucket are configured for replication at the bucket, prefix, or object tag levels.
Replicated objects can be owned by the same AWS account as the original copy or by different accounts, to protect from accidental deletion.
Replication can be to any Amazon S3 storage class, including S3 Glacier and S3 Glacier Deep Archive to create backups and long-term
archives.
Same Region Replication (SRR)
When an S3 object is replicated using SRR, the metadata, Access Control Lists (ACL), and object tags associated with the object are also part of the replication.
Once SRR is configured on a source bucket, any changes to the object, metadata, ACLs, or object tags trigger a new replication to the
destination bucket.
S3 Analytics
Can run analytics on data stored on Amazon S3.
This includes data lakes, IoT streaming data, machine learning, and artificial intelligence.
S3 Inventory
You can use S3 Inventory to audit and report on the replication and encryption status of your objects for business, compliance, and regulatory
needs.
Amazon S3 inventory provides comma-separated values (CSV), Apache optimized row columnar (ORC) or Apache Parquet (Parquet) output
files that list your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (that is, objects that
have names that begin with a common string).
Daily storage metrics for buckets ‐ Monitor bucket storage using CloudWatch, which collects and processes storage data from
Amazon S3 into readable, daily metrics. These storage metrics for Amazon S3 are reported once per day and are provided to all
customers at no additional cost.
Request metrics ‐ Monitor Amazon S3 requests to quickly identify and act on operational issues. The metrics are available at 1-minute
intervals after some latency to process. These CloudWatch metrics are billed at the same rate as the Amazon CloudWatch custom
metrics.
Replication metrics ‐ Monitor the total number of S3 API operations that are pending replication, the total size of objects pending
replication, and the maximum replication time to the destination Region. Only replication rules that have S3 Replication Time Control
(S3 RTC) enabled will publish replication metrics.
To audit and log actions taken on your S3 resources, you can use Amazon S3 server access logging, AWS CloudTrail logs, or a combination of both.
AWS recommend that you use AWS CloudTrail for logging bucket and object-level actions for your Amazon S3 resources.
Server access logging provides detailed records for the requests that are made to a bucket.
You must not set the bucket being logged to be the destination for the logs as this creates a logging loop and the bucket will grow
exponentially.
S3 performance guidelines
AWS provides some performance guidelines for Amazon S3. These are summarized here:
Measure Performance – When optimizing performance, look at network throughput, CPU, and DRAM requirements. Depending on the mix
of demands for these different resources, it might be worth evaluating different Amazon EC2 instance types.
Scale Storage Connections Horizontally – You can achieve the best performance by issuing multiple concurrent requests to Amazon S3.
Spread these requests over separate connections to maximize the accessible bandwidth from Amazon S3.
Use Byte-Range Fetches – Using the Range HTTP header in a GET Object request, you can fetch a byte-range from an object, transferring
only the specified portion. You can use concurrent connections to Amazon S3 to fetch different byte ranges from within the same object. This
helps you achieve higher aggregate throughput versus a single whole-object request. Fetching smaller ranges of a large object also allows
your application to improve retry times when requests are interrupted.
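Byte-range fetches use the standard Range header; a boto3 sketch that downloads only the first 8 MB of a large object (bucket and key names are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Fetch only bytes 0 through 8 MB - 1 of the object.
resp = s3.get_object(
    Bucket="my-example-bucket",
    Key="large-dataset.bin",
    Range="bytes=0-8388607",
)
chunk = resp["Body"].read()
# Several such ranged requests can run on parallel connections
# to increase aggregate throughput.
```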
Retry Requests for Latency-Sensitive Applications – Aggressive timeouts and retries help drive consistent latency. Given the large scale
of Amazon S3, if the first request is slow, a retried request is likely to take a different path and quickly succeed. The AWS SDKs have
configurable timeout and retry values that you can tune to the tolerances of your specific application.
Combine Amazon S3 (Storage) and Amazon EC2 (Compute) in the Same AWS Region – Although S3 bucket names are globally unique,
each bucket is stored in a Region that you select when you create the bucket. To optimize performance, we recommend that you access the
bucket from Amazon EC2 instances in the same AWS Region when possible. This helps reduce network latency and data transfer costs.
Use Amazon S3 Transfer Acceleration to Minimize Latency Caused by Distance – Amazon S3 Transfer Acceleration manages fast,
easy, and secure transfers of files over long geographic distances between the client and an S3 bucket. Transfer Acceleration takes
advantage of the globally distributed edge locations in Amazon CloudFront. As the data arrives at an edge location, it is routed to Amazon S3
over an optimized network path. Transfer Acceleration is ideal for transferring gigabytes to terabytes of data regularly across continents. It’s
also useful for clients that upload to a centralized bucket from all over the world.
Glacier
Glacier is an archiving storage solution for infrequently accessed data.
The key difference between S3 Glacier and S3 Glacier Deep Archive is that Deep Archive is lower cost, but retrieval times are much longer (up to 12 hours).
The S3 Glacier tier has configurable retrieval times from minutes to hours (you pay accordingly).
Archived objects are not available for real time access and you need to submit a retrieval request.
Glacier must complete a job before you can get its output.
Requested archival data is copied to S3 One Zone-IA.
You cannot specify Glacier as the storage class at the time you create an object.
Glacier automatically encrypts data at rest using AES 256 symmetric keys and supports secure transfer of data over SSL.
Glacier does not archive object metadata; you need to maintain a client-side database to maintain this information.
Glacier file archives from 100MB up to 40TB can be uploaded to Glacier using the multipart upload API.
You can upload data to Glacier using the CLI, SDKs or APIs – you cannot use the AWS Console.
Glacier adds 32-40KB (indexing and archive metadata) to each object when transitioning from other classes using lifecycle policies.
AWS recommends that if you have lots of small objects they are combined in an archive (e.g. zip file) before uploading.
Glacier archive IDs are added upon upload and are unique for each upload.
Archive retrieval options are Expedited (1–5 minutes), Standard (3–5 hours), and Bulk (5–12 hours).
When data is retrieved it is copied to S3 and the archive remains in Glacier and the storage class therefore does not change.
AWS SNS can send notifications when retrieval jobs are complete.
To retrieve specific objects within an archive you can specify the byte range (Range) in the HTTP GET request (need to maintain a DB of byte
ranges).
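For objects archived through the S3 Glacier storage class, a retrieval request is submitted with the RestoreObject API, specifying how long the temporary copy should stay available and which retrieval tier to use. A boto3 sketch with placeholder names:

```python
import boto3

s3 = boto3.client("s3")

# Request a temporary copy of an archived object for 7 days
# using the Standard retrieval tier (Expedited and Bulk are also available).
s3.restore_object(
    Bucket="my-example-bucket",
    Key="archives/2015-logs.tar.gz",
    RestoreRequest={
        "Days": 7,
        "GlacierJobParameters": {"Tier": "Standard"},
    },
)

# head_object reports the restore status while the job is in progress.
status = s3.head_object(Bucket="my-example-bucket", Key="archives/2015-logs.tar.gz")
print(status.get("Restore"))
```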
Glacier Charges:
There is no charge for data transfer between EC2 and Glacier in the same region.