AMAZON S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.

Storage classes

Amazon S3 offers a range of storage classes designed for different use cases:

  • S3 Standard for general-purpose storage of frequently accessed data
  • S3 Intelligent-Tiering for data with unknown or changing access patterns
  • S3 Standard-Infrequent Access (S3 Standard-IA) and S3 One Zone-Infrequent Access (S3 One Zone-IA) for long-lived, but less frequently accessed data
  • Amazon S3 Glacier (S3 Glacier) and Amazon S3 Glacier Deep Archive (S3 Glacier Deep Archive) for long-term archive and digital preservation

Your choice of Amazon S3 storage classes

Cost factors

The main Amazon S3 cost factors are:

  • Storage per gigabyte (GB)
    • Storage overhead charges – for every object stored in S3 Glacier or S3 Glacier Deep Archive, Amazon S3 stores 8 KB of metadata in S3 Standard to support list operations, plus 32 KB of index data in Glacier that is billed at the rate of your Glacier tier.
  • Requests – the number of API calls, such as PUT and GET requests.
  • Retrievals – measured in gigabytes (GB) and applicable to some storage classes.
  • Monitoring and automation charges for S3 Intelligent-Tiering.
  • Early delete charges for some storage classes.
  • Data transfer costs – data transfer in is usually free; data transfer out is charged differently depending on the requester's location and the transfer medium.
  • Management features such as analytics, batch operations, inventory, and storage replication, each priced separately.
  • Minimum object size – for S3 Standard-IA and S3 One Zone-IA, an object smaller than 128 KB is still billed as 128 KB. S3 Intelligent-Tiering has no minimum billable object size, but objects smaller than 128 KB are never moved between tiers.

Refer to the Amazon S3 pricing page for more details.

Storage overhead charges

When you transition objects to the GLACIER or DEEP_ARCHIVE storage class, a fixed amount of storage is added to each object to accommodate metadata for managing the object:

  • For each object archived to GLACIER or DEEP_ARCHIVE, Amazon S3 uses 8 KB of storage for the name of the object and other metadata. Amazon S3 stores this metadata so that you can get a real-time list of your archived objects by using the Amazon S3 API. For more information, see Get Bucket (List Objects). You are charged Amazon S3 STANDARD rates for this additional storage.
  • For each object that is archived to GLACIER or DEEP_ARCHIVE, Amazon S3 adds 32 KB of storage for index and related metadata. This extra data is necessary to identify and restore your object. You are charged GLACIER or DEEP_ARCHIVE rates for this additional storage.

Amazon S3 Glacier storage cost

As the Amazon S3 Glacier storage cost table above illustrates, it is not always a good idea to move files to Amazon S3 Glacier: archiving only starts to pay off for files larger than roughly 1000 KB.

You can calculate the potential savings for your own case using this Excel file.
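
If you prefer code to a spreadsheet, the sketch below runs the same kind of comparison in Python. All prices and the per-transition request cost are illustrative placeholders, not current AWS rates (check the Amazon S3 pricing page for your region); the 8 KB and 32 KB per-object overheads are the ones described above. Retrieval charges are ignored, so add them to the Glacier side if you expect to restore the data.

    # Rough per-object break-even sketch: S3 Standard vs S3 Glacier.
    # Prices below are placeholders; replace them with the rates for your region.
    KB_PER_GB = 1024 * 1024

    STANDARD_PER_GB_MONTH = 0.023      # assumed S3 Standard storage price (USD)
    GLACIER_PER_GB_MONTH = 0.004       # assumed S3 Glacier storage price (USD)
    TRANSITION_REQUEST_COST = 0.00005  # assumed cost of one lifecycle transition request

    def monthly_cost_standard(size_kb):
        return size_kb / KB_PER_GB * STANDARD_PER_GB_MONTH

    def monthly_cost_glacier(size_kb, months_retained):
        storage = (size_kb + 32) / KB_PER_GB * GLACIER_PER_GB_MONTH  # 32 KB Glacier index
        metadata = 8 / KB_PER_GB * STANDARD_PER_GB_MONTH             # 8 KB billed at Standard rates
        request = TRANSITION_REQUEST_COST / months_retained          # one-time fee, amortized
        return storage + metadata + request

    for size_kb in (8, 128, 1024, 10 * 1024):
        std = monthly_cost_standard(size_kb)
        gla = monthly_cost_glacier(size_kb, months_retained=12)
        verdict = "archive" if gla < std else "keep in Standard"
        print(f"{size_kb:>6} KB: Standard ${std:.8f}/month, Glacier ${gla:.8f}/month -> {verdict}")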

Tip #1: Leverage Intelligent Tiering

Amazon S3 Intelligent-Tiering is a storage class designed for customers who want to optimize storage costs automatically when data access patterns change, without performance impact or operational overhead. S3 Intelligent-Tiering is the first cloud object storage class that delivers automatic cost savings by moving data between two access tiers — frequent access and infrequent access — when access patterns change, and is ideal for data with unknown or changing access patterns.

It is the ideal storage class for long-lived data with access patterns that are unknown or unpredictable.
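
As a minimal sketch (bucket and key names are hypothetical), you can write objects directly into Intelligent-Tiering with boto3 by setting the storage class at upload time:

    import boto3

    s3 = boto3.client("s3")

    # Upload an object straight into S3 Intelligent-Tiering; S3 then moves it
    # between the frequent and infrequent access tiers automatically.
    with open("usage.csv", "rb") as body:
        s3.put_object(
            Bucket="my-example-bucket",
            Key="reports/2021/usage.csv",
            Body=body,
            StorageClass="INTELLIGENT_TIERING",
        )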

Tip #2: Implement lifecycle policies

If you know your data access pattern, you can use lifecycle policies to define actions that you want Amazon S3 to take during an object’s lifetime. For example, transition objects to another storage class, archive them, or delete them after a specified period of time.

You can define a lifecycle policy for all objects or a subset of objects in a bucket by using a shared prefix (object names that begin with a common string) or a tag:

How do I create a lifecycle policy for an S3 Bucket?
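
For example, here is a minimal boto3 sketch (the bucket name, prefix, and day counts are hypothetical) that transitions objects under a prefix to Standard-IA, then to Glacier, and finally deletes them:

    import boto3

    s3 = boto3.client("s3")

    # Transition "logs/" objects to Standard-IA after 30 days, to Glacier after
    # 90 days, and delete them after one year. Note that this call replaces the
    # bucket's entire lifecycle configuration.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-example-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-then-expire-logs",
                    "Filter": {"Prefix": "logs/"},
                    "Status": "Enabled",
                    "Transitions": [
                        {"Days": 30, "StorageClass": "STANDARD_IA"},
                        {"Days": 90, "StorageClass": "GLACIER"},
                    ],
                    "Expiration": {"Days": 365},
                }
            ]
        },
    )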

Tip #3: Delete old versions of objects in versioned buckets

After you enable object versioning for an S3 bucket, successive uploads or PUTs of a particular object will create distinct, named, individually addressable versions of the object in order to provide you with protection against overwrites and deletes. You can preserve, retrieve, and restore every version of every object in an S3 bucket that has versioning enabled.

Old versions of objects still use storage space. You may want to automate the deletion of old, no-longer-relevant object versions using Amazon S3 lifecycle policies (see Tip #2).
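
A minimal sketch of such a rule with boto3 (the bucket name and retention period are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    # Permanently delete noncurrent object versions 30 days after they are
    # replaced by a newer version. This call replaces any existing lifecycle
    # configuration on the bucket, so merge it with your other rules.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-versioned-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-old-versions",
                    "Filter": {"Prefix": ""},  # whole bucket
                    "Status": "Enabled",
                    "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
                }
            ]
        },
    )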

Tip #4: Clean incomplete multipart uploads

Amazon S3’s multipart upload accelerates the uploading of large objects by allowing you to split them up into logical parts that can be uploaded in parallel. If you initiate a multipart upload but never finish it, the in-progress upload occupies some storage space and will incur storage charges. However, these uploads are not visible when you list the contents of a bucket.

You can leverage lifecycle policies to automatically clean up incomplete multipart uploads.
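
A minimal sketch with boto3 (the bucket name and the 7-day window are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    # Abort multipart uploads that are still incomplete 7 days after they were
    # initiated, so the already-uploaded parts stop accruing storage charges.
    # As above, this call replaces the bucket's existing lifecycle configuration.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-example-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "abort-stale-multipart-uploads",
                    "Filter": {"Prefix": ""},
                    "Status": "Enabled",
                    "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
                }
            ]
        },
    )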

Tip #5: Know your data access patterns

If you know your data access pattern – for instance, in a billing and invoicing system, PDF invoices are likely to be accessed in the first 30 days after publication and rarely afterward – you can leverage lifecycle policies to automatically move objects from the STANDARD storage class to STANDARD_IA (IA, for Infrequent Access).

The IA storage class provides the same API and performance as regular S3 storage. IA is approximately four times cheaper than S3 Standard storage ($0.007/GB-month vs $0.03/GB-month), but you pay for retrieval ($0.01/GB). Retrieval is free with the S3 Standard storage class.

IA is a great candidate for disaster recovery backups. It can make sense to upload any object over 128 KB directly to IA and save around 60% on storage over a year without losing availability or durability of the data.

If you don’t know your data access patterns, you can use Amazon S3 Storage Class Analysis. This analytics feature observes data access patterns to help you decide when to transition less frequently accessed STANDARD storage to the STANDARD_IA storage class.
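
As a sketch, Storage Class Analysis can be enabled on a prefix with boto3 (bucket names and prefixes are hypothetical); the daily report is exported as CSV to a second bucket:

    import boto3

    s3 = boto3.client("s3")

    # Watch access patterns under "invoices/" and export a daily CSV report
    # to a separate reporting bucket.
    s3.put_bucket_analytics_configuration(
        Bucket="my-example-bucket",
        Id="invoices-access-analysis",
        AnalyticsConfiguration={
            "Id": "invoices-access-analysis",
            "Filter": {"Prefix": "invoices/"},
            "StorageClassAnalysis": {
                "DataExport": {
                    "OutputSchemaVersion": "V_1",
                    "Destination": {
                        "S3BucketDestination": {
                            "Format": "CSV",
                            "Bucket": "arn:aws:s3:::my-reports-bucket",
                            "Prefix": "storage-class-analysis/",
                        }
                    },
                }
            },
        },
    )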

You can also leverage the dashboard provided by Amazon S3 Storage Lens to get a holistic view of your buckets.

Amazon S3 Storage Lens Dashboard

Tip #6: Archive old files

Amazon S3 Glacier and S3 Glacier Deep Archive are designed to be the lowest cost Amazon S3 storage classes, allowing you to archive large amounts of data at a very low cost. They are designed for use cases where data is retained for months, years, or decades.

You can upload objects to Amazon S3 Glacier directly or leverage lifecycle policies to automatically archive old, rarely accessed objects.

Trade-offs: Objects that have been archived must be restored before being used again.
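
A restore can be triggered with boto3, for example (the bucket, key, retrieval tier, and duration are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    # Ask S3 to restore an archived object and keep the restored copy
    # available for 7 days, using the Standard retrieval tier.
    s3.restore_object(
        Bucket="my-example-bucket",
        Key="archives/2018/report.pdf",
        RestoreRequest={
            "Days": 7,
            "GlacierJobParameters": {"Tier": "Standard"},
        },
    )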

DO NOT archive small objects (smaller than 1 MB). Amazon S3 Glacier bills for both data and requests. When you have small files, you tend to have thousands if not millions of them, and archiving them individually may lead to unexpected charges. Prefer archiving compressed bundles (multiple files combined into one object).

Tip #7: Delete objects that are no longer relevant

With Amazon S3 lifecycle policies, you can also automate the deletion of objects that are no longer relevant. For instance, you might not need your development database backups for more than 7 days.
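
This is the same lifecycle mechanism as in Tip #2, only with an expiration action and, in this sketch, a tag-based filter (the bucket name, tag, and retention period are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    # Delete objects tagged as development database backups 7 days after
    # creation. This call replaces the bucket's existing lifecycle configuration.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-backup-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-dev-db-backups",
                    "Filter": {"Tag": {"Key": "environment", "Value": "dev"}},
                    "Status": "Enabled",
                    "Expiration": {"Days": 7},
                }
            ]
        },
    )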

Tip #8: Consider deleting unused files which can be recreated

Sometimes, it might be cheaper to generate files on the fly. For example, if you need several resolutions of image thumbnails that are rarely accessed, it may make sense to keep only the original images, recreate the other resolutions when they are requested, and cache them on a content delivery network.

You’ll find an architecture allowing this in the Resize Images on the Fly with Amazon S3, AWS Lambda, and Amazon API Gateway blog post.

Tip #9: Consider batching objects

A lot of tiny objects can get very expensive very quickly, so it makes sense to batch them. If you always upload and download a set of objects at the same time, it is best practice to store them as a single file (for example, using tar). Design your system to avoid a huge number of small files; some form of clustering that prevents small files is usually a good pattern.
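
A minimal sketch of the batching idea (paths, bucket, and key are hypothetical): bundle a directory of small files into one compressed tar archive and upload it as a single object.

    import tarfile

    import boto3

    s3 = boto3.client("s3")

    # Bundle one day's worth of small log files into a single compressed
    # archive, then upload it as one object instead of thousands of PUTs.
    with tarfile.open("logs-2021-06-01.tar.gz", "w:gz") as tar:
        tar.add("logs/2021-06-01/", arcname="2021-06-01")

    s3.upload_file(
        "logs-2021-06-01.tar.gz",
        "my-example-bucket",
        "log-archives/logs-2021-06-01.tar.gz",
    )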
