S3 is Object-Based Storage

S3 manages data as objects rather than as file systems or data blocks.

  • Upload any file type you can think of to S3.
  • Examples include photos, videos, code, documents, and text files.
  • Cannot be used to run an OS or DB.

S3 Basics

  • Unlimited storage. The total amount of data and the number of objects you can store is unlimited.
  • Objects up to 5 TB in Size. S3 objects can range in size from a minimum of 0 bytes to a maximum of 5 TB.
  • S3 Buckets. Store files in buckets (similar to folders).
  • Tiered Storage. Offers a range of storage classes designed for different use cases.
  • Lifecycle Management. Define rules to automatically transition objects to a cheaper tier or delete objects that are no longer required after a set period of time.
  • Versioning. All versions of an object are stored and can be retrieved, including deleted objects. Once enabled, versioning cannot be disabled, only suspended. Supports MFA Delete, so a second authentication factor is required in order to delete an object version. If you enable public access to a versioned object, its old versions do not become publicly accessible.

Working with S3 Buckets

Universal Namespace. All AWS accounts share the S3 namespace. Each S3 bucket name is globally unique.

Example S3 URLs

https://bucket-name.s3.Region.amazonaws.com/key-name
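
A minimal sketch of how a URL in this form can be assembled. The function name s3_object_url and the sample bucket, Region, and key are illustrative assumptions, not part of any AWS SDK.

```python
from urllib.parse import quote


def s3_object_url(bucket: str, region: str, key: str) -> str:
    """Build a virtual-hosted-style URL for an object (hypothetical helper)."""
    # quote() percent-encodes characters that are unsafe in a URL path;
    # "/" is kept so keys that look like folder paths stay readable.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


print(s3_object_url("bucket-name", "us-east-1", "photos/Toba.jpg"))
# https://bucket-name.s3.us-east-1.amazonaws.com/photos/Toba.jpg
```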

Key-Value Store

  • Key: the name of the object (e.g., Toba.jpg)
  • Value: the data itself, which is made up of a sequence of bytes
  • Version ID: important for storing multiple versions of the same object
  • Metadata: data about the data you are storing (e.g., content-type, last-modified, etc.)

Highly Available and Highly Durable

  • built for 99.5% - 99.99% service availability, depending on the S3 tier.
  • designed for 99.999999999% (11 9’s) durability for data stored in S3.
  • data is stored redundantly across multiple devices in multiple facilities

Securing your data 🔒

Buckets are private by default. You have to apply public access on both the bucket and its objects in order to make the bucket public.

  1. Encryption 🔑

1.1 Encryption in Transit

  • SSL/TLS
  • HTTPS

1.2 Encryption at Rest: Server-Side Encryption

  • SSE-S3 : S3-managed keys, using AES 256-bit encryption
  • SSE-KMS : AWS Key Management Service-managed keys
  • SSE-C : Customer-provided keys

1.3 Encryption at Rest: Client-Side Encryption

  • encrypt your files yourself before uploading to S3

  2. Access Control Lists (ACLs). Define which AWS accounts or groups are granted access and the type of access. You can attach S3 ACLs to individual objects within a bucket.
  3. Bucket Policies. S3 bucket policies specify what actions are allowed or denied (e.g., allow user Alice to PUT but not DELETE objects in the bucket). Bucket policies work at the level of an entire bucket.
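
The Alice example can be sketched as a policy document. This is an illustration only: the account ID, user name, and bucket name are hypothetical, and the explicit Deny is one possible way to express "not DELETE" (simply not granting s3:DeleteObject would also work).

```python
import json

# Hypothetical principal and bucket, used only for illustration.
ALICE_ARN = "arn:aws:iam::111122223333:user/Alice"
BUCKET_ARN = "arn:aws:s3:::example-bucket"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Alice may upload objects anywhere in the bucket.
            "Sid": "AllowAlicePut",
            "Effect": "Allow",
            "Principal": {"AWS": ALICE_ARN},
            "Action": "s3:PutObject",
            "Resource": f"{BUCKET_ARN}/*",
        },
        {   # An explicit Deny overrides any Allow granted elsewhere.
            "Sid": "DenyAliceDelete",
            "Effect": "Deny",
            "Principal": {"AWS": ALICE_ARN},
            "Action": "s3:DeleteObject",
            "Resource": f"{BUCKET_ARN}/*",
        },
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```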

Enforcing Server-Side Encryption

1️⃣ Console. Select encryption settings in S3 Bucket.

2️⃣ Bucket Policy.

x-amz-server-side-encryption: AES256 - (SSE-S3 - S3 managed keys)

x-amz-server-side-encryption: aws:kms - (SSE-KMS - KMS managed keys)

💡 You can create a bucket policy that denies any S3 PUT request that doesn’t include the x-amz-server-side-encryption parameter in the request header.
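
A sketch of such a policy, built as a Python dict so it can be serialized with json.dumps. The bucket name and the helper deny_unencrypted_puts are assumptions for illustration; the Null condition on s3:x-amz-server-side-encryption matches requests where the header is absent.

```python
import json


def deny_unencrypted_puts(bucket: str) -> dict:
    """Return a bucket policy that rejects PUTs lacking the SSE header."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyPutWithoutSSEHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # "Null": "true" is satisfied when the key is missing from the request.
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            }
        ],
    }


# "example-bucket" is a hypothetical bucket name.
print(json.dumps(deny_unencrypted_puts("example-bucket"), indent=2))
```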

PUT request

PUT /my-image.jpg HTTP/1.1
Host: myBucket.s3.<Region>.amazonaws.com
Date: Wed, 12 Oct 2009 17:50:00 GMT
Authorization: authorization string
Content-Type: text/plain
Content-Length: 11434
x-amz-meta-author: Janet
Expect: 100-continue
x-amz-server-side-encryption: AES256
[11434 bytes of object data]

Static Websites on S3

S3 scales automatically to meet demand. Many enterprises put static websites on S3 when they expect a large number of requests.


S3 Storage Classes

S3 Standard

  • data is stored redundantly across multiple devices in multiple facilities (>= 3 AZs)
  • designed for frequent access
  • suitable for most workloads: the default storage class; use cases include websites, content distribution, mobile and gaming applications, and big data analytics.
  • 99.99% availability
  • 99.999999999% (11 9’s) durability

S3 Standard-Infrequent Access (S3 Standard-IA)

Use Case: great for long-term storage, backups, and as a data store for disaster recovery files

  • rapid access: used for data that is accessed less frequently but requires rapid access when needed.
  • you pay to access data: there is a low per-GB storage price and a per-GB retrieval fee.
  • 99.9% availability
  • 99.999999999% (11 9’s) durability

S3 One Zone Infrequent Access

Use Case: good for long-lived, infrequently accessed, non-critical data

Like S3 Standard-IA, but data is stored redundantly within a single AZ.

  • costs 20% less than regular S3 Standard-IA
  • 99.5% availability
  • 99.999999999% (11 9’s) durability

S3 Intelligent-Tiering

Use Case: If you don’t know whether you’ll be accessing data frequently or infrequently

💵 optimization: monthly fee of $0.0025 per 1,000 objects

  • automatically moves your data to the most cost-effective tier based on how frequently you access each object.
  • 99.9% availability
  • 99.999999999% (11 9’s) durability

3 Glacier Options

  • you pay each time you access data
  • use only for archiving data
  • glacier is cheap storage
  • optimized for data that is very infrequently accessed
  • 99.99% availability
  • 99.999999999% (11 9’s) durability

Option 1: Glacier Instant Retrieval

provides long-term data archiving with instant retrieval time for your data.

Option 2: Glacier Flexible Retrieval

ideal storage class for archive data that does not require immediate access but needs the flexibility to retrieve large sets of data at no cost, such as backup or disaster recovery use cases. retrieval time can be minutes or up to 12 hours.

Option 3: Glacier Deep Archive

cheapest storage class, designed for customers that retain data sets for 7-10 years or longer to meet customer needs and regulatory compliance requirements. the standard retrieval time is 12 hours and the bulk retrieval time is 48 hours.

Performance across the S3 Storage Classes

| | S3 Standard | S3 Intelligent-Tiering | S3 Standard-IA | S3 One Zone-IA | S3 Glacier Instant Retrieval | S3 Glacier Flexible Retrieval | S3 Glacier Deep Archive |
|---|---|---|---|---|---|---|---|
| Designed for Durability | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) |
| Designed for Availability | 99.99% | 99.90% | 99.90% | 99.50% | 99.99% | 99.99% | 99.99% |
| Availability SLA | 99.90% | 99% | 99% | 99% | 99.90% | 99.90% | 99.90% |
| AZs | >=3 | >=3 | >=3 | 1 | >=3 | >=3 | >=3 |
| Min Capacity Charge per Object | n/a | n/a | 128 KB | 128 KB | 128 KB | 40 KB | 40 KB |
| Min Storage Duration Charge | n/a | 30 days | 30 days | 30 days | 90 days | 90 days | 180 days |
| Retrieval Fee | n/a | n/a | Per GB retrieved | Per GB retrieved | Per GB retrieved | Per GB retrieved | Per GB retrieved |
| Storage Type | Object | Object | Object | Object | Object | Object | Object |
| Lifecycle Transitions | Yes | Yes | Yes | Yes | Yes | Yes | Yes |

Storage Classes - Costs 💵

S3 Standard

general-purpose for any type of data, typically used for frequent access

First 50 TB / Month: $0.023 per GB
Next 450 TB / Month: $0.022 per GB
Over 500 TB / Month: $0.021 per GB
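
The tiered S3 Standard prices above can be turned into a monthly estimate. A rough sketch under two assumptions that are not in the notes: 1 TB is treated as 1,024 GB, and only storage is counted (no request or transfer charges). The name standard_monthly_cost is illustrative.

```python
from decimal import Decimal

GB_PER_TB = 1024  # assumption: binary terabytes

# (tier size in GB, price per GB-month in USD); None means "everything above".
TIERS = [
    (50 * GB_PER_TB, Decimal("0.023")),
    (450 * GB_PER_TB, Decimal("0.022")),
    (None, Decimal("0.021")),
]


def standard_monthly_cost(stored_gb: int) -> Decimal:
    """Estimate the S3 Standard storage charge for one month."""
    remaining = Decimal(stored_gb)
    total = Decimal("0")
    for size, price in TIERS:
        if remaining <= 0:
            break
        # Bill as much as fits in this tier, then carry the rest to the next one.
        in_tier = remaining if size is None else min(remaining, Decimal(size))
        total += in_tier * price
        remaining -= in_tier
    return total


# 60 TB: the first 50 TB at $0.023 and the remaining 10 TB at $0.022.
print(standard_monthly_cost(60 * GB_PER_TB))
```

Decimal is used instead of float so that the sums of small per-GB prices stay exact.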

S3 Intelligent-Tiering

cost saving strategies applied for data with unknown or changing access patterns

no min storage duration

First 50 TB / Month: $0.023 per GB
Next 450 TB / Month: $0.022 per GB
Over 500 TB / Month: $0.021 per GB
All Storage / Month: $0.0025 per GB
Monitoring and Automation, All Storage / Month: $0.0025 per 1,000 Objects

S3 Standard Infrequent Access

long-lived but infrequently accessed data (once a month) that needs millisecond access

30 days min storage duration

All Storage / Month: $0.0125 per GB

S3 One Zone-Infrequent Access

re-creatable infrequently accessed data that needs millisecond access

30 days min storage duration

All Storage / Month: $0.01 per GB

S3 Glacier Instant Retrieval

long-lived archived data accessed once a quarter with instant retrieval in milliseconds

All Storage / Month: $0.004 per GB

S3 Glacier Flexible Retrieval

archived data that does not require immediate access but needs the flexibility to retrieve large sets of data at no cost

retrieval time: can be minutes or up to 12 hours

90 days min storage duration

All Storage / Month: $0.0036 per GB

S3 Glacier Deep Archive

designed to retain data sets for 7-10 years or longer

retrieval time: 12 hours standard, 48 hours bulk

180 days min storage duration

All Storage / Month: $0.00099 per GB

Lifecycle Management

  • automate moving objects between the different storage tiers, thereby maximizing cost effectiveness.
  • can be used in conjunction with versioning
  • can be applied to current and noncurrent versions

Example: S3 Standard: keep for 30 days -> S3 IA: after 30 days -> Glacier: After 90 days
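
The example above could be expressed as a lifecycle configuration. A minimal sketch, assuming the rule shape accepted by boto3's put_bucket_lifecycle_configuration; the rule ID is hypothetical, and storage_class_at is an illustrative helper for reasoning about the rule, not an S3 API.

```python
# "GLACIER" is the API name of the Glacier Flexible Retrieval storage class.
lifecycle = {
    "Rules": [
        {
            "ID": "standard-to-ia-to-glacier",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # empty prefix: apply to every object
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }
    ]
}


def storage_class_at(age_days: int) -> str:
    """Storage class an object would be in after age_days under the rule above."""
    current = "STANDARD"
    transitions = lifecycle["Rules"][0]["Transitions"]
    for step in sorted(transitions, key=lambda t: t["Days"]):
        if age_days >= step["Days"]:
            current = step["StorageClass"]
    return current


for age in (10, 45, 120):
    print(age, storage_class_at(age))
```

The helper treats a transition as taking effect exactly on the configured day, which is a simplification of how S3 schedules lifecycle actions.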

S3 Object Lock 🔒

  • you can use S3 Object Lock to store objects using a write once, read many (WORM) model. It can help to prevent objects from being deleted or modified for a fixed amount of time or indefinitely.
  • object lock can be on individual objects or applied across the bucket as a whole
  • object lock comes in two modes: Governance Mode, Compliance Mode

S3 Object Lock Modes

Governance Mode

Users can’t overwrite or delete an object version or alter its lock settings unless they have special permissions.

Compliance Mode

A protected object version can’t be overwritten or deleted by any user, including the root user in your AWS account. The retention mode of the object can’t be changed and its retention period can’t be shortened. Compliance mode ensures an object version can’t be overwritten or deleted for the duration of the retention period.

Retention Periods 🕥

A retention period protects an object version for a fixed amount of time. When you place a retention period on an object version, Amazon S3 stores a timestamp in the object version’s metadata to indicate when the retention period expires.

💡 After the retention period expires, the object version can be overwritten or deleted unless you also placed a legal hold on the object version.

Like a retention period, a legal hold prevents an object version from being overwritten or deleted.

However, a legal hold doesn’t have an associated retention period and remains in effect until removed.

A Legal Hold can be placed and removed by any user who has s3:PutObjectLegalHold permission.

Glacier Vault Lock

Allows you to deploy and enforce compliance controls for individual S3 Glacier vaults with a vault lock policy.

You can specify controls, such as WORM, in a vault lock policy.

🔴 once locked, the policy can’t be changed


Optimizing S3 Performance

S3 Prefixes

A prefix is the chain of folders inside your bucket. It doesn’t include the object name.

bucketname/folder1/subfolder1/file.md -> prefix: /folder1/subfolder1/
bucketname/folder2/subfolder1/file.md -> prefix: /folder2/subfolder1/
bucketname/folder3/file.md -> prefix: /folder3/
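
The mapping above can be sketched in a few lines. prefix_of is a hypothetical helper that follows the notation used in these notes (leading and trailing slash); it is not an S3 API.

```python
def prefix_of(path: str) -> str:
    """Return the prefix of a 'bucketname/folders/object' path."""
    # Drop the bucket name: everything after the first "/" is the object key.
    _bucket, _, key = path.partition("/")
    # Drop the object name: everything before the last "/" is the folder chain.
    folders, sep, _object_name = key.rpartition("/")
    return f"/{folders}/" if sep else "/"


print(prefix_of("bucketname/folder1/subfolder1/file.md"))
# /folder1/subfolder1/
```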

S3 Performance

S3 has low latency. You can get the first byte out of S3 within 100-200 milliseconds.

You can achieve a high number of requests per second, per prefix:

  • 3,500 PUT/COPY/POST/DELETE
  • 5,500 GET/HEAD

💡 the more prefixes you have in the bucket, the higher the performance you can get. for example, 2 prefixes give you 11,000 GET requests per second; 4 prefixes give you 22,000; etc.
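
The arithmetic behind this tip, as a small sketch. It assumes requests are spread evenly across prefixes; max_rps is an illustrative name.

```python
WRITE_RPS_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE
READ_RPS_PER_PREFIX = 5_500   # GET/HEAD


def max_rps(prefixes: int) -> dict:
    """Upper bound on requests per second when load is spread over prefixes."""
    if prefixes < 1:
        raise ValueError("at least one prefix is required")
    return {
        "write": prefixes * WRITE_RPS_PER_PREFIX,
        "read": prefixes * READ_RPS_PER_PREFIX,
    }


print(max_rps(2))
# {'write': 7000, 'read': 11000}
```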

Limitations with KMS

🟠 SSE-KMS: uploading a file calls GenerateDataKey in the KMS API. -> counts against the KMS request quota (Region-specific; an increase has to be requested through Service Quotas)

🟠 SSE-KMS: downloading a file calls Decrypt in the KMS API. -> counts against the same KMS request quota

if the KMS quota becomes a bottleneck, it is better to use native S3 encryption (SSE-S3) rather than KMS

Uploads

🟢 Multipart Uploads

  • recommended for files over 100 MB
  • required for files over 5 GB
  • parallelize uploads (increase efficiency)
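
A sketch of how a large file could be split into parts before uploading them in parallel. The 100 MiB default part size is an arbitrary choice for illustration, plan_parts is a hypothetical helper, and the 5 MiB minimum part size and 10,000-part ceiling are S3 multipart limits that are not stated in these notes.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # every part except the last must be at least this big
MAX_PARTS = 10_000        # maximum number of parts in one multipart upload


def plan_parts(total_size: int, part_size: int = 100 * MIB) -> list[tuple[int, int, int]]:
    """Return (part_number, offset, length) tuples covering total_size bytes."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    parts = []
    offset = 0
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((len(parts) + 1, offset, length))  # part numbers start at 1
        offset += length
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return parts


for number, offset, length in plan_parts(250 * MIB):
    print(number, offset, length)
```

Each tuple can then be handed to a separate worker, which is where the parallelism comes from.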

Downloads

🟢 S3 Byte-Range Fetches

  • parallelize downloads by specifying byte ranges.
  • if there is a failure in the download, it’s only for a specific byte range.
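
Byte ranges are requested with the standard HTTP Range header. A minimal sketch of generating those header values; range_headers is an illustrative helper and the sizes are made up.

```python
def range_headers(total_size: int, chunk_size: int) -> list[str]:
    """Return HTTP Range header values ("bytes=start-end", end inclusive)."""
    if total_size <= 0 or chunk_size <= 0:
        raise ValueError("sizes must be positive")
    return [
        f"bytes={start}-{min(start + chunk_size, total_size) - 1}"
        for start in range(0, total_size, chunk_size)
    ]


print(range_headers(2500, 1000))
# ['bytes=0-999', 'bytes=1000-1999', 'bytes=2000-2499']
```

If one range fails, only that header value needs to be requested again.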

Backup Data with S3 Replication

  • you can replicate objects from one bucket to another. versioning must be enabled on both the source and destination buckets
  • objects in an existing bucket are not replicated automatically
  • delete markers are not replicated by default