S3 is Object-Based Storage
Manages data as objects rather than as file systems or data blocks.
- Upload any file type you can think of to S3.
- Examples include photos, videos, code, documents, and text files.
- Cannot be used to run an OS or DB.
S3 Basics
- Unlimited storage. The total amount of data and the number of objects you can store is unlimited.
- Objects up to 5 TB in Size. S3 objects can range in size from a minimum of 0 bytes to a maximum of 5 TB.
- S3 Buckets. Store files in buckets (similar to folders).
- Tiered Storage. Offers a range of storage classes designed for different use cases.
- Lifecycle Management. Define rules that automatically transition objects to a cheaper tier, or delete objects that are no longer required, after a set period of time.
- Versioning. All versions of an object are stored and can be retrieved, including deleted objects. Once enabled, versioning cannot be disabled, only suspended. Supports MFA Delete, so a second authentication factor is required to delete an object version. If you enable public access to a versioned object, its old versions are not made publicly accessible.
Working with S3 Buckets
Universal Namespace. All AWS accounts share the S3 namespace. Each S3 bucket name is globally unique.
Example S3 URLs
https://bucket-name.s3.Region.amazonaws.com/key-name
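As a small illustration of the URL pattern above, the following sketch assembles an object URL from its parts. The function name `s3_object_url` and the sample Region `eu-west-1` are hypothetical, chosen only for the example; the sketch does not check whether the bucket or object exists.

```python
from urllib.parse import quote


def s3_object_url(bucket: str, region: str, key: str) -> str:
    """Build a URL of the form https://bucket-name.s3.Region.amazonaws.com/key-name."""
    # Percent-encode the key, but keep "/" so folder-like prefixes stay readable.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


print(s3_object_url("bucket-name", "eu-west-1", "photos/Toba.jpg"))
```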
Key-Value Store
Key: the name of the object (e.g., Toba.jpg)
Value: the data itself, which is made up of a sequence of bytes
Version ID: important for storing multiple versions of the same object
Metadata: data about the data you are storing (e.g., content-type, last-modified, etc.)
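To make the key / value / version ID / metadata model more concrete, here is a toy in-memory sketch. It is not how S3 is implemented; `ToyBucket`, `ObjectVersion`, and the `v1`, `v2` style version IDs are invented for illustration. It only mimics the idea that every write adds a version and that a delete adds a marker instead of erasing data.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ObjectVersion:
    version_id: str
    value: Optional[bytes]  # None stands for a delete marker
    metadata: dict = field(default_factory=dict)


class ToyBucket:
    """Toy model of a versioned key-value store (illustration only)."""

    def __init__(self):
        self._versions = {}  # key -> list of ObjectVersion, oldest first
        self._ids = itertools.count(1)

    def put(self, key, value, **metadata):
        version = ObjectVersion(f"v{next(self._ids)}", value, metadata)
        self._versions.setdefault(key, []).append(version)
        return version.version_id

    def delete(self, key):
        # A delete only adds a marker; earlier versions stay retrievable.
        return self.put(key, None)

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            return versions[-1].value if versions else None
        for version in versions:
            if version.version_id == version_id:
                return version.value
        return None


bucket = ToyBucket()
first = bucket.put("Toba.jpg", b"first upload", content_type="image/jpeg")
bucket.put("Toba.jpg", b"second upload", content_type="image/jpeg")
bucket.delete("Toba.jpg")
print(bucket.get("Toba.jpg"))         # None: the latest version is a delete marker
print(bucket.get("Toba.jpg", first))  # b'first upload'
```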
Highly Available and Highly Durable
- built for 99.5% - 99.99% service availability, depending on the S3 tier.
- designed for 99.999999999% (11 9’s) durability for data stored in S3.
- data stored redundantly across multiple devices in multiple facilities
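A rough way to read the durability figure is as an expected annual loss rate per object. The sketch below works through that arithmetic with exact fractions. Treating the figure as an independent per-object, per-year probability is a simplifying assumption made for this example, and the function name is hypothetical.

```python
from fractions import Fraction

# Assumption: 99.999999999% durability is read as the yearly chance that a given object survives.
ANNUAL_LOSS_PROBABILITY = 1 - Fraction(99_999_999_999, 100_000_000_000)


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses for a given object count."""
    expected_losses_per_year = object_count * ANNUAL_LOSS_PROBABILITY
    return 1 / expected_losses_per_year


print(years_per_lost_object(10_000_000))  # 10000
```

Under that reading, storing 10,000,000 objects corresponds to losing one object roughly every 10,000 years on average.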
Securing your data 🔒
Buckets are private by default. You have to apply public access on both the bucket and its objects in order to make the bucket public.
- Encryption 🔑
1.1 Encryption in Transit
- SSL/TLS
- HTTPS
1.2 Encryption at Rest: Server-Side Encryption
- SSE-S3 : S3-managed keys, using AES 256-bit encryption
- SSE-KMS : AWS Key Management Service-managed keys
- SSE-C : Customer-provided keys
1.3 Encryption at Rest: Client-Side Encryption
- encrypt your files yourself before uploading to S3
- Access Control Lists (ACLs). Define which AWS accounts or groups are granted access and the type of access. You can attach S3 ACLs to individual objects within a bucket.
- Bucket Policies. S3 bucket policies specify what actions are allowed or denied (e.g., allow user Alice to PUT but not DELETE objects in the bucket.) Bucket policies work on an entire bucket level.
Enforcing Server-Side Encryption
1️⃣ Console. Select encryption settings in S3 Bucket.
2️⃣ Bucket Policy.
x-amz-server-side-encryption: AES256
- (SSE-S3 - S3 managed keys)
x-amz-server-side-encryption: aws:kms
- (SSE-KMS - KMS managed keys)
💡 You can create a bucket policy that denies any S3 PUT request that doesn’t include the x-amz-server-side-encryption parameter in the request header.
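A minimal sketch of such a policy, built as a Python dictionary and serialized to JSON. The function name, the `Sid` value, and the bucket name are placeholders; the `Null` condition on `s3:x-amz-server-side-encryption` is the commonly documented way to match requests that lack the header, but verify the policy against your own bucket before relying on it.

```python
import json


def deny_unencrypted_puts_policy(bucket: str) -> str:
    """Return a bucket policy (as JSON) that denies PUTs missing the SSE header."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyPutWithoutSSEHeader",  # hypothetical statement ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # "Null": "true" matches requests where the header is absent.
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(deny_unencrypted_puts_policy("my-bucket"))
```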
PUT request
PUT /my-image.jpg HTTP/1.1
Host: myBucket.s3.<Region>.amazonaws.com
Date: Wed, 12 Oct 2009 17:50:00 GMT
Authorization: authorization string
Content-Type: text/plain
Content-Length: 11434
x-amz-meta-author: Janet
Expect: 100-continue
x-amz-server-side-encryption: AES256
[11434 bytes of object data]
Static Websites on S3
S3 scales automatically to meet demand. Many enterprises put static websites on S3 when they expect a large number of requests.
S3 Storage Classes
S3 Standard
- data is stored redundantly across multiple devices in multiple facilities (>= 3 AZs)
- designed for frequent access
- suitable for most workloads: the default storage class; use cases include websites, content distribution, mobile and gaming applications, and big data analytics.
- 99.99% availability
- 99.999999999% (11 9’s) durability
S3 Standard-Infrequent Access (S3 Standard-IA)
Use Case: great for long-term storage, backups, and as a data store for disaster recovery files
- used for data that is accessed less frequently but requires rapid access when needed.
- you pay to access data: there is a low per-GB storage price and a per-GB retrieval fee.
- 99.9% availability
- 99.999999999% (11 9’s) durability
S3 One Zone Infrequent Access
Use Case: good for long-lived, infrequently accessed, non-critical data
Like S3 Standard-IA, but data is stored redundantly within a single AZ.
- costs 20% less than regular S3 Standard-IA
- 99.5% availability
- 99.999999999% (11 9’s) durability
S3 Intelligent-Tiering
Use Case: if you don’t know whether you’ll be accessing data frequently or infrequently
💵 optimization: monthly fee of $0.0025 per 1,000 objects
- automatically moves your data to the most cost-effective tier based on how frequently you access each object.
- 99.9% availability
- 99.999999999% (11 9’s) durability
3 Glacier Options
- you pay each time you access data
- use only for archiving data
- Glacier is cheap storage
- optimized for data that is very infrequently accessed
- 99.9% to 99.99% availability, depending on the option
- 99.999999999% (11 9’s) durability
Option 1: Glacier Instant Retrieval
provides long-term data archiving with instant retrieval of your data.
Option 2: Glacier Flexible Retrieval
ideal storage class for archive data that does not require immediate access but needs the flexibility to retrieve large sets of data at no cost, such as backup or disaster recovery use cases. Retrieval time ranges from minutes to 12 hours.
Option 3: Glacier Deep Archive
cheapest storage class, designed for customers that retain data sets for 7-10 years or longer to meet customer needs and regulatory compliance requirements. The standard retrieval time is 12 hours and the bulk retrieval time is 48 hours.
Performance across the S3 Storage Classes
Characteristic | S3 Standard | S3 Intelligent-Tiering | S3 Standard-IA | S3 One Zone-IA | S3 Glacier Instant Retrieval | S3 Glacier Flexible Retrieval | S3 Glacier Deep Archive |
---|---|---|---|---|---|---|---|
Designed for Durability | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) | 99.999999999% (11 9’s) |
Designed for Availability | 99.99% | 99.90% | 99.90% | 99.50% | 99.90% | 99.99% | 99.99% |
Availability SLA | 99.90% | 99% | 99% | 99% | 99% | 99.90% | 99.90% |
AZs | >=3 | >=3 | >=3 | 1 | >=3 | >=3 | >=3 |
Min Capacity Charge per Object | n/a | n/a | 128 KB | 128 KB | 128 KB | 40 KB | 40 KB |
Min Storage Duration Charge | n/a | n/a | 30 days | 30 days | 90 days | 90 days | 180 days |
Retrieval Fee | n/a | n/a | Per GB retrieved | Per GB retrieved | Per GB retrieved | Per GB retrieved | Per GB retrieved |
Storage Type | Object | Object | Object | Object | Object | Object | Object |
Lifecycle Transitions | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
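To get a feel for what the "Designed for Availability" percentages mean in practice, the sketch below converts a percentage into the maximum downtime per year it allows. This is plain arithmetic, not an AWS formula; the function name is hypothetical and a 365-day year is assumed.

```python
from decimal import Decimal

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes_per_year(availability_percent: str) -> Decimal:
    """Minutes per year a service may be unavailable at the given availability."""
    unavailable_fraction = (Decimal(100) - Decimal(availability_percent)) / Decimal(100)
    return unavailable_fraction * MINUTES_PER_YEAR


for percent in ("99.99", "99.9", "99.5"):
    print(percent, max_downtime_minutes_per_year(percent))
```

At 99.99% this comes to about 52.56 minutes per year, at 99.9% about 525.6 minutes, and at 99.5% about 2,628 minutes (roughly 43.8 hours).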
Storage Classes - Costs 💵
S3 Standard
general-purpose storage for any type of data, typically used for frequent access
First 50 TB / Month | $0.023 per GB |
Next 450 TB / Month | $0.022 per GB |
Over 500 TB / Month | $0.021 per GB |
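The tiered S3 Standard prices above can be turned into a monthly storage estimate with a few lines of arithmetic. This is a sketch using only the three rates listed here; it assumes 1 TB = 1,024 GB and ignores request, retrieval, and transfer charges. The names `TIERS` and `standard_monthly_cost` are made up for the example.

```python
from decimal import Decimal

GB_PER_TB = 1024  # assumption: binary units

# (tier size in GB, price per GB); None means "everything above the previous tiers".
TIERS = [
    (50 * GB_PER_TB, Decimal("0.023")),
    (450 * GB_PER_TB, Decimal("0.022")),
    (None, Decimal("0.021")),
]


def standard_monthly_cost(stored_gb: int) -> Decimal:
    """Estimate the monthly storage cost in USD for data kept in S3 Standard."""
    remaining = Decimal(stored_gb)
    total = Decimal(0)
    for tier_size, price in TIERS:
        in_tier = remaining if tier_size is None else min(remaining, Decimal(tier_size))
        total += in_tier * price
        remaining -= in_tier
        if remaining == 0:
            break
    return total


print(standard_monthly_cost(60 * GB_PER_TB))  # 60 TB stored for one month
```

For 60 TB, the first 50 TB are billed at the first rate and the remaining 10 TB at the second, which gives 1,177.60 + 225.28 = 1,402.88 USD.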
S3 Intelligent-Tiering
cost-saving strategies applied for data with unknown or changing access patterns
no min storage duration
First 50 TB / Month | $0.023 per GB |
Next 450 TB / Month | $0.022 per GB |
Over 500 TB / Month | $0.021 per GB |
All Storage / Month | $0.0025 per GB |
Monitoring and Automation, All Storage / Month | $0.0025 per 1,000 Objects |
S3 Standard Infrequent Access
long-lived but infrequently accessed data (about once a month) that needs millisecond access; 30 days min storage duration
All Storage / Month | $0.0125 per GB |
S3 One Zone-Infrequent Access
re-creatable, infrequently accessed data that needs millisecond access; 30 days min storage duration
All Storage / Month | $0.01 per GB |
S3 Glacier Instant Retrieval
long-lived archived data accessed about once a quarter, with instant retrieval in milliseconds
All Storage / Month | $0.004 per GB |
S3 Glacier Flexible Retrieval
archived data that does not require immediate access but needs the flexibility to retrieve large sets of data at no cost. Retrieval time: minutes to 12 hours; 90 days min storage duration
All Storage / Month | $0.0036 per GB |
S3 Glacier Deep Archive
designed to retain data sets for 7-10 years or longer. Retrieval time: 12 hours standard, 48 hours bulk; 180 days min storage duration
All Storage / Month | $0.00099 per GB |
Lifecycle Management
- automates moving objects between the different storage tiers, thereby maximizing cost effectiveness.
- can be used in conjunction with versioning
- can be applied to current and noncurrent versions
Example: S3 Standard: keep for 30 days -> S3 IA: after 30 days -> Glacier: After 90 days
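The example above could be written down as a lifecycle rule roughly like the following. This is a sketch: the dictionary is laid out the way the S3 lifecycle configuration API is generally documented to accept it (for instance via boto3's `put_bucket_lifecycle_configuration`), but the rule ID and the function name are hypothetical and the structure should be checked against the current API reference before use.

```python
def lifecycle_configuration(prefix: str = "") -> dict:
    """Lifecycle rule: Standard for 30 days, then Standard-IA, then Glacier after 90 days."""
    return {
        "Rules": [
            {
                "ID": "standard-to-ia-to-glacier",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # an empty prefix applies the rule to all objects
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = lifecycle_configuration()
for transition in config["Rules"][0]["Transitions"]:
    print(f"after {transition['Days']} days -> {transition['StorageClass']}")
```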
S3 Object Lock 🔒
- you can use S3 Object Lock to store objects using a write once, read many (WORM) model. It can help prevent objects from being deleted or modified for a fixed amount of time or indefinitely.
- object lock can be applied to individual objects or across the bucket as a whole
- object lock comes in two modes: Governance Mode, Compliance Mode
S3 Object Lock Modes
Governance Mode
Users can’t overwrite or delete an object version or alter its lock settings unless they have special permissions.
Compliance Mode
A protected object version can’t be overwritten or deleted by any user, including the root user of your AWS account. The object’s retention mode can’t be changed and its retention period can’t be shortened. Compliance mode ensures an object version can’t be overwritten or deleted for the duration of the retention period.
Retention Periods 🕥
A retention period protects an object version for a fixed amount of time. When you place a retention period on an object version, Amazon S3 stores a timestamp in the object version’s metadata to indicate when the retention period expires.
💡 After the retention period expires, the object version can be overwritten or deleted unless you also placed a legal hold on the object version.
Legal Holds
Like a retention period, a legal hold prevents an object version from being overwritten or deleted.
However, a legal hold doesn’t have an associated retention period and remains in effect until removed.
A legal hold can be placed and removed by any user who has the s3:PutObjectLegalHold permission.
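The interaction between retention periods and legal holds can be summarized in a small decision function. This is a simplified model of the rules described above, not S3's implementation; it ignores the governance-mode bypass permission, and the function and parameter names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def is_protected(now: datetime, retain_until: Optional[datetime], legal_hold: bool) -> bool:
    """Return True if an object version may not be overwritten or deleted."""
    if legal_hold:
        # A legal hold has no expiry; it applies until someone removes it.
        return True
    # Without a legal hold, protection lasts only until the retention timestamp.
    return retain_until is not None and now < retain_until


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
expired = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(is_protected(now, expired, legal_hold=False))  # False: retention has expired
print(is_protected(now, expired, legal_hold=True))   # True: the legal hold still applies
```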
Glacier Vault Lock
Lets you deploy and enforce compliance controls for individual S3 Glacier vaults with a vault lock policy.
You can specify controls such as WORM (write once, read many) in a vault lock policy.
🔴 once locked, the policy can’t be changed
Optimizing S3 Performance
S3 Prefixes
A prefix is the folder path inside your bucket. It doesn’t include the object name.
bucketname/folder1/subfolder1/file.md -> prefix: /folder1/subfolder1/
bucketname/folder2/subfolder1/file.md -> prefix: /folder2/subfolder1/
bucketname/folder3/file.md -> prefix: /folder3/
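The mapping from a path to its prefix can be expressed as a short helper. This sketch follows the notation used in these notes (leading and trailing slash, bucket name first in the path); `prefix_of` is a hypothetical name and the helper does no validation of bucket or key names.

```python
def prefix_of(path: str) -> str:
    """Return the prefix of 'bucketname/folder/.../object' in the notation used here."""
    _bucket, _, key = path.partition("/")          # drop the bucket name
    folders, sep, _object_name = key.rpartition("/")  # drop the object name
    return f"/{folders}/" if sep else "/"


print(prefix_of("bucketname/folder1/subfolder1/file.md"))  # /folder1/subfolder1/
print(prefix_of("bucketname/folder3/file.md"))             # /folder3/
```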
S3 Performance
S3 has low latency. You can get the first byte out of S3 within 100-200 milliseconds.
You can achieve a high number of requests:
- 3,500 PUT/COPY/POST/DELETE
- 5,500 GET/HEAD
requests per second, per prefix.
💡 The more prefixes you have in the bucket, the higher the performance you can get. For example, 2 prefixes give you 11,000 GET/HEAD requests per second; 4 prefixes give 22,000; etc.
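The scaling rule is simple multiplication, as the sketch below shows. It assumes load is spread evenly across prefixes, which real workloads may not achieve; the constant and function names are hypothetical.

```python
WRITE_RPS_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE requests per second
READ_RPS_PER_PREFIX = 5_500   # GET/HEAD requests per second


def max_request_rates(prefix_count: int) -> dict:
    """Upper bound on requests per second when load is spread evenly over prefixes."""
    return {
        "write": prefix_count * WRITE_RPS_PER_PREFIX,
        "read": prefix_count * READ_RPS_PER_PREFIX,
    }


print(max_request_rates(2))  # {'write': 7000, 'read': 11000}
print(max_request_rates(4))  # {'write': 14000, 'read': 22000}
```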
Limitations with KMS
🟠 SSE-KMS: uploading a file results in a GenerateDataKey call in the KMS API, which counts against the KMS quota (can’t be increased)
🟠 SSE-KMS: downloading a file results in a Decrypt call in the KMS API, which counts against the KMS quota (can’t be increased)
If the KMS quota becomes a bottleneck, it is better to use native S3 encryption (SSE-S3) rather than KMS
Uploads
🟢 Multipart Uploads
- recommended for files over 100 MB
- required for files over 5 GB
- parallelize uploads (increase efficiency)
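To illustrate how a large file is split for a multipart upload, the sketch below plans the parts without talking to S3. The 5 MiB minimum part size and the 10,000-part limit are S3's documented multipart limits; the default part size of 100 MiB and the name `plan_parts` are choices made for this example. Each planned part could then be uploaded in parallel.

```python
MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB  # minimum size for every part except the last
MAX_PARTS = 10_000       # maximum number of parts per upload


def plan_parts(total_size: int, part_size: int = 100 * MIB) -> list:
    """Return (part_number, offset, length) tuples covering the whole file."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    parts = []
    for number, offset in enumerate(range(0, total_size, part_size), start=1):
        # The last part may be shorter than part_size.
        parts.append((number, offset, min(part_size, total_size - offset)))
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return parts


for part in plan_parts(250 * MIB):
    print(part)
```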
Downloads
🟢 S3 Byte-Range Fetches
- parallelize downloads by specifying byte ranges.
- if a download fails, the failure affects only a specific byte range, so only that range needs to be retried.
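Byte-range fetches rely on the standard HTTP `Range` header. The sketch below only computes the header values for each chunk; issuing the requests and reassembling the object is left out, and `range_headers` is a hypothetical helper name.

```python
def range_headers(object_size: int, chunk_size: int) -> list:
    """Return HTTP Range header values that together cover the whole object."""
    headers = []
    for start in range(0, object_size, chunk_size):
        end = min(start + chunk_size, object_size) - 1  # the end of a Range is inclusive
        headers.append(f"bytes={start}-{end}")
    return headers


print(range_headers(250, 100))  # ['bytes=0-99', 'bytes=100-199', 'bytes=200-249']
```

Each range can be requested independently, so a failed chunk can be retried on its own.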
Backup Data with S3 Replication
- you can replicate objects from one bucket to another. Versioning must be enabled on both the source and destination buckets.
- objects that already exist in the bucket when replication is turned on are not replicated automatically
- delete markers are not replicated by default