S3
- 0 - 5TB. Universal namespace. Upload returns HTTP 200.
- Read after write consistency for PUTS of new Objects
- Eventual consistency for overwrite PUTS and DELETES - NOW Strongly consistent
- MFA Deletes
- 99.99% availability design
- 99.9% guaranteed availably
- 11 9’s Durability
- S3 Standard - sustain loss of 2 facilities concurrently
- S3 - IA - retrieval fee
- S3 One Zone - IA (formerly reduced redundancy storage RRS)
- S3 - Intelligent Tiering - most cost-effective via ML
- S3 Glacier - retrieval configurable mins to hours
- S3 Glacier Deep Archive - 12 hrs retrieval
- Cross Region Replication
- S3 Transfer Acceleration - upload to edges (get a distinct URL), then back to bucket
- Cost:
- Storage
- Requests
- data transfer
- replication
- Versioning
- Must make each version public individually, setting latest version public doesn’t cascade
- Once enabled, can’t be disabled, just suspended
- Includes all writes and delete
- Has MFA Delete
- Lifecycle management - can be current versions and previous versions
- S3 Object Lock - Write Once, ready many (WORM) - retention period. Indiv obj or buckets
- Governance Mode - can be changed w/ special permissions
- Compliance Mode - can’t be changed even by root
- Legal Hold - in effect until removed
- Glacier Vault Lock
- Performance
- Prefix - mybucket/folder1/subfolder1/object.jpg
- First byte latency 100-200 ms
- 3500 req/sec per prefix PUT/COPY/POST/DELETE
- 5500 GET/HEAD
- Spread across prefixes to achieve more reqs/sec
- SSE-KMS KMS Limits for the API : upload -> GenerateDataKey; download -> Decrypt
- KMS quota is region specific: 5500, 10000, 30000 req/sec. Can’t request increase ATM
- Multipart Uploads
- Recommended > 100 MB; required > 5 GB
- Parallelize uploads
- Downloads - byte-range fetches (parallelize downloads by byte ranges)
- Speed up downloads
- Get partial amounts (like header)
- S3 Select - SQL expressions to only retrieve what’s needed (~ 400% faster/ 80% cheaper)
- Glacier Select
- Sharing buckets across accounts:
- Using bucket policies & IAM - applies to entire bucket. programmatic access only
- Using Bucket ACLS & IAM - aplies to individual objects. programmatic
- Cross-account IAM Roles - programmatic and console
- Cross Region Replication
- Will require versioning to be turned on for source and destination
- Only begins when set up - existing objects not automatically replicated
- Permissions (public) now replicate to other regions
- Virtual-host style URLs favored over path-style
Athena & Macie
- Athena is a query service to query data in S3 as standard SQL
- Serverless
- No need for ETL
- Useful for log data ELB Logs, S3 access logs, click stream data, etc
- Works with JSON, Apache Parquet, Apache ORC
- Macie - security service using ML and NLP to recognize if s3 contain sensitive data to discover PII. CloudTrail logs, etc