Logical Enigma
AWS Notes

S3

  • Object size 0 bytes - 5 TB. Bucket names share a universal (global) namespace. Successful upload returns HTTP 200.
  • Read-after-write consistency for PUTs of new objects
  • Overwrite PUTs and DELETEs used to be eventually consistent - NOW strongly consistent (since December 2020)
  • MFA Deletes
  • 99.99% availability by design (S3 Standard)
  • 99.9% guaranteed availability (SLA)
  • 11 9’s durability (99.999999999%)
  • S3 Standard - sustain loss of 2 facilities concurrently
  • S3 - IA - retrieval fee
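  • S3 One Zone - IA - stored in a single AZ, cheaper than S3 - IA (not the same class as the legacy Reduced Redundancy Storage, RRS)
  • S3 - Intelligent Tiering - moves objects between access tiers automatically based on access patterns to optimize cost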
  • S3 Glacier - retrieval configurable mins to hours
  • S3 Glacier Deep Archive - retrieval within 12 hrs
  • Cross Region Replication
  • S3 Transfer Acceleration - upload to an edge location (you get a distinct URL), then data travels over the AWS network to the bucket
  • Cost:
    • Storage
    • Requests
    • Data transfer
    • Replication
  • Versioning
    • Must make each version public individually, setting latest version public doesn’t cascade
    • Once enabled, can’t be disabled, just suspended
    • Keeps all versions, including writes and deletes (a delete adds a delete marker)
    • Has MFA Delete
  • Lifecycle management - rules can apply to current versions and previous versions
  • S3 Object Lock - Write Once, Read Many (WORM) - retention period. Applies to individual objects or whole buckets
    • Governance Mode - can be changed w/ special permissions
    • Compliance Mode - can’t be changed even by root
    • Legal Hold - in effect until removed
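
The Object Lock rules above can be sketched as a small decision function. This is only an illustrative model of the notes, not an AWS API: the function name, its parameters, and the `has_bypass_permission` flag (standing in for the "special permissions" mentioned for Governance Mode) are all assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Optional

VALID_MODES = (None, "GOVERNANCE", "COMPLIANCE")


def can_delete_version(
    mode: Optional[str],
    retain_until: Optional[datetime],
    legal_hold: bool,
    has_bypass_permission: bool,
    now: datetime,
) -> bool:
    """Hypothetical model: may a locked object version be deleted at `now`?"""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown retention mode: {mode!r}")
    # A legal hold stays in effect until removed, regardless of retention.
    if legal_hold:
        return False
    # No retention configured, or the retention period has already expired.
    if mode is None or retain_until is None or now >= retain_until:
        return True
    # Governance Mode: only callers with special permissions may override.
    if mode == "GOVERNANCE":
        return has_bypass_permission
    # Compliance Mode: nobody can override, not even root.
    return False


# Example: an object locked until 2030, checked in 2025.
until = datetime(2030, 1, 1, tzinfo=timezone.utc)
today = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(can_delete_version("GOVERNANCE", until, False, True, today))
print(can_delete_version("COMPLIANCE", until, False, True, today))
```

The legal hold check comes first because it is independent of the retention period: it blocks deletion even after the retention date has passed.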
  • Glacier Vault Lock
  • Performance
    • Prefix - the path between the bucket name and the object name: in mybucket/folder1/subfolder1/object.jpg the prefix is /folder1/subfolder1
    • First byte latency 100-200 ms
    • 3,500 req/sec per prefix for PUT/COPY/POST/DELETE
    • 5,500 req/sec per prefix for GET/HEAD
    • Spread across prefixes to achieve more reqs/sec
    • SSE-KMS - requests count against KMS API limits: upload -> GenerateDataKey; download -> Decrypt
    • KMS quota is region specific: 5,500, 10,000 or 30,000 req/sec. Couldn’t request an increase at the time of writing
    • Multipart uploads
      • Recommended for files > 100 MB; required for files > 5 GB
      • Parallelize uploads
    • Byte-range fetches - parallelize downloads by byte ranges
      • Speed up downloads
      • Get partial amounts (like the header)
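
The arithmetic behind the performance notes above can be sketched in a few lines. This is a minimal illustration using only the numbers from the notes; the function names are made up for the example, and nothing here talks to S3. The same range-splitting idea applies to planning multipart upload parts and to building `Range` headers for byte-range fetches.

```python
from typing import List, Tuple

# Per-prefix request limits as quoted in the notes.
WRITE_RPS_PER_PREFIX = 3500  # PUT/COPY/POST/DELETE
READ_RPS_PER_PREFIX = 5500   # GET/HEAD


def aggregate_read_rps(num_prefixes: int) -> int:
    """Upper bound on GET/HEAD req/sec when reads are spread over prefixes."""
    return num_prefixes * READ_RPS_PER_PREFIX


def byte_ranges(total_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split `total_size` bytes into inclusive (start, end) byte ranges."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(0, total_size, chunk_size)
    ]


def range_header(start: int, end: int) -> str:
    """Value for an HTTP Range header covering bytes start..end inclusive."""
    return f"bytes={start}-{end}"


MB = 1024 * 1024
# A 250 MB object split into 100 MB pieces gives three ranges.
for start, end in byte_ranges(250 * MB, 100 * MB):
    print(range_header(start, end))
print(aggregate_read_rps(2))  # two prefixes -> 11000
```

Each range can be fetched or uploaded independently, which is what makes the parallelism in the notes possible.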
  • S3 Select - SQL expressions to retrieve only the data that’s needed (up to ~400% faster / 80% cheaper)
  • Glacier Select
  • Sharing buckets across accounts:
    • Using bucket policies & IAM - applies to entire bucket. Programmatic access only
    • Using bucket ACLs & IAM - applies to individual objects. Programmatic access only
    • Cross-account IAM Roles - programmatic and console
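
As a rough sketch of the bucket policy option above, the snippet below builds a policy document that lets another account read a bucket. The bucket name `example-bucket`, the account ID `111122223333`, and the helper name are placeholders invented for the example; the exact actions to grant depend on the use case.

```python
import json


def cross_account_read_policy(bucket: str, account_id: str) -> dict:
    """Build a bucket policy granting another account list and read access."""
    principal = {"AWS": f"arn:aws:iam::{account_id}:root"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it targets the bucket ARN.
                "Sid": "CrossAccountList",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading is an object-level action, so it targets bucket/*.
                "Sid": "CrossAccountRead",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


policy = cross_account_read_policy("example-bucket", "111122223333")
print(json.dumps(policy, indent=2))
```

The policy covers the whole bucket, which matches the note that bucket policies apply at bucket scope. The other account still needs its own IAM permissions for the same actions.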
  • Cross Region Replication
    • Requires versioning to be turned on for both source and destination buckets
    • Only begins once set up - existing objects are not automatically replicated
    • Permissions (public) now replicate to other regions
  • Virtual-hosted-style URLs favored over path-style
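
To make the two URL styles concrete, here is a small sketch that formats both for the same object. The bucket, region, and key are example values and the helper names are invented for illustration.

```python
from urllib.parse import quote


def virtual_hosted_url(bucket: str, region: str, key: str) -> str:
    """Virtual-hosted style: the bucket name is part of the host name."""
    encoded_key = quote(key, safe="/")
    return f"https://{bucket}.s3.{region}.amazonaws.com/{encoded_key}"


def path_style_url(bucket: str, region: str, key: str) -> str:
    """Path style: the bucket name is the first path segment."""
    encoded_key = quote(key, safe="/")
    return f"https://s3.{region}.amazonaws.com/{bucket}/{encoded_key}"


key = "folder1/subfolder1/object.jpg"
print(virtual_hosted_url("mybucket", "us-west-2", key))
print(path_style_url("mybucket", "us-west-2", key))
```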

Athena & Macie

  • Athena is an interactive query service for querying data in S3 using standard SQL
    • Serverless
    • No need for ETL
    • Useful for log data: ELB logs, S3 access logs, clickstream data, etc.
    • Works with JSON, Apache Parquet, Apache ORC
  • Macie - security service that uses ML and NLP to discover sensitive data such as PII stored in S3. Also works with CloudTrail logs, etc.
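
As a sketch of the kind of standard SQL Athena runs over log data in S3, the helper below builds a query that counts error responses by status code. The table name `elb_logs` and the column `elb_status_code` are hypothetical and depend on how the table was defined; the snippet only builds the query string and does not call Athena.

```python
import re

# Accept simple identifiers only, so a table name cannot inject SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_status_count_query(table: str, min_status: int = 500) -> str:
    """Return SQL counting requests per status code at or above `min_status`."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"invalid table name: {table!r}")
    return (
        "SELECT elb_status_code, COUNT(*) AS requests\n"
        f"FROM {table}\n"
        f"WHERE elb_status_code >= {int(min_status)}\n"
        "GROUP BY elb_status_code\n"
        "ORDER BY requests DESC"
    )


print(build_status_count_query("elb_logs"))
```

The resulting string could be pasted into the Athena console or submitted through an SDK. No ETL step is involved because the query runs directly against the files in S3.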