S3 Flashcards

1
Q

S3 basics

A
  • Safe place to store files
  • Object-based storage (can upload)
  • Data is spread across multiple devices & facilities
  • Objects can be 0 bytes to 5 TB in size
  • Storage unlimited
  • Files are stored in buckets; bucket names share a universal namespace, so each name must be globally unique
  • After a successful upload, an HTTP 200 status code is returned to your browser
2
Q

S3 objects

A

Consist of:

  • Key & value
  • Version ID
  • Metadata
  • Subresources:
    • Access control lists
    • Torrents
3
Q

S3 data consistency and guarantees

A
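  • Strong ‘Read after Write’ consistency for PUTS of new objects, overwrite PUTS and DELETES (since December 2020): a read issued after a successful write returns the latest version
  • Before December 2020, overwrite PUTS and DELETES were only ‘Eventually Consistent’ (i.e. a read immediately after the change could return an older version)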
    Guarantees:
  • 99.99% availability for the S3 platform
  • 99.999999999% (11 9s) durability for S3 information (i.e. info not being lost)
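
The '11 9s' durability figure is easier to picture as arithmetic. A minimal sketch, assuming the figure is read as an annual per-object design target; the function names and the 10 million object count are illustrative only:

```python
from fractions import Fraction

# Eleven nines: 99.999999999% annual durability per object (design target).
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year at the design durability."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    # 10 million stored objects is an arbitrary illustrative figure.
    print(years_per_lost_object(10_000_000))  # prints 10000
```

In other words, with 10 million objects stored you would expect to lose a single object about once every 10,000 years on average.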
4
Q

S3 features

A
  • Tiered storage available
  • Lifecycle management
  • Versioning
  • Encryption
  • MFA for deleting objects
  • Secure data further using Access Control Lists and Bucket Policies
5
Q

S3 Storage Classes

A
  • Standard
  • Infrequent Access (IA): for data that is accessed less often but needs rapid access when it is required. Lower storage cost than Standard, but a retrieval fee is charged
  • One zone IA: lower cost option for IA, only one AZ
  • Intelligent tiering: optimises costs by moving data between tiers automatically. No impact on performance or operational overhead
6
Q

S3 Glacier

A
  • Glacier: secure, durable, low-cost data archiving. Store any amount of data at a cost at or below that of on-prem solutions. Retrieval times configurable from minutes to hours
  • Glacier Deep Archive: lowest-cost storage class, for data where a retrieval time of 12 hours is acceptable
7
Q

S3 costs

A

Costs incurred on:

  • Storage
  • Number of requests
  • Storage management pricing (i.e. moving between tiers)
  • Data transfer
  • Transfer acceleration (fast transfer of files over long distances using CloudFront)
  • Cross-region replication
8
Q

S3 Versioning

A
  • All versions of an object are stored (every write creates a new version; a delete places a delete marker on top of the versions rather than removing them)
  • Once enabled, versioning cannot be disabled, only suspended. To remove it entirely you have to delete the bucket
  • Integrates with lifecycle rules (i.e. moving to Glacier)
  • MFA Delete can be enabled with versioning for an additional layer of security
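
The version and delete-marker behaviour above can be sketched with a toy in-memory model. This is not the S3 API; the class, method names and version ID format are all hypothetical:

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the S3 API)."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, body); body None = delete marker
        self._ids = itertools.count(1)

    def put(self, key, body):
        """Every write appends a new version instead of overwriting."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key):
        """A plain delete removes nothing; it stacks a delete marker on top."""
        return self.put(key, None)

    def get(self, key, version_id=None):
        """Return the latest version, or a specific one if version_id is given."""
        versions = self._versions.get(key, [])
        if version_id is not None:
            return next((b for v, b in versions if v == version_id), None)
        return versions[-1][1] if versions else None

    def delete_version(self, key, version_id):
        """Permanently remove one specific version (or a delete marker)."""
        self._versions[key] = [
            (v, b) for v, b in self._versions.get(key, []) if v != version_id
        ]


if __name__ == "__main__":
    bucket = VersionedBucket()
    first = bucket.put("notes.txt", "one")
    bucket.put("notes.txt", "two")
    marker = bucket.delete("notes.txt")
    print(bucket.get("notes.txt"))         # None: the delete marker hides the object
    print(bucket.get("notes.txt", first))  # one: older versions are still stored
    bucket.delete_version("notes.txt", marker)
    print(bucket.get("notes.txt"))         # two: removing the marker restores the object
```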
9
Q

Lifecycle management with S3

A
  • Lifecycle Rules to automate managing objects (moving between tiers/delete after certain number of days etc.)
  • Can be used in conjunction with versions (i.e. applied to certain versions)
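
As a rough illustration, a lifecycle rule can be written as a plain dictionary. The sketch below follows the general shape of an S3 lifecycle configuration, but the rule ID, prefix and day counts are made-up values, and the structure should be checked against the current S3 documentation before use:

```python
import json


def lifecycle_configuration(prefix: str) -> dict:
    """Build an illustrative lifecycle configuration for objects under `prefix`."""
    return {
        "Rules": [
            {
                "ID": "archive-then-expire",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # Move current versions to cheaper tiers as they age.
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # Delete current versions after a year.
                "Expiration": {"Days": 365},
                # Rules can also target older (noncurrent) versions.
                "NoncurrentVersionExpiration": {"NoncurrentDays": 60},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(lifecycle_configuration("logs/"), indent=2))
```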
10
Q

S3 Object Lock and Glacier Vault Lock

A
  • S3 object lock: store objects with ‘write once, read many’ (WORM) model. Stops objects from being modified or deleted
  • Can assist in meeting regulatory requirements for WORM storage. Protection options:
    • Governance mode: only users with special permissions can modify or delete
    • Compliance mode: no one (not even root user) can modify or delete until the retention period expires
    • Legal holds: placed on object indefinitely until removed
  • Glacier vault lock: similar to object lock, allows for WORM models in Glacier
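
The difference between the modes and a legal hold can be summarised as a small decision function. This is a simplified sketch of the rules as described on this card, not AWS code; the function and parameter names are hypothetical:

```python
from datetime import datetime, timezone


def delete_allowed(mode, retain_until, now, legal_hold=False,
                   can_bypass_governance=False):
    """Return True if a locked object version may be deleted.

    mode is "GOVERNANCE", "COMPLIANCE" or None (no retention set).
    """
    if legal_hold:
        # A legal hold blocks deletion until it is explicitly removed.
        return False
    if mode is None or now >= retain_until:
        # No retention, or the retention period has expired.
        return True
    if mode == "GOVERNANCE":
        # Only users with special permissions can override.
        return can_bypass_governance
    if mode == "COMPLIANCE":
        # Nobody, not even the root user, can delete before expiry.
        return False
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    until = datetime(2030, 1, 1, tzinfo=timezone.utc)
    today = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(delete_allowed("GOVERNANCE", until, today))                              # False
    print(delete_allowed("GOVERNANCE", until, today, can_bypass_governance=True))  # True
    print(delete_allowed("COMPLIANCE", until, today, can_bypass_governance=True))  # False
```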
11
Q

S3 Performance

A
  • Prefixes: subfolders within a bucket. Can get better performance by spreading reads across different prefixes
  • Limitations with KMS: uploading/downloading with KMS encryption counts towards KMS quota and adds to latency
  • Multipart uploads: recommended for objects over 100MB, required for files over 5GB. Can parallelize uploads as well
  • Byte-range fetches: parallelize downloads by specifying byte ranges
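
To parallelize a download with byte-range fetches, the object is split into ranges and each range is requested separately with an HTTP `Range` header. A minimal sketch of the splitting step only; the function name and the sizes used are illustrative:

```python
def byte_ranges(object_size: int, part_size: int) -> list[str]:
    """Split an object of object_size bytes into HTTP Range header values."""
    if object_size < 0 or part_size <= 0:
        raise ValueError("object_size must be >= 0 and part_size must be > 0")
    return [
        # Range ends are inclusive, hence the -1.
        f"bytes={start}-{min(start + part_size, object_size) - 1}"
        for start in range(0, object_size, part_size)
    ]


if __name__ == "__main__":
    # A 10-byte object fetched in 4-byte parts.
    print(byte_ranges(10, 4))  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```

Each range can then be fetched by a separate worker, and a failed range can be retried on its own without restarting the whole download.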
12
Q

S3 Select and Glacier Select

A
  • S3 Select lets you retrieve only a subset of the data inside an object using SQL statements. E.g. pull selected rows from a compressed csv file, instead of downloading and decompressing the whole object
  • Highly regulated industries write data directly to Glacier to satisfy compliance rules. Others have lifecycle rules then move objects to Glacier. Glacier Select lets you run SQL queries against Glacier
13
Q

AWS Organisations and Consolidated Billing

A

Organisations:
- An account management service that enables consolidation of multiple AWS accounts to manage centrally.
- Best practice is to have root account just for billing and other accounts for certain teams/role types (Devs, testers etc.). Policies made at top level and inherited
- Service Control Policies enable/disable services for member accounts
Consolidated billing:
- paying account is independent but settles the resource bills for linked accounts.
- benefit is the ability to aggregate usage across accounts for better (volume) pricing

14
Q

Sharing S3 buckets across accounts

A
  1. Use bucket policies & IAM (entire bucket, programmatic access only)
  2. Use bucket ACLs & IAM (down to individual objects, programmatic access only)
  3. Cross-account IAM roles. Programmatic and console access
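
For option 1, a bucket policy granting another account read access might look roughly like the sketch below. The bucket name and the account ID are placeholders, and the chosen actions are only an example:

```python
import json


def cross_account_read_policy(bucket: str, account_id: str) -> dict:
    """Build an illustrative bucket policy that lets another account read a bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CrossAccountRead",
                "Effect": "Allow",
                # The other account; its own IAM policies must also allow access.
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # the bucket itself (ListBucket)
                    f"arn:aws:s3:::{bucket}/*",  # objects in the bucket (GetObject)
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket name and account ID.
    print(json.dumps(cross_account_read_policy("example-bucket", "111122223333"), indent=2))
```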
15
Q

Cross-region replication

A
  • Objects in a bucket are replicated to a bucket in another region
  • Versioning must be enabled on both the source and destination buckets
  • Objects already in the bucket are not replicated automatically, but all new and subsequently updated objects are
16
Q

S3 transfer acceleration

A
  • Utilises CloudFront edge network to accelerate uploads to S3
  • Instead of direct upload to S3, use a distinct URL to upload to an edge location which will then transfer to S3
  • Users around the world upload to their nearest edge location. The data then travels over the AWS backbone network to S3 in the region that you specify, which can be faster than uploading directly
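
The 'distinct URL' is the bucket's accelerate endpoint. A small sketch of building such a URL, assuming the `<bucket>.s3-accelerate.amazonaws.com` endpoint form; the helper name, bucket and key are illustrative:

```python
from urllib.parse import quote


def accelerate_url(bucket: str, key: str) -> str:
    """Build the Transfer Acceleration URL for an object key."""
    if "." in bucket:
        # Buckets used with Transfer Acceleration must not contain periods.
        raise ValueError("bucket name must not contain '.'")
    # quote() percent-encodes unsafe characters but keeps '/' separators.
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{quote(key)}"


if __name__ == "__main__":
    print(accelerate_url("my-bucket", "videos/a b.mp4"))
    # https://my-bucket.s3-accelerate.amazonaws.com/videos/a%20b.mp4
```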
17
Q

AWS DataSync

A
  • A way of syncing large amounts of data to AWS (often from on-prem)
  • Used with NFS and SMB-compatible file systems
  • Install DataSync agent to start the replication
18
Q

CloudFront 1

A
  • A content delivery network (CDN). A system of distributed servers (network) that deliver webpages and other content to a user based on the geographic location of the user, the origin of the webpage and a content delivery server
  • Means we don’t have to pull web content directly from its actual server location
  • Edge location: location where content is cached (not a Region/AZ)
  • Origin: origin of the files that the CDN will distribute (e.g. S3 bucket, EC2 instance etc.)
19
Q

CloudFront 2

A
  • When a user accesses content, it becomes cached in that Edge location, so if another user near that location accesses the same content, it is served from the cache and doesn’t have to be downloaded from the original server
  • Edge locations are not just read only, you can write to them too
  • Objects are cached for the life of the TTL
  • You can clear (invalidate) cached objects, but you will be charged
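
The caching, TTL and invalidation behaviour can be modelled with a toy edge cache. This is a conceptual sketch, not how CloudFront is implemented; the class name, the dictionary origin and the explicit `now` clock are all assumptions made for the example:

```python
class EdgeCache:
    """Toy model of one edge location caching content from an origin."""

    def __init__(self, origin: dict, ttl_seconds: int):
        self.origin = origin      # path -> content, standing in for S3/EC2
        self.ttl = ttl_seconds
        self._store = {}          # path -> (content, expires_at)
        self.origin_fetches = 0   # how often we had to go back to the origin

    def get(self, path: str, now: int) -> str:
        entry = self._store.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]       # cache hit: served from the edge
        # Cache miss or expired TTL: fetch from the origin and cache it.
        self.origin_fetches += 1
        content = self.origin[path]
        self._store[path] = (content, now + self.ttl)
        return content

    def invalidate(self, path: str) -> None:
        """Drop a cached object before its TTL expires."""
        self._store.pop(path, None)


if __name__ == "__main__":
    origin = {"/index.html": "v1"}
    edge = EdgeCache(origin, ttl_seconds=60)
    edge.get("/index.html", now=0)          # first request goes to the origin
    origin["/index.html"] = "v2"            # origin content changes
    print(edge.get("/index.html", now=30))  # v1: still cached within the TTL
    edge.invalidate("/index.html")
    print(edge.get("/index.html", now=31))  # v2: refetched after invalidation
```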
20
Q

CloudFront Signed URLs and Cookies

A
  • Used for restricting access for users (i.e. premium content)
  • A signed URL is for individual files
  • A signed cookie is for multiple files
  • If the origin is EC2, use a CloudFront signed URL. If the origin is S3 and access is to a single file, use an S3 signed URL instead
21
Q

Snowball

A
  • Petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of S3
  • Addresses large-scale data transfer issues, such as high network costs, long transfer times and security
  • Can be as little as one-fifth the cost of transferring over high-speed internet
  • Is a physical, secure and encrypted box. AWS erases data after transfer
  • Snowmobile is a shipping-container version of Snowball for Exabyte-scale transfers
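
The 'long transfer times' point is simple arithmetic. A rough sketch, assuming a fully utilised link with no protocol overhead; the data size and link speed are made-up example values:

```python
def transfer_days(terabytes: float, megabits_per_second: float) -> float:
    """Days needed to send `terabytes` of data over a link of the given speed."""
    bits = terabytes * 10**12 * 8                   # decimal terabytes -> bits
    seconds = bits / (megabits_per_second * 10**6)  # link speed in bits per second
    return seconds / 86_400                         # seconds per day


if __name__ == "__main__":
    # 100 TB over a dedicated 100 Mbps connection.
    print(round(transfer_days(100, 100), 1))  # 92.6
```

At roughly three months for 100 TB under ideal conditions, shipping a physical appliance becomes attractive.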
22
Q

Storage Gateway

A
  • A service that connects an on-prem software appliance with cloud-based storage to provide integration between the on-prem IT environment and AWS storage infrastructure
  • The software appliance is available for download as a VM image that you install on a host in your datacenter
  • Three types:
    1. File Gateway (NFS & SMB): for flat files, stored directly on S3
    2. Volume Gateway (iSCSI). (a) Stored volumes: the entire dataset is stored on site and asynchronously backed up to S3; (b) Cached volumes: the entire dataset is stored on S3 and frequently accessed data is cached on site
    3. Tape Gateway - cost-effective way to archive data into AWS
23
Q

Athena

A
  • Interactive query service which enables analysis and querying of data located in S3 using standard SQL
  • Serverless, nothing to provision, pay per query / per TB scanned
  • No complex ETL processes
    Can be used for:
    • query log files
    • generate business reports
    • analyse AWS costs & usage reports
    • run queries on click-stream data
24
Q

Macie

A

Security service that uses ML/NLP to discover, classify and protect sensitive data stored in S3

  • Can provide dashboards, reporting and alerts
  • Can also analyse CloudTrail logs