S3/Glacier Flashcards
What is S3?
Simple Storage Service. It provides object-based (blob) storage. It was one of the first AWS services, introduced back in 2006.
Why would I use S3?
Analytics: Data lakes (Athena, Redshift Spectrum, QuickSight), IoT Streaming Data repository (Kinesis Firehose), AI/ML Storage (Rekognition, Lex, MXNet), Storage class analysis (S3 Mgmt analytics)
Static Web Hosting: simple and massively scalable static website hosting
BitTorrent: use the BitTorrent protocol to retrieve any publicly available object by automatically generating a .torrent file
What is the largest S3 object size?
5TB
What is the largest object in a single PUT?
5GB
When is it recommended to use multi-part uploads?
If your file is larger than 100 MB
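The three size limits above can be combined into one decision helper. This is only a sketch: the function name `choose_upload_strategy` and its return labels are hypothetical, and decimal units (1 MB = 10**6 bytes) are assumed for simplicity.

```python
# Assumption: decimal units, purely for illustration.
MB = 10**6
GB = 10**9
TB = 10**12

MAX_OBJECT_SIZE = 5 * TB          # largest object S3 will store
MAX_SINGLE_PUT = 5 * GB           # largest object in a single PUT
MULTIPART_RECOMMENDED = 100 * MB  # above this, multipart upload is recommended


def choose_upload_strategy(size_bytes: int) -> str:
    """Classify an upload by size (hypothetical helper, not an AWS API)."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    if size_bytes > MAX_OBJECT_SIZE:
        raise ValueError("object exceeds the 5 TB S3 limit")
    if size_bytes > MAX_SINGLE_PUT:
        return "multipart-required"
    if size_bytes > MULTIPART_RECOMMENDED:
        return "multipart-recommended"
    return "single-put"
```

For example, a 200 MB file falls in the "multipart-recommended" band, while a 6 GB file cannot be sent in a single PUT at all.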
What is a key?
The unique name of an object within a bucket. It is NOT a file path, though it looks like one: the slashes are just characters in the name, and the object store has no real directories.
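Because keys are flat strings, "folders" are only an illusion created by filtering on a prefix and a delimiter. The sketch below emulates that idea in plain Python; `list_prefix` and the sample keys are hypothetical, and it only loosely mirrors the Prefix/Delimiter behavior of S3 listings.

```python
def list_prefix(keys, prefix="", delimiter="/"):
    """Emulate a folder-style listing over flat keys.

    Returns (objects, common_prefixes): keys directly under the prefix,
    and the distinct next-level "folders" below it.
    """
    objects, common_prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter acts like a sub-folder.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(common_prefixes)


# Hypothetical keys for illustration.
keys = ["logs/2023/app.log", "logs/2024/app.log", "logs/readme.txt", "index.html"]
objects, folders = list_prefix(keys, "logs/")
```

Here `objects` holds only `logs/readme.txt`, and `folders` holds the two year "directories", even though no directory exists anywhere in the store.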
What are S3 storage classes and what is the purpose of each?
- Standard: frequently accessed data
- Standard-IA: long-lived, infrequently accessed data
- One Zone-IA: long-lived, infrequently accessed, non-critical data
- Reduced redundancy: frequently accessed, non-critical data
- Intelligent-tiering: long-lived data with changing or unknown access patterns
- Glacier: long-term data archiving with retrieval times ranging from minutes to hours
- Glacier Deep Archive: long-term data archiving with retrieval times within 12 hours
What is intelligent tiering?
A storage class that automatically moves objects between access tiers based on how often they are accessed
Does intelligent tiering add to the cost?
Yes, there is a small per-object monitoring and automation fee, but you will typically save money as objects move to lower-cost tiers
What is intelligent tiering archive?
An optional part of Intelligent-Tiering that automatically moves objects not accessed for a certain period of time into archive tiers comparable to Glacier and Glacier Deep Archive. This is NOT lifecycle management
What is S3 Lifecycle Management?
A set of rules that moves objects to other storage classes, or expires them, after a set amount of time.
What are the benefits of S3 Lifecycle Management?
- Optimize storage costs
- Adhere to data retention policies using automation
- Helps keep S3 buckets well-maintained
- Data destruction is one of the more difficult tasks and this helps provide clarity and enforcement
What are the Lifecycle rules based on?
- Prefixes
- Tags
- Current vs previous versions
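A rule that filters on all three criteria might look like the sketch below. It builds a plain dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts; the helper name, rule ID, prefix, tag, and day counts are all hypothetical values chosen for illustration.

```python
def build_lifecycle_rule(rule_id, prefix, tag_key, tag_value):
    """Build one lifecycle rule filtered by prefix AND tag (illustrative values)."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        # Filter: only objects under the prefix that also carry the tag.
        "Filter": {
            "And": {
                "Prefix": prefix,
                "Tags": [{"Key": tag_key, "Value": tag_value}],
            }
        },
        # Current versions: step down through cheaper classes over time.
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
        ],
        # Previous (noncurrent) versions get their own schedule.
        "NoncurrentVersionTransitions": [
            {"NoncurrentDays": 30, "StorageClass": "GLACIER"}
        ],
    }


lifecycle_config = {"Rules": [build_lifecycle_rule("archive-logs", "logs/", "team", "ops")]}

# With boto3 installed and credentials configured, this could be applied as:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=lifecycle_config)
```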
What is storage class analysis?
A useful tool within S3 that helps you ensure you are using your storage in the most cost-effective manner. You can run reports to view the frequency at which data is accessed and then potentially change storage type.
What is the “Requester Pays” cost option?
The requester rather than the bucket owner pays for requests AND data transfer
Does S3 support tagging?
Yes, assign tags to objects for use in costing, billing, security etc
What is an S3 Event?
This occurs when an action is taken on a bucket or object and triggers notifications to SNS, SQS, or Lambda
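When the target is Lambda, the function receives the event as a JSON document with one record per action. A minimal sketch of a handler that pulls out the bucket and key follows; the handler name, return shape, and sample event are assumptions, and the sample is trimmed to the fields used here.

```python
from urllib.parse import unquote_plus


def handler(event, context=None):
    """Extract event name, bucket, and key from each S3 event record."""
    results = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        results.append({
            "event": record["eventName"],
            "bucket": s3["bucket"]["name"],
            # Keys arrive URL-encoded in the event (spaces become '+').
            "key": unquote_plus(s3["object"]["key"]),
        })
    return results


# Hypothetical, trimmed-down event for illustration.
sample_event = {
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "example-bucket"},
            "object": {"key": "reports/q1+summary.csv"},
        },
    }]
}
parsed = handler(sample_event)
```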
What is transfer acceleration?
Speeds up data uploads by routing them through CloudFront edge locations (PoPs), in effect using the CDN in reverse
How is S3 secured?
- Resource-based through object ACL and bucket policies
- User-based through IAM policies
- Optional multi-factor authentication before delete
- Through encryption at rest and in transit
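As one example of a resource-based control, a bucket policy can enforce encryption in transit by denying any request that does not use TLS. The sketch below builds such a policy as a Python dictionary; the function name and bucket name are hypothetical.

```python
import json


def deny_insecure_transport_policy(bucket_name):
    """Bucket policy that rejects all non-HTTPS requests (illustrative)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            # Covers both the bucket itself and every object in it.
            "Resource": [
                f"arn:aws:s3:::{bucket_name}",
                f"arn:aws:s3:::{bucket_name}/*",
            ],
            # aws:SecureTransport is "false" when the request is plain HTTP.
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }


policy_json = json.dumps(deny_insecure_transport_policy("example-bucket"), indent=2)
```

The resulting JSON string is what would be attached to the bucket as its policy.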
How is MFA used in S3?
- Safeguards against accidental deletion of an object
- Safeguards against changing the versioning state of your bucket
What are the options for encryption at rest?
- SSE-S3: Use S3’s existing encryption key for AES-256
- SSE-C: Upload your own AES-256 key which S3 will use when it writes the objects
- SSE-KMS: Use a key generated and managed by AWS KMS
- Client-Side: Encrypt objects using own local encryption process before uploading to S3 (PGP, GPG)
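The four options differ mainly in what extra parameters accompany the upload. The sketch below maps each option to the keyword arguments that boto3's `put_object` takes for it; the helper `sse_put_args` and its argument names are hypothetical, and no AWS call is made.

```python
def sse_put_args(mode, kms_key_id=None, customer_key=None):
    """Return extra put_object keyword arguments for an encryption option."""
    if mode == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        args = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:
            # Optional: name a specific KMS key instead of the default one.
            args["SSEKMSKeyId"] = kms_key_id
        return args
    if mode == "SSE-C":
        # AES-256 needs a 256-bit (32-byte) key supplied by the caller.
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C requires a 32-byte key")
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    if mode == "Client-Side":
        # The payload is already encrypted locally; S3 needs no extra arguments.
        return {}
    raise ValueError(f"unknown encryption mode: {mode}")
```

With boto3, the returned dictionary could be splatted into the call, for example `s3.put_object(Bucket=..., Key=..., Body=..., **sse_put_args("SSE-S3"))`.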
What is PGP?
Pretty Good Privacy (PGP) is an encryption program that provides cryptographic privacy and authentication for data communication. PGP is used for signing, encrypting, and decrypting texts, e-mails, files, directories, and whole disk partitions and to increase the security of e-mail communications. Phil Zimmermann developed PGP in 1991.
PGP encryption uses a serial combination of hashing, data compression, symmetric-key cryptography, and finally public-key cryptography; each step uses one of several supported algorithms. Each public key is bound to a username or an e-mail address.
What is GPG?
GNU Privacy Guard (GnuPG or GPG) is a free-software replacement for Symantec’s PGP cryptographic software suite.
GnuPG is a hybrid-encryption software program because it uses a combination of conventional symmetric-key cryptography for speed, and public-key cryptography for ease of secure key exchange, typically by using the recipient’s public key to encrypt a session key which is used only once.
Why would you use PGP vs GPG?
PGP uses the RSA algorithm and the IDEA encryption algorithm, whereas GPG uses NIST-standardized algorithms such as AES.
PGP has restrictions on personal and commercial use, whereas GPG is free to download and use for both personal and commercial purposes.
PGP is a proprietary solution owned by Symantec, whereas GPG is open source.
How does S3 protect data?
- Versioning
- Multi-factor authentication
- Cross-region replication
What is versioning?
A means of keeping multiple variants of an object in the same bucket. New version with each write. It enables “roll-back” and “un-delete” capabilities
How can you use S3 versioning?
To preserve, retrieve, and restore every version of every object stored in an S3 bucket. You can easily recover from both unintended user actions and application failures
Do old versions count towards the bill?
Yes, until they are permanently deleted
Is versioning integrated with lifecycle management?
Yes, you can use Lifecycle management to delete old versions automatically after a certain number of days
Why would I use Cross-region replication?
- Security
- Compliance
- Latency
What are some characteristics of Glacier?
- It’s a service by itself with its own API, console, etc
- Cheap, slow to respond, and seldom accessed
- Used by AWS Storage Gateway Virtual Tape Library (VTL)
- Integrated with AWS S3 via Lifecycle Management
- Faster retrieval speed options if you pay more, though it is still meant to be long-term storage. It’s not fast enough for online content.
What are the components of Glacier?
- Glacier Vault (like an S3 bucket)
- Archive (like an S3 object)
- Access policies
What is a glacier policy?
It defines what rules the vault must obey
What is Glacier Vault Lock?
- It is different from the vault access policy
- It enforces rules like no deletes or MFA
- It’s immutable, meaning that once the lock is completed the policy can no longer be changed or deleted. Only an in-progress lock can still be aborted.
What are the characteristics of an archive?
- It can be a file including a zip, tar, etc
- The max size is 40TB
- Immutable
How do I create a Vault Lock?
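- Create a vault lock policy
- Initiate the vault lock, which attaches the policy to the vault in an in-progress state
- A 24-hour timer starts, during which you confirm the policy is performing as intended
i. If it elapses and you don’t complete the lock, the process aborts
ii. If you complete the lock, the policy is set permanently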
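The steps above behave like a small state machine, which the toy model below sketches. The class `VaultLockSim` is hypothetical and makes no AWS calls; in boto3's Glacier client the corresponding operations are `initiate_vault_lock`, `complete_vault_lock`, and `abort_vault_lock`.

```python
from datetime import datetime, timedelta


class VaultLockSim:
    """Toy model of the vault lock workflow (not the real service)."""

    WINDOW = timedelta(hours=24)

    def __init__(self):
        self.state = "unlocked"
        self.policy = None
        self._deadline = None

    def _expire_if_due(self, now):
        # If the 24-hour window elapsed without completion, the lock aborts.
        if self.state == "in-progress" and now > self._deadline:
            self.state, self.policy, self._deadline = "unlocked", None, None

    def initiate(self, policy, now):
        self._expire_if_due(now)
        if self.state != "unlocked":
            raise RuntimeError(f"cannot initiate while {self.state}")
        self.state, self.policy = "in-progress", policy
        self._deadline = now + self.WINDOW

    def complete(self, now):
        self._expire_if_due(now)
        if self.state != "in-progress":
            raise RuntimeError("no lock in progress (the window may have elapsed)")
        self.state = "locked"  # permanent from here on
```

Time is passed in explicitly so the 24-hour window can be exercised without waiting.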
What happens if you delete an object in a versioning-enabled S3 bucket?
Amazon S3 inserts a delete marker instead of removing the object permanently. The delete marker becomes the current object version
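The delete-marker behavior is easier to see in a toy in-memory model. The class below is a hypothetical simplification, not how S3 is implemented: each write appends a version, a plain delete appends a marker, and removing the marker by its version ID "un-deletes" the object.

```python
import itertools

DELETE_MARKER = object()  # sentinel standing in for an S3 delete marker


class VersionedBucketSim:
    """Toy in-memory model of a versioning-enabled bucket."""

    def __init__(self):
        self._versions = {}             # key -> list of (version_id, body), oldest first
        self._ids = itertools.count(1)  # hypothetical version IDs: v1, v2, ...

    def _append(self, key, body):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def put(self, key, body):
        return self._append(key, body)

    def delete(self, key, version_id=None):
        if version_id is None:
            # Plain delete: a marker becomes the current version.
            return self._append(key, DELETE_MARKER)
        # Versioned delete: permanently remove just that version.
        self._versions[key] = [v for v in self._versions.get(key, []) if v[0] != version_id]
        return version_id

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is not None:
            versions = [v for v in versions if v[0] == version_id]
        if not versions or versions[-1][1] is DELETE_MARKER:
            raise KeyError(key)
        return versions[-1][1]
```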
Does the SOAP API in S3 support versioning?
SOAP support over HTTP is deprecated, but it is still available over HTTPS. New Amazon S3 features are not supported for SOAP
How are objects charged in S3 versioning?
Normal Amazon S3 rates apply for every version of an object stored and transferred.
In what versioning states can an S3 bucket reside?
- Unversioned (the default)
- Versioning-enabled
- Versioning-suspended
After you version-enable a bucket, it can never return to an unversioned state.
What happens to objects that existed prior to enabling versioning in S3?
Objects that are stored in your bucket before you set the versioning state have a version ID of null. When you enable versioning, existing objects in your bucket do not change. What changes is how Amazon S3 handles the objects in future requests.
What happens if you have an object expiration lifecycle policy in your unversioned bucket and you want to maintain the same permanent delete behavior when you enable versioning?
You must add a noncurrent expiration policy. The noncurrent expiration lifecycle policy manages the deletes of the noncurrent object versions in the version-enabled bucket.
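In a version-enabled bucket, a plain expiration only adds a delete marker to the current version, so older versions linger unless told otherwise. A sketch of extending an existing rule follows; the helper name, the rule contents, and the one-day value are hypothetical, and the dictionary follows the shape used by boto3's lifecycle configuration.

```python
import copy


def with_noncurrent_expiration(rule, noncurrent_days):
    """Return a copy of a lifecycle rule that also expires noncurrent versions."""
    updated = copy.deepcopy(rule)  # leave the original rule untouched
    updated["NoncurrentVersionExpiration"] = {"NoncurrentDays": noncurrent_days}
    return updated


# Hypothetical rule as it might have existed on the unversioned bucket.
unversioned_rule = {
    "ID": "expire-logs",
    "Status": "Enabled",
    "Filter": {"Prefix": "logs/"},
    "Expiration": {"Days": 365},
}

# After enabling versioning: also permanently remove versions once they
# have been noncurrent for one day.
versioned_rule = with_noncurrent_expiration(unversioned_rule, 1)
```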
What are the transfer acceleration URLs?
xyz.s3-accelerate.amazonaws.com
xyz.s3-accelerate.dualstack.amazonaws.com
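Both hostnames follow the same pattern, with the bucket name in place of the leading label. A small sketch of building them is below; the function name is hypothetical, and the check reflects the requirement that bucket names used with Transfer Acceleration contain no periods.

```python
def accelerate_endpoint(bucket, dualstack=False):
    """Build the Transfer Acceleration hostname for a bucket."""
    if "." in bucket:
        raise ValueError("Transfer Acceleration bucket names cannot contain periods")
    if dualstack:
        suffix = "s3-accelerate.dualstack.amazonaws.com"  # dual-stack variant
    else:
        suffix = "s3-accelerate.amazonaws.com"
    return f"{bucket}.{suffix}"
```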
What is the S3 Transfer Acceleration Speed Comparison tool used for?
Comparing accelerated and non-accelerated upload speeds across different AWS regions