S3/Glacier Flashcards
What is S3?
Simple Storage Service. It provides object-based (blob) storage. It was one of the first AWS services, launched in 2006.
Why would I use S3?
- Analytics: data lakes (Athena, Redshift Spectrum, QuickSight), IoT streaming data repository (Kinesis Firehose), AI/ML storage (Rekognition, Lex, MXNet), storage class analysis (S3 management analytics)
- Static Web Hosting: simple and massively scalable static website hosting
- BitTorrent: use the BitTorrent protocol to retrieve any publicly available object by automatically generating a .torrent file (a legacy feature that AWS has since discontinued)
What is the largest S3 object size?
5TB
What is the largest object in a single PUT?
5GB
When is it recommended to use multi-part uploads?
If your file is larger than 100 MB
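The three size limits above can be combined into a small planning helper. This is only a sketch: the function name plan_upload is hypothetical, the sizes are treated as binary units (an assumption), and the 5 MiB minimum part size and 10,000-part maximum are S3 multipart limits that the cards above do not mention.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MAX_OBJECT_SIZE = 5 * TIB        # largest S3 object
MAX_SINGLE_PUT = 5 * GIB         # largest object in a single PUT
MULTIPART_THRESHOLD = 100 * MIB  # recommended switch-over point
MIN_PART_SIZE = 5 * MIB          # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 maximum number of parts per upload


def ceil_div(a, b):
    """Integer ceiling division (avoids float rounding on large sizes)."""
    return -(-a // b)


def plan_upload(size_bytes, part_size=MULTIPART_THRESHOLD):
    """Return ("single", 1) or ("multipart", part_count) for an object size.

    Follows the recommendation, not the hard limit: anything over 100 MB is
    planned as multipart even though a single PUT works up to 5 GB.
    """
    if size_bytes < 0 or size_bytes > MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 0 and 5 TB")
    if size_bytes <= MULTIPART_THRESHOLD:
        return ("single", 1)
    # Grow the part size if needed so the upload fits within MAX_PARTS parts.
    part_size = max(part_size, MIN_PART_SIZE, ceil_div(size_bytes, MAX_PARTS))
    return ("multipart", ceil_div(size_bytes, part_size))


print(plan_upload(50 * MIB))  # ('single', 1)
print(plan_upload(1 * GIB))   # ('multipart', 11)
```

With 100 MiB parts, a 1 GiB file needs 11 parts; a 5 TB object forces a larger part size so that the count stays at or under 10,000.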
What is a key?
The unique name of an object within a bucket. It looks like a file path but is NOT one: S3 is a flat object store with no real directories, so the key is simply the name of the record in that store.
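To see why a key only looks like a path, the sketch below mimics in plain Python how a listing with a prefix and delimiter groups flat keys into apparent folders. The function list_level and the sample keys are hypothetical; no real S3 call is made.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Group flat keys the way a prefix + delimiter listing presents them.

    Returns (objects, common_prefixes): keys directly under the prefix, and
    the 'folder-like' prefixes one level below it.
    """
    objects, common_prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter is reported once as a prefix.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(common_prefixes)


# Hypothetical keys: each one is a complete object name, not a nested path.
keys = ["logs/2024/app.log", "logs/2024/db.log", "logs/readme.txt", "index.html"]

print(list_level(keys))           # (['index.html'], ['logs/'])
print(list_level(keys, "logs/"))  # (['logs/readme.txt'], ['logs/2024/'])
```

The "folders" exist only as a grouping of names at listing time; nothing is stored for them.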
What are S3 storage classes and what is the purpose of each?
- Standard: frequently accessed data
- Standard-IA: long-lived, infrequently accessed data
- One Zone-IA: long-lived, infrequently accessed, non-critical data
- Reduced redundancy: frequently accessed, non-critical data
- Intelligent-Tiering: long-lived data with changing or unknown access patterns
- Glacier: long-term data archiving with retrieval times ranging from minutes to hours
- Glacier Deep Archive: long-term data archiving with retrieval times within 12 hours
What is intelligent tiering?
Automatically moves objects between access tiers (frequent and infrequent) based on how often they are accessed, not on the type of data
Does intelligent tiering add to the cost?
Yes, there is a small per-object monitoring and automation fee, but it is usually outweighed by the savings from moving data to lower-cost tiers
What is intelligent tiering archive?
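An optional part of Intelligent-Tiering that automatically moves objects into archive tiers (Archive Access and Deep Archive Access, comparable to Glacier and Glacier Deep Archive) once they have not been accessed for a certain period of time. This is NOT lifecycle management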
What is S3 Lifecycle Management?
It applies rules that transition objects to other storage classes, or expire (delete) them, after a set period of time.
What are the benefits of S3 Lifecycle Management?
- Optimize storage costs
- Adhere to data retention policies using automation
- Helps keep S3 buckets well-maintained
- Data destruction is one of the more difficult tasks, and lifecycle rules help provide clarity and enforcement
What are the Lifecycle rules based on?
- Prefixes
- Tags
- Current vs previous versions
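A lifecycle rule that uses all three criteria (prefix, tag, and current vs previous versions) might look like the sketch below. The rule ID, prefix, tag, and day counts are made-up illustration values, and the helper build_lifecycle_rule is hypothetical. The dictionary follows the shape of the S3 lifecycle configuration, which with boto3 would be passed as LifecycleConfiguration to put_bucket_lifecycle_configuration; no call is made here.

```python
import json


def build_lifecycle_rule(rule_id, prefix, tag_key, tag_value):
    """Build one lifecycle rule filtered by prefix AND tag (illustrative values)."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        # Rule applies only to objects matching both the prefix and the tag.
        "Filter": {
            "And": {
                "Prefix": prefix,
                "Tags": [{"Key": tag_key, "Value": tag_value}],
            }
        },
        # Current versions: step down to cheaper classes, then delete.
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": 365},
        # Previous (noncurrent) versions get their own schedule.
        "NoncurrentVersionTransitions": [
            {"NoncurrentDays": 30, "StorageClass": "GLACIER"}
        ],
        "NoncurrentVersionExpiration": {"NoncurrentDays": 180},
    }


config = {
    "Rules": [build_lifecycle_rule("archive-logs", "logs/", "retention", "standard")]
}
print(json.dumps(config, indent=2))
```

Keeping the rule as data makes it easy to review the retention schedule before it is applied to a bucket.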
What is storage class analysis?
A tool within S3 that helps you ensure you are using your storage in the most cost-effective manner. You can run reports to view how frequently data is accessed and then potentially move it to a different storage class.
What is the “Requester Pays” cost option?
The requester rather than the bucket owner pays for requests AND data transfer
Does S3 support tagging?
Yes, you can assign tags to objects for use in cost allocation, billing, security, etc.
What is an S3 Event?
This occurs when an action is taken on a bucket or object and triggers notifications to SNS, SQS, or Lambda
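When the target is Lambda, the function receives a JSON event describing what happened. The sketch below parses such an event; the handler name, bucket, and key are hypothetical, and the sample event is trimmed to only the fields used. Object keys arrive URL-encoded in these notifications, so they are decoded before use.

```python
from urllib.parse import unquote_plus


def handler(event, context=None):
    """Extract (event name, bucket, key) from each record of an S3 event."""
    results = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        results.append({
            "event": record["eventName"],
            "bucket": s3["bucket"]["name"],
            # Keys are URL-encoded in the notification (spaces become '+').
            "key": unquote_plus(s3["object"]["key"]),
        })
    return results


# Trimmed, hypothetical sample of an "object created" notification.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "reports/q1+summary.csv", "size": 1024},
            },
        }
    ]
}

print(handler(sample_event))
```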
What is transfer acceleration?
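Speeds up long-distance data uploads by routing them through the nearest CloudFront edge location (PoP) and then over the AWS network to the bucket, in effect using CloudFront in reverse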
How is S3 secured?
- Resource-based through object ACLs and bucket policies
- User-based through IAM policies
- Optional multi-factor authentication before delete
- Through encryption at rest and in transit
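As one example of a resource-based control, a bucket policy can enforce encryption in transit by denying any request that does not use TLS. The sketch below only builds the policy document; the bucket name and the function deny_insecure_transport_policy are hypothetical, and attaching the policy to a bucket is left out.

```python
import json


def deny_insecure_transport_policy(bucket):
    """Bucket policy that denies every request not made over HTTPS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # aws:SecureTransport is "false" for plain-HTTP requests.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy = deny_insecure_transport_policy("example-bucket")
print(json.dumps(policy, indent=2))
```

An explicit Deny overrides any Allow from IAM policies or ACLs, which is why this pattern works regardless of the user-based permissions in place.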
How is MFA used in S3?
- Safeguards against accidental deletion of an object
- Safeguards against changing the versioning state of your bucket
What are the options for encryption at rest?
- SSE-S3: S3 encrypts objects with AES-256 using keys that S3 itself creates and manages
- SSE-C: Supply your own AES-256 key with the request; S3 uses it to encrypt the object as it is written and does not store the key
- SSE-KMS: Use a key generated and managed by AWS KMS
- Client-Side: Encrypt objects using your own local encryption process (for example PGP or GPG) before uploading to S3
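In practice the four options differ mainly in which extra parameters accompany the upload request. The sketch below maps each option to request parameters; the function encryption_params is hypothetical, and the parameter names are those of boto3's put_object as I understand them, so verify them against the SDK documentation before relying on this.

```python
def encryption_params(mode, kms_key_id=None, customer_key=None):
    """Return the extra upload parameters for a given encryption-at-rest option."""
    if mode == "SSE-S3":
        # S3 manages the key; only the algorithm is named.
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        params = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:
            # Without a key ID, the AWS-managed KMS key for S3 is used.
            params["SSEKMSKeyId"] = kms_key_id
        return params
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 32-byte (256-bit) key")
        # The same key must be sent again on every later read of the object.
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    if mode == "Client-Side":
        # Data is already encrypted locally; S3 needs no extra parameters.
        return {}
    raise ValueError(f"unknown encryption mode: {mode}")


print(encryption_params("SSE-S3"))
print(encryption_params("SSE-KMS", kms_key_id="alias/example-key"))  # hypothetical key alias
```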
What is PGP?
Pretty Good Privacy (PGP) is an encryption program that provides cryptographic privacy and authentication for data communication. PGP is used for signing, encrypting, and decrypting texts, e-mails, files, directories, and whole disk partitions and to increase the security of e-mail communications. Phil Zimmermann developed PGP in 1991.
PGP encryption uses a serial combination of hashing, data compression, symmetric-key cryptography, and finally public-key cryptography; each step uses one of several supported algorithms. Each public key is bound to a username or an e-mail address.
What is GPG?
GNU Privacy Guard (GnuPG or GPG) is a free-software replacement for Symantec’s PGP cryptographic software suite.
GnuPG is a hybrid-encryption software program because it uses a combination of conventional symmetric-key cryptography for speed, and public-key cryptography for ease of secure key exchange, typically by using the recipient’s public key to encrypt a session key which is used only once.
Why would you use PGP vs GPG?
Algorithms: PGP historically relied on RSA and the IDEA cipher, whereas GPG uses openly standardized algorithms such as the NIST-standardized AES.
Licensing: PGP has restrictions on personal and commercial use, whereas GPG can be downloaded for free and used for both personal and commercial purposes.
Ownership: PGP is a proprietary solution owned by Symantec, whereas GPG is open source.