S3 Flashcards
S3 Object Lock
** Exam Tips ****
You can use S3 Object Lock to store objects using a write once, read many (WORM) model. It can help you prevent objects from being deleted or modified for a fixed amount of time or indefinitely.
You can use S3 Object Lock to add an extra layer of protection against object changes and deletion.
S3 Object Lock comes in 2 modes:
- Governance mode: Users can’t overwrite or delete an object version or alter its lock settings unless they have special permissions. With governance mode you protect objects against being deleted by most users, but you can still grant some users permission to alter the retention settings or delete the object if necessary.
- Compliance mode: A protected object version can’t be overwritten or deleted by any user, including the root user in your AWS account. Compliance mode ensures the object version can’t be overwritten or deleted for the duration of the retention period.
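The difference between the two modes can be sketched as a small decision function. This is only an illustrative model, not an AWS API: `LockedVersion`, `can_delete`, and the `has_bypass_permission` flag are hypothetical names, with the flag standing in for the special permission (`s3:BypassGovernanceRetention`) mentioned for governance mode.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class LockedVersion:
    """Hypothetical stand-in for an object version under Object Lock."""
    mode: str               # "GOVERNANCE" or "COMPLIANCE"
    retain_until: datetime  # end of the retention period


def can_delete(version, now, has_bypass_permission=False):
    """Return True if deleting this object version would be allowed."""
    if now >= version.retain_until:
        return True  # retention period is over, the lock no longer applies
    if version.mode == "GOVERNANCE":
        # Only users granted the special bypass permission may delete.
        return has_bypass_permission
    # COMPLIANCE: no user, not even the root user, can delete early.
    return False


until = datetime(2030, 1, 1, tzinfo=timezone.utc)
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
governed = LockedVersion("GOVERNANCE", until)
compliant = LockedVersion("COMPLIANCE", until)
```

With these values, `can_delete(governed, now, has_bypass_permission=True)` is allowed, while `can_delete(compliant, now, has_bypass_permission=True)` is still refused until the retention date passes.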
Glacier Vault lock
** Exam Tips ****
S3 Glacier Vault Lock allows you to easily deploy and enforce compliance controls for individual S3 Glacier vaults with a Vault Lock policy. You can specify controls such as WORM in a Vault Lock policy and lock the policy from future edits. Once locked, the policy can no longer be changed.
S3 performance
1. S3 prefixes
2. S3 limitations when using KMS
3. Multipart Uploads
4. S3 Byte-Range Fetches (Downloads)
** Exam Tips ****
- S3 Prefixes:
mybucketname/folder1/subfolder1/image.jpg > /folder1/subfolder1 is the prefix. The prefix is simply the part of the path between the bucket name and the object name.
You can achieve a high number of requests: 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix.
You can get better performance by spreading your reads across different prefixes. For example, if you are using two prefixes you can achieve 11,000 GET/HEAD requests per second (2 x 5,500).
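The prefix definition and the per-prefix arithmetic above can be checked with a few lines of Python. This is a minimal sketch: `split_prefix` and `max_read_rps` are hypothetical helper names, and the only numbers used are the per-prefix limits quoted in these notes.

```python
GET_LIMIT_PER_PREFIX = 5500  # GET/HEAD requests per second per prefix
PUT_LIMIT_PER_PREFIX = 3500  # PUT/COPY/POST/DELETE requests per second per prefix


def split_prefix(path):
    """Split 'bucket/some/prefix/object' into (bucket, prefix, object_name)."""
    bucket, _, key = path.partition("/")
    prefix, _, name = key.rpartition("/")
    # Keep the leading slash to match the notation used in the notes.
    return bucket, ("/" + prefix if prefix else ""), name


def max_read_rps(num_prefixes):
    """Aggregate GET/HEAD ceiling when reads are spread evenly over prefixes."""
    return num_prefixes * GET_LIMIT_PER_PREFIX


bucket, prefix, name = split_prefix("mybucketname/folder1/subfolder1/image.jpg")
print(prefix)           # /folder1/subfolder1
print(max_read_rps(2))  # 11000
```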
- S3 limitations when using KMS:
If you are using SSE-KMS to encrypt your objects in S3, you must keep in mind the KMS limits.
When you upload a file, you will call GenerateDataKey in the KMS API.
When you download a file, you will call Decrypt in the KMS API.
* Uploading/downloading will count toward the KMS quota
* The quota is Region-specific; it is either 5,500, 10,000, or 30,000 requests per second
* Currently you cannot request a quota increase for KMS
- Multipart Uploads:
* Recommended for files over 100 MB
* Required for files over 5 GB
* Parallelize uploads (increases efficiency)
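As a rough illustration of the thresholds above, the sketch below decides which upload style applies and cuts a file into parts that could be uploaded in parallel. The function names and the 100 MB default part size are assumptions made for the example, not values prescribed by S3.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def upload_strategy(size_bytes):
    """Apply the thresholds from the notes: over 100 MB recommended, over 5 GB required."""
    if size_bytes > 5 * GB:
        return "multipart (required)"
    if size_bytes > 100 * MB:
        return "multipart (recommended)"
    return "single PUT"


def plan_parts(size_bytes, part_size=100 * MB):
    """Return (part_number, start, end_exclusive) tuples covering the whole file."""
    parts = []
    start = 0
    number = 1
    while start < size_bytes:
        end = min(start + part_size, size_bytes)  # last part may be smaller
        parts.append((number, start, end))
        start = end
        number += 1
    return parts


parts = plan_parts(250 * MB)  # three parts: 100 MB, 100 MB, 50 MB
```

Each tuple describes an independent slice of the file, which is what makes parallel uploads possible.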
- S3 Byte-Range Fetches (Downloads):
* Parallelize downloads by specifying byte ranges
* If there is a failure in the download, it's only for a specific byte range
* Can be used to speed up downloads, and also to download only part of a file (e.g., header information)
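Byte-range fetches rely on the standard HTTP `Range` header. The sketch below only computes the header values for a file of a given size; `byte_ranges` is a hypothetical helper, and actually issuing the requests (and retrying only the range that failed) is left out.

```python
def byte_ranges(total_size, chunk_size):
    """Return HTTP Range header values that together cover bytes 0..total_size-1."""
    ranges = []
    for start in range(0, total_size, chunk_size):
        # Range end positions are inclusive, hence the "- 1".
        end = min(start + chunk_size, total_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


print(byte_ranges(10, 4))     # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
print(byte_ranges(10, 4)[0])  # fetch only the first 4 bytes, e.g. a file header
```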
S3 Select and Glacier Select
** Exam Tips ****
S3 Select:
With S3 Select you can use simple SQL expressions to return only the data you are interested in from an object, instead of retrieving the entire object. This means you are dealing with an order of magnitude less data, which improves the performance of the underlying applications.
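To make the idea concrete, here is a sketch of the request parameters one might hand to boto3's `select_object_content` call. The bucket name, object key, and column names are made up for illustration, and the snippet only builds the parameter dictionary so it runs without AWS credentials.

```python
def build_select_params(bucket, key, sql):
    """Assemble keyword arguments for an S3 Select query over a CSV object."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        # Treat the first CSV row as column names so the SQL can refer to them.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


# Hypothetical bucket, key, and columns: return only the names of Seattle users.
params = build_select_params(
    "my-example-bucket",
    "data/users.csv",
    "SELECT s.name FROM S3Object s WHERE s.city = 'Seattle'",
)
# With boto3 installed and credentials configured, this could be used as:
#   boto3.client("s3").select_object_content(**params)
```

Only the matching rows and columns come back over the network, which is where the reduction in data comes from.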
Glacier Select:
Glacier Select allows you to run SQL queries against Glacier directly.
AWS organizations & Consolidated Billing
** Exam Tips ****
AWS Organizations:
AWS Organizations is an account management service that enables you to consolidate multiple AWS accounts into an organization that you create and centrally manage.
Consolidated Billing:
It aggregates the usage of all AWS accounts in the organization and bills it to the paying account.
Advantages of consolidated billing:
One bill for multiple AWS accounts
Very easy to track charges and allocate costs
Volume pricing discount
Enable/disable AWS services using service control policies (SCPs), applied either to an OU or to an individual account
S3 Cross account access:
Cross Region Replication
***** Exam Tips*********
S3 Cross account access:
3 different ways to share S3 buckets across accounts:
- Using bucket policies & IAM (applies across the entire bucket). Programmatic access only.
- Using bucket ACLs & IAM (individual objects). Programmatic access only.
- Cross-account IAM roles. Programmatic and console access.
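The first option (a bucket policy that names another account) could look roughly like the policy built below. The bucket name and the 12-digit account ID are placeholders, and the chosen actions are just one plausible read-only set.

```python
import json


def cross_account_read_policy(bucket, account_id):
    """Build a bucket policy letting another AWS account list and read objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CrossAccountRead",
                "Effect": "Allow",
                # The other account; its own IAM policies must also allow access.
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # needed for s3:ListBucket
                    f"arn:aws:s3:::{bucket}/*",  # needed for s3:GetObject
                ],
            }
        ],
    }


# Placeholder bucket name and account ID, for illustration only.
policy = cross_account_read_policy("my-example-bucket", "111122223333")
policy_json = json.dumps(policy, indent=2)
```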
Cross Region Replication :
- When cross-region replication is enabled on the source bucket and we upload an object to it, we have to make the object public to access it.
Once the same object is replicated to the destination bucket, do we need to make it public in the destination bucket as well? YES
- Versioning must be enabled on both the source and destination buckets.
- Files in an existing bucket are not replicated automatically.
- All subsequently updated files will be replicated automatically.
- Delete markers are not replicated.
- Deleting individual versions or delete markers will not be replicated.
S3 Transfer acceleration
**** Exam Tips ******
S3 Transfer Acceleration utilizes the CloudFront edge network to accelerate your uploads to S3.
Instead of uploading directly to your S3 bucket, you can use a distinct URL to upload directly to an edge location, which will then transfer that file to S3.
You will get a distinct URL to upload to:
bucket-name.s3-accelerate.amazonaws.com
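A small helper shows how the accelerate endpoint is formed from a bucket name and an object key. `accelerate_url` is a hypothetical function name; the check for periods reflects the rule that bucket names containing periods cannot be used with Transfer Acceleration.

```python
from urllib.parse import quote


def accelerate_url(bucket, key):
    """Build the Transfer Acceleration URL for an object in the given bucket."""
    if "." in bucket:
        raise ValueError("Transfer Acceleration needs a bucket name without periods")
    # quote() percent-encodes characters such as spaces but keeps "/" separators.
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{quote(key)}"


print(accelerate_url("bucket-name", "videos/intro clip.mp4"))
# https://bucket-name.s3-accelerate.amazonaws.com/videos/intro%20clip.mp4
```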
AWS datasync
** Exam Tips **
AWS DataSync allows you to move large amounts of data into AWS.
In an on-premises data center, we install the AWS DataSync agent on a server and connect it to our NAS or file system to copy data to AWS and write data from AWS.
DataSync automatically encrypts data and accelerates transfer over the WAN. DataSync performs data integrity checks in transit and at rest.
Exam Tips:
- Used to move large amounts of data from on-premises to AWS
- Used with NFS- and SMB-compatible file systems
- Replication can be done hourly, daily, or weekly
- Install the DataSync agent to start the replication
- Can be used to replicate EFS to EFS
CloudFront
Cloudfront Signed URL’s & Cookies and S3 signed URL’s
***** Exam Tips *******
CloudFront is a global service.
A content delivery network (CDN) is a system of distributed servers (a network) that delivers webpages and other web content to a user based on the geographic location of the user, the origin of the webpage, and a content delivery server.
Edge location: This is the location where content will be cached. It is separate from an AWS Region/AZ.
Origin: This is the origin of all the files that the CDN will distribute. It can be an S3 bucket, an EC2 instance, an Elastic Load Balancer, or Route 53.
Distribution: This is the name given to the CDN, which consists of a collection of edge locations.
Amazon CloudFront can be used to deliver your entire website, including dynamic, static, streaming, and interactive content, using a global network of edge locations. Requests for your content are automatically routed to the nearest edge location, so content is delivered with the best possible performance.
- Web distribution: Typically used for websites.
- RTMP: Used for media streaming.
Edge locations are not just read-only; we can also write to them (i.e., put an object to them). Example: S3 Transfer Acceleration.
You can clear cached objects (invalidate them), but you will be charged.
Cloudfront Signed URL’s & Cookies and S3 signed URL’s :
- A signed URL is for individual files.
1 file = 1 URL
If we want a user to access one single file, we use a signed URL.
- A signed cookie is for multiple files.
1 cookie = multiple files
If we want the user to access all the lectures, we use a signed cookie.
- When we create a signed URL or signed cookie, we attach a policy, and the policy can include:
URL expiration
IP address ranges
Trusted signers (which AWS accounts can create signed URLs)
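The policy attached to a signed URL or cookie is a small JSON document. The sketch below builds a custom policy with an expiration time and an optional IP range; the distribution domain, path, and values are placeholders, and signing the policy with the key pair is deliberately not shown.

```python
import json


def custom_policy(resource_url, expires_epoch, ip_range=None):
    """Build a CloudFront custom policy: which URL, until when, from which IPs."""
    condition = {"DateLessThan": {"AWS:EpochTime": expires_epoch}}
    if ip_range:
        condition["IpAddress"] = {"AWS:SourceIp": ip_range}
    return {"Statement": [{"Resource": resource_url, "Condition": condition}]}


# Placeholder distribution domain and values; the wildcard covers many files,
# which is the typical signed-cookie case (e.g. all the lectures).
policy = custom_policy(
    "https://d111111abcdef8.cloudfront.net/lectures/*",
    expires_epoch=1767225600,
    ip_range="192.0.2.0/24",
)
# Serialize compactly (no extra whitespace) before it is signed.
policy_json = json.dumps(policy, separators=(",", ":"))
```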
How signed URLs work:
The CloudFront URL sits in front of the backend origin (an S3 bucket or EC2 instance). For an S3 origin, CloudFront uses an OAI (origin access identity) to access the origin.
Cloud front signed URL’s Feature:
- can have different origins. Does not have to be EC2
- key-pair is account wide and managed by the root user
- can utilize caching features
- can filter by date, IP address, expiration, path
S3 Signed URL’s:
Issues a request as the IAM user who creates the presigned URL.
If users access the S3 bucket directly (rather than through CloudFront), use S3 signed URLs.
Snowball
Snowball Edge
Snowmobile
**** Exam Tips *****
Snowball:
Snowball is a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of AWS. Using Snowball addresses common challenges with large-scale data transfers, including high network costs, long transfer times, and security concerns.
Snowball comes in either a 50 TB or 80 TB size.
Snowball uses multiple layers of security designed to protect your data, including tamper-resistant enclosures, 256-bit encryption, and an industry-standard Trusted Platform Module (TPM) designed to ensure both security and full chain of custody.
Snowball Edge:
Snowball Edge is a 100 TB data transfer device with on-board storage and compute capabilities.
Snowmobile:
AWS Snowmobile is an exabyte-scale data transfer service used to move extremely large amounts of data to AWS. You can transfer up to 100 PB per Snowmobile.
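For a sense of scale, the capacities above can be turned into a quick device count. This is back-of-the-envelope arithmetic only: it uses the nominal sizes from these notes, ignores usable-capacity overhead, and `devices_needed` is a hypothetical helper.

```python
def devices_needed(data_size, device_capacity):
    """Smallest whole number of devices that can hold data_size (same unit for both)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-data_size // device_capacity)


print(devices_needed(250, 80))    # 250 TB on 80 TB Snowballs -> 4
print(devices_needed(250, 100))   # 250 TB on 100 TB Snowball Edge devices -> 3
print(devices_needed(1000, 100))  # 1,000 PB (1 EB) on 100 PB Snowmobiles -> 10
```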
Storage Gateway
***** Exam Tips *********
AWS Storage Gateway is a service that connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an on-premises IT environment and AWS storage infrastructure. The service enables you to securely store data in the AWS Cloud for scalable and cost-effective storage.
The AWS Storage Gateway software appliance is available for download as a virtual machine (VM) image that you install on a host in your data center. Storage Gateway supports either VMware ESXi or Microsoft Hyper-V.
3 different types of storage gateways:
- File Gateway (NFS & SMB): Files are stored as objects in your S3 buckets.
Ownership, permissions, and timestamps are durably stored in S3 in the user metadata of the object associated with the file.
- Volume Gateway (iSCSI)
Data written to these volumes can be asynchronously backed up as point-in-time snapshots of your volumes and stored in the cloud as Amazon EBS snapshots. Snapshots are incremental backups that capture only changed blocks.
* Stored Volumes
Stored volumes let you store your primary data locally while asynchronously backing up that data to AWS. The data is asynchronously backed up to Amazon S3 in the form of Amazon Elastic Block Store (EBS) snapshots.
* Cached Volumes
Cached volumes let you use S3 as your primary data storage while retaining frequently accessed data locally in your storage gateway.
- Tape Gateway (VTL)
Tape Gateway offers a durable, cost-effective solution to archive your data in the AWS Cloud.
Athena vs Macie
*** Exam Tips *****
Athena:
Interactive query service which enables you to analyze and query data located in S3 using standard SQL
- Serverless, nothing to provision, pay per query/per TB scanned
- No need to setup complex Extract/Transform/Load(ETL) processes
- Works directly with data stored in S3
What can Athena be used for?
- Can be used to query log files stored in S3, e.g., ELB logs, S3 access logs, etc.
- Generate business reports on data stored in S3
- Analyze AWS cost and usage reports
- Run queries on clickstream data
Macie:
What is PII (personally identifiable information)?
Personal data used to establish an individual’s identity.
This data could be exploited by criminals and used in identity theft and financial fraud.
Examples: home address, email address, SSN, passport number, driver’s license number, date of birth, phone number, bank account, credit card number.
Macie is a security service that uses machine learning and NLP (natural language processing) to discover, classify, and protect sensitive data stored in S3.
- Uses AI to recognize whether your S3 objects contain sensitive data such as PII
- Dashboards, reporting, and alerts
- Works directly with data stored in S3
- Can also analyze CloudTrail logs
- Great for PCI-DSS and preventing ID theft
S3 & IAM Summary
Read the exam tips and FAQs before the exam.