Performance Efficiency - Storage performance options Flashcards
1
Q
Storage considerations
A
- Sharing needs
- Latency needs
- Throughput
- Persistent or volatile
- File size
- Access method: sequentially or randomly
- Availability
- Durability
- Frequency of access and update
2
Q
ACID
A
- Atomicity: either all of the transaction succeeds or none of it does
- Consistency:
  - All data will be valid according to all defined rules. Also refers to a query returning the same result every time the same request is made
  - Strong consistency means the latest data is always returned, but it may come with higher latency
  - With eventual consistency, reads may briefly return stale data after a write, but results are returned faster, with lower latency
- Isolation: all transactions will occur in isolation. No transaction will be affected by any other transaction
- Durability: once a transaction is committed, it will remain in the system – even if there’s a system crash immediately following the transaction
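Atomicity can be seen in a small experiment. The sketch below uses Python's built-in sqlite3 module; the accounts table, the names, and the amounts are made up for illustration. The second transfer credits one account and then fails on the debit, so the credit is rolled back as well.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint when funds are short,
            # which also undoes the credit above.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    ok = transfer(conn, "alice", "bob", 30)       # succeeds
    failed = transfer(conn, "alice", "bob", 500)  # rolled back entirely
    balances = dict(conn.execute("SELECT name, balance FROM accounts"))
    return ok, failed, balances
```

After demo() runs, the balances reflect only the first transfer: none of the failed transaction's changes survive.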
3
Q
S3 - Characteristics 1
A
- Offers shared access, low latency, high throughput, durability and data availability. Doesn’t support NFS
- Provides encryption, access management, lifecycle management, and query-in-place (SQL like query capability)
- S3 Reduced Redundancy is a storage option for non-critical, reproducible data with lower durability than S3 Standard (a legacy class; AWS now recommends S3 Standard instead)
- S3 Transfer Acceleration enables fast and secure transfers of files over long distances between a client and S3. It takes advantage of CloudFront’s edge locations
4
Q
S3 - Characteristics 2
A
- Objects are private by default. A single object can be up to 5 TB in size
- S3 Object Lock lets you store objects using a write-once-read-many (WORM) model. It prevents objects from being deleted or overwritten for a fixed amount of time or indefinitely
- S3 is a public service, so it doesn’t live inside a VPC. To connect privately from a VPC, use a VPC gateway endpoint
- Lifecycle rules can be configured to abort multipart uploads that have not completed within a specific time period
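As an illustration of the multipart-upload lifecycle rule, here is a minimal sketch that builds such a configuration as a Python dictionary in the shape boto3 expects. The rule ID, the 7-day value, and the bucket name in the commented usage are assumptions chosen for the example.

```python
def abort_incomplete_uploads_rule(days, prefix=""):
    """Build a lifecycle configuration that aborts stale multipart uploads."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": f"abort-incomplete-mpu-after-{days}d",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix covers the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


# Possible usage with boto3 (assumes credentials and a bucket named "example-bucket"):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket",
#     LifecycleConfiguration=abort_incomplete_uploads_rule(7),
# )
```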
5
Q
S3 - Storage classes / tiers 1
A
- Standard: for general-purpose storage of frequently accessed data
- Intelligent tiering:
  - For unknown or changing access frequency. Doesn’t require a minimum storage duration
  - Moves objects between access tiers (frequent, infrequent, and archive) based on how often they are accessed
  - Objects smaller than 128 KB can be stored, but they are not eligible for automatic tiering
6
Q
S3 - Storage classes / tiers 2
A
- Standard Infrequent Access (Standard IA):
  - For data that is accessed less frequently, but requires rapid access when needed. As durable as Standard
  - The minimum billable object size is 128 KB. Objects are charged for a minimum of 30 days
  - Has a lower storage cost than Standard, but adds a per-GB retrieval charge
- One Zone Infrequent Access (One Zone IA):
  - For the same purpose as Standard IA, but data is kept in a single AZ, so availability is lower
  - Same minimum storage duration and retrieval charges as Standard IA
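The minimum charges for the IA classes can be expressed as simple arithmetic. This sketch only computes the billable size and duration from the two minimums stated above; it deliberately contains no prices, and the function and constant names are made up for the example.

```python
MIN_BILLABLE_BYTES = 128 * 1024  # 128 KB minimum billable object size
MIN_BILLABLE_DAYS = 30           # 30-day minimum storage charge


def billable_usage(size_bytes, days_stored):
    """Return (billable_bytes, billable_days) for one object in an IA class."""
    return (
        max(size_bytes, MIN_BILLABLE_BYTES),
        max(days_stored, MIN_BILLABLE_DAYS),
    )
```

For example, a 10 KB object deleted after 5 days is still billed as 128 KB for 30 days, which is why IA classes suit larger, longer-lived objects.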
7
Q
S3 - Storage classes / tiers 3
A
- Glacier
- Glacier Deep Archive
- Outposts:
  - Delivers object storage to an on-premises AWS Outposts environment
  - Ideal for workloads with local data residency requirements, and for performance needs that require keeping data close to on-premises applications
8
Q
S3 - Glacier Characteristics
A
- Provides Archival storage. Minimum storage time is 90 days
- Offers encryption, access control, and audit logging
- Can query stored data without retrieval
- Retrieval options (data can be retrieved in anywhere from 1 minute to 12 hours):
  - Expedited: premium price, data available in minutes
  - Standard: data available in hours
  - Bulk: the cheapest option, but data takes the longest to become available
9
Q
S3 - Glacier Deep Archive characteristics
A
- Very economical
- Minimum storage time is 180 days
- Intended for data that is accessed once or twice a year; data can be retrieved within 12 hours
10
Q
S3 - Organizing objects using prefixes
A
- A prefix is a logical grouping of the objects in a bucket. It’s similar to a directory name
- A delimiter lets a list request collect all the keys that share a common prefix into a single summary entry
- The prefix and delimiter parameters help to organize and then browse the keys hierarchically
- For example, if you store information about cities, you might organize them by country, then by province or state. You might use ‘/’ as the delimiter. So ‘Europe/France/Nouvelle-Aquitaine/Bordeaux’ and ‘North America/Canada/Quebec/Montreal’ would be some of the instances
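The way prefix and delimiter produce a folder-like listing can be sketched in plain Python. This is a simplified model of the grouping idea, not the S3 API itself; the function name and the extra Lille key are assumptions added for illustration.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Return (objects, common_prefixes) found directly under `prefix`."""
    objects, common = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Deeper keys are rolled up into one summary entry per "folder".
            common.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(common)


KEYS = [
    "Europe/France/Nouvelle-Aquitaine/Bordeaux",
    "Europe/France/Hauts-de-France/Lille",  # hypothetical extra key
    "North America/Canada/Quebec/Montreal",
]
```

Calling list_level(KEYS) returns only the two continent prefixes, while list_level(KEYS, "Europe/France/") returns the two regions, which is how a client can browse the keys one level at a time.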
11
Q
S3 - Presigned URLs
A
- A presigned URL gives access to the object identified in the URL, as long as the creator of the presigned URL has permissions to access that object
- Useful if you need a user / customer to be able to access / upload a specific object, but you don’t require them to have AWS security credentials or permissions
- Creating a presigned URL requires the following: security credentials, bucket name, object key, HTTP method (GET to download, PUT to upload), and an expiration date and time
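A minimal sketch of presigned URL creation with boto3's generate_presigned_url follows. The wrapper function and its defaults are assumptions for illustration; note that boto3 takes the expiration as a number of seconds rather than a date and time, and the credentials come from the client that is passed in.

```python
def make_presigned_url(s3_client, bucket, key, method="GET", expires_in=3600):
    """Create a presigned URL for downloading (GET) or uploading (PUT) one object.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    operations = {"GET": "get_object", "PUT": "put_object"}
    if method not in operations:
        raise ValueError(f"unsupported method: {method}")
    return s3_client.generate_presigned_url(
        ClientMethod=operations[method],
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # lifetime in seconds
    )


# Possible usage (assumes boto3 is installed and credentials are configured;
# the bucket and key are placeholders):
# import boto3
# url = make_presigned_url(boto3.client("s3"), "example-bucket", "reports/q1.pdf")
```

The URL only works if the credentials used to sign it are allowed to perform the underlying operation on that object.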
12
Q
S3 - CORS
A
- First S3 receives a preflight request from a browser, then evaluates the CORS configuration for the bucket and uses the first CORS Rule that matches the incoming browser request to enable a cross-origin request
- A CORS configuration is a document with rules that identify the origins to be allowed, the operations (HTTP methods) supported, and other specific operation information. JSON or XML can be used to describe rules
13
Q
EBS - Characteristics 1
A
- A volume is used by attaching it to an EC2 instance in the same AZ. With Multi-Attach, a Provisioned IOPS SSD volume can be attached to as many as 16 instances, but only within the same AZ (not across AZs)
- Data is replicated automatically within the volume’s AZ. Also provides access control and encryption
- Snapshots:
  - They’re stored in S3 and tied to the region where they were created
  - New volumes can be created from a snapshot and attached to other EC2 instances
  - They can be copied across regions
  - The EBS fast snapshot restore feature charges for each snapshot and each enabled AZ
14
Q
EBS - Characteristics 2
A
- Elastic Volumes is a feature that lets you increase a volume’s size, change its type, or adjust its performance without detaching it. Volume size cannot be decreased, and you pay for the provisioned capacity, not for what you use
- RAID arrays let you combine multiple EBS volumes to improve performance or redundancy
  - Use RAID 0 when performance is more important than fault tolerance
  - Use RAID 1 when fault tolerance is more important than performance
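The RAID 0 versus RAID 1 trade-off can be sketched as arithmetic over identical volumes. The figures are ideal upper bounds under the stated assumptions (equal volumes, no instance-level limits), and the function name and return fields are made up for the example.

```python
def raid_summary(level, count, size_gib, iops):
    """Ideal capacity, write IOPS, and fault tolerance for identical volumes."""
    if count < 2:
        raise ValueError("a RAID array needs at least two volumes")
    if level == 0:
        # Striping: capacity and performance add up, but any failure loses the array.
        return {
            "usable_gib": count * size_gib,
            "write_iops": count * iops,
            "tolerated_failures": 0,
        }
    if level == 1:
        # Mirroring: every volume holds a full copy, so writes are not faster.
        return {
            "usable_gib": size_gib,
            "write_iops": iops,
            "tolerated_failures": count - 1,
        }
    raise ValueError("only RAID 0 and RAID 1 are modelled here")
```

For two 500 GiB volumes at 4,000 IOPS each, RAID 0 yields 1,000 GiB and up to 8,000 IOPS with no fault tolerance, while RAID 1 yields 500 GiB and 4,000 write IOPS but survives the loss of one volume.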
15
Q
EBS - Storage options
A
- HDD (throughput is the dominant attribute, measured in MB/s):
  - Throughput optimized HDD: designed for frequently accessed, throughput-intensive workloads. Throughput: maximum 500 MB/s per volume
  - Cold HDD: for less frequently accessed data at the lowest cost. Throughput: maximum 250 MB/s per volume
- SSD (IOPS is the dominant attribute):
  - General purpose SSD: a balance of price and performance. Recommended for most workloads. IOPS: maximum 16,000 IOPS per volume
  - Provisioned IOPS SSD: highest performance, for mission-critical, low-latency, or high-throughput workloads. IOPS: maximum 256,000 IOPS per volume
- Previous generation: HDD that can be used for workloads with small datasets where data is accessed infrequently