Storage Flashcards
What is the difference between EBS, EFS, and object storage?
- EBS can be attached to only one EC2 instance at a time (except io1/io2 volumes using Multi-Attach).
- EFS can be shared across multiple EC2 instances.
- EBS: mountable, bootable
- EFS: mountable, not bootable
- Object storage: a collection of objects; not mountable, not bootable
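What is the consistency of S3 storage?
Since December 2020, S3 offers strong read-after-write consistency for all operations: creates (PUTs of new objects), overwrites, deletes, and LIST. Before that, it offered read-after-write consistency for new objects and eventual consistency for overwrites and deletes.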
Types of EBS storage?
- General Purpose SSD: gp2, gp3
- Provisioned IOPS SSD: io1, io2
- Throughput Optimized HDD: st1
- Cold HDD: sc1
- Relative cost per GB: io1/io2 > gp2/gp3 > st1 > sc1 (sc1 is the cheapest)
What are the 3 retrieval options for getting data out of Glacier?
Expedited (1-5 minutes), Standard (3-5 hours), and Bulk (5-12 hours)
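What is the difference between HA and fault tolerant in the context of deploying to AZs?
HA just barely meets the SLA: the service keeps running, but losing an AZ can leave it below the required capacity until it recovers. Fault tolerant fully meets the SLA even when an entire AZ fails. For example, if a minimum of 4 servers is required for the SLA, 4 servers spread over 2 AZs is HA, but 8 servers spread over 2 AZs is fault tolerant, because losing either AZ still leaves 4.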
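The sizing rule behind the 4-vs-8 example can be sketched as a small calculation. This assumes servers are spread evenly across AZs and that only one AZ fails at a time; the function names are hypothetical.

```python
import math

def servers_per_az_for_fault_tolerance(min_servers: int, num_azs: int) -> int:
    """Servers needed in each AZ so that losing any one AZ still leaves min_servers."""
    if num_azs < 2:
        raise ValueError("tolerating an AZ failure needs at least 2 AZs")
    # After one AZ fails, the remaining (num_azs - 1) AZs must carry the full minimum.
    return math.ceil(min_servers / (num_azs - 1))

def survives_az_failure(servers_per_az: int, num_azs: int, min_servers: int) -> bool:
    """True if the capacity left after losing one AZ still meets the SLA minimum."""
    return servers_per_az * (num_azs - 1) >= min_servers

# The flashcard example: the SLA needs 4 servers, deployed across 2 AZs.
print(survives_az_failure(2, 2, 4))              # False: 4 servers total is only HA
print(survives_az_failure(4, 2, 4))              # True: 8 servers total is fault tolerant
print(servers_per_az_for_fault_tolerance(4, 2))  # 4
print(servers_per_az_for_fault_tolerance(4, 3))  # 2
```

The last line shows why spreading across more AZs helps: with 3 AZs, 6 servers in total (2 per AZ) are enough to stay fault tolerant, instead of 8.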
If you take a snapshot every day and it takes you 10 minutes to recover an instance after a failure, what are the best achievable RTO and RPO?
RTO of 10 minutes and RPO of 1 day (up to a day's worth of data can be lost).
Relation between block size and IOPS?
block size x IOPS = throughput
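The formula above can be checked with a couple of numbers. This is a minimal sketch; the 16 KiB I/O size is only an example value, and real volumes also have their own throughput ceiling, so large I/Os can hit that limit before the IOPS limit.

```python
import math

KIB = 1024
MIB = 1024 * KIB

def throughput_bytes_per_sec(io_size_bytes: int, iops: int) -> int:
    """Throughput = I/O (block) size x IOPS."""
    return io_size_bytes * iops

def iops_needed(target_bytes_per_sec: int, io_size_bytes: int) -> int:
    """Invert the formula: I/O operations per second needed for a target throughput."""
    return math.ceil(target_bytes_per_sec / io_size_bytes)

# 16 KiB I/Os at 3,000 IOPS
print(throughput_bytes_per_sec(16 * KIB, 3_000) / MIB)  # 46.875
# 160 KiB per second using 16 KiB I/Os
print(iops_needed(160 * KIB, 16 * KIB))  # 10
```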
Talk about EBS
- allocates block storage (volumes) to instances
- 1 volume = 1 AZ, but highly available within that AZ: all data is replicated within the AZ
- different storage types: magnetic, SSD, etc.
- billed per GB-month
Dominant Performance Attribute of gp2/io1/st1/sc1?
gp2 and io1 - IOPS oriented
st1 and sc1 - throughput oriented
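Can HDD storage be used as a boot volume?
No. st1 and sc1 (HDD) volumes cannot be boot volumes; use an SSD volume (gp2/gp3, io1/io2). The only exception is the previous-generation Magnetic volume type.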
Why would you choose SSD over HDD?
- SSD is better suited for random I/O, e.g. databases
- HDD is better suited for streaming large amounts of data sequentially, e.g. log files and big data use cases
What is burst performance?
To understand burst mode, you must be aware that every gp2 volume, regardless of size, starts with a balance of 5.4 million I/O credits, enough to sustain 3,000 IOPS for 30 minutes. This means that even very small volumes start out performing well. This is ideal for “bursty” workloads, such as daily reporting and recurring extract, transform, and load (ETL) jobs. It is also good for workloads that don’t require high sustained IOPS.
How does this work? A fully used initial credit balance works out to 3,000 IOPS for 30 minutes (5.4 million credits / 3,000 IOPS = 1,800 seconds). Credits are continuously replenished at the volume’s baseline rate of 3 credits per GiB per second. Consider a daily ETL workload that uses a lot of I/O: gp2 can burst during the daily job, and during downtime the credits are replenished for the next day’s run. Now consider a workload that never needs more than the 3,000 IOPS burst rate: it will keep seeing good IOPS as long as, on average, credits are replenished faster than they are consumed.
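The credit mechanism described above behaves like a token bucket. The sketch below simulates it one second at a time; it is a simplified model, not AWS's exact accounting. Assumptions: each I/O spends one credit, refill happens at the baseline rate (3 per GiB, minimum 100, capped at the gp2 maximum baseline of 16,000), the balance never exceeds 5.4 million, and `simulate_gp2` is a hypothetical name.

```python
def simulate_gp2(size_gib: int, demand_iops: list[int],
                 credits: int = 5_400_000) -> list[int]:
    """Return the IOPS actually delivered each second for a per-second demand trace."""
    max_credits = 5_400_000
    baseline = min(16_000, max(100, 3 * size_gib))  # refill rate, credits per second
    burst_limit = max(3_000, baseline)              # ceiling while credits last
    delivered = []
    for demand in demand_iops:
        credits = min(max_credits, credits + baseline)  # replenish for this second
        ios = min(demand, burst_limit, credits)         # each I/O spends one credit
        credits -= ios
        delivered.append(ios)
    return delivered

# Hypothetical daily ETL run on a 100 GiB volume: 20 minutes at full burst, then idle.
etl = simulate_gp2(100, [3_000] * 1_200 + [0] * 600)
print(min(etl[:1_200]))  # 3000: the 20-minute burst fits within the credit balance

# Sustained demand eventually drains the bucket and falls back to the baseline.
sustained = simulate_gp2(100, [3_000] * 3_000)
print(sustained[0], sustained[-1])  # 3000 300
```

Idle seconds deliver nothing but still add baseline credits, which is what refills the bucket between daily runs.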
What are EBS Snapshots?
- backups of an EBS volume, stored in S3
- the first snapshot is a full copy of the data
- later snapshots are incremental (only the blocks changed since the previous snapshot)
- a volume can be restored from a snapshot
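The full-then-incremental idea can be sketched by modelling a volume as a map from block index to block contents. This is a hypothetical model for illustration only: real EBS tracks changed blocks itself, and this sketch ignores deleted blocks.

```python
from typing import Dict, List, Optional

Blocks = Dict[int, bytes]  # block index -> block contents (hypothetical model)

def take_snapshot(volume: Blocks, previous_state: Optional[Blocks]) -> Blocks:
    """Return only the blocks that must be stored for this snapshot.

    The first snapshot stores every block; later ones store only blocks that
    changed (or were added) since the previous snapshot.
    """
    if previous_state is None:
        return dict(volume)
    return {i: data for i, data in volume.items() if previous_state.get(i) != data}

def restore(snapshot_chain: List[Blocks]) -> Blocks:
    """Rebuild a volume by applying the full snapshot, then each increment in order."""
    volume: Blocks = {}
    for delta in snapshot_chain:
        volume.update(delta)
    return volume

vol = {0: b"boot", 1: b"data-v1", 2: b"logs"}
snap1 = take_snapshot(vol, None)    # full copy: 3 blocks
state1 = dict(vol)
vol[1] = b"data-v2"
snap2 = take_snapshot(vol, state1)  # incremental: only block 1
print(len(snap1), len(snap2))       # 3 1
print(restore([snap1, snap2]) == vol)  # True
```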
What is FSR?
Fast Snapshot Restore: makes a volume restored from a snapshot fully populated immediately; without FSR, blocks are fetched from the snapshot lazily, on first access
- up to 50 FSR per region (50 snap-to-AZs)
EBS Encryption
- uses KMS: a KMS key (CMK) generates and protects the data encryption keys (DEKs)
- accounts can be set to encrypt new volumes by default (per Region)
- each volume uses its own unique DEK
- encrypted volumes cannot be changed to unencrypted
- curveball question: the OS is not aware of the encryption; it happens between the EC2 host and the EBS volume, so there is no noticeable performance loss
Object versioning: what versioning states can an S3 bucket be in?
Versioning can be “disabled” (the default)
It can be “enabled”, but once enabled it can never be disabled again
It can be “suspended”, and can then be re-enabled
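The three states and their allowed moves form a small state machine. This sketch encodes the rules from the card; the function name is hypothetical. In the real S3 API, the versioning `Status` can only be set to `Enabled` or `Suspended`; "disabled" is simply the initial state of a bucket that has never had versioning configured.

```python
# Allowed versioning transitions, as described on the card above.
ALLOWED_TRANSITIONS = {
    "disabled": {"enabled"},    # new buckets start unversioned
    "enabled": {"suspended"},   # once enabled, a bucket can never return to disabled
    "suspended": {"enabled"},   # suspended buckets can be re-enabled
}

def change_versioning(current: str, requested: str) -> str:
    """Return the new state, or raise ValueError for a transition S3 does not allow."""
    if requested not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot go from {current!r} to {requested!r}")
    return requested

state = "disabled"
for requested in ["enabled", "suspended", "enabled"]:
    state = change_versioning(state, requested)
print(state)  # enabled

try:
    change_versioning(state, "disabled")
except ValueError as err:
    print(err)  # cannot go from 'enabled' to 'disabled'
```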
What are the 3 types of S3 encryption?
- SSE-C: server-side encryption with customer-provided keys
- SSE-S3: server-side encryption with Amazon S3 managed keys
- SSE-KMS: server-side encryption with customer master keys (CMKs) stored in KMS
How does SSE-C work?
- the customer is responsible for the encryption keys
- when storing an object, you must provide the key along with the data; S3 encrypts the data and then discards the key
- S3 stores a salted HMAC of the key with the object; this is one-way, so the key cannot be recovered from it
- when retrieving the object, you provide the same key; S3 checks it against the stored HMAC and, if they match, decrypts the data and returns it
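The key-validation step can be sketched with Python's standard `hmac` module. This toy model only shows why the server can recognise the right key without storing it; the ciphertext itself is omitted. Assumptions: HMAC-SHA256 and a 16-byte random salt are illustrative choices (S3's exact algorithm is not stated here), and `SseCStore` is a hypothetical name.

```python
import hashlib
import hmac
import os

def key_fingerprint(customer_key: bytes, salt: bytes) -> bytes:
    """Salted HMAC of the customer key (HMAC-SHA256 is an assumption)."""
    return hmac.new(salt, customer_key, hashlib.sha256).digest()

class SseCStore:
    """Toy model of SSE-C key handling; encrypting the object itself is not shown."""

    def __init__(self) -> None:
        self._objects: dict[str, tuple[bytes, bytes]] = {}  # name -> (salt, fingerprint)

    def put(self, name: str, customer_key: bytes) -> None:
        salt = os.urandom(16)
        # Only the salt and the one-way fingerprint are kept, never the key itself.
        self._objects[name] = (salt, key_fingerprint(customer_key, salt))

    def key_matches(self, name: str, customer_key: bytes) -> bool:
        salt, stored = self._objects[name]
        # Constant-time comparison of the stored and freshly computed fingerprints.
        return hmac.compare_digest(stored, key_fingerprint(customer_key, salt))

store = SseCStore()
key = os.urandom(32)  # the customer's 256-bit key
store.put("report.csv", key)
print(store.key_matches("report.csv", key))             # True
print(store.key_matches("report.csv", os.urandom(32)))  # False
```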
How does SSE-S3 work?
- S3 generates and manages a master key used for all objects
- you cannot influence this master key or change any options; it is invisible to you and rotated automatically by S3
- for every object stored in the bucket, S3 generates an encryption key, encrypts the data with it, encrypts that key with the master key, stores the encrypted key with the encrypted data, and discards the unencrypted key
- no role separation: because S3 manages the keys, an S3 administrator with read access can view the unencrypted data. This can be unacceptable in regulated industries that require separation of roles (use SSE-KMS instead)
How does SSE-KMS work?
- S3 uses a CMK in KMS that is created for you, or a customer-managed CMK that you created in KMS
- for every object, a DEK is generated under the CMK; the data is encrypted with the DEK, and the encrypted DEK and the encrypted data are stored together
- for decryption, the CMK is used to decrypt the DEK, which is then used to decrypt the data
- the decrypted DEK is then discarded
- if you as a user do not have access to the CMK in KMS (or to KMS itself), you cannot decrypt the object, so role separation is achieved
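Both SSE-S3 and SSE-KMS use envelope encryption: a per-object DEK encrypts the data, and a master key encrypts the DEK. The sketch below shows that flow. It is a toy: `toy_cipher` (SHA-256 in counter mode, XORed with the data) is NOT secure and only stands in for AES-256, and `ToyKms`, `put_object`, and `get_object` are hypothetical names, not the KMS or S3 APIs. With SSE-S3 the "KMS" role is played by S3 itself, which is why there is no role separation.

```python
import hashlib
import os

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """TOY stream cipher, NOT secure. XOR is symmetric, so it both encrypts and decrypts."""
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(data):
        keystream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, keystream))

class ToyKms:
    """Hypothetical stand-in for KMS: the master key never leaves this object."""

    def __init__(self) -> None:
        self._master_key = os.urandom(32)

    def generate_data_key(self) -> tuple[bytes, bytes]:
        dek = os.urandom(32)
        return dek, toy_cipher(self._master_key, dek)  # (plaintext DEK, encrypted DEK)

    def decrypt_data_key(self, encrypted_dek: bytes) -> bytes:
        return toy_cipher(self._master_key, encrypted_dek)

def put_object(kms: ToyKms, plaintext: bytes) -> dict:
    dek, encrypted_dek = kms.generate_data_key()
    ciphertext = toy_cipher(dek, plaintext)
    del dek  # drop the plaintext DEK; only the encrypted copy is stored with the data
    return {"ciphertext": ciphertext, "encrypted_dek": encrypted_dek}

def get_object(kms: ToyKms, stored: dict) -> bytes:
    dek = kms.decrypt_data_key(stored["encrypted_dek"])  # needs access to the master key
    return toy_cipher(dek, stored["ciphertext"])

kms = ToyKms()
stored = put_object(kms, b"quarterly numbers")
print(get_object(kms, stored))  # b'quarterly numbers'
```

Without access to the same `ToyKms` instance (the master key), the stored DEK cannot be decrypted, which mirrors the role-separation point above.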
What are the S3 storage classes?
- S3 Standard: replicated across at least 3 AZs; an HTTP/1.1 200 OK response means the object has been stored durably; billed per GB-month; use for frequently accessed data
- S3 Standard-IA (Infrequent Access): 3 AZs and the same 11 9s durability, but designed for slightly lower availability (99.9% vs 99.99%); cheaper to store; data-out charges are the same as Standard, but per-request charges are higher. Compromises: a retrieval fee for every GB read, a minimum billable duration of 30 days, and a minimum billable object size of 128 KB. Use for long-lived data; don't use for temporary data
- S3 One Zone-IA: like Standard-IA but cheaper, because it is stored in only one AZ with no cross-AZ replication (data is replicated within that AZ). It has the same 11 9s durability only as long as that AZ does not fail, so there is a risk of data loss if it does. Use for long-lived, infrequently accessed, non-critical data that can easily be replaced (think: replica copies from another region, intermediate data you can afford to lose)
- S3 Glacier: 11 9s, 3 AZs, about 1/5th of the cost of Standard; retrieval options “Expedited” 1-5 minutes, “Standard” 3-5 hours, “Bulk” 5-12 hours; 40 KB minimum billable size, 90-day minimum billable duration
- S3 Glacier Deep Archive: even more restrictions; 3 AZs, 180-day minimum billable duration, 12 to 48 hours restore time
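The minimum-duration and minimum-size rules explain why IA and Glacier classes are a poor fit for small or temporary data. This sketch applies only the minimums listed on the card above (`None` means the card lists no minimum); the table and function names are hypothetical, and real pricing has more components than this.

```python
# (minimum billable days, minimum billable bytes), as listed on the card above.
MINIMUMS = {
    "STANDARD": (None, None),
    "STANDARD_IA": (30, 128 * 1024),
    "ONEZONE_IA": (30, 128 * 1024),
    "GLACIER": (90, 40 * 1024),
    "DEEP_ARCHIVE": (180, None),
}

def billable(storage_class: str, size_bytes: int, days_stored: int) -> tuple[int, int]:
    """Return (billable bytes, billable days) after applying the class minimums."""
    min_days, min_bytes = MINIMUMS[storage_class]
    return max(size_bytes, min_bytes or 0), max(days_stored, min_days or 0)

# A 10 KB object deleted after 10 days is still billed as 128 KB for 30 days.
print(billable("STANDARD_IA", 10 * 1024, 10))     # (131072, 30)
print(billable("GLACIER", 5 * 1024 * 1024, 120))  # (5242880, 120)
```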
What is S3 Select or Glacier Select?
- lets you use SQL-like SELECT statements to retrieve only part of an S3 object, reducing the bandwidth used when the object is huge
- works on objects stored as CSV, JSON, or Parquet, including GZIP- or BZIP2-compressed CSV and JSON
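A sketch of an S3 Select request using boto3's `select_object_content`. The bucket name, key, and SQL are hypothetical, and actually running `run_select` needs boto3 plus AWS credentials; the request builder is kept separate so it can be inspected without calling AWS.

```python
def build_select_request(bucket: str, key: str, sql: str) -> dict:
    """Keyword arguments for boto3's S3 client select_object_content()."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        # Gzipped CSV whose first line is a header row, so columns can be named in SQL.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "GZIP"},
        "OutputSerialization": {"CSV": {}},
    }

def run_select(request: dict) -> bytes:
    import boto3  # imported here so the request builder works without boto3 installed

    s3 = boto3.client("s3")
    response = s3.select_object_content(**request)
    chunks = []
    for event in response["Payload"]:  # an event stream of Records/Stats/End events
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks)

# Hypothetical example: pull only the failing requests out of a large log file.
request = build_select_request(
    "example-bucket",
    "logs/app.csv.gz",
    "SELECT s.path, s.status FROM S3Object s WHERE s.status = '500'",
)
# rows = run_select(request)  # requires AWS credentials and an existing object
```

Only the matching rows and columns cross the network, which is the bandwidth saving the card describes.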
What is Lifecycle Configuration?
- a set of rules on a bucket or group of objects that can take Transition actions (move objects to another storage class) or Expiration actions (delete objects)
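A sketch of one lifecycle rule in the shape boto3's `put_bucket_lifecycle_configuration` expects. The rule ID, prefix, and day counts are hypothetical example values: objects under the prefix move to Standard-IA after 30 days, to Glacier after 90 days, and are deleted after 365 days.

```python
def build_lifecycle_configuration(prefix: str) -> dict:
    """LifecycleConfiguration with one Transition + Expiration rule (example values)."""
    return {
        "Rules": [
            {
                "ID": "archive-then-expire",  # hypothetical rule name
                "Filter": {"Prefix": prefix},  # applies only to objects under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    }

config = build_lifecycle_configuration("logs/")
# Applying it (needs boto3 and AWS credentials):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=config)
```

The first transition is at 30 days because S3 requires objects to be at least 30 days old before they can transition to Standard-IA.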
What does baseline performance of gp2 mean?
Every gp2 volume has a baseline IOPS rate based on its size, with a minimum:
- there is a minimum of 100 I/O credits per second regardless of volume size
- 3 I/O credits per second per GiB, so a volume of 33.33 GiB or less only gets the 100 IOPS minimum
- every volume starts off with 5.4 million credits
gp2 can burst up to 3,000 IOPS; that is the burst rate.
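The baseline rule and the burst rate combine into a closed-form burst duration: credit balance / (burst IOPS - baseline IOPS), since credits drain at the burst rate while refilling at the baseline rate. This sketch also assumes the gp2 maximum baseline of 16,000 IOPS, which the card does not mention; the function names are hypothetical.

```python
import math

GP2_MIN_BASELINE_IOPS = 100
GP2_MAX_BASELINE_IOPS = 16_000
GP2_BURST_IOPS = 3_000
GP2_INITIAL_CREDITS = 5_400_000

def gp2_baseline_iops(size_gib: float) -> float:
    """3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000."""
    return min(GP2_MAX_BASELINE_IOPS, max(GP2_MIN_BASELINE_IOPS, 3 * size_gib))

def gp2_burst_seconds(size_gib: float, credits: float = GP2_INITIAL_CREDITS) -> float:
    """Seconds a full 3,000 IOPS burst can last from the given credit balance."""
    baseline = gp2_baseline_iops(size_gib)
    if baseline >= GP2_BURST_IOPS:
        return math.inf  # baseline already meets the burst rate, so credits never drain
    return credits / (GP2_BURST_IOPS - baseline)

print(gp2_baseline_iops(100), gp2_burst_seconds(100))      # 300 2000.0
print(gp2_baseline_iops(1_000), gp2_burst_seconds(1_000))  # 3000 inf
```

A 100 GiB volume can therefore burst for about 33 minutes from a full balance, slightly longer than the 30 minutes quoted for a volume with no refill, because it earns 300 credits per second while bursting.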
What is EFS?
Elastic File System is an implementation of NFSv4
Can be mounted on many EC2 instances at the same time
Can be mounted on Linux only (exam point)
Isolated to the VPC that it is provisioned into, but can be mounted from other VPCs with some extra steps
Access is via mount targets (one per AZ)
Performance modes: General Purpose (default) and Max I/O (scales to higher aggregate throughput, at the cost of higher latency)
Throughput modes: Bursting and Provisioned
Storage classes: Standard and IA (Infrequent Access)
Lifecycle policies can be used to move files between the classes
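What are IOPS?
Input/Output Operations Per Second: the number of read or write operations a volume performs each second
- the size of one I/O operation varies; EBS counts each SSD I/O of up to 256 KiB as one operation (HDD volumes: up to 1 MiB)
- assuming 16 KiB I/Os, transferring 160 KiB takes 10 I/O operations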