EBS Flashcards
root volume vs attached volume
data stored on the root volume is lost when the instance is terminated (default behaviour)
data on an attached volume is kept
Database should be put on attached volume
an instance can have multiple attached EBS volumes
EBS
An Elastic Block Store (EBS) volume is a network drive you can attach to your instances while they run, so your data persists independently of the instance
not a physical drive; it communicates with the instance over the network, so there may be some latency
can be detached and attached to another instance quickly, as long as both are in the same AZ
EBS and AZ
the volume is locked to AZ
to move a volume across AZs, you need to snapshot it first
EBS capacity
a volume has a provisioned capacity, so we specify the size in GB and the IOPS we want when we create it
you get billed for what you provisioned, not what you are actually using
we can increase capacity as we go along
4 types of EBS volumes
- GP2 (SSD)
- IO1 (SSD)
- ST1 (HDD)
- SC1 (HDD)
GP2
(SSD)
General purpose SSD volume that balances price and performance for a wide variety of workloads
can be used as a boot volume
IO1
highest-performance SSD volume for
- mission-critical applications
- that require sustained IOPS performance
- or more than 16000 IOPS per volume which is GP2 limit
low-latency or high-throughput workloads
can be used as a boot volume
ST1
low cost HDD volume designed for frequently accessed, throughput-intensive workloads
streaming workloads requiring consistent, fast throughput at a low price
SC1
lowest cost HDD volume designed for less frequently accessed workloads
throughput oriented storage for large volumes of data that is infrequently accessed
EBS volumes are characterized in
Size
Throughput
IOPS - I/O Ops per second
GP2 use cases
- virtual desktops
- low-latency interactive apps
- development and test environments
GP2 size
1 GiB - 16 TiB
GP2 volume IOPS
a small GP2 volume can burst to 3000 IOPS
max IOPS on GP2 is 16000
the rule is 3 IOPS per GiB. If you change the size of the volume, the baseline IOPS changes too, but never goes above 16000 (reached at about 5334 GiB)
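The 3 IOPS per GiB rule can be written as a small function. This is only a sketch: the function name is made up for illustration, and the 100 IOPS floor for very small volumes is an assumption taken from AWS documentation rather than from these notes.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS of a gp2 volume: 3 IOPS per GiB, capped at 16,000."""
    # Assumption (not in the notes): gp2 never drops below 100 IOPS.
    return max(100, min(3 * size_gib, 16_000))


for size in (100, 1_000, 5_334, 8_000):
    print(size, "GiB ->", gp2_baseline_iops(size), "IOPS")
```

Note that growing the volume beyond about 5334 GiB no longer raises the baseline.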
IO1 use cases
large database workloads
MongoDB, Cassandra, MS SQL, Oracle …
when you have a critical database
IO1 size
4 GiB - 16 TiB
IO1 volume IOPS
min 100 - max 64000 (for Nitro instances), for others 32000
IOPS is provisioned independently of size and is called PIOPS. When we change the volume size, the IOPS doesn’t change automatically with it. You need to change it yourself.
max ratio is 50 IOPS per GiB
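As a rough sketch, the 50 IOPS per GiB ratio and the per-volume ceilings above combine as follows. The function name and the `nitro` flag are illustrative, not part of any AWS API.

```python
def io1_max_iops(size_gib: int, nitro: bool = True) -> int:
    """Largest IOPS value that can be provisioned on an io1 volume of this size."""
    ceiling = 64_000 if nitro else 32_000  # per-volume limits quoted in the notes
    return min(50 * size_gib, ceiling)


print(io1_max_iops(100))                # limited by the 50:1 ratio
print(io1_max_iops(2_000))              # limited by the Nitro ceiling
print(io1_max_iops(2_000, nitro=False)) # limited by the non-Nitro ceiling
```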
ST1 usage
Big data, Data warehouses, Log processing, Apache Kafka
ST1 size
500 GiB - 16 TiB
ST1 IOPS
max 500
max throughput 500 MiB/s - can burst
SC1 size
500 GiB - 16 TiB
SC1 IOPS
250 max
max throughput 250 MiB/s - can burst
so it is a cheaper, lower-performance alternative to ST1
disk I/O (i.e. from EBS volumes)
is bandwidth dependent.
throughput
throughput measures the rate at which messages arrive at their destination successfully. Average data throughput tells the user how many packets are arriving at their destination.
measured in bits per second (bps)
network latency
the delay traffic experiences on your network, i.e. how long data takes to reach its destination.
expressed in milliseconds (ms).
The most common measure of latency is ‘round trip time’ (RTT). As the name suggests, this is the time it takes for a packet to get from one point on the network to another and back again.
In general, latency is the time delay between a cause and its effect in the system being observed. In gaming circles it is known as “lag”: the interval between an input to a simulation and the visual or auditory response, often caused by network delay in online games.
network jitter
a variance in latency, or the time delay between when a signal is transmitted and when it is received. This variance is measured in milliseconds (ms) and is described as the disruption in the normal sequence of sending data packets.
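One simple way to put a number on jitter is the average absolute difference between consecutive latency samples. This is a sketch under that assumption; other definitions exist, and the sample values below are invented.

```python
from statistics import mean


def mean_jitter_ms(latencies_ms):
    """Mean absolute difference between consecutive latency samples (ms)."""
    diffs = [abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:])]
    return mean(diffs) if diffs else 0.0


samples = [20.0, 24.0, 21.0, 30.0]  # hypothetical RTT measurements in ms
print(round(mean_jitter_ms(samples), 2))
```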
bandwidth
Bandwidth can be measured in bits per second (bps), megabits per second (Mbps) and gigabits per second (Gbps).
Having a high bandwidth doesn’t guarantee high network performance. Bandwidth is the theoretical maximum rate of packet delivery, whereas throughput is what is achieved in practice.
If throughput in the network is being affected by network latency, packet loss, and jitter then your service will see delays even if you have a substantial amount of bandwidth available.
EBS snapshots
are incremental - only backup changed blocks
they use I/O so don’t run them while the app is handling a lot of traffic
you don’t have to detach a volume when doing a snapshot but it is recommended
max 100000 snapshots per account
Snapshots are stored in
S3. You won’t see them in your buckets directly, but you are billed for the storage they use
Snapshots and AZ
you can copy snapshots across AZ or regions
Snapshots and AMI
you can create AMI from snapshots
EBS volumes restored by snapshots
need to be pre-warmed - use fio or dd command to pre-read the entire volume
Snapshots can be automated by
using Amazon Data Lifecycle Manager
to migrate EBS volume to a different AZ or region
- snapshot the volume
- optionally copy the snapshot to a different region
- create a volume from the snapshot in the AZ of your choice
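The three steps above could be scripted roughly as below. This is a sketch, not a hardened tool: `src_ec2` and `dst_ec2` are assumed to be boto3 EC2 clients for the source and destination regions (for example `boto3.client("ec2", region_name="us-east-1")`), and the function name is invented for illustration.

```python
def migrate_volume(src_ec2, dst_ec2, volume_id, src_region, dst_az):
    """Snapshot a volume, copy the snapshot across regions, restore it in dst_az."""
    # 1. Snapshot the source volume and wait for the snapshot to complete.
    snap_id = src_ec2.create_snapshot(VolumeId=volume_id)["SnapshotId"]
    src_ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap_id])

    # 2. Copy the snapshot. copy_snapshot is called on the destination-region
    #    client and is told where the source snapshot lives.
    copy_id = dst_ec2.copy_snapshot(
        SourceRegion=src_region, SourceSnapshotId=snap_id
    )["SnapshotId"]
    dst_ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[copy_id])

    # 3. Create a new volume from the copied snapshot in the chosen AZ.
    volume = dst_ec2.create_volume(SnapshotId=copy_id, AvailabilityZone=dst_az)
    return volume["VolumeId"]
```

For a move between AZs inside one region, step 2 can be skipped and the volume created directly from the first snapshot.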
When you create an encrypted EBS volume
- data at rest is encrypted inside the volume
- all the data in flight moving between the instance and the volume is encrypted
- all the snapshots are encrypted
- all volumes created from snapshot are encrypted
- when you copy an unencrypted snapshot, you can enable encryption
AES-256
Encryption and latency
Encryption has a minimal impact on latency
How to encrypt an unencrypted EBS volume
- create an EBS snapshot of the volume
- encrypt the EBS snapshot (using copy)
- create new EBS volume from the snapshot - the volume will also be encrypted
- attach the encrypted volume to the original instance
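A possible scripted version of these four steps is sketched below. `ec2` is assumed to be a boto3 EC2 client for the volume's region; the function name and the default device name `/dev/sdf` are hypothetical, so pick a device that is free on your instance.

```python
def encrypt_volume(ec2, volume_id, region, az, instance_id, device="/dev/sdf"):
    """Snapshot -> encrypted snapshot copy -> encrypted volume -> attach."""
    # 1. Snapshot the unencrypted volume.
    snap_id = ec2.create_snapshot(VolumeId=volume_id)["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap_id])

    # 2. Copy the snapshot with encryption turned on
    #    (the default KMS key is used unless KmsKeyId is passed).
    enc_id = ec2.copy_snapshot(
        SourceRegion=region, SourceSnapshotId=snap_id, Encrypted=True
    )["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[enc_id])

    # 3. A volume created from an encrypted snapshot is itself encrypted.
    new_vol = ec2.create_volume(SnapshotId=enc_id, AvailabilityZone=az)["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_vol])

    # 4. Attach the encrypted volume to the original instance.
    ec2.attach_volume(VolumeId=new_vol, InstanceId=instance_id, Device=device)
    return new_vol
```

The old unencrypted volume is left in place here; detaching and deleting it is a separate decision.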
Mbps
megabit per second
(symbol Mbit/s or Mb/s, often abbreviated “Mbps”) is a unit of data transfer rate equal to:
1,000 kilobits per second
1,000,000 bits per second
125,000 bytes per second
125 kilobytes per second
kbps
kilobit per second (symbol kbit/s or kb/s, often abbreviated “kbps”) is a unit of data transfer rate equal to:
1,000 bits per second
125 bytes per second
Gbps
gigabit per second (symbol Gbit/s or Gb/s, often abbreviated “Gbps”) is a unit of data transfer rate equal to:
1,000 megabits per second
1,000,000 kilobits per second
1,000,000,000 bits per second
125,000,000 bytes per second
125 megabytes per second
Tbps
terabit per second (symbol Tbit/s or Tb/s, sometimes abbreviated “Tbps”) is a unit of data transfer rate equal to:
1,000 gigabits per second
1,000,000 megabits per second
1,000,000,000 kilobits per second
1,000,000,000,000 bits per second
125,000,000,000 bytes per second
125 gigabytes per second
kBps
kilobyte per second (kB/s) (can be abbreviated as kBps) is a unit of data transfer rate equal to:
8,000 bits per second
1,000 bytes per second
8 kilobits per second
MBps
megabyte per second (MB/s) (can be abbreviated as MBps) is a unit of data transfer rate equal to:
8,000,000 bits per second
1,000,000 bytes per second
1,000 kilobytes per second
8 megabits per second
GBps
gigabyte per second (GB/s) (can be abbreviated as GBps) is a unit of data transfer rate equal to:
8,000,000,000 bits per second
1,000,000,000 bytes per second
1,000,000 kilobytes per second
1,000 megabytes per second
8 gigabits per second
TBps
terabyte per second (TB/s) (can be abbreviated as TBps) is a unit of data transfer rate equal to:
8,000,000,000,000 bits per second
1,000,000,000,000 bytes per second
1,000,000,000 kilobytes per second
1,000,000 megabytes per second
1,000 gigabytes per second
8 terabits per second
Mibit/s
mebibit per second
1 mebibit = 2^20 bits = 1,048,576 bits = 1024 kibibits
This unit is most useful for measuring RAM and ROM chip capacity.
The mebibit is closely related to the megabit, which equals 10^6 bits = 1,000,000 bits.
1 megabit = 10^6 bits = 1,000,000 bits = 1000 kilobits.
MiB/s
mebibyte per second
a mebibyte is equal to 1,048,576 bytes, i.e. 1024 kibibytes. The unit symbol for the mebibyte is MiB.
1024 MiB = 1 gibibyte (GiB)
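Because EBS throughput limits are quoted in MiB/s while network speeds are usually in Mbps, a conversion helper can be handy. A minimal sketch, with illustrative function names:

```python
MIB = 2**20  # bytes in one mebibyte


def mibps_to_MBps(mib_per_s: float) -> float:
    """MiB/s -> decimal megabytes per second."""
    return mib_per_s * MIB / 1_000_000


def mibps_to_Mbps(mib_per_s: float) -> float:
    """MiB/s -> megabits per second (8 bits per byte)."""
    return mib_per_s * MIB * 8 / 1_000_000


# The ST1 throughput ceiling of 500 MiB/s expressed in decimal units.
print(mibps_to_MBps(500), "MB/s")
print(mibps_to_Mbps(500), "Mbps")
```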
Instance Store
some instances do not come with Root EBS volume
Instead, they come with Instance Store = ephemeral storage
it is physically attached to the machine where your EC2 is
block storage like EBS
some instance types have it attached. Root is still EBS volume but there is also extra storage called ephemeral0
Instance Store Pros
- Better I/O performance (no network involved)
very high IOPS, hundreds of thousands of IOPS - EBS cannot achieve this (64000 IOPS is max) - good for buffer / cache / scratch data / temporary content
- data survives reboots
Instance Store Cons
- on stop or termination instance store is lost
- you can’t resize it on the fly or add new stores to an EC2 instance once you’ve provisioned it
- backups must be operated by the user
- risk of data loss if the hardware fails, so replicate the data to instance stores on other instances
Instance Store capacity
disks of up to 7.5 TiB (at the time of writing)
can be striped to reach 30 TiB
but once you set up a disk in local instance store, it cannot change its size
TiB
tebibyte
1 tebibyte = 2^40 bytes = 1,099,511,627,776 bytes = 1024 gibibytes
1024 TiB = 1 pebibyte (PiB)
RAID
Redundant Array of Independent Disks
RAID is a collection of disks pooled together to form one logical volume.
RAID Stripe
RAID0
data is split into blocks that are distributed evenly across multiple disks, so no single disk holds the full data. If we use 2 disks, half of our data will be on each disk; with 3 disks, a third.
RAID Mirroring
used in RAID 1
Mirroring means keeping a copy of the same data. RAID 1 writes the same content to the other disk too.
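The difference between striping and mirroring can be illustrated with a toy model in which each disk is just a Python list. This is a conceptual sketch only; real RAID operates on fixed-size blocks at the OS or controller level.

```python
def stripe(blocks, n_disks):
    """RAID 0: deal blocks out round-robin, so each disk holds 1/n of the data."""
    disks = [[] for _ in range(n_disks)]
    for i, block in enumerate(blocks):
        disks[i % n_disks].append(block)
    return disks


def mirror(blocks, n_disks):
    """RAID 1: every disk holds a full copy of the data."""
    return [list(blocks) for _ in range(n_disks)]


blocks = ["b0", "b1", "b2", "b3", "b4", "b5"]
print(stripe(blocks, 3))  # [['b0', 'b3'], ['b1', 'b4'], ['b2', 'b5']]
print(mirror(blocks, 2))  # two identical full copies
```

Losing one list from `stripe` loses part of the data; losing one list from `mirror` loses nothing.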
EBS is already replicated storage because
it is automatically replicated across multiple servers in an Availability Zone to prevent the loss of data from the failure of any single component.
This replication makes Amazon EBS volumes ten times more reliable than typical commodity disk drives.
EBS RAID0 Stripe
if you want to increase performance, for ex., IOPS to 100000
EC2 instance will have one logical volume (RAID0 Stripe) backed by two or more EBS volumes. Each block of data goes to one of them.
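A back-of-the-envelope estimate of how many striped volumes a target like 100000 IOPS needs is sketched below. It assumes IOPS add up linearly across the stripe and ignores any instance-level EBS limits, both of which are idealizations; the function name is invented.

```python
import math


def volumes_needed(target_iops: int, iops_per_volume: int) -> int:
    """Number of equally provisioned volumes to stripe for a target IOPS."""
    return math.ceil(target_iops / iops_per_volume)


print(volumes_needed(100_000, 64_000))  # io1 volumes at their per-volume maximum
print(volumes_needed(100_000, 16_000))  # gp2 volumes at their per-volume maximum
```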
EBS RAID1 Mirror
if you want to mirror your EBS volumes
to increase fault tolerance
EC2 instance will have one logical volume (RAID1 Mirror) backed by two or more EBS volumes. The data will go to each of them
EBS RAID
is possible as long as the OS supports it (Windows, Linux). It is not something you configure in AWS console, rather on the OS level in EC2 instance
RAID5, RAID6 and RAID10 are not recommended
EBS RAID0 Stripe disadvantage
if one disk fails, the whole logical volume is gone and all the data is lost
increased risk of faults
EBS RAID0 Stripe advantages
you get more total disk space and IO, increase in performance
EBS RAID0 use case
- an application that needs a lot of IOPS and doesn’t need fault tolerance
- a database that has replication already built-in
EBS RAID1 Mirroring disadvantage
we have to send the data to 2 EBS volumes at the same time, so we have to use 2 times the network throughput
you need an EC2 instance that has more network throughput to handle writing to several EBS volumes at a time
EBS RAID1 use case
When fault tolerance is more important than I/O performance; for example, as in a critical application.
Root EBS volumes of instances
get terminated by default if the EC2 instance gets terminated
you can disable it
You have provisioned an 8TB (8000 GB) gp2 EBS volume and you are running out of IOPS. What is NOT a way to increase performance?
increasing the EBS volume size, because gp2 IOPS peaks at 16,000, which is already reached at 5334 GB.
You can
- mount EBS volumes in RAID0
- change to an IO1
You would like to leverage EBS volumes in parallel to linearly increase performance, while accepting greater failure risks. Which RAID mode helps you in achieving that?
RAID0
Although EBS is already a replicated solution, your company SysOps advised you to use a RAID mode that will mirror data and will allow your instance to not be affected if an EBS volume entirely fails. Which RAID mode did he recommend to you?
RAID1
You would like to have the same data being accessible as an NFS drive cross AZ on all your EC2 instances. What do you recommend?
mount an EFS
You would like to have a high-performance cache for your application that mustn’t be shared. You don’t mind losing the cache upon termination of your instance. Which storage mechanism do you recommend as a Solution Architect?
Instance Store
provides the best disk performance
You are running a high-performance database that requires an IOPS of 210,000 for its underlying filesystem. What do you recommend?
It is possible to run a database on EC2. It is also possible to use instance store, but there are some considerations. The data will be lost if the instance is stopped, but the instance can be rebooted without losing it. One can also set up a replication mechanism on another EC2 instance with instance store to have a standby copy. One can also have backup mechanisms. It’s all up to how you want to set up your architecture to meet your requirements. In this case, the requirement is around IOPS, and we build an architecture of replication and backup around it
you can’t use
- EBS GP2 drive because its max is 16000 IOPS
- EBS IO1 - max is 64000