Resilient Architecture Flashcards

Question 1

Q

How do you autoscale in AWS?

Answer

A

Set up an Auto Scaling group.
Set up a load balancer.
Configure Auto Scaling to respond to CloudWatch alarms.

Question 2

Q

What is the difference between HA and Fault Tolerance and DR?

Answer

A

HA is to guarantee maximum uptime

there can be minimal disruption to service but it is restored quickly
an off-road vehicle carrying a spare tire encounters a flat

FT is to work through malfunctioning components in the system

there typically cannot be any loss of functionality during component outage
a plane in the air with engine failure uses redundant second engine
a patient on an operation table on critical monitoring equipment that cannot stop functioning
FT costs a lot to implement and is more complex in design than HA

DR covers failures on a larger scale than HA or FT can handle

human induced or natural
entire system is compromised or lost
typically solved by having a second physical location to take over, far away from disaster site
backups should be stored off site for on-prem solutions
determine what your RTO and RPO need to be for the use-case

Question 3

Q

What is Route53?

Answer

A

A DNS service from AWS

Register domains, Global Service single DB
Hosts Zone Files
Managed Nameservers (NS) 4 per domain
Liaises with the TLD registrar and provides the NS records that indicate where a particular domain is hosted (eg: )
Zone files store record sets

Question 4

Q

How does DNS work? High level part 1

Answer

A

Root Hints file on the DNS resolver (ISP provided) points to the 13 DNS Root servers where the Root Zone lies
Root Zone is authoritative
Root Zone is a DB of the top level domains (.com etc)

Question 5

Q

What are the different types of DNS records?

Answer

A

“A” record points to the IPv4 address of the server
“CNAME” (canonical name) is an alternate name that points to another name, typically one with an “A” record, so several names resolve to the same IP (eg, ftp.google.com, mail.google.com)
A CNAME can only point to another name, not to an IP address (exam question!)
MX records: point to the mail server for a specific domain
TXT records: arbitrary text, often used to prove domain ownership
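
A minimal sketch of how a CNAME chain ends at an “A” record. The zone data is a made-up dictionary (the names and the 192.0.2.10 address are illustrative assumptions), not a real DNS lookup:

```python
# Toy zone data: names and addresses are made up for illustration.
RECORDS = {
    "example.com": ("A", "192.0.2.10"),
    "www.example.com": ("CNAME", "example.com"),
    "ftp.example.com": ("CNAME", "www.example.com"),
}


def resolve(name, max_hops=5):
    """Follow CNAME records until an A record is reached."""
    for _ in range(max_hops):
        record_type, value = RECORDS[name]
        if record_type == "A":
            return value  # an A record holds the IPv4 address
        name = value  # a CNAME holds another name, never an IP
    raise RuntimeError("too many CNAME hops")


print(resolve("ftp.example.com"))
```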

Question 6

Q

What does TTL on a DNS record indicate?

Answer

A

TTL values indicate how long the resolver can cache the result (eg, the IPv4 address) returned from the domain resolution
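
A rough sketch of TTL-based caching, assuming a hypothetical `TtlCache` class and a clock passed in as plain seconds so the behaviour is easy to follow:

```python
class TtlCache:
    """Toy resolver cache: entries expire once their TTL has elapsed."""

    def __init__(self):
        self._entries = {}  # name -> (value, expires_at)

    def put(self, name, value, ttl, now):
        self._entries[name] = (value, now + ttl)

    def get(self, name, now):
        entry = self._entries.get(name)
        if entry is None:
            return None  # never cached: a fresh lookup is needed
        value, expires_at = entry
        if now >= expires_at:
            del self._entries[name]  # TTL elapsed: drop the stale answer
            return None
        return value


cache = TtlCache()
cache.put("example.com", "192.0.2.10", ttl=300, now=0)
print(cache.get("example.com", now=299))
print(cache.get("example.com", now=300))
```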

Question 7

Q

What is an ALB?

Answer

A

Application Load Balancer

“Target” is a single compute resource
“Target groups” are groups of targets
Rules are evaluated to determine which target group to send requests to
Rules are “path” based or “host” based
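
A simplified sketch of host-based and path-based rule evaluation. The rule list, host names and target group names are hypothetical; the first matching rule wins and a default group catches everything else:

```python
# (host condition, path prefix condition, target group); None matches anything.
# All names here are hypothetical.
RULES = [
    ("api.example.com", None, "api-targets"),
    (None, "/images/", "image-targets"),
]
DEFAULT_TARGET_GROUP = "web-targets"


def pick_target_group(host, path):
    """Return the target group of the first rule whose conditions all match."""
    for rule_host, rule_prefix, group in RULES:
        if rule_host is not None and host != rule_host:
            continue
        if rule_prefix is not None and not path.startswith(rule_prefix):
            continue
        return group
    return DEFAULT_TARGET_GROUP


print(pick_target_group("www.example.com", "/images/logo.png"))
```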

Question 8

Q

What are Launch configurations and Launch Templates?

Answer

A

Templates came after Configurations
Both allow you to define the configuration of an EC2 instance in advance (AMI, instance type, networking, user data, attached IAM role etc)
LTs have versions and are recommended over LCs

Question 9

Q

What is an ASG?

Answer

A

Auto Scaling Group

automatic scaling for EC2
uses the EC2 configuration within LTs or LCs
3 important values: Min size, Desired and Maximum (eg: 1:2:4)
Provision or terminate to keep at Desired level
Scaling policies based on Metrics
Runs in a VPC across one or more Subnets
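
A small sketch of how Min, Desired and Maximum interact. The `reconcile` function is hypothetical, and clamping Desired into the Min..Max range is an assumption made to keep the example short:

```python
def reconcile(running, minimum, desired, maximum):
    """Return how many instances to launch (+) or terminate (-)."""
    # Assumption: desired is clamped so it never falls outside min..max.
    target = max(minimum, min(desired, maximum))
    return target - running


# With 1:2:4 and one instance running, one more is provisioned.
print(reconcile(running=1, minimum=1, desired=2, maximum=4))
```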

Question 10

Q

Types of Auto scaling?

Answer

A

Manual
Scheduled scaling based on time
Dynamic scaling
- simple scaling based on a metric, example: if CPU > 50% add 1 to desired capacity, else remove 1 from desired capacity
- step scaling - lets you define more detail - add 1 instance if CPU > 50%, add 3 instances if CPU > 80% (bigger or smaller steps), so it can react more strongly to bigger changes; preferable to simple
- target tracking: eg: keep aggregate CPU across all instances in the group at 40%
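
The three dynamic policies can be compared with a toy model. The thresholds come from the examples above; the proportional formula used for the target-based policy is an assumption for illustration, not AWS's actual algorithm:

```python
import math

STEPS = [(80, 3), (50, 1)]  # (CPU threshold %, instances to add), highest first


def simple_scaling(cpu, desired):
    """One fixed adjustment either way."""
    return desired + 1 if cpu > 50 else desired - 1


def step_scaling(cpu, desired):
    """Bigger breaches trigger bigger steps."""
    for threshold, add in STEPS:
        if cpu > threshold:
            return desired + add
    return desired


def target_scaling(cpu, desired, target=40):
    """Assumed proportional model: size the group so average CPU nears target."""
    return max(1, math.ceil(desired * cpu / target))


print(simple_scaling(90, 2), step_scaling(90, 2), target_scaling(80, 2))
```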

Question 11

Q

What is cool down period?

Answer

A

EC2 has a minimum billing period, so bringing instances in and out too frequently can be costly
The cool down period is how long to wait after the last scaling action before another scaling action is applied
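
A minimal sketch of a cool down check, assuming a hypothetical `CooldownScaler` class and time passed in as plain seconds:

```python
class CooldownScaler:
    """Allows a scaling action only once the cool down has elapsed."""

    def __init__(self, cooldown_seconds):
        self.cooldown = cooldown_seconds
        self.last_action_at = None

    def try_scale(self, now):
        if self.last_action_at is not None and now - self.last_action_at < self.cooldown:
            return False  # still cooling down: ignore this trigger
        self.last_action_at = now
        return True


scaler = CooldownScaler(cooldown_seconds=300)
print(scaler.try_scale(now=0), scaler.try_scale(now=120), scaler.try_scale(now=300))
```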

Question 12

Q

What are NLBs?

Answer

A

Network Load balancer
Only understands TCP and UDP, not HTTP(S)
Latency of ~100ms vs ~400ms for ALBs
Rapid scaling - millions of requests per second
1 interface with a static IP per AZ, can use EIPs

Question 13

Q

What is SSL Offload

Answer

A

ELBs have 3 types of SSL offload:
1. Bridging - SSL is terminated on the LB, so the LB needs an SSL cert matching the domain name. A new encrypted connection is made between the ALB and the EC2 instances (the ALB decrypts and then re-encrypts when talking to EC2 instances, so EC2 also needs to decrypt, which can be an overhead)
2. Pass through - an NLB usually uses this. The LB does not (and cannot) decrypt the data and passes it through to EC2. AWS does not know what cert you use on the EC2 instance. Still has admin and compute overhead on EC2
3. SSL Offload - the ELB has the cert, but no cert is needed on the EC2 instance since the connection from the ELB to EC2 is not HTTPS. Only the ELB decrypts, so there is no overhead on the EC2 instances

Question 14

Q

What is Session Stickiness?

Answer

A

If enabled, the LB generates a cookie called “AWSALB”
Duration defined by you (1s to 7 days)
The LB sends requests to the same backend EC2 instance while the cookie is present
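
A toy sketch of cookie-based stickiness. The `StickyBalancer` class and instance ids are hypothetical, the cookie name is an assumed constant, and a real LB cookie is opaque rather than a readable instance id:

```python
import itertools

STICKY_COOKIE = "AWSALB"  # assumed cookie name, for illustration only


class StickyBalancer:
    def __init__(self, targets):
        self._targets = list(targets)
        self._round_robin = itertools.cycle(self._targets)

    def route(self, cookies):
        """Return (target, cookies to set) for one request."""
        target = cookies.get(STICKY_COOKIE)
        if target not in self._targets:
            target = next(self._round_robin)  # no valid cookie: pick a target
        return target, {STICKY_COOKIE: target}


lb = StickyBalancer(["i-a", "i-b"])
first, cookies = lb.route({})
second, _ = lb.route(cookies)
print(first, second)
```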

Question 15

Q

What is Boot time to service time?

Answer

A

Time required by AWS to provision the EC2 instance plus software updates and installation within the OS. For AWS-provided AMIs this is measured in minutes.

Question 16

Q

What is SQS

Answer

A

Simple Queue Service
HA, Performant by design
Standard Qs and FIFO Qs 
FIFO guarantees order
Standard Qs try to deliver in order but it is not guaranteed
256 KB message size max

VisibilityTimeout (VT) - Fault Tolerance - A received message is hidden from other clients for the VT. The client explicitly deletes the message after processing. If the client dies while processing it, the message becomes visible in the Q again after the VT so another worker can see and process it

DeadLetter Q - problematic messages, corrupt messages can be dropped here for later examination

ASGs can scale instances based on length of a Q
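
The visibility timeout behaviour can be sketched with an in-memory toy queue. `ToyQueue` is hypothetical and only mimics the idea, it is not the SQS API:

```python
class ToyQueue:
    """In-memory queue where received messages are hidden for a timeout."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> [body, invisible_until]
        self._next_id = 0

    def send(self, body):
        self._messages[self._next_id] = [body, 0]
        self._next_id += 1

    def receive(self, now):
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout  # hide from other workers
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        self._messages.pop(msg_id, None)  # processing done: remove for good


q = ToyQueue(visibility_timeout=30)
q.send("job")
print(q.receive(now=0))   # worker A takes the message, then dies
print(q.receive(now=10))  # still hidden from worker B
print(q.receive(now=30))  # visible again, so worker B can process it
```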

Question 17

Q

What is Fan out architecture WRT SQS and SNS

Answer

A

You publish a message to a SNS topic

The message is distributed to multiple SQS queues with different workloads at the end of each Q

Each workload can then work in parallel on the message received in its own Q

Useful when multiple un-connected things have to be done based on a single event (me)
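
A minimal in-memory sketch of fan out, using a hypothetical `Topic` class and plain deques standing in for the SNS topic and SQS queues:

```python
from collections import deque


class Topic:
    """Copies every published message to each subscribed queue."""

    def __init__(self):
        self._queues = []

    def subscribe(self, queue):
        self._queues.append(queue)

    def publish(self, message):
        for queue in self._queues:
            queue.append(message)


# Hypothetical workloads, each with its own queue.
thumbnail_q, audit_q = deque(), deque()
topic = Topic()
topic.subscribe(thumbnail_q)
topic.subscribe(audit_q)
topic.publish("image-uploaded")
print(thumbnail_q.popleft(), audit_q.popleft())
```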

Question 18

Q

Standard Qs vs FIFO Q

Answer

A

STD - multi lane highway, same msg can be delivered twice, scale much more than FIFO queues but messages could be out of order

FIFO - single lane highway, msgs are delivered once, 3K/s with batching or 300/s without batching

Billed based on request - one request can receive between 1 and 10 msgs, request can return 0 or more messages, so not cost-efficient if you call it very frequently

Can encrypt messages using KMS as it sits in the Q

Question 19

Q

Short Polling vs Long Polling

Answer

A

SP = short duration, could return 0 messages

LP lets you specify a wait time of up to 20s; it waits for messages to arrive. This is how you should poll SQS
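
The difference can be illustrated with Python's standard `queue` module standing in for SQS. The function names are hypothetical; the point is only that a long poll blocks for up to the wait time instead of returning empty straight away:

```python
import queue
import threading


def short_poll(q):
    """Return immediately, possibly with nothing."""
    try:
        return q.get_nowait()
    except queue.Empty:
        return None


def long_poll(q, wait_seconds):
    """Wait up to wait_seconds for a message to arrive."""
    try:
        return q.get(timeout=wait_seconds)
    except queue.Empty:
        return None


inbox = queue.Queue()
print(short_poll(inbox))  # nothing there yet
threading.Timer(0.1, inbox.put, args=("hello",)).start()
print(long_poll(inbox, wait_seconds=2))  # waits until the message arrives
```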

Question 20

Q

What is Kinesis

Answer

A

Scalable streaming service

Designed to ingest lots of data from lots of apps

Public and HA by design

Persistence - rolling 24 hour window, data stays for 24 hours by default, older data is replaced by new data entering

Lots of producers pushing to a stream

Shard architecture - each shard has 1MB/s ingestion and 2MB/s consumption capacity; Kinesis Data Records are stored across shards
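
A back-of-the-envelope sketch for sizing a stream from the shard capacities above. The `shards_needed` helper is hypothetical and assumes the capacities are per shard, in MB per second:

```python
import math

INGEST_MB_PER_SHARD = 1   # ingestion capacity per shard
CONSUME_MB_PER_SHARD = 2  # consumption capacity per shard


def shards_needed(ingest_mb_per_s, consume_mb_per_s):
    """Shards required to cover both the write and the read throughput."""
    for_ingest = math.ceil(ingest_mb_per_s / INGEST_MB_PER_SHARD)
    for_consume = math.ceil(consume_mb_per_s / CONSUME_MB_PER_SHARD)
    return max(1, for_ingest, for_consume)


print(shards_needed(ingest_mb_per_s=5, consume_mb_per_s=6))
```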

Question 21

Q

Kinesis Data Firehose

Answer

A

KDF Can move data from a Kinesis stream en-masse into another destination like S3 to store it for a longer time

Question 22

Q

Difference between SQS and Kinesis - how to pick between the two

Answer

A

Is it about ingestion of data or about async communication, de-coupling between entities?

Ingestion of data = Kinesis
Decoupling = SQS

Design HA and Fault Tolerant Systems (22 cards)