Scaling Flashcards

1
Q

What does DNS stand for?

A

Domain Name System

2
Q

What does the DNS Server do?

A

Converts website domain names to IP Addresses

3
Q

What does HTTP stand for?

A

Hypertext Transfer Protocol

4
Q

What does IP stand for?

A

Internet Protocol

5
Q

What does JSON stand for?

A

JavaScript Object Notation

6
Q

Under what circumstances might you use a non-relational database over a relational database?

A
  • You need super-low latency
  • Your data is unstructured
  • You only need to serialize / deserialize data
  • You need to store a large amount of data
7
Q

What are the 4 common types of non-relational databases?

A
  1. key-value stores
  2. graph stores
  3. column stores
  4. document stores
8
Q

What is a join operation?

For databases

A

Join combines data from multiple tables into a new dataset based on a specified condition (like a common field)
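
As a minimal sketch of the join described above, the following uses Python's built-in sqlite3 module with an in-memory database. The table names (users, orders), their columns, and the rows are made up for illustration.

```python
import sqlite3

# In-memory database; the tables and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT);
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO orders VALUES (10, 1, 'keyboard'), (11, 1, 'mouse'), (12, 2, 'monitor');
""")

# Join the two tables on the common field: orders.user_id = users.id.
rows = conn.execute("""
    SELECT users.name, orders.item
    FROM users
    JOIN orders ON orders.user_id = users.id
    ORDER BY orders.id
""").fetchall()

print(rows)  # [('Ada', 'keyboard'), ('Ada', 'mouse'), ('Linus', 'monitor')]
```

Each result row combines a column from users with a column from orders, matched by the join condition.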

9
Q

Do non-relational databases support join operations?

A

Generally, no

10
Q

What is the difference between vertical scaling and horizontal scaling?

A
  • Vertical Scaling involves making your existing resources more powerful
  • Horizontal Scaling involves adding more resources to your pool of resources
11
Q

What are 2 reasons you would prefer horizontal scaling over vertical scaling?

A
  1. Vertical scaling has a hard limit
  2. Vertical scaling does not have failover/redundancy.
12
Q

What does a load balancer do?

A

Distributes incoming traffic among servers.
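
One simple way to distribute traffic is round robin, sketched below. This is only an illustration: the class name and server addresses are hypothetical, and real load balancers typically also track server health and load.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in rotation so requests are spread evenly."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(list(servers))

    def pick(self):
        # Each call returns the next server in the rotation.
        return next(self._servers)


# Hypothetical server addresses.
balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
assignments = [balancer.pick() for _ in range(6)]
print(assignments)
```

Six incoming requests are assigned to the three servers twice each, in order.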

13
Q

In the context of Database Replication

Explain the difference between master DBs and slave DBs

A
  • master DBs handle all write operations
  • slave DBs receive copies of the data from the master and handle only read operations
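
The split above can be sketched as a toy model in which plain dictionaries stand in for databases. The class and method names are hypothetical, and replication is done synchronously here for simplicity; real systems often replicate asynchronously.

```python
import itertools


class ReplicatedDB:
    """Toy model: one master takes writes, slaves serve reads."""

    def __init__(self, num_slaves):
        if num_slaves < 1:
            raise ValueError("at least one slave is required")
        self.master = {}
        self.slaves = [{} for _ in range(num_slaves)]
        self._next_slave = itertools.cycle(range(num_slaves))

    def write(self, key, value):
        # All writes go to the master first...
        self.master[key] = value
        # ...and are then copied to every slave.
        for slave in self.slaves:
            slave[key] = value

    def read(self, key):
        # Reads rotate across the slaves and never touch the master.
        slave = self.slaves[next(self._next_slave)]
        return slave.get(key)


db = ReplicatedDB(num_slaves=2)
db.write("user:1", "Ada")
reads = [db.read("user:1") for _ in range(3)]
print(reads)
```

Because reads are spread over several copies, adding slaves increases read capacity without changing the write path.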
14
Q

What are 3 key benefits of database replication?

A
  1. Performance (parallelization)
  2. Reliability
  3. High Availability
15
Q

Describe the data flow for a read-through cache

A

read(x):
- If x not in cache:
    - cache[x] := db.read(x)
- return cache[x]
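
A runnable version of the read-through flow, as a sketch. FakeDB is a hypothetical stand-in for a real database that counts how often it is read, so the effect of the cache is visible.

```python
class ReadThroughCache:
    def __init__(self, db):
        self.db = db      # any object with a read(key) method
        self.store = {}

    def read(self, key):
        if key not in self.store:
            # Cache miss: load from the database and remember the result.
            self.store[key] = self.db.read(key)
        return self.store[key]


class FakeDB:
    """Hypothetical backing store that counts its reads."""

    def __init__(self, data):
        self.data = data
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.data[key]


db = FakeDB({"x": 42})
cache = ReadThroughCache(db)
first = cache.read("x")   # miss: goes to the database
second = cache.read("x")  # hit: served from the cache
print(first, second, db.reads)  # 42 42 1
```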

16
Q

Describe the data flow for a write-through cache

A

write(x, v):
- If x in cache:
    - cache[x] := v
- db.write(x, v)
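
A runnable sketch of the write flow on this card, reading the database write as unconditional. FakeDB is a hypothetical stand-in for a real database. Note that this variant only updates entries that are already cached; some write-through caches instead store every written value in the cache.

```python
class WriteThroughCache:
    def __init__(self, db):
        self.db = db      # any object with a write(key, value) method
        self.store = {}

    def write(self, key, value):
        if key in self.store:
            # Keep an already-cached entry in sync with the new value.
            self.store[key] = value
        # The write always goes through to the database.
        self.db.write(key, value)


class FakeDB:
    """Hypothetical backing store."""

    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value


db = FakeDB()
cache = WriteThroughCache(db)
cache.store["x"] = 1   # pretend "x" was cached by an earlier read
cache.write("x", 2)    # updates both the cache and the database
cache.write("y", 3)    # "y" is not cached, so only the database changes
print(cache.store, db.data)
```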

17
Q

Can a cache be both read-through AND write-through?

A

Yes

18
Q

Do you need a different cache in each datacenter?

A

Yes!

otherwise the cache can become a SPOF

19
Q

In caching

What is an expiration policy?

A

How long you wait before removing data from the cache
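
A time-to-live (TTL) is one common form of expiration policy. The sketch below is illustrative only: the class name and the injectable clock parameter are assumptions made so the example can be run without actually waiting.

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}   # key -> (value, expiry time)

    def put(self, key, value):
        self.store[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Entry has outlived its TTL: remove it and report a miss.
            del self.store[key]
            return None
        return value


# A fake clock lets us "advance time" by hand.
now = [0.0]
cache = TTLCache(ttl_seconds=10, clock=lambda: now[0])
cache.put("x", "fresh")
before = cache.get("x")   # still within the 10-second TTL
now[0] = 11.0
after = cache.get("x")    # expired, so the cache reports a miss
print(before, after)      # fresh None
```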

20
Q

What problem can happen if a cache's expiration time is too short?

A

You lose the speedup benefits of caching, because data is evicted and reloaded from the database too often

21
Q

What problem can happen if a cache's expiration time is too long?

A

Data can become stale

22
Q

What does SPOF stand for?

A

Single Point Of Failure

23
Q

What is the best situation for caching?

A

When data is read frequently but modified infrequently

24
Q

What are the 5 key considerations around caching?

A
  1. Effectiveness (high read, low write)
  2. Expiration Policy
  3. Consistency
  4. Failure Mitigation (avoid becoming SPOF)
  5. Invalidation Policy
25
Q

What does CDN stand for?

A

Content Delivery Network

26
Q

What is a CDN?

A

A network of geographically dispersed servers used to deliver static content

27
Q

What is the typical cost structure for a CDN?

A

Charged by the amount of data transferred in/out

28
Q

When a user visits a website, which CDN server should deliver content to the user?

A

The CDN server geographically closest to the user

29
Q

Why is stateless architecture generally preferable to stateful architecture?

A

Because requests from the same client can be routed to different servers

without the overhead of sticky sessions

So, better decoupling

30
Q

How might we be able to store “state” data in a stateless architecture?

A

Keep state data in a separate data store from the rest of the web layer architecture
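
A minimal sketch of this idea: two web servers hold no session data themselves and share one external store, so either can serve any request. The plain dict stands in for a real shared store such as a database or cache service, and all names are hypothetical.

```python
class WebServer:
    """Stateless server: keeps no session data of its own."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store   # shared store, external to the server

    def login(self, session_id, user):
        self.sessions[session_id] = {"user": user}

    def whoami(self, session_id):
        session = self.sessions.get(session_id)
        return session["user"] if session else None


shared_store = {}   # stand-in for a separate data store
server_a = WebServer("a", shared_store)
server_b = WebServer("b", shared_store)

server_a.login("s1", "ada")                # first request lands on server a
user_seen_by_b = server_b.whoami("s1")     # next request lands on server b
print(user_seen_by_b)                      # ada
```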

31
Q

How might we improve website availability / performance across wider geographical areas?

A

Use multiple data centers (think AWS regions / AZs), and geo-load balancing

Using a CDN helps with geo-performance but not availability

32
Q

Describe a message queue architecture

A

(think SQS):

X producers put work on the queue and Y consumers take work off the queue; this decouples producer work from consumer work
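
The pattern can be sketched in a single process with Python's standard queue and threading modules. This is an illustration of the producer/consumer shape only, not of SQS itself; the counts (2 producers, 3 consumers) and the squaring "work" are arbitrary choices.

```python
import queue
import threading

work = queue.Queue()
results = []
results_lock = threading.Lock()


def producer(start, count):
    # Producers only know about the queue, not about the consumers.
    for n in range(start, start + count):
        work.put(n)


def consumer():
    while True:
        item = work.get()
        if item is None:   # sentinel: no more work for this consumer
            break
        with results_lock:
            results.append(item * item)


producers = [threading.Thread(target=producer, args=(i * 5, 5)) for i in range(2)]
consumers = [threading.Thread(target=consumer) for _ in range(3)]
for t in producers + consumers:
    t.start()
for t in producers:
    t.join()
for _ in consumers:
    work.put(None)         # one sentinel per consumer, after all real work
for t in consumers:
    t.join()

print(sorted(results))
```

The number of producers and consumers can be changed independently, which is the decoupling the card refers to.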

33
Q

How can you perform horizontal scaling at the database level without replicating your database?

A

DB sharding (e.g. with a shard hash of primary_key % num_shards)
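
A small sketch of the modulo shard hash mentioned above, with dictionaries standing in for separate database servers. The shard count and helper names are assumptions for illustration, and integer primary keys are assumed.

```python
NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]   # each dict stands in for one DB


def shard_for(primary_key, num_shards=NUM_SHARDS):
    # The shard hash from the card: primary_key % num_shards.
    return primary_key % num_shards


def put(primary_key, row):
    shards[shard_for(primary_key)][primary_key] = row


def get(primary_key):
    return shards[shard_for(primary_key)].get(primary_key)


for user_id in range(10):
    put(user_id, {"id": user_id})

sizes = [len(s) for s in shards]
print(sizes)    # [3, 3, 2, 2]
print(get(6))   # {'id': 6}
```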

34
Q

What are 3 downsides of sharding?

A
  1. Resharding after growth
  2. Celebrity Problem or an uneven traffic distribution
  3. Complicated Joins (can be mitigated by denormalizing data)
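
To illustrate the resharding downside, the sketch below assumes the simple modulo shard hash (primary_key % num_shards) and counts how many keys land on a different shard when growing from 4 to 5 shards. The key range is arbitrary.

```python
def shard_for(primary_key, num_shards):
    return primary_key % num_shards


keys = range(1000)
# A key must be moved if its shard differs before and after the change.
moved = [k for k in keys if shard_for(k, 4) != shard_for(k, 5)]
print(len(moved), "of", len(keys), "keys change shard")  # 800 of 1000 keys change shard
```

With this hash, adding a single shard forces most of the data to move, which is why resharding is costly.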
35
Q

What are the 8 big ideas for scaling?

A
  1. Statelessness
  2. Redundancy
  3. Caching
  4. Multiple Data Centers
  5. CDNs for static assets
  6. Sharding in Data Tier
  7. Decoupling
  8. Logging / Monitoring / Automation