Designing Systems that Scale Flashcards

1
Q

What are the pros and cons of the single-server design?

A

Pros: Cheap, easy to maintain
Cons: Single point of failure, cannot scale to support more users

2
Q

In a single-server design, what is the main benefit of separating the database from the main server into a separate resource?

A

You can now scale the server and database independently

3
Q

What is vertical scaling?

A

Replacing your current web server with a larger, more expensive one that has more CPU and memory

4
Q

What are the pros and cons of vertical scaling?

A

Pros: Easy to maintain (there is still only one server)
Cons: Expensive, still a single point of failure, and there is a ceiling on how large a single machine can get

5
Q

What is the purpose of a load balancer in horizontal scaling?

A
  1. It distributes work evenly across multiple machines
  2. It can reroute traffic to the remaining servers if one or more go down
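
Both jobs of the load balancer can be sketched in a few lines. This is a minimal illustration, not how any particular product works: the class `RoundRobinBalancer`, its methods, and the server names are all hypothetical, and round-robin is just one of several possible distribution strategies.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: rotates through servers and skips unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        # A real balancer would learn this from periodic health checks.
        self.healthy.discard(server)

    def pick(self):
        # Look at each server at most once per call.
        for _ in range(len(self.servers)):
            server = next(self._rotation)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
first_round = [balancer.pick() for _ in range(3)]   # one request per server

balancer.mark_down("web-2")                          # simulate a failure
after_failure = [balancer.pick() for _ in range(4)]  # web-2 is skipped
```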
6
Q

What is horizontal scaling?

A

Scaling your application by adding more machines to a cluster, with a load balancer distributing the work among them

7
Q

What are the pros and cons of horizontal scaling?

A

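Pros: Resilient, and scales out almost without limit by adding machines (in practice, shared resources such as the database become the bottleneck)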
Cons: More instances to manage, individual web servers must be stateless

8
Q

In the horizontal scaling technique, what does it mean that the individual web servers must be stateless?

A

Any request from a client may be sent to any web server in the cluster, so you can’t assume that a given web server remembers anything about a previous request. Any state that must survive between requests has to live outside the web servers, for example in a database or a shared cache.
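
A minimal sketch of what statelessness looks like in practice, assuming session data is kept in a store shared by every server. The `WebServer` class and the in-memory `session_store` dict are hypothetical stand-ins; a real deployment would use an external cache or database.

```python
# Hypothetical shared store; stands in for an external cache or database.
session_store = {}


class WebServer:
    def __init__(self, name, store):
        self.name = name
        self.store = store  # shared, external state; the server keeps none itself

    def add_to_cart(self, session_id, item):
        self.store.setdefault(session_id, []).append(item)

    def view_cart(self, session_id):
        return list(self.store.get(session_id, []))


server_a = WebServer("web-1", session_store)
server_b = WebServer("web-2", session_store)

# The first request lands on web-1, the follow-up on web-2.
server_a.add_to_cart("session-42", "book")
cart_seen_by_b = server_b.view_cart("session-42")
```

Because neither server holds the cart itself, the load balancer is free to send each request to whichever server it likes.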

9
Q

T/F: Choose the simplest architecture that meets your projected traffic requirements, but no simpler

A

True

10
Q

Explain the ‘cold standby’ failover strategy

A

You take periodic backups of the database; if it goes down, you restore the most recent backup onto another database server

11
Q

Explain the pros/cons of the ‘cold standby’ failover strategy

A

Pros: It’s better than nothing, prevents you from losing all your data. Cheaper option and simple.
Cons: Restoring the backup onto a new database takes a long time, and you lose any data written since the last periodic backup
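
The data-loss window can be shown with a toy simulation. Everything here is illustrative: the dicts stand in for databases and `take_backup` is a hypothetical helper, not a real backup tool.

```python
import copy

primary = {}


def take_backup(database):
    # A periodic backup is a point-in-time copy of the data.
    return copy.deepcopy(database)


primary["order-1"] = "paid"
primary["order-2"] = "paid"
backup = take_backup(primary)   # the last periodic backup

primary["order-3"] = "paid"     # written after the backup was taken

# The primary fails; restore the backup onto a fresh database.
restored = copy.deepcopy(backup)
lost_keys = sorted(set(primary) - set(restored))
```

Here `lost_keys` contains only `order-3`, the write made after the last backup.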

12
Q

Explain the ‘warm standby’ failover strategy

A

You keep a mirror image of your database, continuously updated through replication, that is ready to take the place of the main database almost immediately

13
Q

Explain the pros/cons of the ‘warm standby’ failover strategy

A

Pros: Near-instant switch to the backup, little/no data loss, easy to turn on (replication is handled by the database platform)
Cons: More expensive, tiny chance that you might still lose data during the switch to backup

14
Q

Explain the ‘hot standby’ failover strategy

A

The front-end web server writes the same data to multiple database hosts at the same time, so if one host goes down, traffic is immediately routed to another host.
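
A minimal sketch of the idea, assuming the client itself fans each write out to every host. The `Host` and `HotStandbyClient` classes are hypothetical, and the sketch ignores the harder problem of resynchronizing a host that comes back after missing writes.

```python
class Host:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class HotStandbyClient:
    def __init__(self, hosts):
        self.hosts = hosts

    def write(self, key, value):
        # Write the same data to every host that is currently up.
        for host in self.hosts:
            if host.up:
                host.data[key] = value

    def read(self, key):
        # Serve the read from the first host that is still up.
        for host in self.hosts:
            if host.up:
                return host.name, host.data[key]
        raise RuntimeError("all hosts are down")


hosts = [Host("db-1"), Host("db-2")]
client = HotStandbyClient(hosts)
client.write("user:1", "alice")

hosts[0].up = False                           # db-1 fails
served_by, value = client.read("user:1")      # db-2 already has the data
```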

15
Q

Explain the pros/cons of the ‘hot standby’ failover strategy

A

Pros: most resilient, essentially a horizontal scaling solution
Cons: Most expensive

16
Q

What is a shard in a horizontally scaled database?

A

It’s a horizontal partition of your database: a subset of the rows, typically held on its own server
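
One common way to decide which shard a record belongs to is to hash its key. The sketch below assumes four shards and hash-based routing; `shard_for`, `put`, and `get` are hypothetical helpers, and real systems often use range-based or consistent-hashing schemes instead.

```python
import hashlib

NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for one server


def shard_for(key, num_shards=NUM_SHARDS):
    # A stable hash, so the same key always maps to the same shard.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def put(key, value):
    shards[shard_for(key)][key] = value


def get(key):
    # Only one shard has to be consulted for a single-key lookup.
    return shards[shard_for(key)].get(key)


for user_id in ("user:1", "user:2", "user:3", "user:4", "user:5"):
    put(user_id, {"id": user_id})
```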

17
Q

What are the pros/cons of sharding your database?

A

Pros: Resilient, because each shard can be backed up and replicated on its own and a failure affects only part of your data. Scales as your database grows
Cons: Queries that join or merge data across shards get complicated, so you need to organize your data across partitions in a way that minimizes those cross-shard operations

18
Q

What is MongoDB?

A

It’s a distributed NoSQL document database. It stores data as JSON-like documents, partitions (shards) them across servers based on a shard key, and manages replication

19
Q

What is Amazon RDS?

A

It’s a service in AWS that manages relational database servers and allows you to quickly query structured data

20
Q

What is Amazon Redshift?

A

It’s a service in AWS that supports data warehouse/lake approaches, enabling you to access and analyze huge amounts of data

21
Q

What are the scalability differences between RDS and Redshift?

A
  • RDS can only scale vertically (a bigger instance), but it can create multiple read replicas to support increased read traffic.
  • Redshift can scale vertically and horizontally because it distributes data across nodes.
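
The read-replica idea can be sketched as follows. This is a conceptual model only and not the RDS API: the `ReplicatedDatabase` class is hypothetical, and replication is reduced to an explicit `replicate()` step, whereas real replicas are updated continuously and may lag behind the primary.

```python
from itertools import cycle


class ReplicatedDatabase:
    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = cycle(range(replica_count))

    def write(self, key, value):
        # All writes go to the single primary.
        self.primary[key] = value

    def replicate(self):
        # Copy the primary's current data to every replica.
        for replica in self.replicas:
            replica.clear()
            replica.update(self.primary)

    def read(self, key):
        # Reads are spread across the replicas in turn.
        index = next(self._next_replica)
        return index, self.replicas[index].get(key)


db = ReplicatedDatabase(replica_count=2)
db.write("sku-1", 5)
stale = db.read("sku-1")                       # replica has not caught up yet
db.replicate()
fresh = [db.read("sku-1") for _ in range(2)]   # served by both replicas
```

Adding replicas raises read capacity, but every write still passes through one primary, which is why write capacity grows only by making that machine bigger.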
22
Q

What are the tradeoffs between RDS and Redshift?

A

RDS delivers fast queries and is cheaper; however, it can only scale vertically, which is a limitation if you need to deal with lots of data. Redshift can scale horizontally, allowing it to store massive amounts of data. However, queries can take much longer because they have to touch multiple nodes.

23
Q

What does it mean that SQL databases follow a relational model?

A

It means that structured data is stored in tables, and foreign keys are used to form relationships across multiple tables
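
A small illustration using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " item TEXT NOT NULL)"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (id, customer_id, item) VALUES (10, 1, 'keyboard')")

# The foreign key lets a query follow the relationship across tables.
rows = conn.execute(
    "SELECT customers.name, orders.item"
    " FROM orders JOIN customers ON orders.customer_id = customers.id"
).fetchall()

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders (id, customer_id, item) VALUES (11, 99, 'mouse')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```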

24
Q

What does it mean that NoSQL databases follow a non-relational model?

A

It means that data is stored as documents, key-value pairs, or graphs, with no fixed schema enforced by the database

25
Q

What’s another name for NoSQL databases?

A

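Sharded databases (a loose synonym: NoSQL stores are usually built around sharding, although relational databases can be sharded too). They are also commonly called non-relational databases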

26
Q

What is the “hot spot” or “celebrity” problem in NoSQL databases?

A

A “hot spot” is a shard that gets much more traffic than the other shards, for example because it holds a celebrity’s account that everyone looks up. You need to be aware of this and scale for the uneven traffic.
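
A toy workload makes the imbalance visible. The sketch assumes hash-based routing across four shards and a made-up traffic mix in which one key receives 90 of 100 requests; `shard_for` is a hypothetical helper.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4


def shard_for(key, num_shards=NUM_SHARDS):
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# 90 requests for one very popular key, 10 requests for ordinary keys.
requests = ["user:celebrity"] * 90 + [f"user:{i}" for i in range(10)]

load_per_shard = Counter(shard_for(key) for key in requests)
hottest_shard, hottest_load = load_per_shard.most_common(1)[0]
```

However evenly the hash spreads the keys, all requests for one key land on one shard, so the shard holding the celebrity takes at least 90% of the load.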

27
Q

What are the main traits of normalized data?

A

Less storage space, more lookups (because data is stored in separate tables), updates happen in one place (one table)

28
Q

What are the main traits of denormalized data?

A

More storage space (because data is often redundant to keep relevant data close together), one lookup (assuming you organized relevant data together), updates are difficult (because you have to update multiple instances of the same data)
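
The tradeoff between the two layouts can be seen with plain dictionaries. The data and helper functions are invented for the illustration.

```python
# Normalized: each fact is stored once; reading a post needs two lookups.
users = {1: {"name": "Ada"}}
posts = {
    100: {"author_id": 1, "title": "Scaling 101"},
    101: {"author_id": 1, "title": "Sharding 101"},
}

# Denormalized: the author's name is copied into every post; one lookup.
posts_denorm = {
    100: {"title": "Scaling 101", "author": "Ada"},
    101: {"title": "Sharding 101", "author": "Ada"},
}


def post_with_author(post_id):
    post = posts[post_id]               # lookup 1
    author = users[post["author_id"]]   # lookup 2
    return {"title": post["title"], "author": author["name"]}


def rename_normalized(user_id, new_name):
    users[user_id]["name"] = new_name
    return 1  # one record touched


def rename_denormalized(old_name, new_name):
    touched = 0
    for post in posts_denorm.values():
        if post["author"] == old_name:
            post["author"] = new_name
            touched += 1
    return touched


updates_normalized = rename_normalized(1, "Ada L.")
updates_denormalized = rename_denormalized("Ada", "Ada L.")
```

The rename touches one record in the normalized layout and every copy in the denormalized one.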

29
Q

What is a big determining factor on whether or not you denormalize your data?

A

It largely depends on the kinds of queries you expect to run against the data, and how fast they need to be, to deliver a good customer experience

30
Q

What is AWS’s ‘data lake’ solution?

A

S3

31
Q

T/F: data in a data lake has to be structured

A

False. A data lake is just a giant pile of raw data, which can be unstructured, semi-structured, or structured