Designing Systems that Scale Flashcards by Casey Graves

What are the pros and cons of the single-server design?

Pros: Cheap, easy to maintain
Cons: Single point of failure, cannot scale to support more users

In a single-server design, what is the main benefit of separating the database from the main server into a separate resource?

You can now scale the server and database independently

What is vertical scaling?

Replacing your current web server with a larger, more expensive web server that has more CPU and memory

What are the pros and cons of vertical scaling?

Pros: easier to maintain
Cons: Expensive, single point of failure, limited scalability as you get larger

What is the purpose of a load balancer in horizontal scaling?

It distributes work evenly across multiple machines
It can reroute traffic to different servers if one or more go down
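
A minimal sketch of both behaviors, assuming a simple round-robin policy (real load balancers offer other policies too). The class name `RoundRobinBalancer` and the server names are hypothetical, chosen only for illustration:

```python
class RoundRobinBalancer:
    """Toy load balancer: rotates through servers and skips ones marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once, starting from the rotation pointer.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
first_round = [lb.pick() for _ in range(3)]    # work is spread evenly
lb.mark_down("web-2")                          # simulate a failed server
after_failure = [lb.pick() for _ in range(4)]  # traffic skips web-2
```

After `web-2` is marked down, requests keep flowing to the remaining two servers without the client noticing.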

What is horizontal scaling?

Scaling your application out across a cluster of machines, with a load balancer distributing the work among them

What are the pros and cons of horizontal scaling?

Pros: Resilient, can keep scaling by adding more machines
Cons: More instances to manage, individual web servers must be stateless

In the horizontal scaling technique, what does it mean that the individual web servers must be stateless?

The web servers have to be stateless because any request from a client must be able to go to any web server in the cluster. You can’t assume that a given web server knows the state left by a previous request and can act on it.
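
One common way to get there is to keep per-user state in a store shared by all the web servers. A minimal sketch, assuming a plain dict stands in for that shared store (in practice it would be an external cache or database); `WebServer` and the session ID are hypothetical names:

```python
# Stand-in for a shared external store; this is an assumption for illustration.
session_store = {}


class WebServer:
    """Holds no per-user state of its own; everything lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        cart = self.store.setdefault(session_id, [])
        cart.append(item)
        return len(cart)

    def view_cart(self, session_id):
        return list(self.store.get(session_id, []))


server_a = WebServer("web-a", session_store)
server_b = WebServer("web-b", session_store)

# The first request lands on one server, the follow-up on another.
server_a.add_to_cart("session-42", "book")
cart_seen_by_b = server_b.view_cart("session-42")
```

Because neither server keeps the cart in its own memory, the load balancer is free to send each request anywhere.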

T/F: Choose the simplest architecture that meets your projected traffic requirements, but no simpler

True

Explain the ‘cold standby’ failover strategy

If a database goes down, you use periodic backups to restore your data on another database/server

Explain the pros/cons of the ‘cold standby’ failover strategy

Pros: Better than nothing, since it prevents you from losing all your data. Cheap and simple.
Cons: Takes a long time to transfer backup data to the new database, and you’ll lose any data written since the last periodic backup

Explain the ‘warm standby’ failover strategy

You have a mirror image of your database that can be instantly ready to take the place of the main database (replication)

Explain the pros/cons of the ‘warm standby’ failover strategy

Pros: Near instant switch to backup, little/no data loss, easy to turn on (handled by db platforms)
Cons: More expensive, tiny chance that you might still lose data during the switch to backup

Explain the ‘hot standby’ failover strategy

The front-end web server writes the same data to multiple hosts at the same time, so if one host goes down, traffic is immediately routed to another host.
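
A rough sketch of the idea, assuming in-memory hosts. `Host` and `HotStandbyWriter` are hypothetical names, and the sketch deliberately ignores how a recovered host would catch up on writes it missed:

```python
class Host:
    """Toy database host with an up/down flag."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class HotStandbyWriter:
    """Writes every record to all live hosts and reads from the first live one."""

    def __init__(self, hosts):
        self.hosts = hosts

    def write(self, key, value):
        live = [h for h in self.hosts if h.up]
        if not live:
            raise RuntimeError("no hosts available")
        for host in live:
            host.data[key] = value
        return len(live)  # number of copies written

    def read(self, key):
        for host in self.hosts:
            if host.up:
                return host.data[key]
        raise RuntimeError("no hosts available")


hosts = [Host("db-1"), Host("db-2")]
db = HotStandbyWriter(hosts)
copies = db.write("order:1", "paid")

hosts[0].up = False                      # db-1 goes down
value_after_failure = db.read("order:1")  # served by db-2 with no restore step
```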

Explain the pros/cons of the ‘hot standby’ failover strategy

Pros: most resilient, essentially a horizontal scaling solution
Cons: Most expensive

What is a shard in a horizontally scaled database?

It’s a horizontal partition of your database
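
A minimal sketch of how rows might be routed to shards, assuming hash-based partitioning on a string key (other schemes, such as range-based partitioning, exist). The shard count, key format, and helper names are assumptions for illustration:

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Each dict stands in for one shard (one horizontal partition).
shards = [dict() for _ in range(NUM_SHARDS)]


def put(key, value):
    shards[shard_for(key)][key] = value


def get(key):
    # Only the one shard that owns the key is consulted.
    return shards[shard_for(key)].get(key)


put("user:1001", {"name": "Ada"})
put("user:1002", {"name": "Grace"})
```

A stable hash is used here rather than Python's built-in `hash()`, which is randomized per process for strings and would route the same key differently after a restart.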

What are the pros/cons of sharding your database?

Pros: Resilient, because each piece of your data can be backed up separately. Scales as your database grows.
Cons: Queries that join or merge data across shards get complicated, so you need to organize your data across partitions to minimize those cross-shard operations

What is MongoDB?

It’s a distributed NoSQL document database that stores data as JSON-like documents, partitions (shards) it by a shard key, and manages replication

What is Amazon RDS?

It’s a service in AWS that manages relational database servers and allows you to quickly query structured data

What is Amazon Redshift?

It’s a service in AWS that supports data warehouse/lake approaches, letting you access and analyze huge amounts of data

What are the scalability differences between RDS and Redshift?

RDS can only scale vertically, but can generate multiple read replicas to support increased user traffic.
Redshift can scale vertically and horizontally due to the nature of how it distributes data across nodes

What are the tradeoffs between RDS and Redshift?

RDS delivers fast queries and is cheaper; however, it can only scale vertically, which is a limitation if you need to deal with lots of data. Redshift can scale horizontally, allowing it to store massive amounts of data. However, queries can take much longer because they have to run across multiple nodes.

What does it mean that SQL databases follow a relational model?

It means that structured data is stored in tables, and foreign keys are used to form relationships across multiple tables
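
A small illustration using Python's built-in `sqlite3` module with an in-memory database. The `authors` and `books` tables are hypothetical examples, not part of the deck:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE books (
           id INTEGER PRIMARY KEY,
           title TEXT NOT NULL,
           author_id INTEGER NOT NULL REFERENCES authors(id)
       )"""
)

conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO books (title, author_id) VALUES ('Notes', 1)")

# The foreign key books.author_id links each book row to a row in authors.
rows = conn.execute(
    "SELECT books.title, authors.name "
    "FROM books JOIN authors ON books.author_id = authors.id"
).fetchall()
```

The join follows the foreign key to combine data from both tables into one result row.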

What does it mean that NoSQL databases follow a non-relational model?

It means that data is stored as documents, key-value pairs, or graphs where there is no defined schema

What's another name for NoSQL databases?

Sharded databases

What is the "hot spot" or "celebrity" problem in NoSQL databases?

A "hot spot" is a shard that gets much more traffic than any other shard for whatever reason. Because of this, you need to be aware/scale for this type of increased traffic.
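
One rough way to spot a hot shard is to compare each shard's request count against the average. This is only a sketch: the function name, the 2x threshold, and the sample traffic are assumptions for illustration, and the average covers only shards that received at least one request:

```python
from collections import Counter


def find_hot_shards(request_shard_ids, threshold=2.0):
    """Return shard IDs whose request count exceeds threshold x the average."""
    counts = Counter(request_shard_ids)
    if not counts:
        return []
    average = sum(counts.values()) / len(counts)
    return sorted(s for s, n in counts.items() if n > threshold * average)


# Made-up traffic log: one entry per request, holding the shard it hit.
traffic = [0] * 10 + [1] * 12 + [2] * 90 + [3] * 8
hot = find_hot_shards(traffic)
```

Here shard 2 receives 90 of the 120 requests, well above twice the average of 30 per shard, so it is the one flagged.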

What are the main traits of normalized data?

Less storage space, more lookups (because data is stored in separate tables), updates happen in one place (one table)

What are the main traits of denormalized data?

More storage space (because data is often redundant to keep relevant data close together), one lookup (assuming you organized relevant data together), updates are difficult (because you have to update multiple instances of the same data)
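
The lookup and update trade-offs can be seen in a toy example. This sketch uses plain dicts and lists in place of tables; the user and order data are made up for illustration:

```python
# Normalized: the user's name is stored once, and orders refer to it by ID.
users = {1: {"name": "Ada"}}
orders = [{"order_id": 100, "user_id": 1}, {"order_id": 101, "user_id": 1}]

# Denormalized: the user's name is copied into every order.
orders_denorm = [
    {"order_id": 100, "user_id": 1, "user_name": "Ada"},
    {"order_id": 101, "user_id": 1, "user_name": "Ada"},
]


def order_with_user_normalized(order):
    # Needs a second lookup into the users "table".
    return {**order, "user_name": users[order["user_id"]]["name"]}


def rename_user_normalized(user_id, new_name):
    users[user_id]["name"] = new_name
    return 1  # one record updated


def rename_user_denormalized(user_id, new_name):
    touched = 0
    for order in orders_denorm:
        if order["user_id"] == user_id:
            order["user_name"] = new_name
            touched += 1
    return touched  # every redundant copy had to be updated


normalized_updates = rename_user_normalized(1, "Ada L.")
denormalized_updates = rename_user_denormalized(1, "Ada L.")
```

Reading a denormalized order needs no extra lookup, but renaming the user touches every copy instead of a single row.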

What is a big determining factor on whether or not you denormalize your data?

It largely depends on the types of calls you plan on making to the data for the overall customer experience

What is AWS's 'data lake' solution?

T/F: data in a data lake has to be structured

False, a data lake is just a giant pile of unstructured data

(31 cards)