Scaling Flashcards
What does DNS stand for?
Domain Name System
What does the DNS Server do?
Converts website domain names to IP Addresses
What does HTTP stand for?
Hypertext Transfer Protocol
What does IP stand for?
Internet Protocol
What does JSON stand for?
JavaScript Object Notation
Under what circumstances might you use a non-relational database over a relational database?
- You need super-low latency
- Your data is unstructured
- You only need to serialize / deserialize data
- You need to store a large amount of data
What are the 4 common types of non-relational databases?
- key-value stores
- graph stores
- column stores
- document stores
What is a join operation?
For databases
Join combines data from multiple tables into a new dataset based on a specified condition (like a common field)
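A minimal sketch of a join, using Python's built-in sqlite3 module with an in-memory database. The users and orders tables and their columns are made up for illustration; the common field is orders.user_id = users.id.

```python
import sqlite3

# Hypothetical tables: users(id, name) and orders(id, user_id, item).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT);
    INSERT INTO users VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1, 'book'), (11, 1, 'pen'), (12, 2, 'mug');
""")

# Combine rows from both tables wherever the common field matches.
rows = conn.execute("""
    SELECT users.name, orders.item
    FROM users
    JOIN orders ON orders.user_id = users.id
    ORDER BY orders.id
""").fetchall()

print(rows)
```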
Do non-relational databases support join operations?
Generally, no
What is the difference between vertical scaling and horizontal scaling?
- Vertical Scaling involves making your existing resources more powerful
- Horizontal Scaling involves adding more resources to your pool of resources
What are 2 reasons you would prefer horizontal scaling over vertical scaling?
- Vertical scaling has a hard limit
- Vertical scaling does not have failover/redundancy.
What does a load balancer do?
Distributes incoming traffic among servers.
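One simple way to distribute traffic is round-robin, sketched below. Round-robin is only one possible policy (the card does not name one), and the class and server names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation (one possible balancing policy)."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        # Each call returns the next server in the rotation.
        return next(self._cycle)


balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
picks = [balancer.pick() for _ in range(5)]
print(picks)
```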
In the context of Database Replication
Explain the difference between master DBs and slave DBs
- A master DB handles all write operations
- Slave DBs hold copies of the master's data and serve only read operations
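A rough sketch of this split, with plain dictionaries standing in for databases. The ReplicatedDB class is hypothetical, and replication here is synchronous purely to keep the example short; real systems often replicate asynchronously.

```python
import random


class ReplicatedDB:
    """Toy master/slave setup: writes go to the master, reads go to a slave."""

    def __init__(self, num_slaves):
        self.master = {}
        self.slaves = [{} for _ in range(num_slaves)]

    def write(self, key, value):
        self.master[key] = value
        # Assumption: copy to every slave immediately (synchronous replication).
        for slave in self.slaves:
            slave[key] = value

    def read(self, key):
        # Any slave can serve the read, which spreads read load.
        return random.choice(self.slaves).get(key)


db = ReplicatedDB(num_slaves=2)
db.write("a", 1)
value = db.read("a")
```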
What are 3 key benefits of database replication?
- Performance (parallelization)
- Reliability
- High Availability
Describe the data flow for a read-through cache
read(x):
- If x not in cache:
  - cache[x] := db.read(x)
- return cache[x]
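The read flow above could be sketched in Python as follows. ReadThroughCache and CountingDB are hypothetical names; the database is any object with a read(key) method, and the stub counts reads to show that the second lookup never reaches it.

```python
class CountingDB:
    """Stand-in database that counts how often it is read."""

    def __init__(self, data):
        self.data = dict(data)
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.data[key]


class ReadThroughCache:
    def __init__(self, db):
        self.db = db
        self.store = {}

    def read(self, key):
        # On a miss, load from the database and keep a copy.
        if key not in self.store:
            self.store[key] = self.db.read(key)
        return self.store[key]


db = CountingDB({"x": 42})
cache = ReadThroughCache(db)
first = cache.read("x")   # miss: goes to the database
second = cache.read("x")  # hit: served from the cache
```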
Describe the data flow for a write-through cache
write(x, v):
- If x in cache:
  - cache[x] := v
- db.write(x, v)
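A sketch of the write flow, reading the card as: refresh the cached copy if one exists, and always write to the database. WriteThroughCache and DictDB are hypothetical names.

```python
class DictDB:
    """Stand-in database backed by a dictionary."""

    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value


class WriteThroughCache:
    def __init__(self, db):
        self.db = db
        self.store = {}

    def write(self, key, value):
        # Keep an existing cached copy in sync with the new value.
        if key in self.store:
            self.store[key] = value
        # The database write happens on every call, cached or not.
        self.db.write(key, value)


db = DictDB()
cache = WriteThroughCache(db)
cache.store["x"] = 1  # pretend "x" was cached by an earlier read
cache.write("x", 2)   # cached key: cache and database both updated
cache.write("y", 3)   # uncached key: only the database is written
```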
Can a cache be both read-through AND write-through?
Yes
Do you need a different cache in each datacenter?
Yes!
otherwise the cache can become a SPOF
In caching
What is an expiration policy?
How long you wait before removing data from the cache
What problem can happen if a cache's expiration time is too short?
You lose the speedup benefits of caching, because data is evicted and reloaded from the database too often
What problem can happen if a cache's expiration time is too long?
Data can become stale
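A minimal sketch of a time-based expiration policy. TTLCache is a hypothetical class; the clock is injectable so the example can advance time by hand instead of sleeping.

```python
import time


class TTLCache:
    """Cache whose entries expire ttl_seconds after they are stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # key -> (value, expires_at)

    def put(self, key, value):
        self.store[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: drop it so the caller falls back to the database.
            del self.store[key]
            return None
        return value


now = [0.0]  # fake clock, advanced manually below
cache = TTLCache(ttl_seconds=10, clock=lambda: now[0])
cache.put("k", "v")
fresh = cache.get("k")    # still within the TTL
now[0] = 10.0
expired = cache.get("k")  # TTL elapsed, entry is gone
```

A short TTL means more misses and more database reads; a long TTL means the cached value can lag behind the database for longer.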
What does SPOF stand for?
Single Point Of Failure
What is the best situation for caching?
When data is read frequently but modified infrequently
What are the 5 key considerations around caching?
- Effectiveness (high read, low write)
- Expiration Policy
- Consistency
- Failure Mitigation (avoid becoming SPOF)
- Invalidation Policy
What does CDN stand for?
Content Delivery Network
What is a CDN?
A network of geographically dispersed servers used to deliver static content
What is the typical cost structure for a CDN?
Charged by the amount of data transferred in/out
When a user visits a website, which CDN server should deliver content to the user?
The CDN server geographically closest to the user
Why is stateless architecture generally preferable to stateful architecture?
Because requests from the same client can be routed to different servers
without the overhead of sticky sessions
So, better decoupling
How might we be able to store “state” data in a stateless architecture?
Keep state data in a separate data store from the rest of the web layer architecture
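A toy sketch of this idea: two web servers hold no session data of their own and share one external store, so either can serve the next request. The WebServer class and the dictionary standing in for the data store are illustrative assumptions.

```python
class WebServer:
    """Stateless server: all session state lives in the shared store."""

    def __init__(self, name, session_store):
        self.name = name
        self.session_store = session_store

    def handle(self, session_id):
        # Load state from the shared store, update it, and write it back.
        session = self.session_store.setdefault(session_id, {"visits": 0})
        session["visits"] += 1
        return session["visits"]


shared_store = {}  # stand-in for a separate data store
server_a = WebServer("web-1", shared_store)
server_b = WebServer("web-2", shared_store)

first = server_a.handle("session-123")
second = server_b.handle("session-123")  # different server, same session
```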
How might we improve website availability / performance across wider geographical areas?
Use multiple data centers (think AWS regions / AZs), and geo-load balancing
Using a CDN helps with geo-performance but not availability
Describe a message queue architecture
(think SQS):
X Producers putting work on the queue, Y consumers taking work off the queue; decouples producer and consumer work
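An in-process sketch of the same shape using Python's queue and threading modules. This is not SQS, only a stand-in: 2 producers enqueue numbers and 3 consumers square them, with a None sentinel (an assumption of this sketch) telling each consumer to stop.

```python
import queue
import threading

work = queue.Queue()
results = []
results_lock = threading.Lock()


def producer(start, count):
    # Producers only know about the queue, not about the consumers.
    for n in range(start, start + count):
        work.put(n)


def consumer():
    while True:
        item = work.get()
        if item is None:  # sentinel: no more work
            break
        with results_lock:
            results.append(item * item)


consumers = [threading.Thread(target=consumer) for _ in range(3)]
producers = [threading.Thread(target=producer, args=(i * 5, 5)) for i in range(2)]
for t in consumers + producers:
    t.start()
for t in producers:
    t.join()
for _ in consumers:
    work.put(None)  # one sentinel per consumer, queued after all real work
for t in consumers:
    t.join()
```

Because producers and consumers share only the queue, their counts can be changed independently.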
How can you perform horizontal scaling at the database level without replicating your database?
DB sharding (e.g. with a shard hash of primary_key % num_shards)
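The modulo shard hash can be sketched as below, with dictionaries standing in for the shard databases. The shard count of 4 and the user IDs are arbitrary choices for illustration.

```python
def shard_for(primary_key, num_shards):
    """Pick a shard with the hash from the card: primary_key % num_shards."""
    return primary_key % num_shards


NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]

# Route each row to its shard by primary key.
for user_id in range(1, 11):
    shards[shard_for(user_id, NUM_SHARDS)][user_id] = f"user-{user_id}"

print([sorted(shard) for shard in shards])
```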
What are 3 downsides of sharding?
- Resharding after growth
- Celebrity Problem or an uneven traffic distribution
- Complicated Joins (can be mitigated by denormalizing data)
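To see why resharding hurts with a primary_key % num_shards hash, the sketch below counts how many keys land on a different shard when one shard is added. The key range and shard counts are arbitrary, and moved_keys is a hypothetical helper.

```python
def moved_keys(keys, old_shards, new_shards):
    """Return the keys whose shard changes when the shard count changes."""
    return [k for k in keys if k % old_shards != k % new_shards]


keys = range(1000)
moved = moved_keys(keys, old_shards=4, new_shards=5)
fraction_moved = len(moved) / len(keys)
print(f"{len(moved)} of {len(keys)} keys move ({fraction_moved:.0%})")
```

Going from 4 to 5 shards relocates 800 of these 1000 keys, so growing the cluster means migrating most of the data.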
What are the 8 big ideas for scaling?
- Statelessness
- Redundancy
- Caching
- Multiple Data Centers
- CDNs for static assets
- Sharding in Data Tier
- Decoupling
- Logging / Monitoring / Automation