System Design Flashcards
TCP v. UDP
TCP
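Transport-layer protocol – accuracy > speed
Connection-oriented – client and server establish a connection (three-way handshake) before data is sent
Stateful protocol – tracks sequence numbers and acknowledgments, so it can detect lost or corrupted data and retransmit it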
UDP
speed > accuracy
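Connectionless – no delivery or ordering guarantees, so some packets may be lost; suits real-time services that tolerate occasional loss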
HTTP and HTTPS
defines request methods (GET, POST, etc.), addresses (URLs), and default ports (80 for HTTP, 443 for HTTPS)
HTTPS is HTTP running on top of Transport Layer Security (TLS)
TLS Handshake
client sends a request
server sends its digital certificate
if the client accepts the certificate, it generates a session key (sent encrypted with the server's public key) used to encrypt info transmitted during the session
handshake finishes, session begins
WebSocket
unprovoked server send!
server sends data to clients without receiving a request first
full-duplex: messages pass back and forth over one persistent connection
use case: real-time data where up-to-date info is critical (e.g., chat, live dashboards)
transport layer
tcp – accuracy > speed
udp – speed > accuracy (video streaming)
retries
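fail fast – low retry limit, then alert the user
risk: thundering herd (many clients retry at the same moment and overwhelm a recovering service)
jitter introduces randomness into retry timing so requests spread out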
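A minimal sketch of the retry card, assuming exponential backoff with "full jitter" (a random sleep between 0 and the current backoff cap). The function name `retry_with_backoff` and its default limits are illustrative choices, not a specific library's API.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=3, base_delay=0.1, max_delay=2.0):
    """Call operation(); on exception, retry with exponential backoff plus full jitter.

    A low max_attempts implements "fail fast": after the last attempt the
    exception propagates so the caller can alert the user.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retries: fail fast and surface the error
            # Backoff doubles each attempt up to max_delay; full jitter picks a
            # random point in [0, cap] so many clients don't retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, cap))
```

Without the random component, every client that failed at the same instant would also retry at the same instant, which recreates the thundering herd.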
circuit breakers
opens (trips) when failures cross a threshold; further calls fail immediately instead of hitting the failing service
prevents cascading failures when a shared resource goes down
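One way to sketch a circuit breaker is a simple three-state version (closed, open, half-open). The threshold, timeout, and injectable `clock` parameter are assumptions made so the behavior is easy to test. Production breakers usually add failure-rate windows and metrics.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are rejected without trying."""


class CircuitBreaker:
    """Minimal breaker: closed -> open after N failures -> half-open after a timeout."""

    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail immediately instead of hammering the failing service.
                raise CircuitOpenError("circuit open")
            # Half-open: the timeout elapsed, so let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip after a failed trial)
            raise
        # Any success closes the breaker and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```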
rate limiting
caps usage so autoscaling doesn't exceed budget; controls request volume per customer
token bucket
leaky bucket
fixed and sliding window
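Here is a minimal in-process token bucket, with one bucket per customer to match "control reqs by customer". The capacity and refill rate are assumed values, and `allow_request` is a hypothetical helper. A shared limiter across many servers would keep this state in a central store instead.

```python
import time


class TokenBucket:
    """Tokens refill at `rate` per second up to `capacity`; each request spends one.

    Bursts up to `capacity` are allowed, while the long-run rate is capped at `rate`.
    """

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)  # start full
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets = {}  # customer_id -> TokenBucket


def allow_request(customer_id):
    """Per-customer limit: each customer gets its own bucket (assumed burst 5, 1 req/s)."""
    if customer_id not in buckets:
        buckets[customer_id] = TokenBucket(capacity=5, rate=1.0)
    return buckets[customer_id].allow()
```

A leaky bucket drains queued requests at a constant rate instead of allowing bursts. A fixed window just counts requests per clock interval.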
queue-based load leveling
tasks go into a queue when many clients request a service concurrently; the service processes them at its own steady rate
introduces latency
good when the extra latency is acceptable and order matters
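A small sketch of queue-based load leveling using the standard library's thread-safe `queue.Queue`: a burst of tasks lands in the queue at once, and a single worker drains it at its own pace. The `service_delay` sleep is a stand-in for a slow downstream service.

```python
import queue
import threading
import time


def run_load_leveling(tasks, service_delay=0.01):
    """Enqueue a burst of tasks; one worker processes them in FIFO order."""
    q = queue.Queue()
    processed = []

    def worker():
        while True:
            task = q.get()
            if task is None:  # sentinel: no more work
                q.task_done()
                break
            time.sleep(service_delay)  # the service handles one task at a time
            processed.append(task)
            q.task_done()

    t = threading.Thread(target=worker)
    t.start()
    for task in tasks:  # the burst is absorbed by the queue, not the service
        q.put(task)
    q.put(None)
    t.join()
    return processed
```

A single worker preserves arrival order. Adding more workers raises throughput but gives up strict ordering.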
gateway aggregation
a gateway in front of the backend services receives one client request, dispatches requests to multiple backends, and aggregates their responses
Risk: single point of failure.
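The fan-out and aggregate step can be sketched with `asyncio.gather`. The three backend functions are hypothetical stand-ins for HTTP or RPC calls, with sleeps simulating network latency.

```python
import asyncio


# Hypothetical backend services; real ones would be network calls.
async def fetch_user(user_id):
    await asyncio.sleep(0.01)
    return {"id": user_id, "name": "Ada"}


async def fetch_orders(user_id):
    await asyncio.sleep(0.01)
    return [{"order_id": 1, "user_id": user_id}]


async def fetch_recommendations(user_id):
    await asyncio.sleep(0.01)
    return ["book", "lamp"]


async def gateway_profile(user_id):
    """One client request fans out to several backends concurrently, then merges."""
    user, orders, recs = await asyncio.gather(
        fetch_user(user_id),
        fetch_orders(user_id),
        fetch_recommendations(user_id),
    )
    return {"user": user, "orders": orders, "recommendations": recs}
```

The client makes one round trip instead of three. However, if the gateway is down, every one of those calls fails with it.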
load balancing methods
round robin, least connections, consistent hashing
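The three methods can be sketched as server pickers. These are illustrative implementations under stated assumptions: MD5 is used only as a stable, non-security hash (Python's built-in `hash()` is salted per process), and 100 virtual nodes per server is an arbitrary choice.

```python
import bisect
import hashlib
import itertools


def stable_hash(key):
    """Deterministic 64-bit hash for placing keys and servers on the ring."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class RoundRobin:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}  # open connections per server

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


class ConsistentHashRing:
    """Each server owns many points on a ring; a key goes to the next point clockwise."""

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, server)
        for s in servers:
            self.add(s)

    def add(self, server):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (stable_hash(f"{server}#{i}"), server))

    def pick(self, key):
        idx = bisect.bisect(self.ring, (stable_hash(key),))
        return self.ring[idx % len(self.ring)][1]  # wrap around the ring
```

With consistent hashing, adding a server only moves the keys that now fall on its points. With `hash(key) % N`, by contrast, most keys move.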
load balancing industry standard
NGINX, Amazon ELB (Elastic Load Balancing)
load balancing pros
reliability, scalability, performance
load balancing risks
the load balancer itself can become a bottleneck
need to share session data across backends
longer deploys
scalable systems features
reliability (retries)
availability (rate limiting)
load balancing
sql db pros
relational - foreign keys
queried with SQL (Structured Query Language)
structured data
ACID compliant - all or nothing transactions
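A small demonstration of foreign keys and all-or-nothing transactions using Python's built-in `sqlite3` with an in-memory database. The schema and the `transfer` helper are made up for illustration. The credit is applied before the debit on purpose, so a failed debit shows the earlier update being rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    CREATE TABLE transfers (
        id INTEGER PRIMARY KEY,
        from_id INTEGER NOT NULL REFERENCES accounts(id),
        to_id INTEGER NOT NULL REFERENCES accounts(id),
        amount INTEGER NOT NULL
    );
    INSERT INTO accounts (id, balance) VALUES (1, 100), (2, 0);
""")


def transfer(conn, from_id, to_id, amount):
    """Either every statement commits or none do."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, to_id))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, from_id))
            conn.execute(
                "INSERT INTO transfers (from_id, to_id, amount) VALUES (?, ?, ?)",
                (from_id, to_id, amount),
            )
        return True
    except sqlite3.IntegrityError:  # CHECK or FOREIGN KEY violation
        return False
```

An overdraft violates the CHECK constraint, and a transfer to a nonexistent account violates the foreign key. In both cases the already-applied credit is undone.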
sql db cons
hard to scale write-heavy systems
more work to define schema
harder to store unstructured data
nosql db pros
good for unstructured data
flexible data models, e.g. key-value stores or documents of key-value pairs
scales horizontally to support write-heavy and read-heavy systems
nosql db cons
eventual consistency
harder to query across multiple tables/collections (limited joins)
types of db sharding
geo sharding
range sharding (e.g., by first letter of a key)
hash sharding
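The three routing strategies can be sketched as functions that map a key to a shard. The shard names, letter ranges, and region map are all hypothetical.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2", "shard3"]  # hypothetical shard names


def hash_shard(key, shards=SHARDS):
    """Hash sharding: spreads keys evenly, but a range scan must hit every shard."""
    digest = hashlib.sha256(key.encode()).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# Range sharding by first letter; these boundaries are assumed for illustration.
RANGES = [("a", "f", "shard0"), ("g", "m", "shard1"), ("n", "s", "shard2"), ("t", "z", "shard3")]


def range_shard(key):
    """Range sharding: neighboring keys stay together, but popular ranges become hot spots."""
    first = key[0].lower()
    for low, high, shard in RANGES:
        if low <= first <= high:
            return shard
    raise ValueError(f"no shard covers key {key!r}")


# Geo sharding: route by the user's region (hypothetical region-to-shard map).
REGION_SHARDS = {"us": "shard-us", "eu": "shard-eu", "apac": "shard-apac"}


def geo_shard(region):
    return REGION_SHARDS[region]
```

Plain `hash % N` moves most keys whenever N changes. Consistent hashing, from the load balancing cards, is a common way to limit that reshuffling.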
sharding pros
more scalable
faster queries – each shard has a smaller index and less data to scan
one shard’s downtime won’t affect all data
reduce hardware costs
sharding cons
not all data can be sharded
foreign key relationships only maintained within a single shard
cross-shard table joins very expensive
analytics
batch processing
web crawling
batch processing
large file uploads
job queue
real-time events
stream processing - fast yet brittle
generating a newsfeed
pub sub
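A toy in-memory pub/sub broker shows the newsfeed idea: each follower's feed subscribes to an author's topic, so a post fans out to every follower when it is published. The class, topic names, and users are hypothetical. A real system would put a dedicated message broker between the services.

```python
from collections import defaultdict


class PubSub:
    """Publishers post to a topic; every subscriber of that topic receives the message."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)


# Newsfeed sketch: followers' feeds subscribe to the author's topic (fan-out on write).
feeds = defaultdict(list)
broker = PubSub()
for follower in ["bob", "carol"]:
    broker.subscribe("alice", lambda msg, f=follower: feeds[f].append(msg))

broker.publish("alice", "alice posted a photo")
```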
scheduled tasks
job queue, batch processing
in-memory application caching
each application server maintains its own cache in memory
more memory demands
distributed in-memory caching
Redis, Memcached
cache runs on separate (third-party) servers shared by all application servers
database cache
use db to cache
file-system cache
CDN – stores commonly accessed files
caching policies
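FIFO – evict the oldest inserted entry
LRU – evict the least recently used entry
LFU – evict the least frequently used entry
TTL – entries expire after a set time to live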
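An LRU cache can be sketched with `collections.OrderedDict`, which keeps keys ordered from least to most recently used. The `LRUCache` class is illustrative. For memoizing a function, the standard library already provides `functools.lru_cache`.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # least recently used first

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # drop the least recently used entry
```

If you removed the `move_to_end` calls, eviction would follow insertion order instead, which is roughly FIFO.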
write through cache
writes to the cache and the database at the same time. consistency > speed
write-behind cache
writes to the cache first and to the database asynchronously later
speed > consistency
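The two write policies differ only in when the database write happens. In this sketch a plain dict stands in for the database, and the explicit `flush()` method stands in for the background job a real write-behind cache would run.

```python
class WriteThroughCache:
    """Every write goes to the database and the cache before returning."""

    def __init__(self, db):
        self.db = db  # a dict stands in for the real database
        self.cache = {}

    def write(self, key, value):
        self.db[key] = value  # synchronous DB write: slower, but always consistent
        self.cache[key] = value

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.db[key]  # cache miss: load from the DB
        return self.cache[key]


class WriteBehindCache:
    """Writes land in the cache now; dirty keys are persisted to the DB later."""

    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.dirty = set()  # keys written to the cache but not yet persisted

    def write(self, key, value):
        self.cache[key] = value  # fast: no DB round trip on the write path
        self.dirty.add(key)

    def flush(self):
        """Would run on a timer or worker; unflushed writes are lost if the cache dies."""
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()
```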
Symmetric Encryption
Faster, less compute, less secure (the same key must be shared between both parties)
Same key for encryption and decryption.
Used for communication after the TLS handshake.
Asymmetric Encryption
Slower, more compute, more secure. Uses a public/private key pair.
Used during the TLS handshake to exchange the session key.
e.g., RSA in TLS
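The asymmetric idea behind RSA can be shown with textbook RSA and tiny primes (p = 61, q = 53). This version is deliberately insecure and for illustration only. Real TLS uses vetted libraries, padding, and 2048-bit or larger keys, and TLS 1.3 replaced RSA key exchange with ephemeral Diffie-Hellman. Anyone holding the public key can encrypt, but only the private-key holder can decrypt.

```python
def toy_rsa_keys():
    """Textbook RSA with tiny primes. Insecure; illustration only."""
    p, q = 61, 53
    n = p * q                # public modulus: 3233
    phi = (p - 1) * (q - 1)  # 3120
    e = 17                   # public exponent, coprime with phi
    d = pow(e, -1, phi)      # private exponent: modular inverse of e (Python 3.8+)
    return (e, n), (d, n)


def encrypt(message, public_key):
    e, n = public_key
    return pow(message, e, n)  # anyone with the public key can do this


def decrypt(ciphertext, private_key):
    d, n = private_key
    return pow(ciphertext, d, n)  # only the private-key holder can do this
```

In the handshake described above, the client would encrypt the session key this way. After that, both sides switch to fast symmetric encryption with that key.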
In-transit encryption
HTTPS, TLS
At-rest encryption
Encrypt DBs; hash and salt passwords
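The "hash and salt passwords" step can be sketched with the standard library's `hashlib.pbkdf2_hmac`. The iteration count is an assumed work factor, not a recommendation for any particular system. The random per-user salt means identical passwords produce different hashes.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune for your hardware


def hash_password(password):
    """Return (salt, digest); store both, never the plaintext password."""
    salt = os.urandom(16)  # fresh random salt per user defeats precomputed tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)  # constant-time comparison
```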
messaging encryption
end-to-end encryption: decryption keys stored only on users’ devices
Authentication
username and password login
1FA, MFA
Session or Token
Session Authentication
server creates a session ID and stores the session data
session ID stored in a cookie in the user’s browser
stateful – more complex
Token Authentication
Server creates a signed token at login; client stores the token (e.g., in memory) and sends it with each request
stateless – server verifies the token’s signature instead of looking up a session in the db
JWT
compact, signed, easy to parse, transparent (payload is readable; tampering is detectable because the signature no longer matches)
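This sketch builds an HS256 JWT with only the standard library to show why tampering is detectable. The secret key is hypothetical, and the code omits checks such as expiry (`exp`) and algorithm validation, so in practice you would use a maintained JWT library. The payload is only base64url-encoded, not encrypted, so anyone can read it.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-secret"  # hypothetical key; load from a secret store in practice


def b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_jwt(payload, secret=SECRET):
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)


def verify_jwt(token, secret=SECRET):
    """Return the payload if the signature matches, else None. No db lookup needed."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        return None
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        return None  # tampered header/payload or wrong key
    return json.loads(b64url_decode(payload_b64))
```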
Types of Authorization
RBAC (role-based), ABAC (attribute-based) – company wide
ACL (access control list) – granular, per resource
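Here is a contrast between RBAC and ACL checks, using hypothetical users, roles, and resources. With RBAC, a permission comes from the user's role and applies everywhere. With an ACL, it is attached to one specific resource. ABAC is not shown; it would evaluate attributes such as department or time of day.

```python
# Hypothetical roles, permissions, users, and resources for illustration.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "delete"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}
USER_ROLES = {"ana": "admin", "ben": "editor", "cy": "viewer"}

# ACL: per-resource map of user -> allowed actions (granular, object-level).
ACLS = {"report.pdf": {"cy": {"read", "write"}}}


def rbac_allows(user, action):
    """RBAC: permission comes from the user's role, the same for every resource."""
    return action in ROLE_PERMISSIONS.get(USER_ROLES.get(user), set())


def acl_allows(user, resource, action):
    """ACL: permission is listed on the specific resource."""
    return action in ACLS.get(resource, {}).get(user, set())


def can(user, resource, action):
    return rbac_allows(user, action) or acl_allows(user, resource, action)
```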
Cloud architecture pros
Low upfront cost
Professional maintenance by the provider
Scalability
Security
Cloud architecture cons
Higher long-term cost of ownership
Loss of control
Vendor lock-in
Industry-specific regulations
Location-specific (data residency) requirements
No air-gapping
Cloud Provider Offerings
VMs, GPUs, batch processing
Containers
Dbs
Networking
Terraform
Infrastructure as code
Declares VMs, DNS records, and other low-level resources in code
Kubernetes
Declares higher-level (application) resources in code
Groups containers into clusters and manages and allocates resources across them
Push CDN
engineer pushes content to the CDN with every update
Pull CDN
CDN pulls content from the origin server on the first request, then caches it