System Design Basic Concepts Flashcards

1
Q

Four main things to care about with system design

A

Performance, Scalability, Availability, Reliability

2
Q

What do we call the hardware that runs a system?

A

Machine

3
Q

What do we call the hardware a client uses?

A

Device

4
Q

What’s the difference between a “service” and a “server”?

A

A service is a piece of functionality offered to clients. A server is a running instance of a binary that can provide one or many services.

5
Q

What does API stand for?

A

Application Programming Interface

6
Q

What is an API?

A

It defines how programs interact with each other.

7
Q

What is a good mental test for scalability?

A

How well would the system behave if the load increased tenfold?

8
Q

In what context would you use an Entity-Relationship Diagram?

A

Designing a database schema

9
Q

What do you call the set of rules used for an Entity-Relationship Diagram?

A

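Unified Modeling Language (UML) is one commonly used notation. Chen and crow’s-foot notations are also common for ER diagrams.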

10
Q

What does UML stand for?

A

Unified Modeling Language

11
Q

What is monolithic design?

A

When all software is built and deployed as a single unit

12
Q

What are “microservices”?

A

When software is represented as a collection of independent services that communicate with each other

13
Q

What is “loose coupling” vs. “tight coupling”?

A

Loose coupling is when different components and services have minimal dependencies on each other.

Tight coupling is when they’re highly-dependent on each other.

14
Q

What is “high cohesion” vs. “low cohesion”?

A

High cohesion is when the logic, methods and classes of a single service are functionally related.

Low cohesion is when the service does a lot of overlapping things and has a vague role.

15
Q

Why is high cohesion good? Four things.

A

It’s easier to maintain, deploy, test and understand.

16
Q

What are the two models of interservice interaction?

A

Orchestration and Choreography

17
Q

What is orchestration?

A

A model of interservice interaction where one service is the “orchestrator” and manages communication between the other services.

18
Q

What is choreography?

A

A model of interservice interaction where an event stream holds events and each service may produce events or subscribe (listen to) certain events.
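
A minimal sketch of choreography, assuming an in-memory stand-in for the event stream. The `EventStream` class, the service functions, and the event names are hypothetical and exist only for illustration. No service calls another directly; each one reacts to events it subscribed to.

```python
from collections import defaultdict


class EventStream:
    """In-memory stand-in for an event stream (hypothetical, for illustration)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # event type -> list of handlers

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Deliver the event to every service that subscribed to this type.
        for handler in list(self._subscribers[event_type]):
            handler(payload)


stream = EventStream()
log = []


def payment_service(order):
    # Reacts to "order_placed" and produces "payment_completed".
    log.append(f"charged order {order['id']}")
    stream.publish("payment_completed", order)


def shipping_service(order):
    # Reacts to "payment_completed"; knows nothing about the payment service.
    log.append(f"shipped order {order['id']}")


stream.subscribe("order_placed", payment_service)
stream.subscribe("payment_completed", shipping_service)
stream.publish("order_placed", {"id": 42})
print(log)  # ['charged order 42', 'shipped order 42']
```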

19
Q

What is persistence?

A

After data is written to a DB, it is stably stored on non-volatile memory.

20
Q

Volatile vs. non-volatile memory

A

Volatile memory is erased when it is powered off, like RAM. Non-volatile is like a hard drive, which maintains data even when powered off.

21
Q

What does NoSQL stand for?

A

Not Only SQL

22
Q

What is NoSQL?

A

Catch-all term for DBs that don’t store data in relational tables, such as key-value stores.

23
Q

What is a normalized database vs. a de-normalized one?

A

In a normalized database, data is isolated and non-redundant.

A de-normalized database is when data from one table is copied to be part of another. An example is when you have a FrequencyCap table, and a FrequencyCap field on Campaign.

24
Q

Vertical vs. Horizontal Scaling

A

Vertical scaling: Adding resources to a single machine

Horizontal scaling: Adding more machines

25
Q

Three examples of resources you can add to scale vertically

A

CPU, memory, storage

26
Q

Two examples of ways to horizontally scale a database

A

Replication, sharding

27
Q

Replication (in the context of a database)

A

When you copy data from the primary DB to multiple secondary read-only nodes.

28
Q

Sharding (in the context of a database)

A

When you split data into smaller datasets, which you distribute.

29
Q

Vertical partitioning (in the context of a database)

A

When data in a DB is sharded by-column.

30
Q

How do you avoid imbalanced data when sharding a database? That is, one shard having a lot more data than the others.

A

You hash the shard key.
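
A minimal sketch of hashing a shard key, assuming a fixed number of shards. The function name `shard_for` and the `user-N` keys are hypothetical. Plain modulo hashing like this remaps most keys if the shard count changes, so treat it as an illustration of balancing only.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer, then take the remainder.
    return int.from_bytes(digest[:8], "big") % num_shards


# Sequential keys spread across shards instead of piling onto one.
counts = [0] * 4
for user_id in range(10_000):
    counts[shard_for(f"user-{user_id}", 4)] += 1
print(counts)  # four counts, each close to 2,500
```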

31
Q

What does TCP/IP stand for?

A

Transmission Control Protocol / Internet Protocol

32
Q

What is TCP/IP?

A

A model for how data is transmitted around the internet.

33
Q

What are the four layers of TCP/IP?

A

Application layer (HTTP)
Transport layer (TCP)
Internet layer (IP)
Link layer, also called the network access layer (Ethernet, Wi-Fi)

34
Q

Internet Protocol

A

Rules for how data packets are routed across networks.

35
Q

Transmission Control Protocol

A

Rules for how to deal with network unreliability, so that data arrives complete and in order.

36
Q

What does UDP stand for?

A

User Datagram Protocol

37
Q

What is UDP?

A

A connectionless transport protocol that, unlike TCP, does not retransmit lost packets or guarantee ordering. That makes it faster, at the cost of tolerating dropped packets.

38
Q

In what situation might you want to use UDP over TCP?

A

When you’re OK with dropping packets, like for streaming video

39
Q

What does SSL stand for?

A

Secure Sockets Layer

40
Q

What does TLS stand for?

A

Transport Layer Security

41
Q

What’s the relationship between SSL and TLS?

A

SSL is deprecated in favor of TLS.

42
Q

What is SSL?

A

It encrypts connections between the client and server.

43
Q

What is HTTPS?

A

It’s HTTP secured by TLS (formerly SSL).

44
Q

What does DNS stand for?

A

Domain Name System

45
Q

What does DNS do?

A

It resolves domain names to IP addresses.

46
Q

Proxy

A

A server relaying traffic to/from the client, and applying logic to that traffic.

47
Q

Reverse Proxy

A

A server that accepts requests from clients and forwards them to one or more backend servers.

48
Q

What’s the difference between a proxy and a reverse proxy?

A

Proxy = used by client
Reverse Proxy = used by server

49
Q

What is the most common use case for a reverse proxy?

A

Load balancer

50
Q

System Integration

A

Term for deciding how components share information

51
Q

Database Integration

A

A system integration strategy where the database is the primary means of sharing information between components.

52
Q

What does REST stand for?

A

Representational State Transfer

53
Q

What are the four REST standards?

A

1: Client and server should be separate and independent

2: Data is represented as resources managed by the server

3: Interfaces support a common set of operations

4: Operations are stateless – the client leaves no context on the server

54
Q

RESTful API

A

An API that follows REST standards

55
Q

Explain how HTTP follows REST standards

A

1: Client and server are separate and independent

2: HTML, CSS, JS, images, etc. are resources managed by the server

3: GET, POST, PUT, DELETE are an interface with a common set of operations

4: Client leaves no context on the server

56
Q

What does RPC stand for?

A

Remote Procedure Call

57
Q

What is an RPC?

A

It invokes a routine in a process on some other machine

58
Q

Stub

A

Interface that handles RPCs by serializing/deserializing and redirecting the call
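
A minimal sketch of a client-side and server-side stub, assuming JSON as the wire format and a plain function call standing in for the network. The names `CalculatorStub`, `handle_request`, and `PROCEDURES` are hypothetical, not part of any real RPC library.

```python
import json


def add(a, b):
    """The actual routine; in a real system it lives in a process on another machine."""
    return a + b


PROCEDURES = {"add": add}


def handle_request(raw: bytes) -> bytes:
    """Server-side stub: deserialize, dispatch to the real routine, serialize the result."""
    request = json.loads(raw.decode("utf-8"))
    result = PROCEDURES[request["method"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


class CalculatorStub:
    """Client-side stub: looks like a local object but forwards calls as messages."""

    def __init__(self, transport):
        # transport is any callable taking bytes and returning bytes;
        # here it stands in for the network hop.
        self._transport = transport

    def add(self, a, b):
        raw = json.dumps({"method": "add", "args": [a, b]}).encode("utf-8")
        reply = self._transport(raw)
        return json.loads(reply.decode("utf-8"))["result"]


calc = CalculatorStub(transport=handle_request)
print(calc.add(2, 3))  # 5
```

The caller only sees `calc.add(2, 3)`; the serialization and redirection stay hidden inside the stub.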

59
Q

What does IDL stand for?

A

Interface Definition Language

60
Q

What is an IDL for?

A

It defines how components communicate via RPCs.

61
Q

What’s a Google example of an IDL?

A

Protocol buffers (protos)

62
Q

Distributed System

A

A group of processes running on different machines and communicating through a network

63
Q

What are the two key problems you have to solve with a distributed system?

A

Network unreliability
Data inconsistency

64
Q

What are three causes of network unreliability?

A

Network faults
Network congestion
Network partitions

65
Q

Network fault

A

When machines are unable to communicate with each other for some reason

66
Q

Network congestion

A

When there is too much traffic on the network

67
Q

Network partition

A

When the network is split into two groups of machines that can’t communicate with each other

68
Q

Strong consistency

A

When simultaneous read requests to different nodes are guaranteed to return the same, most recently written data

69
Q

Eventual consistency

A

When nodes are guaranteed to eventually have the same data, albeit not immediately and simultaneously.

70
Q

What does CAP stand for?

A

Consistency, Availability, Partition Tolerance

Consistency really means strong consistency

71
Q

What is the CAP Theorem?

A

A distributed system can’t guarantee all three of the following at once:
strong consistency
availability
partition tolerance

72
Q

Problem with a CA system

A

It can’t tolerate a partition so it’s basically a single-node system

73
Q

Problem with an AP system

A

It doesn’t have strong consistency, so a partition could cause stale data in nodes.

74
Q

Problem with a CP system

A

It doesn’t guarantee availability: some nodes may have to stop serving requests during a partition.

75
Q

Five 9s

A

A measure of availability: the system is up 99.999% of the time, which allows roughly five minutes of downtime per year.
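
A short worked calculation of what an availability percentage allows in downtime, assuming a 365-day year. The function name is hypothetical.

```python
def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of allowed downtime per year for a given availability target."""
    minutes_per_year = 365 * 24 * 60  # 525,600, ignoring leap years
    return minutes_per_year * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.2f} min/year")
```

Each extra nine cuts the allowed downtime by a factor of ten, from about 526 minutes at three nines to about 5.26 minutes at five nines.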

76
Q

Cluster

A

A group of nodes

77
Q

Heartbeat

A

When nodes send periodic messages to the coordination service to indicate normal operation
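
A minimal sketch of the coordination-service side of heartbeats, assuming timestamps are passed in explicitly so the example stays deterministic. `HeartbeatMonitor` and the node names are hypothetical.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each node."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self._last_seen = {}  # node id -> time of most recent heartbeat

    def record_heartbeat(self, node_id: str, now: float) -> None:
        self._last_seen[node_id] = now

    def unhealthy_nodes(self, now: float) -> list:
        # A node is suspect once it has been silent longer than the timeout.
        return sorted(
            node
            for node, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=3.0)
monitor.record_heartbeat("node-a", now=0.0)
monitor.record_heartbeat("node-b", now=0.0)
monitor.record_heartbeat("node-a", now=2.0)  # node-b misses its next heartbeat
print(monitor.unhealthy_nodes(now=4.0))  # ['node-b']
```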

78
Q

Autoscaling

A

When nodes are automatically added or removed based on traffic

79
Q

Leader Election

A

When the primary node fails, one of the backup nodes is automatically selected to be the new primary.

80
Q

Where should you maintain the metadata for a distributed system?

A

Database

81
Q

Front-end server

A

First layer of servers a request reaches

82
Q

Back-end server

A

Servers the client doesn’t directly communicate with

83
Q

Web server

A

A stateless server that responds to requests from clients (front-end server)

84
Q

API Gateway

A

When a web server acts as a single point of access to multiple services, and presents an interface to clients that hides those details

85
Q

How does a web server relate to RPC/REST?

A

REST => used to communicate with clients
RPC => used to communicate with back-end servers

86
Q

Throttling

A

When a web server has a maximum capacity over some time window, and rejects requests exceeding that capacity
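
A minimal sketch of throttling with a fixed-window counter, which is only one of several ways to implement it. The class name and the explicit `now` argument are assumptions made to keep the example deterministic.

```python
class FixedWindowThrottle:
    """Allows at most max_requests per window; rejects the rest."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._window_start = 0.0
        self._count = 0

    def allow(self, now: float) -> bool:
        # Start a fresh window once the current one has elapsed.
        if now - self._window_start >= self.window_seconds:
            self._window_start = now
            self._count = 0
        if self._count < self.max_requests:
            self._count += 1
            return True
        return False  # over capacity for this window: reject


throttle = FixedWindowThrottle(max_requests=2, window_seconds=1.0)
decisions = [throttle.allow(t) for t in (0.0, 0.1, 0.2, 1.5)]
print(decisions)  # [True, True, False, True]
```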

87
Q

Load shedding

A

When a web server discards or re-routes requests that exceed system capacity

88
Q

Authentication

A

When a web server validates a user’s identity

89
Q

Authorization

A

When a web server determines whether a user has permission to access a particular resource

90
Q

Degraded dependency

A

When a downstream server (a dependency of the server being analyzed) isn’t able to handle the load it receives.

91
Q

Pushback

A

When a downstream server that can’t handle the load tells the web server “stop sending me stuff”, so that the web server can load shed.

92
Q

Load Balancer

A

Distributes incoming network traffic across a group of servers.
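
A minimal sketch of one possible distribution strategy, round-robin, which the card itself does not prescribe. `RoundRobinBalancer` and the server names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in rotation so each gets an equal share of requests."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self) -> str:
        return next(self._cycle)


balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
assignments = [balancer.pick() for _ in range(5)]
print(assignments)  # ['web-1', 'web-2', 'web-3', 'web-1', 'web-2']
```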

93
Q

What are two places in the system a load balancer might go?

A

It can go in front of the web server which has multiple instances, and it can go between the web server and the backend servers.

94
Q

Two types of load balancers

A

Layer-4 and Layer-7

95
Q

Layer-4 vs. Layer-7 load balancer

A

Layer-4 only uses network and transport layer data to make routing decisions.

Layer-7 also uses HTTP request data.

96
Q

Cache

A

Temporarily stores a subset of data on a high-speed medium, to improve performance and reduce resource usage.

97
Q

How much better performance does RAM get vs. SSD?

A

20x

98
Q

How much better performance does RAM get vs. HDD?

A

80x

99
Q

Why not always use RAM vs. SSD/HDD?

A

It is way more expensive and it is volatile.

100
Q

Where in a system would you put a cache?

A

It can go in any number of places because it’s such a generic concept – you would attach it to a particular service that communicates (directly or indirectly) with a DB, so the service doesn’t always have to communicate with the DB.

101
Q

Local Cache

A

Cache located on the same machine as the server.

102
Q

What is a Local Cache also known as?

A

Co-Located Cache

103
Q

Cache Hit / Miss

A

When data is found (or not found, respectively) in the cache

104
Q

Cache Invalidation

A

The strategy for how cache entries are marked as invalid, to be removed.

105
Q

Most common strategy for cache invalidation

A

TTL (time to live): each entry expires a fixed time after it is written.
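
A minimal sketch of TTL-based invalidation, assuming entries are checked lazily on read and that the caller passes the current time. `TTLCache` is a hypothetical name.

```python
class TTLCache:
    """Cache whose entries become invalid ttl_seconds after being written."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # key -> (value, expires_at)

    def put(self, key, value, now: float) -> None:
        self._entries[key] = (value, now + self.ttl_seconds)

    def get(self, key, now: float):
        entry = self._entries.get(key)
        if entry is None:
            return None  # cache miss
        value, expires_at = entry
        if now >= expires_at:
            del self._entries[key]  # expired: invalidate and report a miss
            return None
        return value


cache = TTLCache(ttl_seconds=60)
cache.put("user:1", "Ada", now=0)
print(cache.get("user:1", now=30))  # Ada
print(cache.get("user:1", now=61))  # None
```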

106
Q

Cache consistency

A

In a system with multiple caches, those caches are consistent if they have the same values for the same entries.

107
Q

Cache replacement policy

A

How you decide what to remove when the cache is full.

108
Q

Three most popular cache replacement policies

A

Least Recently Used
Least Frequently Used
First In First Out
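
A minimal sketch of the first policy, Least Recently Used, built on `collections.OrderedDict`. The `LRUCache` class and its `keys` helper are hypothetical names for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used entry when capacity is exceeded."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest use first, newest use last

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used

    def keys(self) -> list:
        return list(self._entries)


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now more recent than "b"
cache.put("c", 3)  # cache is full, so "b" is evicted
print(cache.keys())  # ['a', 'c']
```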

109
Q

Cache write policy

A

How you decide how to handle writes to the cache and the DB

110
Q

Three cache write policies and what they are?

A

Write-through (write to both cache and DB)

Write-back (write only to cache, separate service syncs to DB)

Write-around (write only to DB, populate cache on cache miss)
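
A minimal sketch contrasting the three policies, assuming plain dictionaries stand in for the cache and the DB. The `Store` class is hypothetical, `flush` stands in for the separate sync service, and dropping a stale cached copy on write-around is an added assumption.

```python
class Store:
    def __init__(self) -> None:
        self.cache = {}
        self.db = {}
        self.dirty = set()  # keys written to the cache but not yet to the DB

    def write_through(self, key, value) -> None:
        self.cache[key] = value
        self.db[key] = value

    def write_back(self, key, value) -> None:
        self.cache[key] = value
        self.dirty.add(key)  # the DB is updated later by flush()

    def flush(self) -> None:
        """Stands in for the separate service that syncs the cache to the DB."""
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()

    def write_around(self, key, value) -> None:
        self.db[key] = value
        self.cache.pop(key, None)  # assumption: drop any stale cached copy

    def read(self, key):
        if key not in self.cache:           # cache miss
            self.cache[key] = self.db[key]  # populate the cache from the DB
        return self.cache[key]


store = Store()
store.write_through("a", 1)
store.write_back("b", 2)
store.write_around("c", 3)
print(store.db)     # {'a': 1, 'c': 3}  ("b" has not been synced yet)
print(store.cache)  # {'a': 1, 'b': 2}  ("c" is cached only after a read)
```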

111
Q

Blob Storage

A

Storage for large volumes of unstructured data, like videos and images

112
Q

If you have blob storage, how do you still use the DB?

A

The DB stores metadata about the blobs, and the paths used to access them by-key.

113
Q

What does CDN stand for?

A

Content Delivery Network

114
Q

What is a Content Delivery Network?

A

It’s a set of geographically-distributed servers where each server copies content from the origin server and distributes it to users.

115
Q

What is the purpose of a CDN?

A

To get data geographically closer to users by putting content in a “zone” physically close to the user.

116
Q

How does the client know to get data from the CDN?

A

The web server tells it to use some specific CDN server.

117
Q

What is a common example of when a CDN is used?

A

For large files, like videos.

118
Q

Facade

A

Logical grouping of READ methods in a Read API

119
Q

What is the main thing to know about Read vs. Write APIs?

A

That they are often wholly separate interfaces and services.

120
Q

Fan-out service

A

Handles one write that triggers multiple writes in multiple destinations.

121
Q

Two models of fan-out service

A

Push model: Propagate new items as they’re created

Pull model: Fan-out happens at regular intervals, or on-demand

122
Q

What does GUID stand for?

A

Globally-unique identifier

123
Q

What is a GUID?

A

It’s a strategy for unique IDs where you generate enormous random numbers (typically 128 bits), so that a collision is astronomically unlikely rather than strictly impossible.

124
Q

Snowflake

A

A strategy for unique IDs where the timestamp and server ID form the prefix of the ID, and each server guarantees a unique suffix among the IDs it generates at the same timestamp, so the whole ID is guaranteed to be unique.
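
A minimal sketch of a Snowflake-style generator, assuming a millisecond timestamp passed in by the caller. The bit widths (10 bits for the server ID, 12 for the per-millisecond sequence) follow a commonly cited layout and are an assumption here, as is the class name.

```python
class SnowflakeGenerator:
    SERVER_BITS = 10    # assumed width of the server ID field
    SEQUENCE_BITS = 12  # assumed width of the per-millisecond counter

    def __init__(self, server_id: int) -> None:
        if not 0 <= server_id < (1 << self.SERVER_BITS):
            raise ValueError("server_id out of range")
        self.server_id = server_id
        self._last_ms = -1
        self._sequence = 0

    def next_id(self, now_ms: int) -> int:
        if now_ms < self._last_ms:
            raise RuntimeError("clock moved backwards")
        if now_ms == self._last_ms:
            # Same millisecond: bump the suffix so the ID stays unique.
            self._sequence += 1
            if self._sequence >= (1 << self.SEQUENCE_BITS):
                raise RuntimeError("sequence exhausted for this millisecond")
        else:
            self._last_ms = now_ms
            self._sequence = 0
        # Layout: [timestamp | server ID | sequence]
        prefix = (now_ms << self.SERVER_BITS) | self.server_id
        return (prefix << self.SEQUENCE_BITS) | self._sequence


gen = SnowflakeGenerator(server_id=1)
ids = [gen.next_id(1000), gen.next_id(1000), gen.next_id(1001)]
print(ids == sorted(ids), len(set(ids)))  # True 3
```

Because the timestamp occupies the high bits, IDs from one server also sort by creation time.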

125
Q

Unique ID service

A

A service that generates an ID value guaranteed to be unique, often distributed, with servers syncing IDs with each other.

126
Q

Data warehouse

A

Stores data from multiple services for analysis.

127
Q

Data lake

A

Stores data in its original raw format – a cheap dumping ground for data uploads.

128
Q

Steps of Map-Reduce:

A

Map: Send a subset of data to a node that maps input to output

Shuffle: Redistribute outputs to get them grouped by-key on same node

Reduce: Apply reduce function to keyed group of outputs
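
The three steps above can be sketched as a single-process word count, assuming each string in `chunks` is the subset of data a separate node would receive. The function names are hypothetical.

```python
from collections import defaultdict


def map_phase(chunk: str) -> list:
    """Map: turn one chunk of input into (key, value) pairs."""
    return [(word, 1) for word in chunk.split()]


def shuffle_phase(mapped_outputs: list) -> dict:
    """Shuffle: group all values by key, as if routing each key to one node."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list) -> tuple:
    """Reduce: collapse the grouped values for one key into a result."""
    return key, sum(values)


chunks = ["to be or", "not to be"]
mapped = [map_phase(chunk) for chunk in chunks]
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```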

129
Q

Functional vs. Non-Functional Testing

A

Functional testing verifies the correctness of a system on various inputs/outputs

Non-functional testing verifies properties other than correctness

130
Q

There are many examples of non-functional tests, give three.

A

Any three of:

Regression test
A/B test
Load test
Stress Test
Endurance test
Security test

131
Q

Integration test

A

Tests the interactions between units of software within a group

132
Q

End-to-end test

A

Tests the entire system using an external client interacting with the system interface