System Design Basic Concepts Flashcards

Question 1

Q

Four main things to care about with system design

Answer

A

Performance, Scalability, Availability, Reliability

Question 2

Q

What do we call the hardware that runs a system?

Question 3

Q

What do we call the hardware a client uses?

Question 4

Q

What’s the difference between a “service” and a “server”?

Answer

A

A service is a unit of functionality exposed to clients. A server is a running instance of a binary that can provide one or more services.

Question 5

Q

What does API stand for?

Answer

A

Application Programming Interface

Question 6

Q

What is an API?

Answer

A

It defines how programs interact with each other.

Question 7

Q

What is a good mental test for scalability?

Answer

A

How well would the system behave if the load increased ten-fold?

Question 8

Q

In what context would you use an Entity-Relationship Diagram?

Answer

A

Designing a database schema

Question 9

Q

What do you call the set of rules used for an Entity-Relationship Diagram?

Answer

A

Unified Modeling Language (dedicated ER notations such as Chen and crow's foot are also common)

Question 10

Q

What does UML stand for?

Answer

A

Unified Modeling Language

Question 11

Q

What is monolithic design?

Answer

A

When all software is built and deployed as a single unit

Question 12

Q

What are “microservices”?

Answer

A

When software is represented as a collection of independent services that communicate with each other

Question 13

Q

What is “loose coupling” vs. “tight coupling”?

Answer

A

Loose coupling is when different components and services have minimal dependencies on each other.

Tight coupling is when they’re highly dependent on each other.

Question 14

Q

What is “high cohesion” vs. “low cohesion”?

Answer

A

High cohesion is when the logic, methods and classes of a single service are functionally related.

Low cohesion is when the service does a lot of overlapping things and has a vague role.

Question 15

Q

Why is high cohesion good? Four things.

Answer

A

It’s easier to maintain, deploy, test and understand.

Question 16

Q

What are the two models of interservice interaction?

Answer

A

Orchestration and Choreography

Question 17

Q

What is orchestration?

Answer

A

A model of interservice interaction where one service is the “orchestrator” and manages communication between services.

Question 18

Q

What is choreography?

Answer

A

A model of interservice interaction where an event stream holds events and each service may produce events or subscribe (listen to) certain events.
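
As a rough illustration of choreography, here is a toy in-memory event stream in Python. The `EventStream` class, the two services and the event names are all hypothetical; a real system would use a message broker rather than direct function calls.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class EventStream:
    """Toy in-memory event stream: services subscribe to event types and publish events."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Deliver the event to every service that subscribed to this event type.
        for handler in list(self._subscribers[event_type]):
            handler(payload)


stream = EventStream()
log: List[str] = []


def billing_service(order: dict) -> None:
    # Reacts to "order_placed" and produces its own event; no orchestrator involved.
    log.append(f"billing charged order {order['id']}")
    stream.publish("payment_completed", order)


def shipping_service(order: dict) -> None:
    log.append(f"shipping scheduled order {order['id']}")


stream.subscribe("order_placed", billing_service)
stream.subscribe("payment_completed", shipping_service)
stream.publish("order_placed", {"id": 42})
```

Neither service calls the other directly; each only knows which events it listens for and which it emits.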

Question 19

Q

What is persistence?

Answer

A

After data is written to a DB, it is stably stored on non-volatile memory.

Question 20

Q

Volatile vs. non-volatile memory

Answer

A

Volatile memory, like RAM, loses its contents when powered off. Non-volatile memory, like a hard drive, retains data even when powered off.

Question 21

Q

What does NoSQL stand for?

Answer

A

Not Only SQL

Question 22

Q

What is NoSQL?

Answer

A

Catch-all term for DBs that don’t store data in relational tables, such as key-value stores.

Question 23

Q

What is a normalized database vs. a de-normalized one?

Answer

A

In a normalized database, data is isolated and non-redundant.

A de-normalized database is when data from one table is copied to be part of another. An example is when you have a FrequencyCap table, and a FrequencyCap field on Campaign.

Question 24

Q

Vertical vs. Horizontal Scaling

Answer

A

Vertical scaling: Increase the resources of a single machine

Horizontal scaling: Adding more machines

Question 25

Q

Three examples of ways you can add to vertically scale

Answer

A

CPU, memory, storage

Question 26

Q

Two examples of ways to horizontally scale a database

Answer

A

Replication, sharding

Question 27

Q

Replication (in the context of a database)

Answer

A

When you copy data from the primary DB to multiple secondary read-only nodes.

Question 28

Q

Sharding (in the context of a database)

Answer

A

When you split data into smaller datasets, which you distribute.

Question 29

Q

Vertical partitioning (in the context of a database)

Answer

A

When data in a DB is sharded by-column.

Question 30

Q

How do you avoid imbalanced data when sharding a database? That is, one shard having a lot more data than the others.

Answer

A

You hash the shard key.
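
A minimal sketch of hashing the shard key, assuming a fixed number of shards and string keys. `NUM_SHARDS` and `shard_for` are hypothetical names chosen for illustration.

```python
import hashlib

NUM_SHARDS = 4  # assumed fixed shard count, for illustration only


def shard_for(shard_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard index using a stable hash."""
    # hashlib gives the same result in every process, unlike Python's built-in
    # hash(), which is salted per process for strings.
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Sequential keys end up spread across shards instead of clustering in one.
counts = [0] * NUM_SHARDS
for user_id in range(10_000):
    counts[shard_for(f"user-{user_id}")] += 1
```

One trade-off of plain modulo hashing is that changing the number of shards remaps most keys; consistent hashing is a common way to reduce that.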

Question 31

Q

What does TCP/IP stand for?

Answer

A

Transmission Control Protocol / Internet Protocol

Question 32

Q

What is TCP/IP?

Answer

A

A model for how data is transmitted around the internet.

Question 33

Q

What are the four layers of TCP/IP?

Answer

A

Application layer (HTTP)
Transport layer (TCP)
Internet layer (IP)
Network access (link) layer (Ethernet, Wi-Fi)

Question 34

Q

Internet Protocol

Answer

A

Rules for how data packets are routed across networks.

Question 35

Q

Transmission Control Protocol

Answer

A

Rules for dealing with network unreliability, so that data is delivered reliably and in order.

Question 36

Q

What does UDP stand for?

Answer

A

User Datagram Protocol

Question 37

Q

What is UDP?

Answer

A

A connectionless transport protocol that skips TCP’s reliability guarantees: it is faster, but packets may be dropped or arrive out of order.

Question 38

Q

In what situation might you want to use UDP over TCP?

Answer

A

When you’re OK with dropping packets, like for streaming video

Question 39

Q

What does SSL stand for?

Answer

A

Secure Sockets Layer

Question 40

Q

What does TLS stand for?

Answer

A

Transport Layer Security

Question 41

Q

What’s the relationship between SSL and TLS?

Answer

A

SSL is deprecated in favor of TLS.

Question 42

Q

What is SSL?

Answer

A

A protocol that encrypts connections between a client and a server.

Question 43

Q

What is HTTPS?

Answer

A

It’s HTTP secured by TLS (formerly SSL).

Question 44

Q

What does DNS stand for?

Answer

A

Domain Name System

Question 45

Q

What does DNS do?

Answer

A

It resolves domain names to IP addresses.

Question 46

Q

Proxy

Answer

A

A server relaying traffic to/from the client, and applying logic to that traffic.

Question 47

Q

Reverse Proxy

Answer

A

A server that accepts requests from clients and forwards them to one or more backend servers.

Question 48

Q

What’s the difference between a proxy and a reverse proxy?

Answer

A

Proxy = used by client
Reverse Proxy = used by server

Question 49

Q

What is the most common use case for a reverse proxy?

Answer

A

Load balancer

Question 50

Q

System Integration

Answer

A

Term for deciding how components share information

Question 51

Q

Database Integration

Answer

A

A system integration strategy where the database is the primary means of sharing information between components.

Question 52

Q

What does REST stand for?

Answer

A

Representational State Transfer

Question 53

Q

What are the four REST standards?

Answer

A

1: Client and server should be separate and independent

2: Data is represented as resources managed by the server

3: Interfaces support a common set of operations

4: Operations are stateless – the client leaves no context on the server

Question 54

Q

RESTful API

Answer

A

An API that follows REST standards

Question 55

Q

Explain how HTTP follows REST standards

Answer

A

1: Client and server are separate and independent

2: HTML, CSS, JS, images, etc. are resources managed by the server

3: GET, POST, PUT, DELETE are an interface with a common set of operations

4: Client leaves no context on the server

Question 56

Q

What does RPC stand for?

Answer

A

Remote Procedure Call

Question 57

Q

What is an RPC?

Answer

A

It invokes a routine in a process on some other machine

Question 58

Q

Stub

Answer

A

Interface that handles RPCs by serializing/deserializing and redirecting the call

Question 59

Q

What does IDL stand for?

Answer

A

Interface Definition Language

Question 60

Q

What is an IDL for?

Answer

A

It defines how components communicate via RPCs.

Question 61

Q

What’s a Google example of an IDL?

Answer

A

Protocol buffers (protos)

Question 62

Q

Distributed System

Answer

A

A group of processes running on different machines and communicating through a network

Question 63

Q

What are the two key problems you have to solve with a distributed system?

Answer

A

Network unreliability
Data inconsistency

Question 64

Q

What are three causes of network unreliability?

Answer

A

Network faults
Network congestion
Network partitions

Answer 63

A

When machines are unable to communicate with each other for some reason

Answer 64

A

When there is too much traffic on the network

Answer 65

A

When the network is split into two groups of machines that can’t communicate with each other

Answer 66

A

When simultaneous read requests to different nodes are guaranteed to return the same data

Answer 67

A

When nodes are guaranteed to eventually have the same data, albeit not immediately and simultaneously.

Answer 68

A

Consistency, Availability, Partition Tolerance

Consistency really means strong consistency

Answer 69

A

You can’t have all three of
strong consistency
availability
partition tolerance

Answer 70

A

It can’t tolerate a partition so it’s basically a single-node system

Answer 71

A

It doesn’t have strong consistency, so a partition could cause stale data in nodes.

Answer 72

A

It doesn’t guarantee availability; nodes could be shut down during a partition.

Answer 73

A

A measure of availability: the system is up 99.999% of the time.
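
To make the 99.999% figure concrete, the arithmetic can be sketched in Python. The function name `allowed_downtime_minutes` is hypothetical, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability: float) -> float:
    """Return the downtime budget per year, in minutes, for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


five_nines = allowed_downtime_minutes(0.99999)
three_nines = allowed_downtime_minutes(0.999)
```

Under these assumptions, 99.999% availability leaves roughly 5.26 minutes of downtime per year, while 99.9% leaves roughly 525.6 minutes.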

Answer 74

A

A group of nodes

Answer 75

A

When nodes send periodic messages to the coordination service to indicate normal operation

Answer 76

A

When nodes are automatically added or removed based on traffic

Answer 77

A

When you have a primary node, and it fails, one of the backup nodes is automatically selected to be the new primary.

Answer 78

A

First layer of servers a request reaches

Answer 79

A

Servers the client doesn’t directly communicate with

Answer 80

A

A stateless server that responds to requests from clients (front-end server)

Answer 81

A

When a web server acts as a single point of access to multiple services, and presents an interface to clients that hides those details

Answer 82

A

REST => used to communicate with clients
RPC => used to communicate with back-end servers

Answer 83

A

When a web server has a maximum capacity over some time window, and rejects requests exceeding that capacity
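
One simple way to realize this is a fixed-window counter. The sketch below is illustrative only: `FixedWindowRateLimiter` is a hypothetical class, and production systems often prefer sliding-window or token-bucket variants.

```python
import time
from typing import Optional


class FixedWindowRateLimiter:
    """Allow at most max_requests per window; reject the rest."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._window_start: Optional[float] = None
        self._count = 0

    def allow(self, now: Optional[float] = None) -> bool:
        # `now` can be injected for testing; otherwise use a monotonic clock.
        now = time.monotonic() if now is None else now
        if self._window_start is None or now - self._window_start >= self.window_seconds:
            # Start a new window and reset the counter.
            self._window_start = now
            self._count = 0
        if self._count < self.max_requests:
            self._count += 1
            return True
        return False  # over capacity for this window: reject


limiter = FixedWindowRateLimiter(max_requests=3, window_seconds=1.0)
results = [limiter.allow(now=0.0) for _ in range(5)]  # three accepted, two rejected
after_window = limiter.allow(now=1.5)  # a new window has started
```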

Answer 84

A

When a web server discards or re-routes requests that exceed system capacity

Answer 85

A

When a web server validates a user’s identity

Answer 86

A

When a web server determines whether a user has permission to access a particular resource

Answer 87

A

When a downstream server (a dependency of the server being analyzed) isn’t able to handle capacity.

Answer 88

A

When a downstream server that can’t handle capacity lets the web server know “stop sending me stuff”, so that the web server can load shed.

Answer 89

A

Distributes incoming network traffic across a group of servers.
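
A minimal round-robin sketch of that distribution, assuming a static list of healthy servers. The class and server names are hypothetical; real load balancers also track health and load.

```python
import itertools
from typing import List


class RoundRobinBalancer:
    """Hand out servers in a repeating order so requests are spread evenly."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [balancer.pick() for _ in range(5)]
```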

Answer 90

A

It can go in front of the web server which has multiple instances, and it can go between the web server and the backend servers.

Answer 91

A

Layer-4 and Layer-7

Answer 92

A

Layer-4 only uses network and transport layer data to make routing decisions.

Layer-7 also uses HTTP request data.

Answer 93

A

Temporarily stores a subset of data on a high-speed medium, to improve performance and reduce resource usage.

Answer 94

A

It is way more expensive and it is volatile.

Answer 95

A

It can go in any number of places because it’s such a generic concept – you would attach it to a particular service that communicates (directly or indirectly) with a DB, so the service doesn’t always have to communicate with the DB.

Answer 96

A

Cache located on the same machine as the server.

Answer 97

A

Co-Located Cache

Answer 98

A

When data is found (or not found, respectively) in the cache

Answer 99

A

The strategy for how cache entries are marked as invalid, to be removed.

Answer 100

A

In a system with multiple caches, those caches are consistent if they have the same values for the same entries.

Answer 101

A

How you decide what to remove when the cache is full.

Answer 102

A

Least Recently Used
Least Frequently Used
First In First Out
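
As an illustration of the first of these, here is a small Least Recently Used cache built on `collections.OrderedDict`. The `LRUCache` class is a hypothetical sketch, not a reference implementation.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Evict the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._items:
            return None  # cache miss
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used entry
cache.put("c", 3)  # cache is full, so "b" is evicted
```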

Answer 103

A

How you decide how to handle writes to the cache and the DB

Answer 104

A

Write-through (write to both cache and DB)

Write-back (write only to cache, separate service syncs to DB)

Write-around (write only to DB, populate cache on cache miss)
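
A rough sketch of write-through and write-around, using plain dictionaries to stand in for the cache and the DB. `CachedStore` is a hypothetical class; write-back is left out because it needs a separate sync process.

```python
from typing import Any, Dict, Optional


class CachedStore:
    """Toy store with an in-memory 'cache' in front of an in-memory 'DB'."""

    def __init__(self) -> None:
        self.cache: Dict[str, Any] = {}
        self.db: Dict[str, Any] = {}

    def write_through(self, key: str, value: Any) -> None:
        # Write to both the DB and the cache.
        self.db[key] = value
        self.cache[key] = value

    def write_around(self, key: str, value: Any) -> None:
        # Write only to the DB and drop any stale cached copy.
        self.db[key] = value
        self.cache.pop(key, None)

    def read(self, key: str) -> Optional[Any]:
        if key in self.cache:
            return self.cache[key]  # cache hit
        value = self.db.get(key)    # cache miss: fall back to the DB
        if value is not None:
            self.cache[key] = value  # populate the cache on a miss
        return value


store = CachedStore()
store.write_through("a", 1)  # "a" is now in both the cache and the DB
store.write_around("b", 2)   # "b" is only in the DB so far
b_cached_before_read = "b" in store.cache
first_read = store.read("b")  # miss: loaded from the DB, then cached
```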

Answer 105

A

Large volumes of unstructured data, like videos and images

Answer 106

A

The DB stores metadata about the blobs, and the paths used to access them by-key.

Answer 107

A

Content Delivery Network

Answer 108

A

It’s a set of geographically-distributed servers where each server copies content from the origin server and distributes it to users.

Answer 109

A

To get data geographically closer to users by putting content in a “zone” physically close to the user.

Answer 110

A

The web server tells it to use some specific CDN server.

Answer 111

A

For large files, like videos.

Answer 112

A

Logical grouping of READ methods in a Read API

Answer 113

A

That they are often wholly separate interfaces and services.

Answer 114

A

Handles one write that triggers multiple writes in multiple destinations.

Answer 115

A

Push model: Propagate new items as they’re created

Pull model: Fan-out happens at regular intervals, or on-demand

Answer 116

A

Globally-unique identifier

Answer 117

A

It’s a strategy for unique IDs where you just generate enormous numbers and pray that there is no collision.
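
In Python, one common way to get such random IDs is the standard library's `uuid.uuid4()`. The `new_guid` wrapper below is a hypothetical helper for illustration.

```python
import uuid


def new_guid() -> str:
    """Return a random (version 4) UUID as a string; 122 of its 128 bits are random."""
    return str(uuid.uuid4())


sample_ids = [new_guid() for _ in range(1000)]
```

Uniqueness here is probabilistic rather than guaranteed, although with 122 random bits a collision is extremely unlikely.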

Answer 118

A

A strategy for unique IDs where the current time and the server ID form the prefix of a GUID, and each server guarantees a unique suffix for that timestamp, so the whole ID is guaranteed to be unique.
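
A minimal sketch of this scheme, assuming every server is configured with a distinct `server_id`. The class name and the ID layout (`time-server-sequence`) are hypothetical choices for illustration.

```python
import threading
import time


class PrefixedIdGenerator:
    """Build IDs from a millisecond timestamp, a server ID and a per-server sequence."""

    def __init__(self, server_id: int) -> None:
        self.server_id = server_id  # assumed to be unique per server
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> str:
        with self._lock:
            now_ms = time.time_ns() // 1_000_000
            if now_ms <= self._last_ms:
                # Same millisecond (or the clock moved backwards): keep the last
                # timestamp and bump the sequence so the suffix stays unique.
                now_ms = self._last_ms
                self._sequence += 1
            else:
                self._last_ms = now_ms
                self._sequence = 0
            return f"{now_ms}-{self.server_id}-{self._sequence}"


generator = PrefixedIdGenerator(server_id=7)
ids = [generator.next_id() for _ in range(10_000)]
```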

Answer 119

A

A service that generates an ID value guaranteed to be unique, often distributed, with servers syncing IDs with each other.

Answer 120

A

Stores data from multiple services for analysis.

Answer 121

A

Stores data in its original raw format – a cheap dumping ground for data uploads.

Answer 122

A

Map: Send a subset of data to a node that maps input to output

Shuffle: Redistribute outputs to get them grouped by-key on same node

Reduce: Apply reduce function to keyed group of outputs
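
The three phases can be sketched on a single machine with a word count. The function names and input chunks are hypothetical; in a real cluster each chunk would be mapped on a different node.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: turn one chunk of input into (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle: group all values for the same key together."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


chunks = ["the cat sat", "the dog sat", "the end"]
word_counts = reduce_phase(shuffle_phase(map_phase(chunk) for chunk in chunks))
```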

Answer 123

A

Functional testing verifies the correctness of a system on various inputs/outputs

Non-functional testing verifies properties other than correctness

Answer 124

A

Any three of:

Regression test
A/B test
Load test
Stress test
Endurance test
Security test

Answer 125

A

Tests the interactions between units of software within a group

Answer 126

A

Tests the entire system using an external client interacting with the system interface