Priority 1 Flashcards

1
Q

SSL Certificate

A

A digital certificate granted to a server by a certificate authority. It contains the server’s public key, which is used as part of the TLS handshake process in an HTTPS connection.

An SSL certificate effectively confirms that a public key belongs to the server that claims to own it. SSL certificates are a crucial defense against man-in-the-middle attacks.

2
Q

Streaming

A

In networking, it usually refers to the act of continuously getting a feed of information from a server by keeping an open connection between the two machines or processes.

3
Q

SHA

A

Short for “Secure Hash Algorithms”, SHA is a family of cryptographic hash functions used throughout the industry. SHA-2 variants such as SHA-256 are the most widely deployed today, and SHA-3 is the newest member of the family.
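
As a rough illustration, the sketch below hashes the same bytes with one SHA-2 function and one SHA-3 function from Python's standard `hashlib` module. The helper names `sha256_hex` and `sha3_256_hex` and the sample message are assumptions made for this example only.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 (SHA-2 family) digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def sha3_256_hex(data: bytes) -> str:
    """Return the SHA3-256 (SHA-3 family) digest of data as a hex string."""
    return hashlib.sha3_256(data).hexdigest()


message = b"AlgoExpert"  # hypothetical sample input
print(sha256_hex(message))
print(sha3_256_hex(message))
```

Both functions produce a 256-bit digest (64 hex characters), but the two families use different internal constructions, so the digests differ.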

4
Q

YAML

A

A file format mostly used in configuration. Example:

version: 1.0
name: AlgoExpert Configuration
5
Q

Relational Database

A

A type of structured database in which data is stored following a tabular format; often supports powerful querying using SQL.

6
Q

SQL Database

A

Any database that supports SQL. This term is often used synonymously with “Relational Database”, though in practice, not every relational database supports SQL.

7
Q

Client-Server Model

A

The paradigm by which modern systems are designed, which consists of clients requesting data or service from servers and servers providing data or service to clients.

8
Q

Content Delivery Network

A

A CDN is a third-party service that acts like a cache for your servers. Sometimes, web applications can be slow for users in a particular region if your servers are located only in another region. A CDN has servers all around the world, meaning that the latency to a CDN’s servers will almost always be far better than the latency to your servers. A CDN’s servers are often referred to as PoPs (Points of Presence). Two of the most popular CDNs are Cloudflare and Google Cloud CDN.

9
Q

Memory

A

Short for Random Access Memory (RAM). Data stored in memory will be lost when the process that has written that data dies.

10
Q

Monitoring

A

The process of having visibility into a system’s key metrics, monitoring is typically implemented by collecting important events in a system and aggregating them in human-readable charts.

11
Q

Virtual Machine

A

A VM is a form of computer inside of a computer. It is a program that you run on a machine that completely emulates a new kernel and operating system. Very useful when isolating programs from one another while having them share the same physical machine.

12
Q

Redundancy

A

The process of replicating parts of a system in an effort to make it more reliable.

13
Q

Socket

A

A kind of file that acts like a stream. Processes can read from and write to sockets and communicate in this manner. Most of the time, sockets are fronts for TCP connections.
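
The read/write behavior can be sketched with Python's standard `socket.socketpair()`, which returns two already-connected stream sockets inside a single process. This is only an illustration: on most platforms the pair is a local (Unix-domain) connection rather than TCP, and the messages are made up for the example.

```python
import socket

# Two connected stream sockets; each end can be written to and read from,
# much like a file that behaves as a stream.
left, right = socket.socketpair()
try:
    left.sendall(b"ping")            # one side writes bytes
    received = right.recv(1024)      # the other side reads them
    right.sendall(received.upper())  # and writes a reply back
    reply = left.recv(1024)
finally:
    left.close()
    right.close()

print(received, reply)
```

A TCP socket is used the same way once it is connected; only the setup (`connect` on the client, `bind`/`listen`/`accept` on the server) differs.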

14
Q

DNS

A

Short for Domain Name System, it describes the entities and protocols involved in the translation from domain names to IP addresses. Typically, machines make a DNS query to a well-known entity that is responsible for returning the IP address (or multiple ones) of the requested domain name in the response.

15
Q

Polling

A

The act of fetching a resource or piece of data regularly at an interval to make sure your data is not too stale.
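
A minimal polling loop might look like the sketch below. The `poll` helper, the fake `fetch_counter` resource, and the 10 ms interval are all assumptions chosen to keep the example self-contained; a real client would fetch over the network and usually poll far less often.

```python
import time
from typing import Callable, List


def poll(fetch: Callable[[], int], interval_seconds: float, rounds: int) -> List[int]:
    """Call fetch() `rounds` times, sleeping `interval_seconds` between calls."""
    results = []
    for i in range(rounds):
        results.append(fetch())
        if i < rounds - 1:
            time.sleep(interval_seconds)  # wait before asking again
    return results


# Hypothetical resource whose value changes every time it is fetched.
counter = {"value": 0}


def fetch_counter() -> int:
    counter["value"] += 1
    return counter["value"]


snapshots = poll(fetch_counter, interval_seconds=0.01, rounds=3)
print(snapshots)
```

The interval is the main design knob: a shorter interval keeps data fresher at the cost of more requests.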

16
Q

Client

A

A machine or process that requests data or service from a server.

Note that a single machine or piece of software can be both a client and a server at the same time. For instance, a single machine could act as a server for end users and as a client for a database.

17
Q

JSON

A

A file format heavily used in APIs and configuration. Stands for JavaScript Object Notation. Example:

{
   "version": 1.0,
   "name": "AlgoExpert Configuration"
}
18
Q

IP Packet

A

Sometimes more broadly referred to as just a (network) packet, an IP packet is effectively the smallest unit used to describe data being sent over IP, aside from bytes. An IP packet consists of:

  • an IP header, which contains the source and destination IP addresses as well as other information related to the network
  • a payload, which is just the data being sent over the network
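
The header/payload split can be sketched by packing and unpacking a 20-byte IPv4 header with Python's `struct` module. The helper names `build_ipv4_packet` and `parse_ipv4_packet` are hypothetical, and the example simplifies things: the checksum is left at zero, no options are included, and the TTL (64) and protocol number (6, TCP) are arbitrary sample values.

```python
import socket
import struct

# version/IHL, DSCP/ECN, total length, identification, flags/fragment offset,
# TTL, protocol, header checksum, source address, destination address
IPV4_HEADER_FORMAT = "!BBHHHBBH4s4s"
IPV4_HEADER_SIZE = struct.calcsize(IPV4_HEADER_FORMAT)  # 20 bytes


def build_ipv4_packet(src: str, dst: str, payload: bytes) -> bytes:
    """Build a simplified IPv4 packet: fixed header (checksum 0) plus payload."""
    version_ihl = (4 << 4) | 5  # IPv4, header length of 5 32-bit words
    header = struct.pack(
        IPV4_HEADER_FORMAT,
        version_ihl, 0, IPV4_HEADER_SIZE + len(payload), 0, 0, 64, 6, 0,
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    return header + payload


def parse_ipv4_packet(packet: bytes) -> dict:
    """Split a packet into a few header fields and its payload."""
    fields = struct.unpack(IPV4_HEADER_FORMAT, packet[:IPV4_HEADER_SIZE])
    header_length = (fields[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    return {
        "version": fields[0] >> 4,
        "source": socket.inet_ntoa(fields[8]),
        "destination": socket.inet_ntoa(fields[9]),
        "payload": packet[header_length:],
    }


packet = build_ipv4_packet("192.168.0.2", "127.0.0.1", b"hello")
print(parse_ipv4_packet(packet))
```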
19
Q

HTTP

A

The HyperText Transfer Protocol is a very common network protocol implemented on top of TCP. Clients make HTTP requests, and servers reply with HTTP responses.

Requests typically have the following schema:

host: string (example: algoexpert.io)
port: integer (example: 80 or 443)
method: string (example: GET, PUT, POST, DELETE, OPTIONS or PATCH)
headers:  pair list (example: "Content-Type" => "application/json")
body: opaque sequence of bytes

Responses typically have the following schema:

status code: integer (example: 200, 401)
headers:  pair list (example: "Content-Length" => 1238)
body: opaque sequence of bytes
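
To make the request schema concrete, the sketch below turns those fields into the bytes that would be written to a TCP connection in HTTP/1.1. The function name `serialize_request`, the `path` parameter, and the sample endpoint are assumptions for illustration; real clients normally rely on an HTTP library rather than building messages by hand.

```python
from typing import Dict


def serialize_request(host: str, method: str, path: str,
                      headers: Dict[str, str], body: bytes = b"") -> bytes:
    """Serialize a request into HTTP/1.1 wire format (simplified sketch)."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    all_headers = dict(headers)
    if body:
        # Tell the server how many body bytes follow the headers.
        all_headers["Content-Length"] = str(len(body))
    lines += [f"{name}: {value}" for name, value in all_headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # a blank line ends the header section
    return head.encode("ascii") + body


body = b'{"name": "x"}'
raw = serialize_request(
    host="algoexpert.io",
    method="POST",
    path="/items",  # hypothetical endpoint
    headers={"Content-Type": "application/json"},
    body=body,
)
print(raw.decode("ascii"))
```

The port from the schema does not appear in the message itself; it is used when opening the TCP connection the bytes are sent over.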
20
Q

Process

A

A program that is currently running on a machine. You should always assume that any process may get terminated at any time in a sufficiently large system.

22
Q

Databases

A

Databases are programs that either use disk or memory to do 2 core things: record data and query data. In general, they are themselves servers that are long-lived and interact with the rest of your application through network calls, with protocols on top of TCP or even HTTP.

Some databases only keep records in memory, and the users of such databases are aware of the fact that those records may be lost forever if the machine or process dies.

For the most part though, databases need those records to persist, and thus cannot rely on memory alone. This means that you have to write your data to disk. Anything written to disk will remain through power loss or network partitions, so that’s what is used to keep permanent records.

Since machines die often in a large-scale system, special disk partitions or volumes are used by the database processes, and those volumes can get recovered even if the machine were to go down permanently.

23
Q

Throughput

A

The number of operations that a system can handle properly per unit of time. For instance, the throughput of a server is often measured in requests per second (RPS) or queries per second (QPS).

24
Q

Port

A

In order for multiple programs to listen for new network connections on the same machine without colliding, they pick a port to listen on. A port is an integer between 0 and 65,535 (2^16 ports total).

Typically, ports 0-1023 are reserved for system ports (also called well-known ports) and shouldn’t be used by user-level processes. Certain ports have pre-defined uses, and although you usually won’t be required to have them memorized, they can sometimes come in handy. Below are some examples:

  • 22: Secure Shell
  • 53: DNS lookup
  • 80: HTTP
  • 443: HTTPS
25
Q

Node/Instance/Host

A

These three terms refer to the same thing most of the time: a virtual or physical machine on which the developer runs processes. Sometimes the word server also refers to this same concept.

26
Q

Spatial Database

A

A type of database optimized for storing and querying spatial data like locations on a map. Spatial databases rely on spatial indexes like quadtrees to quickly perform spatial queries like finding all locations in the vicinity of a region.

27
Q

Percentiles

A

Most often used when describing a latency distribution. If your Xth percentile is 100 milliseconds, it means that X% of the requests have latencies of 100 ms or less. Sometimes, SLAs describe their guarantees using these percentiles.
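
As a small worked example, the sketch below computes percentiles with the nearest-rank method, which is one of several common definitions (other tools interpolate between samples). The `percentile` helper and the latency samples are made up for illustration.

```python
import math
from typing import Sequence


def percentile(latencies_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(latencies_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds.
latencies = [12, 15, 20, 22, 25, 30, 45, 80, 100, 400]
print(percentile(latencies, 50), percentile(latencies, 90), percentile(latencies, 99))
```

With these samples the 90th percentile is 100 ms, meaning 9 of the 10 requests finished in 100 ms or less, while the single 400 ms outlier only shows up at the 99th percentile.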

28
Q

IP Address

A

An address given to each machine connected to the public internet. IPv4 addresses consist of four numbers separated by dots: a.b.c.d, where all four numbers are between 0 and 255. Special values include:

  • 127.0.0.1: Your own local machine. Also referred to as localhost.
  • 192.168.x.y: Your private network. For instance, your machine and all machines on your private wifi network will usually have the 192.168 prefix.
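
These special ranges can be checked with Python's standard `ipaddress` module, as in the sketch below. The `describe` helper and its three labels are assumptions for this example; the module itself recognizes more categories than the two listed above.

```python
import ipaddress


def describe(address: str) -> str:
    """Classify an IP address as loopback, private, or public (simplified)."""
    ip = ipaddress.ip_address(address)  # raises ValueError for e.g. "256.0.0.1"
    if ip.is_loopback:   # 127.0.0.0/8, which includes 127.0.0.1
        return "loopback"
    if ip.is_private:    # includes 192.168.0.0/16
        return "private"
    return "public"


for sample in ["127.0.0.1", "192.168.1.20", "8.8.8.8"]:
    print(sample, describe(sample))
```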
29
Q

Hashing Function

A

A function that takes in a specific data type (such as a string or an identifier) and outputs a number. Different inputs may have the same output, but a good hashing function attempts to minimize those hashing collisions (which is equivalent to maximizing uniformity).

30
Q

SQL

A

Structured Query Language. Relational databases are typically queried using SQL or a vendor-specific dialect of it; Postgres (PostgreSQL), for example, implements its own dialect of SQL.

32
Q

Configuration

A

A set of parameters or constants that are critical to a system. Configuration is typically written in JSON or YAML and can be either static, meaning that it’s hard-coded in and shipped with your system’s application code (like frontend code, for instance), or dynamic, meaning that it lives outside of your system’s application code.

33
Q

Pagination

A

When a network request potentially warrants a really large response, the relevant API might be designed to return only a single page of that response (i.e., a limited portion of the response), accompanied by an identifier or token for the client to request the next page if desired.

Pagination is often used when designing List endpoints. For instance, an endpoint to list videos on the YouTube Trending page could return a huge list of videos. This wouldn’t perform very well on mobile devices due to the lower network speeds and simply wouldn’t be optimal, since most users will only ever scroll through the first ten or twenty videos. So, the API could be designed to respond with only the first few videos of that list; in this case, we would say that the API response is paginated.
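
A paginated List endpoint could be sketched as follows. Everything here is hypothetical: the in-memory `VIDEOS` list stands in for a real data source, and the page token is simply the offset of the next item encoded as a string, whereas real APIs often use opaque tokens.

```python
from typing import List, Optional, Tuple

# Hypothetical data set standing in for a large list of trending videos.
VIDEOS = [f"video-{i}" for i in range(1, 26)]


def list_videos(page_size: int = 10,
                page_token: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return one page of videos plus a token for the next page (None on the last page)."""
    start = int(page_token) if page_token else 0
    end = start + page_size
    page = VIDEOS[start:end]
    next_token = str(end) if end < len(VIDEOS) else None
    return page, next_token


# A client keeps requesting pages only for as long as it wants more results.
page, token = list_videos()
print(page, token)
```

A client that only shows the first screen of results never pays for the remaining pages, which is the point of paginating the response.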

34
Q

CRUD Operations

A

Stands for Create, Read, Update, Delete Operations. These four operations often serve as the bedrock of a functioning system and therefore find themselves at the core of many APIs. The term CRUD is very likely to come up during an API-design interview.
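
The four operations map directly onto SQL statements, as the sketch below shows using Python's built-in `sqlite3` module with an in-memory database. The `videos` table and the helper function names are assumptions chosen for illustration.

```python
import sqlite3
from typing import Optional

# In-memory database: nothing here is persisted to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")


def create_video(title: str) -> int:
    """Create: insert a row and return its generated id."""
    cursor = conn.execute("INSERT INTO videos (title) VALUES (?)", (title,))
    conn.commit()
    return cursor.lastrowid


def read_video(video_id: int) -> Optional[str]:
    """Read: return the title, or None if the row does not exist."""
    row = conn.execute("SELECT title FROM videos WHERE id = ?", (video_id,)).fetchone()
    return row[0] if row else None


def update_video(video_id: int, title: str) -> None:
    """Update: change the title of an existing row."""
    conn.execute("UPDATE videos SET title = ? WHERE id = ?", (title, video_id))
    conn.commit()


def delete_video(video_id: int) -> None:
    """Delete: remove the row."""
    conn.execute("DELETE FROM videos WHERE id = ?", (video_id,))
    conn.commit()


video_id = create_video("System Design Basics")
print(read_video(video_id))
```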

35
Q

Availability Zone

A

Sometimes referred to as an AZ, an availability zone designates a group of machines that share one or more central system components (e.g., power source, network connectivity, machine-cooling system).

Availability zones are typically located far away from each other such that no natural disaster can realistically bring down two of them at once. This ensures that if you have redundant storage, for instance, with data stored in two availability zones, losing one AZ still leaves you with an operational system that abides by any SLA that it might have.

36
Q

Microservice Architecture

A

When a system is made up of many small web services that can be compiled and deployed independently. This is usually thought of as a counterpart of monoliths.

37
Q

Non-Relational Database

A

In contrast with relational databases (SQL databases), a type of database that is free of imposed, tabular-like structure. Non-relational databases are often referred to as NoSQL databases.

38
Q

Peer-To-Peer Network

A

A collection of machines referred to as peers that divide a workload between themselves to presumably complete the workload faster than would otherwise be possible. Peer-to-peer networks are often used in file-distribution systems.

39
Q

Load Balancer

A

A type of reverse proxy that distributes traffic across servers. Load balancers can be found in many parts of a system, from the DNS layer all the way to the database layer.

41
Q

NoSQL Database

A

Any database that is not SQL-compatible is called NoSQL.

42
Q

DoS Attack

A

Short for “denial-of-service attack”, a DoS attack is an attack in which a malicious user tries to bring down or damage a system in order to render it unavailable to users. Much of the time, it consists of flooding it with traffic. Some DoS attacks are easily preventable with rate limiting, while others can be far trickier to defend against.

43
Q

CAP Theorem

A

Stands for Consistency, Availability, Partition tolerance. In a nutshell, this theorem states that any distributed system can only achieve 2 of these 3 properties. Furthermore, since almost all useful systems do have network-partition tolerance, it’s generally boiled down to: Consistency vs. Availability; pick one.

One thing to keep in mind is that some levels of consistency are still achievable with high availability, but strong consistency is much harder.

44
Q

Replication

A

The act of duplicating the data from one database server to others. This is sometimes used to increase the redundancy of your system and tolerate regional failures, for instance. Other times, you can use replication to move data closer to your clients, thus decreasing the latency of accessing specific data.

45
Q

HTTPS

A

The HyperText Transfer Protocol Secure is an extension of HTTP that’s used for secure communication online. It requires servers to have trusted certificates (usually SSL certificates) and uses the Transport Layer Security (TLS), a security protocol built on top of TCP, to encrypt data communicated between a client and a server.

46
Q

Key-Value Store

A

A Key-Value Store is a flexible NoSQL database that’s often used for caching and dynamic configuration. Popular options include DynamoDB, etcd, Redis, and ZooKeeper.

47
Q

Cache

A

A piece of hardware or software that stores data, typically so that the data can be retrieved faster than it otherwise could be.

Caches are often used to store responses to network requests as well as results of computationally expensive operations.

Note that data in a cache can become stale if the main source of truth for that data (i.e., the main database behind the cache) gets updated and the cache doesn’t.
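
The staleness problem can be reproduced in a few lines. In the sketch below, two plain dictionaries stand in for the database and the cache; the function names and the sample key are assumptions, and deleting the cached entry on write is just one of several possible invalidation strategies.

```python
from typing import Dict

database: Dict[str, str] = {"user:1": "Ada"}  # stands in for the source of truth
cache: Dict[str, str] = {}                    # stands in for the cache in front of it


def read(key: str) -> str:
    """Serve from the cache when possible; otherwise load from the database and cache it."""
    if key in cache:
        return cache[key]
    value = database[key]
    cache[key] = value
    return value


def write_without_invalidation(key: str, value: str) -> None:
    """Update only the database; a cached copy of the key becomes stale."""
    database[key] = value


def write_with_invalidation(key: str, value: str) -> None:
    """Update the database and drop the cached copy so the next read is fresh."""
    database[key] = value
    cache.pop(key, None)


print(read("user:1"))                          # loads from the database, fills the cache
write_without_invalidation("user:1", "Grace")
print(read("user:1"))                          # served from the cache, which is now stale
```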

48
Q

Blob Storage

A

A widely used kind of storage, in both small- and large-scale systems.

Blob stores don’t really count as databases per se, partially because they only allow the user to store and retrieve data based on the name of the blob.

This is sort of like a key-value store, but blob stores usually have different guarantees. They might be slower than KV stores, but values can be megabytes large (or sometimes gigabytes large). Usually people use this to store things like large binaries, database snapshots, or images and other static assets that a website might have.

Blob storage is rather complicated to have on premise, and only giant companies like Google and Amazon have infrastructure that supports it. So usually in the context of System Design interviews you can assume that you will be able to use GCS or S3. These are blob storage services hosted by Google and Amazon respectively, that cost money depending on how much storage you use and how often you store and retrieve blobs from that storage.

49
Q

Persistent Storage

A

Usually refers to disk, but in general it is any form of storage that persists if the process in charge of managing it dies.

50
Q

Latency

A

The time it takes for a certain operation to complete in a system. Most often this measure is a time duration, like milliseconds or seconds. You should know these orders of magnitude:

  • Reading 1 MB from RAM: 250 μs (0.25 ms)
  • Reading 1 MB from SSD: 1,000 μs (1 ms)
  • Transfer 1 MB over Network: 10,000 μs (10 ms)
  • Reading 1 MB from HDD: 20,000 μs (20 ms)
  • Inter-Continental Round Trip: 150,000 μs (150 ms)
51
Q

Server

A

A machine or process that provides data or service for a client, usually by listening for incoming network calls.

Note that a single machine or piece of software can be both a client and a server at the same time. For instance, a single machine could act as a server for end users and as a client for a database.

52
Q

High Availability

A

Used to describe systems that have particularly high levels of availability, typically 5 nines or more; sometimes abbreviated “HA”.

53
Q

Rendezvous Hashing

A

A type of hashing also known as highest random weight (HRW) hashing. It allows for minimal redistribution of mappings when a server goes down.
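
One way to sketch the idea: score every (server, key) pair with a hash and send the key to the highest-scoring server. The sketch below uses SHA-256 as the scoring hash, which is an arbitrary choice for this example, and the server names and keys are hypothetical.

```python
import hashlib
from typing import Sequence


def weight(server: str, item_key: str) -> int:
    """Pseudo-random weight for a (server, key) pair, derived from a hash."""
    digest = hashlib.sha256(f"{server}:{item_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_server(servers: Sequence[str], item_key: str) -> str:
    """Map the key to the server with the highest weight for that key."""
    return max(servers, key=lambda server: weight(server, item_key))


servers = ["server-a", "server-b", "server-c", "server-d"]
keys = [f"user-{i}" for i in range(200)]

before = {k: pick_server(servers, k) for k in keys}

# Simulate server-b going down: only keys that lived on it need a new home.
remaining = [s for s in servers if s != "server-b"]
after = {k: pick_server(remaining, k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
```

Every key that was not on the removed server keeps its winner, because removing a losing candidate cannot change which remaining server has the highest weight.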

54
Q

Disk

A

Usually refers to either HDD (hard-disk drive) or SSD (solid-state drive). Data written to disk will persist through power failures and general machine crashes. Disk is also referred to as non-volatile storage.

SSD is far faster than HDD (see latencies of accessing data from SSD and HDD) but also far more expensive from a financial point of view. Because of that, HDD will typically be used for data that’s rarely accessed or updated, but that’s stored for a long time, and SSD will be used for data that’s frequently accessed and updated.

56
Q

Availability

A

The odds of a particular server or service being up and running at any point in time, usually measured in percentages. A server that has 99% availability will be operational 99% of the time (this would be described as having two nines of availability).
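
To get a feel for what a given number of nines allows, the sketch below converts an availability percentage into permitted downtime per year. The helper name is hypothetical and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a service may be down at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for availability in [99.0, 99.9, 99.99, 99.999]:
    hours = downtime_hours_per_year(availability)
    print(f"{availability}% -> {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

Two nines (99%) allows roughly 87.6 hours of downtime per year, while five nines (99.999%) allows only about 5.3 minutes.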

57
Q

IP

A

Stands for Internet Protocol. This network protocol outlines how almost all machine-to-machine communications should happen in the world. Other protocols like TCP, UDP and HTTP are built on top of IP.

58
Q

File System

A

An abstraction over a storage medium that defines how to manage data.

While there exist many different types of file systems, most follow a hierarchical structure that consists of directories and files, like the Unix file system’s structure.

59
Q

DDoS Attack

A

Short for “distributed denial-of-service attack”, a DDoS attack is a DoS attack in which the traffic flooding the target system comes from many different sources (like thousands of machines), making it much harder to defend against.