Other Concepts Flashcards

1
Q

Leader Election - Fundamentals

A
  • The process by which nodes in a cluster elect a “leader” amongst them
  • The leader is responsible for the primary operations of the service
  • Guarantees that all nodes in the cluster know which one is the leader at any time, and elect a new leader if the current one dies for whatever reason
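A toy sketch of re-election after a leader dies. It assumes every node sees the same liveness map, which is exactly the hard part that real consensus algorithms solve; `elect_leader` and the node names are hypothetical.

```python
from typing import Dict, Optional


def elect_leader(alive: Dict[str, bool]) -> Optional[str]:
    """Return the highest node ID among live nodes, or None if none are alive."""
    live = [node for node, is_alive in alive.items() if is_alive]
    return max(live) if live else None


cluster = {"node-1": True, "node-2": True, "node-3": True}
leader = elect_leader(cluster)       # highest live ID wins
cluster[leader] = False              # the current leader dies
new_leader = elect_leader(cluster)   # the remaining nodes elect a new one
```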
2
Q

Leader Election - Consensus Algorithm

A
  • Complex algorithms that are utilized to have multiple entities agree on a single data value, like who is the leader amongst a group of machines
  • Two popular algorithms are Paxos and Raft
3
Q

Peer-to-Peer Networks - Fundamentals

A
  • A collection of machines (peers) that divide a workload between themselves, presumably to complete the workload faster than would otherwise be possible
  • Often used in file-distribution systems
  • Gossip Protocol:
    - When a set of machines talk to each other in an uncoordinated manner inside a cluster to spread information through a system without requiring a central source of data
    - The number of machines informed grows exponentially because more and more machines communicate with each other
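A small simulation of the spread described above, assuming a simple "push" variant where every informed machine tells one randomly chosen peer per round. The function name, the fixed seed, and the round cap are illustrative assumptions.

```python
import random
from typing import List


def gossip_rounds(n_machines: int, seed: int = 0, max_rounds: int = 1000) -> List[int]:
    """Simulate push gossip; return the number of informed machines after each round."""
    rng = random.Random(seed)
    informed = {0}                      # machine 0 starts with the information
    history = [len(informed)]
    for _ in range(max_rounds):
        if len(informed) == n_machines:
            break
        # Every informed machine tells one randomly chosen peer.
        targets = {rng.randrange(n_machines) for _ in informed}
        informed |= targets
        history.append(len(informed))
    return history


history = gossip_rounds(64)
```

The informed count can at most double each round, so early rounds show roughly exponential growth before it flattens out as most peers already know the information.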
4
Q

Peer-to-Peer Networks - High-level process

A
  • Every file to be transferred typically is split into small parts
  • One machine starts the transfer by sending one part to another machine
  • Then, the machines that received the first part start sending that part to other machines. They use the Gossip Protocol to find out about the state and location of the parts
  • The previous step is repeated until all the machines have a complete file
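A minimal sketch of the splitting and reassembly steps only (no networking). The helper names and the 8-byte part size are assumptions chosen for illustration.

```python
from typing import Dict, List


def split_into_parts(data: bytes, part_size: int) -> List[bytes]:
    """Split a file's bytes into fixed-size parts (the last one may be shorter)."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def reassemble(parts: Dict[int, bytes]) -> bytes:
    """Rebuild the file from parts keyed by their index."""
    return b"".join(parts[index] for index in sorted(parts))


file_data = b"peer-to-peer file distribution"
parts = split_into_parts(file_data, part_size=8)

# A peer may receive parts out of order from different machines.
received = {index: part for index, part in reversed(list(enumerate(parts)))}
restored = reassemble(received)
```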
5
Q

Polling

A
  • Act of fetching a resource or piece of data regularly at an interval to make sure your data is not too stale
  • Works well when only a few requests (not thousands or more) are executed within a small interval; frequent polling by many clients puts heavy load on the server
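A minimal polling loop, as a sketch. `poll` and `fetch_value` are hypothetical names; the counter stands in for a remote resource, and the `sleep` parameter is injectable only to make the loop easy to test.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def poll(fetch: Callable[[], T], interval_seconds: float, times: int,
         sleep: Callable[[float], None] = time.sleep) -> List[T]:
    """Call fetch() `times` times, waiting interval_seconds between calls."""
    results = []
    for attempt in range(times):
        results.append(fetch())
        if attempt < times - 1:
            sleep(interval_seconds)
    return results


# Hypothetical data source: a counter standing in for a remote resource.
state = {"value": 0}


def fetch_value() -> int:
    state["value"] += 1
    return state["value"]


readings = poll(fetch_value, interval_seconds=0.01, times=3)
```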
6
Q

Streaming

A
  • In networking, refers to continuously getting a feed of information from a server by keeping an open connection between two machines or processes
  • The server proactively and continuously sends information to clients
  • Useful when clients need updates continuously or near-instantly, where polling would require a constant stream of requests
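A sketch of the push model using an in-process `asyncio.Queue` as a stand-in for an open connection. This is an assumption for illustration; a real system would typically use sockets or a similar transport. The update strings are made up.

```python
import asyncio
from typing import List, Optional


async def server(feed: "asyncio.Queue[Optional[str]]", updates: List[str]) -> None:
    """Proactively push each update to the client, then signal the end of the feed."""
    for update in updates:
        await feed.put(update)
    await feed.put(None)  # sentinel: the stream is closed


async def client(feed: "asyncio.Queue[Optional[str]]") -> List[str]:
    """Consume updates as they arrive, without asking for each one."""
    received = []
    while True:
        update = await feed.get()
        if update is None:
            break
        received.append(update)
    return received


async def main() -> List[str]:
    feed: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    _, received = await asyncio.gather(
        server(feed, ["price=10", "price=11", "price=12"]),
        client(feed),
    )
    return received


received = asyncio.run(main())
```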
7
Q

Configuration - Fundamentals

A
  • Set of parameters or constants that are critical to a system
  • Typically written in JSON or YAML. Can be static or dynamic
8
Q

Configuration - Dynamic Configuration

A
  • Harder for developers to review because the effective value isn’t clear at a glance
  • Applications and configurations can be deployed separately. For example, the configuration can be deployed more frequently than the application, so configuration changes can be tested against the same application release
  • May need tooling to ensure that the application doesn’t break when the configuration changes
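One possible safeguard, sketched under assumptions: validate a new configuration before applying it and keep the previous values if validation fails. `DynamicConfig`, `validate`, and the `max_connections` key are hypothetical.

```python
import json
from typing import Any, Dict


def validate(config: Any) -> None:
    """Reject configurations the application could not run with."""
    if not isinstance(config, dict):
        raise ValueError("configuration must be a JSON object")
    value = config.get("max_connections")
    if not isinstance(value, int) or value <= 0:
        raise ValueError("max_connections must be a positive integer")


class DynamicConfig:
    def __init__(self, initial_json: str) -> None:
        self._values = self._parse(initial_json)

    @staticmethod
    def _parse(raw_json: str) -> Dict[str, Any]:
        config = json.loads(raw_json)   # raises a ValueError subclass on bad JSON
        validate(config)
        return config

    def update(self, raw_json: str) -> bool:
        """Apply a new configuration only if it is valid; otherwise keep the old one."""
        try:
            self._values = self._parse(raw_json)
            return True
        except ValueError:
            return False

    def get(self, key: str) -> Any:
        return self._values[key]


config = DynamicConfig('{"max_connections": 10}')
applied = config.update('{"max_connections": 50}')    # valid: applied
rejected = config.update('{"max_connections": -1}')   # invalid: old value kept
```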
9
Q

Rate Limiting - Fundamentals

A
  • Act of limiting the number of requests sent to or from a system
  • Often used to limit the number of requests to prevent DoS attacks and can be enforced at IP address, user account, region level, etc.
  • Can also be implemented in tiers. For example, a type of network request could be limited to 1 per second, 5 per 10 seconds, and 10 per minute
  • Tools: Redis
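A sketch of the tiered example above (1 per second, 5 per 10 seconds, 10 per minute). It assumes an in-memory dictionary as a stand-in for a shared store such as Redis, and takes the current time as an argument so the behavior is easy to follow. The class name is hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# (max requests, window in seconds), matching the tiers in the example above
TIERS: List[Tuple[int, float]] = [(1, 1.0), (5, 10.0), (10, 60.0)]


class TieredRateLimiter:
    def __init__(self, tiers: List[Tuple[int, float]] = TIERS) -> None:
        self.tiers = tiers
        self.longest_window = max(window for _, window in tiers)
        self.history: Dict[str, Deque[float]] = {}

    def allow(self, client_id: str, now: float) -> bool:
        """Accept the request only if it stays within every tier's limit."""
        timestamps = self.history.setdefault(client_id, deque())
        # Forget requests older than the longest window.
        while timestamps and now - timestamps[0] >= self.longest_window:
            timestamps.popleft()
        for limit, window in self.tiers:
            recent = sum(1 for t in timestamps if now - t < window)
            if recent >= limit:
                return False
        timestamps.append(now)
        return True


limiter = TieredRateLimiter()
first = limiter.allow("203.0.113.7", now=0.0)    # accepted
second = limiter.allow("203.0.113.7", now=0.5)   # rejected by the 1-per-second tier
```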
10
Q

Rate Limiting - Attacks

A
  • DoS attack (Denial-of-Service attack)
  • Attack where a malicious user tries to bring down or damage a system in order to render it unavailable to users. It usually consists of flooding the system with traffic
  • Some attacks can be prevented with rate limiting, but others can be trickier to defend against
  • DDoS attack (Distributed Denial-of-Service attack): a DoS attack where the traffic flooding comes from many different sources, making it harder to defend against
11
Q

Logging and Monitoring - Logging

A
  • Act of collecting and storing useful information about events in your system
  • Typically programs output log messages to STDOUT or STDERR pipes, which automatically get aggregated into a centralized logging solution
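A minimal sketch of writing log messages to STDOUT with Python's standard `logging` module. The logger name, message, and format string are illustrative assumptions; aggregation into a centralized solution is outside the sketch.

```python
import logging
import sys
from typing import TextIO


def build_logger(name: str, stream: TextIO = sys.stdout) -> logging.Logger:
    """Create a logger that writes one formatted line per event to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()             # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


logger = build_logger("checkout-service")
logger.info("order placed order_id=%s", 42)
```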
12
Q

Logging and Monitoring - Monitoring

A
  • Process of having visibility into a system’s key metrics
  • Typically implemented by collecting important events and aggregating them in human-readable charts
  • To avoid breaking your metrics system when changing your logs, you can use a time-series database to store relevant events directly from your system, not from your logs
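A sketch of recording events directly as time-stamped data points and aggregating them per time bucket, the kind of series a chart would plot. The in-memory `TimeSeries` class is a hypothetical stand-in for a real time-series database.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class TimeSeries:
    def __init__(self) -> None:
        self.points: DefaultDict[str, List[Tuple[float, float]]] = defaultdict(list)

    def record(self, metric: str, timestamp: float, value: float = 1.0) -> None:
        """Store one data point for a metric, straight from the application."""
        self.points[metric].append((timestamp, value))

    def sum_per_bucket(self, metric: str, bucket_seconds: int) -> Dict[int, float]:
        """Aggregate a metric into fixed-size time buckets keyed by bucket start."""
        buckets: DefaultDict[int, float] = defaultdict(float)
        for timestamp, value in self.points[metric]:
            bucket_start = int(timestamp // bucket_seconds) * bucket_seconds
            buckets[bucket_start] += value
        return dict(buckets)


series = TimeSeries()
for event_time in (5, 30, 65):
    series.record("http_errors", event_time)
errors_per_minute = series.sum_per_bucket("http_errors", bucket_seconds=60)
```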
13
Q

Logging and Monitoring - Alerting

A
  • The process through which system administrators get notified when critical system issues occur
  • Can be set up by defining thresholds on monitoring charts, so that alerts are sent to communication tools when a threshold is crossed
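A sketch of a threshold check. The `notify` callback stands in for a communication tool; the metric name and threshold values are made up for illustration.

```python
from typing import Callable, List


def check_threshold(metric: str, value: float, threshold: float,
                    notify: Callable[[str], None]) -> bool:
    """Send an alert through notify() when the value exceeds the threshold."""
    if value > threshold:
        notify(f"ALERT: {metric}={value} exceeded threshold {threshold}")
        return True
    return False


sent: List[str] = []   # stand-in for a chat channel or paging tool
check_threshold("error_rate", 0.02, threshold=0.05, notify=sent.append)
check_threshold("error_rate", 0.09, threshold=0.05, notify=sent.append)
```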
14
Q

Publish/Subscribe Pattern - Fundamentals

A
  • The Pub/Sub pattern is a messaging model that consists of publishers and subscribers
  • Publishers publish messages on special topics (also called channels) without caring about / knowing who will read those messages
  • Subscribers subscribe to topics and read messages from topics
  • Pub/Sub systems often offer guarantees like at-least-once delivery, persistent storage, ordering of messages, and replayability of messages. Other features less likely to be provided are end-to-end encryption, and message filtering
  • Tools: Apache Kafka, Google Cloud Pub/Sub, AWS EventBridge, AWS SNS
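A minimal in-memory sketch of the pattern, not the API of any of the tools listed. The `Broker` class and topic name are hypothetical; the per-topic log illustrates persistence and replayability in the simplest possible form.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]


class Broker:
    def __init__(self) -> None:
        self.subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.log: DefaultDict[str, List[str]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> None:
        """Store the message, then deliver it to every subscriber of the topic."""
        self.log[topic].append(message)
        for handler in self.subscribers[topic]:
            handler(message)

    def replay(self, topic: str, handler: Handler) -> None:
        """Re-deliver every stored message of a topic to one handler."""
        for message in self.log[topic]:
            handler(message)


broker = Broker()
inbox: List[str] = []
broker.subscribe("orders", inbox.append)
broker.publish("orders", "order-1 created")
broker.publish("orders", "order-2 created")

late_inbox: List[str] = []
broker.replay("orders", late_inbox.append)   # a late subscriber catches up
```

The publisher only names a topic; it never references `inbox` or `late_inbox`, which is the decoupling the pattern is about.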
15
Q

Publish/Subscribe Pattern - Idempotent operation

A
  • An operation that has the same outcome regardless of how many times it has been performed
  • Operations performed through a Pub/Sub system typically have to be idempotent because those systems allow messages to be consumed multiple times
  • For example, increasing an integer value in a database is not an idempotent operation, but setting a value to “COMPLETE” will always be idempotent
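The contrast above, sketched with a dictionary standing in for a database row. The field and function names are hypothetical.

```python
from typing import Dict, Union

Row = Dict[str, Union[int, str]]


def increment_attempts(row: Row) -> None:
    """Not idempotent: every delivery changes the result."""
    row["attempts"] = int(row["attempts"]) + 1


def mark_complete(row: Row) -> None:
    """Idempotent: repeating it leaves the row in the same state."""
    row["status"] = "COMPLETE"


row: Row = {"attempts": 0, "status": "PENDING"}

# Simulate the same message being consumed twice.
for _ in range(2):
    increment_attempts(row)
    mark_complete(row)
```

After the duplicate delivery, `attempts` has been counted twice while `status` is unaffected by the repeat.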
16
Q

MapReduce - Fundamentals

A
  • A popular framework for processing very large datasets in a distributed setting efficiently, quickly, and in a fault-tolerant manner
  • Map and reduce functions must be idempotent
  • Tools: Hadoop
17
Q

MapReduce - Steps

A
  • Map step: runs a map function on the various chunks of the dataset and transforms them into intermediate key-value pairs
  • Shuffle step: reorganizes the intermediate key-value pairs such that pairs of the same key are routed to the same machine
  • Reduce step: runs a reduce function on those key-value pairs and transforms them into meaningful data
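The three steps can be sketched on a single machine with a word count. This is an illustration of the data flow only, not a distributed implementation; the function names and sample chunks are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_words(chunk: str) -> List[Tuple[str, int]]:
    """Map step: turn one chunk of text into intermediate (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group values so that pairs with the same key end up together."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values of one key into a final count."""
    return key, sum(values)


chunks = ["the quick brown fox", "the lazy dog", "The fox"]
intermediate = [pair for chunk in chunks for pair in map_words(chunk)]
grouped = shuffle(intermediate)
word_counts = dict(reduce_counts(key, values) for key, values in grouped.items())
```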
18
Q

MapReduce - Usage

A
  • One of the use cases is counting the number of occurrences of words in a large text file
  • MapReduce library use:
    - Engineers only worry about the map and reduce functions, as well as their inputs and outputs
    - Other concerns, like task parallelization and fault tolerance of jobs, are the responsibility of the MapReduce implementation
19
Q

MapReduce - Distributed File System (DFS) 1

A
  • It’s an abstraction over a cluster of machines that allows them to act like one large file system
  • Files are split into chunks of a certain size, and those chunks are sharded across a large cluster of machines
  • Two popular implementations are: Google File System (GFS), and Hadoop Distributed File System (HDFS)
20
Q

MapReduce - Distributed File System (DFS) 2

A
  • Takes care of availability and replication guarantees, which are more complex to achieve than in a regular (single-machine) file system
  • A central control plane decides where chunks reside, routes reads to the right nodes, and handles communication between machines
  • Different DFS implementations have different APIs and semantics, but they achieve the same goal: extremely large-scale persistent storage
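A sketch of what a control plane's placement decision could look like. Round-robin placement is an assumption made for illustration; it is not the policy of GFS or HDFS, and `place_chunks` and the machine names are hypothetical.

```python
from typing import Dict, List


def place_chunks(file_size: int, chunk_size: int,
                 machines: List[str], replicas: int) -> Dict[int, List[str]]:
    """Assign every chunk of a file to `replicas` distinct machines, round-robin."""
    if replicas > len(machines):
        raise ValueError("not enough machines for the requested replication")
    num_chunks = -(-file_size // chunk_size)   # ceiling division
    return {
        chunk: [machines[(chunk + r) % len(machines)] for r in range(replicas)]
        for chunk in range(num_chunks)
    }


placement = place_chunks(file_size=250, chunk_size=100,
                         machines=["m1", "m2", "m3"], replicas=2)
```

With two replicas per chunk, losing any single machine still leaves one copy of every chunk readable.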
21
Q

API Design - Pagination

A
  • When a network request potentially warrants a large response, an API might be designed to return only a single page at a time
  • Each response includes an identifier/token for the next page; the client sends that token with its next request to get the following page
  • Often used when designing List endpoints. For example:
    - An endpoint to list videos on the YouTube trending page could return a huge list of videos
    - Returning it all at once wouldn’t perform well on mobile devices due to lower network speeds, and wouldn’t be optimal since most users will only scroll through the first videos
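A sketch of token-based pagination over an in-memory list. Using the next start offset as the page token is an assumption for simplicity; real APIs often use opaque tokens. `list_videos` and the video IDs are hypothetical.

```python
from typing import List, Optional, Tuple


def list_videos(videos: List[str], page_size: int,
                page_token: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return one page of results plus the token for the next page (None at the end)."""
    start = int(page_token) if page_token else 0
    page = videos[start:start + page_size]
    next_start = start + page_size
    next_token = str(next_start) if next_start < len(videos) else None
    return page, next_token


videos = [f"video-{i}" for i in range(5)]

# The client keeps requesting pages until no token comes back.
pages = []
token: Optional[str] = None
while True:
    page, token = list_videos(videos, page_size=2, page_token=token)
    pages.append(page)
    if token is None:
        break
```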
22
Q

API Design - CRUD and Documentation

A
  • CRUD operations:
    - Stands for Create, Read, Update, Delete operations
    - These operations serve as the bedrock of a system, so they serve as the core of many APIs
  • Swagger (OpenAPI) is a widely used standard for documenting APIs
23
Q

API Design - HTTP methods

A
  • GET: retrieves the resource identified by the URL and returns it to the client
  • POST: creates a new resource
  • PUT: updates a resource based on its provided identifier, or creates a resource if the identifier doesn’t match one
  • PATCH: submits a partial modification to a resource. It typically fails with an error when no resource matches the provided identifier
  • DELETE: deletes a resource
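The semantics listed above, sketched against an in-memory store instead of a real HTTP server. `ResourceStore` is a hypothetical name, and raising `KeyError` stands in for an error response.

```python
from typing import Any, Dict


class ResourceStore:
    def __init__(self) -> None:
        self.items: Dict[int, Dict[str, Any]] = {}
        self.next_id = 1

    def get(self, resource_id: int) -> Dict[str, Any]:
        """GET: return the resource (raises KeyError if it does not exist)."""
        return self.items[resource_id]

    def post(self, data: Dict[str, Any]) -> int:
        """POST: create a new resource and return its generated identifier."""
        resource_id = self.next_id
        self.next_id += 1
        self.items[resource_id] = dict(data)
        return resource_id

    def put(self, resource_id: int, data: Dict[str, Any]) -> None:
        """PUT: replace the resource, or create it if the identifier is unknown."""
        self.items[resource_id] = dict(data)

    def patch(self, resource_id: int, changes: Dict[str, Any]) -> None:
        """PATCH: apply a partial change; fail if the resource does not exist."""
        if resource_id not in self.items:
            raise KeyError(resource_id)
        self.items[resource_id].update(changes)

    def delete(self, resource_id: int) -> None:
        """DELETE: remove the resource if present."""
        self.items.pop(resource_id, None)


store = ResourceStore()
video_id = store.post({"title": "Intro", "views": 0})
store.patch(video_id, {"views": 10})              # partial update
store.put(99, {"title": "Created by PUT"})        # unknown ID: creates
```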