Introduction Flashcards

1
Q

What is a distributed system?

A

A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages. The components are software programs that run on physical hardware such as computers; each such machine is referred to as a node.

2
Q

What are the two central parts of a distributed system?

A

The two central parts of a distributed system are: 1) The various parts that compose the system, located remotely and separated by a network, and 2) The network itself, acting as a communication mechanism for message exchange.

3
Q

What are the three main benefits of distributed systems?

A

The three main benefits of distributed systems are Performance (achieving objectives for timeliness), Scalability (the capability to handle a growing amount of work or be enlarged), and Availability (the probability of the system working as required, when required).

4
Q

What is the difference between vertical and horizontal scaling in distributed systems?

A

Vertical scaling involves adding resources like memory, CPU, or disk to a single node. In contrast, horizontal scaling involves adding more nodes to the system, allowing it to handle more traffic and store larger amounts of data.

5
Q

How do distributed systems achieve high availability, and what is redundancy?

A

Distributed systems achieve high availability through redundancy, which involves storing data on multiple, redundant computers. This ensures that if one computer fails, another can take over, minimizing downtime and maintaining continuous service.

6
Q

What are some challenges and trade-offs in distributed systems?

A

Distributed systems face challenges in balancing performance, scalability, and availability. There is often tension between these benefits and other system properties, requiring trade-offs based on understanding the constraints and limitations of distributed systems.

7
Q

Why is developing software for distributed systems more challenging than for single-computer systems?

A

Distributed systems are subject to many more constraints than single-computer systems, making development more complex. Developers new to distributed systems often carry over assumptions from single-computer development, leading to problems in the systems they build.

8
Q

What are the “fallacies of distributed computing” and who identified them?

A

The “fallacies of distributed computing” are a collection of eight false assumptions made by developers about distributed systems. They were identified by L. Peter Deutsch and others at Sun Microsystems to help developers build better systems.

9
Q

What is the fallacy regarding the reliability of networks in distributed systems?

A

A common fallacy is assuming that the network is reliable. In reality, networks can fail due to various factors, including hardware issues, making it crucial to design systems that can handle network unreliability.

10
Q

How does the assumption of zero latency affect distributed system development?

A

Assuming zero latency is misleading: a remote call is always significantly slower than a local one, especially over long distances. Designs that ignore this difference, for example by making many small remote calls where one larger call would do, tend to be inefficient.

11
Q

Why is the assumption of infinite bandwidth a fallacy in distributed computing?

A

While bandwidth has improved, it’s not infinite, especially when traffic crosses the Internet. This fallacy can lead to incorrect assumptions about system capabilities and impact the design of a distributed system’s topology.

12
Q

What is the fallacy about network security in distributed systems?

A

The fallacy is assuming the network is secure. In reality, networks often involve multiple, possibly insecure parts controlled by different organizations, necessitating security measures in system design.

13
Q

What is the “global clock fallacy” in distributed systems?

A

The global clock fallacy is assuming a consistent global clock across all nodes in a distributed system. Each node has its own clock, potentially running at different rates, which can affect event timing and order.

14
Q

Why are distributed systems challenging to design and reason about?

A

Distributed systems are difficult to design due to network asynchrony, partial failures, and concurrency. These factors introduce complexity and unpredictable behaviors not present in non-distributed systems, increasing the risk of errors.

15
Q

What is network asynchrony in distributed systems, and why is it challenging?

A

Network asynchrony refers to the inability of communication networks in distributed systems to provide strong guarantees on event delivery times. This can result in messages being delivered extremely late, out of order, or not at all, contrasting with the stricter guarantees provided by memory operations.
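
A minimal sketch of this behaviour, assuming a toy model in which every message has a send time and a network delay, and a delay of None means the message was lost. The function name and the message tuples are hypothetical and only illustrate the idea.

```python
from typing import List, Optional, Tuple

# (name, time the message was sent, network delay or None if the message is lost)
Message = Tuple[str, int, Optional[int]]


def arrival_order(sent: List[Message]) -> List[str]:
    """Return the names of the messages in the order the receiver sees them."""
    arrivals = [
        (sent_at + delay, name)
        for name, sent_at, delay in sent
        if delay is not None  # lost messages never arrive
    ]
    return [name for _, name in sorted(arrivals)]


# m1 is sent first but is slow, m2 overtakes it, m3 is lost.
sent = [("m1", 0, 5), ("m2", 1, 1), ("m3", 2, None)]
received = arrival_order(sent)
```

Here the receiver sees m2 before m1 and never sees m3, even though the sender issued them in order.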

16
Q

What are partial failures in distributed systems, and how do they add complexity?

A

Partial failures occur when only some components of a distributed system fail. This contrasts with single-server applications that assume either complete functionality or total failure. Ensuring atomicity across components is complex, requiring operations to be applied to all or none of the nodes.
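
The sketch below illustrates the problem rather than a solution. It assumes a hypothetical Node class and a naive write that contacts each node in turn; when one node is unreachable, the write has been applied to some nodes but not to others, which is exactly the state that atomicity is meant to rule out.

```python
class Node:
    """Hypothetical storage node, used only for illustration."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy
        self.data = {}

    def write(self, key: str, value: int) -> None:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value


def naive_write_all(nodes, key, value):
    """Write to every node in turn and return the names of the nodes that failed."""
    failed = []
    for node in nodes:
        try:
            node.write(key, value)
        except ConnectionError:
            failed.append(node.name)
    return failed


nodes = [Node("n1"), Node("n2", healthy=False), Node("n3")]
failed = naive_write_all(nodes, "x", 1)
```

After this call n1 and n3 hold the new value and n2 does not, so the system is neither fully updated nor fully unchanged.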

17
Q

How does concurrency contribute to the complexity of distributed systems?

A

Concurrency in distributed systems involves multiple computations happening simultaneously and potentially on the same data. These interleaved computations can interfere with each other, leading to unexpected behaviors, especially compared to non-concurrent, sequential applications.
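
A small deterministic sketch of such interference, assuming two hypothetical clients that each increment a shared counter in two steps (read the value, then write back the value plus one). The schedule lists which client takes its next step; no real threads are involved.

```python
def run_schedule(schedule):
    """Run read-then-write increments in the given step order and return the counter."""
    counter = 0
    pending = {}  # client -> value it has read but not yet written back
    for client in schedule:
        if client not in pending:
            pending[client] = counter  # step 1: read the shared counter
        else:
            counter = pending.pop(client) + 1  # step 2: write back read value + 1
    return counter


sequential = run_schedule(["A", "A", "B", "B"])   # A finishes before B starts
interleaved = run_schedule(["A", "B", "A", "B"])  # both read before either writes
```

With the sequential schedule both increments are kept and the counter ends at 2. With the interleaved schedule both clients read 0, so one increment is lost and the counter ends at 1.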

18
Q

Why is it important to consider network asynchrony, partial failures, and concurrency when designing distributed systems?

A

Considering these factors is vital because they are major contributors to the complexity in distributed systems. Being aware of them helps in anticipating and appropriately handling edge cases.

19
Q

How is the correctness of a distributed system defined?

A

The correctness of a distributed system is defined in terms of the properties it must satisfy, primarily focusing on safety and liveness properties.

20
Q

What are the two primary measures of correctness in distributed systems?

A

The two primary measures of correctness in distributed systems are safety properties, which define what must never happen, and liveness properties, which define what must eventually happen in a correct system.

21
Q

What is a safety property in the context of distributed systems?

A

A safety property in distributed systems is a condition that defines something that must never occur in a correct system. It is about preventing incorrect states.

22
Q

What is a liveness property in distributed systems?

A

A liveness property in distributed systems is a condition that defines something that must eventually occur in a correct system. It ensures that the system continues to make progress.

23
Q

Why is the safety property often prioritized over liveness in distributed systems?

A

In distributed systems, safety properties are often prioritized over liveness because it’s usually more important to ensure the system does not reach incorrect states. There is an inherent tension between safety and liveness, and sometimes compromises on liveness are made to maintain safety.

24
Q

What is the inherent challenge in balancing safety and liveness properties in distributed systems?

A

The challenge in balancing safety and liveness properties lies in the fact that it is sometimes physically impossible to satisfy both simultaneously. Compromises may need to be made, often prioritizing safety to avoid incorrect states, even if it means compromising on certain aspects of liveness.

25
Q

Why do we need a generic model for distributed systems?

A

We need a generic model for distributed systems to provide a common framework for solving problems generically, allowing us to apply reasoning across various systems with different hardware, networks, and other varying factors.

26
Q

What are the two main categories of distributed system models based on communication nature?

A

The two main categories of distributed system models based on the nature of communication are synchronous systems and asynchronous systems.

27
Q

What characterizes a synchronous system in distributed computing?

A

In a synchronous system, each node has an accurate clock, and there is a known upper bound on message transmission delay and processing time. The execution is split into rounds, with nodes operating in lock-step.

28
Q

What defines an asynchronous system in distributed computing?

A

An asynchronous system is characterized by the absence of a fixed upper bound on message delivery times and the time elapsed between consecutive steps of a node. Nodes in such a system do not have a common notion of time and run at independent rates.

29
Q

Which distributed system model is more reflective of real-life scenarios, such as the Internet?

A

The asynchronous system model is more reflective of real-life distributed systems like the Internet, where there is limited control over all components and minimal guarantees on message transmission times.

30
Q

Why is the synchronous model easier to describe, program, and reason about compared to the asynchronous model?

A

The synchronous model is easier to handle because it operates on known bounds for message transmission and processing, and nodes run in a coordinated manner. This predictability simplifies programming and reasoning about the system.

31
Q

What is a fail-stop failure in a distributed system?

A

In a fail-stop failure, a node halts and remains halted permanently. This failure is detectable by other nodes, which can recognize the failure through communication attempts.

32
Q

How does a crash failure differ from a fail-stop failure in distributed systems?

A

In a crash failure, a node halts but does so silently, meaning other nodes may not directly detect the failure. They can only infer the failure when they are unable to communicate with the node.

33
Q

What is an omission failure in distributed systems?

A

An omission failure occurs when a node fails to respond to incoming requests. This type of failure can disrupt communication and coordination within the system.

34
Q

What characterizes a Byzantine failure in a distributed system?

A

A Byzantine failure is when a node exhibits arbitrary behavior, such as transmitting random messages, taking incorrect steps, or stopping entirely. This can be due to malicious actions or software bugs.

35
Q

Why are fail-stop failures considered simpler and more convenient in distributed system design?

A

Fail-stop failures are simpler and more convenient because they are detectable by other nodes. However, they are not very realistic in many real-life systems where it’s not easy to identify if another node has crashed or not.

36
Q

Under which failure assumption do most algorithms in distributed systems operate?

A

Most algorithms in distributed systems operate under the assumption of crash failures, where a node may halt silently and its failure is inferred by an inability to communicate with it.

37
Q

What is the challenge with multiple deliveries of a message in distributed systems?

A

In distributed systems, due to unreliable networks, messages might get lost and are often retried, leading to multiple deliveries. This can result in unintended side effects, like processing the same transaction multiple times.

38
Q

What is an idempotent operation in the context of distributed systems?

A

An idempotent operation is one that can be applied multiple times without changing the result beyond the initial application. This ensures that even if a message is delivered and processed multiple times, the outcome remains the same.
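
As a rough illustration, assuming a hypothetical account stored as a dictionary: adding an amount to a balance is not idempotent, while setting the balance to a given value is. Both function names are invented for this sketch.

```python
def add_to_balance(account: dict, amount: int) -> None:
    """Not idempotent: every repeated delivery changes the balance again."""
    account["balance"] += amount


def set_balance(account: dict, new_balance: int) -> None:
    """Idempotent: applying it twice gives the same result as applying it once."""
    account["balance"] = new_balance


retried = {"balance": 100}
add_to_balance(retried, 50)
add_to_balance(retried, 50)  # duplicate delivery of the same message

safe = {"balance": 100}
set_balance(safe, 150)
set_balance(safe, 150)  # duplicate delivery of the same message
```

The duplicated add leaves the first account at 200 instead of the intended 150, while the duplicated set leaves the second account at 150.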

39
Q

What is the de-duplication approach in message delivery for distributed systems?

A

In the de-duplication approach, each message is given a unique identifier. The recipient tracks these identifiers to avoid executing operations for already received messages, ensuring each operation is performed only once.
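
A minimal sketch of this approach, assuming the identifiers are kept in an in-memory set (a real recipient would need to persist them and eventually expire old ones). The DedupReceiver class and its fields are hypothetical.

```python
class DedupReceiver:
    """Applies each message at most once by remembering the identifiers it has seen."""

    def __init__(self):
        self.seen_ids = set()
        self.balance = 0

    def handle(self, message_id: str, amount: int) -> bool:
        """Apply the message unless it was seen before; return True if it was applied."""
        if message_id in self.seen_ids:
            return False  # duplicate delivery, skip the operation
        self.seen_ids.add(message_id)
        self.balance += amount
        return True


receiver = DedupReceiver()
first = receiver.handle("msg-1", 50)
duplicate = receiver.handle("msg-1", 50)  # the sender retried the same message
```

The second call is recognised as a duplicate, so the balance is increased only once.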

40
Q

What is the difference between delivery and processing in the context of exactly-once semantics?

A

Delivery refers to the arrival of a message at the destination node at the hardware level, while processing involves the handling of the message by the software application layer. Exactly-once semantics is more concerned with processing a message only once, rather than its delivery frequency.

41
Q

Can exactly-once delivery be guaranteed in distributed systems?

A

It’s impossible to guarantee exactly-once delivery in distributed systems due to network unreliability. However, exactly-once processing can sometimes be achieved to ensure a message’s effect occurs only once.

42
Q

What are at-most-once and at-least-once delivery semantics in distributed systems?

A

At-most-once delivery sends each message only once and never retries, so a lost message stays lost. At-least-once delivery keeps resending a message until the recipient acknowledges it, so the message may be delivered more than once.
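
The two semantics can be sketched as follows, assuming a hypothetical scripted channel in which each send is either lost, delivered with its acknowledgment lost, or delivered and acknowledged. The max_attempts limit is an assumption added to keep the example bounded.

```python
def make_channel(outcomes):
    """Build a scripted channel; each send consumes one of "lost", "ack_lost" or "ok"."""
    script = iter(outcomes)
    delivered = []

    def send(message: str) -> bool:
        outcome = next(script)
        if outcome != "lost":
            delivered.append(message)  # the recipient got the message
        return outcome == "ok"  # the sender only sees an acknowledgment on "ok"

    return send, delivered


def send_at_most_once(send, message: str) -> bool:
    """Send once and never retry; the message may be lost."""
    return send(message)


def send_at_least_once(send, message: str, max_attempts: int = 5) -> int:
    """Resend until acknowledged and return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if send(message):
            return attempt
    raise TimeoutError("no acknowledgment received")


send_a, delivered_a = make_channel(["lost"])
acked = send_at_most_once(send_a, "pay")

send_b, delivered_b = make_channel(["lost", "ack_lost", "ok"])
attempts = send_at_least_once(send_b, "pay")
```

In the first case the message never arrives. In the second case it takes three attempts and the recipient receives the message twice, because one acknowledgment was lost on the way back.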

43
Q

Why is it challenging to identify failures in distributed systems?

A

Identifying failures in distributed systems is challenging due to the asynchronous nature of the network, which makes it hard to differentiate between a crashed node and a node that is slow to respond.

44
Q

What is the main mechanism used to detect failures in distributed systems?

A

Timeouts are the primary mechanism used to detect failures. They impose an artificial upper bound on message delays, allowing the system to assume failure if a node is slower than this bound.

45
Q

What is the trade-off involved in selecting a small timeout value for failure detection?

A

A small timeout value reduces the waiting time for responses from crashed nodes but can lead to false positives, mistakenly declaring slow but operational nodes as failed.

46
Q

What is the trade-off in choosing a large timeout value for failure detection?

A

A large timeout value is more lenient towards slow nodes but can delay the detection of actual crashed nodes, leading to inefficiencies in the system.
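
The trade-off between small and large timeouts can be sketched with a toy model, assuming each node is described only by how long it takes to respond, with an infinite delay standing for a crashed node. The function name and the sample delays are hypothetical.

```python
import math


def evaluate_timeout(timeout: float, response_delays: dict) -> dict:
    """Classify nodes for a given timeout; math.inf marks a crashed node."""
    suspected = {node for node, delay in response_delays.items() if delay > timeout}
    false_positives = {node for node in suspected if response_delays[node] != math.inf}
    return {
        "suspected": suspected,
        "false_positives": false_positives,
        "detection_wait": timeout,  # time spent waiting before declaring a crash
    }


delays = {"fast": 0.05, "slow": 0.8, "crashed": math.inf}
small = evaluate_timeout(0.5, delays)
large = evaluate_timeout(5.0, delays)
```

With the 0.5 second timeout the crash is noticed quickly, but the slow node is wrongly suspected. With the 5 second timeout there are no false positives, but the crash takes ten times longer to detect.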

47
Q

What is a failure detector in the context of distributed systems?

A

A failure detector is a component within a node used to identify other nodes that have failed. It is vital for algorithms that need to progress in the presence of failures.

48
Q

How are failure detectors categorized?

A

Failure detectors are categorized based on two properties: completeness (the rate of successfully identifying crashed nodes) and accuracy (the number of mistakes made in identifying failures).
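
As a rough illustration of the two properties, assuming we know after the fact which nodes really crashed, completeness can be computed as the fraction of crashed nodes that were suspected and accuracy as the count of wrongly suspected nodes. The function and node names are hypothetical.

```python
def evaluate_detector(suspected: set, crashed: set) -> dict:
    """Compare a detector's suspicions with the nodes that actually crashed."""
    detected = suspected & crashed  # crashed nodes that were identified
    mistakes = suspected - crashed  # healthy nodes that were wrongly suspected
    completeness = len(detected) / len(crashed) if crashed else 1.0
    return {"completeness": completeness, "mistakes": len(mistakes)}


crashed = {"n2", "n4"}
suspected = {"n2", "n3"}
result = evaluate_detector(suspected, crashed)
```

Here the detector found one of the two crashed nodes (completeness 0.5) and made one mistake by suspecting the healthy node n3.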

49
Q

What is a perfect failure detector, and is it achievable in asynchronous systems?

A

A perfect failure detector detects every faulty process without false positives. However, it is impossible to build such a detector in purely asynchronous systems due to inherent limitations in detecting failures accurately and promptly.

50
Q

What characterizes a stateless system?

A

A stateless system maintains no state of past interactions and performs its functions based solely on current inputs provided to it, either directly or indirectly.

51
Q

Can you give an example of a stateless system?

A

An example of a stateless system is a service that calculates the price of a product by retrieving its initial price and current discounts from other services, using only this data to perform calculations.
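
A minimal sketch of such a service, assuming prices are integers in cents and discounts are absolute amounts that are simply subtracted. The two lookup callables stand in for the other services and are hypothetical.

```python
from typing import Callable, Sequence


def calculate_price(
    product_id: str,
    get_initial_price: Callable[[str], int],
    get_discounts: Callable[[str], Sequence[int]],
) -> int:
    """Compute the final price from current inputs only; nothing is remembered."""
    initial = get_initial_price(product_id)
    discounts = get_discounts(product_id)
    return max(initial - sum(discounts), 0)


# Stand-ins for the pricing and discount services.
price = calculate_price("p1", lambda _id: 1000, lambda _id: [100, 50])
again = calculate_price("p1", lambda _id: 1000, lambda _id: [100, 50])
```

Because the function keeps no state, calling it again with the same inputs returns the same result, and any copy of the service could answer the request.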

52
Q

What distinguishes a stateful system?

A

Stateful systems maintain and mutate a state over time. Their results depend on this stored state, which evolves based on past interactions and data.

53
Q

Give an example of a stateful system.

A

An example of a stateful system is one that stores the ages of a company’s employees and provides information like the maximum age, which depends on the registered employee data.
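
A minimal sketch of this example, assuming the ages are held in an in-memory dictionary keyed by a hypothetical employee identifier.

```python
class EmployeeAges:
    """Stores employee ages; answers depend on everything registered so far."""

    def __init__(self):
        self._ages = {}

    def register(self, employee_id: str, age: int) -> None:
        self._ages[employee_id] = age

    def max_age(self):
        """Return the highest registered age, or None if nobody is registered."""
        return max(self._ages.values(), default=None)


store = EmployeeAges()
before = store.max_age()
store.register("e1", 30)
store.register("e2", 45)
middle = store.max_age()
store.register("e3", 52)
after = store.max_age()
```

The same max_age call returns different answers over time because the result depends on the stored state, not only on the current input.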

54
Q

What are some benefits of stateless systems over stateful systems?

A

Stateless systems are easier to design, build, and scale compared to stateful systems. They treat all nodes as identical, simplifying traffic balancing and scaling through the addition or removal of servers.

55
Q

Why are stateful systems more challenging than stateless systems?

A

Stateful systems are more challenging because they hold different data across nodes, requiring additional efforts in directing traffic correctly and ensuring synchronization across instances.

56
Q

How do stateless and stateful components differ in terms of architecture?

A

In a commonly used architecture, stateless components implement business capabilities without keeping state of their own, while stateful components are responsible for storing and managing data. Keeping the two separate confines the added complexity of state to the stateful components.