Introduction Flashcards

Question

Why do we need a generic model for distributed systems?

Answer 1

We need a generic model for distributed systems because it provides a common framework for reasoning about problems independently of any particular system, so the same solutions and arguments apply across systems with different hardware, networks, and other varying factors.

Answer 2

The two main categories of distributed system models based on the nature of communication are synchronous systems and asynchronous systems.

Answer 3

In a synchronous system, each node has an accurate clock, and there is a known upper bound on message transmission delay and processing time. The execution is split into rounds, with nodes operating in lock-step.
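To make the round structure concrete, here is a minimal Python simulation of lock-step execution. It assumes a fixed neighbour map and uses a "spread the maximum value" task purely as an illustration. `run_sync_rounds` is a hypothetical helper, not part of any library. Because every round finishes before the next one starts, after as many rounds as the network's diameter (three, for a line of four nodes) every node is guaranteed to hold the global maximum.

```python
from typing import Dict, List


def run_sync_rounds(values: List[int], neighbors: Dict[int, List[int]], rounds: int) -> List[int]:
    """Simulate lock-step rounds on a synchronous network (illustrative max-flooding task)."""
    state = list(values)
    for _ in range(rounds):
        # Send phase: every node sends its current value to each of its neighbours.
        inbox: Dict[int, List[int]] = {i: [] for i in range(len(state))}
        for sender, targets in neighbors.items():
            for target in targets:
                inbox[target].append(state[sender])
        # Compute phase: the known delay bound guarantees every message sent in this
        # round has arrived, so all nodes update together before the next round begins.
        state = [max([state[i], *inbox[i]]) for i in range(len(state))]
    return state


# A line of four nodes: 0 - 1 - 2 - 3
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(run_sync_rounds([5, 1, 9, 2], line, rounds=1))  # [5, 9, 9, 9]
print(run_sync_rounds([5, 1, 9, 2], line, rounds=3))  # [9, 9, 9, 9]
```

In an asynchronous system there is no such round boundary: a node cannot know whether a missing message is late or will never arrive.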

Answer 4

An asynchronous system is characterized by the absence of a fixed upper bound on message delivery times and the time elapsed between consecutive steps of a node. Nodes in such a system do not have a common notion of time and run at independent rates.

Answer 5

The asynchronous system model is more reflective of real-life distributed systems like the Internet, where there is limited control over all components and minimal guarantees on message transmission times.

Answer 6

The synchronous model is easier to handle because it operates on known bounds for message transmission and processing, and nodes run in a coordinated manner. This predictability simplifies programming and reasoning about the system.

Answer 7

In a fail-stop failure, a node halts and remains halted permanently. This failure is detectable by other nodes, which can recognize the failure through communication attempts.

Answer 8

In a crash failure, a node halts but does so silently, meaning other nodes may not directly detect the failure. They can only infer the failure when they are unable to communicate with the node.

Answer 9

An omission failure occurs when a node fails to respond to incoming requests. This type of failure can disrupt communication and coordination within the system.

Answer 10

A Byzantine failure is when a node exhibits arbitrary behavior, such as transmitting random messages, taking incorrect steps, or stopping entirely. This can be due to malicious actions or software bugs.

Answer 11

Fail-stop failures are simpler and more convenient because they are detectable by other nodes. However, they are not very realistic in many real-life systems where it’s not easy to identify if another node has crashed or not.

Answer 12

Most algorithms in distributed systems operate under the assumption of crash failures, where a node may halt silently and its failure is inferred by an inability to communicate with it.

Answer 13

In distributed systems, due to unreliable networks, messages might get lost and are often retried, leading to multiple deliveries. This can result in unintended side effects, like processing the same transaction multiple times.

Answer 14

An idempotent operation is one that can be applied multiple times without changing the result beyond the initial application. This ensures that even if a message is delivered and processed multiple times, the outcome remains the same.
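A small sketch of the difference, using a hypothetical bank-account example (the function names are illustrative only). Processing a retried "deposit 10" message twice changes the result, whereas processing a "set balance to 110" message twice does not.

```python
def deposit(balance: int, amount: int) -> int:
    """Not idempotent: every extra application adds `amount` again."""
    return balance + amount


def set_balance(balance: int, new_balance: int) -> int:
    """Idempotent: applying it once or many times yields the same result."""
    return new_balance


start = 100
# Simulate a retried message being processed twice.
print(deposit(start, 10), deposit(deposit(start, 10), 10))                # 110 120
print(set_balance(start, 110), set_balance(set_balance(start, 110), 110))  # 110 110
```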

Answer 15

In the de-duplication approach, each message is given a unique identifier. The recipient tracks these identifiers to avoid executing operations for already received messages, ensuring each operation is performed only once.
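A minimal in-memory sketch of de-duplication, assuming each message carries a unique string ID chosen by the sender. `DedupAccount` and its methods are hypothetical names for illustration.

```python
from typing import Set


class DedupAccount:
    """Receiver that applies each message's effect at most once, keyed by message ID."""

    def __init__(self) -> None:
        self.balance = 0
        self._seen_ids: Set[str] = set()

    def handle(self, message_id: str, amount: int) -> bool:
        """Apply a deposit; return False if this message ID was already processed."""
        if message_id in self._seen_ids:
            return False  # duplicate delivery: skip the side effect
        self._seen_ids.add(message_id)
        self.balance += amount
        return True


acct = DedupAccount()
print(acct.handle("msg-1", 10))  # True
print(acct.handle("msg-1", 10))  # False (retry of the same message)
print(acct.balance)              # 10
```

Here the set of seen IDs lives only in memory. A real receiver would need to persist it together with the state change (ideally atomically), since a crash between the two steps could otherwise lose or repeat an effect, and it would also need some policy for eventually discarding old IDs.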

Answer 16

Delivery refers to the arrival of a message at the destination node at the hardware level, while processing refers to the handling of that message by the node's software application layer. Exactly-once semantics is concerned with a message being processed only once, not with how many times it is delivered.

Answer 17

It's impossible to guarantee exactly-once delivery in distributed systems due to network unreliability. However, exactly-once processing can sometimes be achieved to ensure a message's effect occurs only once.

Answer 18

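At-most-once delivery sends every message only once, regardless of whether it arrives, so a message may be lost but is never delivered twice. At-least-once delivery keeps resending a message until an acknowledgment is received from the recipient, so a message is not lost as long as the recipient eventually responds, but it may be delivered more than once.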
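The two delivery guarantees, and how de-duplication turns at-least-once delivery into exactly-once processing, can be sketched with a deterministic simulation. `LossyLink`, `Receiver`, and the drop pattern are hypothetical stand-ins for a real network; acknowledgements travel over the same lossy link as requests, and `max_attempts` bounds the retries only so the simulation always terminates.

```python
from typing import Iterator, List, Set


class LossyLink:
    """Hypothetical link: drops the i-th transmission if drops[i] is True.

    Once the drop pattern is exhausted, every transmission succeeds.
    """

    def __init__(self, drops: List[bool]) -> None:
        self._drops: Iterator[bool] = iter(drops)

    def transmit(self) -> bool:
        return not next(self._drops, False)


class Receiver:
    """Counts raw deliveries separately from processed (de-duplicated) effects."""

    def __init__(self) -> None:
        self.deliveries = 0
        self.effects = 0
        self._processed: Set[str] = set()

    def on_message(self, msg_id: str) -> None:
        self.deliveries += 1
        if msg_id not in self._processed:
            self._processed.add(msg_id)
            self.effects += 1


def send_at_most_once(msg_id: str, link: LossyLink, receiver: Receiver) -> None:
    # Fire and forget: one transmission, no acknowledgement, no retry.
    if link.transmit():
        receiver.on_message(msg_id)


def send_at_least_once(msg_id: str, link: LossyLink, receiver: Receiver,
                       max_attempts: int = 10) -> int:
    # Retry until an acknowledgement comes back; returns the number of attempts used.
    for attempt in range(1, max_attempts + 1):
        if link.transmit():          # request reaches the receiver
            receiver.on_message(msg_id)
            if link.transmit():      # acknowledgement reaches the sender
                return attempt
        # Request or ack was lost: the sender cannot tell which, so it retries.
    raise TimeoutError(f"no acknowledgement for {msg_id} after {max_attempts} attempts")


r1 = Receiver()
send_at_most_once("m1", LossyLink([True]), r1)
print(r1.deliveries)  # 0: the single transmission was lost

r2 = Receiver()
attempts = send_at_least_once("m1", LossyLink([False, True]), r2)
print(attempts, r2.deliveries, r2.effects)  # 2 2 1: delivered twice, processed once
```

In the second run the first request arrives but its acknowledgement is lost, so the sender retries and the message is delivered twice; the receiver's de-duplication keeps the effect to a single application.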

Answer 19

Identifying failures in distributed systems is challenging due to the asynchronous nature of the network, which makes it hard to differentiate between a crashed node and a node that is slow to respond.

Answer 20

Timeouts are the primary mechanism used to detect failures. They impose an artificial upper bound on message delays, allowing the system to assume failure if a node is slower than this bound.
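A minimal timeout-based failure detector might look like the following sketch. It assumes that any message received from a node (for example a periodic heartbeat or a reply) counts as a sign of life; the class and method names are hypothetical. The clock is injectable so the example can use a fake clock and stay deterministic.

```python
import time
from typing import Callable, Dict, Hashable, Set


class TimeoutFailureDetector:
    """Suspects a node once nothing has been heard from it for longer than `timeout` seconds."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self._clock = clock
        self._last_heard: Dict[Hashable, float] = {}

    def heard_from(self, node: Hashable) -> None:
        # Record any message (e.g. a heartbeat or a reply) received from `node`.
        self._last_heard[node] = self._clock()

    def suspected(self) -> Set[Hashable]:
        now = self._clock()
        return {n for n, t in self._last_heard.items() if now - t > self.timeout}


# Drive the detector with a fake clock so the example is deterministic.
fake_now = [0.0]
fd = TimeoutFailureDetector(timeout=2.0, clock=lambda: fake_now[0])
fd.heard_from("A")
fd.heard_from("B")
fake_now[0] = 1.5
fd.heard_from("A")     # A keeps talking; B goes quiet
fake_now[0] = 3.0
print(fd.suspected())  # {'B'}
```

The detector can only suspect B: from the observer's point of view, B may have crashed or may simply be slow.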

Answer 21

A small timeout value reduces the waiting time for responses from crashed nodes but can lead to false positives, mistakenly declaring slow but operational nodes as failed.

Answer 22

A large timeout value is more lenient towards slow nodes but delays the detection of nodes that have actually crashed, so the system spends longer waiting on them before it can react.
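The trade-off between small and large timeouts can be illustrated with a simplified model that assumes each node's response time is fixed and known in advance; the numbers below are invented for illustration and `evaluate_timeout` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple


def evaluate_timeout(timeout: float,
                     response_times: Dict[str, Optional[float]]) -> Tuple[List[str], Optional[float]]:
    """Return (live nodes wrongly declared failed, time until crashed nodes are declared failed).

    `response_times` maps each node to how long it takes to reply, or None if it has crashed.
    """
    false_positives = sorted(
        node for node, rt in response_times.items() if rt is not None and rt > timeout
    )
    crashed = [node for node, rt in response_times.items() if rt is None]
    # A crashed node never replies, so it is declared failed exactly when the timeout fires.
    detection_delay = timeout if crashed else None
    return false_positives, detection_delay


observed = {"A": 0.2, "B": 1.5, "C": None}  # B is slow but alive, C has crashed
print(evaluate_timeout(0.5, observed))  # (['B'], 0.5)
print(evaluate_timeout(3.0, observed))  # ([], 3.0)
```

The small timeout detects C quickly but wrongly declares B failed; the large timeout makes no mistakes but takes six times as long to notice that C is gone.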

Answer 23

A failure detector is a component within a node used to identify other nodes that have failed. It is vital for algorithms that need to progress in the presence of failures.

Answer 24

Failure detectors are categorized based on two properties: completeness (the percentage of crashed nodes the detector successfully identifies within a given period) and accuracy (the number of mistakes it makes in that period, i.e. live nodes wrongly declared failed).
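One way to quantify both properties for a single observation period is sketched below, assuming we know which nodes actually crashed (the ground truth) and which ones the detector suspected. `score_detector` is a hypothetical helper.

```python
from typing import Set, Tuple


def score_detector(suspected: Set[str], crashed: Set[str]) -> Tuple[float, int]:
    """Score one observation period of a failure detector.

    completeness: fraction of crashed nodes that were suspected.
    mistakes:     number of live nodes wrongly suspected (fewer means more accurate).
    """
    completeness = len(suspected & crashed) / len(crashed) if crashed else 1.0
    mistakes = len(suspected - crashed)
    return completeness, mistakes


print(score_detector(suspected={"B", "C"}, crashed={"B", "D"}))  # (0.5, 1)
```

Here the detector found one of the two crashed nodes (B but not D) and made one mistake (C).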

Answer 25

A perfect failure detector, one with the strongest forms of completeness and accuracy, detects every faulty process without any false positives. However, it is impossible to build such a detector in a purely asynchronous system, because a crashed node cannot be reliably distinguished from one that is merely slow.

Answer 26

A stateless system maintains no state of past interactions and performs its functions based solely on current inputs provided to it, either directly or indirectly.

Answer 27

An example of a stateless system is a service that calculates the price of a product by retrieving its initial price and current discounts from other services, using only this data to perform calculations.
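A sketch of such a price service, assuming the other services are reachable through two lookup functions that are passed in (here replaced by lambdas) and that discounts are expressed as fractions. All names are hypothetical.

```python
from typing import Callable


def compute_price(product_id: str,
                  get_base_price: Callable[[str], float],
                  get_discount: Callable[[str], float]) -> float:
    """Stateless: the result depends only on the inputs fetched for this request."""
    base = get_base_price(product_id)     # e.g. a call to a catalogue service
    discount = get_discount(product_id)   # fraction off, e.g. 0.2 for 20%
    return round(base * (1 - discount), 2)


# Stand-ins for the other services; nothing is stored between calls.
print(compute_price("p1", lambda _: 50.0, lambda _: 0.2))  # 40.0
```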

Answer 28

Stateful systems maintain and mutate a state over time. Their results depend on this stored state, which evolves based on past interactions and data.

Answer 29

An example of a stateful system is one that stores the ages of a company's employees and provides information like the maximum age, which depends on the registered employee data.
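A minimal in-memory sketch of that stateful service; the class and method names are hypothetical. Unlike the price calculation, the same `max_age()` call returns different answers depending on what has been registered before.

```python
from typing import Dict, Optional


class EmployeeAgeRegistry:
    """Stateful: answers depend on everything registered so far."""

    def __init__(self) -> None:
        self._ages: Dict[str, int] = {}

    def register(self, employee: str, age: int) -> None:
        self._ages[employee] = age  # mutates the stored state

    def max_age(self) -> Optional[int]:
        return max(self._ages.values(), default=None)


registry = EmployeeAgeRegistry()
print(registry.max_age())  # None
registry.register("alice", 34)
registry.register("bob", 51)
print(registry.max_age())  # 51
```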

Answer 30

Stateless systems are easier to design, build, and scale than stateful systems. Because all nodes are identical, traffic can be balanced across them freely, and the system scales simply by adding or removing servers.

Answer 31

Stateful systems are more challenging because different nodes hold different data, which requires additional effort to direct each request to the right node and to keep the instances synchronized.
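The routing difference can be sketched as follows. For a stateless service any server will do, so simple round-robin works; for a stateful service each request must reach the node that owns the relevant data, shown here with plain hash-modulo partitioning. Both helpers and the node names are hypothetical, and hash-modulo is just one simple choice of partitioning scheme.

```python
import hashlib
from itertools import cycle
from typing import Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Stateless service: any server can handle any request, so just rotate."""
    return cycle(servers)


def owner_of(key: str, nodes: List[str]) -> str:
    """Stateful service: send the request to the node that holds `key`'s data."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


rr = round_robin(["s1", "s2", "s3"])
print([next(rr) for _ in range(4)])  # ['s1', 's2', 's3', 's1']

nodes = ["db-0", "db-1", "db-2"]
print(owner_of("employee:42", nodes) == owner_of("employee:42", nodes))  # True
```

Adding a fourth stateless server only means adding it to the rotation. With plain modulo partitioning, however, changing the number of nodes remaps most keys, so data has to move between nodes; schemes such as consistent hashing exist to limit that movement.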

Answer 32

In system architecture, stateless components handle business capabilities without maintaining state, while stateful components are responsible for handling and processing data, often with added complexity.


(56 cards)