Introduction Flashcards
What is a distributed system?
A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages. These components, such as software programs, run on physical hardware, typically computers, referred to as nodes.
What are the two central parts of a distributed system?
The two central parts of a distributed system are: 1) The various parts that compose the system, located remotely and separated by a network, and 2) The network itself, acting as a communication mechanism for message exchange.
What are the three main benefits of distributed systems?
The three main benefits of distributed systems are Performance (achieving objectives for timeliness), Scalability (the capability to handle a growing amount of work or be enlarged), and Availability (the probability of the system working as required, when required).
What is the difference between vertical and horizontal scaling in distributed systems?
Vertical scaling involves adding resources like memory, CPU, or disk to a single node. In contrast, horizontal scaling involves adding more nodes to the system, allowing it to handle more traffic and store larger amounts of data.
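Horizontal scaling only helps if data and work are actually spread across the added nodes. The sketch below assumes one simple scheme, modulo hashing of keys; `node_for_key` and `partition` are hypothetical helpers chosen for illustration, not something the flashcards prescribe.

```python
import hashlib

def node_for_key(key: str, num_nodes: int) -> int:
    """Map a key to one of num_nodes nodes (hypothetical modulo-hashing scheme)."""
    # hashlib is stable across processes, unlike Python's built-in hash() for str
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def partition(keys, num_nodes):
    """Group keys by the node responsible for storing them."""
    shards = {n: [] for n in range(num_nodes)}
    for key in keys:
        shards[node_for_key(key, num_nodes)].append(key)
    return shards

keys = [f"user:{i}" for i in range(1000)]
for n in (2, 4):
    # adding nodes spreads the same keys over more machines
    sizes = [len(v) for v in partition(keys, n).values()]
    print(n, "nodes ->", sizes)
```

Modulo hashing moves most keys to a different node whenever the node count changes; schemes such as consistent hashing are commonly used to reduce that movement.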
How do distributed systems achieve high availability, and what is redundancy?
Distributed systems achieve high availability through redundancy, which involves storing data on multiple, redundant computers. This ensures that if one computer fails, another can take over, minimizing downtime and maintaining continuous service.
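A minimal sketch of how redundancy turns into availability, assuming an in-memory `Replica` class that simulates an unreachable node by raising an exception. `Replica`, `ReplicaDown`, `read_with_failover`, and `availability` are all hypothetical names. The availability formula assumes replicas fail independently, which real systems only approximate.

```python
class ReplicaDown(Exception):
    """Raised by a (hypothetical) replica that is currently unreachable."""

class Replica:
    def __init__(self, name, data, up=True):
        self.name, self.data, self.up = name, data, up

    def get(self, key):
        if not self.up:
            raise ReplicaDown(self.name)
        return self.data[key]

def read_with_failover(replicas, key):
    """Try each redundant copy in turn; succeed if any replica is up."""
    for replica in replicas:
        try:
            return replica.get(key)
        except ReplicaDown:
            continue  # this copy failed, fall through to the next one
    raise RuntimeError("all replicas unavailable")

def availability(p_single: float, n: int) -> float:
    """P(at least one of n replicas is up), assuming independent failures."""
    return 1 - (1 - p_single) ** n

data = {"k": 42}
replicas = [Replica("a", data, up=False), Replica("b", data), Replica("c", data)]
print(read_with_failover(replicas, "k"))  # 42, served by replica "b"
print(round(availability(0.99, 3), 6))    # 0.999999
```

Correlated failures, such as replicas sharing a rack or power supply, make real availability lower than the independent-failure estimate.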
What are some challenges and trade-offs in distributed systems?
Distributed systems face challenges in balancing performance, scalability, and availability. These benefits often pull against each other and against other system properties, so designers must make trade-offs grounded in an understanding of the constraints and limitations of distributed systems.
Why is developing software for distributed systems more challenging than for single-computer systems?
Distributed systems are subject to many more constraints than single-computer systems, making development more complex. Developers new to distributed systems often carry over assumptions from single-computer development, leading to problems in the systems they build.
What are the “fallacies of distributed computing” and who identified them?
The “fallacies of distributed computing” are a collection of eight false assumptions made by developers about distributed systems. They were identified by L. Peter Deutsch and others at Sun Microsystems to help developers build better systems.
What is the fallacy regarding the reliability of networks in distributed systems?
A common fallacy is assuming that the network is reliable. In reality, networks can fail due to various factors, including hardware issues, making it crucial to design systems that can handle network unreliability.
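One common way to tolerate an unreliable network is to retry failed calls with exponential backoff. The sketch below assumes failures surface as `ConnectionError`; `call_with_retries` and the simulated `flaky_request` are hypothetical, and the `sleep` parameter exists only so the example runs without real delays.

```python
import random
import time

def call_with_retries(op, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call op(); on ConnectionError, retry with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return op()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # wait base_delay * 2^attempt, jittered so clients don't retry in lockstep
            delay = base_delay * (2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))

# Simulated flaky network call: fails twice, then succeeds.
failures = iter([True, True, False])

def flaky_request():
    if next(failures):
        raise ConnectionError("simulated packet loss")
    return "ok"

print(call_with_retries(flaky_request, sleep=lambda s: None))  # ok
```

Retrying is only safe for idempotent operations: if a request was applied but its response was lost, a retry applies it a second time.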
How does the assumption of zero latency affect distributed system development?
Assuming zero latency is misleading: a remote call is always much slower than a local call, and the gap grows with physical distance. Designs built on this assumption, for example ones that make many small remote calls, can perform poorly.
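A back-of-the-envelope sketch of why latency matters: every sequential remote call pays a full round trip. The function `total_latency_ms` is hypothetical, and the 50 ms round-trip time is an illustrative assumption, not a measurement.

```python
def total_latency_ms(num_items, rtt_ms, items_per_call=1, per_item_ms=0.0):
    """Estimate wall-clock time to fetch num_items using sequential calls.

    Each call pays one network round trip (rtt_ms) plus per-item processing time.
    """
    calls = -(-num_items // items_per_call)  # ceiling division
    return calls * rtt_ms + num_items * per_item_ms

print(total_latency_ms(100, rtt_ms=50))                      # 5000.0 ms, one item per call
print(total_latency_ms(100, rtt_ms=50, items_per_call=100))  # 50.0 ms, one batched call
```

Batching, or issuing independent calls in parallel, amortizes the round trip; a design that assumes zero latency has no reason to do either.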
Why is the assumption of infinite bandwidth a fallacy in distributed computing?
While bandwidth has improved, it’s not infinite, especially when traffic crosses the Internet. This fallacy can lead to incorrect assumptions about system capabilities and impact the design of a distributed system’s topology.
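Finite bandwidth puts a floor under transfer time: at best, size divided by bandwidth. A minimal sketch; `transfer_seconds` is a hypothetical helper, and it ignores protocol overhead, congestion, and TCP slow start, so real transfers take longer.

```python
def transfer_seconds(size_bytes: float, bandwidth_bps: float, rtt_s: float = 0.0) -> float:
    """Lower bound on the time to send size_bytes over a link of bandwidth_bps (bits/s)."""
    return rtt_s + (size_bytes * 8) / bandwidth_bps

GB = 10**9
print(transfer_seconds(10 * GB, 100e6))  # 800.0 s over a 100 Mbit/s link
print(transfer_seconds(10 * GB, 10e9))   # 8.0 s over a 10 Gbit/s link
```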
What is the fallacy about network security in distributed systems?
The fallacy is assuming the network is secure. In reality, networks often involve multiple, possibly insecure parts controlled by different organizations, necessitating security measures in system design.
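One security measure that follows from not trusting the network is authenticating messages so that tampering is detected. The sketch uses Python's standard `hmac` module with a shared key; `SHARED_KEY`, `sign`, and `verify` are hypothetical names, and a hard-coded key is for illustration only.

```python
import hashlib
import hmac
import json

SHARED_KEY = b"example-key-do-not-use"  # hypothetical; real keys belong in a secrets store

def sign(message: dict, key: bytes = SHARED_KEY) -> dict:
    """Serialize the message and attach an HMAC-SHA256 tag."""
    body = json.dumps(message, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"body": body.decode("utf-8"), "mac": tag}

def verify(envelope: dict, key: bytes = SHARED_KEY) -> dict:
    """Recompute the tag and reject the message if it does not match."""
    body = envelope["body"].encode("utf-8")
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences
    if not hmac.compare_digest(expected, envelope["mac"]):
        raise ValueError("message was tampered with or signed with a different key")
    return json.loads(body)

env = sign({"op": "transfer", "amount": 10})
print(verify(env))  # {'amount': 10, 'op': 'transfer'}
```

An HMAC provides integrity and authenticity but not confidentiality; hiding the contents additionally requires encryption, for example TLS.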
What is the “global clock fallacy” in distributed systems?
The global clock fallacy is assuming a consistent global clock across all nodes in a distributed system. In reality, each node has its own clock, and these clocks can run at different rates, so timestamps taken on different nodes cannot reliably be used to determine the timing and order of events.
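Because physical clocks cannot be trusted to agree, systems often order events with logical clocks instead. This is a minimal sketch of a Lamport clock, one well-known technique for this; the class and method names are illustrative.

```python
class LamportClock:
    """Logical clock: orders events consistently with causality, no synchronized time needed."""

    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        """Local event: advance the counter."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """Jump past the sender's timestamp so the receive is ordered after the send."""
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
a.tick()             # a: 1
t = a.send()         # a: 2, message carries timestamp 2
b.tick()             # b: 1
print(b.receive(t))  # 3, i.e. max(1, 2) + 1
```

Lamport timestamps respect causal order, but comparing two timestamps cannot reveal whether the events were concurrent; vector clocks are the usual extension for that.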
Why are distributed systems challenging to design and reason about?
Distributed systems are difficult to design due to network asynchrony, partial failures, and concurrency. These factors introduce complexity and unpredictable behaviors not present in non-distributed systems, increasing the risk of errors.
What is network asynchrony in distributed systems, and why is it challenging?
Network asynchrony refers to the inability of communication networks in distributed systems to provide strong guarantees on event delivery times. This can result in messages being delivered extremely late, out of order, or not at all, contrasting with the stricter guarantees provided by memory operations.
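A minimal sketch of coping with out-of-order and duplicated delivery, assuming the sender numbers its messages 0, 1, 2, and so on. `InOrderReceiver` is a hypothetical class. Note that the receiver can detect a gap but cannot tell whether the missing message is merely late or lost, which is exactly the difficulty asynchrony creates.

```python
class InOrderReceiver:
    """Deliver messages in sender order even when the network reorders them."""

    def __init__(self):
        self.next_seq = 0
        self.buffer = {}  # seq -> payload, held until all earlier messages arrive
        self.delivered = []

    def on_message(self, seq: int, payload):
        if seq < self.next_seq or seq in self.buffer:
            return  # duplicate: already delivered or already buffered
        self.buffer[seq] = payload
        # deliver every consecutive message that is now available
        while self.next_seq in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1

    def missing(self):
        """Sequence numbers still awaited (late or lost, the receiver cannot tell which)."""
        if not self.buffer:
            return []
        return [s for s in range(self.next_seq, max(self.buffer)) if s not in self.buffer]

r = InOrderReceiver()
for seq, msg in [(1, "b"), (0, "a"), (3, "d"), (1, "b")]:  # reordered, one duplicate, 2 absent
    r.on_message(seq, msg)
print(r.delivered)  # ['a', 'b']
print(r.missing())  # [2]
```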
What are partial failures in distributed systems, and how do they add complexity?
Partial failures occur when only some components of a distributed system fail. This contrasts with single-server applications, which can assume the system is either fully working or has failed entirely. As a result, atomicity, meaning that an operation is applied to either all of the nodes or none of them, is hard to guarantee.
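Two-phase commit is one well-known protocol for applying an operation to all nodes or none. The sketch below is heavily simplified and entirely hypothetical: participants are in-memory objects, a `can_commit` flag stands in for a real vote, and coordinator crashes and timeouts, which are the hard part in practice, are not modeled.

```python
class Participant:
    """One node holding part of the data; votes on whether it can apply the change."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # Phase 1: a real node would durably stage the change before voting yes
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants) -> bool:
    """Apply a change on all participants or on none of them."""
    # Phase 1: collect votes; all() stops asking at the first "no"
    if all(p.prepare() for p in participants):
        for p in participants:  # Phase 2: unanimous yes, so commit everywhere
            p.commit()
        return True
    for p in participants:      # Phase 2: at least one no, so abort everywhere
        p.abort()
    return False

nodes = [Participant("n1"), Participant("n2", can_commit=False), Participant("n3")]
print(two_phase_commit(nodes))   # False
print([p.state for p in nodes])  # ['aborted', 'aborted', 'aborted']
```

Classic two-phase commit can block: if the coordinator fails after participants have prepared, they must wait for it to recover before they know the outcome.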
How does concurrency contribute to the complexity of distributed systems?
Concurrency in distributed systems involves multiple computations happening simultaneously and potentially on the same data. These interleaved computations can interfere with each other, leading to unexpected behaviors, especially compared to non-concurrent, sequential applications.
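A deterministic sketch of how interleaving causes interference: two clients each increment a shared counter with a read followed by a write. The function `interleaved_increments` is hypothetical and replays an explicit schedule instead of using real threads, so the outcome is reproducible.

```python
def interleaved_increments(store, steps):
    """Replay a schedule of read-modify-write steps by several clients.

    Each step is (client, action): 'read' copies the shared counter into the
    client's local variable, 'write' stores local + 1 back to the counter.
    """
    local = {}
    for client, action in steps:
        if action == "read":
            local[client] = store["counter"]
        elif action == "write":
            store["counter"] = local[client] + 1
    return store["counter"]

sequential = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]
interleaved = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]
print(interleaved_increments({"counter": 0}, sequential))   # 2
print(interleaved_increments({"counter": 0}, interleaved))  # 1: B overwrote A's update
```

The second schedule is a lost update; common remedies include locks or an atomic compare-and-set that rejects a write when the value has changed since it was read.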
Why is it important to consider network asynchrony, partial failures, and concurrency when designing distributed systems?
Considering these factors is vital because they are major contributors to the complexity in distributed systems. Being aware of them helps in anticipating and appropriately handling edge cases.
How is the correctness of a distributed system defined?
The correctness of a distributed system is defined in terms of the properties it must satisfy, primarily focusing on safety and liveness properties.
What are the two primary measures of correctness in distributed systems?
The two primary measures of correctness in distributed systems are safety properties, which define what must never happen, and liveness properties, which define what must eventually happen in a correct system.
What is a safety property in the context of distributed systems?
A safety property in distributed systems is a condition that defines something that must never occur in a correct system. It is about preventing incorrect states.
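To make the idea concrete, consider mutual exclusion as an example safety property: two nodes must never be in the critical section at the same time. This sketch checks a recorded execution trace; the trace format and `violates_mutual_exclusion` are hypothetical.

```python
def violates_mutual_exclusion(trace):
    """Return the index of the first state where more than one node is inside, else None.

    trace: list of sets, each holding the nodes in the critical section at that step.
    """
    for i, holders in enumerate(trace):
        if len(holders) > 1:
            return i  # one bad state is enough to violate a safety property
    return None

ok_trace = [set(), {"n1"}, set(), {"n2"}]
bad_trace = [set(), {"n1"}, {"n1", "n2"}]
print(violates_mutual_exclusion(ok_trace))   # None
print(violates_mutual_exclusion(bad_trace))  # 2
```

A safety violation always shows up at a specific point in a finite execution, which is why such checks can give a definite answer.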
What is a liveness property in distributed systems?
A liveness property in distributed systems is a condition that defines something that must eventually occur in a correct system. It ensures that the system continues to make progress.
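As an example liveness property, take "every request is eventually granted". This sketch scans a finite event log for requests that have not yet been granted; the event format and `ungranted_requests` are hypothetical.

```python
def ungranted_requests(events):
    """Return requests not (yet) granted in a finite execution.

    events: list of (kind, request_id) with kind in {"request", "grant"}.
    """
    pending = set()
    for kind, req in events:
        if kind == "request":
            pending.add(req)
        elif kind == "grant":
            pending.discard(req)
    return pending

events = [("request", 1), ("request", 2), ("grant", 1)]
print(ungranted_requests(events))  # {2}
```

Unlike the safety case, a pending request in a finite log does not prove a violation, because it could still be granted later; a liveness property is violated only when the "good thing" never happens.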