Lecture 1: Introduction Flashcards
Distributed Systems
A set of cooperating computers communicating over a network to achieve a coherent task. Examples include storage for big websites, big data computations like MapReduce, and peer-to-peer file sharing.
Decentralization
The absence of a central server in a system, where each node acts as both a client and a server. Because no single machine is essential to the whole system, this can increase robustness and fault tolerance.
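The "both client and server" idea can be illustrated with a minimal in-process sketch. The `Peer` class, its `serve`/`fetch` methods, and the file names are hypothetical, and there is no real networking. Lookups only ask direct neighbors; real peer-to-peer systems use flooding or distributed hash tables instead.

```python
class Peer:
    """Hypothetical node that is both a server (answers lookups) and a client (issues them)."""

    def __init__(self, name, files=None):
        self.name = name
        self.files = dict(files or {})  # locally stored file name -> contents
        self.neighbors = []             # other peers this node knows about

    def connect(self, other):
        # Links are symmetric: each peer can query the other.
        self.neighbors.append(other)
        other.neighbors.append(self)

    def serve(self, filename):
        # Server role: answer another peer's request from local storage.
        return self.files.get(filename)

    def fetch(self, filename):
        # Client role: check locally first, then ask direct neighbors.
        if filename in self.files:
            return self.files[filename]
        for peer in self.neighbors:
            data = peer.serve(filename)
            if data is not None:
                self.files[filename] = data  # keep a copy, so this peer can now serve it too
                return data
        return None


a = Peer("a", {"song.mp3": b"hello"})
b = Peer("b")
c = Peer("c")
a.connect(b)
b.connect(c)

print(b.fetch("song.mp3"))  # b obtains the file from a
print(c.fetch("song.mp3"))  # c obtains it from b's copy; no central server involved
```

Because `b` keeps a copy after fetching, `c` can still get the file even if `a` later disappears, which is the robustness benefit the definition describes.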
Fault Tolerance
The ability of a system to continue operating properly in the event of failure of one or more components.
Registers
The fastest form of storage, built directly into the CPU. Registers hold the small amounts of data, addresses, and instructions the CPU is currently processing, and can be accessed far faster than cache or main memory.
Nonvolatile Memory
Storage that retains data even when power is turned off, such as the NAND flash memory used in SSDs and USB drives. Flash-based nonvolatile memory offers faster reads and writes than traditional magnetic and optical storage.
Symmetric Multiprocessing (SMP)
Multiprocessor system architecture where all processors are treated equally and share access to all system resources, including memory and I/O devices.
Asymmetric Multiprocessing (AMP)
Multiprocessor system architecture where one processor is designated as the master and controls the system, while the other processors act as subordinates that perform specialized tasks.
Observer Pattern
Design pattern where an object, known as the subject, maintains a list of its dependents, called observers, and notifies them of any state changes, enabling them to react accordingly.
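A minimal sketch of the pattern follows. The class names and the `update` method name are illustrative conventions chosen here, not a fixed API: the subject keeps a list of observers and calls each one's `update` whenever its state changes.

```python
class Subject:
    def __init__(self):
        self._observers = []  # dependents to notify on every state change
        self._state = None

    def attach(self, observer):
        self._observers.append(observer)

    def detach(self, observer):
        self._observers.remove(observer)

    def set_state(self, state):
        self._state = state
        self._notify()

    def _notify(self):
        # Iterate over a copy so observers may detach themselves during notification.
        for observer in list(self._observers):
            observer.update(self._state)


class LoggingObserver:
    def __init__(self, name):
        self.name = name
        self.seen = []  # every state this observer has been told about

    def update(self, state):
        self.seen.append(state)
        print(f"{self.name} saw state {state!r}")


subject = Subject()
x = LoggingObserver("x")
y = LoggingObserver("y")
subject.attach(x)
subject.attach(y)
subject.set_state(1)  # both x and y are notified
subject.detach(y)
subject.set_state(2)  # only x is notified
```

The subject never needs to know what its observers do with the update, which keeps the two sides loosely coupled.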
Name two common fault-tolerance goals in distributed systems.
Two common goals are availability, where the system keeps operating despite certain failures, and recoverability, where the system may stop during a failure but can resume correct operation once the failed components are repaired.
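Availability can be sketched as a client that falls back to another replica when one fails. `Replica`, `available_read`, and the simulated `up` flag are hypothetical, and the sketch assumes all replicas already hold the same value; keeping replicas consistent is a separate, harder problem.

```python
class Replica:
    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.up = True  # simulated health; a real client would see timeouts or connection errors

    def read(self):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.value


def available_read(replicas):
    """Return the first successful read; fail only if every replica is down."""
    errors = []
    for replica in replicas:
        try:
            return replica.read()
        except ConnectionError as exc:
            errors.append(str(exc))  # remember why this replica failed, then try the next
    raise RuntimeError("all replicas failed: " + "; ".join(errors))


replicas = [Replica("r1", "v"), Replica("r2", "v"), Replica("r3", "v")]
replicas[0].up = False
print(available_read(replicas))  # r1 is down, so r2 answers
```

The read only fails once every replica is unavailable, so the service tolerates the loss of some, but not all, of its machines.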
What are some tools used for achieving fault tolerance in distributed systems?
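Non-volatile storage, such as hard drives or flash drives, holds checkpoints or logs of system state so the system can recover after a crash. Replication keeps copies of data or services on multiple machines so that a surviving replica can take over when one fails.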
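Logging to non-volatile storage for recovery can be sketched as a small key-value store. `LoggedKV` and its JSON-lines log format are hypothetical choices made for this example. Each write is appended to a file and forced to disk with `os.fsync` before memory is updated; on restart, the log is replayed to rebuild state. Real systems also take periodic checkpoints so they do not replay the whole log, which this sketch omits.

```python
import json
import os
import tempfile


class LoggedKV:
    def __init__(self, path):
        self.path = path
        self.data = {}
        self._recover()

    def _recover(self):
        # Replay every logged write, in order, to rebuild the in-memory state.
        if not os.path.exists(self.path):
            return
        with open(self.path, "r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    break  # a torn final record from a crash mid-write is ignored
                self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        # Append to the log and force it to disk *before* updating memory,
        # so an acknowledged write survives a crash.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


log_path = os.path.join(tempfile.mkdtemp(), "kv.log")
kv = LoggedKV(log_path)
kv.put("x", 1)
kv.put("x", 2)
kv.put("y", 3)
del kv  # simulate a crash: the in-memory dictionary is lost

restarted = LoggedKV(log_path)
print(restarted.get("x"), restarted.get("y"))
```

After the simulated crash, the restarted store recovers `x = 2` and `y = 3` purely from the on-disk log.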
How does scalability relate to fault tolerance in distributed systems?
Scalability means that adding machines increases the system's throughput. But the more machines a system has, the more likely it is that at least one of them is failing at any given moment, so scaling up makes fault tolerance essential rather than optional.
What is fault tolerance, and why is it important in distributed systems?
Fault tolerance is the ability of a system to continue operating properly in the event of failure of one or more components. It is crucial in distributed systems because they are built from many machines, disks, and network links, and in large-scale deployments some component is likely to be failing at any given time.
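Why failures become routine at scale can be shown with a short calculation. Assuming, for illustration only, that each machine is down independently with probability p, the chance that at least one of n machines is down is 1 - (1 - p)^n. The value p = 0.001 below is a hypothetical figure, not a measured one.

```python
def prob_any_failed(p, n):
    """Chance that at least one of n machines is down, if each is down
    independently with probability p."""
    return 1 - (1 - p) ** n


# Hypothetical per-machine failure probability.
p = 0.001
for n in (1, 10, 100, 1000):
    print(n, round(prob_any_failed(p, n), 3))
```

With these assumptions, a single machine is down 0.1% of the time, but a 1,000-machine deployment has some machine down roughly 63% of the time. Real failures are often correlated (shared power, switches, or software bugs), so the independence assumption is a simplification.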
What are some implementation topics commonly encountered in distributed systems?
Common implementation topics include remote procedure call (RPC), threads, and concurrency control mechanisms like locks.
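The three topics can be seen together in a minimal sketch that uses Python's standard-library `xmlrpc` modules as a stand-in RPC system (an assumption; the lecture does not prescribe one). A multithreaded server exposes a hypothetical `increment` function whose shared counter is guarded by a `threading.Lock`, and several client threads call it concurrently.

```python
import socketserver
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class ThreadedRPCServer(socketserver.ThreadingMixIn, SimpleXMLRPCServer):
    # Handle each incoming RPC in its own thread, so handlers can run concurrently.
    daemon_threads = True


counter = 0
counter_lock = threading.Lock()


def increment():
    # Without the lock, concurrent handlers could interleave the read and the
    # write of `counter` and lose updates.
    global counter
    with counter_lock:
        counter += 1
        return counter


server = ThreadedRPCServer(("127.0.0.1", 0), logRequests=False)  # port 0: any free port
server.register_function(increment, "increment")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()


def client(n_calls):
    proxy = ServerProxy(f"http://127.0.0.1:{port}")  # one proxy per client thread
    for _ in range(n_calls):
        proxy.increment()  # blocks until the server has run increment()


workers = [threading.Thread(target=client, args=(25,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(counter)  # 4 clients x 25 calls each
server.shutdown()
server.server_close()
```

Each RPC looks like a local function call to the client, while the server's threads make the lock necessary: removing `counter_lock` would make lost updates possible under concurrent requests.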
What are the main infrastructure components in distributed systems?
The main infrastructure components are storage, communication, and computation.
What are some examples of distributed systems applications?
Examples include storage for big websites, big data computations such as MapReduce, and peer-to-peer file sharing.
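The MapReduce programming model can be sketched with the classic word-count example. This runs sequentially in one process, and the function names `map_fn`, `reduce_fn`, and `map_reduce` are hypothetical. In a real MapReduce deployment, map and reduce tasks run on many machines and the framework performs the grouping ("shuffle") over the network.

```python
from collections import defaultdict


def map_fn(document):
    # Map: emit a (word, 1) pair for every word in one input split.
    return [(word, 1) for word in document.lower().split()]


def reduce_fn(word, counts):
    # Reduce: combine all values emitted for one key.
    return sum(counts)


def map_reduce(documents):
    # Shuffle: group intermediate values by key before reducing.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


docs = ["the quick brown fox", "the lazy dog", "The fox"]
print(map_reduce(docs))
```

The programmer writes only the map and reduce functions; distributing the work and handling machine failures is left to the framework.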