Lecture 1: Introduction Flashcards by Thomas Reddy

Distributed Systems

A set of cooperating computers communicating over a network to achieve a coherent task. Examples include storage for big websites, big data computations like MapReduce, and peer-to-peer file sharing.

Decentralization

The absence of a central server in a system: each node acts as both a client and a server. With no single point of failure, this can increase robustness and fault tolerance.

Fault Tolerance

The ability of a system to continue operating properly in the event of failure of one or more components.

Registers

The fastest form of storage, built directly into the CPU. Registers hold the small amounts of data and instructions the CPU is currently processing, so access is nearly instantaneous.

Nonvolatile Memory

A storage medium that retains data even when power is turned off, such as the NAND flash memory used in SSDs and USB drives. Flash generally offers faster read and write speeds than traditional magnetic and optical storage, which are also nonvolatile.

Symmetric Multiprocessing (SMP)

Multiprocessor system architecture where all processors are treated equally and share access to all system resources, including memory and I/O devices.

Asymmetric Multiprocessing (AMP)

Multiprocessor system architecture where one processor is designated as the master and controls the system, while the other processors act as subordinate processors and perform specialized tasks.

Observer Pattern

Design pattern where an object, known as the subject, maintains a list of its dependents, called observers, and notifies them of any state changes, enabling them to react accordingly.
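
As a minimal sketch of the pattern, the subject below keeps a list of observer callbacks and calls each one when its state changes. The class name `Subject`, its methods, and the `record` callback are illustrative assumptions, not part of the flashcard.

```python
from typing import Callable, List

Observer = Callable[[str], None]


class Subject:
    """Holds a piece of state and notifies observers when it changes."""

    def __init__(self) -> None:
        self._observers: List[Observer] = []
        self._state = ""

    def attach(self, observer: Observer) -> None:
        # Register a dependent that wants to hear about state changes.
        self._observers.append(observer)

    def detach(self, observer: Observer) -> None:
        # Stop notifying this dependent.
        self._observers.remove(observer)

    def set_state(self, new_state: str) -> None:
        self._state = new_state
        # Iterate over a copy so observers may detach themselves safely.
        for observer in list(self._observers):
            observer(new_state)


# Usage: one observer records every state change it is told about.
seen: List[str] = []
record: Observer = seen.append

subject = Subject()
subject.attach(record)
subject.set_state("ready")    # record is notified
subject.detach(record)
subject.set_state("stopped")  # nobody is notified
```

After the run, `seen` holds only the state set while the observer was attached.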

Name two common approaches to achieving fault tolerance in distributed systems.

Two common approaches are availability, where the system continues to operate despite certain failures, and recoverability, where the system can recover from failures and resume normal operation after repairs.

What are some tools used for achieving fault tolerance in distributed systems?

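Non-volatile storage, such as hard drives or flash drives, is commonly used to store checkpoints or logs of system state so the system can recover after a failure. Replication, which keeps copies of data or state on multiple machines, is another common tool.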

How does scalability relate to fault tolerance in distributed systems?

A system that scales by adding resources or nodes can use the same mechanism to absorb failures: spare or newly added nodes take over the work of failed ones, so performance holds up despite disruptions.

What is fault tolerance, and why is it important in distributed systems?

Fault tolerance is the ability of a system to continue operating properly in the event of failure of one or more components. It’s crucial in distributed systems because large-scale deployments contain so many machines and network links that, at any given time, some component is likely to have failed.

What are some implementation topics commonly encountered in distributed systems?

Common implementation topics include remote procedure call (RPC), threads, and concurrency control mechanisms like locks.

What are the main infrastructure components in distributed systems?

The main infrastructure components are storage, communication, and computation.

What are some examples of distributed systems applications?

Examples include storage for big websites, big data computations such as MapReduce, and peer-to-peer file sharing.

What is the core concept of distributed systems?

The core concept of distributed systems is a set of cooperating computers communicating over a network to achieve a coherent task.

What are some reasons for building distributed systems?

Reasons include achieving high performance through parallelism, fault tolerance, handling naturally distributed problems, and achieving security through isolation.

Describe the scalability challenge in distributed systems.

Scalability refers to the ability to handle increasing workload or data by adding resources. The challenge lies in ensuring that adding resources results in proportional performance improvements.

How do partial failures differ from complete failures in distributed systems?

Partial failures occur when some components of the system stop working while others continue, whereas complete failures involve the entire system becoming inoperative.

What is the role of threads in distributed systems?

Threads support concurrent programming in distributed systems: they let a program harness multi-core CPUs and structure concurrent operations in a way that simplifies programming.

Explain the concept of recoverability in fault tolerance.

Recoverability refers to the system’s ability to resume normal operation after a failure, typically by saving state information and restoring it when the failure is resolved.
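
A minimal sketch of saving and restoring state, assuming the state fits in a small JSON file. The helper names `save_checkpoint` and `load_checkpoint`, the file name, and the `processed` counter are hypothetical; real systems often use logs or more elaborate checkpoint formats.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temp file, then atomically rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to non-volatile storage
    # os.replace is atomic, so a crash leaves the old or new file, never half.
    os.replace(tmp_path, path)


def load_checkpoint(path: str, default: dict) -> dict:
    """Return the last saved state, or a copy of default if none exists."""
    if not os.path.exists(path):
        return dict(default)
    with open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as workdir:
    checkpoint = os.path.join(workdir, "counter.json")

    # Normal operation: do some work, then checkpoint it.
    state = load_checkpoint(checkpoint, {"processed": 0})
    state["processed"] += 3
    save_checkpoint(checkpoint, state)

    # Simulated crash: the in-memory state is lost.
    del state

    # Recovery: resume from the last checkpoint instead of from zero.
    recovered = load_checkpoint(checkpoint, {"processed": 0})
```

Writing to a temporary file first is one design choice among several; it avoids leaving a truncated checkpoint if the process dies mid-write.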

What is the significance of non-volatile storage in fault tolerance?

Non-volatile storage allows systems to store critical state information persistently, enabling them to recover from failures by restoring the system’s previous state.

How do distributed systems address the challenge of network failures?

Distributed systems typically use redundancy and error handling to limit the impact of network failures, for example by replicating data, retrying requests after a timeout, or routing traffic around failed links.
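
One error-handling mechanism of this kind is retrying a failed request with a growing delay. The sketch below is an illustration under assumptions: `call_with_retries` and `FlakyService` are hypothetical names, and the "network" is simulated by raising `ConnectionError`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(operation: Callable[[], T],
                      max_attempts: int = 3,
                      base_delay: float = 0.01) -> T:
    """Call operation, retrying on ConnectionError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and surface the failure to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("max_attempts must be at least 1")


class FlakyService:
    """Simulated remote service that fails a fixed number of times."""

    def __init__(self, failures_before_success: int) -> None:
        self.remaining_failures = failures_before_success
        self.calls = 0

    def fetch(self) -> str:
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("simulated network failure")
        return "payload"


service = FlakyService(failures_before_success=2)
result = call_with_retries(service.fetch)
```

Retries are only safe when the operation can be repeated without harm; that caveat is outside what this sketch models.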

What are the primary goals of building abstractions in distributed systems?

The goals are to simplify the interface for applications, hide the distributed nature of the underlying infrastructure, and enable easier development of distributed applications.

How do distributed systems handle the challenge of concurrency?

Concurrency in distributed systems is managed using techniques like thread synchronization, locks, and distributed algorithms to ensure consistency and prevent race conditions.
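
As a small single-machine illustration of the lock technique, the sketch below has several threads increment a shared counter. The `Counter` class and `run_workers` helper are illustrative names; the lock makes each read-modify-write step atomic so no update is lost.

```python
import threading


class Counter:
    """Shared counter whose increments are protected by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Only one thread at a time may run the read-modify-write below.
        with self._lock:
            self.value += 1


def run_workers(counter: Counter, num_threads: int = 4,
                increments: int = 10_000) -> int:
    """Start num_threads threads that each increment the counter."""

    def work() -> None:
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading the result
    return counter.value


total = run_workers(Counter())
```

With the lock in place, `total` equals the number of threads times the number of increments per thread.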

What is a common scalability issue when dealing with web servers and databases?

The bottleneck often shifts from web servers to databases as the number of web servers increases, requiring careful design to maintain performance.
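
The shift can be illustrated with a toy model in which total throughput is capped by whichever tier saturates first. The function name and all capacity numbers below are made-up assumptions for illustration, not measurements.

```python
def system_throughput(num_web_servers: int,
                      per_server_rps: int,
                      database_rps: int) -> int:
    """Requests per second the whole system can sustain in this toy model."""
    web_tier_capacity = num_web_servers * per_server_rps
    # Every request also hits the shared database, so it caps the total.
    return min(web_tier_capacity, database_rps)


# Hypothetical capacities: 100 req/s per web server, 500 req/s for the database.
throughputs = {
    n: system_throughput(n, per_server_rps=100, database_rps=500)
    for n in (1, 2, 4, 8, 16)
}
```

In this model throughput grows with the number of web servers only until the web tier's capacity exceeds the database's; beyond that point, adding web servers changes nothing.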

What is the primary challenge in managing replicated copies for fault tolerance?

The challenge is ensuring that replicas stay in sync and do not drift apart.

What is consistency in the context of distributed systems?

Consistency refers to ensuring that all nodes in a distributed system have the same view of the data at any given time.

Why is consistency challenging in distributed systems with replication?

Replication introduces the risk of inconsistency due to delays in updating replicas and the possibility of concurrent updates.
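
A small sketch of how concurrent updates can make replicas diverge: two replicas receive the same two updates but apply them in different orders. The operations and values are hypothetical, chosen because addition and multiplication do not commute with each other.

```python
from typing import List, Tuple

Update = Tuple[str, int]


def apply_updates(initial: int, updates: List[Update]) -> int:
    """Apply a sequence of updates to a replica's value, in order."""
    value = initial
    for op, arg in updates:
        if op == "add":
            value += arg
        elif op == "mul":
            value *= arg
        else:
            raise ValueError(f"unknown operation: {op}")
    return value


update_a: Update = ("add", 10)
update_b: Update = ("mul", 2)

# Both replicas start at 5 and see both updates, in different orders.
replica_1 = apply_updates(5, [update_a, update_b])  # (5 + 10) * 2
replica_2 = apply_updates(5, [update_b, update_a])  # (5 * 2) + 10

replicas_agree = replica_1 == replica_2
```

Here the replicas end with different values even though no update was lost, which is why replicated systems need some way to agree on the order of updates.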