Part 3: Process Flashcards
Process
A process is often defined as a program in execution, meaning a program that is currently being executed on one of the operating system’s virtual processors.
The operating system ensures that independent processes cannot maliciously or inadvertently affect the correctness of each other’s behavior. This is achieved through concurrency transparency.
Each time a process is created, the operating system must create a completely independent address space. This allocation can involve initializing memory segments. [Pages: 118,119]
Threads in Distributed Systems
Parallelism and Performance:
Distributed systems often consist of multiple machines or nodes, each with its own set of processors or cores. Threads enable these systems to achieve true parallelism by allowing multiple tasks to run concurrently across different nodes, maximizing the utilization of available resources and enhancing system performance.
Responsiveness:
In distributed applications, responsiveness is crucial. For instance, in a distributed database system, while one thread might be handling a query on one node, another thread on a different node can handle updates or other queries. This ensures that the system remains responsive to multiple clients or requests simultaneously.
Resource Utilization:
Threads allow distributed systems to make optimal use of available resources. For example, while one thread is waiting for data from a remote node (which might involve network latency), another thread can perform local computations or handle other tasks, ensuring that the CPU and other resources are not idle.
Fault Tolerance and Recovery:
In distributed systems, failures are a common occurrence. Threads can be used to implement redundancy. For instance, multiple threads can be spawned to perform the same task on different nodes. If one thread (or its corresponding node) fails, others can take over, ensuring system reliability.
Load Balancing:
Threads are instrumental in implementing load balancing in distributed systems. Workloads can be distributed among multiple threads running on different nodes, ensuring that no single node is overwhelmed with requests.
Simplifying Complex Operations:
Distributed systems often involve complex operations that can be broken down into smaller tasks. Threads allow these tasks to be executed concurrently, simplifying the implementation and execution of complex operations.
Asynchronous Operations:
Threads enable asynchronous operations in distributed systems. For instance, a thread can send a request to a remote node and continue with other tasks without waiting for the response; another thread can handle the response when it arrives (see the sketch after this list).
Scalability:
As distributed systems grow and incorporate more nodes, threads ensure that the system can scale effectively. New tasks or requests can be handled by spawning additional threads, ensuring that the system can handle increased loads without significant modifications to the underlying architecture.
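A minimal sketch of the asynchronous pattern mentioned above, using Python's standard threading module. The node name and the fetch function are hypothetical placeholders for a real remote call:

```python
import threading
import time

def fetch_from_remote_node(node):
    # Hypothetical stand-in for a remote call; in a real system this
    # would be an RPC or HTTP request subject to network latency.
    time.sleep(1.0)                      # simulate the network delay
    return f"data from {node}"

results = {}

def request_async(node):
    # Runs in its own thread, so the caller never blocks on the network.
    results[node] = fetch_from_remote_node(node)

# Fire off the remote request in a background thread ...
t = threading.Thread(target=request_async, args=("node-7",))
t.start()

# ... and keep doing useful local work while the request is in flight.
local_result = sum(range(1_000_000))

t.join()                                 # collect the response on arrival
print(local_result, results["node-7"])
```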
Multithreaded Server
Benefits and Usage:
Multithreading in distributed systems is primarily found on the server side. Multithreading not only simplifies server code but also makes it easier to develop servers that exploit parallelism for high performance. This is true even for uniprocessor systems. However, with modern multicore processors, multithreading for parallelism becomes an obvious choice. [Pages: 127,128]
Dispatcher/Worker Model:
A common organization for a multithreaded server is the dispatcher/worker model. In this setup, one thread, known as the dispatcher, reads incoming requests for operations. After examining the request, the server selects an idle (i.e., blocked) worker thread and hands it the request. The worker then performs a blocking read on the local file system, which may suspend the thread until the data is fetched from disk. While the worker is suspended, another thread can be scheduled: the dispatcher, to acquire more work, or another worker that is now ready to run. [Pages: 128]
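The model maps naturally onto a thread pool fed by a queue. The sketch below is a simplified rendering in Python; the queue, the worker count, and the simulated disk read are illustrative assumptions, not taken from the source:

```python
import queue
import threading
import time

requests = queue.Queue()                 # dispatcher -> worker hand-off

def worker(worker_id):
    while True:
        req = requests.get()             # idle (blocked) until dispatched
        time.sleep(0.1)                  # stands in for a blocking disk read
        print(f"worker {worker_id} served {req!r}")
        requests.task_done()

# A small pool of workers, all initially blocked waiting for work.
for i in range(4):
    threading.Thread(target=worker, args=(i,), daemon=True).start()

def dispatcher(incoming):
    # The dispatcher only examines requests and hands each one to an
    # idle worker; it never blocks on disk I/O itself.
    for req in incoming:
        requests.put(req)

dispatcher([f"read file-{n}" for n in range(10)])
requests.join()                          # wait until all requests are served
```

While one worker is suspended on its "disk read", the dispatcher and the other workers keep running, which is exactly the benefit the model aims for.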
Virtualization
Virtualization deals with extending or replacing an existing interface to mimic the behavior of another system. Virtualization plays a significant role in distributed systems, especially in cloud computing. Cloud providers offer services like Infrastructure-as-a-Service (IaaS), where virtualization is crucial. Instead of renting out a physical machine, a cloud provider rents out a virtual machine that may share a physical machine with other customers. This approach ensures almost complete isolation between customers, giving the illusion of a dedicated physical machine.
Networked User Interfaces
A major task of client machines in distributed systems is to provide the means for users to interact with remote servers. There are two primary ways to support this interaction:
For each remote service, the client machine has a separate counterpart that contacts the service over the network. An example is a calendar running on a user’s smartphone that needs to synchronize with a remote, possibly shared calendar. An application-level protocol handles the synchronization in such cases.
A second approach is to provide direct access to remote services by offering only a user interface. In this case, the client machine acts as a terminal with no need for local storage. In the context of networked user interfaces, everything is processed and stored at the server side. This approach, known as the thin-client approach, has gained attention with the rise of Internet connectivity and the use of mobile devices. Thin-client solutions simplify system management. [Pages: 137,138,139]
Iterative vs. Concurrent Servers
✦ Iterative Servers:
Definition: An iterative server handles one client request at a time. After serving one client, it then moves on to the next client.
Advantages:
Simplicity: Iterative servers are generally easier to implement and understand.
Predictability: Since only one request is handled at a time, the behavior of the server is more predictable.
Disadvantages:
Scalability: Iterative servers might not scale well under heavy loads since they can only handle one request at a time.
Responsiveness: If a request takes a long time to process, subsequent clients have to wait, leading to potential delays.
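A minimal iterative echo server, sketched in Python; the port number is an arbitrary assumption. Note how the accept/serve loop forces each client to wait for its predecessor:

```python
import socket

# One-request-at-a-time echo server on an assumed port.
with socket.create_server(("", 9000)) as srv:
    while True:
        conn, addr = srv.accept()        # the next client waits here
        with conn:
            data = conn.recv(1024)
            conn.sendall(data)           # finish this client, then loop
```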
✦ Concurrent Servers:
Definition: A concurrent server can handle multiple client requests simultaneously. This is often achieved using multithreading or multiprocessing.
Advantages:
Scalability: Concurrent servers can serve multiple clients at once, making them more scalable under heavy loads.
Responsiveness: Since multiple requests can be processed in parallel, clients might experience shorter wait times.
Disadvantages:
Complexity: Implementing a concurrent server can be more complex due to the need to manage multiple threads or processes and handle potential synchronization issues.
Resource Consumption: Handling multiple requests simultaneously can lead to higher resource consumption, especially if not managed efficiently.
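The concurrent counterpart of the echo server sketched above, again with an assumed port: each accepted connection is handed to its own thread, so a slow client no longer blocks the others.

```python
import socket
import threading

def handle(conn):
    # Each client is served in its own thread.
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)

with socket.create_server(("", 9000)) as srv:
    while True:
        conn, _ = srv.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()
```

The extra complexity appears as soon as the handlers share mutable state, at which point locks or other synchronization become necessary.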
Stateless vs. Stateful Servers
Stateless Design:
Definition:
A stateless server does not keep information on the state of its clients. Once a request has been processed, the server forgets the client completely. For example, a Web server is typically stateless; it responds to incoming HTTP requests and then forgets the client. [Pages: 145]
Soft State:
A particular form of stateless design is one in which the server maintains what is known as soft state. In this case, the server promises to maintain state on behalf of the client, but only for a limited time. After that time has expired, the server falls back to default behavior (see the sketch at the end of this block). [Pages: 145]
Advantages:
Stateless designs can be simpler and more scalable. There’s no need to manage client state, which can simplify server design and improve performance. [Pages: 146]
Drawbacks:
Performance might be suboptimal in some cases if the server has to re-fetch or re-compute information for every request. [Pages: 145]
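Soft state, as described above, can be sketched as a table of entries that expire automatically; the 30-second TTL and the helper names below are assumptions for illustration, not from the source:

```python
import time

SOFT_STATE_TTL = 30.0                    # seconds; assumed, not from the text
client_state = {}                        # client id -> (state, expiry time)

def remember(client, state):
    # Promise to keep state for the client, but only for a limited time.
    client_state[client] = (state, time.monotonic() + SOFT_STATE_TTL)

def lookup(client):
    entry = client_state.get(client)
    if entry is None:
        return None                      # no state kept: default behavior
    state, expires = entry
    if time.monotonic() > expires:
        del client_state[client]         # expired: fall back to the default
        return None
    return state
```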
Stateful Design:
Definition:
A stateful server maintains persistent information about its clients. This information needs to be explicitly deleted by the server. For example, a file server that allows a client to keep a local copy of a file would maintain a table containing (client, file) entries to track which client has the most recent version of a file. [Pages: 145, 146]
Advantages:
Stateful designs can improve the performance of read and write operations as perceived by the client. [Pages: 146]
Drawbacks:
If the server crashes, it has to recover its entire state as it was just before the crash. This can introduce considerable complexity. [Pages: 146]
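The (client, file) table from the stateful definition above might look like the following sketch; the helper names are hypothetical:

```python
# (client, file) entries: which clients hold the most recent copy of a file.
open_copies = set()

def register_copy(client, filename):
    open_copies.add((client, filename))

def has_latest_copy(client, filename):
    return (client, filename) in open_copies

def release(client, filename):
    # Unlike soft state, these entries must be deleted explicitly;
    # after a crash, the whole table has to be recovered.
    open_copies.discard((client, filename))
```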
General organization of a server cluster
Definition and Context:
A server cluster is essentially a collection of machines connected through a network, where each machine runs one or more servers. The server clusters considered in the document are those in which the machines are connected through a local-area network, often offering high bandwidth and low latency. [Pages: 155]
Three-Tiered Organization:
Many server clusters are logically organized into three tiers:
First Tier: Consists of a (logical) switch through which client requests are routed. Such a switch can vary in its implementation but serves as the entry point for the server cluster, offering a single network address. [Pages: 155,156,157]
Second Tier: Not explicitly described in the excerpts, but typically this tier consists of application or processing servers that handle the business logic and processing tasks.
Third Tier: Comprises data-processing servers, notably file and database servers. Depending on the usage of the server cluster, these servers may be running on specialized machines configured for high-speed disk access and having large server-side data caches. [Pages: 156]
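The first-tier switch can be approximated by a simple round-robin forwarder. The sketch below is a conceptual illustration only: the server addresses are hypothetical, and a real switch would operate at the transport or application layer rather than as a Python function.

```python
import itertools

# Assumed addresses of the servers behind the switch.
servers = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
next_server = itertools.cycle(servers)

def route(request):
    # Clients see a single network address; the switch forwards each
    # request to one of the machines in the cluster.
    target = next(next_server)
    print(f"forwarding {request!r} to {target}")
    return target
```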
PlanetLab
Definition and Context:
PlanetLab is a distributed system where participating organizations donate one or more nodes (computers) that are subsequently shared among all PlanetLab users. Each node in PlanetLab is organized with two primary components: the virtual machine monitor (VMM) and Vservers. [Pages: 163]
Virtual Machine Monitor (VMM):
The VMM is an enhanced Linux operating system. The enhancements mainly comprise adjustments for supporting Vservers. [Pages: 163]
Vservers:
A Vserver can be thought of as a separate environment in which a group of processes runs. The Linux VMM ensures that Vservers are separated: processes in different Vservers execute concurrently and independently, each using only the software packages and programs available in its own environment. The isolation between processes in different Vservers is strict, which makes it easier to support users from different organizations who want to use PlanetLab for various experiments. [Pages: 163, 164]
Slices:
To support experimentation, PlanetLab uses the concept of slices. Each slice is a set of Vservers, with each Vserver running on a different node. A slice can be thought of as a virtual server cluster, implemented by means of a collection of virtual machines. [Pages: 164]
Node Manager:
Central to managing PlanetLab resources is the node manager. Each node has such a manager, implemented by means of a separate Vserver. The node manager’s primary task is to create other Vservers on the node it manages and to control resource allocation. The node manager itself cannot be contacted directly over a network, allowing it to focus only on local resource management. [Pages: 164]
Resource Allocation:
An advantage of the container-based approach toward virtualization in PlanetLab, in comparison to running separate guest operating systems, is that resource allocation can generally be much simpler. It is possible to overbook resources by allowing for dynamic resource allocation. [Pages: 165]