Introduction to OS Flashcards

1
Q

What are the key roles of an operating system?

A
  1. Hides the complexities of the underlying hardware from both the applications and the application developers (abstraction)
  2. Manages the underlying hardware on behalf of the executing applications (arbitration)
  3. Provides isolation and protection to multiple applications
2
Q

Which of the following are components of the OS?
- file system vs file editor
- device driver or cache memory
- scheduler or web browser

A

Components of OS: file system, device driver, scheduler
Not components: file editor, cache memory, web browser

3
Q

What’s the distinction between OS abstractions, mechanisms, and policies?

A

Abstractions are entities that represent other entities, often in a simplified manner, e.g., process, thread, file, socket, memory page.

Policies are the rules for how a system is maintained and resources are utilized, e.g., least-recently used (LRU), earliest deadline first (EDF).

Mechanisms are the tools by which policies are implemented, e.g., create, schedule, open, write, allocate.

4
Q

What is the principle of separation of mechanism and policy?

A

The idea behind separation of mechanism and policy is that how you enforce a policy shouldn’t be coupled to the policy itself. Since a given policy is only valid in some contexts or during some states of execution, having a mechanism that only suits that policy is very brittle. Instead, we should try to make sure that our mechanism is able to support a variety of policies, as any one of these policies may be in effect at a given point in time.

5
Q

What is the principle of optimize for the common case?

A

We ensure that the most frequent path of execution operates as performantly as possible. This is valuable because:

  • It’s simpler than trying to optimize across all possible cases
  • It leads to the largest performance gains as you are optimizing the flow that by definition is executed the most often

We need to ask the following questions to understand the common case:
- Where will the OS be used?
- What will the user want to execute on the machine?
- What are the workload requirements?

6
Q

What happens during a user-kernel mode crossing?

A

The CPU switches from user mode (where the application runs with limited access to system resources) to kernel mode (where the OS has full access to hardware and memory), allowing the application to execute privileged operations, e.g., accessing hardware, managing memory, or performing I/O, through a system call.

7
Q

What are some of the reasons why user-kernel mode crossing happens?

A

It occurs anytime an application needs access to underlying hardware, be it physical memory, storage, or I/O devices, for example:
- read from a file
- write to a file
- listen on a socket
- allocate memory
- free memory

8
Q

What is a kernel trap and why does it happen? What are the steps that take place during a kernel trap?

A

A kernel trap is a signal that the hardware sends to the operating system when it detects that something has occurred.

A kernel trap will occur if the hardware detects an attempt to perform a privileged operation while the privilege bit is not set on the CPU. The hardware may also issue a trap if it detects an attempt to access memory that requires special privileges.

During a kernel trap, the hardware sends a signal to the operating system, which takes control and invokes its trap handler. The operating system can then examine the process that caused the trap as well as the reason for the trap, and decide whether to allow the attempted operation or potentially kill the process.

9
Q

What is a system call and how does it happen? What are the steps that take place during a system call?

A

A system call is an operation that the operating system exposes that an application can directly invoke when it needs the OS to perform a specific service or privileged access on its behalf.

Examples of system calls include open, send, mmap.

When a user process makes a system call, the operating system must context switch from that process into the kernel, making sure it holds onto the arguments passed to that system call. Before the kernel can actually carry out the system call, the privilege (mode) bit on the CPU must be adjusted so the CPU knows execution is in kernel mode. Once this occurs, the kernel jumps to the system call handler and executes it. After the execution, the kernel resets the privilege bit, context switches back to the user process, and passes back the results.
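As a concrete (hedged) sketch, assuming a POSIX system and a readable file at the hypothetical path /tmp/example.txt, each of the calls below crosses from user mode into kernel mode via a system call:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        char buf[64];

        /* open(2): user-to-kernel crossing; the kernel checks permissions
           and returns a file descriptor. */
        int fd = open("/tmp/example.txt", O_RDONLY);  /* hypothetical path */
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* read(2): another crossing; the kernel performs the privileged
           file/device access and copies the data into buf. */
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n >= 0)
            printf("read %zd bytes\n", n);

        /* close(2): releases the kernel-side file object. */
        close(fd);
        return 0;
    }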

10
Q

On a 64-bit Linux-based operating system, which system call is used to:
- Send a signal to a process?
- Set the group identity of a process?
- Mount a file system?
- Read/write system parameters?

A
  • Send a signal to a process: kill
  • Set the group identity of a process: setgid
  • Mount a file system: mount
  • Read/write system parameters: sysctl
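As a hedged illustration (assuming a Linux/POSIX environment), the first two of these can be invoked directly from C; mount(2) and sysctl normally require elevated privileges and are omitted here:

    #include <signal.h>
    #include <unistd.h>

    int main(void) {
        /* setgid(2): set the group identity; 1000 is an illustrative GID,
           and the call typically fails with EPERM for an unprivileged process. */
        if (setgid(1000) < 0) {
            /* not privileged enough to change group identity */
        }

        /* kill(2): signal number 0 performs only the existence/permission
           check, so nothing is actually delivered. */
        if (kill(getpid(), 0) == 0) {
            /* the target process exists and may be signaled */
        }
        return 0;
    }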
11
Q

What is a monolithic OS? What are the design decisions and performance tradeoffs?

A

Monolithic OS - every service that any application or any type of hardware may require is already part of the OS.

  • Pros: Everything is included; inlining and compile-time optimizations are possible
  • Cons: No customization; not very portable or manageable; large memory footprint (which can impact performance)
12
Q

What is a modular OS? What are the design decisions and performance tradeoffs?

A

Modular OS - has a number of basic services and an API built in, but everything else can be added as a module. The OS is easily customized, e.g., in which particular file system or scheduler it uses. The OS specifies certain interfaces that any module must implement in order to be part of the OS. Depending on the workload, a module that implements the required interface can be installed.

  • Pros: Maintainability; smaller footprint; fewer resource needs
  • Cons: All the modularity/indirection can reduce some opportunities for optimization; maintenance can still be an issue, as modules from different codebases can be slung together at runtime
13
Q

What is a microkernel OS? What are the design decisions and performance tradeoffs?

A

Microkernel OS - includes only the most basic primitives and supports only basic services. Everything else, including applications and software that we typically think of as OS components, like the file system and device drivers, runs outside the OS kernel at user level (unprivileged). This type of OS requires a lot of inter-process interaction, and therefore supports IPC as one of its core abstractions and mechanisms, along with address spaces and threads.

  • Pros: Size; verifiability (great for embedded devices)
  • Cons: Poor portability (often customized to the underlying hardware); harder to find common OS components due to the specialized use case; high cost of frequent user/kernel crossings
14
Q

What is a process?

A

A process is the state of a program (an application residing on disk, flash memory, or in the cloud; a static entity) once it has been loaded into memory and is executing, at which point it becomes an active entity.

It represents the execution state of an active application.

15
Q

What does a process look like?

A

A process encapsulates the following elements (the process state) for running an application:
- stack: the dynamic part of the address space that grows and shrinks during execution in LIFO order
- heap: dynamically allocated during execution
- data and text: static state, established when the process first loads

Every single element has to be uniquely identified by its address.

16
Q

What is the difference between stack and heap?

A

The stack is a region of memory used for managing function calls and local variables. Memory is allocated and deallocated automatically in a Last-In, First-Out (LIFO) manner. The speed is very fast due to its well-structured allocation pattern. It’s limited in size, typically defined at the start of the program. Variables exist only as long as the function they belong to is running.

The heap is a region of memory used for dynamic memory allocation. Memory is allocated and deallocated manually, e.g., malloc() and free(). Speed is slower than the stack due to fragmentation and manual management. Size is much larger than the stack but depends on available system memory. The variables persist until explicitly deallocated, making heap memory useful for objects that need to outlive function calls.
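A small C sketch of the difference (illustrative values; make_counter is a hypothetical helper):

    #include <stdio.h>
    #include <stdlib.h>

    /* Returns a heap-allocated copy of a stack-allocated local. */
    int *make_counter(void) {
        int local = 42;                    /* stack: reclaimed automatically on return */
        int *heap = malloc(sizeof *heap);  /* heap: survives past this function */
        if (heap)
            *heap = local;
        return heap;                       /* returning &local instead would be a bug */
    }

    int main(void) {
        int *p = make_counter();
        if (p) {
            printf("%d\n", *p);
            free(p);                       /* heap memory must be freed manually */
        }
        return 0;
    }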

17
Q

What is an address space?

A

An address space is an OS abstraction used to encapsulate all of the process states defined by a range of addresses. It is an “in memory” representation of a process.

18
Q

What is the purpose of the page tables?

A

A page table is a data structure used in a virtual memory system to map virtual addresses to physical addresses (in physical memory or DRAM).

19
Q

If two processes, P1 and P2, are running at the same time, is it possible for them to have the same virtual address space range?

A

Yes. The OS underneath will map P1's virtual addresses to one set of physical locations and P2's virtual addresses to another. Decoupling the virtual addresses used by the processes from the physical addresses where the data actually lives makes it possible for different processes to have exactly the same address space range and to use exactly the same addresses.

20
Q

How does the operating system know what a process is doing?

A

At any given point in time, the CPU knows where in the binary (the application's instruction sequence) the process currently is via the program counter (PC). The PC is maintained in a CPU register while the process is executing. The process stack also defines what a process is doing; the stack pointer (SP) points to the top of the stack.

For every process, the OS maintains a process control block (PCB).

21
Q

What is a process control block (PCB)?

A

It is a data structure that the OS maintains for every process it manages, created when the process itself is created. Certain fields of the PCB are updated when the process state changes, while other fields (such as the program counter) change too frequently for the OS to track continuously; those are kept in CPU registers and saved into the PCB at context switch time. Some fields in the PCB include:
- process state
- process number
- program counter
- registers
- memory limits
- list of open files
- priority
- signal mask
- CPU scheduling info
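For intuition only, a loose C sketch of such a structure is below; the field names and sizes are invented for illustration and do not match any real kernel's PCB (e.g., Linux's task_struct):

    typedef enum { P_NEW, P_READY, P_RUNNING, P_WAITING, P_TERMINATED } proc_state_t;

    struct pcb {
        proc_state_t  state;           /* process state */
        int           pid;             /* process number */
        unsigned long pc;              /* program counter, saved at context switch */
        unsigned long regs[16];        /* general-purpose registers */
        unsigned long mem_limit;       /* memory limits */
        int           open_files[64];  /* list of open file descriptors */
        int           priority;        /* scheduling priority */
        unsigned long sigmask;         /* signal mask */
        /* ... CPU scheduling info, accounting, pointers to address space, etc. ... */
    };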

22
Q

How are PCBs used?

A

PCBs are used by the OS for context switching between processes. Saving and restoring process states are done via PCBs.

23
Q

What is a context switch between processes?

A

It is the mechanism used by the OS to switch the execution from the context of one process to the context of another process.

24
Q

Why is context switch between processes expensive?

A
  1. Direct costs: the number of cycles that must be executed to load and store all the values of the PCBs to and from memory.
  2. Indirect costs: when we context switch to another process, some or even all of the data belonging to the previous process in the processor cache will be replaced to make room for the data needed by the current process. The next time the previous process is scheduled to execute, its data will not be in the cache; it will incur cache misses and spend much more time reading data from memory. This is called a cold cache.

Running with a cold cache is bad because every data access incurs a much longer latency to memory, which slows down the execution of the process. We therefore want to limit context switching between processes.

25
What do we do or get when a cache is hot?
1. Most process data is in the cache, so the process's performance will be at its best.
2. Sometimes we must context switch anyway, e.g., when a higher-priority process arrives or there is a policy to timeshare the CPU.
26
Describe the states in a lifetime of a process.
When a process is created, it's in the *new* state. At this point, the operating system initializes the PCB for the process, and it moves to the *ready* state. In this state it is able to be executed, but it is not being executed. Once the process is scheduled and moved onto the CPU, it is in the *running* state. If the process is then interrupted by the scheduler, it moves back to the *ready* state. If the process is running and then makes an I/O request, it moves onto the wait queue for that I/O device and is in the *waiting* state. After the request is serviced, the process moves back to the *ready* state. If a running process exits, it moves to the *terminated* state.
27
The CPU is able to execute a process when the process is in which state?
*Running* and *ready* states
28
Describe all the steps which take place for a process to transition from a waiting (blocked) state to a running (executing on the CPU) state.
When a process is in a blocked state, this means that process is currently waiting for an I/O request to be fulfilled. Usually this means that the process is sitting on an I/O queue within the kernel that is associated with the device it's making the request to. Once the request is fulfilled, the process moves back to a ready state, where it can be executed, although it is not yet scheduled. The currently executing process must be preempted, and the ready process must be switched in. At this point, the process is executing on the CPU.
29
What two basic mechanisms are supported by the OS for process creation?
1. fork - the OS creates a new PCB for the child process and copies the exact same values from the parent's PCB into the child's PCB. Both the parent and the child continue their execution at the instruction immediately after the fork.
2. exec - takes a PCB structure created via fork, but does not leave its values matching the parent's. Instead, the OS replaces the child's image and loads a new program; the child's PCB now describes this new program, and the child's program counter points to the first instruction of the new program.
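A minimal sketch of the two mechanisms together, assuming a POSIX system where /bin/ls exists:

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();           /* child starts as a copy of the parent */
        if (pid == 0) {
            /* child: replace its image with a new program */
            execl("/bin/ls", "ls", "-l", (char *)NULL);
            perror("execl");          /* only reached if exec fails */
            _exit(1);
        } else if (pid > 0) {
            int status;
            waitpid(pid, &status, 0); /* parent waits for the child to finish */
        } else {
            perror("fork");
        }
        return 0;
    }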
30
In process creation, what processes are privileged processes? Give examples.
These are called root processes. Examples include sched, init, and pageout.
31
On UNIX-based OSs, which process is often regarded as the "parent of all processes"?
init
32
On the Android OS, which process is regarded as the "parent of all App processes"?
ZYGOTE - a daemon process which has the single purpose of launching app processes.
33
What is the role of the CPU scheduler?
The CPU scheduler is an OS component that determines which one of the currently ready processes will be dispatched to the CPU to start running, and how long it should run for. To manage the CPU, the OS must be able to:
- preempt: interrupt the executing process and save its current context
- schedule: run the scheduling algorithm to choose one of the ready processes to run next
- dispatch: move the chosen process onto the CPU and switch into its context so that the process can finally start executing
34
What is the formula for useful CPU work?
Useful CPU work = total processing time / total time

where total processing time = 2 * Tp (Tp is the time to run one process) and total time = 2 * Tp + 2 * t_sched (t_sched is the time spent running the scheduler).

Note that the longer the CPU runs a process, the less frequently it runs the scheduler.
35
How do I/O operations affect scheduling?
Consider a process that has made an I/O request:
- The OS delivers the request and places the process on the I/O queue associated with that particular device.
- The process waits in the I/O queue until the device completes the operation and responds to the request.
- Once the I/O request is complete, the process is ready to run again: it can be placed in the ready queue, or possibly scheduled on the CPU right away.
36
What are the different ways a process can make its way into the ready queue?
1. A process which was waiting on an I/O event that has completed
2. A process which was running on the CPU but whose time slice expired
3. A newly created process (via the fork call)
4. A process which was waiting for an interrupt that has occurred
37
What are some responsibilities of the CPU scheduler?
1. Maintaining the ready queue
2. Deciding on when to context switch
38
What are inter-process communication mechanisms?
1. They help transfer data and information from one address space to another.
2. They maintain the protection and isolation that operating systems are trying to enforce.
3. They provide flexibility and performance.
39
What are the pros-and-cons of message-based vs. shared-memory-based IPC?
Message-passing IPC - the OS establishes a communication channel, e.g., a shared buffer. Processes interact with it by writing/sending a message into that buffer, or reading/receiving a message from that shared channel.
* Pros: the OS manages the channel and already provides system calls for operations like read/write and send/recv; the user doesn't need to worry about the details.
* Cons: overhead; every piece of data sent between processes must first be copied from the sending process's address space into memory associated with the kernel, and then into the receiving process's address space; there is a user/kernel crossing for every message.

Shared-memory IPC - the OS establishes a shared memory channel and maps it into the address space of both processes. The processes can then directly read and write this memory.
* Pros: the kernel is completely uninvolved; the processes can read from and write to this memory as they would any other memory in their address space.
* Cons: without the kernel, many of the APIs that were taken for granted with message-passing IPC may have to be re-implemented; for example, the user now has to manage all of the synchronization themselves.
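A hedged sketch of the message-passing flavor using a pipe between a parent and a child (the kernel mediates every transfer, which is exactly the copying/crossing overhead described above); a shared-memory version would instead use something like shm_open()/mmap() and keep the kernel out of the data path:

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];
        if (pipe(fds) < 0) {
            perror("pipe");
            return 1;
        }

        if (fork() == 0) {
            /* child: receive the message via the kernel-managed buffer */
            char buf[32] = {0};
            close(fds[1]);
            read(fds[0], buf, sizeof(buf) - 1);  /* user/kernel crossing + copy */
            printf("child got: %s\n", buf);
            close(fds[0]);
            _exit(0);
        }

        /* parent: send the message; the kernel copies it into its buffer */
        close(fds[0]);
        write(fds[1], "hello", 5);
        close(fds[1]);
        wait(NULL);
        return 0;
    }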
40
How are threads different from processes?
Threads represent multiple independent execution contexts. They are part of the same virtual address space, which means they share all of the virtual to physical address mappings, as well as the code, data, and files. However, they potentially execute different instructions, access different portions of the address space, operate on different portions of the input. Each thread has a different program counter, stack pointer, stack, thread-specific registers. The PCB is more complicated as it contains all information that is shared among all the threads, as well as separate information about every single one of the execution contexts that are part of the process.
41
What are benefits of multithreading? When is it useful to add more threads, when does adding threads lead to pure overhead? What are the possible sources of overhead associated with multithreading?
There are two main benefits of multithreading: parallelization and concurrency.

Parallelization comes into effect on multi-CPU systems, in which separate threads can execute at the same time on different processors. Parallelizing work allows the work to be accomplished more quickly.

Concurrency refers to the idea that the CPU can do a little bit of work on one thread, and then switch and do a little bit of work on another thread; the execution of tasks is interwoven. The primary benefit of concurrency without parallelization, that is, concurrency on single-CPU platforms, is hiding I/O latency. When a thread is waiting in an I/O queue, that thread is blocked: its execution cannot proceed. In a multithreaded environment, a thread scheduler can detect a blocked thread and switch in a thread that can accomplish some work. This way, it looks as if progress is always being made (CPU utilization is high), even though one or more threads may be sitting idle waiting for I/O to complete at any given time.

Adding threads makes sense if you have fewer threads than cores or if you are performing many I/O requests. Otherwise, adding threads can lead to pure overhead. While new threads require less memory than new processes, they still require memory: threads cannot be made for free, and thread creation is often quite expensive. In addition, thread synchronization adds more overhead, in terms of the CPU cycles and memory that must be consumed to represent and operate the synchronization constructs.
42
Do the following statements apply to processes (P), threads (T), or both (B)?
1. Can share a virtual address space
2. Take longer to context switch
3. Have an execution context
4. Usually result in hotter caches when multiple exist
5. Make use of some communication mechanism

1. Can share a virtual address space (T)
2. Take longer to context switch (P)
3. Have an execution context (B)
4. Usually result in hotter caches when multiple exist (T)
5. Make use of some communication mechanism (B)
43
Describe the lifetime of a thread.
A thread can be created: a new execution context is created and initialized. To create a thread, we need to specify what procedure it will run and what arguments to pass to that procedure. Once the thread has been created, it is a separate entity from the thread that created it, and at this point we say the process overall is multithreaded.

During its lifetime, a thread may be executing on the CPU, similar to how processes execute on the CPU, or it may be waiting. Threads may wait for several different reasons: they may be waiting to be scheduled on the CPU (just like processes), waiting to acquire some synchronization construct (like a mutex), or waiting on some condition to become true or some signal to be received (like a condition variable). Different queues may be maintained depending on the context of the waiting.

Finally, at the end of their life, threads can be joined back into their parents. Through this join, threads can convey the result of their work back to their parents. Alternatively, detached threads cannot be joined back into their parent, so this step does not apply to them.

Threads can also be zombies! A zombie thread is a thread that has completed its work but whose data has not yet been reclaimed; it is doing nothing but taking up space. Zombies live on death row. Every once in a while, a special reaper thread comes and cleans up these zombie threads. If a new thread is requested before a zombie thread is reaped, the allocated data structures for the zombie are reused for the new thread.
44
What do we need to support threads?
- Thread data structure: distinguishes a thread from a process; it should allow us to identify threads and keep track of their resource usage
- Mechanisms to create and manage threads
- Mechanisms that allow threads to safely coordinate with each other while executing concurrently, especially when there are dependencies between their executions
45
What is a data race problem in multithreaded environments?
A data race problem occurs when multiple threads access and modify the same data at the same time.
46
Provide synchronization mechanisms for threads that deal with concurrency issues.
1. Mutual exclusion - only one thread at a time is allowed to perform an operation; uses mutexes
2. Waiting on other threads - wait for a specific condition before proceeding; uses condition variables
47
What is the thread type proposed by Birrell?
It is a data structure that contains all of the information that is specific to a thread and that can describe a thread. This includes the thread ID, register values, program counter, stack pointer, and other attributes.
48
How do we create a new thread according to Birrell?
Birrell proposes a fork call with two parameters: a procedure (the proc argument) and args, the arguments for that procedure: fork(proc, args)
49
What is the mechanism proposed by Birrell to determine that a thread is done, retrieve its result, or determine the status of its computation?
join(thread) - wait for a thread to terminate. When the parent thread calls join with the thread ID of the child thread, it blocks until the child completes. Join returns to the parent the result of the child's computation.
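In pthreads terms (a sketch only; Birrell's fork/join map roughly onto pthread_create/pthread_join, and square is a hypothetical procedure):

    #include <pthread.h>
    #include <stdio.h>

    /* The "proc" in Birrell's fork(proc, args). */
    void *square(void *arg) {
        long n = (long)arg;
        return (void *)(n * n);   /* result handed back through join */
    }

    int main(void) {
        pthread_t tid;
        void *result;

        pthread_create(&tid, NULL, square, (void *)7L);  /* fork(proc, args) */
        pthread_join(tid, &result);                      /* blocks until the child finishes */
        printf("child returned %ld\n", (long)result);
        return 0;
    }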
50
What are mutexes?
Mutexes are synchronization constructs that enforce the principle of mutual exclusion. Mutual exclusion is the requirement that one thread of execution never enter its critical section at the same time that another concurrent thread of execution enters its own critical section.
51
What are condition variables?
Condition variables are constructs used in conjunction with mutexes to control concurrent execution. While mutexes and mutual exclusion represent binary states - either the thread can access a shared resource or it cannot - condition variables allow for more complex states. Threads can wait on condition variables until a condition is met, and threads can signal/broadcast on condition variables when a condition is met.
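A minimal sketch of the mutex/condition-variable idiom (names like ready and waiter are illustrative); note the while() loop around the predicate, which the later cards explain:

    #include <pthread.h>

    static pthread_mutex_t m    = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static int ready = 0;                  /* shared state guarded by m */

    void *waiter(void *arg) {
        (void)arg;
        pthread_mutex_lock(&m);
        while (!ready)                     /* re-check the predicate after every wake-up */
            pthread_cond_wait(&cond, &m);  /* atomically releases m and blocks */
        /* safe to proceed: the predicate holds and m is held */
        pthread_mutex_unlock(&m);
        return NULL;
    }

    void *signaler(void *arg) {
        (void)arg;
        pthread_mutex_lock(&m);
        ready = 1;                         /* update shared state under the mutex */
        pthread_mutex_unlock(&m);
        pthread_cond_signal(&cond);        /* signaling outside the lock helps avoid spurious wake-ups */
        return NULL;
    }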
52
Write the steps/code for entering/exiting a critical section for problems such as reader/writer, or reader/writer with selective priority (e.g., reader priority vs. writer priority).
State encoding:
free: resource_counter = 0
reading: resource_counter > 0
writing: resource_counter = -1

// Readers (multiple allowed)
mutex_lock(&m);
while (resource_counter == -1)
    wait(&cond_reader, &m);
resource_counter++;
mutex_unlock(&m);

// ... read data ...

mutex_lock(&m);
resource_counter--;
if (resource_counter == 0)
    signal(&cond_writer);
mutex_unlock(&m);

// Writer (single)
mutex_lock(&m);
while (resource_counter != 0)
    wait(&cond_writer, &m);
resource_counter = -1;
mutex_unlock(&m);

// ... write data ...

mutex_lock(&m);
resource_counter = 0;
mutex_unlock(&m);
signal(&cond_writer);
broadcast(&cond_reader);
53
What are spurious wake-ups, how do you avoid them, and can you always avoid them?
Spurious wake-ups occur when a thread is woken up at a time when the application is in a state in which it is impossible for that thread to make progress. This often happens when one thread signals/broadcasts while holding a mutex: the threads waiting on the condition variable being signaled are woken up from its wait queue, but since the signaling thread still holds the mutex, they are immediately placed on the wait queue associated with acquiring the mutex, and CPU cycles are wasted. Spurious wake-ups can often be avoided by placing signal/broadcast calls outside of the mutex. Note that this cannot always be done: if a signaling call depends on some configuration of shared state, that state must be checked while holding the mutex, as with any other shared state access.
54
Why do we need a while() for the predicate check in the critical section entry code examples in the mutex lessons?
When a condition variable is signaled and a thread is to be "woken up", two things occur:
- The thread is removed from the wait queue
- The thread reacquires the mutex

These two steps are logically independent, which means that another thread may be given CPU time (and potentially the mutex) in between them. As a result, the shared state may be updated before the waking thread reacquires the mutex. Therefore, the thread must check the predicate again when it actually acquires the mutex, to ensure the condition it needs to proceed is still true.
55
What’s a simple way to prevent deadlocks? Why?
The simplest way to prevent deadlocks is to maintain a lock order. This will prevent a cycle in the wait graph, which is necessary and sufficient for a deadlock to occur. Two threads can deadlock if they are each trying to acquire mutexes the other holds. A thread can also deadlock with itself if it tries to acquire a mutex it already holds. Enforcing a lock order can ensure that neither of these scenarios occurs.
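A hedged sketch of lock ordering with two mutexes (m_a and m_b are illustrative names): every thread that needs both acquires them in the same fixed order, so no cycle can form in the wait graph:

    #include <pthread.h>

    static pthread_mutex_t m_a = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t m_b = PTHREAD_MUTEX_INITIALIZER;

    /* Convention: whenever both are needed, m_a is always taken before m_b. */
    void *worker_one(void *arg) {
        (void)arg;
        pthread_mutex_lock(&m_a);
        pthread_mutex_lock(&m_b);
        /* ... touch state protected by both mutexes ... */
        pthread_mutex_unlock(&m_b);
        pthread_mutex_unlock(&m_a);
        return NULL;
    }

    void *worker_two(void *arg) {
        (void)arg;
        pthread_mutex_lock(&m_a);   /* same order as worker_one: no deadlock cycle */
        pthread_mutex_lock(&m_b);
        /* ... */
        pthread_mutex_unlock(&m_b);
        pthread_mutex_unlock(&m_a);
        return NULL;
    }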
56
Describe the boss-workers multithreading pattern. If you need to improve a performance metric like throughput or response time, what could you do in a boss-worker model? What are the limiting factors in improving performance with this pattern?
In the boss-worker pattern, you have one boss thread and N worker threads. For each task that enters the system, the boss dispatches the task and the workers consume it. This is often, though not necessarily, done through a shared request queue/buffer: the boss adds items to the queue and the workers consume items from it.

If you wanted to improve throughput or response time in this model, there are mainly two things you could do. First, increase the number of threads. The more you leverage concurrency, the more potential you have to increase the efficiency of your processing. That said, dynamic thread creation is expensive, so it makes sense to keep a pool of workers up front instead of creating new threads on the fly; increasing the size of the pool may help performance. Second, decrease the amount of work the boss has to do. Since the boss has to do some processing on every request, the throughput of the system is tied to this processing time (throughput = 1 / boss_time_per_order), so make it as low as possible.

This approach is still limited by the amount of processing power you have: after a certain number of threads, the work becomes CPU-bound and some threads end up waiting their turn for computing resources. In addition, this approach lacks locality. The boss doesn't keep track of what any of the workers were doing last; it is possible that a thread just finished a task very similar to an incoming task and is therefore best suited to handle it, but the boss has no insight into this.
57
Describe the pipelined multithreading pattern. If you need to improve a performance metric like throughput or response time, what could you do in a pipelined model? What are the limiting factors in improving performance with this pattern?
In the pipeline pattern, the execution of a task is broken down into stages. Threads are assigned to continually complete the work at one stage, receiving requests from the stage before it and passing work on to the stage after it.

One of the benefits of this pattern is locality: since each thread performs the same (small) task over and over again, it is more likely that the hardware cache will be hot, increasing performance.

At a larger level, this pattern will be bottlenecked by the stage that takes the longest to complete. One way to overcome the difference in processing needed at each stage is to allocate a different number of threads to each stage; for example, a stage that takes three times as long as another stage should have three times as many threads. A downside of this approach is that it is difficult to keep the pipeline balanced over time: when the input rate changes, or the resources at a given stage are exhausted, rebalancing may be required.
58
What are the execution time formulas for the boss-worker and pipeline models?
Boss-worker formula:
execution time = time_to_finish_1_order * ceiling(num_orders / num_concurrent_threads)

Pipeline formula:
execution time = time_to_finish_first_order + (remaining_orders * time_to_finish_last_stage)
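A small sketch that evaluates both formulas for the numbers used later in card 76 (variable names are illustrative; the boss is not counted as a concurrent worker):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double time_per_order = 120.0, stage_time = 20.0;
        int num_orders = 11, num_workers = 5, num_stages = 6;

        /* boss-worker: orders are processed in batches of num_workers */
        double bw = time_per_order * ceil((double)num_orders / num_workers);

        /* pipeline: the first order traverses the whole pipeline,
           then one more order completes per stage time */
        double pl = num_stages * stage_time + (num_orders - 1) * stage_time;

        printf("boss-worker: %.0f ms, pipeline: %.0f ms\n", bw, pl);  /* 360 ms, 320 ms */
        return 0;                                                     /* link with -lm */
    }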
59
Can you explain the relationship between kernel-level and user-level threads? Think through a general MxN scenario (as described in the Solaris papers) and the current Linux model. What happens during scheduling, synchronization and signaling in these cases?
User-level threads are associated with a user-level library (like pthreads), while kernel-level threads are associated with the kernel threading implementation (like NPTL). For a user thread to run, it must be associated with a kernel thread, which in turn must be scheduled on the CPU.

In the one-to-one model, there is one kernel thread for every user thread: when a user thread is created, a kernel thread is also created. This 1:1 model is the current situation in Linux and is supported at the kernel level by the task struct. The benefit of this approach is that the kernel understands that the process is multithreaded and also understands what those threads need. Since the operating system already supports threading mechanisms to manage its threads, the user libraries can benefit directly from the multithreading support available in the kernel. One downside of this approach is that it is expensive: for every operation we must go to the kernel and pay the cost of a system call. Another downside is that since we rely on the mechanisms and policies supported by the kernel, we are limited to only those policies and mechanisms, and execution of our applications on different operating systems may give different results.

In a so-called many-to-many scenario, one or more user threads can be scheduled on one or more kernel threads. The kernel is aware that the process is multithreaded since it has assigned multiple kernel-level threads to the process. This means that if one kernel-level thread blocks on an operation, we can context switch to another, and the process as a whole can proceed. One of the downsides of this model is that it requires extra coordination between the user- and kernel-level thread managers.

Signaling in a many-to-many scenario comes with complexities. If the kernel thread has a signal enabled but the user thread does not, the user threading library may have to send directed signals back down into the kernel to get the right user-level thread to respond to the signal.
60
What is the difference between "hard process state" and "light process state" in thread-related data structures?
After splitting the PCB, the kernel-level thread structures hold the stack and register pointers needed to represent each execution state, while the kernel's per-process PCB holds state such as the virtual address mappings. That per-process PCB is further split into:
- Hard process state: includes the virtual address mappings, which are relevant to all user-level threads that execute within the process
- Light process state: includes the signal masks and system call arguments, which are only relevant to the subset of user-level threads currently associated with a particular kernel-level thread
61
Why use multiple data structures over a single process control block?
Multiple data structures have the following advantages over a single PCB structure representing all aspects of a process's execution state:
1. Scalability: multiple smaller data structures versus one large contiguous data structure
2. Overheads: easier to share versus keeping a private copy for each entity
3. Performance: on context switch, save and restore only what needs to change versus saving and restoring everything
4. Flexibility: a user-level library only needs to update the relevant portion/subset of the state versus updating everything
62
In Linux 3.17:
1. What is the name of the kernel thread structure?
2. What is the name of the data structure, contained in the above structure, that describes the process the kernel thread is running?
1. kthread_worker
2. task_struct
63
What are the kernel-level data structures in Solaris 2.0?
Process:
- list of kernel-level threads
- virtual address space
- user credentials
- signal handlers

Light-Weight Process (LWP): similar to a user-level thread but visible to the kernel; not needed when the process is not running
- user-level registers
- system call arguments
- resource usage info
- signal mask

Kernel-level thread:
- kernel-level registers
- stack pointer
- scheduling information
- pointers to the associated LWP, process, and CPU structures

CPU:
- current thread
- list of kernel-level threads
- dispatching and interrupt handling information
64
Can you explain why some of the mechanisms described in the Solaris papers (for configuring the degree of concurrency, for signaling, the use of LWPs…) are not used or necessary in the current threads model in Linux?
The mechanisms described in the Solaris papers were all mechanisms for rationing the number of threads created at any given point in time. Configuring the degree of concurrency implied that you couldn't just automatically be given as much as you needed; having LWPs multiplex over kernel threads indicated that kernel threads needed to be shared. This rationing was based on real concerns about the memory available in a system at any given time.

The native implementation of threads in Linux is the Native POSIX Threads Library (NPTL). This is a 1:1 model, meaning that there is a kernel-level task for each user-level thread, so the kernel sees every user-level thread. This is acceptable because kernel trapping has become much cheaper, making user/kernel crossings much more affordable, and because modern platforms have more memory, removing the pressure to keep the number of kernel threads as small as possible.
65
In the pthreads library, which function sets the concurrency level? For that function, which concurrency value instructs the implementation to manage the concurrency level as it deems appropriate?
- pthread_setconcurrency()
- 0
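A tiny sketch of the call (pthread_getconcurrency is the matching getter; whether the hint has any effect is implementation-defined):

    #include <pthread.h>

    int main(void) {
        /* 0 tells the implementation to manage the concurrency level
           as it deems appropriate; the value is only a hint. */
        pthread_setconcurrency(0);

        int level = pthread_getconcurrency();  /* returns the last value set (or 0) */
        (void)level;
        return 0;
    }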
66
What are adaptive mutexes?
Mutexes that sometimes cause the acquiring thread to spin and other times cause it to block, depending on the length of the critical section. If the critical section is short, the thread spins on the CPU, burning a few cycles until the other thread releases the mutex. For long critical sections, the thread is properly blocked. Note that adaptive mutexes are only used on multi-CPU systems.
67
In the Linux kernel's codebase, a minimum of how many threads are needed to allow a system to boot? What is the name of the variable used to set this limit?
- 20 threads
- max_threads
68
What is an interrupt?
Interrupts are events generated externally, by components other than the CPU, e.g., I/O devices, timers, other CPUs. The types of interrupts that can occur depend on the specific configuration of the physical platform and the types of devices it has. Interrupts appear asynchronously.
69
What is a signal?
Signals are events triggered by the CPU and the software running on it. The types of signals that can occur depend on the operating system. Signals can appear synchronously or asynchronously.
70
What are the similarities between interrupts and signals?
- Both have a unique ID, determined by the hardware (interrupts) or the operating system (signals)
- Both can be masked and disabled/suspended via a corresponding mask: a per-CPU interrupt mask, a per-process signal mask
- If enabled, both trigger a corresponding handler. The interrupt handler is set for the entire system by the operating system; signal handlers are set on a per-process basis by the process
71
What happens during interrupt handling?
When a device or another external component needs the operating system's attention, the kernel is notified via an interrupt. An interrupt is handled on a per-CPU basis: the operating system maintains an interrupt handler table, which maps each interrupt number (INT#) to a handling procedure. When an interrupt occurs, the kernel jumps to the associated interrupt handler and executes that code. Which interrupts can occur is a function of the platform you are running on; how those interrupts are handled is a function of the OS running on top of the physical platform.
72
What happens during signal handling?
Each process maintains its own signal handler table, which is very similar to the kernel's interrupt handler table. Each entry contains a reference to the signal and a reference to the handling code. When a signal comes in, the process jumps to the handling code. Threads cannot have their own handlers, although they can set their own signal masks to disable signals they don't want to receive.
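A hedged sketch of installing a per-process handler with sigaction() and noting where a per-thread mask would come in (SIGUSR1 and the handler name are illustrative):

    #include <signal.h>
    #include <stdio.h>

    static volatile sig_atomic_t got_signal = 0;

    static void handler(int signo) {
        got_signal = signo;          /* only async-signal-safe work belongs here */
    }

    int main(void) {
        /* Install a handler for SIGUSR1 in the process-wide handler table. */
        struct sigaction sa;
        sa.sa_handler = handler;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigaction(SIGUSR1, &sa, NULL);

        /* A thread that does not want SIGUSR1 could block it with
           pthread_sigmask(SIG_BLOCK, &set, NULL); the handler table
           itself stays per-process. */

        raise(SIGUSR1);              /* deliver the signal to ourselves */
        if (got_signal)
            printf("handled signal %d\n", (int)got_signal);
        return 0;
    }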
73
What's the potential issue if an interrupt or signal handler needs to lock a mutex? What's the workaround described in the Solaris papers?
Since handlers execute on a thread's stack, there is the potential for a thread to deadlock with itself if the handler tries to lock a mutex the interrupted thread has already acquired: the current stack frame needs a mutex that was acquired in a lower stack frame to be released, but that frame cannot proceed until the handler returns.

The solution is to have signal handlers execute in another thread. This way, the signal handling code can contend for the mutex like any other thread, which removes the possibility of deadlock. Since dynamic thread creation is expensive:
- if the handler routine does not acquire locks, execute it on the stack of the interrupted thread
- if the handler routine acquires locks (and therefore may block), turn it into a real thread

Another solution is to have threads alter their signal masks before entering and after exiting their critical sections. While altering the mask takes fewer SPARC instructions than creating a new thread to handle a signal, that cost is paid on every mutex acquisition, and mutex acquisitions happen much more frequently than signals, so it is cheaper overall to pay the higher cost in the rare signal-handling path. This is another example of optimizing for the common case!
74
What are the two types of signals?
1. One-shot signals - if multiple instances of the same signal occur, they are guaranteed to be handled at least once, but not necessarily once per instance. The handling routine must also be re-enabled every single time. This demonstrates an overriding behavior.
2. Real-time signals - if a signal is raised n times, then the handler is guaranteed to be called n times as well. This demonstrates a queuing behavior.
75
Using the most recent POSIX standard, indicate the correct signal names: - terminal interrupt signal - high bandwidth data is available on a socket - background process attempting to write - file size limit exceeded
- terminal interrupt signal: SIGINT - high bandwidth data is available on a socket: SIGURG - background process attempting to write: SIGTTOU - file size limit exceeded: SIGXFSZ
76
Consider boss-worker (120 ms per order; 6 threads) and pipeline (20 ms per stage; 6 stages) models, both for 11 orders. Calculate the execution time and average time per order.
Boss-worker model (6 threads = 1 boss + 5 concurrent workers):
Execution time = 120 * ceiling(11/5) = 120 * 3 = 360 ms
Average time per order = (5*120 + 5*240 + 1*360) / 11 ≈ 196 ms

Pipeline model:
Execution time = 120 + (10*20) = 320 ms
Average time per order = (120 + 140 + ... + 320) / 11 = 220 ms
77
What are the pros and cons of a multi-process (MP) web server?
The main benefit of the MP model is its simplicity. Each request is handled in a new process with its own address space, so there is no need for any (explicit) synchronization code.

However, this benefit comes at the cost of performance. First, the memory footprint of the MP implementation is large. Processes have discrete address spaces, which means that memory usage grows steeply with request count; as a result, there is less memory available for caching, and the computation becomes disk-bound more quickly. Second, context switching slows down the MP implementation: context switching between processes consumes CPU cycles that could otherwise be spent handling requests.
78
What are the pros and cons of a multi-threaded (MT) web server?
The main benefit of the MT model, compared to the MP model, is efficiency. MT applications share the address space of the process they are contained in, so they are more memory efficient than MP applications. This sharing also allows for cheaper context switches (and often leaves hotter hardware caches), which can help performance.

Unfortunately, this benefit comes at the cost of complexity. MT applications require explicit synchronization code to be written (mutexes, condition variables, etc.). In addition, MT applications require kernel support for multithreading.
79
What are the benefits of the event-based model described in the Flash paper over MT and MP? What are the limitations? Would you convert the AMPED model into a AMTED (async multi-threaded event-driven)? How do you think an AMTED version of Flash would compare to the AMPED version of Flash?
The event-based model operates primarily in one thread, which makes its memory footprint smaller than both the MT and MP models. As well, the cost of context switching is largely absent. Helper processes are involved, but they are used only as needed: they aren't created blindly for every new request entering the system. The benefit of the smaller memory footprint is that more memory is available for the various types of caching done in the system, which means that Flash can delay becoming disk-bound longer than the MT/MP models.

Summary:
- single address space
- single flow of control
- smaller memory requirement; compact address space
- no context switching (low overheads)
- no synchronization

The main limitation of the approach behind Flash was that not every kernel supported asynchronous I/O. As a result, Flash had to fake it in a sense, using helper processes to make I/O look asynchronous.

I think the AMTED model would perform better than the AMPED model, as the MT model uses less memory than the MP model, which again allows for a bigger cache. The main reason for choosing process-based helpers was probably that kernel support for multithreading was spotty at the time the paper was written, so a process-based approach was the most portable. That said, I think the performance differential would be on par with that between the standard MP/MT models, which is to say, not that much.
80
Of the three models, which model likely requires the least amount of memory? - boss-worker - pipeline stage - event-driven
Event-driven model. Extra memory is only required for helper processes/threads for concurrent blocking I/O operations
81
There are several sets of experimental results from the Flash paper discussed in the lesson. Do you understand the purpose of each set of experiments (what question they wanted to answer)? Do you understand why each experiment was structured in a particular way (why they chose the variables to be varied, the workload parameters, the measured metric…)?
1. Single File Test - a "best case" test: show what performance looks like across the servers when the request pattern is optimal. To achieve this, requests were made for a single file, whose size was varied. SPED performed better than AMPED, which performs an extra check to see whether the file is in memory (not an issue here, because there is one file and it is always in memory). Both performed better than MT/MP, which were slowed by context switching overhead. (SPED > AMPED/Flash > MT/MP > Apache)

2. Owlnet Trace - tests workloads that can primarily, but not completely, be served from cache. AMPED performs slightly better than SPED here: since some of the work is disk-bound, SPED blocks where AMPED context switches. Both of these approaches beat the MT/MP models, which have more memory allocated to them (leaving less for the cache) and still incur the overhead of context switching.

3. CS Trace - tests workloads that become disk-bound very quickly. In this case, AMPED and MT/MP significantly outperform SPED. Since SPED is single-threaded with no helpers, it blocks a lot when the workload is disk-bound. AMPED has only enough helper processes to keep the disk busy, so it has a smaller memory footprint and fewer context switching overheads than MP/MT.

Summary:
- When data is in cache: SPED >> AMPED Flash (AMPED pays an unnecessary test for memory presence); SPED and AMPED Flash >> MT/MP (synchronization and context switching overhead)
- When the workload is disk-bound: AMPED Flash >> SPED (SPED blocks because it has no async I/O); AMPED Flash >> MT/MP (more memory efficient, less context switching)
82
List and describe some performance metrics.
1. Execution time
2. Throughput, e.g., bytes per second (amount of data processed in a given amount of time)
3. Request rate
4. CPU utilization - % of time the CPU is working to complete tasks
5. Wait time
6. Platform efficiency
7. Performance/$
8. Performance/W
9. Percentage of SLA violations
83
If you ran your server from the class project for two different traces: (i) many requests for a single file, and (ii) many random requests across a very large pool of very large files, what do you think would happen as you add more threads to your server? Can you sketch a hypothetical graph?
Single file: adding more threads would likely increase the throughput of the GETFILE server. Files need to be read from disk into memory before they can be served; once a file is read into memory, it can stay there until its contents need to be replaced. More threads would be able to serve more concurrent requests out of that cached copy.

Random requests across a large pool of large files: adding more threads may not have a huge impact on throughput. Any given new request is unlikely to be served from main memory, since the range of inputs is so wide, so the application would likely be disk-bound from the start. Increasing the number of threads allows more concurrent requests to be processed, but each thread will likely have to pull content from disk.

The graph of throughput vs. threads in both cases would increase roughly logarithmically, with the single-file curve rising much more sharply than the curve for the large pool.