0. Midterm Questions Flashcards

1
Q

What is GWA’s special talent?

A

Haircut detection

2
Q

Which of the following is a requirement of a critical section? [progress, concurrency, mutual inclusion, idleness]

A

Progress

3
Q

Intra-process (within) communication is easier than interprocess (between) communication. True or false?

A

True.
Threads of a single process share an address space, while a process's address space is generally accessible only to that process and not to others.

4
Q

Which of the following requires communication with the operating system? [Switching between two threads. Inverting a matrix. Recursion. Creating a new process.]

A

Creating a new process

5
Q

Which of the following is NOT an example of an operating system mechanism?
[A context switch;
Using timer interrupts to stop a running thread;
Maintaining the running, ready, and waiting queues;
Choosing a thread to run at random]

A

Choosing a thread to run at random (this is a policy).

6
Q
The Rotating Staircase Deadline Scheduler is most similar to which other scheduling algorithm?
[Lottery scheduling;
Multi-level feedback queues;
Round-Robin;
Random]
A

Multi-level feedback queues

7
Q
What would probably be stored in a page table entry?
[the physical memory address;
the virtual memory address;
the process ID;
the file name]
A

The physical memory address

8
Q

Address translation allows the kernel to implement what abstraction?
[Files, Threads, Processes, Address spaces]

A

Address spaces

9
Q

Con Kolivas was particularly interested in improving what aspect of Linux scheduling?
[Overhead; Throughput; Interactive performance; Awesomeness]

A

Interactive performance

10
Q

Which is probably NOT a privileged operation?
[Changing the interrupt mask; Loading an entry into the TLB; Modifying the exception handlers; Adding two registers and placing the result in a third register]

A

Adding two registers and placing the result in a third register

11
Q

Consider a 32-bit system with 4K pages that uses multi-level page tables as described in class:

  1. 10 bits for the first-level index.
  2. 10 bits for the second level index.
  3. a 12-bit offset

Assume that page table entries are 32 bits and so can be stored directly in the second-level table.

If a process has 10 contiguous code pages (including global variables), 4 contiguous heap pages, and a single 1-page stack, what is the minimum number of pages required for the process’s page table?
What is the maximum?
In both cases, briefly explain your answer.

A

Each first- or second-level table consumes one page, since it has 1024 (2^10) 4-byte entries. So at minimum, this process’s page table requires two pages: one for the top-level table and a second for a single second-level table, which is enough if the code, heap, and stack are close enough together (within the same 4 MB region) to all lie within the same second-level table. (This layout is not common.)

The maximum can be thought of as follows: one page for the top-level table plus three pages for the second-level tables, one covering the 10 contiguous code pages, one covering the 4 contiguous heap pages, and a third covering the single stack page. The true maximum is six, though, since the code segment and the heap may each straddle two second-level tables if they happen to cross a 4 MB boundary in the address space.
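
To make the 10/10/12 split concrete, here is a minimal C sketch (not from the exam; the example address is arbitrary) showing how a 32-bit virtual address breaks into the two table indices and the page offset:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t va = 0x00401ABC;              /* arbitrary example virtual address */
    uint32_t top    = (va >> 22) & 0x3FF;  /* bits 31-22: first-level index     */
    uint32_t second = (va >> 12) & 0x3FF;  /* bits 21-12: second-level index    */
    uint32_t offset = va & 0xFFF;          /* bits 11-0:  offset within page    */
    printf("top=%u second=%u offset=0x%03x\n",
           (unsigned)top, (unsigned)second, (unsigned)offset);
    return 0;
}

Each table holds 2^10 = 1024 four-byte entries, which is exactly 4 KB, so each first- or second-level table fills one page.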

12
Q

Identify one separation between OS mechanism and policy that we have discussed this semester. Describe the mechanism and one policy.

A

One example is the separation between the ability to stop and start threads (mechanism) and scheduling (policy). The mechanism is preemptive context switches using the timer to give the kernel a chance to run regularly and interrupt the running thread if needed. One policy is to schedule threads at random.

Another example is the separation between the virtual address translations performed by the TLB or MMU (mechanism) and the mapping between virtual and physical memory set by the kernel and running processes (policy). The mechanism is the TLB’s ability to cache translations and map virtual addresses to physical addresses. One policy is having a 2GB virtual address space where the code starts at 0x10000, the heap starts at 0x10000000 and grows upward, and the stack starts at 0x80000000 and grows down. But any potential address space layout would earn you credit here.

13
Q

Identify three system calls that allocate new virtual addresses (i.e., allow the process to use them) and provide a brief description of each.

A
  1. exec( ): loads content from the ELF file and sets up virtual addresses, mainly those that point to the process’s code and any statically-declared variables
  2. fork( ): copies the parent’s entire address space, including all the code pages, the entire heap, and one (or more) thread stacks, allowing the child to use all of these virtual addresses, which will (eventually) have to point to private physical addresses
  3. sbrk( ): extends the “break point”, or the top of the process’s heap, allocating new virtual addresses that point to physical memory. Not usually called by processes directly, but instead by malloc libraries when their subpage allocator runs out of space.
  4. mmap( ): asks the kernel to map a range of virtual addresses onto a region of a file (see the sketch after this list)
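
As a rough illustration of items 3 and 4 (a sketch, not exam material; the file name is made up and error checking is omitted):

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    void *old_break = sbrk(4096);                 /* grow the heap by one page      */
    int fd = open("/tmp/example.dat", O_RDONLY);  /* hypothetical file              */
    void *mapped = mmap(NULL, 4096, PROT_READ,    /* map one page of the file into  */
                        MAP_PRIVATE, fd, 0);      /* this process's address space   */
    (void)old_break; (void)mapped;                /* new virtual addresses now usable */
    munmap(mapped, 4096);
    close(fd);
    return 0;
}
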
14
Q

Given a simple MLFQ approach that rewards any thread that sleeps or yields before its quantum expires, first describe a way that a computationally-intensive thread can take advantage of a weakness in this approach to remain in one of the top queues. Second, propose a modification to MLFQ that addresses this problem.

A

The problem is that if a thread can determine how long its quantum is, even approximately, it can arrange to call yield right before its quantum expires. As a result, it has done almost as much computation as it would have normally, but it is not demoted to a lower-priority queue as it should be.

The fix is simple. Instead of looking at whether a thread blocks or yields before its quantum ends, consider the percentage of its quantum it has used before it sleeps or yields. Either establish a stronger threshold for not being demoted (say, it can’t have used more than 10% of the quantum), or use the percentage itself to make a more nuanced decision (if it used less than 10% maintain its level, if between 10% and 50% demote one level, if over 50% demote two levels, and so on).

Another way to fix this problem is simply to not reward threads that yield and only consider those that block. However, this isn’t quite sufficient, since a thread may be able to block on something that it knows will return immediately - like a read from /dev/random - or arrange to have a second thread within the same process release it. Looking at the overall quantum usage is a safer bet.
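
A minimal sketch of the percentage-based rule described above (the thresholds and the priority representation are invented for illustration; lower numbers mean lower-priority queues):

int next_priority(int current, unsigned used_ticks, unsigned quantum_ticks) {
    unsigned pct = (100 * used_ticks) / quantum_ticks;
    int next;
    if (pct <= 10)      next = current;      /* used little of the quantum: keep level */
    else if (pct <= 50) next = current - 1;  /* moderate use: demote one level         */
    else                next = current - 2;  /* used most of the quantum: demote two   */
    return next < 0 ? 0 : next;              /* clamp at the lowest queue              */
}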

15
Q

Long question: We have seen semaphores, spin and sleep locks, condition variables, and reader-writer locks. However, many other useful synchronization primitives exist.

First, describe one additional synchronization primitive. Provide a complete interface for it in C pseudo-code, and describe how to implement it.

Second, provide two different use cases for your new synchronization primitive. Feel free to use pseudo-code as well as English here.

A

See 2016 midterm solution sheet.

16
Q

All of the following are critical section requirements EXCEPT
[mutual exclusion, concurrency, progress, performance]

A

Concurrency

17
Q

All of the following are private to each process thread EXCEPT
[stack; file handles; registers]

A

File handles

18
Q

What action does NOT require the kernel?

[Switching between two threads; Reading from a file; Creating a new process; Altering a virtual address mapping]

A

Switching between two threads

Is this true? Yes: as card 34 notes, threads can be implemented without kernel support, so a user-level thread library can switch between its threads by saving and restoring registers and swapping stacks entirely in user space, without involving the kernel.

19
Q

What part of the address space is NOT initialized using information from the ELF file?
[Code segment; The heap; Uninitialized static data; Initialized static data]

A

The heap

20
Q

Which of the following is NOT a part of making a system call?
[Arranging the arguments in registers or on the stack; Loading the system call number into a register; Generating a software interrupt; Allocating memory in the heap using malloc]

A

Allocating memory in the heap using malloc

21
Q

What information would probably not be stored in a page table entry?
[The location on disk; Read, write, or execute permissions; The process ID; The physical memory address]

A

The process ID

22
Q

Paging using fixed-size pages can suffer from internal fragmentation (true or false)

A

True

23
Q

One way to eliminate deadlock when acquiring multiple locks is to eliminate cycles. Describe another way to avoid deadlock and how to implement it.

A

There isn’t much you can do about protected access to shared resources, which requires waiting - either passive or active. In some cases adding resources to eliminate unnecessary sharing could be an option, and we’ll accept that. (Eliminating sleeping isn’t a solution, since replacing deadlocks with livelocks isn’t a great idea.) But the other two conditions are potential solutions:

  1. ADD RESOURCES, eliminating the need for sharing. As an example from the dining philosophers problem, adding chopsticks eliminates the need to share. We’ll accept this answer if you were quite specific about this not working in all circumstances, since in certain cases global resources are required for correctness.
  2. ALLOW RESOURCE PREEMPTION, creating a way for the operating system to forcibly take a resource from a thread that is holding it. One way to implement this is to detect a deadlock by identifying a circular dependency in the wait queue and then interrupt one of the threads holding the locks causing the cycle. This would require some way for the thread to register a handler each time it requests a lock that would be called if that lock was force dropped.
  3. DISALLOW MULTIPLE INDEPENDENT REQUESTS, preventing a thread from holding some resources while requesting others. You could implement this by checking each time a thread requests a lock to make sure that it doesn’t hold any other locks (a rough sketch follows below). Enforcing this without any new lock acquisition mechanisms would require programmers to change their code to introduce new locks covering these cases. Alternatively, you could create a new locking mechanism allowing multiple locks to be acquired simultaneously.

(See stoplight problem?)
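
A minimal sketch of the check from option 3, assuming OS/161-style lock_acquire()/lock_release() and a hypothetical per-thread count of held locks:

#include <assert.h>

struct lock;                          /* provided by the kernel's locking code */
void lock_acquire(struct lock *);
void lock_release(struct lock *);

struct thread {
    int locks_held;                   /* hypothetical: locks this thread currently holds */
};

void checked_lock_acquire(struct thread *t, struct lock *l) {
    assert(t->locks_held == 0);       /* refuse to request a lock while holding one */
    lock_acquire(l);
    t->locks_held++;
}

void checked_lock_release(struct thread *t, struct lock *l) {
    lock_release(l);
    t->locks_held--;
}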

24
Q

Describe two changes to the OS virtual memory system that you would need or might want to make to accommodate 48-bit virtual and physical addresses. For each, describe how it would affect the computation or memory overhead of virtual to physical translation.

A

Answer options:

  1. Page table entry size. You clearly need to do something about your page table entries, since whatever you did to bit-pack for 32-bit virtual addresses is going to change with 48-bit virtual addresses. Almost inevitably, you’re going to end up storing more state.
  2. Address space size. You don’t necessarily have to change the address space size, since you can simply fit more content from a 32-bit-wide address space into a system with 48-bit virtual addresses. You may want to change the breakpoint between userspace and the kernel, however, since there is no point wasting any of the 4 GB-wide virtual address space on the kernel at this point. Everything from 0x0 to 0xFFFFFFFF should be used for the process, with kernel addresses outside of that range. This doesn’t necessarily have any direct effect on the overheads, although it may affect the size of other data structures (PTEs or page tables).
  3. Larger pages. You may really want to increase the 4K page size at this point, with the concordant tradeoff between TLB faults and spatial locality. Larger pages may reduce the number of translations required.
  4. Different page table structures. It could be time to add a third level to the existing two-level page tables, particularly if you decide to expand the address space size. The goal would be to reduce the amount of space required for the page tables for common address space layout patterns.
  5. MMU changes. Clearly the MMU and TLB need to be modified to help map 48-bit virtual addresses to 48-bit physical addresses. No effect on the kernel overheads here, but any speculation about the effect on hardware (wider addresses might imply fewer entries) would be accepted.
25
Q

Describe how a scheduling algorithm might improve resource allocation by observing processes communicate using pipes. What is a tradeoff involved in this decision?

A

Assume I have the following shell pipeline: foo | bar. Due to the pipe between them, before each run we know that bar cannot make much progress until foo runs and begins to fill the pipe. So we could just run foo to completion, buffer all of its output in the pipe, and then run bar to completion. This would minimize the context switches between them.

However, the tradeoff here is that this requires a large amount of memory - or some kind of storage - to buffer the pipe contents. Switching between the processes can help reduce memory overhead by keeping the pipe size smaller. Ideally, we could run them on separate cores, but we might still need to schedule them carefully so that bar has enough work to do when it runs to justify the context switch, while foo doesn’t fill the buffer too quickly.
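
A minimal sketch of the foo | bar situation, collapsed into one program for illustration (not how the shell actually sets it up): the parent plays foo and writes into the pipe, the child plays bar and reads from it. Once the kernel’s pipe buffer fills, write() blocks until the reader drains it, which is exactly what forces the scheduler to interleave the two.

#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fds[2];
    char buf[4096] = {0};               /* contents don't matter for the illustration */
    pipe(fds);
    if (fork() == 0) {                  /* child: "bar", the consumer                 */
        close(fds[1]);
        while (read(fds[0], buf, sizeof(buf)) > 0) { /* consume and discard */ }
        _exit(0);
    }
    close(fds[0]);                      /* parent: "foo", the producer                */
    for (int i = 0; i < 1000; i++) {
        write(fds[1], buf, sizeof(buf)); /* blocks whenever the pipe buffer is full   */
    }
    close(fds[1]);
    wait(NULL);
    return 0;
}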

26
Q

Some systems provide separate exception handlers for TLB-faults and all other kinds of exceptions, allowing the operating system to handle them separately. First, describe why this might be a useful feature for the hardware to provide. Second, suggest one way that the operating system might take advantage of this differentiation.

A

The reason to have separate exceptions is to allow the OS to establish separate code paths - this is kind of given away by the second part of the question. This is particularly important given that TLB exceptions are extremely common and, if you improve performance on this hot code path, then the system will run faster overall.

One way that the OS could take advantage of this differentiation is by designing a TLB handler that doesn’t require saving all of the state required by the system call path. In general, system calls and hardware interrupts can branch off into very long code paths, meaning that it is reasonable to save all of the registers up front. However, the TLB code path is extremely specific, and so might be able to avoid saving as much state - at least initially. If it branches off into a page fault, then more state might need to be saved, but that could be done at that point.

27
Q

KNOW HOW TO DO BYTE-ADDRESSABLE MEMORY TRANSLATIONS USING TOP-LEVEL PAGE TABLES (TLT) AND SECOND-LEVEL PAGE TABLES (SLT).

A

See 2015 Midterm Q7

28
Q

Long Q:
INTERFACE SIZE TRADEOFFS. Linux and other UNIX variants provide a fairly thin system call interface, consisting of roughly 300 system calls. You are now familiar with a subset of these calls, particularly those you implemented for ASST2. In contrast, the Windows kernel interface contains approximately 3000.
First, describe the tradeoff between small and large system call interfaces. What is good about a thin interface? How about a thick interface? Second, provide three examples illustrating the problems with the thin system call interface you are familiar with. For each, describe how adding additional system calls would help, and what their interface would be.
Finally, for one of your new system calls describe the operating system changes that would be required to implement it.

A

See 2015 midterm, page 12

29
Q

Long:
CONCURRENT SYSTEM CALLS. As core counts have increased, the OS has become a source of potential bottlenecks for multithreaded applications. One source of poor scaling behavior is when two system calls cannot be executed concurrently, and in some cases redesigning the system call interface can improve concurrency and performance on multicore systems.

Consider the code below and a multithreaded application where, on an N core machine, N threads run openclose. First, describe the OS performance bottleneck that this code might encounter on a multicore system. Note that open is implemented to return the lowest available file descriptor, but very few apps rely on this behavior. Be sure to describe the source of the bottleneck clearly. Second, describe how to alter the system call interface to remove this particular performance bottleneck.

void openclose(void) {
    /* repeatedly open and close a file; the path here is just a placeholder */
    while (1) {
        close(open("somefile", O_RDONLY));
    }
}
A

See 2015 midterm page 14.
Quick notes:
Bottleneck is the linear search for the lowest available file descriptor in the shared per-process file table. Change the interface so that open doesn’t have to return the lowest-available FD. This allows us to search for a free FD starting at a random position (not 0 every time) and allows us to apply something called read-lock-read optimization (?)
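
A rough sketch contrasting the two search strategies (the table layout and NFD are invented for illustration; locking is omitted):

#define NFD 1024

/* POSIX-style behavior: always scan from 0, so every open() in every thread
 * contends over the same low slots of the shared table. */
int alloc_fd_lowest(char used[NFD]) {
    for (int fd = 0; fd < NFD; fd++) {
        if (!used[fd]) { used[fd] = 1; return fd; }
    }
    return -1;
}

/* Relaxed behavior: start the scan at a caller-supplied (random or per-thread)
 * position, so concurrent open() calls usually touch different parts of the
 * table and contend far less. */
int alloc_fd_any(char used[NFD], int start) {
    for (int i = 0; i < NFD; i++) {
        int fd = (start + i) % NFD;
        if (!used[fd]) { used[fd] = 1; return fd; }
    }
    return -1;
}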

30
Q

Which of the following instructions can cause an exception?

[addiu; lw; syscall; All of them]

A

All of them

31
Q

What interface does something have to support to look like memory?
[lock() and unlock(); malloc() and free(); fork() and exec(); load and store]

A

Load and store

32
Q

Acquiring a lock will never create a synchronization problem (true or false)

A

False

33
Q

Which of the following is NOT a requirement for deadlock?
[Multiple independent resource requests; A linear dependency graph; Protected access to shared resources; No resource preemption]

A

A linear dependency graph

34
Q

Threads can be implemented without kernel support (true or false)?

A

True

35
Q

Which of the following is the simplest scheduling algorithm to implement?
[Rotating staircase; Multi-level feedback queue; Round-robin; Random]

A

Random

36
Q

Which of the following is NOT an example of an operating system mechanism?
[A context switch; Loading a virtual address into the TLB; Locks and semaphores; Prioritizing interactive threads]

A

Prioritizing interactive threads

37
Q

All of the following are a good fit for the address space abstraction EXCEPT
[virtual-to-physical address translation; sbrk; the TLB; base-and-bounds address translation]

A

base-and-bounds address translation

38
Q

We have discussed several cases where operating systems provide a useful illusion to processes. Name one, describe why it is useful, and briefly explain how it is provided, identifying any hardware cooperation required.

A
  1. CONCURRENCY. On single-core systems concurrency is an illusion, since there are never really two threads running at once. This is useful since it provides users with the illusion that multiple things are happening at once - music is playing while the web page is updating while they are typing into the terminal. Concurrency is provided by rapidly switching between threads, fast enough to defeat human perceptual limitations. This requires the ability to perform context switches between threads, stopping one’s use of the processor to allow another to continue, and then returning the first to exactly where it was stopped.
  2. ATOMICITY. When we discussed synchronization we identified cases where we wanted to make multiple actions that actually occurred separately happen all at once, or atomically. An example is a modification to a shared data structure that requires multiple operations that should all happen at once - imagine adding an entry to a synchronized linked list, which requires updating multiple pointers (see the sketch after this list). Atomicity is more than useful, and in certain cases (like our example) is actually required for correctness. One way we provide the illusion of atomicity is by locking shared data structures, which forces other users to sleep while we make our modifications, allowing them to all look atomic.
  3. ADDRESS SPACES. Address spaces comprise multiple illusions: that processes have access to a large (1) amount of private (2) contiguous (3) memory (4). In reality, (1) the size of the address space may be larger than the amount of available physical memory. In addition, the same physical memory may be temporally multiplexed between processes, making it not entirely private (2). And it is certainly not contiguous, with any virtual page mapping to any physical page (3), and may not even be memory at all (4) if the page has been swapped to disk or the virtual address is set up to point to a file by mmap. But the address space abstraction is useful because it simplifies how processes deal with memory. They can lay out their memory the same way each time and logically separate regions, such as the stack and heap, in ways that allow them to grow dynamically. At some level all of these illusions are provided by translating virtual addresses into physical addresses, with the extra level of indirection allowing the OS to move memory around and reuse it for other purposes as long as the virtual addresses obey the memory interface.
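
A small example of the atomicity illusion from item 2, assuming an OS/161-style lock (the list and lock declarations below are illustrative):

struct lock;                          /* provided by the kernel's locking code */
void lock_acquire(struct lock *);
void lock_release(struct lock *);

struct node {
    int value;
    struct node *next;
};

static struct node *head;
static struct lock *list_lock;

void list_push(struct node *n) {
    lock_acquire(list_lock);
    n->next = head;                   /* update 1: not atomic on its own            */
    head = n;                         /* update 2: not atomic on its own            */
    lock_release(list_lock);          /* together they look atomic to every other   */
}                                     /* thread that also acquires list_lock        */
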
39
Q

So far we have discussed memory as being uniform, meaning that from the perspective of the core (or cores) the memory access time is constant for each byte of physical memory. On some systems, however, this assumption fails and access time varies with location; we refer to these systems as having a NUMA (non-uniform memory access) design.

Consider a NUMA system where each core has access to a small amount of relatively fast physical memory and a larger amount of relatively slow physical memory. Assume that memory management hardware can map process virtual addresses to either part of physical memory. How does this design complicate the address space abstraction? At a high level, describe a way to make use of this NUMA property. (One of the systems design principles we have discussed may come in handy.)

A

The complication is that NUMA challenges the address space assumption that all memory is uniform. Now, some parts of each process’s address space will be faster than others, and if care is not taken these faster parts may change over time as pages move to different regions of physical memory.

The simplest approach to using the faster memory is to treat it as a cache for the slower memory, leveraging our design principle of viewing the machine as a series of caches. The operating system should try to put frequently-used pages in the fast memory and less-frequently used pages in the slow memory, which also requires some way of tracking page usage.
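
A very rough sketch of that policy (all names, thresholds, and the access-count bookkeeping are invented; how usage is actually tracked depends on the hardware):

#define HOT_THRESHOLD 64              /* arbitrary cutoff for this sketch */

struct page_info {
    unsigned accesses;                /* accesses observed since the last scan */
    int in_fast_region;               /* currently placed in fast memory?      */
};

void rebalance(struct page_info *pages, int npages) {
    for (int i = 0; i < npages; i++) {
        if (pages[i].accesses >= HOT_THRESHOLD && !pages[i].in_fast_region) {
            /* here the kernel would copy the page into fast memory and
             * update its translation; this sketch just records the decision */
            pages[i].in_fast_region = 1;
        } else if (pages[i].accesses < HOT_THRESHOLD && pages[i].in_fast_region) {
            /* and here it would evict the page back to slow memory */
            pages[i].in_fast_region = 0;
        }
        pages[i].accesses = 0;        /* start counting again for the next interval */
    }
}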

40
Q

Describe two scheduling algorithms, only one of which can be from the “know-nothing” group. For each, provide a one sentence explanation and describe one pro or con of the approach.

A
  1. Random. (This is from the know-nothing group.) Choose a thread at random. Pros: simple and serves as a good comparison point. Cons: too simple and unlikely to produce good performance.
  2. Round-Robin. (This is from the know-nothing group.) Simply establish an ordering between threads and run them in that order. Pros: also simple. Cons: also unlikely to produce good performance, doesn’t reward interactivity, doesn’t incorporate priorities, etc.
  3. Shortest-Job First. (This is from the know-it-all group.) Order jobs by how long they take to complete and run them until they do. Pros: minimizes average waiting time. Cons: can’t be implemented, also doesn’t time-share between tasks.
  4. Shortest-Remaining Time First. (This is from the know-it-all group.) Order jobs by how long they will run before they block and run them in that order. Pros: also does a good job at reducing waiting time and improving interactive performance. Cons: can’t be implemented, and may starve long-running non-interactive threads.
  5. Multi-Level Feedback Queues. Maintain a list of priority queues. Jobs run round-robin (or random) and always from the top queue first. If a job runs to the end of its quantum, it may be demoted to a lower priority level; if it completes early, it may be promoted to a higher priority level. Pros: rewards interactive use. Cons: may starve long-running background tasks, and all solutions to this issue are somewhat ugly.
  6. Rotating Staircase Deadline Scheduler. I’m not going to try to describe this briefly; instead, refer to the lecture notes. Pros are that it can provide guarantees about interactive response time. The only con that I can think of is that it never made it into mainline Linux, but I’m sure that there are others.
41
Q

BE ABLE TO DO LOAD, STORE, FETCH (LOAD AND EXECUTE) TASKS USING VIRTUAL PAGE NUMBERS, PHYSICAL PAGE NUMBERS, PERMISSIONS, AND VIRTUAL ADDRESSES.

A

SEE 2014 MIDTERM number 7

42
Q

Long:
Interactivity Detection on Smartphones. Smartphones are the fastest growing computing platform, with most now running platforms built on top of operating systems with roots in desktop computers such as Linux (Android) and Windows. Just as we discussed in class, performing effective thread scheduling on mobile devices can have a big impact on performance. Mobile smartphones, however, have some important differences from older computing devices that impact the scheduling process, particularly when considering interactivity.

First, describe the interactivity detection problem and explain why this information is generally important to thread scheduling. Continue by presenting two reasons why interactivity detection could be even more important on mobile smartphones. You will want to consider patterns of use, changes in the environment caused by mobility, and constraints specific to smartphones.

Second, consider how the differences between smartphones and traditional devices affect the interactivity detection problem and propose a new approach to interactivity detection that responds to these differences. You will want to think about what is different about how users interact with mobile phones, as well as the different features on these devices compared to laptops and desktops.

A

Schedulers face the challenge of determining when users are waiting, in some form or another, for a task to finish: to animate a cursor or draw a character glyph in response to input, as one example, or to render a page that has been retrieved from the Internet as another. Because users are aware of the performance of interactive tasks, but not aware of the performance of others, schedulers may want to allocate these interactive processes more resources or provide them prioritized access to the processor.

Smartphones are a unique computing device that is both (a) always on and (b) battery powered. Desktops may be left on but are not battery powered, while laptops are battery-powered but usually shut down when not in use and moved around. These two features combine to produce a device that has both an important resource limitation (energy) and the potential for a large amount of background usage. This makes it even more important that smartphone schedulers identify interactive tasks and ensure that non-interactive work does not drain the device’s battery. Adding to this challenge is the fact that smartphones face a constantly-changing environment caused by mobility: one minute they have a connection over a high-speed and energy-efficient Wifi network, the next they are forced to use a slower and more power-hungry 3G mobile data network. These changes create the potential for non-interactive background tasks to do even more damage to device lifetimes if they are allowed to use the smartphone inefficiently.

There were several different ways to incorporate the unique features and interaction patterns of smartphones into interactivity detection within the smartphone thread scheduler. Two important aspects of smartphones that you may have noticed and utilized:

• Unlike computing devices utilizing windowing environments allowing multiple apps to be present on the screen at one time, the limited size of smartphone displays means that users are usually only interacting with a single app at one time. While this doesn’t eliminate the interactivity detection problem entirely (apps may have multiple threads, some performing interaction, others doing background jobs), it does reduce the problem to determining which of the threads of the foreground app are interactive and which are not. In addition, when the screen is off you can probably assume that nothing interactive is going on, although you can’t shut down work altogether because apps deliver notifications to users by performing background tasks and may be doing other useful things to optimize the foreground experience. But if the user isn’t looking at the device they probably aren’t waiting on it!

• Smartphones also feature a number of sensors that may simplify the interactivity detection problem. Onboard cameras may be able to determine whether the user is looking at the device and use that to help guess if they are waiting on an action to complete. Orientation sensors may detect movement associated with gameplay and trigger additional performance for that highly-interactive set of apps.

That said, since the point of this question was to get you to think creatively we will definitely accept other reasonable answers. If you really have a great idea about how to solve this problem, why not come join the blue Systems Research Group and try your ideas out for real on the hundreds of smartphones deployed as part of PHONELAB?

43
Q

Long:
Jumbo Pages. While operating system pages have traditionally been 4K, some modern operating systems support “jumbo” pages as large as 64K. Based on your excellent and fast implementation of virtual memory for CSE 421 ASST3 you are hired as a kernel developer for the new Lindows operating system company. Unfortunately, your boss didn’t take CSE 421 and derives most of his understanding of operating systems from the movie “Her”. At present, Lindows does not support jumbo pages, but once your boss hears about them he is desperate to include them in Lindows version 0.0.0.2. He asks you for help.

First, explain how and in what cases 64K pages would improve or degrade OS performance. What information about virtual memory use could help the OS decide whether to locate content on a jumbo or regular-sized page? Second, explain how, in certain cases, you can implement jumbo-page-like functionality on top of an existing system that supports 4K pages without changing the underlying memory management hardware. What MMU features are required for this to work? Which benefits of jumbo pages are preserved or lost by your approach?

A

First, how and in what cases do 64K pages improve or degrade performance? This is related to the discussion we had in class regarding page size. A straightforward benefit of 64K pages is that they allow the TLB to map more memory with the same number of entries, potentially leading to fewer TLB misses and associated TLB faults. Since 64K pages are larger than 4K ones, the question reduces to: in what cases do larger pages help and hurt? They help when there is sufficient spatial locality in memory accesses within the contents of the 64K page and they hurt when there is not.

One way to think about it is: once I touch one byte of memory on the 64K page, what is the likelihood I will touch bytes on the 15 other 4K pages inside the 64K page? If it’s high, I might as well make room for the whole page as soon as I touch any byte inside of it; if it’s low, I’m going to be moving a lot of memory around without any gain over locating the byte on a standard 4K page. So spatial locality is what the OS would like to know. Maybe the best way of finding out would be to ask the process to tell me directly, so if it has a data structure that spans multiple 4K pages it would be better stored on a 64K page. (In addition, I might be able to tell from accesses on groups of 4K pages that they really all belong to one larger page, but it’s not clear if the content can be relocated at this point.)

So how do we implement pseudo-64K pages without actual hardware support, i.e. on 4K underlying pages? Easy: whenever we have a TLB fault within a 64K region identified as a 64K page we load entries (and page contents) for not only the 4K page that caused the fault but for all of the other 15 pages within the 64K page. In a way, we can be even more flexible than real 64K pages, since we could load N extra pages on each side of the faulting page (for example, although 2N + 1 is hard to make even). Clearly this requires a software-managed TLB to do in all cases, since a hardware-managed TLB will load the page translation when it is in memory and not alert the OS. (You could still implement the same behavior on page faults, but that’s not exhaustive enough to cover all cases.)

The benefits of 64K pages that we preserve are fewer TLB faults, since all 15 extra 4K pages on the 64K page are now loaded in the TLB and into memory. Assuming sufficient spatial locality, this is a good thing! The benefit that is lost is that the amount of memory that the TLB can map is still limited by the underlying 4K page size.
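
A minimal sketch of the software-managed-TLB trick described above: on a TLB fault inside a region marked as a 64K “jumbo” page, preload translations for all sixteen underlying 4K pages. The two helpers below are stand-ins for whatever the real kernel provides (OS/161’s actual TLB and page-table interfaces differ):

#define PAGE_SIZE  4096
#define JUMBO_SIZE (16 * PAGE_SIZE)

unsigned lookup_page_table(unsigned vpage);        /* hypothetical: virtual page base -> physical page base */
void     tlb_load(unsigned vpage, unsigned ppage); /* hypothetical: insert one translation into the TLB     */

void jumbo_tlb_fault(unsigned faultaddr) {
    unsigned base = faultaddr & ~(JUMBO_SIZE - 1); /* start of the surrounding 64K region */
    for (int i = 0; i < 16; i++) {
        unsigned vpage = base + i * PAGE_SIZE;
        tlb_load(vpage, lookup_page_table(vpage)); /* load one 4K translation */
    }
}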