Input/Output Flashcards
Advances in chip technology have made it possible to put an entire controller, including all the bus access logic, on an inexpensive chip. How does that affect the model of Fig. 1-6 (https://imgur.com/a/Hvu8wNk)?
(TUT11)
In the figure, we see controllers and devices as separate units. The reason is to allow a controller to handle multiple devices, and thus eliminate the need for having a controller per device. If controllers become almost free, then it will be simpler just to build the controller into the device itself. This design will also allow multiple transfers in parallel and thus give better performance.
Given the speeds listed in Fig. 5-1 (https://imgur.com/a/ElwlSlB), is it possible to scan documents from a scanner and transmit them over an 802.11g network at full speed? Defend your answer.
(TUT11)
The 802.11g specification is a standard for wireless local area networks (WLANs) that offers transmission over relatively short distances at up to 54 megabits per second (Mbps), compared with the 11 Mbps theoretical maximum of the earlier 802.11b standard.
Scanner: according to the table, its data rate is 1 MB/sec, which is 8 Mbps. Since 8 Mbps is well below the 54 Mbps that 802.11g offers, documents can be scanned and transmitted at the scanner's full speed; the network is not the bottleneck.
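As a quick sanity check on those numbers (a minimal Python sketch; the 1 MB/sec and 54 Mbps figures are the ones quoted above):

```python
# Compare the scanner's output rate with the 802.11g link rate.
scanner_rate_bps = 1 * 8 * 10**6   # 1 MB/sec from the table, expressed in bits/sec
wifi_g_rate_bps = 54 * 10**6       # nominal 802.11g rate: 54 Mbit/sec

utilization = scanner_rate_bps / wifi_g_rate_bps
print(f"Scanner uses {utilization:.1%} of the link capacity")  # about 14.8%
```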
Figure 5-3(b) (https://imgur.com/a/SkOW6LD) shows one way of having memory-mapped I/O even in the presence of separate buses for memory and I/O devices, namely, to first try the memory bus and if that fails try the I/O bus. A clever computer science student has thought of an improvement on this idea: try both in parallel, to speed up the process of accessing I/O devices. What do you think of this idea?
(TUT11)
Consider a normal memory reference. The memory bus finishes first, but the I/O bus is still busy. If the CPU waits until the I/O bus finishes, memory performance is reduced. If it just puts the second reference on the memory bus alone, that reference will fail if it turns out to be to an I/O device. If there were some way to instantaneously cancel the outstanding I/O bus reference in order to issue the second one, the scheme might work, but no such option exists. All in all, it is a bad idea.
Explain the tradeoffs between precise and imprecise interrupts on a superscalar machine.
An advantage of precise interrupts is the simplicity of the code in the operating system, since the machine state is well defined. With imprecise interrupts, on the other hand, OS writers have to figure out which instructions have been partially executed and how far they got. However, precise interrupts increase the complexity of the chip design and the chip area, which may result in a slower CPU.
A DMA controller has five channels. The controller is capable of requesting a 32-bit word every 40 nsec. A response takes equally long. How fast does the bus have to be to avoid being a bottleneck?
(TUT11)
Given:
- 5 channels.
- A 32-bit word is requested every 40 nsec; the response takes 40 nsec as well.
The round-trip time for a request is 80 nsec. The data transfer rate is: 32 bits / 80 nsec = 4 bits / 10 nsec = 4 × 10⁸ bits/sec = 400 Mbit/sec = 50 MB/sec.
This is the speed that the bus should have to avoid being a bottleneck. Having five channels is irrelevant, since each request and response takes the bus.
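The same arithmetic as a minimal Python sketch (the 40 nsec request and response times come straight from the question):

```python
# Required bus bandwidth so the DMA controller is never kept waiting.
word_bits = 32
request_ns = 40            # time to issue a request for one word
response_ns = 40           # time for the response
round_trip_ns = request_ns + response_ns

rate_bits_per_sec = word_bits / (round_trip_ns * 1e-9)
print(f"{rate_bits_per_sec / 1e6:.0f} Mbit/sec "
      f"= {rate_bits_per_sec / 8 / 1e6:.0f} MB/sec")   # 400 Mbit/sec = 50 MB/sec
```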
Suppose that a system uses DMA for data transfer from disk controller to main memory. Further assume that it takes t₁ nsec on average to acquire the bus and t₂ nsec to transfer one word over the bus (t₁ ≫ t₂). After the CPU has programmed the DMA controller, how long will it take to transfer 1000 words from the disk controller to main memory, if (a) word-at-a-time mode is used, (b) burst mode is used? Assume that commanding the disk controller requires acquiring the bus to send one word and acknowledging a transfer also requires acquiring the bus to send one word.
(TUT11)
Word-at-a-time mode: 1000 × [(t₁ + t₂) + (t₁ + t₂) + (t₁ + t₂)], where the first term is for acquiring the bus and sending the command to the disk controller, the second term is for transferring the word, and the third term is for the acknowledgement. All in all, a total of 3000 × (t₁ + t₂) nsec.
Burst mode: (t₁ + t₂) + t₁ + (1000 × t₂) + (t₁ + t₂) where the first term is for acquiring the bus and sending the command to the disk controller, the second term is for the disk controller to acquire the bus, the third term is for the burst transfer, and the fourth term is for acquiring the bus and doing the acknowledgement. All in all, a total of (3 × t₁ + 1002 × t₂) nsec.
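A small sketch that evaluates both expressions; the concrete values of t₁ and t₂ below are illustrative placeholders chosen so that t₁ ≫ t₂, not values given in the question:

```python
# Total DMA transfer time for 1000 words, word-at-a-time vs. burst mode.
t1_ns = 500   # illustrative bus-acquisition time (t1 >> t2)
t2_ns = 10    # illustrative per-word transfer time
words = 1000

word_at_a_time_ns = words * 3 * (t1_ns + t2_ns)   # 3000 * (t1 + t2)
burst_ns = 3 * t1_ns + (words + 2) * t2_ns        # 3*t1 + 1002*t2

print(f"word-at-a-time: {word_at_a_time_ns} nsec, burst: {burst_ns} nsec")
```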
One mode that some DMA controllers use is to have the device controller send the word to the DMA controller, which then issues a second bus request to write to memory. How can this mode be used to perform memory to memory copy? Discuss any advantage or disadvantage of using this method instead of using the CPU to perform memory to memory copy.
Memory-to-memory copy can be performed by first issuing a read command that transfers the word from memory to the DMA controller and then issuing a write to memory that transfers the word from the DMA controller to a different address in memory. This method has the advantage that the CPU can do other useful work in parallel. The disadvantage is that this memory-to-memory copy is likely to be slow, since the DMA controller is much slower than the CPU and the data transfer takes place over the system bus as opposed to the dedicated CPU-memory bus.
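A toy sketch of the idea, assuming a controller that buffers one word at a time; the names and the list-based "memory" are purely illustrative, not a real DMA programming interface:

```python
# Toy model of a DMA controller that buffers one word: each word copied
# costs two bus transactions, a read into the controller and a write back
# out to the destination address.
memory = list(range(64))            # toy "physical memory" of 64 words

def dma_mem_to_mem_copy(src, dst, count):
    for i in range(count):
        latch = memory[src + i]     # bus request 1: memory -> DMA controller
        memory[dst + i] = latch     # bus request 2: DMA controller -> memory
    # on real hardware the controller would now interrupt the CPU

dma_mem_to_mem_copy(src=0, dst=32, count=8)
print(memory[32:40])                # [0, 1, 2, 3, 4, 5, 6, 7]
```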
Suppose that a computer can read or write a memory word in 5 nsec. Also suppose that when an interrupt occurs, all 32 CPU registers, plus the program counter and PSW are pushed onto the stack. What is the maximum number of interrupts per second this machine can process?
(TUT11)
An interrupt requires pushing the 32 CPU registers, the program counter, and the PSW onto the stack, a total of 34 words. Returning from the interrupt requires popping those same 34 words off the stack.
The maximum number of interrupts per second this machine can process can be calculated as follows:
Each of the 34 words is written once (push) and read once (pop), and each memory operation takes 5 nsec.
That is, a total of 68 memory operations at 5 nsec each, or 340 nsec (340 × 10⁻⁹ seconds) per interrupt, even before the handler does any work.
Max number of interrupts per second: 1 / (340 × 10⁻⁹) ≈ 2.94 × 10⁶.
Therefore, the machine can handle at most about 2.94 million interrupts per second.
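The same arithmetic as a minimal sketch:

```python
# Upper bound on interrupt rate, counting only the stack traffic.
words_saved = 32 + 1 + 1                      # 32 registers + PC + PSW
word_access_ns = 5                            # one memory read or write
cost_ns = words_saved * word_access_ns * 2    # push on entry + pop on return

max_interrupts_per_sec = 1 / (cost_ns * 1e-9)
print(f"{max_interrupts_per_sec / 1e6:.2f} million interrupts/sec")  # ~2.94
```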
CPU architects know that operating system writers hate imprecise interrupts. One way to please the OS folks is for the CPU to stop issuing new instructions when an interrupt is signaled, but allow all the instructions currently being executed to finish, then force the interrupt. Does this approach have any disadvantages? Explain your answer.
The execution rate of a modern CPU is determined by the number of instructions that finish per second and has little to do with how long an instruction takes. If a CPU can finish 1 billion instructions/sec it is a 1000 MIPS machine, even if an instruction takes 30 nsec. Thus there is generally little attempt to make instructions finish quickly. Holding the interrupt until the last instruction currently executing finishes may increase the latency of interrupts appreciably. Furthermore, some administration is required to get this right.
In Fig. 5-9(b) (https://imgur.com/a/bqyKGWN), the interrupt is not acknowledged until after the next character has been output to the printer. Could it have equally well been acknowledged right at the start of the interrupt service procedure? If so, give one reason for doing it at the end, as in the text. If not, why not?
It could have been done at the start. A reason for doing it at the end is that the code of the interrupt service procedure is very short. By first outputting another character and then acknowledging the interrupt, if another interrupt happens immediately, the printer will be working during the interrupt, making it print slightly faster. A disadvantage of this approach is slightly longer dead time when other interrupts may be disabled.
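A rough skeleton of the handler with the acknowledgement placed at the end, as described above; the function and field names are illustrative stand-ins, not the identifiers used in Fig. 5-9(b):

```python
# Skeleton of the printer interrupt handler: hand the next character to the
# printer first and acknowledge the interrupt afterwards, so the printer is
# already busy printing while the acknowledgement and return happen.
def printer_interrupt_handler(printer, buf, state):
    if state["count"] == 0:
        state["done"] = True                     # buffer finished; unblock the user
    else:
        printer.write_data(buf[state["next"]])   # start printing the next character
        state["next"] += 1
        state["count"] -= 1
    printer.acknowledge_interrupt()              # ack at the end, as in the text
```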
A computer has a three-stage pipeline as shown in Fig. 1-7(a) (https://imgur.com/a/84sjjLa). On each clock cycle, one new instruction is fetched from memory at the address pointed to by the PC and put into the pipeline and the PC advanced. Each instruction occupies exactly one memory word. The instructions already in the pipeline are each advanced one stage. When an interrupt occurs, the current PC is pushed onto the stack, and the PC is set to the address of the interrupt handler. Then the pipeline is shifted right one stage and the first instruction of the interrupt handler is fetched into the pipeline. Does this machine have precise interrupts? Defend your answer.
Yes. The stacked PC points to the first instruction not yet fetched. All instructions before that one have been executed, and the instruction pointed to and its successors have not been executed, which is exactly the condition for precise interrupts. Precise interrupts are not hard to achieve on a machine with a single pipeline; the trouble comes when instructions are executed out of order, which is not the case here.
A typical printed page of text contains 50 lines of 80 characters each. Imagine that a certain printer can print 6 pages per minute and that the time to write a character to the printer’s output register is so short it can be ignored. Does it make sense to run this printer using interrupt-driven I/O if each character printed requires an interrupt that takes 50 μsec all-in to service?
The printer prints 50 × 80 × 6 = 24,000 characters/min, which is 400 characters/sec. Each character uses 50 μsec of CPU time for the interrupt, so collectively in each second the interrupt overhead is 20 msec. Using interrupt-driven I/O, the remaining 980 msec of time is available for other work. In other words, the interrupt overhead costs only 2% of the CPU, which will hardly affect the running program at all.
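A quick check of the overhead figure:

```python
# CPU overhead of interrupt-driven printing at 6 pages/min.
chars_per_sec = 50 * 80 * 6 / 60          # 24,000 chars/min = 400 chars/sec
interrupt_cost_s = 50e-6                  # 50 usec per character, all-in

overhead = chars_per_sec * interrupt_cost_s
print(f"Interrupt overhead: {overhead:.1%} of the CPU")   # 2.0%
```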
Explain how an OS can facilitate installation of a new device without any need for recompiling the OS.
UNIX does it as follows. There is a table indexed by device number, with each table entry being a C struct containing pointers to the functions for opening, closing, reading, and writing the device, plus a few other operations. To install a new device, a new entry is made in this table and the pointers are filled in, usually pointing to the newly loaded device driver.
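A minimal Python stand-in for the C struct table described above (the names and the "null device" driver are illustrative):

```python
# A "device switch" table: device number -> driver operations.  Installing
# a new device just adds an entry pointing at the newly loaded driver's
# functions; nothing in the kernel has to be recompiled.
device_switch = {}

def install_driver(dev_num, ops):
    device_switch[dev_num] = ops        # done when the driver module is loaded

def dev_read(dev_num, *args):
    return device_switch[dev_num]["read"](*args)   # dispatch through the table

# Illustrative "null device" driver being installed as device number 3.
install_driver(3, {
    "open":  lambda: None,
    "close": lambda: None,
    "read":  lambda *a: b"",
    "write": lambda data: len(data),
})
print(dev_read(3))   # b''
```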
In which of the four I/O software layers (https://imgur.com/a/1l5aHDj) is each of the following done?
a) Computing the track, sector, and head for a disk read.
b) Writing commands to the device registers.
c) Checking to see if the user is permitted to use the device.
d) Converting binary integers to ASCII for printing.
(TUT11)
a) Computing the track, sector, and head for a disk read: Device driver
b) Writing commands to the device registers: Device driver
c) Checking to see if the user is permitted to use the device: Device-independent software
d) Converting binary integers to ASCII for printing: User-level software
A local area network is used as follows. The user issues a system call to write data packets to the network. The operating system then copies the data to a kernel buffer. Then it copies the data to the network controller board. When all the bytes are safely inside the controller, they are sent over the network at a rate of 10 megabits/sec. The receiving network controller stores each bit a microsecond after it is sent. When the last bit arrives, the destination CPU is interrupted, and the kernel copies the newly arrived packet to a kernel buffer to inspect it. Once it has figured out which user the packet is for, the kernel copies the data to the user space. If we assume that each interrupt and its associated processing takes 1 msec, that packets are 1024 bytes (ignore the headers), and that copying a byte takes 1 μsec, what is the maximum rate at which one process can pump data to another? Assume that the sender is blocked until the work is finished at the receiving side and an acknowledgement comes back. For simplicity, assume that the time to get the acknowledgement back is so small it can be ignored.
A packet must be copied four times during this process (user space to kernel buffer, kernel buffer to the sending controller, the receiving controller to a kernel buffer, and kernel buffer to user space), which at 1 μsec per byte takes about 4.1 msec. There are also two interrupts, which account for 2 msec. Finally, the transmission time is about 0.83 msec, for a total of 6.93 msec per 1024 bytes. The maximum data rate is thus about 147,763 bytes/sec, or roughly 12% of the nominal 10 megabit/sec network capacity. (If we include protocol overhead, the figures get even worse.)
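The same calculation as a short sketch:

```python
# End-to-end time for one 1024-byte packet and the resulting data rate.
packet_bytes = 1024
copy_s = 4 * packet_bytes * 1e-6          # four copies at 1 usec/byte
interrupt_s = 2 * 1e-3                    # one interrupt at each end
transmit_s = packet_bytes * 8 / 10e6      # 10 Mbit/sec link

total_s = copy_s + interrupt_s + transmit_s
# ~148,000 bytes/sec; the 147,763 above comes from rounding the total to 6.93 msec
print(f"{packet_bytes / total_s:,.0f} bytes/sec")
```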