Long Questions Flashcards

1
Q

Name and discuss two strategies for increasing the speed of execution of the data path. Give examples of their implementation.

A
  • Reduce the number of clock cycles needed to execute an instruction:
    The number of clock cycles needed to execute a set of operations is called the path length. The path length can be reduced by adding hardware; for example, a dedicated incrementer for PC eliminates a cycle, since the ALU no longer has to be used to advance PC.
  • Simplify the organization so that the clock cycle can be shorter:
    With less work forced into each cycle, the cycle itself can be made shorter. Reducing the time spent fetching instructions, for example, requires more than just adding an incrementer to PC; the fetch logic itself must be streamlined.
  • Overlap the execution of instructions (use pipelining):
    The circuitry for fetching instructions can be separated out if that unit is functionally independent of the main data path. It can then fetch the next opcode or operand on its own, perhaps even fetching one or more instructions ahead, in parallel with execution of the current one.
2
Q

What are the considerations when values are copied from an 8-bit memory port onto a 32-bit bus? Pay attention to signed and unsigned numbers.

A

Values can be copied from an 8-bit memory port onto a 32-bit bus in one of two ways.
* When an unsigned value is required, the 32-bit word placed on the bus contains the 8-bit value in the low-order 8 bits and zeros in the upper 24 bits (zero extension).
* When a signed value is needed, the 8-bit value again occupies the low-order 8 bits, but its sign bit is duplicated into the upper 24 bit positions of the bus; this process is called sign extension.
* The upper 24 bits are then either all zeros or all ones, depending on whether the leftmost bit of the 8-bit value is a 0 or a 1.
* Whether the 8-bit value is converted to an unsigned or a signed 32-bit value on the bus is thus determined by which of the two control signals is asserted.
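The two cases above can be sketched in a few lines of Python; `extend_8_to_32` is a hypothetical name for illustration, not part of any real API.

```python
def extend_8_to_32(value, signed):
    """Copy an 8-bit value from a memory port onto a 32-bit bus."""
    value &= 0xFF                      # only the low 8 bits come from the port
    if signed and (value & 0x80):      # leftmost bit of the byte is 1
        return value | 0xFFFFFF00      # sign extension: duplicate the sign bit
    return value                       # upper 24 bits stay zero

# 0xFF is 255 unsigned, but -1 as a signed byte (0xFFFFFFFF in two's complement)
assert extend_8_to_32(0xFF, signed=False) == 0x000000FF
assert extend_8_to_32(0xFF, signed=True)  == 0xFFFFFFFF
assert extend_8_to_32(0x7F, signed=True)  == 0x0000007F
```

The `signed` flag plays the role of the asserted control signal.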

3
Q

RAW dependencies are also called “true dependencies”. Motivate this with reference to other instruction dependencies and also say what can be done to mitigate these other dependencies.

A
  • A RAW (Read After Write) dependency, also called a hazard or true dependency, means an instruction must wait for the result of a previous step to be computed before that result can be read.
  • RAW is called a TRUE dependency because the consumer genuinely needs the produced value; no amount of renaming or reordering can remove the need to wait for it.

There are two other dependencies:
* WAR (Write After Read): an earlier instruction is still reading a value, so a later instruction must not overwrite it yet.
* WAW (Write After Write): a previous result destined for the same place is still being written, so a later write must not be issued until it completes, or results would land out of order.

  • These two dependencies (WAR and WAP;WAW) are resource conflicts: conflicts over a storage location (a register name) rather than over genuine data flow.
  • A way to deal with resource conflicts at the hardware level is to introduce shadow registers (hidden registers that can be remapped to stand in the place of another register), a technique known as register renaming.
  • A dependency means an instruction cannot execute until another instruction has provided a value it requires; it forces the instruction to wait.
  • To get around this, some CPUs skip over dependent instructions to execute independent later instructions (out-of-order execution). But the CPU must always produce the same effect as if the program had been executed in order.
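The three dependency types above can be detected mechanically from each instruction's read and write sets. This is a minimal sketch; the tuple format and the name `classify_hazards` are illustrative assumptions, not a real scoreboard implementation.

```python
def classify_hazards(instructions):
    """Detect RAW/WAR/WAW dependencies between ordered instruction pairs.
    Each instruction is (name, reads, writes) with sets of register names."""
    hazards = []
    for i, (n1, r1, w1) in enumerate(instructions):
        for n2, r2, w2 in instructions[i + 1:]:
            if w1 & r2:
                hazards.append(("RAW", n1, n2))  # later reads what earlier writes
            if r1 & w2:
                hazards.append(("WAR", n1, n2))  # later writes what earlier reads
            if w1 & w2:
                hazards.append(("WAW", n1, n2))  # both write the same register
    return hazards

prog = [
    ("i1", {"r2", "r3"}, {"r1"}),   # r1 = r2 + r3
    ("i2", {"r1"},       {"r4"}),   # r4 = r1 * 2   -> RAW on r1 after i1
    ("i3", {"r5"},       {"r1"}),   # r1 = r5       -> WAR vs i2, WAW vs i1
]
assert set(classify_hazards(prog)) == {
    ("RAW", "i1", "i2"), ("WAR", "i2", "i3"), ("WAW", "i1", "i3"),
}
```

Renaming i3's destination to a shadow register would remove the WAR and WAW entries but could never remove the RAW one, which is why only RAW is "true".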
4
Q

Define and discuss internal and external fragmentation with reference to the memory allocation scheme.

A

When allocating memory in chunks, the choice of chunk size trades off external fragmentation against internal fragmentation.
* External fragmentation is space wasted external to the segments, in the holes between the segments.
* Enough total free memory may exist to satisfy a request, but it is not contiguous.
* This means that eventually no single hole will be big enough for an allocation, unless the external fragmentation is removed by compaction, which shuffles the contents of memory to merge the holes into larger blocks.

  • Internal fragmentation occurs when an allocated block is slightly larger than the requested memory; the size difference is memory internal to a partition that is not used (and cannot be reached by other requests).
  • It is space wasted internal to a page or chunk. Internal fragmentation is reduced by choosing smaller allocation units; compaction, by contrast, addresses external fragmentation by placing all the free memory together in one large block.
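Internal fragmentation is easy to quantify: it is the gap between what was requested and what the fixed-size blocks actually occupy. A small sketch (the function name is illustrative):

```python
def internal_fragmentation(request_sizes, block_size):
    """Bytes wasted inside allocated blocks when each request is rounded up
    to a whole number of fixed-size blocks (pages/chunks)."""
    wasted = 0
    for size in request_sizes:
        blocks = -(-size // block_size)          # ceiling division
        wasted += blocks * block_size - size     # unused tail of the last block
    return wasted

# hypothetical requests against 4 KiB blocks
assert internal_fragmentation([4096], 4096) == 0     # perfect fit, no waste
assert internal_fragmentation([1], 4096) == 4095     # almost a whole block lost
assert internal_fragmentation([5000, 100], 4096) == (8192 - 5000) + (4096 - 100)
```

Shrinking `block_size` reduces this waste, at the price of larger bookkeeping tables, which is exactly the trade-off described above.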
5
Q

Explain the conceptual differences between segmentation and paging. Also mention how, from the programmer’s perspective, segmentation faults differ from page faults. (Discuss the difference between segmentation and paging.)

A
  • A page fault is a fault the operating system can fix transparently (by bringing the missing page into memory), whereas a segmentation fault signals an illegal access that cannot be fixed, so the program is typically terminated.
  • Segmentation makes each procedure a separate segment. This removes the need for more than one physical copy in main memory of every shared procedure, and so saves memory.
  • Paging allocates physical memory to the process in fixed-size frames as it becomes available.
  • Segmentation is visible to the programmer, while paging is transparent.
  • Paging provides one linear address space, while segmentation provides multiple linear address spaces; for both paging and segmentation the virtual address space can be greater than the physical memory size.
  • Segmentation can easily handle tables whose size varies, while paging cannot: pages are of fixed length and segments are not. Paging exists to simulate large memories, while segmentation exists to provide multiple address spaces.
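The page-fault case above can be made concrete with a toy address translation. This is a sketch under simplifying assumptions (a flat page table as a dict, 4 KiB pages, `LookupError` standing in for the fault trap):

```python
PAGE_SIZE = 4096

def translate(virtual_address, page_table):
    """Translate a virtual address through a flat page table
    (one linear address space); raise on a page fault."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in page_table:
        # fixable fault: the OS would load the page and retry
        raise LookupError(f"page fault on virtual page {page}")
    return page_table[page] * PAGE_SIZE + offset

page_table = {0: 7, 1: 3}            # virtual page -> physical frame
assert translate(10, page_table) == 7 * PAGE_SIZE + 10
assert translate(PAGE_SIZE + 5, page_table) == 3 * PAGE_SIZE + 5
```

A segmented scheme would instead carry a (segment, offset) pair and a bounds check per segment; exceeding a segment's bounds is the unfixable error.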
6
Q

Explain why Unix file systems do not allow hard links to directories.

A
  • A hard link is an additional directory entry referring to the same file (the same inode). If the original name is deleted, a hard link still reaches the file’s data, whereas a soft link becomes worthless because it stores only a path.
  • Hard links to directories would break the file system’s directed acyclic graph structure and could create directory loops.
  • This is a problem because there would be no reliable way to detect that you are looping while traversing directories.
  • It also means that deleting or moving a directory could leave a hard link still referring to a directory that should no longer exist.
7
Q

Discuss, compare and evaluate the allocation strategies provided by FAT on the one hand, and Linux inodes on the other. Your analysis must include references to file size, access time and fragmentation.

A

FAT (File Allocation Table) is a file system used in computers.
* It is typically used on Windows computers and on flash or solid-state memory disks.
* The FAT is an index table stored on disk, with one entry per cluster (a contiguous area of disk storage).
* A file’s clusters are chained together through this table; the root directory records the number of each file’s first cluster.
* Each entry records how its cluster is used: the number of the file’s next cluster, a marker indicating the end of the file, or a mark that the cluster is free.
* The operating system traverses the FAT by following a file’s cluster chain until the end-of-file marker is reached.
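The chain traversal described above can be sketched in a few lines. This is a simplified model: real FAT variants use reserved cluster values for free and end-of-chain markers, whereas here `0` and `-1` stand in for them.

```python
FREE, EOF = 0, -1   # simplified markers; real FAT uses reserved cluster values

def file_clusters(fat, first_cluster):
    """Follow a file's cluster chain through the FAT, as the OS does."""
    chain = []
    cluster = first_cluster
    while cluster != EOF:
        chain.append(cluster)
        cluster = fat[cluster]       # each entry names the file's next cluster
    return chain

# toy FAT: a file starts at cluster 2 and occupies clusters 2 -> 5 -> 3
fat = [FREE, FREE, 5, EOF, FREE, 3]
assert file_clusters(fat, 2) == [2, 5, 3]
```

Note that reaching cluster 3 required visiting clusters 2 and 5 first; that linear walk is exactly why FAT random access is slow.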

A Linux inode defines a file system object such as a file or directory.
* Each file is associated with an inode identified by a number. The inode stores the metadata of the file it describes: who owns the file, what type of file it is, who can access it, and with what rights. Because the block pointers live in this metadata, the file’s blocks can be placed anywhere on the disk, with no cluster chains.
* The inode number indexes the inode table; using this index, the kernel’s file system driver can access the contents of the inode, including the location of the file’s data and its access rights.

File size:
* The width of FAT entries, and hence the maximum sizes supported, has grown over time (FAT12, FAT16, FAT32).
* FAT32 supports a maximum file size of 4 GB and a maximum volume size of 2 TB.
* Older systems had a fixed-size inode table, which limited the number of files the system could hold.
* Modern inode-based file systems scale far beyond FAT; ext4, for example, supports volumes up to 1 EB.

Access time:
* FAT random access is slow, because reaching a position in a file means walking the linked cluster chain from its start.
* Inode access is much faster, because the inode’s block pointers index the file’s blocks directly (with at most a few levels of indirection).

Fragmentation:
* FAT shows internal fragmentation (in the last cluster of each file) but no external fragmentation.
* Inode-based systems likewise have no external fragmentation and only a small amount of internal fragmentation (in the last block of each file).

8
Q

Which strategy is used by a typical Unix file system to ensure that an index block allows a large range of file sizes and rapid access, but does not occupy too much space itself?

A
  • An inode contains only 12 direct block pointers, each pointing to a data block. In addition, the inode has a single indirect pointer, a double indirect pointer and a triple indirect pointer.
  • An indirect block is filled with further data-block pointers, as many as fit in one block (for example, 256 four-byte pointers in a 1 KB block); a double indirect block contains pointers to indirect blocks, and a triple indirect block contains pointers to double indirect blocks.
  • This accommodates very large files, while small files need only the 12 direct pointers, which keeps the inode itself small and access to small files fast.
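The resulting maximum file size follows directly from the pointer counts. A sketch, assuming an ext2-style layout with 12 direct pointers plus one single, one double and one triple indirect pointer (`max_file_size` is an illustrative name):

```python
def max_file_size(block_size, pointer_size, direct=12):
    """Largest file addressable by an ext2-style inode."""
    per_block = block_size // pointer_size       # pointers per indirect block
    blocks = direct + per_block + per_block**2 + per_block**3
    return blocks * block_size                   # total addressable bytes

# 1 KiB blocks with 4-byte pointers: 256 pointers fit in each indirect block
assert max_file_size(1024, 4) == (12 + 256 + 256**2 + 256**3) * 1024
```

The triple-indirect term dominates: each extra level multiplies the reach by the pointers-per-block count, which is how a tiny fixed-size inode spans a huge range of file sizes.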
9
Q

Define and discuss the merits of associative cache memory.

A
  • When many different lines in memory compete for the same cache slot, we can allow each line to reside in any of two or more slots.
  • A cache in which each address can map to any of n entries is an n-way set-associative cache. Searching for a line then means checking a set of n cache entries.
    This leads to the following merits.
  • An algorithm called LRU (Least Recently Used) can be used to select the entry to discard when a new line is brought into the cache. The algorithm keeps an ordering of each set of locations, marking the entry that was most recently accessed.
  • When a new entry comes in, the one at the end of the list (the least recently accessed entry) is discarded.
  • As the chance of discarding something and then needing it again soon is small, this improves the hit rate and therefore performance.
10
Q

Explain how the Mic-1’s N and Z bits are wired and how they are involved with branching on the microinstruction level.

A
  • When the ALU finishes computing a result, the N and Z bits are loaded from its output.
  • The N and Z bits come from the ALU and are saved in 1-bit flip-flops. The N bit records whether the ALU output is negative (set to 1 if negative, 0 if positive) and the Z bit records whether it is zero (set to 1 if zero, 0 if not).
  • This supports branching, which is when an algorithm has to choose one of two paths. Saving the ALU status flags in the N and Z flip-flops keeps them available and stable while the next MPC value is computed: on a conditional microbranch, the saved N or Z bit is ORed into the high-order bit of MPC (under control of the JAMN/JAMZ bits), selecting one of two possible next microinstruction addresses.
  • For an equality comparison, the two values can be XORed: if they are equal, the XOR result is all zeros, Z is set to 1 and the branch to the true target is taken; if the values differ, the result is nonzero, Z is 0, and control continues at the false target.
11
Q

Motivate and discuss the use of cache memory with respect to the principles of locality.

A

Cache memory increases a machine’s performance by decreasing the time the CPU has to wait between requesting and receiving an instruction or data item from memory.

There are two principles of locality: spatial locality and temporal locality.
* Spatial locality: memory locations with addresses numerically close to a recently accessed location are likely to be accessed in the near future.
* Temporal locality: information recently accessed by the CPU is likely to be needed again very soon.
* Both principles reduce the time the CPU must wait by anticipating which data will be required in the near future. They therefore increase the effective speed at which the CPU runs — not the clock speed, but the time lost to waiting — and are the reason caching works at all.
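The effect of spatial locality can be demonstrated with a toy direct-mapped cache. This is a sketch under stated assumptions (64-byte lines, 256 slots, tags only, no data); sequential addresses reuse each line 64 times, while a pathological stride maps every access to the same slot:

```python
def hit_rate(addresses, line_size=64, lines=256):
    """Hit rate of a toy direct-mapped cache; spatial locality keeps
    neighbouring addresses inside the same cache line."""
    cache = [None] * lines
    hits = 0
    for a in addresses:
        line = a // line_size          # which memory line the address is in
        slot = line % lines            # direct-mapped: one possible slot
        if cache[slot] == line:
            hits += 1
        else:
            cache[slot] = line         # miss: fetch the whole line
    return hits / len(addresses)

sequential = list(range(4096))                                # good locality
strided = [(i * 16384) % (64 * 16384) for i in range(4096)]   # poor locality
assert hit_rate(sequential) > 0.9
assert hit_rate(strided) < 0.1
```

The same line fetch serves 64 consecutive byte addresses in the sequential case, which is spatial locality paying off; the strided pattern defeats both locality principles and misses on every access.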
12
Q

Why does the pipelined Mic-4 processor need four microinstruction registers?

A

The microinstruction queue in the Mic-4 feeds four copies of the microinstruction register (MIR), one per data path stage:
* Stage 4 drives the register values onto the buses and into the latches.
* Stage 5 is when the ALU and the shifter compute.
* Stage 6 writes the result from the C latch back to the registers.
* Stage 7 accesses memory.
Because four different microinstructions can occupy these stages at the same time, each stage needs the relevant MIR fields present to drive its own circuitry; a single MIR could only hold the control signals for one of them.

13
Q

Explain pointwise what happens when a function is called according to the cdecl calling convention. Assume the function being called (the callee) has parameters and local variables. Refer to the caller as well as the callee.

A

In cdecl, subroutine arguments are pushed onto the stack and the return value is always passed back in the EAX register (the return address is pushed by the call instruction). The convention pushes parameters from right to left, and the registers EAX, ECX and EDX are available for use in the procedure: they are caller-saved (the caller must preserve them across the call if it needs their values), while the callee preserves EBX, EDI, ESI and EBP. After the call returns, the caller removes the arguments from the stack. The sequence is as follows.
Callee prologue: boilerplate code
* Push EBP, preserving the caller’s base pointer.
* Move ESP into EBP, so the base pointer marks the current stack position.
* Subtract a constant from ESP to reserve room for the local variables.

Callee epilogue: boilerplate code
* Move EBP into ESP, resetting the stack and discarding all the local variables.
* Pop EBP, restoring the old base pointer.
* Execute “ret”, returning with the result in EAX.
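The whole sequence can be modelled with a Python list standing in for the stack. This is only a sketch of the convention, not real assembly: the list grows toward its end the way the x86 stack grows toward lower addresses, so the real `[ebp+8]`/`[ebp+12]` argument slots become `ebp - 2`/`ebp - 3` in this model.

```python
def cdecl_add(a, b):
    """Model a cdecl call to a hypothetical two-argument add function."""
    stack = []
    stack.append(b)                   # caller pushes arguments right to left
    stack.append(a)
    stack.append("ret_addr")          # `call` pushes the return address
    stack.append("saved_ebp")         # prologue: push ebp
    ebp = len(stack) - 1              #           mov ebp, esp
    stack.extend([0, 0])              #           sub esp, 8 (two locals)
    eax = stack[ebp - 2] + stack[ebp - 3]   # body reads the two arguments
    del stack[ebp + 1:]               # epilogue: mov esp, ebp (drop locals)
    stack.pop()                       #           pop ebp
    stack.pop()                       # `ret` pops the return address
    del stack[-2:]                    # caller cleans up: add esp, 8
    assert stack == []                # stack balanced after the call
    return eax                        # result returned "in eax"

assert cdecl_add(2, 3) == 5
```

The final `del stack[-2:]` is the cdecl-specific step: the caller, not the callee, removes the arguments, which is what allows variadic functions like printf.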

14
Q

Explain fully what is meant by demand paging. How is it the same as or different from the use of cache memory?

A

Demand paging loads pages on demand, not in advance. Instead of loading the whole process, a lazy swapper is used to bring in only the pages that are actually referenced.

This differs from cache memory, which may exploit the principles of locality proactively, for instance by fetching a whole line or prefetching neighbouring data in anticipation of future accesses.

The other main difference is the level of the memory hierarchy: demand paging moves pages between the backing store (where the pages are kept) and physical memory, while caching moves data between main memory and the processor. Both, however, rely on locality to be effective.
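The "on demand" behaviour is simple to state as code. A minimal sketch, with `memory` as a set of resident pages and the fault handler reduced to adding the page:

```python
def run(reference_string, memory):
    """Demand paging: a page is loaded only when first referenced
    (a page fault), never in advance."""
    faults = 0
    for page in reference_string:
        if page not in memory:
            faults += 1              # trap to the OS; lazy swapper loads page
            memory.add(page)
    return faults

# only the 3 distinct pages actually touched are ever loaded
assert run([1, 2, 1, 3, 2, 1], set()) == 3
```

A prefetching cache, by contrast, might bring in page 2 as soon as page 1 is touched; demand paging never pays for a page the process does not use.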

15
Q

Describe the operation of a translation lookaside buffer.

A
  • The TLB is a solution to the cost of consulting a huge page table for every one of the enormous number of virtual addresses a program can reference.
  • TLBs are special tables that compare the virtual address currently being referenced against those referenced in the recent past. A TLB maps virtual page numbers onto physical frame numbers, but holds only the most recently used virtual page numbers.
  • Each virtual address presented is looked up in the TLB; if a match is found it is a hit. With no match it is a TLB miss, which is handled by walking the page table (on some architectures by hardware, on others by the OS); it is not necessarily a page fault, since the page may well be in memory. On a TLB miss, the mapping is loaded into the TLB for faster access next time.
  • The TLB is a form of associative memory. Like a cache, TLBs are small, since we only want to keep the entries we are likely to use again.
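The hit/miss behaviour can be sketched with an ordered dict as the TLB and a plain dict as the page table; `tlb_lookup` and the LRU replacement choice are illustrative assumptions, not a description of any particular MMU.

```python
from collections import OrderedDict

def tlb_lookup(tlb, page_table, page, capacity=64):
    """Look a virtual page up in the TLB; on a miss, walk the page table
    and cache the mapping for faster access next time."""
    if page in tlb:
        tlb.move_to_end(page)            # TLB hit: refresh recency
        return tlb[page], True
    frame = page_table[page]             # TLB miss: consult the page table
    if len(tlb) == capacity:
        tlb.popitem(last=False)          # evict least recently used mapping
    tlb[page] = frame
    return frame, False

tlb, table = OrderedDict(), {0: 9, 1: 4}
assert tlb_lookup(tlb, table, 0) == (9, False)   # first reference: miss
assert tlb_lookup(tlb, table, 0) == (9, True)    # repeat reference: hit
```

Temporal locality is what makes the tiny `capacity` acceptable: the second lookup of page 0 skips the page-table walk entirely.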
16
Q

Give an example of how speculative execution can have catastrophic consequences if the system architecture does not provide suitable hardware protections.

A
  • Speculative execution means executing code before it is known whether it should be executed at all (for example, beyond an unresolved branch). None of its outcomes may be irreversible, since it could turn out that the instructions should never have run.
  • This is achieved by keeping speculative results in scratch registers and copying them to the true destination registers only once the speculation is confirmed. Without this protection, a wrongly speculated write would destroy the value it overwrote, and a register or memory location would hold a result it should not: the program’s state would be corrupted by instructions that, architecturally, never happened.
17
Q

Explain what the difference between dynamic linking and dynamic loading is.

A
  • In dynamic linking, linking is postponed until execution time, and a small piece of code (a stub) is used to locate the appropriate memory-resident library routine.
  • The stub replaces itself with the address of the routine and executes the routine.
  • The operating system checks whether the routine is already in the process’s address space and, if not, adds it to the address space.
  • Dynamic loading, by contrast, loads a procedure into primary memory only when it is called.
  • The entire program does not need to be in memory to execute: a routine is not loaded until it is needed.
  • This is good because it means routines that are never used are never loaded. It does require that all routines be kept on disk in a relocatable load format.
  • It is especially helpful when a lot of code is needed only to handle rarely occurring situations.
  • No special support from the operating system is needed for dynamic loading, although the operating system can help by providing libraries that implement it.
18
Q

Flip-Flop vs Latch

A
  • A flip-flop samples a value on a rising or falling edge of a clock signal. It is edge-triggered: the output can change only at a clock edge.
  • A latch is a 1-bit memory used to store a previous input value. It is level-triggered: the output follows the input while the enable signal is asserted.
19
Q

Ripple Carry adder vs Carry select adder

A
  • In a ripple-carry adder, the carry out of each bit position feeds the carry in of the next position to its left. The worst case is adding 1 to 111…111: the carry from the rightmost bit has to ripple all the way through to the leftmost bit.
  • A carry-select adder splits a 32-bit adder into a lower 16-bit adder and two upper 16-bit adders run in parallel, one assuming an incoming carry of 0 and the other a carry of 1. Once the lower half’s actual carry-out is known, the correct upper result is selected. This roughly halves the addition time.
  • This demonstrates the speed–cost trade-off: carry select is faster but costs more because of the additional hardware.
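The ripple behaviour is easy to see when the addition is written out bit by bit. A sketch in Python, iterating a 1-bit full adder across the word (`ripple_add` is an illustrative name):

```python
def ripple_add(a, b, width=32):
    """Add two words bit by bit; the carry ripples from the rightmost
    (least significant) bit toward the leftmost."""
    result, carry = 0, 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        s = bit_a ^ bit_b ^ carry                             # full-adder sum
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))   # carry out
        result |= s << i
    return result                      # final carry out is discarded (wrap)

# worst case: adding 1 to all-ones ripples the carry through every bit
assert ripple_add(0xFFFFFFFF, 1) == 0
assert ripple_add(1234, 5678) == 6912
```

Every iteration depends on the carry produced by the previous one, which is exactly the serial chain the carry-select trick breaks in half.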
20
Q

Briefly discuss the design of the system bus of a typical Intel system. Pay particular attention to the speed
of the connected components.

A

Modern computers generally have a special-purpose bus between the CPU and memory (kept short to increase speed) and at least one other bus for I/O devices (slower than the special-purpose bus); a minimal system thus has one memory bus and one I/O bus. Two buses are involved: internal and external.
* The internal bus connects the CPU’s registers to the ALU for fast data transfers (narrow and short to increase speed).
* The external bus connects the CPU to memory and I/O devices (longer and shared, and therefore slower than the internal bus).

21
Q

The control store of the Mic-1 contains two offsets, T and F, to which a micro-operation can branch, depending on whether the N or Z latch contains 1 (branch to T) or 0 (branch to F). Someone now suggests that since the first micro-operation at T is exactly the same as the first micro-operation for the goto instruction, all branching instructions can simply branch to either goto or F instead of T or F. What do you think of this idea?

A

Advantages:

Simplicity: Consolidating the branching instructions to either “goto” or “F” could simplify the control unit’s logic. It reduces the need for additional control signals to distinguish between “T” and “F” branches.

Reduced Control Store Complexity: Combining “T” and “F” branches into one could potentially reduce the size and complexity of the control store, as you’d only need to store micro-operations for “goto” and “F” rather than having separate entries for each.

Disadvantages:

Hardware constraint: the Mic-1 forms the address of the true target by ORing the N or Z bit into the high-order bit of MPC, so the two possible successors must be F and F OR 0x100. The branch hardware cannot select an arbitrary address such as that of goto’s first micro-operation, which is why the duplicate at T exists in the first place.

Potential performance impact: extending the hardware to support an arbitrary second target would complicate the MPC computation and could lengthen the critical path or add cycles.

21
Q

Describe pointwise what happens during the four subcycles of the Mic-1 microarchitecture’s data path cycle.

A
  1. On the falling edge of the clock (end of the previous cycle), MIR is loaded from the control store at the address held in MPC. ∆w is its (known) load time.
  2. The MIR signals propagate and the selected register’s value is driven onto the B bus. After ∆w + ∆x the ALU input values are stable.
  3. The ALU and shifter operate.
    After ∆w + ∆x + ∆y the ALU/shifter outputs are stable.
  4. The results propagate along the C bus and reach the registers after ∆w + ∆x + ∆y + ∆z, in time to be latched on the next edge.
22
Q

One strategy for static branch prediction is to take all backward jumps, but to skip forward jumps. With reference to control-flow elements in high-level languages, is this a good strategy? Motivate your answer.

A

If a branch instruction is not in the (dynamic prediction) history table, static prediction is used. A backward branch is assumed to be part of a loop and predicted taken; since loops iterate many times before exiting, the accuracy of this prediction is extremely high.

A forward branch is assumed to be part of an if statement and predicted not taken; this prediction is considerably less accurate than the backward one. The strategy therefore works well: in code compiled from high-level languages, loops jump backward (and are usually taken), while conditionals and error checks jump forward (and are often skipped), so mispredictions are few.

23
Q

Discuss two different approaches to the parallelisation of a 1-bit full adder in terms of cost and performance.

A

A ripple-carry adder is a simple and cost-effective approach, but it is slow because it requires the carry bit to travel from the rightmost bit position to the leftmost one.

A carry-select adder instead splits the word in half and uses one adder for the lower half and two copies of the adder for the upper half, run in parallel with assumed incoming carries of 0 and 1; once the lower half’s carry-out is known, the correct upper result is selected. This setup is more complex and therefore more expensive, but it is much faster.

24
Q

With reference to the 32-bit ALU used in the Mic-1 architecture – explain how the implementation of ISA instructions for relational operators like = and ≤ should be approached.

A

Equal:
1. Comparison: subtract the two values to be compared. In the Mic-1, the ALU computes the difference and the N and Z status bits are latched, recording the relationship between the two values.
2. Conditional branch (jump if equal): branch on the flags set by the subtraction. If Z = 1, the values are equal, the branch is taken, and execution continues at the specified target address.

Less than or equal:
1. Comparison: as for equality, subtract the second operand from the first and latch N and Z.
2. Conditional branch (jump if less than or equal): if the flags indicate that the first operand is less than or equal to the second (N = 1 or Z = 1, ignoring overflow), the branch is taken.
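The flag logic above can be sketched directly. This is a simplified model of a 32-bit ALU: `sub_flags` computes only the N and Z bits, and the ≤ test uses N alone, which (as noted) is correct only when the subtraction does not overflow.

```python
def sub_flags(a, b, width=32):
    """Subtract b from a on a `width`-bit ALU and return the N and Z bits."""
    diff = (a - b) & ((1 << width) - 1)   # wrap to the ALU's word width
    n = (diff >> (width - 1)) & 1         # N: sign bit of the result
    z = 1 if diff == 0 else 0             # Z: result is all zeros
    return n, z

def equal(a, b):
    n, z = sub_flags(a, b)
    return z == 1                         # branch taken when Z = 1

def less_or_equal(a, b):
    n, z = sub_flags(a, b)
    return n == 1 or z == 1               # taken when negative or zero

assert equal(7, 7) and not equal(7, 8)
assert less_or_equal(3, 7) and less_or_equal(7, 7) and not less_or_equal(9, 2)
```

An ISA-level relational operator is thus just a subtraction followed by a microbranch keyed on the saved N/Z flip-flops.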

25
Q

Explain in detail how the Mic-1 architecture can be pipelined, with reference to the design of the data path and the control store, the use of registers, and the original subcycles.

A

A pipelined design splits the data path into stages such as instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM) and write back (WB).

1. The control store needs to be extended to support pipelining, because each instruction’s microcode must be broken down into micro-operations that map onto the stages.
2. Pipeline registers (latches) are introduced between the stages to hold intermediate results; in the Mic-1’s data path this means latching the A, B and C values so that the stages can work on different instructions at once.
3. The original subcycles, which together formed one long clock cycle, become separate short clock cycles. Each microinstruction now takes several of these short cycles, but a new one can start every cycle, so throughput rises.