Long Questions Flashcards
Name and discuss two strategies for increasing the speed of execution of the data path. Give examples of their implementation.
- Reduce the number of clock cycles needed to execute an instruction:
The number of clock cycles needed to execute a set of operations is the path length. The path length can be reduced by adding hardware; for example, adding an incrementer to the PC eliminates a cycle, since the ALU no longer has to be used to advance the PC.
- Simplify the organization so that the clock cycle can be shorter:
The number of cycles needed for fetching instructions can be reduced, but this requires more than just adding an incrementer to the PC.
- Overlap the execution of instructions (use pipelining):
The circuitry for fetching instructions can be separated out if that unit is functionally independent of the main data path. This lets it fetch the next opcode or operand on its own, perhaps even at the same time as the current instruction executes, and even prefetch one or more instructions.
What are the considerations when values are copied from an 8-bit memory port onto a 32-bit bus? Pay attention to signed and unsigned numbers.
Values can be copied from an 8-bit memory port onto a 32-bit bus in one of two ways.
* When an unsigned value is required, the 32-bit word placed on the bus contains the 8-bit value in the low-order 8 bits and zeros in the upper 24 bits.
* When a signed value is needed, the value is placed in the low-order 8 bits and the sign bit is duplicated into the upper 24 bit positions of the bus; this process is called sign extension.
* The upper 24 bits are then either all zeros or all ones, depending on whether the leftmost bit of the 8-bit value is a 0 or a 1.
* Whether the 8-bit value is converted to an unsigned or a signed 32-bit value on the bus is thus determined by which of the two control signals is asserted.
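The two conversions above can be sketched in a few lines (a minimal illustration in Python; the function names are my own, not part of any hardware description):

```python
def zero_extend(byte):
    # Unsigned case: upper 24 bits become zeros.
    return byte & 0xFF

def sign_extend(byte):
    # Signed case: duplicate bit 7 into the upper 24 bits.
    value = byte & 0xFF
    if value & 0x80:          # leftmost bit is 1 -> negative value
        value |= 0xFFFFFF00   # fill the upper 24 bits with ones
    return value

print(hex(zero_extend(0x9C)))  # 0x9c
print(hex(sign_extend(0x9C)))  # 0xffffff9c
print(hex(sign_extend(0x5A)))  # 0x5a (sign bit is 0, so no change)
```

Note how 0x9C (leftmost bit 1) becomes 0xFFFFFF9C when treated as signed, exactly the "all ones in the upper 24 bits" case described above.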
RAW dependencies are also called “true dependencies”. Motivate this with reference to other instruction dependencies and also say what can be done to mitigate these other dependencies.
- RAW (Read After Write) dependencies (also called hazards or true dependencies) occur when an instruction must wait for the result of a previous microstep to be computed before it can read that result.
- The RAW dependency is called a TRUE dependency because the data flow genuinely requires the value: it cannot be removed by renaming or other tricks; the consumer really must wait for the producer.
There are two other kinds of dependencies:
* WAR (Write After Read) occurs when an instruction wants to write a result while an earlier instruction still has to read the old value, so we must not overwrite the value before it is read.
* WAW (Write After Write) occurs when two instructions write to the same place; the later write must not be overtaken by the earlier one, otherwise the wrong result would remain.
- These two dependencies (WAR and WAW) are resource conflicts: they arise only because two instructions happen to reuse the same register or location, not because one actually needs the other's result.
- A way to deal with resource conflicts at the hardware level is to introduce shadow registers (hidden registers that can be remapped to stand in the place of another register), a technique known as register renaming.
- Dependencies between instructions mean that certain instructions cannot be executed until another instruction has provided a value that they require. The dependent instruction has to wait.
- To get around this, some CPUs skip over dependent instructions to get to non-dependent future instructions. But the CPU always needs to have the same effect as if the program had been executed in order.
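The three hazard types above can be illustrated with a small sketch. This is a hypothetical helper (not a real pipeline model): each instruction is represented as a destination register plus a set of source registers, with the first instruction earlier in program order.

```python
def classify_hazard(earlier, later):
    # Each instruction: (destination_register, set_of_source_registers).
    dst1, srcs1 = earlier
    dst2, srcs2 = later
    if dst1 in srcs2:
        return "RAW"   # later reads what earlier writes: true dependency
    if dst2 in srcs1:
        return "WAR"   # later overwrites a value earlier still reads
    if dst1 == dst2:
        return "WAW"   # both write the same register
    return "none"

# r1 = r2 + r3 ; r4 = r1 + r5  -> RAW on r1
print(classify_hazard(("r1", {"r2", "r3"}), ("r4", {"r1", "r5"})))  # RAW
# r4 = r1 + r5 ; r1 = r2 + r3  -> WAR on r1
print(classify_hazard(("r4", {"r1", "r5"}), ("r1", {"r2", "r3"})))  # WAR
```

Only the RAW case forces a wait; the WAR and WAW cases could be eliminated by renaming the reused register, which is why they are "false" dependencies.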
Define and discuss internal and external fragmentation with reference to the memory allocation scheme.
When allocating memory in chunks, we must pick a chunk size, and the choice trades off external fragmentation against internal fragmentation.
* External fragmentation is when space is wasted external to the segments, in the holes between the segments.
* The total free memory may be enough to satisfy the request, but it is not contiguous.
* This means that eventually there will not be a single hole big enough for a memory allocation, unless the external fragmentation is removed using compaction, which shuffles the allocated regions together so that the holes merge into one large free block.
- Internal fragmentation occurs when the allocated memory is slightly larger than the requested memory; the size difference is memory internal to a partition that is not being used (the process cannot reach the extra memory).
- In a paging system this is the space wasted inside the last page of a process. Internal fragmentation is reduced by choosing smaller allocation units (e.g. smaller pages), whereas external fragmentation is removed by compaction.
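The internal-fragmentation case is easy to quantify. A small sketch, assuming a hypothetical 4 KB page size:

```python
import math

PAGE_SIZE = 4096  # bytes (assumed page size for this sketch)

def internal_fragmentation(request_bytes):
    # Memory is handed out in whole pages, so round the request up.
    pages = math.ceil(request_bytes / PAGE_SIZE)
    allocated = pages * PAGE_SIZE
    return allocated - request_bytes  # wasted space inside the last page

print(internal_fragmentation(10000))  # 3 pages = 12288 bytes -> 2288 wasted
print(internal_fragmentation(8192))   # exact fit, 2 pages -> 0 wasted
```

On average about half a page per allocation is wasted, which is why very large pages increase internal fragmentation.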
Explain the conceptual differences between segmentation and paging. Also mention how, from the programmer’s perspective, segmentation faults differ from page faults. (Discuss the difference between segmentation and paging.)
- A page fault is a fault that the operating system can fix (by bringing the missing page into memory), whereas a segmentation fault is an error in the program that cannot be fixed transparently.
- Segmentation can make each procedure a separate segment. This removes the need for more than one physical copy in main memory of every shared procedure, and so saves memory.
- Paging is when the process is allocated physical memory in fixed-size frames as they become available.
- Segmentation is visible to (and managed by) the programmer, while paging is transparent.
- Paging has one linear address space while segmentation has multiple linear address spaces; for both paging and segmentation the virtual address space can be greater than the physical memory size.
- Segmentation can handle tables whose sizes vary easily, while paging cannot: pages are of fixed length and segments are not. Paging simulates large memories, while segmentation provides multiple address spaces.
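The "fixed length" property of pages is what makes the address translation mechanical: a virtual address simply splits into a page number and an offset. A minimal sketch, assuming hypothetical 4 KB pages (12 offset bits):

```python
OFFSET_BITS = 12
PAGE_SIZE = 1 << OFFSET_BITS  # 4096 bytes

def split_virtual_address(vaddr):
    # High bits select the page, low bits are the offset within it.
    page_number = vaddr >> OFFSET_BITS
    offset = vaddr & (PAGE_SIZE - 1)
    return page_number, offset

print(split_virtual_address(0x00403A7F))  # (1027, 2687), i.e. page 0x403, offset 0xA7F
```

A segmented address, by contrast, is a (segment, offset) pair chosen by the programmer, and the offset is checked against the segment's variable length rather than masked.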
Explain why Unix file systems do not allow hard links to directories.
- A hard link is an additional directory entry that refers to the same file (the same inode). If the original name is deleted, a soft link to it becomes worthless, but a hard link still reaches the file’s data.
- Hard links to directories would break the file system’s directed acyclic graph structure, since they could create directory loops (cycles).
- This is a problem because there would be no general way to detect that you are looping while traversing directories.
- It also means that deleting or moving directories could leave a hard link still referring to a directory that should no longer exist.
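The traversal problem can be made concrete: once a cycle exists, a naive directory walk never terminates unless it remembers every directory it has already visited. A sketch with a hypothetical directory graph:

```python
# Hypothetical directory graph: name -> list of subdirectory names.
# A hard link from "c" back to "a" would create the cycle below.
dirs = {"a": ["b"], "b": ["c"], "c": ["a"]}

def walk(name, visited=None):
    # Without the visited set, this recursion would loop forever on the cycle.
    if visited is None:
        visited = set()
    if name in visited:
        return visited
    visited.add(name)
    for child in dirs[name]:
        walk(child, visited)
    return visited

print(sorted(walk("a")))  # ['a', 'b', 'c']
```

Real tools like `find` would need such bookkeeping for every directory on the disk, which is why Unix simply forbids the cycles instead.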
Discuss, compare and evaluate the allocation strategies provided by FAT on the one hand, and Linux inodes on the other. Your analysis must include references to file size, access time and fragmentation.
FAT (File Allocation Table) is a file system used in computers.
* It is typically used on Windows computers and on flash and solid-state memory disks.
* The FAT itself is an index table with one entry per cluster (a contiguous area of disk storage).
* It is stored on the disk, and a file’s clusters are chained together through it: the root directory records the number of each file’s first cluster.
* Each table entry records how its cluster is used: the number of the next cluster in the file, a marker indicating the end of the file, or a marker that the cluster is unused.
* The operating system traverses the FAT by following the cluster chain until the end of the file is reached.
A Linux inode describes a file system object such as a file or a directory.
* Each file is associated with an inode, identified by a number. The inode stores the metadata of the file it describes: who owns the file, what type of file it is, who can access the file and what access rights they have. Because the inode records the locations of the file’s data blocks directly, those blocks can be placed anywhere on the disk.
* The inode number indexes the inode table; using this index, the kernel’s file system driver can access the contents of the inode, including the location of the file’s data and its access rights.
File Size:
* The maximum FAT file size used to be fixed but now varies across FAT variants.
* FAT32 supports a maximum file size of 4 GB and a maximum volume size of 2 TB.
* Inodes used to have a fixed-size inode table on older systems, which limited the number of files the system could handle.
* Unix inode-based designs are limited by design to file sizes of up to 16 EB.
Access time:
* FAT is quite slow, as it has to walk a linked list of cluster entries to reach a given position in a file.
* Inodes are much faster, as they index into a table (and into direct/indirect block pointers) to find a file’s blocks.
Fragmentation:
* FAT has internal fragmentation but no external fragmentation.
* Inodes also have no external fragmentation, and only a small chance of internal fragmentation.
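The FAT access-time point above comes directly from the chain-following traversal. A minimal sketch with hypothetical table contents (the `EOF` marker and cluster numbers are made up for illustration):

```python
EOF = -1  # stand-in for FAT's end-of-chain marker
fat = {2: 5, 5: 7, 7: EOF,   # file A occupies clusters 2 -> 5 -> 7
       3: 4, 4: EOF}         # file B occupies clusters 3 -> 4

def clusters_of(first_cluster):
    # Follow the linked list: one table lookup per cluster, so reaching
    # byte offset N in a file costs O(N / cluster_size) lookups.
    chain = []
    cluster = first_cluster
    while cluster != EOF:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain

print(clusters_of(2))  # [2, 5, 7]
```

An inode, by contrast, reaches block k of a file by indexing into its pointer arrays, which is why random access is much faster there.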
Which strategy is used by a typical Unix file system to ensure that an index block allows a large range of file sizes and rapid access, but does not occupy too much space itself ?
- An inode holds only 12 direct block pointers, each pointing at a data block. In addition, the inode has a pointer to a single indirect block, a pointer to a double indirect block, and a pointer to a triple indirect block.
- A single indirect block contains further data block pointers (as many as fit in one block), a double indirect block contains pointers to single indirect blocks, and a triple indirect block contains pointers to double indirect blocks.
- This accommodates very large files, while small files only need the 12 direct pointers, so the index structure itself occupies little space.
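The reachable file size under this scheme is simple arithmetic. A sketch assuming hypothetical parameters of 4 KB blocks and 4-byte block pointers (so 1024 pointers fit in one block):

```python
BLOCK = 4096               # assumed block size in bytes
PTRS = BLOCK // 4          # 1024 pointers per indirect block
DIRECT = 12                # direct pointers in the inode

# Blocks reachable via direct, single, double and triple indirection.
blocks = DIRECT + PTRS + PTRS**2 + PTRS**3
max_size = blocks * BLOCK
print(max_size)  # roughly 4 TB with these parameters
```

A file that fits in 12 blocks (48 KB here) needs no indirect blocks at all, which is the "rapid access without wasted space" property the question asks about.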
Define and discuss the merits of associative cache memory.
- When many different lines in memory compete for the same cache slot, we can allow two or more lines in each slot (a set).
- A cache with n possible entries for each address is an n-way set-associative cache. A set of n cache entries then needs to be checked when searching for a line in the cache.
This leads to the following merits:
- An algorithm such as LRU (Least Recently Used) can be used to select an entry to discard when a new entry is brought into the cache. The algorithm keeps an ordering of the entries in each set, marking which entry was most recently accessed.
- When a new entry comes in, the one at the end of the list (the least recently accessed entry) is discarded.
- Since the chance of discarding something and then needing it again soon is small, this improves efficiency and performance.
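One set of such a cache can be sketched in a few lines. This is a hypothetical model (tags only, no data payload), using insertion order to track recency:

```python
from collections import OrderedDict

class CacheSet:
    def __init__(self, ways=4):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None; dict order = recency order

    def access(self, tag):
        if tag in self.lines:
            self.lines.move_to_end(tag)     # mark as most recently used
            return "hit"
        if len(self.lines) == self.ways:
            self.lines.popitem(last=False)  # evict the least recently used
        self.lines[tag] = None
        return "miss"

s = CacheSet(ways=4)
for t in [1, 2, 3, 4, 1, 5]:  # re-using 1 protects it; 5 evicts 2 instead
    s.access(t)
print(list(s.lines))  # [3, 4, 1, 5]
```

Note how tag 1 survived the eviction because it was touched again, exactly the behaviour LRU is designed to produce.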
Explain how the Mic-1’s N and Z bits are wired and how they are involved with branching on the microinstruction level.
- When the ALU finishes computing a result, it loads values into the N and Z bits.
- The N and Z bits come from the ALU and are saved in 1-bit flip-flops. The N bit tells us whether the ALU output is negative (it is set to 1 if negative and 0 if positive) and the Z bit tells us whether it is zero (it is set to 1 if zero and 0 if not).
- This helps with branching, which is when an algorithm has to choose to do one of two things. Saving the ALU status flags in N and Z keeps the correct values available and stable while the next MPC is computed.
- For an equality comparison, the two values can be XOR’ed: if they are the same, the XOR result is 0, so Z becomes 1 and the microprogram branches to the “equal” case; if the values are different, the result is nonzero, Z is 0, and the microprogram branches to the “not equal” case.
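How N and Z fall out of an ALU result can be shown directly. A small sketch, assuming a 32-bit data path (the function name is mine, not Mic-1 terminology):

```python
def alu_flags(result):
    result &= 0xFFFFFFFF          # keep 32 bits, two's-complement view
    n = (result >> 31) & 1        # N: the sign bit of the result
    z = 1 if result == 0 else 0   # Z: the result is exactly zero
    return n, z

print(alu_flags(5 ^ 5))   # equal operands XOR'ed   -> (0, 1): Z set
print(alu_flags(5 ^ 9))   # different operands      -> (0, 0)
print(alu_flags(3 - 7))   # negative result         -> (1, 0): N set
```

The microsequencer then ORs the selected flag into the high bit of the next microinstruction address, which is how the branch actually happens.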
Motivate and discuss the use of cache memory with respect to the principles of locality.
Cache memory increases a machine’s performance by decreasing the time the CPU has to wait between requesting and receiving an instruction or data item from memory.
There are two principles of locality: spatial locality and temporal locality.
* The principle of spatial locality says that memory locations with addresses numerically close to a recently accessed location are likely to be accessed in the near future.
* The principle of temporal locality says that information recently accessed by the CPU is likely to be needed again very soon.
* Both principles reduce the time the CPU must wait by anticipating which data will be required in the near future. The cache therefore increases the speed at which the CPU can run — not the speed of the clock, but the amount of waiting — and is therefore central to the memory system.
Why does the pipelined Mic-4 processor need four microinstruction registers?
The queueing unit in the Mic-4 supplies four different versions of the microinstruction register (MIR). These correspond to the four data path stages:
* Stage 4 drives the register values into the latches.
* Stage 5 is when the ALU and the shifter compute.
* Stage 6 is when the values are written from the C latch back to the registers.
* Lastly, stage 7 is when memory is accessed.
Because each stage can be working on a different (micro)instruction at the same time, each stage needs its own MIR: the values held in that MIR are what drive that stage’s part of the circuitry.
Explain pointwise what happens when a function is called according to the cdecl calling convention. Assume the function being called (the callee) has parameters and local variables. Refer to the caller as well as the callee.
In cdecl, subroutine arguments are pushed onto the stack and the return value is passed back in the EAX register. The convention pushes parameters from right to left, and the registers EAX, ECX and EDX are available for use in the procedure (the caller preserves these 3 registers if it needs them), while the callee preserves EBX, EDI, ESI and EBP. After the call returns, the caller removes the arguments from the stack. Below is what happens in the cdecl calling convention.
Starting: boilerplate code
* We start by pushing EBP, which preserves the old base pointer.
* We move ESP into EBP, which points the base pointer at the current stack position.
* Then we subtract a number from ESP to reserve space for the local variables.
Ending: boilerplate code
* We move EBP into ESP to reset the stack and get rid of all the local variables.
* We pop EBP, so we get the old base pointer back.
* Lastly we “ret”, returning to the caller with the result in EAX.
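The stack traffic during such a call can be simulated step by step. A hypothetical sketch in Python (real stacks grow downward; here we simply append for push and pop for pop), for a two-argument call with one local variable:

```python
stack = []

def push(x): stack.append(x)
def pop(): return stack.pop()

# Caller: push arguments right to left; CALL then pushes the return address.
push("arg2")
push("arg1")
push("return address")

# Callee prologue.
push("caller's EBP")          # push ebp
ebp = len(stack) - 1          # mov ebp, esp
push("local variable")        # sub esp, 4

# Callee epilogue.
stack[:] = stack[:ebp + 1]    # mov esp, ebp  (locals discarded)
pop()                         # pop ebp
pop()                         # ret pops the return address

# Caller cleanup: under cdecl the CALLER removes the arguments.
pop(); pop()
print(stack)  # [] -> stack balanced again
```

The final caller cleanup is the distinctive cdecl step; in stdcall, by contrast, the callee would remove the arguments itself.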
Explain fully what is meant by demand paging. How is it the same as or different from the use of cache memory?
Demand paging loads pages on demand, not in advance. Instead of loading the whole process, a lazy swapper loads only the pages that are actually referenced.
This differs from cache memory, which relies on the principle of locality to achieve performance by fetching data before it is explicitly needed.
The main structural difference is the level of the memory hierarchy involved: demand paging moves pages from the backing store (where the pages are kept) into physical memory, whereas cache memory moves data from main memory closer to the processor.
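The "lazy" behaviour can be sketched in a few lines (hypothetical page contents; a real system would also need frame eviction):

```python
backing_store = {0: "page0-data", 1: "page1-data", 2: "page2-data"}
memory = {}    # frames currently resident in physical memory
faults = 0

def read_page(n):
    global faults
    if n not in memory:               # page fault: page not resident
        faults += 1
        memory[n] = backing_store[n]  # lazy swapper loads it on demand
    return memory[n]

read_page(1); read_page(1); read_page(2)
print(faults)          # 2 (the second access to page 1 was already resident)
print(sorted(memory))  # [1, 2] -> page 0 was never loaded at all
```

Note that page 0 never entered memory, which is exactly the saving demand paging provides over loading the whole process up front.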
Describe the operation of a translation lookaside buffer.
- The TLB is a solution to the problem that a normal page table for the enormous number of virtual addresses is far too large (and too slow) to consult on every memory reference.
- TLBs are special tables that compare the virtual address currently being referenced with those referenced in the recent past. A TLB maps virtual page numbers onto physical frame numbers, but only holds the most recently used virtual page numbers.
- When a virtual address is presented, it is looked up in the TLB; if a match is found, it is a hit. With no match, it is a TLB miss, which the OS (or hardware) has the job of handling (it is not a page fault, since the page may well be in memory). On a TLB miss, the mapping is loaded into the TLB for faster access next time.
- The TLB is also known as associative memory. TLBs are small, like caches, since we only want to keep available the entries we are likely to use again.
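The hit/miss flow above can be sketched with a dictionary standing in for the associative hardware. A hypothetical model (made-up mappings, LRU-style eviction when the tiny TLB is full):

```python
from collections import OrderedDict

page_table = {7: 100, 8: 101, 9: 102, 10: 103}  # virtual page -> frame
tlb = OrderedDict()
TLB_SIZE = 2
hits = misses = 0

def translate(vpage):
    global hits, misses
    if vpage in tlb:                  # TLB hit: no page-table walk needed
        hits += 1
        tlb.move_to_end(vpage)
        return tlb[vpage]
    misses += 1                       # TLB miss: fall back to the page table
    frame = page_table[vpage]
    if len(tlb) == TLB_SIZE:
        tlb.popitem(last=False)       # evict the least recently used entry
    tlb[vpage] = frame                # cache the mapping for next time
    return frame

for vp in [7, 8, 7, 9, 7]:
    translate(vp)
print(hits, misses)  # 2 3
```

Real TLBs check all entries in parallel in hardware, which is what makes the hit path essentially free compared with a page-table walk.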