Unit 5 Flashcards
For every instruction, the first two steps are identical. Name them.
Send the program counter (PC) to the memory that contains the code and fetch the instruction from that memory.
Read one or two registers, using fields of the instruction to select the registers to read. For the LDUR and CBZ instructions, we need to read only one register, but most other instructions require reading two registers.
3rd step for memory-reference instructions
Use ALU for address calculation
3rd step for arithmetic-logic instructions
Use ALU for operation execution
Every instruction must first be fetched from memory, based on the value of the blank.
program counter
Every instruction reads one or two blank.
registers
3rd step for branch instructions
Use ALU for comparison
The blank stores the program that is to be executed.
instruction memory
The blank stores the data needed by the running programs.
data memory
The blank commands the datapath according to the instructions of the program by setting the control lines for each of the major functional units.
control unit
The blank elements include the instruction and data memories, the register file, the ALU, and adders.
datapath
An operational element, such as an AND gate or an ALU.
Combinational element
A memory element, such as a register or a memory.
State element
A blank element has at least two inputs and one output.
state
The elements that operate on data values are all blank, which means that their outputs depend only on the current inputs.
combinational
A sequential element is another name for a _____ element.
state
A clock input is present on a _____ element.
state
An ALU is a _____ element.
combinational
The approach used to determine when data is valid and stable relative to the clock.
Clocking methodology
A clocking scheme in which all state changes occur on a clock edge.
Edge-triggered clocking
A signal used for multiplexor selection or for directing the operation of a functional unit; contrasts with a data signal, which contains information that is operated on by a functional unit.
Control signal
The signal is logically high or true.
Asserted
The signal is logically low or false.
Deasserted
An blank allows us to read the contents of a register, send the value through some combinational logic, and write that register in the same clock cycle.
edge-triggered methodology
For the 64-bit LEGv8 architecture, nearly all of these state and logic elements will have inputs and outputs that are blank, since that is the width of most of the data handled by the processor.
64 bits wide
A rising clock edge refers to the clock changing from blank
0 to 1
A unit used to operate on or hold data within a processor. In the LEGv8 implementation, the blank include the instruction and data memories, the register file, the ALU, and adders.
Datapath element
The register containing the address of the instruction in the program being executed.
Program counter (PC)
To execute any instruction, we must start by blank.
fetching the instruction from memory
A state element that consists of a set of registers that can be read and written by supplying a register number to be accessed.
Register file
To increase the size of a data item by replicating the high-order sign bit of the original data item in the high-order bits of the larger, destination data item.
Sign-extend
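Sign extension can be sketched in Python as below. This is a minimal illustration rather than hardware: the function name sign_extend and its parameters are hypothetical, and the result is returned as an unsigned 64-bit pattern to mirror a 64-bit LEGv8 datapath.

```python
def sign_extend(value: int, from_bits: int, to_bits: int = 64) -> int:
    """Copy the sign bit of a from_bits-wide field into the high-order bits of a
    to_bits-wide result, returned as an unsigned bit pattern."""
    field_mask = (1 << from_bits) - 1
    value &= field_mask                              # keep only the original field
    if value >> (from_bits - 1):                     # sign bit set: negative value
        value |= ((1 << to_bits) - 1) & ~field_mask  # fill the new high bits with 1s
    return value


print(hex(sign_extend(0b111111000, 9)))  # 0xfffffffffffffff8  (-8)
print(hex(sign_extend(0b000001000, 9)))  # 0x8                 (+8)
```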
There are two details in the definition of branch instructions (see COD Chapter 2 (Instructions: Language of the Computer)) to which we must pay attention:
The instruction set architecture specifies that the base for the branch address calculation is the address of the branch instruction.
The architecture also states that the offset field is shifted left 2 bits so that it is a word offset; this shift increases the effective range of the offset field by a factor of 4.
The address specified in a branch, which becomes the new program counter (PC) if the branch is taken. In the LEGv8 architecture, the blank is given by the sum of the offset field of the instruction and the address of the branch.
Branch target address
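Putting the two details above together, the branch target is the branch's own address plus the sign-extended offset shifted left 2 bits. The sketch below assumes a 19-bit offset field (as in CBZ) by default and 26 bits for B; the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # PC arithmetic wraps at 64 bits


def to_signed(field: int, bits: int) -> int:
    """Interpret a bits-wide field as a two's-complement integer."""
    field &= (1 << bits) - 1
    return field - (1 << bits) if field >> (bits - 1) else field


def branch_target(branch_pc: int, offset_field: int, offset_bits: int = 19) -> int:
    """Target = address of the branch itself + sign-extended offset * 4.

    offset_bits is 19 for CBZ (CB-format) and 26 for B (B-format).
    """
    word_offset = to_signed(offset_field, offset_bits)
    return (branch_pc + (word_offset << 2)) & MASK64


print(hex(branch_target(0x1000, 5)))               # 0x1014: five instructions ahead
print(hex(branch_target(0x1000, (-3) & 0x7FFFF)))  # 0xff4: three instructions back
```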
A branch where the branch condition is satisfied and the program counter (PC) becomes the branch target. All unconditional branches are taken branches.
Branch taken
A branch where the branch condition is false and the program counter (PC) becomes the address of the instruction that sequentially follows the branch.
Branch not taken (or untaken branch)
The simplest blank attempts to execute each instruction in one clock cycle.
datapath
The normal case of fetching the next instruction from memory requires blank, not PC + 1, because each LEGv8 instruction is 4 bytes long and memory is byte-addressed.
PC + 4
The blank must be able to take inputs and generate a write signal for each state element, the selector control for each multiplexor, and the ALU control.
control unit
For load register and store register instructions, we use the ALU to compute the memory address by blank.
addition
For the R-type instructions, the ALU needs to perform one of the four actions (blank, blank, blank, or blank), depending on the value of the 11-bit opcode field in the instruction.
AND, OR, subtract, or add
If the instruction is STUR, then ALUOp should be _____.
00
If the instruction is STUR, then the ALU’s four control inputs should be _____.
0010
For LDUR and STUR instructions, the ALU function is _____.
the same (add, to compute the address)
If the instruction is ORR, then ALUOp should be _____.
10
If the instruction is ORR, then as well as examining the ALUOp bits, the ALU control will also examine _____.
instruction’s opcode field (Instruction[31:21])
If the instruction is ORR, then the ALU control will (after examining the ALUOp and opcode bits) output _____.
0001
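The ALU control cards above can be checked against a small Python model of the ALU control truth table. This is a sketch: the 11-bit opcode constants are assumed from the LEGv8 R-format encodings of ADD, SUB, AND, and ORR, and the function name alu_control is hypothetical. Testing only the low ALUOp bit for CBZ mirrors the don't-care entry (x1) in the truth table.

```python
# R-format opcodes (Instruction[31:21]); values assumed from the LEGv8 encodings.
OPCODE_TO_ALU_CONTROL = {
    0b10001011000: 0b0010,  # ADD -> add
    0b11001011000: 0b0110,  # SUB -> subtract
    0b10001010000: 0b0000,  # AND -> AND
    0b10101010000: 0b0001,  # ORR -> OR
}


def alu_control(alu_op: int, opcode: int) -> int:
    """Map the 2-bit ALUOp from the main control unit, plus the opcode field,
    to the 4-bit ALU control input."""
    if alu_op == 0b00:          # LDUR/STUR: add base register and offset
        return 0b0010
    if alu_op & 0b01:           # ALUOp = x1 (CBZ): pass input b so Zero tests it
        return 0b0111
    # ALUOp = 10: R-type, so the opcode field decides the operation.
    if opcode not in OPCODE_TO_ALU_CONTROL:
        raise ValueError(f"unsupported R-format opcode {opcode:011b}")
    return OPCODE_TO_ALU_CONTROL[opcode]


print(f"{alu_control(0b00, 0):04b}")              # 0010 (STUR: opcode ignored)
print(f"{alu_control(0b10, 0b10101010000):04b}")  # 0001 (ORR)
```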
From logic, a representation of a logical operation by listing all the values of the inputs and then in each case showing what the resulting outputs should be.
Truth table
An element of a logical function in which the output does not depend on the values of all the inputs. Don’t-care terms may be specified in different ways.
Don’t-care term
The blank, which, as we saw in COD Chapter 2 (Instructions: Language of the Computer), is between 6 and 11 bits wide and occupies bits 31:26 through 31:21, depending on the instruction format.
opcode field
The blank is always in bit positions 9:5 (Rn), both for R-type instructions and for the base register of load and store instructions.
first register operand
The blank is in one of two places. It is in bit positions 20:16 (Rm) for R-type instructions, and it is in bit positions 4:0 (Rt) for the register stored by a store (the same field names the register written by a load). That is also the field that specifies the register to be tested for zero for compare and branch on zero. Thus, we need a multiplexor to select which field of the instruction supplies the register number to be read.
other register operand
Another operand can also be a 19-bit offset for blank or a 9-bit offset for load and store.
compare and branch on zero
The blank for R-type instructions (Rd) and for loads (Rt) is in bit positions 4:0.
destination register
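The field positions above can be made concrete with a minimal Python decoder. It assumes the R-, D-, and CB-format layouts described in these cards, plus the shamt (15:10) and op2 (11:10) fields, which the cards do not mention; all function names are hypothetical, and the reg2loc parameter models the multiplexor that chooses which field supplies the second read-register number.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Extract Instruction[hi:lo] (inclusive) from a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_r(word: int) -> dict:
    """R-format: opcode | Rm | shamt | Rn | Rd."""
    return {"opcode": bits(word, 31, 21), "Rm": bits(word, 20, 16),
            "shamt": bits(word, 15, 10), "Rn": bits(word, 9, 5), "Rd": bits(word, 4, 0)}


def decode_d(word: int) -> dict:
    """D-format (LDUR/STUR): opcode | 9-bit address | op2 | Rn (base) | Rt."""
    return {"opcode": bits(word, 31, 21), "address": bits(word, 20, 12),
            "op2": bits(word, 11, 10), "Rn": bits(word, 9, 5), "Rt": bits(word, 4, 0)}


def decode_cb(word: int) -> dict:
    """CB-format (CBZ): 8-bit opcode | 19-bit address | Rt (register tested for zero)."""
    return {"opcode": bits(word, 31, 24), "address": bits(word, 23, 5), "Rt": bits(word, 4, 0)}


def read_register_2(word: int, reg2loc: bool) -> int:
    """Multiplexor for the second read-register number:
    Rm (20:16) for R-type, Rt (4:0) for STUR and CBZ."""
    return bits(word, 4, 0) if reg2loc else bits(word, 20, 16)


# ADD X9, X20, X21 assembled by hand (ADD opcode assumed to be 10001011000).
add_word = (0b10001011000 << 21) | (21 << 16) | (0 << 10) | (20 << 5) | 9
print(decode_r(add_word))
# {'opcode': 1112, 'Rm': 21, 'shamt': 0, 'Rn': 20, 'Rd': 9}
```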
The field that denotes the operation and format of an instruction.
Opcode
Also called single clock cycle implementation. An implementation in which an instruction is executed in one clock cycle. While easy to understand, it is too slow to be practical.
Single-cycle implementation
In contrast, an unconditional branch instruction always branches, so the blank is not used.
ALU
An implementation technique in which multiple instructions are overlapped in execution, much like an assembly line.
Pipelining
LEGv8 instructions classically take what five steps?
Fetch instruction from memory.
Read registers and decode the instruction.
Execute the operation or calculate an address.
Access an operand in data memory (if necessary).
Write the result into a register (if necessary).
When a planned instruction cannot execute in the proper clock cycle because the hardware does not support the combination of instructions that are set to execute.
Structural hazard
Also called a pipeline data hazard. When a planned instruction cannot execute in the proper clock cycle because data that is needed to execute the instruction are not yet available.
Data hazard
Also called bypassing. A method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmer-visible registers or memory.
Forwarding
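The forwarding decision can be sketched as below, following the usual EX-hazard and MEM-hazard conditions for a five-stage pipeline. The class PipeRegWrite and function forward_select are hypothetical names, and the 2-bit select codes follow the ForwardA/ForwardB convention as an assumption.

```python
from dataclasses import dataclass

XZR = 31  # register 31 always reads as zero, so a write to it is never forwarded


@dataclass
class PipeRegWrite:
    """Destination fields carried in EX/MEM or MEM/WB (hypothetical model)."""
    reg_write: bool  # RegWrite control bit
    rd: int          # destination register number


def forward_select(src: int, ex_mem: PipeRegWrite, mem_wb: PipeRegWrite) -> int:
    """ALU-input multiplexor select for a source register read in EX.

    0b10 -> forward the ALU result held in EX/MEM
    0b01 -> forward the value held in MEM/WB (loaded data or an older ALU result)
    0b00 -> use the value read from the register file in ID
    EX/MEM is checked first so the most recent result wins when both match.
    """
    if ex_mem.reg_write and ex_mem.rd != XZR and ex_mem.rd == src:
        return 0b10
    if mem_wb.reg_write and mem_wb.rd != XZR and mem_wb.rd == src:
        return 0b01
    return 0b00


# SUB X2, X1, X3 followed by AND X12, X2, X5: when AND reaches EX,
# SUB's result for X2 is still in EX/MEM, so it is forwarded from there.
print(f"{forward_select(2, PipeRegWrite(True, 2), PipeRegWrite(False, 0)):02b}")  # 10
```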
A specific form of data hazard in which the data being loaded by a load instruction has not yet become available when it is needed by another instruction.
Load-use data hazard
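Because forwarding cannot deliver loaded data in time for the very next instruction, a hazard detection check is needed. The sketch below assumes full forwarding, so only a load immediately followed by a reader of its destination costs one bubble; the Instr record and the function names are hypothetical.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Hypothetical instruction record: text, load flag, destination, sources."""
    text: str
    is_load: bool
    rd: int
    sources: tuple


def load_use_stall(in_ex: Instr, in_id: Instr) -> bool:
    """Stall if the instruction in EX is a load whose destination is read by
    the instruction being decoded in ID."""
    return in_ex.is_load and in_ex.rd in in_id.sources


def count_bubbles(program: list) -> int:
    """One bubble per load that is immediately followed by a user of its result."""
    return sum(load_use_stall(a, b) for a, b in zip(program, program[1:]))


program = [
    Instr("LDUR X2, [X1, #20]", True, 2, (1,)),
    Instr("AND X4, X2, X5", False, 4, (2, 5)),
    Instr("ORR X8, X2, X6", False, 8, (2, 6)),
]
print(count_bubbles(program))  # 1
```

The ORR also reads X2, but it is two instructions behind the load, so forwarding covers it without a stall.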
Also called bubble. A stall initiated in order to resolve a hazard.
Pipeline stall
Also called branch hazard. When the proper instruction cannot execute in the proper pipeline clock cycle because the instruction that was fetched is not the one that is needed; that is, the flow of instruction addresses is not what the pipeline expected.
Control hazard
A method of resolving a branch hazard that assumes a given outcome for the conditional branch and proceeds from that assumption rather than waiting to ascertain the actual outcome.
Branch prediction
Blank increases the number of simultaneously executing instructions and the rate at which instructions are started and completed.
Pipelining
Pipelining does not reduce the time it takes to complete an individual instruction, also called the blank.
latency
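The throughput-versus-latency point can be checked with a little arithmetic. The stage times below (200, 100, 200, 200, and 100 ps) are assumptions, the model ignores hazards and pipeline-register overhead, and the function names are hypothetical.

```python
from typing import Sequence

# Assumed stage times in picoseconds: IF, register read, ALU, data access, register write.
STAGES_PS = [200, 100, 200, 200, 100]


def single_cycle_time(n: int, stages: Sequence[int]) -> int:
    """Nonpipelined: each instruction uses every stage before the next starts."""
    return n * sum(stages)


def pipelined_time(n: int, stages: Sequence[int]) -> int:
    """Ideal pipeline (no hazards): the clock period is the slowest stage; the first
    instruction needs len(stages) cycles, then one instruction finishes per cycle."""
    return (len(stages) + n - 1) * max(stages)


for n in (1, 3, 1000):
    print(n, single_cycle_time(n, STAGES_PS), pipelined_time(n, STAGES_PS))
```

With one instruction the pipelined version is slower (1000 ps versus 800 ps) because every stage is stretched to the 200 ps clock, yet for 1000 instructions it is close to four times faster (200800 ps versus 800000 ps): throughput improves while latency does not.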
Blank and blank help make a computer fast while still getting the right answers.
Branch prediction and forwarding
The number of stages in a pipeline or the number of stages between two instructions during execution.
Latency (pipeline)
The division of an instruction into five stages means a five-stage pipeline, which in turn means that up to five instructions will be in execution during any single clock cycle. Thus, we must separate the datapath into five pieces, with each piece named corresponding to a stage of instruction execution. Name them.
IF: Instruction fetch
ID: Instruction decode and register file read
EX: Execution or address calculation
MEM: Data memory access
WB: Write back
During instruction fetch, the next instruction is fetched from blank. The right half of IM is shaded to depict that the memory is read.
instruction memory (IM)
During instruction decode, the instruction’s fields are converted into datapath control signals, and simultaneously the blank is read. For simplicity, just the register file is used to depict this stage. The right half is shaded to depict the read (vs. write).
register file (Reg)
During execute, the blank is used to perform the instruction’s operation or to compute an address, or an adder is used for branches.
ALU
During data memory access, the blank may be read (for a load instruction) or written (for a store instruction). For load, the right half is shaded, indicating read. (For store, the left half would be shaded).
data memory (DM)
During write back, the blank may be written by certain instructions (like R-type instructions). The left half is shaded to indicate write (vs. read). Although two Reg icons appear in the stylized depictions, only one register file exists.
register file (Reg)
An instruction that does no operation to change state.
nop
Although the compiler generally relies upon the hardware to resolve hazards and thereby ensure correct execution, the compiler must understand the blank to achieve the best performance. Otherwise, unexpected stalls will reduce the performance of the compiled code.
pipeline
To discard instructions in a pipeline, usually due to an unexpected event.
Flush
Prediction of branches at runtime using runtime information.
Dynamic branch prediction
Also called branch history table. A small memory that is indexed by the lower portion of the address of the branch instruction and that contains one or more bits indicating whether the branch was recently taken or not.
Branch prediction buffer
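The buffer described above can be sketched with 2-bit saturating counters, one common choice for the "one or more bits"; in steady state it mispredicts a loop branch once per loop execution instead of twice, as a 1-bit scheme does. The class and parameter names are hypothetical and the table size is arbitrary. As in the definition, entries are indexed by low-order address bits, so unrelated branches can share an entry.

```python
class TwoBitPredictor:
    """Branch prediction buffer of 2-bit saturating counters.

    Counter values 0-1 predict not taken; 2-3 predict taken.
    """

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # start weakly not taken

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask  # drop the byte offset, keep low word-address bits

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


predictor = TwoBitPredictor()
loop_branch_pc = 0x400
outcomes = ([True] * 9 + [False]) * 2  # a 10-iteration loop, run twice
misses = 0
for taken in outcomes:
    if predictor.predict(loop_branch_pc) != taken:
        misses += 1
    predictor.update(loop_branch_pc, taken)
print(misses)  # 3: one warm-up miss plus one miss at each loop exit
```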
A structure that caches the destination PC or destination instruction for a branch. It is usually organized as a cache with tags, making it more costly than a simple prediction buffer.
Branch target buffer
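A branch target buffer can be sketched as a small direct-mapped cache whose entries carry a tag, which is what distinguishes it from the untagged prediction buffer. Sizes and names here are hypothetical, and this sketch stores only the destination PC, not the destination instruction.

```python
from typing import Optional


class BranchTargetBuffer:
    """Direct-mapped BTB: index = low bits of the word address, tag = the rest."""

    def __init__(self, index_bits: int = 6) -> None:
        self.index_bits = index_bits
        self.entries = [None] * (1 << index_bits)  # each slot: (tag, target) or None

    def _split(self, pc: int):
        word = pc >> 2                             # instructions are 4 bytes
        index = word & ((1 << self.index_bits) - 1)
        tag = word >> self.index_bits
        return index, tag

    def lookup(self, pc: int) -> Optional[int]:
        """Predicted target for the branch at pc, or None on a miss."""
        index, tag = self._split(pc)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def update(self, pc: int, target: int) -> None:
        index, tag = self._split(pc)
        self.entries[index] = (tag, target)


btb = BranchTargetBuffer()
btb.update(0x1000, 0x0FF4)
print(hex(btb.lookup(0x1000)))  # 0xff4
print(btb.lookup(0x1100))       # None: same index, different tag
```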
A branch predictor that combines local behavior of a particular branch and global information about the behavior of some recent number of executed branches.
Correlating predictor
A branch predictor with multiple predictions for each branch and a selection mechanism that chooses which predictor to enable for a given branch.
Tournament branch predictor
Also called interrupt. An unscheduled event that disrupts program execution; used to detect overflow.
Exception
An exception that comes from outside of the processor. (Some architectures use the term interrupt for all exceptions.)
Interrupt
An interrupt for which the address to which control is transferred is determined by the cause of the exception.
Vectored interrupt
A 64-bit register used to hold the address of the affected instruction. (Such a register is needed even when exceptions are vectored.)
ELR
A register used to record the cause of the exception. In the LEGv8 architecture, this register is 32 bits, although some bits are currently unused. Assume there is a field that encodes three possible exception sources: 8 represents an undefined instruction, 10 represents arithmetic overflow or underflow, and 12 represents hardware malfunction.
ESR
Also called imprecise exception. Interrupts or exceptions in pipelined computers that are not associated with the exact instruction that was the cause of the interrupt or exception.
Imprecise interrupt
Also called precise exception. An interrupt or exception that is always associated with the correct instruction in pipelined computers.
Precise interrupt
The parallelism among instructions.
Instruction-level parallelism
A scheme whereby multiple instructions are launched in one clock cycle.
Multiple issue
An approach to implementing a multiple-issue processor where many decisions are made by the compiler before execution.
Static multiple issue
An approach to implementing a multiple-issue processor where many decisions are made during execution by the processor.
Dynamic multiple issue
Two primary and distinct responsibilities must be dealt with in a multiple-issue pipeline:
Packaging instructions into issue slots: how does the processor determine how many instructions and which instructions can be issued in a given clock cycle? In most static issue processors, this process is at least partially handled by the compiler; in dynamic issue designs, it is normally dealt with at runtime by the processor, although the compiler will often have already tried to help improve the issue rate by placing the instructions in a beneficial order.
Dealing with data and control hazards: in static issue processors, the compiler handles some or all of the consequences of data and control hazards statically. In contrast, most dynamic issue processors attempt to alleviate at least some classes of hazards using hardware techniques operating at execution time.
The positions from which instructions could issue in a given clock cycle; by analogy, these correspond to positions at the starting blocks for a sprint.
Issue slots
An approach whereby the compiler or processor guesses the outcome of an instruction to remove it as a dependence in executing other instructions.
Speculation
The set of instructions that issues together in one clock cycle; the packet may be determined statically by the compiler or dynamically by the processor.
Issue packet
A style of instruction set architecture that launches many operations that are defined to be independent in a single wide instruction, typically with many separate opcode fields.
Very Long Instruction Word (VLIW)
Number of clock cycles between a load instruction and an instruction that can use the result of the load without stalling the pipeline.
Use latency
A technique to get more performance from loops that access arrays, in which multiple copies of the loop body are made and instructions from different iterations are scheduled together.
Loop unrolling
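The transformation can be illustrated in Python, with the caveat that the interpreter gains nothing from it; the point is the shape a compiler produces, where four independent copies of the body give the scheduler instructions from different iterations to overlap. The function names are hypothetical, and the cleanup loop handles lengths that are not a multiple of the unroll factor.

```python
from typing import Sequence


def add_scalar(a: Sequence[int], s: int) -> list:
    """Rolled loop: one element per iteration."""
    out = list(a)
    for i in range(len(out)):
        out[i] += s
    return out


def add_scalar_unrolled(a: Sequence[int], s: int) -> list:
    """Unrolled by 4: four copies of the body per iteration, then a cleanup loop."""
    out = list(a)
    n = len(out)
    i = 0
    while i + 4 <= n:
        out[i] += s        # the four updates are independent of each other
        out[i + 1] += s
        out[i + 2] += s
        out[i + 3] += s
        i += 4
    while i < n:           # leftover elements when n is not a multiple of 4
        out[i] += s
        i += 1
    return out


print(add_scalar_unrolled([1, 2, 3, 4, 5, 6], 10))  # [11, 12, 13, 14, 15, 16]
```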
The renaming of registers by the compiler or hardware to remove antidependences.
Register renaming
Also called name dependence. An ordering forced by the reuse of a name, typically a register, rather than by a true dependence that carries a value between two instructions.
Antidependence
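The effect of register renaming on an antidependence can be sketched as below. The instruction tuples (dest, src1, src2), the physical register names P0, P1, and so on, and the function rename are all hypothetical; real renaming hardware also recycles physical registers, which this sketch omits.

```python
from itertools import count


def rename(instrs):
    """Give every destination a fresh physical register.

    instrs: list of (dest, src1, src2) architectural register names.
    Sources are mapped before the destination is reassigned, so each reader
    still sees the value it depends on (true dependences survive), while
    reusing a name no longer forces one instruction to wait for another.
    """
    fresh = (f"P{i}" for i in count())  # endless supply of physical names
    mapping = {}                         # architectural -> current physical

    def read(reg):
        if reg not in mapping:           # value live on entry gets its own register
            mapping[reg] = next(fresh)
        return mapping[reg]

    renamed = []
    for dest, src1, src2 in instrs:
        s1, s2 = read(src1), read(src2)
        mapping[dest] = next(fresh)
        renamed.append((mapping[dest], s1, s2))
    return renamed


program = [
    ("X1", "X2", "X3"),  # ADD X1, X2, X3  reads X2
    ("X2", "X4", "X5"),  # SUB X2, X4, X5  writes X2: antidependence on the ADD
    ("X6", "X2", "X1"),  # ORR X6, X2, X1  true dependences on both
]
for instr in rename(program):
    print(instr)
# ('P2', 'P0', 'P1')
# ('P5', 'P3', 'P4')
# ('P6', 'P5', 'P2')
```

After renaming, the ADD reads P0 while the SUB writes P5, so the SUB no longer has to wait for the ADD.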
An advanced pipelining technique that enables the processor to execute more than one instruction per clock cycle by selecting them during execution.
Superscalar
Hardware support for reordering instruction execution to avoid stalls.
Dynamic pipeline scheduling
The unit in a dynamic or out-of-order execution pipeline that decides when it is safe to release the result of an operation to programmer-visible registers and memory.
Commit unit
A buffer within a functional unit that holds the operands and the operation.
Reservation station
The buffer that holds results in a dynamically scheduled processor until it is safe to store the results to memory or a register.
Reorder buffer
A situation in pipelined execution when an instruction blocked from executing does not cause the following instructions to wait.
Out-of-order execution
A commit in which the results of pipelined execution are written to the programmer-visible state in the same order that instructions are fetched.
In-order commit
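A reorder buffer with in-order commit can be sketched as a queue: results may arrive in any order, but they reach programmer-visible state only from the head. The class, method names, and entry layout are hypothetical, and the sketch ignores exceptions and mispredicted branches.

```python
from collections import deque


class ReorderBuffer:
    """Results wait here, in program order, until they can be committed."""

    def __init__(self) -> None:
        self.entries = deque()  # each entry: [tag, dest, value, done]

    def issue(self, tag: str, dest: str) -> None:
        self.entries.append([tag, dest, None, False])

    def complete(self, tag: str, value: int) -> None:
        """A functional unit finished `tag`, possibly out of program order."""
        for entry in self.entries:
            if entry[0] == tag:
                entry[2], entry[3] = value, True
                return
        raise KeyError(tag)

    def commit(self, regs: dict) -> list:
        """Retire finished entries from the head only, updating architectural registers."""
        retired = []
        while self.entries and self.entries[0][3]:
            tag, dest, value, _ = self.entries.popleft()
            regs[dest] = value
            retired.append(tag)
        return retired


regs = {}
rob = ReorderBuffer()
for tag, dest in (("i1", "X1"), ("i2", "X2"), ("i3", "X3")):
    rob.issue(tag, dest)
rob.complete("i3", 30)
rob.complete("i2", 20)
print(rob.commit(regs))  # []: i1 has not finished, so nothing may commit
rob.complete("i1", 10)
print(rob.commit(regs))  # ['i1', 'i2', 'i3']
print(regs)              # {'X1': 10, 'X2': 20, 'X3': 30}
```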
Both pipelining and multiple-issue execution increase peak instruction throughput and attempt to exploit blank.
instruction-level parallelism (ILP)
The organization of the processor, including the major functional units, their interconnection, and control.
Microarchitecture
The instruction-set-visible registers of a processor; for example, in LEGv8, these are the 32 integer and 32 floating-point registers.
Architectural registers
Verilog can describe processors for simulation or with the intention that the Verilog specification be blank.
synthesized
The inherent execution time for an instruction.
Instruction latency
The first supercomputer.
CDC 6600
An approach that uses dynamic hazard detection, generalized forwarding, and reservation stations.
Tomasulo’s algorithm
A computer that used a four-stage pipeline to overlap fetch, decode, and execute.
IBM 7030 (Stretch)
The computer that introduced Tomasulo’s algorithm.
IBM 360/91
Out-of-order instruction commits led to this unpopular situation.
imprecise interrupts
The original design for a superscalar processor was a two-issue machine called _________.
Cheetah
_________ is a static multiple issue approach that was used in processors found in Cydrome and Multiflow mini-supercomputers.
VLIW
_________ is a more compiler-intensive approach to multiple issue that removed many VLIW drawbacks.
EPIC
_________ is an approach that uses aggressive loop unrolling and path prediction and has been used to exploit higher levels of ILP.
Trace scheduling