CHAPTER 5: The Processor Flashcards
combinational element
operational element, such as an AND gate or an ALU
state element
memory element, such as a register or a memory
clocking methodology
approach used to determine when data is valid and stable relative to the clock
edge-triggered clocking
clocking scheme in which all state changes occur on a clock edge

control signal
signal used for multiplexor selection or for directing the operation of a functional unit; contrasts with a data signal, which contains information that is operated on by a functional unit
asserted
signal is logically high or true
deasserted
signal is logically low or false
datapath element
unit used to operate on or hold data within a processor. In the LEGv8 implementation, the datapath elements include the instruction and data memories, the register file, the ALU, and adders
program counter
register containing the address of the instruction in the program being executed

register file
state element that consists of a set of registers that can be read and written by supplying a register number to be accessed

sign-extend
increase the size of a data item by replicating the high-order sign bit of the original data item in the high-order bits of the larger, destination data item
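A minimal C sketch (illustrative, not from the text): sign-extending a 16-bit field into a 64-bit destination copies bit 15 (the sign bit) into the upper 48 bits.

    #include <stdint.h>

    /* Sign-extend a 16-bit field to 64 bits: the casts replicate bit 15
       (the sign bit) into the upper 48 bits of the result. */
    int64_t sign_extend16(uint16_t field)
    {
        return (int64_t)(int16_t)field;
    }
    /* Example: sign_extend16(0xFFF6) == -10; sign_extend16(0x0010) == 16. */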
branch target address
address specified in a branch, which becomes the new program counter (PC) if the branch is taken. In the LEGv8 architecture, the branch target is given by the sum of the offset field of the instruction and the address of the branch
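A hedged C sketch of the computation (the scaling of the offset by 4 is my assumption about the encoding, on the basis that LEGv8 branch offsets count 4-byte instructions):

    #include <stdint.h>

    /* Branch target = address of the branch + sign-extended offset,
       where the offset is assumed to count instructions and is therefore
       scaled to a byte offset before the add. */
    uint64_t branch_target(uint64_t branch_pc, int64_t sext_offset)
    {
        return branch_pc + (uint64_t)sext_offset * 4;
    }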
branch taken
branch where the branch condition is satisfied and the program counter (PC) becomes the branch target. All unconditional branches are taken branches
branch not taken (untaken branch)
branch where the branch condition is false and the program counter (PC) becomes the address of the instruction that sequentially follows the branch
truth table
a representation of a logical operation by listing all the values of the inputs and then in each case showing what the resulting outputs should be
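For example, the truth table for a two-input AND lists all four input combinations and the output each produces:

    A  B | A AND B
    0  0 |    0
    0  1 |    0
    1  0 |    0
    1  1 |    1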
don’t-care term
element of a logical function in which the output does not depend on the values of all the inputs. Don’t-care terms may be specified in different ways
opcode
field that denotes the operation and format of an instruction

single-cycle implementation (single clock cycle implementation)
implementation in which an instruction is executed in one clock cycle. While easy to understand, it is too slow to be practical
pipelining
implementation technique in which multiple instructions are overlapped in execution, much like an assembly line
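Worked example of the ideal case (the numbers here are illustrative): if the pipe stages are perfectly balanced, time between instructions (pipelined) = time between instructions (nonpipelined) / number of pipe stages. With five balanced stages and 1000 ps per nonpipelined instruction, the pipeline can ideally finish an instruction every 1000 ps / 5 = 200 ps once it is full.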

structural hazard
planned instruction cannot execute in the proper clock cycle because the hardware does not support the combination of instructions that are set to execute
data hazard (pipeline data hazard)
planned instruction cannot execute in the proper clock cycle because data that is needed to execute the instruction is not yet available
forwarding (bypassing)
method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmer-visible registers or memory
load-use data hazard
specific form of data hazard in which the data being loaded by a load instruction has not yet become available when it is needed by another instruction.
pipeline stall (bubble)
stall initiated in order to resolve a hazard
control hazard (branch hazard)
proper instruction cannot execute in the proper pipeline clock cycle because the instruction that was fetched is not the one that is needed; that is, the flow of instruction addresses is not what the pipeline expected
branch prediction
method of resolving a branch hazard that assumes a given outcome for the conditional branch and proceeds from that assumption rather than waiting to ascertain the actual outcome

latency (pipeline)
the number of stages in a pipeline or the number of stages between two instructions during execution.
nop
instruction that does no operation to change state
flush
discard instructions in a pipeline, usually due to an unexpected event
dynamic branch prediction
prediction of branches at runtime using runtime information
branch prediction buffer (branch history table)
small memory that is indexed by the lower portion of the address of the branch instruction and that contains one or more bits indicating whether the branch was recently taken or not
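A C sketch of one common organization, a table of 2-bit saturating counters (the table size, indexing, and function names are my assumptions, not code from the text):

    #include <stdint.h>
    #include <stdbool.h>

    #define BPB_ENTRIES 1024               /* assumed buffer size (power of two) */
    static uint8_t bpb[BPB_ENTRIES];       /* 2-bit counters, values 0..3 */

    /* Index by the low-order bits of the branch address; instructions are
       assumed 4-byte aligned, so the bottom two address bits are dropped. */
    static unsigned bpb_index(uint64_t branch_pc)
    {
        return (unsigned)((branch_pc >> 2) & (BPB_ENTRIES - 1));
    }

    static bool predict_taken(uint64_t branch_pc)
    {
        return bpb[bpb_index(branch_pc)] >= 2;   /* upper half: predict taken */
    }

    static void train(uint64_t branch_pc, bool taken)
    {
        uint8_t *c = &bpb[bpb_index(branch_pc)];
        if (taken  && *c < 3) (*c)++;            /* saturate toward taken */
        if (!taken && *c > 0) (*c)--;            /* saturate toward not taken */
    }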
branch target buffer
structure that caches the destination PC or destination instruction for a branch. It is usually organized as a cache with tags, making it more costly than a simple prediction buffer.
correlating predictor
branch predictor that combines local behavior of a particular branch and global information about the behavior of some recent number of executed branches
tournament branch predictor
branch predictor with multiple predictions for each branch and a selection mechanism that chooses which predictor to enable for a given branch
vectored interrupt
interrupt for which the address to which control is transferred is determined by the cause of the exception
imprecise interrupt (imprecise exception)
interrupts or exceptions in pipelined computers that are not associated with the exact instruction that was the cause of the interrupt or exception
precise interrupt (precise exception)
interrupt or exception that is always associated with the correct instruction in pipelined computers
instruction-level parallelism
parallelism among instructions
multiple issue
scheme whereby multiple instructions are launched in one clock cycle
static multiple issue
approach to implementing a multiple-issue processor where many decisions are made by the compiler before execution
dynamic multiple issue
approach to implementing a multiple-issue processor where many decisions are made during execution by the processor
issue slots
positions from which instructions could issue in a given clock cycle; by analogy, these correspond to positions at the starting blocks for a sprint
speculation
approach whereby the compiler or processor guesses the outcome of an instruction to remove it as a dependence in executing other instructions
issue packet
set of instructions that issues together in one clock cycle; the packet may be determined statically by the compiler or dynamically by the processor.
very long instruction word (VLIW)
style of instruction set architecture that launches many operations that are defined to be independent in a single wide instruction, typically with many separate opcode fields
use latency
number of clock cycles between a load instruction and an instruction that can use the result of the load without stalling the pipeline
loop unrolling
technique to get more performance from loops that access arrays, in which multiple copies of the loop body are made and instructions from different iterations are scheduled together
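A C illustration with a hypothetical loop (the trip count is assumed to be a multiple of 4): unrolling exposes four independent element updates per iteration, amortizing loop overhead and giving the scheduler more instructions to overlap.

    /* Original loop. */
    void scale(double a[], double s, int n)
    {
        for (int i = 0; i < n; i++)
            a[i] *= s;
    }

    /* The same loop unrolled four times (n assumed to be a multiple of 4). */
    void scale_unrolled(double a[], double s, int n)
    {
        for (int i = 0; i < n; i += 4) {
            a[i]     *= s;
            a[i + 1] *= s;
            a[i + 2] *= s;
            a[i + 3] *= s;
        }
    }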
register renaming
renaming of registers by the compiler or hardware to remove antidependences
antidependence (name dependence)
ordering forced by the reuse of a name, typically a register, rather than by a true dependence that carries a value between two instructions
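A C sketch that illustrates both register renaming and antidependence, with local variables standing in for registers (all names are hypothetical):

    /* Before renaming: the write "t = c + d" must wait until "*x = t * 2"
       has read t, even though no value flows between the two pairs. */
    void demo(long a, long b, long c, long d, long *x, long *y)
    {
        long t = a + b;
        *x = t * 2;
        t = c + d;          /* antidependence on the name t */
        *y = t * 3;
    }

    /* After renaming: the two add/multiply chains are independent and can
       be reordered or overlapped by the compiler or hardware. */
    void demo_renamed(long a, long b, long c, long d, long *x, long *y)
    {
        long t1 = a + b;
        *x = t1 * 2;
        long t2 = c + d;    /* renamed: the false ordering is gone */
        *y = t2 * 3;
    }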
superscalar
advanced pipelining technique that enables the processor to execute more than one instruction per clock cycle by selecting them during execution
dynamic pipeline scheduling
hardware support for reordering instruction execution to avoid stalls
commit unit
unit in a dynamic or out-of-order execution pipeline that decides when it is safe to release the result of an operation to programmer-visible registers and memory
reservation station
buffer within a functional unit that holds the operands and the operation
reorder buffer
buffer that holds results in a dynamically scheduled processor until it is safe to store the results to memory or a register
out-of-order execution
situation in pipelined execution when an instruction blocked from executing does not cause the following instructions to wait
in-order commit
commit in which the results of pipelined execution are written to the programmer-visible state in the same order that instructions are fetched
fallacy: pipelining is easy
fallacy: pipelining ideas can be implemented independent of technology