Unit 5 Flashcards by Joshua Lane

For every instruction, the first two steps are identical: Name them.

Send the program counter (PC) to the memory that contains the code and fetch the instruction from that memory.

Read one or two registers, using fields of the instruction to select the registers to read. For the LDUR and CBZ instructions, we need to read only one register, but most other instructions require reading two registers.

3rd step for memory-reference instructions

Use ALU for address calculation

3rd step for arithmetic-logic instructions

Use ALU for operation execution

Every instruction must first be fetched from memory, based on the value of the blank.

program counter

Every instruction reads one or two blank

registers

3rd step for branch instructions

use ALU for comparison

The blank stores the program that is to be executed.

instruction memory

The blank stores the data needed by the running programs.

data memory

The blank commands the datapath according to the instructions of the program by setting the control lines for each of the major functional units.

control unit

The blank elements include the instruction and data memories, the register file, the ALU, and adders.

datapath

An operational element, such as an AND gate or an ALU.

Combinational element

A memory element, such as a register or a memory.

State element

A blank element has at least two inputs and one output.

state

The elements that operate on data values are all blank, which means that their outputs depend only on the current inputs.

combinational

A sequential element is another name for a _____ element.

state

A clock input is present on a _____ element.

state

An ALU is a _____ element.

combinational

The approach used to determine when data is valid and stable relative to the clock.

Clocking methodology

A clocking scheme in which all state changes occur on a clock edge.

Edge-triggered clocking

A signal used for multiplexor selection or for directing the operation of a functional unit; contrasts with a data signal, which contains information that is operated on by a functional unit.

Control signal

The signal is logically high or true.

Asserted

The signal is logically low or false.

Deasserted

An blank allows us to read the contents of a register, send the value through some combinational logic, and write that register in the same clock cycle.

edge-triggered methodology
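
A minimal Python sketch of that read, compute, and write sequence within one clock cycle. The `Register` class and its `d`/`q`/`tick` names are hypothetical, chosen only for illustration; the example models a PC-style update of "old value plus 4".

```python
class Register:
    """Hypothetical edge-triggered state element: q changes only on tick()."""

    def __init__(self, value: int = 0) -> None:
        self.q = value  # output visible during the current cycle
        self.d = value  # input that will be captured at the next rising edge

    def tick(self) -> None:
        # Rising clock edge: capture the input into the stored state.
        self.q = self.d


pc = Register(0)
pc.d = pc.q + 4      # read the register and pass the value through combinational logic
before_edge = pc.q   # the old value stays visible until the edge arrives
pc.tick()            # the same register is written at the end of the cycle
after_edge = pc.q
```

Because the stored value changes only at the edge, the read earlier in the cycle never sees the value being written.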

For the 64-bit LEGv8 architecture, nearly all of these state and logic elements will have inputs and outputs that are blank, since that is the width of most of the data handled by the processor.

64 bits wide

A rising clock edge refers to the clock changing from blank

0 to 1

A unit used to operate on or hold data within a processor. In the LEGv8 implementation, the blank include the instruction and data memories, the register file, the ALU, and adders.

Datapath element

The register containing the address of the instruction in the program being executed.

Program counter (PC)

To execute any instruction, we must start by blank.

fetching the instruction from memory

A state element that consists of a set of registers that can be read and written by supplying a register number to be accessed.

Register file

To increase the size of a data item by replicating the high-order sign bit of the original data item in the high-order bits of the larger, destination data item.

Sign-extend
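
A small Python sketch of sign extension on raw bit patterns. The function name and parameters are illustrative assumptions; values are treated as unsigned integers holding two's-complement bit patterns.

```python
def sign_extend(value: int, from_bits: int, to_bits: int = 64) -> int:
    """Widen a from_bits-wide field to to_bits by replicating its sign bit."""
    value &= (1 << from_bits) - 1      # keep only the original field
    sign_bit = 1 << (from_bits - 1)    # mask for the high-order (sign) bit
    if value & sign_bit:
        # Negative value: set every new high-order bit to 1.
        high_bits = ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
        value |= high_bits
    return value
```

For example, the 9-bit pattern `0x100` (that is, -256) extends to `0xFFFFFFFFFFFFFF00`, while `0x0FF` (+255) is unchanged.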

There are two details in the definition of branch instructions (see COD Chapter 2 (Instructions: Language of the Computer)) to which we must pay attention:

The instruction set architecture specifies that the base for the branch address calculation is the address of the branch instruction.
The architecture also states that the offset field is shifted left 2 bits so that it is a word offset; this shift increases the effective range of the offset field by a factor of 4.

The address specified in a branch, which becomes the new program counter (PC) if the branch is taken. In the LEGv8 architecture, the blank is given by the sum of the offset field of the instruction and the address of the branch.

Branch target address
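
The calculation can be sketched in Python as follows. The function name and the default 19-bit field width (the compare-and-branch-on-zero offset) are assumptions for illustration; the result is wrapped to 64 bits.

```python
MASK64 = (1 << 64) - 1


def branch_target(branch_pc: int, offset_field: int, field_bits: int = 19) -> int:
    """Return branch_pc + (sign-extended offset_field << 2), wrapped to 64 bits."""
    # Interpret the raw field as a signed two's-complement number.
    if offset_field & (1 << (field_bits - 1)):
        offset_field -= 1 << field_bits
    # Shift left 2 bits to turn the word offset into a byte offset.
    return (branch_pc + (offset_field << 2)) & MASK64
```

An offset field of 3 from a branch at `0x1000` gives `0x100C`; an offset field encoding -2 gives `0x0FF8`.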

A branch where the branch condition is satisfied and the program counter (PC) becomes the branch target. All unconditional branches are taken branches.

Branch taken

A branch where the branch condition is false and the program counter (PC) becomes the address of the instruction that sequentially follows the branch.

Branch not taken (or untaken branch)

This simplest blank will attempt to execute all instructions in one clock cycle.

datapath

The normal case of fetching the next instruction from memory requires blank, not PC + 1.

PC + 4

The blank must be able to take inputs and generate a write signal for each state element, the selector control for each multiplexor, and the ALU control.

control unit

For load register and store register instructions, we use the ALU to compute the memory address by blank.

addition

For the R-type instructions, the ALU needs to perform one of four actions (blank, blank, blank, or blank), depending on the value of the 11-bit opcode field in the instruction.

AND, OR, subtract, or add

If the instruction is STUR, then ALUOp should be _____

00

If the instruction is STUR, then the ALU's four control inputs should be _____.

0010

For LDUR and STUR instructions, the ALU function _____.

is the same

If the instruction is ORR, then ALUOp should be _____.

10

If the instruction is ORR, then as well as examining the ALUOp bits, the ALU control will also examine _____.

instruction's opcode field (Instruction[31:21])

If the instruction is ORR, then the ALU control will (after examining the ALUOp and opcode bits) output _____.

0001
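
The ALU control cards above can be tied together with a small Python sketch. The cards only state the outputs for STUR (0010) and ORR (0001); the remaining ALUOp values, opcode bit patterns, and control codes below are assumptions based on the usual LEGv8 single-cycle design and should be checked against the textbook's table.

```python
# Assumed 11-bit R-type opcodes mapped to assumed 4-bit ALU control values.
R_TYPE_ALU_CONTROL = {
    0b10001011000: 0b0010,  # ADD -> add
    0b11001011000: 0b0110,  # SUB -> subtract
    0b10001010000: 0b0000,  # AND -> AND
    0b10101010000: 0b0001,  # ORR -> OR
}


def alu_control(alu_op: int, opcode: int = 0) -> int:
    """Map the 2-bit ALUOp (plus the opcode field for R-type) to ALU control bits."""
    if alu_op == 0b00:
        return 0b0010  # LDUR/STUR: add to form the memory address
    if alu_op == 0b01:
        return 0b0111  # CBZ: pass input b (assumed encoding)
    if alu_op == 0b10:
        return R_TYPE_ALU_CONTROL[opcode]  # R-type: the opcode picks the operation
    raise ValueError(f"unexpected ALUOp {alu_op:#04b}")
```

Note that the opcode is a don't-care input when ALUOp is 00 or 01, which is why it is only consulted in the R-type case.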

From logic, a representation of a logical operation by listing all the values of the inputs and then in each case showing what the resulting outputs should be.

Truth table

An element of a logical function in which the output does not depend on the values of all the inputs. Don't-care terms may be specified in different ways.

Don't-care term

The blank, which as we saw in COD Chapter 2 (Instructions: Language of the Computer), is between 6 and 11 bits wide and found in bits 31:26 to 31:21.

opcode field

The blank is always in bit positions 9:5 (Rn) for both R-type instructions and for the base register for load and store instructions.

first register operand

The blank is in one of two places. It is in bit positions 20:16 (Rm) for R-type instructions and it is in bit positions 4:0 (Rt) for the register to be written by a load. That is also the field that specifies the register to be tested for zero for compare and branch on zero. Thus, we will need to add a multiplexor to select which field of the instruction is used to indicate the register number to be read.

other register operand

Another operand can also be a 19-bit offset for blank or a 9-bit offset for load and store.

compare and branch on zero

The blank for R-type instructions (Rd) and for loads (Rt) is in bit positions 4:0

destination register
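
The field positions from the preceding cards can be sketched as a Python decoder. The function and key names are hypothetical. The positions for the opcode, Rm, Rn, and Rd/Rt come from the cards; the positions of the two offset fields (20:12 and 23:5) are an assumption based on the LEGv8 D and CB formats.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_fields(instr: int) -> dict:
    """Slice out every field; control logic later decides which ones are used."""
    return {
        "opcode": bits(instr, 31, 21),     # widest (11-bit) view of the opcode
        "Rm": bits(instr, 20, 16),         # second register operand (R-type)
        "Rn": bits(instr, 9, 5),           # first register operand / base register
        "Rd_Rt": bits(instr, 4, 0),        # destination (R-type, load) or tested register
        "offset9": bits(instr, 20, 12),    # assumed position of the load/store offset
        "offset19": bits(instr, 23, 5),    # assumed position of the CBZ offset
    }


# Example word built from an assumed ADD opcode with Rm=21, Rn=20, Rd=9.
add_word = (0b10001011000 << 21) | (21 << 16) | (20 << 5) | 9
fields = decode_fields(add_word)
```

Extracting all fields unconditionally mirrors the hardware, where the wires are always present and multiplexors select which field feeds each register-file port.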

The field that denotes the operation and format of an instruction.

Opcode

Also called single clock cycle implementation. An implementation in which an instruction is executed in one clock cycle. While easy to understand, it is too slow to be practical.

Single-cycle implementation

In contrast, an unconditional branch instruction always branches, so the blank is not used.

ALU

An implementation technique in which multiple instructions are overlapped in execution, much like an assembly line.

Pipelining

LEGv8 instructions classically take what five steps:

Fetch instruction from memory.
Read registers and decode the instruction.
Execute the operation or calculate an address.
Access an operand in data memory (if necessary).
Write the result into a register (if necessary).

When a planned instruction cannot execute in the proper clock cycle because the hardware does not support the combination of instructions that are set to execute.

Structural hazard

Also called a pipeline data hazard. When a planned instruction cannot execute in the proper clock cycle because data that is needed to execute the instruction are not yet available.

Data hazard

Also called bypassing. A method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmer-visible registers or memory.

Forwarding
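
A hedged Python sketch of a forwarding decision for one ALU operand. The pipeline-register names (`ex_mem_*`, `mem_wb_*`, `id_ex_rn`), the 2-bit select encoding, and the rule that register 31 (XZR) is never forwarded are assumptions based on the usual five-stage LEGv8 pipeline, not details given in this deck.

```python
XZR = 31  # assumed: register 31 always reads as zero, so it is never forwarded


def forward_select(ex_mem_reg_write: bool, ex_mem_rd: int,
                   mem_wb_reg_write: bool, mem_wb_rd: int,
                   id_ex_rn: int) -> int:
    """Return a mux select: 0b10 = from EX/MEM, 0b01 = from MEM/WB, 0b00 = register file."""
    # Prefer the most recent result, held in the EX/MEM buffer.
    if ex_mem_reg_write and ex_mem_rd != XZR and ex_mem_rd == id_ex_rn:
        return 0b10
    # Otherwise use the older result waiting in the MEM/WB buffer.
    if mem_wb_reg_write and mem_wb_rd != XZR and mem_wb_rd == id_ex_rn:
        return 0b01
    return 0b00
```

Checking EX/MEM first matters when both buffers hold a write to the same register: the younger instruction's value is the one the consumer should see.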

A specific form of data hazard in which the data being loaded by a load instruction has not yet become available when it is needed by another instruction.

Load-use data hazard

Also called bubble. A stall initiated in order to resolve a hazard.

Pipeline stall
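
A minimal Python sketch of detecting the load-use case that forces a stall. The signal names are hypothetical, modeled on a five-stage pipeline in which the load is in EX while the dependent instruction is in ID.

```python
def needs_load_use_stall(id_ex_mem_read: bool, id_ex_rd: int,
                         if_id_rn: int, if_id_rm: int) -> bool:
    """True when the instruction in ID reads the register a load in EX will write."""
    # id_ex_mem_read is asserted only for loads; forwarding cannot help here
    # because the loaded data does not exist until the end of the MEM stage.
    return id_ex_mem_read and id_ex_rd in (if_id_rn, if_id_rm)
```

When the function returns true, the hardware would hold the PC and the IF/ID register for one cycle and insert a bubble into EX.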

Also called branch hazard. When the proper instruction cannot execute in the proper pipeline clock cycle because the instruction that was fetched is not the one that is needed; that is, the flow of instruction addresses is not what the pipeline expected.

Control hazard

A method of resolving a branch hazard that assumes a given outcome for the conditional branch and proceeds from that assumption rather than waiting to ascertain the actual outcome.

Branch prediction

Blank increases the number of simultaneously executing instructions and the rate at which instructions are started and completed.

Pipelining

Pipelining does not reduce the time it takes to complete an individual instruction, also called the blank.

latency
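
The distinction between throughput and latency can be made concrete with a short Python calculation. It assumes an ideal pipeline: one clock cycle per stage, no stalls or flushes, and an unpipelined baseline that spends the same number of cycles per instruction without overlap. The function names are illustrative.

```python
def pipelined_cycles(n_instructions: int, stages: int = 5) -> int:
    """Cycles to finish n instructions when a new one starts every cycle."""
    if n_instructions == 0:
        return 0
    # The first instruction needs `stages` cycles; each later one adds a single cycle.
    return stages + (n_instructions - 1)


def unpipelined_cycles(n_instructions: int, stages: int = 5) -> int:
    """Cycles when each instruction finishes all its steps before the next starts."""
    return n_instructions * stages
```

Under these assumptions one instruction still takes 5 cycles either way, but 1000 instructions take 1004 cycles pipelined versus 5000 unpipelined.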

Blank and blank help make a computer fast while still getting the right answers.

Branch prediction and forwarding

The number of stages in a pipeline or the number of stages between two instructions during execution.

Latency (pipeline)

The division of an instruction into five stages means a five-stage pipeline, which in turn means that up to five instructions will be in execution during any single clock cycle. Thus, we must separate the datapath into five pieces, with each piece named corresponding to a stage of instruction execution. Name them.

IF: Instruction fetch
ID: Instruction decode and register file read
EX: Execution or address calculation
MEM: Data memory access
WB: Write back

During instruction fetch, the next instruction is fetched from blank. The right half of IM is shaded to depict that the memory is read.

instruction memory (IM)

During instruction decode, the instruction's fields are converted into datapath control signals, and simultaneously the blank is read. For simplicity, just the register file is used to depict this stage. The right half is shaded to depict the read (vs. write).

register file

During execute, the blank is used to perform the instruction's operation or to compute an address, or an adder is used for branches.

ALU

During data memory access, the blank may be read (for a load instruction) or written (for a store instruction). For load, the right half is shaded, indicating read. (For store, the left half would be shaded).

data memory (DM)

During write back, the blank may be written by certain instructions (like R-type instructions). The left half is shaded to indicate write (vs. read). Although two Reg icons appear in the stylized depictions, only one register file exists.

register file

An instruction that does no operation to change state.

nop

Although the compiler generally relies upon the hardware to resolve hazards and thereby ensure correct execution, the compiler must understand the blank to achieve the best performance. Otherwise, unexpected stalls will reduce the performance of the compiled code.

pipeline

To discard instructions in a pipeline, usually due to an unexpected event.

Flush

Prediction of branches at runtime using runtime information

Dynamic branch prediction

Also called branch history table. A small memory that is indexed by the lower portion of the address of the branch instruction and that contains one or more bits indicating whether the branch was recently taken or not.

Branch prediction buffer
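
A Python sketch of such a buffer, using 2-bit saturating counters as one possible choice of "one or more bits". The class name, the 16-entry size, the initial counter value, and dropping the two low address bits are all assumptions for illustration.

```python
class BranchPredictionBuffer:
    """Small table of 2-bit saturating counters indexed by low PC bits (no tags)."""

    def __init__(self, index_bits: int = 4) -> None:
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # start at "weakly not taken"

    def _index(self, pc: int) -> int:
        # Instructions are 4 bytes, so the two lowest address bits carry no information.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2  # 2 or 3 means predict taken

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Because the table has no tags, two branches whose addresses share the same low bits use the same counter, which is part of why this structure is cheaper than a branch target buffer.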

A structure that caches the destination PC or destination instruction for a branch. It is usually organized as a cache with tags, making it more costly than a simple prediction buffer.

Branch target buffer

A branch predictor that combines local behavior of a particular branch and global information about the behavior of some recent number of executed branches.

Correlating predictor

A branch predictor with multiple predictions for each branch and a selection mechanism that chooses which predictor to enable for a given branch.

Tournament branch predictor

Also called interrupt. An unscheduled event that disrupts program execution; used to detect overflow.

Exception

An exception that comes from outside of the processor. (Some architectures use the term interrupt for all exceptions.)

Interrupt

An interrupt for which the address to which control is transferred is determined by the cause of the exception

Vectored interrupt

A 64-bit register used to hold the address of the affected instruction. (Such a register is needed even when exceptions are vectored.)

ELR

A register used to record the cause of the exception. In the LEGv8 architecture, this register is 32 bits, although some bits are currently unused. Assume there is a field that encodes the three possible exception sources mentioned above, with 8 representing an undefined instruction, 10 representing arithmetic overflow or underflow, and 12 representing hardware malfunction.

ESR

Also called imprecise exception. Interrupts or exceptions in pipelined computers that are not associated with the exact instruction that was the cause of the interrupt or exception.

Imprecise interrupt

Also called precise exception. An interrupt or exception that is always associated with the correct instruction in pipelined computers.

Precise interrupt

The parallelism among instructions.

Instruction-level parallelism

A scheme whereby multiple instructions are launched in one clock cycle

Multiple issue

An approach to implementing a multiple-issue processor where many decisions are made by the compiler before execution.

Static multiple issue

An approach to implementing a multiple-issue processor where many decisions are made during execution by the processor.

Dynamic multiple issue

Two primary and distinct responsibilities must be dealt with in a multiple-issue pipeline:

Packaging instructions into issue slots: how does the processor determine how many instructions and which instructions can be issued in a given clock cycle? In most static issue processors, this process is at least partially handled by the compiler; in dynamic issue designs, it is normally dealt with at runtime by the processor, although the compiler will often have already tried to help improve the issue rate by placing the instructions in a beneficial order.
Dealing with data and control hazards: in static issue processors, the compiler handles some or all of the consequences of data and control hazards statically. In contrast, most dynamic issue processors attempt to alleviate at least some classes of hazards using hardware techniques operating at execution time.

The positions from which instructions could issue in a given clock cycle; by analogy, these correspond to positions at the starting blocks for a sprint.

Issue slots

An approach whereby the compiler or processor guesses the outcome of an instruction to remove it as a dependence in executing other instructions.

Speculation

The set of instructions that issues together in one clock cycle; the packet may be determined statically by the compiler or dynamically by the processor.

Issue packet

A style of instruction set architecture that launches many operations that are defined to be independent in a single wide instruction, typically with many separate opcode fields.

Very Long Instruction Word (VLIW)

Number of clock cycles between a load instruction and an instruction that can use the result of the load without stalling the pipeline.

Use latency

A technique to get more performance from loops that access arrays, in which multiple copies of the loop body are made and instructions from different iterations are scheduled together.

Loop unrolling
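
The transformation can be illustrated at source level in Python. This is only a sketch of the idea: a compiler would unroll machine instructions, and the Python version is not expected to run faster. The function names and the unroll factor of 4 are arbitrary choices.

```python
def sum_rolled(values: list) -> int:
    total = 0
    for v in values:
        total += v  # every iteration depends on the previous one through `total`
    return total


def sum_unrolled_by_4(values: list) -> int:
    # Four independent accumulators play the role of renamed registers,
    # so work from different iterations could be scheduled together.
    s0 = s1 = s2 = s3 = 0
    limit = len(values) - len(values) % 4
    for i in range(0, limit, 4):
        s0 += values[i]
        s1 += values[i + 1]
        s2 += values[i + 2]
        s3 += values[i + 3]
    # Cleanup loop for the leftover elements when the length is not a multiple of 4.
    for i in range(limit, len(values)):
        s0 += values[i]
    return s0 + s1 + s2 + s3
```

The unrolled body also performs one loop test and index update per four elements instead of one per element.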

The renaming of registers by the compiler or hardware to remove antidependences.

Register renaming

Also called name dependence. An ordering forced by the reuse of a name, typically a register, rather than by a true dependence that carries a value between two instructions.

Antidependence

An advanced pipelining technique that enables the processor to execute more than one instruction per clock cycle by selecting them during execution.

Superscalar

Hardware support for reordering the order of instruction execution to avoid stalls.

Dynamic pipeline scheduling

The unit in a dynamic or out-of-order execution pipeline that decides when it is safe to release the result of an operation to programmer-visible registers and memory.

Commit unit

A buffer within a functional unit that holds the operands and the operation.

Reservation station

The buffer that holds results in a dynamically scheduled processor until it is safe to store the results to memory or a register.

Reorder buffer

A situation in pipelined execution when an instruction blocked from executing does not cause the following instructions to wait.

Out-of-order execution

A commit in which the results of pipelined execution are written to the programmer visible state in the same order that instructions are fetched.

In-order commit
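
How out-of-order completion and in-order commit fit together can be sketched with a toy reorder buffer in Python. The class and method names are hypothetical, and the model tracks only completion status, not result values or destinations.

```python
from collections import deque


class ReorderBuffer:
    """Toy model: entries are kept in fetch order, oldest on the left."""

    def __init__(self) -> None:
        self.entries = deque()

    def issue(self, tag: str) -> None:
        self.entries.append({"tag": tag, "done": False})

    def complete(self, tag: str) -> None:
        # Execution may finish in any order; just mark the matching entry.
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["done"] = True
                return
        raise KeyError(tag)

    def commit(self) -> list:
        # Release results only from the head, so state is updated in fetch order.
        committed = []
        while self.entries and self.entries[0]["done"]:
            committed.append(self.entries.popleft()["tag"])
        return committed
```

If three instructions are issued and the third finishes first, `commit()` returns nothing until the first one completes, which is what keeps interrupts precise.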

Both pipelining and multiple-issue execution increase peak instruction throughput and attempt to exploit blank.

instruction-level parallelism (ILP)

The organization of the processor, including the major functional units, their interconnection, and control.

Microarchitecture

The registers of a processor that are visible in its instruction set; for example, in LEGv8, these are the 32 integer and 32 floating-point registers.

Architectural registers

Verilog can describe processors for simulation or with the intention that the Verilog specification be blank.

synthesized

The inherent execution time for an instruction.

Instruction latency

The first supercomputer.

CDC 6600

An approach that uses dynamic hazard detection, generalized forwarding, and reservation stations.

Tomasulo's algorithm

A computer that used a four-stage pipeline to overlap fetch, decode, and execute.

The IBM 7030, also known as Stretch

The computer that introduced Tomasulo's algorithm.

IBM 360/91

Out-of-order instruction commits led to this unpopular situation.

imprecise interrupts

The original design for a superscalar processor was a two-issue machine called _________

Cheetah

_________ is a static multiple issue approach that was used in processors found in Cydrome and Multiflow mini-supercomputers.

VLIW

_________ is a more compiler-intensive approach to multiple issue that removed many VLIW drawbacks.

EPIC

_________ is an approach that uses aggressive loop unrolling and path prediction and has been used to exploit higher levels of ILP.

Trace scheduling