Lesson 4--Datapaths Flashcards
VLIW datapath
Collection of its execution units, which perform data transformation
CISC and DSP
All operands reside in memory
VLIW and RISC
Operands are in the registers before any work is done.
True or False: 16 bit buses are much more efficient than 32 bit buses
True
ISA minimum requirements
- -Minimum commands for accessing memory: load, store
- -Minimum commands to perform arithmetic functions: subtraction
- -Minimum commands for control functions: less than zero, equals zero, branch unconditional
Choosing Operations tradeoffs
- -Efficiency
- -Performance
- -Complexity
- -Design
- -Silicon cost
Other architectures
- -SIMD
- -MIMD
- -microSIMD
True or False: VLIW is more efficient than microSIMD for a media processor
False
True or False: For the same number of addressable registers, microSIMD holds more multimedia data than superscalar or VLIW.
True
True or False: microSIMD supports data parallelism, but at a much higher complexity of register ports
False
Min instructions for accessing memory
- -Load a value from mem into register
- -Store a value from a register into mem
Min instructions set to perform arithmetic functions of a processor:
Subtract
Min instructions set to perform control functions of a processor:
- -Result is zero
- -Result is less than zero (no need for greater than zero)
- -Branch unconditional (goto)
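The minimum set listed above (load, store, subtract, branch on zero, branch on less than zero, unconditional branch) can be tried out in a few lines. This is a minimal sketch, not a real ISA: the tuple encoding, the four registers, and the helper names `run`, `add`, and `maximum` are all assumptions made for illustration. It shows why subtract alone is enough for addition, since a + b = a - (0 - b).

```python
def run(program, memory):
    """Interpret a hypothetical machine that has only the minimum operations."""
    regs = [0, 0, 0, 0]   # four registers: an arbitrary choice for this sketch
    last = 0              # result of the most recent subtract, tested by branches
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "load":        # load rd, addr
            regs[args[0]] = memory[args[1]]
        elif op == "store":     # store rs, addr
            memory[args[1]] = regs[args[0]]
        elif op == "sub":       # sub rd, ra, rb  ->  rd = ra - rb
            last = regs[args[1]] - regs[args[2]]
            regs[args[0]] = last
        elif op == "bz":        # branch if the last result was zero
            if last == 0:
                pc = args[0]
        elif op == "bltz":      # branch if the last result was less than zero
            if last < 0:
                pc = args[0]
        elif op == "jmp":       # branch unconditional (goto)
            pc = args[0]
        else:
            raise ValueError(f"unknown operation: {op}")
    return memory


def add(a, b):
    """Addition built from subtraction only: a - (0 - b)."""
    program = [
        ("load", 0, 0),     # r0 = a
        ("load", 1, 1),     # r1 = b
        ("sub", 2, 2, 2),   # r2 = 0
        ("sub", 2, 2, 1),   # r2 = 0 - b
        ("sub", 0, 0, 2),   # r0 = a - (-b)
        ("store", 0, 2),    # memory[2] = r0
    ]
    return run(program, [a, b, 0])[2]


def maximum(a, b):
    """Uses 'less than zero' plus 'goto'; no 'greater than' test is needed."""
    program = [
        ("load", 0, 0),     # 0: r0 = a
        ("load", 1, 1),     # 1: r1 = b
        ("sub", 2, 0, 1),   # 2: r2 = a - b
        ("bltz", 6),        # 3: if a - b < 0, go store b
        ("store", 0, 2),    # 4: memory[2] = a
        ("jmp", 7),         # 5: skip the other arm
        ("store", 1, 2),    # 6: memory[2] = b
    ]
    return run(program, [a, b, 0])[2]
```

Calling `add(60, 43)` or `maximum(3, 9)` runs the corresponding program on the toy machine and returns the value stored at memory address 2.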
The Datapath
The controller makes sure the data is latched into the buffer properly, coordinates ALU operations, and checks for hazards.
Memory to memory
CISC and DSP: all the operands reside in memory, so there must be ______ to ______ operations and complex addressing modes.
Accumulators are target registers of the ALU
This means the compiler is forced to make binding choices and optimizations too early to reduce the memory traffic.
True
True or false: VLIW and RISC use lots of registers
False
True or false: VLIW and RISC compiler does not need to be very aggressive with register allocation
True
True or false: VLIW and RISC compilers must decouple scheduling and register allocation, so scheduling is performed first and then register allocation is done.
Datapath Operations Cycles
The 32-bit processors take far more cycles than the 16-bit versions because each 32-bit operation is broken down into 16-bit operations that use the carry bit to extend the size.
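The card above describes a 32-bit operation being broken into 16-bit operations chained through the carry bit. A minimal sketch of that decomposition follows; the function names `add16` and `add32_with_16bit_steps` are hypothetical, and it models only the arithmetic, not cycle counts.

```python
MASK16 = 0xFFFF


def add16(a, b, carry_in=0):
    """One 16-bit add step: returns (16-bit sum, carry out)."""
    total = a + b + carry_in
    return total & MASK16, total >> 16


def add32_with_16bit_steps(x, y):
    """32-bit add done as two 16-bit adds linked by the carry bit."""
    low, carry = add16(x & MASK16, y & MASK16)
    high, carry_out = add16((x >> 16) & MASK16, (y >> 16) & MASK16, carry)
    return (high << 16) | low, carry_out
```

The second step cannot start until the first has produced its carry, which is why the two halves are computed in sequence.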
False
True or False: Datapath Operations Cycles that use 16 bit buses operate much less efficiently than 32 bit buses.
Datapath Width
The width of the datapath equals the width of the registers that hold int and float.
False
True or False: The initial versions of ARM supported floating point operations in hardware rather than through a software library.
True
True or False: The initial versions of ARM processors were extremely efficient, but floating point operations took 100s of clock cycles
False, there were usually two
True or False: CISC and RISC usually have only one datapath, with a single width.
Narrower
With CISC and RISC, is the integer datapath narrower or wider than the floating point one?
DSPs datapaths
These are likely to be 40 bits or 56 bits; these are ADC widths
8 - 32
VLIW Datapath Widths had __ to __ bit independent datapaths
True
True or False: VLIW Datapath Widths could be reconfigured to support floating point operations
Operation Repertoire
Choosing which operations to include in the ISA is difficult
Additional Operation Repertoire tradeoffs
- -application analysis
- -execution frequency
- -implementation complexity
all must be considered for this operation
The characteristics of application domain are:
- -Simple integer and compare operations for the basic units of execution of any program.
- -datapaths are extremely important.
- -In CISC, carry, overflow, and other flags are set by the arithmetic operations
CISC
In CISC or RISC: overflow and other flags are set by the arithmetic operations?
VLIW
With CISC or VLIW: Which one has more than one arithmetic operations occurring at the same time?
VEX
With VEX or VLIW: Where are flags stored in branch registers?
large and slow
Are Multipliers small and fast or large and slow?
smaller operations
In VEX, is the multiplication broken down into larger or smaller operations?
upper and lower
Integer Multiplication: The 32-bit multiplication is divided into two 16-bit multiplications: upper and lower, upper only, or lower only?
NOP and multiplication
Integer Multiplication: Is a NOP or a branch inserted to allow for the delay created by the multiplication or division?
Fixed Point Multiplication
Most embedded systems need short fixed-point to represent important data types
In VEX there are 3 multiplication forms
In ____ there are 3 multiplication forms: low 16 * low 16, low 16 * high 16, high 16 * high 16 (number of bits)
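The three 16-bit forms on the card are enough to rebuild a full 32-bit by 32-bit product. The sketch below assumes unsigned operands, and the helper names `mul_ll`, `mul_lh`, and `mul_hh` are invented for illustration rather than taken from any instruction set. It uses the identity x * y = xl*yl + (xl*yh + xh*yl) * 2^16 + xh*yh * 2^32.

```python
MASK16 = 0xFFFF


def mul_ll(x, y):
    """low 16 bits of x times low 16 bits of y"""
    return (x & MASK16) * (y & MASK16)


def mul_lh(x, y):
    """low 16 bits of x times high 16 bits of y"""
    return (x & MASK16) * ((y >> 16) & MASK16)


def mul_hh(x, y):
    """high 16 bits of x times high 16 bits of y"""
    return ((x >> 16) & MASK16) * ((y >> 16) & MASK16)


def mul32(x, y):
    """Unsigned 32x32 -> 64-bit product from 16x16 partial products."""
    cross = mul_lh(x, y) + mul_lh(y, x)   # the two low * high terms
    return mul_ll(x, y) + (cross << 16) + (mul_hh(x, y) << 32)
```

If only the low 32 bits of the result are needed, the high * high term drops out entirely, since it is shifted past bit 31.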
False, it is expensive
True or False: Higher precision fixed-point multiplication is cheap
True
True or False: Integer division is more expensive than multiplication?
Integer Division is more complex
This type of math is more complex because the answer may be an integer or a FP value
VEX
In ___, the divs instruction is provided as the basic component of an integer division.
35 cycles
Nonrestoring 32-bit division can take __ cycles
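For reference, here is a minimal sketch of textbook nonrestoring division on unsigned operands. It is not the VEX sequence: the function name and the `bits` parameter are assumptions. It produces one quotient bit per step, so 32 steps for a 32-bit dividend, plus a final correction; any setup cycles counted in the card's figure are not modeled.

```python
def nonrestoring_divide(dividend, divisor, bits=32):
    """Unsigned nonrestoring division: returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    mask = (1 << bits) - 1
    quotient = dividend & mask   # dividend bits shift out, quotient bits shift in
    remainder = 0                # partial remainder; may go negative between steps
    for _ in range(bits):
        top = (quotient >> (bits - 1)) & 1
        quotient = (quotient << 1) & mask
        if remainder >= 0:
            remainder = 2 * remainder + top - divisor
        else:
            # No restore step: add the divisor back on the next iteration instead.
            remainder = 2 * remainder + top + divisor
        if remainder >= 0:
            quotient |= 1        # this quotient bit is 1
    if remainder < 0:
        remainder += divisor     # single correction at the very end
    return quotient, remainder
```

The point of the technique is that a negative partial remainder is never restored inside the loop; the next step compensates by adding instead of subtracting.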
Shorter and Constant
The compiler may optimize shorter or longer divisions, or divisions by constants or variables?
code size
Division is rarely critical, and many systems favor code size or code quality over hardware design.
Saturated Arithmetic
This arithmetic occurs when you try to exceed the precision that is allowed for the implementation
It becomes an overflow; the result is 0x00000000
What is the result if we add 1 to a 32-bit int 0xFFFFFFFF (wrap around)?
No
Are overflows acceptable in embedded domains?
Saturated arithmetic is used.
Example: with a maximum of 100, 60 + 43 = 100 (not the expected 103).
What is used in embedded domains instead of overflow?
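The difference between wrap-around and saturation in the cards above can be sketched as follows. The function names and the explicit `low`/`high` bounds are assumptions for illustration; real hardware fixes the bounds to the register width.

```python
def wrapping_add_u32(a, b):
    """Ordinary unsigned 32-bit add: overflow wraps around."""
    return (a + b) & 0xFFFFFFFF


def saturating_add(a, b, low, high):
    """Saturated add: the result is clamped to [low, high] instead of wrapping."""
    return max(low, min(high, a + b))
```

With these, `wrapping_add_u32(0xFFFFFFFF, 1)` wraps to zero, while `saturating_add(60, 43, 0, 100)` stops at the upper bound.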
SIMD
In SIMD or VLIW instruction sets, does the same instruction work on a large quantity of data?
VLIW
In SIMD or VLIW instruction sets, does each instruction work on only one data set?
VLIW
In SIMD or VLIW, can there be multiple instructions and multiple data sets?
MicroSIMD Parallel Subword Architecture
With this architecture, a 64-bit FU can process words as one 64-bit, two 32-bit, or four 16-bit values
microSIMD Parallel Subword Architecture
With this architecture, data can be compacted to fit into a long 64-bit word
True
True or False: With microSIMD, data can be compacted and be fit into a long 64 bit word.
It speeds up processing and reduces data size requirements
Does microSIMD speed up processing and reduce data size requirements, or does it reduce processing and speed up data size?
MIMD
MIMD or SIMD has 4 instructions on 4 cores?
SIMD
MIMD or SIMD operates on four different data items with one instruction?
Superscalar
Superscalar or SIMD Operates on 4 instructions, four cores?
VLIW
VLIW or Superscalar has one instruction, four sub operations?
False; x86 are popular
True or False: MicroSIMD Operations are not popular
MicroSIMD
In embedded systems MicroSIMD or VLIW manipulates subwords
8 bits, 16 bits, 32 bits, 64 bits
MicroSIMD subwords are configurable as ___ bits, ___ bits, ___ bits, ___ bits
False; They need small precision quantities
True or False: Multimedia applications usually need large precision quantities
PADD4
PADD4 or 4PADD: breaks larger words down into subwords, then adds the 4 subwords, operating on all four at once.
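A minimal sketch of a packed add on four 16-bit subwords held in one 64-bit word. It assumes PADD4 adds lane by lane with each lane wrapping independently (no carry crosses a subword boundary); `pack4`, `unpack4`, and `padd4` are hypothetical helper names, and Python integers stand in for the 64-bit register.

```python
LANE_BITS = 16
LANE_MASK = 0xFFFF


def pack4(values):
    """Pack four 16-bit values into one 64-bit word, lane 0 in the low bits."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (LANE_BITS * i)
    return word


def unpack4(word):
    """Split a 64-bit word back into its four 16-bit subwords."""
    return [(word >> (LANE_BITS * i)) & LANE_MASK for i in range(4)]


def padd4(a, b):
    """Lane-wise add of two packed words; each lane wraps on its own."""
    return pack4([x + y for x, y in zip(unpack4(a), unpack4(b))])
```

One call to `padd4` does the work of four separate 16-bit adds, which is the data parallelism the surrounding cards refer to.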
False, they have difficulties.
True or False: In practice, microSIMD operations have no difficulties.
microSIMD operations difficulties
- -Alignment issues are a problem when breaking into subwords.
- -Structures that contain subword elements rarely align cleanly to word boundaries.
- -Unoptimized pre/post loop codes are needed.
- -Precision issues: there may need to be a few extra bits for holding intermediate stages of an algorithm
False, they have to respect control flow
True or false: MicroSIMD operations do not need to respect the control flow.
MicroSIMD
MicroSIMD or VLIW must mimic existing branches
Two operations control flow in MicroSIMD
- -it must mimic existing branches extensions
- -include partial predication
PCMPGT4
PCMPGT4 or PADD4: does the compare at a boundary. Then the subwords on the integer boundaries go through the add and the subtraction. The PSELECT then chooses the branch to take.
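The compare-then-select pattern on the card (partial predication) can be sketched like this: both arms of the branch are computed for every lane, and a per-lane mask picks which result to keep. The semantics assumed here (unsigned 16-bit lanes, an all-ones mask where the compare is true) and the helper names are illustrative, not a specification of the real operations.

```python
LANE_MASK = 0xFFFF
WORD_MASK = 0xFFFFFFFFFFFFFFFF


def pack4(values):
    """Pack four 16-bit values into one 64-bit word, lane 0 in the low bits."""
    return sum((v & LANE_MASK) << (16 * i) for i, v in enumerate(values))


def unpack4(word):
    return [(word >> (16 * i)) & LANE_MASK for i in range(4)]


def psub4(a, b):
    """Lane-wise subtract; each lane wraps independently."""
    return pack4([x - y for x, y in zip(unpack4(a), unpack4(b))])


def pcmpgt4(a, b):
    """Per-lane mask: 0xFFFF where a's lane > b's lane, else 0."""
    return pack4([LANE_MASK if x > y else 0 for x, y in zip(unpack4(a), unpack4(b))])


def pselect(mask, if_true, if_false):
    """Keep if_true where the mask is set, if_false elsewhere."""
    return (if_true & mask) | (if_false & ~mask & WORD_MASK)


def abs_diff4(a, b):
    """Branch-free |a - b| per lane: compute both arms, then select."""
    return pselect(pcmpgt4(a, b), psub4(a, b), psub4(b, a))
```

The scalar version would need an `if a > b` per element; here the branch is replaced by data flow, at the cost of computing both subtractions.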
SIMD
The sequential program can be parallelized using VEX or SIMD?
MicroSIMD
MicroSIMD or VLIW: The data is divided into subelements and then stored in a register. The entire register is operated on at once, so 4 data points are processed at the same time, reducing the processing time of the operation.
MicroSIMD
MicroSIMD or VLIW: Can achieve impressive results with a minimal hardware complexity
False
True or False: a complete set of microSIMD extensions does not cost too much
True
True or False: automatic extraction of microSIMD without hints, done by the compiler, is still unproven.
False
True or False: Manual code restructuring is no longer needed to exploit microSIMD parallelism
True
True or False: microSIMD can pack more data than VLIW
True
True or False: microSIMD holds more multimedia data than superscalar or VLIW architectures
VLIW
The complexity of register ports is higher in VLIW or microSIMD?
True
True or False: microSIMD is able to support a large number of operands.
Less
Is the number of register files much less or more for microSIMD?
True
True or False: microSIMD is more powerful than MIMD, SIMD, VLIW
True
True or False: VLIW can do four different kinds of instructions, while SIMD cannot.
VLIW
VLIW or VEX is better for more general parallelism.
Constants
- -Specifically the immediate operands and literals.
- -Are known at compile time, others at load time.
Short
Short or Long immediate constants tend to be used in addressing modes and fit in a single encoded operation
Long
Short or Long immediate constants can be the width of the datapath.
2
There are 2, 8 or 64 methods for long immediates
Two methods for long immediates
- -partial immediate load
- -memory allocated immediate: in this case the compiler allocates the long immediate in memory and emits an instruction that loads the immediate. (emits?)
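Both methods for long immediates can be sketched in a few lines. This assumes 16-bit short immediates and a 32-bit datapath; `load_upper`, `or_lower`, and the constant-pool list are hypothetical stand-ins, not operations of a particular ISA.

```python
def load_upper(imm16):
    """First partial load: place a 16-bit immediate in the upper half."""
    return (imm16 & 0xFFFF) << 16


def or_lower(reg, imm16):
    """Second partial load: merge a 16-bit immediate into the lower half."""
    return reg | (imm16 & 0xFFFF)


def partial_immediate_load(value):
    """Method 1: build a 32-bit constant from two short immediates."""
    reg = load_upper(value >> 16)
    return or_lower(reg, value)


def memory_allocated_immediate(constant_pool, index):
    """Method 2: the compiler placed the constant in memory; just load it."""
    return constant_pool[index]
```

Method 1 costs two operations and no memory access; method 2 costs one load plus the memory that holds the constant.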
False; They are immediate
True or False: With constants, branch offsets are not immediates
Constants
MIPS or VLIW: which has a jump instruction with 26-bit-wide offsets?
MIPS
PC is word aligned in MIPS or VLIW, so you can jump 28 bits.
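The 26-bit versus 28-bit arithmetic can be checked directly. In the MIPS J-type format the 26-bit field is shifted left by 2 (instructions are word aligned) and combined with the upper 4 bits of the address of the following instruction. The function name below is an assumption for illustration.

```python
def mips_jump_target(pc, index26):
    """Target of a MIPS `j` instruction located at address `pc`."""
    upper = (pc + 4) & 0xF0000000          # top 4 bits come from PC + 4
    lower = (index26 & 0x03FFFFFF) << 2    # 26-bit field becomes a 28-bit byte offset
    return upper | lower
```

Because the low two bits of every instruction address are zero, they never need to be encoded, which is how 26 encoded bits reach a 28-bit (256 MB) region.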
True
True or False: Most embedded processors with 32-bit ISAs include a branch offset large enough to cover local branches