Appendix A Flashcards
Instruction set architectures classification
By internal storage:
- accumulator architecture - operands are implicitly in accumulator
- stack architecture - operands are implicitly on top of the stack
- general-purpose register architecture - only explicit operands
By accessing memory and registers:
- register-memory architecture - any instruction can access memory
- load-store architecture - load and store instructions to access memory
- memory-memory architecture - not used anymore, all operands in memory
One special type is also extended accumulator or special-purpose register computers - more registers than just accumulator
Currently mostly used are general purpose register architectures - registers are faster than memories.
How many registers are sufficient?
Depends on the effectivity of the compiler, some compilers allocate GP registers for different purposes such as expression evaluation (e.g. x86 stores the result of the 8-bit division in AL and modulus in AH).
Types of GPR architectures
By number of operands:
- 2 operand format - one operand is both source and destination of the operation
- 3 operand format - one result operand, two src operands
By number of memory operands:
- multiple different amounts - from all three to none
Memory access types
byte - 8 bits
half word -16 bits
words - 32 bits
double words - 64 bits
Byte ordering
Big Endian - highest bits are in the end - 0,1,2,3,4,5,6,7
Little Endian - lowest bits are in the end - 7,6,5,4,3,2,1,0
Memory alignment
Used for accesses higher than byte - memory is then aligned by the module operation where address if address mod access size => s = 0 (e.g. for word access address 0 and 4 is aligned and address such as 2, or 6 is misaligned access
Addrsessing modes
Acutal address is called effective address
Modes:
- register - value is in register
- immidiate - constants, operand is actual value
- displacement - memory[constant + regValue]
- register indirect - memory[regValue]
- indexed - memory[reg1Value + reg2Value]
- direct/absolute - memory[constant]
- memory indirect - memory[memory[regValue]]
- autoincrement/autodecrement - memory[regValue = regValue + d/regValue = regValue - d]
- scaled - memory[constant + reg1Value + reg2Value*d]
regValue - value in register
reg1Value, reg2Value - multiple registers value
d - displacement
constant - constant value
Types of operands
character - 8 bits
half word - 16 bits
word - 32 bits
single-precision floating-point - 32 bits
double-precision floating-point - 64 bits
Decimal operands
- packed and binary coded decimals - 4 bits encode 0 - 9 and 2 digits are packed in each byte
- numeric character strings - unpacked decimals
- used to precisely repersent decimals which are difficult to represent in binary (e.g. 0.1)
Operations in instruction sets
- Arithmetic and logic operations - e.g. add, subtract, multiply, divide, or, and
- Data transfer - load-store
- Control - branch, jump, etc.
- System - operating system call, virtual memory management
- Floating point - floating point operations - add, subtract, multiply, divide…
- Decimal - decimal add, decimal multiply, decimal to character conversions
- String - string move, string compare
- Graphics - pixel and vertex operations
Control flow instructions types
conditional branches
jumps
procedure calls
procedure returns
Adddressing modes for control flow
- PC-relative - adding displacement to program counter (PC)
- position independent
- instructions don`t need to be fully loaded
- usually smaller jumps
Register indirect jumps - when jump address is unknown at compilation time - case or switch statements
- virtual functions or methods
- higher-order functions or function pointers
- dynamically shared libraries
They all usually load jump address from memory
Condition branch options
- Condition code - tests special bits set by ALU operations, possibly under program control
- Condition register/limited comparison - tests arbitrary register with the result of a simple comparison
- Compare and branch - Compare is part of the branch
Procedure invocation - saving processor state
Storing the whole processor state in other registers (older architectures). Using load and store operations to store processor`s state (newer architectures).
Caller saving vs. Callee saving
In caller saving calling procedure is responsible to save the state of the processor.
In callee saving called procedure is responsible for saving the state of the processor.
Encoding instructions - considerations
- desire to have as many registers and addressing modes as possible
- impact of the register and addressing modes on the average instruction size and average program size
- instructions encoded into lengths that are easy to handle in a pipelined implementation.
Instruction sizes
Variable - instructions can have different sizes - x86, VAX
Fixed - instructions have the same length - RISC V, ARM, MIPS
Hybrid - instructions have different lengths however, there are only few different sizes of instructions. The size is specified by the opcode
Compilers - golas
Goal:
1. valid programs must be compiled correctly
2. speed of compiled code
Other goals:
- fast compilation, debugging support, interoperability among other languages
Compilers - structure
Front end per language
- Dependencies - language dependent, machine independent
- Function - transforms language to common intermediate form
High-level optimizations
- Dependencies - Somewhat language dependent, largely machine independent
- Function - loop transformation, procedure inlining…
Global optimizer
- Dependencies -Small language dependencies, machine dependencies slight
- Function - Including global and local optimizations + register allocations
Code generator
- Dependencies - Highly machine dependent, language independent
- Function - Detailed instruction selection and machine-dependent optimizations, may be included or followd by assembler
Compilers - phase ordering problem
Transformations can cause irreversible change of the program structure. Thus compilers need to choose some procedures, or loops to expand inline before knowing their sizes.
Compiler - optimizations
- High-level optimizations - source is fed to later optimization passes
- Local optimizations - optimizing code only within straight-line code fragment (basic block)
- Global optimizations - extends optimizations accross branches and introduces a set of transformations aimed at optimizing loops
- Register allocation - associates registers with operands
- Processor-dependent optimizations - attempt to take advantage of specific architectural knowledge
Register allocation
- based on technique of graph colouring
- goal is to achieve 100% allocation of the registers for active variables
- NP-complete problem
- Works best with at least 16 GP registers
Compiler - optimizations 2
- High-level - at or near source level - processor independent
- procedure integration - replace procedure call by its body
- Local - within straight line code
- common subexpression elimination
- Constant propagation
- Stack height reduction
- Global - across a branch
- global common subexpression elimination
- copy propagation
- code motion
- induction variable elimination
- Processor-dependent - depends on processor knowledge
- Strength reduction
- Pipeline scheduling
- Branch offset optimization
Allocating data
Stack - allocates local variables
- grows and shrinks according to calls and returns
- objects addressed relative to stack pointer
- primarily used for scalars
Global data area
- statically declared objects
- arrays and other aggregated structures
Heap
- dynamically created structures
- access through pointers
Register allocation
- more effective for stack allocated ojbects rather than global variables
- almost impossible to use for heap-allocated objects => accessed via pointers
- aliased variables (in stack and global data area) are difficult to register-allocate
Basic principle in architecture
frequent cases fast, rare cases correct
Helping compiler writer
- Provide regularity - when it makes sense, data types, instructions and addressing modes should be orthogonal. Orthogonal => independent. E.g. for every instruction if it can run in one addressing mode, it should be capable to run also in different addressing modes
- Provide primitives, not solutions - special features to support a higher-languages are usually ineffective, or unusable
- Simplify trade-offs among alternatives
- Provide instructions that bind the quantities known at compile time as constants
Multimedia instructions
SIMD - lot of problems, not much used, ignores basic principles to help the compiler writer
Alternatives - MMX, AltiVec, SSE
Addressing in vector processors
Vector computers provide strided addressing, gather/scatter addressing, register indirect addressing in multiple registers
Risc-V - registers
32 64-bit GPRs - x0..x31 - integer registers
32 FPRs - can hold 32-bit single-precision, or 64-bit double-precision
x0 is always 0
Risc-V - data types
Integer - 8, 16, 32 and 64-bit
Floating-point - 32 single-precision, 64 double precision
Operations work on 64-bit integers and 32 and 64 bit single or double precision
When using smaller integer operations that 64 bits, zeros, or sign bits are appended
Risc-V - addressing types
immidiate or displacement both with 12-bit wide fields
register indirect - placing 0 to the 12-bit displacement
absolute adressing - using x0 as base register
Risc-V - addressing architecture
byte addressable memory
load-store architecture
aligned and unaligned memory accesses - unaligned access may run extremly slow
Risc-V - instruction format
Fixed instruction length
Two addressing modes encoded in the opcode
7 bit primary opcode
12 bits for immidiate values, displacement, or PC-relative branches
Risc-V - instruction types
R-type - register-register value - rd destination register, rs1 - register 1, rs2 register 2
I-type - immidiate values - rd - destination, rs1 - first source base register, immediate - value displacement
S-type - store, compare and branch - rs1 - base register first source, rs2 - data source to store second source, immediate - displacement offset
U-type - jump and link, jump and link registers - rd - register destination for return PC, rs1 - target address for jump and link register, immidiate - target address for jump and link