Lecture 11 - Vector Processors Flashcards
Vector Registers
Each holds a fixed number of elements. The vector register file has enough read and write ports for several vector operations to proceed in parallel.
Vector functional units
Fully pipelined so we can start processing a new vector element on each clock cycle.
Vector load-store unit
Moves data between vector registers and memory. It is pipelined, so after an initial latency it sustains a rate of one word per cycle.
Scalar Registers
Provide data as input to vector functional units.
Advantages of vector architectures
Exploit data-level parallelism to improve performance.
Potentially reduce complexity and reduce energy per operation because:
- Vector operations specify lots of independent operations which are simple to execute in parallel.
- Less switching activity in datapath.
- Drastic reduction in number of instructions.
- Regular patterns of access to register file and memory.
Vector Datapath
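Partition the register file and functional units into multiple lanes. This works because a vector instruction always combines the same element position from different vectors, so each lane can work on its own elements without communicating with the other lanes.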
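As a rough illustration of the lane partitioning above, the sketch below assumes elements are interleaved across lanes, so element i of every vector register lives in lane i mod (number of lanes). The function name `lane_assignment` and the 4-lane configuration are made up for this example.

```python
def lane_assignment(vector_length, num_lanes):
    """Map each element index to the lane that stores and processes it."""
    lanes = [[] for _ in range(num_lanes)]
    for i in range(vector_length):
        # Assumed interleaving: element i always belongs to lane i % num_lanes
        lanes[i % num_lanes].append(i)
    return lanes


# An 8-element vector on a hypothetical 4-lane datapath
print(lane_assignment(8, 4))  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Because element 5 of every register sits in lane 1, an add of element 5 from two source vectors never needs data from another lane.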
Initiation Rate
How many elemental operations are completed per cycle for each vector operation.
Start-up time
Time before the first result is produced. Because the functional units are pipelined, the first result takes a number of clock cycles equal to the pipeline depth.
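Start-up time and initiation rate can be combined into a rough cycle estimate. The sketch below uses a common simplified model (start-up cycles plus one cycle per group of elements); the function name, the parameter names and the example numbers are assumptions for illustration, and real machines add further overheads.

```python
import math


def vector_op_cycles(n, startup, initiation_rate=1):
    """Estimate cycles for an n-element vector operation.

    Simplified model (an assumption): pay the start-up time once, then
    complete `initiation_rate` element operations per cycle.
    """
    return startup + math.ceil(n / initiation_rate)


# 64 elements, pipeline depth 6, one result per cycle
print(vector_op_cycles(64, 6))       # 70
# Same operation with four lanes, so four results per cycle
print(vector_op_cycles(64, 6, 4))    # 22
```

For long vectors the start-up time is amortised over many elements, which is why it matters most when vectors are short.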
Vector Length Register
Controls length of any vector operation.
Strip Mining
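Handles vectors longer than the maximum vector length (MVL). Operate first on an odd-sized piece of n mod MVL elements, then on MVL-sized pieces until the whole vector has been processed. The vector length register is set to the size of each piece.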
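A minimal sketch of strip mining, assuming a hypothetical helper `strip_mine` that returns the (start index, length) of each piece. Each length is the value that would be written to the vector length register before that piece is processed.

```python
def strip_mine(n, mvl):
    """Split n elements into an odd-sized first piece plus MVL-sized pieces."""
    pieces = []
    low = 0
    vl = n % mvl                  # odd-sized first piece (may be empty)
    for _ in range(n // mvl + 1):
        if vl > 0:                # skip the first piece when n is a multiple of MVL
            pieces.append((low, vl))
        low += vl
        vl = mvl                  # every later piece uses the full MVL
    return pieces


print(strip_mine(10, 4))  # [(0, 2), (2, 4), (6, 4)]
print(strip_mine(8, 4))   # [(0, 4), (4, 4)]
```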
Vector Stride
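The fixed distance (stride) separating the memory elements that are gathered into a single vector register, for example the elements of one column of a row-major matrix. Supporting non-unit strides only requires changes to the load-store unit.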
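The sketch below models a strided vector load in software. It assumes the stride is counted in elements (real hardware typically counts bytes), and `strided_load` is a hypothetical helper rather than a real instruction.

```python
def strided_load(memory, base, stride, vl):
    """Load vl elements starting at index base, stride elements apart."""
    return [memory[base + i * stride] for i in range(vl)]


# A 3x4 matrix stored row by row in a flat list
matrix = list(range(12))

# Column 1 is a stride-4 access; a row would be a unit-stride access
column1 = strided_load(matrix, base=1, stride=4, vl=3)
print(column1)  # [1, 5, 9]
```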
Chaining
Start the next dependent instruction as soon as the individual elements of its source vectors become available, rather than waiting for the whole vector to be written. This reduces the cost of read-after-write hazards.
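The benefit of chaining can be estimated with a simplified timing model. The sketch below assumes a sequence of dependent vector instructions, each with its own start-up time, and ignores issue overhead and memory stalls; the function names and the start-up values of 7 and 6 cycles are illustrative assumptions.

```python
def unchained_cycles(n, startups):
    """Each instruction waits for the previous one to finish all n elements."""
    return sum(startup + n for startup in startups)


def chained_cycles(n, startups):
    """Each instruction starts as soon as the first element is forwarded."""
    return sum(startups) + n


# A multiply (start-up 7) feeding an add (start-up 6) on 64-element vectors
print(unchained_cycles(64, [7, 6]))  # 141
print(chained_cycles(64, [7, 6]))    # 77
```

Under this model the chained sequence pays the element time once instead of once per instruction.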
Tailgating
Overwrite elements of a vector register as soon as they have been read by a prior instruction, rather than waiting for that instruction to finish. This reduces the cost of write-after-read hazards.
Vector-mask Control
Use a Boolean vector of length MVL to control the execution of a vector instruction. The operation becomes a no-op for every element whose mask bit is clear, which allows loops containing conditionals to be vectorised.
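A minimal software model of a masked vector operation, assuming a hypothetical `masked_sub` in which destination elements with a clear mask bit keep their old value. It vectorises a loop of the form `if x[i] != 0: x[i] = x[i] - y[i]`.

```python
def masked_sub(dest, a, b, mask):
    """Compute a - b where the mask bit is set; keep dest elsewhere."""
    return [x - y if m else d for d, x, y, m in zip(dest, a, b, mask)]


x = [3, 0, 5, 0]
y = [1, 1, 1, 1]

# Step 1: a vector compare sets the mask
mask = [value != 0 for value in x]

# Step 2: the subtract runs under the mask
x = masked_sub(x, x, y, mask)
print(x)  # [2, 0, 4, 0]
```

In this model, as in many real designs, the masked-off elements still occupy a slot in the operation, so the execution time does not shrink when few mask bits are set.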
Scatter and Gather
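Indexed vector store (scatter) and indexed vector load (gather) instructions. They use a vector of indices to address memory, which supports loops that access arrays indirectly, as in sparse-matrix code.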
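The sketch below models gather and scatter on Python lists to vectorise the indirect loop `a[k[i]] = a[k[i]] + c[m[i]]`. The helper names and the sample data are assumptions for illustration, and the index vector `k` is assumed to contain no repeated indices.

```python
def gather(memory, indices):
    """Indexed load: fetch memory[i] for each index into a dense vector."""
    return [memory[i] for i in indices]


def scatter(memory, indices, values):
    """Indexed store: write each value back to memory at its index."""
    for i, value in zip(indices, values):
        memory[i] = value


a = [10, 20, 30, 40]
c = [1, 2, 3, 4]
k = [3, 0]   # index vector for a
m = [1, 2]   # index vector for c

va = gather(a, k)                          # [40, 10]
vc = gather(c, m)                          # [2, 3]
vsum = [p + q for p, q in zip(va, vc)]     # dense vector add
scatter(a, k, vsum)
print(a)  # [13, 20, 30, 42]
```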