SIMD Flashcards
what is simd
single instruction, multiple data: several execution units next to each other do the same op on different data
transistor budget
less power
and ideally twice the performance when the vector width doubles
what are the vector sizes in simd
SSE 128 bits
AVX 256 bits
AVX-512 512 bits
intrinsic functions
data types: __m256 (8x32-bit float), __m256d (4x64-bit double), __m256i (integers of any width)
vector length prefix: _mm512_, _mm256_, _mm_ (128)
function: add, mul, sub, load, store
type and precision suffix: pd (packed double), ps (packed single), ss (scalar single)
loadu: unaligned
load: aligned
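A minimal sketch of how the naming pieces above fit together (prefix _mm256_, function add, suffix ps). The function name add_floats is made up for illustration, and the code assumes an x86-64 CPU with AVX plus GCC or Clang; the target attribute is used here only so that it compiles without -mavx.

```c
#include <immintrin.h>  /* AVX intrinsics: __m256, _mm256_* */
#include <stddef.h>

/* Hypothetical helper: out[i] = a[i] + b[i] for n floats.
   target("avx") is a GCC/Clang attribute that enables AVX for this
   function only (an assumption about the toolchain). */
__attribute__((target("avx")))
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

    /* _mm256_ = 256-bit vector, add = operation, ps = packed single:
       8 floats are processed per instruction. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);  /* loadu: no alignment required */
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }

    /* Scalar loop for the leftover elements when n is not a multiple of 8. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Swapping loadu/storeu for _mm256_load_ps/_mm256_store_ps would require every pointer to be 32-byte aligned.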
vector efficiency
VE = N/vl
N: trip count (how many times you're doing the vectorized loop)
vl: vector length
simd intrinsics cons
low level
error prone
that's why we use compiler-generated simd code!!
when does autovectorisation happen and when does it fail
- if possible (the compiler can prove it is safe)
- if beneficial (the compiler's cost model expects a speedup)
fails for:
- data dependency
- alignment
- mixed data types
- function calls in loop
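A small sketch contrasting a loop that is a good autovectorization candidate with one that hits the data dependency case. The function names are made up, and whether a given compiler actually vectorizes the first loop depends on its version and flags.

```c
#include <stddef.h>

/* Independent iterations, one data type, no function calls:
   a good candidate. restrict promises that x and y do not overlap. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Loop-carried dependency with distance 1: iteration i needs the
   result of iteration i-1, so the loop cannot be vectorized as written. */
void prefix_sum(float *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```

To see what the compiler decided, GCC has -fopt-info-vec and -fopt-info-vec-missed, and Clang has -Rpass=loop-vectorize.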
can we parallelise or vectorize a loop with a loop-carried dependency
parallelise: no
vectorize: yes, if the dependency distance is at least the vector length
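A sketch of the distance rule, assuming a made-up function shift_add: iteration i reads what iteration i-4 wrote, so the dependency distance is 4 and up to 4 consecutive iterations can safely share one vector.

```c
#include <stddef.h>

/* Hypothetical example: a[i] depends on a[i-4] (distance 4). */
void shift_add(float *a, size_t n)
{
    /* safelen(4) tells the compiler that at most 4 consecutive iterations
       may be executed together. A compiler without OpenMP support ignores
       the pragma and runs the loop as scalar code with the same result. */
    #pragma omp simd safelen(4)
    for (size_t i = 4; i < n; i++)
        a[i] = a[i - 4] + 1.0f;
}
```

With 8 lanes per vector the same loop would break: lanes 4 to 7 would read values that lanes 0 to 3 have not written yet.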
openmp simd
#pragma omp simd
must be followed directly by a for loop; combined with worksharing: #pragma omp for simd
- can have private and reduction (firstprivate only on the combined for simd)
- safelen(l): max number of iterations that may run together in one vector (max vec length)
- linear(list[:linear-step])
- aligned(list[:alignment])
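A short sketch of two of these clauses in use. The function names dot and gather_even are made up for illustration. The aligned clause is left out on purpose, since it is a promise by the caller that the compiler does not check.

```c
/* reduction(+:sum): every SIMD lane keeps a partial sum,
   and the partial sums are combined after the loop. */
float dot(const float *x, const float *y, int n)
{
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* linear(j:2): j grows by exactly 2 per iteration, so the compiler
   can compute j for every lane without a serial dependency. */
void gather_even(const float *src, float *dst, int n)
{
    int j = 0;
    #pragma omp simd linear(j:2)
    for (int i = 0; i < n; i++) {
        dst[i] = src[j];
        j += 2;
    }
}
```

The pragmas only take effect when OpenMP SIMD support is switched on (for example -fopenmp-simd with GCC or Clang); otherwise they are ignored and the loops run as ordinary C.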
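for simd loops: simd chunks
schedule(simd:static, 5): chunk size of roughly 5 iterations, rounded up to a multiple of the vector length, which favours alignment and avoids remainder loops inside chunks
- remainder loop: leftover iterations at the end (trip count not a multiple of vl)
- peel loop: scalar iterations at the beginning, until the data is aligned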
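A hand-written sketch of the three parts a compiler typically generates for a vectorized loop: peel, vector body, remainder. The function scale and the constant VL = 8 (8 floats, i.e. 256 bits) are assumptions for illustration, and the inner k loop only stands in for a single SIMD instruction.

```c
#include <stddef.h>
#include <stdint.h>

enum { VL = 8 };  /* assumed vector length in floats */

/* Hypothetical example: multiply n floats by s in place. */
void scale(float *a, size_t n, float s)
{
    size_t i = 0;

    /* Peel loop: scalar iterations at the beginning,
       until a + i sits on a vector-sized (32-byte) boundary. */
    while (i < n && ((uintptr_t)(a + i) % (VL * sizeof(float))) != 0) {
        a[i] *= s;
        i++;
    }

    /* Vector body: whole groups of VL elements. */
    for (; i + VL <= n; i += VL)
        for (size_t k = 0; k < VL; k++)  /* stands in for one SIMD op */
            a[i + k] *= s;

    /* Remainder loop: fewer than VL elements left at the end. */
    for (; i < n; i++)
        a[i] *= s;
}
```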