OpenMP Flashcards by Beatrice Picco

three primary API components

1. compiler directives
2. runtime library routines
3. environment variables

ICV thread numbers

OMP_NUM_THREADS environment variable
num_threads(n) clause on the parallel pragma
omp_set_num_threads(n) runtime routine

sections

#pragma omp sections, #pragma omp section
- each section is run once, by 1 thread
- implicit barrier at end of sections region (nowait removes it)
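
A minimal sketch of the sections card, assuming two independent pieces of work on the same read-only array. The helper name `sum_and_max` is made up for illustration. Each section writes a different shared variable, so no extra synchronization is needed.

```c
#include <stddef.h>

/* Hypothetical helper: sum and maximum of a[0..n-1], assumes n >= 1. */
void sum_and_max(const int *a, size_t n, long *sum, int *max)
{
    long s = 0;
    int m = a[0];

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            /* this section is run once, by one thread */
            for (size_t i = 0; i < n; i++)
                s += a[i];
        }
        #pragma omp section
        {
            /* possibly run by another thread at the same time */
            for (size_t i = 1; i < n; i++)
                if (a[i] > m)
                    m = a[i];
        }
    }   /* implicit barrier: both sections are finished here */

    *sum = s;
    *max = m;
}
```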

for

no synchronization at the beginning
implicit barrier at the end (nowait removes it)
iterations must have no data dependency on each other
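
A small sketch of the for card, assuming two loops that touch different arrays. `scale_both` is a hypothetical name. The first loop uses nowait because the second loop does not read what the first one writes.

```c
/* Hypothetical helper: multiplies every element of a and b by f. */
void scale_both(int n, double *a, double *b, double f)
{
    #pragma omp parallel
    {
        /* no barrier at the end: threads go straight on to the b loop */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            a[i] *= f;

        /* implicit barrier at the end of this loop */
        #pragma omp for
        for (int i = 0; i < n; i++)
            b[i] *= f;
    }
}
```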

schedule

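used for the FOR loop: schedule(kind[, chunk])
1. static: chunks are fixed before the loop runs; without a chunk size each thread gets one block of about n/t iterations, with a chunk size the chunks are dealt round robin
2. dynamic: chunks are handed out one by one at runtime to whichever thread is free, default chunk size 1 (good for load balance, more overhead)
3. guided: like dynamic, but starts with bigger chunks that shrink roughly in proportion to the remaining iterations
4. runtime: kind is taken at runtime from OMP_SCHEDULE (or omp_set_schedule)
5. auto: compiler/runtime chooses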
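
A sketch of why the schedule matters, assuming a triangular loop where iteration i does i units of work. The function name and the chunk size 4 are arbitrary choices for illustration. With a plain static schedule the thread holding the last block would get most of the work, so dynamic is used here.

```c
/* Hypothetical helper: counts the pairs (i, j) with 0 <= j < i < n. */
long count_pairs(int n)
{
    long count = 0;

    /* iteration i costs i inner steps, so chunks of 4 are handed out
       at runtime to whichever thread is free */
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:count)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < i; j++)
            count++;

    return count;
}
```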

sharing attributes

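private: each thread gets its own uninitialized copy
shared: all threads access the same variable
default: default(shared) or default(none) sets the attribute for variables that are not listed
firstprivate: private, but initialized with the value from before the region
lastprivate: private, but the value from the sequentially last iteration (or section) is copied back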
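
A sketch that puts the clauses from this card on one loop. The function name and the computation are invented for illustration and assume n >= 1. default(none) forces every variable used in the region to be listed.

```c
/* Hypothetical helper: returns (n-1)*(n-1) + offset, assumes n >= 1. */
int last_square_plus_offset(int n, int offset)
{
    int tmp = 0;    /* scratch value, private per thread */
    int last = -1;  /* receives the value of the last iteration */

    #pragma omp parallel for default(none) shared(n) \
            firstprivate(offset) private(tmp) lastprivate(last)
    for (int i = 0; i < n; i++) {
        tmp = i * i;          /* private: assigned before it is read */
        last = tmp + offset;  /* offset: private copy with the outer value */
    }

    /* lastprivate copied back the value from iteration i == n - 1 */
    return last;
}
```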

list synchronization pragmas

barrier, masked region, single region, critical section, atomic statement, ordered construct

barrier

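#pragma omp barrier

synchronizes all threads of the team: nobody continues until everyone has arrived
fast threads sit idle while waiting (this exposes load imbalance), use only when needed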
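
A sketch of a case where a barrier is really needed, assuming a two-phase computation in which phase 2 reads neighbours written by other threads in phase 1. The helper name is hypothetical. The nowait on the first loop is only there so that the explicit barrier is the one doing the work.

```c
/* Hypothetical helper: tmp[i] = 2*in[i], then out[i] = tmp[i] + tmp[(i+1) % n]. */
void neighbor_sums(int n, const int *in, int *tmp, int *out)
{
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            tmp[i] = 2 * in[i];

        /* every tmp[] entry must be written before anyone reads a neighbour */
        #pragma omp barrier

        #pragma omp for
        for (int i = 0; i < n; i++)
            out[i] = tmp[i] + tmp[(i + 1) % n];
    }
}
```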

masked Construct

#pragma omp masked [filter(integer-expression)]

only the primary thread (thread 0) executes the code, unless filter selects another thread number; the others skip it
no implied barrier at either end

single Construct

#pragma omp single

implicit barrier at the end! (nowait removes it)
exactly one thread executes it (any one, not necessarily the primary)
clauses like private(list), firstprivate(list)
useful e.g. for initializing data structures
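
A sketch of the initialization use case, assuming one shared value that has to be set up before all threads use it. The helper name and the value 2.5 are made up. The implicit barrier at the end of single is what makes the following loop safe.

```c
/* Hypothetical helper: out[i] = scale * i, with scale set up by one thread. */
void fill_with_scaled(int n, double *out)
{
    double scale = 0.0;   /* shared by the whole team */

    #pragma omp parallel
    {
        /* one thread initializes the shared data, the others wait
           at the implicit barrier at the end of single */
        #pragma omp single
        scale = 2.5;

        #pragma omp for
        for (int i = 0; i < n; i++)
            out[i] = scale * i;
    }
}
```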

critical construct

#pragma omp critical [(name)]

restricts execution of the associated structured block to a single thread at a time
- no implicit barrier: threads only wait at the entry while another thread is inside
- serializes the threads, only use when really needed
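
A sketch of a typical critical pattern, assuming each thread first works on a local value and enters the critical region only once. The function name and the region name `update_best` are hypothetical. The array is assumed to have at least one element.

```c
/* Hypothetical helper: maximum of a[0..n-1], assumes n >= 1. */
int max_value(const int *a, int n)
{
    int best = a[0];

    #pragma omp parallel
    {
        int local = a[0];   /* per-thread running maximum */

        #pragma omp for nowait
        for (int i = 1; i < n; i++)
            if (a[i] > local)
                local = a[i];

        /* one thread at a time merges its local result */
        #pragma omp critical(update_best)
        {
            if (local > best)
                best = local;
        }
    }

    return best;
}
```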

atomic statement

#pragma omp atomic

so a memory location is updated atomically
- like critical but with less overhead, can avoid locking
- only protects 1 simple operation (e.g. x++ or x += expr)
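
A sketch of atomic on a shared counter array, assuming non-negative input values. The helper name is made up. Several iterations may hit the same bin, so the increment is the one operation that has to be protected.

```c
/* Hypothetical helper: counts values[i] % nbins, assumes values[i] >= 0
   and that bins[] was zeroed by the caller. */
void histogram(const int *values, int n, int *bins, int nbins)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        int b = values[i] % nbins;

        /* a single update of one memory location */
        #pragma omp atomic
        bins[b]++;
    }
}
```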

ordered construct

#pragma omp ordered

block of code that must be executed in sequential order.
- The ordered construct sequentializes and orders the execution of ordered regions while allowing code outside the region to run in parallel.
- clauses: threads (default), simd
- only inside a for loop whose pragma has the ordered clause
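
A sketch of ordered inside a loop, assuming the expensive part can run in parallel but the results have to be appended in iteration order. The helper name and the schedule(static, 1) choice are illustrative only.

```c
/* Hypothetical helper: appends i*i to out[] in iteration order,
   returns how many values were written. */
int ordered_squares(int n, int *out)
{
    int pos = 0;   /* shared, only touched inside the ordered region */

    #pragma omp parallel for ordered schedule(static, 1)
    for (int i = 0; i < n; i++) {
        int sq = i * i;        /* this part runs in parallel */

        #pragma omp ordered
        {
            out[pos++] = sq;   /* executed in the order i = 0, 1, 2, ... */
        }
    }

    return pos;
}
```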

omp locks

kinda like a mutex

omp_lock_t lockvar; declares a simple lock variable
omp_init_lock(&lockvar); initializes a simple lock
omp_destroy_lock(&lockvar); uninitializes a simple lock
omp_set_lock(&lockvar); waits until a simple lock is available and then sets it
omp_unset_lock(&lockvar); unsets the lock
omp_test_lock(&lockvar); tries to set the lock without waiting, returns true if it got it

nestable locks possible (omp_nest_lock_t with the omp_*_nest_lock routines)

reduction

reduction (operator: list)

+ - * & ^ | && || max min
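
A sketch of the reduction clause with the + operator, using a dot product as the example. The function name is hypothetical. Each thread accumulates into its own private copy of sum (started at 0), and the copies are combined at the end of the loop.

```c
/* Hypothetical helper: dot product of x[0..n-1] and y[0..n-1]. */
double dot(const double *x, const double *y, int n)
{
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];

    return sum;
}
```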

correctness issues

data races: unsynchronized conflicting accesses to shared variables
loop dependencies: prevent parallelization of loops
aliasing: hidden loop dependencies, blocks compiler optimizations

types of data races

read after write
write after read
write after write

types of dependencies

true (flow): read after write
anti: write after read
output: write after write

types of loop dependencies

loop carried: an iteration depends on an earlier iteration
loop independent: the dependence stays within a single iteration
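
A sketch that contrasts the two kinds, with made-up helper names. The first loop reads a value written by the previous iteration, so it is left serial here. The second loop only has a dependence inside one iteration, so it can take a parallel for.

```c
/* Loop carried: a[i] needs a[i-1] from the previous iteration,
   so this loop must not simply be given a parallel for. */
void prefix_sum(int *a, int n)
{
    for (int i = 1; i < n; i++)
        a[i] += a[i - 1];
}

/* Loop independent: t is written and then read in the same iteration,
   different iterations never touch the same data. */
void add_then_double(const int *a, const int *b, int *c, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        int t = a[i] + b[i];
        c[i] = 2 * t;
    }
}
```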

list loop transformations

4: interchange, distribution, fusion, alignment

loop interchange

swap inner and outer loop
only legal if the swap does not reverse the direction of a loop-carried dependence
used to get better cache use (in C the innermost loop should run along a row)
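
A sketch of interchange on a matrix sum, with hypothetical names and an arbitrary 64 x 64 size. Both functions compute the same result; only the loop order differs. C stores arrays row by row, so the second version walks memory contiguously.

```c
#define ROWS 64
#define COLS 64

/* Before: the inner loop jumps from row to row (stride of COLS doubles). */
double sum_column_first(double m[ROWS][COLS])
{
    double s = 0.0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            s += m[i][j];
    return s;
}

/* After interchange: the inner loop runs along one row (stride 1). */
double sum_row_first(double m[ROWS][COLS])
{
    double s = 0.0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            s += m[i][j];
    return s;
}
```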

loop distribution

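split one loop into several loops to isolate or eliminate loop-carried dependencies, so the dependence-free loops can run in parallel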
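
A sketch of distribution, assuming an original loop whose body was `a[i] = a[i-1] + b[i]; c[i] = 2 * b[i];`. The function name is made up. After the split, the recurrence on a stays serial and the c loop has no dependence left.

```c
/* Hypothetical helper: the original single loop split into two. */
void distributed(int n, double *a, const double *b, double *c)
{
    /* loop 1: still loop carried (a[i] needs a[i-1]), kept serial */
    for (int i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];

    /* loop 2: iterations are independent, safe to parallelize */
    #pragma omp parallel for
    for (int i = 1; i < n; i++)
        c[i] = 2.0 * b[i];
}
```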

loop fusion

merge adjacent loops with the same bounds into one
may create loop-carried dependencies
reduces loop overhead!!
eliminates the need for a barrier between the loops
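
A sketch of fusion, assuming two original loops `a[i] = b[i] + 1` and `c[i] = 2 * a[i]` over the same range. The function name is hypothetical. In this case the fused loop only has a dependence inside one iteration, so it stays parallel and the barrier between the two loops disappears.

```c
/* Hypothetical helper: two loops over 0..n-1 fused into one. */
void fused(int n, int *a, const int *b, int *c)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        a[i] = b[i] + 1;   /* body of the first original loop */
        c[i] = 2 * a[i];   /* body of the second, reads a[i] of the same i */
    }
}
```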

loop alignment

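shift statements to different iterations so a loop-carried dependence becomes loop independent; this peels 1 operation before the loop and 1 after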
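
A sketch of alignment, assuming an original loop `for (i = 1; i < n; i++) { a[i] = b[i] + 1; c[i] = 2 * a[i-1]; }` where c[i] reads the a value written one iteration earlier. The function name is made up and n >= 2 is assumed. Shifting the c statement by one iteration leaves one operation before the loop and one after it.

```c
/* Hypothetical helper: aligned version of the loop described above,
   assumes n >= 2. */
void aligned(int n, int *a, const int *b, int *c)
{
    c[1] = 2 * a[0];              /* peeled off before the loop */

    #pragma omp parallel for
    for (int i = 1; i < n - 1; i++) {
        a[i] = b[i] + 1;
        c[i + 1] = 2 * a[i];      /* now reads a[i] of the same iteration */
    }

    a[n - 1] = b[n - 1] + 1;      /* peeled off after the loop */
}
```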

why is work sharing bad?

1. load imbalance (both static and dynamic scheduling have drawbacks)
2. imbalance because of the machine
3. limited program flexibility
this means we need tasks!!

tasks

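#pragma omp task
- usually created inside a single region (one thread creates the tasks, the whole team executes them)
- untied: a suspended task can resume on a different thread, may lose cache
- if: when the expression is false the task is not deferred and runs immediately
- sharing: default is firstprivate (unless the variable is shared in the enclosing context)
- priority: hint that influences execution order
taskwait: waits for completion of immediate child tasks
taskyield: current task can be suspended in favor of other tasks
dependencies: depend(in | out | inout: list)
PRO: load balance, simple
task granularity:
1. fine: more overhead, better resource use
2. coarse: schedule fragmentation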
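
A sketch of tasks on the usual recursive Fibonacci example. The function names and the cutoff `n > 10` in the if clause are arbitrary choices for illustration: below the cutoff the tasks are not deferred, which keeps the granularity from getting too fine.

```c
/* Hypothetical helper: recursive Fibonacci, one task per branch. */
static long fib_task(int n)
{
    long x, y;

    if (n < 2)
        return n;

    /* n is firstprivate by default, x and y must be shared
       so the child tasks can write the results back */
    #pragma omp task shared(x) if(n > 10)
    x = fib_task(n - 1);

    #pragma omp task shared(y) if(n > 10)
    y = fib_task(n - 2);

    /* wait for the two immediate child tasks */
    #pragma omp taskwait
    return x + y;
}

long fib(int n)
{
    long result = 0;

    #pragma omp parallel
    {
        /* one thread creates the root task, the team executes the tasks */
        #pragma omp single
        result = fib_task(n);
    }

    return result;
}
```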

omp runtime routines

omp_set_num_threads(N)
omp_get_max_threads()
omp_get_num_threads() size
omp_get_thread_num() rank
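
A sketch of the rank/size routines. The function name and the request for 4 threads are made up, and the `_OPENMP` guard is only there so the file still compiles when OpenMP is switched off (the check then trivially passes).

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns 1 if every thread saw 0 <= rank < size. */
int check_ranks(void)
{
    int ok = 1;

#ifdef _OPENMP
    omp_set_num_threads(4);   /* a request, the runtime may give fewer */

    #pragma omp parallel
    {
        int rank = omp_get_thread_num();    /* 0 .. size-1 */
        int size = omp_get_num_threads();   /* threads in this team */

        if (rank < 0 || rank >= size) {
            #pragma omp atomic write
            ok = 0;
        }
    }
#endif

    return ok;
}
```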

omp ICVs

environment variables that set ICVs: OMP_SCHEDULE, OMP_DYNAMIC, OMP_NESTED
