OpenMP Flashcards

1
Q

three primary API components

A

1. compiler directives
2. runtime library routines
3. environment variables

2
Q

ICV thread numbers

A

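OMP_NUM_THREADS environment variable (sets the nthreads-var ICV)
num_threads(n) clause on the parallel directive (overrides it for that region)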

3
Q

sections

A

#pragma omp sections, #pragma omp section
- each section is executed once, by one thread
- implicit barrier at end of sections region (unless nowait)
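
A minimal sketch of the two directives, assuming C. The helper name sum_and_max is made up for illustration. Two independent jobs go into separate sections and may run on different threads:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: computes the sum and the maximum of data[0..n-1].
 * The two independent jobs are placed in separate sections. */
void sum_and_max(const int *data, size_t n, long *sum_out, int *max_out)
{
    assert(n > 0);
    long sum = 0;
    int max = data[0];

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            /* only this section writes sum */
            for (size_t i = 0; i < n; i++)
                sum += data[i];
        }
        #pragma omp section
        {
            /* only this section writes max */
            for (size_t i = 1; i < n; i++)
                if (data[i] > max)
                    max = data[i];
        }
    }   /* implicit barrier: both sections are finished here */

    *sum_out = sum;
    *max_out = max;
}
```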

4
Q

for

A
  • no synchronization at the beginning
  • implicit barrier at the end (nowait removes it)
  • iterations must have no data dependencies between them
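
A small sketch of the points above, assuming C. The function name squares_and_cubes is hypothetical. The second loop never reads what the first one writes, so the barrier after the first loop can be dropped with nowait:

```c
#include <assert.h>

/* Hypothetical helper: fills sq[i] = i*i and cu[i] = i*i*i for 0 <= i < n. */
void squares_and_cubes(long *sq, long *cu, int n)
{
    assert(n >= 0);
    #pragma omp parallel
    {
        /* nowait drops the implicit barrier: the next loop does not read sq */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            sq[i] = (long)i * i;

        /* this loop keeps its implicit barrier at the end */
        #pragma omp for
        for (int i = 0; i < n; i++)
            cu[i] = (long)i * i * i;
    }
}
```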
5
Q

schedule

A
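  • used with the for loop: schedule(kind[, chunk])
    1. static: chunks are assigned round robin before the loop runs; default is one contiguous chunk of about n/t iterations per thread
    2. dynamic: chunks are handed out one by one at runtime as threads become free; default chunk size is 1 (good for load balance)
    3. guided: like dynamic, but chunks start large and shrink exponentially
    4. runtime: the schedule is taken at runtime from OMP_SCHEDULE or omp_set_schedule()
    5. auto: compiler/runtime chooses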
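
A sketch of the schedule clause on an unbalanced loop, assuming C. The names triangular and fill_triangular and the chunk size 4 are arbitrary choices for illustration. Later iterations cost more, so a dynamic schedule may balance better than the default:

```c
#include <assert.h>

/* Hypothetical workload: cost grows with i, so iterations are unbalanced. */
static long triangular(int i)
{
    long t = 0;
    for (int k = 1; k <= i; k++)
        t += k;
    return t;
}

/* Fills out[i] with 0 + 1 + ... + i. */
void fill_triangular(long *out, int n)
{
    assert(n >= 0);
    /* dynamic with chunk size 4: a free thread fetches the next 4 iterations */
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < n; i++)
        out[i] = triangular(i);
}
```

Swapping in schedule(static), schedule(guided) or schedule(runtime) changes only how iterations are distributed, not the result.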
6
Q

sharing attributes

A
  1. private
  2. shared
  3. default
  4. firstprivate
  5. lastprivate
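
A sketch that uses several of these clauses together, assuming C. The helper fill_with_offset is hypothetical. firstprivate copies the original value into each thread, lastprivate copies the value of the sequentially last iteration back out:

```c
#include <assert.h>

/* Hypothetical helper: out[i] = offset + i; returns the value
 * computed by the last iteration (i == n - 1). */
int fill_with_offset(int *out, int n, int offset)
{
    assert(n > 0);
    int last = 0;
    /* default(none) forces every variable to be listed explicitly */
    #pragma omp parallel for default(none) shared(out, n) \
            firstprivate(offset) lastprivate(last)
    for (int i = 0; i < n; i++) {
        last = offset + i;   /* private copy; offset starts with the original value */
        out[i] = last;
    }
    return last;             /* value from iteration n - 1 */
}
```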
7
Q

list synchronization pragmas

A

barrier, masked region, single region, critical section, atomic statement, ordered construct

8
Q

barrier

A

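#pragma omp barrier

synchronizes all threads in the team: no thread continues until all have arrived
threads idle while they wait, which hurts under load imbalance; use only when needed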

9
Q

masked Construct

A

#pragma omp masked [filter(integer-expression)]

  • only the primary thread (thread 0) executes the code, or the thread number given by filter; the others skip it
  • no implied barrier at either end
10
Q

single Construct

A

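#pragma omp single

  • implicit barrier at the end (nowait removes it)
  • exactly one thread of the team executes it, not necessarily the primary thread
  • clauses such as private(list), firstprivate(list)
  • typical use: initializing data structures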
11
Q

critical construct

A

#pragma omp critical [(name)]

restricts execution of the associated structured block to a single thread at a time
- no implicit barrier
- threads wait for each other at the region, which costs parallelism; only use when really needed
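
A sketch of a named critical region, assuming C. The helper parallel_max and the region name best_update are made up for illustration. A max reduction would normally be the faster choice; the point here is only the mutual exclusion:

```c
#include <assert.h>

/* Hypothetical helper: returns the largest element of data[0..n-1]. */
int parallel_max(const int *data, int n)
{
    assert(n > 0);
    int best = data[0];
    #pragma omp parallel for
    for (int i = 1; i < n; i++) {
        /* compare-then-update is more than one operation on best,
         * so the whole block is protected by a critical region */
        #pragma omp critical(best_update)
        {
            if (data[i] > best)
                best = data[i];
        }
    }
    return best;
}
```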

12
Q
A
13
Q

atomic statement

A

#pragma omp atomic

ensures a memory location is updated atomically
- like critical, but with less overhead because it can avoid locking
- covers only a single operation (e.g. x++ or x += expr)
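
A sketch of an atomic update, assuming C. The helper histogram and the bin count NBINS are hypothetical. Several threads may hit the same bin, and each increment is a single read-modify-write of one location:

```c
#include <assert.h>

#define NBINS 4

/* Hypothetical helper: counts values by (value % NBINS) into hist[]. */
void histogram(const unsigned *values, int n, int hist[NBINS])
{
    assert(n >= 0);
    for (int b = 0; b < NBINS; b++)
        hist[b] = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        /* one atomic increment of one memory location */
        #pragma omp atomic
        hist[values[i] % NBINS]++;
    }
}
```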

14
Q

ordered construct

A

#pragma omp ordered

block of code that must be executed in sequential order.
- The ordered construct sequentializes and orders the execution of ordered regions while allowing code outside the region to run in parallel.
- clauses: threads (default), simd
- only inside a for loop whose directive carries the ordered clause
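
A sketch of an ordered region inside a parallel loop, assuming C. The helper ordered_log is hypothetical. The squaring may run in parallel, while the append runs in loop order:

```c
#include <assert.h>

/* Hypothetical helper: appends i*i to log[] in strict iteration order.
 * Returns the number of entries written. */
int ordered_log(long *log, int n)
{
    assert(n >= 0);
    int pos = 0;
    /* the ordered clause on the loop directive is required */
    #pragma omp parallel for ordered schedule(static, 1)
    for (int i = 0; i < n; i++) {
        long v = (long)i * i;      /* may run in parallel */
        #pragma omp ordered
        {
            log[pos++] = v;        /* runs in loop order, one iteration at a time */
        }
    }
    return pos;
}
```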

15
Q

omp locks

A

similar to a mutex

omp_lock_t lockvar; declares a simple lock variable
omp_init_lock(&lockvar); initializes a simple lock
omp_destroy_lock(&lockvar); uninitializes a simple lock
omp_set_lock(&lockvar); waits until a simple lock is available and then sets it
omp_unset_lock(&lockvar); unsets the lock
omp_test_lock(&lockvar); tries to set the lock without waiting; returns nonzero on success

nestable locks are also available (omp_nest_lock_t, omp_init_nest_lock(), ...)
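
A sketch of the simple-lock life cycle, assuming C. The helper locked_sum is hypothetical, and the serial stand-ins in the #else branch are an assumption added only so the sketch also builds when OpenMP is not enabled:

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption) so the sketch also builds without OpenMP. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* Hypothetical helper: adds 1..n into a shared total under a simple lock. */
long locked_sum(int n)
{
    assert(n >= 0);
    long total = 0;
    omp_lock_t lock;
    omp_init_lock(&lock);

    #pragma omp parallel for
    for (int i = 1; i <= n; i++) {
        omp_set_lock(&lock);      /* blocks until the lock is free */
        total += i;               /* protected update */
        omp_unset_lock(&lock);
    }

    omp_destroy_lock(&lock);
    return total;
}
```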

16
Q

reduction

A

reduction (operator: list)

+ - * & ^ | && || max min
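
A sketch with two reduction clauses on one loop, assuming C. The helper sum_max_reduction is hypothetical. Each thread works on private copies, which are combined with the given operator at the end of the loop:

```c
#include <assert.h>

/* Hypothetical helper: sum and maximum of data[0..n-1] in one pass. */
void sum_max_reduction(const int *data, int n, long *sum_out, int *max_out)
{
    assert(n > 0);
    long sum = 0;
    int largest = data[0];
    /* private copies per thread, combined with + and max after the loop */
    #pragma omp parallel for reduction(+: sum) reduction(max: largest)
    for (int i = 0; i < n; i++) {
        sum += data[i];
        if (data[i] > largest)
            largest = data[i];
    }
    *sum_out = sum;
    *max_out = largest;
}
```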

17
Q

correctness issues

A

data races: unsynchronized conflicting accesses to shared variables
loop dependencies: prevent parallelization of loops
aliasing: hides loop dependencies and blocks compiler optimizations

18
Q

types of data races

A
  1. read after write
  2. write after read
  3. write after write
19
Q

types of dependencies

A
  1. true (flow): read after write
  2. anti: write after read
  3. output: write after write
20
Q

types of loop dependencies

A

loop carried: between different iterations
loop independent: within the same iteration

21
Q

list loop transformations

A

4: interchange, distribution, fusion, alignment

22
Q

loop interchange

A

swap inner and outer loop
not legal if it reverses the direction of a loop-carried dependency
used to get better cache use (in C, the inner loop should walk along a row)
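
A sketch of the interchange on a row-major C array, assuming C. The function names and the sizes ROWS and COLS are arbitrary. Both versions compute the same sum; only the memory access order differs:

```c
#include <assert.h>

#define ROWS 64
#define COLS 48

/* Before: the inner loop walks down a column, jumping COLS ints per step. */
long sum_column_first(int m[ROWS][COLS])
{
    assert(m != 0);
    long s = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            s += m[i][j];
    return s;
}

/* After interchange: the inner loop walks along a row, which is contiguous. */
long sum_row_first(int m[ROWS][COLS])
{
    assert(m != 0);
    long s = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            s += m[i][j];
    return s;
}
```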

23
Q

loop distribution

A

splits one loop into several loops; used to eliminate loop-carried dependencies
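
A sketch of the transformation on a made-up two-statement loop, assuming C. The function names are hypothetical, and a[0] is assumed to be set by the caller. In the original loop, S2 reads a value that S1 wrote one iteration earlier; after distribution each loop has independent iterations:

```c
#include <assert.h>

/* Original form: S2 reads a[i-1], written by S1 in the previous iteration. */
void fused_serial(const int *b, int *a, int *c, int n)
{
    assert(n >= 1);
    for (int i = 1; i < n; i++) {
        a[i] = b[i] + 1;        /* S1 */
        c[i] = a[i - 1] * 2;    /* S2 */
    }
}

/* Distributed form: S1 and S2 get their own loops. */
void distributed_parallel(const int *b, int *a, int *c, int n)
{
    assert(n >= 1);
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 1; i < n; i++)
            a[i] = b[i] + 1;        /* S1 for all i */
        /* implicit barrier above: every a[i] is written before it is read */
        #pragma omp for
        for (int i = 1; i < n; i++)
            c[i] = a[i - 1] * 2;    /* S2 for all i */
    }
}
```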

24
Q

loop fusion

A

merges adjacent loops with the same bounds into one
may create loop-carried dependencies
reduces loop overhead
eliminates the barrier between the loops

25
Q

loop alignment

A

shift one statement by an iteration so the dependency stays inside a single iteration
put 1 operation before the loop and 1 after
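
A sketch of alignment on a made-up two-statement loop, assuming C. The function names are hypothetical, and a[0] is assumed to be set by the caller. S2 is shifted by one iteration, so it reads the a[i] written in the same iteration; one operation is peeled off before the loop and one after:

```c
#include <assert.h>

/* Original form: S2 reads a[i-1], written by S1 in the previous iteration. */
void unaligned_serial(const int *b, int *a, int *c, int n)
{
    assert(n >= 2);
    for (int i = 1; i < n; i++) {
        a[i] = b[i] + 1;        /* S1 */
        c[i] = a[i - 1] * 2;    /* S2 */
    }
}

/* Aligned form: the dependency no longer crosses iterations. */
void aligned_parallel(const int *b, int *a, int *c, int n)
{
    assert(n >= 2);
    c[1] = a[0] * 2;                 /* peeled S2 of the first iteration */
    #pragma omp parallel for
    for (int i = 1; i < n - 1; i++) {
        a[i] = b[i] + 1;             /* S1 of iteration i */
        c[i + 1] = a[i] * 2;         /* S2 of iteration i + 1 */
    }
    a[n - 1] = b[n - 1] + 1;         /* peeled S1 of the last iteration */
}
```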
26
Q

why is work sharing bad?

A
  1. load imbalance (both static and dynamic scheduling have drawbacks)
  2. imbalance because of the machine
  3. limited program flexibility

this means we need tasks!
27
Q

tasks

A

#pragma omp task
- created inside a single region, so only one thread generates the tasks
- untied: the task may resume on a different thread, may lose cache contents
- if: controls when the task is deferred
- sharing: default is firstprivate (variables shared in the enclosing region stay shared)
- priority: influences execution order

taskwait: waits for completion of immediate child tasks
taskyield: current task can be suspended (if it takes too long)
dependencies: in, out, inout

PRO: load balance, simple

task granularity:
1. fine: more overhead, better resource use
2. coarse: schedule fragmentation
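
A sketch of task creation inside a single region, assuming C. The names fib_task and fib and the cutoff 20 in the if clause are arbitrary choices for illustration. The local x would be firstprivate by default, so it is marked shared to receive the child's result:

```c
#include <assert.h>

/* Hypothetical helper: naive Fibonacci where each call may spawn a child task. */
static long fib_task(int n)
{
    if (n < 2)
        return n;
    long x, y;
    /* if(n > 20): small subproblems run immediately instead of being deferred */
    #pragma omp task shared(x) if(n > 20)
    x = fib_task(n - 1);
    y = fib_task(n - 2);
    #pragma omp taskwait          /* wait for the child task that writes x */
    return x + y;
}

long fib(int n)
{
    assert(n >= 0);
    long result = 0;
    #pragma omp parallel
    #pragma omp single            /* one thread creates the root of the task tree */
    result = fib_task(n);
    return result;
}
```

The cutoff is one way to trade granularity against overhead: a lower value creates more, finer tasks.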
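28
Q

omp runtime routines

A

omp_set_num_threads(N): sets the number of threads for later parallel regions
omp_get_max_threads(): upper bound on the team size of the next parallel region
omp_get_num_threads(): number of threads in the current team (size)
omp_get_thread_num(): number of the calling thread in the team (rank)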
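
A sketch that calls these routines, assuming C. The helper mark_ranks is hypothetical, and the serial stand-ins in the #else branch are an assumption added only so the sketch also builds when OpenMP is not enabled:

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption) so the sketch also builds without OpenMP. */
static void omp_set_num_threads(int n) { (void)n; }
static int omp_get_num_threads(void)   { return 1; }
static int omp_get_thread_num(void)    { return 0; }
#endif

/* Hypothetical helper: each thread marks its own rank in seen[].
 * Returns the size of the team that actually ran. */
int mark_ranks(int *seen, int capacity)
{
    assert(capacity >= 1);
    int team = 0;
    omp_set_num_threads(capacity);         /* request at most capacity threads */
    #pragma omp parallel
    {
        int rank = omp_get_thread_num();   /* 0 .. size - 1 */
        int size = omp_get_num_threads();  /* threads in this team */
        seen[rank] = 1;
        #pragma omp single
        team = size;
    }
    return team;
}
```

The runtime may hand out fewer threads than requested, so the returned size is checked against the request instead of being assumed equal to it.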
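29
Q

omp ICVs

A

environment variables that set ICVs:
OMP_SCHEDULE: schedule used by loops with schedule(runtime)
OMP_DYNAMIC: dynamic adjustment of the number of threads
OMP_NESTED: nested parallelism (deprecated since OpenMP 5.0)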