Chapter 5 - OpenMP Flashcards
What is OpenMP?
API for shared memory MIMD programming
Open Multi-Processing
A system using OpenMP is viewed as a collection of autonomous cores, all having access to the same shared memory.
What are some differences between omp and pthreads?
Both are standard APIs for shared-memory programming.
Pthreads requires the programmer to explicitly define the behaviour of each thread. In omp the programmer can simply mark a specific block of code as one to be executed by multiple threads; the compiler and run-time system work out the specifics of the thread use.
Pthreads is just a library, but OMP needs compiler support in addition to its run-time library.
Pthread pros: lower level, gives opportunity to program virtually any possible thread behaviour
Pthread Cons: Need to specify every detail of thread behaviour - more difficult to implement
Pros OMP: Simpler to implement - runtime and compiler takes care of the details.
Cons OMP: Some lower level thread behaviour may be more difficult to implement
What is a directives-based shared-memory API?
There are special preprocessor instructions known as pragmas.
What are pragmas?
Added to a system to allow behaviour that isn't part of the basic C specification.
Compilers that don't support a given pragma simply ignore it.
The compiler discovers pragmas during its initial scan. If it recognizes the text, the functionality is implemented; if not, the pragma is ignored.
How are omp pragmas defined?
#pragma omp
What is the header file of omp?
#include <omp.h>
List omp directives and what they do
#pragma omp parallel
Specifies that the structured block of code that follows should be executed by multiple threads. The number of threads started is determined by the run-time system (typically one thread per core, but the algorithm for deciding this is quite complicated).
#pragma omp parallel num_threads(n)
The parallel directive plus a num_threads clause to specify the number of threads. The system can't guarantee that n threads will be started, because of system limits, but most of the time it will.
#pragma omp critical
Tells the compiler that the following block of code must be executed mutually exclusively by the threads - only one thread at a time.
#pragma omp parallel for
Forks a team of threads to execute the following structured block, which must be a for-loop.
#pragma omp master
Only the master thread executes the following block.
What is a clause in omp?
Just some text that modifies a directive
What is a team in omp?
The collection of threads executing the directive(parallel) block
original thread + (n-1) new threads
What happens when #pragma omp parallel is used?
From the start, the program is running a single thread.
When the directive is reached, n-1 additional threads are started, and the original thread continues as a member of the new team.
Each thread executes the following block of code in parallel.
When this block is completed, there is an implicit barrier. A thread that has completed the block will wait for all other threads in the team to complete. The children will then terminate and the parent continues executing the following code.
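A minimal fork-join sketch (assuming compilation with an OpenMP-enabled compiler, e.g. gcc -fopenmp; the thread count is illustrative):

#include <stdio.h>
#include <omp.h>

int main(void) {
    printf("before: one thread\n");          /* serial part */

    #pragma omp parallel num_threads(4)      /* fork: original thread + 3 new threads */
    {
        printf("hello from the parallel block\n");
    }                                        /* implicit barrier; children terminate */

    printf("after: one thread again\n");     /* only the parent continues */
    return 0;
}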
Define these terms in omp:
master, parent, child
master: first thread of execution, thread 0
parent: thread that encountered parallel directive and started a team of threads. This is often the master thread.
child: Each thread started by parent
What data does a child thread have access to?
ID: rank, omp_get_thread_num();
number of threads in team: omp_get_num_threads();
Its own stack, and therefore its own local variables
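A small sketch of a thread using its rank and the team size to pick its share of the work (the block-partitioning arithmetic is just an illustration):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int n = 100;                               /* illustrative problem size */
    #pragma omp parallel
    {
        int rank     = omp_get_thread_num();   /* this thread's ID in the team */
        int nthreads = omp_get_num_threads();  /* size of the team */
        int first = rank * n / nthreads;       /* block partitioning by rank */
        int last  = (rank + 1) * n / nthreads;
        printf("thread %d of %d handles [%d, %d)\n", rank, nthreads, first, last);
    }
    return 0;
}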
How are critical sections handled in omp to avoid condition variables?
#pragma omp critical
The following code block is executed by only one thread at a time.
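For example, a shared counter updated inside a critical section (a minimal sketch; variable names are illustrative):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int global_count = 0;
    #pragma omp parallel num_threads(4)
    {
        int my_count = omp_get_thread_num() + 1;   /* some per-thread result */
        #pragma omp critical
        global_count += my_count;                  /* only one thread at a time updates it */
    }
    printf("global_count = %d\n", global_count);
    return 0;
}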
What is variable scope in omp?
Scope of a variable refers to the set of threads that can access the variable in a parallel block.
Shared scope: Accessible by all threads
Private scope: Accessible by a single thread
What is the default scope of variables declared outside a parallel block, and within?
Outside: Shared
Within: Private
What is a reduction variable in openmp?
A reduction operator is a binary operation (e.g. add, mul)
A reduction is a computation that repeatedly applies the same reduction operator to a sequence of data to get a single result.
A reduction variable is where all the intermediate results of the operation are stored.
How can reduction be used in omp?
#pragma omp parallel reduction(+: global_result)
Add reduction clause to parallel directive
This specifies that the global_result is the reduction variable.
OpenMP creates a private variable for each thread, and the run-time stores each thread's result in that private variable. OpenMP then uses something like a critical section to add all of the private variables into the shared reduction variable.
The private values are initialized to the identity value of the operator:
+ : 0
- : 0
* : 1
&& : 1
…and so on
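A sketch of a sum computed with a reduction clause (the cyclic partitioning by rank is just for illustration):

#include <stdio.h>
#include <omp.h>

int main(void) {
    double a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    double global_result = 0.0;                    /* the reduction variable */

    #pragma omp parallel reduction(+: global_result)
    {
        int rank     = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        for (int i = rank; i < 8; i += nthreads)   /* each thread takes some elements */
            global_result += a[i];                 /* accumulates into its private copy (initialized to 0) */
    }                                              /* private copies are combined into global_result here */

    printf("sum = %f\n", global_result);
    return 0;
}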
When is it beneficial to use reduction?
If each thread's contribution is computed by a function call inside a critical section, those calls are serialized. With a reduction clause, each thread accumulates into its own private copy, so the calls can run in parallel and only the final combination is serialized.
How are for-loops parallelized using the parallel for directive?
The iterations are divided between the threads. The default partitioning of the iterations is done by the system, whereas in a normal parallel directive the work is partitioned by the threads themselves.
For a loop of m iterations, a typical partitioning gives the first m/n_threads iterations to thread 0, the next m/n_threads to thread 1, and so on.
The compiler does not check for dependences between the iterations. These can cause errors during execution; the programmer needs to take care of this.
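For example, the same kind of sum written with a parallel for directive, where the run-time divides the iterations (a sketch):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int n = 1000;
    double sum = 0.0;

    /* iterations are divided among the threads; sum is a reduction variable */
    #pragma omp parallel for reduction(+: sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (i + 1);

    printf("sum = %f\n", sum);
    return 0;
}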
What is the default scope of loop variables in a parallel for directive?
private
What types of for loops can be parallelized?
Only loops with canonical form
Not while, do-while
Only for-loops where the number of iterations can be determined from the loop statement itself (e.g. for (i = 0; i < n; i++)) and prior to execution of the loop.
Loops that are infinite, or that have conditional breaks in them, cannot be parallelized.
What is a canonical form of a loop?
Loops where number of iterations can be determined prior to the execution of the loop.
What is a loop-carried dependence?
A dependence between loop iterations where a value is calculated in one iteration, and the result is used in a subsequent iteration.
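For example, a Fibonacci loop has a loop-carried dependence, so adding a parallel for directive to it would give wrong results (a sketch):

#include <stdio.h>

int main(void) {
    int n = 10;
    long fib[10];
    fib[0] = fib[1] = 1;
    /* loop-carried dependence: iteration i reads the results of iterations i-1 and i-2,
       so adding "#pragma omp parallel for" here would be incorrect */
    for (int i = 2; i < n; i++)
        fib[i] = fib[i - 1] + fib[i - 2];
    printf("fib[%d] = %ld\n", n - 1, fib[n - 1]);
    return 0;
}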
What is the private clause?
#pragma omp parallel for private(var_name)
Creates a copy of the outer variable var_name for each thread and gives it private scope
What is the default clause?
#pragma omp parallel for default(none) reduction(+: global_sum) private(x, factor) shared(n)
default(none) requires us to explicitly define the scope of every variable used in the block that was defined outside it.
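A sketch of default(none) in use (the alternating-series loop is just an illustrative workload):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int n = 100;
    double factor, x, global_sum = 0.0;

    /* every variable declared outside the loop must get an explicit scope */
    #pragma omp parallel for default(none) \
            reduction(+: global_sum) private(x, factor) shared(n)
    for (int i = 0; i < n; i++) {
        factor = (i % 2 == 0) ? 1.0 : -1.0;   /* each thread has its own factor */
        x = factor / (2 * i + 1);             /* ...and its own x */
        global_sum += x;
    }

    printf("pi is approximately %f\n", 4.0 * global_sum);
    return 0;
}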
What does the for directive do, and why is it useful?
#pragma omp for
Does not fork any new threads, but uses already-forked threads in the coming for-loop block.
If threads are created earlier in the program, and then used in the for loop, it saves overhead by not creating new threads for the for loop.
#pragma omp parallel
{
… code executed by the team …
#pragma omp for
for (…) { … }
}
What does the schedule clause do?
#pragma omp parallel for schedule(<type> [, <chunksize>])
Clause specifies how iterations are to be assigned to threads in a for or parallel for directive.
chunksize: A number of iterations executed consecutively in a serial loop. Only used in static, dynamic and guided
Types:
static: Iterations assigned before loop is executed
dynamic/guided: Iterations assigned while loop is running. After a thread completes, it can request more iterations
auto: compiler/run-time decides schedule, good guess when elements have more equal workload
runtime: Schedule determined at run-time by environment variable
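A sketch of the schedule clause (the chunksize of 2 is illustrative):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int n = 12;
    /* hand out chunks of 2 iterations on demand; useful when iteration cost varies */
    #pragma omp parallel for schedule(dynamic, 2)
    for (int i = 0; i < n; i++)
        printf("iteration %d done by thread %d\n", i, omp_get_thread_num());
    return 0;
}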
What is the static schedule type?
Chunks of chunksize iterations to each thread in a round-robin fashion
Example:
12 iterations, 3 threads
(static, 2):
thread 0: [0, 1], [6, 7]
thread 1: [2, 3], [8, 9]
thread 2: [4, 5], [10, 11]
What is the default schedule type often?
schedule(static, iterations/num_threads)
When should you use the different scheduling types?
Static: when each iteration takes roughly the same time to execute
Dynamic: when iterations take different amounts of time to compute
Guided: improves load balance when later iterations are more compute-heavy
What is the dynamic schedule?
Iterations broken into chunksized chunks.
Each thread executes a chunk, and then requests a new one.
First-come, first-served assignment of chunks
A bit more overhead from dynamically assigning chunks. A larger chunk size helps with this.
What is the guided schedule?
Similar to dynamic.
But as chunks are completed, the chunk size decreases.
If no chunksize is specified, the chunk size eventually decreases down to 1. If it is specified, it decreases down to chunksize.
The last chunk may be smaller than chunksize.
What is the runtime schedule?
Uses environment variable OMP_SCHEDULE.
This variable can take any of the static, dynamic, or guided values.
You can modify this variable to try out the performance of different scheduling types.
How does omp use barriers?
#pragma omp barrier
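For example, to make all threads finish phase 1 before any of them starts phase 2 (a sketch):

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        printf("thread %d: phase 1\n", omp_get_thread_num());
        #pragma omp barrier    /* no thread continues until all threads have reached this point */
        printf("thread %d: phase 2\n", omp_get_thread_num());
    }
    return 0;
}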
What is the atomic directive?
#pragma omp atomic
OpenMP assumes the computer's architecture has some range of atomic operations.
It can only be used for critical sections consisting of a single C assignment statement of one of the following forms:
var <op>= <expression>;
var++;
++var;
var--;
--var;
<expression> must not reference var.
A thread will complete the expression before any other threads starts executing it.
A critical section only performing a load-modify-store can benefit from this, as a lot of hardware is optimized for atomic load-stores.
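A sketch of an atomic update of a shared counter:

#include <stdio.h>
#include <omp.h>

int main(void) {
    int hits = 0;
    #pragma omp parallel num_threads(8)
    {
        #pragma omp atomic    /* protects just this single load-modify-store */
        hits++;
    }
    printf("hits = %d\n", hits);
    return 0;
}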
What does the critical(name) directive do?
Two blocks protected by critical directives with different names can execute in parallel.
How are locks used in omp?
Pseudo code:
initialize lock (one thread)
// Multiple threads
attempt to lock - block until ready
critical section
unlock
Destroy lock (one thread)
2 types: simple- and nested locks
What are simple locks?
Can only be set once before it is unset
What are nested locks?
Can be set multiple times by same thread before unset
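A sketch of the simple-lock API (omp_init_lock, omp_set_lock, omp_unset_lock, omp_destroy_lock) following the pseudo code above:

#include <stdio.h>
#include <omp.h>

int main(void) {
    omp_lock_t lock;
    int shared_total = 0;

    omp_init_lock(&lock);                        /* initialize once */

    #pragma omp parallel num_threads(4)
    {
        omp_set_lock(&lock);                     /* block until the lock is acquired */
        shared_total += omp_get_thread_num();    /* critical section */
        omp_unset_lock(&lock);                   /* release the lock */
    }

    omp_destroy_lock(&lock);                     /* destroy once */
    printf("shared_total = %d\n", shared_total);
    return 0;
}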
What is the omp task directive?
#pragma omp task
The default scope for variables in a task is firstprivate (a private copy initialized from the enclosing context), unless the variable is shared in the enclosing context.
Specifies units of computations.
When the program reaches the directive, the OpenMP run-time generates a new task, which will be scheduled for execution at some point, not necessarily immediately.
Tasks must be launched within a parallel block, but this is generally done by only one thread in the team.
What is a common structure of task-programs?
#pragma omp parallel    // creates a team of threads
#pragma omp single      // only one thread executes the following block (and generates the tasks)
{
    #pragma omp task
    …
}
If single were not used, every thread in the team would execute the task-generating code, so each task would be created multiple times.
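A runnable sketch of this structure, with one thread generating tasks that the whole team executes (the loop of eight tasks is illustrative):

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel            /* create the team of threads */
    {
        #pragma omp single          /* only one thread generates the tasks */
        {
            for (int i = 0; i < 8; i++) {
                #pragma omp task firstprivate(i)
                printf("task %d run by thread %d\n", i, omp_get_thread_num());
            }
        }                           /* the whole team helps execute the queued tasks */
    }
    return 0;
}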
What is the taskwait directive?
#pragma omp taskwait
Operates as a barrier for tasks: makes a task wait for all its sub-tasks (child tasks) to complete.
Example:
#pragma omp task shared(j)
j = 1 + 2;
#pragma omp taskwait
result = j + 1;   // safe: the task that computes j has finished
How can we create conditional tasks?
#pragma omp task shared(i) if (n > 20)
i = func(n);
If the condition is false, the statement is executed immediately by the encountering thread instead of being deferred as a task. If i weren't declared shared, it would have private scope.
Why does omp not use signals (condition variables)?
OpenMP threads aren't supposed to be sleeping, and critical sections should be kept short because waiting threads busy-wait.
What are worksharing directives?
Directives that can split a given workload between threads for you.
Have an implicit barrier at the end
What is functional decomposition?
When work is split by the function of its sub-tasks
e.g. pipelining
What is data decomposition?
Split work by input/output of its sub-tasks
Every thread does the same thing
What are sections?
#pragma omp sections, containing one or more #pragma omp section blocks
Each section is run by exactly one thread
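A sketch of functional decomposition with sections:

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel sections
    {
        #pragma omp section
        printf("part A done by thread %d\n", omp_get_thread_num());  /* exactly one thread runs this */

        #pragma omp section
        printf("part B done by thread %d\n", omp_get_thread_num());  /* possibly a different thread */
    }
    return 0;
}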
What does the clause nowait do?
Skip the barrier at the end of worksharing directives
What does the clause firstprivate(var) do?
If var was shared and you are privatizing it, each private copy is initialized with the shared variable's value
What does lastprivate(var) do?
When the threaded region finishes, the private copy from the sequentially last iteration (or last section) is stored back into the shared variable
What are the trade offs between small and big units of work when scheduling iterations in omp for?
Big:
- less scheduling to do
- limit on how big blocks can be
- if the entire iteration space is one block, there is no parallelisation
Small:
- more disruptions in memory access pattern, more units to distribute
- greater flexibility to assign work to idle threads
What is the chunksize argument in schedule()?
The minimum number of iterations handed to a thread at a time
What is the master/worker pattern?
Way to implement worksharing in queued systems
Keep active thread pool of available worker threads
Keep a queue of finite work packages
assign the next package in the queue every time a thread becomes available
What is nested parallelism?
One work-package spawns more work-packages that should be distributed amongst the threads
What is task-based programming?
An alternative approach to parallel programming that generates dependency graphs:
take a block of work and dispatch it for background execution
record which blocks depend on which others
assign blocks to the team of threads in an order that matches their dependencies
What does the omp task directive do?
#pragma omp task
Creates arbitrary dependency graphs
Block context queued internally, to be executed at first opportunity
Wait for task:
#pragma omp taskwait
How can you task-ify functions?
void some_func(int arg1, int arg2)
{
    #pragma omp task
    {
        // function body runs as a task
    }
}
Every call to this will create background tasks
What does #pragma omp taskloop do?
Automates making a task out of every iteration in a loop
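A sketch of taskloop (assuming an OpenMP 4.5 or newer compiler; the loop body is illustrative):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int n = 16;
    #pragma omp parallel
    #pragma omp single
    {
        /* the run-time splits the iterations into tasks executed by the team */
        #pragma omp taskloop
        for (int i = 0; i < n; i++)
            printf("iteration %d run by thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}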