Chapter 5 - OpenMP Flashcards
What is OpenMP?
Open Multi-Processing: an API for shared-memory MIMD programming.
A system using OpenMP is viewed as a collection of autonomous cores, all with access to the same memory.
What are some differences between omp and pthreads?
Both are standard APIs for shared-memory programming.
Pthreads requires the programmer to explicitly define the behaviour of each thread. In OpenMP the programmer can simply indicate that a specific block of code should be executed by multiple threads; the compiler and run-time system work out the specifics of the thread use.
Pthreads is a library; OpenMP needs compiler support in addition to a library.
Pthreads pros: lower level, gives the opportunity to program virtually any conceivable thread behaviour.
Pthreads cons: every detail of thread behaviour must be specified - more difficult to implement.
OpenMP pros: simpler to implement - the compiler and run-time system take care of the details.
OpenMP cons: some lower-level thread behaviour may be more difficult to implement.
What is a directives-based shared-memory API?
An API in which parallelism is specified with special preprocessor instructions known as pragmas.
What are pragmas?
Pragmas are added to a language to allow behaviour that isn't part of the basic C specification.
Compilers that don't support a given pragma will ignore it.
The compiler discovers pragmas during its initial scan. If it understands the text, the functionality is implemented; if not, the pragma is ignored.
How are omp pragmas defined ?
They always begin with #pragma omp
What is the header file of omp?
#include <omp.h>
List omp directives and what they do
#pragma omp parallel
Specifies that the structured block of code that follows should be executed by multiple threads. The number of threads started is determined by the run-time system (typically one thread per core, but the algorithm for deciding this is quite complicated).
#pragma omp parallel num_threads(n)
The parallel directive with a num_threads clause specifying the number of threads. The system can't guarantee that n threads will be started, because of system limits, but most of the time it will.
#pragma omp critical
Tells the compiler that we need a mechanism to ensure that the following block of code is executed mutually exclusively - only one thread at a time.
#pragma omp parallel for
Forks a team of threads to execute the following structured block. This structured block must be a for loop.
#pragma omp master
Only the master thread executes the following block.
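A minimal sketch combining these directives (the thread counts, loop bound, and variable names are illustrative choices, not prescribed by the standard):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int total = 0;

        #pragma omp parallel num_threads(4)   /* fork a team of 4 threads */
        {
            #pragma omp critical              /* one thread at a time */
            total++;
        }

        #pragma omp parallel for              /* divide iterations among threads */
        for (int i = 0; i < 8; i++)
            printf("iteration %d\n", i);

        printf("total = %d\n", total);        /* 4 when four threads start */
        return 0;
    }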
What is a clause in omp?
Some text that modifies a directive, e.g. num_threads(4).
What is a team in omp?
The collection of threads executing the parallel block: the original thread plus n-1 new threads.
What happens when #pragma omp parallel is used?
From the start, the program runs a single thread.
When the directive is reached, the original thread continues executing and n-1 additional threads are started.
Each thread in the team executes the block of code that follows, in parallel.
At the end of the block there is an implicit barrier: a thread that has completed the block waits for all other threads in the team to complete. The child threads then terminate, and the parent continues with the code that follows the block.
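A minimal sketch of this fork-join behaviour (the count of 4 threads is an arbitrary choice):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        printf("before: one thread\n");          /* only the original thread */

        #pragma omp parallel num_threads(4)      /* fork: 3 extra threads start */
        {
            printf("inside: %d threads\n", omp_get_num_threads());
        }                                        /* implicit barrier; children terminate */

        printf("after: one thread again\n");     /* parent continues alone */
        return 0;
    }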
Define these terms in omp:
master, parent, child
master: first thread of execution, thread 0
parent: thread that encountered parallel directive and started a team of threads. This is often the master thread.
child: each thread started by the parent
What data does a child thread have access to
its rank (thread ID): omp_get_thread_num();
the number of threads in the team: omp_get_num_threads();
its own stack, and therefore its own local variables
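A short sketch using these two calls (the message text is illustrative):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        #pragma omp parallel
        {
            int my_rank = omp_get_thread_num();        /* this thread's rank */
            int thread_count = omp_get_num_threads();  /* size of the team */
            printf("Hello from thread %d of %d\n", my_rank, thread_count);
        }
        return 0;
    }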
How are critical sections handled in omp to avoid condition variables?
#pragma omp critical
The following code block is executed by only one thread at a time.
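A sketch of a shared counter protected by the critical directive (the variable names are illustrative):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int count = 0;  /* shared between all threads */

        #pragma omp parallel num_threads(8)
        {
            /* Without critical, this read-modify-write would be a race. */
            #pragma omp critical
            count++;
        }

        printf("count = %d\n", count);  /* 8 when eight threads start */
        return 0;
    }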
What is varible-scope in omp?
Scope of a variable refers to the set of threads that can access the variable in a parallel block.
Shared scope: Accessible by all threads
Private scope: Accessible by a single thread
What is the default scope of variables declared outside a parallel block, and within?
Outside: shared
Within: private
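A sketch illustrating these defaults (the names are illustrative):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int shared_var = 0;  /* declared before the block: shared scope */

        #pragma omp parallel num_threads(4)
        {
            int private_var = omp_get_thread_num();  /* declared inside: private scope */
            #pragma omp critical
            shared_var += private_var;  /* every thread updates the same variable */
        }

        printf("shared_var = %d\n", shared_var);  /* 0+1+2+3 = 6 with four threads */
        return 0;
    }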
What is a reduction variable in openmp?
A reduction operator is a binary operation (e.g. addition, multiplication).
A reduction is a computation that repeatedly applies the same reduction operator to a sequence of operands to get a single result.
A reduction variable is where all the intermediate results of the operation are stored.
How can reduction be used in omp?
#pragma omp parallel reduction(+: global_result)
Add a reduction clause to the parallel directive.
This specifies that global_result is a reduction variable.
OpenMP creates a private variable for each thread, and the run-time system stores each thread's result in that variable. OpenMP then creates a critical section in which the private variables are combined into global_result.
The private values are initialized to the identity value of the operator:
+ : 0
- : 0
* : 1
&& : 1
…and so on
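A sketch of a sum computed with the reduction clause (global_result as in the card above; the loop body is an illustrative choice):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        double global_result = 0.0;  /* the reduction variable */

        /* Each thread sums into a private copy initialized to 0 (the
           identity of +); the copies are then combined into global_result. */
        #pragma omp parallel for reduction(+: global_result)
        for (int i = 1; i <= 100; i++)
            global_result += i;

        printf("sum = %f\n", global_result);  /* 5050 */
        return 0;
    }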
When is it beneficial to use reduction?
When each thread's contribution comes from a function call inside a critical section, the calls are serialized. With a reduction clause, each thread accumulates into its own private variable in parallel, so only the final combination step is serialized.
How are for-loops parallelized using the parallel for-directive
The iterations are divided among the threads. The default partitioning of the iterations is done by the system, whereas with the plain parallel directive the work is partitioned by the threads themselves.
For a loop with m iterations, a typical partitioning gives the first m/n_threads iterations to thread 0, the next m/n_threads to thread 1, and so on.
The compiler does not check for dependences between the iterations. These can cause errors at run time; the programmer needs to take care of them.
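A sketch of a dependence-free loop parallelized this way (the array, its size, and the thread count are illustrative):

    #include <stdio.h>
    #include <omp.h>

    #define N 1000

    int main(void) {
        double a[N];

        /* Safe to parallelize: iteration i touches only a[i],
           so there are no dependences between iterations. */
        #pragma omp parallel for num_threads(4)
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * i;

        printf("a[N-1] = %f\n", a[N - 1]);
        return 0;
    }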
What are the default scope of loop variables in a parallel for directive?
private
What types of for loops can be parallelized?
Only loops in canonical form.
Not while or do-while loops.
Only for loops where the number of iterations can be determined from the for statement itself (e.g. i = 0; i < n; i++), prior to execution of the loop.
Infinite loops, and loops with conditional breaks in them, cannot be parallelized.
What is a canonical form of a loop?
Loops where the number of iterations can be determined prior to the execution of the loop.
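A sketch contrasting a canonical loop with non-canonical ones (the loop bodies are illustrative):

    #include <stdio.h>

    int main(void) {
        int n = 10, sum = 0;

        /* Canonical form: an initialization, a test against a value that
           doesn't change inside the loop, and a simple increment - so the
           iteration count is known before the loop runs. */
        for (int i = 0; i < n; i++)
            sum += i;

        /* NOT parallelizable:
           while (sum < 100) { ... }            - not a for loop
           for (int i = 0; i < n; i++)
               { if (sum > 50) break; }         - conditional break */

        printf("sum = %d\n", sum);
        return 0;
    }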
What is a loop-carried dependence?
A dependence between loop iterations where a value is calculated in one iteration, and the result is used in a subsequent iteration.
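A classic sketch of a loop-carried dependence, using a Fibonacci-style loop (the array size is an arbitrary choice):

    #include <stdio.h>

    #define N 10

    int main(void) {
        long fib[N];
        fib[0] = 0;
        fib[1] = 1;

        /* Loop-carried dependence: iteration i reads the results of
           iterations i-1 and i-2, so the iterations cannot safely be
           divided among threads with a parallel for directive. */
        for (int i = 2; i < N; i++)
            fib[i] = fib[i - 1] + fib[i - 2];

        printf("fib[%d] = %ld\n", N - 1, fib[N - 1]);
        return 0;
    }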