Terms Flashcards
Race Condition
An error caused by timing
Critical Section
A block of code that can be executed by at most one thread at a time
P-Complete
A problem C that is solvable in polynomial time and for every other problem L that is solvable in polynomial time, L ≤ log m C
≤ log m
This is true iff there exists a function f that can be computed using only log-space s.t x ∈ A ⇔ f(x) ∈ B
NC
The set of all problems that can be solved in O((log n)^c) steps using an O(n^k) processors on a PRAM (for some c and k
Cache Coherency
The property that when one processor or core changes a cached variable that the value stored by all the other processors in their respective caches is changed to the new value to maintain consistency.
Thread Safety
A function has this property if multiple threads are able to simultaneously execute code within the function without leading to errors
CRCW PRAM
A PRAM where memory accesses enforce concurrent reads and concurrent writes among the processors
EREW PRAM
A PRAM where memory accesses enforce exclusive read and exclusive writes among the processors
False Sharing
Performance degradation caused by memory accesses that induce frequent cache updates across multiple core in systems that enforce cache coherency
How to debug a running process
- Run program using mpiexec and open another terminal
- Use the command:
gdb -pid 5038
With all of the pids associated with the mpiexec call - Use the command:
where
to find where in the code it is hanging
Amdahl’s Law
1 / S+ (1-S)/N
where n is the number of processors
and s is the percentage of program executed sequentially
The theory of P-Completeness gives a notion of “efficient parallel computation.” What does that mean, precisely?
There is probably no algorithm that can solve a p-complete problem in logarithmic time with a polynomial number of processors.
If P ≠ NC then P-complete is not in NC
We know that the circuit value problem (CVP) is P-complete. This tells us that:
It is likely that there is no efficient parallel algorithm for CVP
Problems known to be P-complete
- Breadth First Search
- Circuit Value Problem
- Lexicographically First Depth First Search
- CFG membership
- Greedy Dominating Set
- Linear Programming
How to compile OpenMP program
g++ -fopenmp filename
Why does the second program produce the wrong answer
because the line if (a[i]>a[j]) sum++; causes a race condition
Why does the second program run slower than the first even though it is being run on 4 processors?
False Sharing - multiple cores attempt to change sum.
Cache issues.
Rewrite it so that it runs faster
#pragma omp parallel for schedule(static,1) reduce(+,sum) for(int i=0; ia[j]) sum = sum+1; } } cout<<<endl; }
What is s, the portion of the execution time that must be executed sequentially?
little num/ big num
Amdal’s Law
1/ S+ (1-S/N)
How long would your code take at maximum speedup using 8 processors
S = sequential / parallel
By Amdahl’s law, what is the maximum speedup possible?
1 / S
How to run a program using OpenMPI called t3.cc
mpic++ t3.cc