Serial Optimisation Flashcards

1
Q

What are the 2 ways code can be optimised?

A

• Time optimisations
• Space optimisations

2
Q

What are the 2 common ways of storing multi-dimensional data in single-dimensional memory?

A

• Row major
• Column major
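A minimal C sketch of the two layouts for a 3×4 matrix (the function names and dimensions are illustrative, not from the source):

```c
#define ROWS 3
#define COLS 4

/* Row-major: element (i, j) lives at flat offset i * COLS + j.
 * This is how C stores 2-D arrays. */
int row_major_index(int i, int j) { return i * COLS + j; }

/* Column-major: the same element lives at offset j * ROWS + i.
 * This is the Fortran convention. */
int col_major_index(int i, int j) { return j * ROWS + i; }
```

Walking memory in the same order as the layout (rows first in C) keeps accesses sequential and cache-friendly.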

3
Q

What is inlining?

A

A common compiler optimisation:

The compiler replaces function calls with “inline” code, essentially copy-pasting the code from the function to the location where it was called.
This saves the call overhead: setting up and tearing down the stack frame, saving/restoring registers, etc.
Excessive inlining will make the executable too large.
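A small C sketch of what inlining amounts to (hypothetical functions; the second version is hand-written to show what the compiler effectively emits):

```c
static int square(int x) { return x * x; }

/* Before inlining: each call to square() incurs call/return overhead. */
int sum_of_squares_call(int a, int b) {
    return square(a) + square(b);
}

/* After inlining: the compiler has pasted the body of square()
 * directly at each call site, eliminating the calls. */
int sum_of_squares_inlined(int a, int b) {
    return (a * a) + (b * b);
}
```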

4
Q

Why might a compiler not do inlining?

A

• The function contains a loop
• The function contains static variables
• The function is recursive

5
Q

What is dead code/store elimination?

A

A common compiler optimisation:

• The compiler looks for:
  • Dead code
    • Code that is unreachable due to a series of conditionals/loops/invariants
    • Or (a series of) computations whose results are never used
  • Dead stores
    • A variable that was computed but never used
• It also identifies the associated code…
• …and safely ignores it all

6
Q

What is code hoisting?

A

A common compiler optimisation:

• Sometimes we (whether by mistake or for readability) write loop-invariant computations inside a (nested) loop.
• A compiler can detect this and move the invariant computation outside of the loop to prevent redundant computation.
• When done excessively, this might lead to register spillage.
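A C sketch of the transformation (function names are illustrative): the first version recomputes a loop-invariant product every iteration, the second shows what the compiler hoists it to.

```c
/* Loop-invariant computation written inside the loop. */
double sum_scaled_naive(const double *a, int n, double s) {
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += a[i] * (s * s);   /* s * s recomputed every iteration */
    return total;
}

/* After hoisting: the invariant is computed once, outside the loop. */
double sum_scaled_hoisted(const double *a, int n, double s) {
    double s2 = s * s;             /* hoisted out of the loop */
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += a[i] * s2;
    return total;
}
```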

7
Q

What is common sub-expression elimination?

A

A common compiler optimisation:

• If the same sub-expression appears multiple times, compute it once and use the result multiple times.
• As before, excessive use can lead to register spillage
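A minimal C illustration (my own example functions): the sub-expression `(a + b)` appears twice, and the second version shows what the compiler rewrites it to.

```c
/* The sub-expression (a + b) is written — and would be computed — twice. */
int cse_before(int a, int b, int c) {
    return (a + b) * c + (a + b);
}

/* After elimination: computed once, held in a register, used twice. */
int cse_after(int a, int b, int c) {
    int t = a + b;
    return t * c + t;
}
```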

8
Q

What is loop unrolling?

A

A common compiler optimisation:

• Replaces the loop with repeated statements
• Counter increments and loop tests disappear; indexed memory loads/stores become accesses at constant offsets
• Increases program size
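A C sketch for a fixed trip count of 4 (illustrative functions): the second version is the fully unrolled form the compiler produces.

```c
/* Rolled loop: one counter increment and one test per element. */
int sum4_rolled(const int *a) {
    int s = 0;
    for (int i = 0; i < 4; i++)
        s += a[i];
    return s;
}

/* Fully unrolled: no counter, no branches, constant offsets only. */
int sum4_unrolled(const int *a) {
    return a[0] + a[1] + a[2] + a[3];
}
```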

9
Q

What is Vectorisation?

A

Single-Instruction, Multiple Data (SIMD)
Do 8/16 ADD/MUL operations in a single clock cycle on a single core.
Built for long-running loops that repeat the same computations on large arrays
Produces correct results only if the iterations of the loop are INDEPENDENT.
Hard for compilers to prove independence in general – which is why compilers are sometimes conservative in applying vectorisation
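Two illustrative C loops (my own examples): the first has independent iterations and vectorises cleanly; the second carries a dependence from one iteration to the next, so the compiler must leave it scalar.

```c
/* Independent iterations: c[i] depends only on a[i] and b[i],
 * so the compiler can do several adds per SIMD instruction. */
void add_vectorisable(const double *a, const double *b, double *c, int n) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Dependent iterations: c[i] reads c[i - 1], the result of the
 * previous iteration, so this loop cannot be vectorised as-is. */
void prefix_not_vectorisable(double *c, int n) {
    for (int i = 1; i < n; i++)
        c[i] = c[i] + c[i - 1];
}
```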

10
Q

What is the trade off in optimisation?

A

• The trend we see is a time-space trade-off.
• If we want the executable to run faster, we (generally) need to increase the size of the program.
• Usually, the more aggressive the optimisation, the longer the code takes to compile and the larger the executable will be.

11
Q

What do each of the compiler flags do?

A

-O0 Disable all optimisations.
-O1 (or -O) Turn on the most common optimisation options.
This should be faster than -O0, since the simplest optimisations are now enabled.
-O2 More extensive optimisation: all -O1 optimisations plus some extra “level 2” optimisations. No time-space trade-offs are introduced up to this point.
This is usually the recommended level.
-O3 More aggressive than -O2. This level of optimisation involves time-space trade-offs, and compile times increase significantly.
Recommended for codes that have loops with intensive floating-point computations.
-Os Optimise for the size of the executable.
-O2 -no-vec -O2 with vectorisation disabled.

12
Q

What should you focus on in manual optimisation?

A

It is very important to direct efforts:
• Do not waste time implementing optimisations that the compiler already does.
• Focus on optimising the RIGHT things. No point optimising a routine that takes 0.1% of the total runtime.
• Focus on the right optimisations – think of value-effort trade-offs.

13
Q

What is profiling?

A

The process of measurement to get information about a program’s behaviour and performance - both in runtime and the utilisation of resources.

14
Q

What does gprof do?

A

Used with the GNU compiler toolchain to profile programs.
Prints information on the time spent in each function.
Also prints a call graph.

15
Q

What is the profiling process?

A

Profile to find the biggest hotspots
Narrow in one region of code that is taking the most time.
“Optimise” that region (see the optimisation process card):
• Identify possible reasons/bottlenecks
• Memory Bandwidth?
• Number of registers?
• Cache utilisation?
• Inefficient code?
Repeat until –
• Maximum performance reached? (Ha! Not likely)
• You run out of time or get fed up (The sad reality)
• “Good enough”
• Remember diminishing returns

16
Q

What is the optimisation process?

A

Check compiler optimisation reports to see whether the compiler is doing its bit. If not – try to understand why not.
Give the compiler some hints, e.g. restrict-qualified pointers.
Things to look at:
• Remove prints (I/O is excruciatingly slow)
• Remove any debug code
• Remove dead code
• Check the order of memory access: Row major? Column major?
• Too many function calls?
• Arithmetic. MUL is faster than DIV i.e. a * 0.25 is faster than a / 4
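A short C sketch combining two of the hints above (illustrative functions of my own): replacing DIV with MUL by the reciprocal, and using `restrict` to promise the compiler the pointers do not alias, which lets it reorder and vectorise the loop more freely.

```c
/* a / 4 and a * 0.25 give the same result here (0.25 is exact in
 * binary), but MUL typically has much lower latency than DIV. */
double quarter_div(double a) { return a / 4.0; }
double quarter_mul(double a) { return a * 0.25; }

/* restrict: out and in are promised not to overlap, so the
 * compiler is free to keep values in registers and vectorise. */
void scale(double *restrict out, const double *restrict in, int n) {
    for (int i = 0; i < n; i++)
        out[i] = in[i] * 0.25;
}
```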

17
Q

Why is it usually better to optimise single-thread performance before going to parallelism?

A

Most optimisations that help single thread performance have multiple-fold returns when running in parallel.

18
Q

What are the parallelism options (in order of preference)?

A
1. Vectorisation
2. OpenMP
3. MPI – depending on how easily the problem can be partitioned
4. GPU