Serial Optimisation Flashcards
What are the 2 ways code can be optimised?
• Time optimisations
• Space optimisations
What are the 2 common ways of storing multi dimensional data in single dimensional memory?
• Row major
• Column major
What is inlining?
A common compiler optimisation:
The compiler replaces function calls with “inline” code, essentially copy-pasting the code from the function to the location where it was called.
This saves the overhead of saving/restoring the stack etc.
Excessive inlining will make the executable too large
Why might a compiler not do inlining?
A function contains a loop
A function contains static variables
A function is recursive
What is dead code/store elimination?
A common compiler optimisation:
• The compiler looks for:
• Dead code
• Code that is unreachable due to a series of conditionals/loops/invariants
• Or if nothing was done with a result of some (series of) computations
• Dead stores
• A variable that was computed but never used
• Also identifies associated code
• …and safely ignores it all
What is code hoisting?
A common compiler optimisation:
• Sometimes we (whether by mistaking or for readability) write loop-invariant computations inside a (nested) loop.
• A compiler can detect this and move to the invariant computation outside of the loop to prevent redundant computation.
• When done excessively – this might lead to register spillage
What is common sub-expression elimination?
A common compiler optimisation:
• If the same sub-expression appears multiple times, compute it once and use the result multiple times.
• As before, excessive use can lead to register spillage
What is loop unrolling?
A common compiler optimisation:
• Replace loop with repeated statements
• Replaces memory load/stores and counter increments with constant values
• Increases program size
What is Vectorisation?
Single-Instruction, Multiple Data (SIMD)
Do 8/16 ADD/MUL operations in a single clock cycle on a single core.
Built for long-running loops that repeat the same computations on large arrays
Produces correct results only if the iterations of the loop are INDEPENDENT.
Hard for compilers to prove independence in general – which is why compilers are sometimes conservative in applying vectorisation
What is the trade off in optimisation?
• The trend we see is time-space trade-off.
• If we want to save time for the executable, we need to increase the size of the program (generally).
• Usually means the more aggressive the optimisation, the longer it will take for code to compile, and the larger the executable will be.
What do each of the compiler flags do?
-O0 or –O Disable all optimisations
-O1 Turn on the most common optimisation options.
This should be faster than –O0 since the simplest optimisations are now enabled
-O2 More extensive optimisation. All –O1 optimisations plus some extra “level 2” optimisations. There are no time-space tradeoffs introduced until this point.
This is usually the recommended.
-O3 More aggressive than –O2. This this level of optimisation involves time-space trade-offs, compile times increase significantly at this level.
Recommended for codes that have loops with intensive floating-point computations
-Os Optimise for the size of the executable.
-O2-no-vec O2 with no vectorisation.
What should you focus on in manual optimisation?
It is very important to direct efforts:
• Do not waste time implementing optimisations that the compiler already does.
• Focus on optimising the RIGHT things. No point optimising a routine that takes 0.1% of the total runtime.
• Focus on the right optimisations – think of value-effort trade-offs.
What is profiling?
The process of measurement to get information about a program’s behaviour and performance - both in runtime and the utilisation of resources.
What does gprof do?
Used with the GNU compiler to profile.
Prints information relating to time spent in functions.
Prints call graph also
What is the profiling process?
Profile to find the biggest hotspots
Narrow in one region of code that is taking the most time.
“Optimise” that region (next slide process):
• Identify possible reasons/bottlenecks
• Memory Bandwidth?
• Number of registers?
• Cache utilisation?
• Inefficient code?
Repeat until –
• Maximum performance reached? (Ha! Not likely)
• You run out of time or get fed up (The sad reality)
• “Good enough”
• Remember diminishing returns