6 - GPU Architectures and concepts Flashcards
A GPU computation calculates 2048 elements. Each element can be computed in
its own thread. The algorithm is not sensitive to any particular block size. It may
run on many different GPUs. What number of threads and blocks would you use in
such a case? Motivate your answer.
a
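As a hedged illustration of the arithmetic behind this card, the sketch below computes a grid size from a chosen block size by ceiling division. The function name `launch_config` and the default of 256 threads per block are assumptions made for illustration, not part of the original card; a block size that is a multiple of 64 is assumed here because it covers both 32-wide warps and 64-wide wavefronts.

```python
def launch_config(n_elements, threads_per_block=256):
    """Return (blocks, threads_per_block) covering n_elements threads.

    Hypothetical helper: the block size is an assumption, picked as a
    multiple of 64 so it maps evenly onto 32- or 64-wide hardware groups.
    """
    if n_elements < 0 or threads_per_block <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division: the last block may be only partly used.
    blocks = (n_elements + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block


if __name__ == "__main__":
    for tpb in (64, 128, 256):
        blocks, threads = launch_config(2048, tpb)
        print(f"{threads} threads/block -> {blocks} blocks")
```

With 2048 elements, any of these block sizes divides the problem evenly and yields several blocks, so the work can spread over more than one multiprocessor.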
A typical case of a separable filter is to split a filter into one horizontal and one vertical filter, often of the same size. However, the two parts may each run with significantly different performance, one much faster than the other. Suggest a likely reason why this could happen.
a
Compare shared memory, global memory and register memory in terms of performance, usage and accessibility. CUDA terminology is assumed; please note if you
use OpenCL terminology.
a
Describe how Bitonic Merge Sort can be implemented on a GPU. A figure to clarify
the algorithm is expected. Your solution must be able to handle large data sets (i.e.
100000 items or more).
a
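The compare-and-swap pattern of bitonic merge sort can be sketched on the CPU as below. This is a minimal simulation, not GPU code: each iteration of the `(k, j)` loops is assumed to correspond to one kernel launch over global memory, and the inner loop over `i` stands in for one thread per element. The names `bitonic_sort` and `sort_any_length`, and the padding with infinity to reach a power-of-two length, are assumptions for illustration.

```python
def bitonic_sort(data):
    """Sort a list whose length is a power of two, ascending."""
    n = len(data)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    a = list(data)
    k = 2                      # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2             # distance between compared elements
        while j > 0:
            # On a GPU this loop would be one kernel launch, one thread per i.
            for i in range(n):
                partner = i ^ j
                if partner > i:            # each pair is handled once
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


def sort_any_length(data):
    """Pad to the next power of two with +inf, sort, then drop the padding."""
    n = len(data)
    size = 1
    while size < n:
        size *= 2
    padded = list(data) + [float("inf")] * (size - n)
    return bitonic_sort(padded)[:n]
```

Because every pair within one `(k, j)` step is independent, the step parallelizes over all elements; the synchronization between steps is what forces separate launches once the data no longer fits in a single block.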
Describe how computing is mapped onto graphics in shader-based computing (expressed as kernel, input data, output data and iterations over the same data). What
limitations are imposed on your kernels compared to CUDA or OpenCL?
a
Describe how reduction can be used to calculate the maximum value of a large array
of scalar values on a GPU.
Also give at least two examples of other problems that are solved by reduction.
a
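A tree-shaped reduction for the maximum can be sketched on the CPU as follows. This is an illustrative simulation under stated assumptions: the inner loop over `i` stands in for parallel threads, each pass of the outer loop for one synchronized step, and the name `reduce_max` is hypothetical.

```python
def reduce_max(values):
    """Return the maximum using pairwise steps that halve the active range."""
    buf = list(values)
    if not buf:
        raise ValueError("cannot reduce an empty array")
    n = len(buf)
    while n > 1:
        half = (n + 1) // 2    # number of elements that survive this step
        # On a GPU, each i would be one thread; all pairs are independent.
        for i in range(n // 2):
            buf[i] = max(buf[i], buf[i + half])
        n = half               # a barrier would separate steps on a GPU
    return buf[0]
```

Each step halves the number of active elements, so an array of N values needs about log2(N) steps instead of N - 1 sequential comparisons.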
Describe the major architectural differences between a multi-core CPU and a GPU
(apart from the GPU being tightly coupled with image output). Focus on the differences that are important for parallel computing.
a
In order to get the best performance from shared memory, what should the access pattern be? Clarify with a figure, including how to improve a bad access pattern.
a
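The effect of an access pattern on shared memory banks can be estimated with a small model. The sketch below assumes 32 banks of one word each, with consecutive words in consecutive banks; that bank count is an assumption that varies between GPU generations. The helper name `worst_bank_conflict` is hypothetical.

```python
from collections import Counter


def worst_bank_conflict(word_indices, n_banks=32):
    """Return the largest number of distinct words that hit the same bank.

    A result of 1 means conflict-free. Threads reading the very same word
    are not counted as a conflict, since that is assumed to be a broadcast.
    """
    counts = Counter(index % n_banks for index in set(word_indices))
    return max(counts.values())


THREADS = 32

# Row access in a 32-wide tile: thread t reads tile[0][t].
row_access = [0 * 32 + t for t in range(THREADS)]

# Column access in a 32-wide tile: thread t reads tile[t][0].
column_access = [t * 32 + 0 for t in range(THREADS)]

# Same column access after padding each row to 33 words.
padded_column_access = [t * 33 + 0 for t in range(THREADS)]

if __name__ == "__main__":
    print(worst_bank_conflict(row_access))
    print(worst_bank_conflict(column_access))
    print(worst_bank_conflict(padded_column_access))
```

In this model the column access puts every thread in the same bank, while padding each row by one word shifts successive rows to different banks.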
Motivate why GPUs can give significantly better computing performance than ordinary CPUs. Is there any reason to believe that this advantage will be reduced over
time?
a
Some image filters are separable. Describe how and why this works and give an example of a separable filter. This has the potential to improve performance. Why?
a
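Separability can be checked numerically with a small CPU sketch. The example assumes a 3-tap binomial kernel `[0.25, 0.5, 0.25]`, zero padding at the borders, and hypothetical helper names; it compares a full 3x3 pass against a horizontal pass followed by a vertical pass. The kernel is symmetric, so correlation and convolution give the same result here.

```python
def filter_rows(img, kernel):
    """Apply a 1D kernel along each row, treating outside pixels as zero."""
    r = len(kernel) // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for k, weight in enumerate(kernel):
                xx = x + k - r
                if 0 <= xx < w:
                    out[y][x] += weight * img[y][xx]
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def filter_cols(img, kernel):
    """Apply a 1D kernel along each column by reusing the row filter."""
    return transpose(filter_rows(transpose(img), kernel))


def filter_2d(img, kernel2d):
    """Apply a full 2D kernel directly, treating outside pixels as zero."""
    r = len(kernel2d) // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for j, row in enumerate(kernel2d):
                for i, weight in enumerate(row):
                    yy, xx = y + j - r, x + i - r
                    if 0 <= yy < h and 0 <= xx < w:
                        out[y][x] += weight * img[yy][xx]
    return out


KERNEL_1D = [0.25, 0.5, 0.25]
# The 2D kernel is the outer product of the 1D kernel with itself.
KERNEL_2D = [[a * b for b in KERNEL_1D] for a in KERNEL_1D]

IMAGE = [[float((3 * x + 5 * y) % 7) for x in range(6)] for y in range(5)]

separable_result = filter_cols(filter_rows(IMAGE, KERNEL_1D), KERNEL_1D)
direct_result = filter_2d(IMAGE, KERNEL_2D)
```

For a kernel of width k, the direct pass reads k * k pixels per output while the two 1D passes read 2 * k in total, which is where the saving comes from as k grows.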