CUDA Flashcards
What is heterogeneous computing?
Computing that uses both the CPU and the GPU.
What is the Host?
CPU and its memory
What is the Device?
GPU and its memory
How do we declare a kernel function that is to be run on the Device?
__global__
What does the keyword __global__ indicate?
A function that runs on the device and is called from host code.
What does it mean when we describe launching a kernel?
The CPU queues the kernel in a GPU stream and continues executing without waiting for the kernel to complete.
How does the CPU launch a kernel on the device?
With triple angle brackets, as in kernel<<<blocks, threads>>>(args), which mark a call from host code to device code.
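The cards above on __global__ and kernel launches can be sketched as one small program. This is a minimal illustration, not a canonical pattern: the names write_answer and launch_and_read are hypothetical, and it assumes a machine with a CUDA-capable GPU and the CUDA runtime.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __global__ marks a kernel: it runs on the device and is called from host code.
__global__ void write_answer(int *d_out) {
    *d_out = 42;
}

// Hypothetical helper: launches the kernel and returns the value it wrote,
// or -1 if any CUDA call failed.
int launch_and_read() {
    int *d_out = nullptr;
    if (cudaMalloc((void **)&d_out, sizeof(int)) != cudaSuccess) return -1;

    // Triple angle brackets: <<<number of blocks, threads per block>>>.
    write_answer<<<1, 1>>>(d_out);

    // The launch returns immediately, so wait here until the kernel finishes.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(d_out);
        return -1;
    }

    int h_out = 0;
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? h_out : -1;
}

int main() {
    printf("kernel wrote %d\n", launch_and_read());
    return 0;
}
```

Without the cudaDeviceSynchronize() call the host would simply keep going after the launch; here the later cudaMemcpy() would also wait for the kernel, so the explicit call is mainly there to make the asynchronous launch visible.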
Device Pointers
Point to GPU memory.
May be passed to/from host code.
May not be dereferenced in host code.
Host Pointers
Point to CPU memory.
May be passed to/from device code.
May not be dereferenced in device code.
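A rough sketch of these pointer rules follows. The function names (increment, launch_increment, increment_on_device) are made up for illustration, and the code assumes a CUDA-capable GPU is available.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device code may dereference a device pointer.
__global__ void increment(int *d_value) {
    *d_value += 1;
}

// Host code may receive and forward a device pointer, but must not dereference it.
void launch_increment(int *d_value) {
    increment<<<1, 1>>>(d_value);  // passing the pointer along is fine
    // *d_value += 1;              // invalid on the host: d_value names GPU memory
}

// Hypothetical helper: returns start + 1 computed on the GPU, or -1 on failure.
int increment_on_device(int start) {
    int h_value = start;     // lives in CPU memory
    int *d_value = nullptr;  // the pointer variable is on the host; its target is on the GPU
    if (cudaMalloc((void **)&d_value, sizeof(int)) != cudaSuccess) return -1;

    // The host moves data through the device pointer only via the runtime API.
    bool ok = cudaMemcpy(d_value, &h_value, sizeof(int), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        launch_increment(d_value);
        ok = cudaMemcpy(&h_value, d_value, sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_value);
    return ok ? h_value : -1;
}

int main() {
    printf("7 + 1 = %d\n", increment_on_device(7));
    return 0;
}
```

The d_ and h_ prefixes are only a naming convention assumed here; the compiler does not distinguish the two kinds of pointer, which is why the rules have to be followed by hand.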
What is used to allocate memory on the GPGPU device?
cudaMalloc()
What is used to copy memory from the CPU Host to the GPGPU?
cudaMemcpy() with cudaMemcpyHostToDevice as the copy kind.
What is used to copy memory from the GPGPU to the CPU?
cudaMemcpy() with cudaMemcpyDeviceToHost as the copy kind.
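The allocate, copy in, compute, copy out sequence from the last three cards might look like the sketch below. The kernel double_elements and the helper double_on_device are hypothetical names, and a single block is assumed to be enough (so n must not exceed the per-block thread limit, typically 1024).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread doubles one element; assumes a single block of at least n threads.
__global__ void double_elements(float *d_data, int n) {
    int i = threadIdx.x;
    if (i < n) d_data[i] *= 2.0f;
}

// Hypothetical helper: doubles h_data in place using the GPU; true on success.
bool double_on_device(float *h_data, int n) {
    float *d_data = nullptr;
    size_t bytes = n * sizeof(float);

    // 1. Allocate device memory.
    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess) return false;

    // 2. Copy input from host to device.
    bool ok = cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // 3. Run the kernel on the device copy.
        double_elements<<<1, n>>>(d_data, n);
        // 4. Copy the result back; this call waits for the kernel to finish.
        ok = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // 5. Release the device memory.
    cudaFree(d_data);
    return ok;
}

int main() {
    float data[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    if (double_on_device(data, 8)) {
        for (float v : data) printf("%.0f ", v);
        printf("\n");
    }
    return 0;
}
```

Note the argument order of cudaMemcpy(): destination first, then source, then byte count, then the copy kind.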
Where are Device pointers stored?
In host memory; they are passed to kernels as arguments when the kernels are launched.
What is a thread block?
A group of threads that execute together on one multiprocessor and can synchronize with each other.
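To make the grouping concrete, the sketch below has every thread record which block it belongs to. The names record_block and blocks_for_elements are hypothetical, and the block size of 4 is chosen only to keep the output short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes a global index from its block and its position in the block,
// then stores its block number at that index.
__global__ void record_block(int *d_block_of, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_block_of[i] = blockIdx.x;  // guard: the last block may be partly idle
}

// Hypothetical helper: fills h_block_of[0..n) with the block index that handled
// each element; returns true on success.
bool blocks_for_elements(int *h_block_of, int n, int threads_per_block) {
    int blocks = (n + threads_per_block - 1) / threads_per_block;  // round up
    size_t bytes = n * sizeof(int);
    int *d_block_of = nullptr;
    if (cudaMalloc((void **)&d_block_of, bytes) != cudaSuccess) return false;

    record_block<<<blocks, threads_per_block>>>(d_block_of, n);

    bool ok = cudaMemcpy(h_block_of, d_block_of, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_block_of);
    return ok;
}

int main() {
    int block_of[10];
    if (blocks_for_elements(block_of, 10, 4)) {
        for (int b : block_of) printf("%d ", b);
        printf("\n");
    }
    return 0;
}
```

With 10 elements and 4 threads per block, three blocks are launched and the last one has two threads that fail the i < n guard and do nothing.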
How are threads in different thread blocks synchronized?
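Launch another kernel. Kernels in the same stream run in order, so the second kernel starts only after every block of the first has finished.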
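One common use of this is a two-step sum, sketched below under some assumptions: the kernel names block_sums and final_sum, the helper sum_with_two_kernels, and the block size of 256 are all illustrative, and both kernels are launched into the default stream. The first kernel uses __syncthreads(), which synchronizes threads only within a single block; the second launch supplies the synchronization across blocks.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // threads per block; must be a power of two here

// Kernel 1: each block reduces its slice of the input to one partial sum.
__global__ void block_sums(const float *in, float *partial, int n) {
    __shared__ float cache[kThreads];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    cache[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();  // synchronizes threads within this block only
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) cache[threadIdx.x] += cache[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) partial[blockIdx.x] = cache[0];
}

// Kernel 2: runs only after every block of kernel 1 has written its partial sum.
__global__ void final_sum(const float *partial, float *total, int num_blocks) {
    float sum = 0.0f;
    for (int b = 0; b < num_blocks; ++b) sum += partial[b];
    *total = sum;
}

// Hypothetical helper: sums a non-empty vector on the GPU; returns -1.0f on failure.
float sum_with_two_kernels(const std::vector<float> &h_in) {
    int n = (int)h_in.size();
    int blocks = (n + kThreads - 1) / kThreads;
    float *d_in = nullptr, *d_partial = nullptr, *d_total = nullptr;
    float h_total = -1.0f;

    bool ok = cudaMalloc((void **)&d_in, n * sizeof(float)) == cudaSuccess &&
              cudaMalloc((void **)&d_partial, blocks * sizeof(float)) == cudaSuccess &&
              cudaMalloc((void **)&d_total, sizeof(float)) == cudaSuccess &&
              cudaMemcpy(d_in, h_in.data(), n * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        block_sums<<<blocks, kThreads>>>(d_in, d_partial, n);
        // Same (default) stream: this launch waits for all blocks of block_sums.
        final_sum<<<1, 1>>>(d_partial, d_total, blocks);
        ok = cudaMemcpy(&h_total, d_total, sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);       // cudaFree accepts a null pointer, so this is safe
    cudaFree(d_partial);  // even if an earlier allocation failed
    cudaFree(d_total);
    return ok ? h_total : -1.0f;
}

int main() {
    std::vector<float> ones(1024, 1.0f);
    printf("sum = %.1f\n", sum_with_two_kernels(ones));
    return 0;
}
```

Trying to do the final addition inside block_sums would be unsafe, because one block cannot know whether the other blocks have written their partial sums yet.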