CUDA Flashcards
What is heterogeneous computing?
Computing that uses both the CPU and the GPU.
What is the Host?
CPU and its memory
What is the Device?
GPU and its memory
How do we declare a kernel function that is to be run on the Device?
__global__
What does the keyword __global__ indicate?
A function that runs on the device and is called from host code.
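For example, a minimal kernel declaration might look like the following sketch (the kernel name and body are illustrative, not part of any specific API):

    __global__ void addOne(int *data) {
        // Runs on the device; each thread increments one element.
        data[threadIdx.x] += 1;
    }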
What does it mean when we describe launching a kernel?
The CPU places the kernel launch into the GPU's stream. Execution on the CPU continues without waiting for the kernel to complete.
How does the CPU launch a kernel on the device?
Triple angle brackets mark a call from the host to the device.
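A sketch of a launch from host code, assuming the addOne kernel above and a device pointer d_data allocated earlier (names and sizes are illustrative). The launch returns immediately; cudaDeviceSynchronize() can be used when the host needs to wait for the kernel to finish:

    addOne<<<1, 256>>>(d_data);   // launch 1 block of 256 threads
    cudaDeviceSynchronize();      // optional: block the CPU until the kernel completes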
Device Pointers
Point to GPU memory.
May be passed to/from host code.
May not be dereferenced in host code.
Host Pointers
Point to CPU memory.
May be passed to/from device code.
May not be dereferenced in device code.
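A brief sketch of the distinction between host and device pointers (variable names are illustrative):

    int N = 1024;
    int *h_ptr = (int *)malloc(N * sizeof(int));    // host pointer: may be dereferenced on the CPU
    int *d_ptr;
    cudaMalloc((void **)&d_ptr, N * sizeof(int));   // device pointer: the memory lives on the GPU
    // *d_ptr = 0;   // not allowed in host code: d_ptr may not be dereferenced on the CPU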
What is used to allocate memory on the GPGPU device?
cudaMalloc()
What is used to copy memory from the CPU Host to the GPGPU?
cudaMemcpy() with the cudaMemcpyHostToDevice direction flag.
What is used to copy memory from the GPGPU to the CPU?
cudaMemcpy() with the cudaMemcpyDeviceToHost direction flag.
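A sketch of the typical host-side round trip, assuming the addOne kernel sketched above and an array of N ints (all names are illustrative):

    int N = 256;
    size_t bytes = N * sizeof(int);
    int *h_data = (int *)malloc(bytes);              // host buffer (fill with input data here)
    int *d_data;
    cudaMalloc((void **)&d_data, bytes);             // device buffer
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);   // Host -> Device
    addOne<<<1, N>>>(d_data);                        // device pointer passed as a kernel argument
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);   // Device -> Host
    cudaFree(d_data);
    free(h_data);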
Where are Device pointers stored?
Stored on the Host and passed to kernels as arguments when they are launched.
What is a thread block?
A group of threads executed together.
How are threads in different thread blocks synchronized?
By ending the current kernel and launching another one; threads in different blocks cannot synchronize within a single kernel launch.
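A sketch of this pattern (kernel names are illustrative): the kernel launch boundary acts as a grid-wide synchronization point, so the second kernel sees all writes made by the first.

    stepOne<<<nBlocks, nThreads>>>(d_data);
    // All blocks of stepOne complete before stepTwo begins (same stream).
    stepTwo<<<nBlocks, nThreads>>>(d_data);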
How can threads within a thread block synchronize?
__syncthreads();
What threads can access the memory that is declared using the __shared__ keyword?
Threads in the same block.
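A sketch of declaring block-shared memory inside a kernel (the name and size are illustrative):

    __global__ void kernelWithShared(int *out) {
        __shared__ int tile[256];               // one copy per thread block, visible to all its threads
        tile[threadIdx.x] = (int)threadIdx.x;   // assumes the block has at most 256 threads
        __syncthreads();
        out[blockIdx.x * blockDim.x + threadIdx.x] = tile[threadIdx.x];
    }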
What purpose does the variable blockIdx.x have in CUDA?
Access block index.
What purpose does the variable threadIdx.x have in CUDA?
Access thread index within block.
What purpose does the variable blockDim.x have in CUDA?
Get the number of threads per block.
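These three variables are commonly combined to compute a thread's global index, as in this sketch (names are illustrative):

    __global__ void scale(float *x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
        if (i < n)                                       // guard against extra threads in the last block
            x[i] *= 2.0f;
    }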
For a kernel launch, two parameters are given inside the <<< >>>. What is the purpose of these parameters?
<<<nBlocks, nThreadsPerBlock>>>: the number of thread blocks and the number of threads per block.
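A sketch of choosing these parameters on the host for N elements, assuming the scale kernel and a device pointer d_x from the sketch above (values are illustrative):

    int N = 100000;
    int threadsPerBlock = 256;
    int nBlocks = (N + threadsPerBlock - 1) / threadsPerBlock;   // round up so every element is covered
    scale<<<nBlocks, threadsPerBlock>>>(d_x, N);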
What does __syncthreads() do? If a GPGPU executes in a SIMD mode, why do we need this call?
Used as a barrier to prevent data hazards.
Threads are divided into small groups called warps. While the threads within a warp execute in a SIMD manner and are implicitly synchronized, a thread block contains multiple warps, and those warps may not remain synchronized with one another.
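A sketch of the kind of data hazard __syncthreads() prevents: each thread reads a value written by a different thread, which may belong to a different warp (names are illustrative):

    __global__ void shiftLeft(const int *in, int *out) {
        __shared__ int tile[256];                        // assumes blockDim.x == 256
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        tile[threadIdx.x] = in[i];
        __syncthreads();   // without this barrier, another warp may not have written its element yet
        int next = (threadIdx.x + 1) % blockDim.x;
        out[i] = tile[next];
    }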