Terms Flashcards
What model of parallelization does CUDA employ?
SIMT - Single Instruction, Multiple Thread
What does a fundamental computing unit consist of?
An ALU (arithmetic logic unit) and an FPU (floating-point unit).
What is a fundamental computing unit called?
A core
What is a group of fundamental computing units called?
A streaming multiprocessor (SM)
What is a subtask of a computing task called?
Thread
What are computing subtasks organized into?
Blocks
What groups of threads does an SM schedule and execute together?
Warps (groups of 32 threads)
How does the GPU “hide latency”?
When a warp stalls waiting for data, the SM switches to another warp that is ready to run.
What is the essential software construct in CUDA called?
A kernel
How does CUDA tell each thread which part of the computation to do, and how does this method relate to what would be done in serial code?
It gives each thread built-in index variables (threadIdx, blockIdx, blockDim), which play the role of the loop index in serial code.
How would you specify a kernel launch with 10 blocks of 100 threads per block?
myKernel<<<10, 100>>>(args);
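To connect the indexing and launch cards, here is a minimal sketch comparing a serial loop with a kernel that derives the same index from the built-in variables. The names scaleSerial and scaleKernel are hypothetical, chosen only for illustration, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Serial version: the loop variable i selects the element to process.
void scaleSerial(float *data, int n, float factor) {
    for (int i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}

// CUDA version: each thread computes its own i from the built-in variables.
__global__ void scaleKernel(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // guard against surplus threads
        data[i] *= factor;
    }
}

int main() {
    const int n = 1000;
    float expected[n], result[n];
    for (int i = 0; i < n; ++i) {
        expected[i] = result[i] = static_cast<float>(i);
    }

    // Reference result computed on the host.
    scaleSerial(expected, n, 2.0f);

    // Same computation on the device: 10 blocks of 100 threads cover 1000 elements.
    float *dev = nullptr;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemcpy(dev, result, n * sizeof(float), cudaMemcpyHostToDevice);
    scaleKernel<<<10, 100>>>(dev, n, 2.0f);
    cudaMemcpy(result, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (expected[i] != result[i]) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);
    return 0;
}
```

The bounds check matters whenever the total thread count exceeds n; here the two happen to be equal.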
How do you specify that a function should be called from the host and executed on the device? What is it called when this is launched from the device instead of the host?
__global__, dynamic parallelism
How do you specify a function should be called from the host and executed on the host?
__host__
How do you specify that a function should be called from the device and executed on the device?
__device__
How do you specify a function should be compiled so it can be called on the host and device?
__host__ __device__
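The qualifier cards above can be seen side by side in one small program. This is a sketch only; the function names square, addOne, transform, and reference are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Compiled for both host and device, so either side may call it.
__host__ __device__ float square(float x) { return x * x; }

// Device-only helper: callable from kernels and other device functions.
__device__ float addOne(float x) { return x + 1.0f; }

// Kernel: launched from the host, executed on the device.
__global__ void transform(float *out, const float *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = addOne(square(in[i]));
}

// Host-only function; __host__ is the default and could be omitted.
__host__ float reference(float x) { return square(x) + 1.0f; }

int main() {
    const int n = 4;
    float hIn[n] = {0.0f, 1.0f, 2.0f, 3.0f};
    float hOut[n] = {0.0f, 0.0f, 0.0f, 0.0f};

    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc((void **)&dIn, n * sizeof(float));
    cudaMalloc((void **)&dOut, n * sizeof(float));
    cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice);

    transform<<<1, n>>>(dOut, dIn, n);

    cudaMemcpy(hOut, dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    for (int i = 0; i < n; ++i) {
        printf("device: %f  host: %f\n", hOut[i], reference(hIn[i]));
    }
    return 0;
}
```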
What function allocates device memory?
cudaMalloc
What function copies memory from the host to the device?
cudaMemcpy (with the cudaMemcpyHostToDevice direction flag)
What function frees memory from the device?
cudaFree
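The three memory functions above usually appear together. The following sketch copies a small array to the device and straight back, with no kernel involved, to show the allocate, copy, free sequence; the buffer names are assumptions and return codes are ignored for brevity.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

int main() {
    const int n = 8;
    const size_t bytes = n * sizeof(int);  // sizes are expressed as size_t

    int hostIn[n], hostOut[n];
    for (int i = 0; i < n; ++i) hostIn[i] = i * i;
    memset(hostOut, 0, bytes);

    // Allocate device memory.
    int *dev = nullptr;
    cudaMalloc((void **)&dev, bytes);

    // Host to device, then device back to a second host buffer.
    cudaMemcpy(dev, hostIn, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(hostOut, dev, bytes, cudaMemcpyDeviceToHost);

    // Release the device allocation.
    cudaFree(dev);

    printf("round trip %s\n",
           memcmp(hostIn, hostOut, bytes) == 0 ? "matched" : "differed");
    return 0;
}
```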
What function synchronizes threads within a block?
__syncthreads()
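A common use of the block-level barrier is to make sure every thread has finished writing shared memory before any thread reads it. The sketch below reverses an array within a single block; the kernel name reverseInBlock and the BLOCK size of 64 are assumptions made for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 64  // threads per block, also the array length in this sketch

__global__ void reverseInBlock(int *data) {
    __shared__ int tile[BLOCK];
    int t = threadIdx.x;

    tile[t] = data[t];   // each thread stages one element in shared memory
    __syncthreads();     // wait until the whole block has written its element

    data[t] = tile[BLOCK - 1 - t];  // now safe to read another thread's element
}

int main() {
    int host[BLOCK];
    for (int i = 0; i < BLOCK; ++i) host[i] = i;

    int *dev = nullptr;
    cudaMalloc((void **)&dev, sizeof(host));
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);

    reverseInBlock<<<1, BLOCK>>>(dev);

    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("first: %d, last: %d\n", host[0], host[BLOCK - 1]);
    return 0;
}
```

Without the barrier, a thread could read a slot of tile that its neighbor has not yet written.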
What function makes the host wait until all previously launched device work, including every thread in a grid, has finished?
cudaDeviceSynchronize
What type of operations prevent conflicts when multiple threads access the same variable?
Atomic operations, such as atomicAdd
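As an illustration of the card above, this sketch has every thread increment one shared counter. The kernel name countThreads is hypothetical; with a plain `*counter += 1` the updates could overwrite each other, whereas atomicAdd performs each read-modify-write as one indivisible step.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every thread adds 1 to the same counter in global memory.
__global__ void countThreads(int *counter) {
    atomicAdd(counter, 1);
}

int main() {
    int *devCounter = nullptr;
    cudaMalloc((void **)&devCounter, sizeof(int));
    cudaMemset(devCounter, 0, sizeof(int));  // start the counter at zero

    countThreads<<<10, 100>>>(devCounter);   // 1000 threads in total

    int hostCounter = 0;
    cudaMemcpy(&hostCounter, devCounter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(devCounter);

    // On a working device the count should equal the number of threads launched.
    printf("counter = %d\n", hostCounter);
    return 0;
}
```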
What type indicates an amount of memory?
size_t
What type is used for GPU errors?
cudaError_t
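Most runtime calls return a cudaError_t, so one common pattern is to wrap them in a checking macro. The sketch below is one way to do it under stated assumptions: the CHECK macro and the fill kernel are hypothetical helpers, not part of CUDA itself, while cudaGetErrorString, cudaGetLastError, and cudaDeviceSynchronize are runtime API functions.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err = (call);                                    \
        if (err != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n", __FILE__,   \
                    __LINE__, cudaGetErrorString(err));              \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

__global__ void fill(int *data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

int main() {
    const int n = 256;
    int *dev = nullptr;

    CHECK(cudaMalloc((void **)&dev, n * sizeof(int)));

    fill<<<1, n>>>(dev, n, 7);
    CHECK(cudaGetLastError());       // reports errors from the launch itself
    CHECK(cudaDeviceSynchronize());  // waits for the kernel, reports run-time errors

    CHECK(cudaFree(dev));
    printf("no CUDA errors reported\n");
    return 0;
}
```

A kernel launch returns nothing, which is why the launch is followed by cudaGetLastError rather than wrapped directly.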
What type signature means an unsigned vector with 3 components? What’s the alternative when specifying grid and block sizes?
uint3, dim3
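To show dim3 in use, here is a sketch of a two-dimensional launch. The kernel name fillMatrix, the 40 by 30 matrix, and the 16 by 16 block shape are assumptions for illustration. Inside the kernel, threadIdx and blockIdx are uint3 values and blockDim is a dim3; unspecified dim3 components default to 1.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread writes its own linear offset into a row-major matrix.
__global__ void fillMatrix(int *m, int width, int height) {
    int x = static_cast<int>(blockIdx.x * blockDim.x + threadIdx.x);  // column
    int y = static_cast<int>(blockIdx.y * blockDim.y + threadIdx.y);  // row
    if (x < width && y < height) {
        m[y * width + x] = y * width + x;
    }
}

int main() {
    const int width = 40, height = 30;
    const size_t bytes = width * height * sizeof(int);

    int *dev = nullptr;
    cudaMalloc((void **)&dev, bytes);

    dim3 block(16, 16);  // 256 threads per block; z defaults to 1
    dim3 grid((width + block.x - 1) / block.x,    // round up so the grid
              (height + block.y - 1) / block.y);  // covers the whole matrix

    fillMatrix<<<grid, block>>>(dev, width, height);

    int host[width * height];
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("grid: %u x %u blocks, last element: %d\n",
           grid.x, grid.y, host[width * height - 1]);
    return 0;
}
```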