CHAPTER 7: Parallel Processors Flashcards
multiprocessor
computer system with at least two processors, in contrast to a uniprocessor, which has one and is increasingly hard to find today
task-level parallelism (process-level parallelism)
utilizing multiple processors by running independent programs simultaneously
parallel processing program
single program that runs on multiple processors simultaneously
cluster
set of computers connected over a local area network that function as a single large multiprocessor
multicore microprocessor
microprocessor containing multiple processors (“cores”) in a single integrated circuit. Virtually all microprocessors today in desktops and servers are multicore
shared memory multiprocessor
parallel processor with a single physical address space
strong scaling
speed-up achieved on a multiprocessor without increasing the size of the problem
weak scaling
speed-up achieved on a multiprocessor while increasing the size of the problem proportionally to the increase in the number of processors
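Worked example (an illustration, not part of the original cards): the sketch below contrasts the two kinds of scaling for a program whose parallelizable fraction is assumed to be known. Strong scaling is modeled with Amdahl's Law; weak scaling is modeled with Gustafson's scaled speed-up formula, which is one common model and is an assumption here. The function names are hypothetical.

```python
def strong_scaling_speedup(parallel_fraction, processors):
    """Amdahl's Law: the problem size is fixed, so the serial part limits speed-up."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def weak_scaling_speedup(parallel_fraction, processors):
    """Gustafson's scaled speed-up: the parallel work grows with the processor count."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * processors


if __name__ == "__main__":
    # Assumed example: 90% of the work can run in parallel.
    for p in (2, 8, 64):
        print(p, strong_scaling_speedup(0.9, p), weak_scaling_speedup(0.9, p))
```

With a fixed problem, the strong-scaling speed-up can never exceed 1 / (1 - parallel_fraction), no matter how many processors are added, which is why strong scaling is usually the harder target.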
SISD or single instruction stream, single data stream
uniprocessor
MIMD or multiple instruction streams, multiple data streams
multiprocessor
SPMD or single program, multiple data streams
conventional MIMD programming model, where a single program runs across all processors
SIMD or single instruction stream, multiple data streams
same instruction is applied to many data streams, as in a vector processor
data-level parallelism
parallelism achieved by performing the same operation on independent data
vector lane
one or more vector functional units and a portion of the vector register file. Inspired by lanes on highways that increase traffic speed, multiple lanes execute vector operations simultaneously
hardware multithreading
increasing utilization of a processor by switching to another thread when one thread is stalled
fine-grained multithreading
version of hardware multithreading that switches between threads after every instruction
coarse-grained multithreading
version of hardware multithreading that switches between threads only after significant events, such as a last-level cache miss
simultaneous multithreading (SMT)
version of multithreading that lowers the cost of multithreading by utilizing the resources already needed for a multiple-issue, dynamically scheduled microarchitecture
reduction
function that processes a data structure and returns a single value
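Worked example (an illustration, not part of the original cards): a reduction is often parallelized by reducing independent chunks first and then combining the partial results. The sketch below shows that structure for a sum. The name `parallel_sum` and the `workers` parameter are hypothetical, and because CPython threads share one interpreter lock, this shows the shape of the computation rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values, workers=4):
    """Sum `values` by reducing chunks independently, then combining them."""
    if not values:
        return 0
    chunk = (len(values) + workers - 1) // workers  # ceiling division
    pieces = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, pieces))  # one partial result per chunk
    return sum(partial_sums)  # final reduction over the partial results


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))
```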
OpenMP
API for shared memory multiprocessing in C, C++, or Fortran that runs on UNIX and Microsoft platforms. It includes compiler directives, a library, and runtime directives
network bandwidth
peak transfer rate of a network; can refer to the speed of a single link or the collective transfer rate of all links in the network
bisection bandwidth
bandwidth between two equal parts of a multiprocessor; this measure is for a worst-case split of the multiprocessor
fully connected network
network that connects processor-memory nodes by supplying a dedicated communication link between every pair of nodes
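Worked example (an illustration, not part of the original cards): the network measures above can be compared for two topologies by counting links. The sketch assumes every link has the same bandwidth, so results are in units of one link's bandwidth, and assumes an even node count of at least 4. The function names are hypothetical.

```python
def ring_metrics(nodes):
    """Return (total links, links cut by a worst-case bisection) for a ring."""
    # A ring has one link per node; any split into two halves cuts exactly 2 links.
    return nodes, 2


def fully_connected_metrics(nodes):
    """Return (total links, links cut by a bisection) for a fully connected network."""
    total_links = nodes * (nodes - 1) // 2  # one link per pair of nodes
    half = nodes // 2
    return total_links, half * half  # every node in one half links to every node in the other


if __name__ == "__main__":
    print(ring_metrics(64), fully_connected_metrics(64))
```

The ring is cheap but its bisection bandwidth stays constant as nodes are added, while the fully connected network's bisection bandwidth grows quadratically at the cost of a quadratic number of links.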
multistage network
network that supplies a small switch at each node
crossbar network
network that allows any node to communicate with any other node in one pass through the network
polling
process of periodically checking the status of an I/O device to determine the need to service the device
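Worked example (an illustration, not part of the original cards): a polling loop repeatedly checks a status flag and services the device once it reports ready. `FakeDevice`, `poll`, and their parameters are hypothetical stand-ins; a real device would expose a status register rather than a Python method.

```python
import time


class FakeDevice:
    """Hypothetical device that reports ready after a set number of status checks."""

    def __init__(self, ready_after):
        self._checks_left = ready_after

    def is_ready(self):
        self._checks_left -= 1
        return self._checks_left <= 0

    def read(self):
        return "data"


def poll(device, interval_s=0.001, max_checks=1000):
    """Check the device status periodically; return (data, number of checks made)."""
    for checks in range(1, max_checks + 1):
        if device.is_ready():
            return device.read(), checks  # service the device
        time.sleep(interval_s)  # wait before checking again
    raise TimeoutError("device never became ready")


if __name__ == "__main__":
    print(poll(FakeDevice(ready_after=3)))
```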
PThreads
a UNIX API for creating and manipulating threads. It is structured as a library
arithmetic intensity
ratio of floating-point operations in a program to the number of data bytes accessed by a program from main memory
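Worked example (an illustration, not part of the original cards): arithmetic intensity is the x-axis of the roofline model, in which attainable performance is the smaller of peak floating-point performance and peak memory bandwidth times arithmetic intensity. The machine numbers below (16 GFLOP/s, 10 GB/s) are made-up assumptions, and the kernel is a DAXPY-style update `y[i] = a * x[i] + y[i]` on 8-byte values, assumed to get no cache reuse.

```python
def arithmetic_intensity(flops, bytes_from_memory):
    """Floating-point operations per byte accessed from main memory."""
    return flops / bytes_from_memory


def attainable_gflops(peak_gflops, peak_bandwidth_gb_s, intensity):
    """Roofline bound: limited by compute or by memory traffic, whichever is lower."""
    return min(peak_gflops, peak_bandwidth_gb_s * intensity)


if __name__ == "__main__":
    # Per element: 1 multiply + 1 add = 2 flops; load x, load y, store y = 24 bytes.
    daxpy = arithmetic_intensity(flops=2, bytes_from_memory=24)
    print(daxpy, attainable_gflops(16.0, 10.0, daxpy))
```

Under these assumptions the kernel is memory bound: its bound is set by bandwidth times intensity, well below the assumed floating-point peak.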
fallacy: peak performance tracks observed performance
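Quoted peak figures for a multiprocessor often multiply the peak performance of one processor by the number of processors, assuming perfect speed-up. Amdahl’s Law suggests how difficult it is to reach either peak; multiplying the two together multiplies the sins. The roofline model helps put peak performance in perspective.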
pitfall: not developing the software to take advantage of, or optimize for, a multiprocessor architecture
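Example: an operating system that protected its page table with a single lock serialized page allocation once it ran on a multiprocessor. Placing locks on smaller portions of the page table effectively eliminated the problem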
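Worked example (an illustration, not part of the original cards): the idea of locking smaller portions of a shared structure can be sketched with one lock per bucket instead of one global lock. `StripedTable` is a hypothetical stand-in, not a real page table, and the stripe count of 16 is an arbitrary assumption.

```python
import threading


class StripedTable:
    """Hypothetical table guarded by several locks instead of one global lock."""

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._buckets = [{} for _ in range(stripes)]

    def _stripe(self, key):
        # Keys that map to different stripes never contend for the same lock.
        return hash(key) % len(self._locks)

    def put(self, key, value):
        i = self._stripe(key)
        with self._locks[i]:
            self._buckets[i][key] = value

    def get(self, key, default=None):
        i = self._stripe(key)
        with self._locks[i]:
            return self._buckets[i].get(key, default)


if __name__ == "__main__":
    table = StripedTable()
    table.put(42, "entry")
    print(table.get(42))
```

Threads that touch entries in different stripes can proceed at the same time, whereas a single lock would force every update through one critical section.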