Unit 7 Flashcards
A computer system with at least two processors. It contrasts with a uniprocessor, which has only one processor and is increasingly hard to find today.
Multiprocessor
Utilizing multiple processors by running independent programs simultaneously.
Task-level parallelism or process-level parallelism
A single program that runs on multiple processors simultaneously.
Parallel processing program
A set of computers connected over a local area network that function as a single large multiprocessor.
Cluster
A microprocessor containing multiple processors (“cores”) in a single integrated circuit. Virtually all microprocessors today in desktops and servers are multicore.
Multicore microprocessor
A parallel processor with a single physical address space.
Shared memory multiprocessor (SMP)
Speed-up achieved on a multiprocessor without increasing the size of the problem.
Strong scaling
Speed-up achieved on a multiprocessor while increasing the size of the problem proportionally to the increase in the number of processors.
Weak scaling
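A minimal sketch of how the two scaling measures above differ, assuming a hypothetical timing model in which a run costs a fixed serial part plus parallel work divided evenly across processors. The function names and the numbers are illustrative assumptions and do not come from the flashcards.

```python
def run_time(serial, parallel_work, processors):
    """Hypothetical model: serial part plus evenly divided parallel work."""
    return serial + parallel_work / processors


def strong_scaling_speedup(serial, parallel_work, processors):
    """Speed-up with the problem size held fixed."""
    one = run_time(serial, parallel_work, 1)
    many = run_time(serial, parallel_work, processors)
    return one / many


def weak_scaling_speedup(serial, work_per_processor, processors):
    """Speed-up when parallel work grows in proportion to the processor count."""
    scaled_work = work_per_processor * processors
    one = run_time(serial, scaled_work, 1)
    many = run_time(serial, scaled_work, processors)
    return one / many


# Assumed workload: 10 time units of serial work, 90 of parallel work.
strong = strong_scaling_speedup(serial=10.0, parallel_work=90.0, processors=10)
weak = weak_scaling_speedup(serial=10.0, work_per_processor=90.0, processors=10)
print(f"strong scaling speed-up: {strong:.2f}")
print(f"weak scaling speed-up:   {weak:.2f}")
```

Under this assumed model the fixed-size run is held back by its serial part, while the scaled run gets closer to the processor count because the serial part becomes a smaller share of the total.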
A uniprocessor.
SISD or single instruction stream, single data stream
A multiprocessor.
MIMD or multiple instruction streams, multiple data streams
The conventional MIMD programming model, where a single program runs across all processors.
SPMD or single program, multiple data streams
The same instruction is applied to many data streams, as in a vector processor.
SIMD or single instruction stream, multiple data streams
Parallelism achieved by performing the same operation on independent data.
Data-level parallelism
The basic philosophy of blank is to collect data elements from memory, put them in order into a large set of registers, operate on them sequentially in registers using pipelined execution units, and then write the results back to memory.
Vector architecture
One or more vector functional units and a portion of the vector register file. Inspired by lanes on highways that increase traffic speed, multiple lanes execute vector operations simultaneously.
Vector lane
Increasing utilization of a processor by switching to another thread when one thread is stalled.
Hardware multithreading
A thread includes the program counter, the register state, and the stack. It is a lightweight process; whereas threads commonly share a single address space, processes don’t.
Thread
A process includes one or more threads, the address space, and the operating system state. Hence, a process switch usually invokes the operating system, but a thread switch does not.
Process
A version of hardware multithreading that implies switching between threads after every instruction.
Fine-grained multithreading
A version of hardware multithreading that implies switching between threads only after significant events, such as a last-level cache miss.
Coarse-grained multithreading
A version of multithreading that lowers the cost of multithreading by utilizing the resources needed for a multiple-issue, dynamically scheduled microarchitecture.
Simultaneous multithreading (SMT)
A multiprocessor in which latency to any word in main memory is about the same no matter which processor requests the access.
Uniform memory access (UMA)
A type of single address space multiprocessor in which some memory accesses are much faster than others depending on which processor asks for which word.
Nonuniform memory access (NUMA)
The process of coordinating the behavior of two or more processes, which may be running on different processors.
Synchronization
A synchronization device that allows access to data to only one processor at a time.
Lock
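A minimal sketch of a lock guarding shared data, using Python's standard `threading.Lock`. Python threads stand in here for processors, which is an assumption made for illustration; the names `shared`, `shared_lock`, and `add_many` are hypothetical.

```python
import threading

shared = {"count": 0}           # data visible to every thread
shared_lock = threading.Lock()  # only one thread may hold this at a time


def add_many(times):
    """Increment the shared counter, taking the lock for each update."""
    for _ in range(times):
        with shared_lock:       # acquire on entry, release on exit
            shared["count"] += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["count"])
```

Because every update happens while the lock is held, no increment can be lost to an interleaved read-modify-write from another thread.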
A function that processes a data structure and returns a single value.
Reduction
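A minimal sketch of a reduction: a list of numbers is collapsed to a single sum. The second function shows one way this might be split across processors, with each chunk reduced to a partial value and the partial values reduced again. The chunking scheme and the name `chunked_sum` are illustrative assumptions.

```python
from functools import reduce
import operator

data = list(range(1, 101))

# Sequential reduction: combine all elements into one value.
total = reduce(operator.add, data, 0)


def chunked_sum(values, parts):
    """Reduce each chunk to a partial sum, then reduce the partial sums."""
    size = (len(values) + parts - 1) // parts  # ceiling division
    partials = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return reduce(operator.add, partials, 0)


print(total, chunked_sum(data, 4))
```

In a real parallel program each partial sum would be computed by a different processor; here the chunks are processed one after another only to show the structure.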
An API for shared memory multiprocessing in C, C++, or Fortran that runs on UNIX and Microsoft platforms. It includes compiler directives, a library, and runtime directives.
OpenMP
Communicating between multiple processors by explicitly sending and receiving information.
Message passing
A routine used by a processor in machines with private memories to pass a message to another processor.
Send message routine
A routine used by a processor in machines with private memories to accept a message from another processor.
Receive message routine
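A minimal sketch of the send and receive routines described above. It assumes Python threads as stand-ins for processors and a `queue.Queue` as a stand-in for the interconnect, so memory is not truly private here; the workers simply never touch each other's data and communicate only through messages. All names are hypothetical.

```python
import queue
import threading

mailbox = queue.Queue()  # stand-in for the network between processors
results = []


def send(box, message):
    """Send message routine: hand a message to the network."""
    box.put(message)


def receive(box):
    """Receive message routine: block until a message arrives."""
    return box.get()


def worker(box, private_values):
    """Sum this worker's own data and send only the result."""
    send(box, sum(private_values))


def coordinator(box, expected_messages):
    """Collect one partial sum per worker and combine them."""
    total = 0
    for _ in range(expected_messages):
        total += receive(box)
    results.append(total)


workers = [
    threading.Thread(target=worker, args=(mailbox, [1, 2, 3])),
    threading.Thread(target=worker, args=(mailbox, [4, 5, 6])),
]
collector = threading.Thread(target=coordinator, args=(mailbox, len(workers)))

collector.start()
for w in workers:
    w.start()
for w in workers:
    w.join()
collector.join()

print(results)
```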
Collections of computers connected via I/O over standard network switches to form a message-passing multiprocessor.
Clusters
Rather than selling software that is installed and run on customers’ own computers, software is run at a remote site and made available to customers over the Internet, typically via a Web interface. SaaS customers are charged based on use rather than on ownership.
Software as a service (SaaS)
Informally, the peak transfer rate of a network; can refer to the speed of a single link or the collective transfer rate of all links in the network.
Network bandwidth
The bandwidth between two equal parts of a multiprocessor. This measure is for a worst-case split of the multiprocessor.
Bisection bandwidth
A network that connects processor-memory nodes by supplying a dedicated communication link between every pair of nodes.
Fully connected network
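A minimal sketch of the worst-case split behind bisection bandwidth: it tries every division of the nodes into two equal halves and counts the links that cross. The result is in links, so multiplying by a per-link bandwidth would give a bandwidth figure. The ring topology is an assumption added only for comparison with the fully connected network, and the brute-force search is practical only for small node counts.

```python
from itertools import combinations


def bisection_links(nodes, links):
    """Minimum number of links crossing any split into two equal halves."""
    nodes = list(nodes)
    half = len(nodes) // 2
    best = None
    for group in combinations(nodes, half):
        left = set(group)
        # A link crosses the split when its endpoints are on different sides.
        crossing = sum(1 for a, b in links if (a in left) != (b in left))
        if best is None or crossing < best:
            best = crossing
    return best


n = 8
fully_connected = list(combinations(range(n), 2))  # one link per node pair
ring = [(i, (i + 1) % n) for i in range(n)]        # hypothetical comparison

full_bisection = bisection_links(range(n), fully_connected)
ring_bisection = bisection_links(range(n), ring)
print(full_bisection, ring_bisection)
```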
A network that supplies a small switch at each node.
Multistage network
A network that allows any node to communicate with any other node in one pass through the network.
Crossbar network
A popular high-speed link today is blank, which stands for Peripheral Component Interconnect Express. It is called a link in that the basic building block, called a serial lane, consists of only four wires: two for receiving data and two for transmitting data.
PCIe
An I/O scheme in which portions of the address space are assigned to I/O devices, and reads and writes to those addresses are interpreted as commands to the I/O device.
Memory-mapped I/O
A mechanism that provides a device controller with the ability to transfer data directly to or from the memory without involving the processor.
Direct memory access (DMA)
An I/O scheme that employs interrupts to indicate to the processor that an I/O device needs attention.
Interrupt-driven I/O
A program that controls an I/O device that is attached to the computer.
Device driver
The process of periodically checking the status of an I/O device to determine the need to service the device.
Polling
A UNIX API for creating and manipulating threads. It is structured as a library.
Pthreads
The ratio of floating-point operations in a program to the number of data bytes accessed by a program from main memory.
Arithmetic intensity
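A minimal sketch of the ratio defined above, worked for a hypothetical kernel that computes y[i] = a * x[i] + y[i] on double-precision values. It assumes every operand comes from main memory with no cache reuse; the kernel choice and the helper name are illustrative assumptions.

```python
def arithmetic_intensity(flops, bytes_from_memory):
    """Floating-point operations per byte accessed from main memory."""
    return flops / bytes_from_memory


n = 1_000_000
flops = 2 * n            # one multiply and one add per element
bytes_moved = 3 * 8 * n  # read x[i], read y[i], write y[i]; 8 bytes each

kernel_intensity = arithmetic_intensity(flops, bytes_moved)
print(f"{kernel_intensity:.4f} FLOPs per byte")
```

The element count cancels out, so under these assumptions the kernel performs 1/12 of a floating-point operation per byte regardless of problem size.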
It was perhaps the most infamous of supercomputers. The project started in 1965 and ran its first real application in 1976. The 64 processors used a 13-MHz clock, and their combined main memory size was 1 MB: 64 × 16 KB. The blank was the first machine to teach us that software for parallel machines dominates hardware issues.
Illiac IV