Unit 7 Flashcards by Joshua Lane

A computer system with at least two processors. It contrasts with a uniprocessor, which has one processor and is increasingly hard to find today.

Multiprocessor

Utilizing multiple processors by running independent programs simultaneously.

Task-level parallelism or process-level parallelism

A single program that runs on multiple processors simultaneously.

Parallel processing program

A set of computers connected over a local area network that function as a single large multiprocessor.

Cluster

A microprocessor containing multiple processors (“cores”) in a single integrated circuit. Virtually all microprocessors today in desktops and servers are multicore.

Multicore microprocessor

A parallel processor with a single physical address space.

Shared memory multiprocessor (SMP)

Speed-up achieved on a multiprocessor without increasing the size of the problem.

Strong scaling

Speed-up achieved on a multiprocessor while increasing the size of the problem proportionally to the increase in the number of processors.

Weak scaling
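
As a rough illustration of the two scaling cards above, the sketch below uses Amdahl's law for a fixed problem size (strong scaling) and Gustafson's scaled-speedup formula for a problem that grows with the processor count (weak scaling). Neither formula appears on the cards; the function names and the 90% parallel fraction are assumptions chosen for illustration.

```python
def strong_scaling_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: speedup when the problem size stays fixed."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def weak_scaling_speedup(parallel_fraction: float, processors: int) -> float:
    """Gustafson's law: scaled speedup when parallel work grows with the processor count."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * processors


if __name__ == "__main__":
    # Assumed workload: 90% of the single-processor run time is parallelizable.
    for p in (1, 10, 100):
        print(p, strong_scaling_speedup(0.9, p), weak_scaling_speedup(0.9, p))
```

With a 90% parallel fraction, the strong-scaling speedup can never exceed 10 however many processors are added, which is one reason strong scaling is usually the harder target.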

A uniprocessor.

SISD or single instruction stream, single data stream

A multiprocessor.

MIMD or multiple instruction streams, multiple data streams

The conventional MIMD programming model, where a single program runs across all processors.

SPMD or single program, multiple data streams

The same instruction is applied to many data streams, as in a vector processor.

SIMD or single instruction stream, multiple data streams

Parallelism achieved by performing the same operation on independent data.

Data-level parallelism

The basic philosophy of ____ is to collect data elements from memory, put them in order into a large set of registers, operate on them sequentially in registers using pipelined execution units, and then write the results back to memory.

Vector architecture

One or more vector functional units and a portion of the vector register file. Inspired by lanes on highways that increase traffic speed, multiple lanes execute vector operations simultaneously.

Vector lane

Increasing utilization of a processor by switching to another thread when one thread is stalled.

Hardware multithreading

A thread includes the program counter, the register state, and the stack. It is a lightweight process; whereas threads commonly share a single address space, processes don’t.

Thread

A process includes one or more threads, the address space, and the operating system state. Hence, a process switch usually invokes the operating system, but not a thread switch.

Process

A version of hardware multithreading that implies switching between threads after every instruction.

Fine-grained multithreading

A version of hardware multithreading that implies switching between threads only after significant events, such as a last-level cache miss.

Coarse-grained multithreading

A version of multithreading that lowers the cost of multithreading by utilizing the resources needed for a multiple-issue, dynamically scheduled microarchitecture.

Simultaneous multithreading (SMT)

A multiprocessor in which latency to any word in main memory is about the same no matter which processor requests the access.

Uniform memory access (UMA)

A type of single address space multiprocessor in which some memory accesses are much faster than others depending on which processor asks for which word.

Nonuniform memory access (NUMA)

The process of coordinating the behavior of two or more processes, which may be running on different processors.

Synchronization

A synchronization device that allows access to data to only one processor at a time.

Lock
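
A minimal sketch of the lock idea, assuming Python's standard `threading` module as a stand-in: software threads take the place of processors here, and `LockedCounter` and `run_counter` are hypothetical names used only for illustration.

```python
import threading


class LockedCounter:
    """Shared counter whose updates are guarded by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Only one thread at a time can hold the lock, so the
        # read-modify-write below is never interleaved with another one.
        with self._lock:
            self.value += 1


def run_counter(num_threads: int = 4, increments: int = 10_000) -> int:
    """Start several threads that all update the same counter, then return its value."""
    counter = LockedCounter()

    def work() -> None:
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Calling `run_counter(4, 10_000)` should return exactly 40,000, because every increment happens while the lock is held.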

A function that processes a data structure and returns a single value.

Reduction
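
The sketch below shows one common shape of a sum reduction: each processor sums its own slice, then the partial sums are combined pairwise. It runs sequentially and only imitates the structure of a parallel reduction; `reduction_sum` and `num_processors` are hypothetical names, not taken from the cards.

```python
def reduction_sum(values, num_processors=4):
    """Reduce a list to a single value: per-processor partial sums, then pairwise combining."""
    # Step 1: each (hypothetical) processor sums its own contiguous slice.
    chunk = (len(values) + num_processors - 1) // num_processors
    partial = [sum(values[i * chunk:(i + 1) * chunk]) for i in range(num_processors)]

    # Step 2: combine the partial sums in pairs, halving their number each round.
    while len(partial) > 1:
        if len(partial) % 2:
            partial.append(0)  # pad so every partial sum has a partner
        partial = [partial[i] + partial[i + 1] for i in range(0, len(partial), 2)]
    return partial[0]
```

On a real multiprocessor, each round of step 2 would need synchronization so that a partial sum is read only after it has been written.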

An API for shared memory multiprocessing in C, C++, or Fortran that runs on UNIX and Microsoft platforms. It includes compiler directives, a library, and runtime directives.

OpenMP

Communicating between multiple processors by explicitly sending and receiving information.

Message passing

A routine used by a processor in machines with private memories to pass a message to another processor.

Send message routine

A routine used by a processor in machines with private memories to accept a message from another processor.

Receive message routine
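
To make the send and receive cards concrete, here is a small sketch in which two workers communicate only through explicit `send` and `receive` calls. It is an approximation: Python threads actually share one address space, so the `Mailbox` class (a hypothetical name) merely stands in for a network between private memories.

```python
import queue
import threading


class Mailbox:
    """Hypothetical per-processor inbox standing in for the interconnect."""

    def __init__(self) -> None:
        self._inbox = queue.Queue()

    def send(self, message) -> None:
        """Send message routine: deliver a message to this mailbox's owner."""
        self._inbox.put(message)

    def receive(self):
        """Receive message routine: block until a message arrives, then return it."""
        return self._inbox.get()


def worker(inbox: Mailbox, reply_to: Mailbox) -> None:
    numbers = inbox.receive()    # accept work from the other processor
    reply_to.send(sum(numbers))  # explicitly send the result back


def demo() -> int:
    to_worker, to_main = Mailbox(), Mailbox()
    t = threading.Thread(target=worker, args=(to_worker, to_main))
    t.start()
    to_worker.send([1, 2, 3, 4])
    result = to_main.receive()
    t.join()
    return result
```

The blocking `receive` also acts as synchronization: the main thread cannot continue until the worker has sent its result.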

Collections of computers connected via I/O over standard network switches to form a message-passing multiprocessor.

Clusters

Rather than selling software that is installed and run on customers' own computers, software is run at a remote site and made available to customers over the Internet, typically via a Web interface. SaaS customers are charged based on use rather than ownership.

Software as a service (SaaS)

Informally, the peak transfer rate of a network; can refer to the speed of a single link or the collective transfer rate of all links in the network.

Network bandwidth

The bandwidth between two equal parts of a multiprocessor. This measure is for a worst-case split of the multiprocessor.

Bisection bandwidth
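
As a worked example of the definition above, the sketch below computes bisection bandwidth for two standard topologies, a ring and a fully connected network, in multiples of a single link's bandwidth. The function names are hypothetical, and an even node count is assumed so the machine splits into two equal halves.

```python
def _check_even(nodes: int) -> None:
    if nodes < 4 or nodes % 2:
        raise ValueError("this sketch assumes an even node count of at least 4")


def ring_bisection_bandwidth(nodes: int, link_bandwidth: float = 1.0) -> float:
    """Worst-case split of a ring into two contiguous halves cuts exactly two links."""
    _check_even(nodes)
    return 2 * link_bandwidth


def fully_connected_bisection_bandwidth(nodes: int, link_bandwidth: float = 1.0) -> float:
    """Every node in one half has a dedicated link to every node in the other half."""
    _check_even(nodes)
    half = nodes // 2
    return half * half * link_bandwidth
```

For 8 nodes this gives 2 link bandwidths for the ring and 16 for the fully connected network, which shows how much extra bandwidth the dedicated links provide.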

A network that connects processor-memory nodes by supplying a dedicated communication link between every node.

Fully connected network

A network that supplies a small switch at each node.

Multistage network

A network that allows any node to communicate with any other node in one pass through the network.

Crossbar network

A popular high-speed link today is ____, which stands for Peripheral Component Interconnect Express. It is called a link in that the basic building block, called a serial lane, consists of only four wires: two for receiving data and two for transmitting data.

PCIe

An I/O scheme in which portions of the address space are assigned to I/O devices, and reads and writes to those addresses are interpreted as commands to the I/O device.

Memory-mapped I/O

A mechanism that provides a device controller with the ability to transfer data directly to or from the memory without involving the processor.

Direct memory access (DMA)

An I/O scheme that employs interrupts to indicate to the processor that an I/O device needs attention.

Interrupt-driven I/O

A program that controls an I/O device that is attached to the computer.

Device driver

The process of periodically checking the status of an I/O device to determine the need to service the device.

Polling
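
A minimal sketch of a polling loop, assuming a simulated device: `FakeDevice` and its `is_ready` and `read` methods are hypothetical stand-ins for a real status register and data register.

```python
import time


class FakeDevice:
    """Hypothetical device that reports ready after a fixed number of status checks."""

    def __init__(self, ready_after: int, data: str) -> None:
        self._checks_left = ready_after
        self._data = data

    def is_ready(self) -> bool:
        """Stand-in for reading the device's status register."""
        if self._checks_left > 0:
            self._checks_left -= 1
            return False
        return True

    def read(self) -> str:
        """Stand-in for reading the device's data register."""
        return self._data


def poll(device: FakeDevice, interval_s: float = 0.001, max_checks: int = 1000):
    """Check the device periodically; return (data, number_of_checks) once it needs service."""
    for checks in range(1, max_checks + 1):
        if device.is_ready():
            return device.read(), checks
        time.sleep(interval_s)  # time spent waiting between status checks
    raise TimeoutError("device never became ready")
```

Every status check costs processor time whether or not the device is ready, which is the overhead that interrupt-driven I/O avoids.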

A UNIX API for creating and manipulating threads. It is structured as a library.

Pthreads

The ratio of floating-point operations in a program to the number of data bytes accessed by a program from main memory.

Arithmetic intensity
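
As a worked example of the ratio above, the sketch below computes the arithmetic intensity of a DAXPY-style loop, `y[i] = a * x[i] + y[i]`, on double-precision data. The kernel choice and the assumption that every operand goes to main memory (no cache reuse) are illustrative, not from the card.

```python
def arithmetic_intensity(flops: float, bytes_from_memory: float) -> float:
    """Floating-point operations per byte accessed from main memory."""
    if bytes_from_memory <= 0:
        raise ValueError("bytes_from_memory must be positive")
    return flops / bytes_from_memory


def daxpy_intensity(n: int) -> float:
    """Arithmetic intensity of y[i] = a * x[i] + y[i] over n doubles, assuming no cache reuse."""
    flops = 2 * n            # one multiply and one add per element
    bytes_moved = 3 * 8 * n  # read x[i], read y[i], write y[i]; 8 bytes each
    return arithmetic_intensity(flops, bytes_moved)
```

The result is 1/12 FLOP per byte regardless of `n`, a low value that marks the kernel as limited by memory bandwidth rather than by floating-point throughput.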

It was perhaps the most infamous of supercomputers. The project started in 1965 and ran its first real application in 1976. The 64 processors used a 13-MHz clock, and their combined main memory size was 1 MB: 64 × 16 KB. The ____ was the first machine to teach us that software for parallel machines dominates hardware issues.

Illiac IV

Unit 7 Flashcards

(46 cards)