Parallel Architecture Flashcards
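What method is used to mitigate the effect of the von Neumann bottleneck on performance?
Caching: cache memory is placed close to the functional processing units so that recently used data and instructions can be accessed faster than main memory.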
Flynn’s taxonomy
Categorization of computer architectures based on the number of instruction streams and data streams they can handle simultaneously.
SISD
Executes a single instruction at a time and fetches or stores one item of data at a time.
SIMD
Parallel systems that apply the same instruction to data from multiple data streams. Abstractly, think of them as having a single control unit (CU) and multiple ALUs: the CU broadcasts a single instruction, and each ALU either applies it to its current data item or is idle. SIMD systems are usually synchronous.
What constraint can severely degrade the overall performance of a SIMD system?
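The constraint that every functional processor (ALU) must either execute the same instruction as the others or be idle. When only some data items need a given operation, the remaining ALUs do no useful work, which limits the available parallelism.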
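The cost of this constraint can be sketched with a small simulation. This is only an illustrative model, not real SIMD hardware: the names simd_step and simd_if_else are hypothetical, and a Python list stands in for the ALU lanes. A conditional is executed as two broadcast steps, and in each step the lanes that do not need the instruction sit idle.

```python
def simd_step(op, lanes, mask):
    """Broadcast one instruction: active lanes apply it, masked-off lanes idle."""
    return [op(x) if active else x for x, active in zip(lanes, mask)]


def simd_if_else(data):
    """Model `if x > 0: x *= 2 else: x -= 1` across all lanes in lockstep."""
    then_mask = [x > 0 for x in data]
    else_mask = [not m for m in then_mask]

    # Step 1: broadcast the "then" instruction; lanes failing the test are idle.
    data = simd_step(lambda x: x * 2, data, then_mask)
    # Step 2: broadcast the "else" instruction; lanes that passed are idle.
    data = simd_step(lambda x: x - 1, data, else_mask)

    # Count the lane-steps in which an ALU did no useful work.
    idle_slots = then_mask.count(False) + else_mask.count(False)
    return data, idle_slots


result, idle = simd_if_else([3, -1, 4, -5])
print(result)  # [6, -2, 8, -6]
print(idle)    # 4 idle lane-steps out of 8
```

In this toy run half of all lane-steps are wasted, because every lane is idle during exactly one of the two branches.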
Data parallelism
Parallelism obtained by dividing data among the processors and having processors all apply similar/same instructions to their subset of the data.
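A minimal sketch of this idea, assuming a block partition of the data and a thread pool standing in for the processors. The helper names (block_partition, sum_of_squares, parallel_sum_of_squares) are hypothetical, and in standard CPython builds the threads may not actually run this work in parallel; the point is the decomposition, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def block_partition(data, num_workers):
    """Split data into num_workers contiguous blocks of near-equal size."""
    base, extra = divmod(len(data), num_workers)
    blocks, start = [], 0
    for rank in range(num_workers):
        size = base + (1 if rank < extra else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks


def sum_of_squares(block):
    """The same instructions, applied by each worker to its own block."""
    return sum(x * x for x in block)


def parallel_sum_of_squares(data, num_workers=4):
    blocks = block_partition(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, blocks))
    # Combine the per-worker results into the final answer.
    return sum(partial_sums)


data = list(range(10))
print(block_partition(data, 4))       # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
print(parallel_sum_of_squares(data))  # 285
```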
Where are SIMD systems most effective?
SIMD parallelism can be very efficient on large data-parallel problems, i.e., when the problem size is large and data parallelism can be exploited to increase performance.
What are vector processors?
Processors that operate on an array of data (non-scalar data operands).
Name and explain the 5 characteristics of vector processors
1) Vector registers: registers that can store a vector of operands which can be operated on simultaneously
2) Vectorized and pipelined functional units: the same operation is applied to each element of a vector (or, for operations like addition, to each pair of corresponding elements of two vectors), so the operations are SIMD.
3) Vector instructions: Instructions that operate on vectors rather than scalars. A simple loop then needs only one load, one add, and one store per block of vector-length elements, instead of one of each per element.
4) Interleaved memory: Consists of multiple banks of memory which can be accessed relatively independently.
5) Strided memory access and hardware scatter/gather: in strided memory access, the program accesses elements of a vector located at fixed intervals. Scatter (writing) and gather (reading) access elements of a vector located at irregular intervals. Vector systems provide special hardware to accelerate these operations.
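The memory-related characteristics (4 and 5) can be sketched in software. This is only a model of the access patterns, not of the hardware that accelerates them; the function names and the bank count NUM_BANKS are hypothetical choices for illustration.

```python
NUM_BANKS = 4  # hypothetical bank count, chosen only for this example


def bank_of(address, num_banks=NUM_BANKS):
    """Interleaved memory: consecutive addresses map to consecutive banks."""
    return address % num_banks


def strided_load(memory, start, stride, count):
    """Strided access: read `count` elements located at a fixed interval."""
    return [memory[start + i * stride] for i in range(count)]


def gather(memory, indices):
    """Gather: read elements located at irregular intervals."""
    return [memory[i] for i in indices]


def scatter(memory, indices, values):
    """Scatter: write elements to locations at irregular intervals."""
    for i, v in zip(indices, values):
        memory[i] = v


memory = list(range(100, 116))            # 16 words holding 100..115
banks = [bank_of(a) for a in range(8)]    # [0, 1, 2, 3, 0, 1, 2, 3]
strided = strided_load(memory, 0, 4, 4)   # [100, 104, 108, 112]
gathered = gather(memory, [1, 7, 2, 13])  # [101, 107, 102, 113]
scatter(memory, [1, 7, 2, 13], [0, 0, 0, 0])
print(banks, strided, gathered, memory[:4])
```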
What are the pros and cons of vector processors?
Pros:
For many applications, vector processors are very fast and very easy to use. Vectorizing compilers are quite good at identifying code that can be vectorized. They also identify loops that cannot be vectorized, and they often report why a loop couldn’t be vectorized.
Cons:
On the other hand, they don’t handle irregular data structures as well as other parallel architectures, and there seems to be a definite limit to their scalability, that is, their ability to handle ever larger problems.
Explain what a MIMD system is
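MIMD stands for multiple instruction, multiple data: such systems support multiple simultaneous instruction streams operating on multiple data streams. They typically consist of a collection of fully independent processing units or cores, each with its own control unit and ALU.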
MIMD systems are usually asynchronous.
In many MIMD systems there is no global clock, and there may be no relation between the system times on two different processors. In fact, unless the programmer imposes some synchronization, even if the processors are executing exactly the same sequence of instructions, at any given instant they may be executing different statements.
Name and explain the 2 types of MIMD systems
Shared-memory and distributed-memory systems.
In a shared-memory system a collection of autonomous processors is connected to a memory system via an interconnection network, and each processor can access each memory location. In a shared-memory system, the processors usually communicate implicitly by accessing shared data structures. In a distributed-memory system, each processor is paired with its own private memory, and the processor-memory pairs communicate over an interconnection network. So in distributed-memory systems the processors usually communicate explicitly by sending messages or by using special functions that provide access to the memory of another processor.
see the diagrams.
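The two communication styles can be contrasted in a small sketch. Both versions below run as threads inside one process, so the second is only an emulation of distributed memory: each worker is handed a private copy of its data and reports back solely through a message queue. The function names are hypothetical.

```python
import queue
import threading


def shared_memory_sum(values, num_threads=4):
    """Implicit communication: all threads update one shared variable."""
    total = {"sum": 0}
    lock = threading.Lock()  # programmer-imposed synchronization

    def worker(rank):
        local = sum(values[rank::num_threads])  # reads the shared list directly
        with lock:
            total["sum"] += local

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["sum"]


def message_passing_sum(values, num_workers=4):
    """Explicit communication: workers own private data and send messages."""
    inbox = queue.Queue()  # stands in for the interconnection network

    def worker(rank, private_data):
        inbox.put((rank, sum(private_data)))  # send the partial result

    threads = [
        threading.Thread(target=worker, args=(r, list(values[r::num_workers])))
        for r in range(num_workers)
    ]
    for t in threads:
        t.start()
    total = 0
    for _ in range(num_workers):
        _rank, partial = inbox.get()  # receive one message per worker
        total += partial
    for t in threads:
        t.join()
    return total


values = list(range(1, 101))
print(shared_memory_sum(values))    # 5050
print(message_passing_sum(values))  # 5050
```

The shared-memory version needs a lock because every thread touches the same variable; the message-passing version needs no lock in the workers, at the price of explicit send and receive steps.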
What is a multicore processor?
A multicore processor has multiple CPUs or cores on a single chip. Typically, the cores have private level 1 caches, while other caches may or may not be shared between the cores.
Explain and compare the shared-memory systems, UMA and NUMA
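Uniform memory access (UMA): All the processors are connected directly to main memory through the interconnect, so the time to access any memory location is the same for every core.
Nonuniform memory access (NUMA): Each processor has a direct connection to its own block of main memory and can access the other processors’ blocks through special hardware built into the processors. Access to the directly connected block is faster than access to a remote block.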
Pro UMA: UMA systems are usually easier to program, since the programmer doesn’t need to worry about different access times for different memory locations.
Pro NUMA: Faster access to the directly connected memory in NUMA systems. Furthermore, NUMA systems have the potential to use larger amounts of memory than UMA systems.
Name and explain the types of distributed-memory systems
Distributed-memory system types:
Clusters: collections of commodity systems (for example, PCs) connected by a commodity interconnection network (for example, Ethernet). The individual systems are the computational nodes.
If the nodes are shared-memory systems -> entire system is a hybrid system.
Grids: A grid provides the infrastructure necessary to turn large networks of geographically distributed computers into a unified distributed-memory system. In general, such a system will be heterogeneous, that is, the individual nodes may be built from different types of hardware.
i.e., heterogeneous -> nodes are built from different types of hardware.