intro to parallel computing Flashcards

1
Q

Explain sequential computing

A

Traditionally, software has been written for serial computation:
1. A problem is broken into a discrete series of instructions.
2. Instructions are executed one after another.
3. Only one instruction may execute at any moment in time.

2
Q

Explain parallel computing

A

The simultaneous use of multiple compute resources to solve a computational problem:
1. Run using multiple CPUs
2. The problem is broken into discrete parts that can be solved concurrently
3. Each part is further broken down into a series of instructions
4. Instructions from each part execute simultaneously on different CPUs

3
Q

What are the different parallel computing memory architectures?

A
  1. Shared memory
  2. Distributed memory
  3. Hybrid
4
Q

What are the parallel programming languages?

A

* Shared memory API: OpenMP (Open Multi-Processing)
– C, C++, Fortran
* Distributed memory API: MPI (Message Passing Interface)
– C, C++, Fortran, Java, Python
* Cilk – a customized C language
* CUDA (Compute Unified Device Architecture) – for Nvidia GPUs
* Pthreads (POSIX threads)
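For concreteness, a minimal OpenMP sketch in C (a hypothetical example, assuming a compiler with OpenMP support, e.g. gcc -fopenmp):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        /* The block below is executed by every thread in the team. */
        #pragma omp parallel
        {
            printf("hello from thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }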

5
Q

Explain SISD

A
  1. Single instruction: only one instruction stream is being acted on by the CPU during any one clock cycle.
  2. Single data: only one data stream is being used as input during any one clock cycle.
  3. Deterministic execution
6
Q

Explain SIMD

A
  1. A type of parallel computer.
  2. Best suited to specialized problems characterized by a high degree of regularity, such as image processing.
  3. Two varieties: Processor Arrays and Vector Pipelines

Array processor
1. A single computer with multiple parallel processors.
2. The processing units are designed to work together under the supervision of a single control unit.
3. This results in a single instruction stream and multiple data streams.
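A minimal sketch of the kind of regular loop that maps well onto SIMD hardware (a hypothetical example in C; many compilers will auto-vectorize it at -O3):

    /* The same instruction (an add) is applied to many data elements,
       so the loop can be compiled into SIMD/vector instructions. */
    void vector_add(const float *a, const float *b, float *c, int n) {
        for (int i = 0; i < n; i++) {
            c[i] = a[i] + b[i];
        }
    }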

7
Q

Explain MIMD

A
  1. Multiple Instruction: every processor may be executing a different instruction stream; Multiple Data: every processor may be working with a different data stream.
  2. The most common type of parallel computer.
  3. Execution can be synchronous or asynchronous, deterministic or non-deterministic.
  4. Representatives: most current supercomputers, networked parallel computer “grids” and multi-processor SMP computers, including some types of PCs.
8
Q

Explain MISD

A
  1. Few actual examples of this class of parallel computer have ever existed.
  2. A single data stream is fed into multiple processing units.
  3. Representatives: Systolic Arrays
9
Q

What are the different parallel computing models?

A
  • Shared Memory (without threads)
  • Threads Model
  • Distributed Memory / Message Passing Model
  • Data Parallel Model
  • Hybrid Model
  • Single Program Multiple Data (SPMD)
  • Multiple Program Multiple Data (MPMD)
10
Q

Explain the distributed memory / message passing model

A
  • Distributed Memory / Message Passing Model
    1. A set of tasks that use their own local memory during computation.
    2. Multiple tasks can reside on the same physical machine and/or across an arbitrary number of machines.
    3. Tasks exchange data through communications, by sending and receiving messages.
    4. Data transfer usually requires cooperative operations to be performed by each process (see the sketch below).
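A minimal message passing sketch in C with MPI (a hypothetical example, assuming an MPI implementation such as Open MPI; compile with mpicc and run with mpirun -np 2):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* Cooperative transfer: the send on task 0 ... */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* ... must be matched by a receive on task 1. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("task 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }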
11
Q

Explain Data Parallel Model

A
  • Data Parallel Model
    1. Also known as the Partitioned Global Address Space (PGAS) model.
    2. The address space is treated globally.
    3. Most of the parallel work focuses on performing operations on a data set, typically organized into a common structure such as an array or cube.
    4. A set of tasks works collectively on the same data structure; however, each task works on a different partition of that data structure.
    5. Tasks perform the same operation on their partition of the work.
    6. On shared memory architectures, all tasks may have access to the data structure through global memory.
    7. On distributed memory architectures, the data structure is split up and resides as “chunks” in the local memory of each task.
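A minimal data parallel sketch in C with OpenMP (a hypothetical example): all threads perform the same operation, each on its own partition of one shared array.

    /* The loop iterations (and therefore the array) are partitioned among
       the threads; each thread scales a different chunk of the same array. */
    void scale(double *a, int n, double factor) {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            a[i] *= factor;
        }
    }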
12
Q

Explain hybrid model

A
  1. A common example of a hybrid model is the combination of the message passing model (MPI) with the threads model (OpenMP): communication between processes on different nodes occurs over the network, while computationally intensive kernels are performed with threads using local, on-node data (see the sketch below).
  2. Another example is using MPI with CPU-GPU (Graphics Processing Unit) programming.
    – MPI tasks run on CPUs using local memory and communicate with each other over a network.
    – Computationally intensive kernels are off-loaded to GPUs on-node.
    – Data exchange between node-local memory and GPUs uses CUDA (or something equivalent).
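A minimal MPI + OpenMP hybrid sketch in C (a hypothetical example, compiled with something like mpicc -fopenmp): MPI handles communication between nodes, while OpenMP threads do the on-node work.

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* typically one MPI task per node */

        /* Computationally intensive, on-node work is spread across threads. */
        #pragma omp parallel
        {
            printf("MPI rank %d, OpenMP thread %d\n", rank, omp_get_thread_num());
        }

        MPI_Finalize();
        return 0;
    }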
13
Q

Explain SPMD

A
  1. Single Program: all tasks execute their copy of the same program simultaneously. This program can be threads, message passing, data parallel or hybrid.
  2. Multiple Data: all tasks may use different data.
  3. SPMD programs usually have the necessary logic programmed into them to allow different tasks to branch or conditionally execute only those parts of the program they are designed to execute (only a portion of it), as in the sketch below.
  4. Using message passing or hybrid programming, SPMD is the most commonly used parallel programming model for multi-node clusters.
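A minimal SPMD sketch in C with MPI (a hypothetical example): every task runs the same program but branches on its rank, so each executes only the portion intended for it.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* Only task 0 executes this branch of the single program. */
            printf("task 0: coordinating %d workers\n", size - 1);
        } else {
            /* All other tasks execute the worker branch, on their own data. */
            printf("task %d: doing worker computation\n", rank);
        }

        MPI_Finalize();
        return 0;
    }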
14
Q

Explain MPMD

A

1. Multiple Program: tasks may execute different programs simultaneously. The programs can be threads, message passing, data parallel or hybrid.
2. Multiple Data: all tasks may use different data.

15
Q

Design issues of Parallel computing

A

1. Partitioning: splitting the problem into smaller problems
2. Mapping: distributing the parts to multiple processors
3. Communication: if required (depends on the topology)
4. Consolidating: assembling the final result (all four steps appear in the sketch below)
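A minimal parallel sum sketch in C with MPI (a hypothetical example) showing the four steps: the index range is partitioned, partitions are mapped to processes, partial results are communicated, and the final result is consolidated on rank 0.

    #include <stdio.h>
    #include <mpi.h>

    #define N 1000000L

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Partitioning + mapping: each process takes one contiguous index range. */
        long begin = rank * N / size;
        long end   = (rank + 1) * N / size;

        double local_sum = 0.0;
        for (long i = begin; i < end; i++) {
            local_sum += (double)i;
        }

        /* Communication + consolidation: combine the partial sums on rank 0. */
        double total = 0.0;
        MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) {
            printf("sum = %.0f\n", total);
        }

        MPI_Finalize();
        return 0;
    }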

16
Q

Designing parallel programs

A
  • Designing parallel programs has characteristically been a very manual process. The programmer is typically responsible for both identifying and actually implementing parallelism.
    * Manual parallelization is a time-consuming, complex, error-prone and iterative process.
  • Various tools have been available to assist the programmer with converting serial programs into parallel programs. The most common type is a parallelizing compiler or pre-processor.
  • It works in two ways:
    1. Fully Automatic
    – The compiler analyzes the source code and identifies opportunities for parallelism.
    – The analysis includes identifying inhibitors to parallelism and possibly a cost weighting on whether or not the parallelism would actually improve performance.
    – Loops (do, for) are the most frequent target.
    2. Programmer Directed
    – Using “compiler directives” or possibly compiler flags, the programmer explicitly tells the compiler how to parallelize the code (see the sketch below).
    – May be used in conjunction with some degree of automatic parallelization.
  • The most common compiler-generated parallelization is done using on-node shared memory and threads (such as OpenMP).
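A minimal programmer-directed sketch in C (a hypothetical example): the OpenMP compiler directive states explicitly that the loop iterations are independent, instead of relying on fully automatic analysis.

    /* Without the directive, an auto-parallelizing compiler would have to prove
       independence of the iterations itself; the directive asserts it. */
    void saxpy(float a, const float *x, float *y, int n) {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            y[i] = a * x[i] + y[i];
        }
    }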
17
Q

Explain thread-level parallelism

A
  1. Multi-threading
  2. Hyper-threading
  3. Simultaneous multi-threading
    – Instructions from different threads can run simultaneously on different functional units.
  4. Multi-core
    – Threads can run on separate cores (see the Pthreads sketch below).
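A minimal multi-threading sketch in C with Pthreads (a hypothetical example, assuming a POSIX system, linked with -pthread): the two threads can be scheduled onto separate cores.

    #include <stdio.h>
    #include <pthread.h>

    static void *work(void *arg) {
        /* Each thread runs this function; on a multi-core CPU the two threads
           can execute at the same time on different cores. */
        printf("thread %ld running\n", (long)arg);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, work, (void *)1L);
        pthread_create(&t2, NULL, work, (void *)2L);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }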
18
Q

Explain instruction-level parallelism

A
  1. Pipelining
    – Several instructions are simultaneously at different stages of their execution.
  2. Super-pipelining
    – Several instructions are simultaneously at the same and different stages of their execution.
  3. Superscalar
    – Several instructions are simultaneously at the same stage of their execution.
    – The CPU can execute more than one instruction per clock cycle.
  4. Vector and array processing
  5. VLIW
    – The compiler is used to identify independent instructions and bundle them together.
    – Drawback: if the compiler can't find independent instructions, it needs to insert NOPs, and the code needs to be recompiled.
  6. EPIC
19
Q

What problems cannot be solved by parallel computing?

A

Problems where there is a dependency between the instructions, so that each instruction needs the result of the previous one and the work cannot be split into parts that run concurrently.

20
Q

Formulae for parallel computing

A
  1. T(N) = (T - F) + F/N
    T = total time of serial execution
    T - F = total time of the non-parallelizable part
    F = total time of the parallelizable part (when executed serially, not in parallel)
    N = number of processors
  2. Amdahl's Law
    – Speedup = 1 / ((1 - F) + F/N), where F here is the fraction of execution time that is parallelizable
    – The maximum speedup possible in parallelizing an algorithm is limited by the sequential portion of the code.
    – Limitations: it doesn't consider the size of the problem and it ignores communication cost.
  3. Gustafson's Law
    – Scaled Speedup = N - (N - 1)*S
    S = fraction (percentage) of the execution that is serial
    – Law: the proportion of the computation that is sequential normally decreases as the problem size increases.
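A small worked sketch in C (hypothetical numbers: parallelizable fraction F = 0.9, N = 8 processors, serial fraction S = 0.1):

    #include <stdio.h>

    int main(void) {
        double F = 0.9;   /* parallelizable fraction (assumed) */
        double S = 0.1;   /* serial fraction (assumed) */
        int    N = 8;     /* number of processors (assumed) */

        /* Amdahl:    speedup = 1 / ((1 - F) + F/N)   -> about 4.7x here */
        double amdahl = 1.0 / ((1.0 - F) + F / N);

        /* Gustafson: scaled speedup = N - (N - 1)*S  -> 7.3x here */
        double gustafson = N - (N - 1) * S;

        printf("Amdahl speedup:    %.2f\n", amdahl);
        printf("Gustafson speedup: %.2f\n", gustafson);
        return 0;
    }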
21
Q

Limitations of single core

A
  1. With respect to power
    – There is a limit on the scaling of clock speeds.
    – The ability to handle on-chip heat has reached a physical limit.
  2. With respect to memory
    – Need for bigger cache sizes.
    – Memory access latency is still not in line with processor speeds.
  3. With respect to ILP
    – Identifying implicit parallelism within the threads is limited in many applications.
    – Hardware restrictions, such as instruction window size.
    – Dependencies between instructions.
22
Q

Explain homogeneous multi-core architecture

A
  1. Has multiple cores on a single chip, and all those cores are identical.
  2. The CPUs share a memory space.
  3. Some kind of communication facility between the CPUs is provided; this is normally through shared memory, but accessed through the API of the OS.
23
Q

Explain heterogeneous multi-core architecture

A
  1. Multiple cores on a single chip, but those cores might be of different designs.
  2. Each core will have different capabilities.
  3. Heterogeneous systems improve performance and efficiency by exposing to programmers architectural features such as low-latency software-controlled memories and the inter-processor interconnect.
24
Q

Explain multi-core architecture

A
  1. Multi-core is a design in which a single physical processor contains the core logic of more than one processor.
  2. Each core has its own execution pipeline, and each core has the resources required to run without blocking the resources needed by the other software threads.
  3. Advantage
    – Better performance: each core is able to run at a lower frequency, dividing among the cores the power normally given to a single core.
25
Q

What is false sharing?

A

If two or more processors write data to different portions of the same cache line, a lot of cache and bus traffic can result from invalidating or updating every cached copy of that line on the other processors, even though the processors are not actually sharing any data (see the sketch below).
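A minimal sketch in C with OpenMP (a hypothetical example) of a pattern that typically causes false sharing: two threads repeatedly update adjacent array elements that sit in the same cache line; padding each element out to its own cache line is a common fix.

    #include <omp.h>

    double counters[2];   /* adjacent doubles: very likely in the same cache line */

    void update(long iters) {
        #pragma omp parallel num_threads(2)
        {
            int id = omp_get_thread_num();
            for (long i = 0; i < iters; i++) {
                counters[id] += 1.0;   /* different data, same cache line: the line
                                          ping-pongs between the two cores' caches */
            }
        }
    }

    /* Common fix: pad each counter so it occupies its own cache line
       (a 64-byte line size is assumed here). */
    struct padded_counter { double value; char pad[56]; };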

26
Q

Limitations of multi core processors

A
  1. More expensive than a single-core processor.
  2. The performance of a multi-core processor depends on how the user uses the computer.
  3. Consumes more power.
  4. The processor becomes hot when doing more work.
27
Q

CPU vs GPU

A

CPU
1. Optimized for low-latency access to cached data sets
2. Control logic for out-of-order and speculative execution

GPU
1. Optimized for data-parallel, throughput computation
2. Architecture is tolerant of memory latency
3. More of the chip is dedicated to computation

28
Q

Explain the two main components of GPU Architecture

A
  1. Global memory
    * Analogous to RAM
    * Accessible by both CPU and GPU
    * Currently up to 6 GB
  2. Streaming Multiprocessors (SMs)
    * Perform the actual computation
    * Each SM has its own: control units, registers, execution pipeline, caches
29
Q

What are the three ways to accelerate applications?

A
  1. Libraries
    * NVIDIA cuBLAS
    * NVIDIA NPP
    * NVIDIA cuFFT
  2. OpenACC Directives
    * Compiler directives (hints) inserted into existing code so the compiler can offload loops for acceleration
  3. Programming languages (e.g. CUDA)