Introduction, Motivation, Problems, Models Flashcards

1
Q

Processor performance

A
Number of (some kind of) instructions that can be carried out per unit
of time

Examples:
• FLOPS: FLoating-Point OPerations per Second (IEEE 64-bit)
• IPS: Instructions per Second

Peak performance = Number of cores × Instructions per clock × Clock frequency
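Worked example (hypothetical numbers, not from the card): a processor with 8 cores, each executing 16 double-precision floating-point operations per clock at 3 GHz, has a peak of 8 × 16 × 3·10⁹ ≈ 384 GFLOPS.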

2
Q

FLOPS

A

FLOPS: FLoating-Point OPerations per Second (IEEE 64-bit)

Historically, floating-point operations were expensive, and they account for most of the work in numerical applications.

3
Q

IPS

A

IPS: Instructions per Second

4
Q

Moore’s law

A

Moore’s “law” (popular version):
Sequential processor performance doubles every 18 months.

Moore’s “Law” is (only) an empirical observation/extrapolation.

5
Q

ILP

A

Instruction-level parallelism (ILP) is a measure of how many of the instructions in a computer program can be executed simultaneously.

ILP must not be confused with concurrency: ILP concerns the parallel execution of a sequence of instructions belonging to a single thread of execution.

[wikipedia]

6
Q

SMT/Hyperthreading

A

Hyper-threading (officially called Hyper-Threading Technology or HT Technology and abbreviated as HTT or HT) is Intel’s proprietary simultaneous multithreading (SMT) implementation used to improve parallelization of computations (doing multiple tasks at once) performed on x86 microprocessors.

[wikipedia]

7
Q

FPGA, ASIC

A

A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturing – hence the term “field-programmable”. The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC).

An application-specific integrated circuit (ASIC /ˈeɪsɪk/) is an integrated circuit (IC) chip customized for a particular use, rather than intended for general-purpose use.

[wikipedia]

8
Q

Core

A

Core: The smallest unit capable of executing instructions according
to a program (used to be called “processor”, “processing unit”)

9
Q

Processor

A

Processor: Collection of one or more cores on a single chip (as in
“multi-core processor”)

10
Q

Multi-core vs Many-core

A

Multi-core: handful of (standard CPU) cores (2-32)

Many-core: a lot of cores, often in special-purpose processors (GPU)

11
Q

Bridging models

A

Model costs translate reasonably into, and predict, performance on a wide range of actual hardware.

Minimum requirement for a good bridging model:
If Algorithm A performs better than Algorithm B in the model, then the implementation of Algorithm A performs better than the implementation of Algorithm B on applicable hardware.

Good model
• abstracts unnecessary detail,
• accounts for the essentials,
• leads to interesting results, algorithms and lower bounds,
• and “bridging” works
12
Q

MPI

A

Message Passing Interface (MPI)
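A minimal sketch (not from the card) of an MPI program in C, assuming an MPI implementation such as MPICH or Open MPI is installed; each process prints its rank:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);               /* start the MPI runtime         */

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id             */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes     */

        printf("Hello from process %d of %d\n", rank, size);

        MPI_Finalize();                       /* shut the MPI runtime down     */
        return 0;
    }

Typically compiled with mpicc and launched with, e.g., mpirun -np 4 ./a.out.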

13
Q

OpenMP

A

OpenMP (Open Multi-Processing)

C/C++ compiler directives like
    #pragma omp parallel for
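A minimal sketch (not from the card), assuming a C compiler with OpenMP support (e.g., gcc -fopenmp): a parallel loop summing an array, with the reduction clause handling the shared accumulator:

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    static double a[N];

    int main(void) {
        double sum = 0.0;

        for (int i = 0; i < N; i++) a[i] = 1.0;

        /* Iterations are divided among the threads; each thread accumulates
           a private partial sum, combined at the end by the reduction clause. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %.0f (max threads: %d)\n", sum, omp_get_max_threads());
        return 0;
    }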
14
Q

Cilk

A

Cilk, Cilk++ and Cilk Plus are general-purpose programming languages designed for multithreaded parallel computing. They are based on the C and C++ programming languages, which they extend with constructs to express parallel loops and the fork–join idiom.

Example keywords: spawn, sync, thread

[wikipedia]
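A minimal sketch in Cilk Plus syntax (assuming a compiler with Cilk Plus support; the original MIT Cilk used the bare keywords spawn and sync), showing the fork-join idiom on the classic recursive Fibonacci:

    #include <cilk/cilk.h>

    long fib(long n) {
        if (n < 2) return n;
        long x = cilk_spawn fib(n - 1);  /* child may run in parallel   */
        long y = fib(n - 2);             /* parent continues meanwhile  */
        cilk_sync;                       /* wait for the spawned child  */
        return x + y;
    }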

15
Q

MapReduce

A

[wikipedia]

A MapReduce framework (or system) is usually composed of three operations (or steps):

Map: each worker node applies the map function to the local data, and writes the output to a temporary storage. A master node ensures that only one copy of the redundant input data is processed.
Shuffle: worker nodes redistribute data based on the output keys (produced by the map function), such that all data belonging to one key is located on the same worker node.
Reduce: worker nodes now process each group of output data, per key, in parallel.

The canonical MapReduce example counts the appearance of each word in a set of documents:

function map(String documentName, String document):
  for each word w in document:
    emit (w, 1)

function reduce(String word, Iterator partialCounts):
  sum = 0
  for each pc in partialCounts:
    sum += pc
  emit (word, sum)
16
Q

Parallel computing

A

Parallel computing

The discipline of efficiently utilizing dedicated parallel resources (processors, memories, …) to solve individual, given computational problems.

Typical concerns:
Solving (difficult) problems faster!
Specifically:
Parallel resources with significant inter-communication capabilities,
for problems with non-trivial communication and computational
demands

17
Q

GPGPU

A

General-purpose computing on graphics processing units

18
Q

HPC

A

High-Performance Computing

19
Q

Distributed computing

A

Distributed computing

The discipline of making independent, non-dedicated resources available and having them cooperate toward solving specified problem complexes.

Typical concerns:
Correctness, availability, progress, security, integrity, privacy, robustness, fault tolerance, …

Specifically/typically:
No centralized control

20
Q

Concurrent computing

A

Concurrent computing:

The discipline of managing and reasoning about interacting processes that may (or may not) progress simultaneously

Typical concerns:
Correctness (often formal), e.g. deadlock-freedom, starvation-
freedom, mutual exclusion, fairness, …

21
Q

IPC

A

Interprocess communication (IPC) refers to the mechanisms an operating system provides to allow processes to manage shared data.

[wikipedia]

22
Q

Process calculi

A

Process calculi such as

Ambient calculus
Calculus of communicating systems (CCS)
Communicating sequential processes (CSP)
Join-calculus
π-calculus
23
Q

Parallel vs. Concurrent computing

A

Concurrent computing: Focus on coordination of access to/usage of shared resources (to solve a given computational problem)

Parallel computing: Focus on dividing a given problem (specification, algorithm, data) into subproblems that can be solved by dedicated processors (in coordination)

24
Q

Aspects of parallelization

A

• Algorithmic: Dividing the computation into independent parts that can be executed in parallel. What kinds of shared resources are necessary? Which kinds of coordination? How can overheads be minimized (redundancy, coordination, synchronization)?

• Scheduling/Mapping: Assigning parts of the computation to processors in good order
  • Load balancing: (Re)assigning independent parts of the computation to processors such that all resources are utilized to the same extent (evenly and efficiently)
  • Communication: When must processors communicate? How?
  • Synchronization: When must processors agree/wait?

• Linguistic: How are the algorithmic aspects expressed? Concepts (programming model) and concrete expression (programming language, interface, library)

• Pragmatic/practical: How does the actual, parallel machine look? What is a reasonable, abstract (“bridging”) model?

• Architectural: Which kinds of parallel machines can be realized? How do they look?

25
Q

Levels of parallelism

A

Implicit (processor) parallelism [Hardware & Compiler, “free lunch”]:
  • Gates and functional units
  • Instruction-level parallelism (ILP)
  • SIMD/Vector parallelism

Explicit parallelism:
  • Cores/threads/processes
  • Communication
  • Synchronization
  • Multiple applications
  • Coupled applications
26
Q

RAM

A

Standard sequential model: The RAM (Random-Access Machine)

Processor (ALU, PC, registers) capable of
executing instructions stored in memory on
data in memory

Execution cost model, first assumption: All
operations (ALU instructions, memory access)
take same (unit) time.

Useful indeed, basis of much of sequential algorithmics

Not realistic: memory access is (much) slower than instruction execution

27
Q

PRAM

A

Parallel RAM (Parallel Random-Access Machine)

Processors work in lock-step (all same program, or individual programs), all perform an instruction in each step (clock tick)

Unit-time instruction and memory access (uniform)

Assume as many processors per step as convenient

28
Q

PRAM read write conflict resolution

A

PRAM conflict resolution: What happens if several processors in the same time step access the same memory location?

  • EREW PRAM: Not allowed, neither read nor write
  • CREW PRAM: Concurrent reads allowed, concurrent writes not
  • CRCW PRAM: Both concurrent read and write

E… Exclusive
C… Concurrent

Concurrent reads cause no discrepancies; the semantics of a concurrent write must be further specified:

  • COMMON: Conflicting processors must write the same value
  • ARBITRARY: One of the conflicting writes succeeds (arbitrarily)
  • PRIORITY: A priority scheme determines which write succeeds
29
Q

UMA/NUMA

A

UMA (Uniform Memory Access):

Access time to memory location independent of location and accessing processor, e.g., O(1), O(log M)

Examples: RAM, PRAM

NUMA (Non-Uniform Memory Access):
Access time depends on processor and location.

Examples: real processors

30
Q

Cellular automaton

A

Yet another parallel architecture model (totally non-RAM):

Cellular automaton, systolic array, … : Simple processors without memory (finite state automata, FSA), operate in lock step on (unbounded) grid, local communication only

31
Q

Flynn’s taxonomy

A

• SISD: Single processor, single stream of instructions, operates on a single stream of data. Example: sequential architectures (e.g., RAM)

• SIMD: Single processor, single stream of instructions, operates on multiple data per instruction. Example: traditional vector processors, SIMD extensions, GPUs(?) (PRAM, some variants)

• MISD: Multiple instructions operate on a single data stream. Example: pipelined architectures, streaming architectures(?), systolic arrays (a 1970s architectural idea). Some say: empty

• MIMD: Multiple instruction streams, multiple data streams (PRAM, distributed-memory architectures)

SPMD: Same Program, Multiple Data

32
Q

branch/thread divergence

A

if statements:

Processors do something different depending on the condition: MIMD, but under control of the same program.

SIMD restriction: both branches are executed, as no-ops for some processors, depending on the condition. On GPUs this is called branch/thread divergence and can cause severe performance loss.
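A minimal C sketch (illustration only, not from the card): a data-dependent branch inside a loop. Executed SIMD-style, all lanes of a vector/warp step through both branches; lanes whose condition is false are masked off (no-ops), so divergent branches cost throughput.

    /* Each iteration takes one of two branches depending on the data.
       On a SIMD/GPU machine, lanes with x[i] >= 0 are masked off while the
       else-branch runs, and vice versa, so both branches consume time. */
    void abs_all(const float *x, float *y, int n) {
        for (int i = 0; i < n; i++) {
            if (x[i] >= 0.0f)
                y[i] = x[i];       /* "non-negative" lanes active here */
            else
                y[i] = -x[i];      /* "negative" lanes active here     */
        }
    }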