Introduction, Motivation, Problems, Models Flashcards
Processor performance
Number of (some kind of) instructions that can be carried out per unit of time
Examples:
• FLOPS: FLoating-Point OPerations per Second (IEEE 64-bit)
• IPS: Instructions per Second
Peak performance = Number of cores × Number of Instructions per Clock × Clock Frequency
FLOPS
FLOPS: FLoating-Point OPerations per Second (IEEE 64-bit)
Historically, floating-point operations were expensive, and they account for most
of the work in numerical applications
IPS
IPS: Instructions per Second
Moore’s law
Moore’s “law” (popular version):
Sequential processor performance doubles every 18 months.
Moore’s “Law” is (only) an empirical observation/extrapolation; Moore’s original 1965 observation concerned the doubling of transistor counts per chip, not performance.
ILP
Instruction-level parallelism (ILP) is a measure of how many of the instructions in a computer program can be executed simultaneously.
ILP must not be confused with concurrency: ILP is about parallel execution of instructions within a single thread of execution, whereas concurrency is about multiple threads of execution making progress at once
[wikipedia]
SMT/Hyperthreading
Hyper-threading (officially called Hyper-Threading Technology or HT Technology and abbreviated as HTT or HT) is Intel’s proprietary simultaneous multithreading (SMT) implementation used to improve parallelization of computations (doing multiple tasks at once) performed on x86 microprocessors.
[wikipedia]
FPGA, ASIC
A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturing – hence the term “field-programmable”. The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC).
An application-specific integrated circuit (ASIC /ˈeɪsɪk/) is an integrated circuit (IC) chip customized for a particular use, rather than intended for general-purpose use.
[wikipedia]
Core
Core: The smallest unit capable of executing instructions according
to a program (used to be called “processor”, “processing unit”)
Processor
Processor: Collection of one or more cores on a single chip (as in
“multi-core processor”)
Multi-core vs Many-core
Multi-core: handful of (standard CPU) cores (2-32)
Many-core: a lot of cores, often in special-purpose processors (GPU)
Bridging models
Model costs translate reasonably into (i.e., predict) performance on a wide
range of actual hardware.
Minimum requirement for a good bridging model:
If Algorithm A performs better than Algorithm B in the model, then the implementation of Algorithm A performs better than the implementation of Algorithm B on the applicable hardware
Good model
• abstracts unnecessary detail,
• accounts for the essentials,
• leads to interesting results, algorithms and lower bounds,
• and “bridging” works
MPI
Message Passing Interface (MPI): a standardized library interface for message-passing parallel programming, mainly on distributed-memory systems
OpenMP
OpenMP (Open Multi-Processing): an API for shared-memory parallel programming in C, C++ and Fortran
C/C++ compiler directives like #pragma omp parallel for
Cilk
Cilk, Cilk++ and Cilk Plus are general-purpose programming languages designed for multithreaded parallel computing. They are based on the C and C++ programming languages, which they extend with constructs to express parallel loops and the fork–join idiom.
Example keywords: spawn, sync, thread
[wikipedia]
MapReduce
[wikipedia]
A MapReduce framework (or system) is usually composed of three operations (or steps):
Map: each worker node applies the map function to the local data, and writes the output to a temporary storage. A master node ensures that only one copy of the redundant input data is processed.
Shuffle: worker nodes redistribute data based on the output keys (produced by the map function), such that all data belonging to one key is located on the same worker node.
Reduce: worker nodes now process each group of output data, per key, in parallel.
The canonical MapReduce example counts the appearance of each word in a set of documents:
function map(documentName, String document):
    for each word w in document:
        emit (w, 1)

function reduce(word, partialCounts):
    sum = 0
    for each pc in partialCounts:
        sum += pc
    emit (word, sum)