Lecture 12 Flashcards
Intel’s single core assessment
power and area increase more rapidly than performance
Why multi-core processors?
POWER (formulas on slide 60)
What is the evolution of processor performance?
Possibly at the end of the line: currently, processor performance increases by only 3% per year.
ARM LITTLE
most energy efficient application processor from ARM
- simple architecture
- in-order
- 8 stage pipeline
ARM big
highest performance in mobile power envelope
- complex architecture
- out-of-order
- multi-issue pipeline
When does one use ARM’s LITTLE?
“Always on, always connected” tasks like OS, UI activity
Maximum efficiency
When does one use ARM’s big?
for best performance…
“demanding tasks” like browsers, gaming, content creation
What is SGEMM/W?
single-precision general matrix multiplication per Watt
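A rough sketch of how such a figure could be computed, assuming the metric is reported as single-precision GEMM throughput (FLOP/s) divided by average power. The function name and all numbers are hypothetical, not measurements:

```python
def sgemm_flops_per_watt(m: int, n: int, k: int, seconds: float, watts: float) -> float:
    """FLOP/s per Watt for one (m x k) @ (k x n) single-precision matrix product."""
    # Each of the m*n output elements needs k multiplications and k additions.
    flops = 2 * m * n * k
    return flops / seconds / watts

# Hypothetical run: 1024 x 1024 x 1024 GEMM finished in 10 ms at 5 W average power.
efficiency = sgemm_flops_per_watt(1024, 1024, 1024, seconds=0.010, watts=5.0)
print(f"{efficiency / 1e9:.1f} GFLOP/s per W")
```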
What is the lesson learned from nature?
The high power efficiency of the human brain comes from specialization. More advanced societies have a higher degree of specialization.
=> Dedicated architectures have a higher energy efficiency than general-purpose chips.
What is the main task of the zFAS (zentrales Fahrerassistenzsteuergerät, central driver assistance control unit)?
Sensor fusion (signals from stereo cameras, radar, multi-axis acceleration sensors)
What does a Field Programmable Gate Array (FPGA) consist of?
A set of programmable macro cells
A programmable interconnection network
Programmable input/outputs
What are LUTs and what are they used for?
Look-up tables; they are used as function generators in memory-based FPGAs
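A minimal sketch of the "function generator" idea, assuming a k-input LUT is modelled as 2^k stored bits addressed by the inputs. The class `LUT` and the example functions are illustrative only, not a model of any specific FPGA:

```python
from itertools import product

class LUT:
    """k-input look-up table: 2**k stored bits, addressed by the input bits."""

    def __init__(self, k, func):
        self.k = k
        # "Programming" the LUT means storing the truth table of the function.
        self.table = [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]

    def __call__(self, *bits):
        # The input bits form the memory address (first input = most significant bit).
        address = 0
        for b in bits:
            address = (address << 1) | b
        return self.table[address]

# The same structure realizes different Boolean functions, depending on its contents.
xor3 = LUT(3, lambda a, b, c: a ^ b ^ c)
majority = LUT(3, lambda a, b, c: (a & b) | (a & c) | (b & c))
print(xor3.table, majority(1, 1, 0))
```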
Characteristics of software
Large flexibility, easy to modify
"Easy" to learn, many teaching courses
Minimal infrastructure necessary
Large amount of free and open source software
Characteristics of hardware
Any software requires underlying hardware
Some functions can't be implemented in software, e.g., an amplifier
Hardware is inherently parallel
Hardware engineering is based on physical laws
Where does the problem of concurrency in software come from and how can it be solved?
Programming languages are usually based on sequential models
It can be addressed by writing parallelized code
What are characteristics of the High-Performance-Computing (HPC) world?
almost unlimited resources
reduce computing time for large computation
but energy is now a big issue!
Characteristics of the embedded world
restricted resources (memory, power,…)
heterogeneous architecture
real time and safety critical applications
What are the limitations of concurrency in software?
Not possible for every algorithm: some have no independent instructions
More resources do not necessarily reduce the computation time
Splitting the work in two does not mean dividing the time by 2
Using n processors for n tasks might waste resources without decreasing the computation time
Speed-up depends on the algorithm and is never linear!
Rule of thumb: N processors → √N performance increase
What is the rule of thumb for the performance increase based on the number of processors?
N processors lead to a performance increase by a factor of √N
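A quick numeric illustration of the rule of thumb. The function name is made up for this sketch, and the "efficiency" column (speed-up divided by N) is an added convenience, not part of the rule itself:

```python
import math

def rule_of_thumb_speedup(n_processors: int) -> float:
    """Rule of thumb from the lecture: N processors give about sqrt(N) speed-up."""
    return math.sqrt(n_processors)

for n in (1, 4, 16, 64):
    s = rule_of_thumb_speedup(n)
    # s / n shows how much of each added processor is actually put to use.
    print(f"N={n:3d}  speed-up ~ {s:.0f}x  efficiency = {s / n:.2f}")
```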
Amdahl’s Law formula and meaning
PPT
Amdahl’s Law describes the performance gain achievable through parallelization
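The slide formula is not reproduced on this card. The commonly cited form is S(N) = 1 / ((1 - p) + p / N), where p is the parallelizable fraction of the work and N the number of processors. The symbol names are an assumption and may differ from the slides. A small sketch:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speed-up for a fixed-size problem with parallelizable fraction p on n processors."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The serial part (1 - p) is unaffected; only the parallel part is divided by n.
    return 1.0 / ((1.0 - p) + p / n)

# With 90% parallel work the speed-up saturates below 1 / (1 - p) = 10.
for n in (2, 8, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```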
Difference between Amdahl’s and Gustafson’s Laws
Amdahl considers a fixed-size problem; Gustafson considers a problem size that scales with the number of processors
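To make the difference concrete, a sketch using the commonly cited forms of both laws. The symbols are assumptions: p is the parallel fraction (for Gustafson, measured on the parallel run of the scaled problem) and n is the number of processors:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Fixed problem size: the serial part limits the speed-up."""
    return 1.0 / ((1.0 - p) + p / n)

def gustafson_speedup(p: float, n: int) -> float:
    """Scaled problem size: the parallel part grows with the number of processors."""
    return (1.0 - p) + p * n

for n in (10, 100, 1000):
    print(f"n={n:5d}  Amdahl: {amdahl_speedup(0.9, n):6.2f}  "
          f"Gustafson: {gustafson_speedup(0.9, n):7.2f}")
```

Under these assumptions Amdahl’s speed-up stays below 10 for p = 0.9, while Gustafson’s scaled speed-up keeps growing with n.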
What are the basic types of parallelization? Give a quick example of each.
Task parallelism: several independent tasks
Data parallelism: split the work into small identical tasks
Loop unrolling/pipelining
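A minimal sketch of the first two types using Python's standard `concurrent.futures`. The helper functions `load_config`, `count_words` and `square` are hypothetical placeholders for real work:

```python
from concurrent.futures import ThreadPoolExecutor

def load_config():
    """Hypothetical independent task A."""
    return {"mode": "demo"}

def count_words(text):
    """Hypothetical independent task B."""
    return len(text.split())

def square(x):
    """Identical small task applied to every data element."""
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    # Task parallelism: different, independent tasks are submitted side by side.
    future_a = pool.submit(load_config)
    future_b = pool.submit(count_words, "task and data parallelism")
    task_results = (future_a.result(), future_b.result())

    # Data parallelism: the same operation is applied to each piece of the data.
    data_results = list(pool.map(square, range(8)))

print(task_results, data_results)
```

In CPython, threads mainly help with I/O-bound work; for CPU-bound work, `ProcessPoolExecutor` offers the same `submit`/`map` interface.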
What is the main challenge in parallelization?
Ensure determinism by adding lock mechanisms.
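A small sketch of a lock protecting shared state, assuming several threads increment one counter. The class `SafeCounter` is illustrative; without the lock, the read-modify-write in `increment` may lose updates:

```python
import threading

class SafeCounter:
    """Counter whose increment is protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may execute the read-modify-write below.
        with self._lock:
            self.value += 1

def worker(counter, iterations):
    for _ in range(iterations):
        counter.increment()

counter = SafeCounter()
threads = [threading.Thread(target=worker, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 4 threads x 10,000 increments = 40000
```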
What is the problem with parallelization techniques in the embedded world?
Many parallelization techniques are not suitable for the embedded world:
Not suitable for safety critical applications
Real time requirements not taken into account
What are the three domains of hardware design methodology?
Behavioural, structural, and geometrical
What is synthesis?
mapping behaviour onto a structure
Analysis
extracting behaviour from structure
Generation
mapping structure onto geometry
Extraction
extracting structure from geometry
Synthesis techniques
Circuit synthesis: mapping transfer functions onto transistor structures
Logic synthesis: mapping Boolean equations onto gates
High-level synthesis: mapping RTL specifications onto automata and datapaths
System synthesis: mapping a system specification onto a mixed HW/SW structure
DSP (digital signal processing) algorithms: steps of the transformation
Allocation:
selecting the number and types of resources
E.g., number of processors, memories, communication resources
Scheduling:
determines execution order of operations/processes
Mapping:
assigning operations/processes to resources
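A toy sketch of the three steps above, assuming a small dataflow graph (three multiplications feeding two additions), unit latency per operation, and a simple greedy list scheduler. The graph, the resource counts and the function name are invented for illustration:

```python
# Operations of a hypothetical 3-tap filter: name -> (resource type, predecessors)
ops = {
    "m0": ("mul", []),
    "m1": ("mul", []),
    "m2": ("mul", []),
    "a0": ("add", ["m0", "m1"]),
    "a1": ("add", ["a0", "m2"]),
}

# Allocation: choose number and type of resources (assumed: 2 multipliers, 1 adder).
allocation = {"mul": 2, "add": 1}

def schedule_and_map(ops, allocation):
    """Greedy list scheduling; assumes an acyclic graph and >= 1 unit per type used."""
    start = {}    # Scheduling: operation -> time step
    mapping = {}  # Mapping: operation -> resource instance
    step = 0
    while len(start) < len(ops):
        free = dict(allocation)  # all units are free at the beginning of each step
        for name, (kind, preds) in ops.items():
            if name in start:
                continue
            # An operation is ready once all predecessors finished in an earlier step.
            ready = all(p in start and start[p] < step for p in preds)
            if ready and free[kind] > 0:
                unit = allocation[kind] - free[kind]
                free[kind] -= 1
                start[name] = step
                mapping[name] = f"{kind}{unit}"
        step += 1
    return start, mapping

start, mapping = schedule_and_map(ops, allocation)
print(start)
print(mapping)
```

With only two multipliers allocated, `m2` has to wait one step, which shows how the allocation decision constrains the schedule.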
HW/SW codesign flow
PPT
HW/SW codesign
PPT
From 105 onward not included, but it looks pretty useless
USELESS