Lecture 2 Flashcards
CPU trends
Multi-Core processors
SIMD support
Combination of core-private and shared caches
Heterogeneity
Hardware support for energy control
64-bit architectures
CPU Challenge
Memory hierarchy
Intel Kaby Lake Processor
Ring Network
Skylake XP Socket
Mesh network
Skylake cache architecture
L1I Cache
L1D: 4 cycles latency
L2: 14 cycles
L3: 50-70 cycles
DRAM: hundreds of cycles
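The latencies above can be combined into an expected cost per load. This is a minimal sketch, not a measurement: the hit rates are hypothetical, and 60 cycles (L3) and 200 cycles (DRAM) are assumed values picked from inside the ranges on the card.

```python
# Load-to-use latencies in cycles. L1D and L2 come from the card above;
# 60 (L3) and 200 (DRAM) are assumed values inside the stated ranges.
LATENCY = {"L1D": 4, "L2": 14, "L3": 60, "DRAM": 200}

# Hypothetical local hit rates: the fraction of accesses reaching a level
# that are served by that level. DRAM always serves what reaches it.
HIT_RATE = {"L1D": 0.90, "L2": 0.80, "L3": 0.50, "DRAM": 1.0}


def average_access_cycles(latency, hit_rate):
    """Expected cycles per load, treating each latency as the total cost
    of an access that is served at that level."""
    reach = 1.0   # probability that an access gets as far as this level
    total = 0.0
    for level in ("L1D", "L2", "L3", "DRAM"):
        served = reach * hit_rate[level]
        total += served * latency[level]
        reach -= served
    return total


print(round(average_access_cycles(LATENCY, HIT_RATE), 2))
```

With these assumed hit rates the average is about 7.3 cycles, even though only 1% of loads go to DRAM. This is why the memory hierarchy is listed as the CPU challenge.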
Processors for mobile devices: ARM
Systems on a chip: formerly separate components become part of a single chip
Apple M1
ARM-based system on a chip:
16 billion transistors
8-core CPU
8-core GPU
16-core neural engine
big.LITTLE processing (combining high-performance and energy-efficient cores) is the current trend
M1 Pro and M1 Max: unified memory
Traditionally, the GPU and CPU each have their own memory; now the memory is shared
Motivation for accelerators
Increase computational speed
Reduce energy consumption
Achieved by specialization
Types of accelerators
GPGPUs
Many standard cores
FPGA
Accelerator: System integration
Nodes with attached accelerators
Accelerator-only design (no standard core)
Accelerator booster
GPU
Graphics processing unit
Used for visualization via interfaces like OpenGL
Combines every type of parallelism: multithreading, MIMD, SIMD, and instruction-level
GPU Challenges
Programming a GPGPU
Coordinating scheduling of computation on the system processor and GPU
Managing transfer of data between system memory and GPU memory
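The data-transfer challenge can be made concrete with a rough cost model: offloading only pays off if transfer time plus kernel time beats the CPU time. A minimal sketch follows. The function names and every number in it (a 16 GB/s link, 1 GB each way, the run times) are hypothetical and chosen only for illustration.

```python
def offload_time(bytes_to_gpu, bytes_from_gpu, link_bytes_per_s, kernel_s):
    """Total seconds when inputs and results must cross the link between
    system memory and GPU memory around the kernel execution."""
    transfer_s = (bytes_to_gpu + bytes_from_gpu) / link_bytes_per_s
    return transfer_s + kernel_s


def worth_offloading(cpu_s, bytes_to_gpu, bytes_from_gpu, link_bytes_per_s, kernel_s):
    """True if running on the GPU, including transfers, beats the CPU time."""
    return offload_time(bytes_to_gpu, bytes_from_gpu, link_bytes_per_s, kernel_s) < cpu_s


# Assumed scenario: 1 GB in, 1 GB out, 16 GB/s link, 50 ms kernel.
scenario = dict(bytes_to_gpu=1e9, bytes_from_gpu=1e9,
                link_bytes_per_s=16e9, kernel_s=0.05)

print(offload_time(**scenario))                  # transfers dominate the total
print(worth_offloading(cpu_s=0.5, **scenario))   # long CPU run: offload helps
print(worth_offloading(cpu_s=0.1, **scenario))   # short CPU run: offload hurts
```

In this assumed scenario the transfers take longer than the kernel itself, which is also the motivation behind unified memory.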
Streaming multiprocessor
L1 Shared cache
HBM
High bandwidth memory
Vertical stacks of memory dies connected by microscopic wires called through-silicon vias (TSVs)
FPGAs as accelerators
Specify hardware to execute algorithm
Two companies: Altera and Xilinx
NUMA
Shared memory system
Multiple CPUs with multiple cores
Single physical address space
Non-uniform memory access
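"Non-uniform" means that the cost of an access depends on which CPU's memory holds the data. A minimal sketch of the effect, assuming hypothetical latencies of 80 ns for local memory and 140 ns for memory attached to another CPU (neither figure comes from the lecture):

```python
def numa_average_latency_ns(local_fraction, local_ns=80.0, remote_ns=140.0):
    """Average access latency when local_fraction of the accesses hit the
    memory attached to the CPU that issues them."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


for fraction in (1.0, 0.5, 0.0):
    print(fraction, numa_average_latency_ns(fraction))
```

All CPUs see the same physical address space, so the program is correct wherever the data lives. Only the latency changes, which is why data placement matters on NUMA systems.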
Distributed memory systems or clusters
Coupling of individual nodes via network
No shared physical address space
Explicit transfer of messages between nodes
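The three properties above can be illustrated with a small single-process simulation. This is a sketch and not a real cluster program: `Node`, `send`, `recv`, and `distributed_sum` are hypothetical names, and a queue stands in for the network.

```python
from collections import deque


class Node:
    """A node with private memory; other nodes reach it only via messages."""

    def __init__(self, rank):
        self.rank = rank
        self.memory = {}      # private address space of this node
        self.inbox = deque()  # stands in for the network interface


def send(nodes, dest, payload):
    """Explicitly transfer a message to the node with rank dest."""
    nodes[dest].inbox.append(payload)


def recv(node):
    """Take the oldest pending message out of the node's inbox."""
    return node.inbox.popleft()


def distributed_sum(values, num_nodes):
    nodes = [Node(rank) for rank in range(num_nodes)]
    # Distribute the data: each node holds only its own slice.
    for node in nodes:
        node.memory["chunk"] = values[node.rank::num_nodes]
    # Each node works on its private data.
    for node in nodes:
        node.memory["partial"] = sum(node.memory["chunk"])
    # Partial results reach node 0 only through explicit messages.
    for node in nodes[1:]:
        send(nodes, 0, node.memory["partial"])
    root = nodes[0]
    total = root.memory["partial"]
    while root.inbox:
        total += recv(root)
    return total


print(distributed_sum(list(range(1, 101)), 4))  # prints 5050
```

Node 0 never reads another node's `memory` directly; it only sees what arrives in its inbox. That is the difference from the shared-memory NUMA case.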
Warm water cooling in data center
Reduced noise level
Reduced server power consumption
Reduced cooling power consumption