CSCI 223 Virtual Memory, Instruction Pipelining, Parallelism Flashcards
used in “simple” systems like embedded microcontrollers in devices like cars, elevators, and digital picture frames
a system using physical addressing
used in all modern servers, desktops, laptops, mobile processors; one of the great ideas in computer science; address translation process
a system using virtual addressing
why virtual memory? (4 major benefits)
- uses main memory efficiently
- simplifies memory management
- isolates address spaces (memory protection/security)
- makes programming so much easier
virtual memory as a tool for caching
VM is an array of N contiguous bytes, the address space seen by compiled programs; the array is stored on disk, and its contents are cached in physical memory (DRAM); these cache blocks are called pages
the unit of data movement between main memory and disk
page
consequences of the enormous miss penalty for DRAM cache organization
large page size (typically 4-8KB), fully associative (any VP can be placed in any PP), highly sophisticated and expensive replacement algorithms (too complicated and open-ended to be implemented in hardware), write-back rather than write-through
page fault penalty is
enormous b/c disk is ~10,000x slower than DRAM
page table
an array of page table entries (PTEs) that maps virtual pages to physical pages (a per-process kernel data structure in DRAM)
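A minimal C sketch of this structure, assuming 32-bit virtual addresses and 4 KB pages (field names and widths are illustrative, not any real OS's or ISA's layout):

    /* One page table entry (PTE): maps a virtual page to a physical page.
       Field widths are illustrative; real layouts are architecture-specific. */
    typedef struct {
        unsigned valid : 1;   /* 1 if the virtual page is cached in DRAM      */
        unsigned dirty : 1;   /* 1 if the page has been modified (write-back) */
        unsigned ppn   : 30;  /* physical page number, meaningful if valid    */
    } pte_t;

    /* Per-process page table in kernel memory, indexed by virtual page number. */
    #define NUM_VPAGES (1UL << 20)   /* 32-bit VA, 4 KB pages -> 2^20 virtual pages */
    pte_t page_table[NUM_VPAGES];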
page hit
reference to VM word that is in physical memory
page fault
reference to VM word that is not in physical memory
handling a page fault
causes an exception handled by the system: the page fault handler selects a victim page to evict, pages in the needed page, and restarts the offending instruction, which now results in a page hit
virtual memory works because of
locality
working set
the set of active virtual pages that a program tends to access at any point in time (programs with better temporal locality have smaller working sets)
if (working set size < main memory size)
good performance for one process after compulsory misses
if (SUM(working set sizes) > main memory size)
thrashing: performance meltdown where pages are swapped in and out continuously
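For a sense of scale (illustrative numbers): with 4 GB of main memory, eight processes with 1 GB working sets each need 8 GB of active pages, so every process keeps evicting the others' pages, and the ~10,000x-slower disk dominates execution time.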
VM as a tool for memory management
key idea: each process has its own virtual address space (it can view memory as a simple linear array; mapping function scatters addresses through physical memory)
memory allocation: each virtual page can be mapped to any physical page; a virtual page can be stored in different physical pages at different times
sharing code and data among processes: map virtual pages to the same physical page
address translation: page hit
1. processor sends virtual address to MMU
2-3. MMU fetches PTE from page table in memory
4. MMU sends physical address to cache/memory
5. cache/memory sends data word to processor
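The hit path as a C sketch, reusing the illustrative pte_t and page_table above and assuming 4 KB pages (12 offset bits):

    #define PAGE_SHIFT 12                /* 4 KB pages: low 12 bits are the offset */
    #define PAGE_MASK  0xFFFUL

    /* Hit path: split VA, fetch PTE, splice PPN and offset together. */
    unsigned long translate_hit(unsigned long va) {
        unsigned long vpn = va >> PAGE_SHIFT;   /* virtual page number */
        unsigned long vpo = va & PAGE_MASK;     /* virtual page offset */
        pte_t pte = page_table[vpn];            /* steps 2-3: MMU fetches PTE */
        /* valid bit is set, so form the physical address (step 4) */
        return ((unsigned long)pte.ppn << PAGE_SHIFT) | vpo;
    }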
address translation: page fault
1. processor sends virtual address to MMU
2-3. MMU fetches PTE from page table in memory
4. valid bit is zero, so MMU triggers page fault exception
5. handler identifies victim (and, if dirty, pages it out to disk)
6. handler pages in new page and updates PTE in memory
7. handler returns to original process, restarting faulting instruction
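A sketch of the handler side, mirroring steps 5-7; choose_victim, swap_out, and swap_in are hypothetical helpers standing in for real kernel code:

    unsigned long choose_victim(void);                  /* hypothetical: pick VPN to evict   */
    void swap_out(unsigned long vpn);                   /* hypothetical: write page to disk  */
    void swap_in(unsigned long vpn, unsigned long ppn); /* hypothetical: read page from disk */

    void handle_page_fault(unsigned long vpn) {
        unsigned long victim = choose_victim();          /* step 5: identify victim      */
        unsigned long ppn = page_table[victim].ppn;      /* reuse the victim's frame     */
        if (page_table[victim].dirty)
            swap_out(victim);                            /* dirty: page it out to disk   */
        page_table[victim].valid = 0;
        swap_in(vpn, ppn);                               /* step 6: page in the new page */
        page_table[vpn].ppn = ppn;                       /* ... and update its PTE       */
        page_table[vpn].valid = 1;
        /* step 7: on return, the faulting instruction restarts and hits */
    }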
speeding up translation with a
translation lookaside buffer (TLB)
TLB contains
complete page table entries for a small number of pages
TLB hit
eliminates a memory access (the PTE fetch)
TLB miss
incurs an additional memory access (to fetch the PTE); misses are rare b/c of locality
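A direct-mapped TLB sketch in the same illustrative style (real TLBs are small, set-associative hardware; the size and indexing here are assumptions):

    #define TLB_ENTRIES 64              /* illustrative size */

    typedef struct {
        int valid;
        unsigned long vpn;              /* tag: which virtual page is cached */
        pte_t pte;                      /* the complete PTE for that page    */
    } tlb_entry_t;

    tlb_entry_t tlb[TLB_ENTRIES];

    /* Return the PTE for vpn, trying the TLB before the page table. */
    pte_t lookup_pte(unsigned long vpn) {
        tlb_entry_t *e = &tlb[vpn % TLB_ENTRIES];
        if (e->valid && e->vpn == vpn)
            return e->pte;              /* TLB hit: PTE fetch from memory eliminated */
        pte_t pte = page_table[vpn];    /* TLB miss: one additional memory access    */
        e->vpn = vpn; e->pte = pte; e->valid = 1;
        return pte;
    }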
VM as a tool for memory protection
extends PTEs with permission bits that are checked on every access; if an access violates them, the kernel sends the process SIGSEGV (segmentation fault)
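Sketching the permission bits as an extension of the illustrative PTE above (bit names are assumptions):

    /* PTE extended with permission bits, checked on every access. */
    typedef struct {
        unsigned valid : 1;
        unsigned read  : 1;   /* process may read this page    */
        unsigned write : 1;   /* process may write this page   */
        unsigned exec  : 1;   /* process may execute this page */
        unsigned ppn   : 28;
    } prot_pte_t;

    /* A violating access never reaches memory; the process gets SIGSEGV. */
    int access_ok(prot_pte_t pte, int is_write) {
        return pte.valid && (is_write ? pte.write : pte.read);
    }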
instruction flow
read the instruction at the address specified by the PC, process it through the stages, update the program counter (hardware executes instructions sequentially)
execution stages
- fetch (read instruction from instruction memory)
- decode (understand instruction and registers)
- execute (compute value or address)
- memory (read or write data)
- write back (write program registers)
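A toy C interpreter loop makes the five stages concrete; the 32-bit encoding and two-opcode ISA here are invented for illustration, not a real machine:

    #include <stdint.h>

    uint32_t regs[8];       /* program registers  */
    uint32_t imem[256];     /* instruction memory */
    uint32_t dmem[256];     /* data memory        */
    uint32_t pc = 0;        /* program counter    */

    /* Execute one instruction, passing through the five stages in order.
       Encoding (invented): [op:8 | rd:8 | ra:8 | rb:8]; op 0 = add, op 1 = load. */
    void step(void) {
        /* fetch: read instruction from instruction memory, compute next PC */
        uint32_t inst = imem[pc], next_pc = pc + 1;
        /* decode: identify fields and read program registers */
        uint32_t op = inst >> 24, rd = (inst >> 16) & 7;
        uint32_t a = regs[(inst >> 8) & 7], b = regs[inst & 7];
        /* execute: compute value or address in the ALU */
        uint32_t val = a + b;
        /* memory: read data for a load (address computed by execute) */
        if (op == 1) val = dmem[val & 0xFF];
        /* write back: write the program register */
        regs[rd] = val;
        pc = next_pc;
    }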
stage computation: arithmetic/logical operations
formulate instruction execution as a sequence of simple steps; use the same general form for all instructions
fetch: read instruction byte, read register byte, compute next PC
decode: read operand A, read operand B
execute: perform ALU operation, set condition code register
memory: (no data memory access for arithmetic/logical operations)
write back: write back result
instruction execution limitations
too slow to be practical; each hardware unit is only active for a fraction of the clock cycle
solution to instruction execution limitations
instruction pipelining
idea of instruction pipelining
divide the process into independent stages; move objects through the stages in sequence; at any given time, multiple objects are being processed
limitation of instruction pipelining
nonuniform delays (throughput limited by slowest stage; other stages sit idle for much of the time; challenging to partition system into balanced stages)
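Illustrative numbers: stages taking 10 ns, 20 ns, and 10 ns force a 20 ns clock, so the two 10 ns stages sit idle half of every cycle and throughput is set entirely by the 20 ns stage.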
pipeline speedup
if all stages are balanced (i.e., take the same time),
ETpipelined = ETnonpipelined / # stages + Tfilling&exiting
max speedup
number of stages
speedup due to
increased throughput
execution time of each instruction (decreases/increases) and why
increases slightly due to stage imbalance and increased hardware complexity
ETnonpipelined / ETpipelined =
(ETinst * # inst) / [(ETinst * # inst) / # stages + Tfilling&exiting] ≈ # stages
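Plugging in illustrative numbers: ETinst = 5 ns, # inst = 1,000, 5 balanced stages (1 ns each). ETnonpipelined = 5 ns * 1,000 = 5,000 ns; ETpipelined ≈ 5,000 / 5 + 4 ns of fill/exit ≈ 1,004 ns; speedup ≈ 5,000 / 1,004 ≈ 4.98, approaching the maximum of 5 (# stages) as # inst grows.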
hazards
situations that prevent starting the next instruction in the next cycle
3 types of hazards
- structural hazard
- data hazard
- control hazard
structural hazard
a required resource does not exist or is busy
data hazard
need to wait for previous instruction to complete its data read/write
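A two-line C fragment whose compiled form creates such a dependence (illustrative):

    int x = 1, y = 2;
    int a = x + y;   /* first instruction produces a                          */
    int b = a + 1;   /* next instruction reads a before it is written back:
                        a data hazard; the pipeline stalls or forwards        */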