VM-implementation Flashcards
On syscall, OS must run and use its own internal data structures
But we don’t want every system call to induce a context switch
What optimization can the OS do?
Map the OS data structures into the address space of every
process
– Use the “user/supervisor bit” to indicate that only the OS (“ring 0”)
can access this memory area, whereas user code (“ring 3”) cannot
– Further, set the “global page” bit in the PTEs associated with the OS
memory, which would then leave the corresponding TLB translations
valid across context switches
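No longer the default on CPUs vulnerable to the Meltdown attack: there, the kernel is largely unmapped from the user-mode page tables (e.g. Linux KPTI).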
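To make the two bits above concrete, here is a minimal sketch of the low flag bits of an x86 PTE. The bit positions (present = bit 0, writable = bit 1, user/supervisor = bit 2, global = bit 8) follow the x86 paging format; the helper names are hypothetical and exist only for this example.

```python
# Low flag bits of an x86 page-table entry.
PTE_PRESENT = 1 << 0
PTE_WRITABLE = 1 << 1
PTE_USER = 1 << 2     # set: ring 3 may access; clear: supervisor only
PTE_GLOBAL = 1 << 8   # translation stays in the TLB across context switches

def kernel_pte_flags():
    """Flags for an OS page mapped into every process (hypothetical helper)."""
    return PTE_PRESENT | PTE_WRITABLE | PTE_GLOBAL   # PTE_USER left clear

def user_can_access(flags):
    """True if user code (ring 3) is allowed to touch a page with these flags."""
    return bool(flags & PTE_PRESENT) and bool(flags & PTE_USER)

print(hex(kernel_pte_flags()))              # 0x103
print(user_can_access(kernel_pte_flags()))  # False
```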
Who inserts the VA=>PA translation to the TLB?
The HW, at the end of a page walk.
Who’s responsible for setting PTE bits (e.g. dirty & accessed)?
HW sets them; the OS clears them.
Who invalidates the TLB?
– E.g., upon munmap() or context switch
OS invalidates TLB entries
Each core has its own TLB. When, e.g., one core invalidates a PTE or changes its
access permissions, who is responsible for synchronizing the TLBs of the other cores?
The OS (a “TLB shootdown”, typically done with inter-processor interrupts).
How many address bits does current x86_64 HW use?
48 bits (256TB).
How many levels does the page-table tree have in x86_64?
4 levels.
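The 48-bit and 4-level answers fit together: with 4KB pages, each level consumes 9 bits of the address and the remaining 12 bits are the page offset. A minimal sketch of that split follows; the function name is made up for this example.

```python
def split_va(va):
    """Split a 48-bit x86_64 virtual address into 4 table indices + offset."""
    offset = va & 0xFFF                        # low 12 bits: byte within a 4KB page
    indices = [(va >> shift) & 0x1FF           # 9 bits (512 entries) per level
               for shift in (39, 30, 21, 12)]  # top level first
    return indices, offset

assert 4 * 9 + 12 == 48
assert 2 ** 48 == 256 * 2 ** 40                # 256TB of virtual address space

print(split_va(0x00007FFFFFFFF123))  # ([255, 511, 511, 511], 291)
```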
What sizes of huge pages are supported in x86_64?
2MB and 1GB (4MB pages exist only in 32-bit x86 without PAE).
What are the 3 different address types of PPC? Describe each one.
3 address types: effective => virtual => real
Effective
– Each process uses 64-bit “effective” addresses
– Effective addresses aren’t unique across processes (each process has its own effective space)
– More or less equivalent to x86 “virtual” addresses
– Get translated to PPC “virtual” addresses
Virtual
– A huge 80-bit address space
– All processes live in and share this (single) space
– Namely, if two processes have a page with the same virtual address
• Then it’s the same page (= a shared page)
– Not equivalent to x86 virtual addresses
– Get translated into physical (“real”) addresses
Physical (a.k.a. real)
– 62-bit
What is the size of PPC effective and virtual segments?
Effective & virtual spaces are partitioned into contiguous
segments of 256MB
Each segment is contiguous in the per-process effective memory space
Each segment is contiguous in the single, huge virtual space
How many segments can there be in an effective space?
– Effective space size is 2^64; so can be 2^(64-28) = 2^36 segments
How many segments can there be in the virtual space?
– Virtual space size is 2^80; so can be 2^(80-28) = 2^52 segments
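The segment counts above are plain bit arithmetic: a 256MB segment covers 28 address bits, so the remaining high bits name the segment. A short sketch, with a hypothetical helper that splits an effective address into its ESID and the offset inside the segment:

```python
SEGMENT_BITS = 28                               # 256MB = 2^28 bytes per segment
assert 2 ** SEGMENT_BITS == 256 * 2 ** 20

effective_segments = 2 ** (64 - SEGMENT_BITS)   # 2^36 per process
virtual_segments = 2 ** (80 - SEGMENT_BITS)     # 2^52 in the shared space

def split_effective(ea):
    """Return (ESID, offset within segment) for a 64-bit effective address."""
    return ea >> SEGMENT_BITS, ea & ((1 << SEGMENT_BITS) - 1)

esid, offset = split_effective(0x123456789ABCDEF0)
print(hex(esid), hex(offset))  # 0x123456789 0xabcdef0
```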
What’s the purpose of SLB (segment lookaside buffer) in PPC?
HW searches SLB
– To find ESID=>VSID mapping
– Each STE (segment table entry)
contains ESID & VSID info
Who manages the SLB, and what happens upon context switch?
OS explicitly manages SLB
– Maintains ESID=>VSID for all
segments of the process
Upon context switch
– OS invalidates SLB (not shared)
What happens upon SLB miss?
Upon SLB miss
– HW raises “segment fault” interrupt
– OS will then insert right STE
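The division of labor around the SLB (HW searches, OS fills and invalidates) can be modeled in a few lines. This is only a toy simulation: the class and function names are invented, a Python dict stands in for the hardware buffer, and an exception stands in for the “segment fault” interrupt.

```python
class SegmentFault(Exception):
    """Stands in for the HW "segment fault" interrupt."""

class SLB:
    def __init__(self):
        self.entries = {}              # ESID -> VSID (the cached STEs)

    def lookup(self, esid):            # what the HW does
        if esid not in self.entries:
            raise SegmentFault(esid)
        return self.entries[esid]

    def invalidate(self):              # what the OS does upon context switch
        self.entries.clear()

def translate_segment(slb, os_segment_table, esid):
    """HW lookup; on a miss the 'OS' inserts the right STE and we retry."""
    try:
        return slb.lookup(esid)
    except SegmentFault:
        slb.entries[esid] = os_segment_table[esid]   # OS miss handler
        return slb.lookup(esid)
```

Here `os_segment_table` plays the role of the per-process ESID=>VSID mappings that the OS maintains.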
What’s the purpose of TLB in PPC?
TLB is a (less) fast HW cache, bigger than the SLB.
HW searches TLB
– To find VPN=>PPN mapping
– Each PTE (page table entry)
contains VPN & PPN info
• Shared by all processes
– Since virtual space is shared
– Unlike in x86
– Not invalidated upon context switch
Who manages the TLB in PPC?
Managed by HW & OS
– Upon miss, HW populates TLB
– By “walking the page table”, which is
in fact the “HTAB”…
What’s the difference between PPC PTE and x86 PTE?
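The PPC PTE contains both the VPN (the “tag”) and the PPN; the tag is needed because the HTAB is a hash table, so several VPNs can land in the same PTEG.
The x86 PTE contains only the PPN, since its position in the page-table tree already identifies the VPN.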
What’s a PTEG (page table entry group)?
Contains 8 PTEs
– 8 * 16 = 128 bytes = |cache line|
What’s HTAB?
At boot time, OS allocates in DRAM the “HTAB” array
• Holds k PTEGs, where k is configurable
OS saves HTAB’s size & base in “SDR1” (storage description register)
What happens upon a TLB miss in PPC?
– HW hashes VPN (modulo k)
• Recall that HW knows about HTAB location and size via SDR1
• The hash function is well-known and documented in the PPC spec
– HW accesses HTAB and gets the so called “primary” PTEG of the VPN
– HW searches for the VPN in the 8 PTEs populating the primary PTEG
– If found, HW puts VPN=>PPN in TLB and re-executes operation
– Otherwise, HW uses a secondary well-known hash function to obtain
the “secondary” PTEG
– If VPN found, HW puts VPN=>PPN in TLB and re-executes operation
– Otherwise, HW triggers page-fault interrupt
– OS will then resolve the page fault and put the appropriate PTE in one
of the associated two PTEGs (primary or secondary)
– After interrupt is handled, HW will re-execute the operation; now it’ll
find the VPN in one of the 2 PTEGs
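The miss-handling steps above can be sketched as a small simulation. Two assumptions are worth stressing: the hash functions below are simple placeholders and not the ones defined in the PPC spec, and the HTAB is modeled as a Python list of k PTEGs, each a list of at most 8 (VPN, PPN) pairs. All names are invented for this example.

```python
PTES_PER_PTEG = 8

def primary_hash(vpn, k):
    return vpn % k                     # placeholder, NOT the real PPC hash

def secondary_hash(vpn, k):
    return (~vpn) % k                  # placeholder, NOT the real PPC hash

class PageFault(Exception):
    """Stands in for the page-fault interrupt."""

def htab_lookup(htab, vpn):
    """Search the primary, then the secondary PTEG for vpn; return its PPN."""
    k = len(htab)
    for h in (primary_hash, secondary_hash):
        for entry_vpn, ppn in htab[h(vpn, k)]:
            if entry_vpn == vpn:       # the VPN "tag" disambiguates collisions
                return ppn
    raise PageFault(vpn)

def os_insert(htab, vpn, ppn):
    """OS page-fault handler: place the PTE in one of the two PTEGs."""
    k = len(htab)
    for h in (primary_hash, secondary_hash):
        pteg = htab[h(vpn, k)]
        if len(pteg) < PTES_PER_PTEG:
            pteg.append((vpn, ppn))
            return
    raise RuntimeError("both PTEGs full; a real OS would evict an entry")
```

Once the primary PTEG of a VPN holds 8 entries, `os_insert` falls back to the secondary PTEG, and `htab_lookup` finds the entry there on its second probe.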
What’s the PPC ERAT?
ERAT (effective to real translation)
A small, fast HW cache
– O(128) entries
– Quicker than 1-cycle to access (on the critical path)
– Translates from effective to real (physical)
– Analogous to x86 TLB (L1)
– Updated to hold the most recent effective=>physical mappings used
• On an LRU basis
– If hit, don’t need to go through the SLB/TLB process
• Typically, obviates the need to do costly SLB=>TLB=>HTAB
translations
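A toy model of the ERAT as a small LRU cache, to show what “most recent mappings, on an LRU basis” means in practice. It is a sketch only: the class name and the page-number keys are assumptions, and the real structure is a hardware cache, not a dictionary.

```python
from collections import OrderedDict

class ERAT:
    def __init__(self, capacity=128):
        self.capacity = capacity
        self.entries = OrderedDict()   # effective page -> real page

    def lookup(self, epn):
        """Return the real page on a hit, or None on a miss."""
        if epn not in self.entries:
            return None                # miss: fall back to SLB => TLB => HTAB
        self.entries.move_to_end(epn)  # mark as most recently used
        return self.entries[epn]

    def insert(self, epn, rpn):
        """Record a mapping, evicting the least recently used one if full."""
        self.entries[epn] = rpn
        self.entries.move_to_end(epn)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
```

A hit returns the real page directly; a miss returns None, at which point the slower SLB/TLB path would run and its result would be passed to `insert`.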