7 - Paging Flashcards
Outline how paged virtual memory works
Aim: allow a process to exist in non-contiguous memory
Diagram p3
- CPU generates logical addresses (page, offset)
- The page table is searched for the entry corresponding to page number p.
- If the valid bit is set, the frame number f is read off and the physical address (frame number, offset) is derived.
- That physical address is accessed using memory bus operations and the value is stored in a register, making it available to the process.
Paging scheme:
- Divide physical memory into frames, small fixed-size blocks.
- Divide logical memory into pages, blocks of the same size (typically 4kB)
- Each CPU-generated address is a page number p with page offset o.
- Page table contains associated frame number f
- Usually have many more page numbers than frame numbers, so also record whether mapping valid (valid bit)
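The translation scheme above can be sketched in Python. The page size, the table contents, and the dict representation are illustrative assumptions, not from the notes:

```python
# Sketch of paged address translation: 4 kB pages, so the low 12 bits
# of a logical address are the offset and the rest is the page number.
PAGE_SIZE = 4096          # 2^12 bytes (assumed)
OFFSET_BITS = 12

# Hypothetical page table: index = page number p, value = (valid bit, frame f)
page_table = {0: (True, 5), 1: (True, 9), 2: (False, None)}

def translate(logical_addr):
    p = logical_addr >> OFFSET_BITS          # page number
    o = logical_addr & (PAGE_SIZE - 1)       # page offset
    valid, f = page_table.get(p, (False, None))
    if not valid:                            # valid bit clear: no mapping
        raise MemoryError("invalid mapping")
    return (f << OFFSET_BITS) | o            # physical address (frame, offset)

print(hex(translate(0x1234)))  # page 1, offset 0x234, frame 9 -> 0x9234
```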
Why is hardware support required for paging? State some issues relating to the selection of page sizes.
- Every memory access requires us to read the page table.
- If this lookup arithmetic had to be performed in software on every memory access, it would be far too slow.
- Page size typically defined by hardware
- For hardware support to be effective, require page size to be a power of two, e.g. ranging from 0.5 kB to 8 kB.
- Power of two page sizes - can perform fast logical masking and shifting to get page number and page offset, no need to perform expensive arithmetic operations.
- e.g. a logical address space of 2^m bytes and a page size of 2^n bytes gives 2^(m - n) pages, so we require (m - n) bits to specify the page number and n bits to specify the offset within the page.
Relationship between paging and dynamic relocation
- Paging is itself a form of dynamic relocation: simply change the page table to reflect the movement of a page in memory.
- This is similar to using a set of base + limit registers for each page in memory.
What are some pros of paging
Clear separation between user (process) and system (OS) view of memory usage
How:
- Process sees single logical address space; OS does the hard work
- A process cannot address memory it does not own: it cannot reference a page it has no mapping for.
- OS can map system resources into the user address space, e.g. an IO buffer
- OS must keep track of free memory, typically in a frame table
- Easier for the OS to allocate memory, which no longer needs to be contiguous.
- No external fragmentation (in physical memory)
What are some cons of paging
Adds overhead to context switching, how:
- Per process page table must be mapped into hardware on context switch
- The page table itself may be large and extend into physical memory
- Although no external fragmentation in physical memory, get internal fragmentation because a process may not use all of final page.
What hardware support is used to implement paging?
Page tables (PTs) rely on hardware support:
Case 1: a set of dedicated relocation registers, e.g. PDP-11.
Case 2: keep the PT in memory; then only one MMU register is needed, the PTBR (page table base register).
Case 3: a TLB.
Describe how paging COULD be implemented using dedicated set of relocation registers
This is the simplest / most basic way of implementing page table in hardware:
- What: keep a set of dedicated relocation registers.
- One register per page,
- OS Loads registers on context switch
- e.g. PDP-11: 16-bit addresses and 8 kB pages, hence 8 PT registers (8 kB = 2^13 bytes, so #pages = 2^(16 - 13) = 8)
- every single memory reference goes through the PT so they must be fast registers.
BUT
1. This is OK for small PTs, but with millions of pages we cannot store such a large page table in CPU registers.
Describe how PTBR could be used to implement page table
Set of dedicated relocation registers OK if PT is small, but if we have many pages e.g. millions, cannot store in registers.
Solution : Keep PT in memory, then one MMU register needed, the PTBR (page table base register).
- Context switches => OS switches only the PTBR
- Problem: PTs may still be very big
=> Keep a PT length register (PTLR) to indicate the size of the PT, so the table need only be as large as the process requires and the hardware can trap references beyond it.
- Problem: need to refer to memory twice for every "actual" memory reference.
=> Solution is to use a TLB (Translation lookaside buffer).
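A minimal sketch of the PTBR/PTLR scheme, with main memory modelled as a Python list; the base address, table size, and frame values are assumptions for illustration:

```python
# Page table held in (simulated) main memory, located via the PTBR.
# The PTLR gives the number of entries, so out-of-range page numbers trap.
memory = [0] * 1024
PTBR = 100            # base address of the page table in memory (assumed)
PTLR = 8              # number of page-table entries

memory[PTBR + 3] = 42  # PTE for page 3 -> frame 42 (valid bit omitted)

def lookup_frame(p):
    if p >= PTLR:                       # PTLR check: invalid page number
        raise MemoryError("reference beyond page table")
    return memory[PTBR + p]             # the first of the two memory accesses

print(lookup_frame(3))  # -> 42
```

Note the `memory[PTBR + p]` read: this is why every "actual" reference costs a second memory access, motivating the TLB.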
Factors involved in determining size of page
- Architecture sets this …
Considerations:
- Smaller pages => less internal fragmentation (a process may not use all of its final page).
- But significant per-page overhead to using small pages: each page needs its own PTE, so page tables grow, and disk IO is more efficient with larger pages.
- Typically 4 kB (in this tradeoff, the memory lost to fragmentation is cheaper than the time overhead of managing many pages).
Explain what problem the TLB is supposed to solve
- When we use the solution of storing the PT in memory and use PTLR, PTBRs, then we need to refer to main memory twice for every “actual” memory reference. (1) Read off page table (2) Read off physical address.
- TLB maintains a cache of recently accessed pages and their corresponding frame numbers in physical memory, exploiting locality of reference to reduce the number of references that incur the double access of reading the page table.
Diagram showing TLB operation
- Diagram p7 §7
- When memory is referenced, present TLB with logical memory address
- If the PTE is present, get an immediate result
- Otherwise, make memory reference to PTs and update the TLB
- Latter case is much slower than direct memory reference
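The flow above can be sketched as a toy TLB: a small dict cache in front of the page table (the mapping below is an assumed example):

```python
# Hit -> immediate result; miss -> walk the page table and refill the TLB.
page_table = {p: p + 100 for p in range(8)}   # assumed page -> frame mapping
tlb = {}

def translate_page(p):
    if p in tlb:                  # TLB hit: one fast lookup
        return tlb[p], "hit"
    f = page_table[p]             # TLB miss: extra reference to the PT
    tlb[p] = f                    # update the TLB for next time
    return f, "miss"

print(translate_page(2))  # -> (102, 'miss')
print(translate_page(2))  # -> (102, 'hit')
```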
Explain some issues with TLB use
As with any cache: what do we do when it's full?
- If full, discard entries, typically via an LRU policy
- Context switches require a TLB flush to prevent the next process using the wrong PTEs
- Mitigate the cost through process tags (address-space identifiers), which let the TLB hold entries for several processes at once, so entries survive context switches
TLB performance and 80-20 issue
- TLB performance is measured using hit ratio = proportion of times a PTE is found in TLB.
- If t := TLB search time, and M := memory access time,
TLB hit: (t + M)
TLB miss: (t + 2M)
So effective memory access time:
(h)(t + M) + (1-h)(t + 2M)
assuming the page table is stored in main memory.
Due to locality of reference, hit ratios can be high, e.g. 80%, but there is little further gain from pushing the hit ratio higher (the 80-20 issue).
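A worked example of the effective-access-time formula, with assumed timings (the 20 ns / 100 ns figures are illustrative, not from the notes):

```python
# EAT = h(t + M) + (1 - h)(t + 2M), page table in main memory.
t, M = 20, 100        # ns: TLB search time and memory access time (assumed)

def eat(h):
    return h * (t + M) + (1 - h) * (t + 2 * M)

print(eat(0.80))   # 80% hit ratio
print(eat(0.98))   # near-perfect hit ratio: only modestly better
```

With these numbers, raising the hit ratio from 80% to 98% saves only ~18 ns per access, illustrating why chasing ever-higher hit ratios yields diminishing returns.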
Explain why a multilevel page table is necessary.
- Most modern systems can support very large (2^32 bytes, 2^64 bytes) address spaces => very large page tables
- Don’t want to keep whole page table in main memory
- Solution is to split PT into several sub parts e.g. two parts, then page the page table.
Explain how 2-level paging works (Diagram)
- Divide the logical address into page number and offset, e.g. a 20-bit page number and a 12-bit page offset.
- Then divide the page number into outer and inner parts of 10 bits each.
- The MMU takes a logical address,
- adds the first ten bits of the page number to the page table base register to get the address of the correct entry in the process's first-level page table,
- reads off the location of the second-level page table from that entry, then indexes it using the next ten bits of the virtual address to obtain the frame number,
- and finally combines the frame number with the last 12 bits (the offset into the correct frame) to form the physical address.
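The two-level walk can be sketched as follows; the table contents and names (`outer_pt`, `inner_tables`) are illustrative assumptions:

```python
# 32-bit address split as 10-bit outer index, 10-bit inner index, 12-bit offset.
outer_pt = {0: "inner_A"}                 # outer index -> second-level table
inner_tables = {"inner_A": {1: 7}}        # inner index -> frame number

def translate2(addr):
    p1 = (addr >> 22) & 0x3FF    # top 10 bits: index into first-level table
    p2 = (addr >> 12) & 0x3FF    # next 10 bits: index into second-level table
    o  = addr & 0xFFF            # low 12 bits: offset within the frame
    l2 = inner_tables[outer_pt[p1]]
    return (l2[p2] << 12) | o

print(hex(translate2(0x1ABC)))   # p1=0, p2=1, frame 7 -> 0x7ABC
```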
Describe how the VAX architecture used paging
VAX
1. 32 bit architecture with 512 byte pages
2. Logical address space divided into 4 sections of 2^30 bytes.
3. Top 2 address bits designate section
4. Next 21 bits designate page within section.
5. Final 9 bits designate page offset
6. For a VAX process with 100 pages, a one-level PT would be 4 MB; with sectioning it is 1 MB.
Explain why two level paging is still not enough for 64 bit architectures
- For 4 kB pages, need 2^52 entries in a one-level page table (since 4 kB = 2^12 bytes, and 2^64 / 2^12 = 2^52)
- For a 2-level PT with a 32-bit outer page number, the outer PT alone would need 2^32 entries; at 4 bytes each, that is 16 GB
- Even some 32 bit machines have > 2 levels, SPARC (32 bit) has 3 level paging scheme, 68030 has 4 level paging.
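The size arithmetic behind the bullets above, assuming 4-byte PTEs:

```python
# Why two-level paging fails to scale to 64-bit address spaces.
PAGE = 2 ** 12                       # 4 kB pages
entries_1level = 2 ** 64 // PAGE     # PTEs needed for a flat table
outer_entries  = 2 ** 32             # 32-bit outer page number
outer_bytes    = outer_entries * 4   # 4 bytes per entry (assumed)

print(entries_1level == 2 ** 52)     # -> True
print(outer_bytes == 16 * 2 ** 30)   # -> True: 16 GB just for the outer PT
```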
Describe how X86 implements paging
Diagram p13 §7
- Page sizes of 4 kB or 4 MB. Translation first performs a lookup in the page directory, indexed using the top 10 bits.
- The page directory address is stored in an internal processor register.
- The lookup results in the address of a page table (usually).
- Next ten bits of logical address index the page table, retrieving the page frame address.
- Finally, use the low 12 bits as the page offset.
- Note that the page directory and each page table are deliberately exactly one page each (2^10 entries of 4 bytes = 4 kB), so they can themselves be paged in and out like ordinary pages.
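A quick check of why a directory or table indexed by 10 bits, with 4-byte entries, occupies exactly one 4 kB page:

```python
# x86 two-level structures: 10 index bits and 32-bit (4-byte) entries.
ENTRIES = 2 ** 10     # indexed by 10 address bits
PTE_SIZE = 4          # bytes per entry
print(ENTRIES * PTE_SIZE)   # -> 4096, i.e. exactly one 4 kB page
```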
Describe (DIAGRAMMATICALLY) how X86 implements paging
p 13 §7.
- Take the virtual address and use the top ten bits as the index into the PAGE DIRECTORY (a level 1 page table). This yields the address of a page table as well as other bits.
- Each entry consists of four bytes, of which 20 bits are the L2 page table address (PTA).
- The next 10 bits of the virtual address index the page table located at the PTA, retrieving the page frame address (PFA).
- Use the low 12 bits of the virtual address as the page offset.
State protection issues associated with page table entries
We associate protection bits with each page, kept in the page table entries and TLB (!)
e.g. FrameNumber - K RWX V D
- R: read permission
- W: write permission
- X: execute permission
- K: accessible in kernel mode only
- V: valid bit
- D: dirty / modified bit