7 - Paging Flashcards

1
Q

Outline how paged virtual memory works

A

Aim: allow a process to exist in non-contiguous memory

Diagram p3

  • CPU generates logical addresses (page, offset).
  • The page table is searched for the entry corresponding to page number p.
  • If the valid bit is set, the frame number f is read off and a physical address (frame number, offset) is formed.
  • The word at that physical address is fetched over the memory bus into a register, and is thus available to the process.

Paging scheme:

  1. Divide physical memory into frames, small fixed-size blocks.
  2. Divide logical memory into pages, blocks of the same size (typically 4kB)
  3. Each CPU-generated address is a page number p with page offset o.
  4. Page table contains associated frame number f
  5. Usually have many more page numbers than frame numbers, so also record whether the mapping is valid (valid bit). (A toy translation sketch follows.)
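A minimal C sketch of this translation, under toy assumptions (16-bit logical addresses, 4 kB pages; the flat page_table array and all names are illustrative, not from the handout):

```c
#include <stdint.h>
#include <stdio.h>

/* Toy parameters: 16-bit logical addresses, 4 kB pages => 16 pages. */
#define PAGE_SHIFT  12
#define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)
#define NUM_PAGES   (1u << (16 - PAGE_SHIFT))

typedef struct { uint32_t frame; int valid; } pte_t;
static pte_t page_table[NUM_PAGES];

/* Translate logical (p, o) to physical (f, o); -1 signals a fault. */
static int64_t translate(uint16_t laddr)
{
    uint32_t p = laddr >> PAGE_SHIFT;   /* page number p */
    uint32_t o = laddr & OFFSET_MASK;   /* page offset o */
    if (!page_table[p].valid)
        return -1;                      /* valid bit clear: trap to OS */
    return ((int64_t)page_table[p].frame << PAGE_SHIFT) | o;
}

int main(void)
{
    page_table[3] = (pte_t){ .frame = 7, .valid = 1 };
    printf("0x%llx\n", (long long)translate(0x3ABC)); /* prints 0x7abc */
    printf("%lld\n",  (long long)translate(0x5000));  /* prints -1     */
    return 0;
}
```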
2
Q

WHY is hardware support required for paging and state some issues relating to the selection of page sizes.

A
  • Every memory access requires a read of the page table.
  • If the address arithmetic had to be performed in software on every memory access, execution would be far too slow.
  • Page size is typically defined by the hardware.
  • For hardware support to be effective, require the page size to be a power of two, e.g. ranging from 0.5 kB to 8 kB.
  • Power-of-two page sizes: can perform fast logical masking and shifting to get the page number and page offset, with no need for expensive arithmetic operations (see the sketch below).
  • e.g. a logical address space of 2^m bytes and a page size of 2^n bytes gives 2^(m - n) pages, so require (m - n) bits to specify the page number and n bits to specify the offset within the page.

?? Error in handout - page offset should be n bits ??
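A small C sketch of the power-of-two trick above: the shift/mask forms are equivalent to divide/modulo but far cheaper (assumes n = 12, i.e. 4 kB pages; the address value is arbitrary):

```c
#include <stdint.h>
#include <assert.h>

int main(void)
{
    const unsigned n = 12;              /* page size 2^12 = 4 kB        */
    uint32_t addr = 0xDEADBEEF;         /* an arbitrary logical address */

    /* Power-of-two page size: page number and offset fall out with a
     * single shift and a single mask ...                              */
    uint32_t p_fast = addr >> n;
    uint32_t o_fast = addr & ((1u << n) - 1);

    /* ... which is equivalent to general division and modulo, the
     * "expensive arithmetic" the hardware avoids.                     */
    uint32_t p_slow = addr / (1u << n);
    uint32_t o_slow = addr % (1u << n);

    assert(p_fast == p_slow && o_fast == o_slow);
    return 0;
}
```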

3
Q

Relationship between paging and dynamic relocation

A
  • Paging is itself a form of dynamic relocation: simply change the page table to reflect the movement of a page in memory.
  • This is similar to using a set of base + limit registers for each page in memory.
4
Q

What are some pros of paging

A

Clear separation between user (process) and system (OS) view of memory usage

How:

  1. Process sees single logical address space; OS does the hard work
  2. A process cannot address memory it doesn’t own: it cannot reference a page it has no mapping for
  3. OS can map system resources into the user address space, e.g. an IO buffer
  4. OS must keep track of free memory; typically in a frame table
  5. Easier for the OS to allocate memory, which no longer needs to be contiguous
  6. No external fragmentation (in physical memory)
5
Q

What are some cons of paging

A

Adds overhead, e.g. to context switching. How:

  1. Per process page table must be mapped into hardware on context switch
  2. The page table itself may be large and extend into physical memory
  3. Although no external fragmentation in physical memory, get internal fragmentation because a process may not use all of final page.
6
Q

What hardware support is used to implement paging?

A
Page tables (PTs) rely on hardware support:
Case 1: a set of dedicated relocation registers, e.g. PDP-11.
Case 2: keep the PT in memory; then only one MMU register is needed, the PTBR (page table base register).
Case 3: add a TLB (translation lookaside buffer).
7
Q

Describe how paging COULD be implemented using dedicated set of relocation registers

A

This is the simplest / most basic way of implementing the page table in hardware:
- What: keep a set of dedicated relocation registers
- One register per page
- OS loads the registers on context switch
- e.g. PDP-11: 16-bit addresses and 8 kB pages; 8 kB = 2^13 bytes, so #pages = 2^(16 - 13) = 8, hence 8 page table registers
- Every single memory reference goes through the PT, so these must be fast registers
BUT:
1. This is OK for small PTs; if we have millions of pages, we cannot store such a large page table in registers on the CPU.

8
Q

Describe how PTBR could be used to implement page table

A

Set of dedicated relocation registers OK if PT is small, but if we have many pages e.g. millions, cannot store in registers.

Solution : Keep PT in memory, then one MMU register needed, the PTBR (page table base register).

  • Context switches => OS switches only the PTBR
  • Problem: PTs may still be very big.
    => Keep a PT length register (PTLR) to indicate the size of the PT, so that out-of-range page numbers can be trapped as invalid and small tables need not be allocated at full size.
  • Problem: need to refer to memory twice for every “actual” memory reference.
    => Solution is to use a TLB (translation lookaside buffer).
9
Q

Factors involved in determining size of page

A
  • The architecture sets this …
    Considerations
  • Smaller pages => less internal fragmentation (a process may not use all of its final page).
  • But there is significant per-page overhead to using small pages: more pages means more PTEs and hence larger page tables, plus disk IO is more efficient with larger pages.
  • Typically 4 kB (in this tradeoff, memory lost to internal fragmentation is cheaper than the time overhead of small pages).
10
Q

Explain what problem the TLB is supposed to solve

A
  • When the PT is stored in memory (the PTBR/PTLR solution), we need to refer to main memory twice for every “actual” memory reference: (1) read the page table entry, (2) read the addressed word itself.
  • The TLB maintains a cache of recently accessed page numbers and their corresponding frame numbers in physical memory, exploiting locality of reference to avoid the doubled access time of reading the page table in the common case.
11
Q

Diagram showing TLB operation

A
Diagram p7 in §7
  1. When memory is referenced, present the TLB with the logical memory address
  2. If the PTE is present, get an immediate result
  3. Otherwise, make a memory reference to the PTs and update the TLB
  4. The latter case is much slower than a direct memory reference
12
Q

Explain some issues with TLB use

A

As with any cache - what do we do when it’s full.

  1. If full, discard entries, typically via an LRU policy
  2. Context switches require a TLB flush to prevent the next process using the wrong PTEs
  3. May mitigate this cost by tagging entries with a process / address-space ID, so a full flush is not needed

How are entries shared?

13
Q

TLB performance and 80-20 issue

A
  1. TLB performance is measured using hit ratio h = the proportion of times a PTE is found in the TLB.
  2. If t := TLB search time, and M := memory access time,
    TLB hit: (t + M)
    TLB miss: (t + 2M)
    So effective memory access time:

(h)(t + M) + (1-h)(t + 2M)

assuming the page table is stored in main memory.

Due to locality of reference, hit ratios may be high, e.g. 80%, but there is not much to be gained from increasing the hit ratio further (the 80-20 issue; worked example below).
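A short worked check of this formula, using the t = 20 ns, M = 100 ns figures that appear in the final card of this deck:

h = 0.80: 0.8 × (20 + 100) + 0.2 × (20 + 200) = 96 + 44 = 140 ns
h = 0.98: 0.98 × 120 + 0.02 × 220 = 117.6 + 4.4 = 122 ns

So raising the hit ratio from 80% to 98% improves the effective access time by only about 13%.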

14
Q

Explain why a multilevel page table is necessary.

A
  1. Most modern systems can support very large (2^32 bytes, 2^64 bytes) address spaces => very large page tables
  2. Don’t want to keep whole page table in main memory
  3. Solution is to split PT into several sub parts e.g. two parts, then page the page table.
15
Q

Explain how 2-level paging works (Diagram)

A
  1. Divide the page number into two parts, e.g. a 20-bit page number with a 12-bit page offset.
  2. Then divide the page number into outer and inner parts of 10 bits each.
  3. The MMU takes a logical address and:
    - uses the first ten bits of the page number, added to the page table base register, to find the correct entry in the process’s first-level page table;
    - reads off the location of the second-level page table, then indexes it with the next ten bits of the virtual address to obtain the frame number;
    - returns the value at the offset given by the last 12 bits within that frame.
(See the address-splitting sketch below.)
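A minimal C sketch of just the address-splitting step for this 10/10/12 layout (field positions assumed from the description above; the address value is arbitrary):

```c
#include <stdint.h>
#include <stdio.h>

/* Split a 32-bit logical address into a 10-bit outer index, a 10-bit
 * inner index, and a 12-bit page offset. */
int main(void)
{
    uint32_t laddr = 0x12345678;
    uint32_t outer = (laddr >> 22) & 0x3FF;  /* index into L1 page table */
    uint32_t inner = (laddr >> 12) & 0x3FF;  /* index into L2 page table */
    uint32_t off   =  laddr        & 0xFFF;  /* offset within the frame  */
    printf("outer=%u inner=%u offset=0x%03x\n", outer, inner, off);
    return 0;
}
```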
16
Q

Describe how the VAX architecture used paging

A

VAX
1. 32 bit architecture with 512 byte pages
2. Logical address space divided into 4 sections of 2^30 bytes.
3. Top 2 address bits designate section
4. Next 21 bits designate page within section.
5. Final 9 bits designate page offset
6. For a VAX with 100 pages, a one-level PT would be 4 MB; with sectioning it’s 1 MB. (?)

17
Q

Explain why two level paging is still not enough for 64 bit architectures

A
  1. For 4 kB pages, need 2^52 entries in a one-level page table (4 kB = 2^12 bytes, so 2^(64 - 12) = 2^52 pages)
  2. For a 2-level PT with a 32-bit outer page number, the outer PT alone would need 2^32 entries × 4 bytes = 16 GB
  3. Even some 32-bit machines have more than 2 levels: SPARC (32-bit) has a 3-level paging scheme, and the 68030 has 4-level paging
18
Q

Describe how X86 implements paging

A

Diagram p13 §7

  1. Page sizes of 4 kB or 4 MB. A lookup first goes to the page directory, indexed using the top 10 bits.
  2. The page directory address is stored in an internal processor register.
  3. The lookup (usually) yields the address of a page table.
  4. The next ten bits of the logical address index the page table, retrieving the page frame address.
  5. Finally, use the low 12 bits as the page offset.
  6. Note that the page directory and page tables are deliberately exactly one page each: 1024 four-byte entries = 4 kB, so the tables fit exactly into frames and can themselves be paged.
19
Q

Describe (DIAGRAMMATICALLY) how X86 implements paging

A

p 13 §7.

  1. Take the virtual address, use the top ten bits as the index to the PAGE DIRECTORY (a level 1 page table). This results in the address of a page table as well as other bits…
    - Each page table entry consists of four bytes, of which 20 bits are the L2 page table address (PTA)
  2. Next 10 bits used as index to the page table located at PTA - retrieves the page frame address (PFA)
  3. Use the low 12 bits of virtual address as the page offset.
20
Q

State protection issues associated with page table entries

A

We associate protection bits with each page kept in page tables and TLB (!)

e.g. FrameNumber - K R W X V D:
- Read permission (R)
- Write permission (W)
- Execute permission (X)
- Kernel mode only (K)
- Valid bit (V)
- Dirty / modified bit (D)
(See the PTE layout sketch below.)
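A sketch of how such bits might be packed into a 32-bit PTE using C bitfields; the exact layout is illustrative, not that of any particular architecture:

```c
#include <stdint.h>

/* Illustrative 32-bit PTE: 20-bit frame number plus protection bits.
 * Real architectures define their own layouts. */
typedef struct {
    uint32_t frame  : 20;  /* frame number                      */
    uint32_t kernel : 1;   /* K: accessible in kernel mode only */
    uint32_t read   : 1;   /* R: read permission                */
    uint32_t write  : 1;   /* W: write permission               */
    uint32_t exec   : 1;   /* X: execute permission             */
    uint32_t valid  : 1;   /* V: mapping valid                  */
    uint32_t dirty  : 1;   /* D: page modified                  */
    uint32_t unused : 6;
} pte_t;
```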

21
Q

Comment on protection using paging

A
  1. As the virtual address goes through the paging hardware, the protection bits can be checked: hardware-supported protection.
  2. Though this only gives page granularity, not byte granularity: assigning different permissions to different data means placing them in different pages => internal fragmentation.
  3. Any attempt to violate protection causes a hardware trap to OS code to handle.
  4. The entry in the TLB will have a valid / invalid bit indicating whether the page is mapped into the process address space; if invalid, trap to the OS handler, which may map the page or signal an error.
  5. Can do lots of interesting things here, particularly with regard to sharing and virtualisation.
22
Q

Explain the usefulness of shared pages

A
  1. Another advantage of paged memory is code / data sharing e.g.
    - Sharing binaries : editor, compiler
    - Libraries: shared objects (.so files), DLLs
23
Q

Explain how shared memory could be implemented using paging

A
  1. Implemented as two logical addresses which map to one physical address (two page table entries sharing one frame)
  2. If code is re-entrant (i.e. stateless, non-self-modifying) it can easily be shared between users, since no user ever changes the shared physical copy
  3. Otherwise, can use a copy-on-write technique…
    => Mark the page as read only in all processes
    => If a process tries to write to the page, it will trap to the OS fault handler
    => Can then allocate a new frame, copy the data, and create a new page table mapping (see the sketch below)
    => May use this for lazy data sharing too
  4. Evaluation:
    - Requires additional book-keeping in the OS, but worth it, e.g. many hundreds of MB of shared code even in a single-user system
    - How do unikernels change this?
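A hedged C sketch of the copy-on-write fault path in point 3; alloc_frame(), copy_frame(), and the refcount array are hypothetical OS helpers, not from the notes:

```c
#include <stdint.h>

#define NUM_FRAMES 1024

typedef struct { uint32_t frame : 20, write : 1, valid : 1; } pte_t;
struct proc { pte_t page_table[256]; };

static int frame_refcount[NUM_FRAMES];   /* sharers per physical frame */

/* Hypothetical helpers assumed to exist elsewhere in the OS. */
uint32_t alloc_frame(void);
void     copy_frame(uint32_t dst, uint32_t src);

/* Called from the fault handler when a process writes to a read-only
 * shared page. */
void cow_fault(struct proc *p, uint32_t page)
{
    uint32_t old = p->page_table[page].frame;
    if (frame_refcount[old] == 1) {
        p->page_table[page].write = 1;   /* last sharer: just unprotect */
        return;
    }
    uint32_t copy = alloc_frame();       /* allocate a new frame        */
    copy_frame(copy, old);               /* copy the shared data        */
    frame_refcount[old]--;
    frame_refcount[copy] = 1;
    p->page_table[page].frame = copy;    /* new, private, writable map  */
    p->page_table[page].write = 1;
}
```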
24
Q

Why does virtual addressing allow us to implement virtual memory?

A

Virtual addressing allows us to introduce the idea of virtual memory
=> Already have valid or invalid page translations; introduce “non-resident”
designation and put such pages on a non-volatile backing store
=> Processes access non-resident memory just as if it were “the real thing”

25
Q

What are the benefits of virtual memory?

A

Virtual Memory (VM) has several benefits:
1. Portability:
=> programs work regardless of how much actual memory present;
=> programs can be larger than physical memory
2. Convenience:
=> programmer can use e.g. large sparse data structures with impunity;
=> less of the program needs to be in memory at once, thus potentially
more efficient multi-programming, less IO loading/swapping program into
memory
3. Efficiency:
=> no need to waste (real) memory on code or data which isn’t used (e.g., error handling)

26
Q

Explain how virtual memory can be implemented with demand paging

A
  1. Programs (executables) reside on disk
  2. To execute a process, we load pages in on demand IE as and when they are referenced.
  3. Demand segmentation is also possible, but rare (e.g. Burroughs, OS/2) as it’s more difficult: segments have variable size, so replacement must find a contiguous region large enough, reintroducing placement and external fragmentation problems.
27
Q

Explain in details how demand paging works

A

When loading a new process for execution

  1. Create its address space (page tables, etc)
  2. Mark PTEs as either invalid (not part of the address space) or non-resident.
  3. Add the PCB to scheduler

Then, whenever the OS receives a page fault, check PTE:

=> If due to invalid reference, kill process
=> Else, due to non-resident page, so “page in” the desired page (separate question as to how).

28
Q

Explain how a demand paging system “pages in”, and when

A
  • OS receives page fault due to non-resident page.
    Pages in desired page:
    1. Find a free frame in memory
    2. Initiate disk IO to read in desired page
    3. When IO finished, modify PTE for this page to show that it is now valid.
    4. Restart the process at the faulting instruction
29
Q

Demand paging : issues

A
  1. Handling makes the fault invisible to the process ∴ logical transparency

BUT:

  1. Requires care to save process state correctly on the fault, so that the faulting instruction can be transparently restarted
  2. Can be particularly awkward on a CPU with pipelined decode, as partially executed instructions must be wound back to a precise state (e.g. MIPS, Alpha)
  3. CISC CPU: a single instruction can move lots of data, possibly across pages, so cannot simply restart the instruction; rely on help from microcode (e.g. to test addresses before writing), and can possibly use temporary registers to store moved data
  4. Instructions / data could span multiple pages ∴ multiple page faults per instruction, but this is infrequent due to locality of reference
  5. Pure demand paging (described previously) causes lots of page faults when a process begins ∴ many real systems explicitly load core parts of the process first
30
Q

Why do we need page replacement?

A
  1. To page in from disk, need a free frame of physical memory to hold data we’re reading in - but size of physical memory limited.

Two options:

  1. Discard unused pages if the total demand for pages exceeds the physical memory size
  2. Or swap out an entire process to free some frames
31
Q

Describe : page fault handling with page replacement

A

Page fault occurs due to non-resident page
1. Locate the desired page on disk
2. Select a free frame for the incoming page:
2a. If there is a free frame, use it; otherwise select a victim page to free
2b. Then write the victim page back to disk (unless it is unmodified)
2c. Finally mark the victim as invalid / non-resident in its process’s page tables
3. Read desired page into the now free frame
4. Restart the faulting process from where it left off.

… thus, having no free frames effectively doubles the page fault service time.
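A hedged C sketch of this fault path (it also covers the page-in flow of the previous cards); all helper functions are hypothetical stand-ins for the rest of the OS:

```c
#include <stdint.h>

typedef struct { uint32_t frame : 20, valid : 1, dirty : 1; } pte_t;

/* Hypothetical helpers assumed provided elsewhere in the OS. */
int    find_free_frame(void);          /* returns -1 if none free     */
int    choose_victim(void);            /* replacement policy          */
pte_t *pte_of_frame(int frame);        /* reverse map frame -> PTE    */
void   write_to_disk(int frame);
void   read_from_disk(int frame, pte_t *pte);

/* Handle a fault on a non-resident page. */
void page_in(pte_t *pte)
{
    int frame = find_free_frame();
    if (frame < 0) {                   /* no free frame: must evict    */
        frame = choose_victim();
        pte_t *victim = pte_of_frame(frame);
        if (victim->dirty)
            write_to_disk(frame);      /* write back only if modified  */
        victim->valid = 0;             /* mark victim non-resident     */
    }
    read_from_disk(frame, pte);        /* disk IO to read desired page */
    pte->frame = (uint32_t)frame;
    pte->valid = 1;                    /* now resident                 */
    /* ... finally, restart the faulting process where it left off.   */
}
```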

32
Q

Adding a dirty bit to the PTE to improve the efficiency of page replacement : HOW?

A

Add dirty bit to a page’s PTE

  1. If page not modified (dirty bit not set) no need to write out when its frame is evicted.
  2. If page is marked read only e.g. contains binary executable code, no need to write out either (unmodifiable).
33
Q

Page replacement : how do we choose our victim frame ? State objectives.

A

Importance of choosing a good victim frame:
=> A key factor in an efficient VM system: evicting a page that we’ll need in a few
instructions time can get us into a really bad condition
Aim
=> We want to ensure that we get few page faults overall, and that any we do get
are relatively quick to satisfy

In general, page replacement algorithms:
=> All aim to minimise the page fault rate
=> Candidate algorithms are evaluated by (trace-driven) simulation using reference strings

34
Q

Page replacement algorithms : First in first out : HOW??

A
  1. Keep a FIFO queue of pages
  2. Discard from the head

35
Q

Evaluate : FIFO page replacement

A
  1. Performance is hard to predict BECAUSE we have no idea whether the replaced page will be used again or not:
    eviction is independent of how heavily the page is used.

Generally, simple but inefficient, e.g.:

  2. Can discard a page currently in use, causing an immediate fault and the next page in the queue to be replaced => system slowdown.
  3. Possible to get more faults as the number of available frames increases: Belady’s Anomaly. (Need some questions on this!)
36
Q

State OPT / MIN page replacement algorithm

A
  1. Replace the page which will not be used again for the longest period of time.

How:
=> Cannot be implemented in general, since we cannot know how long a page will remain unused!

Why:
=> Provides good baseline for other algorithms - how close they can get to theoretical best performance.

37
Q

Page replacement : least recently used : LRU : state

A
  1. Replace page which hasn’t been used for longest amount of time.
  2. Equivalent to OPT with time running backwards (???)
38
Q

Evaluate : LRU page replacement

A
  1. Assumes the past is a good predictor of the future
  2. Can still end up replacing pages that are about to be used
  3. Generally considered quite good, but may require substantial hardware assistance, since the LRU ordering must be updated on every single memory reference
  4. BUT how do we determine the LRU ordering? (See the following cards on implementation.)
39
Q

Describe how to implement LRU using counters

A
  1. Give each PTE a time-of-use field and give the CPU a logical clock (counter)
  2. Whenever a page is referenced, its PTE is updated to clock value
  3. Replace page with smallest time value
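A C sketch of the counter scheme, with hypothetical per-page arrays standing in for PTE fields; on real hardware the on_reference update would have to happen on every memory access, which is exactly the impractical part (next card):

```c
#include <stdint.h>

#define NUM_PAGES 256

static uint64_t logical_clock;            /* CPU's logical clock       */
static uint64_t time_of_use[NUM_PAGES];   /* per-PTE time-of-use field */
static int      resident[NUM_PAGES];

/* Done (conceptually by hardware) on every reference to a page. */
void on_reference(int page) { time_of_use[page] = ++logical_clock; }

/* Victim = resident page with the smallest time-of-use (linear scan). */
int choose_victim(void)
{
    int victim = -1;
    for (int p = 0; p < NUM_PAGES; p++)
        if (resident[p] &&
            (victim < 0 || time_of_use[p] < time_of_use[victim]))
            victim = p;
    return victim;
}
```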
40
Q

Evaluate : implementing LRU with counters

A
  1. Requires a search to find minimum counter value
  2. Adds a write to memory (PTE) on every memory reference
  3. Must handle clock overflow
  4. Impractical on a standard processor to add extra memory writes to every single memory reference made.
41
Q

Describe how to implement a LRU page replacement algorithm using a page stack.

A
  1. Maintain a stack of pages (doubly linked list) with MRU (most recently used) page on top
  2. Discard from bottom of stack
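A C sketch of the stack scheme; note the (up to) six pointer updates per reference, which the next card evaluates:

```c
#include <stddef.h>

struct page { struct page *prev, *next; };

static struct page *mru;   /* top of stack: most recently used */
static struct page *lru;   /* bottom of stack: the victim      */

/* Move a resident page to the top of the stack on each reference. */
void touch(struct page *p)
{
    if (p == mru) return;
    if (p->prev) p->prev->next = p->next;   /* 1: unlink             */
    if (p->next) p->next->prev = p->prev;   /* 2: unlink             */
    if (p == lru) lru = p->prev;            /* keep bottom pointer   */
    p->prev = NULL;                         /* 3: now topmost        */
    p->next = mru;                          /* 4: old top below us   */
    if (mru) mru->prev = p;                 /* 5: back-link old top  */
    mru = p;                                /* 6: update top pointer */
}
```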
42
Q

Evaluate LRU page replacement using page stack

A
  1. Requires changing (up to) 6 pointers per [new] reference
  2. This is very slow without extensive hardware support
  3. Also impractical on a standard processor
43
Q

Describe how to approximate LRU by NRU replacement. State what hardware support is required.

A
  1. Have a reference bit in the PTE
  2. Reference bit initially zeroed by OS when first paged in.
  3. R set by hardware whenever the page is referenced.
  4. After some time has passed, consider those pages with the bit set to 1 to be ACTIVE and implement NRU (not recently used) replacement:
    => Periodically (e.g. every 20 ms) scan the page table and clear all reference bits
    => When choosing a victim to evict, prefer pages with clear reference bits
    => If we also have a modified / dirty bit in the PTE, can use that too

Priorities (see the sketch below):
(Referenced and dirty) => bad choice
(Referenced but not dirty) => probably code in use
(Not referenced but dirty) => next best, requires write back
(Not referenced, not dirty) => best choice to replace
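A C sketch of NRU victim selection using the four classes above (the arrays are hypothetical stand-ins for PTE bits):

```c
#include <stdint.h>

#define NUM_PAGES 256

static uint8_t referenced[NUM_PAGES];  /* R bit, set by hardware     */
static uint8_t dirty[NUM_PAGES];       /* D bit, set on modification */
static uint8_t resident[NUM_PAGES];

/* Class 0 = !R,!D (best victim), 1 = !R,D, 2 = R,!D, 3 = R,D (worst). */
int choose_victim(void)
{
    int best = -1, best_cls = 4;
    for (int p = 0; p < NUM_PAGES; p++) {
        if (!resident[p]) continue;
        int cls = (referenced[p] << 1) | dirty[p];
        if (cls < best_cls) { best_cls = cls; best = p; }
    }
    return best;                       /* -1 if nothing is resident  */
}
```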

44
Q

How could a system improve the accuracy of approximate LRU by using extra bits….

A

Instead of just a single bit, the OS:

  1. Maintains an 8-bit value per page, initialised to zero.
  2. Periodically (e.g. 20 ms) shift reference bit onto higher order bit of the byte AND clear the REFERENCE bit.

Replacement: select the lowest-valued page (or one of …) to replace

  • Keeps history for the last 8 clock sweeps
  • Interpreting the bytes as unsigned integers, the LRU page is the one with the minimum value: more recently referenced pages have higher-order bits set
  • May not be unique, but this gives a candidate set from which to pick a victim (see the sketch below)
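A C sketch of this aging scheme (arrays again stand in for per-page PTE state):

```c
#include <stdint.h>

#define NUM_PAGES 256

static uint8_t age[NUM_PAGES];        /* 8-bit history per page       */
static uint8_t ref_bit[NUM_PAGES];    /* set by hardware on reference */

/* Run every clock sweep (e.g. 20 ms): shift the reference bit onto
 * the high-order bit of the age byte, then clear the reference bit. */
void clock_sweep(void)
{
    for (int p = 0; p < NUM_PAGES; p++) {
        age[p] = (uint8_t)((age[p] >> 1) | (ref_bit[p] << 7));
        ref_bit[p] = 0;
    }
}

/* LRU candidate = page with the minimum age value (may not be unique). */
int choose_victim(void)
{
    int victim = 0;
    for (int p = 1; p < NUM_PAGES; p++)
        if (age[p] < age[victim]) victim = p;
    return victim;
}
```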
45
Q

Explain further improvement to approximate LRU using second chance FIFO

A

How:

  1. Store pages in queue as per FIFO
  2. Before discarding the head, check reference bit.
  3. If reference bit = 0, discard; ELSE clear the reference bit and give the page a second chance (add it to the tail of the queue). See the sketch below.
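A C sketch of second chance in its circular-buffer ("clock") form, which is equivalent to re-queueing a page at the tail of the FIFO:

```c
#include <stdint.h>

#define NUM_PAGES 8

static uint8_t ref_bit[NUM_PAGES];    /* set by hardware on reference  */
static int     fifo[NUM_PAGES];       /* resident pages in FIFO order  */
static int     head;                  /* oldest page: candidate victim */

/* Guaranteed to terminate within one full cycle: if every bit is set,
 * all get cleared and the original head is evicted (plain FIFO). */
int choose_victim(void)
{
    for (;;) {
        int p = fifo[head];
        if (ref_bit[p] == 0) {
            head = (head + 1) % NUM_PAGES;  /* advance past the victim;
                                               caller reuses the slot  */
            return p;
        }
        ref_bit[p] = 0;                     /* second chance granted   */
        head = (head + 1) % NUM_PAGES;
    }
}
```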
46
Q

Evaluate improved approximate LRU using second chance FIFO

A
  1. Guaranteed to terminate after at most one cycle; in the worst case, when all pages are referenced, second chance devolves into plain FIFO
  2. A page given a second chance is the last to be replaced, so recently referenced pages survive the current sweep
47
Q

Explain TLB OPERATION

A

When memory is referenced we present the TLB with a logical memory address

  1. If the PTE is present we get an immediate result (parallelised search / readoff)
  2. Otherwise we make a memory reference to PTs and update the TLB…
  3. Latter case is definitely slower than the direct memory reference.

Nice diagram p9 of §7 (another question, sketch / explain)

48
Q

Explain and sketch the diagram on p9 of §7 (TLB operation)

A
  1. CPU generates logical address (page number, offset)
  2. The logical address (p, o) is sent in parallel to the TLB, which contains a cache of recently used page numbers and their corresponding frame numbers in physical memory; the lookup is content-addressable (associative) => fast
  3. If a match is found in the TLB then we can use the frame number and offset to access memory only once to get the value we want.
  4. If a match is not found, we access the page table (itself in memory) to get the frame number and make a further access to get the value stored at the offset within the frame in physical memory.
  5. The page-frame mapping is inserted into the TLB so that (cf. locality of reference) if the same page is accessed again we can get its frame number very quickly.
49
Q

Discuss issues with using a TLB

A

As with any cache:
1. What to do when it’s full?
1a. Discard entries, typically by an LRU (least recently used) policy
1b. Context switches require a TLB flush to prevent the next process using the wrong PTEs
1c. May mitigate this cost through process tags on TLB entries, avoiding a full flush
2. How are entries shared?

50
Q

Explain application of 80-20 rule in TLB performance

Assume TLB search time of 20 ns,
Memory access time of 100 ns,
hit ratio of 80% vs 98 %

A

TLB performance is measured in terms of a hit ratio
=> Hit ratio h = proportion of times a PTE is found in the TLB
=> Effective memory access time = h × (TLBtime + MAT) + (1 - h) × (TLBtime + 2 × MAT)
Case 1 (h = 0.80): 0.8 × 120 + 0.2 × 220 = 140 ns
Case 2 (h = 0.98): 0.98 × 120 + 0.02 × 220 = 122 ns
Only a ~13% improvement, which may not be worthwhile if locality of reference already makes the hit ratio high (the 80-20 principle).