Doesn't bring whole process into memory, only loads the needed page (Lazy swapper) Less I/O, no unnecessary I/O More free memory faster response Can serve more users Hardware support needed Needs to change MMU and add a bit signalling validity So that during address translation it can detect page faults v = in-memory i = not-in-memory -\> page-fault secondary memory (swap) instruction restart pay attention to limit cases: not all instruction can be restarted after one page fault, some may generate more than one. Extreme case: start process with no pages in mem. (this for every proc.) pure demand paging Instruction could access multiple pages -\> more page faults locality reduces the stress

parent and child process share the same page initially only if a process modifies the page it gets another copy process creation is more efficient vfork() is used to let the child use parent address space while parent suspended designed for child calling exec very efficient

Keep counter of the references amde to each page Least frequently used (LFU): replace page with smalles count Most frequntly used(MFU): replace page with largest count, hypotesis that the page with lowest count was just brought in Not so popular

Virtual Memory (10) Flashcards by Enrico Carraro

Virtual Memory

Virtual Memory: separation of logical memory from physical mem.
The whole program doesn’t need to be in memory to be executed
Logical address space can be in much larger than physical
Allows address spaces to be shared by several processes
- Library code can be shared
More concurrency
More efficient process creation
- Less stuff to load in memory
Less I/O needed to load or swap process
- Only parts of the process that are not yet in me

Virtual memory can be implemented with:

Demand Paging
Demand Segmentation

How well did you know this?

Not at all

Perfectly

Virtual-Address Space

Logical view of how process is stored in main memory

Usally starts at 0
Physical memory is organized in page frames
MMU must map logical to physical

Often designed this way:

stack starts at the top of the logical address space
heap grows up above data and code
Unused address space between: hole
- No more physical memory needed until there is a hole

Enable sparse addresses spaces with holes left for:

growth
dynamically linked libraries

System libraries and shared memory can be accessed by mapping them into virtual address space.

Pages can be shared to speed up fork()

How well did you know this?

Not at all

Perfectly

Demand Paging

Doesn’t bring whole process into memory, only loads the needed page (Lazy swapper)
- Less I/O, no unnecessary I/O
- More free memory
- faster response
- Can serve more users
Hardware support needed
- Needs to change MMU and add a bit signalling validity
  - So that during address translation it can detect page faults
    - v = in-memory
    - i = not-in-memory -> page-fault
- secondary memory (swap)
- instruction restart
  - pay attention to limit cases: not all instruction can be restarted after one page fault, some may generate more than one.
Extreme case: start process with no pages in mem. (this for every proc.)
- pure demand paging
- Instruction could access multiple pages -> more page faults
  - locality reduces the stress

How well did you know this?

Not at all

Perfectly

Handling a Page Fault (Demand Paging)

Once in page fault, OS looks at another table to decide
- Invalid ref. -> abort
- not-in-memory
find free frame
swap page into frame by scheduling disk op.
reset tables to reflect that the page is in-memory
restart from instruction that caused page fault

How well did you know this?

Not at all

Perfectly

Free-frame list

Most OSes maintain a list pool of free frams to satisfy requests
Zero-fill-on-demand: the content of the frames are zeroed-out before being allocated
When a system starts up: all the memory is in the free frame list
It’s very important to always have free frames
- if not we need to deal with freeing also a frame -> 2x slower

How well did you know this?

Not at all

Perfectly

Demand paging performance

Main factors:

Handling the interrupt: several hundred instruction if we are being optimistic (not so long)
Read the page: very very slow
Restart the process: small amount of time

p = Page fault rate

Effective access time = (1-p) * memory access + p ( page_fault_overhead + swap_out_time + swap_in_time)

How well did you know this?

Not at all

Perfectly

Demand paging optimization

Copy entire proccess image to swap space at process load time, then page in and out from there
- swap space I/O is faster than fs I/O even if it is same device
  - swap is allocated in larger chunks -> less overhead
- used in older BSD Unix v.
Demand page in from program binary on disk
- this is very good because:
  - most pages are read-only
  - it doesn’t require to write back to disk
- We still need write to swap space
  - heap and stack
  - pages modified in memory but not written back to the fs

How well did you know this?

Not at all

Perfectly

Copy-on-write

parent and child process share the same page initially
- only if a process modifies the page it gets another copy
process creation is more efficient
vfork() is used to let the child use parent address space while parent suspended
- designed for child calling exec
- very efficient
  *

How well did you know this?

Not at all

Perfectly

Page replacement

What if no free frames?
We need to prevent over-allocation of mem. by modifying page-fault service routine to include replacement
Use dirty bit to transfer only pages that have been modified
Page replacement makes possible separation between physical and logical memory
- we can have logical > physical
Process:
- no free frame?
- run page repalcement alog to select victim
- write victim to disk
- bring page into the new free frame
- update page/frames tables
- continue
With page repl. we have 2 page transfer for page fault

How well did you know this?

Not at all

Perfectly

Page frame Replacement Algorithms

Frame-allocation algo:
- how many frames do I give to this process?
- which frames to I replace
Page-replacement algo:
- want lowest possible page fault rate of access and re-access

Algos:

FIFO:
- Belady’s anomaly: adding more frames may result in more faults
Optimal algo:
- used as a benchmark
- impossible to implement without discovering that P=NP
LRU: least recently used

LRU and optimal are cases of stack algo without Belady’s anomaly.

How well did you know this?

Not at all

Perfectly

LRU Algorithm Implementation

Counter
- every page has a counter
- on every reference of the page: copy the timestamp into it
- look for the smaller counter when you want to replace page
Stack
- keep stack of page numbers in a double linked list
- page referenced:
  - move to top
  - requires 6 pointers to be changes
- each update more expensive

How well did you know this?

Not at all

Perfectly

LRU Approximation algorithms

We need an approximation because LRU needs special hardware and it is still slow.
Reference bit:

Used to understand if a page was used recently enough.
RB = 1 -> page is fresh
RB = 0 -> you can replace it

Second-chance algorithm:

FIFO + hardware provided reference bit
Circular list
if page to be replaced has RB = 0 then replace
otherwise set RB = 0, leave page in memory, move to next page
repeat.

Basically we are setting to old fresh pages, but replacing only the ones that were already old.

How well did you know this?

Not at all

Perfectly

Enhanced second chance algorithm

Use reference bit and modify bit
consider pair (rb, mb)
- (0, 0): not used recently neither modified
  - best candidate
- (0, 1): modified but not used recently
  - okay, but must write to memory
- (1, 0): used recently but not modified
  - probably will be used soon
- (1, 1): recently used and modified
  - will be used soon and need to write it
On page replacement:
- use clock scheme
- use the four classes to replace page
- might need to search queue several times

How well did you know this?

Not at all

Perfectly

Counting algorithms

Keep counter of the references amde to each page
Least frequently used (LFU): replace page with smalles count
Most frequntly used(MFU): replace page with largest count, hypotesis that the page with lowest count was just brought in
Not so popular

How well did you know this?

Not at all

Perfectly

Page Buffering Algorithms

We want to minimize the number of page faults
We want always a pool of free frames in which to swap pages:
- We’ll have the frame when needed, not looked after the page fault occurs
- select victim to evict and add to free pool
- use dirty bit
- keep frame contents intact until eviction
  - if page referenced again we don’t need to transfer page from memory
  - reducec penalty if wrong victim was selected

How well did you know this?

Not at all

Perfectly

Applications and Page Replacement

Study These Flashcards

Some applications have better knowledge than OS about what page will be needed
Memory intensive apps can cause double buffering
- OS ees page in memory as I/O buffer
- Application does its work on their page
Raw disk is a way for application to access disk directly bypassing
- buffering
- locking
- etc.

Frame allocation

Study These Flashcards

Each process needs a minimum number of frames
Two main schemes:
- fixed allocation
  - Equal: divide the number of frames by the number of processes
    - keep some in free frames pool
  - Proportional:
    - allocate pages based on processes size
    - changes with time due to degree of multiprogramming and process size
- priority allocation
- many variations
  - Global vs. Local replacement:
    - Global: process can take a frame from another
      - greater throughput, more common
    - Local: select only from own set of frames
      - underutilized memory
      - more consistent per-process performance
        *

Frame allocation: Reclaiming Page

Study These Flashcards

Strategy for global page-replacement policy:
- All memory requests are satisfied from free frame list
- if free pages number is under a certain threshold
- free up some frames so that we are above threshold

Non-uniform memory acccess

Study These Flashcards

frame allocation problem
Many systems have a varying access speed to memory
Optimal performance comes from:
- allocating frames near the CPU that is executing the process
- modifying scheduler to schedule thread on the same system board when possible

Thrashing

Study These Flashcards

If process doesn’t have enoguh pages, the page-fault rate gets high
- page fault to get page
- replace existing frame
- slightly after replace frame back again
Leads to:
- Low CPU usage
- OS thinks: I need process that doesn’t do much I/O, let’s bring some processes to memory
  - another process in the system -> more page faults -> less CPU utilization

Demand paging works because of locality model.

Thrashing occurs because: Σ size of locality > total memory size

We can limit thrashing by using local or priority page replacement.

Working-Set Model

Study These Flashcards

Trying to solve thrashing
Δ ≡ working-set window ≡ a fixed number of page references
WSSi (Working set of Pi): total number of pages referenced in the last Δ
- Δ too small: will not take full advantage of locality
- Δ too large: will include several localities
- Δ tends to infinity: will take the whole program
D = Σ WSSi = total demand frames ≈ locality
- D > m => Thrashing
Policy:
- if D > m: suspend or swap out one of the processes

Keeping track of working set

Study These Flashcards

Keeping track of the WSS at each page fault is too expensive
- we will have an interrupt and will need to compute the working set again
Solution: approximate with timer interval, reference bit and previous reference bit state
Example
- Δ = 10000
- Interrupt timer every Δ/2 time units
- at every interrupt copy and sets the values of all reference bits to 0 and update previous reference bit
- if one of the two bit is 1 => the page has been accessed in the last Δ time units
We could improve using more bits and interrupting more frequently

Page-fault frequency

Study These Flashcards

More direct approch to solve thrashing tan WSS
Establishes an acceptable page-fault frequency (PFF)
- if current rate high -> process gains frame
- if too low -> process loses frame
Algo:
- at each page fault
- τ: time interval from previous page fault
- τ < c: page-fault freq. too high, add new frame to resident set of process
- τ >= c: freq. is okay:
  - remove from resident set of process all pages with reference bit = 0
  - clear reference bit of all other pages in resident set

Kernel memory allocation

Study These Flashcards

Treated differently (say what?!)
Often allocated from free pages pool
- Kernel requests memory for structures of varying sizes
- Some kernel memory needs to be contiguos (for device I/O)

Buddy system

* Way of allocating kernel memory * Allocates from fixed-size segment of contiguos physical space using a power of 2 allocator * A request is rounded to the next high power of two * Pros: * Coalescing: adjacent buddies can be combined to form larger segments * Cons: * fragmentation

Slab allocator

* Kernel allocation strategy * **Slab**: 1 or more physical pages that are contiguos * **Cache**: set of 1 or more slabs * Every **kind of kernel data** **structure** has a single **cache** assigned: * each cahe is filled with objects from the data structure * On cache creation: filled with obejct marked as *free* * On strctures stored: objects are marked as *used* * If slab is full * if there is a free slab frome the same cache: allocate object * otherwise: allocate new slab, and allocate object in it * Pros: * no memory fragmentation * fast memory request satisfaction

Prepaging

* At startup there are a lot of page-faults * Prepage all or some of the pages the process will need, preventing page-faults * Beware that processes may not need all prepaged paged -\> wasted I/O Cost of prepaging: prepaging is too costly if the number of used pages is low respect to the number of prepaged ones

Page size

* Important parameter that is difficult to choose * Sometime OS designers can have choice (in particular if running on custom-built CPU) * Factors to take into consideration: * Fragmentation * page table size * resolution * IO overhead * number of page faults * locality * TLB size and effectiveness * On average, growing over time * Increasing page size: * this increases TLB effectiveness but also frag. for smaller apps. * Provide multiple page sizes: * allow applications that require larger page sized the opportunity to used them without increasing fragmentation.

TLB Reach

* Amount of memory accessible from TLB * TLB reach = (TLB size) \* (Page size) * Ideally, the working set of each process is stored in TLB: * otherwise -\> too high page fault rate

I/O interlock

* Pages that are used to copy/move files shouldn't be selected for eviction. * Pinning the page locks to avoid the probelm

Virtual Memory (10) Flashcards

(30 cards)