Exam 2 Flashcards
Sharing of Memory
Simplistic view of memory
• All programs fit in memory
• Each program occupies a contiguous part of memory
Key issues:
• How are programs allocated to physical memory?
• How do we make sure applications do not step on one another?
• How do we find free space efficiently?
Naïve Idea 1: Fixed Partitions
• Memory is divided into partitions by the operator at startup
• One process per partition
• Processes assigned a partition at load time
• Loader rewrites memory addresses
• New processes wait until a sufficiently large partition becomes
available
Fixed Partitions Pros and Cons
Advantages
• Pretty simple
• Requires relatively little hardware support
Disadvantages
• Have to decide partition boundaries in advance
• Inflexible
• Can be an inefficient use of memory
• Have to know memory requirements in advance (no dynamic memory
allocation)
Naïve Idea 2: Base & Limit Registers
• Basic idea: hardware support – two special registers (for user-mode
processes)
• A base register, automatically added to each address as it is used
• A limit register, against which all accesses are tested
• Going beyond the limit triggers an error exception
• Nice idea, but requires hardware support to do an add and compare
on each access
• Not used by itself much anymore
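A minimal sketch of how base & limit translation could work, written as a software model of the hardware; the names (relocation_regs, translate) are illustrative, not any real API:

```c
/* Sketch (not real hardware): relocation and protection with
 * base & limit registers. All names are illustrative. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    uint32_t base;   /* start of the partition in physical memory */
    uint32_t limit;  /* size of the partition in bytes */
} relocation_regs;

/* Translate a process-relative address; abort on a limit violation. */
uint32_t translate(relocation_regs r, uint32_t addr) {
    if (addr >= r.limit) {              /* compare against the limit register */
        fprintf(stderr, "protection fault: %u >= %u\n", addr, r.limit);
        exit(1);
    }
    return r.base + addr;               /* add the base register */
}

int main(void) {
    relocation_regs r = { .base = 300000, .limit = 16384 };
    printf("%u\n", translate(r, 100));  /* prints 300100 */
    return 0;
}
```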
Managing Free Space: Bitmaps
• Suppose memory is divided into chunks of 10K
• Maintain a vector of 0s and 1s that specifies availability
• The i-th bit tells whether the i-th chunk is free
• For this example: 20 bits
00000011 00111110 0011
Bitmaps pros and cons
Advantages
• Simple implementation
Disadvantages
• Slow to search for free bits
• Searching for k allocation units for a process requires finding k consecutive 0s (see the sketch below)
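A rough sketch of that linear scan, using the 20-bit bitmap example above; all names are made up for illustration:

```c
/* Sketch: find k consecutive free chunks in a free-space bitmap
 * (0 = free, 1 = allocated). The linear scan shows why it is slow. */
#include <stdio.h>

/* Returns the index of the first run of k zeros, or -1 if none exists. */
int find_free_run(const int *bitmap, int nchunks, int k) {
    int run = 0;
    for (int i = 0; i < nchunks; i++) {
        run = (bitmap[i] == 0) ? run + 1 : 0;
        if (run == k)
            return i - k + 1;
    }
    return -1;
}

int main(void) {
    /* The 20-bit example from above: 00000011 00111110 0011 */
    int bitmap[20] = {0,0,0,0,0,0,1,1, 0,0,1,1,1,1,1,0, 0,0,1,1};
    printf("3 free chunks start at index %d\n", find_free_run(bitmap, 20, 3));
    return 0;
}
```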
Managing Free Space: Linked Lists
Each record has:
• Process ID or Free (H: hole)
• Start location
• Size
• Pointer to the next record
Current state in the example: (H,2,3), (B,5,5), (H,10,2), (D,12,2), (H,14,6)
Memory Allocation: Linked Lists
• Suppose a new process requests 15K; which hole should it use?
• First-fit: the 30K hole
• Best-fit: the 20K hole
• Worst-fit: the 60K hole
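A small sketch of the three placement policies applied to the hole sizes in this example; it works on an array of hole sizes rather than the full linked-list records, and all names are illustrative:

```c
/* Sketch: which hole each placement policy picks for a request. */
#include <stdio.h>

int first_fit(const int *h, int n, int req) {
    for (int i = 0; i < n; i++)
        if (h[i] >= req) return i;               /* first hole big enough */
    return -1;
}

int best_fit(const int *h, int n, int req) {
    int best = -1;
    for (int i = 0; i < n; i++)                   /* smallest hole that fits */
        if (h[i] >= req && (best < 0 || h[i] < h[best])) best = i;
    return best;
}

int worst_fit(const int *h, int n, int req) {
    int worst = -1;
    for (int i = 0; i < n; i++)                   /* largest hole that fits */
        if (h[i] >= req && (worst < 0 || h[i] > h[worst])) worst = i;
    return worst;
}

int main(void) {
    int holes[] = {30, 20, 60};   /* hole sizes in KB, in address order */
    int req = 15;
    printf("first-fit: %dK\n", holes[first_fit(holes, 3, req)]);  /* 30K */
    printf("best-fit:  %dK\n", holes[best_fit(holes, 3, req)]);   /* 20K */
    printf("worst-fit: %dK\n", holes[worst_fit(holes, 3, req)]);  /* 60K */
    return 0;
}
```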
Which Algorithm Works Best?
• All could leave many small holes that are too small to store a contiguous program:
• This is the external fragmentation problem
• Examples (assuming 1K blocks):
• Best-Fit sometimes performs better: Assume holes of 20K and 15K,
requests for 12K followed by 16K can be satisfied only by best-fit
• But First-Fit can also perform better: Assume holes of 20K and 15K,
requests for 12K, followed by 14K, and 7K, can be satisfied only by
first-fit
• In practice, which algorithm works best is highly workload
dependent
Internal Fragmentation
• Suppose we thought program B needs 50K,
but only needs 40K when it starts running
• OS allocates 50K to process B, and the remaining 10K is unused
• OS cannot allocate that 10K to any other process
External Fragmentation
• Suppose the previous 20K of free space is partially allocated to Program C (15K)
• The remaining 5K is too small to fit another program
• OS solution: relax the requirement that a program be stored contiguously in physical memory
Buddy Algorithm: Linux Memory Allocator
• Linux maintains free blocks of pages in sizes based on powers of 2
• Example: blocks of 1, 2, 4, 8, ... pages
• Page size is typically 4 KB (a tunable parameter)
When a request for memory comes in, it is rounded up to a power-of-2 number of pages, e.g. 2, 4, 8, etc.
For a request of 8 pages, (a)-(d) involve repeated halving until the right block size is reached.
(e): a second request for eight pages.
(f)-(g): a request for 4 pages; a 16-page block is split into 8 and then 4.
(h): release of an 8-page chunk.
(i): release of another 8-page chunk, followed by a merge.
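A hedged sketch of two helpers a buddy allocator might use: rounding a request up to a power of two, and finding a block's buddy by flipping one bit of its page offset. This is illustrative C, not actual Linux code:

```c
/* Sketch of two buddy-allocator helpers (illustrative, not Linux code). */
#include <stdio.h>

/* Smallest power of 2 >= n (n >= 1). */
unsigned round_up_pow2(unsigned n) {
    unsigned p = 1;
    while (p < n) p <<= 1;
    return p;
}

/* Buddy of the block starting at page `offset` with size 2^order pages.
 * The buddy differs only in bit `order` of the page offset. */
unsigned buddy_of(unsigned offset, unsigned order) {
    return offset ^ (1u << order);
}

int main(void) {
    printf("65 pages -> %u pages allocated\n", round_up_pow2(65)); /* 128 */
    printf("buddy of block at page 8 (order 3) is page %u\n",
           buddy_of(8, 3));                                        /* 0 */
    return 0;
}
```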
Buddy Algorithm Lookup Structure
Suppose all blocks of pages shown on the left (of the figure) are free; the lookup structure on the right shows the state just before the merge of the two contiguous blocks of size 2^2.
Note that each linked-list item stores the page offset of its block.
Internal Fragmentation Issues
• Ask for a 65-page chunk and the buddy allocator returns a 128-page chunk
• The 63 unused pages inside that chunk are wasted space: internal fragmentation
Slab Allocation in Linux
• A second memory allocator: the slab allocator
• Takes chunks from the buddy algorithm, but carves slabs (smaller units) from them and manages the smaller units separately
• Object caches: deal with the creation/destruction of objects of certain types
• Each cache holds pointers to one or more slabs, storing objects of the same type
Virtual Memory
Each process has separate virtual address space
Memory management's main objective: map between virtual and physical address spaces on a per-process basis
• Processes use virtual addresses, which are translated by hardware into
physical addresses
• Typically done in hardware - a Memory Management Unit (MMU)
The mapping can change during the lifetime of a process execution
• If the virtual address space is larger than physical memory, some parts of virtual memory (particularly infrequently used ones) may be stored on disk (in a swap file)
• Many addresses might not have a mapping to physical memory because the memory has not yet been allocated or has been spilled to disk
Virtual Addresses
Today, typically a virtual address is 32 or 64 bits
• 32 bits allows a process to have 2^32 bytes, or 4 GB, of virtual memory
• 64 bits allows 2^64 bytes to be addressable
• Physical memory is usually smaller than this and varies from machine to
machine
• Virtual address spaces of different processes are distinct
Two ways to organize virtual addresses:
• Paging: address space divided into fixed-size pages
• Segmentation: address space divided into variable-size segments (logical
addressing units)
Paging
Physical memory is divided into chunks called page-frames or frames
• On Linux, each page-frame is 4KB by default
Virtual memory is divided into chunks called virtual pages or pages;
size of a page is equal to size of a page frame
• In a 32-bit machine, there are 2^20 pages (a little over a million) in virtual memory
OS keeps track of mapping of pages to page-frames
Memory size calculations:
• 10-bit address : 1KB of memory; 1024 addresses
• 20-bit address : 1MB of memory; about a million addresses
• 30-bit address : 1 GB of memory; about a billion addresses
Virtual Page vs Offset
A virtual address is really a pair (p,o) of addresses
• Low-order bits give an offset o within the page
• This part stays the same after translation to physical address
• High-order bits specify the page p
• This is the part that gets translated
• E.g. If each page is 1KB and virtual address is 16 bits, then low-order 10
bits give the offset and high-order 6 bits give the page number
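The example above as a small C sketch; the constants assume 1KB pages and a 16-bit virtual address, as stated:

```c
/* Sketch: splitting a 16-bit virtual address into page number and
 * offset for 1KB (2^10-byte) pages. */
#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS 10
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1)   /* 0x3FF */

int main(void) {
    uint16_t vaddr = 0x1234;
    unsigned page   = vaddr >> OFFSET_BITS;     /* high-order 6 bits  */
    unsigned offset = vaddr & OFFSET_MASK;      /* low-order 10 bits  */
    printf("vaddr 0x%04x -> page %u, offset %u\n", vaddr, page, offset);
    return 0;
}
```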
Virtual Address Orderings
Within a page/frame, virtual addresses map to a contiguous section of physical memory
But the page-frames themselves can map anywhere in physical memory
• Out of order within a process
• Interleaved with pages from other processes
MMU and Page Table
Memory Management Unit (MMU) is a hardware component whose
job is to translate the page number p to a frame number f
• MMU is given virtual address (p,o) by CPU
• Translate to physical address (f,o)
• Request for (f,o) is sent on memory bus to fetch data from
memory
Mapping virtual p to physical f
• For every process, there is a page-table (an array of p to f
mappings)
• p is just an index into this array for the translation
• Page table is stored in physical memory (more on this later)
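A toy model of this lookup, assuming a simple one-level page table stored as an array and 4KB pages; it ignores validity checks and the TLB for clarity:

```c
/* Sketch: the MMU's job as a single array lookup. The page table here
 * is just an array indexed by page number p, giving frame number f. */
#include <stdio.h>
#include <stdint.h>

#define PAGE_BITS 12                  /* 4KB pages */

uint32_t translate(const uint32_t *page_table, uint32_t vaddr) {
    uint32_t p = vaddr >> PAGE_BITS;               /* page number */
    uint32_t o = vaddr & ((1u << PAGE_BITS) - 1);  /* offset, unchanged */
    uint32_t f = page_table[p];                    /* the p -> f mapping */
    return (f << PAGE_BITS) | o;
}

int main(void) {
    uint32_t page_table[4] = {7, 2, 5, 0};   /* toy table: p -> f */
    uint32_t vaddr = (1u << PAGE_BITS) + 42; /* page 1, offset 42 */
    printf("0x%x -> 0x%x\n", vaddr, translate(page_table, vaddr));
    return 0;
}
```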
Typical Page Table Entries
Validity bit
• Set to 0 if the corresponding page is not in memory
Frame number
• Number of bits required depends on size of physical memory
Block number
• If page is not in memory but on disk, points to disk block number
Protection bits
• Read, write, execute accesses
Referenced bit
• Set to 1 by hardware when the page is accessed: used by page
replacement policy
Modified bit (dirty bit)
• Set to 1 by hardware on write-access: used to avoid writing when
swapped out
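One possible way to sketch such an entry as C bit-fields; the field widths are illustrative and real layouts are architecture specific:

```c
/* Sketch of a page-table entry with the fields listed above.
 * Widths are illustrative only. */
#include <stdint.h>

typedef struct {
    uint32_t valid      : 1;   /* 0 if the page is not in memory       */
    uint32_t referenced : 1;   /* set by hardware on any access        */
    uint32_t modified   : 1;   /* dirty bit: set on write access       */
    uint32_t prot       : 3;   /* read / write / execute permissions   */
    uint32_t frame      : 20;  /* frame number (width depends on RAM)  */
    /* if valid == 0, a disk block number can be stored instead of frame */
} pte_t;
```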
Swap Files
The swap file starts off empty
It is created on demand as pages are spilled to disk
• Each page being swapped out is dynamically assigned a block number in the swap file
When pages are freed, holes appear in the swap file
A bitmap is used to identify holes that can be recycled for newly swapped-out pages
Design Issues
Are large page sizes or small page sizes better?
• Typically 1KB – 4KB, but more on the tradeoffs later
What if page table does not fit into memory?
• On 32-bit computers with 4KB pages, there are over a million pages, so the page table is an array with over a million entries.
• On 64-bit computers with 4KB pages, it is not possible to fit the page table into physical memory
• Solutions: multi-level page tables; inverted page tables
Reading page table requires additional memory lookup
• TLBs (Translation Lookaside Buffers)
Reading pages not in physical memory
• This is called a page fault, and it traps to kernel
• Page replacement policy: try to pick a page least likely to be used in future
for storing on disk
Multi-Level Paging
A page-table with 2^20 entries in memory is big
• And in most architectures you need one per process!
• Problem is worse on 64-bit computers
Insight: Make page tables hierarchical, load only needed portions to memory
Break virtual page number into 10-bit chunks:
• First 10-bits index into a top-level page-entry table T1 (1024 entries)
• Each entry in T1 points to another (second-level) page table with its own 1K entries (covering 4 MB of memory, since each page is 4KB)
• Next 10-bits of virtual address index into the second-level page-table selected
by the first 10-bits
Saves memory
• Total of 1K potential second-level tables, but many are likely to be unused
• If a process uses 16 MB of virtual memory, then it will have only 4 entries in the top-level table (the rest will be marked unused) and only 4 second-level tables
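A sketch of the two-level walk for a 32-bit address with 4KB pages (10 + 10 + 12 bits). The structures and names are illustrative, not a real MMU or OS implementation:

```c
/* Sketch: two-level translation of a 32-bit address with 4KB pages. */
#include <stdio.h>
#include <stdint.h>

typedef struct { uint32_t frame; int present; } pte;

uint32_t translate(pte *top_level[1024], uint32_t vaddr) {
    uint32_t i1  = vaddr >> 22;            /* top 10 bits: top-level index   */
    uint32_t i2  = (vaddr >> 12) & 0x3FF;  /* next 10 bits: second-level idx */
    uint32_t off = vaddr & 0xFFF;          /* low 12 bits: offset            */

    pte *second = top_level[i1];
    if (second == NULL || !second[i2].present) {
        fprintf(stderr, "page fault at 0x%08x\n", vaddr);
        return 0;
    }
    return (second[i2].frame << 12) | off;
}

int main(void) {
    static pte  second0[1024];
    static pte *top[1024];
    second0[3] = (pte){ .frame = 9, .present = 1 };
    top[0] = second0;
    uint32_t vaddr = (3u << 12) | 0x2A;    /* page 3, offset 42 */
    printf("0x%08x -> 0x%08x\n", vaddr, translate(top, vaddr));
    return 0;
}
```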
Translation Lookaside Buffers (TLB)
Problem: page tables are stored in main memory
Access to main memory is slow compared to the CPU clock cycle (a factor of about 10: 10 ns vs 1 ns)
• An instruction such as MOVE REG, ADDR has to decode ADDR and thus go
through page tables
• This would make things very, very slow
Standard solution: TLB in MMU
• TLB stores small cache (say, 64) of page table entries to avoid the usual
page-table lookup
TLB is associative memory (content addressable memory) that contains,
basically, pairs of the form (page-no, page-frame)
• Special hardware compares incoming page number in parallel with all
entries in TLB to retrieve page-frame
• If no match found in TLB, we do standard lookup via main memory
• TLB is like a hardware cache of page table for MMU
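A software stand-in for the TLB lookup, shown as a fragment: real hardware compares all entries in parallel, but a linear scan over a small array models the same behavior (names are illustrative; on a miss the MMU falls back to the page-table walk):

```c
/* Sketch: software model of a TLB lookup. */
#include <stdint.h>

#define TLB_SIZE 64

typedef struct { uint32_t page, frame; int valid; } tlb_entry;

/* Returns 1 on a TLB hit and stores the frame number; 0 on a miss. */
int tlb_lookup(const tlb_entry tlb[TLB_SIZE], uint32_t page, uint32_t *frame) {
    for (int i = 0; i < TLB_SIZE; i++) {
        if (tlb[i].valid && tlb[i].page == page) {
            *frame = tlb[i].frame;
            return 1;
        }
    }
    return 0;   /* miss: do the standard page-table lookup in memory */
}
```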
More on TLBs
Maximize the TLB hit rate by keeping the most recently used translations in the TLB
TLB needs to be kept in sync with page table
• Modified and reference bits can be modified in TLB for performance, but
later propagated to page table in memory
• Likewise, changes from page tables also need to be propagated to TLB
Inverted Page Tables
When virtual memory is much larger than physical memory, the overhead of storing the page table is high
• For example, in a 64-bit machine with 4KB pages and 256 MB of memory, there are just 64K page-frames but 2^52 pages!
Solution: Inverted page tables that store entries of the form (page-frame,
process-id, page-no)
• First, try to find the frame using the inverted page table; if not found, search the process's page table
• The inverted page table is bounded by physical memory size, i.e. 64K entries in this example
Problem: Given a page p, how to find the corresponding page frame?
• Use a hash table
Used more with 64-bit machines
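A sketch of the hash-table lookup, assuming chained hashing on the (process-id, page) pair; the sizes, hash function, and names are illustrative:

```c
/* Sketch: inverted page table lookup via a hash on (process id, page). */
#include <stdint.h>
#include <stddef.h>

#define NFRAMES (64 * 1024)          /* 64K frames, as in the example */

typedef struct ipt_entry {
    uint32_t pid;
    uint64_t page;
    uint32_t frame;
    struct ipt_entry *next;          /* next entry with the same hash */
} ipt_entry;

static ipt_entry *hash_table[NFRAMES];

static size_t hash(uint32_t pid, uint64_t page) {
    return (pid * 31 + page) % NFRAMES;   /* toy hash, illustrative only */
}

/* Returns 1 and sets *frame if (pid, page) is resident; 0 otherwise
 * (the OS then handles the page fault). */
int ipt_lookup(uint32_t pid, uint64_t page, uint32_t *frame) {
    for (ipt_entry *e = hash_table[hash(pid, page)]; e; e = e->next) {
        if (e->pid == pid && e->page == page) {
            *frame = e->frame;
            return 1;
        }
    }
    return 0;
}
```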
Steps in Paging
Modern systems use TLBs and multi-level paging
Paging assumes special hardware support
Overview of steps
1. Input to MMU: virtual address = (page p, offset o)
2. Check if there is a frame f with (p,f) in TLB
3. If so, physical address is (f,o)
4. If not, look up the page table in main memory
Requires several slow accesses due to multi-level paging
5. If page is present, compute physical address
6. If not, issue page fault interrupt to OS kernel
Even slower
7. Update TLB/page-table entries (e.g. Modified bit)
First Attempts
Use reference bit and modified bit in page-table entry
• Both bits are initially 0
• Read sets reference to 1, write sets both bits to 1
• Reference bit cleared on every clock interrupt (40ms)
Prefer to replace pages that were unused in the last clock interval
• First, prefer to keep pages with reference bit set to 1
• Then, prefer to keep pages with modified bit set to 1
Upon a clock interrupt, OS updates CPU-usage counters for scheduling
in PCB as well as reference bits in page tables
Easy to implement; needs additional rules to resolve ties
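This preference ordering is essentially a classification by the (R, M) bit pair; a tiny fragment sketching it (names are illustrative):

```c
/* Sketch: ranking pages by (referenced, modified) bits as described
 * above. Class 0 pages (R=0, M=0) are the preferred victims. */
typedef struct { int referenced, modified; } page_bits;

/* 0: not referenced, not modified   (best to evict)
 * 1: not referenced, modified
 * 2: referenced, not modified
 * 3: referenced, modified           (worst to evict) */
int replacement_class(page_bits p) {
    return (p.referenced << 1) | p.modified;
}
```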
Queue Based Algorithms
FIFO
• Maintain a linked list of pages in memory in order of arrival
• Replace first page in queue
• Easy to implement, but access info not used at all
Modifications
• Second-chance
• Clock algorithm
Second Chance Replacement
• Pages ordered in a FIFO queue as before
• If the page at front of queue (i.e. oldest page) has reference bit set,
then just put it at end of the queue with R=0, and try again
• Effectively, finds the oldest page with R=0, (or the first one in the
original queue if all have R=1)
• Easy to implement, but slow !!
Clock Algorithm
• Optimization of Second chance
• Keep a circular list with current pointer
• If current page has R=0 then replace, else set R to 0 and move current
pointer
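A minimal sketch of the clock hand, using a fixed array of frames to stand in for the circular list (fragment, illustrative names):

```c
/* Sketch: the clock algorithm over frames treated as a circular list.
 * `hand` is the current pointer; R[] holds the reference bits. */
#include <stddef.h>

#define NFRAMES 8

static int R[NFRAMES];       /* reference bits, set by "hardware" on access */
static size_t hand = 0;      /* current position of the clock hand */

/* Returns the index of the frame to replace. */
size_t clock_select_victim(void) {
    for (;;) {
        if (R[hand] == 0) {                 /* unreferenced: evict this one */
            size_t victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        R[hand] = 0;                        /* give it a second chance */
        hand = (hand + 1) % NFRAMES;
    }
}
```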
Least Recently Used (LRU)
Assume pages used recently will be used again soon
• Throw out page that has been unused for longest time
Past is usually a good indicator of the future.
It adds significant overhead:
• A timestamp for each memory access, updated in the page table
• A sorted list of pages by timestamp
How to Implement LRU?
Main challenge: How to implement this?
• Reference bit provides only coarse granularity ranking
Counter-based solution
• Maintain a counter that gets incremented with each memory access
• Copy the counter into the appropriate page-table entry
• On a page fault, pick the page with the lowest counter
List based solution
• Maintain a linked list of pages in memory
• On every memory access, move the accessed page to the end of the list
• On a page fault, pick the page at the front (the least recently used); see the sketch below
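A sketch of that list-based variant, using a small array in place of the linked list (illustrative, not an OS implementation):

```c
/* Sketch: on every access move the page to the tail; on a fault evict
 * the head. A fixed-size array stands in for the linked list. */
#include <stdio.h>
#include <string.h>

#define NFRAMES 4

static int lru[NFRAMES];     /* lru[0] = least recently used page */
static int used = 0;

void touch(int page) {
    int i;
    for (i = 0; i < used && lru[i] != page; i++)
        ;
    if (i < used)                                 /* already resident */
        memmove(&lru[i], &lru[i + 1], (used - i - 1) * sizeof lru[0]);
    else if (used < NFRAMES)                      /* free frame available */
        used++;
    else                                          /* evict lru[0] */
        memmove(&lru[0], &lru[1], (NFRAMES - 1) * sizeof lru[0]);
    lru[used - 1] = page;                         /* now most recently used */
}

int main(void) {
    int refs[] = {1, 2, 3, 4, 2, 5};              /* access 5 evicts page 1 */
    for (int i = 0; i < 6; i++) touch(refs[i]);
    printf("MRU page: %d, LRU page: %d\n", lru[used - 1], lru[0]);
    return 0;
}
```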
Approximating LRU: Aging
Bookkeeping on every memory access is expensive
Software solution: OS does this on every clock interrupt
Every page-entry has an additional 8-bit counter
At every clock tick, for every page in memory:
• Shift the counter 1 bit to the right copying R bit into the high-order bit of
the counter
• Clear R bit
On page-fault, or when pager daemon wants to free up space, pick the
page with lowest counter value
Intuition: high-order bits of recently accessed pages are set to 1 (the i-th high-order bit tells us whether the page was accessed during the i-th previous clock tick)
Remembers whether a page was accessed within the last 8 clock ticks
Biases the ranking of pages towards more recent accesses
Potential problem: insufficient info to resolve ties
• Only one bit of info per clock tick (typically 40 ms)
• Info about accesses more than 8 ticks ago is lost
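A sketch of the per-tick aging update described above, assuming 8-bit counters and a software-visible array of R bits (fragment, illustrative names):

```c
/* Sketch: the aging update run at every clock tick. Each page has an
 * 8-bit counter; the R bit is shifted in from the left and then cleared. */
#include <stdint.h>

#define NPAGES 16

static uint8_t age[NPAGES];  /* higher value = more recently used */
static uint8_t R[NPAGES];    /* reference bits set by hardware since last tick */

void aging_tick(void) {
    for (int i = 0; i < NPAGES; i++) {
        age[i] = (uint8_t)((age[i] >> 1) | (R[i] << 7));  /* shift R into MSB */
        R[i] = 0;                                         /* clear R bit */
    }
}

/* On a page fault, evict the resident page with the smallest age[] value. */
```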
Stack Algorithms
For an algorithm A, reference string r, and page-frames m, let
P(r, m, A) be the set of pages that will be in memory if we run A on
references r using m frames
An algorithm A is called a stack algorithm if for all r and for all m, P(r,
m, A) is a subset of P(r, m+1, A)
• Intuitively, this means the set of pages that A considers relevant grows
monotonically as more memory becomes available
• For stack algorithms, for all r and for all m, F(r, m+1, A) cannot be more than F(r, m, A), where F counts page faults (defined below), so increasing memory can only reduce faults!
LRU is a stack algorithm: P(r, m, LRU) is the set of the m most recently referenced distinct pages in r, so P(r, m, LRU) is a subset of P(r, m+1, LRU)
Belady’s Anomaly
Let F(r, m, A) be the number of page faults generated by A on reference string r with m frames
• Belady’s Anomaly: allocating more frames may increase the faults
• F(r, m, A) may be smaller than F(r, m+1, A)
FIFO is not a stack algorithm
Implication: it is not guaranteed for all workloads that increasing physical memory will result in fewer page faults (i.e. improved performance)
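A small FIFO simulator that reproduces the anomaly on the classic reference string 1 2 3 4 1 2 5 1 2 3 4 5 (a standard textbook example): FIFO incurs 9 faults with 3 frames but 10 with 4:

```c
/* Sketch: FIFO page-replacement simulator demonstrating Belady's anomaly. */
#include <stdio.h>

int fifo_faults(const int *refs, int n, int nframes) {
    int frames[16] = {0}, used = 0, next = 0, faults = 0;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < used; j++)
            if (frames[j] == refs[i]) { hit = 1; break; }
        if (hit) continue;
        faults++;
        if (used < nframes)
            frames[used++] = refs[i];          /* free frame available */
        else {
            frames[next] = refs[i];            /* evict oldest (FIFO order) */
            next = (next + 1) % nframes;
        }
    }
    return faults;
}

int main(void) {
    int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
    printf("3 frames: %d faults\n", fifo_faults(refs, 12, 3));  /* 9  */
    printf("4 frames: %d faults\n", fifo_faults(refs, 12, 4));  /* 10 */
    return 0;
}
```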
Thrashing
Will the CPU utilization keep increasing monotonically as the degree of
multiprogramming (number of processes in memory) increases?
• No! It increases for a while, but then starts dropping again!
• Why? With many processes around, each one has only a few pages in
memory, so more frequent page faults, more I/O wait, less CPU utilization
Bottom line:
The cause of low CPU utilization can be either too few or too many processes!
Paging in Linux
Paging is implemented by the kernel and a separate process called the page daemon (process 2)
• Process 0 is the idle process
• Process 1 is init
Demand-Paged System: no pre-paging
• Text segment and mapped files paged to respective files on disk
• Everything else paged into swap area
• Uses the raw disk, bypassing the file system, for performance and to avoid limits on file block sizes
• A bitmap on the paging device indicates which disk pages are free to store swapped-out memory pages