Exam 2 Flashcards

1
Q

Sharing of Memory

A

Simplistic view of memory
• All programs fit in memory
• Each program occupies a contiguous
part of memory

Key issues:
• How are programs allocated to
physical memory?
• How do we make sure applications
do not step on one another?
• How do we find free space
efficiently?
2
Q

Naïve Idea 1: Fixed Partitions

A

• Memory is divided into partitions by the operator at startup
• One process per partition
• Processes assigned a partition at load time
• Loader rewrites memory addresses
• New processes wait until a sufficiently large partition becomes
available

3
Q

Fixed Partitions Pros and Cons

A

Advantages
• Pretty simple
• Requires relatively little hardware support

Disadvantages
• Have to decide partition boundaries in advance
• Inflexible
• Can be an inefficient use of memory
• Have to know memory requirements in advance (no dynamic memory
allocation)

4
Q

Naïve Idea 2: Base & Limit Registers

A

• Basic idea: hardware support – two special registers (for user-mode
processes)
• A base register, automatically added to each address as it is used
• A limit register, against which all accesses are tested
• Going beyond the limit triggers an error exception
• Nice idea, but requires hardware support to do an add and compare
on each access
• Not used by itself much anymore

5
Q

Managing Free Space: Bitmaps

A

• Suppose memory is divided into chunks of 10K
• Maintain a vector of 0s and 1s that specifies
availability
• The i-th bit tells whether the i-th chunk is free
• For this example: 20 bits
00000011 00111110 0011

6
Q

Bitmaps pros and cons

A

Advantages
• Simple implementation

Disadvantages
• Slow to search for free bits
• Search for k allocation units for a process requires searching
for k consecutive 0s
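
A minimal sketch of that search in C (bitmap packed 8 bits per byte; names are illustrative, not from the slides):

/* Find k consecutive free (0) bits in an n-bit map; bit i covers chunk i.
   Returns the index of the first chunk in the run, or -1 if none. */
int find_free_run(unsigned char *map, int n, int k) {
    int run = 0;
    for (int i = 0; i < n; i++) {
        if (map[i / 8] & (1 << (i % 8)))
            run = 0;              /* allocated bit: restart the run */
        else if (++run == k)
            return i - k + 1;     /* found k consecutive 0s */
    }
    return -1;
}

The linear scan over every bit is exactly why this is slow for large k.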

7
Q

Managing Free Space: Linked Lists

A
Each record has
• Process ID/ Free (H: hole)
• Start location
• Size
• Pointer to Next record
• Current state:
(H,2,3),(B,5,5),(H,10,2),(D,12,2),(H,14,6)
• e.g. (H,2,3) is a hole starting at location 2 with size 3
8
Q

Memory Allocation: Linked Lists

A
• Suppose a new process requests
15K, which hole should it use?
• First-fit: 30K hole
• Best-fit: 20K hole
• Worst-fit: 60K hole
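
The three policies differ only in which suitable hole they keep; a sketch in C over a free list (struct and names are illustrative, not from the slides):

struct hole { int start, size; struct hole *next; };

/* Pick a hole for a request of req units, or NULL if none fits.
   policy: 0 = first-fit, 1 = best-fit, 2 = worst-fit. */
struct hole *pick_hole(struct hole *free_list, int req, int policy) {
    struct hole *chosen = NULL;
    for (struct hole *h = free_list; h != NULL; h = h->next) {
        if (h->size < req) continue;                    /* too small */
        if (policy == 0) return h;                      /* first fit found */
        if (chosen == NULL ||
            (policy == 1 && h->size < chosen->size) || /* smallest fit */
            (policy == 2 && h->size > chosen->size))   /* largest fit */
            chosen = h;
    }
    return chosen;
}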
9
Q

Which Algorithm Works Best?

A

• All could leave many small holes that are too small for storing a
contiguous program:
• External fragmentation problem
• Example (assuming 1K blocks):
• Best-Fit sometimes performs better: Assume holes of 20K and 15K,
requests for 12K followed by 16K can be satisfied only by best-fit
• But First-Fit can also perform better: Assume holes of 20K and 15K,
requests for 12K, followed by 14K, and 7K, can be satisfied only by
first-fit
• In practice, which algorithm works best is highly workload
dependent

10
Q

Internal Fragmentation

A

• Suppose we thought program B needs 50K,
but only needs 40K when it starts running
• OS allocates 50K to process B, and remaining
10K is unused
• OS cannot allocate 10K to any other process

11
Q

External Fragmentation

A

• Suppose previous 20K free space is partially allocated
to Program C (15K)
• Remaining 5K is too small to fit another program
• OS solution: relax the need to store a program
contiguously in physical memory

12
Q

Buddy Algo: Linux Memory Allocator

A

• Linux maintains blocks of pages sized in powers of 2
• Example: blocks of 1, 2, 4, 8, etc. pages
• Page size is typically 4 KB (tunable parameter)
When a request for memory comes in, it is rounded up to a power of 2, e.g. 2, 4, 8, etc.
In the figure: for a request of 8 pages, (a)-(d) involve repeated halving until the right
memory size is reached.
(e): second request for eight pages.
(f)-(g): request of 4 pages. Split 16 pages into 8 and then 4.
(h): release of an 8-page chunk
(i): release of another 8-page chunk and merge.
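
Two core operations make this work; a simplified C sketch (the real Linux allocator keeps a free list per power-of-2 order, which is omitted here):

/* Round a request of n pages up to the next power of two. */
unsigned round_up_pow2(unsigned n) {
    unsigned size = 1;
    while (size < n)
        size <<= 1;
    return size;
}

/* A block's buddy differs from it only in the bit equal to its size,
   so on free we can test whether the buddy is also free and merge. */
unsigned buddy_of(unsigned page_offset, unsigned size_in_pages) {
    return page_offset ^ size_in_pages;
}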

13
Q

Buddy Algorithm Lookup Structure

A

Suppose all blocks of pages on the left (in the figure) are free; then the lookup
structure on the right shows the state before the merge of the 2 contiguous
blocks of size 2^2.
Note that each linked list item stores the page offset.

14
Q

Internal Fragmentation Issues

A

Problem: Internal Fragmentation
• Ask for a 65-page chunk and get a 128-page chunk
• The unused 63 pages of the 128-page chunk are wasted space

15
Q

Slab Allocation in Linux

A

• Second memory allocator: the slab allocator
• Takes chunks using the buddy algorithm, but carves slabs (smaller units)
from them and manages the smaller units separately
• Object caches: deal with creation/destruction of objects of certain
types
• Each cache points to one or more slabs, storing objects of the same type

16
Q

Virtual Memory

A

Each process has a separate virtual address space
Memory management's main objective: map between virtual and
physical address spaces on a per-process basis
• Processes use virtual addresses, which are translated by hardware into
physical addresses
• Typically done in hardware - a Memory Management Unit (MMU)
The mapping can change during the lifetime of a process execution
• If the virtual address space is larger than physical memory, some parts of
virtual memory (particularly infrequently used ones) may be stored on disk
(swap file)
• Many addresses might not have a mapping to physical memory because they
have not yet been allocated or have been spilled to disk

17
Q

Virtual Addresses

A

Today, typically a virtual address is 32 or 64 bits
• 32 bits allows a process to have 2^32 bytes, or 4GB, of virtual memory
• 64 bits allows 2^64 bytes to be addressable
• Physical memory is usually smaller than this and varies from machine to
machine
• Virtual address spaces of different processes are distinct

Two ways to organize virtual addresses:
• Paging: address space divided into fixed-size pages
• Segmentation: address space divided into variable-size segments (logical
addressing units)

18
Q

Paging

A

Physical memory is divided into chunks called page-frames or frames
• On Linux, each page-frame is 4KB by default
Virtual memory is divided into chunks called virtual pages or pages;
size of a page is equal to size of a page frame
• In a 32-bit machine, 2^20 pages (a little over a million) in virtual memory
OS keeps track of mapping of pages to page-frames
Memory size calculations:
• 10-bit address : 1KB of memory; 1024 addresses
• 20-bit address : 1MB of memory; about a million addresses
• 30-bit address : 1 GB of memory; about a billion addresses

19
Q

Virtual Page vs Offset

A

A virtual address is really a pair (p,o) of addresses
• Low-order bits give an offset o within the page
• This part stays the same after translation to physical address
• High-order bits specify the page p
• This is the part that gets translated
• E.g. If each page is 1KB and virtual address is 16 bits, then low-order 10
bits give the offset and high-order 6 bits give the page number
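
For that 16-bit example, the split is one shift and one mask in C (macro names are illustrative):

#define OFFSET_BITS 10                        /* 1KB pages */
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1)

unsigned page_of(unsigned va)   { return va >> OFFSET_BITS; } /* high 6 bits */
unsigned offset_of(unsigned va) { return va & OFFSET_MASK; }  /* low 10 bits */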

20
Q

Virtual Address Orderings

A

Within a page/frame, virtual
addresses map to a
contiguous section of physical
memory

But the page-frames
themselves can map anywhere
in physical memory
• Out of order within a process
• Interleaved with pages from
other processes
21
Q

MMU and Page Table

A

Memory Management Unit (MMU) is a hardware component whose
job is to translate the page number p to a frame number f
• MMU is given virtual address (p,o) by CPU
• Translate to physical address (f,o)
• Request for (f,o) is sent on memory bus to fetch data from
memory

Mapping virtual p to physical f
• For every process, there is a page-table (an array of p to f
mappings)
• p is just an index into this array for the translation
• Page table is stored in physical memory (more on this later)
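
In software terms, the MMU's job looks roughly like this (a sketch assuming a flat one-level table; page_fault() stands in for the trap to the kernel):

#define OFFSET_BITS 12                   /* assuming 4KB pages */
extern void page_fault(unsigned p);      /* assumed trap handler */

struct pte { unsigned frame; unsigned valid; };

unsigned translate(struct pte *page_table, unsigned p, unsigned o) {
    if (!page_table[p].valid)
        page_fault(p);                   /* page not in memory: trap to OS */
    return (page_table[p].frame << OFFSET_BITS) | o;   /* (f,o) */
}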

22
Q

Typical Page Table Entries

A

Validity bit
• Set to 0 if the corresponding page is not in memory
Frame number
• Number of bits required depends on size of physical memory
Block number
• If page is not in memory but on disk, points to disk block number
Protection bits
• Read, write, execute accesses
Referenced bit
• Set to 1 by hardware when the page is accessed: used by page
replacement policy
Modified bit (dirty bit)
• Set to 1 by hardware on write-access: used to avoid writing when
swapped out

23
Q

Swap Files

A

The swap file starts off empty

It grows on demand as pages are spilled to disk
• Each page being swapped out is dynamically assigned a block number in the
swap file

When pages are freed, holes appear in the swap file

A bitmap is used to identify holes that can be recycled for newly swapped pages

24
Q

Design Issues

A

Are large page sizes or small page sizes better?
• Typically 1KB – 4KB, but more on the tradeoffs later

What if page table does not fit into memory?
• On 32-bit computers with 4KB pages, there are over a million pages, so the
page table is an array with over a million entries
• On 64-bit computers with 4KB pages, it is not possible to fit the page table
into physical memory
• Solutions: multi-level page tables; inverted page tables

Reading page table requires additional memory lookup
• TLBs (Translation Lookaside Buffers)

Reading pages not in physical memory
• This is called a page fault, and it traps to kernel
• Page replacement policy: try to pick a page least likely to be used in future
for storing on disk

25
Q

Multi-Level Paging

A

A page-table with 2^20 entries in memory is big
• And in most architectures you need one per process!
• Problem is worse on 64-bit computers

Insight: Make page tables hierarchical, load only needed portions to memory

Break virtual page number into 10-bit chunks:
• First 10-bits index into a top-level page-entry table T1 (1024 entries)
• Each entry in T1 points to another (second-level) page table with its own 1K
entries (4 MB of memory since each page is 4KB)
• Next 10-bits of virtual address index into the second-level page-table selected
by the first 10-bits

Saves memory
• Total of 1K potential second-level tables, but many are likely to be unused
• If a process uses 16 MB of virtual memory then it will have only 4 entries in the
top-level table (rest will be marked unused) and only 4 second-level tables
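
A sketch of the two-level lookup on a 32-bit address (assuming every level is present and entries hold bare frame numbers; a real table also checks validity bits):

unsigned translate2(unsigned **top, unsigned va) {
    unsigned i1  = va >> 22;            /* top 10 bits: index into T1 */
    unsigned i2  = (va >> 12) & 0x3FF;  /* next 10 bits: second-level index */
    unsigned off = va & 0xFFF;          /* low 12 bits: offset in 4KB page */
    unsigned *second = top[i1];         /* unused 4MB regions have no table */
    return (second[i2] << 12) | off;    /* frame number to physical address */
}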

26
Q

Translation Lookaside Buffers (TLB)

A

Problem: page tables are stored in main memory

Access to main memory is slow compared to clock cycle on CPU (factor
of about 10 – 10ns vs 1 ns)
• An instruction such as MOVE REG, ADDR has to decode ADDR and thus go
through page tables
• This would make things very, very slow

Standard solution: TLB in MMU
• TLB stores small cache (say, 64) of page table entries to avoid the usual
page-table lookup

TLB is associative memory (content addressable memory) that contains,
basically, pairs of the form (page-no, page-frame)
• Special hardware compares incoming page number in parallel with all
entries in TLB to retrieve page-frame
• If no match found in TLB, we do standard lookup via main memory
• TLB is like a hardware cache of page table for MMU

27
Q

More on TLBs

A

Maximize TLB hit rate by keeping most recently used in TLB

TLB needs to be kept in sync with page table
• Modified and reference bits can be modified in TLB for performance, but
later propagated to page table in memory
• Likewise, changes from page tables also need to be propagated to TLB

28
Q

Inverted Page Tables

A

When virtual memory is much larger than physical memory, overhead of
storing page-table is high
• For example, on a 64-bit machine with 4KB pages and 256 MB of memory, there
are just 64K page-frames but 2^52 pages!

Solution: Inverted page tables that store entries of the form (page-frame,
process-id, page-no)
• First, try to find frame using inverted page table. If not found, search page table
• Inverted page table bounded by physical memory size, i.e. 64K frames

Problem: Given a page p, how to find the corresponding page frame?
• Use a hash table

Used more with 64-bit machines
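
A sketch of the hashed lookup (one table entry per frame, chained by index; hash() and the layout are assumptions for illustration):

extern unsigned hash(int pid, unsigned page);   /* assumed hash function */

struct ipte { int pid; unsigned page; int next; };  /* next: chain index, -1 ends */

int find_frame(struct ipte *t, int *bucket, int nbuckets, int pid, unsigned page) {
    for (int f = bucket[hash(pid, page) % nbuckets]; f != -1; f = t[f].next)
        if (t[f].pid == pid && t[f].page == page)
            return f;          /* the entry's index is the frame number */
    return -1;                 /* not resident: take a page fault */
}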

29
Q

Steps in Paging

A

Modern systems use TLBs and multi-level paging

Paging assumes special hardware support

Overview of steps
1. Input to MMU: virtual address = (page p, offset o)
2. Check if there is a frame f with (p,f) in TLB
3. If so, physical address is (f,o)
4. If not, look up the page-table in main memory
   • Requires several slow accesses due to multi-level paging
5. If page is present, compute physical address
6. If not, issue page fault interrupt to OS kernel
   • Even slower
7. Update TLB/page-table entries (e.g. Modified bit)

30
Q

First Attempts

A

Use reference bit and modified bit in page-table entry
• Both bits are initially 0
• Read sets reference to 1, write sets both bits to 1
• Reference bit cleared on every clock interrupt (40ms)

Prefer to replace pages that were unused during the last clock interrupt period
• First, prefer to keep pages with reference bit set to 1
• Then, prefer to keep pages with modified bit set to 1

Upon a clock interrupt, OS updates CPU-usage counters for scheduling
in PCB as well as reference bits in page tables

Easy to implement; needs additional rules to resolve ties

31
Q

Queue Based Algorithms

A

FIFO
• Maintain a linked list of pages in memory in order of arrival
• Replace first page in queue
• Easy to implement, but access info not used at all

Modifications
• Second-chance
• Clock algorithm

32
Q

Second Chance Replacement

A

• Pages ordered in a FIFO queue as before
• If the page at front of queue (i.e. oldest page) has reference bit set,
then just put it at end of the queue with R=0, and try again
• Effectively, finds the oldest page with R=0, (or the first one in the
original queue if all have R=1)
• Easy to implement, but slow!

33
Q

Clock Algorithm

A

• Optimization of Second chance
• Keep a circular list with current pointer
• If current page has R=0 then replace, else set R to 0 and move current
pointer
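
A minimal C sketch of the hand sweep (frame array and names are illustrative):

struct frame { int r; int page; };   /* r: reference bit */

/* Advance the clock hand until a frame with r == 0 is found. */
int clock_pick(struct frame *frames, int n, int *hand) {
    for (;;) {
        int i = *hand;
        *hand = (*hand + 1) % n;     /* move hand forward */
        if (frames[i].r == 0)
            return i;                /* victim found */
        frames[i].r = 0;             /* second chance: clear R and continue */
    }
}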

34
Q

Least Recently Used (LRU)

A

Assume pages used recently will be used again soon
• Throw out page that has been unused for longest time

Past is usually a good indicator for the future.

It adds significant overhead:
• A timestamp for each memory access. Timestamp is updated in page
table
• Sorted list of pages by timestamp

35
Q

How to Implement LRU?

A

Main challenge: How to implement this?
• Reference bit provides only coarse granularity ranking

Counter-based solution
• Maintain a counter that gets incremented with each memory access
• Copy the counter into the appropriate page table entry
• On page-fault pick the page with lowest counter

List based solution
• Maintain a linked list of pages in memory
• On every memory access, move the accessed page to end
• Pick the front page on page fault

36
Q

Approximating LRU: Aging

A

Bookkeeping on every memory access is expensive

Software solution: OS does this on every clock interrupt

Every page-entry has an additional 8-bit counter

On every clock interrupt, for every page in memory:

• Shift the counter 1 bit to the right copying R bit into the high-order bit of
the counter
• Clear R bit

On page-fault, or when pager daemon wants to free up space, pick the
page with lowest counter value

Intuition: high-order bits of recently accessed pages are set to 1 (the i-th
high-order bit tells us if the page was accessed during the i-th previous clock tick)

Remembers whether a page was accessed within the last 8 clock ticks

Biases towards more recent accesses in ranking pages

Potential problem: insufficient info to resolve ties
• Only one bit of info per clock tick (typically 40ms)
• Info about accesses more than 8 ticks ago is lost
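
A sketch of the per-tick update (8-bit counters; the arrays are illustrative stand-ins for page-table fields):

/* Run on every clock interrupt: age every in-memory page by one tick. */
void age_pages(unsigned char *counter, unsigned char *r, int npages) {
    for (int i = 0; i < npages; i++) {
        counter[i] = (counter[i] >> 1) | (unsigned char)(r[i] << 7);
        r[i] = 0;                     /* clear reference bit for next tick */
    }
}
/* On a page fault, evict the page with the smallest counter value. */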

37
Q

Stack Algorithms

A

For an algorithm A, reference string r, and page-frames m, let
P(r, m, A) be the set of pages that will be in memory if we run A on
references r using m frames

An algorithm A is called a stack algorithm if for all r and for all m, P(r,
m, A) is a subset of P(r, m+1, A)
• Intuitively, this means the set of pages that A considers relevant grows
monotonically as more memory becomes available
• For stack algorithms, for all r and for all m, the fault count F(r, m+1, A) cannot
be more than F(r, m, A) (so increasing memory can only reduce faults!)

LRU is a stack algorithm: P(r, m, LRU) is the set of the m most recently referenced
pages in r, so P(r, m, LRU) is a subset of P(r, m+1, LRU)
38
Q

Belady’s Anomaly

A

Let F(r, m, A) be the faults generated by A
• Belady’s Anomaly: allocating more frames may increase the faults
• F(r, m, A) may be smaller than F(r, m+1, A)

FIFO is not a stack algorithm

Implication: it is not guaranteed for all workloads that increasing
physical memory will result in fewer page faults (i.e.
improved performance)

39
Q

Thrashing

A

Will the CPU utilization keep increasing monotonically as the degree of
multiprogramming (number of processes in memory) increases?
• No! It increases for a while, but then starts dropping again!
• Why? With many processes around, each one has only a few pages in
memory, so more frequent page faults, more I/O wait, less CPU utilization

Bottom Line:
Cause of low CPU utilization can be either too few or too many processes!

40
Q

Paging in Linux

A

Paging is implemented by the kernel and a new process called page
daemon (process 2)
• Process 0 is the idle process
• Process 1 is init

Demand-Paged System: no pre-paging
• Text segment and mapped files paged to respective files on disk
• Everything else paged into swap area
• Uses raw disk – bypass file system for performance and limits on file
block sizes
• Bitmap on paging device indicates which disk blocks are free to store
swapped-out pages

41
Q

Page Replacement in Linux

A

4 types of pages:
• Unreclaimable – reserved or locked pages, kernel stack
• Swappable – Must be written back to swap area when paged out (typical
user-level pages, heap or stack)
• Syncable pages – written to disk immediately if dirtied
• Memory-mapped files
• Discardable – can be reclaimed immediately
• Unused pages

Try to pick discardable pages first, then swappable pages (those not referenced
recently, found using a clock-like algorithm)

42
Q

Page Replacement Policy

A

Two LRU lists: Active and Inactive
• Reclaim from inactive list first
• If page has not been referenced recently, move to inactive list
• If page is referenced:
• Set reference flag to be true
• Move to active list next time it is accessed
• Two page accesses are needed for a page to be declared active
• If second access does not happen, reference flag is reset periodically
• After two timeouts, move a page to inactive state

43
Q

Page Replacement

A

We might page fault if:
• A process is allocating new memory
• A process accesses memory currently not in physical memory but
previously swapped out to disk

How to load a virtual page into physical frame during page
fault:
• If there is available space in physical memory, we just load the page
from disk into a free frame
• Otherwise, we have to select a page and swap it out (write page
contents to disk) and use the freed frame to load in new page

44
Q

Loading Page from Swap to Memory

A

When there are no free frames, the OS needs to select a frame and move its
page (the victim) to disk
• Consult the page replacement rule to select a victim
• Might belong to current process or might belong to another

After a victim is selected:
• If dirty bit is set, copy the victim page back to swap file
• If dirty bit is not set, just evict the page to free up the frame

If victim page needs to be copied to swap file:
• Copy to its previously allocated location in swap file
• If the victim has never been in the swap file, find free blocks in the swap file,
copy the page there, and note the assigned block numbers in that
process' page table
• While a page is copied to disk, current process is blocked

After frame is freed up:
• Locate block number of new page to load from swap file and schedule
read transfer request
• Block current process until complete

45
Q

Page Replacement Rule

A

If a swapped-out page is used again in the near future, it will be copied back
and forth between disk and physical memory

Performance influenced by which page we pick
• Avoid swapping actively / recently used pages to disk
• Identify “inactive” pages not likely to be used in the near future

Optimal Page Replacement rule (OPT):
• Pick the page to be accessed farthest in the future
• Can’t actually do this, of course

Real replacement rules aim to approximate OPT
• Use the past to predict the future

46
Q

Virtual Memory Performance

A

Applications work well if their heavily used virtual memory pages are in
physical memory and swapping is minimized

High penalty for page faults
• TLB lookup (small cost) << page table lookup (medium cost, a few
memory references) << disk access (high cost, writing and reading)

Page faults are inevitable since virtual address spaces are larger than
physical memory

Two factors influence whether a page is likely to be in memory:
• Amount of real memory
• Replacement policy

47
Q

I/O Devices

A
  • Storage devices
  • Floppy, Magnetic disk, Magnetic tape, CD-ROM, DVD…
  • User interaction
  • Keyboard, mouse, monitor, sound card, printer, modem ….
  • Significant fraction of OS code deals with I/O
  • Device drivers
48
Q

Device Controllers

A

• Devices have two parts
• Physical / mechanical
• Electronic / interface
• Usually including a controller
• For most devices, controller’s basic job is to convert serial bitstream
(to/from device) into blocks (to/from CPU)
• Also error handling using error correction codes
• Controller interface
• Device registers (status and other control functions)
• Data buffers

49
Q

How do Processes Access Devices?

A
  • Two basic approaches
  • Port-mapped I/O
  • IN register, port
  • Reads value from I/O port into register
  • Memory-mapped
  • Part of virtual address space really goes to devices
50
Q

Memory Mapped I/O

A

• In a machine instruction “MOV R, L”, L can refer to:
• A location in main memory, or
• A control register, or address in data buffer, of a specific I/O device
• Memory Mapped I/O:
• I/O space viewed as part of virtual memory
• Directly access control registers or data buffers by reading from
memory

51
Q

Naïve Busy Wait Device Driver

A
  • Suppose we want to send N byte buffer to some device (say, the printer)
  • Device has two registers (mapped as memory locations)
  • Status, which we read, to learn if device is ready
  • Data, which we write, to send byte to device
  • The driver busy wait on status:
for (i = 0; i < N; i++) {
    /* busy-wait for device to be free */
    while (*Status != READY)
        ;
    *Data = buffer[i];    /* send next byte to device */
}
52
Q

Device Driver with Interrupts

A
  • Driver and interrupt handler
  • Driver copies a single byte into buffer and schedules interrupt:

while (*Status != READY)
    ;
*Data = buffer[0];    /* 1 byte */
InvokeScheduler();

• Interrupt handler wakes up and reads data:

if (N == 0) Wakeup();      /* all bytes sent: unblock driver */
else {
    *Data = buffer[i];     /* send next byte */
    N--; i++;
}
ResetInterrupt();          /* notify device (ack) */
53
Q

I/O with Direct Memory Access (DMA)

A

• Problem: CPU is kept busy while device driver is writing data to
external device
• DMA controller:
• Separate hardware co-processor dedicated for I/O
• Co-processor helps the CPU copy data between memory and device
• Frees up the CPU to do other work
• DMA controller interrupts CPU only after the entire transfer
• DMA controller can be integrated with device controller, but usually
one (on the motherboard) for all devices

54
Q

Reading Disk Block without DMA

A

• If DMA is not used:
• Disk controller reads each disk block from the drive serially, byte by byte,
until the entire block is in the controller’s internal buffer
• Disk controller computes checksum to verify no read errors occur
• Controller causes an interrupt
• OS's interrupt handler reads the disk block from the controller's buffer a byte or a
word at a time (in a loop), storing it in main memory
• During this entire time, the CPU is kept busy copying from the controller's
buffer to main memory

55
Q

Device Drivers

A

• Device drivers are the interface between the OS and a device
• Whenever a new device is plugged into a computer, the OS installs new
device drivers
• Device Driver includes device-specific interrupt handlers
• Device-specific I/O routines
• Device-specific control routines (ioctl)
• Device drivers may be hard-coded part of OS kernel, dynamically
loaded, or (sometimes) part of a user process
• Depends on OS and kind of device

56
Q

Device Drivers

A
Logical positioning of
device drivers. In reality all
communication between
drivers and device
controllers goes over the
bus.
57
Q

I/O Software Layers

A

• Device drivers: start an I/O operation and block until I/O completes
• Typically structured as kernel processes, with their own states, stacks, and
program counters
• Interrupt handler: unblocks the driver, typically using semaphores or
condition variables
• Sets up a context for the interrupt service procedure
• Runs the interrupt service procedure, which extracts information from
device controller registers
• Unblocks the device driver to run
• Upon completion, context switch to a new process (which may be a high
priority process that blocked on I/O previously)

58
Q

Kinds of Devices

A

• OS-related devices
• Clock interrupt, etc.
• User related devices:
• Keyboard, mouse
• Shared by many processes:
• Disks, network interfaces, etc.
• Requires demultiplexing, i.e. relaying data between device and the
appropriate process
• Block vs character devices:
• Block devices (e.g. disks, file systems)
• Character devices (keyboards, printers)

59
Q

Naming Devices

A

• Even with memory mapped I/O, it is difficult for processes to track the
addresses used for communicating with the various devices installed
on a system
• Instead, many operating systems have a notion of a device namespace
that processes use to refer to devices
• Manipulate these devices by opening, reading/writing to file abstractions
• For example, the first printer might be named /dev/printer1

60
Q

Sharing Devices Among Processes

A

• Some devices might be used by more than one process at a time
• For example, multiple processes writing to same disk
• Solutions: Have all access go through a subsystem that translates
requests, schedules I/O and enforces order
• Might be in OS kernel or in a user process
• For example, file system manages access to disks

61
Q

Socket API

A

• Applications use the Socket Application Programming Interface (API)
to access the network

62
Q

Sockets

A
  • Well-known OS abstraction denoting an endpoint of a connection
  • A socket is associated with each end-point (end-host) of a connection
  • Identified by IP address and port number
  • Berkeley sockets is the most popular network API
  • Runs on Linux, FreeBSD, OS X, Windows
  • Feed off popularity of TCP/IP
63
Q

Naming and Addressing

A
  • IP address
  • Identifies a single host
  • 32 bits (IPv4), 128 bits (IPv6)
  • Written as dotted octets (8 bits)
  • e.g., 0x0a000001 is 10.0.0.1
  • Host name
  • Identifies a single host
  • Variable length string
  • Maps to one or more IP addresses
  • e.g., www.upenn.edu
  • gethostbyname() translates name to IP address
64
Q

Knowing What Port Number To Use

A
  • Popular applications have well-known ports
  • E.g., port 80 for Web and port 25 for e-mail
  • See http://www.iana.org/assignments/port-numbers
  • Well-known vs. ephemeral ports
  • Network service has a well-known port (e.g., port 80)
  • Between 0 and 1023
  • Unused ephemeral (i.e., temporary) port
  • Between 1024 and 65535
65
Q

Byte Ordering

A

For a 16-bit value stored at increasing memory addresses A and A+1:

little-endian - low-order byte at address A, high-order byte at address A+1
big-endian - high-order byte at address A, low-order byte at address A+1
(network byte order is big-endian)

66
Q

Byte Ordering Solution

A
uint16_t htons(uint16_t host16bitvalue);
uint32_t htonl(uint32_t host32bitvalue);
uint16_t ntohs(uint16_t net16bitvalue);
uint32_t ntohl(uint32_t net32bitvalue);
• Use for all numbers (int, short) to be sent across network
• Typically used for port numbers
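
A typical use when filling in an address structure (a sketch; the IP and port values passed in are arbitrary):

#include <string.h>
#include <stdint.h>
#include <netinet/in.h>

void make_addr(struct sockaddr_in *sin, uint32_t ip, uint16_t port) {
    memset(sin, 0, sizeof(*sin));
    sin->sin_family = AF_INET;
    sin->sin_port = htons(port);           /* 16-bit value: htons */
    sin->sin_addr.s_addr = htonl(ip);      /* 32-bit value: htonl */
}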
67
Q

Types of Sockets

A
  • Different types of sockets implement different service models
  • Stream vs datagram
  • Stream Socket (aka TCP)
  • Connection-oriented (includes establishment + termination)
  • Reliable, in order delivery
  • At-most-once delivery, no duplicates
  • E.g., ssh, http
  • Datagram Socket (aka UDP)
  • Connectionless (just data-transfer)
  • “Best-effort” delivery, possibly lower variance in delay
  • E.g., IP Telephony, streaming audio

• Stream Sockets
• No need to packetize data
• Data arrives in the form of a byte-stream
• Receiver needs to separate messages in stream
• Datagram Sockets
• User packetizes data before sending
• Maximum size of 64Kbytes
• “Hello there!” and “I love programming” will definitely be sent in separate
packets at network layer

68
Q

Stream Sockets

A
  • Implements Transmission Control Protocol (TCP)
  • Does NOT set up virtual-circuit!
  • Sequence of actions:
69
Q

Initialize (Client + Server)

A

int s = socket(PF_INET, SOCK_STREAM, 0);
if (s < 0) { /* error */ }

• Create a socket
• int socket(int domain, int type, int protocol)
• Returns a descriptor (or handle) for the socket
• Arguments:
• Domain: protocol family
• PF_INET for the Internet
• Others: PF_INET6, PF_PACKET, PF_APPLETALK
• Type of communication:
• SOCK_STREAM: reliable byte stream
• SOCK_DGRAM: message-oriented service
• Protocol: normally only one protocol for each type of communication
within each protocol family, so we normally set it to 0

70
Q

Client: Connecting Socket to the Server

A

if (connect(s, (struct sockaddr *)&sin,
            sizeof(sin)) < 0) {
    perror("simplex-talk: connect");
}

• Establishing the connection
• int connect(int sockfd, struct sockaddr *server_address,
socketlen_t addrlen)
• Arguments: socket descriptor, server address, and address size
• Returns 0 on success, and -1 if an error occurs
• Translating the server’s name to an address
• struct hostent *gethostbyname(char *name)
• Argument: host name (e.g., “www.cnn.com”)
• sockaddr: AF_INET, host address, htons(PORT)

71
Q

Client: Sending and Receiving Data

A
  • Sending data
  • ssize_t send(int sockfd, const void *buf, size_t len, int flags)
  • Arguments: socket descriptor, pointer to data buffer, buffer size, and flags
  • Returns the number of characters written, and -1 on error
  • Receiving data
  • ssize_t recv(int sockfd, void *buf, size_t len, int flags)
  • Arguments: socket descriptor, pointer to data buffer, buffer size, and flags
  • Returns the number of characters read (0 = "end of file"), and -1 on error
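
Putting the client-side calls together, a minimal sketch (the host, port, and request string are arbitrary examples):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void) {
    struct hostent *hp = gethostbyname("www.cnn.com");   /* name -> IP */
    if (hp == NULL) return 1;

    struct sockaddr_in sin;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    memcpy(&sin.sin_addr, hp->h_addr_list[0], hp->h_length);
    sin.sin_port = htons(80);                            /* well-known port */

    int s = socket(PF_INET, SOCK_STREAM, 0);
    if (s < 0) { perror("socket"); return 1; }
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        perror("connect"); return 1;
    }

    char *msg = "GET / HTTP/1.0\r\n\r\n";
    send(s, msg, strlen(msg), 0);

    char buf[512];
    ssize_t n;
    while ((n = recv(s, buf, sizeof(buf), 0)) > 0)       /* 0 = end of stream */
        fwrite(buf, 1, n, stdout);
    close(s);
    return 0;
}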
72
Q

Disk Blocks

A

Basic unit of disk storage is the “block”
• Disk controller provides view of disk as array of blocks
• Some FSs support several different block sizes
• Block size based on hardware sector size

Reads and writes are of an entire block at a time

Typical block size: 2KB-16KB

Tradeoff:
• If blocks too small, more data to manage, slow to write several blocks at a
time
• If blocks too big, wasted space

73
Q

Disk Layout

A

Disk drives typically divided into one or more logical drives, called
“partitions”
• Partition Table at fixed place on disk gives layout

Each partition may contain
• Boot block (loads OS in kernel space)
• Superblock (contains basic info about file system which is read into
memory at boot time)
• Free space data structure
• List of I-nodes (or other data structure) giving info about individual files
(more on this later…)
• Allocatable blocks for directory and file data

74
Q

The Classic Unix File System

A


Each disk partition has 4 Regions

Block 0: boot block

Block 1: super block. Contains the size of the disk and the boundaries of
the other regions

Free space management: meta data indicating where free blocks are

I-nodes: list of i-nodes, each a 64-byte structure

Root directory block: for bootstrapping directory traversal

Free storage blocks for the contents of files and directories

Each i-node contains owner, protection bits, size, directory/file flag, and 13
disk addresses.

The first 10 of these addresses point directly at the first 10 blocks of a file.
If a file is larger than 10 blocks (5,120 bytes), the 11th points at a block for
secondary indexing (single indirect block)

75
Q

Disk Space Allocation

A
Similar to the memory allocation problem
• How to represent free/allocated blocks
• How to allocate & free up blocks
Some practical differences, however
• Performance of disk isn't like memory
• Access patterns may be different
Goals
• Fast sequential access
• Fast random access
• Ability to dynamically grow
• Minimize fragmentation
76
Q

Allocation Schemes

A
• Contiguous allocation (fixed)
  • CD-ROMs, etc.
• Linked list allocation
• Linked list with file allocation table (FAT)
  • MS-DOS FAT File System
• Linked list with indexing (i-nodes)
  • Unix File System
77
Q

Linked List Allocation

A


Natural idea: linked list of blocks for each file

Directory entry of a file points to first block, and each block begins with
a pointer to next block

No external fragmentation

But random access is slow

To get to block n, n-1 blocks have to be read up front

78
Q

Linked List via File Allocation Tables (FAT)

A


Problem: pointers in (slow) disk data blocks

Solution: put them somewhere else

FAT is an array indexed by block number

Each array entry is a pointer to the next block

A copy of the FAT is kept in main memory

Random access still requires traversing the chain of pointers; however,
the pointers are in main memory and no disk I/O is needed

Problem: the entire FAT must be in memory, and as disk size increases,
so does the FAT
• A 20GB disk with 1K block size needs 20 million entries, which requires
entry sizes of at least 3 bytes, resulting in a FAT of 60MB
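
A sketch of how random access walks the chain (FAT held as an int array in memory; FAT_EOF is an illustrative end-of-chain marker):

#define FAT_EOF (-1)

/* Return the disk block holding logical block n of a file whose
   first block (from its directory entry) is `first`. */
int fat_block(int *fat, int first, int n) {
    int b = first;
    while (n-- > 0 && b != FAT_EOF)
        b = fat[b];            /* each entry points to the next block */
    return b;                  /* FAT_EOF if the file is too short */
}

No disk I/O occurs during the walk, but it is still n pointer hops per lookup.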

79
Q

Directories

A
A directory entry provides the info needed to find the disk data blocks
of a file
• Disk address of first block and size
• Or address of first block
• Or number of associated i-node

File attributes can be stored in the directory entry (Windows) or in its i-node (Unix)

A file's name is usually stored in its directory entry
• May be fixed size or variable length
• e.g., Unix supports long file names (255 chars)

80
Q

Implementing Directories

A
  • (a) A simple MS-DOS directory
  • Fixed size entries
  • Disk addresses and attributes in directory entry

• (b) Unix directory in which each entry has name and
pointer to file’s i-node

81
Q

Directory Entry

A

• Each entry has a file name and an i-node number
• The i-node contains all the attributes
• Restriction: file name size is bounded (14 chars); newer versions use a
heap to allow longer names

82
Q

Traversing Directory Structure

A

• Suppose we want to read the file /usr/ast/mbox
• Location of an i-node, given its i-number, can be computed easily
• I-number of the root directory is known (usually 2)
• Read i-node 2 from the disk into memory
• Find the location of the root directory file on disk, and read the
directory block into memory
• If the directory spans multiple blocks, read blocks until usr is found
• Find the i-number of directory usr, which is 6
• Read i-node 6 from disk
• Find the location of the directory file usr on disk, and read in the
block containing this directory
• Find the i-number of directory ast (26), and repeat

83
Q

Links

A

May need more than one name for a file
• May be in a different directory

Hard link - early binding
• Pointer to the i-node added to the directory entries
• Link counter in the i-node linked to is incremented
• I-node can only be deleted (data addresses cleared) if counter goes down
to 0 (why?)
• Only then can space be reclaimed

Soft link (symbolic) - late binding
• New special file created of type LINK, which contains just the path name
• New file also has its own i-node created
84
Q

Managing Free Disk Space

A

Familiar problem: keeping track of unallocated blocks

Linked Lists
• A list of blocks containing addresses of free blocks
• e.g., each block address is 32 bits (4 bytes)
• A 1KB block can hold the addresses of 255 free blocks plus the address of
the next block in the list
• Note: the list can be stored in the free blocks themselves

Bitmap method:
• An array of bits, one entry per block, indicating whether that block is free
or not
• A 16-GB disk has 16 million 1-KB blocks
• Storing the bitmap requires 16 million bits, or 2048 blocks

85
Q

Caching

A

• Since disk I/O is slow, a part of memory is reserved to hold disk blocks
that are likely to be accessed repeatedly
• Basic data structure:
• Maintain a doubly linked list of blocks
• Use hashing so that a block can be quickly accessed from its address
• Efficient LRU implementation possible: every time a block is accessed
move it to the back of the list (no special hardware needed)
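
A sketch of that move-to-rear step on the doubly linked list (struct and names are illustrative; the hash-table side is omitted):

struct buf {
    int block_no;                /* disk block held in this buffer */
    struct buf *prev, *next;     /* LRU list links: front = LRU, rear = MRU */
};

/* On every access, unlink the buffer and splice it in at the rear. */
void touch(struct buf *b, struct buf **front, struct buf **rear) {
    if (b == *rear) return;                          /* already most recent */
    if (b->prev) b->prev->next = b->next; else *front = b->next;
    if (b->next) b->next->prev = b->prev;
    b->prev = *rear;
    b->next = NULL;
    (*rear)->next = b;
    *rear = b;
}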

86
Q

Cache Writing Strategies

A

If there is a crash, modified blocks in the cache will be lost
• Especially bad for blocks containing i-nodes or the FAT
• And LRU guarantees that exactly the worst blocks are the ones lost!

MS-DOS: write-through cache
• Every time a block is modified, a request to write it to disk is issued
• A loop with putchar() generates busy disk I/O
• Reads can still be fast

87
Q

File System Backup

A
• Disks aren't very reliable
• Backup purpose
  • Recover from (natural) disaster
  • Recover from error
  • Recover from attack
• Backup tools
  • Full dump
  • Incremental dump
  • Physical dump
  • Logical dump
88
Q

Logical Dump

A

A file system to be dumped
• Squares are directories, circles are files
• Shaded items, modified since last dump
• Each directory & file labeled by i-node number

89
Q

Logical Dump Bitmaps

A

(a) all directories and modified files
(b) all modified files, their parent directories and modified directories
(c) all directories to be dumped
(d) all files to be dumped

90
Q

File System Consistency

A
• Disk I/O is buffered (delayed write)
• There may be a crash before modified blocks in memory are written back
to the disk
• File system might become inconsistent
• Consistency checkers: fsck (Unix), scandisk (Windows)
• Block consistency: a block is either used by a file (listed in an i-node) or is
free (listed in the free list)
• File consistency: a file has the same number of directory entries (i.e. the
same number of hard links) as the link count in its i-node
91
Q

Overview of Virtualization

A


Allows a single physical machine to run multiple operating systems
concurrently, each in its own virtual machine, or VM

Key terminology:
• Host: the underlying hardware system
• Virtual machine manager (VMM) or hypervisor: creates and runs
virtual machines on the host machine
• Guest: a process, usually an operating system, that runs over the VMM

92
Q

Types of Hypervisors

A

Many different types:

Type 0 Hypervisors - Hardware-based solutions that provide support for
virtual machine creation and management via firmware. Early IBM approach.

Type 1 Hypervisors – Operating systems that provide standard operating
system functions integrated with VMM functions
• Including VMware ESX, Citrix XenServer, Microsoft Windows Server with Hyper-V,
and RedHat Linux with KVM

Type 2 Hypervisors - Applications that run on standard operating systems
but provide VMM features to guest operating systems
• Including VMware Workstation and Fusion, and Oracle VirtualBox

93
Q

Other VMM Variations

A

Paravirtualization - Technique in which the guest operating system is
modified to work in cooperation with the VMM to optimize performance
• Programming-environment virtualization - VMMs do not virtualize real
hardware but instead create an optimized virtual system
• Used by Oracle Java and Microsoft .NET
• Emulators – Allow applications written for one hardware environment to
run on a very different hardware environment, such as a different type of
CPU
• Application containment - Not virtualization at all, but rather provides
virtualization-like features by segregating applications from the operating
system, making them more secure and manageable
• Oracle Solaris Zones, BSD Jails, IBM AIX WPARs, Linux containers (LXC)

94
Q

Benefits of Virtualization

A


End users: run different OSes on a single machine without rebooting
or dealing with different partitions

VMs are isolated from other VMs
• Allows sandbox testing (experimental operating systems or those with
security vulnerabilities)

Data center management in the cloud computing era:
• Freeze or suspend a running VM
• Then it can be moved or copied somewhere else and resumed
• Periodic backups for restoring to a consistent state
• Some VMMs allow multiple snapshots per VM
• Load balancing: migrate VMs from heavily loaded to less loaded machines
• Templating: create an OS + application VM, then clone multiple copies easily

95
Q

Building Blocks

A


Virtual CPU (vCPU)
• Represents the state of a CPU per guest, as the guest believes it to be
• When a guest is context-switched onto a CPU by the VMM, information from
its vCPU is loaded and stored
• Analogous to the PCB, but for the entire virtual machine state
• Several techniques, as described in the next slides

96
Q

OS Component: CPU Scheduling

A


VMM configures each guest OS with a certain number of vCPUs
• Can be adjusted throughout the life of the VM

Ideal case: each vCPU is tied to one actual physical CPU

Non-ideal case: multiple vCPUs are mapped to each physical CPU

Overcommitment scenario:
• VMM uses standard scheduling algorithms to decide which vCPU to run next
97
Q

Shadow Page Tables

A

• Guest OS creates page tables, but these are not used by actual
hardware MMU
• VMM manages the real page tables, one set per virtual machine. These
are called shadow page tables
• Shadow page tables map from guest virtual address to actual physical
addresses, bypassing guest physical address
• These tables are loaded into the MMU on a context switch

Advantage:
• Replaces the gva -> gpa and gpa -> hpa translations with a single
gva -> hpa translation

Maintaining the shadow page table:
• VMM maps guest OS page tables as read-only (identifies these pages and
marks them read-only)
• When the guest OS writes to its page table, it traps to the VMM
• VMM applies the write to both the shadow page table and the guest page table

98
Q

Types of VMs: Paravirtualization

A


What if we can modify the guest operating system to take advantage of
the VMM?
• VMM provides services that the guest must be modified to use
• Leads to increased performance
• Less needed as hardware support for VMs grows

Xen, the leader in the paravirtualized space, adds several techniques
• For example, clean and simple device abstractions
• Efficient I/O
• Good communication between guest and VMM about device I/O
• Each device has a circular buffer shared by guest and VMM via shared memory

99
Q

Types of VMs: Containers

A


What if we care more about isolation and supporting multiple tenants,
and less about running different operating systems on the same
physical machine?

Retain desired aspects of VMMs: segregation of apps, performance
and resource management, and easy start, stop, move, and management
of them

Forgo running different operating systems, and avoid the performance
penalty of running full-fledged VMMs

100
Q

Programming Environment Virtualization

A


Java Virtual Machine (JVM) as the classic example

Java: popular language / application environment invented by Sun
Microsystems in 1995
• Write once, run anywhere

Java programs are compiled into architecture-neutral bytecode (.class
files) which the JVM loads and runs

The JVM can be operating-system specific, but the bytecode is common
across all platforms