Exam 2 Flashcards

1
Q

Sharing of Memory

A

Simplistic view of memory
• All programs fit in memory
• Each program occupies a contiguous
part of memory

Key issues:
• How are programs allocated to
physical memory?
• How do we make sure applications
do not step on one another?
• How do we find free space
efficiently?
2
Q

Naïve Idea 1: Fixed Partitions

A

• Memory is divided into partitions by the operator at startup
• One process per partition
• Processes assigned a partition at load time
• Loader rewrites memory addresses
• New processes wait until a sufficiently large partition becomes
available

3
Q

Fixed Partitions Pros and Cons

A

Advantages
• Pretty simple
• Requires relatively little hardware support

Disadvantages
• Have to decide partition boundaries in advance
• Inflexible
• Can be an inefficient use of memory
• Have to know memory requirements in advance (no dynamic memory
allocation)

4
Q

Naïve Idea 2: Base & Limit Registers

A

• Basic idea: hardware support – two special registers (for user-mode
processes)
• A base register, automatically added to each address as it is used
• A limit register, against which all accesses are tested
• Going beyond the limit triggers an error exception
• Nice idea, but requires hardware support to do an add and compare
on each access
• Not used by itself much anymore
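
A minimal C sketch of the add-and-compare the hardware performs on each access; the register values here are hypothetical:

```
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical register values for one user-mode process. */
static uint32_t base_reg  = 0x40000;  /* start of the partition */
static uint32_t limit_reg = 0x10000;  /* size of the partition  */

/* Translate an address the way the hardware would:
 * compare against the limit, then add the base. */
uint32_t translate(uint32_t vaddr) {
    if (vaddr >= limit_reg) {
        fprintf(stderr, "fault: address 0x%x beyond limit\n", vaddr);
        exit(1);              /* hardware would raise an exception */
    }
    return base_reg + vaddr;
}

int main(void) {
    printf("0x%x\n", translate(0x1234));  /* ok: prints 0x41234 */
    translate(0x20000);                   /* traps: beyond limit */
}
```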

5
Q

Managing Free Space: Bitmaps

A

• Suppose memory is divided into chunks of 10KB
• Maintain a vector of 0s and 1s that specifies
availability
• The i-th bit tells whether the i-th chunk is free
• For this example: 20 bits
00000011 00111110 0011

6
Q

Bitmaps pros and cons

A

Advantages
• Simple implementation

Disadvantages
• Slow to search for free bits
• A search for k allocation units for a process requires searching
for k consecutive 0s (see the sketch below)
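
A minimal C sketch of that slow linear search, assuming one bit per chunk with 0 meaning free:

```
#include <stddef.h>

/* Test whether chunk i is allocated (bit i of the bitmap). */
static int chunk_used(const unsigned char *bm, size_t i) {
    return (bm[i / 8] >> (i % 8)) & 1;
}

/* Find the first run of k consecutive free (0) chunks.
 * Returns the starting chunk index, or -1 if none exists.
 * This linear scan is exactly the slow search the card mentions. */
long find_free_run(const unsigned char *bm, size_t nchunks, size_t k) {
    size_t run = 0;
    for (size_t i = 0; i < nchunks; i++) {
        run = chunk_used(bm, i) ? 0 : run + 1;
        if (run == k)
            return (long)(i - k + 1);
    }
    return -1;
}
```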

7
Q

Managing Free Space: Linked Lists

A
Each record has
• Process ID, or free (H: hole)
• Start location
• Size
• Pointer to next record
• Current state: (H,2,3), (B,5,5), (H,10,2), (D,12,2), (H,14,6) — see the struct sketch below
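
As a sketch, the record list above could be written in C like this (field names are illustrative):

```
#include <stddef.h>

/* One record of the free/allocated list. */
struct mem_record {
    char tag;    /* 'H' for hole, or a process ID like 'B' */
    int  start;  /* starting chunk index */
    int  size;   /* length in chunks */
    struct mem_record *next;
};

/* The state (H,2,3),(B,5,5),(H,10,2),(D,12,2),(H,14,6) as a list: */
struct mem_record r5 = {'H', 14, 6, NULL};
struct mem_record r4 = {'D', 12, 2, &r5};
struct mem_record r3 = {'H', 10, 2, &r4};
struct mem_record r2 = {'B',  5, 5, &r3};
struct mem_record r1 = {'H',  2, 3, &r2};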
8
Q

Memory Allocation: Linked Lists

A
• Suppose a new process requests 15K: which hole should it use? (The example figure, not reproduced here, evidently shows holes of 30K, 20K, and 60K, with the 30K hole first in address order.)
• First-fit: the 30K hole (the first one large enough)
• Best-fit: the 20K hole (the smallest one that fits)
• Worst-fit: the 60K hole (the largest hole)
A sketch of first-fit and best-fit over a hole list follows.
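
A hedged C sketch (struct and names are illustrative): first-fit takes the first hole that is large enough, while best-fit scans the whole list for the smallest adequate hole. Worst-fit is best-fit with the comparison reversed.

```
#include <stddef.h>

struct hole { int start, size; struct hole *next; };

/* First-fit: take the first hole big enough. */
struct hole *first_fit(struct hole *h, int req) {
    for (; h; h = h->next)
        if (h->size >= req) return h;
    return NULL;
}

/* Best-fit: take the smallest hole that still fits
 * (requires scanning the whole list). */
struct hole *best_fit(struct hole *h, int req) {
    struct hole *best = NULL;
    for (; h; h = h->next)
        if (h->size >= req && (!best || h->size < best->size))
            best = h;
    return best;
}
```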
9
Q

Which Algorithm Works Best?

A

• All could leave many small holes that are too small for storing a
contiguous program:
• External fragmentation problem
• Example (assuming 1K blocks):
• Best-fit sometimes performs better: with holes of 20K and 15K,
requests for 12K followed by 16K can be satisfied only by best-fit
(12K goes into the 15K hole, leaving the 20K hole for the 16K request)
• But first-fit can also perform better: with holes of 20K and 15K,
requests for 12K, then 14K, then 7K can be satisfied only by first-fit
(12K into 20K leaves 8K, 14K into 15K leaves 1K, and 7K fits the 8K hole)
• In practice, which algorithm works best is highly workload
dependent

10
Q

Internal Fragmentation

A

• Suppose we thought program B needs 50K,
but only needs 40K when it starts running
• OS allocates 50K to process B, and remaining
10K is unused
• OS cannot allocate 10K to any other process

11
Q

External Fragmentation

A

• Suppose previous 20K free space is partially allocated
to Program C (15K)
• Remaining 5K is too small to fit another program
• OS solution: relax the requirement that a program be stored
contiguously in physical memory

12
Q

Buddy Algorithm: Linux Memory Allocator

A

• Linux maintains free blocks in power-of-2 sizes, measured in pages
  • Example: blocks of 1, 2, 4, 8, ... pages
  • Page size is typically 4KB (a tunable parameter)
• When a request for memory comes in, it is rounded up to a power of 2, e.g. 2, 4, 8, ... pages
• In the accompanying figure (not reproduced here):
  • (a)-(d): for a request of 8 pages, a larger block is repeatedly halved until the right size is reached
  • (e): a second request for eight pages
  • (f)-(g): a request for 4 pages; a 16-page block is split into 8 and then 4
  • (h): release of an 8-page chunk
  • (i): release of another 8-page chunk, and the buddies merge
A sketch of the rounding step follows.
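
A minimal sketch of that rounding, assuming requests are counted in pages; the allocator then halves a larger free block repeatedly until this size is reached:

```
/* Round a request of n pages up to the next power of two,
 * as the buddy allocator does before searching its free lists. */
static unsigned round_up_pow2(unsigned n) {
    unsigned size = 1;
    while (size < n)
        size <<= 1;
    return size;
}
/* e.g. round_up_pow2(5) == 8: a request for 5 pages consumes
 * an 8-page block, obtained by repeatedly halving a larger one. */
```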

13
Q

Buddy Algorithm Lookup Structure

A

The buddy allocator keeps a free list per block size; each linked-list item stores the page offset of its free block. In the missing figure, all blocks of pages on the left are free, and the lookup structure on the right is shown just before the merge of the 2 contiguous blocks of size 2^2.

14
Q

Internal Fragmentation Issues

A

Problem: internal fragmentation
• Ask for a 65-page chunk and get a 128-page chunk
• The remaining 63 pages are wasted space inside the allocation

15
Q

Slab Allocation in Linux

A

• A second memory allocator: the slab allocator
• Takes chunks using the buddy algorithm, but carves slabs (smaller units)
from them and manages the smaller units separately
• Object caches: deal with creation/destruction of objects of certain
types
  • Each cache points to one or more slabs storing objects of the same type (see the sketch below)
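
A minimal sketch of one object cache, assuming a single page-sized slab and illustrative sizes; real slab allocators track many slabs and more state per cache:

```
#include <stddef.h>

#define OBJ_SIZE   64
#define SLAB_BYTES 4096            /* one page from the buddy allocator */

struct cache {
    void *free_list;               /* next free object, or NULL */
};

/* Carve a page-sized chunk into OBJ_SIZE objects on a free list. */
void cache_init(struct cache *c, void *slab) {
    char *p = slab;
    c->free_list = NULL;
    for (size_t off = 0; off + OBJ_SIZE <= SLAB_BYTES; off += OBJ_SIZE) {
        *(void **)(p + off) = c->free_list;  /* link object into list */
        c->free_list = p + off;
    }
}

void *cache_alloc(struct cache *c) {
    void *obj = c->free_list;
    if (obj)
        c->free_list = *(void **)obj;        /* pop from free list */
    return obj;
}

void cache_free(struct cache *c, void *obj) {
    *(void **)obj = c->free_list;            /* push back on free list */
    c->free_list = obj;
}
```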

16
Q

Virtual Memory

A

Each process has its own virtual address space
Memory management's main objective: map between virtual and
physical address spaces on a per-process basis
• Processes use virtual addresses, which are translated by hardware into
physical addresses
• Typically done in hardware, by a Memory Management Unit (MMU)
The mapping can change during the lifetime of a process's execution
• If the virtual address space is larger than physical memory, some parts of virtual
memory (particularly infrequently used ones) may be stored on disk
(the swap file)
• Many addresses might not have a mapping to physical memory, because they
have not yet been allocated or have spilled over to disk

17
Q

Virtual Addresses

A

Today, typically a virtual address is 32 or 64 bits
• 32 bits allows a process to have 2^32 bytes, or 4GB, of virtual memory
• 64 bits allows 2^64 bytes to be addressable
• Physical memory is usually smaller than this and varies from machine to
machine
• Virtual address spaces of different processes are distinct

Two ways to organize virtual addresses:
• Paging: address space divided into fixed-size pages
• Segmentation: address space divided into variable-size segments (logical
addressing units)

18
Q

Paging

A

Physical memory is divided into chunks called page-frames or frames
• On Linux, each page-frame is 4KB by default
Virtual memory is divided into chunks called virtual pages or pages;
size of a page is equal to size of a page frame
• In a 32-bit machine, there are 2^20 pages (a little over a million) in virtual memory
OS keeps track of mapping of pages to page-frames
Memory size calculations:
• 10-bit address : 1KB of memory; 1024 addresses
• 20-bit address : 1MB of memory; about a million addresses
• 30-bit address : 1 GB of memory; about a billion addresses

19
Q

Virtual Page vs Offset

A

A virtual address is really a pair (p,o) of addresses
• Low-order bits give an offset o within the page
• This part stays the same after translation to physical address
• High-order bits specify the page p
• This is the part that gets translated
• E.g. If each page is 1KB and virtual address is 16 bits, then low-order 10
bits give the offset and high-order 6 bits give the page number
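
For the 16-bit address/1KB page example, the split can be written as two C macros (the names are ours):

```
#include <stdint.h>

/* 16-bit virtual address, 1KB pages: low 10 bits are the
 * offset, high 6 bits are the page number. */
#define OFFSET_BITS 10
#define PAGE_OF(va)   ((uint16_t)(va) >> OFFSET_BITS)
#define OFFSET_OF(va) ((uint16_t)(va) & ((1u << OFFSET_BITS) - 1))
/* e.g. va = 0x1A35 -> page 6, offset 0x235 */
```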

20
Q

Virtual Address Orderings

A

Within a page/frame, virtual
addresses map to a
contiguous section of physical
memory

But the page-frames
themselves can map anywhere
in physical memory
• Out of order within a process
• Interleaved with pages from
other processes
21
Q

MMU and Page Table

A

Memory Management Unit (MMU) is a hardware component whose
job is to translate the page number p to a frame number f
• MMU is given virtual address (p,o) by CPU
• Translate to physical address (f,o)
• Request for (f,o) is sent on memory bus to fetch data from
memory

Mapping virtual p to physical f
• For every process, there is a page-table (an array of p to f
mappings)
• p is just an index into this array for the translation
• Page table is stored in physical memory (more on this later)
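
A software sketch of what the MMU does in hardware, assuming a flat one-level table and ignoring validity checks:

```
#include <stdint.h>

#define OFFSET_BITS 12                /* 4KB pages */
#define NPAGES      (1u << 20)        /* 2^20 pages on a 32-bit machine */

/* Per-process page table: page_table[p] holds the frame number f. */
extern uint32_t page_table[NPAGES];

uint32_t mmu_translate(uint32_t vaddr) {
    uint32_t p = vaddr >> OFFSET_BITS;              /* page number */
    uint32_t o = vaddr & ((1u << OFFSET_BITS) - 1); /* offset      */
    uint32_t f = page_table[p];                     /* p indexes the array */
    return (f << OFFSET_BITS) | o;                  /* physical (f,o)      */
}
```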

22
Q

Typical Page Table Entries

A

Validity bit
• Set to 0 if the corresponding page is not in memory
Frame number
• Number of bits required depends on size of physical memory
Block number
• If page is not in memory but on disk, points to disk block number
Protection bits
• Read, write, execute accesses
Referenced bit
• Set to 1 by hardware when the page is accessed: used by page
replacement policy
Modified bit (dirty bit)
• Set to 1 by hardware on write-access: used to avoid writing when
swapped out
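
One way to picture these fields is a C bitfield struct; the widths here are illustrative, not those of any real architecture:

```
#include <stdint.h>

/* One possible layout for a page-table entry. */
struct pte {
    uint32_t valid      : 1;   /* 0 if page is not in memory        */
    uint32_t frame      : 20;  /* frame number (or disk block no.)  */
    uint32_t prot       : 3;   /* read / write / execute bits       */
    uint32_t referenced : 1;   /* set by hardware on access         */
    uint32_t modified   : 1;   /* set by hardware on write (dirty)  */
};
```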

23
Q

Swap Files

A

The swap file starts off empty

Created on demand as pages are spilled to disk
• Each page being swapped out is dynamically assigned a block number in
the swap file

When pages are freed, holes appear in the swap file

A bitmap is used to identify holes, so they can be recycled for newly swapped pages

24
Q

Design Issues

A

Are large page sizes or small page sizes better?
• Typically 1KB – 4KB, but more on the tradeoffs later

What if page table does not fit into memory?
• On 32-bit computers with 4KB pages, there are over a million pages, so the
page table is an array with over a million entries
• On 64-bit computers with 4KB pages, it is not possible to fit the page table
into physical memory
• Solutions: multi-level page tables; inverted page tables

Reading page table requires additional memory lookup
• TLBs (Translation Lookaside Buffers)

Reading pages not in physical memory
• This is called a page fault, and it traps to kernel
• Page replacement policy: try to pick a page least likely to be used in future
for storing on disk

25
Multi-Level Paging
A page table with 2^20 entries in memory is big
• And in most architectures you need one per process!
• The problem is worse on 64-bit computers
Insight: make page tables hierarchical, and load only the needed portions into memory
Break the virtual page number into 10-bit chunks:
• The first 10 bits index into a top-level page-entry table T1 (1024 entries)
• Each entry in T1 points to another (second-level) page table with its own 1K entries (covering 4MB of memory, since each page is 4KB)
• The next 10 bits of the virtual address index into the second-level page table selected by the first 10 bits
Saves memory
• A total of 1K potential second-level tables, but many are likely to be unused
• If a process uses 16MB of virtual memory, it will have only 4 entries in the top-level table (the rest marked unused) and only 4 second-level tables
The sketch below shows the two-level lookup.
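
A sketch of the two-level lookup in C, assuming the 10/10/12 split described above:

```
#include <stdint.h>
#include <stddef.h>

/* Top-level table T1: 1024 pointers to second-level tables. */
typedef struct { uint32_t *tables[1024]; } toplevel_t;

/* Returns the frame number, or -1 if no mapping is present. */
long lookup(toplevel_t *t1, uint32_t vaddr) {
    uint32_t i1 = vaddr >> 22;           /* top 10 bits          */
    uint32_t i2 = (vaddr >> 12) & 0x3FF; /* next 10 bits         */
    uint32_t *t2 = t1->tables[i1];       /* second-level table   */
    if (t2 == NULL)
        return -1;                       /* unused region: no table loaded */
    return (long)t2[i2];
}
```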
26
Translation Lookaside Buffers (TLB)
Problem: page tables are stored in main memory
• Access to main memory is slow compared to the CPU clock cycle (a factor of about 10: ~10ns vs ~1ns)
• An instruction such as MOVE REG, ADDR has to decode ADDR and thus go through the page tables
• This would make things very, very slow
Standard solution: a TLB in the MMU
• The TLB stores a small cache (say, 64 entries) of page-table entries to avoid the usual page-table lookup
• The TLB is associative memory (content-addressable memory) that contains, basically, pairs of the form (page-no, page-frame)
• Special hardware compares the incoming page number in parallel with all entries in the TLB to retrieve the page frame
• If no match is found in the TLB, we do the standard lookup via main memory
• The TLB is like a hardware cache of the page table for the MMU
27
More on TLBs
Maximize the TLB hit rate by keeping the most recently used entries in the TLB
The TLB needs to be kept in sync with the page table
• Modified and referenced bits can be updated in the TLB for performance, but must later be propagated to the page table in memory
• Likewise, changes in the page tables need to be propagated to the TLB
28
Inverted Page Tables
When virtual memory is much larger than physical memory, the overhead of storing page tables is high
• For example, on a 64-bit machine with 4KB pages and 256MB of memory, there are just 64K page frames but 2^52 pages!
Solution: inverted page tables that store entries of the form (page-frame, process-id, page-no)
• First, try to find the frame using the inverted page table; if not found, search the page table
• The inverted page table is bounded by physical memory size, i.e. 64K frames
Problem: given a page p, how do we find the corresponding page frame?
• Use a hash table (see the sketch below)
Used more with 64-bit machines
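
A sketch of the hash-based lookup, with an illustrative hash function and chaining scheme:

```
#include <stdint.h>

/* Inverted table: one entry per physical frame. A hash on
 * (pid, page) finds the frame without scanning all entries. */
struct ipte { uint32_t pid, page; long next; }; /* next: chain index or -1 */

#define NFRAMES (1u << 16)            /* 64K frames in the example */
extern struct ipte frames[NFRAMES];
extern long buckets[NFRAMES];         /* hash bucket -> first frame index */

long find_frame(uint32_t pid, uint32_t page) {
    uint32_t h = (pid * 31 + page) % NFRAMES;   /* illustrative hash */
    for (long f = buckets[h]; f != -1; f = frames[f].next)
        if (frames[f].pid == pid && frames[f].page == page)
            return f;                 /* frame number found */
    return -1;                        /* not resident: consult page table */
}
```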
29
Steps in Paging
Modern systems use TLBs and multi-level paging
Paging assumes special hardware support
Overview of steps:
1. Input to MMU: virtual address = (page p, offset o)
2. Check if there is a frame f with (p,f) in the TLB
3. If so, the physical address is (f,o)
4. If not, look up the page table in main memory (requires several slow accesses due to multi-level paging)
5. If the page is present, compute the physical address
6. If not, issue a page fault interrupt to the OS kernel (even slower)
7. Update TLB/page-table entries (e.g. the Modified bit)
30
First Attempts at Page Replacement
Use the referenced bit and modified bit in the page-table entry
• Both bits are initially 0
• A read sets the referenced bit to 1; a write sets both bits to 1
• The referenced bit is cleared on every clock interrupt (40ms)
Prefer to replace pages that were unused in the last clock interval
• First, prefer to keep pages with the referenced bit set to 1
• Then, prefer to keep pages with the modified bit set to 1
Upon a clock interrupt, the OS updates CPU-usage counters for scheduling in the PCB as well as the referenced bits in page tables
Easy to implement; needs additional rules to resolve ties
31
Queue Based Algorithms
FIFO
• Maintain a linked list of pages in memory in order of arrival
• Replace the first page in the queue
• Easy to implement, but access info is not used at all
Modifications
• Second chance
• Clock algorithm
32
Second Chance Replacement
• Pages are ordered in a FIFO queue as before
• If the page at the front of the queue (i.e. the oldest page) has its referenced bit set, just put it at the end of the queue with R=0, and try again
• Effectively, finds the oldest page with R=0 (or the first page in the original queue if all have R=1)
• Easy to implement, but slow!
33
Clock Algorithm
• Optimization of second chance
• Keep a circular list with a current pointer (the clock hand)
• If the current page has R=0 then replace it; else set R to 0 and advance the current pointer
The sketch below picks a victim this way.
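
A minimal sketch of the clock scan, assuming the hardware sets the R bits:

```
#define NFRAMES 64
static int r_bit[NFRAMES];   /* reference bits, set by hardware */
static int hand = 0;         /* the clock hand */

/* Advance the hand, clearing R bits (second chance)
 * until a page with R=0 is found; that page is the victim. */
int clock_pick_victim(void) {
    for (;;) {
        if (r_bit[hand] == 0) {
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        r_bit[hand] = 0;               /* second chance */
        hand = (hand + 1) % NFRAMES;
    }
}
```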
34
Least Recently Used (LRU)
Assume pages used recently will be used again soon
• Throw out the page that has been unused for the longest time
The past is usually a good indicator of the future, but LRU adds significant overhead:
• A timestamp for each memory access, updated in the page table
• A sorted list of pages by timestamp
35
How to Implement LRU?
Main challenge: how to implement this?
• The referenced bit provides only a coarse-granularity ranking
Counter-based solution
• Maintain a counter that gets incremented with each memory access
• Copy the counter into the appropriate page-table entry on each access
• On a page fault, pick the page with the lowest counter
List-based solution
• Maintain a linked list of pages in memory
• On every memory access, move the accessed page to the end of the list
• On a page fault, pick the page at the front of the list
36
Approximating LRU: Aging
Bookkeeping on every memory access is expensive
Software solution: the OS does this on every clock interrupt
Every page entry has an additional 8-bit counter
On every clock interrupt, for every page in memory:
• Shift the counter 1 bit to the right, copying the R bit into the high-order bit of the counter
• Clear the R bit
On a page fault, or when the pager daemon wants to free up space, pick the page with the lowest counter value
Intuition: high-order bits of recently accessed pages are set to 1 (the i-th high-order bit tells us whether the page was accessed during the i-th previous clock interval)
Remembers whether a page was accessed within the last 8 clock intervals, biasing the ranking towards more recent accesses
Potential problem: insufficient info to resolve ties
• Only one bit of info per clock interval (typically 40ms)
• Info about accesses more than 8 intervals ago is lost
A sketch of the aging update follows.
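
A sketch of the aging update and victim selection, assuming the hardware sets the R bits:

```
#include <stdint.h>

#define NPAGES 1024
static uint8_t age[NPAGES];   /* 8-bit counter per page        */
static uint8_t rbit[NPAGES];  /* reference bit, set by hardware */

/* Run on every clock interrupt: shift right, copy R into
 * the high-order bit, then clear R. */
void age_tick(void) {
    for (int i = 0; i < NPAGES; i++) {
        age[i] = (uint8_t)((age[i] >> 1) | (rbit[i] << 7));
        rbit[i] = 0;
    }
}

/* On a page fault, evict the page with the lowest counter. */
int aging_pick_victim(void) {
    int victim = 0;
    for (int i = 1; i < NPAGES; i++)
        if (age[i] < age[victim])
            victim = i;
    return victim;
}
```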
37
Stack Algorithms
For an algorithm A, reference string r, and m page frames, let P(r, m, A) be the set of pages that will be in memory if we run A on references r using m frames
An algorithm A is called a stack algorithm if for all r and for all m, P(r, m, A) is a subset of P(r, m+1, A)
• Intuitively, this means the set of pages that A considers relevant grows monotonically as more memory becomes available
• For stack algorithms, for all r and all m, the number of page faults F(r, m+1, A) cannot be more than F(r, m, A), so increasing memory can only reduce faults!
LRU is a stack algorithm: P(r, m, LRU) is the last m distinct pages referenced in r, so P(r, m, LRU) is a subset of P(r, m+1, LRU)
38
Belady’s Anomaly
Let F(r, m, A) be the number of faults generated by A
• Belady's anomaly: allocating more frames may increase the number of faults
• F(r, m, A) may be smaller than F(r, m+1, A)
FIFO is not a stack algorithm
Implication: it is not guaranteed for all workloads that increasing physical memory will result in fewer page faults (i.e. improved performance)
39
Thrashing
Will CPU utilization keep increasing monotonically as the degree of multiprogramming (the number of processes in memory) increases?
• No! It increases for a while, but then starts dropping again
• Why? With many processes around, each one has only a few pages in memory, so there are more frequent page faults, more I/O wait, and less CPU utilization
Bottom line: the cause of low CPU utilization can be either too few or too many processes!
40
Paging in Linux
Paging is implemented by the kernel and a process called the page daemon (process 2)
• Process 0 is the idle process
• Process 1 is init
Demand-paged system: no pre-paging
• Text segment and mapped files are paged to their respective files on disk
• Everything else is paged into the swap area
• Uses the raw disk, bypassing the file system, for performance and to avoid limits on file block sizes
• A bitmap on the paging device indicates which pages are free on disk to store swapped in-memory pages
41
Page Replacement in Linux
4 types of pages:
• Unreclaimable: reserved or locked pages, kernel stack
• Swappable: must be written back to the swap area when paged out (typical user-level pages, heap or stack)
• Syncable: written back to disk immediately if dirtied (e.g. memory-mapped files)
• Discardable: can be reclaimed immediately (e.g. unused pages)
Try to pick discardable pages first, followed by swappable pages that have not been referenced recently (using a clock-like algorithm)
42
Page Replacement Policy
Two LRU lists: active and inactive
• Reclaim from the inactive list first
• If a page has not been referenced recently, move it to the inactive list
• If a page is referenced:
  • Set its reference flag to true
  • Move it to the active list the next time it is accessed
  • So two page accesses are needed to be declared active
• If the second access does not happen, the reference flag is reset periodically
• After two timeouts, move the page to the inactive state
43
Page Replacement
We might page fault if:
• A process is allocating new memory
• A process accesses memory currently not in physical memory but previously swapped out to disk
How to load a virtual page into a physical frame during a page fault:
• If there is available space in physical memory, we just load the page from disk into a free frame
• Otherwise, we have to select a page and swap it out (write the page contents to disk) and use the freed frame to load the new page
44
Loading Page from Swap to Memory
When there are no free frames, the OS needs to select a frame and move its page (the victim) to disk
• Consult the page replacement rule to select a victim
• The victim might belong to the current process or to another
After a victim is selected:
• If the dirty bit is set, copy the victim page back to the swap file
• If the dirty bit is not set, just evict the page to free up the slot
If the victim page needs to be copied to the swap file:
• Copy it to its previously allocated location in the swap file
• If the victim has never been in the swap file, find free blocks in the swap file, copy the page there, and note the assigned block number in that process's page table
• While a page is being copied to disk, the current process is blocked
After the frame is freed up:
• Locate the block number of the new page to load from the swap file and schedule a read transfer request
• Block the current process until the read completes
45
Page Replacement Rule
If a swapped page is used again in the near future, it will be copied back and forth between physical memory and disk
Performance is influenced by which page we pick
• Avoid swapping actively/recently used pages to disk
• Identify "inactive" pages not likely to be used in the near future
Optimal page replacement rule (OPT):
• Pick the page that will be accessed farthest in the future
• Can't actually do this, of course, since it requires knowing the future
Real replacement rules aim to approximate OPT
• Use the past to predict the future
46
Virtual Memory Performance
Applications work well if their heavily used virtual memory pages are in physical memory and swapping is minimized
High penalty for page faults:
• TLB lookup (small cost) << page table lookup (medium cost, a few memory references) << disk access (high cost, writing and reading)
Page faults are inevitable, since virtual address spaces are larger than physical memory
Two factors influence whether a page is likely to be in memory:
• Amount of real memory
• Replacement policy
47
I/O Devices
• Storage devices: floppy, magnetic disk, magnetic tape, CD-ROM, DVD, ...
• User interaction: keyboard, mouse, monitor, sound card, printer, modem, ...
• A significant fraction of OS code deals with I/O
  • Device drivers
48
Device Controllers
• Devices have two parts
  • Physical/mechanical
  • Electronic/interface, usually including a controller
• For most devices, the controller's basic job is to convert a serial bit stream (to/from the device) into blocks (to/from the CPU)
  • Also error handling using error-correcting codes
• Controller interface
  • Device registers (status and other control functions)
  • Data buffers
49
How do Processes Access Devices?
Two basic approaches:
• Port-mapped I/O
  • IN register, port — reads a value from an I/O port into a register
• Memory-mapped I/O
  • Part of the virtual address space really goes to devices
50
Memory Mapped I/O
• In a machine instruction "MOV R, L", L can refer to:
  • A location in main memory, or
  • A control register, or an address in the data buffer, of a specific I/O device
• Memory-mapped I/O:
  • I/O space is viewed as part of virtual memory
  • Directly access control registers or data buffers by reading from/writing to memory
51
Naïve Busy Wait Device Driver
• Suppose we want to send an N-byte buffer to some device (say, the printer)
• The device has two registers (mapped as memory locations):
  • Status, which we read, to learn if the device is ready
  • Data, which we write, to send a byte to the device
• The driver busy-waits on Status:
```
volatile int *Status;          /* device status register */
volatile char *Data;           /* device data register   */

for (i = 0; i < N; i++) {
    /* wait for the device to be free */
    while (*Status != READY)
        ;
    *Data = buffer[i];         /* hand the next byte over */
}
```
52
Device Driver with Interrupts
• Driver and interrupt handler split the work
• The driver copies a single byte into the device buffer, then yields the CPU:
```
while (*Status != READY)
    ;
*Data = buffer[0];     /* send the first byte; i starts at 1 */
InvokeScheduler();     /* block until the device interrupts  */
```
• The interrupt handler wakes up and sends the remaining bytes, one per interrupt:
```
if (N == 0)
    Wakeup();          /* all bytes sent: unblock the driver */
else {
    *Data = buffer[i]; /* send the next byte */
    N--;
    i++;
}
ResetInterrupt();      /* notify device (ack) */
```
53
I/O with Direct Memory Access (DMA)
• Problem: the CPU is kept busy while the device driver is writing data to an external device
• DMA controller:
  • Separate hardware co-processor dedicated to I/O
  • The co-processor helps the CPU copy data between memory and the device
  • Frees up the CPU to do other work
  • The DMA controller interrupts the CPU only after the entire transfer completes
• A DMA controller can be integrated with a device controller, but usually there is one (on the motherboard) for all devices
54
Reading Disk Block without DMA
If DMA is not used:
• The disk controller reads each disk block from the drive serially, byte by byte, until the entire block is in the controller's internal buffer
• The disk controller computes a checksum to verify that no read errors occurred
• The controller causes an interrupt
• The OS's interrupt handler reads the disk block from the controller's buffer a byte or a word at a time (in a loop) and stores it in main memory
• During this entire time, the CPU is kept busy copying from the controller buffer to main memory
55
Device Drivers
• Device drivers are the interface between the OS and a device
• Whenever a new device is plugged into a computer, the OS installs new device drivers
• A device driver includes:
  • Device-specific interrupt handlers
  • Device-specific I/O routines
  • Device-specific control routines (ioctl)
• Device drivers may be a hard-coded part of the OS kernel, dynamically loaded, or (sometimes) part of a user process
  • Depends on the OS and the kind of device
56
Device Drivers
Logical positioning of device drivers. In reality, all communication between drivers and device controllers goes over the bus.
57
I/O Software Layers
• Device drivers: start an I/O operation and block until the I/O completes
  • Typically structured as kernel processes, with their own states, stacks, and program counters
• Interrupt handler: unblocks the driver, typically using semaphores or condition variables
  • Sets up a context for the interrupt service procedure
  • Runs the interrupt service procedure, which extracts information from the device controller registers
  • Unblocks the device driver so it can run
  • Upon completion, context switches to a new process (which may be a high-priority process that previously blocked on I/O)
58
Kinds of Devices
• OS-related devices
  • Clock interrupt, etc.
• User-related devices
  • Keyboard, mouse
• Devices shared by many processes
  • Disks, network interfaces, etc.
  • Require demultiplexing, i.e. relaying data between the device and the appropriate process
• Block vs character devices
  • Block devices (e.g. disks, file systems)
  • Character devices (e.g. keyboards, printers)
59
Naming Devices
• Even with memory-mapped I/O, it is difficult for processes to track the addresses used for communicating with the various devices installed on a system
• Instead, many operating systems have a notion of a device namespace that processes use to refer to devices
  • Manipulate these devices by opening and reading/writing file abstractions
  • For example, the first printer might be named /dev/printer1
60
Sharing Devices Among Processes
• Some devices might be used by more than one process at a time
  • For example, multiple processes writing to the same disk
• Solution: have all access go through a subsystem that translates requests, schedules I/O, and enforces order
  • Might be in the OS kernel or in a user process
  • For example, the file system manages access to disks
61
Socket API
• Applications use the Socket Application Programming Interface (API) to use the network
62
Sockets
• Well-known OS abstraction denoting an endpoint of a connection
• A socket is associated with each endpoint (end host) of a connection
  • Identified by IP address and port number
• Berkeley sockets is the most popular network API
  • Runs on Linux, FreeBSD, OS X, Windows
  • Feeds off the popularity of TCP/IP
63
Naming and Addressing
• IP address
  • Identifies a single host
  • 32 bits (IPv4), 128 bits (IPv6)
  • Written as dotted octets (8 bits each), e.g. 0x0a000001 is 10.0.0.1
• Host name
  • Identifies a single host
  • Variable-length string that maps to one or more IP addresses, e.g. www.upenn.edu
  • gethostbyname() translates a name to an IP address
64
Knowing What Port Number To Use
• Popular applications have well-known ports
  • E.g., port 80 for the Web and port 25 for e-mail
  • See http://www.iana.org/assignments/port-numbers
• Well-known vs. ephemeral ports
  • A network service has a well-known port (e.g., port 80), between 0 and 1023
  • A client uses an unused ephemeral (i.e., temporary) port, between 1024 and 65535
65
Byte Ordering
A 16-bit value spans two consecutive memory addresses A and A+1 (increasing memory addresses):
• Little-endian: the low-order byte is at address A, the high-order byte at A+1
• Big-endian: the high-order byte is at address A, the low-order byte at A+1
• Network byte order is big-endian
66
Byte Ordering Solution
```
uint16_t htons(uint16_t host16bitvalue);
uint32_t htonl(uint32_t host32bitvalue);
uint16_t ntohs(uint16_t net16bitvalue);
uint32_t ntohl(uint32_t net32bitvalue);
```
• Use these for all numbers (int, short) to be sent across the network
• Typically used for port numbers
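
For example, a small sketch of filling in a server address; the port crosses the network, so it is converted with htons() first (the function name set_addr is ours):

```
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Fill in the family and port of a server address. */
void set_addr(struct sockaddr_in *sin, uint16_t port) {
    memset(sin, 0, sizeof(*sin));
    sin->sin_family = AF_INET;
    sin->sin_port   = htons(port);   /* host -> network byte order */
}
```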
67
Types of Sockets
• Different types of sockets implement different service models
  • Stream vs datagram
• Stream socket (aka TCP)
  • Connection-oriented (includes establishment and termination)
  • Reliable, in-order delivery
  • At-most-once delivery, no duplicates
  • E.g., ssh, http
• Datagram socket (aka UDP)
  • Connectionless (just data transfer)
  • "Best-effort" delivery, possibly lower variance in delay
  • E.g., IP telephony, streaming audio
• Stream sockets
  • No need to packetize data; data arrives in the form of a byte stream
  • The receiver needs to separate messages in the stream
• Datagram sockets
  • The user packetizes data before sending
  • Maximum size of 64KB
  • "Hello there!" and "I love programming" will definitely be sent in separate packets at the network layer
68
Stream Sockets
• Implements the Transmission Control Protocol (TCP)
• Does NOT set up a virtual circuit!
• Sequence of actions: shown on the following cards
69
Initialize (Client + Server)
```
int s = socket(PF_INET, SOCK_STREAM, 0);
if (s < 0) {
    perror("socket");
}
```
• Create a socket
  • int socket(int domain, int type, int protocol)
  • Returns a descriptor (or handle) for the socket
• Arguments:
  • Domain: protocol family
    • PF_INET for the Internet
    • Others: PF_INET6, PF_PACKET, PF_APPLETALK
  • Type of communication:
    • SOCK_STREAM: reliable byte stream
    • SOCK_DGRAM: message-oriented service
  • Protocol: normally only one protocol for each type of communication within each protocol family, so we normally pass 0
70
Client: Connecting Socket to the Server
```
if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
    perror("simplex-talk: connect");
}
```
• Establishing the connection
  • int connect(int sockfd, struct sockaddr *server_address, socklen_t addrlen)
  • Arguments: socket descriptor, server address, and address size
  • Returns 0 on success, and -1 if an error occurs
• Translating the server's name to an address
  • struct hostent *gethostbyname(char *name)
  • Argument: host name (e.g., "www.cnn.com")
• Filling in sockaddr: AF_INET, host address, htons(PORT)
71
Client: Sending and Receiving Data
• Sending data
  • ssize_t send(int sockfd, void *buf, size_t len, int flags)
  • Arguments: socket descriptor, pointer to data buffer, buffer size, and flags
  • Returns the number of bytes written, and -1 on error
• Receiving data
  • ssize_t recv(int sockfd, void *buf, size_t len, int flags)
  • Arguments: socket descriptor, pointer to data buffer, buffer size, and flags
  • Returns the number of bytes read (0 = "end of file"), and -1 on error
A complete client sketch follows.
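
Putting the calls together, a minimal client sketch; the host name, port, and message are placeholders:

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void) {
    /* Translate the server's name to an address. */
    struct hostent *hp = gethostbyname("example.com");
    if (!hp) { fprintf(stderr, "unknown host\n"); exit(1); }

    /* Create the socket. */
    int s = socket(PF_INET, SOCK_STREAM, 0);
    if (s < 0) { perror("socket"); exit(1); }

    /* Fill in the server address and connect. */
    struct sockaddr_in sin;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    memcpy(&sin.sin_addr, hp->h_addr_list[0], hp->h_length);
    sin.sin_port = htons(80);
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        perror("connect"); exit(1);
    }

    /* Send a request and read the reply. */
    const char *msg = "hello\n";
    send(s, msg, strlen(msg), 0);
    char buf[512];
    ssize_t n = recv(s, buf, sizeof(buf), 0);
    if (n > 0) fwrite(buf, 1, (size_t)n, stdout);
    close(s);
    return 0;
}
```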
72
Disk Blocks
The basic unit of disk storage is the "block"
• The disk controller provides a view of the disk as an array of blocks
• Some file systems support several different block sizes
• Block size is based on the hardware sector size
Reads and writes are of an entire block at a time
Typical block size: 2KB-16KB
Tradeoff:
• If blocks are too small, there is more data to manage, and it is slow to write several blocks at a time
• If blocks are too big, space is wasted
73
Disk Layout
Disk drives are typically divided into one or more logical drives, called "partitions"
• A partition table at a fixed place on the disk gives the layout
Each partition may contain:
• Boot block (loads the OS into kernel space)
• Superblock (contains basic info about the file system, read into memory at boot time)
• Free-space data structure
• List of i-nodes (or another data structure) giving info about individual files (more on this later)
• Allocatable blocks for directory and file data
74
The Classic Unix File System
• Each disk partition has 4 regions:
  • Block 0: boot block
  • Block 1: superblock; contains the size of the disk and the boundaries of the other regions
  • Free-space management: metadata indicating where the free blocks are
  • I-nodes: a list of i-nodes, each a 64-byte structure
  • Root directory block: for bootstrapping directory traversal
  • Free storage blocks for the contents of files and directories
• Each i-node contains owner, protection bits, size, directory/file flag, and 13 disk addresses
  • The first 10 of these addresses point directly at the first 10 blocks of a file; if a file is larger than 10 blocks (5,120 bytes), the 11th points at a block used for secondary indexing (the single indirect block)
75
Disk Space Allocation
Similar to the memory allocation problem
• How to represent free/allocated blocks
• How to allocate and free blocks
Some practical differences, however
• Disk performance isn't like memory performance
• Access patterns may be different
Goals
• Fast sequential access
• Fast random access
• Ability to grow dynamically
• Minimize fragmentation
76
Allocation Schemes
• Contiguous allocation (fixed)
  • CD-ROMs, etc.
• Linked-list allocation
• Linked list with a file allocation table (FAT)
  • MS-DOS FAT file system
• Linked list with indexing (i-nodes)
  • Unix file system
77
Linked List Allocation
• Natural idea: a linked list of blocks for each file
• The directory entry of a file points to its first block, and each block begins with a pointer to the next block
• No external fragmentation
• But random access is slow
  • To get to block n, the n-1 preceding blocks have to be read first
78
Linked List via File Allocation Tables (FAT)
• Problem: the pointers live in (slow) disk data blocks
• Solution: put them somewhere else
• The FAT is an array indexed by block number
  • Each array entry is a pointer to the next block
  • A copy of the FAT is kept in main memory
• Random access still requires traversing the chain of pointers, but they are in main memory, so no disk I/O is needed (see the sketch below)
• Problem: the entire FAT must be in memory, and as disk size increases, so does the FAT
  • A 20GB disk with 1KB block size needs 20 million entries, which requires entry sizes of at least 3 bytes, resulting in a FAT of 60MB
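
A sketch of chain-following with an in-memory FAT; the end-of-chain marker value is illustrative:

```
#include <stdint.h>

#define FAT_EOF 0xFFFFFFFFu   /* illustrative end-of-chain marker */

/* In-memory FAT: fat[b] is the block that follows block b. */
extern uint32_t fat[];

/* Random access: reaching block n of a file still walks the
 * chain, but in memory, with no disk I/O. */
uint32_t nth_block(uint32_t first, unsigned n) {
    uint32_t b = first;
    while (n-- > 0 && b != FAT_EOF)
        b = fat[b];
    return b;   /* FAT_EOF if the file is shorter than n+1 blocks */
}
```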
79
Directories
A directory entry provides the info needed to find the disk data blocks of a file:
• Disk address of the first block, and size
• Or the address of the first block
• Or the number of the associated i-node
File attributes can be stored in the directory entry (Windows) or in the file's i-node (Unix)
The file name is usually in its directory entry
• May be fixed size or variable length
• E.g., Unix supports long file names (255 chars)
80
Implementing Directories
• (a) A simple MS-DOS directory
  • Fixed-size entries
  • Disk addresses and attributes in the directory entry
• (b) A Unix directory, in which each entry has just a name and a pointer to the file's i-node
81
Directory Entry
• Each entry has a file name and an i-node number
• The i-node contains all the attributes
• Restriction: file name size is bounded (14 chars); newer versions use a heap to allow longer names
82
Traversing Directory Structure
• Suppose we want to read the file /usr/ast/mbox
• The location of an i-node, given its i-number, can be computed easily
• The i-number of the root directory is known (usually 2)
  • Read i-node 2 from disk into memory
  • Find the location of the root directory file on disk, and read the directory block into memory
  • If the directory spans multiple blocks, read blocks until usr is found
• Find the i-number of directory usr, which is 6
  • Read i-node 6 from disk
  • Find the location of the directory file usr on disk, and read in the block containing this directory
• Find the i-number of directory ast (26), and repeat
83
Links
We may need more than one name for a file
• It may be in a different directory
Hard link (early binding)
• A pointer to the i-node is added to the directory entries
• The link counter in the linked-to i-node is incremented
• The i-node can only be deleted (data addresses cleared) when the counter goes down to 0 (why?)
  • Only then can the space be reclaimed
Soft link (symbolic, late binding)
• A new special file of type LINK is created, which contains just the path name
• The new file also has its own i-node
84
Managing Free Disk Space
Familiar problem: keeping track of unallocated blocks
Linked lists
• A list of blocks containing addresses of free blocks
• E.g., if each block address is 32 bits (4 bytes), a 1KB block can hold the addresses of 255 free blocks plus the address of the next block in the free-space list
• Note: the list can be stored in the free blocks themselves
Bitmap method
• An array of bits, one per block, indicating whether that block is free
• A 16GB disk has 16 million 1KB blocks
• Storing the bitmap requires 16 million bits, or 2048 blocks
85
Caching
• Since disk I/O is slow, a part of memory is reserved to hold disk blocks that are likely to be accessed repeatedly
• Basic data structure:
  • Maintain a doubly linked list of blocks, from LRU at the front to MRU at the rear
  • Use hashing so that a block can be quickly found from its address
  • An efficient LRU implementation is possible: every time a block is accessed, move it to the back of the list (no special hardware needed)
86
Cache Writing Strategies
If there is a crash, modified blocks in the cache will be lost
• Especially bad for blocks containing i-nodes or the FAT
• And LRU guarantees loss of exactly the worst blocks!
MS-DOS: write-through cache
• Every time a block is modified, a request to write it to disk is issued
• A loop with putchar() generates busy disk I/O
• Reads can still be fast
87
File System Backup
• Disks aren't very reliable
• Backup purposes
  • Recover from (natural) disaster
  • Recover from error
  • Recover from attack
• Backup tools
  • Full dump
  • Incremental dump
  • Physical dump
  • Logical dump
88
Logical Dump
A file system to be dumped (in the accompanying figure, not reproduced here):
• Squares are directories, circles are files
• Shaded items were modified since the last dump
• Each directory and file is labeled by its i-node number
89
Logical Dump Bitmaps
Bitmaps computed in four phases:
• (a) all directories and modified files
• (b) all modified files, their parent directories, and modified directories
• (c) all directories to be dumped
• (d) all files to be dumped
90
File System Consistency
• Disk I/O is buffered (delayed write)
• There may be a crash before modified blocks in memory are written back to disk
• The file system might become inconsistent
• Tools: fsck (Unix), scandisk (Windows)
• Block consistency: a block is either used by a file (listed in an i-node) or free (listed in the free list)
• File consistency: a file has the same number of directory entries (i.e. the same number of hard links) as the link count in its i-node
91
Overview of Virtualization
• Allows a single physical machine to run multiple operating systems concurrently, each in its own virtual machine, or VM
• Key terminology:
  • Host: the underlying hardware system
  • Virtual machine manager (VMM) or hypervisor: creates and runs virtual machines on the host machine
  • Guest: a process, usually running an operating system, that runs on top of the VMM
92
Types of Hypervisors
Many different types:
• Type 0 hypervisors: hardware-based solutions that provide support for virtual machine creation and management via firmware (early IBM approach)
• Type 1 hypervisors: operating systems that provide standard operating system functions integrated with VMM functions
  • Including VMware ESX, Citrix XenServer, Microsoft Windows Server with Hyper-V, and Red Hat Linux with KVM
• Type 2 hypervisors: applications that run on standard operating systems but provide VMM features to guest operating systems
  • Including VMware Workstation and Fusion, and Oracle VirtualBox
93
Other VMM Variations
• Paravirtualization: technique in which the guest operating system is modified to work in cooperation with the VMM to optimize performance
• Programming-environment virtualization: VMMs do not virtualize real hardware but instead create an optimized virtual system
  • Used by Oracle Java and Microsoft .NET
• Emulators: allow applications written for one hardware environment to run on a very different hardware environment, such as a different type of CPU
• Application containment: not virtualization at all, but provides virtualization-like features by segregating applications from the operating system, making them more secure and manageable
  • Oracle Solaris Zones, BSD Jails, IBM AIX WPARs, Linux Containers (LXC)
94
Benefits of Virtualization
• End users: run different OSes on a single machine without rebooting or dealing with different partitions
  • VMs are isolated from other VMs
  • Allows sandbox testing (of experimental operating systems or those with security vulnerabilities)
• Data center management in the cloud computing era:
  • Freeze or suspend a running VM, then move or copy it somewhere else and resume it
  • Periodic backups for restoring to a consistent state
  • Some VMMs allow multiple snapshots per VM
  • Load balancing: migrate VMs from heavily loaded machines to less loaded ones
  • Templating: create an OS + application VM, then clone multiple copies easily
95
Building Blocks
• Virtual CPU (vCPU)
  • Represents the state of the CPU per guest, as the guest believes it to be
  • When the guest is context-switched onto a CPU by the VMM, information from the vCPU is loaded and stored
  • Analogous to the PCB, but for the entire virtual machine state
• Several techniques, as described on the next cards
96
OS Component | CPU Scheduling
• The VMM configures each guest OS with a certain number of vCPUs
  • Can be adjusted throughout the life of the VM
• Ideal case: each vCPU is tied to one actual physical CPU
• Non-ideal (overcommitment) case: multiple vCPUs are mapped to each physical CPU
  • The VMM uses standard scheduling algorithms to decide which vCPU to run next
97
Shadow Page Tables
• The guest OS creates page tables, but these are not used by the actual hardware MMU
• The VMM manages the real page tables, one set per virtual machine; these are called shadow page tables
• Shadow page tables map from guest virtual addresses directly to actual physical addresses, bypassing guest physical addresses
• These tables are loaded into the MMU on a context switch
• Advantage:
  • Replaces the gva -> gpa and gpa -> hpa translations with just gva -> hpa
• Maintaining the shadow page table:
  • The VMM maps the guest OS page table as read-only (identifies these pages and marks them read-only)
  • When the guest OS writes to its page table, it traps to the VMM
  • The VMM applies the write to both the shadow page table and the actual guest page table
98
Types of VMs | Paravirtualization
• What if we can modify the guest operating system to take advantage of the VMM?
• The VMM provides services that the guest must be modified to use
• Leads to increased performance
• Less needed as hardware support for VMs grows
• Xen, the leader in the paravirtualization space, adds several techniques
  • For example, clean and simple device abstractions
  • Efficient I/O
  • Good communication between guest and VMM about device I/O
  • Each device has a circular buffer shared by the guest and the VMM via shared memory
99
Types of VMs | Containers
• What if we care more about isolation and supporting multiple tenants, and less about running different operating systems on the same physical machine?
• Retain the desired aspects of VMMs: segregation of apps, performance and resource management, easy start, stop, move, and management
• Forgo running different operating systems
• Avoids the performance penalty of running full-fledged VMMs
100
Programming Environment Virtualization
• Java Virtual Machine (JVM) as the classic example
• Java: popular language/application environment invented by Sun Microsystems in 1995
  • Write once, run anywhere
• Java programs are compiled into architecture-neutral bytecode (.class files), which the JVM loads and runs
• The JVM can be operating-system specific, but the bytecode is common across all platforms