Final Flashcards
How does scheduling work? What are the basic steps and data structures involved in scheduling a thread on the CPU?
The scheduler dispatches a task from the run queue based on its policy/algorithm (e.g., FIFO, SJF, priority).
The run queue's data structure depends on the scheduling policy (they are tightly coupled): a FIFO policy needs only a basic queue, but other policies may require a tree or a multi-level structure.
The scheduler runs when:
- the CPU becomes idle
- a timeslice expires
- a new task becomes available
When dispatching a thread:
- context switch to selected thread
- enter user mode
- set program counter to next instruction of scheduled task
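A minimal sketch of the dispatch path, assuming a FIFO policy (the names task_t, run_queue_t, and context_switch are illustrative, not from the lecture):

```c
#include <stdlib.h>

/* Hypothetical task control block: just what dispatch needs. */
typedef struct task {
    void        *context;   /* saved registers, stack pointer, PC */
    struct task *next;
} task_t;

/* FIFO run queue: dequeue from head, enqueue at tail. */
typedef struct {
    task_t *head, *tail;
} run_queue_t;

static task_t *pick_next(run_queue_t *rq) {
    task_t *t = rq->head;               /* FIFO: oldest runnable task */
    if (t) {
        rq->head = t->next;
        if (!rq->head) rq->tail = NULL;
    }
    return t;
}

/* Arch-specific: restores registers, drops to user mode, and resumes
 * at the task's saved program counter. */
extern void context_switch(void *context);

static void schedule(run_queue_t *rq) {
    task_t *next = pick_next(rq);
    if (next)
        context_switch(next->context);
}
```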
P3L1
What are the overheads associated with scheduling? Do you understand the tradeoffs associated with the frequency of preemption and scheduling/what types of workloads benefit from frequent vs. infrequent intervention of the scheduler (short vs. long time slices)?
The main overhead is context switching.
Tradeoffs of (more frequent) context switching:
+ reduces wait time
+ I/O operations issued earlier
- drops throughput
- increases average completion time
For CPU-bound tasks, longer timeslices are better: fewer context switches means less overhead, which keeps CPU utilization and throughput high.
For I/O-bound tasks, shorter timeslices are better: I/O operations are issued sooner, and user-perceived performance is better.
P3L1
Can you work through a scenario describing some workload mix (few threads, their compute and I/O phases) and for a given scheduling discipline compute various metrics like average time to completion, system throughput, wait time of the tasks…
Throughput
jobs completed / time to complete all jobs
Average Completion Time
sum of times to complete each job / jobs completed
Average Wait Time
sum of time each job spent waiting / jobs completed
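A worked example, assuming FIFO scheduling and three tasks T1, T2, T3 that all arrive at time 0 with execution times of 1s, 10s, and 1s (no I/O):
- T1 completes at t=1, T2 at t=11, T3 at t=12
- Throughput = 3 jobs / 12s = 0.25 jobs/s
- Average completion time = (1 + 11 + 12) / 3 = 8s
- Average wait time = (0 + 1 + 11) / 3 = 4s
(A shortest-job-first order, T1, T3, T2, would cut average completion time to (1 + 2 + 12) / 3 = 5s.)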
P3L1
Do you understand the motivation behind the multi-level feedback queue, why different queues have different time slices, how do threads move between these queues… Can you contrast this with the O(1) scheduler? Do you understand what were the problems with the O(1) scheduler which led to the CFS?
The multi-level feedback queue lets us find an appropriate timeslice for a thread based on how it behaves. If a task uses its entire timeslice, we move it to a queue with a longer timeslice; if it yields early, we keep it in (or move it back to) a queue with a shorter timeslice. This means we don't have to track and compute historic behavior; we just run tasks and let their behavior dictate where they belong.
The O(1) scheduler uses priorities to determine the order in which tasks run, with timeslice lengths that depend on priority; however, once a task uses up its timeslice it moves to the expired list and cannot run again until every active task has used up its timeslice. This led to unacceptable jitter as software became more interactive.
The Completely Fair Scheduler (CFS) uses a self-balancing (red-black) tree ordered by vruntime (virtual time spent on the CPU). vruntime progresses faster for lower-priority tasks and slower for higher-priority tasks, and CFS always schedules the task with the lowest vruntime.
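A minimal sketch of the vruntime bookkeeping, assuming an illustrative weight scheme (real CFS keeps entities in a red-black tree and derives weights from nice values; the names here are made up):

```c
/* vruntime accounting, sketched: actual CPU time is scaled by the
 * task's weight, so low-priority tasks accumulate vruntime faster
 * and get picked less often. */
typedef struct {
    unsigned long long vruntime; /* weighted time on CPU, in ns */
    unsigned int       weight;   /* higher = higher priority */
} sched_entity_t;

#define BASE_WEIGHT 1024  /* weight of a default-priority task (assumed) */

static void account_runtime(sched_entity_t *se, unsigned long long delta_ns) {
    /* High weight => vruntime grows slowly; low weight => quickly. */
    se->vruntime += delta_ns * BASE_WEIGHT / se->weight;
}

/* Scheduling decision: always run the entity with the smallest
 * vruntime. A linear scan here; CFS's red-black tree makes the
 * same pick the leftmost node, with O(log n) insertion. */
static sched_entity_t *pick_min_vruntime(sched_entity_t *tasks, int n) {
    sched_entity_t *min = &tasks[0];
    for (int i = 1; i < n; i++)
        if (tasks[i].vruntime < min->vruntime)
            min = &tasks[i];
    return min;
}
```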
P3L1
Thinking about Fedorova’s paper on scheduling for chip multiprocessors, what’s the goal of the scheduler she’s arguing for? What are some performance counters that can be useful in identifying the workload properties (compute vs. memory bound) and the ability of the scheduler to maximize the system throughput?
Fedorova argues for a scheduler that maximizes CPU utilization in hardware-multithreaded environments (chip multiprocessors).
Generally, we use instructions per cycle (IPC) to determine utilization. Tasks that are CPU-bound typically have IPCs that are close to 1, while tasks that are memory-bound typically have IPCs that are close to 0.
Fedorova suggests that hardware incorporate a new metric, cycles per instruction (CPI), which is the reciprocal of IPC.
CPU-bound tasks would still have a CPI of close to 1, while memory-bound tasks would have a CPI much higher than 1.
The IPC performance counter is helpful in determining whether a task is compute-bound or memory-bound. In addition, performance counters that track cache misses can help: with many cache misses, a scheduler can assume a task makes many memory accesses and is therefore memory-bound; with few cache misses, it can assume the task is more compute-bound.
P3L1
How does the OS map the memory allocated to a process to the underlying physical memory?
The OS maps the memory allocated to a process to the underlying physical memory via a page table (or segments): the virtual page number is used to look up the corresponding page frame number, and the offset then indexes into that frame to find the exact physical location.
P3L2
How do we deal with the fact that processes address more memory than is physically available?
We deal with the fact that processes address more memory than physically available by swapping data out of physical memory (e.g., DRAM) to secondary storage (e.g., on disk). We determine which data to swap using an algorithm (e.g., Least Recently Used, which uses the access bit).
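A sketch of one common access-bit-based approximation of LRU, the "second chance"/clock algorithm (frame_t and pick_victim are illustrative names):

```c
#include <stdbool.h>

/* Hypothetical page frame descriptor. */
typedef struct {
    bool accessed;   /* hardware sets this bit on any reference */
    /* ... mapping info ... */
} frame_t;

/* Sweep frames like a clock hand: a set access bit means "recently
 * used", so clear it and give the page a second chance; a clear bit
 * means the page hasn't been touched since the last sweep: evict it. */
static int pick_victim(frame_t *frames, int nframes, int *hand) {
    for (;;) {
        frame_t *f = &frames[*hand];
        if (f->accessed) {
            f->accessed = false;            /* second chance */
        } else {
            int victim = *hand;
            *hand = (*hand + 1) % nframes;
            return victim;                  /* swap this page out */
        }
        *hand = (*hand + 1) % nframes;
    }
}
```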
P3L2
How does address translation work? What’s the role of the TLB?
In order to translate a virtual address to a physical address we have to look it up in a page table. Page tables can be flat or hierarchical. Hierarchical page tables save on space by not requiring an entry for every single virtual address (used or unused) but they require more lookups (outer page table to inner page tables).
A virtual address has two parts with a flat page table, or three (or more, for deeper hierarchies) with a hierarchical one. The last part is always the offset, which indexes into the physical page frame. The first part(s) index into the page table(s) to find the page frame number, which tells us where the frame starts in physical memory.
The role of the Translation Lookaside Buffer is to cache translations between virtual addresses and physical addresses so we can save on lookups. This reduces the latency that’s created by hierarchical page tables, which save on space but require more lookups.
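A worked example, assuming a flat page table, 32-bit virtual addresses, and 4KB pages:
- 4KB pages mean a 12-bit offset; the remaining 20 bits are the virtual page number (VPN), so the flat table has 2^20 entries
- For virtual address 0x00403ABC: VPN = 0x00403, offset = 0xABC
- If the page table entry for VPN 0x00403 holds page frame number 0x1F, the physical address is (0x1F << 12) | 0xABC = 0x1FABC
- The TLB caches the VPN-to-PFN mapping so the next access to that page skips the table walk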
P3L2
Do you understand the relationships between the size of an address, the size of the address space, the size of a page, the size of the page table?
The address size determines the size of the address space: an n-bit address can name 2^n bytes. The page size determines the offset bits (log2 of the page size) and therefore the number of pages: address space size / page size. A flat page table needs one entry per virtual page, so its size is (number of pages) x (entry size).
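A worked example, assuming a 32-bit address, 4KB pages, and 4-byte page table entries:
- address space = 2^32 bytes = 4GB
- number of pages = 2^32 / 2^12 = 2^20
- flat page table size = 2^20 entries x 4 bytes = 4MB per process
P3L2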
Do you understand the benefits of hierarchical page tables? For a given address format, can you workout the sizes of the page table structures in different layers?
Hierarchical page tables save space because inner tables are allocated only for the regions of the address space a process actually uses; a flat table needs an entry for every virtual page, used or not. Given an address format, each layer's bit-field width determines its table sizes: a field of k bits means 2^k entries per table at that layer, and the offset field fixes the page size.
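A worked example, assuming a 32-bit address split as 10 bits (outer) | 10 bits (inner) | 12 bits (offset), with 4-byte entries:
- 12 offset bits give 4KB pages; each inner table has 2^10 entries covering 2^10 x 4KB = 4MB of address space
- the outer table has 2^10 entries, each pointing to one inner table
- a process using only 8MB of its address space needs the outer table plus two inner tables: 3 x 4KB = 12KB, versus 4MB for the flat table
P3L2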
For processes to share memory, what does the OS need to do? Do they use the same virtual addresses to access the same memory?
The OS needs to create the shared memory segment and map it into the virtual address space of both processes.
The virtual address of the shared memory segment is not guaranteed to be the same in each process; each process may map the segment at a different virtual address.
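A minimal sketch using POSIX shared memory (the name "/example_shm" and the size are illustrative):

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHM_NAME "/example_shm"  /* illustrative name */
#define SHM_SIZE 4096

int main(void) {
    /* Create (or open) the named shared memory object... */
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
    if (fd == -1) return 1;
    if (ftruncate(fd, SHM_SIZE) == -1) return 1;

    /* ...and map it into this process's virtual address space.
     * Another process that shm_open()s the same name and mmap()s it
     * sees the same physical pages, but `addr` may differ there. */
    void *addr = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) return 1;

    /* ... use addr ... */
    munmap(addr, SHM_SIZE);
    close(fd);
    shm_unlink(SHM_NAME);  /* remove the name once done */
    return 0;
}
```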
P3L3
For processes to communicate using a shared memory-based communication channel, do they still have to copy data from one location to another?
No; processes using shared-memory-based communication don’t have to copy data from one location to another, as long as the data is allocated at virtual addresses that belong to the shared memory region.
P3L3
What are different ways you can implement synchronization between different processes (think what kinds of options you had in Project 3)?
- mutex & condition variable
- semaphore
- message queue
- socket
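A sketch of the first option: a mutex and condition variable usable across processes must live in the shared segment itself and be initialized with the PTHREAD_PROCESS_SHARED attribute (shm_region_t is an illustrative layout):

```c
#include <pthread.h>

/* Hypothetical layout of a shared segment: the mutex and condition
 * variable live in the shared memory so both processes see them. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    int             data;
} shm_region_t;

/* Run once, by the process that creates the segment. */
static void init_region(shm_region_t *r) {
    pthread_mutexattr_t mattr;
    pthread_condattr_t  cattr;

    pthread_mutexattr_init(&mattr);
    pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&r->lock, &mattr);

    pthread_condattr_init(&cattr);
    pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);
    pthread_cond_init(&r->ready, &cattr);
}
```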
P3L3
To implement a synchronization mechanism, at the lowest level you need to rely on a hardware atomic instruction. Why? What are some examples?
When multiple CPUs and/or multiple threads access a lock, the check and the set of the lock’s value can interleave (meaning more than one processor could see the lock as free and “acquire” it) unless the check-and-set happens as a single atomic operation. When an operation is atomic, only one CPU at a time can perform it.
Examples:
- test_and_set
- read_and_increment
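A minimal spinlock built on an atomic test-and-set, here via C11's atomic_flag (which maps to the hardware's atomic exchange/test-and-set instruction):

```c
#include <stdatomic.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;

void spin_lock(void) {
    /* Atomically set the flag and return its old value: if the old
     * value was clear, we got the lock; otherwise keep spinning. */
    while (atomic_flag_test_and_set(&lock))
        ;   /* spin */
}

void spin_unlock(void) {
    atomic_flag_clear(&lock);
}
```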
P3L4
Why are spinlocks useful? Would you use a spinlock in every place where you’re currently using a mutex?
A spinlock is useful when the time a thread would spend spinning is less than the time it takes to context switch.
Spinlocks create more contention than a regular mutex, so use them only when critical sections are short.
P3L4
Do you understand why is it useful to have more powerful synchronization constructs, like reader-writer locks or monitors? What about them makes them more powerful than using spinlocks, or mutexes and condition variables?
More powerful synchronization constructs like reader-writer locks or monitors are useful because they handle more for the programmer (e.g., lock/unlock ordering, signaling), which reduces common errors.
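A sketch with a pthreads reader-writer lock: the construct manages reader counting and writer exclusion internally, instead of the programmer hand-rolling it with a mutex, counter, and condition variable (shared_counter is illustrative):

```c
#include <pthread.h>

pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
int shared_counter;

int read_value(void) {
    pthread_rwlock_rdlock(&rwlock);   /* many readers may hold this */
    int v = shared_counter;
    pthread_rwlock_unlock(&rwlock);
    return v;
}

void write_value(int v) {
    pthread_rwlock_wrlock(&rwlock);   /* writers get exclusive access */
    shared_counter = v;
    pthread_rwlock_unlock(&rwlock);
}
```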
P3L4
Can you work through the evolution of the spinlock implementations described in the Anderson paper, from basic test-and-set to the queuing lock? Do you understand what issue with an earlier implementation is addressed with a subsequent spinlock implementation?
- test_and_set spinlock
  atomic operation in the while loop, which causes contention and is disastrous on a cache-coherent architecture with write-invalidation
- test_and_test_and_set ("spin on read") spinlock
  atomic operation still in the while loop, but only executed if the cached value says the lock is free, which reduces contention; still bad on a cache-coherent architecture with write-invalidation, because every release invalidates all cached copies and triggers a burst of atomic attempts
  + improved latency
  + improved delay
  -/+ contention: NCC = same, CC+WU = O(n), CC+WI = O(n^2)
- test_and_test_and_set spinlock + delay
  same as above, but a delay after the lock is released spreads out the re-reads and acquire attempts, which helps on a cache-coherent architecture with write-invalidation because everyone isn’t going for the lock at the same time
  + improved latency
  + improved contention
  - worsened delay
- queueing (Anderson) spinlock
  one atomic operation (read_and_increment) when the thread requests the lock; subsequent spinning happens on a private per-thread variable, which reduces contention and avoids the problems of a cache-coherent architecture with write-invalidation
  + improved delay
  + improved contention
  - worsened latency
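A sketch of the array-based queueing lock, assuming a fixed bound on concurrent contenders; memory-ordering details and cache-line padding of the flags are elided for brevity:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NTHREADS 8   /* works while at most NTHREADS threads contend */

typedef struct {
    /* flags[i] == true means "slot i may take the lock". */
    volatile bool flags[NTHREADS];
    atomic_uint   queue_last;       /* next slot to hand out */
} qlock_t;

void qlock_init(qlock_t *l) {
    for (int i = 0; i < NTHREADS; i++) l->flags[i] = false;
    l->flags[0] = true;             /* first arrival gets the lock */
    atomic_init(&l->queue_last, 0);
}

unsigned qlock_lock(qlock_t *l) {
    /* One atomic read_and_increment per acquisition; after that each
     * thread spins on its own slot, so a release touches one waiter. */
    unsigned me = atomic_fetch_add(&l->queue_last, 1) % NTHREADS;
    while (!l->flags[me])
        ;   /* spin on a private location */
    return me;
}

void qlock_unlock(qlock_t *l, unsigned me) {
    l->flags[me] = false;                    /* reset my slot */
    l->flags[(me + 1) % NTHREADS] = true;    /* signal next in line */
}
```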
P3L4
What are the steps in sending a command to a device (say packet, or file block)? What are the steps in receiving something from a device?
Sending:
- Send data (user process)
- Form packet (kernel)
- Write Tx request record (driver)
- Perform transmission (device)
Receiving:
- Device retrieves data and passes it to the driver
- Driver passes it to the kernel
- Kernel passes it to the user process
P3L5
For block storage devices, do you understand the basic virtual file system stack, the purpose of the different entities?
file: the file’s data blocks on disk (may be non-contiguous)
inode: per-file index that tracks all of the file’s blocks; the inode itself resides on disk in some block (one contiguous index structure)
superblock: overall map of the disk’s blocks (inode blocks, data blocks, free blocks)
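A simplified sketch of an on-disk inode with direct and indirect block pointers (field names and counts are illustrative; ext2-style layouts use 12 direct pointers):

```c
#include <stdint.h>

#define NDIRECT 12   /* illustrative, borrowed from ext2-style layouts */

/* On-disk inode, simplified: a contiguous index of where the file's
 * (possibly scattered) data blocks live on disk. */
struct inode {
    uint32_t size;               /* file size in bytes */
    uint32_t direct[NDIRECT];    /* block #s of the first data blocks */
    uint32_t single_indirect;    /* block # of a block full of block #s */
    uint32_t double_indirect;    /* block # of a block of indirect blocks */
};
```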
P3L5