I/O Avoiding Algorithms Flashcards
What is an I/O Avoiding Algorithm?
Given a machine with a two-level memory hierarchy, I/O avoiding algorithms minimize the number of transfers between slow and fast memory. They assume these transfers are the dominant cost.
Handy log fact for converting log bases: log_(z/L)(x) = ?
(that’s log base z/L)
log_(z/L)(x) = log_2(x) / log_2(z/L)
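For example, applying this identity to the quantity that shows up in the sorting lower bound later in these cards:

```latex
\log_{Z/L}\!\left(\tfrac{n}{L}\right) \;=\; \frac{\log_2(n/L)}{\log_2(Z/L)}
```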
Describe “External Memory Merge Sort”, i.e., merge sort on n elements using a 2-level memory hierarchy and a sequential processor.
Phase 1:
- Divide the input into chunks of at most Z items (Z = the number of items that fit in fast memory). Each chunk has size f * Z, where f is in the range (0, 1]
- for each chunk:
- Load each chunk into fast memory and sort, resulting in a sorted “run”
- Write the sorted run back to slow memory
- Repeat this over all n/(f * Z) chunks
Costs of Phase 1:
- Comparisons = O(n * log_2(z))
- Transfers = O(n/L)
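A minimal Python sketch of Phase 1, treating ordinary lists as both memory levels (the function name and structure are mine, just to illustrate the chunk-sort-write loop):

```python
# Phase 1 sketch: form sorted runs, each small enough to fit in fast memory.
# Z = number of items that fit in fast memory, f in (0, 1].
def make_sorted_runs(slow_data, Z, f=1.0):
    run_size = max(1, int(f * Z))
    runs = []
    for start in range(0, len(slow_data), run_size):
        chunk = slow_data[start:start + run_size]  # "load" one chunk into fast memory
        chunk.sort()                               # in-memory sort: O(|chunk| log |chunk|) comparisons
        runs.append(chunk)                         # "write" the sorted run back to slow memory
    return runs
```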
Phase 2:
- You start phase 2 with m sorted runs in slow memory, each containing s items.
- Big picture: pair these runs up and merge the pairs, then do this again (and again) until everything has been merged into a single sorted run.
- At each level k, the size of each run is 2^k * s
- Detailed picture: For each merge, you need three L-sized buffers in fast memory, A_hat, B_hat, and C_hat, two runs to be merged, A and B, and an output location for the merged run, C.
- A and B are size 2^(k-1) * s and C is 2^k * s (twice as large)
- Load one L-size chunk from run A into A_hat, one chunk from run B to B_hat
- Merge elements from A_hat and B_hat, storing in C_hat. Continue as long as there is room in C_hat and you still have elements to merge in A_hat and B_hat
- When you empty A_hat or B_hat, refill from slow memory
- If you fill C_hat, store to slow memory and flush it.
- If you exhaust A or B, just copy remaining elements from the other.
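A sketch of one such buffered merge, with A_hat, B_hat, and C_hat modeled as L-sized Python lists (my illustration of the buffering idea, not lecture code):

```python
def buffered_merge(A, B, L):
    """Merge sorted runs A and B ("slow memory" lists) using three L-sized buffers."""
    C = []                                   # output run, written back to slow memory
    ai, bi = L, L                            # read positions into A and B
    A_hat, B_hat, C_hat = A[:L], B[:L], []   # initial L-sized reads

    def flush():                             # one L-sized write of the output buffer
        nonlocal C_hat
        C.extend(C_hat)
        C_hat = []

    while A_hat or B_hat:
        # take the smaller front element (or whatever remains if one side is done)
        if A_hat and (not B_hat or A_hat[0] <= B_hat[0]):
            C_hat.append(A_hat.pop(0))
        else:
            C_hat.append(B_hat.pop(0))
        if len(C_hat) == L:                  # C_hat is full: store it and flush
            flush()
        if not A_hat and ai < len(A):        # refill an emptied input buffer
            A_hat, ai = A[ai:ai + L], ai + L
        if not B_hat and bi < len(B):
            B_hat, bi = B[bi:bi + L], bi + L
    flush()                                  # write out the final partial block
    return C
```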
Cost to merge A and B in phase 2:
- You load each element from A and B only once, so that costs (2^(k-1) * s)/L + (2^(k-1) * s)/L transfers
- You write each element to C only once, so we need (2^k * s)/L transfers
- Hence Transfers = (2^(k-1) * s)/L + (2^(k-1) * s)/L + (2^k * s)/L = (2^(k+1) * s)/L
- Comparisons are linear, so Comparisons = theta(2^k * s)
Cost to merge all pairs in phase 2:
- Number of merged pairs per level is n / (2^k * s), i.e., total number of input items divided by run size at level k. (these WERE pairs at level k-1, but now they’ve been merged. that’s why it equals the number of runs at level k!)
- Number of levels is log_2(n / s), i.e., log base 2 of the total number of elements divided by the size of each original run.
- Total costs of merging all pairs in phase 2:
- Transfers = 2 * (n/L) * log_2(n / s), i.e., 2 * (number of transfers needed to load all elements) * (number of levels in the tree)
- Comparisons = theta(n * log_2(n / s)), i.e., (number of elements to be compared) * (number of levels in the tree)
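Written out as a sum over levels (my restatement of the two bullets above), every element is read once and written once at each of the log_2(n/s) levels:

```latex
\text{Transfers}_{\text{Phase 2}}
  = \sum_{k=1}^{\log_2(n/s)}
      \underbrace{\frac{n}{2^{k} s}}_{\text{merges at level } k}
      \cdot
      \underbrace{\frac{2^{k+1} s}{L}}_{\text{transfers per merge}}
  = \sum_{k=1}^{\log_2(n/s)} \frac{2n}{L}
  = \frac{2n}{L}\,\log_2\!\frac{n}{s}
```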
Costs of Phase 2:
- Comparisons = O(n * log_2(n/z))
- Transfers = O(n/L * log_2(n/z))
(Where do those z's come from? Each initial run has size s = f * Z = Theta(Z), so log_2(n/s) = Theta(log_2(n/Z)).)
Combined Costs of Phases 1 and 2:
- Comparisons = O(n * log(n)), i.e., n log Z + n log(n/Z) = n log Z + n[log n - log Z] = n log Z + n log n - n log Z = n log n
- Transfers = O(n/L * log(n/z)), i.e., Phase 2 dominates so it’s the same as Phase 2
What are the asymptotic runtimes of the following operations on a priority queue implemented as a MIN HEAP:
- Build the heap
- Extract the min
- Insert a new item
Build: O(k)
Extract min: O(log k) (need to re-heap after removing min)
Insert: O(log k) (similar to extract min, we have to re-heap)
What is k? It's the number of items currently in the heap; in the k-way merge below, the heap holds the front element of each run, so k = the number of runs being merged at once.
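A quick demo with Python's heapq, where heapify is the O(k) build and heappush/heappop are the O(log k) operations:

```python
import heapq

run_heads = [42, 7, 19, 3]             # e.g., the current front element of each of k = 4 runs
heapq.heapify(run_heads)               # build the min-heap: O(k)
smallest = heapq.heappop(run_heads)    # extract the min: O(log k), heap property restored
heapq.heappush(run_heads, 11)          # insert a new item: O(log k)
print(smallest, run_heads[0])          # 3, then 7 (the new minimum)
```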
Using multiway merging, what is the cost of 1 k-way merge? (transfers and comparisons)
Transfers = 2ks / L
- ks = number of elements transferred into memory
- L = the size of each transfer
- ks / L = the number of transfers necessary to bring in all data that will be merged
- Then we multiply by 2 because we're going to write the same number of elements back out to slow memory after merging
Comparisons = O(k + ks log k)
- O(k) = time to build the heap initially
- There are ks items to be merged
- Each of these ks items is inserted into and extracted from the heap exactly once, and each heap operation costs O(log k), hence the ks log k term
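A sketch of one k-way merge driven by that heap, assuming each run is already a sorted Python list (the heap holds one (value, run index, position) entry per run, so it never has more than k entries; names are mine):

```python
import heapq

def k_way_merge(runs):
    """Merge k sorted runs into one sorted list using a k-entry min-heap."""
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)                           # build: O(k)
    merged = []
    while heap:
        value, i, j = heapq.heappop(heap)         # extract-min: O(log k)
        merged.append(value)
        if j + 1 < len(runs[i]):                  # run i still has elements
            heapq.heappush(heap, (runs[i][j + 1], i, j + 1))   # insert: O(log k)
    return merged

# Each of the ks elements is pushed and popped exactly once -> O(ks log k) comparisons.
print(k_way_merge([[1, 4, 9], [2, 3, 8], [5, 6, 7]]))   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```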
What is the total cost of multiway merging (in both comparisons and transfers)?
Comparisons = O(n log n)
- i.e., it matches the Omega(n log n) bound you would expect for any comparison-based sort
Transfers = O(n/L log_(z/L) n/L)
- Maximum number of levels in merge tree is l = Theta(log_(z/L) n/L)
- Each run at level i is k^i * s items long (where k is the number of ways we merge)
- Transfers per run at level i = Theta((k^i * s) / L)
- Number of runs at level i = n / (k^i * s)
- Transfers per run times number of runs gives number of transfers at level i = n / L
- Number of transfers per level times the number of levels = n/L * log_(z/L) n/L
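The same bookkeeping written out (my restatement; the Z/L base comes from the fact that fast memory must hold one L-sized buffer per run being merged, so k can be at most about Z/L):

```latex
\text{transfers at level } i
  = \underbrace{\frac{n}{k^{i} s}}_{\text{runs at level } i}
    \cdot
    \underbrace{\Theta\!\left(\frac{k^{i} s}{L}\right)}_{\text{transfers per run}}
  = \Theta\!\left(\frac{n}{L}\right),
\qquad
\text{total} = \Theta\!\left(\frac{n}{L}\right) \cdot \Theta\!\left(\log_{Z/L}\frac{n}{L}\right)
             = \Theta\!\left(\frac{n}{L}\,\log_{Z/L}\frac{n}{L}\right)
```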
How good is the transfer complexity of multiway merge sort, O(n/L log_(z/L) n/L)?
As good as it gets! i.e., meets the lower bound on external memory sorting
What is the lower bound on external memory sorting?
Transfers t >= (n/L) * log_(z/L)(n/L)
How do we find the lower bound on external memory sorting?
- Before seeing the data (size n), there are n! possible ways in which it might be ordered (and we’re looking for one of them)
- When you read data from slow memory in general, you learn something about the possible orderings, reducing their number
- So, let K(t-1) := # orderings after t-1 transfers
- Let K(0) := n!
- At step t, load L items into fast memory. There are L! ways for them to be ordered
- Assuming we have other items in fast memory already, and they’re already ordered, how many ways are there to order Z-L old items plus L new items? At most z-choose-L * L! (this is the factor by which the number of possible orderings might decrease)
- Each read can cut the count by at most that factor, so K(t) >= K(t-1)/[z-choose-L * L!]. Iterating from K(0) = n! gives K(t) >= n!/[z-choose-L * L!]^t
- The L! term assumes we've never read these L items before. That can happen at most n/L times (there are only n/L blocks of unseen items), so we can tighten the bound: K(t) >= n!/([z-choose-L]^t * [L!]^(n/L))
- Now, when does this reach 1 (when does only one ordering remain)? n!/([z-choose-L]^t * [L!]^(n/L)) <= 1 IMPLIES t >= (n/L) * log_(z/L)(n/L)
What two approximations do we need to get the lower bound on external memory sorting?
- Stirling’s Approximation: log(x!) ~= x log x
- Also: log(a-choose-b) ~= b log (a/b)
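Putting those two approximations into the bound from the previous card (my worked version of the algebra):

```latex
\frac{n!}{\binom{Z}{L}^{t}\,(L!)^{n/L}} \le 1
\;\Longrightarrow\;
\log(n!) \le t\,\log\binom{Z}{L} + \frac{n}{L}\,\log(L!)
\;\Longrightarrow\;
n\log n \lesssim t\,L\log\frac{Z}{L} + n\log L
\;\Longrightarrow\;
t \ge \frac{n\log n - n\log L}{L\log(Z/L)}
  = \frac{n}{L}\,\log_{Z/L}\frac{n}{L}
```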
How many transfers are necessary for binary search on a sorted array, and why? (Big-O)
The array is stored entirely in slow memory.
O(log (n/L))
MY explanation:
Binary search requires log n comparisons. For each comparison we have to load an L-sized block of data, until at some point (before the log(n)-th comparison) we load a block that contains all of the items we're still considering.
At this point, no more transfers are needed! It will take log(L) comparisons to find the target within the block we just loaded, so we can subtract log(L) from log(n) to get the number of transfers needed. log(n) - log(L) = log(n/L).
LECTURE explanation:
We start with the same idea: at some point we will load a block of size L that contains all remaining items. We can write a recurrence to describe what happens:
if n > L, Q(n; Z, L) = 1 + Q(n/2; Z, L)
if n <= L, Q(n; Z, L) = 1
The answer to this recurrence is O(log (n/L)): each halving step costs one transfer, and the recursion bottoms out once n <= L, so there are log_2(n) - log_2(L) = log_2(n/L) halving steps (plus one final transfer for the last block).
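A tiny sketch that just simulates that recurrence and counts transfers (my model; a "transfer" is loading the L-sized block containing the current midpoint, and once the active range fits in one block no further transfers are needed):

```python
import math

def binary_search_transfers(n, L):
    """Count transfers via Q(n) = 1 + Q(n/2) while n > L, and Q(n) = 1 once n <= L."""
    transfers = 0
    while n > L:
        transfers += 1      # load the block holding the midpoint, halve the range
        n //= 2
    return transfers + 1    # one last transfer: the block holding the whole remaining range

# Roughly log2(n/L) transfers: for n = 2**20, L = 2**6 this prints 15 and 14.
print(binary_search_transfers(2**20, 2**6), math.ceil(math.log2(2**20 / 2**6)))
```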
We're performing binary search on an array of n items. Note that the target's index i takes at most log n bits to write down, i.e., it's O(log n) bits.
Let x(L) := the maximum number of bits “learned” on one L-sized transfer.
If we get an upper bound on x(L), we can get a lower bound on the number of I/O transfers during search: Q(n; Z, L) = Omega(logn / x(L)).
What is x(L) in a Big-O sense?
O(log L)
When we load L items into fast memory and compare against them, the target's position relative to the block is one of at most L + 1 possibilities (before the block, between two of the loaded items, or after the block). So one transfer narrows the set of candidate indices by at most a factor of about L, i.e., it reveals at most roughly log L bits of the target item's index.
Which of these data structures can achieve the lower bound on binary search in terms of transfers?
- Doubly linked list
- Binary search tree
- Red-black tree
- Skip List
- B-tree
B-tree, but only if you choose B so that B = Theta(L)
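A quick check of why B = Theta(L) hits the bound (my one-liner): each node fits in Theta(1) blocks, and a B-ary tree over n keys has height about log_B(n), so the number of transfers per search is

```latex
Q(n; Z, L) = O(\log_B n) = O\!\left(\frac{\log n}{\log L}\right) = O(\log_L n),
```

which matches the Omega(log n / log L) lower bound from the x(L) argument above.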
Describe a doubly linked list. Can it achieve the lower bound on binary search in terms of transfers? Why or why not?
A doubly linked list is a linked list with pointers to the nodes BOTH before and after the current node.
No, it cannot achieve the lower bound on binary search in terms of memory transfers. We can only transfer one node at a time into memory, because we need that node to find the others (we need the pointers!).
Describe a binary search tree. Can it achieve the lower bound on binary search in terms of transfers? Why or why not?
A binary search tree satisfies two properties:
1) It’s a binary tree, i.e., all nodes have at most two children.
2) If node y is in the left subtree of x, then y.val <= x.val. If y is in the right subtree of x, then y.val >= x.val.
NOT SURE BUT, I believe we have to read a node into fast memory to get its pointers before we know which child to fetch next, and with a generic layout that child usually lives in a different block. So each level of the tree can cost its own transfer, giving Theta(log_2 n) transfers instead of the O(log_L n) that matches the lower bound on search (in terms of transfers).