I/O Avoiding Algorithms Flashcards
What is an I/O Avoiding Algorithm?
Given a machine with a two-level memory hierarchy, I/O avoiding algorithms minimize the number of transfers between slow and fast memory. They assume these transfers are the dominant cost.
Handy log fact for converting log bases: log_(z/L)(x) = ?
(that’s log base z/L)
log_(z/L)(x) = log_2(x) / log_2(z/L)
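For example, applying this identity to the quantity that shows up in the sorting lower bound later in these cards:

```latex
\log_{Z/L}\!\left(\tfrac{n}{L}\right) \;=\; \frac{\log_2(n/L)}{\log_2(Z/L)}
```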
Describe “External Memory Merge Sort”, i.e., merge sort on n elements using a 2-level memory hierarchy and a sequential processor.
Phase 1:
- Divide the input into chunks of at most Z items (Z = the number of items that fit in fast memory). Each chunk has size f * Z, where f is in the range (0, 1]
- for each chunk:
- Load each chunk into fast memory and sort, resulting in a sorted “run”
- Write the sorted run back to slow memory
- Repeat this over all n/(f * Z) chunks
Costs of Phase 1:
- Comparisons = O(n * log_2(z))
- Transfers = O(n/L)
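A minimal Python sketch of Phase 1, treating ordinary lists as both memory levels (the function name and structure are mine, just to illustrate the chunk-sort-write loop):

```python
# Phase 1 sketch: form sorted runs, each small enough to fit in fast memory.
# Z = number of items that fit in fast memory, f in (0, 1].
def make_sorted_runs(slow_data, Z, f=1.0):
    run_size = max(1, int(f * Z))
    runs = []
    for start in range(0, len(slow_data), run_size):
        chunk = slow_data[start:start + run_size]  # "load" one chunk into fast memory
        chunk.sort()                               # in-memory sort: O(|chunk| log |chunk|) comparisons
        runs.append(chunk)                         # "write" the sorted run back to slow memory
    return runs
```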
Phase 2:
- You start phase 2 with m sorted runs in slow memory, each containing s items.
- Big picture: pair these runs up and merge the pairs, then do this again (and again) until everything has been merged into a single sorted run.
- At each level k, the size of each run is 2^k * s
- Detailed picture: For each merge, you need three L-sized buffers in fast memory, A_hat, B_hat, and C_hat, two runs to be merged, A and B, and an output location for the merged run, C.
- A and B are size 2^(k-1) * s and C is 2^k * s (twice as large)
- Load one L-size chunk from run A into A_hat, one chunk from run B to B_hat
- Merge elements from A_hat and B_hat, storing in C_hat. Continue as long as there is room in C_hat and you still have elements to merge in A_hat and B_hat
- When you empty A_hat or B_hat, refill from slow memory
- If you fill C_hat, store to slow memory and flush it.
- If you exhaust A or B, just copy remaining elements from the other.
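A sketch of one such buffered merge, with A_hat, B_hat, and C_hat modeled as L-sized Python lists (my illustration of the buffering idea, not lecture code):

```python
def buffered_merge(A, B, L):
    """Merge sorted runs A and B ("slow memory" lists) using three L-sized buffers."""
    C = []                                   # output run, written back to slow memory
    ai, bi = L, L                            # read positions into A and B
    A_hat, B_hat, C_hat = A[:L], B[:L], []   # initial L-sized reads

    def flush():                             # one L-sized write of the output buffer
        nonlocal C_hat
        C.extend(C_hat)
        C_hat = []

    while A_hat or B_hat:
        # take the smaller front element (or whatever remains if one side is done)
        if A_hat and (not B_hat or A_hat[0] <= B_hat[0]):
            C_hat.append(A_hat.pop(0))
        else:
            C_hat.append(B_hat.pop(0))
        if len(C_hat) == L:                  # C_hat is full: store it and flush
            flush()
        if not A_hat and ai < len(A):        # refill an emptied input buffer
            A_hat, ai = A[ai:ai + L], ai + L
        if not B_hat and bi < len(B):
            B_hat, bi = B[bi:bi + L], bi + L
    flush()                                  # write out the final partial block
    return C
```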
Cost to merge A and B in phase 2:
- You load each element from A and B only once, so that costs (2^(k-1) * s)/L + (2^(k-1) * s)/L transfers
- You write each element to C only once, so we need (2^k * s)/L transfers
- Hence Transfers = (2^(k-1) * s)/L + (2^(k-1) * s)/L + (2^k * s)/L = (2^(k+1) * s)/L
- Comparisons are linear, so Comparisons = theta(2^k * s)
Cost to merge all pairs in phase 2:
- Number of merged pairs per level is n / (2^k * s), i.e., total number of input items divided by run size at level k. (these WERE pairs at level k-1, but now they’ve been merged. that’s why it equals the number of runs at level k!)
- Number of levels is log_2(n / s), i.e., log base 2 of the total number of elements divided by the size of each original run.
- Total costs of merging all pairs in phase 2:
- Transfers = 2 * (n/L) * log_2(n / s), i.e., 2 * (number of transfers needed to load all elements) * (number of levels in the tree)
- Comparisons = theta(n * log_2(n / s)), i.e., (number of elements to be compared) * (number of levels in the tree)
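Written out as a sum over levels (my restatement of the two bullets above), every element is read once and written once at each of the log_2(n/s) levels:

```latex
\text{Transfers}_{\text{Phase 2}}
  = \sum_{k=1}^{\log_2(n/s)}
      \underbrace{\frac{n}{2^{k} s}}_{\text{merges at level } k}
      \cdot
      \underbrace{\frac{2^{k+1} s}{L}}_{\text{transfers per merge}}
  = \sum_{k=1}^{\log_2(n/s)} \frac{2n}{L}
  = \frac{2n}{L}\,\log_2\!\frac{n}{s}
```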
Costs of Phase 2:
- Comparisons = O(n * log_2(n/z))
- Transfers = O(n/L * log_2(n/z))
(Where do those z's come from? Each initial run has size s = f * Z = Theta(Z), so log_2(n/s) = Theta(log_2(n/Z)).)
Combined Costs of Phases 1 and 2:
- Comparisons = O(n * log(n)), i.e., n log Z + n log(n/Z) = n log Z + n[log n - log Z] = n log Z + n log n - n log Z = n log n
- Transfers = O(n/L * log(n/z)), i.e., Phase 2 dominates so it’s the same as Phase 2
What are the asymptotic runtimes of the following operations on a priority queue implemented as a MIN HEAP:
- Build the heap
- Extract the min
- Insert a new item
Build: O(k)
Extract min: O(log k) (need to re-heap after removing min)
Insert: O(log k) (similar to extract min, we have to re-heap)
What is k? It's the number of items currently in the heap; in the k-way merge below, the heap holds the front element of each run, so k = the number of runs being merged at once.
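A quick demo with Python's heapq, where heapify is the O(k) build and heappush/heappop are the O(log k) operations:

```python
import heapq

run_heads = [42, 7, 19, 3]             # e.g., the current front element of each of k = 4 runs
heapq.heapify(run_heads)               # build the min-heap: O(k)
smallest = heapq.heappop(run_heads)    # extract the min: O(log k), heap property restored
heapq.heappush(run_heads, 11)          # insert a new item: O(log k)
print(smallest, run_heads[0])          # 3, then 7 (the new minimum)
```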
Using multiway merging, what is the cost of 1 k-way merge? (transfers and comparisons)
Transfers = 2ks / L
- ks = number of elements transferred into memory
- L = the size of each transfer
- ks / L = the number of transfers necessary to bring in all data that will be merged
- Then we multiply by 2 because we're going to write the same number of elements back out to slow memory after merging
Comparisons = O(k + ks log k)
- O(k) = time to build the heap initially
- There are ks items to be merged
- Each of these ks items is inserted into and extracted from the heap exactly once, and each heap operation costs O(log k), hence the ks log k term
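A sketch of one k-way merge driven by that heap, assuming each run is already a sorted Python list (the heap holds one (value, run index, position) entry per run, so it never has more than k entries; names are mine):

```python
import heapq

def k_way_merge(runs):
    """Merge k sorted runs into one sorted list using a k-entry min-heap."""
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)                           # build: O(k)
    merged = []
    while heap:
        value, i, j = heapq.heappop(heap)         # extract-min: O(log k)
        merged.append(value)
        if j + 1 < len(runs[i]):                  # run i still has elements
            heapq.heappush(heap, (runs[i][j + 1], i, j + 1))   # insert: O(log k)
    return merged

# Each of the ks elements is pushed and popped exactly once -> O(ks log k) comparisons.
print(k_way_merge([[1, 4, 9], [2, 3, 8], [5, 6, 7]]))   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```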
What is the total cost of multiway merging (in both comparisons and transfers)?
Comparisons = O(n log n)
- i.e., it matches the Omega(n log n) bound you would expect for any comparison-based sort
Transfers = O(n/L log_(z/L) n/L)
- Maximum number of levels in merge tree is l = Theta(log_(z/L) n/L)
- Each run at level i is k^i * s items long (where k is the number of ways we merge)
- Transfers per run at level i = Theta((k^i * s) / L)
- Number of runs at level i = n / (k^i * s)
- Transfers per run times number of runs gives number of transfers at level i = n / L
- Number of transfers per level times the number of levels = n/L * log_(z/L) n/L
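The same bookkeeping written out (my restatement; the Z/L base comes from the fact that fast memory must hold one L-sized buffer per run being merged, so k can be at most about Z/L):

```latex
\text{transfers at level } i
  = \underbrace{\frac{n}{k^{i} s}}_{\text{runs at level } i}
    \cdot
    \underbrace{\Theta\!\left(\frac{k^{i} s}{L}\right)}_{\text{transfers per run}}
  = \Theta\!\left(\frac{n}{L}\right),
\qquad
\text{total} = \Theta\!\left(\frac{n}{L}\right) \cdot \Theta\!\left(\log_{Z/L}\frac{n}{L}\right)
             = \Theta\!\left(\frac{n}{L}\,\log_{Z/L}\frac{n}{L}\right)
```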
How good is the transfer complexity of multiway merge sort, O(n/L log_(z/L) n/L)?
As good as it gets! i.e., meets the lower bound on external memory sorting
What is the lower bound on external memory sorting?
Transfers t >= (n/L) * log_(z/L)(n/L)
How do we find the lower bound on external memory sorting?
- Before seeing the data (size n), there are n! possible ways in which it might be ordered (and we’re looking for one of them)
- When you read data from slow memory in general, you learn something about the possible orderings, reducing their number
- So, let K(t-1) := # orderings after t-1 transfers
- Let K(0) := n!
- At step t, load L items into fast memory. There are L! ways for them to be ordered
- Assuming we have other items in fast memory already, and they’re already ordered, how many ways are there to order Z-L old items plus L new items? At most z-choose-L * L! (this is the factor by which the number of possible orderings might decrease)
- Each read can cut the count by at most that factor, so K(t) >= K(t-1)/[z-choose-L * L!]. Iterating from K(0) = n! gives K(t) >= n!/[z-choose-L * L!]^t
- The L! term assumes we've never read these L items before. That can happen at most n/L times (there are only n/L blocks of unseen items), so we can tighten the bound: K(t) >= n!/([z-choose-L]^t * [L!]^(n/L))
- Now, when does this reach 1 (when does only one ordering remain)? n!/([z-choose-L]^t * [L!]^(n/L)) <= 1 IMPLIES t >= (n/L) * log_(z/L)(n/L)
What two approximations do we need to get the lower bound on external memory sorting?
- Stirling’s Approximation: log(x!) ~= x log x
- Also: log(a-choose-b) ~= b log (a/b)
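Putting those two approximations into the bound from the previous card (my worked version of the algebra):

```latex
\frac{n!}{\binom{Z}{L}^{t}\,(L!)^{n/L}} \le 1
\;\Longrightarrow\;
\log(n!) \le t\,\log\binom{Z}{L} + \frac{n}{L}\,\log(L!)
\;\Longrightarrow\;
n\log n \lesssim t\,L\log\frac{Z}{L} + n\log L
\;\Longrightarrow\;
t \ge \frac{n\log n - n\log L}{L\log(Z/L)}
  = \frac{n}{L}\,\log_{Z/L}\frac{n}{L}
```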
How many transfers are necessary for binary search on a sorted array, and why? (Big-O)
The array is stored entirely in slow memory.
O(log (n/L))
MY explanation:
Binary search requires log n comparisons. For each comparison we have to load an L-sized block of data, until at some point (before the log(n)-th comparison) we load a block that contains all of the items we're still considering.
At this point, no more transfers are needed! It will take log(L) comparisons to find the target within the block we just loaded, so we can subtract log(L) from log(n) to get the number of transfers needed. log(n) - log(L) = log(n/L).
LECTURE explanation:
We start with the same idea: at some point we will load a block of size L that contains all remaining items. We can write a recurrence to describe what happens:
if n > L, Q(n; Z, L) = 1 + Q(n/2; Z, L)
if n <= L, Q(n; Z, L) = 1
The answer to this recurrence is O(log (n/L)): each halving step costs one transfer, and the recursion bottoms out once n <= L, so there are log_2(n) - log_2(L) = log_2(n/L) halving steps (plus one final transfer for the last block).
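A tiny sketch that just simulates that recurrence and counts transfers (my model; a "transfer" is loading the L-sized block containing the current midpoint, and once the active range fits in one block no further transfers are needed):

```python
import math

def binary_search_transfers(n, L):
    """Count transfers via Q(n) = 1 + Q(n/2) while n > L, and Q(n) = 1 once n <= L."""
    transfers = 0
    while n > L:
        transfers += 1      # load the block holding the midpoint, halve the range
        n //= 2
    return transfers + 1    # one last transfer: the block holding the whole remaining range

# Roughly log2(n/L) transfers: for n = 2**20, L = 2**6 this prints 15 and 14.
print(binary_search_transfers(2**20, 2**6), math.ceil(math.log2(2**20 / 2**6)))
```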
We're performing binary search on an array of n items. Note that the target's index i takes at most log n bits to write down, i.e., it's O(log n) bits.
Let x(L) := the maximum number of bits “learned” on one L-sized transfer.
If we get an upper bound on x(L), we can get a lower bound on the number of I/O transfers during search: Q(n; Z, L) = Omega(logn / x(L)).
What is x(L) in a Big-O sense?
O(log L)
When we load L items into fast memory and compare against them, the target's position relative to the block is one of at most L + 1 possibilities (before the block, between two of the loaded items, or after the block). So one transfer narrows the set of candidate indices by at most a factor of about L, i.e., it reveals at most roughly log L bits of the target item's index.
Which of these data structures can achieve the lower bound on binary search in terms of transfers?
- Doubly linked list
- Binary search tree
- Red-black tree
- Skip List
- B-tree
B-tree, but only if you choose B so that B = Theta(L)
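A quick check of why B = Theta(L) hits the bound (my one-liner): each node fits in Theta(1) blocks, and a B-ary tree over n keys has height about log_B(n), so the number of transfers per search is

```latex
Q(n; Z, L) = O(\log_B n) = O\!\left(\frac{\log n}{\log L}\right) = O(\log_L n),
```

which matches the Omega(log n / log L) lower bound from the x(L) argument above.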
Describe a doubly linked list. Can it achieve the lower bound on binary search in terms of transfers? Why or why not?
A doubly linked list is a linked list with pointers to the nodes BOTH before and after the current node.
No, it cannot achieve the lower bound on binary search in terms of memory transfers. We can only transfer one node at a time into memory, because we need that node to find the others (we need the pointers!).
Describe a binary search tree. Can it achieve the lower bound on binary search in terms of transfers? Why or why not?
A binary search tree satisfies two properties:
1) It’s a binary tree, i.e., all nodes have at most two children.
2) If node y is in the left subtree of x, then y.val <= x.val. If y is in the right subtree of x, then y.val >= x.val.
NOT SURE BUT, I believe we have to read a node into fast memory to get its pointers before we know which child to fetch next, and with a generic layout that child usually lives in a different block. So each level of the tree can cost its own transfer, giving Theta(log_2 n) transfers instead of the O(log_L n) that matches the lower bound on search (in terms of transfers).