6. Sorting Flashcards

1
Q

Why databases need special sorting algorithms?

A

All of the traditional sorting algorithms (i.e. quick sort, insertion sort, etc.) rely on us being able to store all of the data in memory. This is a luxury we do not have when developing a database. In fact, most of the time our data will be an order of magnitude larger than the memory available to us.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What do we measure when analyzing database-specific sorting algorithms?

A

Because of how time-consuming it is to go to disk, we only look at the number of I/Os an algorithm incurs when analyzing its performance. We can pretty much ignore traditional measures of algorithmic complexity like big-O.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Two Way External Merge Sort - explain the algorithm

A
  1. Sort every individual phase (“conquer” phase)
  2. Sorted run - merge 2 unsorted sets of pages (“merging” phase)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Two Way Merge complexity - analysis sketch

A

N data pages

each pass over the data will take 2∗N I/Os

“conquer” pass is always needed, so we have 2*N right away

we need log2(N) merging passes,

so

2N ∗ (1 + log2(N)) I/Os in total

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Full External Sort - a main improvement over Two Way Merge

A

Use an in-memory buffer to store sorted results instead of sorting each page separately

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Full External Sort - explain

A

Let’s assume we have B buffer pages.

  1. Conquer phase - load B pages and sort them all at once. Repeat until such bundles of pages are sorted.
  2. Merge phase - merge B-1 sorted runs at a time (instead of 2 in two-way merge)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Full External Sort complexity - analysis sketch

A

The conquering pass produces only N/B sorted runs

As soon as we merge B-1 runs on every step, we have

logB−1 (N/B) sorted runs

(log base is B-1 instead of 2)

So…

2N ∗ (1 + logB−1 N/B) I/Os

How well did you know this?
1
Not at all
2
3
4
5
Perfectly