sorting Flashcards
why is external sorting needed?
files are stored on the disk and it’s too large for in-memory sorting methods
in multi-way sort there are multiple passes
what does pass 0 do?
what about all the subsequent passes?
pass 0 is used to read a single page and sort it
subsequent passes keep X buffers X-1 buffers for input and 1 buffer for output, these will sort the pages between themselves
in a 2-way external merge sort how do you calculate
1. the number of passes
2. total I/O pages
- ceil(log2(N)) + 1
- 2N( ceil ( log2 (N) ) + 1 )
in a general external merge sort how do you calculate where you have B buffers
1. the number of passes
2. total I/O pages
- ceil ( logB-1 (N/B) ) + 1
- 2N ( ceil (logB-1 (N/B) + 1 )
what needs to be the case for us to be able to reduce I/O time per page
if the pages are sequentially stored we can reduce I/O time per page by having larger input buffers. We will take more pages in at a time. However it will cost additional seek time and rotational delay
Regarding blocked buffers and their sorting costs, if
buffer (memory) size is B pages
input buffer block size is b pages
N is the number of pages
what is the
1. number of input buffers
2. number of runs after pass 0
3. I/O pages
- (B-1)/b
- ceil (N/B)
- 2N ( 1 + |logB-1/b ( ceil (N/b) ) | )
in blocked buffer sorting, a larger input buffer size does what?
a larger input buffer block size reduces fan in , which increases passes and I/O pages but reduces the time per I/O page
true or false: clustered B+ trees are always better for sorting than external sorting
true
true or false: unclustered B+ Trees are always better for sorting than external sorting
false, it could be more expensive than external sorting
true or false: iterating over unclustered index can be faster than external sorting
True. if the # of records are low enough, iterating over unclustered indices can be faster. However, the number of records per page are normally large enough enough so that external sorting is faster.
- slide 25
how do you calculate the # of runs of B pages?
B = buffer pages
ceil ( N / B ) runs of B pages
what is the formula to calculate the # of runs merged
ceil ( (B-1) / b )
b: is input buffer block size.
B: # of buffer pages
if we have a larger b (input buffer block),
- does it increase or decrease I/O time per page?
- does it increase or decrease the # of runs merged?
- does it increase or decrease the number of passes?
b is
Larger b means:
- less I/O time per page
- smaller # of runs merged
- more passes and I/O pages
do we prefer an internal sort that produces longer runs or shorter runs
longer runs
how many pages does replacement sort produce per page
Replacement sort produce runs of 2*B
pages each