Sorting Algorithms Flashcards
Describe external sorting
Sorting algorithms that are suitable for very large files on disk whose records are not entirely stored in main memory
Describe the sort-merge strategy
The strategy involves sorting small subfiles of the main file known as runs, these runs are then merged into sorted into larger sorted subfiles which are sorted in turn
Describe the difference in sorting for relations that fit in the memory and those that do not
For those that fit, techniques such as quicksort, bubble sort and merge sort are used.
For those that do not, external sort-merge is used
In what cases is sorting not done
Sorting is not done when there exists an appropriate index on the file, eg primary or clustered, that allows ordered access to the files
Describe the buffer space in the sort-merge algorithm
Like all algorithms, sort-merge requires a buffer space in the main memory where actual sorting and merging of the runs takes place
The buffer space is part of the DBMS cache, an area in the computer main memory that is controlled by the DBMS
The buffer is divided into individual buffers where each buffer is the same size in bytes as the size of one disk block. Each individual buffer corresponds to exactly one block
Describe the sorting phase of the sort-merge algorithm
Runs of the file are read into the main memory (buffer space) sorted using an internal sorting algorithm and then written back to the disk as sorted subfiles.
How is the number of initial runs required determined?
The size of the run and number of initial runs (nR) is determined by the number of file blocks (b) and the available buffer space (nB) using the formula:
nR=⎡(b/nB)⎤
Do example in page 35
Describe the merging phase in sort-merge
The sorted runs are merged in one or more merge passes. These merge passes have one or more merge steps
-one buffer block is needed to hold one disk block from each of the sorted subfiles being merged, and one additional buffer is needed for containing one disk block of the merge result
What is the degree of merging (dM) and how is it determined
This is the number of sorted subfiles that can be merged in one merge step
dM is the smaller of (nB-1) and nR
*nB-available buffers nR- number of initial runs
How do you calculate the number of merge passes required
Done manually by counting or:
[(log dM (nR))]
Do example on page 37
Describe the cost analysis of the merge-sort algorithm
The cost is measured based on the number of disk block reads and writes before the sorting is done/
To calculate:
(2b)+(2b*(log dM (nR)))
The first part is for the sorting phase. Each block is accessed twice (reading into main memory and writing back into disk)
The second part is for the merging phase. Similar to the sorting phase, each block is accessed twice. We the multiply by the number of merge passes.
Do example in 41
irh