Sorting, Searching and Selection Flashcards
Theorem for the lower bound of sorting algorithms
For any comparison-based sorting algorithm there exists at least one input of length n that requires Ω(n*log(n)) comparisons. (Proof idea: the algorithm must distinguish all n! possible input orderings, and a binary decision tree with n! leaves has height at least log_2(n!) = Ω(n*log(n)).)
How does bucket sort work?
Suppose the values you want to sort are between 0 and k-1
(3, 2, 1, 2, 0, 3, 2, 1, 4, 0, 4, 3, 0)
Create k queues (buckets) B[0] to B[k-1]
Go through the array placing the values in their appropriate buckets
B[0] = 0, 0, 0 | B[1] = 1, 1 | B[2] = 2, 2, 2 | B[3] = 3, 3, 3 | B[4] = 4, 4
Empty the buckets in order
0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4
If we’re not sorting numbers but rather arbitrary things, then we just assign a numerical key to each item and use that for sorting
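A minimal Python sketch of this procedure (the function name bucket_sort and the use of collections.deque for the FIFO buckets are my own choices, not from the source):

from collections import deque

def bucket_sort(items, k):
    # Sort integers in the range 0..k-1 using k FIFO buckets.
    buckets = [deque() for _ in range(k)]        # create k buckets: O(k)
    for x in items:                              # distribute items: O(n)
        buckets[x].append(x)
    result = []
    for b in buckets:                            # empty buckets in order: O(n + k)
        while b:
            result.append(b.popleft())
    return result

print(bucket_sort([3, 2, 1, 2, 0, 3, 2, 1, 4, 0, 4, 3, 0], 5))
# prints [0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4]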
Explain the time complexity of bucket sort
Worst case: O(n + k), where n = number of items and k = the bound on the values
Creating buckets: O(k)
Putting items into buckets: O(n)
“Spilling” (emptying) the buckets: O(n + k), since every bucket must be visited, even the empty ones
If k gets really large then the running time is dominated by O(k)
What is radix sort?
When sorting numbers, create one bucket per possible digit value (so 10 buckets in base-10)
Bucket-sort by the rightmost digit
Bucket-sort again, this time by the second rightmost digit
The number of rounds equals the number of digits in the largest number
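A short Python sketch of this least-significant-digit-first radix sort (base 10; the function name and digit extraction via integer division are my own choices):

def radix_sort(items):
    if not items:
        return items
    d = len(str(max(items)))                     # d = number of digits in the largest value
    for pos in range(d):                         # one bucket-sort round per digit position
        buckets = [[] for _ in range(10)]        # one bucket per digit value 0-9
        for x in items:
            buckets[(x // 10 ** pos) % 10].append(x)
        items = [x for b in buckets for x in b]  # empty buckets in order (keeps each round stable)
    return items

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# prints [2, 24, 45, 66, 75, 90, 170, 802]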
Explain the time complexity of radix sort
If the values are in the range 0 to k-1, then the biggest number has d = ceil(log_10(k)) digits. We perform a bucket sort (O(n) per round) d times, giving a time complexity of O(d*n) = O(n*log(k))
Compare radix sort to bucket sort
Radix sort: O(n*log(k))
Bucket sort: O(n + k)
If k = n: O(n*log(n)) vs O(n + n) = O(n), so bucket sort wins
If k = n^2: O(n*log(n)) vs O(n^2), so radix sort wins
If k = 2^n: O(n^2) vs O(2^n), so radix sort wins
How would you search for x in a sorted list length n using binary search?
Look at position ceil(n/2); if it contains x then we’re done
If x is bigger than this middle value, then recursively binary search the top half of the list
If x is smaller than this middle value, then recursively binary search the bottom half of the list.
O(log(n)) time complexity vs O(n) for linear search
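An iterative Python sketch of this search (the iterative form rather than literal recursion is my choice; returning -1 for a missing value is an assumption):

def binary_search(a, x):
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # the middle position
        if a[mid] == x:
            return mid                # found x
        elif x > a[mid]:
            lo = mid + 1              # search the top half
        else:
            hi = mid - 1              # search the bottom half
    return -1                         # x is not in the list

print(binary_search([0, 1, 2, 3, 5, 8, 13, 21], 8))  # prints 5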
What is selection?
The process of finding the i’th smallest element in an unsorted array
Describe how QuickSelect works
Searching for the i’th smallest item in A:
Call a partition function to get the index of the pivot item
If the pivot index is i, return A[i]
If the pivot index is greater than i, QuickSelect the part of the list before the pivot
If the pivot index is smaller than i, QuickSelect the part of the list after the pivot
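A Python sketch of QuickSelect using a Lomuto-style partition with the rightmost element as pivot (that pivot choice and the 0-indexed i are my assumptions, not from the source):

def partition(a, lo, hi):
    pivot = a[hi]                     # rightmost element as pivot
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]         # put the pivot into its final position
    return i                          # the pivot index

def quickselect(a, i, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    p = partition(a, lo, hi)
    if p == i:
        return a[p]                   # the pivot is the i'th smallest item
    elif p > i:
        return quickselect(a, i, lo, p - 1)   # recurse into the part before the pivot
    else:
        return quickselect(a, i, p + 1, hi)   # recurse into the part after the pivot

print(quickselect([3, 2, 1, 5, 4], 2))  # prints 3 (the third-smallest element)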
QuickSelect performance
Worst case (e.g. an already-sorted input with the rightmost element as pivot): each recursive call is on a part only one element smaller than the current input, so T(n) = T(n-1) + O(n), which gives T(n) = O(n^2)
Expected (average-case) running time is O(n)
How does Median-of-Medians work?
Same as QuickSelect, but the pivot is chosen differently: divide the n elements into groups of 5 and find the median of each group. Then recursively select the median x of this list of group medians and use x as the pivot
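A Python sketch of this scheme (the three-way partition into new lists is a simplification of the usual in-place version; the name mom_select is my own):

def mom_select(a, i):
    # Return the i'th smallest element of a (0-indexed) in worst-case O(n) time.
    if len(a) <= 5:
        return sorted(a)[i]
    # Medians of the groups of 5: O(n)
    medians = [sorted(a[j:j + 5])[len(a[j:j + 5]) // 2] for j in range(0, len(a), 5)]
    # Recursively select the median of the medians as pivot: T(n/5)
    pivot = mom_select(medians, len(medians) // 2)
    # Partition around the pivot: O(n)
    less = [y for y in a if y < pivot]
    equal = [y for y in a if y == pivot]
    greater = [y for y in a if y > pivot]
    if i < len(less):
        return mom_select(less, i)                        # at most 7n/10 + 3 elements
    elif i < len(less) + len(equal):
        return pivot
    else:
        return mom_select(greater, i - len(less) - len(equal))

print(mom_select([9, 1, 0, 2, 3, 4, 6, 8, 7, 10, 5], 5))  # prints 5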
Median-of-Medians analysis
At least half of the group medians are less than x, which is floor(floor(n/5)/2) = floor(n/10) group medians
Each such median brings two smaller elements from its own group of 5 with it, so at least 3*floor(n/10) elements are less than x
By the symmetric argument, at least 3*floor(n/10) elements are greater than x
The recursive call after partitioning is therefore executed on at most n - 3*floor(n/10) elements
n - 3*floor(n/10) <= n - 3*(n/10 - 1) = 7n/10 + 3
T(n) = O(n) (finding the medians of the 5-element groups) + T(n/5) (recursively selecting the median of the medians) + O(n) (partitioning) + T(7n/10 + 3) (the recursive call on the remaining part)
T(n) = T(n/5) + T(7n/10 + 3) + n
Guess T(n) <= c*n and verify by substitution: c*n/5 + c*(7n/10 + 3) + n = 9c*n/10 + 3c + n <= c*n holds, for example, for c = 20 and n >= 60; therefore T(n) = O(n)