Week 3 Flashcards
What is the sorting problem?
- given a collection of n elements and a total ordering arrange the elements into a non-decreasing order
What is a priority queue?
- a priority queue is a container of elements, each having an associated key
- Keys determine the priority used in picking elements to be removed
What are the 4 fundamental methods performed on a priority queue
- insertItem(k, e) - element e with key k
- removeMin() - remove element with the minimum key
- minElement() - return element with the minimum key currently in the priority queue
- minKey() - return key of minimum element
what order are elements stored in in priority queues?
- we work on maintaining the element with the minimum key currently in the priority queue
How is a deletion completed in a priority queue and what is the time complexity of this?
- The elements with the current minimum key is deleted and the priority queues minimum is updated.
- this is completed in logarithmic time
How can we use a priority queue to perform sorting on a set of elements in two phases?
- Phase 1: put elements of set one by one into an initially empty priority queue by a series of insertions, after each insertion the priority queue pulls up the minimum so it’s ordered and we have the minimum in linear time
- Phase 2: extract the elements from the priority queue with a series of remove operations. When all items have been extracted and added to a list, this list will be sorted in non-decreasing order
Describe the primary queue sorting algorithm
- the priority queue is initially empty
- while the input set is not empty, remove the first element and insert it into the priority queue, do n times for every item in the list
- then when we have a full sorted priority queue, remove all n items, by searching through the priority queue and adding the minimum element to the new sorted list.
What is the running time of the priority queue sorting algorithm?
O(nlog) because there’s technically 2nlogn since putting n elements in PQ then searching logn to find where to insert them takes nlogn time, and we do this again n times when we remove every item from the pq to add to a sorted list, hence nlogn, but 2nlogn = O(nlogn)
What is a heap data structure?
- This is an implementation of a priority queue that is efficient for both insertions and deletions
- elements and keys are stored in a complete binary tree (so every level apart from perhaps the bottom level will have a the maximum number of children possible)
What are the properties of a heap data strucutre?
- In a heap T, for every node v (excluding the root) it’s key is >= it’s parents key
- if we take any path from the root to an external leaf, it only had elements that are non-decreasing, so the root must be the global minimum
How can we implement a heap data structure?
- As a nearly complete binary tree containing elements with keys satisfying the heap-order property stored in an array
- need to store a reference to the last used node T in the bottom level of the binary tree
- a comparator function that defines the total order relation on keys and which is used to maintain the minimum/maximum element at the root of T
What is the last used node/element at the bottom level of the binary tree/heap?
The rightmost element/node on the lowest level of the tree
How are items inserted into heaps?
- begin by adding the element to the bottom of tree in the position of the first unused/empty child
- then if need be, the new element bubbles it’s way up the heap until the heap order property is restored
What is the time complexity of insertion into a heap?
At max we’ll need to do 2^n number of swaps where n is the height of the heap, the insertion is only O(1) so the time complexity is log2(n)
How is deletion in a heap carried out?
- deletion consists of removing the minimum/maximum element which is at the root, so we remove this
- the element in the ‘last’ position is immediately moved to the root and then sinks/bubbles down to restore heap-order property
When bubbling the item at the root node down, how do we decide which subtree’s element to swap with?
- We swap with the element that’s smallest.
- For example if the item at the root = 8 and the left subtree holds 6 and the right subtree holds 5, we would swap 8 with 5 at the right subtree
- this is because if we move the larger value up this will break the heap-order property
- if we moved 6 up, then 6 is bigger than 5 so the heap-order property would be broken.
What are the time complexities of finding the size/is empty, minElement, minKey, insertItem and removeItem methods for a heap?
- size, is Empty - O(1)
- minElement, minKey - O(1)
- insertItem - O(logn)
- removeMin - O(logn)
What is the divide and conquer method and how is it achieved?
- it’s a method for solving certain types of algorithmic problems
- divide the input data into two or more disjoint subsets
- recursively solve the subproblems associated with the subsets
- conquer: take the solutions to sub problems and merge them into a solution to solve the original problem
How is a merge sort performed?
- a way to apply divide and conquer method to sorting
- setting an input size 0 or 1 as the base case
- dividing the input sequence in half, recursively divide each sequence in half until every input is in a list on their own, this will hit the base case and unwind
- as the algorithm unwinds, it will sort each list, merging them together into a final completely sorted list
What are the 3 steps in the divide and conquer method
- divide, recur, merge
What does merging mean in a merge sort
Putting the inputs in global sorted order
What is the time complexity of a merge sort
- sorts in time nlogn
- this is since during each recursive step we divide a sublist in two which takes O(n) time, and we will recursively call the algorithm at most O(logn) times since we’re dividing the list in two, this gives O(nlogn) run time for mergesort
What are counting inversions?
Having two sets of rankings of numbers, e.g. 1 3, 4, 5, 2 and 1, 2, 4, 3, 5 and counting the number of inversions between the two rankings to give a meaningful metric of the difference in the two rankings
how do we count inversions
- Suppose that a1, a2, a3… an denotes a permutation of the integers 1, 2, … n, the pair of integers i,j are said to form an inversion if i <j> aj</j>
- for example if a1 = 1, a2 = 2, a3 = 3, a4 = 4, a5 = 5 and we have the ranking 1,3,2,4,5
- here i = 3 and j = 2 but 2 < 3 so this is an inversion
What happens if we invert a permutation from it’s correct order?
It’ll contain the largest umber of inversions
What is the naive approach of counting inversions?
-running through all pairs to see if they form an inversion in the permutation which gives us a run time of O(n^2) which is not efficient
What is the more efficient method for counting inversions and what time complexity is this completed in?
- using a divide and conquer algorithm
- we divide the permutation list in half. we recursively divide the permutation list until each item is in a list on it’s own (this is the base case), we then merge the lists by sorting each sublist, as we merge the lists we count the number of inversions from pairs. We merge the lists until only one sorted list is returned
- then we have counted all the inversions
How do we know that the divide and conquer counting inversion algorithm counts all of the inversions?
- Because when we put an item into a merged list, we’ll have already counted it’s inversions in it’s own list and it’s inversions with the other list will be counted while we move it into the merged list,
- we count all the items with another list by following that if we’re moving an item I from B into the merged list, since each list is sorted and I is being moved before any other items in A, we know that I must be smaller than the ones in A, even though all of these come before I in the input list, hence we can count an inversion between I and every item left in A
What is the quicksort algorithm
- a divide and conquer algorithm that performs sorting
- it chooses a pivot x in the input set and creates three subsets, less than, equal to or greater than x
- it then recursively sorts the elements into each group
- one every element is in a group on it’s own (base case), it unwinds, merging every list into a sorted list until there is one final sorted list
What is the runtime of a quicksort?
- a pivot must be chosen each time, if a bad choice is made, meaning that it leads to the height of the quicksort tree being n, then worst case time complexity = O(n^2) as we’re performing n operations to sort the n elements, for n levels
what is a randomised quicksort?
- where we randomly select the pivot element, instead of having a specific formula for choosing it in the normal quicksort
what is basic probability?
- it can be used to analyse the average-case performance of algorithms
- some inputs are more/less probable to appear, so we can use probability distribution to pick the more efficient inputs
What is a sample space? explain what the same space of a fair 6 sided dice is?
A sample space is the set of all possible outcomes from an experiment
- sample space of a fair dice is {1,2,3,4,5,6} as these are all the results you can get from rolling the dice
What is a probability space and an event?
a probability space is a sample space s and a probability function pr, which maps subsets of S to [0,1]
- an event is a subset of the sample space
What is a probability function?
A probability function Pr possesses the following properties:
- Pr(empty set) = 0
- Pr(whole set S) = 1
- Pr(a subset of S) = probability between 0 and 1
- if A and B are subsets of S and have no intersection of elements, the probability of A and B happening is the same as the probability of A happening + the probability of B happening
give an example of how independent events effect the probability of landing head on a coin toss?
Independent events means that two events do not have any effect on the probability of the other happening. So if I land heads first, it does not increase or decrease the probability that the coin will next land on heads again or on tails
How do we mathematically represent that two events are mutally independent?
Pr(A n B) = Pr(A) * Pr(B)
What is conditional probability?
The probability of an event occurring given that some other event has already occurred.
How do we calculate conditional probability?
Pr(A n B) / Pr(B)
probability of the intersection of A and B divided by the probability of A happening
What is a random variable?
A function that maps outcomes from some sample space to real numbers.
Where do we use random variables in algorithms?
We use them with a discrete set of outcomes to characterise running time of a randomised algorithm
What does the expectation of a random variable tell us?
It tells us what value I should expect from the experiment, repeating it many time and taking the mean
What is the expected running time of a randomised quicksort?
O(nlogn) (which is much better than the normal quicksort with complexity n^2)
What criteria does a good pivot in a quicksort meet?
a good pivot is if the size of neither L or G is a larger than 3/4*size of the set
Why can we class the time complexity of the randomised quicksort algorithm as nlog2n
Because anything that is logi(n) can also be logj(n) is j > i
- so since the randomised quicksort has a time complexity of log4/3(n) it also belongs to the class of log2(n) since 2 > 4/3