Sorting Fundamentals Flashcards

1
Q
  1. Time complexity is usually more important to analyse than space complexity, because memory is easier to add to a machine.
  2. The system-independent variable we analyse against is the input size of the data structure (e.g. the length of the array).
A

Brute Force / Selection Sort :

  1. Run the outer loop from 0 to n-1.
  2. For every i, find the index of the smallest element in A[i..n-1] and swap it with A[i].
  3. So after iteration i, index i holds the (i+1)-th smallest element. That is, index 1 holds the second smallest.
for i in range(0, n):
    minindex = i
    for j in range(i+1, n):
        if A[j] < A[minindex]:
            minindex = j
    A[i], A[minindex] = A[minindex], A[i]

return A

Time complexity is O(n^2), because the inner loop performs n(n-1)/2 comparisons in total. Lower-order terms are ignored because, as n tends to infinity, they become negligible relative to the leading term.

For example, in an^2 + bn + c, the bn + c part is ignored because its contribution relative to an^2 tends to 0 as n tends to infinity, and the constant c is ignored as well.
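As a quick sanity check (a sketch, not part of the original card), we can instrument selection sort to count its inner-loop comparisons and confirm the n(n-1)/2 total:

```python
def selection_sort_comparisons(A):
    """Selection sort instrumented to count inner-loop comparisons."""
    n = len(A)
    comparisons = 0
    for i in range(n):
        minindex = i
        for j in range(i + 1, n):
            comparisons += 1
            if A[j] < A[minindex]:
                minindex = j
        A[i], A[minindex] = A[minindex], A[i]
    return comparisons

A = [5, 2, 9, 1, 7, 3]
count = selection_sort_comparisons(A)
n = len(A)
print(count, n * (n - 1) // 2)  # 15 15
print(A)                        # [1, 2, 3, 5, 7, 9]
```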

2
Q

Bubble Sort :

  1. Loop i from 0 to n-1.
  2. For every i, run an inner loop from n-1 down to i+1.
  3. The inner loop checks if A[red] < A[red-1]. If yes, swap the two.
  4. This bubbles the (i+1)-th smallest element down to index i. That is, at index 0 it places the smallest element, and so on until the array is sorted. No inner-loop work remains when i = n-1.

for i in range(0, n):
    for red in range(n-1, i, -1):
        if A[red] < A[red-1]:
            A[red], A[red-1] = A[red-1], A[red]

return A

Time complexity is O(n^2). This is because the inner loop runs n(n-1)/2 times in total.

A
3
Q

Decrease And Conquer Design strategy :

  1. Two ways of doing this :
    i. Top Down : here we nibble away one piece of the problem at a time. Imagine a lazy manager who does one small piece of the problem and passes the rest on to a delegate. The delegate does the same, and so on, until the last developer receives a problem of size 1. So each manager solves a piece of size 1 and reduces the problem size from n to n-1.

In terms of sorting, selection sort and bubble sort are examples of this approach, where we take one number in the array at a time and place it at its correct index.

ii. Bottom up : here we assume the problem of size n-1 is already solved. That is, the array up to index n-1 is already sorted and we only need to insert the last element at its correct index. We apply this view at every arbitrary i in the array.

In terms of sorting, insertion sort is an example of bottom-up decrease and conquer. Assume we hold a few cards in our hand, already sorted, and need to insert a new card at its correct position. For this, every card greater than the new card must move one place over. So we store the new card in a temp variable and shift every greater card one place to the right. When we reach a card smaller than it, we stop shifting and insert the new card at that location.

A

Insertion Sort : 4,5,6,7,3

for i in range(1, n):
    temp = A[i]
    red = i - 1
    while red >= 0 and A[red] > temp:
        A[red+1] = A[red]
        red = red - 1
    A[red+1] = temp

T(n) = Σ over all managers i of (work done by manager i)

T(n) = Σ (number of shift operations at i × C1 + C2)

T(n) = Σ (number of shift operations at i × C1) + C2·n

Best case : if the array is already sorted, no shift operations are needed, hence T(n) = O(n).
Average case : with elements in random order, about i/2 shifts are needed at index i, so T(n) = O(n^2).
Worst case : if the array is in descending order, i shifts are needed at index i, so T(n) = O(n^2).

Here, the while loop checks a condition before doing each shift, so the number of shifts differs between cases, which is why insertion sort has different best, average and worst case time complexities. In selection and bubble sort, however, the two loops always run in full, so the time complexity is O(n^2) in the best, average and worst case alike.
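The case analysis above can be verified by counting shifts directly (a small sketch, not from the original card):

```python
def insertion_sort_shifts(A):
    """Insertion sort instrumented to count shift operations."""
    shifts = 0
    for i in range(1, len(A)):
        temp = A[i]
        red = i - 1
        while red >= 0 and A[red] > temp:
            A[red + 1] = A[red]
            red -= 1
            shifts += 1
        A[red + 1] = temp
    return shifts

print(insertion_sort_shifts([1, 2, 3, 4, 5]))  # 0  (best case: already sorted)
print(insertion_sort_shifts([5, 4, 3, 2, 1]))  # 10 (worst case: n(n-1)/2 shifts)
```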

4
Q

Divide & Conquer : Merge Sort

  1. This is another design strategy, where a problem is divided into parts (usually 2). Say a lazy manager divides the problem at hand into 2 and delegates each subproblem to a delegate.
  2. Each delegate solves its subproblem (using recursion).
  3. The manager then combines both solved subproblems to get the solution.
A
def mergeSort(A, start, end):
    if start >= end:
        return

    mid = (start + end) // 2

    mergeSort(A, start, mid)
    mergeSort(A, mid+1, end)

    # combine step: merge the two sorted halves
    res = []
    i = start
    j = mid + 1
    while i <= mid and j <= end:
        if A[i] <= A[j]:
            res.append(A[i])
            i += 1
        else:
            res.append(A[j])
            j += 1
    while i <= mid:
        res.append(A[i])
        i += 1
    while j <= end:
        res.append(A[j])
        j += 1

    A[start:end+1] = res

def merge(A):
    mergeSort(A, 0, len(A)-1)

A = [4, 3, 1, 2]
merge(A)
print(A)

5
Q

Merge Sort Analysis :
T(n) = 2*T(n/2) + c’n

The c'n term comes from the combine step, where we merge n elements.
Sorting the two halves contributes 2T(n/2).

This can be written as T(n) = 2[2T(n/4) + c’(n/2)] + c’n
= 4T(n/4) + 2c’(n/2) + c’n
= 4T(n/4) + 2c’n
This keeps going until level h, where h is the height of the recursion tree.
At level h, the number of subproblems is n and each subproblem has size 1:
T(n) = nT(1) + h·c'n
= cn + (log n)(c'n)
= cn + c'n log n
= O(n log n)

Worst, best and average case of merge sort is O(nlogn)
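As a check on the analysis (a sketch, not part of the original card), a version of merge sort that counts elements written during the combine steps comes out to exactly n·log2(n) writes when n is a power of two:

```python
import math

def merge_sort_ops(A):
    """Merge sort returning (sorted list, elements written in combine steps)."""
    if len(A) <= 1:
        return A, 0
    mid = len(A) // 2
    left, ops_l = merge_sort_ops(A[:mid])
    right, ops_r = merge_sort_ops(A[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged += left[i:] + right[j:]
    return merged, ops_l + ops_r + len(merged)

n = 1024
sorted_A, ops = merge_sort_ops(list(range(n, 0, -1)))
print(ops, n * int(math.log2(n)))  # 10240 10240
```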

A

In-place algorithm - requires only a constant amount of additional memory. Merge sort is not in-place because the combine step allocates an auxiliary array of size n.
Insertion, bubble and selection sort are in-place because the extra memory they use is constant.

TimSort combines merge sort (O(n log n) worst case) with insertion sort (O(n) best case on nearly-sorted runs). TimSort is Python's built-in sorting algorithm.

6
Q

Lomuto’s Partitioning QuickSort :

import random

def helper(A, start, end):
    if start >= end:
        return

    # choose a random pivot and move it to the front
    pivotindex = random.choice(range(start, end+1))
    A[pivotindex], A[start] = A[start], A[pivotindex]

    red = start
    for green in range(start+1, end+1):
        if A[green] < A[start]:
            red += 1
            A[red], A[green] = A[green], A[red]

    # place the pivot between the two partitions
    A[start], A[red] = A[red], A[start]

    helper(A, start, red-1)
    helper(A, red+1, end)

def quickSort(A):
    helper(A, 0, len(A)-1)

A = [2, 3, 5, 1, 6, 4]
quickSort(A)
print(A)

A

Hoare’s partitioning QuickSort:

import random

def helper(A, start, end):
    if start >= end:
        return

    # choose a random pivot and move it to the front
    pivotindex = random.choice(range(start, end+1))
    A[pivotindex], A[start] = A[start], A[pivotindex]

    red = start + 1
    green = end

    while red <= green:
        if A[red] < A[start]:
            red += 1
        elif A[green] > A[start]:
            green -= 1
        else:
            A[red], A[green] = A[green], A[red]
            red += 1
            green -= 1

    # the pivot's final position is red-1
    A[start], A[red-1] = A[red-1], A[start]

    helper(A, start, red-2)
    helper(A, red, end)

def quickSort(A):
    helper(A, 0, len(A)-1)

A = [2, 3, 5, 1, 6, 4]
quickSort(A)
print(A)

7
Q

Quick Sort Analysis

  1. Divide (partition) : Cn
  2. Combine : trivial, O(1), because after the recursive calls the partitions are already in place

The time taken T(n) by each manager = Cn + T(left partition) + T(right partition)

Best Case when Pivot is the median element :

T(n) = Cn + T(n/2) + T(n/2)
= 2T(n/2) + Cn
= O(nlogn)

Worst Case when pivot is smallest or largest element :

When pivot is the smallest element then every element will be greater than pivot so will all be green.
When pivot is the largest then every element will be lesser than pivot so will all be red.

T(n) = Cn + T(n-1)
= O(n^2)

The average case is O(n log n), the same as the best case, because a random pivot splits the array into reasonably balanced parts on average.
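The worst case can be observed directly (a sketch, not part of the original card): if we remove the random pivot choice and always pick the first element, an already-sorted array forces n(n-1)/2 comparisons:

```python
def quicksort_comparisons(A, start, end):
    """Lomuto quicksort with the FIRST element as pivot (no randomisation),
    returning the number of comparisons against the pivot."""
    if start >= end:
        return 0
    red = start
    comparisons = 0
    for green in range(start + 1, end + 1):
        comparisons += 1
        if A[green] < A[start]:
            red += 1
            A[red], A[green] = A[green], A[red]
    A[start], A[red] = A[red], A[start]
    comparisons += quicksort_comparisons(A, start, red - 1)
    comparisons += quicksort_comparisons(A, red + 1, end)
    return comparisons

n = 20
A = list(range(n))  # already sorted: worst case for a first-element pivot
print(quicksort_comparisons(A, 0, n - 1), n * (n - 1) // 2)  # 190 190
```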

A
8
Q

Stability in Sorting

A sorting algorithm is called stable when equal elements appear in the output in the same relative order they had in the input.

5 1 3(a) 6 3(b) ————> sorting algo ————> 1 3(a) 3(b) 5 6

Selection Sort? Not stable 
Bubble Sort? Stable
Insertion Sort? Stable
Merge Sort? Stable
Quick Sort? Not stable
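Stability is easy to observe with Python's built-in sorted (TimSort), which is stable. Labelling the duplicate 3s as in the example above (a quick illustration, not part of the original card):

```python
# 3(a) appears before 3(b) in the input, and stays before it in the output
cards = [(5, 'x'), (1, 'y'), (3, 'a'), (6, 'z'), (3, 'b')]
result = sorted(cards, key=lambda card: card[0])  # sort by the number only
print(result)  # [(1, 'y'), (3, 'a'), (3, 'b'), (5, 'x'), (6, 'z')]
```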
A
9
Q

Transform and Conquer

Priority Queue ADT

  1. Unlike a regular queue, whose property is FIFO, a priority queue is an ADT with two operations that any implementing data structure must provide:
     i. Insert
     ii. Extract-min or extract-max (this is the priority function)

So, unlike FIFO, whenever you extract, the element is chosen by the priority function (max or min), regardless of when it was inserted into the priority queue.

A

Real-world example : hospitals need to prioritise incoming patients for service.

What DS to use?

  1. Unsorted array : Here, insert is O(1) for each element and extract max/min will be O(n) each time
  2. Sorted Array : Here, insert is O(n) for each element and extract max/min will be O(1) each time
  3. Unsorted List : Here, insert is O(1) for each element and extract max/min will be O(n) each time
  4. Sorted List : Here, insert is O(n) for each element and extract max/min will be O(1) each time

None of these is ideal: we want both insert and extract to run in O(log n) time, so that processing all n elements costs O(n log n) overall.
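For reference, Python's standard-library heapq module already provides these O(log n) operations on a min-heap (a quick illustration, not part of the original card; the patient tuples are made up):

```python
import heapq

pq = []  # heapq treats a plain list as a min-heap
for patient in [(3, 'checkup'), (1, 'emergency'), (2, 'urgent')]:
    heapq.heappush(pq, patient)   # O(log n) insert

print(heapq.heappop(pq))  # (1, 'emergency') -- extract-min, O(log n)
print(heapq.heappop(pq))  # (2, 'urgent')
```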

What to do?

10
Q

Binary Heap

  1. Tree structure? Yes.
  2. Because in a tree structure we can maintain O(log n) time complexity for insert and extract-min/max.

The properties we maintain are :

  1. The root holds the element with the highest priority (either min or max, depending on the ADT).
  2. It will be a complete binary tree, because a complete binary tree guarantees height O(log n), which keeps insert and extract at O(log n).
A

How is it O(logn)?

Level    #Nodes
0        1
1        2
2        4
3        8
i        2^i
h        2^h

We know the nodes at level h are leaves. The total number of nodes in the tree is n.

1 + 2 + 4 + 8 + ……. + 2^h = n

This is a geometric sequence of the form a + ar + ar^2 + ar^3 + ……. with a = 1, r = 2, where the k-th term is a·r^(k-1).

Sum of a geometric series = a(1 - r^m)/(1 - r), where m is the number of terms.

Here there are h+1 terms (levels 0 through h), so the sum is 1·(1 - 2^(h+1))/(1 - 2) = 2^(h+1) - 1.

We know this sum equals n:

2^(h+1) - 1 = n
2^(h+1) = n + 1
h = log(n+1) - 1, which is approximately log n

Therefore, h ≈ log n

Insert and extract-min/max therefore each take O(log n), so performing them for all n elements takes O(n log n).

A heap can be easily represented in an array (using 1-based indexing):

  1. The parent of the element at index i is at index i // 2; e.g. the parent of index 7 is at index 3.
  2. The children of the element at index i are at indexes 2i and 2i+1. (With 0-based indexing, the children of index k are at 2k+1 and 2k+2, and the parent is at (k-1)//2.)
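The index arithmetic and the height formula can be checked directly (a sketch, not part of the original card):

```python
import math

# 1-based indexing: parent of i is i // 2, children of i are 2i and 2i + 1
assert 7 // 2 == 3                    # parent of index 7 is index 3
assert (2 * 3, 2 * 3 + 1) == (6, 7)   # children of index 3 are 6 and 7

# height of a complete binary tree with n nodes: h = log2(n + 1) - 1
n = 15  # a perfect tree with levels 0..3
h = int(math.log2(n + 1)) - 1
print(h)  # 3
```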
11
Q

Heap functions

def left(k):
    return 2*k + 1

def right(k):
    return 2*k + 2

def heapify(A, n, k):
    # sift A[k] down until the subtree rooted at k is a max-heap
    l = left(k)
    r = right(k)

    if l < n and A[l] > A[k]:
        largest = l
    else:
        largest = k

    if r < n and A[r] > A[largest]:
        largest = r

    if largest != k:
        A[k], A[largest] = A[largest], A[k]
        heapify(A, n, largest)

def extractMax(A):
    # swap the root with the last element, remove it, then restore the heap
    maxval = A[0]
    A[0] = A[-1]
    A.pop()
    if A:
        heapify(A, len(A), 0)
    return maxval

def deleteKey(A, k):
    # replace A[k] with the last element, remove it, then restore the heap
    A[k] = A[-1]
    A.pop()
    if k < len(A):
        heapify(A, len(A), k)
        # sift up in case the replacement is larger than its new parent
        while k > 0 and A[k] > A[(k-1)//2]:
            A[k], A[(k-1)//2] = A[(k-1)//2], A[k]
            k = (k-1)//2

def heapSort(arr):
    n = len(arr)
    # build max heap
    for i in range(n//2 - 1, -1, -1):
        heapify(arr, n, i)
    # repeatedly move the max to the end and shrink the heap
    for i in range(n-1, 0, -1):
        arr[0], arr[i] = arr[i], arr[0]
        heapify(arr, i, 0)

# Driver code
arr = [12, 11, 13, 5, 6, 7]
heapSort(arr)
print("Sorted array is")
print(arr)
A