Algorithms and Data Structures Flashcards

1
Q

Name three desirable properties of a good algorithm.

A

Algorithm should be:

  1. Correct
  2. Efficient
  3. Easy to implement

These goals may not be, and often are not, simultaneously achievable.

2
Q

How is the mergesort algorithm implemented?

A
  • Based on the divide-and-conquer approach.
  • Breaks the array into two halves and recursively sorts both halves.
  • Recursion bottoms out at one- or two-element subarrays; sorted subarrays are then merged on the way back up the recursion, so after the top-level merge the whole array is sorted.
  • Needs an additional temporary array for merging (see the sketch below).
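A minimal top-down mergesort sketch in Java (the class and method names are illustrative, and an int[] input is assumed):

    public class MergeSort {

        public static void sort(int[] a) {
            int[] aux = new int[a.length];        // single auxiliary array reused by every merge
            sort(a, aux, 0, a.length - 1);
        }

        private static void sort(int[] a, int[] aux, int lo, int hi) {
            if (lo >= hi) return;                 // 0- or 1-element subarray: already sorted
            int mid = lo + (hi - lo) / 2;
            sort(a, aux, lo, mid);                // sort the left half
            sort(a, aux, mid + 1, hi);            // sort the right half
            merge(a, aux, lo, mid, hi);           // merge the two sorted halves
        }

        private static void merge(int[] a, int[] aux, int lo, int mid, int hi) {
            for (int k = lo; k <= hi; k++) aux[k] = a[k];   // copy the range into the auxiliary array
            int i = lo, j = mid + 1;
            for (int k = lo; k <= hi; k++) {
                if (i > mid)              a[k] = aux[j++];  // left half exhausted
                else if (j > hi)          a[k] = aux[i++];  // right half exhausted
                else if (aux[j] < aux[i]) a[k] = aux[j++];  // take the smaller head element
                else                      a[k] = aux[i++];
            }
        }
    }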
3
Q

How is the quicksort algorithm implemented?

A
  • Divide-and-conquer approach through partitioning.
  • An element (the pivot) is chosen and the array is partitioned into elements "lower than" and "greater than" it (see the sketch below).
  • Partitioning is repeated recursively on the subarrays.
  • When the subarrays have at most one element, sorting is done.
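A minimal quicksort sketch in Java using Lomuto partitioning around the last element (the names and the pivot choice are illustrative assumptions):

    public class QuickSort {

        public static void sort(int[] a) {
            sort(a, 0, a.length - 1);
        }

        private static void sort(int[] a, int lo, int hi) {
            if (lo >= hi) return;                 // a subarray of 0 or 1 elements is sorted
            int p = partition(a, lo, hi);         // pivot lands in its final position p
            sort(a, lo, p - 1);                   // sort the "lower than" part
            sort(a, p + 1, hi);                   // sort the "greater than" part
        }

        // Partition a[lo..hi] around the last element; return the pivot's final index.
        private static int partition(int[] a, int lo, int hi) {
            int pivot = a[hi];
            int i = lo;                            // boundary of the "lower than pivot" region
            for (int j = lo; j < hi; j++) {
                if (a[j] < pivot) { swap(a, i, j); i++; }
            }
            swap(a, i, hi);                        // put the pivot between the two regions
            return i;
        }

        private static void swap(int[] a, int i, int j) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }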
4
Q

On which algorithm is shell sort based?

A
  • It is based on the insertion sort algorithm.
5
Q

How is the shell sort algorithm implemented?

A
  • It uses a so-called h-sort approach to introduce partial order into the array.
  • The increment h is first grown with "while (h < N/3) h = 3*h + 1".
  • After every pass h is reduced with h = h/3; in the end h = 1, and that pass is a regular insertion sort on an already partially sorted array (see the sketch below).
  • For a badly chosen increment sequence it can be slow.
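A shell sort sketch in Java using the 3h+1 increment sequence mentioned above (an int[] input is assumed; names are illustrative):

    public class ShellSort {

        public static void sort(int[] a) {
            int n = a.length;
            int h = 1;
            while (h < n / 3) h = 3 * h + 1;      // 1, 4, 13, 40, ... increment sequence

            while (h >= 1) {
                // h-sort the array: insertion sort on elements that are h apart
                for (int i = h; i < n; i++) {
                    for (int j = i; j >= h && a[j] < a[j - h]; j -= h) {
                        int t = a[j]; a[j] = a[j - h]; a[j - h] = t;
                    }
                }
                h = h / 3;                         // the last pass (h = 1) is plain insertion sort
            }
        }
    }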
6
Q

Running time properties of mergesort algorithm?

A
  • Mergesort is an optimal comparison-based sorting algorithm: its ~N log N compares match the lower bound for comparison sorts.
  • Average, worst, and best cases all run in O(N log N).
  • Needs an auxiliary array, i.e., O(N) additional space.
  • An already sorted array can be handled in O(N) compares if a check is added that skips the merge when the two halves are already in order.
7
Q

Running time properties of quicksort algorithm?

A
  • Average case O(N log N).
  • Worst case O(N^2), e.g., when the array is already sorted and the first element is used as pivot; randomizing (shuffling) the input before sorting prevents this.
  • Can be improved by a cutoff to insertion sort for small subarrays.
  • For arrays with many duplicate keys it can be significantly improved by partitioning into "smaller than", "equal to", and "larger than".
8
Q

Ways to improve default quicksort approach?

A
  • Cutoff to insertion sort for small (roughly 10-15 element) subarrays.
  • Better choice of the partitioning element, e.g., median of three.
  • Partitioning into "lower than", "equal to", and "greater than" for arrays with repeated keys (see the sketch below).
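A sketch of Dijkstra-style three-way partitioning in Java, the improvement for inputs with many duplicate keys (class and method names are illustrative):

    public class Quick3Way {

        public static void sort(int[] a) {
            sort(a, 0, a.length - 1);
        }

        // One pass splits a[lo..hi] into "< pivot", "== pivot" and "> pivot" regions;
        // the keys equal to the pivot are never touched again.
        private static void sort(int[] a, int lo, int hi) {
            if (lo >= hi) return;
            int pivot = a[lo];
            int lt = lo, i = lo + 1, gt = hi;   // a[lo..lt-1] < pivot, a[gt+1..hi] > pivot
            while (i <= gt) {
                if      (a[i] < pivot) swap(a, lt++, i++);
                else if (a[i] > pivot) swap(a, i, gt--);
                else                   i++;
            }
            sort(a, lo, lt - 1);                // recurse only on the strictly smaller part
            sort(a, gt + 1, hi);                // and the strictly larger part
        }

        private static void swap(int[] a, int i, int j) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }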
9
Q

What are optimizations for the mergesort algorithm?

A
  • Cutoff to insertion sort
  • Check for already sorted arrays.
10
Q

What are run-time properties of selection sort algorithm?

A
  • Average, best, worst time: O(N^2)
  • Approximately N^2/2 compares and N swaps
  • Doesn’t need additional memory
  • Running time is not sensitive to input
11
Q

What are running time properties of insertion sort algorithm?

A
  • Average case is O(N^2), with about N^2/4 compares and exchanges, for a randomly ordered array
  • Worst case is O(N^2), with about N^2/2 compares and exchanges, for a reverse-ordered array
  • Best case is O(N), with no exchanges, for an already ordered array.
  • It doesn’t need additional memory
  • Dependent on properties of input
12
Q

What are running time properties of shell sort algorithm?

A
  • In practice it behaves like O(N^(4/3)), O(N^(5/4)), ... depending on the increment sequence, but no tight general bound has been proven.
  • In practice used due to its simplicity and acceptable speed for moderately large arrays.
13
Q

Define:

heuristic

A

Heuristic is a technique designed for solving a problem more quickly when classic methods are too slow, or for finding an approximate solution when classic methods fail to find any exact solution. In a way, it can be considered a shortcut.

In general, every type of problem will have its own set of heuristics.

14
Q

Give one example of a heuristic.

A

An example of a heuristic is using the Manhattan distance as the heuristic function when searching for the shortest path, in order to decide on the next step among the list of options.

15
Q

What is the difference between an algorithmic problem and an instance of an algorithmic problem?

A

An algorithmic problem is specified by describing the complete set of instances it must work on, together with the requirement that it produce a valid solution for every one of them.

Sorting is an algorithmic problem; sorting a particular array of integers is an instance of the sorting problem.
This is important since recognizing the general problem behind an instance is the first step to solving it.

16
Q

Fundamental difference between algorithm and heuristic?

A

There is a fundamental difference between algorithms, which always produce a correct result, and heuristics, which usually do a good job but without providing any guarantee.

17
Q

Give one example where a heuristic yields a very suboptimal solution.

A

If the points we are trying to connect lie on a line and we start in the middle, then instead of sweeping linearly the heuristic will keep jumping back and forth towards the outside, which is very far from optimal.

In general, if the distances between nearest neighbours become bigger and bigger towards the end of the execution, the result is most probably suboptimal.

18
Q

Which are three forms of algorithmic notation?

A

Three forms of algorithmic notation (going from informal to formal):

  • English
  • Pseudo code
  • Implementation in a particular programming language.

A common mistake is to use a more formal structure to express a less formal level of communication, e.g., an English explanation presented in a pseudocode structure.

19
Q

Which are two parts of algorithmic problem specification?

A

Two parts of algorithmic problem specification are:

  • Set of allowed input instances.
  • Required properties of the algorithm output.
20
Q

What can be done with set of allowable instances in the specification of an algorithmic problem in order to make it easier to find a correct solution?

A

Narrowing the problem down to a simpler set of allowable instances is a standard way to simplify reasoning about algorithms.

An example would be reasoning about a tree structure instead of a full graph, or about a one-dimensional problem instead of a two-dimensional one.

21
Q

What is the best way to prove an algorithm's incorrectness?

A

The best way to prove the incorrectness of an algorithm is by finding a counter-example on which the algorithm is not correct.

A counter-example is an input instance for which a correct solution exists but the algorithm doesn't find it.

22
Q

Which are two important properties of counter-examples?

A

Two properties of a good counter-example are:

  • Verifiability
  • Simplicity
23
Q

Which are good approaches in finding counter-examples?

A
  • Think small.
  • Think exhaustively
  • Hunt for the weakness
  • Go for a tie - when algorithm has to decide on two options.
  • Seek extremes
24
Q

Inductive proof steps are?

A
  1. Prove the idea for the basic case n = 1, n = 2
  2. Assume the idea is valid until the “n - 1” case
  3. Prove that it is still valid for the “n” case.
25
Q

Inductive reasoning is important in algorithmic theory because …

A

Inductive reasoning comes naturally with the recursive nature of many algorithmic problems.

26
Q

Two major classes of summations are?

A

Two major classes of summation formulas are:

  • Arithmetic progression
  • Geometric progression
27
Q

Explain:

Arithmetic progression

A

An arithmetic progression is a summation whose consecutive terms differ by a constant.

In general, summing p-th powers yields a sum of degree p+1:

S(n, p) = Σ_{i=1..n} i^p = Θ(n^(p+1))

Concretely:

Σ_{i=1..n} i = n(n+1)/2 = Θ(n^2)

28
Q

Explain:

Geometric progression

A

A geometric sequence is a sequence of numbers in which every next term is found by multiplying the previous one by a fixed non-zero multiplier called the common ratio.

An example:

G(n, a) = Σ_{i=0..n} a^i = (a^(n+1) - 1) / (a - 1)

In general:

G(n, a) = Θ(a^(n+1)) for a > 1

29
Q

Explain:

Modeling of an (algorithmic) solution.

A

Modeling is the process of formulating the solution of a problem in terms of precisely described, well-understood, and already solved problems and solutions.

Proper modeling can hugely reduce the need for finding complex solutions by reusing existing solutions.

It is important to understand that modeling often has to happen on the abstract level of discussion about general structures and solutions which are not bound to any domain.

30
Q

Modeling with permutations.

A

Permutations - arrangements or ordering of items.

For example: {1,2,3,4} and {2,3,1,4} are two permutations of the same set.

Usually, a problem modeled by permutations seeks an ordering, sequence, tour, or arrangement.

31
Q

Modeling with subsets.

A

Subsets represent a selection from a set of items.

For example: {1,2,4} and {3} are two distinct subsets of {1,2,3,4}.

Usually, subsets model problems that seek a cluster, collection, group, packaging, or committee.

32
Q

Modeling with trees.

A

Trees represent hierarchical relations between items.

For example, a tree would be used to model a family tree.

Problems whose solutions are modeled as trees usually seek a hierarchy, dominance relationship, taxonomy, or ancestor/descendant relationship.

33
Q

Modeling with graphs.

A

Graphs represent arbitrary relationships between pairs of items.

For example: a road network in a city is modeled by a graph.

Usually, graphs model the solution when we seek a network, circuit, web, or relationship.

34
Q

Modeling with points.

A

Points represent locations in some geometric space.

For example: the list of monuments you want to visit in a city.

Usually they appear in the solution whenever we seek, for instance, sites, locations, or positions.

35
Q

Modeling with polygons.

A

Polygons represent physical areas.

For example: finding neighbouring city borders.

They usually appear when we seek shapes, regions, configurations, or boundaries.

36
Q

Modeling with strings.

A

Strings represent sequences of characters or patterns.

For example: finding all names that start with "Iv".

Usually they appear when we are expected to find text, characters, patterns, or labels.

37
Q

Explain recursive nature of standard model objects.

A

In general, removing an element from the set of elements of the model will produce a smaller object which will represent a smaller problem than the original full one.

38
Q

Explain:

Knapsack problem

A

Imagine a thief with a knapsack breaks into a store. How should the thief pick items from the store so that the knapsack is filled exactly? Every item has a size and the knapsack has a total capacity.

Given the set of numbers, find the subset that adds up to a certain target.

For example: the items are of sizes S = {1, 2, 5, 9, 10} and the knapsack has target capacity T = 22 (one solution: {1, 2, 9, 10}).

39
Q

The two most important tools for analyzing algorithm efficiency without actually implementing the algorithms are?

A
  • RAM model of computation.
  • Asymptotic analysis of worst-time complexity.

It is important to understand that the goal of these models is to analyze algorithms in a machine-independent way. Both are approximations of a real machine, but they are excellent for understanding the overall properties of algorithms.

40
Q

Which are properties of RAM (Random Access Machine) model of computation?

A
  • Every simple operation (+, -, *, =, /, if, call) takes one step.
  • Loops and subroutines are not simple operations.
  • Reading and writing of memory takes one step.
41
Q

Define:

Worst-case complexity.

A

The worst-case complexity of an algorithm is a function defined by the maximum number of steps the algorithm takes over all input instances of size n.

For example: the number of steps sorting takes may be very different for different input instances of the same size.

42
Q

Why is worst-case complexity a useful metric in algorithm analysis?

A
  • it gives a useful, pessimistic guarantee
  • it is easy to obtain, and the math behind it is usually straightforward.
43
Q

Define:

Best-case complexity

A

Best-case complexity of an algorithm is a function defined by the minimum number of steps algorithm takes for input instances of any size n.

44
Q

Define:

Average-case complexity

A

Average-case complexity of an algorithm is a function defined by the average number of steps algorithm takes for input instances of any size n.

45
Q

Explain the need for asymptotic notation of the algorithm run-time complexity.

A

Thinking in terms of the exact behavior of an algorithm for every input instance is messy and doesn't really expose the overall behavior and tendency of the algorithm in a clean way.

Because of this we introduce approximations that express the upper and lower bounds of the running time, which give enough information while hiding the messy details.

46
Q

What are the asymptotic bounding functions and what do they mean?

A
  • g(n) = O(f(n)) - means that C*f(n) is an upper bound on g(n)
  • g(n) = Ω(f(n)) - means that C*f(n) is a lower bound on g(n)
  • g(n) = Θ(f(n)) - means that f(n) is a tight bound on g(n)
    • C1*f(n) is an upper bound
    • C2*f(n) is a lower bound

All of these relationships should hold after some input size threshold n0.

47
Q

Standard classes of asymptotic bounds of algorithms?

A

A small number of classes covers most practical algorithms:

  • constant - f(n) = 1
  • logarithmic - f(n) = log(n)
  • linear - f(n) = n
  • linearithmic - f(n) = n*log(n)
  • quadratic - f(n) = n^2
  • cubic - f(n) = n^3
  • exponential - f(n) = c^n
  • factorial - f(n) = n!

n! >> c^n >> n^3 >> n^2 >> n*log(n) >> n >> log(n) >> 1

48
Q

Constant time complexity

f(n) = 1

A

A constant upper bound means that the algorithm takes a constant number of steps for any input of size n.

Examples:

  • accessing the n-th element of an array by index.
49
Q

Logarithmic time complexity

f(n) = log(n)

A

Logarithmic time complexity appears in algorithms where in every next step the size of the problem is halved or doubled.

Examples:

  • binary search of an ordered array.
  • finding an element in a fairly balanced binary search tree.
  • fast exponentiation: a^n = (a^(n/2))^2, so only O(log n) multiplications are needed.
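A minimal iterative binary search sketch in Java over a sorted int[] (illustrative names; returns the index of the key or -1). The search range is halved at every step, which is where the O(log n) bound comes from:

    public static int binarySearch(int[] a, int key) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if      (key < a[mid]) hi = mid - 1;   // key must be in the left half
            else if (key > a[mid]) lo = mid + 1;   // key must be in the right half
            else return mid;                       // found
        }
        return -1;                                 // not present
    }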
50
Q

Linear time complexity

f(n) = n

A

The running time of such an algorithm is linearly proportional to the number of elements, which means the algorithm goes over all elements of the input once or a constant number of times.

Examples:

  • checking if array is sorted.
  • identify biggest item
  • calculate average of the values in an array.
51
Q

Linearithmic time complexity

f(n) = n*log(n)

A

This grows a bit faster than linear, since all items are visited a number of times that depends on the size of the array, namely about log(n) times.

Examples:

  • divide-and-conquer sorting like quicksort and mergesort
52
Q

Quadratic time complexity

f(n) = n^2

A

Quadratic time complexity is a property of algorithms that go over all (or almost all) elements of the input for every element of the input.

For two inputs of sizes n and m it would be n*m.

Examples:

  • selection sort algorithm
  • insertion sort algorithm
  • string pattern matching O(n*m)
53
Q

Cubic time complexity

f(n) = n^3

A

Cubic time complexity is a property of algorithms that enumerate all triples of an n-element input instance.

Examples:

  • straightforward multiplication of two n×n matrices.
54
Q

Exponential time complexity

f(n) = c^n, c > 1

A

Exponential complexity is a property of algorithms that enumerate, for example, all subsets of a set; there are 2^n of them.

Examples:

  • generate all subsets of a set.
55
Q

Factorial time complexity

f(n) = n!

A

Factorial time complexity is property of algorithms that have to enumerate all permutations of a set of n items.

Examples:

  • all permutations of a set in a brute force approach.
56
Q

Why are back of the envelope estimations important?

In relation to algorithm execution time.

A

Because thinking about algorithms is often about estimating how long a certain algorithm would take to execute as the size of the problem grows, usually in a complex setup.

In practice, for a fairly large number of elements to be processed, it turns out that linear or near-linear complexity such as O(n*log n) is the only satisfying one.

57
Q

Abstract data type

vs.

Data structures

In a vague way.

A
  • Abstract data type - a data type which is defined only by its publicly exposed interface and the mathematical model of the effects operations have on its internal state, which changes as a consequence of using that interface.
  • Concrete data structures - the underlying data structures used to implement the public interface of an abstract data type. They are replaceable, and a replacement should not affect the correctness of the implementation.
58
Q

Regarding the memory organization of data structures, we can divide them into … ?

A
  • Contiguously-allocated structures
    • memory is allocated in single slabs
    • examples are arrays, matrices, heaps and hash tables
  • Linked data structures.
    • memory is allocated in scattered chunks and connected through pointers
    • examples are lists, trees, graph adjacency lists.
59
Q

Define array as a data structure.

A

An array is the fundamental contiguously-allocated data structure: a structure holding a predefined number of fixed-size data elements.

Because the elements of an array have a fixed size, the position of any element can always be calculated, and it can be accessed directly.

60
Q

Good sides of arrays are?

A
  • Constant-time access to a given index due to its structure.
  • Space efficiency since they contain only data and not meta information unlike pointer based data structures.
  • Memory locality which is helpful for good low level caching use.
61
Q

Downsides of array data structure are?

A
  • Limited and predefined capacity (solvable by dynamic allocation approach)
62
Q

Arrays with dynamic allocation.

A

The problem of an array's fixed size can be solved by doubling the array when its capacity is insufficient and halving it when it is oversized. When doubling, the contents of the old array have to be copied into the new one.

Half of the elements are copied once, a quarter twice, and so on. In total, the amount of work spent managing the size of the array is O(n):

M = Σ_{i=1..lg n} i * n/2^i = n * Σ_{i=1..lg n} i/2^i <= 2n
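A minimal dynamic (resizing) array sketch in Java showing the doubling-on-overflow idea (class and method names are illustrative):

    public class DynamicIntArray {
        private int[] data = new int[1];
        private int size = 0;

        public void append(int x) {
            if (size == data.length) resize(2 * data.length);  // double when full
            data[size++] = x;
        }

        public int get(int i) {
            return data[i];                                     // constant-time indexed access
        }

        private void resize(int capacity) {
            int[] bigger = new int[capacity];
            System.arraycopy(data, 0, bigger, 0, size);         // copy the old contents over
            data = bigger;
        }
    }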

63
Q

Pointer data type is?

A

The pointer data type is the type of values that represent memory addresses. It is used to store and pass around references to data instead of copies of the data itself.

64
Q

Advantages of linked data structures?

A
  • No need for dynamic size management, overflow can not happen unless we are out of memory.
  • Insertions and deletions are simpler than in contiguous data structures.
  • With large records, moving pointers to data is easier and more efficient than moving data itself.
65
Q

Disadvantages of the linked data structures?

A
  • Take additional memory to store pointers.
  • Do not allow for efficient random access to elements.
  • Worse memory locality and reduced low-level cache benefits.
66
Q

Explain single-linked list data structure.

A
  • standard operations on a list (add, remove, find)
  • structure of a list and implementation
  • graphical, boxed, notation
  • different variants (e.g., doubly-linked)
67
Q

What are sentinel values?

A

Sentinel values are values used in implementation of algorithms for several reasons.

  • Increasing speed of operations
  • Reducing algorithm complexity and simplifying implementation
  • Arguably increasing data structure robustness (not convinced?)

Examples: In a binary tree a sentinel can be used to avoid expensive comparisons with NULL. In arrays it can be used to avoid constantly checking for the end of the array.

68
Q

Containers

vs.

Dictionaries?

A
  • Containers are abstract data types which are collections of objects together with rules about the order in which they are accessed.
  • Example: stacks, queues
  • Dictionaries are abstract data types which are collections of objects accessed by the key under which they were stored. Also known as associative maps or symbol tables.
  • Example: hash map, binary search tree
69
Q

Define:

Stack data type.

A

Stack is a container and an abstract data type which is defined by the:

  • push and pop abstract operations on the state of the stack
    • push(x, s) - put x on top of stack s
    • pop(s) - remove and return the most recently pushed element of stack s
  • LIFO order: elements are returned in the reverse of the order in which they were pushed onto the stack.

LIFO - Last In First Out

70
Q

Which are some of the situations where LIFO order of access of the stored data is natural or generally useful?

A
  • generally useful when we don't care about the order, e.g., for batch jobs.
  • generally applicable to the process of reversing an order.
  • naturally found in algorithms that use recursion
    • recursion effectively puts function calls and their arguments on a stack.
  • people getting in and out of an elevator
  • bullets in the magazine of a gun.
71
Q

Stack implementation, linked-lists vs arrays?

A

Stack can be implemented both based on linked-lists and arrays in a very simple way.

All the pluses and minuses to both of the underlying representations apply.

In the case of arrays it is important to ask how the size will behave over the time of execution.

72
Q

Define:

Queue data type.

A

A queue is a container and abstract data type defined by the:

  • enqueue and dequeue abstract operations.
    • enqueue(x, q) - add element x at the back of queue q
    • dequeue(q) - remove and return the element at the front of q (the one enqueued earliest)
  • FIFO order of retrieval of enqueued elements.

FIFO - First In First Out

73
Q

Which are some situations where FIFO order of access of the stored elements is natural or generally useful?

A
  • natural for a list of arrived messages that should be read in order
  • natural for people waiting at a shop counter.
  • useful everywhere the order of processing matters.
74
Q

Queue implementation, linked-lists vs arrays?

A

With linked lists we need to store a pointer to the tail as well as the head; a doubly linked list additionally allows moving in both directions. Because of this it takes more space to maintain the structure.

With arrays the general question is whether we can go with a statically allocated array, or whether we will lose performance on copying due to dynamic resizing.

75
Q

Define:

Dictionary data type.

A

Dictionary is an abstract data type which is used to efficiently insert, locate and delete elements based on one or more keys.

  • insert, find and delete, max, min, successor, predecessor are basic abstract operations

They are as well called symbol tables.
They belong to the most fundamental data structures.

76
Q

Common questions for choosing the right implementation of the dictionary.

A
  • How many items will you have in your data structure?
    • are you going to run out of memory?
  • Do you know relative frequencies of insert, find and delete operations and which are their asymptotic behaviours?
    • focus on one of them can vastly simplify the implementation and improve performance.
  • Is access pattern for keys uniform and random?
  • Is it critical that every operation is fast or that amortized performance is best possible?
77
Q

Common underlying data structures which implement dictionaries are …?

A
  • unsorted list or array
  • sorted list or array
  • hash table
  • binary search tree
  • B-trees
  • skip lists
78
Q

Dictionary implementation with sorted and unsorted arrays?

A
  • Asymptotic analysis of basic operations.
  • Unsorted
    • insert and delete are constant O(1)
    • searching and traversing is O(n)
  • Sorted
    • insert and delete are linear O(n)
    • searching is fast O(log n) based on binary search
    • traversing is O(1) since we always know the next one.
79
Q

Dictionary implementation with linked lists?

A

Single-linked, doubly-linked.

As with sorted and unsorted arrays, there is a trade-off between cheap modification and cheap searching.

Linked lists can optimize access a bit by being singly or doubly linked.

No possibility for binary search.

80
Q

Define:

Binary Search Tree

A

A Binary Search Tree (BST) is a linked data structure which allows flexible updates and fast search at the same time.

A BST is structured so that, at every node, we can move to the elements smaller than it and to the elements larger than it. This way we can perform binary search in a linked data structure.

A BST is either (1) empty or (2) a root node pointing to two subtrees that are themselves BSTs.

Given a root x, all nodes in the left subtree are smaller than x and all nodes in the right subtree are bigger than x.

81
Q

Overall Binary Search Tree implementation?

A

Overall, as defined, BST has

  • a node which contains
    • data
    • link to the left subtree whose keys are smaller
    • link to the right subtree whose keys are bigger.
    • optionally a pointer to the parent.
82
Q

BST implementation recursion vs iterative

A

A recursive implementation is easier to reason about mathematically and simpler to implement.

An iterative implementation is a bit more performant.

83
Q

BST implementation of insert()

A
  • Search within the tree for the position to attach the new node (see the sketch below).
    • the new node always goes at the bottom of the tree, with complexity O(h), where h is the height of the tree.
    • the far left and far right positions are where the smallest and biggest entries end up.
    • if an ordered sequence is inserted we end up with a tall, skinny tree that is effectively a list, with O(n) operations.
    • we need to randomize the input before inserting, or use some balanced tree variant.
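A minimal BST node plus recursive insert() and iterative find() sketch in Java, assuming int keys (class and field names are illustrative):

    public class Bst {

        private static class Node {
            int key;
            Node left, right;        // left subtree: smaller keys, right subtree: bigger keys
            Node(int key) { this.key = key; }
        }

        private Node root;

        public void insert(int key) {
            root = insert(root, key);
        }

        // Walk down as in a search and attach the new node at the empty position reached.
        private Node insert(Node node, int key) {
            if (node == null) return new Node(key);      // found the attachment point
            if (key < node.key)      node.left  = insert(node.left, key);
            else if (key > node.key) node.right = insert(node.right, key);
            // equal keys are ignored in this sketch
            return node;
        }

        public boolean find(int key) {
            Node node = root;
            while (node != null) {                       // O(h) descent with one comparison per node
                if (key < node.key)      node = node.left;
                else if (key > node.key) node = node.right;
                else return true;
            }
            return false;
        }
    }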
84
Q

BST implementation of find()

A
  • O(h) complexity of finding a node within the tree.
  • Needs comparison in every node through which it passes.
85
Q

BST implementation of remove()

A
  • A slightly trickier operation, because part of the tree has to be relinked. There are 3 cases:
    1. Deleting a node without children - just remove it.
    2. Deleting a node with one child - link the child directly to the deleted node's parent.
    3. Deleting a node with two subtrees:
      1. Find the minimum of the right subtree and use it as the replacement.
      2. Remove that minimum node from the right subtree.
      3. Set the remaining right subtree of the deleted node as the right subtree of the replacement.
      4. Set the left subtree of the deleted node as the left subtree of the replacement.
86
Q

BST implementation of min/max

A
  • Navigate to the far left/right of the tree and return the found one.
87
Q

BST implementation of successor/predecessor.

A
  • Navigate to the element whose successor/predecessor we are looking for.
    • if it is not found, return null (better to check before diving into recursion).
    • if it is found and has a right/left subtree, take the min/max of that subtree.
    • if it is found and has no right/left subtree:
      • return nil from the bottom,
      • and on the way back up replace the return value with the first ancestor whose key is bigger/smaller.
88
Q

Sorting by the use of BST

A

Inserting all elements into a BST, which by its nature makes it easy to retrieve elements in sorted order, is one way to sort them. The sorted order can then be produced by:

  1. Traversing the tree in-order (see the sketch below)
  2. Starting from the minimum and repeatedly taking successors
  3. Starting from the minimum and repeatedly deleting the next minimum from the tree.
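A sketch of option 1, the in-order traversal, in Java; it is meant as a method of the same illustrative Bst class as above, reusing its Node type (fields key, left, right):

    // In-order traversal of a BST yields the keys in ascending order.
    private static void inorder(Node node, java.util.List<Integer> out) {
        if (node == null) return;
        inorder(node.left, out);    // visit all smaller keys first
        out.add(node.key);          // then the node itself
        inorder(node.right, out);   // then all bigger keys
    }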
89
Q

Define:

Priority queue data structure.

A

A priority queue is a container and abstract data type which provides means for processing data in priority order, with more flexibility than traditional sorting methods.

It is much more cost-effective to insert an element into a priority queue than to re-sort the data every time a new element arrives.

  • insert
  • find minimum/maximum
  • delete minimum/maximum
90
Q

Basic questions to ask related to priority queues.

A
  • Is the size of the queue predefined or variable?
  • What other operations are needed?
  • Is it necessary to change the priority of elements already in the queue?
    • in that case you need to take the element out of the queue based on its key and reinsert it again.
91
Q

Compare:

Basic implementations of priority queues based on

  • an unordered array or list
  • an ordered array or list
  • a balanced binary search tree
A

Slight optimization in all the cases can be done by keeping track of the minimum/maximum element to retrieve it fast.

Evaluating three basic operations:

  • insert: UA - O(1), OA - O(n), BST - O(log n)
  • find-minimum: UA - O(1), OA - O(1), BST - O(1)
  • delete-minimum: UA - O(n), OA - O(1), BST - O(log n)
92
Q

Explain:

Standard implementations of priority queues?

A
  • sorted array or list
  • balanced binary search tree
  • binary heaps
  • bounded-height priority queues
  • Fibonacci and pairing heaps
93
Q

Define:

heap data structure.

A

Heap is a tree-based data structure which satisfies the heap property. Such trees are called heap-labeled trees: the parent-child relation is defined consistently throughout the tree. If the parent is always bigger than all its children we get the maximum element at the top of the tree, otherwise the minimum one; hence the names max-heap and min-heap.

  • A heap is usually implemented as an array.
  • Heaps are usually used to implement priority queues.
94
Q

Explain:

Usages of the heap data structure.

A
  • Heapsort: One of the best sorting methods being in-place and with no quadratic worst-case scenarios.
  • Selection algorithms: A heap allows access to the min or max element in constant time, and other selections (such as the median or k-th element) can be done in sub-linear time on data that is kept in a heap.
  • Graph algorithms: By using heaps as internal traversal data structures, the running time is reduced asymptotically. Examples of such problems are Prim's minimal-spanning-tree algorithm and Dijkstra's shortest-path algorithm.
  • Priority queue: Structuring the heap according to the priority of the elements allows the minimal or maximal element to be picked very efficiently.
95
Q

Explain:

Representing heaps in arrays.

A
  • root at position 1
  • without compression (each node keeps its positional slot)
    • children of the root at positions 2 and 3
    • in general, the children of the node at position i are at positions 2i and 2i + 1
    • a sparse (skewed) tree of n nodes may need up to about 2^n array slots
  • with compression (each element goes to the first free position, i.e., the tree is kept complete)
    • n array slots suffice for n elements
    • the n-th element has its children at positions 2n and 2n + 1
96
Q

Explain:

Tree representation in array vs linked representation.

A
  • linked structures are more flexible for inserts and deletes.
  • array representation is very inflexible when trees need to be restructured since that means moving a lot of elements and finding new places for them.
  • arrays are more memory-efficient, since for an int we don't need to carry two extra pointers along.
97
Q

Explain:

Construction of the heap (array based implementation) and its operations.

A
  • Insert: put the new element at the first free position in the array (starting from index 1 for the root).
  • Bubble the new element up by swapping it with its parent as long as it is smaller/bigger than the parent (for a min-/max-heap respectively).
    • this is an O(log n) operation.
  • Finding the minimum/maximum is easy since it is always at index 1 for a non-empty heap.
  • Removing the min/max (see the sketch below):
    • take the element at index 1,
    • replace it with the last element of the array,
    • bubble that element down, swapping it with its smaller/bigger child, until it reaches its place.
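A minimal array-based min-heap sketch in Java following the description above (1-indexed storage; the names and resizing policy are illustrative):

    public class MinHeap {
        private int[] pq = new int[2];   // pq[0] unused; the root lives at index 1
        private int n = 0;               // number of elements currently stored

        public void insert(int x) {
            if (n + 1 == pq.length) resize(2 * pq.length);
            pq[++n] = x;                 // place at the first free position
            swim(n);                     // bubble up while smaller than its parent
        }

        public int min() { return pq[1]; }

        public int deleteMin() {
            int min = pq[1];
            pq[1] = pq[n--];             // move the last element to the root
            sink(1);                     // bubble it down to its proper place
            return min;
        }

        private void swim(int k) {
            while (k > 1 && pq[k] < pq[k / 2]) { swap(k, k / 2); k = k / 2; }
        }

        private void sink(int k) {
            while (2 * k <= n) {
                int child = 2 * k;
                if (child < n && pq[child + 1] < pq[child]) child++;  // pick the smaller child
                if (pq[k] <= pq[child]) break;
                swap(k, child);
                k = child;
            }
        }

        private void swap(int i, int j) { int t = pq[i]; pq[i] = pq[j]; pq[j] = t; }

        private void resize(int cap) {
            int[] bigger = new int[cap];
            System.arraycopy(pq, 0, bigger, 0, pq.length);
            pq = bigger;
        }
    }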
98
Q

Analyse:

Asymptotic properties of heaps.

A
  • inserting - O(log n)
  • finding min/max - O(1)
  • removing min/max - O(log n)
  • construction by repeated inserts - O(n*log n); bottom-up heapify builds a heap from an array in O(n)

Because of these properties, the heap is used in heapsort as a way to sort an array in place with optimal worst-case complexity.

  • heapify on an array.
99
Q

Explain:

Bounded height priority queue.

(for limited number of priorities)

A

Bounded priority queue is a data structure used when there is a limited number of priorities to be managed.

It consists of an array of queues, one for each priority, plus a pointer to the currently minimal non-empty priority.

  • Insert: an element with priority K is appended to the K-th queue, and the top pointer is updated if K is smaller than the current minimum.
  • Removal: we remove an element from the queue the top pointer refers to and, if that queue becomes empty, advance the top pointer to the next non-empty priority.
100
Q

The key concept of hashing is?

A

The key concept of hashing is that a more complex object or piece of content gets hashed to an integer which then represents it.

A big object gets a smaller representation which can be manipulated in constant time by algorithms.

Hashing is done by a so-called hash function.

101
Q

Usages of hashing concepts?

A
  • implementation of dictionary data structures based on hashing.
  • caching based on the content.
  • finding duplicate records
  • finding similar substrings (Rabin-Karp algorithm)
  • Geo hashing.
102
Q

Explain:

Properties of hashing functions.

A
  • Determinism - same content has to produce same hash value
  • Defined range - to which output range the input range is mapped.
  • Uniformity - hash function should map input range uniformly to the output range. Important in relation with the size of the data structure.
  • Data normalization - is the input normalized before hashing so that "Ivan Jovanovic" and "ivan jovanovic" produce the same hash.
103
Q

Hashing functions for strings.

A

Given an alphabet of size A, for any string we can create a hash function producing an integer in the number system with base A, where the powers of the base are multiplied by the values of the characters at the corresponding positions.

H(S) = Σ_{i=0..|S|-1} A^(|S|-(i+1)) * char(S_i)
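A sketch of such a string hash in Java using Horner's rule, which computes exactly the sum above (the alphabet size of 256 and the unreduced long arithmetic are illustrative assumptions; a real hash table would also reduce the value modulo the table size):

    public static long hash(String s) {
        final int ALPHABET_SIZE = 256;            // assume extended-ASCII characters
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = h * ALPHABET_SIZE + s.charAt(i);  // Horner's rule: shift previous digits left, add the next one
        }
        return h;
    }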

104
Q

Explain:

Collision of hash values in the hashing process.

A

Since hash functions do not have an unbounded output range (having the range 0-97, for example), it is expected that different content values get mapped to the same hash value.

When hash values are used to store data somewhere and this happens, we say that we have a collision. Collisions have to be resolved in some way, or the data related to the first hash value would be overwritten by the second one.

105
Q

Explain:

Dictionary implementation based on the hash tables.

A

There are two common implementations of the dictionary based on hashing, and they are differentiated by the approach they take to collision resolution:

  • separate chaining - uses per-bucket lists of values to hold all the collided values
  • linear probing - uses a larger array and stores collided values directly in it, placing each in the next free position.
106
Q

Dictionary implementation based on the hash table with separate chaining collision resolution strategy.

A
  • An array of N buckets is created to hold the hash values.
  • N is selected to be a prime number so that hash values are distributed evenly across the buckets.
  • Every bucket references a list of values, which is extended every time a new element with that hash value arrives.
  • When a collision occurs, the element is added to the list of that bucket.
  • When an element is to be retrieved, the list has to be traversed (see the sketch below).
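A minimal separate-chaining hash map sketch in Java for String keys and int values (the fixed prime table size M = 97 and the use of LinkedList are illustrative choices):

    import java.util.LinkedList;

    public class ChainedHashMap {

        private static class Entry {
            final String key;
            int value;
            Entry(String key, int value) { this.key = key; this.value = value; }
        }

        private static final int M = 97;                       // prime number of buckets
        private final LinkedList<Entry>[] buckets = new LinkedList[M];

        private int bucketFor(String key) {
            return (key.hashCode() & 0x7fffffff) % M;          // non-negative index into the table
        }

        public void put(String key, int value) {
            int b = bucketFor(key);
            if (buckets[b] == null) buckets[b] = new LinkedList<>();
            for (Entry e : buckets[b]) {
                if (e.key.equals(key)) { e.value = value; return; }   // update an existing key
            }
            buckets[b].add(new Entry(key, value));              // collision: append to the chain
        }

        public Integer get(String key) {
            int b = bucketFor(key);
            if (buckets[b] == null) return null;
            for (Entry e : buckets[b]) {                        // traverse the chain
                if (e.key.equals(key)) return e.value;
            }
            return null;
        }
    }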
107
Q

Explain:

Problems with the separate chaining in hash table implementation.

A

If our hash function doesn't have good properties and there are inputs, possibly a long series of them, for which it produces the same hash value, some lists of the table can get so long that operations end up with O(N) worst-case complexity.

This is the basis of some standard attacks on hash table implementations.

108
Q

Hash table data structure asymptotic analysis.

A

For a hash table with N elements stored in M buckets, so that the expected list length is N/M:

  • insert - expected constant, worst-case O(1)
  • find - expected O(N/M), worst-case O(N)
  • delete - expected constant, worst-case O(1) (given a pointer to the element)
  • succ, pred, min, max are all O(N+M), since we have to traverse the whole structure, visiting M buckets and the N elements stored in them.
109
Q

Explain:

usage of hashing in Rabin-Karp substring search algorithm.

A

Due to the properties of hash functions for strings, the Rabin-Karp algorithm uses them to find an m-character pattern in an n-character text in expected O(n+m) time.

  • The hash of the m-character pattern is computed once.
  • Since the hash is an integer in an A-based number system, the hash of the next text window can be obtained by dropping the leading digit and appending the next character (a rolling hash), as shown in the sketch below.
  • This way we do not recompute the whole window hash for every position, but update it in constant time; equal hashes are then verified by direct comparison.
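A Rabin-Karp sketch in Java with a rolling hash (the modulus Q and alphabet size R are illustrative choices; hash matches are verified by direct comparison to rule out collisions). Returns the index of the first match or -1:

    public static int rabinKarp(String text, String pattern) {
        final long Q = 1_000_000_007L;   // large prime modulus keeps hash values small
        final long R = 256;              // alphabet size (extended ASCII assumed)
        int n = text.length(), m = pattern.length();
        if (m == 0 || m > n) return (m == 0) ? 0 : -1;

        long rm = 1;                     // R^(m-1) mod Q, used to remove the leading digit
        for (int i = 1; i < m; i++) rm = (rm * R) % Q;

        long patHash = 0, txtHash = 0;
        for (int i = 0; i < m; i++) {    // hash of the pattern and of the first text window
            patHash = (patHash * R + pattern.charAt(i)) % Q;
            txtHash = (txtHash * R + text.charAt(i)) % Q;
        }

        for (int i = 0; ; i++) {
            // verify on hash match to rule out collisions
            if (patHash == txtHash && text.regionMatches(i, pattern, 0, m)) return i;
            if (i + m >= n) return -1;
            // roll the hash: drop text[i], append text[i + m]
            txtHash = (txtHash - text.charAt(i) * rm % Q + Q) % Q;
            txtHash = (txtHash * R + text.charAt(i + m)) % Q;
        }
    }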