Final Study Flashcards

1
Q

Describe the issues with having an unbalanced tree

A

Operations become slower than expected as the degree of imbalance increases. In the worst case, search requires O(n) time, compared to O(log n) for a balanced tree.

A plain BST does not ensure balance. The order in which keys are inserted into the tree will affect the balance.

2
Q

How is balance evaluated?

A

By height. For every node in the tree, the heights of the node’s left and right subtrees differ by at most 1.

The BALANCE FACTOR is used to determine whether or not the subtree rooted at a given node is height balanced. It is the difference in height between the node’s right and left subtrees.

balanceFactor(N) = height(N.right) - height(N.left)

By convention, the height of an empty (NULL) subtree is -1. This ensures that a leaf node, whose two subtrees are both empty, has a balance factor of 0: (-1) - (-1) = 0.

balanceFactor < 0 is LEFT HEAVY

balanceFactor > 0 is RIGHT HEAVY
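
To make these definitions concrete, here is a minimal Python sketch (the recursive height computation and the node attribute names are assumptions, not part of the original card):

def height(node):
    # An empty (None) subtree has height -1 by convention.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    # Negative: left heavy. Positive: right heavy.
    return height(node.right) - height(node.left)

In a real AVL tree, each node caches its height (see the later cards) rather than recomputing it recursively like this.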

3
Q

AVL Rotations

A

Rebalancing may be required in response to an insertion or removal of an element.

Rebalancing starts from the bottom (at the point of change) and proceeds upwards towards the root.

Specifically, after every insertion into or removal from an AVL tree, the tree retraces, upwards towards the root, the path it took to find the location at which to insert or remove a node. At each node encountered during this upward traversal, the AVL tree re-computes the balance factor and performs a rotation if needed:

A rotation moves one node upwards and another downwards in order to restore balance.

Each rotation has a center and a direction.

ROTATION CENTER is the point at which the rotation is performed

LEFT ROTATION (aka R-R rotation) is counterclockwise, with the center moving down and the nodes to its right moving upwards

RIGHT ROTATION moves clockwise, with the center moving downwards and the nodes to its left moving upwards

4
Q

Determining when a single AVL rotation is necessary

A

N = an unbalanced node
C = N’s heavier child

C will have a balance factor of either -1 or 1

If N and C are heavy in the same direction (balance factor has the same sign for both), then a single rotation is needed around N in the opposite direction of N’s heaviness.

For example, if N has a balance factor of -2 (left heavy) and C has a balance factor of -1 (also left heavy), a single right rotation around N restores balance; the mirrored R-R case needs a single left rotation.

5
Q

Determining when a double AVL rotation is necessary

A

N = an unbalanced node
C = N’s heavier child

C will have either a balance factor of -1, or 1

If N and C are heavy in opposite directions (their balance factors have opposite signs), then a double rotation is necessary.

6
Q

Pointer updates for AVL single rotations

A

N is the center of rotation
C is N’s child on the heavy side (the side opposite the rotation direction)
N moves down, in the direction of the rotation, becoming C’s child
This causes C to effectively move into N’s place
N adopts C’s former child on the side that N now occupies
- Note that this child will end up on the opposite side of N, compared to where it was on C

Example:

In a right rotation around N, the following things will happen:

N will become the right child of its current left child C.
C’s current right child will become N’s left child.
If N has a parent PN, then C will replace N as PN’s child. Otherwise, if N was the root of the entire tree, C will replace N as the root.

7
Q

Pointer updates for double rotations

A

N is the center of rotation
C is N’s heavier child (heavy in the direction opposite N’s imbalance)

Importantly, each rotation in a double rotation is always applied in the direction opposite to the heaviness at the rotation’s center. In other words:

To fix a left-right imbalance, we apply a left rotation around C followed by a right rotation around N.
To fix a right-left imbalance, we apply a right rotation around C followed by a left rotation around N.

The first rotation is simply meant to align the imbalances on the same side, i.e., to create an L-L or an R-R imbalance
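
As a minimal sketch (the function names are assumptions, mirroring the rotateLeft/rotateRight pseudocode in a later card):

def rebalance_left_right(n):
    # Fix a left-right imbalance: left rotation around C (= n.left),
    # then right rotation around N.
    n.left = rotate_left(n.left)
    n.left.parent = n
    return rotate_right(n)

def rebalance_right_left(n):
    # Fix a right-left imbalance: right rotation around C (= n.right),
    # then left rotation around N.
    n.right = rotate_right(n.right)
    n.right.parent = n
    return rotate_left(n)

The caller must still attach the returned subtree root to n’s old parent.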

8
Q

Necessary properties of an AVL node:

A

key - an integer that the nodes are sorted by
value - the value held by the node
height - an integer representing the height of the node
left - the left child Node
right - the right child Node
parent - the parent Node

Importantly, just as we use None to indicate when a node does not have a child, we will also use None to indicate when a node does not have a parent. Specifically, the root node of the tree will always have a None parent pointer.
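
A minimal Python sketch of such a node (any details beyond the fields listed above are assumptions):

class AVLNode:
    def __init__(self, key, value, parent=None):
        self.key = key          # integer the nodes are sorted by
        self.value = value      # the value held by the node
        self.height = 0         # a newly inserted node is a leaf
        self.left = None        # left child Node
        self.right = None       # right child Node
        self.parent = parent    # parent Node; None for the root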

9
Q

rotateLeft and rotateRight

A

rotateLeft(n):
    c ← n.right
    n.right ← c.left
    if n.right is not NULL:
        n.right.parent ← n
    c.left ← n
    n.parent ← c
    updateHeight(n)
    updateHeight(c)
    return c
And here’s the code for a right rotation around N:

rotateRight(n):
    c ← n.left
    n.left ← c.right
    if n.left is not NULL:
        n.left.parent ← n
    c.right ← n
    n.parent ← c
    updateHeight(n)
    updateHeight(c)
    return c

10
Q

updateHeight, avlInsert, avlRemove, and rebalance

A

updateHeight(n):
    n.height ← MAX(height(n.left), height(n.right)) + 1

Here, n’s new height is one more than the larger of the heights of its left and right subtrees (we add 1 to account for N itself). The height() function here simply returns a node’s height, or −1 if that node is NULL.

avlInsert(tree, key, value):
    insert key, value into tree like normal BST insertion
    n ← newly inserted node
    p ← n.parent
    while p is not NULL:
        rebalance(p)
        p ← p.parent

avlRemove(tree, key):
    remove key from tree like normal BST removal
    p ← lowest modified node (e.g. parent of removed node)
    while p is not NULL:
        rebalance(p)
        p ← p.parent

rebalance(n):
    if balanceFactor(n) < -1:
        if balanceFactor(n.left) > 0:
            n.left ← rotateLeft(n.left)
            n.left.parent ← n
        newSubtreeRoot ← rotateRight(n)
        newSubtreeRoot.parent ← n.parent
        n.parent.left or n.parent.right ← newSubtreeRoot
    else if balanceFactor(n) > 1:
        if balanceFactor(n.right) < 0:
            n.right ← rotateRight(n.right)
            n.right.parent ← n
        newSubtreeRoot ← rotateLeft(n)
        newSubtreeRoot.parent ← n.parent
        n.parent.left or n.parent.right ← newSubtreeRoot
    else:
        updateHeight(n)

- The if and else if conditions check if height balance is lost at n.
- The inner if statements check whether a double rotation is needed. Specifically, they check whether n’s child is imbalanced in the opposite direction of n’s imbalance. If so, the double rotation’s first rotation, around n’s child, is performed, with parent status updated appropriately.
- Any time a rotation is performed, the node returned by the rotation function is used to update parent status appropriately.
- Note that newSubtreeRoot will become either the left or right child of n’s old parent, replacing n.
- If no rotation at all is performed (i.e., if we enter the else block), we still need to make sure to update n’s subtree height, since the tree underneath n may have changed.
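
Pulling the pieces together, here is a runnable Python sketch of the same logic (the names and the old_parent bookkeeping are assumptions layered on the pseudocode; the AVLNode class is the one sketched earlier):

def node_height(n):
    # Height helper: returns -1 for a NULL (None) node.
    return n.height if n is not None else -1

def update_height(n):
    n.height = max(node_height(n.left), node_height(n.right)) + 1

def balance_factor(n):
    return node_height(n.right) - node_height(n.left)

def rotate_left(n):
    c = n.right
    n.right = c.left
    if n.right is not None:
        n.right.parent = n
    c.left = n
    n.parent = c
    update_height(n)
    update_height(c)
    return c

def rotate_right(n):
    c = n.left
    n.left = c.right
    if n.left is not None:
        n.left.parent = n
    c.right = n
    n.parent = c
    update_height(n)
    update_height(c)
    return c

def rebalance(n):
    old_parent = n.parent   # captured before a rotation overwrites n.parent
    if balance_factor(n) < -1:
        if balance_factor(n.left) > 0:      # L-R case: double rotation
            n.left = rotate_left(n.left)
            n.left.parent = n
        new_root = rotate_right(n)
    elif balance_factor(n) > 1:
        if balance_factor(n.right) < 0:     # R-L case: double rotation
            n.right = rotate_right(n.right)
            n.right.parent = n
        new_root = rotate_left(n)
    else:
        update_height(n)
        return
    # Reattach the rotated subtree to n's old parent. If old_parent is
    # None, n was the root, and the caller must update the tree's root.
    new_root.parent = old_parent
    if old_parent is not None:
        if old_parent.left is n:
            old_parent.left = new_root
        else:
            old_parent.right = new_root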

11
Q

time complexity of AVL operations

A

Rebalance = O(1)
Insert and Remove operations each have an overall complexity of O(log n)

12
Q

What is a priority Queue?

A

The priority queue is an ADT that associates a priority value with each element. The element with the highest priority is the first one dequeued. Typically, this corresponds to the element with the lowest priority value.

A heap data structure is typically a complete binary tree, in which every node’s value is less than or equal to the values of its children. This version of the heap is specifically called a minimum binary heap, or just “min heap”. We can also have a “max heap”, where each node’s value is greater than or equal to the values of its children

13
Q

Typical interface of a priority queue

A

insert() – insert an element with a specified priority value
first() – return the element with the lowest priority value (the “first” element in the priority queue)
remove_first() – remove (and return) the element with the lowest priority value

Typically implemented with the heap data structure.

Typically there is a priority value that is used to organize the heap, but there will usually also be some sort of corresponding data associated with the value.

Necessary to track:
- last filled position
- next open spot
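
As an illustration, Python’s standard heapq module implements exactly this interface on top of a dynamic array (list); the (priority, data) tuples below are an assumed convention:

import heapq

pq = []
heapq.heappush(pq, (2, "send report"))  # insert()
heapq.heappush(pq, (1, "fix bug"))
print(pq[0])               # first(): (1, 'fix bug')
print(heapq.heappop(pq))   # remove_first(): (1, 'fix bug')

With a Python list, the “next open spot” is simply the end of the list, so it does not need to be tracked explicitly.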

14
Q

Heap Operation (with Priority Queue) - Addition of a node

A
  • Place the node into the next open spot
  • Percolate the node up the heap until the heap property is restored, i.e., its priority value is greater than or equal to its parent’s (or less than or equal to its parent’s, in the case of a max heap).

Example with Dynamic Array implementation:

  1. Put the new element at the end of the array.
  2. Compute the inserted element’s parent index, (i − 1) // 2 (floor division).
  3. Compare the value of the inserted element with the value of its parent.
  4. If the value of the parent is greater than the value of the inserted element, swap the elements in the array and repeat from step 2.
    - Do not repeat if the element has reached the beginning of the array.

Time complexity: up to O(log n).
This comes from the fact that after placing the new element at the bottom of the heap, it may need to be “percolated” (or “heapified”) up, swapping with its parent nodes until the heap property is restored. This percolation can traverse up to the height of the tree, which is O(log n) for a heap.
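
A minimal sketch of insertion with percolate-up on a Python list used as a min-heap (the function names are assumptions):

def percolate_up(a, i):
    # Swap a[i] with its parent until the min-heap property holds.
    while i > 0:
        parent = (i - 1) // 2
        if a[parent] <= a[i]:
            break
        a[parent], a[i] = a[i], a[parent]
        i = parent

def heap_insert(a, value):
    a.append(value)              # step 1: next open spot (end of array)
    percolate_up(a, len(a) - 1)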

15
Q

Heap Operation (with Priority Queue) - Removal of a node

A

Removal of the root node using the remove_first() operation:
- replace the root node with the last node that was added
- percolate the new root downwards until the heap property is restored (i.e., it is no greater than its minimum child)

Example with dynamic array implementation:
1. Remember the value of the first element in the array (to be returned later).
2. Replace the value of the first element in the array with the value of the last element, and remove the last element.
3. If the array is not empty (i.e., it started with more than one element), compute the indices of the children of the replacement element (2 * i + 1 and 2 * i + 2).
    - If both of these indices fall beyond the bounds of the array, we can stop here.
4. Compare the value of the replacement element with the minimum value of its two children (or possibly one child).
5. If the replacement element’s value is greater than its minimum child’s value, swap those two elements in the array, and repeat from step 3.
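
A matching sketch for removal from a list-based min-heap (again, the names are assumptions):

def percolate_down(a, i, size):
    # Swap a[i] with its smaller child until the min-heap property holds.
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < size and a[left] < a[smallest]:
            smallest = left
        if right < size and a[right] < a[smallest]:
            smallest = right
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest

def remove_first(a):
    first = a[0]
    a[0] = a[-1]   # replace root with the last element
    a.pop()
    if a:
        percolate_down(a, 0, len(a))
    return first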

16
Q

Heap Implementation

A

We can implement the complete binary tree representation of a heap using a dynamic array.

If the binary tree were not complete, big gaps would result in the dynamic array.

17
Q

Heap - formula for children of a node at index i, parent of a node at index i

A

left child = 2i + 1
right child = 2i + 2

parent = (i - 1) // 2 (floor division)

For example, the node at index i = 3 has children at indices 7 and 8, and a parent at index 1.

18
Q

Building a heap from an unsorted array

A
  1. Find the last non-leaf element in the array, located at index n // 2 - 1 (floor division)
  2. Percolate this element down
  3. Proceed to the previous non-leaf element (moving backwards toward the root) and repeat the percolation process
  4. Do this until we reach the root; once it has been percolated down, we will have a proper heap.
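
A minimal Python sketch, assuming the percolate_down helper from the removal card:

def build_heap(a):
    # Percolate down each non-leaf element, from the last one back to the root.
    for i in range(len(a) // 2 - 1, -1, -1):
        percolate_down(a, i, len(a))

Building a heap this way runs in O(n) time overall, which is cheaper than performing n separate insertions.
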
19
Q

Heapsort

A

Time Complexity - O(n log n)

An in-place sorting algorithm

  1. Build a heap from the array using the process described for an unsorted array
    - keep in mind that we simply have to heapify the non-leaf nodes
  2. Then, similar to heap removal:
    - keep a running counter k, initialized to one less than the size of the array (i.e., the index of the last element)
  3. Swap the first element in the array with the element at index k
  4. Decrement k
  5. Percolate the replacement value down, but no further than index k (the heap portion)
    - the heap portion effectively shrinks by 1 with each iteration; repeat from step 3 until k reaches 0
  6. Maintain two properties during the sort:
    1. The elements of the array after index k are sorted, with the minimum element at the end of the array
    2. The array up through index k always forms a heap, with the minimum remaining value at the beginning of the array
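
A minimal sketch, assuming the build_heap and percolate_down helpers from the earlier cards (note that a min-heap yields a descending sort, matching the properties above):

def heapsort(a):
    build_heap(a)
    for k in range(len(a) - 1, 0, -1):
        a[0], a[k] = a[k], a[0]    # move the current minimum to the back
        percolate_down(a, 0, k)    # restore the heap in a[0:k]
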
20
Q

Maps

A

Also known as a dictionary or an associative array.

good when all we need are insertion, lookup, and removal operations.

We could use a simple array, storing key/value structs
- this would give us O(n) insertions and lookups (or O(log n) lookups, if we ordered the array by key)

We could use an AVL tree, also storing key/value structs
- this would give us O(log n) insertions and lookups

A hash table can do better:
- chaining with linked lists - the average case for all operations is O(𝝺). If the number of buckets is adjusted according to the load factor, then the number of elements is a constant factor of the number of buckets, i.e.: 𝝺 = n/m = O(m)/m = O(1).
- open addressing - O(1) on average, if we limit the load factor to a constant and reasonably small number.

21
Q

Hash Table

A

An array with differences:
- Elements can be indexed by values other than integers
- More than one element may share an index

Can be constructed with a dynamic array that resizes when the load factor hits a certain threshold

Use of a hash function that takes values of some type (string, struct, double, etc.) and maps them to an integer index value. We can then use this value to both store and retrieve data from an actual array

The hash function usually computes an index in two steps:
hash = hash_function(key)
index = hash % array_size
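
For instance, in Python (using the built-in hash() as a stand-in for hash_function):

def bucket_index(key, array_size):
    return hash(key) % array_size

print(bucket_index("apple", 8))  # some index in the range 0-7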

22
Q

Perfect and minimally perfect hash functions

A

A perfect hash function is one that results in no collisions; that is, every input gets a unique output.

A minimally perfect hash function is one that results in no collisions for a table size that equals exactly the number of elements. A minimal perfect hash function is a perfect hash function that maps n keys to n consecutive integers – usually the numbers from 0 to n − 1 or from 1 to n.

23
Q

Hash Table collision resolution with chaining

A
  • Compute the element’s bucket using the hash function
  • Search the data structure at that bucket for the element using the key (e.g., iterate through the elements in the linked list)
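
A minimal sketch of a chained hash table in Python (lists standing in for linked lists; all names are assumptions):

class ChainedHashTable:
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Compute the element's bucket using the hash function.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # key already present: update
                return
        bucket.append((key, value))

    def get(self, key):
        # Search the bucket for the element using the key.
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)
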
24
Q

hash table load factor

A

the average number of elements in each bucket

𝝺 = n/m
- 𝝺 is the load factor
- n is the total number of elements stored in the table
- m is the number of buckets
In a chained hash table, the load factor can be greater than 1.

25
Q

Open addressing time complexity

A

Let’s assume truly uniform hashing.

To insert a given item into the table (that’s not already there), the probability that the first probe is successful is (m−n)/m.

There are m total slots and n filled slots, so m − n open spots.
Let’s call this probability p, i.e., p = (m−n)/m.
If the first probe fails, the probability that the second probe succeeds is (m−n)/(m−1).

There are still m − n remaining open slots, but now we only have a total of m − 1 slots to look at, since we have examined one already.
If the first two probes fail, the probability that the third probe succeeds is (m−n)/(m−2).

There are still m − n remaining open slots, but now we only have a total of m − 2 slots to look at, since we’ve examined two already.
And so forth. In other words, for each probe, the probability of success is at least p because (m−n)/(m−c) >= (m−n)/m = p.

Here, we are dealing with a geometric distribution, so the expected number of probes until success is:

1/p = 1/((m−n)/m) = 1/(1−n/m) = 1/(1−𝝺)

In other words, the expected number of probes for any given operation is O(1/(1−𝝺)).

This suggests that, if we limit the load factor to a constant and reasonably small number, our operations will be O(1) on average. For example, if we have 𝝺 = 0.75, then we would expect 4 probes, on average. For 𝝺 = 0.9, we would expect 10 probes
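
A quick check of that arithmetic in Python:

def expected_probes(load_factor):
    # Expected probes under uniform hashing: 1 / (1 - lambda)
    return 1 / (1 - load_factor)

print(expected_probes(0.75))  # 4.0
print(expected_probes(0.9))   # ~10.0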

26
Q

When do you use quadratic probing step?

A

When a collision is detected. If no collision is detected at the index computed by the hash function, perform your operation there; otherwise, apply the quadratic probing step to generate the next index to examine, and repeat until an open slot (or the sought key) is found.

27
Q

In open addressing, if the first probe fails, the probability that the second probe succeeds is (m-n)/(n-1)? True or false

A

False, if the first probe fails, the probability that the second probe succeeds is (m-n)/(m-1)

28
Q

Does hash table performance increase or decrease as the number of buckets increases?

A

It should increase. An increase in the number of buckets will more evenly distribute the elements within the indices of the hash table. Also, recall that accessing a bucket in a hash table runs in O(1) time, since it simply needs to run the hash function on a key to find out which bucket it belongs in. However, unless it is a perfect hash function without any collisions (multiple elements stored at one index), you will still need to traverse all of the elements within that bucket for certain methods like “contains”. This is where having more buckets and a smaller table load factor will increase performance since more buckets generally mean fewer elements stored at any one index.

29
Q

What is the worst-case time complexity to retrieve a value from a hash table with chaining?

A

O(n)

30
Q

Graphs

A

A structure that represents a collection of objects or states, where some pairs of those objects are related or connected in some way.

Similar to trees in that they consist of connected nodes, but they can also contain cycles.

Searching:
- breadth-first search looks at every adjacent node, then looks at all of their adjacent nodes, and so on
- depth-first search starts at a node, travels to the furthest possible node, then backtracks to search other paths
- Dijkstra’s algorithm finds the shortest path from one node to another when the connections between them have different weights

31
Q

What does a graph node contain?

A

Aka vertex, vertices

represent objects, states (i.e. conditions or configurations), locations, etc. They form a set in which each vertex is unique. This means that no two vertices represent the same object or state. Therefore, the set of vertices V = {v1, v2, v3,…,vn}

32
Q

What does a graph edge contain?

A

Edges represent relationships or connections between vertices
- represented as vertex pairs: the set of edges E = {(vi, vj), …}
- They can be directed or undirected
- If there is an edge between vi and vj, then vi and vj are said to be adjacent (or they are neighbors)
- Edges can be weighted or unweighted. (A weight represents a property of the edge; for example, it might represent the distance between the vertices it connects.)

33
Q

What are the parts of a graph?

A

vertices, representing elements in the graph
edges, representing the connections between those elements

34
Q

When is it appropriate to use a weighted graph vs an unweighted graph?

A
35
Q

What is an advantage to using an unweighted graph?

A
36
Q

What are the characteristics of a data set that requires a weighted graph?

A
37
Q

What is a directed edge in a graph?

A

A directed edge is like a following relationship on Twitter. For example, if Han follows Nick, there would be a directed edge between them pointing from Han to Nick.
- we say the edge is directed from Han to Nick
- we can also say that Han is the tail of this edge, and Nick is the head
- we can also say that Nick is a direct successor of Han, and Han is a direct predecessor of Nick
- we can also say that Nick is reachable from Han

38
Q

How do graphs represent general relationships between objects?

A

A node may have connections to any number of other nodes. An undirected graph is connected if all vertices are reachable from all other vertices. A directed graph is strongly connected if all vertices are reachable from all other vertices.
There can be multiple paths (or no path) from one node to another. In an undirected graph, a path is a sequence of vertices, where each adjacent pair of vertices are adjacent in the graph. Informally, we can also think of a path as a sequence of edges.
A simple path, as defined here, visits each edge at most once. However, “simple path” is also often used to mean a path with no repeated vertices, which conflicts with the definition just given. Because of this variation in terminology, make sure you know which definition is being used in a particular problem when you consider traversing the edges of a graph.
A directed path is a sequence of directed edges in the graph.
A component of an undirected graph is an induced subgraph, in which any two vertices are connected to each other by paths, and which is not connected to any additional vertices in the rest of the graph.
A vertex with no incident edges is itself a component. A graph that is itself connected has exactly one component, consisting of the whole graph. Components are also sometimes called connected components.
A cycle is a closed walk that enters and leaves each vertex at most once. An undirected graph is acyclic if no subgraph is a cycle. Acyclic graphs are also called forests. A directed graph is acyclic if it does not contain a directed cycle. Directed acyclic graphs are often called dags.

39
Q

What is a connected and strongly connected graph path?
What is a simple path?
What is a directed path?
What is an induced subgraph?
What is a component?
What are connected components?
What is acyclic?
What is a forest?
What are dags?

A
40
Q

How would we try to find reachable vertices from some vertex vi?

A
  1. Initialize an empty set of reachable vertices
  2. Initialize an empty stack. Add vi to the stack.
  3. If the stack is not empty, pop a vertex v from the stack
  4. If v is not in the set of reachable vertices:
    - add it to the set of reachable vertices
    - add each vertex that is a direct successor of v to the stack
  5. Repeat from 3
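
A minimal Python sketch of these steps (the dict-of-successors graph representation is an assumption):

def reachable(graph, vi):
    # graph maps each vertex to a list of its direct successors
    visited = set()
    stack = [vi]
    while stack:
        v = stack.pop()
        if v not in visited:
            visited.add(v)
            stack.extend(graph[v])
    return visited
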
41
Q

Depth first search and breadth first search

A

Used when we are looking for a node with a particular characteristic. Both can be used to find a path from start to finish in a maze.

The general algorithm for DFS and BFS is below. For DFS, we use a stack, and for BFS, we use a queue:

1. Initialize an empty set of visited vertices.
2. Initialize an empty stack (DFS) or queue (BFS). Add vi to the stack/queue.
3. If the stack/queue is not empty, pop/dequeue a vertex v.
4. Perform any desired processing on v.
    - E.g., check if v meets a desired condition.
5. (DFS only): If v is not in the set of visited vertices:
    - Add v to the set of visited vertices.
    - Push each vertex that is a direct successor of v onto the stack.
6. (BFS only):
    - Add v to the set of visited vertices.
    - For each direct successor v’ of v: if v’ is not in the set of visited vertices, enqueue it into the queue.
7. Repeat from step 3.
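
A minimal Python sketch of both traversals over a dict-of-successors graph (an assumed representation):

from collections import deque

def dfs(graph, start):
    visited = set()
    stack = [start]
    while stack:
        v = stack.pop()
        # process v here, e.g. check if v meets a desired condition
        if v not in visited:
            visited.add(v)
            stack.extend(graph[v])
    return visited

def bfs(graph, start):
    visited = set()
    queue = deque([start])
    while queue:
        v = queue.popleft()
        # process v here, e.g. check if v meets a desired condition
        visited.add(v)
        for w in graph[v]:
            if w not in visited:
                queue.append(w)
    return visited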

We can make some comparisons between DFS and BFS:

DFS is a backtracking search. If we’re looking for a node with a specific characteristic, and DFS takes a path that doesn’t contain such a node, it will backtrack to try a different path.
In an infinite graph, DFS can become lost down an infinite path without ever finding a solution.
BFS is complete and optimal. If a solution exists in the graph, BFS is guaranteed to find it, and it will find the shortest path to that solution.
However, BFS may take a long time to find a solution if the solution is deep in the graph. DFS may find a deep solution more quickly.
Both algorithms have O(V) space complexity in the worst case. However, BFS may take up more space in practice.
If the graph has a high branching factor, i.e., if each node has many neighbors, BFS can use a lot of memory to maintain all of the paths being explored in the queue.

42
Q

Dijkstra’s algorithm: single source lowest-cost paths

A

Dijkstra’s algorithm is structured very much like DFS and BFS, except in this algorithm, we will use a priority queue to order our search.

The priority values used in the queue correspond to the cumulative distance to each vertex added to the priority queue.
Therefore, we are always exploring the remaining node with the minimum cumulative cost.
Here’s the algorithm, which begins with some source vertex vs:

1. Initialize an empty map/hash table representing visited vertices.
    - Key is the vertex v.
    - Value is the min distance d to vertex v.
2. Initialize an empty priority queue, and insert vs into it with distance (priority) 0.
3. While the priority queue is not empty:
    - Remove the first element (a vertex) from the priority queue and assign it to v. Let d be v’s distance (priority).
    - If v is not in the map of visited vertices:
        - Add v to the visited map with distance/cost d.
        - For each direct successor vi of v:
            - Let di equal the cost/distance associated with edge (v, vi).
            - Insert vi into the priority queue with distance (priority) d + di.
This version of the algorithm only keeps track of the minimum distance to each vertex, but it can be easily modified to keep track of the min-distance path, too. You can do this by augmenting the visited vertex map and the priority queue to keep track of the vertex previous to each one added.

The complexity of this version of the algorithm is O(|E| log |E|). The innermost loop is executed at most |E| times, and the cost of the instructions inside the loop is O(log |E|). The inner cost comes from insertion into the priority queue.
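
A minimal Python sketch using heapq as the priority queue (the weighted dict-of-successors representation is an assumption):

import heapq

def dijkstra(graph, vs):
    # graph maps each vertex to a list of (successor, edge_cost) pairs
    visited = {}            # vertex -> min distance from vs
    pq = [(0, vs)]          # (distance, vertex), ordered by distance
    while pq:
        d, v = heapq.heappop(pq)
        if v not in visited:
            visited[v] = d
            for vi, di in graph[v]:
                if vi not in visited:
                    heapq.heappush(pq, (d + di, vi))
    return visited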

43
Q

Breadth First Search Example

A

Add A to the queue.
Initialize reachable as empty.
Pop A off the queue and add it to reachable.
Add A’s connected nodes to the queue in reverse alphabetical order.
Pop D off the queue and add it to reachable; add its connected nodes to the back of the queue.
Pop C off the queue and add it to reachable; add C’s connected nodes to the end of the queue.
Pop B off the front of the queue and add its connections (none; C was already processed); add B to reachable.
Pop F off the queue, add G to the queue, and add F to reachable.
Pop G off the queue, add its connections to the queue (none), and add G to reachable.
Pop E off the queue and add it to reachable.

44
Q

Depth first search example

A

Initialize an empty stack (DFS).
Add vi to the stack.
If the stack is not empty, pop a vertex v.
Process the vertex.
If v is not in the set of visited vertices, add it.
Push each of v’s direct successors onto the stack.

Example
iteration | stack | visited
0 | A | (none)
1 | BCD | A
2 | CCD | AB
3 | EGCD | ABC
4 | GCD | ABCE
5 | CD | ABCEG
6 | D | ABCEG
7 | F | ABCEGD
8 | (empty) | ABCEGDF

45
Q

Dijkstra’s Algorithm example

A

Use a priority queue to order our search. The priority values used in the queue correspond to the cumulative distance to each vertex added to the priority queue.

1. Initialize an empty map/hash table representing visited vertices.
    - Key is the vertex v.
    - Value is the min distance d to vertex v.
2. Initialize an empty priority queue, and insert vs into it with distance (priority) 0.
3. While the priority queue is not empty:
    - Remove the first element (a vertex) from the priority queue and assign it to v. Let d be v’s distance (priority).
    - If v is not in the map of visited vertices:
        - Add v to the visited map with distance/cost d.
        - For each direct successor vi of v:
            - Let di equal the cost/distance associated with edge (v, vi).
            - Insert vi into the priority queue with distance (priority) d + di.

46
Q

Adjacency matrix example

A