Final Study Flashcards

1
Q

Describe the issues with having an unbalanced tree

A

Operations become slower than expected as the degree of imbalance increases. In the worst case, search requires O(n) time, compared to O(log n) for a balanced tree.

A plain BST does not ensure balance. The order in which keys are inserted into the tree will affect the balance.

2
Q

How is balance evaluated?

A

By height. For every node in the tree, the heights of the node’s left and right subtrees differ by at most 1.

The BALANCE FACTOR is used to determine whether or not the subtree rooted at a given node is height balanced. It is the difference in height between the node’s right and left subtrees.

balanceFactor(N) = height(N.right) - height(N.left)

By convention, the height of an empty (NULL) subtree is -1. This ensures that a leaf node, whose two subtrees are both empty, has a balance factor of 0: (-1) - (-1) = 0.

balanceFactor < 0 is LEFT HEAVY

balanceFactor > 0 is RIGHT HEAVY
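
To make these definitions concrete, here is a minimal Python sketch (the recursive height computation and the node attribute names are assumptions, not part of the original card):

def height(node):
    # An empty (None) subtree has height -1 by convention.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    # Negative: left heavy. Positive: right heavy.
    return height(node.right) - height(node.left)

In a real AVL tree, each node caches its height (see the later cards) rather than recomputing it recursively like this.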

3
Q

AVL Rotations

A

Rebalancing may be required in response to an insertion or removal of an element.

Rebalancing starts from the bottom (at the point of change) and proceeds upwards towards the root.

Specifically, after every insertion into or removal from an AVL tree, the tree retraces, upwards towards the root, the path it took to find the location at which to insert or remove a node. At each node encountered during this upward traversal, the AVL tree re-computes the balance factor and performs a rotation if needed:

A rotation moves one node upwards and another downwards in order to restore balance.

Each rotation has a center and a direction.

ROTATION CENTER is the point at which the rotation is performed

LEFT ROTATION (aka R-R rotation) is counterclockwise, with the center moving down and the nodes to its right moving upwards

RIGHT ROTATION moves clockwise, with the center moving downwards and the nodes to its left moving upwards

4
Q

Determining when a single AVL rotation is necessary

A

N = an unbalanced node
C = N’s heavier child

C will have a balance factor of either -1 or 1

If N and C are heavy in the same direction (balance factor has the same sign for both), then a single rotation is needed around N in the opposite direction of N’s heaviness.

For example, if N has a balance factor of -2 (left heavy) and C has a balance factor of -1 (also left heavy), a single right rotation around N restores balance; the mirrored R-R case needs a single left rotation.

5
Q

Determining when a double AVL rotation is necessary

A

N = an unbalanced node
C = N’s heavier child

C will have either a balance factor of -1, or 1

If N and C are heavy in opposite directions (their balance factors have opposite signs), then a double rotation is necessary.

6
Q

Pointer updates for AVL single rotations

A

N is the center of rotation
C is N’s child on the heavy side (the side opposite the rotation direction)
N moves down, in the direction of the rotation, becoming C’s child
This causes C to effectively move into N’s place
N adopts C’s former child on the side that N now occupies
- Note that this child will end up on the opposite side of N, compared to where it was on C

Example:

In a right rotation around N, the following things will happen:

N will become the right child of its current left child C.
C’s current right child will become N’s left child.
If N has a parent PN, then C will replace N as PN’s child. Otherwise, if N was the root of the entire tree, C will replace N as the root.

7
Q

Pointer updates for double rotations

A

N is the center of rotation
C is N’s heavier child (heavy in the direction opposite N’s imbalance)

Importantly, each rotation in a double rotation is always applied in the direction opposite to the heaviness at the rotation’s center. In other words:

To fix a left-right imbalance, we apply a left rotation around C followed by a right rotation around N.
To fix a right-left imbalance, we apply a right rotation around C followed by a left rotation around N.

The first rotation is simply meant to align the imbalances on the same side, i.e., to create an L-L or an R-R imbalance
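
As a minimal sketch (the function names are assumptions, mirroring the rotateLeft/rotateRight pseudocode in a later card):

def rebalance_left_right(n):
    # Fix a left-right imbalance: left rotation around C (= n.left),
    # then right rotation around N.
    n.left = rotate_left(n.left)
    n.left.parent = n
    return rotate_right(n)

def rebalance_right_left(n):
    # Fix a right-left imbalance: right rotation around C (= n.right),
    # then left rotation around N.
    n.right = rotate_right(n.right)
    n.right.parent = n
    return rotate_left(n)

The caller must still attach the returned subtree root to n’s old parent.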

8
Q

Necessary properties of an AVL node:

A

key - an integer that the nodes are sorted by
value - the value held by the node
height - an integer representing the height of the node
left - the left child Node
right - the right child Node
parent - the parent Node

Importantly, just as we use None to indicate when a node does not have a child, we will also use None to indicate when a node does not have a parent. Specifically, the root node of the tree will always have a None parent pointer.
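
A minimal Python sketch of such a node (any details beyond the fields listed above are assumptions):

class AVLNode:
    def __init__(self, key, value, parent=None):
        self.key = key          # integer the nodes are sorted by
        self.value = value      # the value held by the node
        self.height = 0         # a newly inserted node is a leaf
        self.left = None        # left child Node
        self.right = None       # right child Node
        self.parent = parent    # parent Node; None for the root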

9
Q

rotateLeft and rotateRight

A

rotateLeft(n):
    c ← n.right
    n.right ← c.left
    if n.right is not NULL:
        n.right.parent ← n
    c.left ← n
    n.parent ← c
    updateHeight(n)
    updateHeight(c)
    return c
And here’s the code for a right rotation around N:

rotateRight(n):
    c ← n.left
    n.left ← c.right
    if n.left is not NULL:
        n.left.parent ← n
    c.right ← n
    n.parent ← c
    updateHeight(n)
    updateHeight(c)
    return c

10
Q

updateHeight, avlInsert, avlRemove, and rebalance

A

updateHeight(n):
    n.height ← MAX(height(n.left), height(n.right)) + 1

Here, n’s new height is one more than the larger of the heights of its left and right subtrees (we add 1 to account for N itself). The height() function here simply returns a node’s height, or −1 if that node is NULL.

avlInsert(tree, key, value):
    insert key, value into tree like normal BST insertion
    n ← newly inserted node
    p ← n.parent
    while p is not NULL:
        rebalance(p)
        p ← p.parent

avlRemove(tree, key):
    remove key from tree like normal BST removal
    p ← lowest modified node (e.g. parent of removed node)
    while p is not NULL:
        rebalance(p)
        p ← p.parent

rebalance(n):
    if balanceFactor(n) < -1:
        if balanceFactor(n.left) > 0:
            n.left ← rotateLeft(n.left)
            n.left.parent ← n
        newSubtreeRoot ← rotateRight(n)
        newSubtreeRoot.parent ← n.parent
        n.parent.left or n.parent.right ← newSubtreeRoot
    else if balanceFactor(n) > 1:
        if balanceFactor(n.right) < 0:
            n.right ← rotateRight(n.right)
            n.right.parent ← n
        newSubtreeRoot ← rotateLeft(n)
        newSubtreeRoot.parent ← n.parent
        n.parent.left or n.parent.right ← newSubtreeRoot
    else:
        updateHeight(n)

- The if and else if conditions check if height balance is lost at n.
- The inner if statements check whether a double rotation is needed. Specifically, they check whether n’s child is imbalanced in the opposite direction of n’s imbalance. If so, the double rotation’s first rotation, around n’s child, is performed, with parent status updated appropriately.
- Any time a rotation is performed, the node returned by the rotation function is used to update parent status appropriately.
- Note that newSubtreeRoot will become either the left or right child of n’s old parent, replacing n.
- If no rotation at all is performed (i.e., if we enter the else block), we still need to make sure to update n’s subtree height, since the tree underneath n may have changed.
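
Pulling the pieces together, here is a runnable Python sketch of the same logic (the names and the old_parent bookkeeping are assumptions layered on the pseudocode; the AVLNode class is the one sketched earlier):

def node_height(n):
    # Height helper: returns -1 for a NULL (None) node.
    return n.height if n is not None else -1

def update_height(n):
    n.height = max(node_height(n.left), node_height(n.right)) + 1

def balance_factor(n):
    return node_height(n.right) - node_height(n.left)

def rotate_left(n):
    c = n.right
    n.right = c.left
    if n.right is not None:
        n.right.parent = n
    c.left = n
    n.parent = c
    update_height(n)
    update_height(c)
    return c

def rotate_right(n):
    c = n.left
    n.left = c.right
    if n.left is not None:
        n.left.parent = n
    c.right = n
    n.parent = c
    update_height(n)
    update_height(c)
    return c

def rebalance(n):
    old_parent = n.parent   # captured before a rotation overwrites n.parent
    if balance_factor(n) < -1:
        if balance_factor(n.left) > 0:      # L-R case: double rotation
            n.left = rotate_left(n.left)
            n.left.parent = n
        new_root = rotate_right(n)
    elif balance_factor(n) > 1:
        if balance_factor(n.right) < 0:     # R-L case: double rotation
            n.right = rotate_right(n.right)
            n.right.parent = n
        new_root = rotate_left(n)
    else:
        update_height(n)
        return
    # Reattach the rotated subtree to n's old parent. If old_parent is
    # None, n was the root, and the caller must update the tree's root.
    new_root.parent = old_parent
    if old_parent is not None:
        if old_parent.left is n:
            old_parent.left = new_root
        else:
            old_parent.right = new_root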

11
Q

time complexity of AVL operations

A

Rebalance = O(1)
Insert and Remove operations each have an overall complexity of O(log n)

12
Q

What is a priority Queue?

A

The priority queue is an ADT that associates a priority value with each element. The element with the highest priority is the first one dequeued. Typically, this corresponds to the element with the lowest priority value.

A heap data structure is typically a complete binary tree, in which every node’s value is less than or equal to the values of its children. This version of the heap is specifically called a minimum binary heap, or just “min heap”. We can also have a “max heap”, where each node’s value is greater than or equal to the values of its children

13
Q

Typical interface of a priority queue

A

insert() – insert an element with a specified priority value
first() – return the element with the lowest priority value (the “first” element in the priority queue)
remove_first() – remove (and return) the element with the lowest priority value

Typically implemented with the heap data structure.

Typically there is a priority value that is used to organize the heap, but there will usually also be some sort of corresponding data associated with the value.

Necessary to track:
- last filled position
- next open spot
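
As an illustration, Python’s standard heapq module implements exactly this interface on top of a dynamic array (list); the (priority, data) tuples below are an assumed convention:

import heapq

pq = []
heapq.heappush(pq, (2, "send report"))  # insert()
heapq.heappush(pq, (1, "fix bug"))
print(pq[0])               # first(): (1, 'fix bug')
print(heapq.heappop(pq))   # remove_first(): (1, 'fix bug')

With a Python list, the “next open spot” is simply the end of the list, so it does not need to be tracked explicitly.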

14
Q

Heap Operation (with Priority Queue) - Addition of a node

A
  • Place the node into the next open spot
  • Percolate the node up the heap until the heap property is restored, i.e., its priority value is greater than or equal to its parent’s (or less than or equal to its parent’s, in the case of a max heap).

Example with Dynamic Array implementation:

  1. Put the new element at the end of the array.
  2. Compute the inserted element’s parent index, (i − 1) // 2 (floor division).
  3. Compare the value of the inserted element with the value of its parent.
  4. If the value of the parent is greater than the value of the inserted element, swap the elements in the array and repeat from step 2.
    - Do not repeat if the element has reached the beginning of the array.

Time complexity: up to O(log n).
This comes from the fact that after placing the new element at the bottom of the heap, it may need to be “percolated” (or “heapified”) up, swapping with its parent nodes until the heap property is restored. This percolation can traverse up to the height of the tree, which is O(log n) for a heap.
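
A minimal sketch of insertion with percolate-up on a Python list used as a min-heap (the function names are assumptions):

def percolate_up(a, i):
    # Swap a[i] with its parent until the min-heap property holds.
    while i > 0:
        parent = (i - 1) // 2
        if a[parent] <= a[i]:
            break
        a[parent], a[i] = a[i], a[parent]
        i = parent

def heap_insert(a, value):
    a.append(value)              # step 1: next open spot (end of array)
    percolate_up(a, len(a) - 1)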

15
Q

Heap Operation (with Priority Queue) - Removal of a node

A

Removal of the root node using the remove_first() operation:
- replace the root node with the last node that was added
- percolate the new root downwards until the heap property is restored (i.e., it is no greater than its minimum child)

Example with dynamic array implementation:
1. Remember the value of the first element in the array (to be returned later).
2. Replace the value of the first element in the array with the value of the last element, and remove the last element.
3. If the array is not empty (i.e., it started with more than one element), compute the indices of the children of the replacement element (2 * i + 1 and 2 * i + 2).
    - If both of these indices fall beyond the bounds of the array, we can stop here.
4. Compare the value of the replacement element with the minimum value of its two children (or possibly one child).
5. If the replacement element’s value is greater than its minimum child’s value, swap those two elements in the array, and repeat from step 3.
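
A matching sketch for removal from a list-based min-heap (again, the names are assumptions):

def percolate_down(a, i, size):
    # Swap a[i] with its smaller child until the min-heap property holds.
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < size and a[left] < a[smallest]:
            smallest = left
        if right < size and a[right] < a[smallest]:
            smallest = right
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest

def remove_first(a):
    first = a[0]
    a[0] = a[-1]   # replace root with the last element
    a.pop()
    if a:
        percolate_down(a, 0, len(a))
    return first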

16
Q

Heap Implementation

A

We can implement the complete binary tree representation of a heap using a dynamic array.

If the binary tree were not complete, big gaps would result in the dynamic array.

17
Q

Heap - formula for children of a node at index i, parent of a node at index i

A

left child = 2i + 1
right child = 2i + 2

parent = (i - 1) // 2 (floor division)

For example, the node at index i = 3 has children at indices 7 and 8, and a parent at index 1.

18
Q

Building a heap from an unsorted array

A
  1. Find the last non-leaf element in the array, located at index n // 2 - 1 (floor division)
  2. Percolate this element down
  3. Proceed to the previous non-leaf element (moving backwards toward the root) and repeat the percolation process
  4. Do this until we reach the root; once it has been percolated down, we will have a proper heap.
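
A minimal Python sketch, assuming the percolate_down helper from the removal card:

def build_heap(a):
    # Percolate down each non-leaf element, from the last one back to the root.
    for i in range(len(a) // 2 - 1, -1, -1):
        percolate_down(a, i, len(a))

Building a heap this way runs in O(n) time overall, which is cheaper than performing n separate insertions.
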
19
Q

Heapsort

A

Time Complexity - O(n log n)

An in-place sorting algorithm

  1. Build a heap from the array using the process described for an unsorted array
    - keep in mind that we simply have to heapify the non-leaf nodes
  2. Then, similar to heap removal:
    - keep a running counter k, initialized to one less than the size of the array (i.e., the index of the last element)
  3. Swap the first element in the array with the element at index k
  4. Decrement k
  5. Percolate the replacement value down, but no further than index k (the heap portion)
    - the heap portion effectively shrinks by 1 with each iteration; repeat from step 3 until k reaches 0
  6. Maintain two properties during the sort:
    1. The elements of the array after index k are sorted, with the minimum element at the end of the array
    2. The array up through index k always forms a heap, with the minimum remaining value at the beginning of the array
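
A minimal sketch, assuming the build_heap and percolate_down helpers from the earlier cards (note that a min-heap yields a descending sort, matching the properties above):

def heapsort(a):
    build_heap(a)
    for k in range(len(a) - 1, 0, -1):
        a[0], a[k] = a[k], a[0]    # move the current minimum to the back
        percolate_down(a, 0, k)    # restore the heap in a[0:k]
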
20
Q

Maps

A

Also known as a dictionary or an associative array.

good when all we need are insertion, lookup, and removal operations.

We could use a simple array, storing key/value structs
- this would give us O(n) insertions and lookups (or O(log n) lookups, if we ordered the array by key)

We could use an AVL tree, also storing key/value structs
- this would give us O(log n) insertions and lookups

A hash table can do better:
- chaining with linked lists - the average case for all operations is O(𝝺). If the number of buckets is adjusted according to the load factor, then the number of elements is a constant factor of the number of buckets, i.e.: 𝝺 = n/m = O(m)/m = O(1).
- open addressing - O(1) on average, if we limit the load factor to a constant and reasonably small number.

21
Q

Hash Table

A

An array with differences:
- Elements can be indexed by values other than integers
- More than one element may share an index

Can be constructed with a dynamic array that resizes when the load factor hits a certain threshold

Use of a hash function that takes values of some type (string, struct, double, etc.) and maps them to an integer index value. We can then use this value to both store and retrieve data from an actual array

The hash function usually computes an index in two steps:
hash = hash_function(key)
index = hash % array_size
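
For instance, in Python (using the built-in hash() as a stand-in for hash_function):

def bucket_index(key, array_size):
    return hash(key) % array_size

print(bucket_index("apple", 8))  # some index in the range 0-7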

22
Q

Perfect and minimally perfect hash functions

A

A perfect hash function is one that results in no collisions; that is, every input gets a unique output.

A minimally perfect hash function is one that results in no collisions for a table size that equals exactly the number of elements. A minimal perfect hash function is a perfect hash function that maps n keys to n consecutive integers – usually the numbers from 0 to n − 1 or from 1 to n.

23
Q

Hash Table collision resolution with chaining

A
  • Compute the element’s bucket using the hash function
  • Search the data structure at that bucket for the element using the key (e.g., iterate through the elements in the linked list)
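
A minimal sketch of a chained hash table in Python (lists standing in for linked lists; all names are assumptions):

class ChainedHashTable:
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Compute the element's bucket using the hash function.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # key already present: update
                return
        bucket.append((key, value))

    def get(self, key):
        # Search the bucket for the element using the key.
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)
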
24
Q

hash table load factor

A

the average number of elements in each bucket

𝝺 = n/m
- 𝝺 is the load factor
- n is the total number of elements stored in the table
- m is the number of buckets
In a chained hash table, the load factor can be greater than 1.

25
Q

Open addressing time complexity

A

Let’s assume truly uniform hashing.

To insert a given item into the table (that’s not already there), the probability that the first probe is successful is (m−n)/m.

There are m total slots and n filled slots, so m − n open spots.
Let’s call this probability p, i.e., p = (m−n)/m.
If the first probe fails, the probability that the second probe succeeds is (m−n)/(m−1).

There are still m − n remaining open slots, but now we only have a total of m − 1 slots to look at, since we have examined one already.
If the first two probes fail, the probability that the third probe succeeds is (m−n)/(m−2).

There are still m − n remaining open slots, but now we only have a total of m − 2 slots to look at, since we’ve examined two already.
And so forth. In other words, for each probe, the probability of success is at least p because (m−n)/(m−c) >= (m−n)/m = p.

Here, we are dealing with a geometric distribution, so the expected number of probes until success is:

1/p = 1/((m−n)/m) = 1/(1−n/m) = 1/(1−𝝺)

In other words, the expected number of probes for any given operation is O(1/(1−𝝺)).

This suggests that, if we limit the load factor to a constant and reasonably small number, our operations will be O(1) on average. For example, if we have 𝝺 = 0.75, then we would expect 4 probes, on average. For 𝝺 = 0.9, we would expect 10 probes
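
A quick check of that arithmetic in Python:

def expected_probes(load_factor):
    # Expected probes under uniform hashing: 1 / (1 - lambda)
    return 1 / (1 - load_factor)

print(expected_probes(0.75))  # 4.0
print(expected_probes(0.9))   # ~10.0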

26
Q

When do you use quadratic probing step?

A

When a collision is detected. If no collision is detected at the index computed by the hash function, perform your operation there; otherwise, apply the quadratic probing step to generate the next index to examine, and repeat until an open slot (or the sought key) is found.

27
Q

In open addressing, if the first probe fails, the probability that the second probe succeeds is (m-n)/(n-1)? True or false

A

False, if the first probe fails, the probability that the second probe succeeds is (m-n)/(m-1)

28
Q

Does hash table performance increase or decrease as the number of buckets increases?

A

It should increase. An increase in the number of buckets will more evenly distribute the elements within the indices of the hash table. Also, recall that accessing a bucket in a hash table runs in O(1) time, since it simply needs to run the hash function on a key to find out which bucket it belongs in. However, unless it is a perfect hash function without any collisions (multiple elements stored at one index), you will still need to traverse all of the elements within that bucket for certain methods like “contains”. This is where having more buckets and a smaller table load factor will increase performance since more buckets generally mean fewer elements stored at any one index.

29
Q

What is the worst-case time complexity to retrieve a value from a hash table with chaining?

A

O(n)

30
Q

Graphs

A

A structure that represents a collection of objects or states, where some pairs of those objects are related or connected in some way.

Similar to trees in that they consist of connected nodes, but they can also contain cycles.

Searching:
- breadth-first search looks at every adjacent node, then looks at all of their adjacent nodes, and so on
- depth-first search starts at a node, travels to the furthest possible node, then backtracks to search other paths
- Dijkstra’s algorithm finds the shortest path from one node to another when the connections between them have different weights

31
Q

What does a graph node contain?

A

Aka vertex, vertices

represent objects, states (i.e. conditions or configurations), locations, etc. They form a set in which each vertex is unique. This means that no two vertices represent the same object or state. Therefore, the set of vertices V = {v1, v2, v3,…,vn}

32
Q

What does a graph edge contain?

A

Edges represent relationships or connections between vertices
- represented as vertex pairs: the set of edges E = {(vi, vj), …}
- They can be directed or undirected
- If there is an edge between vi and vj, then vi and vj are said to be adjacent (or they are neighbors)
- Edges can be weighted or unweighted. (A weight represents a property of the edge; for example, it might represent the distance between the vertices it connects.)

33
Q

What are the parts of a graph?

A

vertices, representing elements in the graph
edges, representing the connections between those elements

34
Q

When is it appropriate to use a weighted graph vs an unweighted graph?

A
35
Q

What is an advantage to using an unweighted graph?

A
36
Q

What are the characteristics of a data set that requires a weighted graph?

A
37
Q

What is a directed edge in a graph?

A

A directed edge is like a following relationship on Twitter. For example, if Han follows Nick, there would be a directed edge between them pointing from Han to Nick.
- we say the edge is directed from Han to Nick
- we can also say that Han is the tail of this edge, and Nick is the head
- we can also say that Nick is a direct successor of Han, and Han is a direct predecessor of Nick
- we can also say that Nick is reachable from Han

38
Q

How do graphs represent general relationships between objects?

A

A node may have connections to any number of other nodes. An undirected graph is connected if all vertices are reachable from all other vertices. A directed graph is strongly connected if all vertices are reachable from all other vertices.
There can be multiple paths (or no path) from one node to another. In an undirected graph, a path is a sequence of vertices, where each adjacent pair of vertices are adjacent in the graph. Informally, we can also think of a path as a sequence of edges.
A simple path, as defined here, visits each edge at most once. However, “simple path” is also often used to mean a path with no repeated vertices, which conflicts with the definition just given. Because of this variation in terminology, make sure you know which definition is being used in a particular problem when you consider traversing the edges of a graph.
A directed path is a sequence of directed edges in the graph.
A component of an undirected graph is an induced subgraph, in which any two vertices are connected to each other by paths, and which is not connected to any additional vertices in the rest of the graph.
A vertex with no incident edges is itself a component. A graph that is itself connected has exactly one component, consisting of the whole graph. Components are also sometimes called connected components.
A cycle is a closed walk that enters and leaves each vertex at most once. An undirected graph is acyclic if no subgraph is a cycle. Acyclic graphs are also called forests. A directed graph is acyclic if it does not contain a directed cycle. Directed acyclic graphs are often called dags.

39
Q

What is a connected and strongly connected graph path?
What is a simple path?
What is a directed path?
What is an induced subgraph?
What is a component?
What are connected components?
What is acyclic?
What is a forest?
What are dags?

A
40
Q

How would we try to find reachable vertices from some vertex vi?

A
  1. Initialize an empty set of reachable vertices
  2. Initialize an empty stack. Add vi to the stack.
  3. If the stack is not empty, pop a vertex v from the stack
  4. If v is not in the set of reachable vertices:
    - add it to the set of reachable vertices
    - add each vertex that is a direct successor of v to the stack
  5. Repeat from 3
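
A minimal Python sketch of these steps (the dict-of-successors graph representation is an assumption):

def reachable(graph, vi):
    # graph maps each vertex to a list of its direct successors
    visited = set()
    stack = [vi]
    while stack:
        v = stack.pop()
        if v not in visited:
            visited.add(v)
            stack.extend(graph[v])
    return visited
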
41
Q

Depth first search and breadth first search

A

Used when we are looking for a node with a particular characteristic. Both can be used to find a path from start to finish in a maze.

The general algorithm for DFS and BFS is below. For DFS, we use a stack, and for BFS, we use a queue:

1. Initialize an empty set of visited vertices.
2. Initialize an empty stack (DFS) or queue (BFS). Add vi to the stack/queue.
3. If the stack/queue is not empty, pop/dequeue a vertex v.
4. Perform any desired processing on v.
    - E.g., check if v meets a desired condition.
5. (DFS only): If v is not in the set of visited vertices:
    - Add v to the set of visited vertices.
    - Push each vertex that is a direct successor of v onto the stack.
6. (BFS only):
    - Add v to the set of visited vertices.
    - For each direct successor v’ of v: if v’ is not in the set of visited vertices, enqueue it into the queue.
7. Repeat from step 3.
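
A minimal Python sketch of both traversals over a dict-of-successors graph (an assumed representation):

from collections import deque

def dfs(graph, start):
    visited = set()
    stack = [start]
    while stack:
        v = stack.pop()
        # process v here, e.g. check if v meets a desired condition
        if v not in visited:
            visited.add(v)
            stack.extend(graph[v])
    return visited

def bfs(graph, start):
    visited = set()
    queue = deque([start])
    while queue:
        v = queue.popleft()
        # process v here, e.g. check if v meets a desired condition
        visited.add(v)
        for w in graph[v]:
            if w not in visited:
                queue.append(w)
    return visited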

We can make some comparisons between DFS and BFS:

DFS is a backtracking search. If we’re looking for a node with a specific characteristic, and DFS takes a path that doesn’t contain such a node, it will backtrack to try a different path.
In an infinite graph, DFS can become lost down an infinite path without ever finding a solution.
BFS is complete and optimal. If a solution exists in the graph, BFS is guaranteed to find it, and it will find the shortest path to that solution.
However, BFS may take a long time to find a solution if the solution is deep in the graph. DFS may find a deep solution more quickly.
Both algorithms have O(V) space complexity in the worst case. However, BFS may take up more space in practice.
If the graph has a high branching factor, i.e., if each node has many neighbors, BFS can use a lot of memory to maintain all of the paths being explored in the queue.

42
Q

Dijkstra’s algorithm: single source lowest-cost paths

A

Dijkstra’s algorithm is structured very much like DFS and BFS, except in this algorithm, we will use a priority queue to order our search.

The priority values used in the queue correspond to the cumulative distance to each vertex added to the priority queue.
Therefore, we are always exploring the remaining node with the minimum cumulative cost.
Here’s the algorithm, which begins with some source vertex vs:

1. Initialize an empty map/hash table representing visited vertices.
    - Key is the vertex v.
    - Value is the min distance d to vertex v.
2. Initialize an empty priority queue, and insert vs into it with distance (priority) 0.
3. While the priority queue is not empty:
    - Remove the first element (a vertex) from the priority queue and assign it to v. Let d be v’s distance (priority).
    - If v is not in the map of visited vertices:
        - Add v to the visited map with distance/cost d.
        - For each direct successor vi of v:
            - Let di equal the cost/distance associated with edge (v, vi).
            - Insert vi into the priority queue with distance (priority) d + di.
This version of the algorithm only keeps track of the minimum distance to each vertex, but it can be easily modified to keep track of the min-distance path, too. You can do this by augmenting the visited vertex map and the priority queue to keep track of the vertex previous to each one added.

The complexity of this version of the algorithm is O(|E| log |E|). The innermost loop is executed at most |E| times, and the cost of the instructions inside the loop is O(log |E|). The inner cost comes from insertion into the priority queue.
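
A minimal Python sketch using heapq as the priority queue (the weighted dict-of-successors representation is an assumption):

import heapq

def dijkstra(graph, vs):
    # graph maps each vertex to a list of (successor, edge_cost) pairs
    visited = {}            # vertex -> min distance from vs
    pq = [(0, vs)]          # (distance, vertex), ordered by distance
    while pq:
        d, v = heapq.heappop(pq)
        if v not in visited:
            visited[v] = d
            for vi, di in graph[v]:
                if vi not in visited:
                    heapq.heappush(pq, (d + di, vi))
    return visited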

43
Q

Breadth First Search Example

A

Add A to the queue.
Initialize reachable as empty.
Pop A off the queue and add it to reachable.
Add A’s connected nodes to the queue in reverse alphabetical order.
Pop D off the queue and add it to reachable; add its connected nodes to the back of the queue.
Pop C off the queue and add it to reachable; add C’s connected nodes to the end of the queue.
Pop B off the front of the queue and add its connections (none; C was already processed); add B to reachable.
Pop F off the queue, add G to the queue, and add F to reachable.
Pop G off the queue, add its connections to the queue (none), and add G to reachable.
Pop E off the queue and add it to reachable.

44
Q

Depth first search example

A

Initialize an empty stack (DFS).
Add vi to the stack.
If the stack is not empty, pop a vertex v.
Process the vertex.
If v is not in the set of visited vertices, add it.
Push each of v’s direct successors onto the stack.

Example
iteration | stack | visited
0 | A | (none)
1 | BCD | A
2 | CCD | AB
3 | EGCD | ABC
4 | GCD | ABCE
5 | CD | ABCEG
6 | D | ABCEG
7 | F | ABCEGD
8 | (empty) | ABCEGDF

45
Q

Dijkstra’s Algorithm example

A

Use a priority queue to order our search. The priority values used in the queue correspond to the cumulative distance to each vertex added to the priority queue.

1. Initialize an empty map/hash table representing visited vertices.
    - Key is the vertex v.
    - Value is the min distance d to vertex v.
2. Initialize an empty priority queue, and insert vs into it with distance (priority) 0.
3. While the priority queue is not empty:
    - Remove the first element (a vertex) from the priority queue and assign it to v. Let d be v’s distance (priority).
    - If v is not in the map of visited vertices:
        - Add v to the visited map with distance/cost d.
        - For each direct successor vi of v:
            - Let di equal the cost/distance associated with edge (v, vi).
            - Insert vi into the priority queue with distance (priority) d + di.

46
Q

Adjacency matrix example

A