coding university - Data structure and algorithms Flashcards

1
Q

What is ZeroMQ?

A

A socket-based messaging system that can be used as a queue, pub/sub, etc.

Carries messages across inproc, IPC, TCP, TIPC, multicast.

Smart patterns like pub-sub, push-pull (pipeline), and router-dealer.

2
Q

What is ActiveMQ?

A

Apache ActiveMQ is an open source message broker written in Java.

3
Q

What is MessagePack?

A

MessagePack is an efficient binary serialization format. It lets you exchange data among multiple languages like JSON. But it’s faster and smaller. Small integers are encoded into a single byte, and typical short strings require only one extra byte in addition to the strings themselves.

No IDL.

4
Q

What is Avro?

A

Apache Avro is a data serialization system. IDL-based.

Rich data structures.

A compact, fast, binary data format.

A container file, to store persistent data.

Remote procedure call (RPC).

Code generation is not required to read or write data files nor to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages.

5
Q

What is a Bloom filter?

A

A Bloom filter is a data structure used to quickly test membership in a set where the number and size of possible elements would be very large. Too large to keep in memory.

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not, thus a Bloom filter has a 100% recall rate. In other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed (though this can be addressed with a "counting" filter). The more elements that are added to the set, the larger the probability of false positives.

6
Q

How can you easily generate multiple hashes for the same element?

A

Double hashing. This method gives you as many hashes as you need:

hash_i(x, m) = (hash_a(x) + i * hash_b(x)) mod m

In Python:

import mmh3

mmh3.hash64('foo')  # two 64-bit signed ints, in a tuple

Now you have two 64-bit hashes to use as hash_a and hash_b. Substituting successive values of i gives you as many hashes as a Bloom filter needs.
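A rough sketch of the idea in Python, assuming the mmh3 package is installed (k_hashes is an illustrative helper, not part of the original card):

import mmh3

def k_hashes(item, k, m):
    # Two independent 64-bit hashes from one MurmurHash3 call.
    h_a, h_b = mmh3.hash64(item)
    # Double hashing: hash_i(x, m) = (hash_a(x) + i * hash_b(x)) mod m
    return [(h_a + i * h_b) % m for i in range(k)]

# Example: 7 bucket indices for a Bloom filter with m = 1000 bits.
print(k_hashes('foo', k=7, m=1000))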

7
Q

What is a cache-oblivious algorithm?

A

A cache-oblivious algorithm does not mean that the algorithm does not take advantage of the cache; to the contrary, it does so quite effectively. What it means is that the algorithm does not need to know the cache line size; it works effectively for all cache line sizes simultaneously, removing the need to tune or optimize for a given machine.

Optimal cache-oblivious algorithms are known for the Cooley–Tukey FFT algorithm, matrix multiplication, sorting, matrix transposition, and several other problems.

8
Q

How can you augment a splay tree so you can find how many items are between x and y?

A

Store size of subtrees at each node.

Find x, splay to root. Each splay, insert, and delete must maintain size in node.

Find y, and along the way add up the sizes in the left subtrees, and 1 for each visited left-hand node.

Splay y to root to ensure balance.

9
Q

In a maximum flow problem, what is the minimum cut?

A

The minimum cut is the smallest total capacity of edges that, if removed, would disconnect the source from the sink. By the max-flow min-cut theorem, its capacity equals the value of the maximum flow through the graph.

10
Q

What is the Ford-Fulkerson algorithm?

A

The Ford–Fulkerson method or Ford–Fulkerson algorithm (FFA) is a greedy algorithm that computes the maximum flow in a flow network. It is called a "method" instead of an "algorithm" as the approach to finding augmenting paths in a residual graph is not fully specified or it is specified in several implementations with different running times. The name "Ford–Fulkerson" is often also used for the Edmonds–Karp algorithm, which is a specialization of Ford–Fulkerson.

11
Q

What is the running time for the disjoint set data structure?

A

Due to union by rank (merging the smaller set into the larger during union) and path compression (during find), the amortized time per operation is only O(alpha(n)), where alpha(n) is the inverse of the extremely fast-growing Ackermann function. Since alpha(n) is less than 5 for all remotely practical values of n, the amortized running time per operation is effectively a small constant.

The worst case for a single find() is Theta(log u), where u is the number of unions performed, when no finds have yet been done to trigger path compression.
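A minimal union-find sketch in Python with union by rank and path compression (illustrative only; the class and method names are not from the original card):

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path compression: point x directly at the root.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        # Union by rank: attach the shorter tree under the taller one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

ds = DisjointSet(5)
ds.union(0, 1)
ds.union(1, 2)
print(ds.find(2) == ds.find(0))  # True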

12
Q

What Python flag turns on optimizations and removes assertions from code?

A

python -O

13
Q

Why is doing work in a constructor a bad thing?

A

It can make your code harder to test.

14
Q

What should be avoided to ensure testing is easier/possible?

A
  • static methods and properties
  • final keyword
  • use of new in methods (use dependency injection)
15
Q

What are some guidelines to keep in mind to not violate the dependency inversion principle?

A
  • No variable should have a concrete class type. An abstract type is better.
  • No class should derive from a concrete class.
  • No method should override an implemented method of any of its base classes.

These are guidelines and may not be feasible all the time.

16
Q

What is separate chaining?

A

In hash table conflict resolution, each bucket is independent and has some sort of linked list of entries with the same index. The time for hash table operations is the time to find the bucket (which is constant) plus the time for the list operation.

In a good hash table, each bucket has zero or one entries, and sometimes two or three, but rarely more than that. Therefore, structures that are efficient in time and space for these cases are preferred. Structures that are efficient for a fairly large number of entries per bucket are not needed or desirable. If these cases happen often, the hashing function needs to be fixed.
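A toy separate-chaining table in Python (illustrative only; a real implementation would also resize as the load factor grows):

class ChainedHashTable:
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite existing key
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

t = ChainedHashTable()
t.put('a', 1)
print(t.get('a'))  # 1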

17
Q

What is open addressing?

A

In hash table conflict resolution, all entry records are stored in the bucket array itself. When a new entry has to be inserted, the buckets are examined, starting with the hashed-to slot and proceeding in some probe sequence, until an unoccupied slot is found. When searching for an entry, the buckets are scanned in the same sequence, until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table. The name "open addressing" refers to the fact that the location ("address") of the item is not determined by its hash value. (This method is also called closed hashing; it should not be confused with "open hashing" or "closed addressing", which usually mean separate chaining.)

18
Q

What is the length of the longest chain in a hash table using separate chaining?

A

O(1 + alpha) where alpha is the load factor, n/m.

19
Q

Since uniform hashing is difficult to achieve in practice, what is a great alternative?

A

double hashing

20
Q

How can you test if a number is odd in bitwise operations?

A

return (x & 1)

21
Q

How can you test if a number is even in bitwise operations?

A

return (x & 1) == 0

22
Q

What is another name for a breadth-first search traversal?

A

Level-order traversal.

23
Q

What is a 2-3-4 tree?

A

2–3–4 tree (also called a 2–4 tree) is a self-balancing data structure that is commonly used to implement dictionaries. The numbers mean a tree where every node with children (internal node) has either two, three, or four child nodes:

  • 2-node has one data element, and if internal has two child nodes;
  • 3-node has two data elements, and if internal has three child nodes;
  • 4-node has three data elements, and if internal has four child nodes.
24
Q

What is the complexity of all operations on a splay tree?

A

O(log n) on average.

A single operation Theta(n) in the worst case.

25
Q

What is the maximum height of a red-black tree?

A

2 log2(n + 1)

26
Q

In a b-tree, how many children are there per node?

A

root: 1 to 2t-1 keys

non-root: t-1 to 2t-1 keys

t could be up to 100, or more.

A node with n keys has n+1 children, so an internal node has up to 2t children.

Leaves are all at the same level.

27
Q

What does the max degree of a b-tree depend on?

A

The number of items being stored, and page size based on disk characteristics.

28
Q

A b-tree’s data is organized to correspond with what?

A

Pages on disk.

29
Q

Give an example of how a b-tree might be organized.

A

1024 children per node.

Store root in memory.

3 nodes accessed gets us 1024^3 disk pages.

4 nodes accessed gets us 1024^4 disk pages.

30
Q

On descending a b-tree, what’s the rule?

A

Never step into a minimal node.

31
Q

On insertion in a b-tree, what’s the rule?

A

Never step into a full node.

32
Q

How many nodes are in a compressed trie with k leaves (big-O)?

A

O(k) nodes with k leaves due to compression.

33
Q

What is a suffix tree?

A

A suffix tree is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow particularly fast implementations of many important string operations.

The construction of such a tree for the string S takes time and space linear in the length of S. Once constructed, several operations can be performed quickly, for instance locating a substring in S, locating a substring if a certain number of mistakes are allowed, locating matches for a regular expression pattern etc. Suffix trees also provide one of the first linear-time solutions for the longest common substring problem. These speedups come at a cost: storing a string’s suffix tree typically requires significantly more space than storing the string itself.

34
Q

In brief, how does selection sort work?

A

On each pass, find the minimum item in the unsorted portion (everything after the previously placed minimum) and swap it into the leftmost position of that unsorted portion.

35
Q

When can insertion sort run in n log n time?

A

Load into a binary search tree. Then inorder traversal.

36
Q

How can you speed up selection sort with a heap?

A

Replace the unsorted portion with a min-heap. Gives O(log n) removal. Makes n log n overall.

37
Q

What data structure is well suited for a heap sort and which is bad?

A

Array - good

Linked list - clumsy

38
Q

What data structure is well suited for a merge sort and which is just okay?

A

Linked list - a natural

Array does not allow for in-place

39
Q

How can you optimize finding a pivot when the segment to pivot is large (not random choice)?

A

Choose a median of three.

40
Q

What is counting sort?

A

Counting sort is an algorithm for sorting a collection of objects according to keys that are small integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that have each distinct key value, and using arithmetic on those counts to determine the positions of each key value in the output sequence. Its running time is linear in the number of items and the difference between the maximum and minimum key values, so it is only suitable for direct use in situations where the variation in keys is not significantly greater than the number of items. However, it is often used as a subroutine in another sorting algorithm, radix sort, that can handle larger keys more efficiently.
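A short counting sort sketch in Python for small non-negative integer keys (this simple version just re-emits the keys; the stable prefix-sum variant is what radix sort uses as a subroutine):

def counting_sort(items, max_key):
    # counts[k] = number of occurrences of key k
    counts = [0] * (max_key + 1)
    for x in items:
        counts[x] += 1
    # Re-emit each key as many times as it was counted.
    result = []
    for key, count in enumerate(counts):
        result.extend([key] * count)
    return result

print(counting_sort([4, 1, 3, 1, 0], max_key=4))  # [0, 1, 1, 3, 4]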

41
Q

What is radix sort?

A

Radix sort is a non-comparative integer sorting algorithm that sorts data with integer keys by grouping keys by the individual digits which share the same significant position and value.

Two classifications of radix sorts are least significant digit (LSD) radix sorts and most significant digit (MSD) radix sorts. LSD radix sorts process the integer representations starting from the least digit and move towards the most significant digit. MSD radix sorts work the other way around.
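A compact LSD radix sort sketch in Python for non-negative integers, using base 10 for readability and a stable bucket pass per digit (illustrative, not from the original card):

def radix_sort(nums, base=10):
    if not nums:
        return nums
    max_val = max(nums)
    exp = 1
    while max_val // exp > 0:
        buckets = [[] for _ in range(base)]
        for n in nums:
            # Stable: items with the same digit keep their relative order.
            buckets[(n // exp) % base].append(n)
        nums = [n for bucket in buckets for n in bucket]
        exp *= base
    return nums

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))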

42
Q

What is the counting sort running time?

A

O(q + n), where q is the range of possible key values. If q is in O(n), then linear time.

43
Q

What radix is most natural to use?

A

A power of 2 radix.

44
Q

How would radix sort work for IEEE floating point numbers?

A

Flip all bits for negative numbers, do sort, then flip back.

45
Q

How to choose q for radix sort?

A

Choose q within a power of 2 of n. Ensures the number of passes is small. Best rule is n rounded down to the next power of 2.

To save memory, round sqrt(n) down to the next power of 2. Twice as many passes.

46
Q

What operations are a treap optimized for?

A
  • union
  • intersection
  • difference
47
Q

What is the Day–Stout–Warren (DSW) algorithm?

A

The Day–Stout–Warren (DSW) algorithm is a method for efficiently balancing binary search trees, that is, decreasing their height to O(log n), where n is the total number of nodes. Unlike a self-balancing binary search tree, it does not do this incrementally during each operation, but periodically, so that its cost can be amortized over many operations.

48
Q

What is the insertion sort algorithm?

A

for (i = 0; i < n; ++i) {

j = i;

while (j > 0 && a[j - 1] > a[j]) {

    swap(a, j, j - 1);

    j -= 1;

}

}

49
Q

Is radix sort stable?

A

yes

50
Q

What is the algorithmic time complexity of radix sort?

A

O(d(n + b)), where d is the number of digit passes, n the number of items, and b the radix (bucket count). For a fixed number of digits this is linear in n.

51
Q

Give the code for selection sort.

A

for (i = 0; i < n; ++i) {

min_index = i;

for (j = i; j < n; ++j) {

    if (a[j] < a[min_index]) {

        min_index = j;

    }

}

swap(a, i, min_index);

}

52
Q

All comparison-based sorting is bounded by what complexity?

A

Omega(n log n)

53
Q

What do you call a linear ordering of a directed graph of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering?

A

Topological sort

54
Q

What is a good method for performing a topological sort?

A
  1. Calculate the in-degree of each node. O(v + e)
  2. Add every node with in-degree 0 to a queue.
  3. For each node taken from the queue, output it, then decrement the in-degree of each of its neighbors; whenever a neighbor's in-degree reaches 0, add it to the queue. Repeat until the queue is empty (see the sketch below).
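A sketch of that method (Kahn's algorithm) in Python, assuming the graph is a dict mapping each node to its list of neighbors:

from collections import deque

def topological_sort(graph):
    # graph: {node: [neighbors]}
    in_degree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            in_degree[v] += 1

    q = deque(u for u in graph if in_degree[u] == 0)
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v in graph[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                q.append(v)

    if len(order) != len(graph):
        raise ValueError("graph has a cycle")
    return order

print(topological_sort({'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}))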
55
Q

How many possible trees are there that span all nodes in a graph?

A

Up to n^(n-2) for a complete graph on n labeled vertices (Cayley's formula). For a general graph, Kirchhoff's theorem gives the exact count.

56
Q

What is Prim’s algorithm?

A

def prim(self):
    """
    Returns a dictionary of parents of vertices in a minimum spanning tree
    :rtype: dict
    """
    s = set()
    q = queue.PriorityQueue()
    parents = {}
    start_weight = float("inf")
    weights = {}  # since we can't peek into queue

    for i in self.get_vertex():
        weight = start_weight
        if i == 0:
            q.put([0, i])
        weights[i] = weight
        parents[i] = None

    while not q.empty():
        v_tuple = q.get()
        vertex = v_tuple[1]
        s.add(vertex)
        for u in self.get_neighbor(vertex):
            if u.vertex not in s:
                if u.weight < weights[u.vertex]:
                    parents[u.vertex] = vertex
                    weights[u.vertex] = u.weight
                    q.put([u.weight, u.vertex])

    return parents
57
Q

What is the time complexity of Prim’s algorithm on an adjacency matrix?

A

O(v^2)

58
Q

What is the time complexity of Prim’s algorithm on an adjacency list and a binary heap?

A

O(e log v)

derived from:

O((e + v) log v)

59
Q

What is the time complexity of Prim’s algorithm on an adjacency list and a Fibonacci heap?

A

O(e + v log v)

60
Q

What is the pseudocode Kruskal’s algorithm?

A

KRUSKAL(G):

A = ∅

foreach v ∈ G.V:

MAKE-SET(v)

foreach (u, v) in G.E ordered by weight(u, v), increasing:

if FIND-SET(u) ≠ FIND-SET(v):

  A = A ∪ {(u, v)}

  UNION(u, v)

return A

61
Q

What is the time complexity of Kruskal’s algorithm?

A

O(E log V)

or

O(e log e + e α(v) + v)

62
Q

What is Kruskal’s algorithm?

A

Kruskal’s algorithm is a minimum-spanning-tree algorithm which finds an edge of the least possible weight that connects any two trees in the forest. It is a greedy algorithm in graph theory as it finds a minimum spanning tree for a connected weighted graph adding increasing cost arcs at each step. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. If the graph is not connected, then it finds a minimum spanning forest (a minimum spanning tree for each connected component).

63
Q

How can you find the number of connected components?

A

For each node:

if node not yet visited, increment component count and do DFS.
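A sketch in Python for an undirected graph stored as a dict of adjacency lists (illustrative, not from the original card):

def count_components(graph):
    visited = set()

    def dfs(u):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                dfs(v)

    count = 0
    for node in graph:
        if node not in visited:
            count += 1
            dfs(node)
    return count

print(count_components({0: [1], 1: [0], 2: []}))  # 2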

64
Q

How can you get a topological sort with DFS?

A

Do a DFS, and when each node is being marked as complete, add node to a list.

Reverse the list.

65
Q

How can you check for a cycle with DFS?

A

for each neighbor node:

if not marked as visited (and is not parent) then DFS

else it’s a cycle

66
Q

How can you get the strongly connected components of a graph?

A
  1. DFS - calculate the finish times for each node
  2. Reverse the edges in the graph
  3. Call DFS on nodes in reverse graph in reverse order of finishing times.
67
Q

How do you reverse the edges in a directed graph represented as an adjacency matrix?

A

Transpose the matrix, so [i, j] becomes [j, i]

68
Q

How can you find the shortest path on a DAG?

A
  1. Topological sort
  2. Follow the topological sort, relaxing edges (see the sketch below).
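A sketch in Python, assuming the graph is given as {u: [(v, weight), ...]} and a topological order is already available:

def dag_shortest_paths(graph, order, source):
    # graph: {u: [(v, weight), ...]}, order: vertices in topological order
    dist = {u: float('inf') for u in graph}
    dist[source] = 0
    for u in order:
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:  # relax edge (u, v)
                dist[v] = dist[u] + w
    return dist

g = {'s': [('a', 2), ('b', 6)], 'a': [('b', 3)], 'b': []}
print(dag_shortest_paths(g, ['s', 'a', 'b'], 's'))  # {'s': 0, 'a': 2, 'b': 5}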

69
Q

How to find the longest path on a weighted DAG?

A
  1. Set all edges to their negative weight.
  2. Topological sort
  3. follow the topological sort, relaxing edges
70
Q

What is the diameter of a graph?

A

The greatest distance between any pair of vertices, that is, the longest of all the pairwise shortest paths. To find the diameter of a graph, first find the shortest path between each pair of vertices; the greatest length of any of these paths is the diameter of the graph.

71
Q

Under what condition can you not use Dijkstra’s algorithm?

A

When the graph contains a negative edge weight. Dijkstra's greedy strategy assumes a settled vertex's distance can never improve, which negative edges break (and a negative cycle would make shortest paths undefined).

72
Q

In plain words, how does Kruskal’s algorithm work?

A
  1. Create a disjoint-set (union-find) structure with each vertex in its own set, and a list for the result.
  2. Make a list of all edges in G.
  3. Sort edges by weight, from least to greatest.
  4. Iterate edges in sorted order.
  5. For each edge (u, v), if u and v are in different sets, union the two sets and add the edge to the result list.
73
Q

What can most dynamic programming problems be expressed as?

A

Finding the shortest path in a DAG. Formulating it this way ensures you can solve it in linear or linearithmic time.

74
Q

What metric can you use to measure the badness of a line in a text justification problem?

A

(page width - text width)^3

Minimize the sum of the badness of the lines.

75
Q

How can you tell if a graph is 2-colorable?

A

If it’s bipartite. All trees are bipartite.

76
Q

What is it called when you have too many base cases in your recursion?

A

arm’s length recursion

77
Q

What is the base case of a recursion?

A

The code required to give the solution to the smallest subproblem.

78
Q

What is the formula for n choose k?

A

n! / k!(n - k)!

79
Q

What is the general outline of a backtracking algorithm?

A

def solve(conf):
    if no more choices:
        return conf  # solved

    choices = get_available_choices(conf)

    for c in choices:
        if solve(conf using c):
            return true
        unmake choice c

    return false
80
Q

At how many items should you expect a collision when hashing among n buckets?

A

At sqrt(n) the probability is 1/2

81
Q

What is n/n^2?

A

1/n

82
Q

What does it mean when a problem is NP-Hard?

A

It is at least as hard as any problem in NP. A problem X is NP-Hard if every problem Y in NP reduces to X.

83
Q

Is "3-D matching" NP-Complete?

A

yes

84
Q

Is "triple coloring a graph" NP-Complete?

A

Yes

85
Q

Is "two coloring a graph" NP-Complete?

A

No

86
Q

Is "subset sum" NP-Complete?

A

Yes

87
Q

Is "bin packing" NP-Complete?

A

Yes

88
Q

Is "vertex cover" NP-Complete?

A

Yes

89
Q

Is "set cover" NP-Complete?

A

Yes

90
Q

Name some NP-Complete problems.

A
  • tsp
  • knapsack problem
  • satisfiability
  • 3D matching
  • tricoloring
  • subset sum
  • rectangle packing
  • bin packing
  • vertex cover
  • set cover
91
Q

What is one way of doing approximate traveling salesman?

A

Select a vertex as root.

Build a MST.

Do a preorder traversal, store nodes in H.

Return H (a Hamiltonian cycle)

92
Q

How can an LRU cache be implemented with a linked list?

A

When an item is accessed, it moves to the head of the list.

The trailing items can be overwritten with new items, or removed.

93
Q

What is a skip list?

A

A data structure that allows fast search within an ordered sequence of elements. Fast search is made possible by maintaining a linked hierarchy of subsequences, with each successive subsequence skipping over fewer elements than the previous one. Searching starts in the sparsest subsequence until two consecutive elements have been found, one smaller and one larger than or equal to the element searched for.

A skip list is built in layers. The bottom layer is an ordinary ordered linked list. Each higher layer acts as an "express lane" for the lists below, where an element in layer i appears in layer i+1 with some fixed probability p (two commonly used values for p are 1/2 or 1/4).

94
Q

What operations does a skip list support and what is their avg and worst case times?

A

search: O(log n) O(n)
insert: O(log n) O(n)
delete: O(log n) O(n)

95
Q

What operations does a van Emde Boas tree support and what are the time complexities?

A

All operations are O(log log M), where M = 2^m is the size of the key universe (the total number of keys that can be stored).

Equivalently, O(log m), where m is the number of bits per key.

Space: O(M)

Search

Insert

Delete

Predecessor

Successor

96
Q

What are the complexities for treap operations?

A

For all the basic maintenance operations, they are O(log n) average case and O(n) worst case.

  • Search
  • Insert
  • Delete

For these operations, O(m log n/m) for treaps of sizes m and n, with m ≤ n.

  • union
  • intersection
  • difference
97
Q

What are Catalan numbers?

A

Catalan numbers form a sequence of natural numbers that occur in various counting problems, often involving recursively defined objects. They can be thought of as counting balanced strings of parentheses: C_n = (1 / (n + 1)) * (2n choose n).

Do not think of Catalan numbers as pseudoprimes. There are only 3 known Catalan pseudoprimes.

98
Q

How are the priorities of a treap assigned?

A

Randomly generated upon insertion. That randomness is used to keep the tree balanced.

99
Q

Is a geometric Steiner tree NP-Complete?

A

Yes

100
Q

What are the 2 algorithms for convex hull?

A
  • Graham scan
  • Jarvis march (gift-wrapping method)

101
Q

How does a Graham scan work in finding convex hull?

A

Runs in O(n log n): sort the points (by angle around an extreme point), then make a single pass over them, keeping only left turns as it goes around the perimeter counter-clockwise. When the next point would make a right turn, it backtracks, popping points off a stack, until adding the next point gives a left turn again.
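The left-turn test at the heart of the scan is a 2-D cross product; a small sketch in Python (illustrative, not from the original card):

def cross(o, a, b):
    # > 0: left turn (counter-clockwise), < 0: right turn, 0: collinear
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

# Walking o -> a -> b:
print(cross((0, 0), (1, 0), (1, 1)))   # 1, left turn: keep the point
print(cross((0, 0), (1, 0), (2, -1)))  # -1, right turn: pop from the stack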

102
Q

How does the Jarvis march work in finding convex hull?

A

Starting with the leftmost point p:

Go through each point to the right of that point, and using p as a pivot, find which point is the most clockwise. O(n)

Get the most clockwise point as the new p - O(1)

Loop again with new p

This continues until the starting point is reached O(h) - where h is the number of hull points

103
Q

What is the worst case time complexity of a Jarvis march?

A

O(n^2)

Occurs when most points are part of the hull, and few points contained in the hull.

104
Q

What is the average complexity of a Jarvis march?

A

O(n * h) where h is the number of points that compose the hull.

105
Q

unordered singly linked list - time complexity

A

find - O(n)

delete - O(n)

106
Q

ordered singly linked list - time complexity

A

delete - O(n)

107
Q

Binary Tree - time complexity

A

find - O(h)
add - O(h)
delete - O(h)

(h = height of the tree)

108
Q

stack - time complexity

A

Add element to the top of the stack - push O(1)
Remove the top element of the stack - pop O(1)
Return the value of the top element of the stack without removing it. O(1)

109
Q

Queue - time complexity

A

Add an element to a queue. O(1)
Remove an element from the front of the queue. dequeue O(1)
Return the element from the front of the queue without removing it. - front O(1)

110
Q

Balanced Binary Search Tree - time complexity

A

find - O(log N)
add - O(log N)
delete - O(log N)

111
Q

What is a skip list?

A

A data structure for storing a sorted list of items using a hierarchy of linked lists that connect increasingly sparse subsequences of the items.

O(log N) expected time for all operations, O(N) worst case.

112
Q

What is a treap?

A

tree + heap.
A random priority is assigned to every key and must maintain two properties:

  • They are in order with respect to their keys, as in a typical binary search tree
  • They are in heap order with respect to their priorities, that is, no key has a key of lower priority as an ancestor

O(log N) expected time for all operations, O(N) worst case.

113
Q

What is a max-heap?

A

A complete binary tree in which every node's key is greater than or equal to the keys of its children, so the maximum key is at the root. It is the standard way to implement a priority queue in which elements with higher priorities are served before elements with lower priorities.

114
Q

binary heap - time complexity

A

min - O(1)
insert - O(log n)
remove-min - O(log n)

115
Q

What is a binary heap?

A

A collection of keys arranged in a complete heap-ordered binary tree, represented in level order in an array (not using the first entry). The parent of the node in position k is in position [k/2] and the two children of the node in position k are in position 2k and 2k+1.
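The index arithmetic for the 1-indexed layout described above, as a quick sketch in Python:

# 1-indexed array layout: heap[0] is unused.
def parent(k):
    return k // 2

def children(k):
    return 2 * k, 2 * k + 1

heap = [None, 9, 7, 8, 3, 5]   # a small max-heap in level order
print(parent(5), children(2))  # 2 and (4, 5)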

116
Q

What is a Adaptable Priority Queue?

A

A priority queue that allows you to change the priority of objects already in the queue.

117
Q

What is the time complexity of quicksort?

A

O(N^2) - worst

O(N log N) - best & expected

Lower bound for comparison-based sorting: no comparison-based sorting algorithm can be faster than O(N log N).

118
Q

What is a connected graph?

A

There exists a path from every vertex to every other vertex in the graph.

119
Q

What is a tree?

A

An acyclic connected graph.

120
Q

What is a cycle?

A

Path with at least one edge whose first and last vertices are the same.

121
Q

What is a spanning tree?

A

A subgraph that contains all of the graph's vertices and is a single tree. A minimum spanning tree (min-cost spanning tree) is a spanning tree whose total edge weight is minimal.

122
Q

What is stable sorting?

A

Items with the same key keep the relative order they had in the original input.

123
Q

What is another name for a trie?

A

Prefix tree. (A radix tree usually refers to a compressed trie.)

124
Q

What is internal sorting?

A

An internal sort is any data sorting process that takes place entirely within the main memory of a computer. This is possible whenever the data to be sorted is small enough to all be held in the main memory. For sorting larger datasets, it may be necessary to hold only a chunk of data in memory at a time, since it won’t all fit. The rest of the data is normally held on some larger, but slower medium, like a hard-disk. Any reading or writing of data to and from this slower media can slow the sortation process considerably.

125
Q

What is external sorting?

A

External sorting is a term for a class of sorting algorithms that can handle massive amounts of data. External sorting is required when the data being sorted do not fit into the main memory of a computing device (usually RAM) and instead they must reside in the slower external memory (usually a hard drive). External sorting typically uses a hybrid sort-merge strategy. In the sorting phase, chunks of data small enough to fit in main memory are read, sorted, and written out to a temporary file. In the merge phase, the sorted subfiles are combined into a single larger file.

Mergesort is typically preferred.

126
Q

What are 2 advantages of merge sort?

A
  • suitable for a linked list
  • suitable for external sort

127
Q

What is a disadvantage of merge sort?

A

Need an extra buffer to hold the merged data

128
Q

What are 3 advantages of heap sort?

A
  • don’t need recursion
  • suitable for large data
  • locality of data
129
Q

What is a disadvantage of heap sort?

A

Usually slower than merge sort and quick sort.

130
Q

What is a articulation vertex?

A

A vertex whose deletion disconnects the graph; it is the weakest point in the graph.

131
Q

What is the chromatic number?

A

The smallest number of colors needed for a vertex coloring of a graph (no two adjacent vertices share a color).

132
Q

What is the degree of a vertex?

A

The number of edges incident to the vertex, with loops counted twice.

133
Q

What is quick select?

A

A selection algorithm to find the kth smallest element in an unordered list. Quickselect uses the same overall approach as quicksort, choosing one element as a pivot and partitioning the data in two based on the pivot, accordingly as less than or greater than the pivot. However, instead of recursing into both sides, as in quicksort, quickselect only recurses into one side - the side with the element it is searching for. This reduces the average complexity from O(n log n) to O(n).
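A sketch in Python with a random pivot (k is 0-indexed; this list-partitioning version uses extra space, while the classic version partitions in place):

import random

def quickselect(items, k):
    # Returns the k-th smallest element, 0-indexed.
    pivot = random.choice(items)
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    if k < len(smaller):
        return quickselect(smaller, k)
    if k < len(smaller) + len(equal):
        return pivot
    return quickselect(larger, k - len(smaller) - len(equal))

print(quickselect([7, 2, 9, 4, 1], 2))  # 4 (third smallest)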

134
Q

What is an inverted index?

A

An index data structure storing a mapping from content, such as words or numbers, to its locations in a database file, or in a document or a set of documents (named in contrast to a Forward Index, which maps from documents to content). The purpose of an inverted index is to allow fast full text searches, at a cost of increased processing when a document is added to the database.

135
Q

What is set partition?

A

A partitioning of elements of some universal set into a collection of disjoint subsets. Thus, each element must be in exactly one subset.

136
Q

What is a maximum spanning tree?

A

A spanning tree of a weighted graph having maximum weight. It can be computed by negating the edges and running either Prim’s or Kruskal’s algorithms

137
Q

What is a minimum product spanning tree and when would you use it?

A

The cost of a tree is the product of all the edge weights in the tree, instead of the sum of the weights. Since log(a*b) = log(a) + log(b), the minimum spanning tree on a graph whose edge weights are replaced with their logarithms gives the minimum product spanning tree on the original graph.

You would use it to minimize the product.

138
Q

What is a rolling hash?

A

A rolling hash (also known as a rolling checksum) is a hash function where the input is hashed in a window that moves through the input.

One of the main applications is the Rabin-Karp string search algorithm, which uses the rolling hash.

139
Q

What is the Euclidean GCD algorithm in Python?

A

def gcd(a, b):
    while a:
        b, a = a, b % a
    return b
140
Q

What is the Rabin-Karp algorithm?

A

Compute the hash code of each substring whose length is the length of s, using a rolling hash, a hash function with the property that the hash of the next window can be computed in constant time from the hash of the previous window. Slide a window of that length across the text, compare hashes, and verify character by character only when the hashes match.
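A sketch in Python with a polynomial rolling hash (the base and modulus here are arbitrary choices); hash matches are verified with a direct comparison to rule out collisions:

def rabin_karp(text, pattern, base=256, mod=10**9 + 7):
    n, m = len(text), len(pattern)
    if m > n:
        return -1
    high = pow(base, m - 1, mod)      # weight of the leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    for i in range(n - m + 1):
        if p_hash == t_hash and text[i:i + m] == pattern:
            return i
        if i < n - m:
            # Roll the window: drop text[i], add text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return -1

print(rabin_karp("abracadabra", "cad"))  # 4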

141
Q

What is the time complexity of breadth-first search?

A

O(m + n), where n is the number of vertices and m is the number of edges.

142
Q

What is the time and space complexity of Bellman-Ford?

A

Time: O(|V| |E|), which is Theta(n^3) on a dense graph.

Space: O (|V|)

143
Q

What is the Bellman–Ford algorithm?

A

An algorithm that computes shortest paths from a single source vertex to all of the other vertices in a weighted digraph. It is slower than Dijkstra’s algorithm for the same problem, but more versatile, as it is capable of handling graphs in which some of the edge weights are negative numbers.

144
Q

What is a Hamiltonian cycle?

A

Given an undirected graph G = (V, E), does there exist a simple cycle Γ that contains every node in V?

Certificate: a permutation of the n nodes that contains each node in V exactly once, such that there is an edge between each pair of adjacent nodes in the permutation.

145
Q

What is the set cover problem?

A

Given a set U of elements, a collection S1, S2, …, Sm of subsets of U, and an integer k, does there exist a collection of ≤ k of these sets whose union is equal to U?

146
Q

What is the time and space complexity of heapsort?

A

O(n lg n) time

O(1) space

147
Q

What is the time and space complexity of merge sort?

A

O(n lg n) time

O(n) space

148
Q

Write a function that reverses a linked list, with this argument: pointer to pointer to the head node.

A

void reverse(node_t **head) {

node_t *prev = NULL;
node_t *current = *head;
node_t *next = *head;

while (current) {

next = current->next;

current->next = prev;

prev = current;

current = next;

}

*head = prev;

}

149
Q

Delete a given value from a BST rooted at given node. Returns a pointer to node.

A

bst_node* delete_value(bst_node* node, int value) {

if (node == NULL)

return node;

if (value < node->value) {

node->left = delete_value(node->left, value);

} else if (value > node->value) {

node->right = delete_value(node->right, value);

} else { // found value

if (node->left == NULL && node->right == NULL) {

  free(node);

  node = NULL;

} else if (node->left == NULL) {

  bst_node* temp = node;

  node = node->right;

  free(temp);

} else if (node->right == NULL) {

  bst_node* temp = node;

  node = node->left;

  free(temp);

} else {

  // 2 children - get min node of right subtree

  int right_min = get_min(node->right);

  node->value = right_min;

  node->right = delete_value(node->right, right_min);

}

}

return node;

}

150
Q

Get the successor of a value in a BST rooted by given node. Returns int.

A

int get_successor(bst_node* node, int value) {

if (node == NULL)

return -1;

bst_node* target = node;

while (target->value != value) {

if (value < target->value) {

  target = target->left;

} else if (value > target->value) {

  target = target->right;

}

}

// arrived at target node

if (target->right != NULL) {

// get min value of right subtree

return get_min(target->right);

} else {

// get lowest ancestor that is a left child in the path to target value

bst_node* successor = NULL;

bst_node* ancestor = node;

while (ancestor != NULL) {

  if (value < ancestor->value) {

    successor = ancestor;

    ancestor = ancestor->left;

  } else {

    ancestor = ancestor->right;

  }

}

return successor->value;

}

}

151
Q

Using recursion, insert a value into a tree

A

Usage: root = insert(root, value)

bst_node* insert(bst_node* node, const int value) {

if (node == 0) {

bst_node* new_node = malloc(sizeof(bst_node));

if (new_node == NULL) {

  printf(""Unable to allocate memory."");

  exit(0);

}

new_node->value = value;

new_node->left = 0;

new_node->right = 0;

node = new_node;

return node;

}

if (value < node->value) {

node->left = insert(node->left, value);

} else if (value > node->value) {

node->right = insert(node->right, value);

}

return node;

}

152
Q

Write a method is_binary_search_tree that returns true if a given tree is a BST (use helper function)

A

bool is_binary_search_tree(bst_node* node) {

if (node == NULL) return true;

return is_between(node, INT_MIN, INT_MAX);

}

bool is_between(bst_node* node, int min, int max) {

if (node == NULL) return true;

// ensure subtrees are not hiding a value lower or higher than the subtree

// allows

return node->value > min && node->value < max &&

     is_between(node->left, min, node->value) &&

     is_between(node->right, node->value, max);

}

Q

Using an iterative approach, write a function find_node(bst_node* root, int target) that returns the node with the given target value in a BST.

A

bst_node* find_node(bst_node* root, int target) {

while (root != NULL && root->key != target) {

if (root->key > target) {

  root = root->left;

} else {

  root = root->right;

}

}

return root;

}

153
Q

Function that returns the height (in nodes) of a BST: int get_height(bst_node* node)

A

int get_height(bst_node* node) {

if (node == NULL) {

return 0;

}

return 1 + max_num(get_height(node->left), get_height(node->right));

}

154
Q

How many levels in a complete binary tree of size n?

A

floor(1 + log(base2)(n))

155
Q

How can build heap be done in linear time?

A

A tree of size n nodes, will have floor(n/2^h) nodes with height >= h.

The last half of nodes will be leaves, so they already satisfy the heap property. No work needs to be done on them.

Going bottom-up (ignoring the last n/2 items, which are leaves) and satisfying the heap property one level at a time: each level going up the tree has to do at most one more operation than the level below it, but higher levels have fewer nodes, so the extra work happens fewer times.

This resembles a series:

n/2 nodes - height 1: 1 operation

n/4 nodes - height 2: 2 operations

n/8 nodes - height 3: 3 operations

down to floor(n/2^h) nodes - height h: h operations

n * (1/2 + 2/4 + 3/8 + 4/16 + ...) = 2n = O(n)
Q

C or Python: Sort an array of numbers using heap sort.

A

void heap_sort(int* numbers, int count) {

int temp;

heapify(numbers, count);  // build a max-heap first

for (int i = count - 1; i > 0; --i) {

temp = numbers[i];

numbers[i] = numbers[0];

numbers[0] = temp;

percolate_down(numbers, i, 0);

}

}

void heapify(int* numbers, int count) {

for (int i = count / 2 - 1; i >= 0; --i) {

percolate_down(numbers, count, i);

}

}

void percolate_down(int* numbers, int count, int index) {

while (index * 2 + 1 < count) {

int swap_index = index;

int left_child_index = index * 2 + 1;

int right_child_index = index * 2 + 2;

bool has_left_child = left_child_index < count;

bool has_right_child = right_child_index < count;

if (has_left_child && has_right_child) {

  if (numbers[left_child_index] > numbers[right_child_index]) {

    swap_index = left_child_index;

  } else {

    swap_index = right_child_index;

  }

} else if (has_left_child) {

  swap_index = left_child_index;

} else if (has_right_child) {

  swap_index = right_child_index;

} else {

  break;

}

if (numbers[swap_index] > numbers[index]) {

  int temp = numbers[index];

  numbers[index] = numbers[swap_index];

  numbers[swap_index] = temp;

  index = swap_index;

} else {

  break;

}

}

}

156
Q

How are queues usually implemented

A

Using a Circular Array or Singly Linked List.

157
Q

How is a deque usually implemented?

A

Using a Circular Array or Doubly Linked List.

158
Q

Write a binary search function that works iteratively, taking a target int, array of ints, and size of the array, returning the index of the found item, or -1

A

int binary_search(int target, int numbers[], int size) {

int low = 0;

int high = size - 1;

int mid = 0;

while (low <= high) {

mid = (high + low) / 2;

if (target > numbers[mid]) {

  low = mid + 1;

} else if (target < numbers[mid]) {

  high = mid - 1;

} else {

  return mid;

}

}

return -1;

}

159
Q

Write a binary search function that works recursively, returning the index of the found item, or -1

A

int binary_search_recur(int target, int numbers[], int low, int high) {

if (low > high) {

return -1;

}

int mid = (high + low) / 2;

if (target > numbers[mid]) {

return binary_search_recur(target, numbers, mid + 1, high);

} else if (target < numbers[mid]) {

return binary_search_recur(target, numbers, low, mid - 1);

} else {

return mid;

}

}

160
Q

In C or Python, Write a universal hashing function for a string, taking as arguments a string and the capacity of the hashtable

A

int hash(const char* key, const int m) {

int hash = 0;

for (int i = 0; key[i] != '\0'; ++i) {

hash = hash * 31 + key[i];

}

return abs(hash % m);

}

161
Q

What is a Binary Search Tree?

A

A binary tree in which each node has a comparable key and satisfies the restriction that the key in any node is larger than the keys in all nodes in that node’s left subtree and smaller than the keys in all nodes in that node’s right subtree.

162
Q

What is an AVL tree?

A

A BST where, for every node, the heights of its left and right subtrees differ by at most 1.

163
Q

What is a red-black tree?

A

BSTs having red and black links satisfying:

  • Red links lean left
  • No node has two red links connected to it
  • The tree has perfect black balance: every path from the root to a null link has the same number of black links
164
Q

What is a splay tree?

A

A self-adjusting binary search tree where recently accessed elements are moved to the root so they are quick to access again.

165
Q

What is a treap?

A

A random priority is assigned to every key and must maintain two properties:

  • They are in order with respect to their keys, as in a typical binary search tree
  • They are in heap order with respect to their priorities, that is, no key has a key of lower priority as an ancestor

O(log N) expected time for all operations, O(N) worst case

166
Q

What is a van Emde Boas tree?

A

The van Emde Boas tree supports insertions, deletions, lookups, successor queries, and predecessor queries in time O(log log U), where U is the universe of items to store. Items are stored in clusters of size sqrt(U).

The van Emde Boas data structure divides the range {0,…,n−1} into blocks of size sqrt(n), which we call clusters. Each cluster is itself a vEB structure of size sqrt(n). In addition, there is a “summary” structure that keeps track of which clusters are nonempty.

More details:
A van Emde Boas tree (or van Emde Boas priority queue), also known as a vEB tree, is a tree data structure which implements an associative array with m-bit integer keys. It performs all operations in O(log m) time, or equivalently in O(log log M) time, where M = 2^m is the maximum number of elements that can be stored in the tree. M is not to be confused with the actual number of elements stored in the tree, by which the performance of other tree data structures is often measured. The vEB tree has good space efficiency when it contains a large number of elements. It was invented by a team led by Dutch computer scientist Peter van Emde Boas in 1975.

167
Q

What is a compressed trie?

A

It’s a trie where the non-branching paths are compacted into a single edge.

169
Q

What relationship of the keys do you lose with a hash table?

A

The ordering of the keys.

170
Q

Is quicksort stable?

A

No

171
Q

Can quicksort be done in-place?

A

Yes

172
Q

Can merge sort be done in-place?

A

No. It requires O(n) space

173
Q

Is merge sort stable?

A

Yes

174
Q

Is insertion sort stable?

A

Yes

175
Q

Can insertion sort be done in-place?

A

Yes

176
Q

Can selection sort be done in-place?

A

Yes

177
Q

Is selection sort stable?

A

no

178
Q

Is heap sort stable?

A

No

179
Q

Can heap sort be done in-place?

A

yes

180
Q

What is a y-fast trie?

A

A y-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982 to decrease the O(n log M) space used by an x-fast trie.

181
Q

What is an x-fast trie?

A

An x-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n log M) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982, along with the more complicated y-fast trie, as a way to improve the space usage of van Emde Boas trees, while retaining the O(log log M) query time.

182
Q

Write merge sort in C (check answer carefully)

A

void merge(int numbers[], int low, int mid, int high) {

// temp array for holding sorted items

int b[high - low + 1];

int i = low;

int j = mid + 1;

int k = 0;

// merge items from list in order

while (i <= mid && j <= high) {

if (numbers[i] <= numbers[j]) {

  b[k++] = numbers[i++];

} else {

  b[k++] = numbers[j++];

}

}

// copy the remaining items to tmp array

while (i <= mid) b[k++] = numbers[i++];

while (j <= high) b[k++] = numbers[j++];

--k;

while (k >= 0) {

numbers[low + k] = b[k];

--k;

}

}

void merge_sort(int numbers[], int low, int high) {

if (low < high) {

int mid = (low + high) / 2;

merge_sort(numbers, low, mid);

merge_sort(numbers, mid + 1, high);

merge(numbers, low, mid, high);

}

}

183
Q

Write quick sort in C with only one method and random pivot (check answer carefully)

A

void quick_sort(int numbers[], int left, int right) {

if (left >= right)

return;

int i = left;

int j = right;

int temp = 0;

int count = right - left;

int pivot_mod = rand() % count;

int pivot = numbers[left + pivot_mod];

while (i <= j) {

while (numbers[i] < pivot) ++i;

while (numbers[j] > pivot) --j;

if (i <= j) {

  temp = numbers[i];

  numbers[i] = numbers[j];

  numbers[j] = temp;

  ++i;

  --j;

}

}

if (left < j) {

quick_sort(numbers, left, j);

}

if (right > i) {

quick_sort(numbers, i, right);

}

}

184
Q

In what case would perfect hashing be practical?

A

When you don’t need to support inserts or deletes. The data is static.

185
Q

How does perfect hashing handle collisions?

A

It creates a second hash table in any bucket holding multiple items (say k of them), using a second hash function and k^2 space, so the table has two hashing levels. k^2 is chosen because, by a birthday-paradox / Markov-inequality argument, a random second-level function into k^2 slots is collision-free with probability at least 1/2, so a collision-free one can be found quickly.

186
Q

What is the optimal load factor for a hash table?

A

O(sqrt(n))

187
Q

What is the expected load factor for a hash table?

A

n/m, where n = number of items and m = number of buckets. n/m is also called alpha.

188
Q

What is the technical running time for operations on a hash table?

A

O(1 + alpha), where alpha is the load factor (n/m). Table doubling operations are amortized.

189
Q

What is the worst-case search time of perfect hashing?

A

O(1)

190
Q

What is the worst-case space required for perfect hashing?

A

O(n)

191
Q

What’s the best-case running time of binary search?

A

O(1) - we get lucky and find the element right at the midpoint.

192
Q

What’s the worst-case running time of binary search?

A

O(log n)

193
Q

What are the downsides of using an adjacency matrix to represent a graph?

A

Finding all the outgoing edges from a vertex takes O(n) time even if there aren’t very many, and the O(n^2) space cost is high for "sparse graphs," those with much fewer than n^2 edges.

194
Q

When is using an adjacency list expensive?

A

Finding predecessors of a node u is extremely expensive, requiring looking through every list of every node in time O(n + e), where e is the total number of edges, although if this is something we actually need to do often we can store a second copy of the graph with the edges reversed.

195
Q

When are adjacency lists most useful?

A

Adjacency lists are most useful when we mostly want to enumerate outgoing edges of each node. This is common in search tasks, where we want to find a path from one node to another or compute the distances between pairs of nodes. If other operations are important, we can optimize them by augmenting the adjacency list representation; for example, using sorted arrays for the adjacency lists reduces the cost of edge existence testing to O(log d+(u)), and adding a second copy of the graph with reversed edges lets us find all predecessors of u in O(d−(u)) time, where d−(u) is u’s in-degree.

196
Q

What is the space required for a graph using an adjacency list?

A

O(n + e)

197
Q

Given a fully balanced binary tree with x nodes, what is the height of the tree in nodes?

A

log(base2) x + 1

198
Q

Given a fully balanced k-ary tree with x nodes, what is the height of the tree in nodes?

A

log(basek) x + 1

199
Q

A binary tree with height h can contain at most how many nodes?

A

2^(h+1) − 1 nodes

200
Q

For a k-ary tree with height h, the upper bound for the maximum number of leaves is:

A

k^h

201
Q

What is the complexity of Dijkstra’s shortest-path algorithm

A

O(e log v) with a binary-heap priority queue, where e is the number of edges and v the number of vertices.

It must scan each edge, and each edge may trigger a heap update costing O(log v).

202
Q

What is a drawback of using an adjacency matrix for an undirected graph?

A

Half of the entries in the matrix are duplicates.

203
Q

What is the memory needed to store an adjacency list?

A

Theta( |V| + |E| )

204
Q

What is the memory needed to store an adjacency matrix?

A

Theta(|V|^2)

205
Q

How would you implement a queue with a linked list?

A

Use a tail pointer. Push new items at the tail, pop items at the head. Both operations are constant-time.

206
Q

How would you implement a stack with a linked list?

A

Push and pop items at the head. Both operations are constant-time.

207
Q

What preference of nodes vs leaves does preorder traversal give on a tree?

A

Nodes first, leaves later.

208
Q

What preference of nodes vs leaves does postorder traversal give on a tree?

A

Leaves first, internal nodes later.

209
Q

What could you use in DFS to turn a recursive algorithm into an iterative one?

A

A stack

210
Q

What do you use to keep track of nodes to visit in BFS?

A

queue

211
Q

Using a stack to keep track of unvisited nodes gives what kind of traversal?

A

DFS

212
Q

Using a queue to keep track of unvisited nodes gives what kind of traversal?

A

BFS

213
Q

In a highly connected graph of n vertices, how many cycles can there be?

A

(n - 1)! - enumerating is possible (using backtracking), but there will be a lot.

214
Q

What can you use to find if a graph is bipartite?

A

BFS. Using only 2 colors. When you encounter a new vertex, if it has no color, give it the opposite color of its parent vertex. If it is already colored the same, the graph is not bipartite.

215
Q

How can you find a cycle in a graph?

A

DFS. If you discover an edge that connects to an ancestor (previously discovered vertex), you have a cycle.

216
Q

What is an articulation vertex?

A

A vertex of a graph whose deletion disconnects the graph.

217
Q

How can you find an articulation vertex?

A

Brute force: remove each vertex one at a time, do a DFS on what remains, and check whether more than one connected component results; if so, the removed vertex is an articulation vertex. O(n(n + m)).

A faster way, with a little more bookkeeping, runs in O(n + m): do a single DFS, keep track of parents and discovery times, and note when a subtree has a back edge reaching an ancestor; a vertex whose subtree cannot reach above it is an articulation vertex.

218
Q

What does the optimal substructure property tell us about shortest paths?

A

That a subpath of a shortest path is also a shortest path.

219
Q

What is a red-black tree?

A

BSTs having red and black links satisfying:

  • Red links lean left
  • No node has two red links connected to it
  • The tree has perfect black balance: every path from the root to a null link has the same number of black links
220
Q

What is a splay tree?

A

A self-adjusting binary search tree where recently accessed elements are splayed (rotated) to the root so they are quick to access again.

O(log N) amortized time for all operations, O(N) worst case for a single operation.

221
Q

Python: Write a class function to tell if the graph is bipartite. Start with vertex 0. You can access the adjacency list for a vertex v with: self.adjacency_list[v]

A

def is_bipartite(self):
    """
    Returns True if the graph is bipartite (2-colorable).
    Assumes the standard library queue module has been imported.
    :rtype: bool
    """
    colorings = {}

    to_visit = queue.Queue()
    to_visit.put(0)
    colorings[0] = 0

    while not to_visit.empty():
        v = to_visit.get()
        for u in self.adjacency_list[v]:
            if u not in colorings:
                # Give the neighbor the opposite color of the vertex we came from.
                colorings[u] = 1 - colorings[v]
                to_visit.put(u)
            elif colorings[u] == colorings[v]:
                # Two adjacent vertices share a color: not bipartite.
                return False

    return True
222
Q

What should you avoid in your base case in recursion?

A

Too many base case scenarios. Just have one base case so you can return as quickly as possible. Avoid "arm's length" recursion, i.e., checking for the base case just before making the recursive call instead of letting the call itself hit the base case.

223
Q

What is the bandwidth of a graph?

A

The minimum, over all orderings (permutations) of the vertices, of the length of the longest edge, where an edge's length is the distance between its two endpoints' positions in that ordering.

224
Q

What is the complexity of a naive recursive Fibonacci function?

A

Θ(φ^n), where φ (phi) is the golden ratio, (1 + sqrt(5)) / 2 ≈ 1.618.
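
For contrast, a small sketch of the naive recursion next to a memoized version, which runs in O(n):

from functools import lru_cache

def fib_naive(n):
    # Theta(phi^n): recomputes the same subproblems over and over
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    # O(n): each value is computed once and cached
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)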

225
Q

How many subsets are there in n items?

A

2^n
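
A small sketch that enumerates all 2^n subsets with bitmasks (names illustrative):

def all_subsets(items):
    n = len(items)
    subsets = []
    for mask in range(1 << n):  # 2^n bitmasks
        subsets.append([items[i] for i in range(n) if mask & (1 << i)])
    return subsets

# len(all_subsets(['a', 'b', 'c'])) == 8, i.e. 2^3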

226
Q

What are contiguously-allocated structures? Give examples.

A

Contiguously-allocated structures are composed of single slabs of memory, and include arrays, matrices, heaps, and hash tables.

227
Q

What are linked data structures? Give examples.

A

Linked data structures are composed of distinct chunks of memory bound together by pointers, and include lists, trees, and graph adjacency lists.

228
Q

What are some benefits of arrays?

A

  • Constant-time access given the index
  • Space efficiency
  • Memory locality
229
Q

What are some advantages to linked lists over arrays?

A
  • Overflow on linked structures can never occur unless the memory is actually full.
  • Insertions and deletions are simpler than for contiguous (array) lists.
  • With large records, moving pointers is easier and faster than moving the items themselves.
230
Q

What are some advantages to arrays over linked lists?

A
  • Linked structures require extra space for storing pointer fields.
  • Linked lists do not allow efficient random access to items.
  • Arrays allow better memory locality and cache performance than random pointer jumping.
231
Q

Given two strings str1 and str2, find the minimum number of edits (edit one character to another, delete char from str1 or delete char from str2) to change str1 to str2.

A

def min_edit_distance(str1, str2):
    """DP solution. Runtime: O(len(str1) * len(str2))."""
    rows = len(str2) + 1
    cols = len(str1) + 1
    # T[i][j] = minimum edits to turn str1[:j] into str2[:i]
    T = [[0 for _ in range(cols)] for _ in range(rows)]

    # Base cases: converting to or from the empty string takes j or i edits.
    for j in range(cols):
        T[0][j] = j
    for i in range(rows):
        T[i][0] = i

    for i in range(1, rows):
        for j in range(1, cols):
            if str2[i - 1] == str1[j - 1]:
                T[i][j] = T[i - 1][j - 1]  # characters match: no edit needed
            else:
                # 1 + the best of substituting, or deleting a character
                # from one string or the other
                T[i][j] = 1 + min(T[i - 1][j - 1], T[i - 1][j], T[i][j - 1])

    # print_edits(T, str1, str2)  # helper (not shown here) that prints the edit sequence
    return T[rows - 1][cols - 1]

if __name__ == '__main__':
    str1 = "azced"
    str2 = "abcdef"
    expected = 3
    assert expected == min_edit_distance(str1, str2)
    assert expected == min_edit_distance(str2, str1)
232
Q

How could you implement an LRU cache?

A

A fast lookup table (hash table or binary tree) keyed by item, plus a linked list of items ordered by recency of use. When you access or add an item, delete it from its current place in the linked list and add it to the head of the list. To prune, traverse the linked list from the tail, remove the trailing elements, and delete them from the lookup structure (tree or hash table).

You can also use a splay tree, since it moves accessed items to the root. To prune, find and remove the leaves, since the number of leaves will be about n/2.
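
A minimal sketch using collections.OrderedDict, which stands in for the hash table plus the recency-ordered list (capacity handling and names are illustrative):

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # most recently used at the end

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used
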
Q

What is a direct-mapped cache?

A

It's a type of cache used in the CPU, where the low-order bits of a memory address, taken modulo the number of cache lines, determine which line is used for placement or lookup. Collisions are treated as overwrites.

Q

What is a fully-associative cache?

A

It's a type of cache used in the CPU, where lookups are done on all cache lines in parallel to determine a hit or miss.

This requires a very large number of comparators, which increases the complexity and cost of implementing large caches. Therefore, this type of cache is usually only used for small caches, typically less than 4K.

233
Q

What is Huffman encoding?

A

The Huffman coding algorithm analyzes how often each symbol occurs and builds a binary tree in which common symbols sit close to the root (so they get short codes) and rare symbols sit deeper in the tree (so they get longer codes). Traversing the tree from the root to a symbol's leaf, recording a 0 or 1 at each branch, gives that symbol's encoding.
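
A sketch that builds the code table by repeatedly merging the two least frequent subtrees (represented here simply as symbol-to-code dicts); the tie-breaking counter and names are illustrative details:

import heapq
from collections import Counter

def huffman_codes(text):
    freq = Counter(text)
    # Each heap entry: (frequency, tiebreaker, {symbol: code_so_far})
    heap = [(f, i, {sym: ''}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {sym: '0' for sym in heap[0][2]}
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Prepend 0 for the left subtree, 1 for the right subtree.
        merged = {s: '0' + c for s, c in left.items()}
        merged.update({s: '1' + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

# Example: huffman_codes('aaaabbc') gives 'a' a 1-bit code and 'b', 'c' 2-bit codes.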

234
Q

What are Dense Codes?

A

A way to put symbols or words into a dictionary or array, and use the indices as the values in the text to save space so that words are not repeated.

235
Q

Codeless question: Give an algorithm for finding an ordered word pair (e.g., “New York”) occurring with the greatest frequency in a given webpage. Which data structures would you use?

A

You could use a hash table, creating or updating an entry for each pair, while keeping track of max_frequency and most_frequent_phrase. Each time you increment a pair's count, if the new count is greater than max_frequency, update max_frequency and most_frequent_phrase.
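
A sketch of that counting loop, assuming the page has already been tokenized into a list of lowercase words (names illustrative):

from collections import defaultdict

def most_frequent_pair(words):
    counts = defaultdict(int)
    max_frequency = 0
    most_frequent_phrase = None
    for first, second in zip(words, words[1:]):
        pair = (first, second)
        counts[pair] += 1
        if counts[pair] > max_frequency:
            max_frequency = counts[pair]
            most_frequent_phrase = pair
    return most_frequent_phrase, max_frequency

# most_frequent_pair(['new', 'york', 'is', 'new', 'york']) -> (('new', 'york'), 2)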

236
Q

Codeless question: Given a set of n numbers, how do you find the pair of numbers that have the smallest difference between them?

A

Sort them: Once the numbers are sorted, the closest pair of numbers must lie next to each other somewhere in sorted order. Thus, a linear-time scan through them completes the job, for a total of O(n log n) time including the sorting.
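
A sketch, assuming at least two numbers:

def closest_pair(numbers):
    # O(n log n) sort, then a linear scan of adjacent pairs
    s = sorted(numbers)
    return min(zip(s, s[1:]), key=lambda pair: pair[1] - pair[0])

# closest_pair([10, 3, 7, 22, 8]) -> (7, 8)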

237
Q

Codeless question: Are there any duplicates in a given set of n items?

A

This is a special case of the closest-pair problem, where we ask if there is a pair separated by a gap of zero. The most efficient algorithm sorts the numbers and then does a linear scan through them, checking all adjacent pairs.

238
Q

Codeless question: Given a set of n items, which element occurs the largest number of times in the set? Bonus: How do you find out how many times some element k appears?

A

If the items are sorted, we can sweep from left to right and count them, since all identical items will be lumped together during sorting.

To find out how often an arbitrary element k occurs, look up k using binary search in a sorted array of keys. Then use binary search in each direction to find where that run of the number begins and ends.
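
A sketch of the counting step using the standard bisect module on a sorted list:

from bisect import bisect_left, bisect_right

def count_occurrences(sorted_items, k):
    # Both binary searches are O(log n); their difference is the length of the run of k.
    return bisect_right(sorted_items, k) - bisect_left(sorted_items, k)

# count_occurrences([1, 2, 2, 2, 5, 7], 2) -> 3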

239
Q

Codeless question: Give an efficient algorithm to determine whether two sets (of size m and n, respectively) are disjoint.

A

The small set can be sorted in O(m log m) time. We can now do a binary search with each of the n elements in the big set, looking to see if it exists in the small one. The total time will be O((n + m) log m)
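
A sketch that sorts the smaller set and binary-searches each element of the larger one (names illustrative):

from bisect import bisect_left

def are_disjoint(small, big):
    # O(m log m) sort plus O(n log m) searches, m = len(small), n = len(big)
    s = sorted(small)
    for x in big:
        i = bisect_left(s, x)
        if i < len(s) and s[i] == x:
            return False  # x appears in both sets
    return True

# are_disjoint([2, 9], [1, 3, 5]) -> True; are_disjoint([2, 9], [9]) -> False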

240
Q

What kinds of problems is dynamic programming best suited for?

A
  • optimizing left to right sequences (strings, tree nodes as array, permutations)
  • search all possibilities while storing results to avoid recomputing
241
Q

What is the Floyd-Warshall algorithm?

A

The Floyd–Warshall algorithm is a dynamic programming algorithm for finding shortest paths in a weighted graph with positive or negative edge weights (but with no negative cycles).
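
A sketch over an adjacency matrix dist, where dist[i][j] starts as the edge weight from i to j (float('inf') if there is no edge) and dist[i][i] = 0:

def floyd_warshall(dist):
    # dist: n x n matrix, updated in place to hold all-pairs shortest path lengths
    n = len(dist)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist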