coding university - Data structure and algorithms Flashcards

1
Q

What is ZeroMQ?

A

A socket-based messaging system that can be used as a queue, for pub/sub, and more.

Carries messages across inproc, IPC, TCP, TIPC, multicast.

Smart patterns like pub-sub, push-pull (pipeline), and router-dealer.

2
Q

What is ActiveMQ?

A

Apache ActiveMQ is an open source message broker written in Java.

3
Q

What is MessagePack?

A

MessagePack is an efficient binary serialization format. It lets you exchange data among multiple languages, like JSON, but it's faster and smaller. Small integers are encoded into a single byte, and typical short strings require only one extra byte in addition to the strings themselves.

No IDL.

4
Q

What is Avro?

A

Apache Avro is a data serialization system. IDL-based.

Rich data structures.

A compact, fast, binary data format.

A container file, to store persistent data.

Remote procedure call (RPC).

Code generation is not required to read or write data files, nor to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages.

5
Q

What is a Bloom filter?

A

A Bloom filter is a data structure used to quickly test membership in a set when the number and size of possible elements would be too large to keep in memory.

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not, thus a Bloom filter has a 100% recall rate. In other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed (though this can be addressed with a "counting" filter). The more elements that are added to the set, the larger the probability of false positives.
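The behavior above can be sketched in a few lines of Python. This is an illustrative toy, not a production filter: the bit-array size m and hash count k are arbitrary, and the two base hashes are carved out of a single SHA-256 digest (an assumption made for self-containment; any pair of independent hash functions works).

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k derived hashes over an m-entry bit array."""
    def __init__(self, m: int, k: int):
        self.m = m                 # number of bits
        self.k = k                 # number of hash functions
        self.bits = bytearray(m)   # one byte per bit, for simplicity

    def _indexes(self, item: str):
        # Derive two base hashes from one digest, then combine them
        # (double hashing): index_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str):
        for idx in self._indexes(item):
            self.bits[idx] = 1

    def might_contain(self, item: str) -> bool:
        # All k bits set -> "possibly in set"; any bit clear -> "definitely not".
        return all(self.bits[idx] for idx in self._indexes(item))
```

Because added items always set their bits, a Bloom filter can never report a false negative; false positives become more likely as the filter fills.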

6
Q

How can you easily generate multiple hashes for the same element?

A

Double hashing. This method gives you as many hashes as you need:

hash_i(x, m) = (hash_a(x) + i * hash_b(x)) mod m

In Python:

import mmh3

mmh3.hash64('foo')  # returns two 64-bit signed ints in a tuple

Now you have two 64-bit hashes; substituting successive values of i into the formula above gives as many derived hashes as a Bloom filter needs.
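A dependency-free sketch of the same idea, using hashlib instead of mmh3 purely so the snippet runs without third-party packages (the choice of SHA-256 and the byte split are assumptions of this sketch):

```python
import hashlib

def k_hashes(x: str, k: int, m: int):
    """Derive k bucket indexes from two base hashes via double hashing:
    hash_i(x) = (h_a(x) + i * h_b(x)) mod m."""
    digest = hashlib.sha256(x.encode()).digest()
    ha = int.from_bytes(digest[:8], "big")
    hb = int.from_bytes(digest[8:16], "big") | 1  # odd stride (helps when m is a power of two)
    return [(ha + i * hb) % m for i in range(k)]
```

Each call is deterministic, so the same element always maps to the same k indexes.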

7
Q

What is a cache-oblivious algorithm?

A

A cache-oblivious algorithm does not mean that the algorithm does not take advantage of the cache; to the contrary, it does so quite effectively. What it means is that the algorithm does not need to know the cache line size; it works effectively for all cache line sizes simultaneously, removing the need to tune or optimize for a given machine.

Optimal cache-oblivious algorithms are known for the Cooley–Tukey FFT algorithm, matrix multiplication, sorting, matrix transposition, and several other problems.

8
Q

How can you augment a splay tree so you can find how many items are between x and y?

A

Store size of subtrees at each node.

Find x, splay to root. Each splay, insert, and delete must maintain size in node.

Find y, and along the way add up the sizes in the left subtrees, and 1 for each visited left-hand node.

Splay y to root to ensure balance.

9
Q

In a maximum flow problem, what is the minimum cut?

A

By the max-flow min-cut theorem, the capacity of the minimum cut equals the maximum flow through the graph: the min cut is the smallest total capacity of edges that, if removed, would disconnect the source from the sink.

10
Q

What is the Ford-Fulkerson algorithm?

A

The Ford–Fulkerson method or Ford–Fulkerson algorithm (FFA) is a greedy algorithm that computes the maximum flow in a flow network. It is called a "method" instead of an "algorithm" because the approach to finding augmenting paths in a residual graph is not fully specified; it is specified in several implementations with different running times. The name "Ford–Fulkerson" is often also used for the Edmonds–Karp algorithm, which is a specialization of Ford–Fulkerson.

11
Q

What is the running time for the disjoint set data structure?

A

Due to union by rank (merging the tree of smaller rank into the larger during union) and path compression (during find), the amortized time per operation is only O(alpha(n)), where alpha(n) is the inverse of the extremely fast-growing Ackermann function A. Since alpha(n) is the inverse of this function, alpha(n) is less than 5 for all remotely practical values of n. Thus, the amortized running time per operation is effectively a small constant.

The worst-case for find() is Theta(log u) where u is the number of unions, and no finds have been done to allow for path compression yet.
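A minimal union-find sketch matching the bounds above, using union by rank and path compression (here the path-halving variant of compression):

```python
class DisjointSet:
    """Union-find with union by rank and path halving."""
    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            # Path halving: point every other node at its grandparent.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra              # attach lower-rank root under higher
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
```

With both optimizations, a sequence of m operations on n elements runs in O(m alpha(n)) time.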

12
Q

What Python flag turns on optimizations and removes assertions from code?

A

python -O

13
Q

Why is doing work in a constructor a bad thing?

A

It can make your code harder to test.

14
Q

What should be avoided to ensure testing is easier/possible?

A
  • static methods and properties
  • final keyword
  • use of new in methods (use dependency injection)
15
Q

What are some guidelines to keep in mind to not violate the dependency inversion principle?

A
  • No variable should have a concrete class type. An abstract type is better.
  • No class should derive from a concrete class.
  • No method should override an implemented method of any of its base classes.

These are guidelines and may not be feasible all the time.

16
Q

What is separate chaining?

A

In hash table conflict resolution, each bucket is independent and has some sort of linked list of entries with the same index. The time for hash table operations is the time to find the bucket (which is constant) plus the time for the list operation.

In a good hash table, each bucket has zero or one entries, and sometimes two or three, but rarely more than that. Therefore, structures that are efficient in time and space for these cases are preferred. Structures that are efficient for a fairly large number of entries per bucket are not needed or desirable. If these cases happen often, the hashing function needs to be fixed.
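A toy separate-chaining table in Python. The bucket "chains" here are plain Python lists standing in for linked lists, and the built-in hash() is assumed as the hash function:

```python
class ChainedHashTable:
    """Separate chaining sketch: each bucket holds a list of (key, value)."""
    def __init__(self, num_buckets: int = 16):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Constant-time bucket lookup; list operations cost extra.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default
```

With a good hash function and load factor, each chain stays short, so get/put are expected O(1).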

17
Q

What is open addressing?

A

In hash table conflict resolution, all entry records are stored in the bucket array itself. When a new entry has to be inserted, the buckets are examined, starting with the hashed-to slot and proceeding in some probe sequence, until an unoccupied slot is found. When searching for an entry, the buckets are scanned in the same sequence, until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table. The name "open addressing" refers to the fact that the location ("address") of the item is not determined by its hash value. (This method is also called closed hashing; it should not be confused with "open hashing" or "closed addressing", which usually mean separate chaining.)

18
Q

What is the length of the longest chain in a hash table using separate chaining?

A

The expected cost of a search is O(1 + alpha), where alpha is the load factor, n/m. (The expected length of the longest chain, under simple uniform hashing, is actually Theta(log n / log log n).)

19
Q

Since uniform hashing is difficult to achieve in practice, what is a great alternative?

A

double hashing

20
Q

How can you test if a number is odd in bitwise operations?

A

return (x & 1)

21
Q

How can you test if a number is even in bitwise operations?

A

return (x & 1) == 0

22
Q

What is another name for a breadth-first search traversal?

A

Level-order traversal.

23
Q

What is a 2-3-4 tree?

A

A 2–3–4 tree (also called a 2–4 tree) is a self-balancing data structure that is commonly used to implement dictionaries. The numbers mean a tree where every node with children (internal node) has either two, three, or four child nodes:

  • 2-node has one data element, and if internal has two child nodes;
  • 3-node has two data elements, and if internal has three child nodes;
  • 4-node has three data elements, and if internal has four child nodes.
24
Q

What is the complexity of all operations on a splay tree?

A

O(log n) on average.

A single operation can take Theta(n) in the worst case.

25
What is the maximum height of a red-black tree?
2 log(n + 1): a red-black tree with n internal nodes has height at most 2 log2(n + 1).
26
In a b-tree, how many children are there per node?
With minimum degree t: every non-root node has between t-1 and 2t-1 keys, and the root has between 1 and 2t-1 keys. A node with k keys has k+1 children, so internal non-root nodes have between t and 2t children. t could be up to 100, or more. All leaves are at the same level.
27
What does the max degree of a b-tree depend on?
The number of items being stored, and page size based on disk characteristics.
28
A b-tree's data is organized to correspond with what?
Pages on disk.
29
Give an example of how a b-tree might be organized.
1024 children per node. Store root in memory. 3 nodes accessed gets us 1024^3 disk pages. 4 nodes accessed gets us 1024^4 disk pages.
30
On descending a b-tree, what's the rule?
Never step into a minimal node.
31
On insertion in a b-tree, what's the rule?
Never step into a full node.
32
How many nodes of k leaves are in a compressed trie (big-O)?
O(k) nodes with k leaves due to compression.
33
What is a suffix tree?
A suffix tree is a compressed trie containing all the suffixes of the given text as keys and their positions in the text as values. Suffix trees allow particularly fast implementations of many important string operations. The construction of such a tree for the string S takes time and space linear in the length of S. Once constructed, several operations can be performed quickly, for instance locating a substring in S, locating a substring if a certain number of mistakes are allowed, locating matches for a regular expression pattern, etc. Suffix trees also provide one of the first linear-time solutions for the longest common substring problem. These speedups come at a cost: storing a string's suffix tree typically requires significantly more space than storing the string itself.
34
In brief, how does selection sort work?
On each pass, find the minimum item in the unsorted portion (everything past the previous minimum) and swap it into the leftmost position of that unsorted portion.
35
When can insertion sort run in n log n time?
Load the items into a binary search tree, then do an inorder traversal. (If the tree is balanced, the total is O(n log n).)
36
How can you speed up selection sort with a heap?
Replace the unsorted portion with a min-heap. Gives O(log n) removal. Makes n log n overall.
37
What data structure is well suited for a heap sort and which is bad?
Array: good. Linked list: clumsy.
38
What data structure is well suited for a merge sort and which is just okay?
Linked list: a natural fit. Array: just okay, since it does not allow an in-place merge.
39
How can you optimize finding a pivot when the segment to pivot is large (not random choice)?
Choose a median of three.
40
What is counting sort?
Counting sort is an algorithm for sorting a collection of objects according to keys that are small integers; that is, it is an integer sorting algorithm. It operates by counting the number of objects that have each distinct key value, and using arithmetic on those counts to determine the positions of each key value in the output sequence. Its running time is linear in the number of items and the difference between the maximum and minimum key values, so it is only suitable for direct use in situations where the variation in keys is not significantly greater than the number of items. However, it is often used as a subroutine in another sorting algorithm, radix sort, that can handle larger keys more efficiently.
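The counting step above can be sketched in a few lines. This minimal version assumes non-negative integer keys; for records with attached data, the counts would be turned into prefix sums to compute stable output positions:

```python
def counting_sort(keys):
    """Counting sort sketch for small non-negative integer keys."""
    if not keys:
        return []
    k = max(keys)
    counts = [0] * (k + 1)
    for key in keys:
        counts[key] += 1          # tally each distinct key value
    out = []
    for value, count in enumerate(counts):
        out.extend([value] * count)  # emit each value 'count' times, in order
    return out
```

The running time is O(n + k), so it is linear whenever the key range k is O(n).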
41
What is radix sort?
Radix sort is a non-comparative integer sorting algorithm that sorts data with integer keys by grouping keys by the individual digits which share the same significant position and value. Two classifications of radix sorts are least significant digit (LSD) radix sorts and most significant digit (MSD) radix sorts. LSD radix sorts process the integer representations starting from the least digit and move towards the most significant digit. MSD radix sorts work the other way around.
42
What is the counting sort running time?
O(q + n), where q is the range of possible key values. If q is in O(n), then the sort runs in linear time.
43
What radix is most natural to use?
A power of 2 radix.
44
How would radix sort work for IEEE floating point numbers?
Flip all bits for negative numbers, do sort, then flip back.
45
How to choose q for radix sort?
Choose q within a power of 2 of n; this ensures the number of passes is small. The best rule is n rounded down to the nearest power of 2. To save memory, round sqrt(n) down to the nearest power of 2, at the cost of twice as many passes.
46
What operations are a treap optimized for?
Union, intersection, and difference.
47
What is the Day–Stout–Warren (DSW) algorithm?
The Day–Stout–Warren (DSW) algorithm is a method for efficiently balancing binary search trees — that is, decreasing their height to O(log n), where n is the total number of nodes. Unlike a self-balancing binary search tree, it does not do this incrementally during each operation, but periodically, so that its cost can be amortized over many operations.
48
What is the insertion sort algorithm?
for (i = 0; i < n; ++i) {
    j = i;
    while (j > 0 && a[j - 1] > a[j]) {
        swap(a, j, j - 1);
        j -= 1;
    }
}
49
Is radix sort stable?
yes
50
What is the algorithmic time complexity of radix sort?
O(d(n + q)), where d is the number of digits and q is the radix; each of the d passes is a counting sort.
51
Give the code for selection sort.
for (i = 0; i < n; ++i) {
    min_index = i;
    for (j = i; j < n; ++j) {
        if (a[j] < a[min_index]) {
            min_index = j;
        }
    }
    swap(a, i, min_index);
}
52
All comparison-based sorting is bounded by what complexity?
Omega(n log n)
53
What do you call a linear ordering of a directed graph of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering?
Topological sort
54
What is a good method for performing a topological sort?
1. Calculate the in-degree of each node: O(v + e). 2. Add all nodes with in-degree 0 to a queue. 3. For each node dequeued, output it, then look at each of its neighbors and decrement that neighbor's in-degree; when a neighbor's in-degree reaches 0, add it to the queue. Repeat until the queue is empty.
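This in-degree method is Kahn's algorithm. A sketch in Python, assuming the graph is an adjacency dict {node: [neighbors]}:

```python
from collections import deque

def topological_sort(graph):
    """Kahn's algorithm: repeatedly emit nodes whose in-degree has hit 0."""
    indegree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] = indegree.get(v, 0) + 1

    queue = deque(u for u, d in indegree.items() if d == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    if len(order) != len(indegree):
        raise ValueError("graph has a cycle")   # some in-degrees never hit 0
    return order
```

The whole procedure runs in O(v + e), and the cycle check falls out for free: leftover nodes with positive in-degree mean the graph was not a DAG.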
55
How many possible trees are there that span all nodes in a graph?
Exponentially many; for the complete graph on n vertices, Cayley's formula gives n^(n-2) spanning trees.
56
What is Prim's algorithm?
def prim(self):
    """Returns a dictionary of parents of vertices in a minimum spanning tree
    :rtype: dict
    """
    s = set()
    q = queue.PriorityQueue()
    parents = {}
    start_weight = float("inf")

    weights = {}  # since we can't peek into the queue

    for i in self.get_vertex():
        weight = start_weight
        if i == 0:
            q.put([0, i])
        weights[i] = weight
        parents[i] = None

    while not q.empty():
        v_tuple = q.get()
        vertex = v_tuple[1]
        s.add(vertex)
        for u in self.get_neighbor(vertex):
            if u.vertex not in s:
                if u.weight < weights[u.vertex]:
                    parents[u.vertex] = vertex
                    weights[u.vertex] = u.weight
                    q.put([u.weight, u.vertex])

    return parents
57
What is the time complexity of Prim's algorithm on an adjacency matrix?
O(v^2)
58
What is the time complexity of Prim's algorithm on an adjacency list and a binary heap?
O(e log v) derived from: O((e + v) log v)
59
What is the time complexity of Prim's algorithm on an adjacency list and a Fibonacci heap?
O(e + v log v)
60
What is the pseudocode Kruskal's algorithm?
KRUSKAL(G):
  A = ∅
  foreach v ∈ G.V:
    MAKE-SET(v)
  foreach (u, v) in G.E ordered by weight(u, v), increasing:
    if FIND-SET(u) ≠ FIND-SET(v):
      A = A ∪ {(u, v)}
      UNION(u, v)
  return A
61
What is the time complexity of Kruskal's algorithm?
O(E log V) or O(e log e + e α(v) + v)
62
What is Kruskal's algorithm?
Kruskal's algorithm is a minimum-spanning-tree algorithm which finds an edge of the least possible weight that connects any two trees in the forest. It is a greedy algorithm in graph theory as it finds a minimum spanning tree for a connected weighted graph adding increasing cost arcs at each step. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. If the graph is not connected, then it finds a minimum spanning forest (a minimum spanning tree for each connected component).
63
How can you find the number of connected components?
For each node: if node not yet visited, increment component count and do DFS.
64
How can you get a topological sort with DFS?
Do a DFS, and when each node is being marked as complete, add node to a list. Reverse the list.
65
How can you check for a cycle with DFS?
For each neighbor of the current node: if it has not been visited, recurse (DFS); if it has been visited and is not the current node's parent, there is a cycle.
66
How can you get the strongly connected components of a graph?
1. DFS - calculate the finish times for each node 2. Reverse the edges in the graph 3. Call DFS on nodes in reverse graph in reverse order of finishing times.
67
How do you reverse the edges in a directed graph represented as an adjacency matrix?
Transpose the matrix, so [i, j] becomes [j, i]
68
How can you find the shortest path on a DAG?
1. Topological sort. 2. Follow the topological order, relaxing each vertex's outgoing edges.
69
How to find the longest path on a weighted DAG?
1. Set all edges to their negative weight. 2. Topological sort. 3. Follow the topological order, relaxing edges; the resulting shortest path is the longest path under the original weights.
70
What is the diameter of a graph?
The length of the longest shortest path; that is, the greatest distance between any pair of vertices. To find the diameter of a graph, first find the shortest path between each pair of vertices. The greatest length of any of these paths is the diameter of the graph.
71
Under what condition can you not use Dijkstra's algorithm?

When the graph contains a negative-weight edge. Dijkstra's greedy choice assumes that once a vertex is settled its distance is final, which fails when negative edges exist; a negative cycle would be traversed indefinitely.
72
In plain words, how does Kruskal's algorithm work?
1. Create a union-find structure T and a list for the result. 2. Make a list of all edges in G. 3. Sort the edges by weight, from least to greatest. 4. Iterate the edges in sorted order. 5. For each edge (u, v), if u and v are not already connected in T, union them in T and add the edge to the result list.
73
What can most dynamic programming problems be expressed as?
Finding the shortest path in a DAG. Formulating it this way ensures you can solve it in linear or linearithmic time.
74
What metric can you use to measure the badness of a line in a text justification problem?
badness = (page width - text width)^3. Minimize the sum of the badness of the lines.
75
How can you tell if a graph is 2-colorable?
If it's bipartite. All trees are bipartite.
76
What is it called when you have too many base cases in your recursion?
arm's length recursion
77
What is the base case of a recursion?
The code required to give the solution to the smallest subproblem.
78
What is the formula for n choose k?
n! / k!(n - k)!
79
What is the general outline of a backtracking algorithm?
def solve(conf):
    if (no more choices):
        return conf
    choices = get_available_choices
    for choice in choices:
        c = pick one
        if solve(conf using c):
            return true
        unmake choice c
    return false
80
At how many items should you expect a collision when hashing among n buckets?
At about sqrt(n) items the probability of a collision reaches 1/2 (the birthday paradox).
81
What is n/n^2?
1/n
82
What does it mean when a problem is NP-Hard?
It is at least as hard as any problem in NP. A problem X is NP-Hard if every problem Y in NP reduces to X.
83
Is "3-D matching" NP-Complete?
Yes
84
Is "triple coloring a graph" NP-Complete?
Yes
85
Is "two coloring a graph" NP-Complete?
No
86
Is "subset sum" NP-Complete?
Yes
87
Is "bin packing" NP-Complete?
Yes
88
Is "vertex cover" NP-Complete?
Yes
89
Is "set cover" NP-Complete?
Yes
90
Name some NP-Complete problems.
TSP (traveling salesman), knapsack, satisfiability, 3D matching, tricoloring, subset sum, rectangle packing, bin packing, vertex cover, set cover.
91
What is one way of doing approximate traveling salesman?
Select a vertex as root. Build an MST. Do a preorder traversal of the MST, storing the nodes in H. Return H (a Hamiltonian cycle).
92
How can an LRU cache be implemented with a linked list?
When an item is accessed, it moves to the head of the list. The trailing items can be overwritten with new items, or removed.
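A sketch using Python's OrderedDict, whose internal doubly linked list plays the role described above: move_to_end() is "move to the head" and popitem(last=False) evicts from the tail.

```python
from collections import OrderedDict

class LRUCache:
    """LRU cache sketch backed by OrderedDict's linked list."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used
```

Both get and put are O(1): the hash map finds the node, and the linked list reorders it in constant time.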
93
What is a skip list?
A data structure that allows fast search within an ordered sequence of elements. Fast search is made possible by maintaining a linked hierarchy of subsequences, with each successive subsequence skipping over fewer elements than the previous one. Searching starts in the sparsest subsequence until two consecutive elements have been found, one smaller and one larger than or equal to the element searched for. A skip list is built in layers. The bottom layer is an ordinary ordered linked list. Each higher layer acts as an "express lane" for the lists below, where an element in layer i appears in layer i+1 with some fixed probability p (two commonly used values for p are 1/2 or 1/4).
94
What operations does a skip list support and what is their avg and worst case times?
search: O(log n) average, O(n) worst. insert: O(log n) average, O(n) worst. delete: O(log n) average, O(n) worst.
95
What operations does a van Emde Boas tree support and what are the time complexities?
All operations (search, insert, delete, predecessor, successor) take O(log log M), where M = 2^m is the size of the universe of keys that can be stored; equivalently O(log m), where m is the number of bits per key. Space: O(M).
96
What are the complexities for treap operations?
For the basic maintenance operations (search, insert, delete), treaps are O(log n) average case and O(n) worst case. For union, intersection, and difference, O(m log(n/m)) for treaps of sizes m and n, with m ≤ n.
97
What are Catalan numbers?
Catalan numbers form a sequence of natural numbers that occur in various counting problems, often involving recursively defined objects. They can be thought of as counting balanced strings of parentheses. Do not confuse Catalan numbers with Catalan pseudoprimes; only 3 Catalan pseudoprimes are known.
98
How are the priorities of a treap assigned?
Randomly generated upon insertion. That randomness is used to keep the tree balanced.
99
Is a geometric Steiner tree NP-Complete?
Yes
100
What are the 2 algorithms for convex hull?
Graham scan, and Jarvis march (the gift-wrapping method).
101
How does a Graham scan work in finding convex hull?
O(n log n): sort the points, then make a single pass over them, taking only left turns while walking the perimeter counter-clockwise. When the next point would make a right turn, backtrack (popping points off a stack) until that turn becomes a left turn.
102
How does the Jarvis march work in finding convex hull?
Starting with the leftmost point p: go through each remaining point and, using p as a pivot, find the most clockwise point: O(n). Make that point the new p: O(1). Loop again with the new p. This continues until the starting point is reached, after O(h) iterations, where h is the number of hull points.
103
What is the worst case time complexity of a Jarvis march?
O(n^2). Occurs when most points are part of the hull and few points are contained inside it.
104
What is the average complexity of a Jarvis march?
O(n * h) where h is the number of points that compose the hull.
105
unordered singly linked list - time complexity
delete: O(n), find: O(n)
106
ordered singly linked list - time complexity
delete: O(n)
107
Binary Tree - time complexity
find: O(h), add: O(h), delete: O(h), where h is the height of the tree
108
stack - time complexity
Add element to the top of the stack - push O(1) Remove the top element of the stack - pop O(1) Return the value of the top element of the stack without removing it. O(1)
109
Queue - time complexity
Add an element to a queue. O(1) Remove an element from the front of the queue. dequeue O(1) Return the element from the front of the queue without removing it. - front O(1)
110
Balanced binary search tree - time complexity
find: O(log N), add: O(log N), delete: O(log N)
111
What is a skip list?
A data structure for storing a sorted list of items using a hierarchy of linked lists that connect increasingly sparse subsequences of the items. O(log N) expected time for all operations, O(N) worst case.
112
What is a treap?
tree + heap. A random priority is assigned to every key, and the tree must maintain two properties: the nodes are in order with respect to their keys, as in a typical binary search tree, and they are in heap order with respect to their priorities; that is, no node has an ancestor with a lower priority. O(log N) expected time for all operations, O(N) worst case.
113
What is a max-heap?
A complete binary tree in which each node's key is greater than or equal to its children's keys, so the maximum is at the root. It is the usual implementation of a priority queue, in which elements with higher priorities are served before elements with lower priorities.
114
binary heap - time complexity
min: O(1), insert: O(log n), remove-min: O(log n)
115
What is a binary heap?
A collection of keys arranged in a complete heap-ordered binary tree, represented in level order in an array (not using the first entry). The parent of the node in position k is in position [k/2] and the two children of the node in position k are in position 2k and 2k+1.
116
What is an Adaptable Priority Queue?
A priority queue that allows you to change the priority of objects already in the queue.
117
What is the time complexity of quicksort?
O(N^2) worst case; O(N log N) best and expected. Lower bound for comparison-based sorting: no comparison-based sorting algorithm can be faster than O(N log N).
118
What is a connected graph?
There exists a path from every vertex to every other vertex in the graph.
119
What is a tree?
An acyclic connected graph.
120
What is a cycle?
Path with at least one edge whose first and last vertices are the same.
121
What is a spanning tree?
A subgraph that contains all of the graph's vertices and is a single tree. (A minimum spanning tree, also known as a min-cost spanning tree, is a spanning tree of minimum total edge weight.)
122
What is stable sorting?
Items with the same key appear in the output in the same relative order as in the input.
123
What is another name for a trie?
Prefix tree. (A radix tree usually refers to a compressed, space-optimized trie.)
124
What is internal sorting?
An internal sort is any data sorting process that takes place entirely within the main memory of a computer. This is possible whenever the data to be sorted is small enough to all be held in main memory. For sorting larger datasets, it may be necessary to hold only a chunk of data in memory at a time, since it won't all fit. The rest of the data is normally held on some larger but slower medium, like a hard disk. Any reading or writing of data to and from this slower medium can slow the sorting process considerably.
125
What is external sorting?
External sorting is a term for a class of sorting algorithms that can handle massive amounts of data. External sorting is required when the data being sorted do not fit into the main memory of a computing device (usually RAM) and instead they must reside in the slower external memory (usually a hard drive). External sorting typically uses a hybrid sort-merge strategy. In the sorting phase, chunks of data small enough to fit in main memory are read, sorted, and written out to a temporary file. In the merge phase, the sorted subfiles are combined into a single larger file. Mergesort is typically preferred.
126
What are 2 advantages of merge sort?
Suitable for linked lists, and suitable for external sorting.
127
What is a disadvantage of merge sort?
It needs an extra buffer to hold the merged data.
128
What are 3 advantages of heap sort?
No recursion needed; suitable for large data sets; locality of data.
129
What is a disadvantage of heap sort?
Usually slower than merge sort and quick sort.
130
What is a articulation vertex?
A vertex whose removal disconnects the graph; the graph's weakest point.
131
What is the chromatic number?
The smallest number of colors needed for a vertex coloring of a graph. (The analogous quantity for edge coloring is the chromatic index.)
132
What is the degree of a vertex?
The number of edges incident to the vertex, with loops counted twice.
133
What is quick select?
A selection algorithm to find the kth smallest element in an unordered list. Quickselect uses the same overall approach as quicksort, choosing one element as a pivot and partitioning the data in two based on the pivot, as less than or greater than the pivot. However, instead of recursing into both sides, as in quicksort, quickselect recurses into only one side: the side containing the element it is searching for. This reduces the average complexity from O(n log n) to O(n).
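A short quickselect sketch with a random pivot. This version partitions into new lists for clarity rather than partitioning in place; k is 0-based and assumed to satisfy 0 <= k < len(items):

```python
import random

def quickselect(items, k):
    """Return the k-th smallest element (k is 0-based). Average O(n)."""
    items = list(items)
    while True:
        if len(items) == 1:
            return items[0]
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        pivots = [x for x in items if x == pivot]
        if k < len(lows):
            items = lows                       # answer is left of the pivot
        elif k < len(lows) + len(pivots):
            return pivot                       # answer is the pivot itself
        else:
            k -= len(lows) + len(pivots)       # answer is right of the pivot
            items = [x for x in items if x > pivot]
```

Each round discards one side, so the expected work is n + n/2 + n/4 + ... = O(n); a pathological pivot sequence still degrades to O(n^2).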
134
What is an inverted index?
An index data structure storing a mapping from content, such as words or numbers, to its locations in a database file, or in a document or a set of documents (named in contrast to a Forward Index, which maps from documents to content). The purpose of an inverted index is to allow fast full text searches, at a cost of increased processing when a document is added to the database.
135
What is set partition?
A partitioning of elements of some universal set into a collection of disjoint subsets. Thus, each element must be in exactly one subset.
136
What is a maximum spanning tree?
A spanning tree of a weighted graph having maximum weight. It can be computed by negating the edge weights and running either Prim's or Kruskal's algorithm.
137
What is a minimum product spanning tree and when would you use it?
The cost of a tree is the product of all the edge weights in the tree, instead of the sum of the weights. Since log(a*b) = log(a) + log(b), the minimum spanning tree on a graph whose edge weights are replaced with their logarithms gives the minimum product spanning tree on the original graph. You would use it to minimize the product.
138
What is a rolling hash?
A rolling hash (also known as a rolling checksum) is a hash function where the input is hashed in a window that moves through the input. One of the main applications is the Rabin-Karp string search algorithm, which uses the rolling hash.
139
What is the Euclidean GCD algorithm in Python?
def gcd(a, b):
    while a:
        b, a = a, b % a
    return b
140
What is the Rabin-Karp algorithm?
Compute the hash code of each substring whose length is the length of s, using a rolling hash: a hash function whose value for the next window can be updated in constant time from the previous window's hash (for example, a polynomial function of the individual characters). Slide the window across the text, compare hash codes, and verify actual matches when the hashes agree.
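A sketch using a polynomial rolling hash modulo a fixed prime (the base 256 and modulus here are arbitrary choices of this sketch). Candidate positions are verified with a direct comparison to rule out hash collisions:

```python
def rabin_karp(text, pattern, base=256, mod=1_000_003):
    """Return the index of the first occurrence of pattern, or -1."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return -1
    high = pow(base, m - 1, mod)              # weight of the outgoing character
    p_hash = w_hash = 0
    for i in range(m):                        # hash the pattern and first window
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod
    for i in range(n - m + 1):
        # On a hash match, verify the substring to rule out collisions.
        if w_hash == p_hash and text[i:i + m] == pattern:
            return i
        if i < n - m:                         # roll: drop text[i], add text[i+m]
            w_hash = ((w_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return -1
```

Each window update is O(1), so the expected running time is O(n + m).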
141
What is the time complexity of breadth-first search?
O(m + n), where n is the number of vertices and m is the number of edges.
142
What is the time and space complexity of Bellman-Ford?
Time: O(|V| |E|), which is Theta(n^3) on dense graphs. Space: O(|V|).
143
What is the Bellman–Ford algorithm?
An algorithm that computes shortest paths from a single source vertex to all of the other vertices in a weighted digraph. It is slower than Dijkstra's algorithm for the same problem, but more versatile, as it is capable of handling graphs in which some of the edge weights are negative numbers.
144
What is a Hamiltonian cycle?
Given an undirected graph G = (V, E), does there exist a simple cycle Γ that contains every node in V? A certificate is a permutation of the n nodes that contains each node in V exactly once, such that there is an edge between each pair of adjacent nodes in the permutation.
145
What is the set cover problem?
Given a set U of elements, a collection S1, S2, ..., Sm of subsets of U, and an integer k, does there exist a collection of ≤ k of these sets whose union equals U?
146
What is the time and space complexity of heapsort?
O(n lg n) time O(1) space
147
What is the time and space complexity of merge sort?
O(n lg n) time O(n) space
148
Write a function that reverses a linked list, with this argument: pointer to pointer to the head node.
void reverse(node_t** head) {
    node_t* prev = NULL;
    node_t* current = *head;
    node_t* next = NULL;
    while (current) {
        next = current->next;
        current->next = prev;
        prev = current;
        current = next;
    }
    *head = prev;
}
149
Delete a given value from a BST rooted at given node. Returns a pointer to node.
bst_node* delete_value(bst_node* node, int value) {
    if (node == NULL) return node;
    if (value < node->value) {
        node->left = delete_value(node->left, value);
    } else if (value > node->value) {
        node->right = delete_value(node->right, value);
    } else {  // found value
        if (node->left == NULL && node->right == NULL) {
            free(node);
            node = NULL;
        } else if (node->left == NULL) {
            bst_node* temp = node;
            node = node->right;
            free(temp);
        } else if (node->right == NULL) {
            bst_node* temp = node;
            node = node->left;
            free(temp);
        } else {  // 2 children - replace with the min value of the right subtree
            int right_min = get_min(node->right);
            node->value = right_min;
            node->right = delete_value(node->right, right_min);
        }
    }
    return node;
}
150
Get the successor of a value in a BST rooted by given node. Returns int.
int get_successor(bst_node* node, int value) {
    if (node == NULL) return -1;
    bst_node* target = node;
    while (target != NULL && target->value != value) {
        if (value < target->value) {
            target = target->left;
        } else {
            target = target->right;
        }
    }
    if (target == NULL) return -1;  // value not in tree
    if (target->right != NULL) {
        // successor is the min value of the right subtree
        return get_min(target->right);
    }
    // otherwise: the lowest ancestor whose left subtree contains the target
    bst_node* successor = NULL;
    bst_node* ancestor = node;
    while (ancestor != NULL) {
        if (value < ancestor->value) {
            successor = ancestor;
            ancestor = ancestor->left;
        } else {
            ancestor = ancestor->right;
        }
    }
    return successor != NULL ? successor->value : -1;  // -1: no successor
}
151
Using recursion, insert a value into a tree
root = insert(root, value)

bst_node* insert(bst_node* node, const int value) {
    if (node == NULL) {
        bst_node* new_node = malloc(sizeof(bst_node));
        if (new_node == NULL) {
            printf("Unable to allocate memory.");
            exit(1);
        }
        new_node->value = value;
        new_node->left = NULL;
        new_node->right = NULL;
        return new_node;
    }
    if (value < node->value) {
        node->left = insert(node->left, value);
    } else if (value > node->value) {
        node->right = insert(node->right, value);
    }
    return node;
}
152
Write a method is_binary_search_tree that returns true if a given tree is a BST (use helper function)
bool is_binary_search_tree(bst_node* node) {
    return is_between(node, INT_MIN, INT_MAX);
}

// every value in the subtree must lie strictly between min and max;
// this catches values hidden deeper in a subtree
bool is_between(bst_node* node, int min, int max) {
    if (node == NULL) return true;
    return node->value > min && node->value < max
        && is_between(node->left, min, node->value)
        && is_between(node->right, node->value, max);
}

Using an iterative approach, write a function find_node(bst_node* root, int target) that returns the node with the given target value in a BST.

bst_node* find_node(bst_node* root, int target) {
    while (root != NULL && root->key != target) {
        if (root->key > target) {
            root = root->left;
        } else {
            root = root->right;
        }
    }
    return root;
}
153
Function that returns the height (in nodes) of a BST: int get_height(bst_node* node)
int get_height(bst_node* node) {
    if (node == NULL) return 0;
    return 1 + max_num(get_height(node->left), get_height(node->right));
}
154
How many levels in a complete binary tree of size n?
floor(1 + log(base2)(n))
155
How can build heap be done in linear time?
A tree of n nodes has at most ceil(n/2^h) nodes with height >= h. The last half of the nodes are leaves, so they already satisfy the heap property; no work needs to be done on them. Going bottom-up (ignoring that last n/2) and restoring the heap property one level at a time, a node at height h does at most h sift-down operations, but each level up the tree has half as many nodes:

n/2 nodes - height 1: 1 operation each
n/4 nodes - height 2: 2 operations each
n/8 nodes - height 3: 3 operations each
...

Total work: n * (1/2 + 2/4 + 3/8 + 4/16 + ...) = n * 2 = O(n)

C or Python: Sort an array of numbers using heap sort.

void heap_sort(int* numbers, int count) {
    heapify(numbers, count);  // build a max-heap first
    for (int i = count - 1; i > 0; --i) {
        // move the current max to the end, then restore the heap property
        int temp = numbers[i];
        numbers[i] = numbers[0];
        numbers[0] = temp;
        percolate_down(numbers, i, 0);
    }
}

void heapify(int* numbers, int count) {
    for (int i = count / 2 - 1; i >= 0; --i) {
        percolate_down(numbers, count, i);
    }
}

void percolate_down(int* numbers, int count, int index) {
    while (index * 2 + 1 < count) {
        int left_child_index = index * 2 + 1;
        int right_child_index = index * 2 + 2;
        // pick the larger child (the left child always exists here)
        int swap_index = left_child_index;
        if (right_child_index < count && numbers[right_child_index] > numbers[left_child_index]) {
            swap_index = right_child_index;
        }
        if (numbers[swap_index] > numbers[index]) {
            int temp = numbers[index];
            numbers[index] = numbers[swap_index];
            numbers[swap_index] = temp;
            index = swap_index;
        } else {
            break;
        }
    }
}
156
How are queues usually implemented
Using a Circular Array or Singly Linked List.
157
How is a deque usually implemented?
Using a Circular Array or Doubly Linked List.
158
Write a binary search function that works iteratively, taking a target int, array of ints, and size of the array, returning the index of the found item, or -1
int binary_search(int target, int numbers[], int size) {
    int low = 0;
    int high = size - 1;
    int mid = 0;
    while (low <= high) {
        mid = low + (high - low) / 2;  // avoids overflow of low + high
        if (target > numbers[mid]) {
            low = mid + 1;
        } else if (target < numbers[mid]) {
            high = mid - 1;
        } else {
            return mid;
        }
    }
    return -1;
}
159
Write a binary search function that works recursively, returning the index of the found item, or -1
int binary_search_recur(int target, int numbers[], int low, int high) {
    if (low > high) return -1;
    int mid = low + (high - low) / 2;  // avoids overflow of low + high
    if (target > numbers[mid]) {
        return binary_search_recur(target, numbers, mid + 1, high);
    } else if (target < numbers[mid]) {
        return binary_search_recur(target, numbers, low, mid - 1);
    }
    return mid;
}
160
In C or Python, Write a universal hashing function for a string, taking as arguments a string and the capacity of the hashtable
int hash(const char* key, const int m) {
    int hash = 0;
    for (int i = 0; key[i] != '\0'; ++i) {
        hash = hash * 31 + key[i];
    }
    return abs(hash % m);
}

Note: this is a polynomial (multiply-by-31) string hash; a truly universal family would draw the multipliers at random.
161
What is a Binary Search Tree?
A binary search tree is a binary tree where each node has a comparable key and satisfies the restriction that the key in any node is larger than the keys in all nodes in that node's left subtree and smaller than the keys in all nodes in that node's right subtree.
162
What is an AVL tree?
A BST where, for every node, the heights of its left and right subtrees differ by at most 1.
163
What is a red-black tree?
BSTs having red and black links satisfying: - Red links lean left - No node has two red links connected to it - The tree has perfect black balance: every path from the root to a null link has the same number of black links
164
What is a splay tree?
A self-adjusting binary search tree where recently accessed elements are moved to the root so they are quick to access again.
165
What is a treap?
A random priority is assigned to every key, and the tree must maintain two properties: - The keys are in order with respect to each other, as in a typical binary search tree - The priorities are in heap order: no node has an ancestor of lower priority. All operations take O(log N) expected time, O(N) worst case.
166
What is a van Emde Boas tree?
The van Emde Boas tree supports insertions, deletions, lookups, successor queries, and predecessor queries in time O(log log U), where U is the size of the universe of keys. The structure divides the universe {0, ..., U−1} into clusters of size sqrt(U); each cluster is itself a vEB structure of size sqrt(U), and a "summary" structure keeps track of which clusters are nonempty.

More details: a van Emde Boas tree (or van Emde Boas priority queue), also known as a vEB tree, is a tree data structure which implements an associative array with m-bit integer keys. It performs all operations in O(log m) time, or equivalently in O(log log M) time, where M = 2^m is the maximum number of elements that can be stored in the tree. M is not to be confused with the actual number of elements stored in the tree, by which the performance of other tree data structures is often measured. The vEB tree has good space efficiency when it contains a large number of elements. It was invented by a team led by Dutch computer scientist Peter van Emde Boas in 1975.
167
What is a compressed trie?
It's a trie where the non-branching paths are compacted into a single edge.
169
What relationship of the keys do you lose with a hash table?
The ordering of the keys.
170
Is quicksort stable?
No
171
Can quicksort be done in-place?
Yes
172
Can merge sort be done in-place?
No. It requires O(n) space
173
Is merge sort stable?
Yes
174
Is insertion sort stable?
Yes
175
Can insertion sort be done in-place?
Yes
176
Can selection sort be done in-place?
Yes
177
Is selection sort stable?
No
178
Is heap sort stable?
No
179
Can heap sort be done in-place?
Yes
180
What is a y-fast trie?
A y-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982 to decrease the O(n log M) space used by an x-fast trie.
181
What is an x-fast trie?
An x-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n log M) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982, along with the more complicated y-fast trie, as a way to improve the space usage of van Emde Boas trees, while retaining the O(log log M) query time.
182
Write merge sort in C (check answer carefully)
void merge(int numbers[], int low, int mid, int high) {
    // temp array for holding sorted items
    int b[high - low + 1];
    int i = low;
    int j = mid + 1;
    int k = 0;
    // merge items from both halves in order
    while (i <= mid && j <= high) {
        if (numbers[i] <= numbers[j]) {
            b[k++] = numbers[i++];
        } else {
            b[k++] = numbers[j++];
        }
    }
    // copy the remaining items to the temp array
    while (i <= mid) b[k++] = numbers[i++];
    while (j <= high) b[k++] = numbers[j++];
    // copy the merged result back
    --k;
    while (k >= 0) {
        numbers[low + k] = b[k];
        --k;
    }
}

void merge_sort(int numbers[], int low, int high) {
    if (low < high) {
        int mid = low + (high - low) / 2;
        merge_sort(numbers, low, mid);
        merge_sort(numbers, mid + 1, high);
        merge(numbers, low, mid, high);
    }
}
183
Write quick sort in C with only one method and random pivot (check answer carefully)
void quick_sort(int numbers[], int left, int right) {
    if (left >= right) return;
    int i = left;
    int j = right;
    int temp = 0;
    // choose a random pivot value from the subarray
    int count = right - left + 1;
    int pivot = numbers[left + rand() % count];
    while (i <= j) {
        while (numbers[i] < pivot) ++i;
        while (numbers[j] > pivot) --j;
        if (i <= j) {
            temp = numbers[i];
            numbers[i] = numbers[j];
            numbers[j] = temp;
            ++i;
            --j;
        }
    }
    if (left < j) quick_sort(numbers, left, j);
    if (i < right) quick_sort(numbers, i, right);
}
184
In what case would perfect hashing be practical?
When you don't need to support inserts or deletes. The data is static.
185
How does perfect hashing handle collisions?
It uses two levels of hashing. Any bucket holding k colliding items gets a second hash table of size k^2 with its own hash function drawn from a universal family. With k^2 slots, Markov's inequality (a birthday-paradox style calculation) bounds the probability of any collision within the bucket below 1/2, so we can re-pick the second-level hash function until the bucket is collision-free.
186
What is the optimal load factor for a hash table?
O(sqrt(n))
187
What is the expected load factor for a hash table?
n/m, where n = number of items and m = number of buckets. n/m is also called alpha.
188
What is the technical running time for operations on a hash table?
O(1 + alpha), where alpha is the load factor (n/m). Table doubling operations are amortized.
189
What is the worst-case search time of perfect hashing?
O(1)
190
What is the worst-case space required for perfect hashing?
O(n)
191
What's the best-case running time of binary search?
O(1) - we get lucky and find the element right at the midpoint.
192
What's the worst-case running time of binary search?
O(log n)
193
What are the downsides of using an adjacency matrix to represent a graph?
Finding all the outgoing edges from a vertex takes O(n) time even if there aren't very many, and the O(n^2) space cost is high for "sparse graphs," those with far fewer than n^2 edges.
194
When is using an adjacency list expensive?
Finding predecessors of a node u is extremely expensive, requiring looking through every list of every node in time O(n + e), where e is the total number of edges, although if this is something we actually need to do often we can store a second copy of the graph with the edges reversed.
195
When are adjacency lists most useful?
Adjacency lists are most useful when we mostly want to enumerate outgoing edges of each node. This is common in search tasks, where we want to find a path from one node to another or compute the distances between pairs of nodes. If other operations are important, we can optimize them by augmenting the adjacency list representation; for example, using sorted arrays for the adjacency lists reduces the cost of edge existence testing to O(log d+(u)), and adding a second copy of the graph with reversed edges lets us find all predecessors of u in O(d-(u)) time, where d-(u) is u's in-degree.
196
What is the space required for a graph using an adjacency list?
O(n + e)
197
Given a fully balanced binary tree with x nodes, what is the height of the tree in nodes?
floor(log(base2) x) + 1
198
Given a fully balanced k-ary tree with x nodes, what is the height of the tree in nodes?
floor(log(basek) x) + 1
199
A binary tree with height h can contain at most how many nodes?
2^(h+1) − 1 nodes
200
For a k-ary tree with height h, the upper bound for the maximum number of leaves is:
k^h
201
What is the complexity of Dijkstra's shortest-path algorithm
O(e log v), where e is the number of edges and v the number of vertices: the algorithm scans each edge, and each lookup or update on the heap costs O(log v).
202
What is a drawback of using an adjacency matrix for an undirected graph?
Half of the entries in the matrix are duplicates.
203
What is the memory needed to store an adjacency list?
Theta( |V| + |E| )
204
What is the memory needed to store an adjacency matrix?
Theta(|V|^2)
205
How would you implement a queue with a linked list?
Use a tail pointer. Push new items at the tail, pop items at the head. Both operations are constant-time.
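A minimal sketch of the tail-pointer approach (class and method names are illustrative):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class Queue:
    """Singly linked list with a tail pointer: O(1) enqueue and dequeue."""
    def __init__(self):
        self.head = None
        self.tail = None

    def enqueue(self, value):
        node = Node(value)
        if self.tail:              # append at the tail
            self.tail.next = node
        else:                      # empty queue: node is both head and tail
            self.head = node
        self.tail = node

    def dequeue(self):
        if self.head is None:
            raise IndexError("dequeue from empty queue")
        value = self.head.value
        self.head = self.head.next
        if self.head is None:      # queue became empty
            self.tail = None
        return value
```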
206
How would you implement a stack with a linked list?
Push and pop items at the head. Both operations are constant-time.
207
What preference of nodes vs leaves does preorder traversal give on a tree?
Nodes first, leaves later.
208
What preference of nodes vs leaves does postorder traversal give on a tree?
Leaves first, internal nodes later.
209
What could you use in DFS to turn a recursive algorithm into an iterative one?
A stack
210
What do you use to keep track of nodes to visit in BFS?
A queue
211
Using a stack to keep track of unvisited nodes gives what kind of traversal?
DFS
212
Using a queue to keep track of unvisited nodes gives what kind of traversal?
BFS
213
In a highly connected graph of n vertices, how many cycles can there be?
(n - 1)! - enumerating is possible (using backtracking), but there will be a lot.
214
What can you use to find out if a graph is bipartite?
BFS, using only 2 colors. When you encounter a new vertex, give it the opposite color of its parent vertex. If you find an edge whose endpoints already have the same color, the graph is not bipartite.
215
How can you find a cycle in a graph?
DFS. If you discover an edge that connects to an ancestor (previously discovered vertex), you have a cycle.
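A sketch for an undirected graph given as an adjacency dict; a visited neighbor other than the DFS parent signals a back edge:

```python
def has_cycle(adj):
    """Detect a cycle in an undirected graph.
    adj: dict mapping each vertex to a list of its neighbors."""
    visited = set()

    def dfs(v, parent):
        visited.add(v)
        for u in adj[v]:
            if u not in visited:
                if dfs(u, v):
                    return True
            elif u != parent:   # back edge to a previously discovered vertex
                return True
        return False

    # cover every connected component
    return any(v not in visited and dfs(v, None) for v in adj)
```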
216
What is an articulation vertex?
A vertex of a graph whose deletion disconnects the graph.
217
How can you find an articulation vertex?
DFS multiple times. Brute force: remove each vertex one at a time and run a DFS after each removal to see whether you end up with more than one connected component; if so, the removed vertex is an articulation vertex. This takes O(n(n+m)). With a little more bookkeeping it can be done in a single DFS in O(n+m) time: keep track of parents and, via back edges (edges that connect to an ancestor), the earliest ancestor reachable from each subtree.
218
What does the optimal substructure property tell us about shortest paths?
That a subpath of a shortest path is also a shortest path.
221
Python: Write a class function to tell if the graph is bipartite. Start with vertex 0. You can access the adjacency list for a vertex v with: self.adjacency_list[v]
import queue

def is_bipartite(self):
    """Returns true if graph is bipartite
    :rtype: bool
    """
    colorings = {}
    to_visit = queue.Queue()
    to_visit.put(0)
    colorings[0] = 0
    while not to_visit.empty():
        v = to_visit.get()
        for u in self.adjacency_list[v]:
            if u not in colorings:
                colorings[u] = 1 - colorings[v]
                to_visit.put(u)
            elif colorings[u] == colorings[v]:
                return False
    return True
222
What should you avoid in your base case in recursion
Too many base case scenarios. Have just one base case so you can return as quickly as possible, and avoid "arm's length" recursion (checking for the base case before making the recursive call instead of inside it).
223
What is the bandwidth of a graph?
For a given ordering (permutation) of the vertices, the length of the longest edge is the maximum distance between the endpoints of any edge in that ordering. The bandwidth of the graph is the minimum of that value over all permutations.
224
What is the complexity for a naive recursive Fibonacci function
Θ(φ^n), where phi(φ) is the golden ratio (1 + sqrt(5)) / 2. approx: 1.618
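For contrast, a sketch of the naive version next to a memoized one (memoization is the standard fix, shown here only to highlight the exponential blowup of the naive form):

```python
from functools import lru_cache

def fib_naive(n):
    """Exponential: every call branches into two more, Θ(φ^n) calls in total."""
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    """Memoized: each value is computed once, O(n) time."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)
```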
225
How many subsets are there in n items?
2^n
226
What are contiguously-allocated structures? Give examples.
Contiguously-allocated structures are composed of single slabs of memory, and include arrays, matrices, heaps, and hash tables.
227
What are linked data structures and give examples.
Linked data structures are composed of distinct chunks of memory bound together by pointers, and include lists, trees, and graph adjacency lists.
228
What are some benefits of arrays?
Constant-time access given the index - Space efficiency - Memory locality
229
What are some advantages to linked lists over arrays?
- Overflow on linked structures can never occur unless the memory is actually full. - Insertions and deletions are simpler than for contiguous (array) lists. - With large records, moving pointers is easier and faster than moving the items themselves.
230
What are some advantages to arrays over linked lists?
- Linked structures require extra space for storing pointer fields. - Linked lists do not allow efficient random access to items. - Arrays allow better memory locality and cache performance than random pointer jumping.
231
Given two strings str1 and str2, find the minimum number of edits (edit one character to another, delete char from str1 or delete char from str2) to change str1 to str2.
""""""" * DP Runtime : O(len(str1) * len(str2)) """""" def min_edit_distance(str1, str2): rows = len(str2) + 1 cols = len(str1) + 1 T = [[0 for _ in range(cols)] for _ in range(rows)] for j in range(cols): T[0][j] = j for i in range(rows): T[i][0] = i for i in range(1, rows): for j in range(1, cols): if str2[i - 1] == str1[j - 1]: T[i][j] = T[i - 1][j - 1] else: T[i][j] = 1 + min(T[i - 1][j - 1], T[i - 1][j], T[i][j - 1]) print_edits(T, str1, str2) return T[rows - 1][cols - 1] if __name__ == '__main__': str1 = ""azced"" str2 = ""abcdef"" expected = 3 assert expected == min_edit_distance(str1, str2) assert expected == min_edit_distance(str2, str1)
232
How could you implement an LRU cache?
A fast lookup table (such as a hash table or binary tree) plus a linked list of items ordered by use. When you access or add an item, delete it from the linked list and add it to the head of the list. To prune, traverse the linked list from the tail, removing trailing elements and deleting them from the lookup structure. You can also use a splay tree, since it moves accessed items to the root; to prune, find and remove the leaves, since the number of leaves will be about n/2.

What is a direct-mapped cache?

It's a type of cache used in the CPU, where the low-order bits of a given memory address, taken modulo the number of cache lines, determine where to place or look up the line in the cache. Collisions are treated as overwrites.

What is a fully-associative cache?

It's a type of cache used in the CPU, where lookups are done on all cache lines in parallel to determine a hit or miss. This requires a very large number of comparators, which increases the complexity and cost of implementing large caches. Therefore, this type of cache is usually only used for small caches, typically less than 4K.
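One possible sketch of the lookup-table-plus-use-ordered-list LRU design, using Python's OrderedDict, which combines a hash table with a doubly linked list internally:

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used cache: O(1) get and put."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)        # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used
```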
233
What is Huffman encoding?
Huffman encoding algorithm analyzes the occurrence of individual symbols and creates a binary tree where the common symbols are closest to the root, using fewer bits to encode, and less common/rare symbols have longer paths on the tree, with longer encodings to accommodate. By traversing the tree, from root to leaf, and keeping track of 1 or 0 at each node, we can determine the encoding of the symbol.
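A compact sketch that builds the codes with a min-heap of (frequency, [symbol, code]) entries; representing the tree as flat lists is an implementation shortcut, not the only way:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build Huffman codes for the symbols of text. Returns {symbol: bitstring}."""
    heap = [[freq, [sym, ""]] for sym, freq in Counter(text).items()]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate single-symbol case
        return {heap[0][1][0]: "0"}
    while len(heap) > 1:
        lo = heapq.heappop(heap)             # two least frequent subtrees
        hi = heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]          # left branch adds a 0
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]          # right branch adds a 1
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: code for sym, code in heap[0][1:]}
```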
234
What are Dense Codes?
A way to put symbols or words into a dictionary or array, and use the indices as the values in the text to save space so that words are not repeated.
235
Codeless question: Give an algorithm for finding an ordered word pair (e.g., “New York”) occurring with the greatest frequency in a given webpage. Which data structures would you use?
You could use a hash table, creating or updating an entry for each adjacent word pair, while keeping track of max_frequency and most_frequent_phrase. Increment the count for each pair seen; whenever the new count exceeds max_frequency, update max_frequency and most_frequent_phrase.
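A condensed sketch; a Counter plays the role of the hash table, and most_common replaces the manual max tracking described above:

```python
from collections import Counter

def most_frequent_pair(text):
    """Most frequent ordered word pair in text. Returns ((w1, w2), count)."""
    words = text.split()
    counts = Counter(zip(words, words[1:]))   # hash table of adjacent pairs
    return counts.most_common(1)[0]
```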
236
Codeless question: Given a set of n numbers, how do you find the pair of numbers that have the smallest difference between them?
Sort them: Once the numbers are sorted, the closest pair of numbers must lie next to each other somewhere in sorted order. Thus, a linear-time scan through them completes the job, for a total of O(n log n) time including the sorting.
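A sketch of the sort-then-scan approach:

```python
def closest_pair(numbers):
    """Pair with the smallest difference: sort, then scan adjacent elements."""
    s = sorted(numbers)
    # the closest pair must be adjacent in sorted order
    best = min(range(len(s) - 1), key=lambda i: s[i + 1] - s[i])
    return s[best], s[best + 1]
```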
237
Codeless question: Are there any duplicates in a given set of n items?
This is a special case of the closest-pair problem, where we ask if there is a pair separated by a gap of zero. The most efficient algorithm sorts the numbers and then does a linear scan through them, checking all adjacent pairs.
238
Codeless question: Given a set of n items, which element occurs the largest number of times in the set? Bonus: How do you find out how many times some element k appears?
If the items are sorted, we can sweep from left to right and count them, since all identical items will be lumped together during sorting. To find out how often an arbitrary element k occurs, look up k using binary search in a sorted array of keys. Then use binary search in each direction to find where that run of the number begins and ends.
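A sketch of counting how often k appears via two binary searches, using Python's bisect as the binary search:

```python
from bisect import bisect_left, bisect_right

def count_occurrences(sorted_items, k):
    """Count how many times k appears in a sorted list.
    Two binary searches locate where the run of k begins and ends."""
    return bisect_right(sorted_items, k) - bisect_left(sorted_items, k)
```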
239
Codeless question: Give an efficient algorithm to determine whether two sets (of size m and n, respectively) are disjoint.
The small set can be sorted in O(m log m) time. We can now do a binary search with each of the n elements in the big set, looking to see if it exists in the small one. The total time will be O((n + m) log m)
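A sketch: sort the smaller set, then binary-search each element of the larger one:

```python
from bisect import bisect_left

def are_disjoint(small, big):
    """True if the two collections share no element.
    O((n + m) log m), where m = len(small) and n = len(big)."""
    s = sorted(small)
    for x in big:
        i = bisect_left(s, x)           # binary search in the small set
        if i < len(s) and s[i] == x:
            return False
    return True
```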
240
What kinds of problems is dynamic programming best suited for?
- optimizing left to right sequences (strings, tree nodes as array, permutations) - search all possibilities while storing results to avoid recomputing
241
What is the Floyd-Warshall algorithm
The Floyd–Warshall algorithm is a dynamic programming algorithm for finding shortest paths in a weighted graph with positive or negative edge weights (but with no negative cycles).
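A sketch operating on an adjacency matrix with inf marking missing edges:

```python
def floyd_warshall(dist):
    """All-pairs shortest paths. dist is an n x n matrix with dist[i][i] == 0
    and float('inf') where there is no edge; it is updated in place and returned."""
    n = len(dist)
    for k in range(n):              # allow k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```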