General Data Structure Flashcards
Two-Stack Queue
“Dirty Dishes” Analogy
Maintain an “In” stack and an “Out” stack.
● To enqueue an element, push it onto the
“In” stack.
● To dequeue an element:
● If the “Out” stack is nonempty, pop from it and return that element.
● If the “Out” stack is empty, pop every element from the “In” stack, pushing each onto the “Out” stack, until the “In” stack is empty; then pop from the “Out” stack.
● Intuition: We only do expensive dequeues after a long run of cheap enqueues.
● Think “dishwasher:” we very slowly introduce a lot of dirty dishes to get cleaned up all at once.
● Provided we clean up all the dirty dishes at once, and provided that dirty dishes accumulate slowly, this is a fast strategy!
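The strategy above can be sketched in Java; the class and method names here are illustrative, not from any particular library:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal sketch of the two-stack ("dirty dishes") queue described above.
public class TwoStackQueue {
    private final Deque<Integer> in = new ArrayDeque<>();   // dirty dishes pile up here
    private final Deque<Integer> out = new ArrayDeque<>();  // cleaned up, ready to serve

    // Enqueue: always cheap, just push onto the In stack.
    public void enqueue(int x) {
        in.push(x);
    }

    // Dequeue: pop from Out; if Out is empty, first move everything over.
    // Moving reverses the order, so the oldest element ends up on top of Out.
    public int dequeue() {
        if (out.isEmpty()) {
            while (!in.isEmpty()) {
                out.push(in.pop());
            }
        }
        return out.pop();
    }

    public static void main(String[] args) {
        TwoStackQueue q = new TwoStackQueue();
        q.enqueue(1);
        q.enqueue(2);
        q.enqueue(3);
        System.out.println(q.dequeue()); // 1 (FIFO order preserved)
        q.enqueue(4);
        System.out.println(q.dequeue()); // 2
        System.out.println(q.dequeue()); // 3
        System.out.println(q.dequeue()); // 4
    }
}
```

Each element is pushed and popped at most twice in total, so a run of n operations costs O(n): amortized O(1) per operation.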
When to use bloom filters
Bloom filters are a great first layer of filtering, since they don’t require much space and can fit in fast storage, like RAM. Consider using them anywhere where knowing if something is definitely not present or possibly present would be helpful.
One common use is to eliminate unnecessary accesses to slower storage / expensive lookups.
For instance, say we wanted to query a large database stored on a rotating hard drive (slow to read from). And suppose the thing we’re querying for has a good chance of not being present at all. Before querying the disk, we could check for the record in a bloom filter; if the bloom filter says the record definitely isn’t present, then we don’t need to touch the slow disk at all.
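A minimal bloom-filter sketch (illustrative only, not a production design): two cheap hash functions, both derived from the key's hashCode, index into a shared bit array. add() sets both bits; mightContain() answers "possibly present" only when both bits are set, so it never gives a false negative.

```java
import java.util.BitSet;

public class TinyBloomFilter {
    private final BitSet bits;
    private final int size;

    public TinyBloomFilter(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    // Two simple (correlated) hash functions; fine for a sketch.
    private int hash1(String key) {
        return Math.floorMod(key.hashCode(), size);
    }

    private int hash2(String key) {
        return Math.floorMod(key.hashCode() * 31 + 17, size);
    }

    public void add(String key) {
        bits.set(hash1(key));
        bits.set(hash2(key));
    }

    // false => definitely not present; true => possibly present
    public boolean mightContain(String key) {
        return bits.get(hash1(key)) && bits.get(hash2(key));
    }
}
```

In the disk example above, you would add every record's key to the filter on insert, and only touch the disk when mightContain() returns true.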
Hash Map Operations
insert()
delete()
find()
All operations take O(1) expected time (assuming a good hash function)
A hash map behaves like a “really big array” whose indices can be arbitrary keys (strings, sequences of integers, and so on), not just integers. A HASH function converts each key (string, object, etc.) into an integer index.
But actually allocating an array large enough to index every possible key directly would be wasteful and unreasonable; the hash function maps keys into a table of reasonable size.
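The three operations map directly onto java.util.HashMap, which hashes string keys to bucket indices internally. The names and values below are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapDemo {
    // Build a small map exercising insert() and delete().
    static Map<String, Integer> buildAges() {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("ada", 36);        // insert()
        ages.put("grace", 85);
        ages.remove("grace");       // delete()
        return ages;
    }

    public static void main(String[] args) {
        Map<String, Integer> ages = buildAges();
        System.out.println(ages.get("ada"));        // find() -> 36
        System.out.println(ages.containsKey("x"));  // false
    }
}
```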
Abstract Data Types (ADTs) are:
List, Queue, Stack, Set (or Multiset), Graph, Tree, Container
INTERFACES
- specifies only interfaces (method names and specs)
- they do not deal with IMPLEMENTATION (bodies, method internals, fields, …)
Concrete Data Structures are:
LinkedList Array Binary Tree Heaps Bloom Filter Count-Min Sketch
CLASSES (e.g. node classes to support linked lists)
ADT structures can have multiple implementations by different concrete data structures.
Classes: ArrayList, LinkedList
What is the backing storage?
ArrayList : array
add(i, val) O(n), add(0, val) O(n), add(n, val) amortized O(1)
get(i) O(1), get(0) O(1), get(n) O(1)
LinkedList: chain of nodes
add(i, val) O(n), add(0, val) O(1), add(n, val) O(1)
get(i) O(n), get(0) O(1), get(n) O(1)
(Java’s LinkedList is doubly linked with head and tail references, so both ends are O(1).)
What are examples of Concrete Data Structures
Array
LinkedList (singly-linked, doubly-linked)
Tree (binary, general)
Heaps
Heaps
Is it an ADT or a Concrete Data Structure?
Heaps are Concrete Data Structures
Binary tree with these properties:
- Heap Order Invariant: every element in the tree is greater than or equal to its parent (min-heap convention).
- Complete Binary Tree: every level of the tree, except possibly the last, is completely filled; there are no holes.
Don’t confuse heap memory (Java) with this heap
Heap Order Property
(again) Every element is greater than or equal to its parent
Completeness: Every level except the last is filled.
Nodes on the bottom are as far left as possible.
Know what makes something a complete tree/heap
It has no holes or missing nodes: every level except the last is full, and the nodes in the last row are as far left as possible.
Heap: KeyMethods
add(e): adds a new element to the heap
poll(): deletes the least element and returns it
Heap ADD
Adds the new element at the “end” of the tree and bubbles it up (swapping with its parent while it is less than its parent).
This maintains heap invariant.
O(log n): since the tree is complete, it is balanced.
- size is exponential as a function of depth
- depth of the tree is logarithmic as a function of size
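The add (bubble up) and poll (bubble down) operations can be sketched on a plain array-backed min-heap; names are illustrative, and java.util.PriorityQueue provides the same operations in the standard library:

```java
import java.util.ArrayList;
import java.util.List;

public class MinHeap {
    private final List<Integer> a = new ArrayList<>();

    public void add(int e) {
        a.add(e);                      // append at the "end" of the complete tree
        int i = a.size() - 1;
        while (i > 0) {                // bubble up while smaller than parent
            int parent = (i - 1) / 2;
            if (a.get(i) >= a.get(parent)) break;
            swap(i, parent);
            i = parent;
        }
    }

    public int poll() {
        int least = a.get(0);
        int last = a.remove(a.size() - 1);
        if (!a.isEmpty()) {
            a.set(0, last);            // move last leaf to root, then bubble down
            int i = 0;
            while (true) {
                int l = 2 * i + 1, r = 2 * i + 2, smallest = i;
                if (l < a.size() && a.get(l) < a.get(smallest)) smallest = l;
                if (r < a.size() && a.get(r) < a.get(smallest)) smallest = r;
                if (smallest == i) break;
                swap(i, smallest);
                i = smallest;
            }
        }
        return least;
    }

    private void swap(int i, int j) {
        int tmp = a.get(i);
        a.set(i, a.get(j));
        a.set(j, tmp);
    }
}
```

Both operations walk at most one root-to-leaf path, hence the O(log n) bound for a complete tree.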
Treap (a randomized binary search tree):
A Binary Search Tree whose height is O(log n) with high probability, though NOT guaranteed in the worst case.
Assigns each node a random priority and uses the binary-heap property on those priorities to maintain balance.
Serves the same purpose as AVL Trees and Red-Black Trees.
How does the key in a node compare to the keys of its children in . . .
I. . . . a binary search tree?
II. . . . a max heap?
I. node.left.key < node.key < node.right.key
II. node.key ≥ node.left.key and node.key ≥ node.right.key
Suppose that the universe U of possible keys is {0, 1, . . . , n2 − 1}. For a hash table
of size n, what is the greatest number of distinct keys the table can hold with each of these collision resolution strategies?
I. Chaining
II. Linear probing
III. Quadratic probing
I. With chaining, we can hold as many keys as there are in the universe
which, in this case, is n^2.
Common errors were miscounting the size of the universe as n^2 − 1 or saying that the table could hold an infinite number of elements.
II. Linear probing can only store as many elements as the size of the table which is n
III. Quadratic probing (not taught)
Similarly to linear probing, quadratic probing can only store as many elements as the size of the table which is n.
What order should we insert the elements {1, 2, . . . , 7} into an empty AVL tree so that
we don’t have to perform any rotations on it?
You should insert in the order {4, 2, 6, 1, 3, 5, 7} to make an AVL tree. The ordering of {2, 6} and the ordering of {1, 3, 5, 7} do not matter. One can see that the resulting binary search tree is perfectly balanced and is therefore an AVL tree. One point was taken off if the student did not explain why the resulting BST is an AVL tree (balanced; left and right subtree heights differ by 0 at every node). A common mistake was giving the output of a max heap, which is a completely different structure from a BST.
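A quick check of the answer, using plain (rotation-free) BST insertion only; the helper names are made up for this sketch:

```java
public class AvlOrderCheck {
    static final class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Plain BST insert: no rebalancing, so any balance we observe
    // came purely from the insertion order.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else root.right = insert(root.right, key);
        return root;
    }

    // Height with the convention height(empty) = -1, so a leaf has height 0.
    static int height(Node n) {
        if (n == null) return -1;
        return 1 + Math.max(height(n.left), height(n.right));
    }

    // true iff every node's child subtree heights differ by at most `slack`
    static boolean balanced(Node n, int slack) {
        if (n == null) return true;
        return Math.abs(height(n.left) - height(n.right)) <= slack
                && balanced(n.left, slack) && balanced(n.right, slack);
    }

    static Node build(int[] keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return root;
    }
}
```

Building from {4, 2, 6, 1, 3, 5, 7} yields a tree of height 2 with both subtree heights equal at every node, so AVL insertion would never need a rotation.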
Suppose you insert three keys into a hash table with m slots. Assuming the simple
uniform hashing assumption, and given that collisions are resolved by chaining, what is the probability that both slots 0 and 1 are empty?
Under SUHA (the simple uniform hashing assumption), each key hashes independently of the others and is equally likely to land in each of the m slots. So for each key, the chance of avoiding both slot 0 and slot 1 is (m−2)/m. Because the keys are independent, the total probability is ((m−2)/m)^3. A common mistake is to assume that, for a single key, not landing in slot 1 is independent of not landing in slot 0, which is not true: given that a key did not end up in slot 0, the chance that it also avoids slot 1 is (m−2)/(m−1).
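The ((m−2)/m)^3 answer can be sanity-checked by brute force: under SUHA each of the three keys lands in one of m slots independently and uniformly, so we can enumerate all m^3 equally likely outcomes and count those that leave slots 0 and 1 empty. The concrete value m = 5 below is an arbitrary choice for illustration:

```java
public class SuhaCheck {
    // Count outcomes (out of m^3) where no key lands in slot 0 or slot 1.
    static long countBothEmpty(int m) {
        long count = 0;
        for (int k1 = 0; k1 < m; k1++)
            for (int k2 = 0; k2 < m; k2++)
                for (int k3 = 0; k3 < m; k3++)
                    if (k1 > 1 && k2 > 1 && k3 > 1) count++;
        return count;
    }

    public static void main(String[] args) {
        int m = 5;
        System.out.println(countBothEmpty(m));  // (m-2)^3 = 27
        System.out.println((long) m * m * m);   // m^3 = 125 outcomes total
    }
}
```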
Explain why, when resolving hash-table collisions via linear probing, one cannot remove an entry from the hash table by resetting the slot to NULL.
When one looks up a key in the hash table, the search stops and returns NIL as soon as it sees an empty slot. So deleting an entry by resetting its slot to NIL breaks the probe chain: items that were inserted after the deleted key, and probed past its slot, may no longer be findable. (The usual fix is to mark the slot with a special DELETED sentinel instead.) Full credit was given for a correct explanation or a correct example.
Explain how a tree with n nodes and the property that the heights of the two children of any node differ by at most 2 has O(log n) height.
Using the same approach as the proof that AVL trees have O(log n) height, let n_h be the minimum number of elements in such a tree of height h. Then:
n_h ≥ 1 + n_{h−1} + n_{h−3}    (1)
n_h > 2 n_{h−3}                (2)
n_h > 2^{h/3}                  (3)
h < 3 lg(n_h)                  (4)
h = O(log n)                   (5)
(A common incomplete answer merely restated the claim: “if the heights of every node’s two children differ by at most some constant c, the tree will have height O(log n)”. That is true, but we are looking for why exactly.) Some got to the right conclusion with an alternate method but had some logical flaws. A common mistake was providing a counterexample where the height was greater than log n. This is not a valid counterexample, since that is not what O(log n) height means: h = O(log n) describes the asymptotic relationship between the height and the number of elements in the tree; it does not say h < log n for all n.
Is the following array a max heap: [10, 3, 5, 1, 4, 2]? True/False?
False. The element 3 (index 1) is smaller than its child 4 (index 4), violating the max-heap property.
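The array-as-heap check is mechanical: the element at index i must be greater than or equal to its children at indices 2i+1 and 2i+2. A small sketch (the helper name is made up):

```java
public class MaxHeapCheck {
    // true iff the array satisfies the max-heap order property
    static boolean isMaxHeap(int[] a) {
        for (int i = 0; i < a.length; i++) {
            int l = 2 * i + 1, r = 2 * i + 2;   // children of index i
            if (l < a.length && a[l] > a[i]) return false;
            if (r < a.length && a[r] > a[i]) return false;
        }
        return true;
    }
}
```

For [10, 3, 5, 1, 4, 2], index 1 holds 3 but its right child at index 4 holds 4, so the check fails.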
Every problem in NP can be solved in exponential time. True/False?
True. Try every possible polynomial-length certificate and verify each in polynomial time (NP ⊆ EXP).
Could a binary search tree be built using o(n lg n) comparisons in the comparison
model? Explain why or why not.
No, or else we could sort in o(n lg n) time by building a BST in o(n lg n)
time and then doing an in-order tree walk in O(n) time.
Describe how any comparison-based sorting algorithm can be made stable, without affecting the running time by more than a constant factor
Tag each element with its original position in the array and break comparison ties using those tags. Each comparison then does at most twice the work, so the running time increases by at most a constant factor of 2.
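A sketch of the tagging trick: pair each element with its original index and break ties on that index. Any comparison sort using this comparator becomes stable. (Arrays.sort with a comparator happens to be stable already in Java; the point is that the tags would make even an unstable sort stable. The class and method names are illustrative.)

```java
import java.util.Arrays;
import java.util.Comparator;

public class StableByTagging {
    // Sorts by value; equal values keep their original relative order.
    // Returns {value, originalIndex} pairs so the tags are visible.
    static int[][] stableSort(int[] a) {
        int[][] tagged = new int[a.length][2];
        for (int i = 0; i < a.length; i++) {
            tagged[i][0] = a[i];   // value
            tagged[i][1] = i;      // original position (the tag)
        }
        Arrays.sort(tagged, Comparator.<int[]>comparingInt(p -> p[0])
                                      .thenComparingInt(p -> p[1]));
        return tagged;
    }
}
```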
True/False: Binary insertion sorting (insertion sort that uses binary search to find each insertion point) requires O(n log n) total operations.
False. While binary insertion sorting improves the time it takes to
find the right position for the next element being inserted, it may still take O(n)
time to perform the swaps necessary to shift it into place. This results in an O(n^2)
running time, the same as that of insertion sort.
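The two costs are visible in a sketch of binary insertion sort: the binary search finds each insertion point in O(log n) comparisons, but shifting the sorted prefix to make room is still O(n) moves per insertion, so the total stays O(n^2).

```java
public class BinaryInsertionSort {
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            // O(log n) part: binary search for the insertion point in a[0..i)
            int lo = 0, hi = i;
            while (lo < hi) {
                int mid = (lo + hi) / 2;
                if (a[mid] <= key) lo = mid + 1;  // <= keeps equal keys stable
                else hi = mid;
            }
            // O(n) part: shift a[lo..i) one slot right, then drop the key in
            System.arraycopy(a, lo, a, lo + 1, i - lo);
            a[lo] = key;
        }
    }
}
```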
True/False: In the merge-sort execution tree, roughly the same amount of work is done at each level of the tree.
True:
At the top level, roughly n work is done to merge all n elements. At the next level, there are two branches, each doing roughly n/2 work to merge n/2 elements. In total, roughly n work is done on that level. This pattern continues on through to the leaves, where a constant amount of work is done on n leaves, resulting in roughly n work being done on the leaf level, as well.
True/False: In a BST, we can find the next smallest element to a given element in
O(1) time.
False. Finding the next smallest element, the predecessor, may require traveling down the height of the tree, making the running time O(h).
For example, the predecessor may be the maximum of the left subtree, reached by walking down as far as the height of the tree.
True/False: In an AVL tree, during the insert operation there are at most two
rotations needed.
True. The AVL property is restored after every operation, so the tree is balanced before each insert. Inserting another item then requires at most one single or one double rotation (i.e., at most two rotations) to restore the balance.
True/False: In a min-heap, the next largest element of any element can be found in O(log n) time.
False. A min-heap cannot provide the next largest element in O(log n)
time. To find the next largest element, we need to do a linear, O(n), search
through the heap’s array.
True/False: Double hashing satisfies the uniform hashing assumption.
False. The notes state that double hashing ‘comes close.’ Double hashing only provides n^2 of the n! probe-sequence permutations that the uniform hashing assumption requires.
True/False: In a weighted undirected graph G = (V, E, w), breadth-first search from a vertex s finds single-source shortest paths from s (via parent pointers) in O(V + E) time.
False. BFS finds shortest paths only in unweighted graphs (or, equivalently, when all edge weights are equal).
True/False: In a weighted undirected tree G = (V, E, w), breadth-first search from a vertex s finds single-source shortest paths from s (via parent pointers) in O(V + E) time.
True. In a tree, there is only one path between two vertices, and breadth-first search finds it.
True/False: In a weighted undirected tree G = (V, E, w), depth-first search from a vertex s finds single-source shortest paths from s (via parent pointers) in O(V + E) time.
True. In a tree, there is only one path between two vertices, and depth-first search finds it.
True/False: If a graph represents tasks and their interdependencies (i.e., an edge (u, v) indicates that u must happen before v happens), then the breadth-first search order of vertices is a valid order in which to tackle the tasks.
False. BFS order can place a task before one of its prerequisites, so it can produce incorrect results. You’d prefer depth-first search, which can easily be used to produce a topological sort of the graph, and a topological order is a valid task order.
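A DFS-based topological sort can be sketched as follows: when a vertex's entire DFS subtree finishes, push it onto a stack; popping the stack then yields a valid task order. This assumes the dependency graph is a DAG and uses an adjacency-array input (names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TopoSort {
    // adj[u] lists the vertices v with an edge (u, v): "u must happen before v"
    static int[] topoOrder(int[][] adj) {
        boolean[] visited = new boolean[adj.length];
        Deque<Integer> finished = new ArrayDeque<>();
        for (int v = 0; v < adj.length; v++)
            if (!visited[v]) dfs(v, adj, visited, finished);
        // Pop in reverse finishing order: each task precedes its dependents.
        int[] order = new int[adj.length];
        for (int i = 0; i < order.length; i++) order[i] = finished.pop();
        return order;
    }

    private static void dfs(int v, int[][] adj, boolean[] visited, Deque<Integer> finished) {
        visited[v] = true;
        for (int w : adj[v])
            if (!visited[w]) dfs(w, adj, visited, finished);
        finished.push(v);  // v is pushed only after every task it unlocks
    }
}
```

For the diamond graph 0→1, 0→2, 1→3, 2→3 this produces the valid order 0, 2, 1, 3.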
True/False: Dijkstra’s shortest-path algorithm may relax an edge more than once in a graph with a cycle.
False. Dijkstra’s algorithm visits each node at most once, so each edge is relaxed at most once, even in a graph with a cycle; this is also why it produces an incorrect result in the presence of negative-weight edges.
(NOT part of UCSD CSE 100)
True/False: Given a weighted directed graph G = (V, E, w) and a source s ∈ V , if G has a negative-weight cycle somewhere, then the Bellman-Ford algorithm will necessarily compute an incorrect result for some δ(s, v).
False. The negative-weight cycle has to be reachable from s to affect any δ(s, v); an unreachable cycle leaves all computed distances correct.
(NOT part of UCSD CSE 100)
True/False: In a weighted directed graph G = (V, E, w) containing no zero- or positive-weight cycles, Bellman-Ford can find a longest (maximum-weight) path from vertex s to vertex t.
True. Negate all the edge weights: a longest path in G becomes a shortest path in the negated graph, and since G has no zero- or positive-weight cycles, the negated graph has no negative-weight cycles, so Bellman-Ford applies.
True/False: Given a weighted directed graph G = (V, E, w) and a shortest path p
from s to t, if we doubled the weight of every edge to produce G’ = (V, E, w’), then p is also a shortest path in G’.
True. Multiplying edge weights by any positive constant factor preserves their relative order, as well as the relative order of any linear combination of the weights. All path weights are linear combinations of edge weights, so the relative order of path weights is preserved. This means that a shortest path in G will still be a shortest path in G’.
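A tiny check of the linearity argument: take a made-up graph with three 0→3 paths, compute each path weight as a sum of edge weights scaled by a factor, and confirm the minimum-weight path is the same for factor 1 and factor 2. The graph, weights, and names are all invented for illustration:

```java
public class DoubledWeights {
    // Three candidate 0 -> 3 paths, each as its list of edge weights.
    static final int[][] PATH_WEIGHTS = {
        {4, 1},       // 0 -> 1 -> 3, total 5
        {1, 2, 1},    // 0 -> 2 -> 1 -> 3, total 4 (the shortest)
        {1, 5},       // 0 -> 2 -> 3, total 6
    };

    // Index of the minimum-weight path when every edge is scaled by `factor`.
    static int bestPath(int factor) {
        int best = 0, bestW = Integer.MAX_VALUE;
        for (int i = 0; i < PATH_WEIGHTS.length; i++) {
            int w = 0;
            for (int e : PATH_WEIGHTS[i]) w += factor * e;  // linear in edge weights
            if (w < bestW) { bestW = w; best = i; }
        }
        return best;
    }
}
```

Scaling distributes over the sum, so every path weight is multiplied by the same factor and the argmin cannot change.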