General Data Structure Flashcards

1
Q

Two-Stack Queue

“Dirty Dishes” Analogy

A

Maintain an “In” stack and an “Out” stack.
● To enqueue an element, push it onto the “In” stack.
● To dequeue an element:
● If the “Out” stack is nonempty, pop from it.
● If the “Out” stack is empty, pop elements from the “In” stack, pushing them onto the “Out” stack, until the bottom of the “In” stack is exposed; then pop from the “Out” stack.

● Intuition: we only do expensive dequeues after a long run of cheap enqueues.
● Think “dishwasher”: we very slowly introduce a lot of dirty dishes to get cleaned up all at once.
● Provided we clean up all the dirty dishes at once, and provided that dirty dishes accumulate slowly, this is a fast strategy! (See the Java sketch below.)
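
A minimal Java sketch of this idea (the class and method names are mine, not from the cards):

    import java.util.ArrayDeque;
    import java.util.Deque;

    /** A queue built from two stacks ("In" and "Out"); amortized O(1) per operation. */
    class TwoStackQueue<T> {
        private final Deque<T> in = new ArrayDeque<>();   // receives new elements
        private final Deque<T> out = new ArrayDeque<>();  // serves dequeues

        public void enqueue(T value) {
            in.push(value);                               // cheap: O(1)
        }

        public T dequeue() {
            if (out.isEmpty()) {                          // only pay when "Out" runs dry
                while (!in.isEmpty()) out.push(in.pop()); // reverse "In" into "Out"
            }
            return out.pop();                             // throws if the queue is empty
        }
    }

Each element is pushed at most twice and popped at most twice, which is exactly the amortized argument made in the amortization cards later in this deck.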

2
Q

When to use bloom filters

A

Bloom filters are a great first layer of filtering, since they don’t require much space and can fit in fast storage, like RAM. Consider using them anywhere where knowing if something is definitely not present or possibly present would be helpful.

One common use is to eliminate unnecessary accesses to slower storage / expensive lookups.

For instance, say we wanted to query a large database stored on a rotating hard drive (slow to read from). And suppose the thing we’re querying for has a good chance of not being present at all. Before querying the disk, we could check for the record in a bloom filter; if the bloom filter says the record definitely isn’t present, then we don’t need to touch the slow disk at all.
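
A minimal Java sketch of this pattern (TinyBloomFilter and its two hash functions are illustrative toys, not a production design):

    import java.util.BitSet;

    /** Illustrative Bloom filter: k = 2 hash functions over an m-bit bitmap. */
    class TinyBloomFilter {
        private final BitSet bits;
        private final int m;

        TinyBloomFilter(int m) { this.m = m; this.bits = new BitSet(m); }

        // Two cheap hash functions derived from hashCode(), for illustration only.
        private int h1(String key) { return Math.floorMod(key.hashCode(), m); }
        private int h2(String key) { return Math.floorMod(31 * key.hashCode() + 17, m); }

        void insert(String key) { bits.set(h1(key)); bits.set(h2(key)); }

        /** false => definitely absent; true => only "possibly present". */
        boolean mightContain(String key) { return bits.get(h1(key)) && bits.get(h2(key)); }
    }

The filter then guards the slow lookup: if mightContain(key) returns false, skip the disk read entirely; otherwise fall through to the real query.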

3
Q

Hash Map Operations

A

insert()
delete()
find()

All operations take O(1) expected time

A hash map is a “really big array” whose indices can be anything, not just integers: strings, sequences of integers, etc. It takes a string, object, etc. and converts it into an integer index via a HASH function.

But an arbitrarily “large” hash map is not good/reasonable; the hash value must be mapped down into a table of practical size.
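
A quick Java illustration with java.util.HashMap:

    import java.util.HashMap;
    import java.util.Map;

    class HashMapDemo {
        public static void main(String[] args) {
            // "Indices" can be strings; hashCode() converts each key to an integer bucket.
            Map<String, Integer> ages = new HashMap<>();
            ages.put("alice", 30);                         // insert: O(1) expected
            System.out.println(ages.containsKey("alice")); // find:   O(1) expected -> true
            ages.remove("alice");                          // delete: O(1) expected
        }
    }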

4
Q

Abstract Data Types (ADTs) are:

List
Queue
Stack
Set (or Multiset)
Graph
Tree
Container
A

INTERFACES

  • specifies only interfaces (method names and specs)
  • they do not deal with IMPLEMENTATION (bodies, method definitions, fields, …)
5
Q

Concrete Data Structures are:

LinkedList
Array
Binary Tree
Heaps
Bloom Filter 
Count-Min Sketch
A

CLASSES (e.g. node classes to support linked lists)

An ADT can have multiple implementations by different concrete data structures.

6
Q

Classes: ArrayList, LinkedList

What is the backing storage?

A

ArrayList: array
add(i, val) O(n), add(0, val) O(n), add(n, val) amortized O(1)
get(i) O(1), get(0) O(1), get(n) O(1)

LinkedList: chain of nodes
add(i, val) O(n), add(0, val) O(1), add(n, val) O(1)
get(i) O(n), get(0) O(1), get(n) O(1)

7
Q

What are examples of Concrete Data Structures?

A

Array
LinkedList (singly-linked, doubly-linked)
Tree (binary, general)
Heaps

8
Q

Heaps

Is it a ADT or Concrete Data Structure?

A

Heaps are Concrete Data Structures
A binary tree with properties such that:
- Heap Order Invariant: every element in the tree is greater than or equal to its parent (a min-heap).
- Complete Binary Tree: every level of the tree, except possibly the last, is completely filled, and there are no holes.

Don’t confuse heap memory (Java) with this heap

10
Q

Heap Order Property

A

(again) Every element is greater than or equal to its parent

Completeness: Every level except the last is filled.

Nodes on the bottom are as far left as possible.

11
Q

Know what makes something a complete tree/heap

A

It does not have any holes or missing nodes, and the last row is filled as far left as possible.

12
Q

Heap: KeyMethods

A

add(e): adds a new element to the heap

poll(): deletes the least element and returns it

13
Q

Heap ADD

A

Adds to the “end” of the tree and bubbles it up while it is less than its parent.

This maintains heap invariant.

O(log n): since the tree is balanced.

  • size is exponential as a function of depth
  • depth of tree is logarithmic as a function of size
14
Q

Treap (a randomized binary search tree):

A Binary Search Tree, NOT guaranteed to have height O(log n)

Uses random priorities plus the binary-heap property to maintain balance (in expectation)

A

Similar in purpose to AVL Trees and Red-Black Trees

15
Q

How does the key in a node compare to the keys of its children in . . .
I. . . . a binary search tree?

II. . . . a max heap?

A

I. node.left.key < node.key < node.right.key

II. node.key > node.left.key, node.right.key

16
Q

Suppose that the universe U of possible keys is {0, 1, …, n^2 − 1}. For a hash table of size n, what is the greatest number of distinct keys the table can hold with each of these collision resolution strategies?

I. Chaining

II. Linear probing

III. Quadratic probing

A

I. With chaining, we can hold as many keys as there are in the universe
which, in this case, is n^2.

Common errors were miscounting the size of the universe as (n^2 − 1) or saying that the table could hold an infinite number of elements.

II. Linear probing can only store as many elements as the size of the table, which is n.

III. Quadratic probing (not taught)
Similarly to linear probing, quadratic probing can only store as many elements as the size of the table, which is n.

17
Q

What order should we insert the elements {1, 2, . . . , 7} into an empty AVL tree so that
we don’t have to perform any rotations on it?

A

You should insert in the order {4, 2, 6, 1, 3, 5, 7}. The relative ordering of {2, 6} and of {1, 3, 5, 7} does not matter. One can see the resulting binary search tree is perfectly balanced and therefore an AVL tree. One point is taken off if the student did not explain why the resulting BST is an AVL tree (balanced, i.e. left and right depths differ by 0). A common mistake was giving a max-heap ordering, which is completely different from a BST.

18
Q

Suppose you insert three keys into a hash table with m slots. Assuming the simple
uniform hashing assumption, and given that collisions are resolved by chaining, what is the probability that both slots 0 and 1 are empty?

A
Under SUHA (the simple uniform hashing assumption), each key is independent of the others and has equal probability of being inserted into each of the m slots. So for each key the chance of not being inserted into slot 0 or 1 is (m−2)/m. And because the keys are independent, the total probability is ((m−2)/m)^3.

A common mistake is thinking the probability of not inserting into slot 1 is independent of not inserting into slot 0, which is not true. Given a key did not end up in slot 0, the chance that it will also not end up in slot 1 is (m−2)/(m−1).
19
Q

Explain why, when resolving hash-table collisions via linear probing, one cannot remove an entry from the hash table by resetting the slot to NULL.

A

When one looks up a key in the hash table, the search returns NIL when it sees an empty slot and therefore stops. So if one deletes an entry by resetting its slot to NIL, the probe chain is broken, and one may not be able to find items that were inserted after the key in the deleted slot. Full credit is given for the correct explanation or a correct example.

20
Q

Explain how a tree with n nodes and the property that the heights of the two children of any node differ by at most 2 has O(log n) height.

A

Using the same approach as proving AVL trees have O(log n) height, we say that n_h is the minimum number of elements in such a tree of height h.

n_h ≥ 1 + n_{h−1} + n_{h−3}   (1)
n_h > 2·n_{h−3}               (2)
n_h > 2^{h/3}                 (3)
h < 3 lg(n_h)                 (4)
h = O(log n)                  (5)

Simply stating “if the heights of every node’s two children differ by at most some constant c, the tree will have height O(log n)” is true, but we’re looking for why exactly. Some got to the right conclusion with an alternate method but had some logical flaws. A common mistake was providing a counterexample where the height was greater than log n. This is not a valid counterexample, since that’s not what O(log n) height means: h = O(log n) describes the asymptotic relationship between the height and the number of elements in the tree; it does not say h < log n for all n.

21
Q

Is the following array a max heap: [10, 3, 5, 1, 4, 2]? True/False

A

False. The element 3 is smaller than its child 4, violating the max-heap property.

22
Q

Every problem in NP can be solved in exponential time. True/False?

A

True

23
Q

Could a binary search tree be built using o(n lg n) comparisons in the comparison
model? Explain why or why not.

A

No, or else we could sort in o(n lg n) time by building a BST in o(n lg n)
time and then doing an in-order tree walk in O(n) time.

24
Q

Describe how any comparison-based sorting algorithm can be made stable, without affecting the running time by more than a constant factor

A

Tag elements with their original positions in the array (breaking comparison ties by position); this increases the work by at most a factor of 2.

25
Q

True/False: Binary insertion sorting (insertion sort that uses binary search to find each insertion point) requires O(n log n) total operations.

A

False. While binary insertion sort improves the time it takes to find the right position for the next element being inserted, it may still take O(n) time to perform the swaps necessary to shift it into place. This results in an O(n^2) running time, the same as that of insertion sort.

26
Q

True/False: In the merge-sort execution tree, roughly the same amount of work is done at each level of the tree.

A

True. At the top level, roughly n work is done to merge all n elements. At the next level, there are two branches, each doing roughly n/2 work to merge n/2 elements; in total, roughly n work is done on that level. This pattern continues on through to the leaves, where a constant amount of work is done on each of n leaves, resulting in roughly n work being done on the leaf level as well.

27
Q

True/False: In a BST, we can find the next smallest element to a given element in
O(1) time.

A

False. Finding the next smallest element, the predecessor, may require traveling down the height of the tree, making the running time O(h).

For example, the predecessor might be deep in the adjacent (left) subtree.

28
Q

True/False: In an AVL tree, during the insert operation there are at most two
rotations needed.

A

True. The AVL property is restored on every operation. Therefore, inserting another item will require at most two rotations to restore the balance.

29
Q
True/False: In a min-heap, the next largest element of any element can be found
in O(log n) time.
A

False. A min-heap cannot provide the next largest element in O(log n)
time. To find the next largest element, we need to do a linear, O(n), search
through the heap’s array.

30
Q

True/False: Double hashing satisfies the uniform hashing assumption.

A

False. The notes state that double hashing only “comes close”: it provides about n^2 of the n! probe-sequence permutations required by the uniform hashing assumption.

31
Q

True/False: In a weighted undirected graph G = (V, E, w), breadth-first search from a vertex s finds single-source shortest paths from s (via parent pointers) in O(V + E) time.

A

False. That holds only in unweighted graphs.

32
Q

True/False: In a weighted undirected tree G = (V, E, w), breadth-first search from a vertex s finds single-source shortest paths from s (via parent pointers) in O(V + E) time.

A

True. In a tree, there is only one path between two vertices, and breadth-first search finds it.

33
Q

True/False: In a weighted undirected tree G = (V, E, w), depth-first search from
a vertex s finds single-source shortest paths from s (via parent pointers) in O(V +
E) time.

A

True. In a tree, there is only one path between two vertices, and depth-first search finds it.

35
Q

True/False: If a graph represents tasks and their interdependencies (i.e., an edge (u, v) indicates that u must happen before v happens), then the breadth-first search order of vertices is a valid order in which to tackle the tasks.

A

False. You’d prefer depth-first search, which can easily be used to produce a topological sort of the graph, which would correspond to a valid task order. BFS order can produce incorrect results.

36
Q

True/False: Dijkstra’s shortest-path algorithm may relax an edge more than once in a graph with a cycle.

A

False. Dijkstra’s algorithm always visits each node at most once; this
is why it produces an incorrect result in the presence of negative-weight edges.

37
Q

(NOT part of UCSD CSE 100)
Given a weighted directed graph G = (V, E, w) and a source s ∈ V , if G has a negative-weight cycle somewhere, then the Bellman-Ford algorithm will necessarily compute an incorrect result for some δ(s, v).

A

False. The negative-weight cycle has to be reachable from s.

38
Q

(NOT part of UCSD CSE 100)
True/False: In a weighted directed graph G = (V, E, w) containing no zero- or positive-weight cycles, Bellman-Ford can find a longest (maximum-weight) path from vertex s to vertex t.

A

True. Negate the weights.

40
Q

True/False: Given a weighted directed graph G = (V, E, w) and a shortest path p
from s to t, if we doubled the weight of every edge to produce G’ = (V, E, w’), then p is also a shortest path in G’.

A

True. Multiplying edge weights by any positive constant factor preserves their relative order, as well as the relative order of any linear combination of the weights. All path weights are linear combinations of edge weights, so the relative order of path weights is preserved. This means that a shortest path in G will still be a shortest path in G’.

41
Q

What implementations of a search can determine if a graph has multiple paths to a vertex from another vertex?

A

Stack-based implementation

Recursion-based implementation

42
Q

Minimum Spanning Tree (MST):

Given an UNdirected weighted graph, an MST is a subgraph that connects all the vertices with the lowest possible sum of edge weights.

The MST’s total edge weight is minimized by removing heavier edges while retaining connections to all vertices.

The MST of a graph is NOT necessarily unique; more than one MST can exist. However, each MST will have the same total edge weight.

A

An MST always contains all the vertices of the original graph.

A subgraph with all the vertices, the minimum number of edges needed to connect them all, AND the minimum edge-weight sum.

43
Q

Shortest Path:

Given a weighted graph, the shortest path between two vertices is a path with the lowest possible sum of edge weights.

A

A shortest path does not require inclusion of all the vertices, just the two endpoint vertices and whatever vertices lie on the path between them.

44
Q

Comparing Prim’s and Dijkstra’s Algorithm

  1. Dijkstra’s Algorithm finds the shortest path, but Prim’s finds the MST
  2. Dijkstra’s can work on both directed and undirected graphs. Prim’s only works with undirected graphs
  3. Prim’s algorithm can handle negative edge weights, but Dijkstra’s algorithm may fail to compute distances accurately if even one negative edge weight exists.
A

Dijkstra: shortest path
Prim’s: MST

Dijkstra: works on directed and undirected graphs
Prim’s: only for undirected

Dijkstra: cannot handle graphs with negative edge weights
Prim’s: can handle negative edge weights

45
Q

Implementing Prim’s Algorithm

A

Requires a data structure to store the elements; the elements need to represent the edge weights in some way.

Include a boolean variable per vertex to indicate whether it is currently included/visited during the traversal of the graph.

Create a “forest” of vertices initialized to indicate the vertices are “separated”/disconnected, and use a while loop.

There should be a way for the class/function to check whether every vertex’s boolean is Yes or whether a No still exists.

46
Q

Dijkstra’s Algorithm

Given an initial/start vertex and an end/destination vertex, Dijkstra’s can determine the shortest path and distance between the two vertices.

Dijkstra’s can find the shortest distance from a start vertex to ANY destination in a connected graph.

A

The core idea is to continuously eliminate longer paths between the starting node and all possible destinations (or at least give them lower priority) during the traversal.

Discriminate between visited nodes and unvisited nodes. Visited nodes are “known”/verified to have their minimum distance from the start.

Unvisited nodes are those that have not been confirmed to be at minimum distance but can still reach the destination vertex.

Implementation requires initializing every vertex with a “distance” of infinity and a null Prev/Predecessor.

The start vertex is the only node not given an “infinity” distance; it gets 0.

Iteratively traverse the graph BFS-style, marking each vertex visited as you go. The traversal is powered by a PRIORITY QUEUE (min-heap), popping the minimum-distance vertex each time from the set of possible vertices.

The priority is not the static edge weight: a vertex added to the PRIORITY QUEUE is keyed by the weight of its edge to the previously visited vertex PLUS (+) that vertex’s current distance from the start. (See the Java sketch below.)
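
A compact Java sketch of this loop (the graph representation and names are my own; non-negative edge weights assumed):

    import java.util.*;

    /** Dijkstra sketch over an adjacency list; returns shortest distances from src. */
    class Dijkstra {
        record Edge(int to, int weight) {}

        static int[] shortestDistances(List<List<Edge>> adj, int src) {
            int n = adj.size();
            int[] dist = new int[n];
            Arrays.fill(dist, Integer.MAX_VALUE);
            dist[src] = 0;                                 // start vertex gets 0, all others "infinity"
            boolean[] visited = new boolean[n];
            // Min-heap of {distance, vertex}, keyed by tentative distance
            PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt((int[] a) -> a[0]));
            pq.add(new int[]{0, src});
            while (!pq.isEmpty()) {                        // runs as long as the PQ is not empty
                int u = pq.poll()[1];
                if (visited[u]) continue;                  // stale duplicate entry: ignore
                visited[u] = true;                         // dist[u] is now final (boolean flag)
                for (Edge e : adj.get(u)) {
                    int cand = dist[u] + e.weight();       // edge weight + distance so far
                    if (cand < dist[e.to()]) {
                        dist[e.to()] = cand;
                        pq.add(new int[]{cand, e.to()});   // same vertex may appear more than once
                    }
                }
            }
            return dist;
        }
    }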

47
Q

Dijkstra’s Algorithm Implementation

A

A vertex/node can be described with a label/name, a LinkedList recording the shortest path found so far, a distance from the source, and an adjacency list (list of neighbors and edge weights).

49
Q

Kruskal’s Algorithm

With N edges, Kruskal’s algorithm will find an MST with V−1 edges and NO cycles.

  1. Collect all graph edges and initialize all as “unvisited”/un-included
  2. Sort all edges based on weight (PQ)
  3. Select the lowest-weight edge (first off the PQ); if it doesn’t create a cycle with the current edges, then add it.
  4. Continue selecting until V−1 edges are reached.
A

Reversing the PQ to give the max edge weight first can provide the Maximum Spanning Tree.

The challenge in implementation is checking for cycles. A DFS can be used to check for a cycle, but…

the optimal solution is to use the UNION-FIND algorithm with a disjoint-set data structure (it suits the incremental edge-adding approach to detecting cycles). Implementing this into the spanning-tree construction process improves Kruskal’s algorithm. (See the sketch below.)
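
A Java sketch of the selection loop. It assumes a UnionFind class whose union method returns false when both endpoints are already connected; one is sketched under the up-tree / path-compression card below:

    import java.util.*;

    /** Kruskal sketch: edges come off a min-PQ; union-find rejects cycle-forming edges. */
    class Kruskal {
        record Edge(int u, int v, int weight) {}

        static List<Edge> mst(int numVertices, List<Edge> edges) {
            List<Edge> result = new ArrayList<>();
            PriorityQueue<Edge> pq = new PriorityQueue<>(Comparator.comparingInt(Edge::weight));
            pq.addAll(edges);                            // step 2: all edges ordered by weight
            UnionFind uf = new UnionFind(numVertices);
            while (!pq.isEmpty() && result.size() < numVertices - 1) {
                Edge e = pq.poll();                      // step 3: lowest-weight edge first
                if (uf.union(e.u(), e.v())) {            // false => would create a cycle
                    result.add(e);                       // safe: connects two components
                }
            }
            return result;                               // V-1 edges if the graph is connected
        }
    }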

50
Q

Checking to see if two nodes are in the same set:

A

Start from a forest of disjoint sets/nodes, where each set contains only one node: itself.

Each time an edge is to be added to the tree, we check whether its two endpoints are already in the same set.

A tree structure can be used to represent each disjoint set. Each node has a parent pointer. In each set, there is a unique root node that represents the set. At the start, each root node has a self-referencing parent pointer.

51
Q

Improving Kruskal RunTime Complexity

Up-Tree with PATH COMPRESSION (self adjusting structure)

A

With a certain union order the worst case could be O(n), the up-tree resembling a linked list. To improve this we use the path compression technique, where each node visited during a lookup is attached directly to the root. Thus, future lookups of this node will take one step to get to the root/sentinel node.

Path compression reattaches the visited vertices directly to the root. (See the Java sketch below.)
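
A minimal Java sketch of union-find with path compression (union-by-rank/size is omitted to keep it short):

    /** Up-tree union-find: find() compresses paths by pointing nodes straight at the root. */
    class UnionFind {
        private final int[] parent;

        UnionFind(int n) {
            parent = new int[n];
            for (int i = 0; i < n; i++) parent[i] = i;  // each node starts as its own root
        }

        int find(int x) {
            if (parent[x] != x) {
                parent[x] = find(parent[x]);            // path compression: attach directly to root
            }
            return parent[x];
        }

        /** Returns false if x and y were already in the same set (the cycle case in Kruskal's). */
        boolean union(int x, int y) {
            int rx = find(x), ry = find(y);
            if (rx == ry) return false;
            parent[rx] = ry;                            // merge the two up-trees
            return true;
        }
    }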

52
Q

Amortized Cost Analysis

Self-Adjusting Operations in iterative uses

A

In the context of path compression, the first use of the technique does the re-adjusting work, guaranteeing that subsequent uses take no longer (in fact, they take less time).

Amortized cost analysis considers the time or space cost of doing a sequence of operations (as opposed to a single operation), since the total cost of the entire sequence of operations might be less with extra initial work than without!

The first application of path compression is the non-constant-time part, where the structure is re-adjusting itself. The second is the constant-time part, in which the structure has finished readjusting and is now reaping the benefits of the self-adjustment.

53
Q

Dijkstra’s Algorithm takes advantage of the Priority Queue ADT to store the possible paths and uses the total weights of the paths to determine priority.

Since we are storing vertices that are immediately reachable from the current vertex in the priority queue, the same vertex can in fact appear in the priority queue more than once if there exists more than one path to reach it.

Dijkstra’s Algorithm runs as long as the PQ is not empty.

Once a vertex/node is removed from the PQ, the algorithm never updates that vertex’s info/fields again, guaranteeing that once it has been popped, that vertex has achieved its shortest path.

A

A shorter path to an already-popped vertex cannot still exist in the PQ, since the PQ is designed to bring the SHORTEST path to the top. Thus the shortest option for a vertex is popped before any other options for the same vertex that are larger in distance, and after that vertex is popped it will never be updated again (boolean flag).

This is part of the GREEDY algorithm approach.

54
Q

Bloom Filter is a Data Structure

Quick ( O(1) ) verification/check if element is in set.

Benefits: space efficient. Uses bits/bytes (integers), not a lot of data, for all element information. Uses a bitmap (initialized to 0).

Tradeoff/Compromise: due to probabilistic methods (not deterministic), it can only give accurate verification of a true negative, i.e. “that something is definitely not in the set.” It cannot definitively report that an element is in the set, just a maybe; a guaranteed true-positive report is not possible.

Compromise: Limited interface. Can only insert and look up. Cannot iterate or delete elements.

A

Space: O(1)

Insert: O(1)

Lookup: O(1)

How to use and implement?
It is a good first “layer” of filtering.

As the number of elements inserted into the bitmap grows, the likelihood of eventually reporting a false positive gets larger. If the size is known ahead of time, try to size the filter for it up front; otherwise the filter needs to be resized periodically to maintain a low probability of a false positive (“maybe”).

56
Q

Priority Queue (ADT)

“Special” Queue, where every element has a priority rating/tag associated with it. The “higher” the priority, the earlier it can get popped off the queue, or the higher up the list/container it is.

Priority can be defined by user

Benefits: the top will always be the “highest” priority

Benefits: generally fast operations over all functions

Tradeoff: pushing/insert and popping/remove take the longest compared to other operations due to maintenance of the ordering rules. The first insert is quick, and removal from a low-occupancy container is fast.

Implementation uses BINARY HEAPS (data structure). In the tree structure the root is the highest priority, allowing O(1) access (peek). But after a removal, surfacing the next item costs time due to restructuring.

A

Space: O(n)

Peek: O(1)

Dequeue: O(log n) (“popping”/removing out of the queue, correctly positioning the next item on the top, and ensuring the remainder of the container maintains the priority rules)

Enqueue: O(log n) (“pushing” into the queue and correctly positioning it based on priority)

Highly useful in Dijkstra’s Algorithm, graph traversal, and Huffman encoding.

Applications: any environment with different levels of urgency, like hospitals, auctions, or scheduling, where tradeoffs can be made (discrimination between elements).

If using a sorted LIST to implement a priority queue, the highest priority is simply at position 0, but insertion will need to move a certain number (up to N−1) of items over, and removal will require the same shifting in the opposite direction.

As a linked LIST, the first/head node is the highest priority, so access is fast, but insertion requires an O(n) traversal to find where to rewire the pointers. Removal and restructuring are fast and easy.
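
Java’s built-in java.util.PriorityQueue (a binary min-heap) shows these costs directly:

    import java.util.PriorityQueue;

    class PQDemo {
        public static void main(String[] args) {
            // Min-heap by default; a custom Comparator can define "priority" differently.
            PriorityQueue<Integer> pq = new PriorityQueue<>();
            pq.add(5);                     // enqueue: O(log n)
            pq.add(1);
            pq.add(3);
            System.out.println(pq.peek()); // peek: O(1) -> 1
            System.out.println(pq.poll()); // dequeue: O(log n) -> 1
        }
    }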

57
Q

Burrows-Wheeler Transform Optimization

No matter how large our database string may be, we can find all occurrences of any given query as fast as simply scanning the query! Given that this is just as fast as even reading in our query, in any practical use-case, it’s impossible to beat this Big-O time complexity O(mk).

n is the length of the database string
Q is the query collection
m is the number of strings in Q
k is the length of the strings in the collection of m

All occurrences of all m query strings can be found in O(mk)

A

Burrows-Wheeler Transform construction: O(n)

58
Q

Heaps are naturally structured like a priority queue, and so a heap is usually the default choice of data structure to implement a Priority Queue.

A

They are also used to implement the heapsort sorting algorithm, which is a nice, fast O(N log N) sorting algorithm.

59
Q

When inserting into a heap be aware to:

  • preserve the structural and ordering properties: the result must be a heap!

insertion is O(log n)

A

Basic algorithm:
1. Create a new node in the proper location, filling the level left-to-right
2. Put the key to be inserted in this new node
3. “Bubble up” the key toward the root, exchanging it with the key in its parent, until it is in a node whose parent has a smaller (or equal) key, or it is in the root

Fill in the left child before the right: no node should have a right child and an empty left child. (See the Java sketch below.)
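
A minimal Java sketch of the bubble-up step for a min-heap (array-backed, matching the heap-as-array cards later in this deck):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    /** Min-heap insert sketch: append at the next open slot, then bubble up. */
    class MinHeap {
        private final List<Integer> a = new ArrayList<>();  // array-backed complete tree

        void add(int key) {
            a.add(key);                                // steps 1-2: new node at the "end"
            int i = a.size() - 1;
            while (i > 0) {                            // step 3: bubble up toward the root
                int parent = (i - 1) / 2;
                if (a.get(parent) <= a.get(i)) break;  // parent is smaller/equal: order holds
                Collections.swap(a, i, parent);
                i = parent;
            }
        }
    }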

60
Q

What is the worst-case time complexity of the Huffman coding algorithm with a heap/priority queue?

A

O(K + N log N)

This also depends on the I/O speed, which can dominate if the number of elements is large.

61
Q

What is the time cost of then using that tree to code the input message?
• Coding one symbol from the input message takes time proportional to the number of bits in the code for that symbol; and so the total time cost for coding the entire message is proportional to the total number of bits in the coded version of the message

A

The answer to this question lies in the motivation for information theory and the minimum number of bits/units of information needed for a message of size K.

By Shannon → Entropy:

≈ log_2(K) bits to distinguish K equally likely possibilities.

To find the average number of bits per symbol, take the weighted average of p · log_2(1/p) over the symbols, + 1 (or ceiling) for Huffman’s code lengths.

62
Q

What is the range?
0 <= H <= log_2(N)

What is the lowest value of a character/symbol/data entropy?

A

The lowest is when the symbol always occurs: probability 1 gives log_2(1) = 0.

If every element occurs the same number of times, then they all have the same probability 1/N, so H = SUM over all N symbols of (1/N) · log_2(N) = log_2(N), the maximum.
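
The two boundary cases, worked in LaTeX (this is the standard entropy definition, not something specific to these cards):

    % Entropy of a source with symbol probabilities p_1, ..., p_N:
    %   H = \sum_{i=1}^{N} p_i \log_2 \frac{1}{p_i}
    % Certain symbol (p = 1):  H = 1 \cdot \log_2 1 = 0   (the minimum)
    % Uniform case (p_i = 1/N):
    \[
      H = \sum_{i=1}^{N} \frac{1}{N}\,\log_2 N = \log_2 N \quad \text{(the maximum)}
    \]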

63
Q

Arrays can be used to implement trees; this is a good choice in some cases if the tree has a very regular structure, or a fixed size

A

One common case is: implementing a heap

Because heaps have a very regular structure (they are complete binary trees), the array representation can be very compact: array entries themselves don’t need to hold parent-child relational information.
• The index of the parent, left child, or right child of any node in the array can be found by simple computations on that node’s index.

64
Q

Huffman code trees do not have as regular a structure as a heap (they can be extremely unbalanced), but they do have some structure, and they do have a determinable size

A
  • structure: they are “full” binary trees; every node is either a leaf, or has 2 children
  • size: to code N possible items, they have N leaves, and N-1 internal nodes

These features make it potentially interesting to use arrays to implement a Huffman
code tree

65
Q

One way to implement a heap with N nodes holding keys of type T is to use an N-element array, T heap[N]. Nodes of the heap correspond to entries in the array as follows:
• The root of the heap is the array element indexed 0, heap[0]
• If a heap node corresponds to the array element indexed i, then
• its left child corresponds to the element indexed 2i + 1
• its right child corresponds to the element indexed 2i + 2
• …and so a node indexed k has parent indexed (k−1)/2 (integer division)

A

The result: Nodes at the same level are contiguous in the array, and the array has no “gaps”
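
The index arithmetic in Java (0-indexed, matching the formulas above):

    /** Index arithmetic for an array-backed heap (0-indexed). */
    class HeapIndex {
        static int leftChild(int i)  { return 2 * i + 1; }
        static int rightChild(int i) { return 2 * i + 2; }
        static int parent(int k)     { return (k - 1) / 2; }  // integer division
    }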

66
Q

Red Black Trees try to improve on AVL Trees:

The red-black invariants are more complicated than the AVL balance property; however they can be implemented to provide somewhat faster operations on the tree

A

The red-black invariants imply that the tree is balanced

Red-black trees are balanced; insert, delete, and find operations are O(log N) worst-case. In fact, the height can never be more than 2·log2(N) + 1.

It is fairly easy to implement insert and delete operations to be faster by a constant factor than for AVL trees

The average level of a node in a random red-black tree is very close to that in a random AVL tree (about log2 N), so find operations are as fast in the average case

67
Q

The trick to Red-Black Trees improving on AVL trees is :

The trick is to implement Insert and Delete operations that ensure that all the red-black invariants are… invariant

A

The height of an R-B tree is at most 2·log2(N) + 1: the invariants let the longest root-to-leaf path be at most about a factor of 2 longer than the shortest, so the rules don’t let the height get out of control.

All internal nodes will have at least 2 children

68
Q

Contrast with Insert in AVL trees:

Descent from the root finds the place to insert the new node; then detecting failures of the AVL property and fixing them is done on the way back up to the root

A

Red-Black trees can do all the needed fixing in the initial top-down pass, never needing to bubble back up and rotate after insertion, which AVL insert does at extra runtime cost.

For AVL, this means in effect traveling the path from root to leaf twice, whether it is done recursively or iteratively (by using parent pointers).

69
Q

The tree resulting from any possible AVL rotation on any possible Binary Search Tree (BST) is always also a valid BST.

A

TRUE

70
Q

Tries

A trie data structure is a tree in which:
• internal nodes represent decision rules…
• edges from a node to its children represent possible outcomes of the decision…
• (usually only) leaves represent data

A

Tries don’t have to be binary:
• an “alphabet trie” is a 26-ary tree for storing words
• a “ternary trie” is a 3-ary tree supporting efficient prefix queries on strings

Tries don’t have to be completely filled: in general they can have any tree structure

Other applications of TRIE are decision trees and discrimination nets

71
Q

A Treap containing n elements has the same tightest worst-case Big-O time complexity for “insert” operations as a Linked List containing n elements.

Removing a leaf is easy, but removing other nodes will require restructuring and rebalancing to maintain the structure.

A TREAP is a combination of a binary search TREE + a binary HEAP, storing pairs of values: data and priority.

A

Formed by inserting the nodes highest-priority-first into a binary search tree without doing any rebalancing

If the priorities are independent random numbers (from a distribution over a large enough space of possible priorities to ensure that two nodes are very unlikely to have the same priority) then the shape of a treap has the same probability distribution as the shape of a random binary search tree, a search tree formed by inserting the nodes without rebalancing in a randomly chosen insertion order

72
Q

(definition/fact)

To store n strings, it is generally more memory-efficient to use a Ternary Search Tree (TST) than to use a Multiway Trie (MWT)

A

An MWT uses much more memory due to the empty nullptrs from unused character edges.

73
Q

In the case where the length (m) of the strings in a set is small(er) than the number of different strings (k), it is better to use an MWT than a Ternary Search Tree.

A

Depending on the initial insertion order and how many different words/strings there are, in a TST you can traverse down the left or right multiple times before reaching the actual initial character.

74
Q

Comparatively, an MWT will NOT ALWAYS be smaller (shorter in height) than a TST.

They will either be the same height, or the MWT will be shorter.

A

An MWT will not always have an advantage; it is more likely to have an advantage, but NOT ALWAYS.

75
Q

An AVL Tree is a special type of Binary Search Tree that maintains a tight balance factor (−1, 0, or +1) at every node

A

All AVL trees are binary search trees, but not all binary search trees are AVL trees

76
Q

Doing just a (left or right) rotation at the root without rebalancing is O(1).

A

This is because the only nodes affected are the 2-3 nodes connected to the node being rotated, and the readjustment of pointers is O(1).

In general, the higher up the node, the lower the time complexity.

77
Q

What data structure can maintain a dynamic set with operations Insert(x,S), Delete(x,S), and Member(x,S) with an expected run time of O(1) per operation?

A

Hash Table

78
Q

Is this array a max heap?

The array
20 15 18 7 9 5 12 3 6 2

A

True

Using 1-based indexing: a[1] has two children a[2], a[3] that are smaller; a[2] has two children a[4] and a[5]; and so on down the array.

79
Q

Suppose that a hash table with collisions resolved by chaining contains n items and has a load factor of ALPHA = 1/(log_2 n). Assuming simple uniform hashing, the expected time to search for an item in the table is O(1/log_2 n).

FALSE

A

FALSE

The expected time to search for an item in the table is O(1 + ALPHA) = O(1 + 1/log_2 n) = O(1)

At least a constant running time O(1) is needed to search for an item; subconstant running time O(1/log_2 n) is not possible

80
Q

Suppose that a hash table of m slots contains a single element with key k and the rest of the slots are empty. Suppose further that we search r times in the table for various other keys not equal to k. Assuming simple uniform hashing, the probability is r/m that one of the r searches probes the slot containing the single element stored in the table.

FALSE

A

FALSE

The probability p that one of the r searches probes the slot containing the single element stored in the table is equal to 1 minus the probability that none of the r searches does. That is, p = 1 − (1 − 1/m)^r.

81
Q

Let S be a set of n integers. One can create a data structure for S so that determining whether an integer x belongs to S can be performed in O(1) time in the worst case.

TRUE

A

TRUE

Perfect Hashing O(1)

82
Q

In a BST, we can find the next smallest element to a given element in O(1) time.

FALSE

A

O(h)

h = height (log_2 n when the tree is balanced)

83
Q

In an AVL tree, during the insert operation there are at most two rotations needed.

A

TRUE. AVL property restored on every operation. Therefore, inserting another item will require at most two rotations to restore the balance.

84
Q

Linked List is a container object made up of zero(0) or more nodes

A

A LinkedList can be used as a data structure holding a sequence of data/objects allocated on the HEAP.

HW5 (CSE12 GARY) Objects are nameless

85
Q

Tree Traversal

Depth First Search
PRE-ORDER: V, L, R
IN-ORDER: L, V, R
POST-ORDER: L, R, V

Breadth First Search
LEVEL Order Traversal: top to bottom and left to right
Layers are distances from the root

A

Post-order is guaranteed to visit a node’s descendants before the node itself.

Pre-order visits each node before any of its descendants. (See the Java sketch below.)
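
The four orders as a Java sketch (TreeNode and visit are illustrative names):

    import java.util.ArrayDeque;
    import java.util.Queue;

    /** Traversal orders on a binary tree (V = visit, L = left, R = right). */
    class TreeNode {
        int value;
        TreeNode left, right;

        void preOrder()  { visit(this); if (left != null) left.preOrder();  if (right != null) right.preOrder(); }  // V, L, R
        void inOrder()   { if (left != null) left.inOrder();   visit(this); if (right != null) right.inOrder(); }   // L, V, R
        void postOrder() { if (left != null) left.postOrder(); if (right != null) right.postOrder(); visit(this); } // L, R, V

        void levelOrder() {                          // BFS: top to bottom, left to right
            Queue<TreeNode> q = new ArrayDeque<>();
            q.add(this);
            while (!q.isEmpty()) {
                TreeNode n = q.poll();
                visit(n);
                if (n.left != null) q.add(n.left);
                if (n.right != null) q.add(n.right);
            }
        }

        static void visit(TreeNode n) { System.out.println(n.value); }
    }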

86
Q

Unbalanced:
worst case is O(n)
height is n−1

Balanced:
worst case is O(log n)

BST worst case is O(n)

A

A self-balancing BST gives a guarantee on the tree’s branch heights.

Best case: O(1)

Worst case: O(n) if the item can’t be found or the tree is unbalanced

Average: E[Y], the expected search depth

87
Q

BST Remove Algorithm

Case 1: no children (easy), just remove it

Case 2: 1 child

Case 3: 2 children

A

For Case 3, replace the node with its successor: the leftmost node in the right subtree.

88
Q

Queue is an ADT
First in First Out (FIFO)

STACK is Last In First Out (LIFO)

A

Queue implemented as Linked List

Stack implemented as Linked List

Queue or Stack can be implemented as an ArrayList

A data structure can implement “any” ADT, but speed can vary depending on which data structure is used.

89
Q

Data Structure

How ADTs are implemented

Holds data, relates the data, operates on the data

Concrete

Built on ADT

A

Array List

Binary Search Tree

Heap

Linked List

Hashmap

90
Q

Tree is a GRAPH with NO CYCLES

GRAPH is an ADT and the DATA STRUCTURE is the TREE

A

A car is like an ADT; the engine is the data structure.

How the car is built is the data structure.

91
Q
Queue - FIFO
enqueue: add to back
dequeue: remove from front
               BEST     WORST
Space:   O(n)         O(n)
Search:  O(n)         O(n)
Insert:  O(1)         O(1)
Delete:  O(1)         O(1)
A

Dynamic memory allocation: “new” memory allocated during runtime, as opposed to static memory, assigned at start of program (constants, variables, etc.)

92
Q

SET is ADT in C++

ADT: no description of how data is organized
- describes functions, not how it’s implemented

How an ADT is implemented describes organization of the data

A

Array can implement SET

Hashtable can implement MAP ADT

C++ uses RBT for MAP ADT

93
Q

Red-Black Tree

Tree runtimes are related to height. A Red-Black tree doesn’t try to minimize the height of its subtrees exactly; it only constrains how the nodes are organized so as to minimize insertion work. The constraints still keep the worst case at O(log n), which is the height.

A

When rearranging, only rotate nodes within the path of insertion; you never have to deal with other subtrees unrelated to the path of insertion.

Height will always be a factor in runtime, but an RBT minimizes insertion work: search can take somewhat longer, but insertion is faster.

Before and after an insertion, the RBT satisfies its balance invariants (approximately, not perfectly, balanced).

94
Q

AVL Trees

Self-balancing BSTs that compute each node’s balance factor and keep it within {−1, 0, +1}

A

AVL insert will always need to go down to insert and back up to check balance and rebalance if necessary.

95
Q

A tree is a container object composed of 0 or more nodes.

TRUE, still a tree, but empty

A

The first node of a TREE is the root

96
Q

Search Path: nodes visited from root to a leaf in search of an item in a tree

A

Term for visiting all nodes in a Tree: Traversal

97
Q

A recursive solution involves less code than a loop based solution. TRUE

When a program overflows the Run-Time Stack, the cause is usually infinite Recursion

A

Loop-based solutions often use less memory than recursive solutions.

98
Q

Hashtables and Hashing

Hash Functions, properties of a good hash function, examples of hash functions, determining good vs. bad hash functions.

Probability of collisions

Collision resolution: open addressing (linear probing, double hashing, random hashing, cuckoo hashing), separate chaining.

Running time for hashtables and issues (load factor)

How to choose hash table size

Hashtables vs. Tree/List data structure (advantages and disadvantages)

A

Read/write code that uses arrays on the stack or heap, such as ArrayStack

Trees: understand/implement operations on a tree (BST or trie) such as adding elements or performing traversals, and understand qualities of trees such as balance.

Graphs: look at a graph and answer questions about it (connectedness, cyclicity, degrees); understand the execution of graph algorithms such as DFS, BFS, Dijkstra’s.

99
Q

Amortized

Any series of m operations on a two-stack queue will take time O(m).

Every element is pushed at most twice and popped at most twice.

Intuition: Average case is O(1) per operation, total work done is O(m). True, but not precise.
Total work done: THETA (m)
Total Operations: THETA (m)

Key Idea: Backcharge expensive operations to cheaper ones. We overestimate the average cost of EACH operation so we never underestimate the overall amount of work that we do

No matter when we stop performing operations, the actual cost of the operations performed will be LESS THAN (or equal to) the ceiling represented by the total amortized cost of those operations.

A

Key Idea: Design Data structures that trade per-operation efficiency for overall efficiency

Use two-stack queue as example

We only do expensive dequeues after a long run of cheap enqueues (dishwasher: we slowly introduce a lot of dirty dishes to get cleaned up all at once. Provided we clean up all the dirty dishes at once, and provided that dirty dishes accumulate slowly, this is the fastest strategy).

Dam Example: Lots of expensive up front work but payoff over time (long term). The average work done at each point in time is high until lots of operations are performed. Early expensive operations, cheap later ones.

Dish Washer Example: Lots of cheap operations that need to be made up for by an expensive one later. The average work done at each point in time is low.

Grocery Store Example: unlikely there will be large operations because of the randomization, but every now and then, we run into trouble when demand for one item is high. Performs well in expectation, but can’t guarantee efficiency.

100
Q

When Amortization Works?

Array/Vector: most appends take O(1) and consume free space. Occasionally an append takes O(n) but produces a lot of free space. The amortized cost of any append is O(1). (See the sketch after this card.)

Binary Tree: insertion takes O(log n) and can unbalance the tree, but with rebalancing, search and remove can also be O(log n).

A

Amortization works best if:

  1. imbalances accumulate slowly, and
  2. imbalances get cleaned up quickly
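
A minimal Java sketch of the Array/Vector case (a doubling array, illustrative only):

    /** Doubling-array append: occasional O(n) grow step, amortized O(1) per append. */
    class DynArray {
        private int[] data = new int[1];
        private int size = 0;

        void append(int x) {
            if (size == data.length) {                   // rare expensive step...
                int[] bigger = new int[2 * data.length]; // ...produces lots of free space
                System.arraycopy(data, 0, bigger, 0, size);
                data = bigger;
            }
            data[size++] = x;                            // common cheap step: O(1)
        }
    }
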
101
Q

Amortized Analysis with the Banker’s Method

To perform an amortized analysis using the banker’s method, do the following:
● Figure out the actual runtimes of each operation.
● Indicate where you’ll place down credits, and compute the amortized cost of operations that place credits this way.
● Indicate where you’ll spend credits, and justify why the credits you intend to spend are guaranteed to be there. Then, compute the amortized cost of each operation that spends credits this way.

A

● It doesn’t matter where these credits are placed or removed from.
● The total number of credits added and removed doesn’t matter; all that matters is the difference between these two.