Data Structures Q & A Flashcards
Priority Queue vs. Heap?
Priority Queue is an Abstract Data Type, and Heap is the concrete data structure we use to implement a priority queue.
Describe the operations a Priority Queue supports.
A priority queue is a data structure that consists of a collection of items and supports the following operations:
insert: insert an item with a key
delete_min/delete_max: remove the item with the smallest/largest key and return it
We use an array as the implementation but model it using a complete binary tree which is a heap.
Note that - we only allow getting and deleting the element with min/max key and NOT arbitrary key.
Describe the use cases for a Priority Queue? (Heap)
Give a real world case as well.
Partially sorted data structure with O(1) access to the min/max -> useful for anything that needs ordering by priority
It’s good for Top K problems, i.e. ranking: get the top 10 user scores in a leaderboard, or the top 10 liked/upvoted links
Alternatives to those problems of course are sorted sets (hash table + skip list)
A real world case is hospital triage, which is a priority queue. Patients are sorted by the severity of their condition. For example, a person with a cold comes in and is placed near the end of the queue. Then a person who had a car accident comes in and is placed before the person with the cold, even though he arrived later, because his condition is more severe. Severity is the key in this case.
Coding Problem Examples
Consider a problem like Merge K sorted lists. We need to keep track of min value among k elements (smallest element in each sorted list) and remove the min value and insert new values at will while still having access to the min value at any point in time. Here are some other problems where the priority queue comes in handy.
Merge K sorted lists
K closest points to origin
Kth largest element
Kth largest element in a stream
Median of a data stream (multiple heaps)
Uniform Cost Search
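A minimal sketch of the Merge K Sorted Lists idea above, using plain Python lists instead of linked lists for simplicity:

```python
import heapq

def merge_k_sorted(lists):
    """Merge k sorted lists by repeatedly popping the global minimum from a heap."""
    heap = []  # entries are (value, list index, element index)
    for i, lst in enumerate(lists):
        if lst:
            heapq.heappush(heap, (lst[0], i, 0))
    merged = []
    while heap:
        val, i, j = heapq.heappop(heap)
        merged.append(val)
        if j + 1 < len(lists[i]):  # push the next element from the same list
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return merged
```

The heap never holds more than k entries, so merging n total elements costs O(n log k).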
Python heapq, how to create a heap?
Take a list and call heapq.heapify(A) to rearrange it in place into a min-heap in O(n). Note this does not fully sort the list; it only establishes the heap property.
Reference on why: https://python.plainenglish.io/heapify-in-linear-time-114a15487ba1
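A quick sketch of the heapq basics (standard-library names):

```python
import heapq

A = [5, 3, 8, 1, 9]
heapq.heapify(A)             # rearranges A in place into a min-heap in O(n)
assert A[0] == 1             # A[0] is always the smallest element

heapq.heappush(A, 0)         # O(log n) insert
smallest = heapq.heappop(A)  # O(log n) delete_min, returns 0 here
# For a max-heap, the usual trick is to push negated values.
```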
Bucket Sort? Describe bucket sort?
An O(n) technique for an n-length array, where the index acts as a “counter” or key; there must be some bound on the index range. The entries themselves can be the values, as in the “Top K Frequent Elements” LeetCode problem.
An array whose index is bounded by some constant, where each entry holds a list of values in case multiple items share the same key.
https://en.wikipedia.org/wiki/Bucket_sort
The computational complexity depends on the algorithm used to sort each bucket, the number of buckets to use, and whether the input is uniformly distributed.
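A sketch of the bucket-by-frequency idea for the “Top K Frequent Elements” problem mentioned above:

```python
from collections import Counter

def top_k_frequent(nums, k):
    """Bucket sort by count: bucket index = frequency, entries = values."""
    counts = Counter(nums)
    # A value can appear at most len(nums) times, so bucket indices are bounded.
    buckets = [[] for _ in range(len(nums) + 1)]
    for val, freq in counts.items():
        buckets[freq].append(val)
    result = []
    for freq in range(len(buckets) - 1, 0, -1):  # highest frequency first
        for val in buckets[freq]:
            result.append(val)
            if len(result) == k:
                return result
    return result
```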
What is constant time complexity?
What are some example O(1) operations?
O(1) is constant time complexity
Looking up a hashmap entry, or evaluating a fixed math formula. Searching over a variable bounded by a constant k is also technically constant time.
What is the time complexity of permutations?
O(N!)
This is the fastest-growing time complexity here, and such problems often require memoization to avoid repeated computations and reduce complexity.
Example: Combinatorial problems, backtracking - ordering matters like permutations
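A minimal backtracking sketch that generates all N! orderings:

```python
def permutations(nums):
    """Generate every ordering of nums via backtracking."""
    result = []

    def backtrack(path, remaining):
        if not remaining:              # nothing left to place: one permutation done
            result.append(path[:])
            return
        for i in range(len(remaining)):
            path.append(remaining[i])
            backtrack(path, remaining[:i] + remaining[i + 1:])
            path.pop()                 # undo the choice

    backtrack([], nums)
    return result
```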
What is the time complexity of finding subsets?
O(2^N)
Order doesn’t matter, unlike permutations.
Grows very rapidly. Often requires memoization to avoid repeated computations and reduce complexity.
Combinatorial problems, backtracking, e.g. subsets
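A short sketch of where the 2^N comes from: each element is either in or out of a subset, so every element doubles the count.

```python
def subsets(nums):
    """Build all 2^N subsets iteratively."""
    result = [[]]
    for x in nums:
        result += [s + [x] for s in result]  # every existing subset, with x added
    return result
```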
What is quadratic time?
O(N^2)
Nested loops, e.g., visiting each matrix entry
Many brute force solutions
For small N < 1000, this is actually not that bad for modern computers. In fact, you can probably pass most Leetcode tests with quadratic time for small Ns. However, in an interview, your solution usually has to do better than quadratic time to impress the interviewer if there is a better solution.
Two nested for loops
What is the time complexity of efficient sorts?
O(n log n). In merge sort, the divide step produces O(log n) levels, and each level does O(n) merge work.
Sorting. The default sorting algorithm’s expected runtime in all mainstream languages is O(N log N). For example, Java uses a variant of merge sort for object sorting and a variant of quicksort for primitive types.
Divide and conquer with a linear time merge operation. Divide is normally log(N), and if merge is O(N) then the overall runtime is O(N log(N)). An example problem is smaller numbers to the right.
Time complexity of using a heap to find top k in a max or min heap?
O(k log n)
O(n) to build heap using heapify
k calls to pop with each pop costing log(n) => O(k log n)
Heap push/pop K times. When you encounter problems that look for the “top K elements”, you can usually solve them with pushing and popping to a heap, K times, resulting in O(K log(N)) runtime. e.g., K closest points, merge K sorted lists.
Binary search repeated K times is another O(K log N) pattern.
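A sketch of the heap-based top-K pattern using K Closest Points as the example (heapify everything, then pop k times):

```python
import heapq

def k_closest(points, k):
    """O(n) heapify plus k pops at O(log n) each => O(n + k log n)."""
    heap = [(x * x + y * y, (x, y)) for x, y in points]  # key = squared distance
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(k)]
```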
Name linear time operations?
Linear time normally means looping through a linear data structure a constant number of times. Most commonly this means
Going through an array/linked list
Two pointers
Some types of greedy algorithms
Tree/graph traversal
Stack/Queue
Describe log time complexity
log(N) grows VERY slowly. log(1,000,000) is only about 20. In fact, lookup by primary key in a relational database is log(N) (many mainstream relational databases such as Postgres use B-trees for indexing by default, and search in a B-tree is log(N)).
In coding interviews, log(N) typically means
Binary search or variant
Balanced binary search tree lookup
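A minimal binary search, the canonical O(log N) routine:

```python
def binary_search(arr, target):
    """Halve the search range each step; O(log n) on a sorted array."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # not found
```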
Top K problems what algorithm to use?
Heaps, but some problems have better ways to improve this to a linear run time like:
- binary search
- quickselect
- bucket sort
A heap always beats a full sort for Top K, but in some cases the techniques above achieve an even better runtime.
“How many ways” can you do something?
Dynamic Programming or DFS (backtracking)
What algorithm when it comes to substring?
Sliding window, dynamic or static.
Example: Longest substring without repeating characters (dynamic sliding window)
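A sketch of the dynamic sliding window for that example problem:

```python
def longest_unique_substring(s):
    """Grow the window on the right; jump the left edge past any repeat."""
    seen = {}          # char -> last index where it appeared
    left = best = 0
    for right, ch in enumerate(s):
        if ch in seen and seen[ch] >= left:
            left = seen[ch] + 1           # shrink past the previous occurrence
        seen[ch] = right
        best = max(best, right - left + 1)
    return best
```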
What algorithms for tree structures?
Trees are a special case of graphs (connected and acyclic), so graph algorithms apply:
BFS (level order)
DFS (preorder, inorder, postorder)
Height, LCA, Diameter (longest path; at each node, combine the heights of both subtrees), and other common operations
What algorithm/data structure for “Parentheses”?
Stack
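A minimal stack solution to the classic Valid Parentheses problem:

```python
def is_valid(s):
    """Push opening brackets; each closer must match the most recent opener."""
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for ch in s:
        if ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
        else:
            stack.append(ch)
    return not stack  # everything opened must also be closed
```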
X Sum problems?
Two Pointer
Kinds of Two Pointer problems?
(1) Fast and Slow Pointers (cycle detection in linked lists)
(2) Starting from both ends (binary search, reverse, palindrome), called collision pointers, with subtypes two sum and partition
(3) Forward, with subtype sliding window (dynamic/static) and fast/slow pointers
(4) Parallel - 2 arrays and one pointer for each
Others that can’t be categorized are also in the post
Reference: https://jojozhuang.github.io/algorithm/algorithm-two-pointers/
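A sketch of the collision (both-ends) subtype on the Two Sum variant with a sorted input:

```python
def two_sum_sorted(nums, target):
    """Collision pointers on a sorted array: O(n) time, O(1) space."""
    left, right = 0, len(nums) - 1
    while left < right:
        s = nums[left] + nums[right]
        if s == target:
            return (left, right)
        elif s < target:
            left += 1    # need a bigger sum
        else:
            right -= 1   # need a smaller sum
    return None
```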
“Subarray” problem algorithm patterns?
Prefix sum - subarray sum
Hashmap - continuous subarray sum
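A sketch of the prefix sum + hashmap pattern for “Subarray Sum Equals K”:

```python
from collections import defaultdict

def subarray_sum(nums, k):
    """Count subarrays summing to k: a subarray ending here sums to k
    exactly when some earlier prefix equals (current prefix - k)."""
    seen = defaultdict(int)
    seen[0] = 1              # the empty prefix
    prefix = count = 0
    for x in nums:
        prefix += x
        count += seen[prefix - k]
        seen[prefix] += 1
    return count
```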
Max / Longest Sequence algo pattern?
Dynamic programming, DFS: Longest increasing subsequence
mono deque: Sliding window maximum
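A sketch of the monotonic deque for Sliding Window Maximum:

```python
from collections import deque

def sliding_window_max(nums, k):
    """Keep a decreasing deque of indices; the front is the window's max."""
    dq, result = deque(), []
    for i, x in enumerate(nums):
        while dq and nums[dq[-1]] <= x:  # smaller elements can never be the max
            dq.pop()
        dq.append(i)
        if dq[0] <= i - k:               # front index fell out of the window
            dq.popleft()
        if i >= k - 1:
            result.append(nums[dq[0]])
    return result
```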
“Minimum/Shortest” pattern?
DP, DFS, BFS
Dynamic programming, DFS: Minimal path sum
BFS: Shortest path
Definition of a BST?
API?
A “Binary Search Tree”, or BST, is a binary tree with the following properties:
All values in a node’s left subtree are less than the node’s value.
All values in a node’s right subtree are greater than the node’s value.
These properties must hold for every node in the tree.
An empty tree is a BST since it satisfies the BST properties. The values of each node on a BST can be of any type, as long as they are comparable with each other. For example, you can have an integer BST, a string BST, and so on.
API
find(x) - search, O(h), which is O(log n) when balanced
insert(x) - O(h)
delete(x) - O(h)
Essentially a key-value store ordered by keys; an ordered dictionary or a sorted set is a balanced binary search tree.
h = log n when balanced; worst case h = n (so O(n) operations) if not balanced.
Implement a BST in 30 lines of python
Node
and find(), and insert()
class Node:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def find(tree, val):
    if tree is None:
        return False
    if tree.val == val:
        return True
    elif tree.val < val:
        return find(tree.right, val)
    else:
        return find(tree.left, val)

def insert(tree, val):
    if tree is None:
        return Node(val)
    if tree.val < val:
        tree.right = insert(tree.right, val)
    elif tree.val > val:
        tree.left = insert(tree.left, val)
    return tree
What is BST used for?
Advantages over a hash table?
Used to look up existence of certain objects
Better than sorted arrays as insertion is faster
If you don’t need to dynamically insert new items, then you can simply sort the collection first and use binary search to look up.
Advantages over a hash table include
- Hash tables are unsorted, while BSTs keep keys in sorted order. If you want to constantly maintain a sorted order while inserting, a BST is more efficient than a hash table or a sorted list.
- It’s easier to look up the first element greater/smaller than a target value in a BST than in a hash table.
- It’s easy to find the k-th largest/smallest element.
- Dynamic hash tables usually keep a lot of unused memory in order to make insertion/deletion approach O(1), whereas a BST uses all the memory it requested.
DFS and thinking like a node, as a node, 2 concerns are?
Write DFS algo for Tree template
Two things we need to decide to define the function?
- Your value
- How to get your children
function dfs(node, state):
    if node is null:
        ...
        return ...
    left = dfs(node.left, state)
    right = dfs(node.right, state)
    ...
    return ...
Two things we need to decide to define the function:
- Return value (passing value up from child to parent)
What do we want to return after visiting a node? For example, for the max depth problem this is the max depth of the current node’s subtree. If we are looking for a node in the tree, we’d want to return that node if found, else return null. Use the return value to pass information from children to parent.
- Identify state(s) (passing value down from parent to child)
What states do we need to maintain to compute the return value for the current node? For example, to know whether the current node’s value is larger than its parent’s, we must maintain the parent’s value as a state. States become the DFS function’s arguments. Use states to pass information from parent to children.
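Applying the template to max depth, where the return value carries the subtree depth up and no extra state is needed:

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def max_depth(node):
    """Each call returns its subtree's depth; the parent combines its children."""
    if node is None:
        return 0
    return 1 + max(max_depth(node.left), max_depth(node.right))
```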
What type of problem is this?
“How many ways are there to arrange something”
Combinatorial, use DFS (backtracking)
What type of problem is this?
“Find all possible combinations of”
“Find all solutions to a puzzle”
DFS/backtracking
What algo to use on a tree to
Traverse and find/create/modify/delete node
Traverse with return value (finding max subtree, detect balanced tree)
DFS - pre order, in order, post order
Can use BFS to traverse a tree too, though.
Algorithm to solve this?
Find a path from point A to B
Find connected components
Detect cycles
Find a path from point A to B - DFS, BFS, Shortest Path
Find connected components - DFS and BFS
Detect cycles - DFS; check whether we encounter an already-visited node during traversal. BFS also works for undirected graphs, but go with DFS for cycle detection in general.
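A sketch of DFS cycle detection for a directed graph, using the usual three-color scheme (revisiting a node that is still on the current recursion path signals a cycle):

```python
def has_cycle(graph):
    """graph: dict mapping each node to a list of neighbor nodes."""
    WHITE, GRAY, BLACK = 0, 1, 2     # unvisited / on current path / finished
    color = {u: WHITE for u in graph}

    def dfs(u):
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == GRAY:     # back edge to the current path: cycle
                return True
            if color[v] == WHITE and dfs(v):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and dfs(u) for u in graph)
```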