Data Structures and Algorithms Flashcards
Name 4 enumeration use cases for backtracking
- subset
- permutation
- combination sum
- palindrome partitioning
What algorithm does backtracking use and what’s a “must have” function for backtracking
- uses recursion
- uses DFS
- need an isValid function to check whether the partial solution is still valid
What are the two implementations of a graph
- adjacency matrix
- 2D array of boolean where true indicates an edge exists between the two nodes
- adjacency list
- Map<T, List<T>>
- map of a node and all of its neighbors
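A minimal sketch of both representations, assuming vertices labeled 0..n-1 and an undirected edge list (class and method names are illustrative):

import java.util.*;

class GraphRepresentations {
    // Adjacency matrix: matrix[i][j] == true means an edge exists between i and j.
    boolean[][] buildMatrix(int n, int[][] edges) {
        boolean[][] matrix = new boolean[n][n];
        for (int[] e : edges) {
            matrix[e[0]][e[1]] = true;
            matrix[e[1]][e[0]] = true; // undirected: mark both directions
        }
        return matrix;
    }

    // Adjacency list: each node maps to the list of its neighbors.
    Map<Integer, List<Integer>> buildList(int n, int[][] edges) {
        Map<Integer, List<Integer>> adj = new HashMap<>();
        for (int i = 0; i < n; i++) adj.put(i, new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]); // undirected
        }
        return adj;
    }
}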
Name 2 differences between graph and tree traversal
- For graph, need to keep track of “visited”
- For graph, need to consider a disjoint graph - use an outer while loop
What data structures are used for BFS vs DFS
BFS uses a Queue (class LinkedList)
DFS uses a Stack (class Stack)
Describe BFS to visit all vertices for a graph (iterative approach)
- prep (before the loop)
- add starting node to the queue
- while loop (process the queue)
- look at the head of the queue
- if it's already visited, remove it from the queue and continue to the next iteration
- else, add it to “visited”
- for all its neighbors, check if they are visited, if not, add to the queue
- remove it from the queue
- stop condition: queue.isEmpty()
- the algorithm is iterative, not recursive (see the sketch below)
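A minimal sketch of the iterative BFS above, assuming an adjacency list Map<Integer, List<Integer>> and a given starting node (names are illustrative; poll() removes the head immediately rather than in a separate step):

import java.util.*;

class GraphBFS {
    void bfs(Map<Integer, List<Integer>> adj, int start) {
        Set<Integer> visited = new HashSet<>();
        Queue<Integer> queue = new LinkedList<>();
        queue.add(start);                         // prep: seed the queue with the starting node
        while (!queue.isEmpty()) {                // stop condition: queue.isEmpty()
            int node = queue.poll();              // take the head of the queue
            if (visited.contains(node)) continue; // already processed, skip
            visited.add(node);
            System.out.println(node);             // "visit" the node
            for (int neighbor : adj.get(node)) {
                if (!visited.contains(neighbor)) queue.add(neighbor);
            }
        }
    }
}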
Describe DFS for a graph (iterative approach)
- push starting node into stack
- while stack is not empty
- pop a node from the stack
- if it's already visited, skip it; otherwise mark it visited
- for all neighbors of the node
- check if visited, if not, push to stack
- stop condition: stack is empty
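A minimal sketch of the iterative DFS above, with the same adjacency-list assumption as the BFS sketch:

import java.util.*;

class GraphDFSIterative {
    void dfs(Map<Integer, List<Integer>> adj, int start) {
        Set<Integer> visited = new HashSet<>();
        Stack<Integer> stack = new Stack<>();
        stack.push(start);                        // push starting node
        while (!stack.isEmpty()) {                // stop condition: stack is empty
            int node = stack.pop();               // pop node from stack
            if (visited.contains(node)) continue; // a node may be pushed more than once
            visited.add(node);                    // mark visited
            System.out.println(node);             // "visit" the node
            for (int neighbor : adj.get(node)) {
                if (!visited.contains(neighbor)) stack.push(neighbor);
            }
        }
    }
}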
Describe DFS for graph using recursion and some things to note
procedure dfs(vertex v) {
    visit(v);
    for each neighbor u of v
        if u is undiscovered
            call dfs(u);
}
- Note this implementation does not handle disjoint graph
- For a disjoint graph, we need a separate DFS call for each connected component (see the sketch below)
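A minimal recursive sketch with an outer loop over all nodes so disjoint graphs are handled (names are illustrative):

import java.util.*;

class GraphDFSRecursive {
    void dfsAll(Map<Integer, List<Integer>> adj) {
        Set<Integer> visited = new HashSet<>();
        for (int node : adj.keySet()) {           // outer loop covers every connected component
            if (!visited.contains(node)) dfs(adj, node, visited);
        }
    }

    private void dfs(Map<Integer, List<Integer>> adj, int v, Set<Integer> visited) {
        visited.add(v);
        System.out.println(v);                    // visit(v)
        for (int u : adj.get(v)) {
            if (!visited.contains(u)) dfs(adj, u, visited); // recurse into undiscovered neighbors
        }
    }
}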
Name 4 DFS applications (1 of which is not studied so it’s ok to only name 3)
- detect cycle in graph
- path finding
- finding strongly connected components of a graph
- backtracking
Tradeoffs between DFS and BFS (incomplete)
BFS
- can consume a lot more memory when the branching factor of a tree is huge
DFS
- can take a long time to visit other neighboring nodes if the depth of the tree is huge
Advantage of using adjacency list over adjacency matrix
- adjacency list uses less space; better for sparse graphs
Implement permutation using backtrack
Highlights (compared to subsets)
- add to solution only if the length matches
- no need to have a start pointer
- need to check if the current item is already in visited
/**
 * This implementation does not handle the input if it contains duplicate values.
 * input: 1, 2, 3
 * output: [[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]
 *
 * https://leetcode.com/problems/permutations/
 *
 * @param nums
 * @return
 */
public List<List<Integer>> permute(int[] nums) {
    List<List<Integer>> solution = new ArrayList<>();
    permuteBacktrack(solution, new ArrayList<>(), nums);
    return solution;
}

private void permuteBacktrack(List<List<Integer>> solution, List<Integer> visited, int[] nums) {
    if (visited.size() == nums.length) {
        solution.add(new ArrayList<>(visited));
    }
    for (int i = 0; i < nums.length; i++) {
        if (visited.contains(nums[i])) continue; // element already exists, skip
        visited.add(nums[i]);
        permuteBacktrack(solution, visited, nums);
        visited.remove(visited.size() - 1);
    }
}
Implement subsets using backtrack
Algorithm highlights:
- because we use recursion, any temp data structures are passed in as args instead of return values
- args are: solution, visited (list), input (array), start pointer
- have a for loop to iterate thru the input array
- visit the current item
- recursion with start = i + 1
- remove the last item in visited
- essentially “visited” is used as a “stack”
public List<List<Integer>> subsets(int[] nums) {
    List<List<Integer>> solution = new ArrayList<>();
    subsetsBacktrack(solution, new ArrayList<>(), nums, 0);
    return solution;
}

private void subsetsBacktrack(List<List<Integer>> solution, List<Integer> visited, int[] nums, int start) {
    solution.add(new ArrayList<>(visited));
    for (int i = start; i < nums.length; i++) {
        visited.add(nums[i]);
        subsetsBacktrack(solution, visited, nums, i + 1);
        visited.remove(visited.size() - 1);
    }
}
Find substring problem - high level description of the approach
Problem: find a min substring in s that satisfies t
* Input: s = "ADOBECODEBANC", t = "ABC" * Output: "BANC"
- use a “two pointer” approach by using start/end pointer to iterate thru “s”
- outer while loop to “expand” the window, move “end” pointer
- outer loop ends when “end” points at last char in “s”
- inner while loop to “shrink” the window, move the “start” pointer
- “shrink” until the “window” no longer satisfies “t”
Find substring problem - what are the data to keep track of
Problem: find a min substring in s that satisfies t
* Input: s = "ADOBECODEBANC", t = "ABC" * Output: "BANC"
- a char map - to keep track of what's matched
- start/end pointers
- minStart/minLength
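A minimal sketch of the two-pointer approach from the two cards above (LeetCode 76), assuming ASCII input so an int[128] can serve as the char map; names are illustrative:

class MinWindowSubstring {
    public String minWindow(String s, String t) {
        int[] need = new int[128];                   // char map of what still needs to be matched
        for (char c : t.toCharArray()) need[c]++;
        int missing = t.length();                    // total chars still missing from the window
        int minStart = 0, minLength = Integer.MAX_VALUE;
        int start = 0;
        for (int end = 0; end < s.length(); end++) { // outer loop: expand the window
            if (need[s.charAt(end)] > 0) missing--;
            need[s.charAt(end)]--;
            while (missing == 0) {                   // inner loop: shrink while the window still satisfies t
                if (end - start + 1 < minLength) {
                    minLength = end - start + 1;
                    minStart = start;
                }
                need[s.charAt(start)]++;
                if (need[s.charAt(start)] > 0) missing++;
                start++;
            }
        }
        return minLength == Integer.MAX_VALUE ? "" : s.substring(minStart, minStart + minLength);
    }
}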
Implement isPalindrome
public boolean isPalindrome(String s, int low, int high) {
    while (low < high)
        if (s.charAt(low++) != s.charAt(high--))
            return false;
    return true;
}
Important points
- use s.charAt()
- while (low < high), not <= because no need to compare the middle char if length is odd
Implement the template for binary search
- how to compare left and right (< or <=)
- mid rounds up or down, in java, what method to call
- how to set mid pointer
- how to set left and right after each iteration
- after exiting the loop, which pointer do we return
- while loop is left < right
- mid rounds down; in Java, use Math.floorDiv(int x, int y) -> int
- mid is left + Math.floorDiv((right-left), 2)
- left = mid + 1, right = mid
- return left
- after exiting the while loop, return left or right depending on whether we need to find the min or the max
def binary_search(array) -> int:
    def condition(value) -> bool:
        pass

    left, right = min(search_space), max(search_space)  # could be [0, n], [1, n] etc. Depends on problem
    while left < right:
        mid = left + (right - left) // 2  # means round down
        if condition(mid):
            right = mid
        else:
            left = mid + 1
    return left
Name 3 basic applications for binary search
Basic applications
- find the first “bad” version (like git bisect)
- compute square root
- condition is k^2 > x, then k-1 is the answer
- right = x+1 to deal with edge conditions like x = 0 and x = 1
- search insert positions in a sorted array
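A minimal sketch of the square-root application using the template above; long is used so mid * mid cannot overflow (an assumption about input range, not part of the original card):

class MySqrt {
    // Find the first k in [0, x+1] with k*k > x; the answer is k - 1.
    public int mySqrt(int x) {
        long left = 0, right = (long) x + 1;      // right = x + 1 handles x = 0 and x = 1
        while (left < right) {
            long mid = left + (right - left) / 2; // rounds down
            if (mid * mid > x) {                  // condition(mid)
                right = mid;
            } else {
                left = mid + 1;
            }
        }
        return (int) (left - 1);                  // first k with k*k > x, minus 1
    }
}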
When can we use binary search
Give a sample problem of an advanced application
In this sample problem, what are the min and max values for the search space
When we can identify a search space.
Sample problem - determine the min ship capacity to ship all the shipments within x days.
Input: weights = [1,2,3,4,5,6,7,8,9,10], days = 5
Approach
- search space is left = max(weights), right = sum(weights)
- condition function returns true if it’s possible to ship everything within D days with a given capacity
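A sketch of this advanced application (LeetCode 1011: Capacity To Ship Packages Within D Days), following the search space and condition function described above; canShip is an illustrative helper name:

class ShipWithinDays {
    public int shipWithinDays(int[] weights, int days) {
        int left = 0, right = 0;
        for (int w : weights) {
            left = Math.max(left, w);   // capacity must at least fit the heaviest shipment
            right += w;                 // shipping everything in one day is always enough
        }
        while (left < right) {
            int mid = left + (right - left) / 2;
            if (canShip(weights, days, mid)) {
                right = mid;
            } else {
                left = mid + 1;
            }
        }
        return left;                    // minimum feasible capacity
    }

    // condition function: true if everything ships within `days` at this capacity
    private boolean canShip(int[] weights, int days, int capacity) {
        int needed = 1, load = 0;
        for (int w : weights) {
            if (load + w > capacity) {  // this shipment starts a new day
                needed++;
                load = 0;
            }
            load += w;
        }
        return needed <= days;
    }
}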
What are the variations of the sliding window problem
“max” window
- the initial condition is valid
- goal is to find the “max” window
- example: find the longest substring of non-repeated chars in a string
“min” window
- the initial condition is invalid
- goal is to find the “min” window where it’s valid
- example: find the substring problem
What are the two implementations of the “max window” problem
“shrinkable” solution
- right pointer expands until the window is invalid
- then shrink the left pointer until it’s valid again
- keep track of the “max” result
“non shrinkable” solution
- based on the observation that once we have found a "max" window so far, there is no need to shrink below that size
- we "shift" the window by incrementing the "left" pointer when the window is invalid
- note there is no inner loop; it's an "if" statement instead
- this essentially "shifts" the window (without changing its size) when it's invalid, thus keeping track of the "max" result seen so far (see the sketch below)
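A minimal sketch of both variations on the "longest substring of non-repeated chars" example, assuming ASCII input; names are illustrative:

class LongestNonRepeatingSubstring {
    // "Shrinkable": expand right; when the window becomes invalid, shrink left until valid again.
    public int maxWindowShrinkable(String s) {
        int[] count = new int[128];
        int max = 0, left = 0;
        for (int right = 0; right < s.length(); right++) {
            count[s.charAt(right)]++;
            while (count[s.charAt(right)] > 1) {   // inner loop: shrink until valid again
                count[s.charAt(left)]--;
                left++;
            }
            max = Math.max(max, right - left + 1); // track the max result
        }
        return max;
    }

    // "Non-shrinkable": when the window is invalid, shift it (an "if", no inner loop);
    // the window never shrinks, so its final size equals the max valid size seen.
    public int maxWindowNonShrinkable(String s) {
        int[] count = new int[128];
        int left = 0;
        for (int right = 0; right < s.length(); right++) {
            count[s.charAt(right)]++;
            if (count[s.charAt(right)] > 1) {      // invalid: move left one step too
                count[s.charAt(left)]--;
                left++;
            }
        }
        return s.length() - left;
    }
}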
Some highlights of the sliding window implementation for both the “max”and the “min” variations
common
- outer loop is a for loop, “right” ptr moves 1 step every time
- inner loop is a while loop; the "left" ptr moves 1 step at a time while the window is valid (or invalid, depending on the variation)
- the variation between problems is how to keep the “state”
“max window”
- because the window starts as valid, the outer loop expands the window until it becomes invalid
- the inner loop then shrinks the window until it's valid again
“min window” is the opposite
In union find, what’s the difference in the data stored in the “root” data structure between the quick find and quick union implementations
- quick find: it stores the root of the node
- quick union: it stores the parent of the node
What's union find and what are the 4 common operations
- union find performs operations on disjoint data sets
- "find" - returns the root node given a node
- "union" - joins the subtrees containing nodes x and y
- “init” - inits the disjoint data sets by setting each node as a disjoint graph
- “isConnected” - returns true if x and y are connected
What are some implementation variations of union find
Quick find - optimizes on “find”
- the root array stores the “root” of each element
Quick union - optimizes on “union”
- the root array stores the “parent” of each element
Union by rank - an optimization for “quick union”
- all function implementations are the same as “quick union” except for the “union” function
- optimizes the worst-case scenario in "quick union" where the tree is a single line (height of tree == num of elements)
- uses an additional array “rank” to store the height of the tree
- optimizes by making the shorter of the two trees a subtree of the taller tree, thus not increasing the height of the new tree
Union find - quick find - why is the union operation O(n)?
Because when we join two disjoint graphs A and B, we need to update the root node for all nodes in B to be the root node of A.
Even if we figure out which subset has fewer nodes, when A and B are about the same size the runtime complexity is O(n/2), which is still O(n).
Union find, quick find
Runtime complexity for each operation
Quick find:
- find: O(1)
- union: O(n)
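A minimal quick find sketch that shows why union is O(n) (names are illustrative):

class QuickFindUnionFind {
    private final int[] root;   // root[i] holds the root of i, so find is O(1)

    public QuickFindUnionFind(int n) {
        root = new int[n];
        for (int i = 0; i < n; i++) root[i] = i;   // init: every node is its own root
    }

    public int find(int x) {
        return root[x];                            // O(1)
    }

    public void union(int x, int y) {
        int rootX = find(x), rootY = find(y);
        if (rootX == rootY) return;
        for (int i = 0; i < root.length; i++) {    // O(n): rewrite the root of every node in y's set
            if (root[i] == rootY) root[i] = rootX;
        }
    }

    public boolean isConnected(int x, int y) {
        return find(x) == find(y);
    }
}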
What data do we need to store to implement union find
- a root array storing either the root node or the parent node (quick find vs quick union)
- in the case of non int values, can use a map
- for “union by rank”, will also have a rank array to store the height of each element, init all elements to 1.
- only the rank[root] are read and updated in the “union” function
What's the implementation of each union find function, for the quick union variation
Also point out the difference for union by rank
- init
- init root
- for “union by rank”, init rank to be all 1
- find(x) - returns the root node of x
- follow the parent of x until a node equals its own parent (the root)
- union (with union by rank)
- find root of x and y, if they are the same, skip
- compare the rank of rootX and rootY
- set the new root to be the root of the taller tree
- if they are equal, pick either as the new root and increment its rank by 1
- isConnected
- return find(x) == find(y)
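A minimal sketch of the quick union variation with union by rank, following the steps above (names are illustrative):

class QuickUnionByRank {
    private final int[] root;   // root[i] holds the parent of i (quick union)
    private final int[] rank;   // rank[r] tracks the height of the tree rooted at r

    public QuickUnionByRank(int n) {
        root = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) {
            root[i] = i;        // init: every node is its own parent (disjoint sets)
            rank[i] = 1;        // init rank to 1
        }
    }

    public int find(int x) {
        while (x != root[x]) x = root[x];  // follow parents until a node is its own parent
        return x;
    }

    public void union(int x, int y) {
        int rootX = find(x), rootY = find(y);
        if (rootX == rootY) return;        // already in the same set, skip
        if (rank[rootX] > rank[rootY]) {
            root[rootY] = rootX;           // attach the shorter tree under the taller one
        } else if (rank[rootX] < rank[rootY]) {
            root[rootX] = rootY;
        } else {
            root[rootY] = rootX;           // equal heights: pick one root and bump its rank
            rank[rootX] += 1;
        }
    }

    public boolean isConnected(int x, int y) {
        return find(x) == find(y);
    }
}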