Data structure and algorithm Flashcards

Question 1

Q

Name 4 enumeration use cases for backtracking

Answer

A

subset
permutation
combination sum
palindrome partitioning

Question 2

Q

What algorithm does backtracking use and what’s a “must have” function for backtracking

Answer

A

uses recursion
uses DFS
need a isValid function to check if the solution is valid so far

Question 3

Q

What are the two implementations of a graph

Answer

A

adjacency matrix
- 2D array of boolean where true indicates an edge exists between the two nodes
adjacency list
- Map<T, List<T>></T>
- map of a node and all of its neighbors

Question 4

Q

Name 2 diff between graph vs tree traversal

Answer

A

For graph, need to keep track of “visited”
For graph, need to consider a disjoint graph - use an outer while loop

Question 5

Q

What data structures are used for BFS vs DFS

Answer

A

BFS uses a Queue (class LinkedList)
DFS uses a Stack (class Stack)

Question 6

Q

Describe BFS to visit all vertices for a graph (iterative approach)

Answer

A

prep (before the loop)
- add starting node to the queue
for loop (process the queue)
1. Look at the head of the queue
2. if its visited, remove it from the queue, break from the loop
3. else, add it to “visited”
4. for all its neighbors, check if they are visited, if not, add to the queue
5. remove it from the queue
stop condition: queue.isEmpty()
algorithm is iterative not recursive

Question 7

Q

Describe DFS for a graph (iterative approach)

Answer

A

push starting node into stack
while stack is not empty
- pop node from stack
- mark the node visited
- for all neighbors of the node
  - check if visited, if not, push to stack
stop condition: stack is empty

Question 8

Q

Describe DFS for graph using recursion and some things to note

Answer

A

procedure dfs(vertex v) {
    visit(v);
    for each neighbor u of v
        if u is undiscovered
            call dfs(u);
}

Note this implementation does not handle disjoint graph
For disjoint graph, we need a separate DFS for each connected graph

Question 9

Q

Name 4 DFS applications (1 of which is not studied so it’s ok to only name 3)

Answer

A

detect cycle in graph
path finding
finding strongly connected components of a graph
backtracking

Question 10

Q

Tradeoffs between DFS and BFS (incomplete)

Answer

A

BFS
- can consume a lot more memory when the branching factor of a tree is huge

DFS
- can take a long time to visit other neighboring nodes if the depth of the tree is huge

Question 11

Q

Advantage of using adjacency list over adjacency matrix

Answer

A

adjacency list uses less space, better for sparse matrix

Question 12

Q

Implement permutation using backtrack

Answer

A

Highlights (compare to the subsets)
- add to solution only if the length matches
- no need to have a start pointer
- need to check if the current item is already in visited

/**
     * This implementation does not handle the input if it contains duplicate values
     * input: 1, 2, 3,
     * output: [[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]
     *
     * https://leetcode.com/problems/permutations/
     *
     * @param nums
     * @return
     */

    public List<List<Integer>> permute(int[] nums) {
        List<List<Integer>> solution = new ArrayList<>();
        permuteBacktrack(solution, new ArrayList<>(), nums);
        return solution;
    }

    private void permuteBacktrack(List<List<Integer>> solution, List<Integer> visited, int [] nums) {

        if(visited.size() == nums.length){
            solution.add(new ArrayList<>(visited));
        } 

        for(int i = 0; i < nums.length; i++){
                if(visited.contains(nums[i])) continue; // element already exists, skip
                visited.add(nums[i]);
                permuteBacktrack(solution, visited, nums);
                visited.remove(visited.size() - 1);
        }
    }

Question 13

Q

Implement subsets using backtrack

Answer

A

Algorithm highlights:
- because we use recursion, any temp data structure are passed in as args instead of return values
- args are: solution, visited (list), input (array), start pointer
- have a for loop to iterate thru the input array
- visit the current item
- recursion with start++
- remove the last item in visited
- essentially “visited” is used as a “stack”

    public List<List<Integer>> subsets(int[] nums) {
        List<List<Integer>> solution = new ArrayList<>();
        subsetsBacktrack(solution, new ArrayList<>(), nums, 0);
        return solution;
    }

    private void subsetsBacktrack(List<List<Integer>> solution , List<Integer> visited, int [] nums, int start){

        solution.add(new ArrayList<>(visited));

        for(int i = start; i < nums.length; i++){
            visited.add(nums[i]);
            subsetsBacktrack(solution, visited, nums, i + 1);
            visited.remove(visited.size() - 1);
        }
    }

Question 14

Q

Find substring problem - high level description of the approach

Problem: find a min substring in s that satisfies t

 * Input: s = "ADOBECODEBANC", t = "ABC"
 * Output: "BANC"

Answer

A

use a “two pointer” approach by using start/end pointer to iterate thru “s”
outer while loop to “expand” the window, move “end” pointer
outer loop ends when “end” points at last char in “s”
inner while loop to “shrink” the window, move the “start” pointer
“shrink” until the “window” no longer satisfies “t”

Question 15

Q

Find substring problem - what are the data to keep track of

Problem: find a min substring in s that satisfies t

 * Input: s = "ADOBECODEBANC", t = "ABC"
 * Output: "BANC"

Answer

A

a char map - to keep track of whats matched
start/end pointers
minStart/minLength

Question 16

Q

Implement isPalindrome

Answer

A

public boolean isPalindrome(String s, int low, int high){
        while(low < high)
            if(s.charAt(low++) != s.charAt(high--)) return false;
        return true;
    }

Important points
- use s.charAt()
- while (low < high), not <= because no need to compare the middle char if length is odd

Question 17

Q

Implement the template for binary search
- how to compare left and right (< or <=)
- mid rounds up or down, in java, what method to call
- how to set mid pointer
- how to set left and right after each iteration
- after exiting the loop, which pointer do we return

Answer

A

while loop is left < right
mid rounds down, In Java, use Math.floorDiv(int x, int y) -> int
mid is left + Math.floorDiv((right-left), 2)
left = mid + 1, right = mid
return left
after exiting the while loop, return left or right depending on if need to find min or max

def binary_search(array) -> int:
    def condition(value) -> bool:
        pass

    left, right = min(search_space), max(search_space) # could be [0, n], [1, n] etc. Depends on problem
    while left < right:
        mid = left + (right - left) // 2            # means round down
        if condition(mid):
            right = mid
        else:
            left = mid + 1
    return left

Question 18

Q

Name 3 basic applications for binary search

Answer

A

Basic applications

find the first “bad” version (like git bisect)
compute square root
- condition is k^2 > x, then k-1 is the answer
- right = x+1 to deal with edge conditions like x = 0 and x = 1
search insert positions in a sorted array

Question 19

Q

When can we use binary search
Give a sample problem of an advanced application
In this sample problem, what are the min and max values for the search space

Answer

A

When we can identify a search space.

Sample problem - determine the min ship weight to ship all the shipments within x days.

Input: weights = [1,2,3,4,5,6,7,8,9,10], days = 5

Approach
- search space is left = max(weights), right = sum(weights)
- condition function returns true if it’s possible to ship everything within D days with a given capacity

Question 20

Q

What are the variations of the sliding window problem

Answer

A

“max” window
- the initial condition is valid
- goal is to find the “max” window
- example: find max non repeated chars in a string

“min” window
- the initial condition is invalid
- goal is to find the “min” window where it’s valid
- example: find the substring problem

Question 21

Q

What are the two implementations of the “max window” problem

Answer

A

“shrinkable” solution
- right pointer expands until the window is invalid
- then shrink the left pointer until it’s valid again
- keep track of the “max” result

“non shrinkable” solution
- based on the observation that once we find a “max” value so far, then there is no need to shrink it
- we “shift” the window by incrementing “left” pointer when the window is invalid
- note there is no inner loop, it’s an “if” statement instead
- this essentially “shifts” the window (without increasing the size) when it’s invalid, thus keep track of the “max result” seen so far

Question 22

Q

Some highlights of the sliding window implementation for both the “max”and the “min” variations

Answer

A

common
- outer loop is a for loop, “right” ptr moves 1 step every time
- inner loop is a while loop, “left” ptr moves 1 step while its valid or invalid
- the variation between problems is how to keep the “state”

“max window”
- because the window starts as valid, the outer loop expands the window while state is valid
- the inner loop shrinks the window until it’s invalid

“min window” is the opposite

Question 23

Q

In union find, what’s the difference in the data stored in the “root” data structure between the quick find and quick union implementations

Answer

A

quick find: it stores the root of the node
quick union: it stores the parent of the node

Question 24

Q

Whats union find and what are the 4 common operations

Answer

A

union find performs operation on disjoint data sets
“find” - returns the root node given a node
“union” - joins the subtree containing x and y nodes
“init” - inits the disjoint data sets by setting each node as a disjoint graph
“isConnected” - returns true if x and y are connected

Question 25

Q

What are some variation of implementation of union find

Answer

A

Quick find - optimizes on “find”
- the root array stores the “root” of each element

Quick union - optimizes on “union”
- the root array stores the “parent” of each element

Union by rank - an optimization for “quick union”
- all function implementations are the same as “quick union” except for the “union” function
- optimize the worst case scenario in “quick union” where the tree is a single line (height of tree == num of elements)
- uses an additional array “rank” to store the height of the tree
- optimizes by making the shorter of the two trees a subtree of the taller tree, thus not increasing the height of the new tree

Question 26

Q

Union find - quick find - why is the union operation O(n).

Answer

A

Because when we join two disjoint graphs A and B, we need to update the root node for all nodes in B to be the root node of A.

Even if we figure out which subset has less nodes, in the case where A and B are about the same size, the runtime complexity is O(1/2 n) or O(n).

Question 27

Q

Union find, quick find
Runtime complexity for each operation

Answer

A

Quick find:
- find: o(1)
- union: o(n)

Question 28

Q

What data do we need to store to implement union find

Answer

A

a root array storing either the root node or the parent node (quick find vs quick union)
in the case of non int values, can use a map
for “union by rank”, will also have a rank array to store the height of each element, init all elements to 1.
- only the rank[root] are read and updated in the “union” function

Question 29

Q

Whats the implementation for each union find functions, for the quick union variation

Also point out the difference for union by rank

Answer

A

init
- init root
- for “union by rank”, init rank to be all 1
find(x) - returns the root node of x
- loop the parent of x until it equals itself
union (with union by rank)
- find root of x and y, if they are the same, skip
- compare the rank of rootX and rootY
- set the new root to be the root of the taller tree
- if they are equal, the height of the tree is incremented by 1
isConnected
- return find(x) == find(y)

Question 30

Q

When using DFS for graph, when to use recursion and when to use iteration? Give some samples problems for each.

Answer

A

Iteration
- when pushing the next node onto the stack
- example, find one path between A and B

Recursion
- when push a path onto the stack
- example: find all possible paths between A and B
- back tracking

Question 31

Q

What are the things to watch out for DFS in a 2D grid and how to find neighbors?

Answer

A

be sure to check if the node is an edge node to avoid index out of bound
i >= 0 && i < board.length
j >=0 && j < board[i].length

Question 32

Q

In DFS recursion
For the “find all paths between A and B” problem, what’s special about the “visited” array and the data that we push onto the stack. What do we do after dfs returns from recursion and why.

Answer

A

The “visited” array keeps track of the path we are exploring at the moment
To find out if we need to visit a node or not is to see whether it’s already on the path
We push the “path” onto the stack, but since we are doing recursion (where the stack is the call stack), it just means we keep it on the argument list.
When we return from recursion, we always remove the last node from the “path”. This is backtracking. Why only remove one node, because with each dfs step, we move fwd one node, thus we remove one node when we are done.

Question 33

Q

How do we break down a recursive function into 3 high level steps, use the “find all paths” problem as an example.

Answer

A

Before recursion
- process the current node, e.g. add to path, mark it visited
- check for stop condition and return if stop condition met

What to recurse on
- all unvisited neighbors of the current node
- set the “starting point” to each unvisited neighbors

After recursion
- remove the current node from the path

Question 34

Q

When we use recursion, where is the stack?
How do we push data onto the stack.

Explain with the “find all paths between A and B” problem

Answer

A

When using recursion, we use the function call stack as the stack to push/pop data from.
We push data onto the stack by passing it as an argument to the function
Each time we call the recursive function, we push data onto the stack.
In the “find all paths between A and B” example, we add one more node onto the path. But because the data is shared (because List<String> path is passed by reference), we need to remove the last item from the stack after recursion exits.</String>

Note - Recursion is used on problems that normally need a stack (but not all stack problems should use recursion).

Question 35

Q

Whats args to pass onto dfs for “find all paths” problem

Answer

A

the starting node
one path (visited data structure, can use LinkedHashSet to both keep the order of insertion and easy lookup
all paths - the solution

Question 36

Q

Practically speaking, does it make sense to use recursion for BFS

Answer

A

No, because using recursion essentially adds a stack (the call stack) to the solution. BFS uses a queue not a stack and therefore it’ll awkward to use recursion.

Question 37

Q

Cycle detection:
1. What algorithm do we use
2. Explain the algorithm

Answer

A

Use DFS recursion
Algorithm
- stop condition is the “next node” is already on the “path”
- add the current node on to the path
- calls dfs for all unvisited neighbors (where “visited” means on the “path”)
- after recursion exists, remove the last node from the “path”

Note.
- We push node onto the stack. Not a path
- That means we can also use DFS iterative approach.
- Visited is also the path. No separate visited set is needed
- Mark the node visited when it’s added to the path

Question 38

Q

What’s the name of the algorithm for topological sort

Answer

A

Kahn’s algorithm

Question 39

Q

Whats the time complexity for Kahn’s algorithm

Question 40

Q

Is Kahn’s algorithm BFS or DFS

Answer

A

BFS and it uses a Queue

Question 41

Q

What are edge conditions to consider for Topological sort and how to find out about each

Answer

A

There is no starting node - no node with in-degree of 0 - after all the in-degrees are calculated, there is nothing added to the Queue
There is a starting node but there is a cycle elsewhere in the graph - we are done processing all nodes in the Queue, but not all nodes were processed (visited.length != numCourses)

Question 42

Q

Kahn’s algorithm is normally used for topological sort, but in that process, it can be used for something else (almost as a by-product), what is it.

Answer

A

To detect if the graph contains cycles, but its less efficient than DFS

Question 43

Q

Describe Kahn’s algorithm

Answer

A

Calculate in-degrees and out-degrees for each node
If any node has in-degree of 0, add to the Queue
edge condition check: if the Q is empty, there is no “starting node”
While queue is not empty,
- remove the node from the queue
- mark the node “visited”
- for each node in its out-degree, subtract their out-degree by 1
- if the neighboring node’s out-degree becomes 0, add it to the Queue
if visited.length != numNodes, there is a cycle in the graph

Question 44

Q

What are some applications for BFS vs DFS for graph traversal

Answer

A

Common
- find if path exists between A and B (iterative dfs, faster with BFS)
- find all paths between A and B (recursive dfs)
- traverse all vertices (iterative dfs)

DFS
- cycle detection (recursive dfs)
- backtracking (recursive dfs)

BFS
- topological sort (Kahn’s algorithm, can also use to find cycle)
- shortest path
- graph coloring/bipartition

Question 45

Q

Describe the BFS algorithm to find if a path exist between A and B

Answer

A

Algorithm
- populate an adjacency list
- push A into the queue, mark it visited
- while queue is not empty
* remove from the queue
* if the node is B, then return true
* add all unvisited neighbors to the queue
- return false after the while loop exits

Highlights
- the queue saves a node (not a path)
- we need keep track of “visited”
- mark node as “visited” when they are added to the queue (not when they are removed from the queue)

Question 46

Q

Describe the BFS algorithm to find all paths from A to B

Answer

A

Algorithm
- push A onto the queue, this is a path of size 1
- while queue is not empty
* pop the path from the queue
* observe the last node from the path
* for all its neighbors, push a new path with the neighbor
* if the neighbor is B, add the path to the result

Highlights
- the queue saves a path, not a node
- no need to keep track of “visited” because we already save the path

Question 47

Q

Whats the time complexity for “find if path exist” for both BFS and DFS

Answer

A

They are both O(V+E)

Question 48

Q

Whats the time complexity for “find all paths” for both BFS and DFS

Answer

A

BFS: O( 2^V * V)
DFS: ( (V-1)! )

BFS is much better
For v=10,
BFS ~ 10k
DFS ~ 362k

Question 49

Q

Describe the shortest path algorithm using BFS

Answer

A

Highlight
- the queue saves the path
- we mark the node “visited” AFTER all its visited neighbors are added to the path

Algorithm
- add a path of 1 onto the the queue, it contains just node A
- while queue is not empty
* pop the last path
* examine the last node in the path
* for each unvisited neighbor of the last node, create a new path, and add to the queue
* mark the last node “visited”
* if the last node is B, return the size of the new path, this is the shortest path

Question 50

Q

Using BFS, why is the first path to B discovered is the shortest path

Answer

A

Using BFS, we are visiting nodes in “layers” from A
We see all nodes with 1 distance from A before we see any node with 2 distances
That’s why the first path discovered with B is the shortest path

Question 51

Q

Why do we need to keep track of “visited” for “shortest path” but not for “all paths” using BFS

Answer

A

For shortest path, we visit nodes in “layers” from A. If there is a path A C E D, and another path A D E, we don’t need to know about ACED because we get to D faster from A. Therefore, we mark D visited from the AD path rather than from the ACED path.

For “all paths”, because we keep track of the path in the queue, we don’t need a separate “visited” set. We can find out if a node is already on the path. For “all paths”, ACED is a diff path from AD.

Question 52

Q

Whats a bipartite graph

Answer

A

A graph is biparte if it can be div into two distinct set of nodes where there is no edges between nodes within the same set. All edges are between the “sets”.

Question 53

Q

What are the 4 properties of a bipartite graph

Answer

A

never contain “odd cycles”, e.g. a cycle with odd number of nodes
subgraphs of a bipartite graph is also bipartite
a bipartite graph is always 2 way colorable, and vise versa
In an undirected bipartite graph, the degree of each vertex partition set is always equal.

Question 54

Q

Describe the algorithm to find if a graph is bipartite.

Answer

A

The graph could be disjoint, use an outer loop to iterate thru all nodes. If a node is “uncolored”, run it thru BFS.
BFS starts with an uncolored node, and returns a boolean to indicate if the graph is bipartite.
- Mark the starting node with an arbitrary color, add to the queue.
- while queue is not empty, pop the node
- Check if any neighbor is of the same color, if so, return false (we found a conflict).
- Otherwise, mark the neighbor the “opposite” color.
- When the BFS exits, the entire connected graph is processed.
- BFS uses “colored” to keep track of “visited”.

Question 55

Q

Explain the pre-order, in-order, and post-order traversals for a tree.

Answer

A

pre order: node, left subtree, right subtree
in order: left subtree, node, right subtree. This is used to display values in non decreasing order for a BST.
post order: left subtree, right subtree, node

Question 56

Q

Which of the pre-order, in-order, and post-order traversals use DFS and which uses BFS

Answer

A

All 3 uses DFS

We can use either the recursive approach or iterative approach for all 3.
- pre/post order iterative approach can be implemented with 1 stack
- in order iterative approach - I have not look it up yet, heard it uses 2 stacks

Question 57

Q

What happens to pre/in/post order traversals when the tree is not a binary tree

Answer

A

For pre-order and post-order, since L and R are traversed together, it means visit all the children before/after the node.
For in-order, we need to define which children should be visited before the node.

Question 58

Q

Describe the recursive approach for pre/in/post order traversal

Answer

A

Recursive is easier compare to iterative.

For in-order:
- visit the left tree
- visit the node
- visit the right tree

For pre/post order, just change when to visit the node.

Stop condition is if the node is a lead node.

Question 59

Q

Describe the iterative approach for pre-order

Answer

A

Pre-order is the easier one because we “visit” the node first.
Because its DFS, we use a stack.

Steps
- add root to stack
- while stack not empty
- pop node, mark it visited (add to output)
- add children R to L.

To “visit” a node, just add it to the output.
We pop the node when it’s “visited”

Question 60

Q

Describe the iterative approach for post-order

Answer

A

Post order is slightly more complicated than pre-order because we can’t “visit” the node until both children are “visited”.
“Visit” means to pop the node from stack and add to the output

Steps
- add root to stack
- while stack is not empty
- peek the node (don’t pop yet)
- pop it only if both children are “visited”
- push R subtree to stack
- push L subtree to stack

Question 61

Q

Whats the time complexity to in/pre/post order traversal

Answer

A

O(n) since we visit all nodes exactly once.

Question 62

Q

What is a trie

Answer

A

tree data structure to store strings
all strings sharing the common prefix comes from a common node

Question 63

Q

Define a TrieNode (including fields and constructor)

Answer

A

Fields:
- Map<Character, TrieNode> edges - where Char is the value of the edge
- boolean isEndNode

Constructor
- new HashMap()
- isEndNode = false

Question 64

Q

How is a TrieNode different from a TreeNode or GraphNode

Answer

A

a TrieNode has no value, the values are on the edges
TrieNode needs to keep track whether it’s an end node

Answer 64

A

insert
search
startsWith
delete - less common

Answer 65

A

Keys are searched using common prefixes, hence faster.
take less space because the common part of the string is only stored once
help with longest prefix matching

Answer 66

A

trie is faster in worst case compare to imperfect hash table
trie no hash function thus no collision
trie is not available in programming tool, hence implementation is always done from scratch

Answer 67

A

build dictionaries for english
spell check

Answer 68

A

convert the word to lower case
use a var to keep track of the last TrieNode
walk the down trie, if the edges does not contain the char to be added, then add the edge, otherwise just keep walking
set the “isEndNode” to true for the last char

Answer 69

A

convert the word to lower case
use a var to keep track of the last TrieNode
as we walk down the trie, if we cannot find an edge, return false
if we are able to walk all the way down to the last char AND if the last char isEndNode is true, return true.

Answer 70

A

quick find: it stores the root of the node
quick union: it stores the parent of the node

Answer 71

A

They are the same until the last step.
- for startsWith, no need to check if the last node’s “isEndNode” is set to true
- for search, we need to check

Answer 72

A

Quick union:
- find: o(n)
- union: o(n)

Answer 73

A

Union by rank
- find: o(log n)
- union: o(log n)

Brainscape's Knowledge GenomeTM

Data structure and algorithm Flashcards

Brainscape's Knowledge Genome^TM