Subset 1 Flashcards
Subset 1 of Coding Problems
<p>This problem was recently asked by Google.</p>
<p>Given a list of numbers and a number <code>k</code>, return whether any two numbers from the list add up to <code>k</code>.</p>
<p>For example, given <code>[10, 15, 3, 7]</code> and <code>k</code> of <code>17</code>, return true since <code>10 + 7</code> is <code>17</code>.</p>
<p>Bonus: Can you do this in one pass?</p>
<p>This problem can be solved in several different ways.</p>
<p>The brute force way would involve a nested iteration that checks every pair of numbers:</p>
<pre><code>def two_sum(lst, k):
    for i in range(len(lst)):
        for j in range(len(lst)):
            if i != j and lst[i] + lst[j] == k:
                return True
    return False
</code></pre>
<p>This would take O(N^2). Another way is to use a set to remember the numbers we've seen so far.
Then for a given number, we can check if there is another number that, if added, would sum to k.
This would be O(N) since lookups of sets are O(1) each.</p>
<pre><code>def two_sum(lst, k):
    seen = set()
    for num in lst:
        if k - num in seen:
            return True
        seen.add(num)
    return False
</code></pre>
<p>Yet another solution involves sorting the list. We can then iterate through the list and run a binary search on <code>K - lst[i]</code>. Since we run binary search on N elements, this would take O(N log N) with O(1) space.</p>
<pre><code>from bisect import bisect_left

def two_sum(lst, K):
    lst.sort()

    for i in range(len(lst)):
        target = K - lst[i]
        j = binary_search(lst, target)

        # Check that binary search found the target and that it's not in the same index
        # as i. If it is in the same index, we can check lst[i + 1] and lst[i - 1] to see
        # if there's another number that's the same value as lst[i].
        if j == -1:
            continue
        elif j != i:
            return True
        elif j + 1 < len(lst) and lst[j + 1] == target:
            return True
        elif j - 1 >= 0 and lst[j - 1] == target:
            return True
    return False

def binary_search(lst, target):
    lo = 0
    hi = len(lst)
    ind = bisect_left(lst, target, lo, hi)
    if 0 <= ind < hi and lst[ind] == target:
        return ind
    return -1
</code></pre>
<p>This problem was asked by Uber.</p>
<p>Given an array of integers, return a new array such that each element at index <code>i</code> of the new array is the product of all the numbers in the original array except the one at <code>i</code>.</p>
<p>For example, if our input was <code>[1, 2, 3, 4, 5]</code>, the expected output would be <code>[120, 60, 40, 30, 24]</code>.
If our input was <code>[3, 2, 1]</code>, the expected output would be <code>[2, 3, 6]</code>.</p>
<p>Follow-up: what if you can't use division?</p>
<p>This problem would be easy with division: an optimal solution could
just find the product of all numbers in the array and then divide
by each of the numbers.</p>
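<p>For reference, a minimal sketch of that division-based approach might look like the following (note it assumes the array contains no zeroes, a case the prose doesn't address):</p>
<pre><code>def products_with_division(nums):
    # Multiply everything together, then divide out each element.
    # Assumes no element is 0.
    total = 1
    for num in nums:
        total *= num
    return [total // num for num in nums]
</code></pre>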
<p>Without division, another approach would be to first see that the ith
element simply needs the product of numbers before i and the product
of numbers after i. Then we could multiply those two numbers to get
our desired product.</p>
<p>In order to find the product of numbers before i, we can generate
a list of prefix products. Specifically, the ith element in the list
would be the product of all numbers up to and including index i. Similarly, we would generate
the list of suffix products.</p>
<pre><code>def products(nums):
    # Generate prefix products
    prefix_products = []
    for num in nums:
        if prefix_products:
            prefix_products.append(prefix_products[-1] * num)
        else:
            prefix_products.append(num)

    # Generate suffix products
    suffix_products = []
    for num in reversed(nums):
        if suffix_products:
            suffix_products.append(suffix_products[-1] * num)
        else:
            suffix_products.append(num)
    suffix_products = list(reversed(suffix_products))

    # Generate result
    result = []
    for i in range(len(nums)):
        if i == 0:
            result.append(suffix_products[i + 1])
        elif i == len(nums) - 1:
            result.append(prefix_products[i - 1])
        else:
            result.append(prefix_products[i - 1] * suffix_products[i + 1])
    return result
</code></pre>
<p>This runs in O(N) time and space, since iterating over the input array takes O(N)
time and the prefix and suffix arrays take up O(N) space.</p>
<p>This problem was asked by Google.</p>
<p>Given the root to a binary tree, implement <code>serialize(root)</code>, which serializes the tree into a
string, and <code>deserialize(s)</code>, which deserializes the string back into the tree.</p>
<p>For example, given the following <code>Node</code> class</p>
<pre><code>class Node:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right
</code></pre>
<p>The following test should pass:</p>
<pre><code>node = Node('root', Node('left', Node('left.left')), Node('right'))
assert deserialize(serialize(node)).left.left.val == 'left.left'
</code></pre>
<p>There are many ways to serialize and deserialize a binary tree, so don't worry
if your solution differs from this one. We will be only going through one
possible solution.</p>
<p>We can approach this problem by first figuring out what we would like the
serialized tree to look like. Ideally, it would contain the minimum
information required to encode all the necessary information about
the binary tree. One possible encoding might be to borrow <a>S-expressions</a>
from Lisp. The tree <code>Node(1, Node(2), Node(3))</code> would then look like
'(1 (2 () ()) (3 () ()))', where the empty brackets denote nulls.</p>
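<p>As a rough sketch (not the format we'll end up using below), the S-expression encoding could be produced like this:</p>
<pre><code>def serialize_sexp(root):
    # Produces e.g. '(1 (2 () ()) (3 () ()))' for Node(1, Node(2), Node(3)).
    if root is None:
        return '()'
    return '({} {} {})'.format(root.val, serialize_sexp(root.left), serialize_sexp(root.right))
</code></pre>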
<p>To minimize data over the hypothetical wire, we could go a step further
and prune out some unnecessary brackets. We could also replace the
2-character '()' with '#'. We can then infer leaf nodes by their form
'val # #' and thus get the structure of the tree that way. Then our
tree would look like <code>1 2 # # 3 # #</code>.</p>
<pre><code>def serialize(root):
    if root is None:
        return '#'
    return '{} {} {}'.format(root.val, serialize(root.left), serialize(root.right))

def deserialize(data):
    def helper():
        val = next(vals)
        if val == '#':
            return None
        # Keep the value as-is (a string) so that the test above,
        # which uses string values like 'left.left', passes after a round trip.
        node = Node(val)
        node.left = helper()
        node.right = helper()
        return node
    vals = iter(data.split())
    return helper()
</code></pre>
<p>This runs in O(N) time and space, since we iterate over the whole tree when serializing and deserializing.</p>
<p>This problem was asked by Stripe.</p>
<p>Given an array of integers, find the first missing positive integer in linear time and constant space.
In other words, find the lowest positive integer that does not exist in the array.
The array can contain duplicates and negative numbers as well.</p>
<p>For example, the input <code>[3, 4, -1, 1]</code> should give <code>2</code>. The input <code>[1, 2, 0]</code> should give <code>3</code>.</p>
<p>You can modify the input array in-place.</p>
<p>Our lives would be easier without the linear time constraint:
we would just sort the array, while filtering out negative numbers,
and iterate over the sorted array and return the first number that doesn't
match the index. However, sorting takes O(n log n), so we can't use that here.</p>
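<p>For comparison, a quick sketch of that slower sorting-based idea might look like this:</p>
<pre><code>def first_missing_positive_slow(nums):
    # O(n log n): sort the positive numbers and walk them looking for a gap.
    positives = sorted(num for num in nums if num > 0)
    expected = 1
    for num in positives:
        if num == expected:
            expected += 1
        elif num > expected:
            break
    return expected
</code></pre>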
<p>Clearly we have to use some sort of trick here to get it running in linear time.
Since the first missing positive number must be between 1 and len(array) + 1 (why?),
we can ignore any negative numbers and numbers bigger than len(array).
The basic idea is to use the indices of the array itself to reorder the elements
to where they should be. We traverse the array and swap each element whose value is between 1
and the length of the array into the index matching its value. We keep swapping at the same
index until it holds its correct value (or no useful swap remains), then move on.</p>
<p>By the end of this process, all the first positive numbers should be grouped in
order at the beginning of the array. We don't care about the others.
This only takes O(N) time, since we swap each element at most once.</p>
<p>Then we can iterate through the array and return the index of the first number that doesn't match, just like before.</p>
<pre><code>def first_missing_positive(nums):
    if not nums:
        return 1
    for i, num in enumerate(nums):
        while i + 1 != nums[i] and 0 < nums[i] <= len(nums):
            v = nums[i]
            nums[i], nums[v - 1] = nums[v - 1], nums[i]
            if nums[i] == nums[v - 1]:
                break
    for i, num in enumerate(nums, 1):
        if num != i:
            return i
    return len(nums) + 1
</code></pre>
<p>Another way we can do this is by adding all the numbers to a set, then using a counter
initialized to 1: continuously increment the counter and
check whether its value is in the set.</p>
<pre><code>def first_missing_positive(nums):
    s = set(nums)
    i = 1
    while i in s:
        i += 1
    return i
</code></pre>
<p>This is much simpler, but runs in O(N) time and space, whereas the previous algorithm uses no extra space.</p>
<p>This problem was asked by Jane Street.</p>
<p><code>cons(a, b)</code> constructs a pair, and <code>car(pair)</code> and <code>cdr(pair)</code> returns the first and last element of that pair. For example, <code>car(cons(3, 4))</code> returns <code>3</code>, and <code>cdr(cons(3, 4))</code> returns <code>4</code>.</p>
<p>Given this implementation of cons:</p>
<pre><code>def cons(a, b):
    def pair(f):
        return f(a, b)
    return pair
</code></pre>
<p>Implement <code>car</code> and <code>cdr</code>.</p>
<p>This is a really cool example of using <a>closures</a> to store data. We must look
at the signature type of cons to retrieve its first and last elements. cons takes in
a and b, and returns a new anonymous function, which itself takes in f, and calls
f with a and b. So the input to car and cdr is that anonymous function, which is <code>pair</code>. To get a
and b back, we must feed it yet another function, one that takes in two parameters
and returns the first (if car) or last (if cdr) one.</p>
<pre><code>def car(pair):
    return pair(lambda a, b: a)

def cdr(pair):
    return pair(lambda a, b: b)
</code></pre>
<p>Fun fact: cdr is pronounced "cudder"!</p>
<p>This problem was asked by Google.</p>
<p>An XOR linked list is a more memory efficient doubly linked list.
Instead of each node holding <code>next</code> and <code>prev</code> fields, it holds a field named <code>both</code>,
which is an XOR of the next node and the previous node. Implement
an XOR linked list; it has an <code>add(element)</code> which adds the element to the
end, and a <code>get(index)</code> which returns the node at index.</p>
<p>If using a language that has no pointers (such as Python), you can assume you have access to <code>get_pointer</code> and
<code>dereference_pointer</code> functions that convert between nodes and memory addresses.</p>
<p>For the head, <code>both</code> will just be the address of next, and if it's the tail, it should
just be the address of prev. And intermediate nodes should have an XOR of <code>next</code> and <code>prev</code>.</p>
<p>Here's an example XOR linked list which meets the above conditions:</p>
<pre><code>A  <->  B  <->  C  <->  D

A.both = B
B.both = A ⊕ C
C.both = B ⊕ D
D.both = C
</code></pre>
<p>Let's work through <code>get</code> first, assuming that the above conditions are maintained. Then, given a
node, to go to the next node, we have to XOR the current node's <code>both</code> with the previous node's
address. And to handle getting the next node from the head, we would initialize the previous node's address as 0.</p>
<p>So in the above example, <code>A</code>'s <code>both</code> is <code>B</code> which when XOR'd with <code>0</code> would become <code>B</code>.
Then <code>B</code>'s <code>both</code> is <code>A ⊕ C</code>, which when XOR'd with <code>A</code> becomes C, etc.</p>
<p>To implement <code>add</code>, we would need to update the current tail's <code>both</code> to be its current <code>both</code> XOR'd with the new node's memory address.
Then the new node's <code>both</code> would just point to the memory address of the current tail. Finally, we'd update
the current tail to be the new node.</p>
<pre><code>import ctypes

# This is hacky. It's a data structure for C, not Python.
class Node(object):
    def __init__(self, val):
        self.val = val
        self.both = 0

class XorLinkedList(object):
    def __init__(self):
        self.head = self.tail = None
        self.__nodes = []  # This is to prevent garbage collection

    def add(self, node):
        if self.head is None:
            self.head = self.tail = node
        else:
            self.tail.both = id(node) ^ self.tail.both
            node.both = id(self.tail)
            self.tail = node

        # Without this line, Python thinks there is no way to reach nodes between
        # head and tail.
        self.__nodes.append(node)

    def get(self, index):
        prev_id = 0
        node = self.head
        for i in range(index):
            next_id = prev_id ^ node.both
            if next_id:
                prev_id = id(node)
                node = _get_obj(next_id)
            else:
                raise IndexError('Linked list index out of range')
        return node

def _get_obj(id):
    return ctypes.cast(id, ctypes.py_object).value
</code></pre>
<p><code>add</code> runs in O(1) time and <code>get</code> runs in O(N) time.</p>
<p>This problem was asked by Facebook.</p>
<p>Given the mapping a = 1, b = 2, ... z = 26, and an encoded message, count the number of ways it can be decoded.</p>
<p>For example, the message '111' would give 3, since it could be decoded as 'aaa', 'ka', and 'ak'.</p>
<p>You can assume that the messages are decodable. For example, '001' is not allowed.</p>
<p>This looks like a problem that is ripe for solving with recursion. First, let's try to
think of a recurrence we can use for this problem. We can try some cases:</p>
<ul><li>"", the empty string and our base case, should return 1.</li><li>"1" should return 1, since we can parse it as "a" + "".</li><li>"11" should return 2, since we can parse it as "a" + "a" + "" and "k" + "".</li><li>"111" should return 3, since we can parse it as:<ul><li>"a" + "k" + ""</li><li>"k" + "a" + ""</li><li>"a" + "a" + "a" + "".</li></ul></li><li>"011" should return 0, since no letter starts with 0 in our mapping.</li><li>"602" should also return 0 for similar reasons.</li></ul>
<p>We have a good starting point. We can see that the recursive structure is
as follows:</p>
<ul><li>If the string starts with zero, then there's no valid encoding.</li><li>If the string's length is less than or equal to 1, there is only 1 encoding.</li><li>If the first two digits form a number <code>k</code> that is less than or equal to 26, we can recursively
count the number of encodings assuming we pick <code>k</code> as a letter.</li><li>We can also pick the first digit as a letter and count the number of encodings with this assumption.</li></ul>
<pre><code>def num_encodings(s):
    if s.startswith('0'):
        return 0
    elif len(s) <= 1:  # This covers the empty string
        return 1

    total = 0
    if int(s[:2]) <= 26:
        total += num_encodings(s[2:])
    total += num_encodings(s[1:])
    return total
</code></pre>
<p>However, this solution is not very efficient. Every branch calls itself recursively twice,
so our runtime is O(2^n). We can do better by using dynamic programming.</p>
<p>All the following code does is repeat the same computation as above except starting from
the base case and building up the solution. Since each iteration takes O(1), the whole
algorithm now takes O(n).</p>
<pre><code>from collections import defaultdict

def num_encodings(s):
    # On lookup, this hashmap returns a default value of 0 if the key doesn't exist
    # cache[i] gives us # of ways to encode the substring s[i:]
    cache = defaultdict(int)
    cache[len(s)] = 1  # Empty string is 1 valid encoding

    for i in reversed(range(len(s))):
        if s[i].startswith('0'):
            cache[i] = 0
        elif i == len(s) - 1:
            cache[i] = 1
        else:
            if int(s[i:i + 2]) <= 26:
                cache[i] = cache[i + 2]
            cache[i] += cache[i + 1]
    return cache[0]
</code></pre>
<p>This problem was asked by Google.</p>
<p>A unival tree (which stands for "universal value") is a tree where all nodes under it have the same value.</p>
<p>Given the root to a binary tree, count the number of unival subtrees.</p>
<p>For example, the following tree has 5 unival subtrees:</p>
<pre><code>   0
  / \
 1   0
    / \
   1   0
  / \
 1   1
</code></pre>
<p>To start off, we should go through some examples.</p>
<pre><code>   a
  / \
 a   a
    / \
   a   a
        \
         A
</code></pre>
<p>This tree has 3 unival subtrees: the two 'a' leaves, and the one 'A' leaf. The 'A' leaf causes all its
parents to not be counted as a unival tree.</p>
<pre><code>   a
  / \
 c   b
    / \
   b   b
        \
         b
</code></pre>
<p>This tree has 5 unival subtrees: the leaf at 'c', and every 'b'.</p>
<p>We can start off by first writing a function that checks whether a tree is unival or not.
Then, perhaps we could use this to count up all the nodes in the tree.</p>
<p>To check whether a tree is a unival tree, we must check that every node in the tree has
the same value. To start off, we could define an <code>is_unival</code> function that takes in
a root to a tree. We would do this recursively with a helper function. Recall that
a leaf qualifies as a unival tree.</p>
<pre><code>def is_unival(root):
    return unival_helper(root, root.value)

def unival_helper(root, value):
    if root is None:
        return True
    if root.value == value:
        return unival_helper(root.left, value) and unival_helper(root.right, value)
    return False
</code></pre>
<p>And then our function that counts the number of subtrees could simply use that function:</p>
<pre><code>def count_unival_subtrees(root):
    if root is None:
        return 0
    left = count_unival_subtrees(root.left)
    right = count_unival_subtrees(root.right)
    return 1 + left + right if is_unival(root) else left + right
</code></pre>
<p>However, this runs in O(n^2) time. For each node of the tree, we're evaluating
each node in its subtree again as well. We can improve the runtime by starting
at the leaves of the tree, and keeping track of the unival subtree count and value
as we percolate back up. This should evaluate each node only once, making it
run in O(n) time.</p>
<pre><code>def count_unival_subtrees(root):
    count, _ = helper(root)
    return count

# Returns the number of unival subtrees, and whether the tree is itself a unival subtree.
def helper(root):
    if root is None:
        return 0, True

    left_count, is_left_unival = helper(root.left)
    right_count, is_right_unival = helper(root.right)
    total_count = left_count + right_count

    if is_left_unival and is_right_unival:
        if root.left is not None and root.value != root.left.value:
            return total_count, False
        if root.right is not None and root.value != root.right.value:
            return total_count, False
        return total_count + 1, True
    return total_count, False
</code></pre>
<p>This problem was asked by Airbnb.</p>
<p>Given a list of integers, write a function that returns the largest sum of non-adjacent numbers. Numbers can be <code>0</code> or negative.</p>
<p>For example, <code>[2, 4, 6, 2, 5]</code> should return <code>13</code>, since we pick <code>2</code>, <code>6</code>, and <code>5</code>. <code>[5, 1, 1, 5]</code> should return <code>10</code>, since we pick <code>5</code> and <code>5</code>.</p>
<p>Follow-up: Can you do this in O(N) time and constant space?</p>
<p>This problem seems easy on the surface, but is actually quite tricky. It's tempting
to try a greedy strategy, like picking the largest number (or the first), then the second-largest if
it's non-adjacent, and so on, but these strategies don't work -- there will always be some edge case that
breaks them.</p>
<p>Instead, we should look at this problem recursively. Say we had a function that already
returns the largest sum of non-adjacent integers on smaller inputs. How could we use it to figure out
what we want?</p>
<p>Say we used this function on our array's subarrays <code>a[1:]</code> and <code>a[2:]</code>. Then our answer is the larger of
the result on <code>a[1:]</code> and <code>a[0]</code> plus the result on <code>a[2:]</code>. This is because picking <code>a[0]</code> precludes us
from picking <code>a[1]</code>. So, we could write a straightforward recursive solution like this:</p>
<pre><code>def largest_non_adjacent(arr):
    if not arr:
        return 0
    return max(
        largest_non_adjacent(arr[1:]),
        arr[0] + largest_non_adjacent(arr[2:]))
</code></pre>
<p>However, this solution runs in O(2^n) time, since with each call, we're making two further recursive calls.
We could memoize the results, or use dynamic programming to store, in an array, the largest sum of
non-adjacent numbers from index <code>0</code> up to that point. Like so:</p>
<pre><code>def largest_non_adjacent(arr):
    if not arr:
        return 0
    if len(arr) <= 2:
        return max(0, max(arr))

    cache = [0 for i in arr]
    cache[0] = max(0, arr[0])
    cache[1] = max(cache[0], arr[1])

    for i in range(2, len(arr)):
        num = arr[i]
        cache[i] = max(num + cache[i - 2], cache[i - 1])
    return cache[-1]
</code></pre>
<p>This code should run in O(n) and in O(n) space. But we can improve this even further. Notice that we only
ever use the last two elements of the cache when iterating through the array. This suggests that we could
just get rid of most of the array and just store them as variables:</p>
<pre><code>def largest_non_adjacent(arr):
    if not arr:
        return 0
    if len(arr) <= 2:
        return max(0, max(arr))

    max_excluding_last = max(0, arr[0])
    max_including_last = max(max_excluding_last, arr[1])

    for num in arr[2:]:
        prev_max_including_last = max_including_last
        max_including_last = max(max_including_last, max_excluding_last + num)
        max_excluding_last = prev_max_including_last
    return max(max_including_last, max_excluding_last)
</code></pre>
<p>This problem was asked by Apple.</p>
<p>Implement a job scheduler which takes in a function <code>f</code> and an integer <code>n</code>, and calls <code>f</code> after <code>n</code> milliseconds.</p>
<p>We can implement the job scheduler in many different ways, so don't worry if
your solution is different from ours. Here is just one way:</p>
<p>First, let's try the most straightforward solution. That would probably be
to spin off a new thread on each function we want to delay, sleep the
requested amount, and then run the function. It might look something like this:</p>
<pre><code>import threading
from time import sleep

class Scheduler:
    def __init__(self):
        pass

    def delay(self, f, n):
        def sleep_then_call(n):
            sleep(n / 1000)
            f()
        t = threading.Thread(target=sleep_then_call, args=(n,))
        t.start()
</code></pre>
<p>While this works, there is a huge problem with this method: we spin off a new
thread each time we call delay! That means the number of threads we use could
easily explode. We can get around this by having only one dedicated thread
to call the functions, and storing the functions we need to call in some data
structure. In this case, we use a list. We also have to do some sort of polling now to
check when to run a function. We can store each function along with a unix epoch timestamp
that tells us when it should run. Then we'll poll at some designated tick interval and
check the list for any jobs that are due to be run, run them, and then remove them
from the list.</p>
<pre><code>from time import sleep, time
import threading

class Scheduler:
    def __init__(self):
        self.fns = []  # list of (fn, due) tuples
        t = threading.Thread(target=self.poll)
        t.start()

    def poll(self):
        while True:
            now = time() * 1000
            for fn, due in self.fns:
                if now > due:
                    fn()
            self.fns = [(fn, due) for (fn, due) in self.fns if due > now]
            sleep(0.01)

    def delay(self, f, n):
        self.fns.append((f, time() * 1000 + n))
</code></pre>
<p>We'll stop here, but you can go much farther with this. Some extra credit work:</p>
<ul><li>Extend the scheduler to allow calling delayed functions with variables</li><li>Use a heap instead of a list to keep track of the next job to run more efficiently</li><li>Use a condition variable instead of polling (it just polls lower in the stack)</li><li>Use a threadpool or other mechanism to decrease the chance of starvation (one thread
not being able to run because of another running thread)</li></ul>
<p>This problem was asked by Twitter.</p>
<p>Implement an autocomplete system. That is, given a query string <code>s</code> and a set of all possible query strings,
return all strings in the set that have s as a prefix.</p>
<p>For example, given the query string <code>de</code> and the set of strings [<code>dog</code>, <code>deer</code>, <code>deal</code>], return [<code>deer</code>, <code>deal</code>].</p>
<p>Hint: Try preprocessing the dictionary into a more efficient data structure to speed up queries.</p>
<p>The naive solution here is very straightforward: we only need to iterate over the
dictionary and check if each word starts with our prefix. If it does, we
add it to our set of results and return the set once we're done.</p>
<pre><code>WORDS = ['foo', 'bar', ...]

def autocomplete(s):
    results = set()
    for word in WORDS:
        if word.startswith(s):
            results.add(word)
    return results
</code></pre>
<p>This runs in O(N) time, where N is the number of words in the dictionary.
Let's think about making this more efficient. We can preprocess the words,
but what data structure would be best for our problem?</p>
<p>If we pre-sort the list, we could use binary search to find the first word
that includes our prefix and then the last, and return everything in between.</p>
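<p>A sketch of that idea using the standard <code>bisect</code> module (assuming the word list has been sorted ahead of time): we binary search for the first candidate and then scan forward while words still share the prefix.</p>
<pre><code>from bisect import bisect_left

SORTED_WORDS = sorted(['dog', 'deer', 'deal'])

def autocomplete_sorted(prefix):
    # All matches form a contiguous block starting at the first word >= prefix.
    start = bisect_left(SORTED_WORDS, prefix)
    results = []
    for word in SORTED_WORDS[start:]:
        if not word.startswith(prefix):
            break
        results.append(word)
    return results
</code></pre>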
<p>Alternatively, we could use a tree for this. Not a binary tree, but a tree where each child
represents one character of the alphabet. For example, let's say we had the
words 'a' and 'dog' in our dictionary. Then the tree would look like this:</p>
<pre><code>  x
 / \
a   d
     \
      o
       \
        g
</code></pre>
<p>Then, to find all words beginning with 'do', we could start at the root,
go into the 'd' child, and then the 'o' child, and gather up all the words
under there. We would also need some sort of terminal value to mark whether or not
'do' is itself a word in our dictionary. This data structure is known
as a <a>trie</a>.</p>
<p>So the idea is to preprocess the dictionary into this tree, and then when
we search for a prefix, go into the trie and get all the words under
that prefix node and return those. While the worst-case runtime would still be
O(n) if all the search results have that prefix, if the words are uniformly
distributed across the alphabet, it should be much faster on average since we
no longer have to evaluate words that don't start with our prefix.</p>
<pre><code>ENDS_HERE = '__ENDS_HERE'

class Trie(object):
    def __init__(self):
        self._trie = {}

    def insert(self, text):
        trie = self._trie
        for char in text:
            if char not in trie:
                trie[char] = {}
            trie = trie[char]
        trie[ENDS_HERE] = True

    def elements(self, prefix):
        d = self._trie
        for char in prefix:
            if char in d:
                d = d[char]
            else:
                return []
        return self._elements(d)

    def _elements(self, d):
        result = []
        for c, v in d.items():
            if c == ENDS_HERE:
                subresult = ['']
            else:
                subresult = [c + s for s in self._elements(v)]
            result.extend(subresult)
        return result

trie = Trie()
for word in WORDS:
    trie.insert(word)

def autocomplete(s):
    suffixes = trie.elements(s)
    return [s + w for w in suffixes]
</code></pre>
<p>This problem was asked by Amazon.</p>
<p>There exists a staircase with N steps, and you can climb up either 1 or 2 steps at a time.
Given N, write a function that returns the number of unique ways you can climb the staircase.
The order of the steps matters.</p>
<p>For example, if N is 4, then there are 5 unique ways:</p>
<ul><li>1, 1, 1, 1</li><li>2, 1, 1</li><li>1, 2, 1</li><li>1, 1, 2</li><li>2, 2</li></ul>
<p>What if, instead of being able to climb 1 or 2 steps at a time, you could climb any number
from a set of positive integers X? For example, if X = {1, 3, 5}, you could climb 1, 3, or 5
steps at a time.</p>
<p>It's always good to start off with some test cases. Let's start with small cases
and see if we can find some sort of pattern.</p>
<ul><li>N = 1: [1]</li><li>N = 2: [1, 1], [2]</li><li>N = 3: [1, 2], [1, 1, 1], [2, 1]</li><li>N = 4: [1, 1, 2], [2, 2], [1, 2, 1], [1, 1, 1, 1], [2, 1, 1]</li></ul>
<p>What's the relationship?</p>
<p>The only ways to get to N = 3 are to first get to N = 1 and then go up by 2
steps, or to get to N = 2 and go up by 1 step. So f(3) = f(2) + f(1).</p>
<p>Does this hold for N = 4? Yes, it does. Since we can only get to the 4th step
by getting to the 3rd step and going up by one, or by getting to the 2nd step
and going up by two. So f(4) = f(3) + f(2).</p>
<p>To generalize, f(n) = f(n - 1) + f(n - 2). That's just the <a>Fibonacci sequence</a>!</p>
<pre><code>def staircase(n):
    if n <= 1:
        return 1
    return staircase(n - 1) + staircase(n - 2)
</code></pre>
<p>Of course, this is really slow (O(2^N)) — we are doing a lot of repeated computations!
We can do it a lot faster by just computing iteratively:</p>
<pre><code>def staircase(n):
    a, b = 1, 2
    for _ in range(n - 1):
        a, b = b, a + b
    return a
</code></pre>
<p>Now, let's try to generalize what we've learned so that it works if you can take a number of steps
from the set X. Similar reasoning tells us that if X = {1, 3, 5}, then our algorithm should be
f(n) = f(n - 1) + f(n - 3) + f(n - 5). If n < 0, then we should return 0 since we can't start
from a negative number of steps.</p>
<pre><code>def staircase(n, X):
    if n < 0:
        return 0
    elif n == 0:
        return 1
    else:
        return sum(staircase(n - x, X) for x in X)
</code></pre>
<p>This is, again, very slow (O(|X|^N)) since we are repeating computations. We can
use dynamic programming to speed it up.</p>
<p>Each entry cache[i] will contain the number of ways we can get to step i with the set X.
Then, we'll build up the array from zero using the same recurrence as before:</p>
<pre><code>def staircase(n, X):
    cache = [0 for _ in range(n + 1)]
    cache[0] = 1
    for i in range(1, n + 1):
        cache[i] += sum(cache[i - x] for x in X if i - x >= 0)
    return cache[n]
</code></pre>
<p>This now takes O(N * |X|) time and O(N) space.</p>
<p>This problem was asked by Amazon.</p>
<p>Given an integer k and a string s, find the length of the longest substring that contains at most k distinct characters.</p>
<p>For example, given s = "abcba" and k = 2, the longest substring with k distinct characters is "bcb".</p>
<p>The most obvious brute force solution here is to simply try every possible substring of the string
and check whether it contains at most <code>k</code> distinct characters. If it does and it is greater than
the current longest valid substring, then update the current one. This takes O(n^2 * k) time,
since we use n^2 to generate each possible substring, and then take <code>k</code> to check each character.</p>
<pre><code>def longest_substring_with_k_distinct_characters(s, k):
    current_longest_substring = ''
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            substring = s[i:j]
            if len(set(substring)) <= k and len(substring) > len(current_longest_substring):
                current_longest_substring = substring
    return len(current_longest_substring)
</code></pre>
<p>We can improve this by instead keeping a running window of our longest substring.
We'll keep a dictionary that maps characters to the index of their last occurrence. Then,
as we iterate over the string, we'll check the size of the dictionary. If it's larger
than k, then it means our window is too big, so we have to pop the item with the smallest
last-occurrence index from the dictionary and recompute the bounds. If adding a character to the
dictionary doesn't push its size over k, then we're safe -- the dictionary hasn't been
filled up yet, or it's a character we've seen before.</p>
<pre><code>def longest_substring_with_k_distinct_characters(s, k):
    if k == 0:
        return 0

    # Keep a running window
    bounds = (0, 0)
    h = {}
    max_length = 0
    for i, char in enumerate(s):
        h[char] = i
        if len(h) <= k:
            new_lower_bound = bounds[0]  # lower bound remains the same
        else:
            # otherwise, pop the char with the earliest last occurrence
            key_to_pop = min(h, key=h.get)
            new_lower_bound = h.pop(key_to_pop) + 1

        bounds = (new_lower_bound, bounds[1] + 1)
        max_length = max(max_length, bounds[1] - bounds[0])
    return max_length
</code></pre>
<p>This takes O(n * k) time and O(k) space.</p>
<p>This problem was asked by Google.</p>
<p>The area of a circle is defined as πr^2. Estimate π to 3 decimal places using a Monte Carlo method.</p>
<p>Hint: The basic equation of a circle is x^2 + y^2 = r^2.</p>
<p>Monte Carlo methods rely on random sampling. In this case, if we take a cartesian plane and inscribe a
circle with radius <code>r</code> inside a square with sides of length <code>2r</code>, then the area of the circle will be πr^2 while
the area of the square will be (2r)^2 = 4r^2. Then, the ratio of the areas of the circle to the square
is <code>π / 4</code>.</p>
<p>So, what we can do is the following:</p>
<ul><li>Set r to be 1 (the unit circle)</li><li>Randomly generate points within the square with corners (-1, -1), (1, 1), (1, -1), (-1, 1)</li><li>Keep track of the points that fall inside and outside the circle<ul><li>You can check whether a point (x, y) is inside the circle if x^2 + y^2 < r^2, which is another way of representing a circle</li></ul></li><li>Divide the number of points that fall inside the circle by the total number of points -- that should give us an approximation of π / 4.</li></ul>
<pre><code>from random import uniform

def generate():
    return (uniform(-1, 1), uniform(-1, 1))

def is_in_circle(coords):
    return coords[0] * coords[0] + coords[1] * coords[1] < 1

def estimate():
    iterations = 10000000
    in_circle = 0
    for _ in range(iterations):
        if is_in_circle(generate()):
            in_circle += 1
    pi_over_four = in_circle / iterations
    return pi_over_four * 4
</code></pre>
<p>Note that this doesn't give a perfect approximation -- we need more iterations to get a closer estimate.
We want the digits of pi up to 3 decimal places, which translates to an error of < 10^(-3). The error
scales with the inverse square root of the number of samples, which means we need about 10^6 iterations to reach
our desired precision. If we want more precision, we'll have to crank up the iterations.</p>
<p>This problem <em>is</em> <a>embarrassingly parallel</a>.
None of the estimations have any dependent computations, so we can parallelize this problem easily -- divide up the workload across the <code>P</code> processes you have,
and then add up all the points in the circle at the end. Extra credit: make this program multi-process.</p>
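<p>For example, a minimal multi-process sketch using the standard <code>multiprocessing</code> module (the sample count and process count below are arbitrary choices) could look like this:</p>
<pre><code>from multiprocessing import Pool
from random import uniform

def count_in_circle(samples):
    # Count how many random points in the square land inside the unit circle.
    in_circle = 0
    for _ in range(samples):
        x, y = uniform(-1, 1), uniform(-1, 1)
        if x * x + y * y < 1:
            in_circle += 1
    return in_circle

def estimate_parallel(total_samples=10000000, processes=4):
    per_process = total_samples // processes
    with Pool(processes) as pool:
        counts = pool.map(count_in_circle, [per_process] * processes)
    return 4 * sum(counts) / (per_process * processes)

if __name__ == '__main__':
    print(estimate_parallel())
</code></pre>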
<p>This problem was asked by Facebook.</p>
<p>Given a stream of elements too large to store in memory, pick a random element from the stream with uniform probability.</p>
<p>Naively, we could process the stream and store all the elements we encounter in a list, find its size,
and pick a random element from [0, size - 1]. The problem with this approach is that it would take
O(N) space for a large N.</p>
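<p>Sketched out, that naive approach would look something like this (shown only for contrast, since the O(N) space is exactly what we want to avoid):</p>
<pre><code>import random

def pick_naive(big_stream):
    # Materializes the entire stream in memory.
    elements = list(big_stream)
    return elements[random.randint(0, len(elements) - 1)]
</code></pre>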
<p>Instead, let’s attempt to solve using loop invariants. On the ith iteration of our loop to pick a
random element, let’s assume we already picked an element uniformly from [0, i - 1]. In order to
maintain the loop invariant, we would need to pick the ith element as the new random element at 1 / (i + 1)
chance. For the base case where i = 0, let’s say the random element is the first one. Then we know it
works because</p>
<ul><li>For i >= 0, before the loop began, any element K in [0, i - 1] had a 1 / i chance of being chosen as the random element. We want K to have a 1 / (i + 1) chance of being chosen after the iteration. This is the case since the chance of having been chosen already but not getting swapped with the ith element is 1 / i * (1 - (1 / (i + 1))), which is 1 / i * i / (i + 1), or 1 / (i + 1).</li></ul>
<p>Let’s see how the code would look:</p>
<pre><code>import random

def pick(big_stream):
    random_element = None
    for i, e in enumerate(big_stream):
        if random.randint(1, i + 1) == 1:
            random_element = e
    return random_element
</code></pre>
<p>Since we are only storing a single variable, this only takes up constant space!</p>
<p>By the way, this is called <a>reservoir sampling</a>!</p>
<p>This problem was asked by Twitter.</p>
<p>You run an e-commerce website and want to record the last <code>N</code> <code>order</code> ids in a log.
Implement a data structure to accomplish this, with the following API:</p>
<ul><li>record(order_id): adds the order_id to the log</li><li>get_last(i): gets the ith last element from the log. i is guaranteed to be smaller than or equal to N.</li></ul>
<p>You should be as efficient with time and space as possible.</p>
<p>It seems like an array would be the perfect fit for this problem. We can just initialize the
array to have size N, and index it in constant time. Then, when we record any orders, we can
pop off the first order and append the new one to the end. Getting the ith last order would then just
be indexing the array at <code>length - i</code>.</p>
<pre><code>class Log(object):
    def __init__(self, n):
        self._log = []
        self.n = n

    def record(self, order_id):
        if len(self._log) >= self.n:
            self._log.pop(0)
        self._log.append(order_id)

    def get_last(self, i):
        return self._log[-i]
</code></pre>
<p>There is one issue with this solution, however: when we have to pop off an element when the
array is full, we have to move every other element down by 1. That means <code>record</code> takes O(N)
time. How can we improve this?</p>
<p>What we can do to avoid having to move every element down by 1 is to keep a current index and move it up
each time we record something. For <code>get_last</code>, we can simply take <code>current - i</code> to get the appropriate element.
Now, both <code>record</code> and <code>get_last</code> should take constant time.</p>
<pre><code>class Log(object):
    def __init__(self, n):
        self.n = n
        self._log = []
        self._cur = 0

    def record(self, order_id):
        if len(self._log) == self.n:
            self._log[self._cur] = order_id
        else:
            self._log.append(order_id)
        self._cur = (self._cur + 1) % self.n

    def get_last(self, i):
        return self._log[self._cur - i]
</code></pre>
<p>By the way, this is called a ring buffer or <a>circular buffer</a>!</p>
<p>This problem was asked by Google.</p>
<p>Suppose we represent our file system by a string in the following manner:</p>
<p>The string <code>"dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext"</code> represents:</p>
<pre><code>dir
    subdir1
    subdir2
        file.ext
</code></pre>
<p>The directory <code>dir</code> contains an empty sub-directory <code>subdir1</code> and a sub-directory <code>subdir2</code> containing a file <code>file.ext</code>.</p>
<p>The string <code>"dir\n\tsubdir1\n\t\tfile1.ext\n\t\tsubsubdir1\n\tsubdir2\n\t\tsubsubdir2\n\t\t\tfile2.ext"</code> represents:</p>
<pre><code>dir
    subdir1
        file1.ext
        subsubdir1
    subdir2
        subsubdir2
            file2.ext
</code></pre>
<p>The directory <code>dir</code> contains two sub-directories <code>subdir1</code> and <code>subdir2</code>. <code>subdir1</code> contains a file <code>file1.ext</code>
and an empty second-level sub-directory <code>subsubdir1</code>. <code>subdir2</code> contains a second-level sub-directory
<code>subsubdir2</code> containing a file <code>file2.ext</code>.</p>
<p>We are interested in finding the longest (number of characters) absolute path to a file within our
file system. For example, in the second example above, the longest absolute path is <code>"dir/subdir2/subsubdir2/file2.ext"</code>,
and its length is 32 (not including the double quotes).</p>
<p>Given a string representing the file system in the above format, return the length of the longest
absolute path to a file in the abstracted file system. If there is no file in the system, return 0.</p>
<p>Note:</p>
<p>The name of a file contains at least a period and an extension.</p>
<p>The name of a directory or sub-directory will not contain a period.</p>
<p>There are two steps in solving this question: we must first parse the string
representing the file system and then get the longest absolute path to a file.</p>
<h6>Step 1: Parsing the file system</h6>
<p>Ideally, we would initially parse the string given into a dictionary of some sort.
That would mean a string like:</p>
<pre><code>dir\n\tsubdir1\n\t\tfile1.ext\n\t\tsubsubdir1\n\tsubdir2\n\t\tsubsubdir2\n\t\t\tfile2.ext
</code></pre>
<p>would become:</p>
<pre><code>{
    "dir": {
        "subdir1": {
            "file1.ext": True,
            "subsubdir1": {}
        },
        "subdir2": {
            "subsubdir2": {
                "file2.ext": True
            }
        }
    }
}
</code></pre>
<p>where each key with a dictionary as its value represents a directory, and a key
with <code>True</code> as its value represents an actual file.</p>
<p>To achieve this, we can first split the string by the newline character, meaning each
item in our array represents a file or directory. Then, we create an empty dictionary
to represent our parsed file system and traverse the file system on each entry.
We keep track of the last path we've seen so far in <code>current_path</code> because we may need
to return to some level in that path, depending on the number of tabs. Once we are at
the correct place to put down the new directory or file, we check the name for a <code>.</code>
and set the correct value to either <code>True</code> (if file) or <code>{}</code> (if directory).</p>
<pre><code>def build_fs(input):
    fs = {}
    files = input.split('\n')

    current_path = []
    for f in files:
        indentation = 0
        while '\t' in f[:2]:
            indentation += 1
            f = f[1:]

        current_node = fs
        for subdir in current_path[:indentation]:
            current_node = current_node[subdir]

        if '.' in f:
            current_node[f] = True
        else:
            current_node[f] = {}

        current_path = current_path[:indentation]
        current_path.append(f)

    return fs
</code></pre>
<h6>Step 2: Computing the longest path</h6>
<p>After we've constructed a native representation of the file system, we can write a fairly
straightforward recursive function that takes the current root, recursively calculates the
<code>longest_path</code> of all the subdirectories and files under the root, and returns the longest
one. Remember that, since we specifically want the longest path to a file, we discard any paths
that do not have a <code>.</code> in them. And if there are no paths starting at this root, then we can
simply return the empty string.</p>
<pre><code>def longest_path(root):
    paths = []
    for key, node in root.items():
        if node == True:
            paths.append(key)
        else:
            paths.append(key + '/' + longest_path(node))
    # filter out unfinished paths
    paths = [path for path in paths if '.' in path]
    if paths:
        return max(paths, key=lambda path: len(path))
    else:
        return ''
</code></pre>
<h6>Step 3: Putting it together</h6>
<p>Now that the hard part is done, we just need to put the two together:</p>
<pre><code>def longest_absolute_path(s):
    return len(longest_path(build_fs(s)))
</code></pre>
<p>This runs in O(n), since we iterate over the input string twice to build the
file system, and then in the worst case we go through the string again
to compute the longest path.</p>
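<p>As a quick sanity check against the second example above:</p>
<pre><code>s = "dir\n\tsubdir1\n\t\tfile1.ext\n\t\tsubsubdir1\n\tsubdir2\n\t\tsubsubdir2\n\t\t\tfile2.ext"
assert longest_absolute_path(s) == 32  # "dir/subdir2/subsubdir2/file2.ext"
</code></pre>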
<p>This problem was asked by Google.</p>
<p>Given an array of integers and a number k, where 1 <= k <= length of the array, compute the maximum values of each subarray of length k.</p>
<p>For example, given array = [10, 5, 2, 7, 8, 7] and k = 3, we should get: [10, 7, 8, 8], since:</p>
<ul><li>10 = max(10, 5, 2)</li><li>7 = max(5, 2, 7)</li><li>8 = max(2, 7, 8)</li><li>8 = max(7, 8, 7)</li></ul>
<p>Do this in O(n) time and O(k) space. You can modify the input array in-place and you do not need to store the results. You can simply print them out as you compute them.</p>
<p>Even though the question states O(n), in an interview it's always useful to first write out
a brute force solution, which may provide us with some insight on some deeper structure
in the problem.</p>
<p>So let's first write out a naive solution: we can simply take each subarray of k length and compute
their maxes.</p>
<pre><code>def max_of_subarrays(lst, k):
    for i in range(len(lst) - k + 1):
        print(max(lst[i:i + k]))
</code></pre>
<p>This takes O(n * k) time, which doesn't get us quite to where we want. How can we make this faster?</p>
<p>One possible idea is this: we could use a max-heap of size k and add the first k elements to the heap
initially, and then pop off the max and add the next element for the rest of the array. This is better,
but adding and extracting from the heap will take O(log k), so this algorithm will take O(n * log k),
which is still not enough. How can we do better?</p>
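<p>For what it's worth, one way to realize that heap idea is to store (value, index) pairs and lazily discard entries that have fallen out of the window -- a detail the description above glosses over. A sketch:</p>
<pre><code>import heapq

def max_of_subarrays_heap(lst, k):
    # Python's heapq is a min-heap, so store negated values along with indices.
    heap = [(-val, i) for i, val in enumerate(lst[:k])]
    heapq.heapify(heap)
    print(-heap[0][0])
    for i in range(k, len(lst)):
        heapq.heappush(heap, (-lst[i], i))
        # Lazily drop entries that are outside the current window.
        while heap[0][1] <= i - k:
            heapq.heappop(heap)
        print(-heap[0][0])
</code></pre>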
<p>Notice that, for example, the input [1, 2, 3, 4, 5, 6, 7, 8, 9] and k = 3, after evaluating the max of
first range, since 3 is at the end, we only need to check whether 4 is greater than 3. If it is, then
we can print 4 immediately, and if it isn't, we can stick with 3.</p>
<p>On the other hand, for the input [9, 8, 7, 6, 5, 4, 3, 2, 1] and k = 3, after evaluating the max of
the first range, we can't do the same thing, since we can't use 9 again. We have to look at 8 instead,
and then once we move on to the next range, we have to look at 7.</p>
<p>These two data points suggest an idea: we can keep a double-ended queue with max size k and only keep
what we need to evaluate in it. That is, if we see [1, 3, 5], then we only need to keep [5], since
we know that 1 and 3 cannot possibly be the maxes.</p>
<p>So what we can do is maintain an ordered list of indices, where we only keep the elements we care about,
that is, we will maintain the loop invariant that our queue is always ordered so that we only keep
the indices we care about (i.e, there are no elements that are greater after, since we would just pick
the greater element as the max instead).</p>
<p>It will help to go over an example. Consider our test input: [10, 5, 2, 7, 8, 7] and k = 3. Our queue at each step would look like this (recall that these are indices):</p>
<h3>Preprocessing</h3>
<p>After processing 10: [0]
After processing 5: [0, 1] # 5 is smaller than 10, and 10 is still valid until we hit the 3rd index
After processing 2: [0, 1, 2] # 2 is smaller than 5, and 10 is still valid</p>
<h3>Main Loop</h3>
<p>Print value of first element in our queue: <strong>10</strong></p>
<p>After processing 7: [3] # 10 is no longer valid (we can tell since the current index - 0 >= k), so we dequeue it from the front. 7 is bigger than 5 and 2, so we get rid of them from the back and replace them with the 7</p>
<p>Print value of first element in our queue: <strong>7</strong></p>
<p>After processing 8: [4] # 8 is bigger than 7, so no point in keeping 7 around. We get rid of it from the back and replace it with the 8</p>
<p>Print value of first element in our queue: <strong>8</strong></p>
<p>After processing 7: [4, 5] # 7 is smaller than 8, so we enqueue it from the back</p>
<p>Print value of first element in our queue: <strong>8</strong></p>
<h3>Code</h3>
<pre><code>from collections import deque

def max_of_subarrays(lst, k):
    q = deque()
    for i in range(k):
        while q and lst[i] >= lst[q[-1]]:
            q.pop()
        q.append(i)

    # Loop invariant: q is a list of indices whose corresponding values are in descending order.
    for i in range(k, len(lst)):
        print(lst[q[0]])
        while q and q[0] <= i - k:
            q.popleft()
        while q and lst[i] >= lst[q[-1]]:
            q.pop()
        q.append(i)
    print(lst[q[0]])
</code></pre>
<p>This problem was asked by Facebook.</p>
<p>A builder is looking to build a row of N houses that can be of K different colors. He has
a goal of minimizing cost while ensuring that no two neighboring houses are of the same color.</p>
<p>Given an N by K matrix where the nth row and kth column represents the cost to build the nth
house with kth color, return the minimum cost which achieves this goal.</p>
<p>The brute force solution here would be to generate all possible combinations of houses
and colors, filter out invalid combinations, and keep track of the lowest cost seen.
This would take O(K^N) time, since there are K color choices for each of the N houses.</p>
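<p>A brute-force sketch of that idea, using <code>itertools.product</code> to enumerate every assignment of colors to houses (only practical for tiny inputs):</p>
<pre><code>from itertools import product

def build_houses_brute_force(matrix):
    n, k = len(matrix), len(matrix[0])
    best = float('inf')
    for colors in product(range(k), repeat=n):
        # Skip assignments where two neighboring houses share a color.
        if any(colors[i] == colors[i + 1] for i in range(n - 1)):
            continue
        cost = sum(matrix[i][c] for i, c in enumerate(colors))
        best = min(best, cost)
    return best
</code></pre>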
<p>We can solve this problem faster using dynamic programming. We can maintain a matrix cache
where every entry [i][j] represents the minimum cost of painting house i the color j,
as well as painting every house < i. We can calculate this by adding the cost of painting house i
the color j to the minimum cost of painting the houses before it, where house i - 1 may be any color except j,
since that would break our constraint. We'll initialize the first row with zeroes to start.
Then, we just have to look at the smallest value in the last row of our cache, since
that represents the minimum cost of painting every house.</p>
<pre><code>def build_houses(matrix):
    n = len(matrix)
    k = len(matrix[0])
    solution_matrix = [[0] * k]

    # solution_matrix[i][j] represents the minimum cost to build house i with color j.
    for r, row in enumerate(matrix):
        row_cost = []
        for c, val in enumerate(row):
            row_cost.append(min(solution_matrix[r][i] for i in range(k) if i != c) + val)
        solution_matrix.append(row_cost)

    return min(solution_matrix[-1])
</code></pre>
<p>This runs in O(N * K^2) time and O(N * K) space. Can we do even better than this?</p>
<p>First off, notice that we're only ever looking at the last row when computing the next
row's cost. That suggests that we only need to keep track of one array of size K instead
of a whole matrix of size N * K:</p>
<pre><code>def build_houses(matrix):
    k = len(matrix[0])
    soln_row = [0] * k

    for r, row in enumerate(matrix):
        new_row = []
        for c, val in enumerate(row):
            new_row.append(min(soln_row[i] for i in range(k) if i != c) + val)
        soln_row = new_row

    return min(soln_row)
</code></pre>
<p>Now we're only using O(K) space! Can we improve this any more?</p>
<p>Hold on a second. When we're looking at the previous row's total cost, it looks like we're almost
computing the same thing each time: the minimum of the previous row that isn't the current index.</p>
<p>For every element that <strong>isn't</strong> that index, it will be the same value. When it <strong>is</strong> that index,
it will be the second-smallest value.</p>
<p>Now, armed with this insight, we only need to keep track of three variables:</p>
<ul><li>The lowest cost of the current row</li><li>The index of the lowest cost</li><li>The second lowest cost</li></ul>
<p>Then, when looking at the value at each row, we only need to do the following:</p>
<ul><li>Check if the index is the index of the lowest cost of the previous row. If it is, then we can't use
this color -- we'll use the second lowest cost instead. Otherwise, use the lowest cost
of the previous row</li><li>Calculate the minimum cost if we painted this house this particular color</li><li>Update our new lowest cost/index or second lowest cost if appropriate</li></ul>
<p>Now we'll always have our lowest cost in a variable, and once we've gone
through the matrix we can just return that.</p>
<pre><code>from math import inf

def build_houses(matrix):
    lowest_cost, lowest_cost_index = 0, -1
    second_lowest_cost = 0

    for r, row in enumerate(matrix):
        new_lowest_cost, new_lowest_cost_index = inf, -1
        new_second_lowest_cost = inf
        for c, val in enumerate(row):
            prev_lowest_cost = second_lowest_cost if c == lowest_cost_index else lowest_cost
            cost = prev_lowest_cost + val
            if cost < new_lowest_cost:
                new_second_lowest_cost = new_lowest_cost
                new_lowest_cost, new_lowest_cost_index = cost, c
            elif cost < new_second_lowest_cost:
                new_second_lowest_cost = cost
        lowest_cost = new_lowest_cost
        lowest_cost_index = new_lowest_cost_index
        second_lowest_cost = new_second_lowest_cost

    return lowest_cost
</code></pre>
<p>Now the runtime is only O(N * K) and the space complexity is O(1) - constant, since we keep track of only three
variables!</p>
<p>Thanks to Alexander Shirkov for pointing out these optimizations!</p>
<p>This problem was asked by Google.</p>
<p>Given two singly linked lists that intersect at some point, find the intersecting node. The lists
are non-cyclical.</p>
<p>For example, given A = 3 -> 7 -> 8 -> 10 and B = 99 -> 1 -> 8 -> 10, return the node with value 8.</p>
<p>In this example, assume nodes with the same value are the exact same node objects.</p>
<p>Do this in O(M + N) time (where M and N are the lengths of the lists) and constant space.</p>
<p>We might start this problem by first ignoring the time and space constraints, in
order to get a better grasp of the problem.</p>
<p>Naively, we could iterate through one of the lists and add each node to a set
or dictionary, then we could iterate over the other list and check each node
we're looking at to see if it's in the set. Then we'd return the first node
that is present in the set. This takes O(M + N) time but also O(max(M, N))
space (since we don't know initially which list is longer). How can we reduce
the amount of space we need?</p>
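<p>Sketched out, that naive approach might look like this (assuming a simple singly linked <code>Node</code> with a <code>next</code> pointer):</p>
<pre><code>def intersection_naive(a, b):
    # O(M + N) time, but O(M) extra space for the set of nodes.
    seen = set()
    while a:
        seen.add(a)
        a = a.next
    while b:
        if b in seen:
            return b
        b = b.next
    return None
</code></pre>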
<p>We can get around the space constraint with the following trick: first, get
the length of both lists. Find the difference between the two, and then keep
two pointers at the head of each list. Move the pointer of the larger list
up by the difference, and then move the pointers forward in conjunction and
check if they match.</p>
<pre><code>def length(head):
    if not head:
        return 0
    return 1 + length(head.next)

def intersection(a, b):
    m, n = length(a), length(b)
    cur_a, cur_b = a, b

    if m > n:
        for _ in range(m - n):
            cur_a = cur_a.next
    else:
        for _ in range(n - m):
            cur_b = cur_b.next

    while cur_a != cur_b:
        cur_a = cur_a.next
        cur_b = cur_b.next
    return cur_a
</code></pre>
<p>This problem was asked by Snapchat.</p>
<p>Given an array of time intervals (start, end) for classroom lectures (possibly overlapping), find the minimum number of rooms required.</p>
<p>For example, given [(30, 75), (0, 50), (60, 150)], you should return 2.</p>
<p>First, notice that the minimum number of rooms required is the maximum number of overlapping intervals.</p>
<p>Now let's consider the naive approach. We could go through each interval and
check every other interval and see if it overlaps, keeping track of the largest
number of overlapping intervals.</p>
<pre><code>def overlaps(a, b):
    start_a, end_a = a
    start_b, end_b = b
    # It doesn't overlap if it's like this:
    #   |start_a .... end_a|  <--->  |start_b ... end_b|
    # or like this:
    #   |start_b .... end_b|  <--->  |start_a ... end_a|
    # so return the negation of either of those cases
    return not (end_a < start_b or start_a > end_b)

def max_overlapping(intervals):
    current_max = 0
    for interval in intervals:
        num_overlapping = sum(overlaps(interval, other_interval)
                              for other_interval in intervals
                              if interval is not other_interval)
        current_max = max(current_max, num_overlapping)
    return current_max
</code></pre>
<p>This would take O(n^2) time, since we're checking each interval pairwise. Can we do any better?</p>
<p>One solution is to extract the start times and end times of all the intervals and sort them.
Then we can start two pointers on each list, and consider the following:</p>
<ul><li>If the current start is before the current end, then we have a new overlap. Increment the start pointer.</li><li>If the current start is after the current end, then our overlap closes. Increment the end pointer.</li></ul>
<p>All that's left to do is keep a couple variables to keep track of the maximum number of overlaps we've seen so far
and the current number of overlaps.</p>
<pre><code>def max_overlapping(intervals):
    starts = sorted(start for start, end in intervals)
    ends = sorted(end for start, end in intervals)

    current_max = 0
    current_overlap = 0
    i, j = 0, 0
    while i < len(intervals) and j < len(intervals):
        if starts[i] < ends[j]:
            current_overlap += 1
            current_max = max(current_max, current_overlap)
            i += 1
        else:
            current_overlap -= 1
            j += 1
    return current_max
</code></pre>
<p>This runs in O(n log n) time, since we have to sort the intervals.</p>
<p>This problem was asked by Microsoft.</p>
<p>Given a dictionary of words and a string made up of those words (no spaces), return the original
sentence in a list. If there is more than one possible reconstruction, return any of them. If
there is no possible reconstruction, then return null.</p>
<p>For example, given the set of words 'quick', 'brown', 'the', 'fox', and the string "thequickbrownfox",
you should return ['the', 'quick', 'brown', 'fox'].</p>
<p>Given the set of words 'bed', 'bath', 'bedbath', 'and', 'beyond', and the string "bedbathandbeyond",
return either ['bed', 'bath', 'and', 'beyond'] or ['bedbath', 'and', 'beyond'].</p>
<p>We might be initially tempted to take a greedy approach to this problem, by
for example, iterating over the string and checking if our current string matches
so far. However, you should immediately find that that can't work: consider
the dictionary {'the', 'theremin'} and the string 'theremin': we would find
'the' first, and then we wouldn't be able to match 'remin'.</p>
<p>So this greedy approach doesn't work, since we would need to go back if we get stuck.
This gives us a clue that we might want to use <a>backtracking</a>
to help us solve this problem. We also have the following idea for a recurrence:
If we split up the string into a prefix and suffix, then we can return the prefix
extended with a list of the rest of the sentence, but only if they're both valid.
So what we can do is the following:</p>
<ul><li>Iterate over the string and split it into a prefix and suffix</li><li>If the prefix is valid (appears in the dictionary), then recursively call on the suffix</li><li>If that's valid, then return. Otherwise, continue searching.</li><li>If we've gone over the entire sentence and haven't found anything, then return empty.</li></ul>
<p>We'll need a helper function to tell us whether the string can actually be broken up
into a sentence as well, so let's define <code>find_sentence_helper</code> that also returns
whether or not the sentence is valid.</p>
<pre><code>def find_sentence(dictionary, s):
    sentence, valid = find_sentence_helper(dictionary, s)
    if valid:
        return sentence

def find_sentence_helper(dictionary, s):
    if len(s) == 0:
        return [], True
    for i in range(len(s) + 1):
        prefix, suffix = s[:i], s[i:]
        if prefix in dictionary:
            rest, valid = find_sentence_helper(dictionary, suffix)
            if valid:
                return [prefix] + rest, True
    return [], False
</code></pre>
<p>This will run in O(2^N) time, however. This is because in the worst case,
say, for example, s = "aaaaab" and dictionary = ["a", "aa", "aaa", "aaaa", "aaaaa"],
we will end up exploring every possible way of splitting the string, and the number
of ways to split a string of length N is on the order of 2^N.</p>
<p>We can improve the running time by using dynamic programming to store repeated
subcomputations. This reduces the running time to just O(N^2). We'll keep a
dictionary that maps from indices to the last word that can be made up
to that index. We'll call these starts. Then, we just need to do two nested
for loops, one that iterates over the whole string and tries to find a
start at that index, and a loop that checks each start to see if a new word can
be made from that start to the current index.</p>
<p>Now we can simply take the start at the last index and build our sentence backwards:</p>
<pre><code>def find_sentence(s, dictionary):
    starts = {0: ''}
    for i in range(len(s) + 1):
        new_starts = starts.copy()
        for start_index, _ in starts.items():
            word = s[start_index:i]
            if word in dictionary:
                new_starts[i] = word
        starts = new_starts.copy()
    result = []
    current_length = len(s)
    if current_length not in starts:
        return None
    while current_length > 0:
        word = starts[current_length]
        current_length -= len(word)
        result.append(word)
    return list(reversed(result))
</code></pre>
<p>Now this runs in O(N^2) time and O(N) space.</p>
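<p>As a quick check of the bottom-up version (note that its argument order is <code>(s, dictionary)</code>):</p>
<pre><code>dictionary = {'quick', 'brown', 'the', 'fox'}
print(find_sentence('thequickbrownfox', dictionary))  # ['the', 'quick', 'brown', 'fox']
</code></pre>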
<p>This problem was asked by Google.</p>
<p>You are given an M by N matrix consisting of booleans that represents a board.
Each True boolean represents a wall. Each False boolean represents a tile you
can walk on.</p>
<p>Given this matrix, a start coordinate, and an end coordinate, return the minimum
number of steps required to reach the end coordinate from the start. If there is
no possible path, then return null. You can move up, left, down, and right. You cannot
move through walls. You cannot wrap around the edges of the board.</p>
<p>For example, given the following board:</p>
<pre><code>[[f, f, f, f],
[t, t, f, t],
[f, f, f, f],
[f, f, f, f]]
</code></pre>
<p>and start = <code>(3, 0)</code> (bottom left) and end = <code>(0, 0)</code> (top left), the minimum number of steps
required to reach the end is 7, since we would need to go through <code>(1, 2)</code> because there is
a wall everywhere else on the second row.</p>
<p>The idea here is to use either BFS or DFS to explore the board, starting
from the start coordinate, and keep track of what we've seen so far as
well as the steps from the start until we find the end coordinate.</p>
<p>In our case, we'll use BFS. We'll create a queue and initialize it with our
start coordinate, along with a count of 0. We'll also initialize a <code>seen</code> set
to ensure we only add coordinates we haven't seen before.</p>
<p>Then, as long as there's something still in the queue, we'll dequeue from the
queue and first check if it's our target coordinate -- if it is, then we can
just immediately return the count. Otherwise, we'll get the valid neighbours
of the coordinate we're working with (valid means not off the board and not a
wall), and enqueue them to the end of the queue.</p>
<p>To make sure the code doesn't get too messy, we'll define some helper
functions: <code>walkable</code>, which returns whether or not a tile is valid,
and <code>get_walkable_neighbours</code> which returns the valid neighbours of a coordinate.</p>
<pre><code>from collections import deque

# Given a row and column, returns whether that tile is walkable.
def walkable(board, row, col):
    if row < 0 or row >= len(board):
        return False
    if col < 0 or col >= len(board[0]):
        return False
    return not board[row][col]

# Gets walkable neighbouring tiles.
def get_walkable_neighbours(board, row, col):
    return [(r, c) for r, c in [
        (row, col - 1),
        (row - 1, col),
        (row + 1, col),
        (row, col + 1)]
        if walkable(board, r, c)
    ]

def shortest_path(board, start, end):
    seen = set()
    queue = deque([(start, 0)])
    while queue:
        coords, count = queue.popleft()
        if coords == end:
            return count
        seen.add(coords)
        neighbours = get_walkable_neighbours(board, coords[0], coords[1])
        queue.extend((neighbour, count + 1) for neighbour in neighbours
                     if neighbour not in seen)

board = [[False, False, False, False],
         [True, True, False, True],
         [False, False, False, False],
         [False, False, False, False]]
print(shortest_path(board, (3, 0), (0, 0)))  # 7
</code></pre>
<p>This code should run in O(M * N) time and space, since in the worst case we
need to examine the entire board to find our target coordinate.</p>
<p>This problem was asked by Google.</p>
<p>Implement locking in a binary tree. A binary tree node can be locked or unlocked
only if all of its descendants or ancestors are not locked.</p>
<p>Design a binary tree node class with the following methods:</p>
<ul><li><code>is_locked</code>, which returns whether the node is locked</li><li><code>lock</code>, which attempts to lock the node. If it cannot be locked, then it should return false.
Otherwise, it should lock it and return true.</li><li><code>unlock</code>, which unlocks the node. If it cannot be unlocked, then it should return false.
Otherwise, it should unlock it and return true.</li></ul>
<p>You may augment the node to add parent pointers or any other property you would like.
You may assume the class is used in a single-threaded program, so there is no need for actual locks or mutexes.
Each method should run in O(h), where h is the height of the tree.</p>
<p>A relatively easy way to implement this would be to augment each node with an <code>is_locked</code>
attribute as well as a parent pointer. We can then implement the methods
in a straightforward manner:</p>
<ul><li><code>is_locked</code> simply returns the node's attribute</li><li><code>lock</code> searches the node's children and parents for a true <code>is_locked</code> attribute.
If it is set to true on any of them, then return false. Otherwise, set the current
node's <code>is_locked</code> to true and return true.</li><li><code>unlock</code> simply changes the node's attribute to false. If we want to be safe,
then we should search the node's children and parents as in <code>lock</code> to make sure
we can actually unlock the node, but that shouldn't ever happen.</li></ul>
<p>While <code>is_locked</code> is O(1) time, <code>lock</code> and <code>unlock</code> will take O(m + h) time where
m is the number of nodes in the node's subtree (since we have to traverse through all its descendants)
and h is the height of the node (since we have to traverse through the node's ancestors).</p>
<p>We can improve the performance of <code>lock</code> and <code>unlock</code> by adding another field to the node
that keeps track of the count of locked descendants. That way, we can immediately
see whether any of its descendants are locked. This will reduce our <code>lock</code> and <code>unlock</code>
functions to only O(h). We can maintain this field by doing the following:</p>
<ul><li>When locking, if the locking succeeds, traverse the node's ancestors and increment each one's count</li><li>When unlocking, traverse the node's ancestors and decrement each one's count</li></ul>
<p>The code will look something like the following:</p>
<pre><code>class LockingBinaryTreeNode(object):
    def __init__(self, val, left=None, right=None, parent=None):
        self.val = val
        self.left = left
        self.right = right
        self.parent = parent
        # Named `locked` so the attribute doesn't shadow the is_locked() method.
        self.locked = False
        self.locked_descendants_count = 0

    def _can_lock_or_unlock(self):
        if self.locked_descendants_count > 0:
            return False
        cur = self.parent
        while cur:
            if cur.locked:
                return False
            cur = cur.parent
        return True

    def is_locked(self):
        return self.locked

    def lock(self):
        if self.locked:
            return False  # node already locked
        if not self._can_lock_or_unlock():
            return False
        # Not locked, so lock it and increment the count in all ancestors
        self.locked = True
        cur = self.parent
        while cur:
            cur.locked_descendants_count += 1
            cur = cur.parent
        return True

    def unlock(self):
        if not self.locked:
            return False  # node already unlocked
        if not self._can_lock_or_unlock():
            return False
        self.locked = False
        # Update count in all ancestors
        cur = self.parent
        while cur:
            cur.locked_descendants_count -= 1
            cur = cur.parent
        return True
</code></pre>
<p>Now, <code>is_locked</code> is still O(1), but <code>lock</code> and <code>unlock</code> are both O(h) instead of O(m + h).</p>
<p>This problem was asked by Facebook.</p>
<p>Implement regular expression matching with the following special characters:</p>
<ul><li><code>.</code> (period) which matches any single character</li><li><code>*</code> (asterisk) which matches zero or more of the preceding element</li></ul>
<p>That is, implement a function that takes in a string and a valid regular expression
and returns whether or not the string matches the regular expression.</p>
<p>For example, given the regular expression "ra." and the string "ray", your function
should return true. The same regular expression on the string "raymond" should return false.</p>
<p>Given the regular expression ".*at" and the string "chat", your function should return
true. The same regular expression on the string "chats" should return false.</p>
<p>This problem should strike you as recursive. The string should match the regex
if we can match the head of the string with the head of the regex and the rest of the
string with the rest of the regex. The special characters <code>.</code> and <code>*</code> make implementing
this a bit trickier, however, since the <code>*</code> means we can match 0 or any number of characters
in the beginning.</p>
<p>The basic idea then is to do the following. Let's call the string we want to match <code>s</code> and
the regex <code>r</code>.</p>
<ul><li>Base case: if <code>r</code> is empty, then return whether <code>s</code> is empty or not.</li><li>Otherwise, if the first thing in <code>r</code> is not followed by a <code>*</code>, then match the first character
of both <code>r</code> and <code>s</code>, and if they match, return <code>matches(s[1:], r[1:])</code>. If they don't, then return false.</li><li>If the first thing in <code>r</code> <em>is</em> followed by a <code>*</code>, then try every suffix substring of <code>s</code> on <code>r[2:]</code>
and return true if any suffix substring works.</li></ul>
<p>The code should look something like this:</p>
<pre><code>def matches_first_char(s, r):
    return len(s) > 0 and (s[0] == r[0] or r[0] == '.')

def matches(s, r):
    if r == '':
        return s == ''
    if len(r) == 1 or r[1] != '*':
        # The first character in the regex is not followed by a *.
        if matches_first_char(s, r):
            return matches(s[1:], r[1:])
        else:
            return False
    else:
        # The first character is followed by a *.
        # First, try zero length.
        if matches(s, r[2:]):
            return True
        # If that doesn't match straight away, then try globbing more prefixes
        # until the first character of the string doesn't match anymore.
        i = 0
        while matches_first_char(s[i:], r):
            if matches(s[i + 1:], r[2:]):
                return True
            i += 1
        return False
</code></pre>
<p>This takes O(len(s) * len(r)) time and space, since we potentially need to iterate over each suffix substring again
for each character.</p>
<p>Fun fact: Stephen Kleene introduced the <code>*</code> operator in regular expressions and as such, it
is sometimes referred to as the Kleene star.</p>
<p>This problem was asked by Google.</p>
<p>Given a singly linked list and an integer k, remove the kth last element from the list. k is guaranteed to be smaller than the length of the list.</p>
<p>The list is very long, so making more than one pass is prohibitively expensive.</p>
<p>Do this in constant space and in one pass.</p>
<p>If we didn't have the constraint of needing only to make one pass, this problem
would be trivial to implement. We could simply iterate over the whole list to
find out the total length N of the list, and then restart from the beginning
and iterate N - k steps and remove the node there. That would take constant
space as well.</p>
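<p>For reference, that two-pass version might look like the following sketch (it reuses the <code>Node</code> class from the one-pass solution below; the helper name is just for illustration):</p>
<pre><code>def remove_kth_from_linked_list_two_pass(head, k):
    # First pass: count the nodes.
    n = 0
    node = head
    while node:
        n += 1
        node = node.next
    # Second pass: stop at the node just before the one to remove,
    # which is the (n - k)th node (0-indexed), and splice it out.
    prev = head
    for _ in range(n - k - 1):
        prev = prev.next
    prev.next = prev.next.next
</code></pre>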
<p>However, given that we have the constraint of needing to make only one pass,
we have to find some way of getting the N - kth node in the list
in one shot.</p>
<p>What we can do, then, is this:</p>
<ul><li>Set up two pointers at the head of the list (let's call them <code>fast</code> and <code>slow</code>)</li><li>Move <code>fast</code> up by <code>k</code></li><li>Move both <code>fast</code> and <code>slow</code> together until <code>fast</code> reaches the end of the list</li><li>Now <code>slow</code> is at the N - kth node, remove it</li></ul>
<p>That only makes one pass and uses constant space. The code should look something like
this:</p>
<pre><code>class Node:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next

    def __str__(self):
        current_node = self
        result = []
        while current_node:
            result.append(current_node.val)
            current_node = current_node.next
        return str(result)

def remove_kth_from_linked_list(head, k):
    slow, fast = head, head
    for i in range(k):
        fast = fast.next
    prev = None
    while fast:
        prev = slow
        slow = slow.next
        fast = fast.next
    prev.next = slow.next

head = Node(1, Node(2, Node(3, Node(4, Node(5)))))
print(head)
remove_kth_from_linked_list(head, 3)
print(head)
</code></pre>
<p>This problem was asked by Facebook.</p>
<p>Given a string of round, curly, and square open and closing brackets, return
whether the brackets are balanced (well-formed).</p>
<p>For example, given the string "([])[]({})", you should return true.</p>
<p>Given the string "([)]" or "((()", you should return false.</p>
<p>In this case, it's easy to start with a simplified case of the problem, which is
dealing with only round brackets. Notice that in this case, we just need to keep
track of the current number of open brackets -- each closing bracket should be
matched with the rightmost open bracket. So we can keep a counter and increment it
for every open bracket we see and decrement it on every closing bracket.
If we get to the end of the string and have a non-zero number, then it means it's unbalanced.
A negative number would indicate more closing brackets than open ones, and a positive number
would indicate the opposite.</p>
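<p>A minimal sketch of that simplified, round-brackets-only counter (the early return also catches strings like ")(", where the count dips below zero):</p>
<pre><code>def balanced_round(s):
    open_count = 0
    for char in s:
        if char == '(':
            open_count += 1
        else:
            open_count -= 1
            if open_count < 0:
                return False  # a closing bracket with nothing to match
    return open_count == 0
</code></pre>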
<p>In the case of round, curly, and square brackets, we need to also keep track
of what <em>kind</em> of brackets they are as well, because we can't match a round
open bracket with a curly or square closing bracket. In this case, we can use a stack to keep
track of the actual bracket character and push onto it whenever we encounter
an open bracket, and pop if we encounter a matching closing bracket to the top
of the stack. If the stack is empty or it's not the correct matching bracket,
then we'll return false. If, by the end of the iteration, we have something left
over in the stack, then it means it's unbalanced -- so we'll return whether
it's empty or not.</p>
<pre><code>def balance(s):
    stack = []
    for char in s:
        if char in ["(", "[", "{"]:
            stack.append(char)
        else:
            # Check character is not unmatched
            if not stack:
                return False
            # Char is a closing bracket, check top of stack if it matches
            if (char == ")" and stack[-1] != "(") or \
               (char == "]" and stack[-1] != "[") or \
               (char == "}" and stack[-1] != "{"):
                return False
            stack.pop()
    return len(stack) == 0
</code></pre>
<p>Fun fact: "(())" is not a palindrome, nor is "()()". "())(" is a palindrome, though.</p>
<p>This problem was asked by Palantir.</p>
<p>Write an algorithm to justify text. Given a sequence of words
and an integer line length k, return a list of strings which represents each
line, fully justified.</p>
<p>More specifically, you should have as many words as possible in each line.
There should be at least one space between each word. Pad extra spaces when necessary
so that each line has exactly length k. Spaces should be distributed as
equally as possible, with the extra spaces, if any, distributed starting
from the left.</p>
<p>If you can only fit one word on a line, then you should pad the right-hand side
with spaces.</p>
<p>Each word is guaranteed not to be longer than k.</p>
<p>For example, given the list of words ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"] and k = 16, you should return the following:</p>
<pre><code>["the quick brown", # 1 extra space on the left
"fox jumps over", # 2 extra spaces distributed evenly
"the lazy dog"] # 4 extra spaces distributed evenly
</code></pre>
<p>It seems like the justification algorithm is independent from the groupings, so
immediately we should figure out two things:</p>
<ul><li>How to group lines together so that it is as close to k as possible (without going over)</li><li>Given a grouping of lines, justifying the text by appropriately distributing spaces</li></ul>
<p>To solve the first part, let's write a function <code>group_lines</code> that takes in all the
words in our input sequence as well as our target line length k, and returns a list of
lists of words representing the lines that we will eventually justify. Our main
strategy will be to iterate over all the words, keep a list of words for the current
line, and, because we want to fit as many words as possible per line, estimate the
current line length, assuming only one space between each word. Once adding a word would push us over
<code>k</code>, we save the current line and start a new one with that word. So our function will look
something like this:</p>
<pre><code>def min_line(words):
    return ' '.join(words)

def group_lines(words, k):
    '''
    Returns groupings of |words| whose total length, including 1 space in between,
    is at most |k|.
    '''
    groups = []
    current_line = []
    for word in words:
        # Check if adding the next word would push it over
        # the limit. If it does, then add |current_line| to
        # groups. Also reset |current_line| properly.
        if len(min_line(current_line + [word])) > k:
            groups.append(current_line)
            current_line = []
        current_line.append(word)
    # Add the last line to groups.
    groups.append(current_line)
    return groups
</code></pre>
<p>Then, we'll want to actually justify each line. We know for sure each line we feed
from <code>group_lines</code> is the maximum number of words we can pack into a line and no
more. What we can do is first figure out how many spaces we have available to
distribute between each word. Then from that, we can calculate how much base
space we should have between each word by dividing it by the number of words minus one.
If there are any leftover spaces to distribute, then we can keep track of that
in a counter, and as we rope in each new word we'll add the appropriate
number of spaces. We can't add more than one leftover space per word.</p>
<pre><code>def justify(words, length):
    '''
    Precondition: |words| can fit in |length|.
    Justifies the words using the following algorithm:
    - Find the smallest spacing between each word (available_spaces / spaces)
    - Add a leftover space one-by-one until we run out
    '''
    if len(words) == 1:
        word = words[0]
        num_spaces = length - len(word)
        spaces = ' ' * num_spaces
        return word + spaces
    spaces_to_distribute = length - sum(len(word) for word in words)
    number_of_spaces = len(words) - 1
    smallest_space = spaces_to_distribute // number_of_spaces
    leftover_spaces = spaces_to_distribute - (number_of_spaces * smallest_space)
    justified_words = []
    for word in words:
        justified_words.append(word)
        current_space = ' ' * smallest_space
        if leftover_spaces > 0:
            current_space += ' '
            leftover_spaces -= 1
        justified_words.append(current_space)
    return ''.join(justified_words).rstrip()
</code></pre>
<p>The final solution should just combine our two functions:</p>
<pre><code>def justify_text(words, k):
    return [justify(group, k) for group in group_lines(words, k)]
</code></pre>
<p>This problem was asked by Amazon.</p>
<p>Run-length encoding is a fast and simple method of encoding strings. The basic idea is to represent
repeated successive characters as a single count and character. For example, the string "AAAABBBCCDAA" would
be encoded as "4A3B2C1D2A".</p>
<p>Implement run-length encoding and decoding. You can assume the string to be encoded has no digits
and consists solely of alphabetic characters. You can assume the string to be decoded is valid.</p>
<p>We can implement <code>encode</code> by iterating over our input string
and keeping a current count of whatever the current character is,
and once we encounter a different one, appending the count (as a string)
and the actual character to our result string.</p>
<pre><code>def encode(s):
    if not s:
        return ''
    result = ''
    current_char = s[0]
    current_count = 1
    for char in s[1:]:
        if char == current_char:
            current_count += 1
        else:
            result += str(current_count) + current_char
            current_char = char
            current_count = 1
    result += str(current_count) + current_char
    return result
</code></pre>
<p>We can implement <code>decode</code> by iterating over the encoded string and checking
each character for a digit. If it is, then calculate the correct count, and
once we find its corresponding character, extend the result with the character
count number of times and then reset the count.</p>
<pre><code>def decode(s):
    count = 0
    result = ''
    for char in s:
        if char.isdigit():
            count = count * 10 + int(char)
        else:
            # char is alphabetic
            result += char * count
            count = 0
    return result
</code></pre>
<p>This problem was asked by Facebook.</p>
<p>You are given an array of non-negative integers that represents a two-dimensional elevation map where each element
is a unit-width wall and the integer is its height. Suppose it will rain and all spots between two walls get filled up.</p>
<p>Compute how many units of water remain trapped on the map in O(N) time and O(1) space.</p>
<p>For example, given the input [2, 1, 2], we can hold 1 unit of water in the middle.</p>
<p>Given the input [3, 0, 1, 3, 0, 5], we can hold 3 units in the first index, 2 in the second,
and 3 in the fourth index (we cannot hold 5 since it would run off to the left), so we can
trap 8 units of water.</p>
<p>Notice that the amount of water that can be filled up at a certain index i is the
smaller of the largest height to the left and the largest height to the right minus
the actual value at that point, because it will be trapped by the smaller of the two sides.
So what we can do is create two arrays that represent the running maximum
heights, one from the left and one from the right. Then, to count the total
capacity, we can run through both arrays and at each index add the smaller of the
two running maxima minus the height at that index.</p>
<pre><code>def capacity(arr):
    n = len(arr)
    left_maxes = [0 for _ in range(n)]
    right_maxes = [0 for _ in range(n)]
    current_left_max = 0
    for i in range(n):
        current_left_max = max(current_left_max, arr[i])
        left_maxes[i] = current_left_max
    current_right_max = 0
    for i in range(n - 1, -1, -1):
        current_right_max = max(current_right_max, arr[i])
        right_maxes[i] = current_right_max
    total = 0
    for i in range(n):
        total += min(left_maxes[i], right_maxes[i]) - arr[i]
    return total
</code></pre>
<p>This is O(N) time, but also O(N) space, and we want constant space. So instead,
we can do this. We can find the largest element in the array, and then when we're
looking to the left of it, we only need to keep the running maximum from the left
(since we know the largest element in the array is to the right). Then we do a
similar thing, but starting from the right side. So the general gist is this:</p>
<ul><li>Find the maximum element in the array -- let's say it's at index i</li><li>Initialize a running maximum on the left to arr[0]</li><li>Iterate from index 1 to i. At each step, update the running maximum if necessary
and then increment a counter with the running maximum minus the value at that index.</li><li>Do the same thing but from len(arr) - 2 to i backwards, and keep the running maximum
on the right.</li></ul>
<pre><code>def capacity(arr):
    if not arr:
        return 0
    total = 0
    max_i = arr.index(max(arr))
    left_max = arr[0]
    for num in arr[1:max_i]:
        # Update the running maximum first so we never add a negative amount.
        left_max = max(left_max, num)
        total += left_max - num
    right_max = arr[-1]
    for num in arr[-2:max_i:-1]:
        right_max = max(right_max, num)
        total += right_max - num
    return total
</code></pre>
<p>This problem was asked by Google.</p>
<p>The edit distance between two strings refers to the minimum number of character insertions, deletions,
and substitutions required to change one string to the other. For example, the edit distance between
“kitten” and “sitting” is three: substitute the “k” for “s”, substitute the “e” for “i”, and append
a “g”.</p>
<p>Given two strings, compute the edit distance between them.</p>
<p>First, notice that we can probably define this problem recursively. How can we notice this?
If we look at the example (kitten -> sitting) and its solution path (kitten -> sitten -> sittin -> sitting),
we can see that it's the minimum distance between sitten and sitting plus one.</p>
<p>The recurrence, then, looks like this:</p>
<ul><li>If either <code>s1</code> or <code>s2</code> is empty, then return the size of the larger of the two strings (since
we can trivially turn an empty string into the other string by inserting all of its characters)</li><li>Otherwise, return the minimum between:<ul><li>The edit distance between each string and the last n - 1 characters of the other plus one</li><li>If the first character in each string is the same, then the edit distance between s1[1:] and s2[1:], otherwise the same edit distance + 1</li></ul></li></ul>
<p>So, the naive recursive solution would look like this:</p>
<pre><code>def distance(s1, s2):
    if len(s1) == 0 or len(s2) == 0:
        return max(len(s1), len(s2))
    return min(distance(s1[1:], s2) + 1,
               distance(s1, s2[1:]) + 1,
               distance(s1[1:], s2[1:]) if s1[0] == s2[0]
               else distance(s1[1:], s2[1:]) + 1)
</code></pre>
<p>However, this runs very slowly due to repeated subcomputations. We can speed it up by using
dynamic programming and storing the subcomputations in a 2D matrix. The index at i, j will
contain the edit distance between <code>s1[:i]</code> and <code>s2[:j]</code>. Then, once we fill it up, we can
return the value of the matrix at A[-1][-1].</p>
<pre><code>def distance(s1, s2):
    x = len(s1) + 1  # the length of the x-coordinate
    y = len(s2) + 1  # the length of the y-coordinate
    A = [[-1 for i in range(x)] for j in range(y)]
    for i in range(x):
        A[0][i] = i
    for j in range(y):
        A[j][0] = j
    for i in range(1, y):
        for j in range(1, x):
            if s1[j - 1] == s2[i - 1]:
                A[i][j] = A[i - 1][j - 1]
            else:
                A[i][j] = min(
                    A[i - 1][j] + 1,
                    A[i][j - 1] + 1,
                    A[i - 1][j - 1] + 1
                )
    return A[y - 1][x - 1]  # return the edit distance between the two strings
</code></pre>
<p>This now takes O(N * M) time and space, where N and M are the lengths of the strings.</p>
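<p>As a quick sanity check of the dynamic programming version:</p>
<pre><code>print(distance("kitten", "sitting"))  # 3
</code></pre>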
<p>This problem was asked by Jane Street.</p>
<p>Suppose you are given a table of currency exchange rates, represented as a 2D array.
Determine whether there is a possible arbitrage: that is, whether there is some sequence
of trades you can make, starting with some amount A of any currency, so that you can end up
with some amount greater than A of that currency.</p>
<p>There are no transaction costs and you can trade fractional quantities.</p>
<p>In this question, we can model the currencies and the exchange rates
as a graph, where the nodes are the currencies and the edges are the
exchange rates between each pair of currencies. Since our table is complete,
the graph is also complete. Then, to solve this problem, we need to
find a cycle whose edge weights product is greater than 1.</p>
<p>This seems hard to do faster than brute force, so let's try to reduce
it down to a problem we already know we can solve faster than brute force.
Hint: <code>log(a * b) = log(a) + log(b)</code>. So if we take the negative log of
the edge weights, the problem of finding a cumulative product that's
greater than 1 turns into the problem of finding a negative sum cycle.</p>
<p>The Bellman-Ford algorithm can detect negative cycles. So if we run
Bellman-Ford on our graph and discover one, then that means its
corresponding edge weights multiply out to more than 1, and thus
we can perform an arbitrage.</p>
<p>As a refresher, the Bellman-Ford algorithm is commonly used to find
the shortest path between a source vertex and each of the other vertices.
If the graph contains a negative cycle, however, it can detect it
and throw an exception (or, in our case, return true). The main idea of
Bellman-Ford is this:</p>
<p>Since any shortest path in a graph has at most |V| - 1 edges, if we
take all the direct edges from our source node, then we have all the
one-edged shortest paths; once we take edges from there, we have
all the two-edged shortest paths; and so on, up to paths of |V| - 1 edges.</p>
<p>If, after |V| - 1 iterations of this, we can still find a smaller
path, then there must be a negative cycle in the graph.</p>
<pre><code>from math import log

def arbitrage(table):
    transformed_graph = [[-log(edge) for edge in row] for row in table]
    # Pick any source vertex -- we can run Bellman-Ford from any vertex and
    # get the right result
    source = 0
    n = len(transformed_graph)
    min_dist = [float('inf')] * n
    min_dist[source] = 0
    # Relax edges |V - 1| times
    for i in range(n - 1):
        for v in range(n):
            for w in range(n):
                if min_dist[w] > min_dist[v] + transformed_graph[v][w]:
                    min_dist[w] = min_dist[v] + transformed_graph[v][w]
    # If we can still relax edges, then we have a negative cycle
    for v in range(n):
        for w in range(n):
            if min_dist[w] > min_dist[v] + transformed_graph[v][w]:
                return True
    return False
</code></pre>
<p>Because of the triply-nested for loop, this runs in O(N^3) time.</p>
<p>This problem was asked by Microsoft.</p>
<p>Compute the running median of a sequence of numbers. That is, given a stream of numbers,
print out the median of the list so far on each new element.</p>
<p>Recall that the median of an even-numbered list is the average of the two middle numbers.</p>
<p>For example, given the sequence [2, 1, 5, 7, 2, 0, 5], your algorithm should print out:</p>
<pre><code>2
1.5
2
3.5
2
2
2
</code></pre>
<p>For this problem, the trick is to use two heaps: a min-heap and a max-heap.
We keep all elements smaller than the median in the max-heap and all elements
larger than the median in the min-heap. We'll keep these heaps balanced so that
the median is always either the root of the min-heap or the max-heap (or both).</p>
<p>When we encounter a new element from the stream, we'll first add it to one of our heaps:
the max-heap if the element is smaller than the median, or the min-heap if it's bigger.
We can make the max-heap the default heap if they're equal or there are no elements.</p>
<p>Then we re-balance if necessary by moving the root of the larger heap to the smaller one.
It's only necessary if one heap is larger than the other by more than one element.</p>
<p>Finally, we can print out our median: it will just be the root of the larger heap,
or the average of the two roots if they're of equal size.</p>
<p>Since Python's built-in <code>heapq</code> module only provides a min-heap, we'll pretend we have some heap
objects that expose the standard interface (one possible heapq-backed version is sketched after the analysis below):</p>
<pre><code>def get_median(min_heap, max_heap):
    if len(min_heap) > len(max_heap):
        return min_heap.find_min()
    elif len(min_heap) < len(max_heap):
        return max_heap.find_max()
    else:
        min_root = min_heap.find_min()
        max_root = max_heap.find_max()
        return (min_root + max_root) / 2

def add(num, min_heap, max_heap):
    # If empty, then just add it to the max heap.
    if len(min_heap) + len(max_heap) <= 1:
        max_heap.insert(num)
        return
    median = get_median(min_heap, max_heap)
    if num > median:
        # add it to the min heap
        min_heap.insert(num)
    else:
        max_heap.insert(num)

def rebalance(min_heap, max_heap):
    if len(min_heap) > len(max_heap) + 1:
        root = min_heap.extract_min()
        max_heap.insert(root)
    elif len(max_heap) > len(min_heap) + 1:
        root = max_heap.extract_max()
        min_heap.insert(root)

def print_median(min_heap, max_heap):
    print(get_median(min_heap, max_heap))

def running_median(stream):
    min_heap = minheap()
    max_heap = maxheap()
    for num in stream:
        add(num, min_heap, max_heap)
        rebalance(min_heap, max_heap)
        print_median(min_heap, max_heap)
</code></pre>
<p>This runs in O(N) space. In terms of time, each new element takes O(log N) time to manipulate the heaps,
so this will run in O(N log N) time.</p>
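<p>For completeness, here is one way the <code>minheap</code> and <code>maxheap</code> objects above could be backed by Python's <code>heapq</code> module (a sketch, not part of the original solution); since <code>heapq</code> only provides a min-heap, the max-heap stores negated values:</p>
<pre><code>import heapq

class minheap:
    def __init__(self):
        self.data = []
    def __len__(self):
        return len(self.data)
    def insert(self, num):
        heapq.heappush(self.data, num)
    def find_min(self):
        return self.data[0]
    def extract_min(self):
        return heapq.heappop(self.data)

class maxheap:
    def __init__(self):
        self.data = []  # values are stored negated so the smallest entry is the real maximum
    def __len__(self):
        return len(self.data)
    def insert(self, num):
        heapq.heappush(self.data, -num)
    def find_max(self):
        return -self.data[0]
    def extract_max(self):
        return -heapq.heappop(self.data)
</code></pre>
<p>With these in scope, <code>running_median([2, 1, 5, 7, 2, 0, 5])</code> prints the sequence shown above.</p>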
<p>This problem was asked by Quora.</p>
<p>Given a string, find the palindrome that can be made by inserting the fewest number of characters as possible anywhere in the word.
If there is more than one palindrome of minimum length that can be made, return the lexicographically earliest one (the first one alphabetically).</p>
<p>For example, given the string "race", you should return "ecarace", since we can add three letters to it (which is the smallest amount to make a palindrome).
There are seven other palindromes that can be made from "race" by adding three letters, but "ecarace" comes first alphabetically.</p>
<p>As another example, given the string "google", you should return "elgoogle".</p>
<p>Notice that whenever we add a character, it should ideally match the one on the other side of the string.
We can use the following recurrence to solve this problem:</p>
<ul><li>If <code>s</code> is already a palindrome, then just return <code>s</code> -- it's already the shortest palindrome we can make</li><li>If the first character of <code>s</code> (let's call it <code>a</code>) is the same as the last, then return <code>a + make_palindrome(s[1:-1]) + a</code></li><li>If the first character of <code>s</code> (<code>a</code>) is different from the last character (let's call it <code>b</code>), then return the shorter of:<ul><li><code>a + make_palindrome(s[1:]) + a</code></li><li><code>b + make_palindrome(s[:-1]) + b</code>
or the lexicographically earliest one if their lengths are equal.</li></ul></li></ul>
<p>So a naive recursive solution might look like this:</p>
<pre><code>def is_palindrome(s):
    return s == s[::-1]

def make_palindrome(s):
    if is_palindrome(s):
        return s
    if s[0] == s[-1]:
        return s[0] + make_palindrome(s[1:-1]) + s[-1]
    else:
        one = s[0] + make_palindrome(s[1:]) + s[0]
        two = s[-1] + make_palindrome(s[:-1]) + s[-1]
        if len(one) < len(two):
            return one
        elif len(one) > len(two):
            return two
        else:
            return min(one, two)
</code></pre>
<p>Recall that the min of two strings in python will return the lexicographically earliest one!</p>
<p>However, this algorithm runs in O(2^N) time, since we could potentially make two recursive calls each time.
We can speed up using dynamic programming, as usual. We can either <a>memoize</a>
our results so that we don't duplicate any work, or use a table and do bottom-up programming.</p>
<p>Let's start with memoization. We can keep a cache and store all our results when we compute them in the cache.
If we come across a string we've seen before, then we just need to look it up in the cache.</p>
<pre><code>cache = {}

def is_palindrome(s):
    return s == s[::-1]

def make_palindrome(s):
    if s in cache:
        return cache[s]
    if is_palindrome(s):
        cache[s] = s
        return s
    if s[0] == s[-1]:
        result = s[0] + make_palindrome(s[1:-1]) + s[-1]
    else:
        one = s[0] + make_palindrome(s[1:]) + s[0]
        two = s[-1] + make_palindrome(s[:-1]) + s[-1]
        # As in the naive version, prefer the shorter result, then the
        # lexicographically earlier one.
        if len(one) < len(two):
            result = one
        elif len(one) > len(two):
            result = two
        else:
            result = min(one, two)
    cache[s] = result
    return result
</code></pre>
<p>However, this is inefficient due to buildup in the call stack. We can build a 2D table instead.
We'll store, in each index, the shortest palindrome that can be made in the substring defined
from <code>i</code> to <code>i + j</code>. Then instead of calling ourselves recursively, we'll just look up the values in
our table:</p>
<pre><code>def make_palindrome(s):
    if len(s) <= 1:
        return s
    table = [['' for i in range(len(s) + 1)] for j in range(len(s) + 1)]
    for i in range(len(s)):
        table[i][1] = s[i]
    for j in range(2, len(s) + 1):
        for i in range(len(s) - j + 1):
            term = s[i:i + j]
            first, last = term[0], term[-1]
            if first == last:
                table[i][j] = first + table[i + 1][j - 2] + last
            else:
                one = first + table[i + 1][j - 1] + first
                two = last + table[i][j - 1] + last
                if len(one) < len(two):
                    table[i][j] = one
                elif len(one) > len(two):
                    table[i][j] = two
                else:
                    table[i][j] = min(one, two)
    return table[0][-1]
</code></pre>
<p>Because we store a part of our input string in each index of our matrix, the time and space complexity
for this solution is O(N^3).</p>
<p>This problem was asked by Google.</p>
<p>Given an array of strictly the characters 'R', 'G', and 'B', segregate the values of the array
so that all the Rs come first, the Gs come second, and the Bs come last. You can only
swap elements of the array.</p>
<p>Do this in linear time and in-place.</p>
<p>For example, given the array ['G', 'B', 'R', 'R', 'B', 'R', 'G'], it should become ['R', 'R', 'R', 'G', 'G', 'B', 'B'].</p>
<p>It may be easier to first consider an easier problem: one with only two possible values, say
'R' and 'G'. Then we could maintain the following loop invariant quite easily:</p>
<ul><li>Maintain three sections of the array using two indices, <code>low</code> and <code>high</code>:<ul><li>Strictly 'R's: array[:low]</li><li>Unknown: array[low:high]</li><li>Strictly 'G's: array[high:]</li></ul></li></ul>
<p>Initially, low will be 0 and high will be <code>len(array) - 1</code>, since the whole array is unknown.
As we iterate over the array, we'll swap any 'G's we see to the third section and decrement <code>high</code>.
If we see an 'R', then we just need to increment <code>low</code>, since that's where it belongs. We can terminate
once <code>low</code> crosses <code>high</code>. So we can gradually shrink our unknown section through the following algorithm:</p>
<pre><code>def partition(arr):
    low, high = 0, len(arr) - 1
    while low <= high:
        if arr[low] == 'R':
            low += 1
        else:
            arr[low], arr[high] = arr[high], arr[low]
            high -= 1
</code></pre>
<p>This correctly partitions our array into two separate categories. How can we extend this to three partitions?
Let's maintain four sections using 3 indices, <code>low</code>, <code>mid</code>, and <code>high</code>:</p>
<ul><li>Strictly 'R's: array[:low]</li><li>Strictly 'G's: array[low:mid]</li><li>Unknown: array[mid:high]</li><li>Strictly 'B's: array[high:]</li></ul>
<p>We'll initialize <code>low</code> and <code>mid</code> both to 0, and <code>high</code> to <code>len(array) - 1</code> so that our unknown section is the whole array, as before.
To maintain this invariant, we should do the following:</p>
<ul><li>Look at array[mid]:<ul><li>If it's <code>R</code>, then swap <code>array[low]</code> with <code>array[mid]</code> and increment <code>low</code> and <code>mid</code></li><li>If it's <code>G</code>, then just increment <code>mid</code>; it's where it should be</li><li>If it's <code>B</code>, then swap <code>array[mid]</code> with <code>array[high]</code> and decrement <code>high</code></li></ul></li></ul>
<p>Once <code>mid</code> crosses over with <code>high</code>, then our unknown section is gone and we can terminate.</p>
<p>Our solution looks like this:</p>
<pre><code>def partition(arr):
    low, mid, high = 0, 0, len(arr) - 1
    while mid <= high:
        if arr[mid] == 'R':
            arr[low], arr[mid] = arr[mid], arr[low]
            low += 1
            mid += 1
        elif arr[mid] == 'G':
            mid += 1
        else:
            arr[mid], arr[high] = arr[high], arr[mid]
            high -= 1
</code></pre>
<p>P.S. This problem is also called the <a>Dutch national flag problem</a>!</p>
<p>This problem was asked by Dropbox.</p>
<p>Given the root to a binary search tree, find the second largest node in the tree.</p>
<p>An in-order traversal of the binary search tree would give us all the nodes of the tree
in sorted order. So the naive solution here might be do an in-order traversal of the
tree, store it in an array, and return the second-to-last element in the array.</p>
<p>This takes O(N) time and space since we have to go through and store every node in the tree.</p>
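<p>A sketch of that naive version (assuming nodes expose <code>left</code>, <code>right</code>, and <code>val</code>; the function name is just for illustration):</p>
<pre><code>def second_largest_naive(root):
    def inorder(node):
        if not node:
            return []
        return inorder(node.left) + [node.val] + inorder(node.right)
    return inorder(root)[-2]
</code></pre>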
<p>We can do better. Notice that an in-order traversal always explores the left node before
the current node. We can do something similar by exploring the right node first.</p>
<p>Let's do a reverse in-order traversal, where we first call ourselves recursively on the right node.
Because it's reversed, that should give us the binary tree in reverse sorted order.</p>
<p>So we can keep a counter, and once we start processing the current node we can increment the counter.
Once it hits 2, that means the current node we're looking at is the second largest, so we can stuff
it in a variable and eventually return that.</p>
<pre><code>def second_largest(root):
    def inorder(node):
        if not node or count[0] == 2:
            return
        if node.right:
            inorder(node.right)
        count[0] += 1
        if count[0] == 2:
            val.append(node.val)
            return
        if node.left:
            inorder(node.left)
    count = [0]
    val = []
    inorder(root)
    return val[0]
</code></pre>
<p>Unfortunately, because of Python's <a>demented scoping rules</a>,
we have to wrap <code>count</code> and <code>val</code> in a list (in Python 3 we could use <code>nonlocal</code> instead). Ugly!</p>
<p>This problem was asked by Google.</p>
<p>The power set of a set is the set of all its subsets. Write a function that, given a set, generates its power set.</p>
<p>For example, given the set <code>{1, 2, 3}</code>, it should return <code>{{}, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}</code>.</p>
<p>You may also use a list or array to represent a set.</p>
<p>To gain some intuition about this problem, let's try some examples:</p>
<ul><li>If we're given the empty set (<code>{}</code>), then the power set is a set with only the empty set in it: <code>{{}}</code></li><li>If we're given a set with one element in it (<code>{a}</code>), then the power set is a set with two sets: an empty set and a set with the element in it: <code>{{}, {a}}</code></li><li>If we're given a set with two elements in it (<code>{a, b}</code>), then the power set has four sets: <code>{{}, {a}, {b}, {a, b}}</code></li></ul>
<p>What's the pattern?</p>
<p>Notice that going from the empty set to <code>{a}</code>, we still keep the empty set in our result and also have another set with <code>a</code> in it.
Similarly, when going from one element to two, we keep the same result sets with one element (<code>{}, {a}</code>), but we also have a copy of each of them with <code>b</code> added (<code>{b}, {a, b}</code>).</p>
<p>So we can use the following recursive formula to generate the power set:</p>
<ul><li>If the input set is empty, return a set with an empty set in it</li><li>Otherwise, take an element from our set. Let's call it <code>x</code>.</li><li>Generate the power set of our input set without x. Let's call it <code>result</code>, for lack of a better name.</li><li>Return the union of <code>result</code> with every set in <code>result</code> extended with <code>x</code></li></ul>
<pre><code>def power_set(s):
    if not s:
        return [[]]
    result = power_set(s[1:])
    return result + [subset + [s[0]] for subset in result]
</code></pre>
<p>This runs in O(2^N) time and space, since that's how many subsets there are.</p>
<p>This problem was asked by Microsoft.</p>
<p>You have an N by N board. Write a function that, given N, returns the number of possible arrangements of the
board where N queens can be placed on the board without threatening each other, i.e. no two queens
share the same row, column, or diagonal.</p>
<p>If we were to attempt to solve this problem using brute force, we would quickly
find out that it would be prohibitively expensive. Consider a typical 8 by 8 board:
we have 64 spots to place 8 queens, so that's 64 choose 8 possible placements.
In general, that's factorial in runtime!</p>
<p>This problem is ripe for solving with backtracking. In backtracking, we
can visualize the search space like a tree, and we would explore it depth-first.
Each node would be a possible configuration. If the configuration contains
eight queens and is valid, then we're done and we can add it to our count.
Otherwise, we can try to place another queen somewhere on the board and
search from there. If we encounter an invalid board, then we can just prune
the entire subtree from our search -- there's no point in exploring a board
that we know won't work.</p>
<p>Notice we can pare down the search space by ensuring we only place queens
in distinct rows, since we know that two queens can never occupy the same row.</p>
<p>Now we can just represent the board as a one-dimensional array of max size N,
where each value represents which column the queen is on. For example, one
solution for N = 4 would just be [1, 3, 0, 2].</p>
<pre><code>def n_queens(n, board=[]):
    if n == len(board):
        return 1
    count = 0
    for col in range(n):
        board.append(col)
        if is_valid(board):
            count += n_queens(n, board)
        board.pop()
    return count

def is_valid(board):
    current_queen_row, current_queen_col = len(board) - 1, board[-1]
    # Iterate over all already-placed queens and check if any of them can attack
    # each other.
    for row, col in enumerate(board[:-1]):
        diff = abs(current_queen_col - col)
        if diff == 0 or diff == current_queen_row - row:
            return False
    return True
</code></pre>
<p>If you're interested in optimizing this problem even further, check out <a>this paper</a>
that uses constant space by representing all columns and diagonals simply with integers! However, this depends on n being smaller than the number of bits in your integer.</p>
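<p>As a rough illustration of that bit-twiddling idea (my own sketch, not the paper's exact code), we can keep three integers whose set bits mark the columns and the two diagonal directions that are already attacked in the current row:</p>
<pre><code>def n_queens_bitmask(n, cols=0, diag1=0, diag2=0):
    full = (1 << n) - 1
    if cols == full:
        return 1  # every column is used, so all n queens have been placed
    count = 0
    free = ~(cols | diag1 | diag2) & full  # columns still safe in this row
    while free:
        bit = free & -free  # lowest free column
        free -= bit
        count += n_queens_bitmask(n,
                                  cols | bit,
                                  ((diag1 | bit) << 1) & full,
                                  (diag2 | bit) >> 1)
    return count
</code></pre>
<p>For example, <code>n_queens_bitmask(8)</code> returns 92, matching the list-based version.</p>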
<p>This problem was asked by Dropbox.</p>
<p>Conway's Game of Life takes place on an infinite two-dimensional board of square cells.
Each cell is either dead or alive, and at each tick, the following rules apply:</p>
<ul><li>Any live cell with less than two live neighbours dies.</li><li>Any live cell with two or three live neighbours remains living.</li><li>Any live cell with more than three live neighbours dies.</li><li>Any dead cell with exactly three live neighbours becomes a live cell.</li></ul>
<p>A cell neighbours another cell if it is horizontally, vertically, or diagonally adjacent.</p>
<p>Implement Conway's Game of Life. It should be able to be initialized with a starting list of
live cell coordinates and the number of steps it should run for. Once initialized, it should print
out the board state at each step. Since it's an infinite board, print out only the relevant coordinates, i.e.
from the top-leftmost live cell to bottom-rightmost live cell.</p>
<p>You can represent a live cell with an asterisk (<code>*</code>) and a dead cell with a dot (<code>.</code>).</p>
<p>This is a straightforward implementation problem, so your solution may differ.
Since our board is infinite, we can't create a matrix that represents our whole board.</p>
<p>Instead, we'll represent each cell simply as a pair of cartesian coordinates (row, col).
In this solution, we keep the set of cells as a property on our class. Each
tick, we create a new set of cells that represents the next generation. We
pretty much have to do this so that changing the board doesn't affect the future
cells we process from the current generation.</p>
<p>We look at each live cell, compute the number of neighbours for each one, and
preserve it according to the rules.</p>
<p>Similarly, we look at all the neighbouring cells of all the live cells, since
any of them could potentially become alive due to rule #4. If any of them have
exactly 3 neighbours, then we should add them to the set of new cells.</p>
<p>For printing the board, we need to find the top-leftmost cell and the bottom-rightmost
cell. These are our boundaries for the board. Then we can print out each row, cell by cell,
checking whether the current spot is in our set of cells.</p>
<p>It's useful to create some helper functions here. In our case, we have:</p>
<ul><li><code>get_number_of_live_neighbours</code></li><li><code>get_neighbouring_cells</code></li><li><code>get_boundaries</code></li></ul>
<pre><code>class GameOfLife:
    def __init__(self, n, cells=set()):
        # Each cell will be a tuple (row, col)
        self.cells = cells
        for _ in range(n):
            self.print_board()
            self.next()

    def get_number_of_live_neighbours(self, row, col):
        count = 0
        for cell_row, cell_col in self.cells:
            if abs(cell_row - row) > 1:
                continue
            if abs(cell_col - col) > 1:
                continue
            if cell_row == row and cell_col == col:
                continue
            count += 1
        return count

    def get_neighbouring_cells(self, row, col):
        return set([
            (row - 1, col - 1),
            (row, col - 1),
            (row + 1, col - 1),
            (row - 1, col),
            (row + 1, col),
            (row - 1, col + 1),
            (row, col + 1),
            (row + 1, col + 1),
        ])

    def next(self):
        new_cells = set()
        # Go through each cell, look for neighbours, decide whether to append to new list
        for row, col in self.cells:
            num_of_neighbours = self.get_number_of_live_neighbours(row, col)
            if 2 <= num_of_neighbours <= 3:
                new_cells.add((row, col))
        potential_live_cells = set()
        for row, col in self.cells:
            potential_live_cells = potential_live_cells.union(self.get_neighbouring_cells(row, col))
        potential_live_cells = potential_live_cells - self.cells
        # Go through each potential live cell, get the number of neighbours, and add if = 3
        for row, col in potential_live_cells:
            num_of_neighbours = self.get_number_of_live_neighbours(row, col)
            if num_of_neighbours == 3:
                new_cells.add((row, col))
        self.cells = new_cells

    def get_boundaries(self):
        top = min(self.cells, key=lambda cell: cell[0])[0]
        left = min(self.cells, key=lambda cell: cell[1])[1]
        bottom = max(self.cells, key=lambda cell: cell[0])[0]
        right = max(self.cells, key=lambda cell: cell[1])[1]
        return top, left, bottom, right

    def print_board(self):
        top, left, bottom, right = self.get_boundaries()
        print('--------------------------------------')
        for i in range(top, bottom + 1):
            for j in range(left, right + 1):
                if (i, j) in self.cells:
                    print('*', end='')
                else:
                    print('.', end='')
            print('')
        print('--------------------------------------')
</code></pre>
<p>This problem was asked by Google.</p>
<p>Given an array of integers where every integer occurs three times except for one integer, which only occurs once,
find and return the non-duplicated integer.</p>
<p>For example, given [6, 1, 3, 3, 3, 6, 6], return 1. Given [13, 19, 13, 13], return 19.</p>
<p>Do this in O(N) time and O(1) space.</p>
<p>We can find the unique number in an array of <em>two</em> duplicates by XORing all the numbers
in the array. What this does is cancel out every bit that is set an even number of times
across the array, leaving only the bits set by the unique number.</p>
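<p>That two-duplicate version is worth keeping in mind as a one-liner (a quick sketch; the function name is just for illustration):</p>
<pre><code>from functools import reduce
from operator import xor

def find_unique_with_pairs(arr):
    # Every number that appears twice cancels itself out under XOR.
    return reduce(xor, arr)
</code></pre>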
<p>Let's try to extend this technique to three duplicates. Instead of cancelling out bits
that are set an even number of times, we want to cancel out those that are set a number
of times that is a multiple of three.</p>
<p>Let's assume all integers fit in 32 bits. Then let's create an array 32 zeroes long, and
when iterating over each number in our array, add each of its bits to the proper spot in the
array, keeping each count modulo 3. This means that any bit that has been set some multiple
of 3 times will effectively be cleared, leaving only the bits from the unique number.</p>
<pre><code>def find_unique(arr):
    result_arr = [0] * 32
    for num in arr:
        for i in range(32):
            bit = num >> i & 1
            result_arr[i] = (result_arr[i] + bit) % 3
    result = 0
    for i, bit in enumerate(result_arr):
        if bit:
            result += 2 ** i
    return result
</code></pre>
<p>This runs in linear time, since we iterate over the array once, and in constant space, since we initialize an array of constant size.</p>