SWE Flashcards

Question

SWE: How do you simply implement a stack -- with pop, push, peek and isEmpty methods -- if you already have an implementation of another key data structure?

Answer 1

You need a linked list implementation. Recall that a linked list has a null node as the "final node" on the right with no descendants, and the "first node" being the root node which has nothing pointing to it. To add an element to the linked list, you make it the root and make the old root its descendant. To remove an element, you make its descendant the root. In other words, *a linked list is basically already an implementation of a stack*. To push, you just add the value to the LL. To pop, you just remove the root from the LL. To peek, you just look at the value in the root of the LL. To check if it's empty, you see if the root node is the null node.
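As a rough sketch of the idea above, here is one way the four stack methods could sit on top of a singly linked list. The `Node` and `Stack` names and the use of `None` as the null node are assumptions made for illustration.

```python
class Node:
    """One linked-list node: a value plus a pointer to its descendant."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class Stack:
    def __init__(self):
        self.root = None  # None plays the role of the null node

    def push(self, value):
        # The new node becomes the root; the old root becomes its descendant.
        self.root = Node(value, self.root)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        value = self.root.value
        self.root = self.root.next  # the descendant becomes the new root
        return value

    def peek(self):
        if self.is_empty():
            raise IndexError("peek at empty stack")
        return self.root.value

    def is_empty(self):
        return self.root is None
```

Every operation touches only the root, so each one runs in constant time.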

Answer 2

A queue is a data structure which stores elements. You can add an element (or "push" it), and you can remove an element (or "pop" it). Elements are removed in the same order in which they were added, or first-in-first-out. This is like a queue of people who are waiting in line, where the first person to get in line is the first person to get out.

Answer 3

We will implement the queue using a slight variation of a linked list. *With a normal linked list*, the list is visualized as several nodes with arrows pointing left to right. On the far left is the root node; the linked list data structure keeps track of the root node, and a newly inserted value enters on the left and becomes the new root. The linked list also has a null node to the far right. In a queue, we are going to insert elements on the *right side* of the linked list, or the *back*, and we (same as a normal linked list) remove elements from the *left side* of the linked list, or the *front*. So the arrows between nodes point from *front to back*, which may seem unintuitive. Thus, we now *keep track of two nodes* rather than just one. We keep track of the *front*, or the root, as well as the *back*, which is the last node in the list. The back node doesn't point to anything; it's easier in a queue to *not have a null node* and instead keep an "empty" bool in the data structure. When the queue's empty, set the front and back pointers to just point to NULL. To push a value onto the back of the list, we make a new node, have the current back point to it, and set the new node as the back (if the queue was empty, the new node is also the front). To pop a value from the list, we set the front's descendant as the new front; if that leaves the queue empty, we also reset the back pointer and the bool. To peek, we return the value at the front. For isEmpty, we check the bool.
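A minimal sketch of the two-pointer queue described above might look like this. The class and attribute names are assumptions, and `None` stands in for NULL.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None  # arrow points from front toward back


class Queue:
    def __init__(self):
        self.front = None
        self.back = None
        self.empty = True  # the "empty" bool from the description

    def push(self, value):
        node = Node(value)
        if self.empty:
            self.front = node      # first element is both front and back
        else:
            self.back.next = node  # current back points to the new node
        self.back = node
        self.empty = False

    def pop(self):
        if self.empty:
            raise IndexError("pop from empty queue")
        value = self.front.value
        self.front = self.front.next
        if self.front is None:     # that was the last element
            self.back = None
            self.empty = True
        return value

    def peek(self):
        if self.empty:
            raise IndexError("peek at empty queue")
        return self.front.value

    def is_empty(self):
        return self.empty
```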

Answer 4

A tree is a data structure which organizes nodes in terms of parent-child relationships, where each node can have at most one parent, but nodes could have several children. A tree is basically a DAG, or directed acyclic graph, that is also connected (meaning it's not two or more graphs that don't touch each other) and in which no node has more than one parent.

Answer 5

A leaf is a node with no descendants; the root is the node with no parent.

Answer 6

It's simply node A as the root of a tree, together with all of its descendants. It's basically the tree you get if you erase anything from the original tree that wasn't A or a descendant of A.

Answer 7

A binary tree is where each node can have at most two children (but only 1 parent). A ternary tree is where each node can have at most three children (but only 1 parent). And so on.

Answer 8

It's a binary tree where, for every node n, you have the property that: *All left descendants of n* **\<** *n* **\<** *all right descendants of n.* (There can be differing opinions on equality in the tree. Sometimes the above will read \<= n.)

Answer 9

To search for an element x, you follow a branch down the tree using the information that the tree is in a way "sorted". So if you're at node y and x \< y, you move down the left branch, because x then would definitionally need to be on the left. If y \< x, you move down the right branch. To insert, you do the same thing and move down the tree in this way. If you find x, you typically wouldn't insert it again. If you run out of tree without finding x (the branch you need to follow is empty), you insert x there as a new leaf.
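The search and insert procedures can be sketched as follows, assuming a hypothetical `TreeNode` class with `data`, `left`, and `right` attributes and a tree that does not store duplicates.

```python
class TreeNode:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


def search(node, x):
    """Return True if x is in the BST rooted at node."""
    while node is not None:
        if x == node.data:
            return True
        # Smaller values live on the left, larger ones on the right.
        node = node.left if x < node.data else node.right
    return False


def insert(node, x):
    """Insert x into the BST rooted at node and return the subtree's root."""
    if node is None:
        return TreeNode(x)  # ran out of tree: x becomes a new leaf here
    if x < node.data:
        node.left = insert(node.left, x)
    elif x > node.data:
        node.right = insert(node.right, x)
    # If x == node.data it is already present, so nothing is inserted.
    return node
```

Usage: start with `root = None` and call `root = insert(root, value)` for each value.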

Answer 10

Because of the ordering properties of the BST, you only need to search down one branch rather than the whole tree. This means that if the tree is balanced, you can search for an element in only O(log n) time.

Answer 11

A complete binary tree has every level fully filled out, except for potentially the last level. If the last level isn't full, it's filled in from left to right.

Answer 12

The height of a tree is the length of its longest branch. Specifically, it's the *number of edges along the longest branch*. We say that just a root node has a height of 0, because the longest path has no edges. To find the height of an arbitrary tree, just follow all the branches down to their leaves, adding 1 to that branch's length for each node past the root, and thus for each edge. Take the max branch length as the height.

Answer 13

Technically, a balanced tree is a tree such that, for every subtree present, the height of *its* left subtree and *its* right subtree differ by at most 1. Colloquially, it's basically a tree that is balanced "enough" among all subtrees such that we can expect traversals of a single branch to take O(log n) time, rather than linear time.

Answer 14

You will need a treeNode class, which has a *data* attribute holding its value and a *descendants* attribute which holds a *list* of its descendants. You can also include a Tree class, whose only attribute is a root node, but it's usually not helpful for interviews. If you want specifically a binary tree, your treeNode class would have three attributes: data, leftDescendant, and rightDescendant.

Answer 15

In an in-order traversal, at every node, you explore the left subtree of the node (and visit all of the nodes in that subtree), then visit the node itself, then visit all of the nodes in the right subtree. A recursive function handles the base case implicitly by doing nothing when it reaches a null node.
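One possible recursive version is sketched below. The `TreeNode` class and the `visit` callback are assumptions made for illustration.

```python
class TreeNode:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def in_order(node, visit):
    # Base case is implicit: a null node does nothing.
    if node is not None:
        in_order(node.left, visit)   # left subtree first
        visit(node.data)             # then the node itself
        in_order(node.right, visit)  # then the right subtree
```

On a binary search tree, this visits the values in sorted order.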

Answer 16

In a pre-order traversal, at every node, you visit that node, *then* you explore the left subtree of the node (and visit all of the nodes in that subtree), and then visit all of the nodes in the right subtree. A recursive function handles the base case implicitly by doing nothing when it reaches a null node.
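One possible recursive version is sketched below. The `TreeNode` class and the `visit` callback are assumptions made for illustration.

```python
class TreeNode:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def pre_order(node, visit):
    # Base case is implicit: a null node does nothing.
    if node is not None:
        visit(node.data)              # the node itself first
        pre_order(node.left, visit)   # then the left subtree
        pre_order(node.right, visit)  # then the right subtree
```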

Answer 17

In a post-order traversal, at every node, you explore the left subtree of the node (and visit all of the nodes in that subtree), then visit all of the nodes in the right subtree, and then *lastly* you visit the node itself. A recursive function handles the base case implicitly by doing nothing when it reaches a null node.
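One possible recursive version is sketched below. The `TreeNode` class and the `visit` callback are assumptions made for illustration.

```python
class TreeNode:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def post_order(node, visit):
    # Base case is implicit: a null node does nothing.
    if node is not None:
        post_order(node.left, visit)   # left subtree first
        post_order(node.right, visit)  # then the right subtree
        visit(node.data)               # the node itself last
```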

Answer 18

In a min heap, all nodes' descendants are larger than the node. For a max heap it's simply the opposite: all nodes' descendants are smaller, so the max is at the top. So the implementation is basically the exact same, you just do the orderings in reverse.

Answer 19

A *complete* binary tree where each node is smaller than its children, so the min of the tree is always at the root.

Answer 20

A heap must be a complete tree, meaning the bottom layer is filled out from left to right; so, we start by placing our new node in the bottom layer, to the right of the rightmost node in that layer. Then, we continually swap that new node with its *parent*, until its parent has a lower value than it (or it reaches the root). This restores the min heap to its correct form.
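Here is a sketch of that insertion, assuming the heap is stored in a Python list in level order, so that the parent of index `i` is at `(i - 1) // 2`. That storage layout is an assumption, not something the card specifies.

```python
def heap_insert(heap, value):
    """Insert value into a min heap stored as a level-order list."""
    heap.append(value)  # next free slot in the bottom layer
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:
            break  # ordering restored
        heap[parent], heap[i] = heap[i], heap[parent]  # swap upward
        i = parent
```

Appending to the end of the list is what keeps the tree complete.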

Answer 21

We first remove the minimum value from the top (it's always at the top). A min heap must be a complete tree, so next we take the rightmost node on the bottom layer and place it *at the top, where the min value originally was*. Now we need to restore the ordering. We continually look at the node we just moved to the top and its two descendants. If the node we moved is the smallest of those three, we stop. Otherwise, we swap it with the smaller of its two descendants, and repeat from its new position.
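The removal can be sketched the same way, again assuming the heap is a level-order Python list where the descendants of index `i` sit at `2 * i + 1` and `2 * i + 2`.

```python
def heap_extract_min(heap):
    """Remove and return the smallest value of a level-order min heap."""
    if not heap:
        raise IndexError("extract from empty heap")
    smallest = heap[0]
    last = heap.pop()  # rightmost node on the bottom layer
    if heap:
        heap[0] = last  # move it to the top
        i, n = 0, len(heap)
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            child = i
            if left < n and heap[left] < heap[child]:
                child = left
            if right < n and heap[right] < heap[child]:
                child = right
            if child == i:
                break  # node is the smallest of the three
            heap[i], heap[child] = heap[child], heap[i]  # swap downward
            i = child
    return smallest
```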

Answer 22

When inserting a new element, we put it at the bottom so as to preserve the completeness of the tree, then swap it up a branch until the ordering condition has been restored. Because the tree is complete, it is balanced, and so the branch is at most O(log n) in length. Thus we have an **O(log n)** solution.

Answer 23

A complete tree is always balanced. A balanced tree is not always complete.

Answer 24

Constant time: it's always at the top.

Answer 25

When removing the min element, you take an element from the bottom level and put it at the top where the min element was, then swap it down a branch until the ordering condition is restored. Because the min heap is a complete tree, it is balanced, and so the branch is at most length O(log n). Thus, it's an **O(log n)** algorithm.

Answer 26

A min heap is somewhat fast to maintain, as inserting an element takes O(log n) time, and you can always find the min element in constant time. Also, removing the min element takes only O(log n) time. This means that you always have the "most important" or "most pressing" element in the heap ready to go at a moment's notice. This gives the name priority queue: drawing from a queue returns elements, and this one does so in order of priority, as several draws will yield the highest priority elements, in order. This would be useful in a hospital, for example, as we want to continually admit the patient in the most critical condition.
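To make the hospital example concrete, here is a small sketch using Python's standard `heapq` module, which maintains a min heap inside a plain list. The patient data and the convention that a lower number means a more critical condition are made up for illustration.

```python
import heapq

patients = []  # heapq keeps this list arranged as a min heap
heapq.heappush(patients, (3, "sprained ankle"))
heapq.heappush(patients, (1, "cardiac arrest"))
heapq.heappush(patients, (2, "broken arm"))


def admit_next(queue):
    """Pop the most critical patient (smallest priority number)."""
    priority, name = heapq.heappop(queue)
    return name
```

Peeking at the most pressing element without removing it is just `patients[0]`.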

Answer 27

A trie is a tree with a letter at each node. As you move down a branch, you find a word, or a prefix for a longer word. Branches sometimes have \* values at the leaves to signify the end of a word. They're often used for interview problems where you need to look up words, or several words with the same prefix, etc. You can see if a word is present in the trie in O(length of word) time, because you can follow the prefix down the tree.
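A minimal trie sketch follows. As a simplifying assumption, letters are stored as dictionary keys on the way into each node rather than inside the node, and an `is_word` flag plays the role of the \* marker.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # letter -> TrieNode
        self.is_word = False  # True if a word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for letter in word:
            node = node.children.setdefault(letter, TrieNode())
        node.is_word = True

    def _walk(self, prefix):
        """Follow prefix down the trie; return the final node or None."""
        node = self.root
        for letter in prefix:
            node = node.children.get(letter)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix):
        return self._walk(prefix) is not None
```

Both lookups take one step per letter, which is where the O(length of word) bound comes from.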

Answer 28

A list of nodes (or vertices), and a list of edges connecting those nodes.

Answer 29

In a directed graph, the edges have arrows from one node to another. In an undirected graph, the edges don't have arrows, and count as going in both directions.

Answer 30

A sequence of vertices connected by edges. For directed graphs, a path must follow the arrows.

Answer 31

A graph where no node has a path out and *back to itself.*

Answer 32

Adjacency list, and adjacency matrix.

Answer 33

An adjacency matrix is one way to store a graph in memory. For a graph with n vertices, an adjacency matrix is an nxn matrix, where the ijth entry is a 1 if there is an edge from i to j, and a 0 if not. For undirected graphs, the matrix is symmetric, because an edge from i to j also goes from j to i.

Answer 34

An adjacency list is one way to store a graph in memory. For each of your n nodes i, you have a list of the nodes j for which there is an edge from i to j. (If your graph is undirected, there will be some redundancy in your adj. list, as each edge needs to appear twice.)

Answer 35

To do it object-oriented, you could have a Graph class, with objects having a list of nodes as an attribute, as well as a Node class, with objects having a name attribute as well as a ListOfEdges attribute, which is just a list of the nodes into which it has an outgoing edge. To do it with just arrays, you could assume your n vertices are indexed from 0 to n-1 and use a 2d list: each entry at position i in the high-level list would be a list of indices j into which i has an outgoing edge.
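Both variants can be sketched briefly. The attribute names (`edges` for ListOfEdges) and the `add_node` helper are assumptions, and the three-vertex graph is made up for illustration.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.edges = []  # nodes this node has an outgoing edge into


class Graph:
    def __init__(self):
        self.nodes = []

    def add_node(self, name):
        node = Node(name)
        self.nodes.append(node)
        return node


# Array-only version: vertices are 0..n-1, and adjacency[i] lists every j
# such that there is an edge from i to j.
adjacency = [
    [1, 2],  # edges 0 -> 1 and 0 -> 2
    [2],     # edge 1 -> 2
    [],      # vertex 2 has no outgoing edges
]
```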

Answer 36

An adjacency list only needs to store the edges that do exist, whereas an adjacency matrix stores an entry for each possible edge from i to j. Adjacency lists are thus more space efficient, especially if the graph is sparse. With an adjacency matrix, you can check for an edge in constant time; for an adjacency list, you need to iterate through all of the edges out of one of the nodes, taking O(# neighbors) time. Conversely, with an adjacency list, you can get a list of out-neighbors in constant time, which is useful for things like searches from A to B. In an adjacency matrix, you need to iterate through all possible neighbors to see which are actually neighbors.

Answer 37

Depth-first search is a way of searching through a graph, to either visit all of its nodes, or to find a specific node. In depth-first search, you *move fully down a branch* before moving to the next one, hence "depth-first." A depth-first search starts at a specific node in the graph; you may be given a specific start node, or it may be arbitrary. You search through each of the node's neighbors, but you *move fully down a branch* before moving to the next. You must keep track of all of the nodes you have visited, because if you see a node you've already visited while exploring a branch, you don't visit it again.

Answer 38

A recursive approach works best.
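A recursive depth-first search might be sketched as follows, assuming the graph is given as a dictionary mapping each node to a list of its neighbors (one possible adjacency-list representation).

```python
def dfs(graph, node, visited=None, order=None):
    """Return the nodes reachable from node, in depth-first visit order."""
    if visited is None:
        visited, order = set(), []
    visited.add(node)
    order.append(node)
    for neighbor in graph[node]:
        if neighbor not in visited:
            # Go fully down this branch before trying the next neighbor.
            dfs(graph, neighbor, visited, order)
    return order
```

The `visited` set is what prevents revisiting a node that is reachable along more than one branch.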

Answer 39

Breadth-first search is a way to search through a graph, in order to either traverse all of the nodes, or to find a path from one node to another. You're given a starting node (or choose one randomly), and then *you visit all of its neighbors before visiting any of* their *neighbors*, hence "breadth-first." You can think of it as moving out from the root layer-by-layer.

Answer 40

We use an *iterative* solution (unlike the recursive DFS) which takes advantage of a queue. The queue stores nodes we need to visit, in the order in which we need to visit them. We "mark" a node before pushing it onto the queue, and when iterating through a node's neighbors, we only push a neighbor onto the queue if it isn't marked. This is because we always push a node on right after we mark it, so if it's already marked, that means it's already been in the queue, and we don't want to put it in the queue twice and thus visit it twice.
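The queue-based procedure can be sketched like this, assuming the graph is a dictionary mapping each node to a list of its neighbors and using `collections.deque` as the queue.

```python
from collections import deque


def bfs(graph, start):
    """Return the nodes reachable from start, in breadth-first visit order."""
    marked = {start}  # mark before pushing
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in marked:
                marked.add(neighbor)
                queue.append(neighbor)
    return order
```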

Answer 41

You are searching for an element in a *sorted* array. You basically continually divide the array in half. You look at the middle element, and if it's larger than the target element, you search in the half of the array below the middle; if the middle is smaller than the target element, you search in the half of the array above the middle; and if you found it you're done. You recurse like this, and if you run out of list, then it's not there.
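A recursive sketch of this search is below. The convention of returning the index of the target, or -1 when it is absent, is an assumption made for illustration.

```python
def binary_search(items, target, low=0, high=None):
    """Return the index of target in the sorted list items, or -1."""
    if high is None:
        high = len(items) - 1
    if low > high:
        return -1  # ran out of list: not there
    mid = (low + high) // 2
    if items[mid] == target:
        return mid
    if items[mid] > target:
        return binary_search(items, target, low, mid - 1)  # lower half
    return binary_search(items, target, mid + 1, high)     # upper half
```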

Answer 42

Binary search is O(log n) if you have a sorted list, otherwise you can't run it. Binary search is great if you're searching the same list for elements over and over again. In this case, you sort the list first, which takes O(n log n), then you can make lots of searches all at O(log n). If you didn't sort, you would save that O(n log n) cost, but all of your searches would be O(n) each.

Answer 43

If we're computing n of something, oftentimes we can compute the nth given the (n-1)th, and this lends itself to recursion.

Answer 44

1. Top-down: start by figuring out how to solve for the nth by calling your function for the (n-1)th value, and solve that using the (n-2)th, and so on.
2. Bottom-up: figure out how to first solve for your first value, then use that to find your second, and on up until you reach n.
3. Figure out how to divide the problem of size n into two problems of size n/2, as with merge sort.

Answer 45

Recursive solutions are often much less space-efficient than iterative solutions. Say we're generating n numbers. In an iterative solution, we probably generate the first, then the second, then the third, *and once we're done with a number, we can reuse the space that computed it for the next.* Conversely, if we have a recursive solution where we base the nth number on a call for the (n-1)th number, and base that on a call for the (n-2)th number and so on, *all n calls must be in memory at the same time*. This can often lead to O(n) space (or even worse), when the iterative solution may have worked in constant space.

Answer 46

Drawing a tree of the recursive calls, then trying to figure out how much runtime each of the calls in the tree takes on average.

Answer 47

Sometimes in recursive algorithms, we need to solve the same subproblem more than once. An example of this is Fibonacci numbers: because f(n) builds on two different subproblems via f(n) = f(n-2) + f(n-1), the tree may have a ton of repeats, and this can cause giant slowdowns in runtime (naively recursing Fibonacci takes *exponential time*). To solve this, we use memoization, which simply means whenever we calculate the answer to a sub-problem, we *store the answer*, in something like an array or a hash table (more often hash table in my experience). And whenever we go to calculate a sub-problem, we first check to see if the answer is already stored, in which case we can just pull the answer. This means we *only need to calculate every subproblem once*, which greatly helps runtime (memoizing Fibonacci numbers using an array, for example, brings the runtime down to O(n)).
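A small memoized sketch, using a dictionary as the hash table and assuming the convention f(0) = 0, f(1) = 1:

```python
def fib(n, memo=None):
    """Return the nth Fibonacci number, computing each subproblem once."""
    if memo is None:
        memo = {}
    if n < 2:
        return n  # base cases: f(0) = 0, f(1) = 1
    if n not in memo:
        # Not stored yet: compute it once and remember it.
        memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]
```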

Answer 48

When calculating a big value recursively, *we start at the beginning* with the base cases, and work our way up the recursion, storing the answers to progressively larger values until we get to our target. We can store the answers in an array or, more commonly in my experience from classes like 210, a 2d-array.
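For comparison, a bottom-up sketch of Fibonacci fills an array starting from the base cases, again assuming f(0) = 0 and f(1) = 1:

```python
def fib_bottom_up(n):
    """Return the nth Fibonacci number by filling a table from the bottom."""
    if n < 2:
        return n
    table = [0] * (n + 1)
    table[1] = 1
    for i in range(2, n + 1):
        # Each entry only needs the two entries computed just before it.
        table[i] = table[i - 1] + table[i - 2]
    return table[n]
```

There is no recursion here, so no chain of n calls has to sit in memory at once.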

Answer 49

The stack and the heap

Answer 50

The stack is a region of computer memory that *stores local variables for your function calls*. It behaves like a literal stack of frames: whenever a function is called, a frame holding its local variables is pushed onto the stack, and a variable in the current frame is accessed directly at a known offset within that frame, without popping anything. It is also a *temporary* memory storage; when your function call ends, all of the local variables from the function call are popped off the stack and lost forever. Because of this, you don't need to worry about deleting things from memory yourself; *the computer reclaims stack memory automatically*.

Answer 51

The heap stores *dynamically allocated memory*, which outlives the function call that allocated it and can be accessed by any function that holds a pointer to it. (True global variables typically live in a separate static region rather than on the heap.) In order to allocate on the heap in C, you must use a function like malloc() or calloc(). To access something on the heap, you need a pointer to it. In languages like C, *garbage collection is not handled automatically for you* on the heap. You must take care to free values when you are done using them, otherwise you'll have a memory leak.

Answer 52

The stack has very fast access, and you don't have to worry about memory leaks. However, variables on the stack can only be accessed within the function call that created them, and on most systems the stack has a size limit that is much smaller than the heap's.

Answer 53

You're given a situation that contains a variety of different ***objects***, as well as ***relationships between objects***, and ***processes involving these objects***. You will need to design an object-oriented framework that will represent these objects/relationships and carry out these processes. For example, say you need to represent a restaurant. You'll have objects like customers, tables, servers, and orders, with servers having their customers, and processes including seating guests, taking orders, paying for food, etc. These questions will often involve a fair amount of ambiguity. It will be up to you to brainstorm the most important objects, relationships and processes, as well as ask the interviewer for clarification on the use case and its key elements.

Answer 54

A class is an object type, and subclasses are just specific subsets of that object type. If B is a subclass of A, then every instance of B is an instance of A, and we can do A-related operations on instances of B, but not vice-versa. For example, maybe we have a Vehicle class, and we have Truck, Car, and Motorcycle subclasses. An instance of any of these subclasses is still an instance of Vehicle, and we can still get Vehicle attributes like mpg, top speed, etc. But the subclasses also have additional information: maybe Truck has truckbed\_size, or Motorcycle has gang or something. For one, subclasses save us from retyping lots of code. Rather than saying that each of Truck, Car and Motorcycle has an mpg attribute (or a common method), we say that Vehicle does, and then they all inherit it. *Inheritance decreases duplicate code.* The class-subclass system allows us to keep track of similar objects, while also giving needed attributes to individual object types, in a neat and intuitive way.
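The Vehicle example might be sketched as follows. The `range_miles` method is a hypothetical common method added for illustration.

```python
class Vehicle:
    def __init__(self, mpg, top_speed):
        self.mpg = mpg
        self.top_speed = top_speed

    def range_miles(self, gallons):
        # Hypothetical shared method: every subclass inherits it.
        return self.mpg * gallons


class Truck(Vehicle):
    def __init__(self, mpg, top_speed, truckbed_size):
        super().__init__(mpg, top_speed)    # reuse the Vehicle setup
        self.truckbed_size = truckbed_size  # Truck-only attribute
```

A `Truck` can be used anywhere a `Vehicle` is expected, but a plain `Vehicle` has no `truckbed_size`.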

Answer 55

a.f() This is kinda obvious, because it's a method, but just remember this.

Answer 56

1. **Listen carefully** for any *unique information* about the problem (i.e. maybe the list is sorted, or maybe your algorithm is run many times), and ask any necessary **clarifying questions**.
2. Draw and think through a **specific example**, which is *large enough* to think through the problem and is *not a special case*.
3. Talk through a **brute-force solution** to establish a baseline, but *don't code it up*. Also explain baseline *time and space complexity*.
4. **Find a better algorithm**, through speaking and consideration. Will expand on this in other cards.
5. Walk through the algorithm again to **make sure you understand it** and *can code it effectively.* Potentially use *brief pseudo-code* here.
6. **Code your solution** elegantly, with good style. Start in top-left of board and write neatly, *modularize code* as much as possible, use good variable names *but potentially abbreviate long ones after 1st use*. Also, keep a *list of things to improve or test* as you code.
7. **Test your code** and iteratively **improve** it. Read through lines, and look for *weird-looking code* as well as *code that tends to cause issues* (base cases in recursion, integer division, etc). Then, try *small test cases* you can get through quickly, as well as *edge cases.*

Answer 57

BUD is perhaps the most important method for figuring out how to improve your existing solution. It stands for **Bottleneck, Unnecessary Work, Duplicated Work**. You look for these things, in this order, and try to make them faster or better. **Bottleneck**: Identify the part of your algorithm that is costing the most in terms of time complexity, or find the part that is "causing the big O to be what it is." If your algorithm is to first sort an array in n log n time, and then to search for an element in log n time, then there's not much use trying to optimize the searching part: the first step of sorting is the bottleneck, so try to fix this. **Unnecessary Work**: Find work your algorithm is doing, or can do in certain instances, that isn't contributing to finding a solution. For example, maybe your algorithm is iterating through a list, and it could stop the iteration when it finds the answer in the middle, but doesn't: here you should institute a break condition in the loop to avoid unnecessary work. Or, maybe your algorithm is iterating through all possible triples to solve an equation like a+b+c = 0, when it could just iterate through pairs, and then check the only possible value for the remaining number c. Look for things like that. **Duplicated Work**: Maybe your brute force/nested loop solution checks the same possible answer twice, or performs the same process twice because it occurs in two different parts of the problem. Look for these and figure out how to only do work once: maybe you memoize, or otherwise come up with a system.

Answer 58

**DIY** (do it yourself): Walk through another example, maybe one that's a bit bigger, and rather than thinking about computery and algorithmic jargon, just try to solve the example in the way your human brain intuitively does it. Often you default to something that's pretty good, so turn your brain off and see what that default is. **Simplify and generalize:** Try to solve a simplified, easier version of the problem, and then see how the ideas from that simplified version can be applied to the original problem. **Base case and build:** First, just solve the problem for an instance that is "size 1", whatever that means for your problem. Then, try to solve it for "size 2", then "size 3", etc. As you do this, try to find ways to use your size 2 solution during size 3, or use size 3 during size 4, or wherever you can see it work. This can lead to a good recursive solution. **Data structure brainstorm:** Run through a bunch of data structures (array, hash table, heap, sorted list, tree, graph, linked list, etc) and try to think of how you could apply it. Maybe one of them has the runtime you're looking for for a particular action, etc.

Answer 59

For a given task, best conceivable runtime is *a runtime or big O for the algorithm that you think can't be beaten by any algorithm*. For example, printing all elements in a list has a BCR of O(n), since you have to "touch" every element at least once. This is often the BCR for a problem, just "how long does it take to look at all of the items." **It's not even necessarily a runtime that you think can be achieved** (making the name a bit misleading), it's just a runtime that you think **we definitely can't beat.**

Answer 60

You have the big O of your current solution, and the big O of the best conceivable runtime, and you know that the solution must fall somewhere between these two big O's. So, *you can brainstorm possible runtimes between these two big O's*. If it's between O(n^2) and O(n), maybe brainstorm how to get it to O(n log n). If our current solution is basically two O(n) processes multiplied together, how can we turn one of those into an O(log n) process? Suppose we then get it to O(n log n), and our new goal is now O(n), since that's the BCR. We now know that we probably need to take the O(log n) part to constant time; how do we do that? **But, there is risk with spending too much time searching for a solution with a specific runtime due to process-of-elimination**. Solutions sometimes have weird runtimes: maybe it's O(n log log n), or maybe there's some subset of n called k and it's O(nk), or O(k^2 log n). We don't want to be blinded from finding these solutions. That said, this can still be a useful tool for thinking. How do I get closer to the BCR, knowing that I can't actually beat the BCR? Can I try to take x mechanism in my algorithm from y runtime to z runtime?

Answer 61

Yes. If you don't, it might come off as dishonest.

Answer 62

**Use data structures liberally**. If your algorithm needs to deal with objects that have several parts, it's likely you should make a class. **Write modular code**. Try to make a lot of your processes into functions, helper functions, sub-functions, etc. It decreases duplicated code, helps with testing for errors, and improves readability and prettiness of code. **Relatedly, don't write the same code more than once, put it in a function**. **Don't get discouraged or overwhelmed if you can't find the optimal solution**. A lot of these questions are designed to be difficult for strong applicants, and many people won't find a perfect, bug-free solution and will still have done well. Stay calm, keep brainstorming, and try to use your different techniques to keep improving your solution. **Check your function inputs**. Rather than assuming your function gets the input in the format it's looking for, check for the format, and have the function raise errors or return NaN or -1 if the format is incorrect. **Write neatly on the board.**

Answer 63

And: x & y; Or: x | y; XOR: x ^ y; Not: ~x

Answer 64

When we store binary values in memory, the first bit (meaning the leftmost bit) denotes a number's sign: if a binary number starts with 0, it's positive, and if it starts with a 1, it's negative. (Note that some places do it the other way around, but it's not a huge deal, you'd just switch it.) The rest of the digits describe the value of the number. For positive numbers it's normal binary, so 0100 is 4, for example. How do we represent negative numbers? *Using two's complement.* Specifically, the number is represented with *the binary representation of the two's complement of the absolute value of the number.* So if we're representing -3, the digits (except for the leading 1 denoting sign) are the two's complement of 3.

Answer 65

In a left shift, you just shift all the bits to the left by the given amount and *fill in zeroes after*. So 101 \<\< 1 is 1010, and 111 \<\< 3 is 111000.

Answer 66

In base 10, you can multiply a number by 10 by adding a zero at the end; for multiplying by 100, you can add two zeroes; etc. It's the same in binary. To multiply by 2, simply left shift by 1, filling in a 0 after. To multiply by 4, left shift by 2. To multiply by 32, bit shift by 5, as 2⁵ = 32.

Answer 67

The most common way is to shift all bits to the right, and *fill in with 0's on the left.* This basically *floor divides by 2*, or by some other power of 2, for positive numbers: 11111 is 31, and 11111 \>\> 2 is 111, or 7. But for negative numbers, it's weird. As an 8-bit number, 10110101 is -75; right-shifted 1 it's 01011010, which is 90. The second way to bit shift is to *fill in with copies of the sign bit*, which means 1s for a negative number. So 10110101 becomes 11011010. *This is good for negative numbers, because it still floor-divides them*, rounding toward negative infinity: 10110101 is -75, and 11011010 is -38. These methods are called "logical right-shift" and "arithmetic right-shift" respectively.
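The two shifts can be tried out in Python with a small sketch. Python integers are not fixed-width, so the 8-bit width here is an assumption imposed with a mask; Python's own `>>` behaves as an arithmetic shift on negative integers.

```python
BITS = 8                # assumed word size for the example
MASK = (1 << BITS) - 1  # 0b11111111


def logical_right_shift(x, k):
    """Treat x as an unsigned 8-bit pattern and fill with 0s on the left."""
    return (x & MASK) >> k


def arithmetic_right_shift(x, k):
    """Python's >> keeps the sign, so it floor-divides by 2**k."""
    return x >> k
```

With these, `logical_right_shift(-75, 1)` gives 90 and `arithmetic_right_shift(-75, 1)` gives -38, matching the 8-bit patterns above.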

Answer 68

We first AND N with a string S that is all 0's except for a 1 in the ith bit; the result will be a string of 0's with a 1 in the ith bit iff N has a 1 there. (You can generate S by bitshifting 1 to the left until you have S.) Then, we just compare the result to 0: if it equals 0, then the ith bit was 0, otherwise it was 1.
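As a short sketch of that check, with bits counted from the right starting at 0 (an assumed convention):

```python
def get_bit(n, i):
    """Return True if the ith bit of n (counting from the right) is 1."""
    s = 1 << i  # all 0s except a 1 in the ith bit
    return (n & s) != 0
```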

Answer 69

A mask M is a way to get the values of only specific digits in a number N, by setting the rest of them to 0. You do this by *ANDing N with the mask M.* Say we only want the 0th, 2nd, and 4th digits of N (counting from right to left). Our mask is then 10101, and we take N & 10101. Because digits 1 and 3 are zeroes, when we and them, they are automatically 0 in the resulting number, regardless of those digits' values in N.

Answer 70

To set it to 0, AND it with a string like 1111101111, where the ith value is 0. To set it to 1, OR it with a value like 00001000, where the ith value is 1. Setting multiple is the same: if you're setting multiple bits to 0, make a string of 1's where the bits in question are 0's, and AND. If you're setting to 1, make a string of 0's where the bits in question are 1's, and OR.
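Here is a sketch of setting several bits at once. The function names and the choice to pass bit positions as a list (counted from the right, starting at 0) are assumptions made for illustration.

```python
def build_mask(positions):
    """Return a number with 1s exactly at the given bit positions."""
    mask = 0
    for i in positions:
        mask |= 1 << i
    return mask


def set_bits(n, positions):
    # OR with 0s everywhere except 1s at the bits in question.
    return n | build_mask(positions)


def clear_bits(n, positions):
    # AND with 1s everywhere except 0s at the bits in question.
    return n & ~build_mask(positions)
```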

Answer 71

Clear the ith bit of N by ANDing it with a number like 11110111, where the ith bit is a 0. Then, left shift v by i so that the value of v is in the ith bit. Lastly, OR the masked version of N with the bit-shifted version of v (adding them gives the same result, since the ith bit of the masked N is 0).
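The three steps can be sketched as one small function, assuming `v` is either 0 or 1 and bits are counted from the right starting at 0:

```python
def update_bit(n, i, v):
    """Return n with its ith bit set to v (v must be 0 or 1)."""
    cleared = n & ~(1 << i)   # step 1: clear the ith bit
    return cleared | (v << i)  # steps 2 and 3: shift v into place and OR
```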

Answer 72

A MapReduce program takes a dataset containing many points (or just a set of objects containing many objects). It first maps each data point to a (key, value) pair. Then, it takes all of the pairs with the same key and "reduces them" in some way, emitting a new, single pair. As a programmer, you just need to implement a map() function and a reduce() function that makes sense for your use case. **The reduce function takes only two values, like in 210; it doesn't take the whole list of values you need to reduce. You reduce a list by continually reducing pairwise, which is what the machine will do. This is important to note in many cases when considering how to implement the reduce.** (It's often helpful to first think about how to implement the reduce step, and what that needs to be like, and then figure out the map step based on that.) The computer first splits the data across several machines, or processors. Each processor runs the map() function on each of its points. Then, the computer does an automatic "shuffle" of the pairs, so that all pairs with the same key end up on the same processor. Lastly, each processor reduces the pairs with the same key using the reduce() function. For the example where we're counting the number of appearances of a word, we map a word to (word, 1), and reduce the pairs by summing the 1's and keeping the word as the key.
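The word-count example can be simulated on a single machine to show the shape of the map, shuffle, and reduce steps. This is only a sketch in plain Python, not the API of any real MapReduce framework, and the function names are made up.

```python
from collections import defaultdict
from functools import reduce


def map_word(word):
    """Map step: each word becomes a (key, value) pair."""
    return (word, 1)


def reduce_counts(a, b):
    """Reduce step: combines exactly two values at a time."""
    return a + b


def word_count(words):
    pairs = [map_word(w) for w in words]  # map every data point
    groups = defaultdict(list)            # "shuffle": group values by key
    for key, value in pairs:
        groups[key].append(value)
    # Reduce each key's values pairwise down to a single count.
    return {key: reduce(reduce_counts, values)
            for key, values in groups.items()}
```

In a real system the map calls and the per-key reductions would run on different processors; here they run one after another.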

Answer 73

It allows you to *parallelize* your data processing. At each step in your processing, you're using multiple computers rather than just 1. This helps with speed and *improves scalability*. Take the example of counting the words in a dataset of words. We could easily just send all the words to a hash table keeping track of number of appearances, but this is a sequential solution that handles one word at a time. Using a parallelized solution, we can do it faster.

Answer 74

The work of an algorithm is its time complexity with *one processor*; the span of an algorithm is its time complexity with *infinite processors*, or when it is *maximally parallelized.* To find the work, you count up the total actions that need to be done by one processor. To find the span, you find the *longest chain of dependencies, or chain of work*. You find the longest sequence of actions that must be done in a specific order, with each action waiting for previous actions to be completed before executing.

Answer 75

If you're looking for an answer of a specific type, you try every answer of that type and see which is the best. For example, the brute force solution for the shortest path problem is to try every possible path and see which is shortest. The brute-force solution is often a useful naive/baseline solution on which you try to improve.

Answer 76

You solve a problem by splitting it into n sub-problems, solving each of the sub-problems (typically in parallel), and then combining the sub-problem answers to get an answer for the original problem. Often, we split into two sub-problems. We typically want the sub-problems to be of roughly equal size in order to get certain speed-up benefits. Mergesort is an example of a divide-and-conquer algorithm, and is also an example of why you want roughly equal subproblems.

Answer 77

In a greedy algorithm, you "greedily" choose the first element that is best at this step, then the second that is best at that step, and so on until you have a size n solution. Greedy algorithms are a common way of implementing *approximation algorithms for maximization problems*. Rather than looking at exponentially many possibilities, you approximate such a brute-force solution by doing a greedy algorithm.

(107 cards)