Final Exam Flashcards

1
Q

double memory

A

8 bytes

1
Q

simple cycle

A

a cycle w/ no repeated edges or vertices

2
Q

preorder ordering

A

if we save the vertex given as an argument to the recursive dfs() in a data structure, then iterate through it, we see all the graph vertices, in an order determined by the nature of the data structure and by whether we do the save before or after the recursive calls

preorder: the order in which dfs() is called.

3
Q

linear probing

A

a method of open addressing for hashing: when a new key collides, find the next empty slot and put it there. The simplest approach is linear probing: when we hash to a table index that is already occupied by a key different from the search key, we just check the next entry in the table (by incrementing the index, wrapping back to 0 at the end). Each probe has three possible outcomes:
- key equal to search key: search hit
- empty position (null key at indexed position): search miss
- key not equal to search key: try next entry
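The three probe outcomes above can be sketched in code. This is a minimal illustration, assuming a fixed-size table with no resizing or deletion (so it breaks if the table fills); the class and method names are illustrative, not from the course code.

```java
// Minimal linear-probing hash table sketch: String keys, int values,
// fixed table size, no resizing or deletion. Illustrative only.
public class LinearProbingST {
    private static final int M = 16;          // table size (assumed never full)
    private String[] keys = new String[M];
    private int[] vals = new int[M];

    private int hash(String key) {
        return (key.hashCode() & 0x7fffffff) % M;
    }

    public void put(String key, int val) {
        int i = hash(key);
        while (keys[i] != null && !keys[i].equals(key))
            i = (i + 1) % M;                  // occupied by another key: try next
        keys[i] = key;                        // empty slot (or key found): store
        vals[i] = val;
    }

    public Integer get(String key) {
        for (int i = hash(key); keys[i] != null; i = (i + 1) % M)
            if (keys[i].equals(key)) return vals[i];   // key equal: search hit
        return null;                                   // empty position: miss
    }
}
```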

3
Q

NFA

A

nondeterministic finite-state automata

A DFA changes from state to state by reading a character from the text string and transitioning based on the content of that character. An NFA can handle the or operation, which makes a DFA unable to determine whether or not the pattern could occur at a given point.

Like a DFA, an NFA starts at state zero, reading the first char of the text and then changing state. Unlike a DFA: characters appear in the nodes, not the edges, of the state diagrams; and an NFA recognizes a text string only after reading all its characters, whereas a DFA can end prematurely.

Ways an NFA can move from one state to another:

  • If the current state corresponds to a character in the alphabet AND the current character in the text string matches that character, the automaton can scan past the character in the text string and take the (black) transition to the next state. This is a match transition.
  • The automaton can follow any red edge to another state w/o scanning any text character. That's an ε-transition, referring to the idea that it corresponds to matching the empty string ε. Since an NFA may have multiple edges leaving a given state, the transition from such a state is not deterministic:
  • it might take one transition at one point in time, and a different transition at another.
3
Q

byte memory

A

1 byte

3
Q

simple path in a graph

A

a path w/ no repeated vertices

4
Q

cycle or path length

A

the number of edges in a cycle/path

5
Q

cut property

A

given any cut in an edge-weighted graph (w/ distinct edge weights), the crossing edge of minimum weight is in the MST (minimum spanning tree) of the graph

6
Q

st-cut

A

a cut that places a vertex s in one of its sets and vertex t in the other

8
Q

weighted quick union

A

a UF implementation (enables union(p, q), find(p), connected(p, q), and count()) organized thusly: a site-indexed id[] array such that the id[] entry for each site is the name of another site in the same component (possibly itself). find() follows links until a root is reached; connected(p, q) is just find(p) == find(q).

In plain quick-union, find() requires 1 + 2 * (depth of the node corresponding to the given site) array accesses, so from 1 to 2N + 1. union() finds the roots of p and q, then makes one root point to the other; worst case is linear.

Weighted quick union improves on this by checking tree sizes and always connecting the root of the smaller tree to the root of the larger tree. This ensures that the depth of any node is at most lg N, so the worst-case order of growth for find(), connected(), and union() becomes log N. It uses an extra array sz[] to hold the node counts and slightly more code: sz[] starts w/ 1 for each site, and the larger tree's root count is incremented by the count of the root being attached to it.

Path compression (an extra loop in find() that sets the id[] entry of each node examined along the way to link directly to the root) makes the operations nearly constant amortized time.
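The description above (without path compression) can be sketched as follows; this is a short illustration of the weighting idea, not the full course implementation.

```java
// Weighted quick-union sketch: link the root of the smaller tree to the
// root of the larger tree, so tree depth stays at most lg N.
public class WeightedQuickUnionUF {
    private int[] id;   // id[i] = parent of site i (roots point to themselves)
    private int[] sz;   // sz[i] = number of nodes in the tree rooted at i
    private int count;  // number of components

    public WeightedQuickUnionUF(int n) {
        id = new int[n];
        sz = new int[n];
        count = n;
        for (int i = 0; i < n; i++) { id[i] = i; sz[i] = 1; }
    }

    public int count() { return count; }

    public int find(int p) {
        while (p != id[p]) p = id[p];   // follow links until a root is reached
        return p;
    }

    public boolean connected(int p, int q) { return find(p) == find(q); }

    public void union(int p, int q) {
        int i = find(p), j = find(q);
        if (i == j) return;             // already in the same component
        // weighting: attach the smaller tree's root under the larger tree's root
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
        count--;
    }
}
```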

9
Q

acyclic graph

A

a graph with no cycles, i.e., no path w/ at least one edge whose first and last vertices are the same

10
Q

3-way radix quicksort

A

a hybrid of quicksort and msd

use three-way partitioning on the leading character of the keys, moving to the next character on only the middle subarray (keys w/ leading char equal to the partitioning character);

refinements: cut off to insertion sort for small subarrays; restrict the alphabet when possible; shuffle the array to protect against worst-case performance. For strings, since compareTo() accesses chars in left-to-right order, all string sorts are effectively MSD sorts. Uses ~2N ln N character compares on average. Good as a general-purpose sort, and for strings w/ long prefix matches.

11
Q

lsd radix sort

A

least-significant-digit string sort; good for short fixed-length strings.
- Consider characters from right to left.
- Stably sort using the dth character as the key (using key-indexed counting).
The sort is stable. Linear-time sorting for typical applications: no matter how large the value of N, it makes W passes through the data, using ~7WN + 3WR array accesses and extra space proportional to N + R to sort N items whose keys are W-character strings taken from an R-character alphabet.
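The two steps above (right-to-left passes, each a key-indexed counting sort) can be sketched as:

```java
// LSD string sort sketch for fixed-length keys: one stable key-indexed
// counting pass per character, from rightmost (d = w-1) to leftmost.
// Alphabet assumed to be extended ASCII (R = 256).
public class LSD {
    public static void sort(String[] a, int w) {
        int n = a.length, R = 256;
        String[] aux = new String[n];
        for (int d = w - 1; d >= 0; d--) {
            int[] count = new int[R + 1];
            for (int i = 0; i < n; i++)           // compute frequency counts
                count[a[i].charAt(d) + 1]++;
            for (int r = 0; r < R; r++)           // transform counts to indices
                count[r + 1] += count[r];
            for (int i = 0; i < n; i++)           // distribute (stably)
                aux[count[a[i].charAt(d)]++] = a[i];
            for (int i = 0; i < n; i++)           // copy back
                a[i] = aux[i];
        }
    }
}
```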

11
Q

escape sequences

A

a backslash (\) separates metacharacters from characters in the alphabet; escape sequences may also signify special characters and whitespace

12
Q

negative cycle detection

A

the digraph has a negative cycle reachable from the source if and only if the Bellman-Ford queue is nonempty after V passes; moreover, in that case the subgraph of edges in edgeTo[] must contain a negative cycle. So if any vertex v is updated in pass V, there exists a negative cycle (and we can trace back edgeTo[v] entries to find it)

12
Q

inorder traversal

A

print all the keys in the left subtree, then the key at the root, and then all the keys in the right subtree, applying the same rule recursively within each subtree
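The recursion above (left subtree, root key, right subtree) yields the keys in sorted order. A minimal sketch with an illustrative Node type (names are not from the course code):

```java
// Inorder traversal sketch on a tiny BST: left subtree, then the key at
// the root, then the right subtree. Collects keys into a string.
public class Inorder {
    static class Node {
        int key;
        Node left, right;
        Node(int k) { key = k; }
    }

    static Node insert(Node x, int k) {
        if (x == null) return new Node(k);
        if (k < x.key) x.left = insert(x.left, k);
        else if (k > x.key) x.right = insert(x.right, k);
        return x;
    }

    // recursively: keys in left subtree, key at root, keys in right subtree
    static String inorder(Node x) {
        if (x == null) return "";
        return inorder(x.left) + x.key + " " + inorder(x.right);
    }
}
```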

13
Q

int memory

A

4 bytes

14
Q

graph density

A

the proportion of possible pairs of vertices that are connected by edges. A sparse graph has relatively few of the possible edges present; a dense graph has relatively few of the possible edges missing. A graph is sparse if its number of distinct edges is within a small constant factor of V (the number of vertices).

14
Q

regular expression formal definition

A

either:
- empty (representing the empty set of strings, w/ 0 elements)
- a single character (the set of strings w/ 1 element, itself)
- a regular expression enclosed in parentheses (the same set of strings as the RE w/o the parentheses)
- two or more concatenated regular expressions (the cross product of the sets of strings represented by the individual components: all possible strings that can be formed by taking one string from each and concatenating them in the same order as the REs)
- two or more regular expressions separated by the or operator (|) (the union of the sets represented by the individual components)
- a regular expression followed by the closure operator (*) (the empty string or the union of the sets represented by the concatenation of any number of copies of the RE)

15
Q

graph connectedness

A

connected if there's a path from every vertex to every other vertex in the graph. A graph that is not connected consists of a set of connected components, or maximal connected subgraphs.

16
Q

huffman compression steps

A
  1. Read the input.
  2. Tabulate the frequency of occurrence of each char value in the input.
  3. Build the Huffman encoding trie corresponding to those frequencies.
  4. Build the corresponding codeword table, to associate a bitstring w/ each char value in the input.
  5. Write the trie, encoded as a bitstring.
  6. Write the count of chars in the input, encoded as a bitstring.
  7. Use the codeword table to write the codeword for each input char.
16
Q

reverse postorder

A

if we save the vertex given as an argument to the recursive dfs() in a data structure, then iterate through it, we see all the graph vertices, in an order determined by the nature of the data structure and by whether we do the save before or after the recursive calls.

reverse postorder: the reverse of the order in which dfs() returns; put the vertex on a stack after the recursive calls. Reverse postorder in a DAG is a topological sort.

17
Q

adjacent vertices

A

when two vertices are connected by an edge

17
specified set
a sequence of characters within square brackets represents any of those characters
18
shellsort
- Move entries more than one position at a time by h-sorting the array.
- Insertion sort, with stride length h.
- Performance: the order of growth of the worst-case number of compares used by shellsort with the 3x+1 increments is N^(3/2).
- Liked because it:
  ○ has acceptable running time even for moderately large arrays
  ○ requires a small amount of code
  ○ uses no extra space
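The h-sorting idea above can be sketched in a few lines, using the 3x+1 increment sequence:

```java
// Shellsort sketch with Knuth's 3x+1 increments: h-sort the array
// (insertion sort with stride h) for decreasing values of h, ending at 1.
public class Shell {
    public static void sort(int[] a) {
        int n = a.length;
        int h = 1;
        while (h < n / 3) h = 3 * h + 1;   // 1, 4, 13, 40, 121, ...
        while (h >= 1) {
            // insertion sort with stride length h
            for (int i = h; i < n; i++)
                for (int j = i; j >= h && a[j] < a[j - h]; j -= h) {
                    int t = a[j]; a[j] = a[j - h]; a[j - h] = t;
                }
            h /= 3;                        // move to the next increment
        }
    }
}
```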
18
prim's algorithm
computes the MST of any connected edge-weighted graph by attaching a new edge to a single growing tree at each step: start w/ any vertex as a single-vertex tree; add V - 1 edges to it, always taking next (coloring black) the minimum-weight edge that connects a vertex on the tree to a vertex not yet on the tree.
Data structures:
- vertices on the tree: a vertex-indexed boolean array marked[], where marked[v] is true if v is on the tree
- edges on the tree: either a queue mst to collect MST edges or a vertex-indexed array edgeTo[] of Edge objects, where edgeTo[v] is the Edge that connects v to the tree
- crossing edges (edges that connect a vertex in one set w/ a vertex in the other): a MinPQ priority queue that compares edges by weight
Lazy solution: maintain a PQ of edges with (at least) one endpoint in T (key = edge; priority = weight of edge).
- Delete-min to determine the next edge e = v-w to add to T.
- Disregard it if both endpoints v and w are marked (both in T).
- Otherwise, let w be the unmarked vertex (not in T): add to the PQ any edge incident to w (assuming its other endpoint is not in T); add e to T and mark w.
Uses space proportional to E and time proportional to E log E to compute the MST.
Eager solution: maintain a PQ of vertices connected by an edge to T, where the priority of vertex v = weight of the shortest edge connecting v to T.
- Delete the min vertex v and add its associated edge e = v-w to T.
- Update the PQ by considering all edges e = v-x incident to v: ignore x if it is already in T; add x to the PQ if it is not already on it; decrease the priority of x if v-x becomes the shortest edge connecting x to T.
Uses space proportional to V and time proportional to E log V in the worst case.
18
netflow
the difference between a vertex's inflow and outflow (presuming there are no edges entering s or leaving t)
19
string memory
1. Java 7 and later: a reference to a character array and an int. 16 bytes of object overhead + 8 for the array reference + (24 + 2N) for the char array + 4 for the int instance variable + 4 padding.
2. Java 6 and earlier: two extra int instance variables. 40 bytes for the String object + 2 ints (8) + char array (24 + 2N).
3. substring() takes either constant extra memory (40 bytes, when the char array is shared) or linear extra memory (when a new char array of the substring's length is required).
20
crossing edge of a cut
an edge that connects a vertex in one set w/ a vertex in the other
21
hash functions
Positive integers:
- choose the array size M to be prime and, for any positive integer key k, compute the remainder when dividing k by M
- effective for dispersing keys evenly between 0 and M - 1
- if M isn't prime (a power of 2 in particular), it may be the case that not all the bits of the key play a role, missing an opportunity to disperse values evenly
Floating-point numbers:
- if keys are between 0 and 1, multiply by M and round off to the nearest integer to get an index between 0 and M - 1
- defective because it gives more weight to the most significant bits of the keys
- solution: use modular hashing on the binary representation of the key
Strings:
- treat them as huge integers; Horner's method gets the job done with N multiplications, additions, and remainder operations
Compound keys:
- mix the fields together in the same way strings are
Performance optimization: cache the hash value in an instance variable and return the cached value; best when computing the hash code is expensive.
Uniform hashing assumption: the hash functions used uniformly and independently distribute keys among the integer values between 0 and M - 1. No deterministic hash function can satisfy these conditions, but we can approximate them by selecting a hash function at random from a universal family.
22
LZW compression
Alternative to Huffman compression: instead of maintaining a table of variable-length codewords for fixed-length patterns, it maintains a table of fixed-length codewords for variable-length patterns in the input. Importantly, we do not have to encode the table in this case.
Compression:
- find the longest string s in the symbol table that is a prefix of the unscanned input
- write the 8-bit value (codeword) associated w/ s
- scan one character past s in the input
- associate the next codeword value w/ s + c (c being the lookahead character) in the symbol table
Involves two symbol table operations:
- find a longest-prefix match of the input w/ a symbol table key
- add an entry associating the next codeword w/ the key formed by appending the lookahead char to that key
A ternary search trie is used for it in practice.
Expansion: maintain a symbol table that associates strings of chars w/ codeword values (the inverse of that used for compression). Fill entries from 00 to 7F w/ one-char strings (one for each ASCII character), set the first unassigned codeword value to 81 (reserving 80 for end of file), set the current string val to the 1-char string consisting of the first character, and perform the following steps until reading codeword 80:
- write the current string val
- read a codeword x from the input
- set s to the value associated w/ x in the symbol table
- associate the next unassigned codeword value to val + c in the symbol table, where c is the first char of s
- set the current string val to s
A tricky case occurs when the lookahead process gets one character ahead of itself (the codeword read is the very one about to be added to the table); in that case the decoded string is val + the first char of val.
22
flow network
edge-weighted digraph w/ positive edge weights (capacities).
23
maxflow-mincut theorem
let f be an st-flow. The following three conditions are equivalent:
- there exists an st-cut whose capacity equals the value of the flow f
- f is a maxflow
- there is no augmenting path w/ respect to f
24
DAG
directed acyclic graph: a digraph w/ no directed cycles. By relaxing vertices in topological order, we can solve the single-source shortest-paths problem for edge-weighted DAGs in time proportional to E + V; every edge is relaxed exactly once. The same approach solves the longest-paths problem, too.
25
dfa construction
1. Copy dfa[][X] to dfa[][j] (for the mismatch cases). 2. Set dfa[pat.charAt(j)][j] to j + 1 (for the match case). 3. Update X. (practice problems until you drop)
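The three steps above are the inner loop of KMP's DFA construction. A sketch, assuming an extended-ASCII alphabet (R = 256); a small search() driver is included to show how the DFA is used:

```java
// KMP DFA construction sketch. For each pattern position j:
// 1. copy the restart-state column dfa[][X] (mismatch cases),
// 2. set the match case dfa[pat.charAt(j)][j] = j + 1,
// 3. update the restart state X.
public class KMPDFA {
    public static int[][] build(String pat) {
        int R = 256, m = pat.length();
        int[][] dfa = new int[R][m];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < R; c++)
                dfa[c][j] = dfa[c][x];          // 1. copy mismatch cases
            dfa[pat.charAt(j)][j] = j + 1;      // 2. set match case
            x = dfa[pat.charAt(j)][x];          // 3. update restart state X
        }
        return dfa;
    }

    // returns index of first occurrence of pat in txt, or -1 (no backup in txt)
    public static int search(String txt, String pat) {
        int[][] dfa = build(pat);
        int n = txt.length(), m = pat.length();
        int i, j;
        for (i = 0, j = 0; i < n && j < m; i++)
            j = dfa[txt.charAt(i)][j];
        return (j == m) ? i - m : -1;
    }
}
```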
26
stacks
collection based on the last-in-first-out (LIFO) policy. Objects are output in the reverse of the order that they were put in. All operations are implementable in constant time using linked lists.
27
knuth shuffle
- In iteration i, pick integer r between 0 and i uniformly at random. - Swap a[i] and a[r].
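The two steps above can be sketched directly (a Random instance is passed in so the result is reproducible; the name is illustrative):

```java
import java.util.Random;

// Knuth (Fisher-Yates) shuffle sketch: one left-to-right pass; at step i,
// swap a[i] with a uniformly random a[r] for r in [0, i].
public class Knuth {
    public static void shuffle(int[] a, Random rnd) {
        for (int i = 0; i < a.length; i++) {
            int r = rnd.nextInt(i + 1);   // uniform between 0 and i
            int t = a[i]; a[i] = a[r]; a[r] = t;
        }
    }
}
```

After the pass, every permutation of the input is equally likely (assuming the random numbers are uniform and independent).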
28
uses of union find API
networks: represent computers in a large network, with pairs representing connections in the network (can we use existing connections to set up a communications path, or do we need to establish new ones?). variable name equivalence: determining whether two declared variable names refer to the same object. mathematical sets: do p and q belong to the same set?
28
bipartite graph
a graph whose vertices we can divide into two sets such that all edges connect a vertex in one set w/ a vertex in the other set
28
turing machine
a finite-state machine that can read inputs, move from state to state, and write outputs
29
BST delete operation
Delete (Hibbard deletion).
1. [0 children] Delete t by setting the parent link to null.
2. [1 child] Delete t by replacing the parent link with a link to t's child.
3. [2 children]
○ Find the successor x of t (the minimum node in t's right subtree; it has no left child)
○ Delete the minimum in t's right subtree
○ Put x in t's spot
30
huffman coding
based on the idea that we can save bits by encoding frequently used characters w/ fewer bits than rarely used characters. A prefix-free code, encoded in a trie, permits avoidance of delimiters by ensuring that no character code is the prefix of another.
Expansion: begin at the root, go left if a bit is 0 and right if it is 1; once a leaf is encountered, output the char at it and restart at the root.
Compression: create a char-indexed table st[] by recursively walking the trie, maintaining a binary string corresponding to the path from the root to each node (0 for left links, 1 for right links) and setting the codeword corresponding to each character when the character is found in a leaf. Once the coding table is built, compression is simply looking up the code for each character in the input.
Trie construction:
1. Create a forest of 1-node trees (leaves), one for each char in the input stream, each assigned a freq value.
2. Build the coding trie from the bottom up according to the frequencies: find the two nodes w/ the smallest frequencies and create a new node w/ those two nodes as children (the new node's frequency is their sum). Then iterate the process. Straightforward w/ a priority queue.
For any prefix-free code, the length of the encoded bitstring is equal to the weighted external path length (the sum of the weight [associated freq count] times depth over all the leaves) of the corresponding tree. Given a set of r symbols and frequencies, the Huffman algorithm builds an optimal prefix-free code.
30
residual network
given an st-flow network and an st-flow, the residual network for the flow has the same vertices as the original and one or two edges in the residual network for each edge in the original, defined as follows: for each edge e from v to w in the original, let f be its flow and c its capacity. If f is positive, include an edge w->v in the residual w/ capacity f; and if f < c, include an edge v->w in the residual w/ capacity c - f. The forward edges represent the remaining capacity (the amount of flow we can add if traversing that edge); the backward edges represent the flow (the amount of flow we can remove if traversing that edge).
31
heapsort
Two phases: heap construction (reorganize the array into a heap) and sortdown (pull the items out of the heap in decreasing order).
Heap construction:
- straightforward method: proceed from left to right through the array, using swim() to ensure that the items to the left of the scanning pointer make up a heap-ordered complete tree, like successive priority-queue insertions
- clever method: proceed from right to left, using sink() to make subheaps as we go; since most of the heaps processed are small, processing time is greatly reduced; uses fewer than 2N compares and fewer than N exchanges to construct a heap from N items
Sortdown: repeatedly remove the largest remaining item from the heap and place it into the array position vacated as the heap shrinks.
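The two phases above can be sketched as follows (sink-based construction, then sortdown); heap positions are 1-based, so heap item k lives at a[k-1]:

```java
// Heapsort sketch: right-to-left sink() heap construction, then sortdown
// (repeatedly exchange the max with the last heap item and re-sink).
public class Heap {
    public static void sort(int[] a) {
        int n = a.length;
        for (int k = n / 2; k >= 1; k--)   // heap construction, right to left
            sink(a, k, n);
        while (n > 1) {                    // sortdown
            exch(a, 1, n--);               // max goes to the vacated position
            sink(a, 1, n);                 // restore heap order
        }
    }

    private static void sink(int[] a, int k, int n) {
        while (2 * k <= n) {
            int j = 2 * k;                          // left child
            if (j < n && a[j - 1] < a[j]) j++;      // pick the larger child
            if (a[k - 1] >= a[j - 1]) break;        // heap order restored
            exch(a, k, j);
            k = j;
        }
    }

    private static void exch(int[] a, int i, int j) {
        int t = a[i - 1]; a[i - 1] = a[j - 1]; a[j - 1] = t;
    }
}
```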
32
range
characters enclosed in [] and separated by - may convey a range of values
33
r-way tries
Tries store characters in nodes (not keys). Each node has R children, one for each possible character.
Search: follow links corresponding to each char in the key. Search hit: the node where the search ends has a non-null value. Search miss: reach a null link, or the node where the search ends has a null value.
Insertion: follow links corresponding to each char in the key. Encounter a null link? Create a new node. Encounter the last char of the key? Set the value in that node.
Performance: a search hit needs to examine all L chars of the key for equality; a search miss could have a mismatch on the first character; the typical case examines only a few characters (sublinear).
Space: R null links at each leaf (but sublinear space is possible if many short strings share common prefixes).
Fast search hit and even faster miss, but wastes space.
34
separate chaining
Items that collide are chained together in separate linked lists. The idea is to choose M to be sufficiently large that the lists are sufficiently short to allow efficient search through a two-step process: hash to find the list that could contain the key, then sequentially search through that list for the key. A simpler but less efficient formulation: build, for each of the M array indices, a symbol table of the keys that hash to that index, thus reusing already developed code. The average length of the lists is N/M (N keys and M lists), no matter how the keys are distributed among the lists. The probability that the number of keys in a list is within a small constant factor of N/M is extremely close to 1.
35
nfa simulation
keep the RE in an array re[] of char values that defines the match transitions (if re[i] is in the alphabet, then there is a match transition from i to i + 1). For ε-transitions, use a digraph: directed edges connecting vertices between 0 and M (one for each state).
Regular-expression-matching NFA:
- we assume the RE is enclosed in parentheses
- one state per RE character (start = 0, accept = M)
- red ε-transitions (change state, but don't scan text)
- black match transitions (change state and scan to the next text char)
- accept if any sequence of transitions ends in the accept state after scanning all text chars
Q. How to efficiently simulate an NFA?
A. Maintain the set of all possible states that the NFA could be in after reading the first i text characters. This reduces to the digraph reachability problem: find all vertices reachable from a given source or set of vertices. Solution: run DFS from each source, without unmarking vertices; runs in time proportional to E + V.
Determining whether an N-character text string is recognized by the NFA corresponding to an M-char RE takes time proportional to NM in the worst case: for each of the N text chars we iterate through a set of states of size no more than M and run a DFS on the digraph of ε-transitions, which has no more than 2M edges.
36
maxflow reductions
job placement (what is the maximum number of jobs that can be filled?) product distribution (Is it possible to get the product from the warehouses to the retail outlets such that supply meets demand everywhere?) network reliability (What is the minimum number of trunk lines that can be cut to disconnect some pair of computers?) (many others)
38
topological sort
topological sort puts all the vertices in order such that all directed edges point from a vertex earlier in the order to a vertex later in the order, avoiding directed cycles. A digraph has a topological order iff it is a DAG (directed acyclic graph). Reverse postorder in a DAG is a topological sort; w/ DFS we can use this fact to topologically sort a DAG in time proportional to V + E.
38
greedy algorithm for the MST problem
method that colors black all edges in the MST of any connected edge-weighted graph w/ V vertices: starting w/ all edges colored gray, find a cut w/ no black crossing edges, color its minimum-weight crossing edge black, and continue until V - 1 edges have been colored black.
39
NFA building
like Dijkstra's two-stack algorithm for evaluating arithmetic expressions, but w/ just one stack. Challenges: remember left parentheses to implement closure and or; remember | to implement or.
Solution: maintain a stack.
- ( symbol: push ( onto the stack.
- | symbol: push | onto the stack.
- ) symbol: pop the corresponding ( and any intervening |; add ε-transition edges for closure/or.
States: include a state for each symbol in the RE, plus an accept state.
Concatenation: add a match-transition edge from each state corresponding to a character in the alphabet to the next state.
Parentheses: add an ε-transition edge from each parenthesis to the next state.
Closure: add three ε-transition edges for each * operator.
Or: add two ε-transition edges for each | operator.
Building the NFA corresponding to an M-character RE takes time and space proportional to M in the worst case. Pf: for each of the M characters in the RE, we add at most three ε-transitions and execute at most two stack operations.
41
depth-first search
To visit a vertex:
- mark it as having been visited
- visit (recursively) all the vertices that are adjacent to it and that have not yet been marked
Marks all the vertices connected to a given source in time proportional to the sum of their degrees. Maintains an array of boolean values to mark all of the vertices that are connected to the source. If the graph is connected, every adjacency-list entry is checked.
Enables path detection using an array edgeTo[] of int values that record the edge v-w that takes us to each vertex w for the first time. That is, v-w is the last edge on the known path from the source s to w: edgeTo[w] = v. All occurs in time proportional to the actual path's length.
Can also be used to find connected components in a graph in time/space proportional to V + E: maintain an id[] array that associates the same int value w/ every vertex in each component, starting w/ all 0s, then marking those found next w/ 1, then 2, etc. Faster than UF in theory because it gives a constant-time guarantee and UF does not; however, UF is faster in practice b/c it doesn't have to build a full representation of the graph. UF is also an online algorithm, while DFS must preprocess the graph.
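The marking recursion above can be sketched as follows; the graph is built from an edge array for brevity, and the class name is illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Recursive DFS sketch on an adjacency-list graph: mark a vertex, then
// recursively visit its unmarked neighbors, recording edgeTo[] on the way.
public class DepthFirstPaths {
    private final boolean[] marked;
    private final int[] edgeTo;
    private final List<List<Integer>> adj = new ArrayList<>();

    public DepthFirstPaths(int v, int[][] edges, int s) {
        for (int i = 0; i < v; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {            // undirected: each edge appears twice
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        marked = new boolean[v];
        edgeTo = new int[v];
        dfs(s);
    }

    private void dfs(int v) {
        marked[v] = true;                  // mark as visited
        for (int w : adj.get(v))
            if (!marked[w]) {
                edgeTo[w] = v;             // v-w takes us to w for the first time
                dfs(w);
            }
    }

    public boolean hasPathTo(int v) { return marked[v]; }
}
```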
42
long memory
8 bytes
43
minimum spanning tree
a spanning tree (connected subgraph w/ no cycles that includes all the vertices) whose weight (the sum of the weights of its edges) is no larger than the weight of any other spanning tree
44
P
the set of all search problems that can be solved in polynomial time (worst case)
45
mark and sweep garbage collection
application of DFS-based reachability testing in digraphs, where each vertex represents an object and each edge represents a reference to an object; an appropriate model for the memory usage of a running Java program. Periodically marks the set of potentially accessible objects by running a digraph reachability algorithm, then sweeps through all objects, collecting the unmarked ones for use for new objects.
45
NP-complete
a search problem A is said to be NP-complete if all problems in NP poly-time reduce to A
47
tree size
number of nodes in a tree
48
postorder ordering
if we save the vertex given as an argument to the recursive dfs() in a data structure, then iterate through it, we see all the graph vertices, in an order determined by the nature of the data structure and by whether we do the save before or after the recursive calls. postorder: the order in which dfs() returns.
49
symbol graph
input:
- vertex names are strings
- a specified delimiter separates vertex names
- each line represents a set of edges, connecting the first vertex name on the line to each of the other vertices named on the line
- the number of vertices V and the number of edges E are both implicitly defined
Builds three data structures:
- a symbol table st w/ String keys (vertex names) and int values (indices)
- an array keys[] that serves as an inverted index, giving the vertex name associated w/ each integer index
- a graph G built using the indices to refer to vertices
Uses two passes through the data to build these.
Enables assessment of "degrees of separation" by assigning a number to a source s and to each other vertex according to how far it is from s.
50
queues
collection based on the first-in-first-out FIFO policy. Objects are output in the order that they were put in. All operations implementable in constant time using linked lists.
50
boyer-moore algorithm
substring-search method that leads to substantial performance improvements because it can skip sections of the text; scans the pattern from right to left. E.g., if we find matches on the seventh and sixth chars but not the fifth, we can immediately slide the pattern seven positions to the right.
Implements a mismatched-character heuristic using an array right[] that gives, for each char in the alphabet, the index of its rightmost occurrence in the pattern (or -1 if the char isn't in the pattern).
Index i moves left to right through the text, and j moves right to left through the pattern; if txt.charAt(i + j) == pat.charAt(j) for all j, then there's a match. If the char causing the mismatch isn't in the pattern, slide the pattern j + 1 positions to the right (increment i by j + 1). If the char c causing the mismatch is found in the pattern, increment i by j - right[c]. If that computation wouldn't increase i, increment i by 1 anyway.
Typically takes ~N/M char compares to search for a pattern of length M in a text of length N.
51
outflow
the total flow out of a vertex (the sum of the flows on its outgoing edges)
53
object reference memory
8 bytes
55
burrows-wheeler
The Burrows-Wheeler transform of a string s of length N is defined as follows: consider the result of sorting the N circular suffixes of s. The Burrows-Wheeler transform is the last column in the sorted-suffixes array t[], preceded by the row number first at which the original string ends up.
56
adjacency-lists data structure
keeping track of all vertices adjacent to each vertex on a linked list associated w/ that vertex. Each edge appears twice in the data structure. So we get:
- space usage proportional to V + E
- constant time to add an edge
- time proportional to the degree of v to iterate through vertices adjacent to v
57
extended church-turing thesis
the order of growth of the running time of a program to solve a problem on any computing device is within a polynomial factor of some program to solve the problem on a Turing machine
58
church-turing thesis
all physically realizable computing devices can be simulated by a Turing Machine.
59
transitive closure
of a digraph G: another digraph w/ the same set of vertices, but w/ an edge from v to w in the transitive closure if and only if w is reachable from v in G. By convention, every vertex is reachable from itself, so the transitive closure has V self-loops. Helps for the reachability problem w/ digraphs; the constructor uses space proportional to V^2 and time proportional to V(V + E).
59
classifications of problems
1. Straightforward 2. Tricky but not difficult 3. Extremely challenging 4. Open
60
BST floor operation
Floor (the largest key ≤ k).
If k = the key in a node: the floor of k is k.
If k < the key in the node: the floor of k is in the left subtree.
If k > the key in the node: the floor of k is in the right subtree if there is any key ≤ k there; otherwise it is the key in the node.
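The three cases above translate directly into a recursion. A sketch with an illustrative Node type (names are not from the course code):

```java
// BST floor() sketch: largest key <= k, or null if no such key exists.
public class BSTFloor {
    static class Node {
        int key;
        Node left, right;
        Node(int k) { key = k; }
    }

    static Node insert(Node x, int k) {
        if (x == null) return new Node(k);
        if (k < x.key) x.left = insert(x.left, k);
        else if (k > x.key) x.right = insert(x.right, k);
        return x;
    }

    static Integer floor(Node x, int k) {
        if (x == null) return null;
        if (k == x.key) return k;                  // equal: the floor is k
        if (k < x.key) return floor(x.left, k);    // floor must be at left
        Integer t = floor(x.right, k);             // maybe in right subtree...
        return (t != null) ? t : x.key;            // ...otherwise this node's key
    }
}
```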
62
edge incident to two vertices
means that the edge connects those two vertices
63
spanning tree of a connected graph
subgraph that contains all of that graph's vertices and is a single tree
64
quickfind
UF implementation (enables union(p, q), find(p), connected(p, q), and count()) organized thusly: a site-indexed id[] array maintaining the invariant that p and q are connected iff id[p] == id[q]; all sites in a component have the same value in id[]. Called quick-find because find() just returns id[p], which implies that connected() just tests id[p] == id[q]. union() maintains the invariant: doing nothing if p and q are already in the same component, it otherwise scans id[], renaming every entry in p's component to id[q]. Uses 1 array access for each call to find(), but between N + 3 and 2N + 1 array accesses for each call to union() that combines two components: such a call makes 2 calls to find(), tests each of the N entries in the id[] array, and changes between 1 and N - 1 of them. A dynamic connectivity problem might require at least N - 1 calls to union() and thus at least (N + 3)(N - 1) ~ N^2 array accesses. So it runs in quadratic time in a lot of applications.
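The invariant-maintaining implementation, as a minimal sketch:

```java
public class QuickFindUF {
    private int[] id;      // id[p] = component identifier of site p
    private int count;     // number of components

    public QuickFindUF(int n) {
        id = new int[n];
        count = n;
        for (int i = 0; i < n; i++) id[i] = i;   // each site starts in its own component
    }

    public int find(int p) { return id[p]; }     // a single array access

    public boolean connected(int p, int q) { return id[p] == id[q]; }

    public int count() { return count; }

    // Rename every site in p's component to q's component id.
    public void union(int p, int q) {
        int pid = id[p], qid = id[q];
        if (pid == qid) return;                  // already in the same component
        for (int i = 0; i < id.length; i++)
            if (id[i] == pid) id[i] = qid;       // full scan: quadratic overall
        count--;
    }
}
```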
65
directed graph (digraph)
a set of vertices and a collection of directed edges, each connecting an ordered pair of vertices. Adjacency-list represented, where an edge v->w is represented as a list node containing w in the linked list corresponding to v. DFS marks all the vertices in a digraph reachable from a given set of sources in time proportional to the sum of the outdegrees of the vertices marked.
66
integrality property
corollary of the maxflow-mincut theorem when capacities are integers, there exists an integer-valued maxflow, and the Ford-Fulkerson algorithm finds it
68
cycle
a path w/ at least one edge whose first and last vertices are the same
69
linear programming
Given a set of M linear inequalities and linear equations involving N variables, and a linear objective function of the N variables, find an assignment of values to the variables that maximizes the objective function, or report that no feasible assignment exists. I have no idea what this means, but it's extremely important because many important problems reduce to it, and we have efficient algorithms for solving linear programming problems. Maxflow and shortest paths both reduce to linear programming, for example.
71
multigraphs
graphs w/ parallel edges
72
closure
* specifies any number of copies of its operand; the + sign signifies at least one copy; ? signifies 0 or 1 copy; a count or range within braces {} specifies a specific number, e.g. (AB){1-2}
73
kosaraju's algorithm
for computing strongly connected components in a digraph efficiently: 1. Given a digraph G, use DepthFirstOrder to compute the reverse postorder of its reverse, G^R. 2. Run standard DFS on G, but consider the unmarked vertices in the order just computed. The id[] array produced conveys the strongly connected components as groups w/ the same number; all vertices reached on a single call to the recursive dfs() from the constructor are in the same strong component. Uses preprocessing time and space proportional to V + E to support constant-time strong-connectivity queries in a digraph.
74
rabin-karp substring search
Basic idea = modular hashing. ・Compute a hash of pat[0..M-1]. ・For each i, compute a hash of txt[i..M+i-1]. ・If pattern hash = text substring hash, check for a match. A string of length M corresponds to an M-digit base-R number. To use a table of size Q for keys of this type, use a hash function to convert an M-digit base-R number to an int value b/t 0 and Q - 1. Modular hashing: take the remainder when dividing the number by Q (usually a random prime as large as possible while avoiding overflow). When M is large (like 100 or 1000), apply Horner's method: for each digit in the number, multiply by R, add the digit, and take the remainder when dividing by Q. However, doing this at every text position takes linear time per position, making substring search quadratic in the worst case. To avoid that problem, do it this way (rolling hashing): from the current value, subtract the leading digit, multiply by the radix, and add the new trailing digit to produce the new value. (Horner's rule is still used, but only to compute the initial M-digit hashes.) The Monte Carlo correctness version makes the hash table "size" Q huge (we're not actually building a hash table, just testing one value for a collision); using a long prime Q greater than 10^20 makes the probability of a mismatch less than 10^-20. The Las Vegas version checks for an actual substring match whenever the hashes match, and continues the search after a false collision. Usually linear and always correct.
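A Monte Carlo sketch of the rolling hash (assumptions: a fixed known prime modulus rather than a large random one, R = 256, text at least as long as the pattern, and no character-by-character verification on a hash match):

```java
public class RabinKarp {
    // Monte Carlo Rabin-Karp (sketch): returns the index of the first
    // position whose hash matches the pattern hash, or -1 if none.
    public static int search(String pat, String txt) {
        int R = 256, M = pat.length(), N = txt.length();
        long Q = 1_000_000_007L;                 // a known prime modulus (not random here)
        long RM = 1;                             // R^(M-1) mod Q, for removing the lead digit
        for (int i = 1; i < M; i++) RM = (RM * R) % Q;
        long patHash = hash(pat, M, R, Q);
        long txtHash = hash(txt, M, R, Q);
        if (patHash == txtHash) return 0;
        for (int i = M; i < N; i++) {
            // Rolling hash: remove leading digit, shift left one digit, add trailing digit.
            txtHash = (txtHash + Q - RM * txt.charAt(i - M) % Q) % Q;
            txtHash = (txtHash * R + txt.charAt(i)) % Q;
            if (patHash == txtHash) return i - M + 1;
        }
        return -1;
    }

    // Horner's method: hash of the first m characters of s.
    private static long hash(String s, int m, int R, long Q) {
        long h = 0;
        for (int j = 0; j < m; j++) h = (h * R + s.charAt(j)) % Q;
        return h;
    }
}
```

The Las Vegas variant would additionally compare the M characters on every hash match before returning.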
75
kleene's theorem
proposition that there's an NFA corresponding to any given RE, and vice versa
76
edge tail
the first vertex in a directed edge (the edge points from its tail to its head)
78
problems that reduce to sorting
- finding the median - counting distinct values - scheduling to minimize average completion time
78
array memory
24 bytes of overhead + the memory needed to store the primitive-type values / object references / arrays
78
directed path
a sequence of vertices in which there is a (directed) edge pointing from each vertex in the sequence to its successor in the sequence
80
left-leaning red-black BSTs
We represent 3-nodes as two 2-nodes connected by a single red link that leans left. We refer to BSTs that represent 2-3 trees in this way as red-black BSTs. Start with a standard BST (made up of 2-nodes) and add extra information to encode 3-nodes. Red links bind together two 2-nodes to represent a 3-node; black links bind together the 2-3 tree. 3-nodes are represented such that one of the 2-nodes is the left child of the other. Alternative definition: a BST such that: - No node has two red links connected to it. - Every path from root to null link has the same number of black links (perfect black balance). - Red links lean left. There is a 1-1 correspondence between 2-3 trees and LLRBs. A boolean added to each node indicates the color of the link pointing to it; since each node is pointed to by precisely one link (from its parent), we can encode the color of links in nodes. Search is the same as for an elementary BST (ignore color), but runs faster because of balance. Most other ops (e.g., floor, iteration, selection) are also identical. Basic strategy: maintain the 1-1 correspondence with 2-3 trees. During internal operations, maintain: ・Symmetric order. ・Perfect black balance. [but not necessarily the color invariants] How? Basic operations: rotation and color flip. Left rotation: the node the red link points to becomes the parent, and the old parent becomes its left child (adopting the rotated node's old left subtree as its right subtree). Right rotation: the same, mirrored. Color flip: recolor to split a (temporary) 4-node. Always turn the root black after each insertion. Insert into a single 2-node, or a 2-node at the bottom: - Do standard BST insert; color the new link red. - If the new red link is a right link, rotate left. Insert into a tree with two keys (a 3-node): - If the new key is larger than the two in the tree, and therefore attached to the rightmost link of the 3-node, 1) flip colors to make it valid.
- If the new key is smaller than the two in the tree, and therefore attached to the leftmost link of the 3-node, 1) rotate the top link to the right, then 2) flip colors to make it valid. - If the new key is between the two in the tree, and therefore a right-leaning node under a left-leaning node, 1) rotate the bottom link left, 2) rotate the top link to the right, and then 3) flip colors to make it valid. Insert into a 3-node at the bottom: - Do standard BST insert; color the new link red. - Rotate to balance the 4-node if needed. - Flip colors to pass the red link up one level. - Rotate to make it lean left (if necessary). - Repeat case 1 or case 2 up the tree (if needed). Same code handles all cases: - Right child red, left child black: rotate left. - Left child red, left-left grandchild red: rotate right. - Both children red: flip colors. Height of the tree is ≤ 2 lg N in the worst case, since every path from root to null link has the same number of black links and there are never two red links in a row. All symbol-table operations are guaranteed to be logarithmic in the size of the tree (except range search, which costs additional time proportional to the # of keys returned). The average length of a path from the root to a node in a red-black BST with N nodes is ~1.00 lg N.
81
standard recipe for hashing
"Standard" recipe for user-defined types. ・Combine each significant field using the 31x + y rule. ・If the field is a primitive type, use the wrapper type's hashCode(). ・If the field is null, use 0. ・If the field is a reference type, use its hashCode(). ・If the field is an array, apply the rule to each entry. In practice: the recipe works reasonably well; used in the Java libraries. In theory: keys are bitstrings; "universal" hash functions exist. Basic rule: need to use the whole key to compute the hash code; consult an expert for state-of-the-art hash codes.
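The recipe applied to a hypothetical Transaction type (the class and its field names are made up for illustration):

```java
public class Transaction {
    private final String who;      // reference type
    private final long when;       // primitive type
    private final double amount;   // primitive type

    public Transaction(String who, long when, double amount) {
        this.who = who; this.when = when; this.amount = amount;
    }

    // Standard recipe: combine each significant field with the 31x + y rule,
    // using the wrapper-type hashCode() for primitives and 0 for null fields.
    @Override
    public int hashCode() {
        int hash = 17;                                        // nonzero starting value
        hash = 31 * hash + ((who == null) ? 0 : who.hashCode());
        hash = 31 * hash + Long.hashCode(when);
        hash = 31 * hash + Double.hashCode(amount);
        return hash;
    }
}
```

Equal fields always produce equal hash codes, which is the half of the hashCode/equals contract this recipe guarantees.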
82
various structural properties of data that can be exploited to enable compression
1. small alphabets 2. long sequences of identical bits/chars 3. frequently used chars 4. long reused bit/char sequences
83
strong connectivity
two vertices in a digraph are strongly connected if they are mutually reachable: there's a directed path from v to w and from w to v. A digraph is strongly connected if all of its vertices are strongly connected to one another. Two vertices are strongly connected iff there exists a general directed cycle that contains them both. Strong connectivity is an equivalence relation (reflexive, symmetric, and transitive), and the subsets of mutually strongly connected vertices are its equivalence classes. Applications: - decide how to group subjects in a textbook - decide how to organize program modules - food web
84
3-way quicksort
How to deal with large numbers of duplicate keys? - Let v be the partitioning item a[lo]. - Scan i from left to right: ○ (a[i] < v): exchange a[lt] with a[i]; increment both lt and i ○ (a[i] > v): exchange a[gt] with a[i]; decrement gt (but not i, since the swapped-in entry is unexamined) ○ (a[i] == v): increment i Quicksort with 3-way partitioning is entropy-optimal and reduces running time from linearithmic to linear in a broad class of applications.
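Dijkstra 3-way partitioning as described above, sketched for int keys (the shuffle you'd normally do first is omitted):

```java
public class Quick3way {
    // 3-way partitioning: entries equal to the partitioning item end up
    // in place, so duplicate keys are never recursed on.
    public static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int lt = lo, gt = hi, i = lo + 1;
        int v = a[lo];                             // partitioning item
        while (i <= gt) {
            if      (a[i] < v) exch(a, lt++, i++);
            else if (a[i] > v) exch(a, i, gt--);   // a[gt] is unexamined: don't move i
            else               i++;
        }
        sort(a, lo, lt - 1);                       // a[lo..lt-1] < v
        sort(a, gt + 1, hi);                       // a[gt+1..hi] > v
    }

    public static void sort(int[] a) { sort(a, 0, a.length - 1); }

    private static void exch(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```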
85
tree
acyclic connected graph; that is, a graph with no cycles in which there exists a path from every vertex to every other vertex in the graph. For all trees: - adding an edge that connects two vertices in a tree creates a new cycle - removing an edge from a tree breaks it into two separate subtrees
86
tree depth
the number of links on the path from a node to its root
87
edge-weighted graph
a graph model where we associate weights or costs w/ each edge. Assumptions: the graph is connected; edge weights are not necessarily distances; edge weights may be zero or negative; edge weights are all different. Represented with an EdgeWeightedGraph with operations V() (# of vertices), E() (edge count), addEdge(), adj(int v) (edges incident to v), edges() (all of this graph's edges), etc., consisting of Edge objects supporting weight(), int either() (either of this edge's vertices), other(int v) (the other vertex), compareTo(), etc. Implemented as linked lists of links to objects containing a pair of int values and a double value.
88
edge head
the second vertex in a directed edge (the vertex the edge points to)
90
binary heaps
A binary heap is a collection of keys arranged in a complete heap-ordered binary tree, represented in level order in an array (not using the first entry). Children of the node at position k are in positions 2k and 2k + 1; can travel up and down the tree w/ simple arithmetic on array indices. Insert: add node at end, then swim it up. - Takes no more than 1 + lg N compares - Proportional to the length of the path between the root and the bottom of the heap Remove the maximum: exchange the root with the node at the end, then sink it down. - No more than 2 lg N compares: two compares for each node on the path (except at the bottom), one to find the child with the larger key, the other to decide whether that child needs to be promoted. Not cache friendly. Multiway heaps: can modify the code to instead represent ternary or other-width trees. - The trade-off is between the lower cost of reduced tree height (log_d N) and the higher cost of finding the largest of the d children at each node. Floyd's sink-to-bottom trick: - Sink the key at the root all the way to the bottom. - Swim the key back up. - Fewer compares; more exchanges. - Worthwhile depending on the cost of a compare vs. an exchange.
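A minimal array-based max-heap sketch with swim and sink (int keys, fixed capacity, no resizing or error checks):

```java
public class MaxPQ {
    private int[] pq;     // heap-ordered complete binary tree in pq[1..n]
    private int n = 0;    // pq[0] is unused

    public MaxPQ(int capacity) { pq = new int[capacity + 1]; }

    public void insert(int x) {
        pq[++n] = x;                   // add node at end...
        swim(n);                       // ...then swim it up
    }

    public int delMax() {
        int max = pq[1];
        exch(1, n--);                  // exchange root with node at end...
        sink(1);                       // ...then sink it down
        return max;
    }

    private void swim(int k) {
        while (k > 1 && pq[k / 2] < pq[k]) { exch(k / 2, k); k = k / 2; }
    }

    private void sink(int k) {
        while (2 * k <= n) {
            int j = 2 * k;                        // children at 2k and 2k+1
            if (j < n && pq[j] < pq[j + 1]) j++;  // pick the larger child
            if (pq[k] >= pq[j]) break;
            exch(k, j); k = j;
        }
    }

    private void exch(int i, int j) { int t = pq[i]; pq[i] = pq[j]; pq[j] = t; }
}
```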
91
quicksort
general purpose, especially when space is tight. Basic plan: - Shuffle the array. - Partition so that, for some j, ○ entry a[j] is in place ○ no larger entry is to the left of j ○ no smaller entry is to the right of j - Sort each subarray recursively. Partition: repeat until the i and j pointers cross. - Scan i from left to right so long as (a[i] < a[lo]). - Scan j from right to left so long as (a[j] > a[lo]). - Exchange a[i] with a[j]. When the pointers cross: - Exchange a[lo] with a[j]. Performance: the inner loop increments an index and compares an array entry against a fixed value, so it's hard to envision a shorter inner loop. - Average case is ~2N ln N compares (close to 1.39 N lg N) and ~(1/3)N ln N exchanges. - Worst case is ~N^2/2 compares. Improvements: - Quicksort is slower than insertion sort for tiny arrays, so cut off to insertion sort for small subarrays. - Sample items from the subarray and partition on their median to improve partitioning a little (median-of-three partitioning). - Use entropy-optimal (3-way) partitioning when duplicates abound. Implementation details: - Partitioning in place. Using an extra array makes partitioning easier (and stable), but is not worth the cost. - Terminating the loop. Testing whether the pointers cross is tricky. - Equal keys. When duplicates are present, it is better to stop scans on keys equal to the partitioning item's key. ○ Otherwise the program might take quadratic time. - Preserving randomness. Shuffling is needed for the performance guarantee. - Equivalent alternative: pick a random partitioning item in each subarray.
92
kruskal's algorithm
Prim's algorithm builds the MST one edge at a time, finding a new edge to attach to a single growing tree at each step. Kruskal's algorithm also builds the MST one edge at a time; but, by contrast, it finds an edge that connects two trees in a forest of growing trees. We start w/ a degenerate forest of V single-vertex trees and perform the operation of combining two trees (using the shortest edge possible) until there is just one tree left: the MST. Uses space proportional to E and time proportional to E log E. Consider edges in ascending order of weight. ・Add next edge to tree T unless doing so would create a cycle.
93
vertex degree
number of edges incident to a vertex
94
simple graphs
graphs w/ no parallel edges or self-loops
96
vertex indegree
the number of edges pointing into a vertex
97
sequential search
for symbol tables 1. Maintain an (unordered) linked list of key-value pairs. 2. Search: scan through all keys until find a match. 3. Insert: scan through all keys until find a match; if no match, add to front. - Linear time for both in worst case; N/2 (for Search) and N (for insert) in average case. Not good enough, man. - Inserting N distinct keys into an empty Symbol table uses ~N^2/2 compares.
99
linked list memory
1. 16 bytes overhead 2. 8 bytes extra overhead 3. amount of memory used by each reference / instance variable.
100
graph
set of vertices and collection of edges that each connect a pair of vertices
101
brute force substring search
substring search by checking each possible position in the text at which the pattern could match - one pointer i into the text and one pointer j into the pattern. Requires ~NM character compares to search for a pattern of length M in a text of length N in the worst case (worst case is a pattern and text of all As followed by a B: for each of the N - M + 1 possible match positions, all M pattern characters are checked against the text, for a total cost of M(N - M + 1) ~ NM, since M is normally small relative to N).
103
knuth-morris-pratt algorithm
for substring search; preferred when backup is inefficient (we can do much better when backup is easy). Whenever we detect a mismatch, we already know some of the chars in the text; we can take advantage of this info. i (the pointer scanning the text) is never backed up; only j (moving through the pattern) is. *** An array dfa[][] is used to record how far to back up the pattern pointer when a mismatch is detected. During the search, dfa[txt.charAt(i)][j] is the pattern position to compare w/ txt.charAt(i + 1) after comparing txt.charAt(i) w/ pat.charAt(j). dfa[pat.charAt(j)][j] is always j + 1, since on a match we just want to move on to the next character. The backup in general is the length of the maximum overlap of the beginning of the pattern w/ the known text characters. *** During the search, when i and j point to mismatching characters, the next possible position for a pattern match begins at position i - dfa[txt.charAt(i)][j]. Accesses no more than M + N chars to search for a pattern of length M in a text of length N, because we access each pattern char once when computing dfa[][] and each text char once (in the worst case) in search().
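A sketch of the DFA construction and search (assuming an extended-ASCII alphabet, R = 256):

```java
public class KMP {
    // KMP substring search: build the DFA from the pattern, then scan
    // the text without ever backing up i. Returns first match index or -1.
    public static int search(String pat, String txt) {
        int R = 256, M = pat.length(), N = txt.length();
        int[][] dfa = new int[R][M];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < M; j++) {
            for (int c = 0; c < R; c++)
                dfa[c][j] = dfa[c][x];           // mismatch: copy from restart state x
            dfa[pat.charAt(j)][j] = j + 1;       // match: advance to the next state
            x = dfa[pat.charAt(j)][x];           // update the restart state
        }
        int i, j;
        for (i = 0, j = 0; i < N && j < M; i++)
            j = dfa[txt.charAt(i)][j];           // i never backs up
        return (j == M) ? i - M : -1;            // reached accept state, or ran out of text
    }
}
```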
104
properties of shortest paths algorithms
paths are directed; the weights are not necessarily distances; not all vertices need be reachable; negative weights introduce complications; shortest paths are normally simple; shortest paths are not necessarily unique; parallel edges and self-loops may be present. Data structures: ・distTo[v] is the length of the shortest path from s to v. ・edgeTo[v] is the last edge on the shortest path from s to v. For shortest-paths implementations, start w/ distTo[source] = 0 and the rest = Double.POSITIVE_INFINITY. General algorithm: initialize distTo[s] = 0 and distTo[v] = ∞ for all other vertices, then repeat until the optimality conditions are satisfied: - Relax any edge. As an algorithm proceeds, it gathers information about the shortest paths that connect the source to each vertex encountered in our edgeTo[] and distTo[] data structures; by updating this information when we encounter edges, we can make new inferences about shortest paths. Specifically, we use edge relaxation, defined as follows: to relax an edge v->w means to - test whether the best known way from s to w is to go from s to v and then take the edge from v to w, and, - if so, update our data structures to indicate that to be the case.
106
forest
disjoint set of trees (acyclic connected graphs)
107
path compression
In weighted quick union, making all examined nodes during find() operation point directly to the root makes this nearly constant time amortized. (an extra loop in the find() operation that sets the id[] entry corresponding to each node encountered along the way to link directly to the root).
109
vertex outdegree
the number of edges pointing out of a vertex
110
how to iterate through all keys in a trie in sorted order
do an inorder traversal of the trie, adding keys encountered to a queue; maintain the sequence of chars on the path from the root to the current node
112
ways to avoid ambiguity in a code
1. fixed-length code 2. append special stop char to each codeword 3. general prefix-free code
113
arbitrage problem
is a negative-cycle detection problem in edge-weighted digraphs replace each weight by its logarithm, negated. W/ this change, computing path weights by multiplying edge weights in the original problem corresponds to adding them in the transformed problem. The transformed edge weights might be negative or positive, a path from v to w gives a way of converting from currency v to currency w and any negative cycle is an arbitrage opportunity.
114
insertion sort
- In iteration i, swap a[i] with each larger entry to its left. - Items to the left are in sorted order during the sort, but not necessarily in final position. - Fully sorted once the index reaches the right end. - Steps: ○ Move the pointer to the right. ○ Moving from right to left, exchange a[i] with each larger entry to its left. - Performance: ○ The number of compares is the number of exchanges plus an additional term equal to N minus the number of times the item inserted is the smallest so far. ○ In the worst case, this is ~N^2/2 compares and ~N^2/2 exchanges. ○ In the best case, this is N - 1 compares and 0 exchanges. ○ On average, this is ~N^2/4 compares and ~N^2/4 exchanges. ○ Best case is an array already in order; worst case is reverse order. ○ The number of exchanges used by insertion sort is equal to the number of inversions in the array. ○ The number of compares is at least the number of inversions and at most the number of inversions plus the array size - 1. ○ For partially sorted arrays (number of inversions < cN), insertion sort runs in linear time. - Improvements: ○ Shorten the inner loop to shift larger entries one position to the right rather than doing full exchanges (half exchanges). ○ Binary insertion sort: use binary search to find the insertion point. Makes compares ~N lg N, but still a quadratic # of array accesses. Liked because: - effective w/ partially sorted and tiny arrays
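The basic method, as a sketch (full exchanges; the half-exchange and binary-search improvements are omitted):

```java
public class Insertion {
    // In iteration i, swap a[i] leftward past each larger entry.
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++)
            for (int j = i; j > 0 && a[j] < a[j - 1]; j--) {
                int t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;  // exchange with left neighbor
            }
    }
}
```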
115
search problem
a search problem is a problem having solutions w/ the property that the time needed to certify that any solution is correct is bounded by a polynomial in the size of the input An algorithm solves a search problem if, given any input, it either produces a solution or reports that none exists
117
dijkstra's algorithm
1. initialize distTo[s] to 0 and all other distTo[] entries to positive infinity 2. relax and add to the tree a non-tree vertex w/ the lowest distTo[] value, continuing until all vertices are on the tree or no non-tree vertex has a finite distTo[] value. Along with the distTo[] and edgeTo[] data structures, includes an indexed priority queue pq to keep track of vertices that are candidates for being the next to be relaxed. Uses space proportional to V and time proportional to E log V to compute the SPT rooted at a given source in an edge-weighted digraph w/ E edges, V vertices, and nonnegative weights.
118
object memory
1. overhead of 16 bytes 2. padding of memory to make it a multiple of 8 3. amount of memory used by each instance variable / reference
119
char memory
2
120
MSD radix sort
good for random strings. MSD string (radix) sort: ・Partition the array into R pieces according to the first character (use key-indexed counting). ・Recursively sort all strings that start with each character (the key-indexed counts delineate the subarrays to sort). not sure i get it. A switch to insertion sort for small subarrays is a must for MSD, since each sort involves initializing all entries of the count array to 0 and transforming them into indices (many recursive sorts = huge cost); the effect of the cutoff is dramatic. Also relatively slow for subarrays containing large numbers of equal keys; the worst case is when all keys are equal. Uses two auxiliary arrays to do the partitioning: a temporary array for distributing keys, aux[], and an array that holds the counts that are transformed into partition indices, count[]. For random inputs, examines just enough chars to distinguish among the keys, and then the running time is sublinear in the # of chars in the data. For nonrandom input, could still be sublinear but might need to examine more characters than in the random case, depending on the data; nearly linear when a significant # of equal keys is present. In the worst case (all strings equal), examines all the chars in the keys and the running time is linear in the # of chars. Examines about N log_R N chars on average; between 8N + 3R and ~7wN + 3wR array accesses to sort N strings taken from an R-character alphabet, where w is the average string length. Space is proportional to R times the length of the longest string, plus N, in the worst case.
121
run-length coding
taking advantage of a bitstream w/ long runs of repeated bits by specifically encoding the lengths of runs. Real simple.
122
BST rank operation
Rank. How many keys < k? Recursive algorithm: - If the given key equals the key at the root, return the # of keys in the left subtree. - If it's < the key at the root, return the rank (recursively computed) of the key in the left subtree. - If it's > the key at the root, return the # of keys in the left subtree + 1 (for the key at the root) + the rank (recursively computed) of the key in the right subtree.
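The rank rule, sketched on a bare-bones BST that maintains subtree sizes (Node and put() are illustrative helpers, not the book's code):

```java
public class RankDemo {
    public static class Node {
        public int key, size = 1;     // size = # of nodes in subtree rooted here
        public Node left, right;
        public Node(int key) { this.key = key; }
    }

    public static int size(Node x) { return (x == null) ? 0 : x.size; }

    // Rank: number of keys in the tree strictly less than k.
    public static int rank(Node x, int k) {
        if (x == null) return 0;
        if (k == x.key) return size(x.left);
        if (k < x.key)  return rank(x.left, k);
        return 1 + size(x.left) + rank(x.right, k);  // root and left subtree are all < k
    }

    // Standard BST insert that keeps subtree sizes up to date.
    public static Node put(Node x, int key) {
        if (x == null) return new Node(key);
        if (key < x.key) x.left = put(x.left, key);
        else if (key > x.key) x.right = put(x.right, key);
        x.size = 1 + size(x.left) + size(x.right);
        return x;
    }
}
```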
123
float memory
4
124
selection sort
Selection Sort - In iteration i, find the index min of the smallest remaining entry and swap a[i] and a[min]. - Invariants: ○ Entries to the left of ↑ (including ↑) are fixed and in order. ○ No entry to the right of ↑ is smaller than any entry to the left of ↑. ○ Takes just as long for a sorted array as for an unsorted array. ○ The number of exchanges is a linear function of array size. - Steps: 1. Move the pointer to the right 2. Identify the index of the minimum entry to its right 3. Exchange it into position - Performance: ○ Invariably uses N(N - 1)/2 ~ N^2/2 compares and N exchanges
125
problems that reduce to shortest-path problems
single-source shortest path in undirected graphs parallel precedence-constrained scheduling arbitrage
126
directed cycle
a directed path w/ at least one edge whose first and last vertices are the same. Important for resolving scheduling problems (precedence-constrained scheduling): topological sort puts all the vertices in an order such that every directed edge points from a vertex earlier in the order to a vertex later in the order, which is possible only when there are no directed cycles.
127
breadth-first search
Perfect for finding the shortest possible path from source s to target v. Maintain a queue of all vertices that have been marked but whose adjacency lists have not been checked. Mark the source vertex and put it on the queue, then perform the following steps until the queue is empty: - take the next vertex v from the queue - put onto the queue all unmarked vertices that are adjacent to v, and mark them Result is an edgeTo[] array, a parent-linked representation of a tree rooted at s, which defines the shortest paths from s to every vertex connected to s. BFS takes time proportional to V + E in the worst case: it marks all the vertices connected to s in time proportional to the sum of their degrees and, if the graph is connected, this sum is the sum of the degrees of all the vertices, or 2E.
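A sketch of BFS over an adjacency-list graph given as a plain int[][] (vertex v's neighbors in adj[v]; a real implementation would use a Graph type):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class BFSPaths {
    // BFS from source s; returns edgeTo[], the parent-link representation
    // of the shortest-path tree rooted at s.
    public static int[] bfs(int[][] adj, int s) {
        int V = adj.length;
        boolean[] marked = new boolean[V];
        int[] edgeTo = new int[V];
        Queue<Integer> queue = new ArrayDeque<>();
        marked[s] = true;                      // mark the source and enqueue it
        queue.add(s);
        while (!queue.isEmpty()) {
            int v = queue.remove();            // next vertex whose list is unchecked
            for (int w : adj[v]) {
                if (!marked[w]) {              // first visit = shortest path found
                    marked[w] = true;
                    edgeTo[w] = v;
                    queue.add(w);
                }
            }
        }
        return edgeTo;
    }
}
```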
128
st-flow network
a flow network w/ 2 identified vertices, a source s and a sink t
129
complement
enclosed in brackets, preceded by ^; matches any character but one of those characters
130
feasible flow
condition where no edge's flow is greater than the edge's capacity and the local equilibrium condition that every vertex's netflow is zero (except s and t)
131
bellman-ford algorithm
Initialize distTo[s] = 0 and distTo[v] = ∞ for all other vertices. Repeat V times: - Relax each edge. Takes time proportional to EV and extra space proportional to V. Solves the single-source shortest-paths problem from a given source s for any edge-weighted digraph w/ V vertices and no negative cycles reachable from s. Can improve by determining a priori that numerous edges cannot lead to a successful relaxation in any given pass: the only edges that COULD change are those leaving a vertex whose distTo[] value changed in the previous pass. A FIFO queue keeps track of those vertices.
132
cut
a cut of a graph is a partition of its vertices into two nonempty disjoint sets
133
NP
the set of all search problems. N stands for nondeterminism; it represents the idea that one way to extend the power of a computer is to endow it w/ the power of nondeterminism: to assert that when an algorithm is faced w/ a choice of several options, it has the power to "guess" the right one. No one has been able to prove that nondeterminism helps for any particular problem; no one has found a problem that can be proven to be in NP but not in P.
134
bottom-up mergesort
- Organize the merges so that we do all the merges of tiny subarrays in one pass, and then do a second pass to merge those subarrays in pairs, and so forth, continuing until we do a merge that encompasses the whole array. - about 10% slower than recursive, top-down mergesort on typical systems because of caching
135
parallel edges
two edges that connect the same pair of vertices
136
merge sort
Basic plan. ・Divide the array into two halves. ・Recursively sort each half. ・Merge the two halves. Performance: - Uses between ½ N lg N and N lg N compares to sort any array of length N. - Top-down mergesort uses at most 6N lg N array accesses to sort an array of length N. ○ Each merge uses at most 6N array accesses: 2N for the copy, 2N for the move back, and at most 2N for the compares [and there are lg N levels of merges]. - The number of compares for the merge is at least N/2. - Mergesort uses extra space proportional to N. Improvements: - Use insertion sort for small subarrays. - Stop if already sorted (skip the merge when the halves are already in order). - Switch the roles of the input array and the auxiliary array at each level to eliminate the time required to copy to the auxiliary array for merging.
138
spanning forest of a graph
the union of spanning trees of its connected components
140
2-3 trees
A 2-3 search tree is a tree that is either empty or - A 2-node, with one key (and associated value) and two links, a left link to a 2-3 search tree with smaller keys, and a right link to a 2-3 search tree with larger keys - A 3-node, with two keys (and associated values) and three links, a left link to a 2-3 search tree with smaller keys, a middle link to a 2-3 search tree with keys between the node's keys, and a right link to a 2-3 search tree with larger keys - As usual, we refer to a link to an empty tree as a null link. Search directly generalizes that for BSTs: - Compare the search key against the keys in the node. - Find the interval containing the search key. - Follow the associated link (recursively). Insert into a 2-node: if the search terminates at a 2-node, replace it with a 3-node. Insert into a 3-node: - Add the new key to the 3-node to create a temporary 4-node - Move the middle key in the 4-node into the parent - Repeat up the tree as necessary - If you reach the root and it's a 4-node, split it into three 2-nodes Splitting a 4-node is a local transformation: a constant number of operations. Each transformation maintains symmetric order and perfect balance. Search and insert operations in a 2-3 tree with N keys are guaranteed to visit at most lg N nodes. Proof: the height of an N-node 2-3 tree is between ⎣log3 N⎦ = ⎣(lg N)/(lg 3)⎦ (if the tree is all 3-nodes) and ⎣lg N⎦ (if the tree is all 2-nodes). Direct implementation is complicated, because: ・Maintaining multiple node types is cumbersome. ・Need multiple compares to move down the tree. ・Need to move back up the tree to split 4-nodes. ・Large number of cases for splitting.
141
wildcard
. represents any single character
142
boolean memory
1
144
key-indexed counting
string sort algorithm effective whenever keys are small integers 1. Compute frequency counts - count the frequency of occurrence of each key value, using an int array count[]. For each item with key r, increment count[r + 1]; the offset by one is what lets the next step turn counts into starting indices. 2. Transform counts to indices - use count[] to compute, for each key value, the starting index position in the sorted order of items w/ that key, obtained by summing the frequency counts of the smaller values. 3. Distribute the data - use the index table created from the count[] array to move the items into an auxiliary array aux[]. 4. Copy the sorted result back to the original array. Uses 8N + 3R + 1 array accesses to stably sort N items whose keys are integers b/t 0 and R - 1. Evades the sorting lower bound by not performing compares.
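The four steps, sketched for plain int keys (sorting the keys themselves rather than items carrying keys):

```java
public class KeyIndexed {
    // Key-indexed counting (sketch): stably sort a[] whose keys are ints in [0, R).
    public static void sort(int[] a, int R) {
        int N = a.length;
        int[] aux = new int[N];
        int[] count = new int[R + 1];
        for (int i = 0; i < N; i++) count[a[i] + 1]++;          // 1. frequency counts, offset by 1
        for (int r = 0; r < R; r++) count[r + 1] += count[r];   // 2. counts -> starting indices
        for (int i = 0; i < N; i++) aux[count[a[i]]++] = a[i];  // 3. distribute, stably
        for (int i = 0; i < N; i++) a[i] = aux[i];              // 4. copy back
    }
}
```

Because step 3 scans left to right and each index counter only grows, equal keys keep their relative order, which is the stability MSD string sort depends on.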
145
mincut
the minimum st-cut: given an st-network, find an st-cut such that the capacity of no other cut is smaller. For any st-flow, the flow across each st-cut = the value of the flow. The outflow from s is equal to the inflow to t. No st-flow's value can exceed the capacity of any st-cut.
146
negative cycle
a directed cycle whose total weight is negative. There exists a shortest path from s to v in an edge-weighted digraph iff - there exists at least one directed path from s to v, and - no vertex on any directed path from s to v is on a negative cycle. Vastly complicates the shortest-paths problem (improperly defined in their presence) and even the shortest simple path problem (for which only exponential-time algorithms are known).
147
ford-fulkerson algorithm
AKA the augmenting-path algorithm. For any path w/ no full forward edges and no empty backward edges, we can increase the amount of flow in the network by increasing flow in forward edges and decreasing flow in backward edges. The amount by which the flow can be increased is limited by the min of the unused capacities in the forward edges and the flows in the backward edges. Such a path is called an augmenting path. Start w/ 0 flow everywhere. Increase the flow along any augmenting path from source to sink (w/ no full forward edges or empty backward edges), continuing until there are no such paths in the network. In a shortest-augmenting-path implementation, the # of augmenting paths needed for a flow network w/ V vertices and E edges is at most EV/2, and the algorithm takes time proportional to VE^2/2.
148
boolean satisfiability
the first proven NP-complete problem the problem of determining if there exists an interpretation that satisfies a given Boolean formula. In other words, it asks whether the variables of a given Boolean formula can be assigned values in such a way as to make the formula evaluate to TRUE. If no such assignment exists, the function expressed by the formula is identically FALSE for all possible variable assignments; in that case the formula is called unsatisfiable, otherwise satisfiable. For example, the formula "a AND NOT b" is satisfiable because one can find the values a = TRUE and b = FALSE, which make (a AND NOT b) = TRUE. In contrast, "a AND NOT a" is unsatisfiable.
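The two examples above can be checked by brute force over all 2^n assignments; a minimal sketch (class and method names are illustrative, and the exponential loop is exactly why SAT is believed hard in general):

```java
import java.util.function.Predicate;

// Minimal brute-force satisfiability check over n boolean variables.
public class BruteForceSat {
    public static boolean isSatisfiable(int n, Predicate<boolean[]> formula) {
        // Try all 2^n assignments (bit i of mask = value of variable i).
        for (int mask = 0; mask < (1 << n); mask++) {
            boolean[] assign = new boolean[n];
            for (int i = 0; i < n; i++) assign[i] = ((mask >> i) & 1) == 1;
            if (formula.test(assign)) return true;   // satisfying assignment found
        }
        return false;   // identically FALSE: unsatisfiable
    }
}
```

For example, `isSatisfiable(2, v -> v[0] && !v[1])` is true, while `isSatisfiable(1, v -> v[0] && !v[0])` is false.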
149
inflow
the total flow into a vertex (the sum of the flows on its incoming edges)
150
BSTs
Representation:
- A private nested class defining Nodes; each contains a key, a value, a left link, a right link, and a node count (# of nodes in the subtree rooted here)
- get(): go down the tree according to the key of interest; the number of compares is 1 + depth of the node
- put(): search for the key, then two cases:
  - key in tree -> reset its value
  - key not in tree -> add a new node
  - still 1 + depth of node compares
- Tree shape depends on the order of insertion:
  - if N keys are inserted in random order, the expected number of compares for insert/search is ~2 ln N
  - the expected height is ~4.311 ln N
  - worst-case height is N
All operations take time proportional to the height of the tree in the worst case; all of the methods go down one or two paths in the tree. Inorder traversal involves printing left subtrees first, then roots, then right subtrees.
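A minimal sketch of the representation and the two operations described above, specialized to String keys and Integer values rather than the book's generic Key/Value version:

```java
// Minimal BST symbol table sketch (String keys, Integer values).
public class BST {
    private Node root;
    private class Node {
        String key; Integer val; Node left, right;
        int count;   // # of nodes in the subtree rooted here
        Node(String key, Integer val) { this.key = key; this.val = val; this.count = 1; }
    }
    public Integer get(String key) {
        Node x = root;
        while (x != null) {                 // go down the tree by key compares
            int cmp = key.compareTo(x.key);
            if (cmp < 0) x = x.left;
            else if (cmp > 0) x = x.right;
            else return x.val;              // search hit
        }
        return null;                        // search miss
    }
    public void put(String key, Integer val) { root = put(root, key, val); }
    private Node put(Node x, String key, Integer val) {
        if (x == null) return new Node(key, val);   // key not in tree: new node
        int cmp = key.compareTo(x.key);
        if (cmp < 0) x.left = put(x.left, key, val);
        else if (cmp > 0) x.right = put(x.right, key, val);
        else x.val = val;                           // key in tree: reset value
        x.count = 1 + size(x.left) + size(x.right);
        return x;
    }
    private int size(Node x) { return x == null ? 0 : x.count; }
    public int size() { return size(root); }
}
```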
151
st-flow
a set of nonnegative values associated w/ each edge, called edge flows
152
path in a graph
sequence of vertices connected by edges
153
critical path method for parallel scheduling
1. create an edge-weighted DAG w/ a source s, a sink t, and two vertices for each job (a start vertex and an end vertex)
2. for each job, add an edge from its start vertex to its end vertex w/ weight equal to its duration
3. for each precedence constraint v-w, add a zero-weight edge from the end vertex corresponding to v to the start vertex corresponding to w
4. add zero-weight edges from the source to each job's start vertex and from each job's end vertex to the sink
5. schedule each job at the time given by the length of its longest path from the source
works in linear time. Intuition: the longest path from the source to a job's start vertex is the earliest moment by which every chain of its prerequisites can possibly finish, so starting the job then is both feasible and as early as possible.
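The core of step 5 is computing longest-path distances in an edge-weighted DAG, done by relaxing vertices in topological order. A minimal sketch with a hypothetical {from, to, weight} edge format and int weights (the book uses an AcyclicLP class over its digraph types):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Minimal sketch: longest-path distances from s in an edge-weighted DAG.
public class DagLongestPaths {
    public static long[] longestPaths(int V, int[][] edges, int s) {
        List<List<int[]>> adj = new ArrayList<>();
        int[] indeg = new int[V];
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) { adj.get(e[0]).add(e); indeg[e[1]]++; }
        // Kahn's algorithm for a topological order (valid because it's a DAG).
        Deque<Integer> q = new ArrayDeque<>();
        for (int v = 0; v < V; v++) if (indeg[v] == 0) q.add(v);
        int[] topo = new int[V]; int k = 0;
        while (!q.isEmpty()) {
            int v = q.remove(); topo[k++] = v;
            for (int[] e : adj.get(v)) if (--indeg[e[1]] == 0) q.add(e[1]);
        }
        long NEG = Long.MIN_VALUE / 4;
        long[] dist = new long[V];
        Arrays.fill(dist, NEG);
        dist[s] = 0;
        // Relax each edge once, in topological order of its source vertex;
        // for longest paths we keep the maximum instead of the minimum.
        for (int v : topo)
            if (dist[v] != NEG)
                for (int[] e : adj.get(v))
                    if (dist[v] + e[2] > dist[e[1]]) dist[e[1]] = dist[v] + e[2];
        return dist;
    }
}
```

Each edge is relaxed exactly once, which is why the whole schedule comes out in linear time.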
154
huffman expansion steps
1. read the trie (encoded at the start of the bitstream) 2. read the count of chars to be decoded 3. use the trie to decode the bitstream
155
ternary search tries
store characters and values in nodes (not keys); each node has 3 children: smaller (left), equal (middle), and larger (right). Search: follow links corresponding to each char in the key. If less, take the left link. If greater, take the right link. If equal, take the middle link and move to the next key char. A TST node has five fields: a value, a char, and 3 references. ~L + ln N for a search hit; ~ln N for a search miss; ~L + ln N for an insert. 4N space. TST is about as fast as hashing for string keys, and space efficient.
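A minimal sketch of the five-field node and the search/insert logic described above, specialized to String keys and Integer values (keys assumed non-empty, as in the book's TST):

```java
// Minimal ternary search trie sketch (String keys, Integer values).
public class TST {
    private Node root;
    private class Node {
        char c;                    // the character stored in this node
        Node left, mid, right;     // smaller / equal / larger links
        Integer val;               // non-null iff a key ends here
    }
    public Integer get(String key) {
        Node x = get(root, key, 0);
        return x == null ? null : x.val;
    }
    private Node get(Node x, String key, int d) {
        if (x == null) return null;
        char c = key.charAt(d);
        if (c < x.c) return get(x.left, key, d);             // smaller: go left
        else if (c > x.c) return get(x.right, key, d);       // larger: go right
        else if (d < key.length() - 1) return get(x.mid, key, d + 1); // equal: middle, next char
        else return x;
    }
    public void put(String key, Integer val) { root = put(root, key, val, 0); }
    private Node put(Node x, String key, Integer val, int d) {
        char c = key.charAt(d);
        if (x == null) { x = new Node(); x.c = c; }
        if (c < x.c) x.left = put(x.left, key, val, d);
        else if (c > x.c) x.right = put(x.right, key, val, d);
        else if (d < key.length() - 1) x.mid = put(x.mid, key, val, d + 1);
        else x.val = val;
        return x;
    }
}
```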
156
introsort
IntroSort
- Run quicksort.
- Cut off to heapsort if the recursion stack depth exceeds 2 lg N (avoids quicksort's quadratic worst case).
- Cut off to insertion sort for small subarrays (N = 16).
157
tree height
the maximum depth among a tree's nodes
158
self-loop
edge that connects a vertex to itself