Final Exam Flashcards

1
Q

double memory

A

8 bytes

1
Q

simple cycle

A

a cycle w/ no repeated edges or vertices

2
Q

preorder ordering

A

if we save the vertex given as an argument to the recursive dfs() in a data structure, then iterate through it, we see all the graph vertices, in an order determined by the nature of the data structure and by whether we do the save before or after the recursive calls

preorder: the order in which dfs() is called.

3
Q

linear probing

A

a method of open addressing for hashing: when a new key collides, find the next empty slot and put it there. The simplest approach is linear probing: when we hash to a table index that is already occupied by a key different from the search key, we just check the next entry in the table (by incrementing the index, wrapping back to 0 at the end). Each probe has three possible outcomes:
- key equal to search key: search hit
- empty position (null key at indexed position): search miss
- key not equal to search key: try next entry
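The three probe outcomes above can be sketched in code. This is a minimal illustration, assuming a fixed-size table with no resizing or deletion (so it breaks if the table fills); the class and method names are illustrative, not from the course code.

```java
// Minimal linear-probing hash table sketch: String keys, int values,
// fixed table size, no resizing or deletion. Illustrative only.
public class LinearProbingST {
    private static final int M = 16;          // table size (assumed never full)
    private String[] keys = new String[M];
    private int[] vals = new int[M];

    private int hash(String key) {
        return (key.hashCode() & 0x7fffffff) % M;
    }

    public void put(String key, int val) {
        int i = hash(key);
        while (keys[i] != null && !keys[i].equals(key))
            i = (i + 1) % M;                  // occupied by another key: try next
        keys[i] = key;                        // empty slot (or key found): store
        vals[i] = val;
    }

    public Integer get(String key) {
        for (int i = hash(key); keys[i] != null; i = (i + 1) % M)
            if (keys[i].equals(key)) return vals[i];   // key equal: search hit
        return null;                                   // empty position: miss
    }
}
```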

3
Q

NFA

A

nondeterministic finite-state automata

A DFA changes from state to state by reading a character from the text string and transitioning based on the content of that character. An NFA can handle the or operation, which makes a DFA unable to determine whether or not the pattern could occur at a given point.

Like a DFA, an NFA starts at state zero, reading the first char of the text and then changing state. Unlike a DFA: characters appear in the nodes, not the edges, of the state diagrams; and an NFA recognizes a text string only after reading all its characters, whereas a DFA can end prematurely.

Ways an NFA can move from one state to another:

  • If the current state corresponds to a character in the alphabet AND the current character in the text string matches that character, the automaton can scan past the character in the text string and take the (black) transition to the next state. This is a match transition.
  • The automaton can follow any red edge to another state w/o scanning any text character. That's an ε-transition, referring to the idea that it corresponds to matching the empty string ε. Since an NFA may have multiple edges leaving a given state, the transition from such a state is not deterministic:
  • it might take one transition at one point in time, and a different transition at another.
3
Q

byte memory

A

1 byte

3
Q

simple path in a graph

A

a path w/ no repeated vertices

4
Q

cycle or path length

A

the number of edges in a cycle/path

5
Q

cut property

A

given any cut in an edge-weighted graph (w/ distinct edge weights), the crossing edge of minimum weight is in the MST (minimum spanning tree) of the graph

6
Q

st-cut

A

a cut that places a vertex s in one of its sets and vertex t in the other

8
Q

weighted quick union

A

a UF implementation (enables union(p, q), find(p), connected(p, q), and count()) organized thusly: a site-indexed id[] array such that the id[] entry for each site is the name of another site in the same component (possibly itself). find() follows links until a root is reached; connected(p, q) is just find(p) == find(q).

In plain quick-union, find() requires 1 + 2 * (depth of the node corresponding to the given site) array accesses, so from 1 to 2N + 1. union() finds the roots of p and q, then makes one root point to the other; worst case is linear.

Weighted quick union improves on this by checking tree sizes and always connecting the root of the smaller tree to the root of the larger tree. This ensures that the depth of any node is at most lg N, so the worst-case order of growth for find(), connected(), and union() becomes log N. It uses an extra array sz[] to hold the node counts and slightly more code: sz[] starts w/ 1 for each site, and the larger tree's root count is incremented by the count of the root being attached to it.

Path compression (an extra loop in find() that sets the id[] entry of each node examined along the way to link directly to the root) makes the operations nearly constant amortized time.
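The description above (without path compression) can be sketched as follows; this is a short illustration of the weighting idea, not the full course implementation.

```java
// Weighted quick-union sketch: link the root of the smaller tree to the
// root of the larger tree, so tree depth stays at most lg N.
public class WeightedQuickUnionUF {
    private int[] id;   // id[i] = parent of site i (roots point to themselves)
    private int[] sz;   // sz[i] = number of nodes in the tree rooted at i
    private int count;  // number of components

    public WeightedQuickUnionUF(int n) {
        id = new int[n];
        sz = new int[n];
        count = n;
        for (int i = 0; i < n; i++) { id[i] = i; sz[i] = 1; }
    }

    public int count() { return count; }

    public int find(int p) {
        while (p != id[p]) p = id[p];   // follow links until a root is reached
        return p;
    }

    public boolean connected(int p, int q) { return find(p) == find(q); }

    public void union(int p, int q) {
        int i = find(p), j = find(q);
        if (i == j) return;             // already in the same component
        // weighting: attach the smaller tree's root under the larger tree's root
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
        count--;
    }
}
```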

9
Q

acyclic graph

A

a graph with no cycles, i.e., no path w/ at least one edge whose first and last vertices are the same

10
Q

3-way radix quicksort

A

a hybrid of quicksort and msd

use three-way partitioning on the leading character of the keys, moving to the next character on only the middle subarray (keys w/ leading char equal to the partitioning character);

refinements: cut off to insertion sort for small subarrays; restrict the alphabet when possible; shuffle the array to protect against worst-case performance. For strings, since compareTo() accesses chars in left-to-right order, all string sorts are effectively MSD sorts. Uses ~2N ln N character compares on average. Good as a general-purpose sort, and for strings w/ long prefix matches.

11
Q

lsd radix sort

A

least-significant-digit string sort; good for short fixed-length strings.
- Consider characters from right to left.
- Stably sort using the dth character as the key (using key-indexed counting).
The sort is stable. Linear-time sorting for typical applications: no matter how large the value of N, it makes W passes through the data, using ~7WN + 3WR array accesses and extra space proportional to N + R to sort N items whose keys are W-character strings taken from an R-character alphabet.
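The two steps above (right-to-left passes, each a key-indexed counting sort) can be sketched as:

```java
// LSD string sort sketch for fixed-length keys: one stable key-indexed
// counting pass per character, from rightmost (d = w-1) to leftmost.
// Alphabet assumed to be extended ASCII (R = 256).
public class LSD {
    public static void sort(String[] a, int w) {
        int n = a.length, R = 256;
        String[] aux = new String[n];
        for (int d = w - 1; d >= 0; d--) {
            int[] count = new int[R + 1];
            for (int i = 0; i < n; i++)           // compute frequency counts
                count[a[i].charAt(d) + 1]++;
            for (int r = 0; r < R; r++)           // transform counts to indices
                count[r + 1] += count[r];
            for (int i = 0; i < n; i++)           // distribute (stably)
                aux[count[a[i].charAt(d)]++] = a[i];
            for (int i = 0; i < n; i++)           // copy back
                a[i] = aux[i];
        }
    }
}
```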

11
Q

escape sequences

A

a backslash (\) separates metacharacters from characters in the alphabet; escape sequences may also signify special characters and whitespace

12
Q

negative cycle detection

A

the digraph has a negative cycle reachable from the source if and only if the Bellman-Ford queue is nonempty after V passes; moreover, in that case the subgraph of edges in edgeTo[] must contain a negative cycle. So if any vertex v is updated in pass V, there exists a negative cycle (and we can trace back edgeTo[v] entries to find it)

12
Q

inorder traversal

A

print all the keys in the left subtree, then the key at the root, and then all the keys in the right subtree, applying the same rule recursively within each subtree
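The recursion above (left subtree, root key, right subtree) yields the keys in sorted order. A minimal sketch with an illustrative Node type (names are not from the course code):

```java
// Inorder traversal sketch on a tiny BST: left subtree, then the key at
// the root, then the right subtree. Collects keys into a string.
public class Inorder {
    static class Node {
        int key;
        Node left, right;
        Node(int k) { key = k; }
    }

    static Node insert(Node x, int k) {
        if (x == null) return new Node(k);
        if (k < x.key) x.left = insert(x.left, k);
        else if (k > x.key) x.right = insert(x.right, k);
        return x;
    }

    // recursively: keys in left subtree, key at root, keys in right subtree
    static String inorder(Node x) {
        if (x == null) return "";
        return inorder(x.left) + x.key + " " + inorder(x.right);
    }
}
```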

13
Q

int memory

A

4 bytes

14
Q

graph density

A

the proportion of possible pairs of vertices that are connected by edges. A sparse graph has relatively few of the possible edges present; a dense graph has relatively few of the possible edges missing. A graph is sparse if its number of distinct edges is within a small constant factor of V (the number of vertices).

14
Q

regular expression formal definition

A

either:
- empty (representing the empty set of strings, w/ 0 elements)
- a single character (the set of strings w/ 1 element, itself)
- a regular expression enclosed in parentheses (the same set of strings as the RE w/o the parentheses)
- two or more concatenated regular expressions (the cross product of the sets of strings represented by the individual components: all possible strings that can be formed by taking one string from each and concatenating them in the same order as the REs)
- two or more regular expressions separated by the or operator (|) (the union of the sets represented by the individual components)
- a regular expression followed by the closure operator (*) (the empty string or the union of the sets represented by the concatenation of any number of copies of the RE)

15
Q

graph connectedness

A

connected if there's a path from every vertex to every other vertex in the graph. A graph that is not connected consists of a set of connected components, or maximal connected subgraphs.

16
Q

huffman compression steps

A
  1. Read the input.
  2. Tabulate the frequency of occurrence of each char value in the input.
  3. Build the Huffman encoding trie corresponding to those frequencies.
  4. Build the corresponding codeword table, to associate a bitstring w/ each char value in the input.
  5. Write the trie, encoded as a bitstring.
  6. Write the count of chars in the input, encoded as a bitstring.
  7. Use the codeword table to write the codeword for each input char.
16
Q

reverse postorder

A

if we save the vertex given as an argument to the recursive dfs() in a data structure, then iterate through it, we see all the graph vertices, in an order determined by the nature of the data structure and by whether we do the save before or after the recursive calls.

reverse postorder: the reverse of the order in which dfs() returns; put the vertex on a stack after the recursive calls. Reverse postorder in a DAG is a topological sort.

17
Q

adjacent vertices

A

when two vertices are connected by an edge

17
specified set
a sequence of characters within square brackets represents any of those characters
18
shellsort
- Move entries more than one position at a time by h-sorting the array.
- Insertion sort, with stride length h.
- Performance: the order of growth of the worst-case number of compares used by shellsort with the 3x+1 increments is N^(3/2).
- Liked because it:
  ○ has acceptable running time even for moderately large arrays
  ○ requires a small amount of code
  ○ uses no extra space
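The h-sorting idea above can be sketched in a few lines, using the 3x+1 increment sequence:

```java
// Shellsort sketch with Knuth's 3x+1 increments: h-sort the array
// (insertion sort with stride h) for decreasing values of h, ending at 1.
public class Shell {
    public static void sort(int[] a) {
        int n = a.length;
        int h = 1;
        while (h < n / 3) h = 3 * h + 1;   // 1, 4, 13, 40, 121, ...
        while (h >= 1) {
            // insertion sort with stride length h
            for (int i = h; i < n; i++)
                for (int j = i; j >= h && a[j] < a[j - h]; j -= h) {
                    int t = a[j]; a[j] = a[j - h]; a[j - h] = t;
                }
            h /= 3;                        // move to the next increment
        }
    }
}
```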
18
prim's algorithm
computes the MST of any connected edge-weighted graph by attaching a new edge to a single growing tree at each step: start w/ any vertex as a single-vertex tree; add V - 1 edges to it, always taking next (coloring black) the minimum-weight edge that connects a vertex on the tree to a vertex not yet on the tree.
Data structures:
- vertices on the tree: a vertex-indexed boolean array marked[], where marked[v] is true if v is on the tree
- edges on the tree: either a queue mst to collect MST edges or a vertex-indexed array edgeTo[] of Edge objects, where edgeTo[v] is the Edge that connects v to the tree
- crossing edges (edges that connect a vertex in one set w/ a vertex in the other): a MinPQ priority queue that compares edges by weight
Lazy solution: maintain a PQ of edges with (at least) one endpoint in T (key = edge; priority = weight of edge).
- Delete-min to determine the next edge e = v-w to add to T.
- Disregard it if both endpoints v and w are marked (both in T).
- Otherwise, let w be the unmarked vertex (not in T): add to the PQ any edge incident to w (assuming its other endpoint is not in T); add e to T and mark w.
Uses space proportional to E and time proportional to E log E to compute the MST.
Eager solution: maintain a PQ of vertices connected by an edge to T, where the priority of vertex v = weight of the shortest edge connecting v to T.
- Delete the min vertex v and add its associated edge e = v-w to T.
- Update the PQ by considering all edges e = v-x incident to v: ignore x if it is already in T; add x to the PQ if it is not already on it; decrease the priority of x if v-x becomes the shortest edge connecting x to T.
Uses space proportional to V and time proportional to E log V in the worst case.
18
netflow
the difference between a vertex's inflow and outflow (presuming there are no edges entering s or leaving t)
19
string memory
1. Java 7 and later: a reference to a character array and an int. 16 bytes of object overhead + 8 for the array reference + (24 + 2N) for the char array + 4 for the int instance variable + 4 padding.
2. Java 6 and earlier: two extra int instance variables. 40 bytes for the String object + 2 ints (8) + char array (24 + 2N).
3. substring() takes either constant extra memory (40 bytes, when the char array is shared) or linear extra memory (when a new char array of the substring's length is required).
20
crossing edge of a cut
an edge that connects a vertex in one set w/ a vertex in the other
21
hash functions
Positive integers:
- choose the array size M to be prime and, for any positive integer key k, compute the remainder when dividing k by M
- effective for dispersing keys evenly between 0 and M - 1
- if M isn't prime (a power of 2 in particular), it may be the case that not all the bits of the key play a role, missing an opportunity to disperse values evenly
Floating-point numbers:
- if keys are between 0 and 1, multiply by M and round off to the nearest integer to get an index between 0 and M - 1
- defective because it gives more weight to the most significant bits of the keys
- solution: use modular hashing on the binary representation of the key
Strings:
- treat them as huge integers; Horner's method gets the job done with N multiplications, additions, and remainder operations
Compound keys:
- mix the fields together in the same way strings are
Performance optimization: cache the hash value in an instance variable and return the cached value; best when computing the hash code is expensive.
Uniform hashing assumption: the hash functions used uniformly and independently distribute keys among the integer values between 0 and M - 1. No deterministic hash function can satisfy these conditions, but we can approximate them by selecting a hash function at random from a universal family.
22
LZW compression
Alternative to Huffman compression: instead of maintaining a table of variable-length codewords for fixed-length patterns, it maintains a table of fixed-length codewords for variable-length patterns in the input. Importantly, we do not have to encode the table in this case.
Compression:
- find the longest string s in the symbol table that is a prefix of the unscanned input
- write the 8-bit value (codeword) associated w/ s
- scan one character past s in the input
- associate the next codeword value w/ s + c (c being the lookahead character) in the symbol table
Involves two symbol table operations:
- find a longest-prefix match of the input w/ a symbol table key
- add an entry associating the next codeword w/ the key formed by appending the lookahead char to that key
A ternary search trie is used for it in practice.
Expansion: maintain a symbol table that associates strings of chars w/ codeword values (the inverse of that used for compression). Fill entries from 00 to 7F w/ one-char strings (one for each ASCII character), set the first unassigned codeword value to 81 (reserving 80 for end of file), set the current string val to the 1-char string consisting of the first character, and perform the following steps until reading codeword 80:
- write the current string val
- read a codeword x from the input
- set s to the value associated w/ x in the symbol table
- associate the next unassigned codeword value to val + c in the symbol table, where c is the first char of s
- set the current string val to s
A tricky case occurs when the lookahead process gets one character ahead of itself (the codeword read is the very one about to be added to the table); in that case the decoded string is val + the first char of val.
22
flow network
edge-weighted digraph w/ positive edge weights (capacities).
23
maxflow-mincut theorem
let f be an st-flow. The following three conditions are equivalent:
- there exists an st-cut whose capacity equals the value of the flow f
- f is a maxflow
- there is no augmenting path w/ respect to f
24
DAG
directed acyclic graph: a digraph w/ no directed cycles. By relaxing vertices in topological order, we can solve the single-source shortest-paths problem for edge-weighted DAGs in time proportional to E + V; every edge is relaxed exactly once. The same approach solves the longest-paths problem, too.
25
dfa construction
1. Copy dfa[][X] to dfa[][j] (for the mismatch cases). 2. Set dfa[pat.charAt(j)][j] to j + 1 (for the match case). 3. Update X. (practice problems until you drop)
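The three steps above are the inner loop of KMP's DFA construction. A sketch, assuming an extended-ASCII alphabet (R = 256); a small search() driver is included to show how the DFA is used:

```java
// KMP DFA construction sketch. For each pattern position j:
// 1. copy the restart-state column dfa[][X] (mismatch cases),
// 2. set the match case dfa[pat.charAt(j)][j] = j + 1,
// 3. update the restart state X.
public class KMPDFA {
    public static int[][] build(String pat) {
        int R = 256, m = pat.length();
        int[][] dfa = new int[R][m];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < R; c++)
                dfa[c][j] = dfa[c][x];          // 1. copy mismatch cases
            dfa[pat.charAt(j)][j] = j + 1;      // 2. set match case
            x = dfa[pat.charAt(j)][x];          // 3. update restart state X
        }
        return dfa;
    }

    // returns index of first occurrence of pat in txt, or -1 (no backup in txt)
    public static int search(String txt, String pat) {
        int[][] dfa = build(pat);
        int n = txt.length(), m = pat.length();
        int i, j;
        for (i = 0, j = 0; i < n && j < m; i++)
            j = dfa[txt.charAt(i)][j];
        return (j == m) ? i - m : -1;
    }
}
```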
26
stacks
collection based on the last-in-first-out (LIFO) policy. Objects are output in the reverse of the order that they were put in. All operations are implementable in constant time using linked lists.
27
knuth shuffle
- In iteration i, pick integer r between 0 and i uniformly at random. - Swap a[i] and a[r].
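The two steps above can be sketched directly (a Random instance is passed in so the result is reproducible; the name is illustrative):

```java
import java.util.Random;

// Knuth (Fisher-Yates) shuffle sketch: one left-to-right pass; at step i,
// swap a[i] with a uniformly random a[r] for r in [0, i].
public class Knuth {
    public static void shuffle(int[] a, Random rnd) {
        for (int i = 0; i < a.length; i++) {
            int r = rnd.nextInt(i + 1);   // uniform between 0 and i
            int t = a[i]; a[i] = a[r]; a[r] = t;
        }
    }
}
```

After the pass, every permutation of the input is equally likely (assuming the random numbers are uniform and independent).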
28
uses of union find API
networks: represent computers in a large network, with pairs representing connections in the network (can we use existing connections to set up a communications path, or do we need to establish new ones?). variable name equivalence: determining whether two declared variable names refer to the same object. mathematical sets: do p and q belong to the same set?
28
bipartite graph
a graph whose vertices we can divide into two sets such that all edges connect a vertex in one set w/ a vertex in the other set
28
turing machine
a finite-state machine that can read inputs, move from state to state, and write outputs
29
BST delete operation
Delete (Hibbard deletion).
1. [0 children] Delete t by setting the parent link to null.
2. [1 child] Delete t by replacing the parent link with a link to t's child.
3. [2 children]
○ Find the successor x of t (the minimum node in t's right subtree; it has no left child)
○ Delete the minimum in t's right subtree
○ Put x in t's spot
30
huffman coding
based on the idea that we can save bits by encoding frequently used characters w/ fewer bits than rarely used characters. A prefix-free code, encoded in a trie, permits avoidance of delimiters by ensuring that no character code is the prefix of another.
Expansion: begin at the root, go left if a bit is 0 and right if it is 1; once a leaf is encountered, output the char at it and restart at the root.
Compression: create a char-indexed table st[] by recursively walking the trie, maintaining a binary string corresponding to the path from the root to each node (0 for left links, 1 for right links) and setting the codeword corresponding to each character when the character is found in a leaf. Once the coding table is built, compression is simply looking up the code for each character in the input.
Trie construction:
1. Create a forest of 1-node trees (leaves), one for each char in the input stream, each assigned a freq value.
2. Build the coding trie from the bottom up according to the frequencies: find the two nodes w/ the smallest frequencies and create a new node w/ those two nodes as children (the new node's frequency is their sum). Then iterate the process. Straightforward w/ a priority queue.
For any prefix-free code, the length of the encoded bitstring is equal to the weighted external path length (the sum of the weight [associated freq count] times depth over all the leaves) of the corresponding tree. Given a set of r symbols and frequencies, the Huffman algorithm builds an optimal prefix-free code.
30
residual network
given an st-flow network and an st-flow, the residual network for the flow has the same vertices as the original and one or two edges in the residual network for each edge in the original, defined as follows: for each edge e from v to w in the original, let f be its flow and c its capacity. If f is positive, include an edge w->v in the residual w/ capacity f; and if f < c, include an edge v->w in the residual w/ capacity c - f. The forward edges represent the remaining capacity (the amount of flow we can add if traversing that edge); the backward edges represent the flow (the amount of flow we can remove if traversing that edge).
31
heapsort
Two phases: heap construction (reorganize the array into a heap) and sortdown (pull the items out of the heap in decreasing order).
Heap construction:
- straightforward method: proceed from left to right through the array, using swim() to ensure that the items to the left of the scanning pointer make up a heap-ordered complete tree, like successive priority-queue insertions
- clever method: proceed from right to left, using sink() to make subheaps as we go; since most of the heaps processed are small, processing time is greatly reduced; uses fewer than 2N compares and fewer than N exchanges to construct a heap from N items
Sortdown: repeatedly remove the largest remaining item from the heap and place it into the array position vacated as the heap shrinks.
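The two phases above can be sketched as follows (sink-based construction, then sortdown); heap positions are 1-based, so heap item k lives at a[k-1]:

```java
// Heapsort sketch: right-to-left sink() heap construction, then sortdown
// (repeatedly exchange the max with the last heap item and re-sink).
public class Heap {
    public static void sort(int[] a) {
        int n = a.length;
        for (int k = n / 2; k >= 1; k--)   // heap construction, right to left
            sink(a, k, n);
        while (n > 1) {                    // sortdown
            exch(a, 1, n--);               // max goes to the vacated position
            sink(a, 1, n);                 // restore heap order
        }
    }

    private static void sink(int[] a, int k, int n) {
        while (2 * k <= n) {
            int j = 2 * k;                          // left child
            if (j < n && a[j - 1] < a[j]) j++;      // pick the larger child
            if (a[k - 1] >= a[j - 1]) break;        // heap order restored
            exch(a, k, j);
            k = j;
        }
    }

    private static void exch(int[] a, int i, int j) {
        int t = a[i - 1]; a[i - 1] = a[j - 1]; a[j - 1] = t;
    }
}
```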
32
range
characters enclosed in [] and separated by - may convey a range of values
33
r-way tries
Tries store characters in nodes (not keys). Each node has R children, one for each possible character.
Search: follow links corresponding to each char in the key. Search hit: the node where the search ends has a non-null value. Search miss: reach a null link, or the node where the search ends has a null value.
Insertion: follow links corresponding to each char in the key. Encounter a null link? Create a new node. Encounter the last char of the key? Set the value in that node.
Performance: a search hit needs to examine all L chars of the key for equality; a search miss could have a mismatch on the first character; the typical case examines only a few characters (sublinear).
Space: R null links at each leaf (but sublinear space is possible if many short strings share common prefixes).
Fast search hit and even faster miss, but wastes space.
34
separate chaining
Items that collide are chained together in separate linked lists. The idea is to choose M to be sufficiently large that the lists are sufficiently short to allow efficient search through a two-step process: hash to find the list that could contain the key, then sequentially search through that list for the key. A simpler but less efficient formulation: build, for each of the M array indices, a symbol table of the keys that hash to that index, thus reusing already developed code. The average length of the lists is N/M (N keys and M lists), no matter how the keys are distributed among the lists. The probability that the number of keys in a list is within a small constant factor of N/M is extremely close to 1.
35
nfa simulation
keep the RE in an array re[] of char values that defines the match transitions (if re[i] is in the alphabet, then there is a match transition from i to i + 1). For ε-transitions, use a digraph: directed edges connecting vertices between 0 and M (one for each state).
Regular-expression-matching NFA:
- we assume the RE is enclosed in parentheses
- one state per RE character (start = 0, accept = M)
- red ε-transitions (change state, but don't scan text)
- black match transitions (change state and scan to the next text char)
- accept if any sequence of transitions ends in the accept state after scanning all text chars
Q. How to efficiently simulate an NFA?
A. Maintain the set of all possible states that the NFA could be in after reading the first i text characters. This reduces to the digraph reachability problem: find all vertices reachable from a given source or set of vertices. Solution: run DFS from each source, without unmarking vertices; runs in time proportional to E + V.
Determining whether an N-character text string is recognized by the NFA corresponding to an M-char RE takes time proportional to NM in the worst case: for each of the N text chars we iterate through a set of states of size no more than M and run a DFS on the digraph of ε-transitions, which has no more than 2M edges.
36
maxflow reductions
job placement (what is the maximum number of jobs that can be filled?) product distribution (Is it possible to get the product from the warehouses to the retail outlets such that supply meets demand everywhere?) network reliability (What is the minimum number of trunk lines that can be cut to disconnect some pair of computers?) (many others)
38
topological sort
topological sort puts all the vertices in order such that all directed edges point from a vertex earlier in the order to a vertex later in the order, avoiding directed cycles. A digraph has a topological order iff it is a DAG (directed acyclic graph). Reverse postorder in a DAG is a topological sort; w/ DFS we can use this fact to topologically sort a DAG in time proportional to V + E.
38
greedy algorithm for the MST problem
method that colors black all edges in the MST of any connected edge-weighted graph w/ V vertices: starting w/ all edges colored gray, find a cut w/ no black crossing edges, color its minimum-weight crossing edge black, and continue until V - 1 edges have been colored black.
39
NFA building
like Dijkstra's two-stack algorithm for evaluating arithmetic expressions, but w/ just one stack. Challenges: remember left parentheses to implement closure and or; remember | to implement or.
Solution: maintain a stack.
- ( symbol: push ( onto the stack.
- | symbol: push | onto the stack.
- ) symbol: pop the corresponding ( and any intervening |; add ε-transition edges for closure/or.
States: include a state for each symbol in the RE, plus an accept state.
Concatenation: add a match-transition edge from each state corresponding to a character in the alphabet to the next state.
Parentheses: add an ε-transition edge from each parenthesis to the next state.
Closure: add three ε-transition edges for each * operator.
Or: add two ε-transition edges for each | operator.
Building the NFA corresponding to an M-character RE takes time and space proportional to M in the worst case. Pf: for each of the M characters in the RE, we add at most three ε-transitions and execute at most two stack operations.
41
depth-first search
To visit a vertex:
- mark it as having been visited
- visit (recursively) all the vertices that are adjacent to it and that have not yet been marked
Marks all the vertices connected to a given source in time proportional to the sum of their degrees. Maintains an array of boolean values to mark all of the vertices that are connected to the source. If the graph is connected, every adjacency-list entry is checked.
Enables path detection using an array edgeTo[] of int values that record the edge v-w that takes us to each vertex w for the first time. That is, v-w is the last edge on the known path from the source s to w: edgeTo[w] = v. All occurs in time proportional to the actual path's length.
Can also be used to find connected components in a graph in time/space proportional to V + E: maintain an id[] array that associates the same int value w/ every vertex in each component, starting w/ all 0s, then marking those found next w/ 1, then 2, etc. Faster than UF in theory because it gives a constant-time guarantee and UF does not; however, UF is faster in practice b/c it doesn't have to build a full representation of the graph. UF is also an online algorithm, while DFS must preprocess the graph.
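The marking recursion above can be sketched as follows; the graph is built from an edge array for brevity, and the class name is illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Recursive DFS sketch on an adjacency-list graph: mark a vertex, then
// recursively visit its unmarked neighbors, recording edgeTo[] on the way.
public class DepthFirstPaths {
    private final boolean[] marked;
    private final int[] edgeTo;
    private final List<List<Integer>> adj = new ArrayList<>();

    public DepthFirstPaths(int v, int[][] edges, int s) {
        for (int i = 0; i < v; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {            // undirected: each edge appears twice
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        marked = new boolean[v];
        edgeTo = new int[v];
        dfs(s);
    }

    private void dfs(int v) {
        marked[v] = true;                  // mark as visited
        for (int w : adj.get(v))
            if (!marked[w]) {
                edgeTo[w] = v;             // v-w takes us to w for the first time
                dfs(w);
            }
    }

    public boolean hasPathTo(int v) { return marked[v]; }
}
```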
42
long memory
8 bytes
43
minimum spanning tree
a spanning tree (connected subgraph w/ no cycles that includes all the vertices) whose weight (the sum of the weights of its edges) is no larger than the weight of any other spanning tree
44
P
the set of all search problems that can be solved in polynomial time (worst case)
45
mark and sweep garbage collection
application of DFS-based reachability testing in digraphs, where each vertex represents an object and each edge represents a reference to an object; an appropriate model for the memory usage of a running Java program. Periodically marks the set of potentially accessible objects by running a digraph reachability algorithm, then sweeps through all objects, collecting the unmarked ones for use for new objects.
45
NP-complete
a search problem A is said to be NP-complete if all problems in NP poly-time reduce to A
47
tree size
number of nodes in a tree
48
postorder ordering
if we save the vertex given as an argument to the recursive dfs() in a data structure, then iterate through it, we see all the graph vertices, in an order determined by the nature of the data structure and by whether we do the save before or after the recursive calls. postorder: the order in which dfs() returns.
49
symbol graph
input:
- vertex names are strings
- a specified delimiter separates vertex names
- each line represents a set of edges, connecting the first vertex name on the line to each of the other vertices named on the line
- the number of vertices V and the number of edges E are both implicitly defined
Builds three data structures:
- a symbol table st w/ String keys (vertex names) and int values (indices)
- an array keys[] that serves as an inverted index, giving the vertex name associated w/ each integer index
- a graph G built using the indices to refer to vertices
Uses two passes through the data to build these.
Enables assessment of "degrees of separation" by assigning a number to a source s and to each other vertex according to how far it is from s.
50
queues
collection based on the first-in-first-out FIFO policy. Objects are output in the order that they were put in. All operations implementable in constant time using linked lists.
50
boyer-moore algorithm
substring-search method that leads to substantial performance improvements because it can skip sections of the text; scans the pattern from right to left. E.g., if we find matches on the seventh and sixth chars but not the fifth, we can immediately slide the pattern seven positions to the right.
Implements a mismatched-character heuristic using an array right[] that gives, for each char in the alphabet, the index of its rightmost occurrence in the pattern (or -1 if the char isn't in the pattern).
Index i moves left to right through the text, and j moves right to left through the pattern; if txt.charAt(i + j) == pat.charAt(j) for all j, then there's a match. If the char causing the mismatch isn't in the pattern, slide the pattern j + 1 positions to the right (increment i by j + 1). If the char c causing the mismatch is found in the pattern, increment i by j - right[c]. If that computation wouldn't increase i, increment i by 1 anyway.
Typically takes ~N/M char compares to search for a pattern of length M in a text of length N.
51
outflow
the total flow out of a vertex (the sum of the flows on its outgoing edges)
53
object reference memory
8 bytes
55
burrows-wheeler
The Burrows-Wheeler transform of a string s of length N is defined as follows: consider the result of sorting the N circular suffixes of s. The Burrows-Wheeler transform is the last column in the sorted-suffixes array t[], preceded by the row number first at which the original string ends up.
56
adjacency-lists data structure
keeping track of all vertices adjacent to each vertex on a linked list associated w/ that vertex. Each edge appears twice in the data structure. So we get:
- space usage proportional to V + E
- constant time to add an edge
- time proportional to the degree of v to iterate through vertices adjacent to v
57
extended church-turing thesis
the order of growth of the running time of a program to solve a problem on any computing device is within a polynomial factor of some program to solve the problem on a Turing machine
58
church-turing thesis
all physically realizable computing devices can be simulated by a Turing Machine.
59
transitive closure
of a digraph G: another digraph w/ the same set of vertices, but w/ an edge from v to w in the transitive closure if and only if w is reachable from v in G. By convention, every vertex is reachable from itself, so the transitive closure has V self-loops. Helps for the reachability problem w/ digraphs; the constructor uses space proportional to V^2 and time proportional to V(V + E).
59
classifications of problems
1. Straightforward 2. Tricky but not difficult 3. Extremely challenging 4. Open
60
BST floor operation
Floor (the largest key ≤ k).
If k = the key in a node: the floor of k is k.
If k < the key in the node: the floor of k is in the left subtree.
If k > the key in the node: the floor of k is in the right subtree if there is any key ≤ k there; otherwise it is the key in the node.
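The three cases above translate directly into a recursion. A sketch with an illustrative Node type (names are not from the course code):

```java
// BST floor() sketch: largest key <= k, or null if no such key exists.
public class BSTFloor {
    static class Node {
        int key;
        Node left, right;
        Node(int k) { key = k; }
    }

    static Node insert(Node x, int k) {
        if (x == null) return new Node(k);
        if (k < x.key) x.left = insert(x.left, k);
        else if (k > x.key) x.right = insert(x.right, k);
        return x;
    }

    static Integer floor(Node x, int k) {
        if (x == null) return null;
        if (k == x.key) return k;                  // equal: the floor is k
        if (k < x.key) return floor(x.left, k);    // floor must be at left
        Integer t = floor(x.right, k);             // maybe in right subtree...
        return (t != null) ? t : x.key;            // ...otherwise this node's key
    }
}
```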
62
edge incident to two vertices
means that the edge connects those two vertices
63
spanning tree of a connected graph
subgraph that contains all of that graph's vertices and is a single tree
64
quickfind
UF implementation (enables union(p, q), find(p), connected(p, q), and count()) organized thusly: a site-indexed id[] array maintaining the invariant that p and q are connected iff id[p] == id[q]; all sites in a component have the same value in id[]. Called quick-find because find() just returns id[p], which implies that connected() just tests id[p] == id[q]. union() maintains the invariant: doing nothing if p and q are already in the same component, it otherwise scans id[], renaming every entry in p's component to id[q]. Uses 1 array access for each call to find(), but between N + 3 and 2N + 1 array accesses for each call to union() that combines two components: such a call makes 2 calls to find(), tests each of the N entries in the id[] array, and changes between 1 and N - 1 of them. A dynamic connectivity problem might require at least N - 1 calls to union() and thus at least (N + 3)(N - 1) ~ N^2 array accesses. So it runs in quadratic time in a lot of applications.
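The invariant-maintaining implementation, as a minimal sketch:

```java
public class QuickFindUF {
    private int[] id;      // id[p] = component identifier of site p
    private int count;     // number of components

    public QuickFindUF(int n) {
        id = new int[n];
        count = n;
        for (int i = 0; i < n; i++) id[i] = i;   // each site starts in its own component
    }

    public int find(int p) { return id[p]; }     // a single array access

    public boolean connected(int p, int q) { return id[p] == id[q]; }

    public int count() { return count; }

    // Rename every site in p's component to q's component id.
    public void union(int p, int q) {
        int pid = id[p], qid = id[q];
        if (pid == qid) return;                  // already in the same component
        for (int i = 0; i < id.length; i++)
            if (id[i] == pid) id[i] = qid;       // full scan: quadratic overall
        count--;
    }
}
```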
65
directed graph (digraph)
a set of vertices and a collection of directed edges, each connecting an ordered pair of vertices. Adjacency-list represented, where an edge v->w is represented as a list node containing w in the linked list corresponding to v. DFS marks all the vertices in a digraph reachable from a given set of sources in time proportional to the sum of the outdegrees of the vertices marked.
66
integrality property
corollary of the maxflow-mincut theorem when capacities are integers, there exists an integer-valued maxflow, and the Ford-Fulkerson algorithm finds it
68
cycle
a path w/ at least one edge whose first and last vertices are the same
69
linear programming
Given a set of M linear inequalities and linear equations involving N variables, and a linear objective function of the N variables, find an assignment of values to the variables that maximizes the objective function, or report that no feasible assignment exists. I have no idea what this means, but it's extremely important because many important problems reduce to it, and we have efficient algorithms for solving linear programming problems. Maxflow and shortest paths both reduce to linear programming, for example.
71
multigraphs
graphs w/ parallel edges
72
closure
* specifies any number of copies of its operand; the + sign signifies at least one copy; ? signifies 0 or 1 copy; a count or range within braces {} specifies a specific number, e.g. (AB){1-2}
73
kosaraju's algorithm
for computing strongly connected components in a digraph efficiently: 1. Given a digraph G, use DepthFirstOrder to compute the reverse postorder of its reverse, G^R. 2. Run standard DFS on G, but consider the unmarked vertices in the order just computed. The id[] array produced conveys the strongly connected components as groups w/ the same number; all vertices reached on a single call to the recursive dfs() from the constructor are in the same strong component. Uses preprocessing time and space proportional to V + E to support constant-time strong-connectivity queries in a digraph.
74
rabin-karp substring search
Basic idea = modular hashing. ・Compute a hash of pat[0..M-1]. ・For each i, compute a hash of txt[i..M+i-1]. ・If pattern hash = text substring hash, check for a match. A string of length M corresponds to an M-digit base-R number. To use a table of size Q for keys of this type, use a hash function to convert an M-digit base-R number to an int value b/t 0 and Q - 1. Modular hashing: take the remainder when dividing the number by Q (usually a random prime as large as possible while avoiding overflow). When M is large (like 100 or 1000), apply Horner's method: for each digit in the number, multiply by R, add the digit, and take the remainder when dividing by Q. However, doing this at every text position takes linear time per position, making substring search quadratic in the worst case. To avoid that problem, do it this way (rolling hashing): from the current value, subtract the leading digit, multiply by the radix, and add the new trailing digit to produce the new value. (Horner's rule is still used, but only to compute the initial M-digit hashes.) The Monte Carlo correctness version makes the hash table "size" Q huge (we're not actually building a hash table, just testing one value for a collision); using a long prime Q greater than 10^20 makes the probability of a mismatch less than 10^-20. The Las Vegas version checks for an actual substring match whenever the hashes match, and continues the search after a false collision. Usually linear and always correct.
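A Monte Carlo sketch of the rolling hash (assumptions: a fixed known prime modulus rather than a large random one, R = 256, text at least as long as the pattern, and no character-by-character verification on a hash match):

```java
public class RabinKarp {
    // Monte Carlo Rabin-Karp (sketch): returns the index of the first
    // position whose hash matches the pattern hash, or -1 if none.
    public static int search(String pat, String txt) {
        int R = 256, M = pat.length(), N = txt.length();
        long Q = 1_000_000_007L;                 // a known prime modulus (not random here)
        long RM = 1;                             // R^(M-1) mod Q, for removing the lead digit
        for (int i = 1; i < M; i++) RM = (RM * R) % Q;
        long patHash = hash(pat, M, R, Q);
        long txtHash = hash(txt, M, R, Q);
        if (patHash == txtHash) return 0;
        for (int i = M; i < N; i++) {
            // Rolling hash: remove leading digit, shift left one digit, add trailing digit.
            txtHash = (txtHash + Q - RM * txt.charAt(i - M) % Q) % Q;
            txtHash = (txtHash * R + txt.charAt(i)) % Q;
            if (patHash == txtHash) return i - M + 1;
        }
        return -1;
    }

    // Horner's method: hash of the first m characters of s.
    private static long hash(String s, int m, int R, long Q) {
        long h = 0;
        for (int j = 0; j < m; j++) h = (h * R + s.charAt(j)) % Q;
        return h;
    }
}
```

The Las Vegas variant would additionally compare the M characters on every hash match before returning.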
75
kleene's theorem
proposition that there's an NFA corresponding to any given RE, and vice versa
76
edge tail
the first vertex in a directed edge (the edge points from its tail to its head)
78
problems that reduce to sorting
- finding the median - counting distinct values - scheduling to minimize average completion time
78
array memory
24 bytes of overhead + the memory needed to store the primitive-type values / object references / arrays
78
directed path
a sequence of vertices in which there is a (directed) edge pointing from each vertex in the sequence to its successor in the sequence
80
left-leaning red-black BSTs
We represent 3-nodes as two 2-nodes connected by a single red link that leans left. We refer to BSTs that represent 2-3 trees in this way as red-black BSTs. Start with a standard BST (made up of 2-nodes) and add extra information to encode 3-nodes. Red links bind together two 2-nodes to represent a 3-node; black links bind together the 2-3 tree. 3-nodes are represented such that one of the 2-nodes is the left child of the other. Alternative definition: a BST such that: - No node has two red links connected to it. - Every path from root to null link has the same number of black links (perfect black balance). - Red links lean left. There is a 1-1 correspondence between 2-3 trees and LLRBs. A boolean added to each node indicates the color of the link pointing to it; since each node is pointed to by precisely one link (from its parent), we can encode the color of links in nodes. Search is the same as for an elementary BST (ignore color), but runs faster because of balance. Most other ops (e.g., floor, iteration, selection) are also identical. Basic strategy: maintain the 1-1 correspondence with 2-3 trees. During internal operations, maintain: ・Symmetric order. ・Perfect black balance. [but not necessarily the color invariants] How? Basic operations: rotation and color flip. Left rotation: the node the red link points to becomes the parent, and the old parent becomes its left child (adopting the rotated node's old left subtree as its right subtree). Right rotation: the same, mirrored. Color flip: recolor to split a (temporary) 4-node. Always turn the root black after each insertion. Insert into a single 2-node, or a 2-node at the bottom: - Do standard BST insert; color the new link red. - If the new red link is a right link, rotate left. Insert into a tree with two keys (a 3-node): - If the new key is larger than the two in the tree, and therefore attached to the rightmost link of the 3-node, 1) flip colors to make it valid.
- If the new key is smaller than the two in the tree, and therefore attached to the leftmost link of the 3-node, 1) rotate the top link to the right, then 2) flip colors to make it valid. - If the new key is between the two in the tree, and therefore a right-leaning node under a left-leaning node, 1) rotate the bottom link left, 2) rotate the top link to the right, and then 3) flip colors to make it valid. Insert into a 3-node at the bottom: - Do standard BST insert; color the new link red. - Rotate to balance the 4-node if needed. - Flip colors to pass the red link up one level. - Rotate to make it lean left (if necessary). - Repeat case 1 or case 2 up the tree (if needed). Same code handles all cases: - Right child red, left child black: rotate left. - Left child red, left-left grandchild red: rotate right. - Both children red: flip colors. Height of the tree is ≤ 2 lg N in the worst case, since every path from root to null link has the same number of black links and there are never two red links in a row. All symbol-table operations are guaranteed to be logarithmic in the size of the tree (except range search, which costs additional time proportional to the # of keys returned). The average length of a path from the root to a node in a red-black BST with N nodes is ~1.00 lg N.
81
standard recipe for hashing
"Standard" recipe for user-defined types. ・Combine each significant field using the 31x + y rule. ・If the field is a primitive type, use the wrapper type's hashCode(). ・If the field is null, use 0. ・If the field is a reference type, use its hashCode(). ・If the field is an array, apply the rule to each entry. In practice: the recipe works reasonably well; used in the Java libraries. In theory: keys are bitstrings; "universal" hash functions exist. Basic rule: need to use the whole key to compute the hash code; consult an expert for state-of-the-art hash codes.
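The recipe applied to a hypothetical Transaction type (the class and its field names are made up for illustration):

```java
public class Transaction {
    private final String who;      // reference type
    private final long when;       // primitive type
    private final double amount;   // primitive type

    public Transaction(String who, long when, double amount) {
        this.who = who; this.when = when; this.amount = amount;
    }

    // Standard recipe: combine each significant field with the 31x + y rule,
    // using the wrapper-type hashCode() for primitives and 0 for null fields.
    @Override
    public int hashCode() {
        int hash = 17;                                        // nonzero starting value
        hash = 31 * hash + ((who == null) ? 0 : who.hashCode());
        hash = 31 * hash + Long.hashCode(when);
        hash = 31 * hash + Double.hashCode(amount);
        return hash;
    }
}
```

Equal fields always produce equal hash codes, which is the half of the hashCode/equals contract this recipe guarantees.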
82
various structural properties of data that can be exploited to enable compression
1. small alphabets 2. long sequences of identical bits/chars 3. frequently used chars 4. long reused bit/char sequences
83
strong connectivity
two vertices in a digraph are strongly connected if they are mutually reachable: there's a directed path from v to w and from w to v. A digraph is strongly connected if all of its vertices are strongly connected to one another. Two vertices are strongly connected iff there exists a general directed cycle that contains them both. Strong connectivity is an equivalence relation (reflexive, symmetric, and transitive), and the subsets of mutually strongly connected vertices are its equivalence classes. Applications: - decide how to group subjects in a textbook - decide how to organize program modules - food web
84
3-way quicksort
How to deal with large numbers of duplicate keys? - Let v be the partitioning item a[lo]. - Scan i from left to right: ○ (a[i] < v): exchange a[lt] with a[i]; increment both lt and i ○ (a[i] > v): exchange a[gt] with a[i]; decrement gt (but not i, since the swapped-in entry is unexamined) ○ (a[i] == v): increment i Quicksort with 3-way partitioning is entropy-optimal and reduces running time from linearithmic to linear in a broad class of applications.
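Dijkstra 3-way partitioning as described above, sketched for int keys (the shuffle you'd normally do first is omitted):

```java
public class Quick3way {
    // 3-way partitioning: entries equal to the partitioning item end up
    // in place, so duplicate keys are never recursed on.
    public static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int lt = lo, gt = hi, i = lo + 1;
        int v = a[lo];                             // partitioning item
        while (i <= gt) {
            if      (a[i] < v) exch(a, lt++, i++);
            else if (a[i] > v) exch(a, i, gt--);   // a[gt] is unexamined: don't move i
            else               i++;
        }
        sort(a, lo, lt - 1);                       // a[lo..lt-1] < v
        sort(a, gt + 1, hi);                       // a[gt+1..hi] > v
    }

    public static void sort(int[] a) { sort(a, 0, a.length - 1); }

    private static void exch(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```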
85
tree
acyclic connected graph; that is, a graph with no cycles in which there exists a path from every vertex to every other vertex in the graph. For all trees: - adding an edge that connects two vertices in a tree creates a new cycle - removing an edge from a tree breaks it into two separate subtrees
86
tree depth
the number of links on the path from a node to its root
87
edge-weighted graph
a graph model where we associate weights or costs w/ each edge. Assumptions: the graph is connected; edge weights are not necessarily distances; edge weights may be zero or negative; edge weights are all different. Represented with an EdgeWeightedGraph with operations V() (# of vertices), E() (edge count), addEdge(), adj(int v) (edges incident to v), edges() (all of this graph's edges), etc., consisting of Edge objects supporting weight(), int either() (either of this edge's vertices), other(int v) (the other vertex), compareTo(), etc. Implemented as linked lists of links to objects containing a pair of int values and a double value.
88
edge head
the second vertex in a directed edge (the vertex the edge points to)
90
binary heaps
A binary heap is a collection of keys arranged in a complete heap-ordered binary tree, represented in level order in an array (not using the first entry). Children of the node at position k are in positions 2k and 2k + 1; can travel up and down the tree w/ simple arithmetic on array indices. Insert: add node at end, then swim it up. - Takes no more than 1 + lg N compares - Proportional to the length of the path between the root and the bottom of the heap Remove the maximum: exchange the root with the node at the end, then sink it down. - No more than 2 lg N compares: two compares for each node on the path (except at the bottom), one to find the child with the larger key, the other to decide whether that child needs to be promoted. Not cache friendly. Multiway heaps: can modify the code to instead represent ternary or other-width trees. - The trade-off is between the lower cost of reduced tree height (log_d N) and the higher cost of finding the largest of the d children at each node. Floyd's sink-to-bottom trick: - Sink the key at the root all the way to the bottom. - Swim the key back up. - Fewer compares; more exchanges. - Worthwhile depending on the cost of a compare vs. an exchange.
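A minimal array-based max-heap sketch with swim and sink (int keys, fixed capacity, no resizing or error checks):

```java
public class MaxPQ {
    private int[] pq;     // heap-ordered complete binary tree in pq[1..n]
    private int n = 0;    // pq[0] is unused

    public MaxPQ(int capacity) { pq = new int[capacity + 1]; }

    public void insert(int x) {
        pq[++n] = x;                   // add node at end...
        swim(n);                       // ...then swim it up
    }

    public int delMax() {
        int max = pq[1];
        exch(1, n--);                  // exchange root with node at end...
        sink(1);                       // ...then sink it down
        return max;
    }

    private void swim(int k) {
        while (k > 1 && pq[k / 2] < pq[k]) { exch(k / 2, k); k = k / 2; }
    }

    private void sink(int k) {
        while (2 * k <= n) {
            int j = 2 * k;                        // children at 2k and 2k+1
            if (j < n && pq[j] < pq[j + 1]) j++;  // pick the larger child
            if (pq[k] >= pq[j]) break;
            exch(k, j); k = j;
        }
    }

    private void exch(int i, int j) { int t = pq[i]; pq[i] = pq[j]; pq[j] = t; }
}
```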
91
quicksort
general purpose, especially when space is tight. Basic plan: - Shuffle the array. - Partition so that, for some j, ○ entry a[j] is in place ○ no larger entry is to the left of j ○ no smaller entry is to the right of j - Sort each subarray recursively. Partition: repeat until the i and j pointers cross. - Scan i from left to right so long as (a[i] < a[lo]). - Scan j from right to left so long as (a[j] > a[lo]). - Exchange a[i] with a[j]. When the pointers cross: - Exchange a[lo] with a[j]. Performance: the inner loop increments an index and compares an array entry against a fixed value, so it's hard to envision a shorter inner loop. - Average case is ~2N ln N compares (close to 1.39 N lg N) and ~(1/3)N ln N exchanges. - Worst case is ~N^2/2 compares. Improvements: - Quicksort is slower than insertion sort for tiny arrays, so cut off to insertion sort for small subarrays. - Sample items from the subarray and partition on their median to improve partitioning a little (median-of-three partitioning). - Use entropy-optimal (3-way) partitioning when duplicates abound. Implementation details: - Partitioning in place. Using an extra array makes partitioning easier (and stable), but is not worth the cost. - Terminating the loop. Testing whether the pointers cross is tricky. - Equal keys. When duplicates are present, it is better to stop scans on keys equal to the partitioning item's key. ○ Otherwise the program might take quadratic time. - Preserving randomness. Shuffling is needed for the performance guarantee. - Equivalent alternative: pick a random partitioning item in each subarray.
92
kruskal's algorithm
Prim's algorithm builds the MST one edge at a time, finding a new edge to attach to a single growing tree at each step. Kruskal's algorithm also builds the MST one edge at a time; but, by contrast, it finds an edge that connects two trees in a forest of growing trees. We start w/ a degenerate forest of V single-vertex trees and perform the operation of combining two trees (using the shortest edge possible) until there is just one tree left: the MST. Uses space proportional to E and time proportional to E log E. Consider edges in ascending order of weight. ・Add next edge to tree T unless doing so would create a cycle.
93
vertex degree
number of edges incident to a vertex
94
simple graphs
graphs w/ no parallel edges or self-loops
96
vertex indegree
the number of edges pointing into a vertex
97
sequential search
for symbol tables 1. Maintain an (unordered) linked list of key-value pairs. 2. Search: scan through all keys until find a match. 3. Insert: scan through all keys until find a match; if no match, add to front. - Linear time for both in worst case; N/2 (for Search) and N (for insert) in average case. Not good enough, man. - Inserting N distinct keys into an empty Symbol table uses ~N^2/2 compares.
99
linked list memory
1. 16 bytes overhead 2. 8 bytes extra overhead 3. amount of memory used by each reference / instance variable.
100
graph
set of vertices and collection of edges that each connect a pair of vertices
101
brute force substring search
substring search by checking each possible position in the text at which the pattern could match - one pointer i into the text and one pointer j into the pattern. Requires ~NM character compares to search for a pattern of length M in a text of length N in the worst case (worst case is a pattern and text of all As followed by a B: for each of the N - M + 1 possible match positions, all M pattern characters are checked against the text, for a total cost of M(N - M + 1) ~ NM, since M is normally small relative to N).
103
knuth-morris-pratt algorithm
for substring search; preferred when backup is inefficient (we can do much better when backup is easy). Whenever we detect a mismatch, we already know some of the chars in the text; we can take advantage of this info. i (the pointer scanning the text) is never backed up; only j (moving through the pattern) is. *** An array dfa[][] is used to record how far to back up the pattern pointer when a mismatch is detected. During the search, dfa[txt.charAt(i)][j] is the pattern position to compare w/ txt.charAt(i + 1) after comparing txt.charAt(i) w/ pat.charAt(j). dfa[pat.charAt(j)][j] is always j + 1, since on a match we just want to move on to the next character. The backup in general is the length of the maximum overlap of the beginning of the pattern w/ the known text characters. *** During the search, when i and j point to mismatching characters, the next possible position for a pattern match begins at position i - dfa[txt.charAt(i)][j]. Accesses no more than M + N chars to search for a pattern of length M in a text of length N, because we access each pattern char once when computing dfa[][] and each text char once (in the worst case) in search().
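A sketch of the DFA construction and search (assuming an extended-ASCII alphabet, R = 256):

```java
public class KMP {
    // KMP substring search: build the DFA from the pattern, then scan
    // the text without ever backing up i. Returns first match index or -1.
    public static int search(String pat, String txt) {
        int R = 256, M = pat.length(), N = txt.length();
        int[][] dfa = new int[R][M];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < M; j++) {
            for (int c = 0; c < R; c++)
                dfa[c][j] = dfa[c][x];           // mismatch: copy from restart state x
            dfa[pat.charAt(j)][j] = j + 1;       // match: advance to the next state
            x = dfa[pat.charAt(j)][x];           // update the restart state
        }
        int i, j;
        for (i = 0, j = 0; i < N && j < M; i++)
            j = dfa[txt.charAt(i)][j];           // i never backs up
        return (j == M) ? i - M : -1;            // reached accept state, or ran out of text
    }
}
```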
104
properties of shortest paths algorithms
paths are directed; the weights are not necessarily distances; not all vertices need be reachable; negative weights introduce complications; shortest paths are normally simple; shortest paths are not necessarily unique; parallel edges and self-loops may be present. Data structures: ・distTo[v] is the length of the shortest path from s to v. ・edgeTo[v] is the last edge on the shortest path from s to v. For shortest-paths implementations, start w/ distTo[source] = 0 and the rest = Double.POSITIVE_INFINITY. General algorithm: initialize distTo[s] = 0 and distTo[v] = ∞ for all other vertices, then repeat until the optimality conditions are satisfied: - Relax any edge. As an algorithm proceeds, it gathers information about the shortest paths that connect the source to each vertex encountered in our edgeTo[] and distTo[] data structures; by updating this information when we encounter edges, we can make new inferences about shortest paths. Specifically, we use edge relaxation, defined as follows: to relax an edge v->w means to - test whether the best known way from s to w is to go from s to v and then take the edge from v to w, and, - if so, update our data structures to indicate that to be the case.
106
forest
disjoint set of trees (acyclic connected graphs)
107
path compression
In weighted quick union, making all examined nodes during find() operation point directly to the root makes this nearly constant time amortized. (an extra loop in the find() operation that sets the id[] entry corresponding to each node encountered along the way to link directly to the root).
109
vertex outdegree
the number of edges pointing out of a vertex
110
how to iterate through all keys in a trie in sorted order
do an inorder traversal of the trie, adding keys encountered to a queue; maintain the sequence of chars on the path from the root to the current node
112
ways to avoid ambiguity in a code
1. fixed-length code 2. append special stop char to each codeword 3. general prefix-free code
113
arbitrage problem
is a negative-cycle detection problem in edge-weighted digraphs replace each weight by its logarithm, negated. W/ this change, computing path weights by multiplying edge weights in the original problem corresponds to adding them in the transformed problem. The transformed edge weights might be negative or positive, a path from v to w gives a way of converting from currency v to currency w and any negative cycle is an arbitrage opportunity.
114
insertion sort
- In iteration i, swap a[i] with each larger entry to its left. - Items to the left are in sorted order during the sort, but not necessarily in final position. - Fully sorted once the index reaches the right end. - Steps: ○ Move the pointer to the right. ○ Moving from right to left, exchange a[i] with each larger entry to its left. - Performance: ○ The number of compares is the number of exchanges plus an additional term equal to N minus the number of times the item inserted is the smallest so far. ○ In the worst case, this is ~N^2/2 compares and ~N^2/2 exchanges. ○ In the best case, this is N - 1 compares and 0 exchanges. ○ On average, this is ~N^2/4 compares and ~N^2/4 exchanges. ○ Best case is an array already in order; worst case is reverse order. ○ The number of exchanges used by insertion sort is equal to the number of inversions in the array. ○ The number of compares is at least the number of inversions and at most the number of inversions plus the array size - 1. ○ For partially sorted arrays (number of inversions < cN), insertion sort runs in linear time. - Improvements: ○ Shorten the inner loop to shift larger entries one position to the right rather than doing full exchanges (half exchanges). ○ Binary insertion sort: use binary search to find the insertion point. Makes compares ~N lg N, but still a quadratic # of array accesses. Liked because: - effective w/ partially sorted and tiny arrays
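The basic method, as a sketch (full exchanges; the half-exchange and binary-search improvements are omitted):

```java
public class Insertion {
    // In iteration i, swap a[i] leftward past each larger entry.
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++)
            for (int j = i; j > 0 && a[j] < a[j - 1]; j--) {
                int t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;  // exchange with left neighbor
            }
    }
}
```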
115
search problem
a search problem is a problem having solutions w/ the property that the time needed to certify that any solution is correct is bounded by a polynomial in the size of the input An algorithm solves a search problem if, given any input, it either produces a solution or reports that none exists
117
dijkstra's algorithm
1. initialize distTo[s] to 0 and all other distTo[] entries to positive infinity 2. relax and add to the tree a non-tree vertex w/ the lowest distTo[] value, continuing until all vertices are on the tree or no non-tree vertex has a finite distTo[] value. Along with the distTo[] and edgeTo[] data structures, includes an indexed priority queue pq to keep track of vertices that are candidates for being the next to be relaxed. Uses space proportional to V and time proportional to E log V to compute the SPT rooted at a given source in an edge-weighted digraph w/ E edges, V vertices, and nonnegative weights.
118
object memory
1. overhead of 16 bytes 2. padding of memory to make it a multiple of 8 3. amount of memory used by each instance variable / reference
119
char memory
2
120
MSD radix sort
good for random strings. MSD string (radix) sort: ・Partition the array into R pieces according to the first character (use key-indexed counting). ・Recursively sort all strings that start with each character (the key-indexed counts delineate the subarrays to sort). not sure i get it. A switch to insertion sort for small subarrays is a must for MSD, since each sort involves initializing all entries of the count array to 0 and transforming them into indices (many recursive sorts = huge cost); the effect of the cutoff is dramatic. Also relatively slow for subarrays containing large numbers of equal keys; the worst case is when all keys are equal. Uses two auxiliary arrays to do the partitioning: a temporary array for distributing keys, aux[], and an array that holds the counts that are transformed into partition indices, count[]. For random inputs, examines just enough chars to distinguish among the keys, and then the running time is sublinear in the # of chars in the data. For nonrandom input, could still be sublinear but might need to examine more characters than in the random case, depending on the data; nearly linear when a significant # of equal keys is present. In the worst case (all strings equal), examines all the chars in the keys and the running time is linear in the # of chars. Examines about N log_R N chars on average; between 8N + 3R and ~7wN + 3wR array accesses to sort N strings taken from an R-character alphabet, where w is the average string length. Space is proportional to R times the length of the longest string, plus N, in the worst case.
121
run-length coding
taking advantage of a bitstream w/ long runs of repeated bits by specifically encoding the lengths of runs. Real simple.
122
BST rank operation
Rank. How many keys < k? Recursive algorithm: - If the given key equals the key at the root, return the # of keys in the left subtree. - If it's < the key at the root, return the rank (recursively computed) of the key in the left subtree. - If it's > the key at the root, return the # of keys in the left subtree + 1 (for the key at the root) + the rank (recursively computed) of the key in the right subtree.
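The rank rule, sketched on a bare-bones BST that maintains subtree sizes (Node and put() are illustrative helpers, not the book's code):

```java
public class RankDemo {
    public static class Node {
        public int key, size = 1;     // size = # of nodes in subtree rooted here
        public Node left, right;
        public Node(int key) { this.key = key; }
    }

    public static int size(Node x) { return (x == null) ? 0 : x.size; }

    // Rank: number of keys in the tree strictly less than k.
    public static int rank(Node x, int k) {
        if (x == null) return 0;
        if (k == x.key) return size(x.left);
        if (k < x.key)  return rank(x.left, k);
        return 1 + size(x.left) + rank(x.right, k);  // root and left subtree are all < k
    }

    // Standard BST insert that keeps subtree sizes up to date.
    public static Node put(Node x, int key) {
        if (x == null) return new Node(key);
        if (key < x.key) x.left = put(x.left, key);
        else if (key > x.key) x.right = put(x.right, key);
        x.size = 1 + size(x.left) + size(x.right);
        return x;
    }
}
```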
123
float memory
4
124
selection sort
Selection Sort - In iteration i, find the index min of the smallest remaining entry and swap a[i] and a[min]. - Invariants: ○ Entries to the left of ↑ (including ↑) are fixed and in order. ○ No entry to the right of ↑ is smaller than any entry to the left of ↑. ○ Takes just as long for a sorted array as for an unsorted array. ○ The number of exchanges is a linear function of array size. - Steps: 1. Move the pointer to the right 2. Identify the index of the minimum entry to its right 3. Exchange it into position - Performance: ○ Invariably uses N(N - 1)/2 ~ N^2/2 compares and N exchanges
125
problems that reduce to shortest-path problems
single-source shortest path in undirected graphs parallel precedence-constrained scheduling arbitrage
126
directed cycle
a directed path w/ at least one edge whose first and last vertices are the same. Important for resolving scheduling problems (precedence-constrained scheduling): topological sort puts all the vertices in an order such that every directed edge points from a vertex earlier in the order to a vertex later in the order, which is possible only when there are no directed cycles.
127
breadth-first search
Perfect for finding the shortest possible path from source s to target v. Maintain a queue of all vertices that have been marked but whose adjacency lists have not been checked. Mark the source vertex and put it on the queue, then perform the following steps until the queue is empty: - take the next vertex v from the queue - put onto the queue all unmarked vertices that are adjacent to v, and mark them Result is an edgeTo[] array, a parent-linked representation of a tree rooted at s, which defines the shortest paths from s to every vertex connected to s. BFS takes time proportional to V + E in the worst case: it marks all the vertices connected to s in time proportional to the sum of their degrees and, if the graph is connected, this sum is the sum of the degrees of all the vertices, or 2E.
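A sketch of BFS over an adjacency-list graph given as a plain int[][] (vertex v's neighbors in adj[v]; a real implementation would use a Graph type):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class BFSPaths {
    // BFS from source s; returns edgeTo[], the parent-link representation
    // of the shortest-path tree rooted at s.
    public static int[] bfs(int[][] adj, int s) {
        int V = adj.length;
        boolean[] marked = new boolean[V];
        int[] edgeTo = new int[V];
        Queue<Integer> queue = new ArrayDeque<>();
        marked[s] = true;                      // mark the source and enqueue it
        queue.add(s);
        while (!queue.isEmpty()) {
            int v = queue.remove();            // next vertex whose list is unchecked
            for (int w : adj[v]) {
                if (!marked[w]) {              // first visit = shortest path found
                    marked[w] = true;
                    edgeTo[w] = v;
                    queue.add(w);
                }
            }
        }
        return edgeTo;
    }
}
```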
128
st-flow network
a flow network w/ 2 identified vertices, a source s and a sink t
129
complement
enclosed in brackets, preceded by ^; matches any character but one of those characters
130
feasible flow
condition where no edge's flow is greater than the edge's capacity and the local equilibrium condition that every vertex's netflow is zero (except s and t)
131
bellman-ford algorithm
Initialize distTo[s] = 0 and distTo[v] = ∞ for all other vertices. Repeat V times: - Relax each edge. Takes time proportional to EV and extra space proportional to V. Solves the single-source shortest-paths problem from a given source s for any edge-weighted digraph w/ V vertices and no negative cycles reachable from s. Can improve by determining a priori that numerous edges cannot lead to a successful relaxation in any given pass: the only edges that COULD change are those leaving a vertex whose distTo[] value changed in the previous pass. A FIFO queue keeps track of those vertices.
132
cut
a cut of a graph is a partition of its vertices into two nonempty disjoint sets
133
NP
the set of all search problems. N stands for nondeterminism; it represents the idea that one way to extend the power of a computer is to endow it w/ the power of nondeterminism: to assert that when an algorithm is faced w/ a choice of several options, it has the power to "guess" the right one. No one has been able to prove that nondeterminism helps for any particular problem; no one has found a problem that can be proven to be in NP but not in P.
134
bottom-up mergesort
- Organize the merges so that we do all the merges of tiny subarrays in one pass, and then do a second pass to merge those subarrays in pairs, and so forth, continuing until we do a merge that encompasses the whole array. - about 10% slower than recursive, top-down mergesort on typical systems because of caching
135
parallel edges
two edges that connect the same pair of vertices
136
merge sort
Basic plan. ・Divide the array into two halves. ・Recursively sort each half. ・Merge the two halves. Performance: - Uses between ½ N lg N and N lg N compares to sort any array of length N. - Top-down mergesort uses at most 6N lg N array accesses to sort an array of length N. ○ Each merge uses at most 6N array accesses: 2N for the copy, 2N for the move back, and at most 2N for the compares [and there are lg N levels of merges]. - The number of compares for the merge is at least N/2. - Mergesort uses extra space proportional to N. Improvements: - Use insertion sort for small subarrays. - Stop if already sorted (skip the merge when the halves are already in order). - Switch the roles of the input array and the auxiliary array at each level to eliminate the time required to copy to the auxiliary array for merging.
138
spanning forest of a graph
the union of spanning trees of its connected components
140
2-3 trees
A 2-3 search tree is a tree that is either empty or - A 2-node, with one key (and associated value) and two links, a left link to a 2-3 search tree with smaller keys, and a right link to a 2-3 search tree with larger keys - A 3-node, with two keys (and associated values) and three links, a left link to a 2-3 search tree with smaller keys, a middle link to a 2-3 search tree with keys between the node's keys, and a right link to a 2-3 search tree with larger keys - As usual, we refer to a link to an empty tree as a null link. Search directly generalizes that for BSTs: - Compare the search key against the keys in the node. - Find the interval containing the search key. - Follow the associated link (recursively). Insert into a 2-node: if the search terminates at a 2-node, replace it with a 3-node. Insert into a 3-node: - Add the new key to the 3-node to create a temporary 4-node - Move the middle key in the 4-node into the parent - Repeat up the tree as necessary - If you reach the root and it's a 4-node, split it into three 2-nodes Splitting a 4-node is a local transformation: a constant number of operations. Each transformation maintains symmetric order and perfect balance. Search and insert operations in a 2-3 tree with N keys are guaranteed to visit at most lg N nodes. Proof: the height of an N-node 2-3 tree is between ⎣log3 N⎦ = ⎣(lg N)/(lg 3)⎦ (if the tree is all 3-nodes) and ⎣lg N⎦ (if the tree is all 2-nodes). Direct implementation is complicated, because: ・Maintaining multiple node types is cumbersome. ・Need multiple compares to move down the tree. ・Need to move back up the tree to split 4-nodes. ・Large number of cases for splitting.
141
wildcard
. represents any single character
142
boolean memory
1
144
key-indexed counting
string sort algorithm effective whenever keys are small integers 1. Compute frequency counts - count the frequency of occurrence of each key value, using an int array count[]. For each item with key r, increment count[r + 1]; the offset by one is what lets the next step turn counts into starting indices. 2. Transform counts to indices - use count[] to compute, for each key value, the starting index position in the sorted order of items w/ that key, obtained by summing the frequency counts of the smaller values. 3. Distribute the data - use the index table created from the count[] array to move the items into an auxiliary array aux[]. 4. Copy the sorted result back to the original array. Uses 8N + 3R + 1 array accesses to stably sort N items whose keys are integers b/t 0 and R - 1. Evades the sorting lower bound by not performing compares.
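The four steps, sketched for plain int keys (sorting the keys themselves rather than items carrying keys):

```java
public class KeyIndexed {
    // Key-indexed counting (sketch): stably sort a[] whose keys are ints in [0, R).
    public static void sort(int[] a, int R) {
        int N = a.length;
        int[] aux = new int[N];
        int[] count = new int[R + 1];
        for (int i = 0; i < N; i++) count[a[i] + 1]++;          // 1. frequency counts, offset by 1
        for (int r = 0; r < R; r++) count[r + 1] += count[r];   // 2. counts -> starting indices
        for (int i = 0; i < N; i++) aux[count[a[i]]++] = a[i];  // 3. distribute, stably
        for (int i = 0; i < N; i++) a[i] = aux[i];              // 4. copy back
    }
}
```

Because step 3 scans left to right and each index counter only grows, equal keys keep their relative order, which is the stability MSD string sort depends on.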
145
mincut
the minimum st-cut: given an st-network, find an st-cut such that the capacity of no other cut is smaller. For any st-flow, the flow across each st-cut = the value of the flow. The outflow from s is equal to the inflow to t. No st-flow's value can exceed the capacity of any st-cut.
146
negative cycle
a directed cycle whose total weight is negative. There exists a shortest path from s to v in an edge-weighted digraph iff - there exists at least one directed path from s to v, and - no vertex on any directed path from s to v is on a negative cycle. Vastly complicates the shortest-paths problem (improperly defined in their presence) and even the shortest simple path problem (for which only exponential-time algorithms are known).
147
ford-fulkerson algorithm
AKA the augmenting-path algorithm. For any path w/ no full forward edges and no empty backward edges, we can increase the amount of flow in the network by increasing flow in forward edges and decreasing flow in backward edges. The amount by which the flow can be increased is limited by the min of the unused capacities in the forward edges and the flows in the backward edges. Such a path is called an augmenting path. Start w/ 0 flow everywhere. Increase the flow along any augmenting path from source to sink (w/ no full forward edges or empty backward edges), continuing until there are no such paths in the network. In a shortest-augmenting-path implementation, the # of augmenting paths needed for a flow network w/ V vertices and E edges is at most EV/2, and the algorithm takes time proportional to VE^2/2.
148
boolean satisfiability
the first proven NP-complete problem the problem of determining if there exists an interpretation that satisfies a given Boolean formula. In other words, it asks whether the variables of a given Boolean formula can be assigned values in such a way as to make the formula evaluate to TRUE. If no such assignment exists, the function expressed by the formula is identically FALSE for all possible variable assignments; in that case the formula is called unsatisfiable, otherwise satisfiable. For example, the formula "a AND NOT b" is satisfiable because one can find the values a = TRUE and b = FALSE, which make (a AND NOT b) = TRUE. In contrast, "a AND NOT a" is unsatisfiable.
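The two examples above can be checked by brute force over all 2^n assignments; a minimal sketch (class and method names are illustrative, and the exponential loop is exactly why SAT is believed hard in general):

```java
import java.util.function.Predicate;

// Minimal brute-force satisfiability check over n boolean variables.
public class BruteForceSat {
    public static boolean isSatisfiable(int n, Predicate<boolean[]> formula) {
        // Try all 2^n assignments (bit i of mask = value of variable i).
        for (int mask = 0; mask < (1 << n); mask++) {
            boolean[] assign = new boolean[n];
            for (int i = 0; i < n; i++) assign[i] = ((mask >> i) & 1) == 1;
            if (formula.test(assign)) return true;   // satisfying assignment found
        }
        return false;   // identically FALSE: unsatisfiable
    }
}
```

For example, `isSatisfiable(2, v -> v[0] && !v[1])` is true, while `isSatisfiable(1, v -> v[0] && !v[0])` is false.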
149
inflow
the total flow into a vertex (the sum of the flows on its incoming edges)
150
BSTs
Representation:
- A private nested class defining Nodes; each contains a key, a value, a left link, a right link, and a node count (# of nodes in the subtree rooted here)
- get(): go down the tree according to the key of interest; the number of compares is 1 + depth of the node
- put(): search for the key, then two cases:
  - key in tree -> reset its value
  - key not in tree -> add a new node
  - still 1 + depth of node compares
- Tree shape depends on the order of insertion:
  - if N keys are inserted in random order, the expected number of compares for insert/search is ~2 ln N
  - the expected height is ~4.311 ln N
  - worst-case height is N
All operations take time proportional to the height of the tree in the worst case; all of the methods go down one or two paths in the tree. Inorder traversal involves printing left subtrees first, then roots, then right subtrees.
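A minimal sketch of the representation and the two operations described above, specialized to String keys and Integer values rather than the book's generic Key/Value version:

```java
// Minimal BST symbol table sketch (String keys, Integer values).
public class BST {
    private Node root;
    private class Node {
        String key; Integer val; Node left, right;
        int count;   // # of nodes in the subtree rooted here
        Node(String key, Integer val) { this.key = key; this.val = val; this.count = 1; }
    }
    public Integer get(String key) {
        Node x = root;
        while (x != null) {                 // go down the tree by key compares
            int cmp = key.compareTo(x.key);
            if (cmp < 0) x = x.left;
            else if (cmp > 0) x = x.right;
            else return x.val;              // search hit
        }
        return null;                        // search miss
    }
    public void put(String key, Integer val) { root = put(root, key, val); }
    private Node put(Node x, String key, Integer val) {
        if (x == null) return new Node(key, val);   // key not in tree: new node
        int cmp = key.compareTo(x.key);
        if (cmp < 0) x.left = put(x.left, key, val);
        else if (cmp > 0) x.right = put(x.right, key, val);
        else x.val = val;                           // key in tree: reset value
        x.count = 1 + size(x.left) + size(x.right);
        return x;
    }
    private int size(Node x) { return x == null ? 0 : x.count; }
    public int size() { return size(root); }
}
```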
151
st-flow
a set of nonnegative values associated w/ each edge, called edge flows
152
path in a graph
sequence of vertices connected by edges
153
critical path method for parallel scheduling
1. create an edge-weighted DAG w/ a source s, a sink t, and two vertices for each job (a start vertex and an end vertex)
2. for each job, add an edge from its start vertex to its end vertex w/ weight equal to its duration
3. for each precedence constraint v-w, add a zero-weight edge from the end vertex corresponding to v to the start vertex corresponding to w
4. add zero-weight edges from the source to each job's start vertex and from each job's end vertex to the sink
5. schedule each job at the time given by the length of its longest path from the source
works in linear time. Intuition: the longest path from the source to a job's start vertex is the earliest moment by which every chain of its prerequisites can possibly finish, so starting the job then is both feasible and as early as possible.
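The core of step 5 is computing longest-path distances in an edge-weighted DAG, done by relaxing vertices in topological order. A minimal sketch with a hypothetical {from, to, weight} edge format and int weights (the book uses an AcyclicLP class over its digraph types):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Minimal sketch: longest-path distances from s in an edge-weighted DAG.
public class DagLongestPaths {
    public static long[] longestPaths(int V, int[][] edges, int s) {
        List<List<int[]>> adj = new ArrayList<>();
        int[] indeg = new int[V];
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) { adj.get(e[0]).add(e); indeg[e[1]]++; }
        // Kahn's algorithm for a topological order (valid because it's a DAG).
        Deque<Integer> q = new ArrayDeque<>();
        for (int v = 0; v < V; v++) if (indeg[v] == 0) q.add(v);
        int[] topo = new int[V]; int k = 0;
        while (!q.isEmpty()) {
            int v = q.remove(); topo[k++] = v;
            for (int[] e : adj.get(v)) if (--indeg[e[1]] == 0) q.add(e[1]);
        }
        long NEG = Long.MIN_VALUE / 4;
        long[] dist = new long[V];
        Arrays.fill(dist, NEG);
        dist[s] = 0;
        // Relax each edge once, in topological order of its source vertex;
        // for longest paths we keep the maximum instead of the minimum.
        for (int v : topo)
            if (dist[v] != NEG)
                for (int[] e : adj.get(v))
                    if (dist[v] + e[2] > dist[e[1]]) dist[e[1]] = dist[v] + e[2];
        return dist;
    }
}
```

Each edge is relaxed exactly once, which is why the whole schedule comes out in linear time.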
154
huffman expansion steps
1. read the trie (encoded at the start of the bitstream) 2. read the count of chars to be decoded 3. use the trie to decode the bitstream
155
ternary search tries
store characters and values in nodes (not keys); each node has 3 children: smaller (left), equal (middle), and larger (right). Search: follow links corresponding to each char in the key. If less, take the left link. If greater, take the right link. If equal, take the middle link and move to the next key char. A TST node has five fields: a value, a char, and 3 references. ~L + ln N for a search hit; ~ln N for a search miss; ~L + ln N for an insert. 4N space. TST is about as fast as hashing for string keys, and space efficient.
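A minimal sketch of the five-field node and the search/insert logic described above, specialized to String keys and Integer values (keys assumed non-empty, as in the book's TST):

```java
// Minimal ternary search trie sketch (String keys, Integer values).
public class TST {
    private Node root;
    private class Node {
        char c;                    // the character stored in this node
        Node left, mid, right;     // smaller / equal / larger links
        Integer val;               // non-null iff a key ends here
    }
    public Integer get(String key) {
        Node x = get(root, key, 0);
        return x == null ? null : x.val;
    }
    private Node get(Node x, String key, int d) {
        if (x == null) return null;
        char c = key.charAt(d);
        if (c < x.c) return get(x.left, key, d);             // smaller: go left
        else if (c > x.c) return get(x.right, key, d);       // larger: go right
        else if (d < key.length() - 1) return get(x.mid, key, d + 1); // equal: middle, next char
        else return x;
    }
    public void put(String key, Integer val) { root = put(root, key, val, 0); }
    private Node put(Node x, String key, Integer val, int d) {
        char c = key.charAt(d);
        if (x == null) { x = new Node(); x.c = c; }
        if (c < x.c) x.left = put(x.left, key, val, d);
        else if (c > x.c) x.right = put(x.right, key, val, d);
        else if (d < key.length() - 1) x.mid = put(x.mid, key, val, d + 1);
        else x.val = val;
        return x;
    }
}
```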
156
introsort
IntroSort
- Run quicksort.
- Cut off to heapsort if the recursion stack depth exceeds 2 lg N (avoids quicksort's quadratic worst case).
- Cut off to insertion sort for small subarrays (N = 16).
157
tree height
the maximum depth among a tree's nodes
158
self-loop
edge that connects a vertex to itself