More graph Algos Flashcards
Forest
is an undirected, acyclic graph (not necessarily connected). In other words, a disjoint collection of trees
Kruskal’s algo for a connected (+ edge weighted + undirected) graph does what?
Finds a minimum spanning tree
Is Kruskal's greedy or non-greedy?
Greedy
What is a spanning tree?
(Pertains to undirected graphs.) A subgraph that is a tree and includes all of the vertices of the graph
Prim’s algo benefits vs other MST algos
for graphs that are sufficiently dense, Prim’s algorithm can be made to run in linear time, meeting or improving the time bounds for other algorithms.
List the three basic greedy algorithms to find an MST
Prim’s (Jarnik’s), Kruskal’s, Borůvka’s (Sollin’s) algorithms
Kruskal's (pseudocode)
create a forest F (a set of trees), where each vertex in the graph is a separate tree
create a set S containing all the edges in the graph
while S is nonempty and F is not yet spanning
remove an edge with minimum weight from S
if the removed edge connects two different trees then add it to the forest F, combining two trees into a single tree
At the termination of the algorithm, the forest forms a minimum spanning forest of the graph. If the graph is connected, the forest has a single component and forms a minimum spanning tree.
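A rough TypeScript sketch of the above; the Edge shape and the simple union-find helper here are illustrative, not from the card:
type Edge = { u: number; v: number; weight: number }
function kruskal(numVertices: number, edges: Edge[]): Edge[] {
  // each vertex starts as its own tree (the forest F)
  const parent = Array.from({ length: numVertices }, (_, i) => i)
  const find = (x: number): number => (parent[x] === x ? x : (parent[x] = find(parent[x])))
  const mst: Edge[] = []
  // "remove an edge with minimum weight from S" = scan the edges in ascending weight order
  for (const e of [...edges].sort((a, b) => a.weight - b.weight)) {
    const ru = find(e.u), rv = find(e.v)
    if (ru !== rv) { // the edge connects two different trees
      parent[ru] = rv // combine the two trees into one
      mst.push(e)
    }
  }
  return mst // a minimum spanning forest; a minimum spanning tree if the graph is connected
}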
Topological sort definition
a linear ordering of a graph’s vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering.
Canonical use case of topological sort
in scheduling a sequence of jobs or tasks based on their dependencies. The jobs are represented by vertices, and there is an edge from x to y if job x must be completed before job y can be started (for example, when washing clothes, the washing machine must finish before we put the clothes in the dryer)
List names of topological sort implementations
Kahn, Tarjan (DFS), adjacency matrix squaring on a PRAM, parallelized Kahn on a distributed memory machine
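A rough sketch of Kahn's approach in TypeScript, assuming the graph is given as an adjacency list where adj[u] lists the vertices v with an edge u -> v (names are illustrative):
function kahnTopoSort(adj: number[][]): number[] | null {
  const n = adj.length
  const indegree = new Array(n).fill(0)
  for (const neighbors of adj) for (const v of neighbors) indegree[v]++
  // start with every vertex that has no incoming edges
  const queue: number[] = []
  for (let u = 0; u < n; u++) if (indegree[u] === 0) queue.push(u)
  const order: number[] = []
  while (queue.length > 0) {
    const u = queue.shift()!
    order.push(u)
    for (const v of adj[u]) if (--indegree[v] === 0) queue.push(v)
  }
  // emitting fewer than n vertices means there is a cycle, so no topological order exists
  return order.length === n ? order : null
}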
High level: how to quickly compute shortest paths through a weighted directed acyclic graph?
Use topological sort
Low level implementation of finding shortest paths on directed acyclic weighted graph
…
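One common approach, sketched here under assumptions (the { to, weight } adjacency shape is illustrative): topologically sort the vertices (e.g., with the Kahn sketch above), then relax each vertex's outgoing edges in that order, giving O(V + E) overall.
type WeightedEdge = { to: number; weight: number }
function dagShortestPaths(adj: WeightedEdge[][], topoOrder: number[], source: number): number[] {
  const dist = new Array(adj.length).fill(Infinity)
  dist[source] = 0
  for (const u of topoOrder) {
    if (dist[u] === Infinity) continue // not reachable from the source
    for (const { to, weight } of adj[u]) {
      if (dist[u] + weight < dist[to]) dist[to] = dist[u] + weight // relax each edge exactly once
    }
  }
  return dist
}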
When is a topological ordering impossible?
When there are cycles
DAG
Directed acyclic graph
Known Time complexity of finding a topological sorting of a DAG
Linear: O(|V| + |E|)
FAS stands for
Feedback Arc Set
Feedback Arc set is aka
Feedback Edge Set
Feedback arc set
set of edges which, when removed from the graph, leaves an acyclic graph. Put another way, it is a set containing at least one edge of every cycle in the graph.
What algorithm do you use to find shortest path from a source to all vertices?
Dijkstra's
Algo name for shortest path from every vertex to every other vertex
Floyd-Warshall
Memory complexity of sum of length of all adjacency lists for directed graph vs undirected graph
|E| vs 2|E|
For both directed and undirected graphs the memory complexity of the adjacency list representation is
O(V + E)
disadvantage of adjacency-list representation for edges
there is no quicker way to determine if a given edge (u, v) is present in the graph than to search for v in the adjacency list Adj[u], which in the worst case is O(V) if u is connected to all vertices
List and compare two alternatives to adjacency lists to represent edges
1) have Adj[u] be a hash table containing the vertices v for which (u, v) ∈ E. Expected lookup time is O(1); worst case is O(V) (if you get a lot of hash collisions, but is that really even likely?). 2) have Adj[u] be an array that you keep sorted and perform binary search on during lookup, for an expected and worst-case time of O(lg V).
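A tiny sketch of alternative 1 in TypeScript, assuming string vertex labels (a Set is hash-based, so membership checks are expected O(1); names are illustrative):
const adj = new Map<string, Set<string>>() // Adj[u] as a hash set per vertex
function addEdge(u: string, v: string): void {
  if (!adj.has(u)) adj.set(u, new Set())
  adj.get(u)!.add(v)
}
function hasEdge(u: string, v: string): boolean {
  return adj.get(u)?.has(v) ?? false // expected O(1) instead of scanning a list
}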
Dijkstra binary heap implementation time complexity
Theta((|E| + |V|) log |V|)
Dijkstra's does what?
Finds the shortest path to every node in the (weighted-edge) graph from a single source
Bellman-Ford does what?
Similar to Dijkstra's algorithm, the Bellman-Ford algorithm 1) finds the shortest path between a given node and all other nodes in the graph. Though it is 2) slower than the former, Bellman-Ford makes up for this disadvantage with its versatility. Unlike Dijkstra's algorithm, Bellman-Ford is capable of 3) handling graphs in which some of the edge weights are negative.
4) It's important to note that if there is a negative cycle in the graph (one whose edge weights sum to a negative value), then there is no shortest or cheapest path, so the algorithm cannot return a correct route through that cycle. 5) Bellman-Ford is able to detect negative cycles and report their existence.
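A minimal Bellman-Ford sketch, assuming an edge list of { u, v, weight } objects (the types are illustrative); the extra final pass is the standard negative-cycle check:
type Edge = { u: number; v: number; weight: number }
function bellmanFord(numVertices: number, edges: Edge[], source: number): number[] | null {
  const dist = new Array(numVertices).fill(Infinity)
  dist[source] = 0
  // |V| - 1 rounds of relaxing every edge
  for (let i = 0; i < numVertices - 1; i++) {
    for (const { u, v, weight } of edges) {
      if (dist[u] + weight < dist[v]) dist[v] = dist[u] + weight
    }
  }
  // one more round: any further improvement means a negative cycle is reachable from the source
  for (const { u, v, weight } of edges) {
    if (dist[u] + weight < dist[v]) return null // negative cycle detected
  }
  return dist
}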
Floyd-Warshall does what?
The Floyd-Warshall algorithm stands out in that, unlike Dijkstra and Bellman-Ford, it is not a single-source algorithm: it calculates the shortest distance between every pair of nodes in the graph, rather than only from a single node.
What is Floyd-Warshall very useful for?
when it comes to generating routes for multi-stop trips, as it calculates the shortest path between all the relevant nodes. For this reason, much route-planning software utilizes this algorithm, since it provides the most optimized route from any given location. Therefore, no matter where you currently are, Floyd-Warshall will determine the fastest way to get to any other node on the graph.
Johnson’s algorithm works best when?
With sparse graphs
Can Floyd-Warshall detect the presence of a negative-weight cycle?
Yes
How does Floyd-Warshall detect the presence of a negative-weight cycle?
…
Floyd-Warshall time complexity
O(V^3)
Johnson’s runtime complexity
O(|V|^2 log |V| + |V||E|)
How does Johnson's work, high level?
it uses two other shortest path algorithms as subroutines. It uses Bellman-Ford in order to reweight the input graph to eliminate negative edges and detect negative cycles. With this new, altered graph, it then uses Dijkstra’s shortest path algorithm to calculate the shortest path between all pairs of vertices. The output of the algorithm is then the set of shortest paths in the original graph.
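For reference, the reweighting step works as follows: add a dummy source connected to every vertex with weight-0 edges, let h(v) be the Bellman-Ford distance from that dummy source to v, and replace each edge weight with w'(u, v) = w(u, v) + h(u) - h(v). The new weights are non-negative, and the h terms telescope along any path, so shortest paths in the reweighted graph correspond to shortest paths in the original.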
Bellman-Ford time complexity
O(VE)
Dijkstra Fibonacci heap time complexity
O(|E| + |V| log |V|)
In graphs, how does the time complexity w.r.t. E relate to V?
|E| in the worst case is O(|V|^2)
What complexity class is determining whether a certain edge is in the MST?
P
What complexity class is determining whether a total weight (of an MST) exceeds a certain value?
P
MST
minimum spanning tree
What is considered a dense graph?
|E|/|V| > log log log |V|
what is the dynamic MST problem
concerns the update of a previously computed MST after an edge weight change in the original graph or the insertion/deletion of a vertex
maximum spanning tree
a spanning tree with weight greater than or equal to the weight of every other spanning tree
how to find a maximum spanning tree?
Such a tree can be found with algorithms such as Prim's or Kruskal's after multiplying the edge weights by -1 and solving the MST problem on the new graph
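A small sketch of that trick, reusing the Edge shape and kruskal sketch from the Kruskal card above (illustrative names):
function maximumSpanningTree(numVertices: number, edges: Edge[]): Edge[] {
  const negated = edges.map(e => ({ ...e, weight: -e.weight })) // multiply edge weights by -1
  // an MST of the negated graph is a maximum spanning tree of the original; restore the weights
  return kruskal(numVertices, negated).map(e => ({ ...e, weight: -e.weight }))
}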
The degree constrained minimum spanning tree
a minimum spanning tree in which each vertex is connected to no more than d other vertices, for some given number d.
The capacitated minimum spanning tree
a tree that has a marked node (origin, or root) where each of the subtrees attached to that node contains no more than c nodes; c is called the tree capacity.
Steiner tree problem
the problem of finding a minimum tree that spans the given subset of vertices (aka the terminals). this minimum tree may contain vertices that were not provided as terminals
What does Tarjan’s algorithm do?
an efficient graph algorithm to find the strongly connected components in a directed graph in linear time by utilizing Depth First Search traversal of a graph
SCCs
Strongly connected components
Strongly connected meaning
A directed graph is called strongly connected if there is a path in each direction between each pair of vertices of the graph. That is, a path exists from the first vertex in the pair to the second, and another path exists from the second vertex to the first (of course the path can be more than one edge long). Equivalently, a strongly connected component of a directed graph G is a subgraph that is strongly connected, and is maximal with this property: no additional edges or vertices from G can be included in the subgraph without breaking its property of being strongly connected.
Daniel Larkin 2014
They concluded that d-ary heaps such as binary heaps are faster than all other heap implementations when the decrease-key operation is not needed (and hence there is no need to externally track the location of nodes in the heap), but that when decrease-key is needed, pairing heaps are often faster than d-ary heaps and almost always faster than other pointer-based heaps, including data structures like Fibonacci heaps that are theoretically more efficient. [is this conclusion just for Dijkstra's? Or heaps in general?]
Mo Chen 2007
examined priority queues specifically for use with Dijkstra’s algorithm and concluded that in normal cases using a d-ary heap without decrease-key (instead duplicating nodes on the heap and ignoring redundant instances) resulted in better performance, despite the inferior theoretical performance guarantees.
Dijkstra's implementation with a PQ with decreaseKey
Dijkstra(adj, s) { // s = source key; assumes the PQ supports insert, decreaseKey, extractMin, get
  const dist = new PQ()
  const path = {}
  for (const key of Object.keys(adj)) { dist.insert(key, Number.MAX_SAFE_INTEGER) }
  dist.decreaseKey(s, 0)
  let x
  while ((x = dist.extractMin())) { // x = { label, priority }; priority is the final distance for x
    for (const n of adj[x.label]) { // n = { label, weight }
      const d = dist.get(n.label) // undefined if n was already extracted (its distance is final)
      if (d !== undefined && x.priority + n.weight < d) {
        dist.decreaseKey(n.label, x.priority + n.weight)
        path[n.label] = x.label // remember the predecessor on the shortest path
      }
    }
  }
  return path
}
Dijkstra with priority queue, verbal explanation
You should use a priority queue where the vertex with the shortest distance from the starting vertex gets the highest priority. Initially, all vertices have a shortest distance of infinity and the starting vertex has a shortest distance of 0.
Start by inserting all vertices (with their edges) from the graph into the PQ. Remove a vertex from the PQ and explore all its edges. For each adjacent vertex, if the distance through the current vertex is less than that adjacent vertex's current shortest distance, update the adjacent vertex's shortest distance inside the PQ. Continue while the PQ is not empty. Vertices that cannot be reached from the starting vertex will finish with a shortest distance of infinity; however, they will still be removed from the PQ.
What happens if you don’t relax the edges of the unvisited node that has the least distance of all the unvisited nodes?
I'm not sure. I think it breaks some loop invariant that the logic of Dijkstra depends on. I'm not sure of the exact nature of the incorrect output that would result, but I might test it out one of these days.
What is a simple way to emulate the behavior of the decrease-key operation of a priority queue without actually implementing such a method?
(Used in Dijkstra's)
To decrease the key of a node, just insert that node again into the PQ, so that there are now 2+ entries for the same node in the queue, and the entry for that node with the lowest key will be popped first in the future.
When popping (aka polling) nodes from the queue, add the polled node's key to a "visited" set as part of preprocessing; in the future, if a polled node's key already exists in the visited set, skip processing that node and poll the next one.
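A sketch of that pattern in TypeScript, assuming a queue with only insert and extract-min (the sorted-array "queue" below is a stand-in for a real heap; all names are illustrative):
type Arc = { to: string; weight: number }
function dijkstraLazy(adj: Map<string, Arc[]>, source: string): Map<string, number> {
  const dist = new Map<string, number>([[source, 0]])
  const visited = new Set<string>()
  const pq: [number, string][] = [[0, source]] // [distance, vertex] entries, duplicates allowed
  const extractMin = () => { pq.sort((a, b) => a[0] - b[0]); return pq.shift() }
  let entry: [number, string] | undefined
  while ((entry = extractMin())) {
    const [d, u] = entry
    if (visited.has(u)) continue // stale duplicate entry: skip it
    visited.add(u)
    for (const { to, weight } of adj.get(u) ?? []) {
      const candidate = d + weight
      if (candidate < (dist.get(to) ?? Infinity)) {
        dist.set(to, candidate)
        pq.push([candidate, to]) // "decrease key" by inserting a better duplicate
      }
    }
  }
  return dist
}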
Speed of prim’s vs kruskal’s vs Boruvka’s algorithm’s?
all three are asymptotically equally fast for sparse graphs, but slower than more sophisticated algorithms for denser graphs; Prim's can be made to run in linear time on sufficiently dense graphs
SoftHeap
a kind of priority queue that gives us an optimal trade-off between accuracy and speed
Pettie’s “contractibility”
A subgraph C is said to be DJP-contractible with respect to G if, after executing the DJP algorithm on G for some number of steps with a suitable start vertex in C, the tree that results is a spanning tree for C. [https://www.cs.utexas.edu/~vlr/papers/optmsf-jacm.pdf]
How can one provably identify edges in the MSF? How can one provably identify edges NOT in the MSF?
by using the cut property; by using the cycle property
The cut property states
that the lightest edge crossing any partition of the vertex set into two parts must belong to the MSF.
the cycle property states
that the heaviest edge in any cycle in the graph cannot be in the MSF.
contracting in graphs
https://www.cs.utexas.edu/~vlr/papers/optmsf-jacm.pdf
nominal density of a graph
(during contraction - e.g. in Pettie’s algorithm)
y is the ratio m/n’, where m, as usual, is the number of edges in the original graph, and n’ is the number of vertices in the current graph
[https://www.cs.utexas.edu/~vlr/papers/optmsf-jacm.pdf]
How to find a cycle in a directed graph?
Technique 1 (using colors). Use enum WHITE|GRAY|BLACK or UNVISITED | VISITING | VISITED
Do DFS:
hasCycle(n, adj, nodes) { // color[label] starts as WHITE/UNVISITED for every node; use a separate data structure if you can't just add properties on the node itself
  color[n.label] = GRAY // VISITING: n is on the current DFS path
  for (let child of adj[n.label]) {
    if (color[child.label] === GRAY) { return true } // edge back to the current path forms a cycle
    if (color[child.label] === WHITE && hasCycle(nodes[child.label], adj, nodes)) { return true } // or a descendant forms a cycle
  }
  color[n.label] = BLACK // VISITED: fully explored, no cycle found through n
  return false
}
// call hasCycle on every still-WHITE node to cover disconnected parts of the graph
——-
Technique 2
// doesn't check only whether this node forms a particular cycle, but whether any node reachable from it does, taking into account the history of the path we are on
function hasCycle(n, adj, path: SearchableStack) {
  path.push(n.label)
  for (let x of adj[n.label]) {
    if (path.contains(x.label)) { return true } // x is already on the current path: cycle
    if (hasCycle(x, adj, path)) { return true }
  }
  path.pop() // n is no longer on the current path
  return false
}
SPFA aka
Shortest Path Faster Algorithm
SPFA is an improvement of what algorithm
Bellman-Ford; it is queue-optimized Bellman-Ford
SPFA is believed to work well when?
On random sparse graphs; it is particularly suitable for graphs that contain negative-weight edges
SPFA worst case running time
O(|V||E|), same as Bellman-Ford's
SPFA average runtime
Experiments suggest that the average running time is O(|E|), and indeed this is true on random graphs, but it is possible to construct sparse graphs where SPFA runs in time Omega(|V|^2), like the usual Bellman-Ford algorithm
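A rough SPFA sketch, assuming a { to, weight } adjacency shape (illustrative); the inQueue flags are the queue optimization over plain Bellman-Ford, and this version has no negative-cycle check, so it will not terminate if one is reachable:
type WArc = { to: number; weight: number }
function spfa(adj: WArc[][], source: number): number[] {
  const n = adj.length
  const dist = new Array(n).fill(Infinity)
  const inQueue = new Array(n).fill(false)
  dist[source] = 0
  const queue: number[] = [source]
  inQueue[source] = true
  while (queue.length > 0) {
    const u = queue.shift()!
    inQueue[u] = false
    for (const { to, weight } of adj[u]) {
      if (dist[u] + weight < dist[to]) {
        dist[to] = dist[u] + weight
        if (!inQueue[to]) { // only re-enqueue a vertex that is not already pending
          queue.push(to)
          inQueue[to] = true
        }
      }
    }
  }
  return dist
}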
Dual priority queue version of Dijkstra's performance
has a significant overhead in the constant factor, and hence is quite slow in in-core execution, though it performs by far the best on sparse graphs out-of-core. [Mo Chen 2007]
IDM PQ
A priority queue that only supports insert and delete-min, as opposed to also supporting decrease-key
A DK PQ
A priority queue that includes a decrease key operation
A second alternative to a DK PQ
If there is no explicit decrease-key operation but you have access to a find operation, you can find the entry, remove it, and then add it again with the updated key.
How does Dijkstra's algorithm work/perform correctly when it is used with a priority queue that doesn't have a decrease-key operation?
When you pop the min node x from the priority queue, you process it. Part of processing is adding any of x's neighbors n to the priority queue if the path to n through x has a smaller weight (aka cost) than n currently has.
One question you may have: what if node C was added to the PQ with weight 10 (A->C = 10) but then was also added again with a better candidate weight of 8 (A->B = 4, B->C = 4)? Since node C is now in the queue twice, will the algorithm still function correctly? To make sure it does, mark nodes visited when they are removed from the priority queue, and never process a removed node that has already been visited. This works because of the following lemma/loop invariant: the shortest path P from A to Z will always be fully traversed (where "traversed" means Z is removed from the priority queue with the weight it gets when relaxed from Z.prev on path P) before any longer path from A to Z gets processed. That is because the node with the least weight is always removed from the queue first, so paths with the least weight are always traversed first; if a node Z has multiple partially traversed paths (Z is in the queue but not yet removed), the first path that is fully traversed will be the one with the lowest weight.
The important difference between Boruvka’s algorithm and Kruskal’s or Prim’s is that
with Boruvka’s you don’t need to presort the edges or maintain a priority queue.
Both prims and boruvka are based on what observation?
the observation that if the edge costs are unique, the cheapest edge of any cut belongs to the minimum spanning tree