Graph Agos Flashcards
How to get connected components in undirected graph G
Run DFS and keep track of component #
DFS inputs and outputs
input: G=(V,E) in adjacency list representation
Outputs:
prev[z]: The parent index of vertex z in the DFS visitation
pre[z]:The pre number of the vertex z in the DFS visitation
post[z]: The post number of vertex z in the DFS visitation
ccnum[z]:The connected components number IF vertices are passed in as highest post number
lowest number after running DFS on a reversed directed graph that is processed this way, the ccnum also be the topological sort order in reverse
DFS algorithm
DFS(G): cc = 0 for all v in V, visited(v) = FALSE for all v in V if not visited(v) then c++ explore(v)
DFS Explore
Explore(z) ccnum(z) = cc visited(z) = TRUE for all (z,w) in E: if not visited(w) Explore(w)
Running time for DFS
O(n+m)
How to find a path between connected vertices?
We add an initialization of prev(v) = NULL to the base case for DFS and in Explore we set prev(w)=z after running Explore(w)
We use the prev array to backtrack. For a pair of certices in the same connected component, we can use the prev array to find a path between this pair of connected vertices.
How can we get connectivity info for directed G?
use dfs: add pre/postorder numbers for the tree or forest of explored edges
DFS on directed graphs
DFS(G): clock =1 for all v in V, visited(v) = FALSE for all v in V if not visited(v) then explore(v)
Explore(z) for directed graphs
Explore(z) pre(x) = clock; clock++ visited(z) = TRUE for all (z,w) in E: if not visited(w) Explore(w) post(z) = clock, clock++
Do we use postorder numbers or pre order numbers for directed graphs?
postorder
How is a graph represented?
We can represent a graph by an adjacency matrix; if there are n = |V | vertices v1, . . . , vn, this is an n × n array whose (i, j)th entry is
aij =
1 if there is an edge from vi to vj
0 otherwise.
For undirected graphs, the matrix is symmetric since an edge (u,v) can be taken in any direction.
Tree edges
Part of the forest. post(z) > post(w)
Forward Edges
lead from a node to a nonchild descendant in the DFS tree. Goes down multiple steps. post(z) > post(w)
Back Edges
lead to an ancestor in the DFS tree. The post order number of post(z) < post(w)
Cross Edges
lead to neither descendant not ancestor; they therefore lead to a node that has already been completely explored. post(z) > post(w)
Cycle
A circular path v0->v1->v2….->vk->v0.
G has a cycle iff its DFS tree has a back eddge
How to know if a directed hraph has a cycle?
if its depth first search reveals a back edge
How is a dag linearized?
Decreasing post numbers
DAG
Directed Acylclic graph
No cycles = No back edges
Topologically sort inputs and outputs
Input: Graoh G = (V,E), directed acycic graoh(DAG)
Outut:topo[i]: the vertex number of the ith vetex in topological order from left to right
Topologically sorting runtime
O(n+m)
DAG Structure -
Source Vertex = no incoming edges
Sink Vertex = no outgoing edges
There is always at least one source and one sink
Source Vertex
ight have some outgoing edges, but No incoming edges = highest post #
Sink vertex
Might have some edges in, but it has no outgoing edges = lowest post #
Alternative topological sorting algorithm
1) Find a sink, output it and delete it
2) Repeat 1 until the graph is empty
Connectivity in directed graphs. How do we know v and w are strongly connected?
Vertices v and w are strongly connected if:
there is a path v->w and w->v
SCC
Strongly connected component
Maximal set of strongly connected vertices
How do we know a vertex is in a sink in a dag?
It has the lowest post order #
Does the vertex with a the lowest order # for a general graph exist in a sink
no
what characteristic of v will cause to live in a source for a general directed graph
highest post number
Find a sink in an SCC?
reverse the edges
For a directed G=(V,E), look at G^r = (V,E^r) = Reverse of G
Run DFS on the reversed graph and take the vertex with the highest post number beause it is guarenteed to be in sink.
source SCC in GR = sink
sink SCC in GR = source
SCC Algorithm
SCC(G):
input: DIrected G = (V,E) in adjency list representation
- Construct reverse of graph
- Run DFS on reversed graph
- Order vertices by descending post #
- RUn undirected connected components on G
5.
BFS
Input: Graph G = (V,E), directed or undirected;vertex s E V
Output:
dist[u]: distance from s to u if s can reach u, inf otherwise
prev[z]: The parent index of vertex z
Runtime: O(n+m)
Use this for unweighted Single Source Shortest Path(SSSP)
Dist is given as the number f edges from s to u, not the sum of weights. Don’t mess this up
Dijkstra’s
Input: Graph = (V,E), directed or undirected vertex s EV; positive edge weights w
Output:
dist[u]: distance from s to u if s can reach i, inf otherwise
prev[z]: the parent index of vertex z
Runtime: O(n+m) * O(logn) = O((n+m)logn)
Use this for non-negative weighted SSSP
It is paramount to understand that if weights are involved then we can’t get to O(n+m)
SAT Problem - Input and output
input: formula in CNF with n variables and m clauses
Each clause has at most two literals
Any clause with one literal can be satisfied and removed leaving us with a CNF f’ of only clauses with exactly two literals
Output: An assignment of T or F for each variable in f if it can be satisifed. No if it cannot be satisifed
SAT Run time
O(n+m)
SAT Properties
Each variable has two literals x and !x
n variables have 2n literals [x1, !x1, x2, !x2, …xn, !xn]
Note that the variables and literals are not provided separately. They are part of the formula. Finding all variables is O(m) as we must look at al clauses and list them all
We call SAT problems kSAT where k is replaces by the maximum number of literals in each clause. If there is no limit then it is simply called a SAT problem
k>= 3 are NP complete problems. 2SAT is P
The 2SAT algorithm transforms a 2SAT CNF into a graph and solves it
How does the 2SAT algorithm transfor a 2SAT CNF into a graph and solve it?
Create a graph G fro f by mapping the CNF to 2n vertices(one for each literal) and 2m edges(one for each implication)
2) RUn the SCC algorithm on G
3. If Ax E X, x and !x are in different SCCs then the CNF is satisifiable else return NO
- Set all source literals to False and remove the source SCC
- Set all sink literals to True and remove the sink SCC
- Repeat steps 4 and 5until all literals are set
- Return the variable assignents
Kruskals MST Algorithm
Input: Graph G = (VE), connected, undirected; edge weights w
Output: A minimum spanning tree defined by the edges E mst
Runtime: O(m log n)
Sort edges by weight. Grab the lightest available edge that will not create a cycle when added to the MST. Another way to look at this is to never add edgesto vertices in the same component in the MST. Keep doing this until all adges that will not create a cycle are added. This happens at exactly n-1 edges.
The more common of the ST algorithms. Gives a set of edges which is useful to construct the next input for a black box or to compare to the original G
Edge Weights(nxn matrix)
This is not an algorithm. It is how we can interpret edge weights w as O(1), O(1) write, and O(m) iteration. We take the adjacency list of a graoh and
1) Initialize an nxn atrix - O(1): Note that initialiaing a matrix with no values is O(1)
2) Set each edge weight in the table: O(1) * O(m) = O(m)
Note that setting a value is when there is a cost.
By setting values where there is an edhe, we avoid the typical n^2 cost of working with an nxn table
(u,v) access is O(1)
w(u,v) write is O(1)
When we want to do an iteratyion, we use the provided adjacency list and do an O(1) access for each edge - O(n+m)
We can assume O(m) for iteration
Union by Rank(DIsjoint Sets)
This is not an algoirhtm. It is a special data structure that akes the runtime of Kruskal’s MST algorithm possible
Functions
Makeset(x):Create singleton set containing just x
find(x):To which set does x belong
union(x,y): merge the sets containing x and y
Property 1: For any x, rank(x) < rank(pie(x)): A root node with rank k is created by merger of two trees with roots rank k-1
Property 2: Any root node of rank k has at least 2^k nodes in its tree
Property 3: If there are no elements overall there can be at mode n/2^k nodes of rank k. This last property implies that max rank is O(logn) Therfore, all trees have height <= logn and an upper bound on the running time of find and union.
Path compression: Extra maintenance to imporace speed.
During each find with a series of parent pointers is followed up to the root of a tree. We will change all these pointers so they point directly to the root.
The key point here is that once a node ceases to be a root, never surfaces, and its rank is forever fixed. Therefore all ranks of all notes are unchanged by path compression even though these numbers can no longer be intepreted as tree heights. In particular, properties 1-3 hold.
Example of how to establish non graoh problem into grap
We set bus stops as vertices
We set bus routes as undirected edges
We run DFS on G(V,E)
We do not need runtime for this. It will always be O(n+m)
WHich algorithms work well for MST?
Kruskal and Prims
MST Problem
Given: Undirected graph(V,E) with weights w(e) for e E E
Gaol: Find minimal size, connected graph of min weight
FInd min weight spanning tree of G
for T C E, w(T) sum of weights of edges in the tree
Tree properties
Connected acycic graph
1) Tree on n vertices and has n-1 edges
2) One path between every pair of vertices. If there was more than one path, it would be a cycle
3) Any connected G=(V,E) with |E|=|V|-1 is a tree
Greedy approach for MST
Review lecture
STart with empty graph. Assess adding each edge in one at time and add edges in that don’t cause a cycle
Kuskals algorithm
Kruskals(G):
Input: Undirected G=(V,E) with weights w(e)
- SOrt E by increasing weight
- Set X = 0
3) for e = (v,w) is in E ( go through in order) if XV e doesn’t have a cycle then: X = X Ve
4) Return (X)
X is an MST
Big O for Kruskals
O(mlogn)
Cuts
Set of edges that partition vertices into 2 sets S and the complement of S.
Edges crossing between S and S bar
Cut Property
For undirected graph G = (V,E)
Take a subset of edges are apart of an MST T
Take subset S where no edge of X is in the cut(S, S bar)
Let e* be minimum weight edge in cut(S and S bar)
ANy edge minimum weight across a cut is part of some MST
Prim’s
MST akin to Dijstra’s
Use cut property to prove correctness of Prim’s algorithm
Approach to solving problems with graphs
- Figure our black box
- State mod if needed. You may use black box as as is
- State steps. What changes do you need for the input. Repeat with more black boxes if needed
- Prove correctness
- Analyze runtime
How do directed and undirected graphs behave differently in DFS
Pre/Pst numbers give info on how a graph could be explored. Some numbers are interchangable between two vertices, some can never be swapped.
Pre/post numbers in undirected graphs give infrmation on how a graph would be explored given a starting point. The root node that is used can determine entirely different pre/post numbers
ccnum in a directed graph only works if you always explore fro a sink vertex if all previous sink SCCs were removed.
A source vertex will be able to reach al connected vertices and the ccnum will be somewhat useless.
ccnum in an undirected graph gives how many components there are.
Explore input and output
Input: Graph G = (V,E), directed or undirected; v in V
Output: visited[u] is set to true for all nodes u reachable from v
Any other data structure that DFS has if needed
Explore runtime
O(m) if run as part of DFS
O(n+m) if run by itself and visited[] needs to be created
Explore properties
This is a subroutine of DFS and does most of the work in DFS
It runs on all edges and vertices that are reachable from the provided v
Use this is you only care about single source but need DFS information
All of the outputs in DFS are availble in explore as well
In fact ccnum, prev, pre, pst are set in the explore subroutine
If you are allowed to modify your black box algorithm, it is likely to be in explore
Topological sorting
This algorith works by running DFS on the DAG and using the post order number to sort the vertices from highest post# to lowest post#
When a DAG is ordered from source to sink then all edges go from left to right
Either run a graph as topological sort or do it as part of DFS
Strongly Connected Algorithm Inputs and outputs
Input: Graph G=(V,E), directed
Output: G SCCC = (V SCCC, E SCCC) where G is a metagraph(a DAG) with each SCC in G forming a vertex in V and each edge between SCCCs in W.
ccnum[.] comes from DFS used underneath this algorithm
V SCCC will look like 1,2,3,4 which are ccnums
E SCCC will look like (1,2), (2,3), (3,4) which are edges between V SCCC
Use ccnum[u] and cccnum[v] to find which SCC u and v are in
SCC Runtime
O(n+m)
SCCC Algorithm
Works by running two DFSses.
Run once with pre/post order numbering on Gr which is the reverse of G
Sort V in descending post order numbers. This gives sink to source
Run again on G with V sorted(not on Gr)
Output will have ccnum representing SCC with highest = source and lowest - sink
We use the ccnum to gather up vertices that belong to each SCC
We then check all original edges and their endpoints. If the endpoints are in different SCCs and a corresponding E between those SCCs that does not already exist, we add a E to represent the edge from one SCC to another.
How does 2SAT algorithm transform a 2SAT CNF into a graph and solve it
- Create a graph G from f by mapping the CNF to 2n vertices one for each literal and 2 edges one for each implication
- Run SCC on G
- If x is in X and x and !x are in different SCCs then CNF is satisfiable else return NO
- Set all source literals to False and remove the source SCC
- Set all sink literals to True and remove the sink SCC
- repeat steps 4 and 5 until all literals are set
- Return the variable assignments
Prims Input/Output and runtime
Input: Graph G = (V,E) connected, undirected, edge weights w
Output: A minimum spanning tree defined by array prev
Runtime: O(mlogm) simplified to O(mlogn)
Prim’s MST Algo
Start with arbitrary vertex v and put into subtree S of included vertices. In each iteration, grow S by adding the lightest edge between a vertex in S and a vertex outside of S. Continue untill all vertices are in S. This happens at exactly n-1 edges
Very similar to Dijkstra’s in construction. Not as used as Kriskals but can be very useful if you need a path in the MST output. Lack of edge information makes it unlikely to be what you want
max flow inputs and outputs
Inputs: G(V,E), s,t,c
Directed Graph G(V,E)
s,t in V such that s is a source and t is a sink
c is capacities such that c(u,v) > 0 for (u,v) in E
This set of inputs is also known as a flow network
Outputs: f*
f is a function or vector such that f(u,v) gives flow of u,v
It is a general term to represent flow at some point
f is constantly updated while the algorithm runs
f* is the max flow such that the soe over f(.,t) is maximized
C = capacity = size of max flow = size(f) some of f*(., t)
Maxflow rules
Cycles are ok
Anti-parallel edges are not ok
Instead add ane xtra vertex in one of the ani-parallalel edges with the sae direction and c(e) as the original edge
for all (u,v) in E:
0 <= f(u,v) <= c(u,v)
Flow of each edge is a non-negative, not exceeding capacity of each edge
for all v in V where v != s and v != t
total flow into v is equal to total flow out of v
Total flow out of s is equal to total flow into t
Consequently C is equal to both
Max flow general steps
Set f(u,v) = 0 for (u,v) in E
Build residual network G(V,E)
Check for any path in G from s to t. Can use DFS or BFS
If none is found then we are done
If found, get the min capacity along path as c(path)
Augment by c(path) units along the path. For each forward edge(u,v) along the path, increase f(u,v) by c(path) units.
FOr each backwards edge(v,u) along the path, decrease f(u,v) by c(path) units
Recurse to build residual network step
Build Residual Network inputs/outputs and runtime
Input: G(V,E),c,f
Outputs(G=(V,E), cf
Runtime: for (u,v) in E - O(n+m)
O(n+)
We assume G is a connected graph, m >= n-1; O(n+m) = O(m)
Residual Network Algorithm
for (u,v) in E if f(u,v) < c(u,v): add(u,v) to E add c(u,v) = c(u,v) -f(u,v) Theseare forward edges
if f(u,v) > 0: add (v,u) to E add c(v,u) = f(u,v) These are backward edges
Ford-Fulkerson
User General Max Flow Algorithm
Assumes all capacities are positive integers
Runtime:
1) Rounds: Flow increases by >= 1 per round, so there are C rounds where C is the size of max flow. O(C)
2) Per round:
Explore/DFS/BFS - O(n+) or O(m)
O(n-1) to augment f
O(n+m) + O(n-1) = O(n+m)
We assume G is a connected graph, m >= n-1; O(n+m) = O(m)
We assume G to be a connected not beceause the algorith requires it but because a disconnected graph isn’t well defined for a ax flow problem. Suppose s can’t reach it. What is our solution?
O(m) * O(C) = O(mC)
Edmonds-Karp
Specific version of Fold-Fulkerson with the path finding well defined to use BFS. Rather than using any path, it uses shortest path
Use General max flow algorith
Capacities can be any positive number
Replace step 3 in the genera; ax flow to check for shortest path in G from s to t. Use BFS. If non is found we’re done
Runtime
Rounds
Delete and add any edge <= n/2 times - O(n)
There are m edges in a graph
O(nm)
Per round: BFS - O(n+m)
O(n-1) to augment f
O(n+m) + O(n-1) = O(n+m)
We assume G is a connected graph m > n-1; O(n+m) = O(m)
O(m) * O(nm) = O(nm^2)
DAG properties
In a dag every edge leads to a vertex with a lower post nuber
Every dag has at least one source and at least one sink
A directed graph has a cycle if and only if its DFS reveals a back edge
Are cycles ok in a flow network
Yes.
For Fulkerson is psuedo polynomial. TF
True
Ford FUkerson
Find any path and run DFS or BFS and take any path
Ford fulkerson vs Edmonds-Karp
FF: Assume capacities are integers. Does not always terminate on irrational capacities. Can convert rational capacirties to integers(out of scope
FLow increases by >= 1 per round so there are C rounds where C is the size of max flow
Runs in O(mC)
Can be faster than Edmonds-Karp is C is bounded by n or m
Ford fulkerson vs Edmonds-Karp
FF: Assume capacities are integers. Does not always terminate on irrational capacities. Can convert rational capacirties to integers(out of scope
FLow increases by >= 1 per round so there are C rounds where C is the size of max flow
Runs in O(mC)
Can be faster than Edmonds-Karp is C is bounded by n or m
Edmonds carp Uses BFS to find next path works with irrational capacities O(nm^2) Generally considered faster for unbounded capacities