Graphs Flashcards
Directed Edges
An edge (u,v) is from u to v, not v to u
Undirected Edges
An edge (u,v) is from u to v AND from v to u
Weighted Edges
Each edge has a "weight" ("cost") associated with it (i.e., a magnitude associated with the relationship)
Unweighted Edges
Edges do not have weights
Cycle
A valid path starting and ending at the same node
Class: Unstructured
A disconnected collection of nodes
Class: Sequential
An ordered collection of connected nodes (e.g., a linked list)
Class: Hierarchical
A ranked collection of connected nodes
Class: Structured
A collection of both connected and disconnected nodes
Multigraph
A given pair of nodes may have multiple distinct edges (e.g., cities and flights: different prices between the same two cities)
Adjacency Matrix
Space: O(|V|^2). Good for dense graphs; not good for sparse graphs, since all |V|^2 entries must be stored and iterated over. Edge lookup is constant time.
Rows are the "from" vertices, columns are the "to" vertices.
For an undirected graph, the matrix is symmetric.
Diagonal entries represent self-loops (edges from a vertex to itself).
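A minimal adjacency-matrix sketch in Python to make the trade-offs concrete; the class and method names are illustrative, not from the card:

```python
# Minimal adjacency-matrix sketch for a weighted directed graph.
# Rows = "from" vertex, columns = "to" vertex; None marks "no edge".

class AdjacencyMatrixGraph:
    def __init__(self, num_vertices):
        self.n = num_vertices
        # O(|V|^2) space, allocated up front even if the graph is sparse
        self.matrix = [[None] * num_vertices for _ in range(num_vertices)]

    def add_edge(self, u, v, weight=1):
        self.matrix[u][v] = weight            # directed: u -> v only

    def has_edge(self, u, v):
        return self.matrix[u][v] is not None  # O(1) lookup

    def neighbors(self, u):
        # Must scan a whole row: O(|V|) even if u has few neighbors
        return [v for v in range(self.n) if self.matrix[u][v] is not None]

# Example usage
g = AdjacencyMatrixGraph(3)
g.add_edge(0, 1, 5)
print(g.has_edge(0, 1), g.neighbors(0))   # True [1]
```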
Adjacency List
Implemented with hash tables/hash maps, or linked lists. Neighbor lookup is very fast, and space is saved because absent (null) connections are not stored, unlike the matrix. Space: O(|V| + |E|)
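A minimal adjacency-list sketch, assuming a hash map of hash maps (vertex -> {neighbor: weight}); names are illustrative:

```python
from collections import defaultdict

class AdjacencyListGraph:
    def __init__(self):
        self.adj = defaultdict(dict)        # vertex -> {neighbor: weight}

    def add_edge(self, u, v, weight=1):
        self.adj[u][v] = weight             # directed: u -> v
        self.adj[v]                         # ensure v appears as a vertex

    def has_edge(self, u, v):
        return v in self.adj[u]             # expected O(1) with hashing

    def neighbors(self, u):
        return list(self.adj[u].items())    # O(deg(u)), not O(|V|)

g = AdjacencyListGraph()
g.add_edge('A', 'B', 2)
print(g.has_edge('A', 'B'), g.neighbors('A'))   # True [('B', 2)]
```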
Breadth First Search (BFS) Algorithm
- Add (0, start) to the queue
- While the queue is not empty:
  a. Pop (d, curr) from the queue
  b. If curr has not been visited:
     - Mark curr as visited with distance d
     - For all edges (curr, w):
       - If w has not been visited, add (d+1, w) to the queue
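A runnable sketch of this pseudocode, assuming the graph is given as a dict mapping each vertex to a list of neighbors:

```python
from collections import deque

def bfs(adj, start):
    """BFS following the card's pseudocode: a queue of (distance, vertex) pairs."""
    dist = {}                        # vertex -> distance; also acts as "visited"
    queue = deque([(0, start)])      # add (0, start) to the queue
    while queue:
        d, curr = queue.popleft()    # pop (d, curr) from the queue
        if curr not in dist:         # if curr has not been visited
            dist[curr] = d           # mark curr as visited with distance d
            for w in adj[curr]:      # for all edges (curr, w)
                if w not in dist:
                    queue.append((d + 1, w))
    return dist

# Example usage on a small hypothetical graph
adj = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
print(bfs(adj, 'A'))   # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```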
Depth First Search (DFS) Algorithm
- Add start to the stack
- While the stack is not empty:
  a. Pop curr from the stack
  b. If curr has not been visited:
     - Mark curr as visited
     - For all edges (curr, w):
       - If w has not been visited, add w to the stack
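A runnable sketch of the DFS pseudocode under the same assumed dict-of-neighbors representation:

```python
def dfs(adj, start):
    """Iterative DFS following the card's pseudocode, using an explicit stack."""
    visited = set()
    order = []                       # records the visit order
    stack = [start]                  # add start to the stack
    while stack:
        curr = stack.pop()           # pop curr from the stack
        if curr not in visited:      # if curr has not been visited
            visited.add(curr)        # mark curr as visited
            order.append(curr)
            for w in adj[curr]:      # for all edges (curr, w)
                if w not in visited:
                    stack.append(w)  # add w to the stack
    return order

adj = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
print(dfs(adj, 'A'))   # ['A', 'C', 'D', 'B'] (one valid DFS order)
```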
Dijkstra’s Algorithm
1. Know the prerequisite condition: the graph must have no negative edge weights.
2. Valid by proof by contradiction: once a node is finalized, no future path to it can be shorter than the path the algorithm has already found.
3. Keep track of which node the shortest path came from (prev), so the path can be reconstructed.
Fundamentally the same as BFS (queue) and DFS (stack), but Dijkstra's uses a (min) heap where the lowest distance has the highest priority.
P. Set all nodes to infinite distance (or -1), except the source.
1. Add (0, start/source) to the PQ (heap).
2. While the PQ is not empty:
   A. Pop (d, curr) from the PQ
   B. If curr is not done:
      1. Mark curr as done with distance d
      2. For each edge (curr, w, e): if w is not done, add (d + e, w) to the PQ
Note: this only works for graphs with non-negative edge weights; no negative weights.
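A runnable sketch of these steps, assuming the graph is a dict mapping each vertex to (neighbor, weight) pairs with non-negative weights:

```python
import heapq

def dijkstra(adj, start):
    """Dijkstra sketch matching the card's steps."""
    dist = {v: float('inf') for v in adj}   # P. all nodes start at infinity
    prev = {v: None for v in adj}
    done = set()
    dist[start] = 0
    pq = [(0, start)]                       # 1. add (0, source) to the PQ
    while pq:                               # 2. while PQ not empty
        d, curr = heapq.heappop(pq)         #    A. pop (d, curr)
        if curr in done:                    #    B. skip if already finalized
            continue
        done.add(curr)                      #       1. mark curr as done
        for w, weight in adj[curr]:         #       2. relax each edge (curr, w)
            if w not in done and d + weight < dist[w]:
                dist[w] = d + weight
                prev[w] = curr
                heapq.heappush(pq, (dist[w], w))
    return dist, prev

adj = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2)], 'C': []}
print(dijkstra(adj, 'A')[0])   # {'A': 0, 'B': 1, 'C': 3}
```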
Dijkstra’s Algorithm Time Complexity
Overall: O(|V| + |E| log |E|)
P. Initialization to infinity: O(|V|)
1. Add (0, start) to the PQ: O(1)
2. While the PQ is not empty:
   A. Pop (d, curr) from the PQ: O(log |E|), since the (balanced) heap can hold up to |E| entries, each pop costs log of the number of edges, and there are O(|E|) pops/pushes in total.
BFS Search Algorithm (Min-Path)
Debashis (11/25)
P. Initialize all nodes (other than the source) with infinite distance and prev = -1.
- Point CURR to the source. Add CURR to the PQ. Then enter the while loop (while the PQ is not empty).
- Pop from the PQ (min heap). Starting with the source node, add to the PQ all neighboring nodes (technically edges). If a neighbor's distance is still infinity, update it to the current node's distance + 1 and set its prev to the current node. Once all neighbors are pushed to the PQ, move on.
- Keep popping: each popped node updates the distance (one more than its own) and the prev pointer of any still-unvisited neighbors, level by level, until the queue is empty.
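Once every node's prev pointer is set, the min-edge path itself can be read back from it. A small sketch (the prev table is hypothetical; the card initializes prev to -1, while this sketch uses None for the same purpose):

```python
def reconstruct_path(prev, start, goal):
    """Walk prev pointers back from goal to start; prev maps each vertex to the
    vertex it was first discovered from (None means no predecessor)."""
    path = []
    curr = goal
    while curr is not None:
        path.append(curr)
        if curr == start:
            return list(reversed(path))
        curr = prev[curr]
    return None        # goal was never reached

# Example: prev table produced by a BFS min-path run (hypothetical values)
prev = {'A': None, 'B': 'A', 'C': 'A', 'D': 'B'}
print(reconstruct_path(prev, 'A', 'D'))   # ['A', 'B', 'D']
```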
For a directed, connected graph, what is expected (of BFS)?
Each node is only enqueued once
At any time the queue holds at most 2 distinct DISTANCE VALUES across all nodes in it.
It can work for a graph with a cycle
Dijkstra's Algorithm Properties
Properties (using a PQ holding pairs (pointer to vertex, path cost)):
Path cost is the priority (lower cost means higher priority).
Each PAIR maintains a path cost to a vertex.
Multiple pairs with the same "pointer-to-vertex" part can exist in the priority queue at the same time; these will usually differ in their "path cost".
Dijkstra’s Algorithm: Data Structures
No negative edge weights
Maintain a sequence (array, vector, etc) of vertex objects, indexed by vertex number
- vertex objects contain 3 fields (and others):
1. Dist: cost of the best (least-cost) path discovered so far from the start vertex to "this" (current) vertex
2. Prev: the vertex number (index) of the previous node on that best path
3. Done: a boolean indicating whether the Dist and Prev fields contain the final best values for this vertex
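A minimal sketch of such a vertex record, assuming exactly the three fields named above (the class name is illustrative):

```python
from dataclasses import dataclass
import math

@dataclass
class Vertex:
    # The three fields described on the card (other bookkeeping fields could be added)
    dist: float = math.inf   # cost of the best path found so far from the start vertex
    prev: int = -1           # index of the previous vertex on that best path
    done: bool = False       # True once dist and prev hold their final values

# A sequence of vertex objects, indexed by vertex number:
vertices = [Vertex() for _ in range(5)]
vertices[0].dist = 0         # vertex 0 chosen as the start vertex
```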
Is Disjoint Sets an ADT or a DS?
Abstract Data Type
ADT optimized for Union and Find
Union: Given two elements, merge the sets to which they belong
Find: Given an element e, return the set to which it belongs
Purpose of union operation
Join two sets. Once two sets have been unioned, they should share the same sentinel node. (There is a lot of freedom in how union is performed, since there are many ways to merge two sets of vertices.)
Union by size
Union by weight
Queue
            Array                              Linked List
Enqueue     O(1) amortized / O(n) on resize    O(1)
Dequeue     O(1)                               O(1)
Peek        O(1)                               O(1)
Disjoint Set is an ADT
Commonly referred to as Union-Find data type
Interface supports:
Union: given two elements u and v, merge the sets to which they belong
Find: Given an element e, return the set to which it belongs.
Efficiently implemented using the UP-TREE data structure: simply a graph in which every node has a single "parent" pointer. Nodes that do not have a parent pointer are the SENTINEL NODES; each sentinel node represents a single set.
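A minimal up-tree sketch with union by size and path compression; storing the negative set size in the sentinel's slot is one common convention, assumed here:

```python
class DisjointSets:
    """Up-tree sketch: every element has a parent pointer; sentinels (roots)
    store the negative size of their set instead of a parent index."""
    def __init__(self, n):
        self.parent = [-1] * n              # -1: sentinel of a set of size 1

    def find(self, e):
        root = e
        while self.parent[root] >= 0:       # follow parent pointers up to the sentinel
            root = self.parent[root]
        while e != root:                    # path compression: repoint path nodes at the root
            nxt = self.parent[e]
            self.parent[e] = root
            e = nxt
        return root

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False                    # already in the same set
        if self.parent[ru] > self.parent[rv]:   # union by size: more negative = bigger set
            ru, rv = rv, ru
        self.parent[ru] += self.parent[rv]      # ru stays the sentinel of the merged set
        self.parent[rv] = ru
        return True

ds = DisjointSets(4)
ds.union(0, 1); ds.union(2, 3)
print(ds.find(1) == ds.find(0), ds.find(0) == ds.find(2))   # True False
```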
Kruskal's Algorithm, if implemented efficiently, has a worst-case Big-O time complexity that is the same as Prim's Algorithm: O(|V| + |E| log(|E|))
Implementation-wise, here is the pseudocode for Kruskal’s Algorithm on a weighted undirected graph G = (V,E):
1. Create a forest of vertices (each vertex is its own tree)
2. Create a priority queue containing all the edges, ordered by edge weight
3. While fewer than |V| - 1 edges have been added back to the forest:
   3.1. Dequeue the smallest-weight edge (v, w, c) from the priority queue
   3.2. If v and w already belong to the same tree in the forest: go back to step 3.1 (adding this edge would create a cycle)
   3.3. Else: join those vertices with that edge and continue
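A runnable sketch of this pseudocode, assuming edges are supplied as (weight, v, w) tuples so a binary heap can order them by weight, and using a tiny union-find for the forest:

```python
import heapq

def kruskal(num_vertices, edges):
    """Kruskal sketch over vertices 0..num_vertices-1; edges are (weight, v, w)."""
    parent = list(range(num_vertices))          # forest of single-vertex trees

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]       # path halving
            x = parent[x]
        return x

    pq = list(edges)
    heapq.heapify(pq)                           # PQ of all edges, ordered by weight
    mst = []
    while pq and len(mst) < num_vertices - 1:   # need |V| - 1 edges
        c, v, w = heapq.heappop(pq)             # dequeue smallest-weight edge
        rv, rw = find(v), find(w)
        if rv == rw:
            continue                            # same tree: edge would create a cycle
        parent[rv] = rw                         # join the two trees
        mst.append((v, w, c))
    return mst

edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
print(kruskal(4, edges))   # [(0, 1, 1), (1, 2, 2), (2, 3, 4)]
```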
Can you identify the similarities and differences between Kruskal's and Prim's algorithms?
They both have the same run-time complexity: O(|V| + |E| log(|E|))
Prim's uses a PQ sorted by single edge weights.
Dijkstra's uses the total path weight explored so far as the PQ priority for comparison.
Prim's adds the least-weight edge leaving the existing/current subgraph of vertices, from anywhere on that subgraph.
Prim's and Dijkstra's both start from a designated node/vertex.
Kruskal's does not start from a designated node; it starts arbitrarily anywhere in the graph, with the globally smallest-weight edge.
DFS uses STACK (ADT)
Stack is FILO (first in last out)
A stack visits the vertices pushed onto it most-recent-first (ordered by time of entry), ensuring the next vertex explored is the one pushed at the most recent junction point.
The only difference in the pseudocode for BFS and DFS was their use of a queue and stack, respectively.
Thus, any difference in the runtime complexity of DFS and BFS depends on their ADTs' runtimes.
But the worst-case time complexities of both ADTs are the SAME.
Pop(), top(), and push() each have worst-case time complexity of O(1).
DFS and BFS have worst-case time complexity of:
O(|V| +|E|)
How do DFS and BFS differ?
They differ by their memory management.
BFS has to store all of a vertex's neighbors at each step in a queue, in order to traverse those neighbors later.
DFS only needs to store the previous neighbor(s) along the current path to explore the next one. Thus, in certain graphs (structures) DFS can be more memory efficient.
A long/tall (deep) tree will have DFS store more elements in memory. A wider tree, or a structure with many edges, will have BFS store more elements.
In a GRAPH with a TREE structure, DFS is guaranteed to find the path between any two nodes, and since that path is unique it is also the shortest one.
In a GRAPH that is NOT a TREE, DFS is still guaranteed to find a path between any two (connected) nodes, but it is not guaranteed to find the shortest path between them.
BFS is guaranteed to find the shortest path (fewest edges) even in a graph that is not a tree.
Important to consider parameters of the problem to choose between BFS and DFS.
For a connected, weighted graph with n vertices and exactly n edges, it is possible to find a minimum spanning tree in O(n) time.
This graph contains only one cycle, which can be found by a DFS. Remove the heaviest edge in that cycle; the remaining n - 1 edges form the minimum spanning tree.
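A sketch of this O(n) idea; it isolates the single cycle by repeatedly peeling off degree-1 vertices (an alternative to the DFS mentioned above) and then drops the heaviest cycle edge. The edge-list input format is an assumption:

```python
from collections import deque

def unicyclic_mst(n, edges):
    """MST of a connected graph with n vertices and exactly n edges (one cycle),
    in O(n). edges is a list of (u, v, weight) tuples over vertices 0..n-1."""
    degree = [0] * n
    incident = [[] for _ in range(n)]
    for idx, (u, v, w) in enumerate(edges):
        degree[u] += 1; degree[v] += 1
        incident[u].append(idx); incident[v].append(idx)

    removed = [False] * len(edges)       # edges peeled away with the leaves
    leaves = deque(v for v in range(n) if degree[v] == 1)
    while leaves:                        # peel degree-1 vertices until only the cycle remains
        v = leaves.popleft()
        for idx in incident[v]:
            if removed[idx]:
                continue
            removed[idx] = True
            u = edges[idx][0] if edges[idx][1] == v else edges[idx][1]
            degree[u] -= 1
            if degree[u] == 1:
                leaves.append(u)

    # Remaining (non-removed) edges are exactly the cycle; drop its heaviest edge.
    cycle_edge_ids = [i for i in range(len(edges)) if not removed[i]]
    heaviest = max(cycle_edge_ids, key=lambda i: edges[i][2])
    return [edges[i] for i in range(len(edges)) if i != heaviest]

# Example: a triangle 0-1-2 plus a pendant vertex 3
edges = [(0, 1, 1), (1, 2, 5), (2, 0, 2), (2, 3, 7)]
print(unicyclic_mst(4, edges))   # [(0, 1, 1), (2, 0, 2), (2, 3, 7)]
```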
In a connected, weighted graph, every lowest weight edge is always in some minimum spanning tree.
It can be the first edge added by Kruskal’s algorithm.
Algorithm: Given two sorted arrays A and B (of sizes m and n), find the median of their combined elements.
If m = n, give an algorithm that runs in Θ(lg n) time.
Pick the median m1 of A and the median m2 of B. If m1 = m2, return m1. If m1 > m2, discard the second half of A and the first half of B; we are left with two subarrays of size n/2. Repeat until both arrays are smaller than a constant. The case m1 < m2 is symmetric.
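A hedged sketch of the same Θ(lg n) idea in an iterative binary-search form: instead of literally halving both arrays, it searches for how many elements of A belong to the lower half. It assumes equal-length sorted arrays and returns the lower median (the n-th smallest of the 2n values):

```python
def shared_lower_median(A, B):
    """Lower median (n-th smallest) of two sorted, equal-length arrays A and B,
    found in Theta(lg n) by binary-searching the split point of A."""
    n = len(A)
    lo, hi = 0, n
    while lo <= hi:
        i = (lo + hi) // 2            # take i elements of A and n - i of B
        j = n - i
        a_left  = A[i - 1] if i > 0 else float('-inf')
        a_right = A[i]     if i < n else float('inf')
        b_left  = B[j - 1] if j > 0 else float('-inf')
        b_right = B[j]     if j < n else float('inf')
        if a_left <= b_right and b_left <= a_right:
            return max(a_left, b_left)    # largest element of the lower half
        elif a_left > b_right:
            hi = i - 1                    # took too many elements from A
        else:
            lo = i + 1                    # took too few elements from A

print(shared_lower_median([1, 3, 8], [2, 4, 9]))   # 3
```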
Argue that: Any comparison-based sorting algorithm can be made stable without affecting the running time by more than a constant factor.
Solution: To make a comparison-based sorting algorithm stable, tag every element with its original position in the array. Now, if A[i] = A[j], compare i and j to decide the order of the elements. This increases the running time by at most a constant factor (roughly 2).
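A small illustration of the tagging argument (Python's built-in sort is already stable, so this is purely to demonstrate the reduction described above):

```python
def stable_sort(a, key=lambda x: x):
    """Make any comparison sort stable by tagging each element with its original
    index and comparing (key, index) pairs; ties keep their original order."""
    tagged = [(key(x), i, x) for i, x in enumerate(a)]   # tag with original position
    tagged.sort()         # any comparison sort would do; indices break ties
    return [x for _, _, x in tagged]

records = [('b', 2), ('a', 1), ('b', 1)]
print(stable_sort(records, key=lambda r: r[0]))
# [('a', 1), ('b', 2), ('b', 1)]  (the two 'b' records keep their relative order)
```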
Argue that you cannot have a Priority Queue in the comparison model with both of the following properties:
EXTRACT-MIN runs in THETA (1) time.
BUILD-HEAP runs in THETA(n) time.
If such a priority queue existed, then we could sort by running BUILD-HEAP (Θ(n)) and then extracting the minimum n times (n · Θ(1) = Θ(n)). This algorithm would sort in Θ(n) time in the comparison model, which violates the Ω(n log n) lower bound for comparison-based sorting.
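A small illustration of the reduction using a real binary heap, where EXTRACT-MIN costs Θ(log n) and the sort therefore takes Θ(n log n), as the lower bound requires:

```python
import heapq

def sort_via_heap(a):
    """Sorting by BUILD-HEAP plus n EXTRACT-MINs. heapify is Theta(n), but each
    heappop costs O(log n), so the whole sort is Theta(n log n); a Theta(1)
    EXTRACT-MIN would make this routine Theta(n), which is impossible."""
    h = list(a)
    heapq.heapify(h)                                  # BUILD-HEAP
    return [heapq.heappop(h) for _ in range(len(h))]  # n extract-mins

print(sort_via_heap([5, 1, 4, 2, 3]))   # [1, 2, 3, 4, 5]
```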
Prim’s Algorithm Runtime
O(n log n)
Depth First Search
O(n)
(requires a diagram) Given a set of keys, 2, 3, 5, 7, 11, 13, 17, 19, can you place them so as to fit a predetermined structure and preserve its invariants/rules?
Can you explain why this cannot be a Red-Black Tree?
One valid placement is 17, 11, 19, 3, 13, 2, 5, 7.
Why not a red-black tree? The number of black nodes on the paths from the root down to the leaves cannot be made equal for this tree.
(requires a diagram) The binary search tree can be transformed into a red-black tree by performing a single rotation. Draw the red-black tree that results, labeling each node with “red” or “black.”
Include the keys from part (a).
Rotate right around the root. There are several valid ways to color the resulting tree. Here is one possible answer (all nodes are colored black except for 7).
Minimum Spanning Tree:
Let G = (V, E) be a connected, undirected graph with edge-weight function w : E -> R, and assume all edge weights are distinct. Consider a cycle v1, v2, ..., vk, vk+1 in G, where vk+1 = v1, and let (vi, vi+1) be the edge in the cycle with the largest edge weight. Prove that (vi, vi+1) does not belong to the minimum spanning tree T of G.
Proof by contradiction. Assume for the sake of contradiction that (vi, vi+1) does belong to the minimum spanning tree T. Removing (vi, vi+1) from T divides T into two connected components P and Q, where some nodes of the given cycle are in P and some are in Q. For any cycle, at least two of its edges must cross this cut, and therefore there is some other edge (vj, vj+1) on the cycle such that adding it connects P and Q again and creates another spanning tree T'. Since the weight of (vj, vj+1) is less than that of (vi, vi+1), the weight of T' is less than that of T, and T cannot be a minimum spanning tree. Contradiction.
Comparing Prim's and Kruskal's Algorithms
Both algorithms choose and add at each step a min-weight edge from the remaining edges, subject to constraints.
Prim's MST algorithm:
- Start at a root vertex
- Two rules for adding a new edge:
  a. adding the new edge does not create a cycle in the subgraph
  b. the subgraph remains connected
- Terminate when no more edges can be added (a sketch follows this list)
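A runnable Prim sketch, assuming an undirected graph given as a dict mapping each vertex to (neighbor, weight) pairs listed in both directions:

```python
import heapq

def prim(adj, root):
    """Prim sketch: grow one tree from a root, always taking the least-weight
    edge that leaves the current subgraph without creating a cycle."""
    in_tree = {root}
    mst = []
    pq = [(w, root, v) for v, w in adj[root]]   # PQ keyed on single edge weights
    heapq.heapify(pq)
    while pq and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(pq)
        if v in in_tree:
            continue                            # would create a cycle
        in_tree.add(v)
        mst.append((u, v, w))
        for nxt, weight in adj[v]:
            if nxt not in in_tree:
                heapq.heappush(pq, (weight, v, nxt))
    return mst

adj = {
    'A': [('B', 1), ('C', 4)],
    'B': [('A', 1), ('C', 2)],
    'C': [('A', 4), ('B', 2)],
}
print(prim(adj, 'A'))   # [('A', 'B', 1), ('B', 'C', 2)]
```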
Kruskal’s MST algorithm:
- Start at a min-weight edge
- One rule for adding a new edge: it must not create a cycle in the forest of trees being built
- Terminate if no more edges to add can be found
At each step there is a forest of trees, merging as the algorithm progresses.
To prove these algorithms:
Let B be a subset of V(G); |B|