5. Graphs Flashcards
What is the goal of graph partitioning?
To find densely linked clusters in a graph that represent communities of some kind
What makes a good graph partition?
- Maximizes the number of within-group connections
- Minimizes the number of between-group connections
What is the equation for the cut of a graph?
When you split a graph into 2 subgraphs A and B, you sum the weights of edges which have one endpoint in A and the other in B
What is the issue with splitting graphs by the minimal cut?
It only considers external cluster connections and not internal connectivity so our minimum cut may be uninteresting or sub-optimal
What is the normalized cut?
Measures connectivity between groups in a graph relative to the density of each group
What is the formula for normalized cut?
ncut(A,B) = (cut(A,B)/vol(A)) + (cut(A,B)/vol(B))
What is the volume of a group in a graph?
The total weight of all edges with at least one endpoint within the group
What are the 3 basic stages of spectral clustering?
- Preprocessing: construct a matrix representation of the graph
- Decomposition: compute eigenvalues and eigenvectors of the matrix and map each point to a lower-dimensional representation based on one or more eigenvectors
- Grouping: Assign points to two or more clusters based on the new representation
What is an adjacency matrix?
A matrix representation of a graph that has an entry for each node pair.
It is symmetric if unweighted and undirected.
If the nodes are connected their entry is 1 or the weight of the edge
What is a degree matrix?
A matrix representation of a graph that has an entry for each node pair.
It is a diagonal matrix as we only fill out entries where a node intersects with itself
Each entry is the degree of the node
How do we compute the Laplacian Matrix?
L = D - A
Where D is the degree matrix and A is the adjacency matrix
How does the Laplacian matrix fit into spectral partitioning?
We build the Laplacian matrix of the graph, compute eigenvalues and eigenvectors of the matrix, and then group the points based on some criteria
How do we decide where to split an eigenvector to cluster the nodes?
Naive approach is to split at 0 or median value
There are other complicated expensive approaches
How can we partition a graph into k clusters instead of just 2?
- Recursive bi-partitioning where we recurively apply bi-partitioning hierarchically and divisively
- Cluster multiple eigenvectors and build a reduced space from them
Why do we not use recursive bi-partitioning for k-way spectral clustering?
It is inefficient and unstable