5. Graphs Flashcards
What is the goal of graph partitioning?
To find densely linked clusters in a graph that represent communities of some kind
What makes a good graph partition?
- Maximizes the number of within-group connections
- Minimizes the number of between-group connections
What is the equation for the cut of a graph?
When you split a graph into 2 subgraphs A and B, you sum the weights of edges which have one endpoint in A and the other in B
What is the issue with splitting graphs by the minimal cut?
It only considers external cluster connections and not internal connectivity so our minimum cut may be uninteresting or sub-optimal
What is the normalized cut?
Measures connectivity between groups in a graph relative to the density of each group
What is the formula for normalized cut?
ncut(A,B) = (cut(A,B)/vol(A)) + (cut(A,B)/vol(B))
What is the volume of a group in a graph?
The total weight of all edges with at least one endpoint within the group
What are the 3 basic stages of spectral clustering?
- Preprocessing: construct a matrix representation of the graph
- Decomposition: compute eigenvalues and eigenvectors of the matrix and map each point to a lower-dimensional representation based on one or more eigenvectors
- Grouping: Assign points to two or more clusters based on the new representation
What is an adjacency matrix?
A matrix representation of a graph that has an entry for each node pair.
It is symmetric if unweighted and undirected.
If the nodes are connected their entry is 1 or the weight of the edge
What is a degree matrix?
A matrix representation of a graph that has an entry for each node pair.
It is a diagonal matrix as we only fill out entries where a node intersects with itself
Each entry is the degree of the node
How do we compute the Laplacian Matrix?
L = D - A
Where D is the degree matrix and A is the adjacency matrix
How does the Laplacian matrix fit into spectral partitioning?
We build the Laplacian matrix of the graph, compute eigenvalues and eigenvectors of the matrix, and then group the points based on some criteria
How do we decide where to split an eigenvector to cluster the nodes?
Naive approach is to split at 0 or median value
There are other complicated expensive approaches
How can we partition a graph into k clusters instead of just 2?
- Recursive bi-partitioning where we recurively apply bi-partitioning hierarchically and divisively
- Cluster multiple eigenvectors and build a reduced space from them
Why do we not use recursive bi-partitioning for k-way spectral clustering?
It is inefficient and unstable
What is the issue with spectral clustering?
It cannot perform overlapping communities. It only shows disjoin communities and splits the nodes into their groups
What is the trick to finding overlapping communities?
Overlapping communities tend to have higher edge density as they are linked to multiple groups
What is the community-affiliation graph model?
Generative model to create graph representation of communities
B(V, C, M, {pc}) where V is the set of nodes, C is the set of communities, M is the set of community memberships for the nodes, pc is the set of probabilities for each community c
For the AGM model when do we decide to link 2 nodes?
Each pair of nodes in a community are connected with the probability of that community
P(u,v) = 1 - Product(1-pc) for communities c shared by u and v
What is the formula for probability of an edge between 2 nodes within a single community if community memberships can have strengths?
P(u, v) = 1 - exp(-FuA * FvA)
Where FuA is the strength of u’s membership to A from the Factor matrix F
FvA is the strength of v’s membership to A from the factor matrix F
What is the formula for probability of an edge between 2 nodes for any community if community memberships can have strengths?
P(u, v) = 1 - exp(-Fu * Fv)
Where Fu is the vector of membership strengths for node u and Fv is the vector of membership strengths for node v.
Fv must be transposed in order to perform dot product correctly
What is the advantage of AGM over spectral clustering?
AGM can express a variety of community structures including non-overlapping, overlapping, and nested
How do we extend AGM to work for directed graphs?
Nodes can have outgoing memberships where the node sends an edge (outdegree) and incoming memberships where the node receives an edge (indegree).
We also use 2 factor matrices F and H. F is the outgoing community membership strengths and H is incoming
What is the formula for probability of an edge between 2 nodes for any community if community memberships can have strengths and the graph is directed?
P(u, v) = 1 - exp(-Fu * Hv)
Where Fu is the vector of outgoing community membership strengths for node u and Hv is the vector of incoming community membership strengths for node v
What is the purpose of graph trawling?
Finding small communities within a large web graph
How do we perform trawling on a graph?
- View each node i as a set of nodes i points to
- For a given support threshold, we want to find sets of size t that occur in s sets to create a complete bipartite graph Ks,t