5. Graphs Flashcards by Branden Wheeler

What is the goal of graph partitioning?

To find densely linked clusters in a graph that represent communities of some kind

How well did you know this?

Not at all

Perfectly

What makes a good graph partition?

Maximizes the number of within-group connections
Minimizes the number of between-group connections

How well did you know this?

Not at all

Perfectly

What is the equation for the cut of a graph?

When you split a graph into 2 subgraphs A and B, you sum the weights of edges which have one endpoint in A and the other in B

How well did you know this?

Not at all

Perfectly

What is the issue with splitting graphs by the minimal cut?

It only considers external cluster connections and not internal connectivity so our minimum cut may be uninteresting or sub-optimal

How well did you know this?

Not at all

Perfectly

What is the normalized cut?

Measures connectivity between groups in a graph relative to the density of each group

How well did you know this?

Not at all

Perfectly

What is the formula for normalized cut?

ncut(A,B) = (cut(A,B)/vol(A)) + (cut(A,B)/vol(B))

How well did you know this?

Not at all

Perfectly

What is the volume of a group in a graph?

The total weight of all edges with at least one endpoint within the group

How well did you know this?

Not at all

Perfectly

What are the 3 basic stages of spectral clustering?

Preprocessing: construct a matrix representation of the graph
Decomposition: compute eigenvalues and eigenvectors of the matrix and map each point to a lower-dimensional representation based on one or more eigenvectors
Grouping: Assign points to two or more clusters based on the new representation

How well did you know this?

Not at all

Perfectly

What is an adjacency matrix?

A matrix representation of a graph that has an entry for each node pair.
It is symmetric if unweighted and undirected.
If the nodes are connected their entry is 1 or the weight of the edge

How well did you know this?

Not at all

Perfectly

What is a degree matrix?

A matrix representation of a graph that has an entry for each node pair.
It is a diagonal matrix as we only fill out entries where a node intersects with itself
Each entry is the degree of the node

How well did you know this?

Not at all

Perfectly

How do we compute the Laplacian Matrix?

L = D - A
Where D is the degree matrix and A is the adjacency matrix

How well did you know this?

Not at all

Perfectly

How does the Laplacian matrix fit into spectral partitioning?

We build the Laplacian matrix of the graph, compute eigenvalues and eigenvectors of the matrix, and then group the points based on some criteria

How well did you know this?

Not at all

Perfectly

How do we decide where to split an eigenvector to cluster the nodes?

Naive approach is to split at 0 or median value
There are other complicated expensive approaches

How well did you know this?

Not at all

Perfectly

How can we partition a graph into k clusters instead of just 2?

Recursive bi-partitioning where we recurively apply bi-partitioning hierarchically and divisively
Cluster multiple eigenvectors and build a reduced space from them

How well did you know this?

Not at all

Perfectly

Why do we not use recursive bi-partitioning for k-way spectral clustering?

It is inefficient and unstable

How well did you know this?

Not at all

Perfectly

What is the issue with spectral clustering?

It cannot perform overlapping communities. It only shows disjoin communities and splits the nodes into their groups

What is the trick to finding overlapping communities?

Overlapping communities tend to have higher edge density as they are linked to multiple groups

What is the community-affiliation graph model?

Generative model to create graph representation of communities

B(V, C, M, {pc}) where V is the set of nodes, C is the set of communities, M is the set of community memberships for the nodes, pc is the set of probabilities for each community c

For the AGM model when do we decide to link 2 nodes?

Each pair of nodes in a community are connected with the probability of that community

P(u,v) = 1 - Product(1-pc) for communities c shared by u and v

What is the formula for probability of an edge between 2 nodes within a single community if community memberships can have strengths?

P(u, v) = 1 - exp(-FuA * FvA)

Where FuA is the strength of u’s membership to A from the Factor matrix F
FvA is the strength of v’s membership to A from the factor matrix F

What is the formula for probability of an edge between 2 nodes for any community if community memberships can have strengths?

P(u, v) = 1 - exp(-Fu * Fv)

Where Fu is the vector of membership strengths for node u and Fv is the vector of membership strengths for node v.
Fv must be transposed in order to perform dot product correctly

What is the advantage of AGM over spectral clustering?

AGM can express a variety of community structures including non-overlapping, overlapping, and nested

How do we extend AGM to work for directed graphs?

Nodes can have outgoing memberships where the node sends an edge (outdegree) and incoming memberships where the node receives an edge (indegree).

We also use 2 factor matrices F and H. F is the outgoing community membership strengths and H is incoming

What is the formula for probability of an edge between 2 nodes for any community if community memberships can have strengths and the graph is directed?

P(u, v) = 1 - exp(-Fu * Hv)
Where Fu is the vector of outgoing community membership strengths for node u and Hv is the vector of incoming community membership strengths for node v

What is the purpose of graph trawling?

Finding small communities within a large web graph

How do we perform trawling on a graph?

1. View each node i as a set of nodes i points to 2. For a given support threshold, we want to find sets of size t that occur in s sets to create a complete bipartite graph Ks,t