5. Graphs Flashcards

1
Q

What is the goal of graph partitioning?

A

To find densely linked clusters in a graph that represent communities of some kind

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What makes a good graph partition?

A
  1. Maximizes the number of within-group connections
  2. Minimizes the number of between-group connections
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is the equation for the cut of a graph?

A

When you split a graph into 2 subgraphs A and B, you sum the weights of edges which have one endpoint in A and the other in B

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is the issue with splitting graphs by the minimal cut?

A

It only considers external cluster connections and not internal connectivity so our minimum cut may be uninteresting or sub-optimal

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is the normalized cut?

A

Measures connectivity between groups in a graph relative to the density of each group

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is the formula for normalized cut?

A

ncut(A,B) = (cut(A,B)/vol(A)) + (cut(A,B)/vol(B))

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is the volume of a group in a graph?

A

The total weight of all edges with at least one endpoint within the group

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are the 3 basic stages of spectral clustering?

A
  1. Preprocessing: construct a matrix representation of the graph
  2. Decomposition: compute eigenvalues and eigenvectors of the matrix and map each point to a lower-dimensional representation based on one or more eigenvectors
  3. Grouping: Assign points to two or more clusters based on the new representation
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is an adjacency matrix?

A

A matrix representation of a graph that has an entry for each node pair.
It is symmetric if unweighted and undirected.
If the nodes are connected their entry is 1 or the weight of the edge

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is a degree matrix?

A

A matrix representation of a graph that has an entry for each node pair.
It is a diagonal matrix as we only fill out entries where a node intersects with itself
Each entry is the degree of the node

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How do we compute the Laplacian Matrix?

A

L = D - A
Where D is the degree matrix and A is the adjacency matrix

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

How does the Laplacian matrix fit into spectral partitioning?

A

We build the Laplacian matrix of the graph, compute eigenvalues and eigenvectors of the matrix, and then group the points based on some criteria

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How do we decide where to split an eigenvector to cluster the nodes?

A

Naive approach is to split at 0 or median value
There are other complicated expensive approaches

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

How can we partition a graph into k clusters instead of just 2?

A
  1. Recursive bi-partitioning where we recurively apply bi-partitioning hierarchically and divisively
  2. Cluster multiple eigenvectors and build a reduced space from them
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Why do we not use recursive bi-partitioning for k-way spectral clustering?

A

It is inefficient and unstable

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is the issue with spectral clustering?

A

It cannot perform overlapping communities. It only shows disjoin communities and splits the nodes into their groups

17
Q

What is the trick to finding overlapping communities?

A

Overlapping communities tend to have higher edge density as they are linked to multiple groups

18
Q

What is the community-affiliation graph model?

A

Generative model to create graph representation of communities

B(V, C, M, {pc}) where V is the set of nodes, C is the set of communities, M is the set of community memberships for the nodes, pc is the set of probabilities for each community c

19
Q

For the AGM model when do we decide to link 2 nodes?

A

Each pair of nodes in a community are connected with the probability of that community

P(u,v) = 1 - Product(1-pc) for communities c shared by u and v

20
Q

What is the formula for probability of an edge between 2 nodes within a single community if community memberships can have strengths?

A

P(u, v) = 1 - exp(-FuA * FvA)

Where FuA is the strength of u’s membership to A from the Factor matrix F
FvA is the strength of v’s membership to A from the factor matrix F

21
Q

What is the formula for probability of an edge between 2 nodes for any community if community memberships can have strengths?

A

P(u, v) = 1 - exp(-Fu * Fv)

Where Fu is the vector of membership strengths for node u and Fv is the vector of membership strengths for node v.
Fv must be transposed in order to perform dot product correctly

22
Q

What is the advantage of AGM over spectral clustering?

A

AGM can express a variety of community structures including non-overlapping, overlapping, and nested

23
Q

How do we extend AGM to work for directed graphs?

A

Nodes can have outgoing memberships where the node sends an edge (outdegree) and incoming memberships where the node receives an edge (indegree).

We also use 2 factor matrices F and H. F is the outgoing community membership strengths and H is incoming

24
Q

What is the formula for probability of an edge between 2 nodes for any community if community memberships can have strengths and the graph is directed?

A

P(u, v) = 1 - exp(-Fu * Hv)
Where Fu is the vector of outgoing community membership strengths for node u and Hv is the vector of incoming community membership strengths for node v

25
Q

What is the purpose of graph trawling?

A

Finding small communities within a large web graph

26
Q

How do we perform trawling on a graph?

A
  1. View each node i as a set of nodes i points to
  2. For a given support threshold, we want to find sets of size t that occur in s sets to create a complete bipartite graph Ks,t