3 Flashcards
Question 1
Social network data can be represented using
a) a sociogram.
b) an adjacency matrix.
c) an adjacency list.
d) all of the above.
d) all of the above.
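As a minimal sketch, the same toy network (hypothetical nodes and ties, not from the course material) can be stored as an adjacency matrix and as an adjacency list; a sociogram is simply the drawn version of the same data.

```python
# Hypothetical toy network: four people, undirected ties.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

index = {name: i for i, name in enumerate(nodes)}

# Adjacency matrix: entry [i][j] is 1 when nodes i and j are connected.
matrix = [[0] * len(nodes) for _ in nodes]
for u, v in edges:
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1  # symmetric because the ties are undirected

# Adjacency list: each node maps to the list of its neighbors.
adj_list = {name: [] for name in nodes}
for u, v in edges:
    adj_list[u].append(v)
    adj_list[v].append(u)
```

The matrix makes it cheap to check whether two given nodes are linked, while the list is usually more compact for sparse networks.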
Question 2
Which statement is NOT CORRECT?
a) The geodesic represents the shortest path between two nodes.
b) The closeness measures the extent to which a node is near to all other nodes in the network.
c) The betweenness counts the number of times that a node or edge occurs in the geodesics of the network.
d) The graph theoretic center is the node with the highest, minimum distance to all other nodes.
d) The graph theoretic center is the node with the highest, minimum distance to all other nodes.
Correction: the graph theoretic center is the node with the smallest maximum distance to all other nodes, i.e., the most central node in the network, or the node that can influence the other nodes fastest.
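A small sketch of these measures on a hypothetical five-node path network, assuming an unweighted, connected graph. Closeness is taken here as the inverse of the average geodesic length, which is one common definition; the function names are illustrative only.

```python
from collections import deque

# Hypothetical undirected network as an adjacency list: the path A-B-C-D-E.
graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

def geodesic_lengths(graph, source):
    """Breadth-first search: shortest-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def closeness(graph, node):
    """Inverse of the average geodesic length from node to all other nodes."""
    dist = geodesic_lengths(graph, node)
    total = sum(d for other, d in dist.items() if other != node)
    return (len(graph) - 1) / total

def graph_theoretic_center(graph):
    """Node whose maximum geodesic length to any other node is smallest."""
    return min(graph, key=lambda n: max(geodesic_lengths(graph, n).values()))

center = graph_theoretic_center(graph)   # "C": its farthest node is only 2 steps away
```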
Question 3
Which statement is NOT CORRECT?
a) Graph partitioning approaches try to split the whole graph into a predetermined number of clusters by optimizing the ratio between the within-community and between-community edges.
b) The min-max cut does not take the between-community edges into consideration.
c) In community mining, the min-cut metric chooses the partitioning so that the sum of the weights of the between-community edges or the cut is minimal.
d) In community mining, the ratio-cut metric takes into account the size of the communities.
b) The min-max cut does not take the between-community edges into consideration.
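The three cut metrics can be compared on a toy partition. This is a sketch under assumptions: the network and weights are hypothetical, and constant factors (such as a leading 1/2) differ between textbooks. It shows that the min-max cut does use the between-community edges, in the numerator of each term.

```python
# Hypothetical weighted, undirected network split into two communities.
edges = {
    ("A", "B"): 3, ("B", "C"): 2, ("A", "C"): 1,  # inside community 1
    ("D", "E"): 4,                                # inside community 2
    ("C", "D"): 1,                                # between the communities
}
communities = [{"A", "B", "C"}, {"D", "E"}]

def cut_weight(edges, community):
    """Total weight of the edges with exactly one endpoint in the community."""
    return sum(w for (u, v), w in edges.items()
               if (u in community) != (v in community))

def within_weight(edges, community):
    """Total weight of the edges with both endpoints in the community."""
    return sum(w for (u, v), w in edges.items()
               if u in community and v in community)

def min_cut(edges, communities):
    # Each between-community edge is seen from both sides, hence the division by 2.
    return sum(cut_weight(edges, c) for c in communities) / 2

def ratio_cut(edges, communities):
    # Divides each cut by the community size (number of nodes).
    return sum(cut_weight(edges, c) / len(c) for c in communities)

def min_max_cut(edges, communities):
    # Divides each cut by the within-community weight.
    return sum(cut_weight(edges, c) / within_weight(edges, c) for c in communities)
```

A partitioning algorithm would search for the split that minimizes the chosen metric; the sketch only evaluates one given split.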
Question 4
When doing community mining, the Girvan-Newman algorithm makes use of the
a) closeness.
b) betweenness.
c) graph theoretic center.
d) geodesic.
b) betweenness.
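A minimal sketch of the idea, assuming an unweighted, undirected network: repeatedly remove the edge with the highest betweenness until the network falls apart into more components. The edge betweenness is computed with a Brandes-style accumulation; the function names and the toy graph are hypothetical.

```python
from collections import deque

def edge_betweenness(graph):
    """Count, per undirected edge, the shortest paths that run through it."""
    score = {}
    for s in graph:
        # BFS from s, recording shortest-path counts (sigma) and predecessors.
        dist = {s: 0}
        sigma = {n: 0 for n in graph}
        sigma[s] = 1
        preds = {n: [] for n in graph}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing credit among predecessors.
        delta = {n: 0.0 for n in graph}
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1 + delta[w])
                edge = frozenset((v, w))
                score[edge] = score.get(edge, 0.0) + credit
                delta[v] += credit
    return score  # each pair is counted from both ends; the ranking is unaffected

def components(graph):
    """Connected components found by depth-first search."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in group:
                group.add(n)
                stack.extend(graph[n])
        seen |= group
        result.append(group)
    return result

def girvan_newman_split(graph):
    """Remove highest-betweenness edges until the number of components grows."""
    graph = {n: set(nbs) for n, nbs in graph.items()}  # work on a copy
    start = len(components(graph))
    while len(components(graph)) == start:
        scores = edge_betweenness(graph)
        if not scores:  # no edges left to remove
            break
        u, v = max(scores, key=scores.get)
        graph[u].discard(v)
        graph[v].discard(u)
    return components(graph)

# Hypothetical network: two triangles joined by the bridge C-D.
graph = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
split = girvan_newman_split(graph)
```

On this toy network the bridge carries every shortest path between the two triangles, so it is removed first and the two triangles remain as communities.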
Question 5
In bottom-up community mining,
a) one starts with the whole network and splits it up into communities.
b) one starts with one node (e.g., a fraudulent node) and adds more nodes to the community based on the links of this node.
b) one starts with one node (e.g., a fraudulent node) and adds more nodes to the community based on the links of this node.
Question 6
The modularity Q measure is a measure used to
a) determine the number of communities.
b) determine the separation between communities.
a) determine the number of communities.
Question 7
For strong communities, Q will approach
a) 0
b) 0.5
c) 1
d) 5
c) 1
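A sketch of how Q can be computed for an unweighted, undirected network, assuming the usual definition: the fraction of edges inside communities minus the fraction expected if the edges were placed at random with the same node degrees. The toy network and the function name are hypothetical.

```python
def modularity(edges, community_of):
    """Q = (1 / 2M) * sum over same-community node pairs of (A_ij - k_i * k_j / 2M)."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Fraction of edges that fall inside a community.
    inside = sum(1 for u, v in edges if community_of[u] == community_of[v]) / m
    # Expected fraction under random wiring that preserves the degrees.
    degree_sum = {}
    for node, k in degree.items():
        c = community_of[node]
        degree_sum[c] = degree_sum.get(c, 0) + k
    expected = sum((d / (2 * m)) ** 2 for d in degree_sum.values())
    return inside - expected

# Hypothetical network: two triangles joined by one bridge.
edges = [("A", "B"), ("A", "C"), ("B", "C"),
         ("D", "E"), ("D", "F"), ("E", "F"),
         ("C", "D")]
two_groups = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
one_group = {n: 1 for n in two_groups}

q_split = modularity(edges, two_groups)   # 6/7 - 1/2
q_lumped = modularity(edges, one_group)   # 0.0
```

Comparing Q across candidate splits is how the measure helps choose the number of communities: the split with the higher Q is preferred.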
Question 8
Which statement is NOT CORRECT?
a) Homophily is a concept that stems from sociology, where it is usually described as “people have a strong tendency to associate with others whom they perceive as being similar to themselves in some way, e.g., live in the same city, have the same hobbies or interests”.
b) The connectance of a network is the probability that two nodes are connected. Say we have a network with N nodes and M edges. The connectance is then the ratio between the actual number of edges and the number of edges if the network were fully connected, the latter being the number of combinations of 2 out of N.
c) Dyadicity measures the number of same label edges compared to what is expected in a random configuration of the network, in other words, if the labels were randomly distributed.
d) Heterophilicity measures the connectedness between nodes with different labels compared to what is expected in a random configuration of the network.
e) For a homophilic network, the dyadicity should be less than 1 and the heterophilicity bigger than 1.
e) For a homophilic network, the dyadicity should be less than 1 and the heterophilicity bigger than 1.
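The measures in this question can be sketched as follows, assuming binary labels and computing dyadicity for the label-1 edges, which is a common convention. The network, labels, and function name are hypothetical.

```python
from math import comb

def homophily_measures(labels, edges):
    """labels maps node -> 1 (e.g., fraudulent) or 0 (legitimate); edges are undirected pairs."""
    n = len(labels)
    n1 = sum(labels.values())
    n0 = n - n1
    connectance = len(edges) / comb(n, 2)  # p = M / C(N, 2)

    m11 = sum(1 for u, v in edges if labels[u] == 1 and labels[v] == 1)
    m10 = sum(1 for u, v in edges if labels[u] != labels[v])

    expected_m11 = comb(n1, 2) * connectance  # same-label edges under random labels
    expected_m10 = n1 * n0 * connectance      # cross-label edges under random labels
    dyadicity = m11 / expected_m11
    heterophilicity = m10 / expected_m10
    return connectance, dyadicity, heterophilicity

# Hypothetical network: three fraudulent and three legitimate nodes.
labels = {"F1": 1, "F2": 1, "F3": 1, "L1": 0, "L2": 0, "L3": 0}
edges = [("F1", "F2"), ("F2", "F3"), ("F1", "F3"),
         ("L1", "L2"), ("L2", "L3"), ("F3", "L1")]

p, d, h = homophily_measures(labels, edges)
```

Here the dyadicity is above 1 and the heterophilicity below 1, the pattern expected for a homophilic network.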
Question 9
Which statement is NOT CORRECT?
a) The relational neighbor classifier assumes homophily, or in other words, connected nodes or neighbors have a propensity to belong to the same class.
b) Relational logistic regression uses only network (or link) attributes.
c) Relational logistic regression usually gives good performance and is also very commonly applied in the industry.
d) Featurization is one of the most important techniques to account for social network effects.
b) Relational logistic regression uses only network (or link) attributes.
Correction: relational logistic regression looks at both local and network attributes.
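A minimal sketch of the relational neighbor idea, assuming binary class labels and weighted ties: a node's score for class 1 is the weighted share of its labelled neighbors that belong to class 1. The data and the function name are hypothetical.

```python
def relational_neighbor(node, neighbors, known_labels):
    """Weighted share of a node's labelled neighbors that belong to class 1."""
    total = 0.0
    class_one = 0.0
    for nb, weight in neighbors[node].items():
        if nb in known_labels:          # unlabelled neighbors are ignored
            total += weight
            class_one += weight * known_labels[nb]
    return class_one / total if total else None

# Hypothetical node X with three labelled neighbors and one unlabelled neighbor.
neighbors = {"X": {"A": 1, "B": 1, "C": 2, "U": 5}}
known_labels = {"A": 1, "B": 0, "C": 1}

score = relational_neighbor("X", neighbors, known_labels)
```

The classifier relies entirely on homophily: it uses no local attributes of X itself, which is where relational logistic regression differs.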
Question 10
Which statement about featurizing social networks is CORRECT?
a) A triangle is a group of three nodes that are all connected to each other. A fraud triangle is a triangle where both connecting nodes are fraudulent. A legit triangle is a triangle where both connecting nodes are legitimate. A semi-fraud triangle is a triangle where one connecting node is fraudulent and the other one legitimate.
b) The number of 1-hop paths indicates how many fraudulent nodes we encounter in all possible 1-hop steps from the node.
c) When doing social network featurization, it is important to include as many features as possible. Ideally, it is recommended to add all three types of features to the data: features representing the dependent variable characteristics of the neighbors, features representing the independent variable characteristics of the neighbors, and features representing network characteristics.
d) All statements are correct.
d) All statements are correct.
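Two of the network features named above, triangle counts and fraudulent 1-hop neighbors, can be sketched as follows. The network, the labels, and the function names are hypothetical, and labels are assumed to be 0 (legitimate) or 1 (fraudulent).

```python
from itertools import combinations

def triangle_features(node, graph, is_fraud):
    """Count fraud, legit, and semi-fraud triangles that include the node."""
    counts = {"fraud": 0, "legit": 0, "semi_fraud": 0}
    for a, b in combinations(sorted(graph[node]), 2):
        if b in graph[a]:  # node, a, and b are all connected to each other
            frauds = is_fraud[a] + is_fraud[b]  # 0, 1, or 2 fraudulent connecting nodes
            counts[("legit", "semi_fraud", "fraud")[frauds]] += 1
    return counts

def fraud_one_hop(node, graph, is_fraud):
    """Number of fraudulent nodes reachable in one step from the node."""
    return sum(is_fraud[nb] for nb in graph[node])

# Hypothetical network around node X.
graph = {
    "X": {"A", "B", "C", "D"},
    "A": {"X", "B"},
    "B": {"X", "A", "C"},
    "C": {"X", "B", "D"},
    "D": {"X", "C"},
}
is_fraud = {"A": 1, "B": 1, "C": 0, "D": 0, "X": 0}

features = triangle_features("X", graph, is_fraud)
one_hop = fraud_one_hop("X", graph, is_fraud)
```

Each returned value would become one column in the modelling data set for node X.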
Question 11
Which statement about PageRank is NOT CORRECT?
a) PageRank is an example of a collective inference procedure, which was developed by Page and Brin in 1999. It is the basis of Google’s famous search engine.
b) The PageRank value represents the probability of visiting a web page. It uses the democratic structure of the web as it is reflected by its link structure.
c) The PageRank value of page A will be high if another page B refers to A, and B has a low PageRank value with a lot of outgoing links.
d) PageRank accounts for the probability of random surfing behavior.
c) The PageRank value of page A will be high if another page B refers to A, and B has a low PageRank value with a lot of outgoing links.
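A sketch of PageRank as a power iteration, assuming a damping factor of 0.85 (a commonly used value, not stated in the source) and that every link target also appears as a key. The link structure and the function name are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it refers to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1 / n for p in pages}
    for _ in range(iterations):
        # Random-surfer jump: every page receives the same baseline share.
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page divides its rank evenly over its outgoing links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical web of four pages.
links = {"A": ["C"], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}
rank = pagerank(links)
```

The `share` line shows why statement c) is wrong: a referring page with a low rank and many outgoing links passes on very little to each page it refers to.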