Network analysis II - Network prediction Flashcards
Our goal is to predict various properties of the network. What are examples of such prediction tasks?
Node Properties
Is a sensor in a sensor network going to fail within the next 30 days?
Link Properties, Link Prediction
Is user A going to become a follower of user B?
Network Classification
Is a molecule a mutagen?
What kind of attributes can there be on a graph?
We can have node attributes.
We can also have attributes on edges, or multiple types of edges.
However, some nodes or attributes of the graph may be unobserved or simply unknown.
How can this problem be stated?
Given: a partially observed graph, what are its unobserved properties?
What two types of classification can be done?
Independent (Local) Classification
- Treat each entity to be classified (node, edge) as an independent case
- Use a standard prediction model to make predictions based on features constructed from the predictors
Collective Classification
- Take into account dependency of target attribute/link for different entities
- Predict target for all entities jointly or collectively
Explain independent classification
Given a set of node-pair features,
we compute their values for each node pair i,j so that they capture the local characteristics of that pair.
This gives us a table with one row per node pair.
We can then use a standard classifier (SVM etc.) for classification,
but we have to consider that most node pairs have the target value 0 (the imbalance problem).
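The feature table described above can be sketched in a few lines. This is only an illustration: the toy graph is made up, and the three features (common neighbors, Jaccard coefficient, preferential attachment) are assumptions chosen because they are common link-prediction features, not features named in these notes. The inverse-frequency class weights at the end are one possible way to counter the imbalance problem, not the only one.

```python
from itertools import combinations

# Hypothetical toy graph as an undirected adjacency dict (illustrative only).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e"},
    "e": {"d", "f"},
    "f": {"e"},
}

def pair_features(graph, i, j):
    """Local features for the node pair (i, j); the feature choice is an assumption."""
    ni = graph[i] - {j}  # neighbors of i, excluding j itself
    nj = graph[j] - {i}  # neighbors of j, excluding i itself
    common = ni & nj
    union = ni | nj
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        "pref_attachment": len(graph[i]) * len(graph[j]),
    }

# One row per node pair: the features plus the 0/1 target "linked".
rows = []
for i, j in combinations(sorted(graph), 2):
    row = {"pair": (i, j), **pair_features(graph, i, j)}
    row["linked"] = int(j in graph[i])
    rows.append(row)

positives = sum(r["linked"] for r in rows)
negatives = len(rows) - positives

# Inverse-frequency class weights: the rarer class gets the larger weight.
class_weight = {
    1: len(rows) / (2 * positives),
    0: len(rows) / (2 * negatives),
}
```

Each row of `rows` can be handed to any standard classifier as one independent training case. Even in this tiny graph the unlinked pairs outnumber the linked ones; in real, sparse networks the gap is typically far larger.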
Explain Homophily
For the lawyers / departments and work types case:
https://gyazo.com/e24b13f3707e34839dcf2857e9714c18
We can assume that collaborating lawyers are more likely to have the same Practice and/or Office location.
Sometimes we know the practice or the office of a node, and sometimes neither.
However, when we predict the practice for one node, we can assume a similar practice for many of the nodes it is linked to.
What types of homophily are there?
Homophily, a.k.a. Auto-Correlation
Linked entities are likely to share attribute values.
Link Homophily
(fact!) Entities are (more) likely to be linked, if they share common neighbors.
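A rough way to check the first kind of homophily (auto-correlation) is to compare how often linked pairs share an attribute value with how often arbitrary pairs do. The sketch below assumes made-up collaboration edges and practice labels; none of the names or values come from the lawyers data set itself.

```python
from itertools import combinations

# Hypothetical collaboration edges and practice labels (illustrative only).
edges = [("l1", "l2"), ("l2", "l3"), ("l1", "l3"), ("l4", "l5"), ("l3", "l4")]
practice = {
    "l1": "litigation",
    "l2": "litigation",
    "l3": "litigation",
    "l4": "corporate",
    "l5": "corporate",
}

def same_value_fraction(pairs, attr):
    """Fraction of the given node pairs whose endpoints share the attribute value."""
    pairs = list(pairs)
    same = sum(attr[u] == attr[v] for u, v in pairs)
    return same / len(pairs)

# Share of linked pairs with the same practice.
linked_rate = same_value_fraction(edges, practice)

# Baseline: share of all possible pairs with the same practice.
baseline_rate = same_value_fraction(combinations(sorted(practice), 2), practice)
```

If `linked_rate` is clearly above `baseline_rate`, linked entities share the attribute more often than chance pairing would suggest, which is what homophily predicts.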
For collective classification, what algorithm can we make use of?
The Iterative Independent Classification algorithm
(see figure)
We have node features X that depend on an attribute T, the target attribute that distinguishes the classes. We first compute the feature values for X and learn the model using the training data. We then repeatedly predict the missing values of T and recompute the feature values X. Lastly, we can retrain the model using the predicted values.
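The loop described above can be sketched as follows. Everything concrete here is an assumption made for illustration: the toy graph, the single feature X (fraction of labelled neighbors with T = 1), and the threshold "model", which stands in for whatever standard classifier would be used in practice.

```python
# Hypothetical toy graph: two groups joined by the edge c-d (illustrative only).
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
known = {"a": 1, "b": 1, "e": 0, "f": 0}  # observed values of the target T
unknown = [n for n in sorted(graph) if n not in known]

def feature(node, labels):
    """X: fraction of the node's labelled neighbors with T = 1 (0.5 if none are labelled)."""
    vals = [labels[n] for n in graph[node] if n in labels]
    return sum(vals) / len(vals) if vals else 0.5

def fit_threshold(labels):
    """Stand-in local model: midpoint between the class means of X.

    Assumes both classes occur among the labelled nodes.
    """
    xs = {n: feature(n, labels) for n in labels}
    pos = [xs[n] for n in labels if labels[n] == 1]
    neg = [xs[n] for n in labels if labels[n] == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def iterative_classification(max_iters=10):
    threshold = fit_threshold(known)  # learn the model from the training data
    labels = dict(known)
    for _ in range(max_iters):
        # Predict the missing values from the current feature values.
        predicted = {n: int(feature(n, labels) > threshold) for n in unknown}
        if all(labels.get(n) == predicted[n] for n in unknown):
            break  # predictions no longer change
        labels.update(predicted)  # X is recomputed from these labels next round
    threshold = fit_threshold(labels)  # retrain using the predicted values too
    return labels, threshold

labels, threshold = iterative_classification()
```

In this toy case the predictions stabilise after the first pass: node `c` takes the label of its group (`a`, `b`) and node `d` takes the label of its group (`e`, `f`), even though `c` and `d` are linked to each other.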
Give an example of the Iterative Independent Classification algorithm