Network models Flashcards
Erdős-Rényi model
What is the definition of the Erdős-Rényi model?
We take N nodes and connect each of the N(N − 1)/2 node pairs independently with uniform probability p.
- it has the small world property
- average degree: ⟨k⟩ = (N − 1)p ≃ Np
- # of expected links: L = pN(N − 1)/2
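The definition maps directly to code; a minimal sketch in plain Python (no graph library assumed):

```python
import random

def erdos_renyi(n, p, seed=None):
    """Connect each of the n(n-1)/2 node pairs independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

edges = erdos_renyi(1000, 0.01, seed=42)
# expected number of links: p * N * (N - 1) / 2 = 0.01 * 1000 * 999 / 2 ≈ 4995
print(len(edges))
```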
Erdős-Rényi model
What characterizes the degree distribution of an Erdős-Rényi graph? What happens in the N —» ∞ case?
It’s a binomial distribution.
p(k) = (N above k) p^k (1 − p)^(N−k)
- bell-shaped curve
- ⟨k⟩ = Np
- variance: Var(k) = ⟨k^2⟩ − ⟨k⟩^2 = Np(1 − p)
N —» ∞ case: Poisson distribution
p(k) ≃ (⟨k⟩^k/k!) exp(−⟨k⟩)
- it must remain sparse, so ⟨k⟩ = Np = const. —» p —» 0
- Var(k) = ⟨k⟩(1 − p) = const., so the fluctuations become negligible compared to the system size —» very homogeneous network
- this is the more realistic case for large networks
Real networks, however, are inhomogeneous.
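A quick numerical check of the Poisson limit (a sketch; N and p are illustrative):

```python
import math
import random
from collections import Counter

def er_degrees(n, p, rng):
    """Degree sequence of one G(n, p) sample."""
    deg = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                deg[i] += 1
                deg[j] += 1
    return deg

n, p = 2000, 0.005          # <k> = Np = 10, sparse regime
counts = Counter(er_degrees(n, p, random.Random(0)))
k_mean = n * p
# empirical p(k) vs. Poisson pmf <k>^k exp(-<k>) / k!
for k in (5, 10, 15):
    empirical = counts[k] / n
    poisson = k_mean ** k * math.exp(-k_mean) / math.factorial(k)
    print(k, round(empirical, 3), round(poisson, 3))
```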
Erdős-Rényi model
What is the clustering coefficient in the E-R graph?
The E-R graph is very democratic: we expect all nodes to have more or less the same C, thus C(i) ≃ ⟨C⟩. The clustering coefficient can be interpreted as the probability that two neighbors of a node are connected, and since every pair is linked independently in the E-R graph: ⟨C⟩ = p.
Erdős-Rényi model
What is the relative size of a giant component in the E-R model? Where is the critical point and what does it tell us?
S ≈ 1 − exp(−⟨k⟩S), derived from the probability that a node is not in the giant component, u = 1 – S. This equation can be solved numerically.
Critical point: where S first becomes larger than zero
- condition for a non-trivial solution of the above equation: the derivative of the right-hand side with respect to S at S = 0 must be at least one, i.e. ⟨k⟩ ≥ 1
- thus the critical point is at ⟨k⟩ = 1
- for ⟨k⟩ > 1 we have a giant component in the E-R graph
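The self-consistency equation can be solved by simple fixed-point iteration; a minimal sketch:

```python
import math

def giant_component_size(k_mean, iters=200):
    """Fixed-point iteration of S = 1 - exp(-<k> S), starting from S = 1."""
    s = 1.0
    for _ in range(iters):
        s = 1.0 - math.exp(-k_mean * s)
    return s

# below the critical point <k> = 1 the only solution is S = 0;
# above it, a non-trivial giant component appears
for k in (0.5, 1.5, 2.0, 4.0):
    print(k, round(giant_component_size(k), 4))
```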
Watts-Strogatz model
What’s the basic principle of the Watts-Strogatz model for a network? What’s the purpose of this model?
We start from a regular ring of N nodes in which each node is linked to its q nearest neighbors on each side, then we rewire each link at random with probability β.
The purpose is to have a model that is both small-world and locally highly clustered.
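The construction can be sketched in a few lines of Python (a hedged sketch: the rewired link keeps one endpoint and draws a fresh partner, avoiding self-loops and duplicate links):

```python
import random

def watts_strogatz(n, q, beta, seed=None):
    """Ring of n nodes, each linked to its q nearest neighbors on each side;
    every link is then rewired with probability beta."""
    rng = random.Random(seed)
    ring = {frozenset((i, (i + d) % n)) for i in range(n) for d in range(1, q + 1)}
    edges = set()
    for e in sorted(tuple(sorted(x)) for x in ring):   # deterministic order
        i, j = e
        if rng.random() < beta:
            # rewire: keep endpoint i, draw a new partner k
            while True:
                k = rng.randrange(n)
                new = frozenset((i, k))
                if k != i and new not in ring and new not in edges:
                    edges.add(new)
                    break
        else:
            edges.add(frozenset(e))
    return edges

g0 = watts_strogatz(200, 3, 0.0, seed=1)   # pure ring, n*q = 600 links
g1 = watts_strogatz(200, 3, 0.1, seed=1)   # a few shortcuts appear
```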
Watts-Strogatz model
What is the average distance and the clustering coefficients in the two extreme limits for β?
β = 0: no rewiring at all
- highly clustered: ⟨C⟩ = (3q − 3)/(4q − 2) —» 3/4
- not small-world: ⟨ℓ⟩ ≃ N/4q (the maximal distance along the ring is ≃ N/2q hops, and the average distance from any chosen node is half of that)
β = 1: rewiring all links at random
- becomes similar to an E-R graph
- small-world: ⟨ℓ⟩ ∼ ln N/ln⟨k⟩ = ln N/ln(2q)
- not highly clustered: ⟨C⟩ = p(ER) = 2q/(N − 1) —» 2q/N
p(ER): p parameter of the E-R model
Watts-Strogatz model
Why does the Watts-Strogatz model work?
Because there is a wide range of β values where C is still relatively high, whereas ℓ is already relatively low, so it can be highly clustered and small-world at the same time.
- a few “long distance” random links can drastically reduce ⟨ℓ⟩: a relatively small number of rewired links can turn the system into a small world network
- however, in order to destroy the large amount of triangles responsible for the high ⟨C⟩ in the initial state, we would need significantly more rewiring steps
Conclusion: it takes a lot of randomness to ruin the clustering, but a very small amount to overcome locality
Watts-Strogatz model
How can we give an optimal value for β so that it’s small-world and highly clustered at the same time?
If β is too small, the network is not small-world; if β is too large, the network is small-world, but the clustering coefficient is also reduced by the rewirings.
It can be derived analytically that the ℓ(β)-curve for all W-S graphs can be collapsed into a single curve via
ℓ = (N/q) f(βqN),
where f(x) is a universal function, βqN is the # of random links and the transition point is at β(c)qN = 1.
- if βqN ≪ 1, there are no random links and ⟨ℓ⟩ ∼ N
- if βqN ≫ 1, there are many random links and ⟨ℓ⟩ ∼ ln N
Barabási-Albert model
What’s the purpose of the Barabási-Albert model? What’s the basic principle?
The goal is to build a network that is scale-free (unlike the E-R and W-S graphs).
In this model, the number of nodes isn’t fixed, it’s a growing network over time.
- we start from a very small core (e.g.: single link, connected E-R graph with a few nodes)
- at every time step: one new node with m new links
- preferential attachment: when connecting the newcomers to the existing part, high degree nodes are chosen w/ higher probability —» P(choosing node i) = P(i) ~ k(i)
- Real networks are growing ones.
- The initial core has to be at least of size m.
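A minimal sketch of the growth rule in Python (the degree-proportional "pool" trick is an assumed implementation detail, not part of the model's definition; m ≥ 2 assumed):

```python
import random

def barabasi_albert(n, m, seed=None):
    """Growing network: start from a path of m nodes, then each new node
    attaches m links to distinct existing nodes, chosen with probability
    proportional to degree (preferential attachment)."""
    rng = random.Random(seed)
    edges = [(i, i + 1) for i in range(m - 1)]        # initial core
    # the pool lists every node once per unit of degree, so a uniform draw
    # from it implements P(i) ~ k(i)
    pool = [v for e in edges for v in e]
    for new in range(m, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))
        for t in targets:
            edges.append((new, t))
            pool.extend((new, t))
    return edges

edges = barabasi_albert(500, 3, seed=0)
# first-mover advantage: early nodes tend to accumulate the largest degrees
```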
Barabási-Albert model
How does the degree evolve as a function of time?
An analytical function can be derived: k(t) ~ t^(β), where β = 1/2.
- β is the dynamical exponent
- first mover advantage: the earlier a node arrives, the larger the advantage (as in gaining more connections)
- the cumulative distribution: P(k) = 1 – (m/k)² —» power law!
- p(k) = 2m²/k^3
Conclusion: γ = 3 means a scale-free network
The function is a step function at every time step (the degree can increase by 1 at most).
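The power law follows from the degree evolution in a few lines (a sketch of the standard derivation; node i arrives at a time t(i) distributed uniformly on [0, t]):

```latex
% k_i(t) = m (t/t_i)^{1/2}  =>  k_i(t) < k  iff  t_i > m^2 t / k^2
P(k_i(t) < k) = P\!\left(t_i > \frac{m^2 t}{k^2}\right)
             = 1 - \frac{m^2}{k^2} \quad (t_i \sim \mathrm{Uniform}[0, t])
\qquad\Longrightarrow\qquad
p(k) = \frac{\partial P}{\partial k} = \frac{2m^2}{k^3}
```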
Barabási-Albert model
Why is the preferential attachment necessary?
Because with growth but uniform (non-preferential) attachment the degree distribution becomes exponential: the system would be homogeneous and not scale-free.
Barabási-Albert model
What is the clustering coefficient in the B-A model?
- The probability that nodes i and j, introduced at times t(i) and t(j), are connected: P(i − j) = (m/2)(t(i)t(j))^(−1/2)
- The number of links between the neighbors of node l: n(l) = m^3/16t(l) (ln N)²
In the end: C = m(ln N)²/8N, C does not depend on the node.
Barabási-Albert model
How can we modify the B-A model so that it describes real networks better?
We involve a uniform additive fitness a alongside the degree in the attachment process.
- a ∈ [0, m]
- when attaching a new node, the probability of node i gaining a new link: P(i) ~ k(i) – a
- we assume this leads to a scale-free distribution
- in the large degree regime (k ≫ 1): k(i)(t) = m(t/t(i))^{1/(2−a/m)}
- cumulative degree distribution: P(k) ≡ P(k(i) < k) = 1 − (m/k)^{2−a/m} —» p(k) = (2 − a/m) m^{2−a/m} k^{−3+a/m}
- we can tune the degree exponent γ = 3 − a/m between 2 and 3 by changing a
- this yields a tunable degree exponent
- Assumptions for large t: N ∼ t, L ∼ mt; the formula for k(i)(t) follows from the ∂k(i)/∂t = mP(i) differential equation.
- At a = 0 we recover the original B-A model (γ = 3).
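The exponent can be sketched from the mean-field growth equation (using the assumptions above, ∑(j) k(j) ≃ 2mt and N ∼ t, so ∑(j)(k(j) − a) ≃ (2m − a)t):

```latex
\frac{\partial k_i}{\partial t} = m\,\Pi(i)
  = m\,\frac{k_i - a}{\sum_j (k_j - a)}
  \simeq \frac{k_i - a}{(2 - a/m)\,t}
\qquad\Longrightarrow\qquad
k_i(t) - a = (m - a)\left(\frac{t}{t_i}\right)^{1/(2 - a/m)}
```

using the initial condition k(i)(t(i)) = m; for k ≫ a this reproduces the stated k(i)(t) up to a constant prefactor.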
Barabási-Albert model
How can we improve on the additive fitness model?
To enable newcomers to overtake old nodes in the B-A model we need a further multiplicative fitness η drawn from some distribution ρ(η).
- the probability for node i to acquire new links: P(i) = η(i)k(i)/∑j η(j)k(j)
- the time evolution of the degree depends also on the fitness: k(i)(t) ∼ t^{β(η(i))}, β(η) = cη
- this leads to a network where the “fittest” node will become the largest node in the long run (instead of the oldest)
β is the dynamical exponent here.
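A sketch of the multiplicative-fitness growth in Python; the uniform fitness distribution ρ(η) is an illustrative assumption, and m ≥ 2 is assumed:

```python
import random

def fitness_model(n, m, seed=None):
    """Like the B-A model, but a new node picks each target with probability
    proportional to eta_i * k_i; fitnesses eta_i drawn from Uniform(0, 1]."""
    rng = random.Random(seed)
    eta = [1.0 - rng.random() for _ in range(n)]      # in (0, 1]
    deg = [0] * n
    edges = [(i, i + 1) for i in range(m - 1)]        # initial path
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    for new in range(m, n):
        weights = [eta[i] * deg[i] for i in range(new)]
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(range(new), weights=weights)[0])
        for t in targets:
            edges.append((new, t))
            deg[new] += 1
            deg[t] += 1
    return edges, eta, deg

edges, eta, deg = fitness_model(300, 2, seed=3)
# in the long run the fittest nodes, not merely the oldest, become the hubs
```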
Barabási-Albert model
How can the multiplicative fitness be mapped to quantum gases?
- fitness η —» energy level ε
- new node with fitness η —» new energy level ε
- link pointing to node η —» particle at level ε
- network —» quantum gas
Depending on ρ(η), we observe 3 phases:
- scale-free: ρ(η) = δ(η − η0)
- fit-gets-rich: nodes have diff. η, scale-free p(k), largest hubs are the fittest nodes in the long run
- Bose-Einstein condensation: nodes have diff. η, fittest node captures a finite fraction of the links, even when N → ∞
Barabási-Albert model
How can we modify the B-A model so that it’s highly clustered?
Klemm-Eguiluz model: each node can be active or inactive, and there are m active nodes in the network at any moment
Holme-Kim model: extra triad formation steps
- after each preferential-attachment step, with probability p, a triad formation step follows: the new node also links to a randomly chosen neighbor of the node it has just attached to, closing a triangle
- leads to tunable clustering coefficient
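A hedged sketch of the Holme-Kim growth rule (the exact bookkeeping of when triad formation replaces a preferential-attachment step is simplified here; m ≥ 2 assumed):

```python
import random

def holme_kim(n, m, p_t, seed=None):
    """B-A growth plus triad formation: with probability p_t a link goes to a
    random neighbor of the previous target, closing a triangle."""
    rng = random.Random(seed)
    nbrs = {i: set() for i in range(n)}
    pool = []                                  # degree-proportional pool

    def add_edge(a, b):
        nbrs[a].add(b)
        nbrs[b].add(a)
        pool.extend((a, b))

    for i in range(m - 1):                     # initial path of m nodes
        add_edge(i, i + 1)
    for new in range(m, n):
        last = None
        links = 0
        while links < m:
            cand = None
            if last is not None and rng.random() < p_t:
                free = list(nbrs[last] - nbrs[new] - {new})
                if free:
                    cand = rng.choice(free)    # triad formation
            if cand is None:
                cand = rng.choice(pool)        # preferential attachment
                if cand == new or cand in nbrs[new]:
                    continue
                last = cand
            add_edge(new, cand)
            links += 1
    return nbrs

nbrs = holme_kim(200, 3, 0.5, seed=2)
```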
Configuration model
What are the basic principles of the configuration model?
The goal is to generate a network with N nodes and a given p(k) corresponding to a random sample from all possible realizations.
- first we draw N times from p(k) to obtain the degree sequence, then the “half links” are joined to connect the nodes with each other
- we start with the largest node to end up with a simple graph rather than a pseudo-graph
- this graph will not be completely random, so randomization (e.g. degree-preserving link swaps) is needed; the goal is to preserve the degree sequence
- probability of two nodes i and j being connected: P(i − j) = k(i)k(j)/2L
The model is important because the significance of a measured network property can be assessed by comparing it against such randomized graphs with the same degree sequence.
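Stub matching in a few lines of Python (this plain version may produce self-loops and multi-links, i.e. a pseudo-graph):

```python
import random
from collections import Counter

def configuration_model(degrees, seed=None):
    """Give node i exactly degrees[i] half-links ('stubs'), shuffle all
    stubs, and pair them up; the degree sequence is preserved exactly."""
    stubs = [i for i, k in enumerate(degrees) for _ in range(k)]
    assert len(stubs) % 2 == 0, "degree sequence must have an even sum"
    random.Random(seed).shuffle(stubs)
    return list(zip(stubs[::2], stubs[1::2]))

edges = configuration_model([3, 3, 2, 2, 1, 1], seed=1)
realized = Counter(v for e in edges for v in e)   # realized degrees
```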
Hidden variable model
How to generate a random graph with degree correlations?
With the hidden variable model.
- define a distribution ρ(h) for the hidden variables, and a symmetric connection function r(h, h′) for the linking probabilities
- we take N nodes, and for each we draw its hidden variable from ρ(h)
- every node pair i and j is connected with probability P(i − j) = r(h(i), h(j))
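A minimal sketch of the generation recipe; the particular ρ(h) and r(h, h′) below are illustrative choices, not part of the model:

```python
import random

def hidden_variable_graph(n, rho_sample, r, seed=None):
    """Draw h_i from rho for every node, then link each pair (i, j)
    independently with probability r(h_i, h_j).
    rho_sample and r are caller-supplied functions (assumed interfaces)."""
    rng = random.Random(seed)
    h = [rho_sample(rng) for _ in range(n)]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < r(h[i], h[j])]
    return h, edges

# illustrative choice: h uniform on {1,...,5}; the sparse connection
# function r(h, h') = h h' / (<h> N) makes the expected degree of a node
# equal to its hidden variable h
n, mean_h = 400, 3.0
h, edges = hidden_variable_graph(
    n,
    rho_sample=lambda rng: rng.randint(1, 5),
    r=lambda a, b: a * b / (mean_h * n),
    seed=7,
)
```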
Hidden variable model
How can we describe the overall average degree in the hidden variable model?
P(k ∣ h): the conditional probability that a node has degree k given its hidden variable is h
- degree distribution: p(k) = ∑(h) P(k ∣ h)ρ(h)
- normalization: ∑(k) P(k ∣ h) = 1 for all h
- average degree of nodes with given h: k(h) = ∑(k) kP(k ∣ h)
- overall average degree: ⟨k⟩ = ∑(k) kp(k) = ∑(k) k ∑(h) P(k ∣ h)ρ(h) = ∑(h) k(h)ρ(h)
Hidden variable model
What is the distribution of a node with a given hidden variable connecting to other nodes?
Unfolding: P(k ∣ h) = ∑(k1,…,kc) P(h)(k1 ∣ h1) P(h)(k2 ∣ h2) … P(h)(kc ∣ hc) δ(k1+k2+⋯+kc, k), where P(h)(k(i) ∣ h(i)) denotes the probability that a node with hidden variable h is connected altogether with k(i) links to nodes with hidden variable h(i).
- h(c): maximal value of h
Since links are introduced independently of each other, each factor is a binomial distribution:
P(h)(k(i) ∣ h(i)) = (N(i) over k(i)) r(h(i), h)^k(i) [1 − r(h(i), h)]^(N(i)−k(i)),
where N(i) = Nρ(h(i)) is the number of nodes with hidden variable h(i).
Hidden variable model
What’s the generating function of the hidden variable model? How can we describe the average degree with it? How does this change if the network is sparse?
The generating function of the binomial distribution above: G(h)(i)(z) = [1 − (1 − z)r(h(i), h)]^(N(i)); the full degree is generated by the product G(z) = ∏(i) G(h)(i)(z).
k(h) = G′(z)∣(z=1) = [ln G(z)]′∣(z=1) = N ∑(h′)ρ(h′)r(h′, h)
Sparse case: k(h) is independent of N
- if ρ(h) is size independent: r(h′, h) = C(h′, h)/N is a natural assumption, where C(h′, h) is a size independent “nice” function
- G(z) ≃ exp[(z − 1) ∑(h′)ρ(h′)C(h′, h)] = exp[(z − 1)k(h)] —» the probability mass function is the Poisson distribution
- k(h) = N ∑(h′) ρ(h′)r(h′, h) = ∑(h′) ρ(h′)C(h′, h)
r(h(i),h): connection function
Hidden variable model
What does the degree correlation look like in this model?
- Defining P(h′ ∣ h): conditional probability that a link is connecting to a node with hidden variable h′, given that the node at the other end has h
- Based on P(h′ ∣ h), P(k′ ∣ k) can be defined as well
- ANND can be written as: k(nn)(k) = [1/p(k)] ∑(h)k(nn)(h)P(k ∣ h)ρ(h)
Hidden variable model
How to generate a random graph with prescribed degree correlations?
We can use the method of hidden degrees.
Analogously to e(k′,k), if we specify a joint probability mass function P(h′, h), the distribution of the hidden variables can be written as: ρ(h) = ⟨h⟩/h ∑(h′) P(h′, h).
- # of nodes with h is N(h) = Nρ(h) which implies k(h) = h —» P(k ∣ h) = h^(k)exp(−h)/k!
- p(k) = ∑(h) h^(k)exp(−h)/k! ρ(h)
- k(nn)(k) = 1/p(k) ∑(h) k(nn)(h) h^(k)exp(−h)/k! ρ(h)
In the large degree regime: p(k) ∼ ρ(h = k), k(nn)(k) ∼ k(nn)(h = k)