Network Analysis Flashcards
Data Tools and Techniques
- Basic Data Manipulation and Analysis
- Data Mining
- Machine Learning
- Data Visualization
- Data Collection and Preparation
Over a specific type of data
Performing well-defined computations or asking well-defined questions (“queries”)
Basic Data Manipulation and Analysis
Looking for patterns in data
Data Mining
Using data to make inferences or predictions
Machine Learning
Graphical depiction of data
Data Visualization
A _ is a collection of nodes (or vertices) and edges (or links) that represent relationships or connections between entities.
Network (graph)
Note: if a dash (-) appears inside a name in the code (e.g. with-labels), read it as an underscore (with_labels).
Load graph from CSV file with no header
import networkx as nx
with open('Friends.csv') as f:
    G = nx.read_edgelist(f, delimiter=',', nodetype=str)
print(G)
~>
Graph with 10 nodes and 15 edges
Display graph
nx.draw(G, with_labels=True, node_size=1500, node_color='c')
Displays graph
~
If Directed Graph
with open('Follows.csv') as f:
    D = nx.read_edgelist(f, delimiter=',', nodetype=str, create_using=nx.DiGraph())
print(D)
nx.draw(D, with_labels=True, node_size=1500, arrows=True, node_color='c')
~>
DiGraph with 10 nodes and 18 edges
Displays graph
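The CSV files above are not included in these cards, so here is a minimal sketch of the same loading step using in-memory rows. The rows are hypothetical stand-ins for the file contents; `nx.parse_edgelist` reads an iterable of lines in the same format that `nx.read_edgelist` reads from a file.

```python
import networkx as nx

# Hypothetical rows standing in for a headerless CSV edge list
# (the real Friends.csv / Follows.csv are not shown in these cards).
rows = ["Aaron,Chris", "Aaron,Emma", "Chris,Drew"]

# Same data read once as an undirected graph and once as a directed graph.
G = nx.parse_edgelist(rows, delimiter=",", nodetype=str)
D = nx.parse_edgelist(rows, delimiter=",", nodetype=str, create_using=nx.DiGraph())

print(G.has_edge("Chris", "Aaron"))  # undirected: the edge works in both directions
print(D.has_edge("Chris", "Aaron"))  # directed: only Aaron -> Chris was listed
```

The only difference between the two calls is `create_using`, which is what decides whether each row is read as a mutual link or a one-way link.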
The _ represent entities in a network.
Nodes (or vertices)
Iterating through nodes of the graph
for n in G:
    print(n)
~>
Aaron
Chris
Emma
...
The _ represent connections or relationships between nodes.
Edges (or links)
Friends lists
for n in G:
    print(n, 'is friends with:')
    friends = G.neighbors(n)  # friends is an iterator
    for f in friends:
        print(' ', f)
~>
Aaron is friends with:
Chris
Emma
Chris is friends with:
Aaron
Drew
...
Friends lists v2
for n in G:
    print(n, 'is friends with:', list(G.neighbors(n)))
~>
Aaron is friends with: ['Chris', 'Emma']
Chris is friends with: ['Aaron', 'Drew']
...
The _ is the number of edges connected to a node.
Degree
Degree
numfriends = G.degree  # view of (node, degree) pairs
print(numfriends)
print("")
for n in numfriends:
    print(n[0], 'has', n[1], 'friends')
print("")
~
Or look up each node's degree as if the list of pairs were a dictionary
for n in G:
    print(n, 'has', numfriends[n], 'friends')
~>
Aaron has 4 friends
Chris has 5 friends
...
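A short sketch of using degree to find the best-connected node. The edges below are hypothetical and only reuse names from the cards; they are not the contents of Friends.csv.

```python
import networkx as nx

# Hypothetical friendships, for illustration only.
G = nx.Graph([("Aaron", "Chris"), ("Aaron", "Emma"),
              ("Chris", "Drew"), ("Chris", "Emma")])

degrees = dict(G.degree)                       # {node: number of edges}
most_connected = max(degrees, key=degrees.get)

print(degrees)
print(most_connected, "has the most friends")

# Every edge touches two nodes, so the degrees sum to twice the edge count.
print(sum(degrees.values()), 2 * G.number_of_edges())
```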
Edges can be _ or _.
Directed, Undirected
Undirected edges imply a mutual connection (e.g., friendships), while directed edges indicate a one-way relationship (e.g., following someone on social media).
It is the study of the structure and behavior of networks, focusing on the relationships between nodes (entities) and edges (connections).
Network Analysis
It helps to understand patterns, connectivity, and dynamics within complex systems, such as social networks, communication systems, or transportation grids, using mathematical and computational techniques.
Network examples
- Flight Routes
- Disease Transmission
- Food Chain
- Criminal Networks
- Science Citations
- Retweets
- Facebook Friends
Other Examples
* Electricity grid + other civil infrastructure
* The brain + other biological structures
* Organizations and organizational behavior
* Spread of memes, other social phenomena
* And many, many more…
In network analysis, the _ of a graph measures how many edges are present in the graph compared to the maximum possible number of edges. It is a ratio that reflects the level of connectivity between nodes.
Density
A density of 1 means every possible edge is present; a density near 0 means the graph is sparsely connected.
Density of graph
numnodes = G.number_of_nodes()
numedges = G.number_of_edges()
possedges = numnodes * (numnodes - 1) // 2  # G is undirected, so each pair of nodes counts once
print('Number of nodes:', numnodes)
print('Number of edges:', numedges)
print('Possible edges:', possedges)
print('Density (edges divided by possible edges):', numedges/possedges)
~>
Number of nodes: 10
Number of edges: 15
Possible edges: 45
Density (edges divided by possible edges): 0.3333333333333333
Using density function
print('Using density function:', nx.density(G))
~>
Using density function: 0.3333333333333333
What is the formula for graph density?
Density = number of edges / number of possible edges
[Directed] Possible edges = n(n − 1)
[Undirected] Possible edges = n(n − 1) / 2
where n is the number of nodes
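A small check of both formulas against `nx.density`, using two tiny graphs made up for this sketch. The helper `possible_edges` is a hypothetical name, not part of networkx.

```python
import networkx as nx

def possible_edges(n, directed):
    """Maximum number of edges among n nodes (no self-loops)."""
    return n * (n - 1) if directed else n * (n - 1) // 2

G = nx.cycle_graph(5)                        # undirected: 5 nodes, 5 edges
D = nx.DiGraph([(0, 1), (1, 2), (2, 0)])     # directed: 3 nodes, 3 edges

manual_G = G.number_of_edges() / possible_edges(G.number_of_nodes(), directed=False)
manual_D = D.number_of_edges() / possible_edges(D.number_of_nodes(), directed=True)

print(manual_G, nx.density(G))   # 5 edges out of 10 possible
print(manual_D, nx.density(D))   # 3 edges out of 6 possible
```

`nx.density` picks the directed or undirected formula based on the graph type, which is why the hand calculation has to use the matching one.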
The _ in a graph is the minimum number of edges required to travel between two nodes in the network.
Shortest path
Shortest path (or shortest distance) between given pair of nodes
“Six degrees of separation”
(Four in Facebook)
Overall average shortest distance
print('Average shortest distance:', nx.average_shortest_path_length(G))
~>
Average shortest distance: 2.022222222222222
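The average above is taken over every pair of nodes. For a single pair, a sketch like the following returns the path itself and its length; the edges are hypothetical and only reuse names from the cards.

```python
import networkx as nx

# Hypothetical chain of friendships: Emma - Aaron - Chris - Drew - Mike
G = nx.Graph([("Aaron", "Chris"), ("Chris", "Drew"),
              ("Drew", "Mike"), ("Aaron", "Emma")])

path = nx.shortest_path(G, "Emma", "Mike")          # list of nodes along the route
dist = nx.shortest_path_length(G, "Emma", "Mike")   # number of edges on that route

print(path)
print(dist)
```

The distance counts edges, so it is always one less than the number of nodes in the path.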
The _ of a graph is the maximum shortest distance between any pair of nodes in the graph.
Diameter
Maximum shortest distance in graph
Diameter
print('Diameter:', nx.diameter(G))
~>
Diameter: 4
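To connect the definition to the function, this sketch recomputes the diameter by hand as the largest entry among all shortest distances, on a small path graph chosen for illustration.

```python
import networkx as nx

G = nx.path_graph(5)   # nodes 0-1-2-3-4 in a line

# Shortest distance from every node to every other node.
lengths = dict(nx.all_pairs_shortest_path_length(G))

# The diameter is the largest of those shortest distances.
longest = max(d for targets in lengths.values() for d in targets.values())

print(longest, nx.diameter(G))
```

Note that `nx.diameter` needs a connected graph; on a disconnected one some distances are undefined and the call raises an error.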
The _ in a graph are sets of fully connected nodes, where every node is connected to every other node in the set.
Cliques
Sets of fully-connected nodes
Maximal cliques
cliques = nx.find_cliques(G)  # cliques is an iterator
for c in cliques:
    print(c)
Modify code to only print cliques with more than 2 nodes
for c in nx.find_cliques(G):  # fresh iterator; the one above is used up after one pass
    if len(c) > 2:
        print(c)
~>
['Josh', 'Mike', 'Jess']
['Josh', 'Aaron', 'Jess']
['Josh', 'Aaron', 'Emma']
['Chris', 'Sarah', 'Drew']
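A sketch of picking out the single largest clique. The graph is a hypothetical triangle with one extra friend attached, not the Friends data.

```python
import networkx as nx

# Hypothetical graph: Josh, Mike and Jess all know each other; Drew only knows Jess.
G = nx.Graph([("Josh", "Mike"), ("Mike", "Jess"),
              ("Josh", "Jess"), ("Jess", "Drew")])

# find_cliques yields maximal cliques; keep the one with the most members.
largest = max(nx.find_cliques(G), key=len)

print(sorted(largest))
```

The pair Jess and Drew is also a maximal clique, because no third node is connected to both of them; it is just smaller.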
The _ in a graph measures how close a node is to all other nodes, based on the average shortest distance from the node to all others.
Closeness centrality
Average shortest distance to all other nodes
(inverted so higher is “better”)
Closeness centrality - reciprocal of the average shortest distance to other nodes, giving a 0-1 scale where higher means closer
cc = nx.closeness_centrality(G)
print(cc)
print("")
sorted_keys = sorted(cc, key=cc.get, reverse=True)
print(sorted_keys)
for k in sorted_keys:
    print(k, 'has closeness centrality', cc[k])
~>
{'Aaron': 0.6, 'Chris': 0.6923076923076923, ...}
['Chris', 'Aaron', ...]
Chris has closeness centrality 0.6923076923076923
Aaron has closeness centrality 0.6
...
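A sketch of where the closeness numbers come from, assuming a connected graph: divide the number of other nodes by the sum of shortest distances to them. The helper `closeness_by_hand` and the three-person chain are made up for this example.

```python
import networkx as nx

# Hypothetical chain: Aaron - Chris - Drew
G = nx.Graph([("Aaron", "Chris"), ("Chris", "Drew")])

def closeness_by_hand(G, node):
    """(n - 1) / total shortest distance; valid for a connected graph only."""
    dist = nx.single_source_shortest_path_length(G, node)
    total = sum(dist.values())          # the node's distance to itself is 0
    return (len(G) - 1) / total

cc = nx.closeness_centrality(G)
for n in G:
    print(n, closeness_by_hand(G, n), cc[n])
```

Chris sits one step from everyone, so the average distance is 1 and the score is the maximum possible; the two end nodes score lower.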
The _ in a graph indicates the number of shortest paths that pass through a particular node, showing how crucial it is to network connectivity.
Betweenness centrality
Number of shortest paths the node lies on
Betweenness centrality - number of shortest paths it’s on, normalized on 0-1 scale
bc = nx.betweenness_centrality(G)
sorted_keys = sorted(bc, key=bc.get, reverse=True)
for k in sorted_keys:
    print(k, 'has betweenness centrality', bc[k])
~>
Chris has betweenness centrality 0.5555555555555556
Aaron has betweenness centrality 0.1759259259259259
...
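A sketch showing why a "bridge" node scores high. The graph is hypothetical: two groups of three friends joined only through a node called X.

```python
import networkx as nx

# Two hypothetical triangles (A, B, C) and (D, E, F) linked only via X.
G = nx.Graph([("A", "B"), ("B", "C"), ("A", "C"),
              ("C", "X"), ("X", "D"),
              ("D", "E"), ("E", "F"), ("D", "F")])

bc = nx.betweenness_centrality(G)
top = max(bc, key=bc.get)

print(top, bc[top])   # X lies on all 9 cross-group shortest paths, out of 15 possible pairs
```

Nodes A, B, E and F score 0 here because no shortest path between two other nodes needs to pass through them.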
In a directed graph, _ is the number of edges directed toward a node (followers).
In-degree
How many “followers”
Number of follows and followers
followers = D.in_degree
print('Number of followers: ', followers)
~>
Number of followers: [('Aaron', 3), ('Chris', 4), ...]
In a directed graph, _ is the number of edges directed from a node (following).
Out-degree
How many “following”
Number of follows and followers
follows = D.out_degree
print('Number of follows: ', follows)
~>
Number of follows: [('Aaron', 3), ('Chris', 2), ...]
~
Can treat list of pairs like a dictionary
for n in D:
    print(n, 'follows', follows[n], 'people and has', followers[n], 'followers')
~>
Aaron follows 3 people and has 3 followers
Chris follows 2 people and has 4 followers
In a directed graph, _ measures how often links between nodes are bidirectional.
Reciprocity
How often links are bidirectional
Reciprocity - people that follow each other
for n in D:
    print(f"{n} follows {list(D.neighbors(n))}")
~>
Aaron follows ['Chris', 'Emma', 'Josh']
Chris follows ['Aaron', 'Drew']
...
~
Alternative reciprocity
cycles = nx.simple_cycles(D)
for c in cycles:
    if len(c) == 2:
        print(c[0], 'and', c[1], 'follow each other')
~>
Mike and Jess follow each other
Chris and Aaron follow each other
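The snippets above list who follows whom and which pairs are mutual. As a possible complement, this sketch puts a single number on it with `nx.reciprocity`, which gives the share of edges that have a matching edge in the opposite direction. The follow edges are hypothetical.

```python
import networkx as nx

# Hypothetical follows: Aaron and Chris follow each other; Aaron also follows Emma.
D = nx.DiGraph([("Aaron", "Chris"), ("Chris", "Aaron"), ("Aaron", "Emma")])

# Mutual pairs found directly from the edges (each pair reported once).
mutual = sorted((u, v) for u, v in D.edges if u < v and D.has_edge(v, u))

# Fraction of all edges that are reciprocated: 2 of the 3 edges here.
score = nx.reciprocity(D)

print(mutual)
print(score)
```

Checking `D.has_edge(v, u)` avoids enumerating every cycle in the graph, which can matter on larger networks.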
The _ in a directed graph occur when a sequence of directed edges leads back to the starting node.
Cycles
Cycles
cycles = nx.simple_cycles(D)
for c in cycles:
    print(c)
~>
['Mike', 'Chris', 'Drew']
['Mike', 'Jess', 'Aaron', 'Chris', 'Drew']
['Mike', 'Jess', 'Aaron', 'Emma', 'Sarah', 'Chris', 'Drew']
['Mike', 'Jess']
['Chris', 'Aaron']
['Chris', 'Aaron', 'Emma', 'Sarah']
['Sarah', 'Emma']
In network analysis, _ attempts to forecast which new edges will form in the network in the future, often used for friend or follower recommendations.
Link prediction
Predict future edges added to the graph
Friends (or Follows) recommendations
Dolphin friend recommendation
for n1 in G:
    for n2 in G:
        if n1 != n2 and not G.has_edge(n1, n2):
            common = set(G.neighbors(n1)) & set(G.neighbors(n2))
            if len(common) >= 4:
                print(f"Dolphins {n1} and {n2} have common friends with {sorted(common)}")
~>
Dolphins 6 and 7 have common friends with ['10', '14', '57', '58']
Dolphins 10 and 55 have common friends with ['14', '42', '58', '7']
...
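A variation on the same common-friends idea, sketched on a tiny hypothetical graph: `itertools.combinations` visits each pair of nodes once, so a recommendation is not reported twice (once per direction). The threshold of 2 is an arbitrary choice for this small example.

```python
from itertools import combinations

import networkx as nx

# Hypothetical friendships, for illustration only.
G = nx.Graph([("A", "B"), ("A", "C"), ("D", "B"), ("D", "C"), ("D", "E")])

recommendations = []
for n1, n2 in combinations(sorted(G), 2):      # each unordered pair once
    if not G.has_edge(n1, n2):
        common = set(G[n1]) & set(G[n2])       # shared neighbors
        if len(common) >= 2:
            recommendations.append((n1, n2, sorted(common)))

for n1, n2, common in recommendations:
    print(f"{n1} and {n2} have common friends {common}")
```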
In network analysis, _ identifies groups of nodes that are more densely connected to each other than to the rest of the network.
Community detection
Sets of interlinked/similar nodes
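The cards do not name a specific algorithm, so this is only one possible sketch: networkx's greedy modularity method, run on a generated graph of two tightly knit groups joined by a single edge.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Two complete groups of 5 nodes (0-4 and 5-9) joined by one edge.
G = nx.barbell_graph(5, 0)

# One of several community detection methods in networkx (chosen here as an assumption).
communities = greedy_modularity_communities(G)

for c in communities:
    print(sorted(c))
```

Each community comes back as a set of nodes; on this graph the two dense groups are recovered because almost all edges run inside them.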
In network analysis, _ describe the spread or propagation of information through a network.
Cascades
Information propagation
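A deliberately simplified sketch of a cascade, assuming information moves exactly one hop per time step and every neighbor passes it on. Real cascade models are usually probabilistic; the graph and the starting node are hypothetical.

```python
from collections import defaultdict

import networkx as nx

# Hypothetical friendships, for illustration only.
G = nx.Graph([("Aaron", "Chris"), ("Aaron", "Emma"),
              ("Chris", "Drew"), ("Drew", "Mike")])

# Under the one-hop-per-step assumption, a node hears the news
# at a step equal to its shortest distance from the source.
hops = nx.single_source_shortest_path_length(G, "Aaron")

steps = defaultdict(list)
for node, hop in hops.items():
    steps[hop].append(node)

for step in sorted(steps):
    print("Step", step, "reaches", sorted(steps[step]))
```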
Which Python package is commonly used for network analysis?
networkx
import networkx as nx
The networkx package is widely used for creating, manipulating, and analyzing graphs and networks in Python.