Comm 156 Midterm Flashcards
What is a network?
A mathematical model of interactions: a collection of points joined together in pairs by lines.
What are the two ingredients that every network has?
Nodes and links
What is the proper verbiage when describing a network?
Two nodes are connected if… “they share, they go to”
What is the degree of a node?
The number of links connected to the node
What is a component of a network?
A subnetwork in which every node is connected to every other node through a chain of links. There can be more than 1 in a network.
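As a minimal sketch of how components can be found, assume the network is stored as a Python dictionary that maps each node to the set of its neighbors. The example network and the function name `find_components` are made up for illustration.

```python
from collections import deque

def find_components(adj):
    """Return a list of components, each one a set of nodes."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        components.append(component)
    return components

# Hypothetical network: A-B-C form one component, D-E form another.
example = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B"},
    "D": {"E"}, "E": {"D"},
}
print(find_components(example))
```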
What is the small world effect?
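The observation that most pairs of nodes are connected by a short path, even in a large network. Path length here means the minimum number of links that one must pass through to get from one node to another.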
What does the Allen Curve state?
The probability of two people communicating diminishes as the distance between them increases.
What did the “This Hurts Me As Much As It Hurts You” reading find?
People closer together in a concentration network will tend to reinforce their behaviors, where more connections within groups can expose people to new behavior
What did the Enron emails reveal?
They revealed that hubs within a project network highlighted illegal activity.
What is a bipartite network?
A network whose nodes can be divided into two groups so that all the connections go from one group to the other.
What can a bipartite network be projected into?
Two different networks that each only have one type of node. “____ is connected if they share a ____” (such as ingredients and flavor compounds)
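A rough sketch of a projection, assuming the bipartite network is stored as a dictionary from each node of one type to the set of nodes of the other type it links to. The ingredient and compound names are placeholders, not real data, and `project` is a hypothetical helper name.

```python
from itertools import combinations

def project(bipartite):
    """Link two nodes of the same type if they share a neighbor of the other type."""
    links = set()
    for a, b in combinations(sorted(bipartite), 2):
        if bipartite[a] & bipartite[b]:  # at least one shared neighbor
            links.add((a, b))
    return links

# Placeholder data: ingredients mapped to made-up flavor compounds.
ingredients = {
    "basil": {"compound_1", "compound_2"},
    "tomato": {"compound_2", "compound_3"},
    "vanilla": {"compound_4"},
}
print(project(ingredients))
```

Here basil and tomato are connected because they share a compound, while vanilla shares none and stays unconnected.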
What is a weighted network?
Different links have different weights (often drawn as thicker or thinner lines) according to the strength of the connection.
What did the Framingham study find?
People’s behavior is correlated with the behavior of their neighbors, or close neighbors. For example, if my neighbor smokes, I am more likely to smoke, and a neighbor two links down is more likely to smoke than a random person in the population. The further away you are, however, the more the correlation goes down.
What is homophily?
When nodes in a network tend to connect to similar nodes.
What is a directed network?
A network in which each link goes from one node to another in a specific direction; usually shown with an arrow.
What are network communities?
They are subgroups of a network with a lot of connections within the group but not between the groups
Potential Causal Explanation for Social Contagion: Influence
An individual’s behavior affects the behavior of her network neighbors (as in the obesity study). Network changes behavior. Being connected can lead to similar behavior.
Potential Causal Explanation for Social Contagion: Selection/ Network Dynamics
The network itself changes. Similar people become closer (homophily) and dissimilar people break ties. Similar behavior can lead to being connected. (as in the liberal/conservative/independent blog)
Potential Causal Explanation for Social Contagion: Exogenous Covariates
An external factor that is somehow correlated with the network causes the ties to change. (such as, geography)
What is the mathematical model for going viral? When does an epidemic occur?
Susceptible > Infected > Recovered. (SIR) An epidemic occurs when the infection rate exceeds the recovery rate.
What is the mathematical equation for the Epidemic threshold? What do the variables stand for? What is that number otherwise known as?
cid > 1 (c = contact rate, i = infectivity rate, and d = duration time) - The Viral Tipping Point
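A small worked example of the threshold, using made-up numbers rather than figures from any real outbreak. The function names are hypothetical.

```python
def cid(c, i, d):
    """Contact rate times infectivity times duration."""
    return c * i * d

def will_spread(c, i, d):
    """True when cid is above the tipping point of 1."""
    return cid(c, i, d) > 1

# 4 contacts per day, 10% infectivity, 5 days of being infectious.
print(cid(4, 0.1, 5), will_spread(4, 0.1, 5))
# 2 contacts per day, 10% infectivity, 2 days of being infectious.
print(cid(2, 0.1, 2), will_spread(2, 0.1, 2))
```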
What is the infection rate?
The total contacts between susceptible people and infected people
What is infectivity?
The probability that a contact between a susceptible person and an infected person will lead to an infection
What is the contact rate?
The average number of contacts a susceptible person has per day.
What is the SIR graph model that will happen over time?
The S people will go from 100 > almost 0. The I people will go in a bell curve. The R people will go from 0 > almost 100.
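The shape of these curves can be sketched with a simple day-by-day simulation. This is only an illustration under assumptions: everyone mixes at random, time moves in whole days, d is at least 1, and the population size and parameter values are invented.

```python
def simulate_sir(c, i, d, population=1000, initial_infected=1, days=200):
    """Return a list of (S, I, R) counts, one entry per day."""
    s = population - initial_infected
    infected = initial_infected
    r = 0.0
    history = [(s, infected, r)]
    for _ in range(days):
        # Each infected person makes c contacts a day; a fraction s/population
        # of them are with susceptible people, and each infects with chance i.
        new_infections = min(c * i * infected * s / population, s)
        # On average an infected person recovers after d days.
        new_recoveries = infected / d
        s -= new_infections
        infected += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, infected, r))
    return history

epidemic = simulate_sir(c=4, i=0.1, d=5)   # cid = 2, above the threshold
fizzle = simulate_sir(c=2, i=0.1, d=2)     # cid = 0.4, below the threshold
print("peak infected, cid = 2:  ", round(max(day[1] for day in epidemic)))
print("peak infected, cid = 0.4:", round(max(day[1] for day in fizzle)))
```

Plotting the three columns of `epidemic` against the day number gives the falling S curve, the bump-shaped I curve, and the rising R curve.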
Why was Ebola a failure?
Its cid was always less than 1, so it died out. There was a low contact rate between people in rural Africa, and those who contracted it usually stayed home in isolation. There was a low infectivity rate because bodily fluid has to get into us. There was a low duration time because people tend to die pretty quickly.
Why was H1N1 not a failure?
It was contracted in the cities, where air travel and modern transportation increased the contact rate. It is easier to transmit than Ebola because people just need to be near you. There is a longer duration time (weeks).
What is the small world problem? 2 possibilities.
The probability of 2 people with a mutual acquaintance meeting up. X –> Z through Y. Either everyone can be linked through a small number of acquaintances or people have circles and cliques that prohibit any 2 random people from meeting. Additionally we have social classes and cliques.
What 3 things need to be done to start an epidemic?
Increase the contact rate, increase the infectivity rate, and increase the duration time. That way, cid > 1
Which marketing features work better: Viral or Passive? Why?
Viral works better because you are personally sending a message to a friend, rather than it being automatic every time you use that application. It works better because Active has a larger I and D, but the C is low because people ignore it. Viral has a low I and D but the C is high and over time it has a longer duration.
What is the issue with the viral tipping point?
It assumes that:
- Everyone interacts at random
- Everyone is equally susceptible
- All people have the same number of contacts
However:
- We often interact with the same people
- Our status is correlated with the status of others
What is the problem with using social science to measure contagion?
We don’t always have a large enough N (sample size). It is difficult to manipulate networks in a controlled way. Experiments are expensive. Interactions are not all face-to-face anymore.
How do we solve the social science problem?
We use a model network that “looks like” a real network. This allows us to be able to measure and manipulate the features of the network.
What is the average path length? How do we find it?
For any two nodes, the path length is the length of the shortest path between them (the distance). You measure this length between all the pairs of nodes and then take the average.
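A sketch of that calculation, assuming the network is connected and stored as a dictionary from each node to the set of its neighbors. The four-node chain is a made-up example and the function names are hypothetical.

```python
from collections import deque

def distances_from(adj, start):
    """Shortest-path length from `start` to every reachable node (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def average_path_length(adj):
    """Average distance over all pairs of nodes (assumes one connected component)."""
    nodes = list(adj)
    total, pairs = 0, 0
    for index, a in enumerate(nodes):
        dist = distances_from(adj, a)
        for b in nodes[index + 1:]:
            total += dist[b]
            pairs += 1
    return total / pairs

# Hypothetical chain A - B - C - D.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(average_path_length(chain))
```

In the chain the six pairwise distances are 1, 2, 3, 1, 2, and 1, so the average is 10 divided by 6.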
What is the average degree? How do we find it?
This is the average number of links connected to a node. We add up the number of links that every node has and then take the average.
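As a quick illustration, assuming the same dictionary-of-neighbor-sets representation and a made-up four-node chain:

```python
def degree(adj, node):
    """Number of links connected to `node`."""
    return len(adj[node])

def average_degree(adj):
    """Add up every node's degree and divide by the number of nodes."""
    return sum(len(neighbors) for neighbors in adj.values()) / len(adj)

# Hypothetical chain A - B - C - D: degrees are 1, 2, 2, 1.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(degree(chain, "B"), average_degree(chain))
```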
What is the clustering coefficient? What is the formula?
For each node, this is the number of connections between neighbors of that node divided by the number of possible connections between those neighbors.
It is the # of connections among friends OVER the # of possible connections among friends.
When the network is too big to count the possible connections by hand, there is a formula for the bottom, where N is the number of friends (neighbors) of the node:
N(N-1) OVER 2
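A minimal sketch of the calculation for one node, assuming a dictionary from each node to the set of its friends. Returning 0 for a node with fewer than 2 friends is a convention chosen here, and the small friendship network is invented.

```python
from itertools import combinations

def clustering_coefficient(adj, node):
    """Connections among a node's friends over possible connections among them."""
    friends = adj[node]
    n = len(friends)
    if n < 2:
        return 0.0  # assumed convention: no pairs of friends to check
    possible = n * (n - 1) / 2
    actual = sum(1 for a, b in combinations(friends, 2) if b in adj[a])
    return actual / possible

# Hypothetical network: "me" has friends A, B, C; only A and B know each other.
friendships = {
    "me": {"A", "B", "C"},
    "A": {"me", "B"},
    "B": {"me", "A"},
    "C": {"me"},
}
print(clustering_coefficient(friendships, "me"))
```

With 3 friends there are 3(3-1)/2 = 3 possible connections and only 1 exists, so the coefficient for "me" is 1/3.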
In a real life social network, what happens with the clustering coefficient?
It tends to be high, since my friends tend to be friends with one another. “Birds of a feather flock together”
P in a regular and a random network. What happens in the middle? What happens when we increase randomness in a network?
The P (which is the rewiring probability) in a regular network is 0. The P in a random network is 1. In the middle, after we rewire:
0 < P < 1
As we increase randomness, P gets closer to 1.
Clustering and path lengths in a structured ring lattice network versus a random network. How are these combined in a real network?
Ring lattice:
- Clustering is high and path lengths are high.
Random network:
- Clustering is low and path lengths are low.
Real network:
- Clustering is high and path lengths are low.
How does rewiring affect clustering and path lengths?
Rewiring makes path lengths shorter, but it does not affect clustering much.
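One way to see this is to build a small ring lattice, rewire it, and measure both quantities. The sketch below is a simplified take on the rewiring idea, not necessarily the exact procedure used in class. The network size, the seed, and all function names are assumptions.

```python
import random
from collections import deque
from itertools import combinations

def ring_lattice(n, k):
    """n nodes in a circle, each linked to its k nearest neighbors (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def rewire(adj, p, rng):
    """Copy the network, moving one end of each link to a random node with probability p."""
    new = {v: set(neighbors) for v, neighbors in adj.items()}
    links = [(v, w) for v in adj for w in adj[v] if v < w]
    for v, w in links:
        if rng.random() < p:
            choices = [u for u in new if u != v and u not in new[v]]
            if choices:
                u = rng.choice(choices)
                new[v].discard(w)
                new[w].discard(v)
                new[v].add(u)
                new[u].add(v)
    return new

def average_clustering(adj):
    total = 0.0
    for neighbors in adj.values():
        n = len(neighbors)
        if n >= 2:
            linked = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
            total += linked / (n * (n - 1) / 2)
    return total / len(adj)

def average_path_length(adj):
    """Average distance over pairs that can reach each other."""
    total, pairs = 0, 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

lattice = ring_lattice(20, 4)
for p in (0.0, 0.1, 1.0):
    g = rewire(lattice, p, random.Random(0))
    print(p, round(average_clustering(g), 3), round(average_path_length(g), 3))
```

With P = 0 the network is the untouched ring lattice. Trying larger networks and several seeds gives a better feel for how the two measures respond to rewiring.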
How does the mathematical formula change from a random network to a general network for contagion? What is the 1/c?
Start with cid > 1. Divide both sides by c to get id > 1/c, and set d to 1, so:
i > 1/c
The 1/c plays the role of T(tau), which is the tipping point for a general (non-random) network, so the condition becomes i > T(tau). It is the bar something has to cross to spread. It depends on the network and changes as the network is rewired.
What happens to the tipping point when networks are being rewired? What about in a real-life network? Whose model was this?
As we increase the randomness, the tipping point goes down and that makes contagion easier. This is the Watts-Strogatz network model.
What 2 concepts were found in Malcolm Gladwell’s Six Degrees of Lois Weisberg?
The Law of the Few & The Strength of Weak Ties
What is the law of the few?
It holds that a few exceptional people who find out about a trend are able to spread it through their social connections and energy.
What is the T(tau) network and how does it work mathematically?
It is how infectious something has to be to spread in the network.
Mathematically, it is i > 1/c and if the c is 4, then
i > 1/4.
What happens in the T(tau) formula when we make things less random, if it is i > 1/4?
As we make things less random, the i needed to spread goes up. So, as the network moves from random to a ring lattice, it becomes harder to spread. The tipping point will be greater in a ring lattice than it is in a random network.
So, if the c is 6, then T(tau) is 1/6 in a random network, and in a ring lattice T(tau) is greater than 1/6.
What is network centrality trying to measure?
It attempts to measure who is important enough to pay to tweet, to give a product to, to choose to follow, to endorse your product, etc.
What are the 2 measures of centrality that we measured? Which one should we use?
Degree Centrality & Eigenvector Centrality. The one we use depends on what problem we are trying to study.
What is degree centrality? What does it award?
A node's degree centrality is its degree divided by n-1, where n is the total number of nodes in the network. It awards 1 point for every neighbor.
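A short sketch of this definition, assuming a dictionary from each node to the set of its neighbors. The star-shaped network is a made-up example.

```python
def degree_centrality(adj):
    """Each node's degree divided by n - 1."""
    n = len(adj)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}

# Hypothetical star: one hub linked to three other nodes.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
print(degree_centrality(star))
```

The hub reaches everyone in 1 step, so it scores 3/3 = 1, while each outer node scores 1/3.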
When is degree centrality a good measure of influence? When is it not?
It is a good idea if it takes 1 step to reach a lot of people. It is a bad idea when it takes more than 1 step to reach a lot of people - doesn’t capture enough.
What is eigenvector centrality? Why is it different than degree centrality?
It gives each node a score proportional to the sum of the scores of all of its neighbors. (Neighbors are not equal- some have a stronger influence)
What is the mathematical equation for eigenvector centrality?
My score = K x (the sum of my neighbors' scores)
What happens when we find K in a network while finding the eigenvector centrality?
The same K will hold for all of the nodes.
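One common way to find these scores is repeated averaging (power iteration), sketched below under assumptions: the network is connected, and each round also adds a node's own score so the numbers settle instead of bouncing back and forth. That extra term is a choice made for this sketch; it does not change which scores the process settles on. The example network and function names are hypothetical.

```python
def eigenvector_centrality(adj, iterations=100):
    """Scores where each node's score is proportional to the sum of its neighbors' scores."""
    scores = {node: 1.0 for node in adj}
    for _ in range(iterations):
        # New score = own score + neighbors' scores, then rescale so the top score is 1.
        new = {node: scores[node] + sum(scores[w] for w in adj[node]) for node in adj}
        biggest = max(new.values())
        scores = {node: value / biggest for node, value in new.items()}
    return scores

def k_for(adj, scores, node):
    """The K in: my score = K x (the sum of my neighbors' scores)."""
    return scores[node] / sum(scores[w] for w in adj[node])

# Hypothetical network: triangle A-B-C with D hanging off C.
graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
scores = eigenvector_centrality(graph)
for node in graph:
    print(node, round(scores[node], 3), round(k_for(graph, scores, node), 3))
```

Printing K for every node shows the same value each time, which is the check that the scores are consistent.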
Mass Marketing Equation
Np = n
p- probability per contact
N- reach
n- how many people do it
Degree Distributions for Connections
Most people have few connections, and a few people have a huge number of connections. This is otherwise known as the power law distribution.
Twitter Cascades and the Power Law Distribution
Most tweets are reposted 0 times, so they are not actually passed on. The chance of a small cascade is high, but the chance of a big cascade is very low.
Can we predict influence (Statistical Model)?
You can look at the data from the past to predict the future. Look at their # of followers, # following, tweet frequency, time on Twitter, and their past cascades to predict their future influence.
What 2 factors predict Twitter influence?
Their # of followers and if they have had influence in the past.
Bayes’ Rule and the Base Rate Fallacy (as studied in class)
The base rate fallacy is confusing two different probabilities. The probability that you are an influencer given that something spreads is high, but the probability that something spreads (goes viral) given that an influencer tweeted it is low.
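A worked example with Bayes' rule shows how the two probabilities can differ. Every number below is invented purely for illustration and is not from the course or any study.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' rule: P(H | E) from P(H), P(E | H), and P(E | not H)."""
    numerator = p_evidence_given_h * prior
    evidence = numerator + p_evidence_given_not_h * (1 - prior)
    return numerator / evidence

# Hypothetical numbers (assumptions, not data):
p_influencer = 0.05               # 5% of users are influencers
p_viral_given_influencer = 0.02   # 2% of an influencer's tweets go viral
p_viral_given_regular = 0.0005    # 0.05% of a regular user's tweets go viral

p_influencer_given_viral = posterior(
    p_influencer, p_viral_given_influencer, p_viral_given_regular
)
print(round(p_influencer_given_viral, 3))
```

With these made-up inputs, a viral tweet probably came from an influencer, even though any single influencer tweet is still very unlikely to go viral.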
Does contact matter when attempting to spread something?
NO.
Would we rather do 1 person with 10,000 followers or 100 people with 100 followers?
We would do it with 100 people with 100 followers. It is more cost effective, and people see the message multiple times from a lot of people. Cascade sizes are not different whether they start with an influencer or a regular person. It will depend more on the network than on the individual.
What is the fundamental attribution error?
People tend to overemphasize the person, but often forget the context.
Ex: There was nothing special about Daphni Leef. There was already a huge discomfort with the housing prices and it just took 1 person to speak out and rebel against it for others to join. The network was already waiting for an individual.
Ro = cid > 1. What happens when Ro is greater than 1?
The viral marketing effort is not a failure. Ro has to be greater than 1 for it to take off. If it is less than 1, it will die off.
What is big seed marketing?
It is when you send something out to a big seed of initial people. As it continues down the chain, it will eventually burn itself out, but it is still infecting a lot of people. This is done by starting out with traditional marketing to get the seed and then tacking on the referral.
What are complex contagions?
Contagions in which a person becomes infected only after some threshold number (or fraction) of friends are infected. This takes into account our past interactions. They need clustering in order to invade and survive. Rewiring can make it harder to infect, as some people may only have 1 infected neighbor and they need 2 or more for it to spread.
What are simple contagions?
Things that spread by simple contact, or one time, like a disease. Here, unlike in the complex contagions, we cannot choose to get infected.
Simple Contagions & Random
Simple contagions are easier to spread in a random network because they only need 1 contact, so rewiring helps it spread quicker.
Complex Contagions and Ring Lattice
Complex contagions are easier to spread in a ring lattice. Rewiring makes it harder to spread, since they need at least 2 infected neighbors and there is less clustering.
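A small threshold simulation can illustrate why clustering matters. This is a sketch under assumptions: a node becomes infected once at least `threshold` of its neighbors are infected, nobody recovers, and the 20-node networks are made up.

```python
def ring_lattice(n, k):
    """n nodes in a circle, each linked to its k nearest neighbors (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def spread(adj, seeds, threshold):
    """Infect any node with at least `threshold` infected neighbors until nothing changes."""
    infected = set(seeds)
    while True:
        newly = {
            v for v in adj
            if v not in infected and len(adj[v] & infected) >= threshold
        }
        if not newly:
            return infected
        infected |= newly

clustered = ring_lattice(20, 4)   # neighbors of a node also know each other
plain_ring = ring_lattice(20, 2)  # a bare circle with no clustering

print(len(spread(clustered, {0, 1}, threshold=2)))   # complex contagion, clustered
print(len(spread(plain_ring, {0, 1}, threshold=2)))  # complex contagion, no clustering
print(len(spread(plain_ring, {0}, threshold=1)))     # simple contagion, no clustering
```

In the clustered lattice each uninfected node next to the seeds has 2 infected neighbors, so the complex contagion keeps going. On the bare circle no node ever sees 2 infected neighbors, while a simple contagion (threshold 1) still reaches everyone.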
Critical Mass
It is the threshold needed for a complex contagion to spread. Above it, the contagion will grow and below it, the contagion will die. (Like Ro and the big seed marketing) BUT it is different than Ro because it requires a certain number of connections.
Facebook Selective Release Strategy
They released it to certain people at a time. Harvard, then Ivy Leagues, then College then General. This allowed them to always have a low critical mass and high clustering, which made it easier to spread.
Why did Google+ fail?
It failed because you could only sign up if someone invited you to. Therefore, there was no critical mass. That would only work if it was a simple contagion. (Viral)
What is the community detection algorithm?
Methods for detecting groups in a network based on the patterns of connections.
Modularity
Given a division of the network nodes into groups (partition of the nodes), the modularity of the division is the fraction of edges within the groups minus the fraction expected by chance. A good division will have a high modularity.
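A sketch of one standard way to compute this. The assumption here is that "expected by chance" means links are placed at random while every node keeps its degree, which gives (sum of degrees in the group / twice the number of links) squared for each group. The two-triangle network is a made-up example.

```python
def modularity(adj, groups):
    """Fraction of links inside groups minus the fraction expected by chance."""
    m = sum(len(neighbors) for neighbors in adj.values()) / 2  # number of links
    q = 0.0
    for group in groups:
        # Each inside link is seen from both of its ends, so halve the count.
        inside = sum(1 for v in group for w in adj[v] if w in group) / 2
        degree_sum = sum(len(adj[v]) for v in group)
        q += inside / m - (degree_sum / (2 * m)) ** 2
    return q

# Hypothetical network: triangles 1-2-3 and 4-5-6 joined by the link 3-4.
two_triangles = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
print(modularity(two_triangles, [{1, 2, 3}, {4, 5, 6}]))
print(modularity(two_triangles, [{1, 2, 3, 4, 5, 6}]))
```

Splitting the network into its two triangles gives a clearly positive modularity, while lumping everything into one group gives 0.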
Structural Equivalence
It looks almost like classes. We are separated by the class and people like us, are on our level.
What is human capital? What is the residual?
What gives you value as an individual. Accumulated training, education, experience, skills, etc.
The residual is what’s left over with those that are underperforming.
What are the clusters and bridges of a small worlds network? What are their clustering coefficients?
Clusters are tight-knit groups of people who have a lot of connections within the group, often by homophily. Bridges are long-range connections from moving or a random connection outside of your domain. Clusters have a high clustering coefficient and bridges have a low clustering coefficient.
Social Capitals: Closure
It is a network consisting of a single cluster and has a high clustering coefficient. (This was Cliff) Their value is:
- trust (group punishment)
- efficiency (shared jargon)
- known roles and what they're good at
- affirmation (think the way they do)
Social Capitals: Broker
It is a network consisting of bridges connecting multiple clusters and has a low clustering coefficient. (This is Bob) Their value is:
- diversity (different groups and a lot of perspective)
- information (know a lot due to connections)
- vision (can see a change coming)
Brokers vs. Closures. How to succeed?
Typically, brokers do better and the underperformers tend to be closures. Closures tend to be echo chambers. Brokers can only maximize themselves if they take advantage of the network, since their ties can be weak and non-jargon ties. You need broker connections acting like closure ties. It cannot be forced, or it will not work. Your friends' connections have no effect on you, but you can look for structural gaps and fill those.
What to do to create broker ties acting like closure ties?
- Maintain connections we already have by reaching out to them and when you face a tough challenge, they will be there for you.
- Shared activities are a proven way, since we all have passions, there are stakes involved, they are genuine, and they remove hierarchy.
Centrality and directed networks
Pagerank does not do well in undirected networks. Eigenvector centrality is not useful in directed networks.
What is Pagerank?
It is a lot like degree centrality, but it is rescaled. It is only useful in directed networks. Each node splits its score up equally among all of the nodes it's connected to.
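A minimal sketch of the usual PageRank iteration, under assumptions that are not from these notes: a damping factor of 0.85, and every node having at least one outgoing link. The three-page network is invented.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Each node splits its score equally among the nodes it links to."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1 / n for v in nodes}
    for _ in range(iterations):
        # Every node gets a small baseline share no matter who links to it.
        new = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            share = damping * rank[v] / len(out_links[v])
            for w in out_links[v]:
                new[w] += share
        rank = new
    return rank

# Hypothetical directed network: A -> C, B -> C, C -> A.
pages = {"A": {"C"}, "B": {"C"}, "C": {"A"}}
ranks = pagerank(pages)
print({page: round(score, 3) for page, score in ranks.items()})
```

C is pointed to by both A and B, so it ends up with the top score. B has no incoming links and keeps only the baseline share.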
What is Betweenness centrality?
Each node gets a score proportional to the # of shortest paths that go through the node.
3 biases that hinder us
- Overconfidence
- Anchoring
- Endowment Effect (we latch on to things we already have)
Crowd Forecasts
The crowd always beats its average member because of diversity. Some people are above and others are below the correct answer, but they cancel each other out and the crowd error is lower. If everyone errs in the same direction, the crowd has the same error as the average member.
Crowd Error
The error of the crowd is always less than or equal to the average error of its members.
Diversity Prediction Theorem
Crowd error = average error - diversity
Diversity matters just as much as accuracy. It is always true and the sample size does not matter. Add someone if they are different, not if you have someone a lot like them in the crowd already.
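The theorem can be checked numerically. This sketch assumes that every "error" is a squared error and that diversity is the average squared distance of each guess from the crowd's average guess. The guesses and the true value are made up.

```python
def diversity_prediction(guesses, truth):
    """Return (crowd error, average error, diversity), all as squared errors."""
    crowd_guess = sum(guesses) / len(guesses)
    crowd_error = (crowd_guess - truth) ** 2
    average_error = sum((g - truth) ** 2 for g in guesses) / len(guesses)
    diversity = sum((g - crowd_guess) ** 2 for g in guesses) / len(guesses)
    return crowd_error, average_error, diversity

# Hypothetical crowd of three guessing a true value of 12.
crowd_error, average_error, diversity = diversity_prediction([8, 10, 15], truth=12)
print(crowd_error, average_error, diversity)
print(average_error - diversity)
```

Here the crowd's average guess is 11, so the crowd error is 1, while the average error is 29/3 and the diversity is 26/3. The difference 29/3 - 26/3 is 1, matching the crowd error.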