Final Exam Flashcards
In networks, what is degree? How is it measured?
A measure of local centrality. It is the crudest measure of how well connected a node is to other nodes.
It is measured by counting the number of edges (connections) a node has.
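A minimal sketch of counting degree, assuming the network is stored as a list of edges. The node names and the `degree` helper are made up for illustration.

```python
from collections import Counter

# Hypothetical undirected network, written as a list of edges (pairs of nodes).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]

def degree(edges):
    """Count how many edges touch each node."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return dict(counts)

degrees = degree(edges)
print(degrees)  # A touches three edges, B and C two each, D one
```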
In networks, what is betweenness? How is it measured?
A measure of global centrality. It is a way to measure how well a specific node connects other nodes, that is, how often it sits between them.
It is measured by counting all the SHORTEST paths in a network that the node is on.
To calculate: Take all shortest paths between all two-node combinations and count how many times the specific node appears.
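The counting procedure can be sketched as below. This is an illustration under stated assumptions, not a standard library routine: the graph and the helper names are hypothetical, the node is not counted when it is an endpoint of the path, and every shortest path counts as 1 (the usual textbook betweenness instead weights each pair by the fraction of its shortest paths that pass through the node).

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected network stored as an adjacency list.
graph = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B", "E"],
    "D": ["B", "E"],
    "E": ["C", "D"],
}

def all_shortest_paths(graph, source, target):
    """Return every shortest path from source to target (breadth-first search)."""
    dist = {source: 0}
    parents = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                parents[nbr] = [node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                parents[nbr].append(node)  # another equally short route
    if target not in dist:
        return []
    # Walk back from the target through the recorded parents.
    paths = [[target]]
    while paths[0][0] != source:
        paths = [[p] + path for path in paths for p in parents[path[0]]]
    return paths

def betweenness_count(graph, node):
    """Count shortest paths, over all two-node combinations, that pass through node."""
    count = 0
    for s, t in combinations(graph, 2):
        if node in (s, t):
            continue  # assumption: endpoints are not "in between"
        for path in all_shortest_paths(graph, s, t):
            if node in path:
                count += 1
    return count

scores = {n: betweenness_count(graph, n) for n in graph}
print(scores)  # B sits on the most shortest paths in this example
```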
In networks, what is centrality?
The extent to which each node is connected to other nodes and appears in the center of the graph.
In networks, what are the 4 measures of centrality?
degree
farness
closeness
betweenness
In networks what is a node?
An individual unit in the analysis
In networks, what is an edge/link?
a line that represents the existence of a relationship between any pair of nodes.
What is a directed network? Give an example
A directed network is a network in which each edge travels in only one direction, either into or out of a node.
Example: Twitter, where one user can follow another without being followed back.
What is an undirected network? Give an example
An undirected network is a network whose edges represent a two-way relationship; the relationship runs both ways, so the edges have no direction.
Example: On Facebook, being “friends” means the relationship has to go both ways.
In networks, what does in-degree mean? Give an example
In a directed network, in-degree is a measure of centrality that measures the number of incoming edges a node has
Example: On Twitter, the number of people who follow you is your in-degree.
In networks, what does out-degree mean? Give an example
In a directed network, out-degree is a measure of centrality that counts the number of outgoing edges a node has
Example: On Twitter, the number of people you follow is your out-degree.
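A small sketch of in-degree and out-degree, assuming a directed "follows" network stored as (follower, followed) pairs. The user names are invented.

```python
from collections import Counter

# Hypothetical directed network: (follower, followed) means follower -> followed.
follows = [("ana", "ben"), ("cat", "ben"), ("ben", "ana"), ("dan", "ben")]

out_degree = Counter(src for src, _ in follows)  # how many accounts each user follows
in_degree = Counter(dst for _, dst in follows)   # how many followers each user has

print(in_degree["ben"], out_degree["ben"])  # ben's followers, then accounts ben follows
```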
In networks, what is farness? How is it measured?
Farness is a measure of centrality that measures how far away (in distance) a node is from every other node.
To measure farness, sum the distances between a node and every other node.
In networks, what is closeness? How is it measured?
Closeness is the inverse of farness. It tells you how close a node is to every other node.
To measure closeness, divide 1 by farness (closeness = 1/farness).
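Farness and closeness can be sketched as follows, assuming distance means the number of edges on the shortest path and that the network is connected. The chain graph and function names are hypothetical.

```python
from collections import deque

# Hypothetical undirected network: a simple chain A - B - C - D.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

def distances_from(graph, source):
    """Shortest-path distance (number of edges) from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def farness(graph, node):
    """Sum of the distances between node and every other node."""
    return sum(distances_from(graph, node).values())

def closeness(graph, node):
    """Inverse of farness."""
    return 1 / farness(graph, node)

print(farness(graph, "A"), farness(graph, "B"))
print(closeness(graph, "A"), closeness(graph, "B"))  # B is more central than A
```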
Explain the intuition behind interactions
When a hypothesis is conditional and the effect of a variable depends on another variable, the second variable becomes part of the equation rather than being “controlled” for in the equation. Interactions model this conditional effect.
What three terms are required for interactions?
Two separate constituent variable components and the interaction term.
In interactions, what do the constituent terms mean?
The effect of that term on Y when the other constituent term is zero
In interactions, what does the interaction term mean?
The slope of the conditional relationship: how much the effect of one constituent term on Y changes for each one-unit increase in the other constituent term
In interactions, what is the interactive effect?
The effect of all three terms
What is the equation for interactions?
y = α + β1(Constituent 1) + β2(Constituent 2) + β3(Constituent 1 × Constituent 2)
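To see why the effect is conditional, here is a sketch with made-up coefficients. The interaction term multiplies the two constituent variables, so the effect of the first variable on y works out to β1 + β3 × (value of the second variable). All numbers and function names are assumptions for illustration.

```python
# Hypothetical coefficients, chosen only for illustration.
alpha, b1, b2, b3 = 1.0, 2.0, 0.5, 1.5

def predict(x1, x2):
    """y = alpha + b1*x1 + b2*x2 + b3*(x1*x2)"""
    return alpha + b1 * x1 + b2 * x2 + b3 * (x1 * x2)

def effect_of_x1(x2):
    """Change in y for a one-unit increase in x1, holding x2 fixed."""
    return predict(1, x2) - predict(0, x2)

print(effect_of_x1(0))  # equals b1: the constituent term's effect when x2 is zero
print(effect_of_x1(2))  # equals b1 + b3 * 2: the effect shifts as x2 changes
```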
What is the unit of analysis
the unit that represents the entity you are studying
ex. country, individual, household, congressional district, state
What is the unit of observation
What uniquely identifies each observation being studied.
- It is a characteristic of the unit of analysis
ex. country-year, state-month, individual-wave
What is bias?
What are the 5 types of potential bias in survey sampling?
Bias is a systematic fault in the sampling process. If the error is not systematic, then it is just white noise and not bias.
1.) frame bias
2.) selection bias
3.) Unit non-response bias
4.) Item non-response bias
5.) response bias
What is Frame bias?
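When the sampling frame (the list the sample is drawn from) is not representative of the general population
What is selection bias?
When the sample is systematically not randomized, so some units in the frame are more likely to be selected than others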
What is unit non-response bias?
When people in the sample or frame population systematically do not respond/participate in the survey
What is item non-response bias?
When participants in the survey systematically do not respond to a specific item on the survey
What is response bias?
When respondents lie on the survey or do not tell you the real response
ex.) social desirability bias, people tell you the answer they think is the most socially correct, not their real answer.
What are list experiments?
When are they useful?
Example?
In a list experiment, the control group of respondents is given a list of 3 items and asked how many of the 3 they support (or another indicator), without saying which ones. The treatment group is given the same list with an extra 4th item. If the average number of “supported” items is higher in the treatment group than in the control group, this indicates “support” for the 4th item in the list.
Useful when the questions are sensitive or there is social pressure.
Ex.) To determine whether Afghans supported the Taliban, a control group was given a list of 3 organizations and asked how many they supported, and the average response was calculated. A treatment group was given the same question and list with the addition of the Taliban. The average number of supported groups was 2 in the control group and 3 in the treatment group. The increase indicates that they do support the Taliban.
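The comparison boils down to a difference in group averages, which can be read as an estimate of the share of respondents who support the extra item. A sketch with invented responses (these are not the survey's data):

```python
from statistics import mean

# Made-up responses: each number is how many list items a respondent supports.
control = [2, 1, 2, 3, 2]        # saw the 3-item list
treatment = [3, 2, 2, 3, 3, 2]   # saw the same list plus the sensitive 4th item

# Difference in averages between the two groups.
estimate = mean(treatment) - mean(control)
print(estimate)  # a positive value suggests some support for the 4th item
```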
What is probability Sampling?
Why is it used?
It is used to ensure representativeness.
It is when every unit in the population has a known, non-zero probability of being selected to participate in the study
What is Simple Random Sampling?
Is used to properly randomize the sample. The bigger the sample, the more accurate the results.
In simple random sampling, every unit has an equal selection probability.
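A minimal sketch of a simple random sample using Python's standard library. The population of 100 unit IDs, the sample size, and the seed are assumptions for illustration.

```python
import random

population = list(range(1, 101))       # hypothetical population of 100 unit IDs
rng = random.Random(42)                # fixed seed so the draw can be repeated
sample = rng.sample(population, k=10)  # draws without replacement; every unit is equally likely

print(sorted(sample))
```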
How do you find the interquartile range?
Subtract Q1 from Q3
How do you find Range?
subtract the minimum number from the maximum number
How do you find the three Quartiles?
Start by sorting the list and finding the median of the entire list. The median is Q2, and it separates the list into two halves. The median of the first half of the list is Q1. The median of the second half of the list is Q3.
How do you determine if a number is an outlier?
You must find the lowest and highest limits of the dataset for non-outlier numbers. To find the lowest acceptable number, take Q1 - 1.5 × IQR. To find the highest acceptable number, take Q3 + 1.5 × IQR. If the number in question is below the lower limit or above the upper limit, it is an outlier.
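The range, quartile, IQR, and outlier rules can be sketched together. One assumption is labeled in the code: when the list has an odd length, the median itself is left out of both halves (other conventions exist and give slightly different quartiles). The data are made up.

```python
def median(values):
    """Middle value, or the average of the two middle values."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 using the median-of-halves method."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    # Assumption: for an odd-length list, exclude the median from both halves.
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(lower), median(s), median(upper)

data = [1, 3, 4, 5, 7, 8, 9, 30]          # hypothetical data
data_range = max(data) - min(data)        # maximum minus minimum
q1, q2, q3 = quartiles(data)
iqr = q3 - q1                             # Q3 minus Q1
low_limit = q1 - 1.5 * iqr
high_limit = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_limit or x > high_limit]

print(q1, q2, q3, iqr, outliers)
```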
In linear regressions, how do you find the standard deviation? What is the formula?
Formula:
SD = sqrt( (1/(n-1)) * sum( (xi - mean of X)^2 ) )
Steps:
1.) find the mean of X
2.) Subtract the mean from each value of X
3.) Square each result from step 2
4.) Add together all the squares
5.) Divide the sum of the squares by the total number of observations minus 1
6.) Square root the result of step 5
The result of step 6 is the standard deviation
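The six steps translate almost line for line into code. A sketch with made-up data; the function name is hypothetical.

```python
import math

def standard_deviation(xs):
    n = len(xs)
    mean = sum(xs) / n                        # step 1: mean of X
    deviations = [x - mean for x in xs]       # step 2: subtract the mean from each value
    squares = [d ** 2 for d in deviations]    # step 3: square each result
    total = sum(squares)                      # step 4: add the squares together
    variance = total / (n - 1)                # step 5: divide by n minus 1
    return math.sqrt(variance)                # step 6: take the square root

data = [2, 4, 4, 4, 5, 5, 7, 9]               # hypothetical observations
sd = standard_deviation(data)
print(round(sd, 3))
```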
How do you find the median?
What is the benefit of using median
If you have an odd amount of numbers locate the exact middle number.
If you have an even amount of numbers locate the two middle numbers, add them together, and divide the sum by 2.
benefit: more robust against the impact of outliers
How do you find Mean?
What is a detriment to using mean?
add together all of the numbers and divide the sum by the total amount of numbers
Detriment: can be influenced by outliers which pull the average too high or too low.
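A quick sketch of how one outlier moves the mean far more than the median, using Python's `statistics` module. The income figures are invented.

```python
from statistics import mean, median

incomes = [30, 35, 40, 45, 50]     # hypothetical incomes, in thousands
with_outlier = incomes + [1000]    # add one extreme value

print(mean(incomes), median(incomes))            # both sit at the middle of the data
print(mean(with_outlier), median(with_outlier))  # the mean is pulled up sharply, the median barely moves
```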
What is non-probability sampling
When units are not selected by a random process with known probabilities, so some members of the population have an unknown or zero chance of participating in the study
what is a variable?
What is the key rule?
An empirical measure of a concept/characteristic.
key rule: variables must vary across observations
what are the 2 types of variables, describe them
1: Quantitative/Interval/Continuous- observations can take on an infinite number of numerical values between any two values (decimals).
2: Categorical — observations belong to one of a discrete set of categories & we assign a number to each category
what are the 3 types of categorical variables, describe them
1.) Nominal — categories are named (independent) but there is no order or ranking involved.
2.) Ordinal — categories are ranked
3.) Dichotomous variables — two values (e.g., yes/no)
What does the distribution of a variable tell us?
what values a variable takes and how often it takes on these values
what are the two types of modes a distribution can have, define them
unimodal: one mode/one hump in a distribution
bimodal: two modes/two humps in a distribution
what two S words are used to describe distribution?
define them
symmetric- looks the same on both sides, for example a normal bell curve distribution
skewed- the data bunches on one side of the curve and creates a tail on the other.
what is a Z score?
the score given to each observation of a variable which measures the number of standard deviations an observation is above or below the mean
It is a measure of deviation from the mean
It is not sensitive to how the variable is scaled and/or shifted.
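A sketch of computing Z scores and checking that rescaling or shifting the variable leaves them unchanged. It assumes the sample standard deviation (dividing by n minus 1); the heights and the conversion are made up.

```python
from statistics import mean, stdev

def z_scores(xs):
    """Number of standard deviations each observation is above or below the mean."""
    m, s = mean(xs), stdev(xs)  # stdev divides by n - 1
    return [(x - m) / s for x in xs]

heights_cm = [150, 160, 170, 180, 190]              # hypothetical data
rescaled = [x / 2.54 + 10 for x in heights_cm]      # change the scale and shift by a constant

z_original = z_scores(heights_cm)
z_rescaled = z_scores(rescaled)
print([round(z, 3) for z in z_original])
print([round(z, 3) for z in z_rescaled])  # matches the line above up to rounding
```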
Differentiate the two types of skewness
right skew- the tail is on the right
left skew- the tail is on the left
how can you transform variables
You can collapse continuous variables into ordinal (or nominal) variables. This does not work in the reverse.
ex. you can turn incomes into categories of incomes
Log Transformation for continuous variables
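Both transformations can be sketched on the same made-up income data. The bracket cutoffs and names are assumptions chosen for illustration, and the log used here is the natural log.

```python
import math

incomes = [12_000, 35_000, 58_000, 140_000, 2_500_000]  # hypothetical, right-skewed

def income_bracket(income):
    """Collapse a continuous income into an ordinal category (cutoffs are made up)."""
    if income < 30_000:
        return "low"
    if income < 100_000:
        return "middle"
    return "high"

brackets = [income_bracket(x) for x in incomes]
log_incomes = [math.log(x) for x in incomes]  # compresses the long right tail

print(brackets)
print([round(v, 2) for v in log_incomes])
```

Note that the brackets cannot be turned back into the original incomes, which is why the collapse only works in one direction.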
Why do we plot distributions
To better understand the spread of the data and to know if we need to log transform it.
What is probability
the set of mathematical tools that measure and model randomness in the world. It is a mathematical model of uncertainty