Final Exam Flashcards

Question 1

Q

In networks, what is degree? How is it measured?

Answer

A

A measure of local centrality. It is the crudest measure of how well connected a node is to other nodes.

It is measured by counting the number of edges (connections) a node has.
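A minimal sketch of the counting rule, assuming an undirected network stored as a list of edge pairs. The names in the network and the `degree` helper are made up for illustration.

```python
from collections import Counter

def degree(edges):
    """Count how many edges touch each node in an undirected edge list."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)

# Hypothetical friendship network
edges = [("Ann", "Bob"), ("Ann", "Cat"), ("Ann", "Dan"), ("Bob", "Cat")]
degrees = degree(edges)
print(degrees)  # Ann touches three edges, so Ann has the highest degree
```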

Question 2

Q

In networks, what is betweenness? How is it measured?

Answer

A

A measure of global centrality. It is a way to measure how often a specific node sits on the routes that connect other nodes.

It is measured by counting all the SHORTEST paths in a network that the node is on.

To calculate: Take the shortest paths between all two-node combinations and count how many times the specific node appears.
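The counting procedure can be sketched for a small network. This assumes an undirected network stored as a dictionary of neighbor lists. It also assumes the common convention that when a pair of nodes has several equally short paths, the node gets credit for the fraction of them it lies on. The helper names and the star-shaped example are hypothetical.

```python
from collections import deque
from itertools import combinations

def all_shortest_paths(graph, source, target):
    """Return every shortest path from source to target (breadth-first)."""
    paths = []
    shortest = None
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        if shortest is not None and len(path) > shortest:
            break  # every remaining path is longer than the shortest one
        if path[-1] == target:
            shortest = len(path)
            paths.append(path)
            continue
        for neighbor in graph[path[-1]]:
            if neighbor not in path:
                queue.append(path + [neighbor])
    return paths

def betweenness(graph, node):
    """Sum, over all pairs of other nodes, the share of shortest paths through node."""
    score = 0.0
    others = [n for n in graph if n != node]
    for s, t in combinations(others, 2):
        paths = all_shortest_paths(graph, s, t)
        if paths:
            score += sum(1 for p in paths if node in p) / len(paths)
    return score

# Star network: B is the hub, so every A-C, A-D, C-D route runs through B
graph = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
print(betweenness(graph, "B"), betweenness(graph, "A"))
```

Enumerating paths like this is only practical for small networks.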

Question 3

Q

in networks what is centrality?

Answer

A

the extent to which each node is connected to other nodes and appears in the center of the graph.

Question 4

Q

In networks, what are the 4 measures of centrality?

Answer

A

degree
farness
closeness
betweenness

Question 5

Q

In networks what is a node?

Answer

A

An individual unit in the analysis

Question 6

Q

in networks what is an edge?

Answer

A

a line that represents the existence of a relationship between any pair of nodes.

Question 7

Q

What is a directed network? Give an example

Answer

A

A directed network is a network in which each edge has a direction: it runs from one node to another, so the relationship only travels one way.

Example: Twitter, where one user can follow another without being followed back.

Question 8

Q

What is an undirected network? Give an example

Answer

A

An undirected network is a network with edges that represent a two-way relationship that can travel both directions and therefore has no direction

Example: On Facebook, by being “friends” the relationship has to go both ways.

Question 9

Q

In networks, what does in-degree mean? Give an example

Answer

A

In a directed network, in-degree is a measure of centrality that measures the number of incoming edges a node has

Example: On Twitter, the number of people who follow you is your in-degree

Question 10

Q

in networks, what does out-degree mean? Give an example

Answer

A

in a directed network, out-degree is a measure of centrality that measures the number of outgoing edges a node has

example: on Twitter, the number of people you follow is your out-degree

Question 11

Q

in networks, what is farness? How is it measured?

Answer

A

Farness is a measure of centrality that measures how far away (in distance) a node is from every other node.

To measure farness, sum the distances between a node and every other node.

Question 12

Q

in networks, what is closeness? how is it measured?

Answer

A

Closeness is the inverse of farness. It tells you how close a node is to every other node.

To measure closeness, divide 1 by farness (closeness = 1 / farness)
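A short sketch of farness and closeness, assuming a connected, undirected network stored as a dictionary of neighbor lists, with distance counted as the number of edges on the shortest path. The function names and the four-node chain are made up for illustration.

```python
from collections import deque

def distances_from(graph, start):
    """Shortest-path distance (number of edges) from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def farness(graph, node):
    """Sum of the distances from node to every other node."""
    return sum(distances_from(graph, node).values())

def closeness(graph, node):
    """Inverse of farness."""
    return 1 / farness(graph, node)

# Chain A - B - C - D
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(farness(graph, "A"), farness(graph, "B"))  # the end node is farther from everyone
print(closeness(graph, "B"))
```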

Question 13

Q

Explain the intuition behind interactions

Answer

A

When a hypothesis is conditional and the effect of a variable depends on another variable, the second variable becomes part of the equation rather than being “controlled” for in the equation. Interactions model this conditional effect.

Question 14

Q

What three terms are required for interactions?

Answer

A

Two separate constituent variable components and the interaction term.

Question 15

Q

in interactions what do the constituent terms mean

Answer

A

The effect of that term on Y when the other constituent term is zero

Question 16

Q

in interactions what does the interaction term mean

Answer

A

The slope of the conditional relationship

Question 17

Q

In interactions what is the interactive effect

Answer

A

The effect of all three terms

Question 18

Q

What is the equation for interactions

Answer

A

Y = α + β1(Constituent 1) + β2(Constituent 2) + β3(Constituent 1 * Constituent 2)
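One way to read the equation, assuming the interaction term is the product of the two constituent variables: the effect of the first constituent on Y is β1 + β3 * (second constituent), so it changes with the value of the second. The coefficient values below are invented purely for illustration.

```python
def predict(alpha, b1, b2, b3, x1, x2):
    """Y = alpha + b1*x1 + b2*x2 + b3*(x1*x2)."""
    return alpha + b1 * x1 + b2 * x2 + b3 * (x1 * x2)

def effect_of_x1(b1, b3, x2):
    """Change in Y for a one-unit increase in x1 at a given value of x2."""
    return b1 + b3 * x2

# Made-up coefficients
alpha, b1, b2, b3 = 1.0, 2.0, 0.5, 3.0

# When x2 = 0 the effect of x1 is just b1 (the constituent term)
print(effect_of_x1(b1, b3, x2=0))
# When x2 = 1 the effect of x1 is b1 + b3
print(effect_of_x1(b1, b3, x2=1))
# Same answer from comparing two predictions directly
print(predict(alpha, b1, b2, b3, 1, 1) - predict(alpha, b1, b2, b3, 0, 1))
```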

Question 19

Q

What is the unit of analysis

Answer

A

the unit that represents the entity you are studying

ex. country, individual, household, congressional district, state

Question 20

Q

What is the unit of observation

Answer

A

what uniquely identifies the observation being studied.

is a characteristic of the unit of analysis
ex. country-year, state-month, individual-wave

Question 21

Q

What is bias?
What are the 5 types of potential bias in survey sampling?

Answer

A

Bias is a systematic fault in the sampling system. If the error is not systematic, then it is just white noise and not bias.
1.) frame bias
2.) selection bias
3.) Unit non-response bias
4.) Item non-response bias
5.) response bias

Question 22

Q

What is Frame bias?

Answer

A

When the sampling frame (the list the sample is drawn from) is not representative of the general population

Question 23

Q

What is selection bias?

Answer

A

when the sample population is systematically not randomized

Question 24

Q

What is unit non-response bias?

Answer

A

When people in the sample or frame population systematically do not respond/participate in the survey

Question 25

Q

What is item non-response bias?

Answer

A

When participants in the survey systematically do not respond to a specific item on the survey

Question 26

Q

What is response bias?

Answer

A

When respondents lie on the survey or do not tell you the real response

ex.) social desirability bias, people tell you the answer they think is the most socially correct, not their real answer.

Question 27

Q

What are list experiments?
When are they useful?
Example?

Answer

A

In a list experiment, the control group of respondents is given a list of 3 items and asked how many of the 3 they support (or another indicator). The treatment group is given the same list plus an extra 4th item. If the average number of “supported” items is higher in the treatment group than in the control group, this indicates “support” for the 4th item in the list.

Useful when the questions are sensitive or there is social pressure.

Ex.) To determine if Afghans supported the Taliban, a control group was given a list of 3 organizations to support, and the average response was calculated. A treatment group was given the same question and list with the addition of the Taliban. The average number of supported groups was 2 in the control group and 3 in the treatment group. This increase indicates that they do support the Taliban.
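A small sketch of the comparison, assuming each respondent reports only a count of supported items. The difference between the two group averages is then read as the estimated share of respondents who support the sensitive item. The response counts below are made up.

```python
from statistics import mean

# Made-up answers: number of items each respondent says they support
control = [2, 1, 3, 2, 2]      # shown the 3-item list
treatment = [3, 2, 3, 2, 3]    # shown the same list plus the sensitive 4th item

# Difference in averages = estimated support for the 4th item
estimated_support = mean(treatment) - mean(control)
print(round(estimated_support, 2))
```

No individual answer reveals whether that person supports the sensitive item, which is why the design helps with social pressure.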

Question 28

Q

What is probability Sampling?
Why is it used?

Answer

A

Is used to ensure representativeness.
Is when every unit in the population has a known non-zero probability of being selected to participate in the study

Question 29

Q

What is Simple Random Sampling?

Answer

A

Is used to properly randomize the sample. The bigger the sample, the more accurate the results.

In simple random sampling, every unit has an equal selection probability.

Question 30

Q

How do you find the interquartile range?

Answer

A

Subtract Q1 from Q3

Question 31

Q

How do you find Range?

Answer

A

subtract the minimum number from the maximum number

Question 32

Q

How do you find the three Quartiles?

Answer

A

Start by finding the median of the entire list. The median is considered Q2. The median then separates the list into two halves. Locate the median of the first half of the list, this median is Q1. Locate the median of the second half of the list, this median is Q3.

Question 33

Q

How do you determine if a number is an outlier?

Answer

A

You must find the highest and lowest limits of the dataset for non-outlier numbers. To find the lowest acceptable number, take Q1 - (1.5 * IQR). To find the highest acceptable number, take Q3 + (1.5 * IQR). If the number in question is below the lower limit or above the upper limit, it is an outlier.
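The quartile, IQR, and outlier rules can be sketched together. This assumes the median-of-halves method for quartiles and, when the list has an odd length, leaves the middle value out of both halves (one of several conventions). The data values are invented.

```python
def median(values):
    """Middle value, or the average of the two middle values."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    s = sorted(values)
    n = len(s)
    lower_half = s[: n // 2]
    upper_half = s[(n + 1) // 2 :]
    return median(lower_half), median(s), median(upper_half)

data = [1, 3, 4, 5, 7, 8, 9, 30]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
lowest_ok = q1 - 1.5 * iqr
highest_ok = q3 + 1.5 * iqr
outliers = [x for x in data if x < lowest_ok or x > highest_ok]
print(q1, q2, q3, iqr, outliers)
```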

Question 34

Q

in linear regressions, How do you find Standard Deviation? What is the formula?

Answer

A

Formula:
SD = sqrt( (1/(n-1)) * sum( (xi - mean of X)^2 ) )

Steps:
1.) find the mean of X
2.) Subtract the mean from each x value
3.) Square each result from step 2
4.) Add together all the squares
5.) Divide the sum of the squares by the total number of observations minus 1
6.) Square root the result of step 5

The result of step 6 is the standard deviation
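The six steps can be followed line by line in code. This is a sketch using made-up numbers; the comparison with `statistics.stdev` from the Python standard library is only there as a check, since it also divides by n - 1.

```python
import math
import statistics

def sample_sd(xs):
    mean_x = sum(xs) / len(xs)                # step 1: mean of X
    deviations = [x - mean_x for x in xs]     # step 2: subtract the mean
    squares = [d ** 2 for d in deviations]    # step 3: square each result
    total = sum(squares)                      # step 4: add the squares
    variance = total / (len(xs) - 1)          # step 5: divide by n - 1
    return math.sqrt(variance)                # step 6: square root

xs = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_sd(xs), 4), round(statistics.stdev(xs), 4))
```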

Question 35

Q

How do you find the median?
What is the benefit of using median

Answer

A

If you have an odd amount of numbers locate the exact middle number.

If you have an even amount of numbers locate the two middle numbers, add them together, and divide the sum by 2.

benefit: more robust against the impact of outliers

Question 36

Q

How do you find Mean?
What is a detriment to using mean?

Answer

A

add together all of the numbers and divide the sum by the total amount of numbers

Detriment: can be influenced by outliers which pull the average too high or too low.

Question 37

Q

What is non-probability sampling

Answer

A

When members of a population do NOT have a known, non-zero chance of being selected for the study

Question 38

Q

what is a variable?
What is the key rule?

Answer

A

An empirical measure of a concept/characteristic.

key rule: variables must vary across observations

Question 39

Q

what are the 2 types of variables, describe them

Answer

A

1: Quantitative/Interval/Continuous- observations can take on an infinite number of numerical values between any two values (decimals).
2: Categorical — observations belong to one of a discrete set of categories & we assign a number to each category

Question 40

Q

what are the 3 types of categorical variables, describe them

Answer

A

1.) Nominal — categories are named (independent) but there is no order or ranking involved.
2.) Ordinal — categories are ranked
3.) Dichotomous variables — two values (e.g., yes/no)

Question 41

Q

What does the distribution of a variable tell us?

Answer

A

what values a variable takes and how often it takes on these values

Question 42

Q

what are the two types of modes a distribution can have, define them

Answer

A

unimodal: one mode/one hump in a distribution
bimodal: two modes/two humps in a distribution

Question 43

Q

what two S words are used to describe distribution?
define them

Answer

A

symmetric- looks the same on both sides, such as a normal bell curve distribution

skewed- the data bunches on one side of the curve and creates a tail on the other.

Question 44

Q

what is a Z score?

Answer

A

the score given to each observation of a variable which measures the number of standard deviations an observation is above or below the mean

It is a measure of deviation from the mean

It is not sensitive to how the variable is scaled and/or shifted.

Question 45

Q

differentiate the two types of skewness

Answer

A

right skew- the tail is on the right
left skew- the tail is on the left

Question 46

Q

how can you transform variables

Answer

A

You can collapse continuous variables into ordinal (or nominal) variables. this does not work in the reverse
ex. you can turn incomes into categories of incomes

Log Transformation for continuous variables

Question 47

Q

Why do we plot distributions

Answer

A

To better understand the spread of the data and to know if we need to log transform it.

Question 48

Q

What is probability

Answer

A

the set of mathematical tools that measure and model randomness in the world. It is a mathematical model of uncertainty

Question 49

Q

What is independent probability?

Answer

A

the probability is independent when the outcome of any one trial is NOT affected by the outcome of any other trial. This is not the same as mutually exclusive: mutually exclusive events cannot happen together, so knowing one happened changes the probability of the other.

i.e. event A happening does not change the probability of event B happening.

Question 50

Q

What is conditional probability?

Answer

A

the probability is conditional when the outcome of any one trial IS affected by the outcome of any other trial. The events are dependent

i.e. event A happening affects the probability of event B happening

Question 51

Q

What is sampling variability?

Answer

A

the extent to which the value of a statistic differs across a series of samples

Question 52

Q

What is a sample?

Answer

A

The smaller portion of the true population being used for the study.

Example: In a town of 50, 10 people participate in the study. The 10 people are the sample

Question 53

Q

What is population?

Answer

A

The entire group the study hopes to say something about

Example: In a town of 50 people and 10 people participate in the study the population is 50

Question 54

Q

What is the relationship between the sample and the population in the law of large numbers?

Answer

A

As the sample size increases, the sample average of a random variable approaches its expected value, the true population average
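A quick simulation can illustrate the idea, assuming rolls of a fair six-sided die, whose expected value is 3.5. The seed and sample sizes are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Average of n simulated rolls of a fair six-sided die."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# Larger samples tend to land closer to the expected value of 3.5
for n in (10, 1_000, 100_000):
    print(n, round(sample_mean(n), 3))
```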

Question 55

Q

Define probability in terms of outcome

Answer

A

the expected proportion of times that the outcome would occur in the long run

Question 56

Q

What is the equation for conditional probability when the events are independent?
P(A | B)

Answer

A

P(A | B) = P(A and B) / P(B)
This holds in general. When A and B are independent, it simplifies to P(A | B) = P(A).

Question 57

Q

What is the equation for the probability of A?
p(A)

Answer

A

Number of elements in A / total number of elements

Question 58

Q

What is the equation for the probability of A or B if events are not mutually exclusive?
P(A or B)

Answer

A

P(A or B) = P(A) + P(B) – P(A and B)

Question 59

Q

What is the equation for the probability of A and B if both events are independent?
P(A and B)

Answer

A

P(A and B) = P(A) * P(B)

Question 60

Q

What is the equation for the probability of A or B if events are mutually exclusive?

Answer

A

P(A or B) = P(A) + P(B)

Question 61

Q

What is the equation for the probability of A and B if the events are not independent?

Answer

A

P(A and B)= P(A | B)P(B)
or P(B | A)P(A)
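The probability rules on these cards can be checked by listing every outcome of two dice. The sketch below assumes all 36 rolls are equally likely; the events A, B, and C are made up for the check.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely rolls

def prob(event):
    """Number of outcomes in the event / total number of outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] == 6          # first die shows 6
B = lambda o: o[1] == 6          # second die shows 6 (independent of A)
C = lambda o: o[0] + o[1] >= 11  # total is 11 or 12 (depends on A)

p_a, p_b, p_c = prob(A), prob(B), prob(C)
p_a_and_b = prob(lambda o: A(o) and B(o))
p_a_or_b = prob(lambda o: A(o) or B(o))
p_a_and_c = prob(lambda o: A(o) and C(o))
p_a_given_c = p_a_and_c / p_c

print(p_a_and_b == p_a * p_b)               # independent: multiply
print(p_a_or_b == p_a + p_b - p_a_and_b)    # general "or" rule
print(p_a_and_c == p_a_given_c * p_c)       # not independent: use P(A | C)
print(p_a_given_c, p_a)                     # knowing C changes the chance of A
```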

Question 62

Q

How do you find Standard Deviation? What is the formula?

Answer

A

Formula:
SD = sqrt( (1/(n-1)) * sum( (xi - mean of X)^2 ) )

Steps:
1.) find the mean of X
2.) Subtract the mean from each x value
3.) Square each result from step 2
4.) Add together all the squares
5.) Divide the sum of the squares by the total number of observations minus 1
6.) Square root the result of step 5

The result of step 6 is the standard deviation

Question 63

Q

Why do you square and then square root in standard deviation?

Answer

A

You square to eliminate the impact of being on the opposite side of the mean, it negates negative numbers and gives you a flat distance from the mean. Squaring prevents the numbers from canceling each other out so you don’t end up with 0. You need to square root because once the numbers are squared they are no longer in the same units of the original data, the square root brings them back to the same unit.

Question 64

Q

what is a Z score?

Answer

A

the score given to each observation of a variable which measures the number of standard deviations an observation is above or below the mean

It is a measure of deviation from the mean

It is not sensitive to how the variable is scaled and/or shifted.

Answer 65

A

for each observation, subtract the mean of the variable from the observation's value and then divide the result by the standard deviation of the variable.

z score of Xi = (Xi-x̄) / SD of X
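A brief sketch of the z-score calculation, assuming the sample standard deviation (dividing by n - 1). The heights are made up. Converting them from centimeters to inches also shows that rescaling the variable leaves the z scores unchanged.

```python
import statistics

def z_scores(xs):
    """(value - mean) / SD for every observation."""
    mean_x = statistics.mean(xs)
    sd_x = statistics.stdev(xs)
    return [(x - mean_x) / sd_x for x in xs]

heights_cm = [150, 160, 170, 180, 190]
heights_in = [h / 2.54 for h in heights_cm]  # same data, different scale

z_cm = z_scores(heights_cm)
z_in = z_scores(heights_in)
print([round(z, 3) for z in z_cm])
print([round(z, 3) for z in z_in])
```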

Answer 66

A

The threshold is the critical value the z score must reach to be considered statistically significant at the designated confidence level.
90% confidence- 1.64
95% confidence- 1.96
99% confidence- 2.58

Answer 67

A

RMSE = sqrt (RSS/n)

1.) find the value of RSS (subtract predicted y from real y, square the results, add the squares)

2.) divide RSS by the total number of values

3.) square root the result
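The three steps translate directly into a few lines of code. This is a sketch with made-up actual and predicted values.

```python
import math

def rmse(actual, predicted):
    # step 1: RSS = sum of squared (real y - predicted y)
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    # step 2: divide by the number of values; step 3: square root
    return math.sqrt(rss / len(actual))

actual = [3, 5, 7, 9]
predicted = [2, 5, 8, 11]
print(round(rmse(actual, predicted), 4))
```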

Answer 68

A

the absolute fit of the model to the data–how close the observed data points are to the model’s predicted values

Answer 69

A

The number of ways to arrange objects with regard to order

Answer 70

A

The number of ways to select/arrange objects without regard to their arrangement/order
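These two definitions match what are usually called permutations (order matters) and combinations (order does not matter). The sketch below counts both for a made-up group of four people, choosing two at a time, and checks the counts against `math.perm` and `math.comb` (available in Python 3.8 and later).

```python
import math
from itertools import combinations, permutations

people = ["A", "B", "C", "D"]

ordered = list(permutations(people, 2))    # ("A", "B") and ("B", "A") both count
unordered = list(combinations(people, 2))  # ("A", "B") counts once

print(len(ordered), math.perm(4, 2))
print(len(unordered), math.comb(4, 2))
```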

Answer 71

A

a variable whose value is a numerical outcome of a random phenomenon

Answer 72

A

Discrete
* X has a finite number of possible values
* Probability distribution of X lists values and their probabilities
* Can determine probability of event by adding the probabilities of individual outcomes

Continuous
* X can take any value in an interval of numbers
* Probability distribution of X is described by a density curve
* Probability of event is the area under the density curve and above the values of X that make up the event

Answer 73

A

As the sample size increases, the distribution of sample means approaches a Normal distribution (and the standardized sample mean approaches the standard Normal distribution)

Answer 74

A

When x is larger than its mean, y is likely to be larger than its mean

Line is slanted upwards /

Answer 75

A

When x is larger than its mean, y is unlikely to be larger than its mean

Line is slanted downwards \

Answer 76

A

data cluster tightly around a line
indicates the two variables have a strong relationship

Answer 77

A

1.) Correlation is between −1 and 1
2.) Order does not matter: cor(x, y) = cor(y, x)
3.) Not affected by changes of scale
4.) Correlation measures linear association

Answer 78

A

1.) OLS regressions are linear
-uses line of best fit, but it may not be appropriate
-not resistant to the influence of outliers
-slope is constant
-true relationship may not be linear

2.) OLS allows for unreasonable predictions
-only want to generate reasonable predictions
-Evaluating predictions is key to assessing the relationship between variables & the strength of the model

3.) OLS correlations do not necessarily indicate causation
-correlation can be driven by unobserved variables

4.) OLS regressions are versatile and robust
-Models the relationship between IV and DV and allows for making predictions
-allows for including additional variables in the model
-Continuous DVs & continuous and/or dichotomous IVs

Answer 79

A

We worry about omitted variable bias: some underlying (unobserved) factor (X2) is driving the relationship between X1 and Y

It is important to ‘control’ for other variables that we think affect both X1 and Y. When we control we can determine how much effect each X is having on Y

-Venn diagram, find the net effect of each by removing overlapping areas.

Answer 80

A

yes
because it uses a best-fit line to minimize the squared distance from all points to the line, and if one or more points are far out of the pattern, the slope of the line can change considerably

Ex. Palm Beach vote share

Answer 81

A

Y= α +βX + ε

Answer 82

A

Y: dependent variable, what you are trying to predict
α: alpha, is the y-intercept. Where y is when X=0
β: Beta, slope, the increase in Y when X has a one-unit increase
X: independent variable, the predictor
ε: error term, the observed error (actual y - predicted y)
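A minimal sketch of fitting these pieces for one X, using the standard least-squares formulas: slope = sum of (x - mean x)(y - mean y) divided by sum of (x - mean x)^2, and intercept = mean y - slope * mean x. The data points and the `ols_fit` name are made up.

```python
def ols_fit(xs, ys):
    """Return (alpha, beta) for the least-squares line Y = alpha + beta * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx                 # slope
    alpha = mean_y - beta * mean_x   # y-intercept
    return alpha, beta

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
alpha, beta = ols_fit(xs, ys)

# Residuals: actual y minus predicted y
residuals = [y - (alpha + beta * x) for x, y in zip(xs, ys)]
ssr = sum(r ** 2 for r in residuals)  # the quantity least squares minimizes
print(round(alpha, 3), round(beta, 3), round(ssr, 3))
```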

Answer 83

A

Guessing what we do not observe from what we do observe

Answer 84

A

it is unobservable and can only be estimated

Answer 85

A

estimation error would be calculated by subtracting the estimated value from the true population parameter but we can never know the true population parameter and cannot do the calculation

Answer 86

A

an estimator is consistent if as the n increases the estimates converge toward the true population parameter

Answer 87

A

As n increases the X should get closer to the true population parameter

Answer 88

A

an estimator is unbiased if over repeated sampling the parameter estimates produced are on average correct

Answer 89

A

when the standard deviation of a statistic is estimated from the data

Tells us on average how far the sample mean estimate is likely from the true population parameter

Answer 90

A

standard deviation / √n

Answer 91

A

Standard error is impacted by the size of n and standard deviation ignores n

Answer 92

A

√ (x̄*(1-x̄)/n)

x̄= estimate

Answer 93

A

√ (p*(1-p)/n)

p=sample average

Answer 94

A

√ ( (s1^2 / n) + (s2^2 / m) )

n= sample size of sample 1
m= sample size of sample 2
s1= sample 1 standard deviation
s2= sample 2 standard deviation
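A quick numerical check of this formula, assuming the whole sum sits under the square root. The standard deviations and sample sizes are invented.

```python
import math

def se_difference(s1, n, s2, m):
    """Standard error of the difference between two sample means."""
    return math.sqrt(s1 ** 2 / n + s2 ** 2 / m)

# Made-up inputs: SDs of 3 and 4, with 100 observations in each sample
se = se_difference(s1=3, n=100, s2=4, m=100)
print(round(se, 3))
```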

Answer 95

A

how confident we can be that, over repeated sampling, the population parameter will be in the confidence interval

Answer 96

A

a range of numbers that contains the true value 95% of the time over repeated data generating process

Can be created around the null or around the estimate

can also be at 90% or 99%

Answer 97

A

[(estimate - 1.96 * SE), (estimate + 1.96 * SE)]

at other levels exchange 1.96 for the critical z value.

to build around the null exchange the estimate for the null value (usually 0)
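A sketch of building the interval around an estimate, assuming the estimate is a sample mean, SE = standard deviation / √n, and a sample large enough to use the z value of 1.96. The sample values are made up.

```python
import math
import statistics

def confidence_interval(xs, z=1.96):
    """Return (lower, upper) as estimate -/+ z * SE, with SE = SD / sqrt(n)."""
    estimate = statistics.mean(xs)
    se = statistics.stdev(xs) / math.sqrt(len(xs))
    return estimate - z * se, estimate + z * se

# Made-up sample of 100 observations
sample = [4, 6] * 50
lower, upper = confidence_interval(sample)

# If zero falls outside the interval, the estimate is distinguishable from a null of 0
significant = not (lower <= 0 <= upper)
print(round(lower, 3), round(upper, 3), significant)
```

Passing `z=1.64` or `z=2.58` gives the 90% or 99% interval instead.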

Answer 98

A

say the CI contains the true parameter 95% (or 90% or 99%) of the time over repeated sampling

Answer 99

A

when the n is very small

Answer 100

A

n-1-k
for t-distribution just do n-1
n = number of observations
k = number of parameters to be estimated

Answer 101

A

it penalizes smaller samples more by requiring wider confidence intervals

Answer 102

A

You use a t-test when n is less than 50. Above 50, the z score takes over because the 95% threshold stays around 1.96. Below 50, you need the t chart to find the exact confidence threshold for your degrees of freedom, and then you build a confidence interval using t just as you would with z to determine if the estimate is statistically significant.

Answer 103

A

mistakenly reject the null hypothesis

Answer 104

A

mistakenly fail to reject the null
hypothesis

Answer 105

A

1- determine the value of the null hypothesis (usually 0)

2- determine what µ (mean) would be if the null were true

3- solve for the t-statistic

(estimate - null) / standard error, where standard error = standard deviation / √n

4- use the chart to determine the p-value for the specific t-statistic

5- judge if the p-value is significant at the specified level.

90%- less than or equal to .10
95%- less than or equal to .05
99%- less than or equal to .01
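A rough sketch of steps 3 to 5. It swaps the t chart for a large-sample Normal approximation of the two-sided p-value, so it only stands in for the chart when n is large. The estimate and standard error are made up.

```python
import math

def t_statistic(estimate, null_value, standard_error):
    """(estimate - null) / standard error."""
    return (estimate - null_value) / standard_error

def two_sided_p_normal(t):
    """Two-sided p-value under the standard Normal (large-n approximation)."""
    return math.erfc(abs(t) / math.sqrt(2))

t = t_statistic(estimate=0.6, null_value=0.0, standard_error=0.25)
p = two_sided_p_normal(t)

for level, cutoff in (("90%", 0.10), ("95%", 0.05), ("99%", 0.01)):
    print(level, "significant" if p <= cutoff else "not significant")
```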

Answer 106

A

when zero is included in the confidence interval. when zero is in the bounds of the CI you cannot distinguish the estimate and the null at the designated confidence level

Answer 107

A

If the average treatment effect is at least double the standard error, it is likely statistically significant

Answer 108

A

coefficient/standard error

Answer 109

A

because the statistically significant threshold is not steady, you need the table to tell you the threshold

Answer 110

A

Exogeneity and Homoskedasticity

Answer 111

A

The mean of ε does not depend on X; the error is random and uncorrelated with your Xs

the main problems are endogeneity (the explanatory variable is correlated with the error term) and omitted variable bias

Answer 112

A

the variance in the error does not depend on x
and the error is fairly uniform throughout

Answer 113

A

least squares

ssr= sum ((actual y - predicted y)^2)

Answer 114

A

under the exogeneity and homoskedasticity assumptions standard errors are unbiased

under exogeneity, alpha and beta are unbiased

t distribution is often used

Answer 115

A

1.) Is p ≤ 0.05?
2.) is t greater than 1.96 in two-tailed test (with a large N)
3.) Does a 95% confidence interval have the same sign on the lower and upper bound?

Answer 116

A

1.) the size of the coefficient (the bigger the effect X has on Y the less likely it is that the true effect in the population is zero)

2.) the size of the standard error (t-statistic is the coefficient divided by standard error -smaller standard error means bigger t)

3.) sample size (the same t gives a smaller p-value as degrees of freedom increase. This effect diminishes quickly and (essentially) stops at about 1,000 degrees of freedom. A larger sample also makes the SE smaller regardless of the truth)

Answer 117

A

because statistical significance does not tell us how large or meaningful the effect of X is on Y

What a one-unit change in x means and what that predicts in terms of units of Y

Answer 118

A

a measure of standard error (how far the estimate is from the mean in terms of standard errors )

Answer 119

A

you need to have error in how x changes or else you cannot understand why y varies. you need variance among all components to get the standard error to be able to evaluate significance. If there is no error you cannot determine statistical significance.

Tells you how precise the predictions are

Answer 120

A

(mean X - population mean) / (s/√n)

Answer 121

A

The number of edges in the shortest path between two nodes