Notes COPY Flashcards

1
Q

Experimental observations

A

the observation of a variable factor under controlled conditions to determine if this changes as the result of the manipulation of another variable

2
Q

Hypothesis testing

A

generating a theory from observations through inductive reasoning; this is usually the first step of most analyses in the social sciences

3
Q

Can you use the same data/information that gave rise to a theory to test that theory?

A

No

4
Q

Random treatment

A

some subjects get a treatment, others do not, and we observe the outcome

5
Q

Random treatment in identical subjects

A

confidence that any difference between the treated and the non-treated group is due to the treatment itself

6
Q

Random treatment in heterogeneous subjects

A

cannot have full confidence that any difference between the treated and the non-treated group is due to the treatment itself, but can get close to full confidence because treatment is randomised, especially with larger sample sizes

7
Q

Self-selection

A

subjects put themselves forward for participation in the experiment

8
Q

Sources of endogeneity bias (problems in causal inference)

A

omitted variables, simultaneity, reverse causality, selection

9
Q

Identification strategy

A

a research design that addresses endogeneity bias in order to derive a robust causal inference

10
Q

Regression analysis

A

a set of statistical methods used for the estimation of relationships between a dependent variable and one or more independent variables

11
Q

Probability distribution

A

a statistical function that gives the probabilities of occurrence of possible outcomes for an experiment within a given range

12
Q

Sampling

A

the process by which we select a portion of observations from all the possible observations in the population (done in order to learn about the larger population)

13
Q

Difference between good and bad sample

A

due to both the sampling procedure and luck

14
Q

Random sample

A

every possible sample of a given size has an equal chance of being selected

15
Q

Central Limit Theorem

A

there is a systematic relationship between the probability that we will pick a particular sample, and how far that sample is from the true population average

16
Q

Sampling distribution

A

a picture that shows the relationship between the many possible sample averages we might conceivably calculate from different samples and the probability of getting those sample averages

17
Q

What is Beta in this sampling distribution and what do we know?

A

Beta is the true mean which is unknown, we know the shape of the sampling distribution

18
Q

What do the 0 and 15 represent here?

A

0 is the mean under H0, 15 is the cut-off value for a one-tail test at 5%

19
Q

Is the null hypothesis correct here?

A

Yes, as the mean under H0 is equal to the true mean

20
Q

Is the null hypothesis correct here?

A

No, as the mean under H0 is not equal to the true mean

21
Q

What are the two possible errors one can make in hypothesis testing?

A

Type I error and Type II error

22
Q

What is a Type I error and can the likelihood of it be controlled?

A

the null hypothesis is correct but we mistakenly reject it; yes, its likelihood can be controlled

23
Q

What is a Type II error and can we know the likelihood of it?

A

the null hypothesis is incorrect but we fail to reject it; we cannot know what the likelihood of a Type II error is

24
Q

What happens to a Type II error as the likelihood of a Type I error decreases?

A

it increases

25
Null hypothesis
has the status of a maintained hypothesis that will not be rejected unless the sample data provide strong contrary evidence, as it is assumed true until proven false (it is in a favoured position relative to the alternative hypothesis)
26
Statistical significance level
the probability of the observed coefficient if the null hypothesis were true
27
Regression line
can be thought of as a 'guessing rule' that takes any value of X and maps it to a predicted value of Y (and vice versa)
28
What is the average of the residuals in a regression line always equal to?
0
29
What does the error Term (/residual) equal?
equals the true (observed) value of Y minus the predicted value of Y
30
What does the Central Limit Theorem tell us?
the probability distribution of the sample residuals (errors) from a regression will be a normal distribution with mean 0; for sufficiently large sample sizes the sampling distribution of the sample mean will be approximately normally distributed
31
Normal distributions
have in common a fixed relationship between the probability mass and the standard deviation
32
How do you calculate the standard normal?
(/z-distribution) 1) subtract the mean, beta, from every value along the X-axis; 2) divide every value on the X-axis by the original standard deviation. The resulting distribution is a normal curve with (new) mean equal to 0 and (new) standard deviation equal to 1
33
How do you calculate the t-test?
a test of statistical significance, where t = (beta_hat - beta_H0)/sd_beta_hat; since beta_H0 equals 0 under the null, t = beta_hat/sd_beta_hat
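A minimal sketch of this calculation, using hypothetical numbers for the estimated coefficient and its standard error:

```python
# t = (beta_hat - beta_H0) / sd_beta_hat, with beta_H0 = 0 under the null.
beta_hat = 0.42      # estimated slope coefficient (assumed for illustration)
se_beta_hat = 0.15   # estimated standard error of beta_hat (assumed)
beta_H0 = 0.0        # value of beta under the null hypothesis

t_stat = (beta_hat - beta_H0) / se_beta_hat
print(t_stat)        # ~2.8 > 1.96, so reject H0 at the 5% level (two-tailed test)
```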
34
What does the t-statistic tell you?
how many standard deviations of the sampling distribution (or standard errors) beta hat is from the mean under the null hypothesis, 0 (if sufficiently far away, then null hypothesis is rejected)
35
Counterfactual
a potential outcome that would have happened in the absence of the cause, not observable
36
Statistical inference
generalising from sample to population
37
Causal (statistical) inference
understanding cause- and effect- relationships
38
Law of Large Numbers (LLN)
the sample mean will converge in probability to the population mean as the sample size grows larger
39
What are the practical implications of the Law of Large Numbers (LLN)?
we can trust large sample sizes to yield accurate parameter estimates as the sample mean is used to estimate the population mean
40
What are the practical implications of the Central Limit Theorem (CLT)?
allows us to make probabilistic statements about the sample mean and construct confidence intervals (the normal distribution approximation is crucial for applying z-tests and t-tests)
41
What does a regression do?
compares treatment and control subjects who have the same observed characteristics
42
What does regression-based causal inference have to assume?
that when key observed variables have been made equal across treatment and control groups, selection bias from the things we cannot see is mostly eliminated
43
What are regression estimates?
weighted averages of multiple matched comparisons
44
What do dummy variables do?
classify data into yes-or-no categories
45
How are residuals calculated?
the difference between the observed Y and the fitted values generated by the regression, Y_hat
46
How is regression analysis accomplished?
by choosing values that minimise the sum of squared residuals
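A minimal sketch of this idea under simulated data (variable names are assumed): the closed-form OLS slope and intercept minimise the sum of squared residuals, and the residuals average to 0 by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)     # true alpha = 1, beta = 2 (simulated)

beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # OLS slope estimate
alpha_hat = y.mean() - beta_hat * x.mean()                    # OLS intercept estimate

residuals = y - (alpha_hat + beta_hat * x)
print(alpha_hat, beta_hat, residuals.mean())  # residual mean is ~0 by construction
```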
47
Omitted Variable Bias
selection bias generated by inadequate controls (not enough or not the right ones)
48
What does the robust standard error formula allow for?
the possibility that the regression line fits more or less well for different values of X (heteroskedasticity)
49
What is the equation for a line?
Y = alpha + beta*X
50
What is the equation for a model?
Y = alpha + beta*X + u
51
What is beta_hat?
the regression slope coefficient which is a function of the sample residuals
52
What does the t-statistic tell you?
how many standard deviations (/standard errors) of the sampling distribution beta_hat is from the mean under the null hypothesis (if sufficiently far away, you reject the null hypothesis)
53
What happens if you set the criteria for rejecting the Null hypothesis to be stricter?
increases Type II error
54
What does the null hypothesis state?
always that there is no relationship (to reduce the risk of Type I error as it is worse than Type II error)
55
Is there a mechanical way to know the truth?
No
56
How do you compare to the standard error?
divide the coefficient by 2 and compare it to the standard error; if the halved coefficient is less than the standard error, you fail to reject the null hypothesis
57
What does the p-value measure?
the probability of obtaining the observed results, assuming that the null hypothesis is true (the lower it is, the greater the statistical significance of the observed difference)
58
In the (population) linear regression function Y = alpha + beta*X + u, what does each term represent?
Y is the dependent variable; X is the independent variable; alpha and beta are the parameters we want to estimate; u is the error term
59
What does the Ordinary Least Squares method aim to do?
minimise the sum of squared residuals
60
What t-statistic indicates statistical significance?
larger than 1.96 at 5% level for a two-tailed test
61
What does the population regression function look like?
Y = alpha + beta*X + u
62
What does the sample regression function look like?
Y = alpha + beta*X
63
When does omitted variable bias occur?
when the omitted variable is correlated with both the dependent variable and the independent variable of interest
64
What could the direction of bias do?
a bias that operates in the opposite direction to the hypothesised effect may actually strengthen the final argument, because it makes it harder to reject the null hypothesis, not easier
65
What do dummy variables capture?
some "qualitative" characteristic of each observation that does not have an obvious numerical variable
66
What is an omitted category (/control dummy variable)?
one dummy variable that is not included in the regression; coefficients are interpreted relative to the omitted category
67
What do you have to do when running a regression with dummy variables(, e.g. north, south, east, west)?
omit the intercept term or include the intercept term and omit one dummy variable
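A small sketch of the second option (keep the intercept, omit one dummy), using hypothetical region categories:

```python
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "east", "west", "north"]})
dummies = pd.get_dummies(df["region"], drop_first=True)  # one category is omitted
print(dummies.columns.tolist())  # coefficients are then interpreted relative to the omitted region
```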
68
When do you use an interaction term?
when the relationship between the dependent and independent variable changes depending on the value of another independent variable
69
If a 'mechanism' variable connects X and Y and you hold the 'mechanism' variable constant across the whole data, what should you not observe?
you should not observe any relationship between X and Y anymore (beta_hat would not be statistically significantly different from zero)
70
What are parameters?
alpha and beta; constant values to be estimated that represent the relationship between variables
71
What are variables?
Y and X; observed or measured quantities that can change. Y is the dependent (outcome) variable, X is the independent (explanatory) variable
72
What does the error term represent and what does it account for?
represents unobserved factors affecting Y; accounts for randomness and measurement errors
73
What are exogenous variables?
determined outside the model and not influenced by other variables in the model
74
What are endogenous variables?
determined within the model and influenced by other variables in the model
75
When do endogeneity concerns in a regression model occur?
when explanatory variables are correlated with the error term (the estimated coefficients will be biased, meaning they do not converge to the true population parameters as the sample size increases)
76
If the true regression is Y = beta0 + beta1*X1 + beta2*X2 + u and the misspecified regression is Y = alpha0 + alpha1*X1 + v, when does omitting X2 lead to a bias in alpha1?
when X2 has a non-zero effect on Y (meaning X2 is a relevant explanatory variable for Y) and when X2 is correlated with X1 (which allows the effect of X2 to be partially captured by X1 in the misspecified model)
77
What do "country fixed effects" do?
allow each country to have its own intercept term
78
When "country fixed effects" are included, what does the single common slope coefficient across coefficient represent?
the average 'within' relationship within countries (regression will look like a set of parallel lines with different intercepts but same slope)
79
When "country fixed effects" are included, what does each country's intercept (their fixed effect) absorb?
will completely absorb any 'between variation' in the data
80
What do fixed effects force the regression to estimate?
the relationship within the countries
81
In panel data with country fixed effects, what do the slope coefficients capture?
a weighted average of the 'within' relationship in each country
82
What do country fixed effects control for?
'between' variation (control for both observed and unobservable time-invariant country characteristics (omitted variables))
83
What do time fixed effects control for in an analysis with countries?
control for time-varying observable and unobservable omitted variables that are common across countries (e.g. global shocks and trends)
84
What must the estimates of standard errors take into consideration?
the patterns of correlations between the units of analysis
85
When are clustered standard errors necessary?
when an estimating regression includes variables of different degrees of aggregation
86
What are clustered standard errors?
a type of robust standard error that estimates the standard error of a regression parameter when observations are grouped into smaller clusters
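A minimal sketch (simulated data, assumed column names) of clustering with statsmodels, allowing residuals to be correlated within each country:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "country": np.repeat(["A", "B", "C", "D"], 50),
    "x": rng.normal(size=200),
})
df["y"] = 0.5 * df["x"] + rng.normal(size=200)

# Cluster the standard errors by country so within-country correlation is allowed for.
model = smf.ols("y ~ x", data=df).fit(cov_type="cluster",
                                      cov_kwds={"groups": df["country"]})
print(model.bse)  # clustered standard errors
```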
87
When the regression is level(Y)-level(X), how do you interpret it?
a unit-increase in X results in a beta unit increase in Y
88
When the regression is level(Y)-log(X), how do you interpret it?
a 1% increase in X leads to a beta/100 unit increase in Y
89
When the regression is log(Y)-level(X), how do you interpret it?
a unit increase in X results in a 100*beta% increase in Y
90
When the regression is log(Y)-log(X), how do you interpret it?
a 1% increase in X leads to a beta% increase in Y
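A worked reading of the four cases above, using a hypothetical beta = 0.05:

```python
# level-level: a 1-unit increase in X -> a 0.05-unit increase in Y
# level-log:   a 1% increase in X     -> a 0.05/100 = 0.0005-unit increase in Y
# log-level:   a 1-unit increase in X -> a 100*0.05 = 5% increase in Y
# log-log:     a 1% increase in X     -> a 0.05% increase in Y
beta = 0.05
print(beta, beta / 100, 100 * beta)
```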
91
If a fixed effect is correlated with X and is left out of the regression, what does that mean for the regression of Y on X?
it is biased
92
When can fixed effects be used in cross-sectional data?
they can be used to account for group-level characteristics that are constant within groups but vary between them (unobserved heterogeneity at the group level)
93
In what type of data are fixed effects most commonly used?
in panel data
94
What are two-way fixed effects?
entity fixed effects and time fixed effects
95
What is left after two-way fixed effects?
variation within-entity, over-time deviations relative to the overall time trends (estimation focuses on deviations of each entity from its average trajectory over time, relative to global or common time trends) --> remaining variation is how Y and X deviate from entity-specific average and global time trend for each year (time fixed effects absorb all variation common to all entities in given time-frame, entity fixed effects remove any time-invariant differences between entities)
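A minimal sketch (simulated panel, assumed column names) of two-way fixed effects implemented with dummy variables: C(country) absorbs time-invariant country differences and C(year) absorbs shocks common to all countries in a given year.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
countries, years = ["A", "B", "C"], [2000, 2001, 2002, 2003]
df = pd.DataFrame([(c, t) for c in countries for t in years],
                  columns=["country", "year"])
df["x"] = rng.normal(size=len(df))
df["y"] = 0.3 * df["x"] + rng.normal(size=len(df))

twfe = smf.ols("y ~ x + C(country) + C(year)", data=df).fit()
print(twfe.params["x"])  # slope identified from within-country, within-year variation
```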
96
What would the interacted fixed effect 'continent*year' fixed effect account for?
continent-specific shocks or trends that vary over time, but which affect all countries within a given continent the same way in each year (capture continent-level trends that vary from year to year, but are constant across all countries within the continent in a given year) --> estimates relationship between X and Y based on the within-country variation, net of time trends that are specific to the country or continent for a particular year
97
What is selection bias?
a distortion in a measure of association due to a sample selection that does not accurately reflect the target population
98
What are the four main examples of evidence?
1) testing the robustness of results to the inclusion of different sets of control variables and/or different methods of estimation 2) "placebo" or falsification tests (looking for an effect in otherwise similar circumstances but where you know no 'treatment' has been administered) 3) use of qualitative supporting evidence (e.g. descriptive information, survey data, ethnographic data, historical archives) 4) exploiting otherwise unlikely testable hypotheses from theory (Einstein approach)
99
What does reverse causality with any variable in a regression do to the coefficient estimates?
will bias all the coefficient estimates
100
What is reverse causality?
When Y causes X
101
What is simultaneity?
when X and Y are determined simultaneously (each affects the other), so the explanatory variable is correlated with the error term
102
How is the 2SLS estimator calculated?
(reduced form)/(first stage)
103
What type of estimation is Instrumental Variable Estimation?
Local Average Treatment Effect (LATE)
104
How do you test the exclusion restriction?
if there is more than one instrument, one can run an 'over-identification' test of the hypothesis that delta1=delta2=0
105
Are 'over-identification' tests strong?
No, they are weak tests
106
When is an instrumental variable valid? (3 points)
1) must be significantly correlated with the endogenous explanatory variable of interest; 2) only impacts the dependent variable via its impact on the endogenous explanatory variable (exclusion restriction); 3) is itself not endogenous, in that the dependent variable cannot cause the instrumental variable
107
What does the exclusion restriction state?
that the instrumental variable only impacts the dependent variable via its impact on the endogenous explanatory variable
108
Does the 'treatment' (/causal) effect for any individual remain observed or unobserved?
unobserved
109
What does the Average Treatment Effects (ATE) equal?
the difference between the average outcome across all units if all units were in the treatment condition and the average outcome across all units if all units were in the control condition
110
How does true ATE emerge?
when we take averages across many individuals, the differences due to other unobserved factors tend to cancel out (,especially if treatment assignment is random)
111
What are compliers?
subjects who do what they are told
112
What are always-takers?
subjects who always take up the treatment whether they are told to or not
113
What are never-takers?
subjects who never take up the treatment whether they are told to or not
114
What are defiers?
subjects who always do the opposite of what they are told
115
What is the Intention-To-Treat effect (ITT)?
the estimate used when being an always- or never-taker is correlated with outcomes, and the treatment and control groups (as actually treated) are unbalanced
116
How do you calculate the treatment on the compliers?
(ITT)/(% of compliers) (as long as you can rule out the presence of 'defiers')
117
What is the Local Average Treatment Effect (LATE)?
local treatment effect as it is the treatment effect only on a subset of the population
118
When is the LATE estimate very close to the ATE estimate?
when the subset in LATE is similar to the rest of the population
119
What are two forms of LATE?
treatment on compliers; instrumental variables
120
What is a covariate?
measurable variable that may impact a study's outcome (and has a statistical relationship with the dependent variable)
121
Do instrumental variables harness random assignment?
harness partial or incomplete random assignment whether naturally occurring or generated by researchers
122
What happens after standardisation?
values are measured in units defined by the standard deviation of the reference population
123
How is standardisation done?
by subtracting the mean and dividing by the standard deviation of the reference population
124
What is the independence assumption?
instrumental variables are randomly assigned, and so are unrelated to the omitted variables we might like to control for
125
What is the reduced form?
the direct effect of the instrument on outcomes, which runs the full length of the chain
126
How is the causal effect of interest determined (LATE in IV)?
determined by the ratio of reduced form to first-stage estimates
127
Compliers and LATE
LATE is the average causal effect of interest on such people
128
What is monotonicity?
the no-defiers assumption, meaning that the instrument pushes those affected in one direction only
129
What is the LATE theorem?
for any randomly assigned instrument with a nonzero first stage, satisfying both monotonicity and an exclusion restriction, the ratio of reduced form to first stage is LATE (the average causal effect on compliers)
130
What is the average causal effect called?
treatment effect on the treated (TOT)
131
Is the TOT the same as LATE?
usually not the same, as the treated population includes always-takers
132
What is external validity?
whether a particular causal estimate has predictive value for times, places, and people beyond those represented in the study that produced it
133
What is the best evidence for external validity?
from comparisons of LATEs for the same or similar treatments across different populations
134
What are ITT effects?
effects of random assignment in randomised trials with imperfect compliance, where treatment assigned differs from treatment delivered (captures the causal effect of being assigned to treatment, but ignores the fact that treatment delivered may differ from treatment assigned)
135
What is the ITT the reduced form for?
ITT is the reduced form for a randomly assigned instrument (dividing ITT estimates from a randomised trial by the corresponding difference in compliance rates recovers the effect on compliers)
136
What are two assumption checks for instrumental variable?
1) first stage, by looking for a strong relationship between instruments and the proposed causal channel; 2) independence, by checking covariate balance with the instrument switched on and off, as in a randomised trial
137
Can the exclusion restriction be easily verified?
No
138
When does finite sample bias occur?
occurs when the instrumental variable estimator does not converge to its true value as the sample size increases
139
What is the fixed effects estimator also called?
within estimator
140
What variation does the IV regression use?
only use variation in the endogenous regressor that is induced by the instruments
141
What do the IV estimators tell us?
IV estimators only tell us the effect on the outcome of the type of variation in the endogenous variable that is typically induced by the instruments
142
What is used to estimate IV estimators?
2SLS
143
What are the steps of 2SLS?
1) regress X on Z (X = gamma0 + gamma1*Z) and obtain the fitted values X_hat; 2) regress Y on X_hat --> the estimated coefficient on X_hat, beta_hat, is the causal estimate of interest
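A minimal hand-rolled sketch of the two stages on simulated data (names and numbers are assumed); in practice a dedicated 2SLS routine is used so that the standard errors are computed correctly.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1000
z = rng.normal(size=n)                    # instrument
u = rng.normal(size=n)                    # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)      # endogenous regressor
y = 1.5 * x + u + rng.normal(size=n)      # true causal effect of x is 1.5

first_stage = sm.OLS(x, sm.add_constant(z)).fit()
x_hat = first_stage.fittedvalues          # stage 1: fitted values of X from Z
second_stage = sm.OLS(y, sm.add_constant(x_hat)).fit()
print(second_stage.params)                # stage 2: coefficient on x_hat is ~1.5
```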
144
Are differences-in-differences commonly used?
one of the most common mainstays of quantitative causal analysis
145
What should a convincing identification strategy show in a DiD analysis?
convincing identification strategy should show that the control group is a valid counterfactual for the treated group
146
What are parallel (common) trends? (DiD)
before treatment, both groups were following the same trend, even if they had different levels
147
Is it possible to control for differences in trends if parallel trends is violated? (DiD)
Yes
148
Is parallel trends enough? (DiD)
No, story must be compelling for estimator to be convincing
149
What is the 'treatment' arguably exogenous to? (DiD)
arguably exogenous to factors related to time trends in the outcome variable
150
For 'treatment', can something else have happened to the treated group at the same time as the treatment that could affect the outcome variable? (DiD)
No
151
What do robustness checks in DiD do?
ensure that it is likely the treatment itself caused the change
152
What is the difference-in-differences estimate?
the coefficient on the interaction term
153
What is an event study?
it is part of the DiD estimating strategy exploiting the staggered timing of treatment
154
What does an event study have to do to work?
the year in which treatment started is normalised to 0 for all treated units
155
What do the time terms describe in an event study?
time terms describe the dynamic path in the years before treatment started and years after
156
What is the purpose of an event study model?
used for the purpose of estimating dynamic treatment effects when there are multiple instances of a treatment (an 'event') (treatments can occur simultaneously across all units, or staggered across time)
157
What do the coefficients in an event study model capture after the event occurring?
capture the dynamic effects of the treatment as these effects manifest over time since the event
158
What do the terms in an event study model provide before the event occurring?
provide a placebo or falsification test
159
What happens if there are only treated units with common event date in an event study model?
cannot identify treatment effects, as cannot separate effects of event from other confounders that occur in calendar time
160
What happens if there are both treated and untreated units with common event date in an event study model?
can identify treatment effects, as never-treated units help to identify the change in counterfactual outcomes across calendar times
161
What happens if there are only treated units with varying event date in an event study model?
if the timing of the event is as good as random, those treated earlier or later can serve as controls for one another
162
In event study models, what is a variable that is common to exclude?
the dummy variable for j = -1 event time
163
How can a classic differences-in-differences be implemented?
can be implemented using a two-way fixed effects regression that includes unit fixed effects and time fixed effects (unit fixed effects controls for time-invariant characteristics, time fixed effects control for common shocks)
164
What is the equation for a classic DiD?
Y_it = alpha + beta*(Treat_i*Post_t) + gamma*Treat_i + delta*Post_t + u_it
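A sketch of this regression on simulated data (column names and the true effect are assumed): the coefficient on the Treat*Post interaction is the difference-in-differences estimate.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "treat": np.repeat([0, 1], 200),
    "post": np.tile(np.repeat([0, 1], 100), 2),
})
true_effect = 2.0
df["y"] = (1.0 + 0.5 * df["treat"] + 0.3 * df["post"]
           + true_effect * df["treat"] * df["post"] + rng.normal(size=len(df)))

did = smf.ols("y ~ treat * post", data=df).fit()
print(did.params["treat:post"])  # DiD estimate, close to the true effect of 2.0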
165
What does a standard DiD estimate?
estimates single treatment effect
166
Does an event study allow dynamic responses?
Yes
167
What are 3 benefits of an event study?
1) tests parallel pre-trends (beta_k = 0 for k<0); 2) reveals treatment dynamics (the pattern of beta_k over k>0); 3) shows anticipation effects (beta_k =/= 0 just before treatment)
168
What are the implementation details for an event study?
normalise beta = 0 for the omitted period, and add confidence intervals for inference
169
What are the interpretations of the beta regarding time in an event study?
beta_k for k<0 captures pre-trends; beta_0 is the immediate effect; beta_k for k>0 captures dynamic responses
170
In an event study graph what do the solid points, empty circle and vertical bars represent?
solid dots are point estimates (beta_k); the empty circle is the omitted period (normally k=-1, normalised to 0); vertical bars are the 95% confidence intervals
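A sketch (simulated staggered treatment, assumed names) of the estimating regression behind such a graph: event-time dummies with k = -1 omitted as the reference period, plus unit and year fixed effects.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
rows = []
for unit in range(30):
    treat_year = int(rng.choice([2003, 2005, 2007]))   # staggered treatment timing
    for year in range(2000, 2010):
        k = int(np.clip(year - treat_year, -3, 3))     # event time, binned at +/- 3
        effect = 1.0 if k >= 0 else 0.0                 # simple post-treatment effect
        rows.append({"unit": unit, "year": year, "k": k, "y": effect + rng.normal()})
df = pd.DataFrame(rows)

# Event-time dummies with k = -1 as the omitted (reference) period.
es = smf.ols("y ~ C(k, Treatment(reference=-1)) + C(unit) + C(year)", data=df).fit()
# Coefficients for k < 0 should be ~0 (pre-trends); k >= 0 trace the dynamic effect.
print([name for name in es.params.index if name.startswith("C(k")])
```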
171
What are 3 identifying assumptions for a DiD model?
parallel trends; no anticipation; no spillovers
172
Is a suitable control group an important check for DiD?
most important check for DiD
173
When is a suitable control group often feasible for DiD?
often feasible when 'treatment' is (quasi-) random
174
When is finding a suitable control group challenging for DiD?
challenging when treated groups' characteristics that could drive trends are correlated with treatment probability
175
What is matching?
use statistical techniques to construct an artificial control group by identifying for every treated observation an untreated observation that has similar observable characteristics
176
When matching, is there a guarantee groups will be matched on unobservable characteristics?
No
177
What does matching create?
a comparison group for which the joint distribution of observable characteristics is the same as that of the treated group
178
What do matching estimators allow for?
allow broadly for nonlinearities in the relationships between observables
179
Do regression-based estimators return linear approximations?
Yes, unless nonlinearities are introduced, e.g. quadratic terms, logs, interaction terms
180
When are regression-based estimators more likely to give a better answer?
when observable and unobservable characteristics are reasonably uniformly distributed across a continuum
181
When is matching better to find suitable controls?
if both the observables are clustered together, and unobservables are correlated with the observables (some chance that unobservable characteristics cluster as well)
182
How does traditional matching pair (and how to interpret findings)?
pair each 'treated' observation with an observably similar non-treated observation and interpret the difference in their outcomes as the effect of the treatment
183
What is the ATT the mean of?
mean of individual differences from traditional matching (expected difference in outcomes between the treated units and what those same units would have experienced had they not been treated)
184
Why do matching techniques yield ATT and not ATE?
because they examine the difference between the treatment and control on observations that have similar characteristics to the treated group
185
What is the curse of dimensionality problem?
difficult to apply exact matching if conditioning on a large set of characteristics is required
186
What does the Propensity score (Pr(Z)) represent?
probability that a unit of analysis is 'treated' based on observed characteristics (Z) (solution to the dimensionality problem) (probability of receiving treatment given covariates)
187
What type of matching results in more participants being able to be matched than with exact matching?
Propensity Score Matching (PSM)
188
How does Propensity Score Matching (PSM) match?
units with same (or similar) propensity scores are matched
189
For which units can PSM only be done?
can only be done for units whose propensity scores lie within the common support
190
What is the common support?
the values of the propensity score for which there are observations in the data for both the treated and the untreated
191
What is nearest neighbour matching (pairwise matching)?
non-treated unit whose propensity score is closest to the treated is selected as the match
192
Why is nearest neighbour matching often used?
because of its ease of implementation
193
What is Caliper matching?
variation of nearest neighbour matching that attempts to avoid bad matches by limiting the maximum distance between propensity scores allowed
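A minimal sketch (hypothetical propensity scores) of nearest-neighbour matching with a caliper: each treated unit gets the closest control, but only if the gap in propensity scores is within the caliper.

```python
import numpy as np

rng = np.random.default_rng(6)
p_treated = rng.uniform(0.2, 0.8, size=10)   # propensity scores of treated units (assumed)
p_control = rng.uniform(0.0, 1.0, size=50)   # propensity scores of control units (assumed)
caliper = 0.05                                # maximum allowed distance between scores

matches = {}
for i, p in enumerate(p_treated):
    j = int(np.argmin(np.abs(p_control - p)))   # nearest neighbour in propensity score
    if abs(p_control[j] - p) <= caliper:        # keep the match only if within the caliper
        matches[i] = j
print(matches)
```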
194
What does stratification/interval matching require?
requires decision about how wide intervals should be (e.g. intervals so that the mean values of the estimated propensity scores are not statistically different from each other within each interval)
195
What are the nonparametric methods in matching (Kernel matching/local linear matching)?
construct a match for each treated unit using a weighted average over multiple units in the non-treated group rather than a single nearest neighbour (more recent approach)
196
What is the main advantage of nonparametric methods in matching (Kernel matching/local linear matching)?
reduction in the variance of the estimated matched outcome (may come at expense of higher bias)
197
What is the difference between linear probability model and probit/logit model?
in a linear probability model, when Y is a binary dummy variable, predicted values of Y from OLS can be greater than 1 or less than 0, whereas a probit/logit model is an alternative that generates only predicted values between 0 and 1 (nonlinear approximation function)
198
Why is the linear probability model recommended unless there is a strong reason not to use it? (matching)
it has known properties and tends to be much more robust under various conditions
199
What is the probit/logit model commonly used for estimating, and is it 'less biased' than the linear probability model?
common for estimating PSM, and not 'less biased' than the LPM, as it is very unlikely that the true underlying distribution is probit or logit (trading one bias for another)
200
What data do DiD matching estimators require?
require panel data or repeated cross-sectional data both before and after treatment time
201
How do DiD matching estimators identify treatment effects?
by comparing outcome changes of treated observations to outcome changes for matched untreated observations
202
What do DiD matching estimators allow selection into treatment to be based on?
allow selection into treatment to be based on unobserved time-invariant characteristics of observations
203
What do DiD matching estimators control for?
control for time-varying unobserved characteristics to the extent that time varying unobservables are clustered and correlated with time varying observables
204
What can DiD matching estimators not control for?
cannot control for time varying unobservables not correlated with observables
205
What does synthetic control matching (SCM) construct? (matching)
construct a "synthetic" control group as a weighted combination of potential control units, which in theory provides a counterfactual of what would have happened to treated units in absence of treatment
206
What does synthetic control matching (SCM) place a lot of emphasis on?
matching pre-treatment trends between treated unit and synthetic control
207
What cases is synthetic control matching (SCM) suited to?
suited to cases with a clear treatment and control group, but where traditional DiD might be problematic due to lack of parallel trends or other pre-treatment differences
208
What type of data is synthetic control matching (SCM) more suited for?
aggregate-level data and works more effectively when there is a single or few treated units and many potential control units
209
What can synthetic control matching (SCM) not address?
cannot address unobserved confounders that affect post-treatment trends differently than pre-treatment trends
210
What cases is synthetic control matching (SCM) most suitable for?
most suitable for cases where a counterfactual is needed for a single country or region (or at most a few) with a clear treatment and where there is sufficiently long time series data for both treatment and 'donor' countries or regions
211
What challenge does PSM address and how does PSM address it?
the challenge of how to estimate causal effects when treatment is not random (selection bias); the solution is to find 'similar' individuals across treatment and control groups
212
What is the fundamental problem in DiD?
(treatment outcome - no treatment outcome) is wanted, but only one potential outcome is observed, and comparing means introduces selection bias
213
What is the key theorem of PSM?
if selection into treatment depends only on observables (X), then matching on p(X) is as good as matching on X (reduces dimensionality problem)
214
What are 3 questions regarding the dimensionality problem in matching?
When are units "similar"? How close is "close enough"? What are the trade-offs between variables?
215
What do you need to do for multi-dimensional matching?
need to match on each X separately (many cells will be empty)
216
What are 3 benefits of one-dimensional matching?
1) a single number summarises all characteristics; 2) a clear metric for "closeness", so it is easier to find matches; 3) the balancing property ensures the X's are balanced
217
How are Propensity Scores usually estimated?
usually via logistic regression, modelling log(p(X)/(1-p(X))) as a function of the covariates X
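A sketch of this step on simulated data (column names assumed): a logistic regression of treatment status on observed covariates, with the fitted probabilities used as propensity scores.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
df = pd.DataFrame({"x1": rng.normal(size=500), "x2": rng.normal(size=500)})
# Treatment probability depends on observables (selection on observables).
p_true = 1 / (1 + np.exp(-(0.5 * df["x1"] - 0.8 * df["x2"])))
df["treated"] = rng.binomial(1, p_true)

logit = smf.logit("treated ~ x1 + x2", data=df).fit(disp=0)
df["pscore"] = logit.predict(df)   # estimated propensity scores p(X)
print(df["pscore"].describe())
```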
218
What does a high Propensity Score mean?
high score means similar to treated units (low score means similar to non-treated units)
219
In the distribution of Propensity Scores, what does the left tail, right tail and middle represent? What are the implications?
the left tail contains controls with no comparable treated units; the right tail contains treated units with no comparable controls; the middle is where valid comparisons can be made. Implications: the treatment effect is only defined where the distributions overlap (cannot generalise to very high/low propensity regions; balance improves when restricting to the common support)
220
How do you use PSM?
1) find comparable individuals (estimate p(X) = P(T=1|X)) and match treated individuals with similar control individuals; 2) compare matched individuals (single period, or DiD to compare changes over time)
221
What do matches from PSM in single period account for?
account for X, but not pre-treatment outcome
222
What do matches from PSM in DiD account for?
account for X and pre-treatment outcome
223
What do matches from PSM in DiD control for?
controls for observables through matching; controls for time-invariant unobservables through differencing
224
Why is PSM and DiD combined methods more credible than either alone?
PSM ensures similar individuals are compared and DiD removes fixed differences between groups
225
What assumption does PSM alone rely on?
selection on observables assumption
226
In a Regression Discontinuity Design (RDD), what is being found?
'jumps' in the probability of treatment as we move along some running variable
227
How is 'treatment' assigned in Regression Discontinuity Design (RDD)?
assigned to a unit if and only if Z>z, where Z is observable and where z is a known threshold
228
What is Z in a Regression Discontinuity Design (RDD)?
the 'running' (or 'forcing') variable and is a continuous variable assigning units to treatment
229
What does Z in a Regression Discontinuity Design (RDD) depend on?
can depend on unit's characteristics and choices, but there is also a random chance element
230
In a Regression Discontinuity Design (RDD), when is treatment status as good as randomised?
When Z = z (for units at the threshold z, treated and control groups should possess the same distribution of baseline characteristics)
231
What does Regression Discontinuity Design (RDD) require?
requires assignment rule to be known, precise, and free of manipulation (does not have to be arbitrary)
232
What is the complication in a Regression Discontinuity Design (RDD)?
matching cannot be used, as there is no case where the same underlying attribute occurs both with treatment and without treatment (no "common support")
233
What is extrapolation in RDD?
comparing units with different values of the running variable (only overlap on the limit as Z approaches the cutoff from either direction)
234
What does Kernel local linear regression do (RDD)?
gives more weight to observations closer to the cutoff (the bandwidth is a crucial parameter of the kernel function that determines the range around the cutoff within which data are included)
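A sketch (simulated sharp RDD, assumed names) of local linear regression with a triangular kernel: observations closer to the cutoff get more weight, and only data within the bandwidth h are used.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
z = rng.uniform(-1, 1, size=2000)              # running variable, cutoff at 0
treated = (z >= 0).astype(float)
y = 1.0 + 0.5 * z + 2.0 * treated + rng.normal(size=z.size)   # true jump at cutoff = 2

h = 0.3                                         # bandwidth (crucial choice)
w = np.clip(1 - np.abs(z) / h, 0, None)         # triangular kernel weights (0 outside h)

X = sm.add_constant(np.column_stack([treated, z, treated * z]))
fit = sm.WLS(y, X, weights=w).fit()             # separate slopes on each side of the cutoff
print(fit.params[1])                            # estimated jump at the cutoff, ~2
```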
235
Are there warnings against using higher order polynomials in RDD?
Yes, so methods are still evolving
236
What is the local random assignment assumption? (RDD)
there needs to be a non-trivial random chance component to the ultimate precise value of the running variable Z
237
What is the exclusion restriction? (RDD)
a random draw of Z does not itself have an impact on the outcome except through its impact on treatment status
238
What is the continuity assumption? (RDD)
the running variable Z is a smooth, continuous process (absent the treatment, the expected potential outcomes would not have jumped; they would have remained smooth functions of Z)
239
What does the continuity assumption rule out? (RDD)
rules out omitted variable bias at the cut-off itself (if, without the treatment, the expected potential outcomes do not jump at z, then there are no competing interventions occurring at z)
240
Can the continuity assumption be proved? (RDD)
cannot directly prove it, but some implications are empirically testable
241
What is the identifying assumption in RDD?
density of the forcing variable should be smooth around the cutoff
242
What are sharp RD designs?
probability of treatment goes from 0 to 1 at the cutoff, C (treatment status is entirely determined by the running variable, Z)
243
What are fuzzy RD designs?
probability of treatment discontinuously increases at the cutoff (represents a discontinuous "jump" in the probability of treatment when Z>=C, and cutoff is used as an instrumental variable for treatment)
244
What are two common kinds of RDD studies?
sharp and fuzzy
245
What is the cutoff used as in fuzzy RD designs?
as an instrumental variable
246
What are the differences between fuzzy design and sharp design?
the fuzzy design differs from the sharp design in that treatment assignment is not a deterministic function of Z, because there are also other variables that determine assignment to treatment
247
What do randomised evaluations do?
use random assignment to create a counterfactual
248
How do randomised control trials (RCTs) work?
there is a heterogeneous population on observables and unobservables and individuals are randomly assigned to being under treatment or not (comparison group) (the two groups are, on average, comparable)
249
What do we ask to judge internal validity?
Can we infer from the data that policy caused the desired outcome? Did X really cause Y?
250
What do RCTs solve?
solve reverse causality and omitted variable bias by construction
251
What do we ask to judge external validity?
Can we predict that this policy will have the same impact when implemented somewhere else? Will X cause Y in other, similar contexts?
252
What are 4 challenges to internal validity?
Measurement bias; Statistical power; Spillovers; Attrition
253
What is involved in RCTs measurement?
innovative data collection (primary and secondary data collection); evaluation effects; survey-based bias (not restricted to RCTs)
254
What are evaluation effects?
when respondents change their behaviour in response to the evaluation itself instead of the intervention (salience of being evaluated, social pressure)
255
What are 5 evaluation effects?
Hawthorne effects; Anticipation effects; Resentment/demoralisation effects; Demand effects; Survey effects
256
What are Hawthorne effects? (evaluation effects)
behaviour changes due to attention from the study or intervention
257
What are anticipation effects? (evaluation effects)
comparison group changes behaviour because they expect to receive the treatment later
258
What are resentment/demoralisation effects? (evaluation effects)
comparison group resents missing out on treatment and changes behaviour
259
What are demand effects? (evaluation effects)
behaviour changes due to perceptions of the evaluators' objectives
260
What are survey effects? (evaluation effects)
being surveyed changes subsequent behaviour
261
What are 2 solutions to evaluation effects?
1) minimise the salience of the evaluation as much as possible (make sure staff are impartial and treat both groups similarly, e.g. blind data collection staff to the treatment arm); 2) measure the evaluation-driven effects in a subset of the sample (prime a subset of the sample by reminding them of the evaluation)
262
What is statistical power?
the probability of detecting an impact of a given size if there is one (probability of finding an effect if there actually is one)
263
Without statistical power, can we learn much from an experiment?
might not learn much
264
What does statistical power avoid?
avoids false negatives (falsely concluding there is no impact, Type II error)
265
What is statistical power by convention?
80% power is aimed for (expect that 20% of the time, falsely conclude there is no impact)
266
What is statistical significance, what does it avoid and what is it usually set to?
"detecting an effect" avoiding false positives (falsely condoling there is an impact when there is none, Type I error) usually set to 90% or higher (10% of time, false positive is gotten)
267
What can failure to find statistically significant effect be misinterpreted as?
can be misinterpreted as failure of the program, but it might just be a failure of the evaluation
268
What affects statistical power? (7)
effect size (minimal detectable effect size of X on Y); sample size (power calculations); variance of the outcome; unit of randomisation; attrition; spillovers; non-compliance
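A sketch of a simple power calculation for a two-group comparison, using hypothetical inputs: the sample size needed per arm for 80% power to detect a standardised effect size of 0.2 at the 5% significance level.

```python
from statsmodels.stats.power import TTestIndPower

# effect_size, power and alpha are assumed illustrative values.
n_per_arm = TTestIndPower().solve_power(effect_size=0.2, power=0.8, alpha=0.05)
print(round(n_per_arm))   # roughly 394 observations per arm
```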
269
Can the unit of randomisation only be the individual level?
No, can randomise at individual level or relevant unit (e.g. schools, households, villages)
270
What is the challenge of unit of randomisation in clusters?
the challenge is that units within clusters are not independent of one another (e.g. students within the same school likely have similar family income)
271
What is attrition?
Data are missing for some participants in the study (refusals, not located, missing from administrative data, etc.)
272
Does attrition reduce statistical power?
Yes
273
What are spillovers?
the outcomes of comparison units are indirectly affected by the treatment given to the treated units (common causes are geographic proximity, social networks like information transmission or market interactions)
274
What are marketwide/general equilibrium effects?
competing in the same region for market share (the control group is harmed)
275
How can you deal with spillovers?
avoid spillovers (e.g. spatial buffers between treatment and control units, randomise at a higher level); measure spillovers
276
Why can partial compliance happen?
individuals assigned to the treatment group may not receive the program; individuals assigned to the comparison group may access the treatment; can be due to project implementers or the participants themselves
277
What can non compliance lead to?
can lead to sample selection bias and threaten internal validity if not properly accounted for in the analysis
278
When does selection bias occur?
occurs when individuals who receive or opt into the program are systematically different from those who do not
279
Can you switch or drop non-compliers?
No, you cannot switch or drop them; you keep comparing the original groups (treatment and control)
280
What does ITT measure?
difference in means regardless of whether groups received the treatment
281
What overall effect does ITT give?
gives overall effect of intervention, acknowledging that noncompliance is likely to happen
282
How is ITT calculated?
(average outcome in treated group) - (average outcome in control group)
283
What is ToT and how is it calculated?
effect of the treatment on those who complied with their treatment status ((ITT)/((take-up in the treatment group)-(take-up in control group)))
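A worked numeric example of the ITT and ToT formulas above, using hypothetical numbers:

```python
mean_treated_group = 12.0    # average outcome among those assigned to treatment (assumed)
mean_control_group = 10.0    # average outcome among those assigned to control (assumed)
takeup_treatment = 0.60      # share taking up the treatment in the treatment group (assumed)
takeup_control = 0.10        # share taking up the treatment in the control group (assumed)

itt = mean_treated_group - mean_control_group          # = 2.0
tot = itt / (takeup_treatment - takeup_control)        # = 2.0 / 0.5 = 4.0
print(itt, tot)
```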
284
When is ToT LATE?
when it is estimated via instrumental variables (IV)
285
What are 3 steps (questions) to judge external validity?
1) What needs does the program address and what is the disaggregated theory behind the program? Are the needs the same in the new setting? 2) How strong is the evidence? 3) Can the intervention be implemented in the new setting?
286
What are 5 advantages of RCTs?
1) when well implemented, allow for rigorous counterfactual analysis with the fewest assumptions; 2) take advantage of scarcity of resources to rigorously assess impact; 3) cleanest/easiest technique; 4) easier to communicate results and methods to policy-makers, so more likely to be scaled up; 5) allow for straightforward cost-effectiveness analysis
287
What are 3 disadvantages of RCTs?
1) studies are in 'real' time (only works for prospective evaluations since it requires random assignment before the policy starts); 2) restricting treatment may be politically difficult or undesirable (best for pilots we are not certain will work); 3) not suitable for several key policies (e.g. rule of law, macroeconomic policies, etc.)
288
What does a placebo test do?
involves repeating an analysis using a different dataset or a part of the dataset where no intervention occurred
289
What is bunching?
a behavioral pattern where individuals or firms locate at key policy thresholds (e.g. firms reporting earnings just below a threshold that triggers taxes or regulations, individuals working hours just below a threshold that classifies them as full-time)
290
What is Moore’s Law?
the number of transistors on computer chips doubles approximately every two years
291
How has computational capacity changed over time?
has increased exponentially over time
292
How has the cost of memory changed over time?
has plummeted over time
293
What are structured datasets? (ML)
matrices of variables across observations
294
What are "Bulldozer methods" according to Diana?
one class of ML tools: methods that work with structured datasets that look very much like the datasets economists work with using 'conventional' econometrics
295
What can "Bulldozer methods" provide insight on?
can provide insights into the importance of different features for making predictions and identifying heterogeneous treatment effects
296
Can "Bulldozer methods" address problems of endogenous unobservable selection or reverse causality?
cannot (yet) mechanically address problems of endogenous unobservable selection or reverse causality
297
What can you do to make "Bulldozer methods" more powerful?
when used in combination with more conventional approaches to causal inference, they may become powerful additions to researchers' toolkits
298
What type of AI ML technique are 'deep learning models' and what do they use?
the second class of AI/ML techniques; they use more complex neural network engines
299
Why are 'deep learning models' considered ‘black box’ approaches?
are considered ‘black box’ approaches in that it is hard to back out how they arrived at a conclusion
300
In what way are 'deep learning models' powerful?
powerful enough to be able to work with initially featureless (raw) data (work out themselves how best to combine the data to generate analytically salient features (variables))
301
What is a key difference between conventional econometric causal inference and AI/ML methods regarding the primary goal?
primary goal of ML is predictive power, rather than estimation of a particular structural or causal parameter or the ability to formally test hypotheses (inference)
302
What is a key difference between conventional econometric causal inference and AI/ML methods regarding what it relies on?
ML relies much more on out-of-sample comparisons rather than in-sample goodness-of-fit measures
303
What has the focus on prediction come at the expense of for AI/ML methods?
focus on prediction has come at expense of ability to do inference (e.g. construct asymptotically valid confidence intervals), for many ML methods it is currently impossible to construct valid confidence intervals
304
Are credible instrumental variables easy to find?
can be hard to find
304
What does the common trends assumption take account of?
takes account of pre-treatment differences in levels
305
Can the common trends assumption be tested?
with more data can be probed, tested and relaxed
306
Does the population-weighted average from state regression increase precision of regression estimates?
may increase precision of regression estimates (in a statistical sense, data from state with larger population may be more reliable and therefore worthy of higher weight)
307
When does the population-weighted average from state regression increase precision of regression estimates?
only when a number of restrictive technical conditions are met e.g. the underlying CEF is linear (however many regression models are only linear approximations to the CEF)
308
Is population-weighted average from state regression appealing to use?
may not be appealing, as variation may be just as useful in both states
309
When using population-weighted average what do you need to hope for regarding the regression estimate?
hope that regression estimates from state-year panel are not highly sensitive to weighting
310
What does unit-year panel data typically exhibit?
serial correlation (repetitive structure of such data raises this)
311
What is serial correlation regarding unit-year panel data?
deviation from randomness with important consequence that each new observation in a serially correlated time series contains less information than would be the case if the sample were random
312
What is the issue with serially correlated data?
persistent, meaning the values of variables for nearby periods are likely to be similar; when the dependent variable in a regression is serially correlated, the residuals from any regression model explaining this variable are often serially correlated as well
313
When you have a combination of serially correlated residuals and serially correlated regressors, what changes?
a combination of serially correlated residuals and serially correlated regressors changes the formula required to calculate standard errors
314
If you ignore serial correlation and use the simple standard error formula, what happens?
resulting statistical conclusions are likely to be misleading (penalty for this is that you exaggerate the precision of regression estimates, as sampling theory for regression inference presumes that data come from random samples)
315
What do robust standard errors correct for?
correct for heteroskedasticity
316
What issues do clustered standard errors address?
answers the serial correlation challenge; appropriate for a wide variety of settings; solves for any sort of dependence problem in your data (although this may lead to large standard errors)
317
What does clustering allow for?
allow for correlated data within researcher-defined clusters
318
Does clustering require that all data are randomly sampled?
does not require that all data are randomly sampled, requires only that the clusters be sampled randomly, with no random sampling assumption invoked for what is inside them
319
Is a pair or a handful of clusters enough?
a pair or a handful of clusters may not be enough
320
When you do clustering, what does statistical inference presume?
once you start, statistical inference presumes you have many clusters instead of (or in addition to) many individual observations within clusters
321
What is the standard error?
the standard deviation of a statistic like the sample average
322
What is the sampling variance?
variability of a sample statistic (as opposed to the dispersion of raw data)
323
What is variance?
average squared deviations from the mean to gauge variability (positive and negative gaps get equal weights)
324
What does causal inference compare?
compares potential outcomes (descriptions of the world when alternative roads are taken)
325
What is the first check in any research design and what does it involve?
checking for balance; a process to check whether treatment and control groups indeed look similar, which amounts to a comparison of sample averages
326
What does the LLN state about the sample average?
a sample average can be brought as close as we like to the average in the population from which it is drawn simply by enlarging the sample
327
How do groups treated or untreated assigned by random assignment differ? (randomised trial/experiment)
groups treated or untreated by random assignment differ only in their treatment and any outcomes that follow from it (due to the LLN, two randomly chosen groups are indeed comparable when large enough)
328
How does experimental random assignment eliminate selection bias?
works not by eliminating individual differences but rather by ensuring that the mix of individuals being compared is the same
329
What is the constant-effects assumption?
Y1i - Y0i = k (k is both the individual and average causal effect of the treatment on the outcome)
330
How do you calculate the average causal effect of treatment?
the average of Y1i - Y0i, where averaging is done in the usual way (sum individual outcomes and divide by n); it is estimated by the difference in group means, Avg_n(Y1i | Di=1) - Avg_n(Y0i | Di=0), where difference in group means = average causal effect + selection bias
331
What do standard deviations measure?
measure variability in data
332
What does a good control group reveal?
reveals the fate of the treated in a counterfactual world where they are not treated
333
What are clustered standard errors used for?
used to adjust for the fact that the data contain correlated observations