OLS Flashcards

1
Q

What is the difference between causal effect and correlation?

A

Causal effect tells us that changes in one variable (say hot weather) lead to changes in another variable (say ice cream sales). Correlation looks similar in that it shows two variables moving in a related pattern (either positively or negatively), but it does not mean that one causes the other: there could be another factor influencing both (say a higher number of wasps seems to correlate with higher ice cream sales, but both are driven by hot weather).

2
Q

What are Quasi-Experimental Methods?

A

Research designs that share characteristics with experimental designs but lack full randomization of participants into treatment and control groups. They often involve naturally occurring events that researchers leverage to study the effects of a treatment, and are used when true randomization is not feasible or ethical.

3
Q

What is OLS?

A

Ordinary Least Squares: a method used to estimate the parameters of a linear model. OLS finds the values of the regression coefficients that minimise the sum of squared residuals, where a residual is the difference between the observed and predicted values of the dependent variable:

min over B0, B1, …, Bk of Σ i=1..n (yi − (B0 + B1xi1 + B2xi2 + … + Bkxik))²
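As an illustrative sketch (not part of the original cards), the minimisation above can be carried out with NumPy's least-squares solver; the simulated data and coefficient values below are invented for the example.

```python
# Estimating OLS coefficients by minimising the sum of squared residuals.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)  # true B0=1, B1=2, B2=-0.5

X = np.column_stack([np.ones(n), x1, x2])         # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimises the SSR
residuals = y - X @ beta_hat
print(beta_hat)               # estimates should be close to (1, 2, -0.5)
print(np.sum(residuals**2))   # the minimised sum of squared residuals
```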

4
Q

What is the line of best fit?

A

The line which best describes the relationship between yi and xi. The line of ‘best fit’ is the one that gives the best approximation to all the data points, which it does by minimising the squared distance between the line and each of the data points.

5
Q

In OLS, how do we define the predicted outcome

A

y^i = a^ + B^xi

6
Q

What is the residual in OLS

A

u^i = yi − y^i (the difference between the observed and predicted value)

7
Q

What is the best line in function notation

A

y^ = a^ + B^x

8
Q

How do you visually represent OLS and line of best fit?

A

Draw a scatter plot of the data points with the fitted line running through them; the residuals are the vertical distances between each point and the line.

9
Q

Formally, what is the OLS estimate?

A

The values of a, B that minimise SSR(a, B): (a^, B^) = arg min(a,B) SSR(a, B). Solving gives a^ = mean(y) − B^·mean(x) and B^ = cov(x, y)/var(x).
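As a sketch (not from the cards), these closed-form solutions can be computed directly and cross-checked against NumPy's least-squares fit; the data below are simulated with invented values.

```python
# Closed-form simple OLS: a^ = mean(y) - B^*mean(x), B^ = cov(x,y)/var(x).
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 3.0 + 1.5 * x + rng.normal(size=500)  # true intercept 3, slope 1.5

beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # B^ = cov(x,y)/var(x)
alpha_hat = y.mean() - beta_hat * x.mean()                 # a^ = mean(y) - B^*mean(x)

b_np, a_np = np.polyfit(x, y, 1)  # slope, intercept from least squares
print(alpha_hat, beta_hat)        # should agree with (a_np, b_np)
```

Note that the ddof used in `np.cov` and `np.var` must match so the degrees-of-freedom corrections cancel in the ratio.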

10
Q

What does a^ capture

A

The intercept: the predicted value of y when x = 0.

11
Q

What does B^ capture

A

The slope: how y changes, on average, when x changes by one unit.

12
Q

Why is the simple OLS model not favoured by researchers?

A

Usually, as social scientists, we want more than just the best linear approximation of one variable given another variable (or variables). We want to say something about the causal effect, and for that we need to specify a model.

13
Q

What is the classic linear model?

A

y = α + β1x1 + β2x2 + … + βkxk + u, where y, x1, x2, …, xk and u are random variables. u is the residual or error term. α and the βs are referred to as parameters (or coefficients when we estimate the model) and are real numbers. The model writes the outcome of interest y (e.g. wage in our earlier example) as a linear function of some explanatory variables x (say age, gender, education, …) plus a residual or error term u. This is the first assumption of the model: that the relationship is linear in the parameters.

14
Q

What is the residual u of the classic linear model?

A

The residual u can be thought of as standing for ‘unobserved’: everything that we think may affect y but we do not observe.

15
Q

How are we able to determine a causal effect?

A

In order to give the model the causal interpretation we want, we need to be able to interpret βk as the marginal effect of xk on y whilst keeping all of the other variables (xm for m ≠ k) and the error term u constant, i.e. our good old ceteris paribus condition, where ‘all’ includes the unobservables. In practice we cannot do this: since u is unobserved, we cannot hold it constant.

16
Q

Why do we need assumptions for OLS/classic linear model?

A

In practice we cannot hold all other terms constant: since u is unobserved, we cannot hold it constant. The model therefore requires us to make some assumptions about the unobserved u given what we do observe: the xs.

17
Q

What are the OLS/Gauss-Markov Assumptions?

A

A1: Linearity in parameters
A2: No endogeneity: E(ui|xi) = 0 (mean independence of the error term)
A3: Homoskedasticity: the variance of the error terms is constant: var(ui|x1, x2, …, xk) = σ²
A4: Zero covariance between the error terms (independently distributed errors): cov(ui, uj) = 0, ∀ i ≠ j
A5: The error has a normal distribution: ui ~ N(0, σ²) (needed for statistical inference)
A6: No multicollinearity of variables: if one variable has a linear relationship with another then we cannot distinguish between the effects of each individual variable, and so cannot estimate the coefficients.

18
Q

What is unbiasedness in an estimator?

A

Bias: the bias of µ^ is given by E(µ^) − µ. The estimator is unbiased if E(µ^) − µ = 0, i.e. E(µ^) = µ. Unbiasedness means that if we compute µ^ for many different random samples, then the average of the estimates over these samples will be the true population parameter µ. For example, the sample mean x̄n is an unbiased estimator for µ since E(x̄n) = E(x) = µ.

19
Q

What is consistency in an estimator?

A

Consistency: µ^ is a consistent estimator of µ if plim µ^n = µ, i.e. lim(n→∞) Pr(|µ^n − µ| < ϵ) = 1, ∀ϵ > 0. In words: for any ϵ > 0, the probability that the distance between µ^n and µ is less than ϵ tends to 1 as the sample size n tends to infinity, so as the sample size increases the estimate converges to the population value. A consistent estimator delivers estimates such that, as the sample size increases, the distribution of the estimates is concentrated ever closer to the single point µ, i.e. the variance of the distribution of the estimates produced by the estimator tends to zero.

20
Q

What is efficiency in an estimator?

A

Efficiency: let µ~ be another estimator of µ and assume that both µ^ and µ~ are unbiased. µ^ is more efficient than µ~ if var(µ^) < var(µ~). Efficiency of an estimator is relative to other potential estimators. For a more efficient estimator, the estimates computed for different samples tend to be more tightly centred around their average than for a less efficient estimator. Using an efficient estimator means that it is less likely that we obtain a random sample which yields an estimate far from the corresponding population value.

21
Q

What are finite and small sample properties?

A

Unbiasedness and efficiency; they apply to samples of any size.

22
Q

What is an asymptotic or large sample property?

A

Consistency. It is defined in the limit as the sample size n tends to infinity; a common rule of thumb is that large-sample approximations start to be reasonable for n > 30.

23
Q

What does the Gauss-Markov theorem tell us?

A

If assumptions A1 to A4 hold, then the OLS estimator is BLUE (Best Linear Unbiased Estimator). OLS is an estimator and, as usual, its estimates vary from (repeated) sample to sample. The GMT says that: the expected values of the parameter estimates across these samples are equal to the true (population) regression parameters, i.e. OLS gets it right on average (unbiasedness); and OLS has the lowest variance among all linear unbiased estimators, so the probability that a random sample yields an estimate close to the true parameter is highest when using OLS (efficiency). OLS is also consistent.

24
Q

What is the estimate model?

A

y^i = α^ + β^xi is an estimate for E(yi|xi) = α + βxi (which is the CEF if the CEF is linear, and the best linear approximation to it if the CEF is non-linear).

25
Q

Can OLS be used for non-linear relationships?

A

Yes, by applying some transformations

26
Q

Why would you take a log transformation?

A

If a regressor is in logs then it is not affected by a scale or unit-of-measurement change in the variable. If a distribution is strongly skewed then taking logs makes the distribution more symmetrical, and more ‘normally’ distributed variables are better suited to linear regression.

27
Q

What are the 4 different type of log transformation models?

A

1) Level-level: y = a + bx + u. A unit change in x leads to a b-unit change in y on average.
2) Level-log: y = a + b·ln(x) + u. A 1% increase in x leads to a b/100-unit change in y.
3) Log-level: ln(y) = a + bx + u. A unit change in x leads to approximately a 100·b % change in y (semi-elasticity).
4) Log-log: ln(y) = a + b·ln(x) + u. b is the % change in y when x changes by 1%; this is the elasticity of y with respect to x.
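As a sketch (not from the cards), the log-log case can be checked by simulation: regressing ln(y) on ln(x) recovers the elasticity. The data-generating process and the elasticity of 0.7 below are invented for the example.

```python
# Log-log regression recovers an elasticity from simulated data.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(1.0, 10.0, size=1000)
y = 2.0 * x**0.7 * np.exp(rng.normal(scale=0.1, size=1000))  # true elasticity 0.7

# log-log model: ln(y) = a + b*ln(x) + u, so b is the elasticity of y w.r.t. x
b, a = np.polyfit(np.log(x), np.log(y), 1)
print(b)  # should be close to 0.7: a 1% rise in x gives roughly a 0.7% rise in y
```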

28
Q

What is the formula for the standard error for B^

A

std.error(B^) = sqrt(σ²/(n·var(x))). The standard error of B^ tells us the precision of the estimate, i.e. how far on average the sample estimate is from the population parameter.

29
Q

What 3 factors influence the precision of the estimate?

A

1) Variance of x
2) Sample size
3) Variance of the population errors

30
Q

What is the t-distribution?

A

t = (B^ − B)/sqrt(var(B^)) = (B^ − B)/std.error(B^). For hypothesis testing with H0: B = 0, this becomes B^/std.error(B^).
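As a sketch (not from the cards), the standard-error formula from the previous card and this t-statistic can be computed by hand on simulated data; the true slope of 0.5 and sample size are invented.

```python
# std.error(B^) = sqrt(σ²/(n·var(x))) and the t-statistic for H0: B = 0.
import numpy as np

rng = np.random.default_rng(3)
n = 400
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

b, a = np.polyfit(x, y, 1)
u_hat = y - (a + b * x)                       # residuals
sigma2_hat = np.sum(u_hat**2) / (n - 2)       # estimate of the error variance σ²
se_b = np.sqrt(sigma2_hat / (n * np.var(x)))  # std.error(B^)
t_stat = b / se_b                             # under H0: B = 0
print(se_b, t_stat)  # t_stat well above ~2, so we reject H0 at the 5% level
```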

31
Q

What are the 2 main threats to OLS?

A

1) The estimator of the causal effects of interest may be biased or inconsistent. This will be the case if the exogeneity assumption does not hold (A2 violated); E(u|x) = 0 is pretty strong. Violations can be caused by omitted variable bias, measurement error, simultaneity (reverse causation), selection bias…
If assumption A2 is violated then our OLS estimator will no longer be an unbiased (and consistent) estimator of the causal effect of our independent variables on our dependent variable. In this case OLS still has a descriptive purpose: it can tell us about the correlations (or linear associations) between variables.

2) The estimated standard errors may be inconsistent. If this is the case, we cannot conduct tests on the parameters of interest. This can happen if the errors are not homoskedastic (i.e. there is heteroskedasticity, A3 violated), or if they are correlated (sample not iid, A4 violated).

32
Q

What is the omitted variable bias formula and how do we interpret it?

A

E(B^|x, z) = B + γ·cov(x, z)/var(x).
1) If cov(x, z) > 0, γ > 0: z↑ → x↑, y↑ ⇒ E(B^) > B
2) If cov(x, z) < 0, γ > 0: z↑ → x↓, y↑ ⇒ E(B^) < B
3) If cov(x, z) > 0, γ < 0: z↑ → x↑, y↓ ⇒ E(B^) < B
4) If cov(x, z) < 0, γ < 0: z↑ → x↓, y↓ ⇒ E(B^) > B
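As a sketch (not from the cards), case 1 can be demonstrated by simulation: with cov(x, z) > 0 and γ > 0, the short regression of y on x alone is biased upwards. All numbers below are invented for the example.

```python
# Simulating omitted variable bias (case 1: upward bias).
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
z = rng.normal(size=n)                       # omitted variable
x = 0.8 * z + rng.normal(size=n)             # cov(x, z) > 0
y = 1.0 * x + 2.0 * z + rng.normal(size=n)   # true B = 1, γ = 2 > 0

b_short, _ = np.polyfit(x, y, 1)  # short regression omitting z
bias = 2.0 * np.cov(x, z, ddof=1)[0, 1] / np.var(x, ddof=1)  # γ·cov(x,z)/var(x)
print(b_short, 1.0 + bias)  # b_short is close to B + bias, well above the true B = 1
```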

33
Q

How does human behaviour help us with regression discontinuity design?

A

Human behaviour is constrained by rules. Rules are sometimes arbitrary, but they generate interesting experiments.

34
Q

When would we apply RDD?

A

1) RDD exploits precise knowledge of the rules determining treatment.
2) RDD is based on the idea that in a highly ruled-based world, some
arbitrary rules provide good experiments.
3) Researchers are interested in the causal effect of a binary intervention
treatment or a probability intervention treatment on a dependent
variable.
4) Units may be individuals, firms, countries or other entities, which are
exposed or not to a treatment with a clear cut-off.
5) RDD comes in two styles: sharp and fuzzy

35
Q

How would you set up RDD?

A

1) For the treatment, we use an eligibility index or assignment variable on which the population can be ranked.
2) A clearly defined cutoff score.
3) In an RD, the assignment variable typically has an effect on the outcome. If it does not, that is not a problem at all: the RD will work fine, but we might not even need an RD.

36
Q

What is sharp RDD?

A

1) Sharp RDD is used when treatment status is a deterministic and discontinuous function of xi: Di = 1 if xi ≥ x0 and Di = 0 if xi < x0, where x0 is a known threshold or cutoff. This assignment mechanism is a deterministic function of xi because once we know xi we know Di.
2) Treatment is a discontinuous function of xi because no matter how close xi gets to x0, treatment is unchanged until xi = x0. Potential outcomes can be described by a linear constant-effects model: yi = α + βxi + ρDi + ϵi, where ρ is the causal effect of interest. The regressor of interest Di is correlated with xi and is a deterministic function of xi.
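As a sketch (not from the cards), a sharp RDD can be simulated and the jump ρ at the cutoff estimated by OLS on the model above; the cutoff of 0, effect size of 2, and data are invented for the example.

```python
# Sharp RDD: estimating the jump ρ at a known cutoff x0 = 0 by OLS.
import numpy as np

rng = np.random.default_rng(5)
n = 5000
x = rng.uniform(-1.0, 1.0, size=n)   # running variable
D = (x >= 0.0).astype(float)         # sharp assignment: D = 1 iff x >= x0
y = 0.5 + 1.0 * x + 2.0 * D + rng.normal(scale=0.5, size=n)  # true ρ = 2

X = np.column_stack([np.ones(n), x, D])           # model y = α + βx + ρD
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
rho_hat = coef[2]
print(rho_hat)  # the estimated jump at the cutoff, close to the true ρ = 2
```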

37
Q

What does RDD allow for?

A

Non-linearities in the running variable.

38
Q

Why is graphical inspection key to RDD

A

Plot the running variable (the determinant of treatment) on the horizontal axis and the outcome variable on the vertical axis, because non-linearity can be mistaken for discontinuity.