OLS Flashcards
What is the difference between causal effect and correlation?
A causal effect tells us that changes in one variable (say hot weather) lead to changes in another variable (say ice cream sales). Correlation looks similar in that it shows two variables moving in a common pattern (whether positively or negatively), but it does not mean that one causes the other; there may be a third factor influencing both (say the number of wasps seems to correlate with ice cream sales, but both are driven by hot weather).
What are Quasi-Experimental Methods?
Research designs that share characteristics with experimental designs but lack full randomization of participants into treatment and control groups. They often involve naturally occurring events that researchers leverage to study the effects of a treatment, and are used when true randomization is not feasible or ethical.
What is OLS?
Ordinary Least Squares: a method used to estimate the parameters of a linear model. OLS finds the values of the regression coefficients that minimize the sum of the squared residuals, where a residual is the difference between the observed and predicted values of the dependent variable: $\min_{\beta_0,\beta_1,\dots,\beta_k} \sum_{i=1}^{n} \big(Y_i - (\beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik})\big)^2$.
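As a concrete illustration (not part of the original flashcards), here is a minimal Python sketch that searches numerically for the intercept and slope minimising the sum of squared residuals on simulated data; the true values used to generate the data are made up for the example.

```python
# Minimal sketch: find the (a, B) that minimise the sum of squared residuals
# on simulated data. The data-generating values (a=1, B=2) are illustrative only.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)  # true intercept 1, slope 2

def ssr(params):
    a, b = params
    return np.sum((y - (a + b * x)) ** 2)  # sum of squared residuals

fit = minimize(ssr, x0=[0.0, 0.0])
print(fit.x)  # roughly [1.0, 2.0]
```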
What is the line of best fit?
The line that best describes the relationship between yi and xi, i.e. the one that gives the best approximation to all of the data points. It does this by minimising the (squared) distance between the line and each data point.
In OLS, how do we define the predicted outcome?
yi^ = a + Bxi
What is the residual in OLS?
ui = yi - yi^
What is the best line in function notation?
y^ = a + Bx
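A tiny sketch in code (hypothetical numbers, just to make the definitions concrete): for a candidate line with intercept a and slope B, compute the predicted outcomes and the residuals.

```python
# Sketch: predicted outcomes and residuals for a candidate line y^ = a + Bx.
# The data and the values of a and B below are hypothetical.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
a, b = 1.0, 2.0            # candidate intercept and slope

y_hat = a + b * x          # predicted outcomes yi^
u = y - y_hat              # residuals ui = yi - yi^
print(y_hat, u)
```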
How do you visually represent OLS and line of best fit?
Draw a graph: a scatter plot of the data points with the line of best fit running through them.
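One way to produce that graph in code, as a sketch (matplotlib and simulated data, both my own assumptions rather than anything from the cards):

```python
# Sketch: scatter plot of the data with the OLS line of best fit on top.
# The data are simulated for illustration.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=2.0, size=50)

b_hat, a_hat = np.polyfit(x, y, deg=1)   # slope and intercept of the best-fit line
plt.scatter(x, y, label="data")
plt.plot(np.sort(x), a_hat + b_hat * np.sort(x), color="red", label="line of best fit")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
```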
Formally, what is the OLS estimate?
The values of a and B that minimise SSR(a, B): (a^, B^) = arg min(a,B) SSR(a, B). Solving gives a^ = mean(y) - B^ mean(x) and B^ = cov(x, y) / var(x).
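The closed-form expressions can be checked directly. A minimal sketch with simulated data (the true intercept and slope below are illustrative assumptions), comparing the formulas against numpy's own least-squares fit:

```python
# Sketch: compute a^ and B^ from the closed-form OLS expressions and
# compare with numpy's polynomial least-squares fit. Data are simulated.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=500)

b_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # B^ = cov(x, y) / var(x)
a_hat = y.mean() - b_hat * x.mean()              # a^ = mean(y) - B^ * mean(x)
print(a_hat, b_hat)
print(np.polyfit(x, y, deg=1))                   # returns [B^, a^]; should match
```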
What does a^ capture?
The intercept: the value the fitted line predicts for y when x = 0.
What does B^ capture?
The slope: how the predicted value of y changes when x changes by one unit.
Why is the simple OLS model not favoured by researchers?
Usually, as social scientists, we want more than just the best linear approximation of one variable given another variable (or variables). We want to say something about the causal effect, and for that we need to specify a model.
What is the classic linear model?
y = α + β1x1 + β2x2 + … + βkxk + u, where y, x1, x2, . . . , xk and u are random variables. u is the residual or error term. α and the βs are referred to as parameters (or coefficients when we estimate the model) and are real numbers. The model writes the outcome of interest y (e.g. wage in our earlier example) as a linear function of some explanatory variables x (say age, gender, education, . . . ) plus a residual or error term u. This is the first assumption of the model: that the relationship is linear in the parameters.
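A short simulation sketch of this model with two regressors (all parameter values are made up), estimated with numpy's least-squares solver:

```python
# Sketch: simulate y = alpha + beta1*x1 + beta2*x2 + u and recover the
# parameters by least squares. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(scale=1.0, size=n)          # unobserved error term
y = 0.5 + 1.5 * x1 - 2.0 * x2 + u          # alpha=0.5, beta1=1.5, beta2=-2.0

X = np.column_stack([np.ones(n), x1, x2])  # design matrix with a constant
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)                                # approximately [0.5, 1.5, -2.0]
```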
What is the residual u of the classic linear model?
The residual u can be thought of as standing for ‘unobserved’ – everything that we think may affect y but we do not observe.
How are we able to determine a causal effect?
In order to give the model the causal interpretation we want, we need to be able to interpret βk as the marginal effect of xk on y while keeping all of the other variables (xm for m ≠ k) and the error term u constant, i.e. our good old ceteris paribus condition, where ‘all’ includes the unobservables. In practice we cannot do this: since u is unobserved, we cannot hold it constant.
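To see why this bites, here is a small illustrative simulation (my own construction, not from the cards): when the unobserved term u moves together with x, the OLS slope no longer recovers the true causal coefficient.

```python
# Sketch: if the unobserved u is correlated with x, OLS picks up both the
# direct effect of x and the part of u that moves with x. Illustrative only.
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
x = rng.normal(size=n)
u = 0.8 * x + rng.normal(size=n)   # u is NOT independent of x
y = 1.0 + 2.0 * x + u              # true causal effect of x on y is 2.0

b_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(b_hat)                       # roughly 2.8 rather than 2.0: the estimate is biased
```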