Simple Regression Flashcards
What is the most common tool of the applied economist?
Regression
What is regression?
It is used to help understand the relationships between many variables
What does regression do on an XY-plot?
It fits a line through the points in the XY-plot that best captures the relationship between X & Y
What is the equation of a straight line (linear function)?
Y = 𝛼 + 𝛽X
What is 𝛼 in the straight line equation?
The intercept
What is 𝛽 in the straight line equation?
The slope
Why would we never get all points on an XY-plot lying precisely on a straight line?
Due to measurement error
Is the straight line the true relationship in an XY-plot?
The true relationship is probably more complicated, a straight line may just be an approximation
What happens to important variables which affect Y?
They may be omitted
What is the simple regression model?
Y = 𝛼 + 𝛽X + 𝑒
What is 𝑒 in the simple regression model?
The error term
What does regression analysis use?
It uses data (X and Y) to make a guess or estimate of what 𝛼 and 𝛽 are
What happens if there are more than two points on the XY-plot?
It will generally not be possible to find a line that fits perfectly through all points
Why do we need to find the “best fitting” line?
Because it makes the residuals as small as possible
What do we mean by “as small as possible”?
The one that minimises the sum of squared residuals
What is the most common method used to fit a line to the data?
We obtain the “Ordinary Least Squares” or OLS estimator
How do we choose 𝛼 and 𝛽?
So that the vertical distances from the data points to the fitted line are minimised
What does OLS do?
It minimises the sum of the squared residuals
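The minimisation that OLS performs can be sketched in a few lines of Python. This is only an illustration: it uses the standard closed-form solution for simple regression (slope = sum of cross-deviations of X and Y divided by the sum of squared deviations of X; intercept = mean of Y minus slope times mean of X). The function names `ols_fit` and `sum_squared_residuals` and the data are made up for the example and are not part of the flashcards.

```python
def ols_fit(x, y):
    """Return (alpha_hat, beta_hat), the OLS estimates for Y = alpha + beta*X + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of X.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


def sum_squared_residuals(x, y, alpha, beta):
    """Sum of squared vertical distances from the points to the line."""
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = ols_fit(x, y)
ssr = sum_squared_residuals(x, y, alpha_hat, beta_hat)
print(alpha_hat, beta_hat, ssr)
```

Trying any other intercept or slope in `sum_squared_residuals` should give a value at least as large as `ssr`, which is what "best fitting" means here.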
What is Y?
Dependent variable
What is X?
Explanatory (or independent) variable
What are 𝛼 and 𝛽?
Coefficients
What are 𝛼’ and 𝛽’?
OLS estimates of coefficients
How do we decide which is the dependent variable?
Ideally, the explanatory variable should be the one which causes/influences the dependent variable (X causes Y)
What is an example of a model with this dependent variable?
An increase in X (population density) causes Y (deforestation) to increase, not vice versa
Why must great care be taken in interpreting regression results as reflecting causality?
In some cases: the assumption that X causes Y may be wrong, we may not know whether X causes Y, X may cause Y but Y may also cause X, and the whole concept of causality may be inappropriate
What question does regression address?
How much of the variability in Y can be explained by X?
What do good fitting models have?
Small residuals
What does it mean if the residual is big for one observation?
Then it is an outlier
Why is it good to look at fitted values and residuals?
It can be very informative
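A rough sketch of how fitted values and residuals might be inspected is given below. The data are hypothetical, with the third observation deliberately placed far from the trend, and "largest absolute residual" is used only as an informal flag for a possible outlier, not as a formal test.

```python
# Hypothetical data; the third point is deliberately off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 12.0, 8.0, 10.0, 12.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates from the standard closed-form formulas.
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)
beta_hat = sxy / sxx
alpha_hat = y_bar - beta_hat * x_bar

# Fitted value = point on the line; residual = actual Y minus fitted value.
fitted = [alpha_hat + beta_hat * a for a in x]
residuals = [b - f for b, f in zip(y, fitted)]

# Index of the observation with the largest residual in absolute value.
worst = max(range(n), key=lambda i: abs(residuals[i]))
print(worst, residuals[worst])
```

Listing the residuals next to the data in this way shows which observations the line fits poorly, which is often the first hint of an outlier.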
What is the coefficient of determination?
It is R^2, which is based on the following decomposition: the total variability in the dependent variable (Y) equals the variability explained by the explanatory variable (X) in the regression plus the variability that cannot be explained and is left as an error
What is R^2 known as?
The most common goodness-of-fit statistic
What is one way to define R^2?
To say that it is the square of the correlation coefficient between Y and its fitted values
We can split the TSS into two parts, what are these parts?
Explained Sum of Squares and the Residual Sum of Squares
Where must R^2 lie between?
It must always lie between 0 and 1
What does R^2 = 1 mean?
Perfect fit - all data points are exactly on regression line
What does R^2 = 0 mean?
X does not have any explanatory power for Y whatsoever
What do bigger values of R^2 imply?
That X has more explanatory power for Y
R^2 is equal to what?
The square of the correlation between X and Y
What does R^2 measure?
The proportion of the variability in Y that can be explained by X
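The statements about R^2 can be checked numerically. The sketch below, on made-up data, computes the Total, Explained and Residual Sums of Squares and compares R^2 with the squared correlation between X and Y. The variable names are illustrative choices only.

```python
import math

# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((a - x_bar) ** 2 for a in x)
syy = sum((b - y_bar) ** 2 for b in y)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

# OLS estimates and fitted values.
beta_hat = sxy / sxx
alpha_hat = y_bar - beta_hat * x_bar
fitted = [alpha_hat + beta_hat * a for a in x]

tss = syy                                             # total variability in Y
ess = sum((f - y_bar) ** 2 for f in fitted)           # explained by the regression
rss = sum((b - f) ** 2 for b, f in zip(y, fitted))    # left as error

r_squared = 1 - rss / tss
corr_xy = sxy / math.sqrt(sxx * syy)
print(tss, ess + rss, r_squared, corr_xy ** 2)
```

Up to floating-point rounding, `tss` equals `ess + rss`, and `r_squared` equals both `ess / tss` and `corr_xy ** 2`.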
How do we carry out non-linear regression?
Replace Y or X (or both) in the regression model by a suitable non-linear transformation (e.g. ln(Y) or X^2)
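One way to picture this is to transform the data first and then run ordinary simple regression on the transformed variable. The sketch below assumes, purely for illustration, that Y grows exponentially in X, so that ln(Y) is linear in X. The data are constructed from a known curve rather than observed.

```python
import math

# Constructed (not real) data: Y = 2 * exp(0.5 * X) exactly, so ln(Y) = ln(2) + 0.5 * X.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * math.exp(0.5 * a) for a in x]

# Non-linear transformation of the dependent variable.
log_y = [math.log(b) for b in y]

# Simple OLS of ln(Y) on X using the closed-form formulas.
n = len(x)
x_bar = sum(x) / n
ly_bar = sum(log_y) / n
sxy = sum((a - x_bar) * (b - ly_bar) for a, b in zip(x, log_y))
sxx = sum((a - x_bar) ** 2 for a in x)
beta_hat = sxy / sxx
alpha_hat = ly_bar - beta_hat * x_bar
print(alpha_hat, beta_hat)
```

Because the data lie exactly on the assumed curve, the estimates recover ln(2) and 0.5 up to rounding. With real data there would be an error term and the fit would only be approximate.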