2. Goodness of fit Flashcards
What are the three measures of variation?
- The total sum of squares (SST)
- Explained sum of squares (SSE)
- Residual sum of squares (SSR)
What does the total sum of squares (SST) measure?
The total variation in the dependent variable, i.e. how much y varies across the sample: SST = Σ(yi − ȳ)². It measures how spread out the yi are in the sample.
What does the explained sum of squares (SSE) show?
The variation explained by the regression. SSE = Σ(ŷi − ȳ)² measures the sample variation in the fitted values ŷi.
What does the residual sum of squares (SSR) show?
The variation not explained by the regression. SSR = Σûi² measures the sample variation in the residuals ûi.
What does R² show?
R² measures the fraction of the total sample variation in y that is explained by the regression: R² = SSE/SST = 1 − SSR/SST. Because SST = SSE + SSR, R² lies between 0 and 1.
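The decomposition SST = SSE + SSR and the two equivalent formulas for R² can be checked numerically. This is a minimal sketch on hypothetical toy data (not from the course), with a helper named `ols` (a hypothetical name) that applies the usual simple-regression formulas β̂1 = Σ(xi − x̄)(yi − ȳ)/Σ(xi − x̄)² and β̂0 = ȳ − β̂1x̄.

```python
# Hypothetical toy data (not from the course): five observations of x and y
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def ols(x, y):
    """Hypothetical helper: OLS intercept and slope for the regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

b0, b1 = ols(x, y)
y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]             # fitted values
u_hat = [yi - yh for yi, yh in zip(y, y_hat)]  # residuals

sst = sum((yi - y_bar) ** 2 for yi in y)       # total variation in y
sse = sum((yh - y_bar) ** 2 for yh in y_hat)   # variation in the fitted values
ssr = sum(u ** 2 for u in u_hat)               # variation in the residuals

r2 = sse / sst
print(f"SST={sst:.2f} SSE={sse:.2f} SSR={ssr:.2f}")
print(f"R^2 = SSE/SST = {r2:.2f}, 1 - SSR/SST = {1 - ssr / sst:.2f}")
```

Here the regression explains 60% of the sample variation in y, and both formulas give the same R².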
What is important to remember about R² as a measure of goodness of fit?
R² only measures how well the regression line fits the data. It says nothing about internal validity and gives no sense of causality.
What can a low R² be interpreted as?
A large share of the variation in y is not explained by the model, i.e. the regression line does not fit the data closely.
What are the restrictions on how y and x can relate to the original explained and explanatory variables of interest?
As long as the model remains linear in the parameters, there are none (for example, y or x can be logs). The mechanics of estimation and inference do not depend on how y and x are defined.
What is a log-level model?
A model in which the dependent variable is the natural logarithm of y while x enters in levels: log(y) = β0 + β1x + u.
How does using a log-level model change your model?
The estimation mechanics stay the same, but the interpretation of the coefficients changes.
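What is the model linear in, and why does this mean we can use a log-level model?
Linearity refers to the parameters, not the variables. log(y) = β0 + β1x + u is still linear in β0 and β1, so it is estimated by OLS exactly like a level-level model.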
What does β1 in a level-level model measure?
β1 measures the change in y, in units of y, for a one-unit change in x: Δy = β1Δx.
What does β1 in a log-level model measure?
β1 measures the (approximate) proportional change in y for a one-unit change in x: %Δy ≈ (100·β1)Δx.
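A minimal sketch of the log-level case, assuming hypothetical data generated exactly from log(y) = 1.0 + 0.05x with no error term. Regressing log(y) on x with the ordinary OLS formulas (the helper name `ols` is hypothetical) recovers β1 = 0.05, read as roughly a 5% change in y per unit of x. The exact change implied by the model, 100·(exp(β1) − 1), is printed for comparison, since 100·β1 is an approximation that is best for small β1.

```python
import math

def ols(x, y):
    """Hypothetical helper: OLS intercept and slope for the regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
    return y_bar - b1 * x_bar, b1

# Hypothetical data generated exactly from log(y) = 1.0 + 0.05 * x
x = [0, 1, 2, 3, 4, 5]
y = [math.exp(1.0 + 0.05 * xi) for xi in x]

# Same OLS mechanics, just applied to log(y) instead of y
b0, b1 = ols(x, [math.log(yi) for yi in y])

approx_pct = 100 * b1                 # %Δy ≈ 100·β1 for a one-unit increase in x
exact_pct = 100 * (math.exp(b1) - 1)  # exact percentage change implied by the model
print(f"b1={b1:.4f}  approx={approx_pct:.2f}%  exact={exact_pct:.2f}%")
```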
What does β1 in a level-log model measure?
β1 measures the absolute change in y for a percentage change in x: Δy ≈ (β1/100)%Δx, so a 1% increase in x changes y by about β1/100 units.
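For the level-log case, the sketch below uses hypothetical coefficients β0 = 2 and β1 = 30 (illustrative values only) and compares the exact effect of raising x by 1% with the β1/100 rule of thumb. In this model the exact effect, β1·log(1.01), does not depend on the starting value of x.

```python
import math

# Hypothetical level-log model: y = 2.0 + 30.0 * log(x)
b0, b1 = 2.0, 30.0

def predict(x):
    """Predicted y from the hypothetical level-log model."""
    return b0 + b1 * math.log(x)

x0 = 50.0                                     # arbitrary starting point
dy_exact = predict(x0 * 1.01) - predict(x0)   # effect of a 1% increase in x
dy_approx = b1 / 100                          # rule of thumb: Δy ≈ (β1/100)·%Δx
print(f"exact Δy = {dy_exact:.4f}, approximate Δy = {dy_approx:.4f}")
```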
What does β1 in a log-log model measure?
β1 is the elasticity of y with respect to x, i.e. the percentage change in y for a given percentage change in x: %Δy ≈ β1%Δx. A 1% increase in x is associated with a β1% change in y.
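To see the elasticity interpretation, this sketch assumes hypothetical constant-elasticity data y = 3x^0.7, so that log(y) = log(3) + 0.7·log(x). Regressing log(y) on log(x) (via the hypothetical helper `ols`) recovers β1 = 0.7, and a 1% rise in x then moves y by about 0.7%.

```python
import math

def ols(x, y):
    """Hypothetical helper: OLS intercept and slope for the regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
    return y_bar - b1 * x_bar, b1

# Hypothetical constant-elasticity data: y = 3 * x**0.7
x = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
y = [3.0 * xi ** 0.7 for xi in x]

# Log-log regression: log(y) on log(x)
b0, b1 = ols([math.log(xi) for xi in x], [math.log(yi) for yi in y])

# Percentage change in y implied by a 1% increase in x
pct_change_y = 100 * (1.01 ** b1 - 1)
print(f"elasticity b1 = {b1:.3f}; 1% rise in x -> {pct_change_y:.3f}% change in y")
```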
What type of variables are β̂0 and β̂1, and what does this mean?
They are random variables: their values depend on the particular sample used. It also means they have a (sampling) distribution, with an expectation and a variance.
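A small simulation shows β̂0 and β̂1 behaving as random variables. This sketch assumes a hypothetical population y = 1 + 2x + u with x uniform on [0, 10] and u normal with standard deviation 3 (assumptions made only for the simulation). It draws 1,000 random samples of size 30, estimates the regression in each, and reports the spread of the estimates. The helpers `ols` and `draw_sample` are hypothetical names.

```python
import random
import statistics

def ols(x, y):
    """Hypothetical helper: OLS intercept and slope for the regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
    return y_bar - b1 * x_bar, b1

def draw_sample(rng, n):
    """Randomly sample n observations from the assumed population y = 1 + 2x + u."""
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [1.0 + 2.0 * xi + rng.gauss(0, 3) for xi in x]
    return x, y

rng = random.Random(42)  # fixed seed so the run is reproducible
draws = [ols(*draw_sample(rng, n=30)) for _ in range(1000)]
b0_hats = [b0 for b0, _ in draws]
b1_hats = [b1 for _, b1 in draws]

# Each sample gives a different pair of estimates...
print("first three (b0_hat, b1_hat):", [(round(b0, 2), round(b1, 2)) for b0, b1 in draws[:3]])
# ...so the estimators have a distribution with an expectation and a variance
print("b0_hat mean / variance:", round(statistics.mean(b0_hats), 3), round(statistics.variance(b0_hats), 3))
print("b1_hat mean / variance:", round(statistics.mean(b1_hats), 3), round(statistics.variance(b1_hats), 4))
```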
What is a fitted value?
The fitted value ŷi = β̂0 + β̂1xi is the value of y predicted by the OLS regression line at xi, so by definition each fitted value lies on the line. The OLS residual for observation i, ûi, is the difference between yi and its fitted value, ûi = yi − ŷi, as given in equation (2.21). If ûi is positive, the line underpredicts yi; if ûi is negative, the line overpredicts yi.
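The sketch below computes fitted values and residuals on hypothetical toy data (not from the course) and labels each observation by the sign of its residual, following the rule in the answer above.

```python
# Hypothetical toy data (not from the course)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# OLS estimates from the simple-regression formulas
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]                  # points on the OLS line
residuals = [yi - yh for yi, yh in zip(y, fitted)]   # u_hat_i = y_i - y_hat_i

for xi, yi, yh, uh in zip(x, y, fitted, residuals):
    if uh > 0:
        verdict = "line underpredicts y"
    elif uh < 0:
        verdict = "line overpredicts y"
    else:
        verdict = "point lies on the line"
    print(f"x={xi:.0f}  y={yi:.1f}  y_hat={yh:.2f}  u_hat={uh:+.2f}  ({verdict})")
```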
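How are the OLS estimates chosen, and what does this imply for the residuals?
β̂0 and β̂1 are chosen to minimize the sum of squared residuals. A consequence of the first-order conditions is that the residuals add up to 0 for any data set (see slide 32 W1).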
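A quick numerical check, assuming hypothetical toy data (not from the course): the OLS residuals sum to zero up to floating-point error, and nudging either coefficient away from its OLS value increases the sum of squared residuals. The function `ssr` is a hypothetical helper.

```python
# Hypothetical toy data (not from the course)
x = [1.0, 3.0, 4.0, 6.0, 8.0]
y = [2.0, 3.0, 6.0, 7.0, 11.0]

def ssr(b0, b1):
    """Sum of squared residuals for a candidate intercept b0 and slope b1."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# OLS estimates: the closed-form solution of the first-order conditions
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0_hat = y_bar - b1_hat * x_bar

residuals = [yi - b0_hat - b1_hat * xi for xi, yi in zip(x, y)]
print("sum of residuals:", round(sum(residuals), 10))  # zero up to rounding error

# Moving either coefficient away from the OLS values raises the SSR
best = ssr(b0_hat, b1_hat)
for d0, d1 in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    print(f"shift ({d0:+.1f}, {d1:+.1f}): SSR {ssr(b0_hat + d0, b1_hat + d1):.3f} vs OLS {best:.3f}")
```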
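What is the gap between an observed value and the fitted regression line called?
The residual, ûi = yi − ŷi. It is distinct from the disturbance (error term) ui, which is the gap between yi and the unobserved population regression line.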
Why is a linear regression model called linear if it can contain non-linearities, as in the log case?
The key is that the equation is linear in the parameters β0 and β1. There are no restrictions on how y and x relate to the original explained and explanatory variables of interest.
What are the assumptions of the simple linear regression model?
- Linear in Parameters
- Random sampling
- Sample variation in the explanatory variable
- Zero conditional mean
- Homoskedasticity
What conditions need to be satisfied for the OLS estimators to be unbiased?
- Linear in Parameters
- Random sampling
- Sample variation in the explanatory variable
- Zero conditional mean
In your own words, what is meant by unbiased?
The estimated coefficients may be smaller or larger than the true values, depending on the random sample drawn, but on average (across repeated samples) they equal the values that characterize the true relationship between y and x in the population. The sampling distribution of β̂1 is centered on β1: E(β̂1) = β1.
What will happen if you repeat the random sampling an infinite number of times?
The average of the estimates will equal the true population values, even though any single estimate can still differ from them.
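Unbiasedness can be illustrated with a Monte Carlo sketch. It assumes a hypothetical population y = 1 + 2x + u in which u is drawn independently of x (so the zero conditional mean assumption holds), x is uniform on [0, 10], and u is normal with standard deviation 4; these choices are for illustration only. Individual slope estimates scatter widely, while their average settles on the true β1 = 2 as the number of samples grows. `ols_slope` and `one_estimate` are hypothetical helper names.

```python
import random
import statistics

TRUE_B0, TRUE_B1 = 1.0, 2.0  # hypothetical population parameters

def ols_slope(x, y):
    """Hypothetical helper: OLS slope estimate for the regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)

def one_estimate(rng, n=25):
    """Draw one random sample from y = 1 + 2x + u (u independent of x) and return b1_hat."""
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [TRUE_B0 + TRUE_B1 * xi + rng.gauss(0, 4) for xi in x]
    return ols_slope(x, y)

rng = random.Random(0)  # fixed seed so the run is reproducible
estimates = [one_estimate(rng) for _ in range(10_000)]

# Individual estimates scatter around beta1...
print("smallest / largest single estimate:", round(min(estimates), 3), round(max(estimates), 3))
# ...but their average approaches beta1 as the number of samples grows
for reps in (10, 100, 1_000, 10_000):
    print(f"average over {reps:>6} samples: {statistics.mean(estimates[:reps]):.4f}")
```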