Econometrics Flashcards
TSS
ESS+RSS
Degrees of freedom for TSS
n-1
Degrees of freedom for ESS
k-1
Degrees of freedom for RSS
n-k
MSS
The relevant sum of squares divided by its degrees of freedom, e.g. ESS/(k-1) or RSS/(n-k).
R-Squared
ESS/TSS, the proportion of the total sample variation that is explained by the independent variable(s).
Adjusted R^2
1 - (1 - R^2)*((n-1)/(n-k))
F test
(ESS/(k-1)) / (RSS/(n-k)); use the F table to find the level of significance.
t-test for significance of coefficients
(β̂ - 0)/se(β̂)
confidence interval
estimate + or - t*se, where t is the critical value from the t-distribution with the appropriate degrees of freedom (n-1 for a sample mean, n-k for a regression coefficient).
Population Regression Function
E(y|x); it is a linear function of x, constructed from the population data. It describes how the conditional expectation of y changes with x.
OLS regression analysis
The technique of estimating the coefficients of the population regression function by minimising the sum of the squared residuals between the sample regression function and the observed data.
Dummy Variable
a binary variable whose observations are either 0 or 1. Usually describes different groups or states of nature (aka a qualitative variable). x = 0 is the baseline case and x = 1 is the non-baseline case.
Chow test
restricted F-Test for structural stability. Used to test the stability of the coefficients when the data is split at a specific point. Usually used in time-series data to test for the presence of a structural break. This is done by comparing the restricted model to the unrestricted model.
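A minimal sketch in Python (statsmodels; the data and break point here are made up) of computing the Chow test by hand, comparing the pooled (restricted) model with separate regressions before and after the break:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Hypothetical data with a structural break halfway through the sample.
rng = np.random.default_rng(0)
n, break_at = 100, 50
x = rng.normal(size=n)
y = np.where(np.arange(n) < break_at, 1 + 2 * x, 3 + 0.5 * x) + rng.normal(size=n)
X = sm.add_constant(x)

# Restricted model: one regression over the whole sample.
rss_r = sm.OLS(y, X).fit().ssr
# Unrestricted model: separate regressions before and after the break.
rss_1 = sm.OLS(y[:break_at], X[:break_at]).fit().ssr
rss_2 = sm.OLS(y[break_at:], X[break_at:]).fit().ssr
rss_ur = rss_1 + rss_2

k = X.shape[1]                       # parameters per regime
F = ((rss_r - rss_ur) / k) / (rss_ur / (n - 2 * k))
p_value = stats.f.sf(F, k, n - 2 * k)
print(F, p_value)                    # small p-value -> evidence of a break
```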
Goodness of Fit
It is defined as ESS/TSS: a measure of the proportion of the total variation in Y that is explained by the regression model, and as such gives the goodness of fit. Since TSS = ESS + RSS, Σ(Yi - Ȳ)² = Σ(Ŷi - Ȳ)² + Σûi². It cannot exceed 1 or fall below 0.
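A quick numerical check (Python/statsmodels, synthetic data) that TSS = ESS + RSS and that R² = ESS/TSS:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2 + 1.5 * x + rng.normal(size=200)

res = sm.OLS(y, sm.add_constant(x)).fit()
tss, ess, rss = res.centered_tss, res.ess, res.ssr
print(np.isclose(tss, ess + rss))           # TSS = ESS + RSS
print(np.isclose(res.rsquared, ess / tss))  # R^2 = ESS/TSS
```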
Assumption 1
linear in parameters
assumption 2
random sampling
assumption 3
sample variation in explanatory variables
assumption 4
zero conditional mean: the error term has an expected value of zero conditional on the explanatory variables.
Autocorrelation
correlation between the errors in different time periods
Bias
The difference between the expected value of an estimator and the population value of the parameter it estimates.
critical value
in hypothesis testing, the value against which a test statistic is compared to determine whether or not the null hypothesis is rejected.
Fitted value
the estimated values of the dependent variable obtained when the regression is run.
heteroskedasticity
The variance of the error term is not constant.
kurtosis
a measure of the thickness of the tails of a distribution.
least squares estimator
an estimator that minimises the sum of the squared residuals.
Multicollinearity
correlation among the independent variables
Power of a test
the probability of rejecting the null hypothesis when it is false. Depends on the values of the population parameters under the alternative.
residual
the difference between the actual value and the fitted value; there is a residual for each observation.
testing the marginal contribution of variables
H0: the restriction is true, e.g. β3 = 0; H1: the restriction is not true. F = ((RSSr - RSSur)/m) / (RSSur/(n-k)),
where m is the number of restrictions, n the number of observations and k the number of parameters in the unrestricted model. This is an F-test: if F is greater than the critical value, reject the null; 'the variables provide a significant marginal contribution.'
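A sketch of this restricted F-test in Python (statsmodels, invented data in which x3 is the candidate variable), computing F from the RSS formula and checking it against the built-in compare_f_test helper:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 150
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + 0.8 * x1 + 0.5 * x2 + rng.normal(size=n)   # x3 is irrelevant here

X_ur = sm.add_constant(np.column_stack([x1, x2, x3]))  # unrestricted
X_r = sm.add_constant(np.column_stack([x1, x2]))       # restricted: beta3 = 0
res_ur = sm.OLS(y, X_ur).fit()
res_r = sm.OLS(y, X_r).fit()

m = 1                                   # number of restrictions
k = X_ur.shape[1]                       # parameters in the unrestricted model
F = ((res_r.ssr - res_ur.ssr) / m) / (res_ur.ssr / (n - k))
print(F)
print(res_ur.compare_f_test(res_r))     # (F, p-value, df) should match
```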
F test restriction R² form
((R²ur - R²r)/m) / ((1 - R²ur)/(n-k))
Multiple regression assumption 5
Homoskedasticity.
Why does heteroskedasticity cause problems with OLS
Since we have to estimate the variances of the parameters in our regression, we rely on the homoskedasticity assumption; our standard errors are based entirely on it, so when it fails the t-tests may not work.
Causes of heteroskedasticity in cross-sectional data
Error variance may increase proportionally with a variable in the model. Since cross sectional data often includes very small and very large values, the variance may be higher at higher values of x.
consequences of heteroskedasticity
Parameter estimates are still unbiased, but the variance of the betas is different under heteroskedasticity (loss of efficiency). F-tests and t-tests will be wrong because the homoskedasticity assumption is used to estimate the standard errors.
GLS
Transforming the model to create a new model that obeys the classical assumptions. Since the variance is not constant we can write it as wi²σ². Dividing the whole equation (all variables, including the constant) by wi gives a new error term ui/wi with variance (1/wi²)·wi²σ² = σ², which is constant. The transformed (GLS) model obeys the classical assumptions, so it is unbiased and has a smaller variance than OLS on the original model.
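A minimal sketch of the transformation (Python/statsmodels), assuming the error standard deviation is proportional to a known weight wi; statsmodels' WLS with weights 1/wi² is the equivalent built-in route:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(1, 10, size=n)
w = x                                     # assume sd(u_i) proportional to x_i
y = 2 + 0.5 * x + w * rng.normal(size=n)  # heteroskedastic errors

# GLS by hand: divide every variable (including the constant) by w.
X = sm.add_constant(x)
res_gls = sm.OLS(y / w, X / w[:, None]).fit()

# Equivalent built-in weighted least squares (weights = 1 / w^2).
res_wls = sm.WLS(y, X, weights=1.0 / w**2).fit()
print(res_gls.params, res_wls.params)     # same coefficient estimates
```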
Park test for heteroskedasticity
Find the squared residuals from the original regression, then run the regression ln(ûi²) = α + β·ln(Xi) + vi. We can then test H0: β = 0; a significant β suggests heteroskedasticity is present. However, the error term in this auxiliary regression may itself be heteroskedastic, in which case the significance of β could be misleading.
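A sketch of the Park test in Python (statsmodels, made-up data where the error variance grows with X):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(1, 10, size=n)
y = 1 + 2 * x + x * rng.normal(size=n)    # variance increases with x

# Step 1: original regression, keep the squared residuals.
u2 = sm.OLS(y, sm.add_constant(x)).fit().resid ** 2

# Step 2: auxiliary regression ln(u^2) = alpha + beta*ln(x) + v.
aux = sm.OLS(np.log(u2), sm.add_constant(np.log(x))).fit()
print(aux.tvalues[1], aux.pvalues[1])     # significant beta -> heteroskedasticity
```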
Goldfeld-Quandt Test for heteroskedasticity
We estimate the variance of ui as RSS/(n-k). Order the data by the magnitude of Z (the explanatory variable) and omit some observations in the middle. Run two separate regressions, one on the low values and one on the high values, and find the RSS for each. The test statistic is
σ̂2²/σ̂1² = (RSS2/(n2-k)) / (RSS1/(n1-k)). If this test statistic is greater than the critical F value, reject the null hypothesis that the variance is constant. Only tests one variable at a time.
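A hand-rolled sketch of the Goldfeld-Quandt test in Python (statsmodels; the split point, the number of omitted middle observations and the data are all assumptions):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(5)
n = 120
z = np.sort(rng.uniform(1, 10, size=n))   # ordering variable Z
y = 1 + 2 * z + z * rng.normal(size=n)    # variance grows with z
X = sm.add_constant(z)

omit = 20                                  # drop the middle observations
lo, hi = (n - omit) // 2, (n + omit) // 2
res_low = sm.OLS(y[:lo], X[:lo]).fit()     # low-Z subsample
res_high = sm.OLS(y[hi:], X[hi:]).fit()    # high-Z subsample

k = X.shape[1]
F = (res_high.ssr / (res_high.nobs - k)) / (res_low.ssr / (res_low.nobs - k))
p_value = stats.f.sf(F, res_high.nobs - k, res_low.nobs - k)
print(F, p_value)                          # large F -> reject constant variance
```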
White Test for heteroskedasticity
Run OLS on the original regression and find the residuals. Run a second regression with ûi² as the dependent variable and the original variables, their squares, and their interaction terms as the explanatory variables. Obtain R² from this regression and calculate the White test statistic W = nR². This statistic has a χ²(p) distribution, where p is the number of regressors in the auxiliary regression (not counting the constant).
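A sketch using statsmodels' het_white helper (which builds the squares and cross-products itself) on invented heteroskedastic data; the nR² statistic is returned as the LM statistic:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(6)
n = 250
x1 = rng.uniform(1, 10, size=n)
x2 = rng.normal(size=n)
y = 1 + 0.5 * x1 + x2 + x1 * rng.normal(size=n)   # variance depends on x1

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()

# het_white regresses u^2 on levels, squares and cross-products of X
# and reports W = n * R^2 with its chi-squared p-value.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(res.resid, X)
print(lm_stat, lm_pvalue)                 # small p-value -> heteroskedasticity
```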
Breusch-Pagan-Godfrey Test for heteroskedasticity
Run the original regression and obtain the residuals. Use these to construct a variable pi = ûi²/(Σûi²/n). Run a second regression with pi as the dependent variable. Calculate BPG = ESS/2, compared with a χ²(p) distribution, where p is the number of regressors in the auxiliary regression (excluding the constant).
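A sketch of the BPG statistic computed step by step as described above (Python/statsmodels, synthetic data where the variance rises with x; the choice of x as the variance-driving variable is an assumption of the example):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(7)
n = 300
x = rng.uniform(1, 10, size=n)
y = 1 + 2 * x + x * rng.normal(size=n)    # variance rises with x
X = sm.add_constant(x)

# Step 1: original regression, residuals u.
u = sm.OLS(y, X).fit().resid
# Step 2: construct p_i = u_i^2 / (sum(u^2) / n).
p = u**2 / (np.sum(u**2) / n)
# Step 3: regress p on the variables suspected of driving the variance.
aux = sm.OLS(p, X).fit()
# Step 4: BPG = ESS/2, compared with chi-squared (df = regressors excl. constant).
bpg = aux.ess / 2
print(bpg, stats.chi2.sf(bpg, df=1))      # small p-value -> heteroskedasticity
```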
Correcting for Heteroskedasticity if the error variances vary directly with an explanatory variable.
GLS: set wi = Xi, divide all variables by Xi, and rerun the regression.
Correcting for heteroskedasticity if the error variances vary inversely with an explanatory variable.
GLS: divide by 1/Z². Issues: does not work if Z = 0; problems with dummy variables, etc.
If we cannot do GLS
Use OLS but with a corrected (heteroskedasticity-robust) formula for the variance, using the squared residuals ûi² as estimates of σ²wi². Then use these estimated variances for tests and intervals. Pros: easy to calculate. Cons: not as efficient.
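A sketch of this fallback in Python, using statsmodels' heteroskedasticity-robust (White) covariance option rather than hand-coding the corrected formula:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 200
x = rng.uniform(1, 10, size=n)
y = 1 + 0.5 * x + x * rng.normal(size=n)
X = sm.add_constant(x)

res_ols = sm.OLS(y, X).fit()                    # classical standard errors
res_robust = sm.OLS(y, X).fit(cov_type='HC1')   # White-corrected standard errors
print(res_ols.bse)
print(res_robust.bse)                           # use these for t-tests / intervals
```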
Types of incorrectly specified Model
Omission of relevant Variables, Inclusion of irrelevant variables, measurement error.
Issues with incorrect Specification
The expected value of β̂ does not equal the population value; we obtain a biased estimate of β1.
RESET test for model specification
Re-run the regression with the squared and cubed fitted values (ŷ², ŷ³) added as explanatory variables. H0: the coefficients on ŷ² and ŷ³ are zero; if H0 is not rejected, the model is correctly specified (rejection indicates misspecification). Carry out a normal restricted F-test where the unrestricted model is the regression with ŷ², ŷ³ added.
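A hand-rolled RESET sketch in Python (statsmodels, invented data with an omitted quadratic term), using the restricted F-test between the original model and the model augmented with ŷ² and ŷ³:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 200
x = rng.uniform(0, 5, size=n)
y = 1 + 2 * x + 1.5 * x**2 + rng.normal(size=n)   # true model is quadratic

X_r = sm.add_constant(x)                    # (mis)specified linear model
res_r = sm.OLS(y, X_r).fit()

# Unrestricted model: add powers of the fitted values.
yhat = res_r.fittedvalues
X_ur = np.column_stack([X_r, yhat**2, yhat**3])
res_ur = sm.OLS(y, X_ur).fit()

print(res_ur.compare_f_test(res_r))         # rejection -> misspecification
```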
Lagrange multiplier test for model specification
Obtain the residuals ûi from the estimated restricted regression. Run an auxiliary regression: regress the residuals on all regressors, including the omitted ones. LM = nR², which has a χ² distribution (df = number of omitted regressors); if it is significant, reject the restricted regression.
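A sketch of the LM version of the same specification test (Python/statsmodels, with a hypothetical omitted regressor x2):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(10)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 0.7 * x1 + 0.9 * x2 + rng.normal(size=n)

# Restricted regression omits x2; keep its residuals.
u = sm.OLS(y, sm.add_constant(x1)).fit().resid

# Auxiliary regression of the residuals on ALL regressors, including x2.
X_all = sm.add_constant(np.column_stack([x1, x2]))
aux = sm.OLS(u, X_all).fit()

lm = n * aux.rsquared                       # LM = n * R^2
print(lm, stats.chi2.sf(lm, df=1))          # df = number of omitted regressors
```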
autocorrelation
When the assumption that the errors are uncorrelated with one another is violated.
interpretation of beta in a linear regression
β = dy/dx, the marginal effect of x on y.
Interpretation of beta in log-log model
β = d(ln y)/d(ln x); not a marginal effect but an elasticity: (dy/dx)·(x/y).
interpretation of beta in semi-log model
β = (dy/y)/dx for ln y = β0 + β1·x (a semi-elasticity), or β = dy/(dx/x) for y = β0 + β1·ln(x).
estimated variance of Beta
Var(β̂1) = σ²·Σxi²/(Σxi²)², which simplifies to σ²/Σxi² (with xi measured as deviations from its mean).
Expected value of beta
E(β̂1) = β1 + (1/Σxi²)·Σxi·E(ui)