Econometrics Flashcards
How to formulate a model?
- Statement of theory/hypothesis
- Collect data
- Specify mathematical model and stats theory
- Estimate parameters
- Check for model adequacy
- Test hypothesis
- Use model for prediction
Types of data and examples?
Time series: Data collected over time, e.g. GDP, unemployment. Can be quantitative (e.g. prices) or qualitative (e.g. gender)
Cross-section: Data on one or more variables collected at one point in time
Pooled: Combination of time series and cross-section data
What is a linear regression?
Regression studying the linear relationship between a dependent (explained) variable and one or more independent (explanatory) variables
What is the population regression function?
Mathematical representation of the population line of best fit, which passes through the mean of Y for each value of X: E(Y | Xi) = B1 + B2Xi
What does the error term represent?
- stochastic error (random probability)
- Represents variables not in the model
- Randomness of human behaviour
- Errors of measurement
- Ockham’s razor (keep simple until proved inadequate)
How is a sample regression function different?
Used when only a sample of the population is available, so B1 and B2 have to be estimated: Y^ = b1 + b2Xi. The error term is replaced by a residual (ei), and Y has a ^ because it is an estimate. The aim is to get as close to the PRF as we can
How does OLS work?
Chooses b1 and b2 to minimise the residual sum of squares (RSS), the sum of ei^2.
Equation for b1 and b2 (the OLS estimators of B1 and B2)
b1 = Mean of Y - b2(Mean of X)
b2 = (Sum of XiYi - (n)(Mean of X)(Mean of Y)) / (Sum of Xi^2 - (n)(Mean of X)^2)
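A minimal Python sketch of the two formulas on this card, assuming the two-variable model Y = b1 + b2X. The function name ols_simple and the five data points are made up for illustration.

```python
def ols_simple(x, y):
    """Return (b1, b2) for the two-variable model Y = b1 + b2*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: (sum XiYi - n*meanX*meanY) / (sum Xi^2 - n*meanX^2)
    numerator = sum(xi * yi for xi, yi in zip(x, y)) - n * mean_x * mean_y
    denominator = sum(xi ** 2 for xi in x) - n * mean_x ** 2
    b2 = numerator / denominator
    # Intercept: meanY - b2*meanX
    b1 = mean_y - b2 * mean_x
    return b1, b2


# Illustrative data only (not from the flashcards)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, b2 = ols_simple(x, y)
print(f"b1 = {b1:.2f}, b2 = {b2:.2f}")
```

For this data the slope works out to 6/10 = 0.6 and the intercept to 4 - 0.6(3) = 2.2.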
OLS properties?
- SRF passes through the sample means
- Mean of residuals is 0
- Sum of the products of the residuals and the explanatory variable X (sum of eiXi) is 0, so they are uncorrelated
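The three properties can be checked numerically. This is a small sketch on made-up data; the variable names are illustrative and the fit uses the standard two-variable OLS formulas.

```python
# Illustrative data only (not from the flashcards)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Two-variable OLS estimates
b2 = (sum(xi * yi for xi, yi in zip(x, y)) - n * mean_x * mean_y) / (
    sum(xi ** 2 for xi in x) - n * mean_x ** 2
)
b1 = mean_y - b2 * mean_x

# Residuals ei = Yi - (b1 + b2*Xi)
residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]

# Property 1: the fitted line evaluated at mean X gives mean Y
fitted_at_mean_x = b1 + b2 * mean_x
# Property 2: the residuals average to zero
mean_residual = sum(residuals) / n
# Property 3: the residuals are uncorrelated with X (sum of ei*Xi is zero)
sum_resid_times_x = sum(ei * xi for ei, xi in zip(residuals, x))

print(fitted_at_mean_x, mean_y)
print(round(mean_residual, 10), round(sum_resid_times_x, 10))
```

Any small non-zero values printed for the last two quantities are floating-point rounding, not a failure of the properties.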
Difference between percentage increase, and percentage point increase?
6% to 7% = 1 percentage point increase
((7-6)/6) x 100 = 16.7% percentage increase
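The same arithmetic as a short Python sketch, using the 6% and 7% rates from this card (the variable names are illustrative):

```python
old_rate = 6.0  # percent
new_rate = 7.0  # percent

# Percentage point change: simple difference between the two rates
percentage_point_change = new_rate - old_rate

# Percentage change: difference relative to the starting rate
percentage_change = (new_rate - old_rate) / old_rate * 100

print(f"{percentage_point_change:.1f} percentage points")
print(f"{percentage_change:.1f}% increase")
```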
Assumptions of OLS model?
- Linear in the parameters
- X is uncorrelated with u
- E(u | Xi) = 0
- Var(ui) = σ^2 (homoscedasticity)
- No correlation between error terms (no autocorrelation): cov(ui, uj) = 0 (i and j not equal)
- No specification errors
What is homoscedastic variance? How is it calculated?
Every error term ui has the same variance, σ^2
Estimated as RSS / (n - 2): the sum of residuals squared divided by the degrees of freedom
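A sketch of the variance estimate for the two-variable model, where two parameters are estimated and the degrees of freedom are n - 2. The residuals below are hypothetical values chosen for illustration.

```python
import math

# Hypothetical residuals from a two-variable regression (illustration only)
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
n = len(residuals)

# Residual sum of squares
rss = sum(e ** 2 for e in residuals)

# Estimated error variance: RSS divided by the degrees of freedom (n - 2)
sigma2_hat = rss / (n - 2)

# Its square root is the standard error of the regression
se_regression = math.sqrt(sigma2_hat)

print(f"RSS = {rss:.2f}, variance estimate = {sigma2_hat:.2f}")
```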
What is the Gauss-Markov theorem?
Under the classical assumptions, OLS estimators are BLUE: Best linear unbiased estimators (minimum variance among all linear unbiased estimators)
What is the central limit theorem?
The sum of a large number of independent and identically distributed random variables (with finite variance) tends to a normal distribution as the number of variables approaches infinity
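A small simulation can make this concrete. The sketch below assumes uniform(0, 1) draws, 30 terms per sum and 2000 sums; all of these choices are arbitrary. If the sums are roughly normal, about 68% should fall within one standard deviation of their mean and about 95% within two.

```python
import random

random.seed(0)   # fixed seed so the run is repeatable
n_terms = 30     # number of i.i.d. variables in each sum
n_sums = 2000    # number of sums simulated

sums = [sum(random.random() for _ in range(n_terms)) for _ in range(n_sums)]

# A uniform(0, 1) variable has mean 1/2 and variance 1/12
expected_mean = n_terms * 0.5
expected_sd = (n_terms / 12) ** 0.5

# Standardise each sum and count how many lie within 1 and 2 sd
standardised = [(s - expected_mean) / expected_sd for s in sums]
share_within_1sd = sum(abs(z) <= 1 for z in standardised) / n_sums
share_within_2sd = sum(abs(z) <= 2 for z in standardised) / n_sums

print(share_within_1sd, share_within_2sd)
```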
When is the T distribution used?
To test the null Ho: B2 = 0 (no relationship between X and Y) when the true error variance is unknown and has to be estimated
T test equation?
t = (b2 - B2) / se(b2), which follows the t distribution with n - 2 degrees of freedom (two-variable model)
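A sketch of the t statistic for Ho: B2 = 0 in the two-variable model. It assumes the standard result se(b2) = sqrt(variance estimate / sum of x^2), with x in deviation form, which is not stated on the cards. All numbers are made up, and the critical value is read from a t table rather than computed.

```python
import math


def t_statistic(b2, beta2_null, se_b2):
    """t = (estimate - hypothesised value) / standard error."""
    return (b2 - beta2_null) / se_b2


# Illustrative values only
n = 5
b2 = 0.6
sigma2_hat = 0.8      # RSS / (n - 2)
sum_x_dev_sq = 10.0   # sum of squared deviations of X from its mean

se_b2 = math.sqrt(sigma2_hat / sum_x_dev_sq)
t = t_statistic(b2, 0.0, se_b2)
df = n - 2

# Two-tailed 5% critical value for 3 degrees of freedom, from a t table
t_critical = 3.182
reject_null = abs(t) > t_critical

print(f"t = {t:.3f} with {df} df, reject Ho: {reject_null}")
```

Here t is about 2.12, below the critical value, so Ho is not rejected at the 5% level in this tiny sample.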
Why might a one tailed test be used?
When theory or prior expectation gives the alternative hypothesis a direction, for example you expect B2 to be above 0
What is the total sum of squares ( sum of y^2) equal to?
ESS + RSS
(b2^2 x sum of (x^2)) + Sum of residuals squared
Proof that TSS = ESS + RSS
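Y = (Y^) + e
(Y - MeanY) = (Y^ - MeanY) + (Y - Y^), where (Y - Y^) = e
In deviation form: y = b2x + e, since (Y^ - MeanY) = b2x
Square both sides and sum: Sum of y^2 = b2^2 (sum of x^2) + sum of residuals squared + 2b2 (sum of xe)
Sum of xe = 0 (OLS property), so Sum of y^2 = b2^2 (sum of x^2) + sum of residuals squared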
If the line is a good fit what relationship does ESS and RSS have?
ESS > RSS
How can R^2 be calculated?
ESS/TSS
OR
1 - (Sum of residuals squared / sum of y^2), i.e. 1 - RSS/TSS
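The decomposition and both R^2 formulas can be checked on a small example. This sketch uses made-up data and the two-variable OLS slope in deviation form; the variable names are illustrative.

```python
# Illustrative data only (not from the flashcards)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Deviations from the sample means (lower-case x and y on the cards)
x_dev = [xi - mean_x for xi in x]
y_dev = [yi - mean_y for yi in y]

b2 = sum(xd * yd for xd, yd in zip(x_dev, y_dev)) / sum(xd ** 2 for xd in x_dev)
residuals = [yd - b2 * xd for xd, yd in zip(x_dev, y_dev)]

tss = sum(yd ** 2 for yd in y_dev)              # total sum of squares
ess = b2 ** 2 * sum(xd ** 2 for xd in x_dev)    # explained sum of squares
rss = sum(e ** 2 for e in residuals)            # residual sum of squares

r2_from_ess = ess / tss
r2_from_rss = 1 - rss / tss

print(f"TSS = {tss:.2f}, ESS = {ess:.2f}, RSS = {rss:.2f}")
print(f"R^2 = {r2_from_ess:.2f} (ESS/TSS), {r2_from_rss:.2f} (1 - RSS/TSS)")
```

With this data TSS = 6, ESS = 3.6 and RSS = 2.4, so both routes give R^2 = 0.6.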
What is a normality test and some examples?
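Used to see if data (usually the residuals) is close to a normal distribution. Tests include histogram of residuals, normal probability plot and Jarque-Bera test
How is the Jarque-Bera test conducted?
Tests whether skewness and kurtosis match those of a normal distribution (skewness 0, kurtosis 3). A small value is consistent with normality; in large samples JB follows a chi-square distribution with 2 degrees of freedom
JB = (n/6) (skew^2 + (kurt - 3)^2 / 4)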
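A sketch of the Jarque-Bera calculation in plain Python. It assumes skewness and kurtosis are computed from the sample moments divided by n, and uses the large-sample chi-square distribution with 2 degrees of freedom for the p-value, whose upper tail is exp(-JB/2). The function name and data are illustrative, and the chi-square approximation is poor for a sample this small.

```python
import math


def jarque_bera(values):
    """Return (JB, skewness, kurtosis, asymptotic p-value)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (divided by n)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Upper-tail probability of a chi-square variable with 2 df
    p_value = math.exp(-jb / 2)
    return jb, skew, kurt, p_value


# Hypothetical residuals (illustration only)
residuals = [-2, -1, 0, 1, 2]
jb, skew, kurt, p_value = jarque_bera(residuals)
print(f"JB = {jb:.3f}, skew = {skew:.2f}, kurt = {kurt:.2f}, p = {p_value:.3f}")
```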
What is forecasting?
Using the estimated equation to predict the value of Y for a given X (only use X values within the sample range to avoid extrapolating)
What does B2 measure in a multiple regression?
The change in the mean of Y, per unit change in X2 holding X3 constant
What is multicollinearity?
When an exact (perfect) or near-exact linear relationship exists between the explanatory variables
How to use P value for significance testing?
Calculate T test statistic. Calculate the P value.
“If the null hypothesis is true, what is the probability that we’d observe a test statistic at least as extreme, in the direction of the alternative hypothesis, as the one we did?”
Set the significance level, α (the probability of a type 1 error), at 5% etc. If the P-value is less than (or equal to) α, reject the null hypothesis in favor of the alternative hypothesis.
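A sketch of the decision rule. The Python standard library has no t distribution, so this assumes a large sample and swaps in the standard normal as an approximation; in small samples the P value should come from the t distribution with the right degrees of freedom. The function names and the test statistic of 2.5 are made up.

```python
from statistics import NormalDist


def two_sided_p_value_normal(t_stat):
    """Large-sample approximation: two-tailed P value from the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


def decide(p_value, alpha=0.05):
    """Reject the null when the P value is at or below the significance level."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


t_stat = 2.5  # illustrative test statistic
p_value = two_sided_p_value_normal(t_stat)
print(f"p = {p_value:.4f}: {decide(p_value)}")
```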
What do lower P Values mean?
Stronger evidence against the null, so it can be rejected at a smaller significance level
What is the test of overall significance?
Ho: B2 = B3 = 0, i.e. the slope coefficients are jointly and simultaneously equal to 0 and the X variables have no influence on Y. Variables can be significant together even if not individually
Equation for the F test?
F = (ESS / (k-1)) / (RSS / (n-k))
Numerator: variance explained by X2 and X3, with k-1 df
Denominator: unexplained variance, with n-k df
What do K and N represent?
N: Number of observations
K: Number of parameters (slopes and intercept)
What does a large F value mean?
More evidence that X2 and X3 do have an effect on Y
Equation linking F and R^2
F = (R^2 / (k-1)) / ((1-R^2) / (n-k))
n = number of observations, k = number of parameters (slopes and intercept)
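A sketch showing that the sums-of-squares form and the R^2 form of the F statistic agree, taking n as the number of observations and k as the number of estimated parameters including the intercept. The function names and numbers are illustrative.

```python
import math


def f_from_sums(ess, rss, n, k):
    """F = (ESS / (k - 1)) / (RSS / (n - k))."""
    return (ess / (k - 1)) / (rss / (n - k))


def f_from_r2(r2, n, k):
    """F = (R^2 / (k - 1)) / ((1 - R^2) / (n - k))."""
    if r2 == 1:
        return math.inf  # perfect fit: the denominator is zero
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


# Illustrative values: ESS = 3.6, RSS = 2.4, so R^2 = 3.6 / 6 = 0.6
n, k = 5, 2
f_sums = f_from_sums(3.6, 2.4, n, k)
f_r2 = f_from_r2(0.6, n, k)
print(f"{f_sums:.2f} {f_r2:.2f}")
```

Both routes give F = 4.5 here, with k - 1 = 1 and n - k = 3 degrees of freedom.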
When R^2 = 1 what does F equal?
Infinity
TSS in terms of R^2
(Sum of y^2) = R^2 (Sum of y^2) + (1-R^2)(Sum of y^2)
What is a specification bias?
If X3 belongs in the model but is ignored, the coefficient on X2 picks up the gross effect: the direct effect of X2 plus the indirect effect of X3. Omitting a relevant variable in this way is a specification bias
Why can’t we compare R^2 values?
R^2 never falls as explanatory variables are added, and it doesn’t account for degrees of freedom, so R^2 values from models with different numbers of explanatory variables cannot be compared directly
How to calculate adjusted R^2 to compare values?
1 - (1-R^2) ((n-1)/(n-k))
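A sketch of the adjustment, with n observations and k estimated parameters including the intercept. The function name and inputs are illustrative; the second call shows that the adjusted value can go negative when R^2 is low and k is large relative to n.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


# Illustrative values only
adj_a = adjusted_r2(0.6, n=5, k=2)
adj_b = adjusted_r2(0.1, n=5, k=3)
print(f"{adj_a:.4f} {adj_b:.4f}")
```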
Properties of the adjusted R^2?
Adjusted R^2 is less than or equal to R^2. The more variables in the model, the smaller the adjusted value becomes relative to R^2 (it can become negative)
Adjusted R^2 increases when a variable is added only if the absolute t value of its coefficient is greater than 1
When can we use RLS?
When we want to impose restrictions on the model, e.g. that some of the variables do not belong in it. Only use the R^2 comparison when the dependent variable is in the same form in the restricted and unrestricted models
F test for restricted least squares equation?
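F(m, n-k) = ((R^2ur - R^2r) / m) / ((1-R^2ur) / (n-k))
m = number of restrictions, n = number of observations, k = number of parameters in the unrestricted model (ur = unrestricted, r = restricted)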
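A sketch of the restricted least squares F statistic in its R^2 form, assuming m is the number of restrictions and k the number of parameters in the unrestricted model. The function name and all numbers are hypothetical.

```python
def restricted_f(r2_unrestricted, r2_restricted, m, n, k):
    """F = ((R^2ur - R^2r) / m) / ((1 - R^2ur) / (n - k))."""
    numerator = (r2_unrestricted - r2_restricted) / m
    denominator = (1 - r2_unrestricted) / (n - k)
    return numerator / denominator


# Hypothetical example: dropping 2 variables lowers R^2 from 0.8 to 0.7
f_stat = restricted_f(0.8, 0.7, m=2, n=30, k=5)
print(f"F(2, 25) = {f_stat:.2f}")
```

The result is compared with the critical F value for m and n - k degrees of freedom; a large value suggests the restrictions are not valid.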
What is the elasticity coefficient?
%change in y / % change in x
What does a coefficient of B2 represent on a linear y equation with a continuous variable?
y = b1 + b2x1
A one unit change in x generates a B2 unit change in y
What does a coefficient of B2 represent on a linear y equation with a log variable?
y = b1 + b2lnx1
A 1% change in x generates approximately a b2/100 unit change in y
What does a coefficient of B2 represent on a linear y equation with a dummy variable?
y = b1 + b2D1
The movement of the dummy from 0 to 1 produces a B2 unit change in y
What does a coefficient of B2 represent on a log y equation with a continuous variable?
lny = b1 + b2x1
A one unit change in x generates a 100*B2 percentage change in y
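The 100*B2 reading is an approximation that works well when B2 is small. The sketch below compares it with the exact percentage change, 100*(exp(B2) - 1), which follows from the model but is not stated on the card. The coefficient values are made up.

```python
import math


def approx_pct_change(b2):
    """Approximate % change in y for a one unit change in x."""
    return 100 * b2


def exact_pct_change(b2):
    """Exact % change in y implied by lny = b1 + b2*x."""
    return 100 * (math.exp(b2) - 1)


# Illustrative coefficients only
for b2 in (0.05, 0.50):
    print(f"b2 = {b2}: approx {approx_pct_change(b2):.2f}%, exact {exact_pct_change(b2):.2f}%")
```

The gap between the two readings widens as the coefficient grows, so the approximate reading is best kept for small coefficients.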
What does a coefficient of B2 represent on a log y equation with a log variable?
lny = b1 + b2lnx1
A 1% change in x generates approximately a B2 percentage change in y (B2 is the elasticity)
What does a coefficient of B2 represent on a log y equation with a dummy variable?
lny = b1 + b2D1
The movement of the dummy from 0 to 1 produces a 100*B2 percentage change in y
What does a coefficient of B2 represent on a dummy y equation with a continuous variable?
Dy = b1 + b2x1
A one unit change in x generates a 100*B2 percentage point change in the probability y occurs
What does a coefficient of B2 represent on a dummy y equation with a log variable?
Dy = b1 + b2lnx1
A 1% change in x generates approximately a B2 percentage point change in the probability y occurs
What does a coefficient of B2 represent on a dummy y equation with a dummy variable?
Dy = b1 + b2D1
The movement of the dummy from 0 to 1 produces a 100*B2 percentage point change in the probability y occurs
In a log linear model what does B2 represent in comparison to B3?
B2 measures the elasticity of Y with respect to X2, holding X3 constant (partial elasticity)
What is a semi-log model?
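Used to examine growth rates. Taking logs of the compound growth formula Yt = Y0(1+r)^t gives lnYt = lnY0 + t ln(1+r); replacing lnY0 with B1 and ln(1+r) with B2 gives the regression lnYt = B1 + B2t. Only one variable is in log form. The slope coefficient measures the proportional change in Y for an absolute change in the explanatory variable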
What is a linear trend model?
When Y is regressed on time: Yt = B1 + B2t + ut. This displays the absolute changes in Y per period, not the relative changes. (Needs an error term with a stationary mean and variance)
What are polynomial regression models?
When the model is not linear in the variables but is linear in the parameters, so regression analysis can still be used. Watch for collinearity between the powers of X.
y= B1 + B2X + B3X^2 + B4X^3