Econometrics 1 Flashcards
- Give two examples of a spurious relationship.
- Sunspot activity and business cycles.
- Snowfall in Portland and the annual growth rate of U.S. investment.
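A minimal simulation sketch (my illustration, not from the source): two independent trending series can look strongly correlated purely by chance.

```python
import numpy as np

# Hypothetical illustration: two INDEPENDENT random walks often show a
# large sample correlation even though neither causes the other.
rng = np.random.default_rng(0)
n = 200
x = np.cumsum(rng.normal(size=n))  # stand-in for, e.g., a sunspot index
y = np.cumsum(rng.normal(size=n))  # stand-in for, e.g., a business-cycle index

print(f"correlation between two unrelated series: {np.corrcoef(x, y)[0, 1]:.2f}")
# Trending series frequently produce a correlation far from 0 by chance alone.
```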
- Linear regression: Y = α0 + α1X + ε
Describe causation
Causation is assumed to flow from the RHS to the LHS. Variations in X systematically cause variations in Y.
- Can econometrics show causation?
No. It can only show correlation. A logical and well-motivated mechanism is needed to CAUSALLY link the explanatory variables and the dependent variable. Without a “well motivated story,” a regression will be spurious.
- Define “identification.”
Identification is knowing that something is what you say it is. The estimate of a parameter is an estimate of that parameter and not, in fact, something else.
- Give an example of the identification problem and what can cause it.
Effect of school vouchers on educational outcomes. People who were more likely to do well in school signed up for vouchers, and all who signed up received a voucher. The estimator was not identified because voucher effects were conflated with motivation and capability effects. Cause: “comparative advantage selection” or “self-selection bias.” (A simulation sketch follows below.)
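A hedged Monte Carlo sketch of the voucher story (all numbers are made up for illustration): unobserved motivation drives both voucher take-up and outcomes, so the naive comparison conflates the two effects.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

motivation = rng.normal(size=n)                                # unobserved
voucher = (motivation + rng.normal(size=n) > 0).astype(float)  # self-selection

true_effect = 1.0                                              # assumed causal effect
outcome = 2.0 + true_effect * voucher + 1.5 * motivation + rng.normal(size=n)

# Naive estimate: difference in mean outcomes between the two groups.
naive = outcome[voucher == 1].mean() - outcome[voucher == 0].mean()
print(f"naive estimate: {naive:.2f} vs true effect: {true_effect}")
# The naive estimate is biased upward because it also picks up the
# motivation effect: the voucher effect is not identified.
```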
- What are the five elements that must be done correctly to achieve “good” empirical analysis?
- Specification: Choosing RHS variables and function form in logical and thoughtful manner (having a well-motivated story).
- Identification: Ensuring that empirical effects causally associated with each variable are well-defined (avoiding conflation).
- Data set: Obtaining data that is best structured for analysis (type of variation, sample size…)
- Estimator: Choosing best estimation rule for the model and data structure.
- Testing and validation: Providing info that demonstrates strengths and drawbacks of the empirical model and analysis.
- In a regression analysis, the error term ε captures what?
[Kennedy, p. 3]
The error or disturbance term is the stochastic/random part of the model. The error term is justified in three main ways:
- Omission of influence of innumerable chance events
- Measurement error of included explanatory variables
- Human indeterminacy.
- What are the six assumptions of the linear regression model (Gauss-Markov Theorem assumptions)?
- Linear in the parameters and the error term
- Full rank: Explanatory variables are not perfectly correlated
- Exogeneity of explanatory variables: No X is correlated with the error term.
- Homoskedasticity and non-autocorrelation
- Data in X may be any mixture of constants and random variables, BUT must be generated by a mechanism that is unrelated to ε. [Data generation]
- Normal distribution: disturbances are normally distributed.
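A small Monte Carlo sketch (assumed parameter values, my illustration): when the assumptions above hold, the sampling distribution of the OLS slope is centered on the true parameter.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma = 1.0, 2.0, 1.0   # assumed true values
x = rng.uniform(0, 10, size=100)      # X held fixed across replications

slopes = []
for _ in range(5_000):
    eps = rng.normal(0, sigma, size=x.size)  # exogenous, homoskedastic errors
    y = beta0 + beta1 * x + eps
    slopes.append(np.polyfit(x, y, 1)[0])    # OLS slope estimate

print(f"mean of OLS slopes: {np.mean(slopes):.3f} (true beta1 = {beta1})")
# The average estimate matches the true slope: OLS is unbiased here.
```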
- On what does precision of the estimates depend?
- The amount of variation in Y and X contained in the sample (dependent and explanatory variables): more variation in the raw data is better.
- Sample size (more is better).
- The “noise” level in the process, which is directly related to the variance of the error term (less noise is better). The variance formula below makes these three factors explicit.
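For the simple linear model, Var(b1) = σ² / Σ(x − x̄)². A quick numeric sketch (illustrative values) of how each of the three factors enters:

```python
import numpy as np

def slope_variance(x, sigma):
    # Var(b1) = sigma^2 / sum((x - xbar)^2) for simple regression
    return sigma**2 / np.sum((x - x.mean())**2)

x = np.linspace(0, 10, 50)
print(slope_variance(x, sigma=1.0))          # baseline
print(slope_variance(np.repeat(x, 4), 1.0))  # 4x sample size   -> smaller variance
print(slope_variance(3 * x, 1.0))            # more X variation -> smaller variance
print(slope_variance(x, sigma=2.0))          # noisier errors   -> larger variance
```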
- Define “precision”
Degree to which repeated measurements under unchanged conditions show the same results. Precision has to do with how compressed (small) the sampling distribution of the OLS estimates is. Precision declines as multicollinearity increases (the variance of the estimates will increase). Precision increases with sample size. Precision also increases as the overall fit of the model improves (a better fit leaves less error, so the error variance is reduced).
- What is a consistent estimator?
An estimator whose sampling distribution converges on the true β as sample size increases (an asymptotic property). Consistency is the large-sample (asymptotic distribution) counterpart to unbiasedness, which is a property of finite, small-sample distributions.
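A short sketch (illustrative parameters): the sampling distribution of the OLS slope collapses onto the true β as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(3)
beta1 = 2.0  # assumed true slope

for n in (25, 250, 2_500):
    slopes = []
    for _ in range(2_000):
        x = rng.uniform(0, 10, size=n)
        y = 1.0 + beta1 * x + rng.normal(size=n)
        slopes.append(np.polyfit(x, y, 1)[0])
    print(f"n={n:>5}: mean={np.mean(slopes):.3f}, sd={np.std(slopes):.4f}")
# The spread shrinks toward zero as n grows: the estimator is consistent.
```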
- What is an efficient estimator?
An estimator that has the lowest variance among all estimators in its class. Typically, where one unbiased estimator can be found, many other unbiased estimators are also possible. It is therefore desirable to choose the unbiased estimator that has the least variance, a.k.a. the “best unbiased” estimator. A researcher would be more confident that a single draw out of a distribution with less variance was closer to the true β than a single draw out of a distribution with large variance. [Kennedy, p. 16]
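A hedged comparison sketch: the “endpoints” slope below is another linear unbiased estimator (constructed here purely for contrast), but OLS has the smaller variance.

```python
import numpy as np

rng = np.random.default_rng(4)
beta1 = 2.0
x = np.sort(rng.uniform(0, 10, size=50))  # fixed design, sorted by x

ols, endpoints = [], []
for _ in range(5_000):
    y = 1.0 + beta1 * x + rng.normal(size=x.size)
    ols.append(np.polyfit(x, y, 1)[0])
    # Alternative linear unbiased estimator: slope between the two
    # observations with the smallest and largest x.
    endpoints.append((y[-1] - y[0]) / (x[-1] - x[0]))

print(f"OLS:       mean={np.mean(ols):.3f}, var={np.var(ols):.5f}")
print(f"endpoints: mean={np.mean(endpoints):.3f}, var={np.var(endpoints):.5f}")
# Both are unbiased, but OLS has far less variance: it is "best" in its class.
```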
- Why are estimators restricted to be a linear function of the observations on the dependent variable?
To reduce the task of finding the efficient (smallest-variance) estimator to mathematically manageable proportions. [Kennedy, p. 17]
- Define BLUE (as an estimator that is BLUE).
B = "best" = efficient--> least amount of variance amongst all other estimators in the same class. Requires GM assumptions 1-4 hold for OLS LUE. L = Linear U = unbiased: β-hat = β E = estimator
- Define unbiased.
The mean of the sampling distribution of the estimator is equal to the true parameter value.
“On average” an estimator will correctly estimate the parameter in question; it will not systematically under- or over-estimate the parameter (Greene, p. 54).
Unbiasedness is a property of finite (small-sample) least squares.
- What are “residuals”?
e = y - yhat: observed value minus predicted value. Residuals are the sample estimates of the unobserved disturbances ε.
- Define OLS
Ordinary least squares: The estimator generating the set of values of the parameters that minimizes the sum of squared residuals.
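A minimal sketch of the closed-form solution: the minimizer of the sum of squared residuals solves the normal equations (X'X)b = X'y (simulated data, assumed coefficients).

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)   # assumed true line: 1 + 2x

X = np.column_stack([np.ones_like(x), x])  # add an intercept column
b = np.linalg.solve(X.T @ X, X.T @ y)      # solve (X'X) b = X'y
print(f"intercept = {b[0]:.2f}, slope = {b[1]:.2f}")
```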
- Why is OLS popular?
NOT necessarily because it makes residuals “small” by minimizing the sum of squared residuals, but because (1) it scores well on other criteria such as R^2, unbiasedness, efficiency, and mean square error, and (2) it is computationally easy.
- Define and describe R^2.
R^2 is the coefficient of determination. It represents the amount of variation in the DV “explained” by variation in the independent variables.
R^2 = SSR/SST, or equivalently, R^2 = 1 - (SSE/SST)
CAVEAT: Because OLS minimizes sum of squared residuals, it also automatically maximizes the R^2 value.
- Define “SST”
Total sum of squares: Σ(y - ybar)^2
y = observed value
ybar = mean of observed values
SST = SSR + SSE
- Define “SSR”
Sum of squares due to regression (explained variation).
SSR = Σ(yhat - ybar)^2
yhat = estimated values
ybar = mean of observed values
- Define “SSE”
Sum of squared errors/residuals (unexplained variation): Σ(y - yhat)^2.
y = observed values
yhat = estimated values
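A quick numeric check of the three quantities (simulated data; the identity SST = SSR + SSE holds for OLS with an intercept):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)

yhat = np.polyval(np.polyfit(x, y, 1), x)  # OLS fitted values

SST = np.sum((y - y.mean())**2)      # total variation
SSR = np.sum((yhat - y.mean())**2)   # explained variation
SSE = np.sum((y - yhat)**2)          # unexplained variation

print(f"SST = SSR + SSE? {np.isclose(SST, SSR + SSE)}")
print(f"R^2 = SSR/SST = {SSR/SST:.3f} = 1 - SSE/SST = {1 - SSE/SST:.3f}")
```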
- What are three common data problems?
(1) Outliers. Drop outliers and re-estimate to check how much impact they had on the results. If the outliers can be explained, augment the model to take their causes into account.
(2) Missing data. Don’t make up data. Make sure the missingness does not create self-selection bias, e.g., unemployment statistics that don’t account for people who have given up looking for work.
(3) Multicollinearity. Examine W.M.S. Run a pre-estimate. Do hypothesis testing and drop variables with low impact on the estimation. (Start with more variables and drop those with low impact; don’t start with fewer and add.) If the multicollinearity can’t be broken, give up, or, if resources allow, run your own experiment. (A diagnostic sketch follows below.)
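A sketch of one common multicollinearity diagnostic, the variance inflation factor VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing x_j on the other regressors (the collinear pair here is constructed for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)                     # unrelated regressor
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    # Regress column j on the remaining columns (plus an intercept).
    others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid = X[:, j] - others @ b
    r2 = 1 - resid.var() / X[:, j].var()
    return 1 / (1 - r2)

print([round(vif(X, j), 1) for j in range(X.shape[1])])
# Huge VIFs flag the collinear pair (x1, x2); x3 stays near 1.
```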
- Is an OLS estimator a random variable?
Yes, because the estimate βhat will vary each time the experiment is run, owing to the random error term ε.
- Under what data-generating conditions do the Gauss-Markov (finite, small-sample) properties apply, and when do asymptotic properties (along with the Grenander conditions) apply instead?
GM properties are used with finite, small samples: non-stochastic X or experimental types of data.
Asymptotic properties are used with stochastic X or non-experimental data.
- Interpret slope parameter on a standard linear model:
weight (kg) = -114.3 + 106.5*Height (meters)
For every 1 meter increase in height, weight will increase by an average of 106.5 kg.
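A quick arithmetic check of the fitted line at two heights:

```python
def predicted_weight(height_m):
    # weight (kg) = -114.3 + 106.5 * height (m), from the card above
    return -114.3 + 106.5 * height_m

print(f"{predicted_weight(1.80):.2f} kg")  # 77.40 kg
print(f"{predicted_weight(1.70):.2f} kg")  # 66.75 kg; +0.10 m -> +10.65 kg
```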