Topics in Applied Econometrics Flashcards
How is an independently pooled cross section usually obtained?
Sampling randomly from a large population at different points in time, usually, but not necessarily, different years.
When sampling from a population at different points in time, what can we likely say about the data obtained?
It may be independent but is unlikely to be I.I.D.
What is Panel Data?
The same entities are tracked across time, be it families, cities etc.
A sample population is selected from the given entity and the selected individuals are reinterviewed at several subsequent points in time.
It has a cross sectional and time series dimension.
What is another way to refer to panel data?
Longitudinal data.
13.1 - Pooling Independent Cross Sections Across Time - Why is panel data useful?
As it measures the changes in given parameters in a specific area over time, there exist some unknown factors within a sample. As such, by re-questioning the same individuals, the relationship and the dependent variable and at least some of the independent variables will remain the same over time.
13.1 - Pooling Independent Cross Sections Across Time - When the population has different distribution in different time periods, the intercept can differ across periods. How can we account for this?
Dummy variables can be included for all but one year, with the earliest year in the sample usually being the base year.
13.1 - Pooling Independent Cross Sections Across Time - Heteroskedasticity?
Wealth example?
The circumstance in which the variability of a variable is unequal across the range of values of a second variable that predicts it. Variability can increase or decrease with time.
Annual income might be a heteroscedastic variable when predicted by age, because most teens aren’t flying around in G6 jets that they bought from their own income. More commonly, teen workers earn close to the minimum wage, so there isn’t a lot of variability during the teen years. However, as teens turn into 20-somethings, and 20-somethings into 30-somethings, some will tend to shoot-up the tax brackets, while others will increase more gradually (or perhaps not at all, unfortunately). Put simply, the gap between the “haves” and the “have-nots” is likely to widen with age.
13.1 - Pooling Independent Cross Sections Across Time - When should Logarithms be used in graphs?
LOGS are just a way of writing exponential functions.
The first is to respond to skewness towards large values; i.e., cases in which one or a few points are much larger than the bulk of the data. The second is to show percent change or multiplicative factors.
https://www.forbes.com/sites/naomirobbins/2012/01/19/when-should-i-use-logarithmic-scales-in-my-charts-and-graphs/
13.1 - Pooling Independent Cross Sections Across Time - Given the model:
log(wage) = β|0| + 𝛿|0| y85+ β|1|educ + 𝛿|1|y85.educ + β|2|female + 𝛿|2|y85.female + u
and given that the base year is 1975, what do the following equal:
β|0|
β|0| + 𝛿|0|
β|1| + 𝛿|1|
𝛿|1|
β|0| is the intercept for 1978.
β|0| + 𝛿|0| us the intercept in 1985
β|1| + 𝛿|1| is the return to education in 1985.
𝛿|1| measures how the return to another year of education has changed over the seven-year period.
13.1 - Pooling Independent Cross Sections Across Time - Given the model:
log(wage) = β|0| + 𝛿|0| y85+ β|1|educ + 𝛿|1|y85.educ + β|2|female + 𝛿|2|y85.female + u
What is the differential between men and woman in the two periods?
How could we test the null hypothesis that nothing has happened to the gender differential over the seven year period?
β|2| is the log(wage) differential between men and woman.
The differential in 1985 is β|2|+ 𝛿|2|
H|0|: 𝛿|2| = 0
H|1|: 𝛿|2| > 0
13.1 - Pooling Independent Cross Sections Across Time - When should one adjust nominal to real terms in a regression (use wage and log(wage) as an example)?
If we were using wage as the dependent variable, we would need to change the values to real. A year dummy should also be included to enable use to convert.
When using log(wage), with dummy variables for all periods (except the base period), the use of aggregate price deflators will only affect the intercepts; non of the slope estimates will change.
log(wage/P85) = log(wage) -log(P85).
where P(85) is the deflation factor for 1985 wages. You will note that while wage figures across people, P85 does not.
13.1 - Pooling Independent Cross Sections Across Time - Simply, what is the Chow test?
An F test.
13.1 - Pooling Independent Cross Sections Across Time - What is an F-Test?
F-Tests are used for testing multiple coefficients. e.g. H|0| = β|1|. β|2|, …, β|n| = 0.
We estimate the model using the independent variables of interest and find the estimate of the population error (u), which we call residuals.
Unrestricted Model: We then find the Sum of Squared Residuals (SSR) which will be the sum of the population error terms squared.
Restrecited Model: We find the SSR for y = α, where α is the mean, saying that the dependent variable doesn’t depend on the independent variables.
SSR of the unrestricted will always be greater than SSR for the restricted, as it will always go up as independent variables are added. We are asking whether it is significantly higher.
F stat = [SSR|restricted| - SSR|unrestricted|] / SSR|unrestricted| - We do need to adjust this based on n and the p value though, but it serves as a good base for understanding.
13.1 - Pooling Independent Cross Sections Across Time - T-Test?
Uses a T-Distribution - this is basically a normal distribution but with fatter tails. If the sample SD were known, we would use the normal distribution, but when it is not known, we use the t-distribution which is calculated based on the sample population. with n-1 degrees of freedom. As n increases, the t distribution increases to the standard normal.
We test that the sample mean (μ) is equal to some hypothesised value (μ|0|)
If there is a large enough difference between the sample mean and the value you are testing against, relative to the standard deviation, we reject the null hypothesis.
The higher the t stat, the less likely the null is true
13.1 - Pooling Independent Cross Sections Across Time - F-test vs T-test?
T-Tests are used for testing a restriction on a single variable of coefficient.
F-Tests are used for testing multiple coefficients. e.g. H|0| = β|1|. β|2|, …, β|n| = 0.
This would fail if any βi had was significant at a value greater than 0.
13.1 - Pooling Independent Cross Sections Across Time - Chow Test use?
How can this be applied to Panel data?
Is it robust against Heteroskedasticity?
An F test that can be used to determine whether a multiple regression function differs across two groups.
This can be applied to testing in two different periods.
As with any F test based on sums of squared residuals or R-squareds, this
test is not robust to heteroskedasticity. To
obtain a heteroskedasticity-robust test, we must construct the interaction terms and do a
pooled regression.
13.1 - Pooling Independent Cross Sections Across Time - How would we conduct a Chow Test?
Estimate the restricted model by doing a pooled regression allowing for different time intercepts; this gives SSRr. Then, run a regression
for each of the, say, T time periods and obtain the sum of squared residuals for each
time period. The unrestricted sum of squared residuals is obtained as SSRur = SSR1 +
SSR2 + … + SSR|T| .
If there are k explanatory variables (not including the intercept or the
time dummies) with T time periods, then we are testing (T 2 1)k restrictions, and there
are T 1 Tk parameters estimated in the unrestricted model.
13.2 (Panel Data - Two Period) - In empirical economics, what has 𝛿|1| HAT become known as?
The difference-in-differences estimator.
It shows the difference in the dependent variable due to an independent variable in the two time periods.
It will be fond in the equation next to a dummy variable for the latter year and the aforementioned independent variable.
𝛿|1| = (y|2, group 1| - y|2, group 2| - (y|1, group 1| - y|1, group 2|)
See P.455 in Text book for more information.
13.2 (Panel Data - Two Period) - What are β symbols used to represent and what are 𝛿 used to represent?
β are coefficients of explanatory/ independent variables.
𝛿 are coefficients used to allow multiple periods into an equation. The 𝛿 will be partnered with a dummy variable for a different year.