Questions from Course Manual Flashcards

1
Q

How can we use the standard error to infer statistical significance of a coefficient?

A

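The standard error measures how much variability (sampling uncertainty) "surrounds" a coefficient estimate. To test whether a coefficient is statistically significant, i.e. significantly different from zero, divide the estimate by its standard error to obtain the t-statistic. If |t| exceeds the critical value (about 1.96 at the 5% level in large samples), we reject H0: β = 0. Equivalently, the 95% confidence interval (estimate ± 1.96 × SE) does not contain zero.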
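A minimal sketch of this rule in Python, using only the standard library. It assumes a large sample, so that the t-statistic is compared with the standard normal distribution; the estimate (0.8) and standard error (0.25) are made-up numbers.

```python
import math


def t_test(coef: float, se: float) -> tuple[float, float, tuple[float, float]]:
    """t-statistic for H0: beta = 0, two-sided p-value and 95% confidence interval.

    Uses the standard normal approximation, which is appropriate in large samples.
    """
    t = coef / se
    p = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    ci = (coef - 1.96 * se, coef + 1.96 * se)
    return t, p, ci


# Hypothetical estimate and standard error
t, p, ci = t_test(coef=0.8, se=0.25)
print(f"t = {t:.2f}, p = {p:.4f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Here |t| = 3.2 > 1.96 and the interval excludes zero, so the coefficient would be called significant at the 5% level.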

2
Q

How should we interpret coefficients when independent variables are dummies?

A

A dummy coefficient is always interpreted relative to the reference (omitted) group. For example, in a regression of income on dummies for political affiliation, a positive coefficient on a party dummy means that, holding the other regressors constant, average income is higher for members of that party than for the reference group, by the amount of the coefficient.
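The claim that a dummy coefficient is a difference relative to the reference group can be checked directly. In this sketch with hypothetical income data and a single 0/1 party dummy (no other regressors), the OLS intercept equals the mean income of the reference group and the slope equals the difference in group means.

```python
def ols_simple(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data: income (in thousands) and a dummy that is 1 for
# members of one party and 0 for the reference group.
income = [30, 35, 40, 45, 50, 55]
party = [0, 0, 0, 1, 1, 1]

b0, b1 = ols_simple(party, income)
mean_ref = sum(i for i, d in zip(income, party) if d == 0) / party.count(0)
mean_dum = sum(i for i, d in zip(income, party) if d == 1) / party.count(1)
print(b0, mean_ref)             # intercept = mean of the reference group
print(b1, mean_dum - mean_ref)  # slope = difference in group means
```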

3
Q

What does controlling for other variables mean? What is the difference with interaction variables?

A

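Controlling for a variable means including it as an additional regressor, so that the coefficient of interest is estimated holding that variable constant. This is an attempt to remove the effect of confounding variables (omitted variable bias).
An interaction variable is the product of two (or more) regressors, e.g. X1 × X2. It allows the effect of one variable on Y to depend on the value of the other, whereas a control variable only enters additively and does not let the effect of X vary.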

4
Q

What does the R2 measure and mean?

A

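R-squared measures the fraction of the sample variance of the dependent variable Y that is explained (predicted) by the regressors: R2 = ESS/TSS = 1 − SSR/TSS. It lies between 0 and 1; a high R2 indicates a good fit, but says nothing about whether the coefficients are unbiased or causal.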
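A small illustration with made-up numbers: fit a simple OLS line and compute R2 = 1 − SSR/TSS. In a simple regression with an intercept this equals the squared correlation between x and y (here 36/60 = 0.6).

```python
def ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def r_squared(y, y_hat):
    """R^2 = 1 - SSR/TSS."""
    y_bar = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    return 1 - ssr / tss


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = ols(x, y)
r2 = r_squared(y, [b0 + b1 * xi for xi in x])
print(round(r2, 3))  # 0.6
```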

5
Q

How can misspecification tests help to assess the validity of your OLS estimates?

A

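A regression suffers from functional form misspecification when the functional form of the estimated regression model differs from the functional form of the population regression function (e.g. a missing squared term or interaction). Functional form misspecification leads to biased and inconsistent coefficient estimators. Misspecification tests, such as Ramsey's RESET test (which checks whether powers of the fitted values add explanatory power), help to detect this: a rejection suggests that the OLS estimates are not valid under the current specification.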

6
Q

Under which conditions do omitted variables or reverse causality bias coefficients?

A

For omitted variable bias to occur, two conditions must be fulfilled:
- X is correlated with the omitted variable
- The omitted variable is a determinant of the dependent variable Y
Reverse causality leads to correlation between X and the error in the population of interest such that the coefficient on X is estimated with bias.
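The two conditions for omitted variable bias can be illustrated with a small simulation, a sketch using made-up parameters: z is correlated with x and also determines y, so the regression that omits z picks up β_z · cov(x, z)/var(x) on top of the true effect of x (here 2 + 3 × 0.5 = 3.5).

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


random.seed(3)
n = 50_000
x = [random.gauss(0, 1) for _ in range(n)]
z = [0.5 * xi + random.gauss(0, 1) for xi in x]  # omitted variable, correlated with x
y = [1 + 2 * xi + 3 * zi + random.gauss(0, 1)    # z is a determinant of y
     for xi, zi in zip(x, z)]

short = ols_slope(x, y)              # regression that omits z
predicted = 2 + 3 * ols_slope(x, z)  # beta_x + beta_z * cov(x, z) / var(x)
print(short, predicted)
```

If either condition fails (z uncorrelated with x, or z not in the equation for y), the bias term is zero.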

7
Q

What happens to the estimates when there is measurement error in the dependent variable?

A

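If there is measurement error in the dependent variable and the measurement error is random (uncorrelated with the regressors), there is no bias: the error simply becomes part of the regression error, which only increases the variance of the estimates. If the error is not random, i.e. it is correlated with the regressors, then the estimates are biased.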

8
Q

What happens to the estimates when there is measurement error in the independent variable? In which direction does measurement error bias coefficients in this case?

A

When independent variables are measured imprecisely, we speak of errors-in-variables bias. This bias does not disappear if the sample size is large. If the measurement error has mean zero and is uncorrelated with the true value of the regressor (classical measurement error), the OLS estimator of the respective coefficient is biased towards zero (attenuation bias): it converges to β × var(X) / (var(X) + var(w)), where w is the measurement error.
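A simulation sketch of attenuation bias, with arbitrary parameters: true slope 2, var(X) = 1 and a classical measurement error with variance 1, so the OLS slope on the mismeasured regressor should be close to 2 × 1/(1 + 1) = 1.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


random.seed(1)
n, beta = 50_000, 2.0
x_true = [random.gauss(0, 1) for _ in range(n)]             # var(X) = 1
y = [beta * x + random.gauss(0, 1) for x in x_true]
x_obs = [x + random.gauss(0, 1) for x in x_true]             # classical error, var(w) = 1

b_true = ols_slope(x_true, y)  # close to 2.0
b_obs = ols_slope(x_obs, y)    # close to 2.0 * 1 / (1 + 1) = 1.0
print(b_true, b_obs)
```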

9
Q

Under which conditions can panel data be used to solve omitted variable bias?

A

Regression using panel data may mitigate omitted variable bias when there is no information on variables that correlate with both the regressors of interest and the dependent variable, provided these omitted variables are constant over time (handled by entity fixed effects) or constant across entities (handled by time fixed effects).

10
Q

What is meant by Pooled OLS? When is Pooled OLS appropriate? Can this be tested?

A
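  • POLS refers to the application of OLS to panel data. In POLS the data are treated as if they were one large cross-section, ignoring the panel structure (the fact that the same entities are observed repeatedly over time).
  • POLS is appropriate (consistent) when the explanatory variables are uncorrelated with the composite error, i.e. with both the unobserved individual-specific effect and the idiosyncratic (time-varying) error.
  • Poolability can be tested, e.g. with an F-test on the entity dummies in an LSDV regression, or with the Breusch-Pagan LM test for random effects (H0: the variance of the individual-specific effect is zero).
  • If H0 is rejected, individual-specific effects are present, so POLS is not appropriate and FE or RE should be considered.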
11
Q

What is meant by a fixed effects estimator?

A

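In panel data, where longitudinal observations exist for the same subject, fixed effects are subject-specific intercepts that capture all time-invariant characteristics of each subject. In panel data analysis the term fixed effects estimator (also known as the within estimator) refers to an estimator of the slope coefficients in the regression model including those fixed effects (one time-invariant intercept for each subject). It is computed by subtracting each subject's mean over time from every variable and running OLS on the demeaned data, so that only variation within subjects is used.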
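A minimal sketch of the within transformation for a single regressor, on a hypothetical two-entity panel in which the entity with the larger intercept has the smaller x values. Demeaning by entity removes the intercepts and recovers the true slope, while POLS, which ignores the entity structure, even gets the sign wrong. The helper names are illustrative.

```python
from collections import defaultdict


def entity_means(ids, values):
    """Mean of `values` for each entity id."""
    totals, counts = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        totals[i] += v
        counts[i] += 1
    return {i: totals[i] / counts[i] for i in totals}


def within_slope(ids, x, y):
    """FE (within) slope for a single regressor: OLS on entity-demeaned data."""
    mx, my = entity_means(ids, x), entity_means(ids, y)
    xd = [xi - mx[i] for i, xi in zip(ids, x)]
    yd = [yi - my[i] for i, yi in zip(ids, y)]
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)


def pooled_slope(x, y):
    """POLS slope that ignores the entity structure."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


# Hypothetical panel, true slope 2: entity "A" has intercept 10 and low x,
# entity "B" has intercept 0 and high x, so the fixed effect is correlated with x.
ids = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 4, 5, 6]
y = [12, 14, 16, 8, 10, 12]

fe = within_slope(ids, x, y)  # recovers 2
pols = pooled_slope(x, y)     # negative: the entity effects contaminate it
print(fe, pols)
```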

12
Q

What is the difference between FE and RE estimator? How can you choose between the FE or RE estimates?

A
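  • There are two assumptions about the individual-specific effect: the FE assumption and the RE assumption.
    The RE assumption is that the individual-specific effects are uncorrelated with the independent variables; RE is then consistent and more efficient than FE.
    The FE assumption allows the individual-specific effects to be correlated with the independent variables; FE remains consistent in that case, whereas RE does not.
  • The Hausman test checks whether FE and RE generate systematically different estimates (H0: the RE assumption holds, so both are consistent and the estimates are similar). If H0 is rejected, use FE; otherwise RE is preferred.
  • The Sargan-Hansen J (overidentification) test is a heteroskedasticity-robust alternative that tests the same hypothesis: no correlation between the individual-specific effects (part of the error term) and the independent variables.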
13
Q

What is meant by a First Difference estimator?

A

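The FD approach is used to address endogeneity caused by unobserved heterogeneity: if a time-invariant unobserved effect is correlated with the regressors, OLS is biased and inconsistent. Taking first differences (subtracting each variable's value in the previous period for the same entity) removes the time-invariant effect, and OLS is then applied to the differenced equation.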
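A sketch of the FD estimator on a small hypothetical panel in which y = a_i + 2·x and the unobserved a_i differ across entities. Differencing consecutive periods within each entity removes a_i, so OLS on the differences recovers the slope 2. The helper names are illustrative, not from a library.

```python
def first_differences(ids, periods, x, y):
    """Δx and Δy for consecutive periods within each entity."""
    rows = sorted(zip(ids, periods, x, y))
    dx, dy = [], []
    for (i0, t0, x0, y0), (i1, t1, x1, y1) in zip(rows, rows[1:]):
        if i0 == i1 and t1 == t0 + 1:  # same entity, adjacent periods
            dx.append(x1 - x0)
            dy.append(y1 - y0)
    return dx, dy


def ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


# Hypothetical panel: y = a_i + 2*x with unobserved a_1 = 10 and a_2 = 0.
ids = [1, 1, 1, 2, 2, 2]
periods = [1, 2, 3, 1, 2, 3]
x = [1, 2, 4, 5, 6, 9]
y = [12, 14, 18, 10, 12, 18]

dx, dy = first_differences(ids, periods, x, y)
intercept, slope = ols(dx, dy)  # slope = 2: a_i has dropped out
print(intercept, slope)
```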

14
Q

What are the criteria for a valid instrumental variable?

A
  • The instrument z should be correlated with the independent variable x (relevance)
  • The instrument z should not be correlated with the error u (exogeneity)
15
Q

What are the criteria for a valid instrumental variable?

A
  • The instrument z should be correlated with the independent variable x
  • The instrument z affects the dependent variable y only through x.
16
Q

What does a first stage mean? How do you determine whether an instrument, or a set of instruments, is strong? What is the danger of weak instruments?

A
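  • The first stage is the regression of the endogenous explanatory variable x on the instrument(s) and any exogenous regressors. The instrument must be correlated with x; if this correlation is strong, the instrument is said to have a strong first stage.
  • Instrument strength is assessed with the F-statistic testing the joint significance of the instruments in the first stage; a common rule of thumb is that instruments are weak if F < 10.
  • With weak instruments, the IV estimator can be strongly biased (towards OLS) and its standard errors and confidence intervals give misleading inferences about the parameter estimates.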
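A sketch of the first-stage F-statistic for the case of one endogenous regressor and one instrument (no other exogenous regressors), on simulated data. With a single instrument, F is the square of the first-stage t-statistic; the homoskedasticity-only standard error is an assumption, and robust standard errors are often used in practice.

```python
import math
import random


def first_stage_f(z, x):
    """F-statistic on the single excluded instrument in x = alpha + pi*z + v.

    With one instrument, F equals the squared t-statistic of pi
    (homoskedasticity-only standard error).
    """
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((a - mz) ** 2 for a in z)
    pi = sum((a - mz) * (b - mx) for a, b in zip(z, x)) / szz
    alpha = mx - pi * mz
    ssr = sum((b - alpha - pi * a) ** 2 for a, b in zip(z, x))
    se_pi = math.sqrt(ssr / (n - 2) / szz)
    return (pi / se_pi) ** 2


random.seed(5)
n = 1_000
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + random.gauss(0, 1) for zi in z]  # hypothetical strong first stage

F = first_stage_f(z, x)
print(F, "strong" if F > 10 else "weak (rule of thumb: F < 10)")
```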
17
Q

What does the exclusion restriction imply? Can this be tested?

A

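Exclusion restriction:
The instrument z affects the dependent variable y only through x. In other words, z has no direct effect on y.

The exclusion restriction cannot be tested directly; with exactly one instrument per endogenous regressor it must be argued on theoretical grounds. With more instruments than endogenous regressors, an overidentification test (Sargan-Hansen J) can check whether the instruments are consistent with each other, but only under the assumption that at least one instrument is valid.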

18
Q

How does the IV estimator solve the potential endogeneity problem of OLS? Why does this create a trade off between consistency and efficiency? Can this be tested?

A
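  • OLS is inconsistent when x is correlated with the error u. IV estimation uses only the part of the variation in x that is driven by the instrument z, which is uncorrelated with u, and thereby cuts the correlation between the regressor and the error term.
  • If the instrument itself is correlated with the error term, the IV estimator is not consistent either.
  • Because IV uses only part of the variation in x, its standard errors are relatively large compared to OLS, which causes a loss of efficiency. Hence the trade-off: IV is consistent but less efficient, while OLS is more efficient but inconsistent if x is endogenous.
  • The Hausman (Durbin-Wu-Hausman) test checks whether x is endogenous, to decide between OLS and IV: if the OLS and IV estimates do not differ significantly, OLS is preferred because it is more efficient.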
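A simulation sketch of the consistency argument, with arbitrary parameters: x is correlated with the error u, while z shifts x but is independent of u. With one instrument and one regressor the IV (2SLS) estimator reduces to cov(z, y)/cov(z, x).

```python
import random


def cov(a, b):
    """Sample covariance of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)


random.seed(42)
n = 50_000
z = [random.gauss(0, 1) for _ in range(n)]                  # relevant and exogenous
u = [random.gauss(0, 1) for _ in range(n)]                  # structural error
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]  # x is correlated with u
y = [1 + 2 * xi + ui for xi, ui in zip(x, u)]               # true effect of x is 2

ols = cov(x, y) / cov(x, x)  # inconsistent: plim = 2 + cov(x, u)/var(x) = 2 + 1/3
iv = cov(z, y) / cov(z, x)   # consistent: plim = 2
print(ols, iv)
```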
19
Q

How does Regression Discontinuity (RD) work? What is the difference with an IV approach?

A

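In RD, treatment is assigned according to whether a running (forcing) variable lies above or below a known threshold (cut-off). You take a subsample of observations that lie close to the threshold (just below and just above) and compare their outcomes. The further you move away from the threshold (the larger the bandwidth gets), the more dissimilar the control and treatment groups become, which introduces bias. For a small bandwidth, conclusions about causality can be drawn, as the groups are more similar and fewer confounding factors influence the comparison. Disadvantage of a small bandwidth: a small sample size and therefore larger standard errors.

  • -> For IV, the instrument is assumed to be as good as randomly assigned across the whole sample.
  • -> For RD, we know the cut-off and assignment is not random, so we restrict the bandwidth to observations near the cut-off, where assignment is as good as random. (A fuzzy RD can be estimated as an IV, with the threshold indicator as the instrument.)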
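A sketch of a sharp RD estimate on simulated data: fit a separate regression line on each side of the cut-off within bandwidth h and take the jump between the two lines at the cut-off. The linear data-generating process, the uniform kernel and the two bandwidths are assumptions chosen for illustration. Because the true relation is linear here, widening the bandwidth only adds observations; with a curved relation it would also add bias.

```python
import random


def ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def rd_estimate(r, y, cutoff, h):
    """Sharp RD: jump between two local linear fits at the cutoff (uniform kernel).

    The running variable is centered at the cutoff, so each intercept is the
    fitted value of y exactly at the threshold.
    """
    left = [(ri - cutoff, yi) for ri, yi in zip(r, y) if cutoff - h <= ri < cutoff]
    right = [(ri - cutoff, yi) for ri, yi in zip(r, y) if cutoff <= ri <= cutoff + h]
    a_left, _ = ols(*zip(*left))
    a_right, _ = ols(*zip(*right))
    return a_right - a_left, len(left) + len(right)


random.seed(11)
r = [random.uniform(-1, 1) for _ in range(20_000)]                    # running variable
y = [1 + 0.5 * ri + 3 * (ri >= 0) + random.gauss(0, 1) for ri in r]  # true jump = 3

for h in (0.1, 0.5):
    effect, n_used = rd_estimate(r, y, cutoff=0.0, h=h)
    print(f"h = {h}: effect = {effect:.2f} using {n_used} observations")
```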
20
Q

What is the difference between a sharp and a fuzzy RD?

A

Sharp RD: treatment status is perfectly predicted by whether an observation is above or below the threshold (everyone on one side is treated, no one on the other side is)
Fuzzy RD: crossing the threshold strongly changes the probability of treatment, but it's not a perfectly predictive relationship

21
Q

What kind of a trade-off does one need to make in the choice of bandwidth?

A
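  • A larger bandwidth increases the sample size and reduces standard errors, but increases bias, because observations far from the threshold are less comparable
  • A smaller bandwidth reduces bias, but decreases the sample size and increases standard errors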

22
Q

What is meant by Difference-in-Differences? How can DiD be obtained by regression techniques?

A
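  • DiD is used when a treatment or program has to be evaluated
  • there have to be treatment and control groups
  • groups are observed before and after the treatment
  • DiD estimate = (change in the mean outcome of the treatment group) − (change in the mean outcome of the control group)
  • By regression: y = β0 + β1·Treat + β2·Post + β3·(Treat × Post) + u, where β3 is the DiD estimate; with panel data, entity and time fixed effects can replace Treat and Post
    Key Assumption:
  • trend in control group approximates what would have happened in the treatment group in the absence of the treatment (common trend)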
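The DiD estimate as a difference of group means, on hypothetical data. In a 2x2 design the same number is the OLS coefficient b3 in y = b0 + b1·Treat + b2·Post + b3·(Treat × Post) + u, because that regression is saturated and fits the four cell means exactly.

```python
def mean(values):
    return sum(values) / len(values)


def did(y, treat, post):
    """Difference-in-differences from individual observations (2x2 design)."""
    def cell(g, t):
        return mean([yi for yi, gi, ti in zip(y, treat, post) if gi == g and ti == t])
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))


# Hypothetical outcomes: two units per group and period.
y = [9, 11, 15, 17, 7, 9, 10, 12]
treat = [1, 1, 1, 1, 0, 0, 0, 0]
post = [0, 0, 1, 1, 0, 0, 1, 1]

effect = did(y, treat, post)  # (16 - 10) - (11 - 8) = 3
print(effect)
```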
23
Q

Why is a common trend important in this technique? Can this be tested?

A
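  • Without a common (parallel) trend, the control group is not a valid counterfactual for the treatment group, so the groups are not comparable and the DiD estimate is biased
  • The assumption can be checked visually by graphing the outcomes of both groups over time: before the treatment, the trends should be parallel
  • A formal test, which is also suitable for multivalued treatments or several groups, is to interact the treatment variable with time dummies: the interactions for the pre-treatment periods should be jointly insignificant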
24
Q

What is an autoregressive (AR) model? When can it be used?

A

An AR model predicts future behavior based on past behavior. It's used for forecasting. The process is basically a linear regression of the current value of the series on one or more of its own past values; an AR(p) model is Yt = β0 + β1·Yt−1 + … + βp·Yt−p + ut.
Where simple linear regression and AR models differ is that in an AR model Y depends on its own previous values rather than on a separate regressor X.
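A sketch of estimating an AR(1) by OLS, i.e. regressing Yt on Yt−1, on a simulated series with a true coefficient of 0.7, followed by a one-step-ahead forecast. The coefficient and sample size are arbitrary.

```python
import random

random.seed(2)
phi_true, T = 0.7, 5_000
y = [0.0]
for _ in range(T):
    y.append(phi_true * y[-1] + random.gauss(0, 1))

# Regress y_t on y_{t-1} by OLS (with intercept).
lag, cur = y[:-1], y[1:]
n = len(lag)
ml, mc = sum(lag) / n, sum(cur) / n
phi_hat = (sum((a - ml) * (b - mc) for a, b in zip(lag, cur))
           / sum((a - ml) ** 2 for a in lag))
c_hat = mc - phi_hat * ml

forecast = c_hat + phi_hat * y[-1]  # one-step-ahead forecast of y_{T+1}
print(phi_hat, forecast)
```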

25
Q

What is an ARDL(p,q) model? How can such an ARDL be estimated?

A

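An ARDL(p,q) model is a model in which p lagged values of the dependent variable appear as explanatory variables (the "autoregressive" part) and q lags of the other explanatory variables are included (the "distributed lag" part).
The ARDL model is useful for forecasting and for disentangling long-run relationships from short-run dynamics. It can be estimated by OLS, provided the variables are stationary (or cointegrated) and the errors are serially uncorrelated; the lag lengths p and q can be chosen with information criteria.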

26
Q

How to compute (cumulative) dynamic effects in an ARDL model?

A
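  • It is sometimes of interest to know the size of the current and future reactions of Y to a change in X. This is called the dynamic causal effect on Y of a change in X.
  • The dynamic multipliers are computed recursively from the estimated ARDL coefficients. The cumulative dynamic multiplier after h periods is the sum of the dynamic multipliers up to period h, and the long-run cumulative multiplier equals the sum of the distributed lag coefficients divided by one minus the sum of the autoregressive coefficients.
  • two assumptions to measure dynamic causal effects:
  • -> Stationarity
  • -> Exogeneity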
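A sketch for an ARDL(1,1), Yt = c + φ·Yt−1 + β0·Xt + β1·Xt−1 + ut, assuming stationarity (|φ| < 1) and exogenous X. The dynamic multipliers trace the effect of a one-period unit increase in X on Y; their running sum gives the cumulative multipliers, which converge to the long-run effect (β0 + β1)/(1 − φ). The coefficient values are hypothetical.

```python
from itertools import accumulate


def ardl11_multipliers(phi, b0, b1, horizons):
    """Dynamic and cumulative multipliers of y after a one-period unit
    increase in x, for y_t = c + phi*y_{t-1} + b0*x_t + b1*x_{t-1} + u_t."""
    m = [b0]                     # impact effect (h = 0)
    if horizons > 1:
        m.append(phi * b0 + b1)  # h = 1
    while len(m) < horizons:
        m.append(phi * m[-1])    # h >= 2: the effect decays at rate phi
    return m, list(accumulate(m))


phi, b0, b1 = 0.5, 1.0, 0.5      # hypothetical estimates
dynamic, cumulative = ardl11_multipliers(phi, b0, b1, horizons=5)
long_run = (b0 + b1) / (1 - phi)  # limit of the cumulative multipliers
print(dynamic)     # [1.0, 1.0, 0.5, 0.25, 0.125]
print(cumulative)  # [1.0, 2.0, 2.5, 2.75, 2.875]
print(long_run)    # 3.0
```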
27
Q

What are information criteria? Why are they used?

A

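There are two common information criteria:
- Bayes information criterion (BIC)
- Akaike information criterion (AIC)
Both criteria are used to estimate the optimal lag length p: the chosen p is the one that minimizes the criterion.
The basic idea of both criteria is that the SSR decreases as additional lags are added to the model, so the fit always improves; each criterion therefore adds a penalty for the number of estimated coefficients, trading off fit against parsimony. The BIC penalty is larger in all but very small samples, so the BIC tends to select fewer lags than the AIC.
Note that increasing the lag order always increases R2, so R2 cannot be used to choose the lag length.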
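A sketch of lag selection, assuming the common textbook forms BIC(p) = ln(SSR(p)/T) + (p + 1)·ln(T)/T and AIC(p) = ln(SSR(p)/T) + (p + 1)·2/T for an AR(p) with an intercept, and hypothetical SSR values estimated on the same T observations. The lag with the smallest criterion is chosen.

```python
import math


def info_criteria(ssr: float, T: int, p: int) -> tuple[float, float]:
    """BIC and AIC for an AR(p) with intercept (p + 1 coefficients)."""
    k = p + 1
    bic = math.log(ssr / T) + k * math.log(T) / T
    aic = math.log(ssr / T) + k * 2 / T
    return bic, aic


T = 100
ssr_by_p = {0: 150.0, 1: 120.0, 2: 112.0, 3: 108.0, 4: 107.5}  # hypothetical SSRs
scores = {p: info_criteria(ssr, T, p) for p, ssr in ssr_by_p.items()}
best_bic = min(scores, key=lambda p: scores[p][0])
best_aic = min(scores, key=lambda p: scores[p][1])
print(best_bic, best_aic)  # the heavier BIC penalty picks the shorter lag
```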

28
Q

What is residual autocorrelation? How can we test for this in time series?

A

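Residual autocorrelation (serial correlation) means that the regression residuals are correlated with their own past values, e.g. ut with ut−1, across different time periods.
Test for autocorrelation with
–> Durbin-Watson test
- only tests for first-order autocorrelation; not valid when the regression includes a lagged dependent variable.
–> Breusch-Godfrey test
- can test for autocorrelation up to a chosen order and remains valid with a lagged dependent variable; H0: no serial correlation in the residuals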
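The Durbin-Watson statistic itself is easy to compute from the residuals; this sketch uses two toy residual series. DW is roughly 2(1 − ρ̂), where ρ̂ is the first-order autocorrelation of the residuals.

```python
def durbin_watson(resid):
    """DW = sum of squared changes in residuals / sum of squared residuals.

    Near 2: no first-order autocorrelation; well below 2: positive;
    well above 2: negative autocorrelation.
    """
    num = sum((b - a) ** 2 for a, b in zip(resid[:-1], resid[1:]))
    return num / sum(e * e for e in resid)


print(durbin_watson([1, 1, 1, -1, -1, -1]))  # positive autocorrelation: well below 2
print(durbin_watson([1, -1, 1, -1]))         # negative autocorrelation: 3.0
```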

29
Q

How can we test for residual autocorrelation in long panels?

A

To test for autocorrelation in long panels, one can use the Wooldridge test (H0: no first-order autocorrelation in the idiosyncratic errors).

30
Q

What is cross-section dependence? How can we test for this?

A

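The problem of cross-sectional dependence arises if the n individuals in our sample are no longer independently drawn observations but affect each other's outcomes (e.g. through common shocks or spillovers). It can be tested with Pesaran's CD test (H0: no cross-sectional dependence, i.e. the errors of different individuals are uncorrelated), or with the Breusch-Pagan LM test when T is large relative to N.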
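A sketch of Pesaran's CD statistic for a balanced panel, computed from the N residual series: CD = sqrt(2T/(N(N − 1))) × the sum of all pairwise residual correlations, approximately standard normal under H0. The simulated residuals share a common shock, so H0 should be rejected.

```python
import math
import random


def corr(a, b):
    """Sample correlation between two equally long series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def pesaran_cd(resid):
    """Pesaran CD statistic; resid is a list of N residual series of length T."""
    N, T = len(resid), len(resid[0])
    total = sum(corr(resid[i], resid[j]) for i in range(N) for j in range(i + 1, N))
    return math.sqrt(2 * T / (N * (N - 1))) * total


random.seed(9)
N, T = 10, 50
common = [random.gauss(0, 1) for _ in range(T)]  # shock hitting every unit
resid = [[f + random.gauss(0, 1) for f in common] for _ in range(N)]

cd = pesaran_cd(resid)
print(cd, "reject H0 at 5%" if abs(cd) > 1.96 else "do not reject H0")
```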

31
Q

What is the Nickell bias in dynamic panels?

A

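The Nickell bias refers to the fact that when estimating dynamic panels (with the lagged dependent variable as a regressor) with a small number of time periods T, the coefficient on the lagged dependent variable is overestimated by POLS (because the individual effect is in the error term and is correlated with the lagged dependent variable) and underestimated by FE/FD (because the transformed lagged dependent variable is correlated with the transformed error). The FE bias shrinks as T grows.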
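A simulation sketch of both biases, with arbitrary parameters (ρ = 0.5, N = 1,000 units, T = 5 periods, unit-variance individual effects and errors, and a burn-in so the series start near their stationary distribution): POLS lands well above 0.5 and FE well below it.

```python
import random


def slope(x, y):
    """Simple OLS slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


random.seed(0)
rho, n_units, t_obs, burn_in = 0.5, 1_000, 5, 50
pols_x, pols_y, fe_x, fe_y = [], [], [], []
for _ in range(n_units):
    a_i = random.gauss(0, 1)  # individual effect
    y, path = 0.0, []
    for _ in range(burn_in + t_obs + 1):
        y = a_i + rho * y + random.gauss(0, 1)
        path.append(y)
    path = path[burn_in:]     # keep t_obs + 1 periods
    lag, cur = path[:-1], path[1:]
    pols_x += lag
    pols_y += cur
    # Within transformation: subtract the unit's own time means.
    m_lag, m_cur = sum(lag) / t_obs, sum(cur) / t_obs
    fe_x += [v - m_lag for v in lag]
    fe_y += [v - m_cur for v in cur]

rho_pols = slope(pols_x, pols_y)  # biased upwards
rho_fe = slope(fe_x, fe_y)        # biased downwards (Nickell bias)
print(rho_pols, rho_fe)
```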

33
Q

What are stationary and non-stationary data?

A
  • A time series Yt is stationary if its probability distribution does not change over time (is time independent); in particular, its mean and variance are constant. If this is not the case, e.g. because the series has a trend or a unit root, the process is non-stationary.
  • Regressions involving non-stationary variables can be "spurious" and may have no meaning.
34
Q

What is unit root? How can we test for unit roots?

A
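  • A unit root is a stochastic trend in a time series; the simplest example is a random walk (possibly with drift), Yt = β0 + Yt−1 + ut. If a time series has a unit root, shocks have permanent effects and the series wanders without reverting to a fixed mean, so its path is persistent but unpredictable.
  • To test for unit roots, use the Dickey-Fuller (DF) or augmented Dickey-Fuller (ADF) test or the Phillips-Perron test (all with H0: unit root). The KPSS test reverses the hypotheses (H0: stationarity).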
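A sketch of the basic Dickey-Fuller regression ΔYt = a + δ·Yt−1 + ut with the test statistic t(δ̂). It omits the lagged ΔY terms that the ADF adds, and compares the statistic with the large-sample 5% DF critical value of about −2.86 (intercept, no trend) rather than the normal value, because the statistic does not follow a normal distribution under H0.

```python
import math
import random


def df_stat(y):
    """Dickey-Fuller t-statistic from Δy_t = a + δ·y_{t-1} + e_t (no augmentation lags)."""
    lag = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    n = len(lag)
    ml, md = sum(lag) / n, sum(dy) / n
    sxx = sum((v - ml) ** 2 for v in lag)
    delta = sum((v - ml) * (d - md) for v, d in zip(lag, dy)) / sxx
    a = md - delta * ml
    ssr = sum((d - a - delta * v) ** 2 for v, d in zip(lag, dy))
    return delta / math.sqrt(ssr / (n - 2) / sxx)


random.seed(4)
stationary, walk = [0.0], [0.0]
for _ in range(500):
    stationary.append(0.5 * stationary[-1] + random.gauss(0, 1))
    walk.append(walk[-1] + random.gauss(0, 1))

CRIT_5PCT = -2.86  # DF critical value, intercept only, 5% level (large sample)
print(df_stat(stationary) < CRIT_5PCT)  # reject the unit root for the AR(1) series
print(df_stat(walk))                    # usually above -2.86: unit root not rejected
```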
35
Q

What is a spurious regression?

A

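A spurious regression is a regression of one non-stationary (e.g. random walk) variable on another that suggests a strong, statistically significant relationship (high R2, large t-statistics) even though the variables are unrelated. Such regressions may have no meaning.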
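A simulation sketch of the problem: regress one random walk on another, independently generated one, and count how often the slope looks significant at the 5% level. The sample size and number of replications are arbitrary.

```python
import math
import random


def slope_t(x, y):
    """t-statistic of the OLS slope in y = a + b*x + u (homoskedastic SE)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    a0 = my - b * mx
    ssr = sum((c - a0 - b * a) ** 2 for a, c in zip(x, y))
    return b / math.sqrt(ssr / (n - 2) / sxx)


def random_walk(T):
    """Y_t = Y_{t-1} + u_t starting at 0: a unit root process."""
    level, path = 0.0, []
    for _ in range(T):
        level += random.gauss(0, 1)
        path.append(level)
    return path


random.seed(7)
reps, T = 200, 100
significant = sum(abs(slope_t(random_walk(T), random_walk(T))) > 1.96 for _ in range(reps))
share = significant / reps  # far above the nominal 5%
print(f"{share:.0%} of regressions of unrelated random walks look 'significant'")
```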

36
Q

What is error-correction (ECM)? How are ARDL and ECM related?

A
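  • ECM improves on regressions that use only differenced variables to ensure stationarity, because it keeps the information about the long-run relationship in levels. The variables included in an ECM must be cointegrated. The model allows for short-run and long-run dynamics.
  • We derive the ECM from the ARDL: it is a reparameterization of the same model. To derive the corresponding error-correction model (ECM), rewrite the ARDL in first differences, with one lag less of each differenced variable, plus the lagged deviation from the long-run equilibrium (the error-correction term).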
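For an ARDL(1,1), the reparameterization can be written out step by step: subtract Yt−1 from both sides and split β0·Xt into β0·ΔXt + β0·Xt−1. The result is the ECM, with long-run coefficient θ and adjustment speed −(1 − φ), which is negative when |φ| < 1.

```latex
\begin{aligned}
y_t &= c + \phi\, y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + u_t \\
\Delta y_t &= c + (\phi - 1)\, y_{t-1} + \beta_0 \Delta x_t + (\beta_0 + \beta_1)\, x_{t-1} + u_t \\
\Delta y_t &= c + \beta_0 \Delta x_t - (1 - \phi)\bigl[\, y_{t-1} - \theta\, x_{t-1} \bigr] + u_t,
\qquad \theta = \frac{\beta_0 + \beta_1}{1 - \phi}
\end{aligned}
```

The ARDL(1,1) has one lag of y and x; the ECM contains no lagged differences, i.e. one lag less.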
38
Q

What is cointegration? How are cointegration and ECM related?

A

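Cointegration is the existence of a long-run relationship between two or more variables: the variables are individually non-stationary (e.g. I(1)), but some linear combination of them is stationary. By the Engle-Granger representation theorem, cointegrated variables have an error-correction representation; the ECM describes how the variables adjust back towards the long-run (cointegrating) relationship.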

39
Q

How can we test for cointegration using Engle-Granger (EG), Dynamic OLS (DOLS), Bounds testing and Johansen-Juselius?

A

EG
- two steps: estimate the static long-run regression of y on x in levels by OLS, then apply an ADF test to the residuals
- same as ADF but the critical values account for the estimation of ß
- H0 -> no cointegration
- HA -> cointegration
DOLS
- lags and leads of the differenced regressors Δ𝑥𝑡 are added to the static level regression, which gives an efficient estimate of the cointegrating coefficient ß
Bounds testing
- this approach tests for the significance of the lagged 𝑦𝑡−1, or the joint significance of 𝑦𝑡−1 and 𝑥𝑡−1's in the unconstrained ECM
Johansen
- tests for the number of cointegrating relationships (sequentially: none, at most one, …). Less restrictive than EG. Maximum likelihood approach in which all the unknown parameters are estimated simultaneously
- H0 -> no cointegration
- HA -> cointegrating relationship(s)
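A sketch of the two-step EG procedure on simulated cointegrated data (x is a random walk and y − 2x is stationary). The DF regression on the residuals omits augmentation lags, and the 5% critical value of about −3.41 for one regressor is a commonly tabulated value, stated here as an assumption; it is more negative than the ordinary DF value because ß is estimated in step 1.

```python
import math
import random


def ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def df_stat(e):
    """DF t-statistic from Δe_t = a + δ·e_{t-1} + v_t (no augmentation lags)."""
    lag = e[:-1]
    de = [b - a for a, b in zip(e[:-1], e[1:])]
    a, delta = ols(lag, de)
    ml = sum(lag) / len(lag)
    sxx = sum((v - ml) ** 2 for v in lag)
    ssr = sum((d - a - delta * v) ** 2 for v, d in zip(lag, de))
    return delta / math.sqrt(ssr / (len(lag) - 2) / sxx)


random.seed(8)
x = [0.0]
for _ in range(199):
    x.append(x[-1] + random.gauss(0, 1))            # x is a random walk, I(1)
y = [1 + 2 * xi + random.gauss(0, 1) for xi in x]  # y - 2x is stationary

# Step 1: static long-run regression in levels.
c, beta = ols(x, y)
resid = [yi - c - beta * xi for xi, yi in zip(x, y)]
# Step 2: unit root test on the residuals, using EG (not DF) critical values.
eg = df_stat(resid)
EG_CRIT_5PCT = -3.41  # assumed value: one regressor, intercept, 5% level
cointegrated = eg < EG_CRIT_5PCT
print(beta, eg, cointegrated)
```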

40
Q

What is a VAR? What is a VEC?

A
  • In VARs we model several series in terms of their past. A VAR is an n-equation, n-variable linear model in which each variable is in turn explained by its own lagged values, plus current and past values of the remaining n-1 variables. VARs are powerful in data description and forecasting, but do not solve the problem of correlation/causation.
  • The VEC is a more generalized (multivariate) form of the ECM. The VEC has cointegration relations built into the specification so that it restricts the long-run behavior of the endogenous variables to converge to their cointegrating relationships while allowing for short-run adjustment dynamics. The cointegration term is known as the error correction term, since the deviation from long-run equilibrium is corrected gradually through a series of partial short-run adjustments.
41
Q

What is meant by Granger-causality? How can you test for Granger-Causality?

A

Testing for Granger causality tests whether, after controlling for past values of y, past values of z (another series) help to forecast yt. It is a test of the ability of one series to predict another. Granger causality does not imply actual causality.
The test is an F-test of the joint hypothesis that the coefficients on all lags of z are zero in a regression of yt on lags of y and lags of z. If this null is not rejected (the lags of z are jointly insignificant), z does not Granger-cause y.
What you want to know is whether changes in one variable systematically come before changes in another in the time series.

42
Q

How can we test the validity of panel estimates? What standard errors should we use?

A

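The validity of the panel approach first depends on the poolability tests: the F-test on the entity dummies in the LSDV (least squares dummy variable) regression and the Breusch-Pagan LM test.
Clustered standard errors (clustered by entity) should be used, because they are robust to heteroskedasticity and to serial correlation of the errors within each entity.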