Applied econometrics Flashcards
Exam
How can we determine causality?
Optimally, through an experiment designed by researchers that randomly assigns subjects to treatment and control groups. It might also be through a quasi-experiment with a source of variation that is "as if" randomly assigned. Causality is thus established if only the specific variable is changed while all other variables are held constant.
How do we interpret variance and correlation?
Variance: the average squared deviation of an observation from the mean, measured in squared units. Correlation: the strength of the linear relationship between two variables; always between -1 and 1.
What is the relationship between correlation and covariance?
ρ = corr(x,y) = cov(x,y) / (std.dev(x) · std.dev(y))
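A minimal Python sketch (numpy assumed, simulated data) verifying this identity:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2 * x + rng.normal(size=500)

# Sample covariance and sample standard deviations (ddof=1).
cov_xy = np.cov(x, y)[0, 1]
rho = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

print(rho, np.corrcoef(x, y)[0, 1])  # the two numbers agree
```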
What is cross-sectional data? Examples?
Usually a random sample where each observation is a different individual, with information recorded at a single point in time. Examples: grade distributions, everyone's mood at a specific point in time.
What is panel data? Examples?
Aka longitudinal data: the same randomly sampled individuals or units are followed over time. Example: a number of firms' performance over time.
What is time-series data? Examples?
A separate observation for each time period of a specific variable. Examples: stock prices, inflation, commodity prices. Here, trends and seasonality should be taken into account.
What is the least square principle?
Choosing the estimates such that the residual sum of squares is as small as possible
How do we estimate b1?
By minimizing the SSR through the first-order conditions, ending up with b1 = cov(x,y)/var(x).
How do we estimate b0?
By minimizing the SSR through the first-order conditions, ending up with b0 = ȳ − b1·x̄ (the sample means of Y and X). This can be thought of as choosing b0 so that the OLS assumption of a zero-mean error term holds in the sample.
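A minimal sketch of both formulas on simulated data, cross-checked against numpy's polyfit:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 3.0 + 1.5 * x + rng.normal(size=200)

# b1 = cov(x,y)/var(x), b0 = mean(y) - b1*mean(x)
b1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

print(b0, b1)               # close to the true (3.0, 1.5)
print(np.polyfit(x, y, 1))  # cross-check: returns [b1, b0]
```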
What is homoskedasticity? and what is the opposite?
Var(u|x) = σ², i.e., the variance of the error term is constant and does not vary with x. Heteroskedasticity is the opposite: there, the variance varies with x.
What to do about heteroskedasticity?
Use heteroskedasticity-robust standard errors, such as White standard errors. In practice, you should always use heteroskedasticity-robust standard errors, so that readers cannot question your inference on these grounds.
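A hedged sketch using statsmodels (assumed available); HC1 is one common White-type robust covariance option:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 300)
y = 1.0 + 0.5 * x + rng.normal(size=300) * x  # error variance grows with x

X = sm.add_constant(x)
ols = sm.OLS(y, X)
print(ols.fit().bse)                # classical standard errors
print(ols.fit(cov_type="HC1").bse)  # heteroskedasticity-robust standard errors
```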
What is R^2? and Adj. R^2?
R^2 = ESS/TSS = 1 - SSR/TSS. It is a measure of goodness of fit: how much of the variation in Y is explained by the regression. Adj. R^2 = 1 - (SSR/(n-k-1))/(TSS/(n-1)).
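A minimal sketch computing both measures by hand on simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 1
x = rng.normal(size=n)
y = 2 + x + rng.normal(size=n)

b1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

ssr = (resid ** 2).sum()
tss = ((y - y.mean()) ** 2).sum()
r2 = 1 - ssr / tss
adj_r2 = 1 - (ssr / (n - k - 1)) / (tss / (n - 1))
print(r2, adj_r2)  # adj. R^2 is slightly below R^2
```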
What is a type I and a type II error?
Type I: Reject H0 when it is true. Type II: not rejecting H0 when it is false
How many degrees of freedom are needed before the two-sided 5% t critical value is approximately 1.96?
About 120; with fewer degrees of freedom, look the critical value up in the book on page 805.
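A quick check with scipy (assumed available) that the critical value approaches 1.96 as the degrees of freedom grow:

```python
from scipy import stats

# Two-sided 5% test: the 97.5th percentile of the t-distribution.
for df in (10, 30, 120, 1000):
    print(df, stats.t.ppf(0.975, df))  # 2.23, 2.04, 1.98, 1.96
```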
What is the formula of the t-statistic?
t = (Y’ - y1,0)/se(y)
Which distribution function should be used for creating CIs for the variance?
The chi-squared (χ^2) distribution
Can R^2 be negative?
Yes: the raw R^2 can be negative in a regression without a constant.
In multiple regression, when is beta1 equal to beta1 in a linear regression?
Two instances: when all the other betas are equal to zero, and when x1 is uncorrelated with the other regressors in the multiple regression.
What are the properties of R^2?
Between 0 and 1. It can never decrease when another variable is added, so it cannot be used to compare models with different numbers of regressors; use adj. R^2 instead, as long as the Y variable is the same.
Are OLS estimates unbiased?
No. When we say that OLS is unbiased under the assumptions, we mean that the procedure we used to get the estimates is unbiased.
Effects of rescaling variables: What happens when you change the Y variable?
It will lead to a corresponding change in the scale of the coefficients and standard errors, thus no change in interpretation or significance.
Effects of rescaling variables: What happens when you change the X variable?
It will lead to an inverse change in the scale of the coefficient and standard error (multiplying X by c divides b1 and its SE by c), thus no change in the significance or interpretation.
What is a standardized variable?
A variable that has had its mean subtracted and been divided by its standard deviation. Coefficients are then interpreted as the standard-deviation change in Y from a one-standard-deviation change in X. There is no constant in this regression.
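A minimal sketch on simulated data; in a simple regression the standardized slope equals the correlation:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=200)
y = 4 + 2 * x + rng.normal(size=200)

# Standardize both variables, then run OLS without a constant.
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

beta_std = (zx @ zy) / (zx @ zx)          # no-constant OLS slope
print(beta_std, np.corrcoef(x, y)[0, 1])  # equals the correlation here
```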
What is perfect multicollinearity?
A phenomenon in which one predictor variable in a multiple regression model is an exact linear function of the others, so the OLS estimates cannot be computed. (With high but imperfect multicollinearity, the typical symptom is few significant t-ratios combined with a high R^2.)
What are the consequences of high, but non-perfect multicollinearity?
OLS is still BLUE, but: large variances and covariances make precise estimation difficult; confidence intervals are wider; t-ratios tend to be statistically insignificant; R^2 tends to be very high; and the OLS estimators and standard errors can be sensitive to small changes in the data.
What is an auxiliary regression? How do we use it?
When two variables are highly correlated, we can regress one of them on all the other explanatory variables and use the residuals from that regression instead of the variable itself in the primary regression.
How can we detect multicollinearity?
Look for: high R^2 values but few significant t-values; high correlation between two explanatory variables; scatterplots; auxiliary regressions (see the sketch below).
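A hedged sketch of an auxiliary regression with statsmodels (assumed available); VIF = 1/(1 − R^2) is a standard companion diagnostic:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x2 = rng.normal(size=300)
x1 = 0.95 * x2 + 0.1 * rng.normal(size=300)  # nearly collinear with x2

# Auxiliary regression: regress x1 on the other regressor(s).
aux = sm.OLS(x1, sm.add_constant(x2)).fit()
vif = 1 / (1 - aux.rsquared)
print(aux.rsquared, vif)  # R^2 near 1, VIF large: multicollinearity
```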
Interpretation of b1 in log-models: log-log, log-linear, linear-log?
Log-log: b1 is the elasticity of Y with respect to X (a 1% change in X gives approx. a b1% change in Y). Log-linear: 100·b1 is approx. the percentage change in Y from a one-unit change in X. Linear-log: b1/100 is approx. the change in Y from a 1% change in X (equivalently, b1 is approx. the change in Y from a 100% change in X).
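A small numeric illustration of the log-log case on simulated data with a true elasticity of 0.7:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(1, 100, 500)
y = 5 * x ** 0.7 * np.exp(rng.normal(scale=0.1, size=500))  # elasticity 0.7

lx, ly = np.log(x), np.log(y)
b1 = np.cov(lx, ly)[0, 1] / np.var(lx, ddof=1)
print(b1)  # close to 0.7: a 1% rise in X raises Y by about 0.7%
```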
Why should we use a log model?
Log models are invariant to the scale of the variables, since they measure percentage changes, and they give a direct estimate of elasticity. For models with Y > 0, the conditional distribution of Y is often heteroskedastic or skewed, while that of ln(Y) is much less so. The distribution of ln(Y) is also narrower, limiting the effect of outliers.
What is heteroskedasticity and what are the implications of heteroskedasticity?
Error variance that varies with x. The coefficient estimates remain unbiased, but OLS is no longer BLUE, the usual standard errors are biased, and the normal t- and F-statistics cannot be used.
What is serial independence in autocorrelation?
When the covariance between the error terms is zero, i.e., they are independently distributed. If not, there is autocorrelation.
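A hedged sketch using the Durbin-Watson statistic from statsmodels (assumed available); it is near 2 under serial independence and below 2 with positive autocorrelation:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(8)
u = np.zeros(300)
for t in range(1, 300):            # AR(1) errors: u_t = 0.8*u_{t-1} + e_t
    u[t] = 0.8 * u[t - 1] + rng.normal()

print(durbin_watson(u))                       # well below 2
print(durbin_watson(rng.normal(size=300)))    # near 2
```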
What is the effect of adding a dummy variable? And an interaction term?
A dummy variable can be thought of as changing the intercept. An interaction term between a dummy and a continuous variable can be thought of as changing the slope; if the dummy also enters on its own, the intercept changes too.
What does the Chow Test test for?
It tests whether one regression line or two different regression lines best fit the data. The null hypothesis is that the coefficients are equal across the two groups; if the null is rejected, two lines fit better than one.
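A minimal sketch of the Chow F-statistic built from the pooled and split sums of squared residuals (simulated data; simple regressions with two coefficients each):

```python
import numpy as np

def ssr(x, y):
    """SSR from a simple OLS regression of y on x with a constant."""
    b1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    b0 = y.mean() - b1 * x.mean()
    return ((y - b0 - b1 * x) ** 2).sum()

rng = np.random.default_rng(9)
x1, x2 = rng.normal(size=100), rng.normal(size=100)
y1 = 1 + 2 * x1 + rng.normal(size=100)    # group 1
y2 = 3 + 0.5 * x2 + rng.normal(size=100)  # group 2: a different line

ssr_pooled = ssr(np.r_[x1, x2], np.r_[y1, y2])
ssr_split = ssr(x1, y1) + ssr(x2, y2)
k, n = 2, 200                             # coefficients per line, total obs
F = ((ssr_pooled - ssr_split) / k) / (ssr_split / (n - 2 * k))
print(F)  # large F: reject equal coefficients, two lines fit better
```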
What are the problems of having a dummy variable as dependent variable in the linear probability model?
The predicted probabilities can lie outside [0, 1]; therefore, use probit or logit instead. Also, the error term u has a discrete, non-normal distribution.
What are the properties of the probit and logit function?
Probit: Pr(Y=1|X) = Φ(β0 + β1·X), where Φ is the standard normal CDF. Logit: Pr(Y=1|X) = 1/(1 + e^−(β0 + β1·X)), the logistic CDF. Both are S-shaped and always return probabilities between 0 and 1.
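A minimal sketch of the two link functions with scipy (assumed available):

```python
import numpy as np
from scipy.stats import norm

def probit_prob(z):
    return norm.cdf(z)            # Phi: standard normal CDF

def logit_prob(z):
    return 1 / (1 + np.exp(-z))   # logistic CDF

# Any index z = beta0 + beta1*x maps into (0, 1) under both links.
for z in (-3, 0, 3):
    print(z, probit_prob(z), logit_prob(z))
```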
In panel data, is omitted variables a problem?
No: assuming the omitted variable does not change over time, fixed effects or differencing removes it, so the change in Y must be caused by the observed factors.