Week 4 Flashcards
Predictors
Observed economic variables on which we can base predictions. This is what the regressors are called in a prediction context.
Training sample
The data we use to estimate our predictive model. Contains both outcome and predictors.
Out-of-sample observation
OOS, the test data. A new draw from the population that is independent of the training sample.
Mean squared prediction error (MSPE):
The expected squared deviation between the out-of-sample outcome and the prediction from the model estimated on the training data. This measure takes into account the uncertainty due to estimation error as well as uncertainty about the out-of-sample observation. In other words, it measures predictive ability.
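In symbols (my notation, not from the card), with f̂ estimated on the training sample and (X^oos, Y^oos) an independent out-of-sample draw:

```latex
\mathrm{MSPE} = \mathbb{E}\!\left[\big(Y^{oos} - \hat{f}(X^{oos})\big)^{2}\right]
```

The expectation is taken over both the out-of-sample draw and the training sample, which is why estimation error is part of the MSPE.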
Difference between MSPE and MSE:
While the MSE measures the estimator’s fit, i.e. how well the fitted line matches the outcome observations we already have, the MSPE measures a predictor’s fit, i.e. how well we predict new observations with this fitted line.
How can MSPE be decomposed?
Three parts:
- Irreducible error
- Approximation error (bias)
- Estimation error (variance)
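Written out (a standard sketch; the notation is mine rather than from the card), for a prediction at a point x:

```latex
\mathrm{MSPE}(x) = \underbrace{\sigma^{2}}_{\text{irreducible error}}
 + \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^{2}}_{\text{approximation error (bias}^{2}\text{)}}
 + \underbrace{\operatorname{Var}\!\big(\hat{f}(x)\big)}_{\text{estimation error (variance)}}
```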
The bias/variance trade-off of prediction
The predictive model doesn’t affect the irreducible error. Complex models have low approximation error (bias) and high estimation error (variance); simple models have the reverse properties. The optimal model lies somewhere in between.
Training error
The MSE of the predictive model evaluated on the training sample. Measures the sum of the irreducible error and the bias but ignores the variance. (OLS minimizes the training error.)
Test error
The MSE of the predictive model evaluated on the test sample. The test error estimates the MSPE.
Overfitting
Choosing a model with a non-optimal (too high) MSPE because the model focuses too much on reducing the bias. OLS minimizes the training error and therefore tends to overfit.
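A minimal sketch of the training-error/test-error gap (a toy example of my own, not from the course material): as polynomial degree grows, the training MSE keeps falling while the test MSE, which estimates the MSPE, eventually rises.

```python
# Toy illustration (hypothetical data): training error vs. test error as complexity grows.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, size=(100, 1))
y_train = np.sin(x_train).ravel() + rng.normal(0, 0.5, size=100)   # training sample
x_test = rng.uniform(-3, 3, size=(100, 1))                          # independent out-of-sample draws
y_test = np.sin(x_test).ravel() + rng.normal(0, 0.5, size=100)

for degree in (1, 3, 10, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(x_train))  # training error
    test_mse = mean_squared_error(y_test, model.predict(x_test))     # test error, estimates the MSPE
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```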
The principle of shrinkage
Changing a predictive model so that it reacts less strongly to variation in the predictors can decrease the MSPE (it increases the bias but lowers the variance). This is useful because OLS may have overfitted the regression model so that it doesn’t work well in a new dataset –> by shrinking the coefficients towards zero, the MSPE will typically decrease.
Two methods that do this are ridge and lasso regression.
Ridge regression
A version of OLS with shrunken slope coefficients (smaller in absolute value than OLS). Ridge accounts for the cost of complexity through a penalty term that is parameterized by a regularization parameter lambda. If lambda = 0, ridge regression is identical to OLS; if lambda = infinity, the ridge prediction is identical to the sample mean (no variance).
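In formula form (a standard way to write the ridge objective; notation is mine), the penalty is lambda times the sum of squared slope coefficients:

```latex
\hat{\beta}^{\,ridge} = \arg\min_{\beta}\;
 \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^{2}
 + \lambda \sum_{j=1}^{p}\beta_j^{2}
```

With lambda = 0 this is the OLS objective; as lambda grows, the slope coefficients are pushed towards zero.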
Lasso regression
Similar to ridge but uses a different penalty term. Whereas ridge never shrinks coefficients exactly to zero, lasso typically shrinks some (or many) of them to zero. We say that some predictors are selected (still in the model) and some are not selected (shrunken to 0).
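The lasso objective differs only in the penalty, which uses absolute values instead of squares (again my notation); this is what makes exact zeros possible. The short sketch after the formula is a hypothetical example using scikit-learn’s Ridge and Lasso (not from the course) showing lasso setting many coefficients to exactly zero while ridge keeps all of them nonzero.

```latex
\hat{\beta}^{\,lasso} = \arg\min_{\beta}\;
 \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^{2}
 + \lambda \sum_{j=1}^{p}\lvert\beta_j\rvert
```

```python
# Hypothetical comparison: ridge shrinks coefficients, lasso also selects (zeros some out).
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
beta = np.zeros(20)
beta[:3] = [2.0, -1.5, 1.0]                # only the first 3 predictors truly matter
y = X @ beta + rng.normal(0, 1.0, size=200)

ridge = Ridge(alpha=5.0).fit(X, y)         # alpha plays the role of lambda
lasso = Lasso(alpha=0.2).fit(X, y)

print("ridge coefficients set to zero:", int(np.sum(ridge.coef_ == 0)))  # typically 0
print("lasso coefficients set to zero:", int(np.sum(lasso.coef_ == 0)))  # typically many
```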
Random sampling: two properties:
- Independence - Y_t and Y_{t+r} are independent when r is large, so where we end up is unpredictable; new information is added as we move forward in time.
- Identical population - stationarity: ex ante we should not be able to predict one draw better than another from rules or patterns. All draws come from the same population and share the same unconditional distribution, so at t = 0 we would make the same prediction for each observation.
Observations that are generated through random sampling are independently and identically distributed (iid). The order in which the rows are arranged doesn’t matter, but the number of rows (n) should be large.
Cross-sectional data
Rows in the data set correspond to units and the columns describe unit characteristics. Rows are randomly sampled.