Final Exam Flashcards
The classical assumptions must be
met in order for OLS estimators to be the best available
Classical Assumption #1
The regression model is linear, is correctly specified, and has an additive error term
Classical Assumption #2
The error term has a zero population mean
Classical Assumption #3
All explanatory variables are uncorrelated with the error term
Classical Assumption #4
Observations of the error term are uncorrelated with each other (no serial correlation)
Classical Assumption #5
The error term has a constant variance (no heteroskedasticity)
Classical Assumption #6
No explanatory variable is a perfect linear function of any other explanatory variable(s) (no perfect multicollinearity)
Classical Assumption #7
The error term is normally distributed
Omitted Variable Bias (Conditions)
The omitted variable is relevant (β2 ≠ 0) AND X1 and X2 are correlated
Expected bias
Expected bias in β̂1 has two components: the sign of β2 and the sign of Corr(X1, X2)
Limited Dependent Variables
We have discussed dummy variables (indicator variables, binary variables) as a tool for measuring qualitative/categorical independent variables (gender, race, etc.)
linear probability model
simply running OLS for a regression where the dependent variable is a dummy (i.e. binary) variable:
Di = β0 + β1X1i + β2X2i + ... + εi
where Di is a dummy variable, and the Xs, βs, and ε are typical independent variables, regression coefficients, and an error term, respectively
the term, linear probability model
comes from the fact that the right side of the equation is linear while the expected value of the left side measures the probability that Di = 1
Some issues with LPM
D̂i ≤ 0 or D̂i ≥ 1 is a more fundamental problem with the linear probability model: nothing in the model requires D̂i to be between 0 and 1!
- If D̂i is not between 0 and 1, how do we interpret it as a probability?
- A related limitation is that the marginal effect of a 1-unit increase in any X is forced to be constant, which cannot possibly be true for all values of X
- E.g., if increasing X by 1 always increases D̂i by a particular amount, D̂i must exceed 1 when X is sufficiently large
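A minimal numpy sketch (simulated, hypothetical data) showing that LPM fitted values escape the [0, 1] range:

```python
import numpy as np

# Hypothetical data: D is binary, X is continuous
rng = np.random.default_rng(0)
n = 200
x = rng.normal(0, 1, n)
d = (x + rng.normal(0, 1, n) > 0).astype(float)

# Fit the LPM by OLS: D_i = b0 + b1*X_i + e_i
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, d, rcond=None)
fitted = X @ beta

# Nothing constrains the fitted "probabilities" to [0, 1]
print("min fitted:", fitted.min())
print("max fitted:", fitted.max())
print("out of bounds:", ((fitted < 0) | (fitted > 1)).sum())
```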
The Binomial Logit Model
The binomial logit is an estimation technique for equations with dummy dependent variables that avoids the unboundedness problem of the linear probability model
Logits cannot be estimated using OLS
but are instead estimated by maximum likelihood (ML), an iterative estimation technique that is especially useful for equations that are nonlinear in the coefficients. Unlike the LPM, the logit model's predicted probability is bounded between 0 and 1
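Because the logit is nonlinear in the coefficients, it is fit iteratively. A numpy sketch (hypothetical simulated data) of ML estimation via Newton's iterations, checking that fitted probabilities stay strictly inside (0, 1):

```python
import numpy as np

# Hypothetical simulated data with known coefficients
rng = np.random.default_rng(1)
n = 500
x = rng.normal(0, 1, n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-0.5, 1.0])
p = 1 / (1 + np.exp(-X @ true_beta))
d = (rng.uniform(size=n) < p).astype(float)

# Maximize the logit log-likelihood with Newton's method
beta = np.zeros(2)
for _ in range(25):
    p_hat = 1 / (1 + np.exp(-X @ beta))  # fitted probabilities
    grad = X.T @ (d - p_hat)             # score (gradient of log-likelihood)
    W = p_hat * (1 - p_hat)
    hess = X.T @ (X * W[:, None])        # information matrix
    beta = beta + np.linalg.solve(hess, grad)

p_hat = 1 / (1 + np.exp(-X @ beta))
print("ML estimates:", beta.round(2))
print("bounded in (0, 1):", p_hat.min() > 0 and p_hat.max() < 1)
```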
Basic Procedure for Random Assignment Experiments
- Recruit sample of subjects
- Randomly assign some to treatment group and some to control group (random assignment makes treatment uncorrelated with individual characteristics)
- Measure average difference in outcomes between treatment and control groups
natural experiments (or quasi-experiments)
attempt to utilize the βtreatment-controlβ framework in the absence of actual random assignment to treatment and control groups
Difference-in-difference estimator:
Policy impact = (Tpost − Tpre) − (Cpost − Cpre)
- T: treatment group outcome, C: control group outcome
- The DD estimate is the amount by which the change for the treatment group exceeded the change for the control group
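With hypothetical group means, the DD arithmetic is:

```python
# Hypothetical outcome means for treatment (T) and control (C) groups
T_pre, T_post = 10.0, 16.0
C_pre, C_post = 9.0, 11.0

# DD estimate: treatment change minus control change
dd = (T_post - T_pre) - (C_post - C_pre)
print("DD estimate:", dd)  # 6 - 2 = 4
```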
Panel data:
repeated observations of multiple units over time (combination of cross-sectional and time-series)
Main advantages of panel data
- Increased sample size
- Ability to answer types of questions that cross-sectional and time-series data cannot accommodate
- Enables use of additional methods to eliminate omitted variables bias
Panel Data Notation
- i subscript indexes the cross-sectional unit (individual, county, state, etc.)
- t subscript indexes the time period in which the unit is observed
First-differenced estimator
ΔYi = α0 + β1ΔXi + Δεi
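A numpy sketch (hypothetical two-period panel) showing that first-differencing removes the unit effect and recovers the true slope even when the effect is correlated with X:

```python
import numpy as np

# Hypothetical panel: unit effect a is correlated with the regressor,
# so pooled OLS would be biased
rng = np.random.default_rng(2)
n = 300
a = rng.normal(0, 5, n)            # unobserved unit effect
x1 = a + rng.normal(0, 1, n)       # period 1 regressor
x2 = a + rng.normal(0, 1, n) + 1   # period 2 regressor
beta1 = 2.0
y1 = 3 + beta1 * x1 + a + rng.normal(0, 1, n)
y2 = 3 + beta1 * x2 + a + rng.normal(0, 1, n)

# First differences: Delta y = alpha0 + beta1 * Delta x + Delta eps
# (a cancels out of the difference)
dy, dx = y2 - y1, x2 - x1
Z = np.column_stack([np.ones(n), dx])
est, *_ = np.linalg.lstsq(Z, dy, rcond=None)
print("first-differenced beta1 estimate:", round(est[1], 2))  # near 2.0
```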
Advantages of random effects estimator (if the assumption about ai is correct):
- Allows time-invariant regressors to be included
- More degrees of freedom (only estimates the parameters of the distribution from which ai is assumed to be drawn; the fixed effects estimator uses one degree of freedom per fixed effect)
Disadvantages of random effects estimator:
- Biased if the assumption that ai is uncorrelated with the regressors is incorrect (while the FE estimator allows arbitrary correlation between ai and the regressors)
- The fixed effects estimator is widely preferred when the regressors of interest are time-varying
- It rarely seems likely that ai is uncorrelated with the regressors; the fixed effects model is generally far more convincing
Hausman test
Fixed and random effects estimators can be compared with a Hausman test (previously seen in the instrumental variables context as a test for endogeneity)
Fixed vs. random effects - Concept
- Under the random effects hypothesis, both RE and FE estimators are consistent (should give similar results); under the alternative hypothesis, FE is consistent but RE is not
- Therefore, if the estimates are significantly different, we can reject the null hypothesis of random effects
Fixed vs. random effects - General Advice
Use the fixed effects estimator if it's feasible
T-test
Divide the coefficient by the standard error to get the t-value
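With hypothetical numbers:

```python
# Hypothetical coefficient estimate and standard error
coef, se = 0.84, 0.21
t_value = coef / se
print("t =", round(t_value, 2))  # 4.0, comfortably above the ~2 rule of thumb
```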
Omitted Variables β Bias Assessment
Sign(β2) * Sign(Corr(X1, X2)) = Sign of bias in β̂1
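A quick simulation (hypothetical data) of the sign rule: with β2 > 0 and Corr(X1, X2) > 0, the short regression that omits X2 overstates β1 (positive times positive gives positive bias):

```python
import numpy as np

# Hypothetical data: true model y = 1 + 2*x1 + 3*x2 + eps
rng = np.random.default_rng(3)
n = 5000
x1 = rng.normal(0, 1, n)
x2 = 0.6 * x1 + rng.normal(0, 1, n)                # Corr(x1, x2) > 0
y = 1 + 2.0 * x1 + 3.0 * x2 + rng.normal(0, 1, n)  # beta2 = 3 > 0

# Short (misspecified) regression omits x2
X = np.column_stack([np.ones(n), x1])
b_short, *_ = np.linalg.lstsq(X, y, rcond=None)
print("beta1 estimate omitting x2:", round(b_short[1], 2))  # biased above 2.0
```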
Irrelevant Variables - Inclusion Criteria
Theory: is there sound justification for including the variable?
Bias: do the coefficients for other variables change noticeably when the variable is included?
T-Test: is the variable's estimated coefficient statistically significant?
R-square: has the R-square (adjusted R-square) improved?
Serial Correlation
First-order serial correlation occurs when the value of the error term in one period is a function of its value in the previous period; the current error term is correlated with the previous error term.
DW Test
compare DW(d) to the critical values (dL, dU)
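The DW statistic itself is d = Σ(et − et−1)² / Σet², computed from the OLS residuals. A numpy sketch with simulated AR(1) errors (hypothetical, ρ = 0.8), where d should fall well below 2:

```python
import numpy as np

# Hypothetical data with AR(1) errors: eps_t = 0.8 * eps_{t-1} + u_t
rng = np.random.default_rng(4)
n = 200
x = rng.normal(0, 1, n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.8 * eps[t - 1] + rng.normal(0, 1)
y = 1 + 2 * x + eps

# OLS residuals, then the Durbin-Watson statistic
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print("DW statistic:", round(d, 2))  # close to 2*(1 - rho) = 0.4
```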
Pure Heteroskedasticity
occurs in correctly specified equations
Impure Heteroskedasticity
arises due to model misspecification
Multicollinearity
Multicollinearity exists in every equation & the severity can change from sample to sample.
There are no generally accepted true statistical tests for multicollinearity.
VIF > 5 as a rule of thumb
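VIFj = 1 / (1 − Rj²), where Rj² comes from the auxiliary regression of Xj on the other regressors. A numpy sketch with deliberately collinear (hypothetical) data:

```python
import numpy as np

# Hypothetical nearly collinear regressors
rng = np.random.default_rng(5)
n = 400
x2 = rng.normal(0, 1, n)
x1 = 0.95 * x2 + rng.normal(0, 0.3, n)

# Auxiliary regression x1 ~ x2, then VIF = 1 / (1 - R^2)
Z = np.column_stack([np.ones(n), x2])
g, *_ = np.linalg.lstsq(Z, x1, rcond=None)
resid = x1 - Z @ g
r2 = 1 - resid.var() / x1.var()
vif = 1 / (1 - r2)
print("VIF for x1:", round(vif, 1))  # well above the rule-of-thumb 5
```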
Binary Dependent Variable Models
Linear Probability Model (LPM)
&
Logit / Probit Model
Linear Probability Model (LPM)
- Similar to OLS regression
- R-squared is no longer an accurate goodness-of-fit measure
- Interpretation: probability that Y=1 on a percentage point scale
Logit / Probit Model
- Restricted between 0 and 1
- Automatically corrects for heteroskedasticity
- Marginal effect of X is not constant
- Not linear in the coefficients
LPM Interpretation (example)
On average, a 1-unit increase in DISTANCE is associated with a 7.2 percentage point decrease in the probability of choosing Cedars Sinai, holding all else constant
LPM Limitation #1
Unboundedness
The linear probability model produces nonsensical forecasts (>1 and <0)
LPM Limitation #2
Adj-R^2 is no longer accurate measure of overall fit
LPM Limitation #3
Marginal Effect (slope) of a 1-unit increase in X is forced to be constant
LPM Limitation #4
Error term is neither homoskedastic nor normally distributed
Logit Model Interpretation -
Coefficient Interpretation:
The sign and significance can be interpreted just as in linear models. B1 is the effect of a 1-unit increase in X1 on the log-odds ratio
Calculating Marginal Effects
Stata: margins, dydx(X) atmeans
LPM Interpretations
B*100 percentage point change in the probability that Y=1
Logit Interpretations
B change in the log-odds ratio of Y=1
Marginal Effects Interpretations
(dy/dx)*100 percentage point change in the probability that Y=1
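For the logit, the marginal effect of X at the mean is β1·p·(1 − p). A small sketch with hypothetical coefficient values:

```python
import numpy as np

# Hypothetical logit estimates and mean of X
beta0, beta1 = -1.0, 0.5
x_mean = 2.0

# Predicted probability at the mean, then the marginal effect
p = 1 / (1 + np.exp(-(beta0 + beta1 * x_mean)))
marginal = beta1 * p * (1 - p)
print("p at mean:", round(p, 3))                       # 0.5
print("marginal effect:", round(marginal, 3))          # 0.125
print("percentage points:", round(marginal * 100, 1))  # 12.5
```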
Experimental Methods -
Selection Problems
Treatment is not necessarily randomly assigned because of other systematic differences in the error term (endogeneity), which would bias the estimated treatment effect. We need a valid counterfactual to truly understand the effect of the intervention/treatment.
Valid Counterfactual
a control group that is exactly the same as the treatment group except it does not receive the treatment
Solution to Selection Problems
Randomization
-Researcher randomly assigns subjects to either a treatment or control group to estimate the treatment effect
Natural / Quasi-Experiments
Randomized experiments are hard to do in the social sciences, so researchers often rely upon natural experiments where an exogenous event mimics the treatment and control group framework in the absence of actual random assignment
Counterfactual Challenge
Hard to find an untreated group that really is otherwise identical to the treated group
Panel Data - definitions expanded
Formed when cross-sectional and time-series data sets are combined to create a single data set. Main reason for working with panel data (beyond increasing sample size) is to provide insight into analytical questions that can't be answered by using time-series or cross-sectional data alone
Panel Data Advantages
Increased sample size so more degrees of freedom & sample variability
Able to answer new research questions
Can eliminate omitted variable bias with fixed effects (controlling for unobserved heterogeneity)
Panel Data Concerns
Heteroskedasticity & Serial Correlation
Panel Data - Fixed Effects Model
Does a good job of estimating panel data equations, and it also helps avoid omitted variable bias due to unobserved heterogeneity.
Fixed Effects Model Assumptions
Each cross-sectional unit has its own intercept. A fixed effects analysis will allow arbitrary correlation between all time-varying explanatory variables and ai
Fixed Effects Model Drawback
measurement error, autocorrelation, heteroskedasticity
Fixed Effects Model
The omitted variable bias arising from unobserved heterogeneity can be mitigated with panel data and the fixed effects model.
How Fixed Effect Model address Omitted Variable Bias
How? Estimates panel data by including enough dummy variables to allow each cross-sectional unit i (and, optionally, each time period t) to have a different intercept. These dummy variables absorb the time-invariant, individual-specific omitted factors that would otherwise sit in the error term
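A numpy sketch of the dummy-variable (LSDV) idea on a hypothetical simulated panel: unit dummies absorb ai, so the slope is recovered even though ai is correlated with X:

```python
import numpy as np

# Hypothetical panel: 50 units observed for 4 periods each
rng = np.random.default_rng(6)
n_units, n_periods = 50, 4
a = np.repeat(rng.normal(0, 5, n_units), n_periods)  # unit effects
x = a + rng.normal(0, 1, n_units * n_periods)        # correlated with a
y = 2.0 * x + a + rng.normal(0, 1, n_units * n_periods)

# One intercept dummy per unit absorbs the time-invariant effect a_i
dummies = np.kron(np.eye(n_units), np.ones((n_periods, 1)))
X = np.column_stack([x, dummies])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("FE (LSDV) estimate of beta:", round(b[0], 2))  # near the true 2.0
```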
Panel Data - Random Effects Model
When to use
When the explanatory variable of interest is time-invariant
Panel Data - Random Effects Model
Assumption
ai and the regressors (Xit) are uncorrelated
Panel Data - Random Effects Model
Advantages
- Can handle time-invariant variables
- Uses fewer degrees of freedom than FE because of the lack of subject dummies
Hausman Test
Compares fixed and random effect estimators to see if their difference is statistically significant.
If different → fixed effects model preferred (reject the null hypothesis of random effects)
If not different → random effects model to conserve degrees of freedom
(or provide estimates of both the fixed effects and random effects models)
If the two models produce very different estimates, it suggests the RE model suffers from omitted variable bias and endogeneity, making FE the more credible choice
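A sketch of the Hausman statistic with made-up (hypothetical) estimates and covariance matrices; the statistic is compared to a chi-square critical value with degrees of freedom equal to the number of coefficients compared:

```python
import numpy as np

# Hypothetical FE and RE estimates and their covariance matrices
b_fe = np.array([1.8, -0.5])
b_re = np.array([1.2, -0.3])
V_fe = np.diag([0.04, 0.02])
V_re = np.diag([0.02, 0.01])

# H = (b_FE - b_RE)' [Var(b_FE) - Var(b_RE)]^{-1} (b_FE - b_RE)
diff = b_fe - b_re
H = diff @ np.linalg.inv(V_fe - V_re) @ diff
print("Hausman statistic:", round(H, 1))  # 22.0 > chi2(2) 5% critical value 5.99
```

Here H exceeds the critical value, so the null of random effects would be rejected in favor of fixed effects.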
What is the purpose of determining Cook's D? What is it used to detect?
Cook's D is used to detect influential outliers.
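A numpy sketch (hypothetical data with one planted outlier) computing Cook's D from residuals and leverages, Di = (ei² / (p·s²)) · (hi / (1 − hi)²):

```python
import numpy as np

# Hypothetical data with an influential outlier planted at position 0
rng = np.random.default_rng(7)
n = 50
x = rng.normal(0, 1, n)
y = 1 + 2 * x + rng.normal(0, 1, n)
x[0], y[0] = 5.0, -20.0

# OLS fit, leverages from the hat matrix, then Cook's D
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                        # leverages
beta = np.linalg.inv(X.T @ X) @ X.T @ y
e = y - X @ beta                      # residuals
p = X.shape[1]
s2 = e @ e / (n - p)
cooks_d = (e ** 2 / (p * s2)) * (h / (1 - h) ** 2)
print("most influential observation:", int(np.argmax(cooks_d)))  # 0
```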
Linear Probability Model Example
ĤSi = 0.42 + 0.028educi + 0.002educi² + 0.06wagei
Problems with Linear Probability Model
1) ĤSi could be ≤ 0 or ≥ 1. The linear probability model is difficult to interpret as a probability because ĤSi is not bounded by 0 and 1. The linear probability model produces nonsensical forecasts (greater than 1 and less than 0).
2) The marginal effect of a 1-unit increase in any X is forced to be constant, which cannot possibly be true for all values of X.
3) R̄² is no longer an accurate goodness-of-fit measure. The predicted values of Y are forced to change linearly with X, so you could obtain a low R̄² for an accurate model.
What is the difference between a random and natural experiment?
Random experiments involve researchers randomly assigning subjects to either a treatment or control group to estimate the treatment effect. Natural experiments, or quasi-experiments, attempt to utilize the "treatment-control" framework in the absence of actual random assignment to treatment and control groups. Instead of the researcher randomly assigning treatment, they rely on some exogenous event to create treatment and control groups. When the event or policy is truly exogenous, treatment is as good as randomly assigned.
Problems in random experiments.
1) Random experiments are often very costly or cannot be carried out due to being unethical.
2) Non-random samples. They often lack generalizability since the sample may not be randomly
drawn from the entire population of interest.
3) Attrition bias because treatment or control units non-randomly drop out of the experiment.
4) Hawthorne effects (people behave differently when observed, may respond to treatment/control
status).
5) Randomization failure (can only control for observed treatment-control differences; bias may
result if unobservable characteristics not perfectly balanced).
Explain briefly the difference-in-differences estimator
This method estimates the impact of a treatment by comparing the outcomes of a treatment
group and a control group before and after the treatment is received.
Main underlying assumption of difference-in-differences estimator
The main underlying
assumption: in the absence of the treatment, the difference between the outcomes of the two
groups would not have changed (i.e., they would have followed a common trend). The change
in outcomes of the control group is viewed as the counterfactual for the change in outcomes
of the treatment group.
What is panel data?
Panel data are repeated observations of multiple units over time. It is a combination of cross-sectional
and time-series.
Advantages of Panel Data
1) More degrees of freedom and more sample variability than cross-sectional data alone or time-series data alone, which allow for more accurate inference of the model parameters and hence increase the efficiency of estimates.
2) Eliminate omitted variables bias. It is often argued that the real reason someone finds an effect is that they ignored specific variables that are correlated with the explanatory variables when specifying the model. Panel data allow us to control for missing or unobserved variables.
3) Ability to answer types of questions that cross-sectional and time-series data cannot accommodate, for example transitions from employment to unemployment or from employment to retirement, changes in health status, or any other variables that change through time.
Differences between the fixed effects and random effects panel data models
The fixed effects model allows ai to be correlated with the regressors, while the random effects estimator assumes ai is not correlated with the regressors.
Advantages of the random effects model
Advantages of the random effects model are that it allows time-invariant regressors to be included and it leaves more degrees of freedom.
Disadvantages of the random effects model
The main disadvantage of the random effects estimator is that it is biased if the assumption that ai is uncorrelated with the regressors is incorrect.
Advantage of the fixed effects model
An advantage of the fixed effects model is that it allows arbitrary correlation between ai and any regressors.
Disadvantage of the fixed effects model
One of the main disadvantages is that it drops time-invariant regressors.
Preference between fixed effects model and random effects model
Unless we wish to estimate the effect of a time-invariant variable, fixed effects are generally preferred over random effects due to having less restrictive assumptions.