exam 3 Flashcards

1
Q

Quadratic population regression model

A

test score = β0 + β1·income + β2·income^2 + u
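A minimal sketch of what the quadratic term does: the predicted change in test score from a one-unit rise in income depends on the starting income. The coefficient values and the names `predicted_score` and `effect_of_change` are made up for illustration; they are not estimates from the course.

```python
def predicted_score(income, b0=600.0, b1=4.0, b2=-0.04):
    """Predicted test score from the quadratic regression function.

    The default coefficients are hypothetical illustrative values.
    """
    return b0 + b1 * income + b2 * income ** 2


def effect_of_change(income, delta=1.0):
    """Predicted change in test score when income rises by delta."""
    return predicted_score(income + delta) - predicted_score(income)


low = effect_of_change(10.0)    # income goes from 10 to 11
high = effect_of_change(40.0)   # income goes from 40 to 41
print(low, high)                # with b2 < 0 the effect shrinks as income rises
```

With a negative β2 the same one-unit change matters less at higher income, which is why a single slope cannot summarize the relationship.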

2
Q

We can test whether the linear specification is true against the
alternative that the quadratic specification is true by testing
H0: β2 = 0 vs. H1: β2 ≠ 0

A
3
Q

Relationship between Y and X is nonlinear

A
  1. The effect on Y of a change in X depends on the value of X (the marginal effect of X isn’t constant)
  2. A linear regression is mis-specified: the functional form is wrong
  3. The estimator of the effect on Y of X is biased
  4. The solution is to estimate a regression function that is nonlinear in X
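One way to check for this kind of nonlinearity is to add X^2 to the regression and test whether its coefficient is zero. The sketch below does that on simulated data using only the standard library. Assumptions to note: the data-generating numbers are invented, the helper names are hypothetical, and the standard error is the homoskedasticity-only formula (a course may use heteroskedasticity-robust standard errors instead). The coefficient on X^2 is obtained by partialling the constant and X out of both Y and X^2 (the Frisch-Waugh-Lovell result), which gives the same estimate as the full three-regressor OLS.

```python
import math
import random


def residualize(v, x):
    """Residuals from an OLS regression of v on a constant and x."""
    n = len(v)
    mv, mx = sum(v) / n, sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sxx
    return [(vi - mv) - slope * (xi - mx) for xi, vi in zip(x, v)]


def quadratic_term_t_stat(y, x):
    """Estimate b2 in y = b0 + b1*x + b2*x^2 + u and its t-statistic."""
    n = len(y)
    x_sq = [xi ** 2 for xi in x]
    y_res = residualize(y, x)        # y with the constant and x partialled out
    x_sq_res = residualize(x_sq, x)  # x^2 with the constant and x partialled out
    s22 = sum(r * r for r in x_sq_res)
    b2 = sum(r * yr for r, yr in zip(x_sq_res, y_res)) / s22
    ssr = sum((yr - b2 * r) ** 2 for r, yr in zip(x_sq_res, y_res))
    se = math.sqrt(ssr / (n - 3) / s22)  # homoskedasticity-only standard error
    return b2, b2 / se


rng = random.Random(0)
income = [rng.uniform(5, 55) for _ in range(500)]
# Simulated "truth" is quadratic with b2 = -0.04 (an invented value).
score = [600 + 4 * x - 0.04 * x ** 2 + rng.gauss(0, 5) for x in income]

b2_hat, t_stat = quadratic_term_t_stat(score, income)
reject_linear = abs(t_stat) > 1.96   # two-sided test at the 5% level
print(round(b2_hat, 4), round(t_stat, 1), reject_linear)
```

Rejecting H0: β2 = 0 here is evidence against the linear specification in favor of the quadratic one.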
4
Q

Internal validity

A

the statistical inferences about causal effects are valid for the population being studied

5
Q

External validity

A

statistical inferences can be generalized from the population and setting studied to other populations and settings

setting = legal, policy, and physical environment

6
Q

Threats to external validity

Assessing threats to external validity requires detailed substantive knowledge and judgment on a case-by-case basis

A

How far can we generalize class size results from California?
– Differences in populations
* California in 2011?
* Massachusetts in 2011?
* Mexico in 2011?
– Differences in settings
* different legal requirements (e.g. special education)
* different treatment of bilingual education
* differences in teacher characteristics

7
Q

Internal validity Threats
SOWES

A

Sample selection bias
Omitted variable bias
Wrong functional form
Errors-in-variables bias
Simultaneous causality bias

All of these imply that E(ui | X1i, …, Xki) ≠ 0 (or that conditional mean independence fails),
meaning OLS is biased and inconsistent.

8
Q

Omitted variable bias arises if the omitted variable is both:
1. a determinant of Y
2. correlated with at least one included regressor

A

A control variable W is a variable that is correlated with, and
controls for, an omitted causal factor in the regression of Y
on X, but which itself does not necessarily have a causal effect on Y.

  • If the multiple regression includes control variables, ask whether
    there are omitted factors that are not adequately controlled for,
    that is, whether the error term is correlated with the variable of
    interest even after the control variables are included.
9
Q

What are solutions to omitted variable bias?

A
  1. Include the omitted causal variable as another regressor
  2. If you have data on one or more control variables and they are adequate, include the control variables
  3. Use panel data, in which each entity is observed more than once
  4. If the omitted variable can’t be measured, use instrumental variables regression
    - use an instrument that is correlated with the regressor X but uncorrelated with the error term
  5. Run a randomized controlled experiment
    - if X is randomly assigned, then X is necessarily distributed independently of u; thus E(u|X = x) = 0.
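Solution 1 can be illustrated with a small simulation. This is a sketch under invented assumptions: the true model is Y = 1 + 2X + 3W + u, W is correlated with X, and all names (`slope_short`, `slope_long`) are hypothetical. Leaving W out pushes the estimated slope on X away from 2; adding W as a regressor brings it back.

```python
import random


def demean(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def slope_short(y, x):
    """Slope on x from regressing y on a constant and x only."""
    yd, xd = demean(y), demean(x)
    return dot(xd, yd) / dot(xd, xd)


def slope_long(y, x, w):
    """Slope on x from regressing y on a constant, x, and w."""
    yd, xd, wd = demean(y), demean(x), demean(w)
    sxx, sww, sxw = dot(xd, xd), dot(wd, wd), dot(xd, wd)
    sxy, swy = dot(xd, yd), dot(wd, yd)
    return (sxy * sww - swy * sxw) / (sxx * sww - sxw ** 2)


rng = random.Random(1)
n = 5000
w = [rng.gauss(0, 1) for _ in range(n)]          # the omitted factor
x = [0.5 * wi + rng.gauss(0, 1) for wi in w]     # regressor correlated with w
y = [1 + 2 * xi + 3 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

b_short = slope_short(y, x)    # w omitted: absorbs part of w's effect
b_long = slope_long(y, x, w)   # w included: close to the true slope of 2
print(round(b_short, 2), round(b_long, 2))
```

Both conditions for omitted variable bias hold here by construction: W determines Y and W is correlated with X.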
10
Q

Wrong functional form
- if the functional form is incorrect (for example, an interaction term is incorrectly omitted),
then inferences on causal effects will be biased.

A

Solution
1. Continuous dependent variable: use the “appropriate”
nonlinear specifications in X (logarithms, interactions,
etc.)
2. Discrete (example: binary) dependent variable: need an
extension of multiple regression methods (“probit” or
“logit” analysis for binary dependent variables).

11
Q

Errors-in-variables bias

A

So far we have assumed that X is measured without error.
In reality, economic data often have measurement error.

12
Q

Lessons in classical measurement error

A
  • The amount of bias in beta hat depends on the nature of the measurement error
  • If there is pure noise added to Xi, then beta hat is biased towards 0
  • The potential importance of measurement error bias depends
    on how the data are collected.
    – administrative data (e.g. # teachers in a school) are often quite accurate.
    – Survey data on sensitive questions (how much do you earn?)
    often have considerable measurement error
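The "biased towards 0" point can be seen in a short simulation. This is a sketch assuming classical measurement error (pure noise, independent of the true X and of u) with invented variances; the name `ols_slope` is hypothetical. With equal variances for the true X and the noise, the slope estimate shrinks to roughly half the true value.

```python
import random


def ols_slope(y, x):
    """Slope from an OLS regression of y on a constant and x."""
    n = len(y)
    my, mx = sum(y) / n, sum(x) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(2)
n = 5000
true_x = [rng.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * xi + rng.gauss(0, 1) for xi in true_x]   # true slope is 2
noisy_x = [xi + rng.gauss(0, 1) for xi in true_x]     # pure noise added to X

b_true = ols_slope(y, true_x)     # uses the correctly measured X
b_noisy = ols_slope(y, noisy_x)   # uses the mismeasured X: attenuated
print(round(b_true, 2), round(b_noisy, 2))
```

The attenuation factor in this setup is var(X) / (var(X) + var(noise)), so noisier measurement means a slope closer to zero.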
13
Q

Solutions to errors-in-variables bias

A
  1. Obtain better data
  2. Develop a specific model of measurement error process
  3. instrumental variables regression
14
Q

Missing data + sample selection bias

A
  1. Data are missing at random.
  2. Data are missing based on the value of one or more X’s
  3. Data are missing based in part on the value of Y or u

Cases 1 and 2 don’t introduce bias: the standard errors are larger than they would be if the data weren’t missing, but β̂ is unbiased.
Case 3 introduces “sample selection” bias.

15
Q

Case 1: data are missing at random

Suppose you took a simple random sample of 100 workers, and your dog ate 20 of the response sheets (selected
at random) before you could enter them into the computer.
- This is equivalent to your having taken a simple random sample of 80 workers, so your dog didn’t introduce any bias.

A

Case 2: data are missing based on the value of one of the X’s

Suppose you restrict your analysis to the subset of school districts with STR < 20.
By only considering districts with small class sizes you won’t be able to say anything about districts with large class sizes, but focusing on just the small-class districts doesn’t
introduce bias.

This is equivalent to having missing data,
where the data are missing if STR ≥ 20. More generally, if data are missing based only on values of X’s, the fact that
data are missing doesn’t bias the OLS estimator.

16
Q

Case 3: data are missing based in part on the value of Y or u

A

In general this type of missing data does introduce bias into the OLS estimator.
- This is called sample selection bias.
Sample selection bias arises when a selection process:
- influences the availability of data, and
- is related to the dependent variable
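The contrast between selecting on X and selecting on Y can be sketched with simulated data. The model (Y = 1 + 2X + u), the cutoffs, and the name `slope_of` are invented for illustration. Dropping observations based on X leaves the slope near 2; dropping them based on Y does not.

```python
import random


def slope_of(pairs):
    """OLS slope from regressing y on a constant and x, given (x, y) pairs."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((px - mx) * (py - my) for px, py in pairs)
    sxx = sum((px - mx) ** 2 for px, _ in pairs)
    return sxy / sxx


rng = random.Random(3)
n = 10000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]   # true slope is 2
pairs = list(zip(x, y))

keep_on_x = [(xi, yi) for xi, yi in pairs if xi < 0]   # missing based on X
keep_on_y = [(xi, yi) for xi, yi in pairs if yi > 1]   # missing based on Y

b_full = slope_of(pairs)
b_sel_x = slope_of(keep_on_x)   # still close to 2, just less precise
b_sel_y = slope_of(keep_on_y)   # noticeably below 2 in this setup
print(round(b_full, 2), round(b_sel_x, 2), round(b_sel_y, 2))
```

Selecting on Y keeps low-X observations only when their error u happens to be large, which makes the kept errors correlated with X.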

17
Q

Simultaneous causality bias

A

X causes Y and Y causes X
- a large u means a large Y, which in turn means a large X
- so corr(X, u) ≠ 0
- beta hat is biased and inconsistent
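A sketch of the feedback loop, under invented assumptions: Y = β1·X + u and X = γ1·Y + v with β1 = γ1 = 0.5 and independent standard normal u and v. Solving the two equations gives the values of X and Y used below. OLS of Y on X then lands well above the true β1 because X contains part of u.

```python
import random

beta1 = 0.5    # true effect of X on Y (hypothetical value)
gamma1 = 0.5   # feedback from Y to X (hypothetical value)
denom = 1 - beta1 * gamma1

rng = random.Random(4)
n = 5000
x, y, u_all = [], [], []
for _ in range(n):
    u = rng.gauss(0, 1)
    v = rng.gauss(0, 1)
    # Solution of the system Y = beta1*X + u, X = gamma1*Y + v.
    x.append((gamma1 * u + v) / denom)
    y.append((beta1 * v + u) / denom)
    u_all.append(u)

mx, my, mu = sum(x) / n, sum(y) / n, sum(u_all) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b_ols = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
cov_xu = sum((xi - mx) * (ui - mu) for xi, ui in zip(x, u_all)) / n
print(round(b_ols, 2), round(cov_xu, 2))   # slope above 0.5, covariance above 0
```

In this setup the OLS slope settles near 0.8 rather than 0.5, and more data does not fix it, which is what "inconsistent" means.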

18
Q

Solution to simultaneous causality bias

A
  1. Run a randomized controlled experiment. Because Xi is
    chosen at random by the experimenter, there is no
    feedback from the outcome variable Yi to Xi (assuming perfect
    compliance).
  2. Develop and estimate a complete model of both directions
    of causality. This is the idea behind many large macro
    models (e.g. Federal Reserve Bank-US). This is extremely
    difficult in practice.
  3. Use instrumental variables regression to estimate the
    causal effect of interest (effect of X on Y, ignoring effect of
    Y on X).
19
Q

Internal and External Validity when the regression is used for forecasting

A
  • Forecasting and estimation of causal effects are
    quite different objectives.
  • For forecasting,
    – R2 matters (a lot!)
    – Omitted variable bias isn’t a problem!
    – Interpreting coefficients in forecasting models is not
    important: the important thing is a good fit and a model
    you can “trust” to work in your application
    – External validity is paramount: the model estimated
    using historical data must hold into the (near) future
    – More on forecasting when we take up time series data
20
Q

Simultaneous causality bias

A

A large u means a large Y, which implies a large X
- corr(X, u) doesn’t equal 0
- beta hat is biased and inconsistent

21
Q

Solutions to simultaneous causality bias

A
  1. Run a randomized controlled experiment. Because Xi is
    chosen at random by the experimenter, there is no feedback from the outcome variable Yi to Xi (assuming perfect
    compliance).
  2. Develop and estimate a complete model of both directions
    of causality. This is the idea behind many large macro models (e.g. Federal Reserve Bank-US). This is extremely
    difficult in practice.
  3. Use instrumental variables regression to estimate the causal effect of interest (effect of X on Y, ignoring effect of Y on X).
22
Q

Internal and External Validity when the regression is used for forecasting

A
  • forecasting and estimation of causal effects are different objectives
  • for forecasting, adjusted R^2 matters
  • omitted variable bias isn’t a problem
  • external validity is paramount
23
Q

Omitted Variable Bias
- after including control variables, is the error term uncorrelated with STR?
Some evidence that the control variables might be doing their job:
– The STR coefficient doesn’t change much when the control
variable specifications change
– The results for California and Massachusetts are similar, so if
there is OV bias remaining, that OV bias would need to be
similar in the two data sets

A
24
Q

Why study experiments?
- ideal randomized controlled experiments provide a conceptual benchmark for assessing observational studies
- actual experiments are rare but influential
- experiments can overcome the threats to internal validity of observational studies; however, they have their own threats to internal and external
validity
- thinking about experiments helps us to understand quasi-experiments, or “natural experiments,” in which “natural” variation induces “as if” random assignment

A
  • An experiment is designed and implemented consciously by
    human researchers. An experiment randomly assigns subjects to treatment and control groups (think of clinical
    drug trials)
  • A quasi-experiment or natural experiment has a source
    of randomization that is “as if” randomly assigned, but this variation was not the result of an explicit randomized treatment and control design.
  • Program evaluation is aimed at evaluating the effect of a program or policy
    ex: an ad campaign to cut smoking, or a job training program.
25
Q

A treatment has a causal effect for a given individual

A
26
Q

Potential Outcome

A

outcome for an individual under a potential treatment or potential non-treatment

27
Q

Average treatment effect

A

the population mean value of the
individual treatment effects

28
Q

Wi= control variables

A
  1. If X is randomly assigned, then Xi is uncorrelated with W, so omitting W does not cause omitted variable bias. Including W still helps by reducing the error variance, which gives smaller standard errors
  2. If the probability of assignment depends on W, so that X is randomly assigned given W, then omitting W can lead to OV bias. Including W eliminates the OV bias
29
Q

Threats to Internal Validity
FAFE

these threats imply corr(X, u) ≠ 0,
so OLS is biased

A
  1. FAILURE to randomize
  2. ATTRITION: some subjects drop out
  3. FAILURE to follow treatment protocol
  4. EXPERIMENTAL effects
    - experimenter or subject bias
30
Q

Threats to External Validity

A
  1. Nonrepresentative sample
  2. Nonrepresentative “treatment” (that is,
    program or policy)
  3. General equilibrium effects (the effect of a
    program can depend on its scale; e.g. admissions counseling)
31
Q

Quasi/Natural Experiment

A

source of randomization that is “as if” randomly assigned

32
Q

2 Types of Quasi Experiment

A
  1. Treatment X “as if” randomly assigned
  2. Variable (Z) which influences receipt of
    treatment (X) is “as if” randomly assigned
33
Q

Potential Problems w/ Quasi Experiment

A

Threats to internal validity
1. Failure to randomize
2. Attrition
3. Failure to follow treatment protocol
4. Experimental effects
5. Instrument invalidity: relevance and exogeneity

34
Q

Threats to external validity of a quasi experiment

A
  1. Nonrepresentative sample
  2. Nonrepresentative treatment
35
Q

Ideal experiments and potential outcomes

A
  • The average treatment effect is the population mean of the individual treatment effect = difference in potential outcomes when treated and not treated.
  • The treatment effect estimated in an ideal randomized controlled experiment is unbiased for the average treatment effect.
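Both bullets can be checked in a small simulation with potential outcomes. This sketch assumes invented distributions, perfect compliance, and a 50/50 coin-flip assignment; the average over the simulated individuals stands in for the population mean. Each individual has two potential outcomes, only one of which is observed, yet the difference in group means still recovers the average treatment effect.

```python
import random

rng = random.Random(5)
n = 10000

# Potential outcomes for each individual (hypothetical distributions).
y0 = [rng.gauss(0, 1) for _ in range(n)]           # outcome if not treated
effect = [2 + rng.gauss(0, 1) for _ in range(n)]   # individual treatment effects
y1 = [a + e for a, e in zip(y0, effect)]           # outcome if treated

ate = sum(effect) / n   # average treatment effect among these individuals

# Random assignment: a coin flip decides which potential outcome is observed.
treated = [rng.random() < 0.5 for _ in range(n)]
observed = [y1[i] if treated[i] else y0[i] for i in range(n)]

n_treated = sum(treated)
mean_treated = sum(o for o, t in zip(observed, treated) if t) / n_treated
mean_control = sum(o for o, t in zip(observed, treated) if not t) / (n - n_treated)
diff_in_means = mean_treated - mean_control
print(round(ate, 2), round(diff_in_means, 2))
```

Because assignment is independent of the potential outcomes, the treated and control groups are comparable on average, so the two printed numbers are close.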
36
Q

Actual experiment

A
  • Actual experiments have threats to internal
    validity
  • Depending on the threat, these can be addressed by:
    – panel data regression (differences-in-differences)
    – multiple regression (including control variables), and
    – IV (using initial assignment as an instrument, possibly
    with control variables)
  • External validity also can be an important threat to the validity of experiments
37
Q

Quasi Experiment

  • have threats to internal validity
A

Quasi-experiments have an “as-if” randomly
assigned source of variation.
* This as-if random variation can generate:
– Xi which plausibly satisfies E(ui | Xi) = 0 (so estimation
proceeds using OLS); or
– instrumental variable(s) which plausibly satisfy E(ui | Zi) = 0
(so estimation proceeds using TSLS)
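The second case can be sketched with one instrument and one regressor, where TSLS reduces to the ratio cov(Z, Y) / cov(Z, X). Everything in the simulation is an invented assumption: Z is as-if randomly assigned and independent of u, X depends on both Z and u, and the true effect of X on Y is 2. The helper name `cov` is hypothetical.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n


rng = random.Random(6)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]   # instrument: unrelated to u
u = [rng.gauss(0, 1) for _ in range(n)]   # error term
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]   # X is correlated with u
y = [1 + 2 * xi + ui for xi, ui in zip(x, u)]             # true effect of X is 2

b_ols = cov(x, y) / cov(x, x)   # biased, since E(u | X) is not 0 here
b_iv = cov(z, y) / cov(z, x)    # uses only the variation in X driven by Z
print(round(b_ols, 2), round(b_iv, 2))
```

The instrument is relevant (it moves X) and exogenous (it is unrelated to u) by construction; in a real quasi-experiment both conditions have to be argued for.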