Week 4: Properties of Estimators, Gauss-Markov, Assumptions Part I Flashcards
Gauss-Markov theorem
Name the definition as well as the 5 conditions that have to be met
The Gauss-Markov theorem says that, under certain conditions, the ordinary least squares (OLS) estimator of the coefficients of a linear regression model is the best linear unbiased estimator (BLUE): it has the smallest variance among all estimators that are linear in the observed outcome variable and unbiased.
1: linear function of X’s plus disturbances
2: (Conditional) mean independence (disturbances have mean zero given the X’s)
3: Homoskedasticity
4: Uncorrelated disturbances
5: Disturbances are normally distributed (strictly, this is needed for exact inference rather than for the BLUE result itself)
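A compact way to write the model and these conditions, in standard notation (mine, not taken from the card):

```latex
\begin{aligned}
&\text{(1) Model: } y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + u_i \\
&\text{(2) Mean independence: } E[u_i \mid X] = 0 \\
&\text{(3) Homoskedasticity: } \operatorname{Var}(u_i \mid X) = \sigma^2 \\
&\text{(4) No correlation between disturbances: } \operatorname{Cov}(u_i, u_j \mid X) = 0 \text{ for } i \neq j \\
&\text{(5) Normality (for exact inference): } u_i \sim N(0, \sigma^2)
\end{aligned}
```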
Regression
Definition
Regression allows researchers to predict or explain variation in one variable (the dependent variable) based on one or more other variables (the independent variables).
Regression
Why regression 1
To predict the dependent variable (the easier goal):
- We don’t care about the warning that correlation ≠ causation
- We just want to choose X’s to minimize our errors of prediction
- In other words, we want to maximize R-squared (see the sketch after the examples below)
- No assumptions needed
- No assumptions needed
Examples: Predicting stocks, financial markets, economic indicators, election outcomes
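A minimal sketch of this use of regression, with simulated data and hypothetical predictors (not from the course):

```python
# Minimal sketch (simulated data, hypothetical predictors): OLS used purely for
# prediction, judged by R-squared rather than by any causal interpretation.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))                          # two hypothetical predictors
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

X_design = np.column_stack([np.ones(n), X])          # add an intercept column
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

y_hat = X_design @ beta_hat
r_squared = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(beta_hat)      # fitted coefficients
print(r_squared)     # share of variation in y explained by the predictors
```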
Regression
Why regression 2
To make causal inferences: we want to know whether X causes Y
Violations of mean independence
- Measurement error in the independent variables
- Reverse causation
- Specification error: omission of relevant variables
measurement error
The difference between the observed value (the result of measurement) and the actual value of what we are measuring
Systematic measurement error
- Measuring something in addition to, instead of, or as an incomplete part of the true concept of interest
- Because it is defined relative to the true concept of interest, identifying it depends on first having a good definition of that concept
- Always results in bias, inefficiency, and nonsense
- Can only be dealt with during research design
Example: GDP as a measure of national wealth:
- GDP measures only the monetary value of goods and services produced in a country
- It counts destruction of ecosystems as a plus when it generates short-term revenue, and it undervalues unpaid ‘household’ and other work
Example: survey design, measuring feminism, and surveyor-induced measurement error: “Should men and women get equal pay for equal work?”
- Such a question measures the extent to which social pressure induces individuals to answer questions in a certain way
Random measurement error
- For a particular observation, the observed value differs from the true value
- This difference is called “error”
- These errors are random
- Faulty measuring tool, carelessness, rounding
To fix, you need better data (a small simulation of the consequence follows below)
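A small simulation (hypothetical data, not from the course) of a standard consequence: purely random error in how an independent variable is measured pulls the estimated slope toward zero (attenuation), which is why only better data, rather than more of the same data, fixes it.

```python
# Sketch (simulated data): random measurement error in x attenuates the OLS slope.
import numpy as np

rng = np.random.default_rng(1)
n, true_beta = 100_000, 2.0
x_true = rng.normal(size=n)
y = true_beta * x_true + rng.normal(size=n)

x_observed = x_true + rng.normal(size=n)     # noisy measurement of x

def ols_slope(x, y):
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print(ols_slope(x_true, y))      # ~2.0: correct with the true x
print(ols_slope(x_observed, y))  # ~1.0: biased toward zero with the noisy x
```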
Reverse causation
- Instead of X causing Y, Y causes X (The effect of something could actually be its cause)
- Reverse causation leads to bias
- Solutions are very difficult: theory and advanced methods are needed, and the problem is potentially unsolvable
For example, some studies have observed that diet soda drinkers are more likely to be obese than people who don’t drink diet soda; a plausible explanation is that gaining weight leads people to switch to diet soda, rather than diet soda causing obesity.
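A toy simulation (hypothetical variables and numbers, mine) of how such a pattern can arise from reverse causation: here weight drives diet-soda consumption, yet regressing weight on diet soda still produces a large positive slope.

```python
# Sketch (simulated data): Y causes X, but a naive regression of Y on X still
# finds a strong positive association, which could be misread as X causing Y.
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
weight = rng.normal(80, 15, size=n)              # "Y": body weight
diet_soda = 0.05 * weight + rng.normal(size=n)   # "X": driven by weight

X = np.column_stack([np.ones(n), diet_soda])
beta_hat, *_ = np.linalg.lstsq(X, weight, rcond=None)
print(beta_hat[1])   # clearly positive, even though soda does not cause weight
```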
Specification error
Wrong variables:
- Including an irrelevant variable: inefficiency
- Excluding a relevant variable: bias
Unbiasedness
Whether the expected value of the sampling distribution of an estimator equals the unknown true value of the population parameter
Biasedness
Any instance that creates a difference between an expected value and the true value of a parameter being estimated.
In other words, it occurs when a statistic is unrepresentative of the population
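In symbols (standard notation, mine): for an estimator of a parameter,

```latex
E[\hat{\theta}] = \theta \quad \text{(unbiased)}, \qquad
\operatorname{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta \neq 0 \quad \text{(biased)}.
```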
Consequence of including an irrelevant variable
- Partial slope coefficients remain unbiased
- Estimates are inefficient
* The greater the correlation between the included variables, the less efficient the estimates
* If they are uncorrelated, the estimates remain efficient (the simulation below illustrates the correlated case)
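A small repeated-sampling simulation (hypothetical data, mine) of both claims on this card: the coefficient on x stays centred on its true value when a correlated but irrelevant variable z is added, but its spread across samples grows.

```python
# Sketch (simulated data): including an irrelevant regressor z that is highly
# correlated with x leaves the estimate of x's effect unbiased but less precise.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 2000
b_short, b_long = [], []

for _ in range(reps):
    x = rng.normal(size=n)
    z = 0.9 * x + 0.1 * rng.normal(size=n)   # irrelevant, but correlated with x
    y = 1.0 + 2.0 * x + rng.normal(size=n)   # z has no effect on y

    Xs = np.column_stack([np.ones(n), x])
    Xl = np.column_stack([np.ones(n), x, z])
    b_short.append(np.linalg.lstsq(Xs, y, rcond=None)[0][1])
    b_long.append(np.linalg.lstsq(Xl, y, rcond=None)[0][1])

print(np.mean(b_short), np.std(b_short))   # ~2.0 with a small spread
print(np.mean(b_long), np.std(b_long))     # ~2.0 on average, much larger spread
```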
Excluding a relevant variable
- Does the variable have a causal effect on the dependent variable?
- Is the variable correlated with those variables whose effects are the focus of the study?
If the answer is “yes” to both, then excluding the variable leads to bias (omitted variable bias).
Solution: add the variable as a “control” (illustrated in the simulation below)
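A small simulation (hypothetical confounder z, mine) of omitted variable bias and of the fix: regressing y on x alone overstates x’s effect, while adding z as a control recovers it.

```python
# Sketch (simulated data): omitting a variable z that affects y and is correlated
# with x biases the slope on x; adding z as a control removes the bias.
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
z = rng.normal(size=n)                              # the omitted confounder
x = 0.8 * z + rng.normal(size=n)                    # x is correlated with z
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)    # z also affects y

X_omit = np.column_stack([np.ones(n), x])
X_ctrl = np.column_stack([np.ones(n), x, z])
print(np.linalg.lstsq(X_omit, y, rcond=None)[0][1])  # well above 2: biased
print(np.linalg.lstsq(X_ctrl, y, rcond=None)[0][1])  # ~2.0: bias removed
```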
How to detect and deal with wrong variable errors
- Exclusion of relevant variables is a problem of theory. Always ask: is there a third variable Z that is linked to BOTH X and Y?
- Inclusion of irrelevant variables can be diagnosed with the t-statistics (e.g. hypothesis testing); see the sketch below
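A minimal sketch of that diagnostic (simulated data; statsmodels is my choice of tool, not necessarily the course’s software): the irrelevant regressor z shows a t-statistic near zero.

```python
# Sketch (simulated data): an irrelevant regressor shows up with a tiny t-statistic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
x = rng.normal(size=n)
z = rng.normal(size=n)                       # irrelevant: no effect on y
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x, z]))
fit = sm.OLS(y, X).fit()
print(fit.tvalues)                           # large |t| for x, |t| near zero for z
```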