Biases Flashcards
What is heteroscedasticity?
- The error term does not have a constant variance.
What is multicollinearity?
- When independent variables are highly correlated.
- This makes it hard to separate the individual effects of B1 and B2, so we do not recover their true values precisely.
- LEADS TO INFLATED STANDARD ERRORS (coefficients remain unbiased)
- And imprecise, unstable coefficient estimates
What is heteroscedasticity and homoscedasticity?
The error term u is homoskedastic if the variance of the conditional distribution of u given X is constant and does not depend on X. Otherwise, the error term is heteroskedastic.
Homoskedastic: the error has a constant variance.
Heteroskedastic: the error does not have a constant variance.
Picture the distribution of the errors u for various values of X: in one plot the spread of the errors is large and varies with X; in the other it is small and compact for every X.
What are the problems of working with heteroscedastic data?
Parameters will be unbiased, but the usual variance estimator will be inconsistent. One solution is to use White's robust variance estimator. Using White's estimator on homoskedastic data will, however, give worse finite-sample properties and increase the likelihood of size distortions. Another solution to heteroskedasticity is to use GLS.
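A minimal numpy sketch of White's (HC0) robust variance estimator next to the classical one, on simulated data whose error variance grows with X; the data-generating process, sample size, and coefficients are illustrative assumptions, not from the course material:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 5, n)
# Heteroskedastic errors: the error's standard deviation grows with x
u = rng.normal(0, 1, n) * x
y = 2.0 + 3.0 * x + u

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y            # OLS coefficients (still unbiased)
resid = y - X @ beta

# Classical variance estimate: assumes a single constant error variance
sigma2 = resid @ resid / (n - 2)
V_classical = sigma2 * XtX_inv

# White's (HC0) estimator: "meat" = sum of u_i^2 * x_i x_i'
meat = (X * resid[:, None] ** 2).T @ X
V_white = XtX_inv @ meat @ XtX_inv

se_classical = np.sqrt(np.diag(V_classical))
se_white = np.sqrt(np.diag(V_white))
```

With variance increasing in x, the classical and robust standard errors disagree; the robust ones are the consistent choice here.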
What is a type 1 error
Rejecting a true null
What is meant by unbiasedness of an estimator?
- estimator whose expected value is equal to the population value
What is multicollinearity, and how can we test for it?
Perfect multicollinearity occurs if two or more regressors are perfectly correlated. In practice we rarely see two regressors that are perfectly correlated; it most often arises from the dummy trap or from including the same regressor twice. We can use the Variance Inflation Factor (VIF) to test for multicollinearity. A rule of thumb is that there is multicollinearity if VIF > 10. The solution to this problem is simply to drop one of the offending variables.
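The VIF check can be sketched with only numpy: regress each regressor on the others and compute 1 / (1 - R^2). The simulated regressors and the `vif` helper below are hypothetical, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)   # nearly a copy of x1
x3 = rng.normal(size=n)                      # unrelated regressor
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from the auxiliary regression
    of x_j on an intercept and the remaining columns."""
    m = X.shape[0]
    others = np.column_stack([np.ones(m), np.delete(X, j, axis=1)])
    target = X[:, j]
    coef, *_ = np.linalg.lstsq(others, target, rcond=None)
    resid = target - others @ coef
    tss = (target - target.mean()) @ (target - target.mean())
    r2 = 1 - (resid @ resid) / tss
    return 1.0 / (1.0 - r2)
```

Here `vif(X, 0)` and `vif(X, 1)` come out far above the rule-of-thumb cutoff of 10, while `vif(X, 2)` stays near 1.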
What are the problems and solutions with heteroscedasticity?
PROBLEM:
Coefficients are unbiased and consistent
Standard errors are biased
OLS t statistic does not follow a t distribution
(Fail to) reject H0 too often or not often enough
SOLUTION
Use heteroskedasticity robust standard errors
Prudent to assume errors are heteroskedastic unless there is a compelling reason not to
Implementation: see Lab example
What's the difference between the biases and heteroscedasticity + multicollinearity?
The biases lead to a violation of LS.1, hence the coefficients are biased. In contrast, heteroscedasticity and multicollinearity lead to biased standard errors, not biased coefficients.
What is omitted variable bias?
- Occurs when a statistical model leaves out one or more relevant variables
- Not included in the model, but affecting the dependent variable
- Zero conditional mean assumption violated
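The mechanics show up in a small simulation (the variable names and coefficients below are made up for illustration): when the omitted variable drives both the regressor and the outcome, the short regression's slope is biased:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
ability = rng.normal(size=n)                  # the omitted variable
educ = 1.0 * ability + rng.normal(size=n)     # education correlates with ability
wage = 2.0 * educ + 3.0 * ability + rng.normal(size=n)

# Short regression: wage on educ only (ability omitted)
X_short = np.column_stack([np.ones(n), educ])
b_short = np.linalg.lstsq(X_short, wage, rcond=None)[0]

# Long regression: including ability recovers the true effect of 2.0
X_long = np.column_stack([np.ones(n), educ, ability])
b_long = np.linalg.lstsq(X_long, wage, rcond=None)[0]
```

The short regression's slope lands near 3.5 (true effect 2.0 plus the bias term 3.0 * cov(educ, ability) / var(educ)), while the long regression recovers 2.0.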
Simultaneity bias
- One or more of the independent variables are jointly determined with the dependent variable.
- X causes Y, but Y also causes X
- The two variables influence each other
- OLS does not give the real causal effect
- Violates the zero conditional mean assumption
Supply/demand is a good example:
- Quantity and price
- Investment and productivity
- Sales and advertising
This leads to a violation of LS.1, hence our coefficient is biased.
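The supply/demand case can be simulated directly (the structural slopes and shock variances below are made-up assumptions): OLS on equilibrium price and quantity recovers neither the demand slope nor the supply slope, because price and quantity are jointly determined:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
u_d = rng.normal(0, 2, n)   # demand shocks
u_s = rng.normal(0, 1, n)   # supply shocks

# Structural model: demand q = -1*p + u_d, supply q = +1*p + u_s.
# Solving the two equations gives the equilibrium values:
p = (u_d - u_s) / 2.0
q = (u_d + u_s) / 2.0

# OLS of q on p mixes the two curves
X = np.column_stack([np.ones(n), p])
b = np.linalg.lstsq(X, q, rcond=None)[0]
```

The fitted slope is roughly 0.6, far from both the demand slope (-1) and the supply slope (+1); this is the simultaneity bias in action.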
Sample Selection bias
A type of bias that arises by choosing non-random data for statistical analysis. For example when people volunteer for a study. Those who volunteer might share the same characteristics.
For example, you want to study veganism among undergraduate students. You send out a survey to the students in an art and culture class. Because this is not a random sample, it is not representative of the target population; these students might be more liberal, etc.
Measurement error in an independent variable
- There are often errors in the data
E.g.:
- Reporting error
- Coding error
- Estimation error
Two good examples of omitted variables in a wage-education regression:
Education of individual’s parents,
Ability
How is B(hat) distributed if it is unbiased?
the sampling distribution of βhat is centred around β
What is stationarity?
- No trends or seasonality
- Its statistical properties do not change over time
- constant mean and variance
What is a Type I Error?
What is a Type II Error?
1 - Rejecting the null hypothesis when it is true (rejecting a true null)
2 - Failing to reject the null hypothesis when it is false
What is perfect multicollinearity?
A phenomenon in which one predictor variable in a multiple regression model is an exact linear combination of the others. A warning sign of (near-)multicollinearity: few significant t-ratios, but a high R^2.
What are the consequences of high, but non-perfect multicollinearity?
- Large variances and covariances
- Wider confidence intervals
- R^2 tends to be very high
- Reduces precision of the estimated coefficients, which weakens your model
- You might not trust your model
OLS is still BLUE, but: large variances and covariances make precise estimation difficult; confidence intervals are wider; t-ratios tend to be statistically insignificant; R^2 tends to be very high; and OLS estimators and standard errors can be sensitive to small changes in the data.
What does heteroscedasticity lead to?
- Coefficients don't change (still unbiased)
- But it leads to biased standard errors (SER)
- Biased SER make hypothesis testing, t-tests, p-values, etc. unreliable
- OLS is no longer BLUE (Gauss-Markov no longer holds)
What can you do about heteroscedasticity?
- Use heteroskedasticity-robust (White) standard errors
- You can use clustered standard errors if errors also correlate within groups
How does perfect Multicollinearity occur?
1. Dummy trap
2. Include a variable twice
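A quick numpy illustration of the dummy trap (the dummy values are made up): with an intercept plus both gender dummies, the columns are linearly dependent, so the design matrix loses full rank and X'X cannot be inverted:

```python
import numpy as np

n = 6
female = np.array([1, 0, 1, 1, 0, 0])
male = 1 - female                       # the two dummies sum to the constant

# Dummy trap: intercept + both dummies => perfect multicollinearity
X = np.column_stack([np.ones(n), female, male])
print(np.linalg.matrix_rank(X))         # 2, not 3: X'X is singular

# Fix: drop one dummy; "male" becomes the reference category
X_ok = np.column_stack([np.ones(n), female])
```

Dropping one dummy restores full column rank, which is exactly the "just drop the variable" solution from the earlier card.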
What does multicollinearity lead to? both perfect and not-perfect
- Large SER
- Imprecise coefficient estimates
If it is PERFECT, it violates the no-perfect-collinearity assumption for multiple regression
What is Clustered Standard Errors?
- Allow the regression errors to correlate within a cluster (entity),
- but assume that the errors are uncorrelated across clusters
- Clustered SER allow for heteroskedasticity and autocorrelation
Pooled OLS: what to do with SER
Use robust (clustered) standard errors to fix the SER
BLUE for B1(hat) stands for?
Best
Linear
Unbiased
Estimator
Assumptions when sample size is small / homoskedastic normal regression assumptions?
- If the three assumptions from the simple linear model hold:
- and the errors are homoskedastic and normally distributed,
- use the t-statistic with the Student t-distribution
What type of biases do we have?
- Sample selection bias
- Omitted variable bias
- Simultaneity bias
- Measurement error in an independent variable
- ALL OF THESE LEAD TO A VIOLATION OF LS.1
- biased coefficients
What is measurement error in an independent variable?
Data is often measured with error:
- coding error
- reporting error
- estimation error
- Violates LS.1
- Measurement error in the dependent variable is not a problem (for unbiasedness)
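A small simulation of classical measurement error in a regressor (the true coefficients and noise variances are made-up assumptions): the OLS slope is attenuated toward zero by the factor var(x) / (var(x) + var(measurement error)):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
x_true = rng.normal(0, 1, n)
y = 1.0 + 2.0 * x_true + rng.normal(0, 1, n)

# We only observe x with classical (independent, zero-mean) measurement error
x_obs = x_true + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x_obs])
b = np.linalg.lstsq(X, y, rcond=None)[0]
# Attenuation factor here: 1 / (1 + 1) = 0.5, so the slope lands near 1.0,
# half the true value of 2.0
```

This is why the card flags measurement error in an independent variable as a bias, while classical error in the dependent variable only inflates the error variance.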
Why do we want time-series to be stationary?
- A nonstationary series can have a time-varying mean and a variance that grows without bound, so standard estimates and tests become unreliable and badly biased (e.g. spurious regression).