CIP Session 9: Regression Analysis Flashcards
Correlation and causation
- Correlation: a relation/association between two variables, but not necessarily cause and effect
- Causation: one variable causes the other
A strong correlation might suggest underlying causation, but it does not prove it.
Regression Analysis
A statistical method of identifying the relationship between one or multiple independent variables (x) and a dependent variable (y)
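Written as an equation, the relationship is commonly sketched as below. The notation is assumed here rather than taken from the notes: β_0 is the intercept, β_1 to β_k are the slope coefficients on the k independent variables, and u is the error term.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u
```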
Types of data
- Experimental (laboratory) evidence:
- Cross-sectional: data for many subjects at the same time
- Time series: data collected for a single unit on one or multiple variables over time
- Panel/Longitudinal: data collected for the same units (N) over a given time period
Ordinary Least Squares (OLS) estimation and the 5 assumptions for unbiasedness, consistency and efficiency
OLS selects the fitted line that fits the data best: the one that minimises the sum of squared residuals.
1. The model is linear in the parameters
2. Random sampling
3. There is variation in x
4. The error has a mean of 0 for every value of x
5. Homoscedasticity: the variance of the error does not depend on x
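A minimal sketch of OLS for one independent variable, using the standard closed-form formulas for the slope and intercept. The function name `ols_fit` and the data are made up for illustration. The check on `sxx` corresponds to assumption 3: without variation in x the slope cannot be computed.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the line minimising the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations of x from its mean
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope cannot be estimated")
    # Sum of cross-deviations of x and y
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Illustrative data lying exactly on the line y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
intercept, slope = ols_fit(x, y)
print(intercept, slope)
```

Because the illustrative points lie exactly on a line, the fit recovers that line; with noisy data the same formulas give the line with the smallest sum of squared residuals.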
Unbiased, consistent and efficient estimator
- Unbiasedness = when you calculate β for many different samples, the average of those estimates should equal the true population value of β
- Consistency = as you increase the sample size, your estimate of β gets closer to the population value. You can improve the estimator by increasing the sample size.
- Efficiency = how many data points you need for a reliable estimate of β. The more you need, the less efficient the estimator. Equivalently, for a given sample size the efficient estimator has the smallest variance.
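The first two properties can be illustrated with a small simulation. This is a sketch under assumed values: the true slope of 2, the noise level, the sample sizes and the seed are all made up, and the helper names are hypothetical. Many samples are drawn, the slope is estimated for each, and then the average and the spread of the estimates are compared.

```python
import random


def ols_slope(x, y):
    """OLS slope for a single independent variable."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def simulate_slopes(n, reps, rng, beta0=1.0, beta1=2.0):
    """Draw `reps` random samples of size `n` and return the estimated slopes."""
    slopes = []
    for _ in range(reps):
        x = [rng.uniform(0, 10) for _ in range(n)]
        # Error term has mean 0 and the same variance for every x
        y = [beta0 + beta1 * xi + rng.gauss(0, 3) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes


def mean(values):
    return sum(values) / len(values)


def spread(values):
    """Standard deviation of the estimates around their own mean."""
    m = mean(values)
    return (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
small = simulate_slopes(n=20, reps=500, rng=rng)
large = simulate_slopes(n=500, reps=500, rng=rng)

print("n=20:  mean", mean(small), "spread", spread(small))
print("n=500: mean", mean(large), "spread", spread(large))
```

Both averages should land close to the assumed true slope (unbiasedness), while the spread of the estimates should be clearly smaller for the larger samples (consistency).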
What can you use to evaluate the fit of a model to data?
R² tells how well the model fits the data: it is the share of the variation in y explained by the model, ranging from 0 to 1, where 1 is a perfect fit.
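A minimal sketch of the usual calculation, R² = 1 - (residual sum of squares / total sum of squares). The function name `r_squared` and the numbers are made up for illustration.

```python
def r_squared(y, y_hat):
    """Share of the variation in y explained by the fitted values y_hat."""
    y_bar = sum(y) / len(y)
    # Variation left unexplained by the model
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    # Total variation of y around its mean
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y has no variation, so R-squared is undefined")
    return 1 - ss_res / ss_tot


y = [2.0, 4.0, 6.0, 8.0]
perfect_fit = [2.0, 4.0, 6.0, 8.0]  # predictions equal to the data
mean_only = [5.0, 5.0, 5.0, 5.0]    # always predicts the mean of y

print(r_squared(y, perfect_fit))
print(r_squared(y, mean_only))
```

The two cases mark the ends of the scale: predictions that match the data exactly give 1, and a model that only predicts the mean of y gives 0.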
What is Omitted Variable Bias and what can you do to solve it?
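When important variables that influence Y, and are correlated with the included x, are omitted from the model, the estimated β picks up part of their effect. To solve it, use multiple linear regression to include those variables, and add a sixth assumption: the independent variables must not be perfectly multicollinear.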
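A simulation sketch of the bias and the fix. All values are assumptions made up for illustration: y depends on x1 (true coefficient 2) and x2 (true coefficient 3), and x2 is correlated with x1. Regressing y on x1 alone is compared with a regression on both variables, solved here with the closed-form two-regressor OLS formulas.

```python
import random


def sdev(a, b):
    """Sum of cross-deviations of a and b from their means."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5000
x1 = [rng.gauss(0, 1) for _ in range(n)]
# x2 is correlated with x1 (assumed relationship, for illustration only)
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]
# Assumed true model: y = 1 + 2*x1 + 3*x2 + error
y = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

# Short regression: x2 is omitted, so x1 absorbs part of its effect
short_b1 = sdev(x1, y) / sdev(x1, x1)

# Long regression: both variables included
s11, s22, s12 = sdev(x1, x1), sdev(x2, x2), sdev(x1, x2)
s1y, s2y = sdev(x1, y), sdev(x2, y)
det = s11 * s22 - s12 ** 2  # zero if x1 and x2 are perfectly collinear
long_b1 = (s22 * s1y - s12 * s2y) / det
long_b2 = (s11 * s2y - s12 * s1y) / det

print("x1 coefficient, x2 omitted: ", short_b1)
print("x1 coefficient, x2 included:", long_b1)
print("x2 coefficient:             ", long_b2)
```

With x2 omitted, the x1 coefficient should come out well above the assumed true value of 2, roughly 2 + 3 × 0.5, because x1 is credited with part of the effect of x2. Including x2 should bring it back close to 2. The `det` term also shows why the sixth assumption is needed: if x1 and x2 were perfectly collinear it would be 0 and the coefficients could not be computed.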