M11 - Multiple Regression Flashcards
Objective of regression analysis
- analysis of the relationship between … DV y and …. or … IV x on a metric scale
- areas of application:
- -> ….. analysis: “what …. exist and how …. are they?”
- -> …. analysis: … the … of a change in an …
Objective of regression analysis
- analysis of the relationship between ONE DV y and ONE or MORE IVs x on a metric scale
- areas of application:
- -> CAUSALITY analysis: “what RELATIONSHIPS exist and how STRONG are they?”
- -> IMPACT analysis: FORECASTING the IMPACT of a change in an IV
Conducting empirical research projects - steps
- research question
- literature
- developing hypotheses
- correlation analysis
- testing hypotheses using t-tests
Simple Linear Regression
- what's y?
- what's x?
- structural term
- stochastic term
y = regressand, dependent variable
x = regressor, independent variable
model: y = b0 + b1x + u
structural term: b0 + b1x describes the systematic influence of x on y
stochastic term: u (error term, disturbance term, noise) describes the non-systematic (random) influence on y
–> also covers measurement errors
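A minimal Python sketch of the two terms, assuming hypothetical values b0 = 2, b1 = 0.5 and normally distributed noise: the structural term is computed deterministically from x, the stochastic term u is drawn at random, and y is their sum.

```python
import random

def simulate_simple_regression(b0, b1, xs, noise_sd, seed=0):
    """Generate y = b0 + b1*x + u (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    structural = [b0 + b1 * x for x in xs]          # systematic influence of x on y
    noise = [rng.gauss(0.0, noise_sd) for _ in xs]  # stochastic term u (random influence)
    ys = [s + u for s, u in zip(structural, noise)]
    return ys, structural, noise

xs = [1, 2, 3, 4, 5]
ys, structural, noise = simulate_simple_regression(b0=2.0, b1=0.5, xs=xs, noise_sd=0.3)
for x, y, s, u in zip(xs, ys, structural, noise):
    print(f"x={x}  structural={s:.2f}  u={u:+.2f}  y={y:.2f}")
```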
OLS estimator
- why?
- how?
why? –> the true coefficients in the regression equation are unknown and have to be estimated from the sample
how?
–> selects the regression parameters so that the sum of squared residuals is minimized
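A minimal sketch of the OLS estimator for simple regression, using the standard closed-form solution of the least-squares problem: b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², b0 = ȳ - b1·x̄. The function name `ols_simple` and the data are hypothetical.

```python
def ols_simple(xs, ys):
    """OLS estimates (b0, b1) for y = b0 + b1*x + u, minimizing the sum of squared residuals."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))  # co-variation of x and y
    sxx = sum((x - x_mean) ** 2 for x in xs)                        # variation of x
    if sxx == 0:
        raise ValueError("x has no variation; the slope is not identified")
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean  # the OLS line passes through (x_mean, y_mean)
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_simple(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```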
sum of squared residuals
measure of the discrepancy between the data and an estimation model: SSR = Σ(actual y - predicted y)²
A small SSR indicates a tight fit of the model to the data.
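A small sketch computing the SSR for two candidate lines on the same hypothetical data. For these points the least-squares line is b0 = 0.05, b1 = 1.99, and it yields a much smaller SSR than an arbitrarily chosen line.

```python
def sum_squared_residuals(xs, ys, b0, b1):
    """SSR = sum of (actual y - predicted y)^2 for the line y_hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
ssr_ols = sum_squared_residuals(xs, ys, 0.05, 1.99)   # least-squares line for these data
ssr_other = sum_squared_residuals(xs, ys, 1.0, 1.5)   # arbitrary line, worse fit
print(f"SSR (OLS line) = {ssr_ols:.3f}, SSR (other line) = {ssr_other:.3f}")
```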
Multiple Linear Regression
- why?
- how?
- parameters
why?
–> more than one independent variable is needed to explain y
how?
–> select the regression parameters so that the sum of squared residuals is minimized
parameters:
- b0: intercept, constant term
- b1 to bm: effect sizes (slope coefficients of the m IVs)
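A sketch of multiple OLS in plain Python, assuming the standard matrix form: minimizing the SSR leads to the normal equations (X'X)b = X'y, solved here by Gauss-Jordan elimination. In practice a numerical library would be used; `ols_multiple` and the noise-free data are hypothetical, chosen so that the true parameters (1, 2, 0.5) are recovered.

```python
def ols_multiple(X, y):
    """OLS for y = b0 + b1*x1 + ... + bm*xm + u.

    X is a list of rows, one per observation, each holding the m regressor values.
    Returns [b0, b1, ..., bm] by solving the normal equations (X'X) b = X'y.
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [X'X | X'y]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect multicollinearity?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 0.5 * x2 for x1, x2 in X]  # exact relationship, no noise
print(ols_multiple(X, y))  # approximately [1.0, 2.0, 0.5]
```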
Interpretation of coefficients
- non-standardized
- standardized
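- non-standardized: if the IV is raised by 1 unit (other IVs held constant), the DV changes by the coefficient's value
- standardized: standardize first! beta_j = b_j * (SD x_j / SD y)
–> same as running the regression on IVs and DV standardized to mean 0 and variance 1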
standardized coefficients state how many SDs the DV changes per 1 SD increase in the IV.
–> “effect size”, comparable across IVs measured in different units
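A sketch converting an unstandardized coefficient into beta, using hypothetical data (hours studied vs. exam points) and sample standard deviations. It also checks that regressing the standardized DV on the standardized IV gives the same value.

```python
import statistics

def standardized_coefficient(b_j, xs_j, ys):
    """beta_j = b_j * SD(x_j) / SD(y)."""
    return b_j * statistics.stdev(xs_j) / statistics.stdev(ys)

def zscores(values):
    """Standardize to mean 0 and SD 1."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

def slope(xs, ys):
    """Simple OLS slope: sum of co-deviations / sum of squared x deviations."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

hours = [2, 4, 5, 7, 9]      # IV, hypothetical, in hours
score = [52, 60, 61, 70, 79]  # DV, hypothetical, in points
b = slope(hours, score)                           # points per extra hour
beta = standardized_coefficient(b, hours, score)  # SDs of score per 1 SD of hours
beta_z = slope(zscores(hours), zscores(score))    # regression on standardized variables
print(f"b = {b:.3f} points/hour, beta = {beta:.3f}, beta from z-scores = {beta_z:.3f}")
```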
t-test for coefficients
- H0
t-test: tests whether an individual predictor contributes significantly to explaining y
–> applied to each coefficient of a linear regression
H0: bj = 0
–> xj has no influence / effect on y.
–> test statistic t = bj / SE(bj); if H0 is rejected, bj differs significantly from 0, i.e. xj contributes to explaining y
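A sketch of the t-test for the slope in a simple regression, assuming the usual OLS standard error SE(b1) = sqrt(SSR / (n - 2) / Σ(x - x̄)²); in multiple regression the degrees of freedom become n - m - 1. The data are hypothetical, and 3.182 is the two-sided 5% critical value of the t distribution with 3 degrees of freedom, taken from a t table.

```python
import math

def slope_t_test(xs, ys):
    """Return (b1, SE(b1), t, df) for H0: b1 = 0 in y = b0 + b1*x + u."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b0 = my - b1 * mx
    ssr = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2                         # observations minus estimated parameters
    se_b1 = math.sqrt(ssr / df / sxx)  # standard error of the slope
    return b1, se_b1, b1 / se_b1, df

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, se, t, df = slope_t_test(xs, ys)
T_CRIT = 3.182  # two-sided 5% critical value for df = 3 (t table)
print(f"b1 = {b1:.3f}, SE = {se:.4f}, t = {t:.2f}, df = {df}")
print("reject H0: b1 = 0" if abs(t) > T_CRIT else "cannot reject H0")
```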
Goodness of fit
- How well does the … … the data?
- variance decomp
- Coeff of determination
- Interpretation
- How well does the MODEL FIT the data?
- total = explained + residual
Σ(y - ȳ)² = Σ(ŷ - ȳ)² + Σ(y - ŷ)²   (y = actual, ŷ = predicted, ȳ = mean of y)
- R² = explained var / total var = Σ(ŷ - ȳ)² / Σ(y - ȳ)²
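adjusted R² corrects R² for the number of IVs m (n = sample size): R²adj = 1 - (1 - R²)(n - 1)/(n - m - 1)
–> adding IVs never lowers R², even if they are irrelevant; R²adj penalizes extra IVs and can decrease
–> R² lies between 0 and 1; the higher R², the better the fit (if R² = 1, then all residuals are 0)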
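A sketch of the variance decomposition, R² and adjusted R² for an OLS fit with intercept (the decomposition holds exactly only in that case), on hypothetical data with n = 5 and m = 1.

```python
def variance_decomposition(ys, y_hat):
    """Return (total, explained, residual) sums of squares."""
    y_mean = sum(ys) / len(ys)
    total = sum((y - y_mean) ** 2 for y in ys)                # Σ(y - ȳ)²
    explained = sum((yh - y_mean) ** 2 for yh in y_hat)       # Σ(ŷ - ȳ)²
    residual = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # Σ(y - ŷ)²
    return total, explained, residual

def adjusted_r_squared(r2, n, m):
    """Adjusted R² for n observations and m IVs."""
    return 1 - (1 - r2) * (n - 1) / (n - m - 1)

# Hypothetical data and their simple OLS fit (m = 1 IV)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b0 = my - b1 * mx
y_hat = [b0 + b1 * x for x in xs]

total, explained, residual = variance_decomposition(ys, y_hat)
r2 = explained / total
r2_adj = adjusted_r_squared(r2, n=len(xs), m=1)
print(f"total = {total:.3f}, explained + residual = {explained + residual:.3f}")
print(f"R² = {r2:.4f}, adjusted R² = {r2_adj:.4f}")
```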
Omitted variable bias
- -> the true model has 2 or more IVs, but fewer are used in the estimated model
- -> when a variable that is correlated with y and with an included IV is omitted, then its effect is wrongly attributed to that included IV (its coefficient is biased)
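A small noise-free simulation, assuming a hypothetical true model y = 1 + 2·x1 + 3·x2 in which x2 is correlated with x1. Leaving out x2 inflates the slope of x1; in-sample, the short-regression slope equals b1 + b2·δ, where δ is the slope from regressing x2 on x1 (the standard omitted-variable-bias formula).

```python
def slope(xs, ys):
    """Simple OLS slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]                          # correlated with x1
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]  # hypothetical true model, no noise

b1_short = slope(x1, y)  # model that omits x2
delta = slope(x1, x2)    # auxiliary regression of x2 on x1
print(f"true b1 = 2, estimated b1 without x2 = {b1_short:.3f}")
print(f"b1 + b2 * delta = {2 + 3 * delta:.3f}")
```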