L5 - Inferences from Predictions & Regressions Flashcards
When presented with 2 or more IVs for a prediction, what are the two possible ways to approach the analysis?
- Separate simple regressions (one per IV)
- Multiple regression!
Why would multiple regression be preferred over simple regression when there are multiple IVs?
By doing the regression as a multiple regression, the method of least squares partials out any joint, overlapping effects on the DV among the set of IVs.
(This overlap can be a problem when the IVs correlate with each other.)
What were some things that went wrong when performing a simple regression for multiple IVs?
- R Squared values from the separate simple regressions, when added together, were much higher than the single R Squared value obtained from multiple regression
- Regression coefficient (b) values for each IV in the multiple regression were not the same as the corresponding separate simple regression coefficients
How does multiple regression adjust the predictive effect of each IV so that there is no overlap?
The method of least squares adjusts the predictive effect of each IV in multiple regression for its CORRELATIONAL OVERLAP with the other IVs by PARTIALLING IT OUT.
Each partial regression coefficient indicates the optimal strength of prediction for each IV, CONTROLLING for the effects of all other IVs in the model.
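A minimal Python sketch of the two cards above, using made-up data (the variable names and numbers are illustrative, not from the course): two separate simple regressions on correlated IVs, then one multiple regression whose least-squares fit partials the overlap out.

```python
# Made-up data: two correlated IVs, both contributing to the DV.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(scale=0.7, size=n)  # x2 overlaps with x1
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)   # DV depends on both IVs

# Separate simple regressions: each coefficient absorbs the shared variance.
r2_simple = []
for x in (x1, x2):
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    print(f"simple b = {fit.params[1]:.3f}, R^2 = {fit.rsquared:.3f}")
    r2_simple.append(fit.rsquared)

# One multiple regression: least squares partials the overlap out, giving
# partial regression coefficients and a single R^2.
multi = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(f"partial b's = {multi.params[1:].round(3)}, R^2 = {multi.rsquared:.3f}")
print(f"sum of simple R^2 = {sum(r2_simple):.3f}  (> multiple R^2)")
```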
What are the two ways we can make inferences from a linear regression?
- Null Hypothesis testing
- Confidence intervals
What is the corresponding population parameter to R Squared?
ρ²
(Rho) squared.
What does the null hypothesis test for R Squared use?
ANOVA table
How is mean square calculated?
MS = (the relevant) SS / df
What can be found in the ANOVA table?
- SS for reg, res, tot
- Df for reg, res, tot
- MS for reg, res, tot
- F test statistic (T_obs)
- p value
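To make those entries concrete, here is a sketch that hand-computes every quantity in the list for a fabricated two-IV regression (scipy's f.sf is the real function used for the p value; the data are invented):

```python
# Hand-computing the ANOVA table entries for a multiple regression.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k = 100, 2                                  # n cases, k IVs
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([2.0, 1.0, 0.5]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)      # least-squares coefficients
y_hat = X @ b

ss_tot = np.sum((y - y.mean()) ** 2)           # SS tot
ss_res = np.sum((y - y_hat) ** 2)              # SS res
ss_reg = ss_tot - ss_res                       # SS reg

df_reg, df_res, df_tot = k, n - k - 1, n - 1
ms_reg, ms_res = ss_reg / df_reg, ss_res / df_res  # MS = SS / df

F = ms_reg / ms_res                            # F test statistic (T_obs)
p = stats.f.sf(F, df_reg, df_res)              # p value from F distribution
print(f"F({df_reg}, {df_res}) = {F:.2f}, p = {p:.4g}")
```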
What can be found in the model summary statistics table?
- R (Multiple R)
- R square
- Adjusted R square
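A quick sketch (fabricated data) pulling these three quantities from a fitted statsmodels model; rsquared and rsquared_adj are genuine statsmodels attributes, and Multiple R is simply the square root of R square.

```python
# Model summary quantities from a statsmodels OLS fit on made-up data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 2))
y = X @ np.array([1.0, 0.5]) + rng.normal(size=80)
fit = sm.OLS(y, sm.add_constant(X)).fit()

print("Multiple R:       ", round(np.sqrt(fit.rsquared), 3))
print("R square:         ", round(fit.rsquared, 3))
print("Adjusted R square:", round(fit.rsquared_adj, 3))
```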
On what distribution is the F statistic from the ANOVA table modelled?
F theoretical probability distribution
What is the formula for the F statistic (T_obs) in ANOVA?
F = MS reg / MS res
By looking at this, we can tell that the F statistic is just a ratio of the mean square for regression to the mean square for residuals… hence it compares the AVERAGE AMOUNT OF VARIATION explained by the model with the average amount left unexplained.
What is the shape of an F distribution defined by?
2 parameters: numerator DF (df reg), denominator DF (df res).
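A small illustration that the F distribution's shape, and hence its critical values, is set entirely by those two df parameters (the df pairs below are chosen arbitrarily):

```python
# 5% critical values of the F distribution for a few (df_reg, df_res) pairs.
from scipy import stats

for df_reg, df_res in [(2, 20), (2, 200), (5, 20)]:
    crit = stats.f.ppf(0.95, df_reg, df_res)  # 95th percentile = 5% cutoff
    print(f"df_reg = {df_reg}, df_res = {df_res}: F_crit = {crit:.2f}")
```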
What does the precision of a CI on R squared depend on?
- size of sample: bigger sample = more precise
- number of IVs: fewer IVs = more precise
- size of observed R square: larger R square = more precise
How do we obtain a CI for Multiple R?
By taking the square root of the upper and lower bounds of the CI for R square, since R = the square root of R square.
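The cards don't say how the CI for R square itself is computed; one common option (not necessarily the course's method) is a percentile bootstrap, sketched here with fabricated data, after which the bounds are square-rooted to get the CI for Multiple R.

```python
# Percentile-bootstrap CI for R^2, then square-root the bounds to get a CI
# for Multiple R. Data and sample sizes are made up for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 150
X = rng.normal(size=(n, 2))
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

def r_squared(Xb, yb):
    return sm.OLS(yb, sm.add_constant(Xb)).fit().rsquared

boot = [r_squared(X[idx], y[idx])
        for idx in (rng.integers(0, n, size=n) for _ in range(2000))]
lo, hi = np.percentile(boot, [2.5, 97.5])  # 95% CI for R^2
print(f"R^2 CI:        [{lo:.3f}, {hi:.3f}]")
print(f"Multiple R CI: [{np.sqrt(lo):.3f}, {np.sqrt(hi):.3f}]")
```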