W5: RQ for Predictions 2 Flashcards
What is a full regression equation involving 2 IVs
Yi = a + b1X1i + b2X2i + ei
- b1 and b2
- partial regression coefficients
- a
- intercept
What is a model regression equation involving 2 IVs
Y^i = a + b1X1i + b2X2i
What is an intercept in a regression equation. What is it signified by
- a
- Expected score on DV when all IVs = 0
What is a partial regression coefficient in a regression equation. What is it signified by
- b1, b2,…,
- Expected change in DV for each unit change in IV, holding constant scores on all other IV
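The two-IV model above can be fitted by OLS with plain NumPy. This is a minimal sketch on simulated data, generated so the true coefficients are known (all names and values are illustrative):

```python
import numpy as np

# Hypothetical data: true model is Y = 1.0 + 2.0*X1 - 0.5*X2 + error
rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + rng.normal(scale=0.1, size=n)

# Design matrix: a column of ones for the intercept a, then the two IVs
X = np.column_stack([np.ones(n), X1, X2])

# Least-squares solution gives a, b1, b2 in one call
a, b1, b2 = np.linalg.lstsq(X, Y, rcond=None)[0]
```

With this much data and little noise, the estimates land close to the generating values 1.0, 2.0 and -0.5.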
What is the sum of squares decomposition in a regression equation. What is SStotal made up of
SStotal = SSreg + SSres
- OLS guarantees SSreg will be as large as possible because it ensures SSres is as small as possible
- OLS is optimal in this regard
What is the aim of OLS. Use sum of squares to explain
Maximize SSreg; Minimize SSres
What is R^2. Alternative name. Is it an effect size?
- Coefficient of Determination
- Effect size
- Strength of prediction in a regression analysis
- R^2 = SSreg/SStotal
- Proportion of SStotal accounted for by SSreg
- Proportion of variance in DV that is predictable from IVs
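The decomposition SStotal = SSreg + SSres and the definition R^2 = SSreg/SStotal can be verified numerically. A minimal sketch on simulated data (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X1, X2 = rng.normal(size=(2, n))
Y = 3.0 + 1.5 * X1 + 0.8 * X2 + rng.normal(size=n)

# Fit by OLS and get predicted DV scores
X = np.column_stack([np.ones(n), X1, X2])
Y_hat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]

# Sum of squares decomposition
ss_total = np.sum((Y - Y.mean()) ** 2)
ss_res = np.sum((Y - Y_hat) ** 2)      # minimized by OLS
ss_reg = np.sum((Y_hat - Y.mean()) ** 2)

# Proportion of SStotal accounted for by SSreg
r_squared = ss_reg / ss_total
```

For OLS with an intercept, ss_total equals ss_reg + ss_res up to floating-point error, and r_squared falls between 0 and 1.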
What is the range of R^2
0 to 1.
Closer to 1 = Stronger = Greater proportion of DV explained by regression model (IVs) = Greater SSreg
To calculate a confidence interval on R^2, what are the 4 things we require
- Estimated R2 value
- Numerator df on F statistic (no. of IVs)
- Denominator df on F statistic (n-IVs-1)
- Desired confidence level
- Smaller width = Greater precision
Under which conditions will the observed R-squared value be most biased (all other aspects being equal)?
When sample size is small and the number of IVs is large
Typically, what is the bias/consistency of R^2
OLS estimate of R^2 is biased (often overestimated), but consistent (as sample size increases, it gets increasingly close to the true population value)
- Note
- OLS estimates of the slopes are unbiased, but the estimate of R^2 is biased
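A small simulation can illustrate both properties. With IVs that are truly unrelated to the DV (population R^2 = 0), the average sample R^2 is still well above zero for small n, but shrinks toward the true value as n grows. All data here are synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)
k = 5  # number of IVs; the DV is unrelated to them, so the true population R^2 is 0

def mean_r2(n, reps=200):
    """Average sample R^2 over `reps` null-model datasets of size n."""
    vals = []
    for _ in range(reps):
        X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
        Y = rng.normal(size=n)
        Y_hat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
        vals.append(1 - np.sum((Y - Y_hat) ** 2) / np.sum((Y - Y.mean()) ** 2))
    return float(np.mean(vals))

r2_small, r2_large = mean_r2(20), mean_r2(500)
# Bias: under the null, E[R^2] is roughly k/(n-1), so the small-n average
# overestimates 0 badly; consistency: the large-n average sits near 0
```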
How is an adjusted R2 better than R2
It is usually less biased, but we should always report both values
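Adjusted R^2 can be computed from R^2, the sample size n, and the number of IVs k with the usual formula 1 - (1 - R^2)(n - 1)/(n - k - 1). A minimal sketch:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2: penalises R^2 for the number of IVs (k) given sample size n."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# The penalty is large with few cases and many IVs, and small with many cases
adj_small = adjusted_r2(0.30, n=25, k=5)   # ~0.116
adj_large = adjusted_r2(0.30, n=500, k=5)  # ~0.293
```

Both adjusted values sit below the raw R^2 of 0.30, which is why both should be reported.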
What are the two ways we can make meaningful comparison between IVs in a multiple regression
- Transform regression coefficient to standardised partial regression coefficient
- Z-scores
- Semi-Partial and Squared Semi-Partial Correlations
Making meaningful comparison between IVs in a multiple regression: Method 1. When interpreting coefficients, what is the difference. When is this method useful?
- Coefficients are interpreted in SD units
- Example
- A one SD increase in variable X is associated with a 0.5 SD decrease in variable Y, holding constant scores on all other IVs
- No intercept (it becomes 0)
- Especially useful when IVs have arbitrary or different scalings
Making meaningful comparison between IVs in a multiple regression: Method 1. What is the intercept
Always 0
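This can be checked numerically: z-scoring the DV and all IVs before fitting forces the intercept to 0 and puts the slopes in SD units, making them comparable across IVs with different scalings (synthetic data, illustrative names):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 150
X1 = rng.normal(10, 2, size=n)     # IVs on arbitrary, different scales
X2 = rng.normal(100, 15, size=n)
Y = 5 + 0.8 * X1 + 0.05 * X2 + rng.normal(size=n)

# z-score every variable, then refit by OLS
z = lambda v: (v - v.mean()) / v.std(ddof=1)
Xz = np.column_stack([np.ones(n), z(X1), z(X2)])
a, beta1, beta2 = np.linalg.lstsq(Xz, z(Y), rcond=None)[0]
# a is 0 because every variable now has mean 0;
# beta1 and beta2 are standardised partial regression coefficients
```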
Making meaningful comparison between IVs in a multiple regression: Semi-Partial and Squared semi-partial Correlations. Overview.
- Semi-partial correlations
- Correlation between DV and each focal IV, when effects of other IVs have been removed from that focal IV
- Correlation between observed (not predicted) DV and scores on focal IV that is not accounted for by all other IV in the regression analysis
- Effect size estimate
- Squared semi-partial correlation
- Proportion of variation in observed (not predicted) DV uniquely explained by each IV
- Directly analogous to R2 of the overall model
Making meaningful comparison between IVs in a multiple regression: What is the SQUARED semipartial correlation? What is the analogous to?
- Indicates proportion of variation in DV UNIQUELY explained by each IV.
- Directly analogous to R^2 in regression model but telling about each IV on its own.
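One common way to obtain the squared semipartial correlation for an IV is as the drop in R^2 when that IV is removed from the model (full R^2 minus the R^2 of the reduced model). A sketch on simulated data:

```python
import numpy as np

def r2(X, Y):
    """R^2 of an OLS fit of Y on X (X includes the intercept column)."""
    Y_hat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    return 1 - np.sum((Y - Y_hat) ** 2) / np.sum((Y - Y.mean()) ** 2)

rng = np.random.default_rng(4)
n = 300
X1, X2 = rng.normal(size=(2, n))
Y = 2.0 * X1 + 1.0 * X2 + rng.normal(size=n)

ones = np.ones(n)
full = r2(np.column_stack([ones, X1, X2]), Y)

# Squared semipartial for X1: R^2 lost when X1 is dropped from the model,
# i.e. the proportion of DV variance uniquely explained by X1
sr2_x1 = full - r2(np.column_stack([ones, X2]), Y)
```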
What are the 4 statistical assumptions underlying the linear regression model
- Independence of Observations
- Linearity
- Constant variance of residuals/ homoscedastiscity
- (Not homogeneity)
- Normality of Residual Scores
Statistical Assumption Linear Regression: Independence of Observations. How do we meet this
- Scores are not duplicated to inflate the sample size
- One person's responses do not determine or influence another person's responses
Statistical Assumption Linear Regression: Linearity.
(1) What is it (2) How do we meet this
- Scores on the dependent variable are an additive linear function of scores on the set of independent variables
- Scatterplot Matrix
- Residual Plot/ Scatterplot of Residual Scores
- Marginal Model Plots
- Marginal Conditional Plots
Statistical Assumption Linear Regression: Linearity. 1. Scatterplot Matrix
- Y-Axis
- DV
- X-Axis
- IVs
- Look for U-Shapes/Inverted U-shapes
Statistical Assumption Linear Regression: Linearity. 2. Scatterplot of Residual Scores
- Y-Axis
- Residual Scores
- X-Axis
- Observed IV Scores
- Predicted DV Scores
Statistical Assumption Linear Regression: Linearity. Marginal Model Plot
- Y-Axis
- Observed DV Scores
- X-Axis
- Observed IV Scores
- Predicted DV Scores
Statistical Assumption Linear Regression: Marginal Conditional Plot. Why is it especially good
- Y-Axis
- Predicted DV scores
- X-Axis
- Observed IV Scores
- _Conditional regression slope_ shows the partial regression line after the other IVs are partialled out of both the focal IV and the DV
Statistical Assumption Linear Regression: Homoscedasticity, What is it. What are the 2 ways
Constant residual variance for different predicted values of the dependent variable.
1) Residual Plots (similar to linearity)
2) Breusch-Pagan Test (ncvTest)
Statistical Assumption Linear Regression: Homoscedasticity, Residual Plots
- Y-Axis
- Residual Scores
- X-Axis
- Observed IV Scores
- Predicted DV Scores
- Examine fanning out
Statistical Assumption Linear Regression: Homoscedasticity, Breusch-Pagan Test Overview
- Null-hypothesis test that assumes the variance of the residuals is constant/homoscedastic
- Can be run for:
- Individual IVs
- IVs together
- Regression model
- Usually evidence from the residual plots and from the ncvTest results is consistent, but it may not be; report both if so
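The Breusch-Pagan statistic itself is simple to compute by hand: regress the squared residuals on the IVs and take n times the R^2 of that auxiliary regression, referred to a chi-square distribution with df equal to the number of IVs. A sketch on simulated homoscedastic data (illustrative names; this is the classic LM form, not the exact ncvTest variant):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
n = 200
X1, X2 = rng.normal(size=(2, n))
Y = 1 + 2 * X1 - 0.5 * X2 + rng.normal(size=n)  # constant error variance

X = np.column_stack([np.ones(n), X1, X2])
resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]

# Auxiliary regression: squared residuals on the IVs
u = resid ** 2
u_hat = X @ np.linalg.lstsq(X, u, rcond=None)[0]
aux_r2 = 1 - np.sum((u - u_hat) ** 2) / np.sum((u - u.mean()) ** 2)

lm_stat = n * aux_r2
p_value = chi2.sf(lm_stat, df=2)  # df = number of IVs
# A small p-value is evidence against constant residual variance
```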
Statistical Assumption Linear Regression: Homoscedasticity, Breusch-Pagan Test. What happens when the P value is small
We have reason to reject the assumption of constant variance for the residuals
Statistical Assumption Linear Regression: Normality of Residuals. What are the 3 ways
- qqplot
- Histogram
- boxplot
Statistical Assumption Linear Regression: Normality of Residuals. qqplot. What does it contain
- Middle line: Strict normality
- Side Lines: Confidence envelope
- Check that the residuals fall inside the confidence envelope
Statistical Assumption Linear Regression: Normality of Residuals. What are studentized residuals.
Define Outliers and Influential Cases.
How are they checked?
Studentized Residuals = Standardized form of residual values, where the SD is one
- Outliers (measured by studentized residual value)
- Very large studentized residual value in absolute terms
- Usually about 3
- Influential cases (measured by Cook's D)
- If removed from the analysis, the regression coefficients change notably in value
- Both are checked with an influence index plot, which labels the largest Cook's D and the largest absolute studentized residual values
- Note: An outlier IS NOT NECESSARILY influential
Statistical Assumption Linear Regression: Normality of Residuals. How do we find influential cases
Cook’s D
- Measure of overly influential values
- Range: >0
- Problem: >1
Statistical Assumption Linear Regression: Normality of Residuals. How do we find outliers
- Very large studentized residual value in absolute terms
- Usually about 3
What does “Over the Long Run” mean in CI
Repeated sampling of a population and 95% of these CIs will capture the true population parameter value.
What indicates the precision of R^2
Width of the interval between lower and upper bound. (Smaller width = More precise)
What are the factors that make the observed R^2 much larger than the true population value
1) Small samples 2) Many IVs
What is the size of each partial regression coefficient determined by
Scaling/metric of each IV (If scaling differ = cannot use relative size of b values to say which is a stronger predictor)
Statistical Assumption Linear Regression: Normality of Residuals. How do we find outliers
Studentized residuals (absolute value around 3 or more is problematic)
What is a studentized residual
A particular form of standardized residual