W5: RQ for Predictions 2 Flashcards
What is a full regression equation involving 2 IVs
Yi = a + b1X1i + b2X2i + ei
- b1 and b2
- partial regression coefficients
- a
- intercept
What is a model regression equation involving 2 IVs
Y^i = a + b1X1i + b2X2i
What is an intercept in a regression equation. What is it signified by
- a
- Expected score on DV when all IVs = 0
What is a partial regression coefficient in a regression equation. What is it signified by
- b1, b2,…,
- Expected change in DV for each unit change in IV, holding constant scores on all other IV
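The two-IV model above can be fitted by OLS with plain NumPy. This is a minimal sketch on simulated data, generated so the true coefficients are known (all names and values are illustrative):

```python
import numpy as np

# Hypothetical data: true model is Y = 1.0 + 2.0*X1 - 0.5*X2 + error
rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + rng.normal(scale=0.1, size=n)

# Design matrix: a column of ones for the intercept a, then the two IVs
X = np.column_stack([np.ones(n), X1, X2])

# Least-squares solution gives a, b1, b2 in one call
a, b1, b2 = np.linalg.lstsq(X, Y, rcond=None)[0]
```

With this much data and little noise, the estimates land close to the generating values 1.0, 2.0 and -0.5.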
What is the sum of squares decomposition in a regression equation. What is SStotal made up of
SStotal = SSreg + SSres
- OLS guarantees SSreg will be as large as possible because it ensures SSres is as small as possible
- OLS is optimal in this regard
What is the aim of OLS. Use sum of squares to explain
Maximize SSreg; Minimize SSres
What is R^2. Alternative name. Is it an effect size?
- Coefficient of Determination
- Effect size
- Strength of prediction in a regression analysis
- R^2 = SSreg/SStotal
- Proportion of SStotal accounted for by SSreg
- Proportion of variance in DV that is predictable from IVs
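The decomposition SStotal = SSreg + SSres and the definition R^2 = SSreg/SStotal can be verified numerically. A minimal sketch on simulated data (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X1, X2 = rng.normal(size=(2, n))
Y = 3.0 + 1.5 * X1 + 0.8 * X2 + rng.normal(size=n)

# Fit by OLS and get predicted DV scores
X = np.column_stack([np.ones(n), X1, X2])
Y_hat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]

# Sum of squares decomposition
ss_total = np.sum((Y - Y.mean()) ** 2)
ss_res = np.sum((Y - Y_hat) ** 2)      # minimized by OLS
ss_reg = np.sum((Y_hat - Y.mean()) ** 2)

# Proportion of SStotal accounted for by SSreg
r_squared = ss_reg / ss_total
```

For OLS with an intercept, ss_total equals ss_reg + ss_res up to floating-point error, and r_squared falls between 0 and 1.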
What is the range of R^2
0 to 1.
Closer to 1 = Stronger = Greater proportion of DV explained by regression model (IVs) = Greater SSreg
To calculate a confidence interval on R^2, what are the 4 things we require
- Estimated R2 value
- Numerator df on F statistic (no. of IVs)
- Denominator df on F statistic (n-IVs-1)
- Desired confidence level
- Smaller width = Greater precision
Under which conditions will the observed R-squared value be most biased (all other aspects being equal)?
When sample size is small and the number of IVs is large
Typically, what is the bias/consistency of R^2
OLS estimate of R^2 is biased (often overestimated), but consistent (as sample size increases, it gets increasingly close to the true population value)
- Note
- OLS estimates of the slopes are unbiased, but the estimate of R^2 is biased
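A small simulation can illustrate both properties. With IVs that are truly unrelated to the DV (population R^2 = 0), the average sample R^2 is still well above zero for small n, but shrinks toward the true value as n grows. All data here are synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)
k = 5  # number of IVs; the DV is unrelated to them, so the true population R^2 is 0

def mean_r2(n, reps=200):
    """Average sample R^2 over `reps` null-model datasets of size n."""
    vals = []
    for _ in range(reps):
        X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
        Y = rng.normal(size=n)
        Y_hat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
        vals.append(1 - np.sum((Y - Y_hat) ** 2) / np.sum((Y - Y.mean()) ** 2))
    return float(np.mean(vals))

r2_small, r2_large = mean_r2(20), mean_r2(500)
# Bias: under the null, E[R^2] is roughly k/(n-1), so the small-n average
# overestimates 0 badly; consistency: the large-n average sits near 0
```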
How is an adjusted R2 better than R2
It is usually less biased, but we should always report both values
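Adjusted R^2 can be computed from R^2, the sample size n, and the number of IVs k with the usual formula 1 - (1 - R^2)(n - 1)/(n - k - 1). A minimal sketch:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2: penalises R^2 for the number of IVs (k) given sample size n."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# The penalty is large with few cases and many IVs, and small with many cases
adj_small = adjusted_r2(0.30, n=25, k=5)   # ~0.116
adj_large = adjusted_r2(0.30, n=500, k=5)  # ~0.293
```

Both adjusted values sit below the raw R^2 of 0.30, which is why both should be reported.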
What are the two ways we can make meaningful comparison between IVs in a multiple regression
- Transform regression coefficient to standardised partial regression coefficient
- Z-scores
- Semi-Partial and Squared Semi-Partial Correlations
Making meaningful comparison between IVs in a multiple regression: Method 1. When interpreting coefficients, what is the difference. When is this method useful?
- Coefficients are interpreted in SD units
- Example
- A one SD increase in variable X is associated with a 0.5 SD decrease in variable Y, holding constant scores on all other IVs
- No intercept (it becomes 0)
- Especially useful when IVs have arbitrary or different scalings
Making meaningful comparison between IVs in a multiple regression: Method 1. What is the intercept
Always 0
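This can be checked numerically: z-scoring the DV and all IVs before fitting forces the intercept to 0 and puts the slopes in SD units, making them comparable across IVs with different scalings (synthetic data, illustrative names):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 150
X1 = rng.normal(10, 2, size=n)     # IVs on arbitrary, different scales
X2 = rng.normal(100, 15, size=n)
Y = 5 + 0.8 * X1 + 0.05 * X2 + rng.normal(size=n)

# z-score every variable, then refit by OLS
z = lambda v: (v - v.mean()) / v.std(ddof=1)
Xz = np.column_stack([np.ones(n), z(X1), z(X2)])
a, beta1, beta2 = np.linalg.lstsq(Xz, z(Y), rcond=None)[0]
# a is 0 because every variable now has mean 0;
# beta1 and beta2 are standardised partial regression coefficients
```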
Making meaningful comparison between IVs in a multiple regression: Semi-Partial and Squared semi-partial Correlations. Overview.
- Semi-partial correlations
- Correlation between DV and each focal IV, when effects of other IVs have been removed from that focal IV
- Correlation between observed (not predicted) DV and scores on focal IV that is not accounted for by all other IV in the regression analysis
- Effect size estimate
- Squared semi-partial correlation
- Proportion of variation in observed (not predicted) DV uniquely explained by each IV
- Directly analogous to R2 of the overall model
Making meaningful comparison between IVs in a multiple regression: What is the SQUARED semipartial correlation? What is the analogous to?
- Indicates proportion of variation in DV UNIQUELY explained by each IV.
- Directly analogous to R^2 in regression model but telling about each IV on its own.
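One common way to obtain the squared semipartial correlation for an IV is as the drop in R^2 when that IV is removed from the model (full R^2 minus the R^2 of the reduced model). A sketch on simulated data:

```python
import numpy as np

def r2(X, Y):
    """R^2 of an OLS fit of Y on X (X includes the intercept column)."""
    Y_hat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    return 1 - np.sum((Y - Y_hat) ** 2) / np.sum((Y - Y.mean()) ** 2)

rng = np.random.default_rng(4)
n = 300
X1, X2 = rng.normal(size=(2, n))
Y = 2.0 * X1 + 1.0 * X2 + rng.normal(size=n)

ones = np.ones(n)
full = r2(np.column_stack([ones, X1, X2]), Y)

# Squared semipartial for X1: R^2 lost when X1 is dropped from the model,
# i.e. the proportion of DV variance uniquely explained by X1
sr2_x1 = full - r2(np.column_stack([ones, X2]), Y)
```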
What are the 4 statistical assumptions underlying the linear regression model
- Independence of Observations
- Linearity
- Constant variance of residuals/ homoscedastiscity
- (Not homogeneity)
- Normality of Residual Scores
Statistical Assumption Linear Regression: Independence of Observations. How do we meet this
- Scores are not duplicated to inflate the sample size
- One person's responses do not determine or influence another person's responses
Statistical Assumption Linear Regression: Linearity.
(1) What is it (2) How do we meet this
- Scores on the dependent variable are an additive linear function of scores on the set of independent variables
- Scatterplot Matrix
- Residual Plot/ Scatterplot of Residual Scores
- Marginal Model Plots
- Marginal Conditional Plots
Statistical Assumption Linear Regression: Linearity. 1. Scatterplot Matrix
- Y-Axis
- DV
- X-Axis
- IVs
- Look for U-Shapes/Inverted U-shapes
Statistical Assumption Linear Regression: Linearity. 2. Scatterplot of Residual Scores
- Y-Axis
- Residual Scores
- X-Axis
- Observed IV Scores
- Predicted DV Scores
Statistical Assumption Linear Regression: Linearity. Marginal Model Plot
- Y-Axis
- Observed DV Scores
- X-Axis
- Observed IV Scores
- Predicted DV Scores
Statistical Assumption Linear Regression: Marginal Conditional Plot. Why is it especially good
- Y-Axis
- Predicted DV scores
- X-Axis
- Observed IV Scores
- _Conditional regression slope_ shows the partial regression line after the other IVs are partialled out of both the focal IV and the DV
Statistical Assumption Linear Regression: Homoscedasticity, What is it. What are the 2 ways
Constant residual variance for different predicted values of the dependent variable.
1) Residual Plots (similar to linearity)
2) Breusch-Pagan Test (ncvTest)
Statistical Assumption Linear Regression: Homoscedasticity, Residual Plots
- Y-Axis
- Residual Scores
- X-Axis
- Observed IV Scores
- Predicted DV Scores
- Examine fanning out
Statistical Assumption Linear Regression: Homoscedasticity, Breusch-Pagan Test Overview
- Null-hypothesis test that assumes the variance of the residuals is constant/homoscedastic
- Can be run for:
- Individual IVs
- IVs together
- Regression model
- Usually evidence from the residual plots and from the ncvTest results is consistent, but it may not be; report both if so
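The Breusch-Pagan statistic itself is simple to compute by hand: regress the squared residuals on the IVs and take n times the R^2 of that auxiliary regression, referred to a chi-square distribution with df equal to the number of IVs. A sketch on simulated homoscedastic data (illustrative names; this is the classic LM form, not the exact ncvTest variant):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
n = 200
X1, X2 = rng.normal(size=(2, n))
Y = 1 + 2 * X1 - 0.5 * X2 + rng.normal(size=n)  # constant error variance

X = np.column_stack([np.ones(n), X1, X2])
resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]

# Auxiliary regression: squared residuals on the IVs
u = resid ** 2
u_hat = X @ np.linalg.lstsq(X, u, rcond=None)[0]
aux_r2 = 1 - np.sum((u - u_hat) ** 2) / np.sum((u - u.mean()) ** 2)

lm_stat = n * aux_r2
p_value = chi2.sf(lm_stat, df=2)  # df = number of IVs
# A small p-value is evidence against constant residual variance
```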
Statistical Assumption Linear Regression: Homoscedasticity, Breusch-Pagan Test. What happens when the P value is small
We have reason to reject the assumption of constant variance for the residuals
Statistical Assumption Linear Regression: Normality of Residuals. What are the 3 ways
- qqplot
- Histogram
- boxplot
Statistical Assumption Linear Regression: Normality of Residuals. qqplot. What does it contain
- Middle line: Strict normality
- Side Lines: Confidence envelope
- Check that the residuals fall inside the confidence envelope
Statistical Assumption Linear Regression: Normality of Residuals. What are studentized residuals.
Define Outliers and Influential Cases.
How are they checked?
Studentized Residuals = Standardized form of residual values, where the SD is one
- Outliers (measured by studentized residual value)
- Very large studentized residual value in absolute terms
- Usually about 3
- Influential cases (measured by Cook's D)
- If removed from the analysis, the regression coefficients change notably in value
- Both are checked with an influence index plot, which labels the largest Cook's D and the largest absolute studentized residual values
- Note: An outlier IS NOT NECESSARILY influential
Statistical Assumption Linear Regression: Normality of Residuals. How do we find influential cases
Cook’s D
- Measure of overly influential values
- Range: >0
- Problem: >1
Statistical Assumption Linear Regression: Normality of Residuals. How do we find outliers
- Very large studentized residual value in absolute terms
- Usually about 3
What does “Over the Long Run” mean in CI
Repeated sampling of a population and 95% of these CIs will capture the true population parameter value.
What indicates the precision of R^2
Width of the interval between lower and upper bound. (Smaller width = More precise)
What are the factors that make the observed R^2 much larger than the true population value
1) Small samples 2) Many IVs
What is the size of each partial regression coefficient determined by
Scaling/metric of each IV (If scaling differ = cannot use relative size of b values to say which is a stronger predictor)
Statistical Assumption Linear Regression: Normality of Residuals. How do we find outliers
Studentized residuals (absolute value around 3 or more is problematic)
What is a studentized residual
A particular form of standardized residual