W5: RQ for Predictions 2 Flashcards

1
Q

What is a full regression equation involving 2 IVs?

A

Yi = a + b1X1i + b2X2i + ei

  • b1 and b2
    • partial regression coefficients
  • a
    • intercept
2
Q

What is a model regression equation involving 2 IVs?

A

Ŷi = a + b1X1i + b2X2i

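A minimal R sketch of fitting this two-IV model by OLS (the data frame dat and the variables Y, X1, X2 are hypothetical names):

```r
# Fit the model regression equation by OLS
fit <- lm(Y ~ X1 + X2, data = dat)

coef(fit)       # a (intercept) and b1, b2 (partial regression coefficients)
fitted(fit)     # predicted scores (Y-hat)
residuals(fit)  # e_i = observed Y_i minus predicted Y-hat_i
```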
3
Q

What is the intercept in a regression equation? What is it signified by?

A
  • a
  • Expected score on DV when all IVs = 0
4
Q

What is a partial regression coefficient in a regression equation? What is it signified by?

A
  • b1, b2, …
  • Expected change in DV for each unit change in an IV, holding constant scores on all other IVs
5
Q

What is the sum of squares decomposition in a regression equation? What is SStotal indicated by?

A

SStotal = SSreg + SSres

  • OLS guarantees SSreg will be as large as possible because it ensures SSres is as small as possible!
    • Unbiased in this regard
6
Q

What is the aim of OLS? Use the sums of squares to explain.

A

Maximize SSreg; Minimize SSres

7
Q

What is R^2? What is its alternative name? Is it an effect size?

A
  • Coefficient of Determination
    • Effect size
  • Strength of prediction in a regression analysis
  • R^2 = SSreg/SStotal
    • Proportion of SStotal accounted for by SSreg
    • Proportion of variance in DV that is predictable from IVs
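A short R sketch of the SS decomposition and R^2, reusing the hypothetical fit from the sketch above:

```r
Y     <- model.response(model.frame(fit))  # observed DV scores
SSres <- sum(residuals(fit)^2)             # residual sum of squares
SStot <- sum((Y - mean(Y))^2)              # total sum of squares
SSreg <- SStot - SSres                     # regression sum of squares

SSreg / SStot           # R^2 as a proportion of SStotal
summary(fit)$r.squared  # matches the ratio above
```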
8
Q

What is the range of R^2?

A

0 to 1.

Closer to 1 = Stronger prediction = Greater proportion of variance in the DV explained by the regression model (IVs) = Greater SSreg

9
Q

To calculate a confidence interval on R^2, what are the 4 things we require?

A
  • Estimated R2 value
  • Numerator df on F statistic (no. of IVs)
  • Denominator df on F statistic (n-IVs-1)
  • Desired confidence level
    • Smaller width = Greater precision
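One way to compute such a CI in R is ci.R2() from the MBESS package; a hedged sketch, where the R^2 of .30, the 2 IVs, and n = 100 are all made-up values:

```r
library(MBESS)

# CI on R^2 from exactly the four ingredients listed above
ci.R2(R2 = .30,          # estimated R^2 value
      df.1 = 2,          # numerator df = number of IVs
      df.2 = 97,         # denominator df = n - IVs - 1 = 100 - 2 - 1
      conf.level = .95)  # desired confidence level
```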
10
Q

Under which conditions will the observed R-squared value be most biased (all other aspects being equal)?

A

When sample size is small and the number of IVs is large

11
Q

Typically, what is the bias/consistency of R^2?

A

OLS estimate of R2 is biased (often overestimated), but consistent (as sample size increases, it gets increasingly closer to the true population value)

  • Note
    • OLS estimates of the slopes are unbiased, but the estimate of R2 is biased
12
Q

How is an adjusted R2 better than R2?

A

It is usually less biased, but we should always report both values

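Both values are reported by summary() in R; a sketch using the hypothetical fit from earlier:

```r
s <- summary(fit)
s$r.squared      # R^2
s$adj.r.squared  # adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
```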
13
Q

What are the two ways we can make meaningful comparisons between IVs in a multiple regression?

A
  • Transform regression coefficient to standardised partial regression coefficient
    • Z-scores
  • Semi-Partial and Squared Semi-Partial Correlations
14
Q

Making meaningful comparisons between IVs in a multiple regression: Method 1. When interpreting coefficients, what is the difference? When is this method useful?

A
  • Coefficients are interpreted in SD units
    • Example
      • A one-SD increase in variable X will result in a 0.5 SD decrease in variable Y, holding constant scores on all other IVs
  • No intercept (it is always 0)
  • Only useful when IVs have an arbitrary scaling (see the sketch below)
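A minimal sketch of Method 1 in R, z-scoring the DV and IVs with scale() before fitting (same hypothetical dat):

```r
# Standardized partial regression coefficients via z-scores
fit_z <- lm(scale(Y) ~ scale(X1) + scale(X2), data = dat)
coef(fit_z)  # slopes in SD units; intercept is 0 (up to rounding error)
```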
15
Q

Making meaningful comparisons between IVs in a multiple regression: Method 1. What is the intercept?

A

Always 0

16
Q

Making meaningful comparisons between IVs in a multiple regression: Semi-Partial and Squared Semi-Partial Correlations. Overview.

A
  • Semi-partial correlations
    • Correlation between the DV and each focal IV, after the effects of the other IVs have been removed from that focal IV
    • Correlation between the observed (not predicted) DV and the part of the focal IV that is not accounted for by all other IVs in the regression analysis
    • Effect size estimate
  • Squared semi-partial correlation
    • Proportion of variation in the observed (not predicted) DV uniquely explained by each IV
    • Directly analogous to R2 of the overall model
17
Q

Making meaningful comparisons between IVs in a multiple regression: What is the SQUARED semi-partial correlation? What is it analogous to?

A
  • Indicates proportion of variation in DV UNIQUELY explained by each IV.
  • Directly analogous to R^2 in regression model but telling about each IV on its own.
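One common way to compute a squared semi-partial correlation in R is as the drop in R^2 when the focal IV is removed; a sketch for a hypothetical focal IV X1:

```r
full    <- lm(Y ~ X1 + X2, data = dat)
reduced <- lm(Y ~ X2, data = dat)  # focal IV X1 dropped

# Proportion of DV variance uniquely explained by X1
summary(full)$r.squared - summary(reduced)$r.squared
```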
18
Q

What are the 4 statistical assumptions underlying the linear regression model?

A
  • Independence of Observations
  • Linearity
  • Constant variance of residuals/homoscedasticity
    • (Not homogeneity)
  • Normality of Residual Scores
19
Q

Statistical Assumption Linear Regression: Independence of Observations. How do we meet this?

A
  • Scores are not duplicated to make the sample bigger
  • One person's responses do not determine another person's responses
20
Q

Statistical Assumption Linear Regression: Linearity.

(1) What is it? (2) How do we meet this?

A
  • Scores on the dependent variable are an additive linear function of scores on the set of independent variables
  1. Scatterplot Matrix
  2. Residual Plot/ Scatterplot of Residual Scores
  3. Marginal Model Plots
  4. Marginal Conditional Plots (see the R sketch below)
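Several of these checks are one-liners in R's car package (the same package as the ncvTest used later), applied to the hypothetical fit:

```r
library(car)

pairs(dat)               # 1. scatterplot matrix: look for U / inverted-U shapes
residualPlots(fit)       # 2. residuals vs. each IV and vs. fitted values
marginalModelPlots(fit)  # 3. marginal model plots
mcPlots(fit)             # 4. marginal/conditional plots
```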
21
Q

Statistical Assumption Linear Regression: Linearity. 1. Scatterplot Matrix

A
  • Y-Axis
    • DV
  • X-Axis
    • IVs
  • Look for U-Shapes/Inverted U-shapes
22
Q

Statistical Assumption Linear Regression: Linearity. 2. Scatterplot of Residual Scores

A
  • Y-Axis
    • Residual Scores
  • X-Axis
    • Observed IV Scores
    • Predicted DV Scores
23
Q

Statistical Assumption Linear Regression: Linearity. Marginal Model Plot

A
  • Y-Axis
    • Observed DV Scores
  • X-Axis
    • Observed IV Scores
    • Predicted DV Scores
24
Q

Statistical Assumption Linear Regression: Marginal Conditional Plot. Why is it especially good?

A
  • Y-Axis
    • Predicted DV scores
  • X-Axis
    • Observed IV Scores

Conditional regression slope shows the partial regression line for the focal IV after the other IVs have been partialled out of both the focal IV and the DV

25
Q

Statistical Assumption Linear Regression: Homoscedasticity. What is it? What are the 2 ways to check it?

A

Constant residual variance for different predicted values of the dependent variable.

1) Residual Plots (similar to linearity)
2) Breusch-Pagan Test (ncvTest)

26
Q

Statistical Assumption Linear Regression: Homoscedasticity. Residual Plots

A
  • Y-Axis
    • Residual Scores
  • X-Axis
    • Observed IV Scores
    • Predicted DV Scores
  • Examine fanning out
27
Q

Statistical Assumption Linear Regression: Homoscedasticity. Breusch-Pagan Test Overview

A
  • Null-Hypothesis Test that assumes the variance of the residuals is constant/homoscedastic; it can be run against:
    • Individual IVs
    • IVs together
    • Regression model
  • Evidence from the residual plots and from the ncvTest results is usually consistent, but it may not be; report both if they disagree (see the sketch below)
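A sketch of running the Breusch-Pagan test with car's ncvTest() on the hypothetical fit:

```r
library(car)

ncvTest(fit)             # default: residual variance vs. fitted values
ncvTest(fit, ~ X1)       # vs. an individual IV
ncvTest(fit, ~ X1 + X2)  # vs. the IVs together
# A small p-value is evidence against constant residual variance
```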
28
Q

Statistical Assumption Linear Regression: Homoscedasticity. Breusch-Pagan Test. What happens when the p-value is small?

A

We have reason to reject the assumption of constant variance for the residuals

29
Q

Statistical Assumption Linear Regression: Normality of Residuals. What are the 3 ways to check it?

A
  1. qqplot
  2. Histogram
  3. boxplot
30
Q

Statistical Assumption Linear Regression: Normality of Residuals. qqplot. What does it contain?

A
  • Middle line: Strict normality
  • Side lines: Confidence envelope
    • Check that the residuals fall inside the confidence envelope
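car's qqPlot() draws exactly these elements, a reference line plus a confidence envelope around the studentized residuals; a sketch with the hypothetical fit:

```r
library(car)
qqPlot(fit)  # points outside the envelope suggest non-normal residuals
```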
31
Q

Statistical Assumption Linear Regression: Normality of Residuals. What are studentized residuals?

Define Outliers and Influential Cases.

How are they checked?

A

Studentized Residuals = standardized form of the residual values, where the SD is one

  • Outliers (measured by studentized value)
    • Very large studentized residual value in absolute terms
      • Usually about 3
  • Influential Case (measured by Cook's D)
    • If removed from the analysis, results in regression coefficients changing notably in value
  • Both are checked by the influence index plot, which labels the largest Cook's D and the largest absolute studentized residual value
    • Note: An outlier IS NOT NECESSARILY influential
32
Q

Statistical Assumption Linear Regression: Normality of Residuals. How do we find influential cases?

A

Cook’s D

  • Measure of overly influential values
    • Range: >0
    • Problem: >1
33
Q

Statistical Assumption Linear Regression: Normality of Residuals. How do we find outliers?

A
  • Very large studentized residual value in absolute terms
  • Usually about 3
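A sketch of both checks in R, using base extractors plus car's influence index plot:

```r
library(car)

rstudent(fit)        # studentized residuals; |value| around 3 flags outliers
cooks.distance(fit)  # Cook's D; values above 1 flag influential cases

# Labels the largest Cook's D and largest absolute studentized residual
influenceIndexPlot(fit, vars = c("Studentized", "Cook"))
```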
34
Q

What does “over the long run” mean for a CI?

A

Under repeated sampling from the population, 95% of such CIs will capture the true population parameter value.

35
Q

What indicates the precision of the CI on R^2?

A

Width of the interval between lower and upper bound. (Smaller width = More precise)

36
Q

Under what conditions will the observed R^2 be much larger than the true population value?

A

1) Small samples 2) Many IVs

37
Q

What is the size of each partial regression coefficient determined by?

A

Scaling/metric of each IV (if the scalings differ, we cannot use the relative size of b values to say which is a stronger predictor)

38
Q

Statistical Assumption Linear Regression: Normality of Residuals. How do we find outliers?

A

Studentized residuals (absolute value around 3 is problematic)

39
Q

What is a studentized residual?

A

A particular form of standardized residual
