Weeks 4 & 5 - Regression Flashcards
Which variable is also known as the predictor variable?
independent variable (X)
Which variable is also known as the outcome variable?
dependent variable (Y)
What kind of relationship is there between variables in a regression analysis?
An asymmetrical relationship - scores on one variable (IV) predict scores on the other (DV)
Formula for a straight line function
y = mx + b, where y is the DV, x is the IV, m is the constant slope of the straight line, b is a constant for the y intercept (the value of y when x = 0)
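A minimal sketch of the straight-line function, using hypothetical values m = 2 and b = 5:

```python
# Sketch of the straight-line function y = mx + b
# (m = 2 and b = 5 are hypothetical values for illustration).
def straight_line(x, m=2.0, b=5.0):
    """Return the y value on the line y = mx + b for a given x."""
    return m * x + b

print(straight_line(0))  # 5.0 -- the y intercept (value of y when x = 0)
print(straight_line(3))  # 11.0 -- 2*3 + 5
```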
Why use a straight line?
Because a straight line is the simplest summary of a linear relationship between the variables; the observed points in a scatterplot will not lie exactly on it unless there is a perfect correlation between the variables (which is unlikely).
Does it matter which variable is on the X and Y axes?
Yes, because one variable is predicting the other, the predictor variable (IV) should be on the X axis and the outcome variable (DV) should be on the Y axis.
What does the regression line mean?
It measures the summary characteristics of the relationship between two variables.
What method is used to find the regression line of best fit?
The least squares regression line ensures that there is the smallest deviation between the observed and predicted scores (obs & regression line)
What is minimised in the least squares regression line?
The sum of squared residuals (must be squared because the sum of residuals is equal to zero). The line of best fit has the smallest SSres.
What is the method of least squares?
The method of obtaining the line of best fit in a regression model that has the smallest possible SSres.
What is the least squares estimator?
The estimator used to obtain the line of best fit in a regression model. This estimator finds the estimated value for the slope and y intercept constant that minimises the SSres for the set of observed scores on the X and Y axis
What is the least squares linear regression function?
The line of best fit produced by the method of least squares
The full regression equation (full simple regression equation)
Y = a + bX + e (where a = constant intercept parameter; b = slope/regression coefficient; X = score on IV; e = residual score)
Regression model equation (simple regression model equation)
Y_hat = a + bX (excludes the residual score from the right-hand side of the equation and uses the predicted Y score (Y_hat) on the left-hand side of the equation)
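A sketch of the least squares fit behind these equations, assuming made-up data and using numpy's polyfit to estimate a and b:

```python
import numpy as np

# Sketch: estimating a (intercept) and b (slope) for Y_hat = a + bX
# by least squares; X and Y are hypothetical data.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

b, a = np.polyfit(X, Y, deg=1)  # polyfit returns [slope, intercept]
Y_hat = a + b * X               # regression model equation
e = Y - Y_hat                   # residuals from the full equation Y = a + bX + e

# With an intercept in the model, least squares residuals sum to zero
print(round(float(np.sum(e)), 10))
```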
What is the regression coefficient?
The slope parameter b in the regression model & full regression equation
What is a negative residual?
A residual score obtained when the predicted score is greater than the observed score
What is a positive residual?
A residual score where the observed score is greater than the predicted score
What is SSTotal?
The total variation in observed scores on the dependent variable (Y).
Measures the sum of squared differences between the observed Y scores and the mean (average).
What is SSReg?
Variation in the predicted scores in Y (DV).
Sum of squared deviations between the predicted scores and the mean.
Represents the variation in predicted scores accounted for by the model. Bigger SSReg means that the regression model is a good predictor.
What is SSres?
Variation in the difference between observed and predicted scores.
Represents the sum of squared deviations between the observed and predicted data (the residuals). A large SSres means the regression model is not a good predictor. A SSres of zero means a perfect correlation: observed and predicted scores fit perfectly along a straight line (but not very likely to happen!)
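The three sums of squares can be sketched on hypothetical data, checking that the least squares decomposition SSTotal = SSReg + SSres holds:

```python
import numpy as np

# Sketch: SSTotal, SSReg, and SSres for a least squares fit
# on hypothetical data.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.5, 5.5, 8.0, 10.0])

b, a = np.polyfit(X, Y, 1)
Y_hat = a + b * X

SS_total = np.sum((Y - Y.mean()) ** 2)      # observed vs mean of Y
SS_reg   = np.sum((Y_hat - Y.mean()) ** 2)  # predicted vs mean of Y
SS_res   = np.sum((Y - Y_hat) ** 2)         # observed vs predicted (residuals)

print(bool(np.isclose(SS_total, SS_reg + SS_res)))  # True
```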
What is R squared (R2)?
The proportion of total variation in Y accounted for by the model.
Measures the overall strength of prediction.
An R squared value can range between zero and +1 (can’t be negative because it’s a squared value).
An R squared value of zero means that the IV does not predict the DV (they are independent of each other)
An R squared value of 1 means that 100% of the variability in Y can be predicted by X (also v. unlikely), the larger the R squared value the greater the strength of prediction in the regression model.
How is R squared calculated?
R squared is calculated by dividing the SSReg by the SSTotal.
What are the alternative ways of calculating R squared?
By subtracting the SSres from the SSTotal (which produces the SSReg) and dividing this by SSTotal
By dividing the SSReg by SSReg + SSres (which equals the SSTotal)
These methods all produce the same measure of strength of prediction, but use the three measures of variability found in the regression model differently to obtain the same results.
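The three equivalent formulas can be checked with a quick sketch, using hypothetical sums of squares:

```python
# Sketch: three equivalent ways of computing R squared
# (SS_reg and SS_res are hypothetical values).
SS_reg, SS_res = 60.0, 40.0
SS_total = SS_reg + SS_res

r2_a = SS_reg / SS_total                # SSReg / SSTotal
r2_b = (SS_total - SS_res) / SS_total   # (SSTotal - SSres) / SSTotal
r2_c = SS_reg / (SS_reg + SS_res)       # SSReg / (SSReg + SSres)
print(r2_a, r2_b, r2_c)  # 0.6 0.6 0.6
```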
Provide a good way of showing the relationship between these measures of variability.
SSTotal = SSReg + SSres
What is R?
The multiple correlation coefficient (Multiple R) = square root of R squared.
What does Multiple R measure in a regression analysis?
The extent to which higher predicted scores (Y hat) for the DV are associated with higher observed scores (Y) on the DV.
What are the df reg?
The number of independent variables
What are the df res?
df res= n - no. of IVs - 1
What are the df total?
df total = df reg + df res
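The three df formulas can be sketched with hypothetical numbers (n = 50 cases, k = 3 IVs):

```python
# Sketch: degrees of freedom for a regression with hypothetical
# n = 50 cases and k = 3 independent variables.
n, k = 50, 3
df_reg = k            # df reg = number of IVs
df_res = n - k - 1    # df res = n - no. of IVs - 1
df_total = df_reg + df_res
print(df_reg, df_res, df_total)  # 3 46 49
```

Note that df total always works out to n - 1.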
What theoretical probability distribution is equivalent to R squared as an estimator of the overall strength of prediction in the regression model at a population level?
The F distribution
What techniques are used to make inferences about the overall strength of prediction in a regression model at a population level?
Null Hypothesis significance testing of R squared
What is the population parameter that corresponds to R squared?
P squared (rho squared)
What does the null hypothesis state for R squared at the population level?
Ho: P2 (rho squared) = 0
What does the alternative hypothesis state when making inferences from R squared to P squared?
Ha: P2 (rho squared) is not equal to zero (or is larger than zero)
What do sums of squares (SS) measure?
SS measures variation (they are squared deviation scores)
What are the Mean Sums of Squares (MS)?
Measures the average variance (average SS)
How are the Mean Sums of Squares (MSReg & MSres) obtained?
MSReg = SSReg/df reg
MSres = SSres/df res
How is the Tobs calculated for the null hypothesis test on R2?
Tobs = MSReg/MSres
What theoretical probability distribution is the Tobs for a null hypothesis test on R2 equivalent to?
F distribution
Tobs = F(dfReg, dfres) = MSReg/MSres
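The observed F can be sketched from the mean squares, with hypothetical sums of squares and dfs:

```python
# Sketch: the observed F for the null hypothesis test on R2
# (sums of squares and dfs are hypothetical values).
SS_reg, SS_res = 60.0, 80.0
df_reg, df_res = 2, 47

MS_reg = SS_reg / df_reg   # MSReg = SSReg / df reg
MS_res = SS_res / df_res   # MSres = SSres / df res
F_obs = MS_reg / MS_res    # Tobs = F(df reg, df res)
print(round(F_obs, 3))     # 17.625
```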
What is the shape of the F distribution?
Positively skewed
What value range can an F statistic have?
The F statistic must be 0 or greater (it can't be negative); it has no upper bound.
Where is the critical region in the F distribution?
In the upper (right) tail (because the distribution is positively skewed)
What other distribution is the F distribution similar to?
The Chi Square distribution
What is the limitation of using a point estimate in a null hypothesis test on R2?
The use of the point estimate can only tell us if P2 (corresponding to R2) is equal, or not equal, to zero. It can't provide a range of plausible values that P2 could take.
What are the advantages of placing a confidence interval around R2?
- R2 can only range between 0 and 1, therefore the CI can immediately and clearly show the precision with which R2 is being estimated.
- If the lower bound of the CI is not 0, we immediately know that the null hypothesised value of 0 would be rejected.
- The CI can indicate extreme bias in R2 (ie. R2 may not be contained within CI, indicating extreme bias).
What factors influence the precision of a confidence interval on R2?
- The number of IVs
- The sample size
- The size of R2
- fewer IVs = greater precision
- larger sample size = greater precision
- larger R2 = greater precision
How can a confidence interval for Multiple R (multiple correlation coefficient) be obtained?
By taking the square root of the upper and lower bounds, a CI for Multiple R can be obtained.
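A sketch of that conversion, with hypothetical CI bounds on R squared:

```python
import math

# Sketch: turning a CI on R squared into a CI on Multiple R
# by square-rooting the bounds (bounds are hypothetical).
r2_lower, r2_upper = 0.09, 0.36
r_lower = math.sqrt(r2_lower)
r_upper = math.sqrt(r2_upper)
print(round(r_lower, 6), round(r_upper, 6))  # 0.3 0.6
```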
How is a CI for Multiple R interpreted?
The CI for Multiple R provides an estimate for the expected correlation between the observed and predicted scores on the dependent variable at the population level.
Is R2 a biased estimator of P2?
Yes, but it is also a consistent estimator
When can Multiple R be a better measure of the overall strength of prediction than R2?
When the R2 is very small, Multiple R can be used to determine if there is a significant correlation between observed and predicted scores on the DV.
Is the unbiased or adjusted CI value more accurate?
The unbiased estimate is better than the adjusted (but SPSS doesn’t produce it)
What does an R2 value greater than the upper bound of the CI mean?
That there is a large upward bias in the observed R2, and that the population P2 is likely to lie within a lower interval than the observed R2 suggests.
What possible causes are there for extreme bias in R2?
A large number of IVs and small sample size.
How can bias in R2 reduced?
By using a larger sample size and fewer IVs
Which estimate of R2 should be used for a smaller sample size?
The unbiased R2 is the only accurate point estimator of P2 in a small sample (the adjusted R2 has a slight negative bias and the unadjusted R2 is positively biased). However, the unadjusted R2 is the only accurate interval estimator.
Which estimator of R2 should be used for a larger sample size?
All three estimators of R2 are accurate (they are all consistent), and all are good interval estimators.
What is an unstandardised partial regression coefficient?
“b” - indicates the expected change in the scores for the DV for a unit change on the focal IV (while holding constant scores on all other IVs)
What are two ways of measuring the strength of prediction of each IV using the partial regression coefficient?
- The semipartial correlation
- The standardised partial regression coefficient
What theoretical probability distribution does a hypothesis test of the partial regression coefficient correspond to?
The t-distribution
What are the df for the t-distribution hypothesis test on the partial regression coefficient?
df = n - dfReg - 1
How is the tobs calculated for a partial regression coefficient?
tobs = (partial regression coefficient - population regression coefficient) / standard error of the regression coefficient
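A sketch of that calculation, with hypothetical values and a null-hypothesised population coefficient of 0:

```python
# Sketch: tobs for a partial regression coefficient, testing
# H0: population coefficient = 0 (all numbers hypothetical).
b = 0.75      # sample partial regression coefficient
b_pop = 0.0   # null-hypothesised population coefficient
se_b = 0.25   # standard error of the coefficient
t_obs = (b - b_pop) / se_b
print(t_obs)  # 3.0
```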
How is a 95% CI for a partial regression coefficient interpreted?
A 95% CI indicates the range of expected change in the DV for a unit change in the focal IV (holding constant all other IVs).
What is a semipartial correlation?
The Pearson correlation between scores on the DV and that part of the scores on an IV not accounted for by all other IVs in the regression model.
What is the notation for the semipartial correlation?
Sr
What is homoscedasticity?
Irrespective of what the predicted scores on the dependent variable are, the degree of variability of the residuals is the same.
What is heteroscedasticity?
Systematic variability in the residual variances according to the predicted values (e.g. low variability with low scores on the DV and high variability with high scores on the DV)
What diagnostic tool is used to determine if heteroscedasticity is present (or not)?
A scatterplot of the residuals against predicted scores on the DV.
Why do we use predicted scores in a scatterplot when testing for heteroscedasticity?
Because the predicted scores on the DV represent a linear combination of all scores on the IV (so don’t need to check each IV individually)
What is plotted on the X axis of a scatterplot when checking for heteroscedasticity?
Standardised predicted scores (Z transformation of Y hat scores)
What is plotted on the Y axis of a scatterplot used for checking for heteroscedasticity?
The residual scores (can be standardised, studentised, or studentised deleted residuals)
What are standardized residuals?
Obtained by applying a Z transformation to raw residuals (mean = 0, SD = 1)
What are studentized residuals?
Transforming raw score residuals by an estimate of their standard error (mean approx. 0, SD = 1)
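The Z transformation behind standardised residuals can be sketched on hypothetical raw residuals (using the population SD, an assumption of this sketch):

```python
import numpy as np

# Sketch: standardised residuals as a Z transformation of raw
# residuals (raw values hypothetical; ddof=0 SD assumed).
raw = np.array([1.5, -0.5, 0.0, -2.0, 1.0])
z = (raw - raw.mean()) / raw.std()
print(bool(np.isclose(z.mean(), 0)), bool(np.isclose(z.std(), 1)))  # True True
```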
What are studentized deleted residuals and how are they obtained?
Residuals obtained by refitting the same regression model to the sample data with one case left out at a time.
The difference between a case's observed score and the score predicted by the regression equation fitted without that case is called the deleted residual.
The studentised deleted residual for a case is therefore its deleted residual value divided by an estimate of its standard error.
What is a residual outlier?
A studentised deleted residual identified on a scatterplot with a value greater than 2.5 to 3 (or less than -2.5 to -3).
Studentised deleted residuals are plotted on the Y axis and predicted scores on the X axis.
What is the advantage of using studentised deleted residuals?
- Good at picking up outliers and extreme data points, especially when the sample size is small.
A studentised deleted residual of +3 or above (or -3 or below) is an extreme data point.
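The cut-off rule can be sketched as a simple flagging step, assuming the studentised deleted residuals have already been computed (values here are hypothetical):

```python
import numpy as np

# Sketch: flagging residual outliers with the 2.5-to-3 rule of
# thumb (hypothetical studentised deleted residuals).
sdr = np.array([0.4, -1.2, 3.1, 0.8, -2.9])
outliers = np.abs(sdr) >= 2.5
print(np.where(outliers)[0].tolist())  # [2, 4] -- cases 2 and 4 flagged
```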
What does fanning in (or out) indicate in a scatterplot of residuals?
Indicative of heteroscedasticity (systematic variation of the residuals as the predicted values increase/decrease).
Does sample size affect the ability to detect heteroscedasticity?
Yes! In a small sample it is almost impossible to see on a scatterplot unless it is really obvious.
What effect can an aberrant or extreme score have on a small sample size?
It can dramatically change the results of the regression model.
What two diagnostic techniques allow extreme data points to be observed in a regression model?
- Examine the studentised deleted residuals for extreme scores
- Investigate a measure of influential cases on our data using Cook’s d statistic.
What size of studentised deleted residual indicates an extreme score?
A studentised deleted residual of more than 2.5 or 3, or less than -2.5 or -3.
What is Cook’s d?
A statistic that is calculated for each data value in the regression model and assesses the influence of each case on the model, when that case has been removed from the model.
What is the range of values for Cook’s d?
Minimum value of 0, and a large value (e.g. +1 or more) is indicative of an extreme datapoint.
What value of Cook’s d indicates an extreme data point?
A cook’s d value of +1 or higher.
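The rule of thumb can be sketched as a flagging step, assuming the Cook's d values have already been produced (e.g. by SPSS; the values here are hypothetical):

```python
import numpy as np

# Sketch: flagging influential cases with Cook's d
# (hypothetical Cook's d values, one per case).
cooks_d = np.array([0.02, 0.15, 1.30, 0.05])
influential = cooks_d >= 1.0   # the "+1 or more" rule of thumb
print(np.where(influential)[0].tolist())  # [2] -- case 2 flagged
```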
How is non-linearity in a regression model established?
Systematic patterning indicating non-linearity should be evident in a scatterplot of the residuals and predicted values.
What is the meaning of the intercept parameter (a intercept)
Expected value on the DV when the scores on all IVs = 0. Intercept always = 0 in a standardised regression equation.
Only need to understand this if an expected score of 0 is meaningful (otherwise forget it).