Chapter8 Flashcards

Question

Model sum of squares

Answer 1

a measure of the total amount of variability for which a model can account. It is the difference between the total sum of squares and the residual sum of squares.

Answer 2

a situation in which two or more variables are very closely linearly related.

Answer 3

the multiple correlation coefficient. It is the correlation between the observed values of an outcome and the values of the outcome predicted by a multiple regression model.

Answer 4

an extension of simple regression in which an outcome is predicted by a linear combination of two or more predictor variables. The form of the model is: (see above image)

Answer 5

a method of regression in which the parameters of the model are estimated using the method of least squares.

Answer 6

a variable whose values we are trying to predict from one or more predictor variables.

Answer 7

exists when at least one predictor in a regression model is a perfect linear combination of the others (the simplest example being two predictors that are perfectly correlated - they have a correlation coefficient of 1).

Answer 8

the value of an outcome variable based on specific values of the predictor variable or variables being placed into a statistical model.

Answer 9

a variable that is used to try to predict values of another variable known as an outcome variable.

Answer 10

The difference between the value a model predicts and the value observed in the data on which the model is based. Basically, an error. When the residual is calculated for each observation in a data set the resulting collection is referred to as the residuals.

Answer 11

a measure of the variability that cannot be explained by the model fitted to the data. It is the total squared deviance between the observations, and the value of those observations predicted by whatever model is fitted to the data.

Answer 12

the loss of predictive power of a regression model if the model had been derived from the population from which the sample was taken, rather than the sample itself.

Answer 13

a linear model in which one variable or outcome is predicted from a single predictor variable. The model takes the form: (see above image)

Answer 14

a standardized version of DFBeta. These standardized values are easier to use than DFBeta because universal cut-off points can be applied. Stevens (2002) suggests looking at cases with absolute values greater than 2.

Answer 15

a standardized version of DFFit.

Answer 16

the residuals of a model expressed in standard deviation units. Standardized residuals with an absolute value greater than 3.29 (actually, we usually just use 3) are cause for concern because in an average sample a value this high is unlikely to happen by chance; if more than 1% of our observations have standardized residuals with an absolute value greater than 2.58 (we usually just say 2.5) there is evidence that the level of error within our model is unacceptable (the model is a fairly poor fit of the sample data); and if more than 5% of observations have standardized residuals with an absolute value greater than 1.96 (or 2 for convenience) then there is also evidence that the model is a poor representation of the actual data.

Answer 17

a method of multiple regression in which variables are entered into the model based on a statistical criterion (the semi-partial correlation with the outcome variable). Once a new variable is entered into the model, all variables in the model are assessed to see whether they should be removed.

Answer 18

a measure of the influence of a particular case of data. This is a standardized version of the deleted residual.

Answer 19

a variation on standardized residuals. A Studentized residual is an unstandardized residual divided by an estimate of its standard deviation that varies point by point. These residuals have the same properties as the standardized residuals but usually provide a more precise estimate of the error variance of a specific case.

Answer 20

situation where a predictor has a significant effect, but only when another variable is held constant.

Answer 21

Student's t is a test statistic with a known probability distribution (the t-distribution). In the context of regression it is used to test whether a regression coefficient b is significantly different from zero; in the context of experimental work it is used to test whether the differences between two means are significantly different from zero. See also paired-samples t-test and Independent t-test.

Answer 22

tolerance statistics measure multicollinearity and are simply the reciprocal of the variance inflation factor (1/VIF). Values below 0.1 indicate serious problems, although Menard (1995) suggests that values below 0.2 are worthy of concern.

Answer 23

a measure of the total variability within a set of observations. It is the total squared deviance between each observation and the overall mean of all observations.

Answer 24

the residuals of a model expressed in the units in which the original outcome variable was measured.

Answer 25

a measure of multicollinearity. The VIF indicates whether a predictor has a strong linear relationship with the other predictor(s). Myers (1990) suggests that a value of 10 is a good value at which to worry. Bowerman and O'Connell (1990) suggest that if the average VIF is greater than 1, then multicollinearity may be biasing the regression model.

Chapter8 Flashcards

(49 cards)